The function to combine multiple workbooks data

(a solution ready-to-go!

Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
)

Big thanks to Matt for encouranging me to post a function, similar to the one that Chris created here but, based on my latest post.

Download the file and check out the code

So here it is!

Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  a packaged and ready to deliver function that can do the job of combining multiple excel workbook data (sheets, named ranges and even tables).  You can download the workbook from here or you can read the M code here with detailed description of what it does, step by step:

[code language=”css”]///let’s define the input parameters
let
CombineExcel = (FolderPath as text,optional Kind as text) =>
///FolderPath is the location of the folder. Example: C:MyFolder
///Kind is the type of object that we using. Please use the chart found in-
///-this workbook for valid input parameters
let
///here’s where the query actually starts by going for the folder
Source = Folder.Files(FolderPath),
///let’s filter so we only get the excel files. Either .xls or .xlsx
#”Filtered Rows” = Table.SelectRows(Source,
each [Extension] = “.xls”
or [Extension] = “.xlsx”),
///now we remove all the columns except the Content and Name column
#”Removed Other Columns” = Table.SelectColumns(#”Filtered Rows”,{“Content”, “Name”}),
///Now we use the binary (content) field to create a custom column that shows us
#”Added Custom” = Table.AddColumn(#”Removed Other Columns”, “MyExcelData”,
each Excel.Workbook([Content])),
///we can now expand that new added custom column that contains the metadata-
///-about the excel file
#”Expand MyExcelData” = Table.ExpandTableColumn(#”Added Custom”, “MyExcelData”,
{“Name”, “Data”, “Item”, “Kind”},
{“MyExcelData.Name”, “MyExcelData.Data”,
“MyExcelData.Item”, “MyExcelData.Kind”}),
///now we filter our query by using the Kind metada field
///which is exactly the same text value that you enter on the 2nd input
#”Filtered Rows1″ = Table.SelectRows(#”Expand MyExcelData”,
each [MyExcelData.Kind] = Kind
or [MyExcelData.Kind] = null),
///Only tables have the correct headers. Sheets and ranges need a fix
///That fix is called PromoteHeaders or FirstRowAsHeaders
#”Added Custom2″ = Table.AddColumn(#”Filtered Rows1″, “Custom.1″,
each if [MyExcelData.Kind]<>”Table”
then Table.PromoteHeaders([MyExcelData.Data])
else [MyExcelData.Data]),
///We remove the data that we don’t need
#”Removed Columns” = Table.RemoveColumns(#”Added Custom2″,{“Content”, “MyExcelData.Name”,
“MyExcelData.Data”}),
///Create a new field that can count the rows of the table of each object
#”Added Custom3″ = Table.AddColumn(#”Removed Columns”, “Rows”, each Table.RowCount([Custom.1])),
///Rename some columns for easier understanding
#”Renamed Columns” = Table.RenameColumns(#”Added Custom3″,{{“Custom.1”, “ExpandThis”},
{“MyExcelData.Kind”, “Kind”},
{“MyExcelData.Item”, “ItemName”}})
in
#”Renamed Columns”
in
CombineExcel[/code]

The input parameters

Input Type Description Example
FolderPath Required The complete folderpath where the files are stored C:Miguel
Kind Optional This is the type of object, as represented by Power Query, that will get combined in each of the files Sheet

The FolderPath is quite straightforward but, what is that thing called Kind? Let’s look at another table to see what Kind is to Power Query:

Extension Excel Object Type Power Query Kind
.xls Named Range null
.xls Sheet null
.xlsx Sheet Sheet
.xlsx Named Range DefinedName
.xlsx Table Table

Note: the values shown on the column Power Query Kind are the ones that are valid as the second parameter of the function.

so from this table, we can understand that for 97-2003 Excel Files (.xls), Power Query will not distinguish their objects and the kind for everything will be null. That means that when you want to combine xls files, you can just leave that parameter empty.

For Excel files of the extension .xlsx (2007 and later) we get to see how PQ sees and dinstiguish those objects.

Things to take in consideration

The reason behind why you need to choose that 2nd parameter to be one of either null, Sheet, DefinedName or Table it’s because how data is stored in an Excel File. You could’ve chosen the Table as your 2nd parameter but that table is part of a Sheet, and since I’m assuming that you don’t want duplicated data, that’s why we filter that to be only either the Sheet data, the table data or perhaps the DefinedName data.

There’s an special scenario for files with the extension .xls – if they have a named range, you’ll get a duplicate no matter what since both of those excel object types are treated as null but fear not cause there’s a way to audit everything and choose only the data that works for you.

The Output and steps needed from the user

Just like the cup noodles, the consumer has some steps to do like the following:

  • Open the package and pouring water into it  – (that’s opening the file and entering the input values)
  • Setting the microwave and hit the start button – (what we’re about to see right now)

so once you enter the values needed, you’ll get a window like the following one:

This has 5 columns:

  1. Name = the name of the file (with the actual extension)
  2. ItemName = this is the name of the item, if it’s a named Range then it’ll be the name of that range, for Sheets then it’ll be the name of the Sheet and for Tables it’ll be the name of the table
  3. Kind = it’s the name of the object that was found from the excel workbook
  4. ExpandThis = the table that was extracted from that object of the file
  5. Rows = the total rows found in that table

From the result above, you can check what is about to be combined and do any type of auditing to see if the amount of rows to be combined is correct or any other type of auditing you might want to consider.

Note: you can dig deeper and go straight into a table or choose the table preview on the preview pane at the bottom of the PQ window and see for yourself what’s about to be combined

After you’ve checked that, which is basically setting the timer on the microwave, you can now click on the arrows that go on opposite direction found right on the header of the [ExpandThis] field:

and you’ll get a window like the following:

You hit OK and then you’ll get your “soup” ready!

it is ready, but it’s still in the “microwave”. Let’s take it out of there and grab it into our plate by clicking the save & load and load it into either Excel or my Data Model.

And there you have it! your Power Query flavored “soup” is ready to be consumed

Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn

Don’t forget to download the workbook here and if you have any questions about this code or anything shown here just drop a comment.

Posted in:

8 Comments

  1. Miguel,

    Very interesting post which shows the extensibility of Power Query.

    As you have included the rowcount as metadata it should be possible to add the import date.

    Is it easy to modifiy this function to work for TXT files ?

    Best Regards

  2. Maybe I just don’t get it but where do I enter the input values, like the folder path? all I see when I open the workbook is that table you show above and I try to enter the file path there and nothing happens

      • Hi,
        Yes sir I got that part and it worked until I went to close and load it. For some reason my excel 2010 has never given me the option of “close and load to” , so I can only use the close and load which loads it do the default location but after I do that I expect to see another sheet in your workbook but there’s nothing there. Im sure Im doing something dumb but can you help me to figure it out.

  3. Excellent post, thanks. I tested this function with 1,000 xlsm workbooks. One Table was read from each workbook in <1 min!

    I am wondering if there is any performance improvement, if binary XLSB workbooks are used, instead of xlsx/xlsm. I will try put this theory to the test soon!

  4. I removed vbaProject.bin & blank cols/rows (reset UsedRange) and the same Power Query on 1K workbooks seem to run noticeable faster. So it pays to shrinks files.

    http://www.spreadsheet1.com/reset-used-range.html

    http://www.spreadsheet1.com/move-excel-vba-projects-from-one-workbook-to-another.html

    However, I couldn’t run the query on binary workbooks, although I changed M code from xlsx to xlsb. Any idea why? If PQ reads the legacy binary XLS, why can’t it read XLSB?

One Ping

  1. Pingback:

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.