Combining Data From Multiple Excel Workbooks With Power Query–The Easy/Complete/Power BI Ready Way!

In a series of multiple blog posts within the last few weeks, a lot of blog authors (Ken Puls, Chris Webb, Mike and many others) who I follow have raised the enthusiasm for combining multiple Excel Workbooks or even worksheets into a single table.

All of our approaches have two things in common:

  • Calling a Function
  • Refer to a single type of object (most of the time it’s just a Worksheet)

wich may not the best or most complete way to work with Power BI (on the cloud) since queries that refer to a function can not be refreshed on the cloud and also because you might want to combine sheets, tables and named ranges at the same time .

Note that If you try refreshing a workbook that contains a Query with a function on Power BI, you’ll get an error similar to this:

So, what if you could create a query that can combine anything and everything from multiple workbooks?
Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
READ THIS POST and you’ll see how

So, what if you want to…

  • Combine multiple parts of a workbook (not just the sheets, but also named ranges and tables)
  • Not being forced to create a function
  • Make it Cloud-READY so it can be refreshed from a Power BI site

Hear me out, there’s a way

Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
if you want to follow along you can also download all the files from here.

***Beware, there are LOTS and LOTS of images ahead***

Step 1: Choose the files that hold the data that you want to combine and the attributes/properties that will go with them

First, let’s store all of our files into a single folder

Then we can step into Excel and Power Query and go with our normal process of going through the selection from folder like you can see in the following image:

Next step is finding your Folder:

Let’s go ahead and click OK once you’ve found your folder. Then, let’s filter the files that are not going to be part of this process like that .ini file in that folder.

You could’ve done the same by filtering with the extension field.  The following is optional but, you can also use properties of your files like size, date crated and such to be part of your final table. This is your chance to select the ones that you want in the event that you want them:

Once we know which ones we want, let’s choose them AND don’t forget to choose the Binary field as that’s the field that does all the magic (that’s the one that holds the actual file). In this case, I’m going to pick the name of the file and the binary field but, remember, you could’ve picked the date accessed, archived and such.

and this should be our result:

this process will always get all the files that are in my folder except for that .ini file that I didn’t want. So I could add new files to that folder and that query will add it to the process without me having to select it and changing the query somehow

Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
/ in other words, this is done. I don’t have to modify this phase anymore.

Step 2: Extracting the data from those files

So now I know which files hold the data that I need to combine….what’s next??

Let’s create a new custom column with the formula shown above:

=Excel.Workbook([Content])

Note: In case you wanted to do the same process but with a CSV file, you could’ve used the following formula:

=Csv.Document([Content])

both of these formulas basically transform (more like display) that binary into either an Excel.Workbook or CSV that you can interact with in Power Query.

and this is the result of that formula

or should I say, Field? go Ahead and expand the MyExcelData field to show what we got!

You can filter the Data.Kind field IF you just want to combine all the sheets, tables or perhaps all the Named Ranges BUT, what if you have an specific request to have multiple types of combination?

*The following process is optional, but it’s pretty cool*

Create a new column using the following formula:

= [Name]&”, “&[MyExcelData.Item]&”, “&[MyExcelData.Kind]

this basically creates a field that will have the name of the file, the item type and the kind. Use that new field to filter the sheets, named ranges and tables that you need and this should be your result:

As you can see, my scenario was kinda complex as I wanted to combine multiple objects that hold monthly data. Some files even had multiple months in different formats (tables, sheets and then named ranges)

Before you try expanding that “MyExcelData.Data” field, please just look at how it looks and in what issues you’ll be envolving yourself

Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
:

So let’s use something that fixes that for us!

Let’s create a new custom column with the following formula:

=if [MyExcelData.Kind]<>”Table” then Table.PromoteHeaders([MyExcelData.Data]) else [MyExcelData.Data]

that should promote the first rows to be the headers of that table, but only for any objects that are not tables because tables already have the headers set up (only sheets and named ranges need that fix). Let’s check if it worked:

Now let’s delete the fields that we don’t need:

and then expand that Custom.1 field!

The final steps before loading it into Excel

I’m going to save you from the rest of the images and just show you what I did with that month,year and date column:

You can download this Workbook and the files that I used from here.

For those who want to see the code, here it is:

[code language=”css”]let
Source = Folder.Files("C:UsersMiguelGoogle DriveBest Way to Combine"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Name] <> "desktop.ini")),
#"Expand Attributes" = Table.ExpandRecordColumn(#"Filtered Rows", "Attributes",
{"Archive"}, {"Attributes.Archive"}),
#"Removed Other Columns" = Table.SelectColumns(#"Expand Attributes",{"Content", "Name"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "MyExcelData",
each Excel.Workbook([Content])),
#"Expand MyExcelData" = Table.ExpandTableColumn(#"Added Custom", "MyExcelData",
{"Name", "Data", "Item", "Kind"},
{"MyExcelData.Name", "MyExcelData.Data",
"MyExcelData.Item", "MyExcelData.Kind"}),
#"Added Custom1" = Table.AddColumn(#"Expand MyExcelData", "Custom",
each [Name] & ", "&[MyExcelData.Item] &", "&[MyExcelData.Kind]),
#"Filtered Rows1" = Table.SelectRows(#"Added Custom1",
each ([Custom] <> "File7.xlsx, Feb_2013, Sheet" and
[Custom] <> "File8.xlsx, Apr_2013, Sheet" and
[Custom] <> "File8.xlsx, May_2013, Sheet" and
[Custom] <> "File9.xlsx, Jun_2013, Sheet")),
#"Added Custom2" = Table.AddColumn(#"Filtered Rows1", "Custom.1",
each if [MyExcelData.Kind]<>"Table"
then Table.PromoteHeaders([MyExcelData.Data])
else [MyExcelData.Data]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Content",
"Name", "MyExcelData.Name",
"MyExcelData.Data", "MyExcelData.Kind",
"Custom"}),
#"Expand Custom.1" = Table.ExpandTableColumn(#"Removed Columns",
"Custom.1", {"Amount", "Date", "Gross Sales", "Product"},
{"Custom.1.Amount", "Custom.1.Date",
"Custom.1.Gross Sales", "Custom.1.Product"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Expand Custom.1",
"MyExcelData.Item",Splitter.SplitTextByDelimiter("_"),
{"MyExcelData.Item.1", "MyExcelData.Item.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",
{{"Custom.1.Date", type text}}),
#"Added Custom3" = Table.AddColumn(#"Changed Type", "Custom",
each [Custom.1.Date]&"-"&[MyExcelData.Item.1]&"-"&[MyExcelData.Item.2]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom3",{{"Custom", type date}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Custom", "Date"}}),
#"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"MyExcelData.Item.1",
"MyExcelData.Item.2", "Custom.1.Date"}),
#"Renamed Columns1" = Table.RenameColumns(#"Removed Columns1",
{{"Custom.1.Amount", "Amount"},
{"Custom.1.Gross Sales", "Gross Sales"},
{"Custom.1.Product", "Product"}})
in
#"Renamed Columns1"[/code]

Last Step: Try it out the refresh process with this Query on Power BI
Sonrisa
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn
  • Facebook
  • Twitter
  • LinkedIn

This should be the result of that:

IT WORKS!!!! IT’S ALIVE

  • Facebook
  • Twitter
  • LinkedIn

Posted in:

24 Comments

  1. Great resource Miguel! Thanks for taking the time to create the images. This makes it much easier to follow for me.

    I’m getting an Error with one file on the MyExcelData=Excel.Workbook([Content]) custom column. Do you know what can cause the Excel file to return an error. The file just contains one sheet of data exported from salesforce.com. I’m just curious if there are any common causes for this before I start digging.

    Thanks again!

    • Hey Jon,

      Could you try and open the file using Power Query > From Excel file to see if you also get an error with that same file?

      It’s really weird as it shouldn’t happen at all. Doing the Excel.Workbook() simply reads the binary as an Excel Workbook. It doesn’t do anything more

  2. Hi,

    Thanks for the great post. I have a question and need some help in solving it.

    I have a table which has 2 columns, Measure & File.

    Measure File
    Measure1 C:\Users\Documents\Work\INNOVATION\Test\Data\Measure1.xlsx
    Measure2 C:\Users\Documents\Work\INNOVATION\Test\Data\Measure2.xlsx
    Measure3 C:\Users\Documents\Work\INNOVATION\Test\Data\Measure3.xlsx
    Measure4 C:\Users\Documents\Work\INNOVATION\Test\Data\Measure4.xlsx

    I need to import these files based on the selection of the measure. I am very new to PQ but trying to learn from various blogs like yours. I am trying to design a function which will accept the Measure as a input and import the corresponding file. I am completely lost in doing that.

    Any guidance will be very helpful!

    TIA,
    Neel

  3. Awesome approach! Quick question though, have you tried to do this from One Drive where you are passing a URL in instead of a folder path?

  4. Thank you very much for this post! Your approach to this problem was much simpler and easier to implement than several other BI Wizards. The parameterized function approach was giving errors, while this approach worked beautifully.

  5. Hi Miguel,

    Thanks for your kind explanation. This is working fine for me as long as the number of files inside the folder remains the same. I am getting error if i increase or decrease the number of files (different files are ok, as long as total number of files remains the same). Is it possible to program it to have flexibility with number of files?

  6. Hi Miguel!

    Thank you for this solution! I’m having a problem though, I have three different tables that I imported. They all have the same header which makes three rows in PBI consist of those headers. The first row I can solve by clicking “Use first row as headers”. The other two I have to delete out of the Querie, but every time the data is updated in Excel the headers appear again. Do you have any suggestions to how to make them go away forever?

    Thank you in advance!

  7. Hi, is there any change to above related with newly released powerBI desktop function called “Combine” for Folder based data sources?? see the Feb 2017 release info

    • Hi! The ‘Combine Binaries’ experience changed in the Power BI Desktop November 2016 version to one that supports combining more than just flat files like txt or csv. Now you can use it to combine Excel files or any file type to be honest, but they all have to be of the same file type (only combine all the xlsx or all the json for example), otherwise it might not work. Furthermore, the new ‘Combine Binaries’ experience adds some overhead and complexity as it creates 4 new queries that help the whole process, which might add processing time to the queries.

      The post here is a more direct and efficient way of combining the files from a folder where everything is kept in just 1 query. Still relevant to anyone trying to combine file from a folder really fast.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.