Combining Data From Multiple Excel Workbooks With Power Query–The Easy/Complete/Power BI Ready Way!

In a series of multiple blog posts within the last few weeks, a lot of blog authors (Ken Puls, Chris Webb, Mike and many others) who I follow have raised the enthusiasm for combining multiple Excel Workbooks or even worksheets into a single table.

All of our approaches have two things in common:

  • Calling a Function
  • Refer to a single type of object (most of the time it’s just a Worksheet)

wich may not the best or most complete way to work with Power BI (on the cloud) since queries that refer to a function can not be refreshed on the cloud and also because you might want to combine sheets, tables and named ranges at the same time .

Note that If you try refreshing a workbook that contains a Query with a function on Power BI, you’ll get an error similar to this:

PQFunctionAndDMG

So, what if you could create a query that can combine anything and everything from multiple workbooks? Sonrisa READ THIS POST and you’ll see how

So, what if you want to…

  • Combine multiple parts of a workbook (not just the sheets, but also named ranges and tables)
  • Not being forced to create a function
  • Make it Cloud-READY so it can be refreshed from a Power BI site

Hear me out, there’s a way Sonrisa if you want to follow along you can also download all the files from here.

***Beware, there are LOTS and LOTS of images ahead***

Step 1: Choose the files that hold the data that you want to combine and the attributes/properties that will go with them

First, let’s store all of our files into a single folder

Then we can step into Excel and Power Query and go with our normal process of going through the selection from folder like you can see in the following image:

image

Next step is finding your Folder:

image

Let’s go ahead and click OK once you’ve found your folder. Then, let’s filter the files that are not going to be part of this process like that .ini file in that folder.

image

You could’ve done the same by filtering with the extension field.  The following is optional but, you can also use properties of your files like size, date crated and such to be part of your final table. This is your chance to select the ones that you want in the event that you want them:

image

Once we know which ones we want, let’s choose them AND don’t forget to choose the Binary field as that’s the field that does all the magic (that’s the one that holds the actual file). In this case, I’m going to pick the name of the file and the binary field but, remember, you could’ve picked the date accessed, archived and such.

image

and this should be our result:

image

this process will always get all the files that are in my folder except for that .ini file that I didn’t want. So I could add new files to that folder and that query will add it to the process without me having to select it and changing the query somehow Sonrisa / in other words, this is done. I don’t have to modify this phase anymore.

Step 2: Extracting the data from those files

So now I know which files hold the data that I need to combine….what’s next??

image

Let’s create a new custom column with the formula shown above:

=Excel.Workbook([Content])

Note: In case you wanted to do the same process but with a CSV file, you could’ve used the following formula:

=Csv.Document([Content])

both of these formulas basically transform (more like display) that binary into either an Excel.Workbook or CSV that you can interact with in Power Query.

and this is the result of that formula

image

or should I say, Field? go Ahead and expand the MyExcelData field to show what we got!

image

You can filter the Data.Kind field IF you just want to combine all the sheets, tables or perhaps all the Named Ranges BUT, what if you have an specific request to have multiple types of combination?

*The following process is optional, but it’s pretty cool*

Create a new column using the following formula:

= [Name]&”, “&[MyExcelData.Item]&”, “&[MyExcelData.Kind]

image

this basically creates a field that will have the name of the file, the item type and the kind. Use that new field to filter the sheets, named ranges and tables that you need and this should be your result:

image

As you can see, my scenario was kinda complex as I wanted to combine multiple objects that hold monthly data. Some files even had multiple months in different formats (tables, sheets and then named ranges)

Before you try expanding that “MyExcelData.Data” field, please just look at how it looks and in what issues you’ll be envolving yourself Sonrisa:

image

So let’s use something that fixes that for us!

Let’s create a new custom column with the following formula:

=if [MyExcelData.Kind]<>”Table” then Table.PromoteHeaders([MyExcelData.Data]) else [MyExcelData.Data]

image

that should promote the first rows to be the headers of that table, but only for any objects that are not tables because tables already have the headers set up (only sheets and named ranges need that fix). Let’s check if it worked:

image

Now let’s delete the fields that we don’t need:

image

and then expand that Custom.1 field!

image

The final steps before loading it into Excel

I’m going to save you from the rest of the images and just show you what I did with that month,year and date column:

image

You can download this Workbook and the files that I used from here.

For those who want to see the code, here it is:

[code language=”css”]let
Source = Folder.Files("C:UsersMiguelGoogle DriveBest Way to Combine"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Name] <> "desktop.ini")),
#"Expand Attributes" = Table.ExpandRecordColumn(#"Filtered Rows", "Attributes",
{"Archive"}, {"Attributes.Archive"}),
#"Removed Other Columns" = Table.SelectColumns(#"Expand Attributes",{"Content", "Name"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "MyExcelData",
each Excel.Workbook([Content])),
#"Expand MyExcelData" = Table.ExpandTableColumn(#"Added Custom", "MyExcelData",
{"Name", "Data", "Item", "Kind"},
{"MyExcelData.Name", "MyExcelData.Data",
"MyExcelData.Item", "MyExcelData.Kind"}),
#"Added Custom1" = Table.AddColumn(#"Expand MyExcelData", "Custom",
each [Name] & ", "&[MyExcelData.Item] &", "&[MyExcelData.Kind]),
#"Filtered Rows1" = Table.SelectRows(#"Added Custom1",
each ([Custom] <> "File7.xlsx, Feb_2013, Sheet" and
[Custom] <> "File8.xlsx, Apr_2013, Sheet" and
[Custom] <> "File8.xlsx, May_2013, Sheet" and
[Custom] <> "File9.xlsx, Jun_2013, Sheet")),
#"Added Custom2" = Table.AddColumn(#"Filtered Rows1", "Custom.1",
each if [MyExcelData.Kind]<>"Table"
then Table.PromoteHeaders([MyExcelData.Data])
else [MyExcelData.Data]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Content",
"Name", "MyExcelData.Name",
"MyExcelData.Data", "MyExcelData.Kind",
"Custom"}),
#"Expand Custom.1" = Table.ExpandTableColumn(#"Removed Columns",
"Custom.1", {"Amount", "Date", "Gross Sales", "Product"},
{"Custom.1.Amount", "Custom.1.Date",
"Custom.1.Gross Sales", "Custom.1.Product"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Expand Custom.1",
"MyExcelData.Item",Splitter.SplitTextByDelimiter("_"),
{"MyExcelData.Item.1", "MyExcelData.Item.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",
{{"Custom.1.Date", type text}}),
#"Added Custom3" = Table.AddColumn(#"Changed Type", "Custom",
each [Custom.1.Date]&"-"&[MyExcelData.Item.1]&"-"&[MyExcelData.Item.2]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom3",{{"Custom", type date}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Custom", "Date"}}),
#"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"MyExcelData.Item.1",
"MyExcelData.Item.2", "Custom.1.Date"}),
#"Renamed Columns1" = Table.RenameColumns(#"Removed Columns1",
{{"Custom.1.Amount", "Amount"},
{"Custom.1.Gross Sales", "Gross Sales"},
{"Custom.1.Product", "Product"}})
in
#"Renamed Columns1"[/code]

Last Step: Try it out the refresh process with this Query on Power BI Sonrisa

This should be the result of that:

image

IT WORKS!!!! IT’S ALIVE

Posted in: