Combine or Append Data: Optimal Combination Pattern

This is going to be the last post in the series on Combine or Append Data.

In the first post we saw the basics of how to do the Append operation through the UI.

In the second post we saw the Combine Files experience with Flat Files and how easy it is to combine as many files as you want.

In the third post we contrasted the Combine Files experience using Excel Workbooks instead of simple flat files, and saw what we needed to consider this time that we didn't with flat files.

In this fourth and last post we'll be going back to the basics using the function that we discovered in the first post – Table.Combine, the optimal function for combining / appending data.

Why this approach?

Don't get me wrong, the other approaches are completely valid, but in some cases they add unnecessary overhead to the whole process.

I'm talking specifically about the Combine Files experience, which improves the end-user experience at the cost of some overhead. If your goal is performance, you could trade away some of that end-user experience to optimize your query and make it run up to 500% faster.

No – I’m not saying that we should use exactly the same approach that we did on the first post in this series.

That would be too time consuming and it simply wouldn't scale, but what about a combination of that Append approach and the Combine Files experience?

This is more for advanced users than your average Power Query user, but it has a huge impact if your goal is to improve performance, and it's usually what I do when clients hire me to optimize their queries.

The Setup


I’ll be using the same dataset that we used in the 2nd post for the Combine Files using multiple flat files from a series of folders.

The end goal is to use a function called Table.Combine to combine all of the files instead of using the normal Combine Files approach which uses a function called Table.ExpandTableColumn.

So, how does the Table.Combine function work? Good question! This function requires a List of Tables so it can append all of those tables into just one big table.

You can learn more about what lists, tables and other structured values are inside of Power Query and the M language from here.
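To make that concrete, here's a minimal sketch you can paste into a blank query. The two inline tables are made up for illustration, not the files from this series:

let
    // Two small tables with the same structure, built inline with #table
    TableA = #table({"Month", "Sales"}, {{"Jan", 100}, {"Feb", 150}}),
    TableB = #table({"Month", "Sales"}, {{"Mar", 200}}),
    // Table.Combine takes a list of tables and appends them into one
    Combined = Table.Combine({TableA, TableB})
in
    Combined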

Let me show you what this new pattern looks like, but please take into consideration that while the end goal is the same every time, the journey to it might be different depending on where you're starting from and what "checkpoints" you need to make.

Step 1: Connecting to the Folder

Similar to what we’ve seen before, we’ll be just connecting to the folder:


You can download the files that I’m using by clicking on the button below:

Download Files

Just connect to the folder where your data is being stored and you’ll get this window:
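In M terms, that first step is just a single call to Folder.Files. The D:\Sales path here matches the final code at the end of this post; swap it for wherever your files live:

= Folder.Files("D:\Sales")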


We now click the Edit button and head over to the Power Query Editor. Remember, we're working under the same assumptions and notes as in the previous post (post 2).

Up until now, we've basically done exactly the same thing as before, but things are about to change:

Step 2: Buffering the Binaries

Since we're working with files, it's usually a good idea to buffer them (save them in memory) so that reading and combining them is faster. For that, I'll delete all of the columns except the Content column and then right-click that column to do a drill-down operation:

This operation that I’m doing will transform this column into a list. You can learn more about this process and the whole Navigation approach from this article.

After that, I'll go to the formula bar and edit the formula so it looks like this:

= List.Transform(#"Removed Other Columns"[Content], Binary.Buffer)

In essence, I've applied a function to each item in that list. That function is called Binary.Buffer and it caches the file itself in memory, so things should run faster.

Step 3: Create a Custom Function

You've probably heard me use the words "recipe" and "custom function" interchangeably, as the whole idea of a custom function is that you're creating a recipe for how to "cook" your data, with some "ingredients" (parameters) as the input.

In this case we're going to create our own custom function, which looks like this:

(myFile as binary) => Table.PromoteHeaders( Csv.Document(myFile, [Encoding=1252]) )

I highly recommend that you read my series on Parameters on Functions (here) to find out more on how a custom function could be created.

The only thing this function does is interpret the binary as a CSV file (using the Csv.Document function) and then promote the values from the first row of that file to be the column headers.
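One way to set this up (a sketch, assuming you paste the expression into a blank query and name it fxRead, which is the name we'll invoke in the next step) is the following expanded version of the same one-liner:

// A blank query named fxRead containing the custom function
(myFile as binary) =>
let
    // Interpret the binary as a CSV file; 1252 is the Windows-1252 (ANSI) code page
    Csv = Csv.Document(myFile, [Encoding = 1252]),
    // Promote the values from the first row to be the column headers
    PromotedHeaders = Table.PromoteHeaders(Csv)
in
    PromotedHeaders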

Step 4: Invoking the Custom Function

With your custom function created, you can now invoke it against our primary query where we buffered the binaries.

You’ll end up creating a custom step (by clicking the fx button on the formula bar) and then adding this piece of code:

= List.Transform(Content, fxRead)

and you'll notice that the list now has Table values instead of Binaries.

Now we end up with a list of tables, which is exactly what the Table.Combine function needs, so that's the function we'll use next:

Step 5: Using the Table.Combine function

Similar to what we did in the previous step, we create a custom step, and this time we wrap the previous step in the Table.Combine function and voilà:
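That custom step is a single line. Custom1 here is the name of the previous step that holds the list of tables, matching the final code below:

= Table.Combine(Custom1)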

All of the data is now combined into just one table, and you can see how clean and straightforward this approach is.

I highly encourage you to test this approach against the Combine Files experience so you can time the difference for yourself. It is really significant when you're dealing with large volumes of data.

Here’s what the final code looks like:

let
    Source = Folder.Files("D:\Sales"),
    #"Removed Other Columns" = Table.SelectColumns(Source, {"Content"}),
    Content = List.Transform(#"Removed Other Columns"[Content], Binary.Buffer),
    Custom1 = List.Transform(Content, fxRead),
    Custom2 = Table.Combine(Custom1)
in
    Custom2

Final words

I focused on using files from a folder, but this approach also works with any other data source: tables from a database, tables from a web service, or any other data source of your choice. The end goal is to create a list of tables using whatever method you see fit and then use the Table.Combine function.
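For example, here's a hypothetical sketch of the same pattern applied to database tables. The server, database, and table names are placeholders I made up, not something from this post:

let
    Source = Sql.Database("myServer", "myDatabase"),
    // Build a list of tables using whatever navigation you need
    Tables = {
        Source{[Schema = "dbo", Item = "Sales2019"]}[Data],
        Source{[Schema = "dbo", Item = "Sales2020"]}[Data]
    },
    // Same ending as before: append the list of tables into one
    Combined = Table.Combine(Tables)
in
    Combined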

Share your stories with me! Have you tried optimizing your Combine Files query? How fast is it now with this new approach?

Comments

  1. This was an inspiring post. I figured out a way to reuse this pattern when dealing with multiple SharePoint lists that all have the same structure, replacing Binary.Buffer with Table.Buffer 👍

    Content = List.Transform(#"Removed Other Columns"[Content], Table.Buffer),

  2. Hi Miguel, I combine a lot of identical table objects from different xlsx files, and the issue I face is performance… It takes ages in my case to append the 60 files… Any suggestions to improve performance when combining table objects that are in xlsx files stored in the same SharePoint library? (I saw your technique above works fine by creating a function to "transform" and then combine CSV files, using buffer to improve performance.)
