What are Merge / Join Operations?

  • A merge operation creates a new query from two existing queries. One query result contains all columns from a primary table, with one column serving as a single column containing a navigation link to a related table. The related table contains all rows that match each row from a primary table based on a common column value.

    This is one of the best features and tools for Data Preparation that you can possibly have, but trying to select from one of the 6 options available can be confusing or even daunting at first.

    A lot of questions arise when you try to jump into these join kinds like:

    • When should I use which join?
    • Why should I use that join?
    • How can I use it?

    This reference has been created from a practical standpoint to show you when, why and how to use each of those Join Kinds when doing a Merge operation inside of Power Query / Power BI.

Join Kinds

Left Outer Join

Left Outer Join

The Left Outer Join is commonly used as a sort of a VLOOKUP, where you bring records from a reference table into another table, but it could also be used in the other way around depending on whether we want to use a Expand or an Aggregation operation after the Merge operation. It brings all of the records from the first (or top) table and only the records from the second (or bottom) table that match the records from the first (or top) table.

Merge Window:

Left Outer Join

We are using the ProductID column from both tables as the JOIN column. Our sales table might have duplicate values on this column, but we know for a fact that the Products table won’t have duplicates. What we’re looking here is to simply have the name of the Product inside our Sales table as a new column.

Tables Used:

  • Sales Table: this is our Top or First table and it has daily sales data, but it’s missing the Product Name.
  • Products Table: this is our Bottom or Second table and it has unique values. This table holds the names and product IDs of every single product in our inventory.

Output Table:

Left Outer Join Output
After performing the Left Outer Join and the Expand operation, we get the ProductName from our Products Table right inside our Sales Table, similar to what you’d expect from a VLOOKUP.

Right Outer Join

Right Outer Join

One of the most powerful and smart ways to apply what I like to call ‘Clever Filters’. The whole idea is that the second (or bottom) table is the one that has the control as to what values or rows will be displayed on the output table, hence the label reads ‘all from second, matching from first’

Merge Window:

Left Outer Join

We are using 2 columns to create the JOIN: The Region and the Category Code from both tables. What we’re looking for with this example is to, in a way, filter the rows from the ExportedData table using the values from the SelectionTable to only show the records that match the rows from the SelectionTable.

Tables Used:

  • ExportedData: this is our Top or First table and it has data that was directly exported from our Global SAP system.
  • SelectionTable: this is our Bottom or Second table and it has the combination of values that we want to take in consideration from the ExportedData table. If it’s not listed here, then we don’t want to load those rows from the ExportedData

Output Table:

Left Outer Join Output
After performing the Right Outer Join, we don’t necessarily need to Expand the newly created SelectionTable column. We can simply remove it if we want to and just keep the merged data that we were initially interested in.

Left Anti Join

Left Anti Join

If you want to filter out or exclude rows from a table, then this is the Join Kind that will suit you best. It brings all of the records from the first (or top) table, except for the matching rows from the second (or bottom) table.

Merge Window:

Left Outer Join

We are using 2 columns to create the JOIN: The Region and the Category Code from both tables. What we’re looking for with this example is to, in a way, exclude specific rows from the ExportedData table using the values from the ExclusionTable to only show the records that DO NOT match the rows from the ExclusionTable.

Tables Used:

  • ExportedData: this is our Top or First table and it has data that was directly exported from our Global SAP system.
  • ExclusionTable: this is our Bottom or Second table and it has the combination of values that we DO NOT want to take in consideration from the ExportedData table. If it’s listed here, then we don’t want to load those rows from the ExportedData

Output Table:

Left Outer Join Output
After performing the Left Anti Join, we don’t necessarily need to Expand the newly created ExclusionTable column. We can simply remove it if we want to and just keep the merged data that we were initially interested in.

Right Anti Join

Right Anti Join

You can use this JOIN to compare the same table in multiple points in a timeline to see what are the differences between them, or similar to how you use the Left Anti Join, but by switching the position of the tables from first to second and viceversa.

Merge Window:

Left Outer Join

Both tables have the same format / layout, but one has the latest data (NewTable) whilst the other one (OriginalTable) was last updated 2 weeks before the NewTable and we would like to see what are the new Tasks that were added to the NewTable. We can use the Task Column from both tables to make the JOIN.

Tables Used:

  • OriginalTable: this is our Top or First table and it has the tasks that are assigned to a specific owner. It was last updated on the year 2018.
  • NewTable: this is our Bottom or Second table and it it essentially the OriginalTable, but it was last updated on January of 2019.

Output Table:

Left Outer Join Output
After performing the Right Anti Join, we get a table with only the Tasks that appear in the NewTable and were not present in the OriginalTable. (I had to expand the values from the column NewTable in order to get these columns)

Inner Join

Inner Join

The whole idea of the Inner Join is that you only want to find the matches from both tables, that is, the rows that exist in both tables. This means that if a value exists in one table, but not in the other, then it won’t show in the output of your merge operation.

Merge Window:

Left Outer Join

We are using the Account column from both tables as the JOIN column. The main goal with this merge is that we only want the accounts (customers) that had sales in both OnlineSales and the TeleSales table

Tables Used:

  • OnlineSales: this is our Top or First table and it has the accounts that had sales through our online store and the amount that they spent.
  • TeleSales: this is our Bottom or Second table and, contrary to the OnlineSales table, this has the accounts that had sales through any other channel that is not Online.

Output Table:

Left Outer Join Output
After performing the Inner Join and the Expand operation, we get the TeleSales from our TeleSales Table right inside our Sales Table, and only the Accounts that had sales in both the Online and TeleSales.

Full Outer Join

Full Outer Join

With the Full Outer Join we want all rows from both tables. This means that if a value exists in one table, but not in the other, then it’ll still show in the output of your merge operation under a new row with null values.

Merge Window:

Left Outer Join

We are using the Account column from both tables as the JOIN column. The main goal with this merge is that we want ALL the accounts (customers) that had sales in either the OnlineSales or the TeleSales table, and get a column with the amount of sales made on Online and another one for the amount for TeleSales.

Tables Used:

  • OnlineSales: this is our Top or First table and it has the accounts that had sales through our online store and the amount that they spent.
  • TeleSales: this is our Bottom or Second table and, contrary to the OnlineSales table, this has the accounts that had sales through any other channel that is not Online.

Output Table:

Left Outer Join Output
After performing the Full Outer Join, the Expand operation and some other Data Preparation steps, we get ALL of the accounts from both tables and a column for the amount of the TeleSales and another column for the OnlineSales amount. If the Account didn’t have sales in either channel, then it’ll show as null on that specific column.