This is the last post in the series! I highly encourage you to read Part 5 of this series before reading this one, but nevertheless, you can jump right in if you know the basics of Merge / Join Operations inside of Power BI / Power Query.
We will be using the same sample data that we used in Part 5, but this time we’ll have a completely new goal which is probably one of the most frequent ones that I’ve had when Modelling data for Power BI.
In this, Part 6, we’ll go over the Full Outer Join from a purely practical standpoint.
We’ll be using the same 2 tables that we used for Part 5:
- OnlineSales – we sell some of our products using an online store hosted on Shopify and this is the export that Shopify gives us
- TeleSales – we also have a different channel that we call TeleSales, which is for everything that happens outside of the Online Channel
This time we need a table with ALL of the accounts (customers) that had sales in EITHER OnlineSales and / or the TeleSales table.
The result that we’re looking for should look like this:
Again, this is different to Part 5 in the sense that we don’t want just the customers that had sales in both channels, but rather ALL of the accounts and then a column for their sales Online and another for their sales amount in the TeleSales Column.
The following is a step by step process on how we can accomplish our desired result with nothing but a few clicks. No code needed – just the use of the Full Outer Join inside of Power Query / Power BI and a few other native functionality inside of the tool.
Loading the Data
You can follow along by downloading the sample file from the button below:
The first thing that we need to do is load both of those tables to Power Query / Power BI. To do that, we simply go to Data > Get & Transform Data > Get Data > From File > From Workbook:
then we find the file that we downloaded (with the name of Sample Data.xlsx). We then select the option to “Select multiple items” and we select both the OnlineSales and TeleSales tables:
and then we click on ‘Transform Data’ or ‘Edit’ depending on what version of Power Query / Power BI you might have.
That should land both of your tables inside the Power Query editor as shown below:
Doing the Full Outer Join
At this stage, we’re ready to click on the ‘Merge Queries’ button that you see in the top middle of the previous image. Be sure to select, from the dropdown, the option to merge queries as new.
Now we get a new Window to set the Merge operation and in here is completely up to you which you want to pick as the first or second one. In my specific case, I chose to go with Online Sales as the first table and TeleSales as the second one.
For the columns that you need to select on each table, just select the Account column on both and then choose the Full Outer from the Join Kind dropdown as shown below:
The label for the Join Kind reads “all rows from both”, but the note underneath gives you a number of matches of 13. I know that this might find confusing, but don’t really pay that much attention to that note about the number of matches, since this specific join will provide the rows from both tables regardless.
After hitting OK, you’ll go back to the Power Query Editor and you should be able to see it like this:
Expanding the Newly Created Column
This is Key! go ahead and expand the newly created column as shown in the next picture:
and now your table should look like this:
which has some scrambled data, but we can use Power Query’s native functionality to shape the data however we want.
Doing a Conditional Column to define the Account
You’ll notice that we have 2 Account columns. One that we had for the data from the OnlineSales and the other one that came from the expand operation of the TeleSales Column / Data.
Instead of having 2 Account columns, we want to have just 1 and for that we can go ahead and create a conditional column with a simple logic.
To create a conditional column you’ll go to the Add Column Ribbon and select the button for Conditional Column.
This will pop up the Conditional Column window where you can define that conditional logic for the new column which should look like this:
In short, what this does is that it looks for the Account column and if it finds a null value (basically an empty value), it’ll try to use the value from the TeleSales.Account column.
The result of that operation will look like this:
There’s a few columns that we truly don’t need anymore. Let’s get rid of:
and then let’s move the AccountRevised column to be the first column just for aesthetics. That should look like this:
Just to make sure that everything looks great and is easy to understand, let’s change the names of some columns:
- AccountRevised should be Account
- Amount should be OnlineSales.Amount
and last but not least, let’s define the data type of that Account column to be an integer.
The result of that looks like this:
Look at both of the tables in detail. There were some accounts that only had sales in online and some other accounts that only had sales in TeleSales, but we have a full view of every single account and the numbers for each sales channel – this is what its called an integrated view and we didn’t have to use a single line of code.
This is one of the most powerful tools that I’ve been able to find inside of Power Query. Especially when you have multiple systems that are not integrated to each other, so for reporting purposes you need to tie them somehow and Power Query / Power BI get’s the job done.
Excel specialist turned into BI specialist using the latest tools from Microsoft for BI – Power BI. He is the co-author of ‘M is for Data Monkey’, blogger and also Youtuber of powerful Excel video Tricks.