Within the past few years we’ve seen Power BI / Power Query get new features that are using Artificial Intelligence at its core. This blog post is an overview of all of these features, what experience they try to provide and what common problems they try to solve. Column from Example in Power BI / […]
I’ve just finished moving from HubSpot to ActiveCampaign. While HubSpot seems to be an amazing platform for anyone into data analytics, it simply wasn’t cost-effective for my business, not to mention that some of its features seem really limited to the point where I wasn’t even using them. The migration from HubSpot to ActiveCampaign wasn’t […]
In a previous blog post (here) I covered the types of errors that you might most likely find when using Power BI / Power Query. In this Conditional Logic series, I’ve covered the try keyword in Power Query and how to use basically an IFERROR in your own code, but this was mainly done to […]
What time is it right now for you? We might share the same time zone, but that is usually not the case with worldwide operations. If I say, let’s meet tomorrow at 8am. Will that be your 8am? Or will that be my 8am? I feel like I should’ve posted this blog post a long […]
For quite a few years, I’ve been receiving one question that I haven’t fully delved into. It goes something like this: I manually add new columns to the output of my query (created with Power Query in Excel), and when I hit the refresh button, I lose everything. Why does this happen? How can I […]
One of the services that we provide here at Powered Solutions is the creation of Power BI Custom Connectors (see here). For the past few months or so I’ve been getting more and more questions related to what type of ‘Get Data’ experience we can have inside of Power BI / Power Query. More often […]
Last week I started this series where I cover how we perceive Data Visualizations in our daily lives. If you haven’t read Part 1 yet, I highly encourage you to start reading it now here. I started this series with how videogames are most commonly using data visualizations. I even created a report for one […]
For the better part of 2019, the focus of this blog has been Data Preparation articles for Power BI / Power Query. Usually, the average data analyst spends most of his / her time on this task – making sure that the data is in the shape / form that the computer requires it to […]
During the month of January this year I launched the first Power BI forum entirely in Spanish. I have tried to promote this initiative and to date we have around 240 members which is great. I am publishing this article because I believe we can reach many more Spanish-speaking people who are seeking help from […]
Microsoft released these features almost a year ago and in the last couple of months it has received some love with some updates. I’ve noticed that most people don’t really use these features, mainly because they’re disabled by default, so I wanted to create this blog post to cover the why and when to use […]
Since it’s origins, Power Query / Power BI has had this feature called Group By and you can see it under the main menu and the Transform ribbon under the following icon:
is not a really descriptive icon. It doesn’t give you that much information other than something is dependent with something else (via that line).
What does Group by do? When should I use Group by?
In short, the Group By Operation inside Power BI / Power Query tries to do 2 things:
- Summarize your Data – you get your table summarized by only the columns that you select. This is amazing if you’re trying to get rid of duplicates or to check where you have duplicates.
- Provide Aggregations or Non aggregated Data – imagine these new columns that will provide aggregations such as the sum, max, min, average of a column and in some cases other columns that will not do any aggregation and will only the grouped rows as a table
You should use the Group By functionality any time that you need to do anything that has to do with grouping rows from a table based on the values that they have in their field/s.
Let’s go straight into real-world examples of when you might want to use this Group by feature and what it brings to the table
Be sure to click on the following button in order to download the sample file with also the solutions.Download sample file
1. Summarize Data
Original Dataset: We have data that looks more like a report with all of the fields rather than something that we would use inside a Power BI / PowerPivot Data Model.
Goal: Normalize our dataset and create a Customers Dimension Table for our Power BI Data Model. We would have a fact table with only the customer key and another table with all the fields for customers.
How to group rows with Power BI / Power Query for this ?
Here’s the step by step of what we need to do:
- Head over to the sheet 1 or, if using Power BI Desktop, connect to the table within the sheetname “1” from the sample workbook.
- Name this Query “Original”
- Reference the “Original” Query twice and name one of those references “Dim_Customers” and the other one “Fact_Sales”
Now that we have these 3 queries, the whole goal is to only load the “Dim_Customers” and the “Fact_Sales” to our Data Model.
In a more technical sense, we are dealing with what it’s called a denormalized table and we need to normalize it (reduce the redundancy of data) by basically moving most of those fields to a new table and only keeping 1 field that will act as the “key” for our customers. I just so happen to call that field “CustomerKey” to make it easier for this example, but in the real world it might be called something else.
Creating a Dimension table for Customers
Let’s work on that “Dim_Customers” query. In the original table you’ll see that I marked some columns with a yellow color. I did this because all of those fields are all referring to a single “object” or “element” and that is the customer.
Click on the Group By icon and then in the Group By window select the Advanced option. Then for the Group by fields select CustomerKey, Customer, Category, Group, Primary Contact as shown in the next picture:
The rest you can leave it as default.
The result will be a summarized table with no duplicates for our customer fields and a new column called “Count” which we can just remove. After removing that “Count” column, you’ll end up with your table exactly as you need it:
Normalizing our Fact Table
Our goal with this query is super simple. Let’s delete all of the fields that have anything to do with the newly created dimension table for customers, but keep the CustomerKey field so we can create the relationship between tables.
In a more visual way, let’s delete the fields highlighted in red in the picture below:
You simply select those fields in red (Customer, Category, Group, and Primary Contact) and then do a right click on either one of those columns and select the option that reads “Remove Columns”:
The result of that operation will give you a table that looks like this:
and with that you have your Fact_Sales table ready to be loaded to your Data Model.
Building our Data Model and creating the report
if you’re in Power BI Desktop, you can select your queries from the “Queries” pane and make sure that only the Fact_Sales and Dim_Customers load to your Data Model, but inside of Power Query for Excel you need to first load your queries as “connection only” and then load them to your Data Model.
The main key here is that you need both of those tables / queries that we just created in your Data Model and then inside of it you can create a relationship between those 2 tables using the CustomerKey field from both tables. You can simply drag one field from one table to the field of the other table using the Diagram view and the app will create the relationship for you. The end result will look like this:
With that out of the way, you can focus on just creating your report. In my case, I ended up creating this report inside of Excel which is basically a top 10 customers by order total from each Customer Group
The main takeaway here is that this principle can be used for any Dimension or any type of Normalization scenario that you can think of.
There is another valid way of doing this and that is by simply keeping the columns that you need and then remove the duplicates from those columns. Again, completely valid but its a matter of preference at that point.
Excel specialist turned into BI specialist using the latest tools from Microsoft for BI – Power BI. He is the co-author of ‘M is for Data Monkey’, blogger and also Youtuber of powerful Excel video Tricks.
He has been recognized as a Microsoft Most Valuable Professional (MVP), is a Microsoft Certified Professional (MCP – MCSA: BI Reporting), a Microsoft Certified Trainer (MCT), and is one of the international pioneers in Power Pivot, Power Query and Power BI.
One of my most popular posts is one about recurring dates with offset in Power Query (url). It has thousands of hits, but one of the key follow up questions that people have after reading that is “How do I fill dates between dates?” This is the post where I’ll cover that very same topic […]