Power BI

  • New Artificial Intelligence Features in Power BI / Power Query

    Within the past few years we’ve seen Power BI / Power Query get new features that use Artificial Intelligence at their core. This blog post is an overview of all of these features, what experience they try to provide and what common problems they try to solve. Column from Example in Power BI / […]

  • Step level error in Power BI / Power Query

    In a previous blog post (here) I covered the types of errors that you might most likely find when using Power BI / Power Query. In this Conditional Logic series, I’ve covered the try keyword in Power Query and how to use basically an IFERROR in your own code, but this was mainly done to […]

  • Data Visualizations in our daily lives: Tables – Part 2

    Last week I started this series where I cover how we perceive Data Visualizations in our daily lives. If you haven’t read Part 1 yet, I highly encourage you to start reading it now here. I started this series with how videogames are most commonly using data visualizations. I even created a report for one […]

  • Data Visualizations in our daily lives – Part 1

    For the better part of 2019, the focus of this blog has been Data Preparation articles for Power BI / Power Query. Usually, the average data analyst spends most of his / her time on this task – making sure that the data is in the shape / form that the computer requires it to […]

  • Join the Spanish Power BI forum

    During the month of January this year I launched the first Power BI forum entirely in Spanish. I have tried to promote this initiative and to date we have around 240 members which is great. I am publishing this article because I believe we can reach many more Spanish-speaking people who are seeking help from […]

  • Grouping rows with Power BI / Power Query

    Since its origins, Power Query / Power BI has had a feature called Group By. You can find it on the Home tab and on the Transform tab of the ribbon, under the following icon:

    image

    It is not a very descriptive icon. It doesn’t tell you much other than that something depends on something else (via that line).

    What does Group By do? When should I use Group By?

    In short, the Group By Operation inside Power BI / Power Query tries to do 2 things:

    1. Summarize your Data – you get your table summarized by only the columns that you select. This is great if you’re trying to get rid of duplicates or to check where you have duplicates.
    2. Provide Aggregated or Non-aggregated Data – you get new columns that provide aggregations such as the sum, max, min or average of a column and, in some cases, other columns that do no aggregation at all and only return the grouped rows as a table.

    You should use the Group By functionality any time you need to group rows from a table based on the values that they have in one or more of their fields.
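
    To make those 2 behaviors concrete, here is a minimal sketch in plain Python. It is not what Power Query runs internally, only an illustration of the idea, and the sample rows and the column names “Customer” and “Amount” are made up for this example.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"Customer": "A", "Amount": 10},
    {"Customer": "B", "Amount": 5},
    {"Customer": "A", "Amount": 7},
]

def group_by(rows, keys):
    """Collect rows into buckets keyed by the values of the `keys` columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return groups

groups = group_by(rows, ["Customer"])

# 1. Summarize: one output row per distinct combination of key values.
summary = [dict(zip(["Customer"], key)) for key in groups]

# 2. Aggregate, or keep the grouped rows themselves as a nested table.
aggregated = [
    {
        "Customer": key[0],
        "Total": sum(r["Amount"] for r in grouped),  # an aggregation
        "AllRows": grouped,                          # no aggregation, just the rows
    }
    for key, grouped in groups.items()
]
```

    The “AllRows” entry mirrors the case where Group By does no aggregation and simply hands back the rows of each group as a table.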

    Let’s go straight into real-world examples of when you might want to use this Group By feature and what it brings to the table.

    Be sure to click on the following button to download the sample file, which also includes the solutions.

    Download sample file

    1. Summarize Data

    Original Dataset: We have data that looks more like a report with all of the fields rather than something that we would use inside a Power BI / PowerPivot Data Model.

    image

    Goal: Normalize our dataset and create a Customers Dimension Table for our Power BI Data Model. We would have a fact table with only the customer key and another table with all the fields for customers.

    image

    How to group rows with Power BI / Power Query for this?

    Here’s the step by step of what we need to do:

    1. Head over to sheet “1” or, if you’re using Power BI Desktop, connect to the table within the sheet named “1” from the sample workbook.
    2. Name this Query “Original”
    3. Reference the “Original” Query twice and name one of those references “Dim_Customers” and the other one “Fact_Sales”

    Now that we have these 3 queries, the whole goal is to only load the “Dim_Customers” and the “Fact_Sales” to our Data Model.

    In a more technical sense, we are dealing with what is called a denormalized table, and we need to normalize it (reduce the redundancy of data) by moving most of those fields to a new table and keeping only 1 field that will act as the “key” for our customers. I happen to call that field “CustomerKey” to make this example easier, but in the real world it might be called something else.

    Creating a Dimension table for Customers

    Let’s work on that “Dim_Customers” query. In the original table you’ll see that I marked some columns with a yellow color. I did this because all of those fields refer to a single “object” or “element”: the customer.

    Click on the Group By icon and then in the Group By window select the Advanced option. Then for the Group by fields select CustomerKey, Customer, Category, Group, Primary Contact as shown in the next picture:

    image

    You can leave the rest as default.

    The result will be a summarized table with no duplicates for our customer fields and a new column called “Count” which we can just remove. After removing that “Count” column, you’ll end up with your table exactly as you need it:

    image
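
    If it helps to see the logic outside of the user interface, the 2 steps above (group by the customer fields, then remove “Count”) can be sketched in plain Python. This is only an illustration under assumptions: the rows are invented, and “OrderTotal” is a hypothetical name for a non-customer column.

```python
from collections import Counter

CUSTOMER_FIELDS = ["CustomerKey", "Customer", "Category", "Group", "Primary Contact"]

# Hypothetical rows standing in for the "Original" query; all values are made up.
customer_a = {"CustomerKey": 1, "Customer": "Customer A", "Category": "Retail",
              "Group": "East", "Primary Contact": "Ana"}
customer_b = {"CustomerKey": 2, "Customer": "Customer B", "Category": "Wholesale",
              "Group": "West", "Primary Contact": "Luis"}
original = [
    {**customer_a, "OrderTotal": 100},
    {**customer_a, "OrderTotal": 50},
    {**customer_b, "OrderTotal": 75},
]

def build_dim_customers(rows):
    # Step 1: group by the customer fields with a row count per group,
    # which is what the default "Count" aggregation gives you.
    counts = Counter(tuple(row[f] for f in CUSTOMER_FIELDS) for row in rows)
    grouped = [dict(zip(CUSTOMER_FIELDS, key), Count=n) for key, n in counts.items()]
    # Step 2: remove the "Count" column, keeping one row per customer.
    return [{f: row[f] for f in CUSTOMER_FIELDS} for row in grouped]

dim_customers = build_dim_customers(original)
```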

    Normalizing our Fact Table

    Our goal with this query is super simple. Let’s delete all of the fields that have anything to do with the newly created dimension table for customers, but keep the CustomerKey field so we can create the relationship between tables.

    In a more visual way, let’s delete the fields highlighted in red in the picture below:

    image

    Simply select those fields in red (Customer, Category, Group, and Primary Contact), then right-click any one of those columns and select the option that reads “Remove Columns”:

    image

    The result of that operation will give you a table that looks like this:

    image

    And with that, you have your Fact_Sales table ready to be loaded to your Data Model.
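
    As a rough sketch of that “Remove Columns” step, here is the same idea in plain Python. The rows and the “OrderTotal” column are hypothetical stand-ins for whatever non-customer fields your table has.

```python
# Customer fields that now live in Dim_Customers; CustomerKey is kept on purpose.
CUSTOMER_ONLY_FIELDS = ["Customer", "Category", "Group", "Primary Contact"]

# Hypothetical rows standing in for the "Original" query; all values are made up.
original = [
    {"CustomerKey": 1, "Customer": "Customer A", "Category": "Retail",
     "Group": "East", "Primary Contact": "Ana", "OrderTotal": 100},
    {"CustomerKey": 2, "Customer": "Customer B", "Category": "Wholesale",
     "Group": "West", "Primary Contact": "Luis", "OrderTotal": 75},
]

def remove_columns(rows, columns):
    """Return the rows without the given columns; the row count is unchanged."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

fact_sales = remove_columns(original, CUSTOMER_ONLY_FIELDS)
```

    Note that, unlike the dimension table, the fact table keeps every row: only columns are removed here, never rows.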

    Building our Data Model and creating the report

    If you’re in Power BI Desktop, you can select your queries from the “Queries” pane and make sure that only Fact_Sales and Dim_Customers load to your Data Model. Inside Power Query for Excel, you first need to load your queries as “connection only” and then load them to your Data Model.

    The key here is that you need both of the tables / queries that we just created in your Data Model, and then inside of it you can create a relationship between those 2 tables using the CustomerKey field from both tables. You can simply drag the field from one table onto the same field of the other table using the Diagram view and the app will create the relationship for you. The end result will look like this:

    image
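
    Conceptually, a relationship like this lets every row of the fact table look up its customer through CustomerKey, which only works if each key appears once on the dimension side. Here is a minimal sketch of that idea in plain Python; it is an analogy rather than how the Data Model engine works, and the rows and the “OrderTotal” column are hypothetical.

```python
# Hypothetical tables; all values are made up.
dim_customers = [
    {"CustomerKey": 1, "Customer": "Customer A", "Group": "East"},
    {"CustomerKey": 2, "Customer": "Customer B", "Group": "West"},
]
fact_sales = [
    {"CustomerKey": 1, "OrderTotal": 100},
    {"CustomerKey": 1, "OrderTotal": 50},
    {"CustomerKey": 2, "OrderTotal": 75},
]

def build_lookup(dim_rows, key):
    """Index the dimension table by its key, rejecting duplicate keys."""
    lookup = {}
    for row in dim_rows:
        if row[key] in lookup:
            raise ValueError(f"duplicate {key}: {row[key]!r}")
        lookup[row[key]] = row
    return lookup

customers_by_key = build_lookup(dim_customers, "CustomerKey")

# Use the relationship: total the sales by a field that lives in the dimension.
totals_by_group = {}
for sale in fact_sales:
    group = customers_by_key[sale["CustomerKey"]]["Group"]
    totals_by_group[group] = totals_by_group.get(group, 0) + sale["OrderTotal"]
```

    The duplicate check is the reason the dimension table had to be grouped first: with repeated customer rows, the lookup would be ambiguous.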

    With that out of the way, you can focus on just creating your report. In my case, I ended up creating this report inside of Excel, which is basically the top 10 customers by order total from each Customer Group:

    image
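
    The report itself was built in Excel, but the calculation behind a “top N customers by order total per group” view can be sketched in plain Python. The function name, the rows and the “OrderTotal” column are assumptions made for this illustration.

```python
import heapq
from collections import defaultdict

def top_customers_per_group(rows, n=10):
    """Return, for each group, the n customers with the largest summed order total."""
    # Sum the order totals per (group, customer) pair.
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Group"], row["Customer"])] += row["OrderTotal"]
    # Collect the per-customer totals under their group.
    by_group = defaultdict(list)
    for (group, customer), total in totals.items():
        by_group[group].append((customer, total))
    # Keep only the n largest totals in each group, largest first.
    return {
        group: heapq.nlargest(n, items, key=lambda item: item[1])
        for group, items in by_group.items()
    }

# Hypothetical rows; all values are made up.
sales = [
    {"Group": "East", "Customer": "Customer A", "OrderTotal": 100},
    {"Group": "East", "Customer": "Customer A", "OrderTotal": 50},
    {"Group": "East", "Customer": "Customer B", "OrderTotal": 120},
    {"Group": "West", "Customer": "Customer C", "OrderTotal": 30},
]
top_one = top_customers_per_group(sales, n=1)
```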

    Takeaways

    The main takeaway here is that this principle can be used for any Dimension or any type of Normalization scenario that you can think of.

    There is another valid way of doing this: simply keep the columns that you need and then remove the duplicates from those columns. Again, this is completely valid, and it’s a matter of preference at that point.
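
    For comparison, that alternative (keep only the customer columns, then remove duplicates) can be sketched in plain Python as well. The rows are invented and “OrderTotal” is a hypothetical non-customer column.

```python
CUSTOMER_FIELDS = ["CustomerKey", "Customer", "Category", "Group", "Primary Contact"]

# Hypothetical rows standing in for the "Original" query; all values are made up.
original = [
    {"CustomerKey": 1, "Customer": "Customer A", "Category": "Retail",
     "Group": "East", "Primary Contact": "Ana", "OrderTotal": 100},
    {"CustomerKey": 1, "Customer": "Customer A", "Category": "Retail",
     "Group": "East", "Primary Contact": "Ana", "OrderTotal": 50},
    {"CustomerKey": 2, "Customer": "Customer B", "Category": "Wholesale",
     "Group": "West", "Primary Contact": "Luis", "OrderTotal": 75},
]

def keep_columns(rows, columns):
    """Keep only the listed columns, in the listed order."""
    return [{c: row[c] for c in columns} for row in rows]

def remove_duplicates(rows):
    """Drop repeated rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())  # rows share one column order, so this is comparable
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

dim_customers = remove_duplicates(keep_columns(original, CUSTOMER_FIELDS))
```

    Both routes end with one row per customer; this one just never creates the “Count” column in the first place.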