Tag Archives: power query

  • Grouping rows with Power BI / Power Query

    image

    Since its origins, Power Query / Power BI has had a feature called Group By. You can find it on both the Home tab and the Transform ribbon, under the following icon:

    image

    It’s not a very descriptive icon. It doesn’t tell you much, other than that something depends on something else (via that line).

    What does Group by do? When should I use Group by?

    In short, the Group By operation inside Power BI / Power Query does 2 things:

    1. Summarize your data – you get your table summarized by only the columns that you select. This is great if you’re trying to get rid of duplicates or to check where you have duplicates.
    2. Provide aggregated or non-aggregated data – new columns that return aggregations such as the sum, max, min or average of a column and, in some cases, columns that don’t aggregate anything and simply return the grouped rows as a table.

    You should use Group By any time you need to group rows from a table based on the values in one or more of their fields.
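
    For readers who like to see things in the Power Query formula language, here’s a minimal sketch of both behaviours. The sample table and its column names (CustomerKey, Customer, Amount) are made up for illustration and are not taken from the sample file; Table.Group is the function behind the Group By window.

    ```powerquery
    let
        // Hypothetical sample rows; the columns are illustrative only
        Source = #table(
            type table [CustomerKey = Int64.Type, Customer = text, Amount = number],
            {
                {1, "Contoso", 100},
                {1, "Contoso", 250},
                {2, "Fabrikam", 75}
            }
        ),
        // One output row per distinct CustomerKey / Customer combination
        Grouped = Table.Group(
            Source,
            {"CustomerKey", "Customer"},
            {
                // Aggregated column: the sum of Amount within each group
                {"Total Amount", each List.Sum([Amount]), type number},
                // Non-aggregated column: the grouped rows themselves, kept as a nested table
                {"All Rows", each _, type table}
            }
        )
    in
        Grouped
    ```

    The first new column is an aggregation, while the second one corresponds to the “All Rows” operation in the Group By window and keeps each group’s rows as a table.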

    Let’s go straight into a real-world example of when you might want to use this Group By feature and what it brings to the table.

    Be sure to click the following button to download the sample file, which also includes the solutions.

    Download sample file

    1. Summarize Data

    Original Dataset: We have data that looks more like a report with all of the fields rather than something that we would use inside a Power BI / Power Pivot Data Model.

    image

    Goal: Normalize our dataset and create a Customers Dimension Table for our Power BI Data Model. We would have a fact table with only the customer key and another table with all the fields for customers.

    image

    How to group rows with Power BI / Power Query for this?

    Here’s the step by step of what we need to do:

    1. Head over to sheet “1” or, if you’re using Power BI Desktop, connect to the table within the sheet named “1” in the sample workbook.
    2. Name this Query “Original”
    3. Reference the “Original” Query twice and name one of those references “Dim_Customers” and the other one “Fact_Sales”
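
    If you’re curious about what a reference looks like in the formula language, it’s a short one. This is a sketch of what I’d expect in the Advanced Editor for either of the two new queries, assuming the first query is named “Original” as in step 2:

    ```powerquery
    // Query: Dim_Customers (Fact_Sales starts out identical)
    let
        // A reference simply points at the output of another query
        Source = Original
    in
        Source
    ```

    Any change made later to “Original” flows through to both references, which is why we reference instead of duplicating.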

    Now that we have these 3 queries, the whole goal is to only load the “Dim_Customers” and the “Fact_Sales” to our Data Model.

    In a more technical sense, we are dealing with what is called a denormalized table, and we need to normalize it (reduce the redundancy of data) by moving most of those fields to a new table and keeping only 1 field that will act as the “key” for our customers. I called that field “CustomerKey” to make this example easier to follow, but in the real world it might be called something else.

    Creating a Dimension table for Customers

    Let’s work on that “Dim_Customers” query. In the original table you’ll see that I marked some columns in yellow. I did this because all of those fields refer to a single “object” or “element”, and that is the customer.

    Click on the Group By icon and then in the Group By window select the Advanced option. Then for the Group by fields select CustomerKey, Customer, Category, Group, Primary Contact as shown in the next picture:

    image

    You can leave the rest as default.

    The result will be a summarized table with no duplicates for our customer fields and a new column called “Count” which we can just remove. After removing that “Count” column, you’ll end up with your table exactly as you need it:

    image
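
    For reference, here’s a sketch of how the whole “Dim_Customers” query could look in the Advanced Editor after those two steps. The step names are my own, and the exact code that the UI generates may differ slightly (for example, in the type given to the “Count” column):

    ```powerquery
    let
        Source = Original,
        // Group by every customer field; the default "Count Rows" operation adds a Count column
        Grouped = Table.Group(
            Source,
            {"CustomerKey", "Customer", "Category", "Group", "Primary Contact"},
            {{"Count", each Table.RowCount(_), Int64.Type}}
        ),
        // We only wanted the unique combinations, so the Count column can go
        RemovedCount = Table.RemoveColumns(Grouped, {"Count"})
    in
        RemovedCount
    ```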

    Normalizing our Fact Table

    Our goal with this query is super simple. Let’s delete all of the fields that have anything to do with the newly created dimension table for customers, but keep the CustomerKey field so we can create the relationship between tables.

    In a more visual way, let’s delete the fields highlighted in red in the picture below:

    image

    Simply select those fields in red (Customer, Category, Group, and Primary Contact), then right-click any one of those columns and select the option that reads “Remove Columns”:

    image

    The result of that operation will give you a table that looks like this:

    image

    With that, you have your Fact_Sales table ready to be loaded to your Data Model.
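
    In the formula language, that single step boils down to one call to Table.RemoveColumns. Here’s a sketch of the full “Fact_Sales” query, again assuming the query names used in this post (the step name is my own):

    ```powerquery
    let
        Source = Original,
        // Drop the descriptive customer fields; CustomerKey stays as the link to Dim_Customers
        RemovedColumns = Table.RemoveColumns(
            Source,
            {"Customer", "Category", "Group", "Primary Contact"}
        )
    in
        RemovedColumns
    ```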

    Building our Data Model and creating the report

    If you’re in Power BI Desktop, you can select your queries from the “Queries” pane and make sure that only Fact_Sales and Dim_Customers load to your Data Model. Inside Power Query for Excel, you first need to load your queries as “connection only” and then load them to your Data Model.

    The main key here is that you need both of those tables / queries that we just created in your Data Model and then inside of it you can create a relationship between those 2 tables using the CustomerKey field from both tables. You can simply drag one field from one table to the field of the other table using the Diagram view and the app will create the relationship for you. The end result will look like this:

    image

    With that out of the way, you can focus on just creating your report. In my case, I ended up creating this report inside of Excel, which is basically a top 10 of customers by order total for each Customer Group.

    image

    Takeaways

    The main takeaway here is that this principle can be used for any Dimension or any type of Normalization scenario that you can think of.

    There is another valid way of doing this: simply keep the columns that you need and then remove the duplicates from those columns. Again, that’s completely valid, and it’s a matter of preference at that point.
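
    Here’s a minimal sketch of that alternative for the customers table, assuming the same “Original” query and column names used earlier in this post (the step names are my own):

    ```powerquery
    let
        Source = Original,
        // Keep only the customer-related columns
        KeptColumns = Table.SelectColumns(
            Source,
            {"CustomerKey", "Customer", "Category", "Group", "Primary Contact"}
        ),
        // Remove duplicate rows across all of the remaining columns
        Deduplicated = Table.Distinct(KeptColumns)
    in
        Deduplicated
    ```

    This is the equivalent of using Remove Other Columns followed by Remove Duplicates in the UI, and it skips the extra “Count” column that Group By adds.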

  • Logical Operators and Nested IFs in Power BI / Power Query

    image

    In the previous post I showed you guys how to create a conditional column in Power BI / Power Query using the UI and then just using the Power Query Formula language.

    In this post we’ll go over the available conditional operators and how to do Nested IFs in Power BI / Power Query.

  • Conditional Logic: IF statement for Conditional Columns

    image

    If you come from Excel, you’ve probably seen or heard about the IF statement and its newer sister, IFERROR.

    I remember the first time that I saw a conditional chain like the picture below:

    It looked WAY better as a diagram than as an Excel formula, nevertheless – it worked just fine inside of Excel.

    The question is… how do conditionals work in Power BI / Power Query? Do we have an IF function? Maybe an IFERROR? THIS is the blog post where I’ll cover this topic.

  • Power BI 101 for an Excel User: Read this before you use Power BI

    image

    For the better part of the last 2 years, I’ve spent most of my time working “in the field”, getting to know each and every user persona of Power Pivot, Power Query and Power BI in general.

    This is one of the reasons why I didn’t post that much during the 2016-2018 period. I did my own in-depth research to better understand the user personas, what their pain points are and how to better reach them with techniques and patterns that are applicable to them.

    I’ve learned a lot from these people. One of the main situations that most new Power BI users coming from Excel face is that they try to tackle their scenarios the same way they would inside of Excel. That usually prevents them from taking full advantage of what Power BI has to offer, and at times it makes them waste way too much time in their initial steps because of preconceived ideas.

    In this post I’d like to talk about the main scenario that I see, which has to do with overusing DAX for any situation that you can think of. I’ll cover why this happens and start the conversation on how you can avoid placing yourself or any of your colleagues in this position, so you can take full advantage of what Power BI has to offer, which goes beyond just DAX.

  • Combine or Append Data: Optimal Combination Pattern

    image

    This is going to be the last post in the series on Combine or Append Data.

    In the first post we saw the basics of how to do the Append operation through the UI.

    In the second post we saw the Combine Files experience with Flat Files and how easy it is to combine as many files as you want.

    In the third post we had a contrast of the Combine Files experience using Excel Workbooks instead of simple flat files and what things we needed to consider this time that we didn’t consider with simple flat files.

    In this fourth and last post we’ll be going back to the basics using the function that we discovered in the first post – Table.Combine which is the most optimal function for combining / appending data.
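
    As a quick reminder of what that function looks like, here’s a minimal sketch. The two small tables are made up for illustration; in practice they would be your own queries:

    ```powerquery
    let
        // Hypothetical monthly tables with the same columns
        January = #table({"Month", "Amount"}, {{"Jan", 100}}),
        February = #table({"Month", "Amount"}, {{"Feb", 150}}),
        // Table.Combine takes a list of tables and stacks their rows
        Combined = Table.Combine({January, February})
    in
        Combined
    ```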

  • Combine or Append Data: Combining Excel Files

    image

    In the previous post we saw how we were able to combine multiple files from a Folder.

    In that post we were using flat files, but how would that process work for Excel files?

    This is the post where we’re going to see the difference between simple flat files and more complex files (like an Excel workbook) when it comes to using the Combine Files experience inside of Power BI / Power Query.

  • Combine or Append Data: Combining Flat Files

    Combine Flat Files from Folder

    In the previous blog post, I went through the basic concepts behind the Append operation found in Power Query for Excel and Power BI.

    In that post, we only used 2 files and it was pretty straightforward to simply click the Append queries button to combine both queries like so:

    A more complex scenario

    But what happens when you have multiple files? Let’s say 12 files, 1 for each month of the year.

  • Combine or Append Data in Power BI / Power Query: Main Concepts

    image

    I’ve previously done a series on Merge / JOIN operations (First Part here) and it’s now time to do one on Combine / Append operations.

    So… how do you combine / append / stack tables with Power BI / Power Query?

    There are multiple ways to accomplish this, but we’re going to start with the basics.

  • Connecting to Files in SharePoint & OneDrive with Power BI / Power Query

    image

    I’ve been trying to join multiple Facebook communities that revolve around Power BI topics.

    I was able to join a couple of communities that are completely neutral, in the sense that they’re not run by a for-profit company but rather by community members, which makes things easier as there’s little chance of a conflict of interest with the admins of the group.

    One of these groups is called “Power BI Latinoamerica”, a community that is primarily Spanish-speaking, and within that group one of the admins posted a video that caught my attention:

    It’s basically a video that showcases a way to connect to an Excel file hosted on OneDrive. That method is completely valid, but when I tried to point the author of that video to one of my articles about connecting to files hosted on SharePoint and OneDrive, I realized that I had never formally written about that topic on my blog… ever.

    Disclaimer: I’ve created multiple videos about this for some of my online courses, so you might’ve seen this method before if you’ve followed any of the courses where I participate.

    It’s time to change that! Let’s find out the easiest and most optimal way to connect to ANY file hosted on OneDrive or SharePoint.