Lazy Evaluation & Query Folding in Power BI / Power Query

How do things work in the world? Humans are curious beings.

We want to know how things work and perhaps harness them for our own purposes.

In this case, we’re curious about how the Evaluation Model works for Power BI & Power Query when it comes to executing your queries and getting your data.

You can read the official documentation from the Microsoft team here, but in this post I’ll cover the basics, so you can have a quick overview of how things work with Power Query / Power BI and how you can optimize your queries to better harness their engine.

How do Queries get evaluated in Power BI / Power Query?

The evaluation process is not  visible to end users. Power BI / Power Query does an amazing job of providing what appears to be a seamless experience to the end-user where the end user can perform all the data transformations that he wants against pretty much any data source.

Same User Interface. Same Functions. Same output format. All through just 1 tool.

When you click on any of the buttons inside the Power Query Editor, the tool will automatically create a piece of code for you. This piece of code is called a step, and you can even see, to your right, a timeline of all the steps that you performed.

 

You can see this piece of code in the “Advanced Editor” and also in the Formula bar.

(Example of Applied Steps by transforming a numeric column using a round operation)

The code that you see there was generated through your clicks and the language that it uses is called M.

I highly recommend that you read this article before moving forward.

Evaluation Model in Practice

Imagine this situation, you have a table inside of a database that you want to connect to. That table holds over 30 Million records, but you only want a subset of those and only a subset of the columns as well. I’m using the AdventureWorks database in case you want to follow along.

How can you make that happen with Power Query without downloading all 30 Million rows?

You’d start by creating the query following these steps:

  • Connecting to the database – it’s a SQL Server Database so we select that option

  • Selecting the table – the name of the table is SalesOrderHeader, and it has 30 Million rows of sales data
  • Doing a Filter on the table –  we do a filter on the OrderDate column to only have the values of the current year

  • Removing some columns – we only want to keep a subset of the columns from the original table. The ones that we want to keep are shown below

Up to this point, we just used the mouse and we never had to use the keyboard other than to input the server and credential details.

After only using the UI, here’s the final code that Power Query created for us:

let
     Source = Sql.Database("azuredb.powerqueryworkshop.com", "adventureworks2012"),
     Sales_SalesOrderHeader = Source{[Schema="Sales",Item="SalesOrderHeader"]}[Data],
     #"Removed Other Columns" = Table.SelectColumns(Sales_SalesOrderHeader,{"SalesOrderID", "OrderDate", "AccountNumber", "CustomerID", "SalesPersonID", "TerritoryID", "TaxAmt", "Freight", "TotalDue"})
in
     #"Removed Other Columns"

  • Facebook
  • Twitter
  • LinkedIn

Let’s pay close attention to the first step of that query and notice the column that reads [Data]:

in that step, you’ll see ALL of the tables, views and functions of that specific database and in that column (Data) you can see that it holds that specific table, view or function data.

I clicked on the whitespace inside the highlighted cell to get a preview of that value in the pane below. That’s a preview of the Table inside that cell.

Wait…so you’re saying that in the very first step Power Query loads ALL of the data from ALL of the objects inside that database?

That sounds scary! but no… that’s not what happens. Take a look at the second step (Navigation):

See? we’ve navigated to a specific table of that database.

This is confusing. Why does Power Query load ALL of the data and then drills down or “navigates” to ONLY that specific table?

Well, that’s not really what happens either.

You’re in front of the beauty of the Evaluation Model. Those other tables that we didn’t select will not get evaluated. Only the one that we navigated to will be evaluated.

In the Power Query formula language or M language, some values get lazily evaluated, which means that they might not get evaluated at all in your query and that’s the situation that happens with that specific query.

Power Query doesn’t evaluate ALL of the tables, views and functions of that database, it only evaluates the SalesOrderHeader table – the rest are lazily evaluated

Note: You can read this full article in the event that you’re interested in learning more about the “Navigation” step.

What is Lazy Evaluation?

In short, lazy evaluation means that the evaluation of a specific value is deferred until its results are needed by other computations.

This means that some values won’t be evaluated at all – Power Query might show pointers in the Preview window, but that doesn’t mean that they’re evaluated / calculated.

Whenever you see a column with colored values like List, Table or even Records, take in consideration that those values might be lazily evaluated.

(these are usually pointers)

That’s the main reason why when you try to see the preview of one of those cells, it might take a pretty long time to evaluate that preview – because it wasn’t there before and it needs to be evaluated.

Going back to our previous example, these values from the Data Column are shown in that specific preview as pointers to a possible result, but that result won’t be evaluated until you tell Power Query that you need to evaluate it.

This doesn’t only happen with List, Table or Records values. Wherever Power Query can create a pointer, it will use it.

That’s right – even Data Source functions can be considered lazily evaluated at times.

Let me give you an example of a situation that a friend from Europe had a few months ago and how Lazy Evaluation came into play.

Lazy Evaluation in Data Source functions

So my friend had this piece of code:

try Folder.Files(“C:\MyNonExistingfolder”) otherwise Folder.Files(“C:\users”)

In essence, what he wanted to accomplish was a simple conditional. He implemented a basic IFERROR where, if the initial folder didn’t exist, it’ll fall back to a different one.

The try function is essentially just another way of doing an IFERROR. The first argument is the valid value, and the 2nd one is the alternate result.

Now, I purposely wrote the code this way because the folder “C:\MyNonExistingfolder “ well…it doesn’t exist, BUT why are we getting that folder instead of the other one that actually exists? (“C:\users”)

The problem? Lazy evaluation.

If you try doing just:

try Folder.Files(“C:\MyNonExistingfolder”)}

you’ll get this result:

which basically says that there’s no error. Even when the Folder doesn’t even exist.  This happens because there’s a lazy evaluation and the actual folder hasn’t been evaluated whatsoever.

What do we need to do? Force the evaluation!

How do we force an evaluation? Using any of the Buffer functions

For our case, we’ll be using the Table.Buffer function to force the evaluation of our code

and voilá! we now get the result that we were looking for. And this is how you force the evaluation of a value.

What is Query Folding?

Now we head over to one of the coolest things about this Evaluation Model: Query Folding.

In short, query folding is the process where the M code gets translated into the native query language of a data source.

This happens automatically. You don’t have to do anything whatsoever. Power Query will try to do its best job to optimize everything that you’ve done and “fold it”.

There are some interesting articles out on the web right now that have covered this topic before:

What’s different about this post than the rest? we take a purely practical approach.

Query Folding in Practice

Let’s use the same query that we previously saw and elaborate on that.

Here’s the situation, our database has MILLIONS of records, but we only want a subset of them and that’s why we did the “Filtered Rows” step and then we removed some columns.

How can Power Query tell my database to only retrieve only that subset of data?  Query Folding is the answer!

If you go step by step, you’ll think that Power Query does the following:

  1. Connect to the database
  2. Does a lazy evaluation of some tables until we connect to a specific table (SalesOrderHeader)
  3. Downloads a copy of that table to my local hard drive and then proceeds to perform transformations such as Filtering and Removing rows locally

Well, you’ll be wrong.

Instead, what happens is that Power Query takes all of your steps and translates them into a single Native Query for your database so your server/database only receives the command to send us the output that we’re looking for WITHOUT ever performing any transformations locally.

HOW DOES POWER QUERY DO THAT? It’s pretty much magic. Nobody outside Microsoft really knows how that works, but you don’t have to do anything to make it work. THIS is Query Folding in a nutshell.

How can I tell if Query Folding is happening?

There are 2 things that need to happen in order for Query Folding to work:

  1. Your Data Source must have its own Native Query Language. (Examples: Databases and OData)
  2. Your steps should be compatible with Query Folding

The first one is easier to determine than the last. If you’re connecting to an Excel, CSV, TXT, JSON or other type of file directly, you won’t be able to take advantage of Query Folding because they don’t handle their own query language.

In contrast, a SQL Database, a MySql, Oracle, Postre or any other type of database does have their own query language and Power Query will be able to translate your steps into a native query language for that data source.

The second one is usually a mystery. Microsoft doesn’t tell us what is foldable and what it’s not, mainly because it relies on a number of factors and variables, but there are 2 methods to tell if query folding is happening:

  1. Using a monitoring tool for your data source – to tell what type of command you’re receiving from Power BI / Power Query
  2. Through Power Query – there’s a couple of ways to see if Power Query was able to come up with a specific Native Query for your source

I’m going to be focusing on the last one, as the first one does rely on your own data source. For example, for SQL Server you can use SQL Server profiler, for OData you could even use a tool like Fiddler that serves as a proxy to figure out what requests are being sent.

Method 1: Right click the Step

(SQL Statement created by Power Query and pushed to the data source)

This is the easiest way to make it work, you can just right click one of the step of your query and it the View Native Query button is not greyed out, then you can be certain that Query Folding is being applied.

What if the button is greyed out?  at that point we can’t rule out that Query Folding is happening. At times, Power Query can perform the native query without ever showing it.

Method 2: Use the Value.Metadata function

This is a more advanced way to tell if your query has in fact Query Folding, but it’s still relatively simple.

All you need to do is go to the formula bar and wrap your step code with the function Value.Metadata.

What I usually do is just click on the fx button in the formula bar to create a custom step and then wrap that around the Value.Metadata like this:

The result of that will give you a record and if that record has the “Query Folding” field. You can navigate through that record and see what’s inside:

This is just another way to see if there’s a Native Query at that step or not.

What do you think about Lazy Evaluation and Query Folding after reading this article?

Let me know your thoughts in the comments below!

Leave a Reply

avatar
  Subscribe  
Notify of