Data Types, Data Conversion and Ascribed Data Types in Power Query and Power BI

image

As we saw in a previous post, there are many things that we need to consider when dealing with Excel files since even the extension of an Excel file can dramatically impact your solution and how Power Query interprets the contents of that file.

In a more broader sense, we also need to take in consideration 3 key elements that apply to every single data that lands inside Power Query :

  • Data Types – every field/column can have a specific Data Type associated to it
    • Data Type Conversion – every field/column can have its members (rows) converted to another Data Type as well as the Data Type of its field/column defined to a different Data Type.
    • Ascribing Data Types – instead of performing a conversion for every member of a field/column, we can simply define that the column should be considered of a specific Data Type without doing any conversion. This is considered ascribing a data type.

You might’ve heard about Data Type Conversion, as it’s what Power Query automatically does when you click on the “Data Type” dropdown and select a data type for a column, but you can also do what it’s called Ascribing a Data Type, where you can define a new Data Type to a specific field/column without doing a conversion process. This post will showcase how to do this and what are the benefits of doing this.

A Data Type Conversion Cheatsheet

Before we move forward, I want to share something with you.

In most database systems and programming languages you can find a simple cheatsheet of the data types available and what are the possible conversions between those data types.

I couldn’t find one for Power Query’s M Language so I created my own that you can use however you want (please use it as is without modifications). It uses all of the possible data types that you find from the data type dropdown.

image

I sincerely hope that you find it useful and you can leave any feedback/suggestion in the comments section below.

Click in the button below so you can download the samples used in this article:

Download Sample File

Ascribing a Data Type – Theory

Imagine that you have a query (Original Data) like the following:

image

As you can see, none of the columns has a Data Type defined. We can change that by simply selecting each one through the UI and that’ll give us a table that looks like this now:

image

Every single column has a defined data type, except for the personal information column because the UI doesn’t let us.

How can we make sure that this Personal Information column gets the correct data type? well, for that, we’re going to create a new query (FinalTypes) using the following code:

let
     Query1 = type table [  
       Name = text, 
       Hire Date = date, 
       Level = number,  
       Salary = number,
       Personal Information = record
   ]
in
     Query1

That query will look like this:

image

This query is essentially a type that defines what data type each column has inside a table. Again, this is not a table per se, but actually the data types of the columns of a table.

The values that you see as “number”, “text”, “date” and such are types of Primitive Types. Here’s a list of all the types that you can find in the M language:

  • type null, which classifies the null value
  • type logical, which classifies the values true and false
  • type number, which classifies number values
  • type time, which classifies time values
  • type date, which classifies date values
  • type datetime, which classifies datetime values
  • type datetimezone, which classifies datetimezone values
  • type duration, which classifies duration values
  • type text, which classifies text values
  • type binary, which classifies binary values
  • type type, which classifies type values.
  • type list, which classifies list values
  • type record, which classifies record values
  • type table, which classifies table values
  • type function, which classifies function values
  • type anynonnull, which classifies all values excluding null

Our next step is to basically do something similar to a “Format Painter”, but in this case we’re going to be doing a “Data Type Painter”. To do this, we’ll create a new query and we’ll use a function called Value.ReplaceType which requires a value and then a type. The code of that query will look like the following:

let
     Custom1 = Value.ReplaceType(#"Original Data", FinalTypes)
in
     Custom1

The result of that will look like this:

image

What fundamentally happens is that the Value.ReplaceType replaces the Data Types of the original table with the values that we defined in the FinalTypes query. Basically like doing a Format Painter, and is such a close comparison that it really looks like a format painter, because when using the Value.ReplaceType the function will not take in consideration the name of the columns but rather the ordinal position of them, so you need to make sure that the table that you’re trying to apply this “Data Type Painter” has exactly the same amount of columns as your ‘type table’ and that the columns have exactly the same position and name – if the name changes then it’ll be replaced with the name of your ‘type table’.

Ascribing a Data Type – Practical Example

Imagine that you have a query that holds Product Performance on a Monthly basis for the last 7 months. You then create a new column that holds the average of all of those months and you get something like this:

image

Let’s just say that you want to replace all the null values from the monthly columns with the values that we see on the [Average] column.

Replacing values with values from another column

To do that, we use a replace values operation on the monthly columns. So we select those columns, right click and select replace values. We need to find null values but we can’t choose another column for the ‘Replace With’ parameter, so we use a placeholder instead. In our case that placeholder will be the number 0:

image

We then tweak the code from the formula and replace that placeholder 0 with the code each [Average] and that’ll look like this:

image

Notice what happened. All null values have been replaced with the values from the [Average] column. That’s AWESOME! but there’s some bad news…we just completely lost all the data types of all the columns of our table.

How can we fix that? Ascribing the data type!

Ascribing Data Types to the rescue

Let’s go ahead and click on the fx button that we see next to the formula bar so we can create a custom step. In this new step we’ll use the following formula:

= Value.ReplaceType(#"Replaced Value", Value.Type(Source))

image

Notice that we’re also using another function by the name of Value.Type, and what this function does is simply grab any value and output its data type. We provided the value that we had in the Source step and that gave us a type table with all the data type information for each of its columns.

Something to take in consideration is that the columns [1], [2], [3], and even column [7] have the icons of Integer values, but they have some rows with decimal numbers. This is because we ascribed the data types and never performed a data type conversion.

This was done on purpose so you can see what ascribing a data type does and what it doesn’t do. You can fix this by simply making sure that the [Average] column had the correct data type prior to doing the Replace values operation and using that step for the Value.Type function inside the Value.ReplaceType function.

To learn more cool techniques and more advanced scenarios using Power Query and the M language, go ahead and enroll to the FREE trial of the Power Query Academy.

Join the FREE trial of the Power Query Academy