Power BI / Power Query and Excel files–XLSX vs XLS formats and datatypes

This is a topic that has been previously covered here by my good friend Ken Puls, but it was just recently that I found this response from Ehren about how Power Query interprets xls files differently to xlsx files and it is certainly worth a blog post to cover this in more detail.

I highly encourage you to read this post if you use Power Query inside Excel or Power BI against Excel files either in *.xls or *.xlsx

The Data inside the files

You can download both files and the final query by clicking in the following button so you can follow along:

Download Files

Both files have exactly the same data. The only difference is that one is saved as an XLS file and the other one as an XLSX file.

What data do we have inside the files? well, we have 3 columns:

Column Date

Has dates, but they are formatted using the ‘dd-MMM-YY’ cell format as shown in the next image:

Column Code

Has numeric values, but they are formatted as text using the ‘00000’ cell format as shown in the next image:

Column Value

Has decimal numbers, but they are formatted as currency with no decimal places as shown in the next image:

Formatting cells is a common practice with Excel files. You format the value of a cell just to make it easier for the end user to read such values, but underneath that formatting mask you have the real value.

How Power Query interprets the Data inside different Excel file extensions

Here’s where it’s important to denote the distinction of how Power Query behaves when connecting to different Excel file extensions.

Take a look at different your data looks when it lands on Power Query depending on the file extension:

As we know from Ehren’s remarks, Power Query uses a different mechanism to get the data out of both of these file extensions.

but how different are these mechanisms? Well, in simple terms, when you connect to an *.xls file you’ll be importing the “masked values” or basically the “formatted cell” and not the actual value of a cell. With an *.xlsx file is the complete opposite, you get the values of each cell and the formatting is not taken in consideration.

Here’s a quick summary table comparing how Power Query interprets the data of both of the files that we have.

Power Query Interpretation
Column Original
Value
xls
file
xlsx
file
Date Floating
point (Date)
Text Date
Code Integer Text Integer
Value Decimal
Number
Text Decimal
Number

General Suggestions

Depending on what you need, you could convert your file from an xlsx file to an xls or the other way around, but if, for example, you need to fix your data somehow, you can always use Power Query’s Conversion operations to change the data type of a column from a text to numeric, text to date and such, but this won’t get rid of the situations where you might be missing data because of the formatting, for example, like in our case where we format the numeric values to not have decimal places, so when we connect to an xls file we completely lose the decimals. In those cases it’s better to either remove the formatting or change the extension to an xlsx.

There are far more benefits in using the modern *.xlsx file extensions (like using Tables), so I encourage you to try and use that extension whereas possible. If it’s not possible, then perhaps your system could output a flat file instead (csv or even txt) which would be easier to handle.

To learn more cool techniques and more advanced scenarios using Power Query and the M language, go ahead and enroll to the FREE trial of the Power Query Academy.

Join the FREE trial of the Power Query Academy

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.