Learn about the 3 options for Power BI data preparation and which one offers the best trade-offs
Microsoft Power BI is a fantastic tool for data visualization and business intelligence.
But as any data analyst can attest, before you can build metrics and dashboards, you need to spend more than 1/3 of your time preparing and cleaning the dataset for your Power BI instance.
Luckily, there is a better way to cut down on this time-consuming task.
In this article you will learn:
The goal of data preparation for Power BI is to build a common data model. A unified single source of truth that centralizes dispersed datasets from different data sources, systematizes data types across datasets, and links data across tables so it can be joined when building metrics and visualizations.
There are three possible ways of building the common data model:
What is it?
Power Query is a data transformation and data preparation engine that lets do ETL operations inside Power BI Desktop.
How do you prepare data with Power Query?
Business analysts use Power Query to get data from different data sources (e.g. Excel and CSV files, SQL server, Azure data lake, etc.), clean data with data transformations inside the Power Query editor, and finally load data into Power BI Desktop, where they use DAX to build business metrics.
The main challenge of using Power Query
The main challenge of this approach is reusability. ETL pipelines made with Power Query bypass the company-wide data engineering ETL processes and make it hard to reuse such a pipeline across workspaces.
This is why Microsoft launched Dataflows.
What are Dataflows?
Dataflows are a fairly recent Power BI service that tries to make data preparation for Power BI more self-service than Power Query. In their essence, Dataflows are a collection of tables that are created and managed in workspaces in the Power BI service.
How do you use Dataflows to prepare data?
Dataflows uses a similar UI to Power Query, where you click in the UI to select data sources, define the column value data types and headers, link tables together to create a schema, and finally save the data flow into a common dataflow repository inside Azure.
Then, in the Power BI Desktop, you choose a dataflow as a data source and pull the data into your workspace to work on it, build aggregations and metrics, and generally transform data into metrics.
What is the main challenge of Dataflows?
The main challenge of Power BI Dataflows is that they are quite limited in the data sources they offer to the user building an ETL pipeline with them.
Instead of relying on Power BI services to prepare the data, you can create entire datasets with devoted ETL/ELT tools.
What are ETL/ELT tools?
ETL tools are software that uses automation to simplify each step of the extract, transform, and load data pipeline.
How to ETL/ELT tools work?
ETL/ELT tools offer connectors that automatically collect data from raw data sources, and other connectors that load the data into data storage (such as SQL server, Azure data lake, or even less-Microsoft focused technologies, such as Google BigQuery, Snowflake, Amazon Redshift, and the like).
For transformations (data cleaning, aggregation, and the like), some ETL tools offer code blocks that you can use to script your transformations, or no-code features that achieve the same but through a click-and-point or drag-and-drop user interface.
What is the main advantage of ETL tools?
Unlike Power BI services, devoted ETL tools give you ownership over your data pipelines and can be used without the need to rely on Power BI.
Without a shred of doubt, the best option is to use a devoted ETL tool.
Why?
This sounds good in theory, so let’s see in practice how simple it is to create an ETL data pipeline that can be picked up by Power BI.
Keboola is the data platform as a service that helps you build ETL and ELT data pipelines with low-code features for data engineers and no-code features for domain experts who don’t know how to code.
Let’s see how simple it is to prepare data for Power BI via Keboola’s Visual Flow Builder.
Log into your Keboola account to get started.
Don’t have a Keboola account yet? No worries! Keboola offers an always-free (no credit card required) tier that allows you to follow along. Create an account here.
Once in Keboola, select from the top menu Flows > Flows > New Flow:
Give your new flow a name and an easy-to-understand description. Then click create flow.
This will open up the Flow Builder view, where you can view and build your data flow.
Start by selecting the components to collect your data from your data sources. The general process is simple:
1. Select the right component to extract/transform/load data from/to your specified data source.
2. Authorize Keboola to access data in that component.
3. Extract/transform/load data.
We will show you an end-to-end example of setting up ready-made components by walking you through reading data files from Google Drive.
We start by clicking on Select Components > Google Drive to collect a file from our Google Drive folder.
We proceed by clicking on the Google Drive component> show details cog > Select or Create Configuration to add a new configuration. The configuration grants access to Keboola so it can read the data files on your Google Drive:
Name the configuration (e.g., “Google Drive Config”) and click on Create (shown below) and the Configure (not shown):
The wizard guides you through Google’s authorization to give Keboola access to our Google Drive (not shown here, but super easy to follow).
Continue adding other ready-made components and organize them in a logical flow or select the ones you want to run in parallel using the drag-and-drop function.
For each flow:
1. Select one or multiple ready-made components.
2. Organize them into a logical flow using the drag-and-drop functionality. 3. Configure each component by providing the credentials and instructions for what or where to extract or write or for what code to execute in a transformation.
4. Save the changes.
Once you’ve built your flow end-to-end simply click on Run flow, to set the data pipeline into action.
Within the Flow Builder, click on Set schedule under Automate to set up the orchestration that will automatically run your job at your preferred recurring time.
That’s the power of automation for you - set it and automate it.
In the tab “All runs” you can check the run history of your flow with its detailed breakdown of each task.
Your data is ready to be imported into the Power BI Desktop app, where you can visualize dashboards and build metrics with DAX.
Keboola empowers every user.
Whether you’re a data engineer who needs low-code features to construct your data pipeline with Python or a machine learning data scientist who wants to quickly build a dataset with no-code features, Keboola offers the Visual Flow Builder and no-code transformations to self-service your needs.
With 250+ connectors that extract and load the data out-of-the-box, Keboola can help you cut the time for data prep and liberate resources for data analysis in Power BI.
Did we mention it’s free?
Keboola offers an always free tier (no credit card required) so you can test and develop ETL data pipelines for your Power BI instance without breaking the piggy bank.