Before you can visualize your data, you have to collect it, clean it, and validate it.
Google Data Studio is a beautiful visualization tool that turns your data into compelling story-telling reports.
But before you can visualize your data, you have to collect it, clean it, and validate it.
This is where Keboola comes in.
Keboola is the Data Stack as a Service (DaaS) platform that helps you with all your data operations - from building and automating ETL pipelines to data governance. Many companies use Keboola to extract data from its raw sources, clean it, save the cleaned data, and load it to a data visualization tool such as Google Data Studio.
Keboola uses components called Data sources and Data destinations to automate the data pipeline work.
But among its 200+ components, there is no Data Studio Extractor. So how can you send data from Keboola to Google’s Data Studio?
We recommend two methods:
To follow along with this tutorial, you need to:
This method will export data in the form of a CSV file from Keboola to your local computer and then from your local computer to Data Studio. There’s also an option to connect data from Google Sheets directly, however, there is a limited amount of rows.
After you have extracted, cleaned, and saved your data, go to Storage and select (click) the bucket where the data you want to export resides. In the example below, we saved the data to the bucket called “cleaned_wines”:
Once inside the bucket, hover over the table you want to export and click on the three dots (...) in the right corner, then click “Export”.
Keboola will prepare the data for exporting. When ready, click on the “Download” button in the popup window to save the CSV file to your local computer.
The file will download to your computer’s default download destination in a .gz compressed format. Click on the file to extract the CSV from the compressed .gz file (not shown here).
Head to your Google’s Data Studio account and click Create > Data source.
Google’s Data Studio will offer you a multitude of options. Use the search bar to find the CSV connector called “File Upload” and click on it.
The wizard will drive you through authorizing the “File Upload” connector (1st image), and uploading data to Google’s Data Studio (2nd image):
Once the upload is finished, click on “Connect” to save the data to Google’s Data Studio.
Your data is now safely loaded into Google Data Studio and you can start building reports that will mesmerize your audience.
This method assumes you have a running instance of BigQuery data warehouse on Google Cloud Platform (GCP).
You will build a data pipeline that will write data from Keboola > BigQuery > Data Studio.
A service account will give programmatic access to Keboola, so Keboola can write (send) data to GCP’s BigQuery.
To set up a service account:
Add the BigQuery Data Editor and BigQuery Job User roles in the step “Grant this service account access to the project”. Click Continue.
In the last step leave everything as is and click Done. You will be redirected to the list of your service accounts for your project, as pictured below. Click on the Service Account email to enter the detailed view.
In the detailed view, click on the tab Key > Add key > Create new key.
Select JSON for “Key type” and Create the new key.
The JSON key will download automatically to your computer. You have finished setting up the programmatic access permissions, now you can configure the BigQuery writer in Keboola, to start sending information from Keboola to BigQuery.
Keboola’s BigQuery Writer is a component that allows you to automatically write (send) data from Keboola to BigQuery.
Keep in mind that BigQuery is just one of many different data storages you can use to build your pipelines. If you are using Amazon Redshift, MySQL, PostgreSQL, or other databases and data warehouses, you could build pipelines with the same guiding principles. Just pick the appropriate Keboola Writer and send the data to the data storage you <3 the most.
To configure it, log into Keboola and:
You will be redirected to the “My Google BigQuery Data Destination” configuration.
Start the configuration by adding the service account key we created in the previous step. Click on Set service account key.
A pop-up will open. Simply copy-and-paste your JSON account key created in the previous step on Google Cloud Platform into the window and click Submit.
Once saved, your configuration will update to indicate you added the Google Service Account Key:
Next, create the BigQuery Dataset where data from Keboola will be written to. In case the dataset does not already exist in BigQuery, the service account will create it. We named the dataset bq_red_wines (snake_case with underscores) and clicked Saved.
Next, we add the data we would like to export from Keboola to BigQuery. In the last configuration section:
Keboola will offer you the option to adjust the data loading specifications:
Finally, export the data to BigQuery by clicking on Run Component in the right-hand menu.
The data has been successfully sent from Keboola to BigQuery:
Head to your Google’s Data Studio account and click Create > Data source.
Google’s Data Studio will offer you a multitude of options. Use the search bar to find the connector called “BigQuery” and click on it.
If you use the same account for Data Studio as you do for Google Cloud Platform, Data Studio will automatically connect to BigQuery. Otherwise, go through the authorization setup.
Next, pick the data source you would like to import: My Projects > Project Id (name of the project in GCP, for example, Red Wines Library) > Dataset (e.g., bq_red_wines) > Table (e.g. cleaned_red_wines) > click Connect (upper right corner).
Your data is now safely loaded into Google Data Studio and you can start building reports that will mesmerize your audience.
Both CSV and BigQuery allow you to export your data from Keboola to Data Studio in a simple and fast way. But which one should you choose?
CSV exports and imports are easier and faster to configure, work better for smaller data sets, but cannot be automated. For each data update at the source, you will have to re-do the export via CSV to keep data fresh. Alternatively you could push your data to Google Spreadsheets instead of CSVs - this method can be fully automated (detailed tutorial here).
On the other hand, the pipeline with BigQuery takes a bit longer to configure, but can easily be automated end-to-end, and can scale to large amounts of data.
Ultimately, the choice will depend on your data needs: small data with fast configuration vs big data with (slightly) higher upfront investment.