Comparing dbt vs Airflow across 5 main points.
Apache Airflow and dbt are both popular contenders for the best workflow management tool in the Modern Data Stack.
However, when it comes to picking the best tool, the differences are not trivial, and the use cases for each tool are quite different.
In this article, we’ll compare dbt vs Airflow on multiple points:
Apache Airflow and dbt are both open-source frameworks that empower data teams to build data pipelines. But in fact, they build different types of data pipelines, have different target users, and offer distinct advantages and disadvantages.
Of course, this article compares dbt and Airflow on their crucial differences and shows how each suits different data use cases. But keep in mind there is a third option, Keboola, that can achieve the best results of both tools at a fraction of the cost.
Apache Airflow is a Python-based open-source framework that allows data teams to schedule and automate workflows with DAGs (Directed Acyclic Graphs). Data teams use Airflow for a myriad of use cases: from building ETL data pipelines to launching machine learning apps.
How does Apache Airflow work?
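To make the DAG idea concrete without requiring an Airflow installation, here is a minimal sketch using only Python's standard library. It is not Airflow's API: the task names and the `run_order` helper are hypothetical, and `graphlib` stands in for the scheduler's dependency resolution. It shows only the core idea that a task runs after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_report": {"transform_join"},
}

def run_order(dag):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())

order = run_order(dag)
print(order)
```

In a real Airflow project the same shape is declared with operators and dependencies between them, and the scheduler decides when each task runs.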
Airflow offers many advantages:
However, Airflow also has many disadvantages:
Read more: What are the best alternatives to Airflow?
dbt is a SQL-based data engineering framework maintained by the team at dbt Labs. You can use dbt as part of dbt Cloud (vendor solution) or dbt Core (open-source solution).
dbt empowers data engineers and data analysts to transform data in the warehouse (e.g., Snowflake) through SQL code.
How does dbt work?
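At its core, a dbt model is a SELECT statement that dbt materializes in the warehouse as a table or view. The sketch below imitates that idea with Python's built-in `sqlite3` module standing in for the warehouse. The table, the model SQL, and the `materialize_view` helper are all hypothetical, and dbt itself is not used here.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 400, "refunded"), (3, 9900, "paid")],
)

# Hypothetical model: a plain SELECT, as an analyst would write it.
MODEL_SQL = """
SELECT id, amount_cents / 100.0 AS amount
FROM raw_orders
WHERE status = 'paid'
"""

def materialize_view(conn, name, select_sql):
    """Wrap a SELECT in CREATE VIEW, roughly what a view materialization does."""
    conn.execute(f"CREATE VIEW {name} AS {select_sql}")

materialize_view(conn, "paid_orders", MODEL_SQL)
rows = conn.execute("SELECT id, amount FROM paid_orders ORDER BY id").fetchall()
print(rows)
```

The analyst writes only the SELECT. The data never leaves the warehouse, which is why SQL alone is enough.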
The main advantage of dbt is its low barrier to entry: you only need to know SQL, which lets data analysts do data engineering work without the overhead.
Alas, dbt covers only the transformation layer of ETL data pipelines. Its greatest shortcoming is that it cannot extract or load data between different endpoints, so it helps only when your source data already sits in the warehouse where you run dbt.
Read more: Why did we integrate dbt into Keboola?
The two tools are not mutually exclusive. Data teams can use Airflow’s BashOperator to run the dbt CLI from Airflow.
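As a rough sketch of that setup, the helper below assembles the shell string a `BashOperator` would receive through its `bash_command` argument. The helper name, project path, and model selector are hypothetical. `dbt run`, `--project-dir`, and `--select` are standard dbt CLI options. The block uses only the standard library, so it runs without Airflow or dbt installed.

```python
import shlex

def dbt_bash_command(project_dir, select=None):
    """Build a shell-safe `dbt run` command string for a scheduler task."""
    parts = ["dbt", "run", "--project-dir", project_dir]
    if select:
        # Limit the run to the chosen models instead of the whole project.
        parts += ["--select", select]
    # Quote each part so paths with spaces survive the shell.
    return " ".join(shlex.quote(p) for p in parts)

command = dbt_bash_command("/opt/dbt/project", select="staging")
print(command)
```

In an Airflow DAG, this string would be passed as `bash_command` to a `BashOperator` task, typically placed downstream of the tasks that load the raw data.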
Unfortunately, such an architecture loses the main point of dbt, which is to empower data analysts to do their own data engineering work.
Is there a way to combine Airflow’s ability to extract, transform, and load data in Python with dbt’s user-friendliness for easy self-service by analysts?
Yes, the answer is Keboola.
Keboola is a data platform as a service that joins the best of both Airflow and dbt.
In Keboola, you can specify transformations in Python, SQL, R, Julia, or even use dbt Core. With 250+ pre-built extractors and writers, you can go beyond Airflow’s operators to automate ETL data pipelines.
What’s more, Keboola’s orchestrator self-heals and monitors issues with out-of-the-box telemetry, so your data engineers can keep low-level control of their data pipeline workflows.
Keboola allows you to have the user-friendliness of dbt with the power of Airflow in one place. So it will be offered as a third option for this comparison.
To better understand which of Airflow and dbt is the better fit for your data team, let’s look at their transformation abilities.
dbt beats Airflow on two key features of data transformations:
The main advantage of Airflow transformations over dbt transformations is the ability of Airflow to tap into the vast Python ecosystem:
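As one illustration of what plain Python makes easy, the sketch below parses JSON event strings and computes a per-service median using only the standard library. The function name and sample events are hypothetical. In Airflow, a function like this could be handed to a `PythonOperator` as its `python_callable`.

```python
import json
import statistics

def summarize_latencies(raw_events):
    """Parse JSON event strings and compute the median latency per service."""
    by_service = {}
    for line in raw_events:
        event = json.loads(line)
        by_service.setdefault(event["service"], []).append(event["latency_ms"])
    return {svc: statistics.median(vals) for svc, vals in by_service.items()}

events = [
    '{"service": "api", "latency_ms": 120}',
    '{"service": "api", "latency_ms": 80}',
    '{"service": "api", "latency_ms": 100}',
    '{"service": "web", "latency_ms": 30}',
    '{"service": "web", "latency_ms": 50}',
]
summary = summarize_latencies(events)
print(summary)
```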
Keboola’s transformation engines offer you the advantages of both dbt and Airflow. Because you can run dbt Cloud jobs and use Python as a backend engine in Keboola, you unlock all their advantages and more:
What’s more, Keboola gives you multiple options for transforming data:
Airflow is better than dbt at orchestrating workflows if:
However, dbt is better if:
With Keboola you can:
A main advantage of Keboola over dbt and Airflow is its traceability. Data lineage is traced by default for every job, which makes complex setups easier to debug.
Both Airflow and dbt are open-source.
Open-source tools are renowned for their low entry costs. There are no vendor fees, no licensing, and no consumption caps. But you pay the bill down the line with higher maintenance costs and costly data engineering hours to customize the chosen solution to your data integration needs.
By contrast, Keboola offers a fully managed solution with no maintenance overhead and no deployment headaches.
Its free tier lets you orchestrate data pipelines without swiping a credit card.
Both dbt and Apache Airflow are open-source tools. This means you’ll have to get support via docs, online tutorials, GitHub issues, or Slack messages to the team.
This can be challenging.
Airflow is designed to run on-premise as a self-service solution, which poses its own setup challenges depending on your on-premise infrastructure. For example, Airflow does not run natively on Windows. If you’re a Windows shop, you’ll have to figure out how to make Airflow run with Docker. Expect to spend some time debugging your DataOps architecture to make it work.
Some cloud providers expose Airflow’s web user interface or command line interface to paying customers (e.g. Google Cloud Platform via Cloud Composer, AWS under Amazon Managed Workflows for Apache Airflow (MWAA), and Microsoft Azure with Docker/Kubernetes deployments), but the managed service is often pricier than its cloud-native alternatives (e.g. AWS Step Functions).
Similarly, you can get paid support for dbt Cloud, but it also comes with a vendor price tag.
Prepare your data engineering teams to spend some time debugging the tool of your choice before making it work.
Alternatively, you can pick Keboola. Keboola is a fully managed platform, so you will not waste time setting up your DevOps or debugging your solution. With its always-free tier, you can start using Keboola immediately. And alongside extensive documentation and tutorials, there is always a human on the other side ready to help.
Its support is so good that it makes Keboola the users’ #1 choice. But don’t just take our word for it. Check the G2 crowd reviews and awards.
“Keboola puts you in a full control of your data. We have a lot of options to choose from in one platform. It gives us enough room for creativity in approaches to data transformation. It helps us to consume the data and insights in the most suitable way for us.” - Monika S., Head of data team
Whether your data team will benefit more from Airflow or dbt will depend on your data use cases:
Who is the tool best for?
But you don’t have to choose one or the other. Pick Keboola to have the best of both worlds.
Keboola allows you to tap into the advantages of both dbt and Airflow:
All with no upfront costs, an always-free tier, and superior features.