
dbt vs Airflow vs Keboola

Comparing dbt vs Airflow across 5 main points.

How To
March 27, 2023

Apache Airflow and dbt are both popular contenders for the best workflow management tool in the Modern Data Stack.

However, picking the best tool is not trivial: the two differ in important ways, and the use cases for each are quite different.

In this article, we’ll compare dbt vs Airflow on multiple points:

  1. Big picture: Key features and shortcomings
  2. Transformations
  3. Orchestrations
  4. Pricing
  5. Support
  6. Which one to choose

dbt vs Airflow - Big picture comparison

Apache Airflow and dbt are both open-source frameworks that empower data teams to build data pipelines. In practice, though, they build different types of pipelines, target different users, and offer distinct advantages and disadvantages.

Of course, this article will compare dbt vs Airflow on the crucial differences and showcase how each is more apt for different data use cases. But keep in mind there is a third option, Keboola, which can achieve the best results of both tools at a fraction of the cost.

Apache Airflow

Apache Airflow is a Python-based open-source framework that allows data teams to schedule and automate workflows with DAGs (Directed Acyclic Graphs). Data teams use Airflow for a myriad of use cases: from building ETL data pipelines to launching machine learning apps.

How does Apache Airflow work?

  • Declare data pipelines by defining a DAG. A DAG chains different Python scripts into a dependency graph.
  • Each Python script performs one step of the pipeline: data extraction, data transformation, or data loading.
  • Python scripts can be semi-automated with Airflow’s operators - pre-built building blocks for common use cases (for example, loading data into Snowflake).
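To make this concrete, here is a minimal sketch of an Airflow DAG that chains extract, transform, and load steps. The task functions and names are hypothetical placeholders, assuming Airflow 2.4+:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task logic -- replace with your own ETL code.
def extract():
    print("pulling raw data from the source API")

def transform():
    print("cleaning and reshaping the raw data")

def load():
    print("writing the result to the warehouse")

with DAG(
    dag_id="etl_example",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",  # the `schedule` argument requires Airflow 2.4+
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare the dependency graph: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```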

Airflow offers many advantages:

  • Monitor workflows easily through Airflow’s graphical UI.
  • Powerful and intuitive transformations using Python (e.g. you can run big data transformations in Spark).

However, Airflow also has many disadvantages: 

  • Doesn’t preserve metadata when an Airflow DAG gets deleted, making debugging data pipelines hard.
  • You’ll need some DevOps skills to get it running. For example, Airflow doesn’t run natively on Windows, so you’ll have to deploy it via a Docker image.
  • Non-intuitive for non-engineers. Airflow is built for data engineers and can’t be used as a workflow tool to empower non-technical experts.
  • Its orchestration engine often malfunctions.

Read more: What are the best alternatives to Airflow?

dbt

dbt is a SQL-based data engineering framework maintained by the team at dbt Labs. You can use dbt as part of dbt Cloud (vendor solution) or dbt Core (open-source solution).

dbt empowers data engineers and data analysts to transform data in the warehouse (e.g., Snowflake) through SQL code.

How does dbt work?

  • dbt organizes all your data engineering logic at the level of dbt projects - folders or repos with config files and transformations (called dbt models).
  • When you run a dbt model (execute “dbt run” in the command line interface), the model executes a set of SQL SELECT commands that transform the data and create a new dataset.
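As a sketch, a dbt model is just a SELECT statement saved as a .sql file inside the project; dbt compiles the Jinja, runs the SQL in your warehouse, and materializes the result as a new dataset. The model, source, and column names below are hypothetical:

```sql
-- models/stg_orders.sql
-- Running `dbt run` materializes this SELECT as a new dataset.
-- {{ source(...) }} and {{ ref(...) }} let dbt resolve dependencies between models.
select
    order_id,
    customer_id,
    order_total
from {{ source('shop', 'raw_orders') }}
where order_total is not null
```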

The main advantage of dbt is its low barrier to entry (you only need to know SQL), which allows data analysts to perform data engineering without the overhead.

Alas, dbt covers only the data transformation layer of ETL data pipelines. Unless your source data already lives in the same data warehouse where you run dbt, its greatest shortcoming is its inability to extract or load data between different endpoints.

Read more: Why did we integrate dbt into Keboola?

Are they mutually exclusive?

They are not. Data teams can use Airflow’s BashOperator to run the dbt CLI from Airflow.
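A minimal sketch of that pattern, with a hypothetical project path:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_from_airflow",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Shell out to the dbt CLI. Airflow only sees one opaque bash task,
    # so dbt's model-level lineage stays invisible to the Airflow graph.
    run_dbt = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/my_project && dbt run",
    )
```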

Unfortunately, such an architecture defeats the main point of dbt, which is to empower data analysts to do their own data engineering work.

Is there a way to combine Airflow’s ability to extract, transform, and load data with Python and the user-friendliness of dbt for easy self-service by analysts?

Yes, the answer is Keboola.

The third option: Keboola

Keboola is a data platform as a service that combines the best of both Airflow and dbt.

In Keboola, you can specify transformations in Python, SQL, R, Julia, or even use dbt Core. With 250+ pre-built extractors and writers, you can go beyond Airflow’s operators to automate ETL data pipelines.

What’s more, Keboola’s orchestrator self-heals and monitors issues with out-of-the-box telemetry, so your data engineers keep low-level control of their data pipeline workflows.

Keboola gives you the user-friendliness of dbt with the power of Airflow in one place, so we’ll offer it as a third option throughout this comparison.

To better understand whether Airflow or dbt is the better fit for your data team, let’s look at their transformation abilities.

dbt vs Airflow - Data Transformations 

dbt beats Airflow on two key features of data transformations:

  1. Parametrized transformations are more intuitive in dbt. You simply pass the incoming data sources and outgoing data destinations as parameters to the dbt models. Airflow has Dynamic Task Mapping, but it requires more code and architectural design choices to achieve the same.
  2. dbt offers incremental transformations. A dbt model can be materialized incrementally, so that only new data in the source table is transformed, saving compute and storage resources (see the sketch after this list). Incrementality is a challenge for Airflow, and not just for transformations: even incremental data extraction and data loading require a lot of manual coding.
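For example, here is a minimal sketch of an incremental dbt model; the model and column names are hypothetical:

```sql
-- models/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_timestamp
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- On incremental runs, only transform rows newer than what is already loaded.
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

On the first run, dbt builds the whole table; on subsequent runs, only new rows are processed.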

The main advantage of Airflow transformations over dbt transformations is the ability of Airflow to tap into the vast Python ecosystem:

  1. Advanced transformations. You can pull in libraries for parallelized JSON flattening or call a data science API to apply state-of-the-art machine learning transformations (e.g. outlier removal) without coding your own transformation logic.
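As one illustration, flattening nested JSON inside an Airflow task takes a few lines with pandas (the records below are hypothetical):

```python
import pandas as pd

# Nested JSON records, e.g. the output of an extraction step.
records = [
    {"id": 1, "user": {"name": "Ada", "country": "CZ"}},
    {"id": 2, "user": {"name": "Grace", "country": "US"}},
]

# json_normalize flattens nested fields into dotted columns.
df = pd.json_normalize(records)
print(df.columns.tolist())  # ['id', 'user.name', 'user.country']
```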

Keboola 

The Keboola transformation engines offer you the advantages of both dbt and Airflow. Because you can run dbt jobs and use Python as a backend engine in Keboola, you unlock all their advantages and more:

  • Parametrized dbt transformations
  • Incremental dbt transformations
  • Advanced Python transformations, such as Spark transformations for big data algorithms

What’s more, Keboola gives you multiple options for transforming data: 

  • Execute transformations via a command line interface (CLI)
  • Write low-code transformations in multiple languages (Python, SQL, R, Julia, …), or 
  • Use no-code transformations that empower domain experts without any coding skills.

dbt vs Airflow - Orchestrations

Airflow is better than dbt at orchestrating workflows if:

  1. You orchestrate a diverse stack. Your team needs to ensure that a dbt job kicks off before or after another process outside of dbt.
  2. You need finer control over dependencies - for example, making sure one dbt job starts only after another has finished. Consecutive dependency triggering is better solved in Airflow (see the sketch below).
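In Airflow, such consecutive dependencies are a one-liner thanks to bitshift chaining. A minimal sketch with two hypothetical dbt jobs:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="dbt_chain", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    dbt_staging = BashOperator(task_id="dbt_staging", bash_command="dbt run --select staging")
    dbt_marts = BashOperator(task_id="dbt_marts", bash_command="dbt run --select marts")

    # dbt_marts starts only after dbt_staging finishes successfully.
    dbt_staging >> dbt_marts
```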

However, dbt is better if:

  1. You are orchestrating semantic dependencies. dbt models can be triggered so that tables don’t load until their dependencies are loaded, irrespective of the jobs in your orchestration engine.
  2. You want reusable orchestration code. With dbt templates you can reuse the same code for multiple orchestrations, making it easier to understand what is going on without getting bogged down in details.

Keboola

With Keboola you can: 

  • Use the classic cron orchestrator, similar to Airflow, allowing you to specify Airflow-like DAG dependencies
  • Orchestrate semantic dependencies as well, because Keboola runs dbt Core as one of its backends

A main advantage of Keboola over dbt and Airflow is its traceability: data lineage is captured by default for every job, making it easier to debug complex setups.

dbt vs Airflow - Pricing

Both Airflow and dbt are open-source.

Open-source tools are renowned for their low entry costs. There are no vendor fees, no licensing, and no consumption caps. But you pay the bill down the line with higher maintenance costs and costly data engineering hours to customize the chosen solution to your data integration needs.

Instead, Keboola offers a fully managed solution with no maintenance overhead and no deployment headaches at zero cost.

Its free tier allows you to orchestrate data pipelines without swiping the credit card.

dbt vs Airflow - Support

Both dbt and Apache Airflow are open-source tools. This means you’ll have to get support via docs, online tutorials, and GitHub issues or Slack messages with the team.

This can be challenging.

Airflow is designed to run on-premises as a self-hosted solution, which poses its own setup challenges depending on your infrastructure. For example, Airflow does not run on Windows: if you’re a Windows shop, you’ll have to figure out how to make Airflow run with Docker. Expect to spend some time debugging your DataOps architecture to make it work.

Some cloud providers expose Airflow’s web user interface or command line interface to paying customers (e.g. Google Cloud Platform via Cloud Composer, AWS under Amazon Managed Workflows for Apache Airflow (MWAA), and Microsoft Azure with Docker/Kubernetes deployments), but the managed service is often pricier than its cloud-native alternatives (e.g. AWS Step Functions).

Similarly, you can get paid support for dbt Cloud, but it also comes with a vendor price tag.

Expect your data engineering team to spend some time debugging the tool of your choice before it works smoothly.

Alternatively, you can pick Keboola. Keboola is a fully managed platform so you will not waste time setting up your DevOps or debugging your solution. With its always free tier, you can use Keboola immediately. And next to extensive documentation and tutorials, there is always a human on the other side ready to help. 

Its support is so good that it makes Keboola the users’ #1 choice. But don’t just take our word for it: check the G2 reviews and awards.

“Keboola puts you in a full control of your data. We have a lot of options to choose from in one platform. It gives us enough room for creativity in approaches to data transformation. It helps us to consume the data and insights in the most suitable way for us.”  - Monika S., Head of data team

dbt vs Airflow - How to choose?

Whether your data team will benefit more from Airflow or dbt will depend on your data use cases:

  1. Transformations. dbt is more suitable for organizations looking to parametrize multiple similar transformations in a data warehouse and save on resources with incremental materializations. Airflow is a better choice for teams that have complex transformations that can be automated with Python libraries.
  2. Orchestrations. If you orchestrate a varied data stack, Airflow is better. If your orchestrations need to handle semantic dependencies, pick dbt.
  3. Support. Neither open-source Airflow nor dbt Core offers out-of-the-box support or SLAs. You will have to spend lots of time debugging your solution or opt for a vendor who packages dbt or Airflow for you.
  4. Pricing. Both dbt and Airflow are free and open-source. The initial costs and investments are low, but you pay down the line with engineering resources needed to debug, maintain, and customize the chosen solution.

Who is each tool best for?

  • Airflow is best for data engineering teams with DevOps knowledge that manage a complex orchestration ecosystem spanning multiple workflows and tools.
  • dbt is best for data analyst teams who need to self-serve their engineering needs via SQL-based transformations in the data warehouse.

But you don’t have to choose one or the other. Pick Keboola to have the best of both worlds.

Get the best of both Airflow and dbt with Keboola

Keboola allows you to tap into the advantages of both dbt and Airflow:

  1. Transformations: everything dbt and Airflow offer, plus additional languages (R, Julia) and deployment modes (CLI, low-code, no-code).
  2. Orchestrations: incremental, consecutive, scheduled, and flexible. Maintained by the Keboola team, so you don’t have to sweat the DevOps part.
  3. Pricing: a cost-effective solution with a great entry point (aka, FREE).
  4. Support: stellar and award-winning.
  5. Best for: Teams of data engineers, data scientists, data analysts, and even just data enthusiasts without coding skills, who need to self-serve and automate data operations.

All with no upfront costs, an always-free tier, and superior features.

Try Keboola for free
