
3 Modern Data Architecture Paradigms (Pros & Cons)

A BREAKDOWN OF THE 3 MOST POPULAR MODERN DATA ARCHITECTURE PARADIGMS.

How To
August 18, 2022

“Why are you still doing ETL pipelines?”

“The Data Warehouse is the only way you can keep data quality high, despite the extra data modeling needed.”

“Have you not heard of data mesh? It solves all centralization problems.”

When it comes to data management, there are as many opinions as there are data managers. 

This is not to say there aren’t any good answers or right principles to follow. But we need to be aware that modern data architecture can take many forms. And one form is not always superior to the others. The subtle difference is in the tradeoffs they offer.

In this guide, we will compare 3 architectural patterns found in modern data architectures side by side and evaluate their pros and cons.

Table of contents:

  • What is data architecture?
  • ETL
  • ELT
  • Data mesh
  • Which data architecture should you use?

Manage and orchestrate your data in one place. Cut costs and build data products in days instead of weeks.

What is Data Architecture?

Data architecture is the blueprint that guides all activities across your company’s data systems.

You can think of data architecture as a unified view of every data flow from raw data to insights and back. Along the way, it specifies all of the technology and processes needed to achieve the information requirements.

This blueprint bridges the divide between the business and technology silos within an organization. It starts with business objectives, specifies data requirements and data standards, then pins down the infrastructure and tools needed to get the data flows going.

3 Popular modern data architecture paradigms

Data architects most often rely on 3 data architecture patterns to meet the needs of the modern data enterprise.

Here we present them side-by-side with an emphasis on tradeoffs in each paradigm.

1. ETL 

[Figure: ETL architecture]

ETL stands for the three steps in this design: Extract raw data from its data sources, Transform the data (e.g. clean it), and finally Load the data into data storage, where it can be accessed by data analysts and data scientists or used for business intelligence.

Let’s look at a concrete example. A data engineer would implement this architectural pattern by building a data pipeline that:

  1. Extracts data from Facebook Ads API.
  2. Transforms the data to aggregate impressions, clicks, and advertising spend on a daily level.
  3. Loads the aggregated data into a MySQL relational database.
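
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline. The sample rows in RAW_ADS are hypothetical stand-ins for an ads API response, the function and table names are made up for this example, and SQLite replaces MySQL so the snippet runs without a database server.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows standing in for an ads API response.
RAW_ADS = [
    {"date": "2022-08-01", "impressions": 1000, "clicks": 40, "spend": 12.50},
    {"date": "2022-08-01", "impressions": 500, "clicks": 10, "spend": 4.00},
    {"date": "2022-08-02", "impressions": 800, "clicks": 32, "spend": 9.25},
]


def extract():
    """Extract: a real pipeline would call the ads API here."""
    return list(RAW_ADS)


def transform(rows):
    """Transform: aggregate impressions, clicks, and spend per day."""
    daily = defaultdict(lambda: {"impressions": 0, "clicks": 0, "spend": 0.0})
    for row in rows:
        day = daily[row["date"]]
        day["impressions"] += row["impressions"]
        day["clicks"] += row["clicks"]
        day["spend"] += row["spend"]
    return [
        (date, d["impressions"], d["clicks"], d["spend"])
        for date, d in sorted(daily.items())
    ]


def load(conn, daily_rows):
    """Load: write the aggregated rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_ads "
        "(date TEXT PRIMARY KEY, impressions INTEGER, clicks INTEGER, spend REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_ads VALUES (?, ?, ?, ?)", daily_rows
    )
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load in order and return the loaded rows."""
    load(conn, transform(extract()))
    return conn.execute("SELECT * FROM daily_ads ORDER BY date").fetchall()
```

Calling run_etl(sqlite3.connect(":memory:")) returns one row per day. Note that only the daily totals reach the database: any question about individual ads would need a new pipeline, which is the tradeoff discussed in the cons.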

Curious about the intricacies of ETL? Check our in-depth guide to the ETL process.

ETL is the most common architectural paradigm. It has been around for the longest and its tradeoffs are well known.

Pros:

  • Well-known & understood
  • Lots of data platforms and tools that help with automation and are friendly to business users.
  • Fast to set up, so high value for experiments, proofs of concept, and MVP data products.

Cons:

  • Hard to maintain.
  • Doesn’t scale well with new sources, changing data structures, and increased ingestion volumes.
  • Does not answer changing business needs. If a new question is posed, and the data was not explicitly extracted beforehand, a new ETL pipeline needs to be built. 

The challenges of ETL were partially mitigated with the rise of specific tools, such as NoSQL databases, that could handle big data and high volumes at ingestion. 

But because maintenance is hard and ETL only answers the business requirements it was built for (so it is not very future-proof), modern enterprises often replaced ETL with ELT by adding data warehousing to their data architecture.

Recommended reading: Complete ETL process overview.

2. ELT 

ELT is a data architecture that follows the same steps as ETL but tries to correct for its disadvantages. 

Instead of extracting just some data, ELT extracts all data and loads it into data storage. The data storage is most commonly a data lake, such as object storage on Amazon S3, Google Cloud Storage, or Azure Data Lake Storage, which then feeds a cloud data warehouse such as Amazon Redshift, Google BigQuery, or Snowflake. With cloud technologies, the load step is efficient and does not become a bottleneck for scaling.

Transform happens only later, when data is moved from the data lake into a data warehouse.

Moving the data involves rigorous conceptual and physical data modeling, in which the schema of the data is defined before the data integration. Incoming data sets need to comply with the schema; otherwise, the data does not get inserted into the data warehouse.
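
To make the load-first, transform-later idea concrete, here is a rough Python sketch. It is a simplification under stated assumptions: the ORDERS_SCHEMA, the table names lake_orders and wh_orders, and the helper functions are all hypothetical, and a single SQLite database plays the role of both the data lake and the data warehouse.

```python
import json
import sqlite3

# Hypothetical warehouse schema: column name -> required Python type.
ORDERS_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def load_raw(conn, records):
    """Load: store every record untouched in a raw 'lake' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS lake_orders (payload TEXT)")
    conn.executemany(
        "INSERT INTO lake_orders VALUES (?)",
        [(json.dumps(r),) for r in records],
    )


def conforms(record, schema):
    """Return True when the record has exactly the schema's fields and types."""
    return record.keys() == schema.keys() and all(
        isinstance(record[name], kind) for name, kind in schema.items()
    )


def transform_to_warehouse(conn):
    """Transform: move only schema-compliant rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wh_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rejected = []
    rows = conn.execute("SELECT payload FROM lake_orders").fetchall()
    for (payload,) in rows:
        record = json.loads(payload)
        if conforms(record, ORDERS_SCHEMA):
            conn.execute(
                "INSERT INTO wh_orders VALUES (?, ?, ?)",
                (record["order_id"], record["customer"], record["amount"]),
            )
        else:
            # Non-compliant data stays in the lake but never reaches the warehouse.
            rejected.append(record)
    return rejected
```

Every record lands in lake_orders regardless of its shape, while wh_orders only receives the rows that pass the schema check. The rejected records are still available in the raw table for later inspection.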

Those are just some of the differences. Check how ETL and ELT compare on 11 crucial points.

ELT, and especially the data warehouse, is becoming the most common modern enterprise architecture.

Because of its data sanitation and quality assurance, it is loved by business intelligence and data analysts. Because all data is kept in its rich and informative raw form in the data lake, data scientists can build machine learning and artificial intelligence models to answer new and unexpected questions. 

Pros:

  • High data quality. 
  • Easy to perform business intelligence analyses.
  • Can answer new questions (all data is kept).
  • Scales well: all data for future or new data analytics can be loaded into the data lake.

Cons:

  • Data modeling and constraints become a bottleneck as the business needs evolve. Adding new validated data models to the schema takes time because all new data needs to follow the conceptual, semantic, and other constraints (e.g. allowed data structures and data types).
  • The data lake can become a data swamp where so much information is kept that it becomes useless for exploratory and data mining tasks.
  • High reliance on data engineers to build new data pipelines. Data access is restricted to keep users from ingesting unvalidated data into the data warehouse.

Data warehouses are great for providing data quality and a holistic picture. 

But they are slow to adapt their data models to the evolving needs of decision-making in a fast-paced environment. This problem led to the development of data mesh.

Recommended reading: ETL vs ELT: 11 Critical differences.

3. Data Mesh

[Figure: Data mesh architecture]

Data mesh recognizes that a single centralized (data warehousing) solution might not be the best for every department. For some, slow but validated data works well. But for other departments, a choice of different tools would better serve their needs (e.g. a different data storage, like MongoDB for document data or Neo4j for graph analytics).

Data mesh is consumer-centric. Each stakeholder is responsible for building their own ETL/ELT pipelines with the tools and technologies that serve their use cases best. The data assets are then the responsibility of each team and are not shared across the organization (unless this is the intended use case).

So how does data mesh prevent this each-team-on-their-own process from becoming a data mess? With infrastructure as a service. 

Even though each team builds its own data flows, there are common sharing interfaces (data catalogs), data security and governance policies, and underlying technologies that support work across teams.
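
One way to picture the shared layer is a small catalog that every team registers its data with. The Python sketch below is purely illustrative: DataProduct, DataCatalog, and the rule that data containing personal information cannot be shared are assumptions made for this example, not part of any specific data mesh product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A data set that one team owns and optionally shares."""
    name: str
    owner_team: str
    storage: str  # each team picks its own, e.g. "snowflake", "mongodb", "neo4j"
    shared: bool = False
    contains_pii: bool = False


@dataclass
class DataCatalog:
    """Hypothetical shared catalog that applies one governance policy to all teams."""
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Common governance rule (an assumption for this sketch):
        # data containing personal information may not be shared.
        if product.shared and product.contains_pii:
            raise ValueError(f"{product.name}: PII data cannot be shared")
        if product.name in self.products:
            raise ValueError(f"{product.name}: already registered")
        self.products[product.name] = product

    def discover(self, requesting_team: str) -> list:
        """List products visible to a team: its own plus everything shared."""
        return sorted(
            p.name
            for p in self.products.values()
            if p.shared or p.owner_team == requesting_team
        )
```

Each team keeps its own storage and pipelines, but registration and discovery go through the same interface, and the same policy check applies to everyone.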

Want more info? We got you covered: Data mesh - the answer to the failures of centralized data architectures

Data mesh seems intrinsically superior to other architectures because it:

  • Has all the benefits of ETL.
  • Has all the benefits of ELT.
  • Has additional benefits of agility, data relevance, and consumer-centricity.
  • Avoids data operations becoming a mess by imposing the same data infrastructure, standards, and common interfaces.

What are the cons?

  • It is a new architectural paradigm, with few experts who can help you build it from scratch.
  • On the buy-not-build side, data platforms and tools on the market rarely offer data mesh out-of-the-box.

Recommended reading: Data mesh - the answer to the failures of centralized data architectures.


Which modern data architecture should you use?

Ultimately, the best modern-day architecture will depend on your business needs: 

  1. Need a quick-and-dirty MVP? ETL is your choice. 
  2. Building a data operation to last and expecting standard questions? Go with ELT. 
  3. Working with different teams at different speeds and with different needs? Data mesh is here for you.

What you want to avoid, though, is lock-in that makes your data architecture virtually impossible to switch. Your data architecture choice should not hold you back when your business needs change.

Build your data architecture in a few clicks 

With Keboola, you can connect all your tools into a single infrastructure.

Here is why data architects and engineers love Keboola:

  1. Monthly fees keep your relationship with Keboola flexible, and there is no vendor lock-in. It is easy to take your data and scripts out of Keboola and migrate them to a different solution.
  2. You can avoid manual engineering and maintenance by connecting 250+ sources and destinations with a couple of clicks. 
  3. Develop and automate transformation pipelines using your preferred language (SQL, R, Python, Spark, Julia, and others).
  4. Data governance is easy to implement, and enterprise-level security standards apply to all operations within the platform, which brings you peace of mind.
  5. Transparent pricing model based on usage. All activities can easily be measured and compared without having to do mental gymnastics for your calculations.

Keboola can be used for ETL, ELT, or even data mesh architectures. 

Try it for free. 

Keboola has an always-free, no-questions-asked plan. So, you can implement all the envisioned architectural designs with a couple of clicks.

Take Keboola for a spin today.
