A BREAKDOWN OF THE 3 MOST POPULAR MODERN DATA ARCHITECTURE PARADIGMS.
“Why are you still doing ETL pipelines?”
“The Data Warehouse is the only way you can keep data quality high, despite the extra data modeling needed.”
“Have you not heard of data mesh yet? It solves all centralization problems.”
When it comes to data management, there are as many opinions as there are data managers.
This is not to say there aren’t any good answers or right principles to follow. But we need to be aware that modern data architecture can take many forms. And one form is not always superior to the others. The subtle difference is in the tradeoffs they offer.
In this guide, we will compare three architectural patterns found in modern data architectures side by side and evaluate their pros and cons.
Data architecture is the blueprint that guides all activities across your company’s data systems.
You can think of data architecture as a unified view of every data flow from raw data to insights and back. Along the way, it specifies all of the technology and processes needed to meet the information requirements.
This blueprint bridges the divide between the business and technology silos within an organization. It starts with business objectives, specifies data requirements and data standards, then pins down the infrastructure and tools needed to get the data flows going.
Data architects most often rely on three data architecture patterns to meet the needs of the modern data enterprise.
Here we present them side by side, with an emphasis on the tradeoffs of each paradigm.
ETL stands for the three steps in this design: Extract raw data from its sources, Transform the data (e.g. clean it), and finally Load it into data storage, where it can be accessed by data analysts and data scientists or used for business intelligence.
Let’s look at a concrete example. A data engineer would implement this architectural pattern by building a data pipeline that:
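As a minimal sketch of the three steps, assume a hypothetical CSV export of orders as the source and an in-memory SQLite database standing in for the target storage. The data, table name, and function names are illustrative assumptions, not a prescribed implementation:

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (illustrative data only).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,carol,7.25
"""


def extract(raw_text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: clean names, cast types, drop rows that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # rows with an unparseable amount never reach storage
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, amount))
    return cleaned


def load(conn, rows):
    """Load: write only the cleaned rows into the target storage."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# prints [(1, 'alice', 10.5), (3, 'carol', 7.25)]
```

Note that the malformed row is discarded before loading. In ETL, whatever the transform step drops is no longer available downstream.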
Curious about the intricacies of ETL? Check our in-depth guide to the ETL process.
ETL is the most common architectural paradigm. It has been around the longest, and its tradeoffs are well known.
Pros:
Cons:
The challenges of ETL were partially mitigated with the rise of specific tools, such as NoSQL databases, that could handle big data and high volumes at ingestion.
But because maintenance is hard and an ETL pipeline only answers the business requirements it was built for, which makes it hard to future-proof, modern enterprises have often replaced ETL with ELT by adding data warehousing to their data architecture.
ELT is a data architecture that performs the same steps as ETL in a different order (Extract, Load, Transform) and tries to correct for ETL's disadvantages.
Instead of extracting just some data, ELT extracts all data and loads it into data storage. The data storage is most commonly a data lake, often hosted alongside a cloud data warehouse such as Amazon Redshift, Google BigQuery, or Snowflake. With the use of cloud technologies, the load is efficient and is not a bottleneck for scaling.
Transform happens only later when data is moved from the data lake into a data warehouse.
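The load-first, transform-later order can be sketched as follows. This is a simplified illustration under stated assumptions: one SQLite table (`raw_events`) stands in for the data lake, another (`dw_orders`) for the data warehouse, and the event shapes are hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the data lake: every record is stored verbatim, no schema imposed.
conn.execute("CREATE TABLE raw_events (payload TEXT)")
# Stand-in for the data warehouse: a modeled table with a fixed schema.
conn.execute("CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, amount REAL)")


def extract_and_load(records):
    """Extract + Load: keep all records in raw form."""
    conn.executemany(
        "INSERT INTO raw_events VALUES (?)",
        [(json.dumps(record),) for record in records],
    )


def transform_into_warehouse():
    """Transform: run later, reading from the lake and writing modeled rows."""
    payloads = conn.execute("SELECT payload FROM raw_events").fetchall()
    for (payload,) in payloads:
        event = json.loads(payload)
        if event.get("type") == "order":
            conn.execute(
                "INSERT OR REPLACE INTO dw_orders VALUES (?, ?)",
                (event["order_id"], float(event["amount"])),
            )


# Hypothetical source records of mixed shape.
extract_and_load([
    {"type": "order", "order_id": 1, "amount": "10.5"},
    {"type": "click", "page": "/pricing"},
    {"type": "order", "order_id": 2, "amount": 7},
])
transform_into_warehouse()

print(conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0])
# prints 3
print(conn.execute("SELECT order_id, amount FROM dw_orders ORDER BY order_id").fetchall())
# prints [(1, 10.5), (2, 7.0)]
```

The click event is not needed by the warehouse today, but it stays in the raw store and remains available for future questions.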
Moving the data involves rigorous conceptual and physical data modeling, in which the schema of the data is defined before the data integration. Incoming data sets need to comply with the schema; otherwise, the data does not get inserted into the data warehouse.
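A schema check of this kind might look like the sketch below. The schema, column names, and helper functions are hypothetical, and real warehouses enforce schemas through their own table definitions and constraints rather than application code like this:

```python
# Hypothetical warehouse schema: column name -> required Python type.
WAREHOUSE_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def conforms(record, schema):
    """True only if the record has exactly the schema's columns and types."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[col], typ) for col, typ in schema.items())


def split_by_schema(records, schema):
    """Separate records that may be inserted from those that are rejected."""
    accepted, rejected = [], []
    for record in records:
        (accepted if conforms(record, schema) else rejected).append(record)
    return accepted, rejected


incoming = [
    {"order_id": 1, "customer": "alice", "amount": 10.5},
    {"order_id": 2, "customer": "bob", "amount": "7.25"},  # wrong type
    {"order_id": 3, "customer": "carol"},                  # missing column
]
accepted, rejected = split_by_schema(incoming, WAREHOUSE_SCHEMA)
print(len(accepted), len(rejected))
# prints 1 2
```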
Those are just some of the differences. Check how ETL and ELT compare on 11 crucial points.
ELT, and especially the data warehouse, is becoming the most common modern enterprise architecture.
Because of its data sanitation and quality assurance, it is loved by business intelligence and data analysts. Because all data is kept in its rich and informative raw form in the data lake, data scientists can build machine learning and artificial intelligence models to answer new and unexpected questions.
Pros:
Cons:
Data warehouses are great for providing data quality and a holistic picture.
But they are slow to adapt: changing data models to match the evolving needs of decision-making in a fast-paced environment takes time. This problem led to the development of data mesh.
Data mesh recognizes that a single centralized (data warehousing) solution might not be the best for every department. For some, slow but validated data is great. But for other departments, a choice of different tools would better serve their needs (e.g. a different data storage, like MongoDB for document data or Neo4j for graph analytics).
Data mesh is consumer-centric. Each stakeholder is responsible for building their own ETL/ELT pipelines with the tools and technologies that serve their use cases best. The data assets are then the responsibility of each team and are not shared across the organization (unless this is the intended use case).
So how does data mesh prevent this each-team-on-their-own process from becoming a data mess? With infrastructure as a service.
Even though each team builds its own data flows, there are common sharing interfaces (data catalogs), data security and governance policies, and underlying technologies that support work across teams.
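To make the idea of a common sharing interface concrete, here is a toy sketch of a shared catalog in which teams register their own data assets. The class names, fields, and the governance rule (assets containing personal data cannot be shared) are all illustrative assumptions, not a description of any particular data mesh product:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner_team: str
    storage: str          # each team picks its own technology
    contains_pii: bool
    shared: bool = False  # private to the owning team unless sharing is intended


class DataCatalog:
    """Common interface: every team registers and discovers assets here."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        # Assumed organization-wide governance policy, applied to all teams.
        if product.contains_pii and product.shared:
            raise ValueError(f"{product.name}: PII assets cannot be shared")
        self._products[product.name] = product

    def discover(self, requesting_team):
        """List assets the team owns plus those shared across the organization."""
        return sorted(
            p.name
            for p in self._products.values()
            if p.shared or p.owner_team == requesting_team
        )


catalog = DataCatalog()
catalog.register(
    DataProduct("customer_graph", "marketing", "neo4j", contains_pii=False, shared=True)
)
catalog.register(
    DataProduct("support_tickets", "support", "mongodb", contains_pii=True)
)

print(catalog.discover("finance"))
# prints ['customer_graph']
print(catalog.discover("support"))
# prints ['customer_graph', 'support_tickets']
```

Each team keeps its own storage and pipelines, while registration and policy checks go through one shared entry point.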
Want more info? We've got you covered: Data mesh - the answer to the failures of centralized data architectures.
Data mesh seems intrinsically superior to other architectures because it:
What are the cons?
Ultimately, the best modern-day architecture will depend on your business needs:
What you want to avoid, though, is lock-in that makes it virtually impossible to switch your data architecture. Your data architecture choice should not hold you back when your business needs change.
With Keboola, you can connect all your tools into a single infrastructure.
Here is why data architects and engineers love Keboola:
Keboola can be used for ETL, ELT, or even data mesh architectures.
Try it for free.
Keboola has an always-free, no-questions-asked plan. So, you can implement all the envisioned architectural designs with a couple of clicks.