The good, bad, and ugly of the Modern Data Architecture.
The data landscape is exploding with tools.
As data professionals we have at our fingertips specialized tools for anything: from specialized databases (graph, geo, you name it) to tools for SQL-driven transformations (looking at you, dbt).
Yet a lot of data work boils down to provisioning, selecting, administering, and maintaining those tools. Which is just a pain.
As Pavel Dolezal, CEO and co-founder of Keboola said:
“How come working with data often feels like stepping on a lego barefoot?”
The answer is in how the Modern Data Architecture is built. What do we mean by Modern Data Architecture?
Working with a fragmented and multifaceted data stack harms your data team first, and then everyone else:
Each tool is a small data ecosystem. A lot of data and metadata is hidden within the tool, and the “tribe” that owns it can access it. Other stakeholders who need that data for their work must either go through the tribal gatekeepers or learn the tool themselves. Both come at a time cost.
Data silos get wedged between stakeholders, along the fault lines of the modern data stack.
Each tool is built as an independent platform, with its own way of tracing data lineage, logging errors, and orchestrating runs.
The fragmentation makes holistic observability hard. But observability is not just a goal in and of itself. It serves compliance and engineering purposes.
Error resolution becomes painful, since tracing a data issue back through its journey means hopping between multiple tools and standards.
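To make that concrete, here is a minimal sketch of the problem (the tool names and log formats are made up for illustration): before you can even follow one failure through the pipeline, you have to normalize each tool's log format into a single timeline.

```python
from datetime import datetime, timezone

# Hypothetical examples: three tools in the stack report the same
# failed pipeline run, each in its own format.
ingestion_log = {"ts": 1700000000, "lvl": "ERR", "msg": "row count mismatch"}
transform_log = "2023-11-14T22:13:25Z ERROR model=orders row count mismatch"
warehouse_log = {"timestamp": "2023-11-14 22:13:30", "severity": 3,
                 "detail": "load aborted: row count mismatch"}

def normalize(tool, entry):
    """Coerce one tool's log entry into a shared (tool, utc_time, message) shape."""
    if tool == "ingestion":
        when = datetime.fromtimestamp(entry["ts"], tz=timezone.utc)
        return (tool, when, entry["msg"])
    if tool == "transform":
        ts, _level, *rest = entry.split()
        when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        return (tool, when, " ".join(rest))
    if tool == "warehouse":
        when = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S")
        return (tool, when.replace(tzinfo=timezone.utc), entry["detail"])
    raise ValueError(f"unknown tool: {tool}")

# Only after normalizing can you sort the events into one timeline
# and trace the issue back to where it started.
timeline = sorted([
    normalize("ingestion", ingestion_log),
    normalize("transform", transform_log),
    normalize("warehouse", warehouse_log),
], key=lambda event: event[1])

for tool, when, message in timeline:
    print(f"{when.isoformat()} [{tool}] {message}")
```

And this per-tool `normalize` shim has to be written, and kept up to date, for every tool in the stack.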
As with observability, each tool needs its own access controls and privilege settings, which increases the surface for security leaks and attacks.
As more tools are added to your company’s data stack, each tool acts as a potential vector of vulnerability.
Managing access and security becomes a juggling game against time and the inevitable “oh, I forgot about that one” slip.
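A back-of-the-napkin sketch of that juggling game (tool names and user lists are invented): auditing access means walking every tool's own access list, and the stale grant is exactly the one nobody remembered to check.

```python
# Hypothetical per-tool access lists -- in practice each lives in a
# different admin console with its own roles and terminology.
access = {
    "warehouse": {"ana", "ben", "carol"},
    "bi_tool": {"ana", "ben"},
    "etl_tool": {"ana", "ben", "dave"},  # dave left the company last quarter
}

active_employees = {"ana", "ben", "carol"}

# Stale grants: per tool, accounts that no longer map to an active employee.
stale = {tool: users - active_employees for tool, users in access.items()}
stale = {tool: users for tool, users in stale.items() if users}

print(stale)  # the "oh, I forgot about that one" slip, found by brute force
```

With a handful of tools this is an annoyance; with a few dozen, it is a standing security risk.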
Adding a new tool compounds your monthly expenses. But the individual-tool pricing is visible. You can control it. Unfortunately, it is not the only cost.
There are two hidden costs associated with a fragmented data stack.
First, there is the opportunity cost. Each hour spent managing a fragmented ecosystem of apps, tools, and frameworks is an hour stolen away from building data products and delivering data insights that drive growth.
Second, there is the risk cost. Because tools are hyper-specialized, they need dedicated personnel not just to use them, but to maintain, deploy, and fine-tune them. Special “tribal” teams learn the ins and outs of each tool and often custom-adjust it to their domain needs. This knowledge is rarely shared across the different data roles (engineering, science, analytics), let alone across the company, exposing you to risk whenever the go-to person is not available.
In other words: The modern data stack does not result in a modern data experience.
Current enterprises build their data operations around four different types of Modern Data Architectures:
Modern Data Architectures were developed to solve the pains of data operations. Analytical databases are fast to set up and get quick results. Data lakes solve the problem of missing and losing data in analytical databases. Data streaming allows you to better keep a finger on the pulse as data is evolving. And data meshes de-centralize the architecture to solve the intrinsic problems of the other three.
But all four architectures suffer from the same problem.
The Modern Data Stack - the tools within those architectures - has exploded. We integrate so many tools into the Modern Data Architecture that data operations end up focused more on integrating, administering, and maintaining those tools than on building data products.
As Benn Stancil - Chief Analytics Officer and co-founder of Mode - says:
“The reason to build a “modern data experience” isn’t to unify the disjointed products of a bunch of startups; it’s to serve a world in which far more people want to work like analysts.”
So what exactly are the challenges of the Modern Data Stack?
Specialized tools fragmented the once-unified data stack in exchange for cutting-edge advantages. NoSQL databases were developed to compensate for the shortcomings of the relational paradigm; dbt was born to ease the pains of SQL transformations.
But each new tool added to our data stack puts another card on our house of data cards, making the entire infrastructure a little less stable.
“The number of tools in the data stack is exploding. There are tools for data observability, cost management, compliance, data lifecycle, data classification, and the list goes on. The problems start compounding every time a new tool is added to the data stack. Each tool needs to be integrated with the data stack, and they have to rediscover and create another copy of metadata. The organization now has a patchwork of inconsistent, fragmented, and siloed metadata spread across different systems and stored in proprietary formats. Teams struggle to keep the metadata correct, complete, and consistent across tools and fall back to the error-prone “tribal knowledge” approach to data. The disconnected user experience jumping between tools worsens and the user frustration grows, affecting team productivity. This also puts an undue burden on teams operationalizing the data stack as they need to set up multiple systems, configure them, and manage them.” - Suresh Srinivas, Chief Architect for data, Uber
There are three ways to solve the current problem:
Pavel Dolezal, CEO and co-founder of Keboola, puts it nicely:
“Sure you can hire the best 5 baristas to brew the most exquisite coffee for your company. But when 1000 employees need their morning fuel at the same time, your 5 baristas will not be able to handle their orders. Rather than relying on just 5 baristas, get more sophisticated coffee machines and teach people how to press a couple of buttons. Sure the coffee will not be Michelin Star class, but at least no one will suffer their morning jitters.”
So of course, allowing everyone at your company to build and play with data might not produce the most polished technical products. But it sure beats waiting for bottlenecks to clear before you get your reports.
This is exactly what Keboola offers.
Keboola sets up the infrastructure, security, governance, environment, configuration, and all other DataOps out of the box (no worries, you can dive in and adjust/build your own data engines). You can either use it on its own for your data operations or integrate it with the specialized tools you grew to love.
It is also called the “Data Stack as a Service” platform, to emphasize the ability to bring your own tools to the data game. Keboola allows you to:
Ready to move from theory to practice?
Keboola has an always free, no-questions-asked plan so you can explore all the power of the data mesh paradigm. Feel free to give it a go.