The good, bad, and ugly of the Modern Data Architecture.
The data landscape is exploding with tools.
As data professionals we have at our fingertips specialized tools for anything: from specialized databases (graph, geo, you name it) to tools for SQL-driven transformations (looking at you, dbt).
Yet a lot of data work boils down to provisioning, selecting, administering, and maintaining those tools. Which is just a pain.
As Pavel Dolezal, CEO and co-founder of Keboola said:
“How come working with data often feels like stepping on a lego barefoot?”
The answer is in how the Modern Data Architecture is built. What do we mean by Modern Data Architecture?
Working with a fragmented and multifaceted data stack harms your data team first, and then everyone else:
Each tool is a small data ecosystem. A lot of data and metadata is hidden within the tool, and the “tribe” that owns it can access it. Other stakeholders who need that data for their work must either go through the tribal gatekeepers or learn the tool themselves. Both come at a time cost.
Data silos get wedged between stakeholders, along the fault lines of the modern data stack.
Each tool is built as an independent platform, with its own way of tracing data lineage, logging errors, and orchestrating runs.
The fragmentation makes holistic observability hard. But observability is not just a goal in and of itself. It serves compliance and engineering purposes.
Error resolution becomes painful, since tracing a data issue back through its journey means hopping between multiple tools and standards.
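To make that concrete, here is a minimal sketch of the problem (the tool names and log formats are made up for illustration): before you can even follow one failure through the pipeline, you have to normalize each tool's log format into a single timeline.

```python
from datetime import datetime, timezone

# Hypothetical examples: three tools in the stack report the same
# failed pipeline run, each in its own format.
ingestion_log = {"ts": 1700000000, "lvl": "ERR", "msg": "row count mismatch"}
transform_log = "2023-11-14T22:13:25Z ERROR model=orders row count mismatch"
warehouse_log = {"timestamp": "2023-11-14 22:13:30", "severity": 3,
                 "detail": "load aborted: row count mismatch"}

def normalize(tool, entry):
    """Coerce one tool's log entry into a shared (tool, utc_time, message) shape."""
    if tool == "ingestion":
        when = datetime.fromtimestamp(entry["ts"], tz=timezone.utc)
        return (tool, when, entry["msg"])
    if tool == "transform":
        ts, _level, *rest = entry.split()
        when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        return (tool, when, " ".join(rest))
    if tool == "warehouse":
        when = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S")
        return (tool, when.replace(tzinfo=timezone.utc), entry["detail"])
    raise ValueError(f"unknown tool: {tool}")

# Only after normalizing can you sort the events into one timeline
# and trace the issue back to where it started.
timeline = sorted([
    normalize("ingestion", ingestion_log),
    normalize("transform", transform_log),
    normalize("warehouse", warehouse_log),
], key=lambda event: event[1])

for tool, when, message in timeline:
    print(f"{when.isoformat()} [{tool}] {message}")
```

And this per-tool `normalize` shim has to be written, and kept up to date, for every tool in the stack.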
As with observability, each tool needs its own access controls and privilege settings, which increases the surface for security leaks and attacks.
As more tools are added to your company’s data stack, each tool acts as a potential vector of vulnerability.
Managing access and security becomes a juggling game against time and the inevitable “oh, I forgot about that one” slip.
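A back-of-the-napkin sketch of that juggling game (tool names and user lists are invented): auditing access means walking every tool's own access list, and the stale grant is exactly the one nobody remembered to check.

```python
# Hypothetical per-tool access lists -- in practice each lives in a
# different admin console with its own roles and terminology.
access = {
    "warehouse": {"ana", "ben", "carol"},
    "bi_tool": {"ana", "ben"},
    "etl_tool": {"ana", "ben", "dave"},  # dave left the company last quarter
}

active_employees = {"ana", "ben", "carol"}

# Stale grants: per tool, accounts that no longer map to an active employee.
stale = {tool: users - active_employees for tool, users in access.items()}
stale = {tool: users for tool, users in stale.items() if users}

print(stale)  # the "oh, I forgot about that one" slip, found by brute force
```

With a handful of tools this is an annoyance; with a few dozen, it is a standing security risk.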
Adding a new tool compounds your monthly expenses. But the individual-tool pricing is visible. You can control it. Unfortunately, it is not the only cost.
There are two hidden costs associated with a fragmented data stack.
First, there is the opportunity cost. Each hour spent managing a fragmented ecosystem of apps, tools, and frameworks is an hour stolen away from building data products and delivering data insights that drive growth.
Second, there is the risk cost. Because tools are hyper-specialized, they need dedicated personnel not just to use them, but to maintain, deploy, and fine-tune them. Special “tribal” teams learn the ins and outs of each tool and often custom-adjust it to their domain needs. This knowledge is rarely shared across the different data roles (engineering, science, analytics), let alone across the company, exposing you to risk whenever the go-to person is not available.
In other words: The modern data stack does not result in a modern data experience.
Current enterprises build their data operations around four different types of Modern Data Architectures:
Modern Data Architectures were developed to solve the pains of data operations. Analytical databases are fast to set up and get quick results. Data lakes solve the problem of missing and losing data in analytical databases. Data streaming allows you to better keep a finger on the pulse as data is evolving. And data meshes de-centralize the architecture to solve the intrinsic problems of the other three.
But all four architectures suffer from the same problem.
The Modern Data Stack - the tools within those architectures - has exploded. We integrate so many tools into the Modern Data Architecture that data operations end up focused more on integrating, administering, and maintaining those tools than on building data products.
As Benn Stancil - Chief Analytics Officer and co-founder of Mode - says:
“The reason to build a “modern data experience” isn’t to unify the disjointed products of a bunch of startups; it’s to serve a world in which far more people want to work like analysts.”
So what exactly are the challenges of the Modern Data Stack?
Specialized tools fragmented the once-unified data stack in exchange for cutting-edge advantages. NoSQL databases were developed to compensate for the shortcomings of the relational paradigm; dbt was born to ease the pains of SQL transformations.
But each new tool added to our data stack puts another card on our house of data cards, making the entire infrastructure a little less stable.
“The number of tools in the data stack is exploding. There are tools for data observability, cost management, compliance, data lifecycle, data classification, and the list goes on. The problems start compounding every time a new tool is added to the data stack. Each tool needs to be integrated with the data stack, and they have to rediscover and create another copy of metadata. The organization now has a patchwork of inconsistent, fragmented, and siloed metadata spread across different systems and stored in proprietary formats. Teams struggle to keep the metadata correct, complete, and consistent across tools and fall back to the error-prone “tribal knowledge” approach to data. The disconnected user experience jumping between tools worsens and the user frustration grows, affecting team productivity. This also puts an undue burden on teams operationalizing the data stack as they need to set up multiple systems, configure them, and manage them.” - Suresh Srinivas, Chief Architect for data, Uber
There are three ways to solve the current problem:
Pavel Dolezal, CEO and co-founder of Keboola, puts it nicely:
“Sure you can hire the best 5 baristas to brew the most exquisite coffee for your company. But when 1000 employees need their morning fuel at the same time, your 5 baristas will not be able to handle their orders. Rather than relying on just 5 baristas, get more sophisticated coffee machines and teach people how to press a couple of buttons. Sure the coffee will not be Michelin Star class, but at least no one will suffer their morning jitters.”
So of course, allowing everyone at your company to build and play with data might not produce the most polished technical products. But it sure beats waiting for bottlenecks to clear before you get your reports.
This is exactly what Keboola offers.
Keboola sets up the infrastructure, security, governance, environment, configuration, and all other DataOps out of the box (no worries, you can dive in and adjust/build your own data engines). You can either use it on its own for your data operations or integrate it with the specialized tools you grew to love.
It is also called the “Data Stack as a Service” platform, to emphasize the ability to bring your own tools to the data game. Keboola allows you to:
Ready to move from theory to practice?
Keboola has an always free, no-questions-asked plan so you can explore all the power of the data mesh paradigm. Feel free to give it a go.