Data Mesh, Data Stack, and the Holy Grail

Milan Veverka, Product Marketing @ Keboola

Apr 18, 2022, 10 min read

If an organization is to be productive, it needs to work with the right data at the right time, exactly where it needs it, across the whole structure and in each of its departments.

Zhamak Dehghani made a very persuasive case for such a capability in her seminal 2019 article, and I, unaware of her contribution at the time, made a similar case in a much less seminal blog post on the “Human API” of the data stack. We gave the underlying concept different names: she called it domain ownership; I called it data democracy. Today, few people care to make the distinction, and a single catch-all term, data mesh, has been adopted for the philosophy. The term has been steadily gaining in popularity, picking up speed in late 2021. Not a day passes without at least one data-meshy post in my LinkedIn feed, and recently there has barely been a data conference without data mesh as a topic.

Yet, for all this hype, data mesh seems to remain an elusive dream. Indeed, when I spoke to Mohammad Syed, Lead Data Strategist at Carruthers and Jackson, in late 2021, we agreed that data mesh as we understand it is not unlike the north star: a direction to follow, but too distant a goal to actually reach. At least for most.

This is because data mesh is a product of the infrastructure and technology that ensure the extraction, routing, security and delivery of information. Regardless of how the data function is organized, there is infrastructure and technology that everything runs on. Known as the data stack, its overall architecture and composition directly dictate what one can do with it, and with the organization. To create data mesh, you need a top-notch data stack. It is as simple (or as complex, as we will see shortly) as that.

Data stacks come in three types: simple, modern, and … Until not so long ago, many would have said utopian. Not any more. But let’s first check the first two.

Simple data stack

Imagine the simplest possible setup: some kind of extraction tool, landing data in a database, and a BI tool on top of it. Works great if there aren’t too many data sources, the data is not very complicated and the data engineering team is small. Like, one person small. Probably part-time. Anything beyond that, and we will start running into issues, and concepts like domain ownership are just a hallucination at this point (unless, of course, all the domains are owned by the same person). A natural start, good enough for the simplest of use cases, but a complete headache for most real companies.

Modern data stack

Enter the next evolutionary step: a number of disparate tools, each focusing on a small sliver of the required functionality. By way of reminder, each data product represents a value chain with at least three steps: the data needs to come from somewhere, something needs to happen to it, and it needs to be delivered somehow. In reality that means at least two technologies the product owner needs access to and expertise in, apart from their domain knowledge. Sounds simple enough, right? And it is, as long as all you offer are very simple products.
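To make the three-step value chain concrete, here is a minimal sketch of a data product as source, transformation, and delivery. Everything in it is an assumption made for illustration: the DataProduct class, the function names, and the sample rows are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict[str, object]


@dataclass
class DataProduct:
    """Hypothetical three-step value chain: source -> transform -> deliver."""
    name: str
    owner: str
    extract: Callable[[], Iterable[Row]]
    transform: Callable[[Iterable[Row]], list[Row]]
    deliver: Callable[[list[Row]], None]

    def run(self) -> int:
        # Run the three steps in order and report how many rows were delivered.
        rows = self.transform(self.extract())
        self.deliver(rows)
        return len(rows)


def extract_orders() -> list[Row]:
    # Stand-in for an extraction tool; the rows are made-up sample data.
    return [
        {"order_id": 1, "channel": "email", "revenue": 120.0},
        {"order_id": 2, "channel": "ads", "revenue": 80.0},
        {"order_id": 3, "channel": "email", "revenue": 40.0},
    ]


def revenue_by_channel(rows: Iterable[Row]) -> list[Row]:
    # "Something needs to happen to it": aggregate revenue per channel.
    totals: dict[str, float] = {}
    for row in rows:
        channel = str(row["channel"])
        totals[channel] = totals.get(channel, 0.0) + float(row["revenue"])
    return [{"channel": c, "revenue": r} for c, r in sorted(totals.items())]


# Stand-in for a delivery target such as a BI tool or a shared table.
delivered: list[Row] = []

product = DataProduct(
    name="revenue_by_channel",
    owner="marketing",
    extract=extract_orders,
    transform=revenue_by_channel,
    deliver=delivered.extend,
)
row_count = product.run()
```

Even in this toy form, the product owner has to understand where the data comes from, what the transformation does, and where the result lands, which is exactly where the technology count starts to grow.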

In practice, there are many more technologies involved, with a complex web of interconnections, held together by the sheer will of the data engineering team(s). Small wonder they would be reluctant to let “some marketing guy” get in and build products. A recent article came up with 12(!) categories of tools that the pile consists of (the very fact that things like “Govern” or “Observe” are mentioned as categories, when they should be just features, is rather concerning). And, to be honest, the author still left at least a few “categories” unmentioned (user management, log management / telemetry, etc.). I like what Ethan Aaron wrote in his post, as it describes very well how this incredible complexity came into being by turning features into companies and companies into categories. And this complexity in turn leads to what Peter Jackson, the CDO at C&J, had in mind when he told me: “Data Mesh is like teenage sex; everyone is talking about it, everyone wants to do it but no one has a clue how.”

For a genuine data mesh, one would need a data stack that’s simple, cheap, and cooperation-friendly. As always, such a layman’s definition translates into a long list of daunting requirements:

Security and governance. Table stakes, nothing to add, except perhaps that the more complex the setup, the more potential vulnerabilities and the harder the governance.
Consistency across use cases. Users should be able to interact with the environment in a consistent way, rather than using different interfaces for different use cases (think “this data is in this database, this data is an Excel sheet… - you’re on your own” scenarios).
CI/CD support, versioning and collaboration. There must be clear change tracking and a clear deployment policy.
Measurability of usage. The ability to assign infrastructure cost to use cases/products makes it possible to calculate ROI on various activities, rather than having a black hole of IT / data team budget.
Ease of use / short learning curve. Limited need for special or broad skills (if every marketing analyst needs to be an expert in 7 technologies on top of their domain expertise, it will be very hard to find and hire them).
Support of data product ownership and data lineage. Trust is an essential component of Data Mesh, and knowing where and whom the product is coming from goes a long way toward establishing it.
Transparency and auditability. These further support governance and trust, and also make it much easier to administer the whole system.
Manageable administration complexity and overhead cost. An overly complex architecture is a major barrier to success with data mesh.
Scalability of the data stack. The ability to start small (such as our team-of-one example) and grow big (enterprise), all on the same stack.
Built-in version control. This makes data products (and changes to them) independent of the end use case, thus avoiding the traditional IT bottleneck of “change complexity”.
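The ownership and lineage requirement in the list above can be illustrated with a small sketch. It assumes a hypothetical in-memory catalog in which every data product records its owner and its direct upstream sources; the CatalogEntry class, the trace_lineage helper, and the product names are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: who owns a product and what it builds on."""
    name: str
    owner: str
    upstream: tuple[str, ...] = ()


def trace_lineage(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Return every upstream dependency of `name`, nearest first, no repeats."""
    seen: list[str] = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.append(current)
        entry = catalog.get(current)
        if entry is not None:
            queue.extend(entry.upstream)
    return seen


# Made-up products and owners, purely for illustration.
entries = [
    CatalogEntry("crm_contacts", owner="sales"),
    CatalogEntry("web_orders", owner="e-commerce"),
    CatalogEntry("customer_360", owner="marketing",
                 upstream=("crm_contacts", "web_orders")),
    CatalogEntry("churn_scores", owner="data-science",
                 upstream=("customer_360",)),
]
catalog = {entry.name: entry for entry in entries}

lineage = trace_lineage(catalog, "churn_scores")
owners = {name: catalog[name].owner for name in lineage}
```

With the lineage and the owners in hand, a consumer of churn_scores can see which teams stand behind the data, which is the kind of trust signal the requirement is after.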

Against this backdrop, most companies considering ways to reach data mesh find they lack the technical maturity required to implement their desired use cases. As Mohammad put it in our conversation, “organizations have to evaluate their ability to sustain complex architectures, particularly from the perspective of governance structures, ground-level skills and literacy and the formality of data processes. A data mesh is a complicated architectural pattern to maintain, and so requires extensive maturity across all the layers of the organization.” Speaking of use cases, he added that “organizations need to carefully consider as part of the data strategy which types of use cases will need to be supported by their strategic architecture today, and which use cases are longer-term bets.” Spot on, in the realm of the modern data stack.

Luckily, there is a data stack for any user including the most demanding one, regardless of their internal competence level. Enter the third type.

Data stack as a service, a.k.a. the holy grail?

In September 2021, at a data mesh meetup in Prague, Ilja Volf from the leading European e-commerce operator Mall Group discussed their experience with implementing data mesh, and the effect it had on use case delivery for their business. In his talk, he described the whole journey from a team of “5 engineers and more platforms than we had people” to supporting not only consolidated reporting and analysis but also high-value data products such as dynamic pricing, with nearly a hundred users across the organization actively building data products. The key to their success? Simplification, automation and focus on outcomes rather than features. We are proud to have supported Mall Group with our data stack as a service, Keboola, on their journey.

Data mesh is not just about data; it is about organizational change and mindset. A complete data stack as a service, Keboola greatly reduces both complexity and time-to-value. With no disparate services to handle various parts of the stack, Keboola removes centralized IT bottlenecks and democratizes not just the access to data products but their full life cycle (including development, maintenance and monitoring). Because of the resulting labor and talent savings, the total cost of ownership is a fraction of what it costs to run a functionally equivalent modern data stack. And Keboola’s DNA lends itself directly to supporting a data mesh pattern of collaboration:

A project-based structure for domain ownership
Automated infrastructure handling and management for true tool democratization
A Data Product Catalog as the means of publishing data products within the organization, with the ability to easily build on them further
A centralized telemetry data feed that does away with the need for third-party logging and audit tools and also contains the data needed for cost allocation and monitoring by job, workflow, use case, project or user
SOC2 certification for peace of mind over the whole stack
Development Branches and Keboola as Code for seamless integration into any data-ops policy framework
Use of standard tools (e.g., SQL or Python) to implement the required business logic, and a consistent environment to shorten the learning curve
A framework and library of Keboola Templates to further accelerate development and delivery of use cases
APIs and an open architecture for limitless expansion, extension and integration
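To illustrate the cost-allocation idea behind the telemetry item above, the following sketch rolls job-level usage up by project and by user. It does not use Keboola's actual telemetry schema or API: the record fields, the compute_units measure, and the price are assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical telemetry records; field names and numbers are illustrative only.
jobs = [
    {"project": "marketing", "user": "ana", "compute_units": 4.0},
    {"project": "marketing", "user": "ben", "compute_units": 1.5},
    {"project": "pricing", "user": "ana", "compute_units": 6.5},
]


def allocate_cost(records, key, price_per_unit):
    """Sum the cost of all job records, grouped by the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["compute_units"] * price_per_unit
    return dict(totals)


# The price per unit is an assumed figure, not a real tariff.
cost_by_project = allocate_cost(jobs, "project", price_per_unit=2.0)
cost_by_user = allocate_cost(jobs, "user", price_per_unit=2.0)
```

The same grouping works for any other dimension present in the feed, such as job, workflow or use case, and dividing the value a product delivers by its allocated cost gives a rough ROI per data product.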

Think of the old analogy of a swan moving gracefully across the water while its feet paddle away furiously beneath the surface. In our case, the stack does an awful lot of work behind the scenes to reduce complexity and keep the UI and the processes involved simple enough for anyone with basic SQL. (Thank you, Graham Sharpe, for this analogy.) By drastically reducing complexity and relentlessly automating many of the tasks that would otherwise require deep technical skills, Keboola makes the whole stack accessible, and domain ownership possible, for a much broader audience.

Whether or not Keboola is the Holy Grail, if you are thinking about data mesh and how to implement it, in whole or in part, taking a closer look at Keboola would be worth your time. It certainly was for Mall Group and many others.

This blog post was originally published by Milan Veverka as a LinkedIn article.
