Milan Veverka's thoughts and learnings on the future of the data stack.
If an organization is to be productive, it needs to work with the right data at the right times exactly where it needs them, across the whole structure and in each of its departments.
Zhamak Dehghani made a very persuasive case for such a capability in her seminal 2019 article, and I, unaware of her contribution then, made a similar case in a much less seminal blog on the “Human API” of the data stack. We gave the underlying concept different names: she called it domain ownership; I, data democracy. Today, few people care to make the distinction, and a single catch-all term, data mesh, has been adopted for the philosophy. The term has been steadily gaining in popularity, picking up speed in late 2021. A day doesn’t pass without at least one data-meshy post in my LinkedIn feed, and lately there has barely been a data conference without data mesh as a topic.
Yet, for all this hype, data mesh seems to remain an elusive dream. Indeed, when I spoke to Mohammad Syed, Lead Data Strategist at Carruthers and Jackson, in late 2021, we agreed that data mesh as we understand it is not unlike the north star: a direction to follow, but too distant a goal to actually reach. At least for most.
This is because data mesh is a product of the infrastructure and technology that ensures the extraction, routing, security and delivery of information. Regardless of how the data function(s) are organized, there is infrastructure and technology that everything runs on. Known as the data stack, its overall architecture and composition directly dictate what one can do with it, and with the organization. To create data mesh, you need a top-notch data stack. It is as simple (or complex, as we will see shortly) as that.
Data stacks come in three types: simple, modern, and … Until not so long ago, many would have said utopian. Not any more. But let’s first check the first two.
Imagine the simplest possible setup: some kind of extraction tool, landing data in a database, and a BI tool on top of it. Works great if there aren’t too many data sources, the data is not very complicated and the data engineering team is small. Like, one person small. Probably part-time. Anything beyond that, and we will start running into issues, and concepts like domain ownership are just a hallucination at this point (unless, of course, all the domains are owned by the same person). A natural start, good enough for the simplest of use cases, but a complete headache for most real companies.
Enter the next evolutionary step: a number of disparate tools, each focusing on a small sliver of the required functionality. By way of reminder, each data product represents a value chain with at least three steps: the data needs to come from somewhere, something needs to happen to it, and it needs to be delivered somehow. In reality that means at least two technologies the product owner needs access to and expertise in, apart from their domain knowledge. Sounds simple enough, right? And it is, as long as all you offer are very simple products.
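To make the three-step value chain concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the `DataProduct` class, the sample orders, the `revenue_by_region` transformation); it does not represent any particular tool. In a real stack, each step would typically live in a different technology.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict  # one record; a plain dict keeps the sketch simple


@dataclass
class DataProduct:
    """Hypothetical data product: a value chain of three steps."""
    name: str
    extract: Callable[[], Iterable[Row]]                  # data comes from somewhere
    transform: Callable[[Iterable[Row]], Iterable[Row]]   # something happens to it
    deliver: Callable[[Iterable[Row]], None]              # it is delivered somehow

    def run(self) -> None:
        # Chain the three steps in order.
        self.deliver(self.transform(self.extract()))


def extract_orders() -> list:
    # Stand-in for an extraction tool; the sample rows are made up.
    return [
        {"region": "EU", "amount": 120.0},
        {"region": "EU", "amount": 80.0},
        {"region": "US", "amount": 50.0},
    ]


def revenue_by_region(rows: Iterable[Row]) -> list:
    # Stand-in for a transformation step: sum amounts per region.
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return [{"region": r, "revenue": t} for r, t in sorted(totals.items())]


delivered = []  # stand-in for a BI tool or downstream consumer
product = DataProduct(
    name="revenue_by_region",
    extract=extract_orders,
    transform=revenue_by_region,
    deliver=delivered.extend,
)
product.run()
```

Even in this toy form, the product owner has to understand three moving parts besides the domain itself, which is the point of the paragraph above.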
In reality, there are many more technologies involved, with a complex web of interconnections, held together by the sheer will of the data engineering team(s). Small wonder they would be reluctant to let “some marketing guy” get in and build products. A recent article came up with 12(!) categories of tools that the pile consists of (just the fact that things like “Govern” or “Observe” are mentioned as categories when they should be just features is rather concerning). And, to be honest, the author still left at least a few “categories” unmentioned (user management, log management / telemetry etc. etc.). I like what Ethan Aaron wrote in his post, as it very well describes how this incredible complexity came into being by turning features into companies and companies into categories. And this complexity in turn leads to what Peter Jackson, the CDO at C&J, had in mind when he told me: “Data Mesh is like teenage sex; everyone is talking about it, everyone wants to do it but no one has a clue how.”
For a genuine data mesh, one would need a data stack that’s simple, cheap, and cooperation-friendly. As always, such a layman's definition translates into a long list of daunting requirements:
Against this backdrop, most companies, in considering ways to reach data mesh, find they lack the technical maturity required to implement their desired use cases. As Mohammad put it in our conversation, “organizations have to evaluate their ability to sustain complex architectures, particularly from the perspective of governance structures, ground-level skills and literacy and the formality of data processes. A data mesh is a complicated architectural pattern to maintain, and so requires extensive maturity across all the layers of the organization.” Speaking of use cases, he added that “organizations need to carefully consider as part of the data strategy which types of use cases will need to be supported by their strategic architecture today, and which use cases are longer-term bets.” Spot on, in the realm of the modern data stack.
Luckily, there is a data stack for any user including the most demanding one, regardless of their internal competence level. Enter the third type.
In September 2021, at a data mesh meetup in Prague, Ilja Volf from leading European e-commerce operator Mall Group discussed their experience with implementing data mesh, and the effect it had on use case delivery for their business. In his talk, he described the whole journey from a team of “5 engineers and more platforms than we had people” to supporting not only consolidated reporting and analysis, but also providing high-value data products such as dynamic pricing, with nearly a hundred users across the organization actively building data products. The key to their success? Simplification, automation and focus on outcomes rather than features. We are proud to have supported Mall Group with our data stack as a service, Keboola, on their journey.
Data mesh is not just about data; it is about organizational change and mindset. A complete data stack as a service, Keboola greatly reduces both complexity and time-to-value. With no disparate services to handle various parts of the stack, Keboola decentralizes IT bottlenecks, and democratizes not just the access to data products but to their full life-cycle (including development, maintenance and monitoring). Because of the resulting labor and talent savings, the total cost of ownership is a fraction of what it costs to run a functionally equivalent modern data stack. And Keboola’s DNA lends itself directly to supporting a data mesh pattern of collaboration:
Think of the old analogy of a swan moving gracefully across the water while its feet paddle away furiously beneath the surface. In our case, the stack does an awful lot of work behind the scenes to reduce complexity and ensure the UI and processes involved are simple enough for anyone with basic SQL. (Thank you, Graham Sharpe, for this analogy.) By drastic reduction of complexity and relentless automation of many of the tasks that would otherwise require deep technical skills, Keboola makes the whole stack accessible and domain ownership possible to a much broader audience.
Whether Keboola is the Holy Grail or not, if you are thinking about data mesh and how to implement it as a whole or even partially, taking a closer look at Keboola would certainly be worth your time. It certainly was for Mall Group and many others.
This blog post was originally published by Milan Veverka as a LinkedIn article.