How to build a data stack as a one-person data army on a shoestring budget, starting with ETL.
You have started your data journey. You know you need to somehow collect data from various sources and land it in a data warehouse or data lake of some sort. Right now you're browsing tools and calculating costs: there's one tool for extraction, another for transformation, yet another for ETL. What if we told you there's a better way? One where you don't end up with disjointed and expensive apps, but can instead rely on a single platform as your data needs grow in complexity.
Read on to learn why an "I just need an ETL tool" approach is potentially dangerous, and what alternative Keboola offers.
In the data space, there is a tool (well, a multitude of tools) for literally everything. The current craze is the "modern data stack", or "best of breed", approach.
You often start with a database and an ETL tool, with the aim of eventually feeding a BI tool or platform of some kind. You may already have one or you may not yet; it doesn't matter, but kudos to you if you're thinking about how to organize the data before committing to a way to present it.
There are many tools that help you solve the initial problem: getting your data somewhere. Quite a few of them do a pretty good job of it, too. But they all share a problem: they don't set you up well for the future. What is the very next thing you will need to do? Chances are it will be a transformation of some kind, such as combining data from multiple sources or optimizing the data structures to fit your business model. Many "ETL" tools will abandon you at that moment, saying "OK, that's not us, you need to add something else". And you will have a new problem to solve.
The same scenario will repeat itself every time you want to add a new capability or function. What if the next transformation is better done in Python than in SQL? What will monitor your flows and notify you of any problems? Have you considered version control and collaboration across your stack? Pretty soon you will need to add tools not to extend your functionality, but to de-risk the resulting hodge-podge of tools. User management, data lineage, or any kind of log or audit trail will become integration projects in their own right, drawing focus and resources away from actual value creation. You'll need to hire more people just to keep the house of cards standing, and you carry the risk of the pieces no longer fitting together quite as well.
In her article, Kelly Burdine describes setting up a very simple data stack as a one-woman data army at Aula. Simple, that is, in terms of what it does: ETL for BI from 4 data sources. Kelly had done this before, so she hit fewer dead ends than a beginner would. It still took 3 months, as she needed to discover, select, purchase, set up and integrate 4 different tools (not counting the data sources themselves, of course, or the data destinations), and the stack now costs about $2k a month. And that is just the beginning!
What happens when you outgrow the initial ETL use case? The complexity grows dramatically. Adding data science capabilities, perhaps a data catalog, and collaboration tools as your team grows: all of these requirements force you to purchase, integrate, pay for and then maintain more and more tools. Then the complexity itself becomes a problem, which you will solve with (wait for it) more tools. User management, DataOps, and process and data flow monitoring are all categories that exist only to manage the complexity of the stick-it-together, hold-it-with-duct-tape data stack, not to add any actual value to the business.
Keboola doesn't offer just an ETL tool but a complete data-stack-as-a-service (DSaaS), for which ETL is just one use case (usually the first). It offers a range of integrations and ease of use matching or surpassing those of its competitors, while also providing a solid foundation for whatever you will need to build in the future. This means more productivity, more value delivered to the company, and significantly lower total costs at every stage of the journey.
Keboola ETL features:
"What about the cost?" you ask. "A complex tool like this must cost quite a bit…" Well, everything in Keboola is purely on an as-needed basis. You never pay for a feature you don't need, and you don't need a new contract to use a new feature. In fact, you can start without a contract, without a credit card, for free. At any point in your journey, we can guarantee that the cost of ownership will be lower than the alternative, especially once you include the productivity gains and the elimination of administrative, non-value-producing tasks.
Check out what Clarice did at Compology, or Matus at Productboard. They didn't bother trying to reinvent the wheel and build a data stack from scratch: they went with a proven solution in Keboola, which is essentially a data-stack-as-a-service. The results are clear: rapid development and a small, agile data team (Clarice is about to hire her first helper; interested?).
Start for free today, upgrade only as and when you need to, never pay for what you don't use, and never feel the need to replace your stack and start from scratch again.