You built the AI layer. You also built your own prison.

To the data scientist who shipped first.
You wired Claude into Databricks. You wrote a skill — client revenue, margin breakdown, pipeline health. You connected it through MCP. You showed your CFO. They watched it answer in 12 seconds a question that used to take two days in Excel. They told the founder. The founder told the board. You are, briefly, the most exciting person in the company.
You should be. You shortened your company's AI timeline by 12 months.
“We had a thousand-flowers-blooming season. Everyone had Python, skills, agents running on their laptop. Beautiful, unmaintainable, dangerous. Now we have to operationalize that.” — CFO, large pharma, San Francisco
Months 0-6, hero. Months 6-18, maintainer. By month 24, the company is quietly evaluating commercial platforms against the thing you built. You will be in those conversations. This is how to avoid that.
What you actually built
Proof. The skill, the CLI, the MCP wiring is the prototype. The real win is that your company is now willing to act on AI-generated answers about financial systems of record. That is bigger than the code.
What you did NOT build
Audit trail. Every action invisible to compliance. SOX if public, HIPAA in healthcare, insider-trading exposure on planning data. The auditor's question — “who saw what, when, with whose authorization” — has no answer. The walkthrough ends badly.
Access control. Audit covers “who saw what.” It does not cover “who was authorized.” Today, anyone with the CLI runs any skill against any dataset. The first compliance interview fails.
Semantic layer. Every query ships raw schema to the LLM — columns, tables, sometimes sample rows. You are paying 10x-100x token cost to re-teach the model your business on every call. Invisible at one user. Visible on the bill at forty.
Skill governance. Eight skills in a Git repo with CI works. Fifty skills with multiple owners does not. When you change “client revenue,” what propagates to the analyst running an older version from her laptop? You are building a marketplace whether you call it that or not.
Sandbox. Today: edit and pray. “Use Docker” is not the answer — containerizing a skill does not isolate NetSuite, Snowflake, or your planning system. A real sandbox is an isolated copy of the whole pipeline — data, connections, credentials, versions. Without it, every prompt is a deployment.
Second-eye approval. Ask the AI to “adjust this number to align with the forecast,” and the LLM will do it. That is now a permanent change your auditor sees. “Do not modify without confirmation” in the prompt is a hope, not a control. Stochasticity is the architecture.
Monitoring AND AI observability. Yes, your company has Datadog. It is owned by IT and it answers “is the API up.” It does not answer “did Claude hallucinate the Q3 revenue number.” Prompt drift, hallucination rate, tool-call failure rate, output correctness — different layer entirely. Nobody owns it because nobody built it.
Data quality and tests. Monitoring tells you the pipeline is up. It does not tell you the data is right. When NetSuite silently drops 3% of journal entries, nobody catches it until a CFO sees a wrong number. Same for your skills — schema or model shifts and “client revenue” pulls from a dead column into a board pack.
Model lock-in. Your skills are Claude-shaped. Claude 5 breaks prompts, or the company evaluates Gemini — you rewrite everything.
Each one is, in isolation, a project. Together they are a platform. You are about to spend the next 18 months building one.
Why the remaining 30% is harder than the first 70%
The MVP gets you 70% of perceived value in 5% of effort. The company assumes the remaining 30% costs 15% more effort. It does not — that 30% is compliance, governance, scale, reliability, all of which compound.
Audit is binary. Immutable log every auditor accepts, or nothing. There is no “we'll improve next quarter.”
Token economics flip. Your per-query cost stays flat while usage grows 40x, so the bill grows 40x. A system with a semantic layer gets cheaper per query at scale. Your CFO assumes the falling curve. You are on the flat one.
Skill burden compounds. Twenty skills is a job. Sixty is a team. Every LLM update, every schema change is a small fire when there is no test layer.
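The token economics point above can be made concrete with a toy cost model. Every number here is an illustrative assumption, not a measurement: the token price, the context sizes (a 20x gap, inside the 10x-100x range mentioned earlier), the fixed monthly cost of maintaining a semantic layer, and the query volume per user. The function names are hypothetical too.

```python
# All constants are assumptions chosen for illustration only.
PRICE_PER_1K_TOKENS = 0.01      # assumed blended price in dollars
RAW_CONTEXT_TOKENS = 20_000     # raw schema and sample rows shipped on every call
SEMANTIC_CONTEXT_TOKENS = 1_000 # compact business definitions instead of raw schema
SEMANTIC_FIXED_MONTHLY = 50.0   # assumed monthly upkeep of the semantic layer
QUERIES_PER_USER = 500          # assumed queries per user per month


def raw_cost_per_query(context_tokens: int, price_per_1k: float) -> float:
    """Cost per query when the full schema is re-sent each time: flat at any volume."""
    return context_tokens / 1000 * price_per_1k


def semantic_cost_per_query(context_tokens: int, price_per_1k: float,
                            fixed_monthly: float, queries_per_month: int) -> float:
    """Cost per query when a fixed upkeep cost is amortized over all queries."""
    return context_tokens / 1000 * price_per_1k + fixed_monthly / queries_per_month


for users in (1, 40):
    queries = users * QUERIES_PER_USER
    raw = raw_cost_per_query(RAW_CONTEXT_TOKENS, PRICE_PER_1K_TOKENS)
    sem = semantic_cost_per_query(SEMANTIC_CONTEXT_TOKENS, PRICE_PER_1K_TOKENS,
                                  SEMANTIC_FIXED_MONTHLY, queries)
    print(f"{users:>2} users: raw ${raw:.4f}/query (${raw * queries:,.0f}/mo), "
          f"semantic ${sem:.4f}/query (${sem * queries:,.0f}/mo)")
```

Under these assumptions the raw-schema cost per query never moves, so the monthly bill scales linearly with usage, while the semantic-layer cost per query falls as the fixed upkeep is spread over more queries.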
Open source covers parts of this. Langfuse for tracing and audit. dbt for the semantic layer. Git for versioning. Slack approval bots. Real tools. Stitching six of them together, owning operations, hardening for SOX: that IS the 18-month platform build. The OSS pieces are not the shortcut. They are the parts list.
One honest distinction: “good enough for Series B” is 3-6 months. SOX-grade for a $300M planning cycle is 18+ months. Your company does not stay at Series B grade once the CFO trusts the answers.
The exit ramp
What you built is right — for the question it was supposed to answer: “can AI on our data work.” It worked.
The exit ramp is: stop being the architecture.
- Position the MVP as validation. Internally: “I de-risked the AI investment. Here is the production-grade architecture we now need.” You are not abandoning the work; you are graduating it.
- Write the nine-bullet document. Audit, access control, semantic layer, skill governance, sandbox, approval, monitoring + observability, data quality + tests, model migration. Concrete plan against each.
- Build vs buy, honestly. Eighteen months, three engineers, $500K-$1M annualized — ending at 70% of what is already commercial. The opportunity cost is what your team does NOT ship that year.
- Pick a platform that keeps your work. Your Claude skills still run. Your CLI mental model ports. You become the platform's first power user.
- Return to being a data scientist. Strategy, new skill design, the LLM frontier — only available if you are not also running infrastructure.
A note to the manager (read this to your CFO)
If you manage the data scientist who built this — head of data, CFO, founder — there is a specific window: months 6 to 18 after the MVP ships. The tool appears to work from the outside while compliance asks questions you cannot answer, token costs accelerate, and your data scientist quietly takes recruiter calls.
Wait until month 18 and you build the platform while bleeding talent. Decide at month 6 and you build it while the team is still energized. Same destination, different cost.
This is my opinion, my vision, and how I have seen this play out at a dozen companies. The ask is one thing: make the list above and write your plan against each pillar. If the honest answer is “we'll build it,” put a timeline and a headcount number against it and read it out loud.
If you want to talk to someone who has built this six times, contact below. No demo. A conversation.
A companion piece is coming about what this costs you personally. The two problems are the same problem.
— Pavel Doležal, CEO Keboola