Data Streams · OpenTelemetry
Your app telemetry, next to your business data.
One OTLP endpoint. Three tables — logs, metrics, traces. Query them with the rest of your warehouse in SQL.
Why this matters
Telemetry is only half the question. The other half is revenue.
Datadog and Honeycomb tell you what went wrong. Joining a trace with an order tells you what it cost — and that join only works where the business data already lives.
Correlate failures with revenue
Join `traces` with `orders` by trace_id. Find out which 500 errors cost you a conversion — and which ones nobody noticed.
Measure LLM cost per outcome
Token usage in `metrics`, conversions in your business tables. One SQL query answers "what did this AI feature actually cost us per converted user?"
Quantify deploy impact
Tag every span with `deployment_environment` and a build SHA. Pinpoint the deploy that dropped checkout conversion 4%, not the one a Slack thread blamed.
Skip the export pipeline
No two-day Datadog → S3 → warehouse ETL. Telemetry lands in Storage within 15 seconds of leaving your app.
How it fits
A standard pipe.
A SQL-shaped destination.
Nothing custom on your side — vanilla OTel SDKs and OTLP. On our side, the same Data Streams engine that handles 140K events/sec, shaped for OpenTelemetry payloads.
Instrumentation
OTel SDK
Standard OpenTelemetry SDKs for Python, Node, Go, Java, .NET, Rust.
Standard protocol
OTLP endpoint
OTLP/HTTP over `http/protobuf` — the same exporter you point at Datadog.
Real-time ingest
Data Streams
Same engine as Keboola's HTTP streams — sub-15s to Storage.
Joinable in SQL
Storage tables
Three auto-created tables — logs, metrics, traces — with typed columns.
Keboola Storage
logs · metrics · traces
Three SQL-queryable tables, pre-extracted columns, and the same trace_id that travels with your spans — ready to JOIN with the orders, sessions, and users you already store.
Standards-based
Vanilla OTLP — no custom SDK, no lock-in
Pre-extracted
service, severity, trace_id, deployment_environment as columns
Joinable in SQL
trace_id is the foreign key between telemetry and business data
Three signals
Pre-shaped tables. No unwrapping JSON.
Each signal lands in its own table with the columns you'll actually query promoted to top-level — no json_extract gymnastics.
Signals
Logs
Application events, errors, warnings, debug. Severity, service, and trace_id are top-level columns so you can SELECT and JOIN without unwrapping JSON.
Pre-extracted columns
- timestamp
- severity
- service
- trace_id
- body
- deployment_environment
- host_name
- k8s_pod_name
-- Error count by service, last hour
SELECT
  service,
  COUNT(*) AS errors
FROM logs
WHERE severity = 'ERROR'
  AND timestamp > NOW() - INTERVAL '1h'
GROUP BY service
ORDER BY errors DESC;
Setup
Two env vars. That's the install.
Point your existing OTel SDK or Collector at the endpoint we generate. No custom packages, no auth dance.
Create a Data Stream
Copy the endpoint URL
Set the env vars
export OTEL_EXPORTER_OTLP_ENDPOINT="<your-stream-endpoint>"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_SERVICE_NAME="checkout"
# pip install opentelemetry-exporter-otlp-proto-http
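To populate the `deployment_environment` column and carry a build identifier, one option is the standard `OTEL_RESOURCE_ATTRIBUTES` variable. This is a minimal sketch, assuming the column is filled from the `deployment.environment` resource attribute and that `build_sha` is the key you query later. Both key names and the sample values are assumptions, not confirmed mappings.

```shell
# Hypothetical values; replace with your own environment name.
DEPLOY_ENV="production"

# Use the current git commit as the build identifier, or "unknown"
# when git is unavailable or this is not a repository.
BUILD_SHA="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"

# OTEL_RESOURCE_ATTRIBUTES is the standard OTel SDK variable for
# resource attributes: comma-separated key=value pairs.
# The key names below are assumptions about how columns are mapped.
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=${DEPLOY_ENV},build_sha=${BUILD_SHA}"

echo "$OTEL_RESOURCE_ATTRIBUTES"
```

Set this alongside the variables above, before the app starts, so the SDK reads it at initialization.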
In practice
Three queries you couldn't write before.
Each runs against the three signal tables joined with your business data — no custom export, no warehouse round-trip.
What did this AI feature cost per conversion?
Join token-usage `metrics` with conversion events. Get cost-per-converted-user, not cost-per-call.
WITH tokens AS (
  SELECT
    attributes:user_id::TEXT AS user_id,
    SUM(value) AS total
  FROM metrics
  WHERE metric_name = 'llm.tokens.total'
  GROUP BY 1
)
SELECT
  COUNT(c.user_id) AS converted,
  AVG(t.total) * 0.000002 AS cost_per_conv_usd
FROM conversions c
JOIN tokens t ON t.user_id = c.user_id
WHERE c.converted_at > NOW() - INTERVAL '7d';
Did this morning's deploy drop conversion?
Spans carry `deployment_environment` and a build SHA. Bucket conversion by deploy and spot the regression.
SELECT
  t.attributes:build_sha::TEXT AS build,
  COUNT(DISTINCT o.order_id) AS orders,
  AVG(o.amount) AS avg_order,
  SUM(CASE WHEN o.status = 'failed' THEN 1 ELSE 0 END) AS failures
FROM traces t
JOIN orders o ON o.trace_id = t.trace_id
WHERE t.operation = 'checkout.submit'
  AND t.start_time > NOW() - INTERVAL '24h'
GROUP BY 1
ORDER BY orders DESC;
Which errors hurt our highest-LTV customers?
Severity-filtered logs joined with user LTV. Stop optimizing for noise — fix the errors your best customers hit.
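SELECT
  service,
  error,
  COUNT(*) AS users,
  SUM(lifetime_value) AS at_risk_revenue
FROM (
  -- One row per user per error, so each user's LTV is counted once
  -- even when that user produced many matching log lines.
  SELECT DISTINCT
    l.service,
    l.body AS error,
    u.user_id,
    u.lifetime_value
  FROM logs l
  JOIN sessions s ON s.trace_id = l.trace_id
  JOIN users u ON u.user_id = s.user_id
  WHERE l.severity = 'ERROR'
    AND l.timestamp > NOW() - INTERVAL '7d'
) AS affected
GROUP BY 1, 2
ORDER BY at_risk_revenue DESC
LIMIT 20;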
And your APM?
Keep it. This runs in parallel.
We're not asking your SREs to switch tools. Use the OpenTelemetry Collector to fan out — same exporter, multiple destinations. Datadog keeps the dashboards; Keboola gets the joins.
Not a Datadog replacement
Keep your dashboards and alerts where they already work. This pipe runs in parallel — a copy of telemetry for the data team, not a replacement for SRE tooling.
OTLP standard, no lock-in
Configure the OpenTelemetry Collector once. Fan out to Datadog, Honeycomb, Grafana, and Keboola — same exporter, multiple destinations.
Lives where business data lives
Orders, sessions, conversions, LTV — already in your Keboola Storage. That's the join target. APMs can't get there; OTel into Keboola can.
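The fan-out described above could look like the following Collector setup. This is a sketch under assumptions: the exporter names `otlphttp/keboola` and `otlphttp/apm`, the two endpoint variables, and the file name are hypothetical, and your APM may need its own vendor exporter rather than plain OTLP/HTTP.

```shell
# Hypothetical endpoints; substitute your own.
export KEBOOLA_OTLP_ENDPOINT="<your-stream-endpoint>"
export APM_OTLP_ENDPOINT="<your-apm-otlp-endpoint>"

# Write a Collector config that receives OTLP/HTTP from your apps and
# sends every signal to both destinations. The heredoc is quoted so the
# ${env:...} references are resolved by the Collector, not by the shell.
cat > otel-collector.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlphttp/keboola:
    endpoint: ${env:KEBOOLA_OTLP_ENDPOINT}
  otlphttp/apm:
    endpoint: ${env:APM_OTLP_ENDPOINT}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/keboola, otlphttp/apm]
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/keboola, otlphttp/apm]
    logs:
      receivers: [otlp]
      exporters: [otlphttp/keboola, otlphttp/apm]
EOF
```

Start the Collector with `--config otel-collector.yaml`; the binary name (`otelcol`, `otelcol-contrib`, or a vendor build) depends on the distribution you run. Apps then point their OTLP endpoint at the Collector instead of at either backend directly.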
Frequently Asked Questions
The endpoint accepts OTLP/HTTP with the `http/protobuf` protocol. Any official OpenTelemetry SDK works; we've verified Python, Node.js, Go, Java, .NET, and Rust. The OpenTelemetry Collector is supported via its `otlphttp` exporter, so you can keep your existing Collector and just add Keboola as one more destination.
Ready to put telemetry next to the data that matters?
Spin up a Data Stream with the OpenTelemetry source, paste the endpoint into your SDK, and the first JOIN works inside a coffee break.