Skip to content
vollko
Main
Homepage Engineering Transformation Whitepaper OSS catalog
The trace · deep dives
01 · sense
sensing-ingestion
02 · substrate · memory & identity
knowledge-graphs agent-memory agent-identity observability
03 · cognition · the firm thinks
agent-frameworks orchestration eval-harness protocols
04 · trust + learning
governance feedback-loops
05 · synthesis · one trace
end-to-endStart a conversation
AI-native · substrate

Every call traced.

OTel GenAI spans, cost attribution, hallucination flags, byte-deterministic replay. The agent's flight recorder.

workflow.escalation_v3 · 4.3s retrieve.context · 80ms vector.search graph.walk llm.call · gpt-4o · 3.1s · $0.024 · 2,180 tok tool.policy.check tool.send_email · 920ms eval.outcome_capture · 12min later CORRELATION_ID = corr_t88241links all spans across days METRICS rolled up per span cost: $0.024latency p95: 4.3stokens: 2,180tools: 4hallucination heuristic: 0.08policy_blocks: 0
Section 01 · OTel GenAI

One wire format, any backend.

every agent call · six numbers that matter
EVERY AGENT CALL · 6 NUMBERS WHO RAN IT customer-triage agent WHAT IT COST $0.024 HOW LONG 3.14 seconds HOW MUCH THINKING 2,180 tokens WAS IT FLAGGED no · 8% suspicious WAS IT BLOCKED no · 0 policy stops OPEN STANDARD · works with Datadog, Honeycomb, Phoenix, Langfuse, Logfire pick a tool by query power · switch any time · no vendor lock-in
Record once. Read anywhere. The vendor becomes a query interface, not a cage.
Section 02 · cost attribution

$ per span, $ per agent.

7-day cost breakdown · one agent
$ / DAY MTWTFSS spike: route to cheap model LLM calls retrieval tool side-effects
Cost is a span attribute. Alert on rising cost like you alert on rising latency.
Section 03 · byte-deterministic replay

Reproduce the exact behavior.

scroll + seed = same output every time
agent-scrollbyte-deterministic transcript [01] system: ...[02] user: "draft reply"[03] tool.search(q="...")[04] tool.search.result[05] llm.call(...)[06] llm.response: ...[07] tool.send_email(...)SHA: bafkrei4f... agent-rerunreproducibility seeds prompt_sha: bf41...model: claude-4-5temperature: 0.0seed: 42tool_seeds: { search: "fixed:42", rng: "fixed:42"} REPLAY → same prompt→ same retrieval→ same LLM call→ same response bytesdebug in safetyaudit in court
Without deterministic replay, every agent bug is a one-time event. With it, every bug is a test case.
Section 04 · drift on the response

Watch distributions, not metrics.

distribution drift on output length
REFERENCE (week 1)PRODUCTION (week 8) tokens out per call 2002000 2002000 bimodal · new mode emerging refusal-rate, output-length, embedding distribution · drift before semantics
A single average won't catch this. A KS test on the distribution will, the day it emerges.
Section 05 · tools 2026

Where the OSS picks land.

OTel-native ↑↓ proprietary · OSS ←→ SaaS
SaaSOSSOTel-nativeproprietary OpenLLMetryauto-instruments 40+ providers Arize PhoenixOSS + OTel-native LangfuseOSS core (now ClickHouse-owned) Logfirefree tier 10M spans/mo · Pydantic Honeycombfirst-class OTel GenAI May 2026 Datadog LLM Obs Heliconeskip · maintenance mode (Mintlify) agent-scroll + agent-rerunvollko · replay primitives
Pick OTel-native + OSS. Add a hosted backend if your team prefers a UI. Avoid proprietary wire formats.
Section 05b · transcript + replay

Two implementations. Same bytes.

tamper-proof transcript + the exact recipe to run it again
TAMPER-PROOF TRANSCRIPT RAW TRANSCRIPTfrom any vendor (OpenAI · Anthropic · Google) NORMALIZEDsame shape no matter who recorded it SEALED & SIGNEDany change is detectable · even years later THE RECIPE TO RUN IT AGAIN REPRODUCIBILITY BUNDLE MODEL same vendor, same version SETTINGS same knobs (creativity, randomness) INPUT exactly what the agent saw EXPECTED OUTPUT what should come back TOLERANCE exact · close-enough · same meaning SIGNATURE tamper-proof
A scroll proves what was said. A rerun-bundle proves it can be reproduced. Together: dispute-grade evidence.
Section 06 · vollko OSS

The primitives.

· · ·
Build the AI-native firm