AI-native · layer 01

Read the world cleanly.

Six sensor families. VLM doc parsing, CDC-to-vector, event streams, schema registries. The substrate's intake.

Sources: PHYSICAL · TELEMETRY · MARKET · DOCUMENTS · HUMAN · TRIGGERS
ROUTER: normalize · tag · key
Lanes: STREAM (ms-to-s) · EVENT (trigger-based) · BATCH (min-to-day)
Section 01 · six sensor families

Documents are not enough.

six families · each with its own ingestion pattern
01 · PHYSICAL: machines on the floor. Sensors read live. temperature · vibration · flow
02 · TELEMETRY: what software is doing. Every action recorded. logs · traces · queue depths
03 · MARKET: the outside world. Checked on schedule. prices · news · regulations
04 · DOCUMENTS: contracts, emails, PDFs. Read as if a person skimmed them. Tables, headings, signatures kept.
05 · HUMAN: tickets, chat, calls. Private info stripped first, before any agent touches them.
06 · TRIGGERS: 'something just happened'. Rules in Git, not in heads. thresholds · schedules · anomalies

"AI projects" usually only build #04 (documents). The other five are where the leverage hides.
Six families. Each has its own ingestion pattern and its own failure mode.
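The router step (normalize · tag · key, then pick a lane) can be sketched roughly as below. This is a minimal illustration, not vollko's implementation: the `Envelope` type, the `route` function, and the family-to-lane defaults are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

FAMILIES = {"physical", "telemetry", "market", "documents", "human", "triggers"}

# Hypothetical default lane per family; a real deployment would tune this.
DEFAULT_LANE = {
    "physical": "stream",   # ms-to-s
    "telemetry": "stream",
    "market": "batch",      # min-to-day, checked on schedule
    "documents": "batch",
    "human": "event",       # trigger-based
    "triggers": "event",
}


@dataclass(frozen=True)
class Envelope:
    family: str
    source: str
    key: str          # stable key used for partitioning and dedup
    lane: str
    tags: tuple
    received_at: str
    payload: dict


def route(family, source, entity_id, payload, now=None):
    """Normalize the family name, tag the record, and derive its key and lane."""
    family = family.strip().lower()
    if family not in FAMILIES:
        raise ValueError(f"unknown sensor family: {family!r}")
    now = now or datetime.now(timezone.utc)
    lane = DEFAULT_LANE[family]
    return Envelope(
        family=family,
        source=source,
        key=f"{family}:{source}:{entity_id}",
        lane=lane,
        tags=(family, lane),
        received_at=now.isoformat(),
        payload=payload,
    )
```

Keeping the key derivation in one place means every downstream lane sees the same identity for the same entity, whichever family it arrived through.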
Section 02 · document parsing 2026

An AI reads the page like a person would.

documents arrive already understood, not just scanned
PDF / SCAN → AI READER (reads in one pass · 98% accuracy on tables · structured output) → STRUCTURED PAGE (tables, headings, lists kept in place) → substrate
chunks → vector store
entities → knowledge graph
lineage → episodic log
Old stack: scan then guess the layout. New stack: one model reads the whole page at once.
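One way to picture the "structured page → chunks" step: once the reader returns typed blocks, chunking can respect headings and keep tables whole instead of cutting on character counts. The sketch below assumes a hypothetical `Block` shape (`kind`, `text`); it is not the output format of any particular parser.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str   # assumed values: "heading", "paragraph", "list", "table"
    text: str


def chunk_page(blocks, doc_id, max_chars=800):
    """Group blocks into chunks: tables stay whole, text is grouped per heading."""
    chunks = []
    heading = ""
    buffer = []

    def flush():
        nonlocal buffer
        if buffer:
            chunks.append({"doc_id": doc_id, "seq": len(chunks), "heading": heading,
                           "kind": "text", "text": "\n".join(buffer)})
            buffer = []

    for block in blocks:
        if block.kind == "heading":
            flush()                      # close the chunk under the previous heading
            heading = block.text
        elif block.kind == "table":
            flush()
            chunks.append({"doc_id": doc_id, "seq": len(chunks), "heading": heading,
                           "kind": "table", "text": block.text})
        else:
            size = sum(len(t) for t in buffer)
            if buffer and size + len(block.text) > max_chars:
                flush()
            buffer.append(block.text)
    flush()
    return chunks
```

The `doc_id` and `seq` fields are a simple stand-in for lineage: each chunk can be traced back to its position in the source page.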
Section 03 · CDC into vector store

Change data capture - embeddings stay fresh.

Debezium 3.5 SparseVector cross-connector (Mar 2026)
POSTGRES (source of truth: orders, customers, etc.) → DEBEZIUM (WAL → events · exactly-once) → EMBED WORKER (on insert/update: re-embed changed row) → VECTOR STORE (pgvector / Qdrant · always fresh)

Never re-embed the full corpus. Just the rows that changed.
For Postgres-native shops, this is logical replication - free with the database.
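A minimal sketch of the embed worker, assuming it receives Debezium-style change events as plain dicts with `op`, `before`, and `after` fields (the unwrapped envelope, without the schema wrapper). The `EmbedWorker` class, the `description` column, and the `fake_embed` function are hypothetical stand-ins; an in-memory dict plays the role of pgvector or Qdrant.

```python
import hashlib


def fake_embed(text):
    """Stand-in for a real embedding model: deterministic 8-float vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:8]]


class EmbedWorker:
    def __init__(self, embed, text_field="description", key_field="id"):
        self.embed = embed
        self.text_field = text_field
        self.key_field = key_field
        self.vectors = {}      # stand-in for the vector store
        self.hashes = {}       # last embedded text hash per row
        self.embed_calls = 0

    def handle(self, event):
        op = event["op"]       # "c" create, "u" update, "d" delete, "r" snapshot read
        if op == "d":
            key = event["before"][self.key_field]
            self.vectors.pop(key, None)
            self.hashes.pop(key, None)
            return "deleted"
        if op not in ("c", "u", "r"):
            return "ignored"
        row = event["after"]
        key = row[self.key_field]
        text = row.get(self.text_field) or ""
        text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(key) == text_hash:
            return "skipped"   # another column changed; embedding still valid
        self.vectors[key] = self.embed(text)
        self.hashes[key] = text_hash
        self.embed_calls += 1
        return "embedded"
```

The hash check is a design choice rather than a requirement: it avoids paying for an embedding when an update touches only non-text columns, and it makes replayed events harmless.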
Section 04 · event streaming for agents

Redpanda Agentic Data Plane (Feb 2026).

streaming substrate purpose-built for agent events
AGENT (publishes) → REDPANDA / KAFKA / NATS (append-only · ordered · partitioned)
Topics: sense.docs.received · cognition.action.proposed · action.executed
Subscribers: other agent · eval pipeline · audit log + warehouse
Redpanda's pivot in 2026: the streaming substrate is the agent data plane.
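To make "append-only · ordered · partitioned" concrete, here is a toy in-memory log. It is not a client for Redpanda, Kafka, or NATS; the `EventLog` class is hypothetical, and the CRC32 partitioner is a simple stand-in for whatever hashing a real broker client uses.

```python
import zlib
from collections import defaultdict


class EventLog:
    def __init__(self, partitions=3):
        self.partitions = partitions
        self._log = defaultdict(list)   # (topic, partition) -> list of (key, value)

    def publish(self, topic, key, value):
        """Append a record; the same key always lands in the same partition."""
        partition = zlib.crc32(key.encode("utf-8")) % self.partitions
        records = self._log[(topic, partition)]
        records.append((key, value))
        return partition, len(records) - 1   # (partition, offset)

    def read(self, topic, partition, offset=0):
        """Return records from an offset onward; each subscriber tracks its own offset."""
        return list(self._log[(topic, partition)][offset:])
```

Because ordering holds only within a partition, keying events by the entity they concern (a document id, an action id) keeps that entity's events in sequence for every subscriber: another agent, the eval pipeline, or the audit log.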
Section 05 · schema registry

Apicurio - stores agent artifacts.

first registry to host A2A Agent Cards + MCP prompts alongside Avro/Protobuf
APICURIO REGISTRY · Apache 2.0
AVRO: event payloads
PROTOBUF: gRPC contracts
MCP PROMPTS: versioned, reviewed · NEW in 2026
A2A AGENT CARDS: capabilities, scopes · NEW in 2026
Confluent and Buf will follow within 12 months. Apicurio shipped first.
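The idea of holding schemas and agent artifacts under one versioned roof can be illustrated with a toy registry. This is not the Apicurio API: the `ArtifactRegistry` class and the type labels below are hypothetical, chosen only to show content-addressed versioning.

```python
import hashlib
import json

# Hypothetical type labels for the example, not Apicurio's own identifiers.
ARTIFACT_TYPES = {"AVRO", "PROTOBUF", "MCP_PROMPT", "A2A_AGENT_CARD"}


class ArtifactRegistry:
    def __init__(self):
        self._versions = {}   # artifact_id -> list of (type, digest, content)

    def register(self, artifact_id, artifact_type, content):
        """Store content as a new version unless identical content already exists."""
        if artifact_type not in ARTIFACT_TYPES:
            raise ValueError(f"unsupported artifact type: {artifact_type!r}")
        canonical = json.dumps(content, sort_keys=True)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        versions = self._versions.setdefault(artifact_id, [])
        for number, (_, existing, _) in enumerate(versions, start=1):
            if existing == digest:
                return number             # idempotent re-registration
        versions.append((artifact_type, digest, content))
        return len(versions)

    def latest(self, artifact_id):
        versions = self._versions[artifact_id]
        artifact_type, _, content = versions[-1]
        return len(versions), artifact_type, content
```

Treating a prompt or an agent card like a schema means a change to an agent's declared capabilities shows up as a new, reviewable version rather than a silent edit.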
Section 06 · tools 2026

The OSS-first stack.

VLM-native ↑↓ legacy OCR · OSS ←→ SaaS
Granite-Docling: IBM · Apache 2.0 · 258M
Marker: datalab-to · OSS
Unstructured: 30+ formats · OSS
Mistral OCR 3: paid API · $2/1k pages
Reducto: $75M Series B · verticals
AWS Textract
personal-ediscovery: vollko · local + MCP
For self-hosted: Granite-Docling. For paid quality: Mistral OCR 3. For verticals: Reducto.
Section 07 · vollko OSS

The primitives.
