Article

One binary, the whole runtime: WASM hot-reload, durable sessions and voice in Cosmictron

author Jonathan Conway
timestamp 19 May 2026
classification cosmictron / dark-factory / wasm / hot-reload / agent-runtime / substrate / dbsp / developer-experience

A fintech I spoke to last year spent four months building what they called a “simple” agent workflow. It read trade confirmations, matched them against settlement records, flagged discrepancies, and drafted a reconciliation report. The demo worked. Then they ran it in staging and discovered that when the reconciliation agent wrote a flag, the notification agent missed it because the message queue had a two-second delivery window and the polling interval was three seconds. They fixed it with a tighter poll. Then the polling hammered Postgres under load. They added a Redis cache. The Redis cache went stale when two agents wrote the same record concurrently. They added a lock. The lock caused deadlocks when the queue retried a failed job that held the lock. By month four they had a distributed system held together by manual timeouts and a shared Slack channel where engineers posted warnings at two in the morning.

The root problem was not any one piece. It was the number of pieces. Every boundary between app server and database and cache and queue was a surface where time could pass and state could diverge. Their agents were correct in isolation. In combination, they were a collection of race conditions waiting to meet.

This is the death by a thousand cuts that the Cosmictron single-binary design is intended to end.

What the glue stack is actually doing

Before going into what Cosmictron does differently, it is worth being precise about what the traditional stack is doing and why the cuts accumulate.

An application server owns your HTTP surface and your session state. A relational database owns your durable records. A cache (Redis or equivalent) owns your hot reads and your ephemeral pub/sub. A message queue owns your async dispatch and your retry logic. An observability platform owns your traces, metrics, and logs. A container orchestrator owns your lifecycle and your rollouts. Each of these is a separate process, usually on a separate host, communicating over TCP.

Every data access is a network round-trip. Every real-time feature requires a separate integration: a Postgres LISTEN/NOTIFY with a long-poll client, or a Redis pub/sub channel that the app server bridges into a WebSocket, or a Kafka consumer that writes back to Postgres so that a second consumer can read it. Every authorisation check has to be re-implemented at the application layer because the database has no concept of “this session is allowed to see this row in real time.”

For human-facing applications, this architecture is survivable. Humans do not notice a fifty-millisecond round-trip. They tolerate the occasional stale cache. But for an agent swarm where hundreds of workers are reading and writing shared mission state continuously, every hop compounds. The fintech’s problem was not bad engineering. It was that their chosen architecture had no way to eliminate the fundamental latency and consistency costs of data access across process boundaries.

A governed dark factory, where Ninmu dispatches tasks to cells that must coordinate tightly without races, cannot afford those costs. The design choice in Cosmictron is to move the application logic inside the database, not the other way around.

The single binary: what lives where

Cosmictron is, as the name does not suggest, a single Rust binary. That binary contains:

A storage engine with ACID semantics per reducer transaction, an MVCC implementation, WAL persistence, and a B-tree plus hash index layer.
A query engine that parses and plans SQL, injects row-level security predicates automatically, and handles time-travel queries with AS OF syntax.
A module host running both Wasmtime (for Rust modules compiled to WASM) and V8/Deno (for TypeScript modules).
A subscription engine that compiles SQL queries into DBSP circuits and pushes incremental deltas over WebSocket when reducers commit.
An authentication system covering email/password, passkeys, and magic links, with sessions bound to WebSocket identities.
A control plane HTTP surface for module deployment, schema migrations, and health.
An observability layer exporting OpenTelemetry, Prometheus, and structured JSON logs.

Those are the pieces that, in the traditional stack, would be Postgres, Redis, Kafka (or RabbitMQ), an Express or FastAPI application server, a separate auth service, and a Grafana stack. In Cosmictron they are in one process, sharing memory, with no network hops between them.

The practical consequence: a reducer that writes a row can cause a DBSP subscription to propagate a delta to three subscriber agents before the round-trip latency of a single Postgres query has elapsed. The measured figure is 2,326 agent actions per second at 0.43 milliseconds on one node. That is not a benchmark crafted to impress. It is what happens when you eliminate the hops.

Interactive diagram: the layers of the traditional stack and of the single binary, with an air-gap mode showing which external layers vanish and which internal layers continue running inside the binary. The greyed layers are the hops Cosmictron eliminates.

Business logic inside the database

The concept that makes all of this coherent is the reducer. Reducers are the only way to mutate data in Cosmictron. They run inside ACID transactions, co-located with the storage engine, in sandboxed WASM (Rust) or V8 (TypeScript) runtimes. If a reducer panics or returns an error, the transaction rolls back automatically. There is no way to leave the database in a partially-written state from a reducer crash.
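The rollback behaviour can be pictured with a small in-memory sketch. It is not Cosmictron's implementation: `ReducerStore`, `runReducer`, and the row shape are hypothetical names chosen for illustration. The reducer runs against a private copy of the data, and that copy replaces the committed state only if the reducer returns without throwing.

```typescript
// Hypothetical in-memory model of "reducers are the only way to mutate data".
// None of these names come from the Cosmictron SDK.
type TaskRow = { id: number; status: string };
type Tables = { tasks: TaskRow[] };

class ReducerStore {
  private committed: Tables = { tasks: [] };

  // Run a reducer against a private draft; commit only if it does not throw.
  runReducer(reducer: (draft: Tables) => void): boolean {
    const draft: Tables = { tasks: this.committed.tasks.map(r => ({ ...r })) };
    try {
      reducer(draft);
    } catch {
      return false; // rollback: the draft is discarded, committed state untouched
    }
    this.committed = draft; // commit: the new state becomes visible in one step
    return true;
  }

  // Readers only ever see committed rows (returned as copies).
  read(): TaskRow[] {
    return this.committed.tasks.map(r => ({ ...r }));
  }
}

const store = new ReducerStore();

// A reducer that succeeds and commits one row.
const ok = store.runReducer(d => {
  d.tasks.push({ id: 1, status: "open" });
});

// A reducer that writes and then fails: the partial write must not survive.
const failed = store.runReducer(d => {
  d.tasks[0].status = "done";
  throw new Error("validation failed");
});
```

After both calls, `store.read()` still contains a single row with status "open", because the second reducer's partial write was discarded along with its draft.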

Row-level security is not a middleware layer or an application-level check. It is a SQL predicate attached to a table at definition time:

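// Row-level security is declared on the table itself.
#[table(name = "mission_events", public)]
// Read: the owner, or any agent holding a grant in mission_grants.
#[rls(read = "owner_id = ctx.sender OR ctx.sender IN (SELECT agent_id FROM mission_grants WHERE mission_id = id)")]
// Write: the owner only.
#[rls(write = "owner_id = ctx.sender")]
pub struct MissionEvent {
    #[primary_key] pub id: u64,
    pub owner_id: Identity,
    pub payload: String,
}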

That predicate is injected as a WHERE clause into every subscription and every query. An agent that calls a reducer cannot see rows it neither owns nor has been granted access to, regardless of how the reducer is written, because the policy is enforced beneath the reducer, before the reducer gets a chance to make a mistake.
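As a rough illustration of what "injected into every query" means, the sketch below models the read policy as a predicate that the engine ANDs onto whatever filter the caller supplies. The types and the `query` function are hypothetical and in-memory; the real engine works on SQL plans, not JavaScript closures.

```typescript
// Hypothetical model: a read policy is a predicate over (row, sender), and the
// engine applies it to every query before caller code sees a row.
type AgentId = string;
type MissionEvent = { id: number; ownerId: AgentId; payload: string };
type MissionGrant = { missionId: number; agentId: AgentId };

const grants: MissionGrant[] = [{ missionId: 2, agentId: "agent-b" }];

// Mirrors the read policy above: the owner, or an agent with a matching grant.
function readPolicy(row: MissionEvent, sender: AgentId): boolean {
  const granted = grants.some(g => g.missionId === row.id && g.agentId === sender);
  return row.ownerId === sender || granted;
}

// The caller supplies only its own filter; the policy is applied unconditionally.
function query(
  rows: MissionEvent[],
  sender: AgentId,
  where: (row: MissionEvent) => boolean,
): MissionEvent[] {
  return rows.filter(row => readPolicy(row, sender) && where(row));
}

const events: MissionEvent[] = [
  { id: 1, ownerId: "agent-a", payload: "a-only" },
  { id: 2, ownerId: "agent-a", payload: "shared with b" },
  { id: 3, ownerId: "agent-c", payload: "c-only" },
];

// agent-b asks for everything, but only the granted row comes back.
const visibleToB = query(events, "agent-b", () => true);
// agent-a owns rows 1 and 2 and sees both.
const visibleToA = query(events, "agent-a", () => true);
```

The point of the design is that the caller has no code path that skips `readPolicy`: a careless `where` clause can only narrow the result, never widen it.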

For a regulated deployment this matters in a way that is easy to underestimate. In a traditional stack the authorisation check lives in application code. Code can have bugs. Code can be omitted under time pressure. Code can be bypassed by a developer who adds a “quick admin endpoint” that will be removed after the demo. Declarative RLS at the storage layer cannot be bypassed by application code, because the storage layer runs beneath application code.

The security model extends to agent identity. Every client connection has an Ed25519 identity. Ultra, Substrate’s separate cryptographic authority plane, manages the full lifecycle: registered, active, suspended, revoked. When an agent’s identity is revoked, its subscriptions are terminated and its pending reducers are rejected at the boundary. There is no shared application credential that a revoked agent could still use.
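The lifecycle can be sketched as a small state machine. The four states come from the description above; the allowed transitions, the class name, and the method names are assumptions made for illustration and are not Ultra's actual API.

```typescript
type IdentityState = "registered" | "active" | "suspended" | "revoked";

// Assumed transitions for illustration only; revoked is treated as terminal.
const allowed: Record<IdentityState, IdentityState[]> = {
  registered: ["active", "revoked"],
  active: ["suspended", "revoked"],
  suspended: ["active", "revoked"],
  revoked: [],
};

class AgentIdentity {
  state: IdentityState = "registered";
  subscriptions: string[] = [];

  transition(next: IdentityState): void {
    if (allowed[this.state].indexOf(next) === -1) {
      throw new Error(`illegal transition ${this.state} -> ${next}`);
    }
    this.state = next;
    // Revocation terminates every live subscription held by this identity.
    if (next === "revoked") this.subscriptions = [];
  }

  // Reducer calls are checked at the boundary, before any transaction starts.
  canCallReducer(): boolean {
    return this.state === "active";
  }
}

const agent = new AgentIdentity();
agent.transition("active");
agent.subscriptions.push("SELECT * FROM tasks");
agent.transition("revoked");
```

Once revoked, the identity holds no subscriptions, `canCallReducer()` returns false, and any attempt to move it back to "active" throws.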

Hot-reload without dropping connections

Zero-downtime hot-reload is the feature that looks like a deployment convenience and turns out to be a governance requirement.

In a traditional stack, deploying a new version of a service drops the WebSocket connections that agents hold. Those agents have to reconnect and re-establish their subscriptions. In a long-running mission this means state that was in-flight at the moment of deployment is either lost or has to be reconstructed. The reconstruction logic is usually ad hoc. The gap between the old session and the new session is a period where the audit trail is incomplete.

Cosmictron’s hot-reload works differently. The new module loads alongside the old one. Schema migrations run. In-flight reducers drain. The swap is atomic. If the new module’s initialisation fails, it rolls back to the previous version automatically. Client connections are never dropped. Subscriptions continue without interruption. The audit trail has no gap.
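The sequence can be sketched in a few lines. This is a simplified, synchronous model with hypothetical names (`ModuleHost`, `AgentModule`, `reload`); it stands in for the load, migrate, drain, and swap steps, not for the real module host.

```typescript
type AgentModule = {
  version: string;
  init: () => void; // stands in for module load plus schema migrations
  handle: (input: string) => string;
};

class ModuleHost {
  private current: AgentModule;
  private inFlight = 0;
  readonly connections: string[] = []; // reload never touches this list

  constructor(initial: AgentModule) {
    this.current = initial;
  }

  call(input: string): string {
    this.inFlight++;
    try {
      return this.current.handle(input);
    } finally {
      this.inFlight--;
    }
  }

  reload(next: AgentModule): boolean {
    try {
      next.init(); // the new module initialises alongside the old one
    } catch {
      return false; // initialisation failed: the old module keeps serving
    }
    // A real host would wait here for in-flight reducers to drain;
    // this synchronous sketch simply refuses to swap while any are running.
    if (this.inFlight > 0) return false;
    this.current = next; // the swap is a single reference assignment
    return true;
  }
}

const makeModule = (version: string, failInit: boolean): AgentModule => ({
  version,
  init: () => {
    if (failInit) throw new Error("init failed");
  },
  handle: input => `${version}:${input}`,
});

const host = new ModuleHost(makeModule("v1", false));
host.connections.push("agent-1");

const badReload = host.reload(makeModule("v2", true)); // init throws
const afterBad = host.call("ping"); // still served by v1
const goodReload = host.reload(makeModule("v3", false));
const afterGood = host.call("ping"); // now served by v3
```

The connection list is the same object before and after both reloads, which is the property the audit trail depends on.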

This is not just convenient for developers. For any mission governed by an EU AI Act Article 12 requirement (automatic recording of events over the lifetime of a high-risk system from deployment to decommissioning), a deployment that silently drops the event stream creates a hole that has to be explained to an auditor. Hot-reload is one of the mechanisms that makes “governed by construction” a runtime property rather than a policy ambition. The deterministic replay story that makes the audit trail reconstructable is covered in more depth in the post “deterministic replay as the audit trail”.

Sessions that survive restarts

Durable sessions are the other property the single-binary design makes straightforward. Because session state lives in the same storage engine as application data, it participates in the same WAL-based persistence. A session is not a Redis key with a TTL. It is a row.

When the binary restarts (planned or unplanned), sessions are restored from the storage engine on startup. Agents that reconnect after a restart find their session still valid, their subscription queries registered, and the pending deltas they missed during the restart delivered. For voice and telephony agents this is load-bearing. A phone call that spans a restart should not result in a lost transcript or a confused session state. The media lifecycle and call state are stored as rows, and they survive.
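A minimal sketch of the restore path, assuming each delta carries a monotonically increasing sequence number and each session row remembers the last sequence it delivered. Both are assumptions for illustration, as are all the type and function names; the restart is simulated by a JSON round-trip standing in for WAL recovery.

```typescript
type SubscriptionDelta = { seq: number; body: string };
type SessionRow = { sessionId: string; query: string; lastDeliveredSeq: number };
type PersistedState = { sessions: SessionRow[]; log: SubscriptionDelta[] };

// On reconnect, deliver every delta the session has not yet seen and
// advance its cursor so a second reconnect delivers nothing twice.
function reconnect(state: PersistedState, sessionId: string): SubscriptionDelta[] {
  const session = state.sessions.filter(s => s.sessionId === sessionId)[0];
  if (session === undefined) throw new Error("unknown session: " + sessionId);
  const missed = state.log.filter(d => d.seq > session.lastDeliveredSeq);
  if (missed.length > 0) session.lastDeliveredSeq = missed[missed.length - 1].seq;
  return missed;
}

const beforeRestart: PersistedState = {
  sessions: [{ sessionId: "s1", query: "SELECT * FROM tasks", lastDeliveredSeq: 2 }],
  log: [
    { seq: 1, body: "task 1 created" },
    { seq: 2, body: "task 1 assigned" },
    { seq: 3, body: "task 1 done" },    // committed while the client was away
    { seq: 4, body: "task 2 created" }, // committed while the client was away
  ],
};

// Simulated restart: everything that was a row survives the round-trip.
const afterRestart: PersistedState = JSON.parse(JSON.stringify(beforeRestart));

const missed = reconnect(afterRestart, "s1"); // deltas 3 and 4
const again = reconnect(afterRestart, "s1");  // nothing left to deliver
```

Because the session and its cursor are ordinary rows, nothing about the recovery path is special to sessions; it is the same persistence that protects application data.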

The stateful agent runtime builds on top of sessions to provide supervisor handoff hooks and tool authorisation with idempotency guarantees. If a tool call fails and the client retries, Cosmictron detects the repeated request ID and returns the cached result without re-executing the tool. This eliminates the class of bugs where a retry loop fires an external API call twice because the network timed out after the call succeeded but before the response arrived.
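The idempotency guarantee reduces to a lookup keyed by request ID. The sketch below is a hypothetical in-memory version (`IdempotentToolRunner` is not a Cosmictron class); in the runtime described here the cached results would presumably be rows in the same store as the session.

```typescript
type ToolResult = { requestId: string; output: string };

class IdempotentToolRunner {
  private results: Record<string, ToolResult> = Object.create(null);
  executions = 0; // counts how many times a tool actually ran

  run(requestId: string, tool: () => string): ToolResult {
    // A repeated request ID returns the stored result without re-executing.
    if (Object.prototype.hasOwnProperty.call(this.results, requestId)) {
      return this.results[requestId];
    }
    this.executions++;
    const result: ToolResult = { requestId, output: tool() };
    this.results[requestId] = result;
    return result;
  }
}

const runner = new IdempotentToolRunner();
const sendPayment = (): string => "payment-accepted";

const first = runner.run("req-42", sendPayment);
// The client timed out and retries with the same request ID.
const retry = runner.run("req-42", sendPayment);
```

The external call runs once; the retry receives the same result object the first call produced.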

Subscriptions without polling

The DBSP subscription engine is the piece that deserves the most attention from anyone who has spent time debugging a polling-based agent architecture.

Most real-time databases re-evaluate subscriptions from scratch on every write. A client subscribes to a SQL query. Every time a row changes, the database re-runs the full query and sends the new result set, or at best the diff between the old and new result set computed naively. For simple queries on small tables this is fine. For joined aggregations over tables with millions of rows and hundreds of concurrent subscribers, it is not.

DBSP compiles subscriptions into circuits. A circuit is a directed acyclic graph where each node is an incremental operator: an incremental join, an incremental group-by, an incremental filter. When a reducer commits a change, only the delta propagates through the circuit. A join probes only the rows in the other table that match the changed key. An aggregation updates its running total by adding the new value and subtracting the old one. The full table is never re-scanned.
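To make the aggregation case concrete, here is a minimal sketch of an incremental grouped sum next to a full recompute. It borrows the DBSP idea of weighted changes (+1 for an insert, -1 for a delete) but is otherwise a toy: the class and field names are hypothetical and it covers a single operator, not a compiled circuit.

```typescript
// A change is a row with a weight: +1 for insert, -1 for delete.
// An update is expressed as a delete of the old row plus an insert of the new one.
type CostDelta = { missionId: string; cost: number; weight: 1 | -1 };

class IncrementalSum {
  private totals: Record<string, number> = Object.create(null);
  rowsTouched = 0;

  // Work done is proportional to the number of deltas, not to table size.
  apply(deltas: CostDelta[]): Record<string, number> {
    const changed: Record<string, number> = Object.create(null);
    for (const d of deltas) {
      this.rowsTouched++;
      const next = (this.totals[d.missionId] ?? 0) + d.weight * d.cost;
      this.totals[d.missionId] = next;
      changed[d.missionId] = next;
    }
    return changed; // only the affected groups are emitted to subscribers
  }
}

// Naive alternative: rescan every row on every write.
function fullRecompute(rows: { missionId: string; cost: number }[]): Record<string, number> {
  const totals: Record<string, number> = Object.create(null);
  for (const r of rows) totals[r.missionId] = (totals[r.missionId] ?? 0) + r.cost;
  return totals;
}

const view = new IncrementalSum();
view.apply([
  { missionId: "m1", cost: 10, weight: 1 },
  { missionId: "m1", cost: 5, weight: 1 },
  { missionId: "m2", cost: 7, weight: 1 },
]);

// Update one row in m1 from cost 5 to cost 8: two deltas, one group touched.
const changed = view.apply([
  { missionId: "m1", cost: 5, weight: -1 },
  { missionId: "m1", cost: 8, weight: 1 },
]);

const expected = fullRecompute([
  { missionId: "m1", cost: 10 },
  { missionId: "m1", cost: 8 },
  { missionId: "m2", cost: 7 },
]);
```

Both paths agree that m1 now totals 18, but the incremental path touched two deltas and emitted one group, while the recompute had to visit every row.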

The result is O(delta) propagation cost rather than O(N). For non-trivial queries with joins and aggregations, the VLDB DBSP paper reports 10-100x better subscription scalability compared to full re-evaluation approaches. Cosmictron’s measured production figure of 2,326 actions per second at 0.43 milliseconds reflects this: that throughput holds under load precisely because the subscription cost does not grow with table size.

Interactive demo: DBSP incremental propagation compared with naive full re-evaluation, with cumulative compute-unit counters that diverge as writes are triggered. In incremental mode a reducer write propagates only through the affected circuit nodes, inside the same binary, with no external queue or database round-trip between the write and the subscriber agents.

For the governed dark factory, this matters in a specific way. When Ninmu decomposes a mission into tasks and dispatches them to agent cells, those cells need to observe shared mission state in real time. The kanban board of a running mission, the supply-chain exception queue, the budget ledger: all of these are DBSP-subscribed views. When a cell writes a result, every other cell that subscribes to an affected view receives the delta in under a millisecond. Coordination happens through the shared state directly, without any external queue, without any polling interval, without any cache invalidation logic. The post “multi-agent coordination without races” covers the coordination patterns this enables in more detail; the DBSP mechanism itself is the subject of “DBSP incremental views: the death of polling”.

Developer experience: SDKs and React hooks

The single-binary architecture is only useful if developers can actually build on it. The SDK surface is designed to make the most common patterns feel close to what a developer already knows.

The TypeScript SDK exports React hooks that bind directly to Cosmictron subscriptions:

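import { useSubscription, useReducer } from '@cosmictron/react';

// Task (the row type) and TaskCard (a presentational component) are
// application code defined elsewhere.
function MissionBoard({ missionId }: { missionId: string }) {
  // Live query: the component re-renders whenever a delta arrives.
  // missionId is interpolated into the SQL text, so it must come from a
  // trusted source or be validated first.
  const tasks = useSubscription<Task>(
    `SELECT * FROM tasks WHERE mission_id = '${missionId}' ORDER BY created_at`,
  );
  // Returns a function that invokes the named reducer over the WebSocket.
  const updateStatus = useReducer('update_task_status');

  return (
    <div>
      {tasks.map(task => (
        <TaskCard
          key={task.id}
          task={task}
          onComplete={() => updateStatus({ taskId: task.id, status: 'done' })}
        />
      ))}
    </div>
  );
}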

useSubscription establishes a DBSP subscription when the component mounts. The hook returns the current result set and re-renders only when a delta arrives. There is no polling. There is no manual refetch. There is no stale data from a disconnected cache. The component is a live view of the database.

useReducer returns a function that calls a Cosmictron reducer. The call goes over WebSocket, runs inside a transaction co-located with the storage engine, and the subscription delta from the resulting write arrives on the same WebSocket connection before the call returns. From the component’s perspective, a reducer call and its resulting state update are a single atomic operation.

For Rust developers building module logic:

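use cosmictron::prelude::*;

// Assigns a task to an agent. The whole body runs as one transaction:
// if the update returns an error, `?` propagates it and the write rolls back.
#[reducer]
pub fn assign_task(ctx: &ReducerContext, task_id: u64, agent_id: Identity) -> Result<()> {
    ctx.db.tasks()
        .filter(|t| t.id == task_id)
        .update(|t| { t.assigned_to = Some(agent_id); t.status = "assigned".to_string(); })?;
    Ok(())
}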

The reducer runs inside a WASM sandbox. It has typed access to tables via generated accessors. It cannot perform I/O outside of what the module declares (no arbitrary HTTP calls, no filesystem access). The sandbox is the security boundary that makes multi-tenant deployments safe: two tenants’ module code cannot interfere with each other’s data even if both modules are running inside the same binary.

The auth SDK wraps the three authentication methods behind React components that work without any server-side code:

import { CosmictronAuthProvider, LoginForm } from '@cosmictron/auth-sdk';

function App() {
  return (
    <CosmictronAuthProvider endpoint="http://localhost:4000">
      <LoginForm onSuccess={(session) => console.log('Authenticated:', session.identity)} />
    </CosmictronAuthProvider>
  );
}

The authenticated session binds to the WebSocket identity, so RLS policies referencing ctx.sender apply automatically from the moment of login.

Migration from a traditional stack

The practical question for any team reading this is: what does it actually take to move from an existing stack to Cosmictron?

The honest answer is that it is not a lift-and-shift. You are not moving Postgres tables over one-for-one. The Cosmictron data model requires that mutations go through reducers, which means your existing service code that writes directly to the database via an ORM needs to be rewritten as reducer calls. This is real work. For a mature production system with thousands of direct queries it is months of engineering.

The more tractable migration path is greenfield agent workflows. New agent functionality, new microservices, new agentic pipelines: build these on Cosmictron from the start and leave the legacy system in place. The Cosmictron PgWire surface means that read-heavy tools that speak Postgres (BI tools, reporting dashboards, data pipelines) can query Cosmictron tables without modification. The HTTP REST API gives external systems a standard surface for writes that need to flow in from outside the Cosmictron perimeter.

For the specific case of adding agents to an existing regulated workflow, the pattern that works well is to use Cosmictron as the agent coordination layer while leaving the system of record in the existing database. The agents read from both, write their results into Cosmictron, and the Cosmictron module synchronises outcomes back to the system of record through a reducer that calls an external API. This way the existing audit trail is not disrupted and the agent layer gets the real-time coordination semantics it needs.

The supply-chain and kanban examples in the Cosmictron examples directory both demonstrate this pattern. The supply-chain example in particular shows how to model external events arriving from a legacy ERP system, route them to agents via Cosmictron reducers, and write the results back out through a signed action log. It is worth reading before starting a migration.

Voice and telephony: the built-in case

The telephony foundations are worth a specific mention because they are an unusual thing to find in a database runtime, and the design reason is instructive.

Voice agents are among the most regulated agent use cases. A call-centre agent that accesses patient records, discusses eligibility decisions, or processes financial transactions is subject to recording, privacy, and consent rules under regimes such as MiFID II, HIPAA, and national implementations of the ePrivacy Directive. Recording requires consent. Consent needs to be captured and stored with the call metadata. The recording itself needs to be linked to the transcript and the agent session and the actions taken during the call.

In a traditional stack this requires: a telephony provider (Twilio, Vonage, etc.), a media recording service, a consent store, a transcript service, a separate database for linking all of the above, and an application layer that tries to keep them in sync. Each of these is a network hop, a potential failure point, and a seam in the audit trail.

Cosmictron includes provider-neutral telephony ingress, WebRTC/media lifecycle support, bridge logic, and recording and retention workflows as first-class features of the runtime. Consent capture is a reducer: it writes a consent record to the database before the recording starts. The recording is linked to the session row. The transcript, when it arrives, is written as a row in the same schema. The agent actions taken during the call are in the same ACID transaction space as the call state.
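The ordering constraint (consent row first, recording second, everything linked by the call) can be sketched as follows. The table shapes, function names, and the `rec://` URI are hypothetical; the sketch only shows why a shared store makes the rule easy to enforce.

```typescript
type CallDb = {
  consents: { callId: string; grantedAt: number }[];
  recordings: { callId: string; uri: string }[];
  transcripts: { callId: string; text: string }[];
};

function captureConsent(db: CallDb, callId: string, grantedAt: number): void {
  db.consents.push({ callId, grantedAt });
}

// Recording may only start once a consent row exists for the same call.
function startRecording(db: CallDb, callId: string, uri: string): void {
  const hasConsent = db.consents.some(c => c.callId === callId);
  if (!hasConsent) throw new Error("no consent recorded for call " + callId);
  db.recordings.push({ callId, uri });
}

function writeTranscript(db: CallDb, callId: string, text: string): void {
  db.transcripts.push({ callId, text });
}

// Everything an auditor needs for one call comes from one store, keyed by callId.
function auditRecord(db: CallDb, callId: string) {
  return {
    consents: db.consents.filter(c => c.callId === callId),
    recordings: db.recordings.filter(r => r.callId === callId),
    transcripts: db.transcripts.filter(t => t.callId === callId),
  };
}

const db: CallDb = { consents: [], recordings: [], transcripts: [] };

let rejected = false;
try {
  startRecording(db, "call-1", "rec://call-1"); // no consent yet
} catch {
  rejected = true;
}

captureConsent(db, "call-1", 1000);
startRecording(db, "call-1", "rec://call-1");
writeTranscript(db, "call-1", "caller confirmed identity");
const record = auditRecord(db, "call-1");
```

In a multi-service stack the same check would be a cross-service call that can time out or race; here it is a read of a row in the same store the recording row is written to.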

This is not a feature you could reasonably bolt onto a Postgres-plus-Redis stack later. It requires that call state, agent state, and recording metadata share a consistency boundary. The single-binary design provides that boundary.

For a healthcare or financial services deployment this has a concrete regulatory implication. The EU AI Act’s Article 12 logging requirements apply to high-risk AI systems and demand automatic recording of events over the system’s lifetime, including operational monitoring and risk situations. A voice agent that makes eligibility decisions is likely to be classified as a high-risk system. Having call state, consent, transcript, and agent actions in a single auditable store, with deterministic replay, is the difference between a system that satisfies the requirement by construction and one that requires a quarterly evidence-gathering exercise to produce a compliance pack.

The kanban and supply-chain examples

Two examples from the Cosmictron repository illustrate the single-binary design in practice and are worth examining before building your own agent workflow.

The agent-kanban example models a software delivery mission as a kanban board. Agents subscribe to columns in real time. When a task moves to “in review,” the reviewer agent receives the delta in under a millisecond and begins its work. When the review is complete, the reducer fires, the task moves to “done,” the budget ledger is updated (via Ninmu), and every subscribed agent observes the change simultaneously. There is no poll, no cache, no race between the review agent finishing and the planning agent noticing. The shared state is the coordination mechanism.

The supply-chain example is more instructive for regulated readers because it introduces external data sources. Events arrive from a simulated ERP system over HTTP. A reducer validates them, writes them into the Cosmictron schema, and the DBSP subscriptions propagate the relevant deltas to the agents that care about each event type. Exceptions are routed to a human gate agent that surfaces them in a UI. The full event lineage, from ERP arrival through agent processing through exception decision, is in one auditable store.

Both examples include a prediction-market variant that demonstrates how multiple agents can write competing assessments of the same mission state and have the subscription engine surface the aggregated view without any agent needing to know about the others. This is the coordination pattern that makes large swarms tractable: agents do not coordinate by calling each other, they coordinate by writing to shared state and reading from incremental views.

What to demand in an RFP

If you are evaluating an agent runtime for a regulated deployment, the questions below separate the platforms that have actually solved the distributed-state problem from those that have papered over it.

Ask whether business logic runs co-located with the storage engine or in a separate process. If the answer is “separate process,” ask how they handle the consistency window between the application write and the database write. If the consistency window is nonzero, ask how they reconstruct the audit trail during that window.

Ask how subscriptions are implemented. If the answer is full re-evaluation or change data capture via Debezium or Postgres logical replication, ask what happens to subscription latency when table size grows. Ask to see a benchmark that shows latency at their claimed scale with joins and aggregations, not just key-value lookups.

Ask what happens to in-flight agent sessions during a deployment. If the answer involves any reconnection window, ask how session state is preserved across that window and who is responsible for replaying missed events.

Ask about the security model for multi-tenant deployments. If the authorisation boundary is in application code rather than the storage layer, ask to see the test coverage for the authorisation logic and ask who audits it.

Ask about telephony and media recording specifically if your use case involves voice agents. A platform that treats recording as an external integration is a platform where consent and call state and agent actions are in different consistency domains. That is a compliance risk for any regulated conversation.

A 90-day pilot design

The most effective way to evaluate whether a single-binary runtime changes anything in practice is to take one high-coordination agent workflow and run it twice: once on the existing stack and once on Cosmictron.

Pick a workflow that currently has at least one known race condition or cache staleness issue. The supply-chain exception workflow is a good candidate if you operate in that space, because the timing sensitivity between the event arriving and the agent responding is easy to measure. A claims routing workflow in healthcare is another good candidate, because the cost of a routing decision based on stale data is concrete and quantifiable.

Run the same agent logic on both stacks. Measure: end-to-end latency from event arrival to agent response, consistency errors per thousand events (wrong routing due to stale state), and infrastructure component count (the number of processes that have to be healthy for the workflow to run correctly).
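The three measurements reduce to simple arithmetic over per-event observations. The sketch below shows one way to compute them; the type names are hypothetical and the sample numbers are made up purely to exercise the function, not measured on any stack.

```typescript
type PilotObservation = { latencyMs: number; routedCorrectly: boolean };
type PilotSummary = {
  meanLatencyMs: number;
  errorsPerThousand: number;
  componentCount: number;
};

function summarise(observations: PilotObservation[], componentCount: number): PilotSummary {
  if (observations.length === 0) throw new Error("no observations");
  const totalLatency = observations.reduce((sum, o) => sum + o.latencyMs, 0);
  const errors = observations.filter(o => !o.routedCorrectly).length;
  return {
    meanLatencyMs: totalLatency / observations.length,
    // Consistency errors normalised to a per-thousand-events rate.
    errorsPerThousand: (errors / observations.length) * 1000,
    // Number of processes that must be healthy for the workflow to run.
    componentCount,
  };
}

// Illustrative input only: four events, one routed on stale state.
const sample = summarise(
  [
    { latencyMs: 40, routedCorrectly: true },
    { latencyMs: 60, routedCorrectly: true },
    { latencyMs: 50, routedCorrectly: false },
    { latencyMs: 50, routedCorrectly: true },
  ],
  6,
);
```

Running the same function over the observations from each stack gives two summaries that can be compared line by line. A real pilot would want far more than four events and probably tail percentiles alongside the mean.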

The consistency error rate is the number that matters most for regulated buyers. A workflow that runs faster on average but produces one-in-ten-thousand incorrect routing decisions due to cache staleness is not suitable for a claims or trade-matching context. The single-binary design removes that class of staleness by removing the cache itself, not by making the cache faster.

If the pilot confirms that the consistency errors disappear and the infrastructure component count drops, the next question is whether the hot-reload and durable session properties hold under realistic deployment conditions. Deploy a new module version during a running mission and verify that the audit trail has no gap. Restart the binary under load and verify that no sessions are lost. These are the properties that convert a promising benchmark into a production decision.

The full picture of how Cosmictron fits into the broader factory, including how the six systems interoperate rather than just coexist, is in “the six systems as one factory, not a stack”. The Voxeltron cell density story, which is what makes it economical to run thousands of Cosmictron cells simultaneously for a large mission, is in “Voxeltron: under 50ms boot, 10,000 idle cells”.

The factory connection

It is worth stepping back to explain why this runtime design matters for the governed dark factory specifically, rather than just for well-engineered applications in general.

The Substrate homepage starts with a claim: declare the mission and the budget, and a governed swarm plans, writes, tests, reviews, and ships it, with every action signed and auditable by construction. That claim requires, among other things, that hundreds of agents can share mission state at sub-millisecond latency without races, that every write is transactional and signed, and that the entire run is replayable.

None of those properties can be retrofitted onto a glue stack. They require that the data plane and the execution plane and the identity plane share a consistency boundary from the beginning. Cosmictron is the data and execution plane of that boundary. Ultra is the identity plane. The two of them together, inside Voxeltron cells, under Ninmu’s governance, are what makes “governed by construction” a runtime property rather than an aspiration.

The death by a thousand cuts that the fintech experienced was not unusual. It is the default outcome of building agent workflows on infrastructure that was designed for human-speed services. The single-binary design is the alternative, and it is already running in production at 2,326 actions per second, with modules deployed over hot-reload and sessions surviving restarts, on hardware the customer owns.

If you want the full technical picture, including the financial model and the head-to-head against glue stacks, you can request the investor brief. If cost governance is the specific concern, “cost governance before the invoice arrives” covers the Ninmu budget ledger in detail.