
Bitemporal memory as the compliance backbone of the business factory

Author: Jonathan Conway

Timestamp: 10 May 2026

Classification: bitemporal / governed-memory / compliance / audit / business-factory / regulated / substrate / provenance

A team at a mid-tier investment bank deployed an AML screening agent in late 2024. The agent worked. It read transaction records, queried an entity graph, flagged suspicious patterns, and produced exception reports fast enough to clear a week’s backlog in a day. Everyone was pleased. Six months later, a regulator asked a simple question: what did the agent know about counterparty X at the time it cleared transaction 4847?

The team could not answer. The memory system stored current state. Old facts had been overwritten. The retrieval logs were gone. What remained was a correct current picture of the counterparty and no record at all of what the agent had seen on the day it mattered.

The fine was not because the agent gave a wrong answer. The fine was because the firm could not demonstrate what answer the agent had.

That distinction is the whole argument for bitemporal memory in regulated settings. And it is the reason the governed memory engine, the memory layer at the heart of the Substrate business factory, is built around two time axes rather than one.

The problem with a single timeline

Almost every production memory system stores one thing: the current state of what is known. A client is classified as “High Risk”. A counterparty’s jurisdiction is “UK”. A medication interaction flag is “active”. These are facts as of now. If the fact changes, the old version is updated or overwritten, and the previous answer is gone.

For most applications, that is the right behaviour. Yesterday’s shopping basket preference does not need to be forensically reconstructable.

In regulated environments it is a structural liability. Three scenarios make this concrete.

The first is the audit reconstruction scenario. A regulator, an internal auditor, or an opposing counsel asks: show me what your system knew about entity X on date D. If your memory system holds only current state, the answer is “we don’t know.” That is not defensible. MiFID II Article 16(6) requires records “sufficient to enable the competent authority to fulfil its supervisory tasks.” The EU AI Act Article 12 requires high-risk systems to maintain logging that enables post-hoc review of the system’s operation over its lifetime (EUR-Lex, Regulation (EU) 2024/1689). A memory system that cannot reconstruct past state is structurally non-compliant with both.

The second is the stale-fact scenario. Facts about the world often change with a delay between when the change happened (valid time) and when the system learned about it (transaction time). A client might have been reclassified by the compliance team on Tuesday, but the agent did not receive the updated record until Thursday. An agent running on Wednesday sees stale data. In a flat-timeline system, there is no way to represent this gap. Either the system shows the reclassification as effective from Tuesday (misleading, because the system did not know about it on Wednesday) or as effective from Thursday (wrong for audit purposes, because the reclassification was legally in effect earlier). Bitemporal storage handles this precisely: the fact has a valid-from date independent of the transaction date, and both are preserved.

The third is the human-gate reconstruction scenario. When the swarm surfaces a decision to a human approver, that person is making a judgment call on the basis of what the agents retrieved at that moment. If there is later a question about the decision, you need to be able to show the exact memory state the agents presented at the gate. Not what the system knows now. What it knew then. Without bitemporal storage, this is impossible.

What bitemporal actually means

The term gets used loosely. For the governed memory engine, it has a precise meaning: every fact in the memory graph carries four timestamps, not one.

Valid from: the point in time when the fact became true in the real world.

Valid to: the point when it ceased to be true in the real world (absent means “still true”).

Transaction from: the point when the system recorded this fact.

Transaction to: the point when the system recorded that this fact was superseded (absent means “still the current record”).

A query can be scoped to any combination of these. “Give me the facts that were valid as of March 15th and that the system knew about as of March 15th” is a precise query. So is “give me the facts that were valid on March 15th according to everything the system knows today, including late-arriving information recorded afterward.” These are different questions. In a regulated audit, you need to be able to ask both, and you should expect the answers to differ whenever information arrived late.
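The four timestamps and the two independent query parameters can be made concrete with a small in-memory sketch. Everything in it is an assumption made for illustration: the `Fact` type, the `record` and `as_of` helpers, the `client-17/risk` key and the dates are hypothetical, and none of it is the governed memory engine’s API or storage layout. It replays the Tuesday reclassification that was only recorded on Thursday.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    valid_from: datetime              # became true in the real world
    valid_to: Optional[datetime]      # None means "still true"
    tx_from: datetime                 # when the system recorded it
    tx_to: Optional[datetime] = None  # None means "still the current record"


def _covers(start: datetime, end: Optional[datetime], t: datetime) -> bool:
    """Half-open interval test: start <= t < end, with None as open-ended."""
    return start <= t and (end is None or t < end)


def record(store: list, key: str, value: str,
           valid_from: datetime, recorded_at: datetime) -> None:
    """Append-only write: superseded records are closed, never overwritten."""
    current = [f for f in store if f.key == key and f.tx_to is None]
    for f in current:
        if f.valid_to is not None and f.valid_to <= valid_from:
            continue  # ended before the new fact starts, so it is unaffected
        store[store.index(f)] = replace(f, tx_to=recorded_at)
        if f.valid_from < valid_from:
            # Re-assert the old value for the period before the change.
            store.append(replace(f, valid_to=valid_from, tx_from=recorded_at))
    store.append(Fact(key, value, valid_from, None, recorded_at))


def as_of(store: list, key: str,
          valid_at: datetime, known_at: datetime) -> Optional[str]:
    """Value that was valid at `valid_at`, as the system knew it at `known_at`."""
    for f in store:
        if (f.key == key
                and _covers(f.valid_from, f.valid_to, valid_at)
                and _covers(f.tx_from, f.tx_to, known_at)):
            return f.value
    return None


store: list = []
jan = datetime(2025, 1, 1)
tue, wed, thu = datetime(2025, 3, 11), datetime(2025, 3, 12), datetime(2025, 3, 13)

record(store, "client-17/risk", "Medium", valid_from=jan, recorded_at=jan)
# Compliance reclassified on Tuesday; the system only learned of it on Thursday.
record(store, "client-17/risk", "High", valid_from=tue, recorded_at=thu)

seen_on_wednesday = as_of(store, "client-17/risk", valid_at=wed, known_at=wed)
corrected_view = as_of(store, "client-17/risk", valid_at=wed, known_at=thu)
```

Under these assumptions `seen_on_wednesday` is `"Medium"` and `corrected_view` is `"High"`: the same valid-time question gives two answers depending on the transaction-time parameter, and neither answer destroys the other.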

Interactive: drag the as-of slider to scrub through a client’s risk classification history. The facts visible at any point are exactly those that were valid and known at that time. Dimmed facts existed but were out of scope for that query. This is what the agent saw at the human gate.

Spend a minute with that diagram. Drag the as-of slider back to a point before the most recent reclassification. The higher-risk classification disappears from the visible set. That is not a UI trick. It is the query executing against the bitemporal index: the same query the swarm would have executed at that moment in time. If you need to reproduce what happened at the human gate, you set the as-of time to the gate timestamp and you have an exact record.

The governed memory engine achieves this at roughly 3 ms per recall on the temporal graph. That number is verified from the Substrate platform metrics. It is fast enough that bitemporal scoping adds no meaningful latency to the swarm’s execution path.

Cryptographic provenance: signing the knowledge state

A bitemporal index tells you what was visible when. It does not, by itself, tell you that the record has not been altered since. For a tamper-evident audit trail, you need the memory reads and writes to be part of a signed chain of events.

This is where the identity service, the Substrate identity and provenance layer, enters the picture. Every action the swarm takes, including every memory read and every memory write, is signed by the agent that took it using its Ed25519 identity and committed to an append-only log. Each log entry includes a hash of the previous entry. Alter any block and every subsequent block becomes cryptographically invalid.
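A minimal sketch of the hash-chaining part of that design, using only the Python standard library. The entry fields, the `append` and `first_invalid` helpers and the agent names are hypothetical, and the per-entry Ed25519 signature is deliberately left out to keep the example dependency-free; a real log would also sign each hash with the acting agent’s key.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(log: list, agent: str, action: str, payload: dict) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    body = {
        "seq": len(log),
        "agent": agent,
        "action": action,
        "payload": payload,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def first_invalid(log: list):
    """Index of the first entry that fails verification, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


log: list = []
append(log, "agent-7", "memory_write", {"key": "cp-X/risk", "value": "Medium"})
append(log, "agent-7", "memory_read", {"key": "cp-X/risk"})
append(log, "agent-9", "human_gate", {"decision": "approved"})
intact = first_invalid(log)

# Tamper with a recorded read: the entry no longer matches its own hash.
log[1]["payload"]["key"] = "cp-Y/risk"
after_tamper = first_invalid(log)

# Recomputing the tampered entry's hash only moves the break downstream,
# because the next entry still points at the original hash.
log[1]["hash"] = _digest({k: v for k, v in log[1].items() if k != "hash"})
after_rehash = first_invalid(log)
```

Here `intact` is `None`, `after_tamper` is `1` and `after_rehash` is `2`. Hiding the change would mean rewriting every later entry, which is the step that per-entry signatures are there to block.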

The result is that the audit trail is not separate from the memory system. The audit trail is the memory system, as seen through the provenance chain. When a regulator asks “what did the agent know at the moment it passed the file to the human gate,” the answer is: open the signed event log, find the memory read that preceded the gate event, and reconstruct the subgraph that was returned. The answer is provable, not asserted.
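The lookup described here, finding the memory read that preceded a gate event, can be sketched over a toy event list. The event shape (`seq`, `mission`, `action`, `payload`), the mission names and the fact identifiers are all hypothetical; the point is only that the read event carries the query parameters and the returned fact IDs, so the subgraph can be rebuilt from the log itself.

```python
from typing import Optional

# Hypothetical, already-verified slice of a signed event log.
log = [
    {"seq": 0, "mission": "aml-q1", "action": "memory_read",
     "payload": {"valid_at": "2025-03-12", "known_at": "2025-03-12",
                 "fact_ids": ["f-101"]}},
    {"seq": 1, "mission": "dora-test", "action": "memory_read",
     "payload": {"valid_at": "2025-03-12", "known_at": "2025-03-12",
                 "fact_ids": ["f-300"]}},
    {"seq": 2, "mission": "aml-q1", "action": "memory_read",
     "payload": {"valid_at": "2025-03-12", "known_at": "2025-03-12",
                 "fact_ids": ["f-101", "f-102"]}},
    {"seq": 3, "mission": "aml-q1", "action": "human_gate",
     "payload": {"decision": "approved"}},
]


def read_before_gate(events: list, gate_seq: int) -> Optional[dict]:
    """Most recent memory read in the same mission before the given gate."""
    gate = events[gate_seq]
    if gate["action"] != "human_gate":
        raise ValueError("entry %d is not a gate event" % gate_seq)
    for entry in reversed(events[:gate_seq]):
        if entry["action"] == "memory_read" and entry["mission"] == gate["mission"]:
            return entry
    return None


evidence = read_before_gate(log, gate_seq=3)
facts_at_gate = evidence["payload"]["fact_ids"]
```

The read from the unrelated `dora-test` mission is skipped, and the as-of parameters stored in `evidence` are what an auditor would replay against the bitemporal index.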

Interactive: click ‘simulate tamper’ on the hash chain to break the signature on any block. Every downstream block becomes invalid immediately. This is what tamper-evidence means in practice: not a policy that says records must not be altered, but a cryptographic structure that makes undetected alteration computationally infeasible without the signing keys. Hover any block to inspect its signed fields.

The combination of bitemporal indexing and cryptographic provenance closes a gap that a significant number of production agent deployments leave open. Bitemporal alone tells you what was known when, but cannot prove the record was not altered afterward. Cryptographic provenance alone can detect tampering, but without bitemporal scoping it cannot answer historical queries. Together they make the entire factory run forensically reconstructable: the subgraph the agent queried, the agent that queried it, the policy that scoped the query, and the signed evidence that none of this was modified post-hoc.

How this sits inside the Substrate factory

When you declare a mission and a budget to Substrate, you are not just setting up a task queue. You are declaring a governed perimeter. Every agent that runs under that mission operates within governed memory: only the facts in scope for the declared policy are visible, all reads and writes are timestamped against both valid and transaction time, and every action is signed into the identity service provenance chain.

This is what “governed by construction” means in practice. Governance is not a layer you add after the swarm runs. The swarm cannot run outside the governed perimeter. There is no API call that bypasses the temporal scoping. There is no write that skips the signing step. The 790,000 lines of owned Rust and Elixir code across the six systems (the mission orchestrator, the realtime data plane, the governed memory engine, the identity service, the agent forge, the cell runtime) were written to a single governance contract, not assembled from components that each have their own idea of what an audit trail means.

Compare this to the glue-stack approach. A typical enterprise pilot today runs an orchestration framework (LangGraph or similar), a vector store (Pinecone or a Postgres extension), a separate logging service, and some audit tooling bolted on at the edges. Each of these has its own timeline. The vector store does not have a transaction time. The logging service timestamps from when the log event arrived, not from when the fact became valid in the domain. If there is a discrepancy between the two, and there often is, you are in the situation of the AML team in the opening: you have data, but you cannot prove what the agent knew when.

The Substrate approach is different not because it has better individual components (though the ~3 ms recall figure is real, and Postgres-based audit tables rarely get there without significant tuning). The difference is that the temporal model and the provenance model are the same model, expressed once, at the storage layer, and inherited by everything above it.

Three sector walkthroughs

Trade finance and AML

A trade-finance evidence pack is fundamentally a reconstruction task. After the fact, you need to show which entities were involved, what rules applied to them at the time, what the agent concluded, and who approved what. Every one of those requirements is a temporal query. The entities may have changed sanctions status between when the transaction was initiated and when it was cleared. The rules may have been updated. The exposure limit may have been revised.

With flat-timeline memory, you assemble the evidence pack from what the system knows now, and hope it matches what the system knew then. Usually it does. When it does not, and the discrepancy ends up in front of a regulator, you have no recourse.

With bitemporal memory, the evidence pack is assembled by querying the memory graph as it stood at the time of each decision. The sanctions status that was visible at T1, the rule version that was in effect at T2, the exposure limit that was valid at T3. Every item in the pack is scoped to the moment it was relevant. The whole pack is signed by the identity service. The detailed trade-finance walkthrough is in the trade-finance evidence pack article.

DORA resilience testing

The Digital Operational Resilience Act requires financial institutions to run and document resilience tests. Specifically, it requires evidence that the test was conducted under conditions representative of the actual operational environment at that point in time. For an agent-driven DORA test, this means the agents must have access to the memory state that was actually in effect during the period being tested, not the current state of the system.

This is a classic bitemporal use case. You are not testing what the system knows now. You are reconstructing what it knew during the relevant period and running the test against that state. Without bitemporal storage, DORA evidence preparation is either impossible (you have no historical state) or manual (someone reconstructs the historical state from backup files and asserts it is accurate). Neither holds up well under audit.

Healthcare claims with lineage

A clinical coding agent classifies an episode of care for billing. The classification depends on procedure codes, diagnosis codes, and payer rules. All three change over time. Procedure codes are revised annually. Diagnosis codes are updated. Payer rules change on contract cycles.

For a claim to be defensible under audit, the agent must have applied the rules that were in effect at the date of service, not the rules in effect at the time of coding. If the claim is reviewed eighteen months later, the auditor needs to see which version of each rule, valid on the date of service, was visible to the agent when it coded the claim. This is a bitemporal query: give me the facts that were valid on the date of service, as known at the time of coding.
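Expressed as SQL, that query has two independent parameters. The sketch below uses Python’s built-in `sqlite3` module with a hypothetical `payer_rule` table, rule identifier and dates; it illustrates the shape of the query and is not the governed memory engine’s schema or query syntax. In the sample data, version `v2` took effect on 1 January 2025 but was only loaded on 10 February.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payer_rule (
        rule_id    TEXT NOT NULL,
        version    TEXT NOT NULL,
        valid_from TEXT NOT NULL,  -- ISO-8601 dates compare correctly as text
        valid_to   TEXT,           -- NULL means "still in effect"
        tx_from    TEXT NOT NULL,  -- when the system recorded the row
        tx_to      TEXT            -- NULL means "still the current record"
    )
""")
conn.executemany(
    "INSERT INTO payer_rule VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Original belief: v1 in effect indefinitely (superseded on 10 Feb).
        ("rule-A", "v1", "2024-01-01", None, "2023-12-15", "2025-02-10"),
        # Corrected belief recorded on 10 Feb: v1 ended when v2 began.
        ("rule-A", "v1", "2024-01-01", "2025-01-01", "2025-02-10", None),
        ("rule-A", "v2", "2025-01-01", None, "2025-02-10", None),
    ],
)

QUERY = """
    SELECT version FROM payer_rule
    WHERE rule_id = :rule
      AND valid_from <= :valid_at AND (valid_to IS NULL OR :valid_at < valid_to)
      AND tx_from    <= :known_at AND (tx_to    IS NULL OR :known_at < tx_to)
"""


def rule_version(rule: str, valid_at: str, known_at: str):
    """Rule version valid on `valid_at`, as the system knew it on `known_at`."""
    row = conn.execute(
        QUERY, {"rule": rule, "valid_at": valid_at, "known_at": known_at}
    ).fetchone()
    return row[0] if row else None


date_of_service, coded_on, reviewed_on = "2025-01-20", "2025-01-25", "2025-03-01"
as_coded = rule_version("rule-A", date_of_service, coded_on)
as_reviewed = rule_version("rule-A", date_of_service, reviewed_on)
```

With this data `as_coded` is `"v1"` and `as_reviewed` is `"v2"`. The auditor can see both that `v2` was legally in effect on the date of service and that the agent could only have seen `v1` when it coded the claim.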

Without bitemporal storage, the clinical coding agent is essentially making legally uncertain decisions because it cannot prove which rule version it used. With it, every claim carries a cryptographically verifiable lineage: the exact rule versions, the model state at the moment of classification, the human gate attestation if one was required, all scoped to the temporal window that matters.

What a bolt-on audit table cannot give you

The instinct of most enterprise architects, when faced with this requirement, is to add an audit table. Insert a row every time a fact changes, capture the old value and the timestamp, and query that table when you need historical state.

This works. Up to a point. The problems emerge in production.

The first problem is consistency. The audit table is a separate data store from the primary memory store. There is always a window, however small, where a write to the primary store has succeeded but the audit table row has not been committed. Under load, or after a crash, these windows become visible. The audit record and the primary record disagree. For a regulatory inquiry, this is a problem.

The second problem is completeness. Audit tables record changes to facts. They generally do not record reads. But in an agent memory system, the read is often as forensically significant as the write. The question “what did the agent know when it made decision X” is answered by what was returned on the read that preceded decision X, not just by what was in the table at that time. Without read provenance, you are missing half the audit trail.

The third problem is integration. An audit table is a log of events. A bitemporal graph is a queryable data structure. Reconstructing the state of a complex entity graph at a specific point in time from a log of change events requires replay logic that is non-trivial to implement correctly and extremely expensive to run on large datasets at query time. The governed memory engine’s bitemporal index is designed to answer these queries in roughly 3 ms because the temporal structure is first-class in the storage model, not reconstructed at query time.
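To see why replay is the expensive path, here is a minimal sketch of reconstructing state from a bolt-on audit table. The row shape `(changed_at, entity, field, new_value)` and the sample rows are hypothetical. Every change recorded before the target time has to be read and applied in order, for every query.

```python
from datetime import datetime


def replay(audit_rows: list, until: datetime) -> dict:
    """Rebuild entity state at `until` by applying every earlier change in order."""
    state: dict = {}
    for changed_at, entity, field, new_value in sorted(audit_rows):
        if changed_at > until:
            break  # rows are sorted, so nothing later can apply
        state.setdefault(entity, {})[field] = new_value
    return state


audit_rows = [
    (datetime(2025, 3, 13), "cp-X", "risk", "High"),
    (datetime(2025, 1, 5), "cp-X", "jurisdiction", "UK"),
    (datetime(2025, 2, 1), "cp-X", "risk", "Medium"),
]

snapshot = replay(audit_rows, until=datetime(2025, 3, 12))
```

The result is `{"cp-X": {"jurisdiction": "UK", "risk": "Medium"}}`. Two limits are worth noticing: the cost grows with the length of the history rather than the size of the answer, and `changed_at` is a transaction time only, so a change that was legally effective before it was recorded cannot be represented at all.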

The fourth problem is provenance. An audit table records that a fact changed. It does not record which agent made the change, under which signed identity, pursuant to which policy, as part of which mission. Without that context, the audit table is a history without a chain of custody. The identity service’s signed event log provides the chain of custody. Together with the governed memory engine’s bitemporal index, you have both the what (what was in memory at any point) and the who, when, and why (the signed provenance of every operation that affected it).

Governance by omission

One more property of the governed memory engine’s design that is relevant to regulated deployments: the governed perimeter does not just scope queries to a time window. It scopes queries to a policy-defined boundary of what is in scope for a given mission.

This has a specific implication for agents handling multiple clients, multiple regulatory jurisdictions, or multiple data classification levels. An agent operating under a mission scoped to EU regulatory requirements simply does not see facts that are outside that scope. They are not returned. They are not suppressed by a post-retrieval filter that could be misconfigured. They are absent from the query space. The spec calls this “governed by omission.”
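One way to picture the difference between omission and post-retrieval filtering is a store partitioned by scope, where a mission is handed a view of its own partition and nothing else. The `ScopedMemory` class, the scope names and the keys below are hypothetical, and this is a sketch of the idea rather than how the governed memory engine implements it.

```python
from types import MappingProxyType


class ScopedMemory:
    """Facts are partitioned by policy scope at write time."""

    def __init__(self) -> None:
        self._partitions: dict = {}

    def write(self, scope: str, key: str, value: str) -> None:
        self._partitions.setdefault(scope, {})[key] = value

    def view(self, scope: str) -> MappingProxyType:
        """Read-only view of a single partition; other scopes are unreachable."""
        return MappingProxyType(self._partitions.setdefault(scope, {}))


memory = ScopedMemory()
memory.write("eu", "client-1/risk", "High")
memory.write("us", "client-9/risk", "Low")

eu_view = memory.view("eu")          # what an EU-scoped mission is handed
visible_keys = list(eu_view)
out_of_scope = eu_view.get("client-9/risk")
```

`visible_keys` is `["client-1/risk"]` and `out_of_scope` is `None`. There is no filter flag to misconfigure, because the US fact was never part of the space the view can search.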

For GDPR, this means that once a hard delete is executed (see the dedicated treatment in GDPR hard-delete and agent memory and hard-delete as governed by omission), the deleted facts do not merely fail to appear in query results. They are not in scope. The memory layer does not know about them in any retrievable sense. The deletion itself is recorded in the signed audit log, as an event with a timestamp and a signing agent identity. What is not in the log is the content of what was deleted. This is the correct cryptographic posture for a system that must prove erasure.

What to demand in an RFP

If you are evaluating memory infrastructure for a regulated agent deployment, the temporal question separates the serious systems from the demos faster than any benchmark.

Ask whether the system has a bitemporal data model. Not “do you track history” (almost everything does in some form) but: can I query your memory store for the facts that were valid on date D1 as known at date D2, where D1 and D2 are independent parameters? Ask them to show you the query syntax and run it against a live dataset with a test case where valid time and transaction time diverge. Many systems will struggle.

Ask whether reads are part of the audit trail, not just writes. Specifically: if a regulator asks what facts were retrieved by agent X at time T, can you produce a cryptographically signed record of that retrieval? A system that logs writes but not reads is missing the evidence most relevant to reconstructing agent behaviour.

Ask whether the audit log and the memory store share a time model. If the memory store lives in one tool and the audit log lives in another, they will disagree when you need them most. The Substrate approach is that the signed event log and the bitemporal memory index are produced by the same operation, under the same governance contract, with no possibility of divergence.

Ask what happens to the memory model when the deployment goes air-gapped. For sovereign deployments on private infrastructure (which a significant number of regulated buyers require), the bitemporal model must function identically with no dependence on an external time authority or cloud service. Because Substrate runs on the customer’s own hardware by default, this is not a special case. It is the baseline.

Ask for the recall latency under temporal scoping. A bitemporal query that takes 800 ms per call will kill a real-time trading workflow. The ~3 ms figure for the governed memory engine is achievable because temporal scoping is in the index, not applied as a post-retrieval filter.

A 90-day pilot design

Pick one regulated workflow where the audit reconstruction question is a genuine pain point. AML exception handling. DORA evidence preparation. Clinical audit review. Any workflow where, if a regulator asked “show me what the system knew at time T”, the current answer is “we would have to look at backups and reconstruct it manually.”

Run it through Substrate for a quarter. The three numbers to track are: the percentage of audit queries that can be answered directly from the signed event log without manual reconstruction; the time to produce an evidence pack scoped to a specific historical time window; and the number of occasions the swarm returned a temporally inconsistent fact (a fact that was out of scope for the valid-time window of the query). The last number, if the deployment is correct, should be zero.
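The three numbers are simple to compute once the pilot logs the right fields. The sketch below assumes hypothetical record shapes and uses invented sample values purely to show the arithmetic; it takes the median as the summary for evidence-pack time, which is one reasonable choice among several.

```python
from statistics import median


def in_valid_window(r: dict) -> bool:
    """True if the fact's valid interval contains the query's valid-time."""
    return r["valid_from"] <= r["query_valid_at"] and (
        r["valid_to"] is None or r["query_valid_at"] < r["valid_to"]
    )


def pilot_summary(audit_queries: list, pack_minutes: list, retrievals: list) -> dict:
    direct = sum(1 for q in audit_queries if q["answered_from_signed_log"])
    return {
        "direct_answer_pct": 100.0 * direct / len(audit_queries),
        "median_pack_minutes": median(pack_minutes),
        "inconsistent_facts": sum(1 for r in retrievals if not in_valid_window(r)),
    }


# Invented sample data; ISO-8601 date strings compare correctly as text.
audit_queries = [
    {"id": "q1", "answered_from_signed_log": True},
    {"id": "q2", "answered_from_signed_log": True},
    {"id": "q3", "answered_from_signed_log": False},
    {"id": "q4", "answered_from_signed_log": True},
]
pack_minutes = [12, 9, 15]
retrievals = [
    {"query_valid_at": "2025-03-12", "valid_from": "2025-01-01", "valid_to": None},
    {"query_valid_at": "2025-03-12", "valid_from": "2025-03-11", "valid_to": "2025-04-01"},
]

summary = pilot_summary(audit_queries, pack_minutes, retrievals)
```

For this sample the summary is 75.0 percent answered directly, a median of 12 minutes per pack, and zero inconsistent facts.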

What you are really testing is whether the governance properties are real or asserted. Any vendor can claim “full audit trail.” The question is whether the audit trail can reconstruct the exact memory state the agent operated on, at any point in time, with cryptographic proof that the record was not altered afterward. If the answer is yes, you have a system a CRO can sign off on. If the answer is “it depends on which logs you look at,” you do not.

The investor brief, including the financial model for a compliant agent memory deployment against a traditional bolted-on audit approach, is available at /substrate. For the deeper treatment of how provenance separates compliant deployments from everything else, see provenance as the real differentiator. And for the full story of how the Substrate factory turns a declared mission and budget into a signed, auditable output, declare the mission and the budget is the place to start.