Article

Spreading activation and governance: explainable retrieval regulators can trust

author Jonathan Conway
timestamp 11 May 2026
classification kizuna-mem / spreading-activation / governed-memory / explainability / act-r / retrieval / compliance / regulated

A healthcare payer I spoke to recently ran an AI pilot on prior-authorisation decisions. The system performed well on accuracy. Then their clinical governance lead asked a simple question: why did the agent retrieve those three clinical precedents and not the four more recent ones? Nobody could answer. The retrieval was a vector similarity score against a flat embedding store. It had no path. It had no reason. It just had a number, and a number is not a defence.

This is not a niche problem. It is the problem that separates a demo from a system a regulated enterprise can actually run. The EU AI Act, Article 11, requires high-risk AI systems to have technical documentation sufficient to demonstrate conformity with the Act’s requirements. Article 13 requires that high-risk systems be transparent in a way that allows deployers to interpret the system’s output. When the output is a clinical decision, a credit determination, or a government eligibility ruling, the retrieval path that produced the agent’s context is part of the evidence chain. If you cannot reconstruct it, you do not have evidence. You have a black box with a good benchmark number.

The Substrate factory’s answer to this is spreading activation over a governed temporal graph in Kizuna-mem. The name comes from cognitive science. The mechanism is precise. And the governance layer is what turns it from a clever retrieval trick into something a CRO can sign off on.

The problem with the two popular alternatives

Before explaining what spreading activation does, it is worth being specific about what the alternatives cannot do and why that gap is structural rather than incidental.

Flat vector search retrieves memories by embedding similarity. You compute a vector for the query and find the nearest neighbours in the embedding space. Fast, cheap, well-understood. The problem is that similarity is not relevance, and relevance in a regulated context is not even a single concept. A document can be semantically similar to a query but refer to a superseded policy. It can be highly relevant to the query text but completely out of scope for this mission. It can be exactly the memory you need but have been hard-deleted because the data subject exercised their rights under GDPR. Vector search has no native concept of any of this. You can bolt governance rules onto the outside, but the retrieval path itself does not carry them, which means you cannot reconstruct a governed path after the fact.

Community-summary GraphRAG (as popularised by Microsoft Research’s 2024 paper and its derivatives) does something more interesting. It builds a hierarchical community structure over the graph and uses community summaries to answer broad questions. The retrieval path is more structured than flat vector search. But the path goes through summaries, not original memories, which creates a provenance problem for regulated work. The agent retrieved context that passed through a summarisation step, and you cannot show the auditor the original memory node, because the retrieval path did not visit it. More practically, the community structure is computed as a global operation over the entire graph, which means governance-by-omission (the principle that out-of-scope memories should simply not be present) requires either maintaining separate graphs per scope or re-running the global community detection every time scope changes. Neither is practical in a factory that handles missions across multiple regulated domains simultaneously.

Spreading activation starts from a different premise. The question it asks is not “which memories are most similar to this query?” but “given this anchor in the graph, how does activation propagate through the network, and which nodes accumulate enough activation to be retrieved?” That question has a path. The path is explainable. And if you govern which nodes are in the graph at all, the governance is intrinsic to the retrieval rather than bolted onto it.

Where spreading activation comes from

The cognitive architecture ACT-R (Adaptive Control of Thought, Rational), developed by John Anderson at Carnegie Mellon University, models human memory as a spreading activation network. The core intuition is that memory retrieval is not a lookup operation but a competition. Chunks in memory have activation levels. When a query fires, activation spreads from the query to associated chunks along weighted links. The chunk that accumulates the most activation above a threshold is retrieved.

Three mechanisms from ACT-R and the wider spreading-activation literature are directly relevant to Kizuna-mem’s implementation.

Base-level activation (BLA) captures the frequency and recency of prior access. A memory that has been retrieved many times and recently has higher base-level activation than one that was written once three months ago and never touched. In ACT-R the formula involves a sum over prior retrieval events with a power-law time decay. In Kizuna-mem the BLA component is an interpolation between structural centrality (how many evidence paths run through this node) and temporal recency, controlled by a per-profile alpha parameter. Lower alpha gives you more recency weighting. Higher alpha gives you more structural importance.
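A minimal sketch of the two ideas in the paragraph above. The first function is the standard ACT-R base-level equation (the log of summed, power-law-decayed access ages, with the conventional decay of 0.5). The second is an assumption: the article does not give Kizuna-mem’s formula, so the hypothetical `blended_bla` simply interpolates linearly between a centrality score and a recency score, both taken as already scaled to [0, 1].

```python
import math

def actr_base_level(ages, decay=0.5):
    """ACT-R base-level activation: ln of summed power-law-decayed accesses.

    ages: time elapsed since each prior access (any positive unit).
    """
    return math.log(sum(age ** -decay for age in ages))

def blended_bla(centrality, recency, alpha):
    """Hypothetical blend of structural centrality and recency.

    alpha = 0 is pure recency; alpha = 1 is pure structural centrality.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * centrality + (1.0 - alpha) * recency

# A memory touched often and recently vs. one written once ~90 days ago.
frequent = actr_base_level([1, 2, 5, 10])
stale = actr_base_level([90])
```

With these inputs `frequent` is higher than `stale`, which is the ordering the paragraph describes. Moving `alpha` toward 0 makes the blended score track recency; moving it toward 1 makes it track centrality.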

Fan effect describes the interference pattern that occurs when a node has many links. The more things a concept is associated with, the slower and less certain its retrieval. In ACT-R terms, activation spreads but divides across the fan. In Kizuna-mem terms, this is reflected in the edge weight normalisation: when a node has many outgoing edges, each individual edge carries less activation than it would in a sparse neighbourhood. This is why a highly connected community node does not automatically dominate retrieval. Its activation per downstream path is diluted by the fan.
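The dilution can be sketched as a normalisation over a node’s outgoing edges. This is an illustration, not Kizuna-mem’s actual weighting scheme: the function name and the choice to make weights sum to 1 are assumptions. ACT-R itself expresses the same idea as an associative strength that falls with the log of the fan.

```python
def fan_normalised_weights(out_edges):
    """Divide a node's outgoing activation across its fan.

    out_edges: {neighbour: raw_weight}. Returns weights that sum to 1,
    so each edge carries less as the number of edges grows.
    """
    total = sum(out_edges.values())
    if total <= 0:
        return {}
    return {nbr: w / total for nbr, w in out_edges.items()}

# Two outgoing edges vs. ten: the hub passes less along each path.
sparse = fan_normalised_weights({"fact_a": 1.0, "fact_b": 1.0})
hub = fan_normalised_weights({f"fact_{i}": 1.0 for i in range(10)})
```

Each edge out of the sparse node carries half of its activation, while each edge out of the hub carries a tenth.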

Lateral inhibition is the mechanism by which one strongly-activated node suppresses nearby competitors. In the brain this prevents retrieval from flooding the system with every weakly-associated memory. In the diagram below, the furthest reachable non-anchor node dims late in the propagation cycle, which represents the inhibition signal cutting off low-relevance paths once the high-relevance ones have fired. This is the mechanism that keeps the retrieved context focused rather than encyclopaedic.
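One simple way to model this, offered as an assumption because the article does not specify Kizuna-mem’s inhibition rule: the strongest node subtracts a fraction of its own activation from every competitor, and anything that falls below the retrieval threshold is dropped. The function name and the `inhibition` and `threshold` parameters are hypothetical.

```python
def apply_lateral_inhibition(activations, inhibition=0.3, threshold=0.1):
    """Suppress competitors in proportion to the strongest node's activation.

    activations: {node: activation}. The winner is left untouched; every
    other node loses inhibition * winner_activation, and nodes that end
    below threshold are dropped from the result.
    """
    if not activations:
        return {}
    winner = max(activations, key=activations.get)
    penalty = inhibition * activations[winner]
    kept = {}
    for node, value in activations.items():
        adjusted = value if node == winner else value - penalty
        if adjusted >= threshold:
            kept[node] = adjusted
    return kept

raw = {"precedent_2025": 0.9, "policy_v3": 0.5, "distant_note": 0.3}
focused = apply_lateral_inhibition(raw)
```

Here the weakly activated `distant_note` falls below the threshold once the winner’s penalty is applied, so only the two stronger memories remain in the context.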

These three mechanisms together give you retrieval behaviour that looks remarkably like what you would want from a governed enterprise agent: it prefers recent and frequently-used memories, it does not let highly-connected hub nodes dominate, and it naturally narrows the context window toward the most activated paths.
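Putting propagation itself into code helps show why the result has a path. The sketch below is a simplified breadth-first version under stated assumptions: edges are unweighted, activation is multiplied by a fixed `decay` at each hop and split evenly across the fan, and only forward (deeper) edges accumulate. None of these choices is confirmed as Kizuna-mem’s actual computation, and all names are hypothetical.

```python
from collections import deque

def spread_activation(graph, anchor, decay=0.5, threshold=0.05, max_hops=3):
    """Breadth-first spreading activation with per-hop decay and fan dilution.

    graph: {node: [neighbour, ...]} (directed, unweighted for simplicity).
    Returns {node: (activation, hop_depth)} for nodes at or above threshold.
    """
    activation = {anchor: 1.0}
    depth = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if depth[node] >= max_hops:
            continue
        neighbours = graph.get(node, [])
        if not neighbours:
            continue
        # Each hop decays the signal and splits it across the fan.
        share = activation[node] * decay / len(neighbours)
        for nbr in neighbours:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                activation[nbr] = 0.0
                queue.append(nbr)
            # Accumulate only along edges that lead one hop deeper.
            if depth[nbr] == depth[node] + 1:
                activation[nbr] += share
    return {n: (a, depth[n]) for n, a in activation.items() if a >= threshold}

graph = {
    "anchor": ["entity_x", "entity_w"],
    "entity_x": ["episode_y"],
    "episode_y": ["fact_z"],
}
result = spread_activation(graph, "anchor")
tight = spread_activation(graph, "anchor", threshold=0.1)
```

With the default threshold, `fact_z` is reached at depth 3 with activation 0.0625. Raising the threshold to 0.1 removes it, which is the retrieval horizon tightening.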

Interactive diagram: spreading activation from a selectable anchor node. Activation propagates outward from the anchor with fan-effect dilution and lateral inhibition visible at the edges, and a decay control tightens or loosens the retrieval horizon. Each node reports its activation level, hop depth from the anchor, and decay calculation.

What governance adds to the model

The cognitive science gives you a good retrieval algorithm. Governance is what makes it defensible.

In Kizuna-mem, governance operates at two levels, and both matter for audit.

Governance by scope means that the temporal graph the agent queries is not the global memory graph. It is a scoped view, filtered at write time by mission perimeter. When Ninmu declares a mission, it establishes a scope context. Kizuna-mem ensures that only memories with valid provenance within that scope context are present in the graph the agent sees. This is not a filter applied at query time. The out-of-scope memories are genuinely absent from the traversal. An agent running a trade-finance evidence pack cannot accidentally retrieve memories from a different client’s regulatory investigation, because those memories do not exist in its graph.
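The difference between “filtered” and “absent” is easy to see in code. In this sketch the scoped graph is materialised once, and out-of-scope nodes and any edge touching them are never copied in. The `Memory` type, the scope labels, and the one-pass construction are illustrative assumptions; the article describes scoping applied at write time, which this does not model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Memory:
    node_id: str
    scope: str  # hypothetical mission-perimeter label
    text: str

def build_scoped_graph(memories, edges, mission_scope):
    """Materialise the graph an agent may traverse for one mission.

    Out-of-scope nodes and any edge touching them are never copied in,
    so traversal code has nothing to filter and nothing to leak.
    """
    nodes = {m.node_id: m for m in memories if m.scope == mission_scope}
    adjacency = {node_id: [] for node_id in nodes}
    for src, dst in edges:
        if src in nodes and dst in nodes:
            adjacency[src].append(dst)
    return nodes, adjacency

memories = [
    Memory("tf-1", "client-a/trade-finance", "Letter of credit terms"),
    Memory("tf-2", "client-a/trade-finance", "Shipping document check"),
    Memory("inv-1", "client-b/investigation", "Regulatory interview notes"),
]
edges = [("tf-1", "tf-2"), ("tf-2", "inv-1")]
nodes, adjacency = build_scoped_graph(memories, edges, "client-a/trade-finance")
```

Any traversal run over `adjacency` cannot reach `inv-1`, because neither the node nor the edge leading to it exists in the structure it was given.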

This is the “governed by construction” principle from the Substrate homepage made concrete. The governance is not a permission check on the retrieval result. It is a structural property of the graph the retrieval runs on.

Governance by time is what Kizuna-mem’s bitemporal design provides. Every memory node carries two timestamps: valid time (when the information was true in the world) and transaction time (when the memory was recorded in the system). Every query has a natural time anchor. When the agent asks “what did we know about this counterparty’s risk profile at the time of the October trade?”, the spreading activation runs over a bitemporal slice: only memories whose valid time covers the trade date and whose transaction time is no later than the point the query is anchored to. This is not a post-hoc filter. The graph itself is time-indexed, and the activation computation is over the time-sliced graph. The bitemporal design is covered in detail in bitemporal memory as the compliance backbone.
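A bitemporal slice can be sketched with two comparisons per fact: one on valid time and one on transaction time. The `BitemporalFact` type and its field names are assumptions made for illustration, and a production store would index these ranges rather than scan them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class BitemporalFact:
    node_id: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means still valid
    recorded_at: datetime         # transaction time

def bitemporal_slice(facts, valid_at, known_at):
    """Return ids of facts true at valid_at and already recorded by known_at."""
    selected = []
    for f in facts:
        covers = f.valid_from <= valid_at and (
            f.valid_to is None or valid_at < f.valid_to
        )
        if covers and f.recorded_at <= known_at:
            selected.append(f.node_id)
    return selected

facts = [
    BitemporalFact("rating_low", datetime(2025, 9, 1), None, datetime(2025, 9, 2)),
    # True in the world from 10 October, but only recorded on 20 November.
    BitemporalFact("sanctions_hit", datetime(2025, 10, 10), None, datetime(2025, 11, 20)),
]
trade_date = datetime(2025, 10, 15)
known_then = bitemporal_slice(facts, trade_date, known_at=trade_date)
known_later = bitemporal_slice(facts, trade_date, known_at=datetime(2025, 12, 1))
```

The first slice answers “what did we know at the time of the trade” and excludes the sanctions hit. The second answers “what do we now know was true at the time of the trade” and includes it.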

The combination of scope governance and temporal governance means that the spreading activation path is not just explainable: it is auditable. The auditor can ask “which memories were present in the graph at the time of this retrieval?” and get a precise answer, because the graph state at that moment is recorded in the bitemporal log. The auditor can ask “why was this memory retrieved?” and get an answer that reads like an inference chain: activation propagated from the mission anchor (the query) through entity node X, then episode node Y, then fact node Z, with the weights shown. That is an audit trail.

How the activation path becomes the evidence trail

The explainability of spreading activation is not just a qualitative claim. The path can be serialised.

When Kizuna-mem runs a retrieval, it records the activation computation: which node was the anchor, which edges were traversed in order of depth, what activation value each node accumulated, and which nodes crossed the retrieval threshold. This record is signed by Ultra and appended to the mission’s tamper-evident log. The retrieved memories come with their paths attached, not as a separate log entry but as metadata on the retrieval event itself.
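As a rough sketch of what a serialised, signed retrieval record could look like: the record is encoded canonically, chained to the previous log entry by hash, and signed. The record fields, the function names, and the use of HMAC with a shared key are all assumptions. The article does not describe how Ultra signs records, and a real deployment would more likely use asymmetric signatures.

```python
import hashlib
import hmac
import json

def sign_retrieval(path_record, key, prev_hash):
    """Serialise a retrieval path, chain it to the previous entry, and sign it."""
    body = json.dumps(
        {"prev": prev_hash, "record": path_record},
        sort_keys=True,
        separators=(",", ":"),
    )
    entry_hash = hashlib.sha256(body.encode()).hexdigest()
    signature = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "hash": entry_hash, "signature": signature}

def verify_entry(entry, key, expected_prev):
    """Check the signature and that the entry chains to expected_prev."""
    expected_sig = hmac.new(key, entry["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_sig, entry["signature"]):
        return False
    return json.loads(entry["body"])["prev"] == expected_prev

record = {
    "anchor": "mission_query",
    "steps": [
        {"node": "entity_x", "depth": 1, "activation": 0.25},
        {"node": "episode_y", "depth": 2, "activation": 0.125},
    ],
    "retrieved": ["episode_y"],
}
key = b"demo-key-not-for-production"
genesis = "0" * 64
entry = sign_retrieval(record, key, prev_hash=genesis)
```

Changing any activation value in `entry["body"]` after the fact makes `verify_entry` return `False`, and so does presenting the entry against the wrong predecessor hash.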

For a regulated enterprise, this has a specific value. The agent’s context at any decision point is now not just a bag of retrieved text. It is a signed graph traversal with attribution back to the original memory nodes, each of which in turn has its own provenance record (when it was written, by which agent action, with what source citation). The chain from the agent’s output back to the raw observations that informed it is complete and tamper-evident.

This is what the compliance trace diagram below shows. The retrieval path from the spreading activation becomes one of the artifacts in the evidence pack. It sits alongside the signed action logs, the human gate attestations, and the policy version records. When the regulator asks “show me why the agent decided X”, the answer includes the memory retrieval path as a first-class document.

Interactive diagram: compliance trace of the evidence pack. Each workflow step emits a specific artifact owned by a Substrate system, and an optional overlay marks the artifacts a LangGraph and vector-store stack would be unable to produce. The retrieval path record from Kizuna-mem is the step that makes the memory retrieval auditable.

Why this matters specifically for long missions

The case for explainable retrieval is easy to make for a single decision. It becomes more urgent when the mission runs for days or weeks.

A factory mission that produces a quarterly control test, or monitors a regulatory change programme over several months, accumulates a large and evolving graph. The same entity appears in many episodes at different times. Policies change. Data subjects’ records are updated. New evidence arrives that supersedes earlier beliefs. In a flat vector store, all of this history exists as a cloud of embeddings with no explicit structure. The agent retrieves the nearest neighbours, which may be recent or historical depending on the accident of the embedding space, and there is no way to explain why the system chose one historical version over another.

In a governed temporal graph with spreading activation, the structure is explicit. The fact that the November episode superseded the October belief is an edge in the graph, with a transaction time attached. When the activation propagates, it naturally flows through the current state of the graph, and the traversal record shows exactly which time slice it operated on.
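The supersession idea can be sketched as follows, assuming a hypothetical representation in which each “supersedes” edge carries the transaction time at which it was recorded. A belief counts as current for a given time slice if no edge recorded by then retires it.

```python
from datetime import datetime

def current_beliefs(nodes, supersedes, as_of):
    """Return node ids not superseded by any edge recorded at or before as_of.

    nodes: {node_id: recorded_at}
    supersedes: list of (newer_id, older_id, recorded_at) edges.
    """
    visible = {n for n, t in nodes.items() if t <= as_of}
    retired = {
        old
        for new, old, t in supersedes
        if t <= as_of and new in visible
    }
    return visible - retired

nodes = {
    "october_belief": datetime(2025, 10, 5),
    "november_episode": datetime(2025, 11, 12),
}
supersedes = [("november_episode", "october_belief", datetime(2025, 11, 12))]

end_of_october = current_beliefs(nodes, supersedes, datetime(2025, 10, 31))
december = current_beliefs(nodes, supersedes, datetime(2025, 12, 1))
```

Run as of the end of October, the October belief is still current. Run in December, only the November episode is, and the edge that retired the earlier belief is itself a dated record an auditor can inspect.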

This is the architecture that makes multi-hop reasoning over long missions tractable without hallucination. The agent does not need to reason about temporal ordering in its prompt context, because the graph has already encoded the temporal ordering structurally. The agent asks a question, the activation spreads through the governed, time-indexed graph, the retrieved memories are the ones that accumulated enough activation under the current scope and time constraints, and the path is signed and logged.

The connection to HippoRAG-style approaches (from the work of Gutiérrez et al. at Ohio State, 2024) is worth noting here. HippoRAG proposed using personalised PageRank over a knowledge graph as a more biologically inspired retrieval method. Kizuna-mem’s spreading activation shares the graph-traversal intuition but adds the ACT-R-inspired mechanisms (BLA, fan effect, lateral inhibition) and, critically, the bitemporal governance layer that makes it defensible for regulated work rather than just more accurate.

Mem0’s graph memory approach and similar production systems have reported that graph-structured memory outperforms flat vector retrieval on multi-hop reasoning benchmarks. The governance dimension is not something those systems were designed for, which is why they work well for developer productivity tools and poorly for anything that has to answer to a regulator.

The sector walkthrough: healthcare prior authorisation

Return to the prior-authorisation problem from the opening. A patient’s authorisation decision involves multiple types of relevant memory: clinical precedents for the patient’s condition, the insurer’s current coverage policies, the patient’s own prior claims history, any exceptions or overrides previously granted, and the clinical guidelines in force at the date of service (not the current date, which may be different if a policy was subsequently changed).

With a flat vector store, the agent retrieves the most similar memories to the query. It may retrieve a clinical precedent that was superseded by a later guidelines update. It may retrieve a policy that was in force last year but not this year. It has no way to distinguish between the patient’s own history and a superficially similar case for a different patient if both happen to have high embedding similarity to the current query.

With spreading activation over a governed temporal graph, the retrieval is anchored to a specific point in time (the date of service), operates only over the patient’s scoped graph (other patients’ records are not present), and the path from the clinical precedent back to the current query is explicit and signed.

The before state is a three-week cycle: a human reviewer reads the documents, constructs the reasoning chain manually, and submits the decision with a hand-assembled audit trail that frequently shows gaps when challenged. The after state is a mission: declare the patient file, the relevant date range, the coverage policy version, and the budget. The swarm runs the retrieval over the governed temporal graph, surfaces the cases where the evidence is ambiguous to a clinical reviewer at a human gate, and produces a signed decision with the full retrieval path attached.

The human reviewer is not removed. They are promoted to the decisions that require human judgement rather than the mechanical construction of an evidence chain. And the evidence chain, when it is produced, is auditable by construction rather than assembled after the fact.

For a government context, the same pattern applies to eligibility assessments, case management decisions, and benefit determinations, all of which fall under the EU AI Act’s high-risk designation in Annex III, point 5 (access to essential public services and benefits), and which have carried explainability requirements in some jurisdictions since long before the Act was written. For finance, credit decisions and AML determinations sit under similar regimes. The vertical differs. The architectural requirement does not.

What GraphRAG cannot retrofit

The community-summary approach to GraphRAG is genuinely useful for exploratory question-answering over large knowledge bases. It is not a good fit for regulated retrieval, and the reasons are structural.

Community detection (Louvain, Leiden, or similar) is a global algorithm. It produces a partition of the graph based on the full graph structure at computation time. Adding new memories or changing scope requires recomputation, so the partition is either expensive to keep current or stale. The community summaries are generated by an LLM and are not tamper-evident: you cannot prove to an auditor that the summary accurately represents the underlying memories, because the summary is a lossy, generative artefact.

The retrieval path through a community summary does not give you attribution to specific memory nodes. The agent received a summary. The summary was generated from some set of nodes. The regulator wants to know which nodes. You cannot answer that question from the community summary alone.

None of this is a criticism of community-summary GraphRAG as a system. It is a precise statement about the gap between what it was designed for (exploratory RAG over enterprise documents) and what regulated agent memory requires (explainable, governed, bitemporal, tamper-evident retrieval paths).

Spreading activation over a governed temporal graph was designed for the second use case. The connection to provenance as the real differentiator is direct: the activation path is not just a better retrieval mechanism, it is a provenance record for the context the agent acted on.

What to put in an RFP

If you are evaluating agentic platforms for regulated work and memory is in scope (it should be), these questions separate systems that can answer for their retrievals from those that cannot.

Ask whether the system can produce a signed, serialised retrieval path for any context window it provided to an agent. Not a log of which documents were retrieved, but the traversal path with activation values and edge weights. If the answer is “we can show you the top-K results and their similarity scores”, you have a vector store, not a governed memory system.

Ask how the system handles temporal retrieval. Specifically: if a policy changed after a decision was made, and the regulator asks what the agent knew at the time of the decision, can the system replay the exact retrieval that would have run at that time, over the graph state that existed at that time? If the answer involves any form of log reconstruction or approximate reconstruction, the system does not have bitemporal memory. It has logs.

Ask how scope is enforced. Is out-of-scope data filtered at query time, or is it absent from the graph the agent queries? The distinction matters for audit. “We filter it” means the data is present and a bug could leak it. “It is not there” means the scope enforcement is structural.

Ask whether a deletion is cryptographically evidenced. For GDPR Article 17 and EU AI Act compliance, a deletion that removes a data subject’s records from the agent memory must itself be logged in the tamper-evident audit trail, with a record of what was deleted (without re-exposing the deleted data), when, and on whose instruction. Most vector stores and graph databases treat deletion as a database operation. In a governed factory, it is a signed event in the audit log. The full account is in factory memory that improves itself.

Ask what the latency looks like. Spreading activation with a governed temporal graph is more complex than vector search. The claim that matters is whether it fits the agent’s operational requirement. The verified figure for Kizuna-mem is approximately 3 ms bitemporal recall. That is fast enough for real-time agent use and does not require the common compromise of pre-computing static summaries that lose the retrieval path.
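The deletion question above can be made concrete with a small sketch of a log entry that evidences a deletion without repeating what was deleted. Everything here is an assumption for illustration: the field names, the salted SHA-256 digests of node ids, and the hash chaining. Signing is omitted for brevity, and whether a salted digest is sufficiently non-identifying is a question for your own legal review.

```python
import hashlib
import json
from datetime import datetime, timezone

def deletion_event(deleted_ids, instructed_by, prev_hash, salt, when=None):
    """Build a log entry evidencing a deletion without storing what was deleted.

    Only salted digests of the node ids are kept, so the entry can confirm
    that a given id was deleted without re-exposing the id or its content.
    """
    when = when or datetime.now(timezone.utc)
    digests = sorted(
        hashlib.sha256(salt + node_id.encode()).hexdigest()
        for node_id in deleted_ids
    )
    body = json.dumps(
        {
            "type": "deletion",
            "deleted_digests": digests,
            "instructed_by": instructed_by,
            "at": when.isoformat(),
            "prev": prev_hash,
        },
        sort_keys=True,
    )
    return {"body": body, "hash": hashlib.sha256(body.encode()).hexdigest()}

salt = b"per-log-secret"
event = deletion_event(
    ["patient-123/claim-9"],
    instructed_by="dpo@example.org",
    prev_hash="0" * 64,
    salt=salt,
    when=datetime(2026, 5, 11, tzinfo=timezone.utc),
)
```

Someone holding the salt and a candidate node id can confirm it appears in the entry, while the entry itself contains no readable identifier.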

A 90-day pilot design

You do not need to run the full factory to test whether governed explainable retrieval is real. Pick one decision type that has an existing audit trail: prior-authorisation denials, AML escalations, credit exceptions. Run the factory on historical cases with the outcomes already known. At each retrieval, capture the activation path. Then ask a domain expert to review ten paths and rate whether the retrieved memories are the ones a human expert would have selected and whether the path is coherent.

That exercise tells you three things. Whether the graph structure is good enough to support the reasoning. Whether the governance filters are working (no out-of-scope memories appearing in the paths). And whether the retrieval explanation would survive scrutiny from the same expert who would review the underlying decision.
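Two of those checks are mechanical enough to script. The sketch below assumes captured paths are available as lists of node ids, that the in-scope node set is known for each mission, and that the expert scores each path from 1 to 5. The scale, the pass mark, and the function name are hypothetical.

```python
def audit_paths(paths, allowed_nodes, expert_ratings, pass_mark=4):
    """Summarise a pilot review of captured retrieval paths.

    paths: {retrieval_id: [node_id, ...]}
    allowed_nodes: set of node ids that are in scope for the mission
    expert_ratings: {retrieval_id: score from 1 to 5}
    """
    scope_leaks = {}
    for retrieval_id, node_ids in paths.items():
        outside = [n for n in node_ids if n not in allowed_nodes]
        if outside:
            scope_leaks[retrieval_id] = outside
    rated = [score for rid, score in expert_ratings.items() if rid in paths]
    share_ok = sum(s >= pass_mark for s in rated) / len(rated) if rated else 0.0
    return {"scope_leaks": scope_leaks, "share_rated_coherent": share_ok}

paths = {
    "r1": ["anchor", "policy_v3", "precedent_a"],
    "r2": ["anchor", "other_client_note"],
}
allowed = {"anchor", "policy_v3", "precedent_a"}
ratings = {"r1": 5, "r2": 2}
report = audit_paths(paths, allowed, ratings)
```

A non-empty `scope_leaks` is a hard failure for the governance claim, whatever the ratings say. The coherence share is a softer signal and only means something alongside the expert’s written comments.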

Most vendor demos will show you accuracy numbers on a benchmark. What you are testing in this pilot is not accuracy. It is whether the system can answer for itself. That is a different question, and it is the one your audit committee will ask when a decision is challenged.

Cross-reference

The bitemporal design that makes the temporal dimension of retrieval auditable is covered in detail in bitemporal memory as the compliance backbone. The provenance records that attach to each memory node and travel with the retrieval path sit on top of the architecture described in provenance as the real differentiator. The EU AI Act logging requirements that make explicit retrieval paths a compliance necessity rather than a nice-to-have are mapped in EU AI Act Article 12: what to log. And the broader memory governance architecture, including how the factory’s memory improves itself over time within the governed perimeter, is in the factory’s memory that improves itself.

If your question is not about memory architecture but about whether the full factory can be deployed in your environment without data leaving the building, that is the sovereign deployment story in sovereign AI, air-gapped by default.

If you want the investor brief, the full technical and commercial case for Substrate, including the financial model and deployment architecture, is available at /substrate.