
Trade finance evidence packs with full lineage: from associate-weeks to machine-hours

author Jonathan Conway
timestamp 8 May 2026
classification trade-finance / aml / evidence-pack / lineage / compliance / substrate / dark-factory / regulated

A trade-finance analyst at a large European bank once told me her team spent two weeks assembling the evidence pack for a single complex correspondent-banking review. Not investigating the case. Not deciding anything. Gathering, copying, reformatting, and stapling together the evidence. The actual decision took an afternoon. The other nine days were the paper trail.

That ratio is not unusual. In trade finance, AML compliance, and the surrounding ecosystem of regulatory reporting, the dominant cost is not analytical work. It is the assembly of evidence that regulators, internal auditors, or counterpart banks will later inspect. Pull transaction data from one system, entity information from another, sanctions lists from a third, the rule engine’s output from a fourth. Reconcile the timestamps. Map the entities. Write a narrative that shows a regulator exactly what you looked at, when, and what you decided. Then sign it and hope nobody asks for a source reference you cannot produce.

The manual version of this is slow, expensive, and fragile. Slow because the assembly is sequential and human. Expensive because the people doing it are skilled analysts who could be doing something more useful. Fragile because the audit trail is constructed retrospectively: someone sat down after the work was done and tried to reconstruct from logs the exact sequence of events. That reconstruction is where gaps appear. It is also where regulators ask uncomfortable questions.

The Financial Action Task Force’s 2024 guidance on technology-enabled AML compliance (source: FATF, November 2024) is explicit that automated evidence trails with full lineage are not just acceptable but preferable, provided the automation itself is auditable. That is the word that concentrates minds. Auditable automation is not the same as automation you can audit after the fact with heroic effort. It means the automation was built from the start to produce a verifiable record of what happened.

This article walks through what that looks like end to end, using a realistic trade-finance evidence pack as the worked example. The goal is not a theoretical walkthrough. It is the specific architecture, the signed artifacts, the data structures, the human gates, and the lineage links that a regulator or internal audit committee can actually inspect.

The problem with assembling evidence by hand

The assembly problem has three layers, and confusing them leads to bad solutions.

The first layer is gathering: finding all the relevant source documents across systems that were not designed to talk to each other. An LC (letter of credit) workflow might touch a trade-finance platform, a KYC database, a core banking ledger, an OFAC/sanctions API, an internal entity resolution service, and a correspondent bank’s SWIFT messages. None of these systems were designed to be queried together with a shared timeline. Assembling them requires someone who knows all the systems and has access to all the systems.

The second layer is entity resolution: deciding that the “Sunrise Trading Co Ltd” in the LC application is the same entity as the “Sunrise Trading Company” in the sanctions hit, which is the same entity as “SUNRISE TRDG CO LTD” in the payment instruction. This is not a string match. It requires cross-referencing company registries, beneficial ownership data, correspondent histories, and analyst judgement. Get it wrong and you either miss a genuine sanctions exposure or generate a false positive that burns analyst time.

The third layer is lineage: showing that the decision you reached was based on the evidence you gathered, gathered at a specific point in time, from specific sources, using a specific version of the rules. This is the layer that matters most to regulators and matters least to the people building agent pipelines, because it is entirely invisible in demos. A demo shows the right answer. An audit shows how you got there.

Current glue-stack approaches handle layer one adequately and layer two poorly. Layer three they mostly ignore. LangChain and similar orchestrators will call your sanctions API, your KYC service, and your document parser and aggregate the results. But those calls are not signed. The exact inputs and outputs are not in a tamper-evident log. If a regulator asks in six months which sanctions list version was in effect when this transaction was cleared, the answer is probably in a log file that nobody can easily reconstruct.

How a governed factory approaches this differently

The Substrate factory runs from a single declared mission: “Produce the AML evidence pack for this set of transactions within this budget and this policy perimeter.” Ninmu, the swarm conductor, decomposes that declaration into a task graph and routes each task to the cheapest agent model that can actually do it.

What makes this different from a standard agent pipeline is not the orchestration. It is what every agent is required to do before it does anything else: assert its signed identity through Ultra, the factory’s separate cryptographic authority plane. Every agent has an Ed25519 identity lifecycle. Every action it takes is signed by that identity and logged in a tamper-evident append-only record. The log is not bolted on. It is the storage engine.

This means that from the first byte ingested to the last signature on the evidence pack, there is an unbroken chain of signed records linking every piece of output back to the input that produced it and the policy that governed the decision.

Diagram: the compliance trace from ingest to signed pack. A glue-stack gap view marks the artifacts a LangGraph plus Postgres pipeline cannot produce. Each workflow step is annotated with the artifact it contributes to the evidence pack and the Substrate system responsible for it.

Work through that diagram step by step, because each transition is doing something specific.

Ingest and manifest. The first step is not “read the documents”. It is “ingest the documents and produce a signed manifest that records exactly what was received, from where, and when.” Ultra signs the ingest event, which commits the input hash alongside the timestamp and the agent identity. If anyone later claims that a document was added to or removed from the case after the fact, the manifest refutes it.
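To make the manifest concrete, here is a minimal Python sketch. Every name in it is hypothetical. It swaps Ultra’s Ed25519 signature for a standard-library HMAC so the example runs without third-party packages. Only someone holding the key can check an HMAC, so a real deployment would use an asymmetric signature as described above. The point is the shape of the step:

- hash every document;
- record the source, the time, and the agent;
- sign the canonical form;
- treat any change to the document set as a verification failure.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical stand-in for Ultra: an HMAC key. Ultra is described as using
# Ed25519, which lets third parties verify with a public key; HMAC needs the
# shared secret, so this only illustrates the shape of a signed manifest.
DEMO_KEY = b"demo-key-not-for-production"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _sign(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always signs the same way.
    canonical = json.dumps(body, sort_keys=True).encode()
    return hmac.new(DEMO_KEY, canonical, hashlib.sha256).hexdigest()


def build_manifest(agent_id: str, source: str, documents: dict) -> dict:
    """Record what was received, from where, when, and by which agent."""
    body = {
        "agent_id": agent_id,
        "source": source,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "documents": {name: sha256_hex(blob) for name, blob in documents.items()},
    }
    return {"body": body, "signature": _sign(body)}


def verify_manifest(manifest: dict, documents: dict) -> bool:
    """True only if the signature holds and the case holds exactly these documents."""
    if not hmac.compare_digest(_sign(manifest["body"]), manifest["signature"]):
        return False
    current = {name: sha256_hex(blob) for name, blob in documents.items()}
    return current == manifest["body"]["documents"]


docs = {"lc_application.pdf": b"...", "mt700.txt": b"..."}
manifest = build_manifest("agent:ingest", "trade-finance-platform", docs)
print(verify_manifest(manifest, docs))                                # True
print(verify_manifest(manifest, {**docs, "late_addition.pdf": b""}))  # False
```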

Entity graph construction. The ingestion feeds into an entity resolution step that builds a timestamped entity graph in Kizuna-mem, the factory’s bitemporal memory. “Timestamped” is doing a lot of work here. Kizuna-mem is a bitemporal store: it records both the valid time (when an entity relationship was actually true in the world) and the transaction time (when the factory learned about it). This matters because sanctions lists and beneficial ownership registers change. The graph records what was known at the time of the decision, not what is known now. A regulator asking “what sanctions exposure did you see at the point of clearance” gets an exact answer, not an approximation.
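A bitemporal query needs two dates: when something was true (valid time) and what the system knew by then (transaction time). The sketch below is a deliberately small, hypothetical model of that idea, not Kizuna-mem’s interface. A sanctions listing effective on 10 March but ingested on 12 March is invisible to a query about a clearance on 11 March. That is exactly the answer a regulator should get.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: date          # valid time: when it became true in the world
    valid_to: Optional[date]  # None means still true, as far as the record says
    recorded_at: date         # transaction time: when the factory learned it


class BitemporalGraph:
    """Append-only: a correction is a new Fact, never an in-place update."""

    def __init__(self) -> None:
        self._facts: list = []

    def record(self, fact: Fact) -> None:
        self._facts.append(fact)

    def as_of(self, valid_at: date, known_at: date) -> set:
        """Edges true at valid_at, using only facts recorded on or before known_at."""
        latest = {}
        for f in self._facts:
            if f.recorded_at > known_at:
                continue  # the factory did not know this yet
            key = (f.subject, f.relation, f.obj)
            # The most recently recorded version of an edge supersedes older ones.
            if key not in latest or f.recorded_at >= latest[key].recorded_at:
                latest[key] = f
        return {
            key
            for key, f in latest.items()
            if f.valid_from <= valid_at and (f.valid_to is None or valid_at < f.valid_to)
        }


graph = BitemporalGraph()
graph.record(Fact("Sunrise Trading Co Ltd", "owned_by", "Holding A",
                  date(2025, 1, 1), None, date(2025, 1, 5)))
# Hypothetical listing: effective 10 March, but only ingested on 12 March.
graph.record(Fact("Holding A", "listed_on", "SDN",
                  date(2026, 3, 10), None, date(2026, 3, 12)))

clearance = date(2026, 3, 11)
print(graph.as_of(valid_at=clearance, known_at=clearance))
# {('Sunrise Trading Co Ltd', 'owned_by', 'Holding A')}
```

Asking the same question with a later `known_at` returns the listing as well. That is the “what is known now” view the bitemporal design keeps separate.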

Policy checks. The policy check step runs the entity graph through the rule engine inside Cosmictron, the factory’s live data plane. The rule engine records not just its verdict but the full rule trace: which rules fired, in which order, against which entity attributes, at what risk threshold. This trace is an artifact in the evidence pack. It is the answer to “how did you classify this transaction as low-risk when the counterparty has a partial name match on the SDN list?”, answered with the exact rule logic that was applied.
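A rule trace is cheap to produce if the engine records every rule as it evaluates it, rather than only the verdict. This hypothetical sketch shows how a partial SDN name match can fire and still leave a transaction classified as low-risk, with the trace showing exactly why. The rule IDs, thresholds, and point values are illustrative, not Cosmictron’s rule engine.

```python
from dataclasses import dataclass, field
from typing import Callable

HIGH_RISK_JURISDICTIONS = {"IR", "KP"}  # illustrative only


@dataclass(frozen=True)
class Rule:
    rule_id: str
    attributes: tuple               # entity attributes the rule reads
    check: Callable[[dict], bool]
    risk_points: int


@dataclass
class Verdict:
    policy_version: str
    threshold: int
    risk_score: int = 0
    classification: str = "low-risk"
    trace: list = field(default_factory=list)


def evaluate(entity: dict, rules: list, policy_version: str, threshold: int) -> Verdict:
    """Run rules in declared order, recording every rule, fired or not."""
    verdict = Verdict(policy_version, threshold)
    for order, rule in enumerate(rules, start=1):
        fired = rule.check(entity)
        if fired:
            verdict.risk_score += rule.risk_points
        verdict.trace.append({
            "order": order,
            "rule_id": rule.rule_id,
            "inputs": {a: entity.get(a) for a in rule.attributes},
            "fired": fired,
            "score_after": verdict.risk_score,
        })
    if verdict.risk_score >= threshold:
        verdict.classification = "exception"
    return verdict


RULES = [
    Rule("SDN-STRONG-MATCH", ("sdn_name_similarity",),
         lambda e: e["sdn_name_similarity"] >= 0.90, 60),
    Rule("SDN-PARTIAL-MATCH", ("sdn_name_similarity",),
         lambda e: e["sdn_name_similarity"] >= 0.70, 20),
    Rule("HIGH-RISK-JURISDICTION", ("jurisdiction",),
         lambda e: e["jurisdiction"] in HIGH_RISK_JURISDICTIONS, 25),
]

entity = {"name": "SUNRISE TRDG CO LTD", "sdn_name_similarity": 0.78, "jurisdiction": "GB"}
verdict = evaluate(entity, RULES, policy_version="aml-rules-2026.05", threshold=50)
print(verdict.classification, verdict.risk_score)  # low-risk 20
```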

Exception surfacing. This is the human gate. When the rule engine flags an exception, it does not send an email. It routes to a named analyst through Ninmu’s gate mechanism with the full context: the entity graph, the rule trace, the policy version, the risk score, and a recommended action. The analyst’s decision is recorded, including any override reason, and lands in the signed log. This is the audit trail for the human decision, not just the machine decision. Both are in the same chain.
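The gate record can be a small, hashable structure. In this sketch the field names are hypothetical. The analyst’s decision commits to a hash of the exact context they were shown, and an override without a reason is rejected rather than logged silently. The serialised record is what would be appended to the signed log.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class GateDecision:
    case_id: str
    analyst_id: str            # a named person, not a shared queue
    context_hash: str          # hash of exactly what the analyst was shown
    recommended_action: str
    decision: str
    override_reason: Optional[str]
    decided_at: str

    @property
    def is_override(self) -> bool:
        return self.decision != self.recommended_action


def hash_context(context: dict) -> str:
    return hashlib.sha256(json.dumps(context, sort_keys=True).encode()).hexdigest()


def record_decision(case_id: str, analyst_id: str, context: dict,
                    decision: str, override_reason: Optional[str] = None) -> GateDecision:
    recommended = context["recommended_action"]
    if decision != recommended and not override_reason:
        raise ValueError("an override must carry a reason")
    return GateDecision(case_id, analyst_id, hash_context(context), recommended,
                        decision, override_reason,
                        datetime.now(timezone.utc).isoformat())


context = {
    "entity_graph_snapshot": "sha256:...",  # placeholders for artifact hashes
    "rule_trace": "sha256:...",
    "policy_version": "aml-rules-2026.05",
    "risk_score": 55,
    "recommended_action": "escalate",
}
d = record_decision("case-0193", "analyst:j.smith", context, "clear",
                    override_reason="Registry extract shows a different incorporation date")
log_entry = json.dumps(asdict(d), sort_keys=True)  # what goes into the signed log
```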

Assembly. Once the exception has been resolved, the assembly step gathers every artifact produced by the preceding steps and constructs the evidence pack. The pack is not a summary. It is a structured document with cryptographic back-links to every source artifact. Each section of the pack references, by signed hash, the specific ingest manifest, entity graph snapshot, rule trace, and analyst attestation that supports its conclusions.
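Back-links by hash take only a few lines. For illustration only, this sketch assumes each artifact serialises to canonical JSON. The pack stores the hash of each artifact, and each section cites the artifacts that support it. Any later edit to an artifact then shows up as a stale back-link. The artifact and section names are hypothetical.

```python
import hashlib
import json


def artifact_hash(artifact: dict) -> str:
    return hashlib.sha256(json.dumps(artifact, sort_keys=True).encode()).hexdigest()


def assemble_pack(case_id: str, artifacts: dict, sections: list) -> dict:
    """Every section cites its supporting artifacts by hash, not by copy."""
    index = {name: artifact_hash(a) for name, a in artifacts.items()}
    pack_sections = []
    for section in sections:
        missing = [n for n in section["supports"] if n not in index]
        if missing:
            raise ValueError(f"{section['title']!r} cites unknown artifacts: {missing}")
        pack_sections.append({
            "title": section["title"],
            "conclusion": section["conclusion"],
            "evidence": {n: index[n] for n in section["supports"]},
        })
    return {"case_id": case_id, "artifact_index": index, "sections": pack_sections}


def stale_backlinks(pack: dict, artifacts: dict) -> list:
    """Names of artifacts that are missing or no longer match their back-link."""
    return [
        name for name, h in pack["artifact_index"].items()
        if name not in artifacts or artifact_hash(artifacts[name]) != h
    ]


artifacts = {
    "ingest_manifest": {"documents": 2},
    "entity_graph_snapshot": {"edges": 14},
    "rule_trace": {"risk_score": 20},
    "analyst_attestation": {"decision": "clear"},
}
pack = assemble_pack("case-0193", artifacts, [
    {"title": "Sanctions screening", "conclusion": "No confirmed match",
     "supports": ["entity_graph_snapshot", "rule_trace", "analyst_attestation"]},
])
print(stale_backlinks(pack, artifacts))  # []
```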

Signing and lodging. The final step has Ultra produce a top-level signature over the assembled pack. That signature commits the hash of every artifact in the pack and the hash of the entire chain of signed actions that produced them. It is verifiable without having access to the factory. Anyone with the public key can confirm that the pack has not been altered since it was produced.
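The top-level signature needs a single canonical digest to sign. This sketch computes one over every artifact hash plus the head of the signed-action chain, which in turn commits to every earlier action. The encoding (sorted names, length prefixes) is an assumption chosen to make the digest unambiguous. The Ed25519 signing and verification step is left out because it needs a third-party library.

```python
import hashlib


def pack_commitment(artifact_hashes: dict, chain_head: str) -> str:
    """Single digest over every artifact hash plus the head of the action chain.

    Sorted names and length-prefixed fields make the encoding canonical and
    unambiguous, so the same pack always yields the same digest. Hashes are
    assumed to be hex-encoded SHA-256 (a fixed 32 bytes each).
    """
    h = hashlib.sha256()
    h.update(len(artifact_hashes).to_bytes(8, "big"))
    for name in sorted(artifact_hashes):
        encoded = name.encode()
        h.update(len(encoded).to_bytes(4, "big") + encoded)
        h.update(bytes.fromhex(artifact_hashes[name]))
    h.update(bytes.fromhex(chain_head))
    return h.hexdigest()


hashes = {"rule_trace": "ab" * 32, "ingest_manifest": "cd" * 32}
head = "ef" * 32
digest = pack_commitment(hashes, head)
# A verifier recomputes this digest from the pack and the log, then checks the
# signature over it with the factory's public key (not shown here).
```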

The signed lineage: every artifact linked to its source

The compliance trace shows the workflow. The signed lineage shows what it looks like inside Ultra’s append-only log.

Diagram: the signed lineage chain in Ultra’s append-only log. A tamper simulation alters one block and shows the downstream chain breaking immediately. Each block displays its signed fields: agent identity, tool, input hash, policy version, and the hash linking back to the previous block.

Each block in that chain is one signed agent action. The key fields are not incidental. They are the answer to specific regulatory requirements:

  • Agent identity: which agent, at which version, performed this step. This supports the record-keeping obligations of EU AI Act Article 12. That article requires high-risk AI systems to allow the automatic recording of events (logs). For certain system types, it also calls for records of the input data and of the natural persons involved in verifying results.
  • Tool: which capability was called. The rule engine version, the model version, the external API endpoint and its response hash. Changing the model version mid-investigation without re-running is detectable because the earlier blocks committed a different model version.
  • Policy version: the compliance rule set that was active. OFAC sanctions lists update frequently. The policy version in the signed block tells you exactly which list version was in effect at the moment of screening.
  • Input and output hashes: the exact inputs and outputs of this step. Not a summary. The hash of the actual data. This prevents anyone from claiming retroactively that a different input was used.
  • Previous hash: the link that makes this a chain rather than a set of independent records. Altering any earlier block invalidates every subsequent one. The tamper simulation in the diagram shows this in real time.
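The chain structure in the bullets above reduces to a few lines of Python. This is a hypothetical sketch: it keeps the hash links and omits the per-block Ed25519 signatures. Links alone detect an altered block as soon as a later block exists. The latest block is covered by its own signature and by the pack’s top-level signature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional

GENESIS = "0" * 64


@dataclass(frozen=True)
class Block:
    agent_id: str        # which agent, at which version
    tool: str            # capability called, with its version
    policy_version: str  # rule set / sanctions list version in effect
    input_hash: str
    output_hash: str
    prev_hash: str       # digest of the previous block: the chain link

    def digest(self) -> str:
        return hashlib.sha256(json.dumps(asdict(self), sort_keys=True).encode()).hexdigest()


def append(chain: list, **fields) -> list:
    prev = chain[-1].digest() if chain else GENESIS
    return chain + [Block(prev_hash=prev, **fields)]


def first_broken_link(chain: list) -> Optional[int]:
    """Index of the first block whose prev_hash no longer matches, else None."""
    for i, block in enumerate(chain):
        expected = chain[i - 1].digest() if i else GENESIS
        if block.prev_hash != expected:
            return i
    return None


chain: list = []
for step in ("ingest", "entity-resolution", "policy-check", "assembly"):
    chain = append(chain, agent_id=f"agent:{step}@1.0", tool=step,
                   policy_version="aml-rules-2026.05",
                   input_hash="0" * 64, output_hash="0" * 64)

print(first_broken_link(chain))     # None
tampered = chain.copy()
tampered[1] = replace(tampered[1], policy_version="aml-rules-2026.04")
print(first_broken_link(tampered))  # 2
```

An attacker who also rewrites block 2’s `prev_hash` changes block 2’s digest, which breaks block 3, and so on down the chain. Signatures are what stop the attacker from re-signing the rewritten blocks.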

The practical implication for regulators is that the evidence pack is self-contained and self-verifying. You do not need access to the factory’s internal systems to verify that the pack is genuine. You need the public key, the pack, and the log. The factory can produce these in a format suitable for offline verification.

Before and after: what the numbers look like

The before state is not a theoretical horror story. It is the standard operating procedure at most tier-two and tier-three banks running trade-finance AML reviews today.

A complex correspondent-banking case with multiple counterparties, a partial sanctions hit, and a cross-border payment chain typically involves four pieces of work. An analyst spends two to four days gathering and normalising data. A senior analyst or compliance officer spends half a day reviewing the entity graph and making a clearance decision. A junior analyst spends another day or two assembling the evidence narrative. A second compliance officer reviews and signs the pack. Total: five to seven working days, two to four people, and direct labour cost in the range of a few thousand pounds per case at market rates. For a bank processing several hundred such cases per month, the annual labour bill for evidence assembly alone runs to seven figures or more.
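The seven-figure estimate is straightforward multiplication. The sketch below reads “a few thousand pounds” and “several hundred cases” as hypothetical ranges. On those readings the annual bill falls between seven and eight figures.

```python
def annual_assembly_cost(cost_per_case: int, cases_per_month: int) -> int:
    """Direct labour for evidence assembly over twelve months."""
    return cost_per_case * cases_per_month * 12


# Hypothetical readings of "a few thousand pounds" and "several hundred cases".
low = annual_assembly_cost(2_000, 200)
high = annual_assembly_cost(4_000, 500)
print(f"£{low:,} to £{high:,}")  # £4,800,000 to £24,000,000
```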

The factory side of this comparison is illustrative rather than a claimed benchmark, because production numbers depend on the specific systems integrated, the data quality, and the case mix. What the architecture enables structurally is parallelism: gathering, entity resolution, policy checking, and assembly are spread across the swarm. The analyst’s time goes to the exception gate and final review, not data gathering. Cycle time drops from days to hours. Labour cost shifts from assembly to governance.

The audit quality improvement is harder to quantify but arguably more important. A manually assembled pack has gaps. Not because the analyst is careless, but because manual assembly from disparate systems is error-prone and the audit trail is reconstructed, not recorded. A factory-assembled pack has no gaps by construction. Every step produced a signed artifact. Every artifact is in the pack. The chain is verifiable. This is not a feature a regulated enterprise can add to a manual process. It is an architectural property of a factory that was built to produce it.

How the six systems combine in this workflow

It is worth being explicit about which system does what, because the “governed by construction” claim only holds if you understand why the construction matters.

Ninmu orchestrates the task graph and holds the mission budget. Every agent call is metered against that budget before the call runs. If the case is complex and the swarm is consuming more budget than a routine case would, Ninmu adjusts routing so that simpler steps run on smaller models. The evidence pack is produced within the declared budget, rather than being discovered on the invoice to have exceeded it. For a compliance team processing hundreds of cases per month, predictable per-case cost is a budget planning requirement, not a nice-to-have.
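Metering before execution is the key property. This sketch uses hypothetical model names, capability levels, and integer cost units. It reserves each step’s estimated cost before the step runs and refuses a step that would breach the mission budget, instead of finding out afterwards. It picks the cheapest model that meets a step’s minimum capability, which is one simple way to express “simpler steps run on smaller models.”

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: int  # hypothetical integer cost units
    capability: int          # hypothetical scale: 1 (simple) to 3 (hard)


MODELS = [Model("small", 1, 1), Model("medium", 5, 2), Model("large", 20, 3)]


class BudgetGovernor:
    """Meters each step against the mission budget before the step runs."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.committed = 0

    def route(self, min_capability: int, est_tokens: int) -> Model:
        remaining = self.budget - self.committed
        for model in sorted(MODELS, key=lambda m: m.cost_per_1k_tokens):
            # Round up so the estimate never understates cost.
            cost = (model.cost_per_1k_tokens * est_tokens + 999) // 1000
            if model.capability >= min_capability and cost <= remaining:
                self.committed += cost  # reserve before running, not after
                return model
        raise RuntimeError("step would exceed the mission budget; escalate instead of overspending")


gov = BudgetGovernor(budget=100)
print(gov.route(min_capability=2, est_tokens=10_000).name)  # medium (cost 50)
print(gov.route(min_capability=1, est_tokens=10_000).name)  # small (cost 10)
```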

Cosmictron is the live data plane. It carries the real-time state of the case as it progresses through the workflow. The incremental subscription model (DBSP-based, as described in the death of polling) means that every agent has a live view of the case state without polling. When the entity resolution agent updates the graph, the policy check agent’s view updates in sub-millisecond time. There are no stale reads, no cache invalidation problems, no race conditions. The deterministic replay property means the entire run can be replayed exactly if an audit requires it.
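The incremental idea behind DBSP can be shown with a single operator. Changes arrive as Z-set deltas: rows carrying +1 or -1 weights. A filter is linear, so the view’s change is just the filtered input change, and it is pushed to subscribers rather than polled. Because the view is a pure function of the recorded deltas, replaying the log reproduces it exactly. This is a toy illustration under those assumptions, not Cosmictron’s API.

```python
from collections import Counter
from typing import Callable


class FilteredView:
    """Incrementally maintained filter over a stream of Z-set deltas.

    A delta maps a row to an integer weight: +1 inserts, -1 retracts.
    Filtering is linear, so the view's delta is just the filtered input
    delta; the view never rescans the full case state.
    """

    def __init__(self, predicate: Callable[[tuple], bool]) -> None:
        self.predicate = predicate
        self.state: Counter = Counter()
        self.subscribers: list = []
        self.log: list = []  # every input delta, in order, for replay

    def apply(self, delta: dict) -> None:
        self.log.append(delta)
        out = {row: w for row, w in delta.items() if w and self.predicate(row)}
        for row, w in out.items():
            self.state[row] += w
            if self.state[row] == 0:
                del self.state[row]
        if out:
            for notify in self.subscribers:
                notify(out)  # push the change; subscribers never poll


def replay(log: list, predicate: Callable[[tuple], bool]) -> Counter:
    """Rebuild the view from the recorded deltas alone."""
    fresh = FilteredView(predicate)
    for delta in log:
        fresh.apply(delta)
    return fresh.state


def is_flagged(row: tuple) -> bool:
    return row[1] == "flagged"


received = []
view = FilteredView(is_flagged)
view.subscribers.append(received.append)
view.apply({("Sunrise Trading Co Ltd", "flagged"): 1})
view.apply({("Sunrise Trading Co Ltd", "flagged"): -1,
            ("Sunrise Trading Co Ltd", "cleared"): 1})
print(len(received), dict(view.state))             # 2 {}
print(replay(view.log, is_flagged) == view.state)  # True
```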

Kizuna-mem stores the entity graph and the timeline. The bitemporal structure ensures that queries about what was known at the time of a decision return the state of the graph at that exact moment. This is the answer to the “sanctions list version” question and to a harder one: “Did you know, at clearance, that this entity had been flagged in a different case two weeks earlier?” The answer is in the bitemporal graph and it is auditable either way. More on this in bitemporal memory as the compliance backbone.

Ultra provides the cryptographic identity plane. Every agent action is signed. The evidence pack is signed. The chain is verifiable. Ultra’s separate authority plane means that even if an agent’s runtime is compromised, it cannot produce a valid signed action under a different identity. The boundary is enforced cryptographically, not by policy.

Kizuna (the forge, not Kizuna-mem) provides the signed supply chain for the factory itself. Every version of every agent component is signed and policy-gated. This answers a DORA compliance question directly: “How do you know the agent running today is the same agent that ran when you produced this evidence pack six months ago?” The answer is in Kizuna’s signed artifact store.

Voxeltron provides the deployment fabric. Isolated cells boot in under 50 ms and the factory scales horizontally without changing the governance model. The cells run on the customer’s own hardware. No data leaves the bank’s walls. The factory continues to operate during a cloud provider outage.

Adjacent workflows that share the same factory line

The trade-finance evidence pack is the worked example, but the architecture is general. DORA operational resilience testing has a similar structure: gather the scenario definition, the test execution logs, the gap analysis, and the remediation decisions; link them with cryptographic provenance; sign the whole thing. Same factory, different task graph. Fraud investigation shares the entity resolution and policy check steps with an AML review; only the specific rules differ. The Basel Committee’s BCBS 239 principles require regulatory reports to be traceable to source data, which a factory producing lineage-linked packs satisfies as a side effect.

The “software then everything else” claim on the Substrate homepage is sometimes read as a product roadmap promise. It is better understood as an architectural statement. When the factory is governed by construction, its governance properties apply to every mission it runs. The same signed identity, the same tamper-evident log, the same bitemporal memory, the same budget governance are in the factory, not in the workflow. Adding a DORA workflow does not require adding DORA governance. The governance is already there.

What to demand in an RFP

If you are evaluating an evidence-pack workflow for trade finance or AML, the following questions separate systems that are genuinely auditable from systems that produce attractive-looking outputs.

Ask for the signed artifact for a specific step. Not a summary. The actual signed artifact, with the agent identity, the input hash, the policy version, and the link to the previous artifact in the chain. If the vendor shows you a log file, ask how that log file is protected against tampering after the fact. The answer should be “it cannot be tampered with without detection, because it is a signed hash chain.” Any other answer is a monitoring system, not an audit trail.

Ask about the bitemporal guarantee. Can the system show you the state of the entity graph at the exact time of the clearance decision, not the current state? If the answer requires exporting a database snapshot, you do not have a bitemporal system. You have a database with timestamps. The distinction matters when sanctions lists change and you need to prove that the version in effect at the time of the decision was the version you used.

Ask how human decisions are recorded. The signed log should capture not just that a human approved a gate, but who approved it, when, what context they were shown, and whether they overrode a machine recommendation. If the human decision is stored in a separate ticketing system that is not linked cryptographically to the evidence pack, you have a gap in the chain. This is the gap that appears in regulatory investigations.

Ask about sovereignty. Where does the data go? If any element of the evidence pack workflow sends transaction data, entity data, or beneficial ownership data to an external API without your explicit control, that is a data residency problem in several jurisdictions. The factory should run entirely on your own infrastructure with your own models. Frontier-model failover is acceptable for non-sensitive analytical steps if the data is appropriately redacted first, but the governance chain must remain on your hardware.

Ask about the cost model. Per-case costs should be predictable before the case runs, not discovered on an invoice. If the vendor cannot tell you the cost ceiling for a specific case type before it runs, the cost governance is not in the machine. It is in a human watching a dashboard intermittently.

A 90-day pilot design

The simplest pilot is also the most diagnostic. Pick one case type with a well-understood current cost: a standard LC confirmation review, a medium-complexity sanctions alert, or a quarterly control test. Measure the current state precisely: wall-clock time from case assignment to signed pack, analyst hours per case, and exception rate (how often a human needs to deviate from the recommended decision).

Run the factory on the same case type for 30 days in parallel with the existing process. Do not replace the existing process. Run both, compare every output, and measure the gap. At the end of 30 days you have three numbers: the cycle time ratio, the cost ratio, and the evidence quality differential (measured as the number of audit artifacts present in the factory output versus the manual output).
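The three pilot numbers are simple to compute once each case run is logged consistently. In this sketch the field names are hypothetical. The `artifacts_present` field assumes you count audit artifacts against a checklist fixed before the pilot starts, so both processes are scored the same way.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CaseRun:
    hours_to_signed_pack: float  # wall-clock, case assignment to signed pack
    cost: float                  # analyst hours x loaded rate, plus compute spend
    artifacts_present: int       # audit artifacts found, against a fixed checklist


def pilot_metrics(manual: list, factory: list) -> dict:
    """The three comparison numbers from a parallel run on the same case type."""
    return {
        "cycle_time_ratio": mean(r.hours_to_signed_pack for r in factory)
        / mean(r.hours_to_signed_pack for r in manual),
        "cost_ratio": mean(r.cost for r in factory) / mean(r.cost for r in manual),
        "evidence_differential": mean(r.artifacts_present for r in factory)
        - mean(r.artifacts_present for r in manual),
    }


# Made-up numbers, purely to show the calculation.
metrics = pilot_metrics(
    manual=[CaseRun(40, 3000, 9), CaseRun(40, 3000, 11)],
    factory=[CaseRun(4, 300, 14), CaseRun(4, 300, 14)],
)
print(metrics)
```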

The evidence quality differential is the number that will surprise you most. Manual packs have gaps that nobody notices until someone looks. Factory packs do not have gaps, because gaps are structurally impossible in a signed hash chain. The 30-day comparison will reveal how many gaps were in the manual packs. That number is your risk exposure quantified, and it tends to focus minds on the governance question in a way that cycle time ratios do not.

For the second 30 days, run the factory as the primary process with the manual process as the check. By the end of 60 days you have enough data to calculate the annualised labour saving and to demonstrate to your internal audit team that the factory output is auditable in the way the manual output never quite was. The final 30 days are about hardening: the specific edge cases your case mix throws up, the human gate performance, and the integration points with your downstream reporting systems.

The human gate performance deserves specific attention. The risk in any compliance automation is that the gate becomes a rubber stamp. Three design choices prevent this: show the analyst the full context for every exception, not a summary; capture override reasons and review them for rule-quality signals; and monitor gate metrics (time to decision, override rate, downstream error rate) monthly. A gate approving 99% of everything in under 30 seconds provides only the appearance of oversight. The governance is in the factory, but the integrity of the human gate requires active management. More on the gate taxonomy in human approval gates that do not bottleneck.
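The gate metrics can be tracked with a small monthly report. This sketch is hypothetical. Its rubber-stamp flag simply encodes the 99% / 30-second example above as default thresholds, which each team would tune to its own case mix.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class GateEvent:
    seconds_to_decision: float
    accepted_recommendation: bool  # False means the analyst overrode
    downstream_error: bool         # found wrong later (QA, audit, lookback)


def gate_report(events: list, approve_ceiling: float = 0.99,
                min_median_seconds: float = 30.0) -> dict:
    """Monthly gate health; default thresholds mirror the 99% / 30 s example."""
    n = len(events)
    acceptance = sum(e.accepted_recommendation for e in events) / n
    median_s = median(e.seconds_to_decision for e in events)
    return {
        "median_seconds": median_s,
        "override_rate": 1 - acceptance,
        "downstream_error_rate": sum(e.downstream_error for e in events) / n,
        "rubber_stamp_risk": acceptance >= approve_ceiling and median_s < min_median_seconds,
    }


fast = [GateEvent(12, True, False)] * 100
print(gate_report(fast)["rubber_stamp_risk"])  # True
```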

The deeper point about evidence-first design

There is a pattern in how compliance technology has evolved over the past two decades that is worth naming directly. Systems are built to do the work first and to produce evidence second. The evidence is an afterthought, assembled from logs and exports after the fact, in whatever format the current regulator happens to require.

The factory inverts this. The evidence pack is not a report generated at the end of the workflow. It is the workflow. Every step produces a signed artifact. Every artifact is linked to its source. The assembly step does not create evidence. It organises evidence that has been accumulating from the first byte ingested. The pack is audit-ready not because someone assembled it carefully at the end, but because the factory could not have produced the output without also producing the lineage.

This is what “governed by construction” means in practice. Not that governance has been added to an existing workflow. That the governance is the mechanism by which the workflow operates. You cannot have the signed pack without the signed chain. You cannot have the signed chain without the signed actions. You cannot have the signed actions without the agent identities. The properties are not separable. Which means they cannot be quietly removed when they become inconvenient, which is precisely when they are most needed.

The before-state is associate-weeks and hand-assembled packs with gaps nobody notices until a regulator asks a question the pack cannot answer. The after-state is machine-hours, expert oversight at the gates that matter, and a pack that is self-verifying by the time it leaves the factory. The difference is not speed, though the speed difference is real. The difference is the nature of what gets produced: not a document that summarises what happened, but a cryptographic record that proves it.

If you want to go deeper on any of the component systems, deterministic replay as the audit trail covers how Cosmictron’s replay property makes the entire run reconstructable without additional instrumentation. For the EU AI Act logging requirements that this architecture satisfies, EU AI Act Article 12: what to log maps the specific regulatory requirements to specific factory artifacts. And if you are wondering how the same factory handles the cases where the automation should not proceed without a human, human approval gates that do not bottleneck covers the gate taxonomy and the design choices that keep oversight real rather than nominal.

To discuss a pilot design or request the investor brief, you can reach us via the Substrate page. The first conversation is about your specific case mix, not ours.