Article 記事

Supply-chain risk in the age of cheap agentic vulnerability discovery

author Jonathan Conway
timestamp 24 May 2026
classification supply-chain / security / agentic-ai / substrate / ultra / kizuna / voxeltron / sbom

A tier-one bank’s security team ran a red-team exercise last year in which they gave a commercially available autonomous agent access to their internal developer environment. The agent was instructed to find vulnerabilities. It worked for three hours. By the time the team stopped it, the agent had catalogued 23 distinct attack paths across their agent framework’s transitive dependencies, filed speculative issues against two open-source libraries, and written draft exploit notes in plain English that any junior penetration tester could follow. Total cost: a few hundred tokens. Total time: the length of a long lunch.

Nobody was hacked. The point was the economics. What used to require a skilled human researcher running tools over days now runs as a background task. The marginal cost of a probe is approaching zero.

That is the supply-chain threat model that regulated enterprises need to internalise in 2026. It is not that agents have become dramatically more capable than last year’s tools. It is that capability has become available at a price that makes continuous, exhaustive probing economically rational. A motivated attacker does not need to find the vulnerability themselves. They can instruct an agent to look, wait, and report.

The question for a CRO evaluating an agentic deployment is therefore not “could our stack be attacked?” Everything can be attacked. The question is: how large is the surface, and how fast can we know about a problem and fix it ourselves?

The dependency trap, quantified

A typical Python-based agent deployment today carries a striking amount of weight it did not consciously choose. An agent framework like LangChain pulls in dozens of direct dependencies. Each of those has its own tree. A database driver, an observability SDK, a cloud storage client, a serialisation library. Add Redis for session state, Postgres for memory, a third-party observability SaaS that phones home, and CI runners that execute agent-generated code in shared environments. By the time you have a working demo, you have imported several hundred packages, many of which you have never read, written by people you have never met, signed by nobody you can verify.

This is not a criticism of open-source software. It is an observation about what “attack surface” means when the attacker is an agent that can read package manifests, search vulnerability databases, generate proof-of-concept code, and iterate, all in a single session.

Software Bill of Materials (SBOM) requirements are already live in parts of the US federal procurement space following the 2021 Executive Order on improving the nation’s cybersecurity (source: NIST SBOM guidance, 2021), and similar provisions are appearing in the EU Cyber Resilience Act (source: European Parliament, Cyber Resilience Act, adopted 2024). The direction of travel is clear: you will be asked to prove what is in your software. A stack that cannot answer that question cleanly is a compliance problem as well as a security one.

The deeper issue is patch latency. When a vulnerability is found in a transitive dependency, the clock starts. The package maintainer has to patch it, release it, and the consuming projects have to notice, update, and redeploy. In a glue stack where you do not own the components, that chain runs through other people’s release schedules. In an owned codebase, the maintainer is you.

What agentic probing looks like in practice

It is worth being precise about what “near-zero marginal cost probing” actually means, because the framing matters for the threat model.

An autonomous agent tasked with vulnerability research will typically start by enumerating the dependency tree of the target system. This is not exotic. Any package manager will emit a full dependency list. The agent then cross-references those packages against public vulnerability databases, GitHub issue trackers, and security advisories. It generates a ranked list of candidates. For each candidate with a plausible attack path, it writes test code.

None of this requires novel AI capability. What has changed is the ability to run this process continuously, in parallel across many targets, with no human in the loop until a result is found. The cost per candidate is a few inference calls. A motivated attacker running this at scale, across a portfolio of target organisations, is spending money that would have been considered trivially small for a corporate security budget even three years ago.

For a regulated enterprise, the implication is not simply that they might be attacked. It is that the attack can now be tuned to the specific shape of their deployment. If an attacker knows you are running a particular version of a particular orchestration framework, the agent can focus its probing on the known surface of that specific version. The more distinctive your dependency tree, the more precisely a probing agent can target it.

A minimised, owned stack reduces this surface in two ways. First, there is less to probe: fewer packages, fewer versions, fewer interfaces. Second, owned code is not publicly enumerable in the same way. The attacker cannot read your internal Rust crate on PyPI.

Interactive: compare the owned Substrate stack against a typical agent framework plus glue on seven security axes. Toggle the regulated-finance lens to see how the weightings shift when the buyer’s security team answers to a prudential regulator. All axes point toward “better”: low attack surface, fast patch latency, and low dependency count are all inverted so that owning less external code scores higher.

Substrate’s answer: the three-layer defence

The Substrate platform takes a position on supply-chain security that is architectural rather than operational. Rather than adding tooling around a sprawling dependency graph, it reduces the graph to begin with and then enforces explicit boundaries around what remains.

There are three interlocking layers.

Ultra: the outer trust boundary. Ultra is the cryptographic authority plane for the entire factory. Every agent has a provable identity, issued as an Ed25519 key pair, with a full lifecycle: registered, active, suspended, revoked. No agent acts without a valid, current identity. Every action the agent takes is signed and lands in an append-only, tamper-evident log. Fake identities are rejected at the boundary before they touch anything.

The consequence for supply-chain security is direct. A compromised dependency that attempts to act as an agent cannot do so without a valid identity it does not have. A probe that tries to exfiltrate data via an agent call will produce a signed, attributed log entry. The attacker is not invisible: every action is on record, tied to an identity, timestamped, and immutable.

Compare that to the alternative. In a glue stack, agents typically act on the application’s shared credentials. There is no provable record of which specific agent action caused a given side effect. If a compromised package writes to a database, the database record shows the application’s service account, not the compromised package. You know something happened; you do not know what. Incident response becomes archaeological. That is the structural problem Ultra solves.

Kizuna: the signed supply chain. Kizuna is the source-control and forge system built specifically for agents as first-class citizens. Every commit, merge, and artifact produced within the factory is signed. Policy gates control what agents can do: agents have trust levels and declared capabilities, and the forge enforces those declarations. The MCP policy gateway defaults to deny.

The practical supply-chain implication is that Kizuna generates an SBOM by construction. Not as a separate audit step, not as a quarterly compliance exercise, but as the natural output of the forge’s normal operation. Every artifact that exits the factory carries a signed provenance record: what went into it, which agent produced it, which policy gates it passed through. That record is the SBOM.

This matters for the 2026 regulatory environment because the direction of SBOM requirements is toward real-time, signed attestations, not periodic self-reported documents. A forge that signs every artifact as a matter of course is already ahead of where the requirements are going.

Voxeltron: isolated execution. Voxeltron is the compute fabric. Each agent runs in an isolated cell. Isolated cells boot in under 50 milliseconds and the platform can sustain approximately 10,000 idle cells per host. The isolation is structural: there is no shared mutable state between cells, no ambient network access that is not explicitly granted, and no persistent filesystem by default.

For supply-chain security, isolation is the last line of defence. Even if a compromised dependency somehow executes within a cell, the blast radius is bounded by the cell’s declared permissions. The cell cannot reach other cells. It cannot reach the host filesystem. It cannot exfiltrate through network channels it has not been explicitly permitted to use. A rollback to a known-good state takes as long as a cell boot.

This is meaningfully different from a traditional container setup. A container is isolated from the host in certain ways, but containers in a typical deployment share network namespaces, service accounts, and persistent storage in ways that make lateral movement possible. Voxeltron’s model is deliberately more restrictive because agent workloads are short-lived and high-churn: there is no reason for an agent cell to carry ambient access it does not need for its declared task.

Interactive: click any layer to inspect the security properties of that component. Toggle air-gap mode to see which connections survive a fully network-isolated deployment. The bottom layer shows the position of glue-stack components: outside the trust boundary, unsigned, accessible to probing. Ultra and Kizuna are highlighted because they are the two components that enforce the trust perimeter and sign the supply chain respectively.

The ownership argument

There is a deeper point here that goes beyond tooling.

The 790,000 lines of owned Rust and Elixir code across the six Substrate systems are not a boast about engineering productivity. They are a statement about the security posture. Owned code has no transitive dependency surprises. You know what is in it because you wrote it. You can patch it on your own timeline because you own the release process. You can audit it because you have access to the full source, the tests, and the commit history.

Rust, specifically, eliminates an entire class of vulnerabilities by construction. Memory safety bugs, use-after-free, buffer overflows: these are not things that a well-written Rust programme has retrofitted protections against. They are things it structurally cannot express. For a security team that has spent years watching CVEs pile up in C and C++ dependencies, this is not a minor point.

The same logic applies to Elixir for the parts of the system that benefit from its concurrency model. Explicit boundaries between components. No garbage-collection pauses creating side-channel timing opportunities. No runtime type coercion producing unexpected behaviour when an adversarial input arrives at a parsing boundary.

None of this is to say that owning code is free of cost. Writing 790,000 lines of well-tested, production-grade systems code is expensive. The argument is that this cost is paid once, up front, and produces a predictable security surface. The alternative is paying a continuous, open-ended cost in the form of dependency updates, vulnerability monitoring, patch testing, and incident response, against a surface you did not design and cannot fully control. The Rust and Elixir choice is not coincidental: both languages make certain error classes structurally impossible, which means the security review does not have to look for an entire category of bug. That is worth something, especially when the reviewer is a regulator or a penetration tester paid by the hour.

There is also an operational continuity argument that sits alongside the security one. When a dependency in a glue stack goes unmaintained, or the project changes its licence, or a cloud provider deprecates a service, the organisation has a forced migration on whatever timeline the external project dictates. A factory running primarily owned code has those decisions under its own control. This matters for regulated institutions that need to commit to long support windows and cannot accept the factory going offline because a transitive package stopped compiling.

For a regulated enterprise, the calculation is starker still. A dependency-heavy stack is not just a technical risk: it is a compliance risk. If a regulator asks you to demonstrate the full provenance of a decision made by your agent system, and part of that decision passed through a third-party library whose behaviour you cannot reconstruct, you have a problem. The signed, deterministic, fully owned provenance chain that Substrate provides by construction is not a nice-to-have for an institution that answers to the FCA, the PRA, the ECB, or equivalent.

A sector walkthrough: regulated finance

Consider what a supply-chain incident looks like for a financial institution running a glue-stack agent deployment versus one running Substrate.

The glue-stack scenario. A vulnerability is discovered in a transitive dependency of the agent orchestration framework. The CVE is published on a Monday. By Tuesday, the vulnerability database has been updated and an agent tasked with probing agent deployments begins finding instances. The financial institution’s security team is notified on Wednesday when their monitoring picks up unusual log patterns. By Thursday, they are trying to establish which agent actions might have been affected, but the audit log is a combination of application logs from the orchestrator, database records from Postgres, and trace data from the observability SaaS: three separate systems that do not agree on a canonical timeline. The incident response team cannot definitively answer whether data was exfiltrated because the agent’s actions were attributed to the application service account, not a specific agent identity. The regulator asks for a complete account of what happened. The honest answer is that a complete account does not exist.

The Substrate scenario. The same vulnerability is published on Monday. Because Substrate’s production factory does not use the affected package (the owned codebase has no dependency on it), there is nothing to patch. The security team notes the CVE, confirms it is not in scope, and closes the ticket by Tuesday. If, hypothetically, a related component had been affected, Ultra’s signed action log would provide a complete, tamper-evident timeline of every action taken by every agent in the period of interest, attributed to specific cryptographic identities. Kizuna’s signed artifact chain would show exactly what code was in production at what time. The regulator’s question about what happened has a precise, signed answer.

This is what “governed by construction” means for supply-chain security. The properties the regulator will ask for are not assembled after the fact. They are produced as the natural output of a system designed around signing, attribution, and explicit boundaries from the first commit.

Healthcare and government: where the blast radius is larger

The supply-chain risk calculus is even starker in healthcare and government deployments than in finance.

In healthcare, an agent system might handle clinical coding, insurance claims, or treatment eligibility decisions. A compromised dependency that can read the data passing through such a system is not just a regulatory incident. It is a patient data breach with direct liability under HIPAA and, in the UK, NHS data governance frameworks. The fact that the breach was caused by a transitive dependency rather than a first-party vulnerability is not a defence. The institution is responsible for every component it deployed.

The particular challenge in healthcare is that the same agent system that handles data about treatment decisions may also handle billing and appeals logic. These interact. A vulnerability in a serialisation library, for example, might allow a malformed input to cause unexpected behaviour at the boundary between the clinical and administrative systems. In a glue stack with many independently sourced components, identifying and bounding such interactions requires knowing the complete dependency graph, which is exactly what most deployments cannot cleanly produce.

For government, the concern is different in character but not in severity. Government agent deployments often involve personal data at population scale: eligibility determination, benefits processing, casework. A supply-chain compromise in this context can affect millions of individuals and creates a political liability in addition to a regulatory one. Governments are also increasingly subject to scrutiny about the provenance of the systems they deploy: there are legitimate questions about whether software produced by certain foreign suppliers carries supply-chain risk of a different kind entirely.

The practical answer in both sectors is the same as in finance: a system with a minimal, owned dependency tree, cryptographic agent identities, signed artifacts, and isolated execution cells has a defensible security posture that can be explained to a regulator in terms they will recognise. A system that cannot trace its complete dependency tree or attribute every agent action is not in that position.

What to demand in an RFP

If you are evaluating an agentic platform for a regulated deployment, supply-chain security should be a first-order question, not an annex. Here is the language to put in the document.

Ask the vendor to provide a complete, signed SBOM for the system as deployed in your environment. If they cannot produce this within a day, the system does not generate it by construction: someone has to assemble it manually, which means it will be incomplete and out of date.

Ask how the system attributes agent actions. Can it tell you, for a specific decision, which agent identity took each step, what data it read, and what it wrote? If the answer involves correlating multiple log sources, you have a reconstruction problem, not an audit trail.

Ask what happens to the trust boundary when a dependency is compromised. Does the system have cryptographic agent identities that would prevent a compromised package from acting as an authorised agent? If agents act on shared application credentials, the answer is no.

Ask about the patch lifecycle for the runtime itself. Is the compute and orchestration layer owned code, or does it depend on a third-party framework that releases on its own schedule? For a vulnerability in an owned component, the patch timeline is yours to control. For a vulnerability in a borrowed framework, you are waiting for someone else.

Ask to see the air-gap deployment. A system that cannot run in a fully network-isolated environment has external dependencies that are not accounted for in the security model. For institutions with genuine data-residency requirements, the air-gap test is the real test.

These questions distinguish systems built for security from systems with security documentation written about them. There is a meaningful difference, and it is worth finding out before the contract is signed.

The 90-day way to verify it

A supply-chain security claim is testable. You do not need to trust a vendor’s marketing. You need a structured pilot.

Start by asking the vendor to run their own system against a red-team agent: give an autonomous agent access to the dependency manifest of the vendor’s platform and instruct it to find probing opportunities. This is not an unusual request. Any vendor confident in their supply-chain posture should welcome it.

Second, run a tabletop incident response exercise: assume a dependency in the platform has been compromised and work through the following questions. How long does it take to confirm scope (what was affected)? How long to produce a complete timeline of affected agent actions? How long to demonstrate to a regulator that the incident is contained? The answers to these questions differ dramatically between a system with signed, attributed action logs and one without.

Third, ask for the SBOM in a machine-readable format (CycloneDX or SPDX are the current standards) and run it against a public vulnerability database. A minimal owned stack should return few or no matches. A dependency-heavy glue stack will return a list that is instructive in its own right.

The three tests together give you a defensible technical basis for a procurement decision. They also give you the documentation trail a regulator or internal audit committee would expect to see behind a decision to deploy agentic AI in a high-risk context. The EU AI Act’s requirements for high-risk systems include exactly this kind of documented risk assessment (source: EU AI Act, Article 9, EUR-Lex).

The honest version of the trade-off

The case for a minimal, owned stack is strong on security grounds. It is worth being honest about what it trades away.

An owned stack is not portable. You cannot trivially swap components in and out the way you can in a loosely coupled ecosystem of open-source libraries. The Substrate systems were designed to interoperate with each other, and that integration is a source of the security and governance properties. Breaking the integration breaks the properties.

An owned stack requires that you trust the team who built it. The code is not on PyPI for the community to audit. The security properties are real, but they depend on the internal quality of the owned codebase, not on the wisdom of crowds. This is a different kind of trust from the open-source model, and it is worth being clear about the distinction.

For a regulated enterprise deploying agentic AI on production data in 2026, I think the trade-off is clearly in favour of the owned approach. The attack-surface reduction is large. The governance properties are built in rather than retrofitted. The regulatory alignment is direct. And the cost of a supply-chain incident in a regulated context, measured in fines, remediation, regulatory scrutiny, and reputational damage, is sufficiently high that the premium for a more defensible stack is easy to justify.

The dependency trap is real, and agentic probing has made it more acute than it was two years ago. A governed factory with explicit trust boundaries and a signed supply chain is not a conservative choice: in the current threat environment, it is the rational one.

For more on the identity and signing layer that makes this possible, see cryptographic agent identity and Ultra. For the sovereignty angle, including air-gapped deployment and owned model infrastructure, see sovereign AI, air-gapped by default. The Kizuna forge and its role as the signed supply chain for agent artifacts is covered in Kizuna: GitHub reimagined for agents. And if you are asking whether the governed factory is worth it compared to building on existing open-source infrastructure, the missing 80 percent makes the full case.

If you want the security architecture and investor brief, including the detailed threat model and the head-to-head against glue-stack deployments in regulated finance, request it at the Substrate page.