The agent forge: the code forge built for agents as first-class citizens

Author: Jonathan Conway
Date: 20 May 2026
Classification: agent-forge / supply-chain / agent-identity / signed-artifacts / ci-cd / mcp / identity / substrate

A fintech that shall remain unnamed ran a six-week proof of concept with an autonomous coding agent in early 2026. The agent did useful work. It opened pull requests, wrote tests, updated dependencies. Then one of those dependency updates introduced a transitive package with a subtle backdoor. Nobody caught it at the forge. The agent had committed with the team’s shared CI service account, which had read-write access to the package registry, which trusted anything signed by that account, which trusted the agent because… well, because it had the token. The compromise was found three weeks later in production. The resulting incident ran to eight figures when you count the remediation, the regulatory notification, and the auditors.

The forge had no idea an agent was involved. As far as GitHub was concerned, it was the same bot account it had always been.

That story will not be the last of its kind. It will not even be unusual, because the forge is the part of the supply chain that everybody forgot to secure for the agentic era. You can have the most carefully governed orchestration layer, the most rigorous identity plane, the most auditable memory system. If the place where agents commit code and merge branches treats them as a slightly less trusted human with a reusable token, the whole chain has a hole.

This is the problem the agent forge was built to close.

Why the existing forges are structurally inadequate

The honest version of the problem is not that GitHub and GitLab are bad. They are excellent tools for what they were designed to do. The issue is the design assumption: a committer is a person, possibly automating some repetitive tasks, whose identity is anchored to a human credential (an email address, a Personal Access Token, an OAuth app that a human authorised).

That assumption shows up everywhere. A bot account has a human’s token with a scope list. Branch protection rules check whether the committer is in a named group, but the group membership is managed by a person and the agent is indistinguishable from any other member of that group. CI artefacts are signed by the pipeline runner, which is a shared service identity, not a cryptographic attestation of what agent did what work under what policy. The only way to know whether an agent opened a pull request is to read the commit message and hope whoever configured the automation remembered to mention it.

From a regulated-enterprise standpoint, none of this is acceptable. A CRO doing diligence on an agentic pipeline needs to answer four questions. Who opened this? Under what authority? With what capability limits? Was this artefact produced by the exact pipeline run I can replay? If the answer to any of those questions is “we infer it from log messages and naming conventions,” you do not have a governed supply chain. You have a supply chain with labels on it.

The deeper structural issue is that the forge is not just a place where code lives. It is the final control point before software enters the world. Everything upstream (the orchestrator, the identity plane, the memory system, the runtime) can be perfectly governed, and if the forge is porous, the signed evidence pack you produce at the end is not signing the actual chain of custody. It is signing a reconstruction.

How the agent forge approaches agent identity

The agent forge starts from a different premise. Every participant in the forge, human or agent, is an identity with a cryptographic lifecycle. For agents, that lifecycle is: requested, active, suspended, revoked, retired. Each state transition is recorded and signed. An agent that is suspended cannot commit. An agent that is revoked cannot even authenticate. There is no fallback to a shared token.
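The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the article names the five states but not the permitted transitions, so the `ALLOWED` table, the `AgentIdentity` class, and the rule that requested and retired identities cannot authenticate are all assumptions. Signing of each transition is omitted.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    REVOKED = "revoked"
    RETIRED = "retired"


# Assumed transition table: the article lists the states, not the edges.
ALLOWED = {
    State.REQUESTED: {State.ACTIVE, State.REVOKED},
    State.ACTIVE: {State.SUSPENDED, State.REVOKED, State.RETIRED},
    State.SUSPENDED: {State.ACTIVE, State.REVOKED, State.RETIRED},
    State.REVOKED: set(),   # terminal
    State.RETIRED: set(),   # terminal
}


class AgentIdentity:
    """Hypothetical agent identity that records every state transition."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.state = State.REQUESTED
        self.history = [State.REQUESTED]

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.state.value} -> {new_state.value} is not allowed"
            )
        self.state = new_state
        self.history.append(new_state)

    def can_authenticate(self):
        # From the article: a revoked agent cannot even authenticate.
        # Treating requested and retired the same way is an assumption.
        return self.state in {State.ACTIVE, State.SUSPENDED}

    def can_commit(self):
        # From the article: a suspended agent cannot commit.
        return self.state is State.ACTIVE
```

Making revoked and retired terminal states is what removes the fallback path: there is no edge out of them, so no later action can restore access.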

The identity is anchored in Ed25519 key pairs and issued through the agent forge’s OIDC provider, with PKCE protecting the authorisation flow. This is not a bolt-on: it is the kizuna_identity application at the core of the platform. Every commit, every merge event, every artefact upload carries a cryptographic attestation that names the specific agent identity that produced it, the trust level that identity held at the time, and the policy under which the action was authorised.

Trust levels run from 0 to 4. A freshly registered agent starts at level 0 and has almost no permissions by default. Reputation scoring (accumulated over successful, policy-compliant runs) moves the agent up. A level-4 agent can do things a level-0 agent cannot, but that elevation is earned and auditable, not bestowed by putting the agent in a privileged group. For regulated enterprises, this matters because it means the question “why did this agent have permission to do that?” has a deterministic, signed answer.

Interactive: this is an agent forge commit chain. Toggle “simulate tamper” to see what happens when a single block is modified after the fact. Every downstream block fails verification, the broken link is highlighted, and the artefact is rejected at the merge gate. Hover any block to inspect its signed fields.

The chain in that diagram is not decorative. Each block records the agent identity (with its trust level at commit time), the tool used, the policy the action was authorised under, and the hash of the previous block. The hash chain is the property that makes tampering detectable. You cannot change the content of a past commit without changing its hash, which invalidates every subsequent signature. A single tampered block makes the entire downstream chain unverifiable, and the agent forge rejects it at the merge gate before it can enter the main branch.
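The tamper-detection property can be sketched with nothing more than a hash chain. The field names, the `append_block` and `first_broken_index` helpers, and the all-zero genesis hash below are illustrative assumptions, not the agent forge’s actual block format. The per-block Ed25519 signature is deliberately left out to keep the example dependency-free; it is the signature that would stop an attacker from simply recomputing every downstream hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous block"


def block_hash(fields, prev_hash):
    """Hash the block's fields together with the previous block's hash."""
    payload = json.dumps({"fields": fields, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, agent_id, trust_level, tool, policy):
    """Append a block that records who acted, with what tool, under what policy."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    fields = {
        "agent_id": agent_id,
        "trust_level": trust_level,
        "tool": tool,
        "policy": policy,
    }
    chain.append(
        {"fields": fields, "prev": prev_hash, "hash": block_hash(fields, prev_hash)}
    )


def first_broken_index(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        link_ok = block["prev"] == prev_hash
        content_ok = block["hash"] == block_hash(block["fields"], block["prev"])
        if not (link_ok and content_ok):
            return i
        prev_hash = block["hash"]
    return None
```

Editing a past block changes its hash, so verification fails at that block. If the attacker also rewrites that block’s stored hash, the failure simply moves to the next block, whose `prev` link no longer matches.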

This is what “every artefact signed” means in practice. It is not a signature on a completed bundle. It is a signature on each action as it happens, chaining back to the first action in the run, under a policy that was enforced at the time, by an identity whose trust level was verified before the action was permitted.

The policy gateway: deny by default

The other piece that existing forges lack is a policy enforcement point with the right default. GitHub’s default is permissive: you can do whatever you have the scope for, unless a branch rule or required review prevents a specific operation. The burden is on the operator to enumerate the things that should be blocked.

The agent forge’s MCP Policy Gateway runs the other way. The default answer to any request from an agent is no. An agent gets precisely the capabilities it has been explicitly granted for the specific operation it is attempting, at the trust level it currently holds, with a maximum delegation depth of three hops. High-risk operations, such as merging to a protected branch, deploying, or reading secrets, require step-up authentication. There is no “the agent had the right scope and so it went through.”

In practice, the gateway enforces five layers before an agent action completes. Token validity and audience check. Scope validation for the specific requested action. Trust-level minimum for the operation class. High-risk step-up where configured. Delegation depth check so a chain of delegated agents cannot recursively elevate themselves past the original authority. These are not configurable guards you can turn off because they slow down a demo. They are the control surface, and they are on by default.
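As a rough sketch of how those five layers might compose, the function below runs them in order and denies on the first failure. The operation names, the `MIN_TRUST` table, the `"forge"` audience string, and the `Request` shape are hypothetical; only the order of checks and the three-hop delegation limit come from the text above.

```python
from dataclasses import dataclass

MAX_DELEGATION_DEPTH = 3  # three hops, as stated in the article

# Hypothetical minimum trust level per operation class.
MIN_TRUST = {
    "read": 0,
    "commit": 1,
    "merge_protected": 3,
    "read_secrets": 3,
    "deploy": 4,
}
# High-risk operations named in the article; they require step-up.
HIGH_RISK = {"merge_protected", "deploy", "read_secrets"}


@dataclass
class Request:
    token_valid: bool
    audience: str
    scopes: frozenset
    trust_level: int
    operation: str
    stepped_up: bool = False
    delegation_depth: int = 0


def evaluate(req, expected_audience="forge"):
    """Return (allowed, reason). Anything not explicitly permitted is denied."""
    # 1. Token validity and audience.
    if not req.token_valid or req.audience != expected_audience:
        return False, "token"
    # 2. Scope for the specific requested action.
    if req.operation not in req.scopes:
        return False, "scope"
    # 3. Trust-level minimum; unknown operations are denied outright.
    if req.operation not in MIN_TRUST or req.trust_level < MIN_TRUST[req.operation]:
        return False, "trust"
    # 4. Step-up authentication for high-risk operations.
    if req.operation in HIGH_RISK and not req.stepped_up:
        return False, "step_up"
    # 5. Delegation depth.
    if req.delegation_depth > MAX_DELEGATION_DEPTH:
        return False, "delegation"
    return True, "ok"
```

Note that holding the right scope is necessary but not sufficient: a request with the `deploy` scope still fails at layer 3 or 4 if the trust level or step-up is missing.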

The playground is worth mentioning separately, because it shows the practical value of the gateway. Agents can be tested in an isolated LiveView sandbox before they are granted production trust. Four sandbox templates cover the common onboarding scenarios. The playground runs inside the policy gateway, so an agent demonstrating its capabilities in the sandbox is demonstrating them under the same rules it will face in production. What you see is what you get, which is the opposite of most agent evaluation environments, which are deliberately unconstrained because constraints make demos harder.

The forge as part of the governed supply chain

Zoom out from the identity and policy mechanics and the architectural point becomes clearer. In Substrate’s full stack, the supply chain starts when the mission orchestrator accepts a mission and assigns it a budget. Every subsequent action, whether that is an agent checking memory via the governed memory engine, committing code via the agent forge, signing an attestation via the identity service, or deploying a cell via the cell runtime, is a link in the same chain. The audit pack that comes out the end is a chain of cryptographic signatures all the way from the mission declaration to the deployed artefact.

The agent forge is the link that covers the forge: the commits, the reviews, the CI runs, the artefact uploads, the merge decisions. Without it, the chain has a gap at one of the most consequential points: the place where an agent’s work becomes software. You can sign everything before the commit and everything after the deploy, but if the forge itself is unsigned in the middle, what you have is two separate chains with a gap between them that an adversary, or a badly behaved agent, can exploit.

Interactive: click any layer to see how the agent forge connects the agent orchestration above to the deployment and identity layers below. Toggle air-gap mode to see which connections are severed when the factory runs fully on-premises, and observe that the forge remains fully functional because all signing is local.

The layered view makes the position of the forge in the overall chain concrete. Mission and budget governance sits at the top (the mission orchestrator). Agent identity and the cryptographic root of trust sit below it (the identity service). The forge (the agent forge) spans from the identity layer up into the orchestration layer: agents commit with identities issued by the identity service, and the resulting signed artefacts flow downstream to the cell runtime for deployment and to the realtime data plane’s deterministic replay log for audit. The forge is not a separate product you bolt on. It is the connective tissue between what an agent decided to do and what the system actually shipped.

This is why the supply-chain argument matters so much to regulated buyers. An EU AI Act Article 12 audit log needs to record not just what was decided, but what was committed, by whom, under what policy, and whether the deployed artefact matches the committed code. The agent forge’s CI/CD engine produces signed attestations for every pipeline run. Those attestations include the content-addressable cache key, the artefact hashes, the trigger event, the agent identity that initiated the run, and the policy set under which the pipeline was evaluated. A regulator can verify any of those independently. There is no reconstruction needed.
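One way to picture such an attestation is as a fixed record with a canonical digest that a third party can recompute. The `PipelineAttestation` class, its field names, and the choice of canonical JSON plus SHA-256 are assumptions made for illustration; the article specifies only which facts the attestation covers. The signature over the digest is omitted.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineAttestation:
    """Hypothetical record of one pipeline run, mirroring the fields listed above."""

    cache_key: str
    artefact_hashes: tuple
    trigger_event: str
    agent_id: str
    policy_set: str

    def digest(self):
        # Canonical form: sorted keys and fixed separators, so that any
        # verifier serialising the same fields gets the same bytes.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the digest depends on every field, a verifier who is handed the fields and the digest can detect a change to any one of them without access to the system that produced the run.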

CI/CD for the agentic era

The CI/CD engine deserves its own section because it is where several things that sound like obvious safety measures turn out to be surprisingly absent from current pipelines.

The engine is GitHub Actions-compatible in its workflow YAML format, which matters for adoption. You do not need to rewrite your pipelines to move to the agent forge. What changes is the security model underneath. Each pipeline run is executed with workspace isolation. Artefacts are stored with content-addressable keys and a configurable TTL, which means duplicate builds share cache without sharing state, and a cache entry cannot be substituted for another because the key is derived from the content. The 500 MB artefact size limit is a hard ceiling, not a soft suggestion.
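The substitution argument is easier to see in code. In the minimal sketch below, the key is the SHA-256 of the content, so identical builds collapse to one entry and a swapped entry fails verification on read. The `ContentStore` class is hypothetical, the TTL is omitted, and reading 500 MB as decimal megabytes is an assumption.

```python
import hashlib

DEFAULT_MAX_BYTES = 500 * 1000 * 1000  # assumes "500 MB" means decimal megabytes


class ContentStore:
    """Hypothetical in-memory artefact store keyed by content hash."""

    def __init__(self, max_bytes=DEFAULT_MAX_BYTES):
        self.max_bytes = max_bytes
        self._blobs = {}

    def put(self, data):
        # Hard ceiling: oversized artefacts are rejected, not truncated.
        if len(data) > self.max_bytes:
            raise ValueError("artefact exceeds size limit")
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key):
        data = self._blobs[key]
        # Re-derive the key on read so a substituted entry is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data
```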

Command injection prevention is built into the execution layer, not the linting layer. The distinction matters. A linting rule catches obviously malicious input in workflow YAML. Command injection prevention at the execution layer catches the case where an agent dynamically constructs a command that looks benign to a linter but does something unexpected at runtime. This is increasingly the relevant threat surface: agents generating workflow steps on the fly, not humans writing workflows that a linter reviews.
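The article does not describe how the execution layer achieves this, so the following shows one common technique rather than the agent forge’s own: never hand a command string to a shell. Each step is an argument list, and untrusted values travel as single arguments that no shell ever parses. `run_step` and `echo_untrusted` are hypothetical helper names.

```python
import subprocess
import sys


def run_step(argv):
    """Run one pipeline step without a shell and return its stdout."""
    if isinstance(argv, str) or not all(isinstance(a, str) for a in argv):
        raise TypeError("step must be a list of string arguments, not a shell string")
    result = subprocess.run(
        argv, shell=False, capture_output=True, text=True, check=False
    )
    return result.stdout


def echo_untrusted(value):
    # The untrusted value is a single argv element, so shell metacharacters
    # such as ';' or '&&' inside it are never interpreted.
    return run_step(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value]
    )
```

With this shape, an agent-generated value such as `hello; echo pwned` reaches the child process as literal text. A linter inspecting the workflow YAML would never see that value, because it only exists at runtime.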

Matrix strategies run up to 256 combinations. Service containers are supported, which means you can run a Postgres sidecar in your test pipeline without reaching for a third-party service. The runner daemon is written in Rust, which is not a coincidence: the runner is the thing that executes arbitrary code on behalf of agents, and you want it to have explicit, bounded, auditable behaviour with a minimal attack surface.
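For readers less familiar with matrix strategies, expansion is a Cartesian product over the declared axes. The sketch below enforces the 256-combination ceiling before expanding; `expand_matrix` is an illustrative helper, not part of any published API.

```python
from itertools import product

MAX_COMBINATIONS = 256  # limit stated in the article


def expand_matrix(matrix):
    """Expand {"axis": [values, ...]} into one dict per combination."""
    keys = sorted(matrix)
    total = 1
    for key in keys:
        total *= len(matrix[key])
    # Check the size first so an oversized matrix is never materialised.
    if total > MAX_COMBINATIONS:
        raise ValueError(f"{total} combinations exceeds {MAX_COMBINATIONS}")
    return [
        dict(zip(keys, values))
        for values in product(*(matrix[key] for key in keys))
    ]
```

A matrix with two operating systems and two language versions yields four jobs; a 17 by 16 matrix is rejected because 272 exceeds the limit.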

The semantic diff intelligence is worth a brief mention because it is the piece that connects the forge back to the broader intelligence of the factory. Heuristic and LLM-powered diff analysis means the system can produce a meaningful summary of what a large agent-generated commit actually does, not just a line-count. A human approver at a merge gate sees a structured explanation of the change, not a wall of diffs they would need to read themselves. This does not replace the cryptographic chain of custody. It makes the human gate useful to the human at the gate.

Cross-repo federation and benchmarks

A factory that operates at scale across a regulated enterprise is not a single repository. It is dozens or hundreds, potentially spanning multiple business units, jurisdictions, and security classification levels. The agent forge’s cross-repo federation uses A2A (agent-to-agent) messaging to coordinate agents across repository boundaries without collapsing the policy model.

An agent in repository A that needs to reference or modify code in repository B does so through a federated request that goes through the policy gateway for both repositories. The trust level of the requesting agent is checked in the context of the target repository, not just the source. The delegation depth limit applies across the federation boundary. There is no way for an agent in a low-trust repository to acquire elevated permissions in a high-trust repository simply by crossing a federation link.
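A compact way to express the federation rule is that trust is looked up per repository and both lookups must pass. In this sketch the per-repository trust map, the function name, and the choice to count the federation crossing as one delegation hop are assumptions; the article states only that both gateways are consulted and that the depth limit applies across the boundary.

```python
MAX_DELEGATION_DEPTH = 3  # three hops, as stated in the article


def federated_request_allowed(
    trust_by_repo, source, target, min_trust_source, min_trust_target, hops_used
):
    """Check a cross-repository request against both repositories' policies."""
    source_trust = trust_by_repo.get(source)
    target_trust = trust_by_repo.get(target)
    # Deny by default: no recorded trust in a repository means no access to it.
    if source_trust is None or target_trust is None:
        return False
    if source_trust < min_trust_source:
        return False
    # Trust earned in the source repository does not carry over to the target.
    if target_trust < min_trust_target:
        return False
    # Assumption: crossing the federation link consumes one delegation hop.
    return hops_used + 1 <= MAX_DELEGATION_DEPTH
```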

The benchmark and leaderboard system is the less obvious piece of the agent-first design. Because every agent run in the forge produces a signed attestation, and because those attestations record cost, latency, success rate, and policy compliance, it is straightforward to compute a performance record for each agent identity over time. That record is what the reputation system uses to determine trust-level changes. An agent with a long record of accurate, policy-compliant, low-cost runs earns trust that a new agent has not yet demonstrated. This is not a proxy for quality. It is a quantitative, auditable record of observed behaviour, which is the only kind of trust that holds up in a regulated environment.
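To make the idea of earned trust concrete, here is a toy scoring rule that maps a run history to a level from 0 to 4. Every number in it is invented for illustration: the article does not publish thresholds, and the rule that any policy violation resets the level to 0 is a deliberately strict assumption. Cost and latency are recorded but not used in this simplified version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    succeeded: bool
    policy_compliant: bool
    cost: float
    latency_s: float


# Hypothetical (minimum runs, minimum success rate) needed for levels 1 to 4.
THRESHOLDS = [(10, 0.80), (50, 0.90), (200, 0.95), (1000, 0.99)]


def trust_level(records):
    """Derive a trust level (0 to 4) from an agent's observed run history."""
    # Assumption: no history, or any policy violation, means level 0.
    if not records or not all(r.policy_compliant for r in records):
        return 0
    success_rate = sum(r.succeeded for r in records) / len(records)
    level = 0
    for min_runs, min_rate in THRESHOLDS:
        if len(records) >= min_runs and success_rate >= min_rate:
            level += 1
        else:
            break  # levels are earned in order; no skipping
    return level
```

Whatever the real formula is, the property worth preserving is that the level is a pure function of the recorded runs, so an auditor can recompute it from the attestations.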

A regulated sector walkthrough: a government digital service team

Imagine a digital transformation team at a national tax authority (this is a composite scenario, not a specific deployment). They have a backlog of legacy services to migrate. Each migration is a Substrate mission: declare the goal and the budget, let the swarm plan and execute the work, sign the output.

Before the agent forge, their challenge would be the supply chain. Agents commit code. How do you prove, to an internal audit function and potentially to the national cybersecurity agency, that every line of that code was produced by an authorised agent, under a policy that was in force at the time, with no unexplained modifications between commit and deployment? With a conventional forge and a bot account, you cannot. You have logs, which can be tampered with. You have branch protection rules, which an account with sufficient scope can bypass. You have CI artefacts, which are signed by the runner service account and tell you nothing about the agent that initiated the run.

With the agent forge, the audit answer is three steps. Retrieve the signed commit attestation for the artefact in question. Verify the signature chain back to the issuing agent identity. Verify that identity’s trust level and policy assignment at the time of the commit. The evidence is a bundle of cryptographic proofs, not a collection of log entries that someone assembled after the fact. The internal audit function does not need to trust the team’s narrative. They can verify the chain independently.
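Those three steps can be sketched as a single function. Everything here is a stand-in: `attestations` is a plain dictionary keyed by artefact hash, `verify_chain` is a caller-supplied callable representing the real signature verification, and the trust history is a sorted list of `(timestamp, level)` pairs. The policy-assignment lookup would follow the same point-in-time pattern as the trust lookup and is omitted for brevity.

```python
from bisect import bisect_right


def trust_level_at(history, timestamp):
    """Return the trust level in force at `timestamp`, or None if none was.

    `history` is a list of (timestamp, trust_level) pairs sorted by timestamp.
    """
    times = [t for t, _ in history]
    i = bisect_right(times, timestamp)
    if i == 0:
        return None
    return history[i - 1][1]


def audit(artefact_hash, attestations, verify_chain, trust_history, min_trust):
    """Run the three audit steps and return a short verdict string."""
    # Step 1: retrieve the signed commit attestation for the artefact.
    attestation = attestations.get(artefact_hash)
    if attestation is None:
        return "no attestation"
    # Step 2: verify the signature chain back to the issuing identity.
    if not verify_chain(attestation):
        return "signature chain invalid"
    # Step 3: check the identity's trust level as of the commit time.
    history = trust_history.get(attestation["agent_id"], [])
    level = trust_level_at(history, attestation["committed_at"])
    if level is None or level < min_trust:
        return "insufficient trust at commit time"
    return "verified"
```

The point-in-time lookup matters: the question is not what the agent is trusted to do today, but what it was trusted to do when it made the commit.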

The before/after looks like this. Before: migration cycle takes four to six months per service, requires two security reviews, produces an audit trail assembled by hand from commit history, CI logs, and deployment tickets. Each assembly takes a week and still has gaps that auditors note. After: migration cycle takes days to weeks depending on service complexity, security review is replaced by a gate in the mission orchestrator DAG that an auditor approves once they have reviewed the policy set, audit trail is the signed artefact chain produced as the work happens. The audit function goes from “assemble evidence after the fact” to “inspect the chain at any point during the run.”

This is what “governed by construction” means when it lands in the forge. The governance is not a report generated afterwards. It is a property of the artefacts themselves, from the first commit to the deployed cell.

What to demand in an RFP

If you are evaluating forges as part of an agentic AI procurement, the identity question separates the serious proposals from the rest. Here is what to include.

Ask whether agents have cryptographic identities that are distinct from human accounts and from each other. The right answer is a full lifecycle (requested, active, suspended, revoked, retired) anchored in asymmetric keys, not a shared service account. Ask what happens to commits made by an agent whose identity has since been revoked. The right answer is that those commits are verifiably attributed to the revoked identity, and that attribution is immutable.

Ask how the forge enforces capability limits on agents. The right answer is a deny-by-default policy gateway that checks trust level, scope, and delegation depth before every action, not a set of branch protection rules that an account with sufficient scope can bypass. Ask to see what happens when an agent tries to perform an operation above its trust level. Watch whether the system rejects the request and logs a policy violation, or whether it quietly escalates.

Ask how CI artefacts are signed and what the attestation covers. The right answer is that each artefact has a content-addressable identity and a signature that names the specific pipeline run, the agent that initiated it, the policy set applied, and the trigger event. Not “signed by our CI service.” Signed by an identifiable, policy-bounded agent identity.

Ask about command injection prevention. The gap between “we lint workflow YAML for obvious injection patterns” and “we prevent command injection at the execution layer regardless of how the command was generated” is the gap where agent-generated pipelines can cause harm that linting does not catch.

Finally, ask how the forge connects to the rest of the signed supply chain. A forge that signs its own artefacts but has no way to chain those signatures to an upstream orchestration event or a downstream deployment attestation is a local property, not a supply-chain property. The regulated buyer needs the chain, not just the links.

A 90-day pilot design

You do not need to migrate your entire source control estate to evaluate this. Pick one regulated workflow that currently runs through a forge with agent involvement: a compliance check, a dependency audit, a regulatory report generator. Set up the agent forge for that workflow and run it in parallel with your existing forge for thirty days.

Three things will tell you what you need to know. First, can you produce a chain of custody for every artefact the agents touched, naming the agent identity, the trust level, and the policy, without assembling it manually from log files? If the answer is yes, you have a governed supply chain for that workflow. Second, does the policy gateway reject attempts by agents to exceed their declared capabilities? Run a controlled test: give an agent a task that would require it to act outside its trust level and watch whether the gateway catches it. Third, does the CI engine produce attestations that an auditor can verify independently, without access to your internal systems?

If all three pass at day thirty, spend the next sixty days running the workflow under audit conditions, with your internal audit function attempting to verify the chain retrospectively. If they can do it in less time than it previously took to assemble the evidence manually, you have your production case.

The deeper point of the pilot is not the tooling. It is the question of whether the forge is part of your governance model or outside it. For regulated enterprises that are moving serious work to agent swarms, that is not an optional question. It is the one that audit committees and regulators will ask, and the answer needs to be a cryptographic proof, not a process narrative.

Closing

The forge is where the agent’s work becomes permanent. Everything else in the governed factory (the mission planning, the identity attestations, the memory, the runtime) produces evidence of intent and execution. The forge is where those intentions become commits, where those executions become artefacts, where “the agent did this” becomes “this software was produced this way.” If that link is weak, the entire chain is only as strong as that link, however well governed everything upstream of it is.

The agent forge is the answer to the fintech incident I described at the start. Not because it prevents every possible supply chain attack (nothing does), but because it makes every agent action in the forge attributable, policy-bounded, and verifiable. When the next incident happens (and it will), the question “who did this, under what authority, and how did it get past the policy gateway?” has a deterministic answer. That answer is a signed block in a chain, not a conversation with the team that was on call at the time.

For more on how the identity service provides the cryptographic root of trust that the agent forge’s agent identities are anchored to, see the cryptographic agent identity deep dive. For how the signed artefact chain connects to the broader supply-chain risk picture for agentic systems, see supply-chain risk in the agentic era. And for how the agent forge sits in the full six-system picture alongside the mission orchestrator, the realtime data plane, the governed memory engine, the identity service, and the cell runtime, see the six systems as one factory. The composable-standards angle, including how MCP and signed artefacts connect the agent forge to the broader interoperability story, is in composable agents via open standards.

If you want the full investor brief and the head-to-head comparison against GitHub + bolt-on agent automation, request it here.