Article

Sovereign AI: air-gapped by default, owned models, no extraterritorial cloud risk

author Jonathan Conway
timestamp 6 May 2026
classification sovereignty / air-gap / cloud-act / open-weight-models / regulated / substrate / voxeltron / ninmu

In early 2024 the German Federal Office for Information Security published guidance warning that sensitive government workloads processed on US-hyperscaler infrastructure are subject to US law regardless of where the servers sit (source: BSI Technische Richtlinie TR-03161). The CLOUD Act had already established the principle in 2018: a US provider can be compelled to hand over data stored anywhere in the world. A German data centre, a Dutch co-location facility, a UK government tenancy on AWS GovCloud: none of these arrangements removes the data from US jurisdiction if the operator is a US legal entity or has a US parent.

Most enterprise AI procurement teams understood this in the abstract. What changed in 2025 and into 2026 was the degree to which regulators started treating it as a concrete risk rather than a theoretical one. The EU’s proposed EuroStack initiative called explicitly for a European-operated, open-source AI infrastructure layer, naming dependency on hyperscalers as a strategic vulnerability (source: European Parliament Research Service briefing, January 2026). India’s Reliance group announced sovereign AI investments on a scale that made the ambition clear: the point is not to be better at using another country’s AI, it is to own the capability. These are not hobbyist concerns about privacy. They are board-level decisions about what happens to a country’s critical systems when another country changes its export controls or sanctions regime.

For a government department, a regulated financial institution, or a critical infrastructure operator, the calculation has a clean shape: if your AI system can be legally accessed by a foreign government, you do not have a sovereign system. You have a system with an interesting logo.

The real shape of the risk

The conversation about sovereignty in AI tends to collapse three separate but related problems into one, and it is worth keeping them distinct because they require different mitigations.

The first is legal extraterritoriality. As described above, the CLOUD Act creates a mechanism by which any US company can be compelled to produce data regardless of where it is stored. Other jurisdictions have analogous instruments. This is not a hypothetical: it is a well-established legal framework that has been used in practice. For any organisation that processes data under rules prohibiting its transfer outside the jurisdiction (GDPR, sector-specific financial regulations, government classification requirements), running that workload on a US-hyperscaler-hosted AI service creates a compliance exposure that cannot be resolved by contractual means. The provider may be legally prevented from even notifying you.

The second is concentration and availability risk. As of mid-2026, the majority of frontier AI inference capacity is controlled by a small number of US-based companies. If your production AI pipeline depends on API calls to a frontier model, you have taken on a single point of failure that sits outside your change-management and incident-response process. A pricing change, a rate-limiting policy, a service outage, a regulatory freeze on exports to your jurisdiction: any of these events can take your AI-dependent processes offline. These are not edge cases. The AI API market has already demonstrated all four in the space of three years.

The third is model behaviour and data leakage. When you call an external API, your prompt and context leave your network. This is obvious, but the downstream consequences are often underestimated. In a factory scenario where agents are processing regulated financial data, clinical records, government casework, or commercially sensitive IP, the prompt content is often more sensitive than the output. The model provider may use it for training, telemetry, or abuse detection. In some jurisdictions, transmitting that data to a foreign server is itself a regulatory violation before anything else has happened.

None of these three risks can be resolved by a data-processing agreement. They are structural, and they require a structural response.

What genuine sovereignty requires

A sovereign deployment is not a deployment in a data centre that happens to be in your country while everything above the hardware layer remains US-controlled software. That is a popular arrangement but it does not actually solve the problems above. Genuine sovereignty means owning:

The compute fabric. Bare metal or private cloud, fully under your operational control, with no mandatory heartbeat calls to an external control plane.

The models. Open-weight models, ideally fine-tuned on your domain data, that you can run without a licence that can be revoked, rate-limited, or amended at a foreign company’s discretion.

The inference and orchestration stack. The software that routes requests, schedules agents, meters spend, and maintains memory. If this layer phones home, logs to an external SaaS, or requires a live connection to a vendor’s authentication service, it is not sovereign.

The audit and memory layer. In a regulated context, the entire run of the system needs to be reconstructable. If the audit trail lives in an external SaaS or depends on a vendor’s logging infrastructure, an outage or access dispute takes your audit evidence with it.

The identity and signing plane. If agent identities and action signatures are issued by an external authority, air-gapping your compute does not produce a verifiable audit trail: it produces an unverifiable one, which is worse.

This is a long list. Owning all of it is genuinely hard. It is also, for the categories of organisation described above, the only way to produce a system that can withstand a regulator’s examination without having to explain why a foreign government could theoretically access the evidence.

Interactive demo: simulating a provider outage leaves the factory running on owned models inside the walls; enabling air-gap mode removes the frontier connection entirely, and the factory keeps operating.

Most agent frameworks and AI API platforms can survive neither a provider outage nor a permanent air gap. They are built around the assumption of a permanent external connection to a frontier API, and the assumption is load-bearing: the orchestration, the cost tracking, the model selection, and sometimes the audit logging all depend on it. Pull the connection and the system either stops or continues without its governance layer. Neither outcome is acceptable in a high-stakes regulated deployment.

How Substrate addresses this from the first line of code

Substrate was designed with the constraint that it must run without any external connection, from day one, not as a retrofit.

The six systems that make up the factory (Ninmu, Cosmictron, Kizuna-mem, Ultra, Kizuna, and Voxeltron) are all owned code: approximately 790,000 lines of Rust and Elixir, no third-party agent framework at the core. There is no orchestration layer that dials home. There is no memory store that requires a vendor SaaS account. There is no identity provider that needs a live connection to issue certificates.

The deployment unit is a single binary per node. Voxeltron boots an isolated cell in under 50 milliseconds and can hold approximately 10,000 idle cells per host. A fully functional factory instance, including the orchestration, memory, identity, and signing infrastructure, can be installed on hardware that has no internet connection and will never have one. The configuration is entirely static.

Ninmu, the swarm conductor, meters every token before it runs. That metering is not a call to a vendor’s billing API. It is internal to the factory. The budget ledger lives in Cosmictron, the same system that holds the rest of the mission state. There is no billing endpoint to disconnect in air-gap mode because billing was never a remote call.
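The idea of metering before execution, against a ledger that lives inside the factory, can be sketched in a few lines. This is an illustration under stated assumptions, not Ninmu’s implementation: `BudgetLedger`, `reserve`, `settle`, the case identifiers, and the token figures are all hypothetical.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a reservation would push the mission past its budget."""


@dataclass
class BudgetLedger:
    """Hypothetical in-process ledger; no network call is involved."""
    budget_tokens: int
    spent_tokens: int = 0
    reserved_tokens: int = 0
    entries: list = field(default_factory=list)

    def remaining(self) -> int:
        return self.budget_tokens - self.spent_tokens - self.reserved_tokens

    def reserve(self, task_id: str, estimate: int) -> None:
        # Meter before the task runs: refuse rather than overspend.
        if estimate > self.remaining():
            raise BudgetExceeded(f"{task_id}: needs {estimate}, {self.remaining()} left")
        self.reserved_tokens += estimate
        self.entries.append(("reserve", task_id, estimate))

    def settle(self, task_id: str, estimate: int, actual: int) -> None:
        # Release the reservation and record what was actually used.
        self.reserved_tokens -= estimate
        self.spent_tokens += actual
        self.entries.append(("settle", task_id, actual))


ledger = BudgetLedger(budget_tokens=10_000)
ledger.reserve("case-001", estimate=4_000)
ledger.settle("case-001", estimate=4_000, actual=3_200)
ledger.reserve("case-002", estimate=6_000)

try:
    ledger.reserve("case-003", estimate=2_000)  # only 800 tokens remain
    refused = False
except BudgetExceeded:
    refused = True
```

Because the check happens at reservation time, the third task is refused before any inference runs, and nothing in the path depends on a remote billing endpoint.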

Kizuna-mem provides bitemporal memory with a recall latency of approximately 3 milliseconds. In an air-gapped deployment, the memory graph is entirely local. The governance layer (what is in scope, what can be recalled, what must be deleted under a data-subject request) operates identically in a connected and an air-gapped instance.
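For readers unfamiliar with the term, bitemporal means every fact carries two timestamps: when it became true in the world, and when the system learned it. The sketch below shows why that matters for audit. It is a generic illustration of the technique, not Kizuna-mem’s data model; `Fact`, `BitemporalStore`, `recall`, and the example threshold values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    valid_from: date   # when the fact became true in the world
    recorded_on: date  # when the system learned it


class BitemporalStore:
    """Hypothetical append-only store: facts are never overwritten."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def record(self, fact: Fact) -> None:
        self._facts.append(fact)

    def recall(self, key: str, valid_at: date, known_at: date) -> Optional[str]:
        # Keep only facts the system had recorded by `known_at`
        # and that were already in effect at `valid_at`.
        candidates = [
            f for f in self._facts
            if f.key == key and f.recorded_on <= known_at and f.valid_from <= valid_at
        ]
        if not candidates:
            return None
        # Prefer the latest effective date, then the latest recording.
        best = max(candidates, key=lambda f: (f.valid_from, f.recorded_on))
        return best.value


store = BitemporalStore()
store.record(Fact("income_threshold", "20000", date(2025, 1, 1), date(2025, 1, 1)))
# A policy change recorded in June but effective retroactively from April.
store.record(Fact("income_threshold", "22000", date(2025, 4, 1), date(2025, 6, 10)))

believed_in_may = store.recall("income_threshold", date(2025, 5, 1), known_at=date(2025, 5, 15))
known_in_july = store.recall("income_threshold", date(2025, 5, 1), known_at=date(2025, 7, 1))
```

The two queries ask about the same day in May but return different answers: `believed_in_may` is what an agent could have known when it decided, and `known_in_july` is what is now known to have applied. An auditor needs both.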

Ultra handles agent identity. Each agent has an Ed25519 keypair. Actions are signed at the point of execution. The tamper-evident log is written to Cosmictron. None of this requires an external certificate authority or an external log sink. In an air-gapped deployment, the signed audit trail is exactly as verifiable as it is in a connected one: the signing keys are owned, the log is local, and the chain of custody is complete.

Interactive diagram: each layer of the factory, what it does, and where it sits. With air-gap mode toggled, only the optional frontier failover is marked external; every other layer runs entirely within the walls.

The one concession to the connected world is the optional frontier failover. If an organisation is willing to accept the legal and data-residency implications, Ninmu can route certain tasks to a frontier API when the owned model is insufficient. This is explicitly opt-in, configurable per task type, and disabled by default. In air-gap mode, the option does not exist. The factory does not attempt any outbound connection and does not degrade if none is available.

A concrete deployment: Voxeltron + Cosmictron + Kizuna-mem on air-gapped hardware

To make this concrete, consider a government agency deploying a casework-automation factory on a classified network. The network has no internet access. No packets in, no packets out. The agency provides bare-metal servers in a physically secured facility.

The deployment looks like this. Kizuna, the AI-native forge, is used to build and sign the factory artefacts on a separate, connected build machine. The signed artefacts are transferred by physical media to the air-gapped environment: a procedure analogous to software supply-chain practice for classified systems. Voxeltron’s deployment control plane installs the artefacts, verifies their signatures, and starts the cell runtime. From this point forward, the environment has no need of any external system.

An operator declares a mission in Ninmu: process the incoming casework queue, apply eligibility criteria, surface exceptions for human review, produce a signed decision record for each case. They set a budget denominated in tokens, which Ninmu will meter against a pre-loaded schedule of open-weight model costs. The models are fine-tuned versions of open-weight base models, trained on the agency’s historical casework using a Kizuna fine-tuning run that happened on the connected build machine before the air-gap transfer.

The swarm runs. Voxeltron spawns cells as the queue demands, each isolated, each booting in under 50 milliseconds. Cosmictron holds the live state of every case, the mission’s budget ledger, and the incremental views that let the swarm’s agents subscribe to state changes without polling. Kizuna-mem provides the memory that lets the swarm recognise patterns across cases: a previously decided precedent, a policy change that superseded an earlier rule, a claimant whose record was updated three weeks ago.

Ultra signs every action. Every eligibility decision, every human-gate trigger, every case closure is written to the Cosmictron append-only log with the signing agent’s identity and the action’s policy context. The log is the audit trail. There is no separate audit system to query, no external logging service to maintain. An auditor examining the record of any decision can replay the exact state of the system at the moment it was made, see which model ran the inference, see what memory was retrieved, see which human approved the exception, and verify the cryptographic chain back to the original casework documents.
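The tamper-evidence property of an append-only log can be illustrated with a plain hash chain. This is a generic sketch using only the Python standard library, not the Cosmictron log format: the per-action Ed25519 signature described above is deliberately omitted (the standard library has no Ed25519), and `entry_hash`, `append`, `verify`, and the payload fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    # Each entry commits to the hash of the entry before it.
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    # Recompute every hash from the start; any edit breaks the chain.
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: list = []
append(log, {"agent": "eligibility-7", "action": "decide", "case": "C-104", "result": "eligible"})
append(log, {"agent": "review-gate", "action": "escalate", "case": "C-105"})
append(log, {"agent": "eligibility-7", "action": "close", "case": "C-104"})

intact = verify(log)

# Altering an earlier decision invalidates the chain from that entry on.
log[0]["payload"]["result"] = "ineligible"
tamper_detected = not verify(log)
```

In a design like the one described, a signature over each entry would additionally bind the entry to the agent that produced it; the chain alone only shows that the sequence has not been altered.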

If the frontier failover were enabled in this deployment (it would not be, but supposing it were), the Ninmu routing table would route only tasks explicitly cleared for external processing to the API endpoint. Every such routing event would be logged, including the fact that data left the walls. In the classified deployment, the configuration simply has no frontier endpoint. Ninmu treats the absence of a frontier endpoint as normal: it selects from the owned models for every task. The factory does not error. It does not degrade. It runs.
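The routing rule in the paragraph above is simple enough to state as code. The sketch below is an assumption about shape, not Ninmu’s routing table: `RoutingConfig`, `Router`, the model name, the task types, and the endpoint URL are hypothetical. The point it illustrates is that a missing frontier endpoint is an ordinary configuration, not an error path.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoutingConfig:
    owned_model: str
    frontier_endpoint: Optional[str] = None      # None means air-gapped
    cleared_task_types: frozenset = frozenset()  # opt-in, empty by default


@dataclass
class Router:
    config: RoutingConfig
    egress_log: list = field(default_factory=list)

    def route(self, task_type: str, owned_model_sufficient: bool = True) -> str:
        cfg = self.config
        may_leave = (
            cfg.frontier_endpoint is not None
            and task_type in cfg.cleared_task_types
            and not owned_model_sufficient
        )
        if may_leave:
            # Record that data left the walls before returning the target.
            self.egress_log.append({"task_type": task_type, "target": cfg.frontier_endpoint})
            return cfg.frontier_endpoint
        # No endpoint, or task not cleared: use the owned model, no error.
        return cfg.owned_model


air_gapped = Router(RoutingConfig(owned_model="owned-casework-v1"))
connected = Router(RoutingConfig(
    owned_model="owned-casework-v1",
    frontier_endpoint="https://frontier.example/v1",
    cleared_task_types=frozenset({"public-summary"}),
))

a = air_gapped.route("public-summary", owned_model_sufficient=False)
b = connected.route("eligibility", owned_model_sufficient=False)
c = connected.route("public-summary", owned_model_sufficient=False)
```

Only the third call leaves the walls, and it leaves an egress record behind. The air-gapped router takes the same code path as any other call that stays local.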

This is what the homepage means by “no data leaves the walls, no single provider outage or price hike can take the factory offline”. It is a design constraint, not a marketing claim. The constraint was in the architecture from the first build. Retrofitting it into a system that assumed permanent connectivity is not a software project; it is a different system.

What EU AI Act and DORA require, and how air-gap deployment satisfies them

The EU AI Act (August 2026 enforcement for high-risk systems under Article 6 and Annex III) requires, under Article 12, that high-risk AI systems automatically log events over the lifetime of the system. Logs must cover decision events, risk situations, substantial modifications, and operational monitoring. They must be tamper-evident. Minimum retention is six months, a period set in Articles 19 and 26(6) rather than in Article 12 itself, with longer requirements for some categories (source: EUR-Lex, Regulation (EU) 2024/1689, Articles 12, 19 and 26).

Most current agent stacks fail this requirement not through any deliberate evasion but because they were not designed for it. A LangGraph-style orchestration system running against an external model API produces logs in three or more separate places: the LLM provider’s usage logs, the application’s own log store, and whatever observability tool you have connected. None of these is the authoritative record. All three can have gaps. None is cryptographically tamper-evident by default.

Substrate’s architecture produces a single log in Cosmictron that is the authoritative record by construction. Every agent action is signed before it is committed. The log is append-only. Because the log is the storage engine (deterministic replay means the state of the system at any point can be reconstructed from the log), there is no separate audit trail to keep in sync. That one log is what satisfies the Article 12 requirement, including the tamper-evidence requirement. This is also the reason the phrase “governed by construction” appears in Substrate’s positioning: the governance properties are architectural invariants, not policies enforced by a monitoring layer that can drift or fail.
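Deterministic replay is easiest to see in miniature: if state is only ever changed by applying logged events through a pure function, then the state at any past moment is a fold over a prefix of the log. The sketch below illustrates that general technique; the event kinds, `apply_event`, and `replay` are hypothetical and are not taken from Cosmictron.

```python
from typing import Iterable


def apply_event(state: dict, event: dict) -> dict:
    """Pure transition: the same state and event always give the same result."""
    cases = dict(state)  # copy, so earlier states are never mutated
    kind = event["kind"]
    if kind == "opened":
        cases[event["case"]] = "open"
    elif kind == "decided":
        cases[event["case"]] = event["outcome"]
    elif kind == "closed":
        cases[event["case"]] = "closed:" + cases[event["case"]]
    return cases


def replay(log: Iterable[dict], upto: int) -> dict:
    """Rebuild the state as it stood after the first `upto` events."""
    state: dict = {}
    for i, event in enumerate(log):
        if i >= upto:
            break
        state = apply_event(state, event)
    return state


EVENTS = [
    {"kind": "opened", "case": "C-104"},
    {"kind": "opened", "case": "C-105"},
    {"kind": "decided", "case": "C-104", "outcome": "eligible"},
    {"kind": "closed", "case": "C-104"},
]

state_at_decision = replay(EVENTS, upto=3)
final_state = replay(EVENTS, upto=len(EVENTS))
```

Because nothing outside the log feeds the transition function, replaying the same prefix twice gives the same state, which is what lets the log double as the audit trail.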

DORA (Digital Operational Resilience Act) requires regulated financial entities to maintain ICT risk management frameworks that include operational logs and the ability to recover from incidents (source: Regulation (EU) 2022/2554). An AI system that depends on a third-party frontier API for its core functionality has, by definition, a single external dependency that is outside the entity’s incident-response perimeter. Sovereign deployment eliminates that dependency. The factory’s operational resilience is bounded by the resilience of hardware and infrastructure the organisation itself controls.

The point is not that sovereign deployment is required for EU AI Act compliance. It is that sovereign deployment, done properly, produces compliance properties that are very hard to achieve any other way.

What to demand in an RFP

If your organisation is procuring an AI system for regulated work and sovereignty is a requirement (either because of legal classification, sector regulation, or board policy), the following questions cut through the pitch decks.

Ask whether the system can run with no outbound network connections, permanently. Not “in theory” and not “with reduced functionality”. Ask them to demonstrate a running instance on an air-gapped machine. If the vendor has to reconnect the machine at any point before handing the laptop back, something depends on the connection.

Ask where the audit log is written and who controls it. The answer should be: locally, in a tamper-evident log that you own, with no dependency on the vendor’s infrastructure for access, verification, or retention. If the answer involves a vendor-hosted log service, ask what happens to your audit evidence when the service is unavailable or the vendor is acquired.

Ask about model ownership specifically. “We use open-weight models” is not the same as “you own and control the model weights”. Ask whether you receive the weights, whether you can fine-tune them, and whether a licence change by the base model provider affects your ability to run them in two years.

Ask how cost governance works without an internet connection. In most platforms, cost tracking depends on calling the provider’s usage API. If the model call is local, there is no provider API. A system with real internal cost governance (Ninmu’s budget ledger, held in Cosmictron alongside the rest of the mission state) does not change behaviour based on whether there is an outbound connection.

Ask how identity and signing work in air-gap mode. If agent identities are issued by an external certificate authority, an air-gapped deployment either cannot issue new identities or must operate on pre-issued credentials with no revocation path. An owned identity plane (Ultra’s Ed25519 key management) operates identically in connected and air-gapped modes because it was never a remote service.

These questions have short answers if the system was designed for sovereignty. They have long, hedged, or evasive answers if it was designed for a connected cloud and sovereignty is a retrofit.

A 90-day pilot design

The shape of a meaningful pilot is not complicated. Pick a workflow that is regulated, has a clear before-state (a human labour cost, a cycle time, a compliance overhead), and can be run entirely within your network perimeter.

In week one, deploy the factory on your own hardware. Not a cloud tenancy, not a vendor-managed instance: your hardware, in your facility. Verify that the instance runs with no outbound connections. Check the logs. Run the audit replay on a trivial mission to confirm the chain of custody is complete.

In weeks two through six, run a representative sample of the target workflow. Casework, control testing, claims processing, whatever fits your sector. Set the budget deliberately tighter than your current human cost for the same volume.

In weeks seven through twelve, measure three things. The cycle time (hours from declaration to signed output). The exception rate and time-to-decision at the human gates (the gates are where you learn how much the swarm actually needs human input, which is almost always less than expected). And the compliance properties of the output: can an internal auditor reconstruct any decision from the log without external dependencies?

At the end of ninety days you have a functioning, sovereign, auditable system running regulated work on your own hardware, a set of measured before-and-after numbers for a realistic workflow, and a decision about whether the factory warrants wider deployment. You have not bet your infrastructure on a single vendor’s pricing and availability. You have a factory that runs, and that you own.

Related reading and cross-links

The governance properties described here are downstream of the six-system architecture. How they fit together as one factory rather than a stack of products is in “The six systems as one factory”.

If you are comparing this approach to existing sovereign-infra platforms in the government and critical infrastructure market, “Sovereign alternative to Palantir-style platforms” covers the head-to-head.

For the specific compliance requirements around logging and audit trails (EU AI Act Article 12, DORA, the bitemporal memory layer that makes them tractable), “Deterministic replay as the audit trail” goes deep on the mechanics.

And if the budget-governance side is the priority question (it usually is: sovereignty without cost control still produces a runaway-spend problem), “Declare the mission and the budget” is the right starting point.

The straightforward version

The strategic imperative is simple enough to fit in one paragraph. Any regulated organisation that runs AI on infrastructure it does not control has taken on a risk it cannot fully manage: legal exposure under foreign law, operational exposure to a vendor’s commercial decisions, and compliance exposure in every jurisdiction that asks “can you prove this data stayed in the country”. The technical capability to run a fully sovereign AI factory (owned models, owned orchestration, owned audit trail, no outbound connections required) now exists. The question is whether procurement processes are fast enough to use it before an incident forces the conversation.

Substrate is built for the organisations that want to have the conversation before the incident. The factory runs on your hardware. The audit trail is yours. The models are yours. The budget ledger does not depend on an API that someone else controls. If the internet goes down, or the US government changes its sanctions list, or a frontier provider changes its pricing model at two weeks’ notice, the factory keeps running.

Request the investor brief and the technical deployment guide at /substrate.