GDPR hard-delete in agent memory graphs: the technical mechanics

Author: Jonathan Conway
Timestamp: 23 May 2026
Classification: gdpr / hard-delete / governed-memory / oamp / bitemporal / crypto-shred / graph-surgery / eprivacy / regulated / substrate

A data protection officer at a European insurance group received an Article 17 erasure request in March 2026. The case was straightforward: a customer whose policy had lapsed, who had since moved to a competitor, and who wanted no continuing data footprint. The DPO ran the standard process. Primary systems confirmed. Email platform confirmed. CRM confirmed. Then someone thought to ask about the claims AI that had been running for fourteen months.

The claims AI used a vector store for context retrieval and a graph database for entity relationships. The vendor confirmed that the customer’s primary records had been deleted. When the DPO asked specifically about the vector embeddings, the AI-derived risk profile, and the synthesised claim history nodes that the memory layer had been building from ingested documents, the vendor’s answer arrived three weeks later. It said, in essence, that the embeddings had been marked for exclusion from future queries, that the derived nodes would be purged in the next scheduled vacuum run, and that the timing of that run depended on load. There was no receipt. There was no audit record of what had been deleted or when. The regulator considered this “incomplete.”

The word “incomplete” in a DPA enforcement context is a courtesy. What it means is: you do not know what your system held, you cannot prove what you removed, and your deletion mechanism is a best-effort background process with no accountability properties. Under GDPR Article 17, erasure must be completed “without undue delay.” Under Article 5(2), the controller must be able to demonstrate compliance. An unscheduled vacuum run satisfies neither.

The sibling article “Memory that was never there: hard-delete, governance by omission, and GDPR as a feature of the factory” explains the design principle and the regulatory framework. This article goes into the mechanics: what actually happens inside the memory graph when a deletion request arrives, why naive approaches fail, and what “cryptographic hard-delete” means at the level of data structures, transactions, and signed audit evidence.

The structure of the problem

A naive model of agent memory looks like a key-value store: you put a fact in, you retrieve it later. Real agent memory systems are more complex, and that complexity is precisely what creates the deletion problem.

When a fact enters the governed memory engine, several things happen in sequence. The raw fact is stored as a primary record on the valid-time axis, with the transaction-time of ingestion also recorded. An embedding is computed from the fact’s text and indexed in the vector space. The fact is attached to an entity node in the property graph. Background consolidation processes examine the entity’s accumulated facts and produce higher-level semantic nodes: a “synthesised risk profile” that condenses multiple individual facts, or a “claim pattern” node representing a behaviour extracted across a sequence of events. These derived nodes carry provenance pointers back to the source facts that contributed to them, but they are themselves new storage objects with their own embeddings and graph positions. Hot-path retrieval caches may also hold copies of recently accessed facts and derived nodes to support the governed memory engine’s ~3 ms recall figure.

When an Article 17 request arrives, the deletion must propagate to all five layers: primary record, embedding index, graph adjacency, derived nodes, and caches. Miss any one and the deletion is, in legal terms, incomplete. The standard vector database pattern fails at the embedding layer. The standard graph database pattern fails at the derived-node layer. Neither of them, on its own, produces a verifiable deletion receipt. This is not a vendor quality problem; it is structural.

Graph surgery: what it actually involves

The term “graph surgery” describes the process of removing a node from a property graph while maintaining the integrity of the remaining structure. It sounds like a deletion, and in part it is, but it requires considerably more than issuing a DELETE FROM nodes WHERE id = ?.

Consider an entity node representing a financial customer. That node has outgoing edges to dozens of fact nodes: address history, account status, transaction flags, consent records, risk classifications. Some of those fact nodes are shared: a counterparty relationship node connects the customer to another entity. Some fact nodes contributed to derived nodes: a “high-risk profile” synthesised node was produced from three specific fact nodes, including one that belongs to the subject of the erasure request.

A correct deletion of the customer’s data involves four phases, in order.

First: a traversal of the entity subgraph to identify all first-order fact nodes associated with the subject. Shared facts require careful handling: the edge to the subject is severed, but the node itself is not deleted if other entities reference it.

Second: a recursive traversal of the provenance graph to identify derived nodes whose derivation included any of the subject’s fact nodes. Provenance pointers in the governed memory engine are bidirectional; every derived node carries a derived_from array listing the source fact ids. For each derived node in the set: if it can be regenerated without the deleted facts and the result would be materially identical, it may be retained with updated provenance. If not, it joins the deletion set. The recursion repeats for derived nodes of derived nodes; in practice this is at most two or three levels deep.

Third: once the full deletion set is computed, the transaction is composed. In the governed memory engine this is a single atomic operation against the bitemporal store: primary records are removed from both the valid-time and transaction-time axes, embedding vectors for each deleted node are marked for immediate exclusion (not deferred compaction), and provenance edges are severed. The transaction commits entirely or rolls back entirely.

Fourth: the embedding exclusion set is updated. This is the gap that catches most vector database implementations: the governed memory engine writes each deleted vector id to an exclusion set that is consulted before the ANN index at query time. The deleted embedding is therefore never returned regardless of when physical compaction occurs. The exclusion set entries become part of the deletion record.
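
The traversal and exclusion steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the governed memory engine's implementation: MemoryGraph, compute_deletion_plan, never_regenerable, and drop_excluded are hypothetical names, the graph is a pair of in-memory dictionaries, and the "can it be regenerated" test is a caller-supplied function that defaults to the conservative answer. The source describes the exclusion set as consulted before the ANN index; the sketch uses a result filter as one simple way to get the same guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryGraph:
    """Toy stand-in for the property graph; the shape is hypothetical."""
    # fact id -> ids of the entities that reference the fact
    fact_owners: dict = field(default_factory=dict)
    # derived node id -> ids of the facts or derived nodes it was built from
    derived_from: dict = field(default_factory=dict)


def never_regenerable(node_id, removed_sources):
    """Conservative default: no derived node survives losing a source."""
    return False


def compute_deletion_plan(graph, subject, survives_without=never_regenerable):
    """Return (nodes_to_delete, edges_to_sever, nodes_to_reprovenance)."""
    to_delete, to_sever = set(), set()
    # Phase 1: first-order facts. A shared fact keeps its node and loses
    # only the edge to the subject.
    for fact_id, owners in graph.fact_owners.items():
        if subject not in owners:
            continue
        if set(owners) == {subject}:
            to_delete.add(fact_id)
        else:
            to_sever.add((subject, fact_id))
    # Assumption: every fact of the subject, shared or not, taints the
    # nodes that were derived from it.
    tainted = to_delete | {fact_id for _, fact_id in to_sever}
    # Phase 2: walk provenance until no new derived node joins the set.
    reprovenance = set()
    changed = True
    while changed:
        changed = False
        for node_id, sources in graph.derived_from.items():
            if node_id in to_delete:
                continue
            removed = tainted.intersection(sources)
            if not removed:
                continue
            if survives_without(node_id, removed):
                reprovenance.add(node_id)      # retained, provenance updated
            else:
                reprovenance.discard(node_id)
                to_delete.add(node_id)
                tainted.add(node_id)           # its own dependants are next
                changed = True
    return to_delete, to_sever, reprovenance


def drop_excluded(ann_hits, excluded_ids):
    """Phase 4: never return a vector whose id is in the exclusion set."""
    return [(vid, score) for vid, score in ann_hits if vid not in excluded_ids]
```

The loop stops once a pass adds nothing to the deletion set, which matches the observation that real provenance chains are only two or three levels deep. The atomic commit of phase three is left out because it depends on the storage engine.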

WORM provenance: why you cannot delete the deletion

“Write Once Read Many” storage for the audit trail creates an apparent paradox. If the audit log is append-only and tamper-evident, how do you satisfy a deletion request that would normally require removing records from it?

The answer depends on understanding what the audit trail is required to hold and what it is not.

GDPR Article 17 requires erasure of the personal data. The EU AI Act Article 12 requires that the audit log enable post-hoc review of the system’s operation. These are different objects. Personal data is the semantic content of a fact about a person. The audit log entry for a deletion event does not need to contain personal data; it needs to contain evidence that the deletion occurred and was correct.

The governed memory engine’s audit chain, signed through the identity service’s Ed25519 authority plane, stores the following fields in a deletion block: the block index (sequence number in the chain), the hash of the preceding block (the tamper-evident link), the timestamp, the identity of the operation that authorised the deletion, the GDPR Article and basis for the deletion, the scope (entity id, policy perimeter), and a cryptographic hash of the deleted data. Crucially, it does not store the deleted data itself.

The hash of the deleted data is sufficient for later verification. A regulator who wants to confirm that a specific record was deleted can compute the expected hash from the original record (which they would need to provide as part of the verification request) and match it against the deletion block. If they match, the deletion of that specific data is proven. The audit log does not need to hold the data to prove the deletion of the data.
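
A minimal sketch of that verification, assuming SHA-256 over a canonical JSON encoding of the record. The field names, the hash algorithm, and the canonicalisation are illustrative assumptions rather than the engine's documented format, and the Ed25519 signature over the block is omitted here.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """Hash a record over a canonical JSON encoding (assumed convention)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_deletion_block(index, prev_hash, timestamp, authorised_by,
                        policy, scope, deleted_record):
    """Build a deletion block that holds evidence, not the data itself."""
    return {
        "index": index,                    # sequence number in the chain
        "prev_hash": prev_hash,            # tamper-evident link
        "timestamp": timestamp,
        "authorised_by": authorised_by,    # identity of the operation
        "policy": policy,                  # GDPR article and basis
        "scope": scope,                    # entity id, policy perimeter
        "deleted_data_hash": content_hash(deleted_record),
    }


def proves_deletion_of(block: dict, candidate_record: dict) -> bool:
    """True if the block records the deletion of exactly this record."""
    return block["deleted_data_hash"] == content_hash(candidate_record)
```

The verifier supplies the original record; the chain only ever supplies the hash. A record that differs by a single character produces a different hash and fails the check.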

This is the same principle that makes signed certificate revocation lists work in TLS infrastructure. You do not need to store the revoked certificate in the revocation list. You store its identifier and a signature that proves the revocation event was authorised. The certificate itself is gone; the proof of its removal is permanent.

The WORM property of the audit chain means that the deletion block, once written, cannot be removed or altered without breaking the chain. This satisfies the Article 5(2) accountability obligation: the evidence that the erasure was performed is itself tamper-evident and permanent. A regulator looking at the audit chain will find not a gap (which might indicate unauthorised removal of records) but a deletion block (which is the positive evidence of a governed erasure).
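
The tamper-evidence property can be illustrated with a bare hash chain. This sketch assumes each block stores the SHA-256 of its predecessor; block_hash, append_block, and first_broken_link are hypothetical helper names, and the per-block Ed25519 signature is left out.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block (assumed)."""
    canonical = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: dict) -> dict:
    """Append a block that commits to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev, **payload}
    chain.append(block)
    return block


def first_broken_link(chain: list):
    """Return the index of the first block whose link fails, else None."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None
```

Altering or removing a deletion block changes its hash, so the link stored in the following block no longer matches. The last block in a chain has no successor to vouch for it, which is one reason the signature on each block matters as well as the link.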

Crypto-shred: keyed deletion as the fastest path to irrecoverability

For data at rest in storage systems where physical deletion is slow, expensive, or operationally impractical (columnar stores, object storage, backup tapes), the governed memory engine supports an alternative mechanism called crypto-shred.

The principle is straightforward. Instead of storing a fact in plaintext, you store it encrypted under a purpose-specific data key. The data key is stored in a dedicated key management service (the identity service’s authority plane can serve this role, or an external KMS can be integrated via OAMP’s key-provider capability). When an erasure request arrives for data protected by this key, you delete the key. The data, wherever it resides, physically or in backup, becomes computationally irrecoverable. The data is still there in bits, but those bits are indistinguishable from random noise without the key.
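
The principle can be shown with a toy example. The sketch below is an illustration only: the keystream built from SHA-256 is a stand-in for a real authenticated cipher such as AES-GCM from a vetted library and must not be used in production, and SubjectKeyStore is a hypothetical in-memory substitute for a key management service.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class SubjectKeyStore:
    """Hypothetical stand-in for a KMS holding one data key per subject."""

    def __init__(self):
        self._keys = {}

    def create(self, subject_id: str) -> None:
        self._keys[subject_id] = secrets.token_bytes(32)

    def get(self, subject_id: str) -> bytes:
        return self._keys[subject_id]      # raises KeyError once shredded

    def shred(self, subject_id: str) -> None:
        # A real KMS would destroy the key material and emit a receipt.
        del self._keys[subject_id]


def encrypt(store, subject_id: str, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(store.get(subject_id), nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(store, subject_id: str, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(store.get(subject_id), nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))
```

After store.shred(subject_id), every copy of the ciphertext, including copies in backups, can no longer be decrypted, while other subjects' keys and data are untouched.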

Crypto-shred is particularly relevant for compliance in scenarios where backup media cannot be immediately overwritten. Backup tapes, cloud object storage versioning, and disaster recovery snapshots often retain data beyond the normal retention window. Obtaining a formal deletion guarantee from a media recycling vendor takes time. Under crypto-shred, the backup tape can be left intact and the key can be deleted immediately. The backup copy is no longer recoverable personal data under any practical definition; it is encrypted noise. Several national DPAs, including the Dutch AP and the Hamburg DPA, have indicated that crypto-shred constitutes adequate erasure for backup scenarios under GDPR, provided the data was encrypted prior to backup and the key deletion is itself documented and evidenced.

OAMP v1.3 formalises the crypto-shred delete mode as a first-class operation in the delete semantics specification. When a backend advertises governance.hard_delete.crypto_shred: true on its capabilities endpoint, it is committing to: data-at-rest encryption with per-subject or per-entity granularity (not whole-database encryption), a key deletion operation that is synchronous and produces a deletion receipt, and a guarantee that the encrypted data is not also stored in recoverable plaintext form at any layer. The distinction between whole-database encryption (which does not help with per-subject erasure) and per-subject key encryption (which does) is a common source of confusion in RFP processes; OAMP’s capabilities declaration makes it explicit.
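
A procurement script might check the declaration along these lines. The shape of the capabilities document is an assumption: the sketch accepts either a flat dotted key or a nested JSON object mirroring the dotted name, and advertises is a hypothetical helper, not part of any OAMP client library.

```python
def advertises(capabilities: dict, dotted_path: str) -> bool:
    """True only if the capability is present and explicitly true."""
    # Some backends may publish the flat dotted key directly (assumption).
    if dotted_path in capabilities:
        return capabilities[dotted_path] is True
    # Otherwise walk a nested document one path segment at a time.
    node = capabilities
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return node is True
```

Treating a missing or non-boolean value as "not advertised" keeps the check strict: a backend that only says it encrypts the whole database does not pass.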

The policy surface: consent-aware recording in the realtime data plane

Deletion in memory graphs addresses historical facts. A parallel problem exists at the point of ingestion: the system needs to know, before recording a fact, whether recording it is permissible under the subject’s current consent record.

The realtime data plane, the agent runtime at the core of the Substrate factory, includes consent-aware recording as part of its telephony and media foundations. When a voice interaction is in progress, the system holds a live consent record for the subject of the call. Recording is permissible only for the duration and scope covered by the active consent. If consent is withdrawn mid-call, recording stops and previously captured audio for that session is flagged for deletion. The telephony pipeline emits a consent-change event that the governed memory engine receives and processes as a deletion trigger for the memory representations derived from the withdrawn-consent segment.

Under UK PECR (the Privacy and Electronic Communications Regulations 2003) and the EU ePrivacy Directive (2002/58/EC) that it implements, recording a communication requires either the consent of all parties or a legitimate interest basis that is explicitly defined and bounded. The legitimate interest basis for recording calls in financial services (for compliance with COBS/MiFID II record-keeping requirements) does not override a subsequent GDPR erasure request from the data subject, but it does affect the timing. Records kept under a statutory retention obligation cannot be erased before the retention period expires; once it expires, they must be erased promptly.

The realtime data plane’s consent model treats these as two distinct states: “recording permitted under consent” and “recording retained under statutory obligation.” When the statutory retention period expires, the memory layer automatically emits a deletion trigger without requiring a separate erasure request from the subject. The trigger produces the same cryptographic hard-delete as a manual Article 17 request, with the distinction that the deletion block in the audit chain references the retention schedule expiry as the authorising basis rather than a subject erasure request.

This distinction matters for the audit trail. A regulator examining the deletion log wants to know not just that a deletion happened but under what lawful basis it was authorised. A deletion block that says policy: gdpr-art17-erasure means the subject exercised their right. A deletion block that says policy: retention-schedule-expiry-cobs-v3 means the record was held for the full lawful retention period and then deleted as scheduled. Both are legally correct outcomes. Both produce the same cryptographic evidence structure. The difference is in the policy field, which the identity service signs along with everything else.
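
The decision of when to delete and which policy to record can be sketched as a small function. The two policy strings come from the examples above; deletion_decision is a hypothetical name, and the rule that a subject's request takes precedence once a retention hold has expired is an assumption made for the sketch.

```python
from datetime import date

ERASURE = "gdpr-art17-erasure"
RETENTION_EXPIRY = "retention-schedule-expiry-cobs-v3"


def deletion_decision(today: date, retention_ends, erasure_requested: bool):
    """Return (delete_now, policy) for one record.

    retention_ends is None when no statutory retention applies.
    """
    under_hold = retention_ends is not None and today < retention_ends
    if under_hold:
        # A statutory obligation defers erasure until the period expires.
        return (False, None)
    if erasure_requested:
        # Assumption: the subject's request is the recorded basis.
        return (True, ERASURE)
    if retention_ends is not None:
        # Hold expired and nobody asked: the schedule itself is the basis.
        return (True, RETENTION_EXPIRY)
    return (False, None)
```

Whatever policy string is returned ends up in the signed deletion block, so the basis for the deletion is fixed at the moment the trigger fires.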

Soft-delete versus hard-delete: the cost of deferral

Soft-delete is operationally convenient. Marking a record as deleted is O(1). Traversing a provenance graph, identifying derived nodes, and issuing a coordinated atomic deletion across multiple storage layers is not. Preferring the cheaper operation is a defensible engineering choice. The question is whether the convenience is worth the legal exposure.

What soft-delete commits you to: the record is excluded from query results but remains in the primary store, the embedding index, the derived node graph, replicas, and backups. The vacuum gets to it eventually, on a schedule determined by load, with no guarantee of timing and no per-record audit trail. In a cloud-hosted system, a sufficiently privileged database query returns the “deleted” record. That includes the vendor’s own infrastructure access.

The “vacuumed eventually” posture fails Article 17’s “without undue delay” standard in a straightforward way: the deferral timeline is not controlled by the data controller. It also fails Article 5(2) because there is no evidence of completion until the vacuum runs.

Soft-delete is defensible in one context only: as a transitional state before a verified hard-delete completes within a defined window. Several DPA enforcement decisions have accepted “soft-delete followed by verified hard-delete within 48 hours” as adequate. The word “verified” matters: the evidence must survive independently of the system that performed the deletion.

The governed memory engine supports this pattern: an immediate soft-delete excludes the data from query results without interrupting the running factory, followed by an atomic hard-delete that completes within a configurable window and produces a signed receipt. The window is a policy setting the data controller controls.
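
One way to model the two-stage pattern is a ticket that starts in the soft-deleted state and must reach the hard-deleted state before a deadline. ErasureTicket and its fields are hypothetical names; the 48-hour window used in the usage note is just the figure quoted above, not a default of any product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    SOFT_DELETED = "soft_deleted"   # hidden from queries, still stored
    HARD_DELETED = "hard_deleted"   # removed, receipt available


@dataclass
class ErasureTicket:
    record_id: str
    requested_at: datetime
    window: timedelta                       # policy set by the controller
    state: State = State.SOFT_DELETED
    receipt_hash: Optional[str] = None

    @property
    def deadline(self) -> datetime:
        return self.requested_at + self.window

    def complete(self, receipt_hash: str) -> None:
        """Record the verified hard-delete and its signed receipt hash."""
        self.state = State.HARD_DELETED
        self.receipt_hash = receipt_hash

    def overdue(self, now: datetime) -> bool:
        """True if the hard-delete has not completed within the window."""
        return self.state is not State.HARD_DELETED and now > self.deadline
```

A monitoring job can call overdue on every open ticket, for example with window=timedelta(hours=48), and alert the DPO before the deferral becomes the kind of open-ended delay that Article 17 does not allow.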

Auditing the audit of deletes

The audit trail for deletions is itself subject to audit. This is a governance requirement under GDPR Article 30, which requires the controller to maintain records of processing activities including erasure. An Article 30 entry for an erasure request should cover: date, requester identity, data categories affected, systems in which deletion was performed, completion date, and a reference to the evidence.

The governed memory engine produces all of this as a machine-readable deletion receipt from OAMP’s crypto_shred delete mode. The identity service signs the receipt. The Article 30 record references the receipt by hash; the hash links it to the verifiable evidence without copying personal data into the compliance register.
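
As an illustration of referencing the receipt by hash, a register entry might look like the following. The field names follow the list above; Article30ErasureEntry, receipt_ref, and the use of SHA-256 over the raw receipt bytes are assumptions made for the sketch.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article30ErasureEntry:
    request_date: date
    requester: str
    data_categories: tuple
    systems: tuple
    completion_date: date
    evidence_ref: str          # hash of the signed deletion receipt


def receipt_ref(receipt_bytes: bytes) -> str:
    """Reference a receipt by the SHA-256 of its exported bytes (assumed)."""
    return hashlib.sha256(receipt_bytes).hexdigest()


def entry_matches_receipt(entry: Article30ErasureEntry,
                          receipt_bytes: bytes) -> bool:
    """Check that an exported receipt is the one the register refers to."""
    return entry.evidence_ref == receipt_ref(receipt_bytes)
```

An auditor holding the exported receipt can confirm it is the one the register points to without the register ever containing the deleted data.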

In a sovereign deployment the entire audit chain is under the customer’s control, available to the DPO, external auditors, or a regulator without involving the vendor. In a hosted deployment the customer must verify that their contract allows independent export of the audit chain in a format that can be verified offline. OAMP’s export format is standardised and Ed25519 verification requires no vendor infrastructure. A signed chain in cold storage is verifiable by any party with the public key, even after a vendor shutdown. The deletion evidence is genuinely portable.

The contrast in practice: a telephony scenario

An insurance group runs an inbound customer service operation. AI agents use the realtime data plane as their runtime, the governed memory engine for customer memory, and the identity service for signed action provenance. Every call goes through the consent-aware recording pipeline: before any audio is retained, the realtime data plane checks the active consent record.

A customer calls in, confirms recording consent, then mid-call withdraws it. The realtime data plane stops recording immediately. The segment already captured is flagged “post-withdrawal deletion required,” the memory representations derived from it enter “pending deletion” state, and a deletion task is queued. Within thirty seconds: the audio is crypto-shredded (per-session key deleted, ciphertext irrecoverable), the partial transcription’s fact nodes are identified by provenance traversal, ineligible derived nodes are deleted, and a deletion block is written to the identity service audit chain. When the agent subsequently retrieves customer context, the governed memory engine returns memory as of before the withdrawn segment. The supervisor’s quality review shows the consent withdrawal event and a link to the deletion receipt. They can verify the operation without seeing the deleted data.

Compare this to a glue-stack telephony integration. The consent withdrawal arrives at the call routing layer. A webhook fires to the external vector store to delete the transcript chunk. The vector store acknowledges, but the AI context window already holds the transcript. The context expires when the session ends. The embedding is physically removed at next compaction. The webhook may or may not have reached the graph database holding derived facts. Nobody checked the derived nodes. There is no deletion receipt. There is no signed audit chain entry.

One architecture discharges the GDPR Article 17 obligation while also meeting the ePrivacy consent rules and the EU AI Act logging requirements. The other creates a trail of partial evidence and deferred operations.

The provenance differentiator

Hard-delete is necessary but not sufficient. The differentiating property of the governed memory engine’s approach is that deletion events carry the same provenance structure as every other factory action.

Every action in the Substrate factory is signed by the identity service: agent identity (Ed25519 public key), tool name, policy reference, input and output hashes, block index. A deletion action is structurally identical to an ingestion action except that its tool field is memory/crypto-shred and its policy field names the relevant GDPR article or retention schedule. Who issued this deletion? The signed identity. Under what authority? The policy field. What data was affected? The hash of the deleted content. The answers are in the chain, signed the same way as every other answer.

This makes erasure events first-class operations in the factory’s audit record rather than second-class cleanup activities that happen outside the main trail. The article “Provenance, not recall, is the real differentiator” covers this property for the factory as a whole; the deletion case is the clearest demonstration of why it matters.

What to demand in an RFP

Any procurement for an agentic system handling personal data about EU or UK data subjects should include the following technical questions in the specification. Generic vendor prose about “GDPR compliance” is not an adequate answer to any of them.

Ask specifically about graph surgery. The question to ask is: describe your process for identifying and deleting derived memory nodes when a source fact is erased. If the vendor’s answer does not include a mention of provenance traversal and derived-node assessment, they are not doing graph surgery. They are deleting primary records and hoping nothing derived survives.

Ask about embedding index deletion timing. The question is: between issuing a deletion request and the next index compaction, is the deleted vector still physically present in the index and, if so, can it be returned by an ANN query? The correct answer is that the deleted vector is excluded from query results immediately, regardless of physical compaction timing, and that a mechanism exists (such as an exclusion set) to guarantee this.

Ask about crypto-shred. The question is: is your at-rest encryption implemented at per-subject or per-entity granularity, and can you delete a single subject’s data key without affecting other subjects’ data? Whole-database encryption does not answer this question correctly.

Ask for a deletion receipt. The question is: what evidence do you produce that a deletion operation completed, and is that evidence independently verifiable without access to your infrastructure? The correct answer includes a signed deletion block in a tamper-evident audit chain, with an export format that can be verified offline.

Ask about the relationship between deletions and EU AI Act Article 12 logs. The question is: if a deletion removes personal data from the memory system, do your Article 12 audit logs still contain copies of that personal data? If the answer is yes, you have a conflict between two obligations that the architecture has not resolved. The correct answer is that the audit log stores hashes of deleted data, not the data itself, and that this design was a deliberate architectural choice.

Ask about the factory’s behaviour in air-gapped or private cloud deployments. The deletion mechanism should operate identically in sovereign and hosted deployments. If the deletion guarantee depends on a vendor-hosted KMS, it does not survive the deployments that regulated buyers actually need.

A 90-day way to test it

A pilot that genuinely tests deletion capability does not need to involve real personal data. It requires a realistic dataset with known provenance structure, a deletion request that exercises the full cascade, and a verification process that checks each layer independently.

Create a test subject with a controlled set of facts in the memory system. Allow the consolidation processes to run for a week, producing derived nodes with traceable provenance. Issue a deletion request for the test subject. Verify the result against each layer: primary store (should be empty for the subject), embedding index (should return nothing for the subject, both before and after compaction), derived nodes (traverse provenance pointers; nodes exclusively derived from the deleted subject should be gone), deletion receipt (should exist as a signed block in the audit chain, independently verifiable), and Article 12 log (should contain a deletion block, should not contain the deleted personal data in recoverable form).
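
The layer-by-layer check can be automated against an exported snapshot of the system under test. The snapshot shape below is hypothetical (a dictionary with primary, ann_hits, derived, and audit_chain entries), as is the function name check_layers. The derived-node check is slightly stricter than the text: it assumes that retained nodes have had their provenance updated, so no survivor should still point at a deleted id.

```python
import json


def check_layers(snapshot: dict, subject_id: str,
                 deleted_ids: set, personal_values: list) -> dict:
    """Return {layer name: True if the layer passes the deletion check}."""
    chain = snapshot["audit_chain"]
    chain_text = json.dumps(chain)
    deletion_blocks = [
        block for block in chain
        if block.get("scope") == subject_id and "deleted_data_hash" in block
    ]
    return {
        # No primary record may remain for the subject.
        "primary_store": not any(
            row["subject"] == subject_id for row in snapshot["primary"]),
        # No ANN query result may be a deleted vector.
        "embedding_index": not any(
            hit in deleted_ids for hit in snapshot["ann_hits"]),
        # No surviving derived node may still point at deleted sources.
        "derived_nodes": not any(
            set(node["derived_from"]) & deleted_ids
            for node in snapshot["derived"]),
        # A deletion block for the subject must exist in the chain.
        "deletion_receipt": len(deletion_blocks) > 0,
        # The log must not contain the deleted values in readable form.
        "article_12_log": not any(
            value in chain_text for value in personal_values),
    }
```

Running the check once before and once after index compaction covers the "both before and after compaction" requirement. Verifying the signature on the deletion block is a separate step that needs the public key.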

Run the same test against the vendor’s system. If any layer fails the check, you have found the gap in a controlled environment at negligible cost. If all layers pass, you have a genuine deletion capability you can rely on in production.

Three months of testing one realistic deletion scenario per week, across different data shapes and provenance depths, is enough to be confident that the mechanism generalises. It is also enough to satisfy most DPA requests for evidence of your deletion testing programme.

For the design principles that underpin why governance by omission works the way it does, the sibling article “Memory that was never there” is the right starting point. For how the bitemporal structure underpins the as-of reconstructability that makes deletion verifiable across time, read “Bitemporal memory as the compliance backbone.” The EU AI Act logging obligations that constrain what the audit chain must contain are mapped in “EU AI Act Article 12: what high-risk systems must log.” For the signed provenance architecture that gives deletion events the same standing as any other factory action, read “Provenance, not recall, is the real differentiator.”

If you are evaluating the factory for a regulated use case and want the full architecture view with head-to-head comparisons against glue stacks, you can request the investor brief from the Substrate page.