Your AI agent just violated MiFID II. Here's why.

Author: Jonathan Conway
Timestamp: 23 March 2026
Classification: public

You ask a compliance agent on a trading desk a routine question: “What is this client’s risk classification and exposure limit?” The answer comes back: “Professional Investor, $50M exposure.”

The problem is that the client was reclassified to Retail three months ago. The limit is now $5M.

The agent didn’t hallucinate. It retrieved real data from real documents. The data was just no longer true. The trade went through at 10x the permitted exposure. The firm is facing a MiFID II Article 25 breach, a regulatory inquiry, and a fine.

This is the part that matters: a memory system can be technically impressive and still be unsafe for regulated work. This isn’t a hypothetical. It is the logical outcome of deploying AI agents with memory systems that weren’t designed for environments where wrong answers have legal consequences.

The problem with agent memory in regulated industries

AI agents are spreading rapidly through financial services, trading, and pharma. Compliance screening. Client advisory. Risk assessment. Drug interaction checking. Post-trade surveillance. These are valuable applications. They save time, reduce human error, and scale operations that would otherwise require armies of analysts.

But every one of these use cases asks the same hard question: can we prove the agent remembered the right thing at the right time? The agent’s memory must be accurate, auditable, and compliant. And right now, the agent memory systems that most teams are building on were designed for consumer chatbots, not regulated environments.

Here’s what goes wrong.

Problem 1: stale data that looks current

Every agent memory system stores facts. “Client X is a Professional Investor.” “Patient Y is taking 20mg Atorvastatin.” “Counterparty Z has a BBB credit rating.”

These facts change. Clients get reclassified. Patients change medications. Credit ratings get downgraded. The question is: does the memory system know the difference between what’s true now and what used to be true?

The failure mode is simple: old information can look more relevant than new information.

Most memory systems store facts as vectors in an embedding space. When you query “What is Client X’s risk classification?”, the system finds the most semantically similar text, and similarity carries no notion of time. The January document that says “Professional Investor” has months of context around it: onboarding emails, risk assessments, exposure reports. The March reclassification to “Retail” is a single paragraph. The old classification can score higher because it has richer context, even though it’s wrong.

[Figure: When Agent Memory Fails in Regulated Environments]

In a consumer chatbot, returning outdated preferences is annoying. In financial services, it’s a regulatory violation. In pharma, it could mean recommending a drug interaction that was flagged six months ago.

Problem 2: no audit trail

When a regulator asks “What did the agent know, and when did it know it?”, you need an answer. Not a vague one. A precise one, with timestamps, showing exactly which facts the agent had access to at the moment it made the decision.

Most agent memory systems can’t answer this question. They store the current state of memory. They don’t track how memory evolved over time. When a fact changes, the old version is overwritten or deleted. The history is gone.

In regulated work, the record has to survive the question, not just the system.

This is a hard requirement in financial services. SEC Rule 17a-4(f) requires broker-dealers that keep records electronically to preserve them for specified retention periods, either in a non-rewriteable, non-erasable (WORM) format or, since the 2022 amendments, in a system that maintains a complete time-stamped audit trail. MiFID II Article 16(6) requires investment firms to keep records of all services, activities, and transactions “sufficient to enable the competent authority to fulfil its supervisory tasks.” FINRA Rule 4511 requires firms to make and preserve books and records as required under FINRA rules.

An agent memory system without an immutable audit trail is a compliance liability.

Problem 3: data leaves your network

In pharma, patient medication histories are Protected Health Information under HIPAA. In financial services, client trading patterns and exposure data may constitute Material Non-Public Information (MNPI) under SEC regulations. In both cases, there are strict rules about where this data can go.

Most agent memory systems depend on external APIs for core operations. They send text to OpenAI or Cohere for embedding. They send conversations to GPT-4 or Claude for entity extraction. They store vectors in Pinecone or Qdrant. Every step involves data leaving your infrastructure and landing on someone else’s servers.

[Figure: Memory Architecture: Cloud Dependency vs Self-Contained]

For a consumer app, this is fine. For a bank handling MNPI, it’s a potential regulatory violation. For a pharma company processing patient data without a Business Associate Agreement (BAA) in place with every vendor in the chain, it’s a HIPAA breach.

Problem 4: “delete” doesn’t mean deleted

GDPR Article 17 gives individuals the right to erasure. When a patient or client says “forget me,” you need to prove that you did. Not “we called the delete API.” Prove it. In a way that would hold up to regulatory scrutiny.

Most memory systems offer an API-level delete. You call memory.delete(user_id) and the system removes the record from its primary store. But what about the embedding vectors derived from that data? The entity relationships extracted from it? The vector index entries? The WAL (Write-Ahead Log) that might replay the data on crash recovery?

One system (Letta) lets agents self-edit their own memory via tool calls, which means an agent can modify or delete memories that are relevant to a compliance investigation. From a regulatory standpoint, this undermines the entire memory store as a compliance record.

What a compliant agent memory system requires

So what does a memory system need before you can trust it in this kind of environment? In plain English, the system has to remember facts, remember its own history, keep data under your control, and prove what happened later.

After working through these problems with teams in financial services and pharma, we’ve identified seven requirements that a memory system must meet for regulated environments:

Bitemporal data model. Every fact tracks two timelines: when it became true in the real world, and when the system learned about it. Old facts are temporally invalidated, not deleted. Point-in-time reconstruction is always available.
WORM audit trail. Every observation, retrieval, deletion, and configuration change is recorded in a tamper-evident, append-only log with HMAC hash chains. SEC Rule 17a-4(f) accepts non-rewriteable, non-erasable storage as a way to meet its electronic record preservation requirement. No other agent memory system we evaluated offers this.
Air-gapped deployment. The entire system runs on your infrastructure with zero external API calls. Embeddings computed locally. No data leaves your network. Single-command deployment, not a five-service orchestration.
Verifiable deletion. When data is deleted, it’s gone. Byte-level zeroing of content. Removal from every index (vector, BM25, entity hash). Forced WAL checkpoint to prevent replay. Immutable audit log recording what was deleted and when, without preserving the deleted content.
Encryption at rest. AES-256-GCM with authenticated encryption. Not “the cloud provider handles it.” Application-level encryption where you control the key.
Explainable retrieval. When the agent retrieves context, the system can show the exact activation path: which facts were considered, how they were scored, why certain facts ranked higher than others. Regulators increasingly require explainability for AI-driven decisions.
Sub-50ms retrieval. Risk assessment, compliance screening, and trading workflows have hard latency requirements. Python-based memory systems with garbage collection pauses and external API round-trips can’t meet them consistently.

[Figure: Compliance Readiness: Agent Memory Systems]

How the governed memory engine addresses each requirement

We built the governed memory engine for environments where these requirements aren’t optional. Here’s how each one works.

Bitemporal data model. Every edge in the knowledge graph carries four timestamps: when the relationship became true, when it stopped being true, when the system recorded it, and when the system invalidated it. A fact is “currently valid” only when both end timestamps (when it stopped being true, and when the system invalidated it) are unset. Retrieval applies temporal filtering before ranking, so stale facts never enter the candidate pool, regardless of how good their embedding scores are. And the full history is preserved, so you can reconstruct exactly what the system believed at any point in time.
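To make the four-timestamp idea concrete, here is a minimal Python sketch. It is not the engine’s actual schema or API: the `Edge` class, its field names, the `retrieve` and `as_of` helpers, the dates, and the `score` values are all hypothetical, chosen only to show filtering before ranking and a point-in-time lookup.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Hypothetical bitemporal edge; field names are assumptions."""
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                       # became true in the real world
    recorded_at: datetime                      # system recorded it
    valid_to: Optional[datetime] = None        # stopped being true
    invalidated_at: Optional[datetime] = None  # system invalidated it
    score: float = 0.0                         # stand-in for an embedding score

    def is_current(self) -> bool:
        # Currently valid only when both end timestamps are unset.
        return self.valid_to is None and self.invalidated_at is None

    def believed_at(self, t: datetime) -> bool:
        # Did the system hold this fact as valid at system time t?
        if t < self.recorded_at:
            return False
        return self.invalidated_at is None or t < self.invalidated_at


def retrieve(edges, subject, predicate):
    # Temporal filter first, ranking second: stale edges never get scored.
    pool = [e for e in edges
            if (e.subject, e.predicate) == (subject, predicate) and e.is_current()]
    return sorted(pool, key=lambda e: e.score, reverse=True)


def as_of(edges, subject, predicate, t):
    # Reconstruct what the system believed at system time t.
    return [e for e in edges
            if (e.subject, e.predicate) == (subject, predicate) and e.believed_at(t)]


# Illustrative data: the stale edge has the better similarity score.
old = Edge("client-x", "classified_as", "Professional",
           valid_from=datetime(2025, 1, 10), recorded_at=datetime(2025, 1, 10),
           valid_to=datetime(2025, 12, 1), invalidated_at=datetime(2025, 12, 2),
           score=0.91)
new = Edge("client-x", "classified_as", "Retail",
           valid_from=datetime(2025, 12, 1), recorded_at=datetime(2025, 12, 2),
           score=0.62)
edges = [old, new]

current = retrieve(edges, "client-x", "classified_as")
before = as_of(edges, "client-x", "classified_as", datetime(2025, 11, 15))
print([e.obj for e in current], [e.obj for e in before])
```

In this sketch the reclassification took effect on 1 December but was recorded on 2 December, so an `as_of` query for midday on 1 December still returns the old edge. That gap between what was true and what the system knew is exactly what the two timelines are there to capture.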

WORM audit trail. Every event (observations, retrievals, deletions, auth events, config changes) is written to a Write-Once-Read-Many log with HMAC-SHA-256 hash chains. Each entry includes the hash of the previous entry, creating a tamper-evident chain. Segments are sealed with a Merkle root for cryptographic integrity verification. Configurable retention locks (e.g., 7 years for SEC compliance). Streaming to SIEM systems via syslog, webhooks, or S3-compatible storage.
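The hash-chain part of this can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the engine’s log format: the entry layout, the `append` and `verify` helpers, the all-zero genesis value, and the demo key are hypothetical. An in-memory list is also not write-once storage, and Merkle sealing and retention locks are not shown; the sketch demonstrates tamper evidence only.

```python
import copy
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _mac(key: bytes, prev_hash: str, event: dict) -> str:
    # HMAC-SHA-256 over the previous hash plus a canonical form of the event.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append(log: list, key: bytes, event: dict) -> None:
    # Each new entry commits to the hash of the entry before it.
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "event": event, "hash": _mac(key, prev_hash, event)})


def verify(log: list, key: bytes) -> bool:
    # Recompute every link; any edited, dropped, or reordered entry breaks the chain.
    prev_hash = GENESIS
    for entry in log:
        expected = _mac(key, prev_hash, entry["event"])
        if entry["prev"] != prev_hash or not hmac.compare_digest(entry["hash"], expected):
            return False
        prev_hash = entry["hash"]
    return True


key = b"demo-key-not-for-production"
log = []
append(log, key, {"type": "observation", "fact": "client-x reclassified"})
append(log, key, {"type": "retrieval", "query": "client-x classification"})
append(log, key, {"type": "deletion", "request": "erasure-0001"})

tampered = copy.deepcopy(log)
tampered[1]["event"]["query"] = "something else"  # edit history after the fact

print(verify(log, key), verify(tampered, key))
```

Using a keyed HMAC rather than a bare hash means someone who can rewrite the log file but does not hold the key cannot simply recompute the chain after editing an entry.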

Air-gapped deployment. Two processes: a Zig core and a Rust sidecar, communicating over a Unix domain socket. Embeddings run locally via ONNX Runtime (BGE-small-en-v1.5). No OpenAI API calls. No cloud vector database. No external dependencies in the data path. Deploy with docker compose up. The entire system fits in a ~15MB binary.

Verifiable deletion. The ForgetEntity API performs a 7-step cascade: remove from vector index, zero all edges, regenerate affected community summaries, zero text blob content at the byte level (not just deallocation), remove from all indexes, write an immutable audit log entry, and force a WAL checkpoint. Data cannot be recovered after deletion. The audit trail records that deletion occurred without preserving the deleted content.
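As a rough picture of what such a cascade involves, the sketch below walks the same seven steps over toy in-memory structures. Everything here is hypothetical: the `Store` class, the `forget_entity` and `summarize` helpers, and the `request_id` field are invented for illustration and are not the ForgetEntity API. Dropping tuples and overwriting a `bytearray` are Python stand-ins for zeroing on-disk data, and Python gives no guarantee that no other copy of the bytes exists.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy stand-in for a memory store; all structures are assumptions."""
    vectors: dict = field(default_factory=dict)        # entity -> embedding
    edges: list = field(default_factory=list)          # (src, predicate, dst)
    blobs: dict = field(default_factory=dict)          # entity -> bytearray
    keyword_index: dict = field(default_factory=dict)  # term -> set of entities
    community_of: dict = field(default_factory=dict)   # entity -> community
    summaries: dict = field(default_factory=dict)      # community -> summary text
    wal: list = field(default_factory=list)            # pending write-ahead entries
    audit: list = field(default_factory=list)          # append-only audit entries


def summarize(store: Store, community: str) -> str:
    members = sorted(e for e, c in store.community_of.items() if c == community)
    return "members: " + ", ".join(members)


def forget_entity(store: Store, entity: str, request_id: str, when: str) -> None:
    # 1. Remove the entity from the vector index.
    store.vectors.pop(entity, None)
    # 2. Drop every edge that touches the entity.
    store.edges[:] = [e for e in store.edges if entity not in (e[0], e[2])]
    # 3. Regenerate the summary of the community the entity belonged to.
    community = store.community_of.pop(entity, None)
    if community is not None:
        store.summaries[community] = summarize(store, community)
    # 4. Overwrite the blob bytes in place, then drop the reference.
    blob = store.blobs.pop(entity, None)
    if blob is not None:
        blob[:] = bytes(len(blob))
    # 5. Remove the entity from the remaining indexes.
    for entities in store.keyword_index.values():
        entities.discard(entity)
    # 6. Record that a deletion happened, without the deleted content.
    store.audit.append({"action": "forget", "request_id": request_id, "at": when})
    # 7. Simulate a checkpoint: nothing is left in the WAL to replay.
    store.wal.clear()


store = Store()
store.vectors = {"client-42": [0.1, 0.2], "client-7": [0.3, 0.4]}
store.edges = [("client-42", "holds", "acct-1"), ("client-7", "holds", "acct-2")]
store.blobs = {"client-42": bytearray(b"risk notes")}
store.keyword_index = {"retail": {"client-42", "client-7"}}
store.community_of = {"client-42": "desk-a", "client-7": "desk-a"}
store.summaries = {"desk-a": summarize(store, "desk-a")}
store.wal = [("put", "client-42")]

blob_ref = store.blobs["client-42"]  # keep a handle to check the bytes afterwards
forget_entity(store, "client-42", request_id="erasure-0001", when="2026-03-23T00:00:00Z")
print(store.audit, bytes(blob_ref))
```

The audit entry here refers to an opaque request identifier rather than the entity itself. Which identifier, if any, may be kept after an erasure request is a policy decision the sketch does not try to settle.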

Encryption at rest. AES-256-GCM authenticated encryption on every text blob. Random nonce per blob prevents pattern analysis. Key provided via environment variable, so your key management infrastructure controls access. FIPS 140-2 mode planned via OpenSSL provider integration.

Explainable retrieval. The governed memory engine uses spreading activation from the ACT-R cognitive architecture. Instead of returning “these 5 documents scored highest on cosine similarity” (a black box), it produces a traceable activation path: the query anchored to these entities, activation spread along these edges with these weights, these facts were boosted by recency, these were suppressed by lateral inhibition. Every step is auditable.
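A simplified way to picture a traceable activation path is the sketch below. It is a generic spreading-activation pass over a weighted graph, not the ACT-R equations and not the engine’s implementation: the `spread` function, the `decay` and `hops` parameters, the node names, and the edge weights are assumptions, and recency boosts and lateral inhibition are left out.

```python
from collections import defaultdict


def spread(graph, anchors, decay=0.5, hops=2):
    """Spread activation outward from anchor nodes, logging every transfer."""
    activation = defaultdict(float)
    trace = []
    frontier = {a: 1.0 for a in anchors}  # each anchor starts fully activated
    for a in anchors:
        activation[a] = 1.0
    for hop in range(1, hops + 1):
        next_frontier = defaultdict(float)
        for src, energy in frontier.items():
            for dst, weight in graph.get(src, []):
                passed = energy * weight * decay
                next_frontier[dst] += passed
                # The trace is the explanation: which edge carried how much.
                trace.append({"hop": hop, "from": src, "to": dst,
                              "weight": weight, "passed": passed})
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = dict(next_frontier)
    return dict(activation), trace


# Hypothetical graph: both drugs link to a shared interaction warning.
graph = {
    "drug_a": [("interaction_warning_a_b", 0.9), ("dosage_note_a", 0.4)],
    "drug_b": [("interaction_warning_a_b", 0.9)],
}

anchors = ["drug_a"]  # entities the query was anchored to
activation, trace = spread(graph, anchors)
ranked = sorted(((n, a) for n, a in activation.items() if n not in anchors),
                key=lambda pair: pair[1], reverse=True)

for step in trace:
    print(step)
print(ranked)
```

The ranking comes with the list of transfers that produced it, so a reviewer can see that the warning surfaced because of a specific edge from the queried drug, not because of an opaque similarity score.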

Sub-50ms retrieval. The Zig core uses arena-per-query allocation with zero garbage collection. Memory-mapped graph storage with compile-time alignment verification. P95 retrieval latency of 24ms at 10K nodes. P99 of 39ms. Python-based competitors measure retrieval in hundreds of milliseconds. For a trading desk compliance agent that needs to check risk limits before execution, this difference matters.

Real-world scenarios

Here is what those requirements look like when the agent is doing real work.

Trading desk: client risk reclassification. A client moves from Professional to Retail classification. The governed memory engine’s bitemporal model immediately invalidates the old classification edge and creates a new one. Any subsequent query returns only the current classification. The WORM audit trail records the change. If a regulator asks six months later what the agent knew when it approved a specific trade, the system can reconstruct its exact knowledge state at that timestamp.

Pharma: drug interaction update. A clinical trial reveals a new interaction between Drug A and Drug B. The observation is ingested, and the spreading activation retrieval ensures that any query about either drug surfaces the interaction warning, even if the query text doesn’t mention the interaction directly. The graph structure (Drug A -> interacts_with -> Drug B) carries the signal. Pure vector search can miss this when the query text is not semantically close to the interaction record.

Wealth management: GDPR right-to-erasure. A client requests that all their data be deleted. The ForgetEntity API cascades through every node, edge, embedding vector, and index entry associated with that client. Content is zeroed at the byte level. The WAL is checkpointed to prevent replay. An immutable audit log entry records the deletion. When the data protection authority (DPA) asks for proof of deletion, you have a cryptographically verifiable record.

Post-trade surveillance: regulatory inquiry. A regulator asks: “Show us every piece of information your compliance agent had access to when it approved trade #4847 on March 15th.” The governed memory engine’s bitemporal queries reconstruct the exact graph state at that timestamp. The WORM audit log shows the retrieval that was performed, the activation scores, and the context that was returned to the agent. This level of reconstruction is impossible with memory systems that only store current state.

The competitive reality

The benchmark is not whether a memory system works in a demo. It is whether it can survive regulated scrutiny. We’ve evaluated every agent memory system on the market against the seven requirements above. No other system meets all of them.

Mem0 is the fastest to deploy and has the simplest API. But it lacks bitemporal auditing, WORM trails, and explainable retrieval. Air-gapped deployment is limited to an enterprise tier. For consumer applications, Mem0 is a strong choice. For regulated environments, it has structural gaps.

Zep/Graphiti has a good temporal knowledge graph model. But Zep deprecated their self-hosted Community Edition, and the full feature set is cloud-only. Financial institutions that require on-premises deployment cannot use Zep’s full capabilities. This is a disqualifying constraint for most banks and trading firms.

Letta has an interesting agent architecture, but agents can self-edit their own memory. From a compliance perspective, this means the memory store is not a reliable record of what the agent knew. An agent could modify or delete memories relevant to a regulatory investigation.

Cognee is air-gap capable and has encryption, but lacks bitemporal queries, WORM audit trails, and published performance numbers. Deployment requires orchestrating three separate databases (Kuzu, LanceDB, PostgreSQL).

LangMem has no temporal model, no encryption documentation, and no multi-tenant isolation.

The bottom line

AI agents in regulated industries need memory systems designed for those environments from day one. Not consumer memory systems with compliance features bolted on after the fact.

The cost of getting this wrong isn’t a degraded user experience. It’s regulatory fines, legal liability, and reputational damage. The cost of getting it right is quieter, but much more important: a memory system that can answer “what did the agent know, and when did it know it?” with cryptographic certainty.

That’s what we built. If your team is deploying AI agents in financial services, trading, or pharma and you’re evaluating memory infrastructure, we’d welcome the conversation.

The governed memory engine is open-source under the AGPL license with SDKs in Python, TypeScript, Rust, and Elixir. The project is currently in limited release while we incorporate feedback from early customers. If you’re interested in evaluating it for your environment, get in touch.