How AI Agents Remember: A Comparison of Memory Architectures
Synthesized from 8 sources across this wiki. This analysis compares the four dominant memory architecture patterns, maps them against formal requirements from cognitive science, and provides a practical decision framework for choosing the right approach.
The Core Problem
LLMs are stateless. Every session starts blank. Without persistent memory, agents rediscover known information, repeat mistakes, and can’t build on prior work. But persistent memory introduces its own risks: stale facts, compounding errors, and growing retrieval noise. This is the fundamental tension that the cross-source-themes page identified as “the unsolved frontier.”
Four sources ingested in April 2026 have moved this from “unsolved” to “understood with clear tradeoffs.”
Why RAG Is Not Enough
Standard RAG treats memory as a stateless lookup table (continuum-memory-architectures). Measured against six requirements for continuous memory, it only partially meets one and misses the other five:
| Requirement | What It Means | RAG Support |
|---|---|---|
| Persistence | State survives across sessions | ✅ Partial (stores persist, but no identity continuity) |
| Selective Retention | Old/irrelevant memories fade | ❌ Everything persists equally |
| Retrieval-Driven Mutation | Lookups change future accessibility | ❌ Read-only retrieval |
| Associative Routing | Multi-hop traversal across entities | ❌ Embedding distance only |
| Temporal Continuity | “What happened around X?” | ❌ Time is metadata, not structure |
| Consolidation | Episodes compress into knowledge | ❌ No abstraction mechanism |
CMA won 82 of 92 decisive trials (89%) against a RAG baseline. The takeaway: RAG is a starting point, not a destination.
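Of the six requirements, retrieval-driven mutation is the easiest to overlook: in standard RAG a lookup never changes the store. The sketch below illustrates the idea under stated assumptions and is not CMA's mechanism. Each memory carries a hypothetical `activation` field that every retrieval raises, and word overlap stands in for embedding similarity.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    activation: float = 0.0  # hypothetical accessibility score, raised on every retrieval


def overlap(query: str, text: str) -> int:
    """Toy relevance: count of shared lowercase words (stand-in for embedding similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(store: list[Memory], query: str, k: int = 1, boost: float = 0.5) -> list[Memory]:
    """Rank by relevance plus activation, then mutate the store: boost whatever was returned."""
    ranked = sorted(store, key=lambda m: overlap(query, m.text) + m.activation, reverse=True)
    hits = ranked[:k]
    for m in hits:
        m.activation += boost  # this read changes future accessibility
    return hits


store = [Memory("project uses Redis cache"), Memory("project uses PostgreSQL")]
first = retrieve(store, "postgresql")  # only the PostgreSQL entry matches
second = retrieve(store, "project")    # both match equally; activation breaks the tie
```

Without the boost, the Redis entry would win the second query because `sorted` is stable and Redis comes first in the store; with it, the earlier lookup decides the tie.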
Four Failure Modes of Flat Vector Storage
Before choosing an architecture, understand what goes wrong with the simplest approach (efficient-memory-architectures):
Context Poisoning — Agent stores its own hallucinations, retrieves them later, compounds errors in a feedback loop. The most dangerous failure mode for autonomous agents.
Context Distraction — Vector DB returns top-10 semantically similar entries, but semantic similarity ≠ relevance. Critical information buried under noise. LLM attention diluted.
Context Clash — Contradictory facts loaded simultaneously (old address + new address). Agent guesses which is current. Often guesses wrong.
Work Duplication — In multi-agent systems without shared memory, agents duplicate each other’s work. Computational waste multiplies, state diverges.
The Four Architecture Patterns
Pattern 1: Vector-Only
Store text embeddings in a vector database. Search by cosine similarity.
- Tools: Pinecone, Chroma, Weaviate, Qdrant
- Strengths: Fast (p95 < 50ms), mature tooling, easy to start
- Weaknesses: No relationships, no time awareness, no contradiction detection
- Cost: ~$0.10-0.50/GB/month. Free tiers available.
- Best for: RAG apps, document Q&A, prototyping, tight budgets
- CMA compliance: 1/6 (persistence only)
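A minimal sketch of the pattern, assuming word-count vectors in place of a real embedding model and a Python list in place of a vector database (`VectorMemory`, `embed`, and `search` are hypothetical names, not any vendor's API):

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorMemory:
    """Flat, vector-only memory: append (text, vector) pairs, rank by cosine similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.items]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


mem = VectorMemory()
mem.add("user prefers python for scripting")
mem.add("user lives in berlin")
mem.add("user moved to munich")  # newer fact, but nothing marks it as superseding Berlin
results = mem.search("which city does the user live in", k=2)
```

The stale Berlin fact outranks the newer Munich one: flat similarity has no notion of time or supersession, which is the Context Clash failure mode in miniature.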
Pattern 2: Graph + Vector (Mem0 Style)
Combine graph databases with vector embeddings. Entities as nodes, relationships as edges. Semantic search + graph traversal.
- Tools: mem0, Neo4j + embeddings
- Strengths: Understands relationships, multi-hop reasoning, temporal awareness, contextual retrieval
- Weaknesses: More complex setup, higher learning curve, graph ETL maintenance (budget 20-30% of engineering time)
- Cost: Similar to vector-only for storage; mem0 abstracts away the infrastructure.
- Best for: Autonomous agents, customer service, research assistants, long-term autonomy
- CMA compliance: 4/6 (persistence, selective retention, associative routing, partial temporal)
Benchmark (LOCOMO): Mem0g achieves 68.4% accuracy at 1,800 tokens vs full-context 72.9% at 26,000 tokens. That’s 93% fewer tokens for a 4.5-point accuracy tradeoff.
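The multi-hop query that flat vectors cannot answer ("who worked with whom on project X?") becomes a pair of edge traversals in a graph. A toy sketch, assuming facts have already been extracted as (subject, relation, object) triples; `EntityGraph` is a hypothetical class, not mem0's or Neo4j's API, and in a Graph + Vector system a semantic search would typically supply the starting entities instead of the hard-coded names used here:

```python
class EntityGraph:
    """Toy entity graph built from (subject, relation, object) triples."""

    def __init__(self) -> None:
        self.out_edges: dict[str, set[tuple[str, str]]] = {}
        self.in_edges: dict[str, set[tuple[str, str]]] = {}

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.out_edges.setdefault(subject, set()).add((relation, obj))
        self.in_edges.setdefault(obj, set()).add((relation, subject))

    def collaborators(self, person: str, project: str) -> set[str]:
        """Two hops: person -worked_on-> project <-worked_on- other people."""
        if ("worked_on", project) not in self.out_edges.get(person, set()):
            return set()
        return {s for rel, s in self.in_edges.get(project, set())
                if rel == "worked_on" and s != person}

    def targets(self, sources: set[str], relation: str) -> set[str]:
        """One more hop: follow `relation` edges out of every node in `sources`."""
        return {obj for s in sources for rel, obj in self.out_edges.get(s, set())
                if rel == relation}


g = EntityGraph()
g.add_fact("alice", "worked_on", "project_x")
g.add_fact("bob", "worked_on", "project_x")
g.add_fact("carol", "worked_on", "project_y")
g.add_fact("bob", "reports_to", "dana")

peers = g.collaborators("alice", "project_x")  # who worked with Alice on project X?
managers = g.targets(peers, "reports_to")      # and who do they report to? (third hop)
```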
Pattern 3: File + Database Hybrid
Markdown files in directories with a lightweight index (SQLite, frontmatter). Human-readable, git-compatible.
- Tools: Markdown + SQLite, Obsidian, this wiki (llm-wiki-pattern)
- Strengths: Human-readable, git version control, easy to debug, portable, transparent
- Weaknesses: Manual schema management, no built-in semantic search, harder to scale past ~200 files
- Cost: Essentially free (filesystem + optional SQLite)
- Best for: Solo developers, small teams, personal knowledge management, situations where debuggability matters more than speed
- CMA compliance: 2/6 (persistence, partial consolidation via manual curation)
Key insight: This wiki itself is a File + Database Hybrid memory system. The llm-wiki-pattern is a legitimate memory architecture — not just a note-taking approach.
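A sketch of the hybrid under simple assumptions: each page is a markdown file with a `---` frontmatter block holding hypothetical `title` and `tags` fields, and a SQLite table is rebuilt from the files on demand. The files stay the source of truth and the index is disposable. The frontmatter parser handles only flat `key: value` lines, not full YAML.

```python
import sqlite3
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat 'key: value' lines between leading '---' fences (not a YAML parser)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_index(root: Path) -> sqlite3.Connection:
    """Index every markdown file under root into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (path TEXT PRIMARY KEY, title TEXT, tags TEXT)")
    for path in sorted(root.rglob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        db.execute(
            "INSERT INTO pages VALUES (?, ?, ?)",
            (str(path.relative_to(root)), meta.get("title", path.stem), meta.get("tags", "")),
        )
    db.commit()
    return db


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sources").mkdir()
    (root / "sources" / "mem0.md").write_text(
        "---\ntitle: mem0 memory management\ntags: memory, graph\n---\nBody text.\n",
        encoding="utf-8",
    )
    (root / "scion.md").write_text("---\ntitle: scion\ntags: agents\n---\n", encoding="utf-8")
    index = build_index(root)
    hits = [row[0] for row in index.execute(
        "SELECT title FROM pages WHERE tags LIKE ? ORDER BY title", ("%memory%",)
    )]
```

Because the index is derived from the files, deleting it loses nothing, and `git diff` on the markdown files remains the audit trail.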
Pattern 4: Hierarchical Memory
Multi-tier memory inspired by cognitive science. Information flows between layers based on importance and access patterns.
- Implementations:
  - H-MEM: Domain → Category → Memory Trace → Episode. Index-based routing eliminates irrelevant branches early.
  - MemGPT: OS-inspired paging. Small Core Memory (always in context) + massive External Context (archival). Token savings exceeding 90%.
  - pai: Three-tier hot/warm/cold with continuous signal capture and self-modification.
- Strengths: Mimics human cognition, efficient resource use, good for long-lived agents
- Weaknesses: Complex to implement, needs tuning for promotion/demotion rules, overkill for simple agents
- Cost: Infrastructure varies. MemGPT reduces token costs 90% (10,000 → 1,000 tokens).
- Best for: Enterprise agents, agents running days/weeks, multi-tenant SaaS
- CMA compliance: 5/6 (all except full retrieval-driven mutation in most implementations)
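The paging idea behind MemGPT's Core Memory / External Context split can be sketched as two tiers: a bounded core that is always rendered into the prompt and an unbounded archive. The word-count budget (a stand-in for tokens) and least-recently-used eviction are assumptions for illustration, not MemGPT's actual policy, and `TieredMemory` is a hypothetical class.

```python
from collections import OrderedDict
from typing import Optional


class TieredMemory:
    """Bounded core memory (always in the prompt) paged against an unbounded archive."""

    def __init__(self, core_budget: int) -> None:
        self.core_budget = core_budget  # budget in words, a rough stand-in for tokens
        self.core: OrderedDict[str, str] = OrderedDict()  # least recently used first
        self.archive: dict[str, str] = {}

    def _core_size(self) -> int:
        return sum(len(text.split()) for text in self.core.values())

    def _evict(self) -> None:
        # Page least-recently-used entries out until the core fits; always keep the newest one.
        while self._core_size() > self.core_budget and len(self.core) > 1:
            key, text = self.core.popitem(last=False)
            self.archive[key] = text

    def write(self, key: str, text: str) -> None:
        self.core[key] = text
        self.core.move_to_end(key)
        self._evict()

    def recall(self, key: str) -> Optional[str]:
        if key in self.core:
            self.core.move_to_end(key)  # refresh recency
            return self.core[key]
        if key in self.archive:
            self.write(key, self.archive.pop(key))  # page back into core
            return self.core[key]
        return None

    def context(self) -> str:
        """What would be rendered into the prompt on the next turn."""
        return "\n".join(self.core.values())


mem = TieredMemory(core_budget=8)
mem.write("persona", "agent is a coding assistant")  # 5 words
mem.write("task", "fix failing test in parser")      # 5 more words: persona is paged out
restored = mem.recall("persona")                      # pages persona back in, evicting task
```

The promotion/demotion rule is the part that needs tuning in practice; swapping LRU for an importance score changes only `_evict`.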
Decision Matrix
| Factor | Vector-Only | Graph+Vector | File+DB | Hierarchical |
|---|---|---|---|---|
| Semantic Search | ★★★ | ★★★ | ★★ | ★★ |
| Relationship Queries | ★ | ★★★ | ★★ | ★★ |
| Debuggability | ★★ | ★★ | ★★★ | ★★ |
| Setup Complexity | Low | Medium | Low | High |
| Scalability | ★★★ | ★★ | ★★ | ★★ |
| Token Efficiency | ★★ | ★★★ | ★★★ | ★★★ |
| CMA Compliance | 1/6 | 4/6 | 2/6 | 5/6 |
| Monthly Cost | $0.10-0.50/GB | Similar to Vector-Only | ~Free | Variable |
When to Use What
Simple Q&A / RAG app? → Vector-Only
Autonomous agent, long-running? → Graph + Vector (Mem0)
Solo dev, want transparency? → File + Database (wiki pattern)
Enterprise, multi-tenant? → Hierarchical (MemGPT / H-MEM)
Progression Path
Start simple, add complexity when you hit real problems:
- Start: Vector-Only (prototype, validate the use case)
- Scale: Add Graph when you need relationships (“who worked with whom on project X?”)
- Optimize: Add Hierarchical layers when token costs or retrieval noise become problems
- Maintain: Add Forgetting when the store grows past useful size
The Memory Types That Matter
All four sources converge on the same cognitive science mapping:
| Memory Type | What It Stores | Persistence | Implementation |
|---|---|---|---|
| Working | Current context window | Session only | LLM context window |
| Episodic | Specific past events | Medium-term, decays | Vector DB with timestamps |
| Semantic | Persistent facts | Long-term | Knowledge graph or vector DB |
| Procedural | Learned skills/workflows | Long-term | Code, PDDL, Pydantic schemas |
Critical insight from mem0-memory-management: Treating all memory identically is the root cause of most production failures. Episodic memories decay faster than semantic ones. Procedural memory is underused but disproportionately valuable.
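One way to stop treating memory identically is to give each type its own decay rate. A sketch assuming simple exponential decay, where strength halves every half-life; the half-life values are placeholders, since the sources state only that episodic memories decay faster than semantic ones.

```python
from dataclasses import dataclass

# Placeholder half-lives in days: assumptions to tune per deployment, not published values.
HALF_LIFE_DAYS = {"episodic": 7.0, "semantic": 180.0, "procedural": 365.0}


@dataclass
class TypedMemory:
    kind: str  # "episodic", "semantic", or "procedural"; working memory is the context window itself
    text: str
    age_days: float


def strength(m: TypedMemory) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life of this memory's type."""
    return 0.5 ** (m.age_days / HALF_LIFE_DAYS[m.kind])


event = TypedMemory("episodic", "user asked about a pagination bug", age_days=14)
fact = TypedMemory("semantic", "project uses PostgreSQL", age_days=14)
```

After two weeks the episodic event has dropped to a quarter of its original strength while the semantic fact is still above 0.9.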
Forgetting: The Counterintuitive Requirement
Every source agrees: a memory system that never forgets eventually fails.
- Bjork’s Theory of Disuse: Forgetting is active and adaptive — it protects retrieval quality (mem0-memory-management)
- Ebbinghaus Curves: Steep initial decay, reduced rate for reinforced memories (efficient-memory-architectures)
- RIF Scoring: RIF = α×Recency + β×Relevance + γ×Utility — tunable per domain (efficient-memory-architectures)
- CMA Selective Retention: Memories compete for accessibility based on recency, usage, salience (continuum-memory-architectures)
- Production results: Aggressive forgetting reduces vector DB size 40-60% after 30 days
Domain caveat: Healthcare, financial, and legal domains may legally require perfect recall. Use tiered archival storage instead of deletion.
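A sketch of RIF scoring combined with the domain caveat: low scorers are moved to an archive list rather than deleted. The weights, the 30-day half-life used to turn age into a Recency value, and the 0.3 threshold are all hypothetical defaults to tune per domain, and Relevance and Utility are assumed to arrive already normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class ScoredMemory:
    text: str
    age_days: float
    relevance: float  # assumed in [0, 1], e.g. similarity to the agent's current goals
    utility: float    # assumed in [0, 1], e.g. share of past retrievals that proved useful


def rif(m: ScoredMemory, alpha: float = 0.4, beta: float = 0.4, gamma: float = 0.2,
        half_life_days: float = 30.0) -> float:
    """RIF = alpha*Recency + beta*Relevance + gamma*Utility, with Recency as exponential decay."""
    recency = 0.5 ** (m.age_days / half_life_days)
    return alpha * recency + beta * m.relevance + gamma * m.utility


def prune(store: list[ScoredMemory],
          threshold: float = 0.3) -> tuple[list[ScoredMemory], list[ScoredMemory]]:
    """Split the store into (kept, archived); nothing is deleted outright."""
    kept: list[ScoredMemory] = []
    archived: list[ScoredMemory] = []
    for m in store:
        (kept if rif(m) >= threshold else archived).append(m)
    return kept, archived


store = [
    ScoredMemory("user prefers python", age_days=0, relevance=0.9, utility=0.8),
    ScoredMemory("asked about the weather once", age_days=90, relevance=0.1, utility=0.0),
]
kept, archived = prune(store)
```

The first entry scores 0.92 and stays; the second scores 0.09 (Recency has halved three times) and moves to the archive, where a regulated deployment could still retrieve it.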
How Existing Wiki Tools Handle Memory
The wiki’s original sources already documented different memory approaches. The new sources provide the theoretical framework to evaluate them:
| Tool | Pattern | CMA Score | Strength | Weakness |
|---|---|---|---|---|
| scion | None (fresh per agent) | 0/6 | Clean, no stale data | Re-discovers everything |
| claude-code | File (CLAUDE.md) + auto | 2/6 | Simple, human-editable | No relationships, no forgetting |
| kiro | Persistent + learning | 3/6 | Compounds across sessions | Risk of stale memories |
| pai | Hierarchical (hot/warm/cold) | 5/6 | Most sophisticated | High setup cost |
| mem0 | Graph + Vector | 4/6 | Best benchmarked | Graph maintenance overhead |
| llm-wiki-pattern | File + Database | 2/6 | Transparent, git-friendly | Manual curation required |
The Extraction Problem: What to Remember
mem0-memory-management provides the clearest answer with its two-phase pipeline:
Phase 1 — Extract: Not every message is memory-worthy. Run an LLM pass to identify discrete, durable facts. Not summaries. Not compressed conversation. Specific facts: “user prefers Python,” “project uses PostgreSQL.”
Phase 2 — Update: Before writing, check against existing store:
- ADD — new fact
- UPDATE — supersedes existing (user changed jobs)
- DELETE — existing fact no longer true
- NOOP — duplicate, skip
This resolves contradictions at write time, not query time. The store stays coherent as it grows. This directly addresses the wiki’s core tension: “persistent context compounds value AND errors.”
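The update phase can be sketched as a small decision function. This is a deliberate simplification: an extracted fact is assumed to arrive as a hypothetical (subject, attribute, value) triple, "same fact" means the same (subject, attribute) key, and a value of `None` marks a fact as no longer true. A production system would likely need a fuzzier match between new and existing memories than an exact key lookup.

```python
from enum import Enum
from typing import Optional


class Op(Enum):
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


Store = dict[tuple[str, str], str]  # hypothetical layout: (subject, attribute) -> value


def decide(store: Store, subject: str, attribute: str, value: Optional[str]) -> Op:
    """Choose the write operation for one extracted fact; value=None means 'no longer true'."""
    key = (subject, attribute)
    if key not in store:
        return Op.NOOP if value is None else Op.ADD
    if value is None:
        return Op.DELETE
    return Op.NOOP if store[key] == value else Op.UPDATE


def write_fact(store: Store, subject: str, attribute: str, value: Optional[str]) -> Op:
    """Resolve the fact against the store at write time, then apply the chosen operation."""
    op = decide(store, subject, attribute, value)
    key = (subject, attribute)
    if op in (Op.ADD, Op.UPDATE) and value is not None:
        store[key] = value
    elif op is Op.DELETE:
        del store[key]
    return op


store: Store = {}
ops = [
    write_fact(store, "user", "language", "Python"),         # ADD: new fact
    write_fact(store, "user", "language", "Python"),         # NOOP: duplicate
    write_fact(store, "user", "employer", "Acme"),           # ADD
    write_fact(store, "user", "employer", "Globex"),         # UPDATE: user changed jobs
    write_fact(store, "project", "database", "PostgreSQL"),  # ADD
    write_fact(store, "project", "database", None),          # DELETE: no longer true
]
```

After these writes the store holds exactly one value per key, so a later query never sees "Acme" and "Globex" side by side.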
Benchmark Summary
| System | Accuracy | Latency (median) | Tokens | Approach |
|---|---|---|---|---|
| Full-context | 72.9% | 9.87s | ~26,000 | Send everything |
| CMA | 89% win rate vs. RAG (not a LOCOMO score) | 1.48s | — | Graph + temporal + mutation |
| Mem0g | 68.4% | 1.18s | ~1,800 | Graph + vector |
| A-Mem | 68.6% | — | — | Zettelkasten-inspired |
| Mem0 | 66.9% | 0.71s | ~1,800 | Vector-only selective |
| OpenAI Memory | 52.9% | — | — | Built-in |
| LangMem | 50.9% | — | — | — |
| MemoryBank | 31.3% | — | — | — |
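The core tradeoff: full-context is the most accurate LOCOMO entry but uses about 14× more tokens (26,000 vs ~1,800) and is roughly 7× to 14× slower at the median (9.87s vs 1.48s for CMA, 1.18s for Mem0g, and 0.71s for Mem0). Selective retrieval gives up 4.5 to 6 accuracy points (Mem0g, Mem0) in exchange for a 93% token reduction. For any interactive, real-time agent, selective retrieval is the production-viable path.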
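The ratios can be recomputed from the table. This snippet only repeats that arithmetic for the two LOCOMO systems that report accuracy, latency, and token counts.

```python
# Figures copied from the Benchmark Summary table (LOCOMO accuracy %, median latency, tokens).
full_context = {"accuracy": 72.9, "latency_s": 9.87, "tokens": 26_000}
selective = {
    "Mem0g": {"accuracy": 68.4, "latency_s": 1.18, "tokens": 1_800},
    "Mem0": {"accuracy": 66.9, "latency_s": 0.71, "tokens": 1_800},
}

summary = {}
for name, s in selective.items():
    summary[name] = {
        "token_reduction": 1 - s["tokens"] / full_context["tokens"],  # share of tokens saved
        "token_ratio": full_context["tokens"] / s["tokens"],          # times more tokens for full-context
        "speedup": full_context["latency_s"] / s["latency_s"],        # median latency ratio
        "accuracy_drop": full_context["accuracy"] - s["accuracy"],    # percentage points
    }

for name, r in summary.items():
    print(f"{name}: {r['token_reduction']:.0%} fewer tokens ({r['token_ratio']:.1f}x), "
          f"{r['speedup']:.1f}x faster, -{r['accuracy_drop']:.1f} points")
```

For Mem0g this gives a 93% token reduction, about 8.4× lower median latency, and a 4.5-point accuracy drop.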
Recommendations
For most developers starting out: Use Vector-Only (Pinecone/Chroma). It’s the fastest path to a working memory system. Switch to Graph+Vector when you hit relationship queries or contradiction problems.
For production autonomous agents: Graph + Vector (mem0) is the current best balance. Benchmarked, production-tested, handles relationships and forgetting.
For personal knowledge management: File + Database (the llm-wiki-pattern). Human-readable, git-friendly, transparent. This wiki shows the pattern working at 33+ sources.
For enterprise/long-running agents: Hierarchical (MemGPT or pai-style). The setup cost is high but the token savings and cognitive fidelity pay back at scale.
For everyone: Implement forgetting. A memory system that never prunes will eventually drown in noise. Start with conservative decay rates and tune based on retrieval quality metrics.
Open Questions
- Can CMA’s retrieval-driven mutation be implemented without unacceptable latency at scale?
- How should multi-agent systems share memory without cascading errors? (CRDTs? Event sourcing?)
- What’s the right forgetting rate for different domains?
- Can the File + Database pattern (this wiki) be enhanced with semantic search while keeping its transparency?
- How do you audit an evolving memory graph for compliance?
Analysis based on 8 sources ingested into this wiki between 2026-04-07 and 2026-04-14. Represents the state of agent memory architectures as of April 2026.