Agent Memory & Persistence

How LLM agents retain, retrieve, and manage knowledge across sessions. cross-source-themes calls this “the unsolved frontier”: it is the area where the tools in this wiki diverge most.

The Core Problem

LLMs are stateless by design: the context window resets when a session ends. Without persistent memory, an agent starts fresh every time, repeating the same mistakes and asking the same questions. Persistent memory fixes that, but it compounds errors as well as value. That is the fundamental tension.

Memory Architecture Patterns

Four patterns have emerged (per agent-memory-systems-2026):

Pattern	Best For	Key Tradeoff
Vector-Only	RAG apps, prototyping	Fast but no relationships
Graph + Vector (mem0)	Autonomous agents	Relationships + semantic search, more complex
File + Database (llm-wiki-pattern)	Solo devs, small teams	Human-readable, git-friendly, no semantic search
Hierarchical	Enterprise agents	Mimics human cognition, complex to tune

Memory Types (Cognitive Science)

From mem0-memory-management and efficient-memory-architectures, mapped from psychology:

Working memory — active context window. Fast, limited, temporary.
Episodic memory — specific past experiences. Decays in relevance.
Semantic memory — persistent facts. Knowledge graphs or vector DBs.
Procedural memory — learned skills and workflows. Underused, high-value.

Critical insight: treating all memory identically is the root cause of most production failures.
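
One way to avoid treating all memory identically is to tag each item with its type and give each type its own retention rule. The sketch below is illustrative only: the `MemoryType`, `MemoryItem`, and `is_expired` names and the retention windows are assumptions, not values taken from mem0-memory-management or efficient-memory-architectures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    WORKING = "working"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


# Hypothetical retention windows in days; None means "keep until revised".
RETENTION_DAYS: dict = {
    MemoryType.WORKING: 0,        # temporary: gone once it has any age
    MemoryType.EPISODIC: 30,      # relevance decays, so expire old episodes
    MemoryType.SEMANTIC: None,    # persistent facts
    MemoryType.PROCEDURAL: None,  # learned skills and workflows
}


@dataclass
class MemoryItem:
    kind: MemoryType
    content: str
    age_days: float


def is_expired(item: MemoryItem) -> bool:
    """Apply the retention rule for this item's memory type."""
    limit: Optional[int] = RETENTION_DAYS[item.kind]
    return limit is not None and item.age_days > limit
```

With these assumed windows, a 45-day-old episodic item is expired while a 400-day-old semantic fact is kept, so pruning logic never needs to special-case individual records.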

Four Memory Layers (mem0)

Conversation → “what’s happening now?”
Session → “what’s this task’s context?”
User → “what do I know about this person?”
Organizational → “what’s universally true?”
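
The four layers can be pictured as nested scopes that are searched from narrowest to broadest. The sketch below uses Python's standard `collections.ChainMap` to show that idea; it is not mem0's API, and both the keys and the "narrowest layer wins" precedence are assumptions made for illustration.

```python
from collections import ChainMap

# Hypothetical contents for each layer.
organizational = {"date_format": "ISO 8601", "language": "en"}
user = {"language": "de"}
session = {"task": "refactor billing module"}
conversation = {"last_file": "invoice.py"}

# ChainMap searches its maps left to right, so the narrowest layer
# that defines a key shadows the broader ones.
memory = ChainMap(conversation, session, user, organizational)

language = memory["language"]        # resolved from the user layer
date_format = memory["date_format"]  # falls through to the organizational layer
```

Here the user-level preference overrides the organizational default, while keys that only exist at the organizational level are still reachable.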

CMA: Formal Requirements (continuum-memory-architectures)

Six necessary conditions for real agent memory (standard RAG meets none):

Persistence across sessions
Selective retention (forgetting curves)
Retrieval-driven mutation (lookups alter future accessibility)
Associative routing (graph traversal, multi-hop)
Temporal continuity (episode ordering)
Consolidation and abstraction (gist extraction)

CMA won 82 of 92 decisive trials (about 89%) against a RAG baseline.
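
Two of the conditions above, selective retention and retrieval-driven mutation, are easy to show in a toy store. The sketch below is an assumption-laden illustration, not the CMA implementation: the `MutableStore` class and its `decay`, `boost`, and `floor` parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    text: str
    strength: float = 1.0


class MutableStore:
    """Toy store in which every lookup changes future accessibility."""

    def __init__(self, decay: float = 0.9, boost: float = 0.5, floor: float = 0.5):
        self.traces = {}
        self.decay, self.boost, self.floor = decay, boost, floor

    def write(self, key: str, text: str) -> None:
        self.traces[key] = Trace(text)

    def recall(self, key: str) -> str:
        trace = self.traces[key]
        trace.strength += self.boost  # retrieval-driven mutation
        return trace.text

    def end_session(self) -> None:
        # Selective retention: decay every trace, then forget the weak ones.
        for trace in self.traces.values():
            trace.strength *= self.decay
        self.traces = {
            key: trace
            for key, trace in self.traces.items()
            if trace.strength >= self.floor
        }
```

A trace that is recalled every session keeps gaining strength, while one that is never recalled decays geometrically and is eventually dropped. A plain RAG index, by contrast, returns the same ranking no matter how often an entry was used.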

Failure Modes of Flat Vector Storage

From efficient-memory-architectures:

Context Poisoning — agent retrieves own hallucinations, compounds errors
Context Distraction — semantic similarity ≠ relevance
Context Clash — contradictory info loaded simultaneously
Work Duplication — multi-agent systems without shared memory
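
Two of these failure modes can be guarded against with metadata that a flat vector store usually lacks. The sketch below assumes each stored fact carries a provenance label and a verification flag; the `Fact` fields, the source labels, and both helper functions are hypothetical and not taken from efficient-memory-architectures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    source: str            # assumed labels: "user", "tool", or "agent"
    verified: bool = False


def admissible(facts):
    """Guard against context poisoning: drop unverified agent-generated facts."""
    return [f for f in facts if f.source != "agent" or f.verified]


def clashes(facts):
    """Detect context clash: keys that carry more than one distinct value."""
    values_by_key = {}
    for fact in facts:
        values_by_key.setdefault(fact.key, set()).add(fact.value)
    return sorted(key for key, values in values_by_key.items() if len(values) > 1)
```

Running `admissible` before loading retrieved facts keeps the agent from re-ingesting its own unverified output, and `clashes` flags contradictory entries so they can be resolved before they reach the context window.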

Forgetting as Design Requirement

Both mem0-memory-management and continuum-memory-architectures emphasize active forgetting:

Bjork’s “New Theory of Disuse”: forgetting protects retrieval quality
RIF scoring: Recency × Relevance × Frequency
Ebbinghaus curves: steep initial decay, reinforced memories persist
Production: 40-60% DB size reduction after 30 days of pruning
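
The RIF product and an Ebbinghaus-style curve can be combined into a pruning rule. The sketch below is one possible reading, not the formula used by mem0-memory-management or continuum-memory-architectures: the sources give only Recency × Relevance × Frequency, so the exponential recency term, the logarithmic frequency term, the 7-day stability constant, and the threshold are all assumptions.

```python
import math


def retention(age_days: float, stability_days: float) -> float:
    """Ebbinghaus-style forgetting curve R = exp(-t / S): steep early decay."""
    return math.exp(-age_days / stability_days)


def rif_score(age_days, relevance, access_count, stability_days=7.0):
    """Recency x Relevance x Frequency, with assumed normalizations."""
    recency = retention(age_days, stability_days)
    # 1 + log1p keeps never-accessed items above zero and dampens heavy use.
    frequency = 1.0 + math.log1p(access_count)
    return recency * relevance * frequency


def prune(memories, threshold=0.1):
    """Keep only the memories whose RIF score reaches the threshold."""
    return [
        m for m in memories
        if rif_score(m["age_days"], m["relevance"], m["access_count"]) >= threshold
    ]
```

Under these assumptions, a month-old item that was never accessed falls well below the threshold, while a recent, frequently accessed item of the same relevance survives. This mirrors the idea that reinforced memories persist.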

How Wiki Tools Handle Memory

Tool	Approach	From
scion	No memory (each agent starts fresh)	cross-source-themes
claude-code	CLAUDE.md + auto memory	claude-code-docs
kiro	Learns from reviews, persistent	kiro-autonomous-agent
pai	TELOS (10 files), three-tier hot/warm/cold	personal-ai-infrastructure
mem0	Graph+vector, four layers, five scopes	mem0-memory-management
llm-wiki-pattern	Compiled markdown knowledge base	llm-wiki-karpathy

Benchmark Landscape

System	Accuracy (LOCOMO)	Tokens
Full-context	72.9%	~26,000
Mem0g (graph)	68.4%	~1,800
A-Mem	68.6%	—
Mem0 (vector)	66.9%	~1,800
OpenAI Memory	52.9%	—
LangMem	50.9%	—

The 4.5-point accuracy gap between full-context (72.9%) and Mem0g (68.4%) is the core tradeoff: selective retrieval uses about 93% fewer tokens (~1,800 vs ~26,000) at a cost of 4.5 percentage points, roughly 6% in relative terms.
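
The tradeoff figures can be checked directly from the table. The short calculation below uses only the Full-context and Mem0g rows; the variable names are illustrative.

```python
# Values copied from the benchmark table above.
full_tokens, mem0g_tokens = 26_000, 1_800
full_acc, mem0g_acc = 72.9, 68.4

token_saving = 1 - mem0g_tokens / full_tokens  # fraction of tokens saved
point_gap = full_acc - mem0g_acc               # gap in percentage points
relative_drop = point_gap / full_acc           # gap relative to full-context

print(f"tokens saved:  {token_saving:.1%}")
print(f"accuracy gap:  {point_gap:.1f} points")
print(f"relative drop: {relative_drop:.1%}")
```

The absolute gap (in percentage points) and the relative drop are different quantities, so it helps to state which one a "~5%" style figure refers to.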