LLM Wiki Pattern#
A pattern for building personal knowledge bases where the LLM incrementally builds and maintains a persistent, interlinked wiki from raw sources. Proposed by andrej-karpathy. This wiki is a running instance of this pattern.
Core Insight#
Wiki > RAG. RAG rediscovers knowledge from scratch on every query — no accumulation. An LLM-maintained wiki compiles knowledge once and keeps it current. Cross-references, contradictions, and synthesis compound with every source added.
Three-Layer Architecture#
```
Raw Sources (immutable) → Wiki (LLM-maintained) → Schema (CLAUDE.md)
  you add these             LLM writes these       you + LLM co-evolve
```
- Raw sources: Immutable documents. The LLM reads them, never modifies them.
- The wiki: LLM-generated markdown. Summaries, entities, concepts, comparisons, synthesis. LLM owns entirely.
- The schema: Instructions telling the LLM how the wiki works — structure, conventions, workflows. The key configuration file.
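As an illustration only, a schema excerpt might read like the following. These conventions are hypothetical, not the actual CLAUDE.md of this wiki:

```markdown
# Wiki schema (excerpt)
- All wiki pages live in wiki/ as markdown; link between them with [[wikilinks]].
- On ingest: read the source, create or update the relevant entity and
  concept pages, then update index.md and append an entry to log.md.
- Never modify anything under sources/.
```

The point is that the schema is ordinary prose instructions, versioned alongside the wiki, which the human edits as the conventions evolve.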
Three Operations#
- Ingest: Process a source → create/update 10-15 wiki pages → update index → append to log
- Query: Read index → find relevant pages → synthesize answer → optionally file back as new page
- Lint: Health-check for contradictions, stale claims, orphan pages, missing cross-references, data gaps
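Parts of the lint operation are mechanical enough to sketch in code. A minimal pass for two of the checks, orphan pages and broken wikilinks, assuming a flat `wiki/` directory of markdown files, `[[wikilink]]` syntax, and an `index` page (all assumptions, not part of the pattern's spec):

```python
import re
from pathlib import Path

# Matches the target of [[page]], [[page|label]], or [[page#anchor]] (assumed syntax)
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def lint(wiki_dir="wiki"):
    """Report orphan pages (linked from nowhere) and broken wikilinks."""
    pages = {p.stem: p for p in Path(wiki_dir).glob("*.md")}
    linked = set()
    broken = []
    for name, path in pages.items():
        for target in WIKILINK.findall(path.read_text()):
            target = target.strip()
            linked.add(target)
            if target not in pages:
                broken.append((name, target))
    orphans = [n for n in pages if n not in linked and n != "index"]
    return orphans, broken
```

The contradiction and stale-claim checks are the LLM's job; this kind of script only surfaces the structural defects for it to fix.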
Navigation#
- index.md: Content-oriented catalog. LLM reads first on every query. Works at moderate scale (~100 sources, ~hundreds of pages) without embedding infrastructure.
- log.md: Chronological append-only record. Parseable with `grep "^## \[" log.md`.
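The log is equally parseable in code. A sketch, assuming each entry heading has the shape `## [date] summary` (the bracketed date is inferred from the grep pattern above; the summary field is an assumption):

```python
import re

# One log entry heading per match: '## [2025-01-15] ingest: some-source'
ENTRY = re.compile(r"^## \[(?P<date>[^\]]+)\]\s*(?P<summary>.*)$", re.MULTILINE)

def log_entries(text):
    """Return (date, summary) pairs from the raw text of log.md."""
    return [(m["date"], m["summary"]) for m in ENTRY.finditer(text)]
```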
Why It Works#
Humans abandon wikis because maintenance burden grows faster than value. LLMs don’t get bored, don’t forget cross-references, and can touch 15 files in one pass. The maintenance cost is near zero.
The human’s job: curate sources, direct analysis, ask good questions, think about meaning. The LLM’s job: everything else.
Intellectual Lineage#
Related to Vannevar Bush’s Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush’s vision was closer to this than to what the web became: private, actively curated, connections as valuable as documents. The part he couldn’t solve was who does the maintenance.
Connections to the Ecosystem#
- The schema (CLAUDE.md) is effectively a skill in the agent-skills-standard sense — instructions that shape agent behavior for a domain
- Index-first navigation is a form of context-management — selective loading, same principle as progressive disclosure
- Could scale to multi-agent-orchestration: separate agents for ingest, query, lint, with the wiki as shared state
- claude-code’s CLAUDE.md and auto memory are the native implementation substrate
Contrast: NotebookLM#
notebooklm shares the “human curates sources, AI processes them” principle but takes a different path:
- Session-oriented vs. cumulative: NotebookLM notes live within a session/notebook; the LLM Wiki compounds knowledge permanently across all sources.
- Interactive exploration vs. structured ingest: NotebookLM encourages open-ended conversation then pinning; the wiki follows a defined ingest → entity → concept → cross-reference workflow.
- Manual cross-reference vs. automatic wikilinks: NotebookLM’s “Suggest Related Ideas” is on-demand; the wiki maintains links as a structural invariant.
Both validate the core insight: grounding AI in curated sources produces better knowledge work than open-ended generation.