Context Management#
Strategies for managing the limited context window available to LLM agents, especially in multi-skill pipelines where different agent personas must share information efficiently.
Why It Matters#
Context windows are finite. Every skill, instruction, tool definition, and conversation turn consumes tokens. Without management, agents hit limits, lose early instructions, or waste budget on irrelevant context.
Progressive Disclosure#
Both agent-skills-standard and claude-code load skill content in three tiers, each only when it is needed:
- Metadata (~100 tokens): Loaded at startup for all skills — just name + description
- Instructions (<5000 tokens): Loaded only when skill activated
- Resources: Loaded only when specifically needed
claude-code extends this: MCP tool definitions are deferred by default (only names consume context until a tool is used).
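A minimal sketch of the three-tier idea, assuming a hypothetical `Skill` / `SkillRegistry` pair (these names are illustrative and are not taken from agent-skills-standard or claude-code). Only name and description are placed in the startup context; the instruction text is fetched the first time a skill is activated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    name: str
    description: str
    # Hypothetical loader: returns the full instruction text on demand.
    load_instructions: Callable[[], str]
    _instructions: Optional[str] = field(default=None, repr=False)

    def metadata(self) -> str:
        """Tier 1: the only text that is always in context."""
        return f"{self.name}: {self.description}"

    def activate(self) -> str:
        """Tier 2: load the instructions the first time the skill is used."""
        if self._instructions is None:
            self._instructions = self.load_instructions()
        return self._instructions


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def startup_context(self) -> str:
        # Metadata for every skill, instructions for none.
        return "\n".join(s.metadata() for s in self._skills.values())

    def activate(self, name: str) -> str:
        return self._skills[name].activate()


loads: List[str] = []  # records which skills were actually loaded


def _load_pdf() -> str:
    loads.append("pdf-extract")
    return "Step 1: open the file. Step 2: extract text page by page."


registry = SkillRegistry()
registry.register(Skill("pdf-extract", "Extract text from PDF files", _load_pdf))
registry.register(Skill("sql-review", "Review SQL migrations", lambda: "Check indexes and locks."))

startup = registry.startup_context()            # metadata only
instructions = registry.activate("pdf-extract")  # first use triggers the load
```

The third tier (resources) would follow the same lazy pattern, with one loader per resource instead of one per skill.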
Four Recipes (from ten-pillars-agentic-skill-design)#
1. Chunking#
Split large documents into semantic chunks, repeating the last two sentences of each chunk at the start of the next so that context carries across chunk boundaries.
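A rough sketch of the overlap mechanic. It simplifies "semantic" chunking to fixed windows of sentences and uses a naive regex sentence splitter; both are assumptions made for illustration, and the function names are hypothetical.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_with_overlap(text: str, max_sentences: int = 5, overlap: int = 2) -> List[str]:
    """Return chunks of up to max_sentences, each sharing `overlap`
    sentences with the chunk before it."""
    if overlap >= max_sentences:
        raise ValueError("overlap must be smaller than max_sentences")
    sentences = split_sentences(text)
    step = max_sentences - overlap
    chunks: List[str] = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + max_sentences]))
        if start + max_sentences >= len(sentences):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_with_overlap("S1. S2. S3. S4. S5. S6. S7. S8.")
# chunks[0] ends with "S4. S5." and chunks[1] starts with "S4. S5."
```

A production chunker would likely split on headings or topic shifts rather than a fixed sentence count, but the overlap step stays the same.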
2. Progressive Summarization#
Two-pass compression: extract key facts, then compress to target ratio. Preserves critical details while reducing token count.
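The two passes can be sketched with simple heuristics standing in for model calls. This is an assumption-heavy illustration: in practice each pass would probably be an LLM prompt, whitespace-separated words are used as a crude proxy for tokens, and the function names are hypothetical.

```python
import re
from typing import List


def extract_key_facts(text: str, keywords: List[str]) -> List[str]:
    """Pass 1 (heuristic stand-in): keep sentences with a number or a keyword."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    lowered = [k.lower() for k in keywords]
    return [
        s for s in sentences
        if re.search(r"\d", s) or any(k in s.lower() for k in lowered)
    ]


def compress_to_ratio(facts: List[str], original: str, target_ratio: float) -> str:
    """Pass 2: keep facts in order until the word budget is spent."""
    budget = int(len(original.split()) * target_ratio)
    kept: List[str] = []
    used = 0
    for fact in facts:
        cost = len(fact.split())
        if used + cost > budget:
            continue  # this fact does not fit; a shorter one later still might
        kept.append(fact)
        used += cost
    return " ".join(kept)


original = (
    "The service was rewritten last year. "
    "Latency dropped to 40 ms after the change. "
    "Everyone was pleased with the result. "
    "The deadline for the migration is Friday."
)
facts = extract_key_facts(original, keywords=["deadline"])
summary = compress_to_ratio(facts, original, target_ratio=0.6)
```

Separating the passes means the extraction step can stay fixed while the target ratio is tuned per consumer.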
3. Selective Context Loading#
A ContextManager that registers which context keys each skill needs, then loads only relevant context with priority-based budget allocation. Each skill gets only what it needs.
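One possible shape for such a manager, as a sketch only. The class name comes from the text above, but the method names, the priority convention (lower number = more important), and the 4-characters-per-token estimate are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class ContextItem:
    key: str
    text: str
    priority: int  # assumption: lower number = more important


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


class ContextManager:
    def __init__(self) -> None:
        self._items: Dict[str, ContextItem] = {}
        self._needs: Dict[str, Set[str]] = {}

    def add_context(self, key: str, text: str, priority: int) -> None:
        self._items[key] = ContextItem(key, text, priority)

    def register_skill(self, skill: str, keys: Iterable[str]) -> None:
        self._needs[skill] = set(keys)

    def load_for(self, skill: str, budget: int) -> List[ContextItem]:
        """Return the registered items for `skill`, most important first,
        skipping any item that would exceed the token budget."""
        wanted = [self._items[k] for k in self._needs.get(skill, set()) if k in self._items]
        wanted.sort(key=lambda item: (item.priority, item.key))
        loaded: List[ContextItem] = []
        used = 0
        for item in wanted:
            cost = estimate_tokens(item.text)
            if used + cost <= budget:
                loaded.append(item)
                used += cost
        return loaded


manager = ContextManager()
manager.add_context("schema", "s" * 400, priority=0)        # ~100 tokens
manager.add_context("style_guide", "g" * 800, priority=1)   # ~200 tokens
manager.add_context("changelog", "c" * 2000, priority=2)    # ~500 tokens
manager.register_skill("engineer", ["schema", "style_guide", "changelog"])
manager.register_skill("reviewer", ["style_guide"])

engineer_keys = [i.key for i in manager.load_for("engineer", budget=350)]
reviewer_keys = [i.key for i in manager.load_for("reviewer", budget=350)]
```

With a 350-token budget the engineer skill receives the schema and style guide but not the changelog, and the reviewer never sees the schema at all.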
4. Agent Persona Context Templates#
Keep the context handed off between agent personas minimal. For each handoff (analyst→engineer→reviewer), define which fields are required and which are optional. Only the required fields are guaranteed to be passed; optional fields are added if the budget permits.
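A sketch of a handoff template under stated assumptions: the field names, the `HANDOFF_TEMPLATES` table, and the word-count budget are hypothetical and chosen only to show the required-versus-optional split.

```python
from typing import Dict, List, Tuple

# Hypothetical templates: (required fields, optional fields in priority order).
HANDOFF_TEMPLATES: Dict[Tuple[str, str], Tuple[List[str], List[str]]] = {
    ("analyst", "engineer"): (["requirements", "constraints"], ["background", "alternatives"]),
    ("engineer", "reviewer"): (["diff_summary", "test_results"], ["design_notes"]),
}


def build_handoff(source: str, target: str, payload: Dict[str, str], budget_words: int) -> Dict[str, str]:
    required, optional = HANDOFF_TEMPLATES[(source, target)]
    missing = [f for f in required if f not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Required fields are always included, even if they exceed the budget.
    handoff = {f: payload[f] for f in required}
    used = sum(len(v.split()) for v in handoff.values())

    # Optional fields are added in priority order while budget remains.
    for f in optional:
        if f in payload:
            cost = len(payload[f].split())
            if used + cost <= budget_words:
                handoff[f] = payload[f]
                used += cost
    return handoff  # fields not named in the template are dropped


payload = {
    "requirements": "Export as CSV",
    "constraints": "No downtime",
    "background": "Requested by finance team",
    "alternatives": "We also considered a nightly batch job and a streaming export",
    "scratch": "raw interview notes",
}
handoff = build_handoff("analyst", "engineer", payload, budget_words=10)
```

Here the engineer receives both required fields plus `background`; `alternatives` is left out because it would exceed the budget, and `scratch` is never eligible.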
Implementations Across Tools#
| Tool | Context Strategy |
|---|---|
| claude-code | Auto-compaction, deferred MCP tools, subagent isolation, CLAUDE.md under 200 lines |
| scion | Each agent gets own container with own context. No shared context beyond git. |
| kiro | Persistent context across tasks/repos/sessions. Learns from feedback. |
Key Tension#
- Persistent context (Kiro, Claude Code auto memory): Agent accumulates knowledge over time. Risk: stale or wrong memories.
- Fresh context (Scion, Claude Code sessions): Each task starts clean. Risk: re-discovering known information.
- Selective loading (Ten Pillars recipes): Middle ground — load only what’s relevant per skill/task.
For a deep dive into memory architectures that address this tension, see agent-memory-persistence. The continuum-memory-architectures paper formalizes six requirements any real memory system must meet. mem0 provides the most benchmarked production implementation.
Provenance as Context Metadata#
notebooklm takes a different approach to context management: rather than token budgets and progressive disclosure, it separates context by authorship. Written Notes (human) and Saved Responses (AI) are distinct types, making provenance visible. This is a lightweight form of context metadata — knowing who produced a piece of context, not just what it contains. The 5,000-word note query limit is a concrete example of context window constraints surfacing in a consumer product.
CLAUDE.md as Architecture Enforcement#
vibe-coding-lessons-k10s reveals a critical secondary role for context files: CLAUDE.md as a constraint mechanism, not just an instruction file. When AI generates code without architectural boundaries, it defaults to god objects and feature-stuffing. The fix is encoding invariants in the persistent context:
- State ownership rules (which struct owns which data)
- Scope boundaries (who you’re NOT building for)
- Data representation rules (never flatten to positional arrays)
- Concurrency rules (background tasks never mutate UI state directly)
The AI follows rules it can see — it just won’t invent them. This creates a tension with token optimization: CLAUDE.md costs tokens on every turn, but removing architectural constraints leads to worse code. The resolution is lean but precise — lookup table, not brain dump.
Token Optimization Tactics#
claude-code-token-optimization provides seven practical levers for reducing context waste:
- Model switching — match model cost to task complexity (Haiku/Sonnet/Opus)
- CLAUDE.md sizing — persistent file costs tokens every turn; keep lean
- Subagent isolation — verbose work stays in child context, only summary returns
- Precise targeting — exact files/lines vs. “look around the repo”
- Proactive compaction — /compact before overload, not after
- Context inspection — /context to find quiet offenders
- Lean tooling — fewer connected tools = less overhead
Core insight: “Stop thinking about prompts and start thinking about context architecture.”
Filesystem as Context Architecture (ICM)#
icm-folder-structure formalizes the most radical version of this principle: the filesystem IS the context management layer. No framework code needed.
- Five-layer hierarchy (Layer 0 through Layer 4): identity → routing → stage contract → reference material → working artifacts
- Each stage receives 2,000-8,000 focused tokens vs. 30,000-50,000 monolithic
- Avoids “lost in the middle” degradation by construction — irrelevant tokens never loaded
- Layer 3 (reference: “internalize as constraints”) vs. Layer 4 (working: “process as input”) gives the model structural signals about how to use each piece of context
- Stage contracts (CONTEXT.md) are simultaneously agent instructions AND human documentation
ICM is the architectural pattern that implements selective context loading as a first-class design decision rather than an optimization applied after the fact.
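To make the idea concrete, here is a small sketch of assembling one stage's context straight from a folder tree. Only the `CONTEXT.md` stage contract is named in the notes above; the `IDENTITY.md` file, the `stages/`, `references/`, and `working/` folder names, and the section labels are assumptions for illustration. The `stage` argument stands in for the routing layer.

```python
import tempfile
from pathlib import Path


def read_dir(folder: Path) -> str:
    """Concatenate every .md file in a folder, in sorted order."""
    if not folder.is_dir():
        return ""
    return "\n\n".join(p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.md")))


def assemble_stage_context(root: Path, stage: str) -> str:
    """Load only the files belonging to one stage; other stages are never read."""
    stage_dir = root / "stages" / stage
    sections = [
        ("IDENTITY", (root / "IDENTITY.md").read_text(encoding="utf-8")),
        ("STAGE CONTRACT", (stage_dir / "CONTEXT.md").read_text(encoding="utf-8")),
        ("REFERENCE (internalize as constraints)", read_dir(stage_dir / "references")),
        ("WORKING (process as input)", read_dir(stage_dir / "working")),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


# Build a throwaway workspace with two stages to show the isolation.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "IDENTITY.md").write_text("You are a research pipeline.", encoding="utf-8")
    for stage, contract in [("01-research", "Collect sources."), ("02-draft", "Write the draft.")]:
        stage_dir = root / "stages" / stage
        (stage_dir / "references").mkdir(parents=True)
        (stage_dir / "working").mkdir()
        (stage_dir / "CONTEXT.md").write_text(contract, encoding="utf-8")
    research = root / "stages" / "01-research"
    (research / "references" / "style.md").write_text("Cite every claim.", encoding="utf-8")
    (research / "working" / "notes.md").write_text("Topic: context windows.", encoding="utf-8")
    (root / "stages" / "02-draft" / "working" / "outline.md").write_text("Draft outline.", encoding="utf-8")

    context = assemble_stage_context(root, "01-research")
```

Because the selection is done by path, the draft stage's outline cannot leak into the research stage's context, which is the "by construction" property described above.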