Context Management

Strategies for managing the limited context window available to LLM agents, especially in multi-skill pipelines where different agent personas must share information efficiently.

Why It Matters

Context windows are finite. Every skill, instruction, tool definition, and conversation turn consumes tokens. Without management, agents hit limits, lose early instructions, or waste budget on irrelevant context.

Progressive Disclosure

Both agent-skills-standard and claude-code use this three-tier pattern:

Metadata (~100 tokens): Loaded at startup for all skills; only the name and description
Instructions (under 5,000 tokens): Loaded only when the skill is activated
Resources: Loaded only when specifically needed

claude-code extends this to tools: MCP tool definitions are deferred by default, so only tool names consume context until a tool is used.
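A minimal sketch of the three tiers, assuming skills are plain Python objects whose instructions and resources are fetched through callables. `Skill` and `SkillRegistry` are hypothetical names for illustration, not part of the agent-skills-standard or claude-code APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    # Tier 1: always in context (name + description only).
    name: str
    description: str
    # Tier 2: full instructions, fetched only when the skill is activated.
    load_instructions: Callable[[], str]
    # Tier 3: resources, each fetched only when specifically requested.
    resources: dict[str, Callable[[], str]] = field(default_factory=dict)


class SkillRegistry:
    """Hypothetical registry illustrating progressive disclosure."""

    def __init__(self, skills: list[Skill]) -> None:
        self.skills = {s.name: s for s in skills}
        self.active: dict[str, str] = {}  # cache of activated instructions

    def startup_context(self) -> str:
        # Only metadata is loaded at startup, for every skill.
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def activate(self, name: str) -> str:
        # Instructions are loaded lazily, once, on first activation.
        if name not in self.active:
            self.active[name] = self.skills[name].load_instructions()
        return self.active[name]

    def resource(self, skill: str, key: str) -> str:
        # Resources are loaded only on explicit request.
        return self.skills[skill].resources[key]()
```

The design choice is that each tier is a callable rather than a string, so nothing beyond the metadata is read until the agent actually needs it.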

Four Recipes (from ten-pillars-agentic-skill-design)

1. Chunking

Split large documents into semantic chunks that overlap for context continuity: each chunk repeats the last two sentences of the previous one.
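A minimal sketch of overlapping chunks, assuming sentence boundaries (found with a naive regex) and a word budget stand in for true semantic boundaries; an embedding-based splitter would be the more faithful choice. `chunk_with_overlap` and its `max_words` parameter are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_with_overlap(text: str, max_words: int = 200, overlap: int = 2) -> list[str]:
    """Group sentences into chunks of roughly max_words words; each new chunk
    starts with the last `overlap` sentences of the previous chunk."""
    chunks: list[list[str]] = []
    current: list[str] = []
    words = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and words + n > max_words:
            chunks.append(current)
            # Carry the tail of the finished chunk forward for continuity.
            current = current[-overlap:] if overlap else []
            words = sum(len(s.split()) for s in current)
        current.append(sentence)
        words += n
    if current:
        chunks.append(current)
    return [" ".join(c) for c in chunks]
```

For example, five two-word sentences with `max_words=6` yield three chunks, each beginning with the final two sentences of the one before it.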

2. Progressive Summarization

Two-pass compression: first extract the key facts, then compress them to a target ratio. This preserves critical details while reducing the token count.
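The two passes can be sketched as follows. In practice both passes would typically be LLM calls; here they are heuristic stand-ins (an assumption), so the shape of the pipeline is visible without a model: pass 1 keeps sentences carrying numbers or code identifiers, pass 2 trims the result to a word budget derived from the target ratio. All function names are hypothetical.

```python
import re


def extract_key_facts(text: str) -> list[str]:
    # Pass 1 stand-in: keep sentences containing digits or backticked identifiers.
    # A real pipeline would ask a model to list the key facts instead.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if re.search(r"\d|`", s)]


def compress_to_budget(facts: list[str], budget_words: int) -> str:
    # Pass 2 stand-in: keep facts in order until the word budget is spent.
    kept: list[str] = []
    used = 0
    for fact in facts:
        n = len(fact.split())
        if used + n > budget_words:
            break
        kept.append(fact)
        used += n
    return " ".join(kept)


def progressive_summarize(text: str, target_ratio: float = 0.25) -> str:
    # The budget is a fraction of the original word count.
    budget = max(1, int(len(text.split()) * target_ratio))
    return compress_to_budget(extract_key_facts(text), budget)
```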

3. Selective Context Loading

A ContextManager registers which context keys each skill needs, then loads only the relevant context, allocating the token budget by priority. Each skill gets only what it needs.
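A sketch of such a ContextManager, assuming a rough 4-characters-per-token estimate and "lower number means higher priority". The method names `add`, `register`, and `load_for` are hypothetical; the Ten Pillars recipe may expose a different interface.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


@dataclass
class ContextItem:
    content: str
    priority: int  # lower number = more important


class ContextManager:
    def __init__(self) -> None:
        self.store: dict[str, ContextItem] = {}
        self.needs: dict[str, list[str]] = {}

    def add(self, key: str, content: str, priority: int) -> None:
        self.store[key] = ContextItem(content, priority)

    def register(self, skill: str, keys: list[str]) -> None:
        # Declare up front which context keys a skill needs.
        self.needs[skill] = keys

    def load_for(self, skill: str, budget: int) -> dict[str, str]:
        # Consider only the keys this skill registered, most important first,
        # and include each one only if it still fits in the token budget.
        keys = [k for k in self.needs.get(skill, []) if k in self.store]
        keys.sort(key=lambda k: self.store[k].priority)
        loaded: dict[str, str] = {}
        used = 0
        for key in keys:
            cost = estimate_tokens(self.store[key].content)
            if used + cost <= budget:
                loaded[key] = self.store[key].content
                used += cost
        return loaded
```

A lower-priority item that does not fit is skipped, but smaller items after it can still be included, so the budget is filled greedily in priority order.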

4. Agent Persona Context Templates

Minimal context handoff between agent personas. Each handoff (analyst→engineer→reviewer) defines required and optional fields. Required fields are always included; optional fields are added only if the budget permits.
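A sketch of handoff templates, assuming each persona's working state is a flat dict of strings and using the same rough 4-characters-per-token estimate. The field names in `TEMPLATES` are hypothetical examples, not taken from the Ten Pillars source.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 chars/token heuristic (assumption)


@dataclass(frozen=True)
class HandoffTemplate:
    required: tuple[str, ...]
    optional: tuple[str, ...] = ()


# Hypothetical templates for the analyst -> engineer -> reviewer chain.
TEMPLATES = {
    ("analyst", "engineer"): HandoffTemplate(
        required=("requirements", "acceptance_criteria"),
        optional=("background", "open_questions"),
    ),
    ("engineer", "reviewer"): HandoffTemplate(
        required=("diff_summary", "acceptance_criteria"),
        optional=("design_notes",),
    ),
}


def build_handoff(sender: str, receiver: str, state: dict[str, str], budget: int) -> dict[str, str]:
    template = TEMPLATES[(sender, receiver)]
    missing = [f for f in template.required if f not in state]
    if missing:
        raise ValueError(f"handoff {sender}->{receiver} missing required fields: {missing}")
    # Required fields are always included, even if they alone exceed the budget.
    payload = {f: state[f] for f in template.required}
    used = sum(estimate_tokens(v) for v in payload.values())
    # Optional fields are added in template order while they still fit.
    for f in template.optional:
        if f in state:
            cost = estimate_tokens(state[f])
            if used + cost <= budget:
                payload[f] = state[f]
                used += cost
    # Anything not named in the template (e.g. scratch notes) never crosses over.
    return payload
```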

Implementations Across Tools

Tool	Context Strategy
claude-code	Auto-compaction, deferred MCP tools, subagent isolation, CLAUDE.md under 200 lines
scion	Each agent gets its own container with its own context. No shared context beyond git.
kiro	Persistent context across tasks, repos, and sessions. Learns from feedback.

Key Tension

Persistent context (Kiro, Claude Code auto memory): The agent accumulates knowledge over time. Risk: stale or wrong memories.
Fresh context (Scion, Claude Code sessions): Each task starts clean. Risk: re-discovering known information.
Selective loading (Ten Pillars recipes): A middle ground that loads only what’s relevant per skill or task.

For a deep dive into memory architectures that address this tension, see agent-memory-persistence. The continuum-memory-architectures paper formalizes six requirements any real memory system must meet. mem0 provides the most benchmarked production implementation.

Provenance as Context Metadata

notebooklm takes a different approach to context management: rather than token budgets and progressive disclosure, it separates context by authorship. Written Notes (human) and Saved Responses (AI) are distinct types, making provenance visible. This is a lightweight form of context metadata — knowing who produced a piece of context, not just what it contains. The 5,000-word note query limit is a concrete example of context window constraints surfacing in a consumer product.
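Provenance-tagged context can be sketched as typed entries that carry their author alongside their text. This is an illustration of the idea, not NotebookLM's implementation: the `Provenance` enum, the label strings, and the rule of stopping at the first entry that would exceed the word limit are all assumptions; only the 5,000-word figure comes from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    HUMAN_NOTE = "written_note"     # authored by the user
    AI_RESPONSE = "saved_response"  # produced by the model


@dataclass
class ContextEntry:
    text: str
    provenance: Provenance


def render_with_provenance(entries: list[ContextEntry], max_words: int = 5000) -> str:
    # Label every entry with who produced it, and stop before the word limit.
    lines: list[str] = []
    used = 0
    for entry in entries:
        n = len(entry.text.split())
        if used + n > max_words:
            break
        lines.append(f"[{entry.provenance.value}] {entry.text}")
        used += n
    return "\n".join(lines)
```

Because the label travels with each entry, a downstream agent can, for instance, trust a human note over an AI summary that contradicts it.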

CLAUDE.md as Architecture Enforcement

vibe-coding-lessons-k10s reveals a critical secondary role for context files: CLAUDE.md as a constraint mechanism, not just an instruction file. When an AI generates code without architectural boundaries, it defaults to god objects and feature-stuffing. The fix is to encode invariants in the persistent context:

State ownership rules (which struct owns which data)
Scope boundaries (who you’re NOT building for)
Data representation rules (never flatten to positional arrays)
Concurrency rules (background tasks never mutate UI state directly)

The AI follows rules it can see; it just won’t invent them. This creates a tension with token optimization: CLAUDE.md costs tokens on every turn, but removing architectural constraints leads to worse code. The resolution is to keep the file lean but precise: a lookup table, not a brain dump.
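The "lookup table, not brain dump" idea can be sketched as one-line invariants grouped by the four categories above, rendered into a compact CLAUDE.md section, plus a rough estimate of what that section costs over a session. The rules themselves are hypothetical placeholders, not the k10s project's actual invariants, and the 4-characters-per-token figure is an assumption.

```python
# Hypothetical invariants, one line each, grouped by the four categories above.
INVARIANTS = {
    "State ownership": ["Session data is owned by `Session`; other types hold references only."],
    "Scope": ["Not building for teams or multi-user setups."],
    "Data representation": ["Use named structs; never flatten to positional arrays."],
    "Concurrency": ["Background tasks send messages; they never mutate UI state directly."],
}


def render_rules(invariants: dict[str, list[str]]) -> str:
    # Render as a compact lookup table: one heading, one bullet per rule.
    out = ["## Architecture rules"]
    for category, rules in invariants.items():
        for rule in rules:
            out.append(f"- {category}: {rule}")
    return "\n".join(out)


def per_session_cost(text: str, turns: int) -> int:
    # The file occupies context on every turn, so its cost scales with session
    # length. Token count uses a rough ~4 characters/token assumption.
    return (len(text) // 4) * turns


section = render_rules(INVARIANTS)
print(section)
print(per_session_cost(section, turns=50), "tokens over a 50-turn session (rough estimate)")
```

Each rule costs one line, so the per-turn price stays small while the constraint remains visible to the model on every turn.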

Token Optimization Tactics

claude-code-token-optimization provides seven practical levers for reducing context waste:

Model switching — match model cost to task complexity (Haiku/Sonnet/Opus)
CLAUDE.md sizing — persistent file costs tokens every turn; keep lean
Subagent isolation — verbose work stays in child context, only summary returns
Precise targeting — exact files/lines vs. “look around the repo”
Proactive compaction — /compact before overload, not after
Context inspection — /context to find quiet offenders
Lean tooling — fewer connected tools = less overhead

Core insight: “Stop thinking about prompts and start thinking about context architecture.”
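Two of these levers, the fixed per-turn overhead of CLAUDE.md and connected tools and compacting before overload, can be modeled with a small budget tracker. This is a sketch under assumptions: the `ContextBudget` class, the 70% threshold, and the token figures are illustrative, and actual compaction is performed by claude-code's /compact command, not by this code.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBudget:
    window: int               # model context window, in tokens
    fixed_overhead: int       # CLAUDE.md + tool definitions, present every turn
    compact_at: float = 0.7   # hypothetical threshold: compact well before the limit
    history: list[int] = field(default_factory=list)  # tokens per conversation turn

    def used(self) -> int:
        return self.fixed_overhead + sum(self.history)

    def add_turn(self, tokens: int) -> bool:
        """Record a turn; return True when it is time to compact proactively."""
        self.history.append(tokens)
        return self.used() >= self.compact_at * self.window

    def compact(self, summary_tokens: int) -> None:
        # Replace the whole history with a single summary turn. The fixed
        # overhead is untouched: only trimming CLAUDE.md or tools reduces it.
        self.history = [summary_tokens]


budget = ContextBudget(window=200_000, fixed_overhead=20_000)
if budget.add_turn(50_000) or budget.add_turn(80_000):
    budget.compact(summary_tokens=5_000)
print(budget.used())
```

The model makes one point explicit: compaction shrinks history but never the fixed overhead, which is why CLAUDE.md sizing and lean tooling are separate levers.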

Filesystem as Context Architecture (ICM)

icm-folder-structure formalizes the most radical version of this principle: the filesystem IS the context management layer. No framework code needed.

Five-layer hierarchy: identity → routing → stage contract → reference material → working artifacts
Each stage receives 2,000-8,000 focused tokens instead of a 30,000-50,000-token monolithic context
Avoids “lost in the middle” degradation by construction, because irrelevant tokens are never loaded
Layer 3 (reference: “internalize as constraints”) vs. Layer 4 (working: “process as input”) gives the model structural signals about how to use each piece of context
Stage contracts (CONTEXT.md) are simultaneously agent instructions AND human documentation

ICM is the architectural pattern that implements selective context loading as a first-class design decision rather than an optimization applied after the fact.
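A sketch of assembling one stage's context straight from the filesystem. Apart from CONTEXT.md, the file and directory names are hypothetical, and the layers are assumed to be numbered from 0 (identity) so that reference material is Layer 3 and working artifacts are Layer 4, matching the labels above. The reference and working labels reuse the instructions quoted in the list.

```python
from pathlib import Path

# Hypothetical layout (only CONTEXT.md comes from the source):
#   root/IDENTITY.md                    layer 0: identity
#   root/ROUTING.md                     layer 1: routing
#   root/stages/<stage>/CONTEXT.md      layer 2: stage contract
#   root/stages/<stage>/reference/*     layer 3: reference material
#   root/stages/<stage>/working/*       layer 4: working artifacts


def assemble_stage_context(root: Path, stage: str) -> str:
    stage_dir = root / "stages" / stage
    parts = [
        ("identity", root / "IDENTITY.md"),
        ("routing", root / "ROUTING.md"),
        ("stage contract", stage_dir / "CONTEXT.md"),
    ]
    sections = [f"# {label}\n{path.read_text()}" for label, path in parts if path.is_file()]
    # Label reference and working files differently so the model knows how to use each.
    for sub, instruction in (("reference", "internalize as constraints"),
                             ("working", "process as input")):
        sub_dir = stage_dir / sub
        if not sub_dir.is_dir():
            continue
        for path in sorted(sub_dir.iterdir()):
            if path.is_file():
                sections.append(f"# {sub}: {path.name} ({instruction})\n{path.read_text()}")
    # Other stages' directories are never read, so their tokens never enter the context.
    return "\n\n".join(sections)
```

No framework is involved: selecting the stage directory is the selective-loading decision.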

Knowledge Interchange: Open Knowledge Format

open-knowledge-format addresses context management at the organizational interchange level rather than the agent level. The core problem it solves: agents in different teams, tools, and organizations need the same contextual knowledge (table schemas, metric definitions, runbooks) but each system locks it in incompatible formats.

OKF’s contribution to context management:

Progressive disclosure by construction: index.md files at each directory level let agents navigate a bundle top-down, loading only what they need — the same pattern as selective context loading but standardized across producers.
Producer/consumer decoupling: Context producers (data teams, documentation pipelines) and consumers (agents, visualizers) don’t need to agree on tooling — only on format.
Filesystem as namespace: File path = concept identity. No registry, no API, no token overhead for coordination.

This complements icm-folder-structure (which uses filesystem as context architecture for a single agent’s workflow) by providing the interoperability layer for sharing knowledge between agents and teams.
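A sketch of top-down navigation in a bundle with per-directory index.md files, where the file path doubles as the concept identifier. OKF conventions beyond the per-directory index.md are not described here, so the bundle layout and the `load_concept` helper are hypothetical. In a real agent the next directory would be chosen by reading each index; here the target path is given directly to keep the example short.

```python
from pathlib import Path


def load_concept(bundle: Path, concept: str) -> list[tuple[str, str]]:
    """Walk from the bundle root to `concept` (a relative file path), reading
    only the index.md at each directory on the way, then the concept file.
    Returns (relative path, text) pairs in the order they were loaded."""
    concept_path = Path(concept)
    directories = [bundle]
    for part in concept_path.parts[:-1]:
        directories.append(directories[-1] / part)
    loaded: list[tuple[str, str]] = []
    for directory in directories:
        index = directory / "index.md"
        if index.is_file():
            loaded.append((index.relative_to(bundle).as_posix(), index.read_text()))
    # The path itself identifies the concept: no registry or lookup API needed.
    loaded.append((concept_path.as_posix(), (bundle / concept_path).read_text()))
    return loaded
```

Sibling directories (for example, runbooks when the agent is looking up a metric) are never opened, which is the progressive-disclosure property described above.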