Context Management#
Strategies for managing the limited context window available to LLM agents, especially in multi-skill pipelines where different agent personas must share information efficiently.
Why It Matters#
Context windows are finite. Every skill, instruction, tool definition, and conversation turn consumes tokens. Without management, agents hit limits, lose early instructions, or waste budget on irrelevant context.
Progressive Disclosure#
The agent-skills-standard and claude-code both use this pattern:
- Metadata (~100 tokens): Loaded at startup for all skills — just name + description
- Instructions (<5000 tokens): Loaded only when skill activated
- Resources: Loaded only when specifically needed
claude-code extends this: MCP tool definitions are deferred by default (only names consume context until a tool is used).
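The three tiers above can be sketched in a few lines. This is an illustrative sketch, not either tool's actual API; the `Skill` and `SkillRegistry` names are invented, and "loading" is simplified to returning stored strings:

```python
class Skill:
    def __init__(self, name, description, instructions, resources=None):
        self.name = name
        self.description = description      # tier 1: ~100-token metadata
        self._instructions = instructions   # tier 2: kept out of the prompt
        self._resources = resources or {}   # tier 3: individual resources

class SkillRegistry:
    def __init__(self, skills):
        self.skills = {s.name: s for s in skills}

    def startup_context(self):
        # Only name + description enter the prompt at startup.
        return "\n".join(f"{s.name}: {s.description}"
                         for s in self.skills.values())

    def activate(self, name):
        # Full instructions are surfaced only when the skill is invoked.
        return self.skills[name]._instructions

    def load_resource(self, name, key):
        # Heaviest tier: a single resource, fetched only when referenced.
        return self.skills[name]._resources[key]
```

The key property is that `startup_context()` is the only method whose output is always in the window; the other two tiers cost nothing until called.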
Four Recipes (from ten-pillars-agentic-skill-design)#
1. Chunking#
Split large documents into semantic chunks, carrying the last two sentences of each chunk into the start of the next so context stays continuous across boundaries.
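A minimal sketch of sentence-level chunking with a two-sentence overlap (the sentence splitter here is a naive regex, used only for illustration):

```python
import re

def chunk_text(text, max_sentences=5, overlap=2):
    """Split text into chunks of up to `max_sentences` sentences,
    repeating the last `overlap` sentences of each chunk at the
    start of the next for context continuity."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, start = [], 0
    while start < len(sentences):
        end = start + max_sentences
        chunks.append(" ".join(sentences[start:end]))
        if end >= len(sentences):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```

Note that `overlap` must be smaller than `max_sentences`, or the loop never advances.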
2. Progressive Summarization#
Two-pass compression: extract key facts, then compress to target ratio. Preserves critical details while reducing token count.
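The two passes can be sketched as two prompts over a generic completion function. `llm` here is a hypothetical text-in/text-out callable, not a specific provider's API, and the word-count budget stands in for a real token count:

```python
def progressive_summarize(text, llm, target_ratio=0.3):
    """Pass 1 extracts key facts verbatim; pass 2 compresses them
    toward a target fraction of the original length."""
    facts = llm(
        "Extract every fact, figure, name, and decision from the text "
        "below as a bullet list. Do not paraphrase numbers.\n\n" + text
    )
    budget = int(len(text.split()) * target_ratio)
    return llm(
        f"Compress these facts into at most {budget} words, "
        "preserving all numbers and names:\n\n" + facts
    )
```

Separating extraction from compression is what preserves critical details: numbers and names survive pass 1 verbatim, so pass 2 can shrink aggressively without inventing or dropping them.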
3. Selective Context Loading#
A ContextManager registers which context keys each skill needs, then loads only the relevant keys under a priority-based token budget — each skill sees only what it needs.
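A sketch of that manager, assuming the simplest possible accounting (word counts standing in for tokens; the class shape is illustrative, not the Ten Pillars implementation):

```python
class ContextManager:
    def __init__(self):
        self.store = {}   # key -> (content, priority)
        self.needs = {}   # skill -> [keys it registered]

    def put(self, key, content, priority=0):
        self.store[key] = (content, priority)

    def register_skill(self, skill, keys):
        self.needs[skill] = keys

    def load_for(self, skill, budget):
        # Visit the skill's keys highest-priority first, filling the budget.
        keys = sorted(self.needs.get(skill, []),
                      key=lambda k: -self.store[k][1])
        selected, used = [], 0
        for k in keys:
            content, _ = self.store[k]
            cost = len(content.split())
            if used + cost <= budget:
                selected.append(content)
                used += cost
        return "\n".join(selected)
```

Keys a skill never registered are invisible to it regardless of budget, which is the "each skill gets only what it needs" guarantee.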
4. Agent Persona Context Templates#
Minimal context handoff between agent personas: define required vs. optional fields per handoff (analyst→engineer→reviewer). Required fields are always passed; optional fields are added only if the budget permits.
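The handoff templates might look like the sketch below. The persona pairs come from the text; the field names, the `HANDOFFS` table, and the word-count budgeting are all illustrative assumptions:

```python
HANDOFFS = {
    ("analyst", "engineer"): {
        "required": ["task", "constraints"],
        "optional": ["background", "alternatives_considered"],
    },
    ("engineer", "reviewer"): {
        "required": ["task", "diff_summary"],
        "optional": ["test_notes"],
    },
}

def build_handoff(sender, receiver, context, budget):
    spec = HANDOFFS[(sender, receiver)]
    # Required fields are passed unconditionally, even over budget.
    payload = {k: context[k] for k in spec["required"]}
    used = sum(len(str(v).split()) for v in payload.values())
    # Optional fields fill whatever budget remains, in template order.
    for k in spec["optional"]:
        if k in context:
            cost = len(str(context[k]).split())
            if used + cost <= budget:
                payload[k] = context[k]
                used += cost
    return payload
```

Making the template explicit per persona pair is the point: the receiver's prompt never inherits the sender's full working context, only the declared fields.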
Implementations Across Tools#
| Tool | Context Strategy |
|---|---|
| claude-code | Auto-compaction, deferred MCP tools, subagent isolation, CLAUDE.md under 200 lines |
| scion | Each agent gets own container with own context. No shared context beyond git. |
| kiro | Persistent context across tasks/repos/sessions. Learns from feedback. |
Key Tension#
- Persistent context (Kiro, Claude Code auto memory): Agent accumulates knowledge over time. Risk: stale or wrong memories.
- Fresh context (Scion, Claude Code sessions): Each task starts clean. Risk: re-discovering known information.
- Selective loading (Ten Pillars recipes): Middle ground — load only what’s relevant per skill/task.
Provenance as Context Metadata#
notebooklm takes a different approach to context management: rather than token budgets and progressive disclosure, it separates context by authorship. Written Notes (human) and Saved Responses (AI) are distinct types, making provenance visible. This is a lightweight form of context metadata — knowing who produced a piece of context, not just what it contains. The 5,000-word note query limit is a concrete example of context window constraints surfacing in a consumer product.
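Provenance-as-metadata reduces to tagging each context item with its author. A minimal sketch in the spirit of the notebooklm split — the type and field names are invented here, not notebooklm's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContextItem:
    text: str
    author: str   # "human" for written notes, "ai" for saved responses

def human_only(items):
    """Filter to human-authored context, e.g. to keep earlier AI output
    from compounding into later generations."""
    return [i for i in items if i.author == "human"]
```

Once authorship is a first-class field, policies like "summarize AI items more aggressively than human items" become one-line filters rather than heuristics over undifferentiated text.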