Cross-Source Theme Analysis
This analysis covers 42 sources: 12 tools, 4 OSS frameworks, 4 benchmarks, 2 standards, 3 methodologies, 4 memory systems, and additional sources on cost, governance, UX, and industry impact. The themes below each appear independently in three or more sources.
Refresh history: Originally written against 11 sources (Apr 9). Refreshed Apr 15 against 33 sources. Refreshed May 9 against 42 sources — added Gas Town, Symphony, Multica evidence to existing themes.
Theme 1: Context Is King (15/42 sources) ⭐⭐⭐⭐⭐
The single most repeated idea. Now backed by quantitative evidence from memory research.
| Source | How it appears |
|---|---|
| agent-skills-standard | Progressive disclosure: ~100 tokens at startup, full content only when activated |
| claude-code | MCP tools deferred, skills load on demand, subagent context isolation, CLAUDE.md under 200 lines |
| ten-pillars-agentic-skill-design | Pillar 9: four context management recipes |
| pai | TELOS (10 files), three-tier memory, “context document” as core primitive |
| fabric | Per-pattern model mapping, composable strategies |
| ai-technique-podcast | “Context beats clever prompting.” |
| skills-pipeline-sleestk | Reference files loaded on demand, minimal structured context forward |
| llm-wiki-pattern | Index-first navigation — read catalog, drill into relevant pages only |
| scion | Each agent gets own container with own context. No shared pollution. |
| mem0-memory-management | Four memory layers. 93% token reduction with selective retrieval vs full-context. |
| continuum-memory-architectures | Six formal requirements for context persistence. CMA won 82/92 vs RAG. |
| efficient-memory-architectures | Four failure modes of flat vector storage. MemGPT: 90% token savings. |
| agent-cost-economics | Five waste vectors — 60-80% of tokens wasted on wrong context. |
| crewai-multi-agent | Role + backstory as persona-based context management |
| langgraph-agent-orchestration | Checkpointed state as persistent context across workflow steps |
Strengthened consensus: The advice is no longer just “load the right thing”; it is now quantified. Selective retrieval uses 93% fewer tokens for a ~5% accuracy tradeoff (mem0-memory-management). Context management is simultaneously a quality strategy and a cost strategy (agent-cost-economics).
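The progressive-disclosure idea from the table (short descriptions at startup, full content only on activation) can be sketched as follows. This is a minimal illustration, not the Agent Skills or Claude Code implementation: `Skill`, `SkillRegistry`, and `rough_tokens` are hypothetical names, and the four-characters-per-token estimate is a crude assumption.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    summary: str  # short description, always present in context
    body: str     # full instructions, included only after activation


def rough_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


class SkillRegistry:
    """Hypothetical registry that defers full skill bodies until needed."""

    def __init__(self, skills):
        self._skills = {s.name: s for s in skills}
        self._active = set()

    def activate(self, name: str) -> None:
        if name not in self._skills:
            raise KeyError(name)
        self._active.add(name)

    def context(self) -> str:
        # Active skills contribute their full body; all others only a one-liner.
        parts = []
        for s in self._skills.values():
            if s.name in self._active:
                parts.append(s.body)
            else:
                parts.append(f"{s.name}: {s.summary}")
        return "\n".join(parts)

    def context_tokens(self) -> int:
        return rough_tokens(self.context())
```

Comparing `context_tokens()` before and after `activate(...)` shows the cost of a skill being paid only when that skill is actually used.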
Theme 2: Composition Over Monoliths (17/42 sources) ⭐⭐⭐⭐⭐
Validated across every framework — product-level and open-source alike.
| Source | How it appears |
|---|---|
| fabric | 251 focused patterns. Unix philosophy: pipe and compose. |
| skills-pipeline-sleestk | 6-stage YouTube pipeline. Each skill is one domain. |
| claude-code | Subagents (Explore, Plan, General-purpose). Skills per task. |
| scion | Harness per tool. Template per agent. Grove per project. |
| agent-skills-standard | One skill per directory. Under 500 lines. |
| ten-pillars-agentic-skill-design | Pillar 3 (SRP), Pillar 4 (modularity). |
| pai | 63 skills, 21 hooks, 14 agents, 12 standalone packs. |
| kiro | Powers as modular packages. Sub-agents for coordination. |
| autogen-multi-agent | Specialized agents communicate through dialogue |
| crewai-multi-agent | Crew of role-specialized agents with task dependencies |
| langgraph-agent-orchestration | Nodes as composable units in a graph |
| openai-swarm | Agents as system prompts + functions. Minimal units. |
| paperclip | Agents organized into companies with specialized roles |
| spec-kit | 30+ agents, 50+ extensions, each with focused scope |
| gastown | Seven specialized roles (including Mayor, Polecats, Refinery, Witness, Deacon, Dogs). Molecules as composable workflow units. |
| symphony | WORKFLOW.md per repo. Each issue gets isolated workspace + agent session. Separation of scheduler from agent. |
| multica | Reusable skills that compound. Each agent is a focused teammate with a specific runtime. |
Strongest convergence in the wiki. Every single multi-agent framework chose small, focused, composable units. No exceptions.
Theme 3: The Human Stays in the Loop — But How Much? (13/42 sources) ⭐⭐⭐⭐⭐
Now formalized with measurable UX patterns and a phased adoption framework.
| Source | Position |
|---|---|
| scion | “Interaction is imperative.” |
| kiro | Frontier agents: hours/days of autonomy. PR-only output. |
| claude-code | 6 permission modes — configurable dial. |
| pai | Self-modifying, but human sets goals (TELOS). |
| evaluating-agent-skills-caparas | Human review is Tier 3 — expensive, use sparingly. |
| ai-technique-podcast | “AI as thinking partner, not executor only.” |
| llm-wiki-pattern | Human curates sources. LLM does everything else. |
| agentic-ux-patterns | Six UX patterns with metrics: Intent Preview (>85% acceptance), Autonomy Dial (4 levels), Confidence Signal, Audit & Undo (<5% reversion), Escalation (>90% recovery). |
| agentic-ai-governance | Kill switches, dynamic least privilege, continuous observability. |
| autogen-multi-agent | Configurable human participation in agent conversations. |
| crewai-multi-agent | Delegation: agents can ask humans or other agents for help. |
| langgraph-agent-orchestration | Human-in-the-loop at any node (first-class). |
| agentic-ai-non-code-domains | Healthcare demands human oversight; finance requires compliance gates. |
New: The agentic-ux-patterns source formalizes this spectrum into six measurable patterns. The Autonomy Dial (Observe → Propose → Confirm → Autonomous) maps directly to Claude Code’s permission modes. Phased adoption: safety first → calibrated autonomy → proactive delegation.
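One way to picture the Autonomy Dial is as an ordered setting that gates what the agent may do with a pending action. The sketch below is an assumed reading of the four levels, not a definition taken from agentic-ux-patterns or Claude Code: the returned labels and the `risky` escalation rule are hypothetical.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    OBSERVE = 0
    PROPOSE = 1
    CONFIRM = 2
    AUTONOMOUS = 3


def decide(level: Autonomy, risky: bool = False) -> str:
    """Return what the agent does with a pending action.

    Assumed semantics (not from the source):
      'watch'   - agent only observes and reports
      'suggest' - agent proposes, the human carries it out
      'ask'     - agent will act, but only after explicit confirmation
      'act'     - agent acts without asking
    """
    if level == Autonomy.OBSERVE:
        return "watch"
    if level == Autonomy.PROPOSE:
        return "suggest"
    if level == Autonomy.CONFIRM:
        return "ask"
    # AUTONOMOUS: hypothetical escalation rule, risky actions still ask first.
    return "ask" if risky else "act"
```

Using an ordered enum keeps the dial comparable (`level >= Autonomy.CONFIRM`), which fits a phased adoption path that starts low and raises the setting over time.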
Theme 4: Skills Are Evolving Into a Standard (6/42 sources) ⭐⭐⭐⭐
Unchanged from original analysis. The evolution trajectory remains clear.
Fabric Patterns (2023) → Agent Skills Standard (2025) → Claude Code Skills (2026) → Pipelines + Evaluation. From simple prompt files to a full lifecycle.
Theme 5: Memory Is No Longer the Unsolved Frontier (10/42 sources) ⭐⭐⭐⭐⭐
Upgraded from “unsolved” to “understood with clear tradeoffs.” Four new memory sources provide formal requirements, benchmarks, and architecture patterns.
| Source | Memory approach |
|---|---|
| pai | Three-tier hot/warm/cold. Self-modification. Most sophisticated product. |
| claude-code | CLAUDE.md + auto memory. Per working tree. |
| kiro | Persistent context. Learns from code reviews. |
| llm-wiki-pattern | The wiki IS the memory. File + Database pattern. |
| scion | No memory. Each agent starts fresh. |
| mem0-memory-management | Graph+vector, four layers, five scopes. LOCOMO benchmarks. |
| continuum-memory-architectures | Six formal CMA requirements. 82/92 wins vs RAG. |
| agent-memory-systems-2026 | Four patterns: vector-only, graph+vector, file+DB, hierarchical. |
| efficient-memory-architectures | H-MEM, MemGPT (90% savings), GraphRAG, selective forgetting. |
| crewai-multi-agent | Built-in short/long/entity memory across agents. |
Key advances: CMA defines six necessary conditions (RAG meets none). Mem0 provides production benchmarks (93% token reduction). Forgetting is now recognized as a design requirement, not a failure. See memory-architecture-comparison for the full analysis.
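The hot/warm/cold tiering and "forgetting as a design requirement" ideas above can be illustrated with a small sketch. It is an assumption-laden toy, not the design of pai, Mem0, or MemGPT: the `Memory` type, the logical clock, and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    last_used: int  # logical clock tick of the most recent access


def tier_of(memory: Memory, now: int,
            hot_window: int = 10, warm_window: int = 100) -> str:
    # Classify by age since last use; the window sizes are arbitrary assumptions.
    age = now - memory.last_used
    if age <= hot_window:
        return "hot"
    if age <= warm_window:
        return "warm"
    return "cold"


def forget(memories: list[Memory], now: int, max_age: int = 1000) -> list[Memory]:
    # Selective forgetting: drop entries unused for longer than max_age ticks.
    return [m for m in memories if now - m.last_used <= max_age]
```

Only hot entries would be placed in the prompt by default, with warm and cold entries retrieved on demand, which is where the token savings reported above would come from.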
Theme 6: Git as Universal Substrate (9/42 sources) ⭐⭐⭐⭐⭐
Upgraded from ⭐⭐⭐⭐ to ⭐⭐⭐⭐⭐. Gas Town and Symphony provide the strongest evidence yet that git is the coordination primitive for multi-agent systems.
| Source | How it appears |
|---|---|
| scion | Git worktrees per agent |
| kiro | Git branches, PR output |
| claude-code | Git-based workspaces |
| gastown | Git worktrees for every polecat. Dolt (git-for-data) for beads. Merge queue (Refinery). Wasteland federation via DoltHub. Most git-native tool in the wiki. |
| symphony | Per-issue workspace directories. Workspaces persist across runs. WORKFLOW.md version-controlled with the codebase. |
| multica | Git-based workspace isolation per agent task |
| paperclip | Agent-agnostic but assumes git-based code output |
| spec-kit | Spec-driven development with git-versioned artifacts |
| llm-wiki-pattern | The wiki itself is git-backed |
Key advance (May 2026): Gas Town takes git further than any other tool — worktrees for isolation, Dolt for cell-level merge of concurrent agent writes, and a Bors-style merge queue for quality gates. Symphony uses git implicitly (workspaces are filesystem directories that can be git repos via hooks). The pattern is universal.
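The worktree-per-agent pattern that recurs in the table relies on the standard `git worktree add` command. The helper below is a sketch under stated assumptions, not how Gas Town or scion actually lay out workspaces: the `agent/<id>` branch naming and the sibling-directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, agent_id: str, base: str = "HEAD") -> list[str]:
    """Build the git command that gives one agent its own branch and checkout."""
    branch = f"agent/{agent_id}"                      # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{agent_id}"   # hypothetical layout
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", branch, str(path), base]


def create_worktree(repo: Path, agent_id: str, base: str = "HEAD") -> None:
    # Runs git; raises CalledProcessError if the branch or path already exists.
    subprocess.run(worktree_command(repo, agent_id, base), check=True)
```

Each agent then works in its own directory on its own branch, so concurrent edits never touch the same checkout and integration happens through ordinary merges or pull requests.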
Theme 7: Evaluation Has a Framework Now (8/42 sources) ⭐⭐⭐⭐
Upgraded from “weakest link” to “framework exists, adoption lags.”
| Source | Contribution |
|---|---|
| evaluating-agent-skills-caparas | Three-tier framework (deterministic → LLM-judge → human) |
| ten-pillars-agentic-skill-design | Pillar 7. Acknowledged “no controlled study.” |
| anthropic-eval-guide | Success criteria, eval types, design principles |
| promptfoo | Open-source eval CLI, YAML test cases, CI/CD |
| humaneval-benchmark | Code generation: 164 problems, pass@k, 0% → 96.3% |
| swe-bench | Real-world SE: 2,294 GitHub issues, top 74.4% resolved |
| gaia-benchmark | General AI: 466 questions, humans 92% vs AI <50% |
| agentbench | Agent decision-making: 8 environments, multi-turn |
Key advance: The benchmark landscape now covers code generation (solved at 96%), real-world SE (rapidly improving at 74%), general reasoning (far from human at <50%), and interactive agents (commercial models well ahead of open-source). See agent-benchmarks for the full comparison. Still missing: skill-level eval, multi-agent coordination quality, memory quality benchmarks.
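The three-tier framework in the table (deterministic → LLM-judge → human) amounts to an escalation chain: cheap checks run first, and a case moves to a more expensive tier only when the cheaper one cannot decide. The sketch below assumes that reading. `evaluate`, `non_empty`, and the `None`-means-escalate convention are hypothetical, and the LLM judge and human reviewer are plain callables supplied by the caller rather than real API calls.

```python
from typing import Callable, Optional

# True = pass, False = fail, None = this tier cannot decide, so escalate.
Verdict = Optional[bool]
Tier = Callable[[str], Verdict]


def non_empty(output: str) -> Verdict:
    # Example Tier 1 rule: empty output fails outright; anything else escalates.
    return False if not output.strip() else None


def evaluate(output: str, tiers: list[tuple[str, Tier]]) -> tuple[str, bool]:
    """Run tiers in order (cheapest first) and stop at the first firm verdict."""
    for name, tier in tiers:
        verdict = tier(output)
        if verdict is not None:
            return name, verdict
    # No tier decided; treat as a failure that still needs attention.
    return "unresolved", False
```

Returning the name of the deciding tier makes it possible to track how often cases reach the human tier, which is the expensive one to be used sparingly.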
Theme 8: Open Standards Are Winning (5/42 sources) ⭐⭐⭐
Unchanged. MCP + Agent Skills as two-layer open substrate.
NEW Theme 9: Graphs Are Becoming the Consensus Orchestration Architecture (4/42 sources) ⭐⭐⭐
| Source | Evidence |
|---|---|
| langgraph-agent-orchestration | Built on graphs from day one. Most production-ready OSS. |
| autogen-multi-agent | Transitioning from GroupChat to graph-based MAF. |
| scion | Directed workflows for agent coordination. |
| kiro | Sub-agents coordinated through structured task graphs. |
Both AutoGen (via MAF) and LangGraph are converging on typed nodes + edges. The conversation-based approach (AutoGen v0.2 GroupChat) is being abandoned by its own creators. See multi-agent-framework-guide.
May 2026 nuance: gastown proves graphs aren’t the only path to production scale. Gas Town’s process-model (deterministic routing via external state, GUPP pull-based execution) scales to 20-30 agents without graphs. The graph convergence applies to framework-level orchestration; workspace-level orchestration may use different primitives entirely.
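To make "typed nodes + edges" concrete, here is a minimal graph runner in plain Python. It deliberately does not use the LangGraph or MAF APIs. `State`, `run_graph`, `END`, and the write/review example are hypothetical and only show the shape of the idea: nodes transform a typed state, and edges choose the next node from that state.

```python
from typing import Callable, TypedDict


class State(TypedDict):
    draft: str
    approved: bool
    revisions: int


Node = Callable[[State], State]
Router = Callable[[State], str]  # returns the next node name, or END
END = "__end__"


def run_graph(nodes: dict[str, Node], edges: dict[str, Router],
              start: str, state: State, max_steps: int = 20) -> State:
    current, steps = start, 0
    while current != END:
        if steps >= max_steps:
            raise RuntimeError("graph did not terminate within max_steps")
        state = nodes[current](state)     # node: state in, state out
        current = edges[current](state)   # edge: pick the next node
        steps += 1
    return state


def write(state: State) -> State:
    return {**state, "draft": state["draft"] + "!",
            "revisions": state["revisions"] + 1}


def review(state: State) -> State:
    # Toy rule: approve once the draft has been revised twice.
    return {**state, "approved": state["revisions"] >= 2}


final = run_graph(
    nodes={"write": write, "review": review},
    edges={"write": lambda s: "review",
           "review": lambda s: END if s["approved"] else "write"},
    start="write",
    state={"draft": "v", "approved": False, "revisions": 0},
)
```

The conditional edge out of `review` is what a conversation-based design leaves implicit: control flow lives in an explicit, inspectable routing function instead of in message content.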
NEW Theme 10: Token Economics Drive Architecture (5/42 sources) ⭐⭐⭐
| Source | Evidence |
|---|---|
| agent-cost-economics | 60-80% of tokens wasted. Five waste vectors. $5T infrastructure bet. |
| mem0-memory-management | 93% token reduction with selective retrieval. |
| efficient-memory-architectures | MemGPT: 90% token savings via OS-style paging. |
| context-management | Progressive disclosure, scoped instructions, deferred tools. |
| agent-memory-systems-2026 | Cost comparison across four memory patterns. |
Cost optimization is not a billing concern — it’s an architectural concern. Every memory, context, and orchestration decision has a direct token cost implication. See cost-optimization-guide.
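The reduction figures in the table translate directly into per-request cost. The arithmetic below uses the 93% and 90% figures from the sources; the function names are hypothetical, and any price passed to `cost_usd` is a placeholder rather than a quoted rate.

```python
def tokens_after_reduction(baseline_tokens: int, reduction_pct: int) -> int:
    """Tokens remaining after an integer-percent reduction (integer arithmetic)."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return baseline_tokens * (100 - reduction_pct) // 100


def cost_usd(tokens: int, usd_per_million: float) -> float:
    # usd_per_million is a placeholder price supplied by the caller.
    return tokens / 1_000_000 * usd_per_million
```

For example, a 100,000-token full-context request drops to 7,000 tokens at a 93% reduction and to 10,000 tokens at 90%, so whatever the unit price, the bill scales down by the same factor.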
NEW Theme 11: Governance Is the Next Frontier (4/42 sources) ⭐⭐⭐
| Source | Evidence |
|---|---|
| agentic-ai-governance | Five pillars. Shadow AI ($412K/yr). NIST AI Agent Standards Initiative. |
| agentic-ux-patterns | Six UX patterns as user-facing governance layer. |
| agentic-ai-non-code-domains | Regulated industries (healthcare, finance) demand governance for deployment. |
| agent-cost-economics | 68% of employees use AI without IT approval. |
Legacy security models fail for agents (speed, identity, permissions all different). Regulatory landscape crystallizing: NIST, EU AI Act, OWASP, Singapore framework. See governance-safety-overview.
NEW Theme 12: Agentic AI Is Expanding Beyond Code (3/42 sources) ⭐⭐⭐
| Source | Evidence |
|---|---|
| agentic-ai-non-code-domains | Six industries: finance (40-60% compliance reduction), healthcare, legal, manufacturing, telecoms, transport. |
| agent-cost-economics | Enterprise ARPU $450-500/mo. SaaS disruption. $24.2B raised in 2025. |
| llm-wiki-pattern | Applies to research, reading, business — anywhere knowledge accumulates. |
The wiki’s themes generalize across all industries. Main difference: non-code domains have higher stakes (healthcare hallucinations, legal liability). See beyond-code-industry-impact.
Theme Matrix (Updated May 2026)
| Theme | Sources | Strength | Change |
|---|---|---|---|
| Context is king | 15/42 | ⭐⭐⭐⭐⭐ | → Quantified (93% token reduction) |
| Composition over monoliths | 17/42 | ⭐⭐⭐⭐⭐ | ↑ Gas Town, Symphony, Multica all validate |
| Human in the loop (spectrum) | 13/42 | ⭐⭐⭐⭐⭐ | → Formalized as 6 UX patterns |
| Memory architectures | 10/42 | ⭐⭐⭐⭐⭐ | → “Understood” |
| Git as universal substrate | 9/42 | ⭐⭐⭐⭐⭐ | ↑↑ Gas Town is the most git-native tool in the wiki |
| Evaluation frameworks | 8/42 | ⭐⭐⭐⭐ | → “Framework exists” |
| Skills evolving into standard | 6/42 | ⭐⭐⭐⭐ | → Unchanged |
| Open standards winning | 5/42 | ⭐⭐⭐ | → Unchanged |
| Token economics drive architecture | 5/42 | ⭐⭐⭐ | → |
| Graph orchestration convergence | 4/42 | ⭐⭐⭐ | ~ Gas Town proves alternative path |
| Governance is next frontier | 4/42 | ⭐⭐⭐ | → |
| Expanding beyond code | 3/42 | ⭐⭐⭐ | → |