Key Insights: The Agentic AI Landscape (May 2026)
Synthesized from 42 sources across this wiki. This analysis captures the patterns, tensions, and emerging consensus visible when you look across the entire landscape.
Refresh history: Originally written against 16 sources (Apr 8-10). Refreshed Apr 15 against 33 sources. Refreshed May 9 against 42 sources — added Gas Town, Symphony, Multica; updated layers, multi-agent philosophies, and “What’s Still Missing.”
1. Seven Layers Have Emerged (was Six)
| Layer | Representatives | Core Bet |
|---|---|---|
| Company | paperclip | Org charts, budgets, governance, goal alignment |
| Methodology | spec-kit, bmad-method | Specs, plans, tasks, quality gates |
| Orchestration | gastown, symphony, multica | Workspace coordination, issue-to-agent automation, team collaboration |
| Infrastructure | scion, langgraph-agent-orchestration, autogen-multi-agent | Containers, runtimes, graph orchestration |
| Tool | claude-code, kiro, crewai-multi-agent | Agentic loop, skills, hooks, MCP |
| Pattern | fabric, agent-skills-standard, openai-swarm | Curated prompts, composable strategies, handoffs |
| Memory | mem0, agent-memory-persistence | Persistence, retrieval, forgetting, knowledge graphs |
New (May 2026): Orchestration is now its own layer between Methodology and Infrastructure. Gas Town (workspace/process-model), Symphony (spec/scheduler), and Multica (platform/teammates) represent three distinct approaches at this layer. The emerging stack: Paperclip → Multica → Gas Town → LangGraph/MAF → Claude Code/Kiro → Scion → Mem0.
2. The Autonomy–Interaction Spectrum Is Formalized
The spectrum now comes with measurable UX patterns (agentic-ux-patterns). From more human control to more agent autonomy:
| Autonomy level | Examples |
|---|---|
| Observe & Suggest | Scion |
| Plan & Propose | CrewAI |
| Act with Confirmation | Claude Code |
| Act Autonomously | Kiro, PAI |
The Autonomy Dial provides four levels with metrics: >85% acceptance rate, <5% reversion rate, >90% escalation recovery. Phased adoption: safety first → calibrated autonomy → proactive delegation.
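A minimal sketch of how the Autonomy Dial metrics could gate movement along the spectrum. The three thresholds come from the text above; the idea that they jointly gate promotion, along with the level identifiers, `AgentMetrics`, and `next_level`, are assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; all names below are hypothetical.
MIN_ACCEPTANCE = 0.85
MAX_REVERSION = 0.05
MIN_ESCALATION_RECOVERY = 0.90

# The four levels, ordered from most human control to most agent autonomy.
LEVELS = [
    "observe_and_suggest",
    "plan_and_propose",
    "act_with_confirmation",
    "act_autonomously",
]


@dataclass
class AgentMetrics:
    acceptance_rate: float       # share of agent actions the user accepted
    reversion_rate: float        # share of agent actions later undone
    escalation_recovery: float   # share of escalations resolved successfully


def meets_thresholds(m: AgentMetrics) -> bool:
    """True only when all three metrics clear their thresholds."""
    return (
        m.acceptance_rate > MIN_ACCEPTANCE
        and m.reversion_rate < MAX_REVERSION
        and m.escalation_recovery > MIN_ESCALATION_RECOVERY
    )


def next_level(current: str, m: AgentMetrics) -> str:
    """Advance one step on the dial if the metrics pass; otherwise stay put."""
    i = LEVELS.index(current)
    if meets_thresholds(m) and i < len(LEVELS) - 1:
        return LEVELS[i + 1]
    return current
```

Advancing one level at a time mirrors the phased adoption order above: an agent earns autonomy step by step and never skips a level.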
3. Two Open Standards + Graph Convergence
| Standard | What It Does | Adoption |
|---|---|---|
| mcp-protocol | Connects agents to external tools/data | Claude Code, Kiro, Fabric |
| agent-skills-standard | Packages reusable agent capabilities | Claude Code, agentskills.io |
New: Graph-based workflows are converging as the third standard — not a formal spec, but a consensus architecture. Both AutoGen (MAF) and LangGraph use typed nodes + edges. The conversation-based approach (GroupChat) is being abandoned by its own creators.
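To make "typed nodes + edges" concrete, here is a framework-free sketch in plain Python. It deliberately does not use the LangGraph or AutoGen APIs; `State`, `write`, `review`, `NODES`, `EDGES`, and `run` are hypothetical names, and real frameworks add conditional edges, checkpoints, and streaming on top of this shape.

```python
from typing import Callable, Dict, Optional, TypedDict


class State(TypedDict):
    """Typed state passed from node to node."""
    task: str
    draft: str
    approved: bool


Node = Callable[[State], State]


def write(state: State) -> State:
    # A real node would call a model; here we just fill in the draft.
    return {**state, "draft": f"draft for {state['task']}"}


def review(state: State) -> State:
    # Approve any non-empty draft (placeholder for a real review step).
    return {**state, "approved": bool(state["draft"])}


# Nodes are named functions; edges say which node runs next (None = stop).
NODES: Dict[str, Node] = {"write": write, "review": review}
EDGES: Dict[str, Optional[str]] = {"write": "review", "review": None}


def run(start: str, state: State) -> State:
    """Walk the graph from `start`, threading the state through each node."""
    current: Optional[str] = start
    while current is not None:
        state = NODES[current](state)
        current = EDGES[current]
    return state


result = run("write", {"task": "summarize", "draft": "", "approved": False})
```

The contrast with a conversation-based approach is that control flow lives in `EDGES`, which is data that can be inspected and tested, rather than in the turn-taking of a chat transcript.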
4. Progressive Disclosure Is Quantified
The consensus solution to context management now has benchmarks:
| Approach | Tokens | Accuracy | Source |
|---|---|---|---|
| Full-context (send everything) | ~26,000 | 72.9% | mem0-memory-management |
| Selective retrieval (Mem0) | ~1,800 | 66.9% | mem0-memory-management |
| Graph-enhanced (Mem0g) | ~1,800 | 68.4% | mem0-memory-management |
| MemGPT paging | ~1,000 | — | efficient-memory-architectures |
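About 93% fewer tokens (~1,800 vs ~26,000) for an accuracy drop of 4.5 to 6 percentage points, depending on whether the graph-enhanced variant is used. For any interactive agent, selective retrieval is the production-viable path.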
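The tradeoff can be checked directly from the table. This short calculation uses only the token counts and accuracy figures listed above; the variable names are illustrative.

```python
# Figures copied from the benchmark table above.
full_tokens, full_acc = 26_000, 72.9
selective_tokens, selective_acc = 1_800, 66.9
graph_acc = 68.4  # Mem0g, at the same ~1,800 tokens

token_reduction = 1 - selective_tokens / full_tokens
selective_drop = full_acc - selective_acc   # percentage points
graph_drop = full_acc - graph_acc           # percentage points

print(f"{token_reduction:.1%} fewer tokens")          # 93.1% fewer tokens
print(f"{selective_drop:.1f} points lost (Mem0)")     # 6.0 points lost (Mem0)
print(f"{graph_drop:.1f} points lost (Mem0g)")        # 4.5 points lost (Mem0g)
```

Note that the accuracy differences are percentage points, not relative percentages.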
5. Memory Is Understood (Was “Unsolved Frontier”)
Four architecture patterns documented (memory-architecture-comparison):
- Vector-Only: fast, no relationships (1/6 CMA compliance)
- Graph+Vector (mem0): relationships + semantic search (4/6 CMA)
- File+Database (llm-wiki-pattern): human-readable, git-friendly (2/6 CMA)
- Hierarchical (MemGPT, pai): mimics human cognition (5/6 CMA)
Six formal CMA requirements defined (continuum-memory-architectures). Standard RAG meets none. Forgetting is a design requirement, not a failure. Progression: start vector-only → add graph → add hierarchy → add forgetting.
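As an illustration of treating forgetting as a designed behavior rather than a failure, the sketch below expires memories that have not been touched for a configurable period. Time-based expiry is only one possible policy and is an assumption here; the CMA requirements themselves are not reproduced in this section, and `MemoryStore` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MemoryStore:
    """Toy key-value memory with explicit, time-based forgetting."""
    max_age: float  # seconds an untouched memory survives (hypothetical policy)
    _items: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def remember(self, key: str, value: str, now: float) -> None:
        self._items[key] = (value, now)

    def recall(self, key: str, now: float) -> Optional[str]:
        if key not in self._items:
            return None
        value, _ = self._items[key]
        self._items[key] = (value, now)  # reading refreshes the memory
        return value

    def forget(self, now: float) -> List[str]:
        """Drop every memory untouched for longer than max_age; return their keys."""
        expired = [k for k, (_, t) in self._items.items() if now - t > self.max_age]
        for k in expired:
            del self._items[k]
        return expired


store = MemoryStore(max_age=10)
store.remember("editor", "vim", now=0)
store.remember("old_ticket", "closed", now=0)
store.recall("editor", now=8)       # touched recently, so it survives
forgotten = store.forget(now=12)    # only the untouched entry expires
```

Passing `now` explicitly keeps the behavior deterministic and testable; a production store would more likely combine recency with importance or usage signals.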
6. Git Remains the Universal Coordination Mechanism
Unchanged. Every tool uses git. No one is building a custom protocol. But git only works for text-shaped artifacts.
7. The Skill Hierarchy Is Crystallizing
Unchanged. Fabric Patterns → Agent Skills Standard → Claude Code Skills → PAI Skills. PAI’s insight: CODE → CLI → PROMPT → SKILL (code before prompts).
8. Security Models Now Have a Governance Layer
Expanded from tool-level security to organizational governance (agentic-ai-governance):
| Layer | Approach | Source |
|---|---|---|
| Infrastructure | Container isolation | scion |
| Agent | Permission modes + classifier | claude-code |
| Output | Sandbox + PR-only | kiro |
| Policy | Hooks + allowlists | pai |
| Organization | Five pillars: inventory, identity, least privilege, observability, compliance | agentic-ai-governance |
| User-facing | Six UX patterns: intent preview, autonomy dial, rationale, confidence, audit, escalation | agentic-ux-patterns |
Shadow AI: 68% of employees use AI without IT approval. $412K/yr average cost. Regulatory landscape crystallizing (NIST, EU AI Act, OWASP, Singapore).
9. The “Personal AI” Vision Extends to Every Industry
Expanded from coding to six industries (agentic-ai-non-code-domains):
| Industry | Impact | Key Tension |
|---|---|---|
| Financial Services | 40-60% compliance time reduction | Regulatory compliance |
| Healthcare | Enormous potential | Hallucination = life-threatening |
| Professional Services | Existential SaaS disruption | Hourly-billing model at risk |
| Manufacturing | Adaptive vs rigid automation | Physical-digital convergence |
| Telecoms | 8-15% cost reductions | Underutilized data |
| Transportation | 12-20% delivery improvements | Clear quantifiable ROI |
The wiki’s themes generalize across all industries. Stakes are higher outside code. Domain expertise is the moat.
10. Evaluation Has Benchmarks Now (Was “Weakest Link”)
Upgraded with four standardized benchmarks (agent-benchmarks):
| Benchmark | What It Tests | Top Score | Human Baseline |
|---|---|---|---|
| humaneval-benchmark | Code generation | 96.3% | — |
| swe-bench | Real-world SE | 74.4% | — |
| gaia-benchmark | General AI assistant | <50% | 92% |
| agentbench | Interactive agents (8 envs) | Commercial » OSS | — |
Progression: code gen (solved) → real-world SE (improving fast) → general reasoning (far from human) → interactive agents (commercial leads). Still missing: skill-level eval, multi-agent quality, memory quality benchmarks.
11. Token Economics Are an Architectural Concern (NEW)
60-80% of agent tokens are waste (agent-cost-economics). Five waste vectors: file reading loops, retry tax, over-qualified models, no caching, context contamination.
$5T infrastructure bet with base case 3.2% ROI. Per-token costs falling 85% but total cost flat/increasing due to volume. Reasoning models use 8× more tokens. The industry’s viability depends on enterprise adoption at $450-500/mo ARPU.
Optimization is architecture: model routing (5-8× savings), prompt caching (90% discount), session discipline, selective retrieval (93% reduction). See cost-optimization-guide.
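A rough sketch of how model routing and prompt caching combine in a cost estimate. The 90% cache discount is the figure from the text; the per-token prices, the model tiers, the task categories, and the function names are all hypothetical placeholders, chosen so the price gap falls inside the 5-8× range mentioned above.

```python
# Hypothetical prices in USD per million input tokens (not real vendor pricing).
PRICES = {"small": 1.0, "large": 6.0}

# Prompt-caching discount quoted in the text.
CACHE_DISCOUNT = 0.90

# Hypothetical task kinds that justify the expensive model.
HARD_TASKS = {"architecture", "debugging"}


def route(task_kind: str) -> str:
    """Send only hard tasks to the large model; everything else goes small."""
    return "large" if task_kind in HARD_TASKS else "small"


def input_cost(model: str, tokens: int, cached_tokens: int = 0) -> float:
    """Cost in USD, billing cached tokens at the discounted rate."""
    price_per_token = PRICES[model] / 1_000_000
    fresh_tokens = tokens - cached_tokens
    return (
        fresh_tokens * price_per_token
        + cached_tokens * price_per_token * (1 - CACHE_DISCOUNT)
    )


# Example: a rename task with a fully cached 1M-token prompt.
model = route("rename")
cost = input_cost(model, tokens=1_000_000, cached_tokens=1_000_000)
```

Under these assumed prices, routing the rename task to the small model and hitting the cache costs roughly a sixtieth of sending the same uncached prompt to the large model, which is why the text treats optimization as an architectural decision rather than a tuning step.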
12. Multi-Agent Has Seven Philosophies Now (was Four) (NEW → EXPANDED)
Frameworks (composable, bring-your-own-model):
| Framework | Philosophy | Best For |
|---|---|---|
| autogen-multi-agent | Conversation | Research, prototyping |
| crewai-multi-agent | Role-based teams | Complex research |
| langgraph-agent-orchestration | State machine graphs | Production workflows |
| openai-swarm | Minimal handoffs | Simple routing, learning |
Orchestration tools (workspace/platform-level):
| Tool | Philosophy | Best For |
|---|---|---|
| gastown | Process-model (GUPP) | 20-30 parallel agents, crash-surviving state |
| symphony | Spec/protocol (WORKFLOW.md) | Issue-tracker-driven automation, minimal infra |
| multica | Platform (agents as teammates) | Team collaboration, compounding skills |
Key architectural split: Conversation-as-control (3-5 agents) vs process-model (20-30 agents). Gas Town is the only tool using deterministic routing via external state rather than LLM conversation for coordination. See orchestration-tools-compared and multi-agent-framework-guide.
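The difference between the two sides of this split can be sketched in a few lines. The example below shows deterministic routing through external state: a lookup table picks the agent, and the assignment is written to disk so it survives a crash. It is not Gas Town's actual implementation; the route table, file layout, and function names are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical label-to-agent table; no LLM is consulted when routing.
ROUTES: Dict[str, str] = {"bug": "fixer", "docs": "writer"}
DEFAULT_AGENT = "triage"


def load_state(state_file: Path) -> dict:
    """Read persisted assignments, or start empty if nothing was saved yet."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {}


def assign(state_file: Path, item_id: str, label: str) -> str:
    """Pick an agent by table lookup and persist the assignment to disk."""
    state = load_state(state_file)
    agent = ROUTES.get(label, DEFAULT_AGENT)
    state[item_id] = {"agent": agent, "status": "assigned"}
    state_file.write_text(json.dumps(state))
    return agent
```

Because the same label always yields the same agent and the state lives outside any agent's context window, a restarted process can call `load_state` and resume where it left off. In conversation-as-control, the equivalent information exists only in the chat history.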
13. Governance Requires Five Pillars (NEW)
From agentic-ai-governance: Agent Inventory → Agent Identity (NHI) → Dynamic Least Privilege → Continuous Observability → Continuous Compliance. Legacy security fails because agents violate every assumption (identity, permissions, behavior, speed, audit trail). Kill switches are non-negotiable. See governance-safety-overview.
14. What’s Still Missing
Gaps visible across 42 sources:
- Skill-level evaluation: benchmarks test whole models, not individual skills
- Multi-agent coordination quality: no benchmark for how well agents work together
- Memory quality benchmarks: LOCOMO is closest, but no standard for long-term memory accuracy
- Conflict resolution: when agents or memories contradict, no standard mechanism
- Cross-framework interoperability: MCP connects tools, but no standard for agent-to-agent handoff across frameworks
- Environmental impact: $5T infrastructure has energy implications (partially addressed by ai-environmental-impact)
- Harness engineering practices: Symphony/OpenAI coined the term but no comprehensive guide exists
- Cross-model adversarial review: Metaswarm pattern (writer ≠ reviewer) is promising but not yet in wiki as a source
- Content pipeline: turning wiki/agent knowledge into publishable content (the user’s stated goal)
Analysis based on 42 sources ingested between 2026-04-07 and 2026-05-09. Refreshed 2026-05-09.