Choosing a Multi-Agent Framework in 2026

Synthesized from 14 sources across this wiki (updated May 2026). This analysis compares multi-agent frameworks and orchestration tools, maps their coordination philosophies, and provides a practical decision framework.

The Landscape: Three Tiers

The wiki now covers fourteen distinct approaches to multi-agent orchestration, split across three tiers:

Product-level (opinionated, integrated):

Tool	Philosophy	Layer
scion	Infrastructure-first. Hypervisor for agents. Container isolation.	Infrastructure
kiro	Product-first. Frontier agent. Hours of autonomy. PR output.	Tool
claude-code	Tool-first. Subagents, MCP, skills, permission modes.	Tool
paperclip	Company-first. Org charts, budgets, governance above agents.	Company

Orchestration tools (workspace/platform-level, agent-agnostic):

Tool	Philosophy	Core Metaphor
gastown	Workspace-first. Git worktrees, merge queue, 20-30 agents. Process model.	Town with Mayor, Polecats, Refinery
symphony	Spec-first. Language-agnostic protocol. WORKFLOW.md. Per-issue workspaces.	Scheduler/runner daemon
multica	Platform-first. Agents as teammates. Compounding skills. Cloud-first.	Team collaboration board

Open-source frameworks (composable, bring-your-own-model):

Framework	Philosophy	Core Metaphor	Status
openai-agents-sdk	Handoffs + guardrails	Functions that return agents	✅ Active (26.2K stars)
google-adk	Workflow agents + transfer	Sequential/Parallel/Loop hierarchy	✅ Active (ADK 2.0 beta)
microsoft-agent-framework	Graph workflows	Typed nodes + edges	✅ Active (1.0 GA, LTS)
langgraph-agent-orchestration	Explicit state machines	Graph nodes + edges + checkpoints	✅ Active (v1.0)
crewai-multi-agent	Role-based teams	Specialized experts collaborate	✅ Active (v1.10+)
autogen-multi-agent	Conversation-based	Agents negotiate through dialogue	⚠️ Legacy (→ MAF)
openai-swarm	Minimal handoffs	Functions that return agents	⚠️ Legacy (→ Agents SDK)

These tiers are not competing; they operate at different levels. You might use LangGraph to orchestrate agents that run inside Claude Code, coordinated by Gas Town at the workspace level, with Paperclip managing company goals above.

The Seven Frameworks Compared

OpenAI Agents SDK ✅

Core idea: Production successor to Swarm. Agents are model configs with tools. Handoffs are functions that return another agent. Adds guardrails, tracing, MCP, and sandboxes.

26.2K GitHub stars — strong adoption
Python-only, OpenAI-model-locked
Guardrails: Input/output validation (Swarm had none)
Tracing: Native OpenTelemetry for debugging handoff decisions
MCP support: Native tool integration via Model Context Protocol
Sandboxes (May 2026): Isolated environments for code execution, file inspection, long-horizon tasks
Strengths: Simplest mental model, production-ready, excellent for OpenAI-only deployments
Weaknesses: No multi-provider support, no graph workflows, Python-only
Best for: Teams committed to OpenAI models wanting fast multi-agent prototypes that can go to production

Key signal: The handoff pattern from Swarm is now production-grade with proper guardrails and observability.

Google Agent Development Kit (ADK) ✅

Core idea: Model-agnostic framework with deterministic workflow agents (Sequential/Parallel/Loop) plus LLM-driven delegation. Native A2A + MCP interoperability.

Four languages: Python, TypeScript, Go, Java — broadest language support in the space
Model-agnostic: Gemini, Claude, Ollama, vLLM, LiteLLM (10+ providers)
Workflow agents: SequentialAgent, ParallelAgent, LoopAgent — deterministic orchestration without LLM overhead
Three interaction mechanisms: Shared state, LLM transfer, AgentTool wrapping
Native A2A: Expose/consume agents across frameworks via Agent-to-Agent protocol
Native MCP: Full tool integration
ADK 2.0 Beta: Graph-based workflows, collaborative agents, dynamic workflows
Built-in evaluation: Multi-turn eval datasets, local eval via CLI/UI
Strengths: Most polyglot, most interoperable, deterministic workflow control, built-in eval
Weaknesses: Google ecosystem bias in deployment (Cloud Run, GKE), newer community
Best for: Multi-language teams, model-agnostic deployments, teams needing A2A interop

Key signal: ADK 2.0 adding graph workflows validates the convergence thesis while keeping deterministic workflow agents as a unique differentiator.
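
The deterministic Sequential/Parallel/Loop idea can be sketched in plain Python. This is a minimal illustration, not ADK's API: the combinators sequential, parallel, and loop, and the step functions below, are hypothetical names, and each "agent" is reduced to a function over a shared state dict.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

State = dict
Step = Callable[[State], State]


def sequential(*steps: Step) -> Step:
    """Run steps in order, threading the shared state through each."""
    def run(state: State) -> State:
        for step in steps:
            state = step(state)
        return state
    return run


def parallel(*steps: Step) -> Step:
    """Run steps concurrently on copies of the state, then merge their outputs."""
    def run(state: State) -> State:
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda s: s(dict(state)), steps))
        merged = dict(state)
        for partial in results:
            merged.update(partial)
        return merged
    return run


def loop(step: Step, until: Callable[[State], bool], max_iterations: int = 5) -> Step:
    """Repeat a step until the condition holds or the iteration cap is hit."""
    def run(state: State) -> State:
        for _ in range(max_iterations):
            state = step(state)
            if until(state):
                break
        return state
    return run


# Hypothetical steps; a real system would call a model or a tool here.
def fetch_docs(state):
    return {**state, "docs": ["a", "b"]}

def fetch_issues(state):
    return {**state, "issues": ["#1"]}

def draft(state):
    summary = f"{len(state['docs'])} docs, {len(state['issues'])} issues"
    return {**state, "draft": summary, "revisions": 0}

def revise(state):
    return {**state, "revisions": state["revisions"] + 1}


pipeline = sequential(
    parallel(fetch_docs, fetch_issues),
    draft,
    loop(revise, until=lambda s: s["revisions"] >= 2),
)
result = pipeline({"topic": "release notes"})
```

The control flow here involves no model call at all, which is the point of workflow agents: the order, fan-out, and stopping condition are fixed in code, and only the individual steps would be LLM-driven.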

Microsoft Agent Framework 1.0 ✅

Core idea: Convergence of Semantic Kernel (foundation) + AutoGen (orchestration) into one production SDK with graph-based workflows, six providers, and LTS commitment.

1.0 GA: April 3, 2026 — first enterprise agent SDK with LTS
75K+ stars merged: Combined community of Semantic Kernel + AutoGen
.NET + Python: Same concepts, same API shape, idiomatic in each language
Six providers: Azure OpenAI, OpenAI, Claude, Bedrock, Gemini, Ollama — one-line swap
Graph workflows: Round-robin, supervisor, hierarchical, dynamic hand-off
Native MCP + A2A: Full interoperability at 1.0
DevUI: Browser-based local debugger (message graphs, tool invocations, token latency, orchestration decisions)
Strengths: Enterprise-grade, LTS, broad provider support (six, one-line swap), excellent debugging, .NET first-class
Weaknesses: Microsoft ecosystem bias, newer than LangGraph for graph workflows
Best for: Enterprise teams, Azure-centric stacks, .NET shops, teams needing LTS guarantees

Key signal: AutoGen and Semantic Kernel are now officially legacy. MAF is the canonical Microsoft path through 2027+.

LangGraph (LangChain) ✅

Core idea: Agent workflows as explicit state-machine graphs. Nodes are LLM calls or tools. Edges define transitions.

v1.0 shipped (2026) — mature, battle-tested
Checkpointing: Pause/resume at any point, survive process restarts. The killer feature.
Human-in-the-loop: First-class at any node (not bolted on)
Conditional edges: Branching based on state (deterministic or LLM-decided)
Cycles: Graphs can loop for iterative refinement
Streaming: Intermediate results as workflow progresses
LangSmith integration: Tracing and debugging
Production users: Klarna, Replit, Elastic, Uber, LinkedIn, GitLab
Strengths: Most battle-tested graph framework, explicit control, error recovery, checkpointing
Weaknesses: Steeper learning curve, LangChain ecosystem coupling, Python/TypeScript only
Best for: Production stateful workflows, anything needing pause/resume or human approval gates

Key insight: LangGraph is where you go when “it works in a demo” needs to become “it works in production.” Now at v1.0 with proven enterprise adoption.
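
To make the state-machine and checkpointing ideas concrete, here is a minimal sketch in plain Python. It is not LangGraph's API: the Graph class, the pause_before parameter, and the dict used as a checkpoint store are all hypothetical, and the nodes are ordinary functions standing in for LLM or tool calls.

```python
import json

END = "__end__"


class Graph:
    def __init__(self):
        self.nodes = {}   # name -> function(state) -> state
        self.edges = {}   # name -> function(state) -> next node name

    def add_node(self, name, fn, next_node):
        """next_node is a fixed name, or a function of state (conditional edge)."""
        self.nodes[name] = fn
        if callable(next_node):
            self.edges[name] = next_node
        else:
            self.edges[name] = lambda _state, n=next_node: n

    def run(self, start, state, store, thread_id, pause_before=()):
        if thread_id in store:  # resume from the last checkpoint
            saved = json.loads(store[thread_id])
            current, state = saved["next"], saved["state"]
        else:
            current = start
        while current != END:
            if current in pause_before:  # human approval gate
                return {"status": "paused", "next": current, "state": state}
            state = self.nodes[current](state)
            current = self.edges[current](state)
            # Checkpoint after every node so a restart loses at most one step.
            store[thread_id] = json.dumps({"next": current, "state": state})
        return {"status": "done", "next": END, "state": state}


def draft(s):
    return {**s, "version": s.get("version", 0) + 1}

def review(s):
    return {**s, "approved": s["version"] >= 2}

def publish(s):
    return {**s, "published": True}


g = Graph()
g.add_node("draft", draft, "review")
# Conditional edge that can cycle back to "draft" for another revision.
g.add_node("review", review, lambda s: "publish" if s["approved"] else "draft")
g.add_node("publish", publish, END)

store = {}  # stand-in for a durable store such as a database or a file
first = g.run("draft", {}, store, "thread-1", pause_before={"publish"})
second = g.run("draft", {}, store, "thread-1")  # picks up at "publish"
```

The first call stops before the publish node and the second call continues from the stored checkpoint rather than from the start. Swapping the dict for a durable store is what would let the same pattern survive a process restart.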

CrewAI ✅

Core idea: Agents defined by role + goal + backstory, organized into “crews” with process strategies.

v1.10+ with A2A/MCP support, Flows, Enterprise platform
Four abstractions: Agent, Task, Crew, Process
Process strategies: Sequential or Hierarchical (manager delegates)
Built-in memory: Short-term, long-term, entity memory
Flows: Structured workflow orchestration (added 2025)
Role + backstory: Persona-based context-management — shapes agent behavior
Strengths: Most intuitive team metaphor, built-in memory, A2A/MCP support, Enterprise platform
Weaknesses: Less explicit control than graphs, process strategies are coarse-grained
Best for: Complex research tasks, content creation, multi-perspective analysis, rapid prototyping

Key insight: CrewAI’s backstory pattern remains the most accessible way to implement persona-based agents. Now with A2A/MCP interop it’s no longer isolated.
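
The role + goal + backstory pattern and the sequential process can be sketched without any framework. The code below is an illustration under stated assumptions, not CrewAI's API: RoleAgent, Task, run_sequential, and the stub model function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoleAgent:
    role: str
    goal: str
    backstory: str

    def system_prompt(self) -> str:
        # The persona is just structured text placed in the system prompt.
        return f"You are {self.role}. {self.backstory} Your goal: {self.goal}"


@dataclass
class Task:
    description: str
    agent: RoleAgent


def run_sequential(tasks: list, llm: Callable[[str, str], str]) -> list:
    """Sequential process: each task sees the previous task's output as context."""
    outputs = []
    context = ""
    for task in tasks:
        user_prompt = task.description
        if context:
            user_prompt += f"\n\nContext:\n{context}"
        context = llm(task.agent.system_prompt(), user_prompt)
        outputs.append(context)
    return outputs


calls = []

def stub_llm(system: str, user: str) -> str:
    """Hypothetical stand-in for a model call; records prompts, returns a canned line."""
    calls.append((system, user))
    return f"{system.split('.')[0]} handled: {user.splitlines()[0]}"


researcher = RoleAgent("a senior researcher", "find primary sources",
                       "You spent a decade in academic libraries.")
writer = RoleAgent("a technical writer", "summarize clearly",
                   "You write for busy engineers.")

outputs = run_sequential(
    [Task("Collect sources on A2A adoption", researcher),
     Task("Summarize the findings", writer)],
    stub_llm,
)
```

Replacing stub_llm with a real model client is the only change needed to try the pattern; the persona text and the hand-over of context between tasks stay the same.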

AutoGen (Microsoft) ⚠️ Legacy

Core idea: Agents communicate through structured multi-turn conversations.

56.8K GitHub stars — largest historical community
⚠️ Now in maintenance mode: Development shifted to microsoft-agent-framework
Magentic-One: Generalist agent team still available via CLI: m1 "task"
Best for: Existing codebases not yet migrated. Plan migration during 2026.

OpenAI Swarm ⚠️ Legacy

Core idea: Agents are system prompts with functions. Handoffs are functions that return another agent.

⚠️ Superseded by openai-agents-sdk: Swarm remains educational only
Two primitives: Routines + Handoffs — still the best way to learn multi-agent patterns
Best for: Learning. Build it yourself in 50 lines to understand the primitives.
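
A build-it-yourself version of the handoff primitive might look like the sketch below. It is not Swarm's or the Agents SDK's code: the Agent dataclass, the run loop, and fake_model (a keyword matcher standing in for the model's tool choice) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    instructions: str  # would be the system prompt in a real system
    tools: list = field(default_factory=list)  # callables: str -> str or Agent


def fake_model(agent: Agent, message: str):
    """Stand-in for the LLM: pick the first tool whose docstring keyword
    appears in the message. A real system lets the model choose the tool."""
    for tool in agent.tools:
        if tool.__doc__ and tool.__doc__.strip().lower() in message.lower():
            return tool
    return None


def run(agent: Agent, message: str, max_handoffs: int = 5):
    """Loop until a tool returns text; a returned Agent means 'hand off'."""
    trail = [agent.name]
    for _ in range(max_handoffs + 1):
        tool = fake_model(agent, message)
        if tool is None:
            return f"[{agent.name}] {agent.instructions}", trail
        result = tool(message)
        if isinstance(result, Agent):  # handoff: swap the active agent
            agent = result
            trail.append(agent.name)
            continue
        return f"[{agent.name}] {result}", trail
    raise RuntimeError("too many handoffs")


def issue_refund(message: str) -> str:
    """refund"""
    return "Refund issued."


billing = Agent("billing", "I handle billing questions.", [issue_refund])


def transfer_to_billing(message: str) -> Agent:
    """refund"""
    return billing  # the handoff is nothing more than returning another agent


triage = Agent("triage", "How can I help?", [transfer_to_billing])

reply, trail = run(triage, "I want a refund")
```

The whole control plane is the isinstance check in run: when a tool returns an Agent, the loop continues with that agent as the active one. The max_handoffs cap is a design choice added here to stop two agents from passing a request back and forth indefinitely.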

Decision Matrix

Factor	Agents SDK	Google ADK	MAF 1.0	LangGraph	CrewAI
Production readiness	★★★	★★★	★★★	★★★	★★
Ease of getting started	★★★	★★	★★	★★	★★★
Explicit control	★★ (handoffs)	★★★ (workflow agents)	★★★ (graph edges)	★★★ (graph edges)	★★ (process)
State persistence	★★ (session)	★★★ (state + memory)	★★★ (agent state)	★★★ (checkpointed)	★★ (three types)
Human-in-the-loop	★★ (approvals)	★★★ (PolicyEngine)	★★ (graph nodes)	★★★ (any node)	★★ (delegation)
Model support	★ (OpenAI only)	★★★ (10+ providers)	★★★ (6 providers)	★★★ (any via LangChain)	★★★ (any)
Language support	★ (Python)	★★★ (Py/TS/Go/Java)	★★ (.NET/Python)	★★ (Python/TS)	★ (Python)
Interop (MCP/A2A)	★★ (MCP only)	★★★ (MCP + A2A)	★★★ (MCP + A2A)	★★ (MCP)	★★ (MCP + A2A)
Debugging	★★ (tracing)	★★ (Dev UI)	★★★ (DevUI)	★★★ (LangSmith)	★★
Enterprise/LTS	★	★★	★★★ (LTS)	★★	★★ (Enterprise)
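
One way to use the matrix is to weight the factors that matter for a given project and sum the stars. The sketch below transcribes the star counts from the table above; the weights and the rank function are hypothetical, and the totals are only as meaningful as the star ratings themselves.

```python
FRAMEWORKS = ["Agents SDK", "Google ADK", "MAF 1.0", "LangGraph", "CrewAI"]

# Star counts per factor, in the same column order as the table.
STARS = {
    "Production readiness":    [3, 3, 3, 3, 2],
    "Ease of getting started": [3, 2, 2, 2, 3],
    "Explicit control":        [2, 3, 3, 3, 2],
    "State persistence":       [2, 3, 3, 3, 2],
    "Human-in-the-loop":       [2, 3, 2, 3, 2],
    "Model support":           [1, 3, 3, 3, 3],
    "Language support":        [1, 3, 2, 2, 1],
    "Interop (MCP/A2A)":       [2, 3, 3, 2, 2],
    "Debugging":               [2, 2, 3, 3, 2],
    "Enterprise/LTS":          [1, 2, 3, 2, 2],
}


def rank(weights: dict) -> list:
    """Weighted sum of stars; factors missing from weights count as zero."""
    totals = {name: 0.0 for name in FRAMEWORKS}
    for factor, weight in weights.items():
        for name, stars in zip(FRAMEWORKS, STARS[factor]):
            totals[name] += weight * stars
    # Highest total first; ties broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical priorities for a team that needs approval gates and durable state.
ranking = rank({
    "State persistence": 3,
    "Human-in-the-loop": 3,
    "Production readiness": 2,
})
```

With these example weights, Google ADK and LangGraph tie at the top, which is consistent with the table giving both three stars on state persistence and human-in-the-loop.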

When to Use What

Learning multi-agent patterns?        → Swarm (simplest mental model, educational)
OpenAI-only, fast to production?      → OpenAI Agents SDK (handoffs + guardrails)
Multi-language team?                  → Google ADK (Python/TS/Go/Java)
Model-agnostic, interop needed?       → Google ADK (A2A + MCP native)
Enterprise, Azure/.NET?               → Microsoft Agent Framework 1.0 (LTS)
Production stateful workflows?        → LangGraph (checkpointing, human-in-loop)
Quick multi-perspective prototype?    → CrewAI (role + backstory, intuitive)
Need human approval gates?            → LangGraph or ADK (PolicyEngine)
20-30 parallel coding agents?         → Gas Town (process-model, merge queue)
Issue-tracker automation?             → Symphony (Linear → Codex, WORKFLOW.md)
Agents as teammates on a board?       → Multica (11 runtimes, compounding skills)
Company-level governance?             → Paperclip (org charts, budgets)

Progression Path

Learn: Build a Swarm-style handoff system to understand the primitives
Prototype: CrewAI for quick multi-agent prototypes (intuitive team metaphor)
Ship (simple): OpenAI Agents SDK if you’re OpenAI-only and want minimal ceremony
Ship (complex): LangGraph or MAF when you need checkpointing, graphs, or human-in-the-loop
Scale (multi-language): Google ADK when your team spans Python/TypeScript/Go/Java
Scale (parallel): Gas Town for 20-30 parallel agents with merge queue
Automate: Symphony to turn issue tracker work into autonomous agent runs
Collaborate: Multica when your team (humans + agents) needs shared visibility
Govern: Paperclip for company-level orchestration with budgets and accountability

The Graph Convergence (Confirmed)

The most significant finding across all sources: graph-based workflows are the consensus architecture for production multi-agent systems. What was a thesis in April is now confirmed:

Framework	Graph Status
langgraph-agent-orchestration	Built on graphs from day one (v1.0 shipped)
microsoft-agent-framework	Graph workflows as core architecture (1.0 GA)
google-adk	ADK 2.0 beta adding graph-based workflows
autogen-multi-agent	Abandoned GroupChat for graph-based MAF
scion	Directed workflows for agent coordination
kiro	Structured task graphs internally

Why graphs win:

Explicit: You define the flow, not the conversation
Debuggable: Visualize the graph, trace execution path
Checkpointable: Pause/resume at any node
Composable: Subgraphs as reusable components
Enforceable: Security invariants at the graph level

Exception: Gas Town’s process-model proves graphs aren’t the only path to scale. External state coordination (Dolt/Git) enables 20-30 agents without graph overhead.
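
The process-model alternative can be illustrated with a pull-based work board held in external state. This sketch swaps in SQLite where the text mentions Dolt/Git, purely to stay self-contained; it is not Gas Town's implementation, and the table layout and function names are hypothetical.

```python
import sqlite3


def open_board(path: str = ":memory:") -> sqlite3.Connection:
    """Open the shared board. A file path instead of :memory: would persist
    the state across agent crashes and restarts."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS work (
               id INTEGER PRIMARY KEY,
               task TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'ready',
               owner TEXT)"""
    )
    return db


def claim(db: sqlite3.Connection, agent: str):
    """Claim the oldest ready item; returns (id, task) or None."""
    with db:  # commits on success, rolls back on error
        row = db.execute(
            "SELECT id, task FROM work WHERE status = 'ready' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = db.execute(
            "UPDATE work SET status = 'claimed', owner = ? "
            "WHERE id = ? AND status = 'ready'",
            (agent, row[0]),
        )
        # The status guard plus rowcount check lets a slower claimer lose cleanly.
        return row if cur.rowcount == 1 else None


def finish(db: sqlite3.Connection, item_id: int) -> None:
    with db:
        db.execute("UPDATE work SET status = 'done' WHERE id = ?", (item_id,))


db = open_board()
with db:
    db.executemany(
        "INSERT INTO work (task) VALUES (?)",
        [("fix-login",), ("add-tests",), ("bump-deps",)],
    )

log = []
agents = ["agent-a", "agent-b"]
turn = 0
# Agents take turns pulling work; nothing routes tasks to them.
while (item := claim(db, agents[turn % 2])) is not None:
    log.append((agents[turn % 2], item[1]))
    finish(db, item[0])
    turn += 1

done = db.execute("SELECT COUNT(*) FROM work WHERE status = 'done'").fetchone()[0]
```

No graph or router appears anywhere: coordination comes from agents pulling from shared state, so adding a worker means starting another process that calls claim.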

The Protocol Layer (New in 2026)

A major development since April: the interoperability story is crystallizing around three protocols:

Protocol	Layer	Purpose	Adopted By
MCP (Model Context Protocol)	Tools	Connect agents to external tools/data	All frameworks
A2A (Agent-to-Agent)	Agents	Cross-framework agent communication	ADK, MAF, CrewAI
AG-UI (Agent-User Interaction)	Frontend	Agent-to-user interaction in UIs	CopilotKit, LlamaIndex

Key insight: MCP won for tools. A2A is emerging for agent interop. AG-UI addresses the agent-to-frontend layer. These are complementary layers, not competitors. The “Will MCP become the standard?” question from April is partially answered: MCP is the tool layer; A2A is the agent layer.

Frameworks with native A2A support can coordinate agents across different runtimes — a Google ADK agent can collaborate with a Microsoft Agent Framework agent via structured protocol messaging.
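
For a sense of what the tool layer looks like on the wire, the sketch below builds and reads an MCP tool call as a JSON-RPC 2.0 message. It assumes the tools/call request and text-content result shapes from the MCP specification, which should be checked against the current revision; the tool name search_wiki, its arguments, and the canned response are hypothetical. The A2A envelope is not shown.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def mcp_tool_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def read_tool_result(raw: str) -> str:
    """Extract the text content from a tools/call response, or raise."""
    msg = json.loads(raw)
    if "error" in msg:  # protocol-level failure
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    result = msg["result"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool reported an error")
    return "".join(
        part["text"] for part in result["content"] if part.get("type") == "text"
    )


request = mcp_tool_call("search_wiki", {"query": "A2A adoption"})

# Canned response standing in for what a server might send back.
response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "3 pages found"}], "isError": False},
})
text = read_tool_result(response)
```

Because the message is plain JSON-RPC, any framework that can send this request can use the same tool server, which is the sense in which MCP acts as a shared tool layer.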

How Product-Level Tools Relate

Tool	What It Provides	Framework Complement
claude-code	The agent itself (LLM + tools + skills)	LangGraph/ADK orchestrate multiple instances
kiro	Autonomous frontier agent	Could be a node in a LangGraph/MAF workflow
scion	Container isolation + lifecycle	Provides the runtime for any framework’s agents
gastown	Workspace orchestration + merge queue	Coordinates Claude Code/Codex/Copilot (20-30 agents)
symphony	Issue-to-agent automation	Reads Linear, spawns Codex sessions per issue
multica	Team collaboration platform	Assigns issues to agents, compounds skills (11 runtimes)
paperclip	Company-level governance	Sits above frameworks, manages goals and budgets

The emerging stack:

Paperclip (company goals/governance)
    → Multica (team collaboration + skill compounding)
        → Gas Town (workspace orchestration + merge queue)
            → LangGraph / MAF / ADK (workflow graphs)
                → Claude Code / Kiro / Codex (individual agents)
                    → Scion (infrastructure isolation)

The Architectural Split

Architecture	Control Plane	Scale	Tools
Handoff-based	Function returns	2-5 agents	Agents SDK, Swarm
Conversation-as-control	LLM routes via messages	3-5 agents	AutoGen, CrewAI
Workflow-agent	Deterministic Sequential/Parallel/Loop	5-15 agents	Google ADK
Graph-as-control	Explicit edges + conditions	5-15 agents	LangGraph, MAF
Process-model	Deterministic routing via external state	20-30 agents	Gas Town
Issue-tracker-driven	Tracker polls + workspace isolation	Bounded (default 10)	Symphony
Platform-driven	Web UI + daemon dispatch	Runtime-bound	Multica
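
The "Bounded (default 10)" entry for the issue-tracker-driven row amounts to a concurrency cap on dispatched agent runs. A minimal sketch of that idea with asyncio follows; it is not Symphony's implementation, and run_agent, dispatch, and the issue identifiers are hypothetical.

```python
import asyncio


async def run_agent(issue: str, stats: dict) -> str:
    """Hypothetical stand-in for an agent session in its own workspace."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # pretend to work on the issue
    stats["active"] -= 1
    return f"{issue}: done"


async def dispatch(issues: list, max_concurrent: int = 10):
    """Run one agent per issue, never more than max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}

    async def bounded(issue: str) -> str:
        async with limit:  # waits here while the cap is reached
            return await run_agent(issue, stats)

    results = await asyncio.gather(*(bounded(i) for i in issues))
    return results, stats["peak"]


issues = [f"ISSUE-{n}" for n in range(25)]
results, peak = asyncio.run(dispatch(issues, max_concurrent=10))
```

All 25 issues are processed, but the number of simultaneously active runs never exceeds the cap. A real daemon would poll the tracker in a loop and feed new issues into the same bounded dispatcher.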

Coordination Patterns Across All Fourteen Approaches

Pattern	Who Uses It	How
Git-based coordination	Scion, Kiro, Claude Code, Gas Town, Symphony	Worktrees/branches per agent, PRs as output
Graph-based	LangGraph, MAF, ADK 2.0, Scion	Explicit nodes + edges + conditions
Workflow agents	Google ADK	Sequential/Parallel/Loop deterministic control
Conversation-based	AutoGen, Claude Code (subagents)	Agents negotiate through dialogue
Role-based delegation	CrewAI, Paperclip, Gas Town	Specialized agents with defined responsibilities
Function handoffs	Agents SDK, Swarm, Claude Code (tool use)	Functions transfer control between agents
Process-model (GUPP)	Gas Town	Pull-based: work on hook → agent executes
Issue-tracker polling	Symphony, Multica	Daemon reads issues, dispatches agents per task
Container isolation	Scion	Each agent in its own container
Permission modes	Claude Code	Configurable dial from full control to full autonomy
A2A protocol	ADK, MAF, CrewAI	Cross-framework agent communication
Compounding skills	Multica	Solutions become reusable team capabilities

The Multi-Agent Memory Problem

Multi-agent systems multiply the agent-memory-persistence challenge:

Shared memory: How do agents share what they’ve learned? CrewAI has built-in cross-agent memory. LangGraph uses checkpointed state. ADK has session state + memory service. Agents SDK has session-based state.
Conflicting memories: When Agent A and Agent B learn contradictory facts, who wins? No framework has a standard resolution mechanism.
Cascading permissions: When Agent A delegates to Agent B, does B inherit A’s full memory access? (agentic-ai-governance flags this as a key risk)
Cost multiplication: Each agent consumes tokens independently. Poor memory management across N agents means N× the waste (agent-cost-economics).
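
One possible answer to the cascading-permissions question is to hand the delegate a read-only, filtered view instead of the full memory. The sketch below illustrates that option only; it is not a mechanism from any of the frameworks above, and AgentMemory, view_for_delegate, and the sample facts are hypothetical.

```python
from types import MappingProxyType


class AgentMemory:
    def __init__(self, facts: dict):
        self._facts = dict(facts)

    def view_for_delegate(self, allowed_keys: set) -> MappingProxyType:
        """Read-only snapshot exposing only the keys the delegate may see."""
        visible = {k: v for k, v in self._facts.items() if k in allowed_keys}
        return MappingProxyType(visible)


agent_a = AgentMemory({
    "customer_tier": "gold",
    "open_ticket": "T-17",
    "api_token": "secret",
})

# Agent A delegates to Agent B with an explicit allow-list.
view = agent_a.view_for_delegate({"customer_tier", "open_ticket"})

try:
    view["api_token"] = "stolen"  # mappingproxy rejects item assignment
    writable = True
except TypeError:
    writable = False
```

The delegate sees only the allow-listed keys and cannot write back through the view. The trade-off is that Agent A must decide up front what Agent B needs, which is the same least-privilege judgment that applies to tool permissions.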

Production Challenges (Common Across All)

Non-determinism: Same input → different agent dialogues → different outcomes. Testing is hard.
Debugging complexity: Tracing failures across multiple agents. MAF’s DevUI and LangGraph’s LangSmith help most here.
Context switching overhead: Maintaining coherence as control passes between agents.
Cost scaling: More agents = more tokens. Model routing becomes critical.
Emergent behavior: Individual agents within guardrails can produce unanticipated combined outcomes.
Interop friction: Despite A2A/MCP, cross-framework coordination is still early.
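
For the non-determinism problem, one common approach is to assert invariants over many runs instead of comparing exact outputs. The sketch below shows the shape of such a test; flaky_pipeline is a hypothetical stand-in for a real multi-agent run, and the invariants and token budget are example values.

```python
import random


def flaky_pipeline(question: str, rng: random.Random) -> dict:
    """Hypothetical stand-in for a multi-agent run whose route and cost vary."""
    path = rng.choice([["triage", "billing"], ["triage", "support", "billing"]])
    tokens = 200 * len(path) + rng.randint(0, 50)
    return {"answer": f"Refund approved for {question}", "path": path, "tokens": tokens}


def violations(run: dict, token_budget: int = 1000) -> list:
    """Check properties that must hold on every run, whatever route was taken."""
    problems = []
    if "refund" not in run["answer"].lower():
        problems.append("answer does not address the refund")
    if run["path"][-1] != "billing":
        problems.append("final agent is not billing")
    if run["tokens"] > token_budget:
        problems.append("token budget exceeded")
    return problems


def pass_rate(runs: int = 50, seed: int = 0) -> float:
    rng = random.Random(seed)  # seeded so the test itself is repeatable
    clean = sum(
        1 for _ in range(runs) if not violations(flaky_pipeline("order 42", rng))
    )
    return clean / runs


rate = pass_rate()
```

The test tolerates different routes and token counts as long as the invariants hold. Against a real system the pass rate would typically be compared with a threshold rather than required to be perfect.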

Recommendations

If you’re new to multi-agent: Start with Swarm’s handoff pattern. Build it yourself in 50 lines. Understand the primitives before adopting a framework.
If you need a quick prototype: CrewAI. Define roles, backstories, tasks. Sequential process. Working multi-agent system in an afternoon.
If you’re OpenAI-only and want production: OpenAI Agents SDK. Handoffs + guardrails + tracing with minimal ceremony.
If you need multi-language support: Google ADK. Python/TypeScript/Go/Java with the same concepts across all four.
If you need cross-framework interop: Google ADK or Microsoft Agent Framework. Both have native A2A + MCP.
If you’re going to production with complex workflows: LangGraph. Checkpointing, human-in-the-loop, explicit graph control, and proven enterprise adoption (Klarna, Replit, Uber).
If you’re on Azure/.NET: Microsoft Agent Framework 1.0. LTS commitment, DevUI, six providers, graph workflows. The canonical choice through 2027.
If you need company-level orchestration: paperclip above whatever framework you choose.
If you need 20-30 parallel agents: gastown. Process-model with crash-surviving state and merge queue.
If you want issue-tracker-driven automation: symphony. Minimal infrastructure, WORKFLOW.md as policy-as-code.
If you want agents as teammates: multica. Cloud-first platform with compounding skills and support for 11 runtimes.
For everyone: Plan for the graph convergence. Even if you start with CrewAI or Agents SDK, your production system will likely end up as a graph. But note: Gas Town’s process-model proves graphs aren’t the only path to scale.

Open Questions

Will A2A achieve the same ubiquity as MCP, or will it fragment?
Can CrewAI’s intuitive role metaphor be preserved within a graph-based architecture?
How should multi-agent memory be shared without cascading errors?
What’s the right granularity for task decomposition across agents?
Will the product-level tools (Claude Code, Kiro) eventually embed framework-level orchestration natively?
Will Google ADK’s four-language approach force other frameworks to expand language support?
How will the LTS commitment from Microsoft affect framework choice in regulated industries?
Will cross-model adversarial review (Metaswarm pattern) become standard for trust?
Can Multica’s compounding skills scale to large teams?
Will AG-UI become the standard for agent-to-frontend communication?

Analysis based on 14 sources ingested into this wiki between 2026-04-07 and 2026-05-11. Updated May 2026 with OpenAI Agents SDK, Google ADK, Microsoft Agent Framework 1.0, and the A2A/MCP protocol landscape. See orchestration-tools-compared for the Gas Town/Symphony/Multica head-to-head.
