The Ten Pillars of Agentic Skill Design#

Original

Author: Ian Forster (with support from Kiro) Version: 2.0 | November 2024

Summary#

A research paper proposing a comprehensive ten-pillar framework for designing agentic skills files — the modular extensions that encapsulate domain knowledge, workflows, and tool integrations for AI agents. Synthesizes software engineering principles, prompt engineering best practices, and analysis of 4,476+ GitHub repositories to address the lack of standardized design methodologies.

The Ten Pillars#

  1. Architecture and Structure — Metadata, interfaces, core logic, workflows, configuration. Hierarchical separation of concerns following mcp-protocol server architecture.
  2. Documentation Clarity — Self-documenting skills: purpose, usage triggers, examples, limitations, dependencies.
  3. Scope Definition — Single Responsibility Principle. One domain per skill. Composition over inheritance.
  4. Modularity and Reusability — Composable units. Include directives, skill libraries, dependency injection.
  5. Prompt Engineering — Chain of Thought, ReAct pattern, self-reflection (Reflexion). System messages, stepwise instructions, few-shot examples.
  6. Tool Integration and Security — mcp-protocol patterns. Defense-in-depth: credential management, input validation, sandboxing, human-in-the-loop, prompt injection defenses.
  7. Testing, Validation, and Observability — Unit/integration tests. AgentOps observability: execution traces, model interactions, performance metrics, anomaly detection, quality monitoring.
  8. Version Control and Maintenance — Semantic versioning. Changelogs. Dependency management. Compatibility matrices.
  9. Performance Optimization — Token optimization. Context management recipes: chunking, progressive summarization, selective context loading, agent persona context templates. Caching patterns.
  10. Anti-Patterns — Monolithic skills, hard-coded config, overly generic prompts, missing error handling, ignoring token limits, poor tool integration, lack of testing.
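Pillars 1–4 (and the semantic versioning of pillar 8) can be illustrated with a minimal skill descriptor. The dataclass, field names, and compatibility check below are illustrative assumptions, not structures defined by the paper:

```python
from dataclasses import dataclass, field

@dataclass
class SkillMetadata:
    """Hypothetical descriptor sketching pillars 1-4: explicit structure,
    self-documentation, single-domain scope, and composable dependencies."""
    name: str
    version: str      # semantic version, e.g. "2.1.0" (pillar 8)
    domain: str       # exactly one domain per skill (pillar 3)
    purpose: str      # when and why the skill should be invoked (pillar 2)
    dependencies: list[str] = field(default_factory=list)  # composition (pillar 4)

    def is_compatible(self, required_major: int) -> bool:
        # Semantic versioning: same major version implies compatibility.
        return int(self.version.split(".")[0]) == required_major

# Example: a hypothetical document-processing skill.
pdf_skill = SkillMetadata(
    name="pdf-extractor",
    version="2.1.0",
    domain="document-processing",
    purpose="Extract text and tables from PDF files",
    dependencies=["file-io"],
)
```

In this sketch, the metadata is machine-checkable: a loader could reject a skill whose major version or declared domain does not match what the pipeline expects.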

Key Findings#

  • Standardization matters: MCP's emergence as a universal protocol demonstrates the need for standards; skills built on standard protocols show better interoperability.
  • Modular > monolithic: MASAI achieved 28.33% on SWE-bench Lite by decomposing into specialized sub-agents. Validates pillars 3-4.
  • Sometimes agents aren’t needed: AGENTLESS achieved 27% on SWE-bench with a simple three-phase pipeline. Skills should match complexity to task requirements.
  • Trajectory efficiency: LLMs often generate unnecessary reasoning steps. Skills should implement early stopping.
  • Self-reflection improves performance: Reflexion and ReAct frameworks show feedback loops and self-correction significantly help.
  • Context management is critical: the paper provides concrete recipes for multi-skill pipelines in which different agent personas collaborate.
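The trajectory-efficiency finding suggests a simple guard in the agent loop: terminate as soon as a final answer appears rather than exhausting the step budget. The loop below is a hedged sketch; the stop predicate, step budget, and function names are assumptions, not the paper's implementation:

```python
def run_with_early_stopping(step_fn, is_done, max_steps=10):
    """Run an agent loop, stopping as soon as a step yields a terminal
    answer instead of generating unnecessary reasoning steps."""
    trajectory = []
    for _ in range(max_steps):
        result = step_fn(trajectory)   # one reasoning/action step
        trajectory.append(result)
        if is_done(result):            # early stop: answer reached
            break
    return trajectory

# Toy usage: the agent would take five steps, but stops at the answer.
steps = iter(["think", "act", "ANSWER: 42", "extra", "extra"])
trace = run_with_early_stopping(
    step_fn=lambda _t: next(steps),
    is_done=lambda r: r.startswith("ANSWER"),
)
# trace == ["think", "act", "ANSWER: 42"] -- the two "extra" steps never run
```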

Context Management Recipes#

Four concrete patterns for multi-skill pipelines:

  1. Chunking — Split large documents into semantic chunks with overlap for context continuity
  2. Progressive Summarization — Two-pass compression preserving key facts at target ratio
  3. Selective Context Loading — ContextManager class that loads only relevant context per skill, with priority-based budget allocation
  4. Agent Persona Context Templates — Minimal context handoff between agent personas (analyst→engineer→reviewer)
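Recipe 1 can be sketched as an overlapping splitter. This is a minimal illustration, assuming a pre-tokenized input (here, whitespace words as a token proxy); the function and parameter names are hypothetical, not taken from the paper:

```python
def chunk_with_overlap(tokens, chunk_size=100, overlap=20):
    """Split a token sequence into chunks that share `overlap` tokens,
    so each chunk carries context from its predecessor (recipe 1)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Toy usage: 250 tokens -> 3 chunks, each pair sharing 20 boundary tokens.
tokens = [f"w{i}" for i in range(250)]
parts = chunk_with_overlap(tokens, chunk_size=100, overlap=20)
# parts[0][-20:] == parts[1][:20]  (the overlap preserves continuity)
```

The overlap is what buys context continuity: a sentence cut at a chunk boundary still appears whole in the next chunk.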

Empirical Evidence Cited#

  • Anthropic Productivity Study (2025): 80% task time reduction across 100K conversations
  • SWE-bench: 1.96% success rate reveals gaps in structured problem-solving
  • CodeAct: 20% higher success rate with structured executable code actions
  • AutoGen: 4x coding effort reduction, 3-10x manual interaction reduction
  • MASAI: 28.33% on SWE-bench Lite via modular sub-agents
  • AGENTLESS: 27% on SWE-bench without agentic workflows

Honest Limitations#

The paper explicitly acknowledges: no original controlled study, no before/after case studies, no ablation studies, repository survey lacks formal methodology. Benefits are “anticipated” not “proven.” Recommends practitioners measure their own results.

See Also#