<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>💡 Analyses on LLM Wiki — Agentic AI Landscape</title><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/</link><description>Recent content in 💡 Analyses on LLM Wiki — Agentic AI Landscape</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://blog.imfsoftware.com/llm-wiki/docs/analyses/index.xml" rel="self" type="application/rss+xml"/><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/cross-source-themes/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/cross-source-themes/</guid><description>&lt;h1 id="cross-source-theme-analysis"&gt;Cross-Source Theme Analysis&lt;a class="anchor" href="#cross-source-theme-analysis"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;16 sources: 8 tools, 2 standards, 3 methodologies, 1 practitioner account, and 2 skill/eval resources. The themes below appear across 3+ sources independently — not because the sources reference each other, but because they converged on the same ideas.&lt;/p&gt;
&lt;blockquote class="book-hint"&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This analysis was originally written against 11 sources. The 5 newest sources (Paperclip, Spec Kit, BMad Method, Anthropic Eval Guide, Promptfoo) strengthen existing themes — particularly Theme 3 (human-in-the-loop spectrum) and Theme 7 (evaluation). A full refresh is recommended when the wiki reaches 20+ sources.&lt;/p&gt;</description></item><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/how-to-eval-a-skill/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/how-to-eval-a-skill/</guid><description>&lt;h1 id="how-to-eval-a-skill-practical-guide"&gt;How to Eval a Skill (Practical Guide)&lt;a class="anchor" href="#how-to-eval-a-skill-practical-guide"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Anthropic&amp;rsquo;s prompt evals measure whether a prompt produces good output. Skill evals are harder because a skill has more surface area: it needs to trigger correctly, execute the right steps, use the right tools, produce the right output, and NOT trigger on the wrong inputs.&lt;/p&gt;
&lt;p&gt;This guide maps Anthropic&amp;rsquo;s eval methodology onto skills, drawing from the wiki&amp;rsquo;s sources.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-key-difference-prompts-vs-skills"&gt;The Key Difference: Prompts vs. Skills&lt;a class="anchor" href="#the-key-difference-prompts-vs-skills"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;Prompt Eval&lt;/th&gt;
 &lt;th&gt;Skill Eval&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;What you test&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Does this prompt produce good output?&lt;/td&gt;
 &lt;td&gt;Does this skill trigger, execute, and produce correctly?&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Input&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;A prompt + expected output&lt;/td&gt;
 &lt;td&gt;A prompt + context + expected behavior chain&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Failure modes&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Bad output&lt;/td&gt;
 &lt;td&gt;Wrong trigger, wrong steps, wrong tools, bad output, false positive activation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Non-determinism&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Output varies&lt;/td&gt;
 &lt;td&gt;Trigger, routing, tool selection, AND output all vary&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
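&lt;p&gt;The contrast above can be sketched as a minimal harness. This is an illustration, not any source&amp;rsquo;s API: &lt;code&gt;run_agent&lt;/code&gt;, &lt;code&gt;eval_prompt&lt;/code&gt;, and &lt;code&gt;eval_skill&lt;/code&gt; are hypothetical names, and the transcript fields are invented for the sketch.&lt;/p&gt;

```python
# Hedged sketch: a hypothetical run_agent() returns a transcript dict
# with keys "skill_triggered", "tools_used", and "output". None of this
# is a real API; it only illustrates the extra surface area of a skill.

def eval_prompt(run_agent, prompt, expected):
    # Prompt eval: a single check on the output.
    result = run_agent(prompt)
    return expected in result["output"]

def eval_skill(run_agent, case):
    # Skill eval: check each link of the chain separately, so a
    # failure report says which stage broke (trigger, tools, output).
    result = run_agent(case["prompt"])
    return {
        "triggered": result["skill_triggered"] == case["should_trigger"],
        "tools": set(result["tools_used"]) == set(case["expected_tools"]),
        "output": case["expected_output"] in result["output"],
    }

# Usage with a stubbed agent standing in for a real run:
def fake_agent(prompt):
    return {"skill_triggered": True,
            "tools_used": ["grep"],
            "output": "found 3 matches"}

case = {"prompt": "search the repo for TODOs",
        "should_trigger": True,
        "expected_tools": ["grep"],
        "expected_output": "found"}
print(eval_skill(fake_agent, case))  # all three checks True
```

&lt;p&gt;A false-positive case is the same shape with &lt;code&gt;should_trigger&lt;/code&gt; set to false and an off-topic prompt; side effects would need a fourth check against the environment, which this sketch omits.&lt;/p&gt;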
&lt;p&gt;A skill eval must test the full chain: &lt;strong&gt;routing → activation → execution → output → side effects&lt;/strong&gt;.&lt;/p&gt;</description></item><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/key-insights-agentic-landscape/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/key-insights-agentic-landscape/</guid><description>&lt;h1 id="key-insights-the-agentic-ai-landscape-april-2026"&gt;Key Insights: The Agentic AI Landscape (April 2026)&lt;a class="anchor" href="#key-insights-the-agentic-ai-landscape-april-2026"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Synthesized from 16 sources across this wiki. This analysis captures the patterns, tensions, and emerging consensus visible when you look across the entire landscape rather than at any single tool.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="1-five-layers-are-emerging"&gt;1. Five Layers Are Emerging&lt;a class="anchor" href="#1-five-layers-are-emerging"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The landscape has organized into five distinct layers:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Layer&lt;/th&gt;
 &lt;th&gt;Representative&lt;/th&gt;
 &lt;th&gt;Core Bet&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Company&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/paperclip/"&gt;paperclip&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Orchestrate agents into companies with org charts, budgets, governance.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Methodology&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/spec-kit/"&gt;spec-kit&lt;/a&gt;, &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/bmad-method/"&gt;bmad-method&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Structure the development process. Specs before code, or adaptive agile workflows.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/scion/"&gt;scion&lt;/a&gt; (GCP)&lt;/td&gt;
 &lt;td&gt;The agent runtime is the hard problem. Be a hypervisor. Stay agnostic.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Product&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/kiro/"&gt;kiro&lt;/a&gt; (AWS)&lt;/td&gt;
 &lt;td&gt;Ship an opinionated end-to-end agent. Autonomy and scale matter most.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/claude-code/"&gt;claude-code&lt;/a&gt; (Anthropic), &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/fabric/"&gt;fabric&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Make the individual agent excellent. Let users compose upward.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The methodology layer is new — &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/spec-kit/"&gt;spec-kit&lt;/a&gt; (&amp;ldquo;specs before code&amp;rdquo;) and &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/bmad-method/"&gt;bmad-method&lt;/a&gt; (&amp;ldquo;expert collaboration over autopilot&amp;rdquo;) represent two competing philosophies for structuring AI-assisted development. &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/paperclip/"&gt;paperclip&lt;/a&gt; adds the company layer above everything, orchestrating agents into organizations with budgets and governance.&lt;/p&gt;</description></item><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/ten-pillars-evidence-map/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/ten-pillars-evidence-map/</guid><description>&lt;h1 id="evidence-map-supporting-the-ten-pillars-framework"&gt;Evidence Map: Supporting the Ten Pillars Framework&lt;a class="anchor" href="#evidence-map-supporting-the-ten-pillars-framework"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;This analysis maps each pillar from &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/sources/ten-pillars-agentic-skill-design/"&gt;ten-pillars-agentic-skill-design&lt;/a&gt; against real-world evidence collected across 11 sources in this wiki. Your paper acknowledged &amp;ldquo;no original controlled study&amp;rdquo; as a limitation — the wiki now provides post-hoc validation from production implementations.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pillar-1-architecture-and-structure"&gt;Pillar 1: Architecture and Structure&lt;a class="anchor" href="#pillar-1-architecture-and-structure"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Your claim&lt;/strong&gt;: Organize content into clearly defined sections — metadata, interfaces, core logic, workflows, configuration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Supporting evidence&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/concepts/agent-skills-standard/"&gt;agent-skills-standard&lt;/a&gt;&lt;/strong&gt; codified this into a formal spec: SKILL.md with YAML frontmatter (name, description, license, compatibility, metadata, allowed-tools) + markdown body + optional scripts/references/assets directories. This is now an open standard at agentskills.io.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/claude-code/"&gt;claude-code&lt;/a&gt;&lt;/strong&gt; implements it: &lt;code&gt;.claude&lt;/code&gt; directory with CLAUDE.md, &lt;code&gt;.claude/rules/&lt;/code&gt;, &lt;code&gt;.claude/skills/&lt;/code&gt;, &lt;code&gt;.claude/agents/&lt;/code&gt;. Hierarchical, scoped (org → project → user → local).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/pai/"&gt;pai&lt;/a&gt;&lt;/strong&gt; takes it further: USER/ vs SYSTEM/ separation. Six layers of customization (identity, preferences, workflows, skills, hooks, memory). Upgrade-safe architecture.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/sources/skills-pipeline-sleestk/"&gt;skills-pipeline-sleestk&lt;/a&gt;&lt;/strong&gt; follows the spec exactly: each skill is a directory with SKILL.md + references/ subdirectory.&lt;/li&gt;
&lt;/ul&gt;
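&lt;p&gt;A minimal SKILL.md matching the structure above might look like this. The field names are the ones the spec lists (name, description, license, compatibility, metadata, allowed-tools); every value is invented for illustration.&lt;/p&gt;

```markdown
---
name: summarize-pr
description: Summarize a pull request diff into reviewer notes.
license: MIT
compatibility: illustrative
metadata:
  author: example
allowed-tools: [read, grep]
---

# Summarize PR

1. Read the diff with the read tool.
2. Group the changes by file.
3. Emit short reviewer notes per group.
```

&lt;p&gt;Scripts, references, and assets would sit in sibling directories next to this file, as the skills-pipeline-sleestk bullet describes.&lt;/p&gt;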
&lt;p&gt;&lt;strong&gt;Strength&lt;/strong&gt;: Strong. Multiple independent implementations converged on the same structure. The Agent Skills spec formalizes what your paper recommended.&lt;/p&gt;</description></item></channel></rss>