<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>💡 Analyses on LLM Wiki — Agentic AI Landscape</title><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/</link><description>Recent content in 💡 Analyses on LLM Wiki — Agentic AI Landscape</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://blog.imfsoftware.com/llm-wiki/docs/analyses/index.xml" rel="self" type="application/rss+xml"/><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/cross-source-themes/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/cross-source-themes/</guid><description>&lt;h1 id="cross-source-theme-analysis"&gt;Cross-Source Theme Analysis&lt;a class="anchor" href="#cross-source-theme-analysis"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;16 sources: 8 tools, 2 standards, 3 methodologies, 1 practitioner account, and 2 skill/eval resources. The themes below appear across 3+ sources independently — not because the sources reference each other, but because they converged on the same ideas.&lt;/p&gt;
&lt;blockquote class="book-hint"&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This analysis was originally written against 11 sources. The 5 newest sources (Paperclip, Spec Kit, BMad Method, Anthropic Eval Guide, Promptfoo) strengthen existing themes — particularly Theme 3 (human-in-the-loop spectrum) and Theme 7 (evaluation). A full refresh is recommended when the wiki reaches 20+ sources.&lt;/p&gt;</description></item><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/how-to-eval-a-skill/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/how-to-eval-a-skill/</guid><description>&lt;h1 id="how-to-eval-a-skill-practical-guide"&gt;How to Eval a Skill (Practical Guide)&lt;a class="anchor" href="#how-to-eval-a-skill-practical-guide"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Anthropic&amp;rsquo;s prompt evals measure whether a prompt produces good output. Skill evals are harder because a skill has more surface area: it needs to trigger correctly, execute the right steps, use the right tools, produce the right output, and NOT trigger on the wrong inputs.&lt;/p&gt;
&lt;p&gt;This guide maps Anthropic&amp;rsquo;s eval methodology onto skills, drawing from the wiki&amp;rsquo;s sources.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-key-difference-prompts-vs-skills"&gt;The Key Difference: Prompts vs. Skills&lt;a class="anchor" href="#the-key-difference-prompts-vs-skills"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;Prompt Eval&lt;/th&gt;
 &lt;th&gt;Skill Eval&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;What you test&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Does this prompt produce good output?&lt;/td&gt;
 &lt;td&gt;Does this skill trigger, execute, and produce correctly?&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Input&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;A prompt + expected output&lt;/td&gt;
 &lt;td&gt;A prompt + context + expected behavior chain&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Failure modes&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Bad output&lt;/td&gt;
 &lt;td&gt;Wrong trigger, wrong steps, wrong tools, bad output, false positive activation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Non-determinism&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Output varies&lt;/td&gt;
 &lt;td&gt;Trigger, routing, tool selection, AND output all vary&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
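&lt;p&gt;The contrast above can be sketched as a minimal harness. This is an illustration, not any source&amp;rsquo;s API: &lt;code&gt;run_agent&lt;/code&gt;, &lt;code&gt;eval_prompt&lt;/code&gt;, and &lt;code&gt;eval_skill&lt;/code&gt; are hypothetical names, and the transcript fields are invented for the sketch.&lt;/p&gt;

```python
# Hedged sketch: a hypothetical run_agent() returns a transcript dict
# with keys "skill_triggered", "tools_used", and "output". None of this
# is a real API; it only illustrates the extra surface area of a skill.

def eval_prompt(run_agent, prompt, expected):
    # Prompt eval: a single check on the output.
    result = run_agent(prompt)
    return expected in result["output"]

def eval_skill(run_agent, case):
    # Skill eval: check each link of the chain separately, so a
    # failure report says which stage broke (trigger, tools, output).
    result = run_agent(case["prompt"])
    return {
        "triggered": result["skill_triggered"] == case["should_trigger"],
        "tools": set(result["tools_used"]) == set(case["expected_tools"]),
        "output": case["expected_output"] in result["output"],
    }

# Usage with a stubbed agent standing in for a real run:
def fake_agent(prompt):
    return {"skill_triggered": True,
            "tools_used": ["grep"],
            "output": "found 3 matches"}

case = {"prompt": "search the repo for TODOs",
        "should_trigger": True,
        "expected_tools": ["grep"],
        "expected_output": "found"}
print(eval_skill(fake_agent, case))  # all three checks True
```

&lt;p&gt;A false-positive case is the same shape with &lt;code&gt;should_trigger&lt;/code&gt; set to false and an off-topic prompt; side effects would need a fourth check against the environment, which this sketch omits.&lt;/p&gt;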
&lt;p&gt;A skill eval must test the full chain: &lt;strong&gt;routing → activation → execution → output → side effects&lt;/strong&gt;.&lt;/p&gt;</description></item><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/key-insights-agentic-landscape/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/key-insights-agentic-landscape/</guid><description>&lt;h1 id="key-insights-the-agentic-ai-landscape-april-2026"&gt;Key Insights: The Agentic AI Landscape (April 2026)&lt;a class="anchor" href="#key-insights-the-agentic-ai-landscape-april-2026"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Synthesized from 16 sources across this wiki. This analysis captures the patterns, tensions, and emerging consensus visible when you look across the entire landscape rather than at any single tool.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="1-five-layers-are-emerging"&gt;1. Five Layers Are Emerging&lt;a class="anchor" href="#1-five-layers-are-emerging"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The landscape has organized into five distinct layers:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Layer&lt;/th&gt;
 &lt;th&gt;Representative&lt;/th&gt;
 &lt;th&gt;Core Bet&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Company&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/paperclip/"&gt;paperclip&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Orchestrate agents into companies with org charts, budgets, governance.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Methodology&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/spec-kit/"&gt;spec-kit&lt;/a&gt;, &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/bmad-method/"&gt;bmad-method&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Structure the development process. Specs before code, or adaptive agile workflows.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/scion/"&gt;scion&lt;/a&gt; (GCP)&lt;/td&gt;
 &lt;td&gt;The agent runtime is the hard problem. Be a hypervisor. Stay agnostic.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Product&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/kiro/"&gt;kiro&lt;/a&gt; (AWS)&lt;/td&gt;
 &lt;td&gt;Ship an opinionated end-to-end agent. Autonomy and scale matter most.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/claude-code/"&gt;claude-code&lt;/a&gt; (Anthropic), &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/fabric/"&gt;fabric&lt;/a&gt;&lt;/td&gt;
 &lt;td&gt;Make the individual agent excellent. Let users compose upward.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The methodology layer is new — &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/spec-kit/"&gt;spec-kit&lt;/a&gt; (&amp;ldquo;specs before code&amp;rdquo;) and &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/bmad-method/"&gt;bmad-method&lt;/a&gt; (&amp;ldquo;expert collaboration over autopilot&amp;rdquo;) represent two competing philosophies for structuring AI-assisted development. &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/paperclip/"&gt;paperclip&lt;/a&gt; adds the company layer above everything, orchestrating agents into organizations with budgets and governance.&lt;/p&gt;</description></item><item><title/><link>https://blog.imfsoftware.com/llm-wiki/docs/analyses/ten-pillars-evidence-map/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.imfsoftware.com/llm-wiki/docs/analyses/ten-pillars-evidence-map/</guid><description>&lt;h1 id="evidence-map-supporting-the-ten-pillars-framework"&gt;Evidence Map: Supporting the Ten Pillars Framework&lt;a class="anchor" href="#evidence-map-supporting-the-ten-pillars-framework"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;This analysis maps each pillar from &lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/sources/ten-pillars-agentic-skill-design/"&gt;ten-pillars-agentic-skill-design&lt;/a&gt; against real-world evidence collected across 11 sources in this wiki. Your paper acknowledged &amp;ldquo;no original controlled study&amp;rdquo; as a limitation — the wiki now provides post-hoc validation from production implementations.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pillar-1-architecture-and-structure"&gt;Pillar 1: Architecture and Structure&lt;a class="anchor" href="#pillar-1-architecture-and-structure"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Your claim&lt;/strong&gt;: Organize content into clearly defined sections — metadata, interfaces, core logic, workflows, configuration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Supporting evidence&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/concepts/agent-skills-standard/"&gt;agent-skills-standard&lt;/a&gt;&lt;/strong&gt; codified this into a formal spec: SKILL.md with YAML frontmatter (name, description, license, compatibility, metadata, allowed-tools) + markdown body + optional scripts/references/assets directories. This is now an open standard at agentskills.io.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/claude-code/"&gt;claude-code&lt;/a&gt;&lt;/strong&gt; implements it: &lt;code&gt;.claude&lt;/code&gt; directory with CLAUDE.md, &lt;code&gt;.claude/rules/&lt;/code&gt;, &lt;code&gt;.claude/skills/&lt;/code&gt;, &lt;code&gt;.claude/agents/&lt;/code&gt;. Hierarchical, scoped (org → project → user → local).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/entities/pai/"&gt;pai&lt;/a&gt;&lt;/strong&gt; takes it further: USER/ vs SYSTEM/ separation. Six layers of customization (identity, preferences, workflows, skills, hooks, memory). Upgrade-safe architecture.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://blog.imfsoftware.com/llm-wiki/docs/sources/skills-pipeline-sleestk/"&gt;skills-pipeline-sleestk&lt;/a&gt;&lt;/strong&gt; follows the spec exactly: each skill is a directory with SKILL.md + references/ subdirectory.&lt;/li&gt;
&lt;/ul&gt;
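&lt;p&gt;A minimal SKILL.md matching the structure above might look like this. The field names are the ones the spec lists (name, description, license, compatibility, metadata, allowed-tools); every value is invented for illustration.&lt;/p&gt;

```markdown
---
name: summarize-pr
description: Summarize a pull request diff into reviewer notes.
license: MIT
compatibility: illustrative
metadata:
  author: example
allowed-tools: [read, grep]
---

# Summarize PR

1. Read the diff with the read tool.
2. Group the changes by file.
3. Emit short reviewer notes per group.
```

&lt;p&gt;Scripts, references, and assets would sit in sibling directories next to this file, as the skills-pipeline-sleestk bullet describes.&lt;/p&gt;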
&lt;p&gt;&lt;strong&gt;Strength&lt;/strong&gt;: Strong. Multiple independent implementations converged on the same structure. The Agent Skills spec formalizes what your paper recommended.&lt;/p&gt;</description></item></channel></rss>