You captured a baseline in Lesson 2. You've been using Graphify for several sessions. Now it's time to close the loop — re-run the same tasks, compare results, and decide whether the investment is paying off.
After this lesson, you will have concrete data showing whether Graphify improves your development workflow and, if it does, by how much.
Use the exact same tasks from your Lesson 2 baseline. This is critical — different tasks produce incomparable numbers.
Run each task with /graphify loaded, then record the run in this template:
# Eval: [Task Name] - WITH GRAPHIFY
## Setup
- Date: YYYY-MM-DD
- Tool: Kiro CLI
- Graphify: on (graph loaded via /graphify)
- Project: DocumentDB Perf Tests
- Graph freshness: updated [today / N days ago]
## Prompt
[Exact same prompt as baseline]
## Results
- Round-trips: N
- First-attempt correct: yes/no
- Wall-clock: Xm Ys
- Files the agent read: N (vs. baseline)
- Notes: [observations about behavior differences]
## Comparison
- Round-trip delta: N → M (X% reduction)
- Correctness: [same / improved / regressed]
- Time delta: Xm → Ym
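The Comparison lines can be filled in by hand, but a small helper keeps the percentage consistent across evals. The following is a minimal sketch, not part of Graphify; the function names `percent_reduction` and `format_delta` are hypothetical, and the 8 → 3 figures are borrowed from the sample log later in this lesson.

```python
def percent_reduction(baseline: float, with_graph: float) -> float:
    """Return the percentage reduction from baseline to with_graph.

    Positive means the with-graph run needed fewer round-trips (or less
    time); negative means it regressed.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - with_graph) / baseline * 100


def format_delta(label: str, baseline: float, with_graph: float) -> str:
    """Format one Comparison line in the style of the template."""
    pct = percent_reduction(baseline, with_graph)
    direction = "reduction" if pct >= 0 else "increase"
    return f"{label}: {baseline:g} → {with_graph:g} ({abs(pct):.0f}% {direction})"


print(format_delta("Round-trip delta", 8, 3))
```

Reporting an "increase" for negative values makes regressions visible instead of hiding them behind a negative reduction.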
| Metric | Marginal improvement | Strong improvement | Exceptional |
|---|---|---|---|
| Round-trips | 20-30% fewer | 50-60% fewer | 70%+ fewer |
| First-attempt correct | Same | Improved (no → yes) | Consistently yes |
| Wall-clock time | 10-20% faster | 30-50% faster | 60%+ faster |
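To apply the table mechanically, you could map a measured improvement onto its tier. The sketch below is one possible reading of the table, under a stated assumption: the listed ranges leave gaps (for example, 30-50% fewer round-trips is not covered), so a value that falls in a gap is assigned to the lower tier. The names `THRESHOLDS` and `classify` are hypothetical.

```python
# Lower bounds (percent improvement) taken from the table above.
# Assumption: a value between two listed ranges (e.g. 40% fewer
# round-trips) is assigned to the lower tier.
THRESHOLDS = {
    "round_trips": [(70, "exceptional"), (50, "strong"), (20, "marginal")],
    "wall_clock": [(60, "exceptional"), (30, "strong"), (10, "marginal")],
}


def classify(metric: str, pct_improvement: float) -> str:
    """Return the tier name for a percentage improvement on one metric."""
    for lower_bound, tier in THRESHOLDS[metric]:
        if pct_improvement >= lower_bound:
            return tier
    return "below marginal"


print(classify("round_trips", 62.5))
print(classify("wall_clock", 15))
```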
Even a 20% round-trip reduction pays off over time. If you run 10 AI sessions per day and each saves 2-3 exchanges, that's 20-30 fewer messages per day — less context window pressure, lower costs, faster flow.
Based on your data, you'll likely see the biggest improvements on:
You'll likely see smaller improvements on:
If you're tracking token usage (from the Anthropic dashboard or API logs):
# Rough cost calculation:
# Baseline tokens per session: B
# With-graph tokens per session: G
# Sessions per month: S
# Cost per million tokens: $C (input) / $D (output)
#
# Monthly savings = (B - G) × S × ($C / 1_000_000)
# (This counts input tokens only; if output tokens also differ,
#  add the same term priced at $D.)
#
# Example:
# B = 500,000 tokens, G = 80,000 tokens
# S = 200 sessions/month, $C = $3/M input
# Savings = 420,000 × 200 × $0.000003 = $252/month
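The same calculation can be expressed as a small function so you can plug in your own numbers. This is a rough sketch that follows the formula above; the function name, the parameter names, and the optional output-token term are illustrative assumptions, and the prices are whatever your provider actually charges.

```python
def monthly_savings(
    baseline_tokens: int,
    graph_tokens: int,
    sessions_per_month: int,
    input_cost_per_million: float,
    output_tokens_saved: int = 0,
    output_cost_per_million: float = 0.0,
) -> float:
    """Estimate monthly savings in dollars from reduced token usage.

    The first four arguments correspond to B, G, S, and $C in the formula.
    The output-token arguments are an optional extension (assumption):
    leave them at their defaults to reproduce the input-only estimate.
    """
    input_saved = (baseline_tokens - graph_tokens) * sessions_per_month
    output_saved = output_tokens_saved * sessions_per_month
    return (
        input_saved * input_cost_per_million / 1_000_000
        + output_saved * output_cost_per_million / 1_000_000
    )


# The worked example from the comments above.
print(monthly_savings(500_000, 80_000, 200, 3.0))
```

A negative result means the with-graph sessions used more tokens than the baseline, which is worth knowing before you commit to maintaining the graph.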
If your comparison shows marginal or no improvement, that's valid data too. Graphify adds the most value on codebases with 200+ files and repetitive structure. If your project is smaller or highly dynamic, the overhead of maintaining the graph may not pay off. That's fine — knowing this is the win.
Don't stop at one comparison. Track over time:
# ~/Graphify/evals/log.md
| Date | Task | Metric | Baseline | With Graph | Delta |
|------|------|--------|----------|------------|-------|
| 2025-01-15 | Pattern reuse | Round-trips | 8 | 3 | -62% |
| 2025-01-15 | Discovery | Round-trips | 5 | 2 | -60% |
| 2025-01-20 | Integration | Round-trips | 6 | 4 | -33% |
| 2025-01-20 | New feature | Round-trips | 4 | 3 | -25% |
Over weeks, you'll see which task types benefit most and can adjust your workflow accordingly — maybe only loading the graph for discovery-heavy tasks.
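Once the log has more than a handful of rows, a short script can summarize it by task type. The following is a minimal sketch that assumes the six-column layout shown above; `parse_log` and `reduction_by_task` are hypothetical helper names, and the reduction is recomputed from the Baseline and With Graph columns rather than read from the Delta column.

```python
from collections import defaultdict

SAMPLE_LOG = """\
| Date | Task | Metric | Baseline | With Graph | Delta |
|------|------|--------|----------|------------|-------|
| 2025-01-15 | Pattern reuse | Round-trips | 8 | 3 | -62% |
| 2025-01-15 | Discovery | Round-trips | 5 | 2 | -60% |
| 2025-01-20 | Integration | Round-trips | 6 | 4 | -33% |
| 2025-01-20 | New feature | Round-trips | 4 | 3 | -25% |
"""


def parse_log(markdown: str) -> list[dict]:
    """Parse the data rows of the eval log table into dictionaries."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 6:
            continue
        # Skip the header row and the |---| separator row.
        if cells[0] == "Date" or set(cells[0]) <= set("-: "):
            continue
        date, task, metric, baseline, with_graph, _delta = cells
        rows.append({
            "date": date,
            "task": task,
            "metric": metric,
            "baseline": float(baseline),
            "with_graph": float(with_graph),
        })
    return rows


def reduction_by_task(rows: list[dict]) -> dict[str, float]:
    """Average percentage reduction per task type."""
    per_task = defaultdict(list)
    for row in rows:
        pct = (row["baseline"] - row["with_graph"]) / row["baseline"] * 100
        per_task[row["task"]].append(pct)
    return {task: sum(values) / len(values) for task, values in per_task.items()}


for task, pct in sorted(reduction_by_task(parse_log(SAMPLE_LOG)).items()):
    print(f"{task}: {pct:.1f}% fewer round-trips on average")
```

To run it against your own log, read the file contents and pass them to `parse_log` in place of `SAMPLE_LOG`.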
After your comparison, ask:
If the answer is "keep it," congratulations — you've validated a new tool in your workflow with data. If the answer is "drop it," that's equally valuable knowledge. You know exactly what it costs and what it buys.
Why must you use the exact same prompts from your baseline evaluation?
When is it reasonable to stop using Graphify on a project?
Load the graph with /graphify and re-run each task.
You've gone from zero to a fully instrumented Graphify workflow: build, measure, maintain, leverage, scale, and prove. The graph is a tool: use it where it helps, skip it where it doesn't. The evaluation data tells you which is which.