Measuring Your ROI

Lesson 10 · Learn Graphify · ~15 minutes

You captured a baseline in Lesson 2. You've been using Graphify for several sessions. Now it's time to close the loop — re-run the same tasks, compare results, and decide whether the investment is paying off.

Win

After this lesson, you have concrete data showing whether Graphify improves your development workflow — and if it does, by how much.

The Comparison Protocol

Use the exact same tasks from your Lesson 2 baseline. This is critical — different tasks produce incomparable numbers.

Start a fresh AI session with /graphify loaded
Run the same prompt you used in the baseline
Record the same metrics: round-trips, first-attempt correctness, wall-clock time
Compare side-by-side

Recording the "After" Eval

# Eval: [Task Name] — WITH GRAPHIFY

## Setup
- Date: YYYY-MM-DD
- Tool: Kiro CLI
- Graphify: on (graph loaded via /graphify)
- Project: DocumentDB Perf Tests
- Graph freshness: updated [today / N days ago]

## Prompt
[Exact same prompt as baseline]

## Results
- Round-trips: N
- First-attempt correct: yes/no
- Wall-clock: Xm Ys
- Files the agent read: N (vs. baseline)
- Notes: [observations about behavior differences]

## Comparison
- Round-trip delta: N → M (X% reduction)
- Correctness: [same / improved / regressed]
- Time delta: Xm → Ym
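The Comparison section of the template can be filled in mechanically once both evals are recorded. The following is a minimal sketch, not part of Graphify: the `EvalResult` class and its field names are hypothetical and simply mirror the three metrics the template records.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One eval run. Field names are illustrative, not a Graphify format."""
    round_trips: int
    first_attempt_correct: bool
    wall_clock_seconds: int


def percent_reduction(before: float, after: float) -> float:
    """Percent reduction from before to after; negative means it got worse."""
    if before <= 0:
        raise ValueError("baseline value must be positive")
    return (before - after) / before * 100


def correctness_change(before: bool, after: bool) -> str:
    """Map the two yes/no results onto the template's same/improved/regressed."""
    if before == after:
        return "same"
    return "improved" if after else "regressed"


def compare(baseline: EvalResult, with_graph: EvalResult) -> dict:
    """Build the values for the template's Comparison section."""
    return {
        "round_trip_reduction_pct": round(
            percent_reduction(baseline.round_trips, with_graph.round_trips), 1
        ),
        "correctness": correctness_change(
            baseline.first_attempt_correct, with_graph.first_attempt_correct
        ),
        "time_reduction_pct": round(
            percent_reduction(
                baseline.wall_clock_seconds, with_graph.wall_clock_seconds
            ),
            1,
        ),
    }
```

For example, `compare(EvalResult(8, False, 600), EvalResult(3, True, 240))` uses made-up numbers and reports a 62.5% round-trip reduction, "improved" correctness, and a 60.0% time reduction.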

What Good Results Look Like

Metric	Marginal improvement	Strong improvement	Exceptional
Round-trips	20-30% fewer	50-60% fewer	70%+ fewer
First-attempt correct	Same	Improved (no → yes)	Consistently yes
Wall-clock time	10-20% faster	30-50% faster	60%+ faster
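If you want to label your own results against these bands, a small helper is enough. This is a sketch under one stated assumption: the table leaves gaps between bands (for example, 30-50% fewer round-trips is not labeled), so the code treats each band's lower bound as "this value or higher". The names `THRESHOLDS` and `classify` are hypothetical.

```python
# Lower bounds (percent reduction) copied from the table above.
# Assumption: each band means "this value or higher", which fills the
# gaps the table leaves between bands.
THRESHOLDS = {
    "round_trips": [(70, "exceptional"), (50, "strong"), (20, "marginal")],
    "wall_clock": [(60, "exceptional"), (30, "strong"), (10, "marginal")],
}


def classify(metric: str, reduction_pct: float) -> str:
    """Return the highest band whose lower bound the reduction reaches."""
    for lower_bound, label in THRESHOLDS[metric]:
        if reduction_pct >= lower_bound:
            return label
    return "below marginal"
```

First-attempt correctness is left out because it is a yes/no outcome rather than a percentage.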

Note

Even a 20% round-trip reduction pays off over time. If you run 10 AI sessions per day and each saves 2-3 exchanges, that's 20-30 fewer messages per day — less context window pressure, lower costs, faster flow.

Where Graphify Helps Most

Your data will likely show the biggest improvements on:

Pattern reuse tasks — the agent sees conventions in the graph without reading every example
Discovery tasks — structural questions answered from graph data, zero file reads
Large-scope tasks — tasks that touch many files benefit most from reduced discovery

You'll likely see smaller improvements on:

Single-file edits — the agent only needed one file anyway
Behavioral debugging — graph can't help with runtime issues
Novel code — no existing patterns to reference

Calculating Cost Savings

If you're tracking token usage (from the Anthropic dashboard or API logs):

# Rough cost calculation:
# Baseline tokens per session: B
# With-graph tokens per session: G
# Sessions per month: S
# Cost per million tokens: $C (input) / $D (output)
#
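# Monthly savings (input tokens only) = (B - G) × S × ($C / 1_000_000)
# If you also track output tokens, apply the same formula with $D and add the two.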
#
# Example:
# B = 500,000 tokens, G = 80,000 tokens
# S = 200 sessions/month, $C = $3/M input
# Savings = 420,000 × 200 × $0.000003 = $252/month
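The same arithmetic as a runnable sketch, so you can plug in your own numbers. The function name and parameter names are hypothetical; the example call reuses the figures from the calculation above.

```python
def monthly_savings(
    baseline_tokens: int,
    graph_tokens: int,
    sessions_per_month: int,
    price_per_million: float,
) -> float:
    """Dollars saved per month for one token type (input or output)."""
    tokens_saved = (baseline_tokens - graph_tokens) * sessions_per_month
    return tokens_saved * price_per_million / 1_000_000


# Input-token example from the text: B = 500,000, G = 80,000, S = 200, $3/M.
input_savings = monthly_savings(500_000, 80_000, 200, 3.0)
print(f"${input_savings:.2f}/month")  # prints "$252.00/month"
```

To include output tokens, call the function a second time with your per-session output counts and the output price, then add the two results. A negative result means the with-graph sessions used more tokens than the baseline.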

Honest Assessment

If your comparison shows marginal or no improvement, that's valid data too. Graphify adds the most value on codebases with 200+ files and repetitive structure. If your project is smaller or highly dynamic, the overhead of maintaining the graph may not pay off. That's fine — knowing this is the win.

Building a Longitudinal Record

Don't stop at one comparison. Track over time:

# ~/Graphify/evals/log.md

| Date | Task | Metric | Baseline | With Graph | Delta |
|------|------|--------|----------|------------|-------|
| 2025-01-15 | Pattern reuse | Round-trips | 8 | 3 | -62% |
| 2025-01-15 | Discovery | Round-trips | 5 | 2 | -60% |
| 2025-01-20 | Integration | Round-trips | 6 | 4 | -33% |
| 2025-01-20 | New feature | Round-trips | 4 | 3 | -25% |

Over weeks, you'll see which task types benefit most and can adjust your workflow accordingly — maybe only loading the graph for discovery-heavy tasks.
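Once the log has more than a handful of rows, a short script can summarize it by task type. This is a sketch that assumes the log keeps exactly the six-column table layout shown above; `parse_log` and `mean_reduction_by_task` are hypothetical helper names. It recomputes each reduction from the Baseline and With Graph columns rather than trusting the hand-written Delta column.

```python
from collections import defaultdict


def parse_log(markdown: str) -> list[dict]:
    """Extract data rows from the six-column eval table."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the file-name comment and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip malformed rows, the header row, and the |----| separator row.
        if len(cells) != 6 or cells[0] == "Date" or set(cells[0]) <= {"-"}:
            continue
        date, task, metric, baseline, with_graph, _delta = cells
        rows.append({
            "date": date,
            "task": task,
            "metric": metric,
            "baseline": float(baseline),
            "with_graph": float(with_graph),
        })
    return rows


def mean_reduction_by_task(rows: list[dict]) -> dict:
    """Average percent reduction per task type, rounded to one decimal."""
    reductions = defaultdict(list)
    for row in rows:
        pct = (row["baseline"] - row["with_graph"]) / row["baseline"] * 100
        reductions[row["task"]].append(pct)
    return {task: round(sum(v) / len(v), 1) for task, v in reductions.items()}
```

Reading the file with `pathlib.Path("~/Graphify/evals/log.md").expanduser().read_text()` and passing the text to `parse_log` gives you the rows to summarize.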

Deciding to Keep or Drop

After your comparison, ask:

Is the improvement worth the maintenance cost? (Git hook is near-zero; CI workflows have some overhead)
Which task types benefit? Maybe load the graph selectively, not always.
Is the graph still accurate enough? If your project changes faster than the graph updates, the value degrades.

If the answer is "keep it," congratulations — you've validated a new tool in your workflow with data. If the answer is "drop it," that's equally valuable knowledge. You know exactly what it costs and what it buys.

Verify Your Understanding

Why must you use the exact same prompts from your baseline evaluation?

Different prompts would cause Graphify to build a different graph
Different tasks produce incomparable numbers — you need a controlled comparison
The AI agent caches results from the first run and reuses them
Graphify only optimizes for prompts it has seen before

When is it reasonable to stop using Graphify on a project?

When the codebase grows beyond 1000 files
When the measured improvement doesn't justify the maintenance overhead
After you've used it for more than 30 days
When you switch from Claude to a different AI provider

Do This Now

Pull up your Lesson 2 baseline eval files
Start a session with /graphify and re-run each task
Record results in the "after" template above
Calculate your deltas
Make your keep/drop/modify decision based on data

Course Complete

You've gone from zero to a fully-instrumented Graphify workflow: build, measure, maintain, leverage, scale, and prove. The graph is a tool — use it where it helps, skip it where it doesn't. The evaluation data tells you which is which.