Capture Your Baseline

Lesson 2 · Learn Graphify · ~15 minutes

You can't prove Graphify helps without a "before" measurement. This lesson sets up a lightweight evaluation protocol you'll run before building your first graph — so you have real numbers to compare against.

Win

After this lesson, you'll have 2-3 baseline task measurements that you can re-run with Graphify later to produce a concrete before/after comparison.

Why Evaluate

The published Graphify benchmarks report 6x–49x token reduction on 100–500 file repos. But those numbers measure a specific thing: tokens consumed during code discovery (grep/glob). Your actual workflow might benefit differently — fewer round-trips, higher first-attempt correctness, faster pattern reuse. You need your own numbers.

What to Measure

Pick metrics that matter for your stated goal (less back-and-forth, more predictable outcomes):

| Metric | How to capture | Why it matters |
| --- | --- | --- |
| Round-trips | Count human→agent message pairs until task completes | Direct measure of "back and forth" |
| Token usage | graphify benchmark (after graph exists) or Anthropic dashboard | Cost proxy; also correlates with context bloat |
| First-attempt correctness | Did the agent's first code output pass tests / match patterns? (yes/no) | Measures whether graph context improves precision |
| Wall-clock time | Start to "task done" timestamp | End-to-end productivity signal |

Note

Keep it simple. You don't need all four. Pick round-trips (easiest to count) plus one other. Resist the urge to over-instrument.
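Round-trips are only easy to count if you fix the counting rule up front. The sketch below shows one possible rule, assuming a hypothetical transcript held as a list of (role, text) tuples. Neither Kiro CLI nor Claude Code is assumed to export this format; you may simply tally by hand using the same rule.

```python
def count_round_trips(transcript):
    """Count human->agent message pairs in a session transcript.

    `transcript` is a hypothetical list of (role, text) tuples, where role
    is "human" or "agent". A round-trip is one or more human messages
    followed by an agent reply; consecutive agent messages count once.
    """
    round_trips = 0
    awaiting_reply = False
    for role, _text in transcript:
        if role == "human":
            awaiting_reply = True
        elif role == "agent" and awaiting_reply:
            round_trips += 1
            awaiting_reply = False
    return round_trips


# Illustrative session, not a real transcript.
session = [
    ("human", "Write a new test file for operator X."),
    ("agent", "Here is a draft."),
    ("human", "Use the existing assertion helper instead."),
    ("agent", "Updated."),
    ("agent", "Tests pass."),
]
print(count_round_trips(session))  # 2
```

Whatever rule you pick, apply the same one to the baseline run and the later Graphify run so the two counts are comparable.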

Pick Your Eval Tasks

Choose 2-3 tasks that are representative of your real work on the DocumentDB project and repeatable (you could give the same prompt to the agent twice and compare). Good candidates:

  1. Pattern reuse task: "Write a new test file for operator X following the existing pattern in operator Y." The agent needs to discover the pattern first.
  2. Discovery task: "What assertion helpers are available and when should I use each one?" The agent needs to navigate the framework.
  3. Integration task: "Add a new error code constant and use it in an existing test." Requires finding the right file, understanding conventions, and making a change.

Important

Run these tasks without Graphify first. That's the whole point — you need the baseline before the intervention.

The Eval Record Template

For each task, record this in a simple markdown file:

# Eval: [Task Name]

## Setup
- Date: YYYY-MM-DD
- Tool: Claude Code / Kiro CLI
- Graphify: off / on
- Project: DocumentDB Perf Tests

## Prompt
[Exact prompt you gave the agent]

## Results
- Round-trips: N
- First-attempt correct: yes/no
- Wall-clock: Xm Ys
- Notes: [anything notable about the session]
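If you would rather generate these records than type them, the template above can be filled from a small data structure. This is a minimal sketch; the `EvalRecord` class and `render_record` function are hypothetical names introduced here, and the example values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One eval run, mirroring the fields of the markdown template."""
    task_name: str
    date: str                    # YYYY-MM-DD
    tool: str                    # "Claude Code" or "Kiro CLI"
    graphify_on: bool
    prompt: str
    round_trips: int
    first_attempt_correct: bool
    wall_clock_seconds: int
    notes: str = ""
    project: str = "DocumentDB Perf Tests"


def render_record(rec):
    """Render an EvalRecord using the template's headings and fields."""
    minutes, seconds = divmod(rec.wall_clock_seconds, 60)
    lines = [
        f"# Eval: {rec.task_name}",
        "",
        "## Setup",
        f"- Date: {rec.date}",
        f"- Tool: {rec.tool}",
        f"- Graphify: {'on' if rec.graphify_on else 'off'}",
        f"- Project: {rec.project}",
        "",
        "## Prompt",
        rec.prompt,
        "",
        "## Results",
        f"- Round-trips: {rec.round_trips}",
        f"- First-attempt correct: {'yes' if rec.first_attempt_correct else 'no'}",
        f"- Wall-clock: {minutes}m {seconds}s",
        f"- Notes: {rec.notes}",
    ]
    return "\n".join(lines) + "\n"


# Made-up values for illustration only, not measured results.
example = EvalRecord(
    task_name="Discovery task",
    date="2025-01-15",
    tool="Kiro CLI",
    graphify_on=False,
    prompt="What assertion helpers are available and when should I use each one?",
    round_trips=4,
    first_attempt_correct=False,
    wall_clock_seconds=365,
)
print(render_record(example))
```

Storing wall-clock time as seconds and formatting it only at render time keeps the later before/after arithmetic simple.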

Step-by-Step: Run Your Baseline

  1. Create an eval directory in this workspace:

     mkdir -p ~/Graphify/evals

  2. Pick your 2-3 tasks from the list above, or invent your own. They should match real work you'd do this week.
  3. Open a fresh Kiro CLI or Claude Code session on the DocumentDB project, without building a Graphify graph first.
  4. Run each task. Count round-trips, note whether the first output was correct, and record wall-clock time.
  5. Save each result as:

     ~/Graphify/evals/baseline-[task-slug].md
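The naming convention above can be applied consistently with a small helper. The following is a sketch under the assumption that a task slug is the lowercased task name with non-alphanumeric runs replaced by hyphens; the lesson does not define the slug format, and the function names are hypothetical.

```python
import re
from pathlib import Path


def task_slug(task_name):
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", task_name.lower()).strip("-")


def baseline_path(task_name, evals_dir="~/Graphify/evals"):
    """Build the baseline-[task-slug].md path under the evals directory."""
    return Path(evals_dir).expanduser() / f"baseline-{task_slug(task_name)}.md"


def save_baseline(task_name, markdown_text, evals_dir="~/Graphify/evals"):
    """Write one record, creating the directory if needed (like mkdir -p)."""
    path = baseline_path(task_name, evals_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown_text, encoding="utf-8")
    return path


print(task_slug("Pattern reuse task"))  # pattern-reuse-task
```

Calling `save_baseline("Pattern reuse task", text)` would then write `~/Graphify/evals/baseline-pattern-reuse-task.md`.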

After the Baseline

Once you have baseline numbers, you'll:

  1. Build your Graphify graph (Lesson 1 steps)
  2. Re-run the same tasks with graph context active
  3. Compare side-by-side
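The side-by-side comparison in step 3 amounts to subtracting one run's numbers from the other's for the same task and prompt. A minimal sketch, assuming each run is held as a dict with hypothetical keys; the values shown are illustrative, not measured results.

```python
def compare_runs(baseline, with_graph):
    """Compare two result dicts for the same task and prompt.

    Each dict uses hypothetical keys: "round_trips" (int),
    "first_attempt_correct" (bool), and "wall_clock_seconds" (int).
    Negative deltas mean the Graphify run needed less.
    """
    return {
        "round_trips_delta": with_graph["round_trips"] - baseline["round_trips"],
        "wall_clock_delta_seconds": (
            with_graph["wall_clock_seconds"] - baseline["wall_clock_seconds"]
        ),
        # Kept as a (before, after) pair because yes/no has no meaningful delta.
        "first_attempt_correct": (
            baseline["first_attempt_correct"],
            with_graph["first_attempt_correct"],
        ),
    }


# Illustrative values only, not measured results.
baseline = {"round_trips": 5, "first_attempt_correct": False, "wall_clock_seconds": 420}
with_graph = {"round_trips": 3, "first_attempt_correct": True, "wall_clock_seconds": 300}
print(compare_runs(baseline, with_graph))
```

With only 2-3 tasks, treat the deltas as directional evidence rather than a statistically robust result.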

You'll also run graphify benchmark after building the graph — it measures the theoretical token reduction (graph size vs. raw file corpus). This gives you the "max possible" savings; your actual savings will be lower but should track the same direction.
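To read an "Nx" figure, it helps to convert it to a percentage. The sketch below assumes a reduction factor means raw-corpus tokens divided by graph tokens; the lesson describes the benchmark only as "graph size vs. raw file corpus", so treat this as an interpretation rather than the tool's documented formula. Both function names are hypothetical.

```python
def reduction_factor(raw_corpus_tokens, graph_tokens):
    """Return the 'Nx' factor, assuming it is raw tokens / graph tokens."""
    if graph_tokens <= 0:
        raise ValueError("graph_tokens must be positive")
    return raw_corpus_tokens / graph_tokens


def percent_saved(factor):
    """Convert an 'Nx' reduction factor into the percentage of tokens saved."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1 - 1 / factor) * 100


# The 6x and 49x bounds quoted earlier in this lesson, as percentages.
print(round(percent_saved(6), 1))   # 83.3
print(round(percent_saved(49), 1))  # 98.0
```

Under this reading, the gap between 6x and 49x is roughly 83% versus 98% of tokens saved, which is why large factors can overstate the practical difference.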

Expected Results

Based on independent benchmarks for repos in the 100–500 file range:

If your DocumentDB project is 500+ files, you may see higher savings.

Verify Your Understanding

Why must the baseline be captured before building the Graphify graph?

Which metric most directly measures "less back and forth with the agent"?

Do This Now

  1. Run mkdir -p ~/Graphify/evals
  2. Pick 2-3 eval tasks representative of your DocumentDB work
  3. Run them in a fresh session without Graphify and record results
  4. Save as ~/Graphify/evals/baseline-*.md

Primary Source

Read Graphify Review: The 71x Claude Code Token Savings — specifically the "Where Does the 71x Token Savings Claim Come From?" section for the methodology behind the published numbers and realistic expectations by codebase size.