Cost Controls and Observability

Lesson 8 · Safe Agentic Workflows · ~8 minutes

Scheduled agents are powerful — they work while you sleep. But they also spend while you sleep. Without cost controls, a daily workflow can quietly accumulate hundreds of dollars per month in AI credits and Actions minutes. This lesson gives you the tools to prevent runaway costs and gain visibility into what your agents are actually doing.

The Cost Problem

Every workflow run incurs two costs: AI credits for the model calls and Actions minutes for the runner time.

⚠️ The Compound Effect

A single run might cost $5. That seems fine. But $5 × 365 days = $1,825/year. And that's one workflow. Most teams end up with 5–15 scheduled workflows. Without budgets, the default behavior is uncapped spending that scales linearly with time.

The core principle: every scheduled workflow must have an explicit budget. Never rely on defaults for production automation.
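The compounding arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `annual_cost` and its parameters are hypothetical, and it assumes every workflow costs the same per run and runs every day of the year.

```python
def annual_cost(cost_per_run: float, runs_per_year: int = 365, workflows: int = 1) -> float:
    """Project yearly spend for identical scheduled workflows with no budget cap."""
    return cost_per_run * runs_per_year * workflows


# One daily workflow at $5 per run, matching the example in the text.
print(annual_cost(5.0))                # 1825.0
# The same per-run cost across 5 and 15 workflows.
print(annual_cost(5.0, workflows=5))   # 9125.0
print(annual_cost(5.0, workflows=15))  # 27375.0
```

Even with these simplifications, the projection shows why a per-run figure that looks harmless needs an explicit cap once several workflows are scheduled.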

Hard Budget: max-ai-credits

The max-ai-credits frontmatter field sets a hard ceiling on AI credit consumption per run. If the workflow reaches this limit, it stops immediately — mid-analysis if necessary.

.github/workflows/daily-review.md
---
description: "Daily code review suggestions"

on:
  schedule: daily around 8am on weekdays

engine:
  id: copilot
  model: gpt-4.1-mini

max-ai-credits: 200
max-daily-ai-credits: 800

safe-outputs:
  create-issue:
    title-prefix: "[daily-review] "
    close-older-issues: true
---

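Key behaviors of max-ai-credits: it is a hard ceiling applied per run. When a run reaches the limit, it terminates immediately and publishes no outputs, not even partial ones. It does not downgrade the model, pause for approval, or charge a penalty.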
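To make the hard-ceiling idea concrete, here is a minimal simulation in Python. It is not how the tool is implemented: `CreditMeter`, `BudgetExceeded`, and `run` are hypothetical names, the per-step costs are made up, and the sketch assumes credits are charged step by step and that outputs are only published if every step completes.

```python
class BudgetExceeded(Exception):
    """Raised when a run reaches its per-run credit ceiling."""


class CreditMeter:
    def __init__(self, max_credits: int):
        self.max_credits = max_credits
        self.used = 0

    def charge(self, credits: int) -> None:
        # Record the spend, then stop the run as soon as the ceiling is reached.
        self.used += credits
        if self.used >= self.max_credits:
            raise BudgetExceeded(f"used {self.used} of {self.max_credits} credits")


def run(steps, max_credits):
    """Execute (name, cost) steps; publish results only if the whole run completes."""
    meter = CreditMeter(max_credits)
    completed = []
    try:
        for name, cost in steps:
            meter.charge(cost)
            completed.append(name)
    except BudgetExceeded:
        # Hard stop: nothing is published, not even the steps that finished.
        return {"status": "stopped", "published": [], "used": meter.used}
    return {"status": "success", "published": completed, "used": meter.used}


print(run([("read", 80), ("analyze", 90)], max_credits=200)["status"])    # success
print(run([("read", 120), ("analyze", 120)], max_credits=200)["status"])  # stopped
```

The design point is that the budget check sits inside the run loop rather than after it, so an expensive run cannot finish and publish before the limit is noticed.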

Model Selection and Cost Tradeoffs

Not every workflow needs the most capable model. Choosing the right model for the task is the single biggest cost lever you have.

Model | Best For | Relative Cost
gpt-4.1-mini | Scanning, summarization, pattern matching, routine monitoring | $ (cheapest)
claude-haiku-4-5 | Fast classification, simple analysis, structured extraction | $
gpt-4.1 | Code generation, complex reasoning, multi-step analysis | $$$
claude-sonnet-4 | Nuanced writing, architectural analysis, code review | $$$
o3 / claude-opus-4 | Multi-step reasoning, complex refactoring, research | $$$$$

Rule of thumb: start every scheduled workflow with the cheapest model that produces acceptable output. Upgrade only when you see quality failures in the logs.
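The rule of thumb can be expressed as a small selection routine. This is a sketch under stated assumptions: the tier numbers simply count the dollar signs in the table above, and `is_acceptable` is a hypothetical callback standing in for whatever quality check you apply after reading the logs.

```python
# (model, relative cost tier) where the tier is the number of "$" in the table.
MODELS = [
    ("gpt-4.1-mini", 1),
    ("claude-haiku-4-5", 1),
    ("gpt-4.1", 3),
    ("claude-sonnet-4", 3),
    ("o3", 5),
    ("claude-opus-4", 5),
]


def cheapest_acceptable(is_acceptable):
    """Return the lowest-tier model that passes the quality check, or None."""
    # sorted() is stable, so models in the same tier keep their table order.
    for name, _tier in sorted(MODELS, key=lambda m: m[1]):
        if is_acceptable(name):
            return name
    return None


# Hypothetical outcome: only the larger models produced acceptable reviews.
passed_review = {"gpt-4.1", "o3"}
print(cheapest_acceptable(lambda name: name in passed_review))  # gpt-4.1
```

Walking the list from cheapest to most expensive mirrors the advice in the text: upgrade one step at a time, and only when the cheaper option has demonstrably failed.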

Optimization Levers

Beyond model selection, four levers reduce cost per run:

Lever | How | Impact
Tighter prompts | Remove preamble, examples, and instruction the model doesn't need. Be specific about output format. | 20–40% fewer input tokens
Fewer output tokens | Ask for concise output. "Summarize in 3 bullets" costs less than "write a detailed report." | 30–60% fewer output tokens
Skip-if conditions | Add skip-if logic so the workflow doesn't run when there's nothing to analyze (no new commits, no open PRs). | Eliminates entire runs
Scoped file reads | Use permissions: contents: read with path filters instead of reading the entire repo. | 50–80% less context
skip-if example
---
on:
  schedule: daily around 9am on weekdays

skip-if:
  no-commits-since: 24h
  no-open-prs: true
---

# Only runs if there's actually something to review
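The decision the skip-if block encodes can be sketched in Python. The lesson does not say how the two conditions combine, so this example assumes the run is skipped only when both hold (no recent commits and no open PRs); `should_skip` and its parameters are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta, timezone


def should_skip(last_commit_at, open_pr_count, now, window=timedelta(hours=24)):
    """Skip the run when there is nothing to review.

    Assumption: both conditions must hold for the run to be skipped.
    """
    no_recent_commits = last_commit_at is None or (now - last_commit_at) > window
    no_open_prs = open_pr_count == 0
    return no_recent_commits and no_open_prs


now = datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)
print(should_skip(now - timedelta(hours=30), 0, now))  # True: quiet repo, skip
print(should_skip(now - timedelta(hours=2), 0, now))   # False: fresh commit to review
```

A skipped run costs nothing, which is why this check is worth doing before any model call or file read happens.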

Observability: gh aw logs

You can't optimize what you can't measure. The gh aw logs command shows exactly what each run consumed:

$ gh aw logs daily-review --last 7

Run ID      Date        Duration  AIC Used  Status
─────────   ──────────  ────────  ────────  ──────
a3f2c1e     Jun 28      42s       148       ✓ success
b7d4e2a     Jun 27      38s       135       ✓ success
c9a1f3b     Jun 26      1m12s     312       ✓ success  ← spike
d2e5a4c     Jun 25      35s       127       ✓ success
e4b6c7d     Jun 24      40s       141       ✓ success
f1a8d9e     Jun 23      0s        0         ⊘ skipped (no-commits)
g3c2e1f     Jun 22      0s        0         ⊘ skipped (weekend)

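Key things to look for: spikes in AIC Used (run c9a1f3b used 312 against a typical 127–148), runs that take noticeably longer than usual, and skipped runs, which consume no credits.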
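A week of log rows is small enough to summarize by hand, but the same summary is easy to script. The sketch below uses the figures from the sample output above, typed in by hand; it does not parse real CLI output, and `summarize` is a hypothetical helper. It also assumes at least one run actually executed.

```python
from statistics import mean, median

# Rows copied from the sample output: (run_id, aic_used, status).
RUNS = [
    ("a3f2c1e", 148, "success"),
    ("b7d4e2a", 135, "success"),
    ("c9a1f3b", 312, "success"),
    ("d2e5a4c", 127, "success"),
    ("e4b6c7d", 141, "success"),
    ("f1a8d9e", 0, "skipped"),
    ("g3c2e1f", 0, "skipped"),
]


def summarize(runs):
    """Summarize credit usage, ignoring skipped runs so they don't drag the average down."""
    executed = [(rid, aic) for rid, aic, status in runs if status != "skipped"]
    credits = [aic for _, aic in executed]
    peak_id, peak_aic = max(executed, key=lambda r: r[1])
    return {
        "executed": len(executed),
        "skipped": len(runs) - len(executed),
        "total_aic": sum(credits),
        "mean_aic": mean(credits),
        "median_aic": median(credits),
        "peak_run": peak_id,
        "peak_vs_median": peak_aic / median(credits),
    }


summary = summarize(RUNS)
print(summary["total_aic"], summary["median_aic"], summary["peak_run"])  # 863 141 c9a1f3b
```

Comparing the peak to the median rather than the mean keeps a single expensive run from hiding itself by inflating the baseline.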

Auditing Failures: gh aw audit

When a workflow fails or produces unexpected output, gh aw audit shows the full execution trace:

$ gh aw audit daily-review --run c9a1f3b

Run: c9a1f3b (Jun 26, 1m12s, 312 AIC)
Status: success (but over typical budget)

Timeline:
  00:00  Started. Model: gpt-4.1-mini
  00:02  Read 14 files (src/api/*.ts) — 8,200 tokens
  00:08  Read 6 files (tests/*.ts) — 4,100 tokens     ← unusual
  00:15  Model response: 2,800 tokens (analysis)
  00:42  Model response: 1,200 tokens (issue body)
  01:12  Output: created issue #247

Token breakdown:
  Input:  18,400 tokens (context + prompt)
  Output:  4,000 tokens (responses)
  Total:   312 AIC

Note: Input was 2x normal due to test file reads triggered by
      new test files added in commit a1b2c3d.

The audit trail answers "why did this run cost more?" and "what went wrong?" — essential for debugging scheduled automation that runs without supervision.

OpenTelemetry Integration

For teams running multiple workflows, CLI inspection doesn't scale. Export telemetry to your observability stack for dashboards and alerting:

.github/aw-config.yml
telemetry:
  otlp:
    endpoint: https://otel.yourcompany.com:4317
    headers:
      Authorization: "Bearer ${secrets.OTLP_TOKEN}"
    export:
      - traces      # Full execution spans
      - metrics     # AIC usage, duration, token counts
      - logs        # Model interactions (redacted)

  alerts:
    - name: budget-spike
      condition: aic_used > 2 * avg(aic_used, 7d)
      notify: slack:#agentic-ops
📊 Dashboards

Track AIC spend per workflow, cost trends over time, model utilization. Grafana, Datadog, or any OTLP-compatible backend.

🚨 Alerting

Get notified on cost spikes, repeated failures, or budget exhaustion before it becomes a monthly bill surprise.

🔍 Traces

See exactly which files were read, which model calls were made, and where time was spent — per run.

📈 Trends

Spot workflows whose costs are creeping up as the repo grows, so you can optimize before the bill lands.
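The budget-spike alert condition in the config above (aic_used > 2 * avg(aic_used, 7d)) is simple enough to sketch directly. This is an illustration only: `is_budget_spike` is a hypothetical function, and it assumes the trailing window holds the AIC values of earlier executed runs, with the current run and skipped runs left out. The config does not specify either detail.

```python
def is_budget_spike(aic_used, trailing, factor=2.0):
    """Return True when a run used more than `factor` times the trailing average."""
    if not trailing:
        # No history yet, so there is no baseline to compare against.
        return False
    baseline = sum(trailing) / len(trailing)
    return aic_used > factor * baseline


# Run c9a1f3b (312 AIC) against the two executed runs before it (127 and 141).
print(is_budget_spike(312, [127, 141]))            # True: 312 > 2 * 134
# Run a3f2c1e (148 AIC) against the four executed runs before it.
print(is_budget_spike(148, [135, 312, 127, 141]))  # False: 148 < 2 * 178.75
```

Note that a spike raises the baseline for the following days, so a relative threshold like this is best paired with the absolute caps set by max-ai-credits.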

Outcomes Measurement

Cost control isn't just about spending less — it's about spending well. A workflow that costs $3/day but whose output is ignored every time is wasting $90/month. Track whether your automation actually helps:

$ gh aw stats daily-review --last 30d

Runs: 22 (8 skipped)
Avg cost: 145 AIC ($1.45/run)
Monthly spend: ~$32
Acceptance rate: 73% (issues read within 4h)
Action rate: 45% (led to a commit or PR within 24h)
Noop rate: 18%

A workflow with a low action rate isn't necessarily bad — a security scanner that finds nothing 90% of the time is doing its job. But a daily summary that's never read should be made weekly or killed.
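One way to turn these stats into a single comparable number is cost per acted-on output. The sketch below reuses the figures from the sample output and assumes that "Runs: 22" counts executed runs and that the action rate is a fraction of those runs; both function names are hypothetical.

```python
def monthly_spend(runs: int, cost_per_run: float) -> float:
    """Total spend for the period: executed runs times average cost per run."""
    return runs * cost_per_run


def cost_per_action(spend: float, runs: int, action_rate: float) -> float:
    """Spend divided by the number of runs that led to a commit or PR."""
    actions = runs * action_rate
    return spend / actions if actions else float("inf")


spend = monthly_spend(22, 1.45)
print(f"${spend:.2f} per month")                                   # $31.90 per month
print(f"${cost_per_action(spend, 22, 0.45):.2f} per acted-on run")  # $3.22 per acted-on run
```

A figure like this makes the "weekly or killed" decision easier: a workflow whose cost per acted-on output keeps rising is a candidate for a slower schedule.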

Concurrency Controls

Prevent duplicate runs from piling up — especially when a workflow is triggered by both schedule and manual dispatch, or when a slow run overlaps with the next scheduled execution:

concurrency configuration
---
concurrency:
  group: daily-review-${branch}
  cancel-in-progress: true
---

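Behaviors: runs that share a group key never execute at the same time, and with cancel-in-progress: true a newly started run cancels the one still in progress.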
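The effect of a concurrency group can be simulated in a few lines. This sketch is modeled on GitHub Actions-style concurrency groups, which is an assumption about how the setting behaves here; `ConcurrencyGroups` is a hypothetical class and the run IDs are made up.

```python
class ConcurrencyGroups:
    """Track at most one running (and one queued) run per group key."""

    def __init__(self, cancel_in_progress: bool = True):
        self.cancel_in_progress = cancel_in_progress
        self.running = {}  # group key -> run id currently executing
        self.queued = {}   # group key -> run id waiting its turn

    def start(self, group: str, run_id: str) -> str:
        current = self.running.get(group)
        if current is None:
            self.running[group] = run_id
            return f"{run_id}: started"
        if self.cancel_in_progress:
            # The newer run replaces the one still in progress.
            self.running[group] = run_id
            return f"{run_id}: started, cancelled {current}"
        # Otherwise the newer run waits, replacing any run already queued.
        self.queued[group] = run_id
        return f"{run_id}: queued behind {current}"


groups = ConcurrencyGroups(cancel_in_progress=True)
print(groups.start("daily-review-main", "scheduled-1"))  # scheduled-1: started
print(groups.start("daily-review-main", "manual-2"))     # manual-2: started, cancelled scheduled-1
```

With cancellation enabled, a slow scheduled run and a manual dispatch never both finish, so you pay for at most one complete run per group at a time.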

Staged Mode: Preview Before Going Live

When you first deploy a scheduled workflow — or after making significant changes — use staged: true to preview what it would do without actually publishing outputs:

staged mode
---
staged: true   # Runs the full workflow but doesn't publish outputs

safe-outputs:
  create-issue:
    title-prefix: "[daily-review] "
---

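In staged mode: the workflow runs end to end, including model calls and file reads, and consumes credits as usual, but outputs are captured rather than published (no issues created, no comments posted). Review them with gh aw logs <run-id> --staged-output, then remove staged: true once you are confident in quality and cost.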

This is your safety net: validate cost, quality, and relevance before committing to daily spend.
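Conceptually, staged mode is a switch at the very last step of a run. The following is a minimal sketch of that idea rather than the tool's implementation; `finish_run` and the output strings are hypothetical.

```python
def finish_run(outputs, staged: bool):
    """Route a run's outputs: hold them for review when staged, publish otherwise.

    The run itself (and its credit spend) is identical in both cases.
    """
    if staged:
        return {"published": [], "held": list(outputs)}
    return {"published": list(outputs), "held": []}


outputs = ["create-issue: [daily-review] Suggestions"]
print(finish_run(outputs, staged=True))
print(finish_run(outputs, staged=False))
```

Because only the final routing differs, the cost and quality you observe in staged runs should be representative of what the live workflow will do.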

Connection to Your Work

As you add scheduled workflows (the background agents from Lesson 5), cost control moves from "nice to have" to essential infrastructure. Here's the progression:

Stage | What to Do
First workflow | Set max-ai-credits, use cheapest model, enable staged: true
2–5 workflows | Add max-daily-ai-credits, check gh aw logs weekly, tune budgets down
5+ workflows | Set up OTLP export, create a cost dashboard, add spike alerts
Team-wide adoption | Establish org-level budget policies, review outcomes monthly, kill low-value workflows
✅ Cost Optimization Checklist

Check Your Understanding

What happens when a workflow run hits its max-ai-credits limit?
Answer: max-ai-credits is a hard ceiling. When the budget is exhausted, the run terminates immediately and no outputs are published, not even partial ones. It doesn't downgrade models, pause for approval, or charge a penalty; the workflow simply stops to protect your budget and to keep a runaway analysis from producing incomplete or misleading results.

What is the purpose of staged: true in a workflow?
Answer: staged mode runs the complete workflow end to end, including model calls and file reads (so it consumes credits), but holds all outputs without publishing them: no issues created, no comments posted. You review them with gh aw logs <run-id> --staged-output and remove staged: true once you're confident in the quality and cost.

How would you identify which of your scheduled workflows is the most expensive?
Answer: check gh aw logs, which shows the actual AIC consumed per run, and compare across workflows. For teams with many workflows, OTLP dashboards aggregate this automatically. The max-ai-credits value is only the budget cap, not actual spend; a workflow might use far less than its limit. The billing page exists but doesn't break down by individual workflow.

What's Next

You now know how to keep your agentic workflows on a budget and how to see exactly what they're doing. Cost controls and observability aren't optional add-ons — they're the foundation that makes all your other automation sustainable. The next lesson covers testing and validation — how to verify that your workflows produce correct outputs before trusting them in production.

Primary Source

Cost Management Reference — complete documentation on max-ai-credits, max-daily-ai-credits, model pricing, OTLP export configuration, and budget alerting.
