playbook/COST.md
LLM Cost Playbook — Atelier Stations
Last updated · 2026-05-18
Purpose · Stop runaway LLM spend per sprint. Document which model handles each step, when to offload to local Ollama on the Mac Mini, and what the realistic per-sprint cost ceiling is.
Hardware available
| Asset | Role | Cost |
|---|---|---|
| Anthropic API (Opus 4.7) | deep reasoning, synthesis, critical paths | $15/M input, $75/M output |
| Anthropic API (Sonnet 4.6) | balanced, fast, high-frequency | $3/M input, $15/M output |
| OpenAI GPT-5 | secondary opinion, eventual ensemble | similar to Opus tier |
| Mac Mini (100.86.52.24) via Ollama | local, free, any open-weights model | $0 |
| VPS (saas_factory_autonomous) | upstream idea generation only | already running, not Atelier scope |
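The per-million-token prices in the table translate into a per-call cost roughly as follows. This is a minimal sketch, not Atelier code: the dictionary keys and the function name `usd_cost` are illustrative labels chosen here, and the token counts in the usage note are hypothetical.

```python
# Prices in USD per million tokens, copied from the hardware table.
# The keys are illustrative labels, not official model identifiers.
PRICES_PER_MILLION = {
    "opus-4.7": {"input": 15.0, "output": 75.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "ollama": {"input": 0.0, "output": 0.0},  # local, priced at $0 per the table
}


def usd_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call given its token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

For a hypothetical call with 20,000 input tokens and 2,000 output tokens, `usd_cost("opus-4.7", 20_000, 2_000)` comes to about $0.45 and the same call on `"sonnet-4.6"` to about $0.09, which is the 5× gap the routing rules below try to exploit.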
Routing rule by station
S1 — Stress Test (target ceiling : $3 per sprint)
| Step | Default model | Cheap path | Notes |
|---|---|---|---|
| Channel A · YC Office Hours (/office-hours + /plan-ceo-review) | Opus 4.7 | Sonnet 4.6 | Opus is worth the cost here: these skills decide whether 13 days of sprint happen |
| Channel B · SaaS Thesis Factory load | (no LLM call) | (no LLM call) | Reads existing analyzer output. $0. |
| Channel C · Falsifier prompt | Sonnet 4.6 | Ollama llama3.3:70b | Single tight prompt (~500 tokens out). Fine on a smaller model — pattern is simple |
| Synthesis discussion MD | Sonnet 4.6 | Sonnet 4.6 | Generative writing task — Sonnet is plenty. Don't waste Opus here |
Estimated S1 total cost on Opus+Sonnet+Sonnet : ~$2-3 per sprint. On Sonnet-only : ~$1. Pure Ollama path : $0 (but Falsifier quality may drop — A/B test before defaulting).
S2 — Market Intelligence Report (target ceiling : $8 per sprint)
| Step | Default model | Cheap path | Notes |
|---|---|---|---|
| Lens 1 · Porter 5F | Sonnet 4.6 | Ollama | Structural analysis — pattern matching, not deep reasoning |
| Lens 2 · JTBD synthesis | Opus 4.7 | Sonnet 4.6 | The customer-voice clustering is where Opus's reasoning matters most |
| Lens 3 · April Dunford positioning | Sonnet 4.6 | Sonnet 4.6 | Mechanical — fill the framework |
| Customer voice scraping (Reddit/G2) | (no LLM, web fetch) | (no LLM, web fetch) | $0 LLM cost |
| Customer voice sentiment clustering | Sonnet 4.6 | Ollama | Bulk text → structured clusters |
| Competitor card generation (×20) | Sonnet 4.6 | Sonnet 4.6 | High-volume, mechanical |
| Hiring signal interpretation | Sonnet 4.6 | Sonnet 4.6 | Pattern matching |
| The "case vide" 2×2 chart axes selection | Opus 4.7 | Sonnet 4.6 | This is the headline output — invest |
| Section synthesis (writing the 9 sections) | Opus 4.7 | Sonnet 4.6 | Final pass before lock |
Estimated S2 total cost on mixed routing : ~$5-8 per sprint. Sonnet-only : ~$3. Pure Ollama : ~$1 (the residual covers any steps still sent to Opus).
S3 — Brand & Naming (target ceiling : $1 per sprint)
All gstack skills (/brand-identity, /naming-methodology, /brand-archetype, /color-psychology) call Claude internally — their model selection is governed by the user's claude CLI config, not Atelier. Estimated : $0.50-1.50 per sprint.
S4 — Spec & Design (target ceiling : $2 per sprint)
/design-shotgun, /design-html, /plan-eng-review are gstack skills — same logic. Estimated : $1-2.
S5 — Build (uncapped — depends on iteration count)
Code generation + /qa + /codex + /cso security audit are gstack. This is the most expensive station because of iteration. Estimated : $5-20 depending on the number of fix cycles. This is where the budget actually lives, so keep S1 and S2 cheap to leave headroom here.
S6 — Launch (target ceiling : $1.50 per sprint)
/landing-page-copywriter, /marketing-psychology, /design-html — gstack. ~$1-2.
S7 — Iterate (recurring, $0.50/week per active sprint)
Cron retros : /benchmark, /canary, /retro. Most run zero LLM (browse + perf checks). The weekly synthesis : Sonnet 4.6 = ~$0.50.
Full-sprint cost envelope
| Routing strategy | S1 | S2 | S3, S4, S6 | S5 (build cycles) | Total per sprint |
|---|---|---|---|---|---|
| Default (Opus where critical, Sonnet otherwise) | $3 | $8 | $4 | $10 | ~$25 |
| Lean (Sonnet everywhere, Opus only synthesis) | $1.50 | $3 | $2 | $8 | ~$15 |
| Aggressive (Ollama for non-critical, Sonnet for synthesis) | $0.50 | $1 | $1 | $6 | ~$9 |
If the MRR target per sprint is 5k€ within 60 days, even the default (~$25) gives a 200× ratio of target revenue to LLM cost (5,000 / 25, treating € and $ as roughly equal). The lean and aggressive paths are for benchmarking the orchestrator's stability before committing to default routing.
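The envelope arithmetic can be checked with a few lines. This sketch only re-adds the per-column figures from the table; the names `ENVELOPE`, `sprint_total` and `revenue_ratio` are hypothetical, and the ratio treats € and $ as equal, which is an approximation.

```python
# Per-column figures (USD) copied from the envelope table:
# S1, S2, the remaining non-build stations, and S5 build cycles.
ENVELOPE = {
    "default": [3.0, 8.0, 4.0, 10.0],
    "lean": [1.5, 3.0, 2.0, 8.0],
    "aggressive": [0.5, 1.0, 1.0, 6.0],
}

MRR_TARGET = 5000.0  # the 5k target from the text, currency conversion ignored


def sprint_total(strategy: str) -> float:
    """Sum the per-column costs for one routing strategy."""
    return sum(ENVELOPE[strategy])


def revenue_ratio(strategy: str) -> float:
    """Ratio of the MRR target to the per-sprint LLM cost."""
    return MRR_TARGET / sprint_total(strategy)
```

The unrounded sums are $25.00, $14.50 and $8.50, which the table reports as ~$25, ~$15 and ~$9.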
Practical defaults shipped in Phase 1
- S1 Channel A (gstack skills) : whatever the user's claude CLI has configured. Atelier doesn't override.
- S1 Channel C Falsifier (when wired to orchestrator in Phase 3) : Sonnet 4.6 by default. Ollama is the optional opt-in for sprint #5+ once we know the prompt is stable.
- S1 Synthesis discussion MD (when wired) : Sonnet 4.6. Output is generative prose — Opus would be overkill.
- S2 lenses (Phase 2) : mixed routing as in the S2 table above. Opus only on JTBD clustering + 2×2 axes selection.
How to route to the Mac Mini Ollama
The Mac Mini at 100.86.52.24 (SSH alias macmini already configured in ~/.ssh/config) runs Ollama. To call it from the orchestrator on this Mac :
ssh macmini 'ollama run llama3.3:70b "<prompt>"'
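Or via HTTP, if Ollama is bound to 0.0.0.0:11434 (by default it listens only on 127.0.0.1:11434; setting OLLAMA_HOST=0.0.0.0 on the Mac Mini exposes it) :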
curl -X POST http://100.86.52.24:11434/api/generate \
-d '{"model":"llama3.3:70b","prompt":"<prompt>","stream":false}'
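If the orchestrator is written in Python, the same HTTP call might look like the sketch below, using only the standard library. The function names `build_payload` and `generate` and the 300-second timeout are assumptions made for illustration; the request body mirrors the curl example, and with `"stream": false` Ollama returns a single JSON object whose `response` field holds the generated text.

```python
import json
import urllib.request

# Address taken from the curl example; assumes Ollama is reachable over HTTP.
OLLAMA_URL = "http://100.86.52.24:11434/api/generate"


def build_payload(prompt: str, model: str = "llama3.3:70b") -> bytes:
    """Encode the same JSON body as the curl example."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(prompt: str, model: str = "llama3.3:70b", timeout: float = 300.0) -> str:
    """POST the prompt to Ollama and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return result["response"]
```

A 70B model can take a while to answer on local hardware, so the timeout is deliberately generous; tune it once real latencies are known.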
Atelier's orchestrator should accept a --model-routing flag with values default | lean | aggressive and dispatch each step accordingly.
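One possible shape for that flag, sketched with `argparse`. The step keys, the model labels and the `ROUTES` table are hypothetical, and the mapping of each table's "Cheap path" column onto the aggressive strategy is an interpretation of this playbook, not a confirmed design. Channel A is left out because its model comes from the user's claude CLI config.

```python
import argparse

STRATEGIES = ("default", "lean", "aggressive")

# Hypothetical step keys mapped to a model label per strategy.
# Values follow the S1 and S2 tables where they are explicit.
ROUTES = {
    "s1_falsifier": {
        "default": "sonnet-4.6",
        "lean": "sonnet-4.6",
        "aggressive": "ollama:llama3.3:70b",
    },
    "s1_synthesis": {
        "default": "sonnet-4.6",
        "lean": "sonnet-4.6",
        "aggressive": "sonnet-4.6",
    },
    "s2_porter_5f": {
        "default": "sonnet-4.6",
        "lean": "sonnet-4.6",
        "aggressive": "ollama",  # the table names no specific local model
    },
}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the orchestrator command line; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Atelier orchestrator (sketch)")
    parser.add_argument("--model-routing", choices=STRATEGIES, default="default")
    return parser.parse_args(argv)


def model_for(step: str, strategy: str) -> str:
    """Look up which model handles a step under the chosen strategy."""
    return ROUTES[step][strategy]
```

Using `choices` makes `argparse` reject any value outside the three strategies, so a typo fails at startup rather than mid-sprint.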
Monitoring
Per-sprint LLM cost should be emitted as a [COST] event in agent-activity.jsonl after each station, with provider, model, input_tokens, output_tokens, usd_cost. The dashboard's /live feed already supports the new event format — Channel C and synthesis will start emitting these once Phase 3 wires the orchestrator hook.
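A [COST] event could be serialized along these lines. Only the five fields named above (provider, model, input_tokens, output_tokens, usd_cost) come from this playbook; the `event`, `ts` and `station` keys, the function names, and the exact JSON layout are assumptions, since the dashboard's event schema is not reproduced here.

```python
import json
from datetime import datetime, timezone


def cost_event(station: str, provider: str, model: str,
               input_tokens: int, output_tokens: int, usd_cost: float) -> dict:
    """Build one cost record; 'event', 'ts' and 'station' are assumed keys."""
    return {
        "event": "COST",
        "ts": datetime.now(timezone.utc).isoformat(),
        "station": station,
        "provider": provider,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "usd_cost": usd_cost,
    }


def to_jsonl_line(event: dict) -> str:
    """Serialize one event as a single newline-terminated JSON line."""
    return json.dumps(event) + "\n"


def emit(event: dict, path: str = "agent-activity.jsonl") -> None:
    """Append the event to the activity log."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(to_jsonl_line(event))
```

Opening the file in append mode for each event keeps the log valid JSONL even if the orchestrator stops partway through a station.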
Cost overruns are first-class events : if a sprint exceeds 2× the routing strategy's ceiling, halt and require a manual continue.
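The halt rule can be expressed as a small guard. In this sketch the ceilings are the rounded per-sprint totals from the envelope table, which is an assumption about what "ceiling" refers to; `CostOverrun` and `check_overrun` are hypothetical names, and how the manual continue is collected is left to the orchestrator.

```python
# Rounded per-sprint totals from the envelope table, used here as ceilings.
CEILINGS_USD = {"default": 25.0, "lean": 15.0, "aggressive": 9.0}
OVERRUN_FACTOR = 2.0  # halt once spend exceeds 2x the ceiling


class CostOverrun(RuntimeError):
    """Raised when a sprint must stop and wait for a manual continue."""


def check_overrun(strategy: str, spent_usd: float) -> None:
    """Raise CostOverrun if cumulative spend exceeds the allowed limit."""
    limit = OVERRUN_FACTOR * CEILINGS_USD[strategy]
    if spent_usd > limit:
        raise CostOverrun(
            f"{strategy} routing: spent ${spent_usd:.2f}, limit ${limit:.2f}"
        )
```

Calling the guard after each station, with the running total of `usd_cost` values, stops the sprint at the first station that pushes it over the limit.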