LLM Cost Playbook — Atelier Stations

Last updated · 2026-05-18

Purpose · Stop runaway LLM spend per sprint. Document which model handles which step, when to offload to local Ollama on the Mac Mini, and what the realistic per-sprint cost ceiling is.

Hardware available

| Asset | Role | Cost |
| --- | --- | --- |
| Anthropic API (Opus 4.7) | deep reasoning, synthesis, critical paths | $15/M input, $75/M output |
| Anthropic API (Sonnet 4.6) | balanced, fast, high-frequency | $3/M input, $15/M output |
| OpenAI GPT-5 | secondary opinion, eventual ensemble | similar to Opus tier |
| Mac Mini (100.86.52.24) via Ollama | local, free, any open-weights model | $0 |
| VPS (saas_factory_autonomous) | upstream idea generation only | already running, not Atelier scope |
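The per-million-token prices above turn into a per-call cost with one line of arithmetic. The sketch below shows that calculation. The model keys (`opus-4.7`, `sonnet-4.6`, `ollama`) and the token counts are illustrative assumptions, not identifiers or measurements from Atelier.

```python
# Prices in USD per million tokens, copied from the hardware table.
# The dictionary keys are hypothetical labels, not official model IDs.
PRICES_PER_M = {
    "opus-4.7": {"input": 15.0, "output": 75.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "ollama": {"input": 0.0, "output": 0.0},  # local Mac Mini, treated as free
}


def usd_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call at the table's per-million-token rates."""
    price = PRICES_PER_M[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical call size, for illustration only: 2,000 tokens in, 500 out.
call_on_sonnet = usd_cost("sonnet-4.6", input_tokens=2_000, output_tokens=500)
call_on_opus = usd_cost("opus-4.7", input_tokens=2_000, output_tokens=500)
print(f"Sonnet: ${call_on_sonnet:.4f}  Opus: ${call_on_opus:.4f}")
```

At these list prices the same call costs five times more on Opus than on Sonnet, which is why the routing tables reserve Opus for a few steps.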

Routing rule by station

S1 — Stress Test (target ceiling : $3 per sprint)

| Step | Default model | Cheap path | Notes |
| --- | --- | --- | --- |
| Channel A · YC Office Hours (/office-hours + /plan-ceo-review) | Opus 4.7 | Sonnet 4.6 | Opus is worth the cost here: these skills decide whether 13 days of sprint happen |
| Channel B · SaaS Thesis Factory load | (no LLM call) | (no LLM call) | Reads existing analyzer output. $0. |
| Channel C · Falsifier prompt | Sonnet 4.6 | Ollama llama3.3:70b | Single tight prompt (~500 tokens out). Fine on a smaller model; pattern is simple |
| Synthesis discussion MD | Sonnet 4.6 | Sonnet 4.6 | Generative writing task; Sonnet is plenty. Don't waste Opus here |

Estimated S1 total cost with default routing (Opus for Channel A, Sonnet for Channel C and for synthesis) : ~$2-3 per sprint. Sonnet-only : ~$1. Pure Ollama path : $0, but Falsifier quality may drop, so A/B test before defaulting.

S2 — Market Intelligence Report (target ceiling : $8 per sprint)

| Step | Default model | Cheap path | Notes |
| --- | --- | --- | --- |
| Lens 1 · Porter 5F | Sonnet 4.6 | Ollama | Structural analysis: pattern matching, not deep reasoning |
| Lens 2 · JTBD synthesis | Opus 4.7 | Sonnet 4.6 | The customer-voice clustering is where Opus's reasoning matters most |
| Lens 3 · April Dunford positioning | Sonnet 4.6 | Sonnet 4.6 | Mechanical: fill the framework |
| Customer voice scraping (Reddit/G2) | (no LLM, web fetch) | (no LLM, web fetch) | $0 LLM cost |
| Customer voice sentiment clustering | Sonnet 4.6 | Ollama | Bulk text → structured clusters |
| Competitor card generation (×20) | Sonnet 4.6 | Sonnet 4.6 | High-volume, mechanical |
| Hiring signal interpretation | Sonnet 4.6 | Sonnet 4.6 | Pattern matching |
| The "case vide" 2×2 chart axes selection | Opus 4.7 | Sonnet 4.6 | This is the headline output, so invest |
| Section synthesis (writing the 9 sections) | Opus 4.7 | Sonnet 4.6 | Final pass before lock |

Estimated S2 total cost on mixed routing : ~$5-8 per sprint. Sonnet-only : ~$3. Ollama wherever a cheap path allows it : ~$1 (the remainder is whatever Opus calls are kept, if any).

S3 — Brand & Naming (target ceiling : $1 per sprint)

All gstack skills (/brand-identity, /naming-methodology, /brand-archetype, /color-psychology) call Claude internally — their model selection is governed by the user's claude CLI config, not Atelier. Estimated : $0.50-1.50 per sprint.

S4 — Spec & Design (target ceiling : $2 per sprint)

/design-shotgun, /design-html, /plan-eng-review are gstack skills — same logic. Estimated : $1-2.

S5 — Build (uncapped — depends on iteration count)

Code generation, /qa, /codex, and the /cso security audit are gstack skills. This is the most expensive station because of iteration. Estimated : $5-20 depending on the number of fix cycles. This is where the budget actually lives, so S1 and S2 should stay cheap to leave headroom here.

S6 — Launch (target ceiling : $1.50 per sprint)

/landing-page-copywriter, /marketing-psychology, /design-html — gstack. ~$1-2.

S7 — Iterate (recurring, $0.50/week per active sprint)

Cron retros : /benchmark, /canary, /retro. Most make no LLM calls (browsing and performance checks only). The weekly synthesis runs on Sonnet 4.6 at ~$0.50.

Full-sprint cost envelope

| Routing strategy | S1 | S2 | S3-S6 (excl. S5) | S5 (build cycles) | Total per sprint |
| --- | --- | --- | --- | --- | --- |
| Default (Opus where critical, Sonnet otherwise) | $3 | $8 | $4 | $10 | ~$25 |
| Lean (Sonnet everywhere, Opus only synthesis) | $1.50 | $3 | $2 | $8 | ~$15 |
| Aggressive (Ollama for non-critical, Sonnet for synthesis) | $0.50 | $1 | $1 | $6 | ~$9 |

If the MRR target per sprint is 5k€ within 60 days, that target is roughly 200× the cost of even the default routing ($25). The lean and aggressive paths are for benchmarking the orchestrator's stability before committing to default routing.
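As a quick check of the envelope arithmetic, the sketch below sums each routing strategy's station estimates and compares the total with the 5k€ target. It assumes the S3-S6 column excludes S5 (otherwise the totals would double-count the build cycles) and treats € and $ as 1:1 for the ratio. The key name `other` is a hypothetical label for that column.

```python
# Per-sprint estimates in USD, copied from the envelope table.
# "other" stands for the S3-S6 column, read here as S3 + S4 + S6 (an assumption).
ENVELOPE = {
    "default": {"S1": 3.00, "S2": 8.00, "other": 4.00, "S5": 10.00},
    "lean": {"S1": 1.50, "S2": 3.00, "other": 2.00, "S5": 8.00},
    "aggressive": {"S1": 0.50, "S2": 1.00, "other": 1.00, "S5": 6.00},
}
MRR_TARGET = 5_000  # 5k€ target; currencies treated as 1:1 for this estimate

# Sum the station estimates, then express the target as a multiple of spend.
totals = {name: sum(stations.values()) for name, stations in ENVELOPE.items()}
ratios = {name: MRR_TARGET / total for name, total in totals.items()}

for name in ENVELOPE:
    print(f"{name:<10} total=${totals[name]:.2f}  target/spend={ratios[name]:.0f}x")
```

The default row sums to exactly $25 and a 200× ratio. The lean and aggressive rows sum to $14.50 and $8.50, which the table rounds up to ~$15 and ~$9.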

Practical defaults shipped in Phase 1

  • S1 Channel A (gstack skills) : whatever the user's claude CLI has configured. Atelier doesn't override.
  • S1 Channel C Falsifier (when wired to orchestrator in Phase 3) : Sonnet 4.6 by default. Ollama is the optional opt-in for sprint #5+ once we know the prompt is stable.
  • S1 Synthesis discussion MD (when wired) : Sonnet 4.6. Output is generative prose — Opus would be overkill.
  • S2 lenses (Phase 2) : mixed routing as in the S2 table above. Opus only on JTBD clustering + 2×2 axes selection.

How to route to the Mac Mini Ollama

The Mac Mini at 100.86.52.24 (SSH alias macmini already configured in ~/.ssh/config) runs Ollama. To call it from the orchestrator on this Mac :

ssh macmini 'ollama run llama3.3:70b "<prompt>"'

Or via HTTP if Ollama is bound to 0.0.0.0:11434 :

curl -X POST http://100.86.52.24:11434/api/generate \
  -d '{"model":"llama3.3:70b","prompt":"<prompt>","stream":false}'
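From a Python orchestrator, the same HTTP call could look like the sketch below. It uses only the standard library and sends the payload shown in the curl command. The function names and the 300-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

# Address and port taken from the curl example.
OLLAMA_URL = "http://100.86.52.24:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.3:70b") -> urllib.request.Request:
    """Build the same non-streaming POST that the curl command sends."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.3:70b", timeout: float = 300.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    # With "stream": false, Ollama replies with a single JSON object whose
    # "response" field holds the full completion.
    return payload["response"]
```

A 70B model on local hardware can take minutes on long prompts, so the timeout is deliberately generous. Tune it once real latencies are known.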

Atelier's orchestrator should accept a --model-routing flag with values default | lean | aggressive and dispatch each step accordingly.
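A minimal sketch of that flag, assuming a Python orchestrator built on `argparse`. The step identifiers (`s1.falsifier`, `s2.jtbd`, and so on) and the short model labels are hypothetical. The per-step choices follow the S1 and S2 tables and the strategy descriptions in the cost envelope, covering only a few steps.

```python
import argparse

# Hypothetical step IDs and model labels. Only a few steps are listed.
ROUTES = {
    "default": {
        "s1.falsifier": "sonnet",
        "s2.porter": "sonnet",
        "s2.jtbd": "opus",
        "s2.section_synthesis": "opus",
    },
    "lean": {  # Sonnet everywhere, Opus only for synthesis
        "s1.falsifier": "sonnet",
        "s2.porter": "sonnet",
        "s2.jtbd": "sonnet",
        "s2.section_synthesis": "opus",
    },
    "aggressive": {  # Ollama for non-critical steps, Sonnet for synthesis
        "s1.falsifier": "ollama",
        "s2.porter": "ollama",
        "s2.jtbd": "sonnet",
        "s2.section_synthesis": "sonnet",
    },
}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; --model-routing defaults to 'default'."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-routing",
        choices=sorted(ROUTES),
        default="default",
        help="routing strategy applied to every station step",
    )
    return parser.parse_args(argv)


def model_for(step: str, strategy: str) -> str:
    """Return the model label a step should use under the chosen strategy."""
    return ROUTES[strategy][step]
```

Keeping the routing in a single table means changing one step's model is a one-line edit, and `argparse` rejects any value outside the three strategies.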

Monitoring

Per-sprint LLM cost should be emitted as a [COST] event in agent-activity.jsonl after each station, with provider, model, input_tokens, output_tokens, usd_cost. The dashboard's /live feed already supports the new event format — Channel C and synthesis will start emitting these once Phase 3 wires the orchestrator hook.
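One possible shape for that event is sketched below: one JSON object per line, appended to the log. The five cost fields come from the paragraph above. The `type`, `ts`, and `station` keys and the function name are assumptions, and the real dashboard schema may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def emit_cost_event(
    log_path: str,
    station: str,
    provider: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    usd_cost: float,
) -> dict:
    """Append one COST event as a single JSON line and return it."""
    event = {
        "type": "COST",  # assumed event-type key
        "ts": datetime.now(timezone.utc).isoformat(),  # assumed timestamp key
        "station": station,  # assumed key; the text says "after each station"
        "provider": provider,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "usd_cost": usd_cost,
    }
    # Append mode keeps earlier events intact; one object per line is JSONL.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```

Usage would be a single call after each station, for example `emit_cost_event("agent-activity.jsonl", "S1", "anthropic", "sonnet-4.6", 2000, 500, 0.0135)`, where the values are illustrative.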

Cost overruns are first-class events : if a sprint exceeds 2× the routing strategy's ceiling, halt and require manual continue.
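The halt rule can be sketched as a small guard that runs after each station. It assumes "ceiling" means the strategy's total per sprint from the cost envelope ($25, $15, $9). The function names and the prompt wording are hypothetical.

```python
# Per-sprint totals from the cost envelope, taken as each strategy's ceiling
# (an assumption about what "ceiling" refers to).
CEILINGS = {"default": 25.0, "lean": 15.0, "aggressive": 9.0}
OVERRUN_FACTOR = 2.0  # halt once spend exceeds 2x the ceiling


def exceeds_ceiling(strategy: str, spent_usd: float) -> bool:
    """Return True when cumulative spend is above 2x the strategy's ceiling."""
    return spent_usd > CEILINGS[strategy] * OVERRUN_FACTOR


def may_continue(strategy: str, spent_usd: float, ask=input) -> bool:
    """Return True if the sprint may proceed.

    Below the limit this returns True without prompting. Above it, the sprint
    halts until a human explicitly answers 'y'.
    """
    if not exceeds_ceiling(strategy, spent_usd):
        return True
    limit = CEILINGS[strategy] * OVERRUN_FACTOR
    answer = ask(f"Spend ${spent_usd:.2f} exceeds ${limit:.2f}. Continue? [y/N] ")
    return answer.strip().lower() == "y"
```

Passing the prompt function in as `ask` lets the guard be exercised without a terminal, for example `may_continue("lean", 31.0, ask=lambda _: "n")`.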