playbook/COST.md

LLM Cost Playbook — Atelier Stations

Last updated · 2026-05-18

Purpose · Stop runaway LLM spend per sprint. Document which model handles which step, when to offload to local Ollama on the Mac Mini, and what the realistic per-sprint cost ceiling is.

Hardware available

Asset	Role	Cost
Anthropic API (Opus 4.7)	deep reasoning, synthesis, critical paths	$15/M input, $75/M output
Anthropic API (Sonnet 4.6)	balanced, fast, high-frequency	$3/M input, $15/M output
OpenAI GPT-5	secondary opinion, eventual ensemble	similar to Opus tier
Mac Mini (100.86.52.24) via Ollama	local, free, any open-weights model	$0
VPS (saas_factory_autonomous)	upstream idea generation only	already running, not Atelier scope
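
The per-million-token prices above translate into a per-call cost with simple arithmetic. The sketch below is illustrative only: the model keys are short labels invented here (not API model identifiers), and the 20k-in / 4k-out call size is a hypothetical example, not a measured figure.

```python
# Prices in USD per million tokens, copied from the hardware table above.
# The dictionary keys are illustrative labels, not real API model IDs.
PRICES_PER_MTOK = {
    "opus-4.7": {"input": 15.00, "output": 75.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "ollama": {"input": 0.00, "output": 0.00},  # local Mac Mini, no API bill
}


def call_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call at the table's per-million rates."""
    price = PRICES_PER_MTOK[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical call size: 20k tokens in, 4k tokens out.
for model in PRICES_PER_MTOK:
    print(f"{model:12s} ${call_cost_usd(model, 20_000, 4_000):.2f}")
```

At that hypothetical size, the same call costs five times more on Opus than on Sonnet, which is the gap the routing tables below try to exploit.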

Routing rule by station

S1 — Stress Test (target ceiling : $3 per sprint)

Step	Default model	Cheap path	Notes
Channel A · YC Office Hours (`/office-hours` + `/plan-ceo-review`)	Opus 4.7	Sonnet 4.6	Opus is worth the cost here — these skills decide whether 13 days of sprint happen
Channel B · SaaS Thesis Factory load	(no LLM call)	(no LLM call)	Reads existing analyzer output. $0.
Channel C · Falsifier prompt	Sonnet 4.6	Ollama llama3.3:70b	Single tight prompt (~500 tokens out). Fine on a smaller model — pattern is simple
Synthesis discussion MD	Sonnet 4.6	Sonnet 4.6	Generative writing task — Sonnet is plenty. Don't waste Opus here

Estimated S1 total with default routing (Opus for Channel A, Sonnet for Channel C and synthesis) : ~$2-3 per sprint. Sonnet-only : ~$1. Pure Ollama path : $0, but Falsifier quality may drop, so A/B test before defaulting.

S2 — Market Intelligence Report (target ceiling : $8 per sprint)

Step	Default model	Cheap path	Notes
Lens 1 · Porter 5F	Sonnet 4.6	Ollama	Structural analysis — pattern matching, not deep reasoning
Lens 2 · JTBD synthesis	Opus 4.7	Sonnet 4.6	The customer-voice clustering is where Opus's reasoning matters most
Lens 3 · April Dunford positioning	Sonnet 4.6	Sonnet 4.6	Mechanical — fill the framework
Customer voice scraping (Reddit/G2)	(no LLM, web fetch)	(no LLM, web fetch)	$0 LLM cost
Customer voice sentiment clustering	Sonnet 4.6	Ollama	Bulk text → structured clusters
Competitor card generation (×20)	Sonnet 4.6	Sonnet 4.6	High-volume, mechanical
Hiring signal interpretation	Sonnet 4.6	Sonnet 4.6	Pattern matching
The "case vide" 2×2 chart axes selection	Opus 4.7	Sonnet 4.6	This is the headline output — invest
Section synthesis (writing the 9 sections)	Opus 4.7	Sonnet 4.6	Final pass before lock

Estimated S2 total cost on mixed routing : ~$5-8 per sprint. Sonnet-only : ~$3. Ollama wherever possible : ~$1, which covers any calls that still go to Opus.

S3 — Brand & Naming (target ceiling : $1 per sprint)

All gstack skills (/brand-identity, /naming-methodology, /brand-archetype, /color-psychology) call Claude internally — their model selection is governed by the user's claude CLI config, not Atelier. Estimated : $0.50-1.50 per sprint.

S4 — Spec & Design (target ceiling : $2 per sprint)

/design-shotgun, /design-html, /plan-eng-review are gstack skills — same logic. Estimated : $1-2.

S5 — Build (uncapped — depends on iteration count)

Code generation + /qa + /codex + /cso security audit are gstack. This is the most expensive station because of iteration. Estimated : $5-20 depending on the number of fix cycles. This is where the budget actually lives, so keep S1 and S2 cheap to leave headroom here.

S6 — Launch (target ceiling : $1.50 per sprint)

/landing-page-copywriter, /marketing-psychology, /design-html — gstack. ~$1-2.

S7 — Iterate (recurring, $0.50/week per active sprint)

Cron retros : /benchmark, /canary, /retro. Most make no LLM calls (browse + perf checks). The weekly synthesis runs on Sonnet 4.6 at ~$0.50.

Full-sprint cost envelope

Routing strategy	S1	S2	S3+S4+S6	S5 (build cycles)	Total per sprint
Default (Opus where critical, Sonnet otherwise)	$3	$8	$4	$10	~$25
Lean (Sonnet everywhere, Opus only synthesis)	$1.50	$3	$2	$8	~$15
Aggressive (Ollama for non-critical, Sonnet for synthesis)	$0.50	$1	$1	$6	~$9

If the MRR target per sprint is 5k€ within 60 days, even the default envelope (~$25) is covered roughly 200× by target revenue (5,000 / 25, treating € and $ as equivalent). The lean and aggressive paths are for benchmarking the orchestrator's stability before committing to default routing.
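
The totals and the 200× ratio can be re-derived from the envelope table. This is a rough check, not an accounting tool: the third column is read as S3, S4 and S6 (S5 has its own column), and the 5k€ target is treated as roughly equal to USD, which is an assumption.

```python
# Per-station estimates in USD, copied from the envelope table above.
ENVELOPE = {
    "default": {"S1": 3.00, "S2": 8.00, "S3_S4_S6": 4.00, "S5": 10.00},
    "lean": {"S1": 1.50, "S2": 3.00, "S3_S4_S6": 2.00, "S5": 8.00},
    "aggressive": {"S1": 0.50, "S2": 1.00, "S3_S4_S6": 1.00, "S5": 6.00},
}

# 5k EUR per sprint from the text; treated as ~USD for a rough ratio (assumption).
MRR_TARGET = 5_000


def sprint_total(strategy: str) -> float:
    """Sum the per-station estimates for one routing strategy."""
    return sum(ENVELOPE[strategy].values())


for name in ENVELOPE:
    total = sprint_total(name)
    ratio = MRR_TARGET / total
    print(f"{name:10s} total ${total:.2f}  target revenue / spend ~ {ratio:.0f}x")
```

The exact sums for lean and aggressive are $14.50 and $8.50; the table rounds them up to ~$15 and ~$9.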

Practical defaults shipped in Phase 1

S1 Channel A (gstack skills) : whatever the user's claude CLI has configured. Atelier doesn't override.
S1 Channel C Falsifier (when wired to orchestrator in Phase 3) : Sonnet 4.6 by default. Ollama is the optional opt-in for sprint #5+ once we know the prompt is stable.
S1 Synthesis discussion MD (when wired) : Sonnet 4.6. Output is generative prose — Opus would be overkill.
S2 lenses (Phase 2) : mixed routing as in the S2 table above. Opus only on JTBD clustering + 2×2 axes selection.

How to route to the Mac Mini Ollama

The Mac Mini at 100.86.52.24 (SSH alias macmini already configured in ~/.ssh/config) runs Ollama. To call it from the orchestrator on this Mac :

ssh macmini 'ollama run llama3.3:70b "<prompt>"'

Or via HTTP if Ollama is bound to 0.0.0.0:11434 :

curl -X POST http://100.86.52.24:11434/api/generate \
  -d '{"model":"llama3.3:70b","prompt":"<prompt>","stream":false}'
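
The same HTTP call can be made from Python with only the standard library. This is a minimal sketch, assuming Ollama is reachable at the address used in the curl example; the function names and the 300-second timeout are illustrative choices, not part of Atelier.

```python
import json
import urllib.request

# Host and port taken from the curl example above.
OLLAMA_URL = "http://100.86.52.24:11434/api/generate"


def build_generate_request(prompt: str, model: str = "llama3.3:70b") -> urllib.request.Request:
    """Build the POST request for a non-streaming /api/generate call."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.3:70b", timeout_s: float = 300.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.load(response)
    # With "stream": false, Ollama returns one JSON object whose
    # "response" field holds the full completion.
    return payload["response"]
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without a live Mac Mini.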

Atelier's orchestrator should accept a --model-routing flag with values default | lean | aggressive and dispatch each step accordingly.
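
One way the flag and dispatch could look, sketched with argparse. The step names and model labels are hypothetical. The "default" and "aggressive" entries follow the "Default model" and "Cheap path" columns of the S1 table; the "lean" entries are an assumption, since the playbook does not spell out lean routing per step. Only the two S1 steps the orchestrator is expected to own are shown.

```python
import argparse

STRATEGIES = ["default", "lean", "aggressive"]

# Hypothetical step names and model labels. "lean" values are assumed.
S1_ROUTING = {
    "channel_c_falsifier": {
        "default": "sonnet-4.6",
        "lean": "sonnet-4.6",
        "aggressive": "ollama:llama3.3:70b",
    },
    "synthesis_discussion_md": {
        "default": "sonnet-4.6",
        "lean": "sonnet-4.6",
        "aggressive": "sonnet-4.6",
    },
}


def build_parser() -> argparse.ArgumentParser:
    """Create a CLI parser exposing the --model-routing flag."""
    parser = argparse.ArgumentParser(prog="atelier")
    parser.add_argument("--model-routing", choices=STRATEGIES, default="default")
    return parser


def model_for(step: str, strategy: str) -> str:
    """Look up which model a step should use under a routing strategy."""
    return S1_ROUTING[step][strategy]


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--model-routing", "aggressive"])
for step in S1_ROUTING:
    print(f"{step} -> {model_for(step, args.model_routing)}")
```

Using `choices` makes argparse reject any value outside default | lean | aggressive before a single model call is made.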

Monitoring

LLM cost should be emitted as a [COST] event in agent-activity.jsonl after each station, with the fields provider, model, input_tokens, output_tokens, usd_cost. The dashboard's /live feed already supports the new event format; Channel C and synthesis will start emitting these once Phase 3 wires the orchestrator hook.
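
A [COST] event could be serialized as one JSON object per line. In this sketch the fields provider, model, input_tokens, output_tokens and usd_cost come from the text above; the "event", "ts" and "station" keys, the helper names, and the log file location are assumptions, and the dashboard's actual schema may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# File name from the playbook; its directory is an assumption.
LOG_PATH = Path("agent-activity.jsonl")


def cost_event(station: str, provider: str, model: str,
               input_tokens: int, output_tokens: int, usd_cost: float) -> dict:
    """Build one [COST] event. "event", "ts" and "station" are assumed keys."""
    return {
        "event": "[COST]",
        "ts": datetime.now(timezone.utc).isoformat(),
        "station": station,
        "provider": provider,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "usd_cost": round(usd_cost, 6),
    }


def to_jsonl_line(event: dict) -> str:
    """Serialize an event as a single newline-terminated JSON line."""
    return json.dumps(event, ensure_ascii=False) + "\n"


def append_event(event: dict, path: Path = LOG_PATH) -> None:
    """Append the event to the JSONL activity log."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(to_jsonl_line(event))
```

A station would call `append_event(cost_event("S1", "anthropic", "sonnet-4.6", 2000, 500, 0.0135))` once its calls complete; the token counts here are made up for illustration.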

Cost overruns are first-class events : if a sprint exceeds 2× the ceiling of its routing strategy, halt and require a manual continue.
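
The halt rule can be sketched as a guard the orchestrator runs after each station. This assumes the "ceiling" is the "Total per sprint" value from the envelope table; the exception name and function are hypothetical.

```python
# Sprint ceilings in USD, taken from the "Total per sprint" column
# of the envelope table (assumed to be the ceilings the rule refers to).
CEILINGS_USD = {"default": 25.0, "lean": 15.0, "aggressive": 9.0}
OVERRUN_FACTOR = 2.0


class CostOverrun(RuntimeError):
    """Raised when a sprint must halt and wait for a manual continue."""


def check_overrun(strategy: str, spent_usd: float) -> None:
    """Raise CostOverrun once spend exceeds 2x the strategy's ceiling."""
    limit = OVERRUN_FACTOR * CEILINGS_USD[strategy]
    if spent_usd > limit:
        raise CostOverrun(
            f"{strategy} sprint spent ${spent_usd:.2f}, over the ${limit:.2f} halt limit"
        )


check_overrun("lean", 12.40)  # under the $30 limit, so nothing happens
```

Raising an exception rather than logging a warning forces the caller to decide explicitly whether to continue.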