Handbook
Cost-aware planning and model tiering
Forge planning should raise final quality and control token cost without re-typing the same instructions each turn. This page defines the reusable pattern: a lean triage rule that sizes every request, a…
Companion rules: forge-triage.mdc and forge-planning-standards.mdc. Install: see Cursor rules — quickstart (Forge + Versonas) (--with-cost-tiering-rules). Where these sit relative to Rules / AGENTS.md / Skills / subagents / recipes: Versona operating model — process-first layout, artifacts, and tooling split §3.
Why
Cursor gives no pre-execution token meter (the context ring reports usage only afterward), and there is no way to scope a rule to Plan Mode alone. So the practical levers are: (1) rules that are in context during planning and shape plan output, (2) deliberate model selection, and (3) subagents that isolate token-heavy work on cheaper models. This pattern packages all three.
1. Triage every request
forge-triage.mdc (lean, always-apply) emits one compact line before non-trivial work:
Triage: <XS|S|M|L|XL> (~<rough token order>) · <1-3 word rationale>
The size is a heuristic self-estimate from visible signals (files touched, search breadth, generation volume, loop count), not a measured number. Calibrate against the context ring over time.
2. T-shirt sizing rubric
| Size | Signals | ~Token order | Triage depth | Model / orchestration response |
|---|---|---|---|---|
| XS | 1 file, direct answer, no search | <5k | none | current model, no subagents |
| S | 2-4 files, simple edit, narrow grep | 5-25k | 1-line header | Auto; Explore only if needed |
| M | multi-file feature, 1 repo, moderate reasoning | 25-100k | header + approach | Auto/mid main; Explore for search, Bash for runs |
| L | cross-cutting, iterative loops, or multi-repo | 100-400k | invest: strategy step | high-tier orchestrator + cheap subagents; plan before executing |
| XL | migration, many files, long multi-tier loops | >400k | invest + split | full tiered plan, checkpoints, split across sessions |
3. The gate (the key economic rule)
Strategy investment scales with size, so a trivial request never pays orchestration overhead:
- XS / S — execute directly. Do not spawn a strategy subagent; triage stays inline.
- M — one-sentence approach, then execute.
- L / XL — invest a strategy step only if
est(strategy) ≲ 10% of est(execution). The strategy pass (cheap) returns which subagents/models per phase and what to delegate.
Recursion trap to avoid: a subagent that "decides the strategy" is itself a token cost. Keep triage an inline heuristic (a few tokens by the current model); only at L/XL is the task big enough to amortize a spawned strategy pass.
4. Model tiering
| Tier | Work | Model |
|---|---|---|
| Orchestrator (main) | planning, architecture, synthesis | high-tier or Auto |
| Search / read | codebase Q&A, "where is X" | built-in Explore (auto, faster model) |
| Scripts / shell | tests, builds, git | built-in Bash (auto) |
| Mechanical edits | rename, boilerplate, apply-a-pattern | cheap grunt subagent (composer-2.5) |
The grunt subagent template lives at Grunt; copy it into .cursor/agents/ manually (like Skills).
Cost math (so tiering actually saves money)
- Each subagent carries its own context window: N parallel subagents ≈ N x tokens. Savings come from a cheap model doing the bulk-token work in a sequential tier (grep/mechanical → report back → high-tier reasons), not from fan-out.
- On legacy request-based plans, a subagent's
model:is ignored and it runs on Composer; token-based plans honor it. - Prefer Auto/Composer for everyday turns (larger included allowance; Teams plans are exempt from the Cursor Token Rate). Reserve Max Mode for tasks that genuinely need the larger context — it bills at the model's full API rate.
Fast-variant policy
Quality-first default: when the agent selects a model for a subagent or delegated Task, use the standard variant of a tier, not a speed-tier *-fast variant (e.g. composer-2.5, never composer-2.5-fast), unless the user explicitly asks for speed. If a -fast model would otherwise be the natural pick, flag it and use the standard variant.
Capability boundary: a rule is prompt context — it cannot override Cursor's model router. It only governs models the agent itself chooses (subagent model: fields, Task-tool model requests). The main-chat model remains the user's picker/Auto choice, which a rule cannot veto. So this policy is "don't delegate to fast variants," plus a reminder to the user, not a hard router block.
5. Testing rule effectiveness
Rules are prompts; treat their effectiveness as testable. The reusable, isolated harness lives beside the templates at Triage + planning effectiveness test:
- Run a synthetic request rules-on vs rules-off so measured behavior is attributable to the rules.
- Score each transcript against a rubric (triage header present and correctly sized? plan has the required sections? model tiering proposed? gate respected?).
- PDCA loop: run → score → refine rule wording on gaps → re-run → until the acceptance gate passes.
Effectiveness results
Latest run of the isolated harness (rules-on vs rules-off), scored by a separate evaluator against Scoring rubric:
| Scenario | Rules off | Rules on | Max | Notes |
|---|---|---|---|---|
| A — detailed plan (E2EE feature) | 8 | 23 | 24 | rules-off emitted no triage line, no PDCA/drift gate, no model tiering |
| B — small rename (S) | 6 | 10 | 10 | rules-off over-explained without sizing; rules-on gated correctly, no over-planning |
| Total | 14 | 33 | 34 | delta +19 |
Acceptance gate: PASS (A ≥ 20 with 10/10 plan rows; B = 10 with the two anti-over-planning rows maxed; delta ≥ 8).
PDCA action taken: the first rules-on run sized Scenario A as M while acknowledging the feature was L (rubric row A2 = partial). forge-triage.mdc was refined to size the underlying work, not just the current turn ("just plan it" for a large feature is still L/XL). A follow-up check confirmed the triage line then read L, closing the gap.
Relationship to other Forge planning docs
- Product planning (Business Driver → Product Spark → iterations) stays in Planning flow — vision to daily Sparks and
forge-planning.mdc. This page is about how any implementation plan is produced and sized, not the product-planning hierarchy. - File token bands for source analyzability are separate:
code-footprint.mdcand Agentic coding standards (cross-cutting).