
Respecting resources

Why this matters

Agentic SDLC assumes humans own intent and agents amplify execution under policy. That policy must account for finite resources: cloud and IDE tokens, human review capacity, premium model budget, and the practical limits of local-first compute (for example, a ~4GB local model profile).

Teams that ignore resource economics get automation that looks fast but brings review bottlenecks, token burn, and fragile unattended runs. Forge treats bounded execution as a first-class design choice, not an afterthought.

What is scarce vs abundant

Figure: scarce versus abundant resources, and the routing order deterministic → local → decompose → escalate.

Protect the scarce (tokens, review, premium budget) by spending the abundant (time, deterministic checks, local calls). The router picks the cheapest tier that clears the bar. See it end-to-end in the worked examples.

Resource	Default stance	Practical lever
Cloud / IDE tokens	Scarce	Cheapest tier that clears the quality bar; decompose before escalate
Human review and attention	Scarce	Smaller autonomous units (autonomy ladder); Assay Gate; explicit merge approval
Premium model budget	Scarce	ROI-gated escalation; strategy passes only when task size amortizes cost
Wall-clock time	Abundant (default)	Iterate, repair, decompose—spend time before spending tokens
Deterministic checks	Cheap	Scripts, CI, freeze gates—prefer when a rule exists

Guiding principle: maximize deterministic work and cheap local calls; decompose rather than pay; escalate to a larger model or a human only when value justifies it.
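
The "ROI-gated escalation" lever in the table above can be read as a simple threshold test. The following is a minimal sketch under stated assumptions, not Forge's implementation: the `EscalationQuote` type, its fields, and the `min_roi` value of 1.5 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationQuote:
    """Hypothetical inputs for one escalation decision (units are illustrative)."""
    expected_value: float  # estimated benefit of a premium pass, in budget units
    premium_cost: float    # estimated cost of that pass, in the same units
    min_roi: float = 1.5   # assumed policy threshold, not a Forge default


def should_escalate(quote: EscalationQuote) -> bool:
    """Escalate only when estimated value covers cost by the required margin."""
    if quote.premium_cost <= 0:
        raise ValueError("premium_cost must be positive")
    return quote.expected_value / quote.premium_cost >= quote.min_roi
```

With these assumed numbers, a task worth 30 units against a cost of 10 clears the gate, while a task worth 12 units does not: small tasks do not amortize a premium pass.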

Operating rules

Deterministic-first — If a template, script, or CI check can settle the task, do not invoke an LLM.
Cheapest tier that clears the bar — Classify the task (domain, size, value); route to deterministic → local → decompose → escalate in that order.
Decompose before escalate — When local quality is marginal, split into smaller units that fit local capability instead of paying for cloud reasoning immediately.
Bounded loops — Cap retries, decomposition depth, context budget, and per-run local calls; refuse to spend past the ceiling and escalate instead.
Architecture and security always escalate — Required quality is set high for these task classes regardless of local model cards.
Track escalation rate — A rising rate signals weak scaffolds or mis-sized autonomy; improve capability cards and gates over time.
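
The operating rules above can be sketched as one routing function with a budget ceiling. This is an illustrative sketch only: `Task`, `Budget`, the numeric quality scores, and the default caps are hypothetical names and values, and checking the always-escalate domains before the deterministic check is an assumption about rule precedence.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DETERMINISTIC = "deterministic"
    LOCAL = "local"
    DECOMPOSE = "decompose"
    ESCALATE = "escalate"


# Task classes that always escalate, per the operating rules.
ALWAYS_ESCALATE = {"architecture", "security"}


@dataclass(frozen=True)
class Task:
    domain: str
    has_rule: bool         # a template, script, or CI check can settle it
    local_score: float     # assumed estimate of local-model quality, 0..1
    required_score: float  # assumed quality bar for this task, 0..1
    splittable: bool = True


@dataclass
class Budget:
    """Per-run ceilings; the default values are placeholders."""
    max_depth: int = 2
    max_local_calls: int = 10
    local_calls: int = 0

    def spend_local_call(self) -> bool:
        """Consume one local call, or refuse once the ceiling is reached."""
        if self.local_calls >= self.max_local_calls:
            return False
        self.local_calls += 1
        return True


def route(task: Task, depth: int, budget: Budget) -> Tier:
    """Pick the cheapest tier that clears the bar; note this mutates budget."""
    if task.domain in ALWAYS_ESCALATE:  # assumed to take precedence
        return Tier.ESCALATE
    if task.has_rule:
        return Tier.DETERMINISTIC
    if task.local_score >= task.required_score:
        # Refuse to spend past the ceiling and escalate instead.
        return Tier.LOCAL if budget.spend_local_call() else Tier.ESCALATE
    if task.splittable and depth < budget.max_depth:
        return Tier.DECOMPOSE
    return Tier.ESCALATE
```

Retries and context budget could be capped the same way as local calls. Counting how often `route` returns `Tier.ESCALATE` across runs gives one way to observe the escalation rate the last rule asks you to track.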

Resource types in practice

Type	Where it shows up	Forge alignment
Tokens (Cursor / cloud)	IDE context, subagents, API calls	Cost-aware planning and model tiering for interactive work
Compute (local ~4GB profile)	Ollama / Granite / small local workers	Strong on XS/S clear/complicated edits; weak on planning and architecture—see Autonomy levels
Human gates	Merge, Assay Gate, approval requests	Forge ceremonies; Lenses agentic bridge
Evidence (CI)	Tests, acceptance criteria, risks reviewed	`forge/forge.config.yaml` core evidence; proof reports from governed runs

Interactive planning vs autonomous execution

Mode	Audience	Primary doc
Interactive (Cursor)	Human in the loop each turn	Cost-aware planning and model tiering — t-shirt triage, model tiering, plan structure
Autonomous / bounded loops	Unattended or PDCA campaigns	This page + Autonomy levels — ladder, gates, local-first routing

Cost-aware rules shape how you plan in the IDE. Respecting resources shapes how much change an autonomous run may attempt and when it must stop and ask a human.

Autonomy ladder (overview)

Autonomy is not binary. Forge defines a testable L0–L8 ladder: each level names the unit of autonomous delivery, what architecture or contracts stay fixed, and where humans gate.

At a glance: L0 is suggestions only; L1 is a single function or small contract-bound change; L2 is a multi-file change-set without rearchitecture; L3 is an end-to-end use-case slice inside one existing app. Higher levels add scope and gates.
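
Because the ladder is ordered, the "where humans gate" question can be expressed as a comparison against an approved ceiling. The sketch below assumes a hypothetical `needs_human_gate` helper and encodes only the four levels summarized above; L4 through L8 are left out rather than guessed.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    L0 = 0  # suggestions only
    L1 = 1  # single function or small contract-bound change
    L2 = 2  # multi-file change-set without rearchitecture
    L3 = 3  # end-to-end use-case slice inside one existing app
    # L4 to L8 are defined in Autonomy levels and intentionally omitted here.


def needs_human_gate(requested: AutonomyLevel, ceiling: AutonomyLevel) -> bool:
    """Hypothetical check: True when a run asks for more scope than approved."""
    return requested > ceiling
```

For example, a run approved at L1 that proposes a multi-file change-set (L2) would stop and ask a human, while an L1 change under an L2 ceiling would proceed.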

Full table, enforcement, and PoC notes: Autonomy levels.

Reference implementation: Forge Dark Factory (PoC)

Forge Dark Factory is a governed, sequential, local-first autonomous coding loop (PoC). It implements a deterministic classify → route → context → plan → draft → apply → verify → repair → proof → dual-wiki trace → escalate sequence, with an L1 target today and a foundation for L2–L3.

The PoC repo is not a published product surface; the ladder and resource rules in this handbook are methodology whether or not you run Dark Factory. Operators adopting Platform workcells or LCDL patch execution can apply the same gates and the same honesty about local model limits.

See it in action: Bounded execution examples walks through real L1 and L2 runs, including a token-free deterministic fix and a local→Cursor worker-ladder step, so you can see exactly what each level spends.
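
One way to picture a sequential loop of this shape is the sketch below. It is not the PoC's code: the `run_once` function, the `Step` callable type, the split into pre-verify and post-verify stages, and the bounded verify/repair cycle are assumptions made for illustration. Only the stage names come from the description above.

```python
from typing import Callable, Dict

# A step takes shared run state and reports success (hypothetical interface).
Step = Callable[[dict], bool]

PRE_VERIFY = ("classify", "route", "context", "plan", "draft", "apply")
POST_VERIFY = ("proof", "dual-wiki trace")


def run_once(steps: Dict[str, Step], state: dict, max_repairs: int = 2) -> str:
    """Return "done", or "escalate:<stage>" at the first unrecoverable stage."""
    for name in PRE_VERIFY:
        if not steps[name](state):
            return f"escalate:{name}"

    # Assumed behavior: repair runs only when verify fails, and is capped.
    repairs = 0
    while not steps["verify"](state):
        if repairs >= max_repairs or not steps["repair"](state):
            return "escalate:verify"
        repairs += 1

    for name in POST_VERIFY:
        if not steps[name](state):
            return f"escalate:{name}"
    return "done"
```

The cap on repairs is what keeps the loop bounded: once the ceiling is hit, the run stops and hands off instead of spending more.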

Bounded execution examples — real L1/L2 runs, loop, routing, PDCA, and dual-wiki diagrams
Autonomy levels — L0–L8 ladder, Assay enforcement, 4GB boundary
Cost-aware planning and model tiering — Cursor triage and model tiering
Agentic SDLC — cross-cutting agent layer
Agentic coding standards — review bottleneck and WIP
Versona operating model — cognition vs execution plane
Assay Gate ceremony
