
Autonomy levels


Introduction

Autonomy is not binary. Levels are theoretically unbounded; this ladder gives a meaningful, testable progression for governed agent execution. Each level is defined by its unit of autonomous delivery, what stays fixed, and where humans gate.

Use the ladder when scoping unattended runs, PDCA campaigns, wizard sessions, or Dark Factory targets—so teams do not claim more autonomy than their gates and resources support.

Implementation readiness (summary)

For the operator hub and per-level reference architecture (building blocks, evidence, vision), see Forge Platform autonomy levels.

Level	Policy	PoC demonstrated
L0	Defined	N/A (interactive)
L1	Defined	Yes
L2	Defined	Yes
L3	Defined	Yes
L4–L8	Defined (vision gates)	No

Full readiness matrix, Wizard ↔ execution mapping, and per-level architecture pages live on the Platform hub — this page remains the canonical policy table.

L0–L8 ladder

Level	Autonomous unit	What stays fixed	Human gate
L0 Assisted	Suggestions only	Everything	Continuous
L1 Function	One method/function to a given signature/contract	Architecture, API, tests	Approve branch/merge
L2 Change-set	Multi-function / multi-file defect fix or small change, no rearchitecture	Architecture, public contracts	Accept acceptance criteria + merge
L3 Use-case slice	End-to-end user-visible flow inside one existing app (UI + logic + data + tests)	Existing architecture, single platform	Intent + acceptance in; review out
L4 Feature/component	Capability across modules; may add a component within existing architecture; cross-repo, one platform	Platform, major architecture	ADR + release gate
L5 Subsystem w/ local arch evolution	Introduces patterns/refactors within a platform	Platform boundary	Architecture decision escalation
L6 Product increment	Multi-repo, multi-service on a single cloud/platform	Cloud/platform choice	Go/no-go
L7 Multi-cloud / multi-platform solution (max − 1)	A whole engineered solution spanning clouds/platforms	Business framing	Strategic checkpoints
L8 (max) Autonomous problem solving	Frames a business/humanity problem and composes L7 solutions as puzzle pieces	Nothing but the goal	Mission definition only

[Figure: Autonomy ladder L0 to L8, with L0 to L3 emphasized as defined and exercised and L4 to L8 muted as future vision]

L0–L3 are defined and exercised today (L1–L3 are demonstrated in the worked examples). L4–L8 remain vision and will require ADRs, go/no-go decisions, and strategic checkpoints. Higher levels add gates and never remove lower-level ones.
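
The ladder and the readiness summary can be held as plain data so that tooling can look up a level before a run is scoped. The following is a minimal sketch, not Forge code: the names `Level`, `Rung`, `LADDER`, `highest_demonstrated`, and `may_claim` are hypothetical, and the row text is abbreviated from the tables above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Ladder levels; IntEnum so levels compare and sort by rank."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4
    L5 = 5
    L6 = 6
    L7 = 7
    L8 = 8


@dataclass(frozen=True)
class Rung:
    label: str
    autonomous_unit: str
    stays_fixed: str
    human_gate: str
    poc_demonstrated: Optional[bool]  # None means not applicable (interactive)


# Rows abbreviated from the L0-L8 ladder and the readiness summary.
LADDER = {
    Level.L0: Rung("Assisted", "Suggestions only", "Everything", "Continuous", None),
    Level.L1: Rung("Function", "One method/function", "Architecture, API, tests",
                   "Approve branch/merge", True),
    Level.L2: Rung("Change-set", "Multi-file fix or small change",
                   "Architecture, public contracts", "Accept criteria + merge", True),
    Level.L3: Rung("Use-case slice", "End-to-end flow in one app",
                   "Existing architecture, single platform",
                   "Intent + acceptance in; review out", True),
    Level.L4: Rung("Feature/component", "Capability across modules",
                   "Platform, major architecture", "ADR + release gate", False),
    Level.L5: Rung("Subsystem", "Patterns/refactors within a platform",
                   "Platform boundary", "Architecture decision escalation", False),
    Level.L6: Rung("Product increment", "Multi-repo, multi-service",
                   "Cloud/platform choice", "Go/no-go", False),
    Level.L7: Rung("Multi-cloud solution", "Solution spanning clouds/platforms",
                   "Business framing", "Strategic checkpoints", False),
    Level.L8: Rung("Autonomous problem solving", "Frames a problem, composes L7 solutions",
                   "Nothing but the goal", "Mission definition only", False),
}


def highest_demonstrated() -> Level:
    """Highest level marked as PoC demonstrated in the readiness summary."""
    return max(level for level, rung in LADDER.items() if rung.poc_demonstrated)


def may_claim(target: Level) -> bool:
    """Hypothetical scoping check: do not claim above what has been demonstrated."""
    return target <= highest_demonstrated()
```

A scoping script could call `may_claim(Level.L4)` and refuse to label a campaign L4 while the readiness summary still lists L4–L8 as vision.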

How a level is enforced

A run (or campaign item) declares its target level.
The Assay gate for that level must pass before the change is considered done. Core evidence in forge/forge.config.yaml includes tests_pass, acceptance_criteria_met, and risks_reviewed.
Higher levels do not skip lower-level gates; they add gates (for example ADR + release at L4).
At L2, multi-file work should produce proof that two or more distinct files changed when acceptance criteria require a change-set—not a single-file patch dressed as L2.
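
The enforcement rules above can be sketched as a single gate check. This is an illustration under stated assumptions, not the Assay implementation: `Run`, `required_evidence`, and `assay_gate` are hypothetical names, and the L4 keys `adr_recorded` and `release_gate_passed` are invented to show how gates accumulate. Only `tests_pass`, `acceptance_criteria_met`, and `risks_reviewed` come from the handbook. Applying the distinct-files proof at levels above L2 is also an assumption.

```python
from dataclasses import dataclass, field

# Core evidence keys named in the handbook.
CORE_EVIDENCE = ("tests_pass", "acceptance_criteria_met", "risks_reviewed")

# ASSUMPTION: hypothetical extra keys, keyed by the level that introduces them.
EXTRA_EVIDENCE = {4: ("adr_recorded", "release_gate_passed")}


@dataclass
class Run:
    target_level: int                 # declared by the run or campaign item
    evidence: dict[str, bool]
    changed_files: list[str] = field(default_factory=list)
    requires_change_set: bool = False  # set when acceptance criteria demand a change-set


def required_evidence(level: int) -> list[str]:
    """Core keys plus every extra key introduced at or below `level`."""
    keys = list(CORE_EVIDENCE)
    for introduced_at in sorted(EXTRA_EVIDENCE):
        if introduced_at <= level:
            keys.extend(EXTRA_EVIDENCE[introduced_at])
    return keys


def assay_gate(run: Run) -> list[str]:
    """Return failure reasons; an empty list means the gate passes."""
    failures = [
        f"missing evidence: {key}"
        for key in required_evidence(run.target_level)
        if not run.evidence.get(key, False)
    ]
    # L2 proof: a change-set must touch two or more distinct files.
    if run.target_level >= 2 and run.requires_change_set:
        if len(set(run.changed_files)) < 2:
            failures.append("L2 change-set requires two or more distinct changed files")
    return failures
```

Returning reasons rather than a bare boolean keeps the failure explainable in a trace. A run that lists the same file twice still fails the change-set proof because the check counts distinct paths.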

Resource honesty (local-first)

Fully cloud-free autonomy above L1 is not realistic on a ~4GB local model profile: planning, architectural reasoning, and resolving ambiguity exceed small-model capability.

Level band	Realistic local-first posture
L0–L1	Achievable with deterministic routing + local worker + verify/repair
L2–L3	Often needs ROI-gated escalation to a larger model or human at pivots
L4+	Requires explicit human gates (ADR, go/no-go, strategic checkpoints) regardless of model

The realistic operating mode is local-first with ROI-gated escalation. Track escalation rate over time; it should fall as capability cards and deterministic scaffolds improve. See Respecting resources.
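
One way to track escalation rate over time is a rolling window over recent runs. The sketch below is hypothetical throughout: `EscalationTracker`, `posture`, and the window size are illustrative choices, and the posture strings paraphrase the table above.

```python
from collections import deque


class EscalationTracker:
    """Rolling escalation rate over the most recent `window` runs."""

    def __init__(self, window: int = 50) -> None:
        # deque with maxlen drops the oldest outcome automatically.
        self._outcomes: deque = deque(maxlen=window)

    def record(self, escalated: bool) -> None:
        self._outcomes.append(escalated)

    def rate(self) -> float:
        """Fraction of recorded runs that escalated; 0.0 when nothing is recorded."""
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)


def posture(level: int) -> str:
    """Map a ladder level to its local-first posture band."""
    if level <= 1:
        return "local worker with verify/repair"
    if level <= 3:
        return "local-first with ROI-gated escalation"
    return "explicit human gates regardless of model"
```

Comparing `rate()` across successive windows gives a simple signal for whether capability cards and deterministic scaffolds are reducing escalations.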

Wizard alignment (planning intent)

The Blueprints Wizard in Forge Lenses captures planning-time autonomy separately from runtime loops. The AutonomyLevel enum includes l0_analyst, l1_drafter, l2_stage_autopilot, and l3_goal_autopilot, with MutationPolicy describing how far downstream automation may edit artifacts.

Wizard policies inform prompts and downstream automation; they do not silently apply upstream edits. Persisted session policies should match the ladder level you intend for execution. Full Wizard ↔ execution mapping: Platform autonomy levels hub.
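
A check that a persisted session policy matches the intended ladder level might look like the sketch below. The Python class mirrors only the four enum values named above and is not the Wizard's own definition. The function `session_matches_ladder` is hypothetical, and the mapping is deliberately passed in by the caller because the authoritative Wizard ↔ execution mapping lives on the Platform hub. The example mapping is a placeholder, not the real one.

```python
from enum import Enum
from typing import Mapping


class AutonomyLevel(Enum):
    """Planning-time levels named in the handbook (illustrative mirror)."""
    L0_ANALYST = "l0_analyst"
    L1_DRAFTER = "l1_drafter"
    L2_STAGE_AUTOPILOT = "l2_stage_autopilot"
    L3_GOAL_AUTOPILOT = "l3_goal_autopilot"


def session_matches_ladder(
    session_level: AutonomyLevel,
    intended_ladder_level: int,
    mapping: Mapping[AutonomyLevel, int],
) -> bool:
    """True when the persisted Wizard level maps to the intended ladder level.

    An unmapped Wizard level is treated as a mismatch rather than guessed.
    """
    return mapping.get(session_level) == intended_ladder_level


# PLACEHOLDER mapping for illustration only; not the real Wizard mapping.
example_mapping = {
    AutonomyLevel.L1_DRAFTER: 1,
    AutonomyLevel.L2_STAGE_AUTOPILOT: 2,
}
```

Treating an unmapped level as a mismatch keeps the check conservative: a session is never assumed to authorize an execution level that the mapping does not list.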

Forge Dark Factory (PoC reference implementation)

Forge Dark Factory is the current PoC reference implementation for bounded autonomous coding—not a production Platform submodule.

[Figure: Forge Dark Factory bounded execution loop: classify, route, context, plan, draft, apply, verify, proof, trace, escalate]

This is the governed loop for L1–L3. Verify failures trigger bounded repair; ambiguity or budget exhaustion escalates to a human, who still approves the branch or merge.

Aspect	PoC scope today
Target autonomy	L1–L3 demonstrated (function, change-set, use-case slice); see worked examples
Loop	Classify → route → context → plan → draft → apply → verify → repair → proof → dual-wiki trace → escalate
Dependencies	`forge-lcdl` (patch units, verify, repair, proof); `forge-workcells` (optional local worker)
Trace	Machine record (M) + generated human narrative (H) with freeze gate
Routing	Deterministic Cynefin × t-shirt × value; decompose before cloud/human escalate
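
The verify, repair, and escalate portion of the loop can be sketched as follows. This is not the `forge-lcdl` API: `Outcome`, `bounded_loop`, the callables, and the default repair budget are hypothetical, and the patch is modeled as a plain string for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    status: str        # "verified" or "escalated"
    verify_calls: int  # how many times verify ran
    reason: str = ""


def bounded_loop(
    draft: Callable[[], str],
    verify: Callable[[str], bool],
    repair: Callable[[str], str],
    max_repairs: int = 2,
) -> Outcome:
    """Draft once, then verify with at most `max_repairs` repair attempts.

    When the repair budget is exhausted the loop escalates to a human
    instead of retrying indefinitely.
    """
    patch = draft()
    for attempt in range(max_repairs + 1):
        if verify(patch):
            return Outcome("verified", attempt + 1)
        if attempt < max_repairs:
            patch = repair(patch)
    return Outcome("escalated", max_repairs + 1, "repair budget exhausted")
```

A "verified" outcome here only means the patch is ready for proof and trace; a human still approves the branch or merge.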

Do not treat Dark Factory as compliance-ready or as permission for unsupervised push/deploy. It demonstrates how the ladder and the Respecting resources rules compose in code.

What we do not claim

No unsupervised push/deploy — Git workflow and release decisions remain human-gated unless your org explicitly automates them with separate policy.
No compliance-ready autonomy — The ladder is engineering governance, not a certification.
No “fully autonomous” delivery — Even L8 assumes mission definition by humans; intermediate levels add explicit gates.
Escalation is expected — Especially for architecture, security, and ambiguous work; a low escalation rate is a goal, not a guarantee on day one.

Forge Platform autonomy levels — operator hub, readiness matrix, per-level reference architecture
Bounded execution examples — real L1–L3 runs with loop, PDCA, and dual-wiki diagrams
Respecting resources — token economics, decompose-before-escalate, bounded loops
Cost-aware planning and model tiering — interactive Cursor planning
Agentic SDLC — humans own intent; agents amplify execution
Agentic coding standards — review capacity and smaller PRs
Assay Gate ceremony
