Handbook

What a Green L3 Run Proves (and Doesn't)

Forge recently published Example 7 in the bounded execution examples handbook page—a green L3 use-case slice run from the lenses-production-l3-ci campaign. This post explains what that evidence actually means for…

What L3 claims

At L3, the autonomous unit is an end-to-end user-visible flow inside one existing app: logic, docs, and UI layers touched, with end-to-end verification. Architecture and platform stay fixed. Humans supply intent and acceptance criteria in; they review out.

That is a meaningful step up from L2, where the unit is a multi-file change-set without requiring cross-layer or E2E proof.

What Example 7 did

The L3-ui-smoke campaign item targeted a sandbox docs-health app:

Aspect	Value
Plan	3 patch units across layers: `logic`, `docs`, `ui`
Files changed	`app/scanner.py`, `README.md`, `ui/index.html`
E2E	`verify-docs-health-l3.py` recorded in `tests_run`
Result	`final_status: pass`, `escalated: false`

The assay gate enforced L3-specific rules: ≥2 distinct layers, both .py and non-.py files in the proof union, and E2E pass in the test record—not merely “three files changed.”

What it proves

The ladder is enforceable in code — L3 is not only a policy table row; assay_gate.py rejects passes that lack cross-layer and E2E evidence.
A use-case slice can be decomposed — three ordered patch units, each applied and verified, then aggregated proof.
Machine records are auditable — dual-wiki trace (machine JSON + generated human report) supports back-tracing without hand-edited narratives.

Teams evaluating Forge Dark Factory (PoC) can treat this as demonstrated L3—not “foundation only.”

What it does not prove

Production app autonomy — the target was a PoC sandbox, not a live product repo with promotion to main.
L4+ capability — no multi-repo Campaign, ADR gate, or release gate was exercised.
Unsupervised ship — the run stops at evidenced pass; merge and release remain human decisions.
Cloud-free delivery at scale — local-first routing still applies; L2–L3 often needs ROI-gated escalation to larger models or humans at pivots.

Do not read a green L3 run as “our agent owns the product.” Read it as “we can bound and evidence a single use-case slice inside fixed architecture.”

How to read the handbook cluster

This example sits in a deliberate reading order:

Respecting resources — scarce vs abundant, decompose before escalate
Autonomy levels — canonical L0–L8 policy
Bounded execution examples — Examples 1–7 with diagrams
Platform autonomy hub — readiness matrix and L3 building blocks

Blueprints blog