What a Green L3 Run Proves (and Doesn't)
Forge recently published Example 7 in the bounded execution examples handbook page—a green L3 use-case slice run from the lenses-production-l3-ci campaign. This post explains what that evidence actually means for…
What L3 claims
At L3, the autonomous unit is an end-to-end, user-visible flow inside one existing app: it touches the logic, docs, and UI layers and is verified end to end. Architecture and platform stay fixed. Humans supply intent and acceptance criteria on the way in and review the result on the way out.
That is a meaningful step up from L2, where the unit is a multi-file change-set without requiring cross-layer or E2E proof.
What Example 7 did
The L3-ui-smoke campaign item targeted a sandbox docs-health app:
| Aspect | Value |
|---|---|
| Plan | 3 patch units across layers: logic, docs, ui |
| Files changed | app/scanner.py, README.md, ui/index.html |
| E2E | verify-docs-health-l3.py recorded in tests_run |
| Result | final_status: pass, escalated: false |
The assay gate enforced L3-specific rules: ≥2 distinct layers, both .py and non-.py files in the proof union, and E2E pass in the test record—not merely “three files changed.”
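The three rules can be sketched as a small check. This is an illustration only, not the contents of `assay_gate.py`: the function names `layer_of` and `check_l3`, the path-to-layer mapping, the `files_changed` field, and the `kind`/`status` shape of each `tests_run` entry are all assumptions made for the example. Only the file names, `tests_run`, and `final_status` come from the table above.

```python
from pathlib import PurePosixPath


def layer_of(path: str) -> str:
    """Hypothetical path-to-layer mapping; the real gate may classify differently."""
    p = PurePosixPath(path)
    if p.parts and p.parts[0] == "ui":
        return "ui"
    if p.suffix == ".md":
        return "docs"
    if p.suffix == ".py":
        return "logic"
    return "other"


def check_l3(record: dict) -> list[str]:
    """Return the violated L3 rules; an empty list means this sketch would pass the run."""
    files = record.get("files_changed", [])
    tests = record.get("tests_run", [])
    failures = []

    # Rule 1: at least two distinct, recognised layers.
    layers = {layer_of(f) for f in files} - {"other"}
    if len(layers) < 2:
        failures.append("fewer than 2 distinct layers")

    # Rule 2: the proof union holds both .py and non-.py files.
    has_py = any(f.endswith(".py") for f in files)
    has_non_py = any(not f.endswith(".py") for f in files)
    if not (has_py and has_non_py):
        failures.append("proof union lacks .py or non-.py files")

    # Rule 3: a passing E2E test is present in the test record.
    if not any(t.get("kind") == "e2e" and t.get("status") == "pass" for t in tests):
        failures.append("no passing E2E test recorded")

    return failures


# Record shaped after Example 7 (field layout is assumed).
example_7 = {
    "files_changed": ["app/scanner.py", "README.md", "ui/index.html"],
    "tests_run": [{"name": "verify-docs-health-l3.py", "kind": "e2e", "status": "pass"}],
    "final_status": "pass",
}

# Three files changed, but one layer and no E2E: the case the gate is meant to reject.
three_python_files = {
    "files_changed": ["app/a.py", "app/b.py", "app/c.py"],
    "tests_run": [{"name": "test_a.py", "kind": "unit", "status": "pass"}],
    "final_status": "pass",
}

print(check_l3(example_7))
print(check_l3(three_python_files))
```

Returning a list of violations rather than a single boolean keeps the rejection reason visible in the record, which matters when the point of the gate is evidence rather than a bare pass/fail.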
What it proves
- The ladder is enforceable in code: L3 is not only a policy table row; assay_gate.py rejects passes that lack cross-layer and E2E evidence.
- A use-case slice can be decomposed: three ordered patch units, each applied and verified, then aggregated proof.
- Machine records are auditable: dual-wiki trace (machine JSON + generated human report) supports back-tracing without hand-edited narratives.
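The decomposition and the trace can be pictured together in a short sketch. Everything named here (`PatchUnit`, `SliceProof`, `run_slice`, `render_report`, and the record fields other than `final_status`) is hypothetical and chosen for illustration; Forge's actual data model is not shown in this post. The idea is only that units run in order, proof accumulates from verified units, and the human-readable report is derived from the machine JSON rather than written by hand.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class PatchUnit:
    layer: str
    files: list[str]
    verify: Callable[[], bool]  # hypothetical per-unit verification hook


@dataclass
class SliceProof:
    layers: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    final_status: str = "pass"


def run_slice(units: list[PatchUnit]) -> SliceProof:
    """Verify units in order; stop at the first failure and keep only verified proof."""
    proof = SliceProof()
    for unit in units:
        if not unit.verify():
            proof.final_status = "fail"
            break
        proof.layers.append(unit.layer)
        proof.files.extend(unit.files)
    return proof


def render_report(machine_json: str) -> str:
    """Generate the human-readable line from the machine record alone."""
    record = json.loads(machine_json)
    files = ", ".join(record["files"])
    layers = ", ".join(record["layers"])
    return f"{record['final_status']}: {files} across {layers}"


# Units shaped after Example 7; the always-true checks stand in for real verification.
units = [
    PatchUnit("logic", ["app/scanner.py"], lambda: True),
    PatchUnit("docs", ["README.md"], lambda: True),
    PatchUnit("ui", ["ui/index.html"], lambda: True),
]

proof = run_slice(units)
machine_json = json.dumps(asdict(proof))  # machine side of the trace
report = render_report(machine_json)      # generated human side
print(report)
```

Because the report is a pure function of the JSON, anyone auditing the run can regenerate it and compare, which is what makes back-tracing possible without trusting an edited narrative.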
Teams evaluating Forge Dark Factory (PoC) can treat this as demonstrated L3—not “foundation only.”
What it does not prove
- Production app autonomy — the target was a PoC sandbox, not a live product repo with promotion to main.
- L4+ capability — no multi-repo Campaign, ADR gate, or release gate was exercised.
- Unsupervised ship — the run stops at evidenced pass; merge and release remain human decisions.
- Cloud-free delivery at scale — local-first routing still applies; L2–L3 often needs ROI-gated escalation to larger models or humans at pivots.
Do not read a green L3 run as “our agent owns the product.” Read it as “we can bound and evidence a single use-case slice inside fixed architecture.”
How to read the handbook cluster
This example sits in a deliberate reading order:
- Respecting resources — scarce vs abundant, decompose before escalate
- Autonomy levels — canonical L0–L8 policy
- Bounded execution examples — Examples 1–7 with diagrams
- Platform autonomy hub — readiness matrix and L3 building blocks