LLM app settings and model routing
Status: Normative for Forge-family apps that expose user-tunable LLM behavior (e.g. Situ8, Lenses Studio).
Goal: Reusable definitions for quality/cost tradeoffs, automatic model selection, optional adaptive routing, and…
1. Concepts
1.1 Quality order (per provider)
Each provider exposes a curated list of model ids in best-first order: index 0 is the strongest model in that list (often highest typical cost/latency); the last index is the weakest/cheapest in the list. Lists are vendor-specific and must be versioned when provider APIs add or rename models.
Apps may union this list with remotely discovered ids (e.g. OpenAI /v1/models); unknown ids are appended in a stable order (e.g. lexicographic) after known ids.
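The union described above can be sketched as follows. This is a minimal illustration, not the Situ8 or Lenses implementation; the function name `merge_catalog` and the model ids are hypothetical, and lexicographic order is just the example of a stable order that the text mentions.

```python
def merge_catalog(bundled: list[str], remote: list[str]) -> list[str]:
    """Return bundled ids in curated best-first order, then unknown remote ids.

    Unknown ids are sorted lexicographically so the result is stable across
    refreshes, regardless of the order the remote endpoint returns them in.
    """
    known = set(bundled)
    unknown = sorted(set(remote) - known)
    return list(bundled) + unknown


# Hypothetical ids, for illustration only.
bundled = ["model-large", "model-medium", "model-small"]
remote = ["model-small", "model-zeta", "model-alpha"]
catalog = merge_catalog(bundled, remote)
print(catalog)
```

Because unknown ids land after every curated id, they are treated as weaker than anything in the bundled list until someone curates them into the quality order.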
1.2 User pool
The user enables a subset of models (checkboxes / swimlanes). Auto mode considers only models in pool ∩ catalog, where catalog is the union of bundled defaults and remote ids.
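A small sketch of the pool ∩ catalog restriction, assuming the catalog is already in best-first order and the user's selection is stored as a set of ids. The name `ordered_pool` is hypothetical.

```python
def ordered_pool(catalog: list[str], enabled: set[str]) -> list[str]:
    """Keep catalog order (best-first) while dropping ids the user did not enable.

    Ids that are enabled but missing from the catalog are ignored, which is the
    intersection semantics described in the text.
    """
    return [model_id for model_id in catalog if model_id in enabled]


# "model-retired" is enabled but no longer in the catalog, so it is dropped.
pool = ordered_pool(
    ["model-large", "model-medium", "model-small"],
    {"model-small", "model-large", "model-retired"},
)
print(pool)
```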
1.3 Quality tier (cost / quality “slider”)
Cloud quality is represented as a discrete tier (not continuous dollars):
| Tier (name) | Role in auto mode |
|---|---|
| TOP … EXTRA_LOW | Maps linearly to an index along the ordered pool (see §2). |
NONE is reserved for “offline / invalid for this cloud path” in apps that support offline inference; web servers like Lenses may omit it.
Semantics: Moving the slider toward EXTRA_LOW prefers cheaper/weaker models in the pool; toward TOP prefers stronger models—relative to the ordered list, not an absolute price guarantee.
1.4 Manual vs auto
- Manual: One main model id per active provider; requests use that id unless overridden per call.
- Auto: Requires the advanced UI and the auto setting to be enabled. Resolves a model id from the tier and the ordered pool (§2).
1.5 Adaptive autoselection (optional)
When adaptive is on (which also requires auto and the advanced UI):
- A small classifier call classifies the user prompt into task and complexity (see §3).
- An adjustment step shifts the tier-derived index within the pool toward stronger or cheaper models (same spirit as Situ8’s AdaptiveModelMapper).
Tradeoff: The classifier hop adds latency and token cost; the UI must tell users about this.
1.6 Refinement downshift (policy extension)
For second and later passes in the same conversation or workflow (e.g. polish, compact, “refine again”), apps may apply a refinement policy:
- Index downshift: Move N steps toward cheaper (increase index in best-first order) within the pool, capped at the last model.
- Optionally lower max output tokens for refinement-only calls.
This does not necessarily match how mobile intake refine behaves; it is an explicit, optional policy for cost-sensitive loops.
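The index downshift can be sketched in a few lines. This assumes a non-empty pool of length n in best-first order; the function name and the `steps` parameter (the N above) are hypothetical.

```python
def refinement_index(idx: int, n: int, steps: int) -> int:
    """Move `steps` positions toward cheaper models, capped at the last index.

    Assumes n >= 1 and 0 <= idx < n. Negative step counts are treated as zero
    so a refinement pass never upgrades the model.
    """
    return min(idx + max(steps, 0), n - 1)


# With a 4-model pool, a first pass at index 0 refines at index 1,
# and a large downshift stops at the cheapest model (index 3).
print(refinement_index(0, 4, 1))
print(refinement_index(2, 4, 5))
```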
2. Tier to index mapping (auto mode)
Let ordered = filtered quality order restricted to the user pool, length n.
If n == 0, fall back to the main manual model id for that provider.
If n == 1, use that single id.
Otherwise map the tier to a slot s in 0..5 (TOP = 0, EXTRA_LOW = 5), then:
idx = floor(s * (n - 1) / 5), clamped to 0..n-1.
Return ordered[idx].
This matches the linear mapping used in Situ8’s ModelSelectionResolver.pickFromOrdered.
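The mapping above, written out as a Python sketch. It is an illustration of the formula in this section, not a port of the Kotlin resolver; the function takes the slot number directly because only the two end tiers are named here, and `manual_fallback` stands for the provider's main manual model id.

```python
def pick_from_ordered(ordered: list[str], slot: int, manual_fallback: str) -> str:
    """Resolve a model id from a best-first pool and a tier slot in 0..5."""
    if not 0 <= slot <= 5:
        raise ValueError(f"slot must be in 0..5, got {slot}")
    n = len(ordered)
    if n == 0:
        return manual_fallback  # empty pool: use the manual main model
    if n == 1:
        return ordered[0]
    idx = (slot * (n - 1)) // 5      # floor(s * (n - 1) / 5)
    idx = max(0, min(idx, n - 1))    # clamp to 0..n-1
    return ordered[idx]


# Hypothetical 4-model pool: slots 0..5 map to indices 0, 0, 1, 1, 2, 3.
pool = ["a", "b", "c", "d"]
print([pick_from_ordered(pool, s, "manual") for s in range(6)])
```

Note that with fewer models than tiers, adjacent slots share a model; the strongest model is always reachable at slot 0 and the cheapest at slot 5.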
3. Classifier output (adaptive)
The classifier returns structured JSON only, e.g.:
- task: CHAT|CODE|REASONING|CREATIVE|SUMMARIZE|OTHER
- complexity: TRIVIAL|MODERATE|HEAVY
Adjustment rules (conceptually): heavier tasks or higher complexity may shift the index toward stronger models (lower index); TRIVIAL complexity or SUMMARIZE tasks may shift it toward cheaper models (higher index). Exact deltas are implementation-defined per app, but the result should stay within the pool.
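One way to validate the classifier output and apply an adjustment is sketched below. The deltas here (one step per signal) and the choice of CODE and REASONING as "heavier" tasks are assumptions made for illustration, since the exact values are implementation-defined; the function names are hypothetical as well.

```python
import json
from typing import Optional, Tuple

TASKS = {"CHAT", "CODE", "REASONING", "CREATIVE", "SUMMARIZE", "OTHER"}
COMPLEXITIES = {"TRIVIAL", "MODERATE", "HEAVY"}


def parse_classification(raw: str) -> Optional[Tuple[str, str]]:
    """Return (task, complexity) if raw is valid classifier JSON, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    task, complexity = data.get("task"), data.get("complexity")
    if not isinstance(task, str) or not isinstance(complexity, str):
        return None
    if task in TASKS and complexity in COMPLEXITIES:
        return task, complexity
    return None


def adjust_index(idx: int, n: int, task: str, complexity: str) -> int:
    """Shift a tier-derived index; assumes n >= 1. Deltas are illustrative."""
    delta = 0
    if complexity == "HEAVY":
        delta -= 1  # toward stronger models (lower index)
    elif complexity == "TRIVIAL":
        delta += 1  # toward cheaper models (higher index)
    if task in ("CODE", "REASONING"):
        delta -= 1
    elif task == "SUMMARIZE":
        delta += 1
    return max(0, min(idx + delta, n - 1))  # stay within the pool


result = parse_classification('{"task": "CODE", "complexity": "HEAVY"}')
print(result, adjust_index(2, 4, *result))
```

When parsing fails, a caller would presumably keep the unadjusted tier-derived index rather than fail the request.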
4. Lenses Studio (forge-lenses) mapping
Lenses supports multiple providers: anthropic, openai, gemini, ollama, openai_compatible.
- openai / gemini: Full tier ladder + optional adaptive classifier (same pattern as Situ8).
- anthropic: Maintain a documented best-first list for auto mode; if a ladder is not curated, fall back to manual main model only.
- ollama / openai_compatible: Local or custom endpoints; the quality order may be a user-defined list or a single model until a ladder exists.
Secrets: API keys for Lenses must live on the Python server (in a file under a workspace-local, gitignored path, or in environment variables), never in browser storage. Access to settings APIs should match other privileged local APIs (loopback by default).
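A minimal server-side sketch of both rules, using only the standard library. The environment variable name, the key file path, and the function names are hypothetical; this does not describe how lenses/llm_settings_store.py is actually written.

```python
import ipaddress
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_var: str, key_file: Path) -> Optional[str]:
    """Prefer the environment variable, then a server-side key file."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    try:
        text = key_file.read_text(encoding="utf-8").strip()
    except OSError:
        return None  # missing or unreadable file: no key configured
    return text or None


def is_loopback(remote_addr: str) -> bool:
    """True only for literal loopback addresses such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(remote_addr).is_loopback
    except ValueError:
        return False  # hostnames and malformed input are rejected


# Hypothetical names, for illustration only.
key = load_api_key("EXAMPLE_PROVIDER_API_KEY", Path(".local/example_key.txt"))
print("key configured:", key is not None)
print(is_loopback("127.0.0.1"), is_loopback("192.168.1.5"))
```

A settings endpoint would check `is_loopback` against the peer address of the connection, not a client-supplied header, before reading or writing keys.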
5. Versioning and maintenance
- Quality order lists change when vendors ship new models; bump a doc version or app changelog when defaults change.
- Remote model refresh (optional): periodic fetch when keys exist, with cache TTL—see mobile implementation patterns.
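A cache with a TTL for the remote model list might look like the sketch below. The class name and the injected `fetch` callable are hypothetical; the real fetch (for example against a provider's model-listing endpoint) is left to the caller, and the mobile implementation may differ.

```python
import time
from typing import Callable, List, Optional


class RemoteModelCache:
    """Cache remote model ids and refetch only after ttl_seconds have passed."""

    def __init__(
        self,
        fetch: Callable[[], List[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._ids: List[str] = []
        self._expires_at: Optional[float] = None

    def get(self) -> List[str]:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            try:
                self._ids = list(self._fetch())
            except Exception:
                pass  # keep the stale list; bundled defaults still apply
            # Also wait a full TTL after a failure, so errors are not retried
            # on every request.
            self._expires_at = now + self._ttl
        return list(self._ids)


cache = RemoteModelCache(lambda: ["model-alpha", "model-zeta"], ttl_seconds=3600)
print(cache.get())
```

Injecting the clock keeps the class testable without sleeping; callers would only construct the cache when an API key exists.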
See also
- Situ8 reference implementation: ModelSelectionResolver, AdaptiveModelMapper, LlmModels (Kotlin).
- Lenses: lenses/llm_routing.py, lenses/llm_settings_store.py, README (Studio LLM).