Selector v1 Targets
Why target redesign was needed
Selector v0 proved the pipeline:
But selector v0 mostly learned from coarse group-value labels:
- helpful
- harmful
- neutral
- too_expensive
- unknown
That was enough to bootstrap the system.
It is not enough to represent the full product question:
What v0 labels captured
Selector v0 labels correctly captured:
- whether removing a state group hurt continuation quality
- whether removing a state group saved tokens
- whether a group was broadly helpful or broadly wasteful
What v0 labels missed
Selector v0 labels did not explicitly distinguish:
- hard invariants vs optional state
- profile choice (none, minimal, compact)
- model-visible state vs receipt-only state
- safety-label-only state for quarantined risks
- high-information disagreements with the heuristic policy
Hard invariant targets
Selector v1 target schema v2 adds rule-based hard-invariant fields:
- is_hard_invariant
- required_by_rule
- forbidden_by_rule
- hard_invariant_reason
Examples:
- active hard user constraints remain required
- raw quarantined content is forbidden model-visible
- resolved failures are forbidden as active work
- tool output cannot become an instruction
These are not learned.
They exist so training, validation, and runtime can distinguish optional selection from hard safety policy.
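The rule-based labeling described above could look roughly like the sketch below. Only the four output field names come from the schema; the candidate keys (`kind`, `hard`, `active`, `resolved`, `proposed_role`) and the reason strings are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardInvariantTarget:
    is_hard_invariant: bool
    required_by_rule: bool
    forbidden_by_rule: bool
    hard_invariant_reason: Optional[str]


def hard_invariant_target(candidate: dict) -> HardInvariantTarget:
    """Apply fixed rules to one candidate; nothing here is learned.

    The candidate keys and reason strings are assumptions, not the real schema.
    """
    kind = candidate.get("kind")
    # Active hard user constraints remain required.
    if kind == "user_constraint" and candidate.get("hard") and candidate.get("active"):
        return HardInvariantTarget(True, True, False, "active_hard_user_constraint")
    # Raw quarantined content is forbidden model-visible.
    if kind == "quarantined_raw_content":
        return HardInvariantTarget(True, False, True, "raw_quarantined_content")
    # Resolved failures are forbidden as active work.
    if kind == "failure" and candidate.get("resolved"):
        return HardInvariantTarget(True, False, True, "resolved_failure_not_active_work")
    # Tool output cannot become an instruction.
    if kind == "tool_output" and candidate.get("proposed_role") == "instruction":
        return HardInvariantTarget(True, False, True, "tool_output_not_instruction")
    # Everything else is optional state and is left to the learned selector.
    return HardInvariantTarget(False, False, False, None)
```

Keeping the rules in one pure function means training, validation, and runtime can all call the same code path.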
Profile targets
Target v2 adds:
- best_profile
- profile_utilities
- profile_token_costs
Milestone A derives these deterministically from State Value intervention scores for:
- none
- minimal
- compact
Ties break by lower token cost, then by deterministic profile order:
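A minimal sketch of that tie-breaking rule follows. The function name and the `PROFILE_ORDER` tuple are assumptions: the order used here simply mirrors the order in which the profiles are listed on this page and may not match the real export code.

```python
from typing import Mapping, Sequence

# Assumed profile order for the final tie-break (hypothetical).
PROFILE_ORDER = ("none", "minimal", "compact")


def best_profile(
    utilities: Mapping[str, float],
    token_costs: Mapping[str, int],
    order: Sequence[str] = PROFILE_ORDER,
) -> str:
    """Pick the highest-utility profile, breaking ties deterministically."""
    candidates = [p for p in order if p in utilities]
    if not candidates:
        raise ValueError("no profile has a utility score")
    # Sort key: higher utility first, then lower token cost, then profile order.
    return min(
        candidates,
        key=lambda p: (-utilities[p], token_costs[p], list(order).index(p)),
    )
```

Note that this sketch treats only exactly equal utilities as ties; a real implementation might compare within a tolerance.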
Marginal utility targets
Milestone A adds:
- marginal_quality_delta
- marginal_token_delta
- marginal_safety_delta
- marginal_utility
- value_per_token
Important constraint:
When the current State Value Lab only provides group-level ablations, target rows remain marked as:
Visibility targets
Target v2 adds:
- visibility_target
- visibility_target_reason
- model_visible_allowed
- receipt_visible_allowed
- raw_content_allowed_model_visible
Allowed visibility actions are:
- model_visible_include
- receipt_only
- safety_label_only
- drop
- unknown
This matches the real product boundary:
- model-visible context stays lean
- receipts can remain richer
- quarantined raw text never becomes model-visible
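One way to encode the allowed actions and the quarantine boundary is sketched below. The enum values come from the list above; the `clamp_visibility` helper, its `is_quarantined_raw` parameter, and the choice to downgrade to `safety_label_only` are illustrative assumptions rather than the actual implementation.

```python
from enum import Enum


class Visibility(str, Enum):
    MODEL_VISIBLE_INCLUDE = "model_visible_include"
    RECEIPT_ONLY = "receipt_only"
    SAFETY_LABEL_ONLY = "safety_label_only"
    DROP = "drop"
    UNKNOWN = "unknown"


def clamp_visibility(proposed: str, is_quarantined_raw: bool) -> Visibility:
    """Validate a proposed action and keep quarantined raw text out of model context."""
    # Raises ValueError for any action outside the allowed set.
    action = Visibility(proposed)
    # Assumed policy: quarantined raw text is reduced to a safety label.
    if is_quarantined_raw and action is Visibility.MODEL_VISIBLE_INCLUDE:
        return Visibility.SAFETY_LABEL_ONLY
    return action
```

Receipt-side actions pass through unchanged, which reflects the idea that receipts can stay richer than model-visible context.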
Cost-sensitive utility
Selector v1 does not train a language model.
It trains a small empirical utility selector over structured StatePlane features and deterministic target signals.
The target is not simply:
It is closer to:
High-information examples
Target v2 also flags:
- target_disagrees_with_heuristic
- target_disagrees_with_selector_v0
- high_information_example
These fields are diagnostic in Milestone A.
They are intended to support later weighting, debugging, and active-learning style follow-up work.
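The sketch below shows one plausible way to populate these flags. The rule that an example counts as high-information when the target disagrees with either baseline is an assumption, as are the function name and its parameters; the page does not define the exact criterion.

```python
from typing import Optional


def disagreement_flags(
    target: str,
    heuristic_choice: str,
    selector_v0_choice: Optional[str] = None,
) -> dict:
    """Compute diagnostic flags for one target row (assumed criterion)."""
    disagrees_heuristic = target != heuristic_choice
    # Left unset (None) when no selector-v0 replay is available for this row.
    disagrees_v0 = None if selector_v0_choice is None else target != selector_v0_choice
    return {
        "target_disagrees_with_heuristic": disagrees_heuristic,
        "target_disagrees_with_selector_v0": disagrees_v0,
        "high_information_example": disagrees_heuristic or bool(disagrees_v0),
    }
```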
Pairwise targets
Milestone A writes pairwise_targets.jsonl, but pairwise data stays intentionally sparse.
Pairs are emitted only when:
- both targets have known utility
- the utility margin is clear
- the pair is comparable within one fixture context
Sparse output is acceptable in this milestone.
The goal is to avoid pretending that dense ranking supervision exists when the underlying ablations are still coarse.
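The three emission conditions can be sketched as a small generator. The row keys (`fixture_id`, `target_id`, `utility`), the output keys, and the `min_margin` threshold are hypothetical; the real exporter may define "clear margin" differently.

```python
from itertools import combinations
from typing import Iterable, Iterator


def emit_pairs(rows: Iterable[dict], min_margin: float = 0.1) -> Iterator[dict]:
    """Yield preference pairs only when the comparison is well supported."""
    by_fixture: dict = {}
    for row in rows:
        # Condition 1: both targets must have known utility.
        if row.get("utility") is None:
            continue
        # Condition 3: only compare targets within one fixture context.
        by_fixture.setdefault(row["fixture_id"], []).append(row)

    for fixture_id, group in by_fixture.items():
        for a, b in combinations(group, 2):
            margin = a["utility"] - b["utility"]
            # Condition 2: skip pairs whose utility margin is not clear.
            if abs(margin) < min_margin:
                continue
            preferred, rejected = (a, b) if margin > 0 else (b, a)
            yield {
                "fixture_id": fixture_id,
                "preferred": preferred["target_id"],
                "rejected": rejected["target_id"],
                "margin": abs(margin),
            }
```

With coarse group-level ablations most candidate pairs fail at least one condition, so a short output file is the expected result.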
Target validation
Validate selector-v1 targets with:
uv run python -m evals.state_selector.target_quality \
    --targets evals/state_selector/reports/selector-v1-targets/targets_v2.jsonl \
    --pairwise-targets evals/state_selector/reports/selector-v1-targets/pairwise_targets.jsonl \
    --out evals/state_selector/reports/selector-v1-target-quality \
    --json \
    --markdown
Hard failures include:
- raw quarantined content marked model-visible
- resolved failures marked as active model-visible work
- hard constraints marked forbidden
- all utilities missing
- all visibility targets unknown
- leakage keys
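For illustration, most of these checks could be expressed as below. This is not the code behind `evals.state_selector.target_quality`: the row keys `is_quarantined_raw` and `is_resolved_failure` are hypothetical, "hard constraints marked forbidden" is approximated as a row with both `required_by_rule` and `forbidden_by_rule` set, and the leakage-key check is omitted because the page does not list which keys count as leakage.

```python
def hard_failures(rows: list[dict]) -> list[str]:
    """Return human-readable hard failures for a list of target rows."""
    failures = []
    for i, row in enumerate(rows):
        visible = row.get("visibility_target") == "model_visible_include"
        if visible and row.get("is_quarantined_raw"):
            failures.append(f"row {i}: raw quarantined content marked model-visible")
        if visible and row.get("is_resolved_failure"):
            failures.append(f"row {i}: resolved failure marked as active model-visible work")
        if row.get("required_by_rule") and row.get("forbidden_by_rule"):
            failures.append(f"row {i}: hard constraint marked forbidden")
    # Dataset-level checks: a target file with no usable signal is a failure.
    if rows and all(row.get("marginal_utility") is None for row in rows):
        failures.append("all utilities missing")
    if rows and all(row.get("visibility_target", "unknown") == "unknown" for row in rows):
        failures.append("all visibility targets unknown")
    return failures
```

An empty return value means none of the sketched checks fired; it does not imply the targets are otherwise well formed.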
Limitations
- Target v2 still reflects group-level ablation fidelity, not live task outcomes.
- Candidate-level marginal utility remains deferred until the lab emits candidate-level ablations.
- target_disagrees_with_selector_v0 can remain unset in Milestone A when selector-v0 replay is not part of the current export path.