Selector v1 Targets¶

Why target redesign was needed¶

Selector v0 proved the pipeline:

State Value Lab
  -> dataset validation
  -> small empirical selector
  -> evaluation and promotion gates

But selector v0 mostly learned from coarse group-value labels:

helpful
harmful
neutral
too_expensive
unknown

That was enough to bootstrap the system.

It is not enough to represent the full product question:

which state is worth spending model-visible tokens on?

What v0 labels captured¶

Selector v0 labels correctly captured:

whether removing a state group hurt continuation quality
whether removing a state group saved tokens
whether a group was broadly helpful or broadly wasteful

What v0 labels missed¶

Selector v0 labels did not explicitly distinguish:

hard invariants vs optional state
profile choice (none, minimal, compact)
model-visible state vs receipt-only state
safety-label-only state for quarantined risks
high-information disagreements with the heuristic policy

Hard invariant targets¶

The selector v1 target schema (target v2) adds rule-based hard-invariant fields:

is_hard_invariant
required_by_rule
forbidden_by_rule
hard_invariant_reason

Examples:

active hard user constraints remain required
raw quarantined content is forbidden model-visible
resolved failures are forbidden as active work
tool output cannot become an instruction

These are not learned.

They exist so training, validation, and runtime can distinguish optional selection from hard safety policy.
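A minimal sketch of how such rule-based fields could be derived. The four output fields come from the schema above; the input keys (`kind`, `status`, `hard`, `quarantined`, `raw`, `proposed_role`) and the reason strings are hypothetical and only illustrate the shape of a deterministic rule table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardInvariant:
    is_hard_invariant: bool
    required_by_rule: bool
    forbidden_by_rule: bool
    hard_invariant_reason: Optional[str]


def hard_invariant_for(candidate: dict) -> HardInvariant:
    """Apply fixed rules in order; nothing here is learned.

    All input keys are hypothetical stand-ins for StatePlane features.
    """
    kind = candidate.get("kind")
    status = candidate.get("status")
    # Active hard user constraints remain required.
    if kind == "user_constraint" and candidate.get("hard") and status == "active":
        return HardInvariant(True, True, False, "active_hard_user_constraint")
    # Raw quarantined content is forbidden model-visible.
    if candidate.get("quarantined") and candidate.get("raw"):
        return HardInvariant(True, False, True, "raw_quarantined_content")
    # Resolved failures are forbidden as active work.
    if kind == "failure" and status == "resolved":
        return HardInvariant(True, False, True, "resolved_failure_as_active_work")
    # Tool output cannot become an instruction.
    if kind == "tool_output" and candidate.get("proposed_role") == "instruction":
        return HardInvariant(True, False, True, "tool_output_as_instruction")
    # Everything else is optional state, left to the selector.
    return HardInvariant(False, False, False, None)
```

Because the function is pure and rule-ordered, training, validation, and runtime can all call the same code and get the same answer.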

Profile targets¶

Target v2 adds:

best_profile
profile_utilities
profile_token_costs

Milestone A derives these deterministically from State Value intervention scores for:

none
minimal
compact

Ties break by lower token cost, then by deterministic profile order:

none < minimal < compact
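The tie-break rule can be sketched as a single sort key. This assumes the best profile is the one with the highest utility, which the text implies but does not state; the function name `pick_best_profile` is hypothetical, and ties are detected by exact equality.

```python
from typing import Mapping

# Deterministic profile order from the text: none < minimal < compact.
PROFILE_ORDER = ("none", "minimal", "compact")


def pick_best_profile(
    profile_utilities: Mapping[str, float],
    profile_token_costs: Mapping[str, int],
) -> str:
    """Highest utility wins (assumption); ties go to lower token cost,
    then to the earlier profile in PROFILE_ORDER."""
    return min(
        PROFILE_ORDER,
        key=lambda p: (
            -profile_utilities[p],      # higher utility sorts first
            profile_token_costs[p],     # then cheaper profile
            PROFILE_ORDER.index(p),     # then fixed profile order
        ),
    )
```

For example, if `minimal` and `compact` have equal utility and `minimal` costs fewer tokens, the function returns `"minimal"`.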

Marginal utility targets¶

Milestone A adds:

marginal_quality_delta
marginal_token_delta
marginal_safety_delta
marginal_utility
value_per_token

Important constraint:

candidate-level marginal utility is not fabricated

When the current State Value Lab only provides group-level ablations, target rows remain marked as:

target_granularity = "group"
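A hedged sketch of a row builder that respects this constraint. The source does not give the utility formula, so the combination below (quality plus safety minus a token penalty) and the `token_weight` parameter are assumptions, as are the input keys `quality_delta`, `safety_delta`, and `token_delta`. The point is the granularity handling: the row is labeled `"group"` unless a candidate-level ablation is actually supplied.

```python
from typing import Optional


def marginal_target_row(
    group_ablation: dict,
    candidate_ablation: Optional[dict] = None,
    token_weight: float = 0.001,  # hypothetical cost per token
) -> dict:
    """Build marginal targets without inventing candidate-level signal."""
    # Use candidate-level numbers only if the lab really produced them.
    if candidate_ablation is not None:
        source, granularity = candidate_ablation, "candidate"
    else:
        source, granularity = group_ablation, "group"

    quality = source["quality_delta"]
    safety = source["safety_delta"]
    tokens = source["token_delta"]

    # Assumed formula, not taken from the source document.
    utility = quality + safety - token_weight * tokens
    return {
        "marginal_quality_delta": quality,
        "marginal_token_delta": tokens,
        "marginal_safety_delta": safety,
        "marginal_utility": utility,
        # Undefined when the group adds no tokens.
        "value_per_token": utility / tokens if tokens > 0 else None,
        "target_granularity": granularity,
    }
```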

Visibility targets¶

Target v2 adds:

visibility_target
visibility_target_reason
model_visible_allowed
receipt_visible_allowed
raw_content_allowed_model_visible

Allowed visibility actions are:

model_visible_include
receipt_only
safety_label_only
drop
unknown

This matches the real product boundary:

model-visible context stays lean
receipts can remain richer
quarantined raw text never becomes model-visible
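One way to check that a row's visibility fields agree with each other is sketched below. The field names and action values come from the lists above; the `quarantined` key and the error strings are hypothetical.

```python
VISIBILITY_ACTIONS = frozenset(
    {"model_visible_include", "receipt_only", "safety_label_only", "drop", "unknown"}
)


def visibility_errors(row: dict) -> list:
    """Return a list of consistency problems for one target row."""
    errors = []
    action = row.get("visibility_target")
    if action not in VISIBILITY_ACTIONS:
        errors.append(f"unknown visibility action: {action!r}")
    # A model-visible target must be allowed to be model-visible.
    if action == "model_visible_include" and not row.get("model_visible_allowed", False):
        errors.append("model_visible_include but model_visible_allowed is false")
    # A receipt-only target must be allowed in receipts.
    if action == "receipt_only" and not row.get("receipt_visible_allowed", False):
        errors.append("receipt_only but receipt_visible_allowed is false")
    # Quarantined raw text never becomes model-visible ('quarantined' is a hypothetical key).
    if row.get("quarantined") and row.get("raw_content_allowed_model_visible", False):
        errors.append("quarantined raw content allowed model-visible")
    return errors
```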

Cost-sensitive utility¶

Selector v1 does not train a language model.

It trains a small empirical utility selector over structured StatePlane features and deterministic target signals.

The target is not simply:

helpful vs harmful

It is closer to:

marginal utility under token budget and safety constraints

High-information examples¶

Target v2 also flags:

target_disagrees_with_heuristic
target_disagrees_with_selector_v0
high_information_example

These fields are diagnostic in Milestone A.

They are intended to support later weighting, debugging, and active-learning-style follow-up work.
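A possible derivation of these flags, as a sketch only. The source does not define when `high_information_example` is set; here it is assumed to be true whenever the target disagrees with either reference policy. The argument names are hypothetical. A missing selector-v0 action leaves its flag as `None` rather than guessing.

```python
from typing import Optional


def diagnostic_flags(
    target_action: str,
    heuristic_action: str,
    selector_v0_action: Optional[str] = None,
) -> dict:
    """Compare the target with the heuristic policy and, if available, selector v0."""
    disagrees_heuristic = target_action != heuristic_action
    # Stay unset when no selector-v0 replay is available.
    disagrees_v0 = (
        None if selector_v0_action is None else target_action != selector_v0_action
    )
    return {
        "target_disagrees_with_heuristic": disagrees_heuristic,
        "target_disagrees_with_selector_v0": disagrees_v0,
        # Assumed definition: any known disagreement is high-information.
        "high_information_example": disagrees_heuristic or bool(disagrees_v0),
    }
```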

Pairwise targets¶

Milestone A writes pairwise_targets.jsonl, but pairwise data stays intentionally sparse.

Pairs are emitted only when:

both targets have known utility
the utility margin is clear
the pair is comparable within one fixture context

Sparse output is acceptable in this milestone.

The goal is to avoid pretending that dense ranking supervision exists when the underlying ablations are still coarse.
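The three emission conditions can be sketched as a filter over target rows. The keys `fixture_id` and `target_id`, the output shape, and the `min_margin` threshold are hypothetical; the source only says the margin must be clear.

```python
from itertools import combinations


def emit_pairs(rows: list, min_margin: float = 0.1) -> list:
    """Emit preference pairs only for comparable rows with a clear margin."""
    by_fixture = {}
    for row in rows:
        # Condition 1: both targets need known utility, so skip unknowns early.
        if row.get("marginal_utility") is None:
            continue
        # Condition 3: only compare rows within one fixture context.
        by_fixture.setdefault(row["fixture_id"], []).append(row)

    pairs = []
    for fixture_id, group in by_fixture.items():
        for a, b in combinations(group, 2):
            margin = a["marginal_utility"] - b["marginal_utility"]
            # Condition 2: drop pairs whose margin is not clear.
            if abs(margin) < min_margin:
                continue
            preferred, other = (a, b) if margin > 0 else (b, a)
            pairs.append({
                "fixture_id": fixture_id,
                "preferred": preferred["target_id"],
                "other": other["target_id"],
                "margin": abs(margin),
            })
    return pairs
```

With coarse group-level utilities, most candidate pairs fail the margin test, which is why the resulting file stays sparse.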

Target validation¶

Validate selector-v1 targets with:

uv run python -m evals.state_selector.target_quality \
  --targets evals/state_selector/reports/selector-v1-targets/targets_v2.jsonl \
  --pairwise-targets evals/state_selector/reports/selector-v1-targets/pairwise_targets.jsonl \
  --out evals/state_selector/reports/selector-v1-target-quality \
  --json \
  --markdown

Hard failures include:

raw quarantined content marked model-visible
resolved failures marked as active model-visible work
hard constraints marked forbidden
all utilities missing
all visibility targets unknown
leakage keys present in target rows
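A few of these checks can be sketched as a dataset-level pass. This is not the implementation of `evals.state_selector.target_quality`; the function name, the failure strings, and the default `leakage_keys` set are hypothetical. Reading "hard constraints marked forbidden" as a row that is both required and forbidden by rule is one plausible interpretation.

```python
def dataset_hard_failures(rows: list, leakage_keys=frozenset({"gold_label"})) -> list:
    """Return hard-failure messages for a list of target rows.

    `leakage_keys` is a placeholder; the real list is not given in this document.
    """
    failures = []
    # Dataset-level checks: a file with no usable signal is a hard failure.
    if rows and all(r.get("marginal_utility") is None for r in rows):
        failures.append("all utilities missing")
    if rows and all(r.get("visibility_target", "unknown") == "unknown" for r in rows):
        failures.append("all visibility targets unknown")

    # Row-level checks.
    for i, r in enumerate(rows):
        if r.get("required_by_rule") and r.get("forbidden_by_rule"):
            failures.append(f"row {i}: hard constraint marked forbidden")
        leaked = sorted(k for k in r if k in leakage_keys)
        if leaked:
            failures.append(f"row {i}: leakage keys {leaked}")
    return failures
```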

Limitations¶

Target v2 still reflects group-level ablation fidelity, not live task outcomes.
Candidate-level marginal utility remains deferred until the lab emits candidate-level ablations.
target_disagrees_with_selector_v0 can remain unset in Milestone A when selector-v0 replay is not part of the current export path.