Selector v1 Targets

Why target redesign was needed

Selector v0 proved the pipeline:

State Value Lab
  -> dataset validation
  -> small empirical selector
  -> evaluation and promotion gates

But selector v0 mostly learned from coarse group-value labels:

  • helpful
  • harmful
  • neutral
  • too_expensive
  • unknown

That was enough to bootstrap the system.

It is not enough to represent the full product question:

which state is worth spending model-visible tokens on?

What v0 labels captured

Selector v0 labels correctly captured:

  • whether removing a state group hurt continuation quality
  • whether removing a state group saved tokens
  • whether a group was broadly helpful or broadly wasteful

What v0 labels missed

Selector v0 labels did not explicitly distinguish:

  • hard invariants vs optional state
  • profile choice (none, minimal, compact)
  • model-visible state vs receipt-only state
  • safety-label-only state for quarantined risks
  • high-information disagreements with the heuristic policy

Hard invariant targets

Selector v1 target schema v2 adds rule-based hard-invariant fields:

  • is_hard_invariant
  • required_by_rule
  • forbidden_by_rule
  • hard_invariant_reason

Examples:

  • active hard user constraints remain required
  • raw quarantined content is forbidden from model-visible context
  • resolved failures are forbidden as active work
  • tool output cannot become an instruction

These are not learned.

They exist so training, validation, and runtime can distinguish optional selection from hard safety policy.
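Because these fields come from fixed rules, they can be computed without any trained model. The sketch below is one possible shape for them. `StateItem`, its fields, and the rule order are hypothetical stand-ins for StatePlane metadata, not the project's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateItem:
    # Hypothetical stand-in for a StatePlane candidate; field names are illustrative.
    kind: str                     # e.g. "user_constraint", "failure", "tool_output"
    active: bool = True
    hard: bool = False
    quarantined: bool = False
    resolved: bool = False
    as_instruction: bool = False  # tool output being promoted to an instruction


@dataclass
class HardInvariant:
    is_hard_invariant: bool = False
    required_by_rule: bool = False
    forbidden_by_rule: bool = False  # forbidden as model-visible / active state
    hard_invariant_reason: Optional[str] = None


def hard_invariant_for(item: StateItem) -> HardInvariant:
    """Fixed rules checked in order (assumed order); nothing here is learned."""
    if item.kind == "user_constraint" and item.hard and item.active:
        return HardInvariant(True, True, False, "active hard user constraint")
    if item.quarantined:
        return HardInvariant(True, False, True, "raw quarantined content")
    if item.kind == "failure" and item.resolved:
        return HardInvariant(True, False, True, "resolved failure is not active work")
    if item.kind == "tool_output" and item.as_instruction:
        return HardInvariant(True, False, True, "tool output cannot become an instruction")
    # Everything else is optional state, left to the learned selector.
    return HardInvariant()
```

Keeping these rules outside the learned model means the selector can only choose among optional items. It can never override a required or forbidden one.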

Profile targets

Target v2 adds:

  • best_profile
  • profile_utilities
  • profile_token_costs

Milestone A derives these deterministically from State Value intervention scores for:

  • none
  • minimal
  • compact

Ties break by lower token cost, then by deterministic profile order:

none < minimal < compact
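The tie-break rule above maps directly onto a sort key of (higher utility, lower token cost, profile order). The sketch below is a minimal version that assumes `profile_utilities` and `profile_token_costs` are plain dicts keyed by profile name and that ties mean exact equality. The document does not say whether a tolerance is applied.

```python
PROFILE_ORDER = ("none", "minimal", "compact")


def best_profile(utilities: dict[str, float], token_costs: dict[str, int]) -> str:
    # Highest utility wins; ties break by lower token cost, then by PROFILE_ORDER.
    candidates = [p for p in PROFILE_ORDER if p in utilities]
    return min(
        candidates,
        key=lambda p: (-utilities[p], token_costs[p], PROFILE_ORDER.index(p)),
    )
```

For example, `best_profile({"none": 0.2, "minimal": 0.5, "compact": 0.5}, {"none": 0, "minimal": 120, "compact": 300})` picks `"minimal"`. It ties with `"compact"` on utility but costs fewer tokens.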

Marginal utility targets

Milestone A adds:

  • marginal_quality_delta
  • marginal_token_delta
  • marginal_safety_delta
  • marginal_utility
  • value_per_token

Important constraint:

candidate-level marginal utility is not fabricated

When the current State Value Lab only provides group-level ablations, target rows remain marked as:

target_granularity = "group"
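The sketch below shows one way to keep that constraint explicit. It copies granularity from whatever ablation produced the deltas and leaves utility unset when there is no signal. The combining formula, the sign conventions (quality and safety deltas positive when the state helps, token delta as tokens added), and `TOKEN_WEIGHT` are all hypothetical. The document names the fields but not how `marginal_utility` is computed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical exchange rate between tokens and quality; not a documented value.
TOKEN_WEIGHT = 0.001


@dataclass
class MarginalTarget:
    target_granularity: str
    marginal_quality_delta: Optional[float] = None
    marginal_token_delta: Optional[int] = None
    marginal_safety_delta: Optional[float] = None
    marginal_utility: Optional[float] = None
    value_per_token: Optional[float] = None


def marginal_target(quality_delta: Optional[float], token_delta: Optional[int],
                    safety_delta: Optional[float], ablation_level: str) -> MarginalTarget:
    # Granularity is copied from the ablation that produced the deltas, so a
    # candidate row scored from a group ablation stays "group".
    if ablation_level not in ("group", "candidate"):
        raise ValueError(f"unknown ablation level: {ablation_level}")
    if quality_delta is None or token_delta is None:
        # No ablation signal: leave utility unset rather than invent one.
        return MarginalTarget(target_granularity=ablation_level)
    utility = quality_delta + (safety_delta or 0.0) - TOKEN_WEIGHT * token_delta
    per_token = utility / token_delta if token_delta > 0 else None
    return MarginalTarget(ablation_level, quality_delta, token_delta,
                          safety_delta, utility, per_token)
```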

Visibility targets

Target v2 adds:

  • visibility_target
  • visibility_target_reason
  • model_visible_allowed
  • receipt_visible_allowed
  • raw_content_allowed_model_visible

Allowed visibility actions are:

  • model_visible_include
  • receipt_only
  • safety_label_only
  • drop
  • unknown

This matches the real product boundary:

  • model-visible context stays lean
  • receipts can remain richer
  • quarantined raw text never becomes model-visible
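The sketch below encodes the five visibility actions as an enum and derives the `*_allowed` fields from them. It refuses any mapping that would expose quarantined raw text. How `safety_label_only`, `drop`, and `unknown` translate into the boolean fields is an assumption made for illustration, not a documented mapping.

```python
from enum import Enum


class VisibilityTarget(str, Enum):
    MODEL_VISIBLE_INCLUDE = "model_visible_include"
    RECEIPT_ONLY = "receipt_only"
    SAFETY_LABEL_ONLY = "safety_label_only"
    DROP = "drop"
    UNKNOWN = "unknown"


def visibility_fields(target: VisibilityTarget, quarantined: bool, reason: str) -> dict:
    # Only a full include puts raw content in front of the model.
    raw_visible = target is VisibilityTarget.MODEL_VISIBLE_INCLUDE
    if quarantined and raw_visible:
        raise ValueError("quarantined raw text must never become model-visible")
    return {
        "visibility_target": target.value,
        "visibility_target_reason": reason,
        # Assumption: a safety label is model-visible even though raw text is not.
        "model_visible_allowed": target in (VisibilityTarget.MODEL_VISIBLE_INCLUDE,
                                            VisibilityTarget.SAFETY_LABEL_ONLY),
        # Assumption: receipts record every kept item; drop/unknown record nothing.
        "receipt_visible_allowed": target not in (VisibilityTarget.DROP,
                                                  VisibilityTarget.UNKNOWN),
        "raw_content_allowed_model_visible": raw_visible,
    }
```

Raising on the forbidden combination makes a bad target fail loudly at export time. The alternative, silently downgrading it, would hide the bug.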

Cost-sensitive utility

Selector v1 does not train a language model.

It trains a small empirical utility selector over structured StatePlane features and deterministic target signals.

The target is not simply:

helpful vs harmful

It is closer to:

marginal utility under token budget and safety constraints
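To make "utility under token budget and safety constraints" concrete, the sketch below shows how such targets could be consumed. Hard invariants are applied first, and optional items are then packed greedily by utility per token. The greedy pass is a standard knapsack heuristic swapped in for illustration. It is not the selector v1 model, and `Candidate` with its fields is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record: estimated marginal utility and model-visible token cost.
    name: str
    utility: float
    tokens: int
    required: bool = False
    forbidden: bool = False


def select_under_budget(cands: list[Candidate], budget: int) -> list[str]:
    chosen: list[str] = []
    used = 0
    # Hard invariants bypass ranking: required items always go in, forbidden never do.
    for c in cands:
        if c.required and not c.forbidden:
            chosen.append(c.name)
            used += c.tokens
    # Optional items with positive utility, best value per token first (greedy, not exact).
    optional = [c for c in cands if not c.required and not c.forbidden and c.utility > 0]
    optional.sort(key=lambda c: c.utility / max(c.tokens, 1), reverse=True)
    for c in optional:
        if used + c.tokens <= budget:
            chosen.append(c.name)
            used += c.tokens
    return chosen
```

A "helpful" label alone would keep the 400-token `log` item in the test example. The budgeted view drops it because cheaper items deliver more utility per token.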

High-information examples

Target v2 also flags:

  • target_disagrees_with_heuristic
  • target_disagrees_with_selector_v0
  • high_information_example

These fields are diagnostic in Milestone A.

They are intended to support later weighting, debugging, and active-learning style follow-up work.
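One way to derive these flags is to compare the target's visibility action with each baseline's action. A flag stays unset (`None`) when there is nothing to compare against, which matches the selector-v0 caveat under Limitations. Treating "any disagreement" as high-information is an assumption for this sketch.

```python
from typing import Optional


def disagreement_flags(target_action: str,
                       heuristic_action: Optional[str],
                       selector_v0_action: Optional[str]) -> dict:
    def disagrees(other: Optional[str]) -> Optional[bool]:
        # Unset when the baseline was not run or the target itself is unknown.
        if other is None or target_action == "unknown":
            return None
        return other != target_action

    vs_heuristic = disagrees(heuristic_action)
    vs_v0 = disagrees(selector_v0_action)
    return {
        "target_disagrees_with_heuristic": vs_heuristic,
        "target_disagrees_with_selector_v0": vs_v0,
        # Assumption: any confirmed disagreement marks the example as high-information.
        "high_information_example": bool(vs_heuristic) or bool(vs_v0),
    }
```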

Pairwise targets

Milestone A writes pairwise_targets.jsonl, but pairwise data stays intentionally sparse.

Pairs are emitted only when:

  • both targets have known utility
  • the utility margin is clear
  • the pair is comparable within one fixture context

Sparse output is acceptable in this milestone.

The goal is to avoid pretending that dense ranking supervision exists when the underlying ablations are still coarse.
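The three emission conditions can be expressed as filters over target rows. The sketch below assumes hypothetical `fixture_id` and `target_id` fields and a hypothetical `MIN_MARGIN` threshold for what counts as a "clear" margin. Pairs whose utilities are too close are skipped rather than forced into an order.

```python
from itertools import combinations

MIN_MARGIN = 0.05  # hypothetical threshold for a "clear" utility margin


def pairwise_targets(rows: list[dict], min_margin: float = MIN_MARGIN) -> list[dict]:
    # Group rows by fixture so pairs are only formed within one fixture context.
    by_fixture: dict[str, list[dict]] = {}
    for row in rows:
        if row.get("marginal_utility") is None:
            continue  # both targets must have known utility
        by_fixture.setdefault(row["fixture_id"], []).append(row)

    pairs = []
    for fixture_id, group in by_fixture.items():
        for a, b in combinations(group, 2):
            margin = a["marginal_utility"] - b["marginal_utility"]
            if abs(margin) < min_margin:
                continue  # margin not clear enough; emit nothing
            better, worse = (a, b) if margin > 0 else (b, a)
            pairs.append({
                "fixture_id": fixture_id,
                "preferred": better["target_id"],
                "other": worse["target_id"],
                "margin": abs(margin),
            })
    return pairs
```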

Target validation

Validate selector-v1 targets with:

uv run python -m evals.state_selector.target_quality \
  --targets evals/state_selector/reports/selector-v1-targets/targets_v2.jsonl \
  --pairwise-targets evals/state_selector/reports/selector-v1-targets/pairwise_targets.jsonl \
  --out evals/state_selector/reports/selector-v1-target-quality \
  --json \
  --markdown

Hard failures include:

  • raw quarantined content marked model-visible
  • resolved failures marked as active model-visible work
  • hard constraints marked forbidden
  • all utilities missing
  • all visibility targets unknown
  • leakage keys
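The sketch below illustrates what checks of this kind could look like over parsed `targets_v2.jsonl` rows. It is not the `evals.state_selector.target_quality` implementation. The row flags `quarantined`, `resolved_failure`, and `hard_constraint` are hypothetical, and `LEAKAGE_KEYS` is a placeholder set because the document does not list the real leakage keys.

```python
# Hypothetical placeholder; the real leakage-key list is not given in this document.
LEAKAGE_KEYS = {"gold_answer", "eval_score"}


def hard_failures(rows: list[dict]) -> list[str]:
    failures = []
    for i, row in enumerate(rows):
        vis = row.get("visibility_target")
        if row.get("quarantined") and (vis == "model_visible_include"
                                       or row.get("raw_content_allowed_model_visible")):
            failures.append(f"row {i}: raw quarantined content marked model-visible")
        if row.get("resolved_failure") and vis == "model_visible_include":
            failures.append(f"row {i}: resolved failure marked as active model-visible work")
        if row.get("hard_constraint") and row.get("forbidden_by_rule"):
            failures.append(f"row {i}: hard constraint marked forbidden")
        leaked = row.keys() & LEAKAGE_KEYS
        if leaked:
            failures.append(f"row {i}: leakage keys {sorted(leaked)}")
    # Dataset-level checks only make sense when there is at least one row.
    if rows and all(r.get("marginal_utility") is None and not r.get("profile_utilities")
                    for r in rows):
        failures.append("all utilities missing")
    if rows and all(r.get("visibility_target", "unknown") == "unknown" for r in rows):
        failures.append("all visibility targets unknown")
    return failures
```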

Limitations

  • Target v2 still reflects group-level ablation fidelity, not live task outcomes.
  • Candidate-level marginal utility remains deferred until the lab emits candidate-level ablations.
  • target_disagrees_with_selector_v0 can remain unset in Milestone A when selector-v0 replay is not part of the current export path.