Replay and Diff a Coding-Agent Run

What this example shows

This example shows the next step after recording structured coding-agent state:

record explicit runs
persist structured events
export replay receipts and local artifacts
replay a prior run deterministically
diff run boundaries
reconstruct the lean StatePlane context, rich receipt, and prepared OpenAI request from the event log

StatePlane artifacts are receipts, not the source of truth.

The source of truth is the event log.

Replay rebuilds projections, model-visible context, receipts, and request payloads from events. This makes it possible to inspect what the agent knew before a request and what changed after each run.
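The idea of rebuilding a projection from events can be illustrated with a small standalone sketch. It does not use the StatePlane API: the event shape (`seq`, `type`, `status`), the `src/parser.py` path, and the `project_failures` helper are all assumptions invented for illustration.

```python
# Hypothetical event shape for illustration; not StatePlane's actual schema.
PARSER_TEST = "tests/parser/test_errors.py::test_parser_error"
CLI_TEST = "tests/cli/test_errors.py::test_cli_reports_parser_error"

# A tuple, so the log itself cannot be appended to by the projection code.
EVENT_LOG = (
    {"seq": 1, "type": "test_result", "test_id": PARSER_TEST, "status": "failed"},
    {"seq": 2, "type": "file_patch", "path": "src/parser.py"},  # made-up path
    {"seq": 3, "type": "test_result", "test_id": PARSER_TEST, "status": "passed"},
    {"seq": 4, "type": "test_result", "test_id": CLI_TEST, "status": "failed"},
)


def project_failures(events, upto_seq=None):
    """Fold events into {test_id: "active" | "resolved"} without mutating them."""
    state = {}
    for event in events:
        if upto_seq is not None and event["seq"] > upto_seq:
            break  # stop at the requested point in history
        if event["type"] != "test_result":
            continue
        if event["status"] == "failed":
            state[event["test_id"]] = "active"
        elif event["test_id"] in state:
            state[event["test_id"]] = "resolved"
    return state


# What was known after the patch but before the second test run.
known_before = project_failures(EVENT_LOG, upto_seq=2)
# What is known at the end of the log.
known_after = project_failures(EVENT_LOG)
print(known_before)
print(known_after)
```

Because the projection is a pure function of the log, calling it twice with the same arguments yields the same result, and the log is never modified.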

Why replay matters

Replay answers questions that transcript storage alone does not answer cleanly:

What did the agent know before this request?
Which failure was still active?
Which risk was quarantined?
Can I rebuild the same snapshot and request without mutating history?

The replay path is read-only. It does not append new events.

Why diff matters

Diff turns the event log into state transitions instead of just more logs.

For the coding-agent workflow in this example, the important change is:

tests/parser/test_errors.py::test_parser_error -> resolved
tests/cli/test_errors.py::test_cli_reports_parser_error -> active

That is more useful than reading two raw test transcripts and trying to infer the delta manually.

Run the example

From the repository root:

python examples/replay_and_diff_coding_run/demo.py

No OpenAI API key is required.

No network access is required.

Step 1: Record structured events

# Assumes StatePlaneSession has already been imported from the StatePlane package.
session = StatePlaneSession.local(
    ".stateplane/demo.db",
    session_id="repo-maintainer-replay-demo",
    token_budget=8000,
)

session.start_run(run_id="run-1", title="Fix parser failure")

session.record_goal(
    title="Fix parser and CLI failures",
    description="Make parser and CLI tests pass without editing tests or changing the public API.",
    run_id="run-1",
)

session.record_user_constraint(
    key="edit_tests",
    value=False,
    text="Do not edit tests.",
    strength="hard",
    run_id="run-1",
)

session.record_test_run(
    command="pytest tests/parser tests/cli",
    exit_code=1,
    framework="pytest",
    failures=[
        {
            "test_id": "tests/parser/test_errors.py::test_parser_error",
            "file": "tests/parser/test_errors.py",
            "name": "test_parser_error",
            "error_type": "AssertionError",
            "message": "ParserError is not wrapped before reaching the CLI boundary.",
        }
    ],
    run_id="run-1",
)

The example then records:

a file patch
an explicit decision
a passing parser test
a failing CLI test
a quarantined risk
the next prepared OpenAI request

Step 2: Export run artifacts

bundle = session.export_run_artifacts(
    run_id="run-2",
    output_dir=".stateplane/artifacts/run-2",
    current_goal="Now fix the related CLI failure.",
)

This writes a local artifact directory that can be inspected later without calling OpenAI.

Step 3: Replay a run

replay = session.replay(
    run_id="run-2",
    current_goal="Now fix the related CLI failure.",
    prepare_request=True,
    model="gpt-5-mini",
    instructions="You are a careful repo-maintenance agent.",
    user_input="Now fix the related CLI failure.",
)

Replay rebuilds:

the coding projection
the merged StateRecord set
the lean STATEPLANE CONTEXT
the richer StatePlane receipt
the prepared OpenAI request

It does that from events, not from exported artifact files.

Step 4: Diff two runs

diff = session.diff_runs(
    before_run_id="run-1",
    after_run_id="run-2",
)

The resulting summary highlights the important transition:

the parser failure was resolved
the CLI failure remains active
the quarantined risk was added
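As a rough illustration of what a run diff computes, the sketch below compares two end-of-run states and lists only the transitions. It is standalone and does not reflect the real `diff_runs` output format: the `failure:` and `risk:` key prefixes, the `risk:example-risk` entry, and the `diff_states` helper are hypothetical.

```python
def diff_states(before, after):
    """Return sorted (key, old, new) transitions; None marks an absent key."""
    transitions = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            transitions.append((key, old, new))
    return transitions


# Invented end-of-run states, keyed by a made-up "kind:identifier" convention.
run_1 = {
    "failure:tests/parser/test_errors.py::test_parser_error": "active",
}
run_2 = {
    "failure:tests/parser/test_errors.py::test_parser_error": "resolved",
    "failure:tests/cli/test_errors.py::test_cli_reports_parser_error": "active",
    "risk:example-risk": "quarantined",  # placeholder risk name
}

transitions = diff_states(run_1, run_2)
for key, old, new in transitions:
    print(f"{key}: {old} -> {new}")
```

Unchanged entries are omitted, which is what makes the summary shorter than the two states it was derived from.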

Step 5: Counterfactual replay

# risk_event_id is the ID of the quarantined-risk event recorded earlier in the demo.
counterfactual = session.replay_counterfactual(
    exclude_event_ids=[risk_event_id],
    current_goal="Now fix the related CLI failure.",
    prepare_request=True,
    model="gpt-5-mini",
    instructions="You are a careful repo-maintenance agent.",
    user_input="Now fix the related CLI failure.",
)

This is intentionally narrow.

It does not create branching history. It filters an in-memory event list and replays the result read-only.

That is enough to answer questions like:

what would the snapshot look like without the quarantined risk?
what changes when this event is excluded?
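Event filtering of this kind can be sketched in a few lines. The example below is a standalone approximation, not StatePlane's implementation: the event fields (`id`, `type`, `summary`), the sample risk text, and the helpers `build_snapshot` and `replay_without` are assumptions.

```python
# Hypothetical in-memory event list; field names are invented for illustration.
EVENTS = [
    {"id": "evt-1", "type": "test_failure",
     "test_id": "tests/cli/test_errors.py::test_cli_reports_parser_error"},
    {"id": "evt-2", "type": "risk_quarantined",
     "summary": "example risk"},  # placeholder text
]


def build_snapshot(events):
    """Fold events into a tiny snapshot of active failures and quarantined risks."""
    snapshot = {"active_failures": [], "quarantined_risks": []}
    for event in events:
        if event["type"] == "test_failure":
            snapshot["active_failures"].append(event["test_id"])
        elif event["type"] == "risk_quarantined":
            snapshot["quarantined_risks"].append(event["summary"])
    return snapshot


def replay_without(events, exclude_event_ids):
    """Replay a filtered copy of the events; the original list is left untouched."""
    excluded = set(exclude_event_ids)
    filtered = [event for event in events if event["id"] not in excluded]
    return build_snapshot(filtered)


baseline = build_snapshot(EVENTS)
counterfactual = replay_without(EVENTS, ["evt-2"])
print(baseline["quarantined_risks"])
print(counterfactual["quarantined_risks"])
```

Nothing is written back, so the baseline replay is unaffected by any number of counterfactual replays.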

Artifact layout

.stateplane/artifacts/run-2/
  manifest.json
  events.json
  run.json
  coding_projection.before.json
  coding_projection.after.json
  coding_diff_from_previous.json
  state_records.json
  snapshot.json
  snapshot.md
  receipt.md
  prepared_openai_request.json
  receipt.json

What is deterministic

For the same event log and the same replay arguments, StatePlane deterministically rebuilds:

the coding projection
the merged StateRecord set
the snapshot hash
the replay receipt hash
the run diff summary

Artifact export also computes stable file hashes and a stable manifest receipt hash that excludes machine-specific absolute paths.
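One common way to get a hash that is stable across machines is to hash canonical JSON and to key files by their path relative to the artifact directory. The sketch below shows that general technique under stated assumptions; the choice of SHA-256, the JSON canonicalization, and the helper names are illustrative and not confirmed to match what StatePlane does internally.

```python
import hashlib
import json
from pathlib import PurePosixPath


def stable_hash(payload):
    """Hash a JSON-serializable payload independent of key insertion order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def manifest_hash(root, files):
    """files maps absolute POSIX paths under root to file bytes.

    Only the path relative to root enters the hash, so the machine-specific
    prefix does not affect the result.
    """
    entries = {
        PurePosixPath(path).relative_to(root).as_posix(): hashlib.sha256(data).hexdigest()
        for path, data in files.items()
    }
    return stable_hash(entries)


# The same artifact contents exported to two different (made-up) locations.
contents = {"events.json": b"[]", "snapshot.md": b"# snapshot\n"}
root_a = "/home/alice/.stateplane/artifacts/run-2"
root_b = "/tmp/ci/artifacts/run-2"

hash_a = manifest_hash(root_a, {f"{root_a}/{name}": data for name, data in contents.items()})
hash_b = manifest_hash(
    root_b,
    {f"{root_b}/{name}": data for name, data in reversed(list(contents.items()))},
)
print(hash_a == hash_b)
```

The second call deliberately uses a different root and a different insertion order to show that neither influences the hash.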

What is not included

This example does not:

call OpenAI
execute tools
edit files
use embeddings or vector search
create a remote backend or dashboard

It stays local and deterministic on purpose.

Limitations

Per-call request arguments such as instructions are only replayable when they are supplied again or were previously captured in event metadata by prepare_openai_request(...).
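The two sources for a per-call argument can be pictured as a simple lookup. The sketch below is hypothetical: the `metadata` / `request_args` layout, the `resolve_request_arg` helper, and the rule that a freshly supplied value takes precedence over a captured one are assumptions, not documented StatePlane behavior.

```python
def resolve_request_arg(name, supplied, events):
    """Return the supplied value, else the most recently captured one."""
    if supplied is not None:
        return supplied
    for event in reversed(events):
        captured = event.get("metadata", {}).get("request_args", {})
        if name in captured:
            return captured[name]
    raise LookupError(f"{name!r} was neither supplied nor captured in event metadata")


# Invented events: only the second one captured request arguments.
events = [
    {"type": "test_run", "metadata": {}},
    {"type": "prepared_request",
     "metadata": {"request_args": {"instructions": "You are a careful repo-maintenance agent."}}},
]

from_capture = resolve_request_arg("instructions", None, events)
from_caller = resolve_request_arg("instructions", "Be brief.", events)

try:
    resolve_request_arg("instructions", None, events[:1])
    missing_raises = False
except LookupError:
    missing_raises = True
```

Failing loudly when neither source is available avoids replaying a request with silently different arguments.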

Artifact bundles are receipts only. They are not authoritative state.

Counterfactual replay is event filtering only. It does not model branch history or causal rewrites.

Next steps

If this workflow fits your integration, the next practical step is to call the structured record_* APIs from your own tool wrappers and test runners so runs become replayable without manual data cleanup.
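As a starting point for such a wrapper, the sketch below converts a pytest node ID into a failure entry with the same keys used in the `failures` list in Step 1. The `failure_from_node_id` helper is hypothetical, and the parsing is deliberately simple: it assumes node IDs of the form `path::name` or `path::Class::name` and does not handle parametrized IDs that themselves contain `::`.

```python
def failure_from_node_id(node_id, error_type, message):
    """Build a failure dict (test_id, file, name, error_type, message) from a node ID."""
    file_part, _, rest = node_id.partition("::")
    # For "path::Class::name", keep only the final component as the test name.
    name = rest.rsplit("::", 1)[-1] if rest else ""
    return {
        "test_id": node_id,
        "file": file_part,
        "name": name,
        "error_type": error_type,
        "message": message,
    }


failure = failure_from_node_id(
    "tests/parser/test_errors.py::test_parser_error",
    "AssertionError",
    "ParserError is not wrapped before reaching the CLI boundary.",
)
print(failure)
```

A list of such dicts could then be passed as the `failures` argument when recording a test run, as in Step 1.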