Replay and Diff a Coding-Agent Run

What this example shows

This example shows the next step after capturing structured coding-agent state:

  • record explicit runs
  • persist structured events
  • export replay receipts and local artifacts
  • replay a prior run deterministically
  • diff run boundaries
  • reconstruct the lean StatePlane context, rich receipt, and prepared OpenAI request from the event log

StatePlane artifacts are receipts, not the source of truth.

The source of truth is the event log.

Replay rebuilds projections, model-visible context, receipts, and request payloads from events. This makes it possible to inspect what the agent knew before a request and what changed after each run.
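The idea of rebuilding a projection from events can be sketched in a few lines. This is a minimal illustration, not StatePlane's implementation: the Event class, the event kinds, and the replay function are hypothetical names chosen for the sketch, and the log is a tuple so the fold cannot append to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; StatePlane's real event schema may differ.
    event_id: str
    kind: str      # assumed kinds: "test_failed", "test_passed", "risk_quarantined"
    subject: str


def replay(events):
    """Fold an ordered event sequence into a projection without mutating it."""
    failures = {}
    risks = []
    for event in events:
        if event.kind == "test_failed":
            failures[event.subject] = "active"
        elif event.kind == "test_passed" and event.subject in failures:
            failures[event.subject] = "resolved"
        elif event.kind == "risk_quarantined":
            risks.append(event.subject)
    return {"failures": failures, "quarantined_risks": risks}


PARSER = "tests/parser/test_errors.py::test_parser_error"
CLI = "tests/cli/test_errors.py::test_cli_reports_parser_error"

# An immutable log: replaying it twice yields the same projection.
log = (
    Event("e1", "test_failed", PARSER),
    Event("e2", "test_passed", PARSER),
    Event("e3", "test_failed", CLI),
    Event("e4", "risk_quarantined", "hypothetical risk label"),
)
projection = replay(log)
```

Because the projection is a pure function of the log, any artifact derived from it can be discarded and rebuilt.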

Why replay matters

Replay answers questions that transcript storage alone does not answer cleanly:

  • What did the agent know before this request?
  • Which failure was still active?
  • Which risk was quarantined?
  • Can I rebuild the same snapshot and request without mutating history?

The replay path is read-only. It does not append new events.

Why diff matters

Diff turns the event log into explicit state transitions rather than more log output to read.

For the coding-agent workflow in this example, the important change is:

tests/parser/test_errors.py::test_parser_error -> resolved
tests/cli/test_errors.py::test_cli_reports_parser_error -> active

That is more useful than reading two raw test transcripts and trying to infer the delta manually.
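As a rough sketch of what such a diff computes, the function below compares two failure-status maps and reports only the entries that changed. The diff_failures helper and the "absent" placeholder status are assumptions made for illustration; they are not part of the StatePlane API.

```python
PARSER = "tests/parser/test_errors.py::test_parser_error"
CLI = "tests/cli/test_errors.py::test_cli_reports_parser_error"


def diff_failures(before, after):
    """Return {test_id: (old_status, new_status)} for every status that changed."""
    changes = {}
    for test_id in sorted(set(before) | set(after)):
        old = before.get(test_id, "absent")  # "absent" is a placeholder for this sketch
        new = after.get(test_id, "absent")
        if old != new:
            changes[test_id] = (old, new)
    return changes


# Failure status at the end of each run, as in the transition shown above.
before = {PARSER: "active"}
after = {PARSER: "resolved", CLI: "active"}

changes = diff_failures(before, after)
for test_id, (old, new) in changes.items():
    print(f"{test_id}: {old} -> {new}")
```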

Run the example

From the repository root:

python examples/replay_and_diff_coding_run/demo.py

No OpenAI API key is required.

No network access is required.

Step 1: Record structured events

session = StatePlaneSession.local(
    ".stateplane/demo.db",
    session_id="repo-maintainer-replay-demo",
    token_budget=8000,
)

session.start_run(run_id="run-1", title="Fix parser failure")

session.record_goal(
    title="Fix parser and CLI failures",
    description="Make parser and CLI tests pass without editing tests or changing the public API.",
    run_id="run-1",
)

session.record_user_constraint(
    key="edit_tests",
    value=False,
    text="Do not edit tests.",
    strength="hard",
    run_id="run-1",
)

session.record_test_run(
    command="pytest tests/parser tests/cli",
    exit_code=1,
    framework="pytest",
    failures=[
        {
            "test_id": "tests/parser/test_errors.py::test_parser_error",
            "file": "tests/parser/test_errors.py",
            "name": "test_parser_error",
            "error_type": "AssertionError",
            "message": "ParserError is not wrapped before reaching the CLI boundary.",
        }
    ],
    run_id="run-1",
)

The example then records:

  • a file patch
  • an explicit decision
  • a passing parser test
  • a failing CLI test
  • a quarantined risk
  • the next prepared OpenAI request

Step 2: Export run artifacts

bundle = session.export_run_artifacts(
    run_id="run-2",
    output_dir=".stateplane/artifacts/run-2",
    current_goal="Now fix the related CLI failure.",
)

This writes a local artifact directory that can be inspected later without calling OpenAI.

Step 3: Replay a run

replay = session.replay(
    run_id="run-2",
    current_goal="Now fix the related CLI failure.",
    prepare_request=True,
    model="gpt-5-mini",
    instructions="You are a careful repo-maintenance agent.",
    user_input="Now fix the related CLI failure.",
)

Replay rebuilds:

  • the coding projection
  • the merged StateRecord set
  • the lean STATEPLANE CONTEXT
  • the richer StatePlane receipt
  • the prepared OpenAI request

It does that from events, not from exported artifact files.

Step 4: Diff two runs

diff = session.diff_runs(
    before_run_id="run-1",
    after_run_id="run-2",
)

The resulting summary highlights the important transition:

  • the parser failure was resolved
  • the CLI failure remains active
  • the quarantined risk was added

Step 5: Counterfactual replay

# risk_event_id is the ID of the quarantined-risk event recorded earlier in the demo.
counterfactual = session.replay_counterfactual(
    exclude_event_ids=[risk_event_id],
    current_goal="Now fix the related CLI failure.",
    prepare_request=True,
    model="gpt-5-mini",
    instructions="You are a careful repo-maintenance agent.",
    user_input="Now fix the related CLI failure.",
)

This is intentionally narrow.

It does not create branching history. It filters an in-memory event list and replays the result read-only.

That is enough to answer questions like:

  • What would the snapshot look like without the quarantined risk?
  • What changes when this event is excluded?
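The filtering described above can be sketched as follows. The event dictionaries, summarize, and replay_excluding are hypothetical stand-ins used only to show the shape of the technique: build a filtered copy of the event list, replay the copy, and leave the original untouched.

```python
def summarize(events):
    """Build a tiny snapshot: event count plus the quarantined risks seen."""
    return {
        "event_count": len(events),
        "quarantined_risks": [
            e["subject"] for e in events if e["kind"] == "risk_quarantined"
        ],
    }


def replay_excluding(events, exclude_event_ids):
    """Replay a filtered in-memory copy; the original list is never modified."""
    excluded = set(exclude_event_ids)
    kept = [e for e in events if e["event_id"] not in excluded]
    return summarize(kept)


events = [
    {
        "event_id": "e1",
        "kind": "test_failed",
        "subject": "tests/cli/test_errors.py::test_cli_reports_parser_error",
    },
    {"event_id": "e2", "kind": "risk_quarantined", "subject": "hypothetical risk"},
]

baseline = summarize(events)
counterfactual = replay_excluding(events, ["e2"])
```

Comparing baseline with counterfactual shows what the excluded event contributed, with no branch of history created.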

Artifact layout

artifacts/run-2/
  manifest.json
  events.json
  run.json
  coding_projection.before.json
  coding_projection.after.json
  coding_diff_from_previous.json
  state_records.json
  snapshot.json
  snapshot.md
  receipt.md
  prepared_openai_request.json
  receipt.json
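When inspecting an exported bundle, a quick completeness check can be useful. The sketch below assumes only the file names listed above; the missing_artifacts helper is hypothetical and not provided by StatePlane.

```python
import tempfile
from pathlib import Path

# File names taken from the artifact layout above.
EXPECTED_FILES = (
    "manifest.json",
    "events.json",
    "run.json",
    "coding_projection.before.json",
    "coding_projection.after.json",
    "coding_diff_from_previous.json",
    "state_records.json",
    "snapshot.json",
    "snapshot.md",
    "receipt.md",
    "prepared_openai_request.json",
    "receipt.json",
)


def missing_artifacts(artifact_dir):
    """Return the expected file names that are not present in artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo against a throwaway directory that holds only a manifest.
with tempfile.TemporaryDirectory() as directory:
    Path(directory, "manifest.json").write_text("{}", encoding="utf-8")
    missing = missing_artifacts(directory)
```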

What is deterministic

For the same event log and the same replay arguments, StatePlane deterministically rebuilds:

  • the coding projection
  • the merged StateRecord set
  • the snapshot hash
  • the replay receipt hash
  • the run diff summary

Artifact export also computes stable file hashes and a stable manifest receipt hash that excludes machine-specific absolute paths.
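One way to get a hash that is stable across machines is to hash file contents and key them by paths relative to the artifact directory. The sketch below illustrates that approach with the standard library; it is an assumption about the general technique, not a description of how StatePlane computes its manifest receipt hash.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path):
    """Hash a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def manifest_hash(artifact_dir):
    """Hash {relative_path: file_hash}; absolute paths never enter the digest."""
    root = Path(artifact_dir)
    entries = {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in root.rglob("*")
        if path.is_file()
    }
    # Sorted keys and fixed separators give a canonical serialization.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two directories at different absolute paths with identical content.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for directory in (first, second):
        Path(directory, "run.json").write_text('{"run_id": "run-2"}', encoding="utf-8")
    same_content_matches = manifest_hash(first) == manifest_hash(second)

    Path(second, "run.json").write_text('{"run_id": "run-3"}', encoding="utf-8")
    changed_content_differs = manifest_hash(first) != manifest_hash(second)
```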

What is not included

This example does not:

  • call OpenAI
  • execute tools
  • edit files
  • use embeddings or vector search
  • create a remote backend or dashboard

It stays local and deterministic on purpose.

Limitations

Per-call request arguments such as instructions are only replayable when they are supplied again or were previously captured in event metadata by prepare_openai_request(...).
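The rule above amounts to a simple resolution order: explicit arguments win, captured metadata fills the gaps, and anything still missing cannot be replayed. A minimal sketch, assuming a hypothetical resolve_request_args helper and a flat metadata dictionary (StatePlane's actual metadata shape is not shown in this example):

```python
# Argument names taken from the replay call in Step 3.
REQUIRED_ARGS = ("model", "instructions", "user_input")


def resolve_request_args(captured_metadata, **overrides):
    """Merge captured metadata with explicitly supplied arguments."""
    resolved = dict(captured_metadata)
    # Explicit, non-None overrides take precedence over captured values.
    resolved.update({key: value for key, value in overrides.items() if value is not None})
    missing = [name for name in REQUIRED_ARGS if name not in resolved]
    if missing:
        raise ValueError(f"cannot replay request; missing: {', '.join(missing)}")
    return resolved


captured = {
    "model": "gpt-5-mini",
    "instructions": "You are a careful repo-maintenance agent.",
}
resolved = resolve_request_args(captured, user_input="Now fix the related CLI failure.")
```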

Artifact bundles are receipts only. They are not authoritative state.

Counterfactual replay is event filtering only. It does not model branch history or causal rewrites.

Next steps

If this workflow fits your integration, the next practical step is to call the structured record_* APIs from your own tool wrappers and test runners so runs become replayable without manual data cleanup.
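As a starting point for such a wrapper, the sketch below turns pytest's short summary lines into failure dictionaries shaped like the failures list in Step 1. The failures_from_pytest_summary helper is hypothetical, the parsing is deliberately simplified (a test ID containing " - " would be split incorrectly), and the sample output is made up for illustration.

```python
import re

# Matches pytest short-summary lines such as:
#   FAILED path/to/test_file.py::test_name - ErrorType: message
SUMMARY_LINE = re.compile(r"^FAILED (?P<test_id>.+?)(?: - (?P<detail>.*))?$")


def failures_from_pytest_summary(output):
    """Extract failure dictionaries from pytest's short test summary."""
    failures = []
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if not match:
            continue
        test_id = match.group("test_id")
        parts = test_id.split("::")
        detail = match.group("detail") or ""
        error_type, separator, message = detail.partition(": ")
        if not separator:
            # No "ErrorType: " prefix; keep the whole detail as the message.
            error_type, message = "", detail
        failures.append(
            {
                "test_id": test_id,
                "file": parts[0],
                "name": parts[-1],
                "error_type": error_type,
                "message": message,
            }
        )
    return failures


# Made-up sample output, not taken from a real run.
sample_output = (
    "FAILED tests/cli/test_errors.py::test_cli_reports_parser_error"
    " - AssertionError: exit code was 0\n"
    "1 failed, 3 passed in 0.12s"
)
failures = failures_from_pytest_summary(sample_output)
```

The resulting list could then be passed as the failures argument of session.record_test_run(...), alongside the command and exit code captured by the wrapper.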