Replay and Diff a Coding-Agent Run
What this example shows
This example builds on structured coding-agent state and shows how to:
- record explicit runs
- persist structured events
- export replay receipts and local artifacts
- replay a prior run deterministically
- diff run boundaries
- reconstruct the lean StatePlane context, rich receipt, and prepared OpenAI request from the event log
StatePlane artifacts are receipts, not the source of truth.
The source of truth is the event log.
Replay rebuilds projections, model-visible context, receipts, and request payloads from events. This makes it possible to inspect what the agent knew before a request and what changed after each run.
Why replay matters
Replay answers questions that transcript storage alone does not answer cleanly:
- What did the agent know before this request?
- Which failure was still active?
- Which risk was quarantined?
- Can I rebuild the same snapshot and request without mutating history?
The replay path is read-only. It does not append new events.
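The read-only property can be illustrated with a minimal sketch in plain Python. It does not use the StatePlane API: the event shapes, the `replay` function, and its `through_run` parameter are assumptions made up for illustration. The point is only that a projection is a pure fold over the event list, so replaying never appends to or modifies the log.

```python
from copy import deepcopy

PARSER_TEST = "tests/parser/test_errors.py::test_parser_error"
CLI_TEST = "tests/cli/test_errors.py::test_cli_reports_parser_error"

# Hypothetical event shapes; StatePlane's real event schema may differ.
EVENTS = [
    {"id": "e1", "run_id": "run-1", "type": "test_failed", "test_id": PARSER_TEST},
    {"id": "e2", "run_id": "run-2", "type": "test_passed", "test_id": PARSER_TEST},
    {"id": "e3", "run_id": "run-2", "type": "test_failed", "test_id": CLI_TEST},
]


def replay(events, through_run=None):
    """Fold events into a projection without touching the event list."""
    if through_run is not None:
        # Keep everything up to and including the last event of the given run.
        indexes = [i for i, e in enumerate(events) if e["run_id"] == through_run]
        if not indexes:
            raise ValueError(f"unknown run: {through_run}")
        events = events[: indexes[-1] + 1]  # rebinds the local name only
    failures = {}
    for event in events:
        if event["type"] == "test_failed":
            failures[event["test_id"]] = "active"
        elif event["type"] == "test_passed" and event["test_id"] in failures:
            failures[event["test_id"]] = "resolved"
    return {"failures": failures}


log_before = deepcopy(EVENTS)
after_run_1 = replay(EVENTS, through_run="run-1")
after_run_2 = replay(EVENTS, through_run="run-2")
```

Replaying the same list twice gives the same projection, and `EVENTS` still equals `log_before` afterwards.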
Why diff matters
Diff turns the event log into state transitions instead of just more logs.
For the coding-agent workflow in this example, the important change is:
tests/parser/test_errors.py::test_parser_error -> resolved
tests/cli/test_errors.py::test_cli_reports_parser_error -> active
That is more useful than reading two raw test transcripts and trying to infer the delta manually.
Run the example
From the repository root:
No OpenAI API key is required.
No network access is required.
Step 1: Record structured events
session = StatePlaneSession.local(
    ".stateplane/demo.db",
    session_id="repo-maintainer-replay-demo",
    token_budget=8000,
)
session.start_run(run_id="run-1", title="Fix parser failure")
session.record_goal(
    title="Fix parser and CLI failures",
    description="Make parser and CLI tests pass without editing tests or changing the public API.",
    run_id="run-1",
)
session.record_user_constraint(
    key="edit_tests",
    value=False,
    text="Do not edit tests.",
    strength="hard",
    run_id="run-1",
)
session.record_test_run(
    command="pytest tests/parser tests/cli",
    exit_code=1,
    framework="pytest",
    failures=[
        {
            "test_id": "tests/parser/test_errors.py::test_parser_error",
            "file": "tests/parser/test_errors.py",
            "name": "test_parser_error",
            "error_type": "AssertionError",
            "message": "ParserError is not wrapped before reaching the CLI boundary.",
        }
    ],
    run_id="run-1",
)
The example then records:
- a file patch
- an explicit decision
- a passing parser test
- a failing CLI test
- a quarantined risk
- the next prepared OpenAI request
Step 2: Export run artifacts
bundle = session.export_run_artifacts(
    run_id="run-2",
    output_dir=".stateplane/artifacts/run-2",
    current_goal="Now fix the related CLI failure.",
)
This writes a local artifact directory that can be inspected later without calling OpenAI.
Step 3: Replay a run
replay = session.replay(
    run_id="run-2",
    current_goal="Now fix the related CLI failure.",
    prepare_request=True,
    model="gpt-5-mini",
    instructions="You are a careful repo-maintenance agent.",
    user_input="Now fix the related CLI failure.",
)
Replay rebuilds:
- the coding projection
- the merged StateRecord set
- the lean STATEPLANE CONTEXT
- the richer StatePlane receipt
- the prepared OpenAI request
It does that from events, not from exported artifact files.
Step 4: Diff two runs
The resulting summary highlights the important transition:
- the parser failure was resolved
- the CLI failure remains active
- the quarantined risk was added
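As a rough illustration of what such a summary contains, the sketch below diffs two projections held as plain dictionaries. It is not StatePlane's diff implementation: `diff_projections`, the projection layout, and the risk text are hypothetical, chosen only to show how a diff reduces two states to a short list of transitions.

```python
PARSER_TEST = "tests/parser/test_errors.py::test_parser_error"
CLI_TEST = "tests/cli/test_errors.py::test_cli_reports_parser_error"


def diff_projections(before, after):
    """Return failure status transitions and risks added between two projections."""
    transitions = {}
    for test_id in sorted(set(before["failures"]) | set(after["failures"])):
        old = before["failures"].get(test_id)  # None means "not seen yet"
        new = after["failures"].get(test_id)
        if old != new:
            transitions[test_id] = (old, new)
    risks_added = [
        risk
        for risk in after["quarantined_risks"]
        if risk not in before["quarantined_risks"]
    ]
    return {"failure_transitions": transitions, "risks_added": risks_added}


# Hypothetical projections at the end of two consecutive runs.
before = {"failures": {PARSER_TEST: "active"}, "quarantined_risks": []}
after = {
    "failures": {PARSER_TEST: "resolved", CLI_TEST: "active"},
    "quarantined_risks": ["placeholder risk text"],
}

summary = diff_projections(before, after)
for test_id, (old, new) in summary["failure_transitions"].items():
    print(f"{test_id}: {old} -> {new}")
```

Unchanged entries are dropped from the result, which is what keeps the summary shorter than the two states it was computed from.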
Step 5: Counterfactual replay
counterfactual = session.replay_counterfactual(
    exclude_event_ids=[risk_event_id],
    current_goal="Now fix the related CLI failure.",
    prepare_request=True,
    model="gpt-5-mini",
    instructions="You are a careful repo-maintenance agent.",
    user_input="Now fix the related CLI failure.",
)
This is intentionally narrow.
It does not create branching history. It filters an in-memory event list and replays the result read-only.
That is enough to answer questions like:
- what would the snapshot look like without the quarantined risk?
- what changes when this event is excluded?
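The filtering idea can be sketched in a few lines of plain Python. This is an illustration under assumptions, not StatePlane's code: the event shapes, `project`, and `replay_without` are hypothetical names. The sketch builds a new filtered list and projects it, leaving the original events untouched.

```python
CLI_TEST = "tests/cli/test_errors.py::test_cli_reports_parser_error"

# Hypothetical events; the risk text is a placeholder.
EVENTS = [
    {"id": "e1", "type": "test_failed", "test_id": CLI_TEST},
    {"id": "e2", "type": "risk_quarantined", "risk": "placeholder risk text"},
]


def project(events):
    """Build a small projection from an event list."""
    projection = {"active_failures": [], "quarantined_risks": []}
    for event in events:
        if event["type"] == "test_failed":
            projection["active_failures"].append(event["test_id"])
        elif event["type"] == "risk_quarantined":
            projection["quarantined_risks"].append(event["risk"])
    return projection


def replay_without(events, exclude_event_ids):
    """Project a filtered copy of the events; the input list is not modified."""
    excluded = set(exclude_event_ids)
    filtered = [event for event in events if event["id"] not in excluded]
    return project(filtered)


baseline = project(EVENTS)
counterfactual = replay_without(EVENTS, ["e2"])
```

Comparing `baseline` with `counterfactual` shows exactly what the excluded event contributed. Nothing is written back, so no branch of history exists afterwards.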
Artifact layout
artifacts/run-2/
    manifest.json
    events.json
    run.json
    coding_projection.before.json
    coding_projection.after.json
    coding_diff_from_previous.json
    state_records.json
    snapshot.json
    snapshot.md
    receipt.md
    prepared_openai_request.json
    receipt.json
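A small helper can confirm that an exported directory contains the files listed above before you inspect it. This is a sketch using only the standard library; `missing_artifacts` is a hypothetical helper, not part of StatePlane, and it checks file presence only, not file contents.

```python
from pathlib import Path

# File names taken from the artifact layout listed above.
EXPECTED_FILES = (
    "manifest.json",
    "events.json",
    "run.json",
    "coding_projection.before.json",
    "coding_projection.after.json",
    "coding_diff_from_previous.json",
    "state_records.json",
    "snapshot.json",
    "snapshot.md",
    "receipt.md",
    "prepared_openai_request.json",
    "receipt.json",
)


def missing_artifacts(artifact_dir):
    """Return the expected file names that are absent from artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_artifacts(".stateplane/artifacts/run-2")` returns an empty list when every listed file is present.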
What is deterministic
For the same event log and the same replay arguments, StatePlane deterministically rebuilds:
- the coding projection
- the merged StateRecord set
- the snapshot hash
- the replay receipt hash
- the run diff summary
Artifact export also computes stable file hashes and a stable manifest receipt hash that excludes machine-specific absolute paths.
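One common way to get hashes that are stable across machines is to hash a canonical JSON encoding and to leave machine-specific fields out of the hashed payload. The sketch below shows that general technique; it is not necessarily how StatePlane computes its hashes, and the manifest layout (`output_dir`, `files`, `sha256`) is an assumption made for illustration.

```python
import hashlib
import json


def stable_hash(payload):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def manifest_receipt_hash(manifest):
    """Hash only portable fields: relative file names and content hashes."""
    portable = {name: entry["sha256"] for name, entry in manifest["files"].items()}
    return stable_hash(portable)


# Two hypothetical manifests for the same content exported on different machines.
manifest_a = {
    "output_dir": "/home/alice/repo/.stateplane/artifacts/run-2",
    "files": {
        "events.json": {"sha256": stable_hash({"events": []})},
        "run.json": {"sha256": stable_hash({"run_id": "run-2"})},
    },
}
manifest_b = {
    "output_dir": "C:/work/repo/.stateplane/artifacts/run-2",
    "files": {
        "run.json": {"sha256": stable_hash({"run_id": "run-2"})},
        "events.json": {"sha256": stable_hash({"events": []})},
    },
}

same_receipt = manifest_receipt_hash(manifest_a) == manifest_receipt_hash(manifest_b)
```

Here `same_receipt` is true even though the absolute paths and key order differ, because neither reaches the hashed payload.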
What is not included
This example does not:
- call OpenAI
- execute tools
- edit files
- use embeddings or vector search
- create a remote backend or dashboard
It stays local and deterministic on purpose.
Limitations
- Per-call request arguments such as instructions are only replayable when they are supplied again or were previously captured in event metadata by prepare_openai_request(...).
- Artifact bundles are receipts only. They are not authoritative state.
- Counterfactual replay is event filtering only. It does not model branch history or causal rewrites.
Next steps
If this workflow fits your integration, the next practical step is to call the structured
record_* APIs from your own tool wrappers and test runners so runs become replayable without
manual data cleanup.
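A test-runner wrapper along these lines might look like the sketch below. It is an assumption-laden example: `run_and_record` and `parse_failures` are hypothetical names, and the recorder is passed in as a callable so the sketch runs without StatePlane installed. The keyword arguments mirror the `record_test_run` call shown in Step 1.

```python
import shlex
import subprocess
import sys


def run_and_record(record_test_run, command, run_id, framework="pytest",
                   parse_failures=None):
    """Run a test command and forward a structured result to a recorder callable."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # parse_failures is a hypothetical hook that turns runner output into
    # failure dicts; without it, no per-test failures are recorded.
    failures = parse_failures(completed.stdout) if parse_failures else []
    record_test_run(
        command=shlex.join(command),
        exit_code=completed.returncode,
        framework=framework,
        failures=failures,
        run_id=run_id,
    )
    return completed.returncode


# Stand-in recorder that collects calls in a list for demonstration.
recorded = []
exit_code = run_and_record(
    lambda **kwargs: recorded.append(kwargs),
    [sys.executable, "-c", "raise SystemExit(1)"],
    run_id="run-1",
)
```

In a real integration you would pass `session.record_test_run` in place of the lambda, assuming it accepts the same keyword arguments as in Step 1, and a command list such as `["pytest", "tests/parser", "tests/cli"]`.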