Preserve in-progress message history when agent run errors#517
Conversation
Previously, when the server returned an error (e.g. 503 timeout) mid-run, any assistant messages, tool calls, and tool results produced since the user's last prompt were lost. loopAgentSteps created a shallow copy of the initial agent state, so mutations didn't propagate back to the SDK's shared sessionState reference if an error threw up. Now loopAgentSteps mutates the caller's state directly, and the SDK's cancellation path detects runtime progress so the user prompt isn't duplicated.
Greptile SummaryThis PR fixes a data-loss bug where assistant messages, tool calls, and tool results produced during an agent run were silently discarded whenever the server returned an error (e.g., a 503 timeout or a 402 payment-required). Two coordinated changes achieve the fix:
Key observations:
Confidence Score: 4/5Safe to merge; both bugs and their root causes are addressed, tests pass, and issues found are non-blocking P2 style/edge-case concerns. The fix is well-targeted and the logic holds up under scrutiny. The mutation semantics change is intentional and documented. The two P2 comments (shared-reference aliasing awareness and the history-length heuristic edge case) are informational notes rather than blocking bugs. Test coverage for the new scenario is solid. Score is 4 rather than 5 because the history-length heuristic is a subtle implicit contract that could silently fail under aggressive context-pruning combined with an error mid-run.
Important Files Changed
Sequence DiagramsequenceDiagram
participant SDK as sdk/run.ts (runOnce)
participant RT as agent-runtime (loopAgentSteps)
participant LLM as LLM / Tool Calls
SDK->>SDK: capture initialHistoryLength
SDK->>RT: callMainPrompt(sessionState)
Note over RT: initialAgentState.messageHistory = initialMessages<br/>(user msg now in shared state)
RT->>LLM: runAgentStep (step 1)
LLM-->>RT: agentState + messages
Note over RT: Object.assign(initialAgentState, newAgentState)<br/>(tool calls/results now in shared state)
RT->>LLM: runAgentStep (step N...)
LLM--xRT: 503 / timeout error thrown
Note over RT: Error propagates – shared state already mutated
RT--xSDK: throws error
SDK->>SDK: getCancelledSessionState()
SDK->>SDK: runtimeMadeProgress? history.length > initialHistoryLength → true
Note over SDK: Skip re-adding user prompt<br/>Preserve in-progress tool calls/results
SDK-->>Caller: RunResult { output: error, sessionState: preserved }
|
| const runtimeMadeProgress = | ||
| state.mainAgentState.messageHistory.length > initialHistoryLength |
There was a problem hiding this comment.
History-length heuristic can yield false negatives with context pruning
runtimeMadeProgress is determined solely by comparing the current history length against initialHistoryLength. If a run starts, the runtime mutates the history to add messages, but then a context-pruning pass removes enough messages to bring the length back at or below initialHistoryLength before the error is thrown, runtimeMadeProgress will be false. The user's prompt would then be inserted again by getCancelledSessionState, potentially duplicating it.
This is an unlikely edge case (pruning typically keeps recent messages including the user prompt), and the alternative of tracking whether state mutation occurred would add complexity. It's worth noting as a known limitation of the length-based check, perhaps with an inline comment:
| const runtimeMadeProgress = | |
| state.mainAgentState.messageHistory.length > initialHistoryLength | |
| const runtimeMadeProgress = | |
| state.mainAgentState.messageHistory.length > initialHistoryLength | |
| // Note: relies on history only growing during a run. If context pruning | |
| // shortens history below the baseline length, this may be a false negative. |
…tract Addresses review feedback: - Switch SDK's runtimeMadeProgress check from history-length comparison to array-reference inequality. This is robust to context pruning shrinking history below its starting length mid-run, which would cause a false negative and duplicate the user prompt. - Document loopAgentSteps' mutation contract: it mutates params.agentState in place throughout the run so callers see in-progress work even when an error throws before a normal return.
Summary
loopAgentStepsnow mutates the caller's sharedAgentStateinstead of a shallow copy, so in-progress work propagates back to the SDK'ssessionState.mainAgentStateeven when an error is thrown up (e.g. 402 or pre-loop setup errors).getCancelledSessionStatenow detects runtime progress and avoids duplicating the user prompt when the runtime already added it.sdk/src/__tests__/run-error-preserves-history.test.ts.Test plan
bun test sdk/src/__tests__/— 354 pass, 0 fail (incl. new tests)bun test packages/agent-runtime/src/__tests__/main-prompt.test.ts— 6 pass (fixed latent test bug exposed by new mutation semantics)🤖 Generated with Claude Code