fix: cancel in-flight handlers when transport closes in server.run() by maxisbey · Pull Request #2306 · modelcontextprotocol/python-sdk

@maxisbey

When the transport closes (stdin EOF, client disconnect) while a request
handler is still running, server.run()'s task group joins on the handler
instead of cancelling it. The handler eventually finishes, tries to send
its response through a write stream that _receive_loop already closed,
and server.run() crashes with ClosedResourceError wrapped in a
triple-nested ExceptionGroup.

The fix cancels the task group when the incoming_messages loop ends.
Handlers receive CancelledError and can clean up in finally blocks.

The existing CancelledError catch in _handle_request (added for
CancelledNotification handling in #1153) now distinguishes the two
cancellation sources: responder.cancel() already sent an error response
and we skip the duplicate; transport-close cancellation is re-raised so
the task group swallows it.
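The two cancellation sources can be illustrated as follows. This is a hedged sketch, not `_handle_request` itself: `Responder`, its `cancelled` flag, `handle_request` and `scenario` are hypothetical stand-ins, and asyncio replaces anyio so the example is dependency-free. A peer-initiated cancel has already answered, so the handler returns quietly. Any other cancellation is re-raised.

```python
import asyncio


class Responder:
    """Hypothetical stand-in for the per-request responder object."""

    def __init__(self) -> None:
        self.cancelled = False
        self.sent: list[str] = []

    def cancel(self, task: asyncio.Task) -> None:
        # Peer sent a cancellation notification: answer it, then cancel the handler.
        self.cancelled = True
        self.sent.append("error: request cancelled")
        task.cancel()


async def handle_request(responder: Responder, work) -> None:
    try:
        result = await work()
    except asyncio.CancelledError:
        if responder.cancelled:
            # responder.cancel() already sent an error response; skip a duplicate.
            return
        # Transport-close cancellation: re-raise so the enclosing group absorbs it.
        raise
    responder.sent.append(f"result: {result}")


async def scenario(peer_cancel: bool) -> tuple[str, list[str]]:
    responder = Responder()

    async def slow_work() -> str:
        await asyncio.sleep(3600)
        return "done"

    task = asyncio.create_task(handle_request(responder, slow_work))
    await asyncio.sleep(0.01)  # let the handler block inside slow_work()
    if peer_cancel:
        responder.cancel(task)
    else:
        task.cancel()  # stands in for tearing down the task group on transport close
    try:
        await task
        outcome = "returned"
    except asyncio.CancelledError:
        outcome = "cancelled"
    return outcome, responder.sent
```

With `peer_cancel=True` the task ends normally and exactly one error response is recorded. With `peer_cancel=False` the task ends cancelled and nothing is sent.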

Github-Issue: #526

@maxisbey marked this pull request as ready for review

March 18, 2026 15:51

claude[bot]

@maxisbey

Two additional races in the same transport-close window as the previous
commit, both triggered when handlers are blocked on server-to-client
requests (sampling, roots, elicitation) at the moment the transport closes:

1. _receive_loop's finally block iterates _response_streams.items() with
   await checkpoints inside the loop. A handler woken by the loop runs
   send_request's finally block, which pops its entry from that dict before
   the iterator's next __next__() call, raising RuntimeError: dictionary
   changed size during iteration. Fix: snapshot with list() before iterating.

2. The woken handler's send_request raises MCPError (CONNECTION_CLOSED),
   which _handle_request catches and converts to an error response. It
   then falls through to message.respond() against a write stream that
   _receive_loop already closed. Fix: catch ClosedResourceError and drop
   the response.
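Race 1 can be reproduced in miniature with the sketch below. It is not the SDK's code. A plain dict of hypothetical `asyncio.Event` objects stands in for `_response_streams`, and `send_request` / `close_response_streams` are simplified stand-ins. The `await` inside the loop lets a woken waiter pop its own entry. Iterating the live `items()` view then fails, and iterating a `list()` snapshot does not.

```python
import asyncio


async def send_request(streams: dict[int, asyncio.Event], request_id: int) -> None:
    # Hypothetical stand-in: register, wait for a response, then remove our
    # entry in a finally block.
    event = asyncio.Event()
    streams[request_id] = event
    try:
        await event.wait()
    finally:
        streams.pop(request_id, None)


async def close_response_streams(streams: dict[int, asyncio.Event], snapshot: bool) -> None:
    pairs = list(streams.items()) if snapshot else streams.items()
    for _request_id, event in pairs:
        event.set()             # wake the blocked requester
        await asyncio.sleep(0)  # await checkpoint: the woken requester runs here


async def demo(snapshot: bool) -> str:
    streams: dict[int, asyncio.Event] = {}
    waiters = [asyncio.create_task(send_request(streams, i)) for i in range(2)]
    await asyncio.sleep(0.01)  # let both requesters register and block
    try:
        await close_response_streams(streams, snapshot)
        outcome = "ok"
    except RuntimeError as exc:
        outcome = str(exc)
    for task in waiters:
        task.cancel()
    await asyncio.gather(*waiters, return_exceptions=True)
    return outcome
```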

Both reproduce deterministically with two handlers blocked on list_roots()
when to_server is closed. Single test covers both: fails 20/20 with either
fix reverted, passes 50/50 with both.

claude[bot]

Python 3.14's compiler attributes the async trampoline's CLEANUP_THROW
instructions (for the try-body's await) to the next physical line of
code, which was the else body. coverage.py traced a phantom line event
there, tripping strict-no-cover even though the else never runs.

Moving the try/respond after the if/else avoids the misattribution and
also deduplicates the two respond() calls.

streamable_http's terminate() closes _write_stream_reader (the receive
end) before _write_stream (the send end). A handler reaching respond()
between those two closes gets BrokenResourceError (peer end closed)
rather than ClosedResourceError (our end closed). The stdio path only
ever hits ClosedResourceError because _receive_loop's async-with closes
the send end.
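The close-order difference can be seen directly with anyio memory object streams. Sending after the receive end is closed raises `BrokenResourceError`, and sending after our own send end is closed raises `ClosedResourceError`. The `respond` guard below catches both. That is an assumption about the shape of the fix, not code taken from this PR, and `send_after_close`, `respond` and `demo_guard` are hypothetical helpers.

```python
import anyio


async def send_after_close(close_receive_end_first: bool) -> str:
    send, receive = anyio.create_memory_object_stream(1)
    async with send, receive:  # closes both ends on exit (aclose is idempotent)
        if close_receive_end_first:
            await receive.aclose()  # peer end gone, like terminate() closing the reader first
        else:
            await send.aclose()     # our own end closed, like the stdio receive loop
        try:
            await send.send({"result": "late response"})
        except anyio.BrokenResourceError:
            return "BrokenResourceError"
        except anyio.ClosedResourceError:
            return "ClosedResourceError"
        return "sent"


async def respond(send, payload) -> bool:
    # Hypothetical guard: drop the response if either end of the stream is closed.
    try:
        await send.send(payload)
    except (anyio.ClosedResourceError, anyio.BrokenResourceError):
        return False
    return True


async def demo_guard() -> bool:
    send, receive = anyio.create_memory_object_stream(1)
    async with send, receive:
        await receive.aclose()  # simulate terminate() closing the reader first
        return await respond(send, {"result": "late response"})
```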

felixweinberger

@maxisbey deleted the fix/cancel-in-flight-on-transport-close branch

March 20, 2026 13:37

This was referenced Mar 27, 2026.