fix: cancel in-flight handlers when transport closes in server.run() by maxisbey · Pull Request #2306 · modelcontextprotocol/python-sdk
When the transport closes (stdin EOF, client disconnect) while a request handler is still running, server.run()'s task group joins on the handler instead of cancelling it. The handler eventually finishes, tries to send its response through a write stream that _receive_loop already closed, and server.run() crashes with ClosedResourceError wrapped in a triple-nested ExceptionGroup.

The fix cancels the task group when the incoming_messages loop ends. Handlers receive CancelledError and can clean up in finally blocks.

The existing CancelledError catch in _handle_request (added for CancelledNotification handling in #1153) now distinguishes the two cancellation sources: responder.cancel() already sent an error response and we skip the duplicate; transport-close cancellation is re-raised so the task group swallows it.

Github-Issue: #526
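The cancel-instead-of-join behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the SDK's code: it uses the standard library's asyncio rather than the anyio task group the SDK relies on, and handler, run, demo, and the None sentinel standing in for transport close are all hypothetical names. It does not model the responder.cancel() path.

```python
import asyncio


async def handler(name: str, log: list[str]) -> None:
    """Hypothetical request handler that would outlive the transport."""
    try:
        await asyncio.sleep(3600)  # stands in for slow work
        log.append(f"respond:{name}")  # would write to an already-closed stream
    finally:
        log.append(f"cleanup:{name}")  # runs on cancellation too


async def run(incoming: asyncio.Queue, log: list[str]) -> None:
    """Hypothetical server loop; None on the queue stands in for transport close."""
    tasks: list[asyncio.Task] = []
    while (message := await incoming.get()) is not None:
        tasks.append(asyncio.create_task(handler(message, log)))
        await asyncio.sleep(0)  # let the handler start before reading on
    # The message loop ended: cancel in-flight handlers instead of joining on them.
    for task in tasks:
        task.cancel()
    # return_exceptions=True absorbs each handler's CancelledError, loosely
    # mirroring how a task group swallows cancellation it triggered itself.
    await asyncio.gather(*tasks, return_exceptions=True)


async def demo() -> list[str]:
    log: list[str] = []
    incoming: asyncio.Queue = asyncio.Queue()
    for message in ("req-1", "req-2", None):
        incoming.put_nowait(message)
    # The timeout would fire if run() joined on the hour-long handlers.
    await asyncio.wait_for(run(incoming, log), timeout=1)
    return log
```

Calling asyncio.run(demo()) returns only the two cleanup entries: neither handler reaches its respond step, so nothing is written after the transport has gone away.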
Two additional races in the same transport-close window as the previous commit, both triggered when handlers are blocked on server-to-client requests (sampling, roots, elicitation) at the moment the transport closes:

1. _receive_loop's finally iterates _response_streams.items() with await checkpoints inside the loop. The woken handler's send_request finally pops from that dict before the iterator's next __next__(), raising RuntimeError: dictionary changed size during iteration. Fix: snapshot with list() before iterating.

2. The woken handler's send_request raises MCPError (CONNECTION_CLOSED), which _handle_request catches and converts to an error response. It then falls through to message.respond() against a write stream that _receive_loop already closed. Fix: catch ClosedResourceError and drop the response.

Both reproduce deterministically with two handlers blocked on list_roots() when to_server is closed. Single test covers both: fails 20/20 with either fix reverted, passes 50/50 with both.
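The first race can be reproduced in isolation. The sketch below is a simplified model, assuming asyncio queues in place of the SDK's response streams; blocked_send_request, close_response_streams, and demo are hypothetical names. The snapshot flag switches between iterating the live dict view and iterating a list() copy.

```python
import asyncio


async def blocked_send_request(streams: dict[int, asyncio.Queue], request_id: int) -> str:
    """Hypothetical stand-in for a handler waiting on a server-to-client request."""
    try:
        return await streams[request_id].get()
    finally:
        streams.pop(request_id, None)  # mutates the dict as soon as it wakes


async def close_response_streams(streams: dict[int, asyncio.Queue], snapshot: bool) -> None:
    """Wake every waiter, yielding to the event loop inside the loop body."""
    items = list(streams.items()) if snapshot else streams.items()
    for _request_id, stream in items:
        stream.put_nowait("closed")
        await asyncio.sleep(0)  # checkpoint: the woken waiter runs and pops here


async def demo(snapshot: bool) -> list[str]:
    streams = {1: asyncio.Queue(), 2: asyncio.Queue()}
    waiters = [asyncio.create_task(blocked_send_request(streams, rid)) for rid in (1, 2)]
    await asyncio.sleep(0)  # let both waiters block on get()
    await close_response_streams(streams, snapshot)
    return list(await asyncio.gather(*waiters))
```

With snapshot=True both waiters are woken and demo returns normally. With snapshot=False the first waiter pops its entry during the checkpoint, and the next step of the dict iteration raises RuntimeError.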
Python 3.14's compiler attributes the async trampoline's CLEANUP_THROW instructions (for the try-body's await) to the next physical line of code, which was the else body. coverage.py traced a phantom line event there, tripping strict-no-cover even though the else never runs. Moving the try/respond after the if/else avoids the misattribution and also deduplicates the two respond() calls.
streamable_http's terminate() closes _write_stream_reader (the receive end) before _write_stream (the send end). A handler reaching respond() between those two closes gets BrokenResourceError (peer end closed) rather than ClosedResourceError (our end closed). The stdio path only ever hits ClosedResourceError because _receive_loop's async-with closes the send end.
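The difference between the two errors depends only on which end was closed at the moment of the send. A minimal model of that rule follows, assuming nothing beyond the behaviour described above: the two exception classes are local stand-ins named after anyio's, and ToyStreamPair and respond are hypothetical helpers, not SDK or anyio APIs.

```python
class ClosedResourceError(Exception):
    """Stand-in for anyio.ClosedResourceError: our own (send) end is closed."""


class BrokenResourceError(Exception):
    """Stand-in for anyio.BrokenResourceError: the peer (receive) end is closed."""


class ToyStreamPair:
    """Hypothetical model of a send/receive pair that tracks only closed flags."""

    def __init__(self) -> None:
        self.send_closed = False
        self.receive_closed = False
        self.delivered: list[str] = []

    def send(self, item: str) -> None:
        if self.send_closed:
            raise ClosedResourceError
        if self.receive_closed:
            raise BrokenResourceError
        self.delivered.append(item)


def respond(stream: ToyStreamPair, response: str) -> bool:
    """Send a response; return False and drop it if the transport is gone."""
    try:
        stream.send(response)
    except (ClosedResourceError, BrokenResourceError):
        return False
    return True
```

A handler that reaches respond() after only receive_closed is set hits the BrokenResourceError branch, which is the window between the two closes in terminate(). Once send_closed is also set, the same call hits ClosedResourceError. Catching both lets the response be dropped in either case.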
maxisbey deleted the fix/cancel-in-flight-on-transport-close branch