Detect list mutation during sort even when list length is unchanged by rlaisqls · Pull Request #7432

Detect list mutation during sort even when list length is unchanged by rlaisqls · Pull Request #7432 · RustPython/RustPython

@youknowone, here is a precise breakdown of how CPython (from 2.x through the current 3.14 main branch) handles this — the mechanism has not changed across versions.

How CPython detects list mutation during sort

Before sorting — establish a sentinel

CPython saves the current ob_size, ob_item, and allocated fields, then zeroes out ob_size, sets ob_item = NULL, and sets allocated = -1. The comment reads: /* any operation will reset it to >= 0 */

The allocated field has a strict invariant in normal use: 0 <= ob_size <= allocated. list.sort() deliberately violates this by temporarily setting allocated to -1 to detect mutations.

The list is thus left in a publicly-visible empty state (from Python code len(L) == 0 during the sort). The real elements are tucked away in the local saved_ob_item pointer, invisible to Python callbacks.

After sorting — check the sentinel

After the sort completes, CPython checks: if (self->allocated != -1 && result != NULL) — this is the point where it detects that "the user mucked with the list during the sort".

Because every single mutating operation on a CPython list — append, extend, insert, pop, remove, clear, __setitem__, __delitem__, __iadd__, __imul__, etc. — funnels through list_resize(), which unconditionally sets allocated to a non-negative value, the -1 sentinel is impossible to encounter unless no mutation occurred. Even list.clear() goes through list_resize(0), which sets allocated = 0, not -1.

Why the current RustPython approach breaks down

RustPython's elements.capacity() > 0 check mirrors the CPython intent, but fails because Rust's Vec memory model is different:

Vec::clear() retains the allocation → capacity() stays > 0 → detects mutation correctly
core::mem::take() (used in RustPython's own clear() method at Line 235) produces a brand-new empty Vec with capacity 0 — indistinguishable from the sort's own internal sentinel state → mutation silently missed

The CPython sentinel works because -1 is structurally impossible for allocated under any normal operation. RustPython's capacity() == 0 is not similarly exclusive — it can be achieved by user-initiated mem::take-style clears.

Recommended fix (recap)

The Rust equivalent of CPython's -1 sentinel is an explicit mutation counter (AtomicUsize) on PyList, incremented in every #[pymethod] that mutates the list. A snapshot taken before the sort is compared to the value after — any difference means mutation occurred. This is exactly what I outlined in my earlier comment, and it's the closest structural analogue to CPython's approach.

🧠 Learnings used

Learnt from: jackoconnordev
Repo: RustPython/RustPython PR: 6086
File: benches/microbenchmarks/sort.py:7-7
Timestamp: 2025-08-09T22:55:10.512Z
Learning: In RustPython benchmarks, when testing sorting performance, benchmarking only `sorted()` is sufficient since it internally uses the same sorting logic as `list.sort()`. The only difference is that `sorted()` creates a copy first, so testing both would be redundant for evaluating the sorting algorithm's performance.

Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 7181
File: crates/stdlib/src/csv.rs:20-20
Timestamp: 2026-02-17T07:54:36.200Z
Learning: In RustPython, ensure that modules using Python sequence-like types bring in extension methods by importing the trait (for example, 'use rustpython_vm::sliceable::SliceableSequenceOp;'). Even if the trait name isn't directly referenced in code, its methods (such as len()) become in-scope on types like PyStr. Add this import in modules that rely on those extension methods to avoid compilation errors.