Optimize behavior of vec.split_off(0) (take all) by richkadel · Pull Request #76682 · rust-lang/rust
labels
Sep 13, 2020Optimization improvement to `split_off()` so the performance meets the
intuitively expected behavior when `at == 0`, avoiding the current
behavior of copying the entire vector.
The change honors documented behavior that the method leaves the
original vector's "previous capacity unchanged".
This improvement better supports the pattern for building and flushing a
buffer of elements, such as the following:
```rust
let mut vec = Vec::new();
loop {
vec.push(something);
if condition_is_met {
process(vec.split_off(0));
}
}
```
`Option` wrapping is the first alternative I thought of, but is much
less obvious and more verbose:
```rust
let mut capacity = 1;
let mut vec: Option<Vec<Stuff>> = None;
loop {
vec.get_or_insert_with(|| Vec::with_capacity(capacity)).push(something);
if condition_is_met {
capacity = vec.capacity();
process(vec.take().unwrap());
}
}
```
Directly applying `mem::replace()` could work, but `mem::` functions are
typically a last resort, when a developer is actively seeking better
performance than the standard library provides, for example.
The benefit of the approach to this change is it does not change the
existing API contract, but improves the peformance of `split_off(0)` for
`Vec`, `String` (which delegates `split_off()` to `Vec`), and any other
existing use cases.
This change adds tests to validate the behavior of `split_off()` with
regard to capacity, as originally documented, and confirm that behavior
still holds, when `at == 0`.
The change is an implementation detail, and does not require a
documentation change, but documenting the new behavior as part of its
API contract may benefit future users.
(Let me know if I should make that documentation update.)
Note, for future consideration:
I think it would be helpful to introduce an additional method to `Vec`
(if not also to `String`):
```
pub fn take_all(&mut self) -> Self {
self.split_off(0)
}
```
This would make it more clear how `Vec` supports the pattern, and make
it easier to find, since the behavior is similar to other `take()`
methods in the Rust standard library.
bors
added
S-waiting-on-bors
and removed S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.labels
Sep 14, 2020This was referenced
Jan 13, 2024matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request
Jan 26, 2024Remove special-case handling of `vec.split_off(0)` rust-lang#76682 added special handling to `Vec::split_off` for the case where `at == 0`. Instead of copying the vector's contents into a freshly-allocated vector and returning it, the special-case code steals the old vector's allocation, and replaces it with a new (empty) buffer with the same capacity. That eliminates the need to copy the existing elements, but comes at a surprising cost, as seen in rust-lang#119913. The returned vector's capacity is no longer determined by the size of its contents (as would be expected for a freshly-allocated vector), and instead uses the full capacity of the old vector. In cases where the capacity is large but the size is small, that results in a much larger capacity than would be expected from reading the documentation of `split_off`. This is especially bad when `split_off` is called in a loop (to recycle a buffer), and the returned vectors have a wide variety of lengths. I believe it's better to remove the special-case code, and treat `at == 0` just like any other value: - The current documentation states that `split_off` returns a “newly allocated vector”, which is not actually true in the current implementation when `at == 0`. - If the value of `at` could be non-zero at runtime, then the caller has already agreed to the cost of a full memcpy of the taken elements in the general case. Avoiding that copy would be nice if it were close to free, but the different handling of capacity means that it is not. - If the caller specifically wants to avoid copying in the case where `at == 0`, they can easily implement that behaviour themselves using `mem::replace`. Fixes rust-lang#119913.
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request
Jan 26, 2024Rollup merge of rust-lang#119917 - Zalathar:split-off, r=cuviper Remove special-case handling of `vec.split_off(0)` rust-lang#76682 added special handling to `Vec::split_off` for the case where `at == 0`. Instead of copying the vector's contents into a freshly-allocated vector and returning it, the special-case code steals the old vector's allocation, and replaces it with a new (empty) buffer with the same capacity. That eliminates the need to copy the existing elements, but comes at a surprising cost, as seen in rust-lang#119913. The returned vector's capacity is no longer determined by the size of its contents (as would be expected for a freshly-allocated vector), and instead uses the full capacity of the old vector. In cases where the capacity is large but the size is small, that results in a much larger capacity than would be expected from reading the documentation of `split_off`. This is especially bad when `split_off` is called in a loop (to recycle a buffer), and the returned vectors have a wide variety of lengths. I believe it's better to remove the special-case code, and treat `at == 0` just like any other value: - The current documentation states that `split_off` returns a “newly allocated vector”, which is not actually true in the current implementation when `at == 0`. - If the value of `at` could be non-zero at runtime, then the caller has already agreed to the cost of a full memcpy of the taken elements in the general case. Avoiding that copy would be nice if it were close to free, but the different handling of capacity means that it is not. - If the caller specifically wants to avoid copying in the case where `at == 0`, they can easily implement that behaviour themselves using `mem::replace`. Fixes rust-lang#119913.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters