GitHub - peiking88/cpp-sort: Sorting algorithms & related tools for C++

cpp-sort logo

Latest Release Conan Package Code Coverage Pitchfork Layout

It would be nice if only one or two of the sorting methods would dominate all of the others, regardless of application or the computer being used. But in fact, each method has its own peculiar virtues. [...] Thus we find that nearly all of the algorithms deserve to be remembered, since there are some applications in which they turn out to be best. — Donald Knuth, The Art Of Computer Programming, Volume 3

cpp-sort is a generic C++20 header-only sorting library. It revolves around one main generic sorting interface and provides several small tools to pick and/or design sorting algorithms. Using its basic sorting features should be trivial enough:

#include <array>
#include <iostream>
#include <cpp-sort/sorters/smooth_sorter.h>

int main()
{
    std::array<int, 5> arr = { 5, 8, 3, 2, 9 };
    cppsort::smooth_sort(arr);

    // prints 2 3 5 8 9
    for (int val: arr) {
        std::cout << val << ' ';
    }
}

SIMD-Optimized Sorting

cpp-sort now includes simd_sorter, a high-performance SIMD-optimized sorter that integrates x86-simd-sort for x86 platforms with AVX2 or AVX-512 support.

#include <vector>
#include <cpp-sort/sorters/simd_sorter.h>

int main()
{
    std::vector<float> data = { 5.0f, 8.0f, 3.0f, 2.0f, 9.0f };
    
    // Use SIMD-optimized sorting (3-10x faster than std::sort for primitive types)
    cppsort::simd_sort(data);
    
    return 0;
}

Supported Types

simd_sorter supports the following primitive types:

  • 32-bit: int32_t, uint32_t, float
  • 64-bit: int64_t, uint64_t, double
  • 16-bit: int16_t, uint16_t (AVX-512 only)

Performance

Array Size Type Speedup vs std::sort
100 float 4-5x
1,000 float 3-4x
100,000 float 7-8x
1,000,000 float 10x

For unsupported types or when SIMD is unavailable, simd_sorter automatically falls back to std::sort, ensuring correctness and portability.

Parallel Sorting

cpp-sort includes 8 parallel sorters powered by libfork for large datasets. They automatically enable parallel sorting for datasets with 1 million or more elements, falling back to sequential sorting for smaller datasets.

#include <vector>
#include <cpp-sort/sorters/parallel_sorter.h>
#include <cpp-sort/sorters/parallel_pdq_sorter.h>
#include <cpp-sort/sorters/parallel_simd_sorter.h>

int main()
{
    // Large dataset (>= 1M elements) - uses parallel sorting
    std::vector<int> large_data(2'000'000);
    cppsort::parallel_sort(large_data);        // General parallel sort
    cppsort::parallel_pdq_sort(large_data);    // Fastest for general data
    cppsort::parallel_simd_sort(large_data);   // Fastest for SIMD-compatible types
    
    return 0;
}

Available Parallel Sorters

Sorter Stability Supported Types Description
parallel_sorter Any comparable General-purpose parallel sort (wraps std::sort)
parallel_merge_sorter Any comparable Stable parallel merge sort
parallel_quick_sorter Any comparable Parallel quicksort with median-of-three pivot
parallel_pdq_sorter Any comparable Parallel pattern-defeating quicksort (fastest general)
parallel_simd_sorter Primitives only† SIMD + parallel (fastest for primitives)
parallel_tim_sorter Any comparable Adaptive stable sort (excellent for sorted data)
parallel_heap_sorter Any comparable Parallel heapsort (consistent O(n log n))
parallel_grail_sorter Any comparable Stable in-place sort (memory-efficient)

Primitives only: int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, double

Features

  • Automatic threshold: Parallel sorting activates for >= 1,000,000 elements
  • NUMA-aware: Uses libfork's work-stealing scheduler optimized for NUMA systems
  • Low overhead: Leverages C++20 coroutines for efficient task management
  • Fallback support: Gracefully falls back to sequential sorting when:
    • Dataset size is below threshold
    • C++20 coroutine support is unavailable

Performance (1M random integers, 32 cores)

Algorithm Serial (us) Parallel (us) Speedup
parallel_simd_sorter 1,763 6,505 0.27x*
parallel_pdq_sorter 14,853 7,619 1.95x
parallel_grail_sorter 56,984 11,086 5.14x
parallel_quick_sorter 46,814 9,889 4.73x
parallel_heap_sorter 67,664 11,301 5.99x
parallel_merge_sorter 47,308 10,223 4.63x
parallel_tim_sorter - - 4-6x

*simd_sorter is already optimal for small datasets; parallel overhead is counterproductive

Performance on Sorted Data (100K integers, 32 cores)

Algorithm Serial (us) Parallel (us) Speedup
parallel_tim_sorter 22 22 1.0x (already optimal)
parallel_pdq_sorter 39 39 1.0x
parallel_grail_sorter 390 393 1.0x
parallel_merge_sorter 83 257 0.32x

TimSort's adaptive nature makes it optimal for sorted data

Large Dataset Performance (10M random integers, 32 cores)

Algorithm Serial (us) Parallel (us) Speedup
parallel_heap_sorter 937,969 91,831 10.2x
parallel_grail_sorter 693,035 88,652 7.8x
parallel_quick_sorter 566,381 69,205 8.2x
parallel_merge_sorter 559,396 84,134 6.6x
parallel_pdq_sorter 164,428 62,990 2.6x
parallel_simd_sorter 24,911 63,620 0.4x*

*For SIMD-compatible types, sequential simd_sorter outperforms parallel version

Serial vs Parallel Performance Comparison

Run ./build.sh -c <algorithm> to compare serial and parallel versions:

./build.sh -c all      # Compare all algorithm pairs
./build.sh -c merge    # Compare merge_sorter vs parallel_merge_sorter
./build.sh -c quick    # Compare quick_sorter vs parallel_quick_sorter
./build.sh -c pdq      # Compare pdq_sorter vs parallel_pdq_sorter
./build.sh -c tim      # Compare tim_sorter vs parallel_tim_sorter
./build.sh -c heap     # Compare heap_sorter vs parallel_heap_sorter
./build.sh -c grail    # Compare grail_sorter vs parallel_grail_sorter
./build.sh -c simd     # Compare simd_sorter vs parallel_simd_sorter

Typical Speedup (Random Data, 5M elements, 32 cores):

Algorithm Pair Serial (μs) Parallel (μs) Speedup
merge_sorter ~180,000 ~40,000 4-5x
quick_sorter ~150,000 ~35,000 4-5x
pdq_sorter ~140,000 ~35,000 4x
tim_sorter ~600,000 ~80,000 7-8x
heap_sorter ~250,000 ~50,000 5x
grail_sorter ~200,000 ~45,000 4-5x
simd_sorter ~20,000 ~5,000 4x

Note: older versions of the library targeting C++14 are still available in the 1.x.y-develop and 1.x.y-stable, but they are not actively developed anymore. Open an issue if you need anything to be backported.

The main features & the extra features

cpp-sort provides a full set of sorting-related features. Here are the main building blocks of the library:

  • Every sorting algorithm exists as a function object called a sorter
  • Sorters can be wrapped in sorter adapters to augment their behaviour
  • The library provides a sorter facade to easily build sorters
  • Fixed-size sorters can be used to efficiently sort tiny fixed-size collections
  • Metrics can be used to gather information about the sorting operation
  • Measures of disorder can be used to evaluate the disorder in a collection

Here is a more complete example of what can be done with the library:

#include <algorithm>
#include <cassert>
#include <forward_list>
#include <functional>
#include <vector>
#include <cpp-sort/adapters.h>
#include <cpp-sort/sorters.h>

int main()
{
    struct wrapper { int value; };

    std::forward_list<wrapper> li = { {5}, {8}, {3}, {2}, {9} };
    std::vector<wrapper> vec = { {5}, {8}, {3}, {2}, {9} };

    // When used, this sorter will use a pattern-defeating quicksort
    // to sort random-access collections, and a mergesort otherwise
    cppsort::hybrid_adapter<
        cppsort::pdq_sorter,
        cppsort::merge_sorter
    > sorter;

    // Sort li and vec in reverse order using their value member
    sorter(li, std::greater{}, &wrapper::value);
    sorter(vec, std::greater{}, &wrapper::value);

    assert(std::equal(
        li.begin(), li.end(),
        vec.begin(), vec.end(),
        [](const auto& lhs, const auto& rhs) { return lhs.value == rhs.value; }
    ));
}

Even when the sorting functions are used without the extra features, they still provide some interesting guarantees (ideas often taken from the Ranges TS):

  • They provide both an iterator and a range interface
  • When possible, they accept a custom comparator parameter
  • Most of them accept a projection parameter
  • They correctly handle proxy iterators with iter_swap and iter_move
  • They also work when iterators don't provide post-incrementation nor post-decrementation
  • The value types of the collections to be sorted need not be default-constructible
  • The value types of the collections to be sorted need not be copyable (only movable)
  • Stateless sorters can be converted to a function pointer for each overloaded operator()
  • Sorters are function objects: they can directly be passed as "overload sets" to other functions

You can read more about all the available tools and find some tutorials about using and extending cpp-sort in the wiki.

Benchmarks

The following graph has been generated with a script found in the benchmarks directory. It shows the time needed for heap_sort to sort one million elements without being adapted, then when it is adapted with either drop_merge_adapter or split_adapter.

Graph showing the speed difference between heap_sort raw, then adapted with split_adapter and drop_merge_adapter, when the number of inversions in the std::vector<int> to sort increases

As can be seen above, wrapping heap_sort with either of the adapters makes it adaptive to the number of inversions in a non-intrusive manner. The algorithms used to adapt it have different pros and cons, it is up to you to use either.

This benchmark is mostly there to show the possibilities offered by the library. You can find more such commented benchmarks in the dedicated wiki page.

Compiler support & tooling

Ubuntu builds status MSVC builds status MinGW-w64 builds status MacOS builds status

cpp-sort requires C++20 support, and should work with the following compilers:

  • g++-11 or more recent.
  • clang++-13 or more recent (with both libstdc++ and libc++).
  • The versions of MinGW-w64 and AppleClang equivalent to the compilers mentioned above.
  • Visual Studio 2022 version 17.14.36414.22 or more recent, only with /permissive-. A few features are unavailable.
  • clang-cl corresponding the the Visual Studio version above.

The compilers listed above are the ones used by the CI pipeline, and the library is also tested with the most recent versions of those compilers on a regular basis. All the other compiler versions in-between are untested, but should also work. Feel free to open an issue if it isn't the case.

The features in the library might differ depending on the C++ version used and on the compiler extensions enabled. Those changes are documented in the wiki.

The main repository contains additional support for standard tooling such as CMake or Conan. You can read more about those in the wiki.

Thanks

I got a new car. I just need to put it together. They’re easier to steal piece by piece. — Jarod Kintz, $3.33

Even though some parts of the library are original research and some others correspond to custom and rather naive implementations of standard sorting algorithms, cpp-sort also reuses a great deal of code and ideas from open-source projects, often altered to integrate seamlessly into the library. Here is a list of the external resources used to create this library. I hope that the many different licenses are compatible. If it is not the case, please contact me (or submit an issue) and we will see what can be done about it:

  • Some of the algorithms used by insertion_sorter and pdq_sorter come from Orson Peters' pattern-defeating quicksort. Some parts of the benchmarks come from there as well.

  • The algorithm used by tim_sorter comes from Goro Fuji's (gfx) implementation of a Timsort.

  • The three algorithms used by spread_sorter come from Steven Ross Boost.Sort module.

  • The algorithm used by d_ary_spread_sorter comes from Tim Blechmann's Boost.Heap module.

  • The algorithm used by spin_sorter comes from the eponymous algorithm implemented in Boost.Sort. by Francisco Jose Tapia.

  • utility::as_function, and several projection-enhanced helper algorithms come from Eric Niebler's Range v3 library. Several ideas such as proxy iterators, customization points and projections, as well as a few other utility functions also come from that library or from the related articles and standard C++ proposals.

  • The algorithm used by ska_sorter comes from Malte Skarupke's implementation of his own ska_sort algorithm.

  • The algorithm used by drop_merge_adapter comes from Adrian Wielgosik C++ reimplementation of Emil Ernerfeldt's drop-merge sort.

  • Many enhanced standard algorithms are directly adapted from their counterparts in libc++, enhanced to handle both projections and proxy iterators.

  • The library internally uses an inplace_merge function that works with forward iterators. Its implementation uses a merge algorithm proposed by Dudziński and Dydek, and implemented by Alexander Stepanov and Paul McJones in their book Elements of Programming.

  • The inplace_merge overload for random-access iterators uses the Symmerge algorithm proposed by Pok-Son Kim and Arne Kutzner in Stable Minimum Storage Merging by Symmetric Comparisons when there isn't enough memory available to perform an out-of-place merge.

  • The implementation of Dijkstra's smoothsort used by smooth_sorter has been directly adapted from Keith Schwarz's implementation of the algorithm.

  • The algorithm used by wiki_sorter has been adapted from BonzaiThePenguin's WikiSort.

  • The algorithm used by grail_sorter has been adapted from Mrrl's GrailSort.

  • The algorithm used by indirect_adapter with forward or bidirectional iterators is a slightly modified version of Matthew Bentley's indiesort.

  • The implementation of the random-access overload of nth_element used by some of the algorithms comes from Danila Kutenin's miniselect library and uses Andrei Alexandrescu's AdaptiveQuickselect algorithm.

  • The sorting networks used by sorting_network_sorter all come from this list maintained by Bert Dobbelaere. The page has references to the sources of all of the sorting networks it lists.

  • Some of the optimizations used by sorting_network_sorter come from this discussion on StackOverflow and are backed by the article Applying Sorting Networks to Synthesize Optimized Sorting Libraries.

  • The algorithm behind utility::quicksort_adversary is a fairly straightforward adaptation of the one provided by M. D. McIlroy in A Killer Adversary for Quicksort.

  • The algorithm used by utility::check_strict_weak_ordering is a reimplementation of the one desribed in the README file of Danila Kutenin's quadratic_strict_weak_ordering project.

  • The SIMD-optimized sorting implementation in simd_sorter comes from Intel's x86-simd-sort library, which provides AVX2 and AVX-512 accelerated sorting for primitive types.

  • The parallel sorting implementations (parallel_sorter, parallel_merge_sorter, parallel_quick_sorter, parallel_pdq_sorter, parallel_simd_sorter, parallel_tim_sorter, parallel_heap_sorter, parallel_grail_sorter) use libfork, a lock-free, wait-free, continuation-stealing coroutine tasking library built on C++20 coroutines.

  • The test suite reimplements random number algorithms originally found in the following places:

  • The LaTeX scripts used to draw the sorting networks are modified versions of kaayy's sortingnetwork.tex, slightly adapted to be 0-based and draw the network from top to bottom.

  • The CMake tools embedded in the projects include scripts from RWTH-HPC/CMake-codecov.

  • Some of the benchmarks use a colorblind-friendly palette developed by Thøger Rivera-Thorsen.