Fix #2546: Implemented ADT-based probe search and batched AllReduce by guptapratykshh · Pull Request #2679 · su2code/SU2
Proposed Changes
This PR addresses the performance bottleneck experienced when using a large number of probes (e.g., >100) in parallel simulations. The previous implementation used a brute-force O(N) linear search per probe to find the nearest grid point, resulting in O(N_probes * N_points) complexity which caused significant slowdowns.
The changes include:
- ADT-Based Nearest Neighbor Search: Implemented an Alternating Digital Tree (ADT) strategy for probe location. This reduces search complexity to O(log(N_points)) per probe. A heuristic is used to switch to ADT only when the number of probes exceeds a threshold (default: 10).
- Batched Communication: Consolidated the
AllReduceoperations for probe values. Instead of performing one MPI reduction per probe, all probe values are now collected and reduced in a single batched operation at the end of the output routine, significantly reducing MPI overhead. - Regression Testing: Added a new regression test case (test_11_probes.cfg) to parallel_regression.py that specifically exercises the ADT path (11 probes) and verifies the correctness of the probe output against known values.
Related Work
Fixes issue #2546 (Probe performance bottleneck).
PR Checklist
- I am submitting my contribution to the develop branch.
- My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
- My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
- I used the pre-commit hook to prevent dirty commits and used
pre-commit run --allto format old commits. - I have added a test case that demonstrates my contribution, if necessary.
- I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.