Fix #2546: Implemented ADT-based probe search and batched AllReduce by guptapratykshh · Pull Request #2679 · su2code/SU2

Proposed Changes

This PR addresses the performance bottleneck experienced when using a large number of probes (e.g., >100) in parallel simulations. The previous implementation used a brute-force O(N) linear search per probe to find the nearest grid point, resulting in O(N_probes * N_points) complexity which caused significant slowdowns.

The changes include:

  1. ADT-Based Nearest Neighbor Search: Implemented an Alternating Digital Tree (ADT) strategy for probe location. This reduces search complexity to O(log(N_points)) per probe. A heuristic is used to switch to ADT only when the number of probes exceeds a threshold (default: 10).
  2. Batched Communication: Consolidated the AllReduce operations for probe values. Instead of performing one MPI reduction per probe, all probe values are now collected and reduced in a single batched operation at the end of the output routine, significantly reducing MPI overhead.
  3. Regression Testing: Added a new regression test case (test_11_probes.cfg) to parallel_regression.py that specifically exercises the ADT path (11 probes) and verifies the correctness of the probe output against known values.

Related Work

Fixes issue #2546 (Probe performance bottleneck).

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with --warnlevel=3 when using meson).
  • My contribution is commented and consistent with SU2 style (https://su2code.github.io/docs_v7/Style-Guide/).
  • I used the pre-commit hook to prevent dirty commits and used pre-commit run --all to format old commits.
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp), if necessary.