Hybrid Parallel AD (Part 3/?) by jblueh · Pull Request #1294 · su2code/SU2
Proposed Changes
- Fix preaccumulation issues with
RealReverseIndex. FixIdentify issues regarding index reuse and partial tape evaluations.- Add regression tests for hybrid parallel AD (subset of parallel_regression_AD).
- Re-enable parallel preaccumulation.
- Shared reading optimizations.
Related Work
continues #1284
PR Checklist
- I am submitting my contribution to the develop branch.
- My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags, or simply --warnlevel=2 when using meson).
- My contribution is commented and consistent with SU2 style.
- I have added a test case that demonstrates my contribution, if necessary.
- I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp) , if necessary.
jblueh
changed the title
WIP: Hybrid Parallel AD (Part 3/?)
[WIP] Hybrid Parallel AD (Part 3/?)
Part 3, but the branch-name is is hybrid_parallel_ad4 ... who are you trying to fool?
(jk :D , that mismatch was already in the previous PR)
f501dc1 fixes divergence of the disc_adj_fsi case among the parallel regression tests that occurred when using the CoDiPack type RealReverseIndex instead of RealReverse.
Since the bug is not obvious from the commit and there might be a better way to fix this, let me put this up for discussion.
In CDiscAdjMultizoneDriver::SetRecording, the tape is recorded part by part. The part of the code that is at fault involves three of them.
DEPENDENCIES -> OBJECTIVE_FUNCTION: an index is last used on a right hand sideOBJECTIVE_FUNCTION -> TRANSFER: the index is available for reuse and used on the left hand siderecording for a zone: the index is used on a right hand side
In CDiscAdjMultizoneDriver.cpp::ComputeAdjoints, however, the TRANSFER -> OBJECTIVE_FUNCTION part of the evaluation is skipped in some cases. After the recording for a zone evaluation accumulated adjoints for the said index, the adjoint reset that would be performed in the skipped tape part is missing. Instead, the tape evaluation DEPENDENCIES -> OBJECTIVE_FUNCTION receives the adjoints for this index, which is wrong because the index is associated with a different primal variable in this part of the tape.
Have there been issues regarding dependencies between tape parts previously? Are there mechanisms established that address such issues?
If someone wants to give it a try, I added a build option for the tape type in 183c3ca. You can switch to the index tape by calling meson with -Dcodi-tape=JacobianIndex.
Comment on lines +61 to +62
| if get_option('codi-tape') == 'JacobianIndex' | ||
| codi_rev_args += '-DCODI_INDEX_TAPE' |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Parallel preaccumulations must not share any variables. In particular, they must not have any common inputs and outputs. I suspect (judging from the code) that the preaccumulations treated in 781092a violate this pattern, and that this is the reason why excluding them makes several hybrid parallel AD tests pass. Other preaccumulations might be affected as well. The correctness of preaccumulations inside numerics objects, for example, depends on the values of pointers like V_i, V_j. Other preaccumulations might look wrong but are ok because common variables are prevented by color loops.
| END_SU2_OMP_FOR | ||
|
|
||
| AD::EndNoSharedReading(); | ||
|
|
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these shared reading optimizations depend on TimeStep being passive. Does this always hold true?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a fair-enough assumption, I added some more over the Primitive loops and removed some over smaller loops where the performance benefit might not justify the increased maintenance.
pcarruscag
changed the title
[WIP] Hybrid Parallel AD (Part 3/?)
Hybrid Parallel AD (Part 3/?)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the only thing left is to decide whether hybrid AD compiles by default if -Dwith-omp=true or if a special flag is added?
So the only thing left is to decide whether hybrid AD compiles by default if -Dwith-omp=true or if a special flag is added?
You mean a flag like -Dwith-opdi? This would only be meaningful for reverse AD + OpenMP. Conversely, reverse AD + OpenMP are not very meaningful without OpDiLib (you could probably execute it with -t 1, but then why bother using OpenMP in the first place). So if that was the question, I would say we leave it the way it is at the moment.
The omp simd directive is used in the reverse mode for passive linear algebra things, not a huge deal.
I see, you can't drop -Dwith-omp without losing vectorization. For a vectorized serial code, you want to be able to compile with OpenMP but make everything else, or at least AD, behave as if you wouldn't?
We don't really lose vectorization... the compiler will still generate it eventually, it will just be a bit less consistent across compilers and what they include in -O2 or -O3.
This is mostly for vector operations which probably don't take a huge time anyway, to justify another flag.
Maybe FORCE_OPDI_OMPT_BACKEND could be used to encode an "off-switch"?
Have a look at a9466bb for a suggestion on how to disable OpDiLib within the scope of the existing build options (casually tested).
To clarify the AD implications: If you use OpDiLib, you will have atomic adjoints where no shared reading optimizations are applied, even with -t 1. Disabling OpDiLib - if we add that option - will amend this. The presence of _OPENMP, however, makes SU2 use the thread-safe CoDiPack type and enables thread-safety augmentations in CoDiPack. Therefore, a reverse AD + OpenMP - OpDiLib build is not the same as a classical serial reverse AD build that is vectorized one way or the other.
If we add this, reverse AD + OpenMP - OpDiLib should also print an error for -t 2 or higher.
Since I am not 100% sure about it yet, could you clarify again what it is that you want to achieve in the end, and why?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters