[GSOC24] Addition of CUDA and GPU Acceleration to FGMRES Linear Solver in SU2 by areenraj · Pull Request #2346 · su2code/SU2

29 Mar 2025 Commits

These commits aim to give us more control over memory access and transfer, so that we can go on to change the surrounding linear algebra functions that make up the FGMRES solver.

Streamlined Vector Storage in the GPU

Each vector definition now also creates a corresponding copy in GPU memory. The memory for it is allocated with cudaMalloc when the Initialize function of the CSysVector class is called. Vector data can therefore stay in GPU memory between calls of the Matrix Vector Product and other linear algebra functions. In the previous implementation the data persisted only for a single call of the Matrix Vector Product and had to be refreshed on every call.

This implementation is similar to how the matrix was already stored. Following earlier feedback, the matrix is also no longer saved in pinned memory because of its huge size. The device pointer can be accessed at any point through a new dedicated public member function (GetDevicePointer) of the CSysVector class.

Added Memory Transfer Control

As previously discussed, we needed more control over the memory transfers between calls so that a larger share of the computation can be carried out on the GPU without repeated transfers. These transfers are now carried out by member functions that take a flag deciding whether the copy is actually performed. The flag defaults to true, so it only needs to be passed when a copy should be skipped.

Further changes are needed before this flag actually reduces the frequency of memory transfers, namely a variable that lets the inner loop of FGMRES tell the MatrixVectorProduct function when to switch the flag on or off. I will add this after I port the preconditioner.

Minor Change - Standardized File Structure Slightly

Redundant .cuh header files are now gone. I have added a GPUComms.cuh file so that any functions that need to be accessed by all CUDA files, such as error checking, can be added there for future reference. I've also added GPUVector and GPUMatrix files, which contain the CUDA wrapper member functions for the CSysVector and CSysMatrix classes, respectively.

Please let me know if you notice any bugs or if my implementations can be improved in any way.