[MRG] Cupy backend by ncassereau · Pull Request #315 · PythonOT/POT
Types of changes
- Docs change / refactoring / dependency upgrade
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to change)
Motivation and context / Related issue
Cupy backend
How has this been tested (if it applies)
Tested with cupy 9.0.0
Performance
Running `make pytest` shows that CuPy's tests are very slow. Here is my take on that:
For gromov, `generator.choice` is used in numerous functions, but with CuPy it requires sending the `p` argument back to the CPU. These recurrent device-to-host transfers are limited by memory bandwidth and penalise GPUs pretty harshly.
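To illustrate the pattern (a minimal sketch, not POT's actual backend code; the helper name `choice_with_p` is hypothetical): when the probability vector lives on the GPU, it has to be copied to the host before NumPy's `Generator.choice` can consume it, and that round-trip is the costly step.

```python
import numpy as np

try:
    import cupy as cp  # GPU path, only exercised if CuPy is installed
except ImportError:
    cp = None

def choice_with_p(rng, n, size, p):
    """Draw `size` indices in [0, n) with probabilities `p`.

    NumPy's Generator.choice expects a host array for `p`; if `p` is a
    CuPy array it must first be copied back to the host, which incurs a
    device-to-host transfer on every call.
    """
    if cp is not None and isinstance(p, cp.ndarray):
        p = cp.asnumpy(p)  # device -> host transfer (the costly step)
    return rng.choice(n, size=size, p=p)

rng = np.random.default_rng(0)
p = np.full(10, 0.1)          # uniform probabilities over 10 items
idx = choice_with_p(rng, 10, 5, p)
```

On a CPU-only install this degenerates to a plain `Generator.choice` call; the point is that the GPU path cannot avoid the `cp.asnumpy` copy.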
For the other tests, GPUs have an advantage on larger computations, while tests are usually run on tiny problems, for which CPUs can be faster (very fast local memory and no overhead from transfers to and from the GPU).
To be fair, I tried solving problems of different sizes with Sinkhorn (reusing essentially the code from `test/test_gpu.py::test_gpu_sinkhorn`). Here are the results, averaged over 500 runs:
| Size | Sinkhorn (NumPy) [s] | Sinkhorn (ot.gpu) [s] | Sinkhorn (CuPy) [s] |
|---|---|---|---|
| 50 | 0.0009 | 0.0105 | 0.0111 |
| 100 | 0.0010 | 0.0106 | 0.0111 |
| 500 | 0.0027 | 0.0065 | 0.0068 |
| 1000 | 0.0093 | 0.0072 | 0.0068 |
For some reason, CupyBackend is even faster than `ot.gpu` on the larger problems (perhaps a slightly different algorithm?). On the smaller ones, the gap may come from the overhead added by the Backend classes.
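The benchmark above can be sketched roughly as follows (a NumPy-only toy Sinkhorn, not POT's `ot.sinkhorn` and not the exact test code; problem sizes and the 500-run averaging are simplified to a single timed run per size):

```python
import time
import numpy as np

def sinkhorn(a, b, M, reg=1.0, n_iter=100):
    """Plain Sinkhorn iterations (toy sketch, fixed iteration count)."""
    K = np.exp(-M / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)         # scale columns to match b
        u = a / (K @ v)           # scale rows to match a
    return u[:, None] * K * v[None, :]  # transport plan

rng = np.random.default_rng(0)
for n in (50, 100, 500):
    x = rng.standard_normal((n, 2))
    y = rng.standard_normal((n, 2)) + 1.0
    M = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
    a = np.full(n, 1.0 / n)       # uniform source weights
    b = np.full(n, 1.0 / n)       # uniform target weights
    t0 = time.perf_counter()
    G = sinkhorn(a, b, M, reg=1.0)
    print(f"n={n}: {time.perf_counter() - t0:.4f}s")
```

Swapping the arrays for `cupy` ones (and `np` for `cp`) is essentially what the CuPy backend lets POT do transparently, which is where the transfer overhead on small problems shows up.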
Checklist
- The documentation is up-to-date with the changes I made.
- I have read the CONTRIBUTING document.
- All tests passed, and additional code has been covered with new tests.