[MRG] Cupy backend by ncassereau · Pull Request #315 · PythonOT/POT
Types of changes
- Docs change / refactoring / dependency upgrade
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to change)
Motivation and context / Related issue
Cupy backend
How has this been tested (if it applies)
Tested with cupy 9.0.0
Performance
Running `make pytest` shows that CuPy's tests are very slow. Here is my take on that:
For gromov, `generator.choice` is used in numerous functions, but with CuPy it requires sending the `p` argument back to the CPU. These recurrent device-to-host transfers are limited by memory bandwidth and penalise GPUs pretty harshly.
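To illustrate the pattern (a minimal sketch, not POT's actual backend code; the helper name `choice_with_p` is hypothetical): when the probability vector lives on the GPU, it has to be copied to the host before NumPy's `Generator.choice` can consume it, and that round-trip is the costly step.

```python
import numpy as np

try:
    import cupy as cp  # GPU path, only exercised if CuPy is installed
except ImportError:
    cp = None

def choice_with_p(rng, n, size, p):
    """Draw `size` indices in [0, n) with probabilities `p`.

    NumPy's Generator.choice expects a host array for `p`; if `p` is a
    CuPy array it must first be copied back to the host, which incurs a
    device-to-host transfer on every call.
    """
    if cp is not None and isinstance(p, cp.ndarray):
        p = cp.asnumpy(p)  # device -> host transfer (the costly step)
    return rng.choice(n, size=size, p=p)

rng = np.random.default_rng(0)
p = np.full(10, 0.1)          # uniform probabilities over 10 items
idx = choice_with_p(rng, 10, 5, p)
```

On a CPU-only install this degenerates to a plain `Generator.choice` call; the point is that the GPU path cannot avoid the `cp.asnumpy` copy.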
For the other tests, GPUs have an advantage on larger computations, while tests are usually run on tiny problems, for which CPUs can be faster (very fast local memory and no overhead from transfers to and from the GPU).
To be fair, I tried solving problems of different sizes with Sinkhorn (reusing essentially the code from `test/test_gpu.py::test_gpu_sinkhorn`). Here are the results, averaged over 500 runs:
| Size | Sinkhorn (NumPy) [s] | Sinkhorn (ot.gpu) [s] | Sinkhorn (CuPy) [s] |
|---|---|---|---|
| 50 | 0.0009 | 0.0105 | 0.0111 |
| 100 | 0.0010 | 0.0106 | 0.0111 |
| 500 | 0.0027 | 0.0065 | 0.0068 |
| 1000 | 0.0093 | 0.0072 | 0.0068 |
For some reason, CupyBackend is even faster than `ot.gpu` on the larger problems (perhaps a slightly different algorithm?). On the smaller ones, the gap may come from the overhead added by the Backend classes.
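The benchmark above can be sketched roughly as follows (a NumPy-only toy Sinkhorn, not POT's `ot.sinkhorn` and not the exact test code; problem sizes and the 500-run averaging are simplified to a single timed run per size):

```python
import time
import numpy as np

def sinkhorn(a, b, M, reg=1.0, n_iter=100):
    """Plain Sinkhorn iterations (toy sketch, fixed iteration count)."""
    K = np.exp(-M / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)         # scale columns to match b
        u = a / (K @ v)           # scale rows to match a
    return u[:, None] * K * v[None, :]  # transport plan

rng = np.random.default_rng(0)
for n in (50, 100, 500):
    x = rng.standard_normal((n, 2))
    y = rng.standard_normal((n, 2)) + 1.0
    M = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
    a = np.full(n, 1.0 / n)       # uniform source weights
    b = np.full(n, 1.0 / n)       # uniform target weights
    t0 = time.perf_counter()
    G = sinkhorn(a, b, M, reg=1.0)
    print(f"n={n}: {time.perf_counter() - t0:.4f}s")
```

Swapping the arrays for `cupy` ones (and `np` for `cp`) is essentially what the CuPy backend lets POT do transparently, which is where the transfer overhead on small problems shows up.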
Checklist
- The documentation is up-to-date with the changes I made.
- I have read the CONTRIBUTING document.
- All tests passed, and additional code has been covered with new tests.