Compute Wasserstein distance with dimension mismatch
Dear contributors,
I discovered that POT is (still) able to compute the Wasserstein distance between two discrete measures when the number of weights is lower than the number of spikes.
To be more precise, I can call the solver with `a.size != C.shape[0]` and `b.size != C.shape[1]`.
According to the documentation (see https://pot.readthedocs.io/en/latest/all.html#module-ot), this should not be possible.
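For reference, the documented contract (`a` with `ns` entries, `b` with `nt` entries, `M` of shape `(ns, nt)`) can be written as an explicit check. This is a minimal sketch using only NumPy; the helper `check_ot_inputs` is hypothetical and is not part of POT's API.

```python
import numpy as np


def check_ot_inputs(a, b, M):
    """Hypothetical helper (not part of POT): raise if weights and cost matrix disagree.

    Assumes the documented contract for ot.emd / ot.emd2:
    a has shape (ns,), b has shape (nt,), M has shape (ns, nt).
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    M = np.asarray(M, dtype=np.float64)
    if M.ndim != 2:
        raise ValueError(f"M must be 2-D, got shape {M.shape}")
    if a.shape[0] != M.shape[0]:
        raise ValueError(f"a has {a.shape[0]} weights but M has {M.shape[0]} rows")
    if b.shape[0] != M.shape[1]:
        raise ValueError(f"b has {b.shape[0]} weights but M has {M.shape[1]} columns")


# Shapes from this report: 100 weights, 101 spikes.
a = np.full(100, 1 / 100)
b = np.full(100, 1 / 100)
M = np.zeros((101, 101))
try:
    check_ot_inputs(a, b, M)
except ValueError as err:
    print(err)  # a has 100 weights but M has 101 rows
```

With a check like this in place, the call in the example below would fail loudly instead of returning a value.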
To Reproduce
Create two discrete distributions with
- n=100 weights
- n+1 spikes
See the following minimal working example:
```python
import numpy as np
import ot

n_points = 101  # number of spikes (support points)
n_weight = 100  # number of weights, deliberately one fewer

# Draw the spike locations
np.random.seed(24)
samples_1 = np.random.normal(0., 1., n_points)
samples_2 = np.random.normal(0., 1., n_points)

# Draw the weights of the spikes and normalize them to sum to 1
weights_1 = np.random.exponential(1., n_weight)
weights_1 /= np.sum(weights_1)
weights_2 = np.random.exponential(1., n_weight)
weights_2 /= np.sum(weights_2)

# Compute OT: matC has shape (101, 101) but each weight vector has only 100 entries
matC = ot.dist(samples_1.reshape((n_points, 1)), samples_2.reshape((n_points, 1)))
matC /= matC.max()
was = ot.emd2(weights_1, weights_2, matC)
print(was)
```
Desktop:
- OS: macOS Mojave
- Python version: 3.7.5
- POT was installed with pip (and was up to date)
Additional context
Even though the seed is set at the beginning of the MWE, POT sometimes emits "UserWarning: Problem infeasible. Check that a and b are in the simplex" (because of some randomness in the implementation?).
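One possible explanation, offered as an unverified assumption: if the solver takes its sizes from `matC` and reads 101 entries from each 100-entry weight array, the extra entry comes from whatever memory follows the array, so it can change between runs despite the fixed seed. A transport problem is only feasible when both measures carry the same total mass, so a stray value breaks that balance. The sketch below uses only NumPy; `masses_balanced` is a hypothetical helper, and the appended value `0.3` is arbitrary and stands in for unknown memory.

```python
import numpy as np


def masses_balanced(a, b, atol=1e-7):
    """Hypothetical helper: True if source and target weights have the same total mass.

    Balanced OT between a and b is only feasible when sum(a) == sum(b).
    """
    return bool(np.isclose(np.sum(a), np.sum(b), atol=atol))


rng = np.random.default_rng(24)
a = rng.exponential(1.0, 100)
a /= a.sum()
b = rng.exponential(1.0, 100)
b /= b.sum()
print(masses_balanced(a, b))  # True: both sum to 1

# Assumption: emulate a solver reading 101 entries from each 100-entry buffer
# by appending an arbitrary value in place of whatever memory follows the array.
a_read = np.append(a, 0.3)
b_read = np.append(b, 0.0)
print(masses_balanced(a_read, b_read))  # False: totals are 1.3 vs 1.0
```

Under this assumption, runs where the stray values happen to keep the totals equal would succeed silently, and the others would trigger the infeasibility warning.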