`Data.func` and friends: migrate to Dask by sadielbartholomew · Pull Request #300 · NCAS-CMS/cf-python

Convert the Data.func method to Dask, towards #182. As explained in the context below, most of the methods listed here can and will later be converted to use Dask built-ins rather than calling func, but for now this PR also daskifies the following element-wise methods, which use func to perform the underlying operation:

  • the trigonometric and hyperbolic methods and their inverses (12 in total: sin, arcsin, sinh, arcsinh etc.);
  • ceil;
  • exp;
  • floor;
  • rint;
  • round;
  • trunc;

and daskifying log, except when a base other than e, 2 or 10 is set, which requires __itruediv__ to be daskified in order to work.
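To illustrate why an arbitrary base depends on division, here is a minimal sketch, assuming the general case is computed by the change-of-base identity log_b(x) = ln(x) / ln(b). The helper name log_any_base is hypothetical and is not cf-python's Data.log; it only uses Dask's existing log, log2 and log10 functions.

```python
import dask.array as da
import numpy as np


def log_any_base(x, base=None):
    """Hypothetical helper (not cf-python's Data.log): lazy logarithm of a Dask array."""
    if base is None:
        return da.log(x)  # natural logarithm
    if base == 2:
        return da.log2(x)
    if base == 10:
        return da.log10(x)
    # Assumed general case: change of base, which needs a true division.
    return da.log(x) / np.log(base)


x = da.from_array(np.array([1.0, 8.0, 64.0]), chunks=2)
result = log_any_base(x, base=8).compute()
```

Bases e, 2 and 10 map straight onto a single element-wise call, whereas any other base needs the extra division step, which is the part that relies on __itruediv__ in the in-place form.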

Context and plan/proposal for follow-on work

As well as daskifying func, this PR effectively daskifies the other methods listed above, because they currently use func to perform their operation. I have therefore removed the test-skipping decorators from the unit tests for each of these methods and added the daskified marker to each of them (except log, where a comment notes its nearly-daskified state). However, such methods are more efficiently daskified by converting them to use the existing Dask built-in equivalents instead of calling func.

So the plan is to use follow-on PRs to convert from use of func to the appropriate built-in in each method. It turns out there are Dask built-ins for every one listed above available for us to use directly.
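As a rough illustration of what using the built-ins directly looks like, the sketch below calls a few of the element-wise functions in dask.array on a small chunked array. The array values and chunk size are arbitrary choices for the example, and this is not the cf-python implementation.

```python
import dask.array as da
import numpy as np

x = da.from_array(np.array([-1.5, -0.4, 0.4, 1.5]), chunks=2)

# Each call only builds a lazy task graph; nothing is evaluated yet.
ceil_x = da.ceil(x)
floor_x = da.floor(x)
trunc_x = da.trunc(x)
sinh_x = da.sinh(x)

# compute() evaluates the graph chunk by chunk and returns a NumPy array.
ceil_values = ceil_x.compute()
floor_values = floor_x.compute()
trunc_values = trunc_x.compute()
sinh_values = sinh_x.compute()
```

Because these functions operate per chunk without an intermediate wrapper, they avoid the extra layer that calling func adds.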

A small complication arises for the trigonometric and hyperbolic methods with a restricted domain (4 of the 12, e.g. arcsin, which is undefined outside of [-1, 1]). For these we support a preserve_invalid keyword, which preserves any NaN or inf-like values rather than masking them, since that is important in the contexts where cf-python is used. To keep supporting that for the methods we think should have it as a keyword flag, we should use the daskified func rather than the Dask built-in, as func already implements that keyword. Overall, therefore, my proposal is as follows, to be applied in a series of new PRs:

For data methods that apply a trivial operation in an element-by-element manner (see list above), use the following:

  1. If there is a Dask built-in equivalent method, use that directly, unless there is a valid reason we can't. The notable example is a method with a restricted domain, where the output may have new masked elements and we want a preserve_invalid option, which NumPy does not support in any trivial way.
  2. Otherwise use func to apply the operation.
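The second branch of this rule can be sketched as follows. This is a minimal illustration, assuming that the masking is applied per chunk with map_blocks and that invalid results are masked with NumPy's masked_invalid. The name apply_elementwise is hypothetical and merely stands in for the daskified func; its signature and behaviour are assumptions, not the cf-python API.

```python
import dask.array as da
import numpy as np


def apply_elementwise(x, np_func, preserve_invalid=False):
    """Hypothetical stand-in for a daskified func with a preserve_invalid flag."""

    def per_chunk(block):
        # Silence NumPy warnings for out-of-domain inputs such as arcsin(2).
        with np.errstate(invalid="ignore", divide="ignore"):
            out = np_func(block)
        if preserve_invalid:
            return out  # keep NaN and inf values as they are
        return np.ma.masked_invalid(out)  # mask NaN and inf values instead

    return x.map_blocks(per_chunk, dtype=float)


x = da.from_array(np.array([-2.0, 0.0, 1.0, 2.0]), chunks=4)
masked = apply_elementwise(x, np.arcsin).compute()
kept = apply_elementwise(x, np.arcsin, preserve_invalid=True).compute()
```

Here masked hides the two out-of-domain elements, while kept returns them as NaN. A bare call to a Dask built-in such as da.arcsin gives only the second behaviour, which is why the restricted-domain methods are the exception in step 1.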