New method `pad_missing` to support aggregation of DSGs
It is currently not possible to aggregate two DSG features of different lengths into a single DSG with a larger feature axis. E.g.
>>> a
<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>
>>> b
<CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>
>>> cf.aggregate([a,b])
[<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>,
 <CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>]
This is something we might want to do, because we can store DSGs of different lengths in one CF-netCDF data variable using a ragged array representation.
However, if we could pad out the ncdim%timeseries axis of b with missing data, the two fields would have matching shapes and could be aggregated. A new pad_missing method would make this possible:
>>> # Pad out the 'ncdim%timeseries' axis with missing data:
>>> # 0 elements at the start of the axis and 4 elements at the end:
>>> b = b.pad_missing('ncdim%timeseries', (0, 4))
>>> c = cf.aggregate([a,b])  # Now this aggregates
>>> c
[<CF Field: precipitation_flux(cf_role=timeseries_id(5), ncdim%timeseries(9)) kg m-2 day-1>]
>>> # Compress the field
>>> c = c[0].compress('contiguous')
>>> # Write it to disk in a single CF-netCDF data variable *without* the extra padding
>>> cf.write(c, 'dsg.nc')
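To illustrate why the padding need not reach the file, here is a rough NumPy-only sketch of the idea behind a contiguous ragged array: each feature keeps only its real elements, plus a count of how many there are. The function name `to_contiguous_ragged` is hypothetical and is not part of cf-python, and the sketch assumes that the only masked elements are the trailing padding on each row.

```python
import numpy as np


def to_contiguous_ragged(padded):
    """Split a padded 2-d masked array into per-row counts and flat values.

    Hypothetical helper, for illustration only. Assumes the masked
    elements of each row are trailing padding.
    """
    padded = np.ma.asanyarray(padded)
    mask = np.ma.getmaskarray(padded)

    # Number of real (unmasked) elements in each feature
    counts = (~mask).sum(axis=1)

    # All unmasked elements, row after row, as one 1-d array
    values = padded.compressed()
    return counts, values


# Two features: one of length 4, one of length 2 padded out to 4
data = np.ma.masked_all((2, 4), dtype=float)
data[0, :] = [1.0, 2.0, 3.0, 4.0]
data[1, :2] = [5.0, 6.0]

counts, values = to_contiguous_ragged(data)
```

Here `counts` holds the length of each feature and `values` holds the six real elements with none of the padding.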
NumPy and Dask each have a pad function that supports many kinds of padding, but not padding with missing data. Their API is also more general than we need here. It may therefore be better to call our method pad_missing, to distinguish it from the more general pad; a full cf-python pad could always be implemented in the future if the need arose.
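For reference, the core operation can be sketched in plain NumPy with masked arrays. This is only a minimal illustration of the idea, not the proposed cf-python implementation: the function name `pad_missing_sketch` and its signature are assumptions, and it works on a bare array rather than on a field and its metadata.

```python
import numpy as np


def pad_missing_sketch(array, axis, pad_width):
    """Pad `array` along `axis` with masked (missing) elements.

    Hypothetical helper, for illustration only. `pad_width` is a
    (before, after) pair of non-negative integers.
    """
    before, after = pad_width
    array = np.ma.asanyarray(array)

    # Shape of the result: only the padded axis grows
    shape = list(array.shape)
    shape[axis] += before + after

    # Start from an array that is missing everywhere ...
    out = np.ma.masked_all(shape, dtype=array.dtype)

    # ... and copy the original data (and its mask) into place
    index = [slice(None)] * array.ndim
    index[axis] = slice(before, before + array.shape[axis])
    out[tuple(index)] = array
    return out


# 3 features of 5 elements, padded with 4 missing elements at the end
b = np.arange(15.0).reshape(3, 5)
padded = pad_missing_sketch(b, 1, (0, 4))
```

The result has shape `(3, 9)`, with the original values in the first five columns and masked elements in the last four.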
PR to follow.