Change constructs access API to facilitate performance improvements
Current situation
Access to named constructs is currently either by the "filter_by_*" Constructs methods, or by Field attributes:
>>> f = cf.example_field(0)
>>> f.dimension_coordinates
<Constructs: dimension_coordinate(3)>
>>> f.constructs.filter_by_type('dimension_coordinate')
<Constructs: dimension_coordinate(3)>
The reason for the "dimension_coordinates" alias being an attribute rather than a method is that it was expected that it would often be followed by more options:
>>> f.dimension_coordinates('time')
<Constructs: dimension_coordinate(1)>
>>> f.dimension_coordinates.filter_by_property(units='degrees_east')
<Constructs: dimension_coordinate(1)>
This is fine, but comes with the slight confusion that the result of the attribute is a callable Constructs instance, which obfuscates the help:
>>> help(f.dimension_coordinates)
Help on Constructs in module cf.constructs object:

class Constructs(cfdm.constructs.Constructs)
 |  Constructs(auxiliary_coordinate=None, dimension_coordinate=None, domain_ancillary=None, field_ancillary=None, cell_measure=None, coordinate_reference=None, domain_axis=None, cell_method=None, source=None, copy=True, _use_data=True, _view=False, _ignore=())
 |
 |  A container for metadata constructs.
...
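The confusion can be reproduced without cf-python. The following is a minimal sketch using toy classes (`Constructs`, `OldStyleField` and `NewStyleField` here are stand-ins invented for illustration, not the real cf classes): when the alias is an attribute holding a callable instance, the documentation found for it is that of the container class, whereas a method carries its own docstring.

```python
import inspect


class Constructs(dict):
    """A container for metadata constructs."""

    def __call__(self, *identities):
        """Return the constructs whose keys match any of the identities."""
        if not identities:
            return Constructs(self)
        return Constructs({k: v for k, v in self.items() if k in identities})


class OldStyleField:
    """Toy field whose alias is an attribute holding a callable container."""

    def __init__(self, coords):
        self._coords = Constructs(coords)

    @property
    def dimension_coordinates(self):
        """Return the dimension coordinate constructs."""
        return self._coords


class NewStyleField:
    """Toy field whose alias is an ordinary method."""

    def __init__(self, coords):
        self._coords = Constructs(coords)

    def dimension_coordinates(self, *identities):
        """Return the dimension coordinate constructs."""
        return self._coords(*identities)


coords = {"time": "T", "latitude": "Y", "longitude": "X"}
old = OldStyleField(coords)
new = NewStyleField(coords)

# Both spellings select the same constructs
assert old.dimension_coordinates("time") == new.dimension_coordinates("time")

# The documentation found for each alias differs
old_doc = inspect.getdoc(old.dimension_coordinates)  # container class docstring
new_doc = inspect.getdoc(new.dimension_coordinates)  # method docstring
print(old_doc)
print(new_doc)
```

In the toy, the first `print` shows the generic container description and the second shows the description of the alias itself, which is what a user asking for help on the alias most likely wants.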
Performance
It turns out that the code for accessing constructs is very slow, and must be improved.
A key part of this improvement relies on being able to work with Python dictionaries rather than dictionary-like Constructs objects.
To facilitate this whilst retaining the intuitive nature of the API, the least intrusive change is to make what were attributes into methods, i.e. f.dimension_coordinates becomes f.dimension_coordinates(). This then allows keyword parameters that can change the behaviour when speed is an issue.
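As a rough illustration of why a method helps, the sketch below uses a toy `Field` whose `dimension_coordinates` method accepts a keyword that switches the return type to a plain dictionary. The keyword name `todict` and both classes are assumptions made for this example; the text above does not specify which keywords will be added.

```python
class Constructs:
    """Toy dictionary-like container standing in for the real class."""

    def __init__(self, data):
        # Building the wrapper copies the mapping, which is the kind of
        # overhead a fast path would want to avoid
        self._data = dict(data)

    def __len__(self):
        return len(self._data)

    def keys(self):
        return self._data.keys()


class Field:
    """Toy field storing its dimension coordinates in a plain dict."""

    def __init__(self, dimension_coordinates):
        self._dimension_coordinates = dict(dimension_coordinates)

    def dimension_coordinates(self, *identities, todict=False):
        """Return dimension coordinates, optionally as a plain dict.

        `todict` is a hypothetical keyword used only in this sketch.
        """
        selected = {
            key: value
            for key, value in self._dimension_coordinates.items()
            if not identities or key in identities
        }
        if todict:
            # Fast path: hand back the dictionary without wrapping it
            return selected
        return Constructs(selected)


f = Field({"time": "T", "latitude": "Y", "longitude": "X"})

wrapped = f.dimension_coordinates()                 # container, as before
fast = f.dimension_coordinates("time", todict=True)  # plain dict
```

A bare attribute has no call signature, so there is nowhere to pass such a keyword; turning the alias into a method is what makes the opt-in fast path possible while leaving the default behaviour unchanged.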
Code breaking
This change will break any code that uses bare construct access attributes:
>>> # These won't work any more
>>> f.auxiliary_coordinates
>>> f.coordinate_references
>>> f.coordinates
>>> f.cell_measures
>>> f.dimension_coordinates
>>> f.domain_ancillaries
>>> f.domain_axes
>>> f.cell_methods
>>> f.field_ancillaries

>>> # These will work as before
>>> f.auxiliary_coordinates()
>>> f.coordinate_references()
>>> f.coordinates()
>>> f.cell_measures()
>>> f.dimension_coordinates()
>>> f.domain_ancillaries()
>>> f.domain_axes()
>>> f.cell_methods()
>>> f.field_ancillaries()

>>> # These will also work as before
>>> f.auxiliary_coordinates(x)
>>> f.coordinate_references(x)
>>> f.coordinates(x)
>>> f.cell_measures(x)
>>> f.dimension_coordinates(x)
>>> f.domain_ancillaries(x)
>>> f.domain_axes(x)
>>> f.cell_methods(x)
>>> f.field_ancillaries(x)
This is the only backwards incompatible change to the API. All other changes will not break existing code.
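To help locate code affected by this change, something like the following sketch could be used. It is a simple text scan, not part of cf-python, and the helper name `find_bare_access` is made up for this example. It flags any of the nine alias names that is accessed as an attribute without an immediately following call, which also catches chained forms such as f.dimension_coordinates.filter_by_property(...).

```python
import re

# The construct access aliases listed above
ALIASES = (
    "auxiliary_coordinates",
    "coordinate_references",
    "coordinates",
    "cell_measures",
    "dimension_coordinates",
    "domain_ancillaries",
    "domain_axes",
    "cell_methods",
    "field_ancillaries",
)

# ".alias" that is not followed by an opening parenthesis
BARE_ACCESS = re.compile(r"\.(" + "|".join(ALIASES) + r")\b(?!\s*\()")


def find_bare_access(source):
    """Return (line number, alias) pairs for each bare alias access."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in BARE_ACCESS.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = """\
x = f.dimension_coordinates
y = f.dimension_coordinates('time')
z = f.coordinates.filter_by_property(units='degrees_east')
w = f.domain_axes()
"""

hits = find_bare_access(sample)
print(hits)
```

Because this is a plain regular-expression scan, it will also report matches inside comments and strings, and on unrelated objects that happen to have attributes with the same names, so the results need checking by hand.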
With the new API, reading a file is, in one reproducible test, ~3 times faster:
>>> import cf, timeit
>>> f = cf.example_field(1)
>>> cf.write(f, 'tmp.nc')
>>> cf.__version__
3.8.0
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.22618969043996912

>>> import cf, timeit
>>> cf.__version__
3.9.0b1
>>> sum(timeit.repeat("cf.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.07341453492001165
Note that much of the improvement comes from unrelated changes (such as removing unnecessary __repr__ calls and unnecessary deep copies).
See NCAS-CMS/cfdm#130 for further details. In particular, the cfdm version of the above timing test gives a ~10 times speed-up; the reasons for the smaller improvement in cf-python will need investigating.
Edit: The smaller speed-up is understandable and is due to extra checks being carried out in, e.g., set_construct. This is not to say that things can't be sped up more, but that is for another issue.
PR to follow ...