TCTracks: improve hdf5 I/O by tovogt · Pull Request #735 · CLIMADA-project/climada_python
Changes proposed in this PR:
- This changes how TCTracks.from_hdf5 and TCTracks.write_hdf5 work internally. While it does not change the API, it changes the file structure and, as implemented here, is not backwards compatible: files that have been stored with the old implementation cannot be read with the new implementation.
- As mentioned in the PR that originally introduced this feature (HDF5 file IO for TCTracks #349), the earlier file format was easy to understand, but it required a lot of disk space and was very different from other file formats that are commonly used to store TC tracks (like IBTrACS or the format used by CHAZ or by Kerry Emanuel). I also found in the meantime that the I/O is really, really slow for large numbers of tracks.
- I originally implemented this feature to have a compact file format for storing large numbers of TC tracks, but I found that it's simply too slow for real applications. I also thought that this might be a format that we will use in ISIMIP3a to provide TC tracks, but since it's so slow and has a quite unusual structure, we decided to use a different format instead. The format proposed in this PR is very close to what we will use in ISIMIP3a. We will just rename most of the variables, basically.
I think that nobody currently uses the HDF5 I/O feature of TCTracks objects, and I think that it's safe to drop backwards compatibility, but I might be wrong. I believe I also talked to @chahan about this a few months ago, and he was quite optimistic that nobody actually uses this feature and that we can easily change the file format at some point.
@ThomasRoosli You modified the TCForecast class to be compatible with this feature back in the day (CLIMADA-project/climada_petals#33). The proposed new format won't require any changes to the wrapper you wrote for TCForecast. However, did you end up using this HDF5 I/O feature in practice?
@ChrisFairless @bguillod and everyone who uses CLIMADA's TC feature: Do you know anyone who actually uses the HDF5 I/O features of the TCTracks class?
If you insist that backwards compatibility is important, I would propose the following: we let write_hdf5 always write the new format, which should be safe. In from_hdf5, we implement a check that determines whether the user is trying to read from the legacy file format and falls back to the old implementation if necessary. I would prefer to drop backwards compatibility because supporting it takes quite a lot of lines of code, and I'm quite convinced that nobody uses the old format. But I might be wrong...
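To make the fallback idea concrete, here is a rough sketch of what such a dispatch in from_hdf5 could look like. It is not the code in this PR: the function name `read_with_legacy_fallback` and the three callables it takes (`is_legacy`, `read_new`, `read_legacy`) are hypothetical placeholders, and how a legacy file would actually be recognized is deliberately left open.

```python
from typing import Any, Callable


def read_with_legacy_fallback(
    path: str,
    is_legacy: Callable[[str], bool],
    read_new: Callable[[str], Any],
    read_legacy: Callable[[str], Any],
) -> Any:
    """Read a track file, falling back to the old reader for legacy files.

    All three callables are hypothetical stand-ins:
    - is_legacy: inspects the file and returns True for the old layout
    - read_new: reader for the format written by the new write_hdf5
    - read_legacy: the old from_hdf5 implementation, kept only as fallback
    """
    if is_legacy(path):
        # Old layout detected: use the previous implementation.
        return read_legacy(path)
    # Default path: everything written by the new write_hdf5 ends up here.
    return read_new(path)
```

Writing would stay unconditional (always the new format), so the only extra code to maintain would be the detection check and the old reader.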
PR Author Checklist
- Read the Contribution Guide
- Correct target branch selected (if unsure, select develop)
- Descriptive pull request title added
- Source branch up-to-date with target branch
- Documentation updated
- Tests updated
- Tests passing
- No new linter issues
- Changelog updated
PR Reviewer Checklist
- Read the Contribution Guide
- CLIMADA Reviewer Checklist passed
- Tests passing
- No new linter issues