Feature/subset_years by NicolasColombi · Pull Request #1023 · CLIMADA-project/climada_python

@NicolasColombi

Changes proposed in this PR:

This PR adds a function that allows to subset tracks by year after having them loaded into a TCTtracks object, as not all methods to load tracks allow filtering by year (e.g. from_simulations_emanuel())

-climada.hazard.tc_tracks.TCTracks.subset_year()
-climada.hazard.test.ttest_c_tracks.TestFuncs.test_subset_year()

PR Author Checklist

PR Reviewer Checklist

@NicolasColombi

@NicolasColombi

sarah-hlsn

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function and tests look great to me, made a minor suggestion for the docstrings. It might be worth it to add functionalities to filter by date, which could be useful for some users, though admittedly not to the same degree as filtering by year so we may want to avoid spending more time on this.

Otherwise, ready for merge from my side!

year = date_array.astype("datetime64[Y]").item().year
except AttributeError:
raise ValueError(
f"Invalid date format in track {i}, could not extract year."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider to use id as a fallback since the id strings provided by IBTrACS start with the year. Then again this would only work for IBTrACS (or at least I don't know for other datasets) so this may overcomplicate things.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not, for the reason you just mentioned, that it would only work for IBtracks. But, I will check the most common tracks source and see if the method works for them. So far, Emmanuel, chaz, FAST works. The problem is that the time vector has different formats between different sources... So the way I used to access "year" may not work for all of them; I will double-check.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!


return out

def subset_year(self, start_year: int = None, end_year: int = None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a useful new function to have! I am wondering whether it could be worthwile to extend the functionality for the user to also be able to select based on months and days. It would make this function more complex but give much more flexibility for a variety of applications...up to you @NicolasColombi

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sarah-hlsn! Indeed it would be a good idea, and in principle, it should not be much more work to implement. I will carve out some time to do it.

@NicolasColombi @sarah-hlsn

Co-authored-by: Sarah Hülsen <49907095+sarah-hlsn@users.noreply.github.com>

@NicolasColombi

Function and tests look great to me, made a minor suggestion for the docstrings. It might be worth it to add functionalities to filter by date, which could be useful for some users, though admittedly not to the same degree as filtering by year so we may want to avoid spending more time on this.

Otherwise, ready for merge from my side!

Thank you Sarah. By date do you mean being able to select in this format: e.g. from 2025-02-15 to 2027-08-03 ?
picking up your comment below, also allowing to only filter by month or day? e.g., all tracks that originated in February? I think it would be useful.

@sarah-hlsn

Yes, I was thinking both could be useful! (YY-mm-dd format, as well as filtering across years by month)

@NicolasColombi

@sarah-hlsn I updated the function as discussed: you can now select by date, filter by year or month or day or a combination of those. The tests are modified accordingly. Let me know if you can have a quick look 🙃

@sarah-hlsn

Thanks for implementing these changes, looks good to me!