evaluation refactor by n-poulsen · Pull Request #2679 · DeepLabCut/DeepLabCut
Conversation
Evaluation refactor
Improvements to the evaluation code, as well as new tests to ensure that mAP scores match the pycocotools implementation.
Change list:
- Moved all metric computation code to a `deeplabcut/core/metrics` folder (as metrics are computed with `numpy`)
- Cleaned metric computation code so the prediction/ground truth matching always happens
- Refactored in a way such that no OOM errors should occur, even on very large datasets (>60k images)
- Multi-animal RMSE: only compute RMSE using (ground-truth, detection) matches with non-zero RMSE
- Added `compute_detection_rmse` to compute "detection" RMSE, matching the DeepLabCut 2.X implementation
- Fixed the bug for PAF models documented in Evaluation error with PAF heads: `ValueError: matrix contains invalid numeric entries` #2631
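The RMSE items above concern averaging error over matched (ground-truth, detection) keypoint pairs. A minimal `numpy` sketch of that kind of computation, assuming pairs are already matched and missing ground-truth keypoints are NaN (the function name and masking rule are illustrative, not DeepLabCut's actual implementation):

```python
import numpy as np

def keypoint_rmse(ground_truth: np.ndarray, predictions: np.ndarray) -> float:
    """Mean Euclidean error over matched keypoint pairs.

    Both arrays have shape (num_matches, num_keypoints, 2) holding (x, y)
    coordinates. Keypoints absent from the ground truth are NaN and are
    excluded from the average (illustrative convention, not DLC's exact one).
    """
    # Per-keypoint Euclidean distance; NaNs propagate from missing labels
    distances = np.linalg.norm(ground_truth - predictions, axis=-1)
    # Average over visible keypoints only
    return float(np.nanmean(distances))
```

For example, a single match with one perfect keypoint and one off by a (3, 4) offset yields a mean error of 2.5 pixels.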
lgtm, although I did not test
but see suggested changes to docstrings
maybe we also need to add information about these new metrics to the main docs @n-poulsen