evaluation refactor by n-poulsen · Pull Request #2679 · DeepLabCut/DeepLabCut

Conversation

@n-poulsen

Evaluation refactor

Improvements to the evaluation code, as well as new tests to ensure that mAP scores match the pycocotools implementation.

Change list:

  • Moved all metric computation code to a deeplabcut/core/metrics folder (as metrics are computed with numpy)
  • Cleaned up the metric computation code so that prediction/ground-truth matching always happens
    • Refactored so that no OOM errors should occur, even on very large datasets (>60k images)
  • Multi-animal RMSE: only compute RMSE using (ground-truth, detection) matches with non-zero RMSE
  • Added compute_detection_rmse to compute "detection" RMSE, matching the DeepLabCut 2.X implementation
  • Fixed the bug for PAF models documented in #2631 (Evaluation error with PAF heads: ValueError: matrix contains invalid numeric entries)
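To illustrate the matched-pairs RMSE idea from the change list: the sketch below is a hypothetical, simplified version, not the DeepLabCut implementation. It uses plain greedy nearest-neighbor matching (the actual code matches predictions to ground truth with its own assignment logic) and a hypothetical `max_dist` cutoff, then computes RMSE over matched pairs only, so unmatched ground-truth keypoints and spurious detections do not contribute to the score.

```python
import math


def match_and_rmse(ground_truth, predictions, max_dist=10.0):
    """RMSE over greedily matched (ground-truth, detection) pairs.

    ground_truth, predictions: lists of (x, y) tuples.
    max_dist: hypothetical cutoff; pairs farther apart stay unmatched.
    Returns NaN when nothing could be matched.
    """
    unmatched = list(range(len(predictions)))
    sq_errors = []
    for gx, gy in ground_truth:
        if not unmatched:
            break
        # Pick the closest remaining prediction for this keypoint.
        best = min(
            unmatched,
            key=lambda j: (predictions[j][0] - gx) ** 2 + (predictions[j][1] - gy) ** 2,
        )
        d2 = (predictions[best][0] - gx) ** 2 + (predictions[best][1] - gy) ** 2
        if math.sqrt(d2) <= max_dist:
            sq_errors.append(d2)
            unmatched.remove(best)  # each detection is used at most once
    if not sq_errors:
        return float("nan")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# The far-away detection at (100, 100) is left unmatched and ignored.
rmse = match_and_rmse([(0, 0), (10, 10)], [(0, 3), (10, 10), (100, 100)])
```

Computing the error only over matched pairs is what keeps the metric meaningful for multi-animal scenes: otherwise a single missed animal or extra detection would dominate the average.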

MMathisLab


lgtm, although I did not test

MMathisLab


but see suggested changes to docstrings

MMathisLab


maybe in main docs we need to add information about these new metrics as well @n-poulsen

2 participants

@n-poulsen @MMathisLab