evaluation refactor by n-poulsen · Pull Request #2679 · DeepLabCut/DeepLabCut

Conversation

@n-poulsen

Evaluation refactor

Improvements to the evaluation code, as well as new tests to ensure that mAP scores match the pycocotools implementation.

Change list:

  • Moved all metric computation code to a deeplabcut/core/metrics folder (as metrics are computed with numpy)
  • Cleaned up the metric computation code so that prediction/ground-truth matching always happens
    • Refactored so that no OOM errors should occur, even on very large datasets (>60k images)
  • Multi-animal RMSE: only compute RMSE using (ground-truth, detection) matches with non-zero RMSE
  • Added compute_detection_rmse to compute "detection" RMSE, matching the DeepLabCut 2.X implementation
  • Fixed the bug for PAF models documented in #2631 (Evaluation error with PAF heads: ValueError: matrix contains invalid numeric entries)
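To illustrate the matched-pairs RMSE idea from the change list: the sketch below is a hypothetical, simplified version, not the DeepLabCut implementation. It uses plain greedy nearest-neighbor matching (the actual code matches predictions to ground truth with its own assignment logic) and a hypothetical `max_dist` cutoff, then computes RMSE over matched pairs only, so unmatched ground-truth keypoints and spurious detections do not contribute to the score.

```python
import math


def match_and_rmse(ground_truth, predictions, max_dist=10.0):
    """RMSE over greedily matched (ground-truth, detection) pairs.

    ground_truth, predictions: lists of (x, y) tuples.
    max_dist: hypothetical cutoff; pairs farther apart stay unmatched.
    Returns NaN when nothing could be matched.
    """
    unmatched = list(range(len(predictions)))
    sq_errors = []
    for gx, gy in ground_truth:
        if not unmatched:
            break
        # Pick the closest remaining prediction for this keypoint.
        best = min(
            unmatched,
            key=lambda j: (predictions[j][0] - gx) ** 2 + (predictions[j][1] - gy) ** 2,
        )
        d2 = (predictions[best][0] - gx) ** 2 + (predictions[best][1] - gy) ** 2
        if math.sqrt(d2) <= max_dist:
            sq_errors.append(d2)
            unmatched.remove(best)  # each detection is used at most once
    if not sq_errors:
        return float("nan")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# The far-away detection at (100, 100) is left unmatched and ignored.
rmse = match_and_rmse([(0, 0), (10, 10)], [(0, 3), (10, 10), (100, 100)])
```

Computing the error only over matched pairs is what keeps the metric meaningful for multi-animal scenes: otherwise a single missed animal or extra detection would dominate the average.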

MMathisLab


lgtm, although I did not test

MMathisLab


but see suggested changes to docstrings

MMathisLab


maybe in main docs we need to add information about these new metrics as well @n-poulsen

2 participants

@n-poulsen @MMathisLab