Load scheduler state when resuming training by n-poulsen · Pull Request #2788 · DeepLabCut/DeepLabCut

This pull request addresses how learning rate schedulers are handled when resuming training. Currently, the schedulers are simply rebuilt from the config file, so when continuing training from an existing snapshot, the scheduler restarts at epoch 0 and the learning rate is not adapted (as mentioned in #2784).
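A minimal reproduction of the issue in plain PyTorch (illustrative only; the optimizer and scheduler choices here are arbitrary, not DeepLabCut's defaults):

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# Simulate 7 epochs of training before stopping: the lr has decayed (~0.01).
for _ in range(7):
    optimizer.step()
    scheduler.step()
decayed_lr = scheduler.get_last_lr()[0]

# "Resuming": a fresh scheduler built from the config alone knows nothing
# about the 7 completed epochs, so the schedule restarts at epoch 0 and
# the learning rate jumps back to the initial 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
resumed_lr = scheduler.get_last_lr()[0]
```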

In this pull request, the code is updated to:

  • save the scheduler state dicts in snapshots
  • when resuming training, try to load the state dict for the scheduler
    • if successful, set the optimizer's learning rate to the last learning rate from the scheduler
  • a load_scheduler_state_dict key is added to the runner configuration
    • the default is True: loading a snapshot with a saved scheduler to continue training will load the scheduler state dict
    • however, users might edit the scheduler's parameters and want an updated learning rate to continue training
    • in that case, they need to set load_scheduler_state_dict: false so the state dict doesn't overwrite their edited parameters
    • the self.starting_epoch value is used to set the last_epoch in the scheduler, so the way the learning rate is scheduled matches the epochs printed in the logs

Tests were added to validate that the expected learning rates are applied.
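Such a test could look roughly like this (an illustrative sketch of the property being checked, not the PR's actual test code):

```python
import torch


def test_resumed_scheduler_continues_schedule():
    # Train 4 epochs with StepLR(step_size=3), so one decay has happened.
    model = torch.nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    for _ in range(4):
        optimizer.step()
        scheduler.step()
    state = scheduler.state_dict()

    # Resume: fresh optimizer/scheduler from the "config", then load state.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    scheduler.load_state_dict(state)
    for group, lr in zip(optimizer.param_groups, scheduler.get_last_lr()):
        group["lr"] = lr

    # The resumed lr matches the decayed value, not the initial config lr.
    assert abs(optimizer.param_groups[0]["lr"] - 0.1 * 0.1) < 1e-12
    assert scheduler.last_epoch == 4
```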