Load scheduler state when resuming training by n-poulsen · Pull Request #2788 · DeepLabCut/DeepLabCut
This pull request addresses how learning rate schedulers are handled when resuming training. Currently, schedulers are simply rebuilt from the config file, so when continuing to train from an existing snapshot, the scheduler restarts at epoch 0 and the learning rate isn't adapted (as mentioned in #2784).
In this pull request, the code is updated to:
- save the scheduler state dicts in snapshots (see the save sketch after this list)
- when resuming training, try to load the state dict for the scheduler
- if successful, set the optimizer's learning rate to the last learning rate from the scheduler
- a `load_scheduler_state_dict` key is added to the runner configuration (see the resume sketch after this list)
  - the default is `True`: loading a snapshot with a saved scheduler to continue training will load the scheduler state dict
  - however, users might edit the scheduler's parameters and want an updated learning rate to continue training; in that case, they need to set `load_scheduler_state_dict: false` so the state dict doesn't overwrite their edited parameters
- the `self.starting_epoch` value is used to set the `last_epoch` in the scheduler, so the way the learning rate is scheduled matches the epochs printed in the logs (see the last sketch after this list)
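For illustration, here is a minimal sketch of the save side, assuming a checkpoint-dict layout; the `save_snapshot` helper and the snapshot keys are hypothetical, not the exact DeepLabCut code:

```python
import torch


def save_snapshot(path, epoch, model, optimizer, scheduler=None):
    """Hypothetical helper: persist the scheduler state alongside the model."""
    snapshot = {
        "epoch": epoch,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    if scheduler is not None:
        # Saving the scheduler state dict is the core change in this PR
        snapshot["scheduler"] = scheduler.state_dict()
    torch.save(snapshot, path)
```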
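And a sketch of the resume side, gated by the new `load_scheduler_state_dict` flag; again, the helper name and snapshot keys are assumptions for illustration:

```python
import torch


def resume_from_snapshot(path, optimizer, scheduler, load_scheduler_state_dict=True):
    """Hypothetical resume logic following the PR description."""
    snapshot = torch.load(path, map_location="cpu")
    if load_scheduler_state_dict and "scheduler" in snapshot:
        # Restore the scheduler, then sync the optimizer's learning rate
        # to the last learning rate the scheduler produced (one per group).
        scheduler.load_state_dict(snapshot["scheduler"])
        for group, lr in zip(optimizer.param_groups, scheduler.get_last_lr()):
            group["lr"] = lr
    # With load_scheduler_state_dict: false, the scheduler keeps the
    # (possibly user-edited) parameters it was built with from the config.
    return snapshot["epoch"]
```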
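Finally, a sketch of how `last_epoch` can align the schedule with the logged epoch counter. A plain `LambdaLR` stands in for whatever scheduler the config defines; note that PyTorch requires an `initial_lr` entry in each param group when a scheduler is constructed with `last_epoch != -1`:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
starting_epoch = 5  # hypothetical value recovered from the snapshot

# PyTorch expects `initial_lr` in each param group when last_epoch != -1
for group in optimizer.param_groups:
    group.setdefault("initial_lr", group["lr"])

# Rebuild the scheduler so its internal epoch counter matches the logs:
# after construction, last_epoch == starting_epoch, and the learning rate
# corresponds to `starting_epoch` decay steps (0.1 * 0.9**5 here).
scheduler = LambdaLR(
    optimizer, lr_lambda=lambda epoch: 0.9**epoch, last_epoch=starting_epoch - 1
)
print(scheduler.get_last_lr())  # [~0.059]
```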
Tests were added to validate that the expected learning rates are applied.