[WIP] Migration to structured configurations

Migrating to typed and validated configurations

This issue serves as a placeholder to summarize the work in progress for migrating from plain dictionaries to typed and validated configuration classes. The idea was originally formulated in #3172, but to facilitate easier review and a smooth migration, the work will be split into intermediate steps with separate PRs. The old configuration system will stay in place during the intermediate steps, and a versioning system will be put in place to ensure long-term backwards compatibility across config versions.

Procedure

The migration will consist of small, subsequent PRs that can be reviewed separately before being merged into a feature branch.

Work in progress / roadmap

(ticking after review and merging into the feature branch)

  • Centralize project configuration logic
  • Improve testing to capture existing behavior
  • New config_mixin for easily moving between types (yaml file <> dict <> dataclass <> DictConfig)
  • Add typed configs for project config, pytorch config, etc. with pydantic validation
  • Add typed 3D Project Configs (and tensorflow ?)
  • Replace dictionary configs with identical DictConfig versions (smooth transition)
  • Add versioning system for migration between old and new configs
  • Improve configurations -> reduce duplicated fields, correct casing
  • Address in-place configuration edits throughout the pipeline
  • Add aliasing system for accessing new fields using old fieldnames (e.g. corer2move2 -> corner2move2)
  • Replace loaders in core config (e.g. deprecate read_config in favor of typed ProjectConfig)
  • Address None-type / missing config issues in downstream code
  • Add tracking system for storing / logging changes in configuration (clean/dirty status w.r.t. disk)
  • Refactor patterns getting items from None-type subconfigurations (configurations that are allowed to be missing)
  • Add flags for affordance in downstream use cases
  • Fix circular imports in core/config
  • Store parallel config versions (i.e. copy instead of overwrite)
  • Verify test coverage and improve if necessary
  • Improve loaders in e.g. deeplabcut/pose_estimation_pytorch/data/base.py (mismatching init, mixed project/pose logic)
  • Remove intermediate OmegaConf DictConfig representation and migrate to fully typed
  • Add LazyConfig?

Related PRs

  1. [dev] C1 - centralize project config I/O and add testing #3190
  2. [dev] C2 - Add typed configs as pydantic dataclasses & omegaconf dictconfigs #3191
  3. [dev] C3 - Replace configuration dictionaries with DictConfigs. #3194
  4. [dev] C3-II Additional refactoring of configurations  #3212
  5. [dev] C4 - add config migration system #3197
  6. [dev] C5 - Fully typed configs (remove intermediate OmegaConf DictConfig) #3209
  7. [dev] C6 - Add aliasing system for accessing deprecated fields in typed configurations #3211
    ...
    [WIP] Final migration to configuration version 1: structured and validated configs #3198

Motivation

As formulated by @arashsm79 in #3172

Summary

  • Introduce new configuration classes for inference, logging, model, pose, project, runner, and training settings.
  • Refactor data loading mechanisms to use the new configuration structures.
  • Move the multithreading and compilation options of the inference configuration to the config module.
  • Add a typed configuration for logging.
  • Update dataset loaders to accept model configurations directly or via file paths.

Why Typed & Structured Configuration (OmegaConf + Pydantic)

  • Strong guarantees for correctness

    • Runtime type safety ensures invalid configs fail fast with clear errors instead of silently producing incorrect training runs.
    • Schema-validated configs dramatically reduce debugging time for users and maintainers.
  • Static typing improves developer velocity

    • IDE autocomplete and inline documentation make configs discoverable and self-documenting.
    • Refactors become safer: config changes are more likely to be caught at development time.
  • Hierarchical, composable configuration

    • Natural representation of DeepLabCut’s nested project/model/training settings.
    • Easy composition and merging from multiple sources (base config, model presets, experiment overrides).
  • Cleaner overrides and defaults.

  • Structured configs make it easier to define parameter ranges for tuning and automation.

  • Config schemas can be versioned and evolve safely over time while preserving backward compatibility.

  • Full, validated configuration can be saved alongside results, which improves reproducibility and transparency.

  • Builds on well-maintained, widely adopted libraries (OmegaConf, Pydantic).
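As a concrete fail-fast example (a minimal sketch; TrainingConfig and its fields are hypothetical), pydantic rejects a mistyped value at construction time instead of letting it surface mid-run:

```python
import pydantic

# Hypothetical training config; the real fields differ.
@pydantic.dataclasses.dataclass
class TrainingConfig:
    batch_size: int = 8
    save_epochs: int = 50

ok = TrainingConfig(batch_size=16)        # valid: constructed and usable

try:
    TrainingConfig(batch_size="sixteen")  # wrong type: fails immediately
except pydantic.ValidationError as err:
    error = err                           # clear, field-level error message
```

With plain dictionaries the bad value would only be discovered wherever batch_size is first used, possibly hours into a training run.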

Resources for learning more about structured configs:

Future Work

  • Currently, default model definitions are still stored as YAML files in the package. Moving to LazyConfig, as in Detectron2, would improve this significantly.

More things that could be done ( @deruyter92 ):

  • I think we need to make sure that every time a model is used, all changes to the project's config.yaml are also reflected in the model's configuration under metadata.
  • There might be a better way to handle things in deeplabcut/pose_estimation_pytorch/data/base.py.