[WIP] Migration to structured configurations

Migrating to typed and validated configurations

This issue serves as a placeholder to summarize the work in progress for migrating from plain dictionaries to typed and validated configuration classes. The idea was originally formulated in #3172, but to facilitate easier review and a smooth migration, the work will be split into intermediate steps with separate PRs. The old configuration system will stay in place during the intermediate steps, and a versioning system will be put in place to ensure long-term backwards compatibility across config versions.

Procedure

The migration will consist of small, subsequent PRs that can be reviewed separately before being merged into a feature branch.

Work in progress / roadmap

(ticking after review and merging into the feature branch)

  • Centralize project configuration logic
  • Improve testing to capture existing behavior
  • New config_mixin for easily moving between types (yaml file <> dict <> dataclass <> DictConfig)
  • Add typed configs for project config, pytorch config, etc. with pydantic validation
  • Add typed 3D Project Configs (and tensorflow ?)
  • Replace dictionary configs with identical DictConfig versions (smooth transition)
  • Add versioning system for migration between old and new configs
  • Improve configurations -> reduce duplicated fields, correct casing
  • Address in-place configuration edits throughout the pipeline
  • Add aliasing system for accessing new fields using old fieldnames (e.g. corer2move2 -> corner2move2)
  • Replace loaders in core config (e.g. deprecate read_config in favor of typed ProjectConfig)
  • Address None-type / missing config issues in downstream code
  • Add tracking system for storing / logging changes in configuration (clean/dirty status w.r.t. disk)
  • Refactor patterns getting items from None-type subconfigurations (configurations that are allowed to be missing)
  • Add flags for affordance in downstream use cases
  • Fix circular imports in core/config
  • Store parallel config versions (i.e. copy instead of overwrite)
  • Verify test coverage and improve if necessary
  • Improve loaders in e.g. deeplabcut/pose_estimation_pytorch/data/base.py (mismatching init, mixed project/pose logic)
  • Remove intermediate OmegaConf DictConfig representation and migrate to fully typed
  • Add LazyConfig?

Related PRs

  1. [dev] C1 - centralize project config I/O and add testing #3190
  2. [dev] C2 - Add typed configs as pydantic dataclasses & omegaconf dictconfigs #3191
  3. [dev] C3 - Replace configuration dictionaries with DictConfigs. #3194
  4. [dev] C3-II Additional refactoring of configurations  #3212
  5. [dev] C4 - add config migration system #3197
  6. [dev] C5 - Fully typed configs (remove intermediate OmegaConf DictConfig) #3209
  7. [dev] C6 - Add aliasing system for accessing deprecated fields in typed configurations #3211
    ...
    [WIP] Final migration to configuration version 1: structured and validated configs #3198

Motivation

As formulated by @arashsm79 in #3172

Summary

  • Introduce new configuration classes for inference, logging, model, pose, project, runner, and training settings.
  • Refactor data loading mechanisms to use the new configuration structures.
  • Move the multithreading and compilation options of the inference configuration to the config module.
  • Add a typed configuration for logging.
  • Update dataset loaders to accept model configurations directly or via file paths.

Why Typed & Structured Configuration (OmegaConf + Pydantic)

  • Strong guarantees for correctness

    • Runtime type safety ensures invalid configs fail fast with clear errors instead of silently producing incorrect training runs.
    • Schema-validated configs dramatically reduce debugging time for users and maintainers.
  • Static typing improves developer velocity

    • IDE autocomplete and inline documentation make configs discoverable and self-documenting.
    • Refactors become safer: config changes are more likely to be caught at development time.
  • Hierarchical, composable configuration

    • Natural representation of DeepLabCut’s nested project/model/training settings.
    • Easy composition and merging from multiple sources (base config, model presets, experiment overrides).
  • Cleaner overrides and defaults.

  • Structured configs make it easier to define parameter ranges for tuning and automation.

  • Config schemas can be versioned and evolve safely over time while preserving backward compatibility.

  • Full, validated configuration can be saved alongside results, which improves reproducibility and transparency.

  • Builds on well-maintained, widely adopted libraries (OmegaConf, Pydantic).
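As a concrete fail-fast example (a minimal sketch; TrainingConfig and its fields are hypothetical), pydantic rejects a mistyped value at construction time instead of letting it surface mid-run:

```python
import pydantic

# Hypothetical training config; the real fields differ.
@pydantic.dataclasses.dataclass
class TrainingConfig:
    batch_size: int = 8
    save_epochs: int = 50

ok = TrainingConfig(batch_size=16)        # valid: constructed and usable

try:
    TrainingConfig(batch_size="sixteen")  # wrong type: fails immediately
except pydantic.ValidationError as err:
    error = err                           # clear, field-level error message
```

With plain dictionaries the bad value would only be discovered wherever batch_size is first used, possibly hours into a training run.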

Resources for learning more about structured configs:

Future Work

  • Currently, default model definitions are still stored as YAML files in the package. Moving to LazyConfig, as in Detectron2, would improve this significantly.

More things that could be done ( @deruyter92 ):

  • I think we need to make sure that every time a model is used, all changes to the project's config.yaml are also reflected in the model's configuration under metadata.
  • There might be a better way to handle things in deeplabcut/pose_estimation_pytorch/data/base.py.