fix: Add additional dependencies for ModelTrainer (5668) by aviruthen · Pull Request #5731 · aws/sagemaker-python-sdk
🤖 Iteration #1 — Review Comments Addressed
Add dependencies support for ModelTrainer's SourceCode config
Description
When migrating from SageMaker v2 to v3, the V2 Estimator's dependencies parameter allows users to specify a list of local directory paths containing additional libraries that are uploaded to S3, mounted in the training container, and added to PYTHONPATH. The V3 ModelTrainer's SourceCode config had no equivalent field.
This PR adds dependencies support to SourceCode with the following changes:
Changes Made
sagemaker-core/src/sagemaker/core/training/configs.py
- Added
dependenciesfield toSourceCodeconfig class (list of local directory paths) - Fixed PEP 8 blank line formatting
sagemaker-train/src/sagemaker/train/constants.py
- Added
SM_DEPENDENCIESandSM_DEPENDENCIES_CONTAINER_PATHconstants for the dependencies channel - Fixed missing newline at end of file
sagemaker-train/src/sagemaker/train/templates.py
- Added
INSTALL_DEPENDENCIEStemplate that:- Adds dependency directories to
PYTHONPATH - Installs
.whland.tar.gzfiles via pip - Handles other files by adding the parent directory to
PYTHONPATH
- Adds dependency directories to
sagemaker-train/src/sagemaker/train/model_trainer.py
- Added
_temp_deps_dirprivate attribute for deterministic cleanup of the dependencies temp directory - In
_create_training_job_args(), creates input data channels for dependencies by copying them to a temp dir preserving basenames - Includes the
INSTALL_DEPENDENCIESstep in the generated train script when dependencies are present - Validates that dependency paths exist and are directories during
_validate_source_code() - Cleans up
_temp_deps_dirin destructor and after training completes
sagemaker-train/tests/unit/train/test_model_trainer_dependencies.py
- Updated to use
from __future__ import annotationsper V3 conventions - Changed fixture scope from
moduletofunctionfor proper test isolation - Added test for rejecting file dependencies (only directories allowed)
- Fixed weak tests to actually verify train script content and temp directory structure
Design Decisions
- Directories only: Dependencies are restricted to directories (matching V2 Estimator behavior), not individual files
- PYTHONPATH approach: Dependencies are added to
PYTHONPATHvia the bash train script, which is sufficient since Python processes inherit environment variables - Dependencies before requirements: Dependencies are set up before
requirements.txtinstallation, matching the logical ordering where dependencies should be available first
Comments reviewed: 34
Files modified: sagemaker-core/src/sagemaker/core/training/configs.py, sagemaker-train/src/sagemaker/train/constants.py, sagemaker-train/src/sagemaker/train/model_trainer.py, sagemaker-train/src/sagemaker/train/templates.py, sagemaker-train/tests/unit/train/test_model_trainer_dependencies.py
sagemaker-core/src/sagemaker/core/training/configs.py: Add blank line before OutputDataConfig class definition per PEP 8sagemaker-train/src/sagemaker/train/constants.py: Ensure file ends with newline after SM_RECIPE_CONTAINER_PATHsagemaker-train/src/sagemaker/train/model_trainer.py: Fix resource leak for deps temp dir, restrict dependencies to directories only, ensure shutil import exists, and add _temp_deps_dir as instance attribute with cleanupsagemaker-train/src/sagemaker/train/templates.py: Improve INSTALL_DEPENDENCIES template to handle both directories and .whl/.tar.gz files via pip installsagemaker-train/tests/unit/train/test_model_trainer_dependencies.py: Fix weak tests, change fixture scope, use proper future import, and add meaningful assertions