fix: Add additional dependencies for ModelTrainer (5668) by aviruthen · Pull Request #5731 · aws/sagemaker-python-sdk

🤖 Iteration #1 — Review Comments Addressed

Add dependencies support for ModelTrainer's SourceCode config

Description

When migrating from SageMaker v2 to v3, the V2 Estimator's dependencies parameter allows users to specify a list of local directory paths containing additional libraries that are uploaded to S3, mounted in the training container, and added to PYTHONPATH. The V3 ModelTrainer's SourceCode config had no equivalent field.

This PR adds dependencies support to SourceCode with the following changes:

Changes Made

sagemaker-core/src/sagemaker/core/training/configs.py

  • Added dependencies field to SourceCode config class (list of local directory paths)
  • Fixed PEP 8 blank line formatting

sagemaker-train/src/sagemaker/train/constants.py

  • Added SM_DEPENDENCIES and SM_DEPENDENCIES_CONTAINER_PATH constants for the dependencies channel
  • Fixed missing newline at end of file

sagemaker-train/src/sagemaker/train/templates.py

  • Added INSTALL_DEPENDENCIES template that:
    • Adds dependency directories to PYTHONPATH
    • Installs .whl and .tar.gz files via pip
    • Handles other files by adding the parent directory to PYTHONPATH

sagemaker-train/src/sagemaker/train/model_trainer.py

  • Added _temp_deps_dir private attribute for deterministic cleanup of the dependencies temp directory
  • In _create_training_job_args(), creates input data channels for dependencies by copying them to a temp dir preserving basenames
  • Includes the INSTALL_DEPENDENCIES step in the generated train script when dependencies are present
  • Validates that dependency paths exist and are directories during _validate_source_code()
  • Cleans up _temp_deps_dir in destructor and after training completes

sagemaker-train/tests/unit/train/test_model_trainer_dependencies.py

  • Updated to use from __future__ import annotations per V3 conventions
  • Changed fixture scope from module to function for proper test isolation
  • Added test for rejecting file dependencies (only directories allowed)
  • Fixed weak tests to actually verify train script content and temp directory structure

Design Decisions

  • Directories only: Dependencies are restricted to directories (matching V2 Estimator behavior), not individual files
  • PYTHONPATH approach: Dependencies are added to PYTHONPATH via the bash train script, which is sufficient since Python processes inherit environment variables
  • Dependencies before requirements: Dependencies are set up before requirements.txt installation, matching the logical ordering where dependencies should be available first

Comments reviewed: 34
Files modified: sagemaker-core/src/sagemaker/core/training/configs.py, sagemaker-train/src/sagemaker/train/constants.py, sagemaker-train/src/sagemaker/train/model_trainer.py, sagemaker-train/src/sagemaker/train/templates.py, sagemaker-train/tests/unit/train/test_model_trainer_dependencies.py

  • sagemaker-core/src/sagemaker/core/training/configs.py: Add blank line before OutputDataConfig class definition per PEP 8
  • sagemaker-train/src/sagemaker/train/constants.py: Ensure file ends with newline after SM_RECIPE_CONTAINER_PATH
  • sagemaker-train/src/sagemaker/train/model_trainer.py: Fix resource leak for deps temp dir, restrict dependencies to directories only, ensure shutil import exists, and add _temp_deps_dir as instance attribute with cleanup
  • sagemaker-train/src/sagemaker/train/templates.py: Improve INSTALL_DEPENDENCIES template to handle both directories and .whl/.tar.gz files via pip install
  • sagemaker-train/tests/unit/train/test_model_trainer_dependencies.py: Fix weak tests, change fixture scope, use proper future import, and add meaningful assertions