fix: ModelBuilder with source_code + DJL LMI: /opt/ml/model becomes read-only, breaki (5698) by aviruthen · Pull Request #5733 · aws/sagemaker-python-sdk

🤖 Iteration #1 — Review Comments Addressed

Description

Fix ModelBuilder with source_code + DJL LMI: /opt/ml/model becomes read-only, breaking HF Hub model downloads.

The issue has two root causes in _build_for_djl() in model_builder_servers.py:

  1. Missing HF cache redirection: Unlike _build_for_tgi() and _build_for_tei(), which set HF_HOME=/tmp and HUGGINGFACE_HUB_CACHE=/tmp, the DJL builder never set these environment variables. When source_code is provided, the model artifacts (requirements.txt, etc.) are packaged as model.tar.gz and mounted read-only at /opt/ml/model/. The DJL container then tries to download HF models into /opt/ml/model/ (the default cache location) and fails with EROFS (read-only file system).

  2. HF_MODEL_ID override: _build_for_djl() unconditionally called self.env_vars.update({'HF_MODEL_ID': self.model}), which overwrote any user-provided HF_MODEL_ID value. This prevented users from pointing HF_MODEL_ID at a local path (e.g., /opt/ml/model) when they wanted to use pre-downloaded model artifacts.
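The two root causes can be illustrated with plain dicts. This is a minimal sketch, not the SDK code: build_djl_env_before and build_djl_env_after are hypothetical standalone stand-ins for the env-var handling in _build_for_djl(), which actually operates on self.env_vars, and example-org/example-model is a placeholder Hub ID.

```python
from typing import Dict, Optional

# Cache directory the TGI/TEI builders use, per the description above.
HF_CACHE_DIR = "/tmp"


def build_djl_env_before(model: str, env_vars: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical approximation of the old behavior."""
    env = dict(env_vars or {})
    # No HF_HOME / HUGGINGFACE_HUB_CACHE, so downloads go to the container default.
    env.update({"HF_MODEL_ID": model})  # clobbers any user-supplied HF_MODEL_ID
    return env


def build_djl_env_after(model: str, env_vars: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical approximation of the fixed behavior: only fill in missing keys."""
    env = dict(env_vars or {})
    env.setdefault("HF_HOME", HF_CACHE_DIR)
    env.setdefault("HUGGINGFACE_HUB_CACHE", HF_CACHE_DIR)
    env.setdefault("HF_MODEL_ID", model)  # the user's value wins if present
    return env


user_env = {"HF_MODEL_ID": "/opt/ml/model"}  # user wants pre-downloaded artifacts

before = build_djl_env_before("example-org/example-model", user_env)
after = build_djl_env_after("example-org/example-model", user_env)

print(before)  # → {'HF_MODEL_ID': 'example-org/example-model'}
print(after)   # → {'HF_MODEL_ID': '/opt/ml/model', 'HF_HOME': '/tmp', 'HUGGINGFACE_HUB_CACHE': '/tmp'}
```

dict.update() always overwrites, while dict.setdefault() only writes a key that is absent, which is why switching to setdefault() both preserves user overrides and still supplies the /tmp cache defaults.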

Changes Made

  • sagemaker-serve/src/sagemaker/serve/model_builder_servers.py:

    • Added HF_HOME and HUGGINGFACE_HUB_CACHE env vars pointing to /tmp for the DJL builder, set in a single place using setdefault() so that user-provided values are preserved
    • Changed HF_MODEL_ID to use setdefault() so user-provided values are not overridden
  • sagemaker-serve/tests/unit/servers/test_djl_hf_cache_env.py:

    • Added pytest-style tests verifying HF cache env vars, HF_MODEL_ID preservation, and local mode offline behavior
    • Uses fixtures and helper functions to minimize duplication
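The kinds of checks listed above might look roughly like the following in pytest style. This is a hedged sketch, not the contents of test_djl_hf_cache_env.py: apply_djl_env_defaults, _assert_hf_cache_redirected, MODEL_ID, and the /scratch/hf path are hypothetical, and a plain helper stands in for the PR's fixtures so the snippet runs without pytest installed. The local mode offline case is not sketched because its exact behavior is not described here.

```python
from typing import Dict, Optional

HF_CACHE_DIR = "/tmp"
MODEL_ID = "example-org/example-model"  # hypothetical Hub model ID


def apply_djl_env_defaults(model: str, env_vars: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical stand-in for the env-var logic in _build_for_djl()."""
    env = dict(env_vars or {})
    for key, value in (
        ("HF_HOME", HF_CACHE_DIR),
        ("HUGGINGFACE_HUB_CACHE", HF_CACHE_DIR),
        ("HF_MODEL_ID", model),
    ):
        env.setdefault(key, value)  # never overwrite a value the user set
    return env


def _assert_hf_cache_redirected(env: Dict[str, str]) -> None:
    """Shared helper so every test checks the cache vars the same way."""
    assert env["HF_HOME"] == HF_CACHE_DIR
    assert env["HUGGINGFACE_HUB_CACHE"] == HF_CACHE_DIR


def test_cache_vars_default_to_tmp():
    env = apply_djl_env_defaults(MODEL_ID)
    _assert_hf_cache_redirected(env)
    assert env["HF_MODEL_ID"] == MODEL_ID


def test_user_hf_model_id_is_preserved():
    env = apply_djl_env_defaults(MODEL_ID, {"HF_MODEL_ID": "/opt/ml/model"})
    _assert_hf_cache_redirected(env)
    assert env["HF_MODEL_ID"] == "/opt/ml/model"


def test_user_cache_dir_is_preserved():
    env = apply_djl_env_defaults(MODEL_ID, {"HF_HOME": "/scratch/hf"})
    assert env["HF_HOME"] == "/scratch/hf"  # user override kept
    assert env["HUGGINGFACE_HUB_CACHE"] == HF_CACHE_DIR  # missing key still defaulted
```

pytest would collect the three test_ functions automatically; they can also be called directly because they take no fixture arguments.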

Comments reviewed: 9
Files modified: sagemaker-serve/src/sagemaker/serve/model_builder_servers.py, sagemaker-serve/tests/unit/servers/test_djl_hf_cache_env.py

  • sagemaker-serve/src/sagemaker/serve/model_builder_servers.py: Use setdefault() for HF_MODEL_ID in the DJL builder (preserving user values), set the HF cache env vars in a single place with setdefault(), and remove trailing whitespace
  • sagemaker-serve/tests/unit/servers/test_djl_hf_cache_env.py: Rewrite tests using pytest style with fixtures, consolidating redundant tests and cleaning up imports