fix: ModelBuilder with source_code + DJL LMI: /opt/ml/model becomes read-only, breaki (5698) by aviruthen · Pull Request #5733 · aws/sagemaker-python-sdk
🤖 Iteration #1 — Review Comments Addressed
Description
Fix ModelBuilder with source_code + DJL LMI: /opt/ml/model becomes read-only, breaking HF Hub model downloads.
The issue has two root causes in _build_for_djl() in model_builder_servers.py:
- Missing HF cache redirection: Unlike _build_for_tgi() and _build_for_tei(), which set HF_HOME=/tmp and HUGGINGFACE_HUB_CACHE=/tmp, the DJL builder never set these environment variables. When source_code is provided, the model artifacts (requirements.txt etc.) get packaged as model.tar.gz and mounted read-only at /opt/ml/model/. The DJL container then tries to download HF models to /opt/ml/model/ (the default cache location) and fails with EROFS.
- HF_MODEL_ID override: _build_for_djl() unconditionally called self.env_vars.update({'HF_MODEL_ID': self.model}), which overwrote any user-provided HF_MODEL_ID value. This prevented users from setting HF_MODEL_ID to a local path (e.g., /opt/ml/model) when they want to use pre-downloaded model artifacts.
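The second root cause comes down to dict.update() versus dict.setdefault(). A minimal sketch, using a plain dict as a stand-in for self.env_vars; the Hub model ID is hypothetical and this is not the SDK's actual code:

```python
# Stand-in for self.env_vars: the user wants DJL to load pre-downloaded artifacts.
user_env = {"HF_MODEL_ID": "/opt/ml/model"}
hub_model = "my-org/my-model"  # hypothetical Hub ID, standing in for self.model

# Old behavior: update() replaces the user's value unconditionally.
overridden = dict(user_env)
overridden.update({"HF_MODEL_ID": hub_model})

# setdefault() only writes the key when it is absent.
preserved = dict(user_env)
preserved.setdefault("HF_MODEL_ID", hub_model)

print(overridden["HF_MODEL_ID"])  # my-org/my-model
print(preserved["HF_MODEL_ID"])   # /opt/ml/model
```

With update(), the local path is lost and the container falls back to a Hub download; with setdefault(), the builder supplies a value only when the user has not.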
Changes Made
- sagemaker-serve/src/sagemaker/serve/model_builder_servers.py:
  - Added HF_HOME and HUGGINGFACE_HUB_CACHE env vars pointing to /tmp for the DJL builder, consolidated to a single location using setdefault() so user-provided values are preserved
  - Changed HF_MODEL_ID to use setdefault() so user-provided values are not overridden
- sagemaker-serve/tests/unit/servers/test_djl_hf_cache_env.py:
  - Added pytest-style tests verifying HF cache env vars, HF_MODEL_ID preservation, and local mode offline behavior
  - Uses fixtures and helper functions to minimize duplication
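The shape of these changes can be sketched as one helper plus pytest-style checks. The helper name apply_djl_hf_defaults and the sample values are assumptions made for illustration; the real logic lives inside _build_for_djl() and the real tests use fixtures not reproduced here:

```python
def apply_djl_hf_defaults(env_vars: dict, model: str) -> dict:
    """Hypothetical helper mirroring the described change; not the SDK's code."""
    # Only fill in the model ID when the user has not provided one.
    env_vars.setdefault("HF_MODEL_ID", model)
    # /opt/ml/model is read-only when source_code is packaged, so cache under /tmp.
    env_vars.setdefault("HF_HOME", "/tmp")
    env_vars.setdefault("HUGGINGFACE_HUB_CACHE", "/tmp")
    return env_vars


def test_cache_defaults_are_set():
    env = apply_djl_hf_defaults({}, "my-org/my-model")
    assert env["HF_HOME"] == "/tmp"
    assert env["HUGGINGFACE_HUB_CACHE"] == "/tmp"
    assert env["HF_MODEL_ID"] == "my-org/my-model"


def test_user_values_are_preserved():
    user_env = {"HF_MODEL_ID": "/opt/ml/model", "HF_HOME": "/data/hf"}
    env = apply_djl_hf_defaults(user_env, "my-org/my-model")
    assert env["HF_MODEL_ID"] == "/opt/ml/model"
    assert env["HF_HOME"] == "/data/hf"
    # Keys the user did not set still receive the /tmp default.
    assert env["HUGGINGFACE_HUB_CACHE"] == "/tmp"
```

The functions use plain assert statements, so pytest can collect them as-is or they can be called directly.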
Comments reviewed: 9
Files modified: sagemaker-serve/src/sagemaker/serve/model_builder_servers.py, sagemaker-serve/tests/unit/servers/test_djl_hf_cache_env.py
- sagemaker-serve/src/sagemaker/serve/model_builder_servers.py: Fix DJL builder to use setdefault for HF_MODEL_ID (preserving user values), consolidate HF cache env vars to a single location using setdefault, and remove trailing whitespace
- sagemaker-serve/tests/unit/servers/test_djl_hf_cache_env.py: Rewrite tests using pytest style with fixtures, consolidating redundant tests and cleaning up imports