Fix #5632: HyperparameterTuner drops content_type when converting Inp... by JiwaniZakir · Pull Request #5703 · aws/sagemaker-python-sdk
Closes #5632
Motivation
HyperparameterTuner._build_training_job_definition() converts InputData objects to Channel objects but omits content_type during that conversion, causing built-in algorithms (e.g., XGBoost) to fail with validate_data_file_path errors because the training container cannot determine the data format.
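Below is a minimal, self-contained sketch of the failure mode. InputData and Channel here are simplified, hypothetical dataclass stand-ins, not the real sagemaker-train classes (whose fields and validation differ). to_channel_buggy is likewise a hypothetical helper. The sketch only shows how a field-by-field conversion that leaves out one keyword argument quietly falls back to the target's default.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified stand-ins for the SDK's InputData and Channel types;
# the real classes carry more fields and their own validation.
@dataclass
class InputData:
    channel_name: str
    data_source: str
    content_type: Optional[str] = None


@dataclass
class Channel:
    channel_name: str
    data_source: str
    content_type: Optional[str] = None  # None means "not specified"


def to_channel_buggy(inp: InputData) -> Channel:
    # Mirrors the pre-fix behavior: content_type is never forwarded,
    # so the Channel falls back to its default of None.
    return Channel(channel_name=inp.channel_name, data_source=inp.data_source)


inp = InputData("train", "s3://example-bucket/train/", content_type="text/csv")
channel = to_channel_buggy(inp)
print(channel.content_type)  # None: the user's "text/csv" was dropped
```

No error is raised at conversion time. The missing value only surfaces later, when the training container has to guess the data format.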
Changes
sagemaker-train/src/sagemaker/train/tuner.py
- Line 1433: Added content_type=inp.content_type to the Channel(...) constructor call inside the isinstance(inp, InputData) branch of _build_training_job_definition(). This is the only change needed to propagate the field that was previously dropped silently.
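The fixed branch can be sketched with the same kind of hypothetical stand-ins. to_channels is an illustrative helper, not the SDK's _build_training_job_definition(). Passing Channel objects through unchanged is an assumption, inferred from the Channel-based workaround mentioned under Testing.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical, simplified stand-ins; not the real sagemaker-train classes.
@dataclass
class InputData:
    channel_name: str
    data_source: str
    content_type: Optional[str] = None


@dataclass
class Channel:
    channel_name: str
    data_source: str
    content_type: Optional[str] = None


def to_channels(inputs: List[Union[InputData, Channel]]) -> List[Channel]:
    channels = []
    for inp in inputs:
        if isinstance(inp, InputData):
            # The fix: forward content_type alongside the other fields.
            channels.append(
                Channel(
                    channel_name=inp.channel_name,
                    data_source=inp.data_source,
                    content_type=inp.content_type,
                )
            )
        else:
            # Assumed: objects that are already Channels pass through as-is.
            channels.append(inp)
    return channels


inputs = [
    InputData("train", "s3://example-bucket/train/", content_type="text/csv"),
    Channel("validation", "s3://example-bucket/validation/", content_type="text/csv"),
]
result = to_channels(inputs)
print([c.content_type for c in result])  # ['text/csv', 'text/csv']
```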
sagemaker-train/tests/unit/train/test_tuner.py
- Added test_build_training_job_definition_preserves_content_type() to TestHyperparameterTunerStaticMethods. The test constructs an InputData with content_type="text/csv", calls tuner._build_training_job_definition(), and asserts that the resulting Channel for the "train" channel carries content_type == "text/csv". This directly exercises the previously broken code path.
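Here is a pytest-style sketch of the same shape of regression check. It is written against hypothetical stand-in classes and a hypothetical to_channel helper rather than the real tuner. The actual test calls tuner._build_training_job_definition(), and its fixture setup is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified stand-ins; not the real sagemaker-train classes.
@dataclass
class InputData:
    channel_name: str
    data_source: str
    content_type: Optional[str] = None


@dataclass
class Channel:
    channel_name: str
    data_source: str
    content_type: Optional[str] = None


def to_channel(inp: InputData) -> Channel:
    # Hypothetical converter with the fix applied.
    return Channel(
        channel_name=inp.channel_name,
        data_source=inp.data_source,
        content_type=inp.content_type,
    )


def test_conversion_preserves_content_type():
    inp = InputData("train", "s3://example-bucket/train/", content_type="text/csv")
    # Index the converted channels by name and check the "train" one,
    # mirroring the assertion described for the real test.
    channels = {c.channel_name: c for c in [to_channel(inp)]}
    assert channels["train"].content_type == "text/csv"


if __name__ == "__main__":
    test_conversion_preserves_content_type()
    print("ok")
```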
Testing
The new unit test covers the regression:
tests/unit/train/test_tuner.py::TestHyperparameterTunerStaticMethods::test_build_training_job_definition_preserves_content_type PASSED
Manually verified against XGBoost 1.7-1 using the reproduction case from the issue report: training jobs now complete successfully when InputData(content_type="csv") is passed to tuner.tune(), without requiring the Channel-based workaround.