Fix #5632: HyperparameterTuner drops content_type when converting Inp... by JiwaniZakir · Pull Request #5703 · aws/sagemaker-python-sdk

Closes #5632

Motivation

HyperparameterTuner._build_training_job_definition() converts InputData objects to Channel objects but omits content_type during the conversion. Without that field the training container cannot determine the data format, so built-in algorithms (e.g., XGBoost) fail with validate_data_file_path errors.

Changes

sagemaker-train/src/sagemaker/train/tuner.py

  • Line 1433: Added content_type=inp.content_type to the Channel(...) constructor call inside the isinstance(inp, InputData) branch of _build_training_job_definition(). This is the sole change required to propagate the field that was silently dropped.
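For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of the conversion. It does not use the SDK: the InputData and Channel classes below are hypothetical dataclass stand-ins with only the fields needed for illustration, to_channel is an illustrative helper name rather than a method in tuner.py, and the S3 URI is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputData:
    """Hypothetical stand-in for the SDK's InputData (illustrative fields only)."""
    channel_name: str
    data_source: str
    content_type: Optional[str] = None


@dataclass
class Channel:
    """Hypothetical stand-in for the SDK's Channel (illustrative fields only)."""
    channel_name: str
    data_source: str
    content_type: Optional[str] = None


def to_channel(inp):
    """Convert an InputData to a Channel; pass anything else through unchanged."""
    if isinstance(inp, InputData):
        return Channel(
            channel_name=inp.channel_name,
            data_source=inp.data_source,
            # The field this PR adds; without it the value is silently dropped.
            content_type=inp.content_type,
        )
    return inp


# Placeholder URI, not taken from the issue report.
channel = to_channel(InputData("train", "s3://example-bucket/train", "text/csv"))
print(channel.content_type)
```

In the real method the Channel constructor takes more arguments than shown here; the sketch only isolates the one keyword that was missing.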

sagemaker-train/tests/unit/train/test_tuner.py

  • Added test_build_training_job_definition_preserves_content_type() to TestHyperparameterTunerStaticMethods. The test constructs an InputData with content_type="text/csv", calls tuner._build_training_job_definition(), and asserts that the resulting Channel for the "train" channel carries content_type == "text/csv". This directly exercises the previously broken code path.
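The logic of the regression test can be sketched as follows. This is not the test added in the PR: it uses SimpleNamespace objects as hypothetical stand-ins for InputData and Channel, and the convert, find_channel, and check_preserves_content_type helpers are illustrative names. The propagate_content_type flag exists only to show that the same assertion fails against the old behavior and passes against the new one.

```python
from types import SimpleNamespace


def convert(inp, propagate_content_type):
    """Hypothetical stand-in for the InputData -> Channel conversion.

    propagate_content_type=False mimics the behavior before the fix.
    """
    return SimpleNamespace(
        channel_name=inp.channel_name,
        data_source=inp.data_source,
        content_type=inp.content_type if propagate_content_type else None,
    )


def find_channel(channels, name):
    """Return the first channel with the given name."""
    return next(c for c in channels if c.channel_name == name)


def check_preserves_content_type(propagate_content_type):
    """Build a "train" input with text/csv and check the converted channel."""
    inp = SimpleNamespace(
        channel_name="train",
        data_source="s3://example-bucket/train",  # placeholder URI
        content_type="text/csv",
    )
    channels = [convert(inp, propagate_content_type)]
    return find_channel(channels, "train").content_type == "text/csv"


print(check_preserves_content_type(True))   # behavior after the fix
print(check_preserves_content_type(False))  # behavior before the fix
```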

Testing

The new unit test covers the regression:

tests/unit/train/test_tuner.py::TestHyperparameterTunerStaticMethods::test_build_training_job_definition_preserves_content_type PASSED

Manually verified against XGBoost 1.7-1 using the reproduction case from the issue report. Training jobs now complete successfully when InputData(content_type="csv") is passed to tuner.tune(), with no need for the Channel-based workaround.