fix: load_feature_definitions_from_dataframe() doesn't recognize pandas nullable dtypes (5675) by aviruthen · Pull Request #5732 · aws/sagemaker-python-sdk
🤖 Iteration #1: Review Comments Addressed
Description
Fix load_feature_definitions_from_dataframe() to correctly recognize pandas nullable dtypes (Int64, Float64, string, etc.).
Problem
When a DataFrame uses pandas nullable dtypes (common after calling pd.DataFrame.convert_dtypes()), all numeric columns were incorrectly mapped to StringFeatureDefinition. This happened because _INTEGER_TYPES and _FLOAT_TYPES contained only lowercase numpy dtype names (e.g., int64, float64), while pandas nullable dtypes use capitalized names (e.g., Int64, Float64). The capitalized names matched neither set, so those columns fell through to the default String case.
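A minimal reproduction of the fall-through, assuming the column's dtype is compared by its string name (str(series.dtype)) against lowercase-only sets. OLD_INTEGER_TYPES, OLD_FLOAT_TYPES and old_feature_type are hypothetical stand-ins for the pre-fix logic, not the actual feature_utils.py code; "Integral", "Fractional" and "String" stand in for the Feature Store feature types.

```python
import pandas as pd

# Hypothetical pre-fix sets: lowercase numpy dtype names only.
OLD_INTEGER_TYPES = {"int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64"}
OLD_FLOAT_TYPES = {"float16", "float32", "float64"}


def old_feature_type(dtype_name: str) -> str:
    """Pre-fix lookup: any name missing from the lowercase sets defaults to String."""
    if dtype_name in OLD_INTEGER_TYPES:
        return "Integral"
    if dtype_name in OLD_FLOAT_TYPES:
        return "Fractional"
    return "String"  # default case


df = pd.DataFrame({"count": [1, 2, 3], "score": [0.5, 1.5, 2.5]})
nullable = df.convert_dtypes()  # count becomes Int64, score becomes Float64

for frame in (df, nullable):
    mapping = {
        col: (str(dtype), old_feature_type(str(dtype)))
        for col, dtype in frame.dtypes.items()
    }
    print(mapping)
```

With pandas 1.2 or later, convert_dtypes() turns the two columns into Int64 and Float64, so both fall through to String even though the plain numpy frame maps to Integral and Fractional.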
Changes
sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py:
- Added pandas nullable dtype mappings to _DTYPE_TO_FEATURE_TYPE_MAP for consistency
- Updated _generate_feature_definition to explicitly check _STRING_TYPES so the "string" dtype is properly handled rather than falling through to the default case
- The _INTEGER_TYPES, _FLOAT_TYPES, and _STRING_TYPES sets (already added in a prior iteration) correctly include pandas nullable dtype names
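A sketch of the fixed lookup order, assuming the sets are matched against str(dtype). The set contents, the feature_type_for name and its default parameter are illustrative assumptions; the real _INTEGER_TYPES, _FLOAT_TYPES and _STRING_TYPES in feature_utils.py may list different names.

```python
from typing import Optional

# Hypothetical stand-ins for the module-level sets in feature_utils.py.
_INTEGER_TYPES = {
    # numpy dtype names
    "int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64",
    # pandas nullable dtype names (capitalized)
    "Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64",
}
_FLOAT_TYPES = {"float16", "float32", "float64", "Float32", "Float64"}
_STRING_TYPES = {"object", "string"}


def feature_type_for(dtype_name: str, default: Optional[str] = "String") -> Optional[str]:
    """Return a Feature Store type name for a dtype name such as str(series.dtype)."""
    if dtype_name in _INTEGER_TYPES:
        return "Integral"
    if dtype_name in _FLOAT_TYPES:
        return "Fractional"
    # Explicit branch: "string" (pandas StringDtype) is recognized on purpose
    # instead of relying on the fallback below.
    if dtype_name in _STRING_TYPES:
        return "String"
    return default
```

Applied to a frame, this would look like {col: feature_type_for(str(dtype)) for col, dtype in df.dtypes.items()}. Passing default=None makes the difference visible: "string" still resolves to String through its own branch, while an unlisted name such as "datetime64[ns]" returns None.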
sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py:
- Consolidated individual nullable integer dtype tests into a single pytest.mark.parametrize test covering Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64
- Consolidated nullable float dtype tests into a single parametrized test covering Float32, Float64
- Fixed assertion lines exceeding the 100-character line length limit by breaking them across multiple lines
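The consolidated tests follow roughly this pattern. It is a sketch assuming pytest and pandas 1.2 or later; infer_feature_type and its sets are hypothetical stand-ins so the file runs on its own, whereas the actual tests in test_feature_utils.py target the feature_utils.py functions. The float test also shows a long assertion wrapped to stay under a 100-character limit.

```python
import pandas as pd
import pytest

# Hypothetical stand-in for the code under test (numpy + pandas nullable names).
_INTEGER_TYPES = {f"{p}{bits}" for p in ("int", "uint", "Int", "UInt") for bits in (8, 16, 32, 64)}
_FLOAT_TYPES = {"float16", "float32", "float64", "Float32", "Float64"}


def infer_feature_type(series: pd.Series) -> str:
    """Map a column to a Feature Store type name by its dtype name."""
    name = str(series.dtype)
    if name in _INTEGER_TYPES:
        return "Integral"
    if name in _FLOAT_TYPES:
        return "Fractional"
    return "String"


@pytest.mark.parametrize(
    "dtype", ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"]
)
def test_nullable_integer_dtypes(dtype):
    # pd.array with a nullable dtype stores None as pd.NA
    df = pd.DataFrame({"col": pd.array([1, None, 3], dtype=dtype)})
    assert infer_feature_type(df["col"]) == "Integral"


@pytest.mark.parametrize("dtype", ["Float32", "Float64"])
def test_nullable_float_dtypes(dtype):
    df = pd.DataFrame({"col": pd.array([1.5, None, 2.5], dtype=dtype)})
    # Long assertion wrapped across lines to respect a 100-character limit.
    assert infer_feature_type(df["col"]) == "Fractional", (
        f"expected column with dtype {dtype} to map to Fractional"
    )
```

Running pytest -v on such a file reports one case per dtype (for example test_nullable_integer_dtypes[UInt32]), so a regression points at the exact dtype that broke.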
Note
This fix was previously applied in V2 via PR #3740 but was not carried over to the V3 (sagemaker-mlops) codebase.
Comments reviewed: 4
Files modified: sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py, sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py
sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py: Add pandas nullable dtype support to _DTYPE_TO_FEATURE_TYPE_MAP and update _generate_feature_definition to use _STRING_TYPES
sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py: Use pytest.mark.parametrize for nullable dtype tests, fix line length issues