fix: load_feature_definitions_from_dataframe() doesn't recognize pandas nullable dtypes (5675) by aviruthen · Pull Request #5732 · aws/sagemaker-python-sdk

🤖 Iteration #1 — Review Comments Addressed

Description

Fix load_feature_definitions_from_dataframe() to correctly recognize pandas nullable dtypes (Int64, Float64, string, etc.).

Problem

When a DataFrame uses pandas nullable dtypes (common after calling DataFrame.convert_dtypes()), every numeric column with a nullable dtype was incorrectly mapped to StringFeatureDefinition. This happened because _INTEGER_TYPES and _FLOAT_TYPES contained only the lowercase numpy dtype names (e.g., int64, float64), while pandas nullable dtypes use capitalized names (e.g., Int64, Float64). Those columns therefore matched neither set and fell through to the default string case.
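A minimal sketch of the failure mode, assuming the lookup keys on the dtype name (str(df[col].dtype)) and falls back to a string type for unrecognized names. FeatureType, OLD_INTEGER_TYPES, OLD_FLOAT_TYPES and old_feature_type are hypothetical names for illustration, not the module's actual code. After convert_dtypes(), pandas reports names such as "Int64", "Float64" and "string", and because set membership is case-sensitive, "Int64" never matches "int64".

```python
from enum import Enum


class FeatureType(Enum):
    # Hypothetical stand-in for the Integral / Fractional / String feature definitions.
    INTEGRAL = "Integral"
    FRACTIONAL = "Fractional"
    STRING = "String"


# Hypothetical pre-fix sets: only the lowercase numpy dtype names.
OLD_INTEGER_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
}
OLD_FLOAT_TYPES = {"float16", "float32", "float64"}


def old_feature_type(dtype_name: str) -> FeatureType:
    """Map a dtype name to a feature type, mimicking the pre-fix behavior."""
    if dtype_name in OLD_INTEGER_TYPES:
        return FeatureType.INTEGRAL
    if dtype_name in OLD_FLOAT_TYPES:
        return FeatureType.FRACTIONAL
    # Anything unrecognized falls through to the default string case.
    return FeatureType.STRING


print(old_feature_type("int64").name)  # INTEGRAL
print(old_feature_type("Int64").name)  # STRING: the nullable name is not matched
```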

Changes

sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py:

  • Added pandas nullable dtype mappings to _DTYPE_TO_FEATURE_TYPE_MAP for consistency
  • Updated _generate_feature_definition to explicitly check _STRING_TYPES so the "string" dtype is properly handled rather than falling through to the default case
  • The _INTEGER_TYPES, _FLOAT_TYPES, and _STRING_TYPES sets (added in the prior iteration) already include the pandas nullable dtype names
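The changes above can be sketched as follows, again as a hypothetical illustration rather than the code in feature_utils.py: the sets gain the capitalized nullable names, a lookup map is derived from the same sets so the two cannot drift apart, and "string" is matched explicitly before the default case. INTEGER_TYPES, FLOAT_TYPES, STRING_TYPES, DTYPE_TO_FEATURE_TYPE and feature_type_for are assumed names, and the string fallback for other dtypes is assumed behavior. In this sketch the explicit string check returns the same result as the fallback; it only makes the intent visible.

```python
from enum import Enum


class FeatureType(Enum):
    # Hypothetical stand-in for the Integral / Fractional / String feature definitions.
    INTEGRAL = "Integral"
    FRACTIONAL = "Fractional"
    STRING = "String"


# numpy dtype names plus the capitalized pandas nullable (extension) dtype names.
INTEGER_TYPES = {
    "int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64",
    "Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64",
}
FLOAT_TYPES = {"float16", "float32", "float64", "Float32", "Float64"}
STRING_TYPES = {"object", "string"}

# Map built from the same sets, so the map and the set checks cannot disagree.
DTYPE_TO_FEATURE_TYPE = {
    **{name: FeatureType.INTEGRAL for name in INTEGER_TYPES},
    **{name: FeatureType.FRACTIONAL for name in FLOAT_TYPES},
    **{name: FeatureType.STRING for name in STRING_TYPES},
}


def feature_type_for(dtype_name: str) -> FeatureType:
    """Map a dtype name such as str(df[col].dtype) to a feature type."""
    if dtype_name in INTEGER_TYPES:
        return FeatureType.INTEGRAL
    if dtype_name in FLOAT_TYPES:
        return FeatureType.FRACTIONAL
    if dtype_name in STRING_TYPES:
        # Explicit match for "object" and the pandas "string" dtype.
        return FeatureType.STRING
    # Any other dtype still falls back to the default string case (assumed behavior).
    return FeatureType.STRING


print(feature_type_for("Int64").name)    # INTEGRAL
print(feature_type_for("Float64").name)  # FRACTIONAL
print(feature_type_for("string").name)   # STRING
```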

sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py:

  • Consolidated individual nullable integer dtype tests into a single pytest.mark.parametrize test covering Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64
  • Consolidated nullable float dtype tests into a single parametrized test covering Float32, Float64
  • Fixed assertion lines exceeding the 100-character line-length limit by breaking them across multiple lines
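The consolidated tests described above follow the standard pytest.mark.parametrize pattern. A minimal sketch, assuming pytest is installed; feature_type_for is a hypothetical, trimmed lookup repeated here only so the block runs on its own, whereas the real tests exercise the functions in feature_utils.py.

```python
import pytest

# Hypothetical lookup under test, trimmed to what these tests need.
_INTEGER_TYPES = {
    "int64", "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
}
_FLOAT_TYPES = {"float64", "Float32", "Float64"}


def feature_type_for(dtype_name: str) -> str:
    if dtype_name in _INTEGER_TYPES:
        return "Integral"
    if dtype_name in _FLOAT_TYPES:
        return "Fractional"
    return "String"


# One parametrized test replaces eight near-identical integer tests.
@pytest.mark.parametrize(
    "dtype_name",
    ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"],
)
def test_nullable_integer_dtypes_map_to_integral(dtype_name):
    assert feature_type_for(dtype_name) == "Integral"


@pytest.mark.parametrize("dtype_name", ["Float32", "Float64"])
def test_nullable_float_dtypes_map_to_fractional(dtype_name):
    assert feature_type_for(dtype_name) == "Fractional"
```

A test against the real function would more likely build a one-column frame, for example pd.DataFrame({"col": pd.Series([1, None], dtype=dtype_name)}), and inspect the returned feature definitions; that shape is an assumption here, not a description of the PR's test file.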

Note

This fix was previously applied in V2 via PR #3740 but was not carried over to the V3 (sagemaker-mlops) codebase.

Comments reviewed: 4
Files modified: sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py, sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py

  • sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py: Add pandas nullable dtype support to _DTYPE_TO_FEATURE_TYPE_MAP and update _generate_feature_definition to use _STRING_TYPES
  • sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py: Use pytest.mark.parametrize for nullable dtype tests, fix line length issues