Enhance URI scheme validation for Windows paths by AlgoDeveloper400 · Pull Request #3161 · apache/iceberg-python

fix: handle Windows drive letters in parse_location

Rationale for this change

When a Windows user passes a local file path like C:\Users\file.avro to PyArrowFileIO,
Python's urlparse reads everything before the first colon as a URL scheme, so the drive letter C
comes back as the scheme 'c', as if it were a scheme like s3 or http.

This caused PyIceberg to crash with:

Unrecognized filesystem type in URI: 'c'
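
The parsing behaviour behind this error can be reproduced with the standard library alone. The snippet below is a minimal sketch; the sample locations are illustrative and are not taken from the PyIceberg test suite.

```python
from urllib.parse import urlparse

# Illustrative inputs (assumptions, not PyIceberg fixtures).
locations = [
    r"C:\Users\file.avro",    # Windows drive-letter path
    "s3://bucket/file.avro",  # genuine URL scheme
    "/tmp/file.avro",         # POSIX path, no scheme
]

# urlparse lowercases whatever it identifies as the scheme.
schemes = {loc: urlparse(loc).scheme for loc in locations}

for loc, scheme in schemes.items():
    print(f"{loc!r} -> scheme {scheme!r}")
```

The drive-letter path yields the scheme 'c', the S3 URI yields 's3', and the POSIX path yields an empty scheme, which is the only case the original check handled.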

The Fix

Before ❌ (Original Code):

uri = urlparse(location)

if not uri.scheme:
    default_scheme = properties.get("DEFAULT_SCHEME", "file")
    default_netloc = properties.get("DEFAULT_NETLOC", "")
    return default_scheme, default_netloc, os.path.abspath(location)

After ✅ (Fixed Code):

uri = urlparse(location)

if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
    # len == 1 and isalpha() catches Windows drive letters like C:\ D:\
    default_scheme = properties.get("DEFAULT_SCHEME", "file")
    default_netloc = properties.get("DEFAULT_NETLOC", "")
    return default_scheme, default_netloc, os.path.abspath(location)

The only change:

# Before ❌
if not uri.scheme:

# After ✅
if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):

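The added condition checks whether the parsed scheme is a single alphabetic character
(urlparse lowercases it, so C:, D: and E: arrive as c, d and e). If so, the location is treated
as a Windows drive-letter path and takes the same local-file branch as a location with no scheme.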
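
To make the boundaries of the new condition concrete, here is a small standalone sketch. The helper name treated_as_local is hypothetical and exists only for illustration; it applies the same expression as the fixed if statement and is not part of PyIceberg.

```python
from urllib.parse import urlparse


def treated_as_local(location: str) -> bool:
    """Hypothetical helper: True when the fixed condition takes the local-file branch."""
    scheme = urlparse(location).scheme
    # Empty scheme (plain path) or a one-letter alphabetic scheme (drive letter).
    return not scheme or (len(scheme) == 1 and scheme.isalpha())


# Illustrative inputs, chosen to probe both sides of the condition.
samples = [
    "C:\\Users\\file.avro",   # drive letter with backslashes
    "D:/data/table",          # drive letter with forward slashes
    "relative/file.avro",     # no scheme at all
    "s3://bucket/key",        # two-character scheme
    "file:///tmp/file.avro",  # explicit file scheme
]

results = {sample: treated_as_local(sample) for sample in samples}
for sample, is_local in results.items():
    print(f"{sample!r}: local={is_local}")
```

One consequence of this design is that a genuine one-letter URL scheme would also be routed to the local-file branch. The condition therefore assumes that no such scheme is used for table locations.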


Example

from pyiceberg.io.pyarrow import PyArrowFileIO

io = PyArrowFileIO()

# Before fix - crashed with: Unrecognized filesystem type in URI: 'c'
# After fix - works correctly
scheme, netloc, path = io.parse_location("C:\\Users\\test\\file.avro")

print(scheme)  # 'file'
print(netloc)  # ''
print(path)    # 'C:\\Users\\test\\file.avro' (on Windows)
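
The example above needs PyIceberg and a Windows machine to produce that path. As a rough, self-contained approximation, the sketch below mirrors only the fixed branch. The function name parse_local_location and the injectable abspath parameter are assumptions made for illustration. It defaults to ntpath.abspath so that Windows path rules apply on any platform, whereas the code in this PR calls os.path.abspath.

```python
import ntpath
from urllib.parse import urlparse


def parse_local_location(location, properties=None, abspath=ntpath.abspath):
    """Hypothetical stand-in for the schemeless branch of parse_location."""
    properties = properties or {}
    uri = urlparse(location)

    if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
        default_scheme = properties.get("DEFAULT_SCHEME", "file")
        default_netloc = properties.get("DEFAULT_NETLOC", "")
        return default_scheme, default_netloc, abspath(location)

    # Simplified fallthrough: the real function handles other schemes in more detail.
    return uri.scheme, uri.netloc, uri.path


result = parse_local_location("C:\\Users\\test\\file.avro")
print(result)
```

On POSIX systems os.path.abspath does not recognise drive letters and would join the string onto the current working directory, which is why the sketch swaps in ntpath to show the Windows result.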

Impact

This fix affects local file operations on Windows that pass drive-letter paths through parse_location, including:

  • Reading local Iceberg tables
  • Writing local Iceberg tables
  • Any local Avro/Parquet file operations

Are these changes tested?

Yes - existing tests now pass on Windows.

tests/test_avro_sanitization.py

python -m pytest tests/test_avro_sanitization.py -v
tests/test_avro_sanitization.py::test_comprehensive_field_name_sanitization  PASSED
tests/test_avro_sanitization.py::test_comprehensive_avro_compatibility        PASSED
tests/test_avro_sanitization.py::test_emoji_field_name_sanitization           PASSED

tests/io/test_pyarrow.py

python -m pytest tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path -v
tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path               PASSED

Are there any user-facing changes?

Yes - fixes local file access on Windows for all PyIceberg users.