Enhance URI scheme validation for Windows paths by AlgoDeveloper400 · Pull Request #3161 · apache/iceberg-python
fix: handle Windows drive letters in parse_location
Rationale for this change
When a Windows user passes a local file path like C:\Users\file.avro to PyArrowFileIO,
Python's urlparse reads the drive letter C as a URL scheme (like s3 or http) and returns it lowercased as c.
This caused PyIceberg to crash with:
Unrecognized filesystem type in URI: 'c'
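The parsing behaviour behind this error can be reproduced with the standard library alone. The snippet below is a minimal illustration, not PyIceberg code; the variable names and the s3 URI are made up for comparison.

```python
from urllib.parse import urlparse

# A Windows path: the text before the first colon is read as the scheme
# and lowercased, so the drive letter "C" comes back as "c".
windows_uri = urlparse("C:\\Users\\file.avro")
print(windows_uri.scheme)  # c
print(windows_uri.netloc)  # (empty string)

# A genuine object-store URI for comparison (illustrative bucket name).
s3_uri = urlparse("s3://bucket/data/file.avro")
print(s3_uri.scheme)  # s3
print(s3_uri.netloc)  # bucket
```

Because the scheme is non-empty in both cases, a check of the form `if not uri.scheme` cannot tell the drive letter apart from a real scheme.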
The Fix
Before ❌ (Original Code):
    uri = urlparse(location)
    if not uri.scheme:
        default_scheme = properties.get("DEFAULT_SCHEME", "file")
        default_netloc = properties.get("DEFAULT_NETLOC", "")
        return default_scheme, default_netloc, os.path.abspath(location)
After ✅ (Fixed Code):
    uri = urlparse(location)
    if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
        # len == 1 and isalpha() catches Windows drive letters like C:\ D:\
        default_scheme = properties.get("DEFAULT_SCHEME", "file")
        default_netloc = properties.get("DEFAULT_NETLOC", "")
        return default_scheme, default_netloc, os.path.abspath(location)
The only change:
    # Before ❌
    if not uri.scheme:

    # After ✅
    if not uri.scheme or (len(uri.scheme) == 1 and uri.scheme.isalpha()):
The added condition checks if the scheme is a single alphabetic character (e.g. C, D, E)
and treats it as a Windows drive letter instead of a URL scheme.
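To show the condition in isolation, here is a self-contained sketch that runs without PyIceberg installed. The names `is_windows_drive_letter` and `parse_location_sketch` are hypothetical, and the non-local branch is a simplification, not the library's actual handling of remote schemes.

```python
import os
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse


def is_windows_drive_letter(scheme: str) -> bool:
    """Hypothetical helper: True when the parsed 'scheme' is a single letter."""
    return len(scheme) == 1 and scheme.isalpha()


def parse_location_sketch(
    location: str, properties: Optional[Dict[str, str]] = None
) -> Tuple[str, str, str]:
    """Simplified stand-in for parse_location, for illustration only."""
    properties = properties or {}
    uri = urlparse(location)
    if not uri.scheme or is_windows_drive_letter(uri.scheme):
        # No scheme, or a drive letter mistaken for one: treat as a local path.
        default_scheme = properties.get("DEFAULT_SCHEME", "file")
        default_netloc = properties.get("DEFAULT_NETLOC", "")
        return default_scheme, default_netloc, os.path.abspath(location)
    # Assumption: remote schemes are returned as parsed, with no special cases.
    return uri.scheme, uri.netloc, uri.path


for loc in ("C:\\Users\\file.avro", "relative/file.avro", "s3://bucket/key"):
    scheme, netloc, _ = parse_location_sketch(loc)
    print(loc, "->", scheme, repr(netloc))
```

One consequence of this heuristic is that any genuine single-letter URI scheme would also be routed to the local filesystem. Multi-character schemes such as s3, gs, or hdfs are unaffected.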
Example
    from pyiceberg.io.pyarrow import PyArrowFileIO

    io = PyArrowFileIO()

    # Before fix - crashed with: Unrecognized filesystem type in URI: 'c'
    # After fix - works correctly
    scheme, netloc, path = io.parse_location("C:\\Users\\test\\file.avro")
    print(scheme)  # 'file'
    print(netloc)  # ''
    print(path)    # 'C:\\Users\\test\\file.avro'
Impact
This fix affects all local file operations on Windows, including:
- Reading local Iceberg tables
- Writing local Iceberg tables
- Any local Avro/Parquet file operations
Are these changes tested?
Yes - existing tests now pass on Windows.
tests/test_avro_sanitization.py
python -m pytest tests/test_avro_sanitization.py -v
tests/test_avro_sanitization.py::test_comprehensive_field_name_sanitization PASSED
tests/test_avro_sanitization.py::test_comprehensive_avro_compatibility PASSED
tests/test_avro_sanitization.py::test_emoji_field_name_sanitization PASSED
tests/io/test_pyarrow.py
python -m pytest tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path -v
tests/io/test_pyarrow.py::test_pyarrow_infer_local_fs_from_path PASSED
Are there any user-facing changes?
Yes - fixes local file access on Windows for all PyIceberg users.