Add DIGEST-MD5 SASL delegation token auth to HiveCatalog by ShreyeshArangath · Pull Request #3150 · apache/iceberg-python

March 16, 2026 13:44
Enable PyIceberg's HiveCatalog to authenticate using DIGEST-MD5 SASL
with delegation tokens from $HADOOP_TOKEN_FILE_LOCATION, which is the
standard mechanism in secure Hadoop environments. This unblocks PyIceberg
adoption in production clusters that don't use Kerberos directly.

- Add HiveAuthError exception for Hive-specific auth failures
- Add hadoop_credentials module to parse HDTS binary token files
- Add _DigestMD5SaslTransport to work around THRIFT-5926 (None initial response)
- Support hive.metastore.authentication property (NONE/KERBEROS/DIGEST-MD5)
- Add pure-sasl to hive extras in pyproject.toml
- Backward compatible: existing kerberos_auth boolean still works

Closes apache#3145

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Address all findings from code review:

Critical:
- Rewrite VInt decoder to match Java WritableUtils.readVLong exactly,
  using signed-byte interpretation and correct prefix/length semantics
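The signed-byte and prefix/length semantics mentioned above can be sketched as follows. This is a minimal illustration of the algorithm used by Java's `WritableUtils.readVLong`, not the code from this PR; the function name `read_vlong` is hypothetical.

```python
import io
import struct


def read_vlong(stream):
    """Decode one Hadoop variable-length integer from a binary stream.

    Hypothetical helper that mirrors Java's WritableUtils.readVLong.
    """
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError("stream ended before the VInt prefix byte")
    # Java reads the prefix as a signed byte, so 0x8F is -113, not 143.
    (first,) = struct.unpack("b", raw)
    if first >= -112:
        # Single-byte form: the prefix byte is the value itself.
        return first
    # Prefixes below -120 mark a negative value, -120..-113 a positive one.
    negative = first < -120
    # Total encoded size including the prefix (mirrors decodeVIntSize).
    size = (-119 - first) if negative else (-111 - first)
    payload = stream.read(size - 1)
    if len(payload) != size - 1:
        raise EOFError("stream ended inside a multi-byte VInt")
    magnitude = int.from_bytes(payload, "big")
    # Negative values are stored as their one's complement.
    return ~magnitude if negative else magnitude
```

For example, `read_vlong(io.BytesIO(b"\x8f\x80"))` returns 128: the prefix `0x8f` is -113 as a signed byte, which means one payload byte and a non-negative value. Reading the prefix as unsigned would misclassify every multi-byte value, which is presumably the class of bug the rewrite addresses.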

High:
- Catch OSError (not just FileNotFoundError) when reading token file
- Reject unknown auth mechanisms with HiveAuthError instead of silently
  falling back to unauthenticated TBufferedTransport
- Replace monkey-patching sasl.process in _DigestMD5SaslTransport with
  a clean send_sasl_msg override (thread-safe, no shared state mutation)

Medium:
- Fix kerberos_service_name default, which was set to the config key
  instead of the actual default value
- Wrap UnicodeDecodeError in HiveAuthError for invalid UTF-8 in tokens
- Rewrite VInt test encoder to match real Hadoop encoding format
- Fix dead kerberos backward-compat tests to actually exercise __init__
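The two error-wrapping items above (catching `OSError` when reading the token file, and wrapping `UnicodeDecodeError`) might look roughly like this. It is a sketch under assumptions: `HiveAuthError` here is a local stand-in for the PR's exception, and `read_token_file_bytes` and `decode_token_text` are hypothetical names, not functions from the `hadoop_credentials` module.

```python
import os


class HiveAuthError(Exception):
    """Local stand-in for the exception added by the PR."""


def read_token_file_bytes(env=None):
    """Return the raw bytes of the file named by HADOOP_TOKEN_FILE_LOCATION."""
    env = os.environ if env is None else env
    location = env.get("HADOOP_TOKEN_FILE_LOCATION")
    if not location:
        raise HiveAuthError("HADOOP_TOKEN_FILE_LOCATION is not set")
    try:
        with open(location, "rb") as handle:
            return handle.read()
    except OSError as exc:
        # OSError also covers PermissionError and IsADirectoryError,
        # which a FileNotFoundError-only handler would let escape.
        raise HiveAuthError(f"Cannot read token file {location}: {exc}") from exc


def decode_token_text(raw):
    """Decode a text field from the token file, wrapping decode failures."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise HiveAuthError("Token field is not valid UTF-8") from exc
```

Funnelling both failure modes into one exception type lets callers handle any token problem with a single `except HiveAuthError` clause.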

Low:
- Add upper bound to pure-sasl dependency (<1.0.0)
- Fix tmp_path typing from object to pathlib.Path
- Fix docs to say pure-sasl (pip package name) not puresasl

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The parent TSaslClientTransport.send_sasl_msg() has no type annotations,
so there is no override incompatibility for mypy to suppress.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Document that hive.kerberos-service-name applies to both KERBEROS and DIGEST-MD5
- Add precedence note for hive.metastore.authentication vs legacy boolean
- Add test for empty-string auth mechanism raising HiveAuthError
- Add integration test for KERBEROS via hive.metastore.authentication config
- Expand HiveAuthError docstring to cover token file errors
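One way the mechanism selection described above could behave is sketched below. The commit does not state the precedence rule, so this sketch assumes that an explicit `hive.metastore.authentication` value wins over the legacy boolean and that matching is exact (no case folding). `resolve_auth_mechanism` and `SUPPORTED_MECHANISMS` are hypothetical names, and `HiveAuthError` is a local stand-in.

```python
class HiveAuthError(Exception):
    """Local stand-in for the exception added by the PR."""


# Mechanism names listed in the PR description.
SUPPORTED_MECHANISMS = ("NONE", "KERBEROS", "DIGEST-MD5")


def resolve_auth_mechanism(properties, legacy_kerberos_auth=False):
    """Pick the auth mechanism from catalog properties.

    Assumption: the explicit property takes precedence over the legacy
    boolean; the actual rule is whatever the PR's docs specify.
    """
    configured = properties.get("hive.metastore.authentication")
    if configured is None:
        # Backward-compatible path: only the legacy boolean is present.
        return "KERBEROS" if legacy_kerberos_auth else "NONE"
    if configured not in SUPPORTED_MECHANISMS:
        # Fail loudly rather than fall back to an unauthenticated transport.
        # An empty string lands here too.
        raise HiveAuthError(
            f"Unsupported hive.metastore.authentication value: {configured!r}"
        )
    return configured
```

Under these assumptions, `resolve_auth_mechanism({"hive.metastore.authentication": "DIGEST-MD5"}, legacy_kerberos_auth=True)` returns `"DIGEST-MD5"`, while an empty-string value raises `HiveAuthError`.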

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
