fix: sigv4 auth to use base64-encoded content sha256 and custom canonical request by plusplusjiajia · Pull Request #3120 · apache/iceberg-python
Thanks for the PR! I understand this is to align with the Java SigV4 implementation. Could you help me understand the specific scenario in which this currently breaks? (I don't know much about SigV4.)
@kevinjqliu That's a great question. Let me walk through the context.
The root cause is in how the Java Iceberg SDK computes the x-amz-content-sha256 header. It uses AWS SDK v2's SignerChecksumParams with Algorithm.SHA256 and sets checksumHeaderName to X-Amz-Content-SHA256. Internally, the AWS SDK applies BinaryUtils.toBase64() to the checksum before writing it into the specified header; this is part of the flexible checksum mechanism rather than standard SigV4 behavior.
So the base64 encoding in x-amz-content-sha256 is essentially a side effect of Java Iceberg leveraging the flexible checksum API. For empty bodies, the Java side already has code at RESTSigV4AuthSession.java#L119-L121 that overrides this with the standard hex value, but for non-empty bodies the base64 value is left as-is (confirmed by the test at TestRESTSigV4AuthSession.java#L174).
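To make the two encodings concrete, here is a minimal sketch using only the Python standard library. The helper names (`content_sha256_hex`, `content_sha256_base64`, `java_compatible_content_sha256`) are hypothetical and are not taken from PyIceberg or the Java code. The last helper only mirrors the behavior described above: hex for an empty body, base64 otherwise.

```python
import base64
import hashlib

# Hex SHA-256 of an empty payload, the value standard SigV4 uses for empty bodies.
EMPTY_BODY_SHA256_HEX = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def content_sha256_hex(body: bytes) -> str:
    """Standard SigV4 form: lowercase hex of the SHA-256 digest (64 characters)."""
    return hashlib.sha256(body).hexdigest()


def content_sha256_base64(body: bytes) -> str:
    """Flexible-checksum form: base64 of the raw SHA-256 digest (44 characters)."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def java_compatible_content_sha256(body: bytes) -> str:
    """Hypothetical helper: hex for empty bodies, base64 for non-empty bodies."""
    if not body:
        return EMPTY_BODY_SHA256_HEX
    return content_sha256_base64(body)
```

Both forms encode the same 32-byte digest, so `base64.b64decode(content_sha256_base64(body)).hex()` equals `content_sha256_hex(body)`. Only the textual representation in the header differs.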
Since x-amz-content-sha256 is a signed header, its value participates in the canonical request construction. When a REST catalog server built with the Java Iceberg SDK verifies incoming signatures, it expects the same base64-encoded value. If the Python client sends a hex-encoded value instead, the canonical headers won't match during server-side signature verification, resulting in a signature mismatch.
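The mismatch can be reproduced with a simplified SigV4 computation. This is only a sketch under stated assumptions: the host, path, credentials, region, and service name are made-up example values, the function names are hypothetical, and the final payload-hash line of the canonical request is held at the hex digest in both runs so that the header value is the only thing that changes. It does not claim to reproduce what either implementation puts on that last line.

```python
import base64
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def canonical_request(method, path, query, headers, payload_hash_hex):
    """Build a SigV4 canonical request; `headers` keys must already be lowercase."""
    names = sorted(headers)
    canonical_headers = "".join(f"{name}:{headers[name].strip()}\n" for name in names)
    signed_headers = ";".join(names)
    return "\n".join(
        [method, path, query, canonical_headers, signed_headers, payload_hash_hex]
    )


def sigv4_signature(secret_key, date, region, service, amz_date, creq):
    """Derive the signing key and sign the string-to-sign for `creq`."""
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    for part in (region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(creq.encode("utf-8")).hexdigest(),
        ]
    )
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def demo_signatures(body: bytes = b'{"namespace": ["example"]}'):
    """Return (signature with hex header, signature with base64 header)."""
    digest = hashlib.sha256(body).digest()
    hex_value = digest.hex()
    b64_value = base64.b64encode(digest).decode("ascii")
    signatures = []
    for header_value in (hex_value, b64_value):
        headers = {
            "host": "catalog.example.com",  # example value
            "x-amz-content-sha256": header_value,
            "x-amz-date": "20240101T000000Z",  # example value
        }
        # Assumption: the payload-hash line is the hex digest in both cases.
        creq = canonical_request("POST", "/v1/namespaces", "", headers, hex_value)
        signatures.append(
            sigv4_signature(
                "example-secret", "20240101", "us-east-1", "execute-api",
                "20240101T000000Z", creq,
            )
        )
    return tuple(signatures)
```

Calling `demo_signatures()` yields two different signatures for the same body and credentials. Because the header is signed, a server that recomputes the signature with one encoding will reject a request signed with the other.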
This PR aligns the Python implementation with the Java SDK's current behavior to ensure interoperability. That said, I agree it would be worth discussing whether the Java side should also be updated to use standard hex encoding, but that would need to be a coordinated change across both implementations. Happy to hear your thoughts on this!