bz2.BZ2File / gzip.GzipFile / lzma.LZMAFile expose misleading `fileno` method · Issue #100066 · python/cpython

The various compressing/decompressing file wrappers (bz2.BZ2File, gzip.GzipFile, lzma.LZMAFile) currently have `fileno` methods that return the file descriptor of the underlying file:

	def fileno(self):
	    """Return the file descriptor for the underlying file."""
	    self._check_not_closed()
	    return self._fp.fileno()

I imagine this was done because it seemed useful, but I'm not sure what use it is. You can't safely use the descriptor with things like `select`, since the compression/decompression layer may buffer data, and passing it to anything that reads or writes the file descriptor directly will produce garbage (when reading, you get the raw compressed bytes) or corrupt the file (when writing, the data bypasses the compressor).
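To make the reading case concrete, here is a minimal sketch that compares what `BZ2File.read()` returns with what comes back from the descriptor reported by `fileno()`. The helper name `read_via_fileno` and the temporary file name are made up for illustration, and the sketch assumes the payload is small enough to fit in a single `os.read` call.

```python
import bz2
import os
import tempfile


def read_via_fileno(payload):
    """Return (bytes from BZ2File.read(), bytes read from its fileno())."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bz2")  # hypothetical file name
        with bz2.open(path, "wb") as f:
            f.write(payload)
        with bz2.open(path, "rb") as f:
            # Go through the wrapper first: this decompresses the data.
            decoded = f.read()
            # Now bypass the wrapper and read the descriptor directly.
            fd = f.fileno()
            os.lseek(fd, 0, os.SEEK_SET)
            raw = os.read(fd, 1 << 16)
    return decoded, raw


decoded, raw = read_via_fileno(b"hello world")
print(decoded)   # the original payload
print(raw[:3])   # the bz2 magic bytes, not the payload
```

Anything that consumes the descriptor sees the compressed stream, never the decompressed data that the wrapper object appears to represent.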

An example of how misleading this can be, courtesy of @ericfrederich:

>>> import bz2
>>> import subprocess
>>> with bz2.open('/tmp/out.bz2', 'w') as f:
...   subprocess.check_call(['echo', '-n', "Why doesn't this work?"], stdout=f)
...
0
>>> bz2.open('/tmp/out.bz2', 'r').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/bz2.py", line 178, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.7/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
>>> open('/tmp/out.bz2', 'rb').read()
b"Why doesn't this work?BZh9\x17rE8P\x90\x00\x00\x00\x00"

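Note that the subprocess wrote its output uncompressed, straight to the underlying descriptor, and the (empty) bz2 stream was appended after it when the file was closed.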
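For comparison, this is a sketch of one way to get what the example above was presumably after, assuming the goal is to store the child's output compressed: read the child's stdout through a pipe and copy it into the `BZ2File` object, so every byte passes through `write()`. The helper name `compress_command_output` is hypothetical, and the child command is a stand-in for `echo -n`.

```python
import bz2
import os
import shutil
import subprocess
import sys
import tempfile


def compress_command_output(cmd, path):
    """Run cmd and store its stdout at path as bz2-compressed data."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        with bz2.open(path, "wb") as f:
            # Copy through f.write() so the data is actually compressed,
            # instead of handing the child the raw file descriptor.
            shutil.copyfileobj(proc.stdout, f)
    # Leaving the Popen context waits for the child, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.bz2")
    # Stand-in for `echo -n`: a child that writes a fixed string to stdout.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write('written by the child')"]
    compress_command_output(child, out)
    with bz2.open(out, "rb") as f:
        roundtrip = f.read()

print(roundtrip)
```

The difference from the failing session is that `stdout=subprocess.PIPE` never asks the wrapper for a file descriptor, so there is no path by which the child can write around the compressor.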

Am I missing a situation where this is actually useful? If there isn't one, can we consider adding a warning for the confusing behaviour?