Issue 32698: Improper gzip compression if output file extension is not "gz"

Created on 2018-01-28 18:37 by Delgan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
test.py Delgan, 2018-01-28 18:37
Messages (3)
msg310978 - Author: Delgan (Delgan) Date: 2018-01-28 18:37
Hello.

The following code produces an improperly compressed "test.txt.gzip" file:

    import gzip
    import shutil
    
    input_path = "test.txt"
    output_path = input_path + ".gzip"
    
    with open(input_path, 'w') as file:
        file.write("abc" * 10)
    
    with gzip.open(output_path, 'wb') as f_out:
        with open(input_path, 'rb') as f_in:
            shutil.copyfileobj(f_in, f_out)

Although the content can be read back correctly using `gzip.open(output_path, 'rb')`, the file cannot be opened correctly with software such as 7-Zip or WinRAR.
If I open the "test.txt.gzip" file, it appears to contain another "test.txt.gzip" file. If I change the code to use the ".gz" extension and then open "test.txt.gz", it contains the expected "test.txt" file.
The contained "test.txt.gzip" is actually identical, at the byte level, to "test.txt"; only the file name differs, which confuses tools like 7-Zip.
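
The name that archivers display comes from the optional FNAME field of the gzip header (RFC 1952). As a rough illustration, the sketch below reads that field back for both extensions. `stored_name` and `demo` are hypothetical helper names, not part of the `gzip` module, and the parser assumes a single-member file and only handles the header fields that precede FNAME.

```python
import gzip
import os
import tempfile

FEXTRA, FNAME = 0x04, 0x08  # flag bits defined by RFC 1952


def stored_name(path):
    """Return the original-file-name field of a gzip file, or None if absent."""
    with open(path, "rb") as f:
        header = f.read(10)  # magic (2), method, flags, mtime (4), xfl, os
        if header[:2] != b"\x1f\x8b":
            raise ValueError("not a gzip file")
        flags = header[3]
        if flags & FEXTRA:
            # Skip the optional extra field: 2-byte little-endian length, then data.
            f.read(int.from_bytes(f.read(2), "little"))
        if not flags & FNAME:
            return None
        # The name is stored as a zero-terminated Latin-1 string.
        name = bytearray()
        while True:
            byte = f.read(1)
            if byte in (b"", b"\x00"):
                break
            name += byte
        return name.decode("latin-1")


def demo():
    """Compress the same payload under both extensions and report the stored names."""
    names = {}
    with tempfile.TemporaryDirectory() as tmp:
        for ext in (".gzip", ".gz"):
            path = os.path.join(tmp, "test.txt" + ext)
            with gzip.open(path, "wb") as f_out:
                f_out.write(b"abc" * 10)
            names[ext] = stored_name(path)
    return names


if __name__ == "__main__":
    print(demo())
```

With CPython's `gzip` module, `demo()` is expected to return `{'.gzip': 'test.txt.gzip', '.gz': 'test.txt'}`, since the module strips only a trailing ".gz" from the output name before storing it in the header.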

The bug is not present with the compression functions from the "bz2" and "lzma" modules. With those, I can use a custom extension and the file can still be compressed and decompressed without issue.

As to why I need an extension other than ".gz": I would like to compress an arbitrary ".tar" file given as input to ".tgz". I would like users to be able to open the file in their favorite archiver and see that it contains a ".tar" file, rather than be confused about why it contains the same ".tgz" file.
msg311018 - Author: Martin Panter (martin.panter) (Python committer) Date: 2018-01-28 22:27
According to the documentation, you can use the lower-level GzipFile constructor’s “filename” argument:

>>> with open(output_path, 'wb') as f_out, \
...     gzip.GzipFile(fileobj=f_out, mode='wb', filename=input_path) as f_out, \
...     open(input_path, 'rb') as f_in:
...         shutil.copyfileobj(f_in, f_out)
... 
>>> import os
>>> os.system("7z l test.txt.gzip")
[. . .]
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2018-01-28 22:23:16 .....           30           34  test.txt
------------------- ----- ------------ ------------  ------------------------
msg311160 - Author: Delgan (Delgan) Date: 2018-01-29 19:33
Thanks @martin.panter for your response.

I will close this issue as "not a bug", since there is a workaround and the current behavior can be deduced by carefully reading the entire documentation.
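
For the ".tar" to ".tgz" case described in the first message, the `GzipFile` workaround might be wrapped as follows. This is a minimal sketch: `gzip_with_inner_name` is a hypothetical helper name, not part of the `gzip` module, and it assumes the input's base name can be encoded as Latin-1 so that it is kept in the header.

```python
import gzip
import os
import shutil


def gzip_with_inner_name(input_path, output_path):
    """Compress input_path into output_path, storing input_path's base name
    in the gzip header regardless of output_path's extension."""
    inner_name = os.path.basename(input_path)
    with open(input_path, "rb") as f_in, \
            open(output_path, "wb") as raw_out, \
            gzip.GzipFile(filename=inner_name, mode="wb", fileobj=raw_out) as gz_out:
        shutil.copyfileobj(f_in, gz_out)
```

A call such as `gzip_with_inner_name("archive.tar", "archive.tgz")` (file names chosen for illustration) should produce a ".tgz" file whose header names "archive.tar" as the contained file.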
History
Date User Action Args
2022-04-11 14:58:57 admin set github: 76879
2018-01-29 19:33:55 Delgan set status: open -> closed; resolution: not a bug; messages: + msg311160; stage: resolved
2018-01-28 22:27:15 martin.panter set nosy: + martin.panter; messages: + msg311018
2018-01-28 18:37:33 Delgan create