Issue 17681: Work with an extra field of gzip and zip files
Created on 2013-04-09 15:03 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin.
Files
| File name | Uploaded | Description |
|---|---|---|
| gzip_extra.diff | serhiy.storchaka, 2013-11-16 19:18 | review |
| zipfile_extra.diff | serhiy.storchaka, 2013-11-16 19:19 | review |
| README.dz | serhiy.storchaka, 2013-11-16 19:19 | |
| README.zip | serhiy.storchaka, 2013-11-16 19:20 | |
Messages (8)
| msg186423 | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2013-04-09 15:03 |
Gzip files can contain an extra field, and some applications use it to extend the gzip format. The current GzipFile implementation ignores this field on input and doesn't allow creating a new file with an extra field. I propose saving the extra field data as a GzipFile attribute on reading, and adding a new parameter to the GzipFile constructor for creating a new file with an extra field. |
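For illustration only, here is a minimal sketch of what writing such a file involves today without any GzipFile support. The helper name `gzip_with_extra()` is hypothetical (it is not the API from the attached patch); it assembles a single gzip member by hand following the RFC 1952 layout, using `struct` for the header and trailer and `zlib` with negative `wbits` for the raw deflate body. The stock `gzip.decompress()` accepts the result but silently discards the extra field.

```python
import gzip
import struct
import zlib

FEXTRA = 0x04  # FLG bit from RFC 1952, section 2.3.1


def gzip_with_extra(data: bytes, extra: bytes, mtime: int = 0) -> bytes:
    """Build one gzip member whose header carries `extra` as its FEXTRA field."""
    if len(extra) > 0xFFFF:
        raise ValueError("XLEN is 16 bits, so the extra field is limited to 65535 bytes")
    header = struct.pack(
        "<BBBBIBB",
        0x1F, 0x8B,  # ID1, ID2: gzip magic
        8,           # CM: deflate
        FEXTRA,      # FLG: only FEXTRA is set
        mtime,       # MTIME: little-endian uint32, 0 = not available
        0,           # XFL
        255,         # OS: unknown
    )
    header += struct.pack("<H", len(extra)) + extra  # XLEN, then the payload
    # Negative wbits asks zlib for a raw deflate stream with no zlib wrapper.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(data) + compressor.flush()
    # Trailer: CRC-32 of the uncompressed data, then its size modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + body + trailer


# One RFC 1952 subfield: ID "AP", 2-byte length 4, data b"Test".
blob = gzip_with_extra(b"hello", b"AP\x04\x00Test")
assert gzip.decompress(blob) == b"hello"  # stdlib reads it, but drops the extra field
```

The extra payload in the usage line is a single well-formed subfield, so any reader that parses subfields would see one `AP` entry.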
|||
| msg190295 | Author: Dmi Baranov (dmi.baranov) | Date: 2013-05-29 12:07 |
I'll be glad to do it, but I have some questions to discuss. First, about the FEXTRA format: it consists of a series of subfields [1], and the extra field used by the current Lib/test/test_gzip.py :: test_read_with_extra is slightly incorrect, at least if one follows the format from RFC 1952. Do you have real samples with an extra field? Should we parse the subfields here (I have already asked Jean-Loup Gailly, the maintainer of the registry of subfield IDs, for the current registry values and am waiting for a reply), or just expose the extra header as a byte string? Next, about GzipFile's public interface: GzipFile(...).extra looks ugly. Should I extend this ticket to support all metadata headers? FNAME, FCOMMENT, FHCRC, etc. are read correctly now, but there is no way to access them from outside (and no way to create a file with FCOMMENT or FHCRC). E.g., something like this:
GzipFile(...).metadata.FNAME == 'sample.gz'
GzipFile(..., extra=b'AP6Test', comment='comment')
[1] http://tools.ietf.org/html/rfc1952#section-2.3.1.1 |
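The subfield layout in RFC 1952 section 2.3.1.1 is simple enough to sketch: each subfield is two ID bytes (SI1, SI2), a little-endian 2-byte length, and that many data bytes. The helpers below, `parse_subfields()` and `read_gzip_header()`, are hypothetical names (not part of the `gzip` module): one splits an extra field into subfields, the other pulls FEXTRA, FNAME and FCOMMENT out of a header in the order the RFC specifies, roughly the kind of "metadata" object suggested above. FHCRC is skipped without being verified.

```python
from __future__ import annotations

import struct

# FLG bits from RFC 1952, section 2.3.1
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


def parse_subfields(extra: bytes) -> list[tuple[bytes, bytes]]:
    """Split an FEXTRA payload into (SI1+SI2, data) pairs."""
    fields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        si = extra[pos:pos + 2]                               # SI1, SI2
        (length,) = struct.unpack_from("<H", extra, pos + 2)  # LEN
        end = pos + 4 + length
        if end > len(extra):
            raise ValueError(f"subfield {si!r} overruns the extra field")
        fields.append((si, extra[pos + 4:end]))
        pos = end
    return fields


def read_gzip_header(data: bytes) -> dict:
    """Return the optional header fields of the gzip member at data[0]."""
    if data[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip member")
    flags = data[3]
    (mtime,) = struct.unpack_from("<I", data, 4)
    meta = {"mtime": mtime, "extra": b"", "name": None, "comment": None}
    pos = 10  # end of the fixed-size part of the header
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        meta["extra"] = data[pos + 2:pos + 2 + xlen]
        pos += 2 + xlen
    # FNAME, then FCOMMENT: zero-terminated ISO 8859-1 strings, in this order.
    for flag, key in ((FNAME, "name"), (FCOMMENT, "comment")):
        if flags & flag:
            end = data.index(b"\x00", pos)
            meta[key] = data[pos:end].decode("latin-1")
            pos = end + 1
    if flags & FHCRC:
        pos += 2  # CRC16 of the header; this sketch does not verify it
    meta["data_offset"] = pos  # where the compressed blocks begin
    return meta


print(parse_subfields(b"RA\x08\x00\x01\x00\xcb\xe3\x01\x00T\x0b"))
```

Applied to the README.dz extra field that appears later in this thread, `parse_subfields()` yields a single `RA` subfield, the same pair the patch's `extra_map` reports.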
|||
| msg190301 | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2013-05-29 12:44 |
I have an almost-ready patch, but I have doubts about the interface; it can be discussed. ZIP file entries have a similar extra field, and I'm planning to add a similar feature to the zipfile module too. Here are preliminary patches. |
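ZIP extra fields use a comparable record layout, described in PKWARE's APPNOTE.TXT: a little-endian 2-byte header ID, a 2-byte data size, then the data. The stdlib already exposes the raw bytes as `ZipInfo.extra`, so only the parsing step is missing. A sketch of that step, using a hypothetical helper `parse_zip_extra()`, follows; unlike gzip subfield IDs, ZIP header IDs are read as integers, which matches the int keys in the zipfile session in msg203077.

```python
from __future__ import annotations

import struct


def parse_zip_extra(extra: bytes) -> list[tuple[int, bytes]]:
    """Split a ZIP extra field into (header ID, data) pairs.

    Each record is a little-endian uint16 header ID, a uint16 data size,
    then that many data bytes. Trailing bytes too short to form a record
    header are ignored in this sketch.
    """
    records = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        end = pos + 4 + size
        if end > len(extra):
            raise ValueError(f"extra record 0x{header_id:04x} overruns the field")
        records.append((header_id, extra[pos + 4:end]))
        pos = end
    return records


# The `extra` bytes of the first entry in README.zip, as shown in msg203077.
sample = b"UT\x05\x00\x03\xe0\xc3\x87Rux\x0b\x00\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00"
print(parse_zip_extra(sample))
```

The two IDs in the sample are 0x5455 (21589, the Info-ZIP extended timestamp `UT`) and 0x7875 (30837, the Info-ZIP Unix UID/GID field `ux`).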
|||
| msg203077 | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2013-11-16 19:24 |
Some examples:
>>> import zipfile
>>> z = zipfile.ZipFile('README.zip')
>>> z.filelist[0].extra
b'UT\x05\x00\x03\xe0\xc3\x87Rux\x0b\x00\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00'
>>> z.filelist[0].extra_map
<zipfile.ExtraMap object at 0xb6fe8bec>
>>> list(z.filelist[0].extra_map.items())
[(21589, b'\x03\xe0\xc3\x87R'), (30837, b'\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00')]
>>> import gzip
>>> gz = gzip.open('README.dz')
>>> gz.extra_bytes
b''
>>> gz.extra_map
<gzip.ExtraMap object at 0xb6fd04ac>
>>> list(gz.extra_map.items())
[]
>>> gz.read(1)
b'T'
>>> gz.extra_bytes
b'RA\x08\x00\x01\x00\xcb\xe3\x01\x00T\x0b'
>>> list(gz.extra_map.items())
[(b'RA', b'\x01\x00\xcb\xe3\x01\x00T\x0b')]
|
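As a point of comparison, the raw side of the zipfile part already works with the current stdlib: `zipfile.ZipInfo.extra` is a plain bytes attribute that `ZipFile.writestr()` stores and that comes back on reading. The sketch below writes one extra record with an arbitrary header ID (0x4242, picked only for this example) to an in-memory archive and reads it back; what it lacks is the parsed mapping that `extra_map` provides in the patch.

```python
import io
import struct
import zipfile

EXAMPLE_ID = 0x4242  # arbitrary header ID, used only for this sketch
payload = b"hello"
extra = struct.pack("<HH", EXAMPLE_ID, len(payload)) + payload

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("README")
    info.extra = extra  # raw extra bytes, written into the entry's headers
    zf.writestr(info, b"Some text")

with zipfile.ZipFile(buf) as zf:
    raw = zf.infolist()[0].extra  # read back from the central directory
    header_id, size = struct.unpack_from("<HH", raw, 0)
    print(hex(header_id), raw[4:4 + size])
```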
|||
| msg365626 | Author: Jason Williams (Jason Williams) | Date: 2020-04-02 20:51 |
What's needed to get this integrated? It would be great not to have to fork the gzip module. |
|||
| msg391612 | Author: Alex Mijalis (amijalis) | Date: 2021-04-22 16:45 |
Agreed, it would be really nice to integrate these changes. These extra fields appear in gzip-compressed .bam files, a common DNA sequence alignment format in the bioinformatics community, so being able to read and write them with the standard library would be very useful. |
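BGZF, the block-compressed gzip variant used by BAM files, is a concrete case: according to the SAM/BAM format specification, every BGZF block is a gzip member whose FEXTRA field holds a `BC` subfield with a 2-byte BSIZE (total block size minus 1), which lets readers hop from block to block without decompressing. A minimal sketch of reading it with `struct`, tried on the specification's 28-byte end-of-file marker block; `bgzf_block_size()` is a hypothetical helper, not an existing API.

```python
import gzip
import struct

# The 28-byte BGZF end-of-file marker block from the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def bgzf_block_size(block: bytes) -> int:
    """Return the total size in bytes of the BGZF block starting at block[0]."""
    if block[:3] != b"\x1f\x8b\x08" or not block[3] & 0x04:
        raise ValueError("not a deflate gzip member with FEXTRA set")
    (xlen,) = struct.unpack_from("<H", block, 10)
    pos, end = 12, 12 + xlen
    # Walk the RFC 1952 subfields looking for SI1='B', SI2='C', SLEN=2.
    while pos + 4 <= end:
        si = block[pos:pos + 2]
        (slen,) = struct.unpack_from("<H", block, pos + 2)
        if si == b"BC" and slen == 2:
            (bsize,) = struct.unpack_from("<H", block, pos + 4)
            return bsize + 1  # BSIZE stores the block size minus 1
        pos += 4 + slen
    raise ValueError("no BC subfield: not a BGZF block")


print(bgzf_block_size(BGZF_EOF), len(BGZF_EOF))  # prints: 28 28
print(gzip.decompress(BGZF_EOF))  # prints: b''
```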
|||
| msg393052 | Author: Benjamin Sergeant (Benjamin.Sergeant) | Date: 2021-05-05 23:23 |
There is also a comment field, which would be nice to support. Go's gzip package has a Header struct that describes all of the metadata (https://golang.org/pkg/compress/gzip/#Header). I see that mtime was made configurable in 3.8, so hopefully comment and extra can be added too. For our purposes we'd like to put arbitrary data in a gzip file, but that is complicated to do today. I might apply the patch here to the Python gzip module, but that feels a bit hackish. |
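For comparison, here is roughly what the stdlib already allows when writing: `gzip.GzipFile` takes an `mtime` argument and fills the FNAME header field from `filename` (nothing is opened on disk when `fileobj` is given), but it has no argument for FCOMMENT or FEXTRA. The sketch writes to an in-memory buffer and inspects the header bytes directly.

```python
import gzip
import io
import struct

buf = io.BytesIO()
# With fileobj given, filename= is only used for the FNAME header field
# (nothing is opened on disk); mtime= fixes the MTIME field.
with gzip.GzipFile(filename="report.txt", mode="wb", fileobj=buf, mtime=1234) as f:
    f.write(b"payload")

raw = buf.getvalue()
flags = raw[3]                               # FLG byte
(mtime,) = struct.unpack_from("<I", raw, 4)  # MTIME
name_end = raw.index(b"\x00", 10)            # FNAME is zero-terminated
print(hex(flags), mtime, raw[10:name_end])
# There is no comment= or extra= argument, so FCOMMENT and FEXTRA stay unset.
```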
|||
| msg393053 | Author: Benjamin Sergeant (Benjamin.Sergeant) | Date: 2021-05-05 23:33 |
type Header struct {
	Comment string    // comment
	Extra   []byte    // "extra data"
	ModTime time.Time // modification time
	Name    string    // file name
	OS      byte      // operating system type
}
For reference, this is what Go's header struct looks like, including the comment and extra fields.
|
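The Go struct maps naturally onto a Python dataclass. The sketch below is a hypothetical `GzipHeader` (not an existing stdlib class, and not the interface from the patch) that serializes its fields into the 10-byte fixed header followed by the optional FEXTRA, FNAME and FCOMMENT sections. It encodes the name and comment as Latin-1, the character set RFC 1952 specifies, and treats an empty `extra` as absent. The compressed blocks and the CRC-32/ISIZE trailer would follow these bytes.

```python
import struct
from dataclasses import dataclass

# FLG bits from RFC 1952, section 2.3.1
FEXTRA, FNAME, FCOMMENT = 0x04, 0x08, 0x10


@dataclass
class GzipHeader:
    """Hypothetical Python counterpart of Go's gzip.Header (not a stdlib class)."""

    comment: str = ""
    extra: bytes = b""   # written as FEXTRA when non-empty
    mtime: int = 0       # Unix seconds; 0 means "no time stamp available"
    name: str = ""
    os: int = 255        # 255 = unknown operating system

    def to_bytes(self) -> bytes:
        """Serialize the header: fixed 10 bytes, then FEXTRA, FNAME, FCOMMENT."""
        flags = 0
        tail = b""
        if self.extra:
            if len(self.extra) > 0xFFFF:
                raise ValueError("extra field exceeds 65535 bytes")
            flags |= FEXTRA
            tail += struct.pack("<H", len(self.extra)) + self.extra
        if self.name:
            flags |= FNAME
            tail += self.name.encode("latin-1") + b"\x00"
        if self.comment:
            flags |= FCOMMENT
            tail += self.comment.encode("latin-1") + b"\x00"
        fixed = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, flags, self.mtime, 0, self.os)
        return fixed + tail


print(GzipHeader(comment="hi", extra=b"AP\x04\x00Test", name="a.txt").to_bytes())
```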
|||
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:44 | admin | set | github: 61881 |
| 2021-05-06 07:45:21 | nikratio | set | nosy: - nikratio |
| 2021-05-05 23:33:10 | Benjamin.Sergeant | set | messages: + msg393053 |
| 2021-05-05 23:23:53 | Benjamin.Sergeant | set | nosy: + Benjamin.Sergeant; messages: + msg393052 |
| 2021-04-22 16:45:39 | amijalis | set | nosy: + amijalis; messages: + msg391612 |
| 2020-04-02 20:51:48 | Jason Williams | set | nosy: + Jason Williams; messages: + msg365626 |
| 2018-07-13 12:03:05 | serhiy.storchaka | set | versions: + Python 3.8, - Python 3.4 |
| 2014-01-24 05:23:00 | nikratio | set | nosy: + nikratio |
| 2013-11-16 19:24:33 | serhiy.storchaka | set | messages: + msg203077; stage: needs patch -> patch review |
| 2013-11-16 19:20:24 | serhiy.storchaka | set | files: + README.zip |
| 2013-11-16 19:19:58 | serhiy.storchaka | set | files: + README.dz |
| 2013-11-16 19:19:13 | serhiy.storchaka | set | files: + zipfile_extra.diff |
| 2013-11-16 19:18:40 | serhiy.storchaka | set | files: + gzip_extra.diff |
| 2013-11-16 19:17:55 | serhiy.storchaka | set | files: - zip_extra.diff |
| 2013-11-16 19:17:43 | serhiy.storchaka | set | files: - gzip_extra.diff |
| 2013-05-29 12:45:13 | serhiy.storchaka | set | files: + zip_extra.diff |
| 2013-05-29 12:44:36 | serhiy.storchaka | set | files: + gzip_extra.diff; keywords: + patch; messages: + msg190301; title: Work with an extra field of gzip files -> Work with an extra field of gzip and zip files |
| 2013-05-29 12:07:24 | dmi.baranov | set | nosy: + dmi.baranov; messages: + msg190295 |
| 2013-04-09 15:03:01 | serhiy.storchaka | create | |
