Issue 36061: zipfile does not handle arcnames with non-ascii characters on Windows
Created on 2019-02-21 01:59 by Shane Lee, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| test.zip | Shane Lee, 2019-02-21 01:59 | A zip file containing files with non-ascii filenames |

Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 11965 | closed | python-dev, 2019-02-21 02:19 |

Messages (2)

| msg336172 - (view) | Author: Shane Lee (Shane Lee) * | Date: 2019-02-21 01:59 | |
Python 2.7.15 (probably affects newer versions as well)
Given an archive containing any number of files whose names include non-ascii characters, `zipfile` crashes when extracting them to the file system.
```
Traceback (most recent call last):
  File "c:\dev\salt\salt\modules\archive.py", line 1081, in unzip
    zfile.extract(target, dest, password)
  File "c:\python27\lib\zipfile.py", line 1028, in extract
    return self._extract_member(member, path, pwd)
  File "c:\python27\lib\zipfile.py", line 1069, in _extract_member
    targetpath = os.path.join(targetpath, arcname)
  File "c:\python27\lib\ntpath.py", line 85, in join
    result_path = result_path + p_path
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 3: ordinal not in range(128)
```
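
The error is consistent with Python 2 implicitly decoding a byte-string path component as ASCII when it is concatenated with a unicode string. The following is a minimal sketch in Python 3, where the same decode has to be written explicitly. The member name `caf\x82.txt` is hypothetical, chosen only so that byte 0x82 sits at position 3 as in the traceback; the actual names in test.zip are not shown in this report.

```python
# Hypothetical cp437-encoded archive member name ("café.txt" in cp437).
raw_name = b"caf\x82.txt"

# Python 2 performs this ASCII decode implicitly when a byte string is
# joined with a unicode string; Python 3 requires it to be explicit.
try:
    raw_name.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding with the legacy ZIP code page succeeds for this name.
decoded_name = raw_name.decode("cp437")
print(ascii(decoded_name))
```

Decoding with cp437 works here only because the hypothetical name really was written in cp437; it does not cover every archive.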

| msg336183 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-02-21 05:40 | |
You cannot just add .decode('cp437') to arcname.
1. This will fail if the ZIP archive contains file names encoded with UTF-8. They are already unicode and contain non-ascii characters. Calling decode() on them implicitly encodes them to str first, and that will fail.
2. This will fail when targetpath is an 8-bit string containing non-ascii characters. Currently this works (maybe incorrectly).
3. While cp437 is the only official encoding in ZIP archives if UTF-8 is not used, de facto different encodings (like cp866) are used on localized Windows.
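
Point 3 can be illustrated with a short Python 3 sketch. A name written in cp866 but decoded as cp437 comes out as mojibake, and it can be recovered only when the real encoding is known from elsewhere. The helper `redecode_member_name` and the sample name are hypothetical and are not part of `zipfile`.

```python
def redecode_member_name(name, actual_encoding):
    """Undo a cp437 decode, then decode the original bytes with the
    encoding the archiver actually used (which must be known separately)."""
    return name.encode("cp437").decode(actual_encoding)


# Hypothetical Russian file name, as a localized archiver might store it.
original = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"
stored_bytes = original.encode("cp866")

# What a reader gets when it assumes the official cp437 encoding.
misdecoded = stored_bytes.decode("cp437")
print(ascii(misdecoded))

# Recovery works only because the caller supplies the right encoding.
recovered = redecode_member_name(misdecoded, "cp866")
print(recovered == original)
```

Nothing in the archive records that cp866 was used, so a hard-coded decode would be a guess.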
Fixing the problem without introducing other problems and breaking existing working code is hard. One possible solution is using Python 3.
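
As a rough sketch of why Python 3 avoids the crash: member names there are always text, and a non-ascii name is written as UTF-8 with bit 11 (0x800) of the general-purpose flags set. The archive below is built in memory, and the member name is hypothetical.

```python
import io
import zipfile

member_name = "caf\u00e9.txt"  # hypothetical non-ascii member name

# Write an archive into memory; zipfile stores the name as UTF-8.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(member_name, b"data")

# Read it back; the name is returned as str, never as bytes.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    # Bit 11 (0x800) marks the stored name as UTF-8.
    name_is_utf8 = bool(info.flag_bits & 0x800)
    read_back = info.filename
    payload = zf.read(info)

print(name_is_utf8, read_back == member_name, payload)
```

Names stored without that flag are decoded as cp437 by default, so the caveat in point 3 about localized encodings still applies.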
I suggest closing this issue as "won't fix".

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:11 | admin | set | github: 80242 |
| 2019-02-21 11:06:19 | methane | set | status: open -> closed; resolution: wont fix; stage: patch review -> resolved |
| 2019-02-21 05:40:42 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg336183 |
| 2019-02-21 02:19:01 | python-dev | set | keywords: + patch; stage: patch review; pull_requests: + pull_request11991 |
| 2019-02-21 01:59:30 | Shane Lee | create | |
