Issue2660
Created on 2008-04-19 21:04 by azverkan, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| 2to3bug.py | azverkan, 2008-04-19 21:04 | testcase |
| 2to3_encoding.patch | vstinner, 2009-05-04 20:55 | |
| Messages (8) | |||
|---|---|---|---|
| msg65637 - (view) | Author: Brandon Ehle (azverkan) | Date: 2008-04-19 21:04 | |
While running the 2to3 script on the scons codebase, I ran into an
UnicodeDecodeError.
Attached is just the portion of the script that causes the error.
2to3 throws an error on the string regardless of whether the unicode
string prefix (u'') is prepended to it.
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: ws_comma
Traceback (most recent call last):
  File "/usr/local/bin/2to3", line 5, in <module>
    sys.exit(refactor.main())
  File "/usr/local/lib/python3.0/lib2to3/refactor.py", line 81, in main
    rt.refactor_args(args)
  File "/usr/local/lib/python3.0/lib2to3/refactor.py", line 188, in refactor_args
    self.refactor_file(arg)
  File "/usr/local/lib/python3.0/lib2to3/refactor.py", line 217, in refactor_file
    input = f.read() + "\n" # Silence certain parse errors
  File "/usr/local/lib/python3.0/io.py", line 1611, in read
    decoder.decode(self.buffer.read(), final=True))
  File "/usr/local/lib/python3.0/io.py", line 1199, in decode
    output = self.decoder.decode(input, final=final)
  File "/usr/local/lib/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 59-60: invalid data
|
|||
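A minimal sketch of the failure mode in the traceback above, assuming the file contains a non-ASCII character saved as ISO-8859-1 (as the original issue title suggests). The one-line source used here is a hypothetical stand-in, not the attached 2to3bug.py.

```python
# Hypothetical stand-in for the testcase: one line of Python 2 source with a
# non-ASCII character, encoded as ISO-8859-1 the way it would be stored on disk.
source_bytes = "s = u'caf\u00e9'\n".encode("iso-8859-1")

error = None
try:
    # Decoding as UTF-8 fails because the lone byte 0xE9 is not valid UTF-8 here.
    source_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the file was actually written in succeeds.
text = source_bytes.decode("iso-8859-1")
```

The same bytes decode cleanly once the right codec is used, which is why the choice of encoding when opening the file matters.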
| msg65638 - (view) | Author: Collin Winter (collinwinter) | Date: 2008-04-19 21:48 | |
2to3 running under Python 2.5.1 handles this file just fine. 2to3 running under 3.0a4+ (r62404) fails as detailed below.
However, that file doesn't run correctly under Python itself:
collinwinter@Silves:~/src/python/py3k$ ./python /home/collinwinter/Desktop/2to3bug.py
  File "/home/collinwinter/Desktop/2to3bug.py", line 3
collinwinter@Silves:~/src/python/py3k
This suggests this problem isn't 2to3-specific. Refiling this issue against py3k's Unicode support.
|
|||
| msg65641 - (view) | Author: Brandon Ehle (azverkan) | Date: 2008-04-20 01:38 | |
Someone on the #python IRC channel suggested that the default for unicode string literals in Python 3.0 is reversed from Python 2.5. If you remove the unicode string prefix (u'') from the front of the string, it runs fine under Python 3.0 and fails under 2.5 and 2.6 instead. |
|||
| msg65642 - (view) | Author: Brandon Ehle (azverkan) | Date: 2008-04-20 01:40 | |
Also, I can confirm that running 2to3 with Python 2.6 correctly converts the script but running 2to3 with Python 3.0 results in a UnicodeDecodeError exception. |
|||
| msg86641 - (view) | Author: Daniel Diniz (ajaksu2) | Date: 2009-04-27 01:42 | |
Confirmed in py3k on rev71995. |
|||
| msg86643 - (view) | Author: Benjamin Peterson (benjamin.peterson) | Date: 2009-04-27 02:39 | |
The problem is that 2to3 just reads the file with whatever locale.getpreferredencoding() returns. It should use tokenize.detect_encoding() to discover the correct encoding to open it with. |
|||
| msg87175 - (view) | Author: STINNER Victor (vstinner) | Date: 2009-05-04 20:55 | |
Patch that uses tokenize.detect_encoding() to determine the encoding of Python scripts instead of relying on the default io.open() encoding (utf-8). We should probably write a unit test. See also the related issue: #5093 |
|||
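A rough sketch of the approach suggested in the two messages above: ask tokenize.detect_encoding() for the encoding (from a BOM or coding declaration) and reopen the file with it. This is not the attached patch; the helper name read_python_source and the temporary sample file are assumptions made for illustration.

```python
import os
import tempfile
import tokenize


def read_python_source(path):
    """Return (text, encoding) for a Python source file.

    Hypothetical helper: the encoding comes from tokenize.detect_encoding(),
    which inspects the BOM and the coding declaration in the first two lines.
    """
    with open(path, "rb") as f:
        encoding, _first_lines = tokenize.detect_encoding(f.readline)
    with open(path, "r", encoding=encoding) as f:
        return f.read(), encoding


# Usage: write a small ISO-8859-1 file with a coding declaration, then read it back.
sample = "# -*- coding: iso-8859-1 -*-\ns = 'caf\u00e9'\n".encode("iso-8859-1")
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
    tmp.write(sample)
try:
    text, encoding = read_python_source(tmp.name)
finally:
    os.remove(tmp.name)
```

Note that detect_encoding() falls back to UTF-8 when a file has neither a BOM nor a coding declaration, so a file like the stand-in without the first line would still fail to decode.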
| msg87481 - (view) | Author: Benjamin Peterson (benjamin.peterson) | Date: 2009-05-09 00:33 | |
Fixed in r72491. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:33 | admin | set | github: 46912 |
| 2009-05-09 00:33:45 | benjamin.peterson | set | status: open -> closed resolution: fixed messages: + msg87481 |
| 2009-05-04 20:55:19 | vstinner | set | files: + 2to3_encoding.patch nosy: + vstinner keywords: + patch |
| 2009-04-27 02:39:29 | benjamin.peterson | set | messages: + msg86643 |
| 2009-04-27 01:42:30 | ajaksu2 | set | type: behavior components: + 2to3 (2.x to 3.x conversion tool) versions: + Python 2.6, Python 3.1, - Python 3.0 nosy: + ajaksu2, benjamin.peterson messages: + msg86641 |
| 2008-04-20 01:40:01 | azverkan | set | messages: + msg65642 |
| 2008-04-20 01:38:09 | azverkan | set | messages: + msg65641 |
| 2008-04-19 22:16:59 | collinwinter | set | title: 2to3 throws a utf8 decode error on a iso-8859-1 string -> Py3k fails to parse a file with an iso-8859-1 string |
| 2008-04-19 21:48:49 | collinwinter | set | priority: high assignee: collinwinter -> messages: + msg65638 components: + Unicode, - 2to3 (2.x to 3.x conversion tool) |
| 2008-04-19 21:04:59 | azverkan | create | |

