Issue38032
Created on 2019-09-04 23:21 by JustinTArthur, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files
| File name | Uploaded | Description |
|---|---|---|
| badvar.py | JustinTArthur, 2019-09-04 23:21 | Module demonstrating non-word continuation characters |
| Messages (8) | |||
|---|---|---|---|
| msg351153 - (view) | Author: Justin Arthur (JustinTArthur) * | Date: 2019-09-04 23:21 | |
Python 3 code containing an identifier with a non-spacing mark cannot be tokenized by lib2to3, and parsing it raises an exception.
Parsing the attached file (badvar.py) results in `ParseError: bad token: type=58, value='̇', context=('', (1, 1))`
This happens because the Name pattern regular expression in lib2to3 is `r'\w+'`, and the word character class doesn't contain non-spacing marks (and possibly other [continuation characters allowed in Python 3 identifiers](https://docs.python.org/3/reference/lexical_analysis.html#identifiers)).
(reported by energizer in the Python IRC channel)
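The mismatch described above can be sketched with the standard library alone. This is a minimal illustration, not lib2to3's actual code path: it assumes the identifier is `i` followed by U+0307 (COMBINING DOT ABOVE), and `NAME_PATTERN` is a hypothetical stand-in for the tokenizer's Name pattern.

```python
import re
import unicodedata

# Assumed sample identifier: "i" followed by U+0307 COMBINING DOT ABOVE.
name = "i\u0307"

# Hypothetical stand-in for the Name pattern quoted in the report.
NAME_PATTERN = re.compile(r"\w+")

match = NAME_PATTERN.match(name)
matched = match.group(0)        # the part the pattern accepts
leftover = name[match.end():]   # the part left over for the next token

# The leftover character is a non-spacing mark (general category "Mn").
category = unicodedata.category(leftover)

# Python 3 itself accepts the whole string as an identifier.
is_valid = name.isidentifier()

print(repr(matched), repr(leftover), category, is_valid)
```

The pattern stops after `i`, so the combining mark is left on its own and becomes the "bad token" in the error, even though `str.isidentifier()` accepts the full name.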
|
|||
| msg351215 - (view) | Author: Ned Deily (ned.deily) * |
Date: 2019-09-05 17:39 | |
"2to3 is a Python program that reads Python 2.x source code and applies a series of fixers to transform it into valid Python 3.x code." The example you supply, badvar.py, is not a valid Python 2.x program. Python 2 identifiers cannot contain such characters. https://docs.python.org/3/library/2to3.html https://docs.python.org/2/reference/lexical_analysis.html#identifiers https://docs.python.org/3/reference/lexical_analysis.html#identifiers |
|||
| msg351220 - (view) | Author: Justin Arthur (JustinTArthur) * | Date: 2019-09-05 18:57 | |
Ned, can you confirm that 2to3 is not intended for cumulative/incremental runs over the same codebase? If it's not intended to be run on previously ported code, this will just need to be fixed in the downstream lib2to3-based projects, like awpa and Black, that are encountering this issue. |
|||
| msg351224 - (view) | Author: Ned Deily (ned.deily) * |
Date: 2019-09-06 02:48 | |
Benjamin, can you answer Justin's question above? |
|||
| msg351226 - (view) | Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2019-09-06 04:10 | |
2to3 should be able to parse valid Python 3 code. |
|||
| msg351227 - (view) | Author: Ned Deily (ned.deily) * |
Date: 2019-09-06 04:26 | |
> 2to3 should be able to parse valid Python 3 code.
OK, then should the original behavior here be treated as a bug and fixed? If so, this issue should be re-opened. |
|||
| msg356957 - (view) | Author: Batuhan Taskaya (BTaskaya) * |
Date: 2019-11-19 09:50 | |
Is there a consensus about fixing this? By the way, this isn't valid in the current tokenizer either:
1,0-1,2: NAME 'iÌ'
1,2-1,3: ERRORTOKEN '‡'
1,4-1,5: OP '='
1,6-1,7: NUMBER '5'
1,7-1,8: NEWLINE '\n' |
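A token listing of this kind can be reproduced with the standard `tokenize` module. This is a minimal sketch that assumes the source line is `i̇ = 5`, with the identifier written as `i` plus U+0307. The exact tokens and column positions depend on the interpreter version and on how the source text was decoded, so no expected output is shown.

```python
import io
import tokenize

# Assumed one-line source: identifier "i" + U+0307, assigned the value 5.
source = "i\u0307 = 5\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    start = "{},{}".format(*tok.start)
    end = "{},{}".format(*tok.end)
    # tok_name maps the numeric token type to a name such as NAME or OP.
    print(f"{start}-{end}: {tokenize.tok_name[tok.type]} {tok.string!r}")
```

On interpreters whose `tokenize` module matches names with `\w+`, the combining mark is expected to appear as a separate ERRORTOKEN. Python 3.12 and later tokenize through the C tokenizer and may report the whole identifier as a single NAME.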
|||
| msg377798 - (view) | Author: Justin Arthur (JustinTArthur) * | Date: 2020-10-02 04:52 | |
I'm not sure there is consensus on how to fix this, but fixing #12731 would, as a side effect, resolve most of the cases I've seen complaints about. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:19 | admin | set | github: 82213 |
| 2021-10-20 22:55:19 | iritkatriel | set | status: open -> closed superseder: Close 2to3 issues and list them here resolution: wont fix stage: needs patch -> resolved |
| 2020-10-02 04:52:07 | JustinTArthur | set | messages: + msg377798 |
| 2019-11-19 09:50:19 | BTaskaya | set | nosy: + BTaskaya messages: + msg356957 |
| 2019-09-06 04:38:14 | ned.deily | set | nosy: - ned.deily stage: resolved -> needs patch versions: + Python 3.9, - Python 3.5, Python 3.6 |
| 2019-09-06 04:30:11 | benjamin.peterson | set | status: closed -> open resolution: not a bug -> (no value) |
| 2019-09-06 04:26:42 | ned.deily | set | messages: + msg351227 |
| 2019-09-06 04:10:06 | benjamin.peterson | set | messages: + msg351226 |
| 2019-09-06 02:48:28 | ned.deily | set | nosy: + benjamin.peterson messages: + msg351224 |
| 2019-09-05 18:57:03 | JustinTArthur | set | messages: + msg351220 |
| 2019-09-05 17:39:39 | ned.deily | set | status: open -> closed nosy: + ned.deily resolution: not a bug |
| 2019-09-04 23:21:42 | JustinTArthur | create | |
