Issue25643
Created on 2015-11-17 01:27 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files
| File name | Uploaded | Description |
|---|---|---|
| tokenize_input.patch | serhiy.storchaka, 2015-11-17 01:27 | |
Pull Requests
| URL | Status | Linked |
|---|---|---|
| PR 25050 | merged | pablogsal, 2021-03-28 04:12 |
| PR 25080 | merged | pablogsal, 2021-03-29 21:53 |
Messages (7)
| msg254778 | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2015-11-17 01:27 |
Here is a preliminary patch that refactors the lowest level of the Python tokenizer: reading and decoding. It splits the code into smaller, simpler functions, reduces the source size by 37 lines, and fixes bugs: issue14811, issue18961, and a number of others. Tests are added for most of the fixed bugs (except leaks and others that are hard to reproduce). But fixing the other bugs can be harder, especially the issues with null bytes (issue1105770, issue20115). Many bugs could be fixed easily by reading the whole Python file into memory instead of reading it line by line. I don't know if that is acceptable. |
|||
| msg255082 | Author: Stéphane Wirtel (matrixise) | Date: 2015-11-22 06:29 |
Hi Serhiy,
Just for your information, though I think you already know: the tests pass ;-)
[398/399] test_multiprocessing_spawn (138 sec) -- running: test_tools
(108 sec)
[399/399] test_tools (121 sec)
385 tests OK.
3 tests altered the execution environment:
test___all__ test_site test_warnings
11 tests skipped:
test_devpoll test_kqueue test_msilib test_ossaudiodev
test_startfile test_tix test_tk test_ttk_guionly test_winreg
test_winsound test_zipfile64
I am interested in this part of CPython. I am not an expert in
lexing and parsing (I am a novice in this domain), but how can I
help you?
Stephane
|
|||
| msg255355 | Author: STINNER Victor (vstinner) | Date: 2015-11-25 14:17 |
"especially for issues with null byte" I don't think that we should put too much energy into handling NUL bytes correctly. I see NUL bytes in code as bugs in the code, not in the Python parser. We *might* try to give warnings or better error messages to the user, that's all. |
|||
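To make the NUL byte discussion concrete, here is a minimal sketch of what a user sees when source text containing a NUL byte is passed to the built-in `compile()`. The helper name `compile_outcome` is hypothetical and used only for illustration. The exception type is assumed to depend on the Python version (`ValueError` in older releases, `SyntaxError` in newer ones), so the sketch accepts either.

```python
def compile_outcome(source):
    """Return "ok" if the source compiles, else the exception class name.

    Hypothetical helper for illustration; not part of CPython.
    """
    try:
        compile(source, "<example>", "exec")
    except (SyntaxError, ValueError) as exc:
        # Which of the two is raised is assumed to vary by Python version.
        return type(exc).__name__
    return "ok"


clean = compile_outcome("x = 1\n")
with_nul = compile_outcome("x = 1\x00\n")
print(clean, with_nul)
```

Either way the source is rejected with an error rather than silently truncated, which matches the "better error messages" direction suggested above.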
| msg262091 | Author: Roundup Robot (python-dev) | Date: 2016-03-20 21:30 |
New changeset 23a7481eafd4 by Serhiy Storchaka in branch 'default': Issues #25643, #26581: Added new tests for detecting Python source code encoding. https://hg.python.org/cpython/rev/23a7481eafd4 |
|||
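As a rough illustration of the source-encoding detection that those tests cover, the sketch below uses the pure-Python `tokenize.detect_encoding()` from the standard library. It is not the C tokenizer code touched by this issue, and the helper name `sniff` is hypothetical. It shows the three common cases: no declaration, a coding cookie, and a UTF-8 BOM, plus a BOM that conflicts with the cookie.

```python
import io
import tokenize


def sniff(source_bytes):
    """Return the encoding name that detect_encoding() reports.

    Hypothetical helper for illustration only.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


default = sniff(b"x = 1\n")                            # no cookie, no BOM
cookie = sniff(b"# -*- coding: latin-1 -*-\nx = 1\n")  # explicit coding cookie
bom = sniff(b"\xef\xbb\xbfx = 1\n")                    # UTF-8 byte order mark

# A UTF-8 BOM combined with a non-UTF-8 cookie is rejected.
try:
    sniff(b"\xef\xbb\xbf# coding: latin-1\nx = 1\n")
    conflict_rejected = False
except SyntaxError:
    conflict_rejected = True

print(default, cookie, bom, conflict_rejected)
```

Note that `detect_encoding()` normalizes some spellings, so the cookie `latin-1` is reported as `iso-8859-1`.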
| msg376742 | Author: Brett Cannon (brett.cannon) | Date: 2020-09-11 22:10 |
@serhiy: did you still want to commit this? |
|||
| msg389654 | Author: Pablo Galindo Salgado (pablogsal) | Date: 2021-03-28 22:48 |
New changeset 261a452a1300eeeae1428ffd6e6623329c085e2c by Pablo Galindo in branch 'master': bpo-25643: Refactor the C tokenizer into smaller, logical units (GH-25050) https://github.com/python/cpython/commit/261a452a1300eeeae1428ffd6e6623329c085e2c |
|||
| msg389692 | Author: STINNER Victor (vstinner) | Date: 2021-03-29 12:28 |
Oh, 6 years to fix this bug. Better late than never ;-) Thanks for reporting and for fixing it! |
|||
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:23 | admin | set | github: 69829 |
| 2021-04-13 17:07:04 | vstinner | link | issue14811 superseder |
| 2021-03-29 21:53:38 | pablogsal | set | pull_requests: + pull_request23830 |
| 2021-03-29 12:28:05 | vstinner | set | messages: + msg389692 |
| 2021-03-28 22:49:06 | pablogsal | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2021-03-28 22:48:13 | pablogsal | set | messages: + msg389654 |
| 2021-03-28 04:12:55 | pablogsal | set | keywords: + patch; nosy: + pablogsal; pull_requests: + pull_request23799 |
| 2020-09-11 22:10:31 | brett.cannon | set | messages: + msg376742 |
| 2017-03-14 14:57:52 | serhiy.storchaka | set | keywords: - patch; versions: + Python 3.7, - Python 3.6 |
| 2017-03-14 14:29:12 | Jim Fasarakis-Hilliard | set | nosy: + Jim Fasarakis-Hilliard |
| 2017-03-14 13:52:27 | serhiy.storchaka | link | issue3353 dependencies |
| 2016-03-20 21:30:29 | python-dev | set | nosy: + python-dev; messages: + msg262091 |
| 2016-03-17 12:04:22 | serhiy.storchaka | set | dependencies: + Double coding cookie |
| 2015-11-25 14:17:10 | vstinner | set | nosy: + vstinner; messages: + msg255355 |
| 2015-11-22 06:29:50 | matrixise | set | messages: + msg255082 |
| 2015-11-22 04:47:25 | matrixise | set | nosy: + matrixise |
| 2015-11-17 17:42:47 | brett.cannon | set | nosy: + brett.cannon |
| 2015-11-17 17:22:56 | yselivanov | set | nosy: + yselivanov |
| 2015-11-17 01:27:33 | serhiy.storchaka | create | |

