Issue 32067
Created on 2017-11-18 11:12 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 4454 | closed | serhiy.storchaka, 2017-11-18 18:08 |
| Messages (2) | |||
|---|---|---|---|
| msg306479 | Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-11-18 11:12 | |
Currently `{m}`, `{m,n}`, `{m,}` and `{,n}` where m and n are non-negative decimal numbers are accepted in regular expressions as quantifiers that mean repeating the previous RE from m (0 by default) to n (infinity by default) times.
But if the opening brace '{' is not followed by one of the above patterns, it is treated as a literal '{'.
```pycon
>>> import re
>>> re.search('(foobar){e}', 'xirefoabralfobarxie')
>>> re.search('(foobar){e}', 'foobar{e}')
<re.Match object; span=(0, 9), match='foobar{e}'>
```
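One rough way to see which brace forms `re` reads as quantifiers, and which it silently turns into a literal `{`, is to let a pattern try to match its own source text. A quantifier such as `a{2}` means `aa`, so it cannot match the string `a{2}`. A literal brace, by contrast, matches itself. The helper name `brace_read_as_quantifier` is hypothetical, and the trick only works for simple suffixes like the ones below.

```python
import re


def brace_read_as_quantifier(suffix: str) -> bool:
    """Return True if re parses `suffix` (e.g. "{2}") as a repetition count.

    Illustrative heuristic: build the pattern "a" + suffix and match it
    against its own source text. A quantifier like "a{2}" means "aa" and
    cannot match the literal string "a{2}"; a brace that re treats as a
    literal character does match itself.
    """
    pattern = "a" + suffix
    return re.fullmatch(pattern, pattern) is None


for suffix in ["{2}", "{1,3}", "{2,}", "{,2}", "{e}", "{1.2}", "{}"]:
    kind = "quantifier" if brace_read_as_quantifier(suffix) else "literal '{'"
    print(f"a{suffix:<6} -> {kind}")
```

Note that `a{1.2}` is accepted without complaint: the `{` becomes a literal, and the `.` inside it still matches any character.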
This conflicts with the third-party regex module, which uses braces to define "fuzzy" matching constraints.
```pycon
>>> import regex
>>> regex.search('(foobar){e}', 'xirefoabralfobarxie')
<regex.Match object; span=(0, 6), match='xirefo', fuzzy_counts=(6, 0, 0)>
>>> regex.search('(foobar){e}', 'foobar{e}')
<regex.Match object; span=(0, 6), match='foobar'>
```
I don't think it is worth adding support for fuzzy matching to the re module, but for compatibility it would be better to raise an error or a warning when '{' is not followed by one of the recognized patterns. This could also help catch typos and errors in regular expressions, e.g. '-{1.2}' or '-{1, 2}' instead of '-{1,2}'.
Possible variants:
1. Emit a DeprecationWarning in 3.7 (and 2.7.15 with the -3 option), raise a re.error in 3.8 or 3.9.
2. Emit a PendingDeprecationWarning in 3.7, a DeprecationWarning in 3.8, and raise a re.error in 3.9 or 3.10.
3. Emit a RuntimeWarning or SyntaxWarning in 3.7 and forever.
4. Emit a FutureWarning in 3.7, and implement fuzzy matching or replace re with regex at some point in the future (unlikely).
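Below is a rough sketch of what the proposed check might look like, using the DeprecationWarning from variant 1. It is not the code from PR 4454. The names `unrecognized_braces` and `warn_on_literal_braces` are hypothetical. The scanner assumes a `str` pattern compiled without `re.VERBOSE`. It skips escapes, including `\N{...}`, and character classes, but it ignores verbose-mode whitespace and comments. The quantifier grammar mirrors the forms listed above.

```python
import re
import warnings

# Brace bodies that CPython's re parser accepts as a repetition quantifier:
# {m}, {m,n}, {m,} and {,n} (the parser also happens to accept {,}).
_QUANTIFIER = re.compile(r"\{(?:\d+|\d*,\d*)\}")


def unrecognized_braces(pattern: str) -> list[int]:
    """Return the indices of '{' characters that re would treat as literals."""
    positions = []
    i, in_class = 0, False
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            # \N{NAME} (Python 3.8+) uses braces that belong to the escape itself.
            if pattern.startswith("N{", i + 1):
                end = pattern.find("}", i + 3)
                i = n if end == -1 else end + 1
            else:
                i += 2  # skip the backslash and the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False  # braces inside [...] are always literal
        elif ch == "[":
            in_class = True
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1  # a leading ']' is a literal member of the class
            continue
        elif ch == "{" and not _QUANTIFIER.match(pattern, i):
            positions.append(i)
        i += 1
    return positions


def warn_on_literal_braces(pattern: str) -> None:
    """Hypothetical check in the spirit of variant 1 above."""
    for pos in unrecognized_braces(pattern):
        warnings.warn(
            f"'{{' at position {pos} of {pattern!r} is not a quantifier "
            "and is treated as a literal",
            DeprecationWarning,
            stacklevel=2,
        )
```

With this check, `warn_on_literal_braces('-{1, 2}')` would flag the typo mentioned above, while `'-{1,2}'` and escaped braces such as `\{e\}` would pass without a warning.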
|
|||
| msg306491 | Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-11-18 18:14 | |
Since this will require changing regular expressions in several places in the stdlib, I have chosen to emit a PendingDeprecationWarning with a long deprecation period. But I'm now not sure that this is a good idea: unescaped braces may be widely used in the wild. It may be better to use another heuristic for recognizing fuzzy matching and other possible extensions that use braces. |
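One possible narrower heuristic, in the direction suggested here, is to flag only brace groups whose body mentions one of the error types used in the regex module's fuzzy constraints: `e` (any error), `i` (insertion), `d` (deletion) or `s` (substitution). Other unescaped braces would be left alone. This is an illustrative sketch only. `fuzzy_like_braces` is a hypothetical name, and the check does not understand character classes or escape sequences other than a backslash directly before `{`.

```python
import re

# Hypothetical heuristic (not part of re): a brace group counts as "fuzzy-like"
# if its body contains e, i, d or s as a standalone letter, as in {e},
# {e<=2}, {i<=1,d<=1} or {2i+2d+1s<=4}. Escaped braces (\{) are skipped.
_FUZZY_LIKE = re.compile(r"(?<!\\)\{[^{}]*(?<![A-Za-z])[eids](?![A-Za-z])[^{}]*\}")


def fuzzy_like_braces(pattern: str) -> list[str]:
    """Return the brace groups in `pattern` that resemble fuzzy constraints."""
    return [m.group() for m in _FUZZY_LIKE.finditer(pattern)]


print(fuzzy_like_braces("(foobar){e}"))    # a candidate for a warning
print(fuzzy_like_braces("x{name}-{1.2}"))  # ordinary literal braces, not flagged
```

The trade-off is the one raised in this message: a narrow check leaves existing patterns that rely on literal braces untouched, but it will no longer catch typos such as '-{1.2}'.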
|||
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:54 | admin | set | github: 76248 |
| 2018-12-23 10:14:12 | serhiy.storchaka | set | status: open -> closed resolution: rejected stage: patch review -> resolved |
| 2017-11-24 17:51:14 | jwilk | set | nosy: + jwilk |
| 2017-11-18 18:14:02 | serhiy.storchaka | set | messages: + msg306491 |
| 2017-11-18 18:08:22 | serhiy.storchaka | set | keywords: + patch stage: patch review pull_requests: + pull_request4392 |
| 2017-11-18 11:46:05 | serhiy.storchaka | set | title: Deprecate accepting -> Deprecate accepting unrecognized braces in regular expressions |
| 2017-11-18 11:12:27 | serhiy.storchaka | create | |
