Issue36397
Created on 2019-03-22 02:48 by Elias Tarhini, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Messages (4) | |||
|---|---|---|---|
| msg338581 - (view) | Author: Elias Tarhini (Elias Tarhini) | Date: 2019-03-22 02:48 | |
I believe I've found a bug in the `re` module -- specifically, in the 3.7+ support for splitting on zero-width patterns. Compare Java's behavior...
jshell> "1211".split("(?<=(\\d))(?!\\1)(?=\\d)");
$1 ==> String[3] { "1", "2", "11" }
...with Python's:
>>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '1211')
['1', '1', '2', '2', '11']
(The pattern itself is pretty straightforward in design, but regex syntax can cloud things, so to be totally clear: it finds any point that follows a digit and precedes a *different* digit.)
* Tested on 3.7.1 win10 and 3.7.0 linux.
|
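To make the description of the pattern concrete, here is a minimal sketch that lists the positions where it matches in the example string. The helper name `boundary_positions` is hypothetical and is not part of the report.

```python
import re

# Same pattern as in the report: the position must follow a digit (captured
# as group 1), must not be followed by that same digit, and must be followed
# by some digit.
PATTERN = re.compile(r'(?<=(\d))(?!\1)(?=\d)')


def boundary_positions(text):
    """Return the indices at which the zero-width pattern matches."""
    return [m.start() for m in PATTERN.finditer(text)]


print(boundary_positions('1211'))  # [1, 2]
```

Both matches are empty, so these indices are exactly the points at which `re.split` cuts the string.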
|||
| msg338582 - (view) | Author: Matthew Barnett (mrabarnett) | Date: 2019-03-22 03:26 | |
From the docs: """If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.""" The pattern does contain a capture, so that's why the result has additional '1' and '2'. Presumably, Java's split doesn't do that. Not a bug. |
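The documented behaviour is easier to see with a simpler delimiter. The following is only an illustrative sketch; the sample string and variable names are assumptions, not taken from the report.

```python
import re

text = 'a-b-c'

# No capturing group: only the substrings between the splits are returned.
plain = re.split(r'-', text)

# Capturing group: the text of the group is inserted after each substring.
captured = re.split(r'(-)', text)

# Non-capturing group: behaves like the plain pattern.
non_capturing = re.split(r'(?:-)', text)

print(plain)          # ['a', 'b', 'c']
print(captured)       # ['a', '-', 'b', '-', 'c']
print(non_capturing)  # ['a', 'b', 'c']
```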
|||
| msg338704 - (view) | Author: Elias Tarhini (Elias Tarhini) | Date: 2019-03-23 21:51 | |
Thank you. I was too zeroed in on the idea that it came from the zero-width pattern, and I forgot to consider the group. It looks like `re.sub(pattern, 'some-delim', s).split('some-delim')` is a way to do this when it's not possible to use a non-capturing group.
|
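A minimal sketch of the `re.sub(...).split(...)` workaround mentioned above. The function name `split_without_captures` and the default NUL delimiter are assumptions made for illustration; the approach only works if the delimiter never occurs in the input, so the sketch checks for that.

```python
import re


def split_without_captures(pattern, text, delim='\x00'):
    """Split text at matches of pattern, dropping any captured groups.

    The delimiter is a placeholder (hypothetical default: NUL) and must
    not already appear in the text.
    """
    if delim in text:
        raise ValueError('delimiter already occurs in the text')
    # A function replacement avoids escape processing of the delimiter.
    marked = re.sub(pattern, lambda m: delim, text)
    return marked.split(delim)


print(split_without_captures(r'(?<=(\d))(?!\1)(?=\d)', '1211'))
# ['1', '2', '11']
```

This needs Python 3.7+, the same as the original example, because the pattern only produces empty matches.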
|||
| msg338705 - (view) | Author: Matthew Barnett (mrabarnett) | Date: 2019-03-23 22:13 | |
The list alternates between substrings (s, between the splits) and captures (c):
['1', '1', '2', '2', '11']
 -s-  -c-  -s-  -c-  -s--
You can use slicing to extract the substrings:
>>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '12111')[::2]
['1', '2', '111']
|
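The slicing trick generalises to patterns with more than one capturing group: each split contributes one entry per group, so the substrings sit at every `groups + 1`-th index. A rough sketch, with `split_substrings` as a hypothetical helper name:

```python
import re


def split_substrings(pattern, text):
    """Split text on pattern and keep only the substrings between splits."""
    compiled = re.compile(pattern)
    # compiled.groups is the number of capturing groups in the pattern;
    # re.split emits that many extra items after each substring.
    step = compiled.groups + 1
    return compiled.split(text)[::step]


print(split_substrings(r'(?<=(\d))(?!\1)(?=\d)', '12111'))
# ['1', '2', '111']
```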
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:12 | admin | set | github: 80578 |
| 2019-03-23 22:13:44 | mrabarnett | set | messages: + msg338705 |
| 2019-03-23 21:51:18 | Elias Tarhini | set | messages: + msg338704 |
| 2019-03-22 03:26:31 | mrabarnett | set | status: open -> closed resolution: not a bug messages: + msg338582 stage: resolved |
| 2019-03-22 02:48:42 | Elias Tarhini | create | |
