Regexp finditer() fails to match some non-overlapping matches?
John Machin
sjmachin at lexicon.net
Sat May 3 18:12:17 EDT 2003
More information about the Python-list mailing list
- Previous message (by thread): Regexp finditer() fails to match some non-overlapping matches?
- Next message (by thread): Any work on Pippy -> Python 2.2.x+?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
philipj at telia.com (Philip Jägenstedt) wrote in message news:<313b626e.0305031036.1fcfeab2 at posting.google.com>...
> import re
> str="__Bullet lists__"
> pattern = r"^|__.+__"
> rules = re.compile(pattern)
> for m in rules.finditer(str):
>     print m.start(), m.end(), m.group()
>
> In this case, the string "__Bullet lists__" will not be matched,
> because there is the zero-length match before it.
>
> So what it boils down to is: why doesn't finditer() match both the
> beginning of the line, and some other thing that lives at the
> beginning of the line?
>
> For example:
>
> |_|_|b|o|l|d|_|_|
> 0 1 2 3 4 5 6 7 8
>
> I'd like to have a zero-length match (0-0) since there are no # or *
> characters, and then the 0-8 match for __bold__. But, I cannot see how
> to do it.
>
> I have the problem using Debian GNU/Linux testing, with python 2.2.1.
> If any other information is needed, do ask!

You will have the problem on any OS with any version of Python, with not only finditer() but also with findall() and sub() ... and, I'll cheerfully wager without having ever used them, the same applies to the corresponding facilities in Perl, Ruby, etc etc. Likewise in any text editor that supports regular expressions. Try g/your_pattern/s//foo/g in vi and see how many foos you get at the start of each line.

The "problem" is that implementors *don't* regard your examples as "non-overlapping" -- they start at the same position in the text. An RE searching engine will always advance its input position by one character after a zero-length match otherwise it would loop endlessly.

Your solution should be relatively simple: abandon the canned loop of finditer(), write your own loop which searches for the next occurrence of one of your multiple patterns of interest, then examines which one or more are actually present starting at the match point. Be careful about advancing the search point after a match.
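A rough sketch of such a hand-written loop follows, in case it helps. It is only one way to do it, not a definitive implementation: the RULES table, the rule names "line_start" and "bold", and the scan() function are all made up for illustration, and the single pattern from the question has been split into one compiled pattern per rule. It does not depend on how finditer() treats zero-length matches.

```python
import re

# Hypothetical rule table: the names and the split into separate
# patterns are assumptions made for this illustration.
RULES = [
    ("line_start", re.compile(r"^")),
    ("bold", re.compile(r"__.+__")),
]

def scan(text):
    """Yield (rule_name, start, end, matched_text) for every rule that
    matches, including several rules starting at the same position."""
    pos = 0
    while pos <= len(text):
        # Search for the next occurrence of each pattern from pos onward.
        hits = [m for m in (rx.search(text, pos) for _, rx in RULES) if m]
        if not hits:
            break
        # The match point is the earliest position where any rule matched.
        start = min(m.start() for m in hits)
        # Examine which rules are actually present at the match point.
        furthest = start
        for name, rx in RULES:
            m = rx.match(text, start)
            if m:
                yield name, m.start(), m.end(), m.group()
                furthest = max(furthest, m.end())
        # Advance carefully: past the longest match, or by one character
        # if every match here was zero-length (otherwise we loop forever).
        pos = furthest if furthest > start else start + 1

for hit in scan("__Bullet lists__"):
    print(hit)
# ('line_start', 0, 0, '')
# ('bold', 0, 16, '__Bullet lists__')
```

Skipping to the end of the longest match is a design choice: it means a rule cannot match inside text already claimed by another rule. If nested markup matters, advance by one character every time instead.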
You may like to submit a documentation enhancement request, to the effect that "non-overlapping" needs clarification.

HTH,
John