How do I get to *all* of the groups of an re search?
Bengt Richter
bokr at oz.net
Fri Jan 10 20:20:49 EST 2003
More information about the Python-list mailing list
On Fri, 10 Jan 2003 15:33:57 -0700, Andrew Dalke <adalke at mindspring.com> wrote:

>Kyler Laird wrote:
>> As it is, I am resigned to understanding that Python's re
>> module makes an arbitrary and undocumented decision to return
>> the last instance of a match for a group.  I'm embarrassed.
>
>It is documented, and behaves as documented.
>
>http://www.python.org/doc/current/lib/match-objects.html
>] If a group number is negative or larger than the number of
>] groups defined in the pattern, an IndexError exception is
>] raised. If a group is contained in a part of the pattern that did not
>] match, the corresponding result is None. If a group is contained
>] in a part of the pattern that matched multiple times, the last
>] match is returned.
>
>As far as my research went, no standard regexp library could provide
>that sort of information.  They only give the last group which
>matched a pattern.
>
I was surprised to see the same substring apparently reused, though.
I had expected that whatever match was decided on would "use up" the
text, so that it could not be returned in another group, but it looks
like the last-of-multiple-matches logic kicks in even when the multiple
match has already been captured as a single span. E.g., the original:

>>> import re
>>> text = 'foo foo1 foo2 bar bar1 bar2 bar3'
>>> test_re = re.compile('([a-z]+)( \\1[0-9]+)+')
>>> print(test_re.findall(text))
[('foo', ' foo2'), ('bar', ' bar3')]

Adding some parens:

>>> test_re = re.compile('([a-z]+)(( \\1[0-9]+)+)')
>>> print(test_re.findall(text))
[('foo', ' foo1 foo2', ' foo2'), ('bar', ' bar1 bar2 bar3', ' bar3')]

Why are foo2 and bar3 showing up twice each? E.g., why not '' in the
last position? Is that the way it is supposed to work? Just asking ;-)

[... snipping some other boggledystuff ...]

>
>It is working as documented.
>
>You can also solve this without regexps.
>
>> I certainly did not encounter a limitation with REs - I can
>> define the solution perfectly using an RE.
>> The problem is just getting the Python re module to share its
>> results.  Python's broken re module doesn't make REs any less
>> appropriate.
>
>Show me a module besides Martel which lets you get access to
>the parse tree.  I looked at about a dozen packages, read
>through Friedl's 1st edition book, and posted to various newsgroups
>looking for one.
>
>The problem is that the regexp defines a tree structure, but
>the interface to the parsers is linear, and there was a choice
>made (some time ago) to flatten that tree to only contain the
>last groups which match.

Even if they've already been matched?

Regards,
Bengt Richter
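One way to see what is happening in the second example is to look at the span of each group instead of its text. The sketch below (Python 3) uses only `re.finditer()` and `Match.span()`; the helper name `nested_spans` is hypothetical. Group 3 sits inside the repeated part of group 2, so its span falls inside group 2's span: groups only record positions and do not consume text from one another, and a repeated group keeps only its last iteration.

```python
import re

# Same text and pattern as the second example in the post.
text = 'foo foo1 foo2 bar bar1 bar2 bar3'
test_re = re.compile(r'([a-z]+)(( \1[0-9]+)+)')


def nested_spans(s):
    """Hypothetical helper: (word, span of group 2, span of group 3) per match."""
    return [(m.group(1), m.span(2), m.span(3)) for m in test_re.finditer(s)]


for word, outer, inner in nested_spans(text):
    # Group 3 is nested inside group 2, so its span lies within group 2's
    # span; because group 3 is repeated, it reports only its last iteration.
    print(word, outer, inner, repr(text[inner[0]:inner[1]]))
```

For the sample text, group 2 covers ' foo1 foo2' while group 3 covers only ' foo2', ending where group 2 ends. The same characters show up twice because the two groups overlap, not because the text was matched twice.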
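If every repetition is wanted, one workaround that stays within the standard `re` module is a two-pass match: let an outer pattern find each run, then rescan only the text captured for the whole run with a second pattern built from the backreference's value. This is a sketch under the assumption that the repeated piece can be re-matched on its own (true for this example); `all_repeats` is a hypothetical name. It does not expose the parse tree Andrew describes; it only re-derives the pieces that the flattened groups discard.

```python
import re

text = 'foo foo1 foo2 bar bar1 bar2 bar3'
# (?:...) makes the repeated piece non-capturing; group 2 holds the whole run.
outer_re = re.compile(r'([a-z]+)((?: \1[0-9]+)+)')


def all_repeats(s):
    """Hypothetical helper: (word, list of every repetition) for each run in s."""
    results = []
    for m in outer_re.finditer(s):
        word = m.group(1)
        # Second pass: substitute the backreference's text into the inner
        # pattern (re.escape guards against regex metacharacters in word).
        inner_re = re.compile(' ' + re.escape(word) + '[0-9]+')
        # Rescan only the text that the repeated part covered.
        results.append((word, inner_re.findall(m.group(2))))
    return results


print(all_repeats(text))
```

Here the second pass returns [' foo1', ' foo2'] for the first run and [' bar1', ' bar2', ' bar3'] for the second. The cost is matching the run's text twice, which is usually acceptable for short inputs like this one.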