re findall mod for issue of side effects
Andrew Henshaw
andrew_dot_henshaw_at_earthling_dot_net
Mon Jan 15 08:11:43 EST 2001
"Tim Peters" <tim.one at home.com> wrote in message
news:mailman.979546103.2078.python-list at python.org...
> Except that's what non-capturing parens are for, in the context of
> re.findall() and *everywhere else*. Adding a unique wart to findall() is
> probably a poor idea unless it's astonishingly useful.

...snip...

>
> Sure did. But your pattern after "grouping" is also radically different in
> another way: it can match an empty string, where your original pattern
> could not. And a pattern that can match nothing everywhere is a very
> strange pattern for findall() (what is it you're trying to find then? a
> bunch of nothings? that's what you're *telling* it to find). You do
> strange things, you get strange results.
>
> A pattern matching what it appears you *intended* to search for here:
>
> >>> r = re.compile('(?:abc)+(?:xyz)*|(?:xyz)+')
> >>> r.findall(s)
> ['abcabcxyz']
> >>>
>
> That is, don't hand findall a pattern that matches empty strings, and it
> won't return empty matches.

Introducing empty matches was a mistake. I should have left the pattern at

    (abc)+xyz

There is a problem with (?:) that I brought up in 'Using re -side effects or
misunderstanding'.

...snip...

> > Does anybody else see this [grammaticGrouping] to be as useful
> > as I do?
>
> Sorry, I don't: I see it as misusing findall(), and then adding a wart to
> cover that up. But then I'm always generous in my assessments <wink>.
>
> More generally useful would be a new flag on regexp compilation meaning "all
> my parens are non-capturing". Then that part of it could be enjoyed by all
> uses of regexps, not just findall. I don't see a need for that, but it
> wouldn't be particularly damaging.

This is what I had suggested (well, maybe not, see below) in yesterday's take
on this subject (see: 'Using re -side effects or misunderstanding') and would
be my preferred design.
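For reference, the findall() behaviour under discussion can be reproduced with a short sketch. The value of s is an assumption (the quoted text shows only the result 'abcabcxyz', not the input string), and the empty-matching pattern is an illustrative stand-in, not the pattern from the earlier message.

```python
import re

# Assumed sample string; the thread only shows the match 'abcabcxyz'.
s = '..abcabcxyz..'

# A single capturing group makes findall() return that group,
# not the whole match (and only the last repetition of the group).
captured = re.findall('(abc)+xyz', s)

# Non-capturing parens group without changing what findall() returns.
whole = re.findall('(?:abc)+xyz', s)

# The pattern from the quoted reply; it cannot match the empty string.
intended = re.compile('(?:abc)+(?:xyz)*|(?:xyz)+').findall(s)

# Illustrative pattern that CAN match the empty string everywhere;
# the empty matches are filtered out by hand afterwards.
raw = re.findall('(?:abc)*(?:xyz)*', s)
non_empty = [m for m in raw if m]

print(captured)   # ['abc']
print(whole)      # ['abcabcxyz']
print(intended)   # ['abcabcxyz']
print(non_empty)  # ['abcabcxyz']
```

The last two lines of the sketch are the "filter 'em out yourself" approach: the raw result contains empty strings wherever the pattern matched nothing.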
I looked at the code and became worried that the effect of that flag would
have deep consequences that I wasn't going to foresee in my quick
examination. Therefore, I thought I'd limit the 'wart', as you call it, to
the area that I was particularly interested in, for demonstration purposes.

As to your suggestion (a new flag on regexp compilation meaning "all my
parens are non-capturing"), I'd still like to retain the ability to use the
non-capturing flag to exclude portions from the return string. This may be
what you're stating, but I'd like the flag to indicate that parens are for
grammatical grouping - they do not force a tuple return. Thus,

    s='..abcxyz..'
    r=re.compile('(ab)+(?:c)(xyz)+')
    r.findall(s)

would return

    ['abxyz']

(I should put a fractional wink in here about null strings)

>
> If you're going to ask findall() to match empty strings, though, filter 'em
> out yourself.

Yes, I agree. That bit of code shouldn't be in there. I realized that late
last night, when I was playing with the patch. Also, the patch is flawed in
that it doesn't handle the '(?:)' type of parens correctly. Such are the
problems one generates when one tries to rush a 'product' out the door.

>
> cruel-but-fair-ly y'rs - tim
>

Not cruel at all.

AH
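The "grammatical grouping" return value proposed above is not something the re module provides; stock findall() returns a list of tuples for that pattern. A rough approximation can be sketched on top of finditer(). The helper name findall_joined is hypothetical, and this is only a sketch, not the patch discussed in the thread. One known gap: a repeated group such as (ab)+ keeps only its last repetition, so 'abab' would contribute 'ab', not 'abab'.

```python
import re

def findall_joined(pattern, string):
    """Hypothetical helper: return one string per match, built by
    concatenating the captured groups so that non-capturing (?:...)
    portions are left out of the result."""
    compiled = re.compile(pattern)
    results = []
    for m in compiled.finditer(string):
        if compiled.groups == 0:
            # No capturing groups at all: fall back to the whole match.
            results.append(m.group(0))
        else:
            # Groups that did not participate are None; treat them as ''.
            results.append(''.join(g or '' for g in m.groups()))
    return results

s = '..abcxyz..'
pattern = '(ab)+(?:c)(xyz)+'

stock = re.findall(pattern, s)        # what findall() returns today
joined = findall_joined(pattern, s)   # approximation of the proposal

print(stock)   # [('ab', 'xyz')]
print(joined)  # ['abxyz']
```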