Reg Exp: Need advice concerning "greediness"
Calvelo Daniel
dcalvelo at pharion.univ-lille2.fr
Mon Oct 2 06:17:39 EDT 2000
More information about the Python-list mailing list
Franz GEIGER <fgeiger at datec.at> wrote:
: Hello all,
: I want to exchange font colors of headings of a certain level in HTML files.
: I have a line containing a heading level 1, e.g.: <h1><font
: COLOR="#FF0000">Heading Level 1</font></h1>.
: Now I want to split this into 3 groups: Everything before "COLOR=xyz",
: "COLOR=xyz" itself, and everything after "COLOR=xyz".
: I tried:
: sRslt = "<h1><font COLOR="#FF0000">Heading Level 1</font></h1>";
: print re.findall(re.compile(r'(.*?FONT.*?)(COLOR=.*?)*([ |>].*)', re.I |
: re.S), sRslt);

Beware of the quotes in your example. The second double quote closes the
string, and the '#' that follows starts a comment:

>>> sRslt = "<h1><font COLOR="#FF0000">Heading Level 1</font></h1>"
>>> sRslt
'<h1><font COLOR='

(That explains the weird results reported here.)

As for your regexp, once the string is quoted correctly (for instance with
single quotes around it), the following works:

>>> print re.findall(re.compile(r'(.*?FONT[^">]+?)(COLOR=.*?)?([ |>].*)', re.I | re.S), sRslt);
[('<h1><font ', 'COLOR="#FF0000"', '>Heading Level 1</font></h1>')]

I used a negated character class to force the first group to end just before
a possible COLOR attribute. Otherwise, what I think is happening is that your
non-greedy search is indeed non-greedy, but '(COLOR=.*?)*' is allowed to
match nothing at all, so the first group stops right after FONT and that
null-match is accepted. BTW, I changed that '*' to '?', which is what you
meant, if I read correctly.

HTH, DCA

-- 
Daniel Calvelo Aros
calvelo at lifl.fr
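For anyone who wants to try this themselves, here is a small sketch comparing the pattern from the question with the one from the reply. It is written for Python 3 (print as a function), which is an assumption on my part since the session above is Python 2. The names `line`, `original`, `fixed` and `recolored` are made up for illustration, as is the replacement colour `#0000FF`.

```python
import re

# Single quotes on the outside, so the double quotes inside the HTML survive.
# With double quotes outside, the string would end at COLOR=" and the rest of
# the line would be read as a comment starting at '#'.
line = '<h1><font COLOR="#FF0000">Heading Level 1</font></h1>'

flags = re.I | re.S

# Pattern from the question: the lazy '.*?' after FONT can stop immediately,
# because '(COLOR=.*?)*' is allowed to match zero times.
original = re.compile(r'(.*?FONT.*?)(COLOR=.*?)*([ |>].*)', flags)

# Pattern from the reply: '[^">]+?' must consume at least one character that
# is neither '"' nor '>', and the COLOR group is optional ('?') rather than
# repeated ('*').
fixed = re.compile(r'(.*?FONT[^">]+?)(COLOR=.*?)?([ |>].*)', flags)

original_groups = original.findall(line)
fixed_groups = fixed.findall(line)

print(original_groups)  # COLOR group comes back empty
print(fixed_groups)     # COLOR group holds COLOR="#FF0000"

# Hypothetical follow-up: put a different colour between the outer groups.
before, color, after = fixed_groups[0]
recolored = before + 'COLOR="#0000FF"' + after
print(recolored)
```

With the original pattern the middle group is empty and the COLOR attribute ends up in the third group. With the fixed pattern the three groups are the ones asked for, so swapping the colour is plain string concatenation.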