regex for href substitution
Robin Becker
robin at jessikat.fsnet.co.uk
Wed Feb 19 04:07:43 EST 2003
More information about the Python-list mailing list
In article <mailman.1045612196.24428.python-list at python.org>, Ian Bicking <ianb at colorstudy.com> writes
...
>
>Anyway, a regex like this will mostly work:
>
>href_re = re.compile(r'(<a[^>]+href=")([^"]*)(".*?>)', re.I | re.S)
>def subber(match):
>    return match.group(1) + rewrite_url(match.group(2)) + match.group(3)
>page = href_re.sub(subber, page)
>
...
The trouble with the above approach is that it's a bit naive: href attributes come in many shapes. <a href = 'my url'> fails the above, so we need lots of optional white space and alternatives. We don't even need quotes; e.g. <a href=/cgi-bin/bongo.cgi> should be legal in older HTML. Also, <a> tags can now carry other attributes, so we can't be sure a tag is always written <a href=...>; how about <a class="bingo" href="...">? The more I think about it, the more I prefer the htmllib approach.
-- 
Robin Becker
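For comparison, here is a rough sketch of how far a regex can be stretched to cover the shapes listed above. It is illustrative only, not code from the thread: `HREF_RE`, `subber`, and the `rewrite_url` rule (which points links at a made-up `http://mirror.example/` host) are hypothetical names. It accepts optional whitespace around `=`, double, single, or no quotes, and other attributes before `href`, and it uses `\b` so that tags such as `<area>` or `<abbr>` are not mistaken for `<a>`. It is still naive in the way described: a `data-href` attribute, or the text `href=` inside another attribute's value, would also be matched.

```python
import re

# Hypothetical rewrite rule, for illustration only: point every link at a mirror.
def rewrite_url(url):
    return "http://mirror.example/" + url.lstrip("/")

# Group 1: from "<a" up to and including "=" plus any whitespace after it.
# Groups 2-4: the value, double-quoted, single-quoted, or unquoted.
# \b after "<a" rejects <abbr>, <area>, etc.; [^>]*? lets other attributes come first.
# The rest of the tag is not consumed, so sub() leaves it untouched.
HREF_RE = re.compile(
    r"""(<a\b[^>]*?\bhref\s*=\s*)
        (?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.I | re.X,
)

def subber(match):
    prefix = match.group(1)
    # Exactly one of groups 2-4 took part in the match; keep its quoting style.
    if match.group(2) is not None:
        return prefix + '"' + rewrite_url(match.group(2)) + '"'
    if match.group(3) is not None:
        return prefix + "'" + rewrite_url(match.group(3)) + "'"
    return prefix + rewrite_url(match.group(4))

page = ("<a href = 'my url'> <a href=/cgi-bin/bongo.cgi> "
        '<a class="bingo" href="x.html">')
print(HREF_RE.sub(subber, page))
```

All three links in `page` are rewritten and each keeps the quoting it was found with; every extra case, though, needs another alternative in the pattern, which is the growth the message warns about.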
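The parser approach preferred above hands the attribute syntax to an HTML parser instead. `htmllib` was removed in Python 3, so this sketch swaps in the standard library's `html.parser.HTMLParser`; it is an assumption-laden illustration, not the code from the thread, and `HrefRewriter`, `rewrite_hrefs`, and the `rewrite_url` rule are hypothetical names. The parser already copes with spacing around `=`, any quoting, and attribute order. The price is that the page is re-serialized: rebuilt `<a>` tags use lowercase names and double quotes, end tags come out lowercase, entity references always gain a `;`, and markup without a handler here (e.g. CDATA or marked sections) may be dropped.

```python
from html import escape
from html.parser import HTMLParser


# Hypothetical rewrite rule, for illustration only.
def rewrite_url(url):
    return "http://mirror.example/" + url.lstrip("/")


class HrefRewriter(HTMLParser):
    """Re-emit a page, passing the href of each <a> tag through rewrite_url."""

    def __init__(self):
        # Keep &name; and &#n; as separate events so they are written back
        # as references instead of being decoded into the text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs, closing):
        if tag != "a" or not any(name == "href" for name, _ in attrs):
            # Not a link: copy the original start-tag text verbatim.
            self.out.append(self.get_starttag_text())
            return
        parts = ["<a"]
        for name, value in attrs:
            if name == "href" and value is not None:
                value = rewrite_url(value)
            if value is None:
                parts.append(" " + name)  # attribute written without a value
            else:
                # The parser unescapes attribute values, so escape them again.
                parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        parts.append(" />" if closing else ">")
        self.out.append("".join(parts))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, closing=False)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, closing=True)

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def rewrite_hrefs(page):
    parser = HrefRewriter()
    parser.feed(page)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.out)


page = ("<p>Fish &amp; chips</p>"
        "<a href = 'my url'>one</a> "
        "<a href=/cgi-bin/bongo.cgi>two</a> "
        '<a class="bingo" href="x.html">three</a>')
print(rewrite_hrefs(page))
```

The parsing is delegated entirely to the library, so the rewriting code never has to enumerate quoting or whitespace variants; only the serialization back to text is done by hand.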