How to extract a part of html file
Mike Meyer
mwm at mired.org
Thu Oct 20 09:47:37 EDT 2005
More information about the Python-list mailing list
Ben Finney <bignose+hates-spam at benfinney.id.au> writes:

> Joe <dinamo99 at lycos.com> wrote:
>> I'm trying to extract part of html code from a tag to a tag
>
> For tag soup, use BeautifulSoup:
> <URL:http://www.crummy.com/software/BeautifulSoup/>

Except he's trying to extract an apparently random chunk of the file, not a single element. BeautifulSoup is a wonderful tool for dealing with X/HTML documents as structured documents, which is how you want to handle them most of the time. In this case, though, a regular expression works nicely:

>>> import re
>>> s = '<span class="boldyellow"><B><U> and ends with TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
>>> r = re.match('<span class="boldyellow"><B><U>(.*)TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', s)
>>> r.group(1)
' and ends with '

str.find also works really well:

>>> start = s.find('<span class="boldyellow"><B><U>') + len('<span class="boldyellow"><B><U>')
>>> stop = s.find('TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
>>> s[start:stop]
' and ends with '

There's not much to choose between them.

	<mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
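For anyone adapting the two interactive sessions to a real file, here is a minimal sketch that wraps each approach in a function. The helper names between_re and between_find and the constants START and END are hypothetical, chosen only for illustration. The sketch deliberately differs from the sessions in a few ways: the markers go through re.escape() so the "." in "some.gif" is matched literally, re.DOTALL lets the match span newlines (real HTML files usually contain them), the non-greedy (.*?) with re.search() stops at the first end marker just as str.find does, and both helpers return None instead of a misleading slice when a marker is missing.

```python
import re


def between_re(text, start_marker, end_marker):
    """Hypothetical helper: text between two literal markers, via a regex.

    re.escape() makes the markers literal, so '.' in 'some.gif' is not a
    wildcard. re.DOTALL lets '.*?' cross newlines. The non-greedy '.*?'
    stops at the first end marker. Returns None if either marker is missing.
    """
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


def between_find(text, start_marker, end_marker):
    """Hypothetical helper: the same extraction using str.find.

    Returns None if either marker is missing, instead of slicing from a
    bogus index (str.find returns -1 when nothing is found).
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    stop = text.find(end_marker, start)
    if stop == -1:
        return None
    return text[start:stop]


# Markers taken from the message; the names START and END are illustrative.
START = '<span class="boldyellow"><B><U>'
END = 'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
s = START + ' and ends with ' + END

print(repr(between_re(s, START, END)))    # ' and ends with '
print(repr(between_find(s, START, END)))  # ' and ends with '
```

The guard matters for the find version: without it, a missing start marker gives -1 + len(marker), and the slice silently returns the wrong text.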