HTML Parser - beginner needs help
Alex Martelli
aleaxit at yahoo.com
Thu Sep 14 17:51:56 EDT 2000
More information about the Python-list mailing list
"zet" <zet at i.com.ua> wrote in message news:968956212.35650 at ipt2.iptelecom.net.ua... > Can somebody provide small piece of code, which returns list of img tags? > I've trying this lines: > > class IMGParser(HTMLParser): > def end_img(arg): > return > > but it return only an anchors, how to get IMG's? The general idea: import sgmllib class Imgs(sgmllib.SGMLParser): def do_img(self, attributes): print attributes getim=Imgs() getim.feed(open("c:/mydocu~1/samba98.htm").read()) getim.close() giving output such as: [('height', '51'), ('src', 'Samba98_files/cllogo_medium.gif'), ('width', '220')] [('height', '28'), ('src', 'Samba98_files/button_home.gif'), ('width', '28')] [('height', '28'), ('src', 'Samba98_files/button_up.gif'), ('width', '28')] [('height', '28'), ('src', 'Samba98_files/button_home.gif'), ('width', '28')] [('height', '28'), ('src', 'Samba98_files/button_up.gif'), ('width', '28')] If what you want to do is accumulate a list of the src attributes only, for example, the class could be: class Imgs(sgmllib.SGMLParser): def __init__(self): self.imgs = [] def do_img(self, attributes): self.imgs.append(attributes[src]) and the end result would be left in the .imgs field of the object after .close is called (of course, you could make an accessor method for that, if you so desire). Alex