Pulling out <TITLE></TITLE>
Bengt Richter
bokr at accessone.com
Wed Nov 21 04:13:05 EST 2001
More information about the Python-list mailing list
On Sun, 18 Nov 2001 20:45:44 -0800, Brett Cannon <bac at OCF.Berkeley.EDU> wrote:
>You could just read each page and use a regex to fetch it:
>
>title_value=re.search(r'<title>(?P<title>.*?)</title>',re.I)
>title_value.group('title')
>
Hm. What happens with the following page?

<HTML><HEAD>
<!-- (old title kept for reference, or possible restoring)
<TITLE>This is the old title</TITLE>
-->
<TITLE>Official new title</TITLE>
</HEAD><Body>...whatever...</BODY></HTML>

>On Sun, 18 Nov 2001, David A McInnis wrote:
>
>> I am writing a script to catalog about 30,000 html pages on my site and need
>> to pull out the value of <TITLE></TITLE>.
>>
>> I guess this is possible with htmllib, but I cannot figure it out.
>>
>> Thanks,
>> David
>>
>
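
To make the question concrete, here is a minimal sketch that runs both approaches on the page above. It assumes a current Python 3, where htmllib no longer exists and html.parser.HTMLParser is the standard-library parser. The names PAGE, regex_title, TitleParser and parsed_title are made up for illustration. Note that the quoted re.search call never passes the page text, so the sketch adds it as the second argument, along with re.S so that a title spanning several lines would still match.

```python
import re
from html.parser import HTMLParser

# The example page from this message: the old title sits inside a comment.
PAGE = """<HTML><HEAD>
<!-- (old title kept for reference, or possible restoring)
<TITLE>This is the old title</TITLE>
-->
<TITLE>Official new title</TITLE>
</HEAD><Body>...whatever...</BODY></HTML>"""


def regex_title(page):
    """Regex approach: returns the first <title>...</title> match,
    even when that match lies inside an HTML comment."""
    m = re.search(r'<title>(?P<title>.*?)</title>', page, re.I | re.S)
    return m.group('title') if m else None


class TitleParser(HTMLParser):
    """Collects the text of the first real <title> element.
    Comment contents go to handle_comment and are never seen as tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> shows up as 'title'.
        if tag == 'title' and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def parsed_title(page):
    """Parser approach: returns the first title outside of comments."""
    parser = TitleParser()
    parser.feed(page)
    parser.close()
    return ''.join(parser.parts).strip() or None


print(regex_title(PAGE))   # This is the old title
print(parsed_title(PAGE))  # Official new title
```

The regex picks up the commented-out title because it has no idea what a comment is; the parser skips it. For 30,000 pages the parser version would be called once per file, with whatever error handling the real pages turn out to need.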