read all available pages on a Website
Brad Tilley
bradtilley at usa.net
Mon Sep 13 09:49:02 EDT 2004
More information about the Python-list mailing list
Alex Martelli wrote:
> Leif K-Brooks <eurleif at ecritters.biz> wrote:
>
>> Tim Roberts wrote:
>>
>>> Brad Tilley <bradtilley at usa.net> wrote:
>>>
>>>> Is there a way to make urllib or urllib2 read all of the pages on a Web
>>>> site?
>>>
>>> By the way, there are many web sites for which this sort of behavior is not
>>> welcome.
>>
>> Any site that didn't want to be crawled would most likely use a
>> robots.txt file, so you could check that before doing the crawl.
>
> Python's Tools/webchecker/ directory has just the code you need for all
> of this. The directory is part of the Python source distribution, but
> it's all pure Python code, so, if your distribution is binary and omits
> that directory, just download the Python source distribution, unpack it,
> and there you are.
>
> Alex

Thank you, this is ideal.
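
For anyone reading this thread later, here is a rough sketch of the approach discussed above: check robots.txt first, then follow same-site links. It is not the webchecker code. It assumes Python 3, where urllib2 became urllib.request, and uses only standard library modules. The names LinkExtractor, fetch, crawl and the "example-crawler" user agent are made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its text (illustrative helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, robots, fetch_page=fetch, agent="example-crawler", limit=50):
    """Breadth-first crawl of same-host pages that robots.txt allows."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # the site asked crawlers to stay out of this path
        try:
            html = fetch_page(url)
        except OSError:
            continue  # unreachable page; URLError is a subclass of OSError
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Example use against a placeholder host:
#   robots = RobotFileParser("https://example.com/robots.txt")
#   robots.read()
#   for page in crawl("https://example.com/", robots):
#       print(page)
```

The limit argument and the same-host check are there to keep the crawl polite and bounded. A real crawler would probably also want a delay between requests.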