read all available pages on a Website

Alex Martelli aleaxit at yahoo.com
Mon Sep 13 05:09:08 EDT 2004


Leif K-Brooks <eurleif at ecritters.biz> wrote:

> Tim Roberts wrote:
> > Brad Tilley <bradtilley at usa.net> wrote:
> > 
> >>Is there a way to make urllib or urllib2 read all of the pages on a Web
> >>site?
> > By the way, there are many web sites for which this sort of behavior is not
> > welcome.
> 
> Any site that didn't want to be crawled would most likely use a 
> robots.txt file, so you could check that before doing the crawl.

Python's Tools/webchecker/ directory has just the code you need for all
of this.  The directory is part of the Python source distribution, but
it's all pure Python code, so, if your distribution is binary and omits
that directory, just download the Python source distribution, unpack it,
and there you are.
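On current Python 3, urllib2 became urllib.request and the robotparser module became urllib.robotparser. What follows is a minimal sketch of the idea, not the webchecker code itself: a breadth-first crawl that stays on the starting host and asks the standard library's RobotFileParser before fetching each page. The names crawl, fetch, extract_links and LinkCollector, and the fetch_page/robots_lines parameters (there so the crawl can be tried without touching the network), are hypothetical. It does not honor Crawl-delay, check content types, or retry failed requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import urllib.error
import urllib.request
import urllib.robotparser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs, with any #fragment stripped, for each link."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in collector.links]


def fetch(url):
    """Download url and decode it, falling back to UTF-8 if no charset is sent."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch_page=fetch, robots_lines=None, agent="*", max_pages=100):
    """Breadth-first crawl of start_url's host; returns {url: html}."""
    robots = urllib.robotparser.RobotFileParser()
    if robots_lines is None:
        # Fetch and parse the site's own /robots.txt.
        robots.set_url(urljoin(start_url, "/robots.txt"))
        robots.read()
    else:
        # Use supplied robots.txt lines instead (handy for offline testing).
        robots.parse(robots_lines)
        robots.modified()  # mark the rules as loaded so can_fetch() consults them

    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed by robots.txt
        try:
            html = fetch_page(url)
        except urllib.error.URLError:
            continue  # skip pages that fail to load
        pages[url] = html
        for link in extract_links(url, html):
            # Follow only links on the same host that we have not queued yet.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling crawl("http://example.com/") with only the start URL reads the real robots.txt and fetches pages over the network; max_pages caps how many requests the crawl will make against the site.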

Alex


More information about the Python-list mailing list