Modification of a urllib2 object ?
George Sakkis
george.sakkis at gmail.com
Fri Oct 10 16:02:22 EDT 2008
More information about the Python-list mailing list
On Oct 10, 2:32 pm, vincehofmeis... at gmail.com wrote:

> I have several ways to the following problem.
>
> This is what I have:
>
> ...
> import ClientForm
> import BeautifulSoup from BeautifulSoup
>
> request = urllib2.Request('http://form.com/)
>
> self.first_object = urllib2.open(request)
>
> soup = BeautifulSoup(self.first_object)
>
> forms = ClienForm.ParseResponse(self.first_object)
>
> Now, when I do this, forms returns an index error because no forms
> are returned, but the BeautifulSoup registers fine.

First off, please copy and paste working code; the above has several
syntax errors, so it can't raise IndexError (or anything else for that
matter).

> Now, when I switch the order to this:
>
> import ClientForm
> import BeautifulSoup from BeautifulSoup
>
> request = urllib2.Request('http://form.com/)
>
> self.first_object = urllib2.open(request)
>
> forms = ClienForm.ParseResponse(self.first_object)
>
> soup = BeautifulSoup(self.first_object)
>
> Now, the form is returned correctly, but the BeautifulSoup objects
> returns empty.
>
> So what I can draw from this is both methods erase the properties of
> the object,

No, that's not the case. The HTTP response object returned by
urllib2.urlopen() is a file-like object that can be read only once.
Whichever of ClientForm.ParseResponse or BeautifulSoup is called first
reads it to the end, and the second call has nothing left to read.

The easiest solution is to keep the request object and call
urllib2.urlopen() twice. Alternatively, check whether ClientForm has a
parse method that accepts strings instead of urllib2 responses, and then
read and save the HTML text explicitly:

>>> text = urllib2.urlopen(request).read()
>>> soup = BeautifulSoup(text)
>>> forms = ClientForm.ParseString(text)  # hypothetical name; check the ClientForm docs

HTH,
George
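
For anyone who wants to see the read-once behaviour without touching the
network, here is a minimal sketch in present-day Python. It is an
illustration under assumptions, not the original poster's code: io.BytesIO
stands in for the urllib2 response, and extract_title() and count_forms()
are hypothetical stand-ins for BeautifulSoup and ClientForm. Like the real
parsers, each one reads its stream to the end.

```python
import io

# Sample page used instead of a real HTTP response (assumption).
PAGE = (b"<html><head><title>Demo</title></head>"
        b"<body><form action='/go'></form></body></html>")


def extract_title(stream):
    """Hypothetical HTML parser stand-in: reads the whole stream."""
    html = stream.read()
    start = html.find(b"<title>")
    end = html.find(b"</title>")
    if start == -1 or end == -1:
        return None
    return html[start + len(b"<title>"):end].decode("utf-8")


def count_forms(stream):
    """Hypothetical form parser stand-in: reads the whole stream."""
    return stream.read().count(b"<form")


# Problem: both consumers share one stream, so the second sees no data.
response = io.BytesIO(PAGE)
title_shared = extract_title(response)
forms_shared = count_forms(response)

# Fix: read the body once, then give each consumer its own fresh stream.
response = io.BytesIO(PAGE)
body = response.read()
title_fixed = extract_title(io.BytesIO(body))
forms_fixed = count_forms(io.BytesIO(body))

print(title_shared, forms_shared)
print(title_fixed, forms_fixed)
```

Wrapping the saved body in a new file-like object also helps when a parser
only accepts file-like input rather than a string. If memory serves,
ClientForm offers a ParseFile function that takes a file-like object and a
base URI, but verify that against the ClientForm documentation before
relying on it.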