Message 60331 - Python tracker

Author kingswood
Date 2004-04-03.18:04:36
Content

This problem is more widespread than previously indicated.
Every call to self.error() has to cope with the case where
that function returns, and must then recover (HTMLParser
specifies that every character in the input is visited
exactly once). Other modules are affected as well.

In particular, feeding HTML (from spam) that contains the tag
<!12345> into HTMLParser causes markupbase._scan_name to
report an error, and _scan_name now needs to recover from it.
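
As a minimal sketch of why <!12345> trips the name scan: the
pattern below is an assumption modeled on the declaration-name
pattern in markupbase, not copied from the module. A name has
to start with a letter, so nothing matches just past "<!".

```python
import re

# Assumption: a stand-in for the declaration-name pattern in markupbase.
declname_match = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*\s*").match

rawdata = "<!12345>"

# The scan starts just past "<!", that is, at offset 2.
match = declname_match(rawdata, 2)

# "1" is not a letter, so there is no match and the error path is taken.
print(match)
```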

The patch in #917188 may be better than the one suggested
here, as it deals with all places where self.error() can
return. More is needed to fix the problem completely, though.
In markupbase.py, at least the following change is necessary:

--- markupbase.py.orig  Sat Apr 03 17:43:48 2004
+++ markupbase.py       Sat Apr 03 18:02:48 2004
@@ -377,6 +377,8 @@
         else:
             self.updatepos(declstartpos, i)
             self.error("expected name token")
+        return None,rawdata.find(">",i)

     # To be overridden -- handlers for unknown objects
     def unknown_decl(self, data):
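
The recovery step in the patch can be sketched outside the
parser as follows. This is a simplified illustration, not the
markupbase code: scan_name, the errors list, and the pattern
are hypothetical stand-ins, and the end-of-buffer checks are
left out. It assumes an error handler that records the problem
and returns instead of raising.

```python
import re

# Assumption: a stand-in for the declaration-name pattern in markupbase.
_declname_match = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*\s*").match


def scan_name(rawdata, i, errors):
    """Return (name, end_index), or (None, resume_index) after an error."""
    m = _declname_match(rawdata, i)
    if m:
        return m.group().strip().lower(), m.end()
    # Lenient handler: record the error and keep going instead of raising.
    errors.append("expected name token at offset %d" % i)
    # Recovery as in the patch: point at the next ">" (-1 if there is none).
    return None, rawdata.find(">", i)


errors = []
bad = scan_name("<!12345>", 2, errors)
good = scan_name("<!DOCTYPE html>", 2, errors)
```

A result of -1 in the second slot means no ">" has been seen
yet, so the caller has to treat it as incomplete input rather
than as a position to resume from.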
History
Date                 User   Action  Args
2008-01-20 09:56:06  admin  link    issue736428 messages
2008-01-20 09:56:06  admin  create