[Python-Dev] sgmllib Comments
"Martin v. Löwis"
martin at v.loewis.de
Mon Jun 12 08:18:50 CEST 2006
Sam Ruby wrote:
> If we can agree on the behavior, I would be glad to write up a patch.
>
> It seems to me that the simplest way to proceed would be for the code
> that attempts to resolve character references (both named and numeric)
> in attributes to be isolated in a single method. Subclasses that desire
> different behavior (including the existing Python 2.4 and prior
> behaviour) could simply override this method.

In SGML, this is problematic: the named things are not character
references, they are entity references, and it isn't necessarily
the case that they expand to a character. For example, &author;
might expand to "Martin v. Löwis", and &logo; might refer to a
bitmap image which is unparsed.

That said, providing an overridable replacement function sounds like
the right approach. To keep with tradition, I would still distinguish
between character references and entity references, i.e. provide
two overridable functions instead. Returning None could mean that
no replacement is available.

As for default implementations, I think they should do what
currently happens: entity references are replaced according to
entitydefs, and character references are replaced by the corresponding
byte if their value is smaller than 256.

Contrary to what others said, it appears that SGML *does* support
hexadecimal character references, provided that the SGML declaration
contains the HCRO definition (which, for HTML and XML, is defined
as HCRO "&#x"). So it seems safe to process hex character references
by default (although it isn't safe to assume Unicode, IMO).

Regards,
Martin
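
For illustration only, here is a minimal sketch of the two-hook design described above. It is not a patch to sgmllib: the class name ReferenceResolver, the method names convert_charref and convert_entityref, the resolve helper and the regular expression are all assumptions made for this example. It is written for Python 3, so chr() stands in for "the byte with that value". Both hooks return None when no replacement is available, in which case the reference is left untouched.

```python
import re


class ReferenceResolver:
    """Hypothetical stand-in for the parser's attribute-value handling."""

    # Default entity table, in the spirit of sgmllib's entitydefs.
    entitydefs = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

    # Group 1: body of a character reference ("65" or "x41").
    # Group 2: name of an entity reference ("amp").
    _ref = re.compile(
        r"&(?:#([xX][0-9a-fA-F]+|[0-9]+)|([a-zA-Z][-.a-zA-Z0-9]*));?"
    )

    def convert_charref(self, name):
        """Return the replacement for a character reference, or None."""
        try:
            if name[:1] in ("x", "X"):
                code = int(name[1:], 16)   # hexadecimal form, e.g. &#x41;
            else:
                code = int(name)           # decimal form, e.g. &#65;
        except ValueError:
            return None
        if not 0 <= code < 256:
            return None                    # do not assume Unicode by default
        return chr(code)                   # assumption: stands in for one byte

    def convert_entityref(self, name):
        """Return the replacement for an entity reference, or None."""
        return self.entitydefs.get(name)

    def resolve(self, text):
        """Replace every reference for which a hook supplies a value."""
        def replace(match):
            charref, entityref = match.groups()
            if charref is not None:
                replacement = self.convert_charref(charref)
            else:
                replacement = self.convert_entityref(entityref)
            # None means "no replacement available": keep the original text.
            return match.group(0) if replacement is None else replacement
        return self._ref.sub(replace, text)


class AuthorResolver(ReferenceResolver):
    """Subclass that overrides only the entity hook."""

    def convert_entityref(self, name):
        if name == "author":
            return "Martin v. Löwis"
        return super().convert_entityref(name)
```

A subclass that wanted different behaviour for numeric references (for example, mapping values of 256 and above to Unicode characters) would override convert_charref in the same way, without touching the entity handling.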