Pulling out <TITLE></TITLE>
Bengt Richter
bokr at accessone.com
Thu Nov 22 05:18:21 EST 2001
More information about the Python-list mailing list
On Wed, 21 Nov 2001 23:42:51 -0800, Brett Cannon <bac at OCF.Berkeley.EDU> wrote:

>Could use negative lookahead and lookbehinds. Another solution is to just
>strip out all comments from the HTML. Probably wouldn't hurt, anyway,
>since it will probably increase performance slightly by cutting down on
>the amount of tags to deal with.

It's probably useful to note that there must be one legal <TITLE></TITLE> in an HTML doc, and it must be in the <HEAD></HEAD> section. Once you've found it, you're done.

But the real question is what the operational requirement is. Seems like he is making a command line tool to generate an index (to put in a DB? to put into a monolithic HTML page indexing and linking to all the pages? to put in a hierarchical frame-wrapped version of that? etc.?) of HTML pages (all in a single directory? specified by a list of glob expressions?). Or he might want a cron job that keeps track of file mod dates to avoid reprocessing? (Hm, wonder about using make cleverly)... You never know what people are up to ;-)

>
>But it is also illegal syntax, I believe, to embed tags within a comment.
>

Nope. It's ok. I think the standard will move to the XML definition of a comment, even though you probably will want to handle the illegal but widely accepted error of including more than one '-' in a row between the '<!--' and the '-->' (the '--' is illegal in XML for compatibility with SGML).

From the XML spec:
--
2.5 Comments

Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.

Comments
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

An example of a comment:

<!-- declarations for <head> & <body> -->
--
The example shows it's ok to embed tags in comments.
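For what it's worth, here is a rough sketch of how the points above might fit together, assuming Python's standard `re` module. The names `XML_COMMENT`, `LENIENT_COMMENT`, `TITLE` and `extract_title` are made up for illustration, and the regex-based title search is only an approximation, not a real HTML parser. It transcribes production [15] as a strict pattern, uses a looser pattern that tolerates the common "extra hyphens" error when stripping comments, and then takes the first title it finds.

```python
import re

# Production [15] of the XML spec, transcribed as a regex:
#   '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# A '--' inside the comment body makes this strict pattern fail.
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")

# Lenient variant (an assumption, not from the spec): anything up to the
# first '-->', so runs of '-' inside the comment are tolerated.
LENIENT_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical title pattern: first <title>...</title>, case-insensitive.
TITLE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <TITLE> outside comments, or None."""
    # Drop comments first so a commented-out <TITLE> cannot match.
    without_comments = LENIENT_COMMENT.sub("", html)
    match = TITLE.search(without_comments)
    if match is None:
        return None
    # Collapse runs of whitespace (including newlines) in the title text.
    return " ".join(match.group(1).split())


if __name__ == "__main__":
    page = (
        "<html><head><!-- <title>Old</title> -->"
        "<title>Real\n Title</title></head><body></body></html>"
    )
    print(extract_title(page))
```

Since only one title is expected, the search stops at the first match rather than scanning the rest of the file. Whether stripping comments first is worth it depends on what the tool is really for.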