strip not well formed html tags...
Shagshag13
shagshag13 at yahoo.fr
Tue Oct 22 13:24:20 EDT 2002
"Mark McEahern" <marklists at mceahern.com> wrote in message news: mailman.1035288928.25184.python-list at python.org...
> > I've seen many posts about how to strip HTML tags from a string;
> > some use sgmllib, others regular expressions... My problem is this:
> > I would like to strip HTML (or even XML) tags, but I have to work
> > on incomplete strings, so they may not be well formed. What
> > should I use? Regexps? sgmllib with a lot of exception handling?
>
> 1. Try mxTidy.

Thanks, I'll check...

> 2. Consider providing an example of the data you're talking about.

Sorry:

"""<tag1> <tag2>a title</tag2><tag3>this is an example <tag3>and closing tags are missing """

"""but this one is possible too </tag1><tag3> <script>here is javascript code</script>"""

(Real HTML tags can appear as well.)

I want the result to be human readable. For now I use a regexp to strip the tags, but I have trouble with JavaScript: from <script>here is javascript code</script> I get the script content in the output. So I was wondering whether I shouldn't use a real parser... But then my trouble would be that I don't know in advance all the tags I will have to handle.

Thanks,
s13.
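One way to approach the question above is with a lenient, event-based parser instead of a regexp. The following is only a minimal sketch and not something proposed in the thread: it assumes a modern Python 3 and its standard `html.parser` module (which did not exist under that name in 2002), and the names `TextExtractor` and `strip_tags` are hypothetical. The parser does not need to know the tag names in advance and does not require closing tags; only the elements whose content should be dropped (`script` and `style` here, as an assumption) are listed explicitly.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text outside of tags, dropping <script>/<style> content."""

    # Assumption: these are the only elements whose content must be hidden.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; unknown tags are simply ignored.
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # Stray closing tags (such as a lone </tag1>) do no harm.
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_tags(fragment):
    """Return the readable text of a possibly malformed markup fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered at the end of input
    # Join pieces with a space, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


sample1 = ("<tag1> <tag2>a title</tag2><tag3>this is an example "
           "<tag3>and closing tags are missing ")
sample2 = ("but this one is possible too </tag1><tag3> "
           "<script>here is javascript code</script>")

print(strip_tags(sample1))
print(strip_tags(sample2))
```

Joining the text pieces with a space keeps words from adjacent elements apart, at the cost of splitting a word that has inline markup in the middle of it; joining with an empty string makes the opposite trade-off.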