Issue 27120: xmllib unable to parse in UTF8 format
Created on 2016-05-25 09:09 by enrico.scame, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| xmllib.py | enrico.scame, 2016-05-25 13:14 | |
| Messages (4) | |||
|---|---|---|---|
| msg266322 - (view) | Author: Enrico (enrico.scame) | Date: 2016-05-25 09:09 | |
The xmllib.XMLParser seems to be unable to parse an XML file that contains Cyrillic characters.
  File "xmllib.pyc", line 172, in feed
  File "xmllib.pyc", line 268, in goahead
  File "xmllib.pyc", line 798, in syntax_error
Error: Syntax error at line 8: illegal character in content |
|||
| msg266339 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2016-05-25 12:36 | |
Could you please provide a minimal reproducer: a minimal script and minimal data that expose the issue? |
|||
| msg266344 - (view) | Author: Enrico (enrico.scame) | Date: 2016-05-25 13:14 | |
I have attached xmllib.py. This file is in the python23\lib folder.
The strings in the XML file are in Cyrillic.
My code:

import xmllib

class Parser(xmllib.XMLParser):
    # a simple styling engine
    def __init__(self):
        xmllib.XMLParser.__init__(self)
        self.cursupervisore = None
        self.curdata = ''
        self.elements = {'Superv':(self.starttag_superv, self.endtag_superv)
            ........
            }

    def load(self, file):
        # feed the file to the parser one line at a time
        while 1:
            s = file.readline()
            if not s:
                break
            self.feed(s)
        self.close()

def read_plant_tree(filexml):
    c = Parser()
    c.load(filexml)
|
|||
| msg266479 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2016-05-27 06:02 | |
See also issue222587. This seems to be the reason why the xmllib module was deprecated. Use the xml package for parsing XML instead (xml.etree.ElementTree, xml.dom.minidom, xml.sax, etc.). |
|||
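For illustration, here is a minimal sketch of what the suggested switch to xml.etree.ElementTree might look like for a reader like the one posted above. It assumes Python 3 and a file opened in binary mode; the sample document, the `Plant` root element, the `name` attribute, and the returned list are all hypothetical, since the reporter's actual XML and handlers were not posted. Only the `Superv` element name and the `read_plant_tree` function name come from the thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: a UTF-8 encoded document with Cyrillic text.
# The real file from the report is not available.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<Plant>\n'
    '  <Superv name="Цех 1">Привет</Superv>\n'
    '</Plant>\n'
).encode("utf-8")


def read_plant_tree(filexml):
    """Parse a binary file object line by line and collect 'Superv' elements.

    Feeding raw bytes lets the underlying expat parser decode the text
    according to the encoding named in the XML declaration.
    """
    parser = ET.XMLParser()
    for line in filexml:        # a binary file yields bytes lines
        parser.feed(line)
    root = parser.close()       # returns the root Element
    # Return (name attribute, text) pairs; this shape is an assumption.
    return [(el.get("name"), el.text) for el in root.iter("Superv")]


supervisors = read_plant_tree(io.BytesIO(SAMPLE_XML))
```

With a real file, the call would be along the lines of `read_plant_tree(open(path, "rb"))`, where `path` is a placeholder. When incremental feeding is not needed, `ET.parse(path)` is the shorter route.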
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:31 | admin | set | github: 71307 |
| 2016-05-27 06:03:07 | serhiy.storchaka | set | status: open -> closed stage: test needed -> resolved |
| 2016-05-27 06:02:48 | serhiy.storchaka | set | resolution: wont fix messages: + msg266479 |
| 2016-05-25 13:14:10 | enrico.scame | set | files: + xmllib.py messages: + msg266344 |
| 2016-05-25 12:36:20 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg266339 |
| 2016-05-25 09:09:33 | enrico.scame | create | |
