Issue6266
Created on 2009-06-11 10:53 by Neil Muller, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (9)

| msg89248 | Author: Neil Muller (Neil Muller) | Date: 2009-06-11 10:53 |
Consider:
>>> from StringIO import StringIO
>>> source = StringIO('<body xmlns="http://éffbot.org/ns">text</body>')
>>> import xml.etree.ElementTree as ET
>>> events = ("start-ns",)
>>> context = ET.iterparse(source, events)
>>> for action, elem in context:
...     print action, elem
...
start-ns ('', u'http://\xe9ffbot.org/ns')
>>> source.seek(0)
>>> import xml.etree.cElementTree as cET
>>> context = cET.iterparse(source, events)
>>> for action, elem in context:
...     print action, elem
...
start-ns ('', 'http://\xc3\xa9ffbot.org/ns')
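I'm not sure which is more correct here, but getting the result in two different encodings is somewhat unexpected: ElementTree returns the namespace URI as a Unicode string, while cElementTree returns the raw UTF-8-encoded bytes.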
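For comparison, the same check can be run under Python 3. This is a minimal sketch, assuming Python 3, where `xml.etree.ElementTree` uses the C accelerator automatically and `iterparse` is fed the document as UTF-8 bytes; `start_ns_events` is a hypothetical helper name. It collects the `start-ns` pairs and shows that the URI comes back as text with the `é` intact.

```python
import io
import xml.etree.ElementTree as ET

# The same document as above, encoded as UTF-8 bytes (iterparse reads bytes).
SOURCE = '<body xmlns="http://éffbot.org/ns">text</body>'.encode("utf-8")


def start_ns_events(data: bytes):
    """Collect the (prefix, uri) pairs reported by "start-ns" events."""
    stream = io.BytesIO(data)
    return [elem for _action, elem in ET.iterparse(stream, events=("start-ns",))]


events = start_ns_events(SOURCE)
print(events)  # [('', 'http://éffbot.org/ns')]
```

Both members of the pair are `str`, so the accelerated and pure-Python parsers agree on the type.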

| msg89550 | Author: (nlopes) | Date: 2009-06-20 23:39 |
This is a pretty dumb patch, but it does its job. Basically, it decodes the UTF-8-encoded prefix and URI, then encodes them into Latin-1. There are probably better ways of doing this, and ideas are welcome. Patch attached.
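The effect of decoding and then re-encoding to Latin-1 can be modeled at the Python level. This is a Python 3 sketch, not the C patch itself; `latin1_roundtrip` and `decode_only` are hypothetical names used for illustration. Re-encoding hands back bytes rather than text, and characters outside Latin-1 cannot be represented at all.

```python
def latin1_roundtrip(raw: bytes) -> bytes:
    """Model of the patch: decode the UTF-8 input, then re-encode it as Latin-1."""
    return raw.decode("utf-8").encode("latin-1")


def decode_only(raw: bytes) -> str:
    """Model of the alternative: decode the UTF-8 input and keep it as text."""
    return raw.decode("utf-8")


uri = "http://éffbot.org/ns".encode("utf-8")
print(latin1_roundtrip(uri))  # b'http://\xe9ffbot.org/ns' (bytes, not text)
print(decode_only(uri))       # http://éffbot.org/ns

# A character outside Latin-1 (here the euro sign) cannot survive the round trip.
try:
    latin1_roundtrip("http://example.org/€".encode("utf-8"))
except UnicodeEncodeError as exc:
    print("Latin-1 round trip failed:", exc.reason)
```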

| msg89551 | Author: Fredrik Lundh (effbot) | Date: 2009-06-21 00:15 |
Converting from UTF-8 to Unicode is the right thing to do, but
converting back to Latin-1 is not correct -- note that ET returns a
Unicode string, not an 8-bit string. There's a "makestring" helper that
does the right thing in the library; just changing:
parcel = Py_BuildValue("ss", (prefix) ? prefix : "", uri);
to
parcel = Py_BuildValue("sN", (prefix) ? prefix : "", makestring(uri));
should work (although you should probably do that in two steps, and check makestring's result for errors before proceeding).

| msg89552 | Author: (nlopes) | Date: 2009-06-21 00:42 |
You're right about the conversion to Latin-1. I actually played a bit with makestring before going in another (admittedly not very good) direction, because makestring alone wasn't giving the intended result. I'll try to work out a good approach tomorrow (I already had one in mind).

| msg89560 | Author: Fredrik Lundh (effbot) | Date: 2009-06-21 13:12 |
It should definitely give what's intended (either a Unicode string, or, if the content is plain ASCII, an 8-bit string). What did you get instead?
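The behaviour described here can be modeled in Python 3 as below. This is only an illustration, not the C helper itself: `make_string` is a hypothetical name, and Python 3 `bytes` stands in for Python 2's 8-bit `str`.

```python
from typing import Union


def make_string(raw: bytes) -> Union[bytes, str]:
    """Hypothetical model of the helper's contract: plain-ASCII input stays
    an 8-bit string (bytes here), anything else is decoded from UTF-8."""
    if raw.isascii():
        return raw
    # Raises UnicodeDecodeError on malformed UTF-8, the error case the
    # caller is expected to check before building the (prefix, uri) tuple.
    return raw.decode("utf-8")


print(make_string(b"http://effbot.org/ns"))                # b'http://effbot.org/ns'
print(make_string("http://éffbot.org/ns".encode("utf-8")))  # http://éffbot.org/ns
```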

| msg89564 | Author: (nlopes) | Date: 2009-06-21 17:24 |
I got pure gibberish output, but I know why: the compilation had gone wrong.
To get the same output as ElementTree, I think that instead of
parcel = Py_BuildValue("sN", (prefix) ? prefix : "", makestring(uri));
it should be
parcel = Py_BuildValue("sN", (prefix) ? prefix : "",
PyUnicode_AsUnicode(makestring(uri), strlen(uri)));
Otherwise the result will not be what is expected.
Or am I overlooking something?

| msg89568 | Author: (nlopes) | Date: 2009-06-21 17:41 |
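Never mind what I just said. I had overlooked the N format code (Py_BuildValue passes the object returned by makestring through as-is, without adding a reference), which is why I couldn't figure out what was going wrong with your solution. It works. Mine is a ... ahem. :)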

| msg99442 | Author: Florent Xicluna (flox) | Date: 2010-02-16 21:58 |
Merged with the upstream patch proposed in #6472 (which includes a test case).

| msg100871 | Author: Florent Xicluna (flox) | Date: 2010-03-11 15:57 |
Fixed with #6472.

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:50 | admin | set | github: 50515 |
| 2010-03-11 15:57:40 | flox | set | status: open -> closed; superseder: Update ElementTree with upstream changes; messages: + msg100871; dependencies: - Update ElementTree with upstream changes |
| 2010-02-16 21:58:35 | flox | set | dependencies: + Update ElementTree with upstream changes; messages: + msg99442 |
| 2010-02-16 13:23:15 | flox | set | nosy: + flox; priority: normal; components: + XML; type: behavior; stage: needs patch |
| 2009-06-21 17:42:45 | nlopes | set | files: - _elementtree.diff |
| 2009-06-21 17:41:47 | nlopes | set | messages: + msg89568 |
| 2009-06-21 17:24:35 | nlopes | set | messages: + msg89564 |
| 2009-06-21 13:12:43 | effbot | set | messages: + msg89560 |
| 2009-06-21 00:42:26 | nlopes | set | messages: + msg89552 |
| 2009-06-21 00:15:37 | effbot | set | messages: + msg89551 |
| 2009-06-20 23:39:22 | nlopes | set | files: + _elementtree.diff; nosy: + nlopes; keywords: + patch |
| 2009-06-11 10:53:51 | Neil Muller | create | |
