Issue 33255
Created on 2018-04-10 09:21 by nhatcher, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Pull Requests | | | |
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 6523 | closed | nhatcher, 2018-04-18 19:44 | |

| Messages (5) | | | |
|---|---|---|---|
| msg315164 - (view) | Author: Nicolás Hatcher (nhatcher) * | Date: 2018-04-10 09:21 | |
Hey, I'm new here, so please let me know if I'm doing anything incorrectly!
I _think_ `json.dumps(o, ensure_ascii=False)` is doing the wrong thing when `o` has both unicode and str keys/values. For instance:
```
import json
o = {u"greeting": "hi", "currency": "€"}
json.dumps(o, ensure_ascii=False, encoding="utf8")
json.dumps(o, ensure_ascii=False)
```
The first `dumps` call works while the second fails. The reason is:
https://github.com/python/cpython/blob/2.7/Lib/json/encoder.py#L198
That line decodes any `str` only when the encoding is not exactly 'utf-8'. In the mixed case (unicode and str) this blows up. A workaround is to use any of the aliases for 'utf-8', such as 'utf8' or 'u8'.
I would be crazy happy to provide a PR if this is really an issue.
Let me know if extra clarification is needed.
Nicolás
| msg315270 - (view) | Author: Ivan Pozdeev (Ivan.Pozdeev) * | Date: 2018-04-13 22:20 | |
Treating 'utf-8' and its aliases differently (when they specifically mean Python's own encoding, rather than somebody else's) is definitely an issue. You shouldn't hardcode a list of aliases, though; rather, use existing facilities to resolve them. From quick googling, e.g. `codecs.lookup(<encoding>).name` can get the canonical name. Make sure to follow https://devguide.python.org/pullrequest when doing the PR; a test case will likely be needed, too.
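The alias-resolution suggestion above can be sketched like this (a minimal sketch; `codecs.lookup` behaves the same way in Python 2 and 3, and the helper name `is_utf8` is purely illustrative):

```python
import codecs

def is_utf8(encoding):
    # Resolve the encoding through the codec registry instead of
    # string-comparing against a hardcoded list of alias spellings.
    return codecs.lookup(encoding).name == 'utf-8'

print(is_utf8('utf-8'))    # True
print(is_utf8('utf8'))     # True
print(is_utf8('u8'))       # True
print(is_utf8('latin-1'))  # False
```

With a check like this, the encoder at `encoder.py#L198` would take the same code path for 'utf-8' and for all of its registered aliases.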
| msg315478 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-04-19 05:47 | |
In simplejson:
>>> simplejson.dumps({u"greeting": "hi", "currency": "€"}, ensure_ascii=False, encoding="utf8")
u'{"currency": "\u20ac", "greeting": "hi"}'
>>> simplejson.dumps({u"greeting": "hi", "currency": "€"}, ensure_ascii=False)
u'{"currency": "\u20ac", "greeting": "hi"}'
I think it makes sense to fix the case for "utf-8".
| msg315890 - (view) | Author: Nicolás Hatcher (nhatcher) * | Date: 2018-04-29 12:08 | |
Hi Serhiy,
I am OK with that change. I think it makes much more sense, but I also think it will break people's code. At least with the simplest fix, in which:
>>> json.dumps({"g"}, ensure_ascii=False)
u'"g"'
which is again compatible with simplejson.
Although the documentation is not clear on this point, there might be code out there relying on this behaviour.
Is that acceptable?
| msg315895 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-04-29 13:36 | |
You could decode only non-ascii strings. But I'm not sure it is worth changing anything in 2.7; this could be treated as a new feature. Leaving this to Benjamin, the release manager of 2.7.
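The "decode only non-ascii strings" idea could look roughly like this (a hedged sketch, not the actual patch; the real change would live in Python 2.7's `json/encoder.py`, but the hypothetical helper below is written to run under Python 3 as well, with `bytes` standing in for 2.x `str`):

```python
def maybe_decode(s, encoding='utf-8'):
    # Hypothetical helper: decode a byte string only when it actually
    # contains non-ASCII bytes, leaving pure-ASCII strings untouched
    # so existing code that expects str output keeps working.
    try:
        s.decode('ascii')
        return s  # pure ASCII: no behaviour change
    except UnicodeDecodeError:
        return s.decode(encoding)  # non-ASCII: promote to unicode

maybe_decode(b'hi')                     # b'hi' (unchanged)
maybe_decode('\u20ac'.encode('utf-8'))  # '\u20ac' (decoded)
```

Pure-ASCII output stays the same type as before, which limits the backward-compatibility concern raised in msg315890 to strings that would previously have raised anyway.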
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:59 | admin | set | github: 77436 |
| 2020-01-05 20:50:13 | cheryl.sabella | set | status: open -> closed; resolution: wont fix; stage: patch review -> resolved |
| 2018-08-15 13:57:33 | mcepl | set | nosy: + mcepl |
| 2018-04-29 13:36:39 | serhiy.storchaka | set | nosy: + benjamin.peterson; messages: + msg315895 |
| 2018-04-29 12:08:30 | nhatcher | set | messages: + msg315890 |
| 2018-04-19 05:47:35 | serhiy.storchaka | set | nosy: + bob.ippolito, serhiy.storchaka; messages: + msg315478 |
| 2018-04-18 19:44:34 | nhatcher | set | keywords: + patch; stage: patch review; pull_requests: + pull_request6217 |
| 2018-04-13 22:20:39 | Ivan.Pozdeev | set | nosy: + Ivan.Pozdeev; messages: + msg315270 |
| 2018-04-10 09:21:14 | nhatcher | create | |
