Issue33928
Created on 2018-06-21 10:55 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (6)
msg320154 - Author: STINNER Victor (vstinner) - Date: 2018-06-21 10:55

_Py_DecodeUTF8Ex() creates surrogate pairs when wchar_t is 16 bits wide (on Windows), whereas the input bytes should be escaped. I'm quite sure that it's a bug.
msg320155 - Author: STINNER Victor (vstinner) - Date: 2018-06-21 10:57

Extract of the _Py_DecodeUTF8Ex() code; note the explicit "write a surrogate pair" comment:
```c
#if SIZEOF_WCHAR_T == 4
ch = ucs4lib_utf8_decode(&s, e, (Py_UCS4 *)unicode, &outpos);
#else
ch = ucs2lib_utf8_decode(&s, e, (Py_UCS2 *)unicode, &outpos);
#endif
if (ch > 0xFF) {
#if SIZEOF_WCHAR_T == 4
    Py_UNREACHABLE();
#else
    assert(ch > 0xFFFF && ch <= MAX_UNICODE);
    /* write a surrogate pair */
    unicode[outpos++] = (wchar_t)Py_UNICODE_HIGH_SURROGATE(ch);
    unicode[outpos++] = (wchar_t)Py_UNICODE_LOW_SURROGATE(ch);
#endif
}
```
msg320158 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2018-06-21 11:00

Could you show an example please?
msg320163 - Author: STINNER Victor (vstinner) - Date: 2018-06-21 11:10

> Could you show an example please?

I spotted the issue while reading the code; I haven't tried to trigger it with real code yet.
msg320170 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2018-06-21 11:43

I don't see anything wrong.
msg320193 - Author: STINNER Victor (vstinner) - Date: 2018-06-21 16:03

> I don't see anything wrong.

I wrote a C function to test _Py_DecodeUTF8Ex():

* surrogateescape=0 fails with a decoding error, as expected
* surrogateescape=1 escapes the bytes, as expected, as '\udced\udcb2\udc80'

OK, I just misunderstood the code: the decoder is fine!
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:02 | admin | set | github: 78109 |
| 2018-06-21 16:03:48 | vstinner | set | status: open -> closed, resolution: not a bug, messages: + msg320193, stage: resolved |
| 2018-06-21 11:43:19 | serhiy.storchaka | set | messages: + msg320170 |
| 2018-06-21 11:10:16 | vstinner | set | messages: + msg320163 |
| 2018-06-21 11:00:41 | serhiy.storchaka | set | messages: + msg320158 |
| 2018-06-21 10:57:13 | vstinner | set | messages: + msg320155 |
| 2018-06-21 10:55:58 | vstinner | create | |
