Issue 34484: Unicode HOWTO incorrectly refers to Private Use Area for surrogateescape
Created on 2018-08-23 21:14 by mark.dickinson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 12155 | merged | miss-islington, 2019-03-04 04:10 |
Messages (8)

msg323976 | Author: Mark Dickinson (mark.dickinson) | Date: 2018-08-23 21:14

The Unicode HOWTO currently contains this text in the "Files in an Unknown Encoding" section [1]:

> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.

Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the high surrogates from U+DB80 to U+DBFF (because the only valid code points that use these surrogates in their UTF-16 encoding are the code points in planes 15 and 16, which are almost entirely PUA code points), but that's not what the surrogateescape handler is using.

[1] https://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding
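The distinction above can be checked directly from Python. The sketch below is illustrative only: the sample bytes `b"abc\xff\x80def"` and the `high_surrogate` helper are hypothetical choices, not taken from the HOWTO. It shows that surrogateescape produces low surrogates (general category `Cs`) rather than private-use characters (`Co`), and that the high surrogates U+DB80..U+DBFF correspond exactly to planes 15 and 16.

```python
import unicodedata

# Illustrative input (an assumption, not from the HOWTO): 0xff can never
# appear in UTF-8, and 0x80 here is a continuation byte with no lead byte.
raw = b"abc\xff\x80def"

# surrogateescape decodes each undecodable byte 0xNN to the code point
# U+DC00 + 0xNN; only bytes >= 0x80 are escaped, so the result always
# falls in U+DC80..U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]
print(escaped)  # ['0xdcff', '0xdc80']

# Those code points are low surrogates (category "Cs"), whereas U+E000,
# the first code point of the BMP Private Use Area, is private use ("Co").
print(unicodedata.category("\udcff"))  # Cs
print(unicodedata.category("\ue000"))  # Co

# Encoding with the same error handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw


def high_surrogate(cp: int) -> int:
    """Return the UTF-16 high (lead) surrogate of a supplementary code point.

    Hypothetical helper for illustration; it applies the standard UTF-16
    formula 0xD800 + ((cp - 0x10000) >> 10).
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    return 0xD800 + ((cp - 0x10000) >> 10)


# Planes 15 and 16 (U+F0000..U+10FFFF) are exactly the code points whose
# high surrogate lies in U+DB80..U+DBFF; plane 15 starts with private use.
print(hex(high_surrogate(0xF0000)))        # 0xdb80
print(hex(high_surrogate(0x10FFFF)))       # 0xdbff
print(hex(high_surrogate(0xEFFFF)))        # 0xdb7f (end of plane 14)
print(unicodedata.category(chr(0xF0000)))  # Co
```

Because the escaped code points are lone surrogates, such a string cannot be encoded with the default `strict` handler: `text.encode("utf-8")` raises `UnicodeEncodeError`, so the same `surrogateescape` handler has to be used on the way back out.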
* * *

msg323977 | Author: Mark Dickinson (mark.dickinson) | Date: 2018-08-23 21:24

For history, this text was introduced as a result of issue #4163.
* * *

msg323978 | Author: Mark Dickinson (mark.dickinson) | Date: 2018-08-23 21:25

Whoops. Sorry, that should be #4153.
* * *

msg325432 | Author: A.M. Kuchling (akuchling) | Date: 2018-09-15 13:46

Corrected in the unicode-howto-update branch being developed for issue #20906.
* * *

msg337069 | Author: A.M. Kuchling (akuchling) | Date: 2019-03-04 04:10

New changeset 97c288df614dd7856f5a0336925f56a7a2a5bc74 by Andrew Kuchling in branch 'master': bpo-20906: Various revisions to the Unicode howto (#8394) https://github.com/python/cpython/commit/97c288df614dd7856f5a0336925f56a7a2a5bc74
* * *

msg337105 | Author: miss-islington (miss-islington) | Date: 2019-03-04 13:01

New changeset 84fa6b9e5932af981cb299c0c5ac80b9cc37c3fa by Miss Islington (bot) in branch '3.7': bpo-20906: Various revisions to the Unicode howto (GH-8394) https://github.com/python/cpython/commit/84fa6b9e5932af981cb299c0c5ac80b9cc37c3fa
* * *

msg337222 | Author: Mark Dickinson (mark.dickinson) | Date: 2019-03-05 16:32

Thanks for the fix. @akuchling: safe to close this issue?
* * *

msg342499 | Author: A.M. Kuchling (akuchling) | Date: 2019-05-14 18:36

Yes, I think this issue can now be closed.
* * *

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:05 | admin | set | github: 78665 |
| 2019-05-14 18:36:01 | akuchling | set | status: open -> closed; messages: + msg342499 |
| 2019-03-05 16:32:35 | mark.dickinson | set | messages: + msg337222 |
| 2019-03-04 13:01:50 | miss-islington | set | nosy: + miss-islington; messages: + msg337105 |
| 2019-03-04 04:10:40 | miss-islington | set | keywords: + patch; pull_requests: + pull_request12154 |
| 2019-03-04 04:10:38 | akuchling | set | messages: + msg337069 |
| 2019-03-02 21:50:48 | akuchling | set | stage: patch review |
| 2018-09-15 13:46:48 | akuchling | set | assignee: docs@python -> akuchling |
| 2018-08-23 21:25:15 | mark.dickinson | set | messages: + msg323978 |
| 2018-08-23 21:24:39 | mark.dickinson | set | messages: + msg323977 |
| 2018-08-23 21:14:39 | mark.dickinson | create | |
