Issue30588
Created on 2017-06-07 15:02 by mdartiailh, last changed 2022-04-11 14:58 by admin.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 14747 | open | carlbordum, 2019-07-13 14:19 | |
| Messages (9) | |||
|---|---|---|---|
| msg295342 - (view) | Author: Matthieu Dartiailh (mdartiailh) * | Date: 2017-06-07 15:02 | |
codecs.escape_decode does not appear in the codecs documentation. To my knowledge, this function is the only convenient way to process the escape sequences in a literal string (I found it here: https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python). It is most useful when implementing a parser for a language that extends Python semantics while retaining Python's processing of strings (cf. https://github.com/MatthieuDartiailh/enaml). Is there a reason this function is not documented? |
|||
| msg295344 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2017-06-07 15:22 | |
This is an internal function kept for compatibility. It is used only for decoding pickle protocol 0 data created in Python 2. Look at unicode_escape and raw_unicode_escape codecs for doing similar decoding to strings in Python 3. |
|||
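A minimal sketch of the two codecs mentioned above, applied to ASCII-only bytes. The sample input is invented for illustration; the point is only that unicode_escape processes all string-literal escapes, while raw_unicode_escape processes only \uXXXX and \UXXXXXXXX.

```python
# Hypothetical sample input: raw bytes containing a backslash-t and a backslash-u escape.
escaped = rb'tab:\t euro:\u20ac'

# unicode_escape interprets every escape sequence, as in a str literal.
via_unicode_escape = escaped.decode('unicode_escape')

# raw_unicode_escape interprets only \uXXXX / \UXXXXXXXX, as in a raw str literal;
# the backslash-t pair is left untouched.
via_raw = escaped.decode('raw_unicode_escape')

print(repr(via_unicode_escape))
print(repr(via_raw))
```

Both codecs treat any remaining bytes as Latin-1, which is the limitation raised in the next message.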
| msg295347 - (view) | Author: Matthieu Dartiailh (mdartiailh) * | Date: 2017-06-07 15:36 | |
The issue is that unicode_escape does not properly handle strings that
mix non-Latin-1 Unicode characters with escape sequences, because it
assumes Latin-1 compatible characters only. For example, the literal
string 'Δ\nΔ' cannot be encoded using latin-1, and encoding it using
utf-8 and then decoding it with unicode_escape produces the wrong
output: 'Î\x94\nÎ\x94'. However, using
codecs.escape_decode(r'Δ\nΔ'.encode('utf-8'))[0].decode('utf-8') gives
the proper output. Internally the Python parser handles this case, but I
was unable to find where, and this is the closest solution I found. I
guess it may be possible using error handlers, but that seems much more
cumbersome.
Best regards
Matthieu
|
|||
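The comparison described in this message can be reproduced with a short sketch. It relies on the undocumented codecs.escape_decode, so it may not keep working in future releases; the variable names are chosen for illustration only.

```python
import codecs

# A backslash followed by 'n' (not a real newline), surrounded by non-Latin-1 text.
literal = r'Δ\nΔ'

# unicode_escape decodes the UTF-8 bytes as Latin-1, so each 'Δ' is mangled.
mangled = literal.encode('utf-8').decode('unicode_escape')

# escape_decode works on bytes and returns (unescaped_bytes, length_consumed);
# the UTF-8 bytes pass through untouched and can be decoded afterwards.
unescaped_bytes, _consumed = codecs.escape_decode(literal.encode('utf-8'))
result = unescaped_bytes.decode('utf-8')

print(repr(mangled))
print(repr(result))
```

The first value matches the wrong output quoted above, and the second is 'Δ', a real newline, then 'Δ'.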
| msg327259 - (view) | Author: Paul Hoffman (paulehoffman) * | Date: 2018-10-06 21:48 | |
Bumping this thread a bit. It appears that this "internal" function is being talked about out in the real world. I came across it in a recent blog post, saw that it wasn't in the official documentation, and went looking here. I propose that it be documented even if it feels like a tad of a kludge. |
|||
| msg327268 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2018-10-07 07:58 | |
-1. "Internal function" means you can use it at your own risk, but the function can be changed or even removed in any Python release. I see no point in documenting it and making it public. |
|||
| msg339469 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2019-04-05 01:03 | |
We can't change it or remove it; it is public by virtue of its name. We should document it. Removing it or renaming it to be _private requires a PendingDeprecationWarning -> DeprecationWarning -> removal cycle. It is well known and used. https://stackoverflow.com/questions/14820429/how-do-i-decodestring-escape-in-python3/23151714#23151714 |
|||
| msg347827 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-07-13 14:32 | |
I disagree. We can change, rename or remove it because it is not a public function and never was. But we cannot just remove it while it is used in the pickle module, and there is no reason to change it as it works pretty well for its purpose. If you want to make it public and maintain it, I suggest first discussing this on the Python-Ideas mailing list. You should prove that the benefit of adding it is larger than the cost of the maintenance. |
|||
| msg347919 - (view) | Author: Carl Bordum Hansen (carlbordum) * | Date: 2019-07-14 14:30 | |
You have a point, the function is not in codecs.__all__. Reading the stackoverflow questions, it seems like this is a function that is useful. |
|||
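The observation about codecs.__all__ can be checked directly. This is only a sketch of the check, assuming a CPython version where the situation described in this thread still holds.

```python
import codecs

# Names listed in __all__ form the module's declared public API.
is_exported = 'escape_decode' in codecs.__all__

# The function is still reachable as an attribute of the module.
is_reachable = hasattr(codecs, 'escape_decode')

print(is_exported, is_reachable)
```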
| msg347922 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-07-14 14:55 | |
Reading the Stack Overflow questions, I am not sure that this function would be useful for the author of the question. He just needs to remove b'\\000'; that is all we know. There are many ways to do it, and after using codecs.escape_decode() you will still need to remove b'\000'. If you want to add a feature similar to the "string-escape" codec in Python 3, it is better to provide it officially as a new codec "bytes-escape" (functions like codecs.utf_16_le_decode() are internal). But we should discuss its behavior, taking into account the difference between string literals in Python 2 and bytes literals in Python 3. For example, how should non-escaped non-ASCII bytes be treated (they were acceptable in Python 2, but not in Python 3)? |
|||
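The two points in this message can be sketched as follows. The sample data is hypothetical, and codecs.escape_decode is undocumented, so this shows current behavior rather than a guaranteed contract.

```python
import codecs

# Hypothetical data ending in an escaped octal NUL: a backslash followed by "000".
data = b'value\\000'

# escape_decode turns the escape into a real NUL byte; it still has to be removed.
decoded, _consumed = codecs.escape_decode(data)
cleaned = decoded.rstrip(b'\x00')

# A non-escaped non-ASCII byte is passed through unchanged by escape_decode...
passthrough, _ = codecs.escape_decode(b'\xe9\\n')

# ...whereas Python 3 rejects the same character written directly in a bytes literal.
try:
    compile("b'\u00e9'", '<example>', 'eval')
    literal_accepted = True
except SyntaxError:
    literal_accepted = False

print(decoded, cleaned, passthrough, literal_accepted)
```

That mismatch between what escape_decode accepts and what a bytes literal accepts is the kind of behavior a "bytes-escape" codec would have to settle.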
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:47 | admin | set | github: 74773 |
| 2019-07-14 14:55:14 | serhiy.storchaka | set | messages: + msg347922 |
| 2019-07-14 14:30:25 | carlbordum | set | nosy: + carlbordum; messages: + msg347919 |
| 2019-07-13 14:32:02 | serhiy.storchaka | set | messages: + msg347827 |
| 2019-07-13 14:19:48 | carlbordum | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request14542 |
| 2019-04-05 01:03:44 | gregory.p.smith | set | stage: needs patch |
| 2019-04-05 01:03:29 | gregory.p.smith | set | nosy: + gregory.p.smith, njs; messages: + msg339469 |
| 2019-04-05 01:02:19 | gregory.p.smith | link | issue36530 superseder |
| 2018-10-07 07:58:25 | asvetlov | set | nosy: + asvetlov; messages: + msg327268 |
| 2018-10-06 21:48:10 | paulehoffman | set | nosy: + paulehoffman; messages: + msg327259 |
| 2017-06-07 15:36:21 | mdartiailh | set | messages: + msg295347 |
| 2017-06-07 15:22:14 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg295344 |
| 2017-06-07 15:02:55 | mdartiailh | create | |
