Issue 30588: Missing documentation for codecs.escape_decode

Created on 2017-06-07 15:02 by mdartiailh, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL	Status	Linked
PR 14747	open	carlbordum, 2019-07-13 14:19

Messages (9)
msg295342 - (view)	Author: Matthieu Dartiailh (mdartiailh) *	Date: 2017-06-07 15:02
codecs.escape_decode does not appear in the codecs documentation. To my knowledge, this function is the only convenient way to process the escape sequences in a literal string (I found it here: https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python). It is most useful when implementing a parser for a language that extends Python's semantics while retaining Python's processing of strings (cf. https://github.com/MatthieuDartiailh/enaml). Is there a reason this function is not documented?
msg295344 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-06-07 15:22
This is an internal function kept for compatibility. It is used only for decoding pickle protocol 0 data created in Python 2. Look at unicode_escape and raw_unicode_escape codecs for doing similar decoding to strings in Python 3.
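For readers comparing the two documented codecs named above, here is a minimal sketch of how they differ. The sample bytes in `escaped` are made up for illustration and do not come from the issue.

```python
# Sketch only: `escaped` is hypothetical sample data, not taken from the issue.
escaped = b'caf\\xe9\\n\\u0394'  # escape sequences spelled out as ASCII bytes

# unicode_escape processes every Python string-literal escape sequence.
full = escaped.decode('unicode_escape')

# raw_unicode_escape processes only \uXXXX and \UXXXXXXXX;
# other backslash sequences are left as they are.
raw_only = escaped.decode('raw_unicode_escape')

print(repr(full))
print(repr(raw_only))
```

Both codecs treat every byte that is not part of an escape sequence as Latin-1.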
msg295347 - (view)	Author: Matthieu Dartiailh (mdartiailh) *	Date: 2017-06-07 15:36
The issue is that unicode_escape does not properly handle strings that mix non-Latin-1 Unicode characters with escape sequences, because it assumes Latin-1 compatible characters only. For example, the literal string r'Δ\nΔ' cannot be encoded using latin-1, and encoding it using utf-8 and then decoding it with unicode_escape produces the wrong output: 'Î\x94\nÎ\x94'. However, codecs.escape_decode(r'Δ\nΔ'.encode('utf-8'))[0].decode('utf-8') gives the proper output. Internally the Python parser handles this case, but I was unable to find where, and this is the closest solution I found. I guess it may be possible using error handlers, but that seems much more cumbersome. Best regards, Matthieu
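The behaviour described in this message can be reproduced with a short sketch. The `alt` line is an assumption about what "using error handlers" could look like (the documented `backslashreplace` handler); it is not something proposed in the thread, and it mishandles a lone backslash that directly precedes a non-Latin-1 character.

```python
import codecs

raw = r'Δ\nΔ'  # a literal backslash + n between two non-Latin-1 characters

# unicode_escape reads the input bytes as Latin-1, so the UTF-8 bytes of Δ
# (0xCE 0x94) come out as two separate characters.
mangled = raw.encode('utf-8').decode('unicode_escape')

# escape_decode (undocumented, internal) works on bytes and leaves
# non-escape bytes untouched, so the UTF-8 round trip survives.
decoded_bytes, _consumed = codecs.escape_decode(raw.encode('utf-8'))
fixed = decoded_bytes.decode('utf-8')

# Assumed alternative using only documented APIs: turn non-Latin-1
# characters into \uXXXX escapes first, then decode everything at once.
alt = raw.encode('latin-1', 'backslashreplace').decode('unicode_escape')

print(repr(mangled), repr(fixed), repr(alt))
```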
msg327259 - (view)	Author: Paul Hoffman (paulehoffman) *	Date: 2018-10-06 21:48
Bumping this thread a bit. It appears that this "internal" function is being talked about out in the real world. I came across it in a recent blog post, saw that it wasn't in the official documentation, and went looking here. I propose that it be documented even if it feels like a tad of a kludge.
msg327268 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2018-10-07 07:58
-1. An internal function means you can use it at your own risk, but the function can be changed or even removed in any Python release. I see no point in documenting it and making it public.
msg339469 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2019-04-05 01:03
We can't change it or remove it; it is public by virtue of its name. We should document it. Removing it or renaming it to be _private requires a PendingDeprecationWarning -> DeprecationWarning -> removal cycle. It is well known and used: https://stackoverflow.com/questions/14820429/how-do-i-decodestring-escape-in-python3/23151714#23151714
msg347827 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-07-13 14:32
I disagree. We can change, rename or remove it because it is not a public function and never was. But we cannot just remove it while it is used in the pickle module, and there is no reason to change it as it works pretty well for its purpose. If you want to make it public and maintain it, I suggest first discussing this on the Python-Ideas mailing list. You should prove that the benefit of adding it is larger than the cost of the maintenance.
msg347919 - (view)	Author: Carl Bordum Hansen (carlbordum) *	Date: 2019-07-14 14:30
You have a point, the function is not in codecs.__all__. Reading the stackoverflow questions, it seems like this is a function that is useful.
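The `codecs.__all__` observation can be checked directly. This is a small sketch that reflects CPython as discussed in this thread; because the function is internal, the result may differ in other versions or implementations.

```python
import codecs

# The name is importable from the module...
is_reachable = hasattr(codecs, 'escape_decode')

# ...but it is not listed among the module's declared public names.
is_exported = 'escape_decode' in codecs.__all__

print(f"reachable: {is_reachable}, in __all__: {is_exported}")
```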
msg347922 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-07-14 14:55
Reading the stackoverflow questions, I am not sure that this function would be useful for the author of the question. He just needs to remove b'\\000'; that is all we know. There are many ways to do it, and after using codecs.escape_decode() you will still need to remove b'\000'. If you want to add a feature similar to the "string-escape" codec in Python 3, it is better to provide it officially as a new codec "bytes-escape" (functions like codecs.utf_16_le_decode() are internal). But we should discuss its behavior, taking into account the difference between string literals in Python 2 and bytes literals in Python 3. For example, how should non-escaped non-ASCII bytes be treated (they were acceptable in Python 2, but not in Python 3)?
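Two of the points in this message can be illustrated with a short sketch. The sample value `data` is made up, since the thread only says the question's author needed to get rid of b'\\000'.

```python
import codecs

# Hypothetical sample: three letters followed by the four ASCII bytes \ 0 0 0.
data = b'abc\\000'

# escape_decode turns the octal escape into a real NUL byte...
decoded, _ = codecs.escape_decode(data)

# ...which still has to be removed in a separate step.
stripped = decoded.rstrip(b'\x00')

# Python 3 bytes literals reject non-escaped non-ASCII characters,
# while escape_decode passes such bytes through unchanged.
try:
    compile("b'\u0394'", '<example>', 'eval')
    literal_rejected = False
except SyntaxError:
    literal_rejected = True

passthrough = codecs.escape_decode('\u0394'.encode('utf-8'))[0]

print(decoded, stripped, literal_rejected, passthrough)
```

The last two results show the mismatch that a "bytes-escape" codec would have to resolve: the internal function accepts input that a Python 3 bytes literal would not.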

History
Date	User	Action	Args
2022-04-11 14:58:47	admin	set	github: 74773
2019-07-14 14:55:14	serhiy.storchaka	set	messages: + msg347922
2019-07-14 14:30:25	carlbordum	set	nosy: + carlbordum messages: + msg347919
2019-07-13 14:32:02	serhiy.storchaka	set	messages: + msg347827
2019-07-13 14:19:48	carlbordum	set	keywords: + patch stage: needs patch -> patch review pull_requests: + pull_request14542
2019-04-05 01:03:44	gregory.p.smith	set	stage: needs patch
2019-04-05 01:03:29	gregory.p.smith	set	nosy: + gregory.p.smith, njs messages: + msg339469 versions: + Python 3.8, - Python 3.3, Python 3.4, Python 3.5, Python 3.6
2019-04-05 01:02:19	gregory.p.smith	link	issue36530 superseder
2018-10-07 07:58:25	asvetlov	set	nosy: + asvetlov messages: + msg327268
2018-10-06 21:48:10	paulehoffman	set	nosy: + paulehoffman messages: + msg327259
2017-06-07 15:36:21	mdartiailh	set	messages: + msg295347
2017-06-07 15:22:14	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg295344
2017-06-07 15:02:55	mdartiailh	create