Issue 36297: Remove unicode_internal codec
Created on 2019-03-15 05:32 by methane, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (9)
msg337965 - (view)
Author: Inada Naoki (methane) *
Date: 2019-03-15 05:32
The unicode_internal codec has been deprecated since Python 3.3.
It has raised a DeprecationWarning since 3.3.
>>> "hello".encode('unicode_internal')
__main__:1: DeprecationWarning: unicode_internal codec has been deprecated
b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'
May I remove it in 3.8?
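For code that still depends on the byte layout shown above, a minimal sketch of a replacement follows. It assumes the output in this message came from a build with 4-byte, little-endian code units, which is what the bytes suggest; under that assumption the explicit "utf-32-le" codec produces the same bytes. The helper name codec_available is hypothetical and used only for illustration.

```python
import codecs

text = "hello"

# "utf-32-le" has a fixed width and byte order, so the result does not
# depend on the build or platform (assumption: the output quoted above came
# from a 4-byte, little-endian build).
encoded = text.encode("utf-32-le")
decoded = encoded.decode("utf-32-le")


def codec_available(name: str) -> bool:
    """Return True if the codec registry can resolve `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


print(encoded)
print(decoded)
print(codec_available("unicode_internal"))
```

On an interpreter where the codec has been removed, the last line should print False, which gives a quick way to check whether a given Python still ships it.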
msg337976 - (view)
Author: STINNER Victor (vstinner) *
Date: 2019-03-15 09:26
I found:
* _PyUnicode_DecodeUnicodeInternal()
* _codecs.unicode_internal_decode()
* _codecs.unicode_internal_encode()
* Lib/encodings/unicode_internal.py
Files which contain "unicode_internal":
Doc/library/codecs.rst
Doc/whatsnew/3.3.rst
Lib/encodings/unicode_internal.py
Lib/test/test_codeccallbacks.py
Lib/test/test_codecs.py
Lib/test/test_unicode.py
Misc/HISTORY
Modules/_codecsmodule.c
Modules/clinic/_codecsmodule.c.h
Objects/unicodeobject.c
PCbuild/lib.pyproj
> May I remove it in 3.8?
Since using the codec emits a DeprecationWarning at runtime, I think that it's safe to remove it.
msg338000 - (view)
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2019-03-15 16:35
What is the purpose of the unicode-internal codec in the first place?
msg338005 - (view)
Author: Marc-Andre Lemburg (lemburg) *
Date: 2019-03-15 16:51
On 15.03.2019 17:35, Serhiy Storchaka wrote:
> What is the purpose of the unicode-internal codec in the first place?
It gives the outside world fast, direct access to the internal representation of Unicode used in Python.
msg338006 - (view)
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2019-03-15 16:55
Is it for debugging only?
msg338009 - (view)
Author: Marc-Andre Lemburg (lemburg) *
Date: 2019-03-15 17:05
On 15.03.2019 17:55, Serhiy Storchaka wrote:
> Is it for debugging only?
No, you can use it to store a Unicode object as-is without any encoding/decoding, but after the recent changes to the internals of the Unicode implementation it's not all that useful anymore, since we now have per-object state which is not reflected by the codec.
msg338164 - (view)
Author: Inada Naoki (methane) *
Date: 2019-03-18 06:44
New changeset 6a16b18224fa98f6d192aa5014affeccc0376eb3 by Inada Naoki in branch 'master':
bpo-36297: remove "unicode_internal" codec (GH-12342)
https://github.com/python/cpython/commit/6a16b18224fa98f6d192aa5014affeccc0376eb3
msg338184 - (view)
Author: STINNER Victor (vstinner) *
Date: 2019-03-18 09:34
Thanks INADA-san. IMHO Python has too many codecs, and it's painful to maintain them. So it's nice to see deprecated ones being removed. Next step: remove all deprecated APIs using Py_UNICODE* :-D (I know that Serhiy is working on that.)
msg338190 - (view)
Author: Inada Naoki (methane) *
Date: 2019-03-18 10:08
I tried removing all the legacy APIs and the wchar_t cache in unicodeobject. This is an experimental branch:
https://github.com/methane/cpython/pull/18/files
I'm thinking about adding a configure option to remove them from 3.8.
* It may help people find third-party extensions that use the legacy API.
* Projects which don't use such third-party extensions can use this option to reduce memory usage (8 bytes per unicode object).
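To see what a per-object saving would look like on a given interpreter, here is a rough sketch using sys.getsizeof. It assumes CPython, where sys.getsizeof reports the bytes held directly by the str object; it does not isolate the wchar_t cache itself. It only shows the fixed per-object cost that such a saving would be taken from, so the numbers can be compared between two builds.

```python
import sys

# CPython-specific assumption: sys.getsizeof covers the str header plus its
# inline character data.
header_bytes = sys.getsizeof("")  # an empty str is almost entirely header

# Growth per additional ASCII character, measured between two lengths.
size_100 = sys.getsizeof("a" * 100)
size_101 = sys.getsizeof("a" * 101)
bytes_per_ascii_char = size_101 - size_100

print(f"empty str: {header_bytes} bytes")
print(f"each extra ASCII character: {bytes_per_ascii_char} byte(s)")
```

Running the same script on builds with and without the legacy fields and comparing the "empty str" figure would show whether the per-object size actually changed.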