Issue 36297: Remove unicode_internal codec

Created on 2019-03-15 05:32 by methane, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (9)
msg337965 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-03-15 05:32
The unicode_internal codec has been deprecated since Python 3.3.
Using it has raised a DeprecationWarning since 3.3.

>>> "hello".encode('unicode_internal')
__main__:1: DeprecationWarning: unicode_internal codec has been deprecated
b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'

May I remove it in 3.8?
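
For code that still depends on the codec, a minimal migration sketch follows. It assumes the only goal is to reproduce the bytes shown in the example above, which are identical to what the standard `utf-32-le` codec produces (4 bytes per code point, little-endian, no BOM). The helper names `encode_fixed_width` and `decode_fixed_width` are hypothetical and are not part of CPython.

```python
def encode_fixed_width(text: str) -> bytes:
    """Hypothetical helper: encode text as UTF-32 little-endian without a BOM."""
    return text.encode("utf-32-le")


def decode_fixed_width(data: bytes) -> str:
    """Hypothetical helper: inverse of encode_fixed_width."""
    return data.decode("utf-32-le")


encoded = encode_fixed_width("hello")
print(encoded)
print(decode_fixed_width(encoded))
```

Unlike unicode_internal, `utf-32-le` gives the same result on every platform, so stored data stays portable.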
msg337976 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-15 09:26
I found:

* _PyUnicode_DecodeUnicodeInternal()
* _codecs.unicode_internal_decode()
* _codecs.unicode_internal_encode()
* Lib/encodings/unicode_internal.py

Files which contain "unicode_internal":

Doc/library/codecs.rst
Doc/whatsnew/3.3.rst
Lib/encodings/unicode_internal.py
Lib/test/test_codeccallbacks.py
Lib/test/test_codecs.py
Lib/test/test_unicode.py
Misc/HISTORY
Modules/_codecsmodule.c
Modules/clinic/_codecsmodule.c.h
Objects/unicodeobject.c
PCbuild/lib.pyproj


> May I remove it in 3.8?

Since using the codec emits a DeprecationWarning at runtime, I think that it's safe to remove it.
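
As a rough sketch of how a project could check whether it still relies on the codec, the snippet below turns DeprecationWarning into an exception around an encode call. The helper name `encode_strict` is hypothetical. The behavior on 3.8 and later assumes the codec has been removed as proposed here, in which case the lookup fails with LookupError instead.

```python
import warnings


def encode_strict(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode text, raising on any DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate deprecation warnings to exceptions inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        return text.encode(encoding)


try:
    encode_strict("hello", "unicode_internal")
except DeprecationWarning as exc:
    # Expected on versions where the codec exists but is deprecated.
    print("deprecated:", exc)
except LookupError as exc:
    # Expected on versions where the codec has been removed.
    print("removed:", exc)
```

Running a test suite with `python -W error::DeprecationWarning` applies the same escalation process-wide.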
msg338000 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-15 16:35
What is the purpose of the unicode-internal codec in the first place?
msg338005 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-15 16:51
On 15.03.2019 17:35, Serhiy Storchaka wrote:
> 
> What is the purpose of the unicode-internal codec in the first place?

It gives the outside world fast, direct access to the internal
representation of Unicode used in Python.
msg338006 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-15 16:55
Is it for debugging only?
msg338009 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-15 17:05
On 15.03.2019 17:55, Serhiy Storchaka wrote:
> Is it for debugging only?

No, you can use it to store a Unicode object as-is without any
encoding/decoding, but after the recent changes to the internals
of the Unicode implementation it's not all that useful anymore,
since we now have per-object state which is not reflected by the
codec.
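
To illustrate what per-object state can look like, here is a small sketch. It assumes the "recent changes" are the flexible string representation introduced in Python 3.3 (PEP 393), where each str object picks its own storage width; the message above does not name the change explicitly. The sizes reported by `sys.getsizeof` are a CPython implementation detail, so only their ordering is meaningful.

```python
import sys

# Three strings of equal length whose widest code point differs.
samples = {
    "ASCII": "a" * 1000,
    "BMP": "\u20ac" * 1000,
    "astral": "\U0001F600" * 1000,
}

# On CPython 3.3+, each object stores its characters at its own width,
# so the in-memory size differs even though len() is identical.
sizes = {label: sys.getsizeof(text) for label, text in samples.items()}

for label, size in sizes.items():
    print(f"{label}: len={len(samples[label])}, getsizeof={size}")
```

A single fixed-width byte dump, as produced by unicode_internal, cannot reflect this per-object choice of width.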
msg338164 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-03-18 06:44
New changeset 6a16b18224fa98f6d192aa5014affeccc0376eb3 by Inada Naoki in branch 'master':
bpo-36297: remove "unicode_internal" codec (GH-12342)
https://github.com/python/cpython/commit/6a16b18224fa98f6d192aa5014affeccc0376eb3
msg338184 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-18 09:34
Thanks INADA-san. IMHO Python has too many codecs, and it's painful to maintain them, so it's nice to see deprecated ones removed.

Next step: remove all deprecated APIs using Py_UNICODE* :-D (I know that Serhiy is working on that.)
msg338190 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-03-18 10:08
I tried removing all the legacy APIs and the wchar_t cache in unicodeobject.  This is an experimental branch:
https://github.com/methane/cpython/pull/18/files


I'm thinking about adding a configure option to remove them, starting in 3.8.

* It may help people find third-party extensions that use the legacy API.
* Projects that don't use such third-party extensions can use this option to reduce memory usage (8 bytes per unicode object).