bpo-41894: Fix UTF-8 errors loading native module by kadler · Pull Request #22466 · python/cpython

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

kadler

Copy link

Contributor

@kadler kadler commented Sep 30, 2020

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

While this is most likely to occur on AIX, which still uses non-UTF-8
locales by default, it could occur on any Unix-like system.

https://bugs.python.org/issue41894

methane

@taleinat

@aeros, why the "skip news" label? This is clearly a bug fix and those usually entail NEWS entries.

@kadler

Copy link

Contributor Author

kadler commented Oct 2, 2020

ZackerySpytz

serhiy-storchaka

serhiy-storchaka

serhiy-storchaka

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.
For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

@kadler

Copy link

Contributor Author

kadler commented Oct 13, 2020

Ok, I believe all suggestions have been addressed. Wherever a filesystem path has been used, I have switched it to use PyUnicode_DecodeFSDefault and for any system error messages I have used PyUnicode_DecodeLocale. While PEP 489 dictates that the function name will always be ASCII, I have not switched to using PyUnicode_DecodeASCII since it does not solve any problem (other than being potentially less efficient), thus reducing the size of the changes.

serhiy-storchaka

methane

@miss-islington

Thanks @kadler for the PR, and @methane for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8, 3.9.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request

Oct 15, 2020
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>

@bedevere-bot

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request

Oct 15, 2020
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>

@bedevere-bot

miss-islington added a commit that referenced this pull request

Oct 15, 2020
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>

miss-islington added a commit that referenced this pull request

Oct 15, 2020
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af32)

Co-authored-by: Kevin Adler <kadler@us.ibm.com>

@kadler kadler deleted the unicode-dynload-error branch

Oct 16, 2020

xzy3 pushed a commit to xzy3/cpython that referenced this pull request

Oct 18, 2020
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

adorilson pushed a commit to adorilson/cpython that referenced this pull request

Mar 13, 2021
…GH-22466)

When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.

PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem  paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is left decoded using PyUnicode_FromString to
minimize the changes and since it should not affect the decoding (albeit
_potentially_ slower).

In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as
is used elsewhere.

* bpo-41894: Fix bugs in dynload error msg handling

For both dynload_aix and dynload_hpux, properly handle the possibility
that decoding strings may return NULL and when such an error happens,
properly decrement any previously decoded strings and return early.

In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

Labels

type-bug

An unexpected behavior, bug, or error