bpo-41894: Fix UTF-8 errors loading native module by kadler · Pull Request #22466 · python/cpython
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.
While this is most likely to occur on AIX, which still uses non-UTF-8
locales by default, it could occur on any Unix-like system.
@aeros, why the "skip news" label? This is clearly a bug fix and those usually entail NEWS entries.
When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere.
For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Ok, I believe all suggestions have been addressed. Wherever a filesystem path has been used, I have switched it to use PyUnicode_DecodeFSDefault and for any system error messages I have used PyUnicode_DecodeLocale. While PEP 489 dictates that the function name will always be ASCII, I have not switched to using PyUnicode_DecodeASCII since it does not solve any problem (other than being potentially less efficient), thus reducing the size of the changes.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
Oct 15, 2020…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
Oct 15, 2020…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
miss-islington added a commit that referenced this pull request
Oct 15, 2020When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
miss-islington added a commit that referenced this pull request
Oct 15, 2020When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
kadler
deleted the
unicode-dynload-error
branch
xzy3 pushed a commit to xzy3/cpython that referenced this pull request
Oct 18, 2020…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
adorilson pushed a commit to adorilson/cpython that referenced this pull request
Mar 13, 2021…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>