Issue9542
Created on 2010-08-08 23:56 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| PyUnicode_FSDecoder.patch | vstinner, 2010-08-08 23:56 | |
| Messages (3) | |||
|---|---|---|---|
| msg113352 | Author: STINNER Victor (vstinner) |
Date: 2010-08-08 23:56 | |
For my work on #9425 (Rewrite import machinery to work with unicode paths), I need a PyArg_Parse converter that converts bytes and str to str. PyUnicode_FSConverter() does the opposite: it encodes str to bytes.

To handle (input) filenames in a function, we have 3 choices:

1/ use bytes: that's the current choice for most Python functions. It gives full unicode support on POSIX OSes (filesystems using a bytes API), but it is not enough for Windows (Windows uses the mbcs encoding, which covers only a very small subset of Unicode)
2/ use str with PEP 383 (surrogateescape): it began to be used in Python 3.1, and more seriously in Python 3.2. It offers full unicode support on all OSes (POSIX and Windows)
3/ use the native type for each OS (bytes on POSIX, str on Windows): I dislike this solution because it implies code duplication

PyUnicode_FSConverter() is the converter for solution (1). PyUnicode_FSDecoder() will be the converter for solution (2). |
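As a rough illustration of why solution (2) can represent any filename, the sketch below round-trips undecodable bytes through str using the PEP 383 `surrogateescape` error handler. It assumes a UTF-8 filesystem encoding, hard-coded here for reproducibility; the sample filename is made up for the example.

```python
# Assumption: the filesystem encoding is UTF-8 (hard-coded for illustration).
FS_ENCODING = "utf-8"

# A filename containing the Latin-1 byte 0xE9, which is not valid UTF-8 here.
raw_name = b"caf\xe9.txt"

# Decoding with surrogateescape maps the undecodable byte to a lone surrogate
# (U+DCE9) instead of raising UnicodeDecodeError.
text_name = raw_name.decode(FS_ENCODING, "surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
restored = text_name.encode(FS_ENCODING, "surrogateescape")

# A strict encode refuses the lone surrogate, so the escape is never silent.
try:
    text_name.encode(FS_ENCODING)
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The str form can be passed around and manipulated like any other text, and the original bytes are recovered only at the point where a bytes API needs them.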
|||
| msg113740 | Author: STINNER Victor (vstinner) |
Date: 2010-08-13 01:47 | |
Lib/os.py may also be patched to add a Python implementation. Eg.

    def fsdecode(value):
        if isinstance(value, str):
            return value
        elif isinstance(value, bytes):
            encoding = sys.getfilesystemencoding()
            if encoding == 'mbcs':
                return value.decode(encoding)
            else:
                return value.decode(encoding, 'surrogateescape')
        else:
            raise TypeError("expect bytes or str, not %s" % type(value).__name__)

--

Note: Solution (1) (use bytes API) is not deprecated by this issue. PyUnicode_FSConverter is still useful if the underlying library has a bytes API (eg. OpenSSL only supports char*). Solution (2) is preferred if we have access to a character API, eg. Windows wide character API. |
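The choice described in the note can be sketched in pure Python as a pair of converters: one normalizing to bytes for a bytes-only backend (solution 1), one normalizing to str for a character API (solution 2). The names `to_bytes_path` and `to_str_path` and the fixed `FS_ENCODING` are hypothetical, chosen only for this example; they are not the functions added by the patch, and the mbcs special case is left out for brevity.

```python
# Assumption: a fixed UTF-8 filesystem encoding, for a reproducible example.
FS_ENCODING = "utf-8"


def to_bytes_path(path):
    """Hypothetical helper: normalize a filename to bytes (solution 1)."""
    if isinstance(path, bytes):
        return path
    if isinstance(path, str):
        return path.encode(FS_ENCODING, "surrogateescape")
    raise TypeError("expect bytes or str, not %s" % type(path).__name__)


def to_str_path(path):
    """Hypothetical helper: normalize a filename to str (solution 2)."""
    if isinstance(path, str):
        return path
    if isinstance(path, bytes):
        return path.decode(FS_ENCODING, "surrogateescape")
    raise TypeError("expect bytes or str, not %s" % type(path).__name__)


# A caller wrapping a char*-style library would use to_bytes_path();
# a caller wrapping a wide-character API would use to_str_path().
for_bytes_api = to_bytes_path("data.txt")
for_char_api = to_str_path(b"data\xff.txt")
```

Because both helpers use the same error handler, converting in one direction and then the other returns the original value, so a function can accept either type regardless of which API it calls underneath.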
|||
| msg113854 | Author: STINNER Victor (vstinner) |
Date: 2010-08-14 00:00 | |
Committed to 3.2 as r83990. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:05 | admin | set | github: 53751 |
| 2010-08-14 00:00:29 | vstinner | set | status: open -> closed resolution: fixed messages: + msg113854 |
| 2010-08-13 01:47:35 | vstinner | set | messages: + msg113740 |
| 2010-08-08 23:56:47 | vstinner | create | |
