[Python-Dev] Format strings, Unicode, and Py2.7: need clarification
Steven D'Aprano
steve at pearwood.info
Wed May 17 20:41:12 EDT 2017
More information about the Python-Dev mailing list
On Wed, May 17, 2017 at 02:41:29PM -0700, Craig Rodrigues wrote:
> e = "{}".format(u"hi")
[...]
> type(e) == str
> The confusion for me is why is type(e) of type str, and not unicode?
I think that's one of the reasons why the Python 2.7 string model is (1)
convenient to those using purely ASCII, but (2) ultimately broken.
You can see why it's broken if you do this:
py> "{}".format(u"hiµ")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 2: ordinal not in range(128)
So str.format on a byte-str template tries to encode the unicode argument
to ASCII, and if that succeeds, it returns a byte str. I'm not sure
whether that was a deliberate design choice for format, or just a side
effect of it calling str() on its arguments by default.
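To make that behaviour concrete, here is a rough emulation that runs on Python 3 (and 2.7). It is a sketch, not how CPython 2.7 actually implements str.format: the name py2_str_format_sketch is hypothetical, and it assumes the only conversion step is an ASCII encode of the formatted text, which is consistent with the traceback above.

```python
def py2_str_format_sketch(template, *args):
    """Hypothetical emulation of 2.7's str.format on a byte-str template.

    The arguments are formatted as text, then the whole result is
    encoded with ASCII (Python 2's default encoding). Any non-ASCII
    character therefore raises UnicodeEncodeError instead of being
    returned, and a successful call yields bytes, not text.
    """
    text = template.decode("ascii").format(*args)
    return text.encode("ascii")


ok = py2_str_format_sketch(b"{}", u"hi")  # b'hi', a byte string

try:
    py2_str_format_sketch(b"{}", u"hi\u00b5")  # u"hi" + MICRO SIGN
except UnicodeEncodeError as exc:
    # Fails at the micro sign, index 2, like the 2.7 session above.
    failed_at = exc.start
```

With an ASCII-only argument the sketch hands back bytes (Python 3's analogue of a 2.7 byte str); with u"hi\u00b5" it raises UnicodeEncodeError at position 2.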
I'm not sure if I've answered your question or not. Are you looking for
justification of this misfeature, or an explanation of the historical
reasons why it exists, or something else?
(If you're looking for the same behaviour in Python 3 and 2.7, probably
the best thing you can do is just religiously use unicode strings u'' in
both. You might try:
from __future__ import unicode_literals
in 2.7, but I'm not sure that's enough.)
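A minimal sketch of the "use u'' everywhere" advice, written so the same file runs on 2.7 and 3.3+ (both accept the u'' prefix). The helper render_greeting is hypothetical. The non-ASCII character is spelled as \u00b5 so the sketch doesn't depend on a source encoding declaration or on from __future__ import unicode_literals.

```python
# Keep both the template and the arguments as unicode text.
# The u'' prefix is valid on Python 2.7 and on Python 3.3+.

TEXT_TYPE = type(u"")  # 'unicode' on 2.7, 'str' on 3.x


def render_greeting(name):
    """Hypothetical helper: format a unicode template with a unicode argument."""
    return u"{}".format(name)


e = render_greeting(u"hi\u00b5")  # u"hi" + MICRO SIGN

# A unicode template gives a unicode result on both versions,
# so no implicit ASCII encoding is attempted and nothing is raised.
assert type(e) is TEXT_TYPE
```

The contrast with the original question is the template: in "{}".format(u"hi") the template is a byte str on 2.7, and that is what pulls the result back to str.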
--
Steve