Message 315504 - Python tracker

Author	pekka.klarck
Recipients	pekka.klarck
Date	2018-04-20.07:24:41
Message-id	<1524209082.19.0.682650639539.issue33317@psf.upfronthosting.co.za>

Content

If I have two strings that look the same but use different Unicode normalization forms, it's very hard to see where the problem actually is:

>>> a = 'hyv\xe4'
>>> b = 'hyva\u0308'
>>> print(a)
hyvä
>>> print(b)
hyvä
>>> a == b
False
>>> print(repr(a))
'hyvä'
>>> print(repr(b))
'hyvä'
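
As a point of comparison, the difference can already be made visible with existing tools. The following is a minimal sketch using only the built-in `ascii()` and the standard `unicodedata` module; the variable names `same_after_nfc`, `a_is_nfc` and `b_is_nfc` are just illustrative.

```python
import unicodedata

a = 'hyv\xe4'      # precomposed: U+00E4
b = 'hyva\u0308'   # decomposed: 'a' followed by U+0308 COMBINING DIAERESIS

# ascii() escapes every non-ASCII character, so the two forms look different.
print(ascii(a))    # 'hyv\xe4'
print(ascii(b))    # 'hyva\u0308'

# Normalizing both sides to the same form makes them compare equal.
same_after_nfc = unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)
print(same_after_nfc)  # True

# A string is in NFC if normalizing it to NFC changes nothing.
a_is_nfc = unicodedata.normalize('NFC', a) == a
b_is_nfc = unicodedata.normalize('NFC', b) == b
print(a_is_nfc, b_is_nfc)  # True False
```

The drawback of `ascii()` is that it escapes all non-ASCII characters, including the precomposed `ä`, so it is noisier than `repr()` for text that is mostly non-ASCII.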

This affects, among other things, test automation frameworks that use `repr()` in error reporting. For example, both unittest and pytest report `self.assertEqual('hyv\xe4', 'hyva\u0308')` like this:

AssertionError: 'hyvä' != 'hyvä'
- hyvä
+ hyvä
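
Until something changes in `repr()`, a test helper can work around the problem. The sketch below is only an illustration: `explain_mismatch` is a hypothetical function, not part of unittest or pytest, and it assumes that falling back to `ascii()` is acceptable in the failure message.

```python
import unicodedata

def explain_mismatch(first, second):
    """Return a message that exposes differences hidden by normalization forms."""
    if first == second:
        return 'strings are equal'
    # ascii() shows escapes, so precomposed and decomposed forms differ visibly.
    lines = ['%s != %s' % (ascii(first), ascii(second))]
    if unicodedata.normalize('NFC', first) == unicodedata.normalize('NFC', second):
        lines.append('strings differ only in Unicode normalization form')
    return '\n'.join(lines)

message = explain_mismatch('hyv\xe4', 'hyva\u0308')
print(message)
# 'hyv\xe4' != 'hyva\u0308'
# strings differ only in Unicode normalization form
```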

Because strings are normally in NFC form, I propose that `repr()` show the decomposed form explicitly if the string is in NFD. In practice I'd like `repr('hyva\u0308')` to yield `'hyva\u0308'`.
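
To make the proposal concrete, here is a rough sketch of the behavior in pure Python. The name `nfd_aware_repr` is hypothetical, and the rule it applies (escape combining characters whenever the string is not already in NFC) is one possible reading of the proposal, not how `repr()` works today.

```python
import unicodedata

def nfd_aware_repr(s):
    """Hypothetical repr() variant: escape combining characters in non-NFC strings."""
    plain = repr(s)
    # Strings already in NFC are shown exactly as repr() shows them now.
    if unicodedata.normalize('NFC', s) == s:
        return plain
    parts = []
    for ch in plain:
        if unicodedata.combining(ch):
            # Combining marks are replaced by \uXXXX or \UXXXXXXXX escapes.
            code = ord(ch)
            parts.append('\\u%04x' % code if code <= 0xFFFF else '\\U%08x' % code)
        else:
            parts.append(ch)
    return ''.join(parts)

print(nfd_aware_repr('hyv\xe4') == repr('hyv\xe4'))  # True
print(nfd_aware_repr('hyva\u0308'))                  # 'hyva\u0308'
```

With this rule the output still evaluates back to the original string, so the usual `eval(repr(s)) == s` expectation is preserved for these inputs.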

History
Date	User	Action	Args
2018-04-20 07:24:42	pekka.klarck	set	recipients: + pekka.klarck
2018-04-20 07:24:42	pekka.klarck	set	messageid: <1524209082.19.0.682650639539.issue33317@psf.upfronthosting.co.za>
2018-04-20 07:24:42	pekka.klarck	link	issue33317 messages
2018-04-20 07:24:41	pekka.klarck	create