Unicode script
MRAB
python at mrabarnett.plus.com
Thu Dec 15 21:44:20 EST 2016
More information about the Python-list mailing list
On 2016-12-15 21:57, Terry Reedy wrote:
> On 12/15/2016 1:06 PM, MRAB wrote:
>> On 2016-12-15 16:53, Steve D'Aprano wrote:
>>> Suppose I have a Unicode character, and I want to determine the script or
>>> scripts it belongs to.
>>>
>>> For example:
>>>
>>> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
>>> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
>>> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
>>>
>>>
>>> Is this information available from Python?
>>>
>>>
>>> More about Unicode scripts:
>>>
>>> http://www.unicode.org/reports/tr24/
>>> http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt
>>> http://www.unicode.org/Public/UCD/latest/ucd/ScriptExtensions.txt
>>>
>>>
>> Interestingly, there's issue 6331 "Add unicode script info to the
>> unicode database". Looks like it didn't make it into Python 3.6.
>
> https://bugs.python.org/issue6331
> Opened in 2009 with patch and 2 revisions for 2.x. At least the Python
> code needs to be updated.
>
> Approved in principle by Martin, then unicodedata curator, but no longer
> active. Neither, very much, are the other 2 listed in the Expert's index.
>
> From what I could see, both the Python API (there is no doc patch yet)
> and internal implementation need more work. If I were to get involved,
> I would look at the APIs of PyICU (see Eryk Sun's post) and the
> unicodescript module on PyPI (mention by Pander Musubi, on the issue).
>
For what it's worth, the post has prompted me to get back to a module I
started which will report such Unicode properties, essentially the ones
that the regex module supports. It just needs a few more tweaks and
packaging up...
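
Until something along those lines is available, a script lookup can be
sketched with the standard library alone by parsing data laid out like
Scripts.txt. What follows is only a rough illustration, not the module
mentioned above and not the API proposed in issue 6331. The names
parse_scripts, script_of and SAMPLE_SCRIPTS_TXT are made up for this
example, and the sample data is a small hand-written excerpt in the
Scripts.txt layout, not the real file. It also ignores
ScriptExtensions.txt, so it reports one script per character.

```python
import bisect

# Hypothetical hand-written excerpt in the Scripts.txt layout
# ("range ; Script # comment"). The real file is much longer.
SAMPLE_SCRIPTS_TXT = """\
0030..0039    ; Common # DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
037E          ; Common # GREEK QUESTION MARK
03A3..03E1    ; Greek # GREEK CAPITAL LETTER SIGMA..GREEK SMALL LETTER SAMPI
"""


def parse_scripts(text):
    """Return a sorted list of (first, last, script) code point ranges."""
    ranges = []
    for line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        codepoints, script = (field.strip() for field in line.split(";"))
        # A line holds either a single code point or "first..last".
        first, _, last = codepoints.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end, script))
    ranges.sort()
    return ranges


def script_of(char, ranges, default="Unknown"):
    """Look up the script of a single character by binary search."""
    cp = ord(char)
    # Index of the last range whose first code point is <= cp.
    i = bisect.bisect_right(ranges, (cp, float("inf"), "")) - 1
    if i >= 0:
        start, end, script = ranges[i]
        if start <= cp <= end:
            return script
    # Not covered by the data that was loaded.
    return default


ranges = parse_scripts(SAMPLE_SCRIPTS_TXT)
for ch in "3a\u03be":
    print("U+%04X" % ord(ch), script_of(ch, ranges))
```

With the full Scripts.txt loaded in place of the excerpt, the same two
functions should cover every assigned code point. Characters missing
from the loaded data come back as "Unknown", which is also the default
Script value in UAX #24.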