Micro Python -- a lean and efficient implementation of Python 3
wxjmfauth at gmail.com
Tue Jun 10 05:13:25 EDT 2014
More information about the Python-list mailing list
On Tuesday, 10 June 2014 09:32:34 UTC+2, wxjm... at gmail.com wrote:
> On Wednesday, 4 June 2014 13:53:19 UTC+2, Robin Becker wrote:
> > On 04/06/2014 12:01, Tim Chase wrote:
> > > On 2014-06-04 00:58, Paul Rubin wrote:
> > >> Steven D'Aprano <steve at pearwood.info> writes:
> > >>>> Maybe there's a use-case for a microcontroller that works in
> > >>>> ISO-8859-5 natively, thus using only eight bits per character,
> > >>> That won't even make the Russians happy, since in Russia there
> > >>> are multiple incompatible legacy encodings.
> > >>
> > >> I've never understood why not use UTF-8 for everything.
> > >
> > > If you use UTF-8 for everything, then you end up in a world where
> > > string-indexing (see ChrisA's other side thread on this topic) is no
> > > longer an O(1) operation, but an O(N) operation. Some of us slice
> > > strings for a living. ;-) I understand that using UTF-32 would allow
> > > us to maintain O(1) indexing at the cost of every string occupying 4
> > > bytes per character. The FSR (again, as I understand it) allows
> > > strings that fit in one-byte-per-character to use that, scaling up to
> > > use wider characters internally as they're actually needed/used.
> > >
> > ........
> > I believe that we should distinguish between glyph/character indexing
> > and string indexing. Even in unicode it may be hard to decide where a
> > visual glyph starts and ends. I assume most people would like to assign
> > one glyph to one unicode, but that's not always possible with composed
> > glyphs.
> >
> > >>> for a in (u'\xc5',u'A\u030a'):
> > ...     for o in (u'\xf6',u'o\u0308'):
> > ...         u=a+u'ngstr'+o+u'm'
> > ...         print("%s %s" % (repr(u),u))
> > ...
> > u'\xc5ngstr\xf6m' Ångström
> > u'\xc5ngstro\u0308m' Ångström
> > u'A\u030angstr\xf6m' Ångström
> > u'A\u030angstro\u0308m' Ångström
> > >>> u'\xc5ngstr\xf6m'==u'\xc5ngstro\u0308m'
> > False
> >
> > so even unicode doesn't always allow for O(1) glyph indexing. I know
> > this is artificial, but this is the same situation as utf8 faces just
> > the frequency of occurrence is different. A very large amount of
> > computing is still western centric so searching a byte string for latin
> > characters is still efficient; searching for an n with a tilde on top
> > might not be so easy.
> > --
> > Robin Becker
>
> =========
> Python succeeded to become an anti-unicode product!
>
> jmf

-----
And deeply buggy!
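
For readers following the quoted discussion, here is a minimal Python 3 sketch of the two technical points made in it. It is an illustration only, not something posted in the thread. It assumes Python 3 string literals instead of the `u'...'` form used in the quoted session, and `utf8_index` is a hypothetical helper written for this example, not a standard library function.

```python
import unicodedata

# The four spellings from the quoted session: precomposed letters versus
# a base letter followed by a combining mark.
spellings = [
    a + "ngstr" + o + "m"
    for a in ("\xc5", "A\u030a")
    for o in ("\xf6", "o\u0308")
]

# The code-point counts differ although all four render the same way.
lengths = [len(s) for s in spellings]

# NFC normalization composes base + combining mark into the precomposed
# code point where one exists, so the four spellings collapse to one.
nfc_forms = {unicodedata.normalize("NFC", s) for s in spellings}


def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 encoded data (hypothetical helper).

    The scan starts at the first byte every time, which is the O(N)
    indexing cost mentioned in the thread.
    """
    if i < 0:
        raise IndexError(i)
    seen = -1      # index of the most recent code point whose lead byte was seen
    start = None   # byte offset where the requested code point begins
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a new code point
            seen += 1
            if seen == i:
                start = pos
            elif seen == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(i)
    return data[start:].decode("utf-8")


if __name__ == "__main__":
    print(lengths)
    print(nfc_forms)
    print(utf8_index("\xc5ngstr\xf6m".encode("utf-8"), 6))
```

Comparing strings after `unicodedata.normalize("NFC", ...)` addresses the `False` result in the quoted session, but it does not make glyph indexing O(1) in general, since not every base-plus-mark sequence has a precomposed form.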