[Python-Dev] Optimize Unicode strings in Python 3.3
martin at v.loewis.de
Fri May 4 02:52:46 CEST 2012
> Various notes:
> * PyUnicode_READ() is slower than reading a Py_UNICODE array.
> * Some decoders unroll the main loop to process 4 or 8 bytes (32 or
> 64 bits CPU) at each step.
>
> I am interested if you know other tricks to optimize Unicode strings
> in Python, or if you are interested to work on this topic.

Beyond creation, the most frequent approach is to specialize loops for
all three possible widths, allowing the compiler to hard-code the
element size. This brings performance back to the speed of accessing a
Py_UNICODE array (or faster for 1-byte strings).

A possible micro-optimization might be to use pointer arithmetic instead
of indexing. However, I would expect that compilers will already convert
a counting loop into pointer arithmetic if the index is only ever used
for array access.

A source of slow-down appears to be widening copy operations, where
characters are copied from a narrower representation into a wider one.
I wonder whether microprocessors are able to do this faster than what
the compiler generates out of a naive copying loop.

Another potential area for further optimization is to pass PyObject*
through more consistently. Some APIs still use char* or Py_UNICODE* when
the caller actually holds a PyObject*, and the callee ultimately
recreates an object out of the pointers being passed.

Some people (hi Larry) still think that using a rope representation for
string concatenation might improve things, see #1569040.

Regards,
Martin
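
A minimal sketch of the loop specialization described above, written in
plain C without the CPython headers so that it stands alone. The names
here (enum kind, read_generic, count_generic, count_specialized, the
DEFINE_COUNT macro) are hypothetical stand-ins and not CPython APIs; they
only illustrate the difference between a loop that checks the element
width on every access and three loops where the width is a compile-time
constant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three storage widths (1, 2 or 4 bytes
   per character). These are not the CPython constants. */
enum kind { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Generic read: the width is only known at run time, so every access
   goes through a branch, similar in spirit to PyUnicode_READ(). */
static uint32_t read_generic(enum kind kind, const void *data, size_t i)
{
    switch (kind) {
    case KIND_1BYTE: return ((const uint8_t *)data)[i];
    case KIND_2BYTE: return ((const uint16_t *)data)[i];
    default:         return ((const uint32_t *)data)[i];
    }
}

/* Unspecialized loop: counts occurrences of ch, one generic read per
   character. */
static size_t count_generic(enum kind kind, const void *data,
                            size_t len, uint32_t ch)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (read_generic(kind, data, i) == ch);
    return n;
}

/* One copy of the loop per element type. Inside each copy the element
   size is a constant, so the compiler can pick the addressing mode (and
   possibly vectorize) without a per-character branch. */
#define DEFINE_COUNT(NAME, TYPE)                                 \
    static size_t NAME(const TYPE *s, size_t len, uint32_t ch)   \
    {                                                            \
        size_t n = 0;                                            \
        for (size_t i = 0; i < len; i++)                         \
            n += (s[i] == ch);                                   \
        return n;                                                \
    }

DEFINE_COUNT(count_1byte, uint8_t)
DEFINE_COUNT(count_2byte, uint16_t)
DEFINE_COUNT(count_4byte, uint32_t)

/* Specialized entry point: the width is checked once, outside the loop. */
static size_t count_specialized(enum kind kind, const void *data,
                                size_t len, uint32_t ch)
{
    switch (kind) {
    case KIND_1BYTE: return count_1byte(data, len, ch);
    case KIND_2BYTE: return count_2byte(data, len, ch);
    default:         return count_4byte(data, len, ch);
    }
}
```

Both entry points return the same counts; the only difference is where
the width check happens. Whether the specialized version is measurably
faster depends on the compiler and on the loop body, so this should be
benchmarked rather than assumed.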