[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

Antoine Pitrou solipsis at pitrou.net
Fri Nov 25 13:11:57 CET 2011

On Fri, 25 Nov 2011 22:37:49 +1100
Matt Joiner <anacrolix at gmail.com> wrote:
> On Fri, Nov 25, 2011 at 10:04 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > On Fri, 25 Nov 2011 20:34:21 +1100
> > Matt Joiner <anacrolix at gmail.com> wrote:
> >>
> >> It's Python 3.2. I tried it for larger files and got some interesting results.
> >>
> >> readinto() for 10MB files, reading 10MB all at once:
> >>
> >> readinto/2.7 100 loops, best of 3: 8.6 msec per loop
> >> readinto/3.2 10 loops, best of 3: 29.6 msec per loop
> >> readinto/3.3 100 loops, best of 3: 19.5 msec per loop
> >>
> >> With 100KB chunks for the 10MB file (annotated with #):
> >>
> >> matt at stanley:~/Desktop$ for f in read bytearray_read readinto; do for v in 2.7 3.2 3.3; do echo -n "$f/$v "; "python$v" -m timeit -s 'import readinto' "readinto.$f()"; done; done
> >> read/2.7 100 loops, best of 3: 7.86 msec per loop # this is actually faster than the 10MB read
> >> read/3.2 10 loops, best of 3: 253 msec per loop # wtf?
> >> read/3.3 10 loops, best of 3: 747 msec per loop # wtf??
> >
> > No "wtf" here, the read() loop is quadratic since you're building a
> > new, larger, bytes object every iteration.  Python 2 has a fragile
> > optimization for concatenation of strings, which can avoid the
> > quadratic behaviour on some systems (depends on realloc() being fast).
> 
> Is there any way to bring back that optimization? A 30 to 100x slowdown
> on probably one of the most common operations, string concatenation, is
> very noticeable. In Python 3.3, this represents a 0.7s stall building a
> 10MB string. Python 2.7 did this in 0.007s.
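The `readinto` module being timed was not posted in the thread. A minimal Python 3 sketch of what its three functions might look like, assuming they read a 10MB file in 100KB chunks as described; the `path` and `chunk_size` parameters and the `make_test_file()` helper are hypothetical additions so the sketch runs on its own:

```python
import os
import tempfile

# Hypothetical sizes taken from the thread: a 10MB file read in 100KB chunks.
FILE_SIZE = 10 * 1000 * 1000
CHUNK_SIZE = 100 * 1000


def make_test_file(size=FILE_SIZE):
    """Hypothetical helper: write `size` random bytes to a temp file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


def read(path, chunk_size=CHUNK_SIZE):
    """bytes concatenation: each += builds a new bytes object (quadratic overall)."""
    data = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data += chunk
    return data


def bytearray_read(path, chunk_size=CHUNK_SIZE):
    """bytearray extension: resized in place with over-allocation (amortized linear)."""
    data = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data += chunk
    return data


def readinto(path, chunk_size=CHUNK_SIZE):
    """Read straight into a preallocated buffer through memoryview slices."""
    size = os.path.getsize(path)
    buf = bytearray(size)
    pos = 0
    with open(path, "rb") as f, memoryview(buf) as view:
        while pos < size:
            n = f.readinto(view[pos:pos + chunk_size])
            if not n:
                break
            pos += n
    del buf[pos:]  # trim if the file was shorter than expected; safe once the view is released
    return buf
```

With such a module saved as `readinto.py`, a run in the spirit of Matt's loop would pass a path explicitly, e.g. `python3 -m timeit -s 'import readinto; p = readinto.make_test_file()' 'readinto.read(p)'` (the temporary file is left behind).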

Well, extending a bytearray() (as you saw yourself) is the proper
solution in such cases. Note that you probably won't see a difference
when concatenating very small strings.
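To check both points on a given OS, one could time the two strategies in memory, separating the concatenation cost from file I/O. A rough Python 3 sketch; the function names and the chosen sizes are illustrative assumptions, and the results may vary with the platform's allocator (the realloc() behaviour mentioned above):

```python
import platform
import timeit


def concat_bytes(chunks):
    """Accumulate with bytes +=, which creates a new object and copies every time."""
    result = b""
    for chunk in chunks:
        result += chunk
    return result


def extend_bytearray(chunks):
    """Accumulate in a mutable bytearray, then make one final immutable copy."""
    result = bytearray()
    for chunk in chunks:
        result += chunk
    return bytes(result)


def compare(total_size, chunk_size, repeat=3):
    """Return best-of-`repeat` seconds for each strategy on identical input."""
    chunks = [b"x" * chunk_size] * (total_size // chunk_size)
    t_concat = min(timeit.repeat(lambda: concat_bytes(chunks), number=1, repeat=repeat))
    t_extend = min(timeit.repeat(lambda: extend_bytearray(chunks), number=1, repeat=repeat))
    return t_concat, t_extend


def main():
    print(platform.platform(), platform.python_version())
    # 10MB in 100KB chunks (as in the thread) vs. a small total built from tiny pieces.
    for total, chunk in [(10 * 1000 * 1000, 100 * 1000), (10 * 1000, 10)]:
        t_concat, t_extend = compare(total, chunk)
        print(f"{total} bytes in {chunk}-byte chunks: "
              f"bytes += {t_concat * 1000:.2f} ms, bytearray += {t_extend * 1000:.2f} ms")


if __name__ == "__main__":
    main()
```

Printing `platform.platform()` alongside the timings makes results from Linux, Windows and OS X easier to compare side by side.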

It would be interesting if you could run the same benchmarks on other
OSes (Windows or OS X, for example).

Regards

Antoine.