Issue11114
Created on 2011-02-04 08:52 by Laurens, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| tell v01.py | Laurens, 2011-02-04 10:40 | example program | ||
| textiotell.patch | pitrou, 2011-02-04 18:10 | |||
| textiotell2.patch | pitrou, 2011-02-04 20:06 | |||
| Messages (15) | |||
|---|---|---|---|
| msg127874 - (view) | Author: Laurens (Laurens) | Date: 2011-02-04 08:52 | |
file.tell() has become extremely slow in version 3.2, both rc1 and rc2. This problem did not exist in version 2.7.1, nor in version 3.1. It could be reproduced both on mac and windows xp. |
|||
| msg127877 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2011-02-04 09:39 | |
Do you have a benchmark program you can post? |
|||
| msg127881 - (view) | Author: Laurens (Laurens) | Date: 2011-02-04 10:40 | |
Correction: the problem also exists in version 3.1. I created a benchmark program an ran it on my machine (iMac, snow leopard 10.6), with the following results: ------------------------------------------ 2.6.6 (r266:84292, Dec 30 2010, 09:20:14) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0009 s. ------------------------------------------ 2.7.1 (r271:86832, Jan 13 2011, 07:38:03) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0008 s. ------------------------------------------ 3.1.3 (r313:86882M, Nov 30 2010, 09:55:56) [GCC 4.0.1 (Apple Inc. build 5494)] result: 9.5682 s. ------------------------------------------ 3.2rc2 (r32rc2:88269, Jan 30 2011, 14:30:28) [GCC 4.2.1 (Apple Inc. build 5664)] result: 8.3531 s. Removing the line containing "tell" gives the following results: ------------------------------------------ 2.6.6 (r266:84292, Dec 30 2010, 09:20:14) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0007 s. ------------------------------------------ 2.7.1 (r271:86832, Jan 13 2011, 07:38:03) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0006 s. ------------------------------------------ 3.1.3 (r313:86882M, Nov 30 2010, 09:55:56) [GCC 4.0.1 (Apple Inc. build 5494)] result: 0.0093 s. ------------------------------------------ 3.2rc2 (r32rc2:88269, Jan 30 2011, 14:30:28) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0007 s. (Apparently, reading a file became a lot faster from 3.1 to 3.2.) Conclusion: Execution of file.tell() makes the program about 10000 times slower. Remark: the file mdutch.txt is a dummy text file containing 1000 lines with one word on each line. |
|||
| msg127889 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * ![]() |
Date: 2011-02-04 13:53 | |
I found that adding "infile._CHUNK_SIZE = 20" makes the test much faster - 'only' 5 times slower than 2.7. |
|||
| msg127892 - (view) | Author: DSM (dsm001) | Date: 2011-02-04 14:02 | |
With a similar setup (OS X 10.6) I see the same problem. It seems to go away if the file is opened in binary mode for reading. @Laurens, can you confirm? |
|||
| msg127893 - (view) | Author: DSM (dsm001) | Date: 2011-02-04 14:04 | |
(By "go away" I mean "stop being pathological", not "stop differing": I still see a factor of 2.) |
|||
| msg127894 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-02-04 14:10 | |
That's expected. seek() and tell() on text (unicode) files are slow by construction. You should open your file in binary mode instead, if you want to do any seeking. Maybe I should add a note in http://docs.python.org/dev/library/io.html#performance |
|||
| msg127901 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-02-04 16:26 | |
That said, I think it is possible to make algorithmic improvements to TextIOWrapper.tell() so that at least performance becomes acceptable. |
|||
| msg127907 - (view) | Author: Laurens (Laurens) | Date: 2011-02-04 16:54 | |
First of all, thanks to all for your cooperation, it is very much appreciated. I made some minor changes to the benchmark program. Conclusions are: * setting file._CHUNK_SIZE to 20 has a dramatic effect, changing execution time in 3.2rc2 from 8.4s to 0.06s, so more than a factor 100 improvement. It's a bit clumsy, but acceptable as a performance work around. * opening file binary has a dramatic effect as well, I would say. After 2 minutes I stopped execution of the program, concluding that this change made the program at least a factor 10 *slower* instead of faster. So I cannot confirm DSM's statement that the performance hit would be a factor 2. Instead, I see a performance hit of at least a factor 100000 (10e5) compared tot 2.7.1, which is presumably not by construction ;-). |
|||
| msg127908 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-02-04 16:57 | |
> opening file binary has a dramatic effect as well, I would say. After 2 > minutes I stopped execution of the program Hint: b'' is not equal to '' ;) |
|||
| msg127914 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-02-04 18:10 | |
Here is a proof-of-concept patch for the pure Python version of TextIOWrapper.tell(). It turns the O(CHUNK_SIZE) operation into an O(1) operation most of time (still O(CHUNK_SIZE) worst-case - weird decoders and/or crazy input). |
|||
| msg127915 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-02-04 18:19 | |
> Here is a proof-of-concept patch for the pure Python version of > TextIOWrapper.tell(). It turns the O(CHUNK_SIZE) operation into an > O(1) operation most of time (still O(CHUNK_SIZE) worst-case - weird > decoders and/or crazy input). Actually, that's wrong. The patch is still O(CHUNK_SIZE) but with a vastly different multiplier (and, optimistically, much smaller, as common codecs are codecs in C). |
|||
| msg127933 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-02-04 20:06 | |
New patch also optimizing the C version. tell() can be more than 100x faster now (still much slower than binary tell()). |
|||
| msg127934 - (view) | Author: Laurens (Laurens) | Date: 2011-02-04 20:09 | |
All, thanks for your help. Opening the file in binary mode worked immediately in the toy program (that is, the benchmark code I sent you). (Antoine, thanks for the hint.) In my real world program, I solved the problem by reading a line from a binary input file, and decode it explicitly and immediately to a string. The performance has become acceptable now, and I propose to close this issue. Thanks again, cheers, Laurens |
|||
| msg129418 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-02-25 20:29 | |
Committed in r88607 (3.3). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:12 | admin | set | github: 55323 |
| 2011-02-25 20:29:37 | pitrou | set | status: open -> closed nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg129418 resolution: accepted -> fixed |
| 2011-02-04 20:09:29 | Laurens | set | resolution: accepted messages: + msg127934 nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens |
| 2011-02-04 20:06:01 | pitrou | set | files:
+ textiotell2.patch messages:
+ msg127933 |
| 2011-02-04 18:20:00 | pitrou | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127915 |
| 2011-02-04 18:10:04 | pitrou | set | files:
+ textiotell.patch messages:
+ msg127914 |
| 2011-02-04 16:57:11 | pitrou | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127908 |
| 2011-02-04 16:54:35 | Laurens | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127907 |
| 2011-02-04 16:26:08 | pitrou | set | versions:
+ Python 3.3, - Python 3.2 nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens title: file.tell extremely slow -> TextIOWrapper.tell extremely slow messages: + msg127901 type: performance |
| 2011-02-04 14:10:49 | pitrou | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127894 |
| 2011-02-04 14:04:58 | dsm001 | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127893 |
| 2011-02-04 14:02:28 | dsm001 | set | nosy:
+ dsm001 messages: + msg127892 |
| 2011-02-04 13:53:36 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc, pitrou messages: + msg127889 |
| 2011-02-04 10:40:18 | Laurens | set | files:
+ tell v01.py nosy: eric.smith, Laurens messages: + msg127881 |
| 2011-02-04 09:39:55 | eric.smith | set | nosy:
+ eric.smith messages: + msg127877 |
| 2011-02-04 08:52:36 | Laurens | create | |
