.readline() - VERY SLOW compared to PERL
David Bolen
db3l at fitlinxx.com
Thu Nov 16 00:48:31 EST 2000
More information about the Python-list mailing list
"Harald Schneider" <h_schneider at marketmix.com> writes:

> Thanks for your reply. .readlines() won't fit, since the data is VERY huge.
> So .readline() is a must.

Well, you can still use readlines(), but supply a maximum buffer size so that it doesn't snarf too much of the file into memory at once. Python will avoid reading past that number (subject to a small minimum like a few K, I think), and then you can keep calling readlines() to continue processing the file in chunks. This can make the I/O more efficient, as well as the internal processing Python does for each line.

Whether or not it works well for the other processing you need to do I can't say, but as a point of comparison, the following two scripts:

Perl:

    open(INPUT,'file.input') or die "Failure opening";
    $count = 0;
    while (<INPUT>) {
        $count++;
    }
    print "$count\n";

Python:

    file = open('file.input')
    count = 0
    while 1:
        lines = file.readlines(8192)
        if not lines:
            break
        count = count + len(lines)
    print count

run on my machine (WinNT 4.0 SP4) on a text file of 100,000 lines of 78 characters (8000000 bytes including line endings) in .951s for the Python script and .651s for the Perl script. Bumping the buffer size up to 64K in the Python script drops it to .751s. So you can get it to within about 15% of the Perl runtime, but of course that mileage may vary once you do other processing within the loop.

This sort of raw text processing is just one of those cases where Perl is generally going to be more efficient than Python. Such processing plays into Perl's strengths and is something Perl was really designed to handle. With that said, you can generally get Python to be at least competitive (which I'd agree your original 3x factor wasn't), and then it becomes a question of issues such as maintainability and/or purpose of the script as to which language might be more appropriate.
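For readers on a current Python 3 interpreter, the chunked readlines() loop described above might be sketched as follows. This is only an illustration under stated assumptions: the function name count_lines, the 64K default, and the temporary sample file are hypothetical and not part of the original scripts or timings.

```python
import os
import tempfile


def count_lines(path, sizehint=64 * 1024):
    """Count lines by reading the file in batches of roughly sizehint bytes."""
    count = 0
    with open(path, "rb") as f:
        while True:
            # readlines(hint) stops collecting lines once their total size
            # reaches the hint, so only one batch is held in memory at a time.
            lines = f.readlines(sizehint)
            if not lines:
                break
            count += len(lines)
    return count


if __name__ == "__main__":
    # Hypothetical sample data: 1,000 lines of 78 characters plus CRLF.
    with tempfile.NamedTemporaryFile("wb", suffix=".input", delete=False) as tmp:
        tmp.write((b"x" * 78 + b"\r\n") * 1000)
    try:
        print(count_lines(tmp.name))
    finally:
        os.remove(tmp.name)
```

Any per-line work would go inside the loop over each batch; how much of the speedup survives depends on that work, as noted above.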
You also mentioned searching for a key in your first post, so I might also mention that while you would probably use a regex pattern match in Perl to locate lines with that key, depending on the effort needed to isolate the key within each line, you may find better performance in Python by using functions from the string module as opposed to regexes.

--
-- David
--
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT 06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/
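As a rough illustration of the string-versus-regex suggestion above, here is a minimal sketch for a current Python 3 interpreter, where the old string module functions are available as str methods and the `in` operator. The function names and the "id=42" sample records are hypothetical, and the sketch assumes the key is a fixed substring; which variant is faster on real data is something to measure (for example with the timeit module) rather than assume.

```python
import re


def matching_lines_str(lines, key):
    """Select lines containing key with a plain substring test."""
    return [line for line in lines if key in line]


def matching_lines_re(lines, key):
    """Select the same lines with a compiled regular expression."""
    # re.escape makes the key match literally, like the substring test.
    pattern = re.compile(re.escape(key))
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    # Hypothetical records; the real file layout is not given in the thread.
    sample = ["id=17 name=alpha\n", "id=42 name=beta\n", "id=99 name=gamma\n"]
    print(matching_lines_str(sample, "id=42"))
    print(matching_lines_re(sample, "id=42"))
```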