How to count lines in a text file ?
Alex Martelli
aleaxit at yahoo.com
Wed Sep 22 15:17:01 EDT 2004
More information about the Python-list mailing list
Bengt Richter <bokr at oz.net> wrote:
   ...
> >memory at once.  If you must be able to deal with humungous files, too
> >big to fit in memory at once, try something like:
> >
> >numlines = 0
> >for line in open('text.txt'): numlines += 1
>
> I don't have 2.4

2.4a3 is freely available for download and everybody's _encouraged_ to
download it and try it out -- come on, don't be the last one to!-)

> but how would that compare with a generator expression like (untested)
>
>     sum(1 for line in open('text.txt'))
>
> or, if you _are_ willing to read in the whole file,
>
>     open('text.txt').read().count('\n')

I'm not on the same machine as when I ran the other timing measurements
(including pyrex &c) but here are the results on this one machine...:

$ wc /usr/share/dict/words
  234937  234937 2486825 /usr/share/dict/words
$ python2.4 ~/cb/timeit.py "numlines=0
for line in file('/usr/share/dict/words'): numlines+=1"
10 loops, best of 3: 3.08e+05 usec per loop
$ python2.4 ~/cb/timeit.py "file('/usr/share/dict/words').read().count('\n')"
10 loops, best of 3: 2.72e+05 usec per loop
$ python2.4 ~/cb/timeit.py "len(file('/usr/share/dict/words').readlines())"
10 loops, best of 3: 3.25e+05 usec per loop
$ python2.4 ~/cb/timeit.py "sum(1 for line in file('/usr/share/dict/words'))"
10 loops, best of 3: 4.42e+05 usec per loop

Last but not least...:

$ python2.4 ~/cb/timeit.py -s'import cou' "cou.cou(file('/usr/share/dict/words'))"
10 loops, best of 3: 2.05e+05 usec per loop

where cou.pyx is the pyrex program I've already shown on the other
subthread.  Using the count.c I've also shown takes 2.03e+05 usec.
(Can't try psyco here, not an intel-like cpu).

Summary: "sum(1 for ...)" is no speed demon; the plain loop is best
among the pure-python approaches for files that can't fit in memory.
If the file DOES fit in memory, read().count('\n') is faster, but
len(...readlines()) is slower.
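
For anyone who wants to repeat the comparison on a current interpreter,
here is a minimal sketch of the four pure-Python approaches timed above.
It is written for Python 3, not the 2.4 alpha used in this post, so
open() replaces file() and the files are read in binary mode.  The
function names and the temporary sample file are assumptions made for
illustration and are not from the original message; the numbers you get
will depend on your machine and Python version.

```python
import os
import tempfile
import timeit


def count_loop(path):
    """Plain loop: reads one line at a time, so memory use stays bounded."""
    numlines = 0
    with open(path, "rb") as f:
        for _ in f:
            numlines += 1
    return numlines


def count_sum(path):
    """Generator expression: also reads one line at a time."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def count_read(path):
    """Reads the whole file into memory and counts newline bytes."""
    with open(path, "rb") as f:
        return f.read().count(b"\n")


def count_readlines(path):
    """Reads the whole file into memory as a list of lines."""
    with open(path, "rb") as f:
        return len(f.readlines())


if __name__ == "__main__":
    # Hypothetical sample data; the post itself used /usr/share/dict/words.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"word\n" * 100_000)
        sample = tmp.name
    try:
        loops = 10
        for fn in (count_loop, count_sum, count_read, count_readlines):
            secs = timeit.timeit(lambda: fn(sample), number=loops)
            usec = secs / loops * 1e6
            print(f"{fn.__name__:16s} {fn(sample):7d} lines {usec:10.0f} usec per loop")
    finally:
        os.remove(sample)
```

One caveat when comparing results: count_read counts newline
characters, while the other three count lines, so for a file whose last
line has no trailing newline count_read returns one less than the
others.
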

Pyrex rocks, essentially removing the need for C-coded extensions
(hand-written C has less than a 1% advantage here) -- and so does
psyco, but not if you're using a Mac (quick, somebody gift Armin Rigo
with a Mac before it's too late...!!!).


Alex