How to count lines in a text file ?
Alex Martelli
aleaxit at yahoo.com
Wed Sep 22 09:37:37 EDT 2004
More information about the Python-list mailing list
Christos TZOTZIOY Georgiou <tzot at sil-tec.gr> wrote:
   ...
> >memory at once.  If you must be able to deal with humungous files, too
> >big to fit in memory at once, try something like:
> >
> >numlines = 0
> >for line in open('text.txt'): numlines += 1
>
> And a short story of premature optimisation follows...

Thanks for sharing!

> def count_lines(filename):
>     fp = open(filename)
>     count = 1 + max(enumerate(fp))[0]
>     fp.close()
>     return count

Cute, actually!

> containing Alex' code.  Guess what?  My code was slower... (and I should
> put a try: except ValueError: clause to cater for empty files)
>
> Of course, on second thought, the reason must be that enumerate
> generates one tuple for every line in the file; in any case, I'll mark

I thought built-ins could recycle their tuples, sometimes, but you may
in fact be right (we should check with Raymond Hettinger, though).

With 2.4, I measure 30 msec with your approach, and 24 with mine, to
count the 45425 lines of /usr/share/dict/words on my Linux box
(admittedly not a great example of 'humungous file'); and similarly for
kjv.txt, a King James Bible (31103 lines, but 10 times the size of the
words file), 41 msec with yours, 36 with mine.  They're pretty close.
At least they beat len(file(...).readlines()), which takes 33 msec on
words, 62 on kjv.txt...

If one is really in a hurry counting lines, a dedicated C extension
might help.  E.g.:

static PyObject *count(PyObject *self, PyObject *args)
{
    PyObject* seq;
    PyObject* item;
    int result;

    /* get one argument as an iterator */
    if(!PyArg_ParseTuple(args, "O", &seq))
        return 0;
    seq = PyObject_GetIter(seq);
    if(!seq)
        return 0;

    /* count items */
    result = 0;
    while((item=PyIter_Next(seq))) {
        result += 1;
        Py_DECREF(item);
    }

    /* clean up and return result */
    Py_DECREF(seq);
    return Py_BuildValue("i", result);
}

Using this count-items-in-iterable thingy, words takes 10 msec, kjv
takes 26.

Happier news is that one does NOT have to learn C to gain this.
Consider the Pyrex file:

def count(seq):
    cdef int i
    it = iter(seq)
    i = 0
    for x in it:
        i = i + 1
    return i

pyrexc'ing this and building the Python extension from the resulting C
file gives just about the same performance as the pure-C coding: 10 msec
on words, 26 on kjv, the same to within 1% (there is a systematic
speedup of a bit less than 1% for the C-coded function).

And if one doesn't even want to use pyrex?  Why, that's what psyco is
for...:

import psyco
def count(seq):
    it = iter(seq)
    i = 0
    for x in it:
        i = i + 1
    return i
psyco.bind(count)

Again to the same level of precision, the SAME numbers, 10 and 26 msec
(actually, in this case the less-than-1% systematic bias is in favour of
psyco compared to pure-C coding...!-)

So: your instinct that C-coded loops are faster wasn't too far off...
and you can get the same performance (just about) with Pyrex or (on an
intel or compatible processor, only -- sigh) with psyco.


Alex
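
For anyone who wants to repeat the pure-Python comparison on their own files, here is a minimal sketch in present-day Python 3. It is an assumption-laden restatement, not the code that was timed above: the function names and the time_counters helper are hypothetical, files are opened in binary mode to sidestep decoding, and the enumerate variant uses enumerate(fp, 1) in a plain loop instead of max(enumerate(fp))[0], so an empty file yields 0 without any try/except.

```python
import os
import tempfile
import timeit


def count_with_loop(filename):
    """Count lines by iterating over the file, one line in memory at a time."""
    numlines = 0
    with open(filename, "rb") as fp:
        for _ in fp:
            numlines += 1
    return numlines


def count_with_enumerate(filename):
    """Count lines via enumerate; numbering from 1 makes the last index the count."""
    count = 0  # stays 0 for an empty file, so no exception handling is needed
    with open(filename, "rb") as fp:
        for count, _ in enumerate(fp, 1):
            pass
    return count


def count_with_readlines(filename):
    """Count lines by reading the whole file into a list (memory-hungry)."""
    with open(filename, "rb") as fp:
        return len(fp.readlines())


def time_counters(filename, repeat=20):
    """Return {function name: average milliseconds per call} for each counter."""
    results = {}
    for func in (count_with_loop, count_with_enumerate, count_with_readlines):
        seconds = timeit.timeit(lambda: func(filename), number=repeat)
        results[func.__name__] = seconds / repeat * 1000.0
    return results


if __name__ == "__main__":
    # Build a throwaway sample file; substitute any real text file instead.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as out:
        out.write(b"some line of text\n" * 50000)
    try:
        for name, msec in time_counters(path).items():
            print(f"{name}: {msec:.2f} msec")
    finally:
        os.remove(path)
```

Absolute timings will differ from the 2004 figures quoted in this message, and the relative ordering may too, since both the interpreter and the hardware have changed; the sketch only shows how to measure, not what to expect.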