looking for speed-up ideas
William Park
opengeometry at yahoo.ca
Mon Feb 3 20:22:32 EST 2003
More information about the Python-list mailing list
Ram Bhamidipaty <ramb at sonic.net> wrote:
> I have some python code that processes a large file. I want to see how
> much faster this code can get. Mind you, I don't _need_ the code to go
> faster - but it sure would be nice if it were faster...
>
> Here is a specification of the input:
> 1. Lines start with T, S or F
> 2. The first line of the file starts with T, all other lines start
>    with S or F.
> 3. F lines look like "F/<number>/string"
> 4. S lines look like "S/string/<number>/<number>"
>
> Here is a sample:
>
> T /remote 0
> S/name/0/1
> S/joe/1/2
> S/bob/1/3
> F/3150900/big_file.tar.gz
> S/testing/3/4
> F/414/.envrc
> F/276/BUILD_FLAGS
> F/36505/make.incl
> F/3861/build_envrc
>
> In case you are curious the file is a dump of a file system. F lines
> specify a file name and file size. S lines specify a directory. The
> numbers on an S line represent a directory number and a directory
> parent number. All the F lines under an S line are files in a
> particular directory.
>
> My script reads the file and prints out the 200 largest files.

Behold:

    egrep '^F' dumpfile | sort -t '/' -n -k 2,2 | tail -200

How fast does it run?

> I am currently using the heapcq module written by John Eikenberry. I
> downloaded it from here: http://zhar.net/projects/python/
>
> There is an edited version of the script at the end of this message.
>
> My script currently processes 300,000 lines in about 18 seconds
> on a Sun Ultra 60. To make performance testing easier the
> script currently limits processing to just reading 300,000 lines.
> The wc program can read the same 300k lines in around 0.4 seconds.
>
> The full input file is around 43 Meg with around 2.2 million lines.

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data management and processing.
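
For comparison with the shell pipeline, here is a minimal Python sketch of the same selection. It is not the original poster's script, which is not shown in this message. It assumes the standard-library heapq.nlargest function (available in later Python releases than the one current when this thread was written) rather than the third-party heap module mentioned above, and the name largest_files is made up for illustration. It keeps only the n biggest entries in memory instead of sorting every F line.

```python
import heapq


def largest_files(lines, n=200):
    """Return the n largest (size, name) pairs from F lines, biggest first.

    `lines` is any iterable of dump lines in the format described above.
    This helper is hypothetical, written only to illustrate the idea.
    """
    def records():
        for line in lines:
            if not line.startswith("F/"):
                continue  # T and S lines carry no file size
            # Split at most twice so a "/" inside the name is preserved.
            _, size, name = line.rstrip("\n").split("/", 2)
            yield int(size), name

    return heapq.nlargest(n, records())


# The sample from the quoted message.
sample = [
    "T /remote 0",
    "S/name/0/1",
    "S/joe/1/2",
    "S/bob/1/3",
    "F/3150900/big_file.tar.gz",
    "S/testing/3/4",
    "F/414/.envrc",
    "F/276/BUILD_FLAGS",
    "F/36505/make.incl",
    "F/3861/build_envrc",
]

for size, name in largest_files(sample, n=3):
    print(size, name)
```

Unlike the pipeline, which ends with tail and therefore lists the largest file last, this sketch returns the largest file first. To run it on a real dump, pass an open file object in place of the sample list.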