Processing a file using multithreads
Roy Smith
roy at panix.com
Fri Sep 9 09:19:07 EDT 2011
More information about the Python-list mailing list
In article <c6cbd486-7e5e-4d26-93b9-088d48a25dea at g9g2000yqb.googlegroups.com>,
 aspineux <aspineux at gmail.com> wrote:

> On Sep 9, 12:49 am, Abhishek Pratap <abhishek.... at gmail.com> wrote:
> > 1. My input file is 10 GB.
> > 2. I want to open 10 file handles, each handling 1 GB of the file.
> > 3. Each file handle is processed by an individual thread using the
> > same function (so a total of 10 cores are assumed to be available
> > on the machine).
> > 4. There will be 10 different output files.
> > 5. Once the 10 jobs are complete, a reduce kind of function will
> > combine the output.
> >
> > Could you give some ideas?
>
> You can use the "multiprocessing" module instead of threads to bypass
> the GIL limitation.

I agree with this.

> First cut your file into 10 "equal" parts. If it is line based, search
> for the first line close to the cut. Be sure to have "start" and
> "end" for each part: start is the address of the first character of
> the first line, and end is one line too much (== start of the next
> block).

How much of the total time will be I/O and how much actual processing?
Unless your processing is trivial, the I/O time will be relatively
small. In that case, you might do well to just use the unix
command-line "split" utility to split the file into pieces first, then
process the pieces in parallel. Why waste effort getting the
file-splitting-at-line-boundaries logic correct when somebody has done
it for you?
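The start/end bookkeeping aspineux describes, combined with a
multiprocessing pool, can be sketched roughly as below. This is an
illustrative sketch, not code from the thread: the names
`find_boundaries` and `worker` are made up, and the per-line "work" is
just a stand-in counter.

```python
import multiprocessing
import os

def find_boundaries(path, nparts):
    """Return (start, end) byte-offset pairs for nparts chunks of the
    file, each beginning and ending on a line boundary."""
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, nparts):
            f.seek(i * size // nparts)   # jump near the i-th cut point
            f.readline()                 # advance to the next line start
            offsets.append(f.tell())
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))

def worker(args):
    """Process the lines between byte offsets start and end."""
    path, start, end = args
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            count += 1                   # stand-in for real per-line work
    return count

if __name__ == "__main__":
    # Build a small sample file so the demo is self-contained; in the
    # original scenario this would be the 10 GB input file.
    path = "sample_input.txt"
    with open(path, "w") as f:
        for i in range(100_000):
            f.write("record %d\n" % i)

    parts = find_boundaries(path, 10)
    with multiprocessing.Pool(len(parts)) as pool:
        counts = pool.map(worker, [(path, s, e) for s, e in parts])
    print(sum(counts))                   # 100000: every line seen once
```

For Roy's "just use split" suggestion: on GNU coreutils, `split -n l/10
bigfile part-` divides a file into 10 pieces without breaking lines,
which does the boundary-finding above for you.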