Parallel/Multiprocessing script design question
Ivan Voras
ivoras at _fer.hr_
Thu Sep 13 09:53:45 EDT 2007
More information about the Python-list mailing list
- Previous message (by thread): Parallel/Multiprocessing script design question
- Next message (by thread): hang in multithreaded program / python and gdb.
Amit N wrote:
> About 800+ 10-15MB files are generated daily that need to be processed. The
> processing consists of different steps that the files must go through:
>
> -Uncompress
> -FilterA
> -FilterB
> -Parse
> -Possibly compress parsed files for archival

You can implement one of two easy, straightforward approaches:

1 - Create one program, start N instances of it, where N is the number of
CPUs/cores, and let each instance process one file to completion. You'll
probably need an "overseer" program to start them and dispatch jobs to them.
The easiest way is to start your processes with the first N files, then
monitor them for completion; whenever one finishes, start another with the
next file in the queue, and so on.

2 - Create a program/process for each of these steps and let the steps operate
independently, feeding the output of one step to the input of the next.
You'll probably need some buffering and more control, so that if (for
example) "FilterA" is slower than "Uncompress", the "Uncompress" process is
signaled to wait until "FilterA" needs more data. The key is that all the
steps run in parallel, each working on a different file; since overall
throughput is limited by the slowest step, this works best when the steps
run at approximately the same speed.

Note that both approaches are in principle independent of whether you use
threads or processes, except for how the steps/stages communicate. However,
you can't use threads in Python if your goal is parallel execution: in
CPython the Global Interpreter Lock lets only one thread at a time execute
Python bytecode, so CPU-bound Python code needs separate processes to use
more than one core.
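A minimal sketch of approach 1 in modern Python, assuming the concurrent.futures module (added in Python 3.2, well after this message). A ProcessPoolExecutor with one worker per core plays the "overseer" role: it keeps N processes busy and hands the next queued file to whichever worker finishes first. The step functions (uncompress, filter_a, filter_b, parse) and process_file/run_all are hypothetical names, and what the filters and parser do here is an assumption, since the thread does not say.

```python
import gzip
import os
from concurrent.futures import ProcessPoolExecutor, as_completed


# Hypothetical stand-ins for the real steps; the thread does not say
# what FilterA, FilterB, or Parse actually do.
def uncompress(data: bytes) -> bytes:
    return gzip.decompress(data)


def filter_a(text: str) -> str:
    # Assumption: FilterA drops blank lines.
    return "\n".join(line for line in text.splitlines() if line.strip())


def filter_b(text: str) -> str:
    # Assumption: FilterB drops comment lines starting with '#'.
    return "\n".join(line for line in text.splitlines() if not line.startswith("#"))


def parse(text: str) -> list:
    # Assumption: each remaining line is a comma-separated record.
    return [line.split(",") for line in text.splitlines()]


def process_file(path: str) -> tuple:
    """Take one gzip-compressed file through every step to completion."""
    with open(path, "rb") as f:
        text = uncompress(f.read()).decode("utf-8")
    records = parse(filter_b(filter_a(text)))
    # Optional archival step: write the parsed records back out compressed.
    with gzip.open(path + ".parsed.gz", "wt", encoding="utf-8") as out:
        for record in records:
            out.write(",".join(record) + "\n")
    return path, len(records)


def run_all(paths, workers=None):
    """Overseer: keep `workers` processes busy until the file queue is empty."""
    workers = workers or os.cpu_count() or 1
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # The pool hands the next queued file to whichever worker frees up first.
        futures = [pool.submit(process_file, p) for p in paths]
        for fut in as_completed(futures):
            path, count = fut.result()
            results[path] = count
    return results
```

Call run_all(list_of_paths) from inside an `if __name__ == "__main__":` block; multiprocessing needs that guard on platforms that start workers with the spawn method (Windows, and macOS by default). Because each file is independent, this approach needs no inter-step communication at all.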
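A minimal sketch of approach 2, assuming the multiprocessing module (added in Python 2.6, after this message). Each step runs as its own worker connected to the next by a queue; bounded queues are one way to get the "wait until FilterA needs more data" behaviour, because put() on a full queue blocks the faster upstream stage. The names stage, run_pipeline, and SENTINEL are hypothetical, and the use_processes switch is an illustrative assumption reflecting the point that the design works with either threads or processes.

```python
import multiprocessing as mp
import queue
import threading

SENTINEL = None  # end-of-stream marker, so items themselves must not be None


def stage(func, inbox, outbox):
    """One pipeline step: apply func to each item from inbox and pass it on."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage to shut down too
            return
        # put() blocks while outbox is full, so a fast stage waits for a slow one.
        outbox.put(func(item))


def run_pipeline(items, funcs, maxsize=4, use_processes=True):
    """Run funcs as a chain of concurrent stages and return the results in order."""
    if use_processes:
        make_queue, Worker = mp.Queue, mp.Process
    else:
        make_queue, Worker = queue.Queue, threading.Thread
    # Bounded queues between stages provide the buffering; the final queue
    # is unbounded so the last stage never blocks on the collector.
    queues = [make_queue(maxsize) for _ in funcs] + [make_queue()]
    workers = [
        Worker(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    for item in items:
        queues[0].put(item)  # blocks if the first stage falls behind
    queues[0].put(SENTINEL)
    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)
    # Join only after draining the output queue: joining a process that
    # still has buffered queue items can deadlock.
    for w in workers:
        w.join()
    return results
```

With one worker per stage and FIFO queues, results come out in input order. In process mode the step functions must be defined at module level (so they can be pickled) and run_pipeline should be called under an `if __name__ == "__main__":` guard; thread mode works the same way but, in CPython, will not run CPU-bound steps in parallel.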