ExternalProcess (biojava-legacy 1.9.5 API)

Utility class to execute an external process and to handle the STDOUT, STDERR and STDIN streams in multiple threads managed by a thread pool.

This class is intended for applications that call an external program many times, e.g. in a loop, and that need high throughput, i.e. the program's input and output should not be written to disk. The Java Runtime.exec(java.lang.String) methods require the application to read and write the external program's input and output streams in multiple threads; otherwise the calling application may block. However, instantiating multiple threads for each call is expensive. On Linux systems there is the additional problem that each Java thread is represented by a separate process, and the number of processes is limited. Because the Java garbage collector does not free the Thread objects properly, an application might run out of threads (indicated by an OutOfMemoryError) after multiple iterations. Therefore, the ExternalProcess class uses a thread pool.
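The blocking problem and the thread-pool remedy described above can be sketched with the standard library alone. The following is a minimal illustration, not the ExternalProcess implementation: the class PooledProcessRunner and its methods newDaemonPool, drain and run are hypothetical names introduced here, and the pool size of 3 simply mirrors one task per stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper, not part of biojava: handles process streams on a reusable pool. */
class PooledProcessRunner {

    /** Exit code and captured output of one run. */
    public static class Result {
        public final int exitCode;
        public final String stdout;
        public final String stderr;

        Result(int exitCode, String stdout, String stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Creates a fixed pool of daemon threads that can be reused across many calls. */
    public static ExecutorService newDaemonPool(int size) {
        return Executors.newFixedThreadPool(size, task -> {
            Thread thread = new Thread(task, "stream-handler");
            thread.setDaemon(true); // do not keep the JVM alive for idle handlers
            return thread;
        });
    }

    /** Submits a task that reads the stream to its end and returns the text (UTF-8 assumed). */
    public static Future<String> drain(ExecutorService pool, InputStream in) {
        return pool.submit(() -> {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        });
    }

    /** Runs the command, feeding input to its stdin while stdout and stderr are drained. */
    public static Result run(ExecutorService pool, List<String> command, String input)
            throws IOException, InterruptedException, ExecutionException {
        Process process = new ProcessBuilder(command).start();
        // Start both readers first so neither pipe can fill up and stall the child.
        Future<String> out = drain(pool, process.getInputStream());
        Future<String> err = drain(pool, process.getErrorStream());
        Future<?> in = pool.submit(() -> {
            // Closing stdin tells the child that no more input will arrive.
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input.getBytes(StandardCharsets.UTF_8));
            }
            return null;
        });
        int exitCode = process.waitFor();
        in.get(); // surfaces a write failure, e.g. if the child exited without reading
        return new Result(exitCode, out.get(), err.get());
    }
}
```

Because the pool is created once and passed in, a loop that calls run(pool, ...) thousands of times reuses the same three threads instead of creating new ones per call. The pool needs at least three threads here; with fewer, the stdin writer could wait behind a reader that never finishes.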

The simplest way to use this class is to call the static methods execute(String) and execute(String, String, StringWriter, StringWriter). However, these methods are not thread safe and cannot be configured. In the former case the program's input, output and error output are redirected to STDIN, STDOUT and STDERR of the calling program. In the latter case the input is provided as a string, and the output and error output are written to StringWriter objects. The environment, i.e. the current working directory and the environment variables, is inherited from the calling process. In both cases, a static thread pool of size THREAD_POOL_SIZE is used. The command to execute is provided as a string argument.

In scenarios where the environment has to be changed, the program's input is generated just in time, or the program's output is parsed just in time, using an explicit instance of the ExternalProcess class is recommended. Such an instance can be initialized with a custom thread pool; otherwise a SimpleThreadPool of size 3 is used. The input and output are managed by multithreaded input handler and output handler objects. There are four predefined handlers, which read the program's input from a Reader or an InputStream object and write the program's output to a Writer or an OutputStream object: ReaderInputHandler, SimpleInputHandler, WriterOutputHandler and SimpleOutputHandler. If no handlers are specified, the input and output are redirected to the standard streams of the calling process.
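To make the handler idea above concrete, here is a rough sketch of what a Reader-based input handler and a Writer-based output handler could look like when written as plain Runnable tasks. The class HandlerSketch and its methods are hypothetical and use only java.io; the actual ReaderInputHandler and WriterOutputHandler classes may be structured differently. UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical handler sketches; biojava's own handler classes may differ. */
class HandlerSketch {

    /** Copies characters from source to target until the source is exhausted. */
    public static void copy(Reader source, Writer target) throws IOException {
        char[] chunk = new char[4096];
        int n;
        while ((n = source.read(chunk)) != -1) {
            target.write(chunk, 0, n);
        }
        target.flush();
    }

    /** Feeds the program's stdin from a Reader, then closes stdin to signal end of input. */
    public static Runnable inputHandler(Reader source, OutputStream processStdin) {
        return () -> {
            try (Writer target = new OutputStreamWriter(processStdin, StandardCharsets.UTF_8)) {
                copy(source, target);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /** Forwards the program's stdout (or stderr) to a Writer as the data arrives. */
    public static Runnable outputHandler(InputStream processOutput, Writer target) {
        return () -> {
            try (Reader source = new InputStreamReader(processOutput, StandardCharsets.UTF_8)) {
                copy(source, target);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }
}
```

Each Runnable would be submitted to the thread pool, one per stream. Since the handlers accept any Reader or Writer, the input can be produced lazily and the output can be parsed while the program is still running, which is the "just in time" case described above.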

Before either execute() or execute(Properties) is called, the commands property should be set. The commands may include placeholders of the form %PARAM%. If a Properties object is passed to the execute(Properties) method, each placeholder is replaced by the corresponding property value. The Properties object must therefore contain a key named PARAM (case does not matter). The environment for calling the external program can be configured using the properties workingDirectory and environmentProperties.
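The placeholder substitution described above can be illustrated with a small stand-alone sketch. This is not the library's code: PlaceholderSketch and resolve are hypothetical names, the set of characters allowed in a placeholder name is an assumption, and leaving unknown placeholders untouched is likewise an assumption, since the description does not say what happens when a key is missing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of %PARAM% substitution with case-insensitive keys. */
class PlaceholderSketch {

    // Assumption: placeholder names consist of letters, digits and underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("%([A-Za-z0-9_]+)%");

    /** Replaces every %NAME% in the command with the matching property value. */
    public static String resolve(String command, Properties values) {
        // Index the keys in upper case so that lookups ignore case.
        Map<String, String> byUpperKey = new HashMap<>();
        for (String key : values.stringPropertyNames()) {
            byUpperKey.put(key.toUpperCase(Locale.ROOT), values.getProperty(key));
        }

        Matcher matcher = PLACEHOLDER.matcher(command);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String value = byUpperKey.get(matcher.group(1).toUpperCase(Locale.ROOT));
            // Assumption: a placeholder without a matching key is left as written.
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }
}
```

With the properties program=blastn and DB=nt, resolving "blastall -p %PROGRAM% -d %db%" yields "blastall -p blastn -d nt", because both lookups ignore case.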

Finally, the sleepTime property can be increased if the output handlers are not able to capture the program's entire output within the given time. The default value is SLEEP_TIME (in milliseconds).