map() vs. submit() With the ThreadPoolExecutor in Python - Super Fast Python

Use map() when converting a for-loop to use threads and use submit() when you need more control over asynchronous tasks when using the ThreadPoolExecutor in Python.

In this tutorial, you will discover the difference between map() and submit() when executing tasks with the ThreadPoolExecutor in Python.

Let’s get started.

Use map() to Execute Tasks With the ThreadPoolExecutor

Use map() to convert a for-loop to use threads.

Perhaps the most common pattern when using the ThreadPoolExecutor is to convert a for-loop that executes a function on each item in a collection to use threads.

This pattern assumes that the function has no side effects, meaning it does not access any data outside of the function and does not change the data provided to it. It simply takes data and produces a result.

These types of for loops can be written explicitly in Python; for example:

...
# apply a function to each element in a collection
for item in mylist:
    result = task(item)

A better practice is to use the built-in map() function that applies the function to each item in the iterable for you.

...

# apply the function to each element in the collection

results = map(task, mylist)

The built-in map() function does not apply the task() function to each item until we iterate the results, so-called lazy evaluation:

...
# iterate the results from map
for result in results:
    print(result)

Therefore, it is common to see this operation in a for-loop idiom as follows:

...
# iterate the results from map
for result in map(task, mylist):
    print(result)
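To make the lazy evaluation concrete, here is a minimal sketch. The task() function and the calls list are hypothetical names used only for illustration; task() records each call so that we can see when it actually runs, which a real side-effect-free task would not do.

```python
# hypothetical task that records each call so we can see when it runs
calls = []

def task(item):
    calls.append(item)
    return item * 2

# create the map object; task() has not been called yet
results = map(task, [1, 2, 3])
calls_before = list(calls)

# iterating the map object is what triggers the calls
values = list(results)

print(calls_before)  # []
print(values)        # [2, 4, 6]
```

Nothing is recorded in calls until the map object is consumed.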

We can perform this same operation using the thread pool, except each call of the function with an item in the iterable is a task that is executed asynchronously using threads.

For example:

...
# iterate the results from map
for result in executor.map(task, mylist):
    print(result)

Like the built-in map() function, the ThreadPoolExecutor map() function returns an iterable over the results returned by the target function applied to the provided iterable of items.

Although the tasks are executed asynchronously, the results are iterated in the order of the iterable provided to the map() function, the same as the built-in map() function.

In this way, we can think of the ThreadPoolExecutor version of map() as an asynchronous version of the built-in map() function. It is ideal if you are looking to update your for-loop to use threads.

The example below demonstrates using the map() function with a task that will sleep a random amount of time less than one second and return the provided value.

# SuperFastPython.com
# example of the map and wait pattern for the ThreadPoolExecutor
from time import sleep
from random import random
from concurrent.futures import ThreadPoolExecutor

# custom task that will sleep for a variable amount of time
def task(name):
    # sleep for less than a second
    sleep(random())
    return name

# start the thread pool
with ThreadPoolExecutor(10) as executor:
    # execute tasks concurrently and process results in order
    for result in executor.map(task, range(10)):
        # retrieve the result
        print(result)

Running the example, we can see that the results are reported in the order that the tasks were created and sent into the thread pool.

Like the built-in map() function, the ThreadPoolExecutor map() function can take more than one iterable. This means your function can take more than one argument.

...
# example of calling map with more than one iterable
for result in executor.map(task, mylist1, mylist2):
    print(result)
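As a small runnable sketch of the multiple-iterable form, assume a hypothetical two-argument task() that adds its inputs. The list contents are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical two-argument task used only for illustration
def task(a, b):
    return a + b

mylist1 = [1, 2, 3]
mylist2 = [10, 20, 30]

with ThreadPoolExecutor(3) as executor:
    # items are paired up by position: task(1, 10), task(2, 20), task(3, 30)
    results = list(executor.map(task, mylist1, mylist2))

print(results)  # [11, 22, 33]
```

Each task receives one item from each iterable, matched by position.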

Unlike the built-in map() function, the tasks are sent into the thread pool immediately after calling map() instead of being executed in a lazy manner as results are requested.

Put another way, the tasks will execute and complete in their own time regardless of whether we iterate the results returned by calling map().

...

# example of calling map and not iterating the results

_ = executor.map(task, mylist)
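The sketch below illustrates this under simple assumptions: a hypothetical task() appends its argument to a shared list (appending to a list is safe across threads in CPython), and the results of map() are never iterated.

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical task that records its argument when it runs
completed = []

def task(item):
    completed.append(item)
    return item

with ThreadPoolExecutor(4) as executor:
    # issue the tasks but never iterate the results
    _ = executor.map(task, range(5))
# leaving the with block waits for all issued tasks to finish

print(sorted(completed))  # [0, 1, 2, 3, 4]
```

All five tasks ran even though no result was ever requested.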

Now that we are familiar with the map() function, let’s take a look at the submit() function.

Use submit() to Execute Tasks With the ThreadPoolExecutor

Use submit() when you want more control over asynchronous tasks.

The submit() function will take the name of the target task function you wish to execute asynchronously as well as any arguments to the function. It will then return a Future object.

...

# submit a task to the thread pool and get a future object

future = executor.submit(task, arg1, arg2)

The Future object can be kept and used to query the status of the asynchronous task, such as whether it is running(), done(), or has been cancelled().

...
# check if a task is running
if future.running():
    # do something...
    pass
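Here is one way to observe these status methods deterministically. It is a sketch that assumes a hypothetical task() coordinated with two threading.Event objects (started and release), so the main thread knows exactly when the task is executing.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Event

started = Event()  # set by the task once it is executing
release = Event()  # set by the main thread to let the task finish

# hypothetical task that blocks until the main thread releases it
def task():
    started.set()
    release.wait()
    return 'done'

with ThreadPoolExecutor(1) as executor:
    future = executor.submit(task)
    # wait until the task is actually executing
    started.wait()
    running_during = future.running()
    done_during = future.done()
    # allow the task to finish and wait for its result
    release.set()
    result = future.result()
    running_after = future.running()
    done_after = future.done()

print(running_during, done_during)  # True False
print(running_after, done_after)    # False True
```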

It can also be used to get the result() from the task when it is completed or the exception() if one was raised during the execution of the task.

...
# get the result from a task via its future object
result = future.result()
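The following sketch shows result() and exception() side by side. The task() function and its failure condition are hypothetical, chosen only so that one task succeeds and one fails.

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical task that fails for one specific input
def task(value):
    if value == 0:
        raise ValueError('value must be non-zero')
    return 10 / value

with ThreadPoolExecutor(2) as executor:
    good = executor.submit(task, 5)
    bad = executor.submit(task, 0)

    # exception() blocks until the task is done, then returns the exception or None
    good_error = good.exception()
    bad_error = bad.exception()

    # result() returns the value, or re-raises the task's exception
    good_result = good.result()
    try:
        bad.result()
        reraised = False
    except ValueError:
        reraised = True

print(good_result)      # 2.0
print(good_error)       # None
print(repr(bad_error))  # ValueError('value must be non-zero')
print(reraised)         # True
```

Note that an exception raised inside a task does not surface in the main thread until you ask for it via result() or exception().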

The Future object can also be used to cancel() the task before it has started running and to add a callback function via add_done_callback() that will be executed once the task has completed.

...
# cancel the task if it has not yet started running
if future.cancel():
    print('Task was cancelled')
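A sketch of both features together, under a few assumptions: the pool has a single worker so that a second task is guaranteed to wait in the queue, and the names blocker(), on_done(), release and finished are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Event

release = Event()
finished = []  # filled in by the callback

# hypothetical task that blocks until the main thread releases it
def blocker():
    release.wait()
    return 'first'

# the callback receives the Future that has finished
def on_done(future):
    finished.append(future.result())

# a single worker, so the second task must wait in the queue
with ThreadPoolExecutor(1) as executor:
    first = executor.submit(blocker)
    second = executor.submit(blocker)
    first.add_done_callback(on_done)

    # the second task has not started, so it can still be cancelled
    was_cancelled = second.cancel()

    # let the first task finish
    release.set()

print(was_cancelled)       # True
print(second.cancelled())  # True
print(finished)            # ['first']
```

A task that is already running cannot be cancelled this way; cancel() would return False for it.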

It is a common pattern to submit many tasks to a thread pool and store the Future objects in a collection.

For example, it is common to use a list comprehension.

...

# create many tasks and store the future objects in a list

futures = [executor.submit(work) for _ in range(100)]

We can iterate the list of Future objects to get results in the order that the tasks were submitted; for example:

...
# get results from tasks in the order they were submitted
for future in futures:
    # get the result
    result = future.result()

Recall that the call to the result() function on the Future will not return until the task is done.

The collection of future objects can then be handed off to utility functions provided by the concurrent.futures module, such as wait() and as_completed().

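The wait() module function takes a collection of Future objects and, by default, blocks until all tasks are done. It returns two sets of Future objects: those that are done and those that are not. It can also be configured via the return_when argument to return as soon as any task is done or any task raises an exception.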

...

# wait for all tasks to be done

wait(futures)
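As a rough sketch of how wait() can be used, the example below calls it twice on the same futures: once with return_when=FIRST_COMPLETED and once with the default setting. The task() function and the sleep durations are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from time import sleep

# hypothetical task that sleeps for the given number of seconds
def task(seconds):
    sleep(seconds)
    return seconds

with ThreadPoolExecutor(3) as executor:
    futures = [executor.submit(task, s) for s in (0.05, 0.5, 0.5)]

    # return as soon as at least one task is done
    done_first, pending = wait(futures, return_when=FIRST_COMPLETED)

    # default behaviour: block until every task is done
    done_all, not_done = wait(futures)

print(len(done_first) >= 1)          # True
print(len(done_all), len(not_done))  # 3 0
```

The return value is a pair of sets, so it can be unpacked directly into the finished and unfinished futures.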

The as_completed() module function takes a collection of Future objects and yields each Future as soon as its task completes. Results therefore arrive in completion order rather than the order in which tasks were submitted to the thread pool, which allows your program to be more responsive.

...
# respond to tasks as they are completed
for future in as_completed(futures):
    # get the result
    result = future.result()

The processing of future objects in the order they are completed may be the most common usage pattern of the submit() function with the ThreadPoolExecutor.

The example below demonstrates this pattern, submitting the tasks in order from 0 to 9 and showing results in the order that they were completed.

# SuperFastPython.com
# example of the submit and use as completed pattern for the ThreadPoolExecutor
from time import sleep
from random import random
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import as_completed

# custom task that will sleep for a variable amount of time
def task(name):
    # sleep for less than a second
    sleep(random())
    return name

# start the thread pool
with ThreadPoolExecutor(10) as executor:
    # submit tasks and collect futures
    futures = [executor.submit(task, i) for i in range(10)]
    # process task results as they are available
    for future in as_completed(futures):
        # retrieve the result
        print(future.result())

Running the example, we can see that the results are retrieved and printed in the order that the tasks completed, not the order that the tasks were submitted to the thread pool.

Now that we are familiar with how to use submit() to execute tasks in the ThreadPoolExecutor, let’s take a look at a comparison between map() and submit().

Let’s compare the map() and submit() functions for the ThreadPoolExecutor.

Both the map() and submit() functions are similar in that they both allow you to execute tasks asynchronously using threads.

The map() function is simpler:

  • It is a threaded version of the built-in map() function.
  • It assumes you want to call the same function many times with different values.
  • It only takes iterables as arguments to the target function.
  • It only allows you to iterate results from the target function.

In an effort to keep your code simpler and easier to read, you should try to use map() first, before you try to use the submit() function.

The simplicity of the map() function means it is also limited:

  • It does not provide control over the order that task results are used.
  • It does not provide a way to check the status of tasks.
  • It does not allow you to cancel tasks before they start running.
  • It does not allow you control over how to handle an exception raised by a task function.

If the map() function is too restrictive, you may want to consider the submit() function instead.

The submit() function gives more control:

  • It assumes you want to submit one task at a time.
  • It allows a different target function with a variable number of arguments for each task.
  • It allows you to check on the status of each task.
  • It allows you to cancel a task before it has started running.
  • It allows callback functions to be called automatically when tasks are done.
  • It allows you to handle an exception raised by a target task function.
  • It allows you control over when you would like to get the result of the task, if at all.
  • It can be used with module functions like wait() and as_completed() to work with tasks in groups.

The added control with submit() comes with added complexity:

  • It requires that you manage the Future object for each task.
  • It requires that you explicitly retrieve the result for each task.
  • It requires extra code if you need to apply the same function with different arguments.
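The difference in exception handling noted in the lists above can be sketched as follows. The task() function and its failing input are hypothetical. With map(), the exception surfaces while iterating the results and ends the loop; with submit(), each Future can be inspected on its own.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# hypothetical task that fails for one input
def task(value):
    if value == 2:
        raise ValueError(f'bad value: {value}')
    return value * 10

with ThreadPoolExecutor(4) as executor:
    # map(): the exception is raised while iterating, ending the loop early
    map_results = []
    try:
        for result in executor.map(task, range(4)):
            map_results.append(result)
    except ValueError:
        pass

    # submit(): each Future can be checked on its own
    futures = {executor.submit(task, i): i for i in range(4)}
    ok, failed = {}, []
    for future in as_completed(futures):
        if future.exception() is not None:
            failed.append(futures[future])
        else:
            ok[futures[future]] = future.result()

print(map_results)  # [0, 10]
print(ok)           # {0: 0, 1: 10, 3: 30} (key order may vary)
print(failed)       # [2]
```

Mapping each Future back to its argument with a dictionary is one common way to manage the extra bookkeeping that submit() requires.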

Now that we have compared and contrasted the map() and submit() functions on the ThreadPoolExecutor, which one should you use?

Use map() if:

  • You are already using the built-in map() function.
  • You are calling a (near-)pure function in a for-loop for each item in an iterable.

Use submit() if:

  • You need to check the status of tasks while they are executing.
  • You need control over the order that you process results from tasks.
  • You need to conditionally cancel the execution of tasks.
  • You can simplify your code by using callback functions called when tasks are done.





Takeaways

You now know when to use map() and submit() to execute tasks with the ThreadPoolExecutor.

