How do I run tasks in parallel with concurrent.futures?

Category: Python Programming

Short answer

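The concurrent.futures module provides a high-level interface for executing callables asynchronously. ThreadPoolExecutor suits I/O-bound tasks, because blocking I/O releases the GIL and threads spend most of their time waiting. ProcessPoolExecutor runs work in separate processes, each with its own interpreter and GIL, so CPU-bound work can use multiple cores.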
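To illustrate the CPU-bound case, here is a minimal sketch with ProcessPoolExecutor. The count_primes function and the input sizes are hypothetical. The `if __name__ == "__main__":` guard matters because on platforms that start workers by spawning fresh interpreters (the default on Windows and macOS), each worker re-imports the main module.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    # Hypothetical CPU-bound task: count primes below limit by trial division.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]
    # Each call runs in a worker process with its own interpreter and GIL.
    with ProcessPoolExecutor() as executor:
        counts = list(executor.map(count_primes, limits))
    print(counts)
```

count_primes is defined at module top level so it can be pickled and sent to the workers.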

Steps

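  1. Import concurrent.futures (or just the executor classes you need).
  2. Create an executor, preferably in a with block so it waits for pending tasks and shuts down cleanly on exit.
  3. Submit individual calls with .submit(), which returns a Future, or apply a function across an iterable with .map().
  4. Retrieve results with Future.result(), or iterate over the output of .map(), which yields results in input order.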
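
Example using .map() with a thread pool (requests is a third-party package):

```python
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party: pip install requests

def fetch(url):
    # Network waits release the GIL, so threads can overlap them.
    return requests.get(url, timeout=5).status_code

urls = ["https://example.com"] * 5

# Leaving the with block shuts the pool down after pending tasks finish.
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() returns results in the same order as urls.
    results = list(executor.map(fetch, urls))

print(results)
```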
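The .submit() and .result() route from the steps can be sketched without any network access. Here slow_square is a hypothetical stand-in for an I/O call, with time.sleep simulating the wait.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(x):
    # Hypothetical I/O-like task: sleep stands in for waiting on a network call.
    time.sleep(0.1)
    return x * x

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() schedules one call and immediately returns a Future.
    futures = [executor.submit(slow_square, n) for n in range(5)]
    # result() blocks until that particular call has finished.
    squares = [f.result() for f in futures]

print(squares)  # [0, 1, 4, 9, 16]
```

Reading results from the futures list in submission order keeps the output ordered, just like .map().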

Tips

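  • Pass futures from .submit() to concurrent.futures.as_completed() to handle results in the order they finish rather than the order they were submitted.
  • ProcessPoolExecutor pickles the function, its arguments, and its return value, so all of them must be picklable; define worker functions at module top level rather than as lambdas or nested functions.
  • Size max_workers by CPU core count for process pools and by what the remote service or connection limits can tolerate for thread pools. If you omit it, both executors choose a default derived from the machine's CPU count.
  • Future.cancel() succeeds only if the call has not started running; once it is running or finished, cancel() returns False.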
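A sketch of the first and last tips, using a hypothetical wait_and_return task with time.sleep standing in for real work. as_completed() yields each Future as it finishes. cancel() succeeds here only because the second call is still queued behind a single busy worker.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def wait_and_return(seconds):
    # Hypothetical task that finishes after the given delay.
    time.sleep(seconds)
    return seconds

delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each Future back to its input so results can be labeled.
    future_to_delay = {executor.submit(wait_and_return, d): d for d in delays}
    finished = []
    for future in as_completed(future_to_delay):
        # Futures arrive in completion order, not submission order.
        finished.append(future.result())

print(finished)

with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(time.sleep, 0.2)
    # With one worker, this call waits in the queue, so it can still be cancelled.
    queued = executor.submit(time.sleep, 0.2)
    cancelled = queued.cancel()

print(cancelled, queued.cancelled())
```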

Common issues

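  • Threads share memory, so mutable state they modify needs a threading.Lock or another synchronization primitive. Processes do not share memory by default, so a worker's changes to a global are not visible to the parent; return results instead.
  • ProcessPoolExecutor has more overhead than threads (starting processes and pickling data between them), so use it only when computation time dominates.
  • An exception raised inside a task is stored in its Future and re-raised in the caller when .result() is called; with .map(), it is raised when iteration reaches that item.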
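A minimal sketch of the first and third points: a threading.Lock guards a shared counter, and an exception raised inside a task surfaces only when .result() is called. The increment and fail functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    # Hypothetical task that updates shared state from several threads.
    global counter
    for _ in range(times):
        with counter_lock:  # counter += 1 is not atomic; the lock prevents lost updates
            counter += 1

def fail(x):
    # Hypothetical task that always raises.
    raise ValueError(f"bad input: {x}")

with ThreadPoolExecutor(max_workers=4) as executor:
    for _ in range(4):
        executor.submit(increment, 10_000)
    failing = executor.submit(fail, 42)

print(counter)  # 40000

try:
    failing.result()  # the worker's ValueError is re-raised here
except ValueError as exc:
    error_message = str(exc)
    print("caught:", error_message)  # caught: bad input: 42
```

Nothing is raised when fail() runs in the worker; the error stays inside the Future until something asks for its result.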