How do I run tasks in parallel with concurrent.futures?
Category: Python Programming
Short answer
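The concurrent.futures module provides a high-level interface for asynchronously executing callables. ThreadPoolExecutor suits I/O-bound tasks, where threads spend most of their time waiting. ProcessPoolExecutor runs each worker in a separate Python process, sidestepping the GIL, which makes it the better fit for CPU-bound work.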
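For the CPU-bound case, a minimal sketch might look like the following. The function `count_primes` and the limits are hypothetical stand-ins for real computation. The `if __name__ == "__main__":` guard matters because on platforms that start workers with the "spawn" method (Windows, and macOS by default) each worker re-imports the main module.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    # Hypothetical CPU-bound task: trial division, no I/O at all.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]
    # Each call runs in its own worker process, so the GIL does not serialize them.
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(count_primes, limits))
    print(results)
```

`count_primes` is defined at module top level so it can be pickled and sent to the worker processes.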
Steps
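- Import `concurrent.futures`.
- Create a `ThreadPoolExecutor` or `ProcessPoolExecutor`, preferably in a `with` block so the pool is shut down when you are done.
- Submit tasks with `.submit()`, or map a function over an iterable with `.map()`.
- Retrieve results via `.result()` on each returned future, or iterate over the `.map()` output.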
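The submit-then-result pattern from the steps above can be sketched as follows; `square` is a hypothetical placeholder for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Hypothetical stand-in for a real task.
    return x * x


with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules one call and immediately returns a Future.
    futures = [executor.submit(square, n) for n in range(5)]
    # result() blocks until that particular call has finished.
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16]
```

Because the list comprehension walks `futures` in submission order, `results` keeps the input order even if the calls finish in a different order.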
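```python
from concurrent.futures import ThreadPoolExecutor
import requests  # third-party: pip install requests

def fetch(url):
    # Network I/O releases the GIL, so threads can overlap these requests.
    return requests.get(url, timeout=5).status_code

urls = ["https://example.com"] * 5
# The with-block waits for pending tasks and shuts the pool down on exit.
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in input order, not completion order.
    results = list(executor.map(fetch, urls))
print(results)
```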
Tips
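- Use `as_completed()` to process results in the order they finish rather than the order they were submitted.
- `ProcessPoolExecutor` requires picklable functions, arguments, and return values; define worker functions at module top level, since lambdas and nested functions cannot be pickled.
- Set `max_workers` based on CPU cores for processes and on connection or rate limits for threads.
- A `Future` can be cancelled with `.cancel()` only before it starts executing; once it is running, `.cancel()` returns `False`.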
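The `as_completed()` and cancellation tips can be illustrated with a small sketch. The task `slow_echo` and its sleep delays are hypothetical, and the completion order relies on those delays, so it is typical rather than guaranteed. The cancellation half uses a single worker blocked on a `threading.Event`, which keeps the second task queued long enough to cancel it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_echo(value, delay):
    # Hypothetical task: sleep to simulate I/O, then return the input.
    time.sleep(delay)
    return value


# as_completed() yields each Future as soon as it finishes, not in submit order.
jobs = [("slow", 0.5), ("fast", 0.1), ("medium", 0.3)]
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_echo, name, delay) for name, delay in jobs]
    finish_order = [f.result() for f in as_completed(futures)]
print(finish_order)  # normally ['fast', 'medium', 'slow']

# cancel() succeeds only while the call is still waiting in the queue.
gate = threading.Event()
with ThreadPoolExecutor(max_workers=1) as executor:
    blocker = executor.submit(gate.wait)  # occupies the single worker until gate is set
    queued = executor.submit(slow_echo, "never runs", 0)
    cancelled = queued.cancel()  # the queued call has not started, so this succeeds
    gate.set()  # release the worker so the pool can shut down
print(cancelled, queued.cancelled())  # True True
```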
Common issues
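- Shared mutable state in threads requires locks; processes do not share memory by default, so data must be passed in as arguments and returned as results.
- `ProcessPoolExecutor` has higher overhead than threads (process startup plus pickling of arguments and results); use it only when computation dominates.
- Unhandled exceptions in worker threads or processes are stored on the `Future` and re-raised when `.result()` is called, or, with `.map()`, when iteration reaches the failing call.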
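Both the locking and the exception issues above can be seen in one short sketch. The functions `risky` and `increment` and the shared `counter` are hypothetical examples, not part of any library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
counter_lock = threading.Lock()


def risky(n):
    # Hypothetical task that fails for one specific input.
    if n == 3:
        raise ValueError(f"bad input: {n}")
    return n * 10


def increment(times):
    global counter
    for _ in range(times):
        # "counter += 1" is a read-modify-write; the lock keeps threads from interleaving it.
        with counter_lock:
            counter += 1


with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {n: executor.submit(risky, n) for n in range(5)}
    outcomes = {}
    for n, future in futures.items():
        try:
            outcomes[n] = future.result()
        except ValueError as exc:
            # The worker's exception was stored on the Future and is re-raised here.
            outcomes[n] = f"failed: {exc}"
    for _ in range(4):
        executor.submit(increment, 10_000)

# Leaving the with-block waited for all increment calls to finish.
print(outcomes)  # {0: 0, 1: 10, 2: 20, 3: 'failed: bad input: 3', 4: 40}
print(counter)  # 40000
```

The failing call does not stop the other tasks; its error only surfaces when its own `.result()` is requested.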