13.4 The multiprocessing Module in Python

The multiprocessing module in Python provides a powerful interface for running tasks in parallel by creating separate processes, each with its own memory space. Unlike multithreading, where threads share the same memory and are constrained by Python's Global Interpreter Lock (GIL), multiprocessing sidesteps the GIL entirely: each process runs its own interpreter, so multiple processes can execute Python code at the same time. This makes it especially useful for CPU-bound tasks.

In this section, we will explore how to create and manage processes using the multiprocessing module, including process pools, inter-process communication, and managing shared resources.


13.4.1 What is Multiprocessing?

Multiprocessing involves running multiple processes simultaneously. Each process runs in its own memory space and can execute independently of the others, making it ideal for tasks that require heavy computation or true parallelism across multiple CPU cores.

Key characteristics of multiprocessing:

  • True parallelism: Multiple processes can run simultaneously on different CPU cores.
  • Independent memory: Each process has its own memory space, so there is no risk of race conditions unless processes explicitly share data.
  • Bypasses the GIL: Each process has its own interpreter and memory, so the GIL does not limit parallelism across processes, as the sketch below demonstrates.
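
To see the difference this makes, the following sketch times the same CPU-bound function run serially and then split across one process per core. The function cpu_bound and the workload size are illustrative choices, not part of the multiprocessing API; on a multi-core machine, the parallel run should finish in roughly a fraction of the serial time.

import multiprocessing
import time

def cpu_bound(n):
    # Busy work that keeps one CPU core fully occupied
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [10_000_000] * multiprocessing.cpu_count()

    # Run the workload serially in a single process
    start = time.perf_counter()
    for n in work:
        cpu_bound(n)
    print(f"Serial:   {time.perf_counter() - start:.2f}s")

    # Run the same workload with one process per core
    # (results are discarded here; we only compare wall-clock time)
    start = time.perf_counter()
    processes = [multiprocessing.Process(target=cpu_bound, args=(n,)) for n in work]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Parallel: {time.perf_counter() - start:.2f}s")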

13.4.2 Creating and Starting Processes

The Process class in the multiprocessing module allows you to create and run processes. Each process runs independently and can execute a separate task.

Example: Creating and Starting a Process

import multiprocessing
import time

def task():
    print("Task starting...")
    time.sleep(2)
    print("Task completed.")

if __name__ == "__main__":
    # Create a new process
    process = multiprocessing.Process(target=task)

    # Start the process
    process.start()

    # Wait for the process to finish
    process.join()

    print("Main process continues after the task.")

In this example:

  • The Process class creates a new process that runs the task() function.
  • The start() method begins the execution of the process.
  • The join() method waits for the process to complete before continuing with the main program.
  • The if __name__ == "__main__": guard is required on platforms that start new processes by spawning (Windows, and macOS by default): each child re-imports the module, so process creation must not run at import time.
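
Arguments for the target function are passed with the args and kwargs parameters, and a process can be given a readable name. Here is a minimal sketch; the greet function and the name "greeter" are illustrative choices, not part of the API.

import multiprocessing

def greet(name, repeat=1):
    for _ in range(repeat):
        # current_process() returns the Process object for the calling process
        print(f"[{multiprocessing.current_process().name}] Hello, {name}!")

if __name__ == "__main__":
    process = multiprocessing.Process(
        target=greet,
        args=("Alice",),       # Positional arguments, as a tuple
        kwargs={"repeat": 2},  # Keyword arguments, as a dict
        name="greeter",
    )
    process.start()
    process.join()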

13.4.3 Process Pools

When you need to run many tasks in parallel, managing individual Process objects becomes cumbersome. The multiprocessing.Pool class simplifies this by maintaining a pool of worker processes to which you can submit tasks. The pool distributes tasks across its workers and reuses them, so the cost of starting a process is paid only once per worker.

Example: Using Pool to Run Multiple Tasks in Parallel

import multiprocessing
import time

def compute_square(n):
    time.sleep(1)  # Simulate a time-consuming task
    return n * n

if __name__ == "__main__":
    # Create a pool of worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Map the compute_square function to a list of numbers
        results = pool.map(compute_square, [1, 2, 3, 4, 5])

    print(f"Results: {results}")

Output:

Results: [1, 4, 9, 16, 25]

In this example:

  • The Pool object creates a pool of 4 worker processes.
  • The map() method applies the compute_square() function to each element in the list [1, 2, 3, 4, 5] in parallel.
  • The results are collected and returned once all tasks have completed.
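
Note that map() blocks until every result is ready. If you would rather submit tasks one at a time and collect each result individually, Pool also provides apply_async(), which returns an AsyncResult handle. A minimal sketch, reusing compute_square() from above:

import multiprocessing
import time

def compute_square(n):
    time.sleep(1)  # Simulate a time-consuming task
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # Submit each task individually without blocking
        handles = [pool.apply_async(compute_square, (n,)) for n in [1, 2, 3, 4, 5]]
        # get() blocks until that particular result is ready
        results = [h.get() for h in handles]

    print(f"Results: {results}")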

13.4.4 Inter-Process Communication (IPC)

Since processes in Python run in separate memory spaces, they cannot directly share variables or data. To facilitate communication between processes, the multiprocessing module provides several tools for inter-process communication (IPC), such as queues and pipes.

Queues

A queue allows processes to safely share data. It is both thread- and process-safe, so multiple producers and consumers can put and get items concurrently without corrupting the queue's contents.

Example: Using a Queue for Inter-Process Communication

import multiprocessing
import time

def producer(queue):
    for i in range(5):
        print(f"Producing {i}")
        time.sleep(1)
        queue.put(i)  # Put items into the queue

def consumer(queue):
    while True:
        item = queue.get()  # Retrieve items from the queue
        if item is None:
            break
        print(f"Consumed {item}")

if __name__ == "__main__":
    # Create a queue for communication between processes
    queue = multiprocessing.Queue()

    # Create producer and consumer processes
    producer_process = multiprocessing.Process(target=producer, args=(queue,))
    consumer_process = multiprocessing.Process(target=consumer, args=(queue,))

    # Start the processes
    producer_process.start()
    consumer_process.start()

    # Wait for the producer to finish
    producer_process.join()

    # Send None to the consumer to signal it's done
    queue.put(None)

    # Wait for the consumer to finish
    consumer_process.join()

In this example:

  • The producer puts items into the queue, and the consumer retrieves them.
  • The consumer runs in an infinite loop, consuming items from the queue until it receives None, signaling the end of processing.
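
Pipes

A pipe, also mentioned above, connects exactly two endpoints and is lighter-weight than a queue when only two processes need to communicate. Pipe() returns a pair of connection objects, one for each end. A minimal sketch:

import multiprocessing

def worker(conn):
    # Receive a message, send a reply, then close this end of the pipe
    message = conn.recv()
    conn.send(f"Echo: {message}")
    conn.close()

if __name__ == "__main__":
    # Pipe() returns two connected endpoints (duplex by default)
    parent_conn, child_conn = multiprocessing.Pipe()

    process = multiprocessing.Process(target=worker, args=(child_conn,))
    process.start()

    parent_conn.send("hello")
    print(parent_conn.recv())  # Prints: Echo: hello

    process.join()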

13.4.5 Shared Memory

When processes need to operate on common state rather than pass messages, you can use shared memory. The multiprocessing module provides objects that live in shared memory and can be accessed from several processes, such as Value and Array.

Example: Using Value for Shared Memory

import multiprocessing

def increment_value(shared_value):
    for _ in range(1000):
        with shared_value.get_lock():  # Ensure exclusive access
            shared_value.value += 1

if __name__ == "__main__":
    # Create a shared Value object
    shared_value = multiprocessing.Value('i', 0)  # 'i' means integer

    # Create two processes that modify the shared value
    process1 = multiprocessing.Process(target=increment_value, args=(shared_value,))
    process2 = multiprocessing.Process(target=increment_value, args=(shared_value,))

    # Start the processes
    process1.start()
    process2.start()

    # Wait for both processes to finish
    process1.join()
    process2.join()

    print(f"Final value: {shared_value.value}")

In this example:

  • The Value object is shared between two processes, and both processes increment the shared value.
  • get_lock() returns the lock associated with the Value; using it as a context manager ensures exclusive access during the read-modify-write, preventing race conditions.
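
Array works the same way for a fixed-length sequence of a single type. A minimal sketch:

import multiprocessing

def double_elements(shared_array):
    # Modify the shared array in place; get_lock() guards access
    with shared_array.get_lock():
        for i in range(len(shared_array)):
            shared_array[i] *= 2

if __name__ == "__main__":
    # 'i' means integer; the array is initialized from the list
    shared_array = multiprocessing.Array('i', [1, 2, 3, 4, 5])

    process = multiprocessing.Process(target=double_elements, args=(shared_array,))
    process.start()
    process.join()

    print(list(shared_array))  # [2, 4, 6, 8, 10]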

13.4.6 Synchronization Primitives

Just like in multithreading, you can use synchronization primitives in multiprocessing to manage shared resources. The multiprocessing module provides locks, semaphores, and events.

Locks

A lock ensures that only one process can access a shared resource at a time.

Example: Using a Lock in Multiprocessing

import multiprocessing

def task(lock, shared_resource):
    with lock:
        shared_resource.value += 1
        print(f"Resource value: {shared_resource.value}")

if __name__ == "__main__":
    # Create a lock and shared resource
    lock = multiprocessing.Lock()
    shared_resource = multiprocessing.Value('i', 0)

    # Create multiple processes
    processes = [multiprocessing.Process(target=task, args=(lock, shared_resource)) for _ in range(5)]

    # Start all processes
    for process in processes:
        process.start()

    # Wait for all processes to finish
    for process in processes:
        process.join()

    print(f"Final resource value: {shared_resource.value}")

In this example:

  • The lock ensures that only one process at a time can modify the shared resource, preventing race conditions.
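
Events

The module also provides events (and semaphores) with the same semantics as their threading counterparts. An event lets one process signal others that something has happened. A minimal sketch:

import multiprocessing
import time

def waiter(event):
    print("Waiting for the event...")
    event.wait()  # Block until the event is set
    print("Event received, continuing.")

if __name__ == "__main__":
    event = multiprocessing.Event()

    process = multiprocessing.Process(target=waiter, args=(event,))
    process.start()

    time.sleep(2)  # Simulate setup work in the main process
    event.set()    # Wake up the waiting process

    process.join()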

13.4.7 Daemon Processes

Just like threads, processes can also be daemon processes. A daemon process runs in the background and does not block the program from exiting. Once all non-daemon processes have completed, the program will terminate, and any running daemon processes will be killed.

Example: Creating a Daemon Process

import multiprocessing
import time

def background_task():
    while True:
        print("Background task running...")
        time.sleep(1)

if __name__ == "__main__":
    # Create a daemon process
    process = multiprocessing.Process(target=background_task)
    process.daemon = True
    process.start()

    # Main process continues for a few seconds
    time.sleep(3)
    print("Main process exiting, background task will be terminated.")

In this example:

  • The daemon process runs a background task. When the main process exits after 3 seconds, the daemon process is automatically terminated.

13.4.8 Multiprocessing in concurrent.futures

The concurrent.futures module also supports multiprocessing through the ProcessPoolExecutor, which provides a simpler interface for managing process pools compared to multiprocessing.Pool.

Example: Using ProcessPoolExecutor

from concurrent.futures import ProcessPoolExecutor
import time

def compute_square(n):
    time.sleep(1)
    return n * n

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(compute_square, [1, 2, 3, 4, 5]))

    print(f"Results: {results}")

In this example:

  • ProcessPoolExecutor creates a pool of processes and executes the compute_square() function in parallel.
  • The map() method applies the function to each element in the list [1, 2, 3, 4, 5] and returns the results once all processes have finished.
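
Beyond map(), the executor's submit() method returns a Future for each task, and as_completed() yields those futures in the order they finish, which is handy when you want to process results as soon as they arrive. A minimal sketch, reusing compute_square():

from concurrent.futures import ProcessPoolExecutor, as_completed
import time

def compute_square(n):
    time.sleep(1)
    return n * n

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        # submit() schedules one call and returns a Future
        futures = {executor.submit(compute_square, n): n for n in [1, 2, 3, 4, 5]}
        # as_completed() yields futures as their results become ready
        for future in as_completed(futures):
            n = futures[future]
            print(f"{n} squared is {future.result()}")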

13.4.9 Summary

The multiprocessing module in Python enables true parallelism by running multiple processes on different CPU cores. It is especially useful for CPU-bound tasks that require heavy computation and can benefit from parallel execution.

  • Processes: Independent units of execution with their own memory space, ideal for CPU-bound tasks.
  • Process Pools: Allow you to run multiple tasks in parallel using a pool of worker processes, simplifying the management of processes.
  • Inter-Process Communication (IPC): Enables processes to share data safely using queues, pipes, and shared memory.
  • Locks and Synchronization: Manage access to shared resources using locks and other synchronization primitives.
  • Daemon Processes: Background processes that do not block the main program from exiting.

Multiprocessing is a powerful tool for improving performance in Python programs that are limited by CPU-bound operations, enabling you to fully utilize multi-core systems and achieve true parallelism.