13.1 Threading in Python
Threading allows a program to run multiple operations concurrently in the same process. Python's threading module provides high-level support for multithreading, which lets different parts of a program make progress concurrently. This is especially useful when a program has tasks that can run independently, such as waiting for input, performing background computations, or handling I/O operations.
In this section, we will explore the basics of threading in Python, including creating threads, managing them, using thread synchronization, and discussing Python's Global Interpreter Lock (GIL).
13.1.1 What is a Thread?
A thread is a separate flow of execution in a program. Each thread runs independently but shares the same memory space with the other threads in the process. This allows multiple operations to be in progress at once, though in practice, because of Python's Global Interpreter Lock (GIL), only one thread executes Python bytecode at any given moment.
Use Cases for Threading:
- I/O-bound tasks: Tasks that involve waiting for input/output operations, like reading from or writing to files, handling network requests, etc.
- Background tasks: Tasks that should run independently without blocking the main thread, such as periodic checks, monitoring, etc.
13.1.2 Creating and Starting Threads
The threading module provides a way to create and manage threads. A new thread can be created using the Thread class, and the start() method begins its execution.
Example: Creating and Starting a Thread
import threading
import time

# Define a simple function to run in a thread
def print_numbers():
    for i in range(5):
        time.sleep(1)
        print(f"Thread: {i}")

# Create a new thread and start it
thread = threading.Thread(target=print_numbers)
thread.start()

# Main thread continues to run
for i in range(5):
    time.sleep(1)
    print(f"Main thread: {i}")
Output (the relative order of the two lines printed each second may vary between runs):
Main thread: 0
Thread: 0
Main thread: 1
Thread: 1
Main thread: 2
Thread: 2
Main thread: 3
Thread: 3
Main thread: 4
Thread: 4
In this example:
- A new thread is created using threading.Thread(target=print_numbers).
- thread.start() starts the thread, which runs concurrently with the main program.
- Both the main thread and the new thread print numbers over the same period, demonstrating concurrent execution.
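Thread also accepts arguments for the target function through its args and kwargs parameters. The following is a minimal sketch of that; the greet function, the names passed to it, and the results list are hypothetical and exist only for illustration.

```python
import threading

results = []  # hypothetical shared list used to collect output from the threads

def greet(name, punctuation="!"):
    # Runs in the worker thread with the arguments supplied below
    results.append(f"Hello, {name}{punctuation}")

# args is a tuple of positional arguments, kwargs a dict of keyword arguments
threads = [
    threading.Thread(target=greet, args=("Ada",)),
    threading.Thread(target=greet, args=("Linus",), kwargs={"punctuation": "?"}),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for completion (join() is covered in 13.1.3)

# Sorted because the order in which the threads append is not guaranteed
print(sorted(results))
```

Note the trailing comma in args=("Ada",): a single positional argument still has to be passed as a tuple.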
13.1.3 Joining Threads
Once a thread has been started, the main program may need to wait for the thread to complete before continuing execution. This can be achieved using the join() method, which blocks the calling thread until the thread whose join() method is called has completed.
Example: Using join() to Wait for a Thread
import threading
import time

def print_numbers():
    for i in range(5):
        time.sleep(1)
        print(f"Thread: {i}")

# Create and start the thread
thread = threading.Thread(target=print_numbers)
thread.start()

# Wait for the thread to finish before continuing
thread.join()
print("Thread has finished, continuing in the main thread.")
Output:
Thread: 0
Thread: 1
Thread: 2
Thread: 3
Thread: 4
Thread has finished, continuing in the main thread.
In this example:
- The join() method is used to ensure the main program waits for the thread to complete before proceeding to print the final message.
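join() also accepts an optional timeout in seconds. Because join() always returns None, the usual way to tell whether the thread actually finished is to call is_alive() afterward. A minimal sketch, assuming a hypothetical slow_task function that stands in for real work:

```python
import threading
import time

def slow_task():
    time.sleep(0.5)  # stand-in for real work

thread = threading.Thread(target=slow_task)
thread.start()

# Wait at most 0.1 seconds; join() returns None whether or not the thread finished
thread.join(timeout=0.1)
still_running = thread.is_alive()
print(f"Still running after the timeout: {still_running}")

# Wait without a timeout until the thread is done
thread.join()
print(f"Still running after the full join: {thread.is_alive()}")
```

A timeout is useful when the main thread has to stay responsive, for example to report progress while a worker is still busy.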
13.1.4 Thread Safety and Race Conditions
When multiple threads access shared data, it is possible to encounter race conditions. A race condition occurs when two or more threads read and modify the same data without coordination, so the result depends on how their operations happen to interleave. To prevent this, Python provides thread synchronization primitives like locks.
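To make the problem concrete, the sketch below forces the unlucky interleaving instead of waiting for it to happen by chance. It is an illustration only: the unsafe_increment function is hypothetical, and the threading.Barrier is there purely to make both threads read the counter before either one writes it back.

```python
import threading

counter = 0
barrier = threading.Barrier(2)  # makes both threads read before either writes

def unsafe_increment():
    global counter
    local_copy = counter      # step 1: read the shared value
    barrier.wait()            # both threads have now read the same value
    counter = local_copy + 1  # step 2: write back, overwriting the other update

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two increments ran, but one update was lost
print(f"Expected 2, got {counter}")
```

In real programs the same read-then-write gap exists inside a statement such as counter += 1, but whether a thread switch lands in that gap varies from run to run, which is what makes race conditions hard to reproduce.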
13.1.5 Using Locks to Prevent Race Conditions
A lock is a synchronization primitive that can be used to ensure that only one thread at a time can access a shared resource. Python's Lock class in the threading module allows you to prevent race conditions by acquiring a lock before accessing shared data and releasing the lock afterward.
Example: Preventing Race Conditions with a Lock
import threading

# Shared resource
counter = 0

# Create a lock
lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(1000):
        with lock:  # Acquire the lock before accessing the shared resource
            counter += 1

# Create two threads that increment the counter
thread1 = threading.Thread(target=increment_counter)
thread2 = threading.Thread(target=increment_counter)

# Start both threads
thread1.start()
thread2.start()

# Wait for both threads to finish
thread1.join()
thread2.join()

print(f"Final counter value: {counter}")
Output:
Final counter value: 2000
In this example:
- The Lock is used to prevent race conditions. Each thread must acquire the lock before modifying the shared counter variable.
- Without the lock, the final value of counter would be unpredictable: counter += 1 is a read-modify-write sequence rather than a single atomic step, so an update from one thread can overwrite an update from the other.
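The with statement is shorthand for calling the lock's acquire() and release() methods. For comparison, here is a sketch of the explicit form; the balance variable and deposit function are hypothetical. The try/finally block matters because it releases the lock even if the protected code raises an exception.

```python
import threading

balance = 0  # hypothetical shared resource
lock = threading.Lock()

def deposit(amount):
    global balance
    lock.acquire()      # blocks until the lock is available
    try:
        balance += amount
    finally:
        lock.release()  # always release, even if the update raises

threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final balance: {balance}")
```

In most code the with lock: form is preferable, since it cannot forget the release.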
13.1.6 Daemon Threads
A daemon thread runs in the background and does not prevent the program from exiting. Once all non-daemon threads have finished, the program will exit, even if daemon threads are still running.
To create a daemon thread, set the daemon attribute of the thread to True before starting it, or pass daemon=True to the Thread constructor.
Example: Creating a Daemon Thread
import threading
import time

def background_task():
    while True:
        print("Background task is running...")
        time.sleep(2)

# Create a daemon thread
daemon_thread = threading.Thread(target=background_task)
daemon_thread.daemon = True
daemon_thread.start()

# Main thread continues to run for a few seconds
time.sleep(5)
print("Main thread is exiting, background task will be terminated.")
Output:
Background task is running...
Background task is running...
Background task is running...
Main thread is exiting, background task will be terminated.
In this example:
- The daemon thread runs a background task, printing a message every 2 seconds.
- When the main thread exits after 5 seconds, the program terminates and the daemon thread is also stopped.
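Because a daemon thread is stopped abruptly at exit, it gets no chance to finish what it was doing. When a background task needs to shut down cleanly, one common approach is to signal it with a threading.Event and then join it. The following is a sketch of that pattern, not the only way to do it; the stop_event, ticks, and background_task names are hypothetical.

```python
import threading
import time

stop_event = threading.Event()
ticks = []  # records each time the background task ran

def background_task():
    # Event.wait() returns False on timeout and True once the event is set,
    # so the loop runs roughly every 0.1 seconds until asked to stop
    while not stop_event.wait(timeout=0.1):
        ticks.append(time.monotonic())

worker = threading.Thread(target=background_task, daemon=True)
worker.start()

time.sleep(0.35)   # main thread does other work for a while
stop_event.set()   # ask the background task to stop
worker.join()      # wait until it has actually stopped

print(f"Background task ran {len(ticks)} times and stopped cleanly.")
```

Keeping daemon=True here is a safety net: if the main thread exits without setting the event, the worker still does not keep the program alive.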
13.1.7 The Global Interpreter Lock (GIL)
CPython, the standard Python implementation, uses a Global Interpreter Lock (GIL), which ensures that only one thread executes Python bytecode at a time, even in a multi-threaded program. As a result, threading does not speed up CPU-bound tasks: the threads take turns on the interpreter instead of running in parallel on multiple cores.
However, threading is still useful for I/O-bound tasks, where threads spend most of their time waiting for I/O operations to complete rather than performing CPU-intensive work.
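A rough way to see this is to time several waits that overlap. In the sketch below, the hypothetical fake_io function uses time.sleep() as a stand-in for a blocking I/O call; like real blocking I/O, it releases the GIL while waiting.

```python
import threading
import time

def fake_io(delay):
    # time.sleep releases the GIL, much like a blocking read or network call
    time.sleep(delay)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.3,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The three 0.3 s waits overlap, so the total is close to 0.3 s rather than 0.9 s
print(f"Elapsed: {elapsed:.2f} s")
```

Replacing the sleep with a pure Python computation would typically remove the benefit, because the threads would then compete for the GIL.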
Workaround for CPU-bound Tasks: Multiprocessing
For CPU-bound tasks, you can use the multiprocessing module, which spawns separate processes, each with its own memory space and Python interpreter, thus bypassing the GIL.
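As a minimal sketch of this approach, the example below distributes a CPU-bound function across worker processes with multiprocessing.Pool. The sum_of_squares function and its inputs are hypothetical, and the pool size of 2 is an arbitrary choice.

```python
import multiprocessing

def sum_of_squares(n):
    # CPU-bound work: pure Python arithmetic that would hold the GIL in a thread
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing this module
    with multiprocessing.Pool(processes=2) as pool:
        # Each input is handed to a worker process; results come back in input order
        results = pool.map(sum_of_squares, [10, 100, 1000])
    print(results)
```

The target function must be defined at module level so that it can be sent to the worker processes. Arguments and results are copied between processes, so this pays off mainly when the computation outweighs the cost of transferring the data.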
13.1.8 Thread Pools and concurrent.futures
The concurrent.futures module provides a high-level interface for managing threads and processes through its ThreadPoolExecutor and ProcessPoolExecutor classes. ThreadPoolExecutor simplifies working with multiple threads: you submit tasks to a thread pool, and the pool creates and reuses the threads for you.
Example: Using ThreadPoolExecutor
from concurrent.futures import ThreadPoolExecutor
import time

def task(message):
    time.sleep(2)
    return f"Task completed: {message}"

# Create a thread pool with 3 workers
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(task, f"Message {i}") for i in range(5)]

    # Retrieve results in submission order; result() blocks until each task is done
    for future in futures:
        print(future.result())
Output:
Task completed: Message 0
Task completed: Message 1
Task completed: Message 2
Task completed: Message 3
Task completed: Message 4
In this example:
- ThreadPoolExecutor allows you to create a pool of worker threads. The max_workers parameter controls how many threads can run at once.
- submit() schedules a task on the thread pool and immediately returns a Future object. Calling result() on a future blocks until that task has finished and then returns its value.
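The example above collects results in the order the tasks were submitted. To handle results in the order the tasks finish instead, concurrent.futures provides the as_completed() function. A minimal sketch, assuming a hypothetical task that simply sleeps for a given delay and returns it:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def task(delay):
    time.sleep(delay)  # stand-in for work that takes a varying amount of time
    return delay

delays = [0.45, 0.15, 0.3]
completion_order = []

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(task, d) for d in delays]
    # as_completed yields each future as soon as it finishes
    for future in as_completed(futures):
        completion_order.append(future.result())

# The shortest delay is expected first, regardless of submission order
print(completion_order)
```

This is useful when results can be processed independently, since a slow early task no longer holds up the results of faster ones.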
13.1.9 Summary
- Threading in Python enables concurrent execution of tasks, making it useful for I/O-bound tasks like handling multiple network connections or file I/O in parallel.
- Creating Threads: Threads can be created using the threading.Thread class, and the start() method is used to begin execution.
- Joining Threads: The join() method ensures the main thread waits for a thread to finish before continuing.
- Locks: Use Lock to prevent race conditions when threads access shared data.
- Daemon Threads: Threads that run in the background and do not block program termination.
- Global Interpreter Lock (GIL): Limits true parallelism for CPU-bound tasks in Python. For CPU-bound tasks, consider using the multiprocessing module.
- ThreadPoolExecutor: Provides a higher-level API for managing multiple threads in a thread pool, making it easier to submit tasks and retrieve results.
Threading in Python can help optimize programs with I/O-bound operations or tasks that can be performed concurrently, but it is important to understand the limitations imposed by the GIL for CPU-bound tasks.