In this topic, we'll explore the differences between threading and multiprocessing in Python, covering their basic concepts, use cases, advantages, and disadvantages. We'll provide examples to illustrate how each technique works and discuss when to choose one over the other.
In this section, we’ll introduce threading and multiprocessing, explaining their basic concepts and differences.
Threading allows multiple threads of execution to run concurrently within a single process. Threads share the same memory space but can execute different tasks independently.
Multiprocessing involves running multiple processes concurrently, typically on multiple CPU cores or processors. Each process has its own memory space and runs independently of other processes.
In this section, we’ll focus on threading in Python, discussing how to create and manage threads using the threading
module.
We can create threads in Python using the Thread
class from the threading
module. Threads can execute any callable object, such as functions or methods.
import threading
import time
def print_numbers():
for i in range(5):
print(i)
time.sleep(1)
thread = threading.Thread(target=print_numbers)
thread.start()
thread.join()
print_numbers
function that prints numbers from 0 to 4 with a delay of 1 second between each print statement.thread
that executes the print_numbers
function.start()
method is called to start the execution of the thread, and the join()
method is called to wait for the thread to complete.Since threads share the same memory space, access to shared resources must be synchronized to avoid race conditions. We can use locks, semaphores, and conditions to synchronize access to critical sections of code.
In this section, we’ll explore multiprocessing in Python, discussing how to create and manage processes using the multiprocessing
module.
Processes in Python can be created using the Process
class from the multiprocessing
module. Each process runs independently of other processes and has its own memory space.
from multiprocessing import Process
import time
def print_numbers():
for i in range(5):
print(i)
time.sleep(1)
process = Process(target=print_numbers)
process.start()
process.join()
print_numbers
function similar to the threading example.process
that executes the print_numbers
function.start()
method is called to start the execution of the process, and the join()
method is called to wait for the process to complete.Processes in Python communicate with each other using inter-process communication (IPC) mechanisms such as pipes, queues, and shared memory. These mechanisms allow processes to exchange data and synchronize their execution.
In this section, we’ll discuss the advantages and disadvantages of threading and multiprocessing and when to choose one over the other.
In this section, we’ll provide practical examples to illustrate when to use threading or multiprocessing in real-world scenarios.
Consider a web scraping task where multiple URLs need to be fetched concurrently. Since fetching URLs involves I/O operations (network requests), threading can be a suitable approach to improve performance.
import threading
import requests
def fetch_url(url):
response = requests.get(url)
print(f"Fetched {url}: {response.status_code}")
urls = [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
]
threads = []
for url in urls:
thread = threading.Thread(target=fetch_url, args=(url,))
thread.start()
threads.append(thread)
for thread in threads:
thread.join()
fetch_url(url)
to fetch a single URL using the requests
module. This function is the target for each thread.urls
containing the URLs we want to fetch concurrently.threads
to store the thread objects.urls
list and create a new thread for each URL using threading.Thread
. We pass the fetch_url
function as the target and the URL as the argument.start()
method, which initiates the execution of the target function in a separate thread.threads
list.join()
to wait for each thread to complete its execution before proceeding further.Now, consider an image processing task where multiple images need to be processed simultaneously. Since image processing involves CPU-bound operations, multiprocessing can be a suitable approach to leverage multiple CPU cores.
from multiprocessing import Process
from PIL import Image
def process_image(image_path):
image = Image.open(image_path)
# Perform image processing operations
# ...
image.save(image_path.replace('.jpg', '_processed.jpg'))
image_paths = [
"image1.jpg",
"image2.jpg",
"image3.jpg"
]
processes = []
for image_path in image_paths:
process = Process(target=process_image, args=(image_path,))
process.start()
processes.append(process)
for process in processes:
process.join()
process_image(image_path)
to process a single image using the PIL
(Python Imaging Library) module. This function is the target for each process.image_paths
containing the paths of the images we want to process concurrently.processes
to store the process objects.image_paths
list and create a new process for each image using multiprocessing.Process
. We pass the process_image
function as the target and the image path as the argument.start()
method, which initiates the execution of the target function in a separate process.processes
list.join()
to wait for each process to complete its execution before proceeding further.Threading is suitable for I/O-bound tasks, such as network requests or file I/O operations, where the performance bottleneck is primarily due to waiting for external resources. By leveraging multiple threads, we can perform these tasks concurrently and improve overall efficiency. On the other hand, multiprocessing is ideal for CPU-bound tasks that involve intensive computation. By creating multiple processes, each running on a separate CPU core, we can achieve true parallelism and maximize CPU utilization, thereby improving performance. Happy Coding!❤️