Threading vs. Multiprocessing

In this topic, we'll explore the differences between threading and multiprocessing in Python, covering their basic concepts, use cases, advantages, and disadvantages. We'll provide examples to illustrate how each technique works and discuss when to choose one over the other.

Introduction to Threading and Multiprocessing

In this section, we’ll introduce threading and multiprocessing, explaining their basic concepts and differences.

Threading

Threading allows multiple threads of execution to run concurrently within a single process. Threads share the same memory space but can execute different tasks independently.

Multiprocessing

Multiprocessing involves running multiple processes concurrently, typically on multiple CPU cores or processors. Each process has its own memory space and runs independently of other processes.

Threading in Python

In this section, we’ll focus on threading in Python, discussing how to create and manage threads using the threading module.

Creating Threads

We can create threads in Python using the Thread class from the threading module. Threads can execute any callable object, such as functions or methods.

Example:

				
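import threading
import time

def print_numbers():
    # Print 0 through 4, pausing one second after each number.
    for i in range(5):
        print(i)
        time.sleep(1)

# Create a thread that runs print_numbers, start it, then wait for it to finish.
thread = threading.Thread(target=print_numbers)
thread.start()
thread.join()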

Explanation:

  • In this example, we define a print_numbers function that prints numbers from 0 to 4 with a delay of 1 second between each print statement.
  • We create a Thread object named thread whose target is the print_numbers function.
  • The start() method is called to start the execution of the thread, and the join() method is called to wait for the thread to complete.
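
As a small sketch of how start() and join() behave with more than one thread, the following runs three threads that each sleep briefly. The helper wait_then_record, the worker names, and the 0.2-second delay are illustrative assumptions, not part of the example above.

```python
import threading
import time

def wait_then_record(delay, log):
    # Simulate waiting on I/O, then record which thread finished.
    time.sleep(delay)
    log.append(threading.current_thread().name)

log = []
workers = [
    threading.Thread(target=wait_then_record, args=(0.2, log), name=f"worker-{i}")
    for i in range(3)
]

start = time.perf_counter()
for worker in workers:
    worker.start()   # returns immediately; the thread runs in the background
for worker in workers:
    worker.join()    # blocks until that thread has finished
elapsed = time.perf_counter() - start

print(f"{len(log)} threads finished in about {elapsed:.1f}s")
```

Because the three sleeps overlap, the total should be close to 0.2 seconds rather than 0.6 seconds.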

Thread Synchronization

Since threads share the same memory space, access to shared resources must be synchronized to avoid race conditions, where the result depends on the order in which threads happen to run. The threading module provides primitives such as Lock, Semaphore, and Condition to protect critical sections of code.
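
A minimal sketch of protecting shared state with a lock is shown below. The Counter class, the run_counter helper, and the thread and iteration counts are hypothetical names and values chosen for illustration.

```python
import threading

class Counter:
    """A counter whose increments are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can hold the lock, so the
        # read-modify-write on self.value cannot interleave.
        with self._lock:
            self.value += 1

def run_counter(n_threads=4, n_increments=10_000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

print(run_counter())  # 4 threads x 10,000 increments = 40000
```

Without the lock, two threads could read the same old value and one increment would be lost; with it, the final count always equals the number of increments performed.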

Multiprocessing in Python

In this section, we’ll explore multiprocessing in Python, discussing how to create and manage processes using the multiprocessing module.

Creating Processes

Processes in Python can be created using the Process class from the multiprocessing module. Each process runs independently of other processes and has its own memory space.

Example:

				
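from multiprocessing import Process
import time

def print_numbers():
    # Print 0 through 4, pausing one second after each number.
    for i in range(5):
        print(i)
        time.sleep(1)

# The guard stops child processes from re-running this block when they
# import the module, which happens with the "spawn" start method
# (the default on Windows and macOS).
if __name__ == "__main__":
    process = Process(target=print_numbers)
    process.start()
    process.join()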

Explanation:

  • In this example, we define the same print_numbers function as in the threading example.
  • We create a Process object named process whose target is the print_numbers function.
  • The start() method is called to start the execution of the process, and the join() method is called to wait for the process to complete.
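
To make the "own memory space" point concrete, here is a small sketch that runs the same function first in a thread and then in a process. The counter dictionary and the increment function are hypothetical names used only for this illustration.

```python
import threading
from multiprocessing import Process

counter = {"value": 0}

def increment():
    # Modify module-level state in whichever thread or process runs this.
    counter["value"] += 1

if __name__ == "__main__":
    thread = threading.Thread(target=increment)
    thread.start()
    thread.join()
    # The thread shares our memory, so its change is visible here.
    print("after thread:", counter["value"])

    process = Process(target=increment)
    process.start()
    process.join()
    # The child changed its own copy of counter; ours is unchanged.
    print("after process:", counter["value"])
```

Run as a script, this should print 1 both times: the thread's increment is visible to the main program, while the child process's increment is not.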

Inter-Process Communication

Processes in Python communicate with each other using inter-process communication (IPC) mechanisms such as pipes, queues, and shared memory. These mechanisms allow processes to exchange data and synchronize their execution.
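
As one possible illustration of IPC, the sketch below sends results from a child process back to the parent through a multiprocessing.Queue. The function names square_all and collect, and the use of None as an end-of-data marker, are assumptions made for this example.

```python
from multiprocessing import Process, Queue

def square_all(numbers, results):
    # Runs in the child: send each square, then a sentinel to signal the end.
    for n in numbers:
        results.put(n * n)
    results.put(None)

def collect(results):
    # Runs in the parent: read items until the sentinel arrives.
    items = []
    while True:
        item = results.get()
        if item is None:
            break
        items.append(item)
    return items

if __name__ == "__main__":
    results = Queue()
    child = Process(target=square_all, args=([1, 2, 3], results))
    child.start()
    squares = collect(results)  # drain the queue before joining the child
    child.join()
    print(squares)
```

Each item placed on the queue is pickled in the child and unpickled in the parent, which is part of why IPC costs more than sharing an object between threads.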

Threading vs. Multiprocessing: When to Use Each

In this section, we’ll discuss the advantages and disadvantages of threading and multiprocessing and when to choose one over the other.

Threading

Advantages:

  • Lightweight: Threads consume less memory and resources compared to processes.
  • Shared Memory: Threads share the same memory space, making data sharing and communication easier.

Disadvantages:

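  • Global Interpreter Lock (GIL): In standard CPython builds, the GIL lets only one thread execute Python bytecode at a time, so threads do not run Python code in parallel.
  • Limited CPU Utilization: Because of the GIL, threads rarely speed up CPU-bound tasks. They help mainly when threads spend most of their time waiting on I/O, during which the GIL is released.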

Multiprocessing

Advantages:

  • True Parallelism: Multiprocessing allows for true parallelism by running tasks on separate CPU cores or processors.
  • No GIL Limitations: Each process has its own Python interpreter and memory space, avoiding the limitations of the GIL.

Disadvantages:

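  • Increased Overhead: Processes take longer to start and use more memory than threads because each one has its own interpreter and memory space.
  • Communication Overhead: Data sent through pipes and queues must be serialized and copied, so inter-process communication is typically slower and more complex than sharing data between threads.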
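
One rough way to see these trade-offs is to time the same CPU-bound function with threads and with processes. The sketch below is only an illustration: count_down, timed, and the workload size are hypothetical, and the timings will vary with hardware and Python build.

```python
import threading
import time
from multiprocessing import Process

def count_down(n):
    # Pure-Python loop: CPU-bound, with no I/O to wait on.
    while n > 0:
        n -= 1
    return n

def timed(worker_cls, n_workers, n):
    # Thread and Process accept the same target/args arguments,
    # so the same helper can time either one.
    workers = [worker_cls(target=count_down, args=(n,)) for _ in range(n_workers)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 2_000_000
    print(f"threads:   {timed(threading.Thread, 4, n):.2f}s")
    print(f"processes: {timed(Process, 4, n):.2f}s")
```

On a multi-core machine with a standard CPython build, the process version is usually noticeably faster for this kind of workload, while for very small workloads the cost of starting processes can outweigh the gain.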

Threading and Multiprocessing in Practice

In this section, we’ll provide practical examples to illustrate when to use threading or multiprocessing in real-world scenarios.

Example: Web Scraping with Threading

Consider a web scraping task where multiple URLs need to be fetched concurrently. Since fetching URLs involves I/O operations (network requests), threading can be a suitable approach to improve performance.

				
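import threading
import requests

def fetch_url(url):
    # A timeout keeps one slow server from blocking its thread indefinitely.
    try:
        response = requests.get(url, timeout=10)
        print(f"Fetched {url}: {response.status_code}")
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

# Start one thread per URL so the network waits overlap.
threads = []
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    thread.start()
    threads.append(thread)

# Wait for every download to finish.
for thread in threads:
    thread.join()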

Explanation:

  • We define a function fetch_url(url) to fetch a single URL using the requests module. This function is the target for each thread.
  • We create a list urls containing the URLs we want to fetch concurrently.
  • We initialize an empty list threads to store the thread objects.
  • We iterate over the urls list and create a new thread for each URL using threading.Thread. We pass the fetch_url function as the target and the URL as the argument.
  • We start each thread using the start() method, which initiates the execution of the target function in a separate thread.
  • We append each thread object to the threads list.
  • After all threads are started, we use join() to wait for each thread to complete its execution before proceeding further.
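
The example above only prints each status code. If the results need to be collected instead, one approach is to store them in a shared dictionary guarded by a lock, as sketched below. To keep the sketch runnable without network access, fake_fetch is a hypothetical stand-in for the real request, and fetch_all is an illustrative helper name.

```python
import threading
import time

def fake_fetch(url):
    # Hypothetical stand-in for requests.get(url).status_code.
    time.sleep(0.1)  # simulate network latency
    return 200

def fetch_all(urls, fetch=fake_fetch):
    results = {}
    lock = threading.Lock()

    def worker(url):
        status = fetch(url)
        with lock:  # guard the shared dictionary
            results[url] = status

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

urls = ["https://example.com/page1", "https://example.com/page2"]
print(fetch_all(urls))
```

Passing the fetch function as a parameter makes it easy to swap in a real implementation later without changing the threading logic.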

Example: Image Processing with Multiprocessing

Now, consider an image processing task where multiple images need to be processed simultaneously. Since image processing involves CPU-bound operations, multiprocessing can be a suitable approach to leverage multiple CPU cores.

				
from multiprocessing import Process
from PIL import Image

def process_image(image_path):
    image = Image.open(image_path)
    # Perform image processing operations
    # ...
    image.save(image_path.replace('.jpg', '_processed.jpg'))

image_paths = [
    "image1.jpg",
    "image2.jpg",
    "image3.jpg"
]

# The guard is required on platforms that start child processes with "spawn".
if __name__ == "__main__":
    # Start one process per image.
    processes = []
    for image_path in image_paths:
        process = Process(target=process_image, args=(image_path,))
        process.start()
        processes.append(process)

    # Wait for every image to be processed.
    for process in processes:
        process.join()

Explanation:

  • We define a function process_image(image_path) to process a single image using the Pillow library, which is imported as PIL. This function is the target for each process.
  • We create a list image_paths containing the paths of the images we want to process concurrently.
  • We initialize an empty list processes to store the process objects.
  • We iterate over the image_paths list and create a new process for each image using multiprocessing.Process. We pass the process_image function as the target and the image path as the argument.
  • We start each process using the start() method, which initiates the execution of the target function in a separate process.
  • We append each process object to the processes list.
  • After all processes are started, we use join() to wait for each process to complete its execution before proceeding further.
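
Starting one process per image works for a handful of files, but it does not scale to thousands of images. A possible alternative is multiprocessing.Pool, which reuses a fixed number of worker processes. In the sketch below the actual image work is replaced by a placeholder computation so that it runs without Pillow or image files; processed_name and the pool size of 2 are illustrative assumptions.

```python
from multiprocessing import Pool

def processed_name(image_path):
    # Derive the output file name, as in the example above.
    return image_path.replace(".jpg", "_processed.jpg")

def process_image(image_path):
    # Placeholder for CPU-bound image work (no real image is opened here).
    sum(i * i for i in range(100_000))
    return processed_name(image_path)

if __name__ == "__main__":
    image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
    # Two worker processes share the three tasks between them.
    with Pool(processes=2) as pool:
        outputs = pool.map(process_image, image_paths)
    print(outputs)
```

pool.map returns the results in the same order as the inputs, so each output name lines up with its source path.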

Threading is suitable for I/O-bound tasks, such as network requests or file I/O, where the bottleneck is waiting on external resources. Running these tasks in multiple threads lets the waits overlap and shortens the total running time. Multiprocessing, on the other hand, is better suited to CPU-bound tasks that involve intensive computation. Because each process has its own interpreter, multiple processes can run on separate CPU cores in parallel and make fuller use of the CPU. Happy Coding!❤️
