Parallel Algorithms: Unleash the Power of Multiple Cores

This chapter dives into the exciting world of parallel algorithms in C++. We'll explore how to harness modern multi-core CPUs to solve problems faster by putting several cores to work at once. Imagine having multiple chefs working together in a kitchen instead of just one – that's the power of parallel algorithms!

Understanding Parallelism and Sequential vs. Parallel Algorithms

Sequential Algorithms: The Traditional Approach

Sequential algorithms execute instructions one after another, like following a recipe step-by-step. This is the traditional way of writing code, but it can become slow for computationally intensive tasks.

The Challenge of Single-Core Processing

Modern computers have CPUs with multiple cores that can work on tasks concurrently. However, traditional sequential algorithms don’t take advantage of this parallelism by default. It’s like having multiple chefs in the kitchen, but only one is allowed to cook at a time!

Parallel Algorithms: Divide and Conquer for Speed

Parallel algorithms break down a large problem into smaller, independent sub-problems. These sub-problems can then be solved simultaneously on different cores, significantly reducing the overall execution time. It’s like having each chef prepare a different dish simultaneously, leading to a faster overall meal preparation!

The C++ Standard Template Library (STL) and Parallel Algorithms

The Power of STL Algorithms

The C++ Standard Template Library (STL) provides a rich set of algorithms for common operations on data structures like vectors, lists, and arrays. These algorithms are designed to be generic and work with different data types.

Introducing Execution Policies

Since C++17, most STL algorithms accept an execution policy (defined in the <execution> header) as their first argument. Passing a parallel policy asks the implementation to spread the work across multiple cores to speed up the computation.

Example: Parallel Sort vs. Sequential Sort

#include <iostream>
#include <vector>
#include <algorithm>
#include <execution> // execution policies (C++17)

int main() {
  std::vector<int> numbers = {5, 2, 8, 1, 9};

  // Sequential sort
  std::sort(numbers.begin(), numbers.end());

  std::cout << "Sorted numbers (sequential): ";
  for (int num : numbers) {
    std::cout << num << " ";
  }
  std::cout << std::endl;

  // Parallel sort using the std::execution::par policy
  std::vector<int> numbers_parallel = numbers;
  std::sort(std::execution::par, numbers_parallel.begin(), numbers_parallel.end());

  std::cout << "Sorted numbers (parallel): ";
  for (int num : numbers_parallel) {
    std::cout << num << " ";
  }
  std::cout << std::endl;

}

// output //
Sorted numbers (sequential): 1 2 5 8 9
Sorted numbers (parallel): 1 2 5 8 9

Explanation:

  • We define two vectors containing the same numbers.
  • The first sort uses the standard sequential overload.
  • The second sort passes the std::execution::par policy as its first argument to request parallel execution.
  • Both versions produce the same sorted order, but the parallel one may run faster on a multi-core system (typically only for much larger inputs than this).

Exploring the Standard Execution Policies

std::execution::par: The Workhorse for Parallel Execution

This policy requests parallel execution across the available cores. Implementations typically distribute the work over an internal thread pool, although the standard does not mandate any particular mechanism.

std::execution::seq: Back to Sequential Execution

This policy requires the algorithm to run sequentially on the calling thread, even on a multi-core system. It is useful for debugging, or in scenarios where parallel execution isn’t beneficial.

Other Policies for Specific Use Cases

The standard also defines std::execution::par_unseq, which additionally permits vectorized (unsequenced) execution within each thread, and, since C++20, std::execution::unseq for vectorization on a single thread. Individual standard library implementations may offer further non-standard policies for particular hardware.

When to Use Parallel Algorithms?

Not All Algorithms Benefit from Parallelism

Parallel algorithms offer significant speedups for computationally intensive tasks that can be easily divided into independent sub-problems. However, not all algorithms benefit from parallelism.

Factors to Consider

  • Task Granularity: Very fine-grained tasks may not see significant speedups, because the cost of creating and synchronizing threads can outweigh the work actually done in parallel.
  • Data Dependencies: If sub-problems have dependencies on each other (meaning one sub-problem’s result is needed before another can start), parallelization might not be beneficial.
  • Amdahl’s Law: This law states that the theoretical speedup of a parallel program is limited by the sequential portion of the algorithm. If a significant portion of the algorithm remains sequential, the overall speedup will be limited.

Beyond the Basics: Advanced Parallel Programming Techniques

Task-Based Parallelism with std::async

The <future> header provides the std::async function to launch asynchronous tasks and retrieve their results through std::future objects. This approach offers more flexibility than execution policies for complex parallel workflows.

Parallel for Loops with std::for_each

std::for_each (declared in <algorithm>) accepts the execution policies from <execution>, letting you parallelize simple loop iterations whose bodies are independent of one another.

Parallel Data Structures

Some libraries provide parallel versions of data structures like vectors, sets, and maps that can take advantage of multiple cores for faster operations.

Putting It All Together: Example – Parallel Matrix Multiplication

Sequential Matrix Multiplication

Multiplying matrices involves nested loops with a significant amount of computation. Here’s a simplified example of sequential matrix multiplication:

#include <iostream>
#include <vector>

void multiplyMatrices(const std::vector<std::vector<int>>& A, const std::vector<std::vector<int>>& B, std::vector<std::vector<int>>& C) {
  int rowsA = A.size();
  int colsA = A[0].size();
  int colsB = B[0].size();

  // Resize matrix C to the appropriate dimensions
  C.resize(rowsA, std::vector<int>(colsB, 0));

  for (int i = 0; i < rowsA; ++i) {
    for (int j = 0; j < colsB; ++j) {
      for (int k = 0; k < colsA; ++k) {
        C[i][j] += A[i][k] * B[k][j];
      }
    }
  }
}

int main() {
  // Example matrices A and B
  std::vector<std::vector<int>> A = {{1, 2, 3}, {4, 5, 6}};
  std::vector<std::vector<int>> B = {{7, 8}, {9, 10}, {11, 12}};

  // Result matrix C
  std::vector<std::vector<int>> C;

  // Multiply matrices A and B, store result in C
  multiplyMatrices(A, B, C);

  // Display the result matrix C
  std::cout << "Result matrix C:" << std::endl;
  for (const auto& row : C) {
    for (int element : row) {
      std::cout << element << " ";
    }
    std::cout << std::endl;
  }

  return 0;
}

// output //
Result matrix C:
58 64
139 154

Parallelizing Matrix Multiplication

This sequential implementation can be parallelized by splitting the outer loop: each row of the result matrix C can be computed independently. Here’s a basic example using std::thread:

#include <iostream>
#include <vector>
#include <thread>

// NOTE: C must already be sized to rowsA x colsB before this is called;
// each thread then writes only to its own row, so no locking is needed.
void parallelMultiplyMatrices(const std::vector<std::vector<int>>& A,
                              const std::vector<std::vector<int>>& B,
                              std::vector<std::vector<int>>& C) {
  int rowsA = A.size();
  int colsA = A[0].size();
  int colsB = B[0].size();

  std::vector<std::thread> threads(rowsA);
  for (int i = 0; i < rowsA; ++i) {
    threads[i] = std::thread([&A, &B, &C, i, colsA, colsB]() {
      for (int j = 0; j < colsB; ++j) {
        C[i][j] = 0;
        for (int k = 0; k < colsA; ++k) {
          C[i][j] += A[i][k] * B[k][j];
        }
      }
    });
  }

  for (auto& thread : threads) {
    thread.join();
  }
}

int main() {
  // Example matrices A and B
  std::vector<std::vector<int>> A = {{1, 2, 3}, {4, 5, 6}};
  std::vector<std::vector<int>> B = {{7, 8}, {9, 10}, {11, 12}};

  // Result matrix C
  std::vector<std::vector<int>> C(A.size(), std::vector<int>(B[0].size(), 0)); // Initialize with correct dimensions

  // Multiply matrices A and B in parallel, store result in C
  parallelMultiplyMatrices(A, B, C);

  // Display the result matrix C
  std::cout << "Result matrix C:" << std::endl;
  for (const auto& row : C) {
    for (int element : row) {
      std::cout << element << " ";
    }
    std::cout << std::endl;
  }

  return 0;
}

// output //
Result matrix C:
58 64
139 154

Explanation:

  • We create a thread for each row of the resulting matrix (C).
  • Each thread calculates the elements of its assigned row independently.
  • This approach can improve performance on multi-core systems for large matrices; for inputs this small, thread-creation overhead will dominate any gains.

Key Points

  • Parallel algorithms can significantly speed up computationally intensive tasks that can be divided into independent sub-problems.
  • Consider factors like task granularity, data dependencies, and Amdahl’s Law when deciding whether to parallelize an algorithm.
  • The STL’s execution policies provide a convenient way to enable parallel execution of existing algorithms.
  • Explore advanced techniques like std::async and parallel data structures for more fine-grained control and potentially higher performance gains.
  • Always profile your code to measure the actual benefits of parallelization. Not all algorithms see significant speedups from parallelization due to overhead.

Additional tips

  • Start simple: Begin by understanding execution policies and their usage with basic algorithms.
  • Think in terms of data parallelism: Identify independent data chunks that can be processed concurrently.
  • Be mindful of synchronization: If sub-problems have dependencies, ensure proper synchronization to avoid race conditions and incorrect results.
  • Test thoroughly: Parallel programs can have subtle bugs related to threading and synchronization. Write unit tests to verify correctness under various scenarios.
  • Consider debugging tools: Debuggers with multi-threading support can be invaluable for visualizing thread behavior and identifying issues.

This chapter has given you a solid grounding in parallel programming concepts in C++, empowering you to write efficient, scalable applications that make full use of multiple cores. Now, go forth and conquer the world of parallel programming in C++. Happy coding! ❤️

Copyright © 2025 Diginode