This chapter dives into the exciting world of parallel algorithms in C++. We'll explore how to leverage the processing power of modern multi-core CPUs to solve problems faster by utilizing multiple cores simultaneously. Imagine having multiple chefs working together in a kitchen instead of just one – that's the power of parallel algorithms!
Sequential algorithms execute instructions one after another, like following a recipe step-by-step. This is the traditional way of writing code, but it can become slow for computationally intensive tasks.
Most modern computers have CPUs with multiple cores that can work on tasks concurrently. However, traditional sequential algorithms don’t take advantage of this parallelism by default. It’s like having multiple chefs in the kitchen, but only one is allowed to cook at a time!
Parallel algorithms break down a large problem into smaller, independent sub-problems. These sub-problems can then be solved simultaneously on different cores, significantly reducing the overall execution time. It’s like having each chef prepare a different dish simultaneously, leading to a faster overall meal preparation!
The C++ Standard Template Library (STL) provides a rich set of algorithms for common operations on data structures like vectors, lists, and arrays. These algorithms are designed to be generic and work with different data types.
Since C++17, the standard library lets you pass an execution policy to many of these existing algorithms to request parallel execution. These policies allow the implementation to take advantage of multiple cores to speed up the computation.
Parallel std::sort vs. Sequential std::sort
#include <iostream>
#include <vector>
#include <algorithm>
#include <execution> // Include for parallel execution policies
int main() {
std::vector<int> numbers = {5, 2, 8, 1, 9};
// Sequential sort
std::sort(numbers.begin(), numbers.end());
std::cout << "Sorted numbers (sequential): ";
for (int num : numbers) {
std::cout << num << " ";
}
std::cout << std::endl;
// Parallel sort using the std::execution::par policy
std::vector<int> numbers_parallel = numbers;
std::sort(std::execution::par, numbers_parallel.begin(), numbers_parallel.end());
std::cout << "Sorted numbers (parallel): ";
for (int num : numbers_parallel) {
std::cout << num << " ";
}
std::cout << std::endl;
// Output (may vary on different systems):
// Sorted numbers (sequential): 1 2 5 8 9
// Sorted numbers (parallel): 1 2 5 8 9 (same result; only the execution may be faster on multi-core systems)
}
Sorted numbers (sequential): 1 2 5 8 9
Sorted numbers (parallel): 1 2 5 8 9
The first call to std::sort uses the standard sequential algorithm. The second call passes the std::execution::par execution policy to enable parallel execution.
std::execution::par is the most common policy for running an algorithm in parallel on all available cores; implementations typically distribute the work across a pool of threads.
std::execution::seq explicitly forces the algorithm to run sequentially, even on a multi-core system. This can be useful for debugging or in scenarios where parallel execution isn’t beneficial.
For more fine-grained control, the standard library also provides std::execution::par_unseq, which additionally allows the work within each thread to be vectorized (executed out of order), and implementations may define further policies for specific hardware architectures. The sketch below shows the three standard policies side by side.
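As a minimal sketch (assuming a C++17 compiler with parallel algorithm support; with GCC’s libstdc++ this typically also requires linking against Intel TBB, e.g. -ltbb), here is the same reduction requested with each policy using std::reduce:
#include <execution>
#include <numeric>
#include <vector>
#include <iostream>

int main() {
    std::vector<double> data(10'000'000, 1.0);

    // Sequential execution (the behavior you also get when no policy is passed).
    double s1 = std::reduce(std::execution::seq, data.begin(), data.end(), 0.0);

    // Parallel execution across multiple threads.
    double s2 = std::reduce(std::execution::par, data.begin(), data.end(), 0.0);

    // Parallel and vectorized execution.
    double s3 = std::reduce(std::execution::par_unseq, data.begin(), data.end(), 0.0);

    std::cout << s1 << " " << s2 << " " << s3 << "\n"; // all three print 1e+07
}
All three calls produce the same result; the policy only tells the implementation how it is allowed to schedule the work.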
Parallel algorithms offer significant speedups for computationally intensive tasks that can be easily divided into independent sub-problems. However, not all algorithms benefit from parallelism.
The <future> header provides the std::async function for launching asynchronous tasks and collecting their results through std::future objects. This approach offers more flexibility than execution policies for complex parallel workflows, as the sketch below shows.
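A minimal sketch (the chunk-summing helper here is a hypothetical example, not a standard function): two halves of a vector are summed concurrently, and the partial results are combined via std::future::get:
#include <future>
#include <numeric>
#include <vector>
#include <iostream>

// Hypothetical worker: sums one chunk of the data.
long long sumChunk(const std::vector<int>& data, size_t begin, size_t end) {
    return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
}

int main() {
    std::vector<int> data(1'000'000, 1);
    size_t mid = data.size() / 2;

    // Launch the two halves as asynchronous tasks (std::launch::async requests a separate thread).
    auto firstHalf  = std::async(std::launch::async, sumChunk, std::cref(data), 0, mid);
    auto secondHalf = std::async(std::launch::async, sumChunk, std::cref(data), mid, data.size());

    // get() blocks until each task has finished and returns its result.
    long long total = firstHalf.get() + secondHalf.get();
    std::cout << "Total: " << total << std::endl; // 1000000
}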
The <execution> header defines the execution policies that can be passed to std::for_each (declared in <algorithm>) to parallelize simple loop iterations that don’t have complex data dependencies, as in the sketch below.
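A minimal sketch under the same C++17 assumptions as above: each iteration writes only to its own element, so the loop can safely run in parallel:
#include <algorithm>
#include <execution>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> values(16);
    for (int i = 0; i < 16; ++i) values[i] = i;

    // Each iteration touches only its own element, so there are no data races.
    std::for_each(std::execution::par, values.begin(), values.end(),
                  [](int& v) { v = v * v; });

    for (int v : values) std::cout << v << " "; // 0 1 4 9 ... 225
    std::cout << std::endl;
}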
Some third-party libraries also provide concurrent versions of data structures such as vectors, sets, and maps that multiple threads can safely update at the same time; one example is sketched below.
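For illustration only, this sketch assumes Intel oneTBB is installed and the program is linked with -ltbb (it is not part of the C++ standard library); tbb::concurrent_vector allows push_back to be called from several threads without external locking:
#include <tbb/concurrent_vector.h>
#include <thread>
#include <vector>
#include <iostream>

int main() {
    tbb::concurrent_vector<int> results; // push_back is safe to call from many threads

    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&results, t]() {
            for (int i = 0; i < 5; ++i) {
                results.push_back(t * 100 + i); // concurrent insertion, no mutex needed
            }
        });
    }
    for (auto& w : workers) w.join();

    std::cout << "Collected " << results.size() << " elements" << std::endl; // 20
}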
Multiplying matrices involves nested loops with a significant amount of computation. Here’s a simplified example of sequential matrix multiplication:
#include <iostream>
#include <vector>
void multiplyMatrices(const std::vector<std::vector<int>>& A, const std::vector<std::vector<int>>& B, std::vector<std::vector<int>>& C) {
int rowsA = A.size();
int colsA = A[0].size();
int colsB = B[0].size();
// Resize matrix C to the appropriate dimensions
C.resize(rowsA, std::vector<int>(colsB, 0));
for (int i = 0; i < rowsA; ++i) {
for (int j = 0; j < colsB; ++j) {
for (int k = 0; k < colsA; ++k) {
C[i][j] += A[i][k] * B[k][j];
}
}
}
}
int main() {
// Example matrices A and B
std::vector<std::vector<int>> A = {{1, 2, 3}, {4, 5, 6}};
std::vector<std::vector<int>> B = {{7, 8}, {9, 10}, {11, 12}};
// Result matrix C
std::vector<std::vector<int>> C;
// Multiply matrices A and B, store result in C
multiplyMatrices(A, B, C);
// Display the result matrix C
std::cout << "Result matrix C:" << std::endl;
for (const auto& row : C) {
for (int element : row) {
std::cout << element << " ";
}
std::cout << std::endl;
}
return 0;
}
Output:
Result matrix C:
58 64
139 154
This sequential implementation can be parallelized by breaking up the outer loop and computing each row of the resulting matrix (C) independently. Here’s a basic example using std::thread:
#include <iostream>
#include <vector>
#include <thread>
void parallelMultiplyMatrices(const std::vector<std::vector<int>>& A, const std::vector<std::vector<int>>& B, std::vector<std::vector<int>>& C) {
int rowsA = A.size();
int colsA = A[0].size();
int colsB = B[0].size();
std::vector<std::thread> threads(rowsA);
for (int i = 0; i < rowsA; ++i) {
// Each thread computes one row of C; rows are independent, so no synchronization is needed.
threads[i] = std::thread([&A, &B, &C, i, colsA, colsB]() {
for (int j = 0; j < colsB; ++j) {
C[i][j] = 0;
for (int k = 0; k < colsA; ++k) {
C[i][j] += A[i][k] * B[k][j];
}
}
});
}
for (auto& thread : threads) {
thread.join();
}
}
int main() {
// Example matrices A and B
std::vector<std::vector<int>> A = {{1, 2, 3}, {4, 5, 6}};
std::vector<std::vector<int>> B = {{7, 8}, {9, 10}, {11, 12}};
// Result matrix C
std::vector<std::vector<int>> C(A.size(), std::vector<int>(B[0].size(), 0)); // Initialize with the correct dimensions
// Multiply matrices A and B in parallel, store result in C
parallelMultiplyMatrices(A, B, C);
// Display the result matrix C
std::cout << "Result matrix C:" << std::endl;
for (const auto& row : C) {
for (int element : row) {
std::cout << element << " ";
}
std::cout << std::endl;
}
return 0;
}
Result matrix C:
58 64
139 154
In this parallel version, each thread independently computes one row of the result matrix (C). For more fine-grained control and potentially higher performance gains, consider combining this approach with std::async and parallel data structures.
This chapter has equipped you with a comprehensive understanding of parallel programming concepts in C++, empowering you to write efficient and scalable applications that leverage the power of multiple cores effectively. Now, go forth and conquer the world of parallel programming in C++. Happy coding!❤️