

Concurrency, part 1

Below is a C++ program that incorporates std::thread, std::mutex, std::lock_guard, std::atomic, and memory_order concepts in the context of a Monte Carlo simulation:

Resources (C++ Tutorial)

algorithm, auto, command_line_arguments, exception, iostream, namespace, range-based, sstream, std::cin, std::cout, std::cerr, std::exception, std::ifstream, std::ofstream, stdexcept, throw, vector

Implementation

#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <random>
#include <atomic>
#include <chrono>
#include <cmath>   // std::abs(double); M_PI is POSIX, not standard C++ (MSVC needs _USE_MATH_DEFINES)

// Shared random number generator
std::mt19937 global_rng;
std::mutex rng_mutex;

// Atomic counter for completed tasks
std::atomic<int> completed_tasks(0);

// Function to estimate pi using Monte Carlo method
void estimate_pi(int samples, double& result) {
    int inside_circle = 0;
    // Seed a thread-local engine from the shared one under the lock.
    // Drawing a seed (rather than copying the engine's state) keeps each
    // thread's random sequence distinct; a straight copy would make every
    // thread generate exactly the same points.
    std::mt19937 local_rng;
    {
        std::lock_guard<std::mutex> lock(rng_mutex);
        local_rng.seed(global_rng());
    }

    std::uniform_real_distribution<double> dist(0.0, 1.0);

    for (int i = 0; i < samples; ++i) {
        double x = dist(local_rng);
        double y = dist(local_rng);
        if (x*x + y*y <= 1.0) {
            inside_circle++;
        }
    }

    result = 4.0 * inside_circle / samples;
    completed_tasks.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    const int num_threads = 4;
    const int samples_per_thread = 1000000;

    std::vector<std::thread> threads;
    std::vector<double> results(num_threads);

    // Seed the global RNG
    std::random_device rd;
    global_rng.seed(rd());

    auto start_time = std::chrono::high_resolution_clock::now();

    // Launch threads
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(estimate_pi, samples_per_thread, std::ref(results[i]));
    }

    // Wait for all threads to complete
    for (auto& thread : threads) {
        thread.join();
    }

    auto end_time = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);

    // Calculate final estimate of pi
    double pi_estimate = 0.0;
    for (double result : results) {
        pi_estimate += result;
    }
    pi_estimate /= num_threads;

    std::cout << "Estimated value of pi: " << pi_estimate << std::endl;
    std::cout << "Actual value of pi:    " << M_PI << std::endl;
    std::cout << "Error:                 " << std::abs(pi_estimate - M_PI) << std::endl;
    std::cout << "Time taken:            " << duration.count() << " ms" << std::endl;
    std::cout << "Completed tasks:       " << completed_tasks.load(std::memory_order_relaxed) << std::endl;

    return 0;
}
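
One way to build and run the program with g++ on a POSIX system (a sketch; the file name monte_carlo_pi.cpp is assumed, and -pthread is needed when compiling and linking std::thread code with g++ or clang):

g++ -std=c++17 -O2 -pthread monte_carlo_pi.cpp -o monte_carlo_pi
./monte_carlo_pi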

This program demonstrates the following concepts:

  1. std::thread:
     - We create multiple threads to parallelize the Monte Carlo simulation for estimating pi.

  2. std::mutex and std::lock_guard:
     - A mutex (rng_mutex) protects access to the global random number generator.
     - std::lock_guard ensures the mutex is properly locked and unlocked, even if an exception is thrown.

  3. std::atomic and memory_order:
     - An atomic counter (completed_tasks) tracks the number of completed tasks across threads.
     - fetch_add is used with std::memory_order_relaxed since the exact order of increments is not critical here; see the sketch after this list.

  4. Monte Carlo Simulation:
     - The program estimates pi by generating random points in the unit square and counting how many fall inside the quarter circle; the factor of 4 in the estimate accounts for sampling only one quadrant.

  5. Thread-Local Random Number Generation:
     - Each thread seeds its own random number generator, avoiding contention on the shared engine inside the sampling loop.

  6. Performance Measurement:
     - std::chrono measures the wall-clock time of the parallel computation.
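
To isolate the atomic-counter idea from item 3, here is a minimal standalone sketch (a 4-thread, 10,000-increment toy example, not part of the program above). std::memory_order_relaxed keeps each fetch_add atomic but imposes no ordering on surrounding operations; the join() calls are what make the final total visible to main:

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::atomic<int> counter(0);
    std::vector<std::thread> workers;

    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&counter] {
            for (int j = 0; j < 10000; ++j) {
                counter.fetch_add(1, std::memory_order_relaxed); // atomic, but unordered
            }
        });
    }

    for (auto& t : workers) {
        t.join(); // each thread's completion happens-before this point
    }

    std::cout << counter.load() << std::endl; // prints 40000 on every run
    return 0;
}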

This implementation showcases how to use threading and synchronization primitives in C++ to parallelize a computational task while keeping it thread-safe. Atomic operations allow progress to be tracked across threads without heavyweight synchronization such as a mutex.
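
For contrast, a mutex-based version of the same progress counter (a hypothetical fragment, not taken from the program above) is equally correct but pays for a lock round-trip on every increment:

#include <mutex>

std::mutex count_mutex;
int completed = 0; // plain int, guarded by count_mutex

void mark_done() {
    std::lock_guard<std::mutex> lock(count_mutex); // locked here, unlocked at scope exit
    ++completed;
}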

Remember that the actual performance gain from parallelization can vary depending on the hardware and the specific workload. In some cases, the overhead of creating and managing threads might outweigh the benefits for small workloads.
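
If you want the thread count to follow the machine rather than the hard-coded 4, one sketch is to ask the standard library for a hint (std::thread::hardware_concurrency() may return 0 when the count cannot be determined, so a fallback is needed):

#include <thread>

unsigned hint = std::thread::hardware_concurrency(); // 0 if unknown
const int num_threads = (hint == 0) ? 4 : static_cast<int>(hint);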
