std::unordered_set

STL container: `std::unordered_set`

std::unordered_set is a container in the C++ Standard Template Library (STL) that stores unique elements in no particular order. It uses a hash table for its internal implementation, providing average constant-time complexity for insertion, deletion, and search operations.

Key Characteristics

Stores unique elements
Elements are not ordered
Provides average constant-time complexity (O(1)) for insertion, deletion, and search operations; the worst case is linear (O(n)) when many elements collide in the same bucket
Uses a hash function to map elements to buckets
Allows for custom hash functions and equality comparators
Does not allow in-place modification of elements (they are exposed as const to preserve hash integrity); to change a value, erase it and insert the new one
Suitable for fast lookup and uniqueness checking

Example 1: Basic Usage

#include <iostream>
#include <unordered_set>
#include <string>

int main() {
    std::unordered_set<std::string> fruits;

    // Inserting elements
    fruits.insert("apple");
    fruits.insert("banana");
    fruits.insert("cherry");
    fruits.insert("apple"); // Duplicate, won't be inserted

    // Printing the set
    std::cout << "Fruits in the set:" << std::endl;
    for (const auto& fruit : fruits) {
        std::cout << fruit << std::endl;
    }

    // Checking if an element exists
    if (fruits.find("banana") != fruits.end()) {
        std::cout << "Banana is in the set" << std::endl;
    }

    // Size of the set
    std::cout << "Number of fruits: " << fruits.size() << std::endl;

    // Removing an element
    fruits.erase("cherry");

    return 0;
}

Explanation:

An unordered_set of strings is created to store fruit names
Elements are inserted using the insert() function
Duplicate insertions are ignored (e.g., "apple")
The set is iterated using a range-based for loop
find() is used to check for the existence of an element
size() returns the number of elements in the set
erase() removes an element from the set

Example 2: Custom Type with Custom Hash Function

#include <iostream>
#include <unordered_set>
#include <string>

struct Person {
    std::string name;
    int age;

    bool operator==(const Person& other) const {
        return name == other.name && age == other.age;
    }
};

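// Custom hash function for Person.
// XOR is the simplest way to combine two hashes; it is fine for a demo, but it mixes bits weakly.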
struct PersonHash {
    std::size_t operator()(const Person& p) const {
        return std::hash<std::string>()(p.name) ^ std::hash<int>()(p.age);
    }
};

int main() {
    std::unordered_set<Person, PersonHash> people;

    people.insert({"Alice", 30});
    people.insert({"Bob", 25});
    people.insert({"Charlie", 35});
    people.insert({"Alice", 30}); // Duplicate, won't be inserted

    for (const auto& person : people) {
        std::cout << person.name << ": " << person.age << std::endl;
    }

    return 0;
}

Explanation:

A custom Person struct is defined with name and age
A custom operator== is defined for Person to check equality
A custom hash function PersonHash is created for Person objects
An unordered_set of Person objects is created, using PersonHash
This example demonstrates how to use custom types with custom hash functions in an unordered_set

Example 3: Performance Comparison with `std::set`

#include <iostream>
#include <unordered_set>
#include <set>
#include <chrono>
#include <random>

// Function to measure execution time
template<typename Func>
long long measureTime(Func func) {
    auto start = std::chrono::high_resolution_clock::now();
    func();
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

int main() {
    const int NUM_ELEMENTS = 1000000;
    const int NUM_SEARCHES = 1000000;

    std::unordered_set<int> uset;
    std::set<int> set;

    // Random number generation
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, NUM_ELEMENTS);

    // Insertion
    auto insertUnordered = [&]() {
        for (int i = 0; i < NUM_ELEMENTS; ++i) {
            uset.insert(dis(gen));
        }
    };
    auto insertOrdered = [&]() {
        for (int i = 0; i < NUM_ELEMENTS; ++i) {
            set.insert(dis(gen));
        }
    };

    std::cout << "Insertion time (microseconds):" << std::endl;
    std::cout << "unordered_set: " << measureTime(insertUnordered) << std::endl;
    std::cout << "set: " << measureTime(insertOrdered) << std::endl;

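    // Search (hits are counted so the compiler cannot optimize the lookups away)
    long long hitsUnordered = 0;
    long long hitsOrdered = 0;
    auto searchUnordered = [&]() {
        for (int i = 0; i < NUM_SEARCHES; ++i) {
            if (uset.find(dis(gen)) != uset.end()) ++hitsUnordered;
        }
    };
    auto searchOrdered = [&]() {
        for (int i = 0; i < NUM_SEARCHES; ++i) {
            if (set.find(dis(gen)) != set.end()) ++hitsOrdered;
        }
    };

    // Run each measurement in its own statement so the hit counts are read afterwards
    long long timeUnordered = measureTime(searchUnordered);
    long long timeOrdered = measureTime(searchOrdered);

    std::cout << "Search time (microseconds):" << std::endl;
    std::cout << "unordered_set: " << timeUnordered << " (" << hitsUnordered << " hits)" << std::endl;
    std::cout << "set: " << timeOrdered << " (" << hitsOrdered << " hits)" << std::endl;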

    return 0;
}

Explanation:

This example compares the performance of std::unordered_set and std::set
It measures the time taken for insertion and search operations
A large number of random integers are inserted into both containers
The measureTime function is used to calculate execution time
std::unordered_set typically shows faster insertion and search times due to its hash-based implementation, although the exact numbers depend on the compiler, optimization level, standard library, and data

Example 4: Bucket Interface

#include <iostream>
#include <unordered_set>
#include <string>

int main() {
    std::unordered_set<std::string> words = {
        "apple", "banana", "cherry", "date", "elderberry",
        "fig", "grape", "honeydew", "imbe", "jackfruit"
    };

    // Print bucket information
    std::cout << "Bucket count: " << words.bucket_count() << std::endl;
    std::cout << "Max bucket count: " << words.max_bucket_count() << std::endl;
    std::cout << "Load factor: " << words.load_factor() << std::endl;
    std::cout << "Max load factor: " << words.max_load_factor() << std::endl;

    // Print contents of each bucket
    for (size_t i = 0; i < words.bucket_count(); ++i) {
        std::cout << "Bucket " << i << " contains:";
        for (auto it = words.begin(i); it != words.end(i); ++it) {
            std::cout << " " << *it;
        }
        std::cout << std::endl;
    }

    // Find which bucket an element is in
    std::string search = "grape";
    size_t bucket = words.bucket(search);
    std::cout << "'" << search << "' is in bucket " << bucket << std::endl;

    // Rehash the container
    std::cout << "\nRehashing to 20 buckets..." << std::endl;
    words.rehash(20);
    std::cout << "New bucket count: " << words.bucket_count() << std::endl;

    return 0;
}

Explanation:

This example demonstrates the bucket interface of std::unordered_set
bucket_count() returns the number of buckets in the container
max_bucket_count() shows the maximum number of buckets the container can have
load_factor() returns the average number of elements per bucket
max_load_factor() returns the current maximum load factor
The example iterates through each bucket, showing its contents
bucket(key) is used to find which bucket a specific element is in
rehash(n) requests at least n buckets and redistributes the elements; the resulting bucket count may be larger than n

Additional Considerations

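Iterator Stability: References and pointers to elements stay valid until that element is erased, even across a rehash. Iterators are invalidated when an insertion triggers a rehash; erase() invalidates only iterators and references to the erased element.
Rehashing: When an insertion would push the load factor above the max load factor, the container automatically increases the number of buckets and rehashes the elements. Calling reserve(n) beforehand avoids repeated rehashing when the final size is known.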
Custom Types: When using custom types, you need to provide a hash function and an equality comparison function.
No Duplicates: unordered_set automatically handles duplicate elements by not inserting them.
Memory Usage: In addition to one node per element, unordered_set maintains a bucket array, so its memory footprint depends on the load factor and on the standard library implementation. Measure rather than assume when memory matters.

Summary

std::unordered_set is a powerful container in C++ for storing unique elements with fast access times. Key points to remember:

It provides average constant-time complexity for insertion, deletion, and search operations
Elements are unordered and unique
It uses a hash function to distribute elements into buckets
Custom types can be used with custom hash and equality functions
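It's generally faster than std::set for lookups and insertions on large datasets, but its worst-case operations are linear, and std::set remains the better fit when sorted order or range queries are needed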
The bucket interface allows for fine-grained control and analysis of the underlying hash table

unordered_set is ideal for scenarios where fast lookup and uniqueness of elements are required, and the order of elements is not important. It's commonly used in situations like:

Removing duplicates from a collection
Fast membership testing
Implementing certain algorithms that require quick element access and uniqueness

Understanding when to use std::unordered_set versus other containers like std::set or std::vector is crucial for writing efficient and clear C++ code in various application domains. Its constant-time average complexity for key operations makes it a go-to choice for many performance-critical applications where set operations are frequently performed.
