A while ago I wrote an article about mutexes in C++. I explained how they solve the problem of race conditions by allowing only one thread at a time to access shared data.

A read-write mutex (also known as a readers-writer, shared-exclusive, or multiple-readers/single-writer (MRSW) mutex) is a specialization of a mutex that allows for greater performance in scenarios where reads are more frequent than writes.

We can see how a shared mutex can be more performant in an example with multiple readers:

Shared mutex - multiple readers, single writer

The image shows how multiple readers can access the same resource at the same time, but when a writer has the resource, nobody else can access it. If we were using a regular mutex, the program would run slower because all threads would have to wait for the mutex:

Exclusive mutex

Using shared_mutex

C++17 introduced std::shared_mutex, which is now available in most major C++ compilers.

While a regular mutex exposes three methods (lock, unlock and try_lock), a shared_mutex adds three more: lock_shared, unlock_shared and try_lock_shared. The first three work exactly as they do on a regular mutex: if the mutex is locked, all other threads wait until it is unlocked. The shared versions are a little more subtle.

Allowing readers to always acquire a shared lock, even when writers are waiting, could make writes wait forever, so implementations of shared_mutex use a scheduling policy that lets the programmer use it without having to think about this problem. One scenario is easily explained: if no thread ever calls lock, all threads calling lock_shared are allowed to proceed without ever having to wait.

Let’s look at an example with multiple readers and one writer that uses a regular mutex:

#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>

int value = 0;
std::mutex mutex;

// Reads the value and sets v to that value
void readValue(int& v) {
  mutex.lock();
  // Simulate some latency
  std::this_thread::sleep_for(std::chrono::seconds(1));
  v = value;
  mutex.unlock();
}

// Sets value to v
void setValue(int v) {
  mutex.lock();
  // Simulate some latency
  std::this_thread::sleep_for(std::chrono::seconds(1));
  value = v;
  mutex.unlock();
}

int main() {
  int read1;
  int read2;
  int read3;
  std::thread t1(readValue, std::ref(read1));
  std::thread t2(readValue, std::ref(read2));
  std::thread t3(readValue, std::ref(read3));
  std::thread t4(setValue, 1);

  t1.join();
  t2.join();
  t3.join();
  t4.join();

  std::cout << read1 << "\n";
  std::cout << read2 << "\n";
  std::cout << read3 << "\n";
  std::cout << value << "\n";
}

We can compile the code with this command:

g++ regular-mutex.cpp -std=c++11 -pthread -o regular-mutex

If we time the execution of this program, we will see that it takes around 4 seconds, because each thread takes around 1 second to execute:

$ time ./regular-mutex
0
0
0
1

real	0m4.006s
user	0m0.006s
sys	0m0.000s

Let’s optimize this code by using a shared_mutex:

#include <iostream>
#include <thread>
#include <shared_mutex>
#include <chrono>

int value = 0;
std::shared_mutex mutex;

// Reads the value and sets v to that value
void readValue(int& v) {
  mutex.lock_shared();
  // Simulate some latency
  std::this_thread::sleep_for(std::chrono::seconds(1));
  v = value;
  mutex.unlock_shared();
}

// Sets value to v
void setValue(int v) {
  mutex.lock();
  // Simulate some latency
  std::this_thread::sleep_for(std::chrono::seconds(1));
  value = v;
  mutex.unlock();
}

int main() {
  int read1;
  int read2;
  int read3;
  std::thread t1(readValue, std::ref(read1));
  std::thread t2(readValue, std::ref(read2));
  std::thread t3(readValue, std::ref(read3));
  std::thread t4(setValue, 1);

  t1.join();
  t2.join();
  t3.join();
  t4.join();

  std::cout << read1 << "\n";
  std::cout << read2 << "\n";
  std::cout << read3 << "\n";
  std::cout << value << "\n";
}

The code is pretty much the same. We just changed the type of the mutex from std::mutex to std::shared_mutex, included <shared_mutex>, and used lock_shared and unlock_shared on the read path. To compile this code we need C++17 and at least version 6.1 of the GCC compiler:

g++ shared-mutex.cpp -std=c++17 -pthread -o shared-mutex

After executing this code, we can see that now it takes 2 seconds instead of 4:

$ time ./shared-mutex
0
0
0
1

real	0m2.004s
user	0m0.004s
sys	0m0.000s

The reason this version is faster is that threads t1, t2 and t3 can access the data at the same time. The only thread that has to wait (for about 1 second) is t4.

Lock guards

In a previous article where I introduced mutexes, I also introduced lock guards as a way to lock mutexes safely. A lock guard guarantees that a mutex will be unlocked when the guard goes out of scope. This prevents mistakes where a mutex is left locked because an exception was thrown or the programmer forgot to unlock it in some code path.

We can’t use std::lock_guard for the shared side of a shared mutex because it doesn’t know about lock_shared and unlock_shared. Instead, there is std::shared_lock, which works like lock_guard but calls lock_shared and unlock_shared instead of lock and unlock:

#include <iostream>
#include <thread>
#include <shared_mutex>
#include <chrono>

int value = 0;
std::shared_mutex mutex;

// Reads the value and sets v to that value
void readValue(int& v) {
  std::shared_lock<std::shared_mutex> g(mutex);
  // Simulate some latency
  std::this_thread::sleep_for(std::chrono::seconds(1));
  v = value;
}

// Sets value to v
void setValue(int v) {
  std::lock_guard<std::shared_mutex> g(mutex);
  // Simulate some latency
  std::this_thread::sleep_for(std::chrono::seconds(1));
  value = v;
}

int main()
{
  int read1;
  int read2;
  int read3;
  std::thread t1(readValue, std::ref(read1));
  std::thread t2(readValue, std::ref(read2));
  std::thread t3(readValue, std::ref(read3));
  std::thread t4(setValue, 1);

  t1.join();
  t2.join();
  t3.join();
  t4.join();

  std::cout << read1 << "\n";
  std::cout << read2 << "\n";
  std::cout << read3 << "\n";
  std::cout << value << "\n";
}

The program remains pretty much the same; the only difference is that we use guards instead of locking and unlocking the mutexes manually. It is important that shared_lock is used only on read paths: taking a shared lock on a write path would allow several writers to modify the data at the same time and could leave it inconsistent.
