cgra's

2021-08-26

TRB OOM Patch - no_peer_dumpster

Filed under: TRB — cgra @ 14:14:15

A strong warning

What follows should be considered EXPERIMENTAL, because:

  1. I could be wrong,
  2. a brain fart this thick is difficult to breathe,
  3. nobody else looked yet, and
  4. it's not been extensively tested.


Let us proceed. We will first analyze the current situation, then have a look at the proposed solution.

The reader may first want to have a careful look at the patch contents below this article, then resume here to read the whole description.

Current situation

TRB has a bunch of (here, here, and here) out-of-memory DoS issues. This article will attempt to address one specific issue in that group: the vNodesDisconnected dumpster, where every disconnected CNode is put on hold for 15 minutes after being disconnected, before actually being deallocated.

The dumpster mechanism is bad, because:

  • It's not necessary (elaborated later), and
  • It enables an attacker to waste the target node's RAM just by repeatedly connecting and disconnecting.

Brain Farts of Satoshi, episode ∞ - Ref Counting

Tied to the dumpster mechanism are1: CNode's AddRef(), Release(), GetRefCount(), nRefCount, and nReleaseTime.

AddRef() and Release() calls should go in pairs; otherwise nRefCount could remain positive indefinitely. In that case, the node would never get removed from vNodesDisconnected and deallocated.
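As a minimal sketch of that invariant (a stripped-down stand-in, not TRB's actual CNode):

#include <stdio.h>

// Minimal sketch of the pairing invariant -- a stripped-down stand-in,
// not TRB's actual CNode; only the bookkeeping that matters here.
struct NodeSketch
{
    int nRefCount;
    NodeSketch() : nRefCount(0) {}
    void AddRef()      { nRefCount++; }
    void Release()     { nRefCount--; }
    int  GetRefCount() { return nRefCount; }
};

int main()
{
    NodeSketch node;

    node.AddRef();       // a thread starts using the node...
    node.Release();      // ...and finishes: balanced, count back to 0
    printf("balanced:   %d\n", node.GetRefCount());

    node.AddRef();       // unbalanced: no matching Release() anywhere
    printf("unbalanced: %d\n", node.GetRefCount());
    // A "GetRefCount() <= 0" deletion check would now never pass,
    // and the node would never be deallocated.
    return 0;
}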

These refs are adjusted as follows:

  • Increment and decrement for each peer, when beginning and ending a message processing loop in the ThreadMessageHandler2()2 thread.
  • Increment and decrement for each peer, when beginning and ending a send()/recv() handling loop in the ThreadSocketHandler2() thread.
  • Increment at the peer's connect time, when the peer gets added to vNodes. Both inbound and outbound.
  • Decrement when a disconnected node is moved from vNodes to vNodesDisconnected.
  • Increment in a corner case where our node connects outbound to a peer that simultaneously connected inbound, the two ending up sharing a single CNode object instead of having separate inbound/outbound connections. See next.

In ConnectNode(), an unbalanced AddRef() is theoretically possible: if a new inbound peer connected after the FindNode() call in OpenNetworkConnection(), but before the FindNode() call in ConnectNode(), and also happened to be the same peer our node was about to connect to (i.e. ConnectNode()'s parameter addrConnect). Here, the inbound and outbound connection attempts would end up peculiarly sharing the same CNode object.

Otherwise, starting from the last FindNode() call, nothing prevents a peer from creating multiple connections to our node, and if we already had an outbound connection to the peer, we could end up with separate inbound and outbound connections to the same peer (unlike above, not a shared CNode object).

In the dumpster filling section is an expression that never triggers. The ref count increments for new connections (inbound, outbound) are only balanced with their equivalent decrements after triggering the expression, which is impossible!

Therefore, ref counts are only ever productively read by a single check: whether or not a disconnected node may already be deleted.

Ref counting is mostly pointless

Because the ref count reading happens in the ThreadSocketHandler2() thread, it's entirely pointless to adjust ref counts in the send()/recv() handler (increment, decrement). The ref count reading will never happen at the same time -- both are in different positions of the same thread.

Adjusting the ref count when a peer is being added to, or removed from, vNodes (new inbound, new outbound; dumpster load) is similarly pointless, but serves one distinct purpose: preventing a theoretical risk of a dangling pointer dereference on two lines:

  • Patch line 57, a redundant timestamp assignment, and
  • patch line 177, an outbound-node-flag assignment, which could easily reside in a thread-safe position.

The last remaining, and within the scope of this patch fundamentally the only, useful place to adjust the ref count is ThreadMessageHandler2()'s loop. It signals to the deletion code in the ThreadSocketHandler2() thread whether the message handler thread is still actively handling the peer's data or not.
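To make that division of labour concrete, here is a condensed, self-contained toy model of the coordination. It borrows TRB's names (vNodes, cs_vNodes, fDisconnect) for orientation only; the patched loops further below are the authoritative version.

// A toy model of the coordination -- NOT TRB code. Node stands in for
// CNode, and only the bookkeeping relevant here is kept.
#include <pthread.h>
#include <stdio.h>
#include <vector>

struct Node
{
    int  nRefCount;
    bool fDisconnect;
    Node() : nRefCount(0), fDisconnect(false) {}
};

pthread_mutex_t cs_vNodes = PTHREAD_MUTEX_INITIALIZER;
std::vector<Node*> vNodes;

// Plays the role of ThreadMessageHandler2(): take a reference on every
// node it is about to work on, drop it when done.
void* MessageHandler(void*)
{
    for (int round = 0; round < 100; round++)
    {
        std::vector<Node*> vNodesCopy;
        pthread_mutex_lock(&cs_vNodes);
        for (size_t i = 0; i < vNodes.size(); i++)
            if (!vNodes[i]->fDisconnect)
            {
                vNodes[i]->nRefCount++;           // AddRef(): "still in use"
                vNodesCopy.push_back(vNodes[i]);
            }
        pthread_mutex_unlock(&cs_vNodes);

        // ... message processing would happen here, outside the lock ...

        pthread_mutex_lock(&cs_vNodes);
        for (size_t i = 0; i < vNodesCopy.size(); i++)
            vNodesCopy[i]->nRefCount--;           // Release(): deletion OK again
        pthread_mutex_unlock(&cs_vNodes);
    }
    return NULL;
}

// Plays the role of ThreadSocketHandler2(): delete a disconnected node
// only once nobody holds a reference to it anymore.
void* SocketHandler(void*)
{
    for (int round = 0; round < 100; round++)
    {
        pthread_mutex_lock(&cs_vNodes);
        for (size_t i = 0; i < vNodes.size(); )
        {
            Node* pnode = vNodes[i];
            if (pnode->fDisconnect && pnode->nRefCount <= 0)
            {
                vNodes.erase(vNodes.begin() + i); // remove from vNodes...
                delete pnode;                     // ...and free immediately
            }
            else
                i++;
        }
        pthread_mutex_unlock(&cs_vNodes);
    }
    return NULL;
}

int main()
{
    for (int i = 0; i < 8; i++)
    {
        Node* pnode = new Node();
        pnode->fDisconnect = (i % 2 == 0);   // pretend half have disconnected
        vNodes.push_back(pnode);
    }
    pthread_t t1, t2;
    pthread_create(&t1, NULL, MessageHandler, NULL);
    pthread_create(&t2, NULL, SocketHandler, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d nodes left connected\n", (int)vNodes.size());
    for (size_t i = 0; i < vNodes.size(); i++)
        delete vNodes[i];
    return 0;
}

The point is merely the handshake: the message handler raises the count while it works on a copy of the node list, and the socket handler only deletes a disconnected node once that count has dropped back to zero.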

The 15-minute dumpster cooldown of disconnected nodes adds nothing to the ref counting but a loose, extra protection against the marginal dangling pointer dereference risk in OpenNetworkConnection(), at patch line 177: the second FindNode() call could find an existing peer, which ThreadSocketHandler2() could delete immediately after -- leaving us with a dangling pnode pointer.

Proposed changes

Removals: redundant outbound validation, unbalanced ref count increment, redundant assignment

Repeatedly validating outbound connections (patch lines 164, 165, 28, and 32) is useless -- for the FindNode() check especially, because the acceptance of inbound connections from a single peer IP address is not limited to one.

The AddRef() at the second FindNode() position is not balanced anywhere, and could cause a stale peer to stick around until node restart.

Also, the assignment at patch line 57 is already done in the CNode constructor, so it is redundant. And besides, it marginally risks a dangling pointer dereference, as previously noted.

For these reasons we may remove lines from ConnectNode(), marked by patch lines 28 to 41, and also patch line 57.

Simplified outbound connection init

We may simplify the OpenNetworkConnection() function to return void instead of bool, because the return value isn't used anywhere, and it further helps clean up the whole function.

The outbound-node-flag assignment from OpenNetworkConnection() should be moved to happen before the new peer gets added to vNodes in ConnectNode(), to avoid one of the marginal risks of a dangling pointer dereference noted earlier.

ConnectNode() would convey its meaning more accurately if it didn't return a CNode pointer. Specifically, we don't want to do anything with a node object outside the cs_vNodes/ref count guard, to avoid further dangling pointers. So we change the return type to void.

ConnectNode() parameter nTimeout is always 0, so we bolt the value down.

These changes are highlighted by patch lines: 16, 17, 25, 26, 53, 56-62, 155-180, 218 and 219.

Reduce the ref counting to required minimum

Because the AddRef() / Release() / GetRefCount() only make sense in coordinating the stale peer deletion moment, between ThreadMessageHandler2() and ThreadSocketHandler2(), we remove all the other uses:

AddRef()'s nTimeout parameter is never used, so we may remove it.

The nReleaseTime will not be used, and must go (patch lines 90, 227, 235 and 250-255). We previously fixed the corner case it was originally loosely protecting against -- nReleaseTime won't be missed.

Because the ref count is never 0 at deletion check time, we may remove the expression that never triggers.

Tow away the dumpster

The vNodesDisconnected can go entirely. We will keep the stale nodes in vNodes instead, and they won't stay there for long.

None of the TRY_CRITICAL_BLOCK's were necessary, because:

  • The whole dumpster-loading and dumpster-unloading section is surrounded by the cs_vNodes lock,
  • either an outer cs_vNodes has us covered against the other threads that may hold these locks (ThreadRPCServer() and ThreadBitcoinMiner(), which are only concerned with relaying either the user's transactions or freshly self-mined blocks), or
  • ThreadMessageHandler2() is either sleeping, waiting for cs_vNodes, or signalling its own use with an increased ref count.

We keep the zero ref count condition, under which the final deallocation may happen, like before.

The outbound/inbound flag check is pointless, because no node may show up here that has both flags false. The only three places where the CNode constructor is called are:

  • ConnectNode(), where the new outbound node now gets its fNetworkNode flag set before being added to vNodes (patch lines 48 and 53),
  • the inbound accept in ThreadSocketHandler2(), where the constructor's fInbound argument is true (patch line 125), and
  • the pnodeLocalHost object created at node startup, which is never added to vNodes at all.

The 1) removal from vNodes and 2) CNode deallocation will happen together, under the required conditions. Closing the TCP socket and other cleanup should happen without delay, even if the final deletion has to wait until ThreadMessageHandler2() is done with its current iteration.

This whole dumpster-hauling change is highlighted by patch lines 70 and 78-118.

Finally, the ThreadMessageHandler2() and ThreadSocketHandler2() threads could remain out of sync for a while, such that the deletion section doesn't immediately coincide with the message handler's sleeping period. For this reason, we're probably better off not letting already disconnected nodes unnecessarily get locked down by the ref count guard (patch lines 188-198).

In conclusion

Simple as that! Right?

The patch

The patch file and my corresponding signature.

1 diff -uNr a/bitcoin/manifest b/bitcoin/manifest
2 --- a/bitcoin/manifest 1b686bcb52a6a5df9ee7db45315e32fd9b90ebff8783cde2aec664538b6722a972be2c060c4dc97eb5138f454413a2df670ed361b120bfa43acba686aeb9a54f
3 +++ b/bitcoin/manifest 2e02a09e85de234b0523bb1b280cd7838355b5a2fe20984c4f295327fe31980303ea83ad4b84b995e424bbf24184f1b8fb4ead67b516eae0ba852f992ad358ce
4 @@ -32,3 +32,4 @@
5 616451 mod6_phexdigit_fix mod6 Adds missing comma to separate values in the phexdigit array in util.cpp.
6 617254 mod6_excise_hash_truncation mod6 Regrind of ben_vulpes original; Removes truncation of hashes printed to TRB log file
7 617255 mod6_whogaveblox mod6 Regrind of asciilifeform original; Record the origin of every incoming candidate block (whether accepted or rejected)
8 +697539 cgra_no_peer_dumpster cgra Don't store stale peers; Trim down 'ref counter'
9 diff -uNr a/bitcoin/src/net.cpp b/bitcoin/src/net.cpp
10 --- a/bitcoin/src/net.cpp 6d7c7634cce09792942fc69cf945b64e8f8e77b496ad6bbaed12dccbc5e13c31fc2f1162735cae7951893fb6b3635955703059b1e8ba1607cc483c494c0a126c
11 +++ b/bitcoin/src/net.cpp f35e0d35dad6b268d20b025c3d49875c8e8c78f254caa20b2c88952f1719b83216d9cf7a6c99eb4045387d4abe27bf2c12bebbb189d719c61d39074e9c7705bb
12 @@ -18,7 +18,7 @@
13 void ThreadMessageHandler2(void* parg);
14 void ThreadSocketHandler2(void* parg);
15 void ThreadOpenConnections2(void* parg);
16 -bool OpenNetworkConnection(const CAddress& addrConnect);
17 +void OpenNetworkConnection(const CAddress& addrConnect);
18
19
20
21 @@ -446,22 +446,8 @@
22 return NULL;
23 }
24
25 -CNode* ConnectNode(CAddress addrConnect, int64 nTimeout)
26 +void ConnectNode(CAddress addrConnect)
27 {
28 - if (addrConnect.ip == addrLocalHost.ip)
29 - return NULL;
30 -
31 - // Look for an existing connection
32 - CNode* pnode = FindNode(addrConnect.ip);
33 - if (pnode)
34 - {
35 - if (nTimeout != 0)
36 - pnode->AddRef(nTimeout);
37 - else
38 - pnode->AddRef();
39 - return pnode;
40 - }
41 -
42 /// debug print
43 printf("trying connection %s lastseen=%.1fhrs lasttry=%.1fhrs\n",
44 addrConnect.ToString().c_str(),
45 @@ -484,19 +470,9 @@
46
47 // Add node
48 CNode* pnode = new CNode(hSocket, addrConnect, false);
49 - if (nTimeout != 0)
50 - pnode->AddRef(nTimeout);
51 - else
52 - pnode->AddRef();
53 + pnode->fNetworkNode = true;
54 CRITICAL_BLOCK(cs_vNodes)
55 vNodes.push_back(pnode);
56 -
57 - pnode->nTimeConnected = GetTime();
58 - return pnode;
59 - }
60 - else
61 - {
62 - return NULL;
63 }
64 }
65
66 @@ -604,7 +580,6 @@
67 void ThreadSocketHandler2(void* parg)
68 {
69 printf("ThreadSocketHandler started\n");
70 - list<CNode*> vNodesDisconnected;
71 int nPrevNodeCount = 0;
72
73 loop
74 @@ -618,40 +593,19 @@
75 vector<CNode*> vNodesCopy = vNodes;
76 BOOST_FOREACH(CNode* pnode, vNodesCopy)
77 {
78 - if (pnode->fDisconnect ||
79 - (pnode->GetRefCount() <= 0 && pnode->vRecv.empty() && pnode->vSend.empty()))
80 + if (pnode->fDisconnect)
81 {
82 - // remove from vNodes
83 - vNodes.erase(remove(vNodes.begin(), vNodes.end(), pnode), vNodes.end());
84 -
85 // close socket and cleanup
86 pnode->CloseSocketDisconnect();
87 pnode->Cleanup();
88
89 - // hold in disconnected pool until all refs are released
90 - pnode->nReleaseTime = max(pnode->nReleaseTime, GetTime() + 15 * 60);
91 - if (pnode->fNetworkNode || pnode->fInbound)
92 - pnode->Release();
93 - vNodesDisconnected.push_back(pnode);
94 - }
95 - }
96 -
97 - // Delete disconnected nodes
98 - list<CNode*> vNodesDisconnectedCopy = vNodesDisconnected;
99 - BOOST_FOREACH(CNode* pnode, vNodesDisconnectedCopy)
100 - {
101 - // wait until threads are done using it
102 - if (pnode->GetRefCount() <= 0)
103 - {
104 - bool fDelete = false;
105 - TRY_CRITICAL_BLOCK(pnode->cs_vSend)
106 - TRY_CRITICAL_BLOCK(pnode->cs_vRecv)
107 - TRY_CRITICAL_BLOCK(pnode->cs_mapRequests)
108 - TRY_CRITICAL_BLOCK(pnode->cs_inventory)
109 - fDelete = true;
110 - if (fDelete)
111 + // is ThreadMessageHandler2() also done with this node?
112 + if (pnode->GetRefCount() <= 0)
113 {
114 - vNodesDisconnected.remove(pnode);
115 + // remove from vNodes
116 + vNodes.erase(
117 + remove(vNodes.begin(), vNodes.end(), pnode),
118 + vNodes.end());
119 delete pnode;
120 }
121 }
122 @@ -754,7 +708,6 @@
123 {
124 printf("accepted connection %s\n", addr.ToString().c_str());
125 CNode* pnode = new CNode(hSocket, addr, true);
126 - pnode->AddRef();
127 CRITICAL_BLOCK(cs_vNodes)
128 vNodes.push_back(pnode);
129 }
130 @@ -768,8 +721,6 @@
131 CRITICAL_BLOCK(cs_vNodes)
132 {
133 vNodesCopy = vNodes;
134 - BOOST_FOREACH(CNode* pnode, vNodesCopy)
135 - pnode->AddRef();
136 }
137 BOOST_FOREACH(CNode* pnode, vNodesCopy)
138 {
139 @@ -886,11 +837,6 @@
140 }
141 }
142 }
143 - CRITICAL_BLOCK(cs_vNodes)
144 - {
145 - BOOST_FOREACH(CNode* pnode, vNodesCopy)
146 - pnode->Release();
147 - }
148
149 Sleep(10);
150 }
151 @@ -1055,27 +1001,22 @@
152 }
153 }
154
155 -bool OpenNetworkConnection(const CAddress& addrConnect)
156 +void OpenNetworkConnection(const CAddress& addrConnect)
157 {
158 //
159 // Initiate outbound network connection
160 //
161 if (fShutdown)
162 - return false;
163 + return;
164 if (addrConnect.ip == addrLocalHost.ip || !addrConnect.IsIPv4() ||
165 FindNode(addrConnect.ip) || CNode::IsBanned(addrConnect.ip))
166 - return false;
167 + return;
168
169 vnThreadsRunning[1]--;
170 - CNode* pnode = ConnectNode(addrConnect);
171 + ConnectNode(addrConnect);
172 vnThreadsRunning[1]++;
173 if (fShutdown)
174 - return false;
175 - if (!pnode)
176 - return false;
177 - pnode->fNetworkNode = true;
178 -
179 - return true;
180 + return;
181 }
182
183
184 @@ -1113,9 +1054,14 @@
185 vector<CNode*> vNodesCopy;
186 CRITICAL_BLOCK(cs_vNodes)
187 {
188 - vNodesCopy = vNodes;
189 - BOOST_FOREACH(CNode* pnode, vNodesCopy)
190 - pnode->AddRef();
191 + BOOST_FOREACH(CNode* pnode, vNodes)
192 + {
193 + if (!pnode->fDisconnect)
194 + {
195 + pnode->AddRef(); // guard against premature CNode delete
196 + vNodesCopy.push_back(pnode);
197 + }
198 + }
199 }
200
201 // Poll the connected nodes for messages
202 @@ -1140,7 +1086,7 @@
203 CRITICAL_BLOCK(cs_vNodes)
204 {
205 BOOST_FOREACH(CNode* pnode, vNodesCopy)
206 - pnode->Release();
207 + pnode->Release(); // now safe to delete CNode, if must
208 }
209
210 // Wait and allow messages to bunch up.
211 diff -uNr a/bitcoin/src/net.h b/bitcoin/src/net.h
212 --- a/bitcoin/src/net.h 492c9cc92a504bb8174d75fafcbee6980986182a459efc9bfa1d64766320d98ba2fa971d78d00a777c6cc50f82a5d424997927378e99738b1b3b550bdaa727f7
213 +++ b/bitcoin/src/net.h 48c582f558747cae0d621146500651c3cdbf770154dec5b528e87f6340c08620cf02a705d6b3703d36e28deca5d62faa73b1eb9a7e9bc45c203091d7078ebcd7
214 @@ -31,7 +31,7 @@
215 bool AddAddress(CAddress addr, int64 nTimePenalty=0, CAddrDB *pAddrDB=NULL);
216 void AddressCurrentlyConnected(const CAddress& addr);
217 CNode* FindNode(unsigned int ip);
218 -CNode* ConnectNode(CAddress addrConnect, int64 nTimeout=0);
219 +void ConnectNode(CAddress addrConnect);
220 void AbandonRequests(void (*fn)(void*, CDataStream&), void* param1);
221 bool AnySubscribed(unsigned int nChannel);
222 void MapPort(bool fMapPort);
223 @@ -126,7 +126,6 @@
224 int nMisbehavior;
225
226 public:
227 - int64 nReleaseTime;
228 std::map<uint256, CRequestTracker> mapRequests;
229 CCriticalSection cs_mapRequests;
230 uint256 hashContinue;
231 @@ -176,7 +175,6 @@
232 fSuccessfullyConnected = false;
233 fDisconnect = false;
234 nRefCount = 0;
235 - nReleaseTime = 0;
236 hashContinue = 0;
237 nStartingHeight = -1;
238 fGetAddr = false;
239 @@ -205,16 +203,12 @@
240
241 int GetRefCount()
242 {
243 - return std::max(nRefCount, 0) + (GetTime() < nReleaseTime ? 1 : 0);
244 + return nRefCount;
245 }
246
247 - CNode* AddRef(int64 nTimeout=0)
248 + void AddRef()
249 {
250 - if (nTimeout != 0)
251 - nReleaseTime = std::max(nReleaseTime, GetTime() + nTimeout);
252 - else
253 - nRefCount++;
254 - return this;
255 + nRefCount++;
256 }
257
258 void Release()
259
  1. See bitcoin/src/net.h, declaration of class CNode []
  2. See: bitcoin/src/net.cpp []

2021-08-22

TRB Homework, Part N: Threads

Filed under: TRB — cgra @ 11:11:20

I was in the process of writing an explanation for a TRB patch. The changes in question touched TRB's concurrency mechanics, which I thought I had already properly taken into account, but then I realized this might not really have been the case. So it was time for some TRB homework: The Threads.

The summary of my study below is based mostly on these sources:

I also found Ada's approach to concurrency1 an interesting and contrasting read.

Anyway...

The next time I see multi-threaded code, I will remember these four things:

I. Compiler and CPU do not always follow the programmer-specified execution order

The compiler may optimize/reorder code, making it vastly different from what the programmer specified.

The CPU may also reorder code, again, making it different from what the programmer (and compiler!) specified.

The only guiding principle is that the reorganized program's effect must be exactly the same as what the programmer-specified variant would have produced. But only in the single-threaded case!

II. Nothing must be assumed about an individual thread's relative execution speed

Concurrently running threads' instructions interleave, but there's essentially no guarantee at all of how exactly. If point I above didn't exist, the only guarantee would be that each thread's instructions execute in the specified order, for that thread only.

III. Compiler and CPU memory barriers are required for thread synchronization

Compilers and CPUs provide2 memory "barriers" or "fences" for the programmer to control the said reordering mechanisms. These are markers, or machine instructions, placed in code where no reordering across that point should happen if it would change the running program's current memory state -- compared to what it would've been without any reordering.

In practice, certain implicit compiler barriers also exist, like an external call. The compiler cannot reliably predict the external function's side effects, so it must sync up with the memory state of the programmer-specified program once, before resuming its improv-reordering solo after the external call has returned.

IV. Usually, simple reads or writes of primitive data types are atomic

From multi-threaded programming's point of view, neither C++ nor GCC, AFAIK, guarantees atomic handling3 for any piece of data, not even for a bool.

CPU manufacturers, on the other hand, specify4 which of their CPUs' memory operations are atomic. In x86_64 CPUs, for example, reads and writes up to 64-bit are atomic, if properly aligned.

In practice, handling of the "usual primitive" C++ variables5 just happens to be atomic for each individual read or write. And only for that. For example, incrementing a variable is likely not atomic, because it's not a simple assignment or a value read.
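To make that last point concrete, here's a small demonstration of my own (not from any of the sources above): two threads bump the same plain counter, and because each increment is a separate read, add, and write, the threads regularly overwrite each other's updates, so the final value usually falls short of the expected total. Note that volatile here only keeps the accesses in memory; it does not make them atomic.

#include <pthread.h>
#include <stdio.h>

// Two threads increment the same plain counter. Each increment is a
// separate read, add and write, so updates from the two threads can
// interleave and overwrite each other. volatile only keeps the accesses
// in memory -- it does NOT make them atomic.
volatile unsigned int counter = 0;

void* bump(void*)
{
    for (int i = 0; i < 1000000; i++)
        counter = counter + 1;   // read counter; add 1; write it back
    return NULL;
}

int main()
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("expected 2000000, got %u\n", counter);
    return 0;
}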

A couple of practical experiments

A C++ piece, compiled with -O2 optimization:

#include <stdio.h>

int main()
{
    unsigned int cnt = 0;
    for (int i=0; i<0xFFFFFF; i++)
        cnt++;
    printf("%d\n", cnt);
    return 0;
}

...and it could result in the following machine code, as it did in my case:

...
0000000000400450 <main>:
  400450:   48 83 ec 08             sub    $0x8,%rsp
  400454:   ba ff ff ff 00          mov    $0xffffff,%edx
  400459:   be f4 05 40 00          mov    $0x4005f4,%esi
  40045e:   bf 01 00 00 00          mov    $0x1,%edi
  400463:   31 c0                   xor    %eax,%eax
  400465:   e8 c6 ff ff ff          callq  400430 <__printf_chk@plt>
...

Notice that there's no loop at all! Here the compiler is not merely reordering, but entirely omitting -- the point I's compiler in full swing.
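For contrast, a variation of my own: if cnt is declared volatile, the compiler is obliged to actually perform every single read and write of it, so the loop survives -O2 intact.

#include <stdio.h>

int main()
{
    // volatile: every read and write of cnt must actually be performed,
    // so the compiler can no longer precompute the result and drop the loop.
    volatile unsigned int cnt = 0;
    for (int i=0; i<0xFFFFFF; i++)
        cnt++;
    printf("%d\n", cnt);
    return 0;
}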

I also tried out a CPU reordering demonstration, which I copied with some changes, below:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

// Set one of these to 1 to prevent CPU reordering
#define USE_CPU_FENCE 0
#define USE_SINGLE_HW_THREAD 0

#if USE_SINGLE_HW_THREAD
#include <sched.h>
#endif


sem_t beginSema1, beginSema2, endSema;
int X, Y, r1, r2;

inline void fence()
{
#if USE_CPU_FENCE
    // Prevent CPU (and compiler) reordering across this barrier/fence
    asm volatile("mfence" ::: "memory");
#else
    // Prevent just the compiler reordering across this barrier/fence
    asm volatile("" ::: "memory");
#endif
}

void *thread1Func(void *param)
{
    for (;;)
    {
        sem_wait(&beginSema1);  // Wait for signal

        // ----- THE TRANSACTION! -----
        X = 1;
        fence();
        r1 = Y;

        sem_post(&endSema);  // Notify transaction complete
    }
};

void *thread2Func(void *param)
{
    for (;;)
    {
        sem_wait(&beginSema2);  // Wait for signal

        // ----- THE TRANSACTION! -----
        Y = 1;
        fence();
        r2 = X;

        sem_post(&endSema);  // Notify transaction complete
    }
};

int main()
{
    // Initialize the semaphores
    sem_init(&beginSema1, 0, 0);
    sem_init(&beginSema2, 0, 0);
    sem_init(&endSema, 0, 0);

    // Spawn the threads
    pthread_t thread1, thread2;
    pthread_create(&thread1, NULL, thread1Func, NULL);
    pthread_create(&thread2, NULL, thread2Func, NULL);

#if USE_SINGLE_HW_THREAD
    // Force thread affinities to the same cpu core.
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);
    pthread_setaffinity_np(thread1, sizeof(cpu_set_t), &cpus);
    pthread_setaffinity_np(thread2, sizeof(cpu_set_t), &cpus);
#endif

    // Repeat the experiment ad infinitum
    int detected = 0;
    for (int iterations = 1; ; iterations++)
    {
        // Reset X and Y
        X = 0;
        Y = 0;
        // Signal both threads
        sem_post(&beginSema1);
        sem_post(&beginSema2);

        // Wait for both threads
        sem_wait(&endSema);
        sem_wait(&endSema);

        // Check if there was a simultaneous reorder
        if (r1 == 0 && r2 == 0)
        {
            detected++;
            printf("%d reorders detected after %d iterations\n",
                detected, iterations);
        }
    }
    return 0;  // Never returns
}

On my machine, it behaved exactly as advertised. I also removed the randomization6, and it still behaved the same. So, again, the point I's CPU in full swing.

TRB locks

Underneath TRB's critical section construct is the Linux libc pthread library. When a critical section starts, pthread_mutex_lock() is executed. Equally, when a critical section ends, pthread_mutex_unlock() is called. Also, a pthread_mutex_trylock() variation exists.
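For orientation, here is a rough analogue of such a critical-section construct -- a sketch only, not TRB's actual CCriticalSection/CRITICAL_BLOCK machinery -- where a scope-bound guard object calls pthread_mutex_lock() on construction and pthread_mutex_unlock() when the scope closes:

#include <pthread.h>
#include <stdio.h>

// Rough analogue of a critical-section guard: lock the mutex when the
// guard is constructed, unlock it when the guard goes out of scope, so
// every exit path out of the block unlocks.
class ScopedLock
{
    pthread_mutex_t& mutex;
public:
    ScopedLock(pthread_mutex_t& m) : mutex(m) { pthread_mutex_lock(&mutex); }
    ~ScopedLock()                             { pthread_mutex_unlock(&mutex); }
};

pthread_mutex_t cs_example = PTHREAD_MUTEX_INITIALIZER;
int nShared = 0;

int main()
{
    {   // roughly what a CRITICAL_BLOCK(cs_example) { ... } boils down to
        ScopedLock lock(cs_example);
        nShared++;
    }   // <-- pthread_mutex_unlock() runs here
    printf("%d\n", nShared);
    return 0;
}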

I dug into TRB's musl libc, looking for clues as to whether pthread_mutex_lock() or pthread_mutex_unlock() include a CPU memory barrier, and what such a barrier would look like. I found the Linux kernel "futex" syscall, which I dug into further, but I could only find suggestive material and couldn't properly verify the case. The closest I got was that the futex syscall, for x86_64, may contain CPU instructions like:

  • LOCK XCHGL
  • LOCK XADDL
  • LOCK ORL
  • LOCK ANDL
  • LOCK XORL

Because the pthread lock calls are external calls, that alone likely makes them implicit compiler memory barriers.

Finally

Another type of concurrency issue...

  1. See "Tasks and Synchronization" []
  2. Or at least, should provide []
  3. Atomicity matters in that sense, that, for example, you only want to read either: 1) an old value of an integer, or 2) a new value, but not 3) a whatever-bit-mixture of both, because the new value's write was in-progress at the moment of the read. []
  4. Again, AFAIK. []
  5. bool, char, int, long: signed/unsigned; float, double []
  6. Shown in the original version []
