I am wondering why Pelles C uses the LOCK CMPXCHG instruction instead of LOCK XADD to implement atomic_fetch_add_explicit.
Using CMPXCHG is inefficient and completely unnecessary for fetch-and-add. A CMPXCHG-based implementation needs a retry loop and is only lock-free, while LOCK XADD is a single instruction with no loop and is wait-free.
I am baffled by this decision. Fwiw, here is a little test program I wrote:
____________________
#include <stdio.h>
#include <threads.h>
#include <stdatomic.h>

static atomic_int g_count = 0;

static int my_thread(void* arg)
{
    printf("my_thread(%p)\n", arg);
    atomic_fetch_add_explicit(&g_count, 1, memory_order_relaxed);
    return 1234;
}

int main(void)
{
    thrd_t t;
    printf("ATOMIC_INT_LOCK_FREE = %d\n", ATOMIC_INT_LOCK_FREE);
    thrd_create(&t, my_thread, NULL);
    int ret = 0;
    thrd_join(t, &ret);
    printf("my_thread ret:%d, g_count:%d\n", ret, g_count);
    return 0;
}
____________________
I looked at the disassembly and was shocked to see LOCK CMPXCHG. Damn.
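
For reference, here is a rough sketch in portable C11 of the two approaches I am comparing. The helper names fetch_add_cas and fetch_add_direct are made up for illustration, and which instruction a given compiler emits for each one is an assumption on my part, not something the C standard guarantees.

```c
#include <stdatomic.h>

/* Fetch-and-add built on compare-and-swap. On x86 this shape typically
   becomes a LOCK CMPXCHG retry loop: lock-free, but a thread can in
   principle keep losing the race and retrying. */
static int fetch_add_cas(atomic_int *obj, int delta)
{
    int old = atomic_load_explicit(obj, memory_order_relaxed);

    /* On failure, 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(
               obj, &old, old + delta,
               memory_order_relaxed, memory_order_relaxed)) {
        /* empty: the retry happens in the loop condition */
    }
    return old; /* value before the addition */
}

/* Direct fetch-and-add. A compiler targeting x86 can emit a single
   LOCK XADD here, with no loop at all. */
static int fetch_add_direct(atomic_int *obj, int delta)
{
    return atomic_fetch_add_explicit(obj, delta, memory_order_relaxed);
}
```

Both functions return the previous value and leave the same result in the counter. The only difference is whether a retry loop is needed to get there.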