From mboxrd@z Thu Jan 1 00:00:00 1970 From: trd@45mercystreet.com (Toby Douglass) Date: Sat, 21 Nov 2009 16:21:00 +0100 Subject: CAS implementation may be broken In-Reply-To: <4AF1FBA4.4070307@xenomai.org> References: <4AF1C361.8090405@45mercystreet.com> <20091104190544.GA518@n2100.arm.linux.org.uk> <4AF1FBA4.4070307@xenomai.org> Message-ID: <4B08055C.3000408@45mercystreet.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi all - I may be *utterly* wrong, and I'm expecting that someone here is simply going to look at what I'm about to say and explain to me what I've misunderstood, but I suspect it may be that atomic compare-and-swap is incorrectly implemented in the Linux kernel on ARM v6 and above (e.g. using ldrex and strex). I'm looking at this code, for 2.6.31, which I believe is the latest released version; http://lxr.linux.no/#linux+v2.6.31/arch/arm/include/asm/system.h 382 do { 383 asm volatile("@ __cmpxchg4\n" 384 " ldrex %1, [%2]\n" 385 " mov %0, #0\n" 386 " teq %1, %3\n" 387 " strexeq %0, %4, [%2]\n" 388 : "=&r" (res), "=&r" (oldval) 389 : "r" (ptr), "Ir" (old), "r" (new) 390 : "memory", "cc"); 391 } while (res); This code is for a four byte CAS; the issue I'm about to describe exists similarly for the two byte and one byte CASs. The issue is the ABA problem. Imagine you have a populated stack. Two threads come to pop. Both read the head pointer and the next pointer of the top element and both simultaneously come to the point where the are about to CAS, comparing the head pointer with the top element pointer and if they match, replacing the head pointer with the next pointer of the top element - so by this, I mean both threads have just got into the do-while loop above, but have not yet executed ldrex. Now, one thread pauses for a long time. The other thread pops. Then a bunch of the other threads pop and push and the stack is *totally* different. Then our initial thread which popped *pushes* that original element back. Finally, our second thread kicks into life. Remember that his exchange pointer is now utterly incorrect, because the stack has completely changed. Now, he's just inside the do-while loop. He checks the top pointer - and lo and behold, it matches the head element. However, because someone else has messed with the top pointer in the meantime, the CAS fails - strex refuses to swap, it can see from the shared TLB attribute that someone else has touched destination (the top pointer). So we dodge ABA *there*. The problem is *we then come round in the do-while loop again*. We have *not* updated our exchange value. So THIS second time around, we *repeat* our strex and we DO swap - and we just swapped in completely the wrong next pointer, from way back before the stack was totally changed by all the other threads popping and pushing. So that's the problem. A common way to solve ABA on contiguous double-word compare-and-swap architectures (e.g. x86, x64 and Itanium) is to use a pointer-counter pair. This permits the caller to ensure swaps fail, even if the pointers all match, because the counters don't. Load-linked/conditional-store architectures solve ABA by having the store fail if the destination has been touched since the load was performed. Currently, the code appears to violate this, by repeating the CAS *anyway*. In fact, the appropriate behaviour would seem to be *not* to loop, but rather, to issue the ldrex/strex *once*, and indicate to the user if the store succeed or failed. This requires a prototype change, because the return value is the original value of the destination and so is unable to indicate, when returning that value, if it is returned from a successful or unsuccessful swap.