Carlos, Dave,

This patch hasn't been fully discussed (or merged) yet.
I've attached the last version of the patch from Carlos, so that it gets
archived in Kyle's Patchwork as well :-)

My personal opinion is that we should try to reduce the number of
clobbered registers (which is in line with what Dave said below).

Thread is here:
http://marc.info/?t=121612540800004&r=1&w=2

Helge

John David Anglin wrote:
>> The question is "Are you OK with the existing ABI?" :-)
>
> No. As I understand it, r2 doesn't need to be clobbered because
> glibc doesn't currently clobber it. So, using it in the LWS code
> would cause an ABI break. That's one register back to userspace.
>
> I want to keep r19 and r27 for userspace so the PIC register doesn't
> have to be saved and restored in the asm (linux-atomic.c is compiled
> as PIC code). You can have r29.
>
> That leaves three free registers for the LWS code: r22, r23 and r29.
> The LWS ABI has r1, r20-r26 and r28-r31. Userspace has two call-clobbered
> registers free across the asm in PIC code, and three in non-PIC code.
> That's enough to efficiently perform the error comparisons.
>
> The asm would be more efficient if the registers used for lws_mem,
> lws_old and lws_new were not written to. This occurs only for the
> call in the 32-bit runtime with a 64-bit kernel. As it stands,
> the lws_mem, lws_old and lws_new arguments get reloaded every time
> around the EAGAIN loop. This is the crucial code in the compare
> and swap:
>
> 	/* The load and store could fail */
> 1:	ldw	0(%sr3,%r26), %r28
> 	sub,<>	%r28, %r25, %r0
> 2:	stw	%r24, 0(%sr3,%r26)
>
> The sub,<> instruction uses a 32-bit compare/subtract condition, so
> the clipping of r25 isn't necessary. Similarly, the stw instruction
> ignores the most significant 32-bits of r24. The value in r26 needs
> clipping but you have three free registers, and it looks like r1 is
> also free at this point in the code. You can deposit the least
> significant 32-bits of r26 into a field of zeros in another register
> in one instruction.
>
> It looks like lws_compare_and_swap64 and lws_compare_and_swap32 become
> more or less functionally identical. The above would become something
> like:
>
> #ifdef CONFIG_64BIT
> 	depd,z	%r26,63,32,%r1
> 1:	ldw	0(%sr3,%r1), %r28
> 	sub,<>	%r28, %r25, %r0
> 2:	stw	%r24, 0(%sr3,%r1)
> #else
> 1:	ldw	0(%sr3,%r26), %r28
> 	sub,<>	%r28, %r25, %r0
> 2:	stw	%r24, 0(%sr3,%r26)
> #endif
>
> The argument clipping in the current code would be removed. As a result,
> the branch to lws_compare_and_swap can be eliminated in the 64-bit path.
>
> It's my impression that the tightness of the loop for the compare/exchange
> operation is important.
>
> Dave
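
For anyone less familiar with PA-RISC, the clipping and the compare/exchange step discussed above can be modelled in C roughly as follows. This is only an illustrative sketch under my own assumptions: clip_low32 and cas32_model are hypothetical names that exist neither in the kernel nor in glibc, and the model is not atomic by itself (in the real LWS path the kernel is what provides the atomicity).

```c
#include <stdint.h>

/*
 * Hypothetical model of "depd,z %r26,63,32,%r1":
 * copy the least significant 32 bits of the source register into a
 * field of zeros, so the upper 32 bits of the result are cleared.
 */
static uint64_t clip_low32(uint64_t reg)
{
	return reg & 0xFFFFFFFFull;
}

/*
 * Hypothetical model of the ldw / sub,<> / stw sequence.
 * The load reads the current word; the store only happens when the
 * loaded value equals "old" (sub,<> nullifies the stw otherwise).
 * Returns the value that was loaded, like %r28 in the asm.
 * NOT atomic: this only illustrates the data flow.
 */
static uint32_t cas32_model(uint32_t *mem, uint32_t old, uint32_t new_val)
{
	uint32_t prev = *mem;		/* 1: ldw 0(%sr3,%r26), %r28 */

	if (prev == old)		/*    sub,<> %r28, %r25, %r0 */
		*mem = new_val;		/* 2: stw %r24, 0(%sr3,%r26) */
	return prev;
}
```

Only the address needs clip_low32 in this model; the 32-bit compare and the 32-bit store ignore the upper halves of their operands, which matches Dave's point that r25 and r24 need no clipping.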