Re: x86-32: clean up rwsem inline asm statements

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: x86-32: clean up rwsem inline asm statements
@ 2010-01-13 19:58 George Spelvin
  2010-01-13 21:04 ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: George Spelvin @ 2010-01-13 19:58 UTC (permalink / raw)
  To: hpa; +Cc: linux, linux-kernel, schwab, torvalds

> As far as I can tell, very few of these assembly statements actually
> need a size at all -- the very first inc statement is purely to memory,
> and as such it needs a size marker, but almost any operation which is
> register-register or register-memory will simply take its size from the
> register operand.  For those, it seems cleaner to simply drop the size
> suffix, and in fact this is the style we have been pushing people
> towards (use the suffix where mandatory or where the size is fixed
> anyway, to help catch bugs; use no suffix where the size can vary and is
> implied by the operands.)

The one thing is that for a register-memory operation, using the
size of the memory operand can catch bugs where it doesn't match
the size of the register operand.

GCC's inline asm doesn't make operand size very implicit, and it's
awkward to cast output operands, so there's a potential for bugs.
I especially get nervous when the operand itself is an immediate
constant, as I can't remember the ANSI rules for the type very well.
(Quick: is 0x80000000 an unsigned 32-bit int or a signed 64-bit one?
What about 2147483648 or 1<<31?)

Since this is mostly inline functions, it's not so big a problem, but
I'd consider something like:

static inline void __up_write(struct rw_semaphore *sem)
{
       unsigned long tmp;
	asm volatile("# beginning __up_write\n\t"
		     LOCK_PREFIX "  xadd%z0	%1,(%2)\n\t"
		     /* tries to transition
			0xffff0001 -> 0x00000000 */
		     "  jz	1f\n"
		     "  call call_rwsem_wake\n"
		     "1:\n\t"
		     "# ending __up_write\n"
       	     : "+m" (sem->count), "=d" (tmp)
       	     : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
       	     : "memory", "cc");
}

Just in case the size of -RWSEM_ACTIVE_WRITE_BIAS doesn't match that
of sem->count.  It'll explode when you try to run it, of course, but
there's something to be said for compile-time errors.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-13 19:58 x86-32: clean up rwsem inline asm statements George Spelvin
@ 2010-01-13 21:04 ` H. Peter Anvin
  2010-01-14  0:27   ` George Spelvin
  2010-01-14  1:05   ` Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: H. Peter Anvin @ 2010-01-13 21:04 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-kernel, schwab, torvalds

On 01/13/2010 11:58 AM, George Spelvin wrote:
>> As far as I can tell, very few of these assembly statements actually
>> need a size at all -- the very first inc statement is purely to memory,
>> and as such it needs a size marker, but almost any operation which is
>> register-register or register-memory will simply take its size from the
>> register operand.  For those, it seems cleaner to simply drop the size
>> suffix, and in fact this is the style we have been pushing people
>> towards (use the suffix where mandatory or where the size is fixed
>> anyway, to help catch bugs; use no suffix where the size can vary and is
>> implied by the operands.)
> 
> The one thing is that for a register-memory operation, using the
> size of the memory operand can catch bugs where it doesn't match
> the size of the register operand.
> 
> GCC's inline asm doesn't make operand size very implicit, and it's
> awkward to cast output operands, so there's a potential for bugs.
> I especially get nervous when the operand itself is an immediate
> constant, as I can't remember the ANSI rules for the type very well.
> (Quick: is 0x80000000 an unsigned 32-bit int or a signed 64-bit one?
> What about 2147483648 or 1<<31?)
> 
> Since this is mostly inline functions, it's not so big a problem, but
> I'd consider something like:
> 
> static inline void __up_write(struct rw_semaphore *sem)
> {
>        unsigned long tmp;
> 	asm volatile("# beginning __up_write\n\t"
> 		     LOCK_PREFIX "  xadd%z0	%1,(%2)\n\t"
> 		     /* tries to transition
> 			0xffff0001 -> 0x00000000 */
> 		     "  jz	1f\n"
> 		     "  call call_rwsem_wake\n"
> 		     "1:\n\t"
> 		     "# ending __up_write\n"
>        	     : "+m" (sem->count), "=d" (tmp)
>        	     : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
>        	     : "memory", "cc");
> }
> 
> Just in case the size of -RWSEM_ACTIVE_WRITE_BIAS doesn't match that
> of sem->count.  It'll explode when you try to run it, of course, but
> there's something to be said for compile-time errors.

There are a number of things that can be done better... for one thing,
"+m" (sem->count) and "a" (sem) is just bloody wrong.  The right thing
would be "a" (&sem->count) for proper robustness.

There is no real point in being concerned about the type of immediates,
because the immediate type isn't really used... it shows up as a literal
in the assembly language.  However, if you're really concerned, the
right thing to do is to do a cast in C, not playing games with the assembly.

	-hpa

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-13 21:04 ` H. Peter Anvin
@ 2010-01-14  0:27   ` George Spelvin
  2010-01-14  0:39     ` H. Peter Anvin
  2010-01-14  1:05   ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: George Spelvin @ 2010-01-14  0:27 UTC (permalink / raw)
  To: hpa, linux; +Cc: linux-kernel, schwab, torvalds

> There are a number of things that can be done better... for one thing,
> "+m" (sem->count) and "a" (sem) is just bloody wrong.  The right thing
> would be "a" (&sem->count) for proper robustness.

Actually, no.  The "+m" (sem->count) is telling GCC that sem->count is
updated; "a" (&sem->count) does *not* tell it to invalidate cached
copies of sem->count that it may have lying around.

However, we can't just use "+m" (sem->count) because GCC has a poor
grasp on the concept of atomic operations; as far as it is concerned,
it is exactly equivalent to copying the value into a stack slot, do the
operation there, and copy it back.

(It's much more likely to do that with "g" operand constraints, but it has
been known to point to a stack copy that it's made for some other reason.)

The current situation is, as far as I can remember the previous discussion
on the subject, the simplest way to explain to GCC just what it needs to do.

> There is no real point in being concerned about the type of immediates,
> because the immediate type isn't really used... it shows up as a literal
> in the assembly language.  However, if you're really concerned, the
> right thing to do is to do a cast in C, not playing games with the assembly.

Ah, right, sorry; I remembered having this problem, but it was with
register operands.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-14  0:27   ` George Spelvin
@ 2010-01-14  0:39     ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2010-01-14  0:39 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-kernel, schwab, torvalds

On 01/13/2010 04:27 PM, George Spelvin wrote:
>> There are a number of things that can be done better... for one thing,
>> "+m" (sem->count) and "a" (sem) is just bloody wrong.  The right thing
>> would be "a" (&sem->count) for proper robustness.
> 
> Actually, no.  The "+m" (sem->count) is telling GCC that sem->count is
> updated; "a" (&sem->count) does *not* tell it to invalidate cached
> copies of sem->count that it may have lying around.
> 
> However, we can't just use "+m" (sem->count) because GCC has a poor
> grasp on the concept of atomic operations; as far as it is concerned,
> it is exactly equivalent to copying the value into a stack slot, do the
> operation there, and copy it back.

You completely missed the point.

The reason we need both "+m" (sem->count) and "a" (sem) is because we
need the address of the semaphore in %eax in case we hit the slow path.
 They're actually orthogonal requirements, and we could implement them as:

	LOCK_PREFIX xadd %1,%0
	jz 1f
	call call_rwsem_wake
1:

	: "+m" (sem->count), "=d" (tmp)
       	: "a" (sem)

... just fine, and arguably that would be more correct in the sense that
if sem->count isn't the first member of *sem, then we still pass the
address of sem in %eax to call_rwsem_wake.  In practice, with the
zero-offset sem->count, gcc will typically generate (%eax) as the
argument since it has it available anyway, but it wouldn't have to.

The code as it currently is:

	LOCK_PREFIX xadd %1,(%2)
	jz 1f
	call call_rwsem_wake
1:

	: "+m" (sem->count), "=d" (tmp)
       	: "a" (sem)

... implicitly assumes the offset of "count" is zero but doesn't enforce
it in any way.  My comment was that since the assumption is clearly that
arguments 0 and 2 point to the same memory object, it should be
(&sem->count) in the second case, but that's actually not really the
"proper" fix because call_rwsem_wake presumably expects the value of sem.

	-hpa

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-13 21:04 ` H. Peter Anvin
  2010-01-14  0:27   ` George Spelvin
@ 2010-01-14  1:05   ` Linus Torvalds
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2010-01-14  1:05 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: George Spelvin, linux-kernel, schwab

On Wed, 13 Jan 2010, H. Peter Anvin wrote:
> 
> There are a number of things that can be done better... for one thing,
> "+m" (sem->count) and "a" (sem) is just bloody wrong.  The right thing
> would be "a" (&sem->count) for proper robustness.

Actually, no.

Strictly speaking, we should use "a" (sem), and then use '%0' (pointing to 
the "+m" (sem->count)) for the actual memory access in the inline asm, 
rather than '(%1)'.

We do need %eax to contain the pointer to the semaphore, because _that_ is 
what we pass in as an argument (and return as a value!) to the rwsem slow 
paths.

So "a" (sem) is absolutely the right thing to do.

The reason we use "(%1)" rather than "%0" is - if I recall correctly - 
that back when we inlined it, gcc would often stupidly use the original 
value of the semaphore address rather than %eax (which obviously also 
contained it), and it generated larger code with big constants etc.

Now, since we only inline it in one place anyway, and the semaphore is 
always an argument to that inlining site anyway, I don't think it matters 
(it's never going to be some global pointer), and we probably could/should 
just do this right. So instead of

	LOCK_PREFIX "  inc%z0      (%1)\n\t"

we'd probably be better off with just

	LOCK_PREFIX "  inc%z0 %0\n\t"

instead, letting gcc generate the memory pointer to "sem->count" itself.

		Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* x86-32: clean up rwsem inline asm statements
@ 2010-01-13  0:21 Linus Torvalds
  2010-01-13  0:45 ` Andreas Schwab
  2010-01-13  0:48 ` H. Peter Anvin
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2010-01-13  0:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Anvin, Thomas Gleixner; +Cc: Linux Kernel Mailing List


This makes gcc use the right register names and instruction operand sizes 
automatically for the rwsem inline asm statements.

So instead of using "(%%eax)" to specify the memory address that is the 
semaphore, we use "(%1)" or similar. And instead of forcing the operation 
to always be 32-bit, we use "%z0", taking the size from the actual 
semaphore data structure itself.

This doesn't actually matter on x86-32, but if we want to use the same 
inline asm for x86-64, we'll need to have the compiler generate the proper 
64-bit names for the registers (%rax instead of %eax), and if we want to 
use a 64-bit counter too (in order to avoid the 15-bit limit on the 
write counter that limits concurrent users to 32767 threads), we'll need 
to be able to generate instructions with "q" accesses rather than "l".

Since this header currently isn't enabled on x86-64, none of that matters, 
but we do want to use the xadd version of the semaphores rather than have 
to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended 
when you have lots of threads all taking page faults, and the fallback 
rwsem code that uses a spinlock performs abysmally badly in that case.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This really should be a no-op, but it does seem to do something to gcc so 
that when I do a "cmp" on the whole kernel before and after, it doesn't 
match. The differences _seem_ to be some code size issue that change all 
the offsets, it's a bit hard to figure it out from the binary.

Anyway, this is purely preparatory, but I think it cleans up the asm too, 
so...

		Linus

 arch/x86/include/asm/rwsem.h |   30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index ca7517d..4a884e0 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -105,7 +105,7 @@ do {								\
 static inline void __down_read(struct rw_semaphore *sem)
 {
 	asm volatile("# beginning down_read\n\t"
-		     LOCK_PREFIX "  incl      (%%eax)\n\t"
+		     LOCK_PREFIX "  inc%z0      (%1)\n\t"
 		     /* adds 0x00000001, returns the old value */
 		     "  jns        1f\n"
 		     "  call call_rwsem_down_read_failed\n"
@@ -123,10 +123,10 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
 	__s32 result, tmp;
 	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  movl      %0,%1\n\t"
+		     "  mov%z0      %0,%1\n\t"
 		     "1:\n\t"
-		     "  movl	     %1,%2\n\t"
-		     "  addl      %3,%2\n\t"
+		     "  mov%z0	    %1,%2\n\t"
+		     "  add%z0      %3,%2\n\t"
 		     "  jle	     2f\n\t"
 		     LOCK_PREFIX "  cmpxchgl  %2,%0\n\t"
 		     "  jnz	     1b\n\t"
@@ -147,9 +147,9 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 
 	tmp = RWSEM_ACTIVE_WRITE_BIAS;
 	asm volatile("# beginning down_write\n\t"
-		     LOCK_PREFIX "  xadd      %%edx,(%%eax)\n\t"
+		     LOCK_PREFIX "  xadd%z0     %1,(%2)\n\t"
 		     /* subtract 0x0000ffff, returns the old value */
-		     "  testl     %%edx,%%edx\n\t"
+		     "  test%z0     %1,%1\n\t"
 		     /* was the count 0 before? */
 		     "  jz        1f\n"
 		     "  call call_rwsem_down_write_failed\n"
@@ -185,7 +185,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	__s32 tmp = -RWSEM_ACTIVE_READ_BIAS;
 	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %%edx,(%%eax)\n\t"
+		     LOCK_PREFIX "  xadd%z0     %1,(%2)\n\t"
 		     /* subtracts 1, returns the old value */
 		     "  jns        1f\n\t"
 		     "  call call_rwsem_wake\n"
@@ -201,18 +201,18 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
+	unsigned long tmp;
 	asm volatile("# beginning __up_write\n\t"
-		     "  movl      %2,%%edx\n\t"
-		     LOCK_PREFIX "  xaddl     %%edx,(%%eax)\n\t"
+		     LOCK_PREFIX "  xadd%z0     %1,(%2)\n\t"
 		     /* tries to transition
 			0xffff0001 -> 0x00000000 */
 		     "  jz       1f\n"
 		     "  call call_rwsem_wake\n"
 		     "1:\n\t"
 		     "# ending __up_write\n"
-		     : "+m" (sem->count)
-		     : "a" (sem), "i" (-RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc", "edx");
+		     : "+m" (sem->count),"=d" (tmp)
+		     : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
+		     : "memory", "cc");
 }
 
 /*
@@ -221,7 +221,7 @@ static inline void __up_write(struct rw_semaphore *sem)
 static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX "  addl      %2,(%%eax)\n\t"
+		     LOCK_PREFIX "  add%z0     %2,(%1)\n\t"
 		     /* transitions 0xZZZZ0001 -> 0xYYYY0001 */
 		     "  jns       1f\n\t"
 		     "  call call_rwsem_downgrade_wake\n"
@@ -237,7 +237,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
 {
-	asm volatile(LOCK_PREFIX "addl %1,%0"
+	asm volatile(LOCK_PREFIX "add%z0 %1,%0"
 		     : "+m" (sem->count)
 		     : "ir" (delta));
 }
@@ -249,7 +249,7 @@ static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
 	int tmp = delta;
 
-	asm volatile(LOCK_PREFIX "xadd %0,%1"
+	asm volatile(LOCK_PREFIX "xadd%z0 %0,%1"
 		     : "+r" (tmp), "+m" (sem->count)
 		     : : "memory");
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-13  0:21 Linus Torvalds
@ 2010-01-13  0:45 ` Andreas Schwab
  2010-01-13  1:26   ` Linus Torvalds
  2010-01-13  0:48 ` H. Peter Anvin
  1 sibling, 1 reply; 10+ messages in thread
From: Andreas Schwab @ 2010-01-13  0:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Peter Anvin, Thomas Gleixner,
	Linux Kernel Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> @@ -249,7 +249,7 @@ static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
>  {
>  	int tmp = delta;
>  
> -	asm volatile(LOCK_PREFIX "xadd %0,%1"
> +	asm volatile(LOCK_PREFIX "xadd%z0 %0,%1"
>  		     : "+r" (tmp), "+m" (sem->count)
>  		     : : "memory");

I think %z0 should be %z1 here.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-13  0:45 ` Andreas Schwab
@ 2010-01-13  1:26   ` Linus Torvalds
  0 siblings, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2010-01-13  1:26 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Ingo Molnar, Peter Anvin, Thomas Gleixner,
	Linux Kernel Mailing List

On Wed, 13 Jan 2010, Andreas Schwab wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > @@ -249,7 +249,7 @@ static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
> >  {
> >  	int tmp = delta;
> >  
> > -	asm volatile(LOCK_PREFIX "xadd %0,%1"
> > +	asm volatile(LOCK_PREFIX "xadd%z0 %0,%1"
> >  		     : "+r" (tmp), "+m" (sem->count)
> >  		     : : "memory");
> 
> I think %z0 should be %z1 here.

You're right, but Peter made it all a non-issue anyway.

Thanks for noticing, though, since it implies that somebody read the patch 
carefully. It's always a bit scary to change fundamental inline asms, 
especially for something like a locking routine where the core _semantics_ 
are so subtle (and if you get locking wrong, things may still work - you 
just end up with some really odd race conditions).

		Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-13  0:21 Linus Torvalds
  2010-01-13  0:45 ` Andreas Schwab
@ 2010-01-13  0:48 ` H. Peter Anvin
  2010-01-13  0:59   ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2010-01-13  0:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1447 bytes --]

On 01/12/2010 04:21 PM, Linus Torvalds wrote:
> 
> So instead of using "(%%eax)" to specify the memory address that is the 
> semaphore, we use "(%1)" or similar. And instead of forcing the operation 
> to always be 32-bit, we use "%z0", taking the size from the actual 
> semaphore data structure itself.
> 
> This doesn't actually matter on x86-32, but if we want to use the same 
> inline asm for x86-64, we'll need to have the compiler generate the proper 
> 64-bit names for the registers (%rax instead of %eax), and if we want to 
> use a 64-bit counter too (in order to avoid the 15-bit limit on the 
> write counter that limits concurrent users to 32767 threads), we'll need 
> to be able to generate instructions with "q" accesses rather than "l".
> 

As far as I can tell, very few of these assembly statements actually
need a size at all -- the very first inc statement is purely to memory,
and as such it needs a size marker, but almost any operation which is
register-register or register-memory will simply take its size from the
register operand.  For those, it seems cleaner to simply drop the size
suffix, and in fact this is the style we have been pushing people
towards (use the suffix where mandatory or where the size is fixed
anyway, to help catch bugs; use no suffix where the size can vary and is
implied by the operands.)

So, proposed alternate version of your patch attached.

(Also changed cmpxchgl -> cmpxchg.)

	-hpa

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 3474 bytes --]

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index ca7517d..4136200 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -105,7 +105,7 @@ do {								\
 static inline void __down_read(struct rw_semaphore *sem)
 {
 	asm volatile("# beginning down_read\n\t"
-		     LOCK_PREFIX "  incl      (%%eax)\n\t"
+		     LOCK_PREFIX "  inc%z0      (%1)\n\t"
 		     /* adds 0x00000001, returns the old value */
 		     "  jns        1f\n"
 		     "  call call_rwsem_down_read_failed\n"
@@ -123,12 +123,12 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
 	__s32 result, tmp;
 	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  movl      %0,%1\n\t"
+		     "  mov          %0,%1\n\t"
 		     "1:\n\t"
-		     "  movl	     %1,%2\n\t"
-		     "  addl      %3,%2\n\t"
+		     "  mov          %1,%2\n\t"
+		     "  add          %3,%2\n\t"
 		     "  jle	     2f\n\t"
-		     LOCK_PREFIX "  cmpxchgl  %2,%0\n\t"
+		     LOCK_PREFIX "  cmpxchg  %2,%0\n\t"
 		     "  jnz	     1b\n\t"
 		     "2:\n\t"
 		     "# ending __down_read_trylock\n\t"
@@ -147,9 +147,9 @@ static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
 
 	tmp = RWSEM_ACTIVE_WRITE_BIAS;
 	asm volatile("# beginning down_write\n\t"
-		     LOCK_PREFIX "  xadd      %%edx,(%%eax)\n\t"
+		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
 		     /* subtract 0x0000ffff, returns the old value */
-		     "  testl     %%edx,%%edx\n\t"
+		     "  test      %1,%1\n\t"
 		     /* was the count 0 before? */
 		     "  jz        1f\n"
 		     "  call call_rwsem_down_write_failed\n"
@@ -185,7 +185,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	__s32 tmp = -RWSEM_ACTIVE_READ_BIAS;
 	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %%edx,(%%eax)\n\t"
+		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
 		     /* subtracts 1, returns the old value */
 		     "  jns        1f\n\t"
 		     "  call call_rwsem_wake\n"
@@ -201,18 +201,18 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
+	unsigned long tmp;
 	asm volatile("# beginning __up_write\n\t"
-		     "  movl      %2,%%edx\n\t"
-		     LOCK_PREFIX "  xaddl     %%edx,(%%eax)\n\t"
+		     LOCK_PREFIX "  xadd      %1,(%2)\n\t"
 		     /* tries to transition
 			0xffff0001 -> 0x00000000 */
 		     "  jz       1f\n"
 		     "  call call_rwsem_wake\n"
 		     "1:\n\t"
 		     "# ending __up_write\n"
-		     : "+m" (sem->count)
-		     : "a" (sem), "i" (-RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc", "edx");
+		     : "+m" (sem->count), "=d" (tmp)
+		     : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
+		     : "memory", "cc");
 }
 
 /*
@@ -221,7 +221,7 @@ static inline void __up_write(struct rw_semaphore *sem)
 static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX "  addl      %2,(%%eax)\n\t"
+		     LOCK_PREFIX "  add%z0    %2,(%1)\n\t"
 		     /* transitions 0xZZZZ0001 -> 0xYYYY0001 */
 		     "  jns       1f\n\t"
 		     "  call call_rwsem_downgrade_wake\n"
@@ -237,7 +237,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
  */
 static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
 {
-	asm volatile(LOCK_PREFIX "addl %1,%0"
+	asm volatile(LOCK_PREFIX "add%z0 %1,%0"
 		     : "+m" (sem->count)
 		     : "ir" (delta));
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: x86-32: clean up rwsem inline asm statements
  2010-01-13  0:48 ` H. Peter Anvin
@ 2010-01-13  0:59   ` Linus Torvalds
  0 siblings, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2010-01-13  0:59 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List



On Tue, 12 Jan 2010, H. Peter Anvin wrote:
> 
> As far as I can tell, very few of these assembly statements actually
> need a size at all -- the very first inc statement is purely to memory,
> and as such it needs a size marker, but almost any operation which is
> register-register or register-memory will simply take its size from the
> register operand.  For those, it seems cleaner to simply drop the size
> suffix, and in fact this is the style we have been pushing people
> towards (use the suffix where mandatory or where the size is fixed
> anyway, to help catch bugs; use no suffix where the size can vary and is
> implied by the operands.)
> 
> So, proposed alternate version of your patch attached.

Looks good to me.

		Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-01-14  1:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-13 19:58 x86-32: clean up rwsem inline asm statements George Spelvin
2010-01-13 21:04 ` H. Peter Anvin
2010-01-14  0:27   ` George Spelvin
2010-01-14  0:39     ` H. Peter Anvin
2010-01-14  1:05   ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2010-01-13  0:21 Linus Torvalds
2010-01-13  0:45 ` Andreas Schwab
2010-01-13  1:26   ` Linus Torvalds
2010-01-13  0:48 ` H. Peter Anvin
2010-01-13  0:59   ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox