public inbox for linux-kernel@vger.kernel.org
* [RFC patch] cmpxchg_double: remove local variables to get better performance
@ 2012-03-02  8:31 Alex Shi
From: Alex Shi @ 2012-03-02  8:31 UTC (permalink / raw)
  To: tglx, hpa@zytor.com, mingo@redhat.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, jeremy, jbeulich
  Cc: Andi Kleen, asit.k.mallick@intel.com

The cmpxchg_double macro uses some local variables. They seem to be
there to force-cast the input values to the type of '*p1' (and '*p2').
Maybe there is some reason for this that I don't know, but I see 2
problems here:

1, A user may misuse the macro, e.g. pass a 'long' o1 but use an
'int *' or 'char *' p1.
If we remove the forced cast here, gcc will check for such misuse at
compile time, and the user will get an error report for it when
compiling.

2, The local variables increase the data section and bring extra memory
bus accesses, which hurts performance in this critical macro.
I did a little experiment on my Nehalem (nhm) i7 desktop, running the
macro a fixed number of times. Here is the data:
                        using local vars     no local variables
with lock prefix          267700578 ns         232079696 ns
without lock prefix        34715666 ns          34687566 ns

So we may need to rethink the local variable usage here.
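
The concern in point 1 can be illustrated in user space. The sketch below
is not the kernel macro: `matches_with_local` and `matches_direct` are
hypothetical stand-ins that only compare a value, and the code assumes GCC
or clang, since it relies on statement expressions and `__typeof__`. It
shows how a local copy typed after '*p' silently converts a wider argument.

```c
/*
 * Hypothetical stand-ins, not the kernel macro. Requires GCC or clang
 * (statement expressions and __typeof__ are GNU extensions).
 */

/* Local-variable style: o is first converted to the type of *p. */
#define matches_with_local(p, o)		\
({						\
	__typeof__(*(p)) __old = (o);		\
	*(p) == __old;				\
})

/* No local copy: o keeps its own type in the comparison. */
#define matches_direct(p, o) (*(p) == (o))

/* A 'long' expected value checked against an 'unsigned char' location. */
static int mismatch_hidden_by_local(void)
{
	unsigned char byte = 0x34;
	long wide = 0x1234;

	/* __old becomes (unsigned char)0x1234 == 0x34, so this yields 1. */
	return matches_with_local(&byte, wide);
}

static int mismatch_visible_without_local(void)
{
	unsigned char byte = 0x34;
	long wide = 0x1234;

	/* 0x34 is promoted and compared with 0x1234, so this yields 0. */
	return matches_direct(&byte, wide);
}
```

In this sketch the mismatch is a silent truncation rather than a build
failure. Building with -Wconversion makes gcc warn about the initializer
in the first macro.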

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index b3b7332..8bf9127 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -210,17 +210,15 @@ extern void __add_wrong_size(void)
 #define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2)			\
 ({									\
 	bool __ret;							\
-	__typeof__(*(p1)) __old1 = (o1), __new1 = (n1);			\
-	__typeof__(*(p2)) __old2 = (o2), __new2 = (n2);			\
 	BUILD_BUG_ON(sizeof(*(p1)) != sizeof(long));			\
 	BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long));			\
 	VM_BUG_ON((unsigned long)(p1) % (2 * sizeof(long)));		\
 	VM_BUG_ON((unsigned long)((p1) + 1) != (unsigned long)(p2));	\
 	asm volatile(pfx "cmpxchg%c4b %2; sete %0"			\
-		     : "=a" (__ret), "+d" (__old2),			\
+		     : "=a" (__ret), "+d" (o2),				\
 		       "+m" (*(p1)), "+m" (*(p2))			\
-		     : "i" (2 * sizeof(long)), "a" (__old1),		\
-		       "b" (__new1), "c" (__new2));			\
+		     : "i" (2 * sizeof(long)), "a" (o1),		\
+		       "b" (n1), "c" (n2));				\
 	__ret;								\
 })
 




end of thread, other threads:[~2012-03-03  6:03 UTC | newest]

Thread overview: 7+ messages
2012-03-02  8:31 [RFC patch] cmpxchg_double: remove local variables to get better performance Alex Shi
2012-03-02  8:54 ` Jan Beulich
2012-03-02  9:00   ` Alex Shi
2012-03-02  9:11     ` Jan Beulich
2012-03-02 15:12       ` Alex Shi
2012-03-02 15:30         ` Jan Beulich
2012-03-03  6:03           ` Alex Shi
