* [patch 1/3] powerpc: optimise smp_wmb
From: Nick Piggin @ 2008-11-12 3:50 UTC
To: Paul Mackerras, linuxppc-dev
Hi,
OK, another go at submitting these 3 memory ordering improvement patches.
I've tested on my G5, but I'd like to especially get confirmation as to
whether the __SUBARCH_HAS_LWSYNC thing works on E500MC.
--
Commit 2d1b2027626d5151fff8ef7c06ca8e7876a1a510 removed __SUBARCH_HAS_LWSYNC,
causing smp_wmb to revert to eieio for all CPUs. Restore the behaviour
introduced in 74f0609526afddd88bef40b651da24f3167b10b2.
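For illustration, here is a minimal sketch of the store-publication
pattern that smp_wmb() serves; the data/flag variables and the publish()
function are hypothetical, not part of the patch:

/*
 * smp_wmb() only has to order the two cacheable stores with respect
 * to each other, so lwsync (or eieio where lwsync is unavailable)
 * suffices; a full sync is not required here.
 */
static int data;
static int flag;

static void publish(void)
{
        data = 42;              /* store the payload */
        smp_wmb();              /* order payload store before flag store */
        flag = 1;               /* signal that the payload is valid */
}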
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/arch/powerpc/include/asm/synch.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/synch.h 2008-11-12 12:26:15.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/synch.h 2008-11-12 12:33:10.000000000 +1100
@@ -5,6 +5,10 @@
#include <linux/stringify.h>
#include <asm/feature-fixups.h>
+#if defined(__powerpc64__) || defined(CONFIG_PPC_E500MC)
+#define __SUBARCH_HAS_LWSYNC
+#endif
+
#ifndef __ASSEMBLY__
extern unsigned int __start___lwsync_fixup, __stop___lwsync_fixup;
extern void do_lwsync_fixups(unsigned long value, void *fixup_start,
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-12 12:26:46.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h 2008-11-12 12:28:57.000000000 +1100
@@ -45,14 +45,14 @@
#ifdef CONFIG_SMP
#ifdef __SUBARCH_HAS_LWSYNC
-# define SMPWMB lwsync
+# define SMPWMB LWSYNC
#else
# define SMPWMB eieio
#endif
#define smp_mb() mb()
#define smp_rmb() rmb()
-#define smp_wmb() __asm__ __volatile__ (__stringify(SMPWMB) : : :"memory")
+#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
#define smp_read_barrier_depends() read_barrier_depends()
#else
#define smp_mb() barrier()
* [patch 2/3] powerpc: optimise smp_rmb
From: Nick Piggin @ 2008-11-12 3:51 UTC
To: Paul Mackerras, linuxppc-dev
After commit 598056d5af8fef1dbe8f96f5c2b641a528184e5a, rmb() became a sync
instruction, which is needed to order cacheable vs. noncacheable loads.
However, smp_rmb() is #defined to rmb(), even though it only has to order
cacheable loads and can therefore be an lwsync.
Restore smp_rmb() performance by using lwsync there, and update the comments.
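To illustrate the pairing this relies on, here is a hypothetical reader
for the publish() sketch under patch 1; again, the names are made up for
the example:

/*
 * smp_rmb() only has to order the two cacheable loads, so lwsync is
 * enough; the full sync in rmb() is needed only when noncacheable
 * (device) loads must also be ordered.
 */
static int consume(void)
{
        if (flag) {
                smp_rmb();      /* order flag load before payload load */
                return data;    /* guaranteed to see the published value */
        }
        return -1;              /* nothing published yet */
}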
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-12 12:28:57.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h 2008-11-12 12:35:12.000000000 +1100
@@ -23,15 +23,17 @@
* read_barrier_depends() prevents data-dependent loads being reordered
* across this point (nop on PPC).
*
- * We have to use the sync instructions for mb(), since lwsync doesn't
- * order loads with respect to previous stores. Lwsync is fine for
- * rmb(), though. Note that rmb() actually uses a sync on 32-bit
- * architectures.
+ * *mb() variants without smp_ prefix must order all types of memory
+ * operations with one another. sync is the only instruction sufficient
+ * to do this.
*
- * For wmb(), we use sync since wmb is used in drivers to order
- * stores to system memory with respect to writes to the device.
- * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
- * on SMP since it is only used to order updates to system memory.
+ * For the smp_ barriers, ordering is for cacheable memory operations
+ * only. We have to use the sync instruction for smp_mb(), since lwsync
+ * doesn't order loads with respect to previous stores. Lwsync can be
+ * used for smp_rmb() and smp_wmb().
+ *
+ * However, on CPUs that don't support lwsync, lwsync actually maps to a
+ * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
*/
#define mb() __asm__ __volatile__ ("sync" : : : "memory")
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
@@ -51,7 +53,7 @@
#endif
#define smp_mb() mb()
-#define smp_rmb() rmb()
+#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
#define smp_read_barrier_depends() read_barrier_depends()
#else
* [patch 3/3] powerpc: optimise mutex
From: Nick Piggin @ 2008-11-12 3:54 UTC
To: Paul Mackerras, linuxppc-dev
Implement an optimised mutex fastpath for powerpc, making use of acquire
and release barrier semantics. This takes the mutex lock+unlock benchmark
from 203 to 173 cycles on a G5.
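For reference, a rough sketch of the kind of loop behind the cycle
numbers above; the iteration count, the mftb() timebase read and the
tick-to-cycle conversion (the timebase does not tick at the core
frequency on a G5) are all assumptions, not taken from the patch:

/*
 * Time ITERATIONS back-to-back lock+unlock pairs on an uncontended
 * mutex and report the average cost in timebase ticks.
 */
#define ITERATIONS      1000000

static DEFINE_MUTEX(bench_lock);

static unsigned long bench_mutex(void)
{
        unsigned long start, end;
        int i;

        start = mftb();
        for (i = 0; i < ITERATIONS; i++) {
                mutex_lock(&bench_lock);
                mutex_unlock(&bench_lock);
        }
        end = mftb();

        return (end - start) / ITERATIONS;
}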
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/arch/powerpc/include/asm/mutex.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/mutex.h 2008-11-12 12:50:06.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/mutex.h 2008-11-12 13:41:24.000000000 +1100
@@ -1,9 +1,134 @@
/*
- * Pull in the generic implementation for the mutex fastpath.
+ * Optimised mutex implementation of include/asm-generic/mutex-dec.h algorithm
+ */
+#ifndef _ASM_POWERPC_MUTEX_H
+#define _ASM_POWERPC_MUTEX_H
+
+static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
+{
+ int t;
+
+ __asm__ __volatile__ (
+"1: lwarx %0,0,%1 # mutex trylock\n\
+ cmpw 0,%0,%2\n\
+ bne- 2f\n"
+ PPC405_ERR77(0,%1)
+" stwcx. %3,0,%1\n\
+ bne- 1b"
+ ISYNC_ON_SMP
+ "\n\
+2:"
+ : "=&r" (t)
+ : "r" (&v->counter), "r" (old), "r" (new)
+ : "cc", "memory");
+
+ return t;
+}
+
+static inline int __mutex_dec_return_lock(atomic_t *v)
+{
+ int t;
+
+ __asm__ __volatile__(
+"1: lwarx %0,0,%1 # mutex lock\n\
+ addic %0,%0,-1\n"
+ PPC405_ERR77(0,%1)
+" stwcx. %0,0,%1\n\
+ bne- 1b"
+ ISYNC_ON_SMP
+ : "=&r" (t)
+ : "r" (&v->counter)
+ : "cc", "memory");
+
+ return t;
+}
+
+static inline int __mutex_inc_return_unlock(atomic_t *v)
+{
+ int t;
+
+ __asm__ __volatile__(
+ LWSYNC_ON_SMP
+"1: lwarx %0,0,%1 # mutex unlock\n\
+ addic %0,%0,1\n"
+ PPC405_ERR77(0,%1)
+" stwcx. %0,0,%1 \n\
+ bne- 1b"
+ : "=&r" (t)
+ : "r" (&v->counter)
+ : "cc", "memory");
+
+ return t;
+}
+
+/**
+ * __mutex_fastpath_lock - try to take the lock by moving the count
+ * from 1 to a 0 value
+ * @count: pointer of type atomic_t
+ * @fail_fn: function to call if the original value was not 1
+ *
+ * Change the count from 1 to a value lower than 1, and call <fail_fn> if
+ * it wasn't 1 originally. This function MUST leave the value lower than
+ * 1 even when the "1" assertion wasn't true.
+ */
+static inline void
+__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+ if (unlikely(__mutex_dec_return_lock(count) < 0))
+ fail_fn(count);
+}
+
+/**
+ * __mutex_fastpath_lock_retval - try to take the lock by moving the count
+ * from 1 to a 0 value
+ * @count: pointer of type atomic_t
+ * @fail_fn: function to call if the original value was not 1
+ *
+ * Change the count from 1 to a value lower than 1, and call <fail_fn> if
+ * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
+ * or anything the slow path function returns.
+ */
+static inline int
+__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+ if (unlikely(__mutex_dec_return_lock(count) < 0))
+ return fail_fn(count);
+ return 0;
+}
+
+/**
+ * __mutex_fastpath_unlock - try to promote the count from 0 to 1
+ * @count: pointer of type atomic_t
+ * @fail_fn: function to call if the original value was not 0
+ *
+ * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
+ * In the failure case, this function is allowed to either set the value to
+ * 1, or to set it to a value lower than 1.
+ */
+static inline void
+__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+ if (unlikely(__mutex_inc_return_unlock(count) <= 0))
+ fail_fn(count);
+}
+
+#define __mutex_slowpath_needs_to_unlock() 1
+
+/**
+ * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
+ *
+ * @count: pointer of type atomic_t
+ * @fail_fn: fallback function
*
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
+ * Change the count from 1 to 0, and return 1 (success), or if the count
+ * was not 1, then return 0 (failure).
*/
+static inline int
+__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+ if (likely(__mutex_cmpxchg_lock(count, 1, 0) == 1))
+ return 1;
+ return 0;
+}
-#include <asm-generic/mutex-dec.h>
+#endif