* [patch 1/2] powerpc: optimise smp_mb
@ 2009-02-19 17:12 Nick Piggin
2009-02-19 17:21 ` [patch 2/2] powerpc: replace isync with lwsync Nick Piggin
2009-03-04 4:03 ` [patch 1/2] powerpc: optimise smp_mb Benjamin Herrenschmidt
0 siblings, 2 replies; 6+ messages in thread
From: Nick Piggin @ 2009-02-19 17:12 UTC
To: benh, paulus, linuxppc-dev
Using an lwsync; isync sequence for smp_mb is 5 times faster in a microbenchmark
on my G5 than using sync, although it takes more instructions.
Running tbench with 4 clients on my 4 core G5 (20 times) gives the
following:
unpatched AVG=920.33 STD=2.36
patched AVG=921.27 STD=2.77
So not a big improvement here; it could even be in the noise.
But other workloads or systems might see a bigger win, and the patch
may be interesting or could be improved, so I'll ask for comments.
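One rough way to judge whether a difference like this is "in the noise" is a
two-sample (Welch) t statistic over the reported means and standard deviations.
The sketch below is not part of the patch. It assumes the STD figures are sample
standard deviations over the 20 runs, and the function name is made up for
illustration. It returns the squared statistic so that no sqrt() is needed.

```c
/*
 * Squared Welch t statistic for two samples of equal size n.
 * Assumption: sd_a and sd_b are sample standard deviations of the n runs.
 * The square is returned so the sketch does not depend on libm; compare it
 * against the square of whatever threshold you prefer (e.g. 2.0 * 2.0).
 */
double welch_t_squared(double mean_a, double sd_a,
                       double mean_b, double sd_b, int n)
{
	double diff = mean_b - mean_a;
	/* squared standard error of the difference of the two means */
	double se_sq = (sd_a * sd_a + sd_b * sd_b) / n;

	return (diff * diff) / se_sq;
}
```

With the figures above, welch_t_squared(920.33, 2.36, 921.27, 2.77, 20) comes
out at about 1.33, i.e. |t| a little above 1.1, which fits the "could even be
in the noise" reading.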
---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h 2009-02-20 01:51:24.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h 2009-02-20 02:09:41.000000000 +1100
@@ -52,7 +52,16 @@
# define SMPWMB eieio
#endif
+#ifdef __powerpc64__
+#define smp_mb() __asm__ __volatile__ ( \
+ "1: lwsync \n" \
+ " cmpw 0,%%r0,%%r0 \n" \
+ " bne- 1b \n" \
+ " isync \n" \
+ : : : "cr0", "memory")
+#else
#define smp_mb() mb()
+#endif
#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
#define smp_read_barrier_depends() read_barrier_depends()
* [patch 2/2] powerpc: replace isync with lwsync
2009-02-19 17:12 [patch 1/2] powerpc: optimise smp_mb Nick Piggin
@ 2009-02-19 17:21 ` Nick Piggin
2009-03-04 4:04 ` Benjamin Herrenschmidt
2009-03-04 4:03 ` [patch 1/2] powerpc: optimise smp_mb Benjamin Herrenschmidt
1 sibling, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2009-02-19 17:21 UTC
To: benh, paulus, linuxppc-dev
OK, here is this patch again. You didn't think I'd let a 2% performance
improvement be forgotten? :)
Anyway, the patch won't work well on architectures without lwsync, but I won't
bother fixing that kind of thing and making it merge-worthy until you
guys say something positive about it.
20 runs of tbench on the G5
unpatched AVG=920.37 STD=2.36
patched AVG=938.89 STD=3.33
(throughput in MB/s) This is a 1.9% throughput increase.
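For reference, the relative change can be recomputed from the two means. A
minimal sketch in C (the helper name is hypothetical, not something from the
patch):

```c
/* Relative change of `after` with respect to `before`, in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

percent_change(920.37, 938.89) gives roughly 2.01; taking the patched mean as
the base instead gives roughly 1.97.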
---
Index: linux-2.6/arch/powerpc/include/asm/atomic.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/atomic.h 2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/atomic.h 2009-02-20 02:13:22.000000000 +1100
@@ -55,7 +55,7 @@
PPC405_ERR77(0,%2)
" stwcx. %0,0,%2 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (a), "r" (&v->counter)
: "cc", "memory");
@@ -91,7 +91,7 @@
PPC405_ERR77(0,%2)
" stwcx. %0,0,%2 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (a), "r" (&v->counter)
: "cc", "memory");
@@ -125,7 +125,7 @@
PPC405_ERR77(0,%1)
" stwcx. %0,0,%1 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (&v->counter)
: "cc", "xer", "memory");
@@ -169,7 +169,7 @@
PPC405_ERR77(0,%1)
" stwcx. %0,0,%1\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (&v->counter)
: "cc", "xer", "memory");
@@ -202,7 +202,7 @@
PPC405_ERR77(0,%2)
" stwcx. %0,0,%1 \n\
bne- 1b \n"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
" subf %0,%2,%0 \n\
2:"
: "=&r" (t)
@@ -235,7 +235,7 @@
PPC405_ERR77(0,%1)
" stwcx. %0,0,%1\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
"\n\
2:" : "=&b" (t)
: "r" (&v->counter)
@@ -291,7 +291,7 @@
add %0,%1,%0\n\
stdcx. %0,0,%2 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (a), "r" (&v->counter)
: "cc", "memory");
@@ -325,7 +325,7 @@
subf %0,%1,%0\n\
stdcx. %0,0,%2 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (a), "r" (&v->counter)
: "cc", "memory");
@@ -357,7 +357,7 @@
addic %0,%0,1\n\
stdcx. %0,0,%1 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (&v->counter)
: "cc", "xer", "memory");
@@ -399,7 +399,7 @@
addic %0,%0,-1\n\
stdcx. %0,0,%1\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (&v->counter)
: "cc", "xer", "memory");
@@ -425,7 +425,7 @@
blt- 2f\n\
stdcx. %0,0,%1\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
"\n\
2:" : "=&r" (t)
: "r" (&v->counter)
@@ -458,7 +458,7 @@
add %0,%2,%0 \n"
" stdcx. %0,0,%1 \n\
bne- 1b \n"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
" subf %0,%2,%0 \n\
2:"
: "=&r" (t)
Index: linux-2.6/arch/powerpc/include/asm/bitops.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/bitops.h 2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/bitops.h 2009-02-20 02:13:22.000000000 +1100
@@ -139,7 +139,7 @@
PPC405_ERR77(0,%3)
PPC_STLCX "%1,0,%3 \n"
"bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (old), "=&r" (t)
: "r" (mask), "r" (p)
: "cc", "memory");
@@ -160,7 +160,7 @@
PPC405_ERR77(0,%3)
PPC_STLCX "%1,0,%3 \n"
"bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (old), "=&r" (t)
: "r" (mask), "r" (p)
: "cc", "memory");
@@ -182,7 +182,7 @@
PPC405_ERR77(0,%3)
PPC_STLCX "%1,0,%3 \n"
"bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (old), "=&r" (t)
: "r" (mask), "r" (p)
: "cc", "memory");
@@ -204,7 +204,7 @@
PPC405_ERR77(0,%3)
PPC_STLCX "%1,0,%3 \n"
"bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (old), "=&r" (t)
: "r" (mask), "r" (p)
: "cc", "memory");
Index: linux-2.6/arch/powerpc/include/asm/futex.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/futex.h 2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/futex.h 2009-02-20 02:13:22.000000000 +1100
@@ -97,7 +97,7 @@
PPC405_ERR77(0,%2)
"2: stwcx. %4,0,%2\n\
bne- 1b\n"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
"3: .section .fixup,\"ax\"\n\
4: li %0,%5\n\
b 3b\n\
Index: linux-2.6/arch/powerpc/include/asm/spinlock.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/spinlock.h 2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/spinlock.h 2009-02-20 02:13:22.000000000 +1100
@@ -65,7 +65,7 @@
bne- 2f\n\
stwcx. %1,0,%2\n\
bne- 1b\n\
- isync\n\
+ lwsync\n\
2:" : "=&r" (tmp)
: "r" (token), "r" (&lock->slock)
: "cr0", "memory");
@@ -193,7 +193,7 @@
PPC405_ERR77(0,%1)
" stwcx. %0,0,%1\n\
bne- 1b\n\
- isync\n\
+ lwsync\n\
2:" : "=&r" (tmp)
: "r" (&rw->lock)
: "cr0", "xer", "memory");
@@ -217,7 +217,7 @@
PPC405_ERR77(0,%1)
" stwcx. %1,0,%2\n\
bne- 1b\n\
- isync\n\
+ lwsync\n\
2:" : "=&r" (tmp)
: "r" (token), "r" (&rw->lock)
: "cr0", "memory");
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h 2009-02-20 02:09:41.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h 2009-02-20 02:13:22.000000000 +1100
@@ -246,7 +246,7 @@
PPC405_ERR77(0,%2)
" stwcx. %3,0,%2 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
: "r" (p), "r" (val)
: "cc", "memory");
@@ -289,7 +289,7 @@
PPC405_ERR77(0,%2)
" stdcx. %3,0,%2 \n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
: "r" (p), "r" (val)
: "cc", "memory");
@@ -382,7 +382,7 @@
PPC405_ERR77(0,%2)
" stwcx. %4,0,%2\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
"\n\
2:"
: "=&r" (prev), "+m" (*p)
@@ -427,7 +427,7 @@
bne- 2f\n\
stdcx. %4,0,%2\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
"\n\
2:"
: "=&r" (prev), "+m" (*p)
Index: linux-2.6/arch/powerpc/include/asm/synch.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/synch.h 2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/synch.h 2009-02-20 02:13:22.000000000 +1100
@@ -38,7 +38,7 @@
#ifdef CONFIG_SMP
#define ISYNC_ON_SMP "\n\tisync\n"
-#define LWSYNC_ON_SMP stringify_in_c(LWSYNC) "\n"
+#define LWSYNC_ON_SMP "\n\t" stringify_in_c(LWSYNC) "\n"
#else
#define ISYNC_ON_SMP
#define LWSYNC_ON_SMP
Index: linux-2.6/arch/powerpc/include/asm/mutex.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/mutex.h 2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/mutex.h 2009-02-20 02:13:22.000000000 +1100
@@ -15,7 +15,7 @@
PPC405_ERR77(0,%1)
" stwcx. %3,0,%1\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
"\n\
2:"
: "=&r" (t)
@@ -35,7 +35,7 @@
PPC405_ERR77(0,%1)
" stwcx. %0,0,%1\n\
bne- 1b"
- ISYNC_ON_SMP
+ LWSYNC_ON_SMP
: "=&r" (t)
: "r" (&v->counter)
: "cc", "memory");
Index: linux-2.6/arch/powerpc/mm/hash_low_64.S
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/hash_low_64.S 2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/mm/hash_low_64.S 2009-02-20 02:13:22.000000000 +1100
@@ -110,7 +110,7 @@
/* Write the linux PTE atomically (setting busy) */
stdcx. r30,0,r6
bne- 1b
- isync
+ lwsync
/* Step 2:
*
@@ -393,7 +393,7 @@
/* Write the linux PTE atomically (setting busy) */
stdcx. r30,0,r6
bne- 1b
- isync
+ lwsync
/* Step 2:
*
@@ -734,7 +734,7 @@
/* Write the linux PTE atomically (setting busy) */
stdcx. r30,0,r6
bne- 1b
- isync
+ lwsync
/* Step 2:
*
* Re: [patch 1/2] powerpc: optimise smp_mb
2009-02-19 17:12 [patch 1/2] powerpc: optimise smp_mb Nick Piggin
2009-02-19 17:21 ` [patch 2/2] powerpc: replace isync with lwsync Nick Piggin
@ 2009-03-04 4:03 ` Benjamin Herrenschmidt
2009-03-04 9:38 ` Nick Piggin
1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2009-03-04 4:03 UTC
To: Nick Piggin; +Cc: linuxppc-dev, paulus
All right, sorry for the delay. I had those stored in my "need more
than half a brain cell for review" list and only got to them today :-)
On Thu, 2009-02-19 at 18:12 +0100, Nick Piggin wrote:
> Using an lwsync; isync sequence for smp_mb is 5 times faster in a microbenchmark
> on my G5 than using sync, although it takes more instructions.
>
> Running tbench with 4 clients on my 4 core G5 (20 times) gives the
> following:
>
> unpatched AVG=920.33 STD=2.36
> patched AVG=921.27 STD=2.77
>
> So not a big improvement here; it could even be in the noise.
> But other workloads or systems might see a bigger win, and the patch
> may be interesting or could be improved, so I'll ask for comments.
No huge objection here. However, I have some doubts as to whether
this will be worthwhile on POWER5, 6 and 7, since those somewhat optimized
the behaviour of the full sync. Since anything older than POWER4 doesn't
have lwsync, that potentially makes it not worth the pain.
But I need to measure to be sure... it might be that newer embedded
processors that support lwsync and SMP (and that are using a different
pipeline structure) might benefit from this. I'll try to run some tests
later this week or next week, but ping me in case I forget.
Now what would be worth doing is to also try using a twi;isync sequence
like we do to order MMIO reads, to see if it's any better than cmp/branch.
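A minimal, untested sketch of what such a variant might look like. The macro
and function names are hypothetical. The twi form is modelled on the MMIO read
accessors, which trap-compare a freshly loaded register; here r0 is used only
because there is no loaded value to depend on. Whether this sequence really
gives full smp_mb() semantics is not something the sketch establishes. On
builds other than powerpc64 it falls back to a compiler-provided full barrier
so that the file still compiles.

```c
#if defined(__powerpc64__)
/* hypothetical twi form: "twi 0,rA,0" has TO=0 and therefore never traps */
#define smp_mb_twi() __asm__ __volatile__ ( \
	"lwsync\n\t" \
	"twi 0,0,0\n\t" \
	"isync" \
	: : : "memory")
#else
/* fallback so the sketch builds elsewhere; not the sequence under discussion */
#define smp_mb_twi() __sync_synchronize()
#endif

/* Store, barrier, load: a single-threaded smoke test of the macro only. */
int store_then_load(volatile int *flag, volatile int *data)
{
	*flag = 1;
	smp_mb_twi();
	return *data;
}
```

A real comparison would time this against the cmp/branch version in the same
microbenchmark and check ordering with a multi-threaded litmus test; the
function above only shows that the macro expands and runs.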
Cheers,
Ben.
* Re: [patch 2/2] powerpc: replace isync with lwsync
2009-02-19 17:21 ` [patch 2/2] powerpc: replace isync with lwsync Nick Piggin
@ 2009-03-04 4:04 ` Benjamin Herrenschmidt
2009-03-04 10:15 ` Nick Piggin
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2009-03-04 4:04 UTC
To: Nick Piggin; +Cc: linuxppc-dev, paulus
On Thu, 2009-02-19 at 18:21 +0100, Nick Piggin wrote:
> OK, here is this patch again. You didn't think I'd let a 2% performance
> improvement be forgotten? :)
>
> Anyway, the patch won't work well on architectures without lwsync, but I won't
> bother fixing that kind of thing and making it merge-worthy until you
> guys say something positive about it.
>
> 20 runs of tbench on the G5
>
> unpatched AVG=920.37 STD=2.36
> patched AVG=938.89 STD=3.33
>
> (throughput in MB/s) This is a 1.9% throughput increase.
Definitely worth it, I believe. We could use a macro built on Michael's new
improvements to the CPU feature code patching, so that the isync gets
changed to lwsync on CPUs where it is available.
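The idea can be illustrated with a small user-space simulation. This is only a
sketch under stated assumptions: `code` stands in for kernel text, `sites` for
a table of barrier locations collected at build time, and the function name is
invented. It is not the kernel's actual feature-fixup code. The two opcode
constants are the standard encodings of isync and lwsync.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_ISYNC  0x4c00012cu	/* isync */
#define INSN_LWSYNC 0x7c2004acu	/* lwsync (sync with L=1) */

/*
 * Rewrite the recorded barrier sites from isync to lwsync when the CPU
 * supports lwsync. Both arrays are hypothetical stand-ins for the real
 * structures. Returns the number of instructions patched. Real code
 * would also have to flush the instruction cache after patching.
 */
size_t patch_acquire_barriers(uint32_t *code, const size_t *sites,
			      size_t nsites, int cpu_has_lwsync)
{
	size_t patched = 0;

	if (!cpu_has_lwsync)
		return 0;	/* keep the isync default */

	for (size_t i = 0; i < nsites; i++) {
		if (code[sites[i]] == INSN_ISYNC) {
			code[sites[i]] = INSN_LWSYNC;
			patched++;
		}
	}
	return patched;
}
```

Keeping isync as the compiled-in default means a CPU without lwsync runs the
existing code unchanged, and only CPUs that advertise the feature are patched.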
Cheers,
Ben.
* Re: [patch 1/2] powerpc: optimise smp_mb
2009-03-04 4:03 ` [patch 1/2] powerpc: optimise smp_mb Benjamin Herrenschmidt
@ 2009-03-04 9:38 ` Nick Piggin
0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2009-03-04 9:38 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus
On Wed, Mar 04, 2009 at 03:03:15PM +1100, Benjamin Herrenschmidt wrote:
> All right, sorry for the delay. I had those stored in my "need more
> than half a brain cell for review" list and only got to them today :-)
No problem :)
> On Thu, 2009-02-19 at 18:12 +0100, Nick Piggin wrote:
> > Using an lwsync; isync sequence for smp_mb is 5 times faster in a microbenchmark
> > on my G5 than using sync, although it takes more instructions.
> >
> > Running tbench with 4 clients on my 4 core G5 (20 times) gives the
> > following:
> >
> > unpatched AVG=920.33 STD=2.36
> > patched AVG=921.27 STD=2.77
> >
> > So not a big improvement here; it could even be in the noise.
> > But other workloads or systems might see a bigger win, and the patch
> > may be interesting or could be improved, so I'll ask for comments.
>
> No huge objection here. However, I have some doubts as to whether
> this will be worthwhile on POWER5, 6 and 7, since those somewhat optimized
> the behaviour of the full sync. Since anything older than POWER4 doesn't
> have lwsync, that potentially makes it not worth the pain.
I would be interested to know. Avoiding sync when there *are* outstanding
IO operations should be a win? (My test of tbench on localhost
obviously wouldn't generate much MMIO).
I mean, even in the most optimised implementation possible, this sequence
is less constraining than sync.
> But I need to measure to be sure... it might be that newer embedded
> processors that support lwsync and SMP (and that are using a different
> pipeline structure) might benefit from this. I'll try to run some tests
> later this week or next week, but ping me in case I forget.
OK I'll ping you next week.
> Now what would be worth doing is to also try using a twi;isync sequence
> like we do to order MMIO reads, to see if it's any better than cmp/branch.
Probably makes sense to use the same pattern.
* Re: [patch 2/2] powerpc: replace isync with lwsync
2009-03-04 4:04 ` Benjamin Herrenschmidt
@ 2009-03-04 10:15 ` Nick Piggin
0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2009-03-04 10:15 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus
On Wed, Mar 04, 2009 at 03:04:11PM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2009-02-19 at 18:21 +0100, Nick Piggin wrote:
> > OK, here is this patch again. You didn't think I'd let a 2% performance
> > improvement be forgotten? :)
> >
> > Anyway, the patch won't work well on architectures without lwsync, but I won't
> > bother fixing that kind of thing and making it merge-worthy until you
> > guys say something positive about it.
> >
> > 20 runs of tbench on the G5
> >
> > unpatched AVG=920.37 STD=2.36
> > patched AVG=938.89 STD=3.33
> >
> > (throughput in MB/s) This is a 1.9% throughput increase.
>
> Definitely worth it, I believe. We could use a macro built on Michael's new
> improvements to the CPU feature code patching, so that the isync gets
> changed to lwsync on CPUs where it is available.
OK. I guess the interesting part about this is that I can't find any
IBM documentation for lwsync-capable CPUs that suggests using this
pattern for acquire locking. It would be interesting to know whether
it helps other CPUs...
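To make "acquire locking" concrete: it is the barrier that follows the
successful atomic update when a lock is taken. The portable C11 sketch below
shows where that ordering requirement sits. It is not the kernel's
implementation, the type and function names are invented, and which
instruction a compiler emits for the acquire on PowerPC is outside what this
thread establishes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock; initialise with { ATOMIC_FLAG_INIT }. */
typedef struct { atomic_flag held; } sketch_lock_t;

bool sketch_trylock(sketch_lock_t *l)
{
	/* acquire: accesses after this may not be reordered before it */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

void sketch_lock(sketch_lock_t *l)
{
	while (!sketch_trylock(l))
		;	/* spin until the flag is observed clear */
}

void sketch_unlock(sketch_lock_t *l)
{
	/* release: accesses before this are visible before the lock is free */
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire in sketch_trylock() corresponds to the position where the patch
swaps isync for lwsync after the stwcx. loop.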