* [patch 1/2] powerpc: optimise smp_mb
@ 2009-02-19 17:12 Nick Piggin
  2009-02-19 17:21 ` [patch 2/2] powerpc: replace isync with lwsync Nick Piggin
  2009-03-04  4:03 ` [patch 1/2] powerpc: optimise smp_mb Benjamin Herrenschmidt
  0 siblings, 2 replies; 6+ messages in thread
From: Nick Piggin @ 2009-02-19 17:12 UTC
  To: benh, paulus, linuxppc-dev

Using lwsync, isync sequence in a microbenchmark is 5 times faster on my G5 than
using sync for smp_mb. Although it takes more instructions.

Running tbench with 4 clients on my 4 core G5 (20 times) gives the
following:

unpatched AVG=920.33 STD=2.36
patched   AVG=921.27 STD=2.77

So not a big improvement here, actually it could even be in the noise.
But other workloads or systems might see a bigger win, and the patch
maybe is interesting or could be improved, so I'll ask for comments.

---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h	2009-02-20 01:51:24.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h	2009-02-20 02:09:41.000000000 +1100
@@ -52,7 +52,16 @@
 #  define SMPWMB      eieio
 #endif
 
+#ifdef __powerpc64__
+#define smp_mb()	__asm__ __volatile__ (			\
+	"1:	lwsync			\n"		\
+	"	cmpw	0,%%r0,%%r0	\n"		\
+	"	bne-	1b		\n"		\
+	"	isync			\n"		\
+	: : : "memory")
+#else
 #define smp_mb()	mb()
+#endif
 #define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
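For readers less familiar with what smp_mb() has to guarantee, here is a minimal sketch in portable C11 atomics rather than PowerPC assembly. It is an analogy only, not the kernel macro and not the sequence proposed in the patch: the names want, try_enter and leave are hypothetical, and the seq_cst fence stands in for a full barrier. The point it illustrates is the one ordering a full barrier must provide that lighter barriers do not: an earlier store is ordered before a later load.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One "I want in" flag per participant. Hypothetical names, illustration only. */
static atomic_int want[2];

/*
 * Announce intent, then check the peer.
 * Returns true if the peer had not announced, false otherwise.
 */
bool try_enter(int me)
{
    int other = 1 - me;

    /* Store our own flag with no ordering of its own. */
    atomic_store_explicit(&want[me], 1, memory_order_relaxed);

    /*
     * Full barrier: orders the store above before the load below.
     * This is the C11 counterpart of a full memory barrier; without it
     * two concurrent callers could both read 0 from each other's flag.
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* Look at the other side's flag. */
    if (atomic_load_explicit(&want[other], memory_order_relaxed) == 0)
        return true;

    /* The peer announced first or concurrently: back off. */
    atomic_store_explicit(&want[me], 0, memory_order_relaxed);
    return false;
}

/* Withdraw our announcement. */
void leave(int me)
{
    atomic_store_explicit(&want[me], 0, memory_order_release);
}
```

With the fence in place, at most one of two concurrent try_enter() callers can succeed (both may fail and have to retry). Any replacement for sync in smp_mb() has to preserve that store-to-load ordering.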
* [patch 2/2] powerpc: replace isync with lwsync
  2009-02-19 17:12 [patch 1/2] powerpc: optimise smp_mb Nick Piggin
@ 2009-02-19 17:21 ` Nick Piggin
  2009-03-04  4:04   ` Benjamin Herrenschmidt
  2009-03-04  4:03 ` [patch 1/2] powerpc: optimise smp_mb Benjamin Herrenschmidt
  1 sibling, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2009-02-19 17:21 UTC
  To: benh, paulus, linuxppc-dev

OK, here is this patch again. You didn't think I'd let a 2% performance
improvement be forgotten? :)

Anyway, patch won't work well on architecture without lwsync, but I won't
bother fixing that kind of thing and making it merge worthy until you
guys say something positive about it.

20 runs of tbench on the G5

unpatched AVG=920.37 STD=2.36
patched   AVG=938.89 STD=3.33

(throughput in MB/s) This is a 2.0% throughput increase.

---

Index: linux-2.6/arch/powerpc/include/asm/atomic.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/atomic.h	2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/atomic.h	2009-02-20 02:13:22.000000000 +1100
@@ -55,7 +55,7 @@
 	PPC405_ERR77(0,%2)
 "	stwcx.	%0,0,%2 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (a), "r" (&v->counter)
 	: "cc", "memory");
@@ -91,7 +91,7 @@
 	PPC405_ERR77(0,%2)
 "	stwcx.	%0,0,%2 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (a), "r" (&v->counter)
 	: "cc", "memory");
@@ -125,7 +125,7 @@
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (&v->counter)
 	: "cc", "xer", "memory");
@@ -169,7 +169,7 @@
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (&v->counter)
 	: "cc", "xer", "memory");
@@ -202,7 +202,7 @@
 	PPC405_ERR77(0,%2)
 "	stwcx.	%0,0,%1 \n\
 	bne-	1b \n"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 "	subf	%0,%2,%0 \n\
 2:"
 	: "=&r" (t)
@@ -235,7 +235,7 @@
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	"\n\
 2:"	: "=&b" (t)
 	: "r" (&v->counter)
@@ -291,7 +291,7 @@
 	add	%0,%1,%0\n\
 	stdcx.	%0,0,%2 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (a), "r" (&v->counter)
 	: "cc", "memory");
@@ -325,7 +325,7 @@
 	subf	%0,%1,%0\n\
 	stdcx.	%0,0,%2 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (a), "r" (&v->counter)
 	: "cc", "memory");
@@ -357,7 +357,7 @@
 	addic	%0,%0,1\n\
 	stdcx.	%0,0,%1 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (&v->counter)
 	: "cc", "xer", "memory");
@@ -399,7 +399,7 @@
 	addic	%0,%0,-1\n\
 	stdcx.	%0,0,%1\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (&v->counter)
 	: "cc", "xer", "memory");
@@ -425,7 +425,7 @@
 	blt-	2f\n\
 	stdcx.	%0,0,%1\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	"\n\
 2:"	: "=&r" (t)
 	: "r" (&v->counter)
@@ -458,7 +458,7 @@
 	add	%0,%2,%0 \n"
 "	stdcx.	%0,0,%1 \n\
 	bne-	1b \n"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 "	subf	%0,%2,%0 \n\
 2:"
 	: "=&r" (t)
Index: linux-2.6/arch/powerpc/include/asm/bitops.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/bitops.h	2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/bitops.h	2009-02-20 02:13:22.000000000 +1100
@@ -139,7 +139,7 @@
 	PPC405_ERR77(0,%3)
 	PPC_STLCX "%1,0,%3 \n"
 	"bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (old), "=&r" (t)
 	: "r" (mask), "r" (p)
 	: "cc", "memory");
@@ -160,7 +160,7 @@
 	PPC405_ERR77(0,%3)
 	PPC_STLCX "%1,0,%3 \n"
 	"bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (old), "=&r" (t)
 	: "r" (mask), "r" (p)
 	: "cc", "memory");
@@ -182,7 +182,7 @@
 	PPC405_ERR77(0,%3)
 	PPC_STLCX "%1,0,%3 \n"
 	"bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (old), "=&r" (t)
 	: "r" (mask), "r" (p)
 	: "cc", "memory");
@@ -204,7 +204,7 @@
 	PPC405_ERR77(0,%3)
 	PPC_STLCX "%1,0,%3 \n"
 	"bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (old), "=&r" (t)
 	: "r" (mask), "r" (p)
 	: "cc", "memory");
Index: linux-2.6/arch/powerpc/include/asm/futex.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/futex.h	2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/futex.h	2009-02-20 02:13:22.000000000 +1100
@@ -97,7 +97,7 @@
 	PPC405_ERR77(0,%2)
 "2:	stwcx.	%4,0,%2\n\
 	bne-	1b\n"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 "3:	.section .fixup,\"ax\"\n\
 4:	li	%0,%5\n\
 	b	3b\n\
Index: linux-2.6/arch/powerpc/include/asm/spinlock.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/spinlock.h	2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/spinlock.h	2009-02-20 02:13:22.000000000 +1100
@@ -65,7 +65,7 @@
 	bne-	2f\n\
 	stwcx.	%1,0,%2\n\
 	bne-	1b\n\
-	isync\n\
+	lwsync\n\
 2:"	: "=&r" (tmp)
 	: "r" (token), "r" (&lock->slock)
 	: "cr0", "memory");
@@ -193,7 +193,7 @@
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1\n\
 	bne-	1b\n\
-	isync\n\
+	lwsync\n\
 2:"	: "=&r" (tmp)
 	: "r" (&rw->lock)
 	: "cr0", "xer", "memory");
@@ -217,7 +217,7 @@
 	PPC405_ERR77(0,%1)
 "	stwcx.	%1,0,%2\n\
 	bne-	1b\n\
-	isync\n\
+	lwsync\n\
 2:"	: "=&r" (tmp)
 	: "r" (token), "r" (&rw->lock)
 	: "cr0", "memory");
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h	2009-02-20 02:09:41.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h	2009-02-20 02:13:22.000000000 +1100
@@ -246,7 +246,7 @@
 	PPC405_ERR77(0,%2)
 "	stwcx.	%3,0,%2 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (prev), "+m" (*(volatile unsigned int *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -289,7 +289,7 @@
 	PPC405_ERR77(0,%2)
 "	stdcx.	%3,0,%2 \n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (prev), "+m" (*(volatile unsigned long *)p)
 	: "r" (p), "r" (val)
 	: "cc", "memory");
@@ -382,7 +382,7 @@
 	PPC405_ERR77(0,%2)
 "	stwcx.	%4,0,%2\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	"\n\
 2:"
 	: "=&r" (prev), "+m" (*p)
@@ -427,7 +427,7 @@
 	bne-	2f\n\
 	stdcx.	%4,0,%2\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	"\n\
 2:"
 	: "=&r" (prev), "+m" (*p)
Index: linux-2.6/arch/powerpc/include/asm/synch.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/synch.h	2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/synch.h	2009-02-20 02:13:22.000000000 +1100
@@ -38,7 +38,7 @@
 
 #ifdef CONFIG_SMP
 #define ISYNC_ON_SMP	"\n\tisync\n"
-#define LWSYNC_ON_SMP	stringify_in_c(LWSYNC) "\n"
+#define LWSYNC_ON_SMP	"\n\t" stringify_in_c(LWSYNC) "\n"
 #else
 #define ISYNC_ON_SMP
 #define LWSYNC_ON_SMP
Index: linux-2.6/arch/powerpc/include/asm/mutex.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/mutex.h	2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/mutex.h	2009-02-20 02:13:22.000000000 +1100
@@ -15,7 +15,7 @@
 	PPC405_ERR77(0,%1)
 "	stwcx.	%3,0,%1\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	"\n\
 2:"
 	: "=&r" (t)
@@ -35,7 +35,7 @@
 	PPC405_ERR77(0,%1)
 "	stwcx.	%0,0,%1\n\
 	bne-	1b"
-	ISYNC_ON_SMP
+	LWSYNC_ON_SMP
 	: "=&r" (t)
 	: "r" (&v->counter)
 	: "cc", "memory");
Index: linux-2.6/arch/powerpc/mm/hash_low_64.S
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/hash_low_64.S	2009-02-20 01:50:20.000000000 +1100
+++ linux-2.6/arch/powerpc/mm/hash_low_64.S	2009-02-20 02:13:22.000000000 +1100
@@ -110,7 +110,7 @@
 	/* Write the linux PTE atomically (setting busy) */
 	stdcx.	r30,0,r6
 	bne-	1b
-	isync
+	lwsync
 
 	/* Step 2:
 	 *
@@ -393,7 +393,7 @@
 	/* Write the linux PTE atomically (setting busy) */
 	stdcx.	r30,0,r6
 	bne-	1b
-	isync
+	lwsync
 
 	/* Step 2:
 	 *
@@ -734,7 +734,7 @@
 	/* Write the linux PTE atomically (setting busy) */
 	stdcx.	r30,0,r6
 	bne-	1b
-	isync
+	lwsync
 
 	/* Step 2:
 	 *
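The two tbench results posted in this thread can be put side by side with a little arithmetic. The sketch below is only a rough reading aid, not a significance test: it expresses the change of the mean as a percentage of the unpatched mean and in units of the larger of the two standard deviations. The struct and function names are made up for this illustration.

```c
#include <stdio.h>

/* Mean and standard deviation over a set of benchmark runs. */
struct bench {
    double avg;
    double std;
};

/* Change of the mean, as a percentage of the baseline mean. */
double rel_change_pct(struct bench base, struct bench patched)
{
    return (patched.avg - base.avg) / base.avg * 100.0;
}

/* Change of the mean, in units of the larger standard deviation. */
double change_in_std(struct bench base, struct bench patched)
{
    double s = base.std > patched.std ? base.std : patched.std;
    return (patched.avg - base.avg) / s;
}

/* Print both figures for one experiment. */
void report(const char *label, struct bench base, struct bench patched)
{
    printf("%s: %+.2f%% (%.1f std)\n", label,
           rel_change_pct(base, patched), change_in_std(base, patched));
}
```

Fed with the numbers from patch 1/2 (920.33/2.36 against 921.27/2.77), the change is about 0.1% and well under one standard deviation, which matches the "could even be in the noise" remark. With the numbers from patch 2/2 (920.37/2.36 against 938.89/3.33) it is roughly 2.0% and more than five standard deviations, in line with the "2%" mentioned in the message.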
* Re: [patch 2/2] powerpc: replace isync with lwsync
  2009-02-19 17:21 ` [patch 2/2] powerpc: replace isync with lwsync Nick Piggin
@ 2009-03-04  4:04   ` Benjamin Herrenschmidt
  2009-03-04 10:15     ` Nick Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2009-03-04  4:04 UTC
  To: Nick Piggin; +Cc: linuxppc-dev, paulus

On Thu, 2009-02-19 at 18:21 +0100, Nick Piggin wrote:
> OK, here is this patch again. You didn't think I'd let a 2% performance
> improvement be forgotten? :)
>
> Anyway, patch won't work well on architecture without lwsync, but I won't
> bother fixing that kind of thing and making it merge worthy until you
> guys say something positive about it.
>
> 20 runs of tbench on the G5
>
> unpatched AVG=920.37 STD=2.36
> patched   AVG=938.89 STD=3.33
>
> (throughput in MB/s) This is a 2.0% throughput increase.

Definitely worth it, I believe. We could use a macro that uses Michael's new
improvements on the CPU features code patching so that the isync gets
changed to lwsync on some CPUs based on the availability of it.

Cheers,
Ben.
* Re: [patch 2/2] powerpc: replace isync with lwsync
  2009-03-04  4:04   ` Benjamin Herrenschmidt
@ 2009-03-04 10:15     ` Nick Piggin
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2009-03-04 10:15 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus

On Wed, Mar 04, 2009 at 03:04:11PM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2009-02-19 at 18:21 +0100, Nick Piggin wrote:
> > OK, here is this patch again. You didn't think I'd let a 2% performance
> > improvement be forgotten? :)
> >
> > Anyway, patch won't work well on architecture without lwsync, but I won't
> > bother fixing that kind of thing and making it merge worthy until you
> > guys say something positive about it.
> >
> > 20 runs of tbench on the G5
> >
> > unpatched AVG=920.37 STD=2.36
> > patched   AVG=938.89 STD=3.33
> >
> > (throughput in MB/s) This is a 2.0% throughput increase.
>
> Definitely worth it, I believe. We could use a macro that uses Michael's new
> improvements on the CPU features code patching so that the isync gets
> changed to lwsync on some CPUs based on the availability of it.

OK. I guess the interesting part about this is that I can't find any
IBM documentation for lwsync capable CPUs that suggests using this pattern
for acquire locking.

It would be interesting to know whether it helps other CPUs...
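The "acquire locking" pattern discussed here (an atomic update loop followed by a barrier) can be sketched in portable C11 atomics. This is an analogy under stated assumptions, not the kernel's spinlock code: demo_spinlock and the demo_* functions are hypothetical names, the compare-and-exchange stands in for the lwarx/stwcx. loop, and the acquire fence stands in for the barrier that the patch switches from isync to lwsync.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} demo_spinlock;

/* Try once to take the lock. Returns true on success. */
bool demo_trylock(demo_spinlock *l)
{
    int expected = 0;

    /* Atomic 0 -> 1 transition, deliberately with no ordering of its own. */
    if (!atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                 memory_order_relaxed,
                                                 memory_order_relaxed))
        return false;

    /*
     * Acquire barrier after the successful update: accesses inside the
     * critical section cannot be reordered before taking the lock.
     */
    atomic_thread_fence(memory_order_acquire);
    return true;
}

/* Spin until the lock is taken. */
void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l))
        ;
}

/* Release store: critical-section accesses complete before the lock is freed. */
void demo_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Splitting the relaxed update from the fence mirrors the shape of the assembly in the patch, where the barrier is a separate instruction after the store-conditional loop. Which instruction a compiler emits for the acquire fence is target and compiler dependent, so the sketch says nothing about the relative cost of isync and lwsync.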
* Re: [patch 1/2] powerpc: optimise smp_mb
  2009-02-19 17:12 [patch 1/2] powerpc: optimise smp_mb Nick Piggin
  2009-02-19 17:21 ` [patch 2/2] powerpc: replace isync with lwsync Nick Piggin
@ 2009-03-04  4:03 ` Benjamin Herrenschmidt
  2009-03-04  9:38   ` Nick Piggin
  1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2009-03-04  4:03 UTC
  To: Nick Piggin; +Cc: linuxppc-dev, paulus

All right, sorry for the delay, I had those stored into my "need more
than half a brain cell for review" list and only got to them today :-)

On Thu, 2009-02-19 at 18:12 +0100, Nick Piggin wrote:
> Using lwsync, isync sequence in a microbenchmark is 5 times faster on my G5 than
> using sync for smp_mb. Although it takes more instructions.
>
> Running tbench with 4 clients on my 4 core G5 (20 times) gives the
> following:
>
> unpatched AVG=920.33 STD=2.36
> patched   AVG=921.27 STD=2.77
>
> So not a big improvement here, actually it could even be in the noise.
> But other workloads or systems might see a bigger win, and the patch
> maybe is interesting or could be improved, so I'll ask for comments.

So not a huge objection here, however I have some doubts as to whether
this will be worthwhile on power5,6,7 since those optimized somewhat the
behaviour of the full sync. Since anything older than power4 doesn't
have lwsync, that potentially makes it not worth the pain.

But I need to measure to be sure... it might be that newer embedded
processors that support lwsync and SMP (and that are using a different
pipeline structure) might benefit from this. I'll try to run some tests
later this week or next week, but ping me in case I forget.

Now what would be worth doing is to also try using a twi;isync sequence
like we do to order MMIO reads, see if it's any better than cmp/branch.

Cheers,
Ben.
* Re: [patch 1/2] powerpc: optimise smp_mb
  2009-03-04  4:03 ` [patch 1/2] powerpc: optimise smp_mb Benjamin Herrenschmidt
@ 2009-03-04  9:38   ` Nick Piggin
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2009-03-04 9:38 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus

On Wed, Mar 04, 2009 at 03:03:15PM +1100, Benjamin Herrenschmidt wrote:
> All right, sorry for the delay, I had those stored into my "need more
> than half a brain cell for review" list and only got to them today :-)

No problem :)

> On Thu, 2009-02-19 at 18:12 +0100, Nick Piggin wrote:
> > Using lwsync, isync sequence in a microbenchmark is 5 times faster on my G5 than
> > using sync for smp_mb. Although it takes more instructions.
> >
> > Running tbench with 4 clients on my 4 core G5 (20 times) gives the
> > following:
> >
> > unpatched AVG=920.33 STD=2.36
> > patched   AVG=921.27 STD=2.77
> >
> > So not a big improvement here, actually it could even be in the noise.
> > But other workloads or systems might see a bigger win, and the patch
> > maybe is interesting or could be improved, so I'll ask for comments.
>
> So not a huge objection here, however I have some doubts as to whether
> this will be worthwhile on power5,6,7 since those optimized somewhat the
> behaviour of the full sync. Since anything older than power4 doesn't
> have lwsync, that potentially makes it not worth the pain.

I would be interested to know. Avoiding sync when there *are* outstanding
IO operations happening should be a win? (My test of tbench on localhost
obviously wouldn't generate much MMIO).

I mean, even in the most optimised implementation possible, this sequence
is less constraining than sync.

> But I need to measure to be sure... it might be that newer embedded
> processors that support lwsync and SMP (and that are using a different
> pipeline structure) might benefit from this. I'll try to run some tests
> later this week or next week, but ping me in case I forget.

OK I'll ping you next week.

> Now what would be worth doing is to also try using a twi;isync sequence
> like we do to order MMIO reads, see if it's any better than cmp/branch.

Probably makes sense to use the same pattern.