* [patch] powerpc: smp_wmb lwsync optimisation fix @ 2008-11-01 12:33 Nick Piggin 2008-11-01 13:05 ` [patch] powerpc: rmp_wmb lwsync optimisation Nick Piggin ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Nick Piggin @ 2008-11-01 12:33 UTC (permalink / raw) To: paulus, benh, Anton Blanchard, linuxppc-dev A previous change removed __SUBARCH_HAS_LWSYNC define, and replaced it with __powerpc64__. smp_wmb() seems to be the last place not updated. Signed-off-by: Nick Piggin <npiggin@suse.de> --- Index: linux-2.6/arch/powerpc/include/asm/system.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-01 20:31:51.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/system.h 2008-11-01 20:32:33.000000000 +1100 @@ -44,7 +44,7 @@ #define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */ #ifdef CONFIG_SMP -#ifdef __SUBARCH_HAS_LWSYNC +#if defined(__powerpc64__) # define SMPWMB lwsync #else # define SMPWMB eieio ^ permalink raw reply [flat|nested] 7+ messages in thread
* [patch] powerpc: rmp_wmb lwsync optimisation 2008-11-01 12:33 [patch] powerpc: smp_wmb lwsync optimisation fix Nick Piggin @ 2008-11-01 13:05 ` Nick Piggin 2008-11-01 13:07 ` [rfc][patch] powerpc: replace isync with lwsync Nick Piggin 2008-11-01 16:47 ` [patch] powerpc: smp_wmb lwsync optimisation fix Kumar Gala 2 siblings, 0 replies; 7+ messages in thread From: Nick Piggin @ 2008-11-01 13:05 UTC (permalink / raw) To: paulus, benh, Anton Blanchard, linuxppc-dev smp_rmb can be lwsync if possible. Clarify the comment. Signed-off-by: Nick Piggin <npiggin@suse.de> --- Index: linux-2.6/arch/powerpc/include/asm/system.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-01 23:56:39.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/system.h 2008-11-02 00:02:46.000000000 +1100 @@ -23,15 +23,17 @@ * read_barrier_depends() prevents data-dependent loads being reordered * across this point (nop on PPC). * - * We have to use the sync instructions for mb(), since lwsync doesn't - * order loads with respect to previous stores. Lwsync is fine for - * rmb(), though. Note that rmb() actually uses a sync on 32-bit - * architectures. + * *mb() variants without smp_ prefix must order all types of memory + * operations with one another. sync is the only instruction sufficient + * to do this. * - * For wmb(), we use sync since wmb is used in drivers to order - * stores to system memory with respect to writes to the device. - * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier - * on SMP since it is only used to order updates to system memory. + * For the smp_ barriers, ordering is for cacheable memory operations + * only. We have to use the sync instruction for smp_mb(), since lwsync + * doesn't order loads with respect to previous stores. Lwsync can be + * used for smp_rmb() and smp_wmb(). + * + * However, on 32-bit, lwsync is actually just a sync, in which case smp_wmb() + * can be a lighter-weight eieio barrier. */ #define mb() __asm__ __volatile__ ("sync" : : : "memory") #define rmb() __asm__ __volatile__ ("sync" : : : "memory") @@ -51,7 +53,7 @@ #endif #define smp_mb() mb() -#define smp_rmb() rmb() +#define smp_rmb() __asm__ __volatile__(LWSYNC_ON_SMP : : : "memory") #define smp_wmb() __asm__ __volatile__ (__stringify(SMPWMB) : : :"memory") #define smp_read_barrier_depends() read_barrier_depends() #else ^ permalink raw reply [flat|nested] 7+ messages in thread
* [rfc][patch] powerpc: replace isync with lwsync 2008-11-01 12:33 [patch] powerpc: smp_wmb lwsync optimisation fix Nick Piggin 2008-11-01 13:05 ` [patch] powerpc: rmp_wmb lwsync optimisation Nick Piggin @ 2008-11-01 13:07 ` Nick Piggin 2008-11-03 5:32 ` Paul Mackerras 2008-11-01 16:47 ` [patch] powerpc: smp_wmb lwsync optimisation fix Kumar Gala 2 siblings, 1 reply; 7+ messages in thread From: Nick Piggin @ 2008-11-01 13:07 UTC (permalink / raw) To: paulus, benh, Anton Blanchard, linuxppc-dev Hi guys, This is an interesting one for me. AFAIKS it is possible to use lwsync for a full barrier after a successful ll/sc operation, right? (or stop me here if I'm wrong). Anyway, I was interested in exploring this. Unfortunately my G5 might not be very indicative of more modern, and future developments in high end powerpc CPUs... it would be interesting to get opinion and verification from insiders. OK, on my G5, using lwsync instead of isync in spinlocks is a bit faster in really stupid userspace microbenchmark (4 threads, looping, locking, incrementing shared variable, unlocking). This prompted me to look at bit further. So I converted a significant number of isyncs in the kernel to lwsync. The resulting kernel (on 2 core, 2 socket system) ran tbench consistently about 1.75% faster than unpatched (avg ~934MB/s vs ~918MB/s) (Tbench was just the first benchmark I picked that could run really quickly and give relatively stable numbers). This seems pretty significant. More than I was expecting. I've attached the patch I used (I've not thoroughly audited the code for all users of isync, only replaced some main ones) Now I'd like to know why this is faster, whether it makes sense to to, whether it helps with more useful workloads and modern systems. isync followed by a branch I guess does something like puts a bubble into the pipeline until the branch retires? So it is probably always going to cost some cycles. lwsync on the other hand, I suppose has to do a bit more when it comes to the store queue. Maybe flush it or insert a barrier or something into it. Also has some ordering of loads, but effectively no more than isync AFAIKS. Thanks, Nick --- Index: linux-2.6/arch/powerpc/include/asm/atomic.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/atomic.h 2008-11-01 20:36:12.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/atomic.h 2008-11-01 20:36:33.000000000 +1100 @@ -55,7 +55,7 @@ PPC405_ERR77(0,%2) " stwcx. %0,0,%2 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (a), "r" (&v->counter) : "cc", "memory"); @@ -91,7 +91,7 @@ PPC405_ERR77(0,%2) " stwcx. %0,0,%2 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (a), "r" (&v->counter) : "cc", "memory"); @@ -125,7 +125,7 @@ PPC405_ERR77(0,%1) " stwcx. %0,0,%1 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (&v->counter) : "cc", "memory"); @@ -169,7 +169,7 @@ PPC405_ERR77(0,%1) " stwcx. %0,0,%1\n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (&v->counter) : "cc", "memory"); @@ -202,7 +202,7 @@ PPC405_ERR77(0,%2) " stwcx. %0,0,%1 \n\ bne- 1b \n" - ISYNC_ON_SMP + LWSYNC_ON_SMP " subf %0,%2,%0 \n\ 2:" : "=&r" (t) @@ -235,7 +235,7 @@ PPC405_ERR77(0,%1) " stwcx. %0,0,%1\n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP "\n\ 2:" : "=&b" (t) : "r" (&v->counter) @@ -293,7 +293,7 @@ add %0,%1,%0\n\ stdcx. %0,0,%2 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (a), "r" (&v->counter) : "cc", "memory"); @@ -327,7 +327,7 @@ subf %0,%1,%0\n\ stdcx. %0,0,%2 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (a), "r" (&v->counter) : "cc", "memory"); @@ -359,7 +359,7 @@ addic %0,%0,1\n\ stdcx. %0,0,%1 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (&v->counter) : "cc", "memory"); @@ -401,7 +401,7 @@ addic %0,%0,-1\n\ stdcx. %0,0,%1\n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (t) : "r" (&v->counter) : "cc", "memory"); @@ -427,7 +427,7 @@ blt- 2f\n\ stdcx. %0,0,%1\n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP "\n\ 2:" : "=&r" (t) : "r" (&v->counter) @@ -460,7 +460,7 @@ add %0,%2,%0 \n" " stdcx. %0,0,%1 \n\ bne- 1b \n" - ISYNC_ON_SMP + LWSYNC_ON_SMP " subf %0,%2,%0 \n\ 2:" : "=&r" (t) Index: linux-2.6/arch/powerpc/include/asm/bitops.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/bitops.h 2008-11-01 20:36:12.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/bitops.h 2008-11-01 20:36:40.000000000 +1100 @@ -139,7 +139,7 @@ PPC405_ERR77(0,%3) PPC_STLCX "%1,0,%3 \n" "bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (old), "=&r" (t) : "r" (mask), "r" (p) : "cc", "memory"); @@ -160,7 +160,7 @@ PPC405_ERR77(0,%3) PPC_STLCX "%1,0,%3 \n" "bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (old), "=&r" (t) : "r" (mask), "r" (p) : "cc", "memory"); @@ -182,7 +182,7 @@ PPC405_ERR77(0,%3) PPC_STLCX "%1,0,%3 \n" "bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (old), "=&r" (t) : "r" (mask), "r" (p) : "cc", "memory"); @@ -204,7 +204,7 @@ PPC405_ERR77(0,%3) PPC_STLCX "%1,0,%3 \n" "bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (old), "=&r" (t) : "r" (mask), "r" (p) : "cc", "memory"); Index: linux-2.6/arch/powerpc/include/asm/futex.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/futex.h 2008-11-01 20:36:12.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/futex.h 2008-11-01 20:36:45.000000000 +1100 @@ -97,7 +97,7 @@ PPC405_ERR77(0,%2) "2: stwcx. %4,0,%2\n\ bne- 1b\n" - ISYNC_ON_SMP + LWSYNC_ON_SMP "3: .section .fixup,\"ax\"\n\ 4: li %0,%5\n\ b 3b\n\ Index: linux-2.6/arch/powerpc/include/asm/spinlock.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/spinlock.h 2008-11-01 20:34:45.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/spinlock.h 2008-11-01 20:35:27.000000000 +1100 @@ -65,7 +65,7 @@ bne- 2f\n\ stwcx. %1,0,%2\n\ bne- 1b\n\ - isync\n\ + lwsync\n\ 2:" : "=&r" (tmp) : "r" (token), "r" (&lock->slock) : "cr0", "memory"); @@ -193,7 +193,7 @@ PPC405_ERR77(0,%1) " stwcx. %0,0,%1\n\ bne- 1b\n\ - isync\n\ + lwsync\n\ 2:" : "=&r" (tmp) : "r" (&rw->lock) : "cr0", "xer", "memory"); @@ -217,7 +217,7 @@ PPC405_ERR77(0,%1) " stwcx. %1,0,%2\n\ bne- 1b\n\ - isync\n\ + lwsync\n\ 2:" : "=&r" (tmp) : "r" (token), "r" (&rw->lock) : "cr0", "memory"); Index: linux-2.6/arch/powerpc/include/asm/system.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-01 20:34:45.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/system.h 2008-11-01 20:36:54.000000000 +1100 @@ -235,7 +235,7 @@ PPC405_ERR77(0,%2) " stwcx. %3,0,%2 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (prev), "+m" (*(volatile unsigned int *)p) : "r" (p), "r" (val) : "cc", "memory"); @@ -278,7 +278,7 @@ PPC405_ERR77(0,%2) " stdcx. %3,0,%2 \n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP : "=&r" (prev), "+m" (*(volatile unsigned long *)p) : "r" (p), "r" (val) : "cc", "memory"); @@ -371,7 +371,7 @@ PPC405_ERR77(0,%2) " stwcx. %4,0,%2\n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP "\n\ 2:" : "=&r" (prev), "+m" (*p) @@ -416,7 +416,7 @@ bne- 2f\n\ stdcx. %4,0,%2\n\ bne- 1b" - ISYNC_ON_SMP + LWSYNC_ON_SMP "\n\ 2:" : "=&r" (prev), "+m" (*p) Index: linux-2.6/arch/powerpc/include/asm/synch.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/synch.h 2008-11-01 20:34:45.000000000 +1100 +++ linux-2.6/arch/powerpc/include/asm/synch.h 2008-11-01 20:39:42.000000000 +1100 @@ -34,7 +34,7 @@ #ifdef CONFIG_SMP #define ISYNC_ON_SMP "\n\tisync\n" -#define LWSYNC_ON_SMP stringify_in_c(LWSYNC) "\n" +#define LWSYNC_ON_SMP "\n\t" stringify_in_c(LWSYNC) "\n" #else #define ISYNC_ON_SMP #define LWSYNC_ON_SMP ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] powerpc: replace isync with lwsync 2008-11-01 13:07 ` [rfc][patch] powerpc: replace isync with lwsync Nick Piggin @ 2008-11-03 5:32 ` Paul Mackerras 2008-11-03 8:31 ` Nick Piggin 0 siblings, 1 reply; 7+ messages in thread From: Paul Mackerras @ 2008-11-03 5:32 UTC (permalink / raw) To: Nick Piggin; +Cc: Anton Blanchard, linuxppc-dev Nick Piggin writes: > This is an interesting one for me. AFAIKS it is possible to use lwsync for > a full barrier after a successful ll/sc operation, right? (or stop me here > if I'm wrong). An lwsync would order subsequent loads after the lwarx/ldarx, and subsequent stores after the stcwx./stcdx., which should be good enough. > isync followed by a branch I guess does something like puts a bubble > into the pipeline until the branch retires? So it is probably always > going to cost some cycles. I don't know about "retires", but isync is going to stop following instructions from executing until the outcome of the branch is known. On machines that don't have lwsync we will still want to use isync (since the other alternative would be the full heavyweight sync). Your patch doesn't seem to do that. Paul. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] powerpc: replace isync with lwsync 2008-11-03 5:32 ` Paul Mackerras @ 2008-11-03 8:31 ` Nick Piggin 0 siblings, 0 replies; 7+ messages in thread From: Nick Piggin @ 2008-11-03 8:31 UTC (permalink / raw) To: Paul Mackerras; +Cc: Anton Blanchard, linuxppc-dev On Mon, Nov 03, 2008 at 04:32:22PM +1100, Paul Mackerras wrote: > Nick Piggin writes: > > > This is an interesting one for me. AFAIKS it is possible to use lwsync for > > a full barrier after a successful ll/sc operation, right? (or stop me here > > if I'm wrong). > > An lwsync would order subsequent loads after the lwarx/ldarx, and > subsequent stores after the stcwx./stcdx., which should be good > enough. OK, thanks for confirmation. > > isync followed by a branch I guess does something like puts a bubble > > into the pipeline until the branch retires? So it is probably always > > going to cost some cycles. > > I don't know about "retires", but isync is going to stop following > instructions from executing until the outcome of the branch is known. OK, I probably don't use the right terminology. I assume the branch retires when its outcome is known and the CPU starts executing the result. > On machines that don't have lwsync we will still want to use isync > (since the other alternative would be the full heavyweight sync). > Your patch doesn't seem to do that. No, it's just a quick hack at the moment. I think your reply gets it past the not-totally-broken stage :) But at this point I can't justify sending such a change upstream based on a small improvement on G5. I would like to know about newer POWER CPUs, and even unreleased ones. If there is some reason lwsync gets relatively more constraining or difficult to execute than isync, then maybe this change is not useful. Thanks, Nick ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] powerpc: smp_wmb lwsync optimisation fix 2008-11-01 12:33 [patch] powerpc: smp_wmb lwsync optimisation fix Nick Piggin 2008-11-01 13:05 ` [patch] powerpc: rmp_wmb lwsync optimisation Nick Piggin 2008-11-01 13:07 ` [rfc][patch] powerpc: replace isync with lwsync Nick Piggin @ 2008-11-01 16:47 ` Kumar Gala 2008-11-02 1:42 ` Nick Piggin 2 siblings, 1 reply; 7+ messages in thread From: Kumar Gala @ 2008-11-01 16:47 UTC (permalink / raw) To: Nick Piggin; +Cc: paulus, Anton Blanchard, linuxppc-dev On Nov 1, 2008, at 7:33 AM, Nick Piggin wrote: > A previous change removed __SUBARCH_HAS_LWSYNC define, and replaced it > with __powerpc64__. smp_wmb() seems to be the last place not updated. Uugh... no.. I missed the patch that removed __SUBARCH_HAS_LWSYNC, but thats no good. We have LWSYNC on non-powerpc64 machines. Will go figure out who forgets we have ppc32 machines :) - k > > > Signed-off-by: Nick Piggin <npiggin@suse.de> > --- > Index: linux-2.6/arch/powerpc/include/asm/system.h > =================================================================== > --- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-01 > 20:31:51.000000000 +1100 > +++ linux-2.6/arch/powerpc/include/asm/system.h 2008-11-01 > 20:32:33.000000000 +1100 > @@ -44,7 +44,7 @@ > #define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */ > #ifdef CONFIG_SMP > > -#ifdef __SUBARCH_HAS_LWSYNC > +#if defined(__powerpc64__) > # define SMPWMB lwsync > #else > # define SMPWMB eieio > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] powerpc: smp_wmb lwsync optimisation fix 2008-11-01 16:47 ` [patch] powerpc: smp_wmb lwsync optimisation fix Kumar Gala @ 2008-11-02 1:42 ` Nick Piggin 0 siblings, 0 replies; 7+ messages in thread From: Nick Piggin @ 2008-11-02 1:42 UTC (permalink / raw) To: Kumar Gala; +Cc: paulus, Anton Blanchard, linuxppc-dev On Sat, Nov 01, 2008 at 11:47:58AM -0500, Kumar Gala wrote: > > On Nov 1, 2008, at 7:33 AM, Nick Piggin wrote: > > >A previous change removed __SUBARCH_HAS_LWSYNC define, and replaced it > >with __powerpc64__. smp_wmb() seems to be the last place not updated. > > Uugh... no.. I missed the patch that removed __SUBARCH_HAS_LWSYNC, but > thats no good. We have LWSYNC on non-powerpc64 machines. Will go > figure out who forgets we have ppc32 machines :) I think it may have been you :) But actually, SMPWMB would need more massaging before it is suitable, by the looks of your LWSYNC define in synch.h. I don't mind, so much, about how this gets fixed. But it would be nice to fix it somehow. After the smp_rmb and smp_wmb patches, there are practically no sync instructions left in mm/ :) > > - k > > > > > > >Signed-off-by: Nick Piggin <npiggin@suse.de> > >--- > >Index: linux-2.6/arch/powerpc/include/asm/system.h > >=================================================================== > >--- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-01 > >20:31:51.000000000 +1100 > >+++ linux-2.6/arch/powerpc/include/asm/system.h 2008-11-01 > >20:32:33.000000000 +1100 > >@@ -44,7 +44,7 @@ > >#define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */ > >#ifdef CONFIG_SMP > > > >-#ifdef __SUBARCH_HAS_LWSYNC > >+#if defined(__powerpc64__) > ># define SMPWMB lwsync > >#else > ># define SMPWMB eieio > >_______________________________________________ > >Linuxppc-dev mailing list > >Linuxppc-dev@ozlabs.org > >https://ozlabs.org/mailman/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-11-03 8:31 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-11-01 12:33 [patch] powerpc: smp_wmb lwsync optimisation fix Nick Piggin 2008-11-01 13:05 ` [patch] powerpc: rmp_wmb lwsync optimisation Nick Piggin 2008-11-01 13:07 ` [rfc][patch] powerpc: replace isync with lwsync Nick Piggin 2008-11-03 5:32 ` Paul Mackerras 2008-11-03 8:31 ` Nick Piggin 2008-11-01 16:47 ` [patch] powerpc: smp_wmb lwsync optimisation fix Kumar Gala 2008-11-02 1:42 ` Nick Piggin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).