linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [patch 1/2] powerpc: rmb fix
@ 2008-05-21 14:10 Nick Piggin
  2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
  2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
  0 siblings, 2 replies; 23+ messages in thread
From: Nick Piggin @ 2008-05-21 14:10 UTC (permalink / raw)
  To: paulus, linuxppc-dev

Hi,

I'm sure I've sent these patches before, but I can't remember why they
weren't merged. They still seem obviously correct to me.

--

lwsync is explicitly defined not to have any effect on the ordering of
accesses to device memory, so it cannot be used for rmb(). sync appears
to be the only barrier which fits the bill.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -34,7 +34,7 @@
  * SMP since it is only used to order updates to system memory.
  */
 #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
-#define rmb()  __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
+#define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define read_barrier_depends()  do { } while(0)
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 14:10 [patch 1/2] powerpc: rmb fix Nick Piggin
@ 2008-05-21 14:12 ` Nick Piggin
  2008-05-21 15:26   ` Benjamin Herrenschmidt
  2008-05-21 20:16   ` Segher Boessenkool
  2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
  1 sibling, 2 replies; 23+ messages in thread
From: Nick Piggin @ 2008-05-21 14:12 UTC (permalink / raw)
  To: paulus, linuxppc-dev


lwsync is the recommended method of store/store ordering on caching enabled
memory. For those subarchs which have lwsync, use it rather than eieio for
smp_wmb.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -30,8 +30,8 @@
  *
  * For wmb(), we use sync since wmb is used in drivers to order
  * stores to system memory with respect to writes to the device.
- * However, smp_wmb() can be a lighter-weight eieio barrier on
- * SMP since it is only used to order updates to system memory.
+ * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
+ * on SMP since it is only used to order updates to system memory.
  */
 #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
@@ -43,9 +43,16 @@
 #ifdef __KERNEL__
 #define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */
 #ifdef CONFIG_SMP
+
+#ifdef __SUBARCH_HAS_LWSYNC
+#    define SMPWMB      lwsync
+#else
+#    define SMPWMB      eieio
+#endif
+
 #define smp_mb()	mb()
 #define smp_rmb()	rmb()
-#define smp_wmb()	eieio()
+#define smp_wmb()	__asm__ __volatile__ (__stringify(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define smp_mb()	barrier()

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
@ 2008-05-21 15:26   ` Benjamin Herrenschmidt
  2008-05-21 15:34     ` Nick Piggin
  2008-05-21 20:16   ` Segher Boessenkool
  1 sibling, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:26 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev, paulus


On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> lwsync is the recommended method of store/store ordering on caching enabled
> memory. For those subarchs which have lwsync, use it rather than eieio for
> smp_wmb.

Yuck... existence of lwsync depends on the processor at boot time...

Ben.

> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/include/asm-powerpc/system.h
> ===================================================================
> --- linux-2.6.orig/include/asm-powerpc/system.h
> +++ linux-2.6/include/asm-powerpc/system.h
> @@ -30,8 +30,8 @@
>   *
>   * For wmb(), we use sync since wmb is used in drivers to order
>   * stores to system memory with respect to writes to the device.
> - * However, smp_wmb() can be a lighter-weight eieio barrier on
> - * SMP since it is only used to order updates to system memory.
> + * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
> + * on SMP since it is only used to order updates to system memory.
>   */
>  #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
>  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> @@ -43,9 +43,16 @@
>  #ifdef __KERNEL__
>  #define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */
>  #ifdef CONFIG_SMP
> +
> +#ifdef __SUBARCH_HAS_LWSYNC
> +#    define SMPWMB      lwsync
> +#else
> +#    define SMPWMB      eieio
> +#endif
> +
>  #define smp_mb()	mb()
>  #define smp_rmb()	rmb()
> -#define smp_wmb()	eieio()
> +#define smp_wmb()	__asm__ __volatile__ (__stringify(SMPWMB) : : :"memory")
>  #define smp_read_barrier_depends()	read_barrier_depends()
>  #else
>  #define smp_mb()	barrier()
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-21 14:10 [patch 1/2] powerpc: rmb fix Nick Piggin
  2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
@ 2008-05-21 15:27 ` Benjamin Herrenschmidt
  2008-05-21 15:32   ` Nick Piggin
  1 sibling, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:27 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev, paulus


On Wed, 2008-05-21 at 16:10 +0200, Nick Piggin wrote:
> Hi,
> 
> I'm sure I've sent these patches before, but I can't remember why they
> weren't merged. They still seem obviously correct to me.

We should already do all that's needed in our IO accessors no ?

> --
> 
> lwsync is explicitly defined not to have any effect on the ordering of
> accesses to device memory, so it cannot be used for rmb(). sync appears
> to be the only barrier which fits the bill.
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/include/asm-powerpc/system.h
> ===================================================================
> --- linux-2.6.orig/include/asm-powerpc/system.h
> +++ linux-2.6/include/asm-powerpc/system.h
> @@ -34,7 +34,7 @@
>   * SMP since it is only used to order updates to system memory.
>   */
>  #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
> -#define rmb()  __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> +#define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
>  #define read_barrier_depends()  do { } while(0)
>  
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
@ 2008-05-21 15:32   ` Nick Piggin
  2008-05-21 15:43     ` Benjamin Herrenschmidt
  2008-05-23  2:14     ` Paul Mackerras
  0 siblings, 2 replies; 23+ messages in thread
From: Nick Piggin @ 2008-05-21 15:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus

On Wed, May 21, 2008 at 11:27:03AM -0400, Benjamin Herrenschmidt wrote:
> 
> On Wed, 2008-05-21 at 16:10 +0200, Nick Piggin wrote:
> > Hi,
> > 
> > I'm sure I've sent these patches before, but I can't remember why they
> > weren't merged. They still seem obviously correct to me.
> 
> We should already do all that's needed in our IO accessors no ?

More than one device driver does raw/relaxed io accessors and expects the
*mb functions to order them.

> 
> > --
> > 
> > lwsync is explicitly defined not to have any effect on the ordering of
> > accesses to device memory, so it cannot be used for rmb(). sync appears
> > to be the only barrier which fits the bill.
> > 
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ---
> > Index: linux-2.6/include/asm-powerpc/system.h
> > ===================================================================
> > --- linux-2.6.orig/include/asm-powerpc/system.h
> > +++ linux-2.6/include/asm-powerpc/system.h
> > @@ -34,7 +34,7 @@
> >   * SMP since it is only used to order updates to system memory.
> >   */
> >  #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
> > -#define rmb()  __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> > +#define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> >  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> >  #define read_barrier_depends()  do { } while(0)
> >  
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@ozlabs.org
> > https://ozlabs.org/mailman/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 15:26   ` Benjamin Herrenschmidt
@ 2008-05-21 15:34     ` Nick Piggin
  2008-05-21 15:43       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2008-05-21 15:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus

On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
> 
> On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > lwsync is the recommended method of store/store ordering on caching enabled
> > memory. For those subarchs which have lwsync, use it rather than eieio for
> > smp_wmb.
> 
> Yuck... existence of lwsync depends on the processor at boot time...

Not according to the __stringify(LWSYNC) that I just removed. At least,
presumably it is always present on 64 bit processors, and 32 bit ones
will be no worse off as they'll continue just using eieio.

> 
> Ben.
> 
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ---
> > Index: linux-2.6/include/asm-powerpc/system.h
> > ===================================================================
> > --- linux-2.6.orig/include/asm-powerpc/system.h
> > +++ linux-2.6/include/asm-powerpc/system.h
> > @@ -30,8 +30,8 @@
> >   *
> >   * For wmb(), we use sync since wmb is used in drivers to order
> >   * stores to system memory with respect to writes to the device.
> > - * However, smp_wmb() can be a lighter-weight eieio barrier on
> > - * SMP since it is only used to order updates to system memory.
> > + * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
> > + * on SMP since it is only used to order updates to system memory.
> >   */
> >  #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
> >  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> > @@ -43,9 +43,16 @@
> >  #ifdef __KERNEL__
> >  #define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */
> >  #ifdef CONFIG_SMP
> > +
> > +#ifdef __SUBARCH_HAS_LWSYNC
> > +#    define SMPWMB      lwsync
> > +#else
> > +#    define SMPWMB      eieio
> > +#endif
> > +
> >  #define smp_mb()	mb()
> >  #define smp_rmb()	rmb()
> > -#define smp_wmb()	eieio()
> > +#define smp_wmb()	__asm__ __volatile__ (__stringify(SMPWMB) : : :"memory")
> >  #define smp_read_barrier_depends()	read_barrier_depends()
> >  #else
> >  #define smp_mb()	barrier()
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@ozlabs.org
> > https://ozlabs.org/mailman/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 15:34     ` Nick Piggin
@ 2008-05-21 15:43       ` Benjamin Herrenschmidt
  2008-05-21 15:47         ` Nick Piggin
  2008-05-21 16:01         ` Nick Piggin
  0 siblings, 2 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev, paulus


On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
> On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
> > 
> > On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > > lwsync is the recommended method of store/store ordering on caching enabled
> > > memory. For those subarchs which have lwsync, use it rather than eieio for
> > > smp_wmb.
> > 
> > Yuck... existence of lwsync depends on the processor at boot time...
> 
> Not according to the __stringify(LWSYNC) that I just removed. At least,
> presumably it is always present on 64 bit processors, and 32 bit ones
> will be no worse off as they'll continue just using eieio.

No, it doesn't exist on power3, but it degrades into a sync

Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-21 15:32   ` Nick Piggin
@ 2008-05-21 15:43     ` Benjamin Herrenschmidt
  2008-05-23  2:14     ` Paul Mackerras
  1 sibling, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev, paulus


On Wed, 2008-05-21 at 17:32 +0200, Nick Piggin wrote:
> > We should already do all that's needed in our IO accessors no ?
> 
> More than one device driver does raw/relaxed io accessors and expects
> the
> *mb functions to order them.

That's fishy... ok.

Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 15:43       ` Benjamin Herrenschmidt
@ 2008-05-21 15:47         ` Nick Piggin
  2008-05-21 16:02           ` Benjamin Herrenschmidt
  2008-05-21 16:01         ` Nick Piggin
  1 sibling, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2008-05-21 15:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus

On Wed, May 21, 2008 at 11:43:00AM -0400, Benjamin Herrenschmidt wrote:
> 
> On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
> > On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
> > > 
> > > On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > > > lwsync is the recommended method of store/store ordering on caching enabled
> > > > memory. For those subarchs which have lwsync, use it rather than eieio for
> > > > smp_wmb.
> > > 
> > > Yuck... existence of lwsync depends on the processor at boot time...
> > 
> > Not according to the __stringify(LWSYNC) that I just removed. At least,
> > presumably it is always present on 64 bit processors, and 32 bit ones
> > will be no worse off as they'll continue just using eieio.
> 
> No, it doesn't exist on power3, but it degrades into a sync

OK, but I just don't understand what the problem is... your synch.h has

#ifdef __powerpc64__
#define __SUBARCH_HAS_LWSYNC
#endif

#ifdef __SUBARCH_HAS_LWSYNC
#    define LWSYNC      lwsync
#else
#    define LWSYNC      sync
#endif


And LWSYNC is then used for rmb()... how was that OK but this not?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 15:43       ` Benjamin Herrenschmidt
  2008-05-21 15:47         ` Nick Piggin
@ 2008-05-21 16:01         ` Nick Piggin
  2008-05-21 20:12           ` Segher Boessenkool
  1 sibling, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2008-05-21 16:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus

On Wed, May 21, 2008 at 11:43:00AM -0400, Benjamin Herrenschmidt wrote:
> 
> On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
> > On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
> > > 
> > > On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > > > lwsync is the recommended method of store/store ordering on caching enabled
> > > > memory. For those subarchs which have lwsync, use it rather than eieio for
> > > > smp_wmb.
> > > 
> > > Yuck... existence of lwsync depends on the processor at boot time...
> > 
> > Not according to the __stringify(LWSYNC) that I just removed. At least,
> > presumably it is always present on 64 bit processors, and 32 bit ones
> > will be no worse off as they'll continue just using eieio.
> 
> No, it doesn't exist on power3, but it degrades into a sync

Oh, the instruction does exist, but it degrades to a sync so
actually turns out to be slower than eieio?

I think it would be a good idea just to take the hit on power3.
>From memory, I measured lwsync is 5 times faster than eieio on
a dual G5. This was on a simple microbenchmark that made use of
smp_wmb for store ordering, but it did not involve any IO access
(which presumably would disadvantage eieio further).

Given the G5 speedup, I'd be surprised if there is not an improvment
on POWER4 and 5 as well, although no idea about POWER6 or cell...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 15:47         ` Nick Piggin
@ 2008-05-21 16:02           ` Benjamin Herrenschmidt
  2008-05-21 20:51             ` Segher Boessenkool
  0 siblings, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 16:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev, paulus


On Wed, 2008-05-21 at 17:47 +0200, Nick Piggin wrote:
> 
> OK, but I just don't understand what the problem is... your synch.h
> has
> 
> #ifdef __powerpc64__
> #define __SUBARCH_HAS_LWSYNC
> #endif
> 
> #ifdef __SUBARCH_HAS_LWSYNC
> #    define LWSYNC      lwsync
> #else
> #    define LWSYNC      sync
> #endif
> 

This is mostly useless then since lwsync is just a sync to a processor
that doesn't know it (it's a sync with a reservd bit set) :-) Or it's
just to make gas happy if you specify a processor type that doesn't have
lwsync ?

Paul, do we have a case of a 32 bits CPU that chokes on the added bit ?

Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 16:01         ` Nick Piggin
@ 2008-05-21 20:12           ` Segher Boessenkool
  2008-05-21 20:44             ` Benjamin Herrenschmidt
  2008-05-22  0:30             ` Nick Piggin
  0 siblings, 2 replies; 23+ messages in thread
From: Segher Boessenkool @ 2008-05-21 20:12 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev, paulus

>> From memory, I measured lwsync is 5 times faster than eieio on
> a dual G5. This was on a simple microbenchmark that made use of
> smp_wmb for store ordering, but it did not involve any IO access
> (which presumably would disadvantage eieio further).

This is very much specific to your particular benchmark.

On the 970, there are two differences between lwsync and eieio:

1) lwsync cannot issue before all previous loads are done; eieio
does not have this restriction.

Then, they both fly through the execution core, it doesn't wait
for the barrier insn to complete in the storage system.  In both
cases, a barrier is inserted into both the L2 queues and the
non-cacheable queues.  These barriers are both removed at the
same time, that is, when both are the oldest in their queue and
have done their thing.

2) For eieio, the non-cacheable unit waits for all previous
(non-cacheable) stores to complete, and then arbitrates for the
bus and sends an EIEIO transaction.

Your benchmark doesn't do non-cacheable stores, so it would seem
the five-time slowdown is caused by that bus arbitration (and the
bus transaction).  Maybe your cacheable stores hit the bus as well,
that would make this worse.  Your benchmark also doesn't see the
negative effects from 1).

In "real" code, I expect 2) to be pretty much invisible (the store
queues will never be completely filled up), but 1) shouldn't be very
bad either.  So it's a wash.  But only a real benchmark will tell.

> Given the G5 speedup, I'd be surprised if there is not an improvment
> on POWER4 and 5 as well,

The 970 storage subsystem and the POWER4 one are very different.
Or maybe these queues are just about the last thing that _is_
identical, I dunno, there aren't public POWER4 docs for this ;-)

> although no idea about POWER6 or cell...

No idea about POWER6; for CBE, the backend works similar to the
970 one.

Given that the architecture says to use lwsync for cases like this,
it would be very surprising if it performed (much) worse than eieio,
eh? ;-)  So I think your patch is a win; just wanted to clarify on
your five-time slowdown number.

HTH,


Segher

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
  2008-05-21 15:26   ` Benjamin Herrenschmidt
@ 2008-05-21 20:16   ` Segher Boessenkool
  1 sibling, 0 replies; 23+ messages in thread
From: Segher Boessenkool @ 2008-05-21 20:16 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev, paulus

> +#ifdef __SUBARCH_HAS_LWSYNC
> +#    define SMPWMB      lwsync
> +#else
> +#    define SMPWMB      eieio
> +#endif
> +
>  #define smp_mb()	mb()
>  #define smp_rmb()	rmb()
> -#define smp_wmb()	eieio()
> +#define smp_wmb()	__asm__ __volatile__ (__stringify(SMPWMB) : : 
> :"memory")

SMPWMB is used only here.  Why not just

#ifdef __SUBARCH_HAS_LWSYNC
#define smp_wmb() lwsync()
#else
#define smp_wmb() eieio()
#endif

?


Segher

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 20:12           ` Segher Boessenkool
@ 2008-05-21 20:44             ` Benjamin Herrenschmidt
  2008-05-21 22:07               ` Segher Boessenkool
  2008-05-22  0:30             ` Nick Piggin
  1 sibling, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 20:44 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Nick Piggin, linuxppc-dev, paulus


On Wed, 2008-05-21 at 22:12 +0200, Segher Boessenkool wrote:
> No idea about POWER6; for CBE, the backend works similar to the
> 970 one.
> 
> Given that the architecture says to use lwsync for cases like this,
> it would be very surprising if it performed (much) worse than eieio,
> eh? ;-)  So I think your patch is a win; just wanted to clarify on
> your five-time slowdown number.

It makes sense to use lwsync rather than eieio for smb_wmb() as this is
not supposed to be used to order with cache inhibited storage. It's
really a data barrier used by the kernel for normal cacheable storage.

The main question is do we care if the downgrade to sync on power3 hurts
performances (and does it ?) and what do we do for 32 bits as currently,
no 32 bits implementation has lwsync afaik (though that might not be
true for long).

Also, we don't, I think, have verified that they all properly ignore the
added bit and behave as sync rather than program checking..

Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 16:02           ` Benjamin Herrenschmidt
@ 2008-05-21 20:51             ` Segher Boessenkool
  0 siblings, 0 replies; 23+ messages in thread
From: Segher Boessenkool @ 2008-05-21 20:51 UTC (permalink / raw)
  To: benh; +Cc: Nick Piggin, linuxppc-dev, paulus

> This is mostly useless then since lwsync is just a sync to a processor
> that doesn't know it (it's a sync with a reservd bit set) :-) Or it's
> just to make gas happy if you specify a processor type that doesn't 
> have
> lwsync ?

GAS doesn't care (I tried with -Wa,-m405).  Support for this insn
was added to binutils in august 2001; do we still support older
binutils than that?  If so, we should just do the ".long" trick in
a lwsync() inline function, like we do for similar cases.


Segher

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 20:44             ` Benjamin Herrenschmidt
@ 2008-05-21 22:07               ` Segher Boessenkool
  0 siblings, 0 replies; 23+ messages in thread
From: Segher Boessenkool @ 2008-05-21 22:07 UTC (permalink / raw)
  To: benh; +Cc: Nick Piggin, linuxppc-dev, paulus

> The main question is do we care if the downgrade to sync on power3 
> hurts
> performances (and does it ?) and what do we do for 32 bits as 
> currently,
> no 32 bits implementation has lwsync afaik (though that might not be
> true for long).

Some time ago, I benchmarked (*) a loop of "stw;sync" vs. "stw;eieio"
on a 750 (yeah, great micro-benchmark, heh); and the sync version
was a factor ~100 slower.

> Also, we don't, I think, have verified that they all properly ignore 
> the
> added bit and behave as sync rather than program checking..

The architecture books don't have a programming note about this
(like they do in similar cases), so either a) it works fine on
every implementation; or b) someone forgot to add such a note.

We shouldn't use lwsync on 32-bit (it's an awful performance hit
on "classic" cpus, and who knows what it does on embedded cpus
(bookE *shudder*)).  For POWER3 and *star, i.e., all 64-bit?  Perhaps
it'll be okay there, but if as you suggest some 32-bit CPUs will add
proper lwsync support soonish, maybe a feature fixup or some such is
in order.


Segher


(*) Not on purpose.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/2] powerpc: optimise smp_wmb
  2008-05-21 20:12           ` Segher Boessenkool
  2008-05-21 20:44             ` Benjamin Herrenschmidt
@ 2008-05-22  0:30             ` Nick Piggin
  1 sibling, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2008-05-22  0:30 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, paulus

On Wed, May 21, 2008 at 10:12:03PM +0200, Segher Boessenkool wrote:
> >>From memory, I measured lwsync is 5 times faster than eieio on
> >a dual G5. This was on a simple microbenchmark that made use of
> >smp_wmb for store ordering, but it did not involve any IO access
> >(which presumably would disadvantage eieio further).
> 
> This is very much specific to your particular benchmark.
> 
> On the 970, there are two differences between lwsync and eieio:
> 
> 1) lwsync cannot issue before all previous loads are done; eieio
> does not have this restriction.
> 
> Then, they both fly through the execution core, it doesn't wait
> for the barrier insn to complete in the storage system.  In both
> cases, a barrier is inserted into both the L2 queues and the
> non-cacheable queues.  These barriers are both removed at the
> same time, that is, when both are the oldest in their queue and
> have done their thing.
> 
> 2) For eieio, the non-cacheable unit waits for all previous
> (non-cacheable) stores to complete, and then arbitrates for the
> bus and sends an EIEIO transaction.
> 
> Your benchmark doesn't do non-cacheable stores, so it would seem
> the five-time slowdown is caused by that bus arbitration (and the
> bus transaction).  Maybe your cacheable stores hit the bus as well,
> that would make this worse.  Your benchmark also doesn't see the
> negative effects from 1).
> 
> In "real" code, I expect 2) to be pretty much invisible (the store
> queues will never be completely filled up), but 1) shouldn't be very
> bad either.  So it's a wash.  But only a real benchmark will tell.

OK, interesting thanks. Yes the "benchmark" is not a good one, but
it verified for me that there is a difference there. Combined with
IBM's documents saying lwsync is preferred for store/store ordering
is my rationale for sending the patch. A real benchmark would be nice
but it would probably be hard to notice any improvement.

 
> >Given the G5 speedup, I'd be surprised if there is not an improvment
> >on POWER4 and 5 as well,
> 
> The 970 storage subsystem and the POWER4 one are very different.
> Or maybe these queues are just about the last thing that _is_
> identical, I dunno, there aren't public POWER4 docs for this ;-)
> 
> >although no idea about POWER6 or cell...
> 
> No idea about POWER6; for CBE, the backend works similar to the
> 970 one.
> 
> Given that the architecture says to use lwsync for cases like this,
> it would be very surprising if it performed (much) worse than eieio,
> eh? ;-)  So I think your patch is a win; just wanted to clarify on
> your five-time slowdown number.

Sure, thanks!

Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-21 15:32   ` Nick Piggin
  2008-05-21 15:43     ` Benjamin Herrenschmidt
@ 2008-05-23  2:14     ` Paul Mackerras
  2008-05-23  4:40       ` Nick Piggin
  1 sibling, 1 reply; 23+ messages in thread
From: Paul Mackerras @ 2008-05-23  2:14 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev

Nick Piggin writes:

> More than one device driver does raw/relaxed io accessors and expects the
> *mb functions to order them.

Can you point us at an example?

I don't think the semantics of the raw accessors are particularly well
defined (it's not even defined what endianness the data comes back in)
so I'd be surprised if there was any real definition of their
behaviour w.r.t. barriers.

Paul.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-23  2:14     ` Paul Mackerras
@ 2008-05-23  4:40       ` Nick Piggin
  2008-05-23  4:53         ` Paul Mackerras
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2008-05-23  4:40 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On Fri, May 23, 2008 at 12:14:41PM +1000, Paul Mackerras wrote:
> Nick Piggin writes:
> 
> > More than one device driver does raw/relaxed io accessors and expects the
> > *mb functions to order them.
> 
> Can you point us at an example?

Uh, I might be getting confused because the semantics are completely
different eg. between powerpc and ia64.

There don't seem to actually be read*_relaxed calls that also use rmb
in the same file (although there is no reason why they might not appear).
But I must be thinking of are the raw_read accessors. They aren't ordered
on powerpc, and a few drivers appear to hope rmb() will order them.


 
> I don't think the semantics of the raw accessors are particularly well
> defined (it's not even defined what endianness the data comes back in)
> so I'd be surprised if there was any real definition of their
> behaviour w.r.t. barriers.

I think it is fair to say the intention of the _relaxed variants is
that they require memory barriers for ordering. No idea about raw read
or write but it appears that at least on powerpc and alpha they are
unordered.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-23  4:40       ` Nick Piggin
@ 2008-05-23  4:53         ` Paul Mackerras
  2008-05-23  5:48           ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Mackerras @ 2008-05-23  4:53 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev

Nick Piggin writes:

> There don't seem to actually be read*_relaxed calls that also use rmb
> in the same file (although there is no reason why they might not appear).
> But I must be thinking of are the raw_read accessors. They aren't ordered
> on powerpc, and a few drivers appear to hope rmb() will order them.

Which drivers?

Paul.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-23  4:53         ` Paul Mackerras
@ 2008-05-23  5:48           ` Nick Piggin
  2008-05-23  6:40             ` Paul Mackerras
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2008-05-23  5:48 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On Fri, May 23, 2008 at 02:53:21PM +1000, Paul Mackerras wrote:
> Nick Piggin writes:
> 
> > There don't seem to actually be read*_relaxed calls that also use rmb
> > in the same file (although there is no reason why they might not appear).
> > But I must be thinking of are the raw_read accessors. They aren't ordered
> > on powerpc, and a few drivers appear to hope rmb() will order them.
> 
> Which drivers?

net/macb.c, net/sfc/falcon_io.c... hmm, I thought there was a couple
more but maybe I was thinking of write side.

Anyway, even if there were zero, then the point is still that you
implement that API, so you should either strongly order your
__raw_ and _relaxed then you can weaken your rmb, or you have to
strengthen your rmb to match your weak read ops.

Saying it doesn't matter because there are no drivers is likely to
cause more headaches in future...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-23  5:48           ` Nick Piggin
@ 2008-05-23  6:40             ` Paul Mackerras
  2008-05-26  1:38               ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Mackerras @ 2008-05-23  6:40 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linuxppc-dev

Nick Piggin writes:

> Anyway, even if there were zero, then the point is still that you
> implement that API, so you should either strongly order your
> __raw_ and _relaxed then you can weaken your rmb, or you have to
> strengthen your rmb to match your weak read ops.
> 
> Saying it doesn't matter because there are no drivers is likely to
> cause more headaches in future...

Well, it seems that rmb is used in several ways:

(1) to order reads from normal memory w.r.t. each other
(2) to order readl et al. w.r.t. reads from normal memory
(3) to order readl_relaxed et al. w.r.t. reads from normal memory

and I want to get a feeling for how prevalent each type of use is.

Now, the instances of (1) can presumably be converted to smb_rmb().
The instances of (2) aren't necessary on powerpc because readl
etc. are fully synchronous -- but that mightn't be true on other
platforms.  And for the instances of (3), powerpc needs to use sync,
as you say.

If the instances of (2) have to stay, then I would rather try to work
out some way that rmb doesn't have to be a sync, or at least doesn't
always have to be a sync.  And that might mean redefining the meaning
of rmb so that the instances of (3) have to be changed to something
else.  Since (3) seems to be rarer than (2), it would presumably be
(3) that we would change rather than (2).

Paul.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/2] powerpc: rmb fix
  2008-05-23  6:40             ` Paul Mackerras
@ 2008-05-26  1:38               ` Nick Piggin
  0 siblings, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2008-05-26  1:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On Fri, May 23, 2008 at 04:40:21PM +1000, Paul Mackerras wrote:
> Nick Piggin writes:
> 
> > Anyway, even if there were zero, then the point is still that you
> > implement that API, so you should either strongly order your
> > __raw_ and _relaxed then you can weaken your rmb, or you have to
> > strengthen your rmb to match your weak read ops.
> > 
> > Saying it doesn't matter because there are no drivers is likely to
> > cause more headaches in future...
> 
> Well, it seems that rmb is used in several ways:
> 
> (1) to order reads from normal memory w.r.t. each other
> (2) to order readl et al. w.r.t. reads from normal memory
> (3) to order readl_relaxed et al. w.r.t. reads from normal memory
> 
> and I want to get a feeling for how prevalent each type of use is.
> 
> Now, the instances of (1) can presumably be converted to smb_rmb().

Right.


> The instances of (2) aren't necessary on powerpc because readl
> etc. are fully synchronous -- but that mightn't be true on other
> platforms.a

Yes... this is one place where a dedicated io_rmb() etc will
help on powerpc (as per my other proposal).


>  And for the instances of (3), powerpc needs to use sync,
> as you say.
> 
> If the instances of (2) have to stay, then I would rather try to work
> out some way that rmb doesn't have to be a sync, or at least doesn't
> always have to be a sync.  And that might mean redefining the meaning
> of rmb so that the instances of (3) have to be changed to something
> else.  Since (3) seems to be rarer than (2), it would presumably be
> (3) that we would change rather than (2).

That would be nice, but the problem is that if you weaken anything,
then technically you need to audit everything. And if you audit
everything anyway, you may as well retain existing semantics of rmb,
and introduce a new primitive, io_rmb... this is the reasoning I
use when I think the ia64 guys went the wrong way by weakening
wmb and introducing mmiowb.

Would the io_*mb() proposal suit you? It should also be able to
improve your IO write/write and read/write write/read performance
too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2008-05-26  1:38 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-21 14:10 [patch 1/2] powerpc: rmb fix Nick Piggin
2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
2008-05-21 15:26   ` Benjamin Herrenschmidt
2008-05-21 15:34     ` Nick Piggin
2008-05-21 15:43       ` Benjamin Herrenschmidt
2008-05-21 15:47         ` Nick Piggin
2008-05-21 16:02           ` Benjamin Herrenschmidt
2008-05-21 20:51             ` Segher Boessenkool
2008-05-21 16:01         ` Nick Piggin
2008-05-21 20:12           ` Segher Boessenkool
2008-05-21 20:44             ` Benjamin Herrenschmidt
2008-05-21 22:07               ` Segher Boessenkool
2008-05-22  0:30             ` Nick Piggin
2008-05-21 20:16   ` Segher Boessenkool
2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
2008-05-21 15:32   ` Nick Piggin
2008-05-21 15:43     ` Benjamin Herrenschmidt
2008-05-23  2:14     ` Paul Mackerras
2008-05-23  4:40       ` Nick Piggin
2008-05-23  4:53         ` Paul Mackerras
2008-05-23  5:48           ` Nick Piggin
2008-05-23  6:40             ` Paul Mackerras
2008-05-26  1:38               ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).