* [patch 1/2] powerpc: rmb fix
@ 2007-08-21 2:11 Nick Piggin
2007-08-21 19:07 ` Joel Schopp
0 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2007-08-21 2:11 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Paul Mackerras
In the interest of completeness, I'll split these patches up and submit to
the powerpc dev list. Any discussion or ack/nack would be appreciated.
---
lwsync is defined to only order memory operations on cacheable memory.
A full sync appears to be the only barrier that will order all memory
loads including device memory.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -33,7 +33,7 @@
* SMP since it is only used to order updates to system memory.
*/
#define mb() __asm__ __volatile__ ("sync" : : : "memory")
-#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
+#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
#define read_barrier_depends() do { } while(0)
@@ -42,7 +42,7 @@
#ifdef __KERNEL__
#ifdef CONFIG_SMP
#define smp_mb() mb()
-#define smp_rmb() rmb()
+#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
#define smp_wmb() eieio()
#define smp_read_barrier_depends() read_barrier_depends()
#else
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 2:11 Nick Piggin
@ 2007-08-21 19:07 ` Joel Schopp
2007-08-21 19:43 ` Segher Boessenkool
0 siblings, 1 reply; 36+ messages in thread
From: Joel Schopp @ 2007-08-21 19:07 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define read_barrier_depends() do { } while(0)
>
> @@ -42,7 +42,7 @@
> #ifdef __KERNEL__
> #ifdef CONFIG_SMP
> #define smp_mb() mb()
> -#define smp_rmb() rmb()
> +#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> #define smp_wmb() eieio()
> #define smp_read_barrier_depends() read_barrier_depends()
> #else
I had to think about this one for awhile. It looks at first glance to be the right
thing to do. But I do wonder how long rmb() has been lwsync and if as a practical
matter that has caused any problems? If this isn't causing any problems maybe there
is some loigic we are overlooking?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:07 ` Joel Schopp
@ 2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-21 19:43 UTC (permalink / raw)
To: Joel Schopp; +Cc: Nick Piggin, linuxppc-dev, Paul Mackerras
>> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
>> -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
>> "memory")
>> +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
>> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
>> #define read_barrier_depends() do { } while(0)
>>
>> @@ -42,7 +42,7 @@
>> #ifdef __KERNEL__
>> #ifdef CONFIG_SMP
>> #define smp_mb() mb()
>> -#define smp_rmb() rmb()
>> +#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
>> "memory")
>> #define smp_wmb() eieio()
>> #define smp_read_barrier_depends() read_barrier_depends()
>> #else
>
> I had to think about this one for awhile. It looks at first glance to
> be the right
> thing to do. But I do wonder how long rmb() has been lwsync
Since the {ppc,ppc64} -> powerpc merge.
> and if as a practical matter that has caused any problems?
It has not as far as I know.
> If this isn't causing any problems maybe there
> is some loigic we are overlooking?
The I/O accessor functions enforce the necessary ordering
already I believe.
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:43 ` Segher Boessenkool
@ 2007-08-21 21:42 ` Linas Vepstas
2007-08-22 1:16 ` Nick Piggin
2007-08-22 3:15 ` Nick Piggin
2 siblings, 0 replies; 36+ messages in thread
From: Linas Vepstas @ 2007-08-21 21:42 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: Nick Piggin, Paul Mackerras, linuxppc-dev
On Tue, Aug 21, 2007 at 09:43:17PM +0200, Segher Boessenkool wrote:
> >> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> >> -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >> "memory")
> >> +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define read_barrier_depends() do { } while(0)
> >>
> >> @@ -42,7 +42,7 @@
> >> #ifdef __KERNEL__
> >> #ifdef CONFIG_SMP
> >> #define smp_mb() mb()
> >> -#define smp_rmb() rmb()
> >> +#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >> "memory")
> >> #define smp_wmb() eieio()
> >> #define smp_read_barrier_depends() read_barrier_depends()
> >> #else
> >
> > I had to think about this one for awhile. It looks at first glance to
> > be the right
> > thing to do. But I do wonder how long rmb() has been lwsync
>
> Since the {ppc,ppc64} -> powerpc merge.
>
> > and if as a practical matter that has caused any problems?
>
> It has not as far as I know.
>
> > If this isn't causing any problems maybe there
> > is some loigic we are overlooking?
>
> The I/O accessor functions enforce the necessary ordering
> already I believe.
So, is this patch desirable?
--linas
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
@ 2007-08-22 1:16 ` Nick Piggin
2007-08-22 3:29 ` Segher Boessenkool
2007-08-22 3:15 ` Nick Piggin
2 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2007-08-22 1:16 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Tue, Aug 21, 2007 at 09:43:17PM +0200, Segher Boessenkool wrote:
> >> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> >>-#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >>+#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define read_barrier_depends() do { } while(0)
> >>
> >>@@ -42,7 +42,7 @@
> >> #ifdef __KERNEL__
> >> #ifdef CONFIG_SMP
> >> #define smp_mb() mb()
> >>-#define smp_rmb() rmb()
> >>+#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >> #define smp_wmb() eieio()
> >> #define smp_read_barrier_depends() read_barrier_depends()
> >> #else
> >
> >I had to think about this one for awhile. It looks at first glance to
> >be the right
> >thing to do. But I do wonder how long rmb() has been lwsync
>
> Since the {ppc,ppc64} -> powerpc merge.
>
> >and if as a practical matter that has caused any problems?
>
> It has not as far as I know.
>
> >If this isn't causing any problems maybe there
> >is some loigic we are overlooking?
>
> The I/O accessor functions enforce the necessary ordering
> already I believe.
Ah, it looks like you might be right, IO should appear to go in-order, in
which case the rmb() would simply need to order cacheable loads. Interesting
way to do things... are drivers simply not up to scratch enough to allow
out of order IO?
Anyway, this raises another question -- if IO accessors have the right
ordering, why is wmb() not an lwsync as well? There appears to be many
more wmb() calls than rmb()...
Thanks,
Nick
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
2007-08-22 1:16 ` Nick Piggin
@ 2007-08-22 3:15 ` Nick Piggin
2007-08-22 3:33 ` Segher Boessenkool
2 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2007-08-22 3:15 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Tue, Aug 21, 2007 at 09:43:17PM +0200, Segher Boessenkool wrote:
> >> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> >>-#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >>+#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define read_barrier_depends() do { } while(0)
> >>
> >>@@ -42,7 +42,7 @@
> >> #ifdef __KERNEL__
> >> #ifdef CONFIG_SMP
> >> #define smp_mb() mb()
> >>-#define smp_rmb() rmb()
> >>+#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >> #define smp_wmb() eieio()
> >> #define smp_read_barrier_depends() read_barrier_depends()
> >> #else
> >
> >I had to think about this one for awhile. It looks at first glance to
> >be the right
> >thing to do. But I do wonder how long rmb() has been lwsync
>
> Since the {ppc,ppc64} -> powerpc merge.
>
> >and if as a practical matter that has caused any problems?
>
> It has not as far as I know.
>
> >If this isn't causing any problems maybe there
> >is some loigic we are overlooking?
>
> The I/O accessor functions enforce the necessary ordering
> already I believe.
Hmm, I never followed those discussions last year about IO ordering, and
I can't see where (if) it was documented anywhere :(
It appears that legacy code is handled by defining the old IO accessors to
be completely ordered, and introducing new __raw_ variants that are not
(OTOH, it seems like other architectures are implementing __raw prefix as
inorder unless there is a _relaxed postfix).
Drivers are definitely using these __raw_ accessors, and from a quick
look, they do appear to be hoping that *mb() is going to order access for
them.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 1:16 ` Nick Piggin
@ 2007-08-22 3:29 ` Segher Boessenkool
2007-08-22 3:55 ` Nick Piggin
0 siblings, 1 reply; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-22 3:29 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>>> If this isn't causing any problems maybe there
>>> is some loigic we are overlooking?
>>
>> The I/O accessor functions enforce the necessary ordering
>> already I believe.
>
> Ah, it looks like you might be right, IO should appear to go in-order,
> in
> which case the rmb() would simply need to order cacheable loads.
> Interesting
> way to do things... are drivers simply not up to scratch enough to
> allow
> out of order IO?
The powerpc kernel needs to have full sync insns in every I/O
accessor in order to enforce all the ordering rules Linux demands.
It's a bloody shame, but the alternative would be to make the
barriers lots more expensive. A third alternative would be to
have barrier ops that do not order everything, but just A vs. B
for various choices of A and B (coherent accesses, MMIO accesses,
etc.)
> Anyway, this raises another question -- if IO accessors have the right
> ordering, why is wmb() not an lwsync as well? There appears to be many
> more wmb() calls than rmb()...
Input MMIO accessors are {sync, load, stall pipeline until load came
back}.
That's a full ordering on both sides.
Output MMIO on the other hand is done with {sync, store}. Now since
wmb() has to order MMIO writes vs. main memory writes, we need a full
sync here. On some (most, all?) CPUs an eieio is actually enough btw.
The barrier insn could be put at the end of all MMIO write ops too,
but I believe that would be more expensive (in execution time; in code
size it definitely would be, of course).
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:15 ` Nick Piggin
@ 2007-08-22 3:33 ` Segher Boessenkool
2007-08-22 4:05 ` Nick Piggin
0 siblings, 1 reply; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-22 3:33 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>> The I/O accessor functions enforce the necessary ordering
>> already I believe.
>
> Hmm, I never followed those discussions last year about IO ordering,
> and
> I can't see where (if) it was documented anywhere :(
The comments in system.h weren't updated with the last fix, I think.
> It appears that legacy code is handled by defining the old IO
> accessors to
> be completely ordered, and introducing new __raw_ variants that are not
> (OTOH, it seems like other architectures are implementing __raw prefix
> as
> inorder unless there is a _relaxed postfix).
__raw_XX() is for platform code only, which can do the needed
barriers without having to use the heavy hammer like everything
else unfortunately does.
> Drivers are definitely using these __raw_ accessors, and from a quick
> look, they do appear to be hoping that *mb() is going to order access
> for
> them.
Which drivers?
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:29 ` Segher Boessenkool
@ 2007-08-22 3:55 ` Nick Piggin
2007-08-23 17:57 ` Segher Boessenkool
0 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2007-08-22 3:55 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Wed, Aug 22, 2007 at 05:29:50AM +0200, Segher Boessenkool wrote:
> >>>If this isn't causing any problems maybe there
> >>>is some loigic we are overlooking?
> >>
> >>The I/O accessor functions enforce the necessary ordering
> >>already I believe.
> >
> >Ah, it looks like you might be right, IO should appear to go in-order,
> >in
> >which case the rmb() would simply need to order cacheable loads.
> >Interesting
> >way to do things... are drivers simply not up to scratch enough to
> >allow
> >out of order IO?
>
> The powerpc kernel needs to have full sync insns in every I/O
> accessor in order to enforce all the ordering rules Linux demands.
> It's a bloody shame, but the alternative would be to make the
> barriers lots more expensive. A third alternative would be to
Well lots more expensive compared to what you have now. But what
you have now is like having those expensive barriers between
*every* io access.
> have barrier ops that do not order everything, but just A vs. B
> for various choices of A and B (coherent accesses, MMIO accesses,
> etc.)
The non-smp_ variant is supposed to order everything, AFAIK. Maybe
you could get more fancy and have PIO vs MMIO etc etc. but it looks
like this whole area is in a pretty sticky state anyway so let's
not think about that.
> >Anyway, this raises another question -- if IO accessors have the right
> >ordering, why is wmb() not an lwsync as well? There appears to be many
> >more wmb() calls than rmb()...
>
> Input MMIO accessors are {sync, load, stall pipeline until load came
> back}.
> That's a full ordering on both sides.
>
> Output MMIO on the other hand is done with {sync, store}. Now since
> wmb() has to order MMIO writes vs. main memory writes, we need a full
> sync here. On some (most, all?) CPUs an eieio is actually enough btw.
> The barrier insn could be put at the end of all MMIO write ops too,
> but I believe that would be more expensive (in execution time; in code
> size it definitely would be, of course).
Ah, that explains why wmb() is a sync. Doesn't seem like a very good
idea though, if the rationale of having fully ordered IO accessors was
because drivers didn't have enough barriers in them.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:33 ` Segher Boessenkool
@ 2007-08-22 4:05 ` Nick Piggin
2007-08-23 17:49 ` Segher Boessenkool
0 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2007-08-22 4:05 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Wed, Aug 22, 2007 at 05:33:16AM +0200, Segher Boessenkool wrote:
> >>The I/O accessor functions enforce the necessary ordering
> >>already I believe.
> >
> >Hmm, I never followed those discussions last year about IO ordering,
> >and
> >I can't see where (if) it was documented anywhere :(
>
> The comments in system.h weren't updated with the last fix, I think.
>
> >It appears that legacy code is handled by defining the old IO
> >accessors to
> >be completely ordered, and introducing new __raw_ variants that are not
> >(OTOH, it seems like other architectures are implementing __raw prefix
> >as
> >inorder unless there is a _relaxed postfix).
>
> __raw_XX() is for platform code only, which can do the needed
> barriers without having to use the heavy hammer like everything
> else unfortunately does.
npiggin@nick:~/usr/src/linux-2.6/drivers> egrep '__raw_(write|read)' -r * | wc -l
685
> >Drivers are definitely using these __raw_ accessors, and from a quick
> >look, they do appear to be hoping that *mb() is going to order access
> >for
> >them.
>
> Which drivers?
There are maybe a dozen that use the raw accessors, and use non-smp_
memory barriers. I just looked at drivers/video/tgafb.c, which
indeed appears to intermix them.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 4:05 ` Nick Piggin
@ 2007-08-23 17:49 ` Segher Boessenkool
0 siblings, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-23 17:49 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>>> Drivers are definitely using these __raw_ accessors, and from a quick
>>> look, they do appear to be hoping that *mb() is going to order access
>>> for
>>> them.
>>
>> Which drivers?
>
> There are maybe a dozen that use the raw accessors, and use non-smp_
> memory barriers. I just looked at drivers/video/tgafb.c, which
> indeed appears to intermix them.
Hrm yeah. It also looks like old buggy code that could use a
cleanup or two (or three or four). I wonder if all __raw_ users
are like that?
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:55 ` Nick Piggin
@ 2007-08-23 17:57 ` Segher Boessenkool
2007-08-24 2:47 ` Nick Piggin
0 siblings, 1 reply; 36+ messages in thread
From: Segher Boessenkool @ 2007-08-23 17:57 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>> The powerpc kernel needs to have full sync insns in every I/O
>> accessor in order to enforce all the ordering rules Linux demands.
>> It's a bloody shame, but the alternative would be to make the
>> barriers lots more expensive. A third alternative would be to
>
> Well lots more expensive compared to what you have now. But what
> you have now is like having those expensive barriers between
> *every* io access.
Yeah. But I/O reads are very expensive anyway, and the barriers
are used for more than just I/O ordering.
I/O writes are a different thing; ideally, they would use only
eieio, if anything at all.
Maybe the tradeoff isn't optimal. The I/O primitives didn't have
all those "sync"s in there before, they got added because some bad
interaction with spinlocks was discovered, if my memory isn't failing
me.
>> have barrier ops that do not order everything, but just A vs. B
>> for various choices of A and B (coherent accesses, MMIO accesses,
>> etc.)
>
> The non-smp_ variant is supposed to order everything, AFAIK. Maybe
> you could get more fancy and have PIO vs MMIO etc etc. but it looks
> like this whole area is in a pretty sticky state anyway so let's
> not think about that.
*Thinking* about it is fun. Trying to get the code merged would be
a different thing ;-)
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-23 17:57 ` Segher Boessenkool
@ 2007-08-24 2:47 ` Nick Piggin
0 siblings, 0 replies; 36+ messages in thread
From: Nick Piggin @ 2007-08-24 2:47 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Thu, Aug 23, 2007 at 07:57:20PM +0200, Segher Boessenkool wrote:
> >>The powerpc kernel needs to have full sync insns in every I/O
> >>accessor in order to enforce all the ordering rules Linux demands.
> >>It's a bloody shame, but the alternative would be to make the
> >>barriers lots more expensive. A third alternative would be to
> >
> >Well lots more expensive compared to what you have now. But what
> >you have now is like having those expensive barriers between
> >*every* io access.
>
> Yeah. But I/O reads are very expensive anyway, and the barriers
> are used for more than just I/O ordering.
rmb() should only be used when IO ordering matters. smp_rmb() is
for regular ordering.
So doesn't the fact IO reads are very expensive anyway lend more
weight _to have_ the full IO ordering in rmb()?
> I/O writes are a different thing; ideally, they would use only
> eieio, if anything at all.
For IO to IO ordering, yes eieio would be ideal. I don't know that
there is really such a primitive for that in Linux. io_wmb().
> Maybe the tradeoff isn't optimal. The I/O primitives didn't have
> all those "sync"s in there before, they got added because some bad
> interaction with spinlocks was discovered, if my memory isn't failing
> me.
I think it may have been because IO ops are pretty strongly ordered
on x86, and it was thought that a fair amount of code was relying on
that. So the old primitives were made quite strongly ordered, and
new ones were added to avoid the overhead. Unfortunately, you can't
actually use the unordered ones unless you have working barrier
instructions. Hence why I think rmb() should be an IO barrier.
But I'm not pushing this too hard. You guys all know the gory ppc
details better than I, so I'll just leave it with you to work out
whether it is the right thing to do.
> >>have barrier ops that do not order everything, but just A vs. B
> >>for various choices of A and B (coherent accesses, MMIO accesses,
> >>etc.)
> >
> >The non-smp_ variant is supposed to order everything, AFAIK. Maybe
> >you could get more fancy and have PIO vs MMIO etc etc. but it looks
> >like this whole area is in a pretty sticky state anyway so let's
> >not think about that.
>
> *Thinking* about it is fun. Trying to get the code merged would be
> a different thing ;-)
;)
^ permalink raw reply [flat|nested] 36+ messages in thread
* [patch 1/2] powerpc: rmb fix
@ 2008-05-21 14:10 Nick Piggin
2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
0 siblings, 2 replies; 36+ messages in thread
From: Nick Piggin @ 2008-05-21 14:10 UTC (permalink / raw)
To: paulus, linuxppc-dev
Hi,
I'm sure I've sent these patches before, but I can't remember why they
weren't merged. They still seem obviously correct to me.
--
lwsync is explicitly defined not to have any effect on the ordering of
accesses to device memory, so it cannot be used for rmb(). sync appears
to be the only barrier which fits the bill.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -34,7 +34,7 @@
* SMP since it is only used to order updates to system memory.
*/
#define mb() __asm__ __volatile__ ("sync" : : : "memory")
-#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
+#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
#define read_barrier_depends() do { } while(0)
^ permalink raw reply [flat|nested] 36+ messages in thread
* [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 14:10 [patch 1/2] powerpc: rmb fix Nick Piggin
@ 2008-05-21 14:12 ` Nick Piggin
2008-05-21 15:26 ` Benjamin Herrenschmidt
2008-05-21 20:16 ` Segher Boessenkool
2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
1 sibling, 2 replies; 36+ messages in thread
From: Nick Piggin @ 2008-05-21 14:12 UTC (permalink / raw)
To: paulus, linuxppc-dev
lwsync is the recommended method of store/store ordering on caching enabled
memory. For those subarchs which have lwsync, use it rather than eieio for
smp_wmb.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -30,8 +30,8 @@
*
* For wmb(), we use sync since wmb is used in drivers to order
* stores to system memory with respect to writes to the device.
- * However, smp_wmb() can be a lighter-weight eieio barrier on
- * SMP since it is only used to order updates to system memory.
+ * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
+ * on SMP since it is only used to order updates to system memory.
*/
#define mb() __asm__ __volatile__ ("sync" : : : "memory")
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
@@ -43,9 +43,16 @@
#ifdef __KERNEL__
#define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */
#ifdef CONFIG_SMP
+
+#ifdef __SUBARCH_HAS_LWSYNC
+# define SMPWMB lwsync
+#else
+# define SMPWMB eieio
+#endif
+
#define smp_mb() mb()
#define smp_rmb() rmb()
-#define smp_wmb() eieio()
+#define smp_wmb() __asm__ __volatile__ (__stringify(SMPWMB) : : :"memory")
#define smp_read_barrier_depends() read_barrier_depends()
#else
#define smp_mb() barrier()
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
@ 2008-05-21 15:26 ` Benjamin Herrenschmidt
2008-05-21 15:34 ` Nick Piggin
2008-05-21 20:16 ` Segher Boessenkool
1 sibling, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:26 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, paulus
On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> lwsync is the recommended method of store/store ordering on caching enabled
> memory. For those subarchs which have lwsync, use it rather than eieio for
> smp_wmb.
Yuck... existence of lwsync depends on the processor at boot time...
Ben.
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/include/asm-powerpc/system.h
> ===================================================================
> --- linux-2.6.orig/include/asm-powerpc/system.h
> +++ linux-2.6/include/asm-powerpc/system.h
> @@ -30,8 +30,8 @@
> *
> * For wmb(), we use sync since wmb is used in drivers to order
> * stores to system memory with respect to writes to the device.
> - * However, smp_wmb() can be a lighter-weight eieio barrier on
> - * SMP since it is only used to order updates to system memory.
> + * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
> + * on SMP since it is only used to order updates to system memory.
> */
> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> @@ -43,9 +43,16 @@
> #ifdef __KERNEL__
> #define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */
> #ifdef CONFIG_SMP
> +
> +#ifdef __SUBARCH_HAS_LWSYNC
> +# define SMPWMB lwsync
> +#else
> +# define SMPWMB eieio
> +#endif
> +
> #define smp_mb() mb()
> #define smp_rmb() rmb()
> -#define smp_wmb() eieio()
> +#define smp_wmb() __asm__ __volatile__ (__stringify(SMPWMB) : : :"memory")
> #define smp_read_barrier_depends() read_barrier_depends()
> #else
> #define smp_mb() barrier()
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-21 14:10 [patch 1/2] powerpc: rmb fix Nick Piggin
2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
@ 2008-05-21 15:27 ` Benjamin Herrenschmidt
2008-05-21 15:32 ` Nick Piggin
1 sibling, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:27 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, paulus
On Wed, 2008-05-21 at 16:10 +0200, Nick Piggin wrote:
> Hi,
>
> I'm sure I've sent these patches before, but I can't remember why they
> weren't merged. They still seem obviously correct to me.
We should already do all that's needed in our IO accessors no ?
> --
>
> lwsync is explicitly defined not to have any effect on the ordering of
> accesses to device memory, so it cannot be used for rmb(). sync appears
> to be the only barrier which fits the bill.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/include/asm-powerpc/system.h
> ===================================================================
> --- linux-2.6.orig/include/asm-powerpc/system.h
> +++ linux-2.6/include/asm-powerpc/system.h
> @@ -34,7 +34,7 @@
> * SMP since it is only used to order updates to system memory.
> */
> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define read_barrier_depends() do { } while(0)
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
@ 2008-05-21 15:32 ` Nick Piggin
2008-05-21 15:43 ` Benjamin Herrenschmidt
2008-05-23 2:14 ` Paul Mackerras
0 siblings, 2 replies; 36+ messages in thread
From: Nick Piggin @ 2008-05-21 15:32 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus
On Wed, May 21, 2008 at 11:27:03AM -0400, Benjamin Herrenschmidt wrote:
>
> On Wed, 2008-05-21 at 16:10 +0200, Nick Piggin wrote:
> > Hi,
> >
> > I'm sure I've sent these patches before, but I can't remember why they
> > weren't merged. They still seem obviously correct to me.
>
> We should already do all that's needed in our IO accessors no ?
More than one device driver does raw/relaxed io accessors and expects the
*mb functions to order them.
>
> > --
> >
> > lwsync is explicitly defined not to have any effect on the ordering of
> > accesses to device memory, so it cannot be used for rmb(). sync appears
> > to be the only barrier which fits the bill.
> >
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ---
> > Index: linux-2.6/include/asm-powerpc/system.h
> > ===================================================================
> > --- linux-2.6.orig/include/asm-powerpc/system.h
> > +++ linux-2.6/include/asm-powerpc/system.h
> > @@ -34,7 +34,7 @@
> > * SMP since it is only used to order updates to system memory.
> > */
> > #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> > -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> > +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> > #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> > #define read_barrier_depends() do { } while(0)
> >
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@ozlabs.org
> > https://ozlabs.org/mailman/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 15:26 ` Benjamin Herrenschmidt
@ 2008-05-21 15:34 ` Nick Piggin
2008-05-21 15:43 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2008-05-21 15:34 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus
On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
>
> On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > lwsync is the recommended method of store/store ordering on caching enabled
> > memory. For those subarchs which have lwsync, use it rather than eieio for
> > smp_wmb.
>
> Yuck... existence of lwsync depends on the processor at boot time...
Not according to the __stringify(LWSYNC) that I just removed. At least,
presumably it is always present on 64 bit processors, and 32 bit ones
will be no worse off as they'll continue just using eieio.
>
> Ben.
>
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ---
> > Index: linux-2.6/include/asm-powerpc/system.h
> > ===================================================================
> > --- linux-2.6.orig/include/asm-powerpc/system.h
> > +++ linux-2.6/include/asm-powerpc/system.h
> > @@ -30,8 +30,8 @@
> > *
> > * For wmb(), we use sync since wmb is used in drivers to order
> > * stores to system memory with respect to writes to the device.
> > - * However, smp_wmb() can be a lighter-weight eieio barrier on
> > - * SMP since it is only used to order updates to system memory.
> > + * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
> > + * on SMP since it is only used to order updates to system memory.
> > */
> > #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> > #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> > @@ -43,9 +43,16 @@
> > #ifdef __KERNEL__
> > #define AT_VECTOR_SIZE_ARCH 6 /* entries in ARCH_DLINFO */
> > #ifdef CONFIG_SMP
> > +
> > +#ifdef __SUBARCH_HAS_LWSYNC
> > +# define SMPWMB lwsync
> > +#else
> > +# define SMPWMB eieio
> > +#endif
> > +
> > #define smp_mb() mb()
> > #define smp_rmb() rmb()
> > -#define smp_wmb() eieio()
> > +#define smp_wmb() __asm__ __volatile__ (__stringify(SMPWMB) : : :"memory")
> > #define smp_read_barrier_depends() read_barrier_depends()
> > #else
> > #define smp_mb() barrier()
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@ozlabs.org
> > https://ozlabs.org/mailman/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 15:34 ` Nick Piggin
@ 2008-05-21 15:43 ` Benjamin Herrenschmidt
2008-05-21 15:47 ` Nick Piggin
2008-05-21 16:01 ` Nick Piggin
0 siblings, 2 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:43 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, paulus
On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
> On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
> >
> > On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > > lwsync is the recommended method of store/store ordering on caching enabled
> > > memory. For those subarchs which have lwsync, use it rather than eieio for
> > > smp_wmb.
> >
> > Yuck... existence of lwsync depends on the processor at boot time...
>
> Not according to the __stringify(LWSYNC) that I just removed. At least,
> presumably it is always present on 64 bit processors, and 32 bit ones
> will be no worse off as they'll continue just using eieio.
No, it doesn't exist on power3, but it degrades into a sync
Ben.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-21 15:32 ` Nick Piggin
@ 2008-05-21 15:43 ` Benjamin Herrenschmidt
2008-05-23 2:14 ` Paul Mackerras
1 sibling, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 15:43 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, paulus
On Wed, 2008-05-21 at 17:32 +0200, Nick Piggin wrote:
> > We should already do all that's needed in our IO accessors no ?
>
> More than one device driver does raw/relaxed io accessors and expects
> the
> *mb functions to order them.
That's fishy... ok.
Ben.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 15:43 ` Benjamin Herrenschmidt
@ 2008-05-21 15:47 ` Nick Piggin
2008-05-21 16:02 ` Benjamin Herrenschmidt
2008-05-21 16:01 ` Nick Piggin
1 sibling, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2008-05-21 15:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus
On Wed, May 21, 2008 at 11:43:00AM -0400, Benjamin Herrenschmidt wrote:
>
> On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
> > On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
> > >
> > > On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > > > lwsync is the recommended method of store/store ordering on caching enabled
> > > > memory. For those subarchs which have lwsync, use it rather than eieio for
> > > > smp_wmb.
> > >
> > > Yuck... existence of lwsync depends on the processor at boot time...
> >
> > Not according to the __stringify(LWSYNC) that I just removed. At least,
> > presumably it is always present on 64 bit processors, and 32 bit ones
> > will be no worse off as they'll continue just using eieio.
>
> No, it doesn't exist on power3, but it degrades into a sync
OK, but I just don't understand what the problem is... your synch.h has
#ifdef __powerpc64__
#define __SUBARCH_HAS_LWSYNC
#endif
#ifdef __SUBARCH_HAS_LWSYNC
# define LWSYNC lwsync
#else
# define LWSYNC sync
#endif
And LWSYNC is then used for rmb()... how was that OK but this not?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 15:43 ` Benjamin Herrenschmidt
2008-05-21 15:47 ` Nick Piggin
@ 2008-05-21 16:01 ` Nick Piggin
2008-05-21 20:12 ` Segher Boessenkool
1 sibling, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2008-05-21 16:01 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, paulus
On Wed, May 21, 2008 at 11:43:00AM -0400, Benjamin Herrenschmidt wrote:
>
> On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
> > On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
> > >
> > > On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
> > > > lwsync is the recommended method of store/store ordering on caching enabled
> > > > memory. For those subarchs which have lwsync, use it rather than eieio for
> > > > smp_wmb.
> > >
> > > Yuck... existence of lwsync depends on the processor at boot time...
> >
> > Not according to the __stringify(LWSYNC) that I just removed. At least,
> > presumably it is always present on 64 bit processors, and 32 bit ones
> > will be no worse off as they'll continue just using eieio.
>
> No, it doesn't exist on power3, but it degrades into a sync
Oh, the instruction does exist, but it degrades to a sync so
actually turns out to be slower than eieio?
I think it would be a good idea just to take the hit on power3.
>From memory, I measured lwsync is 5 times faster than eieio on
a dual G5. This was on a simple microbenchmark that made use of
smp_wmb for store ordering, but it did not involve any IO access
(which presumably would disadvantage eieio further).
Given the G5 speedup, I'd be surprised if there is not an improvment
on POWER4 and 5 as well, although no idea about POWER6 or cell...
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 15:47 ` Nick Piggin
@ 2008-05-21 16:02 ` Benjamin Herrenschmidt
2008-05-21 20:51 ` Segher Boessenkool
0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 16:02 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, paulus
On Wed, 2008-05-21 at 17:47 +0200, Nick Piggin wrote:
>
> OK, but I just don't understand what the problem is... your synch.h
> has
>
> #ifdef __powerpc64__
> #define __SUBARCH_HAS_LWSYNC
> #endif
>
> #ifdef __SUBARCH_HAS_LWSYNC
> # define LWSYNC lwsync
> #else
> # define LWSYNC sync
> #endif
>
This is mostly useless then since lwsync is just a sync to a processor
that doesn't know it (it's a sync with a reservd bit set) :-) Or it's
just to make gas happy if you specify a processor type that doesn't have
lwsync ?
Paul, do we have a case of a 32 bits CPU that chokes on the added bit ?
Ben.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 16:01 ` Nick Piggin
@ 2008-05-21 20:12 ` Segher Boessenkool
2008-05-21 20:44 ` Benjamin Herrenschmidt
2008-05-22 0:30 ` Nick Piggin
0 siblings, 2 replies; 36+ messages in thread
From: Segher Boessenkool @ 2008-05-21 20:12 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, paulus
>> From memory, I measured lwsync is 5 times faster than eieio on
> a dual G5. This was on a simple microbenchmark that made use of
> smp_wmb for store ordering, but it did not involve any IO access
> (which presumably would disadvantage eieio further).
This is very much specific to your particular benchmark.
On the 970, there are two differences between lwsync and eieio:
1) lwsync cannot issue before all previous loads are done; eieio
does not have this restriction.
Then, they both fly through the execution core, it doesn't wait
for the barrier insn to complete in the storage system. In both
cases, a barrier is inserted into both the L2 queues and the
non-cacheable queues. These barriers are both removed at the
same time, that is, when both are the oldest in their queue and
have done their thing.
2) For eieio, the non-cacheable unit waits for all previous
(non-cacheable) stores to complete, and then arbitrates for the
bus and sends an EIEIO transaction.
Your benchmark doesn't do non-cacheable stores, so it would seem
the five-time slowdown is caused by that bus arbitration (and the
bus transaction). Maybe your cacheable stores hit the bus as well,
that would make this worse. Your benchmark also doesn't see the
negative effects from 1).
In "real" code, I expect 2) to be pretty much invisible (the store
queues will never be completely filled up), but 1) shouldn't be very
bad either. So it's a wash. But only a real benchmark will tell.
> Given the G5 speedup, I'd be surprised if there is not an improvment
> on POWER4 and 5 as well,
The 970 storage subsystem and the POWER4 one are very different.
Or maybe these queues are just about the last thing that _is_
identical, I dunno, there aren't public POWER4 docs for this ;-)
> although no idea about POWER6 or cell...
No idea about POWER6; for CBE, the backend works similar to the
970 one.
Given that the architecture says to use lwsync for cases like this,
it would be very surprising if it performed (much) worse than eieio,
eh? ;-) So I think your patch is a win; just wanted to clarify on
your five-time slowdown number.
HTH,
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
2008-05-21 15:26 ` Benjamin Herrenschmidt
@ 2008-05-21 20:16 ` Segher Boessenkool
1 sibling, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2008-05-21 20:16 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, paulus
> +#ifdef __SUBARCH_HAS_LWSYNC
> +# define SMPWMB lwsync
> +#else
> +# define SMPWMB eieio
> +#endif
> +
> #define smp_mb() mb()
> #define smp_rmb() rmb()
> -#define smp_wmb() eieio()
> +#define smp_wmb() __asm__ __volatile__ (__stringify(SMPWMB) : :
> :"memory")
SMPWMB is used only here. Why not just
#ifdef __SUBARCH_HAS_LWSYNC
#define smp_wmb() lwsync()
#else
#define smp_wmb() eieio()
#endif
?
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 20:12 ` Segher Boessenkool
@ 2008-05-21 20:44 ` Benjamin Herrenschmidt
2008-05-21 22:07 ` Segher Boessenkool
2008-05-22 0:30 ` Nick Piggin
1 sibling, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-05-21 20:44 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: Nick Piggin, linuxppc-dev, paulus
On Wed, 2008-05-21 at 22:12 +0200, Segher Boessenkool wrote:
> No idea about POWER6; for CBE, the backend works similar to the
> 970 one.
>
> Given that the architecture says to use lwsync for cases like this,
> it would be very surprising if it performed (much) worse than eieio,
> eh? ;-) So I think your patch is a win; just wanted to clarify on
> your five-time slowdown number.
It makes sense to use lwsync rather than eieio for smb_wmb() as this is
not supposed to be used to order with cache inhibited storage. It's
really a data barrier used by the kernel for normal cacheable storage.
The main question is do we care if the downgrade to sync on power3 hurts
performances (and does it ?) and what do we do for 32 bits as currently,
no 32 bits implementation has lwsync afaik (though that might not be
true for long).
Also, we don't, I think, have verified that they all properly ignore the
added bit and behave as sync rather than program checking..
Ben.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 16:02 ` Benjamin Herrenschmidt
@ 2008-05-21 20:51 ` Segher Boessenkool
0 siblings, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2008-05-21 20:51 UTC (permalink / raw)
To: benh; +Cc: Nick Piggin, linuxppc-dev, paulus
> This is mostly useless then since lwsync is just a sync to a processor
> that doesn't know it (it's a sync with a reservd bit set) :-) Or it's
> just to make gas happy if you specify a processor type that doesn't
> have
> lwsync ?
GAS doesn't care (I tried with -Wa,-m405). Support for this insn
was added to binutils in august 2001; do we still support older
binutils than that? If so, we should just do the ".long" trick in
a lwsync() inline function, like we do for similar cases.
Segher
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 20:44 ` Benjamin Herrenschmidt
@ 2008-05-21 22:07 ` Segher Boessenkool
0 siblings, 0 replies; 36+ messages in thread
From: Segher Boessenkool @ 2008-05-21 22:07 UTC (permalink / raw)
To: benh; +Cc: Nick Piggin, linuxppc-dev, paulus
> The main question is do we care if the downgrade to sync on power3
> hurts
> performances (and does it ?) and what do we do for 32 bits as
> currently,
> no 32 bits implementation has lwsync afaik (though that might not be
> true for long).
Some time ago, I benchmarked (*) a loop of "stw;sync" vs. "stw;eieio"
on a 750 (yeah, great micro-benchmark, heh); and the sync version
was a factor ~100 slower.
> Also, we don't, I think, have verified that they all properly ignore
> the
> added bit and behave as sync rather than program checking..
The architecture books don't have a programming note about this
(like they do in similar cases), so either a) it works fine on
every implementation; or b) someone forgot to add such a note.
We shouldn't use lwsync on 32-bit (it's an awful performance hit
on "classic" cpus, and who knows what it does on embedded cpus
(bookE *shudder*)). For POWER3 and *star, i.e., all 64-bit? Perhaps
it'll be okay there, but if as you suggest some 32-bit CPUs will add
proper lwsync support soonish, maybe a feature fixup or some such is
in order.
Segher
(*) Not on purpose.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 2/2] powerpc: optimise smp_wmb
2008-05-21 20:12 ` Segher Boessenkool
2008-05-21 20:44 ` Benjamin Herrenschmidt
@ 2008-05-22 0:30 ` Nick Piggin
1 sibling, 0 replies; 36+ messages in thread
From: Nick Piggin @ 2008-05-22 0:30 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, paulus
On Wed, May 21, 2008 at 10:12:03PM +0200, Segher Boessenkool wrote:
> >>From memory, I measured lwsync is 5 times faster than eieio on
> >a dual G5. This was on a simple microbenchmark that made use of
> >smp_wmb for store ordering, but it did not involve any IO access
> >(which presumably would disadvantage eieio further).
>
> This is very much specific to your particular benchmark.
>
> On the 970, there are two differences between lwsync and eieio:
>
> 1) lwsync cannot issue before all previous loads are done; eieio
> does not have this restriction.
>
> Then, they both fly through the execution core, it doesn't wait
> for the barrier insn to complete in the storage system. In both
> cases, a barrier is inserted into both the L2 queues and the
> non-cacheable queues. These barriers are both removed at the
> same time, that is, when both are the oldest in their queue and
> have done their thing.
>
> 2) For eieio, the non-cacheable unit waits for all previous
> (non-cacheable) stores to complete, and then arbitrates for the
> bus and sends an EIEIO transaction.
>
> Your benchmark doesn't do non-cacheable stores, so it would seem
> the five-time slowdown is caused by that bus arbitration (and the
> bus transaction). Maybe your cacheable stores hit the bus as well,
> that would make this worse. Your benchmark also doesn't see the
> negative effects from 1).
>
> In "real" code, I expect 2) to be pretty much invisible (the store
> queues will never be completely filled up), but 1) shouldn't be very
> bad either. So it's a wash. But only a real benchmark will tell.
OK, interesting thanks. Yes the "benchmark" is not a good one, but
it verified for me that there is a difference there. Combined with
IBM's documents saying lwsync is preferred for store/store ordering
is my rationale for sending the patch. A real benchmark would be nice
but it would probably be hard to notice any improvement.
> >Given the G5 speedup, I'd be surprised if there is not an improvment
> >on POWER4 and 5 as well,
>
> The 970 storage subsystem and the POWER4 one are very different.
> Or maybe these queues are just about the last thing that _is_
> identical, I dunno, there aren't public POWER4 docs for this ;-)
>
> >although no idea about POWER6 or cell...
>
> No idea about POWER6; for CBE, the backend works similar to the
> 970 one.
>
> Given that the architecture says to use lwsync for cases like this,
> it would be very surprising if it performed (much) worse than eieio,
> eh? ;-) So I think your patch is a win; just wanted to clarify on
> your five-time slowdown number.
Sure, thanks!
Nick
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-21 15:32 ` Nick Piggin
2008-05-21 15:43 ` Benjamin Herrenschmidt
@ 2008-05-23 2:14 ` Paul Mackerras
2008-05-23 4:40 ` Nick Piggin
1 sibling, 1 reply; 36+ messages in thread
From: Paul Mackerras @ 2008-05-23 2:14 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev
Nick Piggin writes:
> More than one device driver does raw/relaxed io accessors and expects the
> *mb functions to order them.
Can you point us at an example?
I don't think the semantics of the raw accessors are particularly well
defined (it's not even defined what endianness the data comes back in)
so I'd be surprised if there was any real definition of their
behaviour w.r.t. barriers.
Paul.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-23 2:14 ` Paul Mackerras
@ 2008-05-23 4:40 ` Nick Piggin
2008-05-23 4:53 ` Paul Mackerras
0 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2008-05-23 4:40 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
On Fri, May 23, 2008 at 12:14:41PM +1000, Paul Mackerras wrote:
> Nick Piggin writes:
>
> > More than one device driver does raw/relaxed io accessors and expects the
> > *mb functions to order them.
>
> Can you point us at an example?
Uh, I might be getting confused because the semantics are completely
different eg. between powerpc and ia64.
There don't seem to actually be read*_relaxed calls that also use rmb
in the same file (although there is no reason why they might not appear).
But I must be thinking of are the raw_read accessors. They aren't ordered
on powerpc, and a few drivers appear to hope rmb() will order them.
> I don't think the semantics of the raw accessors are particularly well
> defined (it's not even defined what endianness the data comes back in)
> so I'd be surprised if there was any real definition of their
> behaviour w.r.t. barriers.
I think it is fair to say the intention of the _relaxed variants is
that they require memory barriers for ordering. No idea about raw read
or write but it appears that at least on powerpc and alpha they are
unordered.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-23 4:40 ` Nick Piggin
@ 2008-05-23 4:53 ` Paul Mackerras
2008-05-23 5:48 ` Nick Piggin
0 siblings, 1 reply; 36+ messages in thread
From: Paul Mackerras @ 2008-05-23 4:53 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev
Nick Piggin writes:
> There don't seem to actually be read*_relaxed calls that also use rmb
> in the same file (although there is no reason why they might not appear).
> But I must be thinking of are the raw_read accessors. They aren't ordered
> on powerpc, and a few drivers appear to hope rmb() will order them.
Which drivers?
Paul.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-23 4:53 ` Paul Mackerras
@ 2008-05-23 5:48 ` Nick Piggin
2008-05-23 6:40 ` Paul Mackerras
0 siblings, 1 reply; 36+ messages in thread
From: Nick Piggin @ 2008-05-23 5:48 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
On Fri, May 23, 2008 at 02:53:21PM +1000, Paul Mackerras wrote:
> Nick Piggin writes:
>
> > There don't seem to actually be read*_relaxed calls that also use rmb
> > in the same file (although there is no reason why they might not appear).
> > But I must be thinking of are the raw_read accessors. They aren't ordered
> > on powerpc, and a few drivers appear to hope rmb() will order them.
>
> Which drivers?
net/macb.c, net/sfc/falcon_io.c... hmm, I thought there was a couple
more but maybe I was thinking of write side.
Anyway, even if there were zero, then the point is still that you
implement that API, so you should either strongly order your
__raw_ and _relaxed then you can weaken your rmb, or you have to
strengthen your rmb to match your weak read ops.
Saying it doesn't matter because there are no drivers is likely to
cause more headaches in future...
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-23 5:48 ` Nick Piggin
@ 2008-05-23 6:40 ` Paul Mackerras
2008-05-26 1:38 ` Nick Piggin
0 siblings, 1 reply; 36+ messages in thread
From: Paul Mackerras @ 2008-05-23 6:40 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev
Nick Piggin writes:
> Anyway, even if there were zero, then the point is still that you
> implement that API, so you should either strongly order your
> __raw_ and _relaxed then you can weaken your rmb, or you have to
> strengthen your rmb to match your weak read ops.
>
> Saying it doesn't matter because there are no drivers is likely to
> cause more headaches in future...
Well, it seems that rmb is used in several ways:
(1) to order reads from normal memory w.r.t. each other
(2) to order readl et al. w.r.t. reads from normal memory
(3) to order readl_relaxed et al. w.r.t. reads from normal memory
and I want to get a feeling for how prevalent each type of use is.
Now, the instances of (1) can presumably be converted to smb_rmb().
The instances of (2) aren't necessary on powerpc because readl
etc. are fully synchronous -- but that mightn't be true on other
platforms. And for the instances of (3), powerpc needs to use sync,
as you say.
If the instances of (2) have to stay, then I would rather try to work
out some way that rmb doesn't have to be a sync, or at least doesn't
always have to be a sync. And that might mean redefining the meaning
of rmb so that the instances of (3) have to be changed to something
else. Since (3) seems to be rarer than (2), it would presumably be
(3) that we would change rather than (2).
Paul.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2008-05-23 6:40 ` Paul Mackerras
@ 2008-05-26 1:38 ` Nick Piggin
0 siblings, 0 replies; 36+ messages in thread
From: Nick Piggin @ 2008-05-26 1:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
On Fri, May 23, 2008 at 04:40:21PM +1000, Paul Mackerras wrote:
> Nick Piggin writes:
>
> > Anyway, even if there were zero, then the point is still that you
> > implement that API, so you should either strongly order your
> > __raw_ and _relaxed then you can weaken your rmb, or you have to
> > strengthen your rmb to match your weak read ops.
> >
> > Saying it doesn't matter because there are no drivers is likely to
> > cause more headaches in future...
>
> Well, it seems that rmb is used in several ways:
>
> (1) to order reads from normal memory w.r.t. each other
> (2) to order readl et al. w.r.t. reads from normal memory
> (3) to order readl_relaxed et al. w.r.t. reads from normal memory
>
> and I want to get a feeling for how prevalent each type of use is.
>
> Now, the instances of (1) can presumably be converted to smb_rmb().
Right.
> The instances of (2) aren't necessary on powerpc because readl
> etc. are fully synchronous -- but that mightn't be true on other
> platforms.a
Yes... this is one place where a dedicated io_rmb() etc will
help on powerpc (as per my other proposal).
> And for the instances of (3), powerpc needs to use sync,
> as you say.
>
> If the instances of (2) have to stay, then I would rather try to work
> out some way that rmb doesn't have to be a sync, or at least doesn't
> always have to be a sync. And that might mean redefining the meaning
> of rmb so that the instances of (3) have to be changed to something
> else. Since (3) seems to be rarer than (2), it would presumably be
> (3) that we would change rather than (2).
That would be nice, but the problem is that if you weaken anything,
then technically you need to audit everything. And if you audit
everything anyway, you may as well retain existing semantics of rmb,
and introduce a new primitive, io_rmb... this is the reasoning I
use when I think the ia64 guys went the wrong way by weakening
wmb and introducing mmiowb.
Would the io_*mb() proposal suit you? It should also be able to
improve your IO write/write and read/write write/read performance
too.
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2008-05-26 1:38 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-21 14:10 [patch 1/2] powerpc: rmb fix Nick Piggin
2008-05-21 14:12 ` [patch 2/2] powerpc: optimise smp_wmb Nick Piggin
2008-05-21 15:26 ` Benjamin Herrenschmidt
2008-05-21 15:34 ` Nick Piggin
2008-05-21 15:43 ` Benjamin Herrenschmidt
2008-05-21 15:47 ` Nick Piggin
2008-05-21 16:02 ` Benjamin Herrenschmidt
2008-05-21 20:51 ` Segher Boessenkool
2008-05-21 16:01 ` Nick Piggin
2008-05-21 20:12 ` Segher Boessenkool
2008-05-21 20:44 ` Benjamin Herrenschmidt
2008-05-21 22:07 ` Segher Boessenkool
2008-05-22 0:30 ` Nick Piggin
2008-05-21 20:16 ` Segher Boessenkool
2008-05-21 15:27 ` [patch 1/2] powerpc: rmb fix Benjamin Herrenschmidt
2008-05-21 15:32 ` Nick Piggin
2008-05-21 15:43 ` Benjamin Herrenschmidt
2008-05-23 2:14 ` Paul Mackerras
2008-05-23 4:40 ` Nick Piggin
2008-05-23 4:53 ` Paul Mackerras
2008-05-23 5:48 ` Nick Piggin
2008-05-23 6:40 ` Paul Mackerras
2008-05-26 1:38 ` Nick Piggin
-- strict thread matches above, loose matches on Subject: below --
2007-08-21 2:11 Nick Piggin
2007-08-21 19:07 ` Joel Schopp
2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
2007-08-22 1:16 ` Nick Piggin
2007-08-22 3:29 ` Segher Boessenkool
2007-08-22 3:55 ` Nick Piggin
2007-08-23 17:57 ` Segher Boessenkool
2007-08-24 2:47 ` Nick Piggin
2007-08-22 3:15 ` Nick Piggin
2007-08-22 3:33 ` Segher Boessenkool
2007-08-22 4:05 ` Nick Piggin
2007-08-23 17:49 ` Segher Boessenkool
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).