* [patch 1/2] powerpc: rmb fix
@ 2007-08-21 2:11 Nick Piggin
2007-08-21 2:16 ` [patch 1/2] powerpc: smp_wmb speedup Nick Piggin
2007-08-21 19:07 ` [patch 1/2] powerpc: rmb fix Joel Schopp
0 siblings, 2 replies; 15+ messages in thread
From: Nick Piggin @ 2007-08-21 2:11 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Paul Mackerras
In the interest of completeness, I'll split these patches up and submit to
the powerpc dev list. Any discussion or ack/nack would be appreciated.
---
lwsync is defined to only order memory operations on cacheable memory.
A full sync appears to be the only barrier that will order all memory
loads including device memory.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -33,7 +33,7 @@
* SMP since it is only used to order updates to system memory.
*/
#define mb() __asm__ __volatile__ ("sync" : : : "memory")
-#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
+#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
#define read_barrier_depends() do { } while(0)
@@ -42,7 +42,7 @@
#ifdef __KERNEL__
#ifdef CONFIG_SMP
#define smp_mb() mb()
-#define smp_rmb() rmb()
+#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
#define smp_wmb() eieio()
#define smp_read_barrier_depends() read_barrier_depends()
#else
^ permalink raw reply [flat|nested] 15+ messages in thread
* [patch 1/2] powerpc: smp_wmb speedup
2007-08-21 2:11 [patch 1/2] powerpc: rmb fix Nick Piggin
@ 2007-08-21 2:16 ` Nick Piggin
2007-08-21 2:21 ` Nick Piggin
2007-08-21 19:07 ` [patch 1/2] powerpc: rmb fix Joel Schopp
1 sibling, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-08-21 2:16 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Paul Mackerras
This one is perhaps not as straightforward. I'm pretty limited in the types
of powerpc machines I can test with, so I don't actually know whether this
is the right thing to do on power5/6 etc. I can supply the simple test program
I used if anybody is interested.
---
On my dual G5, lwsync is over 5 times faster than eieio when used in a simple
test case (that actually makes real use of lwsync to provide write ordering).
This is not surprising, as it avoids the IO access synchronisation of eieio,
and still permits the important relaxation of executing loads before stores.
The on sub-architectures where lwsync is unavailable, eieio is retained, as
it should be faster than the alternative full sync (eieio is a proper subset
of sync).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/include/asm-powerpc/system.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -43,7 +43,11 @@
#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
+#ifdef __SUBARCH_HAS_LWSYNC
+#define smp_wmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
+#else
#define smp_wmb() eieio()
+#endif
#define smp_read_barrier_depends() read_barrier_depends()
#else
#define smp_mb() barrier()
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: smp_wmb speedup
2007-08-21 2:16 ` [patch 1/2] powerpc: smp_wmb speedup Nick Piggin
@ 2007-08-21 2:21 ` Nick Piggin
0 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2007-08-21 2:21 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Paul Mackerras
Sorry, this is patch 2/2 of course.
On Tue, Aug 21, 2007 at 04:16:52AM +0200, Nick Piggin wrote:
> This one is perhaps not as straightforward. I'm pretty limited in the types
> of powerpc machines I can test with, so I don't actually know whether this
> is the right thing to do on power5/6 etc. I can supply the simple test program
> I used if anybody is interested.
>
> ---
> On my dual G5, lwsync is over 5 times faster than eieio when used in a simple
> test case (that actually makes real use of lwsync to provide write ordering).
>
> This is not surprising, as it avoids the IO access synchronisation of eieio,
> and still permits the important relaxation of executing loads before stores.
> The on sub-architectures where lwsync is unavailable, eieio is retained, as
> it should be faster than the alternative full sync (eieio is a proper subset
> of sync).
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> Index: linux-2.6/include/asm-powerpc/system.h
> ===================================================================
> --- linux-2.6.orig/include/asm-powerpc/system.h
> +++ linux-2.6/include/asm-powerpc/system.h
> @@ -43,7 +43,11 @@
> #ifdef CONFIG_SMP
> #define smp_mb() mb()
> #define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> +#ifdef __SUBARCH_HAS_LWSYNC
> +#define smp_wmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> +#else
> #define smp_wmb() eieio()
> +#endif
> #define smp_read_barrier_depends() read_barrier_depends()
> #else
> #define smp_mb() barrier()
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 2:11 [patch 1/2] powerpc: rmb fix Nick Piggin
2007-08-21 2:16 ` [patch 1/2] powerpc: smp_wmb speedup Nick Piggin
@ 2007-08-21 19:07 ` Joel Schopp
2007-08-21 19:43 ` Segher Boessenkool
1 sibling, 1 reply; 15+ messages in thread
From: Joel Schopp @ 2007-08-21 19:07 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define read_barrier_depends() do { } while(0)
>
> @@ -42,7 +42,7 @@
> #ifdef __KERNEL__
> #ifdef CONFIG_SMP
> #define smp_mb() mb()
> -#define smp_rmb() rmb()
> +#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
> #define smp_wmb() eieio()
> #define smp_read_barrier_depends() read_barrier_depends()
> #else
I had to think about this one for awhile. It looks at first glance to be the right
thing to do. But I do wonder how long rmb() has been lwsync and if as a practical
matter that has caused any problems? If this isn't causing any problems maybe there
is some loigic we are overlooking?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:07 ` [patch 1/2] powerpc: rmb fix Joel Schopp
@ 2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Segher Boessenkool @ 2007-08-21 19:43 UTC (permalink / raw)
To: Joel Schopp; +Cc: Nick Piggin, linuxppc-dev, Paul Mackerras
>> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
>> -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
>> "memory")
>> +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
>> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
>> #define read_barrier_depends() do { } while(0)
>>
>> @@ -42,7 +42,7 @@
>> #ifdef __KERNEL__
>> #ifdef CONFIG_SMP
>> #define smp_mb() mb()
>> -#define smp_rmb() rmb()
>> +#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
>> "memory")
>> #define smp_wmb() eieio()
>> #define smp_read_barrier_depends() read_barrier_depends()
>> #else
>
> I had to think about this one for awhile. It looks at first glance to
> be the right
> thing to do. But I do wonder how long rmb() has been lwsync
Since the {ppc,ppc64} -> powerpc merge.
> and if as a practical matter that has caused any problems?
It has not as far as I know.
> If this isn't causing any problems maybe there
> is some loigic we are overlooking?
The I/O accessor functions enforce the necessary ordering
already I believe.
Segher
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:43 ` Segher Boessenkool
@ 2007-08-21 21:42 ` Linas Vepstas
2007-08-22 1:16 ` Nick Piggin
2007-08-22 3:15 ` Nick Piggin
2 siblings, 0 replies; 15+ messages in thread
From: Linas Vepstas @ 2007-08-21 21:42 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: Nick Piggin, Paul Mackerras, linuxppc-dev
On Tue, Aug 21, 2007 at 09:43:17PM +0200, Segher Boessenkool wrote:
> >> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> >> -#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >> "memory")
> >> +#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define read_barrier_depends() do { } while(0)
> >>
> >> @@ -42,7 +42,7 @@
> >> #ifdef __KERNEL__
> >> #ifdef CONFIG_SMP
> >> #define smp_mb() mb()
> >> -#define smp_rmb() rmb()
> >> +#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >> "memory")
> >> #define smp_wmb() eieio()
> >> #define smp_read_barrier_depends() read_barrier_depends()
> >> #else
> >
> > I had to think about this one for awhile. It looks at first glance to
> > be the right
> > thing to do. But I do wonder how long rmb() has been lwsync
>
> Since the {ppc,ppc64} -> powerpc merge.
>
> > and if as a practical matter that has caused any problems?
>
> It has not as far as I know.
>
> > If this isn't causing any problems maybe there
> > is some loigic we are overlooking?
>
> The I/O accessor functions enforce the necessary ordering
> already I believe.
So, is this patch desirable?
--linas
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
@ 2007-08-22 1:16 ` Nick Piggin
2007-08-22 3:29 ` Segher Boessenkool
2007-08-22 3:15 ` Nick Piggin
2 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-08-22 1:16 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Tue, Aug 21, 2007 at 09:43:17PM +0200, Segher Boessenkool wrote:
> >> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> >>-#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >>+#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define read_barrier_depends() do { } while(0)
> >>
> >>@@ -42,7 +42,7 @@
> >> #ifdef __KERNEL__
> >> #ifdef CONFIG_SMP
> >> #define smp_mb() mb()
> >>-#define smp_rmb() rmb()
> >>+#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >> #define smp_wmb() eieio()
> >> #define smp_read_barrier_depends() read_barrier_depends()
> >> #else
> >
> >I had to think about this one for awhile. It looks at first glance to
> >be the right
> >thing to do. But I do wonder how long rmb() has been lwsync
>
> Since the {ppc,ppc64} -> powerpc merge.
>
> >and if as a practical matter that has caused any problems?
>
> It has not as far as I know.
>
> >If this isn't causing any problems maybe there
> >is some loigic we are overlooking?
>
> The I/O accessor functions enforce the necessary ordering
> already I believe.
Ah, it looks like you might be right, IO should appear to go in-order, in
which case the rmb() would simply need to order cacheable loads. Interesting
way to do things... are drivers simply not up to scratch enough to allow
out of order IO?
Anyway, this raises another question -- if IO accessors have the right
ordering, why is wmb() not an lwsync as well? There appears to be many
more wmb() calls than rmb()...
Thanks,
Nick
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
2007-08-22 1:16 ` Nick Piggin
@ 2007-08-22 3:15 ` Nick Piggin
2007-08-22 3:33 ` Segher Boessenkool
2 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-08-22 3:15 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Tue, Aug 21, 2007 at 09:43:17PM +0200, Segher Boessenkool wrote:
> >> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
> >>-#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >>+#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> >> #define read_barrier_depends() do { } while(0)
> >>
> >>@@ -42,7 +42,7 @@
> >> #ifdef __KERNEL__
> >> #ifdef CONFIG_SMP
> >> #define smp_mb() mb()
> >>-#define smp_rmb() rmb()
> >>+#define smp_rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : :
> >>"memory")
> >> #define smp_wmb() eieio()
> >> #define smp_read_barrier_depends() read_barrier_depends()
> >> #else
> >
> >I had to think about this one for awhile. It looks at first glance to
> >be the right
> >thing to do. But I do wonder how long rmb() has been lwsync
>
> Since the {ppc,ppc64} -> powerpc merge.
>
> >and if as a practical matter that has caused any problems?
>
> It has not as far as I know.
>
> >If this isn't causing any problems maybe there
> >is some loigic we are overlooking?
>
> The I/O accessor functions enforce the necessary ordering
> already I believe.
Hmm, I never followed those discussions last year about IO ordering, and
I can't see where (if) it was documented anywhere :(
It appears that legacy code is handled by defining the old IO accessors to
be completely ordered, and introducing new __raw_ variants that are not
(OTOH, it seems like other architectures are implementing __raw prefix as
inorder unless there is a _relaxed postfix).
Drivers are definitely using these __raw_ accessors, and from a quick
look, they do appear to be hoping that *mb() is going to order access for
them.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 1:16 ` Nick Piggin
@ 2007-08-22 3:29 ` Segher Boessenkool
2007-08-22 3:55 ` Nick Piggin
0 siblings, 1 reply; 15+ messages in thread
From: Segher Boessenkool @ 2007-08-22 3:29 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>>> If this isn't causing any problems maybe there
>>> is some loigic we are overlooking?
>>
>> The I/O accessor functions enforce the necessary ordering
>> already I believe.
>
> Ah, it looks like you might be right, IO should appear to go in-order,
> in
> which case the rmb() would simply need to order cacheable loads.
> Interesting
> way to do things... are drivers simply not up to scratch enough to
> allow
> out of order IO?
The powerpc kernel needs to have full sync insns in every I/O
accessor in order to enforce all the ordering rules Linux demands.
It's a bloody shame, but the alternative would be to make the
barriers lots more expensive. A third alternative would be to
have barrier ops that do not order everything, but just A vs. B
for various choices of A and B (coherent accesses, MMIO accesses,
etc.)
> Anyway, this raises another question -- if IO accessors have the right
> ordering, why is wmb() not an lwsync as well? There appears to be many
> more wmb() calls than rmb()...
Input MMIO accessors are {sync, load, stall pipeline until load came
back}.
That's a full ordering on both sides.
Output MMIO on the other hand is done with {sync, store}. Now since
wmb() has to order MMIO writes vs. main memory writes, we need a full
sync here. On some (most, all?) CPUs an eieio is actually enough btw.
The barrier insn could be put at the end of all MMIO write ops too,
but I believe that would be more expensive (in execution time; in code
size it definitely would be, of course).
Segher
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:15 ` Nick Piggin
@ 2007-08-22 3:33 ` Segher Boessenkool
2007-08-22 4:05 ` Nick Piggin
0 siblings, 1 reply; 15+ messages in thread
From: Segher Boessenkool @ 2007-08-22 3:33 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>> The I/O accessor functions enforce the necessary ordering
>> already I believe.
>
> Hmm, I never followed those discussions last year about IO ordering,
> and
> I can't see where (if) it was documented anywhere :(
The comments in system.h weren't updated with the last fix, I think.
> It appears that legacy code is handled by defining the old IO
> accessors to
> be completely ordered, and introducing new __raw_ variants that are not
> (OTOH, it seems like other architectures are implementing __raw prefix
> as
> inorder unless there is a _relaxed postfix).
__raw_XX() is for platform code only, which can do the needed
barriers without having to use the heavy hammer like everything
else unfortunately does.
> Drivers are definitely using these __raw_ accessors, and from a quick
> look, they do appear to be hoping that *mb() is going to order access
> for
> them.
Which drivers?
Segher
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:29 ` Segher Boessenkool
@ 2007-08-22 3:55 ` Nick Piggin
2007-08-23 17:57 ` Segher Boessenkool
0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-08-22 3:55 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Wed, Aug 22, 2007 at 05:29:50AM +0200, Segher Boessenkool wrote:
> >>>If this isn't causing any problems maybe there
> >>>is some loigic we are overlooking?
> >>
> >>The I/O accessor functions enforce the necessary ordering
> >>already I believe.
> >
> >Ah, it looks like you might be right, IO should appear to go in-order,
> >in
> >which case the rmb() would simply need to order cacheable loads.
> >Interesting
> >way to do things... are drivers simply not up to scratch enough to
> >allow
> >out of order IO?
>
> The powerpc kernel needs to have full sync insns in every I/O
> accessor in order to enforce all the ordering rules Linux demands.
> It's a bloody shame, but the alternative would be to make the
> barriers lots more expensive. A third alternative would be to
Well lots more expensive compared to what you have now. But what
you have now is like having those expensive barriers between
*every* io access.
> have barrier ops that do not order everything, but just A vs. B
> for various choices of A and B (coherent accesses, MMIO accesses,
> etc.)
The non-smp_ variant is supposed to order everything, AFAIK. Maybe
you could get more fancy and have PIO vs MMIO etc etc. but it looks
like this whole area is in a pretty sticky state anyway so let's
not think about that.
> >Anyway, this raises another question -- if IO accessors have the right
> >ordering, why is wmb() not an lwsync as well? There appears to be many
> >more wmb() calls than rmb()...
>
> Input MMIO accessors are {sync, load, stall pipeline until load came
> back}.
> That's a full ordering on both sides.
>
> Output MMIO on the other hand is done with {sync, store}. Now since
> wmb() has to order MMIO writes vs. main memory writes, we need a full
> sync here. On some (most, all?) CPUs an eieio is actually enough btw.
> The barrier insn could be put at the end of all MMIO write ops too,
> but I believe that would be more expensive (in execution time; in code
> size it definitely would be, of course).
Ah, that explains why wmb() is a sync. Doesn't seem like a very good
idea though, if the rationale of having fully ordered IO accessors was
because drivers didn't have enough barriers in them.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:33 ` Segher Boessenkool
@ 2007-08-22 4:05 ` Nick Piggin
2007-08-23 17:49 ` Segher Boessenkool
0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-08-22 4:05 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Wed, Aug 22, 2007 at 05:33:16AM +0200, Segher Boessenkool wrote:
> >>The I/O accessor functions enforce the necessary ordering
> >>already I believe.
> >
> >Hmm, I never followed those discussions last year about IO ordering,
> >and
> >I can't see where (if) it was documented anywhere :(
>
> The comments in system.h weren't updated with the last fix, I think.
>
> >It appears that legacy code is handled by defining the old IO
> >accessors to
> >be completely ordered, and introducing new __raw_ variants that are not
> >(OTOH, it seems like other architectures are implementing __raw prefix
> >as
> >inorder unless there is a _relaxed postfix).
>
> __raw_XX() is for platform code only, which can do the needed
> barriers without having to use the heavy hammer like everything
> else unfortunately does.
npiggin@nick:~/usr/src/linux-2.6/drivers> egrep '__raw_(write|read)' -r * | wc -l
685
> >Drivers are definitely using these __raw_ accessors, and from a quick
> >look, they do appear to be hoping that *mb() is going to order access
> >for
> >them.
>
> Which drivers?
There are maybe a dozen that use the raw accessors, and use non-smp_
memory barriers. I just looked at drivers/video/tgafb.c, which
indeed appears to intermix them.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 4:05 ` Nick Piggin
@ 2007-08-23 17:49 ` Segher Boessenkool
0 siblings, 0 replies; 15+ messages in thread
From: Segher Boessenkool @ 2007-08-23 17:49 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>>> Drivers are definitely using these __raw_ accessors, and from a quick
>>> look, they do appear to be hoping that *mb() is going to order access
>>> for
>>> them.
>>
>> Which drivers?
>
> There are maybe a dozen that use the raw accessors, and use non-smp_
> memory barriers. I just looked at drivers/video/tgafb.c, which
> indeed appears to intermix them.
Hrm yeah. It also looks like old buggy code that could use a
cleanup or two (or three or four). I wonder if all __raw_ users
are like that?
Segher
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-22 3:55 ` Nick Piggin
@ 2007-08-23 17:57 ` Segher Boessenkool
2007-08-24 2:47 ` Nick Piggin
0 siblings, 1 reply; 15+ messages in thread
From: Segher Boessenkool @ 2007-08-23 17:57 UTC (permalink / raw)
To: Nick Piggin; +Cc: linuxppc-dev, Paul Mackerras
>> The powerpc kernel needs to have full sync insns in every I/O
>> accessor in order to enforce all the ordering rules Linux demands.
>> It's a bloody shame, but the alternative would be to make the
>> barriers lots more expensive. A third alternative would be to
>
> Well lots more expensive compared to what you have now. But what
> you have now is like having those expensive barriers between
> *every* io access.
Yeah. But I/O reads are very expensive anyway, and the barriers
are used for more than just I/O ordering.
I/O writes are a different thing; ideally, they would use only
eieio, if anything at all.
Maybe the tradeoff isn't optimal. The I/O primitives didn't have
all those "sync"s in there before, they got added because some bad
interaction with spinlocks was discovered, if my memory isn't failing
me.
>> have barrier ops that do not order everything, but just A vs. B
>> for various choices of A and B (coherent accesses, MMIO accesses,
>> etc.)
>
> The non-smp_ variant is supposed to order everything, AFAIK. Maybe
> you could get more fancy and have PIO vs MMIO etc etc. but it looks
> like this whole area is in a pretty sticky state anyway so let's
> not think about that.
*Thinking* about it is fun. Trying to get the code merged would be
a different thing ;-)
Segher
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] powerpc: rmb fix
2007-08-23 17:57 ` Segher Boessenkool
@ 2007-08-24 2:47 ` Nick Piggin
0 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2007-08-24 2:47 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev, Paul Mackerras
On Thu, Aug 23, 2007 at 07:57:20PM +0200, Segher Boessenkool wrote:
> >>The powerpc kernel needs to have full sync insns in every I/O
> >>accessor in order to enforce all the ordering rules Linux demands.
> >>It's a bloody shame, but the alternative would be to make the
> >>barriers lots more expensive. A third alternative would be to
> >
> >Well lots more expensive compared to what you have now. But what
> >you have now is like having those expensive barriers between
> >*every* io access.
>
> Yeah. But I/O reads are very expensive anyway, and the barriers
> are used for more than just I/O ordering.
rmb() should only be used when IO ordering matters. smp_rmb() is
for regular ordering.
So doesn't the fact IO reads are very expensive anyway lend more
weight _to have_ the full IO ordering in rmb()?
> I/O writes are a different thing; ideally, they would use only
> eieio, if anything at all.
For IO to IO ordering, yes eieio would be ideal. I don't know that
there is really such a primitive for that in Linux. io_wmb().
> Maybe the tradeoff isn't optimal. The I/O primitives didn't have
> all those "sync"s in there before, they got added because some bad
> interaction with spinlocks was discovered, if my memory isn't failing
> me.
I think it may have been because IO ops are pretty strongly ordered
on x86, and it was thought that a fair amount of code was relying on
that. So the old primitives were made quite strongly ordered, and
new ones were added to avoid the overhead. Unfortunately, you can't
actually use the unordered ones unless you have working barrier
instructions. Hence why I think rmb() should be an IO barrier.
But I'm not pushing this too hard. You guys all know the gory ppc
details better than I, so I'll just leave it with you to work out
whether it is the right thing to do.
> >>have barrier ops that do not order everything, but just A vs. B
> >>for various choices of A and B (coherent accesses, MMIO accesses,
> >>etc.)
> >
> >The non-smp_ variant is supposed to order everything, AFAIK. Maybe
> >you could get more fancy and have PIO vs MMIO etc etc. but it looks
> >like this whole area is in a pretty sticky state anyway so let's
> >not think about that.
>
> *Thinking* about it is fun. Trying to get the code merged would be
> a different thing ;-)
;)
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2007-08-24 2:47 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-21 2:11 [patch 1/2] powerpc: rmb fix Nick Piggin
2007-08-21 2:16 ` [patch 1/2] powerpc: smp_wmb speedup Nick Piggin
2007-08-21 2:21 ` Nick Piggin
2007-08-21 19:07 ` [patch 1/2] powerpc: rmb fix Joel Schopp
2007-08-21 19:43 ` Segher Boessenkool
2007-08-21 21:42 ` Linas Vepstas
2007-08-22 1:16 ` Nick Piggin
2007-08-22 3:29 ` Segher Boessenkool
2007-08-22 3:55 ` Nick Piggin
2007-08-23 17:57 ` Segher Boessenkool
2007-08-24 2:47 ` Nick Piggin
2007-08-22 3:15 ` Nick Piggin
2007-08-22 3:33 ` Segher Boessenkool
2007-08-22 4:05 ` Nick Piggin
2007-08-23 17:49 ` Segher Boessenkool
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).