[PATCH 6/24] make atomic_read() behave consistently on frv

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 6/24] make atomic_read() behave consistently on frv
@ 2007-08-09 13:41 Chris Snook
  2007-08-09 16:54 ` Chris Snook
  2007-08-10  9:23 ` David Howells
  0 siblings, 2 replies; 30+ messages in thread
From: Chris Snook @ 2007-08-09 13:41 UTC (permalink / raw)
  To: linux-kernel, linux-arch, torvalds
  Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

From: Chris Snook <csnook@redhat.com>

Make atomic_read() volatile on frv.  This ensures that busy-waiting
for an interrupt handler to change an atomic_t won't get compiled to an
infinite loop, consistent with SMP architectures.

Signed-off-by: Chris Snook <csnook@redhat.com>

--- linux-2.6.23-rc2-orig/include/asm-frv/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-frv/atomic.h	2007-08-09 06:41:48.000000000 -0400
@@ -40,7 +40,12 @@ typedef struct {
 } atomic_t;
 
 #define ATOMIC_INIT(i)		{ (i) }
-#define atomic_read(v)		((v)->counter)
+
+/*
+ * Casting to volatile here minimizes the need for barriers,
+ * without having to declare the type itself as volatile.
+ */
+#define atomic_read(v)		(*(volatile int *)&(v)->counter)
 #define atomic_set(v, i)	(((v)->counter) = (i))
 
 #ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-09 13:41 [PATCH 6/24] make atomic_read() behave consistently on frv Chris Snook
@ 2007-08-09 16:54 ` Chris Snook
  2007-08-10  9:23 ` David Howells
  1 sibling, 0 replies; 30+ messages in thread
From: Chris Snook @ 2007-08-09 16:54 UTC (permalink / raw)
  To: linux-kernel, linux-arch, torvalds
  Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

Chris Snook wrote:
> From: Chris Snook <csnook@redhat.com>
> 
> Make atomic_read() volatile on frv.  This ensures that busy-waiting
> for an interrupt handler to change an atomic_t won't get compiled to an
> infinite loop, consistent with SMP architectures.

To head off the criticism, I admit this is an oversimplification, and true 
busy-waiters should be using cpu_relax(), which contains a barrier.  As 
discussed in recent threads, there are other cases which can be optimized by 
removing the need for a barrier, and having behavior consistent with 
architectures where the benefit is more profound is also valuable.

	-- Chris

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-09 13:41 [PATCH 6/24] make atomic_read() behave consistently on frv Chris Snook
  2007-08-09 16:54 ` Chris Snook
@ 2007-08-10  9:23 ` David Howells
  2007-08-10 19:54   ` Chris Snook
  2007-08-11  8:47   ` David Howells
  1 sibling, 2 replies; 30+ messages in thread
From: David Howells @ 2007-08-10  9:23 UTC (permalink / raw)
  To: Chris Snook
  Cc: dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Chris Snook <csnook@redhat.com> wrote:

> To head off the criticism, I admit this is an oversimplification, and true
> busy-waiters should be using cpu_relax(), which contains a barrier.

Why would you want to use cpu_relax()?  That's there to waste time efficiently,
isn't it?  Shouldn't you be using smp_rmb() or something like that?

David

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-10  9:23 ` David Howells
@ 2007-08-10 19:54   ` Chris Snook
  2007-08-11  0:54     ` Herbert Xu
  2007-08-11  8:47   ` David Howells
  1 sibling, 1 reply; 30+ messages in thread
From: Chris Snook @ 2007-08-10 19:54 UTC (permalink / raw)
  To: David Howells
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

David Howells wrote:
> Chris Snook <csnook@redhat.com> wrote:
> 
>> To head off the criticism, I admit this is an oversimplification, and true
>> busy-waiters should be using cpu_relax(), which contains a barrier.
> 
> Why would you want to use cpu_relax()?  That's there to waste time efficiently,
> isn't it?  Shouldn't you be using smp_rmb() or something like that?
> 
> David

cpu_relax() contains a barrier, so it should do the right thing.  For 
non-smp architectures, I'm concerned about interacting with interrupt 
handlers.  Some drivers do use atomic_* operations.

	-- Chris

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-10 19:54   ` Chris Snook
@ 2007-08-11  0:54     ` Herbert Xu
  2007-08-11  4:29       ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Herbert Xu @ 2007-08-11  0:54 UTC (permalink / raw)
  To: Chris Snook
  Cc: dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Chris Snook <csnook@redhat.com> wrote:
> 
> cpu_relax() contains a barrier, so it should do the right thing.  For 
> non-smp architectures, I'm concerned about interacting with interrupt 
> handlers.  Some drivers do use atomic_* operations.

What problems with interrupt handlers? Access to int/long must
be atomic or we're in big trouble anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-11  0:54     ` Herbert Xu
@ 2007-08-11  4:29       ` Paul E. McKenney
  2007-08-13  5:15         ` Herbert Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-11  4:29 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Chris Snook, dhowells, linux-kernel, linux-arch, torvalds, netdev,
	akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl

On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
> Chris Snook <csnook@redhat.com> wrote:
> > 
> > cpu_relax() contains a barrier, so it should do the right thing.  For 
> > non-smp architectures, I'm concerned about interacting with interrupt 
> > handlers.  Some drivers do use atomic_* operations.
> 
> What problems with interrupt handlers? Access to int/long must
> be atomic or we're in big trouble anyway.

Reordering due to compiler optimizations.  CPU reordering does not
affect interactions with interrupt handlers on a given CPU, but
reordering due to compiler code-movement optimization does.  Since
volatile can in some cases suppress code-movement optimizations,
it can affect interactions with interrupt handlers.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-11  4:29       ` Paul E. McKenney
@ 2007-08-13  5:15         ` Herbert Xu
  2007-08-13  6:03           ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Herbert Xu @ 2007-08-13  5:15 UTC (permalink / raw)
  To: paulmck
  Cc: herbert, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
>> Chris Snook <csnook@redhat.com> wrote:
>> > 
>> > cpu_relax() contains a barrier, so it should do the right thing.  For 
>> > non-smp architectures, I'm concerned about interacting with interrupt 
>> > handlers.  Some drivers do use atomic_* operations.
>> 
>> What problems with interrupt handlers? Access to int/long must
>> be atomic or we're in big trouble anyway.
> 
> Reordering due to compiler optimizations.  CPU reordering does not
> affect interactions with interrupt handlers on a given CPU, but
> reordering due to compiler code-movement optimization does.  Since
> volatile can in some cases suppress code-movement optimizations,
> it can affect interactions with interrupt handlers.

If such reordering matters, then you should use one of the
*mb macros or barrier() rather than relying on possibly
hidden volatile cast.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-13  5:15         ` Herbert Xu
@ 2007-08-13  6:03           ` Paul E. McKenney
  2007-08-14  5:34             ` Nick Piggin
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-13  6:03 UTC (permalink / raw)
  To: Herbert Xu
  Cc: csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev,
	akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl

On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
> >> Chris Snook <csnook@redhat.com> wrote:
> >> > 
> >> > cpu_relax() contains a barrier, so it should do the right thing.  For 
> >> > non-smp architectures, I'm concerned about interacting with interrupt 
> >> > handlers.  Some drivers do use atomic_* operations.
> >> 
> >> What problems with interrupt handlers? Access to int/long must
> >> be atomic or we're in big trouble anyway.
> > 
> > Reordering due to compiler optimizations.  CPU reordering does not
> > affect interactions with interrupt handlers on a given CPU, but
> > reordering due to compiler code-movement optimization does.  Since
> > volatile can in some cases suppress code-movement optimizations,
> > it can affect interactions with interrupt handlers.
> 
> If such reordering matters, then you should use one of the
> *mb macros or barrier() rather than relying on possibly
> hidden volatile cast.

If communicating among CPUs, sure.  However, when communicating between
mainline and interrupt/NMI handlers on the same CPU, the barrier() and
most expecially the *mb() macros are gross overkill.  So there really
truly is a place for volatile -- not a large place, to be sure, but a
place nonetheless.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-13  6:03           ` Paul E. McKenney
@ 2007-08-14  5:34             ` Nick Piggin
  2007-08-14  7:26               ` Herbert Xu
  2007-08-14 17:01               ` Paul E. McKenney
  0 siblings, 2 replies; 30+ messages in thread
From: Nick Piggin @ 2007-08-14  5:34 UTC (permalink / raw)
  To: paulmck
  Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

Paul E. McKenney wrote:
> On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote:
> 
>>Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>>
>>>On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
>>>
>>>>Chris Snook <csnook@redhat.com> wrote:
>>>>
>>>>>cpu_relax() contains a barrier, so it should do the right thing.  For 
>>>>>non-smp architectures, I'm concerned about interacting with interrupt 
>>>>>handlers.  Some drivers do use atomic_* operations.
>>>>
>>>>What problems with interrupt handlers? Access to int/long must
>>>>be atomic or we're in big trouble anyway.
>>>
>>>Reordering due to compiler optimizations.  CPU reordering does not
>>>affect interactions with interrupt handlers on a given CPU, but
>>>reordering due to compiler code-movement optimization does.  Since
>>>volatile can in some cases suppress code-movement optimizations,
>>>it can affect interactions with interrupt handlers.
>>
>>If such reordering matters, then you should use one of the
>>*mb macros or barrier() rather than relying on possibly
>>hidden volatile cast.
> 
> 
> If communicating among CPUs, sure.  However, when communicating between
> mainline and interrupt/NMI handlers on the same CPU, the barrier() and
> most expecially the *mb() macros are gross overkill.  So there really
> truly is a place for volatile -- not a large place, to be sure, but a
> place nonetheless.

I really would like all volatile users to go away and be replaced
by explicit barriers. It makes things nicer and more explicit... for
atomic_t type there probably aren't many optimisations that can be
made which volatile would disallow (in actual kernel code), but for
others (eg. bitops, maybe atomic ops in UP kernels), there would be.

Maybe it is the safe way to go, but it does obscure cases where there
is a real need for barriers.

Many atomic operations are allowed to be reordered between CPUs, so
I don't have a good idea for the rationale to order them within the
CPU (also loads and stores to long and ptr types are not ordered like
this, although we do consider those to be atomic operations too).

barrier() in a way is like enforcing sequential memory ordering
between process and interrupt context, wheras volatile is just
enforcing coherency of a single memory location (and as such is
cheaper).

What do you think of this crazy idea?

/* Enforce a compiler barrier for only operations to location X.
  * Call multiple times to provide an ordering between multiple
  * memory locations. Other memory operations can be assumed by
  * the compiler to remain unchanged and may be reordered
  */
#define order(x) asm volatile("" : "+m" (x))

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-14  5:34             ` Nick Piggin
@ 2007-08-14  7:26               ` Herbert Xu
  2007-08-14 17:01               ` Paul E. McKenney
  1 sibling, 0 replies; 30+ messages in thread
From: Herbert Xu @ 2007-08-14  7:26 UTC (permalink / raw)
  To: Nick Piggin
  Cc: paulmck, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:
>
> What do you think of this crazy idea?
> 
> /* Enforce a compiler barrier for only operations to location X.
>  * Call multiple times to provide an ordering between multiple
>  * memory locations. Other memory operations can be assumed by
>  * the compiler to remain unchanged and may be reordered
>  */
> #define order(x) asm volatile("" : "+m" (x))

Yes this is a very good idea.  This also makes it explicit
that the coder is depending on this rather than the more
vague connotations of atomic_read.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-14  5:34             ` Nick Piggin
  2007-08-14  7:26               ` Herbert Xu
@ 2007-08-14 17:01               ` Paul E. McKenney
  2007-08-14 22:01                 ` Arnd Bergmann
  2007-08-15 13:30                 ` Nick Piggin
  1 sibling, 2 replies; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-14 17:01 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:
> Paul E. McKenney wrote:
> >On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote:
> >
> >>Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >>
> >>>On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
> >>>
> >>>>Chris Snook <csnook@redhat.com> wrote:
> >>>>
> >>>>>cpu_relax() contains a barrier, so it should do the right thing.  For 
> >>>>>non-smp architectures, I'm concerned about interacting with interrupt 
> >>>>>handlers.  Some drivers do use atomic_* operations.
> >>>>
> >>>>What problems with interrupt handlers? Access to int/long must
> >>>>be atomic or we're in big trouble anyway.
> >>>
> >>>Reordering due to compiler optimizations.  CPU reordering does not
> >>>affect interactions with interrupt handlers on a given CPU, but
> >>>reordering due to compiler code-movement optimization does.  Since
> >>>volatile can in some cases suppress code-movement optimizations,
> >>>it can affect interactions with interrupt handlers.
> >>
> >>If such reordering matters, then you should use one of the
> >>*mb macros or barrier() rather than relying on possibly
> >>hidden volatile cast.
> >
> >
> >If communicating among CPUs, sure.  However, when communicating between
> >mainline and interrupt/NMI handlers on the same CPU, the barrier() and
> >most expecially the *mb() macros are gross overkill.  So there really
> >truly is a place for volatile -- not a large place, to be sure, but a
> >place nonetheless.
> 
> I really would like all volatile users to go away and be replaced
> by explicit barriers. It makes things nicer and more explicit... for
> atomic_t type there probably aren't many optimisations that can be
> made which volatile would disallow (in actual kernel code), but for
> others (eg. bitops, maybe atomic ops in UP kernels), there would be.
> 
> Maybe it is the safe way to go, but it does obscure cases where there
> is a real need for barriers.

I prefer burying barriers into other primitives.

> Many atomic operations are allowed to be reordered between CPUs, so
> I don't have a good idea for the rationale to order them within the
> CPU (also loads and stores to long and ptr types are not ordered like
> this, although we do consider those to be atomic operations too).
> 
> barrier() in a way is like enforcing sequential memory ordering
> between process and interrupt context, wheras volatile is just
> enforcing coherency of a single memory location (and as such is
> cheaper).

barrier() is useful, but it has the very painful side-effect of forcing
the compiler to dump temporaries.  So we do need something that is
not quite so global in effect.

> What do you think of this crazy idea?
> 
> /* Enforce a compiler barrier for only operations to location X.
>  * Call multiple times to provide an ordering between multiple
>  * memory locations. Other memory operations can be assumed by
>  * the compiler to remain unchanged and may be reordered
>  */
> #define order(x) asm volatile("" : "+m" (x))

There was something very similar discussed earlier in this thread,
with quite a bit of debate as to exactly what the "m" flag should
look like.  I suggested something similar named ACCESS_ONCE in the
context of RCU (http://lkml.org/lkml/2007/7/11/664):

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

The nice thing about this is that it works for both loads and stores.
Not clear that order() above does this -- I get compiler errors when
I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-14 17:01               ` Paul E. McKenney
@ 2007-08-14 22:01                 ` Arnd Bergmann
  2007-08-14 22:43                   ` Paul E. McKenney
  2007-08-15 13:30                 ` Nick Piggin
  1 sibling, 1 reply; 30+ messages in thread
From: Arnd Bergmann @ 2007-08-14 22:01 UTC (permalink / raw)
  To: paulmck
  Cc: Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel,
	linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl

On Tuesday 14 August 2007, Paul E. McKenney wrote:
> > #define order(x) asm volatile("" : "+m" (x))
> 
> There was something very similar discussed earlier in this thread,
> with quite a bit of debate as to exactly what the "m" flag should
> look like.  I suggested something similar named ACCESS_ONCE in the
> context of RCU (http://lkml.org/lkml/2007/7/11/664):
> 
>         #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> 
> The nice thing about this is that it works for both loads and stores.
> Not clear that order() above does this -- I get compiler errors when
> I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2.

Well, it serves a different purpose: While your ACCESS_ONCE() macro is
an lvalue, the order() macro is a statement that can be used in place
of the barrier() macro. order() is the most lightweight barrier as it
only enforces ordering on a single variable in the compiler, but does
not have any side-effects visible to other threads, like the cache
line access in ACCESS_ONCE has.

	Arnd <><

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-14 22:01                 ` Arnd Bergmann
@ 2007-08-14 22:43                   ` Paul E. McKenney
  2007-08-15 13:29                     ` Arnd Bergmann
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-14 22:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel,
	linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl

On Wed, Aug 15, 2007 at 12:01:54AM +0200, Arnd Bergmann wrote:
> On Tuesday 14 August 2007, Paul E. McKenney wrote:
> > > #define order(x) asm volatile("" : "+m" (x))
> > 
> > There was something very similar discussed earlier in this thread,
> > with quite a bit of debate as to exactly what the "m" flag should
> > look like.  I suggested something similar named ACCESS_ONCE in the
> > context of RCU (http://lkml.org/lkml/2007/7/11/664):
> > 
> >         #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > 
> > The nice thing about this is that it works for both loads and stores.
> > Not clear that order() above does this -- I get compiler errors when
> > I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2.
> 
> Well, it serves a different purpose: While your ACCESS_ONCE() macro is
> an lvalue, the order() macro is a statement that can be used in place
> of the barrier() macro. order() is the most lightweight barrier as it
> only enforces ordering on a single variable in the compiler, but does
> not have any side-effects visible to other threads, like the cache
> line access in ACCESS_ONCE has.

ACCESS_ONCE() is indeed intended to be used when actually loading or
storing the variable.  That said, I must admit that it is not clear to me
why you would want to add an extra order() rather than ACCESS_ONCE()ing
one or both of the adjacent accesses to that same variable.

So, what am I missing?

						Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-14 22:43                   ` Paul E. McKenney
@ 2007-08-15 13:29                     ` Arnd Bergmann
  2007-08-15 15:06                       ` Michael Buesch
  0 siblings, 1 reply; 30+ messages in thread
From: Arnd Bergmann @ 2007-08-15 13:29 UTC (permalink / raw)
  To: paulmck
  Cc: Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel,
	linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl

On Wednesday 15 August 2007, Paul E. McKenney wrote:

> ACCESS_ONCE() is indeed intended to be used when actually loading or
> storing the variable.  That said, I must admit that it is not clear to me
> why you would want to add an extra order() rather than ACCESS_ONCE()ing
> one or both of the adjacent accesses to that same variable.
> 
> So, what am I missing?

You're probably right, the only case I can construct is something like

	if (ACCESS_ONCE(x)) {
		...
		ACCESS_ONCE(x)++;
	}

which would be slightly less efficient than

	if (x)
		x++;
	order(x);

because in the first case, you need to do two ordered accesses
but only one in the second case. However, I can't think of a case
where this actually makes a noticable difference in real life.

	Arnd <><

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 13:29                     ` Arnd Bergmann
@ 2007-08-15 15:06                       ` Michael Buesch
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Buesch @ 2007-08-15 15:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: paulmck, Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel,
	linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl

On Wednesday 15 August 2007 15:29:43 Arnd Bergmann wrote:
> On Wednesday 15 August 2007, Paul E. McKenney wrote:
> 
> > ACCESS_ONCE() is indeed intended to be used when actually loading or
> > storing the variable.  That said, I must admit that it is not clear to me
> > why you would want to add an extra order() rather than ACCESS_ONCE()ing
> > one or both of the adjacent accesses to that same variable.
> > 
> > So, what am I missing?
> 
> You're probably right, the only case I can construct is something like
> 
> 	if (ACCESS_ONCE(x)) {
> 		...
> 		ACCESS_ONCE(x)++;
> 	}
> 
> which would be slightly less efficient than
> 
> 	if (x)
> 		x++;
> 	order(x);
> 
> because in the first case, you need to do two ordered accesses
> but only one in the second case. However, I can't think of a case
> where this actually makes a noticable difference in real life.

How can this example actually get used in a sane and race-free
way? This would need locking around the whole if
statement. But locking is a barrier.

-- 
Greetings Michael.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-14 17:01               ` Paul E. McKenney
  2007-08-14 22:01                 ` Arnd Bergmann
@ 2007-08-15 13:30                 ` Nick Piggin
  2007-08-15 20:15                   ` Paul E. McKenney
  1 sibling, 1 reply; 30+ messages in thread
From: Nick Piggin @ 2007-08-15 13:30 UTC (permalink / raw)
  To: paulmck
  Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	Arnd Bergmann

Paul E. McKenney wrote:
> On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:

>>Maybe it is the safe way to go, but it does obscure cases where there
>>is a real need for barriers.
> 
> 
> I prefer burying barriers into other primitives.

When they should naturally be there, eg. locking or the RCU primitives,
I agree.

I don't like having them scattered in various "just in case" places,
because it makes both the users and the APIs hard to understand and
change.

Especially since several big architectures don't have volatile in their
atomic_get and _set, I think it would be a step backwards to add them in
as a "just in case" thin now (unless there is a better reason).

>>Many atomic operations are allowed to be reordered between CPUs, so
>>I don't have a good idea for the rationale to order them within the
>>CPU (also loads and stores to long and ptr types are not ordered like
>>this, although we do consider those to be atomic operations too).
>>
>>barrier() in a way is like enforcing sequential memory ordering
>>between process and interrupt context, wheras volatile is just
>>enforcing coherency of a single memory location (and as such is
>>cheaper).
> 
> 
> barrier() is useful, but it has the very painful side-effect of forcing
> the compiler to dump temporaries.  So we do need something that is
> not quite so global in effect.

Yep.

>>What do you think of this crazy idea?
>>
>>/* Enforce a compiler barrier for only operations to location X.
>> * Call multiple times to provide an ordering between multiple
>> * memory locations. Other memory operations can be assumed by
>> * the compiler to remain unchanged and may be reordered
>> */
>>#define order(x) asm volatile("" : "+m" (x))
> 
> 
> There was something very similar discussed earlier in this thread,
> with quite a bit of debate as to exactly what the "m" flag should
> look like.  I suggested something similar named ACCESS_ONCE in the
> context of RCU (http://lkml.org/lkml/2007/7/11/664):

Oh, I missed that earlier debate. Will go have a look.

> 	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> 
> The nice thing about this is that it works for both loads and stores.
> Not clear that order() above does this -- I get compiler errors when
> I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2.

As Arnd ponted out, order() is not supposed to be an lvalue, but a
statement like the rest of our memory ordering API.

As to your followup question of why to use it over ACCESS_ONCE. I
guess, aside from consistency with the rest of the barrier APIs, you
can use it in other primitives when you don't actually know what the
caller is going to do or if it even will make an access. You could
also use it between calls to _other_ primitives, etc... it just
seems more flexible to me, but I haven't actually used such a thing
in real code...

ACCESS_ONCE doesn't seem as descriptive. What it results in is the
memory location being loaded or stored (presumably once exactly),
but I think the more general underlying idea is a barrier point.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 13:30                 ` Nick Piggin
@ 2007-08-15 20:15                   ` Paul E. McKenney
  2007-08-16  1:09                     ` Nick Piggin
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-15 20:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	Arnd Bergmann

On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:
> Paul E. McKenney wrote:
> >On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:
> 
> >>Maybe it is the safe way to go, but it does obscure cases where there
> >>is a real need for barriers.
> >
> >
> >I prefer burying barriers into other primitives.
> 
> When they should naturally be there, eg. locking or the RCU primitives,
> I agree.
> 
> I don't like having them scattered in various "just in case" places,
> because it makes both the users and the APIs hard to understand and
> change.

I certainly agree that we shouldn't do volatile just for the fun of it,
and also that use of volatile should be quite rare.

> Especially since several big architectures don't have volatile in their
> atomic_get and _set, I think it would be a step backwards to add them in
> as a "just in case" thin now (unless there is a better reason).

Good point, except that I would expect gcc's optimization to continue
to improve.  I would like the kernel to be able to take advantage of
improved optimization, which means that we are going to have to make
a few changes.  Adding volatile to atomic_get() and atomic_set() is
IMHO one of those changes.

> >>Many atomic operations are allowed to be reordered between CPUs, so
> >>I don't have a good idea for the rationale to order them within the
> >>CPU (also loads and stores to long and ptr types are not ordered like
> >>this, although we do consider those to be atomic operations too).
> >>
> >>barrier() in a way is like enforcing sequential memory ordering
> >>between process and interrupt context, wheras volatile is just
> >>enforcing coherency of a single memory location (and as such is
> >>cheaper).
> >
> >
> >barrier() is useful, but it has the very painful side-effect of forcing
> >the compiler to dump temporaries.  So we do need something that is
> >not quite so global in effect.
> 
> Yep.
> 
> >>What do you think of this crazy idea?
> >>
> >>/* Enforce a compiler barrier for only operations to location X.
> >>* Call multiple times to provide an ordering between multiple
> >>* memory locations. Other memory operations can be assumed by
> >>* the compiler to remain unchanged and may be reordered
> >>*/
> >>#define order(x) asm volatile("" : "+m" (x))
> >
> >There was something very similar discussed earlier in this thread,
> >with quite a bit of debate as to exactly what the "m" flag should
> >look like.  I suggested something similar named ACCESS_ONCE in the
> >context of RCU (http://lkml.org/lkml/2007/7/11/664):
> 
> Oh, I missed that earlier debate. Will go have a look.
> 
> >	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> >
> >The nice thing about this is that it works for both loads and stores.
> >Not clear that order() above does this -- I get compiler errors when
> >I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2.
> 
> As Arnd ponted out, order() is not supposed to be an lvalue, but a
> statement like the rest of our memory ordering API.
> 
> As to your followup question of why to use it over ACCESS_ONCE. I
> guess, aside from consistency with the rest of the barrier APIs, you
> can use it in other primitives when you don't actually know what the
> caller is going to do or if it even will make an access. You could
> also use it between calls to _other_ primitives, etc... it just
> seems more flexible to me, but I haven't actually used such a thing
> in real code...
> 
> ACCESS_ONCE doesn't seem as descriptive. What it results in is the
> memory location being loaded or stored (presumably once exactly),
> but I think the more general underlying idea is a barrier point.

OK, first, I am not arguing that ACCESS_ONCE() can replace all current
uses of barrier().  Similarly, rcu_dereference(), rcu_assign_pointer(),
and the various RCU versions of the list-manipulation API in no way
replaced all uses of explicit memory barriers.  However, I do believe that
these API are much easier to use (where they apply, of course) than were
the earlier idioms involving explicit memory barriers.

But we of course need to keep the explicit memory and compiler barriers
for other situations.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 20:15                   ` Paul E. McKenney
@ 2007-08-16  1:09                     ` Nick Piggin
  2007-08-16  2:27                       ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Nick Piggin @ 2007-08-16  1:09 UTC (permalink / raw)
  To: paulmck
  Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	Arnd Bergmann

Paul E. McKenney wrote:
> On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:

>>Especially since several big architectures don't have volatile in their
>>atomic_get and _set, I think it would be a step backwards to add them in
>>as a "just in case" thin now (unless there is a better reason).
> 
> 
> Good point, except that I would expect gcc's optimization to continue
> to improve.  I would like the kernel to be able to take advantage of
> improved optimization, which means that we are going to have to make
> a few changes.  Adding volatile to atomic_get() and atomic_set() is
> IMHO one of those changes.

What optimisations? gcc already does most of the things you need a
barrier/volatile for, like reordering non-dependant loads and stores,
and eliminating mem ops completely by caching in registers.


>>As to your followup question of why to use it over ACCESS_ONCE. I
>>guess, aside from consistency with the rest of the barrier APIs, you
>>can use it in other primitives when you don't actually know what the
>>caller is going to do or if it even will make an access. You could
>>also use it between calls to _other_ primitives, etc... it just
>>seems more flexible to me, but I haven't actually used such a thing
>>in real code...
>>
>>ACCESS_ONCE doesn't seem as descriptive. What it results in is the
>>memory location being loaded or stored (presumably once exactly),
>>but I think the more general underlying idea is a barrier point.
> 
> 
> OK, first, I am not arguing that ACCESS_ONCE() can replace all current
> uses of barrier().

OK. Well I also wasn't saying that ACCESS_ONCE should not be
implemented. But if we want something like it, then it would make
sense to have an equivalent barrier statement as well (ie. order()).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-16  1:09                     ` Nick Piggin
@ 2007-08-16  2:27                       ` Paul E. McKenney
  0 siblings, 0 replies; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-16  2:27 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	Arnd Bergmann

On Thu, Aug 16, 2007 at 11:09:25AM +1000, Nick Piggin wrote:
> Paul E. McKenney wrote:
> >On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:
> 
> >>Especially since several big architectures don't have volatile in their
> >>atomic_get and _set, I think it would be a step backwards to add them in
> >>as a "just in case" thin now (unless there is a better reason).
> >
> >Good point, except that I would expect gcc's optimization to continue
> >to improve.  I would like the kernel to be able to take advantage of
> >improved optimization, which means that we are going to have to make
> >a few changes.  Adding volatile to atomic_get() and atomic_set() is
> >IMHO one of those changes.
> 
> What optimisations? gcc already does most of the things you need a
> barrier/volatile for, like reordering non-dependant loads and stores,
> and eliminating mem ops completely by caching in registers.

Yep, most of the currently practiced optimizations.  Given that CPU clock
frequencies are not rising as quickly as they once did, I would expect
that there will be added effort on implementing the known more-aggressive
optimization techniques, and on coming up with new ones as well.

Some of these new optimizations will likely be inappropriate for kernel
code, but I expect that some will be things that we want.

> >>As to your followup question of why to use it over ACCESS_ONCE. I
> >>guess, aside from consistency with the rest of the barrier APIs, you
> >>can use it in other primitives when you don't actually know what the
> >>caller is going to do or if it even will make an access. You could
> >>also use it between calls to _other_ primitives, etc... it just
> >>seems more flexible to me, but I haven't actually used such a thing
> >>in real code...
> >>
> >>ACCESS_ONCE doesn't seem as descriptive. What it results in is the
> >>memory location being loaded or stored (presumably once exactly),
> >>but I think the more general underlying idea is a barrier point.
> >
> >OK, first, I am not arguing that ACCESS_ONCE() can replace all current
> >uses of barrier().
> 
> OK. Well I also wasn't saying that ACCESS_ONCE should not be
> implemented. But if we want something like it, then it would make
> sense to have an equivalent barrier statement as well (ie. order()).

And I am not arguing against use of asms to implement the volatility
in atomic_read() and atomic_set(), and in fact it appears that one
of the architectures will be taking this approach.

Sounds like we might be in violent agreement.  ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-10  9:23 ` David Howells
  2007-08-10 19:54   ` Chris Snook
@ 2007-08-11  8:47   ` David Howells
  2007-08-13  6:44     ` Chris Snook
  1 sibling, 1 reply; 30+ messages in thread
From: David Howells @ 2007-08-11  8:47 UTC (permalink / raw)
  To: Chris Snook
  Cc: dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Chris Snook <csnook@redhat.com> wrote:

> cpu_relax() contains a barrier, so it should do the right thing.  For non-smp
> architectures, I'm concerned about interacting with interrupt handlers.  Some
> drivers do use atomic_* operations.

I'm not sure that actually answers my question.  Why not smp_rmb()?

David

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-11  8:47   ` David Howells
@ 2007-08-13  6:44     ` Chris Snook
  2007-08-14  5:42       ` Nick Piggin
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Snook @ 2007-08-13  6:44 UTC (permalink / raw)
  To: David Howells
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

David Howells wrote:
> Chris Snook <csnook@redhat.com> wrote:
> 
>> cpu_relax() contains a barrier, so it should do the right thing.  For non-smp
>> architectures, I'm concerned about interacting with interrupt handlers.  Some
>> drivers do use atomic_* operations.
> 
> I'm not sure that actually answers my question.  Why not smp_rmb()?
> 
> David

I would assume because we want to waste time efficiently even on non-smp 
architectures, rather than frying the CPU or draining the battery.  Certain 
looping execution patterns can cause the CPU to operate above thermal design 
power.  I have fans on my workstation that only ever come on when running 
LINPACK, and that's generally memory bandwidth-bound.  Just imagine what happens 
when you're executing the same few non-serializing instructions in a tight loop 
without ever stalling on memory fetches, or being scheduled out.

If there's another reason, I'd like to hear it too, because I'm just guessing here.

	-- Chris

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-13  6:44     ` Chris Snook
@ 2007-08-14  5:42       ` Nick Piggin
  2007-08-15 18:51         ` Segher Boessenkool
  0 siblings, 1 reply; 30+ messages in thread
From: Nick Piggin @ 2007-08-14  5:42 UTC (permalink / raw)
  To: Chris Snook
  Cc: David Howells, linux-kernel, linux-arch, torvalds, netdev, akpm,
	ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Chris Snook wrote:
> David Howells wrote:
> 
>> Chris Snook <csnook@redhat.com> wrote:
>>
>>> cpu_relax() contains a barrier, so it should do the right thing.  For 
>>> non-smp
>>> architectures, I'm concerned about interacting with interrupt 
>>> handlers.  Some
>>> drivers do use atomic_* operations.
>>
>>
>> I'm not sure that actually answers my question.  Why not smp_rmb()?
>>
>> David
> 
> 
> I would assume because we want to waste time efficiently even on non-smp 
> architectures, rather than frying the CPU or draining the battery.  
> Certain looping execution patterns can cause the CPU to operate above 
> thermal design power.  I have fans on my workstation that only ever come 
> on when running LINPACK, and that's generally memory bandwidth-bound.  
> Just imagine what happens when you're executing the same few 
> non-serializing instructions in a tight loop without ever stalling on 
> memory fetches, or being scheduled out.
> 
> If there's another reason, I'd like to hear it too, because I'm just 
> guessing here.

Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.

Consider that smp_rmb basically will do anything from flushing the
pipeline to invalidating loads speculatively executed out of order. AFAIK
it will not control the visibility of stores coming from other CPUs (that
is up to the cache coherency).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-14  5:42       ` Nick Piggin
@ 2007-08-15 18:51         ` Segher Boessenkool
  2007-08-15 19:18           ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Segher Boessenkool @ 2007-08-15 18:51 UTC (permalink / raw)
  To: Nick Piggin
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky,
	Chris Snook, davem, wensong, wjiang, David Howells

> Well if there is only one memory location involved, then smp_rmb() 
> isn't
> going to really do anything anyway, so it would be incorrect to use it.

rmb() orders *any* two reads; that includes two reads from the same
location.

> Consider that smp_rmb basically will do anything from flushing the
> pipeline to invalidating loads speculatively executed out of order. 
> AFAIK
> it will not control the visibility of stores coming from other CPUs 
> (that
> is up to the cache coherency).

The writer side should typically use wmb() whenever the reader side
uses rmb(), sure.


Segher


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 18:51         ` Segher Boessenkool
@ 2007-08-15 19:18           ` Paul E. McKenney
  2007-08-15 19:46             ` Segher Boessenkool
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-15 19:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Nick Piggin, heiko.carstens, horms, linux-kernel, rpjday, ak,
	netdev, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells

On Wed, Aug 15, 2007 at 08:51:58PM +0200, Segher Boessenkool wrote:
> >Well if there is only one memory location involved, then smp_rmb() 
> >isn't
> >going to really do anything anyway, so it would be incorrect to use it.
> 
> rmb() orders *any* two reads; that includes two reads from the same
> location.

If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.

						Thanx, Paul

> >Consider that smp_rmb basically will do anything from flushing the
> >pipeline to invalidating loads speculatively executed out of order. 
> >AFAIK
> >it will not control the visibility of stores coming from other CPUs 
> >(that
> >is up to the cache coherency).
> 
> The writer side should typically use wmb() whenever the reader side
> uses rmb(), sure.
> 
> 
> Segher
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 19:18           ` Paul E. McKenney
@ 2007-08-15 19:46             ` Segher Boessenkool
  2007-08-15 19:59               ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Segher Boessenkool @ 2007-08-15 19:46 UTC (permalink / raw)
  To: paulmck
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells

>>> Well if there is only one memory location involved, then smp_rmb()
>>> isn't
>>> going to really do anything anyway, so it would be incorrect to use 
>>> it.
>>
>> rmb() orders *any* two reads; that includes two reads from the same
>> location.
>
> If the two reads are to the same location, all CPUs I am aware of
> will maintain the ordering without need for a memory barrier.

That's true of course, although there is no real guarantee for that.


Segher


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 19:46             ` Segher Boessenkool
@ 2007-08-15 19:59               ` Paul E. McKenney
  2007-08-15 20:13                 ` Segher Boessenkool
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-15 19:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells

On Wed, Aug 15, 2007 at 09:46:55PM +0200, Segher Boessenkool wrote:
> >>>Well if there is only one memory location involved, then smp_rmb()
> >>>isn't
> >>>going to really do anything anyway, so it would be incorrect to use 
> >>>it.
> >>
> >>rmb() orders *any* two reads; that includes two reads from the same
> >>location.
> >
> >If the two reads are to the same location, all CPUs I am aware of
> >will maintain the ordering without need for a memory barrier.
> 
> That's true of course, although there is no real guarantee for that.

A CPU that did not provide this property ("cache coherence") would be
most emphatically reviled.  So we are pretty safe assuming that CPUs
will provide it.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 19:59               ` Paul E. McKenney
@ 2007-08-15 20:13                 ` Segher Boessenkool
  2007-08-15 20:38                   ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Segher Boessenkool @ 2007-08-15 20:13 UTC (permalink / raw)
  To: paulmck
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells

>>>>> Well if there is only one memory location involved, then smp_rmb()
>>>>> isn't
>>>>> going to really do anything anyway, so it would be incorrect to use
>>>>> it.
>>>>
>>>> rmb() orders *any* two reads; that includes two reads from the same
>>>> location.
>>>
>>> If the two reads are to the same location, all CPUs I am aware of
>>> will maintain the ordering without need for a memory barrier.
>>
>> That's true of course, although there is no real guarantee for that.
>
> A CPU that did not provide this property ("cache coherence") would be
> most emphatically reviled.

That doesn't have anything to do with coherency as far as I can see.

It's just about the order in which a CPU (speculatively) performs the 
loads
(which isn't necessarily the same as the order in which it executes the
corresponding instructions, even).

> So we are pretty safe assuming that CPUs
> will provide it.

Yeah, pretty safe.  I just don't like undocumented assumptions :-)


Segher


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 20:13                 ` Segher Boessenkool
@ 2007-08-15 20:38                   ` Paul E. McKenney
  2007-08-15 21:15                     ` Segher Boessenkool
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2007-08-15 20:38 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells

On Wed, Aug 15, 2007 at 10:13:49PM +0200, Segher Boessenkool wrote:
> >>>>>Well if there is only one memory location involved, then smp_rmb()
> >>>>>isn't
> >>>>>going to really do anything anyway, so it would be incorrect to use
> >>>>>it.
> >>>>
> >>>>rmb() orders *any* two reads; that includes two reads from the same
> >>>>location.
> >>>
> >>>If the two reads are to the same location, all CPUs I am aware of
> >>>will maintain the ordering without need for a memory barrier.
> >>
> >>That's true of course, although there is no real guarantee for that.
> >
> >A CPU that did not provide this property ("cache coherence") would be
> >most emphatically reviled.
> 
> That doesn't have anything to do with coherency as far as I can see.
> 
> It's just about the order in which a CPU (speculatively) performs the 
> loads
> (which isn't necessarily the same as the order in which it executes the
> corresponding instructions, even).

Please check the definition of "cache coherence".

Summary: the CPU is indeed within its rights to execute loads and stores
to a single variable out of order, -but- only if it gets the same result
that it would have obtained by executing them in order.  Which means that
any reordering of accesses by a single CPU to a single variable will be
invisible to the software.

> >So we are pretty safe assuming that CPUs
> >will provide it.
> 
> Yeah, pretty safe.  I just don't like undocumented assumptions :-)

Can't help you there!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 20:38                   ` Paul E. McKenney
@ 2007-08-15 21:15                     ` Segher Boessenkool
  2007-08-16  1:20                       ` Nick Piggin
  0 siblings, 1 reply; 30+ messages in thread
From: Segher Boessenkool @ 2007-08-15 21:15 UTC (permalink / raw)
  To: paulmck
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells

> Please check the definition of "cache coherence".

Which of the twelve thousand such definitions?  :-)

> Summary: the CPU is indeed within its rights to execute loads and 
> stores
> to a single variable out of order, -but- only if it gets the same 
> result
> that it would have obtained by executing them in order.  Which means 
> that
> any reordering of accesses by a single CPU to a single variable will be
> invisible to the software.

I'm still not sure if that applies to all architectures.
Doesn't matter anyway, let's kill this thread :-)


Segher


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
  2007-08-15 21:15                     ` Segher Boessenkool
@ 2007-08-16  1:20                       ` Nick Piggin
  0 siblings, 0 replies; 30+ messages in thread
From: Nick Piggin @ 2007-08-16  1:20 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: paulmck, heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells

Segher Boessenkool wrote:
>> Please check the definition of "cache coherence".
> 
> 
> Which of the twelve thousand such definitions?  :-)

Every definition I have seen says that writes to a single memory
location have a serial order as seen by all CPUs, and that a read
will return the most recent write in the sequence (with a bit of
extra mumbo jumbo to cover store queues and store forwarding).

Within such a definition, I don't see how would be allowed for a
single CPU to execute reads out of order and have the second read
return an earlier write than the first read.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2007-08-16  2:27 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-09 13:41 [PATCH 6/24] make atomic_read() behave consistently on frv Chris Snook
2007-08-09 16:54 ` Chris Snook
2007-08-10  9:23 ` David Howells
2007-08-10 19:54   ` Chris Snook
2007-08-11  0:54     ` Herbert Xu
2007-08-11  4:29       ` Paul E. McKenney
2007-08-13  5:15         ` Herbert Xu
2007-08-13  6:03           ` Paul E. McKenney
2007-08-14  5:34             ` Nick Piggin
2007-08-14  7:26               ` Herbert Xu
2007-08-14 17:01               ` Paul E. McKenney
2007-08-14 22:01                 ` Arnd Bergmann
2007-08-14 22:43                   ` Paul E. McKenney
2007-08-15 13:29                     ` Arnd Bergmann
2007-08-15 15:06                       ` Michael Buesch
2007-08-15 13:30                 ` Nick Piggin
2007-08-15 20:15                   ` Paul E. McKenney
2007-08-16  1:09                     ` Nick Piggin
2007-08-16  2:27                       ` Paul E. McKenney
2007-08-11  8:47   ` David Howells
2007-08-13  6:44     ` Chris Snook
2007-08-14  5:42       ` Nick Piggin
2007-08-15 18:51         ` Segher Boessenkool
2007-08-15 19:18           ` Paul E. McKenney
2007-08-15 19:46             ` Segher Boessenkool
2007-08-15 19:59               ` Paul E. McKenney
2007-08-15 20:13                 ` Segher Boessenkool
2007-08-15 20:38                   ` Paul E. McKenney
2007-08-15 21:15                     ` Segher Boessenkool
2007-08-16  1:20                       ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).