* [PATCH 6/24] make atomic_read() behave consistently on frv @ 2007-08-09 13:41 Chris Snook 2007-08-09 16:54 ` Chris Snook 2007-08-10 9:23 ` David Howells 0 siblings, 2 replies; 30+ messages in thread From: Chris Snook @ 2007-08-09 13:41 UTC (permalink / raw) To: linux-kernel, linux-arch, torvalds Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl From: Chris Snook <csnook@redhat.com> Make atomic_read() volatile on frv. This ensures that busy-waiting for an interrupt handler to change an atomic_t won't get compiled to an infinite loop, consistent with SMP architectures. Signed-off-by: Chris Snook <csnook@redhat.com> --- linux-2.6.23-rc2-orig/include/asm-frv/atomic.h 2007-07-08 19:32:17.000000000 -0400 +++ linux-2.6.23-rc2/include/asm-frv/atomic.h 2007-08-09 06:41:48.000000000 -0400 @@ -40,7 +40,12 @@ typedef struct { } atomic_t; #define ATOMIC_INIT(i) { (i) } -#define atomic_read(v) ((v)->counter) + +/* + * Casting to volatile here minimizes the need for barriers, + * without having to declare the type itself as volatile. + */ +#define atomic_read(v) (*(volatile int *)&(v)->counter) #define atomic_set(v, i) (((v)->counter) = (i)) #ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-09 13:41 [PATCH 6/24] make atomic_read() behave consistently on frv Chris Snook @ 2007-08-09 16:54 ` Chris Snook 2007-08-10 9:23 ` David Howells 1 sibling, 0 replies; 30+ messages in thread From: Chris Snook @ 2007-08-09 16:54 UTC (permalink / raw) To: linux-kernel, linux-arch, torvalds Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Chris Snook wrote: > From: Chris Snook <csnook@redhat.com> > > Make atomic_read() volatile on frv. This ensures that busy-waiting > for an interrupt handler to change an atomic_t won't get compiled to an > infinite loop, consistent with SMP architectures. To head off the criticism, I admit this is an oversimplification, and true busy-waiters should be using cpu_relax(), which contains a barrier. As discussed in recent threads, there are other cases which can be optimized by removing the need for a barrier, and having behavior consistent with architectures where the benefit is more profound is also valuable. -- Chris ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-09 13:41 [PATCH 6/24] make atomic_read() behave consistently on frv Chris Snook 2007-08-09 16:54 ` Chris Snook @ 2007-08-10 9:23 ` David Howells 2007-08-10 19:54 ` Chris Snook 2007-08-11 8:47 ` David Howells 1 sibling, 2 replies; 30+ messages in thread From: David Howells @ 2007-08-10 9:23 UTC (permalink / raw) To: Chris Snook Cc: dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Chris Snook <csnook@redhat.com> wrote: > To head off the criticism, I admit this is an oversimplification, and true > busy-waiters should be using cpu_relax(), which contains a barrier. Why would you want to use cpu_relax()? That's there to waste time efficiently, isn't it? Shouldn't you be using smp_rmb() or something like that? David ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-10 9:23 ` David Howells @ 2007-08-10 19:54 ` Chris Snook 2007-08-11 0:54 ` Herbert Xu 2007-08-11 8:47 ` David Howells 1 sibling, 1 reply; 30+ messages in thread From: Chris Snook @ 2007-08-10 19:54 UTC (permalink / raw) To: David Howells Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl David Howells wrote: > Chris Snook <csnook@redhat.com> wrote: > >> To head off the criticism, I admit this is an oversimplification, and true >> busy-waiters should be using cpu_relax(), which contains a barrier. > > Why would you want to use cpu_relax()? That's there to waste time efficiently, > isn't it? Shouldn't you be using smp_rmb() or something like that? > > David cpu_relax() contains a barrier, so it should do the right thing. For non-smp architectures, I'm concerned about interacting with interrupt handlers. Some drivers do use atomic_* operations. -- Chris ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-10 19:54 ` Chris Snook @ 2007-08-11 0:54 ` Herbert Xu 2007-08-11 4:29 ` Paul E. McKenney 0 siblings, 1 reply; 30+ messages in thread From: Herbert Xu @ 2007-08-11 0:54 UTC (permalink / raw) To: Chris Snook Cc: dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Chris Snook <csnook@redhat.com> wrote: > > cpu_relax() contains a barrier, so it should do the right thing. For > non-smp architectures, I'm concerned about interacting with interrupt > handlers. Some drivers do use atomic_* operations. What problems with interrupt handlers? Access to int/long must be atomic or we're in big trouble anyway. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-11 0:54 ` Herbert Xu @ 2007-08-11 4:29 ` Paul E. McKenney 2007-08-13 5:15 ` Herbert Xu 0 siblings, 1 reply; 30+ messages in thread From: Paul E. McKenney @ 2007-08-11 4:29 UTC (permalink / raw) To: Herbert Xu Cc: Chris Snook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote: > Chris Snook <csnook@redhat.com> wrote: > > > > cpu_relax() contains a barrier, so it should do the right thing. For > > non-smp architectures, I'm concerned about interacting with interrupt > > handlers. Some drivers do use atomic_* operations. > > What problems with interrupt handlers? Access to int/long must > be atomic or we're in big trouble anyway. Reordering due to compiler optimizations. CPU reordering does not affect interactions with interrupt handlers on a given CPU, but reordering due to compiler code-movement optimization does. Since volatile can in some cases suppress code-movement optimizations, it can affect interactions with interrupt handlers. Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-11 4:29 ` Paul E. McKenney @ 2007-08-13 5:15 ` Herbert Xu 2007-08-13 6:03 ` Paul E. McKenney 0 siblings, 1 reply; 30+ messages in thread From: Herbert Xu @ 2007-08-13 5:15 UTC (permalink / raw) To: paulmck Cc: herbert, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote: >> Chris Snook <csnook@redhat.com> wrote: >> > >> > cpu_relax() contains a barrier, so it should do the right thing. For >> > non-smp architectures, I'm concerned about interacting with interrupt >> > handlers. Some drivers do use atomic_* operations. >> >> What problems with interrupt handlers? Access to int/long must >> be atomic or we're in big trouble anyway. > > Reordering due to compiler optimizations. CPU reordering does not > affect interactions with interrupt handlers on a given CPU, but > reordering due to compiler code-movement optimization does. Since > volatile can in some cases suppress code-movement optimizations, > it can affect interactions with interrupt handlers. If such reordering matters, then you should use one of the *mb macros or barrier() rather than relying on possibly hidden volatile cast. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-13 5:15 ` Herbert Xu @ 2007-08-13 6:03 ` Paul E. McKenney 2007-08-14 5:34 ` Nick Piggin 0 siblings, 1 reply; 30+ messages in thread From: Paul E. McKenney @ 2007-08-13 6:03 UTC (permalink / raw) To: Herbert Xu Cc: csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote: > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote: > >> Chris Snook <csnook@redhat.com> wrote: > >> > > >> > cpu_relax() contains a barrier, so it should do the right thing. For > >> > non-smp architectures, I'm concerned about interacting with interrupt > >> > handlers. Some drivers do use atomic_* operations. > >> > >> What problems with interrupt handlers? Access to int/long must > >> be atomic or we're in big trouble anyway. > > > > Reordering due to compiler optimizations. CPU reordering does not > > affect interactions with interrupt handlers on a given CPU, but > > reordering due to compiler code-movement optimization does. Since > > volatile can in some cases suppress code-movement optimizations, > > it can affect interactions with interrupt handlers. > > If such reordering matters, then you should use one of the > *mb macros or barrier() rather than relying on possibly > hidden volatile cast. If communicating among CPUs, sure. However, when communicating between mainline and interrupt/NMI handlers on the same CPU, the barrier() and most expecially the *mb() macros are gross overkill. So there really truly is a place for volatile -- not a large place, to be sure, but a place nonetheless. Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-13 6:03 ` Paul E. McKenney @ 2007-08-14 5:34 ` Nick Piggin 2007-08-14 7:26 ` Herbert Xu 2007-08-14 17:01 ` Paul E. McKenney 0 siblings, 2 replies; 30+ messages in thread From: Nick Piggin @ 2007-08-14 5:34 UTC (permalink / raw) To: paulmck Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Paul E. McKenney wrote: > On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote: > >>Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: >> >>>On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote: >>> >>>>Chris Snook <csnook@redhat.com> wrote: >>>> >>>>>cpu_relax() contains a barrier, so it should do the right thing. For >>>>>non-smp architectures, I'm concerned about interacting with interrupt >>>>>handlers. Some drivers do use atomic_* operations. >>>> >>>>What problems with interrupt handlers? Access to int/long must >>>>be atomic or we're in big trouble anyway. >>> >>>Reordering due to compiler optimizations. CPU reordering does not >>>affect interactions with interrupt handlers on a given CPU, but >>>reordering due to compiler code-movement optimization does. Since >>>volatile can in some cases suppress code-movement optimizations, >>>it can affect interactions with interrupt handlers. >> >>If such reordering matters, then you should use one of the >>*mb macros or barrier() rather than relying on possibly >>hidden volatile cast. > > > If communicating among CPUs, sure. However, when communicating between > mainline and interrupt/NMI handlers on the same CPU, the barrier() and > most expecially the *mb() macros are gross overkill. So there really > truly is a place for volatile -- not a large place, to be sure, but a > place nonetheless. I really would like all volatile users to go away and be replaced by explicit barriers. It makes things nicer and more explicit... for atomic_t type there probably aren't many optimisations that can be made which volatile would disallow (in actual kernel code), but for others (eg. bitops, maybe atomic ops in UP kernels), there would be. Maybe it is the safe way to go, but it does obscure cases where there is a real need for barriers. Many atomic operations are allowed to be reordered between CPUs, so I don't have a good idea for the rationale to order them within the CPU (also loads and stores to long and ptr types are not ordered like this, although we do consider those to be atomic operations too). barrier() in a way is like enforcing sequential memory ordering between process and interrupt context, wheras volatile is just enforcing coherency of a single memory location (and as such is cheaper). What do you think of this crazy idea? /* Enforce a compiler barrier for only operations to location X. * Call multiple times to provide an ordering between multiple * memory locations. Other memory operations can be assumed by * the compiler to remain unchanged and may be reordered */ #define order(x) asm volatile("" : "+m" (x)) -- SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-14 5:34 ` Nick Piggin @ 2007-08-14 7:26 ` Herbert Xu 2007-08-14 17:01 ` Paul E. McKenney 1 sibling, 0 replies; 30+ messages in thread From: Herbert Xu @ 2007-08-14 7:26 UTC (permalink / raw) To: Nick Piggin Cc: paulmck, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote: > > What do you think of this crazy idea? > > /* Enforce a compiler barrier for only operations to location X. > * Call multiple times to provide an ordering between multiple > * memory locations. Other memory operations can be assumed by > * the compiler to remain unchanged and may be reordered > */ > #define order(x) asm volatile("" : "+m" (x)) Yes this is a very good idea. This also makes it explicit that the coder is depending on this rather than the more vague connotations of atomic_read. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-14 5:34 ` Nick Piggin 2007-08-14 7:26 ` Herbert Xu @ 2007-08-14 17:01 ` Paul E. McKenney 2007-08-14 22:01 ` Arnd Bergmann 2007-08-15 13:30 ` Nick Piggin 1 sibling, 2 replies; 30+ messages in thread From: Paul E. McKenney @ 2007-08-14 17:01 UTC (permalink / raw) To: Nick Piggin Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote: > Paul E. McKenney wrote: > >On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote: > > > >>Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > >> > >>>On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote: > >>> > >>>>Chris Snook <csnook@redhat.com> wrote: > >>>> > >>>>>cpu_relax() contains a barrier, so it should do the right thing. For > >>>>>non-smp architectures, I'm concerned about interacting with interrupt > >>>>>handlers. Some drivers do use atomic_* operations. > >>>> > >>>>What problems with interrupt handlers? Access to int/long must > >>>>be atomic or we're in big trouble anyway. > >>> > >>>Reordering due to compiler optimizations. CPU reordering does not > >>>affect interactions with interrupt handlers on a given CPU, but > >>>reordering due to compiler code-movement optimization does. Since > >>>volatile can in some cases suppress code-movement optimizations, > >>>it can affect interactions with interrupt handlers. > >> > >>If such reordering matters, then you should use one of the > >>*mb macros or barrier() rather than relying on possibly > >>hidden volatile cast. > > > > > >If communicating among CPUs, sure. However, when communicating between > >mainline and interrupt/NMI handlers on the same CPU, the barrier() and > >most expecially the *mb() macros are gross overkill. So there really > >truly is a place for volatile -- not a large place, to be sure, but a > >place nonetheless. > > I really would like all volatile users to go away and be replaced > by explicit barriers. It makes things nicer and more explicit... for > atomic_t type there probably aren't many optimisations that can be > made which volatile would disallow (in actual kernel code), but for > others (eg. bitops, maybe atomic ops in UP kernels), there would be. > > Maybe it is the safe way to go, but it does obscure cases where there > is a real need for barriers. I prefer burying barriers into other primitives. > Many atomic operations are allowed to be reordered between CPUs, so > I don't have a good idea for the rationale to order them within the > CPU (also loads and stores to long and ptr types are not ordered like > this, although we do consider those to be atomic operations too). > > barrier() in a way is like enforcing sequential memory ordering > between process and interrupt context, wheras volatile is just > enforcing coherency of a single memory location (and as such is > cheaper). barrier() is useful, but it has the very painful side-effect of forcing the compiler to dump temporaries. So we do need something that is not quite so global in effect. > What do you think of this crazy idea? > > /* Enforce a compiler barrier for only operations to location X. > * Call multiple times to provide an ordering between multiple > * memory locations. Other memory operations can be assumed by > * the compiler to remain unchanged and may be reordered > */ > #define order(x) asm volatile("" : "+m" (x)) There was something very similar discussed earlier in this thread, with quite a bit of debate as to exactly what the "m" flag should look like. I suggested something similar named ACCESS_ONCE in the context of RCU (http://lkml.org/lkml/2007/7/11/664): #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) The nice thing about this is that it works for both loads and stores. Not clear that order() above does this -- I get compiler errors when I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2. Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-14 17:01 ` Paul E. McKenney @ 2007-08-14 22:01 ` Arnd Bergmann 2007-08-14 22:43 ` Paul E. McKenney 2007-08-15 13:30 ` Nick Piggin 1 sibling, 1 reply; 30+ messages in thread From: Arnd Bergmann @ 2007-08-14 22:01 UTC (permalink / raw) To: paulmck Cc: Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Tuesday 14 August 2007, Paul E. McKenney wrote: > > #define order(x) asm volatile("" : "+m" (x)) > > There was something very similar discussed earlier in this thread, > with quite a bit of debate as to exactly what the "m" flag should > look like. I suggested something similar named ACCESS_ONCE in the > context of RCU (http://lkml.org/lkml/2007/7/11/664): > > #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) > > The nice thing about this is that it works for both loads and stores. > Not clear that order() above does this -- I get compiler errors when > I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2. Well, it serves a different purpose: While your ACCESS_ONCE() macro is an lvalue, the order() macro is a statement that can be used in place of the barrier() macro. order() is the most lightweight barrier as it only enforces ordering on a single variable in the compiler, but does not have any side-effects visible to other threads, like the cache line access in ACCESS_ONCE has. Arnd <>< ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-14 22:01 ` Arnd Bergmann @ 2007-08-14 22:43 ` Paul E. McKenney 2007-08-15 13:29 ` Arnd Bergmann 0 siblings, 1 reply; 30+ messages in thread From: Paul E. McKenney @ 2007-08-14 22:43 UTC (permalink / raw) To: Arnd Bergmann Cc: Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Wed, Aug 15, 2007 at 12:01:54AM +0200, Arnd Bergmann wrote: > On Tuesday 14 August 2007, Paul E. McKenney wrote: > > > #define order(x) asm volatile("" : "+m" (x)) > > > > There was something very similar discussed earlier in this thread, > > with quite a bit of debate as to exactly what the "m" flag should > > look like. I suggested something similar named ACCESS_ONCE in the > > context of RCU (http://lkml.org/lkml/2007/7/11/664): > > > > #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) > > > > The nice thing about this is that it works for both loads and stores. > > Not clear that order() above does this -- I get compiler errors when > > I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2. > > Well, it serves a different purpose: While your ACCESS_ONCE() macro is > an lvalue, the order() macro is a statement that can be used in place > of the barrier() macro. order() is the most lightweight barrier as it > only enforces ordering on a single variable in the compiler, but does > not have any side-effects visible to other threads, like the cache > line access in ACCESS_ONCE has. ACCESS_ONCE() is indeed intended to be used when actually loading or storing the variable. That said, I must admit that it is not clear to me why you would want to add an extra order() rather than ACCESS_ONCE()ing one or both of the adjacent accesses to that same variable. So, what am I missing? Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-14 22:43 ` Paul E. McKenney @ 2007-08-15 13:29 ` Arnd Bergmann 2007-08-15 15:06 ` Michael Buesch 0 siblings, 1 reply; 30+ messages in thread From: Arnd Bergmann @ 2007-08-15 13:29 UTC (permalink / raw) To: paulmck Cc: Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Wednesday 15 August 2007, Paul E. McKenney wrote: > ACCESS_ONCE() is indeed intended to be used when actually loading or > storing the variable. That said, I must admit that it is not clear to me > why you would want to add an extra order() rather than ACCESS_ONCE()ing > one or both of the adjacent accesses to that same variable. > > So, what am I missing? You're probably right, the only case I can construct is something like if (ACCESS_ONCE(x)) { ... ACCESS_ONCE(x)++; } which would be slightly less efficient than if (x) x++; order(x); because in the first case, you need to do two ordered accesses but only one in the second case. However, I can't think of a case where this actually makes a noticable difference in real life. Arnd <>< ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 13:29 ` Arnd Bergmann @ 2007-08-15 15:06 ` Michael Buesch 0 siblings, 0 replies; 30+ messages in thread From: Michael Buesch @ 2007-08-15 15:06 UTC (permalink / raw) To: Arnd Bergmann Cc: paulmck, Nick Piggin, Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl On Wednesday 15 August 2007 15:29:43 Arnd Bergmann wrote: > On Wednesday 15 August 2007, Paul E. McKenney wrote: > > > ACCESS_ONCE() is indeed intended to be used when actually loading or > > storing the variable. That said, I must admit that it is not clear to me > > why you would want to add an extra order() rather than ACCESS_ONCE()ing > > one or both of the adjacent accesses to that same variable. > > > > So, what am I missing? > > You're probably right, the only case I can construct is something like > > if (ACCESS_ONCE(x)) { > ... > ACCESS_ONCE(x)++; > } > > which would be slightly less efficient than > > if (x) > x++; > order(x); > > because in the first case, you need to do two ordered accesses > but only one in the second case. However, I can't think of a case > where this actually makes a noticable difference in real life. How can this example actually get used in a sane and race-free way? This would need locking around the whole if statement. But locking is a barrier. -- Greetings Michael. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-14 17:01 ` Paul E. McKenney 2007-08-14 22:01 ` Arnd Bergmann @ 2007-08-15 13:30 ` Nick Piggin 2007-08-15 20:15 ` Paul E. McKenney 1 sibling, 1 reply; 30+ messages in thread From: Nick Piggin @ 2007-08-15 13:30 UTC (permalink / raw) To: paulmck Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, Arnd Bergmann Paul E. McKenney wrote: > On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote: >>Maybe it is the safe way to go, but it does obscure cases where there >>is a real need for barriers. > > > I prefer burying barriers into other primitives. When they should naturally be there, eg. locking or the RCU primitives, I agree. I don't like having them scattered in various "just in case" places, because it makes both the users and the APIs hard to understand and change. Especially since several big architectures don't have volatile in their atomic_get and _set, I think it would be a step backwards to add them in as a "just in case" thin now (unless there is a better reason). >>Many atomic operations are allowed to be reordered between CPUs, so >>I don't have a good idea for the rationale to order them within the >>CPU (also loads and stores to long and ptr types are not ordered like >>this, although we do consider those to be atomic operations too). >> >>barrier() in a way is like enforcing sequential memory ordering >>between process and interrupt context, wheras volatile is just >>enforcing coherency of a single memory location (and as such is >>cheaper). > > > barrier() is useful, but it has the very painful side-effect of forcing > the compiler to dump temporaries. So we do need something that is > not quite so global in effect. Yep. >>What do you think of this crazy idea? >> >>/* Enforce a compiler barrier for only operations to location X. >> * Call multiple times to provide an ordering between multiple >> * memory locations. Other memory operations can be assumed by >> * the compiler to remain unchanged and may be reordered >> */ >>#define order(x) asm volatile("" : "+m" (x)) > > > There was something very similar discussed earlier in this thread, > with quite a bit of debate as to exactly what the "m" flag should > look like. I suggested something similar named ACCESS_ONCE in the > context of RCU (http://lkml.org/lkml/2007/7/11/664): Oh, I missed that earlier debate. Will go have a look. > #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) > > The nice thing about this is that it works for both loads and stores. > Not clear that order() above does this -- I get compiler errors when > I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2. As Arnd ponted out, order() is not supposed to be an lvalue, but a statement like the rest of our memory ordering API. As to your followup question of why to use it over ACCESS_ONCE. I guess, aside from consistency with the rest of the barrier APIs, you can use it in other primitives when you don't actually know what the caller is going to do or if it even will make an access. You could also use it between calls to _other_ primitives, etc... it just seems more flexible to me, but I haven't actually used such a thing in real code... ACCESS_ONCE doesn't seem as descriptive. What it results in is the memory location being loaded or stored (presumably once exactly), but I think the more general underlying idea is a barrier point. -- SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 13:30 ` Nick Piggin @ 2007-08-15 20:15 ` Paul E. McKenney 2007-08-16 1:09 ` Nick Piggin 0 siblings, 1 reply; 30+ messages in thread From: Paul E. McKenney @ 2007-08-15 20:15 UTC (permalink / raw) To: Nick Piggin Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, Arnd Bergmann On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote: > Paul E. McKenney wrote: > >On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote: > > >>Maybe it is the safe way to go, but it does obscure cases where there > >>is a real need for barriers. > > > > > >I prefer burying barriers into other primitives. > > When they should naturally be there, eg. locking or the RCU primitives, > I agree. > > I don't like having them scattered in various "just in case" places, > because it makes both the users and the APIs hard to understand and > change. I certainly agree that we shouldn't do volatile just for the fun of it, and also that use of volatile should be quite rare. > Especially since several big architectures don't have volatile in their > atomic_get and _set, I think it would be a step backwards to add them in > as a "just in case" thin now (unless there is a better reason). Good point, except that I would expect gcc's optimization to continue to improve. I would like the kernel to be able to take advantage of improved optimization, which means that we are going to have to make a few changes. Adding volatile to atomic_get() and atomic_set() is IMHO one of those changes. > >>Many atomic operations are allowed to be reordered between CPUs, so > >>I don't have a good idea for the rationale to order them within the > >>CPU (also loads and stores to long and ptr types are not ordered like > >>this, although we do consider those to be atomic operations too). > >> > >>barrier() in a way is like enforcing sequential memory ordering > >>between process and interrupt context, wheras volatile is just > >>enforcing coherency of a single memory location (and as such is > >>cheaper). > > > > > >barrier() is useful, but it has the very painful side-effect of forcing > >the compiler to dump temporaries. So we do need something that is > >not quite so global in effect. > > Yep. > > >>What do you think of this crazy idea? > >> > >>/* Enforce a compiler barrier for only operations to location X. > >>* Call multiple times to provide an ordering between multiple > >>* memory locations. Other memory operations can be assumed by > >>* the compiler to remain unchanged and may be reordered > >>*/ > >>#define order(x) asm volatile("" : "+m" (x)) > > > >There was something very similar discussed earlier in this thread, > >with quite a bit of debate as to exactly what the "m" flag should > >look like. I suggested something similar named ACCESS_ONCE in the > >context of RCU (http://lkml.org/lkml/2007/7/11/664): > > Oh, I missed that earlier debate. Will go have a look. > > > #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) > > > >The nice thing about this is that it works for both loads and stores. > >Not clear that order() above does this -- I get compiler errors when > >I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2. > > As Arnd ponted out, order() is not supposed to be an lvalue, but a > statement like the rest of our memory ordering API. > > As to your followup question of why to use it over ACCESS_ONCE. I > guess, aside from consistency with the rest of the barrier APIs, you > can use it in other primitives when you don't actually know what the > caller is going to do or if it even will make an access. You could > also use it between calls to _other_ primitives, etc... it just > seems more flexible to me, but I haven't actually used such a thing > in real code... > > ACCESS_ONCE doesn't seem as descriptive. What it results in is the > memory location being loaded or stored (presumably once exactly), > but I think the more general underlying idea is a barrier point. OK, first, I am not arguing that ACCESS_ONCE() can replace all current uses of barrier(). Similarly, rcu_dereference(), rcu_assign_pointer(), and the various RCU versions of the list-manipulation API in no way replaced all uses of explicit memory barriers. However, I do believe that these API are much easier to use (where they apply, of course) than were the earlier idioms involving explicit memory barriers. But we of course need to keep the explicit memory and compiler barriers for other situations. Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 20:15 ` Paul E. McKenney @ 2007-08-16 1:09 ` Nick Piggin 2007-08-16 2:27 ` Paul E. McKenney 0 siblings, 1 reply; 30+ messages in thread From: Nick Piggin @ 2007-08-16 1:09 UTC (permalink / raw) To: paulmck Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, Arnd Bergmann Paul E. McKenney wrote: > On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote: >>Especially since several big architectures don't have volatile in their >>atomic_get and _set, I think it would be a step backwards to add them in >>as a "just in case" thin now (unless there is a better reason). > > > Good point, except that I would expect gcc's optimization to continue > to improve. I would like the kernel to be able to take advantage of > improved optimization, which means that we are going to have to make > a few changes. Adding volatile to atomic_get() and atomic_set() is > IMHO one of those changes. What optimisations? gcc already does most of the things you need a barrier/volatile for, like reordering non-dependant loads and stores, and eliminating mem ops completely by caching in registers. >>As to your followup question of why to use it over ACCESS_ONCE. I >>guess, aside from consistency with the rest of the barrier APIs, you >>can use it in other primitives when you don't actually know what the >>caller is going to do or if it even will make an access. You could >>also use it between calls to _other_ primitives, etc... it just >>seems more flexible to me, but I haven't actually used such a thing >>in real code... >> >>ACCESS_ONCE doesn't seem as descriptive. What it results in is the >>memory location being loaded or stored (presumably once exactly), >>but I think the more general underlying idea is a barrier point. > > > OK, first, I am not arguing that ACCESS_ONCE() can replace all current > uses of barrier(). OK. Well I also wasn't saying that ACCESS_ONCE should not be implemented. But if we want something like it, then it would make sense to have an equivalent barrier statement as well (ie. order()). -- SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-16 1:09 ` Nick Piggin @ 2007-08-16 2:27 ` Paul E. McKenney 0 siblings, 0 replies; 30+ messages in thread From: Paul E. McKenney @ 2007-08-16 2:27 UTC (permalink / raw) To: Nick Piggin Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, Arnd Bergmann On Thu, Aug 16, 2007 at 11:09:25AM +1000, Nick Piggin wrote: > Paul E. McKenney wrote: > >On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote: > > >>Especially since several big architectures don't have volatile in their > >>atomic_get and _set, I think it would be a step backwards to add them in > >>as a "just in case" thin now (unless there is a better reason). > > > >Good point, except that I would expect gcc's optimization to continue > >to improve. I would like the kernel to be able to take advantage of > >improved optimization, which means that we are going to have to make > >a few changes. Adding volatile to atomic_get() and atomic_set() is > >IMHO one of those changes. > > What optimisations? gcc already does most of the things you need a > barrier/volatile for, like reordering non-dependant loads and stores, > and eliminating mem ops completely by caching in registers. Yep, most of the currently practiced optimizations. Given that CPU clock frequencies are not rising as quickly as they once did, I would expect that there will be added effort on implementing the known more-aggressive optimization techniques, and on coming up with new ones as well. Some of these new optimizations will likely be inappropriate for kernel code, but I expect that some will be things that we want. > >>As to your followup question of why to use it over ACCESS_ONCE. I > >>guess, aside from consistency with the rest of the barrier APIs, you > >>can use it in other primitives when you don't actually know what the > >>caller is going to do or if it even will make an access. You could > >>also use it between calls to _other_ primitives, etc... it just > >>seems more flexible to me, but I haven't actually used such a thing > >>in real code... > >> > >>ACCESS_ONCE doesn't seem as descriptive. What it results in is the > >>memory location being loaded or stored (presumably once exactly), > >>but I think the more general underlying idea is a barrier point. > > > >OK, first, I am not arguing that ACCESS_ONCE() can replace all current > >uses of barrier(). > > OK. Well I also wasn't saying that ACCESS_ONCE should not be > implemented. But if we want something like it, then it would make > sense to have an equivalent barrier statement as well (ie. order()). And I am not arguing against use of asms to implement the volatility in atomic_read() and atomic_set(), and in fact it appears that one of the architectures will be taking this approach. Sounds like we might be in violent agreement. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-10 9:23 ` David Howells 2007-08-10 19:54 ` Chris Snook @ 2007-08-11 8:47 ` David Howells 2007-08-13 6:44 ` Chris Snook 1 sibling, 1 reply; 30+ messages in thread From: David Howells @ 2007-08-11 8:47 UTC (permalink / raw) To: Chris Snook Cc: dhowells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Chris Snook <csnook@redhat.com> wrote: > cpu_relax() contains a barrier, so it should do the right thing. For non-smp > architectures, I'm concerned about interacting with interrupt handlers. Some > drivers do use atomic_* operations. I'm not sure that actually answers my question. Why not smp_rmb()? David ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-11 8:47 ` David Howells @ 2007-08-13 6:44 ` Chris Snook 2007-08-14 5:42 ` Nick Piggin 0 siblings, 1 reply; 30+ messages in thread From: Chris Snook @ 2007-08-13 6:44 UTC (permalink / raw) To: David Howells Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl David Howells wrote: > Chris Snook <csnook@redhat.com> wrote: > >> cpu_relax() contains a barrier, so it should do the right thing. For non-smp >> architectures, I'm concerned about interacting with interrupt handlers. Some >> drivers do use atomic_* operations. > > I'm not sure that actually answers my question. Why not smp_rmb()? > > David I would assume because we want to waste time efficiently even on non-smp architectures, rather than frying the CPU or draining the battery. Certain looping execution patterns can cause the CPU to operate above thermal design power. I have fans on my workstation that only ever come on when running LINPACK, and that's generally memory bandwidth-bound. Just imagine what happens when you're executing the same few non-serializing instructions in a tight loop without ever stalling on memory fetches, or being scheduled out. If there's another reason, I'd like to hear it too, because I'm just guessing here. -- Chris ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-13 6:44 ` Chris Snook @ 2007-08-14 5:42 ` Nick Piggin 2007-08-15 18:51 ` Segher Boessenkool 0 siblings, 1 reply; 30+ messages in thread From: Nick Piggin @ 2007-08-14 5:42 UTC (permalink / raw) To: Chris Snook Cc: David Howells, linux-kernel, linux-arch, torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl Chris Snook wrote: > David Howells wrote: > >> Chris Snook <csnook@redhat.com> wrote: >> >>> cpu_relax() contains a barrier, so it should do the right thing. For >>> non-smp >>> architectures, I'm concerned about interacting with interrupt >>> handlers. Some >>> drivers do use atomic_* operations. >> >> >> I'm not sure that actually answers my question. Why not smp_rmb()? >> >> David > > > I would assume because we want to waste time efficiently even on non-smp > architectures, rather than frying the CPU or draining the battery. > Certain looping execution patterns can cause the CPU to operate above > thermal design power. I have fans on my workstation that only ever come > on when running LINPACK, and that's generally memory bandwidth-bound. > Just imagine what happens when you're executing the same few > non-serializing instructions in a tight loop without ever stalling on > memory fetches, or being scheduled out. > > If there's another reason, I'd like to hear it too, because I'm just > guessing here. Well if there is only one memory location involved, then smp_rmb() isn't going to really do anything anyway, so it would be incorrect to use it. Consider that smp_rmb basically will do anything from flushing the pipeline to invalidating loads speculatively executed out of order. AFAIK it will not control the visibility of stores coming from other CPUs (that is up to the cache coherency). -- SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-14 5:42 ` Nick Piggin @ 2007-08-15 18:51 ` Segher Boessenkool 2007-08-15 19:18 ` Paul E. McKenney 0 siblings, 1 reply; 30+ messages in thread From: Segher Boessenkool @ 2007-08-15 18:51 UTC (permalink / raw) To: Nick Piggin Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells > Well if there is only one memory location involved, then smp_rmb() > isn't > going to really do anything anyway, so it would be incorrect to use it. rmb() orders *any* two reads; that includes two reads from the same location. > Consider that smp_rmb basically will do anything from flushing the > pipeline to invalidating loads speculatively executed out of order. > AFAIK > it will not control the visibility of stores coming from other CPUs > (that > is up to the cache coherency). The writer side should typically use wmb() whenever the reader side uses rmb(), sure. Segher ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 18:51 ` Segher Boessenkool @ 2007-08-15 19:18 ` Paul E. McKenney 2007-08-15 19:46 ` Segher Boessenkool 0 siblings, 1 reply; 30+ messages in thread From: Paul E. McKenney @ 2007-08-15 19:18 UTC (permalink / raw) To: Segher Boessenkool Cc: Nick Piggin, heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells On Wed, Aug 15, 2007 at 08:51:58PM +0200, Segher Boessenkool wrote: > >Well if there is only one memory location involved, then smp_rmb() > >isn't > >going to really do anything anyway, so it would be incorrect to use it. > > rmb() orders *any* two reads; that includes two reads from the same > location. If the two reads are to the same location, all CPUs I am aware of will maintain the ordering without need for a memory barrier. Thanx, Paul > >Consider that smp_rmb basically will do anything from flushing the > >pipeline to invalidating loads speculatively executed out of order. > >AFAIK > >it will not control the visibility of stores coming from other CPUs > >(that > >is up to the cache coherency). > > The writer side should typically use wmb() whenever the reader side > uses rmb(), sure. > > > Segher > > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 19:18 ` Paul E. McKenney @ 2007-08-15 19:46 ` Segher Boessenkool 2007-08-15 19:59 ` Paul E. McKenney 0 siblings, 1 reply; 30+ messages in thread From: Segher Boessenkool @ 2007-08-15 19:46 UTC (permalink / raw) To: paulmck Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells >>> Well if there is only one memory location involved, then smp_rmb() >>> isn't >>> going to really do anything anyway, so it would be incorrect to use >>> it. >> >> rmb() orders *any* two reads; that includes two reads from the same >> location. > > If the two reads are to the same location, all CPUs I am aware of > will maintain the ordering without need for a memory barrier. That's true of course, although there is no real guarantee for that. Segher ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 19:46 ` Segher Boessenkool @ 2007-08-15 19:59 ` Paul E. McKenney 2007-08-15 20:13 ` Segher Boessenkool 0 siblings, 1 reply; 30+ messages in thread From: Paul E. McKenney @ 2007-08-15 19:59 UTC (permalink / raw) To: Segher Boessenkool Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells On Wed, Aug 15, 2007 at 09:46:55PM +0200, Segher Boessenkool wrote: > >>>Well if there is only one memory location involved, then smp_rmb() > >>>isn't > >>>going to really do anything anyway, so it would be incorrect to use > >>>it. > >> > >>rmb() orders *any* two reads; that includes two reads from the same > >>location. > > > >If the two reads are to the same location, all CPUs I am aware of > >will maintain the ordering without need for a memory barrier. > > That's true of course, although there is no real guarantee for that. A CPU that did not provide this property ("cache coherence") would be most emphatically reviled. So we are pretty safe assuming that CPUs will provide it. Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 19:59 ` Paul E. McKenney @ 2007-08-15 20:13 ` Segher Boessenkool 2007-08-15 20:38 ` Paul E. McKenney 0 siblings, 1 reply; 30+ messages in thread From: Segher Boessenkool @ 2007-08-15 20:13 UTC (permalink / raw) To: paulmck Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells >>>>> Well if there is only one memory location involved, then smp_rmb() >>>>> isn't >>>>> going to really do anything anyway, so it would be incorrect to use >>>>> it. >>>> >>>> rmb() orders *any* two reads; that includes two reads from the same >>>> location. >>> >>> If the two reads are to the same location, all CPUs I am aware of >>> will maintain the ordering without need for a memory barrier. >> >> That's true of course, although there is no real guarantee for that. > > A CPU that did not provide this property ("cache coherence") would be > most emphatically reviled. That doesn't have anything to do with coherency as far as I can see. It's just about the order in which a CPU (speculatively) performs the loads (which isn't necessarily the same as the order in which it executes the corresponding instructions, even). > So we are pretty safe assuming that CPUs > will provide it. Yeah, pretty safe. I just don't like undocumented assumptions :-) Segher ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 20:13 ` Segher Boessenkool @ 2007-08-15 20:38 ` Paul E. McKenney 2007-08-15 21:15 ` Segher Boessenkool 0 siblings, 1 reply; 30+ messages in thread From: Paul E. McKenney @ 2007-08-15 20:38 UTC (permalink / raw) To: Segher Boessenkool Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells On Wed, Aug 15, 2007 at 10:13:49PM +0200, Segher Boessenkool wrote: > >>>>>Well if there is only one memory location involved, then smp_rmb() > >>>>>isn't > >>>>>going to really do anything anyway, so it would be incorrect to use > >>>>>it. > >>>> > >>>>rmb() orders *any* two reads; that includes two reads from the same > >>>>location. > >>> > >>>If the two reads are to the same location, all CPUs I am aware of > >>>will maintain the ordering without need for a memory barrier. > >> > >>That's true of course, although there is no real guarantee for that. > > > >A CPU that did not provide this property ("cache coherence") would be > >most emphatically reviled. > > That doesn't have anything to do with coherency as far as I can see. > > It's just about the order in which a CPU (speculatively) performs the > loads > (which isn't necessarily the same as the order in which it executes the > corresponding instructions, even). Please check the definition of "cache coherence". Summary: the CPU is indeed within its rights to execute loads and stores to a single variable out of order, -but- only if it gets the same result that it would have obtained by executing them in order. Which means that any reordering of accesses by a single CPU to a single variable will be invisible to the software. > >So we are pretty safe assuming that CPUs > >will provide it. > > Yeah, pretty safe. I just don't like undocumented assumptions :-) Can't help you there! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 20:38 ` Paul E. McKenney @ 2007-08-15 21:15 ` Segher Boessenkool 2007-08-16 1:20 ` Nick Piggin 0 siblings, 1 reply; 30+ messages in thread From: Segher Boessenkool @ 2007-08-15 21:15 UTC (permalink / raw) To: paulmck Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells > Please check the definition of "cache coherence". Which of the twelve thousand such definitions? :-) > Summary: the CPU is indeed within its rights to execute loads and > stores > to a single variable out of order, -but- only if it gets the same > result > that it would have obtained by executing them in order. Which means > that > any reordering of accesses by a single CPU to a single variable will be > invisible to the software. I'm still not sure if that applies to all architectures. Doesn't matter anyway, let's kill this thread :-) Segher ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv 2007-08-15 21:15 ` Segher Boessenkool @ 2007-08-16 1:20 ` Nick Piggin 0 siblings, 0 replies; 30+ messages in thread From: Nick Piggin @ 2007-08-16 1:20 UTC (permalink / raw) To: Segher Boessenkool Cc: paulmck, heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm, torvalds, linux-arch, jesper.juhl, zlynx, schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells Segher Boessenkool wrote: >> Please check the definition of "cache coherence". > > > Which of the twelve thousand such definitions? :-) Every definition I have seen says that writes to a single memory location have a serial order as seen by all CPUs, and that a read will return the most recent write in the sequence (with a bit of extra mumbo jumbo to cover store queues and store forwarding). Within such a definition, I don't see how would be allowed for a single CPU to execute reads out of order and have the second read return an earlier write than the first read. -- SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2007-08-16 2:27 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-08-09 13:41 [PATCH 6/24] make atomic_read() behave consistently on frv Chris Snook 2007-08-09 16:54 ` Chris Snook 2007-08-10 9:23 ` David Howells 2007-08-10 19:54 ` Chris Snook 2007-08-11 0:54 ` Herbert Xu 2007-08-11 4:29 ` Paul E. McKenney 2007-08-13 5:15 ` Herbert Xu 2007-08-13 6:03 ` Paul E. McKenney 2007-08-14 5:34 ` Nick Piggin 2007-08-14 7:26 ` Herbert Xu 2007-08-14 17:01 ` Paul E. McKenney 2007-08-14 22:01 ` Arnd Bergmann 2007-08-14 22:43 ` Paul E. McKenney 2007-08-15 13:29 ` Arnd Bergmann 2007-08-15 15:06 ` Michael Buesch 2007-08-15 13:30 ` Nick Piggin 2007-08-15 20:15 ` Paul E. McKenney 2007-08-16 1:09 ` Nick Piggin 2007-08-16 2:27 ` Paul E. McKenney 2007-08-11 8:47 ` David Howells 2007-08-13 6:44 ` Chris Snook 2007-08-14 5:42 ` Nick Piggin 2007-08-15 18:51 ` Segher Boessenkool 2007-08-15 19:18 ` Paul E. McKenney 2007-08-15 19:46 ` Segher Boessenkool 2007-08-15 19:59 ` Paul E. McKenney 2007-08-15 20:13 ` Segher Boessenkool 2007-08-15 20:38 ` Paul E. McKenney 2007-08-15 21:15 ` Segher Boessenkool 2007-08-16 1:20 ` Nick Piggin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).