* Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() [not found] <20131108062703.GC2693@Krystal> @ 2013-11-08 10:29 ` Mathieu Desnoyers 2013-11-08 11:08 ` Peter Zijlstra 0 siblings, 1 reply; 6+ messages in thread From: Mathieu Desnoyers @ 2013-11-08 10:29 UTC (permalink / raw) To: peterz Cc: linux-arch, geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec, michael, mikey, linux, schwidefsky, heiko.carstens, tony.luck, Will Deacon (forwarded due to SMTP issues) ----- Forwarded Message ----- > From: "Mathieu Desnoyers" <compudj@krystal.dyndns.org> > To: "mathieu desnoyers" <mathieu.desnoyers@efficios.com> > Sent: Friday, November 8, 2013 1:27:03 AM > Subject: (forw) [mathieu.desnoyers@polymtl.ca: Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), > smp_store_release()] > > ----- Forwarded message from Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > ----- > > Date: Thu, 7 Nov 2013 16:03:20 -0500 > To: peterz@infradead.org > Cc: linux-arch@vger.kernel.org, geert@linux-m68k.org, > paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org, > VICTORK@il.ibm.com, oleg@redhat.com, anton@samba.org, > benh@kernel.crashing.org, fweisbec@gmail.com, michael@ellerman.id.au, > mikey@neuling.org, linux@arm.linux.org.uk, schwidefsky@de.ibm.com, > heiko.carstens@de.ibm.com, tony.luck@intel.com, Will Deacon > <will.deacon@arm.com> > User-Agent: Mutt/1.5.21 (2010-09-15) > From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > Subject: Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), > smp_store_release() > > * peterz@infradead.org (peterz@infradead.org) wrote: > > A number of situations currently require the heavyweight smp_mb(), > > even though there is no need to order prior stores against later > > loads. Many architectures have much cheaper ways to handle these > > situations, but the Linux kernel currently has no portable way > > to make use of them. > > > > This commit therefore supplies smp_load_acquire() and > > smp_store_release() to remedy this situation. The new > > smp_load_acquire() primitive orders the specified load against > > any subsequent reads or writes, while the new smp_store_release() > > primitive orders the specifed store against any prior reads or > > writes. These primitives allow array-based circular FIFOs to be > > implemented without an smp_mb(), and also allow a theoretical > > hole in rcu_assign_pointer() to be closed at no additional > > expense on most architectures. > > > > In addition, the RCU experience transitioning from explicit > > smp_read_barrier_depends() and smp_wmb() to rcu_dereference() > > and rcu_assign_pointer(), respectively resulted in substantial > > improvements in readability. It therefore seems likely that > > replacing other explicit barriers with smp_load_acquire() and > > smp_store_release() will provide similar benefits. It appears > > that roughly half of the explicit barriers in core kernel code > > might be so replaced. 
> > > > [Changelog by PaulMck] > > > > Cc: Tony Luck <tony.luck@intel.com> > > Cc: Oleg Nesterov <oleg@redhat.com> > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > > Cc: Frederic Weisbecker <fweisbec@gmail.com> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > > Cc: Michael Ellerman <michael@ellerman.id.au> > > Cc: Michael Neuling <mikey@neuling.org> > > Cc: Russell King <linux@arm.linux.org.uk> > > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > > Cc: Heiko Carstens <heiko.carstens@de.ibm.com> > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> > > Cc: Victor Kaplansky <VICTORK@il.ibm.com> > > Acked-by: Will Deacon <will.deacon@arm.com> > > Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > > --- > > arch/arm/include/asm/barrier.h | 15 ++++++++++ > > arch/arm64/include/asm/barrier.h | 50 > > ++++++++++++++++++++++++++++++++++++ > > arch/ia64/include/asm/barrier.h | 49 > > +++++++++++++++++++++++++++++++++++ > > arch/metag/include/asm/barrier.h | 15 ++++++++++ > > arch/mips/include/asm/barrier.h | 15 ++++++++++ > > arch/powerpc/include/asm/barrier.h | 21 ++++++++++++++- > > arch/s390/include/asm/barrier.h | 15 ++++++++++ > > arch/sparc/include/asm/barrier_64.h | 15 ++++++++++ > > arch/x86/include/asm/barrier.h | 15 ++++++++++ > > include/asm-generic/barrier.h | 15 ++++++++++ > > include/linux/compiler.h | 9 ++++++ > > 11 files changed, 233 insertions(+), 1 deletion(-) > > > > Index: linux-2.6/arch/arm/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/arm/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/arm/include/asm/barrier.h 2013-11-07 17:36:09.097170473 > > +0100 > > @@ -59,6 +59,21 @@ > > #define smp_wmb() dmb(ishst) > > #endif > > > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ___p1; \ > > +}) > > Can you move those "generic" definitions into asm-generic/barrier.h > under an ifdef guard ? > > The pattern using "smp_mb()" seems to be the right target for a generic > implementation. > > We should probably document the requirements on sizeof(*p) and > alignof(*p) directly above the macro definition. 
> > > + > > #define read_barrier_depends() do { } while(0) > > #define smp_read_barrier_depends() do { } while(0) > > > > Index: linux-2.6/arch/arm64/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/arm64/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/arm64/include/asm/barrier.h 2013-11-07 > > 17:36:09.098170492 +0100 > > @@ -35,10 +35,60 @@ > > #define smp_mb() barrier() > > #define smp_rmb() barrier() > > #define smp_wmb() barrier() > > + > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ___p1; \ > > +}) > > + > > #else > > + > > #define smp_mb() asm volatile("dmb ish" : : : "memory") > > #define smp_rmb() asm volatile("dmb ishld" : : : "memory") > > #define smp_wmb() asm volatile("dmb ishst" : : : "memory") > > + > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + switch (sizeof(*p)) { \ > > + case 4: \ > > + asm volatile ("stlr %w1, %0" \ > > + : "=Q" (*p) : "r" (v) : "memory"); \ > > + break; \ > > + case 8: \ > > + asm volatile ("stlr %1, %0" \ > > + : "=Q" (*p) : "r" (v) : "memory"); \ > > + break; \ > > + } \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1; \ > > + compiletime_assert_atomic_type(*p); \ > > + switch (sizeof(*p)) { \ > > + case 4: \ > > + asm volatile ("ldar %w0, %1" \ > > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > > + break; \ > > + case 8: \ > > + asm volatile ("ldar %0, %1" \ > > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > > + break; \ > > + } \ > > + ___p1; \ > > +}) > > + > > #endif > > > > #define read_barrier_depends() do { } while(0) > > Index: linux-2.6/arch/ia64/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/ia64/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/ia64/include/asm/barrier.h 2013-11-07 17:36:09.098170492 > > +0100 > > @@ -45,11 +45,60 @@ > > # define smp_rmb() rmb() > > # define smp_wmb() wmb() > > # define smp_read_barrier_depends() read_barrier_depends() > > + > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + switch (sizeof(*p)) { \ > > + case 4: \ > > + asm volatile ("st4.rel [%0]=%1" \ > > + : "=r" (p) : "r" (v) : "memory"); \ > > + break; \ > > + case 8: \ > > + asm volatile ("st8.rel [%0]=%1" \ > > + : "=r" (p) : "r" (v) : "memory"); \ > > + break; \ > > + } \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1; \ > > + compiletime_assert_atomic_type(*p); \ > > + switch (sizeof(*p)) { \ > > + case 4: \ > > + asm volatile ("ld4.acq %0=[%1]" \ > > + : "=r" (___p1) : "r" (p) : "memory"); \ > > + break; \ > > + case 8: \ > > + asm volatile ("ld8.acq %0=[%1]" \ > > + : "=r" (___p1) : "r" (p) : "memory"); \ > > + break; \ > > + } \ > > + ___p1; \ > > +}) > > + > > #else > > + > > # define smp_mb() barrier() > > # define smp_rmb() barrier() > > # define smp_wmb() barrier() > > # define smp_read_barrier_depends() do { } while(0) > > + > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ACCESS_ONCE(*p) = (v); \ > 
> +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ___p1; \ > > +}) > > #endif > > > > /* > > Index: linux-2.6/arch/metag/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/metag/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/metag/include/asm/barrier.h 2013-11-07 > > 17:36:09.099170511 +0100 > > @@ -82,4 +82,19 @@ > > #define smp_read_barrier_depends() do { } while (0) > > #define set_mb(var, value) do { var = value; smp_mb(); } while (0) > > > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ___p1; \ > > +}) > > + > > #endif /* _ASM_METAG_BARRIER_H */ > > Index: linux-2.6/arch/mips/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/mips/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/mips/include/asm/barrier.h 2013-11-07 17:36:09.099170511 > > +0100 > > @@ -180,4 +180,19 @@ > > #define nudge_writes() mb() > > #endif > > > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ___p1; \ > > +}) > > + > > #endif /* __ASM_BARRIER_H */ > > Index: linux-2.6/arch/powerpc/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/powerpc/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/powerpc/include/asm/barrier.h 2013-11-07 > > 17:36:09.100170529 +0100 > > @@ -45,11 +45,15 @@ > > # define SMPWMB eieio > > #endif > > > > +#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : > > :"memory") > > + > > #define smp_mb() mb() > > -#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : > > :"memory") > > +#define smp_rmb() __lwsync() > > #define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : > > :"memory") > > #define smp_read_barrier_depends() read_barrier_depends() > > #else > > +#define __lwsync() barrier() > > + > > #define smp_mb() barrier() > > #define smp_rmb() barrier() > > #define smp_wmb() barrier() > > @@ -65,4 +69,19 @@ > > #define data_barrier(x) \ > > asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); > > > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + __lwsync(); \ > > Even though this is correct, it appears to bear more overhead than > necessary. See arch/powerpc/include/asm/synch.h > > PPC_ACQUIRE_BARRIER and PPC_RELEASE_BARRIER > > You'll notice that some variants of powerpc require something more > heavy-weight than a lwsync instruction. The fallback will be "isync" > rather than "sync" if you use PPC_ACQUIRE_BARRIER and > PPC_RELEASE_BARRIER rather than LWSYNC directly. 
> > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + __lwsync(); \ > > + ___p1; \ > > +}) > > + > > #endif /* _ASM_POWERPC_BARRIER_H */ > > Index: linux-2.6/arch/s390/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/s390/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/s390/include/asm/barrier.h 2013-11-07 17:36:09.100170529 > > +0100 > > @@ -32,4 +32,19 @@ > > > > #define set_mb(var, value) do { var = value; mb(); } while (0) > > > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + barrier(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + barrier(); \ > > + ___p1; \ > > +}) > > + > > #endif /* __ASM_BARRIER_H */ > > Index: linux-2.6/arch/sparc/include/asm/barrier_64.h > > =================================================================== > > --- linux-2.6.orig/arch/sparc/include/asm/barrier_64.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/sparc/include/asm/barrier_64.h 2013-11-07 > > 17:36:09.101170548 +0100 > > @@ -53,4 +53,19 @@ > > > > #define smp_read_barrier_depends() do { } while(0) > > > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + barrier(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + barrier(); \ > > + ___p1; \ > > +}) > > + > > #endif /* !(__SPARC64_BARRIER_H) */ > > Index: linux-2.6/arch/x86/include/asm/barrier.h > > =================================================================== > > --- linux-2.6.orig/arch/x86/include/asm/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/arch/x86/include/asm/barrier.h 2013-11-07 22:23:46.097491898 > > +0100 > > @@ -92,12 +92,53 @@ > > #endif > > #define smp_read_barrier_depends() read_barrier_depends() > > #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) > > -#else > > +#else /* !SMP */ > > #define smp_mb() barrier() > > #define smp_rmb() barrier() > > #define smp_wmb() barrier() > > #define smp_read_barrier_depends() do { } while (0) > > #define set_mb(var, value) do { var = value; barrier(); } while (0) > > +#endif /* SMP */ > > + > > +#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE) > > + > > +/* > > + * For either of these options x86 doesn't have a strong TSO memory > > + * model and we should fall back to full barriers. 
> > + */ > > + > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ___p1; \ > > +}) > > + > > +#else /* regular x86 TSO memory ordering */ > > + > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + barrier(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + barrier(); \ > > + ___p1; \ > > +}) > > Hrm, I really don't get the two barrier() above. > > On x86, in a standard lock, we can get away with having surrounding > memory barriers defined as compiler barrier() because the LOCK prefix of > the atomic instructions taking and releasing the lock are implicit full > memory barriers. > > Understandably, TSO allows you to remove write barriers. However, AFAIU, > the smp_store_release()/smp_load_acquire() semantics provides ordering > guarantees for both loads and stores with respect to the > store_release/load_acquire operations. I don't see how the simple > compiler barrier() here conveys this. > > Unless what you really mean is that the smp_load_acquire() only provides > ordering guarantees of following loads with respect to the load_acquire, > and that smp_store_release() only provides ordering guarantees of prior > writes before the store_release ? If this is the case, then I think the > names chosen are too short and don't convey that: > > a) those are load and store operations, > b) those have an acquire/release semantic which scope only targets, > respectively, other load and store operations. > > Maybe the following names would be clearer ? > > smp_store_release_wmb(p, v) > smp_load_acquire_rmb(p) > > Or maybe we just need to document really well what's the semantic of a > store_release and load_acquire. > > Furthermore, I don't see how a simple compiler barrier() can provide the > acquire semantic within smp_load_acquire on x86 TSO. AFAIU, a smp_rmb() > might be needed. 
> > > + > > #endif > > > > /* > > Index: linux-2.6/include/asm-generic/barrier.h > > =================================================================== > > --- linux-2.6.orig/include/asm-generic/barrier.h 2013-11-07 > > 17:36:09.105170623 +0100 > > +++ linux-2.6/include/asm-generic/barrier.h 2013-11-07 17:36:09.102170567 > > +0100 > > @@ -62,5 +62,20 @@ > > #define set_mb(var, value) do { (var) = (value); mb(); } while (0) > > #endif > > > > +#define smp_store_release(p, v) \ > > +do { \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ACCESS_ONCE(*p) = (v); \ > > +} while (0) > > + > > +#define smp_load_acquire(p) \ > > +({ \ > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > + compiletime_assert_atomic_type(*p); \ > > + smp_mb(); \ > > + ___p1; \ > > +}) > > + > > #endif /* !__ASSEMBLY__ */ > > #endif /* __ASM_GENERIC_BARRIER_H */ > > Index: linux-2.6/include/linux/compiler.h > > =================================================================== > > --- linux-2.6.orig/include/linux/compiler.h 2013-11-07 17:36:09.105170623 > > +0100 > > +++ linux-2.6/include/linux/compiler.h 2013-11-07 17:36:09.102170567 +0100 > > @@ -298,6 +298,11 @@ > > # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), > > typeof(b)) > > #endif > > > > +/* Is this type a native word size -- useful for atomic operations */ > > +#ifndef __native_word > > +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == > > sizeof(long)) > > +#endif > > Should we also check the pointer alignment, or that would be going too > far ? > > Thanks, > > Mathieu > > > + > > /* Compile time object size, -1 for unknown */ > > #ifndef __compiletime_object_size > > # define __compiletime_object_size(obj) -1 > > @@ -337,6 +342,10 @@ > > #define compiletime_assert(condition, msg) \ > > _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) > > > > +#define compiletime_assert_atomic_type(t) \ > > + compiletime_assert(__native_word(t), \ > > + "Need native word sized stores/loads for atomicity.") > > + > > /* > > * Prevent the compiler from merging or refetching accesses. The compiler > > * is also forbidden from reordering successive instances of > > ACCESS_ONCE(), > > > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com > > ----- End forwarded message ----- > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 6+ messages in thread
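The changelog's claim that array-based circular FIFOs no longer need smp_mb() maps onto a pattern like the minimal single-producer/single-consumer sketch below. It is illustrative only: struct ring, ring_put() and ring_get() are invented names, RING_SIZE is assumed to be a power of two, and the only primitives relied on are the ones introduced by this patch.

#define RING_SIZE	256			/* assumed power of two */

struct ring {
	unsigned long	head;			/* written by the producer */
	unsigned long	tail;			/* written by the consumer */
	void		*slot[RING_SIZE];
};

/* producer */
static bool ring_put(struct ring *r, void *item)
{
	unsigned long head = r->head;
	unsigned long tail = smp_load_acquire(&r->tail);

	if (head - tail >= RING_SIZE)
		return false;			/* full */
	r->slot[head & (RING_SIZE - 1)] = item;
	/* publish the slot contents before the new head */
	smp_store_release(&r->head, head + 1);
	return true;
}

/* consumer */
static void *ring_get(struct ring *r)
{
	unsigned long tail = r->tail;
	unsigned long head = smp_load_acquire(&r->head);
	void *item;

	if (tail == head)
		return NULL;			/* empty */
	item = r->slot[tail & (RING_SIZE - 1)];
	/* finish reading the slot before freeing it for reuse */
	smp_store_release(&r->tail, tail + 1);
	return item;
}

The acquire on head in the consumer pairs with the release on head in the producer, and the acquire on tail in the producer pairs with the release on tail in the consumer; neither side needs a full barrier.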
* Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() 2013-11-08 10:29 ` [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() Mathieu Desnoyers @ 2013-11-08 11:08 ` Peter Zijlstra 2013-11-08 13:53 ` Mathieu Desnoyers 0 siblings, 1 reply; 6+ messages in thread From: Peter Zijlstra @ 2013-11-08 11:08 UTC (permalink / raw) To: Mathieu Desnoyers Cc: linux-arch, geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec, michael, mikey, linux, schwidefsky, heiko.carstens, tony.luck, Will Deacon On Fri, Nov 08, 2013 at 10:29:35AM +0000, Mathieu Desnoyers wrote: > > > +#define smp_store_release(p, v) \ > > > +do { \ > > > + compiletime_assert_atomic_type(*p); \ > > > + smp_mb(); \ > > > + ACCESS_ONCE(*p) = (v); \ > > > +} while (0) > > > + > > > +#define smp_load_acquire(p) \ > > > +({ \ > > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > > + compiletime_assert_atomic_type(*p); \ > > > + smp_mb(); \ > > > + ___p1; \ > > > +}) > > > > Can you move those "generic" definitions into asm-generic/barrier.h > > under an ifdef guard ? > > > > The pattern using "smp_mb()" seems to be the right target for a generic > > implementation. > > > > We should probably document the requirements on sizeof(*p) and > > alignof(*p) directly above the macro definition. So that is the one in asm-generic; its just that some archs have a rather complex barrier.h and I didn't want to include asm-generic header to make it still worse. Better to then keep everything in a single file and suffer a little duplication. > > > +#define smp_store_release(p, v) \ > > > +do { \ > > > + compiletime_assert_atomic_type(*p); \ > > > + __lwsync(); \ > > > > Even though this is correct, it appears to bear more overhead than > > necessary. See arch/powerpc/include/asm/synch.h > > > > PPC_ACQUIRE_BARRIER and PPC_RELEASE_BARRIER > > > > You'll notice that some variants of powerpc require something more > > heavy-weight than a lwsync instruction. The fallback will be "isync" > > rather than "sync" if you use PPC_ACQUIRE_BARRIER and > > PPC_RELEASE_BARRIER rather than LWSYNC directly. I think Paul answered this. > > > +#else /* regular x86 TSO memory ordering */ > > > + > > > +#define smp_store_release(p, v) \ > > > +do { \ > > > + compiletime_assert_atomic_type(*p); \ > > > + barrier(); \ > > > + ACCESS_ONCE(*p) = (v); \ > > > +} while (0) > > > + > > > +#define smp_load_acquire(p) \ > > > +({ \ > > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > > + compiletime_assert_atomic_type(*p); \ > > > + barrier(); \ > > > + ___p1; \ > > > +}) > > > > Hrm, I really don't get the two barrier() above. > > > > Or maybe we just need to document really well what's the semantic of a > > store_release and load_acquire. Hence the first patch; they provide the full ACQUIRE and RELEASE semantics. > > Furthermore, I don't see how a simple compiler barrier() can provide the > > acquire semantic within smp_load_acquire on x86 TSO. AFAIU, a smp_rmb() > > might be needed. I think the other Paul answered this one on how the TSO memory model provides all the required semantics. > > > +/* Is this type a native word size -- useful for atomic operations */ > > > +#ifndef __native_word > > > +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == > > > sizeof(long)) > > > +#endif > > > > Should we also check the pointer alignment, or that would be going too > > far ? I did consider that, but pointer alignment tests turn into code so I left those out. ^ permalink raw reply [flat|nested] 6+ messages in thread
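To spell out why an alignment test "turns into code": __native_word() folds away because sizeof() is a compile-time constant, while the alignment of an arbitrary pointer value is only known at run time, so the check would have to be something along the lines of the hypothetical macro below rather than a compiletime_assert().

/* Hypothetical runtime check -- not something proposed in this series. */
#define assert_aligned_for_atomic(p)					\
	WARN_ON_ONCE(((unsigned long)(p)) & (sizeof(*(p)) - 1))

This emits real instructions at every call site, which is the overhead being declined here.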
* Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() 2013-11-08 11:08 ` Peter Zijlstra @ 2013-11-08 13:53 ` Mathieu Desnoyers 0 siblings, 0 replies; 6+ messages in thread From: Mathieu Desnoyers @ 2013-11-08 13:53 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-arch, geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec, michael, mikey, linux, schwidefsky, heiko carstens, tony luck, Will Deacon ----- Original Message ----- > From: "Peter Zijlstra" <peterz@infradead.org> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com> > Cc: linux-arch@vger.kernel.org, geert@linux-m68k.org, paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org, > VICTORK@il.ibm.com, oleg@redhat.com, anton@samba.org, benh@kernel.crashing.org, fweisbec@gmail.com, > michael@ellerman.id.au, mikey@neuling.org, linux@arm.linux.org.uk, schwidefsky@de.ibm.com, "heiko carstens" > <heiko.carstens@de.ibm.com>, "tony luck" <tony.luck@intel.com>, "Will Deacon" <will.deacon@arm.com> > Sent: Friday, November 8, 2013 6:08:18 AM > Subject: Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() > > On Fri, Nov 08, 2013 at 10:29:35AM +0000, Mathieu Desnoyers wrote: > > > > +#define smp_store_release(p, v) \ > > > > +do { \ > > > > + compiletime_assert_atomic_type(*p); \ > > > > + smp_mb(); \ > > > > + ACCESS_ONCE(*p) = (v); \ > > > > +} while (0) > > > > + > > > > +#define smp_load_acquire(p) \ > > > > +({ \ > > > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > > > + compiletime_assert_atomic_type(*p); \ > > > > + smp_mb(); \ > > > > + ___p1; \ > > > > +}) > > > > > > Can you move those "generic" definitions into asm-generic/barrier.h > > > under an ifdef guard ? > > > > > > The pattern using "smp_mb()" seems to be the right target for a generic > > > implementation. > > > > > > We should probably document the requirements on sizeof(*p) and > > > alignof(*p) directly above the macro definition. > > So that is the one in asm-generic; its just that some archs have a > rather complex barrier.h and I didn't want to include asm-generic header > to make it still worse. Better to then keep everything in a single file > and suffer a little duplication. OK, I guess it makes sense initially. > > > > > +#define smp_store_release(p, v) \ > > > > +do { \ > > > > + compiletime_assert_atomic_type(*p); \ > > > > + __lwsync(); \ > > > > > > Even though this is correct, it appears to bear more overhead than > > > necessary. See arch/powerpc/include/asm/synch.h > > > > > > PPC_ACQUIRE_BARRIER and PPC_RELEASE_BARRIER > > > > > > You'll notice that some variants of powerpc require something more > > > heavy-weight than a lwsync instruction. The fallback will be "isync" > > > rather than "sync" if you use PPC_ACQUIRE_BARRIER and > > > PPC_RELEASE_BARRIER rather than LWSYNC directly. > > I think Paul answered this. Yes, > > > > > +#else /* regular x86 TSO memory ordering */ > > > > + > > > > +#define smp_store_release(p, v) \ > > > > +do { \ > > > > + compiletime_assert_atomic_type(*p); \ > > > > + barrier(); \ > > > > + ACCESS_ONCE(*p) = (v); \ > > > > +} while (0) > > > > + > > > > +#define smp_load_acquire(p) \ > > > > +({ \ > > > > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > > > > + compiletime_assert_atomic_type(*p); \ > > > > + barrier(); \ > > > > + ___p1; \ > > > > +}) > > > > > > Hrm, I really don't get the two barrier() above. > > > > > > Or maybe we just need to document really well what's the semantic of a > > > store_release and load_acquire. 
> 
> Hence the first patch; they provide the full ACQUIRE and RELEASE
> semantics.

Yes, got it following Paul's explanation.

> 
> > > Furthermore, I don't see how a simple compiler barrier() can provide the
> > > acquire semantic within smp_load_acquire on x86 TSO. AFAIU, a smp_rmb()
> > > might be needed.
> 
> I think the other Paul answered this one on how the TSO memory model
> provides all the required semantics.

Yes, that's right.

> 
> > > > +/* Is this type a native word size -- useful for atomic operations */
> > > > +#ifndef __native_word
> > > > +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) ==
> > > > sizeof(long))
> > > > +#endif
> > > 
> > > Should we also check the pointer alignment, or that would be going too
> > > far ?
> 
> I did consider that, but pointer alignment tests turn into code so I
> left those out.

This is what I suspected.

I'd still recommend commenting the requirements on "p" above the
implementation of smp_store_release() and smp_load_acquire(), so callers
clearly know what to expect.

Other than this small nit (which you may choose to disregard at will),
you can add my:

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 6+ messages in thread
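A comment of the kind being recommended might look roughly like the sketch below (wording invented for illustration); it restates, next to the macros themselves, what the first patch in the series documents in memory-barriers.txt:

/*
 * smp_store_release(p, v) / smp_load_acquire(p):
 *
 * @p must point to an int- or long-sized, naturally aligned object;
 * compiletime_assert_atomic_type() enforces the size part of that at
 * build time.  smp_store_release() orders all reads and writes issued
 * before it against the store to *p, and smp_load_acquire() orders the
 * load from *p against all reads and writes issued after it.
 */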
* [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() @ 2013-11-07 22:03 peterz 2013-11-07 22:03 ` [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() peterz 0 siblings, 1 reply; 6+ messages in thread From: peterz @ 2013-11-07 22:03 UTC (permalink / raw) To: linux-arch Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec, mathieu.desnoyers, michael, mikey, linux, schwidefsky, heiko.carstens, tony.luck, Peter Zijlstra *** last posting didn't make it out to the lists *** These patches introduce 2 new barrier primitives: smp_load_acquire(p) smp_store_release(p, v) See the first patch, which changes Documentation/memory-barriers.txt, to find the exact definitions of what an ACQUIRE/RELEASE barrier is -- previously known as LOCK/UNLOCK barriers. The second patch simplifies the asm/barrier.h implementation of a lot of architectures by using asm-generic/barrier.h in order to save a lot of code duplication later on. The third patch adds the new barrier primitives. The fourth adds the first user. Build tested for: alpha-defconfig - OK ia64-defconfig - OK m32r-defconfig - OK frv-defconfig - OK m68k-defconfig - OK mips-defconfig - OK mips-fuloong2e_defconfig - OK arm-defconfig - OK blackfin-defconfig - OK mn10300-defconfig - OK powerpc-ppc40x_defconfig - OK powerpc-defconfig - OK s390-defconfig - OK sh-defconfig - OK sparc-defconfig - OK sparc64-defconfig - OK i386-defconfig - OK x86_64-defconfig - OK Changes since the last posting that didn't make it out to lkml (and other lists) due to an excessive Cc list: - added the fourth patch as a first user - changed the x86 implementation to not assume a TSO model when OOSTORE or PPRO_FENCE --- Documentation/memory-barriers.txt | 164 ++++++++++++++++++---------------- arch/alpha/include/asm/barrier.h | 25 ++---- arch/arc/include/asm/Kbuild | 1 + arch/arc/include/asm/atomic.h | 5 ++ arch/arc/include/asm/barrier.h | 42 --------- arch/arm/include/asm/barrier.h | 15 ++++ arch/arm64/include/asm/barrier.h | 50 +++++++++++ arch/avr32/include/asm/barrier.h | 17 ++-- arch/blackfin/include/asm/barrier.h | 18 +--- arch/cris/include/asm/Kbuild | 1 + arch/cris/include/asm/barrier.h | 25 ------ arch/frv/include/asm/barrier.h | 8 +- arch/h8300/include/asm/barrier.h | 21 +---- arch/hexagon/include/asm/Kbuild | 1 + arch/hexagon/include/asm/barrier.h | 41 --------- arch/ia64/include/asm/barrier.h | 49 ++++++++++ arch/m32r/include/asm/barrier.h | 80 +---------------- arch/m68k/include/asm/barrier.h | 14 +-- arch/metag/include/asm/barrier.h | 15 ++++ arch/microblaze/include/asm/Kbuild | 1 + arch/microblaze/include/asm/barrier.h | 27 ------ arch/mips/include/asm/barrier.h | 15 ++++ arch/mn10300/include/asm/Kbuild | 1 + arch/mn10300/include/asm/barrier.h | 37 -------- arch/parisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/barrier.h | 35 -------- arch/powerpc/include/asm/barrier.h | 21 ++++- arch/s390/include/asm/barrier.h | 15 ++++ arch/score/include/asm/Kbuild | 1 + arch/score/include/asm/barrier.h | 16 ---- arch/sh/include/asm/barrier.h | 21 +---- arch/sparc/include/asm/barrier_32.h | 12 +-- arch/sparc/include/asm/barrier_64.h | 15 ++++ arch/tile/include/asm/barrier.h | 68 +------------- arch/unicore32/include/asm/barrier.h | 11 +-- arch/x86/include/asm/barrier.h | 43 ++++++++- arch/xtensa/include/asm/barrier.h | 9 +- include/asm-generic/barrier.h | 55 +++++++++--- include/linux/compiler.h | 9 ++ kernel/events/ring_buffer.c | 62 +++++++------ 40 files changed, 444 insertions(+), 623 deletions(-) ^ permalink raw reply 
* [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-11-07 22:03 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() peterz
@ 2013-11-07 22:03 ` peterz
  2013-11-07 21:03   ` Mathieu Desnoyers
  0 siblings, 1 reply; 6+ messages in thread
From: peterz @ 2013-11-07 22:03 UTC (permalink / raw)
  To: linux-arch
  Cc: geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec,
	mathieu.desnoyers, michael, mikey, linux, schwidefsky,
	heiko.carstens, tony.luck, Will Deacon, Peter Zijlstra

[-- Attachment #1: peter_zijlstra-arch-sr-la.patch --]
[-- Type: text/plain, Size: 15047 bytes --]

A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specified store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively, resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: "Paul E.
McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> --- arch/arm/include/asm/barrier.h | 15 ++++++++++ arch/arm64/include/asm/barrier.h | 50 ++++++++++++++++++++++++++++++++++++ arch/ia64/include/asm/barrier.h | 49 +++++++++++++++++++++++++++++++++++ arch/metag/include/asm/barrier.h | 15 ++++++++++ arch/mips/include/asm/barrier.h | 15 ++++++++++ arch/powerpc/include/asm/barrier.h | 21 ++++++++++++++- arch/s390/include/asm/barrier.h | 15 ++++++++++ arch/sparc/include/asm/barrier_64.h | 15 ++++++++++ arch/x86/include/asm/barrier.h | 15 ++++++++++ include/asm-generic/barrier.h | 15 ++++++++++ include/linux/compiler.h | 9 ++++++ 11 files changed, 233 insertions(+), 1 deletion(-) Index: linux-2.6/arch/arm/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/arm/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/arm/include/asm/barrier.h 2013-11-07 17:36:09.097170473 +0100 @@ -59,6 +59,21 @@ #define smp_wmb() dmb(ishst) #endif +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ___p1; \ +}) + #define read_barrier_depends() do { } while(0) #define smp_read_barrier_depends() do { } while(0) Index: linux-2.6/arch/arm64/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/arm64/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/arm64/include/asm/barrier.h 2013-11-07 17:36:09.098170492 +0100 @@ -35,10 +35,60 @@ #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() + +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ___p1; \ +}) + #else + #define smp_mb() asm volatile("dmb ish" : : : "memory") #define smp_rmb() asm volatile("dmb ishld" : : : "memory") #define smp_wmb() asm volatile("dmb ishst" : : : "memory") + +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + switch (sizeof(*p)) { \ + case 4: \ + asm volatile ("stlr %w1, %0" \ + : "=Q" (*p) : "r" (v) : "memory"); \ + break; \ + case 8: \ + asm volatile ("stlr %1, %0" \ + : "=Q" (*p) : "r" (v) : "memory"); \ + break; \ + } \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1; \ + compiletime_assert_atomic_type(*p); \ + switch (sizeof(*p)) { \ + case 4: \ + asm volatile ("ldar %w0, %1" \ + : "=r" (___p1) : "Q" (*p) : "memory"); \ + break; \ + case 8: \ + asm volatile ("ldar %0, %1" \ + : "=r" (___p1) : "Q" (*p) : "memory"); \ + break; \ + } \ + ___p1; \ +}) + #endif #define read_barrier_depends() do { } while(0) Index: linux-2.6/arch/ia64/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/ia64/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/ia64/include/asm/barrier.h 2013-11-07 17:36:09.098170492 +0100 @@ -45,11 +45,60 @@ # define smp_rmb() rmb() # define smp_wmb() wmb() # define smp_read_barrier_depends() read_barrier_depends() + +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + switch (sizeof(*p)) { 
\ + case 4: \ + asm volatile ("st4.rel [%0]=%1" \ + : "=r" (p) : "r" (v) : "memory"); \ + break; \ + case 8: \ + asm volatile ("st8.rel [%0]=%1" \ + : "=r" (p) : "r" (v) : "memory"); \ + break; \ + } \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1; \ + compiletime_assert_atomic_type(*p); \ + switch (sizeof(*p)) { \ + case 4: \ + asm volatile ("ld4.acq %0=[%1]" \ + : "=r" (___p1) : "r" (p) : "memory"); \ + break; \ + case 8: \ + asm volatile ("ld8.acq %0=[%1]" \ + : "=r" (___p1) : "r" (p) : "memory"); \ + break; \ + } \ + ___p1; \ +}) + #else + # define smp_mb() barrier() # define smp_rmb() barrier() # define smp_wmb() barrier() # define smp_read_barrier_depends() do { } while(0) + +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ___p1; \ +}) #endif /* Index: linux-2.6/arch/metag/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/metag/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/metag/include/asm/barrier.h 2013-11-07 17:36:09.099170511 +0100 @@ -82,4 +82,19 @@ #define smp_read_barrier_depends() do { } while (0) #define set_mb(var, value) do { var = value; smp_mb(); } while (0) +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ___p1; \ +}) + #endif /* _ASM_METAG_BARRIER_H */ Index: linux-2.6/arch/mips/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/mips/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/mips/include/asm/barrier.h 2013-11-07 17:36:09.099170511 +0100 @@ -180,4 +180,19 @@ #define nudge_writes() mb() #endif +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ___p1; \ +}) + #endif /* __ASM_BARRIER_H */ Index: linux-2.6/arch/powerpc/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/powerpc/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/powerpc/include/asm/barrier.h 2013-11-07 17:36:09.100170529 +0100 @@ -45,11 +45,15 @@ # define SMPWMB eieio #endif +#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") + #define smp_mb() mb() -#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") +#define smp_rmb() __lwsync() #define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") #define smp_read_barrier_depends() read_barrier_depends() #else +#define __lwsync() barrier() + #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() @@ -65,4 +69,19 @@ #define data_barrier(x) \ asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + __lwsync(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + 
compiletime_assert_atomic_type(*p); \ + __lwsync(); \ + ___p1; \ +}) + #endif /* _ASM_POWERPC_BARRIER_H */ Index: linux-2.6/arch/s390/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/s390/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/s390/include/asm/barrier.h 2013-11-07 17:36:09.100170529 +0100 @@ -32,4 +32,19 @@ #define set_mb(var, value) do { var = value; mb(); } while (0) +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + barrier(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + barrier(); \ + ___p1; \ +}) + #endif /* __ASM_BARRIER_H */ Index: linux-2.6/arch/sparc/include/asm/barrier_64.h =================================================================== --- linux-2.6.orig/arch/sparc/include/asm/barrier_64.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/sparc/include/asm/barrier_64.h 2013-11-07 17:36:09.101170548 +0100 @@ -53,4 +53,19 @@ #define smp_read_barrier_depends() do { } while(0) +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + barrier(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + barrier(); \ + ___p1; \ +}) + #endif /* !(__SPARC64_BARRIER_H) */ Index: linux-2.6/arch/x86/include/asm/barrier.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/arch/x86/include/asm/barrier.h 2013-11-07 22:23:46.097491898 +0100 @@ -92,12 +92,53 @@ #endif #define smp_read_barrier_depends() read_barrier_depends() #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) -#else +#else /* !SMP */ #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() #define smp_read_barrier_depends() do { } while (0) #define set_mb(var, value) do { var = value; barrier(); } while (0) +#endif /* SMP */ + +#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE) + +/* + * For either of these options x86 doesn't have a strong TSO memory + * model and we should fall back to full barriers. 
+ */ + +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ___p1; \ +}) + +#else /* regular x86 TSO memory ordering */ + +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + barrier(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + barrier(); \ + ___p1; \ +}) + #endif /* Index: linux-2.6/include/asm-generic/barrier.h =================================================================== --- linux-2.6.orig/include/asm-generic/barrier.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/include/asm-generic/barrier.h 2013-11-07 17:36:09.102170567 +0100 @@ -62,5 +62,20 @@ #define set_mb(var, value) do { (var) = (value); mb(); } while (0) #endif +#define smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +#define smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + smp_mb(); \ + ___p1; \ +}) + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_GENERIC_BARRIER_H */ Index: linux-2.6/include/linux/compiler.h =================================================================== --- linux-2.6.orig/include/linux/compiler.h 2013-11-07 17:36:09.105170623 +0100 +++ linux-2.6/include/linux/compiler.h 2013-11-07 17:36:09.102170567 +0100 @@ -298,6 +298,11 @@ # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) #endif +/* Is this type a native word size -- useful for atomic operations */ +#ifndef __native_word +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) +#endif + /* Compile time object size, -1 for unknown */ #ifndef __compiletime_object_size # define __compiletime_object_size(obj) -1 @@ -337,6 +342,10 @@ #define compiletime_assert(condition, msg) \ _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) +#define compiletime_assert_atomic_type(t) \ + compiletime_assert(__native_word(t), \ + "Need native word sized stores/loads for atomicity.") + /* * Prevent the compiler from merging or refetching accesses. The compiler * is also forbidden from reordering successive instances of ACCESS_ONCE(), ^ permalink raw reply [flat|nested] 6+ messages in thread
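To make the intended conversions concrete, here is a minimal message-passing sketch (the variable and function names are invented for illustration) showing an explicit smp_wmb()/smp_rmb() pairing and its equivalent using the new primitives:

static int data;
static int flag;

/* current style: explicit barriers */
static void writer_old(void)
{
	data = 42;
	smp_wmb();			/* order the data store before the flag store */
	ACCESS_ONCE(flag) = 1;
}

static void reader_old(void)
{
	if (ACCESS_ONCE(flag)) {
		smp_rmb();		/* order the flag load before the data load */
		BUG_ON(data != 42);
	}
}

/* converted style: acquire/release */
static void writer_new(void)
{
	data = 42;
	smp_store_release(&flag, 1);	/* orders the data store before the flag store */
}

static void reader_new(void)
{
	if (smp_load_acquire(&flag))	/* orders the flag load before the data load */
		BUG_ON(data != 42);
}

On x86 in the TSO case both versions reduce to compiler barriers around plain accesses; on weakly ordered architectures the acquire/release version maps onto the lwsync, ldar/stlr and ld.acq/st.rel forms added by this patch.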
* Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() 2013-11-07 22:03 ` [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() peterz @ 2013-11-07 21:03 ` Mathieu Desnoyers 2013-11-08 4:58 ` Paul Mackerras 0 siblings, 1 reply; 6+ messages in thread From: Mathieu Desnoyers @ 2013-11-07 21:03 UTC (permalink / raw) To: peterz Cc: linux-arch, geert, paulmck, torvalds, VICTORK, oleg, anton, benh, fweisbec, michael, mikey, linux, schwidefsky, heiko.carstens, tony.luck, Will Deacon * peterz@infradead.org (peterz@infradead.org) wrote: > A number of situations currently require the heavyweight smp_mb(), > even though there is no need to order prior stores against later > loads. Many architectures have much cheaper ways to handle these > situations, but the Linux kernel currently has no portable way > to make use of them. > > This commit therefore supplies smp_load_acquire() and > smp_store_release() to remedy this situation. The new > smp_load_acquire() primitive orders the specified load against > any subsequent reads or writes, while the new smp_store_release() > primitive orders the specifed store against any prior reads or > writes. These primitives allow array-based circular FIFOs to be > implemented without an smp_mb(), and also allow a theoretical > hole in rcu_assign_pointer() to be closed at no additional > expense on most architectures. > > In addition, the RCU experience transitioning from explicit > smp_read_barrier_depends() and smp_wmb() to rcu_dereference() > and rcu_assign_pointer(), respectively resulted in substantial > improvements in readability. It therefore seems likely that > replacing other explicit barriers with smp_load_acquire() and > smp_store_release() will provide similar benefits. It appears > that roughly half of the explicit barriers in core kernel code > might be so replaced. > > [Changelog by PaulMck] > > Cc: Tony Luck <tony.luck@intel.com> > Cc: Oleg Nesterov <oleg@redhat.com> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > Cc: Michael Ellerman <michael@ellerman.id.au> > Cc: Michael Neuling <mikey@neuling.org> > Cc: Russell King <linux@arm.linux.org.uk> > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> > Cc: Victor Kaplansky <VICTORK@il.ibm.com> > Acked-by: Will Deacon <will.deacon@arm.com> > Reviewed-by: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > --- > arch/arm/include/asm/barrier.h | 15 ++++++++++ > arch/arm64/include/asm/barrier.h | 50 ++++++++++++++++++++++++++++++++++++ > arch/ia64/include/asm/barrier.h | 49 +++++++++++++++++++++++++++++++++++ > arch/metag/include/asm/barrier.h | 15 ++++++++++ > arch/mips/include/asm/barrier.h | 15 ++++++++++ > arch/powerpc/include/asm/barrier.h | 21 ++++++++++++++- > arch/s390/include/asm/barrier.h | 15 ++++++++++ > arch/sparc/include/asm/barrier_64.h | 15 ++++++++++ > arch/x86/include/asm/barrier.h | 15 ++++++++++ > include/asm-generic/barrier.h | 15 ++++++++++ > include/linux/compiler.h | 9 ++++++ > 11 files changed, 233 insertions(+), 1 deletion(-) > > Index: linux-2.6/arch/arm/include/asm/barrier.h > =================================================================== > --- linux-2.6.orig/arch/arm/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 > +++ linux-2.6/arch/arm/include/asm/barrier.h 2013-11-07 17:36:09.097170473 +0100 > @@ -59,6 +59,21 @@ > #define smp_wmb() dmb(ishst) > #endif > > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) Can you move those "generic" definitions into asm-generic/barrier.h under an ifdef guard ? The pattern using "smp_mb()" seems to be the right target for a generic implementation. We should probably document the requirements on sizeof(*p) and alignof(*p) directly above the macro definition. > + > #define read_barrier_depends() do { } while(0) > #define smp_read_barrier_depends() do { } while(0) > > Index: linux-2.6/arch/arm64/include/asm/barrier.h > =================================================================== > --- linux-2.6.orig/arch/arm64/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100 > +++ linux-2.6/arch/arm64/include/asm/barrier.h 2013-11-07 17:36:09.098170492 +0100 > @@ -35,10 +35,60 @@ > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ACCESS_ONCE(*p) = (v); \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > + compiletime_assert_atomic_type(*p); \ > + smp_mb(); \ > + ___p1; \ > +}) > + > #else > + > #define smp_mb() asm volatile("dmb ish" : : : "memory") > #define smp_rmb() asm volatile("dmb ishld" : : : "memory") > #define smp_wmb() asm volatile("dmb ishst" : : : "memory") > + > +#define smp_store_release(p, v) \ > +do { \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("stlr %w1, %0" \ > + : "=Q" (*p) : "r" (v) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("stlr %1, %0" \ > + : "=Q" (*p) : "r" (v) : "memory"); \ > + break; \ > + } \ > +} while (0) > + > +#define smp_load_acquire(p) \ > +({ \ > + typeof(*p) ___p1; \ > + compiletime_assert_atomic_type(*p); \ > + switch (sizeof(*p)) { \ > + case 4: \ > + asm volatile ("ldar %w0, %1" \ > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > + break; \ > + case 8: \ > + asm volatile ("ldar %0, %1" \ > + : "=r" (___p1) : "Q" (*p) : "memory"); \ > + break; \ > + } \ > + ___p1; \ > +}) > + > #endif > > #define read_barrier_depends() do { } while(0) > Index: 
linux-2.6/arch/ia64/include/asm/barrier.h
> ===================================================================
> --- linux-2.6.orig/arch/ia64/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/arch/ia64/include/asm/barrier.h 2013-11-07 17:36:09.098170492 +0100
> @@ -45,11 +45,60 @@
> # define smp_rmb() rmb()
> # define smp_wmb() wmb()
> # define smp_read_barrier_depends() read_barrier_depends()
> +
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + switch (sizeof(*p)) { \
> + case 4: \
> + asm volatile ("st4.rel [%0]=%1" \
> + : "=r" (p) : "r" (v) : "memory"); \
> + break; \
> + case 8: \
> + asm volatile ("st8.rel [%0]=%1" \
> + : "=r" (p) : "r" (v) : "memory"); \
> + break; \
> + } \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1; \
> + compiletime_assert_atomic_type(*p); \
> + switch (sizeof(*p)) { \
> + case 4: \
> + asm volatile ("ld4.acq %0=[%1]" \
> + : "=r" (___p1) : "r" (p) : "memory"); \
> + break; \
> + case 8: \
> + asm volatile ("ld8.acq %0=[%1]" \
> + : "=r" (___p1) : "r" (p) : "memory"); \
> + break; \
> + } \
> + ___p1; \
> +})
> +
> #else
> +
> # define smp_mb() barrier()
> # define smp_rmb() barrier()
> # define smp_wmb() barrier()
> # define smp_read_barrier_depends() do { } while(0)
> +
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ___p1; \
> +})
> #endif
>
> /*
> Index: linux-2.6/arch/metag/include/asm/barrier.h
> ===================================================================
> --- linux-2.6.orig/arch/metag/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/arch/metag/include/asm/barrier.h 2013-11-07 17:36:09.099170511 +0100
> @@ -82,4 +82,19 @@
> #define smp_read_barrier_depends() do { } while (0)
> #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
>
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ___p1; \
> +})
> +
> #endif /* _ASM_METAG_BARRIER_H */
> Index: linux-2.6/arch/mips/include/asm/barrier.h
> ===================================================================
> --- linux-2.6.orig/arch/mips/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/arch/mips/include/asm/barrier.h 2013-11-07 17:36:09.099170511 +0100
> @@ -180,4 +180,19 @@
> #define nudge_writes() mb()
> #endif
>
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ___p1; \
> +})
> +
> #endif /* __ASM_BARRIER_H */
> Index: linux-2.6/arch/powerpc/include/asm/barrier.h
> ===================================================================
> --- linux-2.6.orig/arch/powerpc/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/arch/powerpc/include/asm/barrier.h 2013-11-07 17:36:09.100170529 +0100
> @@ -45,11 +45,15 @@
> # define SMPWMB eieio
> #endif
>
> +#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
> +
> #define smp_mb() mb()
> -#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
> +#define smp_rmb() __lwsync()
> #define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
> #define smp_read_barrier_depends() read_barrier_depends()
> #else
> +#define __lwsync() barrier()
> +
> #define smp_mb() barrier()
> #define smp_rmb() barrier()
> #define smp_wmb() barrier()
> @@ -65,4 +69,19 @@
> #define data_barrier(x) \
> asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");
>
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + __lwsync(); \

Even though this is correct, it appears to bear more overhead than
necessary. See arch/powerpc/include/asm/synch.h

PPC_ACQUIRE_BARRIER and PPC_RELEASE_BARRIER

You'll notice that some variants of powerpc require something more
heavy-weight than a lwsync instruction. The fallback will be "isync"
rather than "sync" if you use PPC_ACQUIRE_BARRIER and
PPC_RELEASE_BARRIER rather than LWSYNC directly.

> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + __lwsync(); \
> + ___p1; \
> +})
> +
> #endif /* _ASM_POWERPC_BARRIER_H */
> Index: linux-2.6/arch/s390/include/asm/barrier.h
> ===================================================================
> --- linux-2.6.orig/arch/s390/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/arch/s390/include/asm/barrier.h 2013-11-07 17:36:09.100170529 +0100
> @@ -32,4 +32,19 @@
>
> #define set_mb(var, value) do { var = value; mb(); } while (0)
>
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + barrier(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + barrier(); \
> + ___p1; \
> +})
> +
> #endif /* __ASM_BARRIER_H */
> Index: linux-2.6/arch/sparc/include/asm/barrier_64.h
> ===================================================================
> --- linux-2.6.orig/arch/sparc/include/asm/barrier_64.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/arch/sparc/include/asm/barrier_64.h 2013-11-07 17:36:09.101170548 +0100
> @@ -53,4 +53,19 @@
>
> #define smp_read_barrier_depends() do { } while(0)
>
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + barrier(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + barrier(); \
> + ___p1; \
> +})
> +
> #endif /* !(__SPARC64_BARRIER_H) */
> Index: linux-2.6/arch/x86/include/asm/barrier.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/barrier.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/arch/x86/include/asm/barrier.h 2013-11-07 22:23:46.097491898 +0100
> @@ -92,12 +92,53 @@
> #endif
> #define smp_read_barrier_depends() read_barrier_depends()
> #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
> -#else
> +#else /* !SMP */
> #define smp_mb() barrier()
> #define smp_rmb() barrier()
> #define smp_wmb() barrier()
> #define smp_read_barrier_depends() do { } while (0)
> #define set_mb(var, value) do { var = value; barrier(); } while (0)
> +#endif /* SMP */
> +
> +#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
> +
> +/*
> + * For either of these options x86 doesn't have a strong TSO memory
> + * model and we should fall back to full barriers.
> + */
> +
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ___p1; \
> +})
> +
> +#else /* regular x86 TSO memory ordering */
> +
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + barrier(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + barrier(); \
> + ___p1; \
> +})

Hrm, I really don't get the two barrier() above. On x86, in a standard
lock, we can get away with having surrounding memory barriers defined as
compiler barrier() because the LOCK prefix of the atomic instructions
taking and releasing the lock are implicit full memory barriers.

Understandably, TSO allows you to remove write barriers. However, AFAIU,
the smp_store_release()/smp_load_acquire() semantics provides ordering
guarantees for both loads and stores with respect to the
store_release/load_acquire operations. I don't see how the simple
compiler barrier() here conveys this.

Unless what you really mean is that the smp_load_acquire() only provides
ordering guarantees of following loads with respect to the load_acquire,
and that smp_store_release() only provides ordering guarantees of prior
writes before the store_release ?

If this is the case, then I think the names chosen are too short and
don't convey that:

a) those are load and store operations,
b) those have an acquire/release semantic which scope only targets,
   respectively, other load and store operations.

Maybe the following names would be clearer ?

smp_store_release_wmb(p, v)
smp_load_acquire_rmb(p)

Or maybe we just need to document really well what's the semantic of a
store_release and load_acquire.

Furthermore, I don't see how a simple compiler barrier() can provide the
acquire semantic within smp_load_acquire on x86 TSO. AFAIU, a smp_rmb()
might be needed.
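
To make the intended usage concrete, the pattern these primitives target
is classic message passing: publish a payload with a release store,
consume it behind an acquire load. A minimal user-space sketch of that
pattern, using C11 atomics rather than the kernel macros (purely
illustrative; the names buf, ready, producer and consumer are invented
for the example):

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static int buf;			/* payload, written before "ready" is set */
static atomic_int ready;	/* flag carrying the release/acquire pair */

static void *producer(void *arg)
{
	buf = 42;		/* plain store, must not sink below the flag */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static void *consumer(void *arg)
{
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;		/* spin until the flag is observed */
	printf("%d\n", buf);	/* release/acquire guarantees this prints 42 */
	return NULL;
}

int main(void)
{
	pthread_t p, c;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return 0;
}

On x86 both the release store and the acquire load compile to plain mov
instructions with only compiler-level ordering, which is presumably the
reasoning behind the barrier()-only variants quoted above; whether that
reasoning is sufficient is exactly the question raised here.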
> +
> #endif
>
> /*
> Index: linux-2.6/include/asm-generic/barrier.h
> ===================================================================
> --- linux-2.6.orig/include/asm-generic/barrier.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/include/asm-generic/barrier.h 2013-11-07 17:36:09.102170567 +0100
> @@ -62,5 +62,20 @@
> #define set_mb(var, value) do { (var) = (value); mb(); } while (0)
> #endif
>
> +#define smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ACCESS_ONCE(*p) = (v); \
> +} while (0)
> +
> +#define smp_load_acquire(p) \
> +({ \
> + typeof(*p) ___p1 = ACCESS_ONCE(*p); \
> + compiletime_assert_atomic_type(*p); \
> + smp_mb(); \
> + ___p1; \
> +})
> +
> #endif /* !__ASSEMBLY__ */
> #endif /* __ASM_GENERIC_BARRIER_H */
> Index: linux-2.6/include/linux/compiler.h
> ===================================================================
> --- linux-2.6.orig/include/linux/compiler.h 2013-11-07 17:36:09.105170623 +0100
> +++ linux-2.6/include/linux/compiler.h 2013-11-07 17:36:09.102170567 +0100
> @@ -298,6 +298,11 @@
> # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
> #endif
>
> +/* Is this type a native word size -- useful for atomic operations */
> +#ifndef __native_word
> +# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
> +#endif

Should we also check the pointer alignment, or that would be going too
far ?

Thanks,

Mathieu

> +
> /* Compile time object size, -1 for unknown */
> #ifndef __compiletime_object_size
> # define __compiletime_object_size(obj) -1
> @@ -337,6 +342,10 @@
> #define compiletime_assert(condition, msg) \
> _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
> +
> +#define compiletime_assert_atomic_type(t) \
> + compiletime_assert(__native_word(t), \
> + "Need native word sized stores/loads for atomicity.")
> +
> /*
> * Prevent the compiler from merging or refetching accesses. The compiler
> * is also forbidden from reordering successive instances of ACCESS_ONCE(),
>
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply [flat|nested] 6+ messages in thread
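
On the alignment question above: compiletime_assert() can only test
properties known at compile time, so the runtime value of the pointer is
out of reach, but the declared type's alignment could be checked. A
purely hypothetical sketch, not part of the posted patch (the name
__native_word_aligned is invented here):

/*
 * Hypothetical variant: also require the type to be naturally aligned.
 * Note this checks the type, not the pointer value actually passed in.
 */
#define __native_word_aligned(t) \
	((sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) && \
	 __alignof__(t) >= sizeof(t))

#define compiletime_assert_atomic_type(t) \
	compiletime_assert(__native_word_aligned(t), \
		"Need native word sized, naturally aligned stores/loads for atomicity.")

A pointer obtained by casting a misaligned address would still slip
through such a check, which may be why the posted patch sticks to the
sizeof() test alone.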
* Re: [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release()
  2013-11-07 21:03 ` Mathieu Desnoyers
@ 2013-11-08 4:58 ` Paul Mackerras
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Mackerras @ 2013-11-08 4:58 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: peterz, linux-arch, geert, paulmck, torvalds, VICTORK, oleg, anton,
    benh, fweisbec, michael, mikey, linux, schwidefsky, heiko.carstens,
    tony.luck, Will Deacon

On Thu, Nov 07, 2013 at 04:03:20PM -0500, Mathieu Desnoyers wrote:
> * peterz@infradead.org (peterz@infradead.org) wrote:
> > +#define smp_store_release(p, v) \
> > +do { \
> > + compiletime_assert_atomic_type(*p); \
> > + __lwsync(); \
>
> Even though this is correct, it appears to bear more overhead than
> necessary. See arch/powerpc/include/asm/synch.h
>
> PPC_ACQUIRE_BARRIER and PPC_RELEASE_BARRIER
>
> You'll notice that some variants of powerpc require something more
> heavy-weight than a lwsync instruction. The fallback will be "isync"
> rather than "sync" if you use PPC_ACQUIRE_BARRIER and
> PPC_RELEASE_BARRIER rather than LWSYNC directly.

I think this needs to fall back to sync as Peter has it. isync is not
actually a memory barrier. It is more like an execution barrier, and
when used after stwcx.; bne (or stdcx.; bne) it has the effect of
preventing following loads/stores from being executed until the stwcx.
has completed. In this case we're not using stwcx./stdcx., just a
normal store, so isync won't have the desired effect.

Regards,
Paul.

^ permalink raw reply [flat|nested] 6+ messages in thread
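
For reference, on 64-bit powerpc the release side of the patch as posted
amounts to an lwsync in front of a plain store. A rough sketch of the
expansion for a long-sized value (simplified; the real macro goes through
stringify_in_c(LWSYNC) and ACCESS_ONCE(), and the function name here is
invented):

/* Sketch: what smp_store_release(p, v) boils down to on ppc64. */
static inline void store_release_long(unsigned long *p, unsigned long v)
{
	/* lwsync orders all prior loads and stores before the store below */
	__asm__ __volatile__ ("lwsync" : : : "memory");
	*(volatile unsigned long *)p = v;	/* the releasing store */
}

An isync in that position would not provide the same guarantee, per
Paul's point: isync only constrains instruction execution, and it behaves
like an acquire barrier only when it follows a conditional branch on a
stwcx./stdcx. result, not a plain store.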
Thread overview: 6+ messages
[not found] <20131108062703.GC2693@Krystal>
2013-11-08 10:29 ` [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() Mathieu Desnoyers
2013-11-08 11:08 ` Peter Zijlstra
2013-11-08 13:53 ` Mathieu Desnoyers
2013-11-07 22:03 [PATCH 0/4] arch: Introduce smp_load_acquire() and smp_store_release() peterz
2013-11-07 22:03 ` [PATCH 3/4] arch: Introduce smp_load_acquire(), smp_store_release() peterz
2013-11-07 21:03 ` Mathieu Desnoyers
2013-11-08 4:58 ` Paul Mackerras