[PATCH v5 tip/core/locking 3/7] Documentation/memory-barriers.txt: Prohibit speculative writes

public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v5 tip/core/locking 3/7] Documentation/memory-barriers.txt: Prohibit speculative writes
       [not found] ` <1386638883-25379-1-git-send-email-paulmck@linux.vnet.ibm.com>
@ 2013-12-10  1:27   ` Paul E. McKenney
  2013-12-10  1:28   ` [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10  1:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney, Linux-Arch

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

No SMP architecture currently supporting Linux allows speculative writes,
so this commit updates Documentation/memory-barriers.txt to prohibit
them in Linux core code.  It also records restrictions on their use.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linux-Arch <linux-arch@vger.kernel.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 Documentation/memory-barriers.txt | 183 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 175 insertions(+), 8 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 2d22da095a60..deafa36aeea1 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -571,11 +571,10 @@ dependency barrier to make it work correctly.  Consider the following bit of
 code:
 
 	q = ACCESS_ONCE(a);
-	if (p) {
-		<data dependency barrier>
-		q = ACCESS_ONCE(b);
+	if (q) {
+		<data dependency barrier>  /* BUG: No data dependency!!! */
+		p = ACCESS_ONCE(b);
 	}
-	x = *q;
 
 This will not have the desired effect because there is no actual data
 dependency, but rather a control dependency that the CPU may short-circuit
@@ -584,11 +583,176 @@ the load from b as having happened before the load from a.  In such a
 case what's actually required is:
 
 	q = ACCESS_ONCE(a);
-	if (p) {
+	if (q) {
 		<read barrier>
-		q = ACCESS_ONCE(b);
+		p = ACCESS_ONCE(b);
 	}
-	x = *q;
+
+However, stores are not speculated.  This means that ordering -is- provided
+in the following example:
+
+	q = ACCESS_ONCE(a);
+	if (ACCESS_ONCE(q)) {
+		ACCESS_ONCE(b) = p;
+	}
+
+Please note that ACCESS_ONCE() is not optional!  Without the ACCESS_ONCE(),
+the compiler is within its rights to transform this example:
+
+	q = a;
+	if (q) {
+		b = p;  /* BUG: Compiler can reorder!!! */
+		do_something();
+	} else {
+		b = p;  /* BUG: Compiler can reorder!!! */
+		do_something_else();
+	}
+
+into this, which of course defeats the ordering:
+
+	b = p;
+	q = a;
+	if (q)
+		do_something();
+	else
+		do_something_else();
+
+Worse yet, if the compiler is able to prove (say) that the value of
+variable 'a' is always non-zero, it would be well within its rights
+to optimize the original example by eliminating the "if" statement
+as follows:
+
+	q = a;
+	b = p;  /* BUG: Compiler can reorder!!! */
+	do_something();
+
+The solution is again ACCESS_ONCE(), which preserves the ordering between
+the load from variable 'a' and the store to variable 'b':
+
+	q = ACCESS_ONCE(a);
+	if (q) {
+		ACCESS_ONCE(b) = p;
+		do_something();
+	} else {
+		ACCESS_ONCE(b) = p;
+		do_something_else();
+	}
+
+You could also use barrier() to prevent the compiler from moving
+the stores to variable 'b', but barrier() would not prevent the
+compiler from proving to itself that a==1 always, so ACCESS_ONCE()
+is also needed.
+
+It is important to note that control dependencies absolutely require a
+a conditional.  For example, the following "optimized" version of
+the above example breaks ordering:
+
+	q = ACCESS_ONCE(a);
+	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
+	if (q) {
+		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
+		do_something();
+	} else {
+		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
+		do_something_else();
+	}
+
+It is of course legal for the prior load to be part of the conditional,
+for example, as follows:
+
+	if (ACCESS_ONCE(a) > 0) {
+		ACCESS_ONCE(b) = q / 2;
+		do_something();
+	} else {
+		ACCESS_ONCE(b) = q / 3;
+		do_something_else();
+	}
+
+This will again ensure that the load from variable 'a' is ordered before the
+stores to variable 'b'.
+
+In addition, you need to be careful what you do with the local variable 'q',
+otherwise the compiler might be able to guess the value and again remove
+the needed conditional.  For example:
+
+	q = ACCESS_ONCE(a);
+	if (q % MAX) {
+		ACCESS_ONCE(b) = p;
+		do_something();
+	} else {
+		ACCESS_ONCE(b) = p;
+		do_something_else();
+	}
+
+If MAX is defined to be 1, then the compiler knows that (q % MAX) is
+equal to zero, in which case the compiler is within its rights to
+transform the above code into the following:
+
+	q = ACCESS_ONCE(a);
+	ACCESS_ONCE(b) = p;
+	do_something_else();
+
+This transformation loses the ordering between the load from variable 'a'
+and the store to variable 'b'.  If you are relying on this ordering, you
+should do something like the following:
+
+	q = ACCESS_ONCE(a);
+	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
+	if (q % MAX) {
+		ACCESS_ONCE(b) = p;
+		do_something();
+	} else {
+		ACCESS_ONCE(b) = p;
+		do_something_else();
+	}
+
+Finally, control dependencies do -not- provide transitivity.  This is
+demonstrated by two related examples:
+
+	CPU 0                     CPU 1
+	=====================     =====================
+	r1 = ACCESS_ONCE(x);      r2 = ACCESS_ONCE(y);
+	if (r1 >= 0)              if (r2 >= 0)
+	  ACCESS_ONCE(y) = 1;       ACCESS_ONCE(x) = 1;
+
+	assert(!(r1 == 1 && r2 == 1));
+
+The above two-CPU example will never trigger the assert().  However,
+if control dependencies guaranteed transitivity (which they do not),
+then adding the following two CPUs would guarantee a related assertion:
+
+	CPU 2                     CPU 3
+	=====================     =====================
+	ACCESS_ONCE(x) = 2;       ACCESS_ONCE(y) = 2;
+
+	assert(!(r1 == 2 && r2 == 2 && x == 1 && y == 1)); /* FAILS!!! */
+
+But because control dependencies do -not- provide transitivity, the
+above assertion can fail after the combined four-CPU example completes.
+If you need the four-CPU example to provide ordering, you will need
+smp_mb() between the loads and stores in the CPU 0 and CPU 1 code fragments.
+
+In summary:
+
+  (*) Control dependencies can order prior loads against later stores.
+      However, they do -not- guarantee any other sort of ordering:
+      Not prior loads against later loads, nor prior stores against
+      later anything.  If you need these other forms of ordering,
+      use smb_rmb(), smp_wmb(), or, in the case of prior stores and
+      later loads, smp_mb().
+
+  (*) Control dependencies require at least one run-time conditional
+      between the prior load and the subsequent store.  If the compiler
+      is able to optimize the conditional away, it will have also
+      optimized away the ordering.  Careful use of ACCESS_ONCE() can
+      help to preserve the needed conditional.
+
+  (*) Control dependencies require that the compiler avoid reordering the
+      dependency into nonexistence.  Careful use of ACCESS_ONCE() or
+      barrier() can help to preserve your control dependency.
+
+  (*) Control dependencies do -not- provide transitivity.  If you
+      need transitivity, use smp_mb().
 
 
 SMP BARRIER PAIRING
@@ -1083,7 +1247,10 @@ compiler from moving the memory accesses either side of it to the other side:
 
 	barrier();
 
-This is a general barrier - lesser varieties of compiler barrier do not exist.
+This is a general barrier -- there are no read-read or write-write variants
+of barrier().  Howevever, ACCESS_ONCE() can be thought of as a weak form
+for barrier() that affects only the specific accesses flagged by the
+ACCESS_ONCE().
 
 The compiler barrier has no direct effect on the CPU, which may then reorder
 things however it wishes.
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
       [not found] ` <1386638883-25379-1-git-send-email-paulmck@linux.vnet.ibm.com>
  2013-12-10  1:27   ` [PATCH v5 tip/core/locking 3/7] Documentation/memory-barriers.txt: Prohibit speculative writes Paul E. McKenney
@ 2013-12-10  1:28   ` Paul E. McKenney
  2013-12-10  1:28     ` Paul E. McKenney
                       ` (3 more replies)
  1 sibling, 4 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10  1:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney, Linux-Arch, Ingo Molnar, Oleg Nesterov,
	Linus Torvalds

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The Linux kernel has traditionally required that an UNLOCK+LOCK pair
act as a full memory barrier when either (1) that UNLOCK+LOCK pair
was executed by the same CPU or task, or (2) the same lock variable
was used for the UNLOCK and LOCK.  It now seems likely that very few
places in the kernel rely on this full-memory-barrier semantic, and
with the advent of queued locks, providing this semantic either requires
complex reasoning, or for some architectures, added overhead.

This commit therefore adds a smp_mb__after_unlock_lock(), which may be
placed after a LOCK primitive to restore the full-memory-barrier semantic.
All definitions are currently no-ops, but will be upgraded for some
architectures when queued locks arrive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linux-Arch <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/arm/include/asm/barrier.h      | 2 ++
 arch/arm64/include/asm/barrier.h    | 2 ++
 arch/ia64/include/asm/barrier.h     | 2 ++
 arch/metag/include/asm/barrier.h    | 2 ++
 arch/mips/include/asm/barrier.h     | 2 ++
 arch/powerpc/include/asm/barrier.h  | 2 ++
 arch/s390/include/asm/barrier.h     | 2 ++
 arch/sparc/include/asm/barrier_64.h | 2 ++
 arch/x86/include/asm/barrier.h      | 2 ++
 include/asm-generic/barrier.h       | 2 ++
 10 files changed, 20 insertions(+)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 2f59f7443396..133aa543dfdb 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -74,6 +74,8 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 78e20ba8806b..f97efda22e0a 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -91,6 +91,8 @@ do {									\
 
 #endif
 
+#define smp_mb__after_unlock_lock()	do { } while 0
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index bddcfe91d9c8..fb64ead98e65 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -101,6 +101,8 @@ do {									\
 })
 #endif
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 /*
  * XXX check on this ---I suspect what Linus really wants here is
  * acquire vs release semantics but we can't discuss this stuff with
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 5d6b4b407dda..76a3ff950607 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -97,4 +97,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* _ASM_METAG_BARRIER_H */
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index e1aa4e4c2984..6f9b586241d6 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -195,4 +195,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index f89da808ce31..abf645799991 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -84,4 +84,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* _ASM_POWERPC_BARRIER_H */
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 578680f6207a..c8f12244aed3 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -47,4 +47,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index b5aad964558e..a2b73329f89c 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -68,4 +68,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index f7053f354ac3..0371fd15f7ee 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -115,6 +115,8 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 /*
  * Stop RDTSC speculation. This is needed when you need to use RDTSC
  * (or get_cycles or vread that possibly accesses the TSC) in a defined
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 6f692f8ac664..938ecf3a7cd8 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -77,5 +77,7 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10  1:28   ` [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier Paul E. McKenney
@ 2013-12-10  1:28     ` Paul E. McKenney
  2013-12-10  1:34     ` Josh Triplett
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10  1:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney, Linux-Arch, Ingo Molnar, Oleg Nesterov,
	Linus Torvalds

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The Linux kernel has traditionally required that an UNLOCK+LOCK pair
act as a full memory barrier when either (1) that UNLOCK+LOCK pair
was executed by the same CPU or task, or (2) the same lock variable
was used for the UNLOCK and LOCK.  It now seems likely that very few
places in the kernel rely on this full-memory-barrier semantic, and
with the advent of queued locks, providing this semantic either requires
complex reasoning, or for some architectures, added overhead.

This commit therefore adds a smp_mb__after_unlock_lock(), which may be
placed after a LOCK primitive to restore the full-memory-barrier semantic.
All definitions are currently no-ops, but will be upgraded for some
architectures when queued locks arrive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linux-Arch <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/arm/include/asm/barrier.h      | 2 ++
 arch/arm64/include/asm/barrier.h    | 2 ++
 arch/ia64/include/asm/barrier.h     | 2 ++
 arch/metag/include/asm/barrier.h    | 2 ++
 arch/mips/include/asm/barrier.h     | 2 ++
 arch/powerpc/include/asm/barrier.h  | 2 ++
 arch/s390/include/asm/barrier.h     | 2 ++
 arch/sparc/include/asm/barrier_64.h | 2 ++
 arch/x86/include/asm/barrier.h      | 2 ++
 include/asm-generic/barrier.h       | 2 ++
 10 files changed, 20 insertions(+)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 2f59f7443396..133aa543dfdb 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -74,6 +74,8 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 78e20ba8806b..f97efda22e0a 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -91,6 +91,8 @@ do {									\
 
 #endif
 
+#define smp_mb__after_unlock_lock()	do { } while 0
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index bddcfe91d9c8..fb64ead98e65 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -101,6 +101,8 @@ do {									\
 })
 #endif
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 /*
  * XXX check on this ---I suspect what Linus really wants here is
  * acquire vs release semantics but we can't discuss this stuff with
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 5d6b4b407dda..76a3ff950607 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -97,4 +97,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* _ASM_METAG_BARRIER_H */
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index e1aa4e4c2984..6f9b586241d6 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -195,4 +195,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index f89da808ce31..abf645799991 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -84,4 +84,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* _ASM_POWERPC_BARRIER_H */
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 578680f6207a..c8f12244aed3 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -47,4 +47,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index b5aad964558e..a2b73329f89c 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -68,4 +68,6 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index f7053f354ac3..0371fd15f7ee 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -115,6 +115,8 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 /*
  * Stop RDTSC speculation. This is needed when you need to use RDTSC
  * (or get_cycles or vread that possibly accesses the TSC) in a defined
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 6f692f8ac664..938ecf3a7cd8 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -77,5 +77,7 @@ do {									\
 	___p1;								\
 })
 
+#define smp_mb__after_unlock_lock()	do { } while (0)
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10  1:28   ` [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier Paul E. McKenney
  2013-12-10  1:28     ` Paul E. McKenney
@ 2013-12-10  1:34     ` Josh Triplett
  2013-12-10  5:26       ` Paul E. McKenney
  2013-12-10 12:37     ` Peter Zijlstra
  2013-12-10 17:04     ` Oleg Nesterov
  3 siblings, 1 reply; 14+ messages in thread
From: Josh Triplett @ 2013-12-10  1:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, Linux-Arch, Ingo Molnar, Oleg Nesterov, Linus Torvalds

On Mon, Dec 09, 2013 at 05:28:02PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The Linux kernel has traditionally required that an UNLOCK+LOCK pair
> act as a full memory barrier when either (1) that UNLOCK+LOCK pair
> was executed by the same CPU or task, or (2) the same lock variable
> was used for the UNLOCK and LOCK.  It now seems likely that very few
> places in the kernel rely on this full-memory-barrier semantic, and
> with the advent of queued locks, providing this semantic either requires
> complex reasoning, or for some architectures, added overhead.
> 
> This commit therefore adds a smp_mb__after_unlock_lock(), which may be
> placed after a LOCK primitive to restore the full-memory-barrier semantic.
> All definitions are currently no-ops, but will be upgraded for some
> architectures when queued locks arrive.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Linux-Arch <linux-arch@vger.kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>

It seems quite unfortunate that this isn't in some common location, and
then only overridden by architectures that need to do so.

More importantly: you document this earlier in the patch series than you
introduce it.

- Josh Triplett

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10  1:34     ` Josh Triplett
@ 2013-12-10  5:26       ` Paul E. McKenney
  2013-12-10 18:53         ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10  5:26 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, Linux-Arch, Ingo Molnar, Oleg Nesterov, Linus Torvalds

On Mon, Dec 09, 2013 at 05:34:17PM -0800, Josh Triplett wrote:
> On Mon, Dec 09, 2013 at 05:28:02PM -0800, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > The Linux kernel has traditionally required that an UNLOCK+LOCK pair
> > act as a full memory barrier when either (1) that UNLOCK+LOCK pair
> > was executed by the same CPU or task, or (2) the same lock variable
> > was used for the UNLOCK and LOCK.  It now seems likely that very few
> > places in the kernel rely on this full-memory-barrier semantic, and
> > with the advent of queued locks, providing this semantic either requires
> > complex reasoning, or for some architectures, added overhead.
> > 
> > This commit therefore adds a smp_mb__after_unlock_lock(), which may be
> > placed after a LOCK primitive to restore the full-memory-barrier semantic.
> > All definitions are currently no-ops, but will be upgraded for some
> > architectures when queued locks arrive.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Linux-Arch <linux-arch@vger.kernel.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> 
> It seems quite unfortunate that this isn't in some common location, and
> then only overridden by architectures that need to do so.

I was thinking that include/asm-generic/barrier.h was the place, but
it is all-or-nothing, used by UP architectures, from what I can see.
I figured that if there is such a common location, posting this patch
might flush it out.  I am not sure that this single definition is worth
the creation of a common place -- or even this definition combined with
smp_read_barrier_depends().

> More importantly: you document this earlier in the patch series than you
> introduce it.

Fair point, I reversed the order of those two patches.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10  1:28   ` [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier Paul E. McKenney
  2013-12-10  1:28     ` Paul E. McKenney
  2013-12-10  1:34     ` Josh Triplett
@ 2013-12-10 12:37     ` Peter Zijlstra
  2013-12-10 17:17       ` Paul E. McKenney
  2013-12-10 17:45       ` Josh Triplett
  2013-12-10 17:04     ` Oleg Nesterov
  3 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2013-12-10 12:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, Linux-Arch, Ingo Molnar, Oleg Nesterov, Linus Torvalds

On Mon, Dec 09, 2013 at 05:28:02PM -0800, Paul E. McKenney wrote:
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index f89da808ce31..abf645799991 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -84,4 +84,6 @@ do {									\
>  	___p1;								\
>  })
>  
> +#define smp_mb__after_unlock_lock()	do { } while (0)
> +
>  #endif /* _ASM_POWERPC_BARRIER_H */

Didn't ben said ppc actually violates the current unlock+lock assumtion
and therefore this barrier woulnd't actually be a nop on ppc

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10  1:28   ` [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier Paul E. McKenney
                       ` (2 preceding siblings ...)
  2013-12-10 12:37     ` Peter Zijlstra
@ 2013-12-10 17:04     ` Oleg Nesterov
  2013-12-10 17:18       ` Paul E. McKenney
  3 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-12-10 17:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, darren,
	fweisbec, sbw, Linux-Arch, Ingo Molnar, Linus Torvalds

On 12/09, Paul E. McKenney wrote:
>
> This commit therefore adds a smp_mb__after_unlock_lock(), which may be
> placed after a LOCK primitive to restore the full-memory-barrier semantic.
> All definitions are currently no-ops, but will be upgraded for some
> architectures when queued locks arrive.

I am wondering, perhaps smp_mb__after_unlock() makes more sense?

Note that it already has the potential user:

	--- x/kernel/sched/wait.c
	+++ x/kernel/sched/wait.c
	@@ -176,8 +176,9 @@ prepare_to_wait(wait_queue_head_t *q, wa
		spin_lock_irqsave(&q->lock, flags);
		if (list_empty(&wait->task_list))
			__add_wait_queue(q, wait);
	-	set_current_state(state);
	+	__set_current_state(state);
		spin_unlock_irqrestore(&q->lock, flags);
	+	smp_mb__after_unlock();
	 }
	 EXPORT_SYMBOL(prepare_to_wait);
	 
	@@ -190,8 +191,9 @@ prepare_to_wait_exclusive(wait_queue_hea
		spin_lock_irqsave(&q->lock, flags);
		if (list_empty(&wait->task_list))
			__add_wait_queue_tail(q, wait);
	-	set_current_state(state);
	+	__set_current_state(state);
		spin_unlock_irqrestore(&q->lock, flags);
	+	smp_mb__after_unlock();
	 }
	 EXPORT_SYMBOL(prepare_to_wait_exclusive);
	 

Assuming it can also be used "later", after another LOCK, like in
your example in 5/7.

Oleg.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10 12:37     ` Peter Zijlstra
@ 2013-12-10 17:17       ` Paul E. McKenney
  2013-12-10 17:45       ` Josh Triplett
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10 17:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, Linux-Arch, Ingo Molnar, Oleg Nesterov, Linus Torvalds, benh

On Tue, Dec 10, 2013 at 01:37:26PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 09, 2013 at 05:28:02PM -0800, Paul E. McKenney wrote:
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index f89da808ce31..abf645799991 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -84,4 +84,6 @@ do {									\
> >  	___p1;								\
> >  })
> >  
> > +#define smp_mb__after_unlock_lock()	do { } while (0)
> > +
> >  #endif /* _ASM_POWERPC_BARRIER_H */
> 
> Didn't ben said ppc actually violates the current unlock+lock assumtion
> and therefore this barrier woulnd't actually be a nop on ppc

Last I knew, I was saying that it did in theory, but wasn't able to
demonstrate it in practice.  But yes, I would be more comfortable with
it being smp_mb().

Ben?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10 17:04     ` Oleg Nesterov
@ 2013-12-10 17:18       ` Paul E. McKenney
  2013-12-10 17:18         ` Paul E. McKenney
  2013-12-10 17:32         ` Oleg Nesterov
  0 siblings, 2 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10 17:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, darren,
	fweisbec, sbw, Linux-Arch, Ingo Molnar, Linus Torvalds

On Tue, Dec 10, 2013 at 06:04:04PM +0100, Oleg Nesterov wrote:
> On 12/09, Paul E. McKenney wrote:
> >
> > This commit therefore adds a smp_mb__after_unlock_lock(), which may be
> > placed after a LOCK primitive to restore the full-memory-barrier semantic.
> > All definitions are currently no-ops, but will be upgraded for some
> > architectures when queued locks arrive.
> 
> I am wondering, perhaps smp_mb__after_unlock() makes more sense?
> 
> Note that it already has the potential user:
> 
> 	--- x/kernel/sched/wait.c
> 	+++ x/kernel/sched/wait.c
> 	@@ -176,8 +176,9 @@ prepare_to_wait(wait_queue_head_t *q, wa
> 		spin_lock_irqsave(&q->lock, flags);
> 		if (list_empty(&wait->task_list))
> 			__add_wait_queue(q, wait);
> 	-	set_current_state(state);
> 	+	__set_current_state(state);
> 		spin_unlock_irqrestore(&q->lock, flags);
> 	+	smp_mb__after_unlock();
> 	 }
> 	 EXPORT_SYMBOL(prepare_to_wait);
> 	 
> 	@@ -190,8 +191,9 @@ prepare_to_wait_exclusive(wait_queue_hea
> 		spin_lock_irqsave(&q->lock, flags);
> 		if (list_empty(&wait->task_list))
> 			__add_wait_queue_tail(q, wait);
> 	-	set_current_state(state);
> 	+	__set_current_state(state);
> 		spin_unlock_irqrestore(&q->lock, flags);
> 	+	smp_mb__after_unlock();
> 	 }
> 	 EXPORT_SYMBOL(prepare_to_wait_exclusive);
> 	 
> 
> Assuming it can also be used "later", after another LOCK, like in
> your example in 5/7.

I am fine either way.  But there was an objection to tying this to the
unlock because it costs more on many architectures than tying this to
the lock.

But if you are saying "in addition to" rather than "instead of" that
would be a different conversation.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10 17:18       ` Paul E. McKenney
@ 2013-12-10 17:18         ` Paul E. McKenney
  2013-12-10 17:32         ` Oleg Nesterov
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10 17:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, darren,
	fweisbec, sbw, Linux-Arch, Ingo Molnar, Linus Torvalds

On Tue, Dec 10, 2013 at 06:04:04PM +0100, Oleg Nesterov wrote:
> On 12/09, Paul E. McKenney wrote:
> >
> > This commit therefore adds a smp_mb__after_unlock_lock(), which may be
> > placed after a LOCK primitive to restore the full-memory-barrier semantic.
> > All definitions are currently no-ops, but will be upgraded for some
> > architectures when queued locks arrive.
> 
> I am wondering, perhaps smp_mb__after_unlock() makes more sense?
> 
> Note that it already has the potential user:
> 
> 	--- x/kernel/sched/wait.c
> 	+++ x/kernel/sched/wait.c
> 	@@ -176,8 +176,9 @@ prepare_to_wait(wait_queue_head_t *q, wa
> 		spin_lock_irqsave(&q->lock, flags);
> 		if (list_empty(&wait->task_list))
> 			__add_wait_queue(q, wait);
> 	-	set_current_state(state);
> 	+	__set_current_state(state);
> 		spin_unlock_irqrestore(&q->lock, flags);
> 	+	smp_mb__after_unlock();
> 	 }
> 	 EXPORT_SYMBOL(prepare_to_wait);
> 	 
> 	@@ -190,8 +191,9 @@ prepare_to_wait_exclusive(wait_queue_hea
> 		spin_lock_irqsave(&q->lock, flags);
> 		if (list_empty(&wait->task_list))
> 			__add_wait_queue_tail(q, wait);
> 	-	set_current_state(state);
> 	+	__set_current_state(state);
> 		spin_unlock_irqrestore(&q->lock, flags);
> 	+	smp_mb__after_unlock();
> 	 }
> 	 EXPORT_SYMBOL(prepare_to_wait_exclusive);
> 	 
> 
> Assuming it can also be used "later", after another LOCK, like in
> your example in 5/7.

I am fine either way.  But there was an objection to tying this to the
unlock because it costs more on many architectures than tying this to
the lock.

But if you are saying "in addition to" rather than "instead of" that
would be a different conversation.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10 17:18       ` Paul E. McKenney
  2013-12-10 17:18         ` Paul E. McKenney
@ 2013-12-10 17:32         ` Oleg Nesterov
  1 sibling, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-12-10 17:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, darren,
	fweisbec, sbw, Linux-Arch, Ingo Molnar, Linus Torvalds

On 12/10, Paul E. McKenney wrote:
>
> On Tue, Dec 10, 2013 at 06:04:04PM +0100, Oleg Nesterov wrote:
> >
> > I am wondering, perhaps smp_mb__after_unlock() makes more sense?
> >
> > Note that it already has the potential user:
> >
> > 	--- x/kernel/sched/wait.c
> > 	+++ x/kernel/sched/wait.c
> > 	@@ -176,8 +176,9 @@ prepare_to_wait(wait_queue_head_t *q, wa
> > 		spin_lock_irqsave(&q->lock, flags);
> > 		if (list_empty(&wait->task_list))
> > 			__add_wait_queue(q, wait);
> > 	-	set_current_state(state);
> > 	+	__set_current_state(state);
> > 		spin_unlock_irqrestore(&q->lock, flags);
> > 	+	smp_mb__after_unlock();
> > 	 }
> > 	 EXPORT_SYMBOL(prepare_to_wait);
> >
> > 	@@ -190,8 +191,9 @@ prepare_to_wait_exclusive(wait_queue_hea
> > 		spin_lock_irqsave(&q->lock, flags);
> > 		if (list_empty(&wait->task_list))
> > 			__add_wait_queue_tail(q, wait);
> > 	-	set_current_state(state);
> > 	+	__set_current_state(state);
> > 		spin_unlock_irqrestore(&q->lock, flags);
> > 	+	smp_mb__after_unlock();
> > 	 }
> > 	 EXPORT_SYMBOL(prepare_to_wait_exclusive);
> >
> >
> > Assuming it can also be used "later", after another LOCK, like in
> > your example in 5/7.
>
> I am fine either way.  But there was an objection to tying this to the
> unlock because it costs more on many architectures than tying this to
> the lock.

OK, I see, thanks.

> But if you are saying "in addition to" rather than "instead of" that
> would be a different conversation.

Yes, please forget for now.

Oleg.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10 12:37     ` Peter Zijlstra
  2013-12-10 17:17       ` Paul E. McKenney
@ 2013-12-10 17:45       ` Josh Triplett
  2013-12-10 20:11         ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Josh Triplett @ 2013-12-10 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, rostedt, dhowells, edumazet, darren,
	fweisbec, sbw, Linux-Arch, Ingo Molnar, Oleg Nesterov,
	Linus Torvalds

On Tue, Dec 10, 2013 at 01:37:26PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 09, 2013 at 05:28:02PM -0800, Paul E. McKenney wrote:
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index f89da808ce31..abf645799991 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -84,4 +84,6 @@ do {									\
> >  	___p1;								\
> >  })
> >  
> > +#define smp_mb__after_unlock_lock()	do { } while (0)
> > +
> >  #endif /* _ASM_POWERPC_BARRIER_H */
> 
> Didn't ben said ppc actually violates the current unlock+lock assumtion
> and therefore this barrier woulnd't actually be a nop on ppc

Or, ppc could fix its lock primitives to preserve the unlock+lock
assumption, and avoid subtle breakage across half the kernel.

- Josh Triplett

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10  5:26       ` Paul E. McKenney
@ 2013-12-10 18:53         ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10 18:53 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, Linux-Arch, Ingo Molnar, Oleg Nesterov, Linus Torvalds

On Mon, Dec 09, 2013 at 09:26:41PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 09, 2013 at 05:34:17PM -0800, Josh Triplett wrote:
> > On Mon, Dec 09, 2013 at 05:28:02PM -0800, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > 
> > > The Linux kernel has traditionally required that an UNLOCK+LOCK pair
> > > act as a full memory barrier when either (1) that UNLOCK+LOCK pair
> > > was executed by the same CPU or task, or (2) the same lock variable
> > > was used for the UNLOCK and LOCK.  It now seems likely that very few
> > > places in the kernel rely on this full-memory-barrier semantic, and
> > > with the advent of queued locks, providing this semantic either requires
> > > complex reasoning, or for some architectures, added overhead.
> > > 
> > > This commit therefore adds a smp_mb__after_unlock_lock(), which may be
> > > placed after a LOCK primitive to restore the full-memory-barrier semantic.
> > > All definitions are currently no-ops, but will be upgraded for some
> > > architectures when queued locks arrive.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Cc: Linux-Arch <linux-arch@vger.kernel.org>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > 
> > It seems quite unfortunate that this isn't in some common location, and
> > then only overridden by architectures that need to do so.
> 
> I was thinking that include/asm-generic/barrier.h was the place, but
> it is all-or-nothing, used by UP architectures, from what I can see.
> I figured that if there is such a common location, posting this patch
> might flush it out.  I am not sure that this single definition is worth
> the creation of a common place -- or even this definition combined with
> smp_read_barrier_depends().

And of course the right place to put this is include/linux/spinlock.h,
the same place where smp_mb__before_spinlock() is defined.  Exceptions
then go into the corresponding arch-specific spinlock.h files.

Much better that way, thank you for calling this out!

							Thanx, Paul

> > More importantly: you document this earlier in the patch series than you
> > introduce it.
> 
> Fair point, I reversed the order of those two patches.
> 
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier
  2013-12-10 17:45       ` Josh Triplett
@ 2013-12-10 20:11         ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-12-10 20:11 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, rostedt, dhowells, edumazet, darren,
	fweisbec, sbw, Linux-Arch, Ingo Molnar, Oleg Nesterov,
	Linus Torvalds

On Tue, Dec 10, 2013 at 09:45:08AM -0800, Josh Triplett wrote:
> On Tue, Dec 10, 2013 at 01:37:26PM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 09, 2013 at 05:28:02PM -0800, Paul E. McKenney wrote:
> > > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > > index f89da808ce31..abf645799991 100644
> > > --- a/arch/powerpc/include/asm/barrier.h
> > > +++ b/arch/powerpc/include/asm/barrier.h
> > > @@ -84,4 +84,6 @@ do {									\
> > >  	___p1;								\
> > >  })
> > >  
> > > +#define smp_mb__after_unlock_lock()	do { } while (0)
> > > +
> > >  #endif /* _ASM_POWERPC_BARRIER_H */
> > 
> > Didn't ben said ppc actually violates the current unlock+lock assumtion
> > and therefore this barrier woulnd't actually be a nop on ppc
> 
> Or, ppc could fix its lock primitives to preserve the unlock+lock
> assumption, and avoid subtle breakage across half the kernel.

Indeed.  However, another motivation for this change was the difficulty
in proving that x86 really provided the equivalent of a full barrier
for the MCS lock handoff case:

	http://www.spinics.net/lists/linux-mm/msg65653.html

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-12-10 20:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20131210012738.GA24317@linux.vnet.ibm.com>
     [not found] ` <1386638883-25379-1-git-send-email-paulmck@linux.vnet.ibm.com>
2013-12-10  1:27   ` [PATCH v5 tip/core/locking 3/7] Documentation/memory-barriers.txt: Prohibit speculative writes Paul E. McKenney
2013-12-10  1:28   ` [PATCH v5 tip/core/locking 6/7] locking: Add an smp_mb__after_unlock_lock() for UNLOCK+LOCK barrier Paul E. McKenney
2013-12-10  1:28     ` Paul E. McKenney
2013-12-10  1:34     ` Josh Triplett
2013-12-10  5:26       ` Paul E. McKenney
2013-12-10 18:53         ` Paul E. McKenney
2013-12-10 12:37     ` Peter Zijlstra
2013-12-10 17:17       ` Paul E. McKenney
2013-12-10 17:45       ` Josh Triplett
2013-12-10 20:11         ` Paul E. McKenney
2013-12-10 17:04     ` Oleg Nesterov
2013-12-10 17:18       ` Paul E. McKenney
2013-12-10 17:18         ` Paul E. McKenney
2013-12-10 17:32         ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox