* [PATCH v2 RFC 0/3] Memory-barrier documentation updates @ 2013-11-21 21:30 Paul E. McKenney 2013-11-21 21:31 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2013-11-21 21:30 UTC (permalink / raw) To: linux-kernel Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw Hello! This series applies some long-needed updates to memory-barriers.txt: 1. Add ACCESS_ONCE() calls where needed to ensure their inclusion in code copy-and-pasted from this file. 2. Add long atomic examples alongside the existing atomics. 3. Prohibit architectures supporting the Linux kernel from speculating stores. Changes from v1: o Combined with Peter Zijlstra's speculative-store-prohibition patch. o Added more pitfalls to avoid when prohibiting speculative stores, along with how to avoid them. o Applied Josh Triplett's review comments. Thanx, Paul b/Documentation/memory-barriers.txt | 360 ++++++++++++++++++++++++++---------- 1 file changed, 266 insertions(+), 94 deletions(-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-21 21:30 [PATCH v2 RFC 0/3] Memory-barrier documentation updates Paul E. McKenney @ 2013-11-21 21:31 ` Paul E. McKenney 2013-11-21 21:31 ` [PATCH v2 RFC 2/3] documentation: Add long atomic examples " Paul E. McKenney ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Paul E. McKenney @ 2013-11-21 21:31 UTC (permalink / raw) To: linux-kernel Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw, Paul E. McKenney From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> The Documentation/memory-barriers.txt file was written before the need for ACCESS_ONCE() was fully appreciated. It therefore contains no ACCESS_ONCE() calls, which can be a problem when people lift examples from it. This commit therefore adds ACCESS_ONCE() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> --- Documentation/memory-barriers.txt | 206 +++++++++++++++++++++++--------------- 1 file changed, 126 insertions(+), 80 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index c8c42e64e953..1c3c1d025f20 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -194,18 +194,22 @@ There are some minimal guarantees that may be expected of a CPU: (*) On any given CPU, dependent memory accesses will be issued in order, with respect to itself. This means that for: - Q = P; D = *Q; + ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q); the CPU will issue the following memory operations: Q = LOAD P, D = LOAD *Q - and always in that order. + and always in that order. On most systems, smp_read_barrier_depends() + does nothing, but it is required for DEC Alpha. The ACCESS_ONCE() + is required to prevent compiler mischief. Please note that you + should normally use something like rcu_dereference() instead of + open-coding smp_read_barrier_depends(). (*) Overlapping loads and stores within a particular CPU will appear to be ordered within that CPU. This means that for: - a = *X; *X = b; + a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b; the CPU will only issue the following sequence of memory operations: @@ -213,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU: And for: - *X = c; d = *X; + ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X); the CPU will only issue: @@ -224,6 +228,41 @@ There are some minimal guarantees that may be expected of a CPU: And there are a number of things that _must_ or _must_not_ be assumed: + (*) It _must_not_ be assumed that the compiler will do what you want with + memory references that are not protected by ACCESS_ONCE(). Without + ACCESS_ONCE(), the compiler is within its rights to do all sorts + of "creative" transformations: + + (-) Repeat the load, possibly getting a different value on the second + and subsequent loads. This is especially prone to happen when + register pressure is high. + + (-) Merge adjacent loads and stores to the same location. The most + familiar example is the transformation from: + + while (a) + do_something(); + + to something like: + + if (a) + for (;;) + do_something(); + + Using ACCESS_ONCE() as follows prevents this sort of optimization: + + while (ACCESS_ONCE(a)) + do_something(); + + (-) "Store tearing", where a single store in the source code is split + into smaller stores in the object code. Note that gcc really + will do this on some architectures when storing certain constants. + It can be cheaper to do a series of immediate stores than to + form the constant in a register and then to store that register. + + (-) "Load tearing", which splits loads in a manner analogous to + store tearing. + (*) It _must_not_ be assumed that independent loads and stores will be issued in the order given. This means that for: @@ -450,14 +489,14 @@ The usage requirements of data dependency barriers are a little subtle, and it's not always obvious that they're needed. To illustrate, consider the following sequence of events: - CPU 1 CPU 2 - =============== =============== + CPU 1 CPU 2 + =============== =============== { A == 1, B == 2, C = 3, P == &A, Q == &C } B = 4; <write barrier> - P = &B - Q = P; - D = *Q; + ACCESS_ONCE(P) = &B + Q = ACCESS_ONCE(P); + D = *Q; There's a clear data dependency here, and it would seem that by the end of the sequence, Q must be either &A or &B, and that: @@ -477,15 +516,15 @@ Alpha). To deal with this, a data dependency barrier or better must be inserted between the address load and the data load: - CPU 1 CPU 2 - =============== =============== + CPU 1 CPU 2 + =============== =============== { A == 1, B == 2, C = 3, P == &A, Q == &C } B = 4; <write barrier> - P = &B - Q = P; - <data dependency barrier> - D = *Q; + ACCESS_ONCE(P) = &B + Q = ACCESS_ONCE(P); + <data dependency barrier> + D = *Q; This enforces the occurrence of one of the two implications, and prevents the third possibility from arising. @@ -504,21 +543,22 @@ Another example of where data dependency barriers might by required is where a number is read from memory and then used to calculate the index for an array access: - CPU 1 CPU 2 - =============== =============== + CPU 1 CPU 2 + =============== =============== { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 } M[1] = 4; <write barrier> - P = 1 - Q = P; - <data dependency barrier> - D = M[Q]; + ACCESS_ONCE(P) = 1 + Q = ACCESS_ONCE(P); + <data dependency barrier> + D = M[Q]; -The data dependency barrier is very important to the RCU system, for example. -See rcu_dereference() in include/linux/rcupdate.h. This permits the current -target of an RCU'd pointer to be replaced with a new modified target, without -the replacement target appearing to be incompletely initialised. +The data dependency barrier is very important to the RCU system, +for example. See rcu_assign_pointer() and rcu_dereference() in +include/linux/rcupdate.h. This permits the current target of an RCU'd +pointer to be replaced with a new modified target, without the replacement +target appearing to be incompletely initialised. See also the subsection on "Cache Coherency" for a more thorough example. @@ -530,22 +570,23 @@ A control dependency requires a full read memory barrier, not simply a data dependency barrier to make it work correctly. Consider the following bit of code: - q = &a; + q = ACCESS_ONCE(a); if (p) { <data dependency barrier> - q = &b; + q = ACCESS_ONCE(b); } x = *q; This will not have the desired effect because there is no actual data -dependency, but rather a control dependency that the CPU may short-circuit by -attempting to predict the outcome in advance. In such a case what's actually -required is: +dependency, but rather a control dependency that the CPU may short-circuit +by attempting to predict the outcome in advance, so that other CPUs see +the load from b as having happened before the load from a. In such a +case what's actually required is: - q = &a; + q = ACCESS_ONCE(a); if (p) { <read barrier> - q = &b; + q = ACCESS_ONCE(b); } x = *q; @@ -561,23 +602,23 @@ barrier, though a general barrier would also be viable. Similarly a read barrier or a data dependency barrier should always be paired with at least an write barrier, though, again, a general barrier is viable: - CPU 1 CPU 2 - =============== =============== - a = 1; + CPU 1 CPU 2 + =============== =============== + ACCESS_ONCE(a) = 1; <write barrier> - b = 2; x = b; - <read barrier> - y = a; + ACCESS_ONCE(b) = 2; x = ACCESS_ONCE(b); + <read barrier> + y = ACCESS_ONCE(a); Or: - CPU 1 CPU 2 - =============== =============================== + CPU 1 CPU 2 + =============== =============================== a = 1; <write barrier> - b = &a; x = b; - <data dependency barrier> - y = *x; + ACCESS_ONCE(b) = &a; x = ACCESS_ONCE(b); + <data dependency barrier> + y = *x; Basically, the read barrier always has to be there, even though it can be of the "weaker" type. @@ -586,13 +627,13 @@ the "weaker" type. match the loads after the read barrier or the data dependency barrier, and vice versa: - CPU 1 CPU 2 - =============== =============== - a = 1; }---- --->{ v = c - b = 2; } \ / { w = d - <write barrier> \ <read barrier> - c = 3; } / \ { x = a; - d = 4; }---- --->{ y = b; + CPU 1 CPU 2 + =================== =================== + ACCESS_ONCE(a) = 1; }---- --->{ v = ACCESS_ONCE(c); + ACCESS_ONCE(b) = 2; } \ / { w = ACCESS_ONCE(d); + <write barrier> \ <read barrier> + ACCESS_ONCE(c) = 3; } / \ { x = ACCESS_ONCE(a); + ACCESS_ONCE(d) = 4; }---- --->{ y = ACCESS_ONCE(b); EXAMPLES OF MEMORY BARRIER SEQUENCES @@ -1435,12 +1476,12 @@ three CPUs; then should the following sequence of events occur: CPU 1 CPU 2 =============================== =============================== - *A = a; *E = e; + ACCESS_ONCE(*A) = a; ACCESS_ONCE(*E) = e; LOCK M LOCK Q - *B = b; *F = f; - *C = c; *G = g; + ACCESS_ONCE(*B) = b; ACCESS_ONCE(*F) = f; + ACCESS_ONCE(*C) = c; ACCESS_ONCE(*G) = g; UNLOCK M UNLOCK Q - *D = d; *H = h; + ACCESS_ONCE(*D) = d; ACCESS_ONCE(*H) = h; Then there is no guarantee as to what order CPU 3 will see the accesses to *A through *H occur in, other than the constraints imposed by the separate locks @@ -1460,17 +1501,17 @@ However, if the following occurs: CPU 1 CPU 2 =============================== =============================== - *A = a; - LOCK M [1] - *B = b; - *C = c; - UNLOCK M [1] - *D = d; *E = e; - LOCK M [2] - *F = f; - *G = g; - UNLOCK M [2] - *H = h; + ACCESS_ONCE(*A) = a; + LOCK M [1] + ACCESS_ONCE(*B) = b; + ACCESS_ONCE(*C) = c; + UNLOCK M [1] + ACCESS_ONCE(*D) = d; ACCESS_ONCE(*E) = e; + LOCK M [2] + ACCESS_ONCE(*F) = f; + ACCESS_ONCE(*G) = g; + UNLOCK M [2] + ACCESS_ONCE(*H) = h; CPU 3 might see: @@ -2177,11 +2218,11 @@ A programmer might take it for granted that the CPU will perform memory operations in exactly the order specified, so that if the CPU is, for example, given the following piece of code to execute: - a = *A; - *B = b; - c = *C; - d = *D; - *E = e; + a = ACCESS_ONCE(*A); + ACCESS_ONCE(*B) = b; + c = ACCESS_ONCE(*C); + d = ACCESS_ONCE(*D); + ACCESS_ONCE(*E) = e; they would then expect that the CPU will complete the memory operation for each instruction before moving on to the next one, leading to a definite sequence of @@ -2228,12 +2269,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its _own_ accesses appear to be correctly ordered, without the need for a memory barrier. For instance with the following code: - U = *A; - *A = V; - *A = W; - X = *A; - *A = Y; - Z = *A; + U = ACCESS_ONCE(*A); + ACCESS_ONCE(*A) = V; + ACCESS_ONCE(*A) = W; + X = ACCESS_ONCE(*A); + ACCESS_ONCE(*A) = Y; + Z = ACCESS_ONCE(*A); and assuming no intervention by an external influence, it can be assumed that the final result will appear to be: @@ -2250,7 +2291,12 @@ accesses: in that order, but, without intervention, the sequence may have almost any combination of elements combined or discarded, provided the program's view of -the world remains consistent. +the world remains consistent. Note that ACCESS_ONCE() is -not- optional +in the above example, as there are architectures where a given CPU might +interchange successive loads to the same location. On such architectures, +ACCESS_ONCE() does whatever is necessary to prevent this, for example, on +Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the +special ld.acq and st.rel instructions that prevent such reordering. The compiler may also combine, discard or defer elements of the sequence before the CPU even sees them. @@ -2264,13 +2310,13 @@ may be reduced to: *A = W; -since, without a write barrier, it can be assumed that the effect of the -storage of V to *A is lost. Similarly: +since, without either a write barrier or an ACCESS_ONCE(), it can be +assumed that the effect of the storage of V to *A is lost. Similarly: *A = Y; Z = *A; -may, without a memory barrier, be reduced to: +may, without a memory barrier or an ACCESS_ONCE(), be reduced to: *A = Y; Z = Y; -- 1.8.1.5 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 RFC 2/3] documentation: Add long atomic examples to memory-barriers.txt 2013-11-21 21:31 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt Paul E. McKenney @ 2013-11-21 21:31 ` Paul E. McKenney 2013-11-21 21:55 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls " Peter Zijlstra 2013-11-22 15:39 ` Peter Zijlstra 2 siblings, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2013-11-21 21:31 UTC (permalink / raw) To: linux-kernel Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw, Paul E. McKenney From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Although the atomic_long_t functions are quite useful, they are a bit obscure. This commit therefore adds the common ones alongside their atomic_t counterparts in Documentation/memory-barriers.txt. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> --- Documentation/memory-barriers.txt | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 1c3c1d025f20..2170dcbedfaf 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1728,21 +1728,23 @@ explicit lock operations, described later). These include: xchg(); cmpxchg(); - atomic_xchg(); - atomic_cmpxchg(); - atomic_inc_return(); - atomic_dec_return(); - atomic_add_return(); - atomic_sub_return(); - atomic_inc_and_test(); - atomic_dec_and_test(); - atomic_sub_and_test(); - atomic_add_negative(); - atomic_add_unless(); /* when succeeds (returns 1) */ + atomic_xchg(); atomic_long_xchg(); + atomic_cmpxchg(); atomic_long_cmpxchg(); + atomic_inc_return(); atomic_long_inc_return(); + atomic_dec_return(); atomic_long_dec_return(); + atomic_add_return(); atomic_long_add_return(); + atomic_sub_return(); atomic_long_sub_return(); + atomic_inc_and_test(); atomic_long_inc_and_test(); + atomic_dec_and_test(); atomic_long_dec_and_test(); + atomic_sub_and_test(); atomic_long_sub_and_test(); + atomic_add_negative(); atomic_long_add_negative(); test_and_set_bit(); test_and_clear_bit(); test_and_change_bit(); + /* when succeeds (returns 1) */ + atomic_add_unless(); atomic_long_add_unless(); + These are used for such things as implementing LOCK-class and UNLOCK-class operations and adjusting reference counters towards object destruction, and as such the implicit memory barrier effects are necessary. -- 1.8.1.5 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-21 21:31 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt Paul E. McKenney 2013-11-21 21:31 ` [PATCH v2 RFC 2/3] documentation: Add long atomic examples " Paul E. McKenney @ 2013-11-21 21:55 ` Peter Zijlstra 2013-11-21 22:09 ` Paul E. McKenney 2013-11-22 15:39 ` Peter Zijlstra 2 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2013-11-21 21:55 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Thu, Nov 21, 2013 at 01:31:27PM -0800, Paul E. McKenney wrote: > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > > The Documentation/memory-barriers.txt file was written before the need > for ACCESS_ONCE() was fully appreciated. It therefore contains no > ACCESS_ONCE() calls, which can be a problem when people lift examples > from it. This commit therefore adds ACCESS_ONCE() calls. So I find the repeated ACCESS_ONCE() significantly detracts from the readability of the text. Can't we simply state that all accesses are assumed single-copy atomic and this can be achieved for naturally aligned words using ACCESS_ONCE() in C/C++ ? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-21 21:55 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls " Peter Zijlstra @ 2013-11-21 22:09 ` Paul E. McKenney 2013-11-21 22:18 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2013-11-21 22:09 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Thu, Nov 21, 2013 at 10:55:17PM +0100, Peter Zijlstra wrote: > On Thu, Nov 21, 2013 at 01:31:27PM -0800, Paul E. McKenney wrote: > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > > > > The Documentation/memory-barriers.txt file was written before the need > > for ACCESS_ONCE() was fully appreciated. It therefore contains no > > ACCESS_ONCE() calls, which can be a problem when people lift examples > > from it. This commit therefore adds ACCESS_ONCE() calls. > > So I find the repeated ACCESS_ONCE() significantly detracts from the > readability of the text. > > Can't we simply state that all accesses are assumed single-copy atomic > and this can be achieved for naturally aligned words using ACCESS_ONCE() > in C/C++ ? We could, but at the moment I would prefer the decrease in readability to the copy-and-paste bugs that omit needed ACCESS_ONCE() calls. Is there some way to get both ACCESS_ONCE() and readability? An abbreviation such as AO()? More easily distinguished variable names? Something else? Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-21 22:09 ` Paul E. McKenney @ 2013-11-21 22:18 ` Peter Zijlstra 2013-11-21 22:32 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2013-11-21 22:18 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Thu, Nov 21, 2013 at 02:09:51PM -0800, Paul E. McKenney wrote: > On Thu, Nov 21, 2013 at 10:55:17PM +0100, Peter Zijlstra wrote: > > On Thu, Nov 21, 2013 at 01:31:27PM -0800, Paul E. McKenney wrote: > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > > > > > > The Documentation/memory-barriers.txt file was written before the need > > > for ACCESS_ONCE() was fully appreciated. It therefore contains no > > > ACCESS_ONCE() calls, which can be a problem when people lift examples > > > from it. This commit therefore adds ACCESS_ONCE() calls. > > > > So I find the repeated ACCESS_ONCE() significantly detracts from the > > readability of the text. > > > > Can't we simply state that all accesses are assumed single-copy atomic > > and this can be achieved for naturally aligned words using ACCESS_ONCE() > > in C/C++ ? > > We could, but at the moment I would prefer the decrease in readability > to the copy-and-paste bugs that omit needed ACCESS_ONCE() calls. > > Is there some way to get both ACCESS_ONCE() and readability? An > abbreviation such as AO()? More easily distinguished variable names? > Something else? Use a form that looks less like C and thus defeats copy/paste? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-21 22:18 ` Peter Zijlstra @ 2013-11-21 22:32 ` Paul E. McKenney 2013-11-22 11:17 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2013-11-21 22:32 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Thu, Nov 21, 2013 at 11:18:21PM +0100, Peter Zijlstra wrote: > On Thu, Nov 21, 2013 at 02:09:51PM -0800, Paul E. McKenney wrote: > > On Thu, Nov 21, 2013 at 10:55:17PM +0100, Peter Zijlstra wrote: > > > On Thu, Nov 21, 2013 at 01:31:27PM -0800, Paul E. McKenney wrote: > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > > > > > > > > The Documentation/memory-barriers.txt file was written before the need > > > > for ACCESS_ONCE() was fully appreciated. It therefore contains no > > > > ACCESS_ONCE() calls, which can be a problem when people lift examples > > > > from it. This commit therefore adds ACCESS_ONCE() calls. > > > > > > So I find the repeated ACCESS_ONCE() significantly detracts from the > > > readability of the text. > > > > > > Can't we simply state that all accesses are assumed single-copy atomic > > > and this can be achieved for naturally aligned words using ACCESS_ONCE() > > > in C/C++ ? > > > > We could, but at the moment I would prefer the decrease in readability > > to the copy-and-paste bugs that omit needed ACCESS_ONCE() calls. > > > > Is there some way to get both ACCESS_ONCE() and readability? An > > abbreviation such as AO()? More easily distinguished variable names? > > Something else? > > Use a form that looks less like C and thus defeats copy/paste? My concern with that approach is that there is likely to be a large number of people who are likely to be willing and able to transcribe from any reasonable non-C form to ACCESS_ONCE()-free C code. :-/ But maybe you have something specific in mind? Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-21 22:32 ` Paul E. McKenney @ 2013-11-22 11:17 ` Peter Zijlstra 2013-11-22 18:54 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2013-11-22 11:17 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Thu, Nov 21, 2013 at 02:32:30PM -0800, Paul E. McKenney wrote: > > > We could, but at the moment I would prefer the decrease in readability > > > to the copy-and-paste bugs that omit needed ACCESS_ONCE() calls. > > > > > > Is there some way to get both ACCESS_ONCE() and readability? An > > > abbreviation such as AO()? More easily distinguished variable names? > > > Something else? > > > > Use a form that looks less like C and thus defeats copy/paste? > > My concern with that approach is that there is likely to be a large > number of people who are likely to be willing and able to transcribe > from any reasonable non-C form to ACCESS_ONCE()-free C code. :-/ > > But maybe you have something specific in mind? No, that was pretty much it. My issues is though that the subject matter is difficult enough without actively obfuscating the examples. Furthermore, people will find ways to get it wrong anyhow, if all they do is copy/paste without thought, then getting it wrong is pretty much guaranteed in this case. Memory ordering isn't something you can do without thinking. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-22 11:17 ` Peter Zijlstra @ 2013-11-22 18:54 ` Paul E. McKenney 0 siblings, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2013-11-22 18:54 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Fri, Nov 22, 2013 at 12:17:23PM +0100, Peter Zijlstra wrote: > On Thu, Nov 21, 2013 at 02:32:30PM -0800, Paul E. McKenney wrote: > > > > > We could, but at the moment I would prefer the decrease in readability > > > > to the copy-and-paste bugs that omit needed ACCESS_ONCE() calls. > > > > > > > > Is there some way to get both ACCESS_ONCE() and readability? An > > > > abbreviation such as AO()? More easily distinguished variable names? > > > > Something else? > > > > > > Use a form that looks less like C and thus defeats copy/paste? > > > > My concern with that approach is that there is likely to be a large > > number of people who are likely to be willing and able to transcribe > > from any reasonable non-C form to ACCESS_ONCE()-free C code. :-/ > > > > But maybe you have something specific in mind? > > No, that was pretty much it. My issues is though that the subject matter > is difficult enough without actively obfuscating the examples. > > Furthermore, people will find ways to get it wrong anyhow, if all they > do is copy/paste without thought, then getting it wrong is pretty much > guaranteed in this case. Memory ordering isn't something you can do > without thinking. One could argue that parsing the ACCESS_ONCE() calls will help them realize that they actually need to think. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-21 21:31 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt Paul E. McKenney 2013-11-21 21:31 ` [PATCH v2 RFC 2/3] documentation: Add long atomic examples " Paul E. McKenney 2013-11-21 21:55 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls " Peter Zijlstra @ 2013-11-22 15:39 ` Peter Zijlstra 2013-11-22 18:13 ` Paul E. McKenney 2 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2013-11-22 15:39 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Thu, Nov 21, 2013 at 01:31:27PM -0800, Paul E. McKenney wrote: > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > > The Documentation/memory-barriers.txt file was written before the need > for ACCESS_ONCE() was fully appreciated. It therefore contains no > ACCESS_ONCE() calls, which can be a problem when people lift examples > from it. This commit therefore adds ACCESS_ONCE() calls. > Under the 'COMPILER BARRIER' section we state that: "This is a general barrier - lesser varieties of compiler barrier do not exist." One could argue ACCESS_ONCE() is such a lesser barrier. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-22 15:39 ` Peter Zijlstra @ 2013-11-22 18:13 ` Paul E. McKenney 2013-11-23 9:04 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2013-11-22 18:13 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Fri, Nov 22, 2013 at 04:39:08PM +0100, Peter Zijlstra wrote: > On Thu, Nov 21, 2013 at 01:31:27PM -0800, Paul E. McKenney wrote: > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > > > > The Documentation/memory-barriers.txt file was written before the need > > for ACCESS_ONCE() was fully appreciated. It therefore contains no > > ACCESS_ONCE() calls, which can be a problem when people lift examples > > from it. This commit therefore adds ACCESS_ONCE() calls. > > > > Under the 'COMPILER BARRIER' section we state that: > > "This is a general barrier - lesser varieties of compiler barrier do not > exist." > > One could argue ACCESS_ONCE() is such a lesser barrier. Fair point -- I should have updated this section when adding ACCESS_ONCE(). How about the following? Thanx, Paul ------------------------------------------------------------------------ COMPILER BARRIER ---------------- The Linux kernel has an explicit compiler barrier function that prevents the compiler from moving the memory accesses either side of it to the other side: barrier(); This is a general barrier -- there are no read-read or write-write variants of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form for barrier() that affects only the specific accesses flagged by the ACCESS_ONCE(). The compiler barrier has no direct effect on the CPU, which may then reorder things however it wishes. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-22 18:13 ` Paul E. McKenney @ 2013-11-23 9:04 ` Peter Zijlstra 2013-11-23 17:17 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2013-11-23 9:04 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Fri, Nov 22, 2013 at 10:13:13AM -0800, Paul E. McKenney wrote: > How about the following? > > Thanx, Paul > > ------------------------------------------------------------------------ > > COMPILER BARRIER > ---------------- > > The Linux kernel has an explicit compiler barrier function that prevents the > compiler from moving the memory accesses either side of it to the other side: > > barrier(); > > This is a general barrier -- there are no read-read or write-write variants > of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form > for barrier() that affects only the specific accesses flagged by the > ACCESS_ONCE(). > > The compiler barrier has no direct effect on the CPU, which may then reorder > things however it wishes. > Seems ok, however this also seems like the natural spot to put that chunk about how a compiler can mis-transform stuff without either barrier or ACCESS_ONC(); that currently seems spread out over the document in some notes. The biggest of which seems to have ended up in the GUARANTEES chapter. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-23 9:04 ` Peter Zijlstra @ 2013-11-23 17:17 ` Paul E. McKenney 2013-12-04 0:47 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2013-11-23 17:17 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Sat, Nov 23, 2013 at 10:04:06AM +0100, Peter Zijlstra wrote: > On Fri, Nov 22, 2013 at 10:13:13AM -0800, Paul E. McKenney wrote: > > How about the following? > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > COMPILER BARRIER > > ---------------- > > > > The Linux kernel has an explicit compiler barrier function that prevents the > > compiler from moving the memory accesses either side of it to the other side: > > > > barrier(); > > > > This is a general barrier -- there are no read-read or write-write variants > > of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form > > for barrier() that affects only the specific accesses flagged by the > > ACCESS_ONCE(). > > > > The compiler barrier has no direct effect on the CPU, which may then reorder > > things however it wishes. > > > > Seems ok, however this also seems like the natural spot to put that > chunk about how a compiler can mis-transform stuff without either > barrier or ACCESS_ONC(); that currently seems spread out over the > document in some notes. > > The biggest of which seems to have ended up in the GUARANTEES chapter. Good point! I believe that the spread-out stuff is still needed, so I will add a summary of that information here, perhaps based in part on Jon Corbet's ACCESS_ONCE() article (http://lwn.net/Articles/508991/). Seem reasonable? Thanx, Paul ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt 2013-11-23 17:17 ` Paul E. McKenney @ 2013-12-04 0:47 ` Paul E. McKenney 0 siblings, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2013-12-04 0:47 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt, dhowells, edumazet, darren, fweisbec, sbw On Sat, Nov 23, 2013 at 09:17:19AM -0800, Paul E. McKenney wrote: > On Sat, Nov 23, 2013 at 10:04:06AM +0100, Peter Zijlstra wrote: > > On Fri, Nov 22, 2013 at 10:13:13AM -0800, Paul E. McKenney wrote: > > > How about the following? > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > COMPILER BARRIER > > > ---------------- > > > > > > The Linux kernel has an explicit compiler barrier function that prevents the > > > compiler from moving the memory accesses either side of it to the other side: > > > > > > barrier(); > > > > > > This is a general barrier -- there are no read-read or write-write variants > > > of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form > > > for barrier() that affects only the specific accesses flagged by the > > > ACCESS_ONCE(). > > > > > > The compiler barrier has no direct effect on the CPU, which may then reorder > > > things however it wishes. > > > > > > > Seems ok, however this also seems like the natural spot to put that > > chunk about how a compiler can mis-transform stuff without either > > barrier or ACCESS_ONC(); that currently seems spread out over the > > document in some notes. > > > > The biggest of which seems to have ended up in the GUARANTEES chapter. > > Good point! I believe that the spread-out stuff is still needed, so I > will add a summary of that information here, perhaps based in part on > Jon Corbet's ACCESS_ONCE() article (http://lwn.net/Articles/508991/). > > Seem reasonable? For example, how about the following? Thanx, Paul ------------------------------------------------------------------------ b/Documentation/memory-barriers.txt | 253 +++++++++++++++++++++++++++++++----- 1 file changed, 219 insertions(+), 34 deletions(-) Documentation/memory-barriers.txt: Document ACCESS_ONCE() The situations in which ACCESS_ONCE() is required are not well documented, so this commit adds some verbiage to memory-barriers.txt. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rusty Russell <rusty@rustcorp.com.au> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 1a2e5079b7f8..27ad05f47d83 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -231,37 +231,8 @@ And there are a number of things that _must_ or _must_not_ be assumed: (*) It _must_not_ be assumed that the compiler will do what you want with memory references that are not protected by ACCESS_ONCE(). Without ACCESS_ONCE(), the compiler is within its rights to do all sorts - of "creative" transformations: - - (-) Repeat the load, possibly getting a different value on the second - and subsequent loads. This is especially prone to happen when - register pressure is high. - - (-) Merge adjacent loads and stores to the same location. The most - familiar example is the transformation from: - - while (a) - do_something(); - - to something like: - - if (a) - for (;;) - do_something(); - - Using ACCESS_ONCE() as follows prevents this sort of optimization: - - while (ACCESS_ONCE(a)) - do_something(); - - (-) "Store tearing", where a single store in the source code is split - into smaller stores in the object code. Note that gcc really - will do this on some architectures when storing certain constants. - It can be cheaper to do a series of immediate stores than to - form the constant in a register and then to store that register. - - (-) "Load tearing", which splits loads in a manner analogous to - store tearing. + of "creative" transformations, which are covered in the Compiler + Barrier section. (*) It _must_not_ be assumed that independent loads and stores will be issued in the order given. This means that for: @@ -748,7 +719,8 @@ In summary: (*) Control dependencies require that the compiler avoid reordering the dependency into nonexistence. Careful use of ACCESS_ONCE() or - barrier() can help to preserve your control dependency. + barrier() can help to preserve your control dependency. Please + see the Compiler Barrier section for more information. (*) Control dependencies do -not- provide transitivity. If you need transitivity, use smp_mb(). @@ -1251,8 +1223,221 @@ of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form for barrier() that affects only the specific accesses flagged by the ACCESS_ONCE(). -The compiler barrier has no direct effect on the CPU, which may then reorder -things however it wishes. +The barrier() function has the following effects: + + (*) Prevents the compiler from reordering accesses following the + barrier() to precede any accesses preceding the barrier(). + One example use for this property is to ease communication between + interrupt-handler code and the code that was interrupted. + + (*) Within a loop, forces the compiler to load the variables used + in that loop's conditional on each pass through that loop. + +The ACCESS_ONCE() function can prevent any number of optimizations that, +while perfectly safe in single-threaded code, can be fatal in concurrent +code. Here are some examples of these sorts of optimizations: + + (*) The compiler is within its rights to merge successive loads from + the same variable. Such merging can cause the compiler to "optimize" + the following code: + + while (tmp = a) + do_something_with(tmp); + + into the following code, which, although in some sense legitimate + for single-threaded code, is almost certainly not what the developer + intended: + + if (tmp = a) + for (;;) + do_something_with(tmp); + + Use ACCESS_ONCE() to prevent the compiler from doing this to you: + + while (tmp = ACCESS_ONCE(a)) + do_something_with(tmp); + + (*) The compiler is within its rights to reload a variable, for example, + in cases where high register pressure prevents the compiler from + keeping all data of interest in registers. The compiler might + therefore optimize the variable tmp out of our previous example: + + while (tmp = a) + do_something_with(tmp); + + This could result in the following code, which is perfectly safe in + single-threaded code, but can be fatal in concurrent code: + + while (a) + do_something_with(a); + + For example, the optimized version of this code could result in + passing a zero to do_something_with() in the case where the variable + a was modified by some other CPU between the "while" statement and + the call to do_something_with(). + + Again, use ACCESS_ONCE() to prevent the compiler from doing this: + + while (tmp = ACCESS_ONCE(a)) + do_something_with(tmp); + + Note that if the compiler runs short of registers, it might save + tmp onto the stack. The overhead of this saving and later restoring + is why compilers reload variables. Doing so is perfectly safe for + single-threaded code, so you need to tell the compiler about cases + where it is not safe. + + (*) The compiler is within its rights to omit a load entirely if it knows + what the value will be. For example, if the compiler can prove that + the value of variable a is always zero, it can optimize this code: + + while (tmp = a) + do_something_with(tmp); + + Into this: + + do { } while (0); + + This transformation is a win for single-threaded code because it gets + rid of a load and a branch. The problem is that the compiler will + carry out its proof assuming that the current CPU is the only one + updating variable a. If variable a is shared, then the compiler's + proof will be erroneous. Use ACCESS_ONCE() to tell the compiler + that it doesn't know as much as it thinks it does: + + while (tmp = ACCESS_ONCE(a)) + do_something_with(tmp); + + But please note that the compiler is also closely watching what you + do with the value after the ACCESS_ONCE(). For example, suppose you + do the following and MAX is a preprocessor macro with the value 1: + + for ((tmp = ACCESS_ONCE(a)) % MAX) + do_something_with(tmp); + + Then the compiler knows that the result of the "%" operator applied + to MAX will always be zero, again allowing the compiler to optimize + the code into near-nonexistence. (It will still load from the + variable a.) + + (*) Similarly, the compiler is within its rights to omit a store entirely + if it knows that the variable already has the value being stored. + Again, the compiler assumes that the current CPU is the only one + storing into the variable, which can cause the compiler to do the + wrong thing for shared variables. For example, suppose you have + the following: + + a = 0; + /* Code that does not store to variable a. */ + a = 0; + + The compiler sees that the value of variable a is already zero, so + it might well omit the second store. This would come as a fatal + surprise if some other CPU might have stored to variable a in the + meantime. + + Use ACCESS_ONCE() to prevent the compiler from making this sort of + wrong guess: + + ACCESS_ONCE(a) = 0; + /* Code that does not store to variable a. */ + ACCESS_ONCE(a) = 0; + + (*) The compiler is within its rights to reorder memory accesses unless + you tell it not to. For example, consider the following interaction + between process-level code and an interrupt handler: + + void process_level(void) + { + msg = get_message(); + flag = true; + } + + void interrupt_handler(void) + { + if (flag) + process_message(msg); + } + + There is nothing to prevent the the compiler from transforming + process_level() to the following, in fact, this might well be a + win for single-threaded code: + + void process_level(void) + { + flag = true; + msg = get_message(); + } + + If the interrupt occurs between these two statement, then + interrupt_handler() might be passed a garbled msg. Use ACCESS_ONCE() + to prevent this as follows: + + void process_level(void) + { + ACCESS_ONCE(msg) = get_message(); + ACCESS_ONCE(flag) = true; + } + + void interrupt_handler(void) + { + if (ACCESS_ONCE(flag)) + process_message(ACCESS_ONCE(msg)); + } + + (*) The compiler is within its rights to invent stores to a variable, + as in the following example: + + if (a) + b = a; + else + b = 42; + + The compiler might save a branch by optimizing this as follows: + + b = 42; + if (a) + b = a; + + In single-threaded code, this is not only safe, but also saves + a branch. Unfortunately, in concurrent code, this optimization + could cause some other CPU to see a spurious value of 42 -- even + if variable a was never zero -- when loading variable b. + Use ACCESS_ONCE() to prevent this as follows: + + if (a) + ACCESS_ONCE(b) = a; + else + ACCESS_ONCE(b) = 42; + + The compiler can also invent loads. These are usually less + damaging, but they can result in cache-line bouncing and thus in + poor performance and scalability. Use ACCESS_ONCE() to prevent + invented loads. + + (*) For aligned memory locations whose size allows them to be accessed + with a single memory-reference instruction, prevents "load tearing" + and "store tearing," in which a single large access is replaced by + multiple smaller accesses. For example, given an architecture having + 16-bit store instructions with 7-bit immediate fields, the compiler + might be tempted to use two 16-bit store-immediate instructions to + implement the following 32-bit store: + + p = 0x00010002; + + Please note that GCC really does use this sort of optimization, + which is not surprising given that it would likely take more + than two instructions to build the constant and then store it. + This optimization can therefore be a win in single-threaded code. + In fact, a recent bug (since fixed) caused GCC to incorrectly use + this optimization in a volatile store. In the absence of such bugs, + use of ACCESS_ONCE() prevents store tearing: + + ACCESS_ONCE(p) = 0x00010002; + + +Please note that these compiler barriers have no direct effect on the CPU, +which may then reorder things however it wishes. CPU MEMORY BARRIERS ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-12-04 0:48 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-11-21 21:30 [PATCH v2 RFC 0/3] Memory-barrier documentation updates Paul E. McKenney 2013-11-21 21:31 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls to memory-barriers.txt Paul E. McKenney 2013-11-21 21:31 ` [PATCH v2 RFC 2/3] documentation: Add long atomic examples " Paul E. McKenney 2013-11-21 21:55 ` [PATCH v2 RFC 1/3] documentation: Add needed ACCESS_ONCE() calls " Peter Zijlstra 2013-11-21 22:09 ` Paul E. McKenney 2013-11-21 22:18 ` Peter Zijlstra 2013-11-21 22:32 ` Paul E. McKenney 2013-11-22 11:17 ` Peter Zijlstra 2013-11-22 18:54 ` Paul E. McKenney 2013-11-22 15:39 ` Peter Zijlstra 2013-11-22 18:13 ` Paul E. McKenney 2013-11-23 9:04 ` Peter Zijlstra 2013-11-23 17:17 ` Paul E. McKenney 2013-12-04 0:47 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox