* [RFC] tools/memory-model: Rule out OOTA
@ 2025-01-06 21:40 Jonas Oberhauser
  2025-01-07 10:06 ` Peter Zijlstra
  ` (3 more replies)
  0 siblings, 4 replies; 59+ messages in thread
From: Jonas Oberhauser @ 2025-01-06 21:40 UTC (permalink / raw)
To: paulmck
Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
    j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju,
    frederic, linux-kernel, lkmm, hernan.poncedeleon, Jonas Oberhauser

The current LKMM allows out-of-thin-air (OOTA), as evidenced by the following
example shared on this list a few years ago:

	P0(int *a, int *b, int *x, int *y) {
		int r1;

		r1 = READ_ONCE(*x);
		smp_rmb();
		if (r1 == 1) {
			*a = *b;
		}
		smp_wmb();
		WRITE_ONCE(*y, 1);
	}

	P1(int *a, int *b, int *x, int *y) {
		int r1;

		r1 = READ_ONCE(*y);
		smp_rmb();
		if (r1 == 1) {
			*b = *a;
		}
		smp_wmb();
		WRITE_ONCE(*x, 1);
	}

	exists b=42

The root cause is the interplay between plain accesses and the rw-fences
smp_rmb() and smp_wmb(): while these fences provide sufficient ordering for
plain accesses to rule out data races, in the current formalization they do
not generally order the plain accesses themselves, allowing, e.g., the load
from and store to *b to proceed in any order even if P1 reads from P0. In
particular, the marked accesses around those plain accesses are also not
ordered, which is what allows this OOTA.

In this patch, we choose the rather conservative approach of enforcing only
the order of these marked accesses: specifically, when a marked read r is
separated from a plain read r' by an smp_rmb() (or r' has an address
dependency on r, or r' is r itself), a write w' depends on r', and w' is
plain and separated from a subsequent write w by an smp_wmb() (or w' is w
itself), then r precedes w in ppo.

Furthermore, we do not enforce any order in cases where the plain read or
write could be elided, due to a store to the same address either appearing
before the read (which would allow the read to be replaced by the stored
value) or after the write (which would allow the write to be dropped).

Even though this patch is conservative in this sense, it ensures general
OOTA-freedom; more specifically, any execution with no data race will not
have any cycles of

	(ctrl | addr | data) ; rfe

This definition of OOTA is much weaker than more standard definitions, such
as requiring that there are no cycles of

	ctrl | addr | data | rf

Those definitions work well for syntactic dependencies (hardware models) but
not for semantic dependencies (language models, like LKMM).

We first discuss why the more standard definition does not work well for
language models like LKMM. For example, consider

	r1 = *a;
	*b = 1;
	if (*a == 1)
		*b = 1;
	*c = *b;

In the execution where r1 == 1, there is a control dependency from the load
of *a to the second store to *b, from which the load from *b reads, and the
store to *c has a data dependency on this load from *b. Nevertheless there
is no semantic dependency from the load of *a to the store to *c; the
compiler could easily replace the last line with *c = 1 and move this line
to the top as follows:

	*c = 1;
	r1 = *a;
	*b = 1;

Since there is no order imposed by this sequence of syntactic dependencies
and reads, syntactic dependencies cannot by themselves form an acyclic
relation.
In turn, there are some sequences of syntactic dependencies and reads that do
form semantic dependencies, such as

	r1 = *a;
	*b = 2;
	if (*a == 1)
		*b = 1;
	*c = *b;

Here we would consider that the store to *c has a semantic data dependency
on the read from *a, given that depending on the result of that read, we
store either the value 1 or 2 to *c.

Unfortunately, herd7 is currently limited to syntactic dependencies and
cannot distinguish these two programs. As a result, while our patch is
intended to provide ordering for cases resembling the second program (but
not the first), with the dependencies considered by the current version of
herd7, we do not get such an ordering.

There are two more caveats of this patch. The first is that the absence of
subsequent writes to the location of a write w (until the next compiler
barrier) does not imply that w cannot be elided, e.g., because the location
of w does not live until the next barrier (as pointed out by Alan Stern a
while back). Unfortunately we cannot currently express this in herd7's
syntax. In fact, to avoid OOTA, it would be sufficient to provide order only
in cases where w is read from by another thread. But that is a rather
unnatural formulation.

The last caveat is that while we have done a formal proof that this patch
excludes OOTA (in all data-race-free executions), we did so with a different
formalization of compiler barrier (formalized in this patch as w_barrier).
I suspect that it may be possible to almost completely switch over from
w_barrier to the normal definition of barrier, modulo the fact that a marked
write together with po-loc is also a compiler barrier. But I currently do
not have time to investigate this deeply, and I thought maybe there are
already some comments on the main parts of the patch. The epsilons and
deltas should be resolvable.

Have fun,
  jonas

Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
---
 tools/memory-model/linux-kernel.cat | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index d7e7bf13c831..180cab56729e 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -71,6 +71,10 @@ let barrier = fencerel(Barrier | Rmb | Wmb | Mb | Sync-rcu | Sync-srcu |
 		Rcu-lock | Rcu-unlock | Srcu-lock | Srcu-unlock) |
 	(po ; [Release]) | ([Acquire] ; po)
 
+let w_barrier = po? ; [F | Marked] ; po?
+let rmb-plain = [R4rmb] ; po ; [Rmb] ; (po \ (po ; [W] ; (po-loc \ w_barrier))) ; [R4rmb & Plain]
+let plain-wmb = [W & Plain] ; (po \ ((po-loc \ w_barrier) ; po ; [W] ; po)) ; [Wmb] ; po ; [W]
+
 (**********************************)
 (* Fundamental coherence ordering *)
 (**********************************)
@@ -90,7 +94,7 @@ empty rmw & (fre ; coe) as atomic
 let dep = addr | data
 let rwdep = (dep | ctrl) ; [W]
 let overwrite = co | fr
-let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb)
+let to-w = ((addr | rmb-plain)? ; rwdep ; plain-wmb?) | (overwrite & int) | addr ; [Plain] ; wmb
 let to-r = (addr ; [R]) | (dep ; [Marked] ; rfi)
 let ppo = to-r | to-w | (fence & int) | (po-unlock-lock-po & int)
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-06 21:40 [RFC] tools/memory-model: Rule out OOTA Jonas Oberhauser @ 2025-01-07 10:06 ` Peter Zijlstra 2025-01-07 11:02 ` Jonas Oberhauser 2025-01-07 15:46 ` Jonas Oberhauser ` (2 subsequent siblings) 3 siblings, 1 reply; 59+ messages in thread From: Peter Zijlstra @ 2025-01-07 10:06 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, stern, parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > We first discuss why the more standard definition does not work well for > language models like LKMM. For example, consider > > r1 = *a; > *b = 1; > if (*a == 1) if (r1 == 1) ? > *b = 1; > *c = *b; > > In the execution where r1 == 1, there is a control dependency from > the load of *a to the second store to *b, from which the load to *b reads, > and the store to *c has a data dependency on this load from *b. Nevertheless > there is no semantic dependency from the load of *a to the store to *c; the > compiler could easily replace the last line with *c = 1 and move this line to > the top as follows: > > *c = 1; > r1 = *a; > *b = 1; > > Since there is no order imposed by this sequence of syntactic dependencies > and reads, syntactic dependencies can not by themselves form an acyclic > relation. > > In turn, there are some sequences of syntactic dependencies and reads that do > form semantic dependencies, such as > > r1 = *a; > *b = 2; > if (*a == 1) r1 again? > *b = 1; > *c = *b; > > Here we would consider that the store to *c has a semantic data dependency on > the read from *a, given that depending on the result of that read, we store > either the value 1 or 2 to *c. ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-07 10:06 ` Peter Zijlstra @ 2025-01-07 11:02 ` Jonas Oberhauser 0 siblings, 0 replies; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-07 11:02 UTC (permalink / raw) To: Peter Zijlstra Cc: paulmck, stern, parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/7/2025 um 11:06 AM schrieb Peter Zijlstra: > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > >> We first discuss why the more standard definition does not work well for >> language models like LKMM. For example, consider >> >> r1 = *a; >> *b = 1; >> if (*a == 1) > > if (r1 == 1) > > ? > >> *b = 1; >> *c = *b; >> >> In the execution where r1 == 1, there is a control dependency from >> the load of *a to the second store to *b, from which the load to *b reads, >> and the store to *c has a data dependency on this load from *b. Nevertheless >> there is no semantic dependency from the load of *a to the store to *c; the >> compiler could easily replace the last line with *c = 1 and move this line to >> the top as follows: >> >> *c = 1; >> r1 = *a; >> *b = 1; >> >> Since there is no order imposed by this sequence of syntactic dependencies >> and reads, syntactic dependencies can not by themselves form an acyclic >> relation. >> >> In turn, there are some sequences of syntactic dependencies and reads that do >> form semantic dependencies, such as >> >> r1 = *a; >> *b = 2; >> if (*a == 1) > > r1 again? > >> *b = 1; >> *c = *b; >> >> Here we would consider that the store to *c has a semantic data dependency on >> the read from *a, given that depending on the result of that read, we store >> either the value 1 or 2 to *c. Yes on both counts, thanks! jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA
  2025-01-06 21:40 [RFC] tools/memory-model: Rule out OOTA Jonas Oberhauser
  2025-01-07 10:06 ` Peter Zijlstra
@ 2025-01-07 15:46 ` Jonas Oberhauser
  2025-01-07 16:09 ` Alan Stern
  2025-07-23  0:43 ` Paul E. McKenney
  3 siblings, 0 replies; 59+ messages in thread
From: Jonas Oberhauser @ 2025-01-07 15:46 UTC (permalink / raw)
To: paulmck
Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
    j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju,
    frederic, linux-kernel, lkmm, hernan.poncedeleon

Am 1/6/2025 um 10:40 PM schrieb Jonas Oberhauser:
> 
> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
> ---
>  tools/memory-model/linux-kernel.cat | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
> index d7e7bf13c831..180cab56729e 100644
> --- a/tools/memory-model/linux-kernel.cat
> +++ b/tools/memory-model/linux-kernel.cat
> @@ -71,6 +71,10 @@ let barrier = fencerel(Barrier | Rmb | Wmb | Mb | Sync-rcu | Sync-srcu |
>  		Rcu-lock | Rcu-unlock | Srcu-lock | Srcu-unlock) |
>  	(po ; [Release]) | ([Acquire] ; po)
>  
> +let w_barrier = po? ; [F | Marked] ; po?
> +let rmb-plain = [R4rmb] ; po ; [Rmb] ; (po \ (po ; [W] ; (po-loc \ w_barrier))) ; [R4rmb & Plain]
> +let plain-wmb = [W & Plain] ; (po \ ((po-loc \ w_barrier) ; po ; [W] ; po)) ; [Wmb] ; po ; [W]
> +
>  (**********************************)
>  (* Fundamental coherence ordering *)
>  (**********************************)
> @@ -90,7 +94,7 @@ empty rmw & (fre ; coe) as atomic
>  let dep = addr | data
>  let rwdep = (dep | ctrl) ; [W]
>  let overwrite = co | fr
> -let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb)
> +let to-w = ((addr | rmb-plain)? ; rwdep ; plain-wmb?) | (overwrite & int) | addr ; [Plain] ; wmb
>  let to-r = (addr ; [R]) | (dep ; [Marked] ; rfi)
>  let ppo = to-r | to-w | (fence & int) | (po-unlock-lock-po & int)
> 

I will also try to give some intuitive :) :( :) reasoning for why this
patch rules out OOTA.

If we look at a dep ; rfe cycle

	dep ; rfe ; dep ; rfe ; ...

then because of the absence of data races, each rfe is more or less a
w-post-bounded ; rfe ; r-pre-bounded edge. If we rotate the cycle around,
we turn

	dep ; w-post-bounded ; rfe ; r-pre-bounded ; dep ; w-post-bounded ; rfe ; r-pre-bounded ; ...

into

	rfe ; (r-pre-bounded ; dep ; w-post-bounded) ; rfe ; (r-pre-bounded ; dep ; w-post-bounded) ; rfe ; (r-pre-bounded ; ...

and ideally, each of these (r-pre-bounded ; dep ; w-post-bounded) edges
would imply happens-before, since then the cycle would be

	rfe ; hb+ ; rfe ; hb+ ; ...

which is acyclic.

However, we do not get hb+ in general, in particular if the bounding is due
to rmb/wmb. For all other cases, it is relatively easy to see that we get
hb+, e.g., if the bound is due to an smp_mb().

Luckily, in our specific case, we can get hb+ even for cases where rmb/wmb
bound these accesses, because the accesses related by the dep edges are
known to be reading or read-from externally. Such an external interaction
would be impossible if there were another store to the same location
between such an access and the next w_barrier: because of the absence of
data races and the lack of w_barriers that would allow synchronization with
the outside world, the external event could not "occur between" the access
and such a store.
As a result, all pre-bounds caused by a rmb must have the form [R4rmb] ; po ; [Rmb] ; (po \ (po ; [W] ; (po-loc \ w_barrier))) ; [R4rmb & Plain] and similar for post-bounds caused by wmb, which means the corresponding r-pre-bounded ; dep ; w-post-bounded edges must be rmb-plain ; dep ; plain-wmb which is in ppo and thus also hb. Hope that helps clarify the idea... jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-06 21:40 [RFC] tools/memory-model: Rule out OOTA Jonas Oberhauser 2025-01-07 10:06 ` Peter Zijlstra 2025-01-07 15:46 ` Jonas Oberhauser @ 2025-01-07 16:09 ` Alan Stern 2025-01-07 18:47 ` Paul E. McKenney 2025-01-08 17:33 ` Jonas Oberhauser 2025-07-23 0:43 ` Paul E. McKenney 3 siblings, 2 replies; 59+ messages in thread From: Alan Stern @ 2025-01-07 16:09 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > example shared on this list a few years ago: > > P0(int *a, int *b, int *x, int *y) { > int r1; > > r1 = READ_ONCE(*x); > smp_rmb(); > if (r1 == 1) { > *a = *b; > } > smp_wmb(); > WRITE_ONCE(*y, 1); > } > > P1(int *a, int *b, int *x, int *y) { > int r1; > > r1 = READ_ONCE(*y); > smp_rmb(); > if (r1 == 1) { > *b = *a; > } > smp_wmb(); > WRITE_ONCE(*x, 1); > } > > exists b=42 > > The root cause is an interplay between plain accesses and rw-fences, i.e., > smp_rmb() and smp_wmb(): while smp_rmb() and smp_wmb() provide sufficient > ordering for plain accesses to rule out data races, they do not in the current > formalization generally actually order the plain accesses, allowing, e.g., the > load and stores to *b to proceed in any order even if P1 reads from P0; and in > particular, the marked accesses around those plain accesses are also not > ordered, which causes this OOTA. That's right. The memory model deliberately tries to avoid placing restrictions on plain accesses, whenever it can. In the example above, for instance, I think it's more interesting to ask exists 0:r1=1 /\ 1:r1=1 than to concentrate on a and b. OOTA is a very difficult subject. It can be approached only by making the memory model take all sorts of compiler optimizations into account, and doing this for all possible optimizations is not feasible. (For example, in a presentation to the C++ working group last year, Paul and I didn't try to show how to extend the C++ memory model to exclude OOTA [other than by fiat, as it does now]. Instead we argued that with the existing memory model, no reasonable compiler would ever produce an executable that could exhibit OOTA and so the memory model didn't need to be changed.) > In this patch, we choose the rather conservative approach of forcing only the > order of these marked accesses, specifically, when a marked read r is > separated from a plain read r' by an smp_rmb() (or r' has an address > dependency on r or is r'), on which a write w' depends, and w' is either plain > and seperated by a subsequent write w by an smp_wmb() (or w' is w), then r > precedes w in ppo. Is this really valid? In the example above, if there were no other references to a or b in the rest of the program, the compiler could eliminate them entirely. (Whether the result could count as OOTA is open to question, but that's not the point.) Is it not possible that a compiler might find other ways to defeat your intentions? In any case, my feeling is that memory models for higher languages (i.e., anything above the assembler level) should not try very hard to address the question of OOTA. And for LKMM, OOTA involving _plain_ accesses is doubly out of bounds. 
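To make the elimination point above concrete: under the assumption that a
and b are referenced nowhere else in the program (the hypothetical I gave
above), a compiler would be entitled to turn P0 into something like the
following sketch, at which point there is no plain store left for your rule
to order:

	P0(int *a, int *b, int *x, int *y) {
		int r1;

		r1 = READ_ONCE(*x);
		smp_rmb();
		/*
		 * "*a = *b" removed: nothing else ever reads *a, so the
		 * store is dead, and with the store gone the plain load
		 * of *b is dead as well.
		 */
		smp_wmb();
		WRITE_ONCE(*y, 1);
	}
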
Your proposed change seems to add a significant complication to the memory model for a not-very-clear benefit. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-07 16:09 ` Alan Stern @ 2025-01-07 18:47 ` Paul E. McKenney 2025-01-08 17:39 ` Jonas Oberhauser 2025-01-08 17:33 ` Jonas Oberhauser 1 sibling, 1 reply; 59+ messages in thread From: Paul E. McKenney @ 2025-01-07 18:47 UTC (permalink / raw) To: Alan Stern Cc: Jonas Oberhauser, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote: > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > > example shared on this list a few years ago: > > > > P0(int *a, int *b, int *x, int *y) { > > int r1; > > > > r1 = READ_ONCE(*x); > > smp_rmb(); > > if (r1 == 1) { > > *a = *b; > > } > > smp_wmb(); > > WRITE_ONCE(*y, 1); > > } > > > > P1(int *a, int *b, int *x, int *y) { > > int r1; > > > > r1 = READ_ONCE(*y); > > smp_rmb(); > > if (r1 == 1) { > > *b = *a; > > } > > smp_wmb(); > > WRITE_ONCE(*x, 1); > > } > > > > exists b=42 > > > > The root cause is an interplay between plain accesses and rw-fences, i.e., > > smp_rmb() and smp_wmb(): while smp_rmb() and smp_wmb() provide sufficient > > ordering for plain accesses to rule out data races, they do not in the current > > formalization generally actually order the plain accesses, allowing, e.g., the > > load and stores to *b to proceed in any order even if P1 reads from P0; and in > > particular, the marked accesses around those plain accesses are also not > > ordered, which causes this OOTA. > > That's right. The memory model deliberately tries to avoid placing > restrictions on plain accesses, whenever it can. > > In the example above, for instance, I think it's more interesting to ask > > exists 0:r1=1 /\ 1:r1=1 > > than to concentrate on a and b. > > OOTA is a very difficult subject. It can be approached only by making > the memory model take all sorts of compiler optimizations into account, > and doing this for all possible optimizations is not feasible. Mark Batty and his students believe otherwise, but I am content to let them make that argument. As in I agree with you rather than them. At least unless and until they make their argument. ;-) > (For example, in a presentation to the C++ working group last year, Paul > and I didn't try to show how to extend the C++ memory model to exclude > OOTA [other than by fiat, as it does now]. Instead we argued that with > the existing memory model, no reasonable compiler would ever produce an > executable that could exhibit OOTA and so the memory model didn't need > to be changed.) Furthermore, the LKMM design choice was that if a given litmus test was flagged as having a data race, anything might happen, including OOTA. 
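For example (a minimal sketch with an invented name, not one of the tests in
this thread), herd7 flags a data race for something as simple as the
following, and once a test is flagged, LKMM makes no promises about the
outcome:

	C plain-race-sketch

	{}

	P0(int *x)
	{
		*x = 1;			/* plain store */
	}

	P1(int *x, int *y)
	{
		int r0;

		r0 = *x;		/* plain load racing with P0's store */
		WRITE_ONCE(*y, r0);
	}

	exists (1:r0=1)
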
In case there is interest, that presentation may be found here: https://drive.google.com/file/d/1ZeJlUJfH90S2uf2wRvNXQvM4jNVSlZI8/view?usp=sharing The most recent version of the working paper may be found here: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3064r2.pdf Thanx, Paul > > In this patch, we choose the rather conservative approach of forcing only the > > order of these marked accesses, specifically, when a marked read r is > > separated from a plain read r' by an smp_rmb() (or r' has an address > > dependency on r or is r'), on which a write w' depends, and w' is either plain > > and seperated by a subsequent write w by an smp_wmb() (or w' is w), then r > > precedes w in ppo. > > Is this really valid? In the example above, if there were no other > references to a or b in the rest of the program, the compiler could > eliminate them entirely. (Whether the result could count as OOTA is > open to question, but that's not the point.) Is it not possible that a > compiler might find other ways to defeat your intentions? > > In any case, my feeling is that memory models for higher languages > (i.e., anything above the assembler level) should not try very hard to > address the question of OOTA. And for LKMM, OOTA involving _plain_ > accesses is doubly out of bounds. > > Your proposed change seems to add a significant complication to the > memory model for a not-very-clear benefit. > > Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA
  2025-01-07 18:47 ` Paul E. McKenney
@ 2025-01-08 17:39 ` Jonas Oberhauser
  2025-01-08 18:09 ` Paul E. McKenney
  0 siblings, 1 reply; 59+ messages in thread
From: Jonas Oberhauser @ 2025-01-08 17:39 UTC (permalink / raw)
To: paulmck, Alan Stern
Cc: parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave,
    luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic,
    linux-kernel, lkmm, hernan.poncedeleon

Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney:
> On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote:
> 
>> (For example, in a presentation to the C++ working group last year, Paul
>> and I didn't try to show how to extend the C++ memory model to exclude
>> OOTA [other than by fiat, as it does now]. Instead we argued that with
>> the existing memory model, no reasonable compiler would ever produce an
>> executable that could exhibit OOTA and so the memory model didn't need
>> to be changed.)
> 
> Furthermore, the LKMM design choice was that if a given litmus test was
> flagged as having a data race, anything might happen, including OOTA.

Note that there is no data race in this litmus test.
There is a race condition on plain accesses according to LKMM,
but LKMM also says that this is *not* a data race.

The patch removes the (actually non-existent) race condition by saying that
a critical section that is protected from having a data race with address
dependency or rmb/wmb (which LKMM already says works for avoiding data
races) is in fact also ordered, and therefore has no race condition either.

As a side effect :), this happens to fix OOTA in general in LKMM.

Best wishes,
  jonas

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-08 17:39 ` Jonas Oberhauser @ 2025-01-08 18:09 ` Paul E. McKenney 2025-01-08 19:17 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Paul E. McKenney @ 2025-01-08 18:09 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Wed, Jan 08, 2025 at 06:39:12PM +0100, Jonas Oberhauser wrote: > > > Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney: > > On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote: > > > > > (For example, in a presentation to the C++ working group last year, Paul > > > and I didn't try to show how to extend the C++ memory model to exclude > > > OOTA [other than by fiat, as it does now]. Instead we argued that with > > > the existing memory model, no reasonable compiler would ever produce an > > > executable that could exhibit OOTA and so the memory model didn't need > > > to be changed.) > > > > Furthermore, the LKMM design choice was that if a given litmus test was > > flagged as having a data race, anything might happen, including OOTA. > > Note that there is no data race in this litmus test. > There is a race condition on plain accesses according to LKMM, > but LKMM also says that this is *not* a data race. > > The patch removes the (actually non-existant) race condition by saying that > a critical section that is protected from having a data race with address > dependency or rmb/wmb (which LKMM already says works for avoiding data > races), is in fact also ordered and therefore has no race condition either. > > As a side effect :), this happens to fix OOTA in general in LKMM. Fair point, no data race is flagged. On the other hand, Documentation/memory-barriers.txt says the following: ------------------------------------------------------------------------ However, stores are not speculated. This means that ordering -is- provided for load-store control dependencies, as in the following example: q = READ_ONCE(a); if (q) { WRITE_ONCE(b, 1); } Control dependencies pair normally with other types of barriers. That said, please note that neither READ_ONCE() nor WRITE_ONCE() are optional! Without the READ_ONCE(), the compiler might combine the load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), the compiler might combine the store to 'b' with other stores to 'b'. Either can result in highly counterintuitive effects on ordering. 
------------------------------------------------------------------------ If I change the two plain assignments to use WRITE_ONCE() as required by memory-barriers.txt, OOTA is avoided: ------------------------------------------------------------------------ C jonas {} P0(int *a, int *b, int *x, int *y) { int r1; r1 = READ_ONCE(*x); smp_rmb(); if (r1 == 1) { WRITE_ONCE(*a, *b); } smp_wmb(); WRITE_ONCE(*y, 1); } P1(int *a, int *b, int *x, int *y) { int r1; r1 = READ_ONCE(*y); smp_rmb(); if (r1 == 1) { WRITE_ONCE(*b, *a); } smp_wmb(); WRITE_ONCE(*x, 1); } exists b=42 ------------------------------------------------------------------------ $ herd7 -conf linux-kernel.cfg /tmp/jonas.litmus Test jonas Allowed States 1 [b]=0; No Witnesses Positive: 0 Negative: 3 Condition exists ([b]=42) Observation jonas Never 0 3 Time jonas 0.01 Hash=39c0c230bd221b2f54fc88be6771372a ------------------------------------------------------------------------ If LKMM is to allow plain assignments in this case, we need to also update memory-barriers.txt. I am reluctant to do this because the community needs to trust plain C-language assignments less rather than more, especially given that compilers are continuing to become more aggressive. Yes, in your example, the "if" and the two explicit barriers should prevent compilers from being too clever, but these sorts of things are more fragile than one might think given future code changes. Thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-08 18:09 ` Paul E. McKenney @ 2025-01-08 19:17 ` Jonas Oberhauser 2025-01-09 17:54 ` Paul E. McKenney 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-08 19:17 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney: > On Wed, Jan 08, 2025 at 06:39:12PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney: >>> On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote: >>> >>>> (For example, in a presentation to the C++ working group last year, Paul >>>> and I didn't try to show how to extend the C++ memory model to exclude >>>> OOTA [other than by fiat, as it does now]. Instead we argued that with >>>> the existing memory model, no reasonable compiler would ever produce an >>>> executable that could exhibit OOTA and so the memory model didn't need >>>> to be changed.) >>> >>> Furthermore, the LKMM design choice was that if a given litmus test was >>> flagged as having a data race, anything might happen, including OOTA. >> >> Note that there is no data race in this litmus test. >> There is a race condition on plain accesses according to LKMM, >> but LKMM also says that this is *not* a data race. >> >> The patch removes the (actually non-existant) race condition by saying that >> a critical section that is protected from having a data race with address >> dependency or rmb/wmb (which LKMM already says works for avoiding data >> races), is in fact also ordered and therefore has no race condition either. >> >> As a side effect :), this happens to fix OOTA in general in LKMM. > > Fair point, no data race is flagged. > > On the other hand, Documentation/memory-barriers.txt says the following: > > ------------------------------------------------------------------------ > > However, stores are not speculated. This means that ordering -is- provided > for load-store control dependencies, as in the following example: > > q = READ_ONCE(a); > if (q) { > WRITE_ONCE(b, 1); > } > > Control dependencies pair normally with other types of barriers. > That said, please note that neither READ_ONCE() nor WRITE_ONCE() > are optional! Without the READ_ONCE(), the compiler might combine the > load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), > the compiler might combine the store to 'b' with other stores to 'b'. > Either can result in highly counterintuitive effects on ordering. > > ------------------------------------------------------------------------ > > If I change the two plain assignments to use WRITE_ONCE() as required > by memory-barriers.txt, OOTA is avoided: I think this direction of inquiry is a bit misleading. There need not be any speculative store at all: P0(int *a, int *b, int *x, int *y) { int r1; int r2 = 0; r1 = READ_ONCE(*x); smp_rmb(); if (r1 == 1) { r2 = *b; } WRITE_ONCE(*a, r2); smp_wmb(); WRITE_ONCE(*y, 1); } P1(int *a, int *b, int *x, int *y) { int r1; int r2 = 0; r1 = READ_ONCE(*y); smp_rmb(); if (r1 == 1) { r2 = *a; } WRITE_ONCE(*b, r2); smp_wmb(); WRITE_ONCE(*x, 1); } The reason that the WRITE_ONCE helps in the speculative store case is that both its ctrl dependency and the wmb provide ordering, which together creates ordering between *x and *y. 
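Spelled out in LKMM relation terms (roughly -- this is just how I read the
ppo definition, for your modified version with WRITE_ONCE(*a, *b) inside
the "if"):

	READ_ONCE(*x)      ->ctrl  WRITE_ONCE(*a, *b)   (ctrl ; [W] is in to-w, hence ppo)
	WRITE_ONCE(*a, *b) ->wmb   WRITE_ONCE(*y, 1)    (fence & int is in ppo)

so each thread orders its initial READ_ONCE before its final WRITE_ONCE,
and together with the two rfe edges of the 0:r1=1 /\ 1:r1=1 outcome this
would close a cycle in hb, which LKMM forbids.
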
I should point out that a version of herd7 that respects semantic dependencies (instead of syntactic only) might solve this case, by figuring out that the WRITE_ONCE to *a resp. *b depends on the first READ_ONCE. Here's another funny example: P0(int *a, int *b, int *x, int *y) { int r1; r1 = READ_ONCE(*x); smp_rmb(); int r2 = READ_ONCE(*b); if (r1 == 1) { r2 = *b; } WRITE_ONCE(*a, r2); smp_wmb(); WRITE_ONCE(*y, 1); } P1(int *a, int *b, int *x, int *y) { int r1; r1 = READ_ONCE(*y); smp_rmb(); int r2 = READ_ONCE(*a); if (r1 == 1) { r2 = *a; } WRITE_ONCE(*b, r2); smp_wmb(); WRITE_ONCE(*x, 1); } exists (0:r1=1 /\ 1:r1=1) Is there still a semantic dependency from the inner load to the store to *a resp. *b, especially since the outer load from *b resp. *a is reading from the same store as the inner one? The compiler is definitely allowed to eliminate the inner load, which *also removes the OOTA*. Please do look at the OOTA graph generated by herd7 for this one, it looks quite amazing. > If LKMM is to allow plain assignments in this case, we need to also update > memory-barriers.txt. But I am not suggesting to allow the plain assignment *by itself*. In particular, my patch does not enforce any happens-before order between the READ_ONCE(*x) and the plain assignment to *b. It only provides order between READ_ONCE(*x) and WRITE_ONCE(*y,...), through dependencies in the plain critical section. Which must be 1) properly guarded (e.g., by rmb/wmb) and 2) live. Because of this, I don't know if the text needs much updating, although one could add a text in the direction that "in the rare case where compilers do guarantee that a load and dependent store (including plain) will be emitted in some form, one can use rmb and wmb to ensure the order of surrounding marked accesses". > I am reluctant to do this because the community> needs to trust plain C-language assignments less rather than more, > especially given that compilers are continuing to become more aggressive. Yes, I agree. > Yes, in your example, the "if" and the two explicit barriers should > prevent compilers from being too clever, but these sorts of things are > more fragile than one might think given future code changes. > > Thoughts? We certainly need to be very careful about how to formalize what the compiler is allowed of doing and what it is not. And even more careful about how to communicate this. Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-08 19:17 ` Jonas Oberhauser @ 2025-01-09 17:54 ` Paul E. McKenney 2025-01-09 18:35 ` Jonas Oberhauser 2025-01-09 20:37 ` Peter Zijlstra 0 siblings, 2 replies; 59+ messages in thread From: Paul E. McKenney @ 2025-01-09 17:54 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: > > > Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney: > > On Wed, Jan 08, 2025 at 06:39:12PM +0100, Jonas Oberhauser wrote: > > > > > > > > > Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney: > > > > On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote: > > > > > > > > > (For example, in a presentation to the C++ working group last year, Paul > > > > > and I didn't try to show how to extend the C++ memory model to exclude > > > > > OOTA [other than by fiat, as it does now]. Instead we argued that with > > > > > the existing memory model, no reasonable compiler would ever produce an > > > > > executable that could exhibit OOTA and so the memory model didn't need > > > > > to be changed.) > > > > > > > > Furthermore, the LKMM design choice was that if a given litmus test was > > > > flagged as having a data race, anything might happen, including OOTA. > > > > > > Note that there is no data race in this litmus test. > > > There is a race condition on plain accesses according to LKMM, > > > but LKMM also says that this is *not* a data race. > > > > > > The patch removes the (actually non-existant) race condition by saying that > > > a critical section that is protected from having a data race with address > > > dependency or rmb/wmb (which LKMM already says works for avoiding data > > > races), is in fact also ordered and therefore has no race condition either. > > > > > > As a side effect :), this happens to fix OOTA in general in LKMM. > > > > Fair point, no data race is flagged. > > > > On the other hand, Documentation/memory-barriers.txt says the following: > > > > ------------------------------------------------------------------------ > > > > However, stores are not speculated. This means that ordering -is- provided > > for load-store control dependencies, as in the following example: > > > > q = READ_ONCE(a); > > if (q) { > > WRITE_ONCE(b, 1); > > } > > > > Control dependencies pair normally with other types of barriers. > > That said, please note that neither READ_ONCE() nor WRITE_ONCE() > > are optional! Without the READ_ONCE(), the compiler might combine the > > load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), > > the compiler might combine the store to 'b' with other stores to 'b'. > > Either can result in highly counterintuitive effects on ordering. > > > > ------------------------------------------------------------------------ > > > > If I change the two plain assignments to use WRITE_ONCE() as required > > by memory-barriers.txt, OOTA is avoided: > > > I think this direction of inquiry is a bit misleading. 
There need not be any > speculative store at all: > > > > P0(int *a, int *b, int *x, int *y) { > int r1; > int r2 = 0; > r1 = READ_ONCE(*x); > smp_rmb(); > if (r1 == 1) { > r2 = *b; > } > WRITE_ONCE(*a, r2); > smp_wmb(); > WRITE_ONCE(*y, 1); > } > > P1(int *a, int *b, int *x, int *y) { > int r1; > > int r2 = 0; > > r1 = READ_ONCE(*y); > smp_rmb(); > if (r1 == 1) { > r2 = *a; > } > WRITE_ONCE(*b, r2); > smp_wmb(); > WRITE_ONCE(*x, 1); > } > > > The reason that the WRITE_ONCE helps in the speculative store case is that > both its ctrl dependency and the wmb provide ordering, which together > creates ordering between *x and *y. Ah, and that is because LKMM does not enforce control dependencies past the end of the "if" statement. Cute! But memory-barriers.txt requires that the WRITE_ONCE() be within the "if" statement for control dependencies to exist, so LKMM is in agreement with memory-barriers.txt in this case. So again, if we change this, we need to also change memory-barriers.txt. > I should point out that a version of herd7 that respects semantic > dependencies (instead of syntactic only) might solve this case, by figuring > out that the WRITE_ONCE to *a resp. *b depends on the first READ_ONCE. > > Here's another funny example: > > > P0(int *a, int *b, int *x, int *y) { > int r1; > > r1 = READ_ONCE(*x); > smp_rmb(); > int r2 = READ_ONCE(*b); > if (r1 == 1) { > r2 = *b; > } > WRITE_ONCE(*a, r2); > smp_wmb(); > WRITE_ONCE(*y, 1); > } > > P1(int *a, int *b, int *x, int *y) { > int r1; > > r1 = READ_ONCE(*y); > smp_rmb(); > int r2 = READ_ONCE(*a); > if (r1 == 1) { > r2 = *a; > } > WRITE_ONCE(*b, r2); > smp_wmb(); > WRITE_ONCE(*x, 1); > } > > exists (0:r1=1 /\ 1:r1=1) > > Is there still a semantic dependency from the inner load to the store to *a > resp. *b, especially since the outer load from *b resp. *a is reading from > the same store as the inner one? The compiler is definitely allowed to > eliminate the inner load, which *also removes the OOTA*. Also cute. And also the WRITE_ONCE() outside of the "if" statement. > Please do look at the OOTA graph generated by herd7 for this one, it looks > quite amazing. Given the way this morning is going, I must take your word for it... > > If LKMM is to allow plain assignments in this case, we need to also update > > memory-barriers.txt. > > But I am not suggesting to allow the plain assignment *by itself*. > In particular, my patch does not enforce any happens-before order between > the READ_ONCE(*x) and the plain assignment to *b. > It only provides order between READ_ONCE(*x) and WRITE_ONCE(*y,...), through > dependencies in the plain critical section. > > Which must be 1) properly guarded (e.g., by rmb/wmb) and 2) live. > > Because of this, I don't know if the text needs much updating, although one > could add a text in the direction that "in the rare case where compilers do > guarantee that a load and dependent store (including plain) will be emitted > in some form, one can use rmb and wmb to ensure the order of surrounding > marked accesses". If we want to respect something containing a control dependency to a WRITE_ONCE() not in the body of the "if" statement, we need to make some change to memory-barriers.txt. > > I am reluctant to do this because the community> needs to trust plain > C-language assignments less rather than more, > > especially given that compilers are continuing to become more aggressive. > > Yes, I agree. Whew!!! 
;-) > > Yes, in your example, the "if" and the two explicit barriers should > > prevent compilers from being too clever, but these sorts of things are > > more fragile than one might think given future code changes. > > > > Thoughts? > > We certainly need to be very careful about how to formalize what the > compiler is allowed of doing and what it is not. And even more careful about > how to communicate this. No argument here! Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 17:54 ` Paul E. McKenney @ 2025-01-09 18:35 ` Jonas Oberhauser 2025-01-10 14:54 ` Paul E. McKenney 2025-01-09 20:37 ` Peter Zijlstra 1 sibling, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-09 18:35 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: > On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney: >>> On Wed, Jan 08, 2025 at 06:39:12PM +0100, Jonas Oberhauser wrote: >>>> >>>> >>>> Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney: >>>>> On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote: >>>>> >>>> The patch removes the (actually non-existant) race condition by saying that >>>> a critical section that is protected from having a data race with address >>>> dependency or rmb/wmb (which LKMM already says works for avoiding data >>>> races), is in fact also ordered and therefore has no race condition either. >>>> >>>> As a side effect :), this happens to fix OOTA in general in LKMM. >>> >>> Fair point, no data race is flagged. >>> >>> On the other hand, Documentation/memory-barriers.txt says the following: >>> >>> ------------------------------------------------------------------------ >>> >>> However, stores are not speculated. This means that ordering -is- provided >>> for load-store control dependencies, as in the following example: >>> >>> q = READ_ONCE(a); >>> if (q) { >>> WRITE_ONCE(b, 1); >>> } >>> >>> Control dependencies pair normally with other types of barriers. >>> That said, please note that neither READ_ONCE() nor WRITE_ONCE() >>> are optional! Without the READ_ONCE(), the compiler might combine the >>> load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), >>> the compiler might combine the store to 'b' with other stores to 'b'. >>> Either can result in highly counterintuitive effects on ordering. >>> >>> ------------------------------------------------------------------------ >>> >>> If I change the two plain assignments to use WRITE_ONCE() as required >>> by memory-barriers.txt, OOTA is avoided: >> >> >> I think this direction of inquiry is a bit misleading. There need not be any >> speculative store at all: >> >> >> >> P0(int *a, int *b, int *x, int *y) { >> int r1; >> int r2 = 0; >> r1 = READ_ONCE(*x); >> smp_rmb(); >> if (r1 == 1) { >> r2 = *b; >> } >> WRITE_ONCE(*a, r2); >> smp_wmb(); >> WRITE_ONCE(*y, 1); >> } >> >> P1(int *a, int *b, int *x, int *y) { >> int r1; >> >> int r2 = 0; >> >> r1 = READ_ONCE(*y); >> smp_rmb(); >> if (r1 == 1) { >> r2 = *a; >> } >> WRITE_ONCE(*b, r2); >> smp_wmb(); >> WRITE_ONCE(*x, 1); >> } >> >> >> The reason that the WRITE_ONCE helps in the speculative store case is that >> both its ctrl dependency and the wmb provide ordering, which together >> creates ordering between *x and *y. > > Ah, and that is because LKMM does not enforce control dependencies past > the end of the "if" statement. Cute! > > But memory-barriers.txt requires that the WRITE_ONCE() be within the > "if" statement for control dependencies to exist, so LKMM is in agreement > with memory-barriers.txt in this case. So again, if we change this, > we need to also change memory-barriers.txt. > [...] 
> If we want to respect something containing a control dependency to a
> WRITE_ONCE() not in the body of the "if" statement, we need to make some
> change to memory-barriers.txt.

I'm not sure what you denote by *this* in "if we change this", but just to
clarify, I am not thinking of claiming that there is a (semantic) control
dependency to WRITE_ONCE(*b, r2) in this example.

There is however a data dependency from r2 = *a to the WRITE_ONCE, and I
would say that there is a semantic data (not control) dependency from
r1 = READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read
from *y, the value stored to *b will be different. The latter would be
enough to avoid OOTA according to the mainline LKMM, but currently this
semantic dependency is not detected by herd7.

I currently cannot come up with an example where there would be a
(semantic) control dependency from a load to a store that is not in the arm
of an if statement (or a loop / switch of some form with the branch
depending on the load).

I think the control dependency is just a red herring. It is only there to
avoid the data race.

In a hypothetical LKMM where reading in a race is not a data race unless
the data is used (*1), this would also work:

	unsigned int r1;
	unsigned int r2 = 0;
	r1 = READ_ONCE(*x);
	smp_rmb();
	r2 = *b;
	WRITE_ONCE(*a, (~r1 + 1) & r2);
	smp_wmb();
	WRITE_ONCE(*y, 1);

Here in case r1 == 0, the value of r2 is not used, so there is a race but
there would not be a data race in the hypothetical LKMM.

This example would also have OOTA under such a hypothetical LKMM, but not
with my patch, because in the case where r1 == 1, READ_ONCE(*x) is
separated by the rmb from the load from *b, upon which the store to *a
depends, which itself is separated by the wmb from the WRITE_ONCE(*y,1),
and this would ensure that READ_ONCE(*x) and WRITE_ONCE(*y,1) can no longer
be reordered with each other.

(*1= such a definition is not absurd! One needs to allow such races to make
sequence locks and other similar data structures well-defined.)

I currently don't know another way than the if-statement to avoid the data
race in the program (*2) in the current LKMM, so that's why I rely on it,
but at least conceptually it is orthogonal to the problem.

(*2= we can avoid the data race flag in herd by using filter, and only
generating the graphs where r1==1 and there is no data race. But that is
cheating -- the program is not valid under mainline LKMM.)

>> Please do look at the OOTA graph generated by herd7 for this one, it looks
>> quite amazing.
> 
> Given the way this morning is going, I must take your word for it...

That sounds awful :(
Technical issues?

With any luck, you can test it on arm's herd7 web interface at
https://developer.arm.com/herd7 (just don't be like me and type all the
code first and then change the drop-down selector to Linux - that will
reset the code window...)

Best wishes,
  jonas

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 18:35 ` Jonas Oberhauser @ 2025-01-10 14:54 ` Paul E. McKenney 2025-01-10 16:21 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Paul E. McKenney @ 2025-01-10 14:54 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: > Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: > > On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: > > > > > > > > > Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney: > > > > On Wed, Jan 08, 2025 at 06:39:12PM +0100, Jonas Oberhauser wrote: > > > > > > > > > > > > > > > Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney: > > > > > > On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote: > > > > > > > > > > > The patch removes the (actually non-existant) race condition by saying that > > > > > a critical section that is protected from having a data race with address > > > > > dependency or rmb/wmb (which LKMM already says works for avoiding data > > > > > races), is in fact also ordered and therefore has no race condition either. > > > > > > > > > > As a side effect :), this happens to fix OOTA in general in LKMM. > > > > > > > > Fair point, no data race is flagged. > > > > > > > > On the other hand, Documentation/memory-barriers.txt says the following: > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > However, stores are not speculated. This means that ordering -is- provided > > > > for load-store control dependencies, as in the following example: > > > > > > > > q = READ_ONCE(a); > > > > if (q) { > > > > WRITE_ONCE(b, 1); > > > > } > > > > > > > > Control dependencies pair normally with other types of barriers. > > > > That said, please note that neither READ_ONCE() nor WRITE_ONCE() > > > > are optional! Without the READ_ONCE(), the compiler might combine the > > > > load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), > > > > the compiler might combine the store to 'b' with other stores to 'b'. > > > > Either can result in highly counterintuitive effects on ordering. > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > If I change the two plain assignments to use WRITE_ONCE() as required > > > > by memory-barriers.txt, OOTA is avoided: > > > > > > > > > I think this direction of inquiry is a bit misleading. There need not be any > > > speculative store at all: > > > > > > > > > > > > P0(int *a, int *b, int *x, int *y) { > > > int r1; > > > int r2 = 0; > > > r1 = READ_ONCE(*x); > > > smp_rmb(); > > > if (r1 == 1) { > > > r2 = *b; > > > } > > > WRITE_ONCE(*a, r2); > > > smp_wmb(); > > > WRITE_ONCE(*y, 1); > > > } > > > > > > P1(int *a, int *b, int *x, int *y) { > > > int r1; > > > > > > int r2 = 0; > > > > > > r1 = READ_ONCE(*y); > > > smp_rmb(); > > > if (r1 == 1) { > > > r2 = *a; > > > } > > > WRITE_ONCE(*b, r2); > > > smp_wmb(); > > > WRITE_ONCE(*x, 1); > > > } > > > > > > > > > The reason that the WRITE_ONCE helps in the speculative store case is that > > > both its ctrl dependency and the wmb provide ordering, which together > > > creates ordering between *x and *y. > > > > Ah, and that is because LKMM does not enforce control dependencies past > > the end of the "if" statement. Cute! 
> > > > But memory-barriers.txt requires that the WRITE_ONCE() be within the > > "if" statement for control dependencies to exist, so LKMM is in agreement > > with memory-barriers.txt in this case. So again, if we change this, > > we need to also change memory-barriers.txt. > > [...] > > If we want to respect something containing a control dependency to a > > WRITE_ONCE() not in the body of the "if" statement, we need to make some > > change to memory-barriers.txt. > > I'm not sure what you denotate by *this* in "if we change this", but just to > clarify, I am not thinking of claiming that there were a (semantic) control > dependency to WRITE_ONCE(*b, r2) in this example. > > There is however a data dependency from r2 = *a to WRITE_ONCE, and I would > say that there is a semantic data (not control) dependency from r1 = > READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from > *y, the value stored to *b will be different. The latter would be enough to > avoid OOTA according to the mainline LKMM, but currently this semantic > dependency is not detected by herd7. According to LKMM, address and data dependencies must be headed by rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst. Therefore, there is nothing to chain the control dependency with. > I currently can not come up with an example where there would be a > (semantic) control dependency from a load to a store that is not in the arm > of an if statement (or a loop / switch of some form with the branch > depending on the load). > > I think the control dependency is just a red herring. It is only there to > avoid the data race. Well, that red herring needs to have a companion fish to swim with in order to enforce ordering, and I am not seeing that companion. Or am I (yet again!) missing something subtle here? > In a hypothetical LKMM where reading in a race is not a data race unless the > data is used (*1), this would also work: You lost me on the "(*1)", which might mean that I am misunderstanding your text and examples below. > unsigned int r1; > unsigned int r2 = 0; > r1 = READ_ONCE(*x); > smp_rmb(); > r2 = *b; This load from *b does not head any sort of dependency per LKMM, as noted in rcu_dereference.rst. As that document states, there are too many games that compilers are permitted to play with plain C-language loads. > WRITE_ONCE(*a, (~r1 + 1) & r2); > smp_wmb(); > WRITE_ONCE(*y, 1); > > > Here in case r1 == 0, the value of r2 is not used, so there is a race but > there would not be data race in the hypothetical LKMM. That plain C-language load from b, if concurrent with any sort of store to b, really is a data race. Sure, a compiler that can prove that r1==0 at the WRITE_ONCE() to a might optimize that load away, but the C-language definition of data race still applies. Ah, I finally see that (*1) is a footnote. > This example would also have OOTA under such a hypothetical LKMM, but not > with my patch, because in the case where r1 == 1, > READ_ONCE(*x) is seperated by rmb from the load from *b, > upon which the store to *a depends, > which itself is seperated by a wmb from the store to WRITE_ONCE(*y,1) > and this would ensure that READ_ONCE(*x) and WRITE_ONCE(*y,1) can not be > reordered with each other anymore. > > > (*1= such a definition is not absurd! One needs to allow such races to make > sequence locks and other similar datastructures well-defined.) I am currently not at all comfortable with the thought of allowing plain C-language loads to head any sort of dependency. 
I really did put
that restriction into both memory-barriers.txt and rcu_dereference.rst
intentionally.  There is the old saying "Discipline = freedom", and
therefore compilers' lack of discipline surrounding plain C-language
loads implies a lack of freedom.  ;-)

> I currently don't know another way than the if-statement to avoid the data
> race in the program(*2) in the current LKMM, so that's why I rely on it, but
> at least conceptually it is orthogonal to the problem.
> 
> (*2=we can avoid the data race flag in herd by using filter, and only
> generating the graphs where r1==1 and there is no data race. But that is
> cheating -- the program is not valid under mainline LKMM.)

Such cheats can be valid in cases where that is how you tell herd7 about
some restriction that it cannot be told otherwise, but in this case, I
agree that this cheat is unhelpful.

> > > Please do look at the OOTA graph generated by herd7 for this one, it looks
> > > quite amazing.
> > 
> > Given the way this morning is going, I must take your word for it...
> 
> That sounds awful :(
> Technical issues?

Nothing awful, just catching up from tracking down a Linux-kernel RCU bug
that fought well [1] and from the holidays.

> With any luck, you can test it on arm's herd7 web interface at
> https://developer.arm.com/herd7 (just don't be like me and type all the code
> first and then change the drop-down selector to Linux - that will reset the
> code window...)

Ah, I have been running them locally, and didn't have time to chase down
the herd7 arguments.

Which reminds me...  We should decide which of these examples should be
added to the github litmus archive, perhaps to illustrate the fact that
plain C-language loads do not head dependency chains.  Thoughts?

							Thanx, Paul

[1] https://people.kernel.org/paulmck/hunting-a-tree03-heisenbug

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-10 14:54 ` Paul E. McKenney @ 2025-01-10 16:21 ` Jonas Oberhauser 2025-01-13 22:04 ` Paul E. McKenney 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-10 16:21 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney: > On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: >> Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: >>> On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: >>>> >>>> >>>> Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney: >>>>> If I change the two plain assignments to use WRITE_ONCE() as required >>>>> by memory-barriers.txt, OOTA is avoided: >>>> >>>> >>>> I think this direction of inquiry is a bit misleading. There need not be any >>>> speculative store at all: >>>> >>>> >>>> >>>> P0(int *a, int *b, int *x, int *y) { >>>> int r1; >>>> int r2 = 0; >>>> r1 = READ_ONCE(*x); >>>> smp_rmb(); >>>> if (r1 == 1) { >>>> r2 = *b; >>>> } >>>> WRITE_ONCE(*a, r2); >>>> smp_wmb(); >>>> WRITE_ONCE(*y, 1); >>>> } >>>> >>>> P1(int *a, int *b, int *x, int *y) { >>>> int r1; >>>> >>>> int r2 = 0; >>>> >>>> r1 = READ_ONCE(*y); >>>> smp_rmb(); >>>> if (r1 == 1) { >>>> r2 = *a; >>>> } >>>> WRITE_ONCE(*b, r2); >>>> smp_wmb(); >>>> WRITE_ONCE(*x, 1); >>>> } >>>> >>>> >>> If we want to respect something containing a control dependency to a >>> WRITE_ONCE() not in the body of the "if" statement, we need to make some >>> change to memory-barriers.txt. >> >> I'm not sure what you denotate by *this* in "if we change this", but just to >> clarify, I am not thinking of claiming that there were a (semantic) control >> dependency to WRITE_ONCE(*b, r2) in this example. >> >> There is however a data dependency from r2 = *a to WRITE_ONCE, and I would >> say that there is a semantic data (not control) dependency from r1 = >> READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from >> *y, the value stored to *b will be different. The latter would be enough to >> avoid OOTA according to the mainline LKMM, but currently this semantic >> dependency is not detected by herd7. > > According to LKMM, address and data dependencies must be headed by > rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst. > > Therefore, there is nothing to chain the control dependency with. Note that herd7 does generate dependencies. And speaking informally, there clearly is a semantic dependency. Both the original formalization of LKMM and my patch do say that a plain load at the head of a dependency chain does not provide any dependency ordering, i.e., [Plain & R] ; dep is never part of hb, both in LKMM and in my patch. By the way, if your concern is the dependency *starting* from the plain load, then we can look at examples where the dependency starts from a marked load: r1 = READ_ONCE(*x); smp_rmb(); if (r1 == 1) { r2 = READ_ONCE(*a); } *b = 1; smp_wmb(); WRITE_ONCE(*y,1); This is more or less analogous to the case of the addr ; [Plain] ; wmb case you already have. >> I currently can not come up with an example where there would be a >> (semantic) control dependency from a load to a store that is not in the arm >> of an if statement (or a loop / switch of some form with the branch >> depending on the load). >> >> I think the control dependency is just a red herring. 
It is only there to >> avoid the data race. > > Well, that red herring needs to have a companion fish to swim with in > order to enforce ordering, and I am not seeing that companion. > > Or am I (yet again!) missing something subtle here? It makes more sense to think about how people do message passing (or seqlock), which might look something like this: [READ_ONCE] rmb [plain read] and [plain write] wmb [WRITE_ONCE] Clearly LKMM says that there is some sort of order (not quite happens-before order though) between the READ_ONCE and the plain read, and between the plain write and the WRITE_ONCE. This order is clearly defined in the data race definition, in r-pre-bounded and w-post-bounded. Now consider [READ_ONCE] rmb [plain read] // some code that creates order between the plain accesses [plain write] wmb [WRITE_ONCE] where for some specific reason we can discern that the compiler can not fully eliminate/move across the barrier either this specific plain read, nor the plain write, nor the ordering between the two. In this case, is there order between the READ_ONCE and the WRITE_ONCE, or not? Of course, we know current LKMM says no. I would say that in those very specific cases, we do have ordering. >> In a hypothetical LKMM where reading in a race is not a data race unless the >> data is used (*1), this would also work: > > You lost me on the "(*1)", which might mean that I am misunderstanding > your text and examples below. This was meant to be a footnote :D >> unsigned int r1; >> unsigned int r2 = 0; >> r1 = READ_ONCE(*x); >> smp_rmb(); >> r2 = *b; > > This load from *b does not head any sort of dependency per LKMM, as noted > in rcu_dereference.rst. As that document states, there are too many games > that compilers are permitted to play with plain C-language loads. > >> WRITE_ONCE(*a, (~r1 + 1) & r2); >> smp_wmb(); >> WRITE_ONCE(*y, 1); >> >> >> Here in case r1 == 0, the value of r2 is not used, so there is a race but >> there would not be data race in the hypothetical LKMM. > > That plain C-language load from b, if concurrent with any sort of store to > b, really is a data race. Sure, a compiler that can prove that r1==0 at > the WRITE_ONCE() to a might optimize that load away, but the C-language > definition of data race still applies. It is a data race according to C, but so are all races on WRITE_ONCE and READ_ONCE, so we already do not actually care what C says. What we care about is what the compiler says (and does). The reality is that no matter what kind of crazy optimizations the compiler does to the load and to the concurrent store, all that would happen is that the load "returns" some insane value. But that insane value is not used by the remainder of the computation. I think the right way to think about it is that a race between a read and a write gives the read an indeterminate value, and a race between two writes produces undefined behavior. I vaguely recall that this is even guaranteed by LLVM. That is why sequence locks work, after all. In our internal memory model we have relaxed the definition accordingly and there are a bunch of internally used datastructures that can only be verified because of the relaxation. > > I am currently not at all comfortable with the thought of allowing > plain C-language loads to head any sort of dependency. I really did put > that restriction into both memory-barriers.txt and rcu_dereference.rst > intentionally. 
There is the old saying "Discipline = freedom", and > therefore compilers' lack of discipline surrounding plain C-language > loads implies a lack of freedom. ;-) Yes, I understand your concern (or more generally, the concern of letting plain accesses play a role in ordering). Obviously, allowing arbitrary plain loads to invoke some kind of ordering because of a dependency is plain (heh) wrong. There are two kinds of potential problems: - the load or its dependent store may not exist in that location at all - the dependency may not really exist The second case is a problem also with marked accesses, and should be handled by herd7 only giving us actual semantic dependencies (whatever those are). It can not be solved in cat. Either way it is a limitation that herd7 (and also other tools) currently has and we already live with. So the new problem we deal with is to somehow restrict the rule to loads and dependent stores that the compiler for whatever reason will not fully eliminate. This problem too can not be solved completely inside cat. We can give an approximation, as discussed with Alan (stating that a store would not be elided if it is read by another thread, and a read would not be elided if it reads from another thread and a store that won't be elided depends on it). This approximation is also limited, e.g., if the addresses of the plain loads and stores have not yet escaped the function, but at least this scenario is currently impossible in herd7 (unlike the fake dependency scenario). In my mind it would again be better to offload the correct selection of "compiler-un(re)movable plain loads and stores" to the tools. That may again not solve the problem fully, but it at least would mean that any changes to address the imprecision wouldn't need to go through the kernel tree, and IMHO it is easier to say LKMM in the cat files is the model, and the interpretation of the model has some limitations. > We should decide which of these examples should be > added to the github litmus archive, perhaps to illustrate the fact that > plain C-language loads do not head dependency chains. Thoughts? I'm not sure that is a good idea, given that running the herd7 tool on the litmus test will clearly show a dependency chain headed by plain loads in the visual graphs (with `doshow rwdep`). Maybe it makes more sense to say in the docs that they may head syntactic or semantic dependency chains, but because of the common case that the compiler may cruelly optimize things, LKMM does not guarantee ordering based on the dependency chains headed by plain loads. That would be consistent with the tooling. > [1] https://people.kernel.org/paulmck/hunting-a-tree03-heisenbug Fun. Thanks :) Duplication and Devious both start with D. Best wishes jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
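To make the branchless variant above concrete, here is one way to complete it into a full litmus test. Only the first thread appears in the message, so the second thread, the test name, and the exists clause are a symmetric reconstruction rather than a verified test. The arithmetic trick is that (~r1 + 1) & r2 evaluates to r2 when r1 == 1 and to 0 when r1 == 0, so the value obtained by the racy plain read only flows onward when the smp_rmb()/smp_wmb() pairing has already ordered the plain accesses.

	C oota-branchless

	{}

	P0(int *x, int *y, int *a, int *b)
	{
		unsigned int r1;
		unsigned int r2 = 0;

		r1 = READ_ONCE(*x);
		smp_rmb();
		r2 = *b;			/* racy plain read */
		WRITE_ONCE(*a, (~r1 + 1) & r2);	/* r2 flows into *a only if r1 == 1 */
		smp_wmb();
		WRITE_ONCE(*y, 1);
	}

	P1(int *x, int *y, int *a, int *b)
	{
		unsigned int r1;
		unsigned int r2 = 0;

		r1 = READ_ONCE(*y);
		smp_rmb();
		r2 = *a;
		WRITE_ONCE(*b, (~r1 + 1) & r2);
		smp_wmb();
		WRITE_ONCE(*x, 1);
	}

	exists (a=42 /\ b=42)

As in the earlier examples, the exists clause asks for a value that no thread ever computes, so it should only be reachable by an out-of-thin-air execution. Mainline LKMM still flags the race between each plain read and the other thread's WRITE_ONCE() in the executions where r1 == 0, which is exactly what the hypothetical relaxation above is about.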
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-10 16:21 ` Jonas Oberhauser @ 2025-01-13 22:04 ` Paul E. McKenney 2025-01-16 18:40 ` Paul E. McKenney 2025-01-16 19:08 ` Jonas Oberhauser 0 siblings, 2 replies; 59+ messages in thread From: Paul E. McKenney @ 2025-01-13 22:04 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Fri, Jan 10, 2025 at 05:21:59PM +0100, Jonas Oberhauser wrote: > > > Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney: > > On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: > > > Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: > > > > On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: > > > > > > > > > > > > > > > Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney: > > > > > > If I change the two plain assignments to use WRITE_ONCE() as required > > > > > > by memory-barriers.txt, OOTA is avoided: > > > > > > > > > > > > > > > I think this direction of inquiry is a bit misleading. There need not be any > > > > > speculative store at all: > > > > > > > > > > > > > > > > > > > > P0(int *a, int *b, int *x, int *y) { > > > > > int r1; > > > > > int r2 = 0; > > > > > r1 = READ_ONCE(*x); > > > > > smp_rmb(); > > > > > if (r1 == 1) { > > > > > r2 = *b; > > > > > } > > > > > WRITE_ONCE(*a, r2); > > > > > smp_wmb(); > > > > > WRITE_ONCE(*y, 1); > > > > > } > > > > > > > > > > P1(int *a, int *b, int *x, int *y) { > > > > > int r1; > > > > > > > > > > int r2 = 0; > > > > > > > > > > r1 = READ_ONCE(*y); > > > > > smp_rmb(); > > > > > if (r1 == 1) { > > > > > r2 = *a; > > > > > } > > > > > WRITE_ONCE(*b, r2); > > > > > smp_wmb(); > > > > > WRITE_ONCE(*x, 1); > > > > > } > > > > > > > > > > > > > > If we want to respect something containing a control dependency to a > > > > WRITE_ONCE() not in the body of the "if" statement, we need to make some > > > > change to memory-barriers.txt. > > > > > > I'm not sure what you denotate by *this* in "if we change this", but just to > > > clarify, I am not thinking of claiming that there were a (semantic) control > > > dependency to WRITE_ONCE(*b, r2) in this example. > > > > > > There is however a data dependency from r2 = *a to WRITE_ONCE, and I would > > > say that there is a semantic data (not control) dependency from r1 = > > > READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from > > > *y, the value stored to *b will be different. The latter would be enough to > > > avoid OOTA according to the mainline LKMM, but currently this semantic > > > dependency is not detected by herd7. > > > > According to LKMM, address and data dependencies must be headed by > > rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst. > > > > Therefore, there is nothing to chain the control dependency with. > > Note that herd7 does generate dependencies. And speaking informally, there > clearly is a semantic dependency. > > Both the original formalization of LKMM and my patch do say that a plain > load at the head of a dependency chain does not provide any dependency > ordering, i.e., > > [Plain & R] ; dep > > is never part of hb, both in LKMM and in my patch. Agreed, LKMM does filter the underlying herd7 dependencies. 
> By the way, if your concern is the dependency *starting* from the plain > load, then we can look at examples where the dependency starts from a marked > load: > > > r1 = READ_ONCE(*x); > smp_rmb(); > if (r1 == 1) { > r2 = READ_ONCE(*a); > } > *b = 1; > smp_wmb(); > WRITE_ONCE(*y,1); > > This is more or less analogous to the case of the addr ; [Plain] ; wmb case > you already have. This is probably a failure of imagination on my part, but I am not seeing how to create another thread that interacts with that store to "b" without resulting in a data race. Ignoring that, I am not seeing much in the way of LKMM dependencies in that code. > > > I currently can not come up with an example where there would be a > > > (semantic) control dependency from a load to a store that is not in the arm > > > of an if statement (or a loop / switch of some form with the branch > > > depending on the load). > > > > > > I think the control dependency is just a red herring. It is only there to > > > avoid the data race. > > > > Well, that red herring needs to have a companion fish to swim with in > > order to enforce ordering, and I am not seeing that companion. > > > > Or am I (yet again!) missing something subtle here? > > It makes more sense to think about how people do message passing (or > seqlock), which might look something like this: > > [READ_ONCE] > rmb > [plain read] > > and > > [plain write] > wmb > [WRITE_ONCE] > > > Clearly LKMM says that there is some sort of order (not quite happens-before > order though) between the READ_ONCE and the plain read, and between the > plain write and the WRITE_ONCE. This order is clearly defined in the data > race definition, in r-pre-bounded and w-post-bounded. > > Now consider > > [READ_ONCE] > rmb > [plain read] > // some code that creates order between the plain accesses > [plain write] > wmb > [WRITE_ONCE] > > where for some specific reason we can discern that the compiler can not > fully eliminate/move across the barrier either this specific plain read, nor > the plain write, nor the ordering between the two. > > In this case, is there order between the READ_ONCE and the WRITE_ONCE, or > not? Of course, we know current LKMM says no. I would say that in those very > specific cases, we do have ordering. Agreed, for LKMM to deal with seqlock, the read-side critical section would need to use READ_ONCE(), which is a bit unnatural. The C++ standards committee has been discussing this for some time, as that memory model also gives data race in that case. But it might be better to directly model seqlock than to try to make LKMM deal with the underlying atomic operations. > > > In a hypothetical LKMM where reading in a race is not a data race unless the > > > data is used (*1), this would also work: > > > > You lost me on the "(*1)", which might mean that I am misunderstanding > > your text and examples below. > > This was meant to be a footnote :D My first thought was indirecting through a value-1 pointer without the needed casts. ;-) > > > unsigned int r1; > > > unsigned int r2 = 0; > > > r1 = READ_ONCE(*x); > > > smp_rmb(); > > > r2 = *b; > > > > This load from *b does not head any sort of dependency per LKMM, as noted > > in rcu_dereference.rst. As that document states, there are too many games > > that compilers are permitted to play with plain C-language loads. 
> > > > > WRITE_ONCE(*a, (~r1 + 1) & r2); > > > smp_wmb(); > > > WRITE_ONCE(*y, 1); > > > > > > > > > Here in case r1 == 0, the value of r2 is not used, so there is a race but > > > there would not be data race in the hypothetical LKMM. > > > > That plain C-language load from b, if concurrent with any sort of store to > > b, really is a data race. Sure, a compiler that can prove that r1==0 at > > the WRITE_ONCE() to a might optimize that load away, but the C-language > > definition of data race still applies. > > It is a data race according to C, but so are all races on WRITE_ONCE and > READ_ONCE, so we already do not actually care what C says. The situation with WRITE_ONCE() and READ_ONCE() is debatable. The volatile accesses are defined more by folklore than standardese, and they do seriously constrain the compiler. Which is why many people don't like volatile much. > What we care about is what the compiler says (and does). > > The reality is that no matter what kind of crazy optimizations the compiler > does to the load and to the concurrent store, all that would happen is that > the load "returns" some insane value. But that insane value is not used by > the remainder of the computation. Yes, but only as long as your properly constrain what the code in that read-side critical section is permitted to do. > I think the right way to think about it is that a race between a read and a > write gives the read an indeterminate value, and a race between two writes > produces undefined behavior. I vaguely recall that this is even guaranteed > by LLVM. > > That is why sequence locks work, after all. In our internal memory model we > have relaxed the definition accordingly and there are a bunch of internally > used datastructures that can only be verified because of the relaxation. Again, it is likely best to model sequence locking directly. > > I am currently not at all comfortable with the thought of allowing > > plain C-language loads to head any sort of dependency. I really did put > > that restriction into both memory-barriers.txt and rcu_dereference.rst > > intentionally. There is the old saying "Discipline = freedom", and > > therefore compilers' lack of discipline surrounding plain C-language > > loads implies a lack of freedom. ;-) > > Yes, I understand your concern (or more generally, the concern of letting > plain accesses play a role in ordering). > Obviously, allowing arbitrary plain loads to invoke some kind of ordering > because of a dependency is plain (heh) wrong. > There are two kinds of potential problems: > - the load or its dependent store may not exist in that location at all > - the dependency may not really exist > > The second case is a problem also with marked accesses, and should be > handled by herd7 only giving us actual semantic dependencies (whatever those > are). It can not be solved in cat. Either way it is a limitation that herd7 > (and also other tools) currently has and we already live with. This is easy to say. Please see this paper (which Alan also referred to) for some of the challenges: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3064r2.pdf Mark Batty and his group have identified more gotchas. (See the citation in the above paper.) > So the new problem we deal with is to somehow restrict the rule to loads and > dependent stores that the compiler for whatever reason will not fully > eliminate. > > This problem too can not be solved completely inside cat. 
We can give an > approximation, as discussed with Alan (stating that a store would not be > elided if it is read by another thread, and a read would not be elided if it > reads from another thread and a store that won't be elided depends on it). > > This approximation is also limited, e.g., if the addresses of the plain > loads and stores have not yet escaped the function, but at least this > scenario is currently impossible in herd7 (unlike the fake dependency > scenario). > > In my mind it would again be better to offload the correct selection of > "compiler-un(re)movable plain loads and stores" to the tools. That may again > not solve the problem fully, but it at least would mean that any changes to > address the imprecision wouldn't need to go through the kernel tree, and > IMHO it is easier to say LKMM in the cat files is the model, and the > interpretation of the model has some limitations. Which we currently do via marked accesses. ;-) > > We should decide which of these examples should be > > added to the github litmus archive, perhaps to illustrate the fact that > > plain C-language loads do not head dependency chains. Thoughts? > > I'm not sure that is a good idea, given that running the herd7 tool on the > litmus test will clearly show a dependency chain headed by plain loads in > the visual graphs (with `doshow rwdep`). > > Maybe it makes more sense to say in the docs that they may head syntactic or > semantic dependency chains, but because of the common case that the compiler > may cruelly optimize things, LKMM does not guarantee ordering based on the > dependency chains headed by plain loads. That would be consistent with the > tooling. Well, LKMM and herd7 have different notions of what constitutes a dependency, and the LKMM notion is most relevant here. (And herd7 needs its broader definition so as to handle a wide variety of memory models.) > > [1] https://people.kernel.org/paulmck/hunting-a-tree03-heisenbug > > Fun. Thanks :) Duplication and Devious both start with D. In reference to the performance and scalability consequences of eliminating that duplication, so does Devine! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-13 22:04 ` Paul E. McKenney @ 2025-01-16 18:40 ` Paul E. McKenney 2025-01-16 19:13 ` Jonas Oberhauser 2025-01-16 19:28 ` Jonas Oberhauser 2025-01-16 19:08 ` Jonas Oberhauser 1 sibling, 2 replies; 59+ messages in thread From: Paul E. McKenney @ 2025-01-16 18:40 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Mon, Jan 13, 2025 at 02:04:58PM -0800, Paul E. McKenney wrote: > On Fri, Jan 10, 2025 at 05:21:59PM +0100, Jonas Oberhauser wrote: > > Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney: > > > On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: > > > > Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: > > > > > On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: [ . . . ] > > > > I currently can not come up with an example where there would be a > > > > (semantic) control dependency from a load to a store that is not in the arm > > > > of an if statement (or a loop / switch of some form with the branch > > > > depending on the load). > > > > > > > > I think the control dependency is just a red herring. It is only there to > > > > avoid the data race. > > > > > > Well, that red herring needs to have a companion fish to swim with in > > > order to enforce ordering, and I am not seeing that companion. > > > > > > Or am I (yet again!) missing something subtle here? > > > > It makes more sense to think about how people do message passing (or > > seqlock), which might look something like this: > > > > [READ_ONCE] > > rmb > > [plain read] > > > > and > > > > [plain write] > > wmb > > [WRITE_ONCE] > > > > > > Clearly LKMM says that there is some sort of order (not quite happens-before > > order though) between the READ_ONCE and the plain read, and between the > > plain write and the WRITE_ONCE. This order is clearly defined in the data > > race definition, in r-pre-bounded and w-post-bounded. > > > > Now consider > > > > [READ_ONCE] > > rmb > > [plain read] > > // some code that creates order between the plain accesses > > [plain write] > > wmb > > [WRITE_ONCE] > > > > where for some specific reason we can discern that the compiler can not > > fully eliminate/move across the barrier either this specific plain read, nor > > the plain write, nor the ordering between the two. > > > > In this case, is there order between the READ_ONCE and the WRITE_ONCE, or > > not? Of course, we know current LKMM says no. I would say that in those very > > specific cases, we do have ordering. > > Agreed, for LKMM to deal with seqlock, the read-side critical section > would need to use READ_ONCE(), which is a bit unnatural. The C++ > standards committee has been discussing this for some time, as that > memory model also gives data race in that case. > > But it might be better to directly model seqlock than to try to make > LKMM deal with the underlying atomic operations. Maybe I should give an example approach, perhaps inspiring a better approach. o Model reader-writer locks in LKMM, including a relaxed write-lock-held primitive. o Model sequence locks in terms of reader-writer locks: o The seqlock writer maps to write lock. o The seqlock reader maps to read lock, but with a write-lock-held check. If the write lock is held at that point, the seqlock tells the caller to retry. Please note that the point is simply to exercise the failure path. 
o If a value is read in the seqlock reader and used across a "you need to retry" indication, that flags a seqlock data race. But is there a better way? Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 18:40 ` Paul E. McKenney @ 2025-01-16 19:13 ` Jonas Oberhauser 2025-01-16 19:31 ` Paul E. McKenney 2025-01-16 19:28 ` Jonas Oberhauser 1 sibling, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-16 19:13 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/16/2025 um 7:40 PM schrieb Paul E. McKenney: > On Mon, Jan 13, 2025 at 02:04:58PM -0800, Paul E. McKenney wrote: >> On Fri, Jan 10, 2025 at 05:21:59PM +0100, Jonas Oberhauser wrote: >>> Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney: >>>> On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: >>>>> Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: >>>>>> On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: > > [ . . . ] > >>>>> I currently can not come up with an example where there would be a >>>>> (semantic) control dependency from a load to a store that is not in the arm >>>>> of an if statement (or a loop / switch of some form with the branch >>>>> depending on the load). >>>>> >>>>> I think the control dependency is just a red herring. It is only there to >>>>> avoid the data race. >>>> >>>> Well, that red herring needs to have a companion fish to swim with in >>>> order to enforce ordering, and I am not seeing that companion. >>>> >>>> Or am I (yet again!) missing something subtle here? >>> >>> It makes more sense to think about how people do message passing (or >>> seqlock), which might look something like this: >>> >>> [READ_ONCE] >>> rmb >>> [plain read] >>> >>> and >>> >>> [plain write] >>> wmb >>> [WRITE_ONCE] >>> >>> >>> Clearly LKMM says that there is some sort of order (not quite happens-before >>> order though) between the READ_ONCE and the plain read, and between the >>> plain write and the WRITE_ONCE. This order is clearly defined in the data >>> race definition, in r-pre-bounded and w-post-bounded. >>> >>> Now consider >>> >>> [READ_ONCE] >>> rmb >>> [plain read] >>> // some code that creates order between the plain accesses >>> [plain write] >>> wmb >>> [WRITE_ONCE] >>> >>> where for some specific reason we can discern that the compiler can not >>> fully eliminate/move across the barrier either this specific plain read, nor >>> the plain write, nor the ordering between the two. >>> >>> In this case, is there order between the READ_ONCE and the WRITE_ONCE, or >>> not? Of course, we know current LKMM says no. I would say that in those very >>> specific cases, we do have ordering. >> >> Agreed, for LKMM to deal with seqlock, the read-side critical section >> would need to use READ_ONCE(), which is a bit unnatural. The C++ >> standards committee has been discussing this for some time, as that >> memory model also gives data race in that case. >> >> But it might be better to directly model seqlock than to try to make >> LKMM deal with the underlying atomic operations. > > Maybe I should give an example approach, perhaps inspiring a better > approach. > > o Model reader-writer locks in LKMM, including a relaxed > write-lock-held primitive. > > o Model sequence locks in terms of reader-writer locks: > > o The seqlock writer maps to write lock. > > o The seqlock reader maps to read lock, but with a > write-lock-held check. If the write lock is held at > that point, the seqlock tells the caller to retry. 
> > Please note that the point is simply to exercise > the failure path. > > o If a value is read in the seqlock reader and used > across a "you need to retry" indication, that > flags a seqlock data race. > > But is there a better way? > You need to be careful with those hb edges. The reader critical section does not have to happen-before the writer critical section, as it would with an actual read-write lock. I think the solution would have to be along the lines of changing the definition of r-post-bounded. The read_enter() function reads from a write_exit() and establishes hb. The read_exit() function also reads from a write_exit(); if it is the same one that the matching read_enter() read from, then it returns success, otherwise failure. Reads po-before a successful read_exit() are bounded with regard to subsequent write_enter() on the same lock. Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 19:13 ` Jonas Oberhauser @ 2025-01-16 19:31 ` Paul E. McKenney 2025-01-16 20:21 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Paul E. McKenney @ 2025-01-16 19:31 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 16, 2025 at 08:13:28PM +0100, Jonas Oberhauser wrote: > Am 1/16/2025 um 7:40 PM schrieb Paul E. McKenney: > > On Mon, Jan 13, 2025 at 02:04:58PM -0800, Paul E. McKenney wrote: > > > On Fri, Jan 10, 2025 at 05:21:59PM +0100, Jonas Oberhauser wrote: > > > > Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney: > > > > > On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: > > > > > > Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: > > > > > > > On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: > > > > [ . . . ] > > > > > > > > I currently can not come up with an example where there would be a > > > > > > (semantic) control dependency from a load to a store that is not in the arm > > > > > > of an if statement (or a loop / switch of some form with the branch > > > > > > depending on the load). > > > > > > > > > > > > I think the control dependency is just a red herring. It is only there to > > > > > > avoid the data race. > > > > > > > > > > Well, that red herring needs to have a companion fish to swim with in > > > > > order to enforce ordering, and I am not seeing that companion. > > > > > > > > > > Or am I (yet again!) missing something subtle here? > > > > > > > > It makes more sense to think about how people do message passing (or > > > > seqlock), which might look something like this: > > > > > > > > [READ_ONCE] > > > > rmb > > > > [plain read] > > > > > > > > and > > > > > > > > [plain write] > > > > wmb > > > > [WRITE_ONCE] > > > > > > > > > > > > Clearly LKMM says that there is some sort of order (not quite happens-before > > > > order though) between the READ_ONCE and the plain read, and between the > > > > plain write and the WRITE_ONCE. This order is clearly defined in the data > > > > race definition, in r-pre-bounded and w-post-bounded. > > > > > > > > Now consider > > > > > > > > [READ_ONCE] > > > > rmb > > > > [plain read] > > > > // some code that creates order between the plain accesses > > > > [plain write] > > > > wmb > > > > [WRITE_ONCE] > > > > > > > > where for some specific reason we can discern that the compiler can not > > > > fully eliminate/move across the barrier either this specific plain read, nor > > > > the plain write, nor the ordering between the two. > > > > > > > > In this case, is there order between the READ_ONCE and the WRITE_ONCE, or > > > > not? Of course, we know current LKMM says no. I would say that in those very > > > > specific cases, we do have ordering. > > > > > > Agreed, for LKMM to deal with seqlock, the read-side critical section > > > would need to use READ_ONCE(), which is a bit unnatural. The C++ > > > standards committee has been discussing this for some time, as that > > > memory model also gives data race in that case. > > > > > > But it might be better to directly model seqlock than to try to make > > > LKMM deal with the underlying atomic operations. > > > > Maybe I should give an example approach, perhaps inspiring a better > > approach. 
> > > > o Model reader-writer locks in LKMM, including a relaxed > > write-lock-held primitive. > > > > o Model sequence locks in terms of reader-writer locks: > > > > o The seqlock writer maps to write lock. > > > > o The seqlock reader maps to read lock, but with a > > write-lock-held check. If the write lock is held at > > that point, the seqlock tells the caller to retry. > > > > Please note that the point is simply to exercise > > the failure path. > > > > o If a value is read in the seqlock reader and used > > across a "you need to retry" indication, that > > flags a seqlock data race. > > > > But is there a better way? > > > > You need to be careful with those hb edges. The reader critical section does > not have to happen-before the writer critical section, as would with an > actual read-write lock. True, the reader critical section only needs each of its loads to have an fr link to all of the stores in the writer critical section. On the other hand, the Linux-kernel implementation of seqcount does use smp_rmb() and smp_wmb(), which does provide the message-passing pattern, and thus hb, correct? > I think the solution would have to be along changing the definition of > r-post-bounded. > The read_enter() function reads from a write_exit() and establishes hb. > The read_exit() function also reads from a write_exit(), if the same as the > matching read_enter(), then it returns success, otherwise failure. > > Reads po-before a successful read_exit() are bounded with regards to > subsequent write_enter() on the same lock. Or are you trying to model the effects of the data races? Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
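For reference, the open-coded shape of what seqcount does, with the roles from the preceding messages marked in comments, is roughly the following. This is a simplified sketch, not the kernel's actual implementation: the retry loop and odd/even check on the reader side are elided, the test name and the seq/data variables are illustrative, and the counter accesses are written as marked accesses for brevity.

	C seqcount-sketch

	{}

	P0(int *seq, int *data)
	{
		WRITE_ONCE(*seq, 1);	/* counter goes odd: write_enter() */
		smp_wmb();
		*data = 42;		/* plain payload write */
		smp_wmb();
		WRITE_ONCE(*seq, 2);	/* counter even again: write_exit() */
	}

	P1(int *seq, int *data)
	{
		int r1;
		int r2 = -1;
		int r3;

		r1 = READ_ONCE(*seq);	/* read_enter(); the caller retries while odd */
		smp_rmb();
		r2 = *data;		/* plain payload read */
		smp_rmb();
		r3 = READ_ONCE(*seq);	/* read_exit(); success iff r3 == r1 */
	}

	exists (1:r1=2 /\ 1:r3=2 /\ 1:r2=0)

The smp_wmb()/smp_rmb() pairing provides the message-passing guarantee asked about above: if the reader sees the final counter value at both ends, it must also see the payload, so the outcome in the exists clause is not expected to occur. The open question in this subthread concerns the executions where the reader overlaps the writer: there the plain accesses to *data race, and current LKMM flags that as a data race even though the real retry protocol would discard the value.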
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 19:31 ` Paul E. McKenney @ 2025-01-16 20:21 ` Jonas Oberhauser 0 siblings, 0 replies; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-16 20:21 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/16/2025 um 8:31 PM schrieb Paul E. McKenney: > On Thu, Jan 16, 2025 at 08:13:28PM +0100, Jonas Oberhauser wrote: >> Am 1/16/2025 um 7:40 PM schrieb Paul E. McKenney: >>> On Mon, Jan 13, 2025 at 02:04:58PM -0800, Paul E. McKenney wrote: >>>> On Fri, Jan 10, 2025 at 05:21:59PM +0100, Jonas Oberhauser wrote: >>>>> Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney: >>>>>> On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: >>>>>>> Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: >>>>>>>> On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: >>> >>> [ . . . ] >>> >>>>>>> I currently can not come up with an example where there would be a >>>>>>> (semantic) control dependency from a load to a store that is not in the arm >>>>>>> of an if statement (or a loop / switch of some form with the branch >>>>>>> depending on the load). >>>>>>> >>>>>>> I think the control dependency is just a red herring. It is only there to >>>>>>> avoid the data race. >>>>>> >>>>>> Well, that red herring needs to have a companion fish to swim with in >>>>>> order to enforce ordering, and I am not seeing that companion. >>>>>> >>>>>> Or am I (yet again!) missing something subtle here? >>>>> >>>>> It makes more sense to think about how people do message passing (or >>>>> seqlock), which might look something like this: >>>>> >>>>> [READ_ONCE] >>>>> rmb >>>>> [plain read] >>>>> >>>>> and >>>>> >>>>> [plain write] >>>>> wmb >>>>> [WRITE_ONCE] >>>>> >>>>> >>>>> Clearly LKMM says that there is some sort of order (not quite happens-before >>>>> order though) between the READ_ONCE and the plain read, and between the >>>>> plain write and the WRITE_ONCE. This order is clearly defined in the data >>>>> race definition, in r-pre-bounded and w-post-bounded. >>>>> >>>>> Now consider >>>>> >>>>> [READ_ONCE] >>>>> rmb >>>>> [plain read] >>>>> // some code that creates order between the plain accesses >>>>> [plain write] >>>>> wmb >>>>> [WRITE_ONCE] >>>>> >>>>> where for some specific reason we can discern that the compiler can not >>>>> fully eliminate/move across the barrier either this specific plain read, nor >>>>> the plain write, nor the ordering between the two. >>>>> >>>>> In this case, is there order between the READ_ONCE and the WRITE_ONCE, or >>>>> not? Of course, we know current LKMM says no. I would say that in those very >>>>> specific cases, we do have ordering. >>>> >>>> Agreed, for LKMM to deal with seqlock, the read-side critical section >>>> would need to use READ_ONCE(), which is a bit unnatural. The C++ >>>> standards committee has been discussing this for some time, as that >>>> memory model also gives data race in that case. >>>> >>>> But it might be better to directly model seqlock than to try to make >>>> LKMM deal with the underlying atomic operations. >>> >>> Maybe I should give an example approach, perhaps inspiring a better >>> approach. >>> >>> o Model reader-writer locks in LKMM, including a relaxed >>> write-lock-held primitive. >>> >>> o Model sequence locks in terms of reader-writer locks: >>> >>> o The seqlock writer maps to write lock. 
>>> >>> o The seqlock reader maps to read lock, but with a >>> write-lock-held check. If the write lock is held at >>> that point, the seqlock tells the caller to retry. >>> >>> Please note that the point is simply to exercise >>> the failure path. >>> >>> o If a value is read in the seqlock reader and used >>> across a "you need to retry" indication, that >>> flags a seqlock data race. >>> >>> But is there a better way? >>> >> >> You need to be careful with those hb edges. The reader critical section does >> not have to happen-before the writer critical section, as would with an >> actual read-write lock. > > True, the reader critical section only needs each of its loads to have > an fr link to all of the stores in the writer critical section. Yes. And importantly, a properly-synchronized fr link! Meaning that it does not constitute a data race. Actually, you only need the proper synchronization, then the fr link follows from the race-coherence axioms. > On the other hand, the Linux-kernel implementation of seqcount does use > smp_rmb() and smp_wmb(), which does provide the message-passing pattern, > and thus hb, correct? Only in one direction, from the last writer CS to the reader CS. But not from the reader CS to the next writer CS :( The message is passed, but from a data race POV, you are reading something while new data is coming in (from the next write). This fr-race could only be prevented by rw-xbstar = fence | *(r-post-bounded ; xbstar ; w-pre-bounded)* but there is no xbstar :( Well, one thought is that one could declare xbstar from the read_exit to the write_enter, but just not add any release semantics to read_exit() (like a rw_read_unlock would have). But that sounds really scary because that xbstar definitely does not exist in the implementation, so my 9pm brain is doubtful that this is correct. >> I think the solution would have to be along changing the definition of >> r-post-bounded. >> The read_enter() function reads from a write_exit() and establishes hb. >> The read_exit() function also reads from a write_exit(), if the same as the >> matching read_enter(), then it returns success, otherwise failure. >> >> Reads po-before a successful read_exit() are bounded with regards to >> subsequent write_enter() on the same lock. > > Or are you trying to model the effects of the data races? No, I had even overlooked the "real data race" in the failure case xP I was modelling the properly-synchronized part of the fr, without introducing hb/xbstar from the reader to the next writer - but only for the success case. The failure case also needs some way to avoid the data race (unless it abuses the value). Perhaps by relaxing the definition of data race as we did in VMM, so that the readside CS doesn't become UB until you do something forbidden with the value. My suggestion for avoiding the data race in success was to extend the notion of rw-xbstar above in some way that provides the necessary protection, e.g. by drawing an "r-post-bounded"-like edge to the write_enter(), which would then "w-pre-bound" (or something similar) the write-side CS. Have fun, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 18:40 ` Paul E. McKenney 2025-01-16 19:13 ` Jonas Oberhauser @ 2025-01-16 19:28 ` Jonas Oberhauser 2025-01-16 19:39 ` Paul E. McKenney 1 sibling, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-16 19:28 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/16/2025 um 7:40 PM schrieb Paul E. McKenney: > o If a value is read in the seqlock reader and used > across a "you need to retry" indication, that > flags a seqlock data race. This too is insufficient: you also need to prevent dereferencing or having a control dependency inside the seqlock reader. Otherwise you could dereference a torn pointer and... At this point your definition of data race becomes pretty much the same as the one we have. https://github.com/open-s4c/libvsync/blob/main/vmm/vmm.cat#L150 (also this rule should only concern reads that are actually "data-racy" - if the read is synchronized by some other writes, then you can read & use it just fine across the seqlock data race) I also noticed that in my previous e-mail I had overlooked the reads inside the CS in the failure case, but you are of course right, there needs to be some mechanism to prevent them from being data racy unless abused. But I am not sure how to formalize that in a way that is simpler than just re-defining data races in general, without adding some special support to herd7 for it. What do you think? jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
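To make the dereference hazard concrete, here is a sketch of the kind of reader being ruled out. read_seqcount_begin() and read_seqcount_retry() are the real kernel interfaces, but the data structure and the surrounding code are made up.

	#include <linux/seqlock.h>

	struct foo {
		int field;
	};

	static struct foo *shared_ptr;	/* republished by the writer under the seqcount */
	static seqcount_t sc;

	static int broken_reader(void)
	{
		struct foo *p;
		unsigned int seq;
		int v;

		do {
			seq = read_seqcount_begin(&sc);
			p = shared_ptr;		/* plain read; may observe a torn or
						 * half-updated pointer while the
						 * writer is mid-update */
			v = p->field;		/* BUG: dereferences p before
						 * read_seqcount_retry() has confirmed
						 * that the read was consistent */
		} while (read_seqcount_retry(&sc, seq));

		return v;
	}

The retry check can discard a bad value of v, but it cannot undo a fault caused by dereferencing a garbage pointer, which is why dereferences (and, similarly, control dependencies on racy values) have to stay out of the read-side critical section.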
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 19:28 ` Jonas Oberhauser @ 2025-01-16 19:39 ` Paul E. McKenney 2025-01-17 12:08 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Paul E. McKenney @ 2025-01-16 19:39 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 16, 2025 at 08:28:06PM +0100, Jonas Oberhauser wrote: > > > Am 1/16/2025 um 7:40 PM schrieb Paul E. McKenney: > > o If a value is read in the seqlock reader and used > > across a "you need to retry" indication, that > > flags a seqlock data race. > > > This too is insufficient, you also need to prevent dereferencing or having > control dependency inside the seqlock. Otherwise you could derefence a torn > pointer and... True, but isn't that prohibition separable from the underlying implementation? > At this point your definition of data race becomes pretty much the same as > we have. > > https://github.com/open-s4c/libvsync/blob/main/vmm/vmm.cat#L150 > > > (also this rule should only concern reads that are actually "data-racy" - if > the read is synchronized by some other writes, then you can read & use it > just fine across the seqlock data race) Perhaps LKMM should adopt this or something similar, but what do others think? > I also noticed that in my previous e-mail I had overlooked the reads inside > the CS in the failure case, but you are of course right, there needs to be > some mechanism to prevent them from being data racy unless abused. > > But I am not sure how to formalize that in a way that is simpler than just > re-defining data races in general, without adding some special support to > herd7 for it. > > What do you think? I was thinking in terms of identifying reads in critical sections (sort of like LKMM does for RCU read-side critical sections), then identifying any dependencies from those reads that cross a failed reader boundary. If that set is non-empty, flag it. But I clearly cannot claim to have thought this through. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
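A sketch of what such a flagging rule would distinguish; sc, data and consume() are made-up names, and consume() just stands for any use of the value.

	static seqcount_t sc;
	static int *data;		/* payload guarded by sc */
	static void consume(int v);	/* arbitrary sink for the value */

	static void reader_ok(void)
	{
		unsigned int seq;
		int r;

		do {
			seq = read_seqcount_begin(&sc);
			r = *data;	/* plain read inside the reader CS */
		} while (read_seqcount_retry(&sc, seq));

		consume(r);		/* fine: only reached after a successful retry check */
	}

	static void reader_flagged(void)
	{
		unsigned int seq;
		int r;

		seq = read_seqcount_begin(&sc);
		r = *data;
		if (read_seqcount_retry(&sc, seq))
			consume(r);	/* would be flagged: a dependency from the
					 * in-critical-section read crosses a failed
					 * reader boundary */
	}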
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 19:39 ` Paul E. McKenney @ 2025-01-17 12:08 ` Jonas Oberhauser 0 siblings, 0 replies; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-17 12:08 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/16/2025 um 8:39 PM schrieb Paul E. McKenney: > On Thu, Jan 16, 2025 at 08:28:06PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/16/2025 um 7:40 PM schrieb Paul E. McKenney: >>> o If a value is read in the seqlock reader and used >>> across a "you need to retry" indication, that >>> flags a seqlock data race. >> >> >> This too is insufficient, you also need to prevent dereferencing or having >> control dependency inside the seqlock. Otherwise you could derefence a torn >> pointer and... > > True, but isn't that prohibition separable from the underlying > implementation? Yes, but so is the prohibition of using the value after the failed reader_exit(). So probably it needs to be added to the specification of what you are allowed to do with values from the read-side critical section. Actually this was a bug we had in some code once, and I overlooked it because I thought the incorrect data isn't used anyways, right? Luckily I had put the condition into our cat model already and so the tooling caught the bug before it went out... >> At this point your definition of data race becomes pretty much the same as >> we have. >> >> https://github.com/open-s4c/libvsync/blob/main/vmm/vmm.cat#L150 >> >> >> (also this rule should only concern reads that are actually "data-racy" - if >> the read is synchronized by some other writes, then you can read & use it >> just fine across the seqlock [edit: boundary]) > > Perhaps LKMM should adopt this or something similar, but what do others > think? I am not sure how many others are still reading this deep into the conversation, maybe best to start a new thread. >> I also noticed that in my previous e-mail I had overlooked the reads inside >> the CS in the failure case, but you are of course right, there needs to be >> some mechanism to prevent them from being data racy unless abused. >> >> But I am not sure how to formalize that in a way that is simpler than just >> re-defining data races in general, without adding some special support to >> herd7 for it. >> >> What do you think? > > I was thinking in terms of identifying reads in critical sections (sort > of like LKMM does for RCU read-side critical sections), then identifying > any dependencies from those reads that cross a failed reader boundary. > If that set is non-empty, flag it. Yes, the general idea sounds reasonable, but the details have a lot of potential for future improvement. One tricky part of seqlock besides the data race is that it kind of uses negative message passing - the fact that you have not seen the message means you also have not seen the flag. (And the message in this case is the write_enter(), and the flag is the plain access in the critical section! Fun.) This makes it hard to formalize in the box of LKMM and make it play well with all the other pieces. Maybe something like avoiding rw data races also under something like prop? 
r-prop-post-bounded ; ((overwrite & ext) ; cumul-fence*) ; w-prop-pre-bounded for the cases where the pre-bounded write must propagate after the overwriting store, and the post-bounded read executes before and on the same CPU as the overwritten event. Then you could argue that if the overwriting write has not propagated to the overwritten event, the pre-bounded write also has not propagated to that event. From that you can conclude it also can not have propagated to the post-bounded read. I'm not sure if the cases where propagation is handled by a strong-fence are already covered by the rw-xbstar rule. Have fun, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-13 22:04 ` Paul E. McKenney 2025-01-16 18:40 ` Paul E. McKenney @ 2025-01-16 19:08 ` Jonas Oberhauser 2025-01-16 23:02 ` Alan Stern 1 sibling, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-16 19:08 UTC (permalink / raw) To: paulmck Cc: Alan Stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/13/2025 um 11:04 PM schrieb Paul E. McKenney: > On Fri, Jan 10, 2025 at 05:21:59PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney: >>> On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote: >>>> Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney: >>>>> On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote: >>>>>> >>>>>> >>>>>> Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney: >>>>>>> If I change the two plain assignments to use WRITE_ONCE() as required >>>>>>> by memory-barriers.txt, OOTA is avoided: >>>>>> >>>>>> >>>>>> I think this direction of inquiry is a bit misleading. There need not be any >>>>>> speculative store at all: >>>>>> >>>>>> >>>>>> >>>>>> P0(int *a, int *b, int *x, int *y) { >>>>>> int r1; >>>>>> int r2 = 0; >>>>>> r1 = READ_ONCE(*x); >>>>>> smp_rmb(); >>>>>> if (r1 == 1) { >>>>>> r2 = *b; >>>>>> } >>>>>> WRITE_ONCE(*a, r2); >>>>>> smp_wmb(); >>>>>> WRITE_ONCE(*y, 1); >>>>>> } >>>>>> >>>>>> P1(int *a, int *b, int *x, int *y) { >>>>>> int r1; >>>>>> >>>>>> int r2 = 0; >>>>>> >>>>>> r1 = READ_ONCE(*y); >>>>>> smp_rmb(); >>>>>> if (r1 == 1) { >>>>>> r2 = *a; >>>>>> } >>>>>> WRITE_ONCE(*b, r2); >>>>>> smp_wmb(); >>>>>> WRITE_ONCE(*x, 1); >>>>>> } >>>>>> >>>>>> >>>>> If we want to respect something containing a control dependency to a >>>>> WRITE_ONCE() not in the body of the "if" statement, we need to make some >>>>> change to memory-barriers.txt. >>>> >>>> I'm not sure what you denotate by *this* in "if we change this", but just to >>>> clarify, I am not thinking of claiming that there were a (semantic) control >>>> dependency to WRITE_ONCE(*b, r2) in this example. >>>> >>>> There is however a data dependency from r2 = *a to WRITE_ONCE, and I would >>>> say that there is a semantic data (not control) dependency from r1 = >>>> READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from >>>> *y, the value stored to *b will be different. The latter would be enough to >>>> avoid OOTA according to the mainline LKMM, but currently this semantic >>>> dependency is not detected by herd7. >>> >>> According to LKMM, address and data dependencies must be headed by >>> rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst. >>> >>> Therefore, there is nothing to chain the control dependency with. >> >> Note that herd7 does generate dependencies. And speaking informally, there >> clearly is a semantic dependency. >> >> Both the original formalization of LKMM and my patch do say that a plain >> load at the head of a dependency chain does not provide any dependency >> ordering, i.e., >> >> [Plain & R] ; dep >> >> is never part of hb, both in LKMM and in my patch. > > Agreed, LKMM does filter the underlying herd7 dependencies. 
> >> By the way, if your concern is the dependency *starting* from the plain >> load, then we can look at examples where the dependency starts from a marked >> load: >> >> >> r1 = READ_ONCE(*x); >> smp_rmb(); >> if (r1 == 1) { >> r2 = READ_ONCE(*a); >> } >> *b = 1; >> smp_wmb(); >> WRITE_ONCE(*y,1); >> >> This is more or less analogous to the case of the addr ; [Plain] ; wmb case >> you already have. > > This is probably a failure of imagination on my part, but I am not > seeing how to create another thread that interacts with that store to > "b" without resulting in a data race. The other thread is the same just flipping x/y and a/b around. r1 = READ_ONCE(*y); smp_rmb(); if (r1 == 1) { r2 = READ_ONCE(*b); } *a = 1; smp_wmb(); WRITE_ONCE(*x,1); There is no data race, because the read from *b can only happen if we also read from the store to *y, and there are a wmb and an rmb to 'prevent the reordering' of the assignments to *b and *y, and the load from *y and the load from *b. > Ignoring that, I am not seeing much in the way of LKMM dependencies > in that code. Ah, I meant to write *b = r2 and *a = r2 My mistake. Note that if we just change it like this: r1 = READ_ONCE(*x); smp_rmb(); if (r1 == 1) { r2 = READ_ONCE(*a); } *r2 = a; smp_wmb(); WRITE_ONCE(*y,1); then we have an addr dependency and the OOTA becomes forbidden (unless you know how to initialize *a and *b to valid addresses, you may need to add something like `if (r2 == 0) r2 = a` to run this in herd7). It still has the same anomaly that triggers the OOTA though, where both threads can read r1==1. This anomaly goes away with my patch. >> That is why sequence locks work, after all. In our internal memory model we >> have relaxed the definition accordingly and there are a bunch of internally >> used datastructures that can only be verified because of the relaxation. > > Again, it is likely best to model sequence locking directly. You can take that point of view. For our model, like I said, we have taken the other point of view. The main advantage being that it helps us prove that our sequence lock and a few other speculative algorithms we have developed are correct. >>> I am currently not at all comfortable with the thought of allowing >>> plain C-language loads to head any sort of dependency. I really did put >>> that restriction into both memory-barriers.txt and rcu_dereference.rst >>> intentionally. There is the old saying "Discipline = freedom", and >>> therefore compilers' lack of discipline surrounding plain C-language >>> loads implies a lack of freedom. ;-) >> >> Yes, I understand your concern (or more generally, the concern of letting >> plain accesses play a role in ordering). >> Obviously, allowing arbitrary plain loads to invoke some kind of ordering >> because of a dependency is plain (heh) wrong. >> There are two kinds of potential problems: >> - the load or its dependent store may not exist in that location at all >> - the dependency may not really exist >> >> The second case is a problem also with marked accesses, and should be >> handled by herd7 only giving us actual semantic dependencies (whatever those >> are). It can not be solved in cat. Either way it is a limitation that herd7 >> (and also other tools) currently has and we already live with. > > This is easy to say. Please see this paper (which Alan also referred to) > for some of the challenges: > > https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3064r2.pdf > > Mark Batty and his group have identified more gotchas. 
(See the citation > in the above paper.) I'm well aware that an absolute definition of semantic dependencies is not easy to give. The point is, it's not a problem that can, in any way, be really solved inside the cat model. So if it should be solved (instead of approximated in ways that can be easily undermined), it needs to be handled by the tooling around the cat model. Btw, I'm re-reading that paper and here are some comments: - I agree with the conclusion that C++ should not try to solve OOTA other than declaring that they do not exist - the definition of OOTA as sdep | rfe is not the right one. You really need sdep ; rfe, because you can have multiple subsequent sdep links that together are not a dependency, e.g., int * x[] = { &a, &a }; int i = b; *x[i] = 1; here the semantic address dependency from loading b to loading x[i] and the semantic address dependency from loading x[i] and storing to *x[i] do not together form a semantic dependency anymore, because *x[i] is always a. So this whole code can just become a=1, and with mirrored code you can get a=b=1, which is an sdep | rfe cycle. So compilers can absolutely generate an observed OOTA for your definition of OOTA! As I explained in the original e-mail, you should instead rely on semantic dependency to cover cases of sdep ; sdep where there is in fact still a dependency, and define OOTA as sdep;rfe. - I did not like the phrasing that a dependency is a function of one execution, especially in contrast to source code. For a fixed execution, there are no dependencies because all values are fixed. It is the other executions - where you could have read another value - that create the dependencies. Perhaps it is better to say that it is not a *local* function of source code, i.e., just because the same source code has a dependency in context C does not mean that it has a dependency in context C'. In fact I would say a major gotcha of dependencies is that they are not a function of even the set of all permissible (according to the standard) executions. That is because the compiler does not have to make every permissible execution possible, only at least one. If the compiler knows that among all executions that it actually makes possible, some value is always 0 - even if there is a permissible execution in which it is not 0 - it can still replace that value with 0. E.g. T1 { x = 1; x = 0; } T2 { T[x] = 1; } ~~ merge -- makes executions where T[x] reads x==1 impossible ~> T1' { x = 1; x = 0; T[x] = 1; } ~~ replace ~> T1' { x = 1; x = 0; T[0] = 1; // no more dependency :( } which defeats any semantic dependency definition that is a function of a single execution. Of course you avoid this by making it really a function of the compiler + program *and* the execution, and looking at all the other possible (not just permissible) executions. - On the one hand, that definition makes a lot of sense. On the other hand, at least without the atomics=volatile restriction it would have the downside that a compiler which generates just a single execution for your program can say that there are no dependencies whatsoever and generate all kinds of "out of thin air" behaviors. I am not sure if that gets really resolved by the volatile restrictions you put, but either way those seem far stronger than what one would want. I would say that the approach with volatile is overzealous because it tries to create a "local" order solution to the problem that only requires a "global" ordering solution. 
Since not every semantic dependency needs to provide order in C++ -- only the cycle of dependencies -- it is totally ok to add too many semantic dependency edges to a program, even those that are not going to be exactly maintained by every compiler, as long as we can ensure that globally, no dependency cycle occurs. So for example, if we merge x = y || y = x, the merge can turn it into x=y=x or y=x=y (and then into an empty program), but not into a cyclic dependency. So even though one side of the dependency may be violated, for sake of OOTA, we could still label both sides as dependent. For LKMM the problem is of course much easier because you have volatiles and compiler barriers. Again you could maybe add incorrect semantic dependencies between accesses, as long as only the really preserved ones will imply ordering. So I'm not convinced that for either of the two cases you need to do a compiler-specific definition of dependencies. BTW, for what it's worth, Dat3M in a sense uses the clang dependencies - it first allows the compiler to do its optimizations, and then verifies the llvm-ir (with a more hardware-like dependency definition). I think something like that can be a good practical solution with fewer problems than other attempts to approximate the solution. >> So the new problem we deal with is to somehow restrict the rule to loads and >> dependent stores that the compiler for whatever reason will not fully >> eliminate. >> >> This problem too can not be solved completely inside cat. We can give an >> approximation, as discussed with Alan (stating that a store would not be >> elided if it is read by another thread, and a read would not be elided if it >> reads from another thread and a store that won't be elided depends on it). >> >> This approximation is also limited, e.g., if the addresses of the plain >> loads and stores have not yet escaped the function, but at least this >> scenario is currently impossible in herd7 (unlike the fake dependency >> scenario). >> >> In my mind it would again be better to offload the correct selection of >> "compiler-un(re)movable plain loads and stores" to the tools. That may again >> not solve the problem fully, but it at least would mean that any changes to >> address the imprecision wouldn't need to go through the kernel tree, and >> IMHO it is easier to say LKMM in the cat files is the model, and the >> interpretation of the model has some limitations. > > Which we currently do via marked accesses. ;-) Hey, programmers are not tools : ) >>> We should decide which of these examples should be >>> added to the github litmus archive, perhaps to illustrate the fact that >>> plain C-language loads do not head dependency chains. Thoughts? >> >> I'm not sure that is a good idea, given that running the herd7 tool on the >> litmus test will clearly show a dependency chain headed by plain loads in >> the visual graphs (with `doshow rwdep`). >> >> Maybe it makes more sense to say in the docs that they may head syntactic or >> semantic dependency chains, but because of the common case that the compiler >> may cruelly optimize things, LKMM does not guarantee ordering based on the >> dependency chains headed by plain loads. That would be consistent with the >> tooling. > > Well, LKMM and herd7 have different notions of what constitutes a > dependency, and the LKMM notion is most relevant here. (And herd7 > needs its broader definition so as to handle a wide variety of > memory models.) 
I would say that herd7's definition handles a wide variety of hardware memory models, but it has limitations for language-level models. Perhaps it would be good for herd7 to support multiple dependency models as well. Have fun, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 19:08 ` Jonas Oberhauser @ 2025-01-16 23:02 ` Alan Stern 2025-01-17 8:34 ` Hernan Ponce de Leon ` (2 more replies) 0 siblings, 3 replies; 59+ messages in thread From: Alan Stern @ 2025-01-16 23:02 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: > I'm well aware that an absolute definition of semantic dependencies is not > easy to give. In fact it's undecidable. No tool is ever going to be able to detect semantic dependencies perfectly. > Btw, I'm re-reading that paper and here are some comments: > - I agree with the conclusion that C++ should not try to solve OOTA other > than declaring that they do not exist > - the definition of OOTA as sdep | rfe is not the right one. You really need > sdep ; rfe, because you can have multiple subsequent sdep links that > together are not a dependency, e.g., > > int * x[] = { &a, &a }; > int i = b; > *x[i] = 1; > > here the semantic address dependency from loading b to loading x[i] and the > semantic address dependency from loading x[i] and storing to *x[i] do not > together form a semantic dependency anymore, because *x[i] is always a. So > this whole code can just become a=1, and with mirrored code you can get > a=b=1, which is an sdep | rfe cycle. We regard sdep as extending from loads (or sets of loads) to stores. (Perhaps the paper doesn't state this explicitly -- it should.) So an sdep ; sdep sequence is not possible. Nevertheless, it's arguable that in your example there is no semantic dependency from the load of b to the load of x[i]. Given the code shown, a compiler could replace the load of x[i] with the constant value &a, yielding simply *(&a) = 1. There are other examples which do make the point, however. For example: int a = 1, b = 1, c = 2; int *x[] = {&a, &b, &c}; int r1 = i; if (r1 > 1) r1 = 1; int *r2 = x[r1]; int r3 = *r2; q = r3; Here there is a semantic dependency (if you accept dependencies to loads) from the load of i to the load of x[r1] and from the load of *r2 to the store to q. But overall the value stored to q is always 1, so it doesn't depend on the load from i. > - I did not like the phrasing that a dependency is a function of one > execution, especially in contrast to source code. For a fixed execution, > there are no dependencies because all values are fixed. It is the other > executions - where you could have read another value - that create the > dependencies. > > Perhaps it is better to say that it is not a *local* function of source > code, i.e., just because the same source code has a dependency in context C > does not mean that it has a dependency in context C'. Of course, what we meant is that whether or not there is a semantic dependency depends on both the source code and the execution. > In fact I would say a major gotcha of dependencies is that they are not a > function of even the set of all permissible (according to the standard) > executions. > That is because the compiler does not have to make every permissible > execution possible, only at least one. The paper includes examples of this, I believe. 
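(Coming back to the x[r1] example: assuming nothing else writes to a or b and that i is in range, a compiler could in principle lower the tail of that sequence to something like the sketch below, which makes it explicit that the stored value no longer depends on i.)

    int r1 = i;              /* the load of i itself stays */
    if (r1 > 1)
        r1 = 1;
    int *r2 = x[r1];         /* either &a or &b, and both hold 1 */
    int r3 = *r2;            /* now dead as far as q is concerned */
    q = 1;                   /* constant-folded store */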
> If the compiler knows that among all executions that it actually makes > possible, some value is always 0 - even if there is a permissible execution > in which it is not 0 - it can still replace that value with 0. E.g. > > T1 { x = 1; x = 0; } > T2 { T[x] = 1; } > > ~~ merge -- makes executions where T[x] reads x==1 impossible ~> > > T1' { x = 1; x = 0; T[x] = 1; } > > ~~ replace ~> > > T1' { x = 1; x = 0; T[0] = 1; // no more dependency :( } > > which defeats any semantic dependency definition that is a function of a > single execution. Our wording could be improved. We simply want to make the point that the same source code may have a semantic dependency in one execution but not in another. I guess we should avoid calling it a "function" of the execution, to avoid implying that the execution alone is what matters. > Of course you avoid this by making it really a function of the compiler + > program *and* the execution, and looking at all the other possible (not just > permissible) executions. > > - On the one hand, that definition makes a lot of sense. On the other hand, > at least without the atomics=volatile restriction it would have the downside > that a compiler which generates just a single execution for your program can > say that there are no dependencies whatsoever and generate all kinds of "out > of thin air" behaviors. That is so. But a compiler which examines only a single thread at a time cannot afford to generate just a single execution, because it cannot know what values the loads will obtain when the full program runs. > I am not sure if that gets really resolved by the volatile restrictions you > put, but either way those seem far stronger than what one would want. It does get resolved, because treating atomics as volatile also means that the compiler cannot afford to generate just a single execution. Again, because it cannot know what values the loads will obtain at runtime, since volatile loads can yield any value in a non-benign environment. > I would say that the approach with volatile is overzealous because it tries > to create a "local" order solution to the problem that only requires a > "global" ordering solution. Since not every semantic dependency needs to > provide order in C++ -- only the cycle of dependencies -- it is totally ok > to add too many semantic dependency edges to a program, even those that are > not going to be exactly maintained by every compiler, as long as we can > ensure that globally, no dependency cycle occurs. But then how would you characterize semantic dependencies, if you want to allow the definition to include some dependencies that aren't semantic but not so many that you ever create a cycle? This sounds like an even worse problem than we started with! > So for example, if we merge x = y || y = x, the merge can turn it into > x=y=x or y=x=y (and then into an empty program), but not into a cyclic > dependency. So even though one side of the dependency may be violated, for > sake of OOTA, we could still label both sides as dependent. They _are_ both semantically dependent (in the original parallel version, I mean). I don't see what merging has to do with it. > For LKMM the problem is of course much easier because you have volatiles and > compiler barriers. Again you could maybe add incorrect semantic dependencies > between accesses, as long as only the really preserved ones will imply > ordering. I'm not particularly concerned about OOTA or semantic dependencies in LKMM. 
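(For reference, the two possible merge orders for the x = y || y = x example, with both variables assumed to start at 0 and left unmarked as in the example:)

    /* original threads */
    T1 { x = y; }
    T2 { y = x; }

    /* merge with T1 first */
    T12 { x = y; y = x; }    /* second assignment is redundant, can shrink to x = y */

    /* merge with T2 first */
    T21 { y = x; x = y; }    /* likewise shrinks to y = x */

Whichever order the compiler picks, one direction of the dependency disappears, but no dependency cycle and no value other than the initial 0 can appear.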
> So I'm not convinced that for either of the two cases you need to do a > compiler-specific definition of dependencies. For the C++ case, I cannot think of any other way to approach the problem. Nor do I see anything wrong with a compiler-specific definition, given how nebulous the whole idea is in the first place. > BTW, for what it's worth, Dat3M in a sense uses the clang dependencies - it > first allows the compiler to do its optimizations, and then verifies the > llvm-ir (with a more hardware-like dependency definition). What do you mean by "verifies"? What property of the llvm-ir does it verify? > I think something like that can be a good practical solution with fewer > problems than other attempts to approximate the solution. I do not like the idea of tying the definition of OOTA (which needs to apply to every implementation) to a particular clang compiler. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 23:02 ` Alan Stern @ 2025-01-17 8:34 ` Hernan Ponce de Leon 2025-01-17 11:29 ` Jonas Oberhauser 2025-01-17 15:52 ` Alan Stern 2 siblings, 0 replies; 59+ messages in thread From: Hernan Ponce de Leon @ 2025-01-17 8:34 UTC (permalink / raw) To: Alan Stern, Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm On 1/17/2025 12:02 AM, Alan Stern wrote: > On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: >> I'm well aware that an absolute definition of semantic dependencies is not >> easy to give. > > In fact it's undecidable. No tool is ever going to be able to detect > semantic dependencies perfectly. > >> Btw, I'm re-reading that paper and here are some comments: >> - I agree with the conclusion that C++ should not try to solve OOTA other >> than declaring that they do not exist >> - the definition of OOTA as sdep | rfe is not the right one. You really need >> sdep ; rfe, because you can have multiple subsequent sdep links that >> together are not a dependency, e.g., >> >> int * x[] = { &a, &a }; >> int i = b; >> *x[i] = 1; >> >> here the semantic address dependency from loading b to loading x[i] and the >> semantic address dependency from loading x[i] and storing to *x[i] do not >> together form a semantic dependency anymore, because *x[i] is always a. So >> this whole code can just become a=1, and with mirrored code you can get >> a=b=1, which is an sdep | rfe cycle. > > We regard sdep as extending from loads (or sets of loads) to stores. > (Perhaps the paper doesn't state this explicitly -- it should.) So an > sdep ; sdep sequence is not possible. > > Nevertheless, it's arguable that in your example there is no semantic > dependency from the load of b to the load of x[i]. Given the code > shown, a compiler could replace the load of x[i] with the constant value > &a, yielding simply *(&a) = 1. > > There are other examples which do make the point, however. For example: > > int a = 1, b = 1, c = 2; > int *x[] = {&a, &b, &c}; > > int r1 = i; > if (r1 > 1) > r1 = 1; > int *r2 = x[r1]; > int r3 = *r2; > q = r3; > > Here there is a semantic dependency (if you accept dependencies to > loads) from the load of i to the load of x[r1] and from the load of *r2 > to the store to q. But overall the value stored to q is always 1, so it > doesn't depend on the load from i. > >> - I did not like the phrasing that a dependency is a function of one >> execution, especially in contrast to source code. For a fixed execution, >> there are no dependencies because all values are fixed. It is the other >> executions - where you could have read another value - that create the >> dependencies. >> >> Perhaps it is better to say that it is not a *local* function of source >> code, i.e., just because the same source code has a dependency in context C >> does not mean that it has a dependency in context C'. > > Of course, what we meant is that whether or not there is a semantic > dependency depends on both the source code and the execution. > >> In fact I would say a major gotcha of dependencies is that they are not a >> function of even the set of all permissible (according to the standard) >> executions. >> That is because the compiler does not have to make every permissible >> execution possible, only at least one. > > The paper includes examples of this, I believe. 
> >> If the compiler knows that among all executions that it actually makes >> possible, some value is always 0 - even if there is a permissible execution >> in which it is not 0 - it can still replace that value with 0. E.g. >> >> T1 { x = 1; x = 0; } >> T2 { T[x] = 1; } >> >> ~~ merge -- makes executions where T[x] reads x==1 impossible ~> >> >> T1' { x = 1; x = 0; T[x] = 1; } >> >> ~~ replace ~> >> >> T1' { x = 1; x = 0; T[0] = 1; // no more dependency :( } >> >> which defeats any semantic dependency definition that is a function of a >> single execution. > > Our wording could be improved. We simply want to make the point that > the same source code may have a semantic dependency in one execution but > not in another. I guess we should avoid calling it a "function" of the > execution, to avoid implying that the execution alone is what matters. > >> Of course you avoid this by making it really a function of the compiler + >> program *and* the execution, and looking at all the other possible (not just >> permissible) executions. >> >> - On the one hand, that definition makes a lot of sense. On the other hand, >> at least without the atomics=volatile restriction it would have the downside >> that a compiler which generates just a single execution for your program can >> say that there are no dependencies whatsoever and generate all kinds of "out >> of thin air" behaviors. > > That is so. But a compiler which examines only a single thread at a > time cannot afford to generate just a single execution, because it > cannot know what values the loads will obtain when the full program > runs. > >> I am not sure if that gets really resolved by the volatile restrictions you >> put, but either way those seem far stronger than what one would want. > > It does get resolved, because treating atomics as volatile also means > that the compiler cannot afford to generate just a single execution. > Again, because it cannot know what values the loads will obtain at > runtime, since volatile loads can yield any value in a non-benign > environment. > >> I would say that the approach with volatile is overzealous because it tries >> to create a "local" order solution to the problem that only requires a >> "global" ordering solution. Since not every semantic dependency needs to >> provide order in C++ -- only the cycle of dependencies -- it is totally ok >> to add too many semantic dependency edges to a program, even those that are >> not going to be exactly maintained by every compiler, as long as we can >> ensure that globally, no dependency cycle occurs. > > But then how would you characterize semantic dependencies, if you want > to allow the definition to include some dependencies that aren't > semantic but not so many that you ever create a cycle? This sounds like > an even worse problem than we started with! > >> So for example, if we merge x = y || y = x, the merge can turn it into >> x=y=x or y=x=y (and then into an empty program), but not into a cyclic >> dependency. So even though one side of the dependency may be violated, for >> sake of OOTA, we could still label both sides as dependent. > > They _are_ both semantically dependent (in the original parallel > version, I mean). I don't see what merging has to do with it. > >> For LKMM the problem is of course much easier because you have volatiles and >> compiler barriers. Again you could maybe add incorrect semantic dependencies >> between accesses, as long as only the really preserved ones will imply >> ordering. 
> > I'm not particularly concerned about OOTA or semantic dependencies in > LKMM. > >> So I'm not convinced that for either of the two cases you need to do a >> compiler-specific definition of dependencies. > > For the C++ case, I cannot think of any other way to approach the > problem. Nor do I see anything wrong with a compiler-specific > definition, given how nebulous the whole idea is in the first place. > >> BTW, for what it's worth, Dat3M in a sense uses the clang dependencies - it >> first allows the compiler to do its optimizations, and then verifies the >> llvm-ir (with a more hardware-like dependency definition). > > What do you mean by "verifies"? What property of the llvm-ir does it > verify? Let me be more precise here. The Dat3M verifier has its own IR and it can verify 3 kinds of properties: - assertions written in the code - lack of liveness violations (e.g., a spinloop does not hang) - other properties written in the .cat file as "flag ~empty(r)", data races in LKMM/C fit in this category. Dat3M has several parsers to create its own IR: it can read the litmus format of herd, but also C code by first converting to llvm-ir, and then parsing this one. In the case of verifying kernel code wrt LKMM we do some magic such that anything related to the "LKMM concurrency API" (things like atomic_cmpxchg) appear as function calls in the llvm-ir. This allows us to parse this as the concrete type of event we want. Hernan > >> I think something like that can be a good practical solution with fewer >> problems than other attempts to approximate the solution. > > I do not like the idea of tying the definition of OOTA (which needs to > apply to every implementation) to a particular clang compiler. > > Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 23:02 ` Alan Stern 2025-01-17 8:34 ` Hernan Ponce de Leon @ 2025-01-17 11:29 ` Jonas Oberhauser 2025-01-17 20:01 ` Alan Stern 2025-01-17 15:52 ` Alan Stern 2 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-17 11:29 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/17/2025 um 12:02 AM schrieb Alan Stern: > On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: >> I'm well aware that an absolute definition of semantic dependencies is not >> easy to give. > > In fact it's undecidable. No tool is ever going to be able to detect > semantic dependencies perfectly. It depends. Firstly, let's consider that the tool only runs on finite (or "finitized") programs. Seceondly, your definition depends on the compiler. So in the sense that we don't know the compiler, it is undecidable. But if you fix the compiler, you could still enumerate all executions under that compiler and compute whether that compiler has a dependency or not. But as I mentioned before, I think you can define semantic dependencies more appropriately because you don't really need to preserve semantic dependencies in C++, and in LKMM (and VMM) you have volatiles that restrict what kind of dependency-eliminations the compiler can do. >> >> int * x[] = { &a, &a }; >> int i = b; >> *x[i] = 1; >> >> here the semantic address dependency from loading b to loading x[i] and the >> semantic address dependency from loading x[i] and storing to *x[i] do not >> together form a semantic dependency anymore, because *x[i] is always a. So >> this whole code can just become a=1, and with mirrored code you can get >> a=b=1, which is an sdep | rfe cycle. > > We regard sdep as extending from loads (or sets of loads) to stores. > (Perhaps the paper doesn't state this explicitly -- it should.) So an > sdep ; sdep sequence is not possible. I see. There is a line in the introduction ("Roughly speaking, there is a semantic dependency from a given load to a given store when all other things being equal, a change in the value loaded can change the value stored or prevent the store from occurring at all"), but I read it too carelessly and treated it as an implication rather than an iff. I suppose at some point you anyways replace the definition of OOTA with a version where only semantic dependencies from loads to a store are considered, so the concern becomes irrelevant then. >> - On the one hand, that definition makes a lot of sense. On the other hand, >> at least without the atomics=volatile restriction it would have the downside >> that a compiler which generates just a single execution for your program can >> say that there are no dependencies whatsoever and generate all kinds of "out >> of thin air" behaviors. > > That is so. But a compiler which examines only a single thread at a > time cannot afford to generate just a single execution, because it > cannot know what values the loads will obtain when the full program > runs. > >> I am not sure if that gets really resolved by the volatile restrictions you >> put, but either way those seem far stronger than what one would want. > > It does get resolved, because treating atomics as volatile also means > that the compiler cannot afford to generate just a single execution. 
> Again, because it cannot know what values the loads will obtain at > runtime, since volatile loads can yield any value in a non-benign > environment. Yes. Actually I wonder if you put this "all loads are volatile" restriction, can a globally analysing compiler still have any optimizations that a locally analysing compiler can not? It rather seems the other way, that the locally analysing quasi-volatile compiler can at least do some local optimizations, while the global volatile compiler can not (e.g., x=1; y=x; can not be x=1; y=1; for the global compiler because x is volatile now). >> I would say that the approach with volatile is overzealous because it tries >> to create a "local" order solution to the problem that only requires a >> "global" ordering solution. Since not every semantic dependency needs to >> provide order in C++ -- only the cycle of dependencies -- it is totally ok >> to add too many semantic dependency edges to a program, even those that are >> not going to be exactly maintained by every compiler, as long as we can >> ensure that globally, no dependency cycle occurs. > > But then how would you characterize semantic dependencies, if you want > to allow the definition to include some dependencies that aren't > semantic but not so many that you ever create a cycle? I don't know which definition is correct yet, but the point is that you don't have to avoid making so many dependencies that you would create a cycle. It just forbids the compiler from looking for cycles and optimizing based on the existence of the cycle. (Looking for unused vars and removing those is still allowed, under the informal argument that this simulates an execution where no OOTA happened) > This sounds like > an even worse problem than we started with! > > >> So for example, if we merge x = y || y = x, the merge can turn it into >> x=y=x or y=x=y (and then into an empty program), but not into a cyclic >> dependency. So even though one side of the dependency may be violated, for >> sake of OOTA, we could still label both sides as dependent. > > They _are_ both semantically dependent (in the original parallel > version, I mean). I don't see what merging has to do with it. Note that I was considering the case where they are not volatile. With a compiler that is not treating them as volatile, which merges the two threads, under your definition, there is no semantic dependency in at least one direction because there is no hardware realization H where you read something else (of course you exclude such compilers, but I think realistically they should be allowed). My point is that we can say they are semantically dependent for the sake of OOTA, not derive any ordering from these dependencies other than the cyclical one, and therefore allow compilers to do one of the two optimizations (make x=y no longer depend on y or make y=x no longer depend on x) but not make a cycle analysis to remove both dependencies and generate an OOTA value (it can remove both dependencies by leaving x and y unchanged though). >> So I'm not convinced that for either of the two cases you need to do a >> compiler-specific definition of dependencies. > > For the C++ case, I cannot think of any other way to approach the > problem. Nor do I see anything wrong with a compiler-specific > definition, given how nebulous the whole idea is in the first place.
> >> BTW, for what it's worth, Dat3M in a sense uses the clang dependencies - it >> first allows the compiler to do its optimizations, and then verifies the >> llvm-ir (with a more hardware-like dependency definition). > > What do you mean by "verifies"? What property of the llvm-ir does it > verify? Verify that the algorithm, e.g., qspinlock, has no executions that are buggy, e.g., have a data race on the critical section. It does so for a fixed test case with a small number of threads, i.e., a finite program. >> I think something like that can be a good practical solution with fewer >> problems than other attempts to approximate the solution. > > I do not like the idea of tying the definition of OOTA (which needs to > apply to every implementation) to a particular clang compiler. But that is what you have done, no? Whether something is an sdep depends on the compiler, so compiler A could generate an execution that is OOTA in the sdep definition of compiler B. (Of course with the assumption of atomic=volatile, it may just be that we are back to the beginning and all "naive" semantic dependencies are actually semantic dependencies for all compilers). Anyways what I meant is not about tying the definition of OOTA to one compiler or other. As I mentioned I think it can be fine to define OOTA in the same way for all compilers. What I meant is to specialize the memory model to a specific compiler, as long as that is the compiler that is used in reality. So long as your code does not depend on the ordering of any semantic dependencies, the verification can be cross platform. And although... > I'm not particularly concerned about OOTA or semantic dependencies in LKMM. ... there is code that relies on semantic dependencies, e.g. RCU read side CS code. (even if we do not care about OOTA). For that code, the semantic dependencies must be guaranteed to create ordering. So you either need a definition of semantic dependency that a) applies in all cases we practically need and b) is guaranteed by all compilers or we need to live with the fact that we do not have a semantic dependency definition that is independent of the compilation (even of the same compiler) and need to do our verification for that specific compilation. I think for LKMM we could give such a semantic dependency definition because it uses volatile, and verify RCU-read-side code. But we currently do not have one. What I meant to say is that using the actual (whatever compiler you use) optimizations first to remove syntactic dependencies, and then verifying under the assumption of whatever dependencies are left, may be better than trying to approximate dependencies in some way in cat. Given that we want to verify and rely on the code today, not in X years when we all agree on what a compiler-independent definition of semantic dependency is. I think for C++ consume we could also give one by simply restricting some compiler optimizations for consume loads (and doing whatever needs to be done on alpha). Or just kick it out and not have any dependency ordering except the global OOTA case. Sorry for the confusion, I think there are so many different combinations/battlefields (OOTA vs just semantic dependencies, volatile/non-volatile atomics, verifying the model vs verifying a piece of code etc.) that it becomes hard for me not to confuse myself, let alone others :)) Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-17 11:29 ` Jonas Oberhauser @ 2025-01-17 20:01 ` Alan Stern 2025-01-21 10:36 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-17 20:01 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Fri, Jan 17, 2025 at 12:29:34PM +0100, Jonas Oberhauser wrote: > > > Am 1/17/2025 um 12:02 AM schrieb Alan Stern: > > On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: > > > I'm well aware that an absolute definition of semantic dependencies is not > > > easy to give. > > > > In fact it's undecidable. No tool is ever going to be able to detect > > semantic dependencies perfectly. > > It depends. > Firstly, let's consider that the tool only runs on finite (or "finitized") > programs. A program may be finite; that doesn't mean its executions are. Programs can go into infinite loops. Are you saying that the tool would stop verifying executions after (say) a billion steps? But then it would not be able to detect semantic dependences in programs that run longer. > Seceondly, your definition depends on the compiler. > > So in the sense that we don't know the compiler, it is undecidable. > But if you fix the compiler, you could still enumerate all executions under > that compiler and compute whether that compiler has a dependency or not. You can't enumerate the executions unless you put an artificial bound on the length of each execution, as noted above. > But as I mentioned before, I think you can define semantic dependencies more > appropriately because you don't really need to preserve semantic > dependencies in C++, and in LKMM (and VMM) you have volatiles that restrict > what kind of dependency-eliminations the compiler can do. What makes you think this "more appropriate" definition of semantic dependency will be any easier to detect than the original one? > Yes. Actually I wonder if you put this "all loads are volatile" restriction, > can a globally analysing compiler still have any optimizations that a > locally analysing compiler can not? Yes, although whether they are pertinent is open to question. For example, a globally optimizing compiler may observe that no thread ever reads the value of a particular shared variable and then eliminate that variable. > It rather seems the other way, that the locally analysing quasi-volatile > compiler can at least to some local optimizations, while the global volatile > compiler can not (e.g., x=1; y=x; can not be x=1; y=1; for the global > compiler because x is volatile now). In the paper we speculate that it may be sufficient to require globally optimizing compilers to treat atomics as quasi volatile with the added restriction that loads must not be omitted. > > But then how would you characterize semantic dependencies, if you want > > to allow the definition to include some dependencies that aren't > > semantic but not so many that you ever create a cycle? > > I don't know which definition is correct yet, but the point is that you > don't have to avoid making so many dependencies that you would create a > cycle. It just forbids the compiler from looking for cycles and optimizing > based on the existance of the cycle. 
(Looking for unused vars and removing > those is still allowed, under the informal argument that this simulates an > execution where no OOTA happened) At the moment, the only underlying ideas we have driving our notions of semantic dependency is that a true semantic dependency should be one which must give rise to order in the machine code. In turn, this order prevents OOTA cycles from occuring during execution. That is the essence of the paper. Can your ideas be fleshed out to a comparable degree? > > > So for example, if we merge x = y || y = x, the merge can turn it into > > > x=y=x or y=x=y (and then into an empty program), but not into a cyclic > > > dependency. So even though one side of the dependency may be violated, for > > > sake of OOTA, we could still label both sides as dependent. > > > > They _are_ both semantically dependent (in the original parallel > > version, I mean). I don't see what merging has to do with it. > > Note that I was considering the case where they are not volatile. > With a compiler that is not treating them as volatile, which merges the two > threads, under your definition, there is no semantic dependency in at least > one direction because there is no hardware realization H where you read > something else (of course you exclude such compilers, but I think > realistically they should be allowed). > > My point is that we can say they are semantically dependent for the sake of > OOTA, not derive any ordering from these dependencies other than the > cyclical one, and therefore allow compilers to one of the two optimizations > (make x=y no longer depend on y or make y=x no longer depend on x) but noth > make a cycle analysis to remove both dependencies and generate an OOTA value > (it can remove both dependencies by leaving x and y unchanged though). I don't understand. > > I do not like the idea of tying the definition of OOTA (which needs to > > apply to every implementation) to a particular clang compiler. > > But that is what you have done, no? Whether something is an sdep depends on > the compiler, so compiler A could generate an execution that is OOTA in the > sdep definition of compiler B. Yes, but this does not tie the definition to _one particular_ compiler. That is, we don't say "This dependency is semantic because of the way GCC 14.2.1 handles it." Rather, we define for each compiler whether a dependency is semantic for _that_ compiler. > (Of course with the assumption of atomic=volatile, it may just be that we > are back to the beginning and all "naive" semantic dependencies are actually > semantic dependencies for all compilers). > > Anyways what I meant is not about tying the definition of OOTA to one > compiler or other. As I mentioned I think it can be fine to define OOTA in > the same way for all compilers. > What I meant is to specialize the memory model to a specific compiler, as > long as that is the compiler that is used in reality. > So long as your code does not depend on the ordering of any semantic > dependencies, the verification can be cross platform. > > And although... > > > I'm not particularly concerned about OOTA or semantic dependencies in > LKMM. > > ... there is code that relies on semantic dependencies, e.g. RCU read side > CS code. (even if we do not care about OOTA). > For that code, the semantic dependencies must be guaranteed to create > ordering. 
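(A typical instance of the dependency-reliant code meant here is presumably the usual rcu_dereference() address-dependency pattern; the names gp, p, r1 and the field a below are made up for the sketch:)

    int r1 = 0;
    struct foo *p;

    rcu_read_lock();
    p = rcu_dereference(gp);    /* marked load of the RCU-protected pointer */
    if (p)
        r1 = p->a;              /* plain load, kept ordered after the pointer
                                   load by the address dependency that
                                   rcu_dereference() provides */
    rcu_read_unlock();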
> > So you either need a definition of semantic dependency that > a) applies in all cases we practically need and > b) is guaranteed by all compilers > > or we need to live with the fact that we do not have a semantic dependency > definition that is independent of the compilation (even of the same > compiler) and need to do our verification for that specific compilation. Add the qualification that the definition should be practical to evaluate, and I agree. > I think for LKMM we could give such a semantic dependency definition because > it uses volatile, and verify RCU-read-side code. But we currently do not > have one. What I meant to say is that using the actual (whatever compiler > you use) optimizations first to remove syntactic dependencies, and then > verifying under the assumption of whatever dependencies are left, may be > better than trying to approximate dependencies in some way in cat. Given > that we want to verify and rely on the code today, not in X years when we > all agree on what a compiler-independent definition of semantic dependency > is. > > I think for C++ consume we could also give one by simply restricting some > compiler optimizations for consume loads (and doing whatever needs to be > done on alpha). Or just kick it out and not have any dependency ordering > except the global OOTA case. > > Sorry for the confusion, I think there are so many different > combinations/battlefields (OOTA vs just semantic dependencies, > volatile/non-volatile atomics, verifying the model vs verifying a piece of > code etc.) that it becomes hard for me not to confuse myself, let alone > others :)) I know what that feels like! Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-17 20:01 ` Alan Stern @ 2025-01-21 10:36 ` Jonas Oberhauser 2025-01-21 16:39 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-21 10:36 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/17/2025 um 9:01 PM schrieb Alan Stern: > On Fri, Jan 17, 2025 at 12:29:34PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/17/2025 um 12:02 AM schrieb Alan Stern: >>> On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: >>>> I'm well aware that an absolute definition of semantic dependencies is not >>>> easy to give. >>> >>> In fact it's undecidable. No tool is ever going to be able to detect >>> semantic dependencies perfectly. >> >> It depends. >> Firstly, let's consider that the tool only runs on finite (or "finitized") >> programs. > > A program may be finite; that doesn't mean its executions are. Programs > can go into infinite loops. Are you saying that the tool would stop > verifying executions after (say) a billion steps? But then it would not > be able to detect semantic dependences in programs that run longer. Yes, what I said is not fully correct. Finite (non-trivial-)loop-free programs, or programs with finite (representative) execution spaces, would have been more correct. > >> Seceondly, your definition depends on the compiler. >> >> So in the sense that we don't know the compiler, it is undecidable. >> But if you fix the compiler, you could still enumerate all executions under >> that compiler and compute whether that compiler has a dependency or not. > > You can't enumerate the executions unless you put an artificial bound on > the length of each execution, as noted above. Note that for many practical cases it is possible to write test cases where the bound is not artificial in the sense that it is at least as large as the bound needed for exhaustive enumeration. >> But as I mentioned before, I think you can define semantic dependencies more >> appropriately because you don't really need to preserve semantic >> dependencies in C++, and in LKMM (and VMM) you have volatiles that restrict >> what kind of dependency-eliminations the compiler can do. > > What makes you think this "more appropriate" definition of semantic > dependency will be any easier to detect than the original one? For starters, as you mentioned, the compiler has to assume any value is possible. Which means that if any other value would lead to a "different behavior", you know you have a semantic dependency. You can detect this in a per-thread manner, independently of the compiler. Without the assumption of volatile, even a different value that could actually be generated in another run of the same program does not prove a semantic dependency. >> Yes. Actually I wonder if you put this "all loads are volatile" restriction, >> can a globally analysing compiler still have any optimizations that a >> locally analysing compiler can not? > > Yes, although whether they are pertinent is open to question. For > example, a globally optimizing compiler may observe that no thread ever > reads the value of a particular shared variable and then eliminate that > variable. Oh, I meant the "all atomic objects is volatile" restriction, not just the loads. In that case, all stores the object - even if never read - still need to be generated. 
Are there still any optimizations? >> It rather seems the other way, that the locally analysing quasi-volatile >> compiler can at least to some local optimizations, while the global volatile >> compiler can not (e.g., x=1; y=x; can not be x=1; y=1; for the global >> compiler because x is volatile now). > > In the paper we speculate that it may be sufficient to require globally > optimizing compilers to treat atomics as quasi volatile with the added > restriction that loads must not be omitted. I see. >>> But then how would you characterize semantic dependencies, if you want >>> to allow the definition to include some dependencies that aren't >>> semantic but not so many that you ever create a cycle? >> >> I don't know which definition is correct yet, but the point is that you >> don't have to avoid making so many dependencies that you would create a >> cycle. It just forbids the compiler from looking for cycles and optimizing >> based on the existance of the cycle. (Looking for unused vars and removing >> those is still allowed, under the informal argument that this simulates an >> execution where no OOTA happened) > > At the moment, the only underlying ideas we have driving our notions of > semantic dependency is that a true semantic dependency should be one > which must give rise to order in the machine code. In turn, this order > prevents OOTA cycles from occuring during execution. That is the > essence of the paper. > > Can your ideas be fleshed out to a comparable degree? Well, I certainly have not put as much deep thought into it as you have, so it is certainly more fragile. But my current thoughts are along these lines: We consider inter-thread semantic dependencies (isdep) based on the set of allowed executions + thread local optimizations + what would be allowed to happen if rfe edges along the dependencies become rfi edges due to merging. So the definition is not compiler-specific. Those provide order at machine level unless the compiler restricts the set of executions through its choices, especially cross-thread optimizations, and then uses the restricted set of executions (i.e., possible input range) to optimize the execution further. I haven't thought deeply about all the different optimizations that are possible there. For an example of how it may not provide order, if we have accesses related by isdep;rfe;isdep, then the compiler could merge all the involved threads into one, and the accesses could no longer have any dependency. So it is important that one can not forbid isdep;rf cycles, the axiom would be that isdep;rf is irreflexive. When merging threads, if in an OOTA execution there is an inter-thread semantic dependency from r in one of the merged threads to w in one of the merged threads, such that r reads from a non-merged thread and w is read by a non-merged thread in the OOTA cycle, then it is still an inter-thread dependency which still preserves the order. This prevents the full cycle even if it some other sub-edges within the merged threads are now no longer dependencies. But if r is reading from w in the OOTA cycle, then the compiler (which is not allowed to look for the cycle) has to put r before w in the merged po, preventing r from reading w. (it can also omit r by giving it a value that it could legally read in this po, but it still can't get its value from w this way). E.g., P0 { y = x; } P1 { z = y; } ... (some way to assign x dependent on z) would have an inter-thread dependency from the load of x to the store to z. 
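(Spelled out, with the "..." part left abstract and showing just one of the legal merge orders, the merge could proceed like this; the temporary r is an illustrative name:)

    /* before: two separate threads */
    P0 { y = x; }
    P1 { z = y; }

    /* after merging P0 and P1 into a single thread */
    P01 { y = x; z = y; }

    /* after forwarding the stored value (assuming no other accesses to y) */
    P01 { int r = x; y = r; z = r; }    /* y may even be dropped if it is dead */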
If P0 and P1 are merged, and there are no other accesses to y, then the intra-thread dependency from loading x to store to y etc. are eliminated, but the inter-thread dependency from x to z remains. >>>> So for example, if we merge x = y || y = x, the merge can turn it into >>>> x=y=x or y=x=y (and then into an empty program), but not into a cyclic >>>> dependency. So even though one side of the dependency may be violated, for >>>> sake of OOTA, we could still label both sides as dependent. >>> >>> They _are_ both semantically dependent (in the original parallel >>> version, I mean). I don't see what merging has to do with it. >> >> Note that I was considering the case where they are not volatile. >> With a compiler that is not treating them as volatile, which merges the two >> threads, under your definition, there is no semantic dependency in at least >> one direction because there is no hardware realization H where you read >> something else (of course you exclude such compilers, but I think >> realistically they should be allowed). >> >> My point is that we can say they are semantically dependent for the sake of >> OOTA, not derive any ordering from these dependencies other than the >> cyclical one, and therefore allow compilers to one of the two optimizations >> (make x=y no longer depend on y or make y=x no longer depend on x) but not >> make a cycle analysis to remove both dependencies and generate an OOTA value >> (it can remove both dependencies by leaving x and y unchanged though). > > I don't understand. Does the explanation above help? > >>> I do not like the idea of tying the definition of OOTA (which needs to >>> apply to every implementation) to a particular clang compiler. >> >> But that is what you have done, no? Whether something is an sdep depends on >> the compiler, so compiler A could generate an execution that is OOTA in the >> sdep definition of compiler B. > > Yes, but this does not tie the definition to _one particular_ compiler. > That is, we don't say "This dependency is semantic because of the way > GCC 14.2.1 handles it." Rather, we define for each compiler whether a > dependency is semantic for _that_ compiler. Yes, but the result is still that every compiler has its own memory model (at least at the point of which graphs are being ruled out as OOTA). So there is no definition of 'G is OOTA', only 'G is OOTA on compiler x'. The example I gave of the tool verifying the program relative to one specific compiler is also not giving a definition of 'G is OOTA', in fact, it does not specify OOTA at all; it simply says ``with compiler x, we get the following "semantic dependencies" and the following graphs...'' And as long as compiler x does not generate OOTA, there will be no OOTA graphs among those. So it does not solve or tackle the theoretical problem in any way, and can not do cross-compiler verification. But it will already apply the 'naive-dependency-breaking optimizations' that you would not see in e.g. herd7. >> So you either need a definition of semantic dependency that >> a) applies in all cases we practically need and >> b) is guaranteed by all compilers >> >> or we need to live with the fact that we do not have a semantic dependency >> definition that is independent of the compilation (even of the same >> compiler) and need to do our verification for that specific compilation. > > Add the qualification that the definition should be practical to > evaluate, and I agree. Yes, an important point. And hard to resolve as well. 
Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-21 10:36 ` Jonas Oberhauser @ 2025-01-21 16:39 ` Alan Stern 2025-01-22 3:46 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-21 16:39 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Tue, Jan 21, 2025 at 11:36:01AM +0100, Jonas Oberhauser wrote: > > What makes you think this "more appropriate" definition of semantic > > dependency will be any easier to detect than the original one? > > For starters, as you mentioned, the compiler has to assume any value is > possible. > > Which means that if any other value would lead to a "different behavior", > you know you have a semantic dependency. You can detect this in a per-thread > manner, independently of the compiler. How? The question is not as simple as it sounds. What counts as "different behavior"? What if some loads take place with the other value that didn't take place with the original value? What if completely different code ends up executing but it stores the same values as before? > Without the assumption of volatile, even a different value that could > actually be generated in another run of the same program does not prove a > semantic dependency. So then how do you tell whether there is a semantic dependency? > > > Yes. Actually I wonder if you put this "all loads are volatile" restriction, > > > can a globally analysing compiler still have any optimizations that a > > > locally analysing compiler can not? > > > > Yes, although whether they are pertinent is open to question. For > > example, a globally optimizing compiler may observe that no thread ever > > reads the value of a particular shared variable and then eliminate that > > variable. > > Oh, I meant the "all atomic objects is volatile" restriction, not just the > loads. In that case, all stores the object - even if never read - still need > to be generated. > > Are there still any optimizations? Perhaps not any that affect shared variable accesses. In a way, that was the intention. > Well, I certainly have not put as much deep thought into it as you have, so > it is certainly more fragile. But my current thoughts are along these lines: > > We consider inter-thread semantic dependencies (isdep) based on the set of > allowed executions + thread local optimizations + what would be allowed to > happen if rfe edges along the dependencies become rfi edges due to merging. > So the definition is not compiler-specific. How do you know what thread-local optimizations may be applied? Different compilers use different ones, and new ones are constantly being invented. Why not consider global optimizations? Yes, they are the same as thread-local optimizations if all the threads have been merged into one, but what if they haven't been merged? For that matter, why bring thread merging into the discussion? > Those provide order at machine level unless the compiler restricts the set > of executions through its choices, especially cross-thread optimizations, > and then uses the restricted set of executions (i.e., possible input range) > to optimize the execution further. > I haven't thought deeply about all the different optimizations that are > possible there. 
> > For an example of how it may not provide order, if we have accesses related > by isdep;rfe;isdep, then the compiler could merge all the involved threads > into one, and the accesses could no longer have any dependency. > > So it is important that one can not forbid isdep;rf cycles, the axiom would > be that isdep;rf is irreflexive. > > > When merging threads, if in an OOTA execution there is an inter-thread > semantic dependency from r in one of the merged threads to w in one of the > merged threads, such that r reads from a non-merged thread and w is read by > a non-merged thread in the OOTA cycle, then it is still an inter-thread > dependency which still preserves the order. This prevents the full cycle > even if it some other sub-edges within the merged threads are now no longer > dependencies. > > But if r is reading from w in the OOTA cycle, then the compiler (which is > not allowed to look for the cycle) has to put r before w in the merged po, > preventing r from reading w. (it can also omit r by giving it a value that > it could legally read in this po, but it still can't get its value from w > this way). > > E.g., > P0 { > y = x; > } > P1 { > z = y; > } > ... (some way to assign x dependent on z) > > would have an inter-thread dependency from the load of x to the store to z. > If P0 and P1 are merged, and there are no other accesses to y, then the > intra-thread dependency from loading x to store to y etc. are eliminated, > but the inter-thread dependency from x to z remains. > > I don't understand. > > Does the explanation above help? No. I don't want to think about thread merging, and at first glance it looks like you don't want to think about anything else. > > > > I do not like the idea of tying the definition of OOTA (which needs to > > > > apply to every implementation) to a particular clang compiler. > > > > > > But that is what you have done, no? Whether something is an sdep depends on > > > the compiler, so compiler A could generate an execution that is OOTA in the > > > sdep definition of compiler B. > > > > Yes, but this does not tie the definition to _one particular_ compiler. > > That is, we don't say "This dependency is semantic because of the way > > GCC 14.2.1 handles it." Rather, we define for each compiler whether a > > dependency is semantic for _that_ compiler. > > Yes, but the result is still that every compiler has its own memory model > (at least at the point of which graphs are being ruled out as OOTA). So > there is no definition of 'G is OOTA', only 'G is OOTA on compiler x'. Our definition of OOTA and semantic dependency does not apply to graphs; it applies to particular hardware executions (together with the abstract executions that they realize). Besides, it is still possible to use our definition to talk about semantic dependency in a compiler-independent way. Namely, if a dependency is semantic relative to _every_ compiler then we can say it is absolutely semantic. > The example I gave of the tool verifying the program relative to one > specific compiler is also not giving a definition of 'G is OOTA', in fact, > it does not specify OOTA at all; it simply says ``with compiler x, we get > the following "semantic dependencies" and the following graphs...'' > > And as long as compiler x does not generate OOTA, there will be no OOTA > graphs among those. > > So it does not solve or tackle the theoretical problem in any way, and can > not do cross-compiler verification. 
But it will already apply the > 'naive-dependency-breaking optimizations' that you would not see in e.g. > herd7. Okay, fine. But we're trying to come up with general characterizations, and it appears that you're doing something quite different. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-21 16:39 ` Alan Stern @ 2025-01-22 3:46 ` Jonas Oberhauser 2025-01-22 19:11 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-22 3:46 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/21/2025 um 5:39 PM schrieb Alan Stern: > On Tue, Jan 21, 2025 at 11:36:01AM +0100, Jonas Oberhauser wrote: >>> What makes you think this "more appropriate" definition of semantic >>> dependency will be any easier to detect than the original one? >> >> For starters, as you mentioned, the compiler has to assume any value is >> possible. >> >> Which means that if any other value would lead to a "different behavior", >> you know you have a semantic dependency. You can detect this in a per-thread >> manner, independently of the compiler. > > How? The question is not as simple as it sounds. What counts as > "different behavior"? What if some loads take place with the other > value that didn't take place with the original value? What if > completely different code ends up executing but it stores the same > values as before? I agree that it is not easy, mostly because there's no good way to say whether two stores from two executions are "the same". Besides the complications mentioned in your paper (e.g., another store to the same address being on the new code path), I would also need to think about barriers (e.g., a relaxed store in one execution, but a sc one in another execution), other kinds of synchronization (what if there are other atomic accesses in between that establish synchronization?). One could take your definition (with the requirements about H dropped), and perhaps since in this specific example all atomic accesses are volatile, the restriction about counting accesses does not sting as much. > >> Without the assumption of volatile, even a different value that could >> actually be generated in another run of the same program does not prove a >> semantic dependency. > > So then how do you tell whether there is a semantic dependency? > >>>> Yes. Actually I wonder if you put this "all loads are volatile" restriction, >>>> can a globally analysing compiler still have any optimizations that a >>>> locally analysing compiler can not? >>> >>> Yes, although whether they are pertinent is open to question. For >>> example, a globally optimizing compiler may observe that no thread ever >>> reads the value of a particular shared variable and then eliminate that >>> variable. >> >> Oh, I meant the "all atomic objects is volatile" restriction, not just the >> loads. In that case, all stores the object - even if never read - still need >> to be generated. >> >> Are there still any optimizations? > > Perhaps not any that affect shared variable accesses. In a way, that > was the intention. Yes, but it becomes a bit strange then to treat the "globally analyzing compiler" as a harder problem. You have made a globally analyzing compiler that can not globally analyze. I understand that makes the argument feasible, but I am not sure if this is how compilers really behave (or at least will still behave in the future). >> Well, I certainly have not put as much deep thought into it as you have, so >> it is certainly more fragile. 
But my current thoughts are along these lines: >> >> We consider inter-thread semantic dependencies (isdep) based on the set of >> allowed executions + thread local optimizations + what would be allowed to >> happen if rfe edges along the dependencies become rfi edges due to merging. >> So the definition is not compiler-specific. > > How do you know what thread-local optimizations may be applied? > Different compilers use different ones, and new ones are constantly > being invented. I'm not sure it is necessary to know specific optimizations. I don't have a satisfactory answer now. But perhaps it is possible to define this through the abstract machine, maybe with a detour through the set of allowed C realizations. Something like "given some execution G, and a subset of threads (T)i, and a read r and write w in those executions, if for every C realization of threads (T)i together such that all executions of the realization under all rf relations that exist in some original execution G' modulo some po-preserving mapping have the same observable side effects as that original execution G', there is a sequence of syntactic dependency + rf from r to w, then there is an inter-thread semantic dependency from r to w". But this makes it really hard to prove that there is a dependency, since it quantifies over all C programs. And it only takes into account C-level optimizations, not optimizations specific to some hardware platform (which for all we know may have some transactional memory or powerful write speculation which the compiler knows about and uses in its optimizations). I am not sure there is a better definition. > Why not consider global optimizations? Yes, they are the same as > thread-local optimizations if all the threads have been merged into one, > but what if they haven't been merged? Some global optimizations are considered by the fact that we only consider the set of allowed executions. So if the compiler can see that in all executions the value of some variable is always 0 or 1 - for example, because those are the only kinds of stores to that variable - it might use that to do local optimizations. Such as eliminating some switch cases which then lead to eliminating control dependencies. > For that matter, why bring thread merging into the discussion? For one, it sounds impractical to make a compiler that does advanced global optimizations without merging the threads. It's probably easy enough to do on toy examples, but for realistic applications it seems completely infeasible, and additionally, hard to gain much benefit from it. So I'm not sure it is necessary to solve the more advanced problem. For another, thread merging defeats per-thread semantic dependencies. So unless all optimizations related to that are ruled out (e.g., by saying all accesses are volatile), it needs to be considered either specifically or by a more powerful simpler abstraction. Which I don't have. > > I don't want to think about thread merging, and at first glance it > looks like you don't want to think about anything else. I *want* to think about other practical global optimizations that are not included in the optimizations due to knowing the set of possible executions. I just haven't been able to. >>>>> I do not like the idea of tying the definition of OOTA (which needs to >>>>> apply to every implementation) to a particular clang compiler. >>>> >>>> But that is what you have done, no? 
Whether something is an sdep depends on >>>> the compiler, so compiler A could generate an execution that is OOTA in the >>>> sdep definition of compiler B. >>> >>> Yes, but this does not tie the definition to _one particular_ compiler. >>> That is, we don't say "This dependency is semantic because of the way >>> GCC 14.2.1 handles it." Rather, we define for each compiler whether a >>> dependency is semantic for _that_ compiler. >> >> Yes, but the result is still that every compiler has its own memory model >> (at least at the point of which graphs are being ruled out as OOTA). So >> there is no definition of 'G is OOTA', only 'G is OOTA on compiler x'. > > Our definition of OOTA and semantic dependency does not apply to graphs; > it applies to particular hardware executions (together with the abstract > executions that they realize). True, but it immediately induces a predicate over graphs indexed by the compiler (by looking at hardware executions generated by x and the graphs from the realized abstract executions). > > Besides, it is still possible to use our definition to talk about > semantic dependency in a compiler-independent way. Namely, if a > dependency is semantic relative to _every_ compiler then we can say it > is absolutely semantic. Sure, but that definition is useless. For example, in the running OOTA example x=y||y=x, there are no absolutely semantic dependencies. One compiler can turn this into x=y, where there is no dependency from y to x, and another into y=x, where there is no dependency from x to y. In fact even a naive non-optimizing compiler has no dependencies here under your definition because there are no hardware executions with other values. In my informal definition, there is (intended to be) a inter-thread dependency from the load from x in T2 to the store to x in T1, and I would claim there is no execution in which that load can read from that store. I would not claim anything about the order of the load from y and the store to x. >> The example I gave of the tool verifying the program relative to one >> specific compiler is also not giving a definition of 'G is OOTA', in fact, >> it does not specify OOTA at all; it simply says ``with compiler x, we get >> the following "semantic dependencies" and the following graphs...'' >> >> And as long as compiler x does not generate OOTA, there will be no OOTA >> graphs among those. >> >> So it does not solve or tackle the theoretical problem in any way, and can >> not do cross-compiler verification. But it will already apply the >> 'naive-dependency-breaking optimizations' that you would not see in e.g. >> herd7. > > Okay, fine. But we're trying to come up with general characterizations, > and it appears that you're doing something quite different. Well, it wasn't me who did this, it was Hernan's group who did this for other practical concerns, but yes I agree. It's not a solution to the very hard problem you're trying to solve, but a solution to perhaps the more immediate problem people have while looking at real code. Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
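For reference, a minimal sketch of the running "x=y || y=x" example in C11-style relaxed atomics (thread names and bodies are illustrative only, not taken from any particular paper or test suite):

    #include <stdatomic.h>

    atomic_int x, y;

    void T1(void)                           /* "x = y" */
    {
            int r = atomic_load_explicit(&y, memory_order_relaxed);
            atomic_store_explicit(&x, r, memory_order_relaxed);
    }

    void T2(void)                           /* "y = x" */
    {
            int r = atomic_load_explicit(&x, memory_order_relaxed);
            atomic_store_explicit(&y, r, memory_order_relaxed);
    }

    /*
     * OOTA would be both loads returning 42 even though nothing ever stores
     * 42.  A compiler that merges T1;T2 may forward T1's store to x into
     * T2's load, so T2's store to y no longer depends on any load of x;
     * merging in the other order removes the dependency in T1 instead.
     * Hence neither dependency is semantic for *every* compiler, which is
     * the point being made above.
     */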
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-22 3:46 ` Jonas Oberhauser @ 2025-01-22 19:11 ` Alan Stern 0 siblings, 0 replies; 59+ messages in thread From: Alan Stern @ 2025-01-22 19:11 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Wed, Jan 22, 2025 at 04:46:03AM +0100, Jonas Oberhauser wrote: > > > > > Yes. Actually I wonder if you put this "all loads are volatile" restriction, > > > > > can a globally analysing compiler still have any optimizations that a > > > > > locally analysing compiler can not? > > > > > > > > Yes, although whether they are pertinent is open to question. For > > > > example, a globally optimizing compiler may observe that no thread ever > > > > reads the value of a particular shared variable and then eliminate that > > > > variable. > > > > > > Oh, I meant the "all atomic objects is volatile" restriction, not just the > > > loads. In that case, all stores the object - even if never read - still need > > > to be generated. > > > > > > Are there still any optimizations? > > > > Perhaps not any that affect shared variable accesses. In a way, that > > was the intention. > > Yes, but it becomes a bit strange then to treat the "globally analyzing > compiler" as a harder problem. You have made a globally analyzing compiler > that can not globally analyze. I understand that makes the argument > feasible, but I am not sure if this is how compilers really behave (or at > least will still behave in the future). There's still an important difference. A globally optimizing compiler is allowed to generate different object code for a thread (containing the same source code) in different programs, depending on the source for the other threads. A locally analyzing compiler is not allowed to do this; it will always generate the same object code for threads containing the same source code, regardless of what the rest of the program does. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
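A small sketch of that difference (the function names and the do_panic() side effect are made up for the example), using C11 relaxed atomics:

    #include <stdatomic.h>

    extern void do_panic(void);             /* hypothetical observable side effect */

    atomic_int x;

    void reader(void)                       /* identical source in both programs */
    {
            if (atomic_load_explicit(&x, memory_order_relaxed) == 2)
                    do_panic();
    }

    /* Program A: the only writer never stores 2.  A globally optimizing
     * compiler may observe this and compile reader() without the branch
     * (and with it, the control dependency disappears). */
    void writer_A(void)
    {
            atomic_store_explicit(&x, 1, memory_order_relaxed);
    }

    /* Program B: the writer may store any value, so the test must stay.
     * A locally analyzing compiler emits the same code for reader() in
     * both programs, since it never looks at the writers. */
    void writer_B(int v)
    {
            atomic_store_explicit(&x, v, memory_order_relaxed);
    }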
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-16 23:02 ` Alan Stern 2025-01-17 8:34 ` Hernan Ponce de Leon 2025-01-17 11:29 ` Jonas Oberhauser @ 2025-01-17 15:52 ` Alan Stern 2025-01-17 16:45 ` Jonas Oberhauser 2 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-17 15:52 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 16, 2025 at 06:02:18PM -0500, Alan Stern wrote: > On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: > > I would say that the approach with volatile is overzealous because it tries > > to create a "local" order solution to the problem that only requires a > > "global" ordering solution. Since not every semantic dependency needs to > > provide order in C++ -- only the cycle of dependencies -- it is totally ok > > to add too many semantic dependency edges to a program, even those that are > > not going to be exactly maintained by every compiler, as long as we can > > ensure that globally, no dependency cycle occurs. > > But then how would you characterize semantic dependencies, if you want > to allow the definition to include some dependencies that aren't > semantic but not so many that you ever create a cycle? This sounds like > an even worse problem than we started with! An interesting side comment on this issue... This is a slight variation of the example on page 19 (section 4.3) of the paper. (Pretend this is actually C++ code, the shared variables are all atomic, and their accesses are all relaxed.) bool x, y, z; void P0(bool *x, bool *y, bool *z) { bool r1, r2; r1 = *x; r2 = *y; *z = (r1 != r2); } The paper points out that although there is an apparent semantic dependency from the load of x to the store to z, if the compiler is allowed not to handle atomics as quasi volatile then the dependency can be broken. Nevertheless, I am not able to think of a program that could exhibit OOTA as a result of breaking the semantic dependency. The best I can come up with is this: [P0 as above] void P1(bool *x, bool *y, bool *z) { bool r3; r3 = z; x = r3; } void P2(bool *x, bool *y, bool *z) { y = true; } exists (x=true /\ z=true) If P2 were not present, this result could not occur in any physical execution, even if the dependency in P0 is broken. With P2 this result isn't OOTA, even in executions where P0 ends up storing z before loading x, because P2 could have executed first, then P0, then P1. So perhaps this is an example of what you were talking about -- a dependency which may or may not be semantic, but either way cannot lead to OOTA. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-17 15:52 ` Alan Stern @ 2025-01-17 16:45 ` Jonas Oberhauser 2025-01-17 19:02 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-17 16:45 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/17/2025 um 4:52 PM schrieb Alan Stern: > On Thu, Jan 16, 2025 at 06:02:18PM -0500, Alan Stern wrote: >> On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: >>> I would say that the approach with volatile is overzealous because it tries >>> to create a "local" order solution to the problem that only requires a >>> "global" ordering solution. Since not every semantic dependency needs to >>> provide order in C++ -- only the cycle of dependencies -- it is totally ok >>> to add too many semantic dependency edges to a program, even those that are >>> not going to be exactly maintained by every compiler, as long as we can >>> ensure that globally, no dependency cycle occurs. >> >> But then how would you characterize semantic dependencies, if you want >> to allow the definition to include some dependencies that aren't >> semantic but not so many that you ever create a cycle? This sounds like >> an even worse problem than we started with! > > An interesting side comment on this issue... > > This is a slight variation of the example on page 19 (section 4.3) of > the paper. (Pretend this is actually C++ code, the shared variables are > all atomic, and their accesses are all relaxed.) > > bool x, y, z; > > void P0(bool *x, bool *y, bool *z) { > bool r1, r2; > > r1 = *x; > r2 = *y; > > *z = (r1 != r2); > } > > The paper points out that although there is an apparent semantic > dependency from the load of x to the store to z, if the compiler is > allowed not to handle atomics as quasi volatile then the dependency > can be broken. Nevertheless, I am not able to think of a program that > could exhibit OOTA as a result of breaking the semantic dependency. The > best I can come up with is this: > > [P0 as above] > > void P1(bool *x, bool *y, bool *z) { > bool r3; > > r3 = z; > x = r3; > } > > void P2(bool *x, bool *y, bool *z) { > y = true; > } > > exists (x=true /\ z=true) > > If P2 were not present, this result could not occur in any physical > execution, even if the dependency in P0 is broken. With P2 this result > isn't OOTA, even in executions where P0 ends up storing z before loading > x, because P2 could have executed first, then P0, then P1. > > So perhaps this is an example of what you were talking about -- a > dependency which may or may not be semantic, but either way cannot lead > to OOTA. Yes, that looks like an example of what I have in mind. If at the model level we just say "yes there is a dependency, but no it does not give any ordering guarantee", then the compiler is still free to break the dependency like in your example. A thread P3 { r1 = z; atomic_thread_fence(); r2 = y; } could still observe r2 == false, r1 == true, "showing" that the dependency was broken. This would not violate such a model. (if we consider consume, then that would need to restrict the compiler from eliminating the dependency like this) That is not to say that I am 100% sure that it is possible to define sdep correctly to make this work. 
One problem is that if the compiler merges two threads (with an OOTA cycle of 3+ threads), it can turn sdep;rfe;sdep;rfe into sdep;rfi;sdep;rfe. If sdep is too naive, then it is easy to come up with counterexamples where this sdep;rfi;sdep no longer provides ordering, making the whole sdep;rfe cycle possible.

I am not sure if sdep can be formalized in a way that ensures that this sdep;rfi;sdep edge would still need to be preserved.

Of course one could have inter-thread semantic dependencies and only forbid isdep ; rf (e?) from being reflexive...

Best wishes,
  jonas

^ permalink raw reply [flat|nested] 59+ messages in thread
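A minimal sketch of the kind of counterexample alluded to above, with a naive per-thread notion of sdep and C11 relaxed atomics (all names are illustrative):

    #include <stdatomic.h>

    atomic_int x, y, z;

    void T1(void)
    {
            int r1 = atomic_load_explicit(&x, memory_order_relaxed);
            /* y "depends" on x: stores 0 or 1 */
            atomic_store_explicit(&y, r1 != 0, memory_order_relaxed);
    }

    void T2(void)
    {
            int r2 = atomic_load_explicit(&y, memory_order_relaxed);
            /* z "depends" on y: stores r2 < 2 */
            atomic_store_explicit(&z, r2 < 2, memory_order_relaxed);
    }

    /*
     * If T1 and T2 are merged, the rfe through y can become an rfi.  Store
     * forwarding then gives r2 == (r1 != 0), which is always < 2, so the
     * compiler may emit an unconditional store of 1 to z; the composed
     * sdep;rfi;sdep from x to z is gone even though each per-thread
     * dependency looked semantic on its own.
     */
    void T1_then_T2_merged(void)
    {
            int r1 = atomic_load_explicit(&x, memory_order_relaxed);
            atomic_store_explicit(&y, r1 != 0, memory_order_relaxed);
            atomic_store_explicit(&z, 1, memory_order_relaxed);
    }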
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-17 16:45 ` Jonas Oberhauser @ 2025-01-17 19:02 ` Alan Stern 0 siblings, 0 replies; 59+ messages in thread From: Alan Stern @ 2025-01-17 19:02 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Fri, Jan 17, 2025 at 05:45:50PM +0100, Jonas Oberhauser wrote: > > > Am 1/17/2025 um 4:52 PM schrieb Alan Stern: > > On Thu, Jan 16, 2025 at 06:02:18PM -0500, Alan Stern wrote: > > > On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote: > > This is a slight variation of the example on page 19 (section 4.3) of > > the paper. (Pretend this is actually C++ code, the shared variables are > > all atomic, and their accesses are all relaxed.) > > > > bool x, y, z; > > > > void P0(bool *x, bool *y, bool *z) { > > bool r1, r2; > > > > r1 = *x; > > r2 = *y; > > > > *z = (r1 != r2); > > } > > > > The paper points out that although there is an apparent semantic > > dependency from the load of x to the store to z, if the compiler is > > allowed not to handle atomics as quasi volatile then the dependency > > can be broken. Nevertheless, I am not able to think of a program that > > could exhibit OOTA as a result of breaking the semantic dependency. The > > best I can come up with is this: > > > > [P0 as above] > > > > void P1(bool *x, bool *y, bool *z) { > > bool r3; > > > > r3 = z; > > x = r3; > > } > > > > void P2(bool *x, bool *y, bool *z) { > > y = true; > > } > > > > exists (x=true /\ z=true) > > > > If P2 were not present, this result could not occur in any physical > > execution, even if the dependency in P0 is broken. With P2 this result > > isn't OOTA, even in executions where P0 ends up storing z before loading > > x, because P2 could have executed first, then P0, then P1. > > > > So perhaps this is an example of what you were talking about -- a > > dependency which may or may not be semantic, but either way cannot lead > > to OOTA. > > Yes, that looks like an example of what I have in mind. > > If at the model level we just say "yes there is a dependency, but no it does > not give any ordering guarantee", then the compiler is still free to break > the dependency like in your example. > > A thread P3 { r1 = z; atomic_thread_fence(); r2 = y; } > could still observe r2 == false, r1 == true, "showing" that the dependency > was broken. That wouldn't prove anything unless P0 had its own memory barrier somewhere before it stored z. At any rate, I don't have any ideas on how to characterize semantic dependencies that can be broken without risking OOTA. (Some people would say that if a dependency can be broken at all, that demonstrates it wasn't semantic to begin with.) Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 17:54 ` Paul E. McKenney 2025-01-09 18:35 ` Jonas Oberhauser @ 2025-01-09 20:37 ` Peter Zijlstra 2025-01-09 21:13 ` Paul E. McKenney 1 sibling, 1 reply; 59+ messages in thread From: Peter Zijlstra @ 2025-01-09 20:37 UTC (permalink / raw) To: Paul E. McKenney Cc: Jonas Oberhauser, Alan Stern, parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 09, 2025 at 09:54:28AM -0800, Paul E. McKenney wrote: > > P0(int *a, int *b, int *x, int *y) { > > int r1; > > int r2 = 0; > > r1 = READ_ONCE(*x); > > smp_rmb(); > > if (r1 == 1) { > > r2 = *b; > > } > > WRITE_ONCE(*a, r2); > > smp_wmb(); > > WRITE_ONCE(*y, 1); > > } > > > > P1(int *a, int *b, int *x, int *y) { > > int r1; > > > > int r2 = 0; > > > > r1 = READ_ONCE(*y); > > smp_rmb(); > > if (r1 == 1) { > > r2 = *a; > > } > > WRITE_ONCE(*b, r2); > > smp_wmb(); > > WRITE_ONCE(*x, 1); > > } > > > > > > The reason that the WRITE_ONCE helps in the speculative store case is that > > both its ctrl dependency and the wmb provide ordering, which together > > creates ordering between *x and *y. > > Ah, and that is because LKMM does not enforce control dependencies past > the end of the "if" statement. Cute! I think the reason we hesitated on that was CMOV and similar conditional instructions. If the body of the branch is a CMOV, then there no conditionality on the common path after the body. ^ permalink raw reply [flat|nested] 59+ messages in thread
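As a sketch of that concern, here is one way the P0 quoted above could plausibly end up being compiled, assuming the compiler considers *b safe to load unconditionally (this is illustrative C, not any particular compiler's output):

    P0(int *a, int *b, int *x, int *y) {
            int r1;
            int r2 = 0;
            int tmp;

            r1 = READ_ONCE(*x);
            smp_rmb();
            tmp = *b;                       /* load speculated out of the branch */
            r2 = (r1 == 1) ? tmp : r2;      /* body becomes a conditional move   */
            WRITE_ONCE(*a, r2);             /* executed unconditionally          */
            smp_wmb();
            WRITE_ONCE(*y, 1);              /* no control dependency from the
                                               READ_ONCE(*x) reaches this store  */
    }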
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 20:37 ` Peter Zijlstra @ 2025-01-09 21:13 ` Paul E. McKenney 0 siblings, 0 replies; 59+ messages in thread From: Paul E. McKenney @ 2025-01-09 21:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Jonas Oberhauser, Alan Stern, parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 09, 2025 at 09:37:08PM +0100, Peter Zijlstra wrote: > On Thu, Jan 09, 2025 at 09:54:28AM -0800, Paul E. McKenney wrote: > > > P0(int *a, int *b, int *x, int *y) { > > > int r1; > > > int r2 = 0; > > > r1 = READ_ONCE(*x); > > > smp_rmb(); > > > if (r1 == 1) { > > > r2 = *b; > > > } > > > WRITE_ONCE(*a, r2); > > > smp_wmb(); > > > WRITE_ONCE(*y, 1); > > > } > > > > > > P1(int *a, int *b, int *x, int *y) { > > > int r1; > > > > > > int r2 = 0; > > > > > > r1 = READ_ONCE(*y); > > > smp_rmb(); > > > if (r1 == 1) { > > > r2 = *a; > > > } > > > WRITE_ONCE(*b, r2); > > > smp_wmb(); > > > WRITE_ONCE(*x, 1); > > > } > > > > > > > > > The reason that the WRITE_ONCE helps in the speculative store case is that > > > both its ctrl dependency and the wmb provide ordering, which together > > > creates ordering between *x and *y. > > > > Ah, and that is because LKMM does not enforce control dependencies past > > the end of the "if" statement. Cute! > > I think the reason we hesitated on that was CMOV and similar conditional > instructions. If the body of the branch is a CMOV, then there no > conditionality on the common path after the body. That does match my recollection. In addition, in some cases the compiler can move memory references following the body of the "if" to precede that "if", and then CPU memory-reference reordering can do the rest. Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
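A small sketch of that second transformation (generic and illustrative; the pointer names are not from the litmus test above):

    P(int *x, int *y, int *z) {
            int r1;

            r1 = READ_ONCE(*x);
            if (r1)
                    *z = 1;                 /* plain store in the body       */
            WRITE_ONCE(*y, 1);              /* marked store after the body   */
    }

    /*
     * The WRITE_ONCE() happens on both paths and nothing in the body is
     * volatile or aliases *y, so the compiler may hoist it to just after
     * the READ_ONCE() (it cannot lift it above the volatile load itself).
     * Once no branch separates the two marked accesses, CPU reordering can
     * do the rest, which is why the control dependency does not extend
     * past the end of the "if".
     */
    P_transformed(int *x, int *y, int *z) {
            int r1;

            r1 = READ_ONCE(*x);
            WRITE_ONCE(*y, 1);
            if (r1)
                    *z = 1;
    }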
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-07 16:09 ` Alan Stern 2025-01-07 18:47 ` Paul E. McKenney @ 2025-01-08 17:33 ` Jonas Oberhauser 2025-01-08 18:47 ` Alan Stern 1 sibling, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-08 17:33 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/7/2025 um 5:09 PM schrieb Alan Stern: > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: >> The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following >> example shared on this list a few years ago: >> >> P0(int *a, int *b, int *x, int *y) { >> int r1; >> >> r1 = READ_ONCE(*x); >> smp_rmb(); >> if (r1 == 1) { >> *a = *b; >> } >> smp_wmb(); >> WRITE_ONCE(*y, 1); >> } >> >> P1(int *a, int *b, int *x, int *y) { >> int r1; >> >> r1 = READ_ONCE(*y); >> smp_rmb(); >> if (r1 == 1) { >> *b = *a; >> } >> smp_wmb(); >> WRITE_ONCE(*x, 1); >> } >> >> exists b=42 >> >> The root cause is an interplay between plain accesses and rw-fences, i.e., >> smp_rmb() and smp_wmb(): while smp_rmb() and smp_wmb() provide sufficient >> ordering for plain accesses to rule out data races, they do not in the current >> formalization generally actually order the plain accesses, allowing, e.g., the >> load and stores to *b to proceed in any order even if P1 reads from P0; and in >> particular, the marked accesses around those plain accesses are also not >> ordered, which causes this OOTA. > > That's right. The memory model deliberately tries to avoid placing > restrictions on plain accesses, whenever it can. > > In the example above, for instance, I think it's more interesting to ask > > exists 0:r1=1 /\ 1:r1=1 > > than to concentrate on a and b. Yes, and of course there's a relationship between the two anomalies. My proposed patch solves OOTA indirectly, by forbidding the marked accesses to *x and *y from being reordered in case r1 == 1, and thereby, forbidding exactly the outcome where both have r1 == 1. > OOTA is a very difficult subject. No doubt. > It can be approached only by making > the memory model take all sorts of compiler optimizations into account, > and doing this for all possible optimizations is not feasible. I think there's two parts of this, one is the correct definition of semantic (or compiler-preserved) dependencies. This is a really hard problem that can indeed only be solved by looking at all ways the compiler can optimize things. I'm not trying to solve that (and can't address that issue in a cat file anyways). The second part is to see which accesses could participate in an OOTA. I think this is a lot simpler. > (For example, in a presentation to the C++ working group last year, Paul > and I didn't try to show how to extend the C++ memory model to exclude > OOTA [other than by fiat, as it does now]. Instead we argued that with > the existing memory model, no reasonable compiler would ever produce an > executable that could exhibit OOTA and so the memory model didn't need > to be changed.) I think a model like C++ should exclude it exactly by fiat, by formalizing OOTA-freedom as an axiom, like acyclic ( (data|ctrl|addr) ; rfe ) This axiom is to some extent unsatisfying because it does not explain *how* the OOTA is avoided, and in particular, it does not forbid any specific kind of behavior that would lead to OOTA, just the complete combination of them. 
So for example, the following would be allowed: P0: r0 = y.load_explicit(memory_order_relaxed); x.store_explicit(r0,memory_order_relaxed); P1: r1 = x; // reads 1 atomic_thread_fence(); y = 1; which could by some people be interpreted as the accesses of P0 being executed out of order (although there is no such concept in C++), indicating that if P0's accesses are allowed to be executed out of order, then so should P2's: P2: r2 = x.load_explicit(memory_order_relaxed); y.store_explicit(r2,memory_order_relaxed); Of course if both P0 and P2 are "executing out of order" at the same time, one would have OOTA, but this "global behavior" would be forbidden by the axiom. But that is already how C++ works: a release access in C++ does not have any ordering by itself either. It is just the combination of release + acquire on other threads that somehow establishes synchronization. LKMM works differently though, by providing "local" ordering rules, through ppo. We can argue about a ppo locally even without knowing the code of any other threads, let alone whether their accesses have acquire or release semantics. So we can address OOTA in a "local" manner as well. And it has the advantage of having compiler barriers around a lot of things, which makes reasoning a lot more feasible. >> In this patch, we choose the rather conservative approach of forcing only the >> order of these marked accesses, specifically, when a marked read r is >> separated from a plain read r' by an smp_rmb() (or r' has an address >> dependency on r or is r'), on which a write w' depends, and w' is either plain >> and seperated by a subsequent write w by an smp_wmb() (or w' is w), then r >> precedes w in ppo. > > Is this really valid? In the example above, if there were no other > references to a or b in the rest of the program, the compiler could > eliminate them entirely. In the example above, this is not possible, because the address of a/b have escaped the function and are not deallocated before synchronization happens. Therefore the compiler must assume that a/b are accessed inside the compiler barrier. > (Whether the result could count as OOTA is > open to question, but that's not the point.) Is it not possible that a > compiler might find other ways to defeat your intentions? The main point (which I already mentioned in the previous e-mail) is if the object is deallocated without synchronization (or never escapes the function in the first place). And indeed, any such case renders the added rule unsound. It is in a sense unrelated to OOTA; cases where the load/store can be elided are never OOTA. Of course that is outside the current scope of what herd7 needs to deal with or can express, because deallocation isn't a thing in herd7. Nevertheless, trying to specify inside cat when an access is "live" -- relevant enough that the compiler will keep it around -- is hard and tedious (it's the main source of complication in the patch). A much better way would be to add a base set of "live loads and stores" Live, which are the loads and stores that the compiler must consider to be live. Just like addr, ctrl, etc., we don't have to specify these in cat, and rather rely on herd7 to correctly . If an access interacts with an access of another thread (by reading from it or being read from it), it must be live. Then we could formulate the rule as +let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live] ; wmb)? 
(the latter case being a generalization of the current `addr ; [Plain] ; wmb` and `rwdep` cases of to-w, assuming we restrict it to Live accesses - it is otherwise also unsound:

    int a[2] = {0};
    int r1 = READ_ONCE(*x);
    a[r1] = 0; // compiler will just remove this
    smp_wmb();
    WRITE_ONCE(*y, 1);

)

The formulation in the patch is just based on a complicated and close but imperfect approximation of Live.

> In any case, my feeling is that memory models for higher languages
> (i.e., anything above the assembler level) should not try very hard to
> address the question of OOTA.  And for LKMM, OOTA involving _plain_
> accesses is doubly out of bounds.
>
> Your proposed change seems to add a significant complication to the
> memory model for a not-very-clear benefit.

Even if we ignore OOTA for the moment, it is not inconceivable to have some cases use a combination of plain accesses with dependencies/rmb/wmb, such as in an RCU case. That's probably the reason the current `addr ; [Plain] ; wmb` case exists. It's not clear that it covers all cases, though.

Best wishes,
  jonas

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-08 17:33 ` Jonas Oberhauser @ 2025-01-08 18:47 ` Alan Stern 2025-01-08 19:22 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-08 18:47 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote: > > > Am 1/7/2025 um 5:09 PM schrieb Alan Stern: > > Is this really valid? In the example above, if there were no other > > references to a or b in the rest of the program, the compiler could > > eliminate them entirely. > > In the example above, this is not possible, because the address of a/b have > escaped the function and are not deallocated before synchronization happens. > Therefore the compiler must assume that a/b are accessed inside the compiler > barrier. I'm not quite sure what you mean by that, but if the compiler has access to the entire program and can do a global analysis then it can recognize cases where accesses that may seem to be live aren't really. However, I admit this objection doesn't really apply to Linux kernel programming. > > (Whether the result could count as OOTA is > > open to question, but that's not the point.) Is it not possible that a > > compiler might find other ways to defeat your intentions? > > The main point (which I already mentioned in the previous e-mail) is if the > object is deallocated without synchronization (or never escapes the function > in the first place). > > And indeed, any such case renders the added rule unsound. It is in a sense > unrelated to OOTA; cases where the load/store can be elided are never OOTA. That is a matter of definition. In our paper, Paul and I described instances of OOTA in which all the accesses have been optimized away as "trivial". > Of course that is outside the current scope of what herd7 needs to deal with > or can express, because deallocation isn't a thing in herd7. > > Nevertheless, trying to specify inside cat when an access is "live" -- > relevant enough that the compiler will keep it around -- is hard and tedious > (it's the main source of complication in the patch). > > A much better way would be to add a base set of "live loads and stores" > Live, which are the loads and stores that the compiler must consider to be > live. Just like addr, ctrl, etc., we don't have to specify these in cat, and > rather rely on herd7 to correctly . I agree that determining which accesses are live (in the sense that the compiler cannot optimize them out of existence) accounts for a large part of the difficulty of analyzing plain accesses in general, and OOTA in particular. > If an access interacts with an access of another thread (by reading from it > or being read from it), it must be live. This is the sort of approximation I'm a little uncomfortable with. It would be better to say that a store which is read from by a live load must be live. I don't see why a load which reads from a live store has to be live. > Then we could formulate the rule as > > +let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live] ; > wmb)? 
> > (the latter case being a generalization of the current `addr ; [Plain] ; > wmb` and `rwdep` cases of to-w, assuming we restrict it Life accesses - it > is otherwise also unsound: > > int a[2] = {0}; > int r1 = READ_ONCE(*x); > a[r1] = 0; // compiler will just remove this > smp_wmb(); > WRITE_ONCE(*y, 1); > Yes, and we recognize that this part of the rule is on shaky ground. > ) > > The formulation in the patch is just based on a complicated and close but > imperfect approximation of Live. Maybe you can reformulate the patch to make this more explicit. In any case, it seems that any approximation we can make to Live will be subject to various sorts of errors. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-08 18:47 ` Alan Stern @ 2025-01-08 19:22 ` Jonas Oberhauser 2025-01-09 16:17 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-08 19:22 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/8/2025 um 7:47 PM schrieb Alan Stern: > On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/7/2025 um 5:09 PM schrieb Alan Stern: >>> Is this really valid? In the example above, if there were no other >>> references to a or b in the rest of the program, the compiler could >>> eliminate them entirely. >> >> In the example above, this is not possible, because the address of a/b have >> escaped the function and are not deallocated before synchronization happens. >> Therefore the compiler must assume that a/b are accessed inside the compiler >> barrier. > > I'm not quite sure what you mean by that, but if the compiler has access > to the entire program and can do a global analysis then it can recognize > cases where accesses that may seem to be live aren't really. Even for trivial enough cases where the compiler has access to all the source, compiler barriers should be opaque to the compiler. Since it is opaque, *a = 1; compiler_barrier(); might as well be *a = 1; *d = *a; // *d is in device memory and so in my opinion the compiler needs to ensure that the value of *a right before the compiler barrier is 1. Of course, only if the address of *a could be possibly legally known to the opaque code in the compiler barrier. > > However, I admit this objection doesn't really apply to Linux kernel > programming. > >>> (Whether the result could count as OOTA is >>> open to question, but that's not the point.) Is it not possible that a >>> compiler might find other ways to defeat your intentions? >> >> The main point (which I already mentioned in the previous e-mail) is if the >> object is deallocated without synchronization (or never escapes the function >> in the first place). >> >> And indeed, any such case renders the added rule unsound. It is in a sense >> unrelated to OOTA; cases where the load/store can be elided are never OOTA. > > That is a matter of definition. In our paper, Paul and I described > instances of OOTA in which all the accesses have been optimized away as > "trivial". Yes, by OOTA I mean a rwdep;rfe cycle. In the absence of data races, such a cycle can't be optimized away because it is created with volatile/compiler-barrier-protected accesses. >> If an access interacts with an access of another thread (by reading from it >> or being read from it), it must be live. > > This is the sort of approximation I'm a little uncomfortable with. It > would be better to say that a store which is read from by a live load > must be live. I don't see why a load which reads from a live store has > to be live. You are right, and I was careless. All we need is that a store that is read externally by a live load is live, and that a load that reads from an external store and has its value semantically depended-on by a live store is live. >> The formulation in the patch is just based on a complicated and close but >> imperfect approximation of Live. > > Maybe you can reformulate the patch to make this more explicit. 
It would look something like this:

    Live = R & rng(po \ po ; [W] ; (po-loc \ w_barrier))
         | W & dom(po \ ((po-loc \ w_barrier) ; [W] ; po))

    let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live] ; wmb)?

> In any case, it seems that any approximation we can make to Live will be
> subject to various sorts of errors.

Probably (this is certainly true for trying to approximate dependencies, for example), but what I know for certain is that the approximations of Live inside cat get uglier the more precise they become. In the above definition of Live, I have not included that the address must escape, nor that it must not be freed.

A non-local definition that suffices for OOTA would be as follows:

    Live = R & rng(rfe) & dom(rwdep ; rfe) | W & dom(rfe)

It seems the ideal solution is to let Live be defined by the tools, which should keep up with or exceed the analysis done by state-of-the-art compilers.

Best wishes,
  jonas

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-08 19:22 ` Jonas Oberhauser @ 2025-01-09 16:17 ` Alan Stern 2025-01-09 16:44 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-09 16:17 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote: > > > Am 1/8/2025 um 7:47 PM schrieb Alan Stern: > > On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote: > > > > > > > > > Am 1/7/2025 um 5:09 PM schrieb Alan Stern: > > > > Is this really valid? In the example above, if there were no other > > > > references to a or b in the rest of the program, the compiler could > > > > eliminate them entirely. > > > > > > In the example above, this is not possible, because the address of a/b have > > > escaped the function and are not deallocated before synchronization happens. > > > Therefore the compiler must assume that a/b are accessed inside the compiler > > > barrier. > > > > I'm not quite sure what you mean by that, but if the compiler has access > > to the entire program and can do a global analysis then it can recognize > > cases where accesses that may seem to be live aren't really. > > Even for trivial enough cases where the compiler has access to all the > source, compiler barriers should be opaque to the compiler. > > Since it is opaque, > > *a = 1; > compiler_barrier(); > > might as well be > > *a = 1; > *d = *a; // *d is in device memory > > and so in my opinion the compiler needs to ensure that the value of *a right > before the compiler barrier is 1. > > Of course, only if the address of *a could be possibly legally known to the > opaque code in the compiler barrier. What do you mean by "opaque code in the compiler barrier"? The compiler_barrier() instruction doesn't generate any code at all; it merely directs the compiler not to carry any knowledge about values stored in memory from one side of the barrier to the other. Note that it does _not_ necessarily prevent the compiler from carrying knowledge that a memory location is unused from one side of the barrier to the other. > > However, I admit this objection doesn't really apply to Linux kernel > > programming. > > > > > > (Whether the result could count as OOTA is > > > > open to question, but that's not the point.) Is it not possible that a > > > > compiler might find other ways to defeat your intentions? > > > > > > The main point (which I already mentioned in the previous e-mail) is if the > > > object is deallocated without synchronization (or never escapes the function > > > in the first place). > > > > > > And indeed, any such case renders the added rule unsound. It is in a sense > > > unrelated to OOTA; cases where the load/store can be elided are never OOTA. > > > > That is a matter of definition. In our paper, Paul and I described > > instances of OOTA in which all the accesses have been optimized away as > > "trivial". > > Yes, by OOTA I mean a rwdep;rfe cycle. > > In the absence of data races, such a cycle can't be optimized away because > it is created with volatile/compiler-barrier-protected accesses. That wasn't true in the C++ context of the paper Paul and I worked on. Of course, C++ is not our current context here. 
What I was trying to get at above is that compiler-barrier protection does not necessarily guarantee that non-volatile accesses can't be optimized away. (However, it's probably safe for us to make such an assumption here.)

> It would look something like this:
>
> Live = R & rng(po \ po ; [W] ; (po-loc \ w_barrier))
>      | W & dom(po \ ((po-loc \ w_barrier) ; [W] ; po))
>
> let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live] ; wmb)?
>
> > In any case, it seems that any approximation we can make to Live will be
> > subject to various sorts of errors.
>
> Probably (this is certainly true for trying to approximate dependencies, for
> example), but what I know for certain is that the approximations of Live
> inside cat get uglier the more precise they become. In the above
> definition of Live, I have not included that the address must escape, nor
> that it must not be freed.
>
> A non-local definition that suffices for OOTA would be as follows:
>
> Live = R & rng(rfe) & dom(rwdep ; rfe) | W & dom(rfe)

I could live with this (although I would prefer to have more parentheses -- IMO it's mistake-prone to rely on the relative precedence of | and &). Especially if the to-w definition above were rewritten in a way that would be a little easier to parse and understand.

> It seems the ideal solution is to let Live be defined by the tools, which
> should keep up with or exceed the analysis done by state-of-the-art compilers.

I don't think it works that way in practice.  :-)

Alan

^ permalink raw reply [flat|nested] 59+ messages in thread
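For what it's worth, a fully parenthesized sketch of the non-local definition quoted above, under the grouping that appears to be intended (not run through herd7):

    let Live = (R & rng(rfe) & dom(rwdep ; rfe)) | (W & dom(rfe))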
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 16:17 ` Alan Stern @ 2025-01-09 16:44 ` Jonas Oberhauser 2025-01-09 19:27 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-09 16:44 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/9/2025 um 5:17 PM schrieb Alan Stern: > On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/8/2025 um 7:47 PM schrieb Alan Stern: >>> On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote: >>>> >>>> >>>> Am 1/7/2025 um 5:09 PM schrieb Alan Stern: >>>>> Is this really valid? In the example above, if there were no other >>>>> references to a or b in the rest of the program, the compiler could >>>>> eliminate them entirely. >>>> >>>> In the example above, this is not possible, because the address of a/b have >>>> escaped the function and are not deallocated before synchronization happens. >>>> Therefore the compiler must assume that a/b are accessed inside the compiler >>>> barrier. >>> >>> I'm not quite sure what you mean by that, but if the compiler has access >>> to the entire program and can do a global analysis then it can recognize >>> cases where accesses that may seem to be live aren't really. >> >> Even for trivial enough cases where the compiler has access to all the >> source, compiler barriers should be opaque to the compiler. >> >> Since it is opaque, >> >> *a = 1; >> compiler_barrier(); >> >> might as well be >> >> *a = 1; >> *d = *a; // *d is in device memory >> >> and so in my opinion the compiler needs to ensure that the value of *a right >> before the compiler barrier is 1. >> >> Of course, only if the address of *a could be possibly legally known to the >> opaque code in the compiler barrier. > > What do you mean by "opaque code in the compiler barrier"? The > compiler_barrier() instruction doesn't generate any code at all; it > merely directs the compiler not to carry any knowledge about values > stored in memory from one side of the barrier to the other. What I mean by "opaque" is that the compiler does not analyze the code inside the compiler barrier, so it must treat it as a black box that could manipulate memory arbitrarily within the limitation that it can not guess the address of memory. So for example, in int a = 1; barrier(); a = 2; //... the compiler does not know how the code inside barrier() accesses memory, including volatile memory. But it knows that it can not access `a`, because the address of `a` has never escaped before the barrier(). So it can change this to: barrier(); int a = 2; // ... But if we let the address of `a` escape, for example with some external function foo(int*): int a; foo(&a); a = 1; barrier(); a = 2; // ... Then the compiler has to assume that the code of foo and barrier might be something like this: foo(p) { SPECIAL_VARIABLE = p; } barrier() { TURN_THE_BREAKS_ON = *SPECIAL_VARIABLE; } and it must make sure that the value of `a` before barrier() is 1. That is at least my understanding. In fact, even if `a` is unused after a=2, the compiler can only eliminate `a` in the former case, but in the latter case, still needs to ensure that the value of `a` before barrier() is 1 (but it can eliminate a=2). 
> > Note that it does _not_ necessarily prevent the compiler from carrying > knowledge that a memory location is unused from one side of the barrier > to the other. Yes, or even merging/moving assignments to the memory location across a barrier(), as in the example above. >>> However, I admit this objection doesn't really apply to Linux kernel >>> programming. >>> >>>>> (Whether the result could count as OOTA is >>>>> open to question, but that's not the point.) Is it not possible that a >>>>> compiler might find other ways to defeat your intentions? >>>> >>>> The main point (which I already mentioned in the previous e-mail) is if the >>>> object is deallocated without synchronization (or never escapes the function >>>> in the first place). >>>> >>>> And indeed, any such case renders the added rule unsound. It is in a sense >>>> unrelated to OOTA; cases where the load/store can be elided are never OOTA. >>> >>> That is a matter of definition. In our paper, Paul and I described >>> instances of OOTA in which all the accesses have been optimized away as >>> "trivial". >> >> Yes, by OOTA I mean a rwdep;rfe cycle. >> >> In the absence of data races, such a cycle can't be optimized away because >> it is created with volatile/compiler-barrier-protected accesses. > > That wasn't true in the C++ context of the paper Paul and I worked on. > Of course, C++ is not our current context here. Yes, you are completely correct. In C++ (or pure C), where data races are prevented by compiler/language-builtins rather than with compiler-barriers/volatiles, all my assumptions break. That is because the compiler absolutely knows that an atomic_store(&x) does not access any memory location other than x, so it can do a lot more "harmful" optimizations. That's why I said such a language model should just exclude global OOTA by fiat. I have to read your paper again (I think I read it a few months ago) to understand if the trivial OOTA would make even that vague axiom unsound (my intuition says that if the OOTA is never observed by influencing the side-effect, then forbidding OOTA makes no difference to the set of "observable behaviors" of a C++ program even there is a trivial OOTA, and if the OOTA has visible side-effects, then it is acceptable for the compiler not to do the "optimization" that turns it into a trivial OOTA and choose some other optimization instead, so we can as well forbid the compiler from doing it). > What I was trying to get at above is that compiler-barrier protection > does not necessarily guarantee that non-volatile accesses can't be > optimized away. (However, it's probably safe for us to make such an > assumption here.) Yes, I agree. >> It seems the ideal solution is to let Live be defined by the tools, which >> should keep up with or exceed the analysis done by state-of-art compilers. > > I don't think it works that way in practice. :-) Yeah... maybe not :( Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 16:44 ` Jonas Oberhauser @ 2025-01-09 19:27 ` Alan Stern 2025-01-09 20:09 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-09 19:27 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 09, 2025 at 05:44:54PM +0100, Jonas Oberhauser wrote: > > > Am 1/9/2025 um 5:17 PM schrieb Alan Stern: > > What do you mean by "opaque code in the compiler barrier"? The > > compiler_barrier() instruction doesn't generate any code at all; it > > merely directs the compiler not to carry any knowledge about values > > stored in memory from one side of the barrier to the other. > > What I mean by "opaque" is that the compiler does not analyze the code > inside the compiler barrier, so it must treat it as a black box that could > manipulate memory arbitrarily within the limitation that it can not guess > the address of memory. Okay, I see what you're getting at. The way you express it is a little confusing, because in fact there is NO code inside the compiler barrier (although the compiler doesn't know that) -- the barrier() macro expands to an empty assembler instruction, along with an annotation telling the compiler that this instruction may affect the contents of memory in unknown and unpredictable ways. > So for example, in > > int a = 1; > barrier(); > a = 2; > //... > > the compiler does not know how the code inside barrier() accesses memory, > including volatile memory. I would say rather that the compiler does not know that the values stored in memory are the same before and after the barrier(). Even the values of local variables whose addresses have not been exported. > But it knows that it can not access `a`, because the address of `a` has > never escaped before the barrier(). I don't think this is right. barrier is (or can be) a macro, not a function call with its own scope. As such, it has -- in principle -- the ability to export the address of a. Question: Can the compiler assume that no other threads access a between the two stores, on the grounds that this would be a data race? I'd guess that it can't make that assumption, but it would be nice to know for sure. > So it can change this to: > > barrier(); > int a = 2; > // ... > > But if we let the address of `a` escape, for example with some external > function foo(int*): > > int a; > foo(&a); > a = 1; > barrier(); > a = 2; > // ... > > Then the compiler has to assume that the code of foo and barrier might be > something like this: > > foo(p) { SPECIAL_VARIABLE = p; } > barrier() { TURN_THE_BREAKS_ON = *SPECIAL_VARIABLE; } I think you're giving the compiler too much credit. The one thing the compiler is allowed to assume is that the code, as written, does not contain a data race or other undefined behavior. > and it must make sure that the value of `a` before barrier() is 1. > > That is at least my understanding. This is the sort of question that memory-barriers.txt should answer. It's closely related to the question I mentioned above. > In fact, even if `a` is unused after a=2, the compiler can only eliminate > `a` in the former case, but in the latter case, still needs to ensure that > the value of `a` before barrier() is 1 (but it can eliminate a=2). And what if a were a global shared variable instead of a local one? 
The compiler is still allowed to do weird optimizations on it, since the accesses aren't volatile. The barrier() merely prevents the compiler from using its knowledge that a is supposed to contain 1 before the barrier to influence its decisions about how to optimize the code following the barrier. > > That wasn't true in the C++ context of the paper Paul and I worked on. > > Of course, C++ is not our current context here. > > Yes, you are completely correct. In C++ (or pure C), where data races are > prevented by compiler/language-builtins rather than with > compiler-barriers/volatiles, all my assumptions break. > > That is because the compiler absolutely knows that an atomic_store(&x) does > not access any memory location other than x, so it can do a lot more > "harmful" optimizations. > > That's why I said such a language model should just exclude global OOTA by > fiat. One problem with doing this is that there is no widely agreed-upon formal definition of OOTA. A cycle in (rwdep ; rfe) isn't the answer because rwdep does not encapsulate the notion of semantic dependency. > I have to read your paper again (I think I read it a few months ago) to > understand if the trivial OOTA would make even that vague axiom unsound > (my intuition says that if the OOTA is never observed by influencing the > side-effect, then forbidding OOTA makes no difference to the set of > "observable behaviors" of a C++ program even there is a trivial OOTA, and if > the OOTA has visible side-effects, then it is acceptable for the compiler > not to do the "optimization" that turns it into a trivial OOTA and choose > some other optimization instead, so we can as well forbid the compiler from > doing it). If an instance of OOTA is never observed, does it exist? In the paper, I speculated that if a physical execution of a program matches an abstract execution containing such a non-observed OOTA cycle, then it also matches another abstract execution in which the cycle doesn't exist. I don't know how to prove this conjecture, though. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 19:27 ` Alan Stern @ 2025-01-09 20:09 ` Jonas Oberhauser 2025-01-10 3:12 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-09 20:09 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/9/2025 um 8:27 PM schrieb Alan Stern: > On Thu, Jan 09, 2025 at 05:44:54PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/9/2025 um 5:17 PM schrieb Alan Stern: >>> What do you mean by "opaque code in the compiler barrier"? The >>> compiler_barrier() instruction doesn't generate any code at all; it >>> merely directs the compiler not to carry any knowledge about values >>> stored in memory from one side of the barrier to the other. >> >> What I mean by "opaque" is that the compiler does not analyze the code >> inside the compiler barrier, so it must treat it as a black box that could >> manipulate memory arbitrarily within the limitation that it can not guess >> the address of memory. > > Okay, I see what you're getting at. The way you express it is a little > confusing, because in fact there is NO code inside the compiler barrier > (although the compiler doesn't know that) -- the barrier() macro expands > to an empty assembler instruction, along with an annotation telling the > compiler that this instruction may affect the contents of memory in > unknown and unpredictable ways. I am happy to learn a better way to express this. > >> So for example, in >> >> int a = 1; >> barrier(); >> a = 2; >> //... >> >> the compiler does not know how the code inside barrier() accesses memory, >> including volatile memory. > > I would say rather that the compiler does not know that the values > stored in memory are the same before and after the barrier(). > > Even the > values of local variables whose addresses have not been exported. No, this is not true. I used to think so too until a short while ago. But if you look at the output of gcc -o3 you will see that it does happily remove `a` in this function. > >> But it knows that it can not access `a`, because the address of `a` has >> never escaped before the barrier(). > > I don't think this is right. barrier is (or can be) a macro, not a > function call with its own scope. As such, it has -- in principle -- > the ability to export the address of a. See above. Please test it for yourself, but for your convenience here is the code extern void foo(int *p); int opt_a() { int a; a = 1; __asm__ volatile ("" ::: "memory"); a = 2; } int opt_b() { int a; foo(&a); a = 1; __asm__ volatile ("" ::: "memory"); a = 2; } and corresponding asm: opt_a: bx lr (the whole function body got deleted!) opt_b: push {lr} sub sp, sp, #12 // calling foo(&a) add r0, sp, #4 bl foo // a = 1 -- to make sure a==1 before the barrier() movs r3, #1 str r3, [sp, #4] // [empty code from the barrier()] // a = 2 -- totally deleted // [return] add sp, sp, #12 ldr pc, [sp], #4 If you wanted to avoid this, you would need to expose the address of `a` to the asm block using one of its clobber arguments (but I'm not familiar with the syntax to produce a working example on the spot). > > Question: Can the compiler assume that no other threads access a between > the two stores, on the grounds that this would be a data race? I'd > guess that it can't make that assumption, but it would be nice to know > for sure. 
It can not make the assumption if &a has escaped. In that case, barrier() could be so: barrier(){ store_release(OTHER_THREAD_PLEASE_MODIFY,&a); while (! load_acquire(OTHER_THREAD_IS_DONE)); } with another thread doing while (! load_acquire(OTHER_THREAD_PLEASE_MODIFY)) yield(); *OTHER_THREAD_PLEASE_MODIFY ++; store_release(OTHER_THREAD_IS_DONE, 1); >> >> But if we let the address of `a` escape, for example with some external >> function foo(int*): >> >> int a; >> foo(&a); >> a = 1; >> barrier(); >> a = 2; >> // ... >> >> Then the compiler has to assume that the code of foo and barrier might be >> something like this: >> >> foo(p) { SPECIAL_VARIABLE = p; } >> barrier() { TURN_THE_BREAKS_ON = *SPECIAL_VARIABLE; } > > I think you're giving the compiler too much credit. The one thing the > compiler is allowed to assume is that the code, as written, does not > contain a data race or other undefined behavior. Apologies, the way I used "assume" is misleading. I should have said that the compiler has to ensure that even if the code of foo() and barrier() were so, that the behavior of the code it generates is the same (w.r.t. observable side effects) as if the program were executed by the abstract machine. Or I should have said that it can *not* assume that the functions are *not* as defined above. Which means that TURN_THE_BREAKS_ON would need to be assigned 1. The only way the compiler can achieve that guarantee (while treating barrier as a black box) is to make sure that the value of `a` before barrier() is 1. >> >> That is at least my understanding. > > This is the sort of question that memory-barriers.txt should answer. > It's closely related to the question I mentioned above. > >> In fact, even if `a` is unused after a=2, the compiler can only eliminate >> `a` in the former case, but in the latter case, still needs to ensure that >> the value of `a` before barrier() is 1 (but it can eliminate a=2). > > And what if a were a global shared variable instead of a local one? The > compiler is still allowed to do weird optimizations on it, since the > accesses aren't volatile. I'm not sure if `a` being global is enough for the compiler to consider `a`'s address as having escaped to the inline asm memory, especially if `a` has static lifetime. If it must be considered escaped, then it can do weird optimizations on it, but not across the barrier(). That is because inside the barrier(), we could be reading the value and store into a volatile field like TURN_THE_BREAKS_ON. In that case, the value right before the barrier needs to be equal to the value in the abstract machine. > >>> That wasn't true in the C++ context of the paper Paul and I worked on. >>> Of course, C++ is not our current context here. >> >> Yes, you are completely correct. In C++ (or pure C), where data races are >> prevented by compiler/language-builtins rather than with >> compiler-barriers/volatiles, all my assumptions break. >> >> That is because the compiler absolutely knows that an atomic_store(&x) does >> not access any memory location other than x, so it can do a lot more >> "harmful" optimizations. >> >> That's why I said such a language model should just exclude global OOTA by >> fiat. > > One problem with doing this is that there is no widely agreed-upon > formal definition of OOTA. A cycle in (rwdep ; rfe) isn't the answer > because rwdep does not encapsulate the notion of semantic dependency. rwdep does not encapsulate any specific notion. It is herd7 which decides which dependency edges to add to the graphs it generates. 
If herd7 would generate edges for semantic dependencies instead of for its version of syntactic dependencies, then rwdep is the answer. Given that we can not define dep inside the cat model, one may as well define it as rwdep;rfe with the intended meaning of the dependencies being the semantic ones; then it is an inaccuracy of herd7 that it does not provide the proper dependencies. Anyways LKMM should not care about syntactic dependencies, e.g. if (READ_ONCE(*a)) { WRITE_ONCE(*b,1); } else { WRITE_ONCE(*b,1); } has no semantic dependency and gcc does not guarantee the order between these two accesses, even though herd7 does give us a dependency edge. >> I have to read your paper again (I think I read it a few months ago) to >> understand if the trivial OOTA would make even that vague axiom unsound >> (my intuition says that if the OOTA is never observed by influencing the >> side-effect, then forbidding OOTA makes no difference to the set of >> "observable behaviors" of a C++ program even there is a trivial OOTA, and if >> the OOTA has visible side-effects, then it is acceptable for the compiler >> not to do the "optimization" that turns it into a trivial OOTA and choose >> some other optimization instead, so we can as well forbid the compiler from >> doing it). > > If an instance of OOTA is never observed, does it exist? :) :) :) > In the paper, I speculated that if a physical execution of a program > matches an abstract execution containing such a non-observed OOTA cycle, > then it also matches another abstract execution in which the cycle > doesn't exist. I don't know how to prove this conjecture, though. Yes, that also makes sense. Note that this speculation does not hold in the current LKMM though. In the Litmus test I shared in the opening e-mail, where the outcome 0:r1=1 /\ 1:r1=1 is only possible with an OOTA (even though the values from the OOTA are never used anywhere). With C++'s non-local model I wouldn't be totally surprised if there were similar examples in C++, but given that its ordering definition is a lot more straightforward than LKMM in that it doesn't have all these cases of different barriers like wmb and rmb and corner cases like Noreturn etc., my intuition says that there aren't any. But I am not going to think deeply about it for the time being. Best wishes jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-09 20:09 ` Jonas Oberhauser @ 2025-01-10 3:12 ` Alan Stern 2025-01-10 12:21 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-10 3:12 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Thu, Jan 09, 2025 at 09:09:00PM +0100, Jonas Oberhauser wrote: > > > So for example, in > > > > > > int a = 1; > > > barrier(); > > > a = 2; > > > //... > > > > > > the compiler does not know how the code inside barrier() accesses memory, > > > including volatile memory. > > > > I would say rather that the compiler does not know that the values > > stored in memory are the same before and after the barrier(). > > > > Even the > > values of local variables whose addresses have not been exported. > > No, this is not true. I used to think so too until a short while ago. > > But if you look at the output of gcc -o3 you will see that it does happily > remove `a` in this function. Isn't that consistent with what I said? > > > But it knows that it can not access `a`, because the address of `a` has > > > never escaped before the barrier(). > > > > I don't think this is right. barrier is (or can be) a macro, not a > > function call with its own scope. As such, it has -- in principle -- > > the ability to export the address of a. > > See above. Please test it for yourself, but for your convenience here is the > code Oh, I believe that gcc does what you say. I'm just not sure your explanation is entirely accurate. > If you wanted to avoid this, you would need to expose the address of `a` to > the asm block using one of its clobber arguments (but I'm not familiar with > the syntax to produce a working example on the spot). > > > > > Question: Can the compiler assume that no other threads access a between > > the two stores, on the grounds that this would be a data race? I'd > > guess that it can't make that assumption, but it would be nice to know > > for sure. > > It can not make the assumption if &a has escaped. In that case, barrier() > could be so: > > barrier(){ > store_release(OTHER_THREAD_PLEASE_MODIFY,&a); > > while (! load_acquire(OTHER_THREAD_IS_DONE)); > } > > with another thread doing > > while (! load_acquire(OTHER_THREAD_PLEASE_MODIFY)) yield(); > *OTHER_THREAD_PLEASE_MODIFY ++; > store_release(OTHER_THREAD_IS_DONE, 1); Bear in mind that there's a difference between what a compiler _can do_ and what gcc _currently does_. > > I think you're giving the compiler too much credit. The one thing the > > compiler is allowed to assume is that the code, as written, does not > > contain a data race or other undefined behavior. > > Apologies, the way I used "assume" is misleading. > I should have said that the compiler has to ensure that even if the code of > foo() and barrier() were so, that the behavior of the code it generates is > the same (w.r.t. observable side effects) as if the program were executed by > the abstract machine. Or I should have said that it can *not* assume that > the functions are *not* as defined above. > > Which means that TURN_THE_BREAKS_ON would need to be assigned 1. > The only way the compiler can achieve that guarantee (while treating barrier > as a black box) is to make sure that the value of `a` before barrier() is 1. Who says the compiler has to treat barrier() as a black box? 
As far as I know, gcc makes no such guarantee. > > > That's why I said such a language model should just exclude global OOTA by > > > fiat. > > > > One problem with doing this is that there is no widely agreed-upon > > formal definition of OOTA. A cycle in (rwdep ; rfe) isn't the answer > > because rwdep does not encapsulate the notion of semantic dependency. > > rwdep does not encapsulate any specific notion. > It is herd7 which decides which dependency edges to add to the graphs it > generates. Sorry, that's what I meant: rwdep plus the decisions that herd7 makes about which edges are dependencies. > If herd7 would generate edges for semantic dependencies instead of for its > version of syntactic dependencies, then rwdep is the answer. That statement is meaningless (or at least, impossible to implement) because there is no widely agreed-upon formal definition for semantic dependency. > Given that we can not define dep inside the cat model, one may as well > define it as rwdep;rfe with the intended meaning of the dependencies being > the semantic ones; then it is an inaccuracy of herd7 that it does not > provide the proper dependencies. Perhaps so. LKMM does include other features which the compiler can defeat if the programmer isn't sufficiently careful. Still, I suspect that changing the memory model solely with the goal of eliminating OOTA may not be a good idea. > Anyways LKMM should not care about syntactic dependencies, e.g. > > > if (READ_ONCE(*a)) { > WRITE_ONCE(*b,1); > } else { > WRITE_ONCE(*b,1); > } > > has no semantic dependency and gcc does not guarantee the order between > these two accesses, even though herd7 does give us a dependency edge. Like I said. > > > I have to read your paper again (I think I read it a few months ago) to > > > understand if the trivial OOTA would make even that vague axiom unsound > > > (my intuition says that if the OOTA is never observed by influencing the > > > side-effect, then forbidding OOTA makes no difference to the set of > > > "observable behaviors" of a C++ program even there is a trivial OOTA, and if > > > the OOTA has visible side-effects, then it is acceptable for the compiler > > > not to do the "optimization" that turns it into a trivial OOTA and choose > > > some other optimization instead, so we can as well forbid the compiler from > > > doing it). > > > > If an instance of OOTA is never observed, does it exist? > > :) :) :) > > > In the paper, I speculated that if a physical execution of a program > > matches an abstract execution containing such a non-observed OOTA cycle, > > then it also matches another abstract execution in which the cycle > > doesn't exist. I don't know how to prove this conjecture, though. > > Yes, that also makes sense. > > Note that this speculation does not hold in the current LKMM though. In the > Litmus test I shared in the opening e-mail, where the outcome 0:r1=1 /\ > 1:r1=1 is only possible with an OOTA (even though the values from the OOTA > are never used anywhere). If the fact that the outcome 0:r1=1 /\ 1:r1=1 has occurred is proof that there was OOTA, then the OOTA cycle _is_ observed, albeit indirectly -- at least, in the sense that I intended. (The situation mentioned in the paper is better described as an execution where the compiler has elided all the accesses in the OOTA cycle.) 
> With C++'s non-local model I wouldn't be totally surprised if there were > similar examples in C++, but given that its ordering definition is a lot > more straightforward than LKMM in that it doesn't have all these cases of > different barriers like wmb and rmb and corner cases like Noreturn etc., my > intuition says that there aren't any. I'll have to give this some thought. Alan > But I am not going to think deeply about it for the time being. > > Best wishes > jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-10 3:12 ` Alan Stern @ 2025-01-10 12:21 ` Jonas Oberhauser 2025-01-10 21:51 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-10 12:21 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/10/2025 um 4:12 AM schrieb Alan Stern: > On Thu, Jan 09, 2025 at 09:09:00PM +0100, Jonas Oberhauser wrote: > >>>> So for example, in >>>> >>>> int a = 1; >>>> barrier(); >>>> a = 2; >>>> //... >>>> >>>> the compiler does not know how the code inside barrier() accesses memory, >>>> including volatile memory. >>> >>> I would say rather that the compiler does not know that the values >>> stored in memory are the same before and after the barrier(). >>> >>> Even the >>> values of local variables whose addresses have not been exported. >> >> No, this is not true. I used to think so too until a short while ago. >> >> But if you look at the output of gcc -o3 you will see that it does happily >> remove `a` in this function. > > Isn't that consistent with what I said? Ok, after careful reading, I think there are two assumptions you have that I think are not true, but my example is only inconsistent with exactly one of them being not true, not with both of them being not true: 1) the barrier only tells the compiler that it may *change* the value of memory locations. I believe it also tells the compiler that it may *read* the value of memory locations. 2) the barrier also talks about the values of local variables whose addresses have not been exported. I do not believe this is the case. The second example I put (where a=1 is still emitted) shows your assumption 1) is inconsistent with what gcc currently does. For what gcc guarantees, the manual says: "add memory to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction", i.e., it needs to flush the value from the register to actual memory. I believe this too is not consistent with your assumption 1), because if the barrier would just modify memory and not read it, there would be no need to flush the value to memory. It would suffice to ensure that the value is not assumed to be the same after the barrier. With your assumption 1) discharged, the fact that `a=1` still can be removed from before the barrier should show that this guarantee does not hold for all memory locations (only for those that could legally be accessed in the barrier, which are those whose address has been exported). >>> Question: Can the compiler assume that no other threads access a between >>> the two stores, on the grounds that this would be a data race? I'd >>> guess that it can't make that assumption, but it would be nice to know >>> for sure. >> >> It can not make the assumption if &a has escaped. In that case, barrier() >> could be so: >> >> barrier(){ >> store_release(OTHER_THREAD_PLEASE_MODIFY,&a); >> >> while (! load_acquire(OTHER_THREAD_IS_DONE)); >> } >> >> with another thread doing >> >> while (! load_acquire(OTHER_THREAD_PLEASE_MODIFY)) yield(); >> *OTHER_THREAD_PLEASE_MODIFY ++; >> store_release(OTHER_THREAD_IS_DONE, 1); > > Bear in mind that there's a difference between what a compiler _can do_ > and what gcc _currently does_. Of course. But I am not sure if your comment addresses my comment here, or was intended for another section. 
> >>> I think you're giving the compiler too much credit. The one thing the >>> compiler is allowed to assume is that the code, as written, does not >>> contain a data race or other undefined behavior. >> >> Apologies, the way I used "assume" is misleading. >> I should have said that the compiler has to ensure that even if the code of >> foo() and barrier() were so, that the behavior of the code it generates is >> the same (w.r.t. observable side effects) as if the program were executed by >> the abstract machine. Or I should have said that it can *not* assume that >> the functions are *not* as defined above. >> >> Which means that TURN_THE_BREAKS_ON would need to be assigned 1. >> The only way the compiler can achieve that guarantee (while treating barrier >> as a black box) is to make sure that the value of `a` before barrier() is 1. > > Who says the compiler has to treat barrier() as a black box? As far as > I know, gcc makes no such guarantee. Note that if it wouldn't, then barrier() would not work. Since the asm instruction is empty, the compiler could figure this out easily and just delete it. But maybe I should also be more precise about what I mean by black box, namely, that 1) the asm block has significant side effects, 2) which include reading and storing to arbitrary (legally known) memory locations in an arbitrary order & control flow Both of these imply that the compiler can not assume that it does not execute some logic equivalent to what I put above. >> If herd7 would generate edges for semantic dependencies instead of for its >> version of syntactic dependencies, then rwdep is the answer. > > That statement is meaningless (or at least, impossible to implement) > because there is no widely agreed-upon formal definition for semantic > dependency. Yes, which also means that a 100% correct end-to-end solution (herd + cat + ... ?) is currently not implementable. But we can still break the problem into two halves, one which is 100% solved inside the cat file, and one which is the responsibility of herd7 and currently not solved (or 100% satisfactorily solvable). The advantage being that if we read the cat file as a mathematical definition, we can at least on paper argue 100% correctly about code for the cases where we either can figure out on paper what the semantic dependencies are, or where we at least just say "with relation to current compilers, we know what the semantically preserved dependencies are", even if herd7 or other tools lags behind in one or both. After all, herd7 is just a (useful) automation tool for reasoning about LKMM, which has its limitations (scalability, a specific definition of dependencies, limited C subset...). >> Given that we can not define dep inside the cat model, one may as well >> define it as rwdep;rfe with the intended meaning of the dependencies being >> the semantic ones; then it is an inaccuracy of herd7 that it does not >> provide the proper dependencies. > > Perhaps so. LKMM does include other features which the compiler can > defeat if the programmer isn't sufficiently careful. How many of these are due to herd7's limitations vs. in the cat file? >>> In the paper, I speculated that if a physical execution of a program >>> matches an abstract execution containing such a non-observed OOTA cycle, >>> then it also matches another abstract execution in which the cycle >>> doesn't exist. I don't know how to prove this conjecture, though. >> >> Yes, that also makes sense. >> >> Note that this speculation does not hold in the current LKMM though. 
In the >> Litmus test I shared in the opening e-mail, where the outcome 0:r1=1 /\ >> 1:r1=1 is only possible with an OOTA (even though the values from the OOTA >> are never used anywhere). > > If the fact that the outcome 0:r1=1 /\ 1:r1=1 has occurred is proof that > there was OOTA, then the OOTA cycle _is_ observed, albeit indirectly -- > at least, in the sense that I intended. (The situation mentioned in the > paper is better described as an execution where the compiler has elided > all the accesses in the OOTA cycle.) I'm not sure that sense makes a lot of sense to me. But it does make the proof of your claim totally trivial. If there is no other OOTA-free execution with the same observable behavior, then it is proof that the OOTA happened, so the OOTA was observed. So by contraposition any non-observed OOTA has an OOTA-free execution with the same observable behavior. The sense in which I would define observed is more along the lines of "there is an observable side effect (such as store to volatile location) which has a semantic dependency on a load that reads from one of the stores in the OOTA cycle". > >> With C++'s non-local model I wouldn't be totally surprised if there were >> similar examples in C++, but given that its ordering definition is a lot >> more straightforward than LKMM in that it doesn't have all these cases of >> different barriers like wmb and rmb and corner cases like Noreturn etc., my >> intuition says that there aren't any. > > I'll have to give this some thought. If you do want to prove the claim for the stricter definition of observable OOTA, I would think about looking at the first read r in each thread that reads from an OOTA store w, and see if at least one of them could read from another store. If that is not possible, then my intuition would be that there would be some happens-before relation blocking you from reading from an earlier store than w, in particular, w ->hb r. If it is not possible on any thread, then you get a bunch of these hb edges much in parallel to the OOTA cycle. Perhaps you can turn that into an hb cycle. This last step doesn't work in LKMM because the hb may be caused by rmb/wmb, which does not extend over the plain accesses in the OOTA cycle to the bounding store to make a bigger hb cycle. But in C++, if you have w ->hb r on each OOTA rfe edge, then you also have r ->hb w' along the OOTA dep edge, and you get an hb cycle. Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-10 12:21 ` Jonas Oberhauser @ 2025-01-10 21:51 ` Alan Stern 2025-01-11 12:46 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-10 21:51 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Fri, Jan 10, 2025 at 01:21:32PM +0100, Jonas Oberhauser wrote: > > > Am 1/10/2025 um 4:12 AM schrieb Alan Stern: > > On Thu, Jan 09, 2025 at 09:09:00PM +0100, Jonas Oberhauser wrote: > > > > > > > So for example, in > > > > > > > > > > int a = 1; > > > > > barrier(); > > > > > a = 2; > > > > > //... > > > > > > > > > > the compiler does not know how the code inside barrier() accesses memory, > > > > > including volatile memory. > > > > > > > > I would say rather that the compiler does not know that the values > > > > stored in memory are the same before and after the barrier(). > > > > > > > > Even the > > > > values of local variables whose addresses have not been exported. > > > > > > No, this is not true. I used to think so too until a short while ago. > > > > > > But if you look at the output of gcc -o3 you will see that it does happily > > > remove `a` in this function. > > > > Isn't that consistent with what I said? > > > Ok, after careful reading, I think there are two assumptions you have that I > think are not true, but my example is only inconsistent with exactly one of > them being not true, not with both of them being not true: > > 1) the barrier only tells the compiler that it may *change* the value of > memory locations. I believe it also tells the compiler that it may *read* > the value of memory locations. > 2) the barrier also talks about the values of local variables whose > addresses have not been exported. I do not believe this is the case. I checked the GCC manual. You are right about 1); the compiler is required to guarantee that the contents of memory before the barrier are fully up-to-date (no dirty values remaining in registers or temporaries). 2) isn't so clear. If a local variable's address is never computed then the compiler might put the variable in a register, in which case the barrier would not clobber it. On the other hand, if the variable's address is computed somewhere (even if it isn't exported) then the variable can't be kept in a register and so it is subject to the barrier's effects. The manual says: Using the ‘"memory"’ clobber effectively forms a read/write memory barrier for the compiler. Not a word about what happens if a variable has the "register" storage class, for example, and so might never be stored in memory at all. But this leaves the programmer with _no_ way to specify a memory barrier for such variables! Of course, the fact that these variables cannot be exposed to outside code does mitigate the problem... > For what gcc guarantees, the manual says: "add memory to the list of > clobbered registers. This will cause GCC to not keep memory values cached in > registers across the assembler instruction", i.e., it needs to flush the > value from the register to actual memory. Yes, GCC will write back dirty values from registers. But not because the cached values will become invalid (in fact, the cached values might not even be used after the barrier). 
Rather, because the compiler is required to assume that the assembler code in the barrier might access arbitrary memory locations -- even if that code is empty. > > > > Question: Can the compiler assume that no other threads access a between > > > > the two stores, on the grounds that this would be a data race? I'd > > > > guess that it can't make that assumption, but it would be nice to know > > > > for sure. > > > > > > It can not make the assumption if &a has escaped. In that case, barrier() > > > could be so: > > > > > > barrier(){ > > > store_release(OTHER_THREAD_PLEASE_MODIFY,&a); > > > > > > while (! load_acquire(OTHER_THREAD_IS_DONE)); > > > } > > > > > > with another thread doing > > > > > > while (! load_acquire(OTHER_THREAD_PLEASE_MODIFY)) yield(); > > > *OTHER_THREAD_PLEASE_MODIFY ++; > > > store_release(OTHER_THREAD_IS_DONE, 1); Okay, yes, the compiler can't know whether the assembler code will do this. But as far as I know, there is no specification about whether inline assembler can synchronize with code in another thread (in the sense used by the C++ memory model) and therefore create a happens-before link. Language specifications tend to ignore issues like inline assembler. Does this give the compiler license to believe no such link can exist and therefore accesses to these non-atomic variables by another thread concurrent with the barrier would be data races? In the end maybe this doesn't matter. > > > If herd7 would generate edges for semantic dependencies instead of for its > > > version of syntactic dependencies, then rwdep is the answer. > > > > That statement is meaningless (or at least, impossible to implement) > > because there is no widely agreed-upon formal definition for semantic > > dependency. > > Yes, which also means that a 100% correct end-to-end solution (herd + cat + > ... ?) is currently not implementable. > > But we can still break the problem into two halves, one which is 100% solved > inside the cat file, and one which is the responsibility of herd7 and > currently not solved (or 100% satisfactorily solvable). I believe that Luc and the other people involved with herd7 take the opposite point of view: herd7 is intended to do the "easy" analysis involving only straightforward code parsing, leaving the "hard" conceptual parts to the user-supplied .cat file. > The advantage being that if we read the cat file as a mathematical > definition, we can at least on paper argue 100% correctly about code for the > cases where we either can figure out on paper what the semantic dependencies > are, or where we at least just say "with relation to current compilers, we > know what the semantically preserved dependencies are", even if herd7 or > other tools lags behind in one or both. > > After all, herd7 is just a (useful) automation tool for reasoning about > LKMM, which has its limitations (scalability, a specific definition of > dependencies, limited C subset...). I still think we should not attempt any sort of formalization of semantic dependency here. > > > Given that we can not define dep inside the cat model, one may as well > > > define it as rwdep;rfe with the intended meaning of the dependencies being > > > the semantic ones; then it is an inaccuracy of herd7 that it does not > > > provide the proper dependencies. > > > > Perhaps so. LKMM does include other features which the compiler can > > defeat if the programmer isn't sufficiently careful. > > How many of these are due to herd7's limitations vs. in the cat file? 
Important limitations are present in both. > > > > In the paper, I speculated that if a physical execution of a program > > > > matches an abstract execution containing such a non-observed OOTA cycle, > > > > then it also matches another abstract execution in which the cycle > > > > doesn't exist. I don't know how to prove this conjecture, though. > > > > > > Yes, that also makes sense. > > > > > > Note that this speculation does not hold in the current LKMM though. In the > > > Litmus test I shared in the opening e-mail, where the outcome 0:r1=1 /\ > > > 1:r1=1 is only possible with an OOTA (even though the values from the OOTA > > > are never used anywhere). > > > > If the fact that the outcome 0:r1=1 /\ 1:r1=1 has occurred is proof that > > there was OOTA, then the OOTA cycle _is_ observed, albeit indirectly -- > > at least, in the sense that I intended. (The situation mentioned in the > > paper is better described as an execution where the compiler has elided > > all the accesses in the OOTA cycle.) > > I'm not sure that sense makes a lot of sense to me. Here's an example illustrating what I had in mind. Imagine that all the accesses here are C++-style relaxed atomic (i.e., not volatile and also not subject to data races): P0(int *a, int *b) { int r0 = *a; *b = r0; *b = 2; // r0 not used again } P1(int *a, int *b) { int r1 = *b; *a = r1; *a = 2; // r1 not used again } The compiler could eliminate the r0 and r1 accesses entirely, leaving just: P0(int *a, int *b) { *b = 2; } P1(int *a, int *b) { *a = 2; } An execution of the corresponding machine code would then be compatible with an abstract execution of the source code in which both r0 and r1 get set to 42 (OOTA). But it would also be compatible with an abstract execution in which both r0 and r1 are 0, so it doesn't make sense to say that the hardware execution is, or might be, an instance of OOTA. > But it does make the proof of your claim totally trivial. If there is no > other OOTA-free execution with the same observable behavior, then it is > proof that the OOTA happened, so the OOTA was observed. > So by contraposition any non-observed OOTA has an OOTA-free execution with > the same observable behavior. > > The sense in which I would define observed is more along the lines of "there > is an observable side effect (such as store to volatile location) which has > a semantic dependency on a load that reads from one of the stores in the > OOTA cycle". Yes, I can see that from your proposed definition of Live. I'm afraid we've wandered off the point of this email thread, however... Getting back to the original point, why don't you rewrite your patch as discussed earlier and describe it as an attempt to add ordering for important situations involving plain accesses that the LKMM currently does not handle? In other words, leave out as far as possible any mention of OOTA or semantic dependency. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-10 21:51 ` Alan Stern @ 2025-01-11 12:46 ` Jonas Oberhauser 2025-01-11 21:19 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-11 12:46 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/10/2025 um 10:51 PM schrieb Alan Stern: > On Fri, Jan 10, 2025 at 01:21:32PM +0100, Jonas Oberhauser wrote: >> >> >> Am 1/10/2025 um 4:12 AM schrieb Alan Stern: >>> On Thu, Jan 09, 2025 at 09:09:00PM +0100, Jonas Oberhauser wrote: >>> >>>>>> So for example, in >>>>>> >>>>>> int a = 1; >>>>>> barrier(); >>>>>> a = 2; >>>>>> //... >>>>>> >>>>>> the compiler does not know how the code inside barrier() accesses memory, >>>>>> including volatile memory. >>>>> >>>>> I would say rather that the compiler does not know that the values >>>>> stored in memory are the same before and after the barrier(). >>>>> >>>>> Even the >>>>> values of local variables whose addresses have not been exported. >>>> >>>> No, this is not true. I used to think so too until a short while ago. >>>> >>>> But if you look at the output of gcc -o3 you will see that it does happily >>>> remove `a` in this function. >>> >>> Isn't that consistent with what I said? >> >> >> Ok, after careful reading, I think there are two assumptions you have that I >> think are not true, but my example is only inconsistent with exactly one of >> them being not true, not with both of them being not true: >> >> [...] >> >> 2) the barrier also talks about the values of local variables whose >> addresses have not been exported. I do not believe this is the case. > > 2) isn't so clear. If a local variable's address is never computed then > the compiler might put the variable in a register, in which case the > barrier would not clobber it. On the other hand, if the variable's > address is computed somewhere (even if it isn't exported) then the > variable can't be kept in a register and so it is subject to the > barrier's effects. I understood "its address is exported" to mean that enough information has been exported to legally compute its address. Btw, I just remembered a discussion about provenance in C & C++ which is also very related to this, where the compiler moved a (I think non-atomic) access across a release fence because it "knew" that the address of the non-atomic was not exported. I can't find that discussion now. >> For what gcc guarantees, the manual says: "add memory to the list of >> clobbered registers. This will cause GCC to not keep memory values cached in >> registers across the assembler instruction", i.e., it needs to flush the >> value from the register to actual memory. > > Yes, GCC will write back dirty values from registers. But not because > the cached values will become invalid (in fact, the cached values might > not even be used after the barrier). Rather, because the compiler is > required to assume that the assembler code in the barrier might access > arbitrary memory locations -- even if that code is empty. Yes :) >>>> If herd7 would generate edges for semantic dependencies instead of for its >>>> version of syntactic dependencies, then rwdep is the answer. >>> >>> That statement is meaningless (or at least, impossible to implement) >>> because there is no widely agreed-upon formal definition for semantic >>> dependency. 
>> >> Yes, which also means that a 100% correct end-to-end solution (herd + cat + >> ... ?) is currently not implementable. >> >> But we can still break the problem into two halves, one which is 100% solved >> inside the cat file, and one which is the responsibility of herd7 and >> currently not solved (or 100% satisfactorily solvable). > > I believe that Luc and the other people involved with herd7 take the > opposite point of view: herd7 is intended to do the "easy" analysis > involving only straightforward code parsing, leaving the "hard" > conceptual parts to the user-supplied .cat file. I can understand the attractiveness of that point of view, but there is no way to define "semantic dependencies" at all or "live access" 100% accurately in cat, since it requires a lot of syntactic information that is not present at that level. But there is in herd7 (at least for some specific definition of "semantically dependent"). >> The advantage being that if we read the cat file as a mathematical >> definition, we can at least on paper argue 100% correctly about code for the >> cases where we either can figure out on paper what the semantic dependencies >> are, or where we at least just say "with relation to current compilers, we >> know what the semantically preserved dependencies are", even if herd7 or >> other tools lags behind in one or both. >> >> After all, herd7 is just a (useful) automation tool for reasoning about >> LKMM, which has its limitations (scalability, a specific definition of >> dependencies, limited C subset...). > > I still think we should not attempt any sort of formalization of > semantic dependency here. I 100% agree and apologies if I ever gave that impression. What I want is to change the interpretation of ctrl,data,addr in LKMM from saying "it is intended to be a syntactic dependency, which causes LKMM to be inaccurate" to "it is intended to be a semantic dependency, but because there is no formal defn. and tooling we *use* syntactic dependencies, which causes the current implementations to be inaccurate", without formally defining what a semantic dependency is. E.g., in "A WARNING", I would change ------------- The protections provided by READ_ONCE(), WRITE_ONCE(), and others are not perfect; and under some circumstances it is possible for the compiler to undermine the memory model. ------------- into something like ------------- The current tooling around LKMM does not model semantic dependencies, and instead uses syntactic dependencies to specify the protections provided by READ_ONCE(), WRITE_ONCE(), and others. The compiler can undermine these syntactic dependencies under some circumstances. As a consequence, the tooling may write checks that LKMM and the compiler can not cash. ------------- etc. This is under my assumption that if we had let's say gcc's "semantic dependencies" or an under-approximation of it (by that I mean allow less things to be dependent than gcc can see), that these cases would be resolved, in the sense that gcc can not undermine [R & Marked] ; gcc-dep where gcc-dep is the dependencies detected by gcc. But I would be interested to see any cases where this assumption is not true. Note that this still results in a fully formal definition of LKMM, because just like now, addr,ctrl,and data are simply uninterpreted relations. We don't need to formalize their meaning at that level. 
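(To make the "checks that the compiler can not cash" concrete, here is a made-up two-line example; x and y are only for illustration:

	r1 = READ_ONCE(*x);
	WRITE_ONCE(*y, r1 - r1);

herd7 reports a data dependency from the load to the store, so the current tooling orders the two accesses; but the compiler is free to simply emit WRITE_ONCE(*y, 0), and the hardware may then let the store become visible before the load is satisfied. No plausible gcc-dep would contain this edge, so under the reading I propose above this is an inaccuracy of the tooling, not of the model.)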
>>>> Given that we can not define dep inside the cat model, one may as well >>>> define it as rwdep;rfe with the intended meaning of the dependencies being >>>> the semantic ones; then it is an inaccuracy of herd7 that it does not >>>> provide the proper dependencies. >>> >>> Perhaps so. LKMM does include other features which the compiler can >>> defeat if the programmer isn't sufficiently careful. >> >> How many of these are due to herd7's limitations vs. in the cat file? > > Important limitations are present in both. I am genuinely asking. Do we have a list of the limitations? Maybe it would be good to collect it in the "A WARNING" section of explanation.txt if it doesn't exist elsewhere. >>>>> In the paper, I speculated that if a physical execution of a program >>>>> matches an abstract execution containing such a non-observed OOTA cycle, >>>>> then it also matches another abstract execution in which the cycle >>>>> doesn't exist. I don't know how to prove this conjecture, though. >>>> >>>> Yes, that also makes sense. >>>> >>>> Note that this speculation does not hold in the current LKMM though. In the >>>> Litmus test I shared in the opening e-mail, where the outcome 0:r1=1 /\ >>>> 1:r1=1 is only possible with an OOTA (even though the values from the OOTA >>>> are never used anywhere). >>> >>> If the fact that the outcome 0:r1=1 /\ 1:r1=1 has occurred is proof that >>> there was OOTA, then the OOTA cycle _is_ observed, albeit indirectly -- >>> at least, in the sense that I intended. (The situation mentioned in the >>> paper is better described as an execution where the compiler has elided >>> all the accesses in the OOTA cycle.) >> >> I'm not sure that sense makes a lot of sense to me. > > Here's an example illustrating what I had in mind. Imagine that all the > accesses here are C++-style relaxed atomic (i.e., not volatile and also > not subject to data races): > > P0(int *a, int *b) { > int r0 = *a; > *b = r0; > *b = 2; > // r0 not used again > } > > P1(int *a, int *b) { > int r1 = *b; > *a = r1; > *a = 2; > // r1 not used again > } > > The compiler could eliminate the r0 and r1 accesses entirely, leaving > just: > > P0(int *a, int *b) { > *b = 2; > } > > P1(int *a, int *b) { > *a = 2; > } > > An execution of the corresponding machine code would then be compatible > with an abstract execution of the source code in which both r0 and r1 > get set to 42 (OOTA). But it would also be compatible with an abstract > execution in which both r0 and r1 are 0, so it doesn't make sense to say > that the hardware execution is, or might be, an instance of OOTA. Yes, but this does not require the definition you expressed before. This is already not observable according to the definition that there is no read R from an OOTA-cycle store, where some observable side effect semantically depends on R. What I was trying to say is that your definition is almost a no-true-scotsman (sorry Paul) definition: every program with an OOTA execution where you could potentially not find an "equivalent" execution without OOTA, is simply labeled a "no-true-unobserved OOTA". >> But it does make the proof of your claim totally trivial. If there is no >> other OOTA-free execution with the same observable behavior, then it is >> proof that the OOTA happened, so the OOTA was observed. >> So by contraposition any non-observed OOTA has an OOTA-free execution with >> the same observable behavior. 
>> >> The sense in which I would define observed is more along the lines of "there >> is an observable side effect (such as store to volatile location) which has >> a semantic dependency on a load that reads from one of the stores in the >> OOTA cycle". > > Yes, I can see that from your proposed definition of Live. > I'm afraid we've wandered off the point of this email thread, however... Maybe, but it is still an important and interesting discussion. I am also open to continuing it on another channel though. > Getting back to the original point, why don't you rewrite your patch as > discussed earlier and describe it as an attempt to add ordering for > important situations involving plain accesses that the LKMM currently > does not handle? In other words, leave out as far as possible any > mention of OOTA or semantic dependency. I will try, but I have some other things to do first, so it may take a while. Have fun, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-11 12:46 ` Jonas Oberhauser @ 2025-01-11 21:19 ` Alan Stern 2025-01-12 15:55 ` Jonas Oberhauser 0 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-01-11 21:19 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Sat, Jan 11, 2025 at 01:46:21PM +0100, Jonas Oberhauser wrote: > What I want is to change the interpretation of ctrl,data,addr in LKMM from > saying "it is intended to be a syntactic dependency, which causes LKMM to be > inaccurate" to "it is intended to be a semantic dependency, but because > there is no formal defn. and tooling we *use* syntactic dependencies, which > causes the current implementations to be inaccurate", without formally > defining what a semantic dependency is. I guess you could take that point of view. We have never tried to make it explicit. > E.g., in "A WARNING", I would change > ------------- > The protections provided by READ_ONCE(), WRITE_ONCE(), and others are > not perfect; and under some circumstances it is possible for the > compiler to undermine the memory model. > ------------- > into something like > ------------- > The current tooling around LKMM does not model semantic dependencies, and > instead uses syntactic dependencies to specify the protections provided by > READ_ONCE(), WRITE_ONCE(), and others. The compiler can undermine these > syntactic dependencies under some circumstances. > As a consequence, the tooling may write checks that LKMM and the compiler > can not cash. > ------------- > > etc. I doubt the ordinary reader would appreciate the difference. How would you feel about changing the existing text this way instead? ... it is possible for the compiler to undermine the memory model (because the LKMM and the software tools associated with it do not understand the somewhat vague concept of "semantic dependency" -- see below). > This is under my assumption that if we had let's say gcc's "semantic > dependencies" or an under-approximation of it (by that I mean allow less > things to be dependent than gcc can see), that these cases would be > resolved, in the sense that gcc can not undermine [R & Marked] ; gcc-dep > where gcc-dep is the dependencies detected by gcc. That seems circular. Basically, you're saying the gcc will not break any dependencies that gcc classifies as not-breakable! > But I would be interested to see any cases where this assumption is not > true. > > > Note that this still results in a fully formal definition of LKMM, because > just like now, addr,ctrl,and data are simply uninterpreted relations. We > don't need to formalize their meaning at that level. > > > > Perhaps so. LKMM does include other features which the compiler can > > > > defeat if the programmer isn't sufficiently careful. > > > > > > How many of these are due to herd7's limitations vs. in the cat file? > > > > Important limitations are present in both. > > I am genuinely asking. Do we have a list of the limitations? > Maybe it would be good to collect it in the "A WARNING" section of > explanation.txt if it doesn't exist elsewhere. There are a few listed already at various spots in explanation.txt -- search for "undermine". And yes, many or most of these limitations do arise from LKMM's failure to recognize when a dependency isn't semantic. 
Maybe some are also related to undefined behavior, which LKMM is not aware of. There is one other weakness I know of, however -- something totally different. It's an instance in which the formal model in the .cat file fails to capture the intent of the informal operational model. As I recall, it goes like this: The operational model asserts that an A-cumulative fence (like a release fence) on a CPU ensures ordering for all stores that propagate to that CPU before the fence is executed. But the .cat code says only that this ordering applies to stores which the CPU reads from before the fence is executed. I believe you can make up litmus tests where you can prove that a store must have propagated to the CPU before an A-cumulative fence occurs, even though the CPU doesn't read from that store; for such examples the LKMM may accept executions that shouldn't be allowed. There may even be an instance of this somewhere in the various litmus test archives; I don't remember. > > An execution of the corresponding machine code would then be compatible > > with an abstract execution of the source code in which both r0 and r1 > > get set to 42 (OOTA). But it would also be compatible with an abstract > > execution in which both r0 and r1 are 0, so it doesn't make sense to say > > that the hardware execution is, or might be, an instance of OOTA. > > Yes, but this does not require the definition you expressed before. This is > already not observable according to the definition that there is no read R > from an OOTA-cycle store, where some observable side effect semantically > depends on R. Yes. This was not meant to be an exact analogy. It was merely an observation of a concept closely related to something you said. > What I was trying to say is that your definition is almost a > no-true-scotsman (sorry Paul) definition: (I would use the term "tautology" -- although I don't believe it was a tautology in its original context, which referred specifically to cases where the compiler had removed all the accesses in an OOTA cycle.) > every program with an OOTA > execution where you could potentially not find an "equivalent" execution > without OOTA, is simply labeled a "no-true-unobserved OOTA". > > > > But it does make the proof of your claim totally trivial. If there is no > > > other OOTA-free execution with the same observable behavior, then it is > > > proof that the OOTA happened, so the OOTA was observed. > > > So by contraposition any non-observed OOTA has an OOTA-free execution with > > > the same observable behavior. > > > > > > The sense in which I would define observed is more along the lines of "there > > > is an observable side effect (such as store to volatile location) which has > > > a semantic dependency on a load that reads from one of the stores in the > > > OOTA cycle". Agreed. But maybe this indicates that your sense is too weak a criterion, that an indirect observation should count just as much as a direct one. Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-11 21:19 ` Alan Stern @ 2025-01-12 15:55 ` Jonas Oberhauser 2025-01-13 19:43 ` Alan Stern 0 siblings, 1 reply; 59+ messages in thread From: Jonas Oberhauser @ 2025-01-12 15:55 UTC (permalink / raw) To: Alan Stern Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon Am 1/11/2025 um 10:19 PM schrieb Alan Stern: > On Sat, Jan 11, 2025 at 01:46:21PM +0100, Jonas Oberhauser wrote: >> >> This is under my assumption that if we had let's say gcc's "semantic >> dependencies" or an under-approximation of it (by that I mean allow less >> things to be dependent than gcc can see), that these cases would be >> resolved, in the sense that gcc can not undermine [R & Marked] ; gcc-dep >> where gcc-dep is the dependencies detected by gcc. > > That seems circular. Basically, you're saying the gcc will not break > any dependencies that gcc classifies as not-breakable! Maybe my formulation is not exactly what I meant to express. I am thinking of examples like this, r1 = READ_ONCE(*x); *b = r1; ~~> r1 = READ_ONCE(*x); if (*b != r1) { *b = r1; } Here there is clearly a dependency to a store, but gcc might turn it into an independent load (in case *b == r1). Just because gcc admits that there is a dependency, does not necessarily mean that it will not still undermine the ordering "bestowed upon" that dependency by a memory model in some creative way. The cases that I could think of all still worked for very specific architecture-specific reasons (e.g., x86 has CMOV but all loads provide acquire-ordering, and arm does not have flag-conditional str, etc.) Or perhaps there is no dependency in case *b == r1. I am not sure. Another thought that pops up here is that when I last worked on formalizing dependencies, I could not define dependencies as being between one load and one store, a dependency might be between a set of loads and one store. I would have to look up the exact reason, but I think it was because sometimes you need to change more than one value to influence the result, e.g., a && b where both a and b are 0 - just changing one will not make a difference. All of these complications make me wonder whether even a relational notion of semantic dependency is good enough. >>>> Alan Stern: >>>>> Perhaps so. LKMM does include other features which the compiler can >>>>> defeat if the programmer isn't sufficiently careful. >>>> >>>> How many of these are due to herd7's limitations vs. in the cat file? >>> >>> Important limitations are present in both. >> >> I am genuinely asking. Do we have a list of the limitations? >> Maybe it would be good to collect it in the "A WARNING" section of >> explanation.txt if it doesn't exist elsewhere. > > There are a few listed already at various spots in explanation.txt -- > search for "undermine". And yes, many or most of these limitations do > arise from LKMM's failure to recognize when a dependency isn't semantic. > Maybe some are also related to undefined behavior, which LKMM is not > aware of. Thanks. Another point mentioned in that document is the total order of po, whereas C has a more relaxed notion of sequenced-before; this could even affect volatile accesses, e.g. in f(READ_ONCE(*a), g()). where g() calls an rmb and then READ_ONCE(*b), and it is not clear whether there should be a ppo from reading *a to *b in some executions or not. 
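To spell that example out (the declarations are made up just for illustration):

	extern int *a, *b;
	extern void f(int, int);

	int g(void)
	{
		smp_rmb();
		return READ_ONCE(*b);
	}

	...
	f(READ_ONCE(*a), g());

The evaluation order of the two arguments is unspecified, so the call to g() may be evaluated either before or after READ_ONCE(*a); the smp_rmb() sits between the two loads only in those executions where READ_ONCE(*a) is evaluated first. A model in which po totally orders the accesses of a thread has to commit to one of the two orders, even though the C abstract machine allows both.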
> > There is one other weakness I know of, however -- something totally > different. It's an instance in which the formal model in the .cat file > fails to capture the intent of the informal operational model. > > As I recall, it goes like this: The operational model asserts that an > A-cumulative fence (like a release fence) on a CPU ensures ordering for > all stores that propagate to that CPU before the fence is executed. But > the .cat code says only that this ordering applies to stores which the > CPU reads from before the fence is executed. I believe you can make up > litmus tests where you can prove that a store must have propagated to > the CPU before an A-cumulative fence occurs, even though the CPU doesn't > read from that store; for such examples the LKMM may accept executions > that shouldn't be allowed. > > There may even be an instance of this somewhere in the various litmus > test archives; I don't remember. Ah, I think you explained this case to me before. It is the one where prop & int covers some but not all cases, right? (compared to the cat PowerPC model, which does not even have prop & int) For now I am mostly worried about cases where LKMM promises too much, rather than too little. The latter case arises naturally as a trade-off between complexity of the model, algorithmic complexity, and what guarantees are actually needed from the model by real code. By the way, I am currently investigating a formulation of LKMM where there is a separate "propagation order" per thread, prop-order[t], which relates `a` to `b` events iff `a` is "observed" to propagate to t before `b`. Observation can also include co and fr, not just rf, which might be sufficient to cover those cases. I have a hand-written proof sketch that an order ORD induced by prop-order[t], xb, and some other orders is acyclic, and a Coq proof that executing an operational model in any linearization of ORD is permitted (e.g., does not propagate a store before it is executed, or in violation of co) and has the same rf as the axiomatic execution. So if the proof sketch works out, this might indicate that with such a per-thread propagation order, one can eliminate those cases. But to make the definition work, I had to make xb use prop-order[t] instead of prop in some cases, and the definitions of xb and prop-order[t] are mutually recursive, so it's not a very digestible definition. So I do not recommend replacing LKMM with such a model even if it works, but it is useful for investigating some boundary conditions. >>> An execution of the corresponding machine code would then be compatible >>> with an abstract execution of the source code in which both r0 and r1 >>> get set to 42 (OOTA). But it would also be compatible with an abstract >>> execution in which both r0 and r1 are 0, so it doesn't make sense to say >>> that the hardware execution is, or might be, an instance of OOTA. >> >> Yes, but this does not require the definition you expressed before. This is >> already not observable according to the definition that there is no read R >> from an OOTA-cycle store, where some observable side effect semantically >> depends on R. >> >> What I was trying to say is that your definition is almost a >> no-true-scotsman (sorry Paul) definition: > > (I would use the term "tautology" -- although I don't believe it was a > tautology in its original context, which referred specifically to cases > where the compiler had removed all the accesses in an OOTA cycle.) 
> >> every program with an OOTA >> execution where you could potentially not find an "equivalent" execution >> without OOTA, is simply labeled a "no-true-unobserved OOTA". >> >>>> But it does make the proof of your claim totally trivial. If there is no >>>> other OOTA-free execution with the same observable behavior, then it is >>>> proof that the OOTA happened, so the OOTA was observed. >>>> So by contraposition any non-observed OOTA has an OOTA-free execution with >>>> the same observable behavior. >>>> >>>> The sense in which I would define observed is more along the lines of "there >>>> is an observable side effect (such as store to volatile location) which has >>>> a semantic dependency on a load that reads from one of the stores in the >>>> OOTA cycle". > > Agreed. But maybe this indicates that your sense is too weak a > criterion, that an indirect observation should count just as much as a > direct one. I don't follow this conclusion. I think there are two relevant claims here: 1) compilers do not introduce "observed OOTA" 2) For every execution graph with not-observed OOTA, there is another execution with the same observable side effects that does not have OOTA. While it may be much easier to prove 2) with a more relaxed notion of observed OOTA, 1) sounds much harder. How does the compiler know that there is no indirect way to observe the OOTA? E.g., in the LKMM example, ignoring the compiler barriers, it might be possible for a compiler to deduce that the plain accesses are never used and delete them, resulting in an OOTA that is observed under the more relaxed setting, violating claim 1). Best wishes, jonas ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-12 15:55 ` Jonas Oberhauser @ 2025-01-13 19:43 ` Alan Stern 0 siblings, 0 replies; 59+ messages in thread From: Alan Stern @ 2025-01-13 19:43 UTC (permalink / raw) To: Jonas Oberhauser Cc: paulmck, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Sun, Jan 12, 2025 at 04:55:07PM +0100, Jonas Oberhauser wrote: > > > Am 1/11/2025 um 10:19 PM schrieb Alan Stern: > > On Sat, Jan 11, 2025 at 01:46:21PM +0100, Jonas Oberhauser wrote: > >> > > > This is under my assumption that if we had let's say gcc's "semantic > > > dependencies" or an under-approximation of it (by that I mean allow less > > > things to be dependent than gcc can see), that these cases would be > > > resolved, in the sense that gcc can not undermine [R & Marked] ; gcc-dep > > > where gcc-dep is the dependencies detected by gcc. > > > > That seems circular. Basically, you're saying the gcc will not break > > any dependencies that gcc classifies as not-breakable! > > Maybe my formulation is not exactly what I meant to express. > I am thinking of examples like this, > > r1 = READ_ONCE(*x); > *b = r1; > > ~~> > > r1 = READ_ONCE(*x); > if (*b != r1) { > *b = r1; > } > > Here there is clearly a dependency to a store, but gcc might turn it into an > independent load (in case *b == r1). > > Just because gcc admits that there is a dependency, does not necessarily > mean that it will not still undermine the ordering "bestowed upon" that > dependency by a memory model in some creative way. Yes; that's why the LKMM considers only ordering involving marked accesses, unless absolutely necessary. > The cases that I could think of all still worked for very specific > architecture-specific reasons (e.g., x86 has CMOV but all loads provide > acquire-ordering, and arm does not have flag-conditional str, etc.) > > Or perhaps there is no dependency in case *b == r1. I am not sure. > > Another thought that pops up here is that when I last worked on formalizing > dependencies, I could not define dependencies as being between one load and > one store, a dependency might be between a set of loads and one store. I > would have to look up the exact reason, but I think it was because sometimes > you need to change more than one value to influence the result, e.g., a && b > where both a and b are 0 - just changing one will not make a difference. Paul and I mentioned exactly this issue in our C++ presentation. It's in the paper; I had to reformulate the definition of OOTA to take it into account. > All of these complications make me wonder whether even a relational notion > of semantic dependency is good enough. For handling OOTA, probably not. > > > I am genuinely asking. Do we have a list of the limitations? > > > Maybe it would be good to collect it in the "A WARNING" section of > > > explanation.txt if it doesn't exist elsewhere. > > > > There are a few listed already at various spots in explanation.txt -- > > search for "undermine". And yes, many or most of these limitations do > > arise from LKMM's failure to recognize when a dependency isn't semantic. > > Maybe some are also related to undefined behavior, which LKMM is not > > aware of. > > Thanks. Another point mentioned in that document is the total order of po, > whereas C has a more relaxed notion of sequenced-before; this could even > affect volatile accesses, e.g. in f(READ_ONCE(*a), g()). 
where g() calls an > rmb and then READ_ONCE(*b), and it is not clear whether there should be a > ppo from reading *a to *b in some executions or not. Hah, yes. Of course, this isn't a problem for the litmus tests that herd7 will accept, because herd7 doesn't understand user-defined functions. Even so, we can run across odd things like this: r1 = READ_ONCE(*x); if (r1) smp_mb(); WRITE_ONCE(*y, 1); Here the WRITE_ONCE has to be ordered after the READ_ONCE, even when *x is 0, although the memory model isn't aware of that fact. > For now I am mostly worried about cases where LKMM promises too much, rather > than too little. The latter case arises naturally as a trade-off between > complexity of the model, algorithmic complexity, and what guarantees are > actually needed from the model by real code. Agreed, promising too much is a worse problem than promising too little. > By the way, I am currently investigating a formulation of LKMM where there > is a separate "propagation order" per thread, prop-order[t], which relates > `a` to `b` events iff `a` is "observed" to propagate to t before `b`. I guess this would depend on what you mean by "observed". Are there any useful cases of this where b isn't executed by t? All I can think of is that when the right sort of memory barrier separates a from b, then for all t, a propagates to t before b does -- and this does not depend on t. > Observation can also include co and fr, not just rf, which might be > sufficient to cover those cases. I have a hand-written proof sketch that an > order ORD induced by prop-order[t], xb, and some other orders is acyclic, > and a Coq proof that executing an operational model in any linearization of > ORD is permitted (e.g., does not propagate a store before it is executed, or > in violation of co) and has the same rf as the axiomatic execution. > > So if the proof sketch works out, this might indicate that with such a > per-thread propagation order, one can eliminate those cases. > > But to make the definition work, I had to make xb use prop-order[t] instead > of prop in some cases, and the definitions of xb and prop-order[t] are > mutually recursive, so it's not a very digestible definition. These mutually recursive definitions are the bane of memory models. In fact, that's what drove Will Deacon to impose other-multicopy atomicity on the ARM64 memory model -- doing so allowed him to avoid mutual recursion. > I think there are two relevant claims here: > 1) compilers do not introduce "observed OOTA" > 2) For every execution graph with not-observed OOTA, there is another > execution with the same observable side effects that does not have OOTA. > > While it may be much easier to prove 2) with a more relaxed notion of > observed OOTA, 1) sounds much harder. How does the compiler know that there > is no indirect way to observe the OOTA? > > E.g., in the LKMM example, ignoring the compiler barriers, it might be > possible for a compiler to deduce that the plain accesses are never used and > delete them, resulting in an OOTA that is observed under the more relaxed > setting, violating claim 1). A main point of the paper Paul and I wrote for the C++ working group was that the _only_ way OOTA can occur in real-world programs is if all the accesses have been removed by the compiler (assuming the compiler obeys some minimal restrictions). We did not consider the issue of whether such instances of OOTA could be considered to be "observed". As I understand it, they would not satisfy your proposed notion of "observed". 
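For concreteness, the conditional-smp_mb() example earlier in this message could be wrapped into an LB-shaped litmus test along the following lines (a sketch, not one of the tests from this thread; the test name and the exists clause are chosen here purely for illustration):

C C-cond-mb-sketch

(*
 * Sketch only: encodes the conditional-smp_mb() snippet discussed
 * above; no Result claim is made here.
 *)

{}

P0(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*x);
	if (r1 != 0) {
		smp_mb();
	}
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*y);
	smp_mb();
	WRITE_ONCE(*x, 1);
}

exists (0:r1=1 /\ 1:r1=1)

The executions relevant to the point above are those with 0:r1=0: in them no smp_mb() event exists on P0 at all, so the model cannot see any ordering from the read of *x to the write of *y, even though (as noted above) an implementation still has to provide it.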
Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-01-06 21:40 [RFC] tools/memory-model: Rule out OOTA Jonas Oberhauser ` (2 preceding siblings ...) 2025-01-07 16:09 ` Alan Stern @ 2025-07-23 0:43 ` Paul E. McKenney 2025-07-23 7:26 ` Hernan Ponce de Leon ` (2 more replies) 3 siblings, 3 replies; 59+ messages in thread From: Paul E. McKenney @ 2025-07-23 0:43 UTC (permalink / raw) To: Jonas Oberhauser Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > example shared on this list a few years ago: Apologies for being slow, but I have finally added the litmus tests in this email thread to the https://github.com/paulmckrcu/litmus repo. It is quite likely that I have incorrectly intuited the missing portions of the litmus tests, especially the two called out in the commit log below. If you have time, please do double-check. And the updated (and condensed!) version of the C++ OOTA paper may be found here, this time with a proposed change to the standard: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3692r1.pdf Thanx, Paul ------------------------------------------------------------------------ commit fd17e8fceb75326e159ba3aa6fdb344f74f5c7a5 Author: Paul E. McKenney <paulmck@kernel.org> Date: Tue Jul 22 17:21:19 2025 -0700 manual/oota: Add Jonas and Alan OOTA examples Each of these new litmus tests contains the URL of the email message that I took it from. Please note that I had to tweak the example leading up to C-JO-OOTA-4.litmus, and I might well have misinterpreted Jonas's "~" operator. Also, C-JO-OOTA-7.litmus includes a "*r2 = a" statement that makes herd7 very unhappy. On the other hand, initializing registers to the address of a variable is straight forward, as shown in the resulting litmus test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> diff --git a/manual/oota/C-AS-OOTA-1.litmus b/manual/oota/C-AS-OOTA-1.litmus new file mode 100644 index 00000000..81a873a7 --- /dev/null +++ b/manual/oota/C-AS-OOTA-1.litmus @@ -0,0 +1,40 @@ +C C-AS-OOTA-1 + +(* + * Result: Sometimes + * + * Because smp_rmb() combined with smp_wmb() does not order earlier + * reads against later writes. + * + * https://lore.kernel.org/all/a3bf910f-509a-4ad3-a1cc-4b14ef9b3259@rowland.harvard.edu + *) + +{} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + + r1 = READ_ONCE(*x); + smp_rmb(); + if (r1 == 1) { + *a = *b; + } + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + + r1 = READ_ONCE(*y); + smp_rmb(); + if (r1 == 1) { + *b = *a; + } + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +exists (0:r1=1 /\ 1:r1=1) diff --git a/manual/oota/C-AS-OOTA-2.litmus b/manual/oota/C-AS-OOTA-2.litmus new file mode 100644 index 00000000..c672b0e7 --- /dev/null +++ b/manual/oota/C-AS-OOTA-2.litmus @@ -0,0 +1,33 @@ +C C-AS-OOTA-2 + +(* + * Result: Always + * + * If we were using C-language relaxed atomics instead of volatiles, + * the compiler *could* eliminate the first WRITE_ONCE() in each process, + * then also each process's local variable, thus having an undefined value + * for each of those local variables. But this cannot happen given that + * we are using Linux-kernel _ONCE() primitives. 
+ * + * https://lore.kernel.org/all/c2ae9bca-8526-425e-b9b5-135004ad59ad@rowland.harvard.edu/ + *) + +{} + +P0(int *a, int *b) +{ + int r0 = READ_ONCE(*a); + + WRITE_ONCE(*b, r0); + WRITE_ONCE(*b, 2); +} + +P1(int *a, int *b) +{ + int r1 = READ_ONCE(*b); + + WRITE_ONCE(*a, r0); + WRITE_ONCE(*a, 2); +} + +exists ((0:r0=0 \/ 0:r0=2) /\ (1:r1=0 \/ 1:r1=2)) diff --git a/manual/oota/C-JO-OOTA-1.litmus b/manual/oota/C-JO-OOTA-1.litmus new file mode 100644 index 00000000..6ab437b4 --- /dev/null +++ b/manual/oota/C-JO-OOTA-1.litmus @@ -0,0 +1,40 @@ +C C-JO-OOTA-1 + +(* + * Result: Never + * + * But Sometimes in LKMM as of early 2025, given that 42 is a possible + * value for things like S19.. + * + * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ + *) + +{} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + + r1 = READ_ONCE(*x); + smp_rmb(); + if (r1 == 1) { + *a = *b; + } + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + + r1 = READ_ONCE(*y); + smp_rmb(); + if (r1 == 1) { + *b = *a; + } + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +exists (b=42) diff --git a/manual/oota/C-JO-OOTA-2.litmus b/manual/oota/C-JO-OOTA-2.litmus new file mode 100644 index 00000000..ad708c60 --- /dev/null +++ b/manual/oota/C-JO-OOTA-2.litmus @@ -0,0 +1,44 @@ +C C-JO-OOTA-2 + +(* + * Result: Never + * + * But Sometimes in LKMM as of early 2025, given that 42 is a possible + * value for things like S23. + * + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ + *) + +{} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + int r2 = 0; + + r1 = READ_ONCE(*x); + smp_rmb(); + if (r1 == 1) { + r2 = *b; + } + WRITE_ONCE(*a, r2); + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + int r2 = 0; + + r1 = READ_ONCE(*y); + smp_rmb(); + if (r1 == 1) { + r2 = *a; + } + WRITE_ONCE(*b, r2); + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +exists (b=42) diff --git a/manual/oota/C-JO-OOTA-3.litmus b/manual/oota/C-JO-OOTA-3.litmus new file mode 100644 index 00000000..633b8334 --- /dev/null +++ b/manual/oota/C-JO-OOTA-3.litmus @@ -0,0 +1,46 @@ +C C-JO-OOTA-3 + +(* + * Result: Never + * + * But LKMM finds the all-ones result, perhaps due to not tracking + * control dependencies out of the "if" statement. + * + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ + *) + +{} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*x); + smp_rmb(); + r2 = READ_ONCE(*b); + if (r1 == 1) { + r2 = *b; + } + WRITE_ONCE(*a, r2); + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*y); + smp_rmb(); + r2 = READ_ONCE(*a); + if (r1 == 1) { + r2 = *a; + } + WRITE_ONCE(*b, r2); + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +exists (0:r1=1 /\ 1:r1=1) diff --git a/manual/oota/C-JO-OOTA-4.litmus b/manual/oota/C-JO-OOTA-4.litmus new file mode 100644 index 00000000..cab7ebb6 --- /dev/null +++ b/manual/oota/C-JO-OOTA-4.litmus @@ -0,0 +1,43 @@ +C C-JO-OOTA-4 + +(* + * Result: Never + * + * And LKMM agrees, which might be a surprise. 
+ * + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ + *) + +{} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + int r3; + + r1 = READ_ONCE(*x); + smp_rmb(); + r2 = *b; + r3 = r1 == 0; + WRITE_ONCE(*a, (r3 + 1) & r2); + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + int r3; + + r1 = READ_ONCE(*y); + smp_rmb(); + r2 = *a; + r3 = r1 == 0; + WRITE_ONCE(*b, (r3 + 1) & r2); + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +exists (0:r1=1 /\ 1:r1=1) diff --git a/manual/oota/C-JO-OOTA-5.litmus b/manual/oota/C-JO-OOTA-5.litmus new file mode 100644 index 00000000..145c8378 --- /dev/null +++ b/manual/oota/C-JO-OOTA-5.litmus @@ -0,0 +1,44 @@ +C C-JO-OOTA-5 + +(* + * Result: Never + * + * But LKMM finds the all-ones result, perhaps due r2 being unused. + * + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ + *) + +{} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*x); + smp_rmb(); + if (r1 == 1) { + r2 = READ_ONCE(*a); + } + *b = 1; + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*y); + smp_rmb(); + if (r1 == 1) { + r2 = READ_ONCE(*b); + } + *a = 1; + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +locations [0:r2;1:r2] +exists (0:r1=1 /\ 1:r1=1) diff --git a/manual/oota/C-JO-OOTA-6.litmus b/manual/oota/C-JO-OOTA-6.litmus new file mode 100644 index 00000000..942e6c82 --- /dev/null +++ b/manual/oota/C-JO-OOTA-6.litmus @@ -0,0 +1,44 @@ +C C-JO-OOTA-6 + +(* + * Result: Never + * + * But LKMM finds the all-ones result, due to OOTA on r2. + * + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/ + *) + +{} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*x); + smp_rmb(); + if (r1 == 1) { + r2 = READ_ONCE(*a); + } + *b = r2; + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*y); + smp_rmb(); + if (r1 == 1) { + r2 = READ_ONCE(*b); + } + *a = r2; + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +locations [0:r2;1:r2] +exists (0:r1=1 /\ 1:r1=1) diff --git a/manual/oota/C-JO-OOTA-7.litmus b/manual/oota/C-JO-OOTA-7.litmus new file mode 100644 index 00000000..31c0b8ae --- /dev/null +++ b/manual/oota/C-JO-OOTA-7.litmus @@ -0,0 +1,47 @@ +C C-JO-OOTA-7 + +(* + * Result: Never + * + * But LKMM finds the all-ones result, due to OOTA on r2. + * + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/ + *) + +{ + 0:r2=a; + 1:r2=b; +} + +P0(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*x); + smp_rmb(); + if (r1 == 1) { + r2 = READ_ONCE(*a); + } + *r2 = a; + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*y); + smp_rmb(); + if (r1 == 1) { + r2 = READ_ONCE(*b); + } + *r2 = b; + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +locations [0:r2;1:r2] +exists (0:r1=1 /\ 1:r1=1) diff --git a/manual/oota/C-PM-OOTA-1.litmus b/manual/oota/C-PM-OOTA-1.litmus new file mode 100644 index 00000000..e771e3c9 --- /dev/null +++ b/manual/oota/C-PM-OOTA-1.litmus @@ -0,0 +1,37 @@ +C C-PM-OOTA-1 + +(* + * Result: Never + * + * LKMM agrees. 
+ * + * https://lore.kernel.org/all/9a0dccbb-bfa7-4b33-ac1a-daa9841b609a@paulmck-laptop/ + *) + +{} + +P0(int *a, int *b, int *x, int *y) { + int r1; + + r1 = READ_ONCE(*x); + smp_rmb(); + if (r1 == 1) { + WRITE_ONCE(*a, *b); + } + smp_wmb(); + WRITE_ONCE(*y, 1); +} + +P1(int *a, int *b, int *x, int *y) { + int r1; + + r1 = READ_ONCE(*y); + smp_rmb(); + if (r1 == 1) { + WRITE_ONCE(*b, *a); + } + smp_wmb(); + WRITE_ONCE(*x, 1); +} + +exists b=42 ^ permalink raw reply related [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-07-23 0:43 ` Paul E. McKenney @ 2025-07-23 7:26 ` Hernan Ponce de Leon 2025-07-23 16:39 ` Paul E. McKenney 2025-07-23 17:13 ` Alan Stern 2025-07-23 19:25 ` Alan Stern 2 siblings, 1 reply; 59+ messages in thread From: Hernan Ponce de Leon @ 2025-07-23 7:26 UTC (permalink / raw) To: paulmck, Jonas Oberhauser Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm On 7/23/2025 2:43 AM, Paul E. McKenney wrote: > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: >> The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following >> example shared on this list a few years ago: > > Apologies for being slow, but I have finally added the litmus tests in > this email thread to the https://github.com/paulmckrcu/litmus repo. I do not understand some of the comments in the preamble of the tests: (* * Result: Never * * But Sometimes in LKMM as of early 2025, given that 42 is a possible * value for things like S19.. * * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ *) I see that herd7 reports one of the states to be [b]=S16. Is this supposed to be some kind of symbolic state (i.e., any value is possible)? The value in the "Result" is what we would like the model to say if we would have a perfect version of dependencies, right? > > It is quite likely that I have incorrectly intuited the missing portions > of the litmus tests, especially the two called out in the commit log > below. If you have time, please do double-check. I read the "On the other hand" from the commit log as "this fixes the problem". However I still get the following error when running C-JO-OOTA-7 with herd7 Warning: File "manual/oota/C-JO-OOTA-7.litmus": Non-symbolic memory access found on '[0]' (User error) Hernan> > And the updated (and condensed!) version of the C++ OOTA paper may be > found here, this time with a proposed change to the standard: > > https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3692r1.pdf > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit fd17e8fceb75326e159ba3aa6fdb344f74f5c7a5 > Author: Paul E. McKenney <paulmck@kernel.org> > Date: Tue Jul 22 17:21:19 2025 -0700 > > manual/oota: Add Jonas and Alan OOTA examples > > Each of these new litmus tests contains the URL of the email message > that I took it from. > > Please note that I had to tweak the example leading up to > C-JO-OOTA-4.litmus, and I might well have misinterpreted Jonas's "~" > operator. > > Also, C-JO-OOTA-7.litmus includes a "*r2 = a" statement that makes herd7 > very unhappy. On the other hand, initializing registers to the address > of a variable is straight forward, as shown in the resulting litmus test. > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > diff --git a/manual/oota/C-AS-OOTA-1.litmus b/manual/oota/C-AS-OOTA-1.litmus > new file mode 100644 > index 00000000..81a873a7 > --- /dev/null > +++ b/manual/oota/C-AS-OOTA-1.litmus > @@ -0,0 +1,40 @@ > +C C-AS-OOTA-1 > + > +(* > + * Result: Sometimes > + * > + * Because smp_rmb() combined with smp_wmb() does not order earlier > + * reads against later writes. 
> + * > + * https://lore.kernel.org/all/a3bf910f-509a-4ad3-a1cc-4b14ef9b3259@rowland.harvard.edu > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + if (r1 == 1) { > + *a = *b; > + } > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + if (r1 == 1) { > + *b = *a; > + } > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +exists (0:r1=1 /\ 1:r1=1) > diff --git a/manual/oota/C-AS-OOTA-2.litmus b/manual/oota/C-AS-OOTA-2.litmus > new file mode 100644 > index 00000000..c672b0e7 > --- /dev/null > +++ b/manual/oota/C-AS-OOTA-2.litmus > @@ -0,0 +1,33 @@ > +C C-AS-OOTA-2 > + > +(* > + * Result: Always > + * > + * If we were using C-language relaxed atomics instead of volatiles, > + * the compiler *could* eliminate the first WRITE_ONCE() in each process, > + * then also each process's local variable, thus having an undefined value > + * for each of those local variables. But this cannot happen given that > + * we are using Linux-kernel _ONCE() primitives. > + * > + * https://lore.kernel.org/all/c2ae9bca-8526-425e-b9b5-135004ad59ad@rowland.harvard.edu/ > + *) > + > +{} > + > +P0(int *a, int *b) > +{ > + int r0 = READ_ONCE(*a); > + > + WRITE_ONCE(*b, r0); > + WRITE_ONCE(*b, 2); > +} > + > +P1(int *a, int *b) > +{ > + int r1 = READ_ONCE(*b); > + > + WRITE_ONCE(*a, r0); > + WRITE_ONCE(*a, 2); > +} > + > +exists ((0:r0=0 \/ 0:r0=2) /\ (1:r1=0 \/ 1:r1=2)) > diff --git a/manual/oota/C-JO-OOTA-1.litmus b/manual/oota/C-JO-OOTA-1.litmus > new file mode 100644 > index 00000000..6ab437b4 > --- /dev/null > +++ b/manual/oota/C-JO-OOTA-1.litmus > @@ -0,0 +1,40 @@ > +C C-JO-OOTA-1 > + > +(* > + * Result: Never > + * > + * But Sometimes in LKMM as of early 2025, given that 42 is a possible > + * value for things like S19.. > + * > + * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + if (r1 == 1) { > + *a = *b; > + } > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + if (r1 == 1) { > + *b = *a; > + } > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +exists (b=42) > diff --git a/manual/oota/C-JO-OOTA-2.litmus b/manual/oota/C-JO-OOTA-2.litmus > new file mode 100644 > index 00000000..ad708c60 > --- /dev/null > +++ b/manual/oota/C-JO-OOTA-2.litmus > @@ -0,0 +1,44 @@ > +C C-JO-OOTA-2 > + > +(* > + * Result: Never > + * > + * But Sometimes in LKMM as of early 2025, given that 42 is a possible > + * value for things like S23. 
> + * > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2 = 0; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + if (r1 == 1) { > + r2 = *b; > + } > + WRITE_ONCE(*a, r2); > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2 = 0; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + if (r1 == 1) { > + r2 = *a; > + } > + WRITE_ONCE(*b, r2); > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +exists (b=42) > diff --git a/manual/oota/C-JO-OOTA-3.litmus b/manual/oota/C-JO-OOTA-3.litmus > new file mode 100644 > index 00000000..633b8334 > --- /dev/null > +++ b/manual/oota/C-JO-OOTA-3.litmus > @@ -0,0 +1,46 @@ > +C C-JO-OOTA-3 > + > +(* > + * Result: Never > + * > + * But LKMM finds the all-ones result, perhaps due to not tracking > + * control dependencies out of the "if" statement. > + * > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + r2 = READ_ONCE(*b); > + if (r1 == 1) { > + r2 = *b; > + } > + WRITE_ONCE(*a, r2); > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + r2 = READ_ONCE(*a); > + if (r1 == 1) { > + r2 = *a; > + } > + WRITE_ONCE(*b, r2); > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +exists (0:r1=1 /\ 1:r1=1) > diff --git a/manual/oota/C-JO-OOTA-4.litmus b/manual/oota/C-JO-OOTA-4.litmus > new file mode 100644 > index 00000000..cab7ebb6 > --- /dev/null > +++ b/manual/oota/C-JO-OOTA-4.litmus > @@ -0,0 +1,43 @@ > +C C-JO-OOTA-4 > + > +(* > + * Result: Never > + * > + * And LKMM agrees, which might be a surprise. > + * > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + int r3; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + r2 = *b; > + r3 = r1 == 0; > + WRITE_ONCE(*a, (r3 + 1) & r2); > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + int r3; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + r2 = *a; > + r3 = r1 == 0; > + WRITE_ONCE(*b, (r3 + 1) & r2); > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +exists (0:r1=1 /\ 1:r1=1) > diff --git a/manual/oota/C-JO-OOTA-5.litmus b/manual/oota/C-JO-OOTA-5.litmus > new file mode 100644 > index 00000000..145c8378 > --- /dev/null > +++ b/manual/oota/C-JO-OOTA-5.litmus > @@ -0,0 +1,44 @@ > +C C-JO-OOTA-5 > + > +(* > + * Result: Never > + * > + * But LKMM finds the all-ones result, perhaps due r2 being unused. 
> + * > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + if (r1 == 1) { > + r2 = READ_ONCE(*a); > + } > + *b = 1; > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + if (r1 == 1) { > + r2 = READ_ONCE(*b); > + } > + *a = 1; > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +locations [0:r2;1:r2] > +exists (0:r1=1 /\ 1:r1=1) > diff --git a/manual/oota/C-JO-OOTA-6.litmus b/manual/oota/C-JO-OOTA-6.litmus > new file mode 100644 > index 00000000..942e6c82 > --- /dev/null > +++ b/manual/oota/C-JO-OOTA-6.litmus > @@ -0,0 +1,44 @@ > +C C-JO-OOTA-6 > + > +(* > + * Result: Never > + * > + * But LKMM finds the all-ones result, due to OOTA on r2. > + * > + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/ > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + if (r1 == 1) { > + r2 = READ_ONCE(*a); > + } > + *b = r2; > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + if (r1 == 1) { > + r2 = READ_ONCE(*b); > + } > + *a = r2; > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +locations [0:r2;1:r2] > +exists (0:r1=1 /\ 1:r1=1) > diff --git a/manual/oota/C-JO-OOTA-7.litmus b/manual/oota/C-JO-OOTA-7.litmus > new file mode 100644 > index 00000000..31c0b8ae > --- /dev/null > +++ b/manual/oota/C-JO-OOTA-7.litmus > @@ -0,0 +1,47 @@ > +C C-JO-OOTA-7 > + > +(* > + * Result: Never > + * > + * But LKMM finds the all-ones result, due to OOTA on r2. > + * > + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/ > + *) > + > +{ > + 0:r2=a; > + 1:r2=b; > +} > + > +P0(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + if (r1 == 1) { > + r2 = READ_ONCE(*a); > + } > + *r2 = a; > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) > +{ > + int r1; > + int r2; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + if (r1 == 1) { > + r2 = READ_ONCE(*b); > + } > + *r2 = b; > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +locations [0:r2;1:r2] > +exists (0:r1=1 /\ 1:r1=1) > diff --git a/manual/oota/C-PM-OOTA-1.litmus b/manual/oota/C-PM-OOTA-1.litmus > new file mode 100644 > index 00000000..e771e3c9 > --- /dev/null > +++ b/manual/oota/C-PM-OOTA-1.litmus > @@ -0,0 +1,37 @@ > +C C-PM-OOTA-1 > + > +(* > + * Result: Never > + * > + * LKMM agrees. > + * > + * https://lore.kernel.org/all/9a0dccbb-bfa7-4b33-ac1a-daa9841b609a@paulmck-laptop/ > + *) > + > +{} > + > +P0(int *a, int *b, int *x, int *y) { > + int r1; > + > + r1 = READ_ONCE(*x); > + smp_rmb(); > + if (r1 == 1) { > + WRITE_ONCE(*a, *b); > + } > + smp_wmb(); > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *a, int *b, int *x, int *y) { > + int r1; > + > + r1 = READ_ONCE(*y); > + smp_rmb(); > + if (r1 == 1) { > + WRITE_ONCE(*b, *a); > + } > + smp_wmb(); > + WRITE_ONCE(*x, 1); > +} > + > +exists b=42 ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-07-23 7:26 ` Hernan Ponce de Leon @ 2025-07-23 16:39 ` Paul E. McKenney 2025-07-24 14:14 ` Paul E. McKenney 0 siblings, 1 reply; 59+ messages in thread From: Paul E. McKenney @ 2025-07-23 16:39 UTC (permalink / raw) To: Hernan Ponce de Leon Cc: Jonas Oberhauser, stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm On Wed, Jul 23, 2025 at 09:26:32AM +0200, Hernan Ponce de Leon wrote: > On 7/23/2025 2:43 AM, Paul E. McKenney wrote: > > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > > > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > > > example shared on this list a few years ago: > > > > Apologies for being slow, but I have finally added the litmus tests in > > this email thread to the https://github.com/paulmckrcu/litmus repo. > > I do not understand some of the comments in the preamble of the tests: > > (* > * Result: Never > * > * But Sometimes in LKMM as of early 2025, given that 42 is a possible > * value for things like S19.. > * > * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ > *) > > I see that herd7 reports one of the states to be [b]=S16. Is this supposed > to be some kind of symbolic state (i.e., any value is possible)? Exactly! > The value in the "Result" is what we would like the model to say if we would > have a perfect version of dependencies, right? In this case, yes. There are other cases elsewhere in which the "Result:" comment instead records LKMM's current state, so that any deviation (whether right or wrong) are noted. Most recently, the 1800+ changes in luc/RelAcq. > > It is quite likely that I have incorrectly intuited the missing portions > > of the litmus tests, especially the two called out in the commit log > > below. If you have time, please do double-check. > > I read the "On the other hand" from the commit log as "this fixes the > problem". However I still get the following error when running C-JO-OOTA-7 > with herd7 > > Warning: File "manual/oota/C-JO-OOTA-7.litmus": Non-symbolic memory access > found on '[0]' (User error) Yes, my interpretation of the example in that URL didn't make any sense at all to herd7. So I would welcome a fix to this litmus test. The only potential fixes that I found clearly went against the intent of this litmus test. My only real contribution in my coding of manual/oota/C-JO-OOTA-7.litmus is showing how to initialize a local herd7 variable to contain a pointer to a global variable. ;-) Thanx, Paul > Hernan> > > And the updated (and condensed!) version of the C++ OOTA paper may be > > found here, this time with a proposed change to the standard: > > > > https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3692r1.pdf > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit fd17e8fceb75326e159ba3aa6fdb344f74f5c7a5 > > Author: Paul E. McKenney <paulmck@kernel.org> > > Date: Tue Jul 22 17:21:19 2025 -0700 > > > > manual/oota: Add Jonas and Alan OOTA examples > > Each of these new litmus tests contains the URL of the email message > > that I took it from. > > Please note that I had to tweak the example leading up to > > C-JO-OOTA-4.litmus, and I might well have misinterpreted Jonas's "~" > > operator. > > Also, C-JO-OOTA-7.litmus includes a "*r2 = a" statement that makes herd7 > > very unhappy. 
On the other hand, initializing registers to the address > > of a variable is straight forward, as shown in the resulting litmus test. > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > > > diff --git a/manual/oota/C-AS-OOTA-1.litmus b/manual/oota/C-AS-OOTA-1.litmus > > new file mode 100644 > > index 00000000..81a873a7 > > --- /dev/null > > +++ b/manual/oota/C-AS-OOTA-1.litmus > > @@ -0,0 +1,40 @@ > > +C C-AS-OOTA-1 > > + > > +(* > > + * Result: Sometimes > > + * > > + * Because smp_rmb() combined with smp_wmb() does not order earlier > > + * reads against later writes. > > + * > > + * https://lore.kernel.org/all/a3bf910f-509a-4ad3-a1cc-4b14ef9b3259@rowland.harvard.edu > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + if (r1 == 1) { > > + *a = *b; > > + } > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + if (r1 == 1) { > > + *b = *a; > > + } > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +exists (0:r1=1 /\ 1:r1=1) > > diff --git a/manual/oota/C-AS-OOTA-2.litmus b/manual/oota/C-AS-OOTA-2.litmus > > new file mode 100644 > > index 00000000..c672b0e7 > > --- /dev/null > > +++ b/manual/oota/C-AS-OOTA-2.litmus > > @@ -0,0 +1,33 @@ > > +C C-AS-OOTA-2 > > + > > +(* > > + * Result: Always > > + * > > + * If we were using C-language relaxed atomics instead of volatiles, > > + * the compiler *could* eliminate the first WRITE_ONCE() in each process, > > + * then also each process's local variable, thus having an undefined value > > + * for each of those local variables. But this cannot happen given that > > + * we are using Linux-kernel _ONCE() primitives. > > + * > > + * https://lore.kernel.org/all/c2ae9bca-8526-425e-b9b5-135004ad59ad@rowland.harvard.edu/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b) > > +{ > > + int r0 = READ_ONCE(*a); > > + > > + WRITE_ONCE(*b, r0); > > + WRITE_ONCE(*b, 2); > > +} > > + > > +P1(int *a, int *b) > > +{ > > + int r1 = READ_ONCE(*b); > > + > > + WRITE_ONCE(*a, r0); > > + WRITE_ONCE(*a, 2); > > +} > > + > > +exists ((0:r0=0 \/ 0:r0=2) /\ (1:r1=0 \/ 1:r1=2)) > > diff --git a/manual/oota/C-JO-OOTA-1.litmus b/manual/oota/C-JO-OOTA-1.litmus > > new file mode 100644 > > index 00000000..6ab437b4 > > --- /dev/null > > +++ b/manual/oota/C-JO-OOTA-1.litmus > > @@ -0,0 +1,40 @@ > > +C C-JO-OOTA-1 > > + > > +(* > > + * Result: Never > > + * > > + * But Sometimes in LKMM as of early 2025, given that 42 is a possible > > + * value for things like S19.. 
> > + * > > + * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + if (r1 == 1) { > > + *a = *b; > > + } > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + if (r1 == 1) { > > + *b = *a; > > + } > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +exists (b=42) > > diff --git a/manual/oota/C-JO-OOTA-2.litmus b/manual/oota/C-JO-OOTA-2.litmus > > new file mode 100644 > > index 00000000..ad708c60 > > --- /dev/null > > +++ b/manual/oota/C-JO-OOTA-2.litmus > > @@ -0,0 +1,44 @@ > > +C C-JO-OOTA-2 > > + > > +(* > > + * Result: Never > > + * > > + * But Sometimes in LKMM as of early 2025, given that 42 is a possible > > + * value for things like S23. > > + * > > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2 = 0; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = *b; > > + } > > + WRITE_ONCE(*a, r2); > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2 = 0; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = *a; > > + } > > + WRITE_ONCE(*b, r2); > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +exists (b=42) > > diff --git a/manual/oota/C-JO-OOTA-3.litmus b/manual/oota/C-JO-OOTA-3.litmus > > new file mode 100644 > > index 00000000..633b8334 > > --- /dev/null > > +++ b/manual/oota/C-JO-OOTA-3.litmus > > @@ -0,0 +1,46 @@ > > +C C-JO-OOTA-3 > > + > > +(* > > + * Result: Never > > + * > > + * But LKMM finds the all-ones result, perhaps due to not tracking > > + * control dependencies out of the "if" statement. > > + * > > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + r2 = READ_ONCE(*b); > > + if (r1 == 1) { > > + r2 = *b; > > + } > > + WRITE_ONCE(*a, r2); > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + r2 = READ_ONCE(*a); > > + if (r1 == 1) { > > + r2 = *a; > > + } > > + WRITE_ONCE(*b, r2); > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +exists (0:r1=1 /\ 1:r1=1) > > diff --git a/manual/oota/C-JO-OOTA-4.litmus b/manual/oota/C-JO-OOTA-4.litmus > > new file mode 100644 > > index 00000000..cab7ebb6 > > --- /dev/null > > +++ b/manual/oota/C-JO-OOTA-4.litmus > > @@ -0,0 +1,43 @@ > > +C C-JO-OOTA-4 > > + > > +(* > > + * Result: Never > > + * > > + * And LKMM agrees, which might be a surprise. 
> > + * > > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + int r3; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + r2 = *b; > > + r3 = r1 == 0; > > + WRITE_ONCE(*a, (r3 + 1) & r2); > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + int r3; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + r2 = *a; > > + r3 = r1 == 0; > > + WRITE_ONCE(*b, (r3 + 1) & r2); > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +exists (0:r1=1 /\ 1:r1=1) > > diff --git a/manual/oota/C-JO-OOTA-5.litmus b/manual/oota/C-JO-OOTA-5.litmus > > new file mode 100644 > > index 00000000..145c8378 > > --- /dev/null > > +++ b/manual/oota/C-JO-OOTA-5.litmus > > @@ -0,0 +1,44 @@ > > +C C-JO-OOTA-5 > > + > > +(* > > + * Result: Never > > + * > > + * But LKMM finds the all-ones result, perhaps due r2 being unused. > > + * > > + * https://lore.kernel.org/all/1daba0ea-0dd6-4e67-923e-fd3c1a62b40b@huaweicloud.com/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = READ_ONCE(*a); > > + } > > + *b = 1; > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = READ_ONCE(*b); > > + } > > + *a = 1; > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +locations [0:r2;1:r2] > > +exists (0:r1=1 /\ 1:r1=1) > > diff --git a/manual/oota/C-JO-OOTA-6.litmus b/manual/oota/C-JO-OOTA-6.litmus > > new file mode 100644 > > index 00000000..942e6c82 > > --- /dev/null > > +++ b/manual/oota/C-JO-OOTA-6.litmus > > @@ -0,0 +1,44 @@ > > +C C-JO-OOTA-6 > > + > > +(* > > + * Result: Never > > + * > > + * But LKMM finds the all-ones result, due to OOTA on r2. > > + * > > + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = READ_ONCE(*a); > > + } > > + *b = r2; > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = READ_ONCE(*b); > > + } > > + *a = r2; > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +locations [0:r2;1:r2] > > +exists (0:r1=1 /\ 1:r1=1) > > diff --git a/manual/oota/C-JO-OOTA-7.litmus b/manual/oota/C-JO-OOTA-7.litmus > > new file mode 100644 > > index 00000000..31c0b8ae > > --- /dev/null > > +++ b/manual/oota/C-JO-OOTA-7.litmus > > @@ -0,0 +1,47 @@ > > +C C-JO-OOTA-7 > > + > > +(* > > + * Result: Never > > + * > > + * But LKMM finds the all-ones result, due to OOTA on r2. 
> > + * > > + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/ > > + *) > > + > > +{ > > + 0:r2=a; > > + 1:r2=b; > > +} > > + > > +P0(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = READ_ONCE(*a); > > + } > > + *r2 = a; > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) > > +{ > > + int r1; > > + int r2; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + if (r1 == 1) { > > + r2 = READ_ONCE(*b); > > + } > > + *r2 = b; > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +locations [0:r2;1:r2] > > +exists (0:r1=1 /\ 1:r1=1) > > diff --git a/manual/oota/C-PM-OOTA-1.litmus b/manual/oota/C-PM-OOTA-1.litmus > > new file mode 100644 > > index 00000000..e771e3c9 > > --- /dev/null > > +++ b/manual/oota/C-PM-OOTA-1.litmus > > @@ -0,0 +1,37 @@ > > +C C-PM-OOTA-1 > > + > > +(* > > + * Result: Never > > + * > > + * LKMM agrees. > > + * > > + * https://lore.kernel.org/all/9a0dccbb-bfa7-4b33-ac1a-daa9841b609a@paulmck-laptop/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b, int *x, int *y) { > > + int r1; > > + > > + r1 = READ_ONCE(*x); > > + smp_rmb(); > > + if (r1 == 1) { > > + WRITE_ONCE(*a, *b); > > + } > > + smp_wmb(); > > + WRITE_ONCE(*y, 1); > > +} > > + > > +P1(int *a, int *b, int *x, int *y) { > > + int r1; > > + > > + r1 = READ_ONCE(*y); > > + smp_rmb(); > > + if (r1 == 1) { > > + WRITE_ONCE(*b, *a); > > + } > > + smp_wmb(); > > + WRITE_ONCE(*x, 1); > > +} > > + > > +exists b=42 > ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-07-23 16:39 ` Paul E. McKenney @ 2025-07-24 14:14 ` Paul E. McKenney 2025-07-25 5:23 ` Hernan Ponce de Leon 0 siblings, 1 reply; 59+ messages in thread From: Paul E. McKenney @ 2025-07-24 14:14 UTC (permalink / raw) To: Hernan Ponce de Leon Cc: Jonas Oberhauser, stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm On Wed, Jul 23, 2025 at 09:39:05AM -0700, Paul E. McKenney wrote: > On Wed, Jul 23, 2025 at 09:26:32AM +0200, Hernan Ponce de Leon wrote: > > On 7/23/2025 2:43 AM, Paul E. McKenney wrote: > > > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > > > > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > > > > example shared on this list a few years ago: > > > > > > Apologies for being slow, but I have finally added the litmus tests in > > > this email thread to the https://github.com/paulmckrcu/litmus repo. > > > > I do not understand some of the comments in the preamble of the tests: > > > > (* > > * Result: Never > > * > > * But Sometimes in LKMM as of early 2025, given that 42 is a possible > > * value for things like S19.. > > * > > * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ > > *) > > > > I see that herd7 reports one of the states to be [b]=S16. Is this supposed > > to be some kind of symbolic state (i.e., any value is possible)? > > Exactly! > > > The value in the "Result" is what we would like the model to say if we would > > have a perfect version of dependencies, right? > > In this case, yes. I should hasten to add that, compiler optimizations being what they are, "perfect" may or may not be attainable, and even if attainable, might not be maintainable. I am pretty sure that you all already understood that, but I felt the need to make it explicit. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-07-24 14:14 ` Paul E. McKenney @ 2025-07-25 5:23 ` Hernan Ponce de Leon 2025-07-29 20:34 ` Paul E. McKenney 0 siblings, 1 reply; 59+ messages in thread From: Hernan Ponce de Leon @ 2025-07-25 5:23 UTC (permalink / raw) To: paulmck Cc: Jonas Oberhauser, stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm On 7/24/2025 4:14 PM, Paul E. McKenney wrote: > On Wed, Jul 23, 2025 at 09:39:05AM -0700, Paul E. McKenney wrote: >> On Wed, Jul 23, 2025 at 09:26:32AM +0200, Hernan Ponce de Leon wrote: >>> On 7/23/2025 2:43 AM, Paul E. McKenney wrote: >>>> On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: >>>>> The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following >>>>> example shared on this list a few years ago: >>>> >>>> Apologies for being slow, but I have finally added the litmus tests in >>>> this email thread to the https://github.com/paulmckrcu/litmus repo. >>> >>> I do not understand some of the comments in the preamble of the tests: >>> >>> (* >>> * Result: Never >>> * >>> * But Sometimes in LKMM as of early 2025, given that 42 is a possible >>> * value for things like S19.. >>> * >>> * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ >>> *) >>> >>> I see that herd7 reports one of the states to be [b]=S16. Is this supposed >>> to be some kind of symbolic state (i.e., any value is possible)? >> >> Exactly! >> >>> The value in the "Result" is what we would like the model to say if we would >>> have a perfect version of dependencies, right? >> >> In this case, yes. > > I should hasten to add that, compiler optimizations being what they are, > "perfect" may or may not be attainable, and even if attainable, might > not be maintainable. Yes, I just wanted to clarify if this is what herd7 + the current model are saying or what developers should expect. Hernan > > I am pretty sure that you all already understood that, but I felt the > need to make it explicit. ;-) > > Thanx, Paul ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-07-25 5:23 ` Hernan Ponce de Leon @ 2025-07-29 20:34 ` Paul E. McKenney 0 siblings, 0 replies; 59+ messages in thread From: Paul E. McKenney @ 2025-07-29 20:34 UTC (permalink / raw) To: Hernan Ponce de Leon Cc: Jonas Oberhauser, stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm On Fri, Jul 25, 2025 at 07:23:23AM +0200, Hernan Ponce de Leon wrote: > On 7/24/2025 4:14 PM, Paul E. McKenney wrote: > > On Wed, Jul 23, 2025 at 09:39:05AM -0700, Paul E. McKenney wrote: > > > On Wed, Jul 23, 2025 at 09:26:32AM +0200, Hernan Ponce de Leon wrote: > > > > On 7/23/2025 2:43 AM, Paul E. McKenney wrote: > > > > > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > > > > > > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > > > > > > example shared on this list a few years ago: > > > > > > > > > > Apologies for being slow, but I have finally added the litmus tests in > > > > > this email thread to the https://github.com/paulmckrcu/litmus repo. > > > > > > > > I do not understand some of the comments in the preamble of the tests: > > > > > > > > (* > > > > * Result: Never > > > > * > > > > * But Sometimes in LKMM as of early 2025, given that 42 is a possible > > > > * value for things like S19.. > > > > * > > > > * https://lore.kernel.org/all/20250106214003.504664-1-jonas.oberhauser@huaweicloud.com/ > > > > *) > > > > > > > > I see that herd7 reports one of the states to be [b]=S16. Is this supposed > > > > to be some kind of symbolic state (i.e., any value is possible)? > > > > > > Exactly! > > > > > > > The value in the "Result" is what we would like the model to say if we would > > > > have a perfect version of dependencies, right? > > > > > > In this case, yes. > > > > I should hasten to add that, compiler optimizations being what they are, > > "perfect" may or may not be attainable, and even if attainable, might > > not be maintainable. > > Yes, I just wanted to clarify if this is what herd7 + the current model are > saying or what developers should expect. Good point, and I added explicit words to this effect in the comments of those aspirational OOTA litmus tests, so thank you! Thanx, Paul > Hernan > > > > > I am pretty sure that you all already understood that, but I felt the > > need to make it explicit. ;-) > > > > Thanx, Paul > ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-07-23 0:43 ` Paul E. McKenney 2025-07-23 7:26 ` Hernan Ponce de Leon @ 2025-07-23 17:13 ` Alan Stern 2025-07-23 17:27 ` Paul E. McKenney 2025-07-23 19:25 ` Alan Stern 2 siblings, 1 reply; 59+ messages in thread From: Alan Stern @ 2025-07-23 17:13 UTC (permalink / raw) To: Paul E. McKenney Cc: Jonas Oberhauser, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Tue, Jul 22, 2025 at 05:43:16PM -0700, Paul E. McKenney wrote: > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > > example shared on this list a few years ago: > > Apologies for being slow, but I have finally added the litmus tests in > this email thread to the https://github.com/paulmckrcu/litmus repo. > > It is quite likely that I have incorrectly intuited the missing portions > of the litmus tests, especially the two called out in the commit log > below. If you have time, please do double-check. I didn't look very closely when this first came out... > --- /dev/null > +++ b/manual/oota/C-AS-OOTA-2.litmus > @@ -0,0 +1,33 @@ > +C C-AS-OOTA-2 > + > +(* > + * Result: Always > + * > + * If we were using C-language relaxed atomics instead of volatiles, > + * the compiler *could* eliminate the first WRITE_ONCE() in each process, > + * then also each process's local variable, thus having an undefined value > + * for each of those local variables. But this cannot happen given that > + * we are using Linux-kernel _ONCE() primitives. > + * > + * https://lore.kernel.org/all/c2ae9bca-8526-425e-b9b5-135004ad59ad@rowland.harvard.edu/ > + *) > + > +{} > + > +P0(int *a, int *b) > +{ > + int r0 = READ_ONCE(*a); > + > + WRITE_ONCE(*b, r0); > + WRITE_ONCE(*b, 2); > +} > + > +P1(int *a, int *b) > +{ > + int r1 = READ_ONCE(*b); > + > + WRITE_ONCE(*a, r0); This should be r1 instead of r0. > + WRITE_ONCE(*a, 2); > +} > + > +exists ((0:r0=0 \/ 0:r0=2) /\ (1:r1=0 \/ 1:r1=2)) Alan ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA 2025-07-23 17:13 ` Alan Stern @ 2025-07-23 17:27 ` Paul E. McKenney 0 siblings, 0 replies; 59+ messages in thread From: Paul E. McKenney @ 2025-07-23 17:27 UTC (permalink / raw) To: Alan Stern Cc: Jonas Oberhauser, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki, quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon On Wed, Jul 23, 2025 at 01:13:35PM -0400, Alan Stern wrote: > On Tue, Jul 22, 2025 at 05:43:16PM -0700, Paul E. McKenney wrote: > > On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote: > > > The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following > > > example shared on this list a few years ago: > > > > Apologies for being slow, but I have finally added the litmus tests in > > this email thread to the https://github.com/paulmckrcu/litmus repo. > > > > It is quite likely that I have incorrectly intuited the missing portions > > of the litmus tests, especially the two called out in the commit log > > below. If you have time, please do double-check. > > I didn't look very closely when this first came out... > > > --- /dev/null > > +++ b/manual/oota/C-AS-OOTA-2.litmus > > @@ -0,0 +1,33 @@ > > +C C-AS-OOTA-2 > > + > > +(* > > + * Result: Always > > + * > > + * If we were using C-language relaxed atomics instead of volatiles, > > + * the compiler *could* eliminate the first WRITE_ONCE() in each process, > > + * then also each process's local variable, thus having an undefined value > > + * for each of those local variables. But this cannot happen given that > > + * we are using Linux-kernel _ONCE() primitives. > > + * > > + * https://lore.kernel.org/all/c2ae9bca-8526-425e-b9b5-135004ad59ad@rowland.harvard.edu/ > > + *) > > + > > +{} > > + > > +P0(int *a, int *b) > > +{ > > + int r0 = READ_ONCE(*a); > > + > > + WRITE_ONCE(*b, r0); > > + WRITE_ONCE(*b, 2); > > +} > > + > > +P1(int *a, int *b) > > +{ > > + int r1 = READ_ONCE(*b); > > + > > + WRITE_ONCE(*a, r0); > > This should be r1 instead of r0. Ah, good eyes, thank you! With that change, I still get "Always" as shown below. Which I believe makes sense, given that LKMM deals with volatile atomics in contrast to the C++ relaxed atomics that you were discussing in the email. Please let me know if I am still missing something. > > + WRITE_ONCE(*a, 2); > > +} > > + > > +exists ((0:r0=0 \/ 0:r0=2) /\ (1:r1=0 \/ 1:r1=2)) > > Alan ------------------------------------------------------------------------ $ herd7 -conf linux-kernel.cfg ~/paper/scalability/LWNLinuxMM/litmus/manual/oota/C-AS-OOTA-2.litmus Test C-AS-OOTA-2 Allowed States 3 0:r0=0; 1:r1=0; 0:r0=0; 1:r1=2; 0:r0=2; 1:r1=0; Ok Witnesses Positive: 5 Negative: 0 Condition exists ((0:r0=0 \/ 0:r0=2) /\ (1:r1=0 \/ 1:r1=2)) Observation C-AS-OOTA-2 Always 5 0 Time C-AS-OOTA-2 0.01 Hash=7b4c046bc861c102997a87e32907fa80 ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [RFC] tools/memory-model: Rule out OOTA
  2025-07-23  0:43           ` Paul E. McKenney
  2025-07-23  7:26             ` Hernan Ponce de Leon
  2025-07-23 17:13             ` Alan Stern
@ 2025-07-23 19:25             ` Alan Stern
  2025-07-23 19:57               ` Paul E. McKenney
  2 siblings, 1 reply; 59+ messages in thread
From: Alan Stern @ 2025-07-23 19:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jonas Oberhauser, parri.andrea, will, peterz, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki,
	quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon

On Tue, Jul 22, 2025 at 05:43:16PM -0700, Paul E. McKenney wrote:
> Also, C-JO-OOTA-7.litmus includes a "*r2 = a" statement that makes herd7
> very unhappy.  On the other hand, initializing registers to the address
> of a variable is straight forward, as shown in the resulting litmus test.

...

> diff --git a/manual/oota/C-JO-OOTA-7.litmus b/manual/oota/C-JO-OOTA-7.litmus
> new file mode 100644
> index 00000000..31c0b8ae
> --- /dev/null
> +++ b/manual/oota/C-JO-OOTA-7.litmus
> @@ -0,0 +1,47 @@
> +C C-JO-OOTA-7
> +
> +(*
> + * Result: Never
> + *
> + * But LKMM finds the all-ones result, due to OOTA on r2.
> + *
> + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/
> + *)
> +
> +{
> +	0:r2=a;
> +	1:r2=b;
> +}

In this litmus test a and b are never assigned any values, so they
always contain 0.

> +
> +P0(int *a, int *b, int *x, int *y)
> +{
> +	int r1;
> +	int r2;
> +
> +	r1 = READ_ONCE(*x);
> +	smp_rmb();
> +	if (r1 == 1) {
> +		r2 = READ_ONCE(*a);

If this executes then r2 now contains 0.

> +	}
> +	*r2 = a;

And so what is supposed to happen here?  No wonder herd7 is unhappy!

> +	smp_wmb();
> +	WRITE_ONCE(*y, 1);
> +}
> +
> +P1(int *a, int *b, int *x, int *y)
> +{
> +	int r1;
> +	int r2;
> +
> +	r1 = READ_ONCE(*y);
> +	smp_rmb();
> +	if (r1 == 1) {
> +		r2 = READ_ONCE(*b);
> +	}
> +	*r2 = b;

Same here.

> +	smp_wmb();
> +	WRITE_ONCE(*x, 1);
> +}
> +
> +locations [0:r2;1:r2]
> +exists (0:r1=1 /\ 1:r1=1)

Alan

^ permalink raw reply	[flat|nested] 59+ messages in thread
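[One way to make something like this run in herd7 is the guard that
Jonas had suggested, quoted in Paul's reply below: fall back to a
known-valid address whenever r2 ends up holding 0.  The sketch of P0
below is untested and purely illustrative of that suggestion; it is not
a fix endorsed by anyone in the thread, and it assumes herd7's default
zero-initialization of otherwise-uninitialized registers.]

	P0(int *a, int *b, int *x, int *y)
	{
		int r1;
		int r2;

		r1 = READ_ONCE(*x);
		smp_rmb();
		if (r1 == 1) {
			r2 = READ_ONCE(*a);
		}
		if (r2 == 0)
			r2 = a;	/* fall back to a valid address */
		*r2 = a;
		smp_wmb();
		WRITE_ONCE(*y, 1);
	}

[P1 would get the symmetric guard using b.]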
* Re: [RFC] tools/memory-model: Rule out OOTA
  2025-07-23 19:25             ` Alan Stern
@ 2025-07-23 19:57               ` Paul E. McKenney
  0 siblings, 0 replies; 59+ messages in thread
From: Paul E. McKenney @ 2025-07-23 19:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jonas Oberhauser, parri.andrea, will, peterz, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, urezki,
	quic_neeraju, frederic, linux-kernel, lkmm, hernan.poncedeleon

On Wed, Jul 23, 2025 at 03:25:13PM -0400, Alan Stern wrote:
> On Tue, Jul 22, 2025 at 05:43:16PM -0700, Paul E. McKenney wrote:
> > Also, C-JO-OOTA-7.litmus includes a "*r2 = a" statement that makes herd7
> > very unhappy.  On the other hand, initializing registers to the address
> > of a variable is straight forward, as shown in the resulting litmus test.
> 
> ...
> 
> > diff --git a/manual/oota/C-JO-OOTA-7.litmus b/manual/oota/C-JO-OOTA-7.litmus
> > new file mode 100644
> > index 00000000..31c0b8ae
> > --- /dev/null
> > +++ b/manual/oota/C-JO-OOTA-7.litmus
> > @@ -0,0 +1,47 @@
> > +C C-JO-OOTA-7
> > +
> > +(*
> > + * Result: Never
> > + *
> > + * But LKMM finds the all-ones result, due to OOTA on r2.
> > + *
> > + * https://lore.kernel.org/all/1147ad3e-e3ad-4fa1-9a63-772ba136ea9a@huaweicloud.com/
> > + *)
> > +
> > +{
> > +	0:r2=a;
> > +	1:r2=b;
> > +}
> 
> In this litmus test a and b are never assigned any values, so they
> always contain 0.
> 
> > +
> > +P0(int *a, int *b, int *x, int *y)
> > +{
> > +	int r1;
> > +	int r2;
> > +
> > +	r1 = READ_ONCE(*x);
> > +	smp_rmb();
> > +	if (r1 == 1) {
> > +		r2 = READ_ONCE(*a);
> 
> If this executes then r2 now contains 0.
> 
> > +	}
> > +	*r2 = a;
> 
> And so what is supposed to happen here?  No wonder herd7 is unhappy!

Nothing good, I will admit!  Good eyes, and thank you!

> > +	smp_wmb();
> > +	WRITE_ONCE(*y, 1);
> > +}
> > +
> > +P1(int *a, int *b, int *x, int *y)
> > +{
> > +	int r1;
> > +	int r2;
> > +
> > +	r1 = READ_ONCE(*y);
> > +	smp_rmb();
> > +	if (r1 == 1) {
> > +		r2 = READ_ONCE(*b);
> > +	}
> > +	*r2 = b;
> 
> Same here.
> 
> > +	smp_wmb();
> > +	WRITE_ONCE(*x, 1);
> > +}
> > +
> > +locations [0:r2;1:r2]
> > +exists (0:r1=1 /\ 1:r1=1)

Yes, I did misinterpret Jonas's initialization advice, which reads as
follows: "unless you know how to initialize *a and *b to valid
addresses, you may need to add something like `if (r2 == 0) r2 = a` to
run this in herd7".

Given that there are two instances of r2, there are a number of possible
combinations of initialization.  I picked the one shown in the patch
below, and got this:

$ herd7 -conf linux-kernel.cfg ~/paper/scalability/LWNLinuxMM/litmus/manual/oota/C-JO-OOTA-7.litmus
Test C-JO-OOTA-7 Allowed
States 3
0:r1=0; 0:r2=a; 1:r1=0; 1:r2=b;
0:r1=0; 0:r2=a; 1:r1=1; 1:r2=b;
0:r1=1; 0:r2=a; 1:r1=0; 1:r2=b;
No
Witnesses
Positive: 0 Negative: 3
Flag mixed-accesses
Condition exists (0:r1=1 /\ 1:r1=1)
Observation C-JO-OOTA-7 Never 0 3
Time C-JO-OOTA-7 0.01
Hash=d9bb35335e45b31b1a39bab88eca837c

I get something very similar if I cross-initialize them, that is a=b;b=a.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/manual/oota/C-JO-OOTA-7.litmus b/manual/oota/C-JO-OOTA-7.litmus
index 31c0b8ae..d7fe0f94 100644
--- a/manual/oota/C-JO-OOTA-7.litmus
+++ b/manual/oota/C-JO-OOTA-7.litmus
@@ -11,6 +11,8 @@ C C-JO-OOTA-7
 {
 	0:r2=a;
 	1:r2=b;
+	a=a;
+	b=b;
 }
 
 P0(int *a, int *b, int *x, int *y)

^ permalink raw reply related	[flat|nested] 59+ messages in thread
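[For completeness, the cross-initialization Paul mentions would change
only the init block of C-JO-OOTA-7.  A sketch of that variant, which per
Paul's note gives very similar results:]

	{
		0:r2=a;
		1:r2=b;
		a=b;
		b=a;
	}

[Here each of a and b initially holds the address of the other, so the
conditional READ_ONCE()s return a valid address rather than 0.]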