* Some observations (results) on BPF acquire and release
@ 2024-10-23 17:47 Andrea Parri
2024-10-24 4:25 ` Paul E. McKenney
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Andrea Parri @ 2024-10-23 17:47 UTC (permalink / raw)
To: puranjay, paulmck; +Cc: bpf, lkmm, linux-kernel
Hi Puranjay and Paul,
I'm running some experiment on the (experimental) formalization of BPF
acquire and release available from [1] and wanted to report about some
(initial) observations for discussion and possibly future developments;
apologies in advance for the relatively long email and any repetition.
A first and probably most important observation is that the (current)
formalization of acquire and release appears to be "too strong": IIUC,
the simplest example/illustration for this is given by the following
BPF R+release+fence
{
0:r2=x; 0:r4=y;
1:r2=y; 1:r4=x; 1:r6=l;
}
P0 | P1 ;
r1 = 1 | r1 = 2 ;
*(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ;
r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ;
exists ([y]=2 /\ 1:r3=0)
This "exists" condition is not satisfiable according to the BPF model;
however, if we adopt the "natural"/intended(?) PowerPC implementations
of the synchronization primitives above (aka, with store_release() -->
LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
condition in question becomes (architecturally) satisfiable on PowerPC
(although I'm not aware of actual observations on PowerPC hardware).
At first, the previous observation (validated via simulations and later
extended to similar but more complex scenarios ) made me believe that
the BPF formalization of acquire and release could be strictly stronger
than the corresponding LKMM formalization; but that is _not_ the case:
The following "exists" condition is satisfiable according to the BPF
model (and it remains satisfiable even if the load_acquire() in P2 is
paired with an additional store_release() in P1). In contrast, the
corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
same conclusion holds even if the putative smp_load_acquire() in P2 is
"replaced" with an smp_rmb() or with an address dependency).
BPF Z6.3+fence+fence+acquire
{
0:r2=x; 0:r4=y; 0:r6=l;
1:r2=y; 1:r4=z; 1:r6=m;
2:r2=z; 2:r4=x;
}
P0 | P1 | P2 ;
r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ;
*(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ;
r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ;
r3 = 1 | r3 = 1 | ;
*(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ;
exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)
These remarks show that the proposed BPF formalization of acquire and
release somehow, but substantially, diverged from the corresponding
LKMM formalization. My guess is that the divergences mentioned above
were not (fully) intentional, or I'm wondering -- why not follow the
latter (the LKMM's) more closely? - This is probably the first question
I would need to clarify before trying/suggesting modifications to the
present formalizations. ;-) Thoughts?
Andrea
[1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: Some observations (results) on BPF acquire and release 2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri @ 2024-10-24 4:25 ` Paul E. McKenney 2024-10-24 11:13 ` Andrea Parri 2024-10-24 11:44 ` Jonas Oberhauser 2024-10-25 9:32 ` Hernan Ponce de Leon 2 siblings, 1 reply; 19+ messages in thread From: Paul E. McKenney @ 2024-10-24 4:25 UTC (permalink / raw) To: Andrea Parri; +Cc: puranjay, bpf, lkmm, linux-kernel On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote: > Hi Puranjay and Paul, > > I'm running some experiment on the (experimental) formalization of BPF > acquire and release available from [1] and wanted to report about some > (initial) observations for discussion and possibly future developments; > apologies in advance for the relatively long email and any repetition. > > > A first and probably most important observation is that the (current) > formalization of acquire and release appears to be "too strong": IIUC, > the simplest example/illustration for this is given by the following > > BPF R+release+fence > { > 0:r2=x; 0:r4=y; > 1:r2=y; 1:r4=x; 1:r6=l; > } > P0 | P1 ; > r1 = 1 | r1 = 2 ; > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > exists ([y]=2 /\ 1:r3=0) > > This "exists" condition is not satisfiable according to the BPF model; > however, if we adopt the "natural"/intended(?) PowerPC implementations > of the synchronization primitives above (aka, with store_release() --> > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > condition in question becomes (architecturally) satisfiable on PowerPC > (although I'm not aware of actual observations on PowerPC hardware). Yes, you are quite right, for efficient use on PowerPC, we need the BPF memory model to allow the above cycle in the R litmus test. My bad, as I put too much emphasis on ARM64. > At first, the previous observation (validated via simulations and later > extended to similar but more complex scenarios ) made me believe that > the BPF formalization of acquire and release could be strictly stronger > than the corresponding LKMM formalization; but that is _not_ the case: > > The following "exists" condition is satisfiable according to the BPF > model (and it remains satisfiable even if the load_acquire() in P2 is > paired with an additional store_release() in P1). In contrast, the > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire() > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the > same conclusion holds even if the putative smp_load_acquire() in P2 is > "replaced" with an smp_rmb() or with an address dependency). > > BPF Z6.3+fence+fence+acquire > { > 0:r2=x; 0:r4=y; 0:r6=l; > 1:r2=y; 1:r4=z; 1:r6=m; > 2:r2=z; 2:r4=x; > } > P0 | P1 | P2 ; > r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ; > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ; > r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ; > r3 = 1 | r3 = 1 | ; > *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ; > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0) And again agreed, we do want to forbid Z6.3. > These remarks show that the proposed BPF formalization of acquire and > release somehow, but substantially, diverged from the corresponding > LKMM formalization. My guess is that the divergences mentioned above > were not (fully) intentional, or I'm wondering -- why not follow the > latter (the LKMM's) more closely? - This is probably the first question > I would need to clarify before trying/suggesting modifications to the > present formalizations. ;-) Thoughts? Thank you for digging into this! I clearly need to get my validation work going again, but I very much welcome any further help you would be willing to provide. Thanx, Paul > Andrea > > > [1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-24 4:25 ` Paul E. McKenney @ 2024-10-24 11:13 ` Andrea Parri 2024-10-25 13:56 ` Andrea Parri 0 siblings, 1 reply; 19+ messages in thread From: Andrea Parri @ 2024-10-24 11:13 UTC (permalink / raw) To: Paul E. McKenney; +Cc: puranjay, bpf, lkmm, linux-kernel On Wed, Oct 23, 2024 at 09:25:40PM -0700, Paul E. McKenney wrote: > On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote: > > Hi Puranjay and Paul, > > > > I'm running some experiment on the (experimental) formalization of BPF > > acquire and release available from [1] and wanted to report about some > > (initial) observations for discussion and possibly future developments; > > apologies in advance for the relatively long email and any repetition. > > > > > > A first and probably most important observation is that the (current) > > formalization of acquire and release appears to be "too strong": IIUC, > > the simplest example/illustration for this is given by the following > > > > BPF R+release+fence > > { > > 0:r2=x; 0:r4=y; > > 1:r2=y; 1:r4=x; 1:r6=l; > > } > > P0 | P1 ; > > r1 = 1 | r1 = 2 ; > > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > > exists ([y]=2 /\ 1:r3=0) > > > > This "exists" condition is not satisfiable according to the BPF model; > > however, if we adopt the "natural"/intended(?) PowerPC implementations > > of the synchronization primitives above (aka, with store_release() --> > > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > > condition in question becomes (architecturally) satisfiable on PowerPC > > (although I'm not aware of actual observations on PowerPC hardware). > > Yes, you are quite right, for efficient use on PowerPC, we need the BPF > memory model to allow the above cycle in the R litmus test. My bad, > as I put too much emphasis on ARM64. > > > At first, the previous observation (validated via simulations and later > > extended to similar but more complex scenarios ) made me believe that > > the BPF formalization of acquire and release could be strictly stronger > > than the corresponding LKMM formalization; but that is _not_ the case: > > > > The following "exists" condition is satisfiable according to the BPF > > model (and it remains satisfiable even if the load_acquire() in P2 is > > paired with an additional store_release() in P1). In contrast, the > > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire() > > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the > > same conclusion holds even if the putative smp_load_acquire() in P2 is > > "replaced" with an smp_rmb() or with an address dependency). > > > > BPF Z6.3+fence+fence+acquire > > { > > 0:r2=x; 0:r4=y; 0:r6=l; > > 1:r2=y; 1:r4=z; 1:r6=m; > > 2:r2=z; 2:r4=x; > > } > > P0 | P1 | P2 ; > > r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ; > > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ; > > r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ; > > r3 = 1 | r3 = 1 | ; > > *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ; > > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0) > > And again agreed, we do want to forbid Z6.3. > > > These remarks show that the proposed BPF formalization of acquire and > > release somehow, but substantially, diverged from the corresponding > > LKMM formalization. My guess is that the divergences mentioned above > > were not (fully) intentional, or I'm wondering -- why not follow the > > latter (the LKMM's) more closely? - This is probably the first question > > I would need to clarify before trying/suggesting modifications to the > > present formalizations. ;-) Thoughts? > > Thank you for digging into this! > > I clearly need to get my validation work going again, but I very much > welcome any further help you would be willing to provide. Thanks for the confirmation. The BPF tests above (and other I have) were all hand-written, but I'm working towards improving such automation/validation; I won't keep it a secret should I find something relevant/interesting. :-) But the subset of the LKMM which deals with "strong fences" and Acq & Rel (limited to so called marked accesses) seems relatively contained /simple: its analysis could be useful, if not determining, in trying to resolve the above issues. Andrea ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-24 11:13 ` Andrea Parri @ 2024-10-25 13:56 ` Andrea Parri 2024-11-13 19:25 ` Andrea Parri 0 siblings, 1 reply; 19+ messages in thread From: Andrea Parri @ 2024-10-25 13:56 UTC (permalink / raw) To: Paul E. McKenney; +Cc: puranjay, bpf, lkmm, linux-kernel > But the subset of the LKMM which deals with "strong fences" and Acq & > Rel (limited to so called marked accesses) seems relatively contained > /simple: its analysis could be useful, if not determining, in trying > to resolve the above issues. Elaborating on the previous suggestion/comparison with the LKMM, the "subset" in question can take the following form (modulo my typos): "LKMM with acquire/release, strong fences, and marked accesses only" [...] let acq-po = [Acq] ; po ; [M] let po-rel = [M] ; po ; [Rel] let strong-fence = [M] ; fencerel(Mb) ; [M] let ppo = acq-po | po-rel | strong-fence let A-cumul(r) = rfe? ; r let cumul-fence = A-cumul(strong-fence | po-rel) let overwrite = co | fr let prop = (overwrite & ext)? ; cumul-fence* ; rfe? let hb = ppo | rfe | ((prop \ id) & int) acyclic hb as Hb let pb = prop ; strong-fence ; hb* acyclic pb as Pb For BPF, we'd want to replace acq-po, po-rel and strong-fence with load_acquire, store_release and po_amo_fetch respectively: Unless I'm missing something, this should restore the intended behaviors for the R and Z6.3 tests discussed earlier. A couple of other remarks: - Notice how the above formalization is completely symmetrical wrt. co <-> fr, IOW, co links are considered "on par with" fr links. In particular, the following test is satisfiable in the above formalization, as is the corresponding C test in the LKMM: BPF 2+2W+release+fence { 0:r2=x; 0:r4=y; 1:r2=y; 1:r4=x; 1:r6=l; } P0 | P1 ; r1 = 1 | r1 = 1 ; *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; r3 = 2 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; store_release((u32 *)(r4 + 0), r3) | r3 = 2 ; | store_release((u32 *)(r4 + 0), r3) ; exists ([x]=1 /\ [y]=1) (On an historical note, this wasn't always the case in the LKMM, cf. e.g. [1], but be alerted that the formalization in [1] is decisively more involved and less intuitive than today's / what the LKMM community has converged to. ;-) ) - The above formalization merges the so called "Observation" axiom in the "Happens-before" axiom. In the LKMM, this followed the removal of B-cumulativity for smp_wmb() and smp_store_release() and a consequent "great simplification" of the hb relation: link [2] can provide more details and some examples related to those changes. For completeness, here is the BPF analogue of test "C-release-acquire-is-B-cumulative.litmus" from that article: BPF ISA2+release+acquire+acquire { 0:r2=x; 0:r4=y; 1:r2=y; 1:r4=z; 2:r2=z; 2:r4=x; } P0 | P1 | P2 ; r1 = 1 | r1 = load_acquire((u32 *)(r2 + 0)) | r1 = load_acquire((u32 *)(r2 + 0)) ; *(u32 *)(r2 + 0) = r1 | r3 = 1 | r3 = *(u32 *)(r4 + 0) ; r3 = 1 | *(u32 *)(r4 + 0) = r3 | ; store_release((u32 *)(r4 + 0), r3) | | ; exists (1:r1=1 /\ 2:r1=1 /\ 2:r3=0) The formalization sketched above allows this behavior. Notice, however, that the behavior is forbidden after "completion" of the release/acquire chain, i.e. by making the store from P1 a store-release (a property also known as A-cumulativy of the release operation). I guess the next question (once clarified the intentions for the R and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W and B-cumulativity for store-release?"; I mentioned some tradeoff, but in the end this is a call for the BPF community. Andrea [1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/StrongModel.html [2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/WeakModel.html#Cumulativity ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 13:56 ` Andrea Parri @ 2024-11-13 19:25 ` Andrea Parri 2024-11-13 22:59 ` Puranjay Mohan 0 siblings, 1 reply; 19+ messages in thread From: Andrea Parri @ 2024-11-13 19:25 UTC (permalink / raw) To: Paul E. McKenney; +Cc: puranjay, bpf, lkmm, linux-kernel [...] > I guess the next question (once clarified the intentions for the R > and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W > and B-cumulativity for store-release?"; I mentioned some tradeoff, > but in the end this is a call for the BPF community. Interpreting the radio silence as an unanimous "No, it doesn't", please find tentative fixes/patch (on top of the bpf_acquire_release branch cited in an earlier post) at the bottom of this email. While testing the changes in question, I noticed an (unrelated) omission in the current PPO relation; the second patch below addresses that. Both patches were tested using the "BPF catalogue" available in the tree at stake: as expected, the only differences in outcomes were for the new/added five tests. Please use and integrate according to your preference, any feedback welcome. Andrea From 5bf399413578b6a94c42f6367245be2bdbf58926 Mon Sep 17 00:00:00 2001 From: Andrea Parri <parri.andrea@gmail.com> Date: Mon, 11 Nov 2024 08:38:42 +0200 Subject: [PATCH 1/2] BPF: Fix propagation ordering, after LKMM Signed-off-by: Andrea Parri <parri.andrea@gmail.com> --- catalogue/bpf/tests/2+2W+release+fence.litmus | 15 ++++++++++++ .../tests/ISA2+release+acquire+acquire.litmus | 15 ++++++++++++ catalogue/bpf/tests/R+release+fence.litmus | 14 +++++++++++ .../bpf/tests/Z6.3+fence+fence+acquire.litmus | 16 +++++++++++++ herd/libdir/bpf.cat | 24 +++++++++---------- 5 files changed, 72 insertions(+), 12 deletions(-) create mode 100644 catalogue/bpf/tests/2+2W+release+fence.litmus create mode 100644 catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus create mode 100644 catalogue/bpf/tests/R+release+fence.litmus create mode 100644 catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus diff --git a/catalogue/bpf/tests/2+2W+release+fence.litmus b/catalogue/bpf/tests/2+2W+release+fence.litmus new file mode 100644 index 0000000000000..0f88babbf27de --- /dev/null +++ b/catalogue/bpf/tests/2+2W+release+fence.litmus @@ -0,0 +1,15 @@ +BPF 2+2W+release+fence +(* + * Result: Sometimes + *) +{ + 0:r2=x; 0:r4=y; + 1:r2=y; 1:r4=x; 1:r6=l; +} + P0 | P1 ; + r1 = 1 | r1 = 1 ; + *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; + r3 = 2 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; + store_release((u32 *)(r4 + 0), r3) | r3 = 2 ; + | *(u32 *)(r4 + 0) = r3 ; +exists ([x]=1 /\ [y]=1) diff --git a/catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus b/catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus new file mode 100644 index 0000000000000..44c1308d70b5a --- /dev/null +++ b/catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus @@ -0,0 +1,15 @@ +BPF ISA2+release+acquire+acquire +(* + * Result: Sometimes + *) +{ + 0:r2=x; 0:r4=y; + 1:r2=y; 1:r4=z; + 2:r2=z; 2:r4=x; +} + P0 | P1 | P2 ; + r1 = 1 | r1 = load_acquire((u32 *)(r2 + 0)) | r1 = load_acquire((u32 *)(r2 + 0)) ; + *(u32 *)(r2 + 0) = r1 | r3 = 1 | r3 = *(u32 *)(r4 + 0) ; + r3 = 1 | *(u32 *)(r4 + 0) = r3 | ; + store_release((u32 *)(r4 + 0), r3) | | ; +exists (1:r1=1 /\ 2:r1=1 /\ 2:r3=0) diff --git a/catalogue/bpf/tests/R+release+fence.litmus b/catalogue/bpf/tests/R+release+fence.litmus new file mode 100644 index 0000000000000..a0bcbe0bbb804 --- /dev/null +++ b/catalogue/bpf/tests/R+release+fence.litmus @@ -0,0 +1,14 @@ +BPF R+release+fence +(* + * Result: Sometimes + *) +{ + 0:r2=x; 0:r4=y; + 1:r2=y; 1:r4=x; 1:r6=l; +} + P0 | P1 ; + r1 = 1 | r1 = 2 ; + *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; + r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; + store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; +exists ([y]=2 /\ 1:r3=0) diff --git a/catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus b/catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus new file mode 100644 index 0000000000000..67e9146bcef2b --- /dev/null +++ b/catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus @@ -0,0 +1,16 @@ +BPF Z6.3+fence+fence+acquire +(* + * Result: Never + *) +{ + 0:r2=x; 0:r4=y; 0:r6=l; + 1:r2=y; 1:r4=z; 1:r6=m; + 2:r2=z; 2:r4=x; +} + P0 | P1 | P2 ; + r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ; + *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ; + r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ; + r3 = 1 | r3 = 1 | ; + *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ; +exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0) diff --git a/herd/libdir/bpf.cat b/herd/libdir/bpf.cat index 28b42497ea0fc..7803a1e818de7 100644 --- a/herd/libdir/bpf.cat +++ b/herd/libdir/bpf.cat @@ -43,9 +43,9 @@ show addr_dep as addr show data_dep as data show ctrl_dep as ctrl -(*************) -(* ppo rules *) -(*************) +(**********************) +(* ppo and prop rules *) +(**********************) let ppo = (* Explicit synchronization *) @@ -65,6 +65,10 @@ include "cos-opt.cat" let com = co | rf | fr +(* Propagation ordering from SC and release operations *) +let A-cumul = rfe? ; (po_amo_fetch | store_release) +let prop = (coe | fre)? ; A-cumul* ; rfe? + (**********) (* Axioms *) (**********) @@ -72,17 +76,13 @@ let com = co | rf | fr (* Sc per location *) acyclic com | po-loc as Coherence -(* No thin air *) -let hb = (ppo | rfe)+ -acyclic hb +(* No-Thin-Air and Observation *) +let hb = ppo | rfe | ((prop \ id) & int) +acyclic hb as Happens-before (* Propagation *) -let A-cumul = rfe;po_amo_fetch | rfe;store_release | po_amo_fetch;fre -let prop = (store_release | po_amo_fetch | A-cumul);hb* -acyclic prop | co - -(* Observation *) -irreflexive fre;prop +let pb = prop ; po_amo_fetch ; hb* +acyclic pb as Propagation (* Atomicity *) empty rmw & (fre;coe) as Atomic -- 2.43.0 From 84520eaacbcb399b5cc7b4cbe0f716ad84e87824 Mon Sep 17 00:00:00 2001 From: Andrea Parri <parri.andrea@gmail.com> Date: Mon, 11 Nov 2024 08:03:48 +0200 Subject: [PATCH 2/2] BPF: Fix overlapping-address ordering Signed-off-by: Andrea Parri <parri.andrea@gmail.com> --- catalogue/bpf/tests/LB+release-oa+acquire.litmus | 15 +++++++++++++++ herd/libdir/bpf.cat | 13 ++++++------- 2 files changed, 21 insertions(+), 7 deletions(-) create mode 100644 catalogue/bpf/tests/LB+release-oa+acquire.litmus diff --git a/catalogue/bpf/tests/LB+release-oa+acquire.litmus b/catalogue/bpf/tests/LB+release-oa+acquire.litmus new file mode 100644 index 0000000000000..1544179357e9b --- /dev/null +++ b/catalogue/bpf/tests/LB+release-oa+acquire.litmus @@ -0,0 +1,15 @@ +BPF LB+release-oa+acquire +(* + * Result: Never + *) +{ + 0:r2=x; 0:r4=y; + 1:r2=y; 1:r4=x; +} + P0 | P1 ; + r1 = *(u32 *)(r2 + 0) | r1 = load_acquire((u32 *)(r2 + 0)) ; + r3 = 1 | r3 = 1 ; + store_release((u32 *)(r4 + 0), r3) | *(u32 *)(r4 + 0) = r3 ; + r5 = 2 | ; + *(u32 *)(r4 + 0) = r5 | ; +exists (0:r1=1 /\ 1:r1=2) diff --git a/herd/libdir/bpf.cat b/herd/libdir/bpf.cat index 7803a1e818de7..4917d0f77009f 100644 --- a/herd/libdir/bpf.cat +++ b/herd/libdir/bpf.cat @@ -47,6 +47,10 @@ show ctrl_dep as ctrl (* ppo and prop rules *) (**********************) +(* Compute coherence relation *) +include "cos-opt.cat" +let com = co | rf | fr + let ppo = (* Explicit synchronization *) po_amo_fetch | rcpc @@ -57,13 +61,8 @@ let ppo = (* Pipeline Dependencies *) | [M];(addr|data);[W];rfi;[R] | [M];addr;[M];po;[W] -(* Successful cmpxchg R -(M)> W *) -| rmw - -(* Compute coherence relation *) -include "cos-opt.cat" - -let com = co | rf | fr +(* Overlapping-address ordering *) +| (coi | fri) (* Propagation ordering from SC and release operations *) let A-cumul = rfe? ; (po_amo_fetch | store_release) -- 2.43.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-11-13 19:25 ` Andrea Parri @ 2024-11-13 22:59 ` Puranjay Mohan 2024-11-14 14:56 ` Andrea Parri 0 siblings, 1 reply; 19+ messages in thread From: Puranjay Mohan @ 2024-11-13 22:59 UTC (permalink / raw) To: Andrea Parri; +Cc: Paul E. McKenney, puranjay, bpf, lkmm, linux-kernel On Wed, Nov 13, 2024 at 8:25 PM Andrea Parri <parri.andrea@gmail.com> wrote: > > [...] > > > I guess the next question (once clarified the intentions for the R > > and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W > > and B-cumulativity for store-release?"; I mentioned some tradeoff, > > but in the end this is a call for the BPF community. > > Interpreting the radio silence as an unanimous "No, it doesn't", please find > tentative fixes/patch (on top of the bpf_acquire_release branch cited in an > earlier post) at the bottom of this email. > > While testing the changes in question, I noticed an (unrelated) omission in > the current PPO relation; the second patch below addresses that. > > Both patches were tested using the "BPF catalogue" available in the tree at > stake: as expected, the only differences in outcomes were for the new/added > five tests. > > Please use and integrate according to your preference, any feedback welcome. Hi Andrea Sorry for the silence and a huge thanks for working on this and fixing the cat file. I have applied your patches and modified them to add the new tests to kinds.txt and shelf.py now these tests will run with all other tests using 'make cata-bpf-test' All 175 tests, including new tests added by you, pass :D make cata-bpf-test _build/default/internal/herd_catalogue_regression_test.exe \ -j 32 \ -herd-timeout 16.0 \ -herd-path _build/install/default/bin/herd7 \ -libdir-path ./herd/libdir \ -kinds-path catalogue/bpf/tests/kinds.txt \ -shelf-path catalogue/bpf/shelf.py \ test herd7 catalogue bpf tests: OK I have pushed it and sent a PR so we can get all this merged to master. https://github.com/herd/herdtools7/pull/1050 P.S. - I am writing this using gmail so the formatting might be broken, sorry for that. Thanks, Puranjay Mohan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-11-13 22:59 ` Puranjay Mohan @ 2024-11-14 14:56 ` Andrea Parri 0 siblings, 0 replies; 19+ messages in thread From: Andrea Parri @ 2024-11-14 14:56 UTC (permalink / raw) To: Puranjay Mohan; +Cc: Paul E. McKenney, puranjay, bpf, lkmm, linux-kernel > I have applied your patches and modified them to add the new tests to > kinds.txt and shelf.py > now these tests will run with all other tests using 'make cata-bpf-test' > > All 175 tests, including new tests added by you, pass :D > > make cata-bpf-test > > _build/default/internal/herd_catalogue_regression_test.exe \ > -j 32 \ > -herd-timeout 16.0 \ > -herd-path _build/install/default/bin/herd7 \ > -libdir-path ./herd/libdir \ > -kinds-path catalogue/bpf/tests/kinds.txt \ > -shelf-path catalogue/bpf/shelf.py \ > test > herd7 catalogue bpf tests: OK > > > I have pushed it and sent a PR so we can get all this merged to master. > https://github.com/herd/herdtools7/pull/1050 Cool! Thank you for the follow-up. Andrea ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri 2024-10-24 4:25 ` Paul E. McKenney @ 2024-10-24 11:44 ` Jonas Oberhauser 2024-10-24 12:11 ` Puranjay Mohan 2024-10-25 9:32 ` Hernan Ponce de Leon 2 siblings, 1 reply; 19+ messages in thread From: Jonas Oberhauser @ 2024-10-24 11:44 UTC (permalink / raw) To: Andrea Parri, puranjay, paulmck; +Cc: bpf, lkmm, linux-kernel Am 10/23/2024 um 7:47 PM schrieb Andrea Parri: > Hi Puranjay and Paul, > > These remarks show that the proposed BPF formalization of acquire and > release somehow, but substantially, diverged from the corresponding > LKMM formalization. My guess is that the divergences mentioned above > were not (fully) intentional, or I'm wondering -- why not follow the > latter (the LKMM's) more closely? - This is probably the first question > I would need to clarify before trying/suggesting modifications to the > present formalizations. ;-) Thoughts? > I'm also curious why the formalization (not just in the semantics but also how it is structured) is so completely different from LKMM's. At first glance there are also many semantic differences, e.g., it seems coe is much weaker in eBPF and the last axiom also seems a bit like a tack-on that doesn't "play well" with the previous axioms. It would make sense to me to start with the framework of LKMM and maybe weaken it from there if it is really necessary. But maybe I don't know enough about how eBPF atomics are intended to work... Best wishes, jonas ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-24 11:44 ` Jonas Oberhauser @ 2024-10-24 12:11 ` Puranjay Mohan 2024-10-24 12:21 ` Jonas Oberhauser 0 siblings, 1 reply; 19+ messages in thread From: Puranjay Mohan @ 2024-10-24 12:11 UTC (permalink / raw) To: Jonas Oberhauser, Andrea Parri, paulmck; +Cc: bpf, lkmm, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2051 bytes --] Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> writes: > Am 10/23/2024 um 7:47 PM schrieb Andrea Parri: >> Hi Puranjay and Paul, >> >> These remarks show that the proposed BPF formalization of acquire and >> release somehow, but substantially, diverged from the corresponding >> LKMM formalization. My guess is that the divergences mentioned above >> were not (fully) intentional, or I'm wondering -- why not follow the >> latter (the LKMM's) more closely? - This is probably the first question >> I would need to clarify before trying/suggesting modifications to the >> present formalizations. ;-) Thoughts? >> > > I'm also curious why the formalization (not just in the semantics but > also how it is structured) is so completely different from LKMM's. While initially writing the cat formalization for BPF, I started with LKMM but because BPF memory model is an instruction level memory model and much simpler than LKMM, I wrote it from scratch. But I converted all LKMM litmus tests to BPF and made sure that the cat model is complaint. > At first glance there are also many semantic differences, e.g., it seems > coe is much weaker in eBPF and the last axiom also seems a bit like a > tack-on that doesn't "play well" with the previous axioms. Yes, the last axion is a tack-on that I added to make acquire/release work with other atomics just before the presentation at LPC. The acquire/release part is still under development and not perfect. If what you are saying about coe is true then it is a bug and I will try to fix it. > It would make sense to me to start with the framework of LKMM and maybe > weaken it from there if it is really necessary. But maybe I don't know > enough about how eBPF atomics are intended to work... The cat formalization for BPF is currently experimental and it would be great if people can find bugs and contribute to it. It would be great if people who worked on the LKMM's cat file could help building BPF's cat file too. Thanks, Puranjay [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 255 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-24 12:11 ` Puranjay Mohan @ 2024-10-24 12:21 ` Jonas Oberhauser 2024-10-24 14:42 ` Paul E. McKenney 0 siblings, 1 reply; 19+ messages in thread From: Jonas Oberhauser @ 2024-10-24 12:21 UTC (permalink / raw) To: Puranjay Mohan, Andrea Parri, paulmck; +Cc: bpf, lkmm, linux-kernel Am 10/24/2024 um 2:11 PM schrieb Puranjay Mohan: > Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> writes: > >> Am 10/23/2024 um 7:47 PM schrieb Andrea Parri: >>> Hi Puranjay and Paul, >>> >>> These remarks show that the proposed BPF formalization of acquire and >>> release somehow, but substantially, diverged from the corresponding >>> LKMM formalization. My guess is that the divergences mentioned above >>> were not (fully) intentional, or I'm wondering -- why not follow the >>> latter (the LKMM's) more closely? - This is probably the first question >>> I would need to clarify before trying/suggesting modifications to the >>> present formalizations. ;-) Thoughts? >>> >> >> I'm also curious why the formalization (not just in the semantics but >> also how it is structured) is so completely different from LKMM's. > Thanks Puranjay for your response! > BPF memory model is an instruction level memory model You mean BPF has no optimizing byte code compiler? Is it guaranteed to stay this way? WASM does JIT optimizations as far as I know, which would bring back a lot of the complexity of software models like LKMM. > much simpler than LKMM LKMM has a simple core, roughly like this: ppo = ... (* all the ppo related rules that are relevant to you -- some fences don't matter and you can just remove them *) prop = (coe | fre) (* remove reflexive closure *) ; ... hb = [Marked] ; (ppo | rfe | prop & int | prop ; strong-sync) ; [Marked] acyclic hb (* ... also add the atomicity & sc-per-loc axioms *) If you can exclude compiler optimizations, you can remove the Marked bits. Best wishes, jonas ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-24 12:21 ` Jonas Oberhauser @ 2024-10-24 14:42 ` Paul E. McKenney 0 siblings, 0 replies; 19+ messages in thread From: Paul E. McKenney @ 2024-10-24 14:42 UTC (permalink / raw) To: Jonas Oberhauser; +Cc: Puranjay Mohan, Andrea Parri, bpf, lkmm, linux-kernel On Thu, Oct 24, 2024 at 02:21:15PM +0200, Jonas Oberhauser wrote: > > > Am 10/24/2024 um 2:11 PM schrieb Puranjay Mohan: > > Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> writes: > > > > > Am 10/23/2024 um 7:47 PM schrieb Andrea Parri: > > > > Hi Puranjay and Paul, > > > > > > > > These remarks show that the proposed BPF formalization of acquire and > > > > release somehow, but substantially, diverged from the corresponding > > > > LKMM formalization. My guess is that the divergences mentioned above > > > > were not (fully) intentional, or I'm wondering -- why not follow the > > > > latter (the LKMM's) more closely? - This is probably the first question > > > > I would need to clarify before trying/suggesting modifications to the > > > > present formalizations. ;-) Thoughts? > > > > > > > > > > I'm also curious why the formalization (not just in the semantics but > > > also how it is structured) is so completely different from LKMM's. > > > > Thanks Puranjay for your response! > > > > BPF memory model is an instruction level memory model > > You mean BPF has no optimizing byte code compiler? > Is it guaranteed to stay this way? > WASM does JIT optimizations as far as I know, which would bring back a lot > of the complexity of software models like LKMM. Sadly (at least from a simplicity viewpoint), BPF assembly goes at least through compiler backends. So there will be some issues of this sort. We have started on them, but as Puranjay says, there is more to be done. Thanx, Paul > > much simpler than LKMM > > LKMM has a simple core, roughly like this: > > ppo = ... (* all the ppo related rules that are relevant to you -- some > fences don't matter and you can just remove them *) > prop = (coe | fre) (* remove reflexive closure *) ; ... > hb = [Marked] ; (ppo | rfe | prop & int | prop ; strong-sync) ; [Marked] > > acyclic hb > (* ... also add the atomicity & sc-per-loc axioms *) > > If you can exclude compiler optimizations, you can remove the Marked bits. > > > Best wishes, > > jonas > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri 2024-10-24 4:25 ` Paul E. McKenney 2024-10-24 11:44 ` Jonas Oberhauser @ 2024-10-25 9:32 ` Hernan Ponce de Leon 2024-10-25 13:15 ` Andrea Parri 2 siblings, 1 reply; 19+ messages in thread From: Hernan Ponce de Leon @ 2024-10-25 9:32 UTC (permalink / raw) To: Andrea Parri, puranjay, paulmck; +Cc: bpf, lkmm, linux-kernel On 10/23/2024 7:47 PM, Andrea Parri wrote: > Hi Puranjay and Paul, > > I'm running some experiment on the (experimental) formalization of BPF > acquire and release available from [1] and wanted to report about some > (initial) observations for discussion and possibly future developments; > apologies in advance for the relatively long email and any repetition. > > > A first and probably most important observation is that the (current) > formalization of acquire and release appears to be "too strong": IIUC, > the simplest example/illustration for this is given by the following > > BPF R+release+fence > { > 0:r2=x; 0:r4=y; > 1:r2=y; 1:r4=x; 1:r6=l; > } > P0 | P1 ; > r1 = 1 | r1 = 2 ; > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > exists ([y]=2 /\ 1:r3=0) > > This "exists" condition is not satisfiable according to the BPF model; > however, if we adopt the "natural"/intended(?) PowerPC implementations > of the synchronization primitives above (aka, with store_release() --> > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > condition in question becomes (architecturally) satisfiable on PowerPC > (although I'm not aware of actual observations on PowerPC hardware). Are the resulting PPC tests available somewhere? > > > At first, the previous observation (validated via simulations and later > extended to similar but more complex scenarios ) made me believe that > the BPF formalization of acquire and release could be strictly stronger > than the corresponding LKMM formalization; but that is _not_ the case: > > The following "exists" condition is satisfiable according to the BPF > model (and it remains satisfiable even if the load_acquire() in P2 is > paired with an additional store_release() in P1). In contrast, the > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire() > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the > same conclusion holds even if the putative smp_load_acquire() in P2 is > "replaced" with an smp_rmb() or with an address dependency). > > BPF Z6.3+fence+fence+acquire > { > 0:r2=x; 0:r4=y; 0:r6=l; > 1:r2=y; 1:r4=z; 1:r6=m; > 2:r2=z; 2:r4=x; > } > P0 | P1 | P2 ; > r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ; > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ; > r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ; > r3 = 1 | r3 = 1 | ; > *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ; > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0) > > > These remarks show that the proposed BPF formalization of acquire and > release somehow, but substantially, diverged from the corresponding > LKMM formalization. My guess is that the divergences mentioned above > were not (fully) intentional, or I'm wondering -- why not follow the > latter (the LKMM's) more closely? - This is probably the first question > I would need to clarify before trying/suggesting modifications to the > present formalizations. ;-) Thoughts? > > Andrea > > > [1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 9:32 ` Hernan Ponce de Leon @ 2024-10-25 13:15 ` Andrea Parri 2024-10-25 13:28 ` Hernan Ponce de Leon 0 siblings, 1 reply; 19+ messages in thread From: Andrea Parri @ 2024-10-25 13:15 UTC (permalink / raw) To: Hernan Ponce de Leon; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel > > BPF R+release+fence > > { > > 0:r2=x; 0:r4=y; > > 1:r2=y; 1:r4=x; 1:r6=l; > > } > > P0 | P1 ; > > r1 = 1 | r1 = 2 ; > > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > > exists ([y]=2 /\ 1:r3=0) > > > > This "exists" condition is not satisfiable according to the BPF model; > > however, if we adopt the "natural"/intended(?) PowerPC implementations > > of the synchronization primitives above (aka, with store_release() --> > > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > > condition in question becomes (architecturally) satisfiable on PowerPC > > (although I'm not aware of actual observations on PowerPC hardware). > > Are the resulting PPC tests available somewhere? My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce entry at https://diy.inria.fr/linux/hard.html#unseen . Andrea ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 13:15 ` Andrea Parri @ 2024-10-25 13:28 ` Hernan Ponce de Leon 2024-10-25 13:44 ` Andrea Parri 0 siblings, 1 reply; 19+ messages in thread From: Hernan Ponce de Leon @ 2024-10-25 13:28 UTC (permalink / raw) To: Andrea Parri; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel On 10/25/2024 3:15 PM, Andrea Parri wrote: >>> BPF R+release+fence >>> { >>> 0:r2=x; 0:r4=y; >>> 1:r2=y; 1:r4=x; 1:r6=l; >>> } >>> P0 | P1 ; >>> r1 = 1 | r1 = 2 ; >>> *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; >>> r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; >>> store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; >>> exists ([y]=2 /\ 1:r3=0) >>> >>> This "exists" condition is not satisfiable according to the BPF model; >>> however, if we adopt the "natural"/intended(?) PowerPC implementations >>> of the synchronization primitives above (aka, with store_release() --> >>> LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the >>> condition in question becomes (architecturally) satisfiable on PowerPC >>> (although I'm not aware of actual observations on PowerPC hardware). >> >> Are the resulting PPC tests available somewhere? > > My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce > entry at https://diy.inria.fr/linux/hard.html#unseen . > > Andrea I guess I understood you wrong. I thought you had manually "compiled" those to PPC litmus format (i.e., doing exactly what the JIT compiler would do). I can obviously write them manually myself, but I find this painful and error prone (I am particularly bad at this task), so I wanted to avoid this if someone else had already done it. Hernan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 13:28 ` Hernan Ponce de Leon @ 2024-10-25 13:44 ` Andrea Parri 2024-10-25 13:57 ` Hernan Ponce de Leon 0 siblings, 1 reply; 19+ messages in thread From: Andrea Parri @ 2024-10-25 13:44 UTC (permalink / raw) To: Hernan Ponce de Leon; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel On Fri, Oct 25, 2024 at 03:28:17PM +0200, Hernan Ponce de Leon wrote: > On 10/25/2024 3:15 PM, Andrea Parri wrote: > > > > BPF R+release+fence > > > > { > > > > 0:r2=x; 0:r4=y; > > > > 1:r2=y; 1:r4=x; 1:r6=l; > > > > } > > > > P0 | P1 ; > > > > r1 = 1 | r1 = 2 ; > > > > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > > > > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > > > > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > > > > exists ([y]=2 /\ 1:r3=0) > > > > > > > > This "exists" condition is not satisfiable according to the BPF model; > > > > however, if we adopt the "natural"/intended(?) PowerPC implementations > > > > of the synchronization primitives above (aka, with store_release() --> > > > > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > > > > condition in question becomes (architecturally) satisfiable on PowerPC > > > > (although I'm not aware of actual observations on PowerPC hardware). > > > > > > Are the resulting PPC tests available somewhere? > > > > My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce > > entry at https://diy.inria.fr/linux/hard.html#unseen . > > > > Andrea > > I guess I understood you wrong. I thought you had manually "compiled" those > to PPC litmus format (i.e., doing exactly what the JIT compiler would do). I > can obviously write them manually myself, but I find this painful and error > prone (I am particularly bad at this task), so I wanted to avoid this if > someone else had already done it. FWIW, a comprehensive collection of PPC litmus tests could be found at https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc002.html (just follow the link on the test pattern/variants to see the sources); be aware the results of those tables date back to the PPC paper though. Alternatively, remind that PPC is well supported by the herdtools7 diy7 generator; I see no reason for having to (re)write such tests manually. Andrea ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 13:44 ` Andrea Parri @ 2024-10-25 13:57 ` Hernan Ponce de Leon 2024-10-25 14:27 ` Andrea Parri 2024-10-25 23:26 ` Paul E. McKenney 0 siblings, 2 replies; 19+ messages in thread From: Hernan Ponce de Leon @ 2024-10-25 13:57 UTC (permalink / raw) To: Andrea Parri; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel On 10/25/2024 3:44 PM, Andrea Parri wrote: > On Fri, Oct 25, 2024 at 03:28:17PM +0200, Hernan Ponce de Leon wrote: >> On 10/25/2024 3:15 PM, Andrea Parri wrote: >>>>> BPF R+release+fence >>>>> { >>>>> 0:r2=x; 0:r4=y; >>>>> 1:r2=y; 1:r4=x; 1:r6=l; >>>>> } >>>>> P0 | P1 ; >>>>> r1 = 1 | r1 = 2 ; >>>>> *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; >>>>> r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; >>>>> store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; >>>>> exists ([y]=2 /\ 1:r3=0) >>>>> >>>>> This "exists" condition is not satisfiable according to the BPF model; >>>>> however, if we adopt the "natural"/intended(?) PowerPC implementations >>>>> of the synchronization primitives above (aka, with store_release() --> >>>>> LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the >>>>> condition in question becomes (architecturally) satisfiable on PowerPC >>>>> (although I'm not aware of actual observations on PowerPC hardware). >>>> >>>> Are the resulting PPC tests available somewhere? >>> >>> My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce >>> entry at https://diy.inria.fr/linux/hard.html#unseen . >>> >>> Andrea >> >> I guess I understood you wrong. I thought you had manually "compiled" those >> to PPC litmus format (i.e., doing exactly what the JIT compiler would do). I >> can obviously write them manually myself, but I find this painful and error >> prone (I am particularly bad at this task), so I wanted to avoid this if >> someone else had already done it. > > FWIW, a comprehensive collection of PPC litmus tests could be found at > > https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc002.html > > (just follow the link on the test pattern/variants to see the sources); > be aware the results of those tables date back to the PPC paper though. > > Alternatively, remind that PPC is well supported by the herdtools7 diy7 > generator; I see no reason for having to (re)write such tests manually. > > Andrea I am particularly interested in tests using lwarx and stwcx instructions (this is what I understood would be used if one follows [1] to compile the tests in this thread). I have not yet check the cambridge website, but due to the timeline, I don't expect to find tests with those instructions. The same is true with [2]. I have limited experience with diy7, but I remember that it had some limitations to generate RMW instructions, at least for C [3]. Hernan [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c [2] https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign [3] https://github.com/herd/herdtools7/issues/905 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 13:57 ` Hernan Ponce de Leon @ 2024-10-25 14:27 ` Andrea Parri 2024-10-25 15:00 ` Hernan Ponce de Leon 2024-10-25 23:26 ` Paul E. McKenney 1 sibling, 1 reply; 19+ messages in thread From: Andrea Parri @ 2024-10-25 14:27 UTC (permalink / raw) To: Hernan Ponce de Leon; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel > I am particularly interested in tests using lwarx and stwcx instructions > (this is what I understood would be used if one follows [1] to compile the > tests in this thread). > > I have not yet check the cambridge website, but due to the timeline, I don't > expect to find tests with those instructions. The same is true with [2]. > > I have limited experience with diy7, but I remember that it had some > limitations to generate RMW instructions, at least for C [3]. Oh, I'm sure there are, though I'd also not consider myself the 'expert' when it comes to diy7 internals. ;-) Here's an example use of diy7 / diyone7 generating lwarx and stwcx and reflecting the previous pattern: $ diyone7 -arch PPC LwSyncdWW Coe SyncdWRPA SyncdRRAP Fre PPC A "LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre" Generator=diyone7 (version 7.57+1) Prefetch=0:x=F,0:y=W,1:y=F,1:x=T Com=Co Fr Orig=LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre { 0:r2=x; 0:r4=y; 1:r2=y; 1:r3=z; 1:r6=x; } P0 | P1 ; li r1,1 | li r1,2 ; stw r1,0(r2) | stw r1,0(r2) ; lwsync | sync ; li r3,1 | Loop00: ; stw r3,0(r4) | lwarx r4,r0,r3 ; | stwcx. r4,r0,r3 ; | bne Loop00 ; | sync ; | lwz r5,0(r6) ; exists ([y]=2 /\ 1:r5=0) But again, I'd probably have to defer to proper herdtools7 developers and maintainers for any diy7 bug or misbehavior you'd have to discover. Andrea > > Hernan > > [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c > [2] https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign > [3] https://github.com/herd/herdtools7/issues/905 > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 14:27 ` Andrea Parri @ 2024-10-25 15:00 ` Hernan Ponce de Leon 0 siblings, 0 replies; 19+ messages in thread From: Hernan Ponce de Leon @ 2024-10-25 15:00 UTC (permalink / raw) To: Andrea Parri; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel On 10/25/2024 4:27 PM, Andrea Parri wrote: >> I am particularly interested in tests using lwarx and stwcx instructions >> (this is what I understood would be used if one follows [1] to compile the >> tests in this thread). >> >> I have not yet check the cambridge website, but due to the timeline, I don't >> expect to find tests with those instructions. The same is true with [2]. >> >> I have limited experience with diy7, but I remember that it had some >> limitations to generate RMW instructions, at least for C [3]. > > Oh, I'm sure there are, though I'd also not consider myself the 'expert' > when it comes to diy7 internals. ;-) Here's an example use of diy7 / > diyone7 generating lwarx and stwcx and reflecting the previous pattern: > > $ diyone7 -arch PPC LwSyncdWW Coe SyncdWRPA SyncdRRAP Fre > PPC A > "LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre" > Generator=diyone7 (version 7.57+1) > Prefetch=0:x=F,0:y=W,1:y=F,1:x=T > Com=Co Fr > Orig=LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre > { > 0:r2=x; 0:r4=y; > 1:r2=y; 1:r3=z; 1:r6=x; > } > P0 | P1 ; > li r1,1 | li r1,2 ; > stw r1,0(r2) | stw r1,0(r2) ; > lwsync | sync ; > li r3,1 | Loop00: ; > stw r3,0(r4) | lwarx r4,r0,r3 ; > | stwcx. r4,r0,r3 ; > | bne Loop00 ; > | sync ; > | lwz r5,0(r6) ; > exists ([y]=2 /\ 1:r5=0) That is exactly what I was looking for. Thanks Andrea! Hernan > > But again, I'd probably have to defer to proper herdtools7 developers > and maintainers for any diy7 bug or misbehavior you'd have to discover. > > Andrea > > >> >> Hernan >> >> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c >> [2] https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign >> [3] https://github.com/herd/herdtools7/issues/905 >> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Some observations (results) on BPF acquire and release 2024-10-25 13:57 ` Hernan Ponce de Leon 2024-10-25 14:27 ` Andrea Parri @ 2024-10-25 23:26 ` Paul E. McKenney 1 sibling, 0 replies; 19+ messages in thread From: Paul E. McKenney @ 2024-10-25 23:26 UTC (permalink / raw) To: Hernan Ponce de Leon; +Cc: Andrea Parri, puranjay, bpf, lkmm, linux-kernel On Fri, Oct 25, 2024 at 03:57:29PM +0200, Hernan Ponce de Leon wrote: > On 10/25/2024 3:44 PM, Andrea Parri wrote: > > On Fri, Oct 25, 2024 at 03:28:17PM +0200, Hernan Ponce de Leon wrote: > > > On 10/25/2024 3:15 PM, Andrea Parri wrote: > > > > > > BPF R+release+fence > > > > > > { > > > > > > 0:r2=x; 0:r4=y; > > > > > > 1:r2=y; 1:r4=x; 1:r6=l; > > > > > > } > > > > > > P0 | P1 ; > > > > > > r1 = 1 | r1 = 2 ; > > > > > > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > > > > > > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > > > > > > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > > > > > > exists ([y]=2 /\ 1:r3=0) > > > > > > > > > > > > This "exists" condition is not satisfiable according to the BPF model; > > > > > > however, if we adopt the "natural"/intended(?) PowerPC implementations > > > > > > of the synchronization primitives above (aka, with store_release() --> > > > > > > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > > > > > > condition in question becomes (architecturally) satisfiable on PowerPC > > > > > > (although I'm not aware of actual observations on PowerPC hardware). > > > > > > > > > > Are the resulting PPC tests available somewhere? > > > > > > > > My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce > > > > entry at https://diy.inria.fr/linux/hard.html#unseen . > > > > > > > > Andrea > > > > > > I guess I understood you wrong. I thought you had manually "compiled" those > > > to PPC litmus format (i.e., doing exactly what the JIT compiler would do). I > > > can obviously write them manually myself, but I find this painful and error > > > prone (I am particularly bad at this task), so I wanted to avoid this if > > > someone else had already done it. > > > > FWIW, a comprehensive collection of PPC litmus tests could be found at > > > > https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc002.html > > > > (just follow the link on the test pattern/variants to see the sources); > > be aware the results of those tables date back to the PPC paper though. > > > > Alternatively, remind that PPC is well supported by the herdtools7 diy7 > > generator; I see no reason for having to (re)write such tests manually. > > > > Andrea > > I am particularly interested in tests using lwarx and stwcx instructions > (this is what I understood would be used if one follows [1] to compile the > tests in this thread). > > I have not yet check the cambridge website, but due to the timeline, I don't > expect to find tests with those instructions. The same is true with [2]. > > I have limited experience with diy7, but I remember that it had some > limitations to generate RMW instructions, at least for C [3]. > > Hernan > > [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c > [2] https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign > [3] https://github.com/herd/herdtools7/issues/905 Please see attached for a tarball of random PPC litmus tests. You asked for this! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-11-14 14:56 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri 2024-10-24 4:25 ` Paul E. McKenney 2024-10-24 11:13 ` Andrea Parri 2024-10-25 13:56 ` Andrea Parri 2024-11-13 19:25 ` Andrea Parri 2024-11-13 22:59 ` Puranjay Mohan 2024-11-14 14:56 ` Andrea Parri 2024-10-24 11:44 ` Jonas Oberhauser 2024-10-24 12:11 ` Puranjay Mohan 2024-10-24 12:21 ` Jonas Oberhauser 2024-10-24 14:42 ` Paul E. McKenney 2024-10-25 9:32 ` Hernan Ponce de Leon 2024-10-25 13:15 ` Andrea Parri 2024-10-25 13:28 ` Hernan Ponce de Leon 2024-10-25 13:44 ` Andrea Parri 2024-10-25 13:57 ` Hernan Ponce de Leon 2024-10-25 14:27 ` Andrea Parri 2024-10-25 15:00 ` Hernan Ponce de Leon 2024-10-25 23:26 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox