public inbox for lkmm@lists.linux.dev
 help / color / mirror / Atom feed
* Some observations (results) on BPF acquire and release
@ 2024-10-23 17:47 Andrea Parri
  2024-10-24  4:25 ` Paul E. McKenney
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Andrea Parri @ 2024-10-23 17:47 UTC (permalink / raw)
  To: puranjay, paulmck; +Cc: bpf, lkmm, linux-kernel

Hi Puranjay and Paul,

I'm running some experiment on the (experimental) formalization of BPF
acquire and release available from [1] and wanted to report about some
(initial) observations for discussion and possibly future developments;
apologies in advance for the relatively long email and any repetition.


A first and probably most important observation is that the (current)
formalization of acquire and release appears to be "too strong": IIUC,
the simplest example/illustration for this is given by the following

BPF R+release+fence
{
 0:r2=x; 0:r4=y;
 1:r2=y; 1:r4=x; 1:r6=l;
}
 P0                                 | P1                                         ;
 r1 = 1                             | r1 = 2                                     ;
 *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
 r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
 store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
exists ([y]=2 /\ 1:r3=0)

This "exists" condition is not satisfiable according to the BPF model;
however, if we adopt the "natural"/intended(?) PowerPC implementations
of the synchronization primitives above (aka, with store_release() -->
LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
condition in question becomes (architecturally) satisfiable on PowerPC
(although I'm not aware of actual observations on PowerPC hardware).


At first, the previous observation (validated via simulations and later
extended to similar but more complex scenarios ) made me believe that
the BPF formalization of acquire and release could be strictly stronger
than the corresponding LKMM formalization; but that is _not_ the case:

The following "exists" condition is satisfiable according to the BPF
model (and it remains satisfiable even if the load_acquire() in P2 is
paired with an additional store_release() in P1).  In contrast, the
corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
same conclusion holds even if the putative smp_load_acquire() in P2 is
"replaced" with an smp_rmb() or with an address dependency).

BPF Z6.3+fence+fence+acquire
{
 0:r2=x; 0:r4=y; 0:r6=l;
 1:r2=y; 1:r4=z; 1:r6=m;
 2:r2=z; 2:r4=x;
}
 P0                                         | P1                                         | P2                                 ;
 r1 = 1                                     | r1 = 2                                     | r1 = load_acquire((u32 *)(r2 + 0)) ;
 *(u32 *)(r2 + 0) = r1                      | *(u32 *)(r2 + 0) = r1                      | r3 = *(u32 *)(r4 + 0)              ;
 r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) |                                    ;
 r3 = 1                                     | r3 = 1                                     |                                    ;
 *(u32 *)(r4 + 0) = r3                      | *(u32 *)(r4 + 0) = r3                      |                                    ;
exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)


These remarks show that the proposed BPF formalization of acquire and
release somehow, but substantially, diverged from the corresponding
LKMM formalization.  My guess is that the divergences mentioned above
were not (fully) intentional, or I'm wondering -- why not follow the
latter (the LKMM's) more closely? -  This is probably the first question
I would need to clarify before trying/suggesting modifications to the
present formalizations.  ;-)  Thoughts?

  Andrea


[1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri
@ 2024-10-24  4:25 ` Paul E. McKenney
  2024-10-24 11:13   ` Andrea Parri
  2024-10-24 11:44 ` Jonas Oberhauser
  2024-10-25  9:32 ` Hernan Ponce de Leon
  2 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2024-10-24  4:25 UTC (permalink / raw)
  To: Andrea Parri; +Cc: puranjay, bpf, lkmm, linux-kernel

On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote:
> Hi Puranjay and Paul,
> 
> I'm running some experiment on the (experimental) formalization of BPF
> acquire and release available from [1] and wanted to report about some
> (initial) observations for discussion and possibly future developments;
> apologies in advance for the relatively long email and any repetition.
> 
> 
> A first and probably most important observation is that the (current)
> formalization of acquire and release appears to be "too strong": IIUC,
> the simplest example/illustration for this is given by the following
> 
> BPF R+release+fence
> {
>  0:r2=x; 0:r4=y;
>  1:r2=y; 1:r4=x; 1:r6=l;
> }
>  P0                                 | P1                                         ;
>  r1 = 1                             | r1 = 2                                     ;
>  *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
>  r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
>  store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
> exists ([y]=2 /\ 1:r3=0)
> 
> This "exists" condition is not satisfiable according to the BPF model;
> however, if we adopt the "natural"/intended(?) PowerPC implementations
> of the synchronization primitives above (aka, with store_release() -->
> LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> condition in question becomes (architecturally) satisfiable on PowerPC
> (although I'm not aware of actual observations on PowerPC hardware).

Yes, you are quite right, for efficient use on PowerPC, we need the BPF
memory model to allow the above cycle in the R litmus test.  My bad,
as I put too much emphasis on ARM64.

> At first, the previous observation (validated via simulations and later
> extended to similar but more complex scenarios ) made me believe that
> the BPF formalization of acquire and release could be strictly stronger
> than the corresponding LKMM formalization; but that is _not_ the case:
> 
> The following "exists" condition is satisfiable according to the BPF
> model (and it remains satisfiable even if the load_acquire() in P2 is
> paired with an additional store_release() in P1).  In contrast, the
> corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
> and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
> same conclusion holds even if the putative smp_load_acquire() in P2 is
> "replaced" with an smp_rmb() or with an address dependency).
> 
> BPF Z6.3+fence+fence+acquire
> {
>  0:r2=x; 0:r4=y; 0:r6=l;
>  1:r2=y; 1:r4=z; 1:r6=m;
>  2:r2=z; 2:r4=x;
> }
>  P0                                         | P1                                         | P2                                 ;
>  r1 = 1                                     | r1 = 2                                     | r1 = load_acquire((u32 *)(r2 + 0)) ;
>  *(u32 *)(r2 + 0) = r1                      | *(u32 *)(r2 + 0) = r1                      | r3 = *(u32 *)(r4 + 0)              ;
>  r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) |                                    ;
>  r3 = 1                                     | r3 = 1                                     |                                    ;
>  *(u32 *)(r4 + 0) = r3                      | *(u32 *)(r4 + 0) = r3                      |                                    ;
> exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)

And again agreed, we do want to forbid Z6.3.

> These remarks show that the proposed BPF formalization of acquire and
> release somehow, but substantially, diverged from the corresponding
> LKMM formalization.  My guess is that the divergences mentioned above
> were not (fully) intentional, or I'm wondering -- why not follow the
> latter (the LKMM's) more closely? -  This is probably the first question
> I would need to clarify before trying/suggesting modifications to the
> present formalizations.  ;-)  Thoughts?

Thank you for digging into this!

I clearly need to get my validation work going again, but I very much
welcome any further help you would be willing to provide.

							Thanx, Paul

>   Andrea
> 
> 
> [1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-24  4:25 ` Paul E. McKenney
@ 2024-10-24 11:13   ` Andrea Parri
  2024-10-25 13:56     ` Andrea Parri
  0 siblings, 1 reply; 19+ messages in thread
From: Andrea Parri @ 2024-10-24 11:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: puranjay, bpf, lkmm, linux-kernel

On Wed, Oct 23, 2024 at 09:25:40PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote:
> > Hi Puranjay and Paul,
> > 
> > I'm running some experiment on the (experimental) formalization of BPF
> > acquire and release available from [1] and wanted to report about some
> > (initial) observations for discussion and possibly future developments;
> > apologies in advance for the relatively long email and any repetition.
> > 
> > 
> > A first and probably most important observation is that the (current)
> > formalization of acquire and release appears to be "too strong": IIUC,
> > the simplest example/illustration for this is given by the following
> > 
> > BPF R+release+fence
> > {
> >  0:r2=x; 0:r4=y;
> >  1:r2=y; 1:r4=x; 1:r6=l;
> > }
> >  P0                                 | P1                                         ;
> >  r1 = 1                             | r1 = 2                                     ;
> >  *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
> >  r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
> >  store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
> > exists ([y]=2 /\ 1:r3=0)
> > 
> > This "exists" condition is not satisfiable according to the BPF model;
> > however, if we adopt the "natural"/intended(?) PowerPC implementations
> > of the synchronization primitives above (aka, with store_release() -->
> > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> > condition in question becomes (architecturally) satisfiable on PowerPC
> > (although I'm not aware of actual observations on PowerPC hardware).
> 
> Yes, you are quite right, for efficient use on PowerPC, we need the BPF
> memory model to allow the above cycle in the R litmus test.  My bad,
> as I put too much emphasis on ARM64.
> 
> > At first, the previous observation (validated via simulations and later
> > extended to similar but more complex scenarios ) made me believe that
> > the BPF formalization of acquire and release could be strictly stronger
> > than the corresponding LKMM formalization; but that is _not_ the case:
> > 
> > The following "exists" condition is satisfiable according to the BPF
> > model (and it remains satisfiable even if the load_acquire() in P2 is
> > paired with an additional store_release() in P1).  In contrast, the
> > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
> > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
> > same conclusion holds even if the putative smp_load_acquire() in P2 is
> > "replaced" with an smp_rmb() or with an address dependency).
> > 
> > BPF Z6.3+fence+fence+acquire
> > {
> >  0:r2=x; 0:r4=y; 0:r6=l;
> >  1:r2=y; 1:r4=z; 1:r6=m;
> >  2:r2=z; 2:r4=x;
> > }
> >  P0                                         | P1                                         | P2                                 ;
> >  r1 = 1                                     | r1 = 2                                     | r1 = load_acquire((u32 *)(r2 + 0)) ;
> >  *(u32 *)(r2 + 0) = r1                      | *(u32 *)(r2 + 0) = r1                      | r3 = *(u32 *)(r4 + 0)              ;
> >  r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) |                                    ;
> >  r3 = 1                                     | r3 = 1                                     |                                    ;
> >  *(u32 *)(r4 + 0) = r3                      | *(u32 *)(r4 + 0) = r3                      |                                    ;
> > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)
> 
> And again agreed, we do want to forbid Z6.3.
> 
> > These remarks show that the proposed BPF formalization of acquire and
> > release somehow, but substantially, diverged from the corresponding
> > LKMM formalization.  My guess is that the divergences mentioned above
> > were not (fully) intentional, or I'm wondering -- why not follow the
> > latter (the LKMM's) more closely? -  This is probably the first question
> > I would need to clarify before trying/suggesting modifications to the
> > present formalizations.  ;-)  Thoughts?
> 
> Thank you for digging into this!
> 
> I clearly need to get my validation work going again, but I very much
> welcome any further help you would be willing to provide.

Thanks for the confirmation.

The BPF tests above (and other I have) were all hand-written, but I'm
working towards improving such automation/validation; I won't keep it
a secret should I find something relevant/interesting.  :-)

But the subset of the LKMM which deals with "strong fences" and Acq &
Rel (limited to so called marked accesses) seems relatively contained
/simple:  its analysis could be useful, if not determining, in trying
to resolve the above issues.

  Andrea

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri
  2024-10-24  4:25 ` Paul E. McKenney
@ 2024-10-24 11:44 ` Jonas Oberhauser
  2024-10-24 12:11   ` Puranjay Mohan
  2024-10-25  9:32 ` Hernan Ponce de Leon
  2 siblings, 1 reply; 19+ messages in thread
From: Jonas Oberhauser @ 2024-10-24 11:44 UTC (permalink / raw)
  To: Andrea Parri, puranjay, paulmck; +Cc: bpf, lkmm, linux-kernel



Am 10/23/2024 um 7:47 PM schrieb Andrea Parri:
> Hi Puranjay and Paul,
> 
> These remarks show that the proposed BPF formalization of acquire and
> release somehow, but substantially, diverged from the corresponding
> LKMM formalization.  My guess is that the divergences mentioned above
> were not (fully) intentional, or I'm wondering -- why not follow the
> latter (the LKMM's) more closely? -  This is probably the first question
> I would need to clarify before trying/suggesting modifications to the
> present formalizations.  ;-)  Thoughts?
> 

I'm also curious why the formalization (not just in the semantics but 
also how it is structured) is so completely different from LKMM's.

At first glance there are also many semantic differences, e.g., it seems 
coe is much weaker in eBPF and the last axiom also seems a bit like a 
tack-on that doesn't "play well" with the previous axioms.

It would make sense to me to start with the framework of LKMM and maybe 
weaken it from there if it is really necessary. But maybe I don't know 
enough about how eBPF atomics are intended to work...

Best wishes,
   jonas


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-24 11:44 ` Jonas Oberhauser
@ 2024-10-24 12:11   ` Puranjay Mohan
  2024-10-24 12:21     ` Jonas Oberhauser
  0 siblings, 1 reply; 19+ messages in thread
From: Puranjay Mohan @ 2024-10-24 12:11 UTC (permalink / raw)
  To: Jonas Oberhauser, Andrea Parri, paulmck; +Cc: bpf, lkmm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2051 bytes --]

Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> writes:

> Am 10/23/2024 um 7:47 PM schrieb Andrea Parri:
>> Hi Puranjay and Paul,
>> 
>> These remarks show that the proposed BPF formalization of acquire and
>> release somehow, but substantially, diverged from the corresponding
>> LKMM formalization.  My guess is that the divergences mentioned above
>> were not (fully) intentional, or I'm wondering -- why not follow the
>> latter (the LKMM's) more closely? -  This is probably the first question
>> I would need to clarify before trying/suggesting modifications to the
>> present formalizations.  ;-)  Thoughts?
>> 
>
> I'm also curious why the formalization (not just in the semantics but 
> also how it is structured) is so completely different from LKMM's.

While initially writing the cat formalization for BPF, I started with
LKMM but because BPF memory model is an instruction level memory model
and much simpler than LKMM, I wrote it from scratch. But I converted all
LKMM litmus tests to BPF and made sure that the cat model is complaint.

> At first glance there are also many semantic differences, e.g., it seems 
> coe is much weaker in eBPF and the last axiom also seems a bit like a 
> tack-on that doesn't "play well" with the previous axioms.

Yes, the last axion is a tack-on that I added to make acquire/release
work with other atomics just before the presentation at LPC. The
acquire/release part is still under development and not perfect.

If what you are saying about coe is true then it is a bug and I will try
to fix it. 

> It would make sense to me to start with the framework of LKMM and maybe 
> weaken it from there if it is really necessary. But maybe I don't know 
> enough about how eBPF atomics are intended to work...

The cat formalization for BPF is currently experimental and it would be
great if people can find bugs and contribute to it. It would be great if
people who worked on the LKMM's cat file could help building BPF's cat
file too. 


Thanks,
Puranjay

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 255 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-24 12:11   ` Puranjay Mohan
@ 2024-10-24 12:21     ` Jonas Oberhauser
  2024-10-24 14:42       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Jonas Oberhauser @ 2024-10-24 12:21 UTC (permalink / raw)
  To: Puranjay Mohan, Andrea Parri, paulmck; +Cc: bpf, lkmm, linux-kernel



Am 10/24/2024 um 2:11 PM schrieb Puranjay Mohan:
> Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> writes:
> 
>> Am 10/23/2024 um 7:47 PM schrieb Andrea Parri:
>>> Hi Puranjay and Paul,
>>>
>>> These remarks show that the proposed BPF formalization of acquire and
>>> release somehow, but substantially, diverged from the corresponding
>>> LKMM formalization.  My guess is that the divergences mentioned above
>>> were not (fully) intentional, or I'm wondering -- why not follow the
>>> latter (the LKMM's) more closely? -  This is probably the first question
>>> I would need to clarify before trying/suggesting modifications to the
>>> present formalizations.  ;-)  Thoughts?
>>>
>>
>> I'm also curious why the formalization (not just in the semantics but
>> also how it is structured) is so completely different from LKMM's.
> 

Thanks Puranjay for your response!


 > BPF memory model is an instruction level memory model

You mean BPF has no optimizing byte code compiler?
Is it guaranteed to stay this way?
WASM does JIT optimizations as far as I know, which would bring back a 
lot of the complexity of software models like LKMM.

 > much simpler than LKMM

LKMM has a simple core, roughly like this:

ppo = ... (* all the ppo related rules that are relevant to you -- some 
fences don't matter and you can just remove them *)
prop = (coe | fre) (* remove reflexive closure *) ; ...
hb = [Marked] ; (ppo | rfe | prop & int | prop ; strong-sync) ; [Marked]

acyclic hb
(* ... also add the atomicity & sc-per-loc axioms *)

If you can exclude compiler optimizations, you can remove the Marked bits.


Best wishes,

   jonas


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-24 12:21     ` Jonas Oberhauser
@ 2024-10-24 14:42       ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-10-24 14:42 UTC (permalink / raw)
  To: Jonas Oberhauser; +Cc: Puranjay Mohan, Andrea Parri, bpf, lkmm, linux-kernel

On Thu, Oct 24, 2024 at 02:21:15PM +0200, Jonas Oberhauser wrote:
> 
> 
> Am 10/24/2024 um 2:11 PM schrieb Puranjay Mohan:
> > Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> writes:
> > 
> > > Am 10/23/2024 um 7:47 PM schrieb Andrea Parri:
> > > > Hi Puranjay and Paul,
> > > > 
> > > > These remarks show that the proposed BPF formalization of acquire and
> > > > release somehow, but substantially, diverged from the corresponding
> > > > LKMM formalization.  My guess is that the divergences mentioned above
> > > > were not (fully) intentional, or I'm wondering -- why not follow the
> > > > latter (the LKMM's) more closely? -  This is probably the first question
> > > > I would need to clarify before trying/suggesting modifications to the
> > > > present formalizations.  ;-)  Thoughts?
> > > > 
> > > 
> > > I'm also curious why the formalization (not just in the semantics but
> > > also how it is structured) is so completely different from LKMM's.
> > 
> 
> Thanks Puranjay for your response!
> 
> 
> > BPF memory model is an instruction level memory model
> 
> You mean BPF has no optimizing byte code compiler?
> Is it guaranteed to stay this way?
> WASM does JIT optimizations as far as I know, which would bring back a lot
> of the complexity of software models like LKMM.

Sadly (at least from a simplicity viewpoint), BPF assembly goes at least
through compiler backends.  So there will be some issues of this sort.
We have started on them, but as Puranjay says, there is more to be done.

							Thanx, Paul

> > much simpler than LKMM
> 
> LKMM has a simple core, roughly like this:
> 
> ppo = ... (* all the ppo related rules that are relevant to you -- some
> fences don't matter and you can just remove them *)
> prop = (coe | fre) (* remove reflexive closure *) ; ...
> hb = [Marked] ; (ppo | rfe | prop & int | prop ; strong-sync) ; [Marked]
> 
> acyclic hb
> (* ... also add the atomicity & sc-per-loc axioms *)
> 
> If you can exclude compiler optimizations, you can remove the Marked bits.
> 
> 
> Best wishes,
> 
>   jonas
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri
  2024-10-24  4:25 ` Paul E. McKenney
  2024-10-24 11:44 ` Jonas Oberhauser
@ 2024-10-25  9:32 ` Hernan Ponce de Leon
  2024-10-25 13:15   ` Andrea Parri
  2 siblings, 1 reply; 19+ messages in thread
From: Hernan Ponce de Leon @ 2024-10-25  9:32 UTC (permalink / raw)
  To: Andrea Parri, puranjay, paulmck; +Cc: bpf, lkmm, linux-kernel

On 10/23/2024 7:47 PM, Andrea Parri wrote:
> Hi Puranjay and Paul,
> 
> I'm running some experiment on the (experimental) formalization of BPF
> acquire and release available from [1] and wanted to report about some
> (initial) observations for discussion and possibly future developments;
> apologies in advance for the relatively long email and any repetition.
> 
> 
> A first and probably most important observation is that the (current)
> formalization of acquire and release appears to be "too strong": IIUC,
> the simplest example/illustration for this is given by the following
> 
> BPF R+release+fence
> {
>   0:r2=x; 0:r4=y;
>   1:r2=y; 1:r4=x; 1:r6=l;
> }
>   P0                                 | P1                                         ;
>   r1 = 1                             | r1 = 2                                     ;
>   *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
>   r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
>   store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
> exists ([y]=2 /\ 1:r3=0)
> 
> This "exists" condition is not satisfiable according to the BPF model;
> however, if we adopt the "natural"/intended(?) PowerPC implementations
> of the synchronization primitives above (aka, with store_release() -->
> LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> condition in question becomes (architecturally) satisfiable on PowerPC
> (although I'm not aware of actual observations on PowerPC hardware).

Are the resulting PPC tests available somewhere?

> 
> 
> At first, the previous observation (validated via simulations and later
> extended to similar but more complex scenarios ) made me believe that
> the BPF formalization of acquire and release could be strictly stronger
> than the corresponding LKMM formalization; but that is _not_ the case:
> 
> The following "exists" condition is satisfiable according to the BPF
> model (and it remains satisfiable even if the load_acquire() in P2 is
> paired with an additional store_release() in P1).  In contrast, the
> corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
> and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
> same conclusion holds even if the putative smp_load_acquire() in P2 is
> "replaced" with an smp_rmb() or with an address dependency).
> 
> BPF Z6.3+fence+fence+acquire
> {
>   0:r2=x; 0:r4=y; 0:r6=l;
>   1:r2=y; 1:r4=z; 1:r6=m;
>   2:r2=z; 2:r4=x;
> }
>   P0                                         | P1                                         | P2                                 ;
>   r1 = 1                                     | r1 = 2                                     | r1 = load_acquire((u32 *)(r2 + 0)) ;
>   *(u32 *)(r2 + 0) = r1                      | *(u32 *)(r2 + 0) = r1                      | r3 = *(u32 *)(r4 + 0)              ;
>   r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) |                                    ;
>   r3 = 1                                     | r3 = 1                                     |                                    ;
>   *(u32 *)(r4 + 0) = r3                      | *(u32 *)(r4 + 0) = r3                      |                                    ;
> exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)
> 
> 
> These remarks show that the proposed BPF formalization of acquire and
> release somehow, but substantially, diverged from the corresponding
> LKMM formalization.  My guess is that the divergences mentioned above
> were not (fully) intentional, or I'm wondering -- why not follow the
> latter (the LKMM's) more closely? -  This is probably the first question
> I would need to clarify before trying/suggesting modifications to the
> present formalizations.  ;-)  Thoughts?
> 
>    Andrea
> 
> 
> [1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25  9:32 ` Hernan Ponce de Leon
@ 2024-10-25 13:15   ` Andrea Parri
  2024-10-25 13:28     ` Hernan Ponce de Leon
  0 siblings, 1 reply; 19+ messages in thread
From: Andrea Parri @ 2024-10-25 13:15 UTC (permalink / raw)
  To: Hernan Ponce de Leon; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel

> > BPF R+release+fence
> > {
> >   0:r2=x; 0:r4=y;
> >   1:r2=y; 1:r4=x; 1:r6=l;
> > }
> >   P0                                 | P1                                         ;
> >   r1 = 1                             | r1 = 2                                     ;
> >   *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
> >   r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
> >   store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
> > exists ([y]=2 /\ 1:r3=0)
> > 
> > This "exists" condition is not satisfiable according to the BPF model;
> > however, if we adopt the "natural"/intended(?) PowerPC implementations
> > of the synchronization primitives above (aka, with store_release() -->
> > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> > condition in question becomes (architecturally) satisfiable on PowerPC
> > (although I'm not aware of actual observations on PowerPC hardware).
> 
> Are the resulting PPC tests available somewhere?

My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce
entry at https://diy.inria.fr/linux/hard.html#unseen .

  Andrea

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25 13:15   ` Andrea Parri
@ 2024-10-25 13:28     ` Hernan Ponce de Leon
  2024-10-25 13:44       ` Andrea Parri
  0 siblings, 1 reply; 19+ messages in thread
From: Hernan Ponce de Leon @ 2024-10-25 13:28 UTC (permalink / raw)
  To: Andrea Parri; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel

On 10/25/2024 3:15 PM, Andrea Parri wrote:
>>> BPF R+release+fence
>>> {
>>>    0:r2=x; 0:r4=y;
>>>    1:r2=y; 1:r4=x; 1:r6=l;
>>> }
>>>    P0                                 | P1                                         ;
>>>    r1 = 1                             | r1 = 2                                     ;
>>>    *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
>>>    r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
>>>    store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
>>> exists ([y]=2 /\ 1:r3=0)
>>>
>>> This "exists" condition is not satisfiable according to the BPF model;
>>> however, if we adopt the "natural"/intended(?) PowerPC implementations
>>> of the synchronization primitives above (aka, with store_release() -->
>>> LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
>>> condition in question becomes (architecturally) satisfiable on PowerPC
>>> (although I'm not aware of actual observations on PowerPC hardware).
>>
>> Are the resulting PPC tests available somewhere?
> 
> My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce
> entry at https://diy.inria.fr/linux/hard.html#unseen .
> 
>    Andrea

I guess I understood you wrong. I thought you had manually "compiled" 
those to PPC litmus format (i.e., doing exactly what the JIT compiler 
would do). I can obviously write them manually myself, but I find this 
painful and error prone (I am particularly bad at this task), so I 
wanted to avoid this if someone else had already done it.

Hernan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25 13:28     ` Hernan Ponce de Leon
@ 2024-10-25 13:44       ` Andrea Parri
  2024-10-25 13:57         ` Hernan Ponce de Leon
  0 siblings, 1 reply; 19+ messages in thread
From: Andrea Parri @ 2024-10-25 13:44 UTC (permalink / raw)
  To: Hernan Ponce de Leon; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel

On Fri, Oct 25, 2024 at 03:28:17PM +0200, Hernan Ponce de Leon wrote:
> On 10/25/2024 3:15 PM, Andrea Parri wrote:
> > > > BPF R+release+fence
> > > > {
> > > >    0:r2=x; 0:r4=y;
> > > >    1:r2=y; 1:r4=x; 1:r6=l;
> > > > }
> > > >    P0                                 | P1                                         ;
> > > >    r1 = 1                             | r1 = 2                                     ;
> > > >    *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
> > > >    r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
> > > >    store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
> > > > exists ([y]=2 /\ 1:r3=0)
> > > > 
> > > > This "exists" condition is not satisfiable according to the BPF model;
> > > > however, if we adopt the "natural"/intended(?) PowerPC implementations
> > > > of the synchronization primitives above (aka, with store_release() -->
> > > > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> > > > condition in question becomes (architecturally) satisfiable on PowerPC
> > > > (although I'm not aware of actual observations on PowerPC hardware).
> > > 
> > > Are the resulting PPC tests available somewhere?
> > 
> > My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce
> > entry at https://diy.inria.fr/linux/hard.html#unseen .
> > 
> >    Andrea
> 
> I guess I understood you wrong. I thought you had manually "compiled" those
> to PPC litmus format (i.e., doing exactly what the JIT compiler would do). I
> can obviously write them manually myself, but I find this painful and error
> prone (I am particularly bad at this task), so I wanted to avoid this if
> someone else had already done it.

FWIW, a comprehensive collection of PPC litmus tests could be found at

  https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc002.html

(just follow the link on the test pattern/variants to see the sources);
be aware the results of those tables date back to the PPC paper though.

Alternatively, remind that PPC is well supported by the herdtools7 diy7
generator; I see no reason for having to (re)write such tests manually.

  Andrea

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-24 11:13   ` Andrea Parri
@ 2024-10-25 13:56     ` Andrea Parri
  2024-11-13 19:25       ` Andrea Parri
  0 siblings, 1 reply; 19+ messages in thread
From: Andrea Parri @ 2024-10-25 13:56 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: puranjay, bpf, lkmm, linux-kernel

> But the subset of the LKMM which deals with "strong fences" and Acq &
> Rel (limited to so called marked accesses) seems relatively contained
> /simple:  its analysis could be useful, if not determining, in trying
> to resolve the above issues.

Elaborating on the previous suggestion/comparison with the LKMM, the
"subset" in question can take the following form (modulo my typos):

"LKMM with acquire/release, strong fences, and marked accesses only"

[...]

let acq-po = [Acq] ; po ; [M]
let po-rel = [M] ; po ; [Rel]
let strong-fence = [M] ; fencerel(Mb) ; [M]
let ppo = acq-po | po-rel | strong-fence

let A-cumul(r) = rfe? ; r
let cumul-fence = A-cumul(strong-fence | po-rel)
let overwrite = co | fr
let prop = (overwrite & ext)? ; cumul-fence* ; rfe?

let hb = ppo | rfe | ((prop \ id) & int)
acyclic hb as Hb

let pb = prop ; strong-fence ; hb*
acyclic pb as Pb


For BPF, we'd want to replace acq-po, po-rel and strong-fence with
load_acquire, store_release and po_amo_fetch respectively:  Unless
I'm missing something, this should restore the intended behaviors
for the R and Z6.3 tests discussed earlier.

A couple of other remarks:

- Notice how the above formalization is completely symmetrical wrt.
  co <-> fr, IOW, co links are considered "on par with" fr links.
  In particular, the following test is satisfiable in the above
  formalization, as is the corresponding C test in the LKMM:

BPF 2+2W+release+fence
{
 0:r2=x; 0:r4=y;
 1:r2=y; 1:r4=x; 1:r6=l;
}
 P0                                 | P1                                         ;
 r1 = 1                             | r1 = 1                                     ;
 *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
 r3 = 2                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
 store_release((u32 *)(r4 + 0), r3) | r3 = 2                                     ;
                                    | store_release((u32 *)(r4 + 0), r3)         ;
exists ([x]=1 /\ [y]=1)

  (On an historical note, this wasn't always the case in the LKMM,
  cf. e.g. [1], but be alerted that the formalization in [1] is
  decisively more involved and less intuitive than today's / what
  the LKMM community has converged to.  ;-) )

- The above formalization merges the so called "Observation" axiom
  in the "Happens-before" axiom.  In the LKMM, this followed the
  removal of B-cumulativity for smp_wmb() and smp_store_release()
  and a consequent "great simplification" of the hb relation: link
  [2] can provide more details and some examples related to those
  changes.  For completeness, here is the BPF analogue of test
  "C-release-acquire-is-B-cumulative.litmus" from that article:

BPF ISA2+release+acquire+acquire
{
 0:r2=x; 0:r4=y;
 1:r2=y; 1:r4=z;
 2:r2=z; 2:r4=x;
}
 P0                                 | P1                                 | P2                                 ;
 r1 = 1                             | r1 = load_acquire((u32 *)(r2 + 0)) | r1 = load_acquire((u32 *)(r2 + 0)) ;
 *(u32 *)(r2 + 0) = r1              | r3 = 1                             | r3 = *(u32 *)(r4 + 0)              ;
 r3 = 1                             | *(u32 *)(r4 + 0) = r3              |                                    ;
 store_release((u32 *)(r4 + 0), r3) |                                    |                                    ;
exists (1:r1=1 /\ 2:r1=1 /\ 2:r3=0)

  The formalization sketched above allows this behavior.  Notice,
  however, that the behavior is forbidden after "completion" of
  the release/acquire chain, i.e. by making the store from P1 a
  store-release (a property also known as A-cumulativy of the
  release operation).


I guess the next question (once clarified the intentions for the R
and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W
and B-cumulativity for store-release?"; I mentioned some tradeoff,
but in the end this is a call for the BPF community.

  Andrea

[1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/StrongModel.html
[2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/WeakModel.html#Cumulativity

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25 13:44       ` Andrea Parri
@ 2024-10-25 13:57         ` Hernan Ponce de Leon
  2024-10-25 14:27           ` Andrea Parri
  2024-10-25 23:26           ` Paul E. McKenney
  0 siblings, 2 replies; 19+ messages in thread
From: Hernan Ponce de Leon @ 2024-10-25 13:57 UTC (permalink / raw)
  To: Andrea Parri; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel

On 10/25/2024 3:44 PM, Andrea Parri wrote:
> On Fri, Oct 25, 2024 at 03:28:17PM +0200, Hernan Ponce de Leon wrote:
>> On 10/25/2024 3:15 PM, Andrea Parri wrote:
>>>>> BPF R+release+fence
>>>>> {
>>>>>     0:r2=x; 0:r4=y;
>>>>>     1:r2=y; 1:r4=x; 1:r6=l;
>>>>> }
>>>>>     P0                                 | P1                                         ;
>>>>>     r1 = 1                             | r1 = 2                                     ;
>>>>>     *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
>>>>>     r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
>>>>>     store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
>>>>> exists ([y]=2 /\ 1:r3=0)
>>>>>
>>>>> This "exists" condition is not satisfiable according to the BPF model;
>>>>> however, if we adopt the "natural"/intended(?) PowerPC implementations
>>>>> of the synchronization primitives above (aka, with store_release() -->
>>>>> LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
>>>>> condition in question becomes (architecturally) satisfiable on PowerPC
>>>>> (although I'm not aware of actual observations on PowerPC hardware).
>>>>
>>>> Are the resulting PPC tests available somewhere?
>>>
>>> My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce
>>> entry at https://diy.inria.fr/linux/hard.html#unseen .
>>>
>>>     Andrea
>>
>> I guess I understood you wrong. I thought you had manually "compiled" those
>> to PPC litmus format (i.e., doing exactly what the JIT compiler would do). I
>> can obviously write them manually myself, but I find this painful and error
>> prone (I am particularly bad at this task), so I wanted to avoid this if
>> someone else had already done it.
> 
> FWIW, a comprehensive collection of PPC litmus tests could be found at
> 
>    https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc002.html
> 
> (just follow the link on the test pattern/variants to see the sources);
> be aware the results of those tables date back to the PPC paper though.
> 
> Alternatively, remind that PPC is well supported by the herdtools7 diy7
> generator; I see no reason for having to (re)write such tests manually.
> 
>    Andrea

I am particularly interested in tests using lwarx and stwcx instructions 
(this is what I understood would be used if one follows [1] to compile 
the tests in this thread).

I have not yet check the cambridge website, but due to the timeline, I 
don't expect to find tests with those instructions. The same is true 
with [2].

I have limited experience with diy7, but I remember that it had some 
limitations to generate RMW instructions, at least for C [3].

Hernan

[1] 
https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c
[2] 
https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign
[3] https://github.com/herd/herdtools7/issues/905


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25 13:57         ` Hernan Ponce de Leon
@ 2024-10-25 14:27           ` Andrea Parri
  2024-10-25 15:00             ` Hernan Ponce de Leon
  2024-10-25 23:26           ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Andrea Parri @ 2024-10-25 14:27 UTC (permalink / raw)
  To: Hernan Ponce de Leon; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel

> I am particularly interested in tests using lwarx and stwcx instructions
> (this is what I understood would be used if one follows [1] to compile the
> tests in this thread).
> 
> I have not yet check the cambridge website, but due to the timeline, I don't
> expect to find tests with those instructions. The same is true with [2].
> 
> I have limited experience with diy7, but I remember that it had some
> limitations to generate RMW instructions, at least for C [3].

Oh, I'm sure there are, though I'd also not consider myself the 'expert'
when it comes to diy7 internals.  ;-)  Here's an example use of diy7 /
diyone7 generating lwarx and stwcx and reflecting the previous pattern:

$ diyone7 -arch PPC LwSyncdWW Coe SyncdWRPA SyncdRRAP Fre
PPC A
"LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre"
Generator=diyone7 (version 7.57+1)
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Co Fr
Orig=LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre
{
0:r2=x; 0:r4=y;
1:r2=y; 1:r3=z; 1:r6=x;
}
 P0           | P1              ;
 li r1,1      | li r1,2         ;
 stw r1,0(r2) | stw r1,0(r2)    ;
 lwsync       | sync            ;
 li r3,1      | Loop00:         ;
 stw r3,0(r4) | lwarx r4,r0,r3  ;
              | stwcx. r4,r0,r3 ;
              | bne  Loop00     ;
              | sync            ;
              | lwz r5,0(r6)    ;
exists ([y]=2 /\ 1:r5=0)

But again, I'd probably have to defer to proper herdtools7 developers
and maintainers for any diy7 bug or misbehavior you'd have to discover.

  Andrea


> 
> Hernan
> 
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c
> [2] https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign
> [3] https://github.com/herd/herdtools7/issues/905
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25 14:27           ` Andrea Parri
@ 2024-10-25 15:00             ` Hernan Ponce de Leon
  0 siblings, 0 replies; 19+ messages in thread
From: Hernan Ponce de Leon @ 2024-10-25 15:00 UTC (permalink / raw)
  To: Andrea Parri; +Cc: puranjay, paulmck, bpf, lkmm, linux-kernel

On 10/25/2024 4:27 PM, Andrea Parri wrote:
>> I am particularly interested in tests using lwarx and stwcx instructions
>> (this is what I understood would be used if one follows [1] to compile the
>> tests in this thread).
>>
>> I have not yet check the cambridge website, but due to the timeline, I don't
>> expect to find tests with those instructions. The same is true with [2].
>>
>> I have limited experience with diy7, but I remember that it had some
>> limitations to generate RMW instructions, at least for C [3].
> 
> Oh, I'm sure there are, though I'd also not consider myself the 'expert'
> when it comes to diy7 internals.  ;-)  Here's an example use of diy7 /
> diyone7 generating lwarx and stwcx and reflecting the previous pattern:
> 
> $ diyone7 -arch PPC LwSyncdWW Coe SyncdWRPA SyncdRRAP Fre
> PPC A
> "LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre"
> Generator=diyone7 (version 7.57+1)
> Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
> Com=Co Fr
> Orig=LwSyncdWW Coe SyncdWRNaA SyncdRRANa Fre
> {
> 0:r2=x; 0:r4=y;
> 1:r2=y; 1:r3=z; 1:r6=x;
> }
>   P0           | P1              ;
>   li r1,1      | li r1,2         ;
>   stw r1,0(r2) | stw r1,0(r2)    ;
>   lwsync       | sync            ;
>   li r3,1      | Loop00:         ;
>   stw r3,0(r4) | lwarx r4,r0,r3  ;
>                | stwcx. r4,r0,r3 ;
>                | bne  Loop00     ;
>                | sync            ;
>                | lwz r5,0(r6)    ;
> exists ([y]=2 /\ 1:r5=0)

That is exactly what I was looking for. Thanks Andrea!

Hernan

> 
> But again, I'd probably have to defer to proper herdtools7 developers
> and maintainers for any diy7 bug or misbehavior you'd have to discover.
> 
>    Andrea
> 
> 
>>
>> Hernan
>>
>> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c
>> [2] https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign
>> [3] https://github.com/herd/herdtools7/issues/905
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25 13:57         ` Hernan Ponce de Leon
  2024-10-25 14:27           ` Andrea Parri
@ 2024-10-25 23:26           ` Paul E. McKenney
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-10-25 23:26 UTC (permalink / raw)
  To: Hernan Ponce de Leon; +Cc: Andrea Parri, puranjay, bpf, lkmm, linux-kernel

On Fri, Oct 25, 2024 at 03:57:29PM +0200, Hernan Ponce de Leon wrote:
> On 10/25/2024 3:44 PM, Andrea Parri wrote:
> > On Fri, Oct 25, 2024 at 03:28:17PM +0200, Hernan Ponce de Leon wrote:
> > > On 10/25/2024 3:15 PM, Andrea Parri wrote:
> > > > > > BPF R+release+fence
> > > > > > {
> > > > > >     0:r2=x; 0:r4=y;
> > > > > >     1:r2=y; 1:r4=x; 1:r6=l;
> > > > > > }
> > > > > >     P0                                 | P1                                         ;
> > > > > >     r1 = 1                             | r1 = 2                                     ;
> > > > > >     *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
> > > > > >     r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
> > > > > >     store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
> > > > > > exists ([y]=2 /\ 1:r3=0)
> > > > > > 
> > > > > > This "exists" condition is not satisfiable according to the BPF model;
> > > > > > however, if we adopt the "natural"/intended(?) PowerPC implementations
> > > > > > of the synchronization primitives above (aka, with store_release() -->
> > > > > > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> > > > > > condition in question becomes (architecturally) satisfiable on PowerPC
> > > > > > (although I'm not aware of actual observations on PowerPC hardware).
> > > > > 
> > > > > Are the resulting PPC tests available somewhere?
> > > > 
> > > > My data go back to the LKMM paper, cf. e.g. the R+pooncerelease+fencembonceonce
> > > > entry at https://diy.inria.fr/linux/hard.html#unseen .
> > > > 
> > > >     Andrea
> > > 
> > > I guess I understood you wrong. I thought you had manually "compiled" those
> > > to PPC litmus format (i.e., doing exactly what the JIT compiler would do). I
> > > can obviously write them manually myself, but I find this painful and error
> > > prone (I am particularly bad at this task), so I wanted to avoid this if
> > > someone else had already done it.
> > 
> > FWIW, a comprehensive collection of PPC litmus tests could be found at
> > 
> >    https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc002.html
> > 
> > (just follow the link on the test pattern/variants to see the sources);
> > be aware the results of those tables date back to the PPC paper though.
> > 
> > Alternatively, remind that PPC is well supported by the herdtools7 diy7
> > generator; I see no reason for having to (re)write such tests manually.
> > 
> >    Andrea
> 
> I am particularly interested in tests using lwarx and stwcx instructions
> (this is what I understood would be used if one follows [1] to compile the
> tests in this thread).
> 
> I have not yet check the cambridge website, but due to the timeline, I don't
> expect to find tests with those instructions. The same is true with [2].
> 
> I have limited experience with diy7, but I remember that it had some
> limitations to generate RMW instructions, at least for C [3].
> 
> Hernan
> 
> [1] https://github.com/torvalds/linux/blob/master/arch/powerpc/net/bpf_jit_comp32.c
> [2] https://github.com/herd/herdtools7/tree/master/catalogue/herding-cats/ppc/tests/campaign
> [3] https://github.com/herd/herdtools7/issues/905

Please see attached for a tarball of random PPC litmus tests.

You asked for this!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-10-25 13:56     ` Andrea Parri
@ 2024-11-13 19:25       ` Andrea Parri
  2024-11-13 22:59         ` Puranjay Mohan
  0 siblings, 1 reply; 19+ messages in thread
From: Andrea Parri @ 2024-11-13 19:25 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: puranjay, bpf, lkmm, linux-kernel

[...]

> I guess the next question (once clarified the intentions for the R
> and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W
> and B-cumulativity for store-release?"; I mentioned some tradeoff,
> but in the end this is a call for the BPF community.

Interpreting the radio silence as an unanimous "No, it doesn't", please find
tentative fixes/patch (on top of the bpf_acquire_release branch cited in an
earlier post) at the bottom of this email.

While testing the changes in question, I noticed an (unrelated) omission in
the current PPO relation; the second patch below addresses that.

Both patches were tested using the "BPF catalogue" available in the tree at
stake: as expected, the only differences in outcomes were for the new/added
five tests.

Please use and integrate according to your preference, any feedback welcome.

  Andrea


From 5bf399413578b6a94c42f6367245be2bdbf58926 Mon Sep 17 00:00:00 2001
From: Andrea Parri <parri.andrea@gmail.com>
Date: Mon, 11 Nov 2024 08:38:42 +0200
Subject: [PATCH 1/2] BPF: Fix propagation ordering, after LKMM

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 catalogue/bpf/tests/2+2W+release+fence.litmus | 15 ++++++++++++
 .../tests/ISA2+release+acquire+acquire.litmus | 15 ++++++++++++
 catalogue/bpf/tests/R+release+fence.litmus    | 14 +++++++++++
 .../bpf/tests/Z6.3+fence+fence+acquire.litmus | 16 +++++++++++++
 herd/libdir/bpf.cat                           | 24 +++++++++----------
 5 files changed, 72 insertions(+), 12 deletions(-)
 create mode 100644 catalogue/bpf/tests/2+2W+release+fence.litmus
 create mode 100644 catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus
 create mode 100644 catalogue/bpf/tests/R+release+fence.litmus
 create mode 100644 catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus

diff --git a/catalogue/bpf/tests/2+2W+release+fence.litmus b/catalogue/bpf/tests/2+2W+release+fence.litmus
new file mode 100644
index 0000000000000..0f88babbf27de
--- /dev/null
+++ b/catalogue/bpf/tests/2+2W+release+fence.litmus
@@ -0,0 +1,15 @@
+BPF 2+2W+release+fence
+(*
+ * Result: Sometimes
+ *)
+{
+ 0:r2=x; 0:r4=y;
+ 1:r2=y; 1:r4=x; 1:r6=l;
+}
+ P0                                 | P1                                         ;
+ r1 = 1                             | r1 = 1                                     ;
+ *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
+ r3 = 2                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
+ store_release((u32 *)(r4 + 0), r3) | r3 = 2                                     ;
+                                    | *(u32 *)(r4 + 0) = r3                      ;
+exists ([x]=1 /\ [y]=1)
diff --git a/catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus b/catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus
new file mode 100644
index 0000000000000..44c1308d70b5a
--- /dev/null
+++ b/catalogue/bpf/tests/ISA2+release+acquire+acquire.litmus
@@ -0,0 +1,15 @@
+BPF ISA2+release+acquire+acquire
+(*
+ * Result: Sometimes
+ *)
+{
+ 0:r2=x; 0:r4=y;
+ 1:r2=y; 1:r4=z;
+ 2:r2=z; 2:r4=x;
+}
+ P0                                 | P1                                 | P2                                 ;
+ r1 = 1                             | r1 = load_acquire((u32 *)(r2 + 0)) | r1 = load_acquire((u32 *)(r2 + 0)) ;
+ *(u32 *)(r2 + 0) = r1              | r3 = 1                             | r3 = *(u32 *)(r4 + 0)              ;
+ r3 = 1                             | *(u32 *)(r4 + 0) = r3              |                                    ;
+ store_release((u32 *)(r4 + 0), r3) |                                    |                                    ;
+exists (1:r1=1 /\ 2:r1=1 /\ 2:r3=0)
diff --git a/catalogue/bpf/tests/R+release+fence.litmus b/catalogue/bpf/tests/R+release+fence.litmus
new file mode 100644
index 0000000000000..a0bcbe0bbb804
--- /dev/null
+++ b/catalogue/bpf/tests/R+release+fence.litmus
@@ -0,0 +1,14 @@
+BPF R+release+fence
+(*
+ * Result: Sometimes
+ *)
+{
+ 0:r2=x; 0:r4=y;
+ 1:r2=y; 1:r4=x; 1:r6=l;
+}
+ P0                                 | P1                                         ;
+ r1 = 1                             | r1 = 2                                     ;
+ *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
+ r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
+ store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
+exists ([y]=2 /\ 1:r3=0)
diff --git a/catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus b/catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus
new file mode 100644
index 0000000000000..67e9146bcef2b
--- /dev/null
+++ b/catalogue/bpf/tests/Z6.3+fence+fence+acquire.litmus
@@ -0,0 +1,16 @@
+BPF Z6.3+fence+fence+acquire
+(*
+ * Result: Never
+ *)
+{
+ 0:r2=x; 0:r4=y; 0:r6=l;
+ 1:r2=y; 1:r4=z; 1:r6=m;
+ 2:r2=z; 2:r4=x;
+}
+ P0                                         | P1                                         | P2                                 ;
+ r1 = 1                                     | r1 = 2                                     | r1 = load_acquire((u32 *)(r2 + 0)) ;
+ *(u32 *)(r2 + 0) = r1                      | *(u32 *)(r2 + 0) = r1                      | r3 = *(u32 *)(r4 + 0)              ;
+ r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) |                                    ;
+ r3 = 1                                     | r3 = 1                                     |                                    ;
+ *(u32 *)(r4 + 0) = r3                      | *(u32 *)(r4 + 0) = r3                      |                                    ;
+exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)
diff --git a/herd/libdir/bpf.cat b/herd/libdir/bpf.cat
index 28b42497ea0fc..7803a1e818de7 100644
--- a/herd/libdir/bpf.cat
+++ b/herd/libdir/bpf.cat
@@ -43,9 +43,9 @@ show addr_dep as addr
 show data_dep as data
 show ctrl_dep as ctrl
 
-(*************)
-(* ppo rules *)
-(*************)
+(**********************)
+(* ppo and prop rules *)
+(**********************)
 
 let ppo =
 (* Explicit synchronization *)
@@ -65,6 +65,10 @@ include "cos-opt.cat"
 
 let com = co | rf | fr
 
+(* Propagation ordering from SC and release operations *)
+let A-cumul = rfe? ; (po_amo_fetch | store_release)
+let prop = (coe | fre)? ; A-cumul* ; rfe?
+
 (**********)
 (* Axioms *)
 (**********)
@@ -72,17 +76,13 @@ let com = co | rf | fr
 (* Sc per location *)
 acyclic com | po-loc as Coherence
 
-(* No thin air *)
-let hb = (ppo | rfe)+
-acyclic hb
+(* No-Thin-Air and Observation *)
+let hb = ppo | rfe | ((prop \ id) & int)
+acyclic hb as Happens-before
 
 (* Propagation *)
-let A-cumul = rfe;po_amo_fetch | rfe;store_release | po_amo_fetch;fre
-let prop = (store_release | po_amo_fetch | A-cumul);hb*
-acyclic prop | co
-
-(* Observation *)
-irreflexive fre;prop
+let pb = prop ; po_amo_fetch ; hb*
+acyclic pb as Propagation
 
 (* Atomicity *)
 empty rmw & (fre;coe) as Atomic
-- 
2.43.0


From 84520eaacbcb399b5cc7b4cbe0f716ad84e87824 Mon Sep 17 00:00:00 2001
From: Andrea Parri <parri.andrea@gmail.com>
Date: Mon, 11 Nov 2024 08:03:48 +0200
Subject: [PATCH 2/2] BPF: Fix overlapping-address ordering

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 catalogue/bpf/tests/LB+release-oa+acquire.litmus | 15 +++++++++++++++
 herd/libdir/bpf.cat                              | 13 ++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)
 create mode 100644 catalogue/bpf/tests/LB+release-oa+acquire.litmus

diff --git a/catalogue/bpf/tests/LB+release-oa+acquire.litmus b/catalogue/bpf/tests/LB+release-oa+acquire.litmus
new file mode 100644
index 0000000000000..1544179357e9b
--- /dev/null
+++ b/catalogue/bpf/tests/LB+release-oa+acquire.litmus
@@ -0,0 +1,15 @@
+BPF LB+release-oa+acquire
+(*
+ * Result: Never
+ *)
+{
+ 0:r2=x; 0:r4=y;
+ 1:r2=y; 1:r4=x;
+}
+ P0                                 | P1                                 ;
+ r1 = *(u32 *)(r2 + 0)              | r1 = load_acquire((u32 *)(r2 + 0)) ;
+ r3 = 1                             | r3 = 1                             ;
+ store_release((u32 *)(r4 + 0), r3) | *(u32 *)(r4 + 0) = r3              ;
+ r5 = 2                             |                                    ;
+ *(u32 *)(r4 + 0) = r5              |                                    ;
+exists (0:r1=1 /\ 1:r1=2)
diff --git a/herd/libdir/bpf.cat b/herd/libdir/bpf.cat
index 7803a1e818de7..4917d0f77009f 100644
--- a/herd/libdir/bpf.cat
+++ b/herd/libdir/bpf.cat
@@ -47,6 +47,10 @@ show ctrl_dep as ctrl
 (* ppo and prop rules *)
 (**********************)
 
+(* Compute coherence relation *)
+include "cos-opt.cat"
+let com = co | rf | fr
+
 let ppo =
 (* Explicit synchronization *)
  po_amo_fetch | rcpc
@@ -57,13 +61,8 @@ let ppo =
 (* Pipeline Dependencies *)
 | [M];(addr|data);[W];rfi;[R]
 | [M];addr;[M];po;[W]
-(* Successful cmpxchg R -(M)> W *)
-| rmw
-
-(* Compute coherence relation *)
-include "cos-opt.cat"
-
-let com = co | rf | fr
+(* Overlapping-address ordering *)
+| (coi | fri)
 
 (* Propagation ordering from SC and release operations *)
 let A-cumul = rfe? ; (po_amo_fetch | store_release)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-11-13 19:25       ` Andrea Parri
@ 2024-11-13 22:59         ` Puranjay Mohan
  2024-11-14 14:56           ` Andrea Parri
  0 siblings, 1 reply; 19+ messages in thread
From: Puranjay Mohan @ 2024-11-13 22:59 UTC (permalink / raw)
  To: Andrea Parri; +Cc: Paul E. McKenney, puranjay, bpf, lkmm, linux-kernel

On Wed, Nov 13, 2024 at 8:25 PM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> [...]
>
> > I guess the next question (once clarified the intentions for the R
> > and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W
> > and B-cumulativity for store-release?"; I mentioned some tradeoff,
> > but in the end this is a call for the BPF community.
>
> Interpreting the radio silence as an unanimous "No, it doesn't", please find
> tentative fixes/patch (on top of the bpf_acquire_release branch cited in an
> earlier post) at the bottom of this email.
>
> While testing the changes in question, I noticed an (unrelated) omission in
> the current PPO relation; the second patch below addresses that.
>
> Both patches were tested using the "BPF catalogue" available in the tree at
> stake: as expected, the only differences in outcomes were for the new/added
> five tests.
>
> Please use and integrate according to your preference, any feedback welcome.

Hi Andrea

Sorry for the silence and a huge thanks for working on this and fixing
the cat file.

I have applied your patches and modified them to add the new tests to
kinds.txt and shelf.py
now these tests will run with all other tests using 'make cata-bpf-test'

All 175 tests, including new tests added by you, pass :D

make cata-bpf-test

_build/default/internal/herd_catalogue_regression_test.exe \
        -j 32 \
        -herd-timeout 16.0 \
        -herd-path _build/install/default/bin/herd7 \
        -libdir-path ./herd/libdir \
        -kinds-path catalogue/bpf/tests/kinds.txt \
        -shelf-path catalogue/bpf/shelf.py \
        test
herd7 catalogue bpf tests: OK


I have pushed it and sent a PR so we can get all this merged to master.
https://github.com/herd/herdtools7/pull/1050

P.S. - I am writing this using gmail so the formatting might be
broken, sorry for that.

Thanks,
Puranjay Mohan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Some observations (results) on BPF acquire and release
  2024-11-13 22:59         ` Puranjay Mohan
@ 2024-11-14 14:56           ` Andrea Parri
  0 siblings, 0 replies; 19+ messages in thread
From: Andrea Parri @ 2024-11-14 14:56 UTC (permalink / raw)
  To: Puranjay Mohan; +Cc: Paul E. McKenney, puranjay, bpf, lkmm, linux-kernel

> I have applied your patches and modified them to add the new tests to
> kinds.txt and shelf.py
> now these tests will run with all other tests using 'make cata-bpf-test'
> 
> All 175 tests, including new tests added by you, pass :D
> 
> make cata-bpf-test
> 
> _build/default/internal/herd_catalogue_regression_test.exe \
>         -j 32 \
>         -herd-timeout 16.0 \
>         -herd-path _build/install/default/bin/herd7 \
>         -libdir-path ./herd/libdir \
>         -kinds-path catalogue/bpf/tests/kinds.txt \
>         -shelf-path catalogue/bpf/shelf.py \
>         test
> herd7 catalogue bpf tests: OK
> 
> 
> I have pushed it and sent a PR so we can get all this merged to master.
> https://github.com/herd/herdtools7/pull/1050

Cool!  Thank you for the follow-up.

  Andrea

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-11-14 14:56 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23 17:47 Some observations (results) on BPF acquire and release Andrea Parri
2024-10-24  4:25 ` Paul E. McKenney
2024-10-24 11:13   ` Andrea Parri
2024-10-25 13:56     ` Andrea Parri
2024-11-13 19:25       ` Andrea Parri
2024-11-13 22:59         ` Puranjay Mohan
2024-11-14 14:56           ` Andrea Parri
2024-10-24 11:44 ` Jonas Oberhauser
2024-10-24 12:11   ` Puranjay Mohan
2024-10-24 12:21     ` Jonas Oberhauser
2024-10-24 14:42       ` Paul E. McKenney
2024-10-25  9:32 ` Hernan Ponce de Leon
2024-10-25 13:15   ` Andrea Parri
2024-10-25 13:28     ` Hernan Ponce de Leon
2024-10-25 13:44       ` Andrea Parri
2024-10-25 13:57         ` Hernan Ponce de Leon
2024-10-25 14:27           ` Andrea Parri
2024-10-25 15:00             ` Hernan Ponce de Leon
2024-10-25 23:26           ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox