LKMM: Making RMW barriers explicit

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* LKMM: Making RMW barriers explicit
@ 2024-05-16  1:43 Alan Stern
  2024-05-16  8:31 ` Jonas Oberhauser
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2024-05-16  1:43 UTC (permalink / raw)
  To: Hernan Ponce de Leon, Jonas Oberhauser
  Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team,
	parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes

Hernan and Jonas:

Can you explain more fully the changes you want to make to herd7 and/or 
the LKMM?  The goal is to make the memory barriers currently implicit in 
RMW operations explicit, but I couldn't understand how you propose to do 
this.

Are you going to change herd7 somehow, and if so, how?  It seems like 
you should want to provide sufficient information so that the .bell 
and .cat files can implement the appropriate memory barriers associated 
with each RMW operation.  What additional information is needed?  And 
how (explained in English, not by quoting source code) will the .bell 
and .cat files make use of this information?

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-16  1:43 LKMM: Making RMW barriers explicit Alan Stern
@ 2024-05-16  8:31 ` Jonas Oberhauser
  2024-05-16  8:44   ` Hernan Ponce de Leon
  0 siblings, 1 reply; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-16  8:31 UTC (permalink / raw)
  To: Alan Stern, Hernan Ponce de Leon
  Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team,
	parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes

Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> Hernan and Jonas:
> 
> Can you explain more fully the changes you want to make to herd7 and/or
> the LKMM?  The goal is to make the memory barriers currently implicit in
> RMW operations explicit, but I couldn't understand how you propose to do
> this.
> 
> Are you going to change herd7 somehow, and if so, how?  It seems like
> you should want to provide sufficient information so that the .bell
> and .cat files can implement the appropriate memory barriers associated
> with each RMW operation.  What additional information is needed?  And
> how (explained in English, not by quoting source code) will the .bell
> and .cat files make use of this information?
> 
> Alan

I don't know whether herd7 needs to be changed. Probably, herd7 does the 
following:
- if a tag called Mb appears on an rmw instruction (by instruction I 
mean things like xchg(), atomic_inc_return_relaxed()), replace it with 
one of those things:
   * full mb ; once (the rmw) ; full mb, if a value returning 
(successful) rmw
   * once (the rmw)   otherwise
- everything else gets translated 1:1 into some internal representation

What I'm proposing is:
1. remove this transpilation step,
2. and instead allow the Mb tag to actually appear on RMW instructions
3. change the cat file to explicitly define the behavior of the Mb tag 
on RMW instructions

There are probably two ways to achieve this:
a) change herd7 to remove the special behavior for Mb, after that we 
should be able to do everything else in the .cat/.bell/.def files.
b) sidestep herd7's behavior by renaming Mb to _Mb in the .def file,
  and then defining Mb=_Mb in the .bell file, and define the semantics 
of the Mb tag in the .cat files.

The latter would not include modification to herd7, but it's a bit hacky.

I'm not sure if the second way really works since I don't know exactly 
how the herd7 tool does the transpilation,  e.g., if it really looks for 
an Mb tag or rather for the names of the instructions.

Does it make sense?

jonas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-16  8:31 ` Jonas Oberhauser
@ 2024-05-16  8:44   ` Hernan Ponce de Leon
  2024-05-18  0:31     ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Hernan Ponce de Leon @ 2024-05-16  8:44 UTC (permalink / raw)
  To: Jonas Oberhauser, Alan Stern
  Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team,
	parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes

On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> 
> 
> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>> Hernan and Jonas:
>>
>> Can you explain more fully the changes you want to make to herd7 and/or
>> the LKMM?  The goal is to make the memory barriers currently implicit in
>> RMW operations explicit, but I couldn't understand how you propose to do
>> this.
>>
>> Are you going to change herd7 somehow, and if so, how?  It seems like
>> you should want to provide sufficient information so that the .bell
>> and .cat files can implement the appropriate memory barriers associated
>> with each RMW operation.  What additional information is needed?  And
>> how (explained in English, not by quoting source code) will the .bell
>> and .cat files make use of this information?
>>
>> Alan
> 
> 
> I don't know whether herd7 needs to be changed. Probably, herd7 does the 
> following:
> - if a tag called Mb appears on an rmw instruction (by instruction I 
> mean things like xchg(), atomic_inc_return_relaxed()), replace it with 
> one of those things:
>    * full mb ; once (the rmw) ; full mb, if a value returning 
> (successful) rmw
>    * once (the rmw)   otherwise
> - everything else gets translated 1:1 into some internal representation

This is my understanding from reading the source code of CSem.ml in 
herd7's repo.

Also, this is exactly what dartagnan is currently doing.

> 
> What I'm proposing is:
> 1. remove this transpilation step,
> 2. and instead allow the Mb tag to actually appear on RMW instructions
> 3. change the cat file to explicitly define the behavior of the Mb tag 
> on RMW instructions

These are the exact 3 things I changed in dartagnan for testing what 
Jonas proposed.

I am not sure if further changes are needed for herd7.

Hernan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-16  8:44   ` Hernan Ponce de Leon
@ 2024-05-18  0:31     ` Alan Stern
  2024-05-21  9:57       ` Jonas Oberhauser
  2024-05-21 11:38       ` Hernan Ponce de Leon
  0 siblings, 2 replies; 33+ messages in thread
From: Alan Stern @ 2024-05-18  0:31 UTC (permalink / raw)
  To: Hernan Ponce de Leon
  Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > 
> > 
> > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > Hernan and Jonas:
> > > 
> > > Can you explain more fully the changes you want to make to herd7 and/or
> > > the LKMM?  The goal is to make the memory barriers currently implicit in
> > > RMW operations explicit, but I couldn't understand how you propose to do
> > > this.
> > > 
> > > Are you going to change herd7 somehow, and if so, how?  It seems like
> > > you should want to provide sufficient information so that the .bell
> > > and .cat files can implement the appropriate memory barriers associated
> > > with each RMW operation.  What additional information is needed?  And
> > > how (explained in English, not by quoting source code) will the .bell
> > > and .cat files make use of this information?
> > > 
> > > Alan
> > 
> > 
> > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > following:
> > - if a tag called Mb appears on an rmw instruction (by instruction I
> > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > one of those things:
> >    * full mb ; once (the rmw) ; full mb, if a value returning
> > (successful) rmw
> >    * once (the rmw)   otherwise
> > - everything else gets translated 1:1 into some internal representation
> 
> This is my understanding from reading the source code of CSem.ml in herd7's
> repo.
> 
> Also, this is exactly what dartagnan is currently doing.
> 
> > 
> > What I'm proposing is:
> > 1. remove this transpilation step,
> > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > 3. change the cat file to explicitly define the behavior of the Mb tag
> > on RMW instructions
> 
> These are the exact 3 things I changed in dartagnan for testing what Jonas
> proposed.
> 
> I am not sure if further changes are needed for herd7.

Okay, good.  This answers the first part of what I asked.  What about 
the second part?  That is, how will the changes to the .def, .bell, and 
.cat files achieve your goals?

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-18  0:31     ` Alan Stern
@ 2024-05-21  9:57       ` Jonas Oberhauser
  2024-05-21 15:36         ` Alan Stern
  2024-05-21 11:38       ` Hernan Ponce de Leon
  1 sibling, 1 reply; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-21  9:57 UTC (permalink / raw)
  To: Alan Stern, Hernan Ponce de Leon
  Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team,
	parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes



Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>
>>>
>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>> Hernan and Jonas:
>>>>
>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>> the LKMM?  The goal is to make the memory barriers currently implicit in
>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>> this.
>>>>
>>>> Are you going to change herd7 somehow, and if so, how?  It seems like
>>>> you should want to provide sufficient information so that the .bell
>>>> and .cat files can implement the appropriate memory barriers associated
>>>> with each RMW operation.  What additional information is needed?  And
>>>> how (explained in English, not by quoting source code) will the .bell
>>>> and .cat files make use of this information?
>>>>
>>>> Alan
>>>
>>>
>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>> following:
>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>> one of those things:
>>>     * full mb ; once (the rmw) ; full mb, if a value returning
>>> (successful) rmw
>>>     * once (the rmw)   otherwise
>>> - everything else gets translated 1:1 into some internal representation
>>
>> This is my understanding from reading the source code of CSem.ml in herd7's
>> repo.
>>
>> Also, this is exactly what dartagnan is currently doing.
>>
>>>
>>> What I'm proposing is:
>>> 1. remove this transpilation step,
>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>> on RMW instructions
>>
>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>> proposed.
>>
>> I am not sure if further changes are needed for herd7.
> 
> Okay, good.  This answers the first part of what I asked.  What about
> the second part?  That is, how will the changes to the .def, .bell, and
> .cat files achieve your goals?
> 
> Alan


Firstly, we'd allow 'mb as a barrier mode in events, e.g.

instructions RMW[{'once,'acquire,'release,'mb}]

then the Mb tags would appear in the graph. And then I'd define the 
ordering explicitly. One way is to say that an Mb tag orders all memory 
accesses before(or at) the tag with all memory accesses after(or at) the 
tag, except the accesses of the rmw with each other.
This is the same as the full fence before the read, which orders all 
memory accesses before the read with every access after(or at) the read,
plus the full fence after the write, which orders all memory accesses 
before(or at) the write with all accesses after the write.

That would be done by adding

      [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]

to ppo.


One could also split it into two rules to keep with the "two full 
fences" analogy. Maybe a good way would be like this:

      [M] ; po; [RMW_MB & R] ; po^? ; [M]

      [M] ; po^?; [RMW_MB & W] ; po ; [M]


Hope that makes sense,

jonas


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-21  9:57       ` Jonas Oberhauser
@ 2024-05-21 15:36         ` Alan Stern
  2024-05-22  9:20           ` Jonas Oberhauser
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2024-05-21 15:36 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
> 
> 
> Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > > > 
> > > > 
> > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > > > Hernan and Jonas:
> > > > > 
> > > > > Can you explain more fully the changes you want to make to herd7 and/or
> > > > > the LKMM?  The goal is to make the memory barriers currently implicit in
> > > > > RMW operations explicit, but I couldn't understand how you propose to do
> > > > > this.
> > > > > 
> > > > > Are you going to change herd7 somehow, and if so, how?  It seems like
> > > > > you should want to provide sufficient information so that the .bell
> > > > > and .cat files can implement the appropriate memory barriers associated
> > > > > with each RMW operation.  What additional information is needed?  And
> > > > > how (explained in English, not by quoting source code) will the .bell
> > > > > and .cat files make use of this information?
> > > > > 
> > > > > Alan
> > > > 
> > > > 
> > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > > > following:
> > > > - if a tag called Mb appears on an rmw instruction (by instruction I
> > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > > > one of those things:
> > > >     * full mb ; once (the rmw) ; full mb, if a value returning
> > > > (successful) rmw
> > > >     * once (the rmw)   otherwise
> > > > - everything else gets translated 1:1 into some internal representation
> > > 
> > > This is my understanding from reading the source code of CSem.ml in herd7's
> > > repo.
> > > 
> > > Also, this is exactly what dartagnan is currently doing.
> > > 
> > > > 
> > > > What I'm proposing is:
> > > > 1. remove this transpilation step,
> > > > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > > > 3. change the cat file to explicitly define the behavior of the Mb tag
> > > > on RMW instructions
> > > 
> > > These are the exact 3 things I changed in dartagnan for testing what Jonas
> > > proposed.
> > > 
> > > I am not sure if further changes are needed for herd7.

What about failed RMW instructions?  IIRC, herd7 generates just an R for 
these, not both R and W, but won't it still be annotated with an mb tag? 
And wasn't this matter of failed RMWs one of the issues that the two of 
you specifically wanted to make explicit in the memory model, rather 
than implicit in the operation of herd7?

And wasn't another one of these issues the difference between 
value-returning and non-value-returning RMWs?  As far as I can, nothing 
in the .def file specifically mentions this.  There's the noreturn tag 
in the .bell file, but nothing in the .def file says which instructions 
it applies to.  Or are we supposed to know that it automatically applies 
to all __atomic_op() instances?

> > Okay, good.  This answers the first part of what I asked.  What about
> > the second part?  That is, how will the changes to the .def, .bell, and
> > .cat files achieve your goals?
> > 
> > Alan
> 
> 
> Firstly, we'd allow 'mb as a barrier mode in events, e.g.

You mean as a mode in RMW events.  'mb already is a mode for F events.

> instructions RMW[{'once,'acquire,'release,'mb}]
> 
> then the Mb tags would appear in the graph. And then I'd define the ordering
> explicitly. One way is to say that an Mb tag orders all memory accesses
> before(or at) the tag with all memory accesses after(or at) the tag, except
> the accesses of the rmw with each other.

I don't think you need to add very much.  The .cat file already defines 
the mb relation as including the term:

	([M] ; fencerel(Mb) ; [M])

All that's needed is to replace the fencerel(Mb) with something more 
general...

> This is the same as the full fence before the read, which orders all memory
> accesses before the read with every access after(or at) the read,
> plus the full fence after the write, which orders all memory accesses
> before(or at) the write with all accesses after the write.
> 
> That would be done by adding
> 
>      [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]

... like this.

> to ppo.

You need to update the definition of mb, not ppo.  And the RMW_MB above 
looks wrong; shouldn't it be just Mb?

Also, is the "\ rmw" part really necessary?  I don't think it makes any 
difference; the memory model already knows that the read part of an RMW 
has to happen before the write part.

> One could also split it into two rules to keep with the "two full fences"
> analogy. Maybe a good way would be like this:
> 
>      [M] ; po; [RMW_MB & R] ; po^? ; [M]
> 
>      [M] ; po^?; [RMW_MB & W] ; po ; [M]

My preference is for the first approach.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-21 15:36         ` Alan Stern
@ 2024-05-22  9:20           ` Jonas Oberhauser
  2024-05-22 14:20             ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-22  9:20 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes



Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
> On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
>>> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>>>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>>>
>>>>>
>>>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>>>> Hernan and Jonas:
>>>>>>
>>>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>>>> the LKMM?  The goal is to make the memory barriers currently implicit in
>>>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>>>> this.
>>>>>>
>>>>>> Are you going to change herd7 somehow, and if so, how?  It seems like
>>>>>> you should want to provide sufficient information so that the .bell
>>>>>> and .cat files can implement the appropriate memory barriers associated
>>>>>> with each RMW operation.  What additional information is needed?  And
>>>>>> how (explained in English, not by quoting source code) will the .bell
>>>>>> and .cat files make use of this information?
>>>>>>
>>>>>> Alan
>>>>>
>>>>>
>>>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>>>> following:
>>>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>>>> one of those things:
>>>>>      * full mb ; once (the rmw) ; full mb, if a value returning
>>>>> (successful) rmw
>>>>>      * once (the rmw)   otherwise
>>>>> - everything else gets translated 1:1 into some internal representation
>>>>
>>>> This is my understanding from reading the source code of CSem.ml in herd7's
>>>> repo.
>>>>
>>>> Also, this is exactly what dartagnan is currently doing.
>>>>
>>>>>
>>>>> What I'm proposing is:
>>>>> 1. remove this transpilation step,
>>>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>>>> on RMW instructions
>>>>
>>>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>>>> proposed.
>>>>
>>>> I am not sure if further changes are needed for herd7.
> 
> What about failed RMW instructions?  IIRC, herd7 generates just an R for
> these, not both R and W, but won't it still be annotated with an mb tag?
> And wasn't this matter of failed RMWs one of the issues that the two of
> you specifically wanted to make explicit in the memory model, rather
> than implicit in the operation of herd7?

That's why we use the RMW_MB tag. I should have copied that definition too:


(* full barrier events that appear in non-failing RMW *)
let RMW_MB = Mb & (dom(rmw) | range(rmw))


This ensures that the only successful rmw instructions have an RMW_MB tag.


> 
> And wasn't another one of these issues the difference between
> value-returning and non-value-returning RMWs?  As far as I can, nothing
> in the .def file specifically mentions this.  There's the noreturn tag
> in the .bell file, but nothing in the .def file says which instructions
> it applies to.  Or are we supposed to know that it automatically applies
> to all __atomic_op() instances?

Ah, now you're already forestalling my next suggestion :))

I would say let's fix one issue at a time (learned this from Andrea).

For the other issue, do noreturn rmws always have the same ordering as once?

I suspect we need to extend herd slightly more to support the second 
one, I don't know if there's syntax for passing tags to __atomic_op.


> 
>>> Okay, good.  This answers the first part of what I asked.  What about
>>> the second part?  That is, how will the changes to the .def, .bell, and
>>> .cat files achieve your goals?
>>>
>>> Alan
>>
>>
>> Firstly, we'd allow 'mb as a barrier mode in events, e.g.
> 
> You mean as a mode in RMW events.  'mb already is a mode for F events.

Thanks, you're right.

> 
>> instructions RMW[{'once,'acquire,'release,'mb}]
>>
>> then the Mb tags would appear in the graph. And then I'd define the ordering
>> explicitly. One way is to say that an Mb tag orders all memory accesses
>> before(or at) the tag with all memory accesses after(or at) the tag, except
>> the accesses of the rmw with each other.
> 
> I don't think you need to add very much.  The .cat file already defines
> the mb relation as including the term:
> 
> 	([M] ; fencerel(Mb) ; [M])
> 
> All that's needed is to replace the fencerel(Mb) with something more
> general...
> 
>> This is the same as the full fence before the read, which orders all memory
>> accesses before the read with every access after(or at) the read,
>> plus the full fence after the write, which orders all memory accesses
>> before(or at) the write with all accesses after the write.
>>
>> That would be done by adding
>>
>>       [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
> 
> .. like this.
> 
>> to ppo.
> 
> You need to update the definition of mb, not ppo.

Yes, I meant to update the definition of mb, but I momentarily thought 
the effect of that is just that

ppo_new = ppo_old |  [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]

I forgot that extending mb has another effect, in pb.


>  And the RMW_MB above
> looks wrong; shouldn't it be just Mb?

As discussed above, no, since the Mb will also be on the failed RMW.

> 
> Also, is the "\ rmw" part really necessary?  I don't think it makes any
> difference; the memory model already knows that the read part of an RMW
> has to happen before the write part.

It unfortunately does make a difference, because mb is a strong fence.
This means that an Mb in an rmw sequence would create additional pb edges.

   prop;(rfe ; [Mb];rmw;[Mb])

I believe this is might give wrong results on powerpc, but I'd need to 
double check.


> 
>> One could also split it into two rules to keep with the "two full fences"
>> analogy. Maybe a good way would be like this:
>>
>>       [M] ; po; [RMW_MB & R] ; po^? ; [M]
>>
>>       [M] ; po^?; [RMW_MB & W] ; po ; [M]
> 
> My preference is for the first approach.

That was also my early preference, but to be honest I expected that 
you'd prefer the second approach.
Actually, I also started warming up to it.


Have fun,
  jonas


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-22  9:20           ` Jonas Oberhauser
@ 2024-05-22 14:20             ` Alan Stern
  2024-05-22 16:54               ` Andrea Parri
  2024-05-23 12:54               ` Jonas Oberhauser
  0 siblings, 2 replies; 33+ messages in thread
From: Alan Stern @ 2024-05-22 14:20 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote:
> 
> 
> Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
> > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
> > > 
> > > 
> > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > > > > > 
> > > > > > 
> > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > > > > > Hernan and Jonas:
> > > > > > > 
> > > > > > > Can you explain more fully the changes you want to make to herd7 and/or
> > > > > > > the LKMM?  The goal is to make the memory barriers currently implicit in
> > > > > > > RMW operations explicit, but I couldn't understand how you propose to do
> > > > > > > this.
> > > > > > > 
> > > > > > > Are you going to change herd7 somehow, and if so, how?  It seems like
> > > > > > > you should want to provide sufficient information so that the .bell
> > > > > > > and .cat files can implement the appropriate memory barriers associated
> > > > > > > with each RMW operation.  What additional information is needed?  And
> > > > > > > how (explained in English, not by quoting source code) will the .bell
> > > > > > > and .cat files make use of this information?
> > > > > > > 
> > > > > > > Alan
> > > > > > 
> > > > > > 
> > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > > > > > following:
> > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I
> > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > > > > > one of those things:
> > > > > >      * full mb ; once (the rmw) ; full mb, if a value returning
> > > > > > (successful) rmw
> > > > > >      * once (the rmw)   otherwise
> > > > > > - everything else gets translated 1:1 into some internal representation
> > > > > 
> > > > > This is my understanding from reading the source code of CSem.ml in herd7's
> > > > > repo.
> > > > > 
> > > > > Also, this is exactly what dartagnan is currently doing.
> > > > > 
> > > > > > 
> > > > > > What I'm proposing is:
> > > > > > 1. remove this transpilation step,
> > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag
> > > > > > on RMW instructions
> > > > > 
> > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas
> > > > > proposed.
> > > > > 
> > > > > I am not sure if further changes are needed for herd7.
> > 
> > What about failed RMW instructions?  IIRC, herd7 generates just an R for
> > these, not both R and W, but won't it still be annotated with an mb tag?
> > And wasn't this matter of failed RMWs one of the issues that the two of
> > you specifically wanted to make explicit in the memory model, rather
> > than implicit in the operation of herd7?
> 
> That's why we use the RMW_MB tag. I should have copied that definition too:
> 
> 
> (* full barrier events that appear in non-failing RMW *)
> let RMW_MB = Mb & (dom(rmw) | range(rmw))
> 
> 
> This ensures that the only successful rmw instructions have an RMW_MB tag.

It would be better if there was a way to tell herd7 not to add the 'mb 
tag to failed instructions in the first place.  This approach is 
brittle; see below.

An alternative would be to have a way for the .cat file to remove the 
'mb tag from a failed RMW instruction.  But I don't know if this is 
feasible.

> > And wasn't another one of these issues the difference between
> > value-returning and non-value-returning RMWs?  As far as I can, nothing
> > in the .def file specifically mentions this.  There's the noreturn tag
> > in the .bell file, but nothing in the .def file says which instructions
> > it applies to.  Or are we supposed to know that it automatically applies
> > to all __atomic_op() instances?
> 
> Ah, now you're already forestalling my next suggestion :))
> 
> I would say let's fix one issue at a time (learned this from Andrea).
> 
> For the other issue, do noreturn rmws always have the same ordering as once?

If they aren't annotated with _acquire or _release then yes...  And I 
don't know whether there _are_ any annotated no-return RMWs.  If 
somebody wanted such a thing, they probably would just use a 
value-returning RMW instead.

> I suspect we need to extend herd slightly more to support the second one, I
> don't know if there's syntax for passing tags to __atomic_op.

This may not be be needed.  But still, it would nice to be explicit (in 
a comment in the .def file if nowhere else) that __atomic_op always adds 
a 'noreturn tag.

> > > instructions RMW[{'once,'acquire,'release,'mb}]
> > > 
> > > then the Mb tags would appear in the graph. And then I'd define the ordering
> > > explicitly. One way is to say that an Mb tag orders all memory accesses
> > > before(or at) the tag with all memory accesses after(or at) the tag, except
> > > the accesses of the rmw with each other.
> > 
> > I don't think you need to add very much.  The .cat file already defines
> > the mb relation as including the term:
> > 
> > 	([M] ; fencerel(Mb) ; [M])
> > 
> > All that's needed is to replace the fencerel(Mb) with something more
> > general...

And this is why I said the RMW_MB mechanism is brittle.  With the 'mb 
tag still added to failed RMW events, the term above will cause the 
memory model to think there is ordering even though the event isn't in 
the RMW_MB class.

> > Also, is the "\ rmw" part really necessary?  I don't think it makes any
> > difference; the memory model already knows that the read part of an RMW
> > has to happen before the write part.
> 
> It unfortunately does make a difference, because mb is a strong fence.
> This means that an Mb in an rmw sequence would create additional pb edges.
> 
>   prop;(rfe ; [Mb];rmw;[Mb])
> 
> I believe this is might give wrong results on powerpc, but I'd need to
> double check.

PowerPC uses a load-reserve/store-conditional approach for RMW, so it's 
tricky.  However, you're right that having a fictitious mb between the 
read and the write of an RMW would mean that the preceding (in coherence 
order) write would have to be visible to all CPUs before the RMW write 
could execute, and I don't believe we want to assert this.

> > > One could also split it into two rules to keep with the "two full fences"
> > > analogy. Maybe a good way would be like this:
> > > 
> > >       [M] ; po; [RMW_MB & R] ; po^? ; [M]
> > > 
> > >       [M] ; po^?; [RMW_MB & W] ; po ; [M]
> > 
> > My preference is for the first approach.
> 
> That was also my early preference, but to be honest I expected that you'd
> prefer the second approach.
> Actually, I also started warming up to it.

If you do want to use this approach, it should be simplified.  All you 
need is:

	[M] ; po ; [RMW_MB]

	[RMW_MB] ; po ; [M]

This is because events tagged with RMW_MB always are memory accesses, 
and accesses that aren't part of the RMW are already covered by the 
fencerel(Mb) thing above.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-22 14:20             ` Alan Stern
@ 2024-05-22 16:54               ` Andrea Parri
  2024-05-22 18:20                 ` Alan Stern
  2024-05-23 14:27                 ` Jonas Oberhauser
  2024-05-23 12:54               ` Jonas Oberhauser
  1 sibling, 2 replies; 33+ messages in thread
From: Andrea Parri @ 2024-05-22 16:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jonas Oberhauser, Hernan Ponce de Leon, Paul E. McKenney,
	linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave,
	luc.maranget, Joel Fernandes

Alan, all,

("randomly" picking a recent post in the thread, after having observed
this discussion for a while...)

> It would be better if there was a way to tell herd7 not to add the 'mb 
> tag to failed instructions in the first place.  This approach is 
> brittle; see below.

AFAIU, changing the herd representation to generate mb-accesses in place
of certain mb-fences...

> If you do want to use this approach, it should be simplified.  All you 
> need is:
> 
> 	[M] ; po ; [RMW_MB]
> 
> 	[RMW_MB] ; po ; [M]
> 
> This is because events tagged with RMW_MB always are memory accesses, 
> and accesses that aren't part of the RMW are already covered by the 
> fencerel(Mb) thing above.

... and updating the .cat file to the effects of something like

  -let mb = ([M] ; fencerel(Mb) ; [M]) |
  +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |

... can hardly be called "making RMW barriers explicit".  (So much so
that the first commit in PR #865 was titled "Remove explicit barriers
from RMWs".  :-))

Overall, this discussion rather seems to confirm the close link between
tools/memory-model/ and herdtools7.  (After all, to what extent could
any putative RMW_MB be considered "explicit" without _knowing the under-
lying representation of the RMW operations...)  My understanding is that
this discussion was at least in part motivated by a desire to experiment
and familiarize with the current herd representation (that does indeed
require some getting-used-to...); this suggests, as some of you already
mentioned, to add some comments or a .txt in tools/memory-model/ in order
to document such representation and ameliorate that experience.  OTOH, I
must admit, I'm unable to see here sufficient motivation(tm) for changing
the current representation (and model): not the how, but the why...

  Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-22 16:54               ` Andrea Parri
@ 2024-05-22 18:20                 ` Alan Stern
  2024-05-22 19:48                   ` Hernan Ponce de Leon
  2024-05-23 14:27                 ` Jonas Oberhauser
  1 sibling, 1 reply; 33+ messages in thread
From: Alan Stern @ 2024-05-22 18:20 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Jonas Oberhauser, Hernan Ponce de Leon, Paul E. McKenney,
	linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave,
	luc.maranget, Joel Fernandes

On Wed, May 22, 2024 at 06:54:25PM +0200, Andrea Parri wrote:
> Alan, all,
> 
> ("randomly" picking a recent post in the thread, after having observed
> this discussion for a while...)
> 
> > It would be better if there was a way to tell herd7 not to add the 'mb 
> > tag to failed instructions in the first place.  This approach is 
> > brittle; see below.
> 
> AFAIU, changing the herd representation to generate mb-accesses in place
> of certain mb-fences...

I believe herd7 already generates mb accesses (not fences) for certain 
RMW operations.  But then it does some post-processing on them, and that 
post-processing is what we are thinking of changing.

> > If you do want to use this approach, it should be simplified.  All you 
> > need is:
> > 
> > 	[M] ; po ; [RMW_MB]
> > 
> > 	[RMW_MB] ; po ; [M]
> > 
> > This is because events tagged with RMW_MB always are memory accesses, 
> > and accesses that aren't part of the RMW are already covered by the 
> > fencerel(Mb) thing above.
> 
> ... and updating the .cat file to the effects of something like
> 
>   -let mb = ([M] ; fencerel(Mb) ; [M]) |
>   +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |
> 
> ... can hardly be called "making RMW barriers explicit".  (So much so
> that the first commit in PR #865 was titled "Remove explicit barriers
> from RMWs".  :-))

There is another point, something we didn't spell out explicitly in the 
email discussion.  Namely, in linux-kernel.def there is a long list of 
instructions along with corresponding herd7 implementation instructions, 
and those instructions explicitly contain either {once}, {acquire}, 
{release}, or {mb} tags.  So to a large extent, these barriers already 
are explicit in the memory model.  Not in the .cat file, but in the .def 
file.

What is not so explicit is how the {mb} tag works.  Its operation isn't 
as simple as the operation of the {acquire} and {release} tags; those 
just modify the R or W access in the RMW pair as you would expect.  
Instead, an {mb} tag says to insert strong memory barriers before the R 
access and after the W access.  This is more or less what the 
post-processing mentioned earlier does, and Jonas and Hernan want to 
move this out of herd7 and into the memory model.

> Overall, this discussion rather seems to confirm the close link between
> tools/memory-model/ and herdtools7.  (After all, to what extent could
> any putative RMW_MB be considered "explicit" without _knowing the under-
> lying representation of the RMW operations...)  My understanding is that
> this discussion was at least in part motivated by a desire to experiment
> and familiarize with the current herd representation (that does indeed
> require some getting-used-to...); this suggests, as some of you already
> mentioned, to add some comments or a .txt in tools/memory-model/ in order
> to document such representation and ameliorate that experience.  OTOH, I
> must admit, I'm unable to see here sufficient motivation(tm) for changing
> the current representation (and model): not the how, but the why...

Well, it's not a big change.  And in my opinion, if something can be 
moved out of herd7's innards and into the memory model files, then doing 
so is generally a good idea.

However, I do agree that there will still be a close link between 
tools/memory-model/ and herdtools7.  This may be unavoidable, at least 
to some extent, but any way to reduce it is worth considering.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-22 18:20                 ` Alan Stern
@ 2024-05-22 19:48                   ` Hernan Ponce de Leon
  2024-05-23  9:04                     ` Andrea Parri
  0 siblings, 1 reply; 33+ messages in thread
From: Hernan Ponce de Leon @ 2024-05-22 19:48 UTC (permalink / raw)
  To: Alan Stern, Andrea Parri
  Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes

On 5/22/2024 8:20 PM, Alan Stern wrote:
> On Wed, May 22, 2024 at 06:54:25PM +0200, Andrea Parri wrote:
>> Alan, all,
>>
>> ("randomly" picking a recent post in the thread, after having observed
>> this discussion for a while...)
>>
>>> It would be better if there was a way to tell herd7 not to add the 'mb
>>> tag to failed instructions in the first place.  This approach is
>>> brittle; see below.
>>
>> AFAIU, changing the herd representation to generate mb-accesses in place
>> of certain mb-fences...
> 
> I believe herd7 already generates mb accesses (not fences) for certain
> RMW operations.  But then it does some post-processing on them, and that
> post-processing is what we are thinking of changing.
> 
>>> If you do want to use this approach, it should be simplified.  All you
>>> need is:
>>>
>>> 	[M] ; po ; [RMW_MB]
>>>
>>> 	[RMW_MB] ; po ; [M]
>>>
>>> This is because events tagged with RMW_MB always are memory accesses,
>>> and accesses that aren't part of the RMW are already covered by the
>>> fencerel(Mb) thing above.
>>
>> ... and updating the .cat file to the effects of something like
>>
>>    -let mb = ([M] ; fencerel(Mb) ; [M]) |
>>    +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |
>>
>> ... can hardly be called "making RMW barriers explicit".  (So much so
>> that the first commit in PR #865 was titled "Remove explicit barriers
>> from RMWs".  :-))
> 
> There is another point, something we didn't spell out explicitly in the
> email discussion.  Namely, in linux-kernel.def there is a long list of
> instructions along with corresponding herd7 implementation instructions,
> and those instructions explicitly contain either {once}, {acquire},
> {release}, or {mb} tags.  So to a large extent, these barriers already
> are explicit in the memory model.  Not in the .cat file, but in the .def
> file.
> 
> What is not so explicit is how the {mb} tag works.  Its operation isn't
> as simple as the operation of the {acquire} and {release} tags; those
> just modify the R or W access in the RMW pair as you would expect.
> Instead, an {mb} tag says to insert strong memory barriers before the R
> access and after the W access.  This is more or less what the
> post-processing mentioned earlier does, and Jonas and Hernan want to
> move this out of herd7 and into the memory model.
> 
>> Overall, this discussion rather seems to confirm the close link between
>> tools/memory-model/ and herdtools7.  (After all, to what extent could
>> any putative RMW_MB be considered "explicit" without _knowing the under-
>> lying representation of the RMW operations...)  My understanding is that
>> this discussion was at least in part motivated by a desire to experiment
>> and familiarize with the current herd representation (that does indeed
>> require some getting-used-to...); this suggests, as some of you already
>> mentioned, to add some comments or a .txt in tools/memory-model/ in order
>> to document such representation and ameliorate that experience.  OTOH, I
>> must admit, I'm unable to see here sufficient motivation(tm) for changing
>> the current representation (and model): not the how, but the why...
> 
> Well, it's not a big change.  And in my opinion, if something can be
> moved out of herd7's innards and into the memory model files, then doing
> so is generally a good idea.
> 
> However, I do agree that there will still be a close link between
> tools/memory-model/ and herdtools7.  This may be unavoidable, at least
> to some extent, but any way to reduce it is worth considering.

I can give my motivation as a tool developer. It would be much simpler 
if one could find all the information to support lkmm in 
tools/memory-model/ (in the form of the model + some comments or a .txt 
to cover those things we cannot move out of the tool implementation), 
rather than having to dive into herd7 code.

Hernan

> 
> Alan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-22 19:48                   ` Hernan Ponce de Leon
@ 2024-05-23  9:04                     ` Andrea Parri
  0 siblings, 0 replies; 33+ messages in thread
From: Andrea Parri @ 2024-05-23  9:04 UTC (permalink / raw)
  To: Hernan Ponce de Leon
  Cc: Alan Stern, Jonas Oberhauser, Paul E. McKenney, linux-kernel,
	linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

> It would be much simpler if
> one could find all the information to support lkmm in tools/memory-model/
> (in the form of the model + some comments or a .txt to cover those things we
> cannot move out of the tool implementation), rather than having to dive into
> herd7 code.

Got it.  And I do find that very relatable to LKMM developers who, like me,
are definitely not fluent in OCaml.  :-/

Let me draft some .txt to such effect, based on but expanding and hopefully
completing my previous remarks in

  https://lore.kernel.org/lkml/ZjrSm119+IfIU9eU@andrea/

Having something like that "on paper" can also help evaluate future changes
to the tool and/or model.

Thank you for the suggestion.

  Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-22 16:54               ` Andrea Parri
  2024-05-22 18:20                 ` Alan Stern
@ 2024-05-23 14:27                 ` Jonas Oberhauser
  2024-05-23 16:35                   ` Andrea Parri
  1 sibling, 1 reply; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-23 14:27 UTC (permalink / raw)
  To: Andrea Parri, Alan Stern
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes

Am 5/22/2024 um 6:54 PM schrieb Andrea Parri:
> Alan, all,
> 
> ("randomly" picking a recent post in the thread, after having observed
> this discussion for a while...)
> 
>> It would be better if there was a way to tell herd7 not to add the 'mb
>> tag to failed instructions in the first place.  This approach is
>> brittle; see below.
> 
> AFAIU, changing the herd representation to generate mb-accesses in place
> of certain mb-fences...
> 
>> If you do want to use this approach, it should be simplified.  All you
>> need is:
>>
>> 	[M] ; po ; [RMW_MB]
>>
>> 	[RMW_MB] ; po ; [M]
>>
>> This is because events tagged with RMW_MB always are memory accesses,
>> and accesses that aren't part of the RMW are already covered by the
>> fencerel(Mb) thing above.
> 
> .. and updating the .cat file to the effects of something like
> 
>    -let mb = ([M] ; fencerel(Mb) ; [M]) |
>    +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |
> 
> .. can hardly be called "making RMW barriers explicit".  (So much so
> that the first commit in PR #865 was titled "Remove explicit barriers
> from RMWs".  :-))
> 
> Overall, this discussion rather seems to confirm the close link between
> tools/memory-model/ and herdtools7.  (After all, to what extent could
> any putative RMW_MB be considered "explicit" without _knowing the under-
> lying representation of the RMW operations...)  

My view is a bit different. There's more or less standard theory for 
weak memory models, including how operations at the source code level 
map to events in the graph.

Part of that standard theory are relations like rmw, and the 
circumstances under which they appear in the graph.

You'll see these relations in generic papers about weak memory models, 
like GenMC, Hahn, etc. as well as in pretty much every specific memory 
model like TSO, C11, PowerPC, VMM, etc., totally independently of herd7 
(even though these notations have historically developed together with 
herd).

Naively I would expect that a tool like herd7 would be a generic WMM 
exploration tool, which follows these standards with a very obvious 1:1 
mapping of what we see in the code and the events we see in the graph, 
plus perhaps a thin and explicit configurable herd7-specific mapping 
layer like what we see in linux_kernel.cat to configure the syntax for a 
specific model.

In that mapping layer, we currently see that xchg() is an exchange 
operation with an Mb tag. Just like an smp_load_acquire is a read with 
an acquire tag.
Or so it seems.

Instead, we find that contrary to what's written in that file, and 
contrary to the conventions, an xchg() is an F[mb] ; R[once] ; W[once] 
; F[Mb]. And in fact a cmp_xchg could even be a R[once].

That's because the herd7 tool isn't quite as generic as one might think, 
but rather specifically "detects" that it's running the LKMM and then 
the mapping isn't what you'd naively think, but something hidden in the 
OCaml implementation of herd7.

That would be comparable to a popular tool for matrix calculations using 
the syntax
[ 10  5  4
   3   4  2 ]

to define a 2x3 matrix, unless one of the values is -3, in which case it 
becomes a vector of 6 elements, because that's what a really important 
user of the tool wanted, and then didn't see enough of a need to change.

My point is that to anyone who has seen standard matrix notation, this 
syntax in a matrix-computation-tool looks like it would be a matrix, 
right? And it usually is a matrix; then it should always be a matrix.

I usually say I don't look much at natural language documentation and 
only read code, because code doesn't lie. In this case, the natural 
language documentation is saying the correct thing thanks to the hard 
work of a lot of people, but the code (in .cat etc.) doesn't do what it 
seems to be doing.

And I think that should be changed, both to reduce the anyways high 
mental load of reading the code without having to do mental 
transformations in your head, and also to make herd more Lkmm-agnostic.
In the ideal world, herd doesn't know anything about Lkmm except what we 
tell it through tools/memory-model, and generic things like C syntax + 
semantics.

So when using syntax like dom(rmw), I don't see it as confirming the 
close link to herd more than before, but rather depending more on 
standard notations and conventions, and relying on herd's close link to 
those standard notations (such as using rmw for the rmw relation and 
dom(r) for the domain of r) for dom(rmw) to mean what anyone who has 
deeply read a couple of modern WMM papers would expect.

> My understanding is that
> this discussion was at least in part motivated by a desire to experiment
> and familiarize with the current herd representation

I would phrase it more extreme: I want to get rid of the unnecessary 
non-standard parts of the herd representation.

Those parts have led me astray several times. Ok, who cares about me, 
but even Luc once forgot about the non-standard parts and thought it is 
a bug:

https://github.com/herd/herdtools7/issues/384#issuecomment-1132859904

I also know that Viktor stumbled over it before and also suggested we 
change it.

I think there's 0 benefit to them, they're just wasting people's time 
and energy and lead to misunderstanding.

Ok, this e-mail became long. Hope it at least helps clarify my 
motivation :))

jonas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 14:27                 ` Jonas Oberhauser
@ 2024-05-23 16:35                   ` Andrea Parri
  2024-05-23 20:30                     ` Jonas Oberhauser
  0 siblings, 1 reply; 33+ messages in thread
From: Andrea Parri @ 2024-05-23 16:35 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel,
	linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

> I would phrase it more extreme: I want to get rid of the unnecessary
> non-standard parts of the herd representation.

Please indulge the thought that what might appear to be "non-standard"
to one or one's community might appear differently to others.

Continuing with the example above, I'm pretty sure your "standard" and
simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads
will turn up the nose of many kernel developers...  The current repre-
sentation for xchg() was described in the ASPLOS'18 work and it's been
used (& tested) upstream since the LKMM was first merged ~6 years ago.

But that's not the point, "standards" can change and certainly kernels
and tools do.  My remark was more to point out that you're not getting
rid of anything..., you're simply proposing a different representation
(asking kernel developers & maintainers to "deal with it"): here's why
I was and I am looking forward to something more than "because we can".

  Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 16:35                   ` Andrea Parri
@ 2024-05-23 20:30                     ` Jonas Oberhauser
  2024-05-24  3:30                       ` Andrea Parri
  2024-05-24  8:16                       ` Hernan Ponce de Leon
  0 siblings, 2 replies; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-23 20:30 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel,
	linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

Am 5/23/2024 um 6:35 PM schrieb Andrea Parri:
>> I would phrase it more extreme: I want to get rid of the unnecessary
>> non-standard parts of the herd representation.
> 
> Please indulge the thought that what might appear to be "non-standard"
> to one or one's community might appear differently to others.

Certainly. And of course, the formalization of the LKMM doesn't have to 
be optimized for the WMM community, even though I suspect that they (and 
possibly the tools they develop) are the main direct consumers of the 
formalization.

> Continuing with the example above, I'm pretty sure your "standard" and
> simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads
> will turn up the nose of many kernel developers...  

I'm not sure how true that is. Firstly I'm not sure how many kernel 
developers really read the formalization and try to see what it does, 
rather than relying on tools to gain an intuition of what's going on.

But let's say a kernel developer wants to make sure that their intuition 
of how cmpxchg works is matched by the formal model, e.g., they want to 
double check that the formal model is correct
after they got some unexpected results in a herd7 litmus test.

How would they go about it today? The only way is to read the OCaml 
source code, because there's no other code that obviously defines the 
behavior. At best, they would read

atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)

see the Acquire, Release, and Mb references in the model, and think "ok, 
so '__cmpxchg{acquire}' means I get an Acquire tag in the success case 
which gives acq_po ordering, '__cmpxchg{release}' means I get a Release 
tag in the success case which gives po_rel ordering. Therefore 
'__cmpxchg{mb}' must give me an Mb tag in the success case. That would 
give me mb ordering, but not between the store and subsequent 
operations, and that's just wrong."

But of course this "understanding by analogy" is broken, because there's 
a bunch of special OCaml match expressions in herd that look only for 
release or mb and just give totally different behavior for that one 
case. Every other tag behaves exactly the same way.

At worst they wouldn't even guess that this only is the success ordering 
(that's definitely what happened to us in the weak memory model 
community, but we don't matter for this argument).

A better situation would be to do something like this:

                                        (*success*)(*fail*)
atomic_cmpxchg_acquire(X,V,W) __cmpxchg {acquire}  {once} (X,V,W)
atomic_cmpxchg_release(X,V,W) __cmpxchg {release}  {once} (X,V,W)
atomic_cmpxchg(X,V,W)         __cmpxchg {mb}       {once} (X,V,W)

and being explicit in the model about what the Mb tag does:

      | [M];fencerel(Mb);[M] (* smp_mb() and operations with Mb ordering 
such as cmpxchg() *)
      | [M];po;[R & Mb] (* reads with an Mb tag - coming from full fence 
rmws - are ordered as-if there was an smp_mb() right before the read *)
      | [W & Mb];po;[M] (* correspondingly, writes with an Mb tag are 
ordered as-if there was an smp_mb() right after the write *)

And suddenly you can read the model and map it 1:1 to the intuition that 
you can treat a successful cmpxchg() (or an xchg() etc.) as-if it was 
enclosed by smp_mb().

There doesn't need to be a real fence.

> The current repre-
> sentation for xchg() was described in the ASPLOS'18 work

Do you mean the one example in Table 3?
What about cmpxchg() or cmpxchg_acquire()?

> But that's not the point, "standards" can change and certainly kernels
> and tools do.  My remark was more to point out that you're not getting
> rid of anything..., 

We're definitely getting rid of some lines in herd7, that have been 
added solely for dealing with this specific case of LKMM.

For example, there's some code looking like this (for a memory ordering 
tag `a`)

                 (match a with ["release"] -> a |  _ -> a_once)

which specifically refers to the release tag in LKMM and we can turn 
that into

                 a

with no more reference to any LKMM tags. Of course, this is not the 
Linux community's problem, just the herd (and others who want to use the 
literal cat file of LKMM) maintainers who have to deal with it.

And imagine that at some point you want to add another tag to the linux 
kernel - like for example for 'accessing an atomic in initialization 
code, so that the compiler can do optimizations like merge a bunch of 
bitwise operations'. Let's call it 'init.

What will be the behavior of

WRITE_INIT(X,V) __store{init}(X,V);

with the current herd7? Honestly I have no clue, because there might be a

                 (match a with ["release"] -> a |  _ -> a_once)

somewhere that will turn this 'init into 'once. We'd have to comb the 
herd7 codebase to know for sure (or hope that our experiments are 
sufficiently thorough to catch all cases).

 >you're simply proposing a different representation

I wouldn't phrase it like that, since it's a representation many people 
familiar with WMM would expect, but admittedly that doesn't have to be 
the deciding factor.

> asking kernel developers & maintainers to "deal with it"

Deal with what, no longer having to learn OCaml to be sure that the 
LKMM's formal definition matches the description in memory_barriers.txt?
Otherwise, I don't see anything that changes for the worse.
With the change, people need to think for a few seconds to see that the 
explicit rules in the .cat file are a formalization of "treat xchg() as 
if it had smp_mb() before and after".
Without the change, they need to read an OCaml codebase to see that is 
meant by

atomic_xchg(X,V,W) __xchg{mb}(X,V,W)

So I don't see any downsides.

I would also support making the representation explicit in the .def 
files instead, with something like

cmp_xchg(x,e,v) = { if (*x==e) { smp_mb(); int r = 
__cmp_xchg{once}(x,e,v) ; smp_mb(); return r } else { __read{once}(x) } }

Then you get to keep the representation.
Without knowing herd's internals, I have no idea of how to seriously 
specify a meta language so that herd could effectively and efficiently 
deal with it in general. But one could at least envision some specific 
syntax for cmpxchg with a code sequence for the failure case and a code 
sequence for the success case.

Personally I don't prefer this option, firstly because I don't see a 
good reason for the Linux community to go their own way here. I don't 
think there's really much of a problem with saying "this is the 
intuition; this is how we formalize it" and for the two not to be 
completely identical. It happens all the time, and in this case the gap 
between "we formalize it by really having two smp_mb() appear in the 
graph if it's successful" and "we formalize it by providing the same 
ordering that the smp_mb() would have if it was there" isn't big.
Especially for those kernel people who have enough motivation to 
actually deep dive into the formalization in the first place. But it 
makes it a lot more accessible for the WMM community, which can only 
benefit LKMM.

And secondly because people will probably mostly focus on the .cat file, 
which means that it's still a bit of a booby trap to put something so 
important (and perhaps surprising) into the .def file, but at least one 
that is in the open for people who are more careful and also read the 
.def file.

 > I am looking forward to something more than "because we can".

- it makes it easier to maintain the LKMM in the future, because you 
don't have to work around hidden transformations inside herd7
- it makes implicit behavior explicit
- it makes it easier to understand that the formalization matches the 
intention
- it makes it easier to learn the LKMM from the formalization without 
having to cross-reference every bit with the informal documentation to 
avoid misunderstandings

Have a good time,
   jonas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 20:30                     ` Jonas Oberhauser
@ 2024-05-24  3:30                       ` Andrea Parri
  2024-05-24  8:16                       ` Hernan Ponce de Leon
  1 sibling, 0 replies; 33+ messages in thread
From: Andrea Parri @ 2024-05-24  3:30 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel,
	linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

> Do you mean the one example in Table 3?
> What about cmpxchg() or cmpxchg_acquire()?

Yes, Table 3.

The cmpxchg*() primitives were not discussed in the paper.  IIRC, their
representation has not changed since at least 1c27b644c0fd.


> We're definitely getting rid of some lines in herd7, that have been added
> solely for dealing with this specific case of LKMM.

Good.  If the herd7 maintainers are "tired" of dealing with those lines,
that's definitely a big fat "why" to put in a changelog.


> Deal with what, no longer having to learn OCaml to be sure that the LKMM's
> formal definition matches the description in memory_barriers.txt?

Nope.  ;-)   Dealing with the review, testing, and maintainance of a new
representation.


> - it makes it easier to maintain the LKMM in the future, because you don't
> have to work around hidden transformations inside herd7
> - it makes implicit behavior explicit
> - it makes it easier to understand that the formalization matches the
> intention
> - it makes it easier to learn the LKMM from the formalization without having
> to cross-reference every bit with the informal documentation to avoid
> misunderstandings

Jonas -  You write "less hidden", "less implicit", but I keep reading "a
representation I/some people would expect".  We've already acknowledged
that's no deciding factor to abandon the current seasoned representation.

  Andrea

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 20:30                     ` Jonas Oberhauser
  2024-05-24  3:30                       ` Andrea Parri
@ 2024-05-24  8:16                       ` Hernan Ponce de Leon
  1 sibling, 0 replies; 33+ messages in thread
From: Hernan Ponce de Leon @ 2024-05-24  8:16 UTC (permalink / raw)
  To: Jonas Oberhauser, Andrea Parri
  Cc: Alan Stern, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes

On 5/23/2024 10:30 PM, Jonas Oberhauser wrote:
> 
> 
> Am 5/23/2024 um 6:35 PM schrieb Andrea Parri:
>>> I would phrase it more extreme: I want to get rid of the unnecessary
>>> non-standard parts of the herd representation.
>>
>> Please indulge the thought that what might appear to be "non-standard"
>> to one or one's community might appear differently to others.
> 
> Certainly. And of course, the formalization of the LKMM doesn't have to 
> be optimized for the WMM community, even though I suspect that they (and 
> possibly the tools they develop) are the main direct consumers of the 
> formalization.
> 
>> Continuing with the example above, I'm pretty sure your "standard" and
>> simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads
>> will turn up the nose of many kernel developers... 
> 
> I'm not sure how true that is. Firstly I'm not sure how many kernel 
> developers really read the formalization and try to see what it does, 
> rather than relying on tools to gain an intuition of what's going on.
> 
> But let's say a kernel developer wants to make sure that their intuition 
> of how cmpxchg works is matched by the formal model, e.g., they want to 
> double check that the formal model is correct
> after they got some unexpected results in a herd7 litmus test.
> 
> How would they go about it today? The only way is to read the OCaml 
> source code, because there's no other code that obviously defines the 
> behavior. At best, they would read
> 
> atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
> atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
> atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)
> 
> see the Acquire, Release, and Mb references in the model, and think "ok, 
> so '__cmpxchg{acquire}' means I get an Acquire tag in the success case 
> which gives acq_po ordering, '__cmpxchg{release}' means I get a Release 
> tag in the success case which gives po_rel ordering. Therefore 
> '__cmpxchg{mb}' must give me an Mb tag in the success case. That would 
> give me mb ordering, but not between the store and subsequent 
> operations, and that's just wrong."
> 
> But of course this "understanding by analogy" is broken, because there's 
> a bunch of special OCaml match expressions in herd that look only for 
> release or mb and just give totally different behavior for that one 
> case. Every other tag behaves exactly the same way.
> 
> At worst they wouldn't even guess that this only is the success ordering 
> (that's definitely what happened to us in the weak memory model 
> community, but we don't matter for this argument).
> 
> A better situation would be to do something like this:
> 
>                                         (*success*)(*fail*)
> atomic_cmpxchg_acquire(X,V,W) __cmpxchg {acquire}  {once} (X,V,W)
> atomic_cmpxchg_release(X,V,W) __cmpxchg {release}  {once} (X,V,W)
> atomic_cmpxchg(X,V,W)         __cmpxchg {mb}       {once} (X,V,W)

I also think this is the best option. We could change the internal 
representation of __cmpxchg in herd7 to have two memory orders (*) 
similar to the current internal representation of C11 CAS and get the 
memory order to be used for the success/fail case from the .def instead 
of having this hardcoded in the code.

(*) Not to be confused to how LKMM sees a CAS instruction.

Hernan

> 
> and being explicit in the model about what the Mb tag does:
> 
>       | [M];fencerel(Mb);[M] (* smp_mb() and operations with Mb ordering 
> such as cmpxchg() *)
>       | [M];po;[R & Mb] (* reads with an Mb tag - coming from full fence 
> rmws - are ordered as-if there was an smp_mb() right before the read *)
>       | [W & Mb];po;[M] (* correspondingly, writes with an Mb tag are 
> ordered as-if there was an smp_mb() right after the write *)
> 
> And suddenly you can read the model and map it 1:1 to the intuition that 
> you can treat a successful cmpxchg() (or an xchg() etc.) as-if it was 
> enclosed by smp_mb().
> 
> There doesn't need to be a real fence.
> 
> 
>> The current repre-
>> sentation for xchg() was described in the ASPLOS'18 work
> 
> Do you mean the one example in Table 3?
> What about cmpxchg() or cmpxchg_acquire()?
> 
>> But that's not the point, "standards" can change and certainly kernels
>> and tools do.  My remark was more to point out that you're not getting
>> rid of anything..., 
> 
> We're definitely getting rid of some lines in herd7, that have been 
> added solely for dealing with this specific case of LKMM.
> 
> For example, there's some code looking like this (for a memory ordering 
> tag `a`)
> 
>                  (match a with ["release"] -> a |  _ -> a_once)
> 
> which specifically refers to the release tag in LKMM and we can turn 
> that into
> 
>                  a
> 
> with no more reference to any LKMM tags. Of course, this is not the 
> Linux community's problem, just the herd (and others who want to use the 
> literal cat file of LKMM) maintainers who have to deal with it.
> 
> And imagine that at some point you want to add another tag to the linux 
> kernel - like for example for 'accessing an atomic in initialization 
> code, so that the compiler can do optimizations like merge a bunch of 
> bitwise operations'. Let's call it 'init.
> 
> What will be the behavior of
> 
> WRITE_INIT(X,V) __store{init}(X,V);
> 
> with the current herd7? Honestly I have no clue, because there might be a
> 
>                  (match a with ["release"] -> a |  _ -> a_once)
> 
> somewhere that will turn this 'init into 'once. We'd have to comb the 
> herd7 codebase to know for sure (or hope that our experiments are 
> sufficiently thorough to catch all cases).
> 
>  >you're simply proposing a different representation
> 
> I wouldn't phrase it like that, since it's a representation many people 
> familiar with WMM would expect, but admittedly that doesn't have to be 
> the deciding factor.
> 
>> asking kernel developers & maintainers to "deal with it"
> 
> Deal with what, no longer having to learn OCaml to be sure that the 
> LKMM's formal definition matches the description in memory_barriers.txt?
> Otherwise, I don't see anything that changes for the worse.
> With the change, people need to think for a few seconds to see that the 
> explicit rules in the .cat file are a formalization of "treat xchg() as 
> if it had smp_mb() before and after".
> Without the change, they need to read an OCaml codebase to see that is 
> meant by
> 
> atomic_xchg(X,V,W) __xchg{mb}(X,V,W)
> 
> So I don't see any downsides.
> 
> 
> I would also support making the representation explicit in the .def 
> files instead, with something like
> 
> cmp_xchg(x,e,v) = { if (*x==e) { smp_mb(); int r = 
> __cmp_xchg{once}(x,e,v) ; smp_mb(); return r } else { __read{once}(x) } }
> 
> Then you get to keep the representation.
> Without knowing herd's internals, I have no idea of how to seriously 
> specify a meta language so that herd could effectively and efficiently 
> deal with it in general. But one could at least envision some specific 
> syntax for cmpxchg with a code sequence for the failure case and a code 
> sequence for the success case.
> 
> Personally I don't prefer this option, firstly because I don't see a 
> good reason for the Linux community to go their own way here. I don't 
> think there's really much of a problem with saying "this is the 
> intuition; this is how we formalize it" and for the two not to be 
> completely identical. It happens all the time, and in this case the gap 
> between "we formalize it by really having two smp_mb() appear in the 
> graph if it's successful" and "we formalize it by providing the same 
> ordering that the smp_mb() would have if it was there" isn't big.
> Especially for those kernel people who have enough motivation to 
> actually deep dive into the formalization in the first place. But it 
> makes it a lot more accessible for the WMM community, which can only 
> benefit LKMM.
> 
> And secondly because people will probably mostly focus on the .cat file, 
> which means that it's still a bit of a booby trap to put something so 
> important (and perhaps surprising) into the .def file, but at least one 
> that is in the open for people who are more careful and also read the 
> .def file.
> 
>  > I am looking forward to something more than "because we can".
> 
> - it makes it easier to maintain the LKMM in the future, because you 
> don't have to work around hidden transformations inside herd7
> - it makes implicit behavior explicit
> - it makes it easier to understand that the formalization matches the 
> intention
> - it makes it easier to learn the LKMM from the formalization without 
> having to cross-reference every bit with the informal documentation to 
> avoid misunderstandings
> 
> 
> Have a good time,
>    jonas
> 
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-22 14:20             ` Alan Stern
  2024-05-22 16:54               ` Andrea Parri
@ 2024-05-23 12:54               ` Jonas Oberhauser
  2024-05-23 13:35                 ` Paul E. McKenney
  2024-05-23 14:05                 ` Alan Stern
  1 sibling, 2 replies; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-23 12:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes



Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
>>> On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
>>>>
>>>>
>>>> Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
>>>>> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>>>>>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>>>>>
>>>>>>>
>>>>>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>>>>>> Hernan and Jonas:
>>>>>>>>
>>>>>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>>>>>> the LKMM?  The goal is to make the memory barriers currently implicit in
>>>>>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>>>>>> this.
>>>>>>>>
>>>>>>>> Are you going to change herd7 somehow, and if so, how?  It seems like
>>>>>>>> you should want to provide sufficient information so that the .bell
>>>>>>>> and .cat files can implement the appropriate memory barriers associated
>>>>>>>> with each RMW operation.  What additional information is needed?  And
>>>>>>>> how (explained in English, not by quoting source code) will the .bell
>>>>>>>> and .cat files make use of this information?
>>>>>>>>
>>>>>>>> Alan
>>>>>>>
>>>>>>>
>>>>>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>>>>>> following:
>>>>>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>>>>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>>>>>> one of those things:
>>>>>>>       * full mb ; once (the rmw) ; full mb, if a value returning
>>>>>>> (successful) rmw
>>>>>>>       * once (the rmw)   otherwise
>>>>>>> - everything else gets translated 1:1 into some internal representation
>>>>>>
>>>>>> This is my understanding from reading the source code of CSem.ml in herd7's
>>>>>> repo.
>>>>>>
>>>>>> Also, this is exactly what dartagnan is currently doing.
>>>>>>
>>>>>>>
>>>>>>> What I'm proposing is:
>>>>>>> 1. remove this transpilation step,
>>>>>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>>>>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>>>>>> on RMW instructions
>>>>>>
>>>>>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>>>>>> proposed.
>>>>>>
>>>>>> I am not sure if further changes are needed for herd7.
>>>
>>> What about failed RMW instructions?  IIRC, herd7 generates just an R for
>>> these, not both R and W, but won't it still be annotated with an mb tag?
>>> And wasn't this matter of failed RMWs one of the issues that the two of
>>> you specifically wanted to make explicit in the memory model, rather
>>> than implicit in the operation of herd7?
>>
>> That's why we use the RMW_MB tag. I should have copied that definition too:
>>
>>
>> (* full barrier events that appear in non-failing RMW *)
>> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>>
>>
>> This ensures that the only successful rmw instructions have an RMW_MB tag.
> 
> It would be better if there was a way to tell herd7 not to add the 'mb
> tag to failed instructions in the first place.  This approach is
> brittle; see below.

Hernan told me that in fact that is actually currently the case in 
herd7. Failing RMW get assigned the Once tag implicitly.
Another thing that I'd suggest to change.

> 
> An alternative would be to have a way for the .cat file to remove the
> 'mb tag from a failed RMW instruction.  But I don't know if this is
> feasible.

For Mb it's feasible, as there is no Mb read or Mb store.

Mb = Mb & (~M | dom(rmw) | range(rmw))

However one would want to do the same for Acq and Rel.

For that one would need to distinguish e.g. between a read that comes 
from a failed rmw instruction, and where the tag would disappear, or a 
normal standalone read.

For example, by using two different acquire tags, 'acquire and 
'rmw-acquire, and defining

Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))

Anyways we can do this change independently. So for now, we don't need 
RMW_MB.


> 
>>> And wasn't another one of these issues the difference between
>>> value-returning and non-value-returning RMWs?  As far as I can, nothing
>>> in the .def file specifically mentions this.  There's the noreturn tag
>>> in the .bell file, but nothing in the .def file says which instructions
>>> it applies to.  Or are we supposed to know that it automatically applies
>>> to all __atomic_op() instances?
>>
>> Ah, now you're already forestalling my next suggestion :))
>>
>> I would say let's fix one issue at a time (learned this from Andrea).
>>
>> For the other issue, do noreturn rmws always have the same ordering as once?
> 
> If they aren't annotated with _acquire or _release then yes...  And I
> don't know whether there _are_ any annotated no-return RMWs.  If
> somebody wanted such a thing, they probably would just use a
> value-returning RMW instead.
> 
>> I suspect we need to extend herd slightly more to support the second one, I
>> don't know if there's syntax for passing tags to __atomic_op.
> 
> This may not be be needed.  But still, it would nice to be explicit (in
> a comment in the .def file if nowhere else) that __atomic_op always adds
> a 'noreturn tag.
> 
>>>> instructions RMW[{'once,'acquire,'release,'mb}]
>>>>
>>>> then the Mb tags would appear in the graph. And then I'd define the ordering
>>>> explicitly. One way is to say that an Mb tag orders all memory accesses
>>>> before(or at) the tag with all memory accesses after(or at) the tag, except
>>>> the accesses of the rmw with each other.
>>>
>>> I don't think you need to add very much.  The .cat file already defines
>>> the mb relation as including the term:
>>>
>>> 	([M] ; fencerel(Mb) ; [M])
>>>
>>> All that's needed is to replace the fencerel(Mb) with something more
>>> general...
> 
> And this is why I said the RMW_MB mechanism is brittle.  With the 'mb
> tag still added to failed RMW events, the term above will cause the
> memory model to think there is ordering even though the event isn't in
> the RMW_MB class.
> 

Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po 
(where F includes standalone fences). But looking into the stdlib.cat, 
you're right.


>>> Also, is the "\ rmw" part really necessary?  I don't think it makes any
>>> difference; the memory model already knows that the read part of an RMW
>>> has to happen before the write part.
>>
>> It unfortunately does make a difference, because mb is a strong fence.
>> This means that an Mb in an rmw sequence would create additional pb edges.
>>
>>    prop;(rfe ; [Mb];rmw;[Mb])
>>
>> I believe this is might give wrong results on powerpc, but I'd need to
>> double check.
> 
> PowerPC uses a load-reserve/store-conditional approach for RMW, so it's
> tricky.  However, you're right that having a fictitious mb between the
> read and the write of an RMW would mean that the preceding (in coherence
> order) write would have to be visible to all CPUs before the RMW write
> could execute, and I don't believe we want to assert this.
> 
>>>> One could also split it into two rules to keep with the "two full fences"
>>>> analogy. Maybe a good way would be like this:
>>>>
>>>>        [M] ; po; [RMW_MB & R] ; po^? ; [M]
>>>>
>>>>        [M] ; po^?; [RMW_MB & W] ; po ; [M]
>>>
>>> My preference is for the first approach.
>>
>> That was also my early preference, but to be honest I expected that you'd
>> prefer the second approach.
>> Actually, I also started warming up to it.
> 
> If you do want to use this approach, it should be simplified.  All you
> need is:
> 
> 	[M] ; po ; [RMW_MB]
> 
> 	[RMW_MB] ; po ; [M]
> 
> This is because events tagged with RMW_MB always are memory accesses,
> and accesses that aren't part of the RMW are already covered by the
> fencerel(Mb) thing above.

This has exactly the issue mentioned above - it will cause the rmw to 
have an internal strong fence that on powerpc probably isn't there.

We could do (with the assumption that Mb applies only to successful rmw):

  	[M] ; po ; [Mb & R]
  	[Mb & W] ; po ; [M]


Kind regards, jonas


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 12:54               ` Jonas Oberhauser
@ 2024-05-23 13:35                 ` Paul E. McKenney
  2024-05-23 14:05                 ` Alan Stern
  1 sibling, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2024-05-23 13:35 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Alan Stern, Hernan Ponce de Leon, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> > On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote:
> > > Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
> > > > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
> > > > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> > > > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> > > > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > > > > > > > Hernan and Jonas:
> > > > > > > > > 
> > > > > > > > > Can you explain more fully the changes you want to make to herd7 and/or
> > > > > > > > > the LKMM?  The goal is to make the memory barriers currently implicit in
> > > > > > > > > RMW operations explicit, but I couldn't understand how you propose to do
> > > > > > > > > this.
> > > > > > > > > 
> > > > > > > > > Are you going to change herd7 somehow, and if so, how?  It seems like
> > > > > > > > > you should want to provide sufficient information so that the .bell
> > > > > > > > > and .cat files can implement the appropriate memory barriers associated
> > > > > > > > > with each RMW operation.  What additional information is needed?  And
> > > > > > > > > how (explained in English, not by quoting source code) will the .bell
> > > > > > > > > and .cat files make use of this information?
> > > > > > > > > 
> > > > > > > > > Alan
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > > > > > > > following:
> > > > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I
> > > > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > > > > > > > one of those things:
> > > > > > > >       * full mb ; once (the rmw) ; full mb, if a value returning
> > > > > > > > (successful) rmw
> > > > > > > >       * once (the rmw)   otherwise
> > > > > > > > - everything else gets translated 1:1 into some internal representation
> > > > > > > 
> > > > > > > This is my understanding from reading the source code of CSem.ml in herd7's
> > > > > > > repo.
> > > > > > > 
> > > > > > > Also, this is exactly what dartagnan is currently doing.
> > > > > > > 
> > > > > > > > 
> > > > > > > > What I'm proposing is:
> > > > > > > > 1. remove this transpilation step,
> > > > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > > > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag
> > > > > > > > on RMW instructions
> > > > > > > 
> > > > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas
> > > > > > > proposed.
> > > > > > > 
> > > > > > > I am not sure if further changes are needed for herd7.
> > > > 
> > > > What about failed RMW instructions?  IIRC, herd7 generates just an R for
> > > > these, not both R and W, but won't it still be annotated with an mb tag?
> > > > And wasn't this matter of failed RMWs one of the issues that the two of
> > > > you specifically wanted to make explicit in the memory model, rather
> > > > than implicit in the operation of herd7?
> > > 
> > > That's why we use the RMW_MB tag. I should have copied that definition too:
> > > 
> > > 
> > > (* full barrier events that appear in non-failing RMW *)
> > > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> > > 
> > > 
> > > This ensures that the only successful rmw instructions have an RMW_MB tag.
> > 
> > It would be better if there was a way to tell herd7 not to add the 'mb
> > tag to failed instructions in the first place.  This approach is
> > brittle; see below.
> 
> Hernan told me that in fact that is actually currently the case in herd7.
> Failing RMW get assigned the Once tag implicitly.
> Another thing that I'd suggest to change.

Let's please be careful, though.  There is a lot out there that depends
on this semantic, odd though it might seem at first glance.  ;-)

							Thanx, Paul

> > An alternative would be to have a way for the .cat file to remove the
> > 'mb tag from a failed RMW instruction.  But I don't know if this is
> > feasible.
> 
> For Mb it's feasible, as there is no Mb read or Mb store.
> 
> Mb = Mb & (~M | dom(rmw) | range(rmw))
> 
> However one would want to do the same for Acq and Rel.
> 
> For that one would need to distinguish e.g. between a read that comes from a
> failed rmw instruction, and where the tag would disappear, or a normal
> standalone read.
> 
> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
> and defining
> 
> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
> 
> Anyways we can do this change independently. So for now, we don't need
> RMW_MB.
> 
> 
> > 
> > > > And wasn't another one of these issues the difference between
> > > > value-returning and non-value-returning RMWs?  As far as I can, nothing
> > > > in the .def file specifically mentions this.  There's the noreturn tag
> > > > in the .bell file, but nothing in the .def file says which instructions
> > > > it applies to.  Or are we supposed to know that it automatically applies
> > > > to all __atomic_op() instances?
> > > 
> > > Ah, now you're already forestalling my next suggestion :))
> > > 
> > > I would say let's fix one issue at a time (learned this from Andrea).
> > > 
> > > For the other issue, do noreturn rmws always have the same ordering as once?
> > 
> > If they aren't annotated with _acquire or _release then yes...  And I
> > don't know whether there _are_ any annotated no-return RMWs.  If
> > somebody wanted such a thing, they probably would just use a
> > value-returning RMW instead.
> > 
> > > I suspect we need to extend herd slightly more to support the second one, I
> > > don't know if there's syntax for passing tags to __atomic_op.
> > 
> > This may not be be needed.  But still, it would nice to be explicit (in
> > a comment in the .def file if nowhere else) that __atomic_op always adds
> > a 'noreturn tag.
> > 
> > > > > instructions RMW[{'once,'acquire,'release,'mb}]
> > > > > 
> > > > > then the Mb tags would appear in the graph. And then I'd define the ordering
> > > > > explicitly. One way is to say that an Mb tag orders all memory accesses
> > > > > before(or at) the tag with all memory accesses after(or at) the tag, except
> > > > > the accesses of the rmw with each other.
> > > > 
> > > > I don't think you need to add very much.  The .cat file already defines
> > > > the mb relation as including the term:
> > > > 
> > > > 	([M] ; fencerel(Mb) ; [M])
> > > > 
> > > > All that's needed is to replace the fencerel(Mb) with something more
> > > > general...
> > 
> > And this is why I said the RMW_MB mechanism is brittle.  With the 'mb
> > tag still added to failed RMW events, the term above will cause the
> > memory model to think there is ordering even though the event isn't in
> > the RMW_MB class.
> > 
> 
> Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po (where
> F includes standalone fences). But looking into the stdlib.cat, you're
> right.
> 
> 
> > > > Also, is the "\ rmw" part really necessary?  I don't think it makes any
> > > > difference; the memory model already knows that the read part of an RMW
> > > > has to happen before the write part.
> > > 
> > > It unfortunately does make a difference, because mb is a strong fence.
> > > This means that an Mb in an rmw sequence would create additional pb edges.
> > > 
> > >    prop;(rfe ; [Mb];rmw;[Mb])
> > > 
> > > I believe this is might give wrong results on powerpc, but I'd need to
> > > double check.
> > 
> > PowerPC uses a load-reserve/store-conditional approach for RMW, so it's
> > tricky.  However, you're right that having a fictitious mb between the
> > read and the write of an RMW would mean that the preceding (in coherence
> > order) write would have to be visible to all CPUs before the RMW write
> > could execute, and I don't believe we want to assert this.
> > 
> > > > > One could also split it into two rules to keep with the "two full fences"
> > > > > analogy. Maybe a good way would be like this:
> > > > > 
> > > > >        [M] ; po; [RMW_MB & R] ; po^? ; [M]
> > > > > 
> > > > >        [M] ; po^?; [RMW_MB & W] ; po ; [M]
> > > > 
> > > > My preference is for the first approach.
> > > 
> > > That was also my early preference, but to be honest I expected that you'd
> > > prefer the second approach.
> > > Actually, I also started warming up to it.
> > 
> > If you do want to use this approach, it should be simplified.  All you
> > need is:
> > 
> > 	[M] ; po ; [RMW_MB]
> > 
> > 	[RMW_MB] ; po ; [M]
> > 
> > This is because events tagged with RMW_MB always are memory accesses,
> > and accesses that aren't part of the RMW are already covered by the
> > fencerel(Mb) thing above.
> 
> This has exactly the issue mentioned above - it will cause the rmw to have
> an internal strong fence that on powerpc probably isn't there.
> 
> We could do (with the assumption that Mb applies only to successful rmw):
> 
>  	[M] ; po ; [Mb & R]
>  	[Mb & W] ; po ; [M]
> 
> 
> Kind regards, jonas
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 12:54               ` Jonas Oberhauser
  2024-05-23 13:35                 ` Paul E. McKenney
@ 2024-05-23 14:05                 ` Alan Stern
  2024-05-23 14:26                   ` Hernan Ponce de Leon
  2024-05-23 14:36                   ` Jonas Oberhauser
  1 sibling, 2 replies; 33+ messages in thread
From: Alan Stern @ 2024-05-23 14:05 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
> 
> 
> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> > It would be better if there was a way to tell herd7 not to add the 'mb
> > tag to failed instructions in the first place.  This approach is
> > brittle; see below.
> 
> Hernan told me that in fact that is actually currently the case in herd7.
> Failing RMW get assigned the Once tag implicitly.
> Another thing that I'd suggest to change.

Indeed.

> > An alternative would be to have a way for the .cat file to remove the
> > 'mb tag from a failed RMW instruction.  But I don't know if this is
> > feasible.
> 
> For Mb it's feasible, as there is no Mb read or Mb store.
> 
> Mb = Mb & (~M | dom(rmw) | range(rmw))
> 
> However one would want to do the same for Acq and Rel.
> 
> For that one would need to distinguish e.g. between a read that comes from a
> failed rmw instruction, and where the tag would disappear, or a normal
> standalone read.
> 
> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
> and defining
> 
> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
> 
> Anyways we can do this change independently. So for now, we don't need
> RMW_MB.

Overall, it seems better to have herd7 assign the right tag, but change 
the way the .def file works so that it can tell herd7 which tag to use 
in each of the success and failure cases.

> > 	[M] ; po ; [RMW_MB]
> > 
> > 	[RMW_MB] ; po ; [M]
> > 
> > This is because events tagged with RMW_MB always are memory accesses,
> > and accesses that aren't part of the RMW are already covered by the
> > fencerel(Mb) thing above.
> 
> This has exactly the issue mentioned above - it will cause the rmw to have
> an internal strong fence that on powerpc probably isn't there.

Oops, that's right.  Silly oversight on my part.  But at least you 
understood what I meant.

> We could do (with the assumption that Mb applies only to successful rmw):
> 
>  	[M] ; po ; [Mb & R]
>  	[Mb & W] ; po ; [M]

That works.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 14:05                 ` Alan Stern
@ 2024-05-23 14:26                   ` Hernan Ponce de Leon
  2024-05-23 15:14                     ` Boqun Feng
  2024-05-23 16:05                     ` Alan Stern
  2024-05-23 14:36                   ` Jonas Oberhauser
  1 sibling, 2 replies; 33+ messages in thread
From: Hernan Ponce de Leon @ 2024-05-23 14:26 UTC (permalink / raw)
  To: Alan Stern, Jonas Oberhauser
  Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team,
	parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes

On 5/23/2024 4:05 PM, Alan Stern wrote:
> On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
>>> It would be better if there was a way to tell herd7 not to add the 'mb
>>> tag to failed instructions in the first place.  This approach is
>>> brittle; see below.
>>
>> Hernan told me that in fact that is actually currently the case in herd7.
>> Failing RMW get assigned the Once tag implicitly.
>> Another thing that I'd suggest to change.
> 
> Indeed.
> 
>>> An alternative would be to have a way for the .cat file to remove the
>>> 'mb tag from a failed RMW instruction.  But I don't know if this is
>>> feasible.
>>
>> For Mb it's feasible, as there is no Mb read or Mb store.
>>
>> Mb = Mb & (~M | dom(rmw) | range(rmw))
>>
>> However one would want to do the same for Acq and Rel.
>>
>> For that one would need to distinguish e.g. between a read that comes from a
>> failed rmw instruction, and where the tag would disappear, or a normal
>> standalone read.
>>
>> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
>> and defining
>>
>> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
>>
>> Anyways we can do this change independently. So for now, we don't need
>> RMW_MB.
> 
> Overall, it seems better to have herd7 assign the right tag, but change
> the way the .def file works so that it can tell herd7 which tag to use
> in each of the success and failure cases.

I am not fully sure how herd7 uses the .def file, but I guess something 
like adding a second memory tag to __cmpxchg could work

cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W)
cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W)
cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W)
cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W)

Hernan

> 
>>> 	[M] ; po ; [RMW_MB]
>>>
>>> 	[RMW_MB] ; po ; [M]
>>>
>>> This is because events tagged with RMW_MB always are memory accesses,
>>> and accesses that aren't part of the RMW are already covered by the
>>> fencerel(Mb) thing above.
>>
>> This has exactly the issue mentioned above - it will cause the rmw to have
>> an internal strong fence that on powerpc probably isn't there.
> 
> Oops, that's right.  Silly oversight on my part.  But at least you
> understood what I meant.
> 
>> We could do (with the assumption that Mb applies only to successful rmw):
>>
>>   	[M] ; po ; [Mb & R]
>>   	[Mb & W] ; po ; [M]
> 
> That works.
> 
> Alan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 14:26                   ` Hernan Ponce de Leon
@ 2024-05-23 15:14                     ` Boqun Feng
  2024-05-24  1:38                       ` Alan Stern
  2024-05-23 16:05                     ` Alan Stern
  1 sibling, 1 reply; 33+ messages in thread
From: Boqun Feng @ 2024-05-23 15:14 UTC (permalink / raw)
  To: Hernan Ponce de Leon
  Cc: Alan Stern, Jonas Oberhauser, Paul E. McKenney, linux-kernel,
	linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget,
	Joel Fernandes

On Thu, May 23, 2024 at 04:26:23PM +0200, Hernan Ponce de Leon wrote:
> On 5/23/2024 4:05 PM, Alan Stern wrote:
> > On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
> > > 
> > > 
> > > Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> > > > It would be better if there was a way to tell herd7 not to add the 'mb
> > > > tag to failed instructions in the first place.  This approach is
> > > > brittle; see below.
> > > 
> > > Hernan told me that in fact that is actually currently the case in herd7.
> > > Failing RMW get assigned the Once tag implicitly.
> > > Another thing that I'd suggest to change.
> > 
> > Indeed.
> > 
> > > > An alternative would be to have a way for the .cat file to remove the
> > > > 'mb tag from a failed RMW instruction.  But I don't know if this is
> > > > feasible.
> > > 
> > > For Mb it's feasible, as there is no Mb read or Mb store.
> > > 
> > > Mb = Mb & (~M | dom(rmw) | range(rmw))
> > > 
> > > However one would want to do the same for Acq and Rel.
> > > 
> > > For that one would need to distinguish e.g. between a read that comes from a
> > > failed rmw instruction, and where the tag would disappear, or a normal
> > > standalone read.
> > > 
> > > For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
> > > and defining
> > > 
> > > Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
> > > 
> > > Anyways we can do this change independently. So for now, we don't need
> > > RMW_MB.
> > 
> > Overall, it seems better to have herd7 assign the right tag, but change
> > the way the .def file works so that it can tell herd7 which tag to use
> > in each of the success and failure cases.
> 
> I am not fully sure how herd7 uses the .def file, but I guess something like
> adding a second memory tag to __cmpxchg could work
> 
> cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W)
> cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W)
> cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W)
> cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W)
> 

Note that cmpxchg_acquire() and cmpxchg_release() don't have _acqurie
or _release ordering if they fails.

Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
etc" part is a syntax thing, you write a cmpxchg(), it should be
translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
doesn't provide ordering, it's a semantics thing, as Jonas showed that
it can be represent in cat file. As long as it's a semanitc thing and we
can represent in cat file, I don't think we want herd to give a special
treatment.

What you and Jonas looks fine to me, since it moves the semantic bits
from herd internal to cat file.

Regards,
Boqun

> Hernan
> 
> > 
> > > > 	[M] ; po ; [RMW_MB]
> > > > 
> > > > 	[RMW_MB] ; po ; [M]
> > > > 
> > > > This is because events tagged with RMW_MB always are memory accesses,
> > > > and accesses that aren't part of the RMW are already covered by the
> > > > fencerel(Mb) thing above.
> > > 
> > > This has exactly the issue mentioned above - it will cause the rmw to have
> > > an internal strong fence that on powerpc probably isn't there.
> > 
> > Oops, that's right.  Silly oversight on my part.  But at least you
> > understood what I meant.
> > 
> > > We could do (with the assumption that Mb applies only to successful rmw):
> > > 
> > >   	[M] ; po ; [Mb & R]
> > >   	[Mb & W] ; po ; [M]
> > 
> > That works.
> > 
> > Alan
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 15:14                     ` Boqun Feng
@ 2024-05-24  1:38                       ` Alan Stern
  2024-05-24  2:50                         ` Boqun Feng
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2024-05-24  1:38 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney,
	linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave,
	luc.maranget, Joel Fernandes

On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote:
> Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
> etc" part is a syntax thing, you write a cmpxchg(), it should be
> translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
> doesn't provide ordering, it's a semantics thing, as Jonas showed that
> it can be represent in cat file. As long as it's a semanitc thing and we
> can represent in cat file, I don't think we want herd to give a special
> treatment.

I don't really understand the distinction you're making between 
syntactic things and semantic things.  For most instructions there's no 
problem, because the instruction does just one thing.  But a cmpxchg 
instruction can do either of two things, depending on whether it 
succeeds or fails, so it makes sense to tell herd7 how to represent 
both of them.

> What you and Jonas looks fine to me, since it moves the semantic bits
> from herd internal to cat file.

Trying to recognize failed RMW events by looking for R events with an mb 
tag that aren't in the rmw relation seems very artificial.  That fact 
that it would work is merely an artifact of herd7's internal actions.  I 
think it would be much cleaner if herd7 represented these events in some 
other way, particularly if we can tell it how.

After all, herd7 already does generate different sets of events for 
successful (both R and W) and failed (only R) RMWs.  It's not such a big 
stretch to make the R events it generates different in the two cases.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-24  1:38                       ` Alan Stern
@ 2024-05-24  2:50                         ` Boqun Feng
  2024-05-24 14:14                           ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Boqun Feng @ 2024-05-24  2:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney,
	linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave,
	luc.maranget, Joel Fernandes

On Thu, May 23, 2024 at 09:38:05PM -0400, Alan Stern wrote:
> On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote:
> > Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
> > etc" part is a syntax thing, you write a cmpxchg(), it should be
> > translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
> > doesn't provide ordering, it's a semantics thing, as Jonas showed that
> > it can be represent in cat file. As long as it's a semanitc thing and we
> > can represent in cat file, I don't think we want herd to give a special
> > treatment.
> 
> I don't really understand the distinction you're making between 
> syntactic things and semantic things.  For most instructions there's no 

Syntax is how the code is written, and semantic is how the code is
executed (in each execution candidate). So if we write a cmpxchg{mb}(),
and in execution candiates, it could generates a read{MB} event and a
write{MB} event (succeed case), or a read{MB} event (fail case), "{MB}"
here doesn't mean it's a full barrier, it only means the event comes
from a no suffix API. Here "{MB}" only has syntactic meaning (no
semantic meaning).

> problem, because the instruction does just one thing.  But a cmpxchg 
> instruction can do either of two things, depending on whether it 
> succeeds or fails, so it makes sense to tell herd7 how to represent 
> both of them.
> 
> > What you and Jonas looks fine to me, since it moves the semantic bits
> > from herd internal to cat file.
> 
> Trying to recognize failed RMW events by looking for R events with an mb 
> tag that aren't in the rmw relation seems very artificial.  That fact 

Not really, RMW events contains all events generated from
read-modify-write primitives, if there is an R event without a rmw
relation (i.e there is no corresponding write event), it's a failed RWM
by definition: it cannot be anything else.

> that it would work is merely an artifact of herd7's internal actions.  I 
> think it would be much cleaner if herd7 represented these events in some 
> other way, particularly if we can tell it how.
> 
> After all, herd7 already does generate different sets of events for 
> successful (both R and W) and failed (only R) RMWs.  It's not such a big 
> stretch to make the R events it generates different in the two cases.
> 

I thought we want to simplify the herd internal processing by avoid
adding mb events in cmpxchg() cases, in the same spirit, if syntactic
tagging is already good enough, why do we want to make herd complicate?

Regards,
Boqun

> Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-24  2:50                         ` Boqun Feng
@ 2024-05-24 14:14                           ` Alan Stern
  2024-05-24 14:34                             ` Boqun Feng
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2024-05-24 14:14 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney,
	linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave,
	luc.maranget, Joel Fernandes

On Thu, May 23, 2024 at 07:50:11PM -0700, Boqun Feng wrote:
> On Thu, May 23, 2024 at 09:38:05PM -0400, Alan Stern wrote:
> > On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote:
> > > Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
> > > etc" part is a syntax thing, you write a cmpxchg(), it should be
> > > translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
> > > doesn't provide ordering, it's a semantics thing, as Jonas showed that
> > > it can be represent in cat file. As long as it's a semanitc thing and we
> > > can represent in cat file, I don't think we want herd to give a special
> > > treatment.
> > 
> > I don't really understand the distinction you're making between 
> > syntactic things and semantic things.  For most instructions there's no 
> 
> Syntax is how the code is written, and semantic is how the code is
> executed (in each execution candidate). So if we write a cmpxchg{mb}(),
> and in execution candiates, it could generates a read{MB} event and a
> write{MB} event (succeed case), or a read{MB} event (fail case), "{MB}"
> here doesn't mean it's a full barrier, it only means the event comes
> from a no suffix API. Here "{MB}" only has syntactic meaning (no
> semantic meaning).

Okay, I get it.  Then you might agree that it probably would be better 
to use a different tag here, because the mb tag is already in use with 
other instructions (like smp_mb()) where it does always mean there's a 
full barrier.

> Not really, RMW events contains all events generated from
> read-modify-write primitives, if there is an R event without a rmw
> relation (i.e there is no corresponding write event), it's a failed RWM
> by definition: it cannot be anything else.

Not true.  An R event without an rmw relation could be a READ_ONCE().  
Or a plain read.  The memory model uses the tag to distinguish these 
cases.

> > that it would work is merely an artifact of herd7's internal actions.  I 
> > think it would be much cleaner if herd7 represented these events in some 
> > other way, particularly if we can tell it how.
> > 
> > After all, herd7 already does generate different sets of events for 
> > successful (both R and W) and failed (only R) RMWs.  It's not such a big 
> > stretch to make the R events it generates different in the two cases.
> > 
> 
> I thought we want to simplify the herd internal processing by avoid
> adding mb events in cmpxchg() cases, in the same spirit, if syntactic
> tagging is already good enough, why do we want to make herd complicate?

Herd7 already is complicated by the fact that it needs to handle 
cmpxchg() instructions in two ways: success and failure.  This 
complication is unavoidable.  Adding one extra layer (different tags for 
the different ways) is an insignificant increase in the complication, 
IMO. And it's a net reduction when you compare it to the amount of 
complication currently in the herd7 code.

Also what about cmpxchg_acquire()?  If it fails, it will generate an R 
event with an acquire tag not in the rmw relation.  There is no way to 
tell such events apart from a normal smp_load_acquire(), and so the .cat 
file would have no way to know that the event should not have acquire 
semantics.  I guess we would have to rename this tag, as well.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-24 14:14                           ` Alan Stern
@ 2024-05-24 14:34                             ` Boqun Feng
  2024-05-24 14:53                               ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Boqun Feng @ 2024-05-24 14:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney,
	linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave,
	luc.maranget, Joel Fernandes

On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote:
[...]
> > Not really, RMW events contains all events generated from
> > read-modify-write primitives, if there is an R event without a rmw
> > relation (i.e there is no corresponding write event), it's a failed RWM
> > by definition: it cannot be anything else.
> 
> Not true.  An R event without an rmw relation could be a READ_ONCE().  

No, the R event is already in the RWM events, it has come from a rwm
atomic.

> Or a plain read.  The memory model uses the tag to distinguish these 
> cases.
> 
> > > that it would work is merely an artifact of herd7's internal actions.  I 
> > > think it would be much cleaner if herd7 represented these events in some 
> > > other way, particularly if we can tell it how.
> > > 
> > > After all, herd7 already does generate different sets of events for 
> > > successful (both R and W) and failed (only R) RMWs.  It's not such a big 
> > > stretch to make the R events it generates different in the two cases.
> > > 
> > 
> > I thought we want to simplify the herd internal processing by avoid
> > adding mb events in cmpxchg() cases, in the same spirit, if syntactic
> > tagging is already good enough, why do we want to make herd complicate?
> 
> Herd7 already is complicated by the fact that it needs to handle 
> cmpxchg() instructions in two ways: success and failure.  This 
> complication is unavoidable.  Adding one extra layer (different tags for 
> the different ways) is an insignificant increase in the complication, 
> IMO. And it's a net reduction when you compare it to the amount of 
> complication currently in the herd7 code.
> 
> Also what about cmpxchg_acquire()?  If it fails, it will generate an R 
> event with an acquire tag not in the rmw relation.  There is no way to 
> tell such events apart from a normal smp_load_acquire(), and so the .cat 
> file would have no way to know that the event should not have acquire 
> semantics.  I guess we would have to rename this tag, as well.

No,

	let read_of_rmw = (RMW & R) 
	let fail_read_of_rmw = read_of_rmw / dom(rmw)
	let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire]

gives you all the failed cmpxchg_acquire() apart from a normal
smp_load_acquire().

Regards,
Boqun

> 
> Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-24 14:34                             ` Boqun Feng
@ 2024-05-24 14:53                               ` Alan Stern
  2024-05-24 18:09                                 ` Jonas Oberhauser
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2024-05-24 14:53 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney,
	linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave,
	luc.maranget, Joel Fernandes

On Fri, May 24, 2024 at 07:34:06AM -0700, Boqun Feng wrote:
> On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote:
> [...]
> > > Not really, RMW events contains all events generated from
> > > read-modify-write primitives, if there is an R event without a rmw
> > > relation (i.e there is no corresponding write event), it's a failed RWM
> > > by definition: it cannot be anything else.
> > 
> > Not true.  An R event without an rmw relation could be a READ_ONCE().  
> 
> No, the R event is already in the RWM events, it has come from a rwm
> atomic.

Oh, right.  For some reason I was thinking that an instruction could 
belong to only one set, R or RMW.  But that doesn't make sense.

> > Or a plain read.  The memory model uses the tag to distinguish these 
> > cases.
> > 
> > > > that it would work is merely an artifact of herd7's internal actions.  I 
> > > > think it would be much cleaner if herd7 represented these events in some 
> > > > other way, particularly if we can tell it how.
> > > > 
> > > > After all, herd7 already does generate different sets of events for 
> > > > successful (both R and W) and failed (only R) RMWs.  It's not such a big 
> > > > stretch to make the R events it generates different in the two cases.
> > > > 
> > > 
> > > I thought we want to simplify the herd internal processing by avoid
> > > adding mb events in cmpxchg() cases, in the same spirit, if syntactic
> > > tagging is already good enough, why do we want to make herd complicate?
> > 
> > Herd7 already is complicated by the fact that it needs to handle 
> > cmpxchg() instructions in two ways: success and failure.  This 
> > complication is unavoidable.  Adding one extra layer (different tags for 
> > the different ways) is an insignificant increase in the complication, 
> > IMO. And it's a net reduction when you compare it to the amount of 
> > complication currently in the herd7 code.
> > 
> > Also what about cmpxchg_acquire()?  If it fails, it will generate an R 
> > event with an acquire tag not in the rmw relation.  There is no way to 
> > tell such events apart from a normal smp_load_acquire(), and so the .cat 
> > file would have no way to know that the event should not have acquire 
> > semantics.  I guess we would have to rename this tag, as well.
> 
> No,
> 
> 	let read_of_rmw = (RMW & R) 
> 	let fail_read_of_rmw = read_of_rmw / dom(rmw)
> 	let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire]
> 
> gives you all the failed cmpxchg_acquire() apart from a normal
> smp_load_acquire().

Yes, I see.  Using this approach, no further changes to herd7 or the 
.def file would be needed.  We would just have to add 'mb to the 
Accesses enumeration and to the list of tags allowed for an RMW 
instruction.

Question: Since R and RMW have different lists of allowable tags, how 
does herd7 decide which tags are allowed for an event that is in both 
the R and RMW sets?

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-24 14:53                               ` Alan Stern
@ 2024-05-24 18:09                                 ` Jonas Oberhauser
  2024-05-24 18:47                                   ` Boqun Feng
  2024-05-24 18:48                                   ` Alan Stern
  0 siblings, 2 replies; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-24 18:09 UTC (permalink / raw)
  To: Alan Stern, Boqun Feng
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, j.alglave, luc.maranget,
	Joel Fernandes



Am 5/24/2024 um 4:53 PM schrieb Alan Stern:
> On Fri, May 24, 2024 at 07:34:06AM -0700, Boqun Feng wrote:
>> On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote:
>> [...]
>>>> Not really, RMW events contains all events generated from
>>>> read-modify-write primitives, if there is an R event without a rmw
>>>> relation (i.e there is no corresponding write event), it's a failed RWM
>>>> by definition: it cannot be anything else.
>>>
>>> Not true.  An R event without an rmw relation could be a READ_ONCE().
>>
>> No, the R event is already in the RWM events, it has come from a rwm
>> atomic.
> 
> Oh, right.  For some reason I was thinking that an instruction could
> belong to only one set, R or RMW.  But that doesn't make sense.

I thought the same, so I perhaps contributed to that confusion.

> 
>>> Or a plain read.  The memory model uses the tag to distinguish these
>>> cases.
>>>
>>>>> that it would work is merely an artifact of herd7's internal actions.  I
>>>>> think it would be much cleaner if herd7 represented these events in some
>>>>> other way, particularly if we can tell it how.
>>>>>
>>>>> After all, herd7 already does generate different sets of events for
>>>>> successful (both R and W) and failed (only R) RMWs.  It's not such a big
>>>>> stretch to make the R events it generates different in the two cases.
>>>>>
>>>>
>>>> I thought we want to simplify the herd internal processing by avoid
>>>> adding mb events in cmpxchg() cases, in the same spirit, if syntactic
>>>> tagging is already good enough, why do we want to make herd complicate?
>>>
>>> Herd7 already is complicated by the fact that it needs to handle
>>> cmpxchg() instructions in two ways: success and failure.  This
>>> complication is unavoidable.  Adding one extra layer (different tags for
>>> the different ways) is an insignificant increase in the complication,
>>> IMO. And it's a net reduction when you compare it to the amount of
>>> complication currently in the herd7 code.
>>>
>>> Also what about cmpxchg_acquire()?  If it fails, it will generate an R
>>> event with an acquire tag not in the rmw relation.

I would like that, but that is not the current implementation, a failed 
atomic_compare_exchange always produces a R*[once]; this behavior is 
currently hardcoded in herd7.

>>>  There is no way to
>>> tell such events apart from a normal smp_load_acquire(), and so the .cat
>>> file would have no way to know that the event should not have acquire
>>> semantics.  I guess we would have to rename this tag, as well.
>>
>> No,
>>
>> 	let read_of_rmw = (RMW & R)
>> 	let fail_read_of_rmw = read_of_rmw / dom(rmw)
>> 	let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire]
>>
>> gives you all the failed cmpxchg_acquire() apart from a normal
>> smp_load_acquire().
> 
> Yes, I see.  Using this approach, no further changes to herd7 or the
> def file would be needed.  We would just have to add 'mb to the
> Accesses enumeration and to the list of tags allowed for an RMW
> instruction.
> 
> Question: Since R and RMW have different lists of allowable tags, how
> does herd7 decide which tags are allowed for an event that is in both
> the R and RMW sets?

After looking over the patch draft for herd7 written by Hernan [1], my 
best guess is: it doen't. It seems that after herd7 detects you are 
using LKMM it simply drops all tags except 'acquire on read and 'release 
on store. Everything else becomes 'once (and 'mb adds some F[mb] sometimes).

That means that if one were to go through with the earlier suggestion to 
distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by 
calling the latter RmwMb, current herd7 would always erase the RmwMb tag 
because it is not called "acquire" or "release".
The same would happen of course if you introduced an RmwAcquire tag.

So there are several options:

- treat the tags as a syntactic thing which is always present, and
  specify their meaning purely in the cat file, analogous to what you
  have defined above. This is personally my favorite solution. To
  implement this in herd7 we would simply remove all the current special
  cases for the LKMM barriers, which I like.

  However then we need to actually define the behavior of herd if
  an inappropriate tag (like release on a load) is provided. Since the
  restriction is actually mostly enforced by the .def file - by simply
  not  providing a smp_store_acquire() etc. -, that only concerns RMWs,
  where xchg_acquire() would apply the Acquire tag to the write also.

  The easiest solution is to simply remove the syntax for specifying
  restrictions - it seems it is not doing much right now anyways - and
  do the filtering inside .bell, with things like

     (* only writes can have Release tags *)
     let Release = Release & W \ (RMW \ rng(rmw))

  One good thing about this way is that it would work even without
  modifying herd, since it is in a sense idempotent with the
  transformations done by herd.

  FWIW, in our internal weak memory model in Dresden we did exactly this,
  and use REL for the syntax and Rel for the semantic release ordering to
  make the distinction more clear, with things like:

     let Acq = (ACQ | SC) & (R | F)
     let Rel = (REL | SC) & (W | F)

  (SC is our equivalent to LKMM's Mb)

- treat the tags as semantic markers that are only present or not under
  some circumstances, and define the semantics fully based on the present
  tags. The circumstances are currently hardcoded in herd7, but we should
  move them out with some syntax like __cmpxchg{acquire}{once}.

  This requires touching the parser and of course still has the issue
  with the acquire tag appearing on the store as well.

- provide a full syntax for defining the event sequence that is
  expected. For example, for a cmpxchg we could do

     cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) { 
W&RMW[once]; } F[success-cmpxchg] }

  and then define in .bell that a success-cmpxchg is an mb if it is
  directly next to a cmpxchg that succeeds.

  The advantage is that you can customize the representation to whatever
  kernel devs thing is the most intuitive. The downside is that it
  requires major rewrites to herd7, someone to think about a reasonable
  language for specifying this etc.



Best wishes,
   jonas



[1] https://github.com/herd/herdtools7/pull/865


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-24 18:09                                 ` Jonas Oberhauser
@ 2024-05-24 18:47                                   ` Boqun Feng
  2024-05-24 18:48                                   ` Alan Stern
  1 sibling, 0 replies; 33+ messages in thread
From: Boqun Feng @ 2024-05-24 18:47 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel,
	linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget,
	Joel Fernandes

On Fri, May 24, 2024 at 08:09:28PM +0200, Jonas Oberhauser wrote:
[...]
> > 
> > Question: Since R and RMW have different lists of allowable tags, how
> > does herd7 decide which tags are allowed for an event that is in both
> > the R and RMW sets?
> 
> After looking over the patch draft for herd7 written by Hernan [1], my best
> guess is: it doen't. It seems that after herd7 detects you are using LKMM it

Right, you can event put a 'acquire tag to WRITE_ONCE():

	-WRITE_ONCE(X,V) { __store{once}(X,V); }
	+WRITE_ONCE(X,V) { __store{acquire}(X,V); }		

, herd won't complain, but it may change the model behavior.

Here is what I know from the herd code:

*	First, normally herd just put whatever the tag you specify in
	.def file to the accesses event (it has to be one of the tags
	in Access list in .bell file).

*	An access event looks like:

	(read_or_write?, ..., anonatation, is_atomic?, ...)

	"is_atomic?" means whether the event comes from an rmw
	primitives. So a READ_ONCE() will look like:

	(read, ..., once, false, ...)

	and a WRITE_ONCE() will look like:

	(write, ..., once, false, ...)

	the read part of an atomic_add_return_relaxed() is:

	(read, ..., once, true, ...)

	note that the "is_atomic?" is how this event ends up in "RMW"
	set.

> simply drops all tags except 'acquire on read and 'release on store.

Right, herd does some extra work to filter out RELEASE reads and ACQUIRE
writes for rwm atomics, basically, when herd is generating events for an
rmw atomic, if it's not "mb", it will only kill 'acquire for read and
'release on store, otherwise, it changes the annotation part to 'once.
Funny example, if you do a __atomic_fetch_op{hello}(..), herd will treat
it as a __atomic_fetch_op{once}(..) because of this.

> Everything else becomes 'once (and 'mb adds some F[mb] sometimes).
> 
> That means that if one were to go through with the earlier suggestion to
> distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by calling
> the latter RmwMb, current herd7 would always erase the RmwMb tag because it
> is not called "acquire" or "release".
> The same would happen of course if you introduced an RmwAcquire tag.
> 
> So there are several options:
> 
> - treat the tags as a syntactic thing which is always present, and
>  specify their meaning purely in the cat file, analogous to what you
>  have defined above. This is personally my favorite solution. To
>  implement this in herd7 we would simply remove all the current special
>  cases for the LKMM barriers, which I like.
> 

Agreed.

>  However then we need to actually define the behavior of herd if
>  an inappropriate tag (like release on a load) is provided. Since the
>  restriction is actually mostly enforced by the .def file - by simply
>  not  providing a smp_store_acquire() etc. -, that only concerns RMWs,
>  where xchg_acquire() would apply the Acquire tag to the write also.
> 
>  The easiest solution is to simply remove the syntax for specifying
>  restrictions - it seems it is not doing much right now anyways - and
>  do the filtering inside .bell, with things like
> 
>     (* only writes can have Release tags *)
>     let Release = Release & W \ (RMW \ rng(rmw))
> 
>  One good thing about this way is that it would work even without
>  modifying herd, since it is in a sense idempotent with the

Well, we still need to turn off the "smart" annotation rewritting in
herd (e.g. -> once), but I think that's a good thing: it simplifies the
internal work herd does, and it also helps people on other tools
understand LKMM better.

>  transformations done by herd.
> 
>  FWIW, in our internal weak memory model in Dresden we did exactly this,

so the model doesn't work elsewhere even in Germany? ;-) Sorry, couldn'
t resist ;-) ;-) ;-)

>  and use REL for the syntax and Rel for the semantic release ordering to
>  make the distinction more clear, with things like:
> 
>     let Acq = (ACQ | SC) & (R | F)
>     let Rel = (REL | SC) & (W | F)
> 
>  (SC is our equivalent to LKMM's Mb)
> 
> - treat the tags as semantic markers that are only present or not under
>  some circumstances, and define the semantics fully based on the present
>  tags. The circumstances are currently hardcoded in herd7, but we should
>  move them out with some syntax like __cmpxchg{acquire}{once}.
> 
>  This requires touching the parser and of course still has the issue
>  with the acquire tag appearing on the store as well.
> 
> - provide a full syntax for defining the event sequence that is
>  expected. For example, for a cmpxchg we could do
> 
>     cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) {
> W&RMW[once]; } F[success-cmpxchg] }
> 
>  and then define in .bell that a success-cmpxchg is an mb if it is
>  directly next to a cmpxchg that succeeds.
> 
>  The advantage is that you can customize the representation to whatever
>  kernel devs thing is the most intuitive. The downside is that it
>  requires major rewrites to herd7, someone to think about a reasonable
>  language for specifying this etc.
> 

I no doubt am the fan of the first approach, herd is powerful because
the ability to customize the semantic rules for model ordering models,
moving more parts from herd internals to the cat file (or bell file) is
always a good thing to me.

Regards,
Boqun

> 
> 
> Best wishes,
>   jonas
> 
> 
> 
> [1] https://github.com/herd/herdtools7/pull/865
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-24 18:09                                 ` Jonas Oberhauser
  2024-05-24 18:47                                   ` Boqun Feng
@ 2024-05-24 18:48                                   ` Alan Stern
  1 sibling, 0 replies; 33+ messages in thread
From: Alan Stern @ 2024-05-24 18:48 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Boqun Feng, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel,
	linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget,
	Joel Fernandes

On Fri, May 24, 2024 at 08:09:28PM +0200, Jonas Oberhauser wrote:
> 
> 
> Am 5/24/2024 um 4:53 PM schrieb Alan Stern:
> > Question: Since R and RMW have different lists of allowable tags, how
> > does herd7 decide which tags are allowed for an event that is in both
> > the R and RMW sets?
> 
> After looking over the patch draft for herd7 written by Hernan [1], my best
> guess is: it doen't. It seems that after herd7 detects you are using LKMM it
> simply drops all tags except 'acquire on read and 'release on store.
> Everything else becomes 'once (and 'mb adds some F[mb] sometimes).

Okay, yes, this is the sort of thing we would like to move away from.

> That means that if one were to go through with the earlier suggestion to
> distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by calling
> the latter RmwMb, current herd7 would always erase the RmwMb tag because it
> is not called "acquire" or "release".
> The same would happen of course if you introduced an RmwAcquire tag.
> 
> So there are several options:
> 
> - treat the tags as a syntactic thing which is always present, and
>  specify their meaning purely in the cat file, analogous to what you
>  have defined above. This is personally my favorite solution. To
>  implement this in herd7 we would simply remove all the current special
>  cases for the LKMM barriers, which I like.
> 
>  However then we need to actually define the behavior of herd if
>  an inappropriate tag (like release on a load) is provided. Since the
>  restriction is actually mostly enforced by the .def file - by simply
>  not  providing a smp_store_acquire() etc. -, that only concerns RMWs,
>  where xchg_acquire() would apply the Acquire tag to the write also.
> 
>  The easiest solution is to simply remove the syntax for specifying
>  restrictions - it seems it is not doing much right now anyways - and
>  do the filtering inside .bell, with things like
> 
>     (* only writes can have Release tags *)
>     let Release = Release & W \ (RMW \ rng(rmw))
> 
>  One good thing about this way is that it would work even without
>  modifying herd, since it is in a sense idempotent with the
>  transformations done by herd.

This seems like a good approach.

>  FWIW, in our internal weak memory model in Dresden we did exactly this,
>  and use REL for the syntax and Rel for the semantic release ordering to
>  make the distinction more clear, with things like:
> 
>     let Acq = (ACQ | SC) & (R | F)
>     let Rel = (REL | SC) & (W | F)
> 
>  (SC is our equivalent to LKMM's Mb)

Definitely, the syntactic markers should be easily distinguished (by 
case would be a good way) from the semantic classes used in the model.

> - treat the tags as semantic markers that are only present or not under
>  some circumstances, and define the semantics fully based on the present
>  tags. The circumstances are currently hardcoded in herd7, but we should
>  move them out with some syntax like __cmpxchg{acquire}{once}.
> 
>  This requires touching the parser and of course still has the issue
>  with the acquire tag appearing on the store as well.
> 
> - provide a full syntax for defining the event sequence that is
>  expected. For example, for a cmpxchg we could do
> 
>     cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) {
> W&RMW[once]; } F[success-cmpxchg] }
> 
>  and then define in .bell that a success-cmpxchg is an mb if it is
>  directly next to a cmpxchg that succeeds.
> 
>  The advantage is that you can customize the representation to whatever
>  kernel devs thing is the most intuitive. The downside is that it
>  requires major rewrites to herd7, someone to think about a reasonable
>  language for specifying this etc.

Let's avoid major rewrites to herd7.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 14:26                   ` Hernan Ponce de Leon
  2024-05-23 15:14                     ` Boqun Feng
@ 2024-05-23 16:05                     ` Alan Stern
  1 sibling, 0 replies; 33+ messages in thread
From: Alan Stern @ 2024-05-23 16:05 UTC (permalink / raw)
  To: Hernan Ponce de Leon
  Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

On Thu, May 23, 2024 at 04:26:23PM +0200, Hernan Ponce de Leon wrote:
> On 5/23/2024 4:05 PM, Alan Stern wrote:
> > Overall, it seems better to have herd7 assign the right tag, but change
> > the way the .def file works so that it can tell herd7 which tag to use
> > in each of the success and failure cases.
> 
> I am not fully sure how herd7 uses the .def file, but I guess something like
> adding a second memory tag to __cmpxchg could work
> 
> cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W)
> cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W)
> cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W)
> cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W)

Right, except that the last two should be:

cmpxchg_acquire(X,V,W) __cmpxchg{acquire, once}(X,V,W)
cmpxchg_release(X,V,W) __cmpxchg{release, once}(X,V,W)

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-23 14:05                 ` Alan Stern
  2024-05-23 14:26                   ` Hernan Ponce de Leon
@ 2024-05-23 14:36                   ` Jonas Oberhauser
  1 sibling, 0 replies; 33+ messages in thread
From: Jonas Oberhauser @ 2024-05-23 14:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes



Am 5/23/2024 um 4:05 PM schrieb Alan Stern:
> On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
>>> It would be better if there was a way to tell herd7 not to add the 'mb
>>> tag to failed instructions in the first place.  This approach is
>>> brittle; see below.
>>
>> Hernan told me that in fact that is actually currently the case in herd7.
>> Failing RMW get assigned the Once tag implicitly.
>> Another thing that I'd suggest to change.
> 
> Indeed.
> 
>>> An alternative would be to have a way for the .cat file to remove the
>>> 'mb tag from a failed RMW instruction.  But I don't know if this is
>>> feasible.
>>
>> For Mb it's feasible, as there is no Mb read or Mb store.
>>
>> Mb = Mb & (~M | dom(rmw) | range(rmw))
>>
>> However one would want to do the same for Acq and Rel.
>>
>> For that one would need to distinguish e.g. between a read that comes from a
>> failed rmw instruction, and where the tag would disappear, or a normal
>> standalone read.
>>
>> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
>> and defining
>>
>> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
>>
>> Anyways we can do this change independently. So for now, we don't need
>> RMW_MB.
> 
> Overall, it seems better to have herd7 assign the right tag, but change
> the way the .def file works so that it can tell herd7 which tag to use
> in each of the success and failure cases.

Yes, that would be good.
In principle herd should already support this kind of logic for e.g. C11 
which also has distinct success and failure modes.
But of course I don't know if there's syntax to make this change in 
.def, let alone what it would look like.


> But at least you
> understood what I meant.

I do try :) (: :)

> 
>> We could do (with the assumption that Mb applies only to successful rmw):
>>
>>   	[M] ; po ; [Mb & R]
>>   	[Mb & W] ; po ; [M]
> 
> That works.

Ok, I'll prepare a patch for this and Andrea or anyone else can still 
interject.
I suppose the patch would not change the semantics at all with the 
current herd7, since Mb does not appear on any reads and writes for the 
time being.

best wishes,
   jonas




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: LKMM: Making RMW barriers explicit
  2024-05-18  0:31     ` Alan Stern
  2024-05-21  9:57       ` Jonas Oberhauser
@ 2024-05-21 11:38       ` Hernan Ponce de Leon
  1 sibling, 0 replies; 33+ messages in thread
From: Hernan Ponce de Leon @ 2024-05-21 11:38 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch,
	kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget,
	Joel Fernandes

On 5/18/2024 2:31 AM, Alan Stern wrote:
> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>
>>>
>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>> Hernan and Jonas:
>>>>
>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>> the LKMM?  The goal is to make the memory barriers currently implicit in
>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>> this.
>>>>
>>>> Are you going to change herd7 somehow, and if so, how?  It seems like
>>>> you should want to provide sufficient information so that the .bell
>>>> and .cat files can implement the appropriate memory barriers associated
>>>> with each RMW operation.  What additional information is needed?  And
>>>> how (explained in English, not by quoting source code) will the .bell
>>>> and .cat files make use of this information?
>>>>
>>>> Alan
>>>
>>>
>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>> following:
>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>> one of those things:
>>>     * full mb ; once (the rmw) ; full mb, if a value returning
>>> (successful) rmw
>>>     * once (the rmw)   otherwise
>>> - everything else gets translated 1:1 into some internal representation
>>
>> This is my understanding from reading the source code of CSem.ml in herd7's
>> repo.
>>
>> Also, this is exactly what dartagnan is currently doing.
>>
>>>
>>> What I'm proposing is:
>>> 1. remove this transpilation step,
>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>> on RMW instructions
>>
>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>> proposed.
>>
>> I am not sure if further changes are needed for herd7.

I implemented these changes in herd7 and they seem enough.
I opened a PRs for discussion
	https://github.com/herd/herdtools7/pull/865

> 
> Okay, good.  This answers the first part of what I asked.  What about
> the second part?  That is, how will the changes to the .def, .bell, and
> .cat files achieve your goals?

At least the cat model needs to be updated. If I remove the fences from 
herd7, but keep the current model, these 4 tests fail.

  --- Summary:
  !!! Result changed: 
./litmus/manual/locked/SUW+or-ow+l-ow-or.litmus.out.new
  !!! Result changed: 
./litmus/manual/atomic/C-PaulEMcKenney-SB+adat-o+adat-o.litmus.out.new
  !!! Result changed: ./litmus/manual/atomic/C-atomic-01.litmus.out.new
  !!! Result changed: ./litmus/manual/atomic/C-atomic-02.litmus.out.new

Using this patch I get the correct results

diff --git a/tools/memory-model/linux-kernel.cat 
b/tools/memory-model/linux-kernel.cat
index adf3c4f41229..7e4787cdbd66 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -34,6 +34,12 @@ let R4rmb = R \ Noreturn     (* Reads for which rmb 
works *)
  let rmb = [R4rmb] ; fencerel(Rmb) ; [R4rmb]
  let wmb = [W] ; fencerel(Wmb) ; [W]
  let mb = ([M] ; fencerel(Mb) ; [M]) |
+       (* everything across a full barrier RMW is ordered. This 
includes up to one event inside the RMW. *)
+       ([M] ; po ; [RMW & Mb] ; po ; [M]) |
+       (* full barrier RMW writes are ordered with everything behind 
the RMW *)
+       ([W & RMW & Mb] ; po ; [M]) |
+       (* full barrier RMW reads are ordered with everything before the 
RMW *)
+       ([M] ; po ; [R & RMW & Mb]) |
         ([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
         ([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
         ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |

Hernan

> 
> Alan


^ permalink raw reply related	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-05-24 18:48 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-16  1:43 LKMM: Making RMW barriers explicit Alan Stern
2024-05-16  8:31 ` Jonas Oberhauser
2024-05-16  8:44   ` Hernan Ponce de Leon
2024-05-18  0:31     ` Alan Stern
2024-05-21  9:57       ` Jonas Oberhauser
2024-05-21 15:36         ` Alan Stern
2024-05-22  9:20           ` Jonas Oberhauser
2024-05-22 14:20             ` Alan Stern
2024-05-22 16:54               ` Andrea Parri
2024-05-22 18:20                 ` Alan Stern
2024-05-22 19:48                   ` Hernan Ponce de Leon
2024-05-23  9:04                     ` Andrea Parri
2024-05-23 14:27                 ` Jonas Oberhauser
2024-05-23 16:35                   ` Andrea Parri
2024-05-23 20:30                     ` Jonas Oberhauser
2024-05-24  3:30                       ` Andrea Parri
2024-05-24  8:16                       ` Hernan Ponce de Leon
2024-05-23 12:54               ` Jonas Oberhauser
2024-05-23 13:35                 ` Paul E. McKenney
2024-05-23 14:05                 ` Alan Stern
2024-05-23 14:26                   ` Hernan Ponce de Leon
2024-05-23 15:14                     ` Boqun Feng
2024-05-24  1:38                       ` Alan Stern
2024-05-24  2:50                         ` Boqun Feng
2024-05-24 14:14                           ` Alan Stern
2024-05-24 14:34                             ` Boqun Feng
2024-05-24 14:53                               ` Alan Stern
2024-05-24 18:09                                 ` Jonas Oberhauser
2024-05-24 18:47                                   ` Boqun Feng
2024-05-24 18:48                                   ` Alan Stern
2024-05-23 16:05                     ` Alan Stern
2024-05-23 14:36                   ` Jonas Oberhauser
2024-05-21 11:38       ` Hernan Ponce de Leon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox