* LKMM: Making RMW barriers explicit @ 2024-05-16 1:43 Alan Stern 2024-05-16 8:31 ` Jonas Oberhauser 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2024-05-16 1:43 UTC (permalink / raw) To: Hernan Ponce de Leon, Jonas Oberhauser Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Hernan and Jonas: Can you explain more fully the changes you want to make to herd7 and/or the LKMM? The goal is to make the memory barriers currently implicit in RMW operations explicit, but I couldn't understand how you propose to do this. Are you going to change herd7 somehow, and if so, how? It seems like you should want to provide sufficient information so that the .bell and .cat files can implement the appropriate memory barriers associated with each RMW operation. What additional information is needed? And how (explained in English, not by quoting source code) will the .bell and .cat files make use of this information? Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-16 1:43 LKMM: Making RMW barriers explicit Alan Stern @ 2024-05-16 8:31 ` Jonas Oberhauser 2024-05-16 8:44 ` Hernan Ponce de Leon 0 siblings, 1 reply; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-16 8:31 UTC (permalink / raw) To: Alan Stern, Hernan Ponce de Leon Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Am 5/16/2024 um 3:43 AM schrieb Alan Stern: > Hernan and Jonas: > > Can you explain more fully the changes you want to make to herd7 and/or > the LKMM? The goal is to make the memory barriers currently implicit in > RMW operations explicit, but I couldn't understand how you propose to do > this. > > Are you going to change herd7 somehow, and if so, how? It seems like > you should want to provide sufficient information so that the .bell > and .cat files can implement the appropriate memory barriers associated > with each RMW operation. What additional information is needed? And > how (explained in English, not by quoting source code) will the .bell > and .cat files make use of this information? > > Alan I don't know whether herd7 needs to be changed. Probably, herd7 does the following: - if a tag called Mb appears on an rmw instruction (by instruction I mean things like xchg(), atomic_inc_return_relaxed()), replace it with one of those things: * full mb ; once (the rmw) ; full mb, if a value returning (successful) rmw * once (the rmw) otherwise - everything else gets translated 1:1 into some internal representation What I'm proposing is: 1. remove this transpilation step, 2. and instead allow the Mb tag to actually appear on RMW instructions 3. change the cat file to explicitly define the behavior of the Mb tag on RMW instructions There are probably two ways to achieve this: a) change herd7 to remove the special behavior for Mb, after that we should be able to do everything else in the .cat/.bell/.def files. b) sidestep herd7's behavior by renaming Mb to _Mb in the .def file, and then defining Mb=_Mb in the .bell file, and define the semantics of the Mb tag in the .cat files. The latter would not include modification to herd7, but it's a bit hacky. I'm not sure if the second way really works since I don't know exactly how the herd7 tool does the transpilation, e.g., if it really looks for an Mb tag or rather for the names of the instructions. Does it make sense? jonas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-16 8:31 ` Jonas Oberhauser @ 2024-05-16 8:44 ` Hernan Ponce de Leon 2024-05-18 0:31 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Hernan Ponce de Leon @ 2024-05-16 8:44 UTC (permalink / raw) To: Jonas Oberhauser, Alan Stern Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern: >> Hernan and Jonas: >> >> Can you explain more fully the changes you want to make to herd7 and/or >> the LKMM? The goal is to make the memory barriers currently implicit in >> RMW operations explicit, but I couldn't understand how you propose to do >> this. >> >> Are you going to change herd7 somehow, and if so, how? It seems like >> you should want to provide sufficient information so that the .bell >> and .cat files can implement the appropriate memory barriers associated >> with each RMW operation. What additional information is needed? And >> how (explained in English, not by quoting source code) will the .bell >> and .cat files make use of this information? >> >> Alan > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the > following: > - if a tag called Mb appears on an rmw instruction (by instruction I > mean things like xchg(), atomic_inc_return_relaxed()), replace it with > one of those things: > * full mb ; once (the rmw) ; full mb, if a value returning > (successful) rmw > * once (the rmw) otherwise > - everything else gets translated 1:1 into some internal representation This is my understanding from reading the source code of CSem.ml in herd7's repo. Also, this is exactly what dartagnan is currently doing. > > What I'm proposing is: > 1. remove this transpilation step, > 2. and instead allow the Mb tag to actually appear on RMW instructions > 3. change the cat file to explicitly define the behavior of the Mb tag > on RMW instructions These are the exact 3 things I changed in dartagnan for testing what Jonas proposed. I am not sure if further changes are needed for herd7. Hernan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-16 8:44 ` Hernan Ponce de Leon @ 2024-05-18 0:31 ` Alan Stern 2024-05-21 9:57 ` Jonas Oberhauser 2024-05-21 11:38 ` Hernan Ponce de Leon 0 siblings, 2 replies; 33+ messages in thread From: Alan Stern @ 2024-05-18 0:31 UTC (permalink / raw) To: Hernan Ponce de Leon Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern: > > > Hernan and Jonas: > > > > > > Can you explain more fully the changes you want to make to herd7 and/or > > > the LKMM? The goal is to make the memory barriers currently implicit in > > > RMW operations explicit, but I couldn't understand how you propose to do > > > this. > > > > > > Are you going to change herd7 somehow, and if so, how? It seems like > > > you should want to provide sufficient information so that the .bell > > > and .cat files can implement the appropriate memory barriers associated > > > with each RMW operation. What additional information is needed? And > > > how (explained in English, not by quoting source code) will the .bell > > > and .cat files make use of this information? > > > > > > Alan > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the > > following: > > - if a tag called Mb appears on an rmw instruction (by instruction I > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with > > one of those things: > > * full mb ; once (the rmw) ; full mb, if a value returning > > (successful) rmw > > * once (the rmw) otherwise > > - everything else gets translated 1:1 into some internal representation > > This is my understanding from reading the source code of CSem.ml in herd7's > repo. > > Also, this is exactly what dartagnan is currently doing. > > > > > What I'm proposing is: > > 1. remove this transpilation step, > > 2. and instead allow the Mb tag to actually appear on RMW instructions > > 3. change the cat file to explicitly define the behavior of the Mb tag > > on RMW instructions > > These are the exact 3 things I changed in dartagnan for testing what Jonas > proposed. > > I am not sure if further changes are needed for herd7. Okay, good. This answers the first part of what I asked. What about the second part? That is, how will the changes to the .def, .bell, and .cat files achieve your goals? Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-18 0:31 ` Alan Stern @ 2024-05-21 9:57 ` Jonas Oberhauser 2024-05-21 15:36 ` Alan Stern 2024-05-21 11:38 ` Hernan Ponce de Leon 1 sibling, 1 reply; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-21 9:57 UTC (permalink / raw) To: Alan Stern, Hernan Ponce de Leon Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Am 5/18/2024 um 2:31 AM schrieb Alan Stern: > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: >> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: >>> >>> >>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern: >>>> Hernan and Jonas: >>>> >>>> Can you explain more fully the changes you want to make to herd7 and/or >>>> the LKMM? The goal is to make the memory barriers currently implicit in >>>> RMW operations explicit, but I couldn't understand how you propose to do >>>> this. >>>> >>>> Are you going to change herd7 somehow, and if so, how? It seems like >>>> you should want to provide sufficient information so that the .bell >>>> and .cat files can implement the appropriate memory barriers associated >>>> with each RMW operation. What additional information is needed? And >>>> how (explained in English, not by quoting source code) will the .bell >>>> and .cat files make use of this information? >>>> >>>> Alan >>> >>> >>> I don't know whether herd7 needs to be changed. Probably, herd7 does the >>> following: >>> - if a tag called Mb appears on an rmw instruction (by instruction I >>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with >>> one of those things: >>> * full mb ; once (the rmw) ; full mb, if a value returning >>> (successful) rmw >>> * once (the rmw) otherwise >>> - everything else gets translated 1:1 into some internal representation >> >> This is my understanding from reading the source code of CSem.ml in herd7's >> repo. >> >> Also, this is exactly what dartagnan is currently doing. >> >>> >>> What I'm proposing is: >>> 1. remove this transpilation step, >>> 2. and instead allow the Mb tag to actually appear on RMW instructions >>> 3. change the cat file to explicitly define the behavior of the Mb tag >>> on RMW instructions >> >> These are the exact 3 things I changed in dartagnan for testing what Jonas >> proposed. >> >> I am not sure if further changes are needed for herd7. > > Okay, good. This answers the first part of what I asked. What about > the second part? That is, how will the changes to the .def, .bell, and > .cat files achieve your goals? > > Alan Firstly, we'd allow 'mb as a barrier mode in events, e.g. instructions RMW[{'once,'acquire,'release,'mb}] then the Mb tags would appear in the graph. And then I'd define the ordering explicitly. One way is to say that an Mb tag orders all memory accesses before(or at) the tag with all memory accesses after(or at) the tag, except the accesses of the rmw with each other. This is the same as the full fence before the read, which orders all memory accesses before the read with every access after(or at) the read, plus the full fence after the write, which orders all memory accesses before(or at) the write with all accesses after the write. That would be done by adding [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M] to ppo. One could also split it into two rules to keep with the "two full fences" analogy. Maybe a good way would be like this: [M] ; po; [RMW_MB & R] ; po^? ; [M] [M] ; po^?; [RMW_MB & W] ; po ; [M] Hope that makes sense, jonas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-21 9:57 ` Jonas Oberhauser @ 2024-05-21 15:36 ` Alan Stern 2024-05-22 9:20 ` Jonas Oberhauser 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2024-05-21 15:36 UTC (permalink / raw) To: Jonas Oberhauser Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote: > > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern: > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: > > > > > > > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern: > > > > > Hernan and Jonas: > > > > > > > > > > Can you explain more fully the changes you want to make to herd7 and/or > > > > > the LKMM? The goal is to make the memory barriers currently implicit in > > > > > RMW operations explicit, but I couldn't understand how you propose to do > > > > > this. > > > > > > > > > > Are you going to change herd7 somehow, and if so, how? It seems like > > > > > you should want to provide sufficient information so that the .bell > > > > > and .cat files can implement the appropriate memory barriers associated > > > > > with each RMW operation. What additional information is needed? And > > > > > how (explained in English, not by quoting source code) will the .bell > > > > > and .cat files make use of this information? > > > > > > > > > > Alan > > > > > > > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the > > > > following: > > > > - if a tag called Mb appears on an rmw instruction (by instruction I > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with > > > > one of those things: > > > > * full mb ; once (the rmw) ; full mb, if a value returning > > > > (successful) rmw > > > > * once (the rmw) otherwise > > > > - everything else gets translated 1:1 into some internal representation > > > > > > This is my understanding from reading the source code of CSem.ml in herd7's > > > repo. > > > > > > Also, this is exactly what dartagnan is currently doing. > > > > > > > > > > > What I'm proposing is: > > > > 1. remove this transpilation step, > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions > > > > 3. change the cat file to explicitly define the behavior of the Mb tag > > > > on RMW instructions > > > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas > > > proposed. > > > > > > I am not sure if further changes are needed for herd7. What about failed RMW instructions? IIRC, herd7 generates just an R for these, not both R and W, but won't it still be annotated with an mb tag? And wasn't this matter of failed RMWs one of the issues that the two of you specifically wanted to make explicit in the memory model, rather than implicit in the operation of herd7? And wasn't another one of these issues the difference between value-returning and non-value-returning RMWs? As far as I can, nothing in the .def file specifically mentions this. There's the noreturn tag in the .bell file, but nothing in the .def file says which instructions it applies to. Or are we supposed to know that it automatically applies to all __atomic_op() instances? > > Okay, good. This answers the first part of what I asked. What about > > the second part? That is, how will the changes to the .def, .bell, and > > .cat files achieve your goals? > > > > Alan > > > Firstly, we'd allow 'mb as a barrier mode in events, e.g. You mean as a mode in RMW events. 'mb already is a mode for F events. > instructions RMW[{'once,'acquire,'release,'mb}] > > then the Mb tags would appear in the graph. And then I'd define the ordering > explicitly. One way is to say that an Mb tag orders all memory accesses > before(or at) the tag with all memory accesses after(or at) the tag, except > the accesses of the rmw with each other. I don't think you need to add very much. The .cat file already defines the mb relation as including the term: ([M] ; fencerel(Mb) ; [M]) All that's needed is to replace the fencerel(Mb) with something more general... > This is the same as the full fence before the read, which orders all memory > accesses before the read with every access after(or at) the read, > plus the full fence after the write, which orders all memory accesses > before(or at) the write with all accesses after the write. > > That would be done by adding > > [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M] ... like this. > to ppo. You need to update the definition of mb, not ppo. And the RMW_MB above looks wrong; shouldn't it be just Mb? Also, is the "\ rmw" part really necessary? I don't think it makes any difference; the memory model already knows that the read part of an RMW has to happen before the write part. > One could also split it into two rules to keep with the "two full fences" > analogy. Maybe a good way would be like this: > > [M] ; po; [RMW_MB & R] ; po^? ; [M] > > [M] ; po^?; [RMW_MB & W] ; po ; [M] My preference is for the first approach. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-21 15:36 ` Alan Stern @ 2024-05-22 9:20 ` Jonas Oberhauser 2024-05-22 14:20 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-22 9:20 UTC (permalink / raw) To: Alan Stern Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Am 5/21/2024 um 5:36 PM schrieb Alan Stern: > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote: >> >> >> Am 5/18/2024 um 2:31 AM schrieb Alan Stern: >>> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: >>>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: >>>>> >>>>> >>>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern: >>>>>> Hernan and Jonas: >>>>>> >>>>>> Can you explain more fully the changes you want to make to herd7 and/or >>>>>> the LKMM? The goal is to make the memory barriers currently implicit in >>>>>> RMW operations explicit, but I couldn't understand how you propose to do >>>>>> this. >>>>>> >>>>>> Are you going to change herd7 somehow, and if so, how? It seems like >>>>>> you should want to provide sufficient information so that the .bell >>>>>> and .cat files can implement the appropriate memory barriers associated >>>>>> with each RMW operation. What additional information is needed? And >>>>>> how (explained in English, not by quoting source code) will the .bell >>>>>> and .cat files make use of this information? >>>>>> >>>>>> Alan >>>>> >>>>> >>>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the >>>>> following: >>>>> - if a tag called Mb appears on an rmw instruction (by instruction I >>>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with >>>>> one of those things: >>>>> * full mb ; once (the rmw) ; full mb, if a value returning >>>>> (successful) rmw >>>>> * once (the rmw) otherwise >>>>> - everything else gets translated 1:1 into some internal representation >>>> >>>> This is my understanding from reading the source code of CSem.ml in herd7's >>>> repo. >>>> >>>> Also, this is exactly what dartagnan is currently doing. >>>> >>>>> >>>>> What I'm proposing is: >>>>> 1. remove this transpilation step, >>>>> 2. and instead allow the Mb tag to actually appear on RMW instructions >>>>> 3. change the cat file to explicitly define the behavior of the Mb tag >>>>> on RMW instructions >>>> >>>> These are the exact 3 things I changed in dartagnan for testing what Jonas >>>> proposed. >>>> >>>> I am not sure if further changes are needed for herd7. > > What about failed RMW instructions? IIRC, herd7 generates just an R for > these, not both R and W, but won't it still be annotated with an mb tag? > And wasn't this matter of failed RMWs one of the issues that the two of > you specifically wanted to make explicit in the memory model, rather > than implicit in the operation of herd7? That's why we use the RMW_MB tag. I should have copied that definition too: (* full barrier events that appear in non-failing RMW *) let RMW_MB = Mb & (dom(rmw) | range(rmw)) This ensures that the only successful rmw instructions have an RMW_MB tag. > > And wasn't another one of these issues the difference between > value-returning and non-value-returning RMWs? As far as I can, nothing > in the .def file specifically mentions this. There's the noreturn tag > in the .bell file, but nothing in the .def file says which instructions > it applies to. Or are we supposed to know that it automatically applies > to all __atomic_op() instances? Ah, now you're already forestalling my next suggestion :)) I would say let's fix one issue at a time (learned this from Andrea). For the other issue, do noreturn rmws always have the same ordering as once? I suspect we need to extend herd slightly more to support the second one, I don't know if there's syntax for passing tags to __atomic_op. > >>> Okay, good. This answers the first part of what I asked. What about >>> the second part? That is, how will the changes to the .def, .bell, and >>> .cat files achieve your goals? >>> >>> Alan >> >> >> Firstly, we'd allow 'mb as a barrier mode in events, e.g. > > You mean as a mode in RMW events. 'mb already is a mode for F events. Thanks, you're right. > >> instructions RMW[{'once,'acquire,'release,'mb}] >> >> then the Mb tags would appear in the graph. And then I'd define the ordering >> explicitly. One way is to say that an Mb tag orders all memory accesses >> before(or at) the tag with all memory accesses after(or at) the tag, except >> the accesses of the rmw with each other. > > I don't think you need to add very much. The .cat file already defines > the mb relation as including the term: > > ([M] ; fencerel(Mb) ; [M]) > > All that's needed is to replace the fencerel(Mb) with something more > general... > >> This is the same as the full fence before the read, which orders all memory >> accesses before the read with every access after(or at) the read, >> plus the full fence after the write, which orders all memory accesses >> before(or at) the write with all accesses after the write. >> >> That would be done by adding >> >> [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M] > > .. like this. > >> to ppo. > > You need to update the definition of mb, not ppo. Yes, I meant to update the definition of mb, but I momentarily thought the effect of that is just that ppo_new = ppo_old | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M] I forgot that extending mb has another effect, in pb. > And the RMW_MB above > looks wrong; shouldn't it be just Mb? As discussed above, no, since the Mb will also be on the failed RMW. > > Also, is the "\ rmw" part really necessary? I don't think it makes any > difference; the memory model already knows that the read part of an RMW > has to happen before the write part. It unfortunately does make a difference, because mb is a strong fence. This means that an Mb in an rmw sequence would create additional pb edges. prop;(rfe ; [Mb];rmw;[Mb]) I believe this is might give wrong results on powerpc, but I'd need to double check. > >> One could also split it into two rules to keep with the "two full fences" >> analogy. Maybe a good way would be like this: >> >> [M] ; po; [RMW_MB & R] ; po^? ; [M] >> >> [M] ; po^?; [RMW_MB & W] ; po ; [M] > > My preference is for the first approach. That was also my early preference, but to be honest I expected that you'd prefer the second approach. Actually, I also started warming up to it. Have fun, jonas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-22 9:20 ` Jonas Oberhauser @ 2024-05-22 14:20 ` Alan Stern 2024-05-22 16:54 ` Andrea Parri 2024-05-23 12:54 ` Jonas Oberhauser 0 siblings, 2 replies; 33+ messages in thread From: Alan Stern @ 2024-05-22 14:20 UTC (permalink / raw) To: Jonas Oberhauser Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote: > > > Am 5/21/2024 um 5:36 PM schrieb Alan Stern: > > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote: > > > > > > > > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern: > > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: > > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: > > > > > > > > > > > > > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern: > > > > > > > Hernan and Jonas: > > > > > > > > > > > > > > Can you explain more fully the changes you want to make to herd7 and/or > > > > > > > the LKMM? The goal is to make the memory barriers currently implicit in > > > > > > > RMW operations explicit, but I couldn't understand how you propose to do > > > > > > > this. > > > > > > > > > > > > > > Are you going to change herd7 somehow, and if so, how? It seems like > > > > > > > you should want to provide sufficient information so that the .bell > > > > > > > and .cat files can implement the appropriate memory barriers associated > > > > > > > with each RMW operation. What additional information is needed? And > > > > > > > how (explained in English, not by quoting source code) will the .bell > > > > > > > and .cat files make use of this information? > > > > > > > > > > > > > > Alan > > > > > > > > > > > > > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the > > > > > > following: > > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I > > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with > > > > > > one of those things: > > > > > > * full mb ; once (the rmw) ; full mb, if a value returning > > > > > > (successful) rmw > > > > > > * once (the rmw) otherwise > > > > > > - everything else gets translated 1:1 into some internal representation > > > > > > > > > > This is my understanding from reading the source code of CSem.ml in herd7's > > > > > repo. > > > > > > > > > > Also, this is exactly what dartagnan is currently doing. > > > > > > > > > > > > > > > > > What I'm proposing is: > > > > > > 1. remove this transpilation step, > > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions > > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag > > > > > > on RMW instructions > > > > > > > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas > > > > > proposed. > > > > > > > > > > I am not sure if further changes are needed for herd7. > > > > What about failed RMW instructions? IIRC, herd7 generates just an R for > > these, not both R and W, but won't it still be annotated with an mb tag? > > And wasn't this matter of failed RMWs one of the issues that the two of > > you specifically wanted to make explicit in the memory model, rather > > than implicit in the operation of herd7? > > That's why we use the RMW_MB tag. I should have copied that definition too: > > > (* full barrier events that appear in non-failing RMW *) > let RMW_MB = Mb & (dom(rmw) | range(rmw)) > > > This ensures that the only successful rmw instructions have an RMW_MB tag. It would be better if there was a way to tell herd7 not to add the 'mb tag to failed instructions in the first place. This approach is brittle; see below. An alternative would be to have a way for the .cat file to remove the 'mb tag from a failed RMW instruction. But I don't know if this is feasible. > > And wasn't another one of these issues the difference between > > value-returning and non-value-returning RMWs? As far as I can, nothing > > in the .def file specifically mentions this. There's the noreturn tag > > in the .bell file, but nothing in the .def file says which instructions > > it applies to. Or are we supposed to know that it automatically applies > > to all __atomic_op() instances? > > Ah, now you're already forestalling my next suggestion :)) > > I would say let's fix one issue at a time (learned this from Andrea). > > For the other issue, do noreturn rmws always have the same ordering as once? If they aren't annotated with _acquire or _release then yes... And I don't know whether there _are_ any annotated no-return RMWs. If somebody wanted such a thing, they probably would just use a value-returning RMW instead. > I suspect we need to extend herd slightly more to support the second one, I > don't know if there's syntax for passing tags to __atomic_op. This may not be be needed. But still, it would nice to be explicit (in a comment in the .def file if nowhere else) that __atomic_op always adds a 'noreturn tag. > > > instructions RMW[{'once,'acquire,'release,'mb}] > > > > > > then the Mb tags would appear in the graph. And then I'd define the ordering > > > explicitly. One way is to say that an Mb tag orders all memory accesses > > > before(or at) the tag with all memory accesses after(or at) the tag, except > > > the accesses of the rmw with each other. > > > > I don't think you need to add very much. The .cat file already defines > > the mb relation as including the term: > > > > ([M] ; fencerel(Mb) ; [M]) > > > > All that's needed is to replace the fencerel(Mb) with something more > > general... And this is why I said the RMW_MB mechanism is brittle. With the 'mb tag still added to failed RMW events, the term above will cause the memory model to think there is ordering even though the event isn't in the RMW_MB class. > > Also, is the "\ rmw" part really necessary? I don't think it makes any > > difference; the memory model already knows that the read part of an RMW > > has to happen before the write part. > > It unfortunately does make a difference, because mb is a strong fence. > This means that an Mb in an rmw sequence would create additional pb edges. > > prop;(rfe ; [Mb];rmw;[Mb]) > > I believe this is might give wrong results on powerpc, but I'd need to > double check. PowerPC uses a load-reserve/store-conditional approach for RMW, so it's tricky. However, you're right that having a fictitious mb between the read and the write of an RMW would mean that the preceding (in coherence order) write would have to be visible to all CPUs before the RMW write could execute, and I don't believe we want to assert this. > > > One could also split it into two rules to keep with the "two full fences" > > > analogy. Maybe a good way would be like this: > > > > > > [M] ; po; [RMW_MB & R] ; po^? ; [M] > > > > > > [M] ; po^?; [RMW_MB & W] ; po ; [M] > > > > My preference is for the first approach. > > That was also my early preference, but to be honest I expected that you'd > prefer the second approach. > Actually, I also started warming up to it. If you do want to use this approach, it should be simplified. All you need is: [M] ; po ; [RMW_MB] [RMW_MB] ; po ; [M] This is because events tagged with RMW_MB always are memory accesses, and accesses that aren't part of the RMW are already covered by the fencerel(Mb) thing above. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-22 14:20 ` Alan Stern @ 2024-05-22 16:54 ` Andrea Parri 2024-05-22 18:20 ` Alan Stern 2024-05-23 14:27 ` Jonas Oberhauser 2024-05-23 12:54 ` Jonas Oberhauser 1 sibling, 2 replies; 33+ messages in thread From: Andrea Parri @ 2024-05-22 16:54 UTC (permalink / raw) To: Alan Stern Cc: Jonas Oberhauser, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Alan, all, ("randomly" picking a recent post in the thread, after having observed this discussion for a while...) > It would be better if there was a way to tell herd7 not to add the 'mb > tag to failed instructions in the first place. This approach is > brittle; see below. AFAIU, changing the herd representation to generate mb-accesses in place of certain mb-fences... > If you do want to use this approach, it should be simplified. All you > need is: > > [M] ; po ; [RMW_MB] > > [RMW_MB] ; po ; [M] > > This is because events tagged with RMW_MB always are memory accesses, > and accesses that aren't part of the RMW are already covered by the > fencerel(Mb) thing above. ... and updating the .cat file to the effects of something like -let mb = ([M] ; fencerel(Mb) ; [M]) | +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) | ... can hardly be called "making RMW barriers explicit". (So much so that the first commit in PR #865 was titled "Remove explicit barriers from RMWs". :-)) Overall, this discussion rather seems to confirm the close link between tools/memory-model/ and herdtools7. (After all, to what extent could any putative RMW_MB be considered "explicit" without _knowing the under- lying representation of the RMW operations...) My understanding is that this discussion was at least in part motivated by a desire to experiment and familiarize with the current herd representation (that does indeed require some getting-used-to...); this suggests, as some of you already mentioned, to add some comments or a .txt in tools/memory-model/ in order to document such representation and ameliorate that experience. OTOH, I must admit, I'm unable to see here sufficient motivation(tm) for changing the current representation (and model): not the how, but the why... Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-22 16:54 ` Andrea Parri @ 2024-05-22 18:20 ` Alan Stern 2024-05-22 19:48 ` Hernan Ponce de Leon 2024-05-23 14:27 ` Jonas Oberhauser 1 sibling, 1 reply; 33+ messages in thread From: Alan Stern @ 2024-05-22 18:20 UTC (permalink / raw) To: Andrea Parri Cc: Jonas Oberhauser, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On Wed, May 22, 2024 at 06:54:25PM +0200, Andrea Parri wrote: > Alan, all, > > ("randomly" picking a recent post in the thread, after having observed > this discussion for a while...) > > > It would be better if there was a way to tell herd7 not to add the 'mb > > tag to failed instructions in the first place. This approach is > > brittle; see below. > > AFAIU, changing the herd representation to generate mb-accesses in place > of certain mb-fences... I believe herd7 already generates mb accesses (not fences) for certain RMW operations. But then it does some post-processing on them, and that post-processing is what we are thinking of changing. > > If you do want to use this approach, it should be simplified. All you > > need is: > > > > [M] ; po ; [RMW_MB] > > > > [RMW_MB] ; po ; [M] > > > > This is because events tagged with RMW_MB always are memory accesses, > > and accesses that aren't part of the RMW are already covered by the > > fencerel(Mb) thing above. > > ... and updating the .cat file to the effects of something like > > -let mb = ([M] ; fencerel(Mb) ; [M]) | > +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) | > > ... can hardly be called "making RMW barriers explicit". (So much so > that the first commit in PR #865 was titled "Remove explicit barriers > from RMWs". :-)) There is another point, something we didn't spell out explicitly in the email discussion. Namely, in linux-kernel.def there is a long list of instructions along with corresponding herd7 implementation instructions, and those instructions explicitly contain either {once}, {acquire}, {release}, or {mb} tags. So to a large extent, these barriers already are explicit in the memory model. Not in the .cat file, but in the .def file. What is not so explicit is how the {mb} tag works. Its operation isn't as simple as the operation of the {acquire} and {release} tags; those just modify the R or W access in the RMW pair as you would expect. Instead, an {mb} tag says to insert strong memory barriers before the R access and after the W access. This is more or less what the post-processing mentioned earlier does, and Jonas and Hernan want to move this out of herd7 and into the memory model. > Overall, this discussion rather seems to confirm the close link between > tools/memory-model/ and herdtools7. (After all, to what extent could > any putative RMW_MB be considered "explicit" without _knowing the under- > lying representation of the RMW operations...) My understanding is that > this discussion was at least in part motivated by a desire to experiment > and familiarize with the current herd representation (that does indeed > require some getting-used-to...); this suggests, as some of you already > mentioned, to add some comments or a .txt in tools/memory-model/ in order > to document such representation and ameliorate that experience. OTOH, I > must admit, I'm unable to see here sufficient motivation(tm) for changing > the current representation (and model): not the how, but the why... Well, it's not a big change. And in my opinion, if something can be moved out of herd7's innards and into the memory model files, then doing so is generally a good idea. However, I do agree that there will still be a close link between tools/memory-model/ and herdtools7. This may be unavoidable, at least to some extent, but any way to reduce it is worth considering. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-22 18:20 ` Alan Stern @ 2024-05-22 19:48 ` Hernan Ponce de Leon 2024-05-23 9:04 ` Andrea Parri 0 siblings, 1 reply; 33+ messages in thread From: Hernan Ponce de Leon @ 2024-05-22 19:48 UTC (permalink / raw) To: Alan Stern, Andrea Parri Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On 5/22/2024 8:20 PM, Alan Stern wrote: > On Wed, May 22, 2024 at 06:54:25PM +0200, Andrea Parri wrote: >> Alan, all, >> >> ("randomly" picking a recent post in the thread, after having observed >> this discussion for a while...) >> >>> It would be better if there was a way to tell herd7 not to add the 'mb >>> tag to failed instructions in the first place. This approach is >>> brittle; see below. >> >> AFAIU, changing the herd representation to generate mb-accesses in place >> of certain mb-fences... > > I believe herd7 already generates mb accesses (not fences) for certain > RMW operations. But then it does some post-processing on them, and that > post-processing is what we are thinking of changing. > >>> If you do want to use this approach, it should be simplified. All you >>> need is: >>> >>> [M] ; po ; [RMW_MB] >>> >>> [RMW_MB] ; po ; [M] >>> >>> This is because events tagged with RMW_MB always are memory accesses, >>> and accesses that aren't part of the RMW are already covered by the >>> fencerel(Mb) thing above. >> >> ... and updating the .cat file to the effects of something like >> >> -let mb = ([M] ; fencerel(Mb) ; [M]) | >> +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) | >> >> ... can hardly be called "making RMW barriers explicit". (So much so >> that the first commit in PR #865 was titled "Remove explicit barriers >> from RMWs". :-)) > > There is another point, something we didn't spell out explicitly in the > email discussion. Namely, in linux-kernel.def there is a long list of > instructions along with corresponding herd7 implementation instructions, > and those instructions explicitly contain either {once}, {acquire}, > {release}, or {mb} tags. So to a large extent, these barriers already > are explicit in the memory model. Not in the .cat file, but in the .def > file. > > What is not so explicit is how the {mb} tag works. Its operation isn't > as simple as the operation of the {acquire} and {release} tags; those > just modify the R or W access in the RMW pair as you would expect. > Instead, an {mb} tag says to insert strong memory barriers before the R > access and after the W access. This is more or less what the > post-processing mentioned earlier does, and Jonas and Hernan want to > move this out of herd7 and into the memory model. > >> Overall, this discussion rather seems to confirm the close link between >> tools/memory-model/ and herdtools7. (After all, to what extent could >> any putative RMW_MB be considered "explicit" without _knowing the under- >> lying representation of the RMW operations...) My understanding is that >> this discussion was at least in part motivated by a desire to experiment >> and familiarize with the current herd representation (that does indeed >> require some getting-used-to...); this suggests, as some of you already >> mentioned, to add some comments or a .txt in tools/memory-model/ in order >> to document such representation and ameliorate that experience. OTOH, I >> must admit, I'm unable to see here sufficient motivation(tm) for changing >> the current representation (and model): not the how, but the why... > > Well, it's not a big change. And in my opinion, if something can be > moved out of herd7's innards and into the memory model files, then doing > so is generally a good idea. > > However, I do agree that there will still be a close link between > tools/memory-model/ and herdtools7. This may be unavoidable, at least > to some extent, but any way to reduce it is worth considering. I can give my motivation as a tool developer. It would be much simpler if one could find all the information to support lkmm in tools/memory-model/ (in the form of the model + some comments or a .txt to cover those things we cannot move out of the tool implementation), rather than having to dive into herd7 code. Hernan > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-22 19:48 ` Hernan Ponce de Leon @ 2024-05-23 9:04 ` Andrea Parri 0 siblings, 0 replies; 33+ messages in thread From: Andrea Parri @ 2024-05-23 9:04 UTC (permalink / raw) To: Hernan Ponce de Leon Cc: Alan Stern, Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes > It would be much simpler if > one could find all the information to support lkmm in tools/memory-model/ > (in the form of the model + some comments or a .txt to cover those things we > cannot move out of the tool implementation), rather than having to dive into > herd7 code. Got it. And I do find that very relatable to LKMM developers who, like me, are definitely not fluent in OCaml. :-/ Let me draft some .txt to such effect, based on but expanding and hopefully completing my previous remarks in https://lore.kernel.org/lkml/ZjrSm119+IfIU9eU@andrea/ Having something like that "on paper" can also help evaluate future changes to the tool and/or model. Thank you for the suggestion. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-22 16:54 ` Andrea Parri 2024-05-22 18:20 ` Alan Stern @ 2024-05-23 14:27 ` Jonas Oberhauser 2024-05-23 16:35 ` Andrea Parri 1 sibling, 1 reply; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-23 14:27 UTC (permalink / raw) To: Andrea Parri, Alan Stern Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Am 5/22/2024 um 6:54 PM schrieb Andrea Parri: > Alan, all, > > ("randomly" picking a recent post in the thread, after having observed > this discussion for a while...) > >> It would be better if there was a way to tell herd7 not to add the 'mb >> tag to failed instructions in the first place. This approach is >> brittle; see below. > > AFAIU, changing the herd representation to generate mb-accesses in place > of certain mb-fences... > >> If you do want to use this approach, it should be simplified. All you >> need is: >> >> [M] ; po ; [RMW_MB] >> >> [RMW_MB] ; po ; [M] >> >> This is because events tagged with RMW_MB always are memory accesses, >> and accesses that aren't part of the RMW are already covered by the >> fencerel(Mb) thing above. > > .. and updating the .cat file to the effects of something like > > -let mb = ([M] ; fencerel(Mb) ; [M]) | > +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) | > > .. can hardly be called "making RMW barriers explicit". (So much so > that the first commit in PR #865 was titled "Remove explicit barriers > from RMWs". :-)) > > Overall, this discussion rather seems to confirm the close link between > tools/memory-model/ and herdtools7. (After all, to what extent could > any putative RMW_MB be considered "explicit" without _knowing the under- > lying representation of the RMW operations...) My view is a bit different. There's more or less standard theory for weak memory models, including how operations at the source code level map to events in the graph. Part of that standard theory are relations like rmw, and the circumstances under which they appear in the graph. You'll see these relations in generic papers about weak memory models, like GenMC, Hahn, etc. as well as in pretty much every specific memory model like TSO, C11, PowerPC, VMM, etc., totally independently of herd7 (even though these notations have historically developed together with herd). Naively I would expect that a tool like herd7 would be a generic WMM exploration tool, which follows these standards with a very obvious 1:1 mapping of what we see in the code and the events we see in the graph, plus perhaps a thin and explicit configurable herd7-specific mapping layer like what we see in linux_kernel.cat to configure the syntax for a specific model. In that mapping layer, we currently see that xchg() is an exchange operation with an Mb tag. Just like an smp_load_acquire is a read with an acquire tag. Or so it seems. Instead, we find that contrary to what's written in that file, and contrary to the conventions, an xchg() is an F[mb] ; R[once] ; W[once] ; F[Mb]. And in fact a cmp_xchg could even be a R[once]. That's because the herd7 tool isn't quite as generic as one might think, but rather specifically "detects" that it's running the LKMM and then the mapping isn't what you'd naively think, but something hidden in the OCaml implementation of herd7. That would be comparable to a popular tool for matrix calculations using the syntax [ 10 5 4 3 4 2 ] to define a 2x3 matrix, unless one of the values is -3, in which case it becomes a vector of 6 elements, because that's what a really important user of the tool wanted, and then didn't see enough of a need to change. My point is that to anyone who has seen standard matrix notation, this syntax in a matrix-computation-tool looks like it would be a matrix, right? And it usually is a matrix; then it should always be a matrix. I usually say I don't look much at natural language documentation and only read code, because code doesn't lie. In this case, the natural language documentation is saying the correct thing thanks to the hard work of a lot of people, but the code (in .cat etc.) doesn't do what it seems to be doing. And I think that should be changed, both to reduce the anyways high mental load of reading the code without having to do mental transformations in your head, and also to make herd more Lkmm-agnostic. In the ideal world, herd doesn't know anything about Lkmm except what we tell it through tools/memory-model, and generic things like C syntax + semantics. So when using syntax like dom(rmw), I don't see it as confirming the close link to herd more than before, but rather depending more on standard notations and conventions, and relying on herd's close link to those standard notations (such as using rmw for the rmw relation and dom(r) for the domain of r) for dom(rmw) to mean what anyone who has deeply read a couple of modern WMM papers would expect. > My understanding is that > this discussion was at least in part motivated by a desire to experiment > and familiarize with the current herd representation I would phrase it more extreme: I want to get rid of the unnecessary non-standard parts of the herd representation. Those parts have led me astray several times. Ok, who cares about me, but even Luc once forgot about the non-standard parts and thought it is a bug: https://github.com/herd/herdtools7/issues/384#issuecomment-1132859904 I also know that Viktor stumbled over it before and also suggested we change it. I think there's 0 benefit to them, they're just wasting people's time and energy and lead to misunderstanding. Ok, this e-mail became long. Hope it at least helps clarify my motivation :)) jonas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 14:27 ` Jonas Oberhauser @ 2024-05-23 16:35 ` Andrea Parri 2024-05-23 20:30 ` Jonas Oberhauser 0 siblings, 1 reply; 33+ messages in thread From: Andrea Parri @ 2024-05-23 16:35 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes > I would phrase it more extreme: I want to get rid of the unnecessary > non-standard parts of the herd representation. Please indulge the thought that what might appear to be "non-standard" to one or one's community might appear differently to others. Continuing with the example above, I'm pretty sure your "standard" and simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads will turn up the nose of many kernel developers... The current repre- sentation for xchg() was described in the ASPLOS'18 work and it's been used (& tested) upstream since the LKMM was first merged ~6 years ago. But that's not the point, "standards" can change and certainly kernels and tools do. My remark was more to point out that you're not getting rid of anything..., you're simply proposing a different representation (asking kernel developers & maintainers to "deal with it"): here's why I was and I am looking forward to something more than "because we can". Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 16:35 ` Andrea Parri @ 2024-05-23 20:30 ` Jonas Oberhauser 2024-05-24 3:30 ` Andrea Parri 2024-05-24 8:16 ` Hernan Ponce de Leon 0 siblings, 2 replies; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-23 20:30 UTC (permalink / raw) To: Andrea Parri Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Am 5/23/2024 um 6:35 PM schrieb Andrea Parri: >> I would phrase it more extreme: I want to get rid of the unnecessary >> non-standard parts of the herd representation. > > Please indulge the thought that what might appear to be "non-standard" > to one or one's community might appear differently to others. Certainly. And of course, the formalization of the LKMM doesn't have to be optimized for the WMM community, even though I suspect that they (and possibly the tools they develop) are the main direct consumers of the formalization. > Continuing with the example above, I'm pretty sure your "standard" and > simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads > will turn up the nose of many kernel developers... I'm not sure how true that is. Firstly I'm not sure how many kernel developers really read the formalization and try to see what it does, rather than relying on tools to gain an intuition of what's going on. But let's say a kernel developer wants to make sure that their intuition of how cmpxchg works is matched by the formal model, e.g., they want to double check that the formal model is correct after they got some unexpected results in a herd7 litmus test. How would they go about it today? The only way is to read the OCaml source code, because there's no other code that obviously defines the behavior. At best, they would read atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W) atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W) atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W) see the Acquire, Release, and Mb references in the model, and think "ok, so '__cmpxchg{acquire}' means I get an Acquire tag in the success case which gives acq_po ordering, '__cmpxchg{release}' means I get a Release tag in the success case which gives po_rel ordering. Therefore '__cmpxchg{mb}' must give me an Mb tag in the success case. That would give me mb ordering, but not between the store and subsequent operations, and that's just wrong." But of course this "understanding by analogy" is broken, because there's a bunch of special OCaml match expressions in herd that look only for release or mb and just give totally different behavior for that one case. Every other tag behaves exactly the same way. At worst they wouldn't even guess that this only is the success ordering (that's definitely what happened to us in the weak memory model community, but we don't matter for this argument). A better situation would be to do something like this: (*success*)(*fail*) atomic_cmpxchg_acquire(X,V,W) __cmpxchg {acquire} {once} (X,V,W) atomic_cmpxchg_release(X,V,W) __cmpxchg {release} {once} (X,V,W) atomic_cmpxchg(X,V,W) __cmpxchg {mb} {once} (X,V,W) and being explicit in the model about what the Mb tag does: | [M];fencerel(Mb);[M] (* smp_mb() and operations with Mb ordering such as cmpxchg() *) | [M];po;[R & Mb] (* reads with an Mb tag - coming from full fence rmws - are ordered as-if there was an smp_mb() right before the read *) | [W & Mb];po;[M] (* correspondingly, writes with an Mb tag are ordered as-if there was an smp_mb() right after the write *) And suddenly you can read the model and map it 1:1 to the intuition that you can treat a successful cmpxchg() (or an xchg() etc.) as-if it was enclosed by smp_mb(). There doesn't need to be a real fence. > The current repre- > sentation for xchg() was described in the ASPLOS'18 work Do you mean the one example in Table 3? What about cmpxchg() or cmpxchg_acquire()? > But that's not the point, "standards" can change and certainly kernels > and tools do. My remark was more to point out that you're not getting > rid of anything..., We're definitely getting rid of some lines in herd7, that have been added solely for dealing with this specific case of LKMM. For example, there's some code looking like this (for a memory ordering tag `a`) (match a with ["release"] -> a | _ -> a_once) which specifically refers to the release tag in LKMM and we can turn that into a with no more reference to any LKMM tags. Of course, this is not the Linux community's problem, just the herd (and others who want to use the literal cat file of LKMM) maintainers who have to deal with it. And imagine that at some point you want to add another tag to the linux kernel - like for example for 'accessing an atomic in initialization code, so that the compiler can do optimizations like merge a bunch of bitwise operations'. Let's call it 'init. What will be the behavior of WRITE_INIT(X,V) __store{init}(X,V); with the current herd7? Honestly I have no clue, because there might be a (match a with ["release"] -> a | _ -> a_once) somewhere that will turn this 'init into 'once. We'd have to comb the herd7 codebase to know for sure (or hope that our experiments are sufficiently thorough to catch all cases). >you're simply proposing a different representation I wouldn't phrase it like that, since it's a representation many people familiar with WMM would expect, but admittedly that doesn't have to be the deciding factor. > asking kernel developers & maintainers to "deal with it" Deal with what, no longer having to learn OCaml to be sure that the LKMM's formal definition matches the description in memory_barriers.txt? Otherwise, I don't see anything that changes for the worse. With the change, people need to think for a few seconds to see that the explicit rules in the .cat file are a formalization of "treat xchg() as if it had smp_mb() before and after". Without the change, they need to read an OCaml codebase to see that is meant by atomic_xchg(X,V,W) __xchg{mb}(X,V,W) So I don't see any downsides. I would also support making the representation explicit in the .def files instead, with something like cmp_xchg(x,e,v) = { if (*x==e) { smp_mb(); int r = __cmp_xchg{once}(x,e,v) ; smp_mb(); return r } else { __read{once}(x) } } Then you get to keep the representation. Without knowing herd's internals, I have no idea of how to seriously specify a meta language so that herd could effectively and efficiently deal with it in general. But one could at least envision some specific syntax for cmpxchg with a code sequence for the failure case and a code sequence for the success case. Personally I don't prefer this option, firstly because I don't see a good reason for the Linux community to go their own way here. I don't think there's really much of a problem with saying "this is the intuition; this is how we formalize it" and for the two not to be completely identical. It happens all the time, and in this case the gap between "we formalize it by really having two smp_mb() appear in the graph if it's successful" and "we formalize it by providing the same ordering that the smp_mb() would have if it was there" isn't big. Especially for those kernel people who have enough motivation to actually deep dive into the formalization in the first place. But it makes it a lot more accessible for the WMM community, which can only benefit LKMM. And secondly because people will probably mostly focus on the .cat file, which means that it's still a bit of a booby trap to put something so important (and perhaps surprising) into the .def file, but at least one that is in the open for people who are more careful and also read the .def file. > I am looking forward to something more than "because we can". - it makes it easier to maintain the LKMM in the future, because you don't have to work around hidden transformations inside herd7 - it makes implicit behavior explicit - it makes it easier to understand that the formalization matches the intention - it makes it easier to learn the LKMM from the formalization without having to cross-reference every bit with the informal documentation to avoid misunderstandings Have a good time, jonas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 20:30 ` Jonas Oberhauser @ 2024-05-24 3:30 ` Andrea Parri 2024-05-24 8:16 ` Hernan Ponce de Leon 1 sibling, 0 replies; 33+ messages in thread From: Andrea Parri @ 2024-05-24 3:30 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes > Do you mean the one example in Table 3? > What about cmpxchg() or cmpxchg_acquire()? Yes, Table 3. The cmpxchg*() primitives were not discussed in the paper. IIRC, their representation has not changed since at least 1c27b644c0fd. > We're definitely getting rid of some lines in herd7, that have been added > solely for dealing with this specific case of LKMM. Good. If the herd7 maintainers are "tired" of dealing with those lines, that's definitely a big fat "why" to put in a changelog. > Deal with what, no longer having to learn OCaml to be sure that the LKMM's > formal definition matches the description in memory_barriers.txt? Nope. ;-) Dealing with the review, testing, and maintainance of a new representation. > - it makes it easier to maintain the LKMM in the future, because you don't > have to work around hidden transformations inside herd7 > - it makes implicit behavior explicit > - it makes it easier to understand that the formalization matches the > intention > - it makes it easier to learn the LKMM from the formalization without having > to cross-reference every bit with the informal documentation to avoid > misunderstandings Jonas - You write "less hidden", "less implicit", but I keep reading "a representation I/some people would expect". We've already acknowledged that's no deciding factor to abandon the current seasoned representation. Andrea ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 20:30 ` Jonas Oberhauser 2024-05-24 3:30 ` Andrea Parri @ 2024-05-24 8:16 ` Hernan Ponce de Leon 1 sibling, 0 replies; 33+ messages in thread From: Hernan Ponce de Leon @ 2024-05-24 8:16 UTC (permalink / raw) To: Jonas Oberhauser, Andrea Parri Cc: Alan Stern, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On 5/23/2024 10:30 PM, Jonas Oberhauser wrote: > > > Am 5/23/2024 um 6:35 PM schrieb Andrea Parri: >>> I would phrase it more extreme: I want to get rid of the unnecessary >>> non-standard parts of the herd representation. >> >> Please indulge the thought that what might appear to be "non-standard" >> to one or one's community might appear differently to others. > > Certainly. And of course, the formalization of the LKMM doesn't have to > be optimized for the WMM community, even though I suspect that they (and > possibly the tools they develop) are the main direct consumers of the > formalization. > >> Continuing with the example above, I'm pretty sure your "standard" and >> simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads >> will turn up the nose of many kernel developers... > > I'm not sure how true that is. Firstly I'm not sure how many kernel > developers really read the formalization and try to see what it does, > rather than relying on tools to gain an intuition of what's going on. > > But let's say a kernel developer wants to make sure that their intuition > of how cmpxchg works is matched by the formal model, e.g., they want to > double check that the formal model is correct > after they got some unexpected results in a herd7 litmus test. > > How would they go about it today? The only way is to read the OCaml > source code, because there's no other code that obviously defines the > behavior. At best, they would read > > atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W) > atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W) > atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W) > > see the Acquire, Release, and Mb references in the model, and think "ok, > so '__cmpxchg{acquire}' means I get an Acquire tag in the success case > which gives acq_po ordering, '__cmpxchg{release}' means I get a Release > tag in the success case which gives po_rel ordering. Therefore > '__cmpxchg{mb}' must give me an Mb tag in the success case. That would > give me mb ordering, but not between the store and subsequent > operations, and that's just wrong." > > But of course this "understanding by analogy" is broken, because there's > a bunch of special OCaml match expressions in herd that look only for > release or mb and just give totally different behavior for that one > case. Every other tag behaves exactly the same way. > > At worst they wouldn't even guess that this only is the success ordering > (that's definitely what happened to us in the weak memory model > community, but we don't matter for this argument). > > A better situation would be to do something like this: > > (*success*)(*fail*) > atomic_cmpxchg_acquire(X,V,W) __cmpxchg {acquire} {once} (X,V,W) > atomic_cmpxchg_release(X,V,W) __cmpxchg {release} {once} (X,V,W) > atomic_cmpxchg(X,V,W) __cmpxchg {mb} {once} (X,V,W) I also think this is the best option. We could change the internal representation of __cmpxchg in herd7 to have two memory orders (*) similar to the current internal representation of C11 CAS and get the memory order to be used for the success/fail case from the .def instead of having this hardcoded in the code. (*) Not to be confused to how LKMM sees a CAS instruction. Hernan > > and being explicit in the model about what the Mb tag does: > > | [M];fencerel(Mb);[M] (* smp_mb() and operations with Mb ordering > such as cmpxchg() *) > | [M];po;[R & Mb] (* reads with an Mb tag - coming from full fence > rmws - are ordered as-if there was an smp_mb() right before the read *) > | [W & Mb];po;[M] (* correspondingly, writes with an Mb tag are > ordered as-if there was an smp_mb() right after the write *) > > And suddenly you can read the model and map it 1:1 to the intuition that > you can treat a successful cmpxchg() (or an xchg() etc.) as-if it was > enclosed by smp_mb(). > > There doesn't need to be a real fence. > > >> The current repre- >> sentation for xchg() was described in the ASPLOS'18 work > > Do you mean the one example in Table 3? > What about cmpxchg() or cmpxchg_acquire()? > >> But that's not the point, "standards" can change and certainly kernels >> and tools do. My remark was more to point out that you're not getting >> rid of anything..., > > We're definitely getting rid of some lines in herd7, that have been > added solely for dealing with this specific case of LKMM. > > For example, there's some code looking like this (for a memory ordering > tag `a`) > > (match a with ["release"] -> a | _ -> a_once) > > which specifically refers to the release tag in LKMM and we can turn > that into > > a > > with no more reference to any LKMM tags. Of course, this is not the > Linux community's problem, just the herd (and others who want to use the > literal cat file of LKMM) maintainers who have to deal with it. > > And imagine that at some point you want to add another tag to the linux > kernel - like for example for 'accessing an atomic in initialization > code, so that the compiler can do optimizations like merge a bunch of > bitwise operations'. Let's call it 'init. > > What will be the behavior of > > WRITE_INIT(X,V) __store{init}(X,V); > > with the current herd7? Honestly I have no clue, because there might be a > > (match a with ["release"] -> a | _ -> a_once) > > somewhere that will turn this 'init into 'once. We'd have to comb the > herd7 codebase to know for sure (or hope that our experiments are > sufficiently thorough to catch all cases). > > >you're simply proposing a different representation > > I wouldn't phrase it like that, since it's a representation many people > familiar with WMM would expect, but admittedly that doesn't have to be > the deciding factor. > >> asking kernel developers & maintainers to "deal with it" > > Deal with what, no longer having to learn OCaml to be sure that the > LKMM's formal definition matches the description in memory_barriers.txt? > Otherwise, I don't see anything that changes for the worse. > With the change, people need to think for a few seconds to see that the > explicit rules in the .cat file are a formalization of "treat xchg() as > if it had smp_mb() before and after". > Without the change, they need to read an OCaml codebase to see that is > meant by > > atomic_xchg(X,V,W) __xchg{mb}(X,V,W) > > So I don't see any downsides. > > > I would also support making the representation explicit in the .def > files instead, with something like > > cmp_xchg(x,e,v) = { if (*x==e) { smp_mb(); int r = > __cmp_xchg{once}(x,e,v) ; smp_mb(); return r } else { __read{once}(x) } } > > Then you get to keep the representation. > Without knowing herd's internals, I have no idea of how to seriously > specify a meta language so that herd could effectively and efficiently > deal with it in general. But one could at least envision some specific > syntax for cmpxchg with a code sequence for the failure case and a code > sequence for the success case. > > Personally I don't prefer this option, firstly because I don't see a > good reason for the Linux community to go their own way here. I don't > think there's really much of a problem with saying "this is the > intuition; this is how we formalize it" and for the two not to be > completely identical. It happens all the time, and in this case the gap > between "we formalize it by really having two smp_mb() appear in the > graph if it's successful" and "we formalize it by providing the same > ordering that the smp_mb() would have if it was there" isn't big. > Especially for those kernel people who have enough motivation to > actually deep dive into the formalization in the first place. But it > makes it a lot more accessible for the WMM community, which can only > benefit LKMM. > > And secondly because people will probably mostly focus on the .cat file, > which means that it's still a bit of a booby trap to put something so > important (and perhaps surprising) into the .def file, but at least one > that is in the open for people who are more careful and also read the > .def file. > > > I am looking forward to something more than "because we can". > > - it makes it easier to maintain the LKMM in the future, because you > don't have to work around hidden transformations inside herd7 > - it makes implicit behavior explicit > - it makes it easier to understand that the formalization matches the > intention > - it makes it easier to learn the LKMM from the formalization without > having to cross-reference every bit with the informal documentation to > avoid misunderstandings > > > Have a good time, > jonas > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-22 14:20 ` Alan Stern 2024-05-22 16:54 ` Andrea Parri @ 2024-05-23 12:54 ` Jonas Oberhauser 2024-05-23 13:35 ` Paul E. McKenney 2024-05-23 14:05 ` Alan Stern 1 sibling, 2 replies; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-23 12:54 UTC (permalink / raw) To: Alan Stern Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Am 5/22/2024 um 4:20 PM schrieb Alan Stern: > On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote: >> >> >> Am 5/21/2024 um 5:36 PM schrieb Alan Stern: >>> On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote: >>>> >>>> >>>> Am 5/18/2024 um 2:31 AM schrieb Alan Stern: >>>>> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: >>>>>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: >>>>>>> >>>>>>> >>>>>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern: >>>>>>>> Hernan and Jonas: >>>>>>>> >>>>>>>> Can you explain more fully the changes you want to make to herd7 and/or >>>>>>>> the LKMM? The goal is to make the memory barriers currently implicit in >>>>>>>> RMW operations explicit, but I couldn't understand how you propose to do >>>>>>>> this. >>>>>>>> >>>>>>>> Are you going to change herd7 somehow, and if so, how? It seems like >>>>>>>> you should want to provide sufficient information so that the .bell >>>>>>>> and .cat files can implement the appropriate memory barriers associated >>>>>>>> with each RMW operation. What additional information is needed? And >>>>>>>> how (explained in English, not by quoting source code) will the .bell >>>>>>>> and .cat files make use of this information? >>>>>>>> >>>>>>>> Alan >>>>>>> >>>>>>> >>>>>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the >>>>>>> following: >>>>>>> - if a tag called Mb appears on an rmw instruction (by instruction I >>>>>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with >>>>>>> one of those things: >>>>>>> * full mb ; once (the rmw) ; full mb, if a value returning >>>>>>> (successful) rmw >>>>>>> * once (the rmw) otherwise >>>>>>> - everything else gets translated 1:1 into some internal representation >>>>>> >>>>>> This is my understanding from reading the source code of CSem.ml in herd7's >>>>>> repo. >>>>>> >>>>>> Also, this is exactly what dartagnan is currently doing. >>>>>> >>>>>>> >>>>>>> What I'm proposing is: >>>>>>> 1. remove this transpilation step, >>>>>>> 2. and instead allow the Mb tag to actually appear on RMW instructions >>>>>>> 3. change the cat file to explicitly define the behavior of the Mb tag >>>>>>> on RMW instructions >>>>>> >>>>>> These are the exact 3 things I changed in dartagnan for testing what Jonas >>>>>> proposed. >>>>>> >>>>>> I am not sure if further changes are needed for herd7. >>> >>> What about failed RMW instructions? IIRC, herd7 generates just an R for >>> these, not both R and W, but won't it still be annotated with an mb tag? >>> And wasn't this matter of failed RMWs one of the issues that the two of >>> you specifically wanted to make explicit in the memory model, rather >>> than implicit in the operation of herd7? >> >> That's why we use the RMW_MB tag. I should have copied that definition too: >> >> >> (* full barrier events that appear in non-failing RMW *) >> let RMW_MB = Mb & (dom(rmw) | range(rmw)) >> >> >> This ensures that the only successful rmw instructions have an RMW_MB tag. > > It would be better if there was a way to tell herd7 not to add the 'mb > tag to failed instructions in the first place. This approach is > brittle; see below. Hernan told me that in fact that is actually currently the case in herd7. Failing RMW get assigned the Once tag implicitly. Another thing that I'd suggest to change. > > An alternative would be to have a way for the .cat file to remove the > 'mb tag from a failed RMW instruction. But I don't know if this is > feasible. For Mb it's feasible, as there is no Mb read or Mb store. Mb = Mb & (~M | dom(rmw) | range(rmw)) However one would want to do the same for Acq and Rel. For that one would need to distinguish e.g. between a read that comes from a failed rmw instruction, and where the tag would disappear, or a normal standalone read. For example, by using two different acquire tags, 'acquire and 'rmw-acquire, and defining Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw)) Anyways we can do this change independently. So for now, we don't need RMW_MB. > >>> And wasn't another one of these issues the difference between >>> value-returning and non-value-returning RMWs? As far as I can, nothing >>> in the .def file specifically mentions this. There's the noreturn tag >>> in the .bell file, but nothing in the .def file says which instructions >>> it applies to. Or are we supposed to know that it automatically applies >>> to all __atomic_op() instances? >> >> Ah, now you're already forestalling my next suggestion :)) >> >> I would say let's fix one issue at a time (learned this from Andrea). >> >> For the other issue, do noreturn rmws always have the same ordering as once? > > If they aren't annotated with _acquire or _release then yes... And I > don't know whether there _are_ any annotated no-return RMWs. If > somebody wanted such a thing, they probably would just use a > value-returning RMW instead. > >> I suspect we need to extend herd slightly more to support the second one, I >> don't know if there's syntax for passing tags to __atomic_op. > > This may not be be needed. But still, it would nice to be explicit (in > a comment in the .def file if nowhere else) that __atomic_op always adds > a 'noreturn tag. > >>>> instructions RMW[{'once,'acquire,'release,'mb}] >>>> >>>> then the Mb tags would appear in the graph. And then I'd define the ordering >>>> explicitly. One way is to say that an Mb tag orders all memory accesses >>>> before(or at) the tag with all memory accesses after(or at) the tag, except >>>> the accesses of the rmw with each other. >>> >>> I don't think you need to add very much. The .cat file already defines >>> the mb relation as including the term: >>> >>> ([M] ; fencerel(Mb) ; [M]) >>> >>> All that's needed is to replace the fencerel(Mb) with something more >>> general... > > And this is why I said the RMW_MB mechanism is brittle. With the 'mb > tag still added to failed RMW events, the term above will cause the > memory model to think there is ordering even though the event isn't in > the RMW_MB class. > Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po (where F includes standalone fences). But looking into the stdlib.cat, you're right. >>> Also, is the "\ rmw" part really necessary? I don't think it makes any >>> difference; the memory model already knows that the read part of an RMW >>> has to happen before the write part. >> >> It unfortunately does make a difference, because mb is a strong fence. >> This means that an Mb in an rmw sequence would create additional pb edges. >> >> prop;(rfe ; [Mb];rmw;[Mb]) >> >> I believe this is might give wrong results on powerpc, but I'd need to >> double check. > > PowerPC uses a load-reserve/store-conditional approach for RMW, so it's > tricky. However, you're right that having a fictitious mb between the > read and the write of an RMW would mean that the preceding (in coherence > order) write would have to be visible to all CPUs before the RMW write > could execute, and I don't believe we want to assert this. > >>>> One could also split it into two rules to keep with the "two full fences" >>>> analogy. Maybe a good way would be like this: >>>> >>>> [M] ; po; [RMW_MB & R] ; po^? ; [M] >>>> >>>> [M] ; po^?; [RMW_MB & W] ; po ; [M] >>> >>> My preference is for the first approach. >> >> That was also my early preference, but to be honest I expected that you'd >> prefer the second approach. >> Actually, I also started warming up to it. > > If you do want to use this approach, it should be simplified. All you > need is: > > [M] ; po ; [RMW_MB] > > [RMW_MB] ; po ; [M] > > This is because events tagged with RMW_MB always are memory accesses, > and accesses that aren't part of the RMW are already covered by the > fencerel(Mb) thing above. This has exactly the issue mentioned above - it will cause the rmw to have an internal strong fence that on powerpc probably isn't there. We could do (with the assumption that Mb applies only to successful rmw): [M] ; po ; [Mb & R] [Mb & W] ; po ; [M] Kind regards, jonas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 12:54 ` Jonas Oberhauser @ 2024-05-23 13:35 ` Paul E. McKenney 2024-05-23 14:05 ` Alan Stern 1 sibling, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2024-05-23 13:35 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, Hernan Ponce de Leon, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote: > Am 5/22/2024 um 4:20 PM schrieb Alan Stern: > > On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote: > > > Am 5/21/2024 um 5:36 PM schrieb Alan Stern: > > > > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote: > > > > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern: > > > > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: > > > > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: > > > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern: > > > > > > > > > Hernan and Jonas: > > > > > > > > > > > > > > > > > > Can you explain more fully the changes you want to make to herd7 and/or > > > > > > > > > the LKMM? The goal is to make the memory barriers currently implicit in > > > > > > > > > RMW operations explicit, but I couldn't understand how you propose to do > > > > > > > > > this. > > > > > > > > > > > > > > > > > > Are you going to change herd7 somehow, and if so, how? It seems like > > > > > > > > > you should want to provide sufficient information so that the .bell > > > > > > > > > and .cat files can implement the appropriate memory barriers associated > > > > > > > > > with each RMW operation. What additional information is needed? And > > > > > > > > > how (explained in English, not by quoting source code) will the .bell > > > > > > > > > and .cat files make use of this information? > > > > > > > > > > > > > > > > > > Alan > > > > > > > > > > > > > > > > > > > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the > > > > > > > > following: > > > > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I > > > > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with > > > > > > > > one of those things: > > > > > > > > * full mb ; once (the rmw) ; full mb, if a value returning > > > > > > > > (successful) rmw > > > > > > > > * once (the rmw) otherwise > > > > > > > > - everything else gets translated 1:1 into some internal representation > > > > > > > > > > > > > > This is my understanding from reading the source code of CSem.ml in herd7's > > > > > > > repo. > > > > > > > > > > > > > > Also, this is exactly what dartagnan is currently doing. > > > > > > > > > > > > > > > > > > > > > > > What I'm proposing is: > > > > > > > > 1. remove this transpilation step, > > > > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions > > > > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag > > > > > > > > on RMW instructions > > > > > > > > > > > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas > > > > > > > proposed. > > > > > > > > > > > > > > I am not sure if further changes are needed for herd7. > > > > > > > > What about failed RMW instructions? IIRC, herd7 generates just an R for > > > > these, not both R and W, but won't it still be annotated with an mb tag? > > > > And wasn't this matter of failed RMWs one of the issues that the two of > > > > you specifically wanted to make explicit in the memory model, rather > > > > than implicit in the operation of herd7? > > > > > > That's why we use the RMW_MB tag. I should have copied that definition too: > > > > > > > > > (* full barrier events that appear in non-failing RMW *) > > > let RMW_MB = Mb & (dom(rmw) | range(rmw)) > > > > > > > > > This ensures that the only successful rmw instructions have an RMW_MB tag. > > > > It would be better if there was a way to tell herd7 not to add the 'mb > > tag to failed instructions in the first place. This approach is > > brittle; see below. > > Hernan told me that in fact that is actually currently the case in herd7. > Failing RMW get assigned the Once tag implicitly. > Another thing that I'd suggest to change. Let's please be careful, though. There is a lot out there that depends on this semantic, odd though it might seem at first glance. ;-) Thanx, Paul > > An alternative would be to have a way for the .cat file to remove the > > 'mb tag from a failed RMW instruction. But I don't know if this is > > feasible. > > For Mb it's feasible, as there is no Mb read or Mb store. > > Mb = Mb & (~M | dom(rmw) | range(rmw)) > > However one would want to do the same for Acq and Rel. > > For that one would need to distinguish e.g. between a read that comes from a > failed rmw instruction, and where the tag would disappear, or a normal > standalone read. > > For example, by using two different acquire tags, 'acquire and 'rmw-acquire, > and defining > > Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw)) > > Anyways we can do this change independently. So for now, we don't need > RMW_MB. > > > > > > > > And wasn't another one of these issues the difference between > > > > value-returning and non-value-returning RMWs? As far as I can, nothing > > > > in the .def file specifically mentions this. There's the noreturn tag > > > > in the .bell file, but nothing in the .def file says which instructions > > > > it applies to. Or are we supposed to know that it automatically applies > > > > to all __atomic_op() instances? > > > > > > Ah, now you're already forestalling my next suggestion :)) > > > > > > I would say let's fix one issue at a time (learned this from Andrea). > > > > > > For the other issue, do noreturn rmws always have the same ordering as once? > > > > If they aren't annotated with _acquire or _release then yes... And I > > don't know whether there _are_ any annotated no-return RMWs. If > > somebody wanted such a thing, they probably would just use a > > value-returning RMW instead. > > > > > I suspect we need to extend herd slightly more to support the second one, I > > > don't know if there's syntax for passing tags to __atomic_op. > > > > This may not be be needed. But still, it would nice to be explicit (in > > a comment in the .def file if nowhere else) that __atomic_op always adds > > a 'noreturn tag. > > > > > > > instructions RMW[{'once,'acquire,'release,'mb}] > > > > > > > > > > then the Mb tags would appear in the graph. And then I'd define the ordering > > > > > explicitly. One way is to say that an Mb tag orders all memory accesses > > > > > before(or at) the tag with all memory accesses after(or at) the tag, except > > > > > the accesses of the rmw with each other. > > > > > > > > I don't think you need to add very much. The .cat file already defines > > > > the mb relation as including the term: > > > > > > > > ([M] ; fencerel(Mb) ; [M]) > > > > > > > > All that's needed is to replace the fencerel(Mb) with something more > > > > general... > > > > And this is why I said the RMW_MB mechanism is brittle. With the 'mb > > tag still added to failed RMW events, the term above will cause the > > memory model to think there is ordering even though the event isn't in > > the RMW_MB class. > > > > Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po (where > F includes standalone fences). But looking into the stdlib.cat, you're > right. > > > > > > Also, is the "\ rmw" part really necessary? I don't think it makes any > > > > difference; the memory model already knows that the read part of an RMW > > > > has to happen before the write part. > > > > > > It unfortunately does make a difference, because mb is a strong fence. > > > This means that an Mb in an rmw sequence would create additional pb edges. > > > > > > prop;(rfe ; [Mb];rmw;[Mb]) > > > > > > I believe this is might give wrong results on powerpc, but I'd need to > > > double check. > > > > PowerPC uses a load-reserve/store-conditional approach for RMW, so it's > > tricky. However, you're right that having a fictitious mb between the > > read and the write of an RMW would mean that the preceding (in coherence > > order) write would have to be visible to all CPUs before the RMW write > > could execute, and I don't believe we want to assert this. > > > > > > > One could also split it into two rules to keep with the "two full fences" > > > > > analogy. Maybe a good way would be like this: > > > > > > > > > > [M] ; po; [RMW_MB & R] ; po^? ; [M] > > > > > > > > > > [M] ; po^?; [RMW_MB & W] ; po ; [M] > > > > > > > > My preference is for the first approach. > > > > > > That was also my early preference, but to be honest I expected that you'd > > > prefer the second approach. > > > Actually, I also started warming up to it. > > > > If you do want to use this approach, it should be simplified. All you > > need is: > > > > [M] ; po ; [RMW_MB] > > > > [RMW_MB] ; po ; [M] > > > > This is because events tagged with RMW_MB always are memory accesses, > > and accesses that aren't part of the RMW are already covered by the > > fencerel(Mb) thing above. > > This has exactly the issue mentioned above - it will cause the rmw to have > an internal strong fence that on powerpc probably isn't there. > > We could do (with the assumption that Mb applies only to successful rmw): > > [M] ; po ; [Mb & R] > [Mb & W] ; po ; [M] > > > Kind regards, jonas > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 12:54 ` Jonas Oberhauser 2024-05-23 13:35 ` Paul E. McKenney @ 2024-05-23 14:05 ` Alan Stern 2024-05-23 14:26 ` Hernan Ponce de Leon 2024-05-23 14:36 ` Jonas Oberhauser 1 sibling, 2 replies; 33+ messages in thread From: Alan Stern @ 2024-05-23 14:05 UTC (permalink / raw) To: Jonas Oberhauser Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote: > > > Am 5/22/2024 um 4:20 PM schrieb Alan Stern: > > It would be better if there was a way to tell herd7 not to add the 'mb > > tag to failed instructions in the first place. This approach is > > brittle; see below. > > Hernan told me that in fact that is actually currently the case in herd7. > Failing RMW get assigned the Once tag implicitly. > Another thing that I'd suggest to change. Indeed. > > An alternative would be to have a way for the .cat file to remove the > > 'mb tag from a failed RMW instruction. But I don't know if this is > > feasible. > > For Mb it's feasible, as there is no Mb read or Mb store. > > Mb = Mb & (~M | dom(rmw) | range(rmw)) > > However one would want to do the same for Acq and Rel. > > For that one would need to distinguish e.g. between a read that comes from a > failed rmw instruction, and where the tag would disappear, or a normal > standalone read. > > For example, by using two different acquire tags, 'acquire and 'rmw-acquire, > and defining > > Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw)) > > Anyways we can do this change independently. So for now, we don't need > RMW_MB. Overall, it seems better to have herd7 assign the right tag, but change the way the .def file works so that it can tell herd7 which tag to use in each of the success and failure cases. > > [M] ; po ; [RMW_MB] > > > > [RMW_MB] ; po ; [M] > > > > This is because events tagged with RMW_MB always are memory accesses, > > and accesses that aren't part of the RMW are already covered by the > > fencerel(Mb) thing above. > > This has exactly the issue mentioned above - it will cause the rmw to have > an internal strong fence that on powerpc probably isn't there. Oops, that's right. Silly oversight on my part. But at least you understood what I meant. > We could do (with the assumption that Mb applies only to successful rmw): > > [M] ; po ; [Mb & R] > [Mb & W] ; po ; [M] That works. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 14:05 ` Alan Stern @ 2024-05-23 14:26 ` Hernan Ponce de Leon 2024-05-23 15:14 ` Boqun Feng 2024-05-23 16:05 ` Alan Stern 2024-05-23 14:36 ` Jonas Oberhauser 1 sibling, 2 replies; 33+ messages in thread From: Hernan Ponce de Leon @ 2024-05-23 14:26 UTC (permalink / raw) To: Alan Stern, Jonas Oberhauser Cc: Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On 5/23/2024 4:05 PM, Alan Stern wrote: > On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote: >> >> >> Am 5/22/2024 um 4:20 PM schrieb Alan Stern: >>> It would be better if there was a way to tell herd7 not to add the 'mb >>> tag to failed instructions in the first place. This approach is >>> brittle; see below. >> >> Hernan told me that in fact that is actually currently the case in herd7. >> Failing RMW get assigned the Once tag implicitly. >> Another thing that I'd suggest to change. > > Indeed. > >>> An alternative would be to have a way for the .cat file to remove the >>> 'mb tag from a failed RMW instruction. But I don't know if this is >>> feasible. >> >> For Mb it's feasible, as there is no Mb read or Mb store. >> >> Mb = Mb & (~M | dom(rmw) | range(rmw)) >> >> However one would want to do the same for Acq and Rel. >> >> For that one would need to distinguish e.g. between a read that comes from a >> failed rmw instruction, and where the tag would disappear, or a normal >> standalone read. >> >> For example, by using two different acquire tags, 'acquire and 'rmw-acquire, >> and defining >> >> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw)) >> >> Anyways we can do this change independently. So for now, we don't need >> RMW_MB. > > Overall, it seems better to have herd7 assign the right tag, but change > the way the .def file works so that it can tell herd7 which tag to use > in each of the success and failure cases. I am not fully sure how herd7 uses the .def file, but I guess something like adding a second memory tag to __cmpxchg could work cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W) cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W) cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W) cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W) Hernan > >>> [M] ; po ; [RMW_MB] >>> >>> [RMW_MB] ; po ; [M] >>> >>> This is because events tagged with RMW_MB always are memory accesses, >>> and accesses that aren't part of the RMW are already covered by the >>> fencerel(Mb) thing above. >> >> This has exactly the issue mentioned above - it will cause the rmw to have >> an internal strong fence that on powerpc probably isn't there. > > Oops, that's right. Silly oversight on my part. But at least you > understood what I meant. > >> We could do (with the assumption that Mb applies only to successful rmw): >> >> [M] ; po ; [Mb & R] >> [Mb & W] ; po ; [M] > > That works. > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 14:26 ` Hernan Ponce de Leon @ 2024-05-23 15:14 ` Boqun Feng 2024-05-24 1:38 ` Alan Stern 2024-05-23 16:05 ` Alan Stern 1 sibling, 1 reply; 33+ messages in thread From: Boqun Feng @ 2024-05-23 15:14 UTC (permalink / raw) To: Hernan Ponce de Leon Cc: Alan Stern, Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Thu, May 23, 2024 at 04:26:23PM +0200, Hernan Ponce de Leon wrote: > On 5/23/2024 4:05 PM, Alan Stern wrote: > > On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote: > > > > > > > > > Am 5/22/2024 um 4:20 PM schrieb Alan Stern: > > > > It would be better if there was a way to tell herd7 not to add the 'mb > > > > tag to failed instructions in the first place. This approach is > > > > brittle; see below. > > > > > > Hernan told me that in fact that is actually currently the case in herd7. > > > Failing RMW get assigned the Once tag implicitly. > > > Another thing that I'd suggest to change. > > > > Indeed. > > > > > > An alternative would be to have a way for the .cat file to remove the > > > > 'mb tag from a failed RMW instruction. But I don't know if this is > > > > feasible. > > > > > > For Mb it's feasible, as there is no Mb read or Mb store. > > > > > > Mb = Mb & (~M | dom(rmw) | range(rmw)) > > > > > > However one would want to do the same for Acq and Rel. > > > > > > For that one would need to distinguish e.g. between a read that comes from a > > > failed rmw instruction, and where the tag would disappear, or a normal > > > standalone read. > > > > > > For example, by using two different acquire tags, 'acquire and 'rmw-acquire, > > > and defining > > > > > > Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw)) > > > > > > Anyways we can do this change independently. So for now, we don't need > > > RMW_MB. > > > > Overall, it seems better to have herd7 assign the right tag, but change > > the way the .def file works so that it can tell herd7 which tag to use > > in each of the success and failure cases. > > I am not fully sure how herd7 uses the .def file, but I guess something like > adding a second memory tag to __cmpxchg could work > > cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W) > cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W) > cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W) > cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W) > Note that cmpxchg_acquire() and cmpxchg_release() don't have _acqurie or _release ordering if they fails. Besides, I'm not sure this is a good idea. Because the "{mb}, {once}, etc" part is a syntax thing, you write a cmpxchg(), it should be translated to a cmpxchg event with MB tag on. As to failed cmpxchg() doesn't provide ordering, it's a semantics thing, as Jonas showed that it can be represent in cat file. As long as it's a semanitc thing and we can represent in cat file, I don't think we want herd to give a special treatment. What you and Jonas looks fine to me, since it moves the semantic bits from herd internal to cat file. Regards, Boqun > Hernan > > > > > > > [M] ; po ; [RMW_MB] > > > > > > > > [RMW_MB] ; po ; [M] > > > > > > > > This is because events tagged with RMW_MB always are memory accesses, > > > > and accesses that aren't part of the RMW are already covered by the > > > > fencerel(Mb) thing above. > > > > > > This has exactly the issue mentioned above - it will cause the rmw to have > > > an internal strong fence that on powerpc probably isn't there. > > > > Oops, that's right. Silly oversight on my part. But at least you > > understood what I meant. > > > > > We could do (with the assumption that Mb applies only to successful rmw): > > > > > > [M] ; po ; [Mb & R] > > > [Mb & W] ; po ; [M] > > > > That works. > > > > Alan > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 15:14 ` Boqun Feng @ 2024-05-24 1:38 ` Alan Stern 2024-05-24 2:50 ` Boqun Feng 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2024-05-24 1:38 UTC (permalink / raw) To: Boqun Feng Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote: > Besides, I'm not sure this is a good idea. Because the "{mb}, {once}, > etc" part is a syntax thing, you write a cmpxchg(), it should be > translated to a cmpxchg event with MB tag on. As to failed cmpxchg() > doesn't provide ordering, it's a semantics thing, as Jonas showed that > it can be represent in cat file. As long as it's a semanitc thing and we > can represent in cat file, I don't think we want herd to give a special > treatment. I don't really understand the distinction you're making between syntactic things and semantic things. For most instructions there's no problem, because the instruction does just one thing. But a cmpxchg instruction can do either of two things, depending on whether it succeeds or fails, so it makes sense to tell herd7 how to represent both of them. > What you and Jonas looks fine to me, since it moves the semantic bits > from herd internal to cat file. Trying to recognize failed RMW events by looking for R events with an mb tag that aren't in the rmw relation seems very artificial. That fact that it would work is merely an artifact of herd7's internal actions. I think it would be much cleaner if herd7 represented these events in some other way, particularly if we can tell it how. After all, herd7 already does generate different sets of events for successful (both R and W) and failed (only R) RMWs. It's not such a big stretch to make the R events it generates different in the two cases. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-24 1:38 ` Alan Stern @ 2024-05-24 2:50 ` Boqun Feng 2024-05-24 14:14 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Boqun Feng @ 2024-05-24 2:50 UTC (permalink / raw) To: Alan Stern Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Thu, May 23, 2024 at 09:38:05PM -0400, Alan Stern wrote: > On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote: > > Besides, I'm not sure this is a good idea. Because the "{mb}, {once}, > > etc" part is a syntax thing, you write a cmpxchg(), it should be > > translated to a cmpxchg event with MB tag on. As to failed cmpxchg() > > doesn't provide ordering, it's a semantics thing, as Jonas showed that > > it can be represent in cat file. As long as it's a semanitc thing and we > > can represent in cat file, I don't think we want herd to give a special > > treatment. > > I don't really understand the distinction you're making between > syntactic things and semantic things. For most instructions there's no Syntax is how the code is written, and semantic is how the code is executed (in each execution candidate). So if we write a cmpxchg{mb}(), and in execution candiates, it could generates a read{MB} event and a write{MB} event (succeed case), or a read{MB} event (fail case), "{MB}" here doesn't mean it's a full barrier, it only means the event comes from a no suffix API. Here "{MB}" only has syntactic meaning (no semantic meaning). > problem, because the instruction does just one thing. But a cmpxchg > instruction can do either of two things, depending on whether it > succeeds or fails, so it makes sense to tell herd7 how to represent > both of them. > > > What you and Jonas looks fine to me, since it moves the semantic bits > > from herd internal to cat file. > > Trying to recognize failed RMW events by looking for R events with an mb > tag that aren't in the rmw relation seems very artificial. That fact Not really, RMW events contains all events generated from read-modify-write primitives, if there is an R event without a rmw relation (i.e there is no corresponding write event), it's a failed RWM by definition: it cannot be anything else. > that it would work is merely an artifact of herd7's internal actions. I > think it would be much cleaner if herd7 represented these events in some > other way, particularly if we can tell it how. > > After all, herd7 already does generate different sets of events for > successful (both R and W) and failed (only R) RMWs. It's not such a big > stretch to make the R events it generates different in the two cases. > I thought we want to simplify the herd internal processing by avoid adding mb events in cmpxchg() cases, in the same spirit, if syntactic tagging is already good enough, why do we want to make herd complicate? Regards, Boqun > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-24 2:50 ` Boqun Feng @ 2024-05-24 14:14 ` Alan Stern 2024-05-24 14:34 ` Boqun Feng 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2024-05-24 14:14 UTC (permalink / raw) To: Boqun Feng Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Thu, May 23, 2024 at 07:50:11PM -0700, Boqun Feng wrote: > On Thu, May 23, 2024 at 09:38:05PM -0400, Alan Stern wrote: > > On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote: > > > Besides, I'm not sure this is a good idea. Because the "{mb}, {once}, > > > etc" part is a syntax thing, you write a cmpxchg(), it should be > > > translated to a cmpxchg event with MB tag on. As to failed cmpxchg() > > > doesn't provide ordering, it's a semantics thing, as Jonas showed that > > > it can be represent in cat file. As long as it's a semanitc thing and we > > > can represent in cat file, I don't think we want herd to give a special > > > treatment. > > > > I don't really understand the distinction you're making between > > syntactic things and semantic things. For most instructions there's no > > Syntax is how the code is written, and semantic is how the code is > executed (in each execution candidate). So if we write a cmpxchg{mb}(), > and in execution candiates, it could generates a read{MB} event and a > write{MB} event (succeed case), or a read{MB} event (fail case), "{MB}" > here doesn't mean it's a full barrier, it only means the event comes > from a no suffix API. Here "{MB}" only has syntactic meaning (no > semantic meaning). Okay, I get it. Then you might agree that it probably would be better to use a different tag here, because the mb tag is already in use with other instructions (like smp_mb()) where it does always mean there's a full barrier. > Not really, RMW events contains all events generated from > read-modify-write primitives, if there is an R event without a rmw > relation (i.e there is no corresponding write event), it's a failed RWM > by definition: it cannot be anything else. Not true. An R event without an rmw relation could be a READ_ONCE(). Or a plain read. The memory model uses the tag to distinguish these cases. > > that it would work is merely an artifact of herd7's internal actions. I > > think it would be much cleaner if herd7 represented these events in some > > other way, particularly if we can tell it how. > > > > After all, herd7 already does generate different sets of events for > > successful (both R and W) and failed (only R) RMWs. It's not such a big > > stretch to make the R events it generates different in the two cases. > > > > I thought we want to simplify the herd internal processing by avoid > adding mb events in cmpxchg() cases, in the same spirit, if syntactic > tagging is already good enough, why do we want to make herd complicate? Herd7 already is complicated by the fact that it needs to handle cmpxchg() instructions in two ways: success and failure. This complication is unavoidable. Adding one extra layer (different tags for the different ways) is an insignificant increase in the complication, IMO. And it's a net reduction when you compare it to the amount of complication currently in the herd7 code. Also what about cmpxchg_acquire()? If it fails, it will generate an R event with an acquire tag not in the rmw relation. There is no way to tell such events apart from a normal smp_load_acquire(), and so the .cat file would have no way to know that the event should not have acquire semantics. I guess we would have to rename this tag, as well. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-24 14:14 ` Alan Stern @ 2024-05-24 14:34 ` Boqun Feng 2024-05-24 14:53 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Boqun Feng @ 2024-05-24 14:34 UTC (permalink / raw) To: Alan Stern Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote: [...] > > Not really, RMW events contains all events generated from > > read-modify-write primitives, if there is an R event without a rmw > > relation (i.e there is no corresponding write event), it's a failed RWM > > by definition: it cannot be anything else. > > Not true. An R event without an rmw relation could be a READ_ONCE(). No, the R event is already in the RWM events, it has come from a rwm atomic. > Or a plain read. The memory model uses the tag to distinguish these > cases. > > > > that it would work is merely an artifact of herd7's internal actions. I > > > think it would be much cleaner if herd7 represented these events in some > > > other way, particularly if we can tell it how. > > > > > > After all, herd7 already does generate different sets of events for > > > successful (both R and W) and failed (only R) RMWs. It's not such a big > > > stretch to make the R events it generates different in the two cases. > > > > > > > I thought we want to simplify the herd internal processing by avoid > > adding mb events in cmpxchg() cases, in the same spirit, if syntactic > > tagging is already good enough, why do we want to make herd complicate? > > Herd7 already is complicated by the fact that it needs to handle > cmpxchg() instructions in two ways: success and failure. This > complication is unavoidable. Adding one extra layer (different tags for > the different ways) is an insignificant increase in the complication, > IMO. And it's a net reduction when you compare it to the amount of > complication currently in the herd7 code. > > Also what about cmpxchg_acquire()? If it fails, it will generate an R > event with an acquire tag not in the rmw relation. There is no way to > tell such events apart from a normal smp_load_acquire(), and so the .cat > file would have no way to know that the event should not have acquire > semantics. I guess we would have to rename this tag, as well. No, let read_of_rmw = (RMW & R) let fail_read_of_rmw = read_of_rmw / dom(rmw) let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire] gives you all the failed cmpxchg_acquire() apart from a normal smp_load_acquire(). Regards, Boqun > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-24 14:34 ` Boqun Feng @ 2024-05-24 14:53 ` Alan Stern 2024-05-24 18:09 ` Jonas Oberhauser 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2024-05-24 14:53 UTC (permalink / raw) To: Boqun Feng Cc: Hernan Ponce de Leon, Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Fri, May 24, 2024 at 07:34:06AM -0700, Boqun Feng wrote: > On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote: > [...] > > > Not really, RMW events contains all events generated from > > > read-modify-write primitives, if there is an R event without a rmw > > > relation (i.e there is no corresponding write event), it's a failed RWM > > > by definition: it cannot be anything else. > > > > Not true. An R event without an rmw relation could be a READ_ONCE(). > > No, the R event is already in the RWM events, it has come from a rwm > atomic. Oh, right. For some reason I was thinking that an instruction could belong to only one set, R or RMW. But that doesn't make sense. > > Or a plain read. The memory model uses the tag to distinguish these > > cases. > > > > > > that it would work is merely an artifact of herd7's internal actions. I > > > > think it would be much cleaner if herd7 represented these events in some > > > > other way, particularly if we can tell it how. > > > > > > > > After all, herd7 already does generate different sets of events for > > > > successful (both R and W) and failed (only R) RMWs. It's not such a big > > > > stretch to make the R events it generates different in the two cases. > > > > > > > > > > I thought we want to simplify the herd internal processing by avoid > > > adding mb events in cmpxchg() cases, in the same spirit, if syntactic > > > tagging is already good enough, why do we want to make herd complicate? > > > > Herd7 already is complicated by the fact that it needs to handle > > cmpxchg() instructions in two ways: success and failure. This > > complication is unavoidable. Adding one extra layer (different tags for > > the different ways) is an insignificant increase in the complication, > > IMO. And it's a net reduction when you compare it to the amount of > > complication currently in the herd7 code. > > > > Also what about cmpxchg_acquire()? If it fails, it will generate an R > > event with an acquire tag not in the rmw relation. There is no way to > > tell such events apart from a normal smp_load_acquire(), and so the .cat > > file would have no way to know that the event should not have acquire > > semantics. I guess we would have to rename this tag, as well. > > No, > > let read_of_rmw = (RMW & R) > let fail_read_of_rmw = read_of_rmw / dom(rmw) > let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire] > > gives you all the failed cmpxchg_acquire() apart from a normal > smp_load_acquire(). Yes, I see. Using this approach, no further changes to herd7 or the .def file would be needed. We would just have to add 'mb to the Accesses enumeration and to the list of tags allowed for an RMW instruction. Question: Since R and RMW have different lists of allowable tags, how does herd7 decide which tags are allowed for an event that is in both the R and RMW sets? Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-24 14:53 ` Alan Stern @ 2024-05-24 18:09 ` Jonas Oberhauser 2024-05-24 18:47 ` Boqun Feng 2024-05-24 18:48 ` Alan Stern 0 siblings, 2 replies; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-24 18:09 UTC (permalink / raw) To: Alan Stern, Boqun Feng Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes Am 5/24/2024 um 4:53 PM schrieb Alan Stern: > On Fri, May 24, 2024 at 07:34:06AM -0700, Boqun Feng wrote: >> On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote: >> [...] >>>> Not really, RMW events contains all events generated from >>>> read-modify-write primitives, if there is an R event without a rmw >>>> relation (i.e there is no corresponding write event), it's a failed RWM >>>> by definition: it cannot be anything else. >>> >>> Not true. An R event without an rmw relation could be a READ_ONCE(). >> >> No, the R event is already in the RWM events, it has come from a rwm >> atomic. > > Oh, right. For some reason I was thinking that an instruction could > belong to only one set, R or RMW. But that doesn't make sense. I thought the same, so I perhaps contributed to that confusion. > >>> Or a plain read. The memory model uses the tag to distinguish these >>> cases. >>> >>>>> that it would work is merely an artifact of herd7's internal actions. I >>>>> think it would be much cleaner if herd7 represented these events in some >>>>> other way, particularly if we can tell it how. >>>>> >>>>> After all, herd7 already does generate different sets of events for >>>>> successful (both R and W) and failed (only R) RMWs. It's not such a big >>>>> stretch to make the R events it generates different in the two cases. >>>>> >>>> >>>> I thought we want to simplify the herd internal processing by avoid >>>> adding mb events in cmpxchg() cases, in the same spirit, if syntactic >>>> tagging is already good enough, why do we want to make herd complicate? >>> >>> Herd7 already is complicated by the fact that it needs to handle >>> cmpxchg() instructions in two ways: success and failure. This >>> complication is unavoidable. Adding one extra layer (different tags for >>> the different ways) is an insignificant increase in the complication, >>> IMO. And it's a net reduction when you compare it to the amount of >>> complication currently in the herd7 code. >>> >>> Also what about cmpxchg_acquire()? If it fails, it will generate an R >>> event with an acquire tag not in the rmw relation. I would like that, but that is not the current implementation, a failed atomic_compare_exchange always produces a R*[once]; this behavior is currently hardcoded in herd7. >>> There is no way to >>> tell such events apart from a normal smp_load_acquire(), and so the .cat >>> file would have no way to know that the event should not have acquire >>> semantics. I guess we would have to rename this tag, as well. >> >> No, >> >> let read_of_rmw = (RMW & R) >> let fail_read_of_rmw = read_of_rmw / dom(rmw) >> let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire] >> >> gives you all the failed cmpxchg_acquire() apart from a normal >> smp_load_acquire(). > > Yes, I see. Using this approach, no further changes to herd7 or the > def file would be needed. We would just have to add 'mb to the > Accesses enumeration and to the list of tags allowed for an RMW > instruction. > > Question: Since R and RMW have different lists of allowable tags, how > does herd7 decide which tags are allowed for an event that is in both > the R and RMW sets? After looking over the patch draft for herd7 written by Hernan [1], my best guess is: it doen't. It seems that after herd7 detects you are using LKMM it simply drops all tags except 'acquire on read and 'release on store. Everything else becomes 'once (and 'mb adds some F[mb] sometimes). That means that if one were to go through with the earlier suggestion to distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by calling the latter RmwMb, current herd7 would always erase the RmwMb tag because it is not called "acquire" or "release". The same would happen of course if you introduced an RmwAcquire tag. So there are several options: - treat the tags as a syntactic thing which is always present, and specify their meaning purely in the cat file, analogous to what you have defined above. This is personally my favorite solution. To implement this in herd7 we would simply remove all the current special cases for the LKMM barriers, which I like. However then we need to actually define the behavior of herd if an inappropriate tag (like release on a load) is provided. Since the restriction is actually mostly enforced by the .def file - by simply not providing a smp_store_acquire() etc. -, that only concerns RMWs, where xchg_acquire() would apply the Acquire tag to the write also. The easiest solution is to simply remove the syntax for specifying restrictions - it seems it is not doing much right now anyways - and do the filtering inside .bell, with things like (* only writes can have Release tags *) let Release = Release & W \ (RMW \ rng(rmw)) One good thing about this way is that it would work even without modifying herd, since it is in a sense idempotent with the transformations done by herd. FWIW, in our internal weak memory model in Dresden we did exactly this, and use REL for the syntax and Rel for the semantic release ordering to make the distinction more clear, with things like: let Acq = (ACQ | SC) & (R | F) let Rel = (REL | SC) & (W | F) (SC is our equivalent to LKMM's Mb) - treat the tags as semantic markers that are only present or not under some circumstances, and define the semantics fully based on the present tags. The circumstances are currently hardcoded in herd7, but we should move them out with some syntax like __cmpxchg{acquire}{once}. This requires touching the parser and of course still has the issue with the acquire tag appearing on the store as well. - provide a full syntax for defining the event sequence that is expected. For example, for a cmpxchg we could do cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) { W&RMW[once]; } F[success-cmpxchg] } and then define in .bell that a success-cmpxchg is an mb if it is directly next to a cmpxchg that succeeds. The advantage is that you can customize the representation to whatever kernel devs thing is the most intuitive. The downside is that it requires major rewrites to herd7, someone to think about a reasonable language for specifying this etc. Best wishes, jonas [1] https://github.com/herd/herdtools7/pull/865 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-24 18:09 ` Jonas Oberhauser @ 2024-05-24 18:47 ` Boqun Feng 2024-05-24 18:48 ` Alan Stern 1 sibling, 0 replies; 33+ messages in thread From: Boqun Feng @ 2024-05-24 18:47 UTC (permalink / raw) To: Jonas Oberhauser Cc: Alan Stern, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Fri, May 24, 2024 at 08:09:28PM +0200, Jonas Oberhauser wrote: [...] > > > > Question: Since R and RMW have different lists of allowable tags, how > > does herd7 decide which tags are allowed for an event that is in both > > the R and RMW sets? > > After looking over the patch draft for herd7 written by Hernan [1], my best > guess is: it doen't. It seems that after herd7 detects you are using LKMM it Right, you can event put a 'acquire tag to WRITE_ONCE(): -WRITE_ONCE(X,V) { __store{once}(X,V); } +WRITE_ONCE(X,V) { __store{acquire}(X,V); } , herd won't complain, but it may change the model behavior. Here is what I know from the herd code: * First, normally herd just put whatever the tag you specify in .def file to the accesses event (it has to be one of the tags in Access list in .bell file). * An access event looks like: (read_or_write?, ..., anonatation, is_atomic?, ...) "is_atomic?" means whether the event comes from an rmw primitives. So a READ_ONCE() will look like: (read, ..., once, false, ...) and a WRITE_ONCE() will look like: (write, ..., once, false, ...) the read part of an atomic_add_return_relaxed() is: (read, ..., once, true, ...) note that the "is_atomic?" is how this event ends up in "RMW" set. > simply drops all tags except 'acquire on read and 'release on store. Right, herd does some extra work to filter out RELEASE reads and ACQUIRE writes for rwm atomics, basically, when herd is generating events for an rmw atomic, if it's not "mb", it will only kill 'acquire for read and 'release on store, otherwise, it changes the annotation part to 'once. Funny example, if you do a __atomic_fetch_op{hello}(..), herd will treat it as a __atomic_fetch_op{once}(..) because of this. > Everything else becomes 'once (and 'mb adds some F[mb] sometimes). > > That means that if one were to go through with the earlier suggestion to > distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by calling > the latter RmwMb, current herd7 would always erase the RmwMb tag because it > is not called "acquire" or "release". > The same would happen of course if you introduced an RmwAcquire tag. > > So there are several options: > > - treat the tags as a syntactic thing which is always present, and > specify their meaning purely in the cat file, analogous to what you > have defined above. This is personally my favorite solution. To > implement this in herd7 we would simply remove all the current special > cases for the LKMM barriers, which I like. > Agreed. > However then we need to actually define the behavior of herd if > an inappropriate tag (like release on a load) is provided. Since the > restriction is actually mostly enforced by the .def file - by simply > not providing a smp_store_acquire() etc. -, that only concerns RMWs, > where xchg_acquire() would apply the Acquire tag to the write also. > > The easiest solution is to simply remove the syntax for specifying > restrictions - it seems it is not doing much right now anyways - and > do the filtering inside .bell, with things like > > (* only writes can have Release tags *) > let Release = Release & W \ (RMW \ rng(rmw)) > > One good thing about this way is that it would work even without > modifying herd, since it is in a sense idempotent with the Well, we still need to turn off the "smart" annotation rewritting in herd (e.g. -> once), but I think that's a good thing: it simplifies the internal work herd does, and it also helps people on other tools understand LKMM better. > transformations done by herd. > > FWIW, in our internal weak memory model in Dresden we did exactly this, so the model doesn't work elsewhere even in Germany? ;-) Sorry, couldn' t resist ;-) ;-) ;-) > and use REL for the syntax and Rel for the semantic release ordering to > make the distinction more clear, with things like: > > let Acq = (ACQ | SC) & (R | F) > let Rel = (REL | SC) & (W | F) > > (SC is our equivalent to LKMM's Mb) > > - treat the tags as semantic markers that are only present or not under > some circumstances, and define the semantics fully based on the present > tags. The circumstances are currently hardcoded in herd7, but we should > move them out with some syntax like __cmpxchg{acquire}{once}. > > This requires touching the parser and of course still has the issue > with the acquire tag appearing on the store as well. > > - provide a full syntax for defining the event sequence that is > expected. For example, for a cmpxchg we could do > > cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) { > W&RMW[once]; } F[success-cmpxchg] } > > and then define in .bell that a success-cmpxchg is an mb if it is > directly next to a cmpxchg that succeeds. > > The advantage is that you can customize the representation to whatever > kernel devs thing is the most intuitive. The downside is that it > requires major rewrites to herd7, someone to think about a reasonable > language for specifying this etc. > I no doubt am the fan of the first approach, herd is powerful because the ability to customize the semantic rules for model ordering models, moving more parts from herd internals to the cat file (or bell file) is always a good thing to me. Regards, Boqun > > > Best wishes, > jonas > > > > [1] https://github.com/herd/herdtools7/pull/865 > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-24 18:09 ` Jonas Oberhauser 2024-05-24 18:47 ` Boqun Feng @ 2024-05-24 18:48 ` Alan Stern 1 sibling, 0 replies; 33+ messages in thread From: Alan Stern @ 2024-05-24 18:48 UTC (permalink / raw) To: Jonas Oberhauser Cc: Boqun Feng, Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, j.alglave, luc.maranget, Joel Fernandes On Fri, May 24, 2024 at 08:09:28PM +0200, Jonas Oberhauser wrote: > > > Am 5/24/2024 um 4:53 PM schrieb Alan Stern: > > Question: Since R and RMW have different lists of allowable tags, how > > does herd7 decide which tags are allowed for an event that is in both > > the R and RMW sets? > > After looking over the patch draft for herd7 written by Hernan [1], my best > guess is: it doen't. It seems that after herd7 detects you are using LKMM it > simply drops all tags except 'acquire on read and 'release on store. > Everything else becomes 'once (and 'mb adds some F[mb] sometimes). Okay, yes, this is the sort of thing we would like to move away from. > That means that if one were to go through with the earlier suggestion to > distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by calling > the latter RmwMb, current herd7 would always erase the RmwMb tag because it > is not called "acquire" or "release". > The same would happen of course if you introduced an RmwAcquire tag. > > So there are several options: > > - treat the tags as a syntactic thing which is always present, and > specify their meaning purely in the cat file, analogous to what you > have defined above. This is personally my favorite solution. To > implement this in herd7 we would simply remove all the current special > cases for the LKMM barriers, which I like. > > However then we need to actually define the behavior of herd if > an inappropriate tag (like release on a load) is provided. Since the > restriction is actually mostly enforced by the .def file - by simply > not providing a smp_store_acquire() etc. -, that only concerns RMWs, > where xchg_acquire() would apply the Acquire tag to the write also. > > The easiest solution is to simply remove the syntax for specifying > restrictions - it seems it is not doing much right now anyways - and > do the filtering inside .bell, with things like > > (* only writes can have Release tags *) > let Release = Release & W \ (RMW \ rng(rmw)) > > One good thing about this way is that it would work even without > modifying herd, since it is in a sense idempotent with the > transformations done by herd. This seems like a good approach. > FWIW, in our internal weak memory model in Dresden we did exactly this, > and use REL for the syntax and Rel for the semantic release ordering to > make the distinction more clear, with things like: > > let Acq = (ACQ | SC) & (R | F) > let Rel = (REL | SC) & (W | F) > > (SC is our equivalent to LKMM's Mb) Definitely, the syntactic markers should be easily distinguished (by case would be a good way) from the semantic classes used in the model. > - treat the tags as semantic markers that are only present or not under > some circumstances, and define the semantics fully based on the present > tags. The circumstances are currently hardcoded in herd7, but we should > move them out with some syntax like __cmpxchg{acquire}{once}. > > This requires touching the parser and of course still has the issue > with the acquire tag appearing on the store as well. > > - provide a full syntax for defining the event sequence that is > expected. For example, for a cmpxchg we could do > > cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) { > W&RMW[once]; } F[success-cmpxchg] } > > and then define in .bell that a success-cmpxchg is an mb if it is > directly next to a cmpxchg that succeeds. > > The advantage is that you can customize the representation to whatever > kernel devs thing is the most intuitive. The downside is that it > requires major rewrites to herd7, someone to think about a reasonable > language for specifying this etc. Let's avoid major rewrites to herd7. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 14:26 ` Hernan Ponce de Leon 2024-05-23 15:14 ` Boqun Feng @ 2024-05-23 16:05 ` Alan Stern 1 sibling, 0 replies; 33+ messages in thread From: Alan Stern @ 2024-05-23 16:05 UTC (permalink / raw) To: Hernan Ponce de Leon Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On Thu, May 23, 2024 at 04:26:23PM +0200, Hernan Ponce de Leon wrote: > On 5/23/2024 4:05 PM, Alan Stern wrote: > > Overall, it seems better to have herd7 assign the right tag, but change > > the way the .def file works so that it can tell herd7 which tag to use > > in each of the success and failure cases. > > I am not fully sure how herd7 uses the .def file, but I guess something like > adding a second memory tag to __cmpxchg could work > > cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W) > cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W) > cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W) > cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W) Right, except that the last two should be: cmpxchg_acquire(X,V,W) __cmpxchg{acquire, once}(X,V,W) cmpxchg_release(X,V,W) __cmpxchg{release, once}(X,V,W) Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-23 14:05 ` Alan Stern 2024-05-23 14:26 ` Hernan Ponce de Leon @ 2024-05-23 14:36 ` Jonas Oberhauser 1 sibling, 0 replies; 33+ messages in thread From: Jonas Oberhauser @ 2024-05-23 14:36 UTC (permalink / raw) To: Alan Stern Cc: Hernan Ponce de Leon, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes Am 5/23/2024 um 4:05 PM schrieb Alan Stern: > On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote: >> >> >> Am 5/22/2024 um 4:20 PM schrieb Alan Stern: >>> It would be better if there was a way to tell herd7 not to add the 'mb >>> tag to failed instructions in the first place. This approach is >>> brittle; see below. >> >> Hernan told me that in fact that is actually currently the case in herd7. >> Failing RMW get assigned the Once tag implicitly. >> Another thing that I'd suggest to change. > > Indeed. > >>> An alternative would be to have a way for the .cat file to remove the >>> 'mb tag from a failed RMW instruction. But I don't know if this is >>> feasible. >> >> For Mb it's feasible, as there is no Mb read or Mb store. >> >> Mb = Mb & (~M | dom(rmw) | range(rmw)) >> >> However one would want to do the same for Acq and Rel. >> >> For that one would need to distinguish e.g. between a read that comes from a >> failed rmw instruction, and where the tag would disappear, or a normal >> standalone read. >> >> For example, by using two different acquire tags, 'acquire and 'rmw-acquire, >> and defining >> >> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw)) >> >> Anyways we can do this change independently. So for now, we don't need >> RMW_MB. > > Overall, it seems better to have herd7 assign the right tag, but change > the way the .def file works so that it can tell herd7 which tag to use > in each of the success and failure cases. Yes, that would be good. In principle herd should already support this kind of logic for e.g. C11 which also has distinct success and failure modes. But of course I don't know if there's syntax to make this change in .def, let alone what it would look like. > But at least you > understood what I meant. I do try :) (: :) > >> We could do (with the assumption that Mb applies only to successful rmw): >> >> [M] ; po ; [Mb & R] >> [Mb & W] ; po ; [M] > > That works. Ok, I'll prepare a patch for this and Andrea or anyone else can still interject. I suppose the patch would not change the semantics at all with the current herd7, since Mb does not appear on any reads and writes for the time being. best wishes, jonas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: LKMM: Making RMW barriers explicit 2024-05-18 0:31 ` Alan Stern 2024-05-21 9:57 ` Jonas Oberhauser @ 2024-05-21 11:38 ` Hernan Ponce de Leon 1 sibling, 0 replies; 33+ messages in thread From: Hernan Ponce de Leon @ 2024-05-21 11:38 UTC (permalink / raw) To: Alan Stern Cc: Jonas Oberhauser, Paul E. McKenney, linux-kernel, linux-arch, kernel-team, parri.andrea, boqun.feng, j.alglave, luc.maranget, Joel Fernandes On 5/18/2024 2:31 AM, Alan Stern wrote: > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: >> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: >>> >>> >>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern: >>>> Hernan and Jonas: >>>> >>>> Can you explain more fully the changes you want to make to herd7 and/or >>>> the LKMM? The goal is to make the memory barriers currently implicit in >>>> RMW operations explicit, but I couldn't understand how you propose to do >>>> this. >>>> >>>> Are you going to change herd7 somehow, and if so, how? It seems like >>>> you should want to provide sufficient information so that the .bell >>>> and .cat files can implement the appropriate memory barriers associated >>>> with each RMW operation. What additional information is needed? And >>>> how (explained in English, not by quoting source code) will the .bell >>>> and .cat files make use of this information? >>>> >>>> Alan >>> >>> >>> I don't know whether herd7 needs to be changed. Probably, herd7 does the >>> following: >>> - if a tag called Mb appears on an rmw instruction (by instruction I >>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with >>> one of those things: >>> * full mb ; once (the rmw) ; full mb, if a value returning >>> (successful) rmw >>> * once (the rmw) otherwise >>> - everything else gets translated 1:1 into some internal representation >> >> This is my understanding from reading the source code of CSem.ml in herd7's >> repo. >> >> Also, this is exactly what dartagnan is currently doing. >> >>> >>> What I'm proposing is: >>> 1. remove this transpilation step, >>> 2. and instead allow the Mb tag to actually appear on RMW instructions >>> 3. change the cat file to explicitly define the behavior of the Mb tag >>> on RMW instructions >> >> These are the exact 3 things I changed in dartagnan for testing what Jonas >> proposed. >> >> I am not sure if further changes are needed for herd7. I implemented these changes in herd7 and they seem enough. I opened a PRs for discussion https://github.com/herd/herdtools7/pull/865 > > Okay, good. This answers the first part of what I asked. What about > the second part? That is, how will the changes to the .def, .bell, and > .cat files achieve your goals? At least the cat model needs to be updated. If I remove the fences from herd7, but keep the current model, these 4 tests fail. --- Summary: !!! Result changed: ./litmus/manual/locked/SUW+or-ow+l-ow-or.litmus.out.new !!! Result changed: ./litmus/manual/atomic/C-PaulEMcKenney-SB+adat-o+adat-o.litmus.out.new !!! Result changed: ./litmus/manual/atomic/C-atomic-01.litmus.out.new !!! Result changed: ./litmus/manual/atomic/C-atomic-02.litmus.out.new Using this patch I get the correct results diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat index adf3c4f41229..7e4787cdbd66 100644 --- a/tools/memory-model/linux-kernel.cat +++ b/tools/memory-model/linux-kernel.cat @@ -34,6 +34,12 @@ let R4rmb = R \ Noreturn (* Reads for which rmb works *) let rmb = [R4rmb] ; fencerel(Rmb) ; [R4rmb] let wmb = [W] ; fencerel(Wmb) ; [W] let mb = ([M] ; fencerel(Mb) ; [M]) | + (* everything across a full barrier RMW is ordered. This includes up to one event inside the RMW. *) + ([M] ; po ; [RMW & Mb] ; po ; [M]) | + (* full barrier RMW writes are ordered with everything behind the RMW *) + ([W & RMW & Mb] ; po ; [M]) | + (* full barrier RMW reads are ordered with everything before the RMW *) + ([M] ; po ; [R & RMW & Mb]) | ([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) | ([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) | ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) | Hernan > > Alan ^ permalink raw reply related [flat|nested] 33+ messages in thread
end of thread, other threads:[~2024-05-24 18:48 UTC | newest] Thread overview: 33+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-16 1:43 LKMM: Making RMW barriers explicit Alan Stern 2024-05-16 8:31 ` Jonas Oberhauser 2024-05-16 8:44 ` Hernan Ponce de Leon 2024-05-18 0:31 ` Alan Stern 2024-05-21 9:57 ` Jonas Oberhauser 2024-05-21 15:36 ` Alan Stern 2024-05-22 9:20 ` Jonas Oberhauser 2024-05-22 14:20 ` Alan Stern 2024-05-22 16:54 ` Andrea Parri 2024-05-22 18:20 ` Alan Stern 2024-05-22 19:48 ` Hernan Ponce de Leon 2024-05-23 9:04 ` Andrea Parri 2024-05-23 14:27 ` Jonas Oberhauser 2024-05-23 16:35 ` Andrea Parri 2024-05-23 20:30 ` Jonas Oberhauser 2024-05-24 3:30 ` Andrea Parri 2024-05-24 8:16 ` Hernan Ponce de Leon 2024-05-23 12:54 ` Jonas Oberhauser 2024-05-23 13:35 ` Paul E. McKenney 2024-05-23 14:05 ` Alan Stern 2024-05-23 14:26 ` Hernan Ponce de Leon 2024-05-23 15:14 ` Boqun Feng 2024-05-24 1:38 ` Alan Stern 2024-05-24 2:50 ` Boqun Feng 2024-05-24 14:14 ` Alan Stern 2024-05-24 14:34 ` Boqun Feng 2024-05-24 14:53 ` Alan Stern 2024-05-24 18:09 ` Jonas Oberhauser 2024-05-24 18:47 ` Boqun Feng 2024-05-24 18:48 ` Alan Stern 2024-05-23 16:05 ` Alan Stern 2024-05-23 14:36 ` Jonas Oberhauser 2024-05-21 11:38 ` Hernan Ponce de Leon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox