Date: Sat, 10 Sep 2022 11:03:30 -0400
From: Alan Stern
To: Hernan Luis Ponce de Leon
Cc: Jonas Oberhauser, Boqun Feng, Peter Zijlstra, "Paul E. McKenney",
    "parri.andrea@gmail.com", "will@kernel.org", "npiggin@gmail.com",
    "dhowells@redhat.com", "j.alglave@ucl.ac.uk", "luc.maranget@inria.fr",
    "akiyks@gmail.com", "dlustig@nvidia.com", "joel@joelfernandes.org",
    "linux-kernel@vger.kernel.org", "linux-arch@vger.kernel.org"
Subject: Re: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models"
References: <20220826124812.GA3007435@paulmck-ThinkPad-P17-Gen-1>
    <674d0fda790d4650899e2fcf43894053@huawei.com>
X-Mailing-List: linux-arch@vger.kernel.org

On Sat, Sep 10, 2022 at 12:11:36PM +0000, Hernan Luis Ponce de Leon wrote:
>
> What they mean seems to be that a prop relation followed only by wmb
> (not mb) doesn't enforce the order of some writes to the same
> location, leading to the claimed hang in qspinlock (at least as far as
> LKMM is concerned).

You were quoting Jonas here, right?  The email doesn't make this obvious
because it doesn't have two levels of "> > " markings.

> What we mean is that wmb does not give the same propagation properties as mb.

In general, _no_ two distinct relations in the LKMM have the same
propagation properties.  If wmb always behaved the same way as mb, we
wouldn't use two separate words for them.

> The claim is based on these relations from the memory model
>
> let strong-fence = mb | gp
> ...
> let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
>     po-unlock-lock-po) ; [Marked]
> let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
>     [Marked] ; rfe? ; [Marked]

Please be more specific.  What difference between mb and wmb are you
concerned about?  Can you give a small litmus test that illustrates this
difference?  Can you explain in more detail how this difference affects
the qspinlock implementation?

> From an engineering perspective, I think the only issue is that cat
> *currently* does not have any syntax for this,

Syntax for what?  The difference between wmb and mb?

> nor does herd currently
> implement the await model checking techniques proposed in those works
> (c.f. Theorem 5.3. in the "making weak memory models fair" paper,
> which says that for this kind of loop, iff the mo-maximal reads in
> some graph are read in a loop iteration that does not exit the loop,
> the loop can run forever). However GenMC and I believe also Dat3M and
> recently also Nidhugg support such techniques. It may not even be too
> much effort to implement something like this in herd if desired.

I believe that herd has no way to express the idea of a program running
forever.  On the other hand, it's certainly true (in all of these
models) that for any finite number N, there is a feasible execution in
which a loop runs for more than N iterations before the termination
condition eventually becomes true.

Alan

> The Dartagnan model checker uses the Theorem 5.3 from above to detect
> liveness violations.
>
> We did not try to come up with a litmus test about the behavior
> because herd7 cannot reason about liveness.
> However, if anybody is interested, the violating execution is shown here
> https://github.com/huawei-drc/cna-verification/blob/master/verification-output/BUG1.png
>
> Hernan