Date: Tue, 12 Jul 2022 10:56:41 +1000
From: Nicholas Piggin
Subject: Re: [PATCH 00/13] locking/qspinlock: simplify code generation
To: Peter Zijlstra
Cc: Boqun Feng, linux-kernel@vger.kernel.org, Waiman Long, Ingo Molnar,
 Will Deacon
References: <20220704143820.3071004-1-npiggin@gmail.com>
In-Reply-To:
Message-Id: <1657587057.joeh91dvd9.astroid@bobo.none>

Excerpts from Peter Zijlstra's message of July 6, 2022 3:59 am:
> On Tue, Jul 05, 2022 at 12:38:07AM +1000, Nicholas Piggin wrote:
>> Hi,
>>
>> I've recently been looking a bit closer at the queued spinlock code,
>> and found it's a little tricky to follow, especially the pv
>> generation. This series tries to improve the situation. It's not
>> well tested outside powerpc, but it's really the x86 pv code, the
>> other major source of complexity, that needs review and testing.
>> Opinions?
>
> Perhaps something like so on top/instead? This would still allow
> slotting in other implementations with relative ease and the compilers
> should constant fold all this.

Yeah, that could be a bit neater... I don't know. It all has to be
inlined and compiled together, so it's a matter of taste in syntactic
sugar. Doing it with C is probably better than doing it with CPP, all
else being equal.

At the moment I'm not planning to replace the PV functions on powerpc,
though. If/when it comes to that, I'd say more changes would be needed.

Thanks,
Nick

>
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -609,7 +609,7 @@ static void pv_kick_node(struct qspinloc
>   *
>   * The current value of the lock will be returned for additional processing.
>   */
> -static void pv_wait_head_or_lock(struct qspinlock *lock, struct qnode *node)
> +static u32 pv_wait_head_or_lock(struct qspinlock *lock, struct qnode *node)
>  {
>  	struct qspinlock **lp = NULL;
>  	int waitcnt = 0;
> @@ -641,7 +641,7 @@ static void pv_wait_head_or_lock(struct
>  		set_pending(lock);
>  		for (loop = SPIN_THRESHOLD; loop; loop--) {
>  			if (trylock_clear_pending(lock))
> -				return; /* got lock */
> +				goto out; /* got lock */
>  			cpu_relax();
>  		}
>  		clear_pending(lock);
> @@ -669,7 +669,7 @@ static void pv_wait_head_or_lock(struct
>  				 */
>  				WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
>  				WRITE_ONCE(*lp, NULL);
> -				return; /* got lock */
> +				goto out; /* got lock */
>  			}
>  		}
>  		WRITE_ONCE(node->state, vcpu_hashed);
> @@ -683,12 +683,22 @@ static void pv_wait_head_or_lock(struct
>  		 */
>  	}
>
> +out:
>  	/*
>  	 * The cmpxchg() or xchg() call before coming here provides the
>  	 * acquire semantics for locking.
>  	 */
> +	return atomic_read(&lock->val);
>  }
>
> +static const struct queue_ops pv_ops = {
> +	.init_node = pv_init_node,
> +	.trylock = pv_hybrid_queued_unfair_trylock,
> +	.wait_node = pv_wait_node,
> +	.wait_head_or_lock = pv_wait_head_or_lock,
> +	.kick_node = pv_kick_node,
> +};
> +
>  /*
>   * PV versions of the unlock fastpath and slowpath functions to be used
>   * instead of queued_spin_unlock().
> @@ -756,18 +766,18 @@ __visible void __pv_queued_spin_unlock(s
>  EXPORT_SYMBOL(__pv_queued_spin_unlock);
>  #endif
>
> -#else /* CONFIG_PARAVIRT_SPINLOCKS */
> -static __always_inline void pv_init_node(struct qnode *node) { }
> -static __always_inline void pv_wait_node(struct qnode *node,
> -					 struct qnode *prev) { }
> -static __always_inline void pv_kick_node(struct qspinlock *lock,
> -					 struct qnode *node) { }
> -static __always_inline void pv_wait_head_or_lock(struct qspinlock *lock,
> -						 struct qnode *node) { }
> -static __always_inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock) { BUILD_BUG(); }
>  #endif /* CONFIG_PARAVIRT_SPINLOCKS */
>
> -static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, bool paravirt)
> +struct queue_ops {
> +	void (*init_node)(struct qnode *node);
> +	bool (*trylock)(struct qspinlock *lock);
> +	void (*wait_node)(struct qnode *node, struct qnode *prev);
> +	u32 (*wait_head_or_lock)(struct qspinlock *lock, struct qnode *node);
> +	void (*kick_node)(struct qspinlock *lock, struct qnode *node);
> +};
> +
> +static __always_inline
> +void queued_spin_lock_mcs_queue(struct qspinlock *lock, const struct queue_ops *ops)
>  {
>  	struct qnode *prev, *next, *node;
>  	u32 val, old, tail;
> @@ -813,16 +823,16 @@ static inline void queued_spin_lock_mcs_
>
>  	node->locked = 0;
>  	node->next = NULL;
> -	if (paravirt)
> -		pv_init_node(node);
> +	if (ops && ops->init_node)
> +		ops->init_node(node);
>
>  	/*
>  	 * We touched a (possibly) cold cacheline in the per-cpu queue node;
>  	 * attempt the trylock once more in the hope someone let go while we
>  	 * weren't watching.
>  	 */
> -	if (paravirt) {
> -		if (pv_hybrid_queued_unfair_trylock(lock))
> +	if (ops && ops->trylock) {
> +		if (ops->trylock(lock))
>  			goto release;
>  	} else {
>  		if (queued_spin_trylock(lock))
> @@ -857,8 +867,8 @@ static inline void queued_spin_lock_mcs_
>  		WRITE_ONCE(prev->next, node);
>
>  		/* Wait for mcs node lock to be released */
> -		if (paravirt)
> -			pv_wait_node(node, prev);
> +		if (ops && ops->wait_node)
> +			ops->wait_node(node, prev);
>  		else
>  			smp_cond_load_acquire(&node->locked, VAL);
>
> @@ -893,12 +903,11 @@ static inline void queued_spin_lock_mcs_
>  	 * If PV isn't active, 0 will be returned instead.
>  	 *
>  	 */
> -	if (paravirt) {
> -		pv_wait_head_or_lock(lock, node);
> -		val = atomic_read(&lock->val);
> +	if (ops && ops->wait_head_or_lock) {
> +		val = ops->wait_head_or_lock(lock, node);
>  	} else {
>  		val = atomic_cond_read_acquire(&lock->val,
> -					!(VAL & _Q_LOCKED_PENDING_MASK));
> +					       !(VAL & _Q_LOCKED_PENDING_MASK));
>  	}
>
>  	/*
> @@ -1049,14 +1058,14 @@ void queued_spin_lock_slowpath(struct qs
>  	 */
>  queue:
>  	lockevent_inc(lock_slowpath);
> -	queued_spin_lock_mcs_queue(lock, false);
> +	queued_spin_lock_mcs_queue(lock, NULL);
>  }
>  EXPORT_SYMBOL(queued_spin_lock_slowpath);
>
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>  void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  {
> -	queued_spin_lock_mcs_queue(lock, true);
> +	queued_spin_lock_mcs_queue(lock, &pv_ops);
>  }
>  EXPORT_SYMBOL(__pv_queued_spin_lock_slowpath);
>
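
To make the "compilers should constant fold all this" point concrete,
here is a minimal standalone sketch of the same pattern: a const ops
table passed as a compile-time constant into an __always_inline common
body. This is plain userspace C for illustration only; hooks,
acquire_common, virt_hooks and the rest are made-up names, not
identifiers from the patch above.

	#include <stdio.h>
	#include <stddef.h>

	#define __always_inline inline __attribute__((__always_inline__))

	struct hooks {
		void (*prepare)(int *v);
		int  (*wait)(int *v);
	};

	/*
	 * One generic body; each call site becomes a specialized copy
	 * because 'h' is always a compile-time constant (NULL or
	 * &virt_hooks), so the tests and indirect calls fold away.
	 */
	static __always_inline int acquire_common(int *v, const struct hooks *h)
	{
		if (h && h->prepare)
			h->prepare(v);
		if (h && h->wait)
			return h->wait(v);
		return *v;		/* "native" path */
	}

	static void virt_prepare(int *v) { *v += 1; }
	static int  virt_wait(int *v)    { return *v * 2; }

	static const struct hooks virt_hooks = {
		.prepare = virt_prepare,
		.wait    = virt_wait,
	};

	/* Two specializations of the same inlined body. */
	int native_acquire(int *v) { return acquire_common(v, NULL); }
	int virt_acquire(int *v)   { return acquire_common(v, &virt_hooks); }

	int main(void)
	{
		int v = 20;

		printf("native: %d\n", native_acquire(&v));	/* 20 */
		printf("virt:   %d\n", virt_acquire(&v));	/* (20+1)*2 = 42 */
		return 0;
	}

Building this with gcc -O2 -S (or clang) and reading the assembly for
native_acquire() and virt_acquire() should show no surviving NULL tests
or indirect calls: each function is its own specialized copy of the one
inlined body, which is the same effect the current bool paravirt flag
gets by branching on a constant instead.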