Date: Wed, 9 Aug 2023 12:58:45 +0100
From: Will Deacon
To: Waiman Long
Cc: Robin Murphy, Mark Rutland, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] perf/arm-dmc620: Fix dmc620_pmu_irqs_lock/cpu_hotplug_lock circular lock dependency
Message-ID: <20230809115845.GA3903@willie-the-truck>
References: <20230807154446.208572-1-longman@redhat.com> <0d32adf1-43fd-2762-d5ab-707d5969dcb0@arm.com>

On Tue, Aug 08, 2023 at 03:10:01PM -0400, Waiman Long wrote:
>
> On 8/8/23 08:29, Robin Murphy wrote:
> > On 2023-08-07 16:44, Waiman Long wrote:
> > > The following circular locking dependency was reported when running
> > > cpus online/offline test on
an arm64 system.
> > >
> > > [   84.195923] Chain exists of:
> > >                  dmc620_pmu_irqs_lock --> cpu_hotplug_lock --> cpuhp_state-down
> > >
> > > [   84.207305]  Possible unsafe locking scenario:
> > >
> > > [   84.213212]        CPU0                    CPU1
> > > [   84.217729]        ----                    ----
> > > [   84.222247]   lock(cpuhp_state-down);
> > > [   84.225899]                                lock(cpu_hotplug_lock);
> > > [   84.232068]                                lock(cpuhp_state-down);
> > > [   84.238237]   lock(dmc620_pmu_irqs_lock);
> > > [   84.242236]
> > >                  *** DEADLOCK ***
> > >
> > > The problematic locking order seems to be
> > >
> > >     lock(dmc620_pmu_irqs_lock) --> lock(cpu_hotplug_lock)
> > >
> > > This locking order happens when dmc620_pmu_get_irq() calls
> > > cpuhp_state_add_instance_nocalls(). Since dmc620_pmu_irqs_lock is used
> > > for protecting the dmc620_pmu_irqs structure only, we don't actually
> > > need to hold the lock when adding a new instance to the CPU hotplug
> > > subsystem.
> > >
> > > Fix this possible deadlock scenario by adding a new
> > > dmc620_pmu_get_irq_lock for protecting the call to __dmc620_pmu_get_irq()
> > > and taking dmc620_pmu_irqs_lock inside __dmc620_pmu_get_irq()
> > > only when dmc620_pmu_irqs is being searched or modified. As a
> > > result, cpuhp_state_add_instance_nocalls() won't be called with
> > > dmc620_pmu_irqs_lock held and cpu_hotplug_lock won't be acquired after
> > > dmc620_pmu_irqs_lock.
> > > Suggested-by: Robin Murphy
> > > Signed-off-by: Waiman Long
> > > ---
> > >  drivers/perf/arm_dmc620_pmu.c | 18 ++++++++++++++----
> > >  1 file changed, 14 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
> > > index 9d0f01c4455a..895971915f2d 100644
> > > --- a/drivers/perf/arm_dmc620_pmu.c
> > > +++ b/drivers/perf/arm_dmc620_pmu.c
> > > @@ -68,6 +68,7 @@
> > >
> > >  static LIST_HEAD(dmc620_pmu_irqs);
> > >  static DEFINE_MUTEX(dmc620_pmu_irqs_lock);
> > > +static DEFINE_MUTEX(dmc620_pmu_get_irq_lock);
> > >
> > >  struct dmc620_pmu_irq {
> > >  	struct hlist_node node;
> > > @@ -421,11 +422,18 @@ static irqreturn_t dmc620_pmu_handle_irq(int irq_num, void *data)
> > >  static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
> > >  {
> > >  	struct dmc620_pmu_irq *irq;
> > > +	bool found = false;
> > >  	int ret;
> > >
> > > +	mutex_lock(&dmc620_pmu_irqs_lock);
> >
> > Do we strictly need this? I'd hope that the outer release/acquire of
> > dmc620_pmu_irqs_lock already means we can't observe an invalid value
> > of irq->irq_num, and the refcount op should be atomic in itself, no?
> > Fair enough if there's some other subtlety I'm missing - I do trust that
> > you're more experienced in locking and barrier semantics than I am! -
> > and if it comes to it I'd agree that simple extra locking is preferable
> > to getting into explicit memory barriers here.
>
> I guess we can use rcu_read_lock/rcu_read_unlock and
> list_for_each_entry_rcu() to avoid taking dmc620_pmu_irqs_lock here.

I thought we decided that we couldn't use RCU in:

https://lore.kernel.org/r/2f56057b-08ef-c3a6-8300-33f36d2c3916@arm.com

?

> > One other nit either way, could we clarify the names to be something
> > like irqs_list_lock and irqs_users_lock?
> > The split locking scheme
> > doesn't exactly lend itself to being super-obvious, especially if we do
> > end up nesting both locks, so I think naming them after what they
> > semantically protect seems the most readable option. Otherwise, this
> > does pretty much look like what I originally had in mind.
>
> I think it is a good idea to rename dmc620_pmu_irqs_lock to
> dmc620_pmu_irqs_list_lock. For the other lock, its purpose is to make sure
> that only one user can get to __dmc620_pmu_get_irq(); maybe
> dmc620_irqs_get_lock. I can add a comment to clarify the nesting
> relationship.

Please do that and I'll pick the patch up for 6.6.

Will