From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 18 Mar 2026 15:15:06 -0700
From: Boqun Feng
To: Joel Fernandes
Cc: paulmck@kernel.org, Sebastian Andrzej Siewior, frederic@kernel.org,
	neeraj.iitr10@gmail.com, urezki@gmail.com, boqun.feng@gmail.com,
	rcu@vger.kernel.org, Kumar Kartikeya Dwivedi
Subject: Re: Next-level bug in SRCU implementation of RCU Tasks Trace + PREEMPT_RT
References: <20260318105058.j2aKncBU@linutronix.de>
 <20260318144305.xI6RDtzk@linutronix.de>
 <214fb140-041d-4fd1-8694-658547209b84@paulmck-laptop>
 <3c4c5a29-24ea-492d-aeee-e0d9605b4183@nvidia.com>
X-Mailing-List: rcu@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Wed, Mar 18, 2026 at 02:55:48PM -0700, Boqun Feng wrote:
> On Wed, Mar 18, 2026 at 02:52:48PM -0700, Boqun Feng wrote:
> [...]
> > > Ah so it is an ABBA deadlock, not an ABA self-deadlock. I guess this
> > > is a different issue from the NMI issue? It is more of an issue of
> > > calling the call_srcu API with scheduler locks held.
> > >
> > > Something like below I think:
> > >
> > > CPU A (BPF tracepoint)                    CPU B (concurrent call_srcu)
> > > ----------------------------              ------------------------------------
> > > [1] holds &rq->__lock                     [2]
> > >                                             -> call_srcu
> > >                                               -> srcu_gp_start_if_needed
> > >                                                 -> srcu_funnel_gp_start
> > >                                                   -> spin_lock_irqsave_ssp_content...
> > >                                                   -> holds srcu locks
> > >
> > > [4] calls call_rcu_tasks_trace()          [5] srcu_funnel_gp_start (cont..)
> > >                                             -> queue_delayed_work
> > >   -> call_srcu()                              -> __queue_work()
> > >     -> srcu_gp_start_if_needed()                -> wake_up_worker()
> > >       -> srcu_funnel_gp_start()                   -> try_to_wake_up()
> > >         -> spin_lock_irqsave_ssp_contention()       [6] WANTS rq->__lock
> > >         -> WANTS srcu locks
> >
> > I see, we can also have a self-deadlock even without CPU B, when CPU A
> > is going to try_to_wake_up() a worker on the same CPU.
> >
> > An interesting observation is that the deadlock can be avoided if
> > queue_delayed_work() uses a non-zero delay: that means a timer will be
> > armed instead of acquiring the rq lock.
If my observation is correct, then this can probably fix the deadlock
issue with the runqueue lock (untested, though), but it won't work if a
BPF tracepoint can fire with the timer base lock held.

Regards,
Boqun

------>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 2328827f8775..a5d67264acb5 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1061,6 +1061,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 	struct srcu_node *snp_leaf;
 	unsigned long snp_seq;
 	struct srcu_usage *sup = ssp->srcu_sup;
+	bool irqs_were_disabled;
 
 	/* Ensure that snp node tree is fully initialized before traversing it */
 	if (smp_load_acquire(&sup->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
@@ -1098,6 +1099,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 
 	/* Top of tree, must ensure the grace period will be started. */
 	raw_spin_lock_irqsave_ssp_contention(ssp, &flags);
+	irqs_were_disabled = irqs_disabled_flags(flags);
 	if (ULONG_CMP_LT(sup->srcu_gp_seq_needed, s)) {
 		/*
 		 * Record need for grace period s.  Pair with load
@@ -1118,9 +1120,16 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 		// it isn't.  And it does not have to be.  After all, it
 		// can only be executed during early boot when there is only
 		// the one boot CPU running with interrupts still disabled.
+		//
+		// If irqs were disabled when call_srcu() was called, we
+		// could be in the scheduler path with a runqueue lock held.
+		// Delay the process_srcu() work by one more jiffy so that
+		// we don't go through the kick_pool() -> wake_up_process()
+		// path below, avoiding the deadlock with the runqueue lock.
 		if (likely(srcu_init_done))
 			queue_delayed_work(rcu_gp_wq, &sup->work,
-					   !!srcu_get_delay(ssp));
+					   !!srcu_get_delay(ssp) +
+					   !!irqs_were_disabled);
 		else if (list_empty(&sup->work.work.entry))
 			list_add(&sup->work.work.entry, &srcu_boot_list);
 	}