Date: Tue, 24 Mar 2026 10:36:08 -0700
From: Boqun Feng
To: Alexei Starovoitov
Cc: Frederic Weisbecker, Joel Fernandes, "Paul E. McKenney",
    Kumar Kartikeya Dwivedi, Sebastian Andrzej Siewior, Neeraj Upadhyay,
    Uladzislau Rezki, Boqun Feng, rcu@vger.kernel.org, Tejun Heo, bpf,
    Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrea Righi,
    Zqiang
Subject: Re: [PATCH v2] rcu: Use an intermediate irq_work to start process_srcu()
References: <20260320222916.19987-1-boqun@kernel.org>
X-Mailing-List: rcu@vger.kernel.org

On Tue, Mar 24, 2026 at 07:56:44AM -0700, Alexei Starovoitov wrote:
> On Tue, Mar 24, 2026 at 4:27 AM Frederic Weisbecker wrote:
> >
> > On Fri, Mar 20, 2026 at 03:29:16PM -0700, Boqun Feng wrote:
> > > Since commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in
> > > terms of SRCU-fast") we switched to SRCU in BPF. However, as BPF
> > > instrumentation can happen basically everywhere (including where a
> > > scheduler lock is held), call_srcu() now needs to avoid acquiring
> > > scheduler locks, because otherwise it could cause a deadlock [1].
> > > Fix this by following what RCU Tasks Trace previously did: use an
> > > irq_work to delay the queuing of the work that starts
> > > process_srcu().
> > >
> > > [boqun: Apply Joel's feedback]
> > > [boqun: Apply Andrea's test feedback]
> > >
> > > Reported-by: Andrea Righi
> > > Closes: https://lore.kernel.org/all/abjzvz_tL_siV17s@gpd4/
> > > Fixes: c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
> > > Link: https://lore.kernel.org/rcu/3c4c5a29-24ea-492d-aeee-e0d9605b4183@nvidia.com/ [1]
> > > Suggested-by: Zqiang
> > > Tested-by: Andrea Righi
> > > Signed-off-by: Boqun Feng
> >
> > I have the feeling that this problem should be solved at the BPF
> > level.
> > Tracepoints can fire at any time; in that sense they are like NMIs,
> > and NMIs shouldn't acquire locks, let alone call call_rcu_*().
> >
> > BPF should arrange for delaying such operations to more appropriate
> > contexts.
> >
> > I understand this is a regression triggered by an RCU change, but to
> > me it reveals a hidden design issue rather than an API breakage.
>
> You all are still missing that rcu_tasks_trace was developed
> exclusively for bpf with bpf requirements.

I'm missing this for sure. But what would be a better design? BPF calls
normal RCU (via call_rcu()) as well, and I don't think we can say normal
call_rcu() was designed exclusively for BPF. Plus, BPF heavily uses
irq_work to avoid deadlocks, but irq_work_queue() itself has a
tracepoint in it (trace_ipi_send_cpu()), so in theory you could hit a
deadlock there too.

> Then srcu_fast was introduced, and then it looked like task_trace
> could be replaced with srcu_fast, and that's where the problems were
> discovered. So either task_trace needs to be resurrected or srcu_fast
> needs to be fixed.
> "Let's punt to the bpf subsystem" isn't an option.

I don't think we are suggesting that, i.e. "BPF should fix its own
issue". I myself, at least, was hoping that we could redesign the APIs
that BPF relies on, make it clear that they are for BPF, and in theory
avoid all the deadlocks (we will probably have to make some primitives
non-traceable); BPF could then use them combined with other general
synchronization primitives. The current approach seems to me like
whack-a-mole whenever an issue happens, not a systematic solution. (It
doesn't have to be a problem; it could be an opportunity ;-))

Regards,
Boqun

> At this rate we will have rcu_bpf. Which rcu_tasks_trace effectively
> was.