From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 22 Apr 2026 12:02:03 +0900
From: "Harry Yoo (Oracle)"
To: "Paul E. McKenney"
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Christoph Lameter,
	David Rientjes, Roman Gushchin, Hao Li, Alexei Starovoitov,
	Uladzislau Rezki, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Zqiang, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH 4/8] mm/slab: introduce kfree_rcu_nolock()
References: <20260416091022.36823-1-harry@kernel.org>
	<20260416091022.36823-5-harry@kernel.org>
	<805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop>
X-Mailing-List: rcu@vger.kernel.org
In-Reply-To: <805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop>

On Tue, Apr 21, 2026 at 04:10:41PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 21, 2026 at 03:46:30PM -0700, Alexei Starovoitov wrote:
> > On Thu Apr 16, 2026 at 2:10 AM PDT, Harry Yoo (Oracle) wrote:
> > >  struct kfree_rcu_cpu {
> > > +	// Objects queued on a lockless linked list, used to free objects
> > > +	// in unknown contexts when trylock fails.
> > > +	struct llist_head defer_head;
> > > +
> > > +	struct irq_work defer_free;
> > > +	struct irq_work sched_delayed_monitor;
> > > +	struct irq_work run_page_cache_worker;
> > > +
> > >  	// Objects queued on a linked list
> > >  	struct rcu_ptr *head;
> > >  	unsigned long head_gp_snap;
> > > @@ -1333,12 +1341,99 @@ struct kfree_rcu_cpu {
> > >  	struct llist_head bkvcache;
> > >  	int nr_bkv_objs;
> > >  };
> > > +
> > > +static void defer_kfree_rcu_irq_work_fn(struct irq_work *work);
> > > +static void sched_delayed_monitor_irq_work_fn(struct irq_work *work);
> > > +static void run_page_cache_worker_irq_work_fn(struct irq_work *work);
> > > +
> > > +static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> > > +	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
> > > +	.defer_head = LLIST_HEAD_INIT(defer_head),
> > > +	.defer_free = IRQ_WORK_INIT(defer_kfree_rcu_irq_work_fn),
> > > +	.sched_delayed_monitor =
> > > +		IRQ_WORK_INIT_LAZY(sched_delayed_monitor_irq_work_fn),
> > > +	.run_page_cache_worker =
> > > +		IRQ_WORK_INIT_LAZY(run_page_cache_worker_irq_work_fn),
> > > +};
> >
> > I think kfree_rcu_cpu doesn't need to be per-cpu.

After reading this, I was like "Oh, that's quite a drastic change?",
but looks like I misread it. I didn't create a new percpu structure,
but extended the existing one.

I guess you meant the new fields added (defer_head, and irq works) to
struct kfree_rcu_cpu, not the whole structure.

> > It can be global llist with single irq_work for them all.

It could be, but what is the benefit of separating them from the existing
kfree_rcu_cpu and making them global?

> I would be quite nervous about that, but you might well be right, given
> that this is a trylock-acquisition failure path.  Give or take people
> and/or machines analyzing the code for potential denial-of-service
> attacks.
:-/ It'll probably not be that bad, because it's the trylock-acquisition
failure path of a per-cpu lock; IIRC during my test, falling back to
defer_free happened only a few times (< 10) when the kunit test was
calling kfree_rcu() in a tight loop (100k calls) while concurrently
invoking kfree_rcu_nolock() ~10k times on the same CPU.

> > Not sure about sched_delayed_monitor/run_page_cache_worker.
> > Do they have to be per-cpu ?

Since the existing sched_delayed_monitor/run_page_cache_worker works are
per-cpu, I think it's better to keep those irq_works per-cpu as well.

> > Can all 3 share single irq_work?

I thought defer_free and defer_call_rcu should be non-lazy irq work and
the others should be lazy irq work, and I was thinking of having one lazy
and one non-lazy IRQ work (two instead of four).

But given that sched_delayed_monitor and run_page_cache_worker should not
be triggered that frequently anyway, it'll probably be okay for all of
them to share a single non-lazy IRQ work.

> On the other hand, if all CPUs are doing kfree_rcu() in even a semi-tight
> loop, having them all unconditionally use global state is not going to
> make for a fun time on large systems.  And there already are situations
> where user code can make all CPUs to call_rcu() in a semi-tight loop,
> so even if that is not yet the case for kfree_rcu(), past experience
> indicates that it soon will be.

A tight loop for kfree_rcu() should be fine. I think the question is:
"Can a malicious user make all CPUs call kfree_rcu() in a tight loop AND
concurrently trigger kfree_rcu_nolock() on those CPUs, so that trylock
will mostly fail?"

> And noted on the desirability of call_rcu_nolock(), apologies for being
> slow.

No problem. Really appreciate you looking into it, Alexei and Paul!

-- 
Cheers,
Harry / Hyeonggon