All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Hao Li <hao.li@linux.dev>, Alexei Starovoitov <ast@kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun@kernel.org>, Zqiang <qiang.zhang@linux.dev>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	rcu@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 4/8] mm/slab: introduce kfree_rcu_nolock()
Date: Wed, 22 Apr 2026 12:02:03 +0900	[thread overview]
Message-ID: <aeg6K-NcvFHZ02bk@hyeyoo> (raw)
In-Reply-To: <805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop>

On Tue, Apr 21, 2026 at 04:10:41PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 21, 2026 at 03:46:30PM -0700, Alexei Starovoitov wrote:
> > On Thu Apr 16, 2026 at 2:10 AM PDT, Harry Yoo (Oracle) wrote:
> > >  struct kfree_rcu_cpu {
> > > +	// Objects queued on a lockless linked list, used to free objects
> > > +	// in unknown contexts when trylock fails.
> > > +	struct llist_head defer_head;
> > > +
> > > +	struct irq_work defer_free;
> > > +	struct irq_work sched_delayed_monitor;
> > > +	struct irq_work run_page_cache_worker;
> > > +
> > >  	// Objects queued on a linked list
> > >  	struct rcu_ptr *head;
> > >  	unsigned long head_gp_snap;
> > > @@ -1333,12 +1341,99 @@ struct kfree_rcu_cpu {
> > >  	struct llist_head bkvcache;
> > >  	int nr_bkv_objs;
> > >  };
> > > +
> > > +static void defer_kfree_rcu_irq_work_fn(struct irq_work *work);
> > > +static void sched_delayed_monitor_irq_work_fn(struct irq_work *work);
> > > +static void run_page_cache_worker_irq_work_fn(struct irq_work *work);
> > > +
> > > +static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> > > +	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
> > > +	.defer_head = LLIST_HEAD_INIT(defer_head),
> > > +	.defer_free = IRQ_WORK_INIT(defer_kfree_rcu_irq_work_fn),
> > > +	.sched_delayed_monitor =
> > > +		IRQ_WORK_INIT_LAZY(sched_delayed_monitor_irq_work_fn),
> > > +	.run_page_cache_worker =
> > > +		IRQ_WORK_INIT_LAZY(run_page_cache_worker_irq_work_fn),
> > > +};
> > 
> > I think kfree_rcu_cpu doesn't need to be per-cpu.

After reading this, I was like "Oh, that's quite a drastic change?",
but looks like I misread it. I didn't create a new percpu structure,
but extended the existing one.

I guess you meant the new fields added (defer_head, and irq works)
to struct kfree_rcu_cpu, not the whole structure.

> > It can be global llist with single irq_work for them all.

It could be, but what is the benefit of separating them from
existing kfree_rcu_cpu and making them global?

> I would be quite nervous about that, but you might well be right, given
> that this is a trylock-acquisition failure path.  Give or take people
> and/or machines analyzing the code for potential denial-of-service
> attacks.  :-/

It'll probably not that bad because it's trylock-acquisiion failure path
of per-cpu lock; IIRC during my test, falling back to defer_free
happened only a few times (< 10) when the kunit test is calling
kfree_rcu() in a tight loop (100k calls) while concurrently invoking
kfree_rcu_nolock() ~10k times on the same CPU.

> > Not sure about sched_delayed_monitor/run_page_cache_worker.
> > Do they have to be per-cpu ?

Since existing sched_delayed_monitor/run_page_cache_worker works are
per-cpu, I think it's better to keep those irq_works per-cpu as well.

> > Can all 3 share single irq_work?

I thought defer_free and defer_call_rcu should be non-lazy irq work
and others should be lazy irq work. And I was thinking of having
one lazy and one non-lazy IRQ work (two instead of four).

But given that sched_delayed_monitor and run_page_cache_worker should
not triggered that frequently anyway, it'll probably be okay for all of
them to share a single non-lazy IRQ work.

> On the other hand, if all CPUs are doing kfree_rcu() in even a semi-tight
> loop, having them all unconditionally use global state is not going to
> make for a fun time on large systems.  And there already are situations
> where user code can make all CPUs to call_rcu() in a semi-tight loop,
> so even if that is not yet the case for kfree_rcu(), past experience
> indicates that it soon will be.

A tight loop for kfree_rcu() should be fine.

I think the question is "Can a malicious user can make all CPUs to
kfree_rcu() in a tight loop AND concurrently trigger kfree_rcu_nolock()
on those CPUs, so that trylock will mostly fail"

> And noted on the desirability of call_rcu_nolock(), apologies for being
> slow.

No problem. Really appreciate looking into it, Alexei and Paul!

-- 
Cheers,
Harry / Hyeonggon


  parent reply	other threads:[~2026-04-22  3:02 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-16  9:10 [RFC PATCH v2 0/8] kvfree_rcu() improvements Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 1/8] mm/slab: introduce k[v]free_rcu() with struct rcu_ptr Harry Yoo (Oracle)
2026-04-22 14:41   ` Vlastimil Babka (SUSE)
2026-04-23  1:36     ` Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 2/8] fs/dcache: use rcu_ptr instead of rcu_head for external names Harry Yoo (Oracle)
2026-04-21 20:21   ` Al Viro
2026-04-22  1:16     ` Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 3/8] mm/slab: move kfree_rcu_cpu[_work] definitions Harry Yoo (Oracle)
2026-04-16  9:10 ` [PATCH 4/8] mm/slab: introduce kfree_rcu_nolock() Harry Yoo (Oracle)
2026-04-21 22:46   ` Alexei Starovoitov
2026-04-21 23:10     ` Paul E. McKenney
2026-04-21 23:14       ` Alexei Starovoitov
2026-04-22  3:02       ` Harry Yoo (Oracle) [this message]
2026-04-22 14:42   ` Uladzislau Rezki
2026-04-23  1:08     ` Harry Yoo (Oracle)
2026-04-23  1:56       ` Harry Yoo (Oracle)
2026-04-27 18:08         ` Vlastimil Babka (SUSE)
2026-04-27 18:51           ` Paul E. McKenney
2026-04-23  2:14       ` Harry Yoo (Oracle)
2026-04-23  4:23     ` Harry Yoo (Oracle)
2026-04-23 11:35       ` Uladzislau Rezki
2026-04-28 13:12         ` Harry Yoo (Oracle)
2026-04-30 12:10           ` Uladzislau Rezki
2026-04-27 13:08   ` Vlastimil Babka (SUSE)
2026-04-16  9:10 ` [PATCH 5/8] mm/slab: make kfree_rcu_nolock() work with sheaves Harry Yoo (Oracle)
2026-04-27 13:32   ` Vlastimil Babka (SUSE)
2026-04-27 13:53     ` Vlastimil Babka (SUSE)
2026-04-27 14:45       ` Alexei Starovoitov
2026-04-27 15:08         ` Vlastimil Babka (SUSE)
2026-04-27 15:11           ` Alexei Starovoitov
2026-04-16  9:10 ` [PATCH 6/8] mm/slab: wrap rcu sheaf handling with ifdef Harry Yoo (Oracle)
2026-04-27 15:47   ` Vlastimil Babka (SUSE)
2026-04-16  9:10 ` [PATCH 7/8] mm/slab: introduce deferred submission of rcu sheaves Harry Yoo (Oracle)
2026-04-21 22:51   ` Alexei Starovoitov
2026-04-22  3:11     ` Harry Yoo (Oracle)
2026-04-27 15:55   ` Vlastimil Babka (SUSE)
2026-04-16  9:10 ` [PATCH 8/8] lib/tests/slub_kunit: add a test case for kfree_rcu_nolock() Harry Yoo (Oracle)
2026-04-22 14:30 ` [RFC PATCH v2 0/8] kvfree_rcu() improvements Vlastimil Babka (SUSE)
2026-04-22 22:41   ` Paul E. McKenney
2026-04-23  1:31   ` Harry Yoo (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aeg6K-NcvFHZ02bk@hyeyoo \
    --to=harry@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=boqun@kernel.org \
    --cc=cl@gentwo.org \
    --cc=frederic@kernel.org \
    --cc=hao.li@linux.dev \
    --cc=jiangshanlai@gmail.com \
    --cc=joelagnelf@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=qiang.zhang@linux.dev \
    --cc=rcu@vger.kernel.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=urezki@gmail.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.