From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@kernel.org>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hao Li <hao.li@linux.dev>, Alexei Starovoitov <ast@kernel.org>,
"Paul E . McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun@kernel.org>, Zqiang <qiang.zhang@linux.dev>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
rcu@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 4/8] mm/slab: introduce kfree_rcu_nolock()
Date: Thu, 23 Apr 2026 13:23:25 +0900
Message-ID: <aemevSofweaUSx0n@hyeyoo>
In-Reply-To: <aejeVK0J_jHSfVhD@milan>
On Wed, Apr 22, 2026 at 04:42:28PM +0200, Uladzislau Rezki wrote:
> I think a better option is to add a separate kvfree_rcu_nmi() helper,
> or similar, and avoid complicating the generic implementation. Otherwise,
> the common path risks becoming harder to maintain.
>
> Below is a simple implementation.
I'm happy to keep things simple as long as that doesn't mean
compromising performance.
We can discuss that.
> <patch>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index d5a70a831a2a..f6ae3795ec6c 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1402,6 +1402,14 @@ struct kfree_rcu_cpu {
>
> struct llist_head bkvcache;
> int nr_bkv_objs;
> +
> + /* For NMI context. */
I think "unknown context" is a better term, since it covers
NMI context as well as other contexts.
(I'm also slightly leaning towards that term myself, :D)
> + struct llist_head drain_list;
> + struct llist_node *pending_list;
> +
> + struct rcu_work drain_rcu_work;
> + struct irq_work drain_irqwork;
> + atomic_t drain_in_progress;
> };
[... changing the order of functions a little bit to help review ...]
> static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> @@ -1926,6 +1934,69 @@ void __init kfree_rcu_scheduler_running(void)
> }
> }
> +
> +/*
> + * Queue a request for lazy invocation.
> + * Context: For NMI contexts or unknown contexts only.
> + */
> +void
> +kvfree_call_rcu_nolock(struct rcu_head *head, void *ptr)
> +{
> + struct kfree_rcu_cpu *krcp = this_cpu_ptr(&krc);
> +
> + head->func = ptr;
> + llist_add((struct llist_node *) head, &krcp->drain_list);
> +
So it inserts objects into the list,
> + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) {
> + /* Only first(and only one) user rings the bell. */
> + if (!atomic_cmpxchg(&krcp->drain_in_progress, 0, 1))
> + irq_work_queue(&krcp->drain_irqwork);
and only the task that wins the cmpxchg queues the IRQ work.
The IRQ work then queues an RCU work item, which iterates over the
list of objects and frees them.

Draining will be performed a little more frequently
(every call_rcu_hurry() plus the workqueue delay) than in the ordinary
kvfree_rcu() path (every 1-5 seconds).

The question is how frequent is too frequent, i.e. at what point the
additional IRQ/RCU work invocations start to hurt performance.
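For illustration, here is a minimal userspace sketch of the "only the
first user rings the bell" gate, using C11 atomics in place of the
kernel's llist/irq_work primitives. The names `push`, `enqueue`,
`bell_rings`, and `demo_single_dispatch` are invented for this model;
it is a sketch of the dispatch pattern, not the kernel code itself:

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; };

static struct node *_Atomic drain_list;	/* models krcp->drain_list */
static atomic_int drain_in_progress;	/* models krcp->drain_in_progress */
static int bell_rings;			/* counts irq_work_queue() calls */

/* llist_add() analogue: lock-free push onto a singly linked stack */
static void push(struct node *n)
{
	struct node *old = atomic_load(&drain_list);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&drain_list, &old, n));
}

/*
 * kvfree_call_rcu_nolock() analogue: add the object, and only the
 * caller that flips drain_in_progress from 0 to 1 queues the work.
 */
static void enqueue(struct node *n)
{
	int expected = 0;

	push(n);
	if (atomic_compare_exchange_strong(&drain_in_progress, &expected, 1))
		bell_rings++;	/* stands in for irq_work_queue() */
}

/*
 * Enqueue five objects while no drain is in flight; the work
 * should be queued exactly once.
 */
static int demo_single_dispatch(void)
{
	static struct node nodes[5];

	for (int i = 0; i < 5; i++)
		enqueue(&nodes[i]);
	return bell_rings;
}
```

Every caller pushes unconditionally, but only the cmpxchg winner pays
for the dispatch, so concurrent NMI-context callers stay lock-free.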
> +static void
> +kvfree_rcu_nolock_irqwork(struct irq_work *irqwork)
> +{
> + struct kfree_rcu_cpu *krcp =
> + container_of(irqwork, struct kfree_rcu_cpu, drain_irqwork);
> + bool queued;
> +
> + krcp->pending_list = llist_del_all(&krcp->drain_list);
> + ASSERT_EXCLUSIVE_WRITER(krcp->pending_list);
> + queued = queue_rcu_work(rcu_reclaim_wq, &krcp->drain_rcu_work);
> + WARN_ON_ONCE(!queued);
> +}
>
> +static void
> +kvfree_rcu_nolock_work(struct work_struct *work)
> +{
> + struct kfree_rcu_cpu *krcp = container_of(to_rcu_work(work),
> + struct kfree_rcu_cpu, drain_rcu_work);
> + struct llist_node *pos, *n, *pending;
> + bool queued;
> +
> + pending = krcp->pending_list;
> + krcp->pending_list = NULL;
> + ASSERT_EXCLUSIVE_WRITER(krcp->pending_list);
> +
> + llist_for_each_safe(pos, n, pending) {
> + struct rcu_head *rcu = (struct rcu_head *) pos;
> + void *ptr = (void *) rcu->func;
> + kvfree(ptr);
> + }
This is pretty similar to what a kvfree_rcu(two_arg) call does
in the slowpath (kvfree_rcu_list), except that we don't maintain
RCU state explicitly.
How much performance do we sacrifice compared to
letting them go through the kvfree_rcu() fastpath?
> + atomic_set(&krcp->drain_in_progress, 0);
> + if (!llist_empty(&krcp->drain_list)) {
> + if (!atomic_cmpxchg(&krcp->drain_in_progress, 0, 1)) {
> + krcp->pending_list = llist_del_all(&krcp->drain_list);
> + ASSERT_EXCLUSIVE_WRITER(krcp->pending_list);
> + queued = queue_rcu_work(rcu_reclaim_wq, &krcp->drain_rcu_work);
> + WARN_ON_ONCE(!queued);
> + }
> + }
> +}
> + }
> +}
> +EXPORT_SYMBOL_GPL(kvfree_call_rcu_nolock);
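The atomic_set()-then-recheck sequence at the end of the work function
closes the window where an object is added after llist_del_all() but
before drain_in_progress is cleared. A userspace model of that
drain/re-check dance, again with C11 atomics and invented names
(`irqwork`, `work`, `demo_late_arrival` are illustrative, not the
patch's symbols):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; int freed; };

static struct node *_Atomic drain_list;	/* models krcp->drain_list */
static struct node *pending_list;	/* models krcp->pending_list */
static atomic_int drain_in_progress;
static int requeues;			/* counts queue_rcu_work() calls */

/* llist_add() analogue */
static void push(struct node *n)
{
	struct node *old = atomic_load(&drain_list);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&drain_list, &old, n));
}

/* llist_del_all() analogue */
static struct node *del_all(void)
{
	return atomic_exchange(&drain_list, NULL);
}

/* irq work analogue: snapshot the list for the work function */
static void irqwork(void)
{
	pending_list = del_all();
}

/*
 * Work function analogue: free the snapshot, then re-check for
 * objects that arrived after the snapshot and would otherwise
 * never be drained.
 */
static void work(void)
{
	struct node *pos = pending_list, *next;
	int expected = 0;

	pending_list = NULL;
	while (pos) {
		next = pos->next;
		pos->freed = 1;	/* stands in for kvfree() */
		pos = next;
	}

	atomic_store(&drain_in_progress, 0);
	if (atomic_load(&drain_list) != NULL &&
	    atomic_compare_exchange_strong(&drain_in_progress, &expected, 1)) {
		pending_list = del_all();
		requeues++;	/* stands in for queue_rcu_work() */
	}
}

/* One object slips in between the snapshot and the drain. */
static int demo_late_arrival(void)
{
	static struct node a, b, late;

	atomic_store(&drain_in_progress, 1);	/* as if irq work was queued */
	push(&a);
	push(&b);
	irqwork();
	push(&late);	/* arrives after the snapshot was taken */
	work();		/* frees a and b, notices late, requeues */
	work();		/* second pass frees late */
	return a.freed + b.freed + late.freed + requeues;
}
```

Without the re-check, `late` would sit on the list until the next
unrelated enqueue happened to win the cmpxchg.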
> +
> /*
> * Queue a request for lazy invocation of the appropriate free routine
> * after a grace period. Please note that three paths are maintained,
--
Cheers,
Harry / Hyeonggon