From: Uladzislau Rezki <urezki@gmail.com>
To: "Harry Yoo (Oracle)" <harry@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@kernel.org>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hao Li <hao.li@linux.dev>, Alexei Starovoitov <ast@kernel.org>,
"Paul E . McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun@kernel.org>, Zqiang <qiang.zhang@linux.dev>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
rcu@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 4/8] mm/slab: introduce kfree_rcu_nolock()
Date: Thu, 30 Apr 2026 14:10:58 +0200
Message-ID: <afNG0jNQNYeZ940g@pc636>
In-Reply-To: <3s4jafam3la72a6y3dkfvhtzxk3fsngb2cka3bpfqrirl5m633@pz3vzizefoxb>
Hello, Harry!
>
> Hi Ulad. Apologies for the delayed response.
> I meant to reply sooner but got sidetracked by other issues.
>
No problem, sometimes I also lag because of other tasks :)
> Your questions are fair, but let me try to clarify
> the current situation.
>
> And before diving into details, I would like to reiterate that
> there are potentially two points to discuss here:
>
> Point 1. Can we justify complicating subsystems by passing
> `allow_spin` parameter all over the place?
>
Yes, we can. But as I noted, I see some drawbacks :)
- all new incoming patches have to respect that new third argument;
- the fallback mechanism which uses irq-work is not optimal in my
opinion:
a) We introduce an extra window: a pointer is queued, the irq-work is
marked for execution, and then kfree_rcu() is re-entered with the
no-sync flag. Only at that point do we start waiting for a GP for
those pointers, but a GP might have already passed for them. So we
potentially need more time to offload. This is rather a minus.
b) Since it is for BPF, allow_spin is always false, thus only the
fallback path is used. Decoupling comes to mind.
c)
Why should we mix those? What is worth doing is to prevent mixing the
"unknown path which is for BPF/others" with the generic kfree_rcu().
It is easier to go that way and cleaner, IMO. We need less code
and we address the specific requirements.
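The extra window described in a) can be illustrated with a small userspace
toy model. This is only a sketch under stated assumptions and not the code
from the patch: every toy_* name is hypothetical, gp_seq is a plain counter
standing in for RCU grace-period state, and the model assumes the worst case
where a full GP ends between queuing the pointer and running the irq-work.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a global grace-period counter, not the kernel's RCU state. */
static unsigned long gp_seq;

struct toy_obj {
	unsigned long free_after; /* freeable once gp_seq >= free_after */
	bool queued;              /* true once the GP wait has really started */
};

/* Start waiting for a GP: one full GP must elapse after this point. */
static void toy_start_gp_wait(struct toy_obj *obj)
{
	obj->free_after = gp_seq + 1;
	obj->queued = true;
}

/*
 * Hypothetical entry point. With allow_spin the GP wait starts at once;
 * without it the object is only parked until the "irq-work" runs.
 */
static void toy_free_rcu(struct toy_obj *obj, bool allow_spin)
{
	obj->queued = false;
	if (allow_spin)
		toy_start_gp_wait(obj);
}

/* Models the irq-work handler re-entering the regular path. */
static void toy_irq_work(struct toy_obj *obj)
{
	if (!obj->queued)
		toy_start_gp_wait(obj);
}

/* Models one grace period elapsing. */
static void toy_gp_elapses(void)
{
	gp_seq++;
}

static bool toy_can_free(const struct toy_obj *obj)
{
	return obj->queued && gp_seq >= obj->free_after;
}

/* Returns how many GPs elapsed before the object became freeable. */
static unsigned long toy_gps_until_free(bool allow_spin)
{
	struct toy_obj obj = { 0 };
	unsigned long start = gp_seq;

	toy_free_rcu(&obj, allow_spin);
	toy_gp_elapses();   /* assumed: a GP completes before the irq-work runs */
	toy_irq_work(&obj); /* no-op when the object was queued directly */
	while (!toy_can_free(&obj))
		toy_gp_elapses();
	return gp_seq - start;
}
```

In this model the directly queued object is freeable after one GP, while the
deferred one needs two, because the GP that ended inside the window does not
count for it. How often that happens in practice depends on how quickly the
irq-work runs, which the model does not try to capture.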
>
> Point 2. Can we avoid adding this complexity to kvfree_rcu() and
> let slab handle it instead? (as mentioned in [4])
>
It depends on whether BPF people want to free a pointer using the RCU
machinery. Do you know if that is the intention?
> On Point 1: IMHO it could be justified, but at the same time I hope we
> end up avoiding more complexity in the long term by working on Point 2.
>
> This reply focuses only on Point 1 and explains why it could be
> justified.
>
> On Thu, Apr 23, 2026 at 01:35:25PM +0200, Uladzislau Rezki wrote:
> > On Thu, Apr 23, 2026 at 01:23:25PM +0900, Harry Yoo (Oracle) wrote:
> > > On Wed, Apr 22, 2026 at 04:42:28PM +0200, Uladzislau Rezki wrote:
> > > How much performance do we sacrifice compared to
> > > letting them go through the kvfree_rcu() fastpath?
> >
> > Freeing an object over RCU from
> > NMI context is a corner case. It is __not_ generic.
>
> First, I want to clarify that kfree_rcu_nolock() is not just for NMI
> context. It is intended to be used when the context is unknown (because
> it can be called in arbitrary code locations).
>
When we say "unknown", to me it sounds like the worst case, which is NMI :)
> There are two kinds of problematic situations where BPF programs
> are attached to:
>
> - 1) a tracepoint or a function that can be invoked in a critical
> section (w/ a lock held), or
>
> - 2) a function that can be called in an NMI context, which might
> preempt an arbitrary context holding a lock.
>
> While 1) and 2) are not (I think) dominant use cases, and although
> most users can legally call kvfree_rcu(), BPF can't use kvfree_rcu()
> and must consider the most restrictive contexts.
>
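Both situations above come down to the same rule: the freeing path must never
spin on a lock that the interrupted context may already hold. A minimal
userspace sketch of the "try the lock, otherwise park the object on a
lock-free list" pattern follows. It is an illustration only, not the patch's
implementation: the toy_* names are hypothetical, toy_lock stands in for
whatever lock protects the regular batching structures, and C11 atomics are
used in place of kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *next;
};

/* Stand-in for the lock protecting the regular batching structures. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
/* Lock-free list of objects parked for later processing. */
static _Atomic(struct toy_node *) toy_deferred;
/* Number of objects handled directly under the lock. */
static int toy_fast_count;

/* Lock-free push using a compare-and-swap loop; never touches toy_lock. */
static void toy_defer(struct toy_node *n)
{
	struct toy_node *head = atomic_load(&toy_deferred);

	do {
		n->next = head; /* head is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak(&toy_deferred, &head, n));
}

/* Returns true if the fast path was taken, false if the object was parked. */
static bool toy_free_nolock(struct toy_node *n)
{
	if (atomic_flag_test_and_set(&toy_lock)) {
		/* Lock is held, possibly by the context we interrupted. */
		toy_defer(n);
		return false;
	}
	toy_fast_count++; /* stands in for adding n to the regular batch */
	atomic_flag_clear(&toy_lock);
	return true;
}

/* Called later from a context where locking is safe; returns nodes drained. */
static int toy_drain(void)
{
	struct toy_node *n = atomic_exchange(&toy_deferred,
					     (struct toy_node *)NULL);
	int count = 0;

	for (; n; n = n->next)
		count++;
	return count;
}
```

When the lock is free the object goes straight to the regular path. When the
lock is already taken, which is what an NMI or a tracepoint inside the
critical section would observe, the object is parked and picked up later by
toy_drain().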
> > We do not even have users (in mainline now) because we have never
> > supported it from NMI, just like call_rcu().
>
> Unfortunately, we've had this use case (of allocating memory for BPF
> programs) for a long time in the mainline. There are two current
> approaches to mitigate the limitation:
>
> - 1) Pre-allocate all memory. e.g.) allocate all hash table elements
> when creating a BPF map, rather than allocating them on demand.
> This ensures correctness but sacrifices memory.
>
> - 2) Use the BPF-specific memory allocator [1] [2] to allocate memory
> on demand and avoid preallocation. While this wastes less memory
> than 1) and also maintains performance, it is re-inventing yet
> another memory allocator.
>
> Also, the allocator reinvented kfree_rcu batching as well.
>
> Now, we're trying to avoid 1) and 2) as much as possible and use
> kmalloc_nolock() instead [3].
>
> > If BPF needs
> > it, then the first question which comes to mind is not about performance.
> > It is how to support this case in kfree_rcu() without adding noticeable
> > complexity or overhead or hacks to the generic path without making it harder
> > to maintain.
>
> Since there will be only a few subsystems that need it, and because
> they already use it on production systems, I don't see much value in
> maintaining a simple implementation if that compromises performance
> (and thus makes the transition harder).
>
> > Performance wise you noted, you mean:
> >
> > a) call latency(this is probably the most important for NMI)?
> > b) memory footprint?
> > c) pointer-chasing overhead?
>
> I think it's either
>
> - The performance of kfree_rcu_nolock() itself (a), or
> - Not disturbing workloads running on the machine (b and c)
>
> depending on what people use BPF for.
>
Are you aware of any specific workloads which we can run to test
and see what we have when it comes to performance metrics? I mean
exact use cases with exact steps on how to trigger them.
That would be useful to see the behaviour.
Thank you!
--
Uladzislau Rezki