* [PATCH for-next v3 0/9] mm/slab: introduce kfree_rcu_nolock() and improve slub_kunit coverage
@ 2026-06-15 11:05 Harry Yoo (Oracle)
From: Harry Yoo (Oracle) @ 2026-06-15 11:05 UTC
  To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov,
	Andrii Nakryiko, Puranjay Mohan, Amery Hung,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Pedro Falcato,
	Suren Baghdasaryan
  Cc: linux-mm, linux-kernel, linux-rt-devel, rcu, bpf

This is not the best time to post a series, but I didn't want to delay
posting it for too long. No pressure ;)  It is intended to be queued
for review and testing after the merge window closes.

This series is based on next-20260612, and is also available on
git.kernel.org [3].

To RCU folks: It would be great if you could kindly take a quick look at
patch 4 and either ack or nack the patch ;)

To BPF folks: Ulad asked me to share workloads for measuring the
performance of kfree_rcu_nolock(). Unfortunately, I have focused on
correctness so far and have not spent much effort on performance. It
would be nice if BPF folks could help evaluate it on relevant workloads.

To PREEMPT_RT folks: The most relevant part is allowing
kfree_rcu_sheaf() on PREEMPT_RT (patch 6). It acquires locks only via
local_trylock() or spin_trylock_irqsave(), so that it never sleeps
while a raw spinlock is held. When trylock or unlock is unsafe,
kmalloc_nolock() always fails.

Changes since RFC v2
====================

Reduced complexity and intrusiveness (Uladzislau Rezki)
-------------------------------------------------------

While discussing concerns about the complexity of adding allow_spin
handling with Ulad (Thanks!), I realized that adding complexity to the
kvfree_rcu batching is not strictly necessary: only slab objects need to
be batched, they are already batched by rcu sheaves, and slab already
supports unknown context. So it is enough to implement only a minimal
fallback for the sheaves path.

I tried to avoid making intrusive changes to the existing kvfree_rcu
path as much as possible. struct rcu_ptr is renamed to kfree_rcu_head
following Vlastimil's suggestion, and it is used only in the
kfree_rcu_nolock() path for now.

As a result, the complexity is significantly reduced and the series
became much less intrusive. This is also reflected well in the diffstat
below.

RFC v2 diffstat:
  8 files changed, 514 insertions(+), 163 deletions(-)

v3 diffstat:
  6 files changed, 370 insertions(+), 105 deletions(-)

v3 diffstat (slub_kunit improvements - patch 1, 2, 9 excluded):
  5 files changed, 199 insertions(+), 66 deletions(-)

kfree_rcu_sheaf() PREEMPT_RT support (Vlastimil Babka)
------------------------------------------------------

As suggested by Vlastimil (Thanks!), kfree_rcu_sheaf() can now be used
on PREEMPT_RT as well, by always assuming allow_spin is false on
PREEMPT_RT.

slub_kunit enhancements
-----------------------

- Currently the test is skipped when there is no hardware PMU. This can
  happen on machines without a PMU, or in virtualized environments
  (e.g., automated testing or virtme). Implement a fallback based on SW
  perf events so that the test can still run in such environments, even
  though the coverage is slightly smaller.

- While testing on PREEMPT_RT, I found that kmalloc_nolock() fails every
  time, so the fallback path is not properly tested. This is a limitation
  of perf events: the handler is called in NMI (HW perf events) or
  interrupt context (SW perf events), where kmalloc_nolock() cannot
  succeed.

  slub_kunit now registers a kprobe pre-handler at the points in the slab
  allocator where lockdep_assert_held() is invoked. The pre-handler calls
  kmalloc_nolock() and friends, to improve coverage on PREEMPT_RT instead
  of relying on perf events.

One thing that needs to be further explored
-------------------------------------------

The global deferred_free_by_rcu list used for the fallback (introduced
by patch 8) should probably be per-CPU [5].

Actual Cover Letter
===================

This series improves kmalloc_nolock() and kfree_nolock() coverage
in slub_kunit (patch 1 and 2) and introduces kfree_rcu_nolock() for
an unknown context as suggested by Alexei Starovoitov.

Unknown context means the caller does not know whether spinning on a lock
is safe (e.g., a BPF program attached to an arbitrary kernel function or
in NMI context).

The slab allocator already supports unknown context via kmalloc_nolock()
and kfree_nolock(), but it does not support freeing objects by RCU in
unknown context.

It is not ideal to have completely separate batching for unknown
context: the worst case, where spinning on a lock would lead to
deadlock, is very rare, and in most cases it is safe to use the
existing mechanism (kfree_rcu_sheaf()).

Since most of the slab allocator already supports unknown context
and sheaves support batching kvfree_rcu() calls for slab objects,
implement kfree_rcu_nolock() with minimal changes by teaching
kfree_rcu_sheaf() how to support unknown context and making
it a little bit harder to allocate an empty sheaf, instead of making
intrusive changes to the existing kvfree_rcu batching logic.

kfree_rcu_nolock() tries to free the object to the rcu sheaf if
trylock succeeds. Once the rcu sheaf becomes full, it is submitted to
RCU via call_rcu() if spinning is allowed or IRQs are enabled (to avoid
calling call_rcu() in the middle of call_rcu()). Otherwise, call_rcu()
is deferred via irq work.
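
For illustration only, the flow described above can be modelled in
userspace as below. This is a sketch, not the code in mm/slub.c: the
names (sheaf_model, free_to_rcu_sheaf, SHEAF_CAPACITY) and the tiny
capacity are made up, and the lock and IRQ state are passed in as plain
booleans instead of being queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHEAF_CAPACITY 4	/* illustrative value only */

enum outcome {
	QUEUED,			/* object stored, sheaf not full yet */
	SUBMITTED_CALL_RCU,	/* sheaf full, handed to call_rcu() directly */
	SUBMITTED_IRQ_WORK,	/* sheaf full, call_rcu() deferred to irq work */
	FALLBACK,		/* trylock failed or no sheaf available */
};

/* Hypothetical stand-in for an rcu sheaf: a fixed array of objects. */
struct sheaf_model {
	void *objects[SHEAF_CAPACITY];
	size_t size;
};

static enum outcome free_to_rcu_sheaf(struct sheaf_model *s, void *obj,
				      bool trylock_ok, bool allow_spin,
				      bool irqs_enabled)
{
	/* Never spin: if the lock or a sheaf is unavailable, fall back. */
	if (!trylock_ok || !s)
		return FALLBACK;

	/* A full sheaf is always handed off before the next free. */
	assert(s->size < SHEAF_CAPACITY);
	s->objects[s->size++] = obj;
	if (s->size < SHEAF_CAPACITY)
		return QUEUED;

	/* Model handing the full sheaf over to RCU. */
	s->size = 0;

	/*
	 * Calling call_rcu() directly is fine when spinning is allowed or
	 * IRQs are enabled; otherwise we might be nested inside call_rcu(),
	 * so the submission is deferred.
	 */
	if (allow_spin || irqs_enabled)
		return SUBMITTED_CALL_RCU;
	return SUBMITTED_IRQ_WORK;
}
```

In this model only the free that fills the sheaf has to decide between
the direct and the deferred submission; every other successful free is
just a store into the array.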

In unknown context, when there is no sheaf available, kfree_rcu_sheaf()
falls back to defer_kfree_rcu(), which inserts the object into a global
lockless list [5]. Those objects are freed after synchronize_rcu() in
a workqueue.
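
As a rough userspace analogue of this fallback (not the kernel's llist
or the code in this series), a lockless push plus a "take everything"
drain can be written with C11 atomics. All names below (lnode, lhead,
lpush, ltake_all, drain_deferred) are hypothetical, and the wait_gp
callback merely stands in for synchronize_rcu():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Link embedded at the start of each deferred object. */
struct lnode {
	struct lnode *next;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Push one node with a CAS loop; no lock is taken, so it never spins on one. */
static void lpush(struct lhead *h, struct lnode *n)
{
	struct lnode *old;

	assert(n != NULL);
	old = atomic_load_explicit(&h->first, memory_order_relaxed);
	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Detach the whole list in one step; the caller then owns every node. */
static struct lnode *ltake_all(struct lhead *h)
{
	return atomic_exchange_explicit(&h->first, (struct lnode *)NULL,
					memory_order_acquire);
}

/*
 * Worker side: take the batch, wait for readers, then free each node.
 * Nodes are assumed to come from malloc(). Returns the number freed.
 */
static size_t drain_deferred(struct lhead *h, void (*wait_gp)(void))
{
	struct lnode *n = ltake_all(h);
	size_t freed = 0;

	if (!n)
		return 0;
	if (wait_gp)
		wait_gp();	/* stand-in for synchronize_rcu() */
	while (n) {
		struct lnode *next = n->next;

		free(n);
		n = next;
		freed++;
	}
	return freed;
}
```

One grace-period wait covers the whole detached batch, which is why
draining from a worker amortizes well even though every producer shares
a single list head.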

Unlike kfree_rcu(), only the 2-argument variant is supported.
This is because the last resort of the 1-arg variant is
synchronize_rcu(), which cannot be used in an unknown context.

As suggested by Alexei Starovoitov, kfree_rcu_nolock() can be used with
struct kfree_rcu_head (8 bytes), which is smaller than struct rcu_head
(16 bytes).
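
To make the size difference concrete, here is a small compile-time
check. rcu_head_model mirrors the two-pointer layout of the kernel's
struct rcu_head; kfree_rcu_head_model is only an assumption (a single
pointer-sized link) and the real definition in patch 8 may differ. The
8 and 16 byte figures correspond to 64-bit builds, so the checks are
written in terms of the pointer size.

```c
#include <assert.h>
#include <stddef.h>

/* Two-pointer layout, as in the kernel's struct rcu_head. */
struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

/* ASSUMPTION: one pointer-sized link; not taken from the patch. */
struct kfree_rcu_head_model {
	struct kfree_rcu_head_model *next;
};

/* Example object embedding the smaller head (hypothetical). */
struct my_obj {
	int payload;
	struct kfree_rcu_head_model rcu;
};

static_assert(sizeof(struct kfree_rcu_head_model) == sizeof(void *),
	      "small head is one pointer");
static_assert(sizeof(struct rcu_head_model) == 2 * sizeof(void *),
	      "rcu_head is two pointers");

/* Bytes saved per object by embedding the smaller head. */
static size_t head_bytes_saved(void)
{
	return sizeof(struct rcu_head_model) -
	       sizeof(struct kfree_rcu_head_model);
}
```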

For more background and future plans, please see [4].

[1] RFC v1: https://lore.kernel.org/linux-mm/20260206093410.160622-1-harry.yoo@oracle.com

[2] RFC v2: https://lore.kernel.org/linux-mm/20260416091022.36823-1-harry@kernel.org

[3] https://git.kernel.org/pub/scm/linux/kernel/git/harry/linux.git/log/?h=kfree_rcu_nolock-v3r3

[4] kmalloc_nolock() follow-ups, including kfree_rcu_nolock(),
    https://lore.kernel.org/linux-mm/esepccfhqg7m6jo76ns2znj2cnuaepx2xvw5zaygtwohq4psma@563ypprp6rr3

[5] We should probably make the list per-CPU because, unlike in
    RFC v2, the fallback can be triggered more frequently under memory
    pressure.

    https://lore.kernel.org/linux-mm/805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop

Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
Harry Yoo (Oracle) (9):
      slub_kunit: fall back to SW perf events when HW PMU is not available
      mm/slab, slub_kunit: register kprobe to trigger _nolock APIs
      mm/slab: handle the !allow_spin case in kfree_rcu_sheaf()
      mm/slab: use call_rcu() in unknown context if irqs are enabled
      mm/slab: extend deferred free mechanism to handle rcu sheaves
      mm/slab: allow kfree_rcu_sheaf() on PREEMPT_RT
      mm/slab: introduce kfree_rcu_nolock()
      mm/slab: introduce struct kfree_rcu_head and use in kfree_rcu_nolock()
      slub_kunit: extend the test for kfree_rcu_nolock()

 include/linux/rcupdate.h |  12 +++
 include/linux/types.h    |   4 +
 lib/tests/slub_kunit.c   | 174 ++++++++++++++++++++++++++++------
 mm/slab.h                |   5 +-
 mm/slab_common.c         |  38 ++++++--
 mm/slub.c                | 242 ++++++++++++++++++++++++++++++++++-------------
 6 files changed, 370 insertions(+), 105 deletions(-)
---
base-commit: c425609d6ac4012c8bbf01ec2e10e801b1923a7b
change-id: 20260615-kfree_rcu_nolock-e5502555992f

Best regards,
-- 
Harry Yoo (Oracle) <harry@kernel.org>

