From: Pedro Falcato <pfalcato@suse.de>
To: "Harry Yoo (Oracle)" <harry@kernel.org>
Cc: Vlastimil Babka <vbabka@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Hao Li <hao.li@linux.dev>, Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Puranjay Mohan <puranjay@kernel.org>,
Amery Hung <ameryhung@gmail.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Clark Williams <clrkwllms@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun@kernel.org>,
Uladzislau Rezki <urezki@gmail.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
Suren Baghdasaryan <surenb@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev, rcu@vger.kernel.org,
bpf@vger.kernel.org
Subject: Re: [PATCH for-next v3 5/9] mm/slab: extend deferred free mechanism to handle rcu sheaves
Date: Wed, 24 Jun 2026 15:37:23 +0100
Message-ID: <ajvrNoE81J-_4zsM@pedro-suse>
In-Reply-To: <20260615-kfree_rcu_nolock-v3-5-70a54f3775bb@kernel.org>
On Mon, Jun 15, 2026 at 08:05:59PM +0900, Harry Yoo (Oracle) wrote:
> __kfree_rcu_sheaf() cannot invoke call_rcu() when spinning is not
> allowed and IRQs are disabled. To relax the limitation, extend the
> deferred free fallback so that a full rcu sheaf can be submitted to
> call_rcu() via the existing IRQ work.
>
> Since the deferred mechanism does more than deferred free of objects,
> rename the struct to deferred_percpu_work and adjust names accordingly.
>
> When a sheaf is queued on an IRQ work, it is detached from
> pcs->rcu_free but call_rcu() is not invoked until the irq_work runs.
> To keep the kvfree_rcu barrier's promise, call irq_work_sync() on each
> CPU before calling rcu_barrier().
>
> In the meantime, remove the TODO item as apparently there is no simple
> and effective way to achieve that.
>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
> ---
> mm/slab.h | 2 +-
> mm/slab_common.c | 7 ++---
> mm/slub.c | 79 ++++++++++++++++++++++++++++++++++----------------------
> 3 files changed, 51 insertions(+), 37 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index b1bd33a16544..961581e35ec8 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -744,7 +744,7 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
> void __check_heap_object(const void *ptr, unsigned long n,
> const struct slab *slab, bool to_user);
>
> -void defer_free_barrier(void);
> +void deferred_work_barrier(void);
>
> static inline bool slub_debug_orig_size(struct kmem_cache *s)
> {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index bc1a8ec938d9..55546b8385ff 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -551,7 +551,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> }
>
> /* Wait for deferred work from kmalloc/kfree_nolock() */
> - defer_free_barrier();
> + deferred_work_barrier();
>
> cpus_read_lock();
> mutex_lock(&slab_mutex);
> @@ -2113,13 +2113,10 @@ void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
> cpus_read_lock();
> flush_rcu_sheaves_on_cache(s);
> cpus_read_unlock();
> + deferred_work_barrier();
> rcu_barrier();
> }
>
> - /*
> - * TODO: Introduce a version of __kvfree_rcu_barrier() that works
> - * on a specific slab cache.
> - */
It might be worth detailing why this is not possible.
> __kvfree_rcu_barrier();
> }
> EXPORT_SYMBOL_GPL(kvfree_rcu_barrier_on_cache);
> diff --git a/mm/slub.c b/mm/slub.c
> index 6a3552b70683..ba593c1c53d5 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -418,6 +418,8 @@ struct slab_sheaf {
> union {
> struct rcu_head rcu_head;
> struct list_head barn_list;
> + /* only used to defer call_rcu() in unknown context */
> + struct llist_node llnode;
> /* only used for prefilled sheafs */
> struct {
> unsigned int capacity;
> @@ -4071,6 +4073,20 @@ static void flush_all(struct kmem_cache *s)
> cpus_read_unlock();
> }
>
> +struct deferred_percpu_work {
> + struct llist_head objects;
> + struct llist_head rcu_sheaves;
> + struct irq_work work;
> +};
> +
> +static void deferred_percpu_work_fn(struct irq_work *work);
> +
> +static DEFINE_PER_CPU(struct deferred_percpu_work, deferred_percpu_work) = {
> + .objects = LLIST_HEAD_INIT(objects),
> + .rcu_sheaves = LLIST_HEAD_INIT(rcu_sheaves),
> + .work = IRQ_WORK_INIT(deferred_percpu_work_fn),
> +};
> +
> static void flush_rcu_sheaf(struct work_struct *w)
> {
> struct slub_percpu_sheaves *pcs;
> @@ -4142,6 +4158,7 @@ void flush_all_rcu_sheaves(void)
> mutex_unlock(&slab_mutex);
> cpus_read_unlock();
>
> + deferred_work_barrier();
> rcu_barrier();
> }
>
> @@ -6158,12 +6175,6 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj, bool allow_spin)
> if (likely(rcu_sheaf->size < s->sheaf_capacity)) {
> rcu_sheaf = NULL;
> } else {
> - /* call_rcu() disables IRQs to protect percpu data structures */
> - if (unlikely(!allow_spin && irqs_disabled())) {
> - rcu_sheaf->size--;
> - local_unlock(&s->cpu_sheaves->lock);
> - goto fail;
> - }
> pcs->rcu_free = NULL;
> rcu_sheaf->node = numa_node_id();
> }
> @@ -6172,8 +6183,18 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj, bool allow_spin)
> * we flush before local_unlock to make sure a racing
> * flush_all_rcu_sheaves() doesn't miss this sheaf
> */
> - if (rcu_sheaf)
> - call_rcu(&rcu_sheaf->rcu_head, rcu_free_sheaf);
> + if (rcu_sheaf) {
> + /* call_rcu() disables IRQs to protect percpu data structures */
> + if (unlikely(!allow_spin && irqs_disabled())) {
> + struct deferred_percpu_work *dpw;
> +
> + dpw = this_cpu_ptr(&deferred_percpu_work);
> + if (llist_add(&rcu_sheaf->llnode, &dpw->rcu_sheaves))
> + irq_work_queue(&dpw->work);
> + } else {
> + call_rcu(&rcu_sheaf->rcu_head, rcu_free_sheaf);
> + }
> + }
>
> local_unlock(&s->cpu_sheaves->lock);
>
> @@ -6360,31 +6381,20 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
> }
> }
>
> -struct defer_free {
> - struct llist_head objects;
> - struct irq_work work;
> -};
> -
> -static void free_deferred_objects(struct irq_work *work);
> -
> -static DEFINE_PER_CPU(struct defer_free, defer_free_objects) = {
> - .objects = LLIST_HEAD_INIT(objects),
> - .work = IRQ_WORK_INIT(free_deferred_objects),
> -};
> -
> /*
> * In PREEMPT_RT irq_work runs in per-cpu kthread, so it's safe
> * to take sleeping spin_locks from __slab_free().
> * In !PREEMPT_RT irq_work will run after local_unlock_irqrestore().
> */
> -static void free_deferred_objects(struct irq_work *work)
> +static void deferred_percpu_work_fn(struct irq_work *work)
> {
> - struct defer_free *df = container_of(work, struct defer_free, work);
> - struct llist_head *objs = &df->objects;
> + struct deferred_percpu_work *dpw;
> + struct llist_head *objs, *rcu_sheaves;
> struct llist_node *llnode, *pos, *t;
>
> - if (llist_empty(objs))
> - return;
> + dpw = container_of(work, struct deferred_percpu_work, work);
> + rcu_sheaves = &dpw->rcu_sheaves;
> + objs = &dpw->objects;
>
> llnode = llist_del_all(objs);
> llist_for_each_safe(pos, t, llnode) {
> @@ -6408,27 +6418,34 @@ static void free_deferred_objects(struct irq_work *work)
> __slab_free(s, slab, x, x, 1, _THIS_IP_);
> stat(s, FREE_SLOWPATH);
> }
> +
> + llnode = llist_del_all(rcu_sheaves);
> + llist_for_each_safe(pos, t, llnode) {
Could this use llist_for_each_entry_safe() instead?
> + struct slab_sheaf *rcu_sheaf = llist_entry(pos, struct slab_sheaf, llnode);
> +
> + call_rcu(&rcu_sheaf->rcu_head, rcu_free_sheaf);
> + }
> }
>
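To illustrate the llist_for_each_entry_safe() suggestion, here is a rough
userspace sketch, not kernel code. Everything in it is a stand-in I made up
for illustration: struct mock_sheaf, mock_call_rcu(), link_sheaves() and
submit_all() are hypothetical, and the macro is a simplified NULL-terminated
variant of the one in <linux/llist.h>. The point it tries to show is that
llnode shares a union with rcu_head, so the next entry has to be fetched
before the sheaf is handed over, which the entry_safe variant does while
also dropping the separate llist_entry() line.

```c
/* Userspace sketch only; in-tree code would use <linux/llist.h>. */
#include <assert.h>
#include <stddef.h>

struct llist_node { struct llist_node *next; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for the kernel's llist_for_each_entry_safe(). */
#define llist_for_each_entry_safe(pos, n, node, member)			\
	for (pos = (node) ? container_of(node, __typeof__(*pos), member) : NULL; \
	     pos && (n = pos->member.next ?				\
		     container_of(pos->member.next, __typeof__(*pos), member) : NULL, 1); \
	     pos = n)

/* Hypothetical stand-in for struct slab_sheaf: llnode overlaps rcu storage. */
struct mock_sheaf {
	union {
		struct llist_node llnode;	/* link while queued for deferral */
		void *rcu_head_storage;		/* stand-in for struct rcu_head */
	};
	int submitted;
};

/* Stand-in for call_rcu(); deliberately clobbers the overlapping llnode. */
static void mock_call_rcu(struct mock_sheaf *sheaf)
{
	sheaf->rcu_head_storage = sheaf;
	sheaf->submitted = 1;
}

/* Chain an array of sheaves into a NULL-terminated list, return its head. */
static struct llist_node *link_sheaves(struct mock_sheaf *arr, int n)
{
	assert(n > 0);
	for (int i = 0; i < n; i++)
		arr[i].llnode.next = (i + 1 < n) ? &arr[i + 1].llnode : NULL;
	return &arr[0].llnode;
}

/* Walk a detached list and submit every sheaf; returns how many were seen. */
static int submit_all(struct llist_node *first)
{
	struct mock_sheaf *sheaf, *next;
	int count = 0;

	llist_for_each_entry_safe(sheaf, next, first, llnode) {
		/* next was read before mock_call_rcu() overwrites llnode */
		mock_call_rcu(sheaf);
		count++;
	}
	return count;
}
```

In the patch itself this would presumably mean iterating with a
struct slab_sheaf cursor over the result of llist_del_all(rcu_sheaves),
but I have not tested that against the tree.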
Otherwise LGTM!
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
--
Pedro