Re: [PATCH v3 09/21] slab: add optimized sheaf refill from partial list


From: Harry Yoo <harry.yoo@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Petr Tesarik <ptesarik@suse.com>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Hao Li <hao.li@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Alexei Starovoitov <ast@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, bpf@vger.kernel.org,
	kasan-dev@googlegroups.com
Subject: Re: [PATCH v3 09/21] slab: add optimized sheaf refill from partial list
Date: Tue, 20 Jan 2026 10:41:37 +0900
Message-ID: <aW7dUeoDALhJI0Ic@hyeyoo>
In-Reply-To: <e106a4d5-32f7-4314-b8c1-19ebc6da6d7a@suse.cz>

On Mon, Jan 19, 2026 at 11:54:18AM +0100, Vlastimil Babka wrote:
> On 1/19/26 07:41, Harry Yoo wrote:
> > On Fri, Jan 16, 2026 at 03:40:29PM +0100, Vlastimil Babka wrote:
> >> At this point we have sheaves enabled for all caches, but their refill
> >> is done via __kmem_cache_alloc_bulk() which relies on cpu (partial)
> >> slabs - now a redundant caching layer that we are about to remove.
> >> 
> >> The refill will thus be done from slabs on the node partial list.
> >> Introduce new functions that can do that in an optimized way as it's
> >> easier than modifying the __kmem_cache_alloc_bulk() call chain.
> >> 
> >> Extend struct partial_context so it can return a list of slabs from the
> >> partial list with the sum of free objects in them within the requested
> >> min and max.
> >> 
> >> Introduce get_partial_node_bulk() that removes the slabs from freelist
> >> and returns them in the list.
> >> 
> >> Introduce get_freelist_nofreeze() which grabs the freelist without
> >> freezing the slab.
> >> 
> >> Introduce alloc_from_new_slab() which can allocate multiple objects from
> >> a newly allocated slab where we don't need to synchronize with freeing.
> >> In some aspects it's similar to alloc_single_from_new_slab() but assumes
> >> the cache is a non-debug one so it can avoid some actions.
> >> 
> >> Introduce __refill_objects() that uses the functions above to fill an
> >> array of objects. It has to handle the possibility that the slabs will
> >> contain more objects than were requested, due to concurrent freeing of
> >> objects to those slabs. When no more slabs on partial lists are
> >> available, it will allocate new slabs. It is intended to be only used
> >> in context where spinning is allowed, so add a WARN_ON_ONCE check there.
> >> 
> >> Finally, switch refill_sheaf() to use __refill_objects(). Sheaves are
> >> only refilled from contexts that allow spinning, or even blocking.
> >> 
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >>  mm/slub.c | 284 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> >>  1 file changed, 264 insertions(+), 20 deletions(-)
> >> 
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 9bea8a65e510..dce80463f92c 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -3522,6 +3525,63 @@ static inline void put_cpu_partial(struct kmem_cache *s, struct slab *slab,
> >>  #endif
> >>  static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags);
> >>  
> >> +static bool get_partial_node_bulk(struct kmem_cache *s,
> >> +				  struct kmem_cache_node *n,
> >> +				  struct partial_context *pc)
> >> +{
> >> +	struct slab *slab, *slab2;
> >> +	unsigned int total_free = 0;
> >> +	unsigned long flags;
> >> +
> >> +	/* Racy check to avoid taking the lock unnecessarily. */
> >> +	if (!n || data_race(!n->nr_partial))
> >> +		return false;
> >> +
> >> +	INIT_LIST_HEAD(&pc->slabs);
> >> +
> >> +	spin_lock_irqsave(&n->list_lock, flags);
> >> +
> >> +	list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) {
> >> +		struct freelist_counters flc;
> >> +		unsigned int slab_free;
> >> +
> >> +		if (!pfmemalloc_match(slab, pc->flags))
> >> +			continue;
> >> +		/*
> >> +		 * determine the number of free objects in the slab racily
> >> +		 *
> >> +		 * due to atomic updates done by a racing free we should not
> >> +		 * read an inconsistent value here, but do a sanity check anyway
> >> +		 *
> >> +		 * slab_free is a lower bound due to subsequent concurrent
> >> +		 * freeing, the caller might get more objects than requested and
> >> +		 * must deal with it
> >> +		 */
> >> +		flc.counters = data_race(READ_ONCE(slab->counters));
> >> +		slab_free = flc.objects - flc.inuse;
> >> +
> >> +		if (unlikely(slab_free > oo_objects(s->oo)))
> >> +			continue;
> > 
> > When is this condition supposed to be true?
> > 
> > I guess it's when __update_freelist_slow() doesn't update
> > slab->counters atomically?
> 
> Yeah. Probably could be solvable with WRITE_ONCE() there, as this is only
> about hypothetical read/write tearing, not seeing stale values.

Ok. That's less confusing than "we should not read an inconsistent value
here, but do a sanity check anyway".
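
To make the tearing case concrete, here is a minimal user-space sketch,
not the kernel code. It assumes a hypothetical 32-bit packing (inuse in
the low 16 bits, objects in the next 15) and a hypothetical byte-granular
torn read. pack_counters(), torn_read() and slab_free_sane() are made-up
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packing: bits 0-15 = inuse, bits 16-30 = objects. */
static uint32_t pack_counters(unsigned int inuse, unsigned int objects)
{
	return (inuse & 0xffffu) | ((objects & 0x7fffu) << 16);
}

/*
 * Model a torn read: the low byte comes from the new value, everything
 * else from the old one. This is an assumption about how tearing could
 * look, not a claim about any real architecture.
 */
static uint32_t torn_read(uint32_t old_word, uint32_t new_word)
{
	return (old_word & ~(uint32_t)0xffu) | (new_word & 0xffu);
}

/*
 * Mirror of the sanity check in the quoted hunk: compute the number of
 * free objects and reject the snapshot if it exceeds what a slab can hold.
 */
static bool slab_free_sane(uint32_t counters, unsigned int max_objects,
			   unsigned int *slab_free)
{
	unsigned int inuse = counters & 0xffffu;
	unsigned int objects = (counters >> 16) & 0x7fffu;
	unsigned int nr_free = objects - inuse;	/* wraps if inuse > objects */

	assert(slab_free != NULL);

	if (nr_free > max_objects)
		return false;

	*slab_free = nr_free;
	return true;
}
```

With 300 objects per slab and inuse going from 256 (0x0100) to 255
(0x00ff), the torn snapshot reads inuse as 0x01ff = 511. objects - inuse
then wraps around to a huge value and the check rejects the slab, which
corresponds to the "continue" in the quoted hunk.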

> >> +
> >> +		/* we have already min and this would get us over the max */
> >> +		if (total_free >= pc->min_objects
> >> +		    && total_free + slab_free > pc->max_objects)
> >> +			break;
> >> +
> >> +		remove_partial(n, slab);
> >> +
> >> +		list_add(&slab->slab_list, &pc->slabs);
> >> +
> >> +		total_free += slab_free;
> >> +		if (total_free >= pc->max_objects)
> >> +			break;
> >> +	}
> >> +
> >> +	spin_unlock_irqrestore(&n->list_lock, flags);
> >> +	return total_free > 0;
> >> +}
> >> +
> >>  /*
> >>   * Try to allocate a partial slab from a specific node.
> >>   */
> >> +static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
> >> +		void **p, unsigned int count, bool allow_spin)
> >> +{
> >> +	unsigned int allocated = 0;
> >> +	struct kmem_cache_node *n;
> >> +	unsigned long flags;
> >> +	void *object;
> >> +
> >> +	if (!allow_spin && (slab->objects - slab->inuse) > count) {
> >> +
> >> +		n = get_node(s, slab_nid(slab));
> >> +
> >> +		if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> >> +			/* Unlucky, discard newly allocated slab */
> >> +			defer_deactivate_slab(slab, NULL);
> >> +			return 0;
> >> +		}
> >> +	}
> >> +
> >> +	object = slab->freelist;
> >> +	while (object && allocated < count) {
> >> +		p[allocated] = object;
> >> +		object = get_freepointer(s, object);
> >> +		maybe_wipe_obj_freeptr(s, p[allocated]);
> >> +
> >> +		slab->inuse++;
> >> +		allocated++;
> >> +	}
> >> +	slab->freelist = object;
> >> +
> >> +	if (slab->freelist) {
> >> +
> >> +		if (allow_spin) {
> >> +			n = get_node(s, slab_nid(slab));
> >> +			spin_lock_irqsave(&n->list_lock, flags);
> >> +		}
> >> +		add_partial(n, slab, DEACTIVATE_TO_HEAD);
> >> +		spin_unlock_irqrestore(&n->list_lock, flags);
> >> +	}
> >> +
> >> +	inc_slabs_node(s, slab_nid(slab), slab->objects);
> > 
> > Maybe add a comment explaining why inc_slabs_node() doesn't need to be
> > called under n->list_lock?
> 
> Hm, we might not even be holding it. The old code also did the inc with no
> comment. If anything could use one, it would be in
> alloc_single_from_new_slab()? But that's outside the scope here.

Ok. Perhaps worth adding something like this later, but yeah it's outside
the scope here.

diff --git a/mm/slub.c b/mm/slub.c
index 698c0d940f06..c5a1e47dfe16 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1633,6 +1633,9 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
 {
 	struct kmem_cache_node *n = get_node(s, node);
 
+	if (kmem_cache_debug(s))
+		/* slab validation may generate false errors without the lock */
+		lockdep_assert_held(&n->list_lock);
 	atomic_long_inc(&n->nr_slabs);
 	atomic_long_add(objects, &n->total_objects);
 }
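
The intent of the diff above can be modelled in user space roughly as
follows. This is only a sketch under stated assumptions: struct
cache_model, struct node_model and inc_slabs_node_model() are
hypothetical stand-ins, the lock is reduced to a boolean, and the
counters are plain longs rather than atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct cache_model {
	bool debug;		/* models kmem_cache_debug(s) */
};

/* Hypothetical stand-in for struct kmem_cache_node. */
struct node_model {
	bool list_lock_held;	/* models "we hold n->list_lock" */
	long nr_slabs;
	long total_objects;
};

/*
 * The counters are always updated. The return value reports whether the
 * lock assertion would have passed: non-debug caches never need the
 * lock here, debug caches do.
 */
static bool inc_slabs_node_model(const struct cache_model *s,
				 struct node_model *n, int objects)
{
	bool assertion_ok;

	assert(s != NULL && n != NULL);

	assertion_ok = !s->debug || n->list_lock_held;

	n->nr_slabs++;
	n->total_objects += objects;
	return assertion_ok;
}
```

In this model a non-debug cache may bump the counters without the lock,
while a debug cache calling it unlocked is flagged, which is the case
where slab validation could otherwise report false errors.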


-- 
Cheers,
Harry / Hyeonggon
