Linux-mm Archive on lore.kernel.org
* [PATCH v3 0/2] mm, slab: optimize returning objects after a partial refill
@ 2026-05-22 14:23 Vlastimil Babka (SUSE)
  2026-05-22 14:23 ` [PATCH v3 1/2] mm, slab: add an optimistic __slab_try_return_freelist() Vlastimil Babka (SUSE)
  2026-05-22 14:23 ` [PATCH v3 2/2] mm, slab: simplify returning slab in __refill_objects_node() Vlastimil Babka (SUSE)
  0 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-22 14:23 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Andrew Morton, linux-mm, linux-kernel, hu.shengming,
	Vinicius Costa Gomes, Vlastimil Babka (SUSE)

Optimizes the current refill leftover handling in a way that should have
no downsides, so we have a better baseline for any further changes (e.g.
spilling or caching the leftover) that involve some tradeoffs.

It's based on slab/for-7.2/perf without the previous version of the
first patch. Sending a new version due to accumulated fixups to the
first patch that resolve a perf regression, and a new cleanup patch on
top.

Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
Changes in v3:
- Fixup patch 1 to resolve lkp regression https://lore.kernel.org/all/202605112204.9382cecf-lkp@intel.com/
  - Return slabs to tail of the partial list - this resolved the
    regression.
  - Remove an unlikely() - this had no effect but it's factually
    correct.
- Add patch 2 with further changes created while trying to resolve the
  regression. Those had no effect but still make sense as a code
  simplification.
- Link to v2: https://patch.msgid.link/20260427-b4-refill-optimistic-return-v2-1-1672467a792d@kernel.org

Changes in v2:
- rebase to slab/for-7.2/perf, drop RFC
- simplify to reuse the existing reattaching to partial list (Hao Li)
- Add R-b from Hao Li, thanks!
- drop the stat items - they served to verify the optimistic path was
  succeeding, but are too detailed for mainline
- Link to v1: https://patch.msgid.link/20260421-b4-refill-optimistic-return-v1-1-24f0bfc1acff@kernel.org

---
Vlastimil Babka (SUSE) (2):
      mm, slab: add an optimistic __slab_try_return_freelist()
      mm, slab: simplify returning slab in __refill_objects_node()

 mm/slub.c | 69 ++++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 21 deletions(-)
---
base-commit: dc795d4c0282a4fbfbcd76a70c09ca0888678443
change-id: 20260421-b4-refill-optimistic-return-f44d3b74cc49

Best regards,
--  
Vlastimil Babka (SUSE) <vbabka@kernel.org>




* [PATCH v3 1/2] mm, slab: add an optimistic __slab_try_return_freelist()
  2026-05-22 14:23 [PATCH v3 0/2] mm, slab: optimize returning objects after a partial refill Vlastimil Babka (SUSE)
@ 2026-05-22 14:23 ` Vlastimil Babka (SUSE)
  2026-05-22 17:40   ` Vlastimil Babka (SUSE)
  2026-05-22 14:23 ` [PATCH v3 2/2] mm, slab: simplify returning slab in __refill_objects_node() Vlastimil Babka (SUSE)
  1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-22 14:23 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Andrew Morton, linux-mm, linux-kernel, hu.shengming,
	Vinicius Costa Gomes, Vlastimil Babka (SUSE)

When we end up returning extraneous objects during refill to a slab
where we just did a get_freelist_nofreeze(), it is likely no other CPU
has freed objects to it meanwhile. We can then reattach the remainder of
the freelist without having to walk the (potentially cache cold)
freelist for finding its tail to connect slab->freelist to it.

Add a __slab_try_return_freelist() function that does that. As suggested
by Hao Li, it doesn't need to also return the slab to the partial list,
because there's code in __refill_objects_node() that already does that
for any slabs where we don't detach the freelist in the first place. So
we just put the slab back to the pc.slabs list. It's no longer likely
that the list will be empty now, so remove the unlikely() annotation.

However, also change that code to add to the tail of the partial list
instead of the head, to match what __slab_free() did and avoid a
regression that was reported for the earlier version by the kernel test
robot [1]. This change will also affect slabs which were grabbed from
the partial list but not refilled from at all, but those should be much
rarer than a partial refill.

[1] https://lore.kernel.org/all/202605112204.9382cecf-lkp@intel.com/

Reviewed-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
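Not for the commit log: the idea in the changelog above can be sketched
as a minimal, single-threaded userspace model. Everything named model_*
is hypothetical and exists only for illustration. In particular, the
real code updates slab->freelist and slab->counters together through
slab_update_freelist(), which this model does not try to reproduce; a
"concurrent" free is simulated by calling model_free_one() between the
detach and the return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types, not the kernel's struct slab. */
struct obj {
	struct obj *next;		/* free pointer stored in the object */
};

struct model_slab {
	struct obj *freelist;		/* NULL while the freelist is detached */
	unsigned int objects;		/* total objects in the slab */
	unsigned int inuse;		/* objects currently handed out */
};

/* Detach the whole freelist; the slab then looks fully in use. */
static struct obj *model_detach(struct model_slab *slab, unsigned int *count)
{
	struct obj *head = slab->freelist;

	*count = slab->objects - slab->inuse;
	slab->freelist = NULL;
	slab->inuse = slab->objects;
	return head;
}

/* Models a free from another CPU landing on the slab. */
static void model_free_one(struct model_slab *slab, struct obj *o)
{
	o->next = slab->freelist;
	slab->freelist = o;
	slab->inuse--;
}

/* Optimistic return: only succeeds if nobody freed to the slab meanwhile. */
static bool model_try_return(struct model_slab *slab, struct obj *head,
			     unsigned int cnt)
{
	if (slab->freelist)
		return false;

	slab->freelist = head;
	slab->inuse -= cnt;
	return true;
}

/* Fallback: walk the leftover list to find its tail, then splice it in. */
static void model_return_slow(struct model_slab *slab, struct obj *head,
			      unsigned int cnt)
{
	struct obj *tail = head;

	assert(head);
	while (tail->next)
		tail = tail->next;

	tail->next = slab->freelist;
	slab->freelist = head;
	slab->inuse -= cnt;
}
```

The point of the optimistic variant is that model_try_return() never
touches the leftover objects, while model_return_slow() has to
dereference each of them to reach the tail.
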
 mm/slub.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a7bd929bcb26..5816fcfc7a90 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4326,7 +4326,8 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
  * Assumes this is performed only for caches without debugging so we
  * don't need to worry about adding the slab to the full list.
  */
-static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab)
+static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *slab,
+					  unsigned int *count)
 {
 	struct freelist_counters old, new;
 
@@ -4342,6 +4343,7 @@ static inline void *get_freelist_nofreeze(struct kmem_cache *s, struct slab *sla
 
 	} while (!slab_update_freelist(s, slab, &old, &new, "get_freelist_nofreeze"));
 
+	*count = old.objects - old.inuse;
 	return old.freelist;
 }
 
@@ -5509,6 +5511,34 @@ static noinline void free_to_partial_list(
 	}
 }
 
+/*
+ * Try returning (remainder of) the freelist that we just detached from the
+ * slab.  Optimistically assume the slab is still full, so we don't need to find
+ * the tail of the detached freelist.
+ *
+ * Fail if the slab isn't full anymore due to a cocurrent free.
+ */
+static bool __slab_try_return_freelist(struct kmem_cache *s, struct slab *slab,
+				       void *head, int cnt)
+{
+	struct freelist_counters old, new;
+
+	old.freelist = slab->freelist;
+	old.counters = slab->counters;
+
+	if (old.freelist)
+		return false;
+
+	new.freelist = head;
+	new.counters = old.counters;
+	new.inuse -= cnt;
+
+	if (!slab_update_freelist(s, slab, &old, &new, "__slab_try_return_freelist"))
+		return false;
+
+	return true;
+}
+
 /*
  * Slow path handling. This may still be called frequently since objects
  * have a longer lifetime than the cpu slabs in most processing loads.
@@ -7120,41 +7150,48 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
 
 	list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
 
+		unsigned int count;
+
 		list_del(&slab->slab_list);
 
-		object = get_freelist_nofreeze(s, slab);
+		object = get_freelist_nofreeze(s, slab, &count);
 
-		while (object && refilled < max) {
+		while (count && refilled < max) {
 			p[refilled] = object;
 			object = get_freepointer(s, object);
 			maybe_wipe_obj_freeptr(s, p[refilled]);
 
 			refilled++;
+			count--;
 		}
 
 		/*
 		 * Freelist had more objects than we can accommodate, we need to
-		 * free them back. We can treat it like a detached freelist, just
-		 * need to find the tail object.
+		 * free them back. First we try to be optimistic and assume the
+		 * slab is stil full since we just detached its freelist.
+		 * Otherwise we must need to find the tail object.
 		 */
-		if (unlikely(object)) {
+		if (unlikely(count)) {
 			void *head = object;
 			void *tail;
-			int cnt = 0;
+
+			if (__slab_try_return_freelist(s, slab, head, count)) {
+				list_add(&slab->slab_list, &pc.slabs);
+				break;
+			}
 
 			do {
 				tail = object;
-				cnt++;
 				object = get_freepointer(s, object);
 			} while (object);
-			__slab_free(s, slab, head, tail, cnt, _RET_IP_);
+			__slab_free(s, slab, head, tail, count, _RET_IP_);
 		}
 
 		if (refilled >= max)
 			break;
 	}
 
-	if (unlikely(!list_empty(&pc.slabs))) {
+	if (!list_empty(&pc.slabs)) {
 		spin_lock_irqsave(&n->list_lock, flags);
 
 		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
@@ -7163,7 +7200,7 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
 				continue;
 
 			list_del(&slab->slab_list);
-			add_partial(n, slab, ADD_TO_HEAD);
+			add_partial(n, slab, ADD_TO_TAIL);
 		}
 
 		spin_unlock_irqrestore(&n->list_lock, flags);

-- 
2.54.0




* [PATCH v3 2/2] mm, slab: simplify returning slab in __refill_objects_node()
  2026-05-22 14:23 [PATCH v3 0/2] mm, slab: optimize returning objects after a partial refill Vlastimil Babka (SUSE)
  2026-05-22 14:23 ` [PATCH v3 1/2] mm, slab: add an optimistic __slab_try_return_freelist() Vlastimil Babka (SUSE)
@ 2026-05-22 14:23 ` Vlastimil Babka (SUSE)
  2026-05-25  6:47   ` Hao Li
  1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-22 14:23 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Andrew Morton, linux-mm, linux-kernel, hu.shengming,
	Vinicius Costa Gomes, Vlastimil Babka (SUSE)

When we return slabs to the partial list because we didn't fully refill
from them, we observe the min_partial limit when the returned slab is
empty, and discard it when over the limit. But it's unlikely for the
limit to be reached while we were refilling, and the worst outcome is to
have temporarily more free slabs on the list than necessary. So just
drop that code and simplify the function.

Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
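Not for the commit log: a small userspace model of the simplified
return path described above, where every leftover slab is put back on
the tail of the partial list without consulting min_partial. The
model_* names are hypothetical and this is not the kernel's list_head
or add_partial() implementation; it only illustrates the ordering,
under the assumption that refill takes slabs from the head of the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct slab and the node partial list. */
struct model_slab {
	int id;
	struct model_slab *prev, *next;
};

struct model_list {
	struct model_slab anchor;	/* circular list anchor, not a real slab */
	unsigned long nr_partial;
};

static void model_list_init(struct model_list *l)
{
	l->anchor.prev = &l->anchor;
	l->anchor.next = &l->anchor;
	l->nr_partial = 0;
}

/* Add to the tail (to_tail == true) or to the head of the list. */
static void model_add_partial(struct model_list *l, struct model_slab *slab,
			      bool to_tail)
{
	/* The new entry is linked right after 'pos'. */
	struct model_slab *pos = to_tail ? l->anchor.prev : &l->anchor;

	assert(slab != &l->anchor);
	slab->prev = pos;
	slab->next = pos->next;
	pos->next->prev = slab;
	pos->next = slab;
	l->nr_partial++;
}

/* Refill side of the model: always takes the slab at the head. */
static struct model_slab *model_take_head(struct model_list *l)
{
	struct model_slab *slab = l->anchor.next;

	if (slab == &l->anchor)
		return NULL;

	slab->prev->next = slab->next;
	slab->next->prev = slab->prev;
	l->nr_partial--;
	return slab;
}
```

In this model a slab returned with to_tail == true is only handed out
again after everything that was already on the list, whereas a slab
added to the head would be the very next one taken.
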
 mm/slub.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 5816fcfc7a90..074d57c0390b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7196,21 +7196,11 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
 
 		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
 
-			if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial))
-				continue;
-
 			list_del(&slab->slab_list);
 			add_partial(n, slab, ADD_TO_TAIL);
 		}
 
 		spin_unlock_irqrestore(&n->list_lock, flags);
-
-		/* any slabs left are completely free and for discard */
-		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
-
-			list_del(&slab->slab_list);
-			discard_slab(s, slab);
-		}
 	}
 
 	return refilled;

-- 
2.54.0




* Re: [PATCH v3 1/2] mm, slab: add an optimistic __slab_try_return_freelist()
  2026-05-22 14:23 ` [PATCH v3 1/2] mm, slab: add an optimistic __slab_try_return_freelist() Vlastimil Babka (SUSE)
@ 2026-05-22 17:40   ` Vlastimil Babka (SUSE)
  0 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-22 17:40 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Andrew Morton, linux-mm, linux-kernel, hu.shengming,
	Vinicius Costa Gomes

On 5/22/26 16:23, Vlastimil Babka (SUSE) wrote:
> +/*
> + * Try returning (remainder of) the freelist that we just detached from the
> + * slab.  Optimistically assume the slab is still full, so we don't need to find
> + * the tail of the detached freelist.
> + *
> + * Fail if the slab isn't full anymore due to a cocurrent free.

Just found out I reintroduced here some comment typos that were already
corrected in slab/for-next and fixed them up again locally.



* Re: [PATCH v3 2/2] mm, slab: simplify returning slab in __refill_objects_node()
  2026-05-22 14:23 ` [PATCH v3 2/2] mm, slab: simplify returning slab in __refill_objects_node() Vlastimil Babka (SUSE)
@ 2026-05-25  6:47   ` Hao Li
  2026-05-25  7:15     ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 7+ messages in thread
From: Hao Li @ 2026-05-25  6:47 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin,
	Andrew Morton, linux-mm, linux-kernel, hu.shengming,
	Vinicius Costa Gomes

On Fri, May 22, 2026 at 04:23:21PM +0200, Vlastimil Babka (SUSE) wrote:
> When we return slabs to the partial list because we didn't fully refill
> from them, we observe the min_partial limit when the returned slab is
> empty, and discard it when over the limit. But it's unlikely for the
> limit to be reached while we were refilling, and the worst outcome is to
> have temporarily more free slabs on the list than necessary.

Just wondering if the empty slabs temporarily exceed the limit and then some
objects get allocated from them, would this lead to more fragmented slabs in
the node partial list?

> So just
> drop that code and simplify the function.
> 
> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> ---
>  mm/slub.c | 10 ----------
>  1 file changed, 10 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 5816fcfc7a90..074d57c0390b 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -7196,21 +7196,11 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
>  
>  		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
>  
> -			if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial))
> -				continue;
> -
>  			list_del(&slab->slab_list);
>  			add_partial(n, slab, ADD_TO_TAIL);
>  		}
>  
>  		spin_unlock_irqrestore(&n->list_lock, flags);
> -
> -		/* any slabs left are completely free and for discard */
> -		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
> -
> -			list_del(&slab->slab_list);
> -			discard_slab(s, slab);
> -		}
>  	}
>  
>  	return refilled;
> 
> -- 
> 2.54.0
> 

-- 
Thanks,
Hao



* Re: [PATCH v3 2/2] mm, slab: simplify returning slab in __refill_objects_node()
  2026-05-25  6:47   ` Hao Li
@ 2026-05-25  7:15     ` Vlastimil Babka (SUSE)
  2026-05-25 11:51       ` Hao Li
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-25  7:15 UTC (permalink / raw)
  To: Hao Li
  Cc: Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin,
	Andrew Morton, linux-mm, linux-kernel, hu.shengming,
	Vinicius Costa Gomes

On 5/25/26 08:47, Hao Li wrote:
> On Fri, May 22, 2026 at 04:23:21PM +0200, Vlastimil Babka (SUSE) wrote:
>> When we return slabs to the partial list because we didn't fully refill
>> from them, we observe the min_partial limit when the returned slab is
>> empty, and discard it when over the limit. But it's unlikely for the
>> limit to be reached while we were refilling, and the worst outcome is to
>> have temporarily more free slabs on the list than necessary.
> 
> Just wondering if the empty slabs temporarily exceed the limit and then some
> objects get allocated from them, would this lead to more fragmented slabs in
> the node partial list?

I think since we're adding the slabs to tail and refill from head, it
shouldn't happen that easily. Fragmenting is possible in general due to bad
luck, I doubt this change could make it noticeably worse.

>> So just
>> drop that code and simplify the function.
>> 
>> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> ---
>>  mm/slub.c | 10 ----------
>>  1 file changed, 10 deletions(-)
>> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 5816fcfc7a90..074d57c0390b 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -7196,21 +7196,11 @@ __refill_objects_node(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int mi
>>  
>>  		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
>>  
>> -			if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial))
>> -				continue;
>> -
>>  			list_del(&slab->slab_list);
>>  			add_partial(n, slab, ADD_TO_TAIL);
>>  		}
>>  
>>  		spin_unlock_irqrestore(&n->list_lock, flags);
>> -
>> -		/* any slabs left are completely free and for discard */
>> -		list_for_each_entry_safe(slab, slab2, &pc.slabs, slab_list) {
>> -
>> -			list_del(&slab->slab_list);
>> -			discard_slab(s, slab);
>> -		}
>>  	}
>>  
>>  	return refilled;
>> 
>> -- 
>> 2.54.0
>> 
> 




* Re: [PATCH v3 2/2] mm, slab: simplify returning slab in __refill_objects_node()
  2026-05-25  7:15     ` Vlastimil Babka (SUSE)
@ 2026-05-25 11:51       ` Hao Li
  0 siblings, 0 replies; 7+ messages in thread
From: Hao Li @ 2026-05-25 11:51 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin,
	Andrew Morton, linux-mm, linux-kernel, hu.shengming,
	Vinicius Costa Gomes

On Mon, May 25, 2026 at 09:15:49AM +0200, Vlastimil Babka (SUSE) wrote:
> On 5/25/26 08:47, Hao Li wrote:
> > On Fri, May 22, 2026 at 04:23:21PM +0200, Vlastimil Babka (SUSE) wrote:
> >> When we return slabs to the partial list because we didn't fully refill
> >> from them, we observe the min_partial limit when the returned slab is
> >> empty, and discard it when over the limit. But it's unlikely for the
> >> limit to be reached while we were refilling, and the worst outcome is to
> >> have temporarily more free slabs on the list than necessary.
> > 
> > Just wondering if the empty slabs temporarily exceed the limit and then some
> > objects get allocated from them, would this lead to more fragmented slabs in
> > the node partial list?
> 
> I think since we're adding the slabs to tail and refill from head, it
> shouldn't happen that easily.

This makes sense!

> Fragmenting is possible in general due to bad
> luck, I doubt this change could make it noticeably worse.

I ran a test and even though the data is pretty noisy, the
length of the partial list indeed didn't really grow much.

And after applying both patches, I can see some improvement (1~2%) in overall
performance!

Reviewed-by: Hao Li <hao.li@linux.dev>
Tested-by: Hao Li <hao.li@linux.dev>

-- 
Thanks,
Hao

