* [PATCH v2 0/3] mm/slub: some minor optimization and cleanup
@ 2024-01-23  9:33 Chengming Zhou
From: Chengming Zhou @ 2024-01-23  9:33 UTC
  To: Joonsoo Kim, Vlastimil Babka, David Rientjes, Roman Gushchin,
	Pekka Enberg, Christoph Lameter, Andrew Morton, Hyeonggon Yoo
  Cc: Vlastimil Babka, linux-kernel, Chengming Zhou, linux-mm,
	Christoph Lameter (Ampere)

Changes in v2:
- Add VM_BUG_ON(!freelist) after get_freelist() for the cpu partial slab
  case, since a NULL freelist cannot happen in that case.
- Collect tags.
- Link to v1: https://lore.kernel.org/r/20240117-slab-misc-v1-0-fd1c49ccbe70@bytedance.com

Hi,

This series includes a minor optimization of the cpu partial slab fast
path, which directly loads the freelist from the cpu partial slab in the
likely case.

It shows a small performance improvement in testing:
perf bench sched messaging -g 5 -t -l 100000

                mm-stable   slub-optimize
Total time (s)      7.473    7.209

The other two patches are cleanups, included here for convenience.

Thanks for reviews and comments!

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
Chengming Zhou (3):
      mm/slub: directly load freelist from cpu partial slab in the likely case
      mm/slub: remove full list manipulation for non-debug slab
      mm/slub: remove unused parameter in next_freelist_entry()

 mm/slub.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)
---
base-commit: ab27740f76654ed58dd32ac0ba0031c18a6dea3b
change-id: 20240117-slab-misc-5a5f37a51257

Best regards,
-- 
Chengming Zhou <zhouchengming@bytedance.com>



* [PATCH v2 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
From: Chengming Zhou @ 2024-01-23  9:33 UTC
  To: Joonsoo Kim, Vlastimil Babka, David Rientjes, Roman Gushchin,
	Pekka Enberg, Christoph Lameter, Andrew Morton, Hyeonggon Yoo
  Cc: Vlastimil Babka, linux-kernel, Chengming Zhou, linux-mm,
	Christoph Lameter (Ampere)

The likely case is that we get a usable slab from the cpu partial list;
we can directly load the freelist from it and return, instead of taking
the other path, which needs more work, like re-enabling interrupts and
rechecking.

But we need to remove the "VM_BUG_ON(!new.frozen)" in get_freelist()
to reuse it, since a cpu partial slab is not frozen. That seems
acceptable, since the check is only for debugging purposes.

And get_freelist() also assumes it can return NULL if the freelist is
empty, which cannot happen for the cpu partial slab case, so we add
"VM_BUG_ON(!freelist)" after get_freelist() to make that explicit.

There is a small performance improvement too, as shown by:
perf bench sched messaging -g 5 -t -l 100000

                mm-stable   slub-optimize
Total time (s)      7.473    7.209

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..fda402b2d649 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3326,7 +3326,6 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
 		counters = slab->counters;
 
 		new.counters = counters;
-		VM_BUG_ON(!new.frozen);
 
 		new.inuse = slab->objects;
 		new.frozen = freelist != NULL;
@@ -3498,18 +3497,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 		slab = slub_percpu_partial(c);
 		slub_set_percpu_partial(c, slab);
-		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
-		stat(s, CPU_PARTIAL_ALLOC);
 
-		if (unlikely(!node_match(slab, node) ||
-			     !pfmemalloc_match(slab, gfpflags))) {
-			slab->next = NULL;
-			__put_partials(s, slab);
-			continue;
+		if (likely(node_match(slab, node) &&
+			   pfmemalloc_match(slab, gfpflags))) {
+			c->slab = slab;
+			freelist = get_freelist(s, slab);
+			VM_BUG_ON(!freelist);
+			stat(s, CPU_PARTIAL_ALLOC);
+			goto load_freelist;
 		}
 
-		freelist = freeze_slab(s, slab);
-		goto retry_load_slab;
+		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
+
+		slab->next = NULL;
+		__put_partials(s, slab);
 	}
 #endif
 

-- 
b4 0.10.1



* [PATCH v2 2/3] mm/slub: remove full list manipulation for non-debug slab
From: Chengming Zhou @ 2024-01-23  9:33 UTC
  To: Joonsoo Kim, Vlastimil Babka, David Rientjes, Roman Gushchin,
	Pekka Enberg, Christoph Lameter, Andrew Morton, Hyeonggon Yoo
  Cc: Vlastimil Babka, linux-kernel, Chengming Zhou, linux-mm,
	Christoph Lameter (Ampere)

Since debug slabs are processed by free_to_partial_list(), and only a
debug slab with the SLAB_STORE_USER flag cares about the full list, we
can remove these unrelated full list manipulations from __slab_free().
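
For context, with CONFIG_SLUB_DEBUG, remove_full() is already a no-op
unless SLAB_STORE_USER is set (a sketch based on mm/slub.c at this
base, not a verbatim copy):

static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n,
					struct slab *slab)
{
	/* Only caches with SLAB_STORE_USER track a full list. */
	if (!(s->flags & SLAB_STORE_USER))
		return;

	lockdep_assert_held(&n->list_lock);
	list_del(&slab->slab_list);
}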

Acked-by: Christoph Lameter (Ampere) <cl@linux.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index fda402b2d649..5c6fbeef05a8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4188,7 +4188,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 	 * then add it.
 	 */
 	if (!kmem_cache_has_cpu_partial(s) && unlikely(!prior)) {
-		remove_full(s, n, slab);
 		add_partial(n, slab, DEACTIVATE_TO_TAIL);
 		stat(s, FREE_ADD_PARTIAL);
 	}
@@ -4202,9 +4201,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 		 */
 		remove_partial(n, slab);
 		stat(s, FREE_REMOVE_PARTIAL);
-	} else {
-		/* Slab must be on the full list */
-		remove_full(s, n, slab);
 	}
 
 	spin_unlock_irqrestore(&n->list_lock, flags);

-- 
b4 0.10.1



* [PATCH v2 3/3] mm/slub: remove unused parameter in next_freelist_entry()
From: Chengming Zhou @ 2024-01-23  9:33 UTC
  To: Joonsoo Kim, Vlastimil Babka, David Rientjes, Roman Gushchin,
	Pekka Enberg, Christoph Lameter, Andrew Morton, Hyeonggon Yoo
  Cc: Vlastimil Babka, linux-kernel, Chengming Zhou, linux-mm,
	Christoph Lameter (Ampere)

The parameter "struct slab *slab" is unused in next_freelist_entry(),
so just remove it.
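
For context, the function only walks the cache's precomputed random
sequence via *pos and never dereferences the slab (a sketch based on
mm/slub.c at this base, not a verbatim copy):

static void *next_freelist_entry(struct kmem_cache *s,
				unsigned long *pos, void *start,
				unsigned long page_limit,
				unsigned long freelist_count)
{
	unsigned int idx;

	/*
	 * If the slab's page allocation fell back to a smaller order,
	 * it holds fewer objects than usual; skip indexes beyond this
	 * slab's page_limit.
	 */
	do {
		idx = s->random_seq[*pos];
		*pos += 1;
		if (*pos >= freelist_count)
			*pos = 0;
	} while (unlikely(idx >= page_limit));

	return (char *)start + idx;
}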

Acked-by: Christoph Lameter (Ampere) <cl@linux.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 5c6fbeef05a8..7f235fa6592d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2243,7 +2243,7 @@ static void __init init_freelist_randomization(void)
 }
 
 /* Get the next entry on the pre-computed freelist randomized */
-static void *next_freelist_entry(struct kmem_cache *s, struct slab *slab,
+static void *next_freelist_entry(struct kmem_cache *s,
 				unsigned long *pos, void *start,
 				unsigned long page_limit,
 				unsigned long freelist_count)
@@ -2282,13 +2282,12 @@ static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
 	start = fixup_red_left(s, slab_address(slab));
 
 	/* First entry is used as the base of the freelist */
-	cur = next_freelist_entry(s, slab, &pos, start, page_limit,
-				freelist_count);
+	cur = next_freelist_entry(s, &pos, start, page_limit, freelist_count);
 	cur = setup_object(s, cur);
 	slab->freelist = cur;
 
 	for (idx = 1; idx < slab->objects; idx++) {
-		next = next_freelist_entry(s, slab, &pos, start, page_limit,
+		next = next_freelist_entry(s, &pos, start, page_limit,
 			freelist_count);
 		next = setup_object(s, next);
 		set_freepointer(s, cur, next);

-- 
b4 0.10.1



* Re: [PATCH v2 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
From: Vlastimil Babka @ 2024-01-23 10:45 UTC
  To: Chengming Zhou, Joonsoo Kim, David Rientjes, Roman Gushchin,
	Pekka Enberg, Christoph Lameter, Andrew Morton, Hyeonggon Yoo
  Cc: linux-kernel, linux-mm

On 1/23/24 10:33, Chengming Zhou wrote:
> The likely case is that we get a usable slab from the cpu partial list;
> we can directly load the freelist from it and return, instead of taking
> the other path, which needs more work, like re-enabling interrupts and
> rechecking.
> 
> But we need to remove the "VM_BUG_ON(!new.frozen)" in get_freelist()
> to reuse it, since a cpu partial slab is not frozen. That seems
> acceptable, since the check is only for debugging purposes.
> 
> And get_freelist() also assumes it can return NULL if the freelist is
> empty, which cannot happen for the cpu partial slab case, so we add
> "VM_BUG_ON(!freelist)" after get_freelist() to make that explicit.
> 
> There is a small performance improvement too, as shown by:
> perf bench sched messaging -g 5 -t -l 100000
> 
>                 mm-stable   slub-optimize
> Total time (s)      7.473    7.209
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

I'm adding the series to slab/for-6.9/optimize-get-freelist and
slab/for-next, thanks!

> ---
>  mm/slub.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 2ef88bbf56a3..fda402b2d649 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3326,7 +3326,6 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
>  		counters = slab->counters;
>  
>  		new.counters = counters;
> -		VM_BUG_ON(!new.frozen);
>  
>  		new.inuse = slab->objects;
>  		new.frozen = freelist != NULL;
> @@ -3498,18 +3497,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  
>  		slab = slub_percpu_partial(c);
>  		slub_set_percpu_partial(c, slab);
> -		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> -		stat(s, CPU_PARTIAL_ALLOC);
>  
> -		if (unlikely(!node_match(slab, node) ||
> -			     !pfmemalloc_match(slab, gfpflags))) {
> -			slab->next = NULL;
> -			__put_partials(s, slab);
> -			continue;
> +		if (likely(node_match(slab, node) &&
> +			   pfmemalloc_match(slab, gfpflags))) {
> +			c->slab = slab;
> +			freelist = get_freelist(s, slab);
> +			VM_BUG_ON(!freelist);
> +			stat(s, CPU_PARTIAL_ALLOC);
> +			goto load_freelist;
>  		}
>  
> -		freelist = freeze_slab(s, slab);
> -		goto retry_load_slab;
> +		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> +
> +		slab->next = NULL;
> +		__put_partials(s, slab);
>  	}
>  #endif
>  
> 



