* [PATCH] mm/slub: allocate sheaves on local memory nodes
@ 2026-05-25 8:13 Hao Li
2026-05-26 6:56 ` Harry Yoo
0 siblings, 1 reply; 3+ messages in thread
From: Hao Li @ 2026-05-25 8:13 UTC (permalink / raw)
To: vbabka, harry, akpm
Cc: cl, rientjes, roman.gushchin, linux-mm, linux-kernel, Hao Li
Sheaves are per-CPU allocator metadata and their object arrays are accessed
from the local fast paths. Allocate them with a NUMA node hint instead of
using plain kzalloc(). While no measurable performance improvement was
observed, this approach is theoretically correct.

During bootstrap we allocate sheaves for all possible CPUs before every
possible CPU has an initialized cpu_to_mem() value, so compute the
memory node from local_memory_node(cpu_to_node(cpu)), just as
__build_all_zonelists() does.
Signed-off-by: Hao Li <hao.li@linux.dev>
---
This patch might conflict with Shengming's patch. Let's review it first, and if
it looks good, I'll rebase afterwards. Thanks.
mm/slub.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 180973a4a3d2..ff1b1e932719 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2757,7 +2757,7 @@ static inline void *setup_object(struct kmem_cache *s, void *object)
 }

 static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
-					      unsigned int capacity)
+					      unsigned int capacity, int node)
 {
 	struct slab_sheaf *sheaf;
 	size_t sheaf_size;
@@ -2776,7 +2776,7 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
 	gfp |= __GFP_NO_OBJ_EXT;

 	sheaf_size = struct_size(sheaf, objects, capacity);
-	sheaf = kzalloc(sheaf_size, gfp);
+	sheaf = kzalloc_node(sheaf_size, gfp, node);

 	if (unlikely(!sheaf))
 		return NULL;
@@ -2791,7 +2791,7 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
 static inline struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s,
 						   gfp_t gfp)
 {
-	return __alloc_empty_sheaf(s, gfp, s->sheaf_capacity);
+	return __alloc_empty_sheaf(s, gfp, s->sheaf_capacity, numa_mem_id());
 }

 static void free_empty_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf)
@@ -8413,10 +8413,17 @@ static void __init bootstrap_cache_sheaves(struct kmem_cache *s)
 	for_each_possible_cpu(cpu) {
 		struct slub_percpu_sheaves *pcs;
+		int mem_node;

 		pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

-		pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL, capacity);
+		/*
+		 * Cannot use cpu_to_mem() here because it's only initialized
+		 * for online CPUs at this point (see __build_all_zonelists),
+		 * while we need to allocate sheaves for all possible CPUs.
+		 */
+		mem_node = local_memory_node(cpu_to_node(cpu));
+		pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL, capacity, mem_node);

 		if (!pcs->main) {
 			failed = true;
--
2.54.0
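
As a rough userspace illustration of the reasoning in the commit message (not kernel code), the sketch below compares two lookups. One reads a cached per-CPU memory-node table that is filled in for online CPUs only. The other derives the node from the CPU's own node on every call. Every `toy_*` name, the topology, the distance table, and the choice of 0 as the value of an unfilled table entry are assumptions made for the example; the kernel's local_memory_node() is not implemented with such a table.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS  4
#define TOY_NR_NODES 3

/* Hypothetical topology: CPU -> NUMA node. All values are invented. */
static const int toy_cpu_node[TOY_NR_CPUS] = { 0, 0, 1, 2 };
/* Only CPUs 0 and 1 are online while the boot-time loop runs. */
static const bool toy_cpu_online[TOY_NR_CPUS] = { true, true, false, false };
/* Node 1 is memoryless in this toy machine. */
static const bool toy_node_has_mem[TOY_NR_NODES] = { true, false, true };
/* Invented node distance table; a smaller value means a closer node. */
static const int toy_distance[TOY_NR_NODES][TOY_NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 15 },
	{ 30, 15, 10 },
};

/* Toy stand-in for local_memory_node(): nearest node that has memory. */
int toy_local_memory_node(int node)
{
	int best = -1;

	assert(node >= 0 && node < TOY_NR_NODES);
	for (int n = 0; n < TOY_NR_NODES; n++) {
		if (!toy_node_has_mem[n])
			continue;
		if (best < 0 || toy_distance[node][n] < toy_distance[node][best])
			best = n;
	}
	return best;
}

/*
 * Toy stand-in for a cpu_to_mem() table that has been filled in for
 * online CPUs only. Entries of offline CPUs keep their initial value,
 * which this model assumes to be 0.
 */
int toy_cached_cpu_to_mem(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (!toy_cpu_online[cpu])
		return 0;
	return toy_local_memory_node(toy_cpu_node[cpu]);
}

/* Node choice that works for every possible CPU, online or not. */
int toy_sheaf_node(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return toy_local_memory_node(toy_cpu_node[cpu]);
}
```

In this toy machine the two lookups agree for the online CPUs 0 and 1, but the cached table still says node 0 for the offline CPUs 2 and 3, whose nearest memory is on node 2.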
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] mm/slub: allocate sheaves on local memory nodes
2026-05-25 8:13 [PATCH] mm/slub: allocate sheaves on local memory nodes Hao Li
@ 2026-05-26 6:56 ` Harry Yoo
2026-05-26 7:59 ` Hao Li
0 siblings, 1 reply; 3+ messages in thread
From: Harry Yoo @ 2026-05-26 6:56 UTC (permalink / raw)
To: Hao Li, vbabka, akpm; +Cc: cl, rientjes, roman.gushchin, linux-mm, linux-kernel
On 5/25/26 5:13 PM, Hao Li wrote:
> Sheaves are per-CPU allocator metadata and their object arrays are accessed
> from the local fast paths. Allocate them with a NUMA node hint instead of
> using plain kzalloc(). While no measurable performance improvement was
> observed, this approach is theoretically correct.
>
> During bootstrap we allocate sheaves for all possible CPUs before every
> possible CPU has an initialized cpu_to_mem() value, so compute the
> memory node from local_memory_node(cpu_to_node(cpu)) just like
> what __build_all_zonelists does.
What about sheaves for non-kmalloc-normal caches that are allocated &
initialized by init_percpu_sheaves()?
Would addressing the above change "no measurable performance impact was
observed"?
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] mm/slub: allocate sheaves on local memory nodes
2026-05-26 6:56 ` Harry Yoo
@ 2026-05-26 7:59 ` Hao Li
0 siblings, 0 replies; 3+ messages in thread
From: Hao Li @ 2026-05-26 7:59 UTC (permalink / raw)
To: Harry Yoo
Cc: vbabka, akpm, cl, rientjes, roman.gushchin, linux-mm,
linux-kernel
On Tue, May 26, 2026 at 03:56:58PM +0900, Harry Yoo wrote:
>
>
> On 5/25/26 5:13 PM, Hao Li wrote:
> > Sheaves are per-CPU allocator metadata and their object arrays are accessed
> > from the local fast paths. Allocate them with a NUMA node hint instead of
> > using plain kzalloc(). While no measurable performance improvement was
> > observed, this approach is theoretically correct.
> >
> > During bootstrap we allocate sheaves for all possible CPUs before every
> > possible CPU has an initialized cpu_to_mem() value, so compute the
> > memory node from local_memory_node(cpu_to_node(cpu)) just like
> > what __build_all_zonelists does.
>
>
> What about sheaves for non-kmalloc-normal caches that are allocated &
> initialized by init_percpu_sheaves()?
Ah, good catch! Thanks for the reminder! I completely overlooked this, which
unfortunately means most non-kmalloc caches have been missing out on the
benefits of this patch.
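
init_percpu_sheaves() also contains for_each_possible_cpu loops. It should also
derive the node ID from the CPU index rather than relying on numa_mem_id()
inside alloc_empty_sheaf(), since numa_mem_id() reports the memory node of the
CPU running the loop, not of the CPU that will own the sheaf.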
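
The point about numa_mem_id() can be sketched in plain userspace C (a toy model, not kernel code): when one CPU runs the loop over all possible CPUs, a node taken from the running CPU is wrong for every CPU that lives on another node. All `toy_*` names, the two-node topology, and the calloc()-based allocator that only records the requested node are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS 4

/* Toy sheaf: a header followed by a flexible array of object pointers. */
struct toy_sheaf {
	int node;		/* node the toy allocator was asked for */
	unsigned int capacity;
	void *objects[];
};

/* Hypothetical topology: memory node of each possible CPU. */
static const int toy_cpu_mem_node[TOY_NR_CPUS] = { 0, 0, 1, 1 };
/* CPU that executes the init loop in this model. */
static int toy_current_cpu;

/* Toy stand-in for numa_mem_id(): memory node of the running CPU. */
int toy_numa_mem_id(void)
{
	assert(toy_current_cpu >= 0 && toy_current_cpu < TOY_NR_CPUS);
	return toy_cpu_mem_node[toy_current_cpu];
}

/* Toy stand-in for a zeroing, node-aware allocation; only records the node. */
struct toy_sheaf *toy_alloc_sheaf(unsigned int capacity, int node)
{
	struct toy_sheaf *sheaf;

	sheaf = calloc(1, sizeof(*sheaf) + capacity * sizeof(sheaf->objects[0]));
	if (!sheaf)
		return NULL;
	sheaf->node = node;
	sheaf->capacity = capacity;
	return sheaf;
}

void toy_free_sheaves(struct toy_sheaf *sheaves[])
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		free(sheaves[cpu]);
		sheaves[cpu] = NULL;
	}
}

/*
 * Allocate one sheaf per possible CPU. With use_owner_node the node comes
 * from the CPU that will own the sheaf, otherwise from the running CPU.
 * Returns how many sheaves ended up off their owner's node, or -1 if an
 * allocation failed.
 */
int toy_init_sheaves(struct toy_sheaf *sheaves[], unsigned int capacity,
		     bool use_owner_node)
{
	int remote = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sheaves[cpu] = NULL;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		int node = use_owner_node ? toy_cpu_mem_node[cpu]
					  : toy_numa_mem_id();

		sheaves[cpu] = toy_alloc_sheaf(capacity, node);
		if (!sheaves[cpu]) {
			toy_free_sheaves(sheaves);
			return -1;
		}
		if (node != toy_cpu_mem_node[cpu])
			remote++;
	}
	return remote;
}
```

With CPU 0 running the loop in this toy topology, toy_init_sheaves() reports 2 misplaced sheaves when the running CPU's node is used and 0 when each owner's node is used.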
>
> Would addressing the above change "no measurable performance impact was
> observed"?
It looks like we should get some performance gains now. (Hope so:))
--
Thanks,
Hao
^ permalink raw reply [flat|nested] 3+ messages in thread