* [PATCH mm-unstable v2] mm/memcontrol: batch memcg charging in __memcg_slab_post_alloc_hook
@ 2026-03-20 2:07 Hui Zhu
2026-03-27 6:22 ` Andrew Morton
2026-03-31 8:16 ` Andrew Morton
0 siblings, 2 replies; 4+ messages in thread
From: Hui Zhu @ 2026-03-20 2:07 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel
Cc: teawater
From: teawater <zhuhui@kylinos.cn>
When kmem_cache_alloc_bulk() allocates multiple objects, the post-alloc
hook __memcg_slab_post_alloc_hook() charges the memcg one object at a
time, even though consecutive objects may reside on slabs backed by the
same pgdat (NUMA node).
Batch the memcg charging by scanning ahead from the current position to
find a contiguous run of objects whose slabs share the same pgdat, then
issue a single __obj_cgroup_charge() / __consume_obj_stock() call for
the entire run. The obj_ext assignment loop stays per-object, since that
step cannot be batched.
This implements the TODO comment left in commit bc730030f956 ("memcg:
combine slab obj stock charging and accounting").
The existing error-recovery contract is unchanged: if size == 1 then
memcg_alloc_abort_single() will free the sole object, and for larger
bulk allocations kmem_cache_free_bulk() will uncharge any objects that
were already charged before the failure.
Benchmark using kmem_cache_alloc_bulk() with SLAB_ACCOUNT
(iters=100000):
bulk=32  before: 215 ns/object  after: 174 ns/object  (-19%)
bulk=1   before: 344 ns/object  after: 335 ns/object  (  ~ )
No measurable regression for bulk=1, as expected.
Signed-off-by: teawater <zhuhui@kylinos.cn>
---
Changelog:
v2:
Per the review comments in [1], bound the batch size so that the
accumulated byte count cannot overflow an integer.
[1] https://sashiko.dev/#/patchset/20260316084839.1342163-1-hui.zhu%40linux.dev
mm/memcontrol.c | 77 +++++++++++++++++++++++++++++++++++++------------
1 file changed, 58 insertions(+), 19 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a47fb68dd65f..e65130d521d7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3448,51 +3448,90 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
return false;
}
- for (i = 0; i < size; i++) {
+ for (i = 0; i < size; ) {
unsigned long obj_exts;
struct slabobj_ext *obj_ext;
struct obj_stock_pcp *stock;
+ struct pglist_data *pgdat;
+ int batch_bytes;
+ size_t run_len = 0;
+ size_t j;
+ size_t max_size;
+ bool skip_next = false;
slab = virt_to_slab(p[i]);
if (!slab_obj_exts(slab) &&
alloc_slab_obj_exts(slab, s, flags, false)) {
+ i++;
continue;
}
+ pgdat = slab_pgdat(slab);
+ run_len = 1;
+
+ /*
+ * The value of batch_bytes must not exceed
+ * (INT_MAX - PAGE_SIZE) to prevent integer overflow in
+ * the final accumulation performed by __account_obj_stock().
+ */
+ max_size = min((size_t)((INT_MAX - PAGE_SIZE) / obj_size),
+ size);
+
+ for (j = i + 1; j < max_size; j++) {
+ struct slab *slab_j = virt_to_slab(p[j]);
+
+ if (slab_pgdat(slab_j) != pgdat)
+ break;
+
+ if (!slab_obj_exts(slab_j) &&
+ alloc_slab_obj_exts(slab_j, s, flags, false)) {
+ skip_next = true;
+ break;
+ }
+
+ run_len++;
+ }
+
/*
- * if we fail and size is 1, memcg_alloc_abort_single() will
+ * If we fail and size is 1, memcg_alloc_abort_single() will
* just free the object, which is ok as we have not assigned
- * objcg to its obj_ext yet
- *
- * for larger sizes, kmem_cache_free_bulk() will uncharge
- * any objects that were already charged and obj_ext assigned
+ * objcg to its obj_ext yet.
*
- * TODO: we could batch this until slab_pgdat(slab) changes
- * between iterations, with a more complicated undo
+ * For larger sizes, kmem_cache_free_bulk() will uncharge
+ * any objects that were already charged and obj_ext assigned.
*/
+ batch_bytes = obj_size * run_len;
stock = trylock_stock();
- if (!stock || !__consume_obj_stock(objcg, stock, obj_size)) {
+ if (!stock || !__consume_obj_stock(objcg, stock, batch_bytes)) {
size_t remainder;
unlock_stock(stock);
- if (__obj_cgroup_charge(objcg, flags, obj_size, &remainder))
+ if (__obj_cgroup_charge(objcg, flags, batch_bytes, &remainder))
return false;
stock = trylock_stock();
if (remainder)
__refill_obj_stock(objcg, stock, remainder, false);
}
- __account_obj_stock(objcg, stock, obj_size,
- slab_pgdat(slab), cache_vmstat_idx(s));
+ __account_obj_stock(objcg, stock, batch_bytes,
+ pgdat, cache_vmstat_idx(s));
unlock_stock(stock);
- obj_exts = slab_obj_exts(slab);
- get_slab_obj_exts(obj_exts);
- off = obj_to_index(s, slab, p[i]);
- obj_ext = slab_obj_ext(slab, obj_exts, off);
- obj_cgroup_get(objcg);
- obj_ext->objcg = objcg;
- put_slab_obj_exts(obj_exts);
+ for (j = 0; j < run_len; j++) {
+ slab = virt_to_slab(p[i + j]);
+ obj_exts = slab_obj_exts(slab);
+ get_slab_obj_exts(obj_exts);
+ off = obj_to_index(s, slab, p[i + j]);
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
+ obj_cgroup_get(objcg);
+ obj_ext->objcg = objcg;
+ put_slab_obj_exts(obj_exts);
+ }
+
+ if (skip_next)
+ i = i + run_len + 1;
+ else
+ i += run_len;
}
return true;
--
2.43.0
* Re: [PATCH mm-unstable v2] mm/memcontrol: batch memcg charging in __memcg_slab_post_alloc_hook
From: Andrew Morton @ 2026-03-27 6:22 UTC (permalink / raw)
To: Hui Zhu
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, cgroups, linux-mm, linux-kernel, teawater
On Fri, 20 Mar 2026 10:07:45 +0800 Hui Zhu <hui.zhu@linux.dev> wrote:
> From: teawater <zhuhui@kylinos.cn>
We do prefer real names in Linux commits, please. Can I rewrite this
patch as From: Hui Zhu <hui.zhu@linux.dev>?
> [...]
Could memcg maintainers please review this?
Thanks.
* Re: [PATCH mm-unstable v2] mm/memcontrol: batch memcg charging in __memcg_slab_post_alloc_hook
From: Andrew Morton @ 2026-03-31 8:16 UTC (permalink / raw)
To: Hui Zhu
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, cgroups, linux-mm, linux-kernel, teawater
On Fri, 20 Mar 2026 10:07:45 +0800 Hui Zhu <hui.zhu@linux.dev> wrote:
> [...]
I noticed that the AI review of your v1 patch reported a few potential
issues:
https://sashiko.dev/#/patchset/20260316084839.1342163-1-hui.zhu@linux.dev
Can you please take a look, see if any of this is valid for v2?
Unfortunately the bot wasn't able to check v2 because it couldn't get
the patch to apply. I've checked that this patch does apply cleanly to
current mm-stable, which is on the bot's try-to-apply list. So if you
wish to get checking of the latest patch, please send us a v3 and that
will trigger a retry.
* Re: [PATCH mm-unstable v2] mm/memcontrol: batch memcg charging in __memcg_slab_post_alloc_hook
From: teawater @ 2026-03-31 8:42 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, cgroups, linux-mm, linux-kernel, teawater
>
> On Fri, 20 Mar 2026 10:07:45 +0800 Hui Zhu <hui.zhu@linux.dev> wrote:
> > [...]
Hi Andrew,
> I noticed that the AI review of your v1 patch reported a few potential
> issues:
> https://sashiko.dev/#/patchset/20260316084839.1342163-1-hui.zhu@linux.dev
>
> Can you please take a look, see if any of this is valid for v2?
>
> Unfortunately the bot wasn't able to check v2 because it couldn't get
> the patch to apply. I've checked that this patch does apply cleanly to
> current mm-stable, which is on the bot's try-to-apply list. So if you
> wish to get checking of the latest patch, please send us a v3 and that
> will trigger a retry.
I will send a v3 to make sure the patch is OK.
Best,
Hui