* [PATCH 0/4] memcg: shrink obj_stock_pcp and cache multiple objcgs
From: Shakeel Butt @ 2026-05-20 5:29 UTC
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Qi Zheng, Alexandre Ghiti, Joshua Hahn, Harry Yoo,
Meta kernel team, linux-mm, cgroups, linux-kernel,
kernel test robot

Commit 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg
per-node type") split a memcg's single obj_cgroup into one per NUMA
node so that reparenting LRU folios can take per-node lru locks. As a
side effect, the per-CPU obj_stock_pcp -- which caches a single
cached_objcg pointer -- thrashes on workloads where threads of the
same memcg run on different NUMA nodes. The kernel test robot reported
a 67.7% regression on stress-ng.switch.ops_per_sec from this pattern.

Commit d0211878ce06 ("memcg: cache obj_stock by memcg, not by objcg
pointer") landed as a temporary fix by treating sibling per-node
objcgs as equivalent for the cache lookup, intended to be reverted
once per-node kmem accounting is introduced. This series takes a more
general approach: cache multiple objcgs per CPU using the multi-slot
pattern memcg_stock_pcp already uses, so the per-node objcg variants
of one memcg can all coexist in the stock without ever forcing a
drain. The temporary fix can then be reverted.

To avoid increasing the per-CPU cache footprint, the first three
patches shrink the existing single-slot obj_stock_pcp fields.

The final patch converts cached_objcg and nr_bytes into
NR_OBJ_STOCK=5 slot arrays and reorders the struct so the entire
consume/refill/account hot path fits within a single 64-byte cache
line on non-debug 64-bit builds (verified with pahole).

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202605121641.b6a60cb0-lkp@intel.com
Fixes: 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
Tested-by: kernel test robot <oliver.sang@intel.com>

Shakeel Butt (4):
  memcg: store node_id instead of pglist_data pointer
  memcg: uint16_t for nr_bytes in obj_stock_pcp
  memcg: int16_t for cached slab stats
  memcg: multi objcg charge support

 mm/memcontrol.c | 214 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 157 insertions(+), 57 deletions(-)

--
2.53.0-Meta