All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] memcg: shrink obj_stock_pcp and cache multiple objcgs
@ 2026-05-20  5:31 Shakeel Butt
  2026-05-20  5:31 ` [PATCH 1/4] memcg: store node_id instead of pglist_data pointer Shakeel Butt
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Shakeel Butt @ 2026-05-20  5:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Qi Zheng, Alexandre Ghiti, Joshua Hahn, Harry Yoo,
	Meta kernel team, linux-mm, cgroups, linux-kernel,
	kernel test robot

Commit 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg
per-node type") split a memcg's single obj_cgroup into one per NUMA
node so that reparenting LRU folios can take per-node lru locks. As a
side effect, the per-CPU obj_stock_pcp -- which caches a single
cached_objcg pointer -- thrashes on workloads where threads of the
same memcg run on different NUMA nodes. The kernel test robot reported
a 67.7% regression on stress-ng.switch.ops_per_sec from this pattern.

Commit d0211878ce06 ("memcg: cache obj_stock by memcg, not by objcg
pointer") landed as a temporary fix by treating sibling per-node
objcgs as equivalent for the cache lookup, intended to be reverted
once per-node kmem accounting is introduced. This series takes a more
general approach: cache multiple objcgs per CPU using the multi-slot
pattern memcg_stock_pcp already uses, so the per-node objcg variants
of one memcg can all coexist in the stock without ever forcing a
drain. The temporary fix can then be reverted.

To avoid increasing the per-CPU cache footprint, the first three
patches shrink the existing single-slot obj_stock_pcp fields.
The final patch converts cached_objcg and nr_bytes into
NR_OBJ_STOCK=5 slot arrays and reorders the struct so the entire
consume/refill/account hot path fits within a single 64-byte cache
line on non-debug 64-bit builds (verified with pahole).

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202605121641.b6a60cb0-lkp@intel.com
Fixes: 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
Tested-by: kernel test robot <oliver.sang@intel.com>

Shakeel Butt (4):
  memcg: store node_id instead of pglist_data pointer
  memcg: uint16_t for nr_bytes in obj_stock_pcp
  memcg: int16_t for cached slab stats
  memcg: multi objcg charge support

 mm/memcontrol.c | 214 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 157 insertions(+), 57 deletions(-)

--
2.53.0-Meta



^ permalink raw reply	[flat|nested] 19+ messages in thread
* [PATCH 0/4] memcg: shrink obj_stock_pcp and cache multiple objcgs
@ 2026-05-20  5:29 Shakeel Butt
  0 siblings, 0 replies; 19+ messages in thread
From: Shakeel Butt @ 2026-05-20  5:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Qi Zheng, Alexandre Ghiti, Joshua Hahn, Harry Yoo,
	Meta kernel team, linux-mm, cgroups, linux-kernel,
	kernel test robot

Commit 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg
per-node type") split a memcg's single obj_cgroup into one per NUMA
node so that reparenting LRU folios can take per-node lru locks. As a
side effect, the per-CPU obj_stock_pcp -- which caches a single
cached_objcg pointer -- thrashes on workloads where threads of the
same memcg run on different NUMA nodes. The kernel test robot reported
a 67.7% regression on stress-ng.switch.ops_per_sec from this pattern.

Commit d0211878ce06 ("memcg: cache obj_stock by memcg, not by objcg
pointer") landed as a temporary fix by treating sibling per-node
objcgs as equivalent for the cache lookup, intended to be reverted
once per-node kmem accounting is introduced. This series takes a more
general approach: cache multiple objcgs per CPU using the multi-slot
pattern memcg_stock_pcp already uses, so the per-node objcg variants
of one memcg can all coexist in the stock without ever forcing a
drain. The temporary fix can then be reverted.

To avoid increasing the per-CPU cache footprint, the first three
patches shrink the existing single-slot obj_stock_pcp fields.
The final patch converts cached_objcg and nr_bytes into
NR_OBJ_STOCK=5 slot arrays and reorders the struct so the entire
consume/refill/account hot path fits within a single 64-byte cache
line on non-debug 64-bit builds (verified with pahole).

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202605121641.b6a60cb0-lkp@intel.com
Fixes: 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
Tested-by: kernel test robot <oliver.sang@intel.com>

Shakeel Butt (4):
  memcg: store node_id instead of pglist_data pointer
  memcg: uint16_t for nr_bytes in obj_stock_pcp
  memcg: int16_t for cached slab stats
  memcg: multi objcg charge support

 mm/memcontrol.c | 214 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 157 insertions(+), 57 deletions(-)

--
2.53.0-Meta



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2026-05-21 20:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-20  5:31 [PATCH 0/4] memcg: shrink obj_stock_pcp and cache multiple objcgs Shakeel Butt
2026-05-20  5:31 ` [PATCH 1/4] memcg: store node_id instead of pglist_data pointer Shakeel Butt
2026-05-20  6:01   ` Harry Yoo
2026-05-20  6:13   ` Muchun Song
2026-05-20  5:31 ` [PATCH 2/4] memcg: uint16_t for nr_bytes in obj_stock_pcp Shakeel Butt
2026-05-20  6:41   ` Harry Yoo
2026-05-20  7:01   ` Harry Yoo
2026-05-21  1:01     ` Shakeel Butt
2026-05-20 13:20   ` David Laight
2026-05-21  1:03     ` Shakeel Butt
2026-05-20  5:31 ` [PATCH 3/4] memcg: int16_t for cached slab stats Shakeel Butt
2026-05-20  7:25   ` Harry Yoo
2026-05-20  5:31 ` [PATCH 4/4] memcg: multi objcg charge support Shakeel Butt
2026-05-20  9:35   ` Harry Yoo
2026-05-21  1:05     ` Shakeel Butt
2026-05-21  1:43       ` Harry Yoo
2026-05-21 20:19         ` Shakeel Butt
2026-05-21  3:22       ` Joshua Hahn
  -- strict thread matches above, loose matches on Subject: below --
2026-05-20  5:29 [PATCH 0/4] memcg: shrink obj_stock_pcp and cache multiple objcgs Shakeel Butt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.