From: Shakeel Butt <shakeel.butt@linux.dev>
To: Harry Yoo <harry@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <qi.zheng@linux.dev>, Alexandre Ghiti <alex@ghiti.fr>,
Joshua Hahn <joshua.hahnjy@gmail.com>,
Meta kernel team <kernel-team@meta.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH 4/4] memcg: multi objcg charge support
Date: Wed, 20 May 2026 18:05:23 -0700
Message-ID: <ag5Z9uIMoXpr3rLP@linux.dev>
In-Reply-To: <4e20f643-6983-4b6e-b12d-c6c4eb20ae0c@kernel.org>

On Wed, May 20, 2026 at 06:35:30PM +0900, Harry Yoo wrote:
>
>
> On 5/20/26 2:31 PM, Shakeel Butt wrote:
> > Commit 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg
> > per-node type") split a memcg's single obj_cgroup into one per NUMA
> > node so that reparenting LRU folios can take per-node lru locks. As a
> > side effect, the per-CPU obj_stock_pcp -- which caches exactly one
> > cached_objcg -- thrashes on workloads where threads of the same memcg
> > run on different NUMA nodes. The kernel test robot reported a 67.7%
> > regression on stress-ng.switch.ops_per_sec from this pattern.
> >
> > Mirror the multi-slot pattern already used by memcg_stock_pcp: turn
> > nr_bytes and cached_objcg into NR_OBJ_STOCK-element arrays, scan all
> > slots on consume/refill/account, prefer empty slots when inserting,
> > and evict a random slot only when full. With multiple slots a CPU can
> > hold the per-node objcg variants of one memcg plus a few siblings
> > without ever forcing a drain.
> >
> > A single int8_t index records which slot the cached slab stats belong
> > to; the stats are flushed on slot or pgdat change. With NR_OBJ_STOCK
> > = 5 the layout (verified with pahole) is:
> >
> >   offset  0 : lock(1) + index(1) + node_id(2) + slab stats(4) =  8B
> >   offset  8 : nr_bytes[5]                                      = 10B
> >   offset 18 : padding                                          =  6B
> >   offset 24 : cached[5]                                        = 40B
> >   offset 64 : (line 2) work_struct + flags (cold)
> >
> > so consume_obj_stock, refill_obj_stock and the slab account path each
> > touch exactly one 64-byte cache line on non-debug 64-bit builds.
> >
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202605121641.b6a60cb0-lkp@intel.com
> > Fixes: 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Tested-by: kernel test robot <oliver.sang@intel.com>
> > ---
> > @@ -3350,19 +3405,45 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> >  		goto out;
> >  	}
> > -	stock_nr_bytes = stock->nr_bytes;
> > -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> > -		drain_obj_stock(stock);
> > +	for (i = 0; i < NR_OBJ_STOCK; ++i) {
> > +		struct obj_cgroup *cached = READ_ONCE(stock->cached[i]);
> > +
> > +		if (!cached) {
> > +			if (empty_slot == -1)
> > +				empty_slot = i;
> > +			continue;
> > +		}
> > +		if (cached == objcg) {
> > +			slot = i;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (slot == -1) {
> > +		slot = empty_slot;
> > +		if (slot == -1) {
> > +			slot = get_random_u32_below(NR_OBJ_STOCK);
>
> It would break kmalloc_nolock() because _get_random_bytes() uses a spinlock.
> Perhaps prandom_u32_state() would be sufficient in this case.
>
> Is there a reason why it uses random eviction, unlike the multi-memcg percpu
> charge cache?

Oh, I didn't know that. We are actually already using get_random_u32_below() in
refill_stock(), so it needs fixing as well. That would be a separate patch.
I will explore prandom_u32_state().
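
For illustration only, here is a minimal userspace sketch of the slot
selection from the hunk above, with the eviction index drawn from a small
caller-owned PRNG state instead of get_random_u32_below(). The names
obj_stock_sketch, pick_slot() and xorshift32() are hypothetical, and
xorshift32() is only a stand-in for a state-based generator such as
prandom_u32_state(). This is not the actual patch, just the idea under those
assumptions: a generator that only touches its own state needs no lock.

```c
#include <stdint.h>
#include <stddef.h>

#define NR_OBJ_STOCK 5

/* Hypothetical, simplified stand-in for the per-CPU obj_stock_pcp. */
struct obj_stock_sketch {
	const void *cached[NR_OBJ_STOCK];	/* NULL means the slot is empty */
	uint32_t rng_state;			/* caller-owned PRNG state, must be nonzero */
};

/* xorshift32: lock-free userspace stand-in for a state-based PRNG. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Return the slot to use for @objcg: the matching slot if it is cached,
 * else the first empty slot, else a pseudo-randomly chosen victim.
 * *evict is set only in the last case, where the caller would have to
 * drain the victim slot before reusing it.
 */
static int pick_slot(struct obj_stock_sketch *stock, const void *objcg,
		     int *evict)
{
	int i, empty_slot = -1;

	*evict = 0;
	for (i = 0; i < NR_OBJ_STOCK; ++i) {
		const void *cached = stock->cached[i];

		if (!cached) {
			if (empty_slot == -1)
				empty_slot = i;
			continue;
		}
		if (cached == objcg)
			return i;
	}
	if (empty_slot != -1)
		return empty_slot;

	*evict = 1;
	return (int)(xorshift32(&stock->rng_state) % NR_OBJ_STOCK);
}
```

The modulo introduces a slight bias, which should not matter for picking an
eviction victim among five slots.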