From: Shakeel Butt <shakeel.butt@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <qi.zheng@linux.dev>,
Meta kernel team <kernel-team@meta.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org,
kernel test robot <oliver.sang@intel.com>,
alex@ghiti.fr, joshua.hahnjy@gmail.com
Subject: Re: [PATCH v2] memcg: cache obj_stock by memcg, not by objcg pointer
Date: Mon, 18 May 2026 11:32:30 -0700
Message-ID: <agtZWXhtXSfLQ4GW@linux.dev>
In-Reply-To: <ags818dAvMjylVmP@linux.dev>
On Mon, May 18, 2026 at 09:46:04AM -0700, Shakeel Butt wrote:
> Cc Alex, Joshua (since they are working on making per-node kmem accounting work)
>
> On Sun, May 17, 2026 at 12:43:08PM -0700, Shakeel Butt wrote:
> > Commit 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg
> > per-node type") split a memcg's single obj_cgroup into one per NUMA
> > node, but the per-CPU obj_stock_pcp still keys cached_objcg by
> > pointer. Cross-NUMA workloads now see a drain on every refill and a
> > miss on every consume that targets a sibling per-node objcg of the
> > same memcg, producing the 67.7% stress-ng switch-mq regression
> > reported by LKP.
> >
> > stock->nr_bytes are fungible across per-node objcgs of one memcg.
> > Treat the cache as keyed by memcg in __consume_obj_stock() and
> > __refill_obj_stock() so siblings share the reserve. Compare via
> > READ_ONCE(objcg->memcg) directly: pointer-compare only, no deref, so
> > the rcu_read_lock contract on obj_cgroup_memcg() does not apply.
> >
> > In the same-memcg refill path also fold the incoming objcg's
> > nr_charged_bytes into the stock; otherwise sub-page residue
> > accumulates on whichever sibling was cached at drain time and
> > obj_cgroup_release() silently drops it, leaking up to nr_node_ids *
> > (PAGE_SIZE - 1) bytes per memcg lifecycle from the page_counter.
> > This issue was reported by Sashiko.
> >
> > Update the now-stale invariant comment on __account_obj_stock().
> >
> > Qi Zheng built a specialized reproducer [1] for the corner case and
> > confirmed the fix.
> >
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202605121641.b6a60cb0-lkp@intel.com
> > Fixes: 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > Link: https://lore.kernel.org/19693be6-7132-446e-b3fc-b7e9f56e5949@linux.dev/ [1]
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Debugged-by: Qi Zheng <qi.zheng@linux.dev>
> > Tested-by: Qi Zheng <qi.zheng@linux.dev>
>
> Sashiko [1] reported two issues. The first one seems benign, but the second one
> is real. However, I think we need to take a step back and rethink how to solve
> this issue in a more future-proof way.
>
> It seems like Alex and Joshua are working on enabling per-node kmem accounting,
> which would need an accurate per-NUMA association for each per-node objcg. So
> checking objcg->memcg in consume and refill would go against per-node kmem
> accounting.
>
> One way to fix the regression and be future-proof is to follow the approach we
> have for memcg_stock_pcp, i.e. multiple per-CPU objcg stocks. We will need to
> test it more and, depending on the additional code complexity, decide whether
> to backport it to 7.2.
>
>
> [1] https://sashiko.dev/#/patchset/20260517194308.952655-1-shakeel.butt@linux.dev?part=1
>
I had previously prototyped multiple per-CPU objcg stocks (when I worked on the
multi-memcg per-CPU stock), and I have just rebased that prototype on the latest
linux-next and sent it [1]. That patch adds 100 LOC. For upstreaming, I will
break it up into at least 4 patches. However, I have doubts about backporting
them to 7.1. One option is to fix whatever Sashiko flagged and send a v3 that
can be backported to 7.1, and then later, for 7.2+, revert this short-term fix
and send out the multiple objcg stocks patch series.

Any concerns?

[1] http://lore.kernel.org/agtPMpQK2jXdQAY4@linux.dev