From: Shakeel Butt <shakeel.butt@linux.dev>
To: Harry Yoo <harry@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	 Qi Zheng <qi.zheng@linux.dev>, Alexandre Ghiti <alex@ghiti.fr>,
	 Joshua Hahn <joshua.hahnjy@gmail.com>,
	Meta kernel team <kernel-team@meta.com>,
	linux-mm@kvack.org,  cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	 kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v3] memcg: cache obj_stock by memcg, not by objcg pointer
Date: Tue, 19 May 2026 07:02:47 -0700
Message-ID: <agxszIIN6FtK0fEb@linux.dev>
In-Reply-To: <4e296262-fbbf-4ac7-aecc-3ef831583704@kernel.org>

On Tue, May 19, 2026 at 03:46:51PM +0900, Harry Yoo wrote:
> 
> 
> On 5/19/26 8:41 AM, Shakeel Butt wrote:
> > On Mon, May 18, 2026 at 03:28:27PM -0700, Shakeel Butt wrote:
> > > Commit 01b9da291c49 ("mm: memcontrol: convert objcg to be per-memcg
> > > per-node type") split a memcg's single obj_cgroup into one per NUMA
> > > node, but the per-CPU obj_stock_pcp still keys cached_objcg by
> > > pointer. Cross-NUMA workloads now see a drain on every refill and a
> > > miss on every consume that targets a sibling per-node objcg of the
> > > same memcg, producing the 67.7% stress-ng switch-mq regression
> > > reported by LKP.
> > > 
> > > stock->nr_bytes are fungible across per-node objcgs of one memcg.
> > > Treat the cache as keyed by memcg in __consume_obj_stock() and
> > > __refill_obj_stock() so siblings share the reserve. Compare via
> > > READ_ONCE(objcg->memcg) directly: pointer-compare only, no deref, so
> > > the rcu_read_lock contract on obj_cgroup_memcg() does not apply.
> > > 
> > > Sharing the reserve without re-caching means bytes funded by one
> > > per-node objcg's slow path can be consumed/freed under a different
> > > sibling, leaving sub-page residue on whichever sibling was cached at
> > > drain time. The pre-existing obj_cgroup_release() path would WARN and
> > > silently drop that residue, leaking up to nr_node_ids * (PAGE_SIZE - 1)
> > > bytes per memcg lifecycle from the page_counter. Forward the residue
> > > into a per-node objcg of the same (post-reparent) memcg at release time
> > > instead, so it can be reconciled later via a refill atomic_xchg or
> > > another release; the chain terminates at root_mem_cgroup, whose
> > > page_counter has no enforced limit.
> > > 
> > > Please note that this is a temporary fix and will be reverted when
> > > per-node kmem accounting is introduced.
> 
> ... because once per-node kmem accounting is introduced,
> "stock->nr_bytes are fungible across per-node objcgs of one memcg"
> no longer holds?

Yes

> 
> And the follow-up plan is to revert this and address it with a multi-objcg
> percpu stock [1], similar to the multi-memcg percpu charge cache we have now,
> right? (regardless of per-node kmem accounting's progress)
> 

Yes

> If this temporary fix imposes other potential correctness issues, would it
> make sense to land [1] in mainline before the next LTS release and skip this
> temporary fix?
> 
> [1] https://lore.kernel.org/oe-lkp/agtPMpQK2jXdQAY4@linux.dev
> 

The full, clean solution might take one more cycle, and I don't think we can
just ignore a 67% regression on 7.1.
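
For readers following the quoted patch description, the difference between
pointer-keyed and memcg-keyed stock lookup can be sketched in userspace C
roughly as below. This is only an illustrative model, not the patch: the
struct and function names (memcg_model, objcg_model, obj_stock_model,
consume_by_objcg, consume_by_memcg, demo_cross_node_hit) are hypothetical,
and it leaves out the per-CPU locking, READ_ONCE(), draining and residue
forwarding that the real code has to deal with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's definitions. */
struct memcg_model {
	int id;
};

struct objcg_model {
	struct memcg_model *memcg;	/* owning memcg */
	int nid;			/* NUMA node this objcg stands for */
};

struct obj_stock_model {
	struct objcg_model *cached_objcg;	/* objcg cached at last refill */
	unsigned int nr_bytes;			/* pre-charged bytes in reserve */
};

/* Old keying: a hit requires the exact same objcg pointer. */
bool consume_by_objcg(struct obj_stock_model *stock,
		      struct objcg_model *objcg, unsigned int nr_bytes)
{
	if (stock->cached_objcg == objcg && stock->nr_bytes >= nr_bytes) {
		stock->nr_bytes -= nr_bytes;
		return true;
	}
	return false;
}

/*
 * New keying: any per-node sibling of the cached objcg hits, because
 * only the owning memcg pointers are compared (no dereference of the
 * memcg itself).
 */
bool consume_by_memcg(struct obj_stock_model *stock,
		      struct objcg_model *objcg, unsigned int nr_bytes)
{
	if (stock->cached_objcg &&
	    stock->cached_objcg->memcg == objcg->memcg &&
	    stock->nr_bytes >= nr_bytes) {
		stock->nr_bytes -= nr_bytes;
		return true;
	}
	return false;
}

/* Stock cached for node 0, consume request arrives for node 1. */
void demo_cross_node_hit(void)
{
	struct memcg_model memcg = { 1 };
	struct objcg_model node0 = { &memcg, 0 };
	struct objcg_model node1 = { &memcg, 1 };
	struct obj_stock_model stock = { &node0, 4096 };

	/* Pointer keying misses on the sibling and leaves the reserve alone. */
	assert(!consume_by_objcg(&stock, &node1, 64));
	assert(stock.nr_bytes == 4096);

	/* Memcg keying lets the sibling share the reserve. */
	assert(consume_by_memcg(&stock, &node1, 64));
	assert(stock.nr_bytes == 4096 - 64);
}
```

Calling demo_cross_node_hit() from a main() exercises the cross-node case
from the description: the pointer-keyed lookup misses on the sibling objcg,
while the memcg-keyed lookup consumes from the shared reserve.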

