From: Johannes Weiner <hannes@cmpxchg.org>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Hao Li <hao.li@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Vlastimil Babka <vbabka@suse.cz>,
	Harry Yoo <harry.yoo@oracle.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/5] mm: memcg: separate slab stat accounting from objcg charge cache
Date: Tue, 3 Mar 2026 10:43:29 -0500
Message-ID: <aacBoazC21TAi-Q2@cmpxchg.org>
In-Reply-To: <aablae2eFl9ne5fW@linux.dev>

On Tue, Mar 03, 2026 at 05:45:18AM -0800, Shakeel Butt wrote:
> On Tue, Mar 03, 2026 at 11:42:31AM +0100, Vlastimil Babka (SUSE) wrote:
> > On 3/3/26 09:54, Hao Li wrote:
> > > On Mon, Mar 02, 2026 at 02:50:18PM -0500, Johannes Weiner wrote:
> > >> Cgroup slab metrics are cached per-cpu the same way as the sub-page
> > >> charge cache. However, the intertwined code to manage those dependent
> > >> caches right now is quite difficult to follow.
> > >>
> > >> Specifically, cached slab stat updates occur in consume() if there was
> > >> enough charge cache to satisfy the new object. If that fails, whole
> > >> pages are reserved, and slab stats are updated when the remainder of
> > >> those pages, after subtracting the size of the new slab object, are
> > >> put into the charge cache. This already juggles a delicate mix of the
> > >> object size, the page charge size, and the remainder to put into the
> > >> byte cache. Doing slab accounting in this path as well is fragile, and
> > >> has recently caused a bug where the input parameters between the two
> > >> caches were mixed up.
> > >>
> > >> Refactor the consume() and refill() paths into unlocked and locked
> > >> variants that only do charge caching. Then let the slab path manage
> > >> its own lock section and open-code charging and accounting.
> > >>
> > >> This makes the slab stat cache subordinate to the charge cache:
> > >> __refill_obj_stock() is called first to prepare it;
> > >> __account_obj_stock() follows to hitch a ride.
> > >>
> > >> This results in a minor behavioral change: previously, a mismatching
> > >> percpu stock would always be drained for the purpose of setting up
> > >> slab account caching, even if there was no byte remainder to put into
> > >> the charge cache. Now, the stock is left alone, and slab accounting
> > >> takes the uncached path if there is a mismatch. This is exceedingly
> > >> rare, and it was probably never worth draining the whole stock just to
> > >> cache the slab stat update.
> > >>
> > >> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > >> ---
> > >> mm/memcontrol.c | 100 +++++++++++++++++++++++++++++-------------------
> > >> 1 file changed, 61 insertions(+), 39 deletions(-)
> > >>
> > >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > >> index 4f12b75743d4..9c6f9849b717 100644
> > >> --- a/mm/memcontrol.c
> > >> +++ b/mm/memcontrol.c
> > >> @@ -3218,16 +3218,18 @@ static struct obj_stock_pcp *trylock_stock(void)
> > >>
> > >
> > > [...]
> > >
> > >> @@ -3376,17 +3383,14 @@ static bool obj_stock_flush_required(struct obj_stock_pcp *stock,
> > >> return flush;
> > >> }
> > >>
> > >> -static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
> > >> - bool allow_uncharge, int nr_acct, struct pglist_data *pgdat,
> > >> - enum node_stat_item idx)
> > >> +static void __refill_obj_stock(struct obj_cgroup *objcg,
> > >> + struct obj_stock_pcp *stock,
> > >> + unsigned int nr_bytes,
> > >> + bool allow_uncharge)
> > >> {
> > >> - struct obj_stock_pcp *stock;
> > >> unsigned int nr_pages = 0;
> > >>
> > >> - stock = trylock_stock();
> > >> if (!stock) {
> > >> - if (pgdat)
> > >> - __account_obj_stock(objcg, NULL, nr_acct, pgdat, idx);
> > >> nr_pages = nr_bytes >> PAGE_SHIFT;
> > >> nr_bytes = nr_bytes & (PAGE_SIZE - 1);
> > >> atomic_add(nr_bytes, &objcg->nr_charged_bytes);
> > >> @@ -3404,20 +3408,25 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
> > >> }
> > >> stock->nr_bytes += nr_bytes;
> > >>
> > >> - if (pgdat)
> > >> - __account_obj_stock(objcg, stock, nr_acct, pgdat, idx);
> > >> -
> > >> if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
> > >> nr_pages = stock->nr_bytes >> PAGE_SHIFT;
> > >> stock->nr_bytes &= (PAGE_SIZE - 1);
> > >> }
> > >>
> > >> - unlock_stock(stock);
> > >> out:
> > >> if (nr_pages)
> > >> obj_cgroup_uncharge_pages(objcg, nr_pages);
> > >> }
> > >>
> > >> +static void refill_obj_stock(struct obj_cgroup *objcg,
> > >> + unsigned int nr_bytes,
> > >> + bool allow_uncharge)
> > >> +{
> > >> + struct obj_stock_pcp *stock = trylock_stock();
> > >> + __refill_obj_stock(objcg, stock, nr_bytes, allow_uncharge);
> > >> + unlock_stock(stock);
> > >
> > > Hi Johannes,
> > >
> > > I noticed that after this patch, obj_cgroup_uncharge_pages() is now inside
> > > the obj_stock.lock critical section. Since obj_cgroup_uncharge_pages() calls
> > > refill_stock(), which seems non-trivial, this might increase the lock hold time.
> > > In particular, could that lead to more failed trylocks for IRQ handlers on
> > > non-RT kernel (or for tasks that preempt others on RT kernel)?

Good catch. I did ponder this, but forgot by the time I wrote the
changelog.

> > Yes, it also seems a bit self-defeating? (at least in theory)
> >
> > refill_obj_stock()
> > trylock_stock()
> > __refill_obj_stock()
> > obj_cgroup_uncharge_pages()
> > refill_stock()
> > local_trylock() -> nested, will fail
>
> Not really as the local_locks are different i.e. memcg_stock.lock in
> refill_stock() and obj_stock.lock in refill_obj_stock().

Right, refilling the *byte* stock could produce enough excess that we
refill the *page* stock. Which in turn could produce enough excess
that we drain that back to the page counters (shared atomics).
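
To make that cascade concrete, here is a minimal userspace sketch. It
is not the kernel code: the struct, the cascade_* names and the
CASCADE_PAGE_BATCH threshold are assumptions made for illustration, and
locking is left out entirely. It only models the arithmetic: bytes go
into the byte cache, whole pages of excess move to the page cache, and
an overfull page cache is drained back to the shared counter.

```c
#include <assert.h>

#define CASCADE_PAGE_SHIFT 12
#define CASCADE_PAGE_SIZE  (1u << CASCADE_PAGE_SHIFT)
#define CASCADE_PAGE_BATCH 64u	/* hypothetical cap on cached pages */

/* Hypothetical stand-ins for the two per-cpu caches and the counter. */
struct cascade_caches {
	unsigned int byte_stock;	/* models the objcg byte cache */
	unsigned int page_stock;	/* models the memcg page cache */
	unsigned long page_counter;	/* models the shared page counter */
};

static void cascade_refill_bytes(struct cascade_caches *c,
				 unsigned int nr_bytes)
{
	unsigned int nr_pages;

	/* Step 1: put the bytes into the byte cache. */
	c->byte_stock += nr_bytes;
	if (c->byte_stock <= CASCADE_PAGE_SIZE)
		return;

	/* Step 2: excess whole pages move to the page cache. */
	nr_pages = c->byte_stock >> CASCADE_PAGE_SHIFT;
	c->byte_stock &= CASCADE_PAGE_SIZE - 1;
	assert(c->byte_stock < CASCADE_PAGE_SIZE);
	c->page_stock += nr_pages;

	/* Step 3: an overfull page cache drains to the shared counter. */
	if (c->page_stock > CASCADE_PAGE_BATCH) {
		c->page_counter -= c->page_stock;
		c->page_stock = 0;
	}
}
```

Only step 3 touches shared state. Steps 1 and 2 are purely per-cpu in
this model.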

> However Hao's concern is valid and I think it can be easily fixed by
> moving obj_cgroup_uncharge_pages() out of obj_stock.lock.

Note that we now have multiple callsites of __refill_obj_stock(). Do
we care enough to move this to the caller?

There are a few other places with a similar pattern:

- drain_obj_stock(): calls memcg_uncharge() under the lock
- drain_stock(): calls memcg_uncharge() under the lock
- refill_stock(): still does full drain_stock()

All of these could be more intentional about only updating the per-cpu
data under the lock and the page counters outside of it.

Given that IRQ allocations/frees are rare, nested ones even rarer, and
the "slowpath" is a few extra atomics, I'm not sure it's worth the
code complication. At least until proven otherwise.

What do you think?
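
For reference, moving the uncharge to the caller could look roughly
like the userspace sketch below. All sketch_* names and types are
hypothetical, the lock is modeled as a plain flag rather than a
local_trylock, and the uncached fallback for a failed trylock is
omitted. The locked helper touches only the cached bytes and hands the
excess page count back, so the caller can uncharge after unlocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-ins for the per-cpu stock and the objcg. */
struct sketch_stock { bool locked; unsigned int nr_bytes; };
struct sketch_objcg { unsigned long charged_pages; };

static struct sketch_stock *sketch_trylock(struct sketch_stock *s)
{
	if (s->locked)
		return NULL;
	s->locked = true;
	return s;
}

static void sketch_unlock(struct sketch_stock *s)
{
	s->locked = false;
}

/* Under the lock: update cached bytes only, return the excess pages. */
static unsigned int sketch_refill_locked(struct sketch_stock *s,
					 unsigned int nr_bytes)
{
	unsigned int nr_pages = 0;

	s->nr_bytes += nr_bytes;
	if (s->nr_bytes > SKETCH_PAGE_SIZE) {
		nr_pages = s->nr_bytes >> SKETCH_PAGE_SHIFT;
		s->nr_bytes &= SKETCH_PAGE_SIZE - 1;
	}
	return nr_pages;
}

/* Models the page counter update; must run outside the stock lock. */
static void sketch_uncharge_pages(struct sketch_stock *s,
				  struct sketch_objcg *objcg,
				  unsigned int nr_pages)
{
	assert(!s->locked);
	objcg->charged_pages -= nr_pages;
}

/* Returns false if the trylock failed (fallback path not modeled). */
static bool sketch_refill(struct sketch_stock *s,
			  struct sketch_objcg *objcg,
			  unsigned int nr_bytes)
{
	struct sketch_stock *locked = sketch_trylock(s);
	unsigned int nr_pages;

	if (!locked)
		return false;
	nr_pages = sketch_refill_locked(locked, nr_bytes);
	sketch_unlock(locked);

	if (nr_pages)
		sketch_uncharge_pages(s, objcg, nr_pages);
	return true;
}
```

The cost is that every caller of the locked helper has to carry the
returned page count past its own unlock, which is the code
complication in question.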