From: Leon Huang Fu <leon.huangfu@shopee.com>
To: inwardvessel@gmail.com
Cc: akpm@linux-foundation.org, cgroups@vger.kernel.org,
	corbet@lwn.net, hannes@cmpxchg.org, jack@suse.cz,
	joel.granados@kernel.org, kyle.meyer@hpe.com,
	lance.yang@linux.dev, laoar.shao@gmail.com,
	leon.huangfu@shopee.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mclapinski@google.com, mhocko@kernel.org, muchun.song@linux.dev,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	yosry.ahmed@linux.dev
Subject: Re: [PATCH mm-new v2] mm/memcontrol: Flush stats when write stat file
Date: Thu,  6 Nov 2025 14:42:00 +0800
Message-ID: <20251106064200.64198-1-leon.huangfu@shopee.com>
In-Reply-To: <c704e7d9-5bc9-43e6-98cf-d28c592b0f3b@gmail.com>

>On 11/5/25 7:30 PM, Leon Huang Fu wrote:
>> On Thu, Nov 6, 2025 at 9:19 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>>
>>> +Yosry, JP
>>>
>>> On Wed, Nov 05, 2025 at 03:49:16PM +0800, Leon Huang Fu wrote:
>>>> On high-core count systems, memory cgroup statistics can become stale
>>>> due to per-CPU caching and deferred aggregation. Monitoring tools and
>>>> management applications sometimes need guaranteed up-to-date statistics
>>>> at specific points in time to make accurate decisions.
>>>
>>> Can you explain a bit more on your environment where you are seeing
>>> stale stats? More specifically, how often the management applications
>>> are reading the memcg stats and if these applications are reading memcg
>>> stats for each node of the cgroup tree.
>>>
>>> We force flush all the memcg stats at root level every 2 seconds but it
>>> seems like that is not enough for your case. I am fine with an explicit
>>> way for users to flush the memcg stats. That way, only users who want
>>> the flush have to pay its cost.
>>>
>>
>> Thanks for the feedback. I encountered this issue while running the LTP
>> memcontrol02 test case [1] on a 256-core server with the 6.6.y kernel on XFS,
>> where it consistently failed.
>>
>> I was aware that Yosry had improved the memory statistics refresh mechanism
>> in "mm: memcg: subtree stats flushing and thresholds" [2], so I attempted to
>> backport that patchset to 6.6.y [3]. However, even on the 6.15.0-061500-generic
>> kernel with those improvements, the test still fails intermittently on XFS.
>>
>
>I'm not against this change, but it might be worth testing on a 6.16 or
>later kernel. There were some changes that could affect your
>measurements. One is that flushing was isolated to individual subsystems
>[0] and the other is that updating stats became lockless [1].
>
>[0]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/cgroup/rstat.c?h=v6.18-rc4&id=5da3bfa029d6809e192d112f39fca4dbe0137aaf
>[1]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/cgroup/rstat.c?h=v6.18-rc4&id=36df6e3dbd7e7b074e55fec080012184e2fa3a46

Thanks for the suggestion! I've tested on kernel 6.17.7-061707-generic and
the results show the problem has actually gotten worse compared to
6.15.0-061500-generic.

Test results (100 runs each on the LTP memcontrol02 test scenario):

Kernel 6.15.0-061500-generic:
- Failures: 2/100 runs
- Failure rate: 2%

Kernel 6.17.7-061707-generic:
- Failures: 25/100 runs
- Failure rate: 25%
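
For anyone who wants to reproduce the counts, a repeat-and-tally loop
along the following lines is enough. This is only a sketch: the helper
name failure_rate is made up for illustration, and it assumes the test
binary reports failure through a non-zero exit status.

```python
import subprocess


def failure_rate(cmd, runs=100):
    """Run cmd `runs` times; return (failures, percent of runs that failed)."""
    failures = 0
    for _ in range(runs):
        # ASSUMPTION: a non-zero exit status means the run failed.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures, 100.0 * failures / runs
```

Calling failure_rate() with the path to the memcontrol02 binary as a
one-element list returns the failure count and the percentage.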

The increased failure rate with the newer kernel suggests that the lockless
stats updates and subsystem isolation changes, while improving performance,
may have reduced the implicit synchronization that was helping mask the
staleness issue in some cases.

This reinforces the need for an explicit flush mechanism (memory.stat_refresh)
to give users control when they need guaranteed up-to-date statistics.
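
To illustrate the intended usage, a monitoring tool could flush first
and read afterwards, roughly as sketched below. The function name
read_fresh_stats is hypothetical, memory.stat_refresh is the interface
proposed in this thread and not a mainline file, and the value written
to it ("1") is an assumption.

```python
from pathlib import Path


def read_fresh_stats(cgroup_dir, refresh_file="memory.stat_refresh"):
    """Request a flush, then parse memory.stat into a dict of ints."""
    cgroup = Path(cgroup_dir)
    # ASSUMPTION: writing "1" to the proposed refresh file triggers a flush.
    # The file name comes from this thread; it is not a mainline interface.
    (cgroup / refresh_file).write_text("1")
    stats = {}
    # memory.stat is a flat list of "<key> <value>" lines.
    for line in (cgroup / "memory.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats
```

Only callers that need exact numbers pay the flush cost; everyone else
keeps reading memory.stat as before.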

Thanks,
Leon
