From: Chen Ridong <chenridong@huaweicloud.com>
To: Tiffany Yang <ynaffit@google.com>
Cc: linux-kernel@vger.kernel.org, "John Stultz" <jstultz@google.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Stephen Boyd" <sboyd@kernel.org>,
"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
"Frederic Weisbecker" <frederic@kernel.org>,
"Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"Pavel Machek" <pavel@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Chen Ridong" <chenridong@huawei.com>,
kernel-team@android.com, "Jonathan Corbet" <corbet@lwn.net>,
"Shuah Khan" <shuah@kernel.org>,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
Date: Sat, 23 Aug 2025 09:45:26 +0800
Message-ID: <1b6498f3-ca07-41d5-9637-f20a58184e60@huaweicloud.com>
In-Reply-To: <dbx8ms7r885f.fsf@ynaffit-andsys.c.googlers.com>
On 2025/8/23 3:32, Tiffany Yang wrote:
> Hi Chen,
>
> Thanks again for taking a look!
>
> Chen Ridong <chenridong@huaweicloud.com> writes:
>
>> On 2025/8/22 14:14, Chen Ridong wrote:
>
>
>>> On 2025/8/22 9:37, Tiffany Yang wrote:
>>>> There isn't yet a clear way to identify a set of "lost" time that
>>>> everyone (or at least a wider group of users) cares about. However,
>>>> users can perform some delay accounting by iterating over components of
>>>> interest. This patch allows cgroup v2 freezing time to be one of those
>>>> components.
>
>>>> Track the cumulative time that each v2 cgroup spends freezing and expose
>>>> it to userland via a new local stat file in cgroupfs. Thank you to
>>>> Michal, who provided the ASCII art in the updated documentation.
>
>>>> To access this value:
>>>> $ mkdir /sys/fs/cgroup/test
>>>> $ cat /sys/fs/cgroup/test/cgroup.stat.local
>>>> freeze_time_total 0
>
>>>> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
>>>> sequence counter. Writes are serialized using the css_set_lock.
>
> ...
>
>>>> spin_lock_irq(&css_set_lock);
>>>> - if (freeze)
>>>> + write_seqcount_begin(&cgrp->freezer.freeze_seq);
>>>> + if (freeze) {
>>>> set_bit(CGRP_FREEZE, &cgrp->flags);
>>>> - else
>>>> + cgrp->freezer.freeze_start_nsec = ts_nsec;
>>>> + } else {
>>>> clear_bit(CGRP_FREEZE, &cgrp->flags);
>>>> + cgrp->freezer.frozen_nsec += (ts_nsec -
>>>> + cgrp->freezer.freeze_start_nsec);
>>>> + }
>>>> + write_seqcount_end(&cgrp->freezer.freeze_seq);
>>>> spin_unlock_irq(&css_set_lock);
>
>
>>> Hello Tiffany,
>
>>> I wanted to check if there are any specific considerations regarding how we should input the ts_nsec
>>> value.
>
>>> Would it be possible to define this directly within the cgroup_do_freeze function rather than
>>> passing it as a parameter? This approach might simplify the implementation and potentially improve
>>> timing accuracy when the cgroup has lots of descendants.
>
>
>> I revisited v3, and this was Michal's point.
>> p
>> / | \
>> 1 ... n
>> When we freeze the parent group p, is it expected that all descendant cgroups (1 to n) should share
>> the same frozen timestamp?
>
>
> Yes, this is the expectation from the current change. I understand your
> concern about the accuracy of this measurement (especially when there
> are many descendants), but I agree with Michal's point that the time to
> traverse the descendant cgroups is basically noise relative to the
> quantity we're trying to measure here.
>
>> If the cgroup tree structure is stable, the exact frozen time may not really matter. However, if
>> the tree is not stable, is obtaining the same frozen time acceptable?
>
> I'm a little unclear as to what you mean about when the cgroup tree is
> unstable. In the case where a new descendant of p is being created, I
> believe the cgroup_mutex prevents that from happening at the same time
> as we are freezing p's other descendants. If it won the race, was
> created unfrozen under p, and then became frozen during cgroup_freeze,
> it would have the same timestamp as the other descendants. If it lost
> the race and was created as a frozen cgroup under p, it would get its
> own timestamp in cgroup_create, so its freezing duration would be
> slightly less than that of the others in the hierarchy. Both values
> would be acceptable for our purposes, but if there was a different case
> you had in mind, please let me know!
>
> Thanks,
By a tree that is not "stable", I mean one where cgroups 1 through n might be deleted, or might have
more descendants created. For example:
          n     n-1   n-2   ...   1
frozen    a     a+1   a+2   ...   a+n
unfrozen  b     b+1   b+2   ...   b+n
nsec      b-a   ...
In this case, all frozen_nsec values are b - a, which I believe is correct.
However, consider a scenario where some cgroups are deleted:
          n     n-1   n-2   ...   1
frozen    a     a+1   a+2   ...   a+n
// 2 ... n-1 are deleted.
unfrozen  b                       b+1
Here, the frozen_nsec for cgroup n would be b - a, but for cgroup 1 it would be (b + 1) - (a + n).
This could introduce some discrepancy / timing inaccuracies.
--
Best regards,
Ridong