* [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
From: Tiffany Yang @ 2025-07-14 5:00 UTC (permalink / raw)
To: linux-kernel
Cc: John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Tejun Heo, Johannes Weiner,
Michal Koutný, Rafael J. Wysocki, Pavel Machek,
Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
cgroups, linux-doc
The cgroup v2 freezer allows userspace to dynamically move processes
into and out of an interruptible frozen state. This feature is helpful
for application management, as a background app can be frozen to
prevent its threads from being scheduled or otherwise contending with
foreground tasks for resources. However, the application is usually
unaware that it was frozen, which can disrupt any internal monitoring
that it performs.
As an example, an application may implement a watchdog thread for one of
its high priority maintenance tasks that operates by checking some state
of that task at a set interval to ensure it has made progress. The key
challenge here is that the task is only expected to make progress when
the application it belongs to has the opportunity to run, but there's no
application-relative time to set the watchdog timer against. Instead,
the next timeout is set relative to system time, using an approximation
that assumes the application will continue to be scheduled as
normal. If the task misses that approximate deadline because the
application was frozen, the watchdog has no way to know that and may
kill the healthy process.
Other sources of delay can cause similar issues, but this change focuses
on allowing frozen time to be accounted for in particular because of how
large it can grow and how unevenly it can affect applications running on
the system. To allow an application to better account for the time it
spends running, I propose tracking the time each cgroup spends freezing
and exposing it to userland via a new core interface file in
cgroupfs (cgroup.freeze.stat). I used this naming because utility
controllers like "kill" and "freeze" are exposed as cgroup v2 core
interface files, but I'm happy to change it if there's a convention
others would prefer!
Currently, the cgroup css_set_lock is used to serialize accesses to the
CGRP_FREEZE bit of cgrp->flags and the new cgroup_freezer_state counters
(freeze_time_start_ns and freeze_time_total_ns). If we start to see
higher contention on this lock, we may want to introduce a v2 freezer
state-specific lock to avoid having to take the global lock every time
a cgroup.freeze.stat file is read.
Any feedback would be much appreciated!
Thank you,
Tiffany
Signed-off-by: Tiffany Yang <ynaffit@google.com>
---
v2:
* Track per-cgroup freezing time instead of per-task frozen time as
suggested by Tejun Heo
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
---
Documentation/admin-guide/cgroup-v2.rst | 8 ++++++++
include/linux/cgroup-defs.h | 6 ++++++
kernel/cgroup/cgroup.c | 24 ++++++++++++++++++++++++
kernel/cgroup/freezer.c | 8 ++++++--
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index bd98ea3175ec..9fbf3a959bdf 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
it's possible to delete a frozen (and empty) cgroup, as well as
create new sub-cgroups.
+ cgroup.freeze.stat
+ A read-only flat-keyed file which exists in non-root cgroups.
+ The following entry is defined:
+
+ freeze_time_total_ns
+ Cumulative time that this cgroup has spent in the freezing
+ state, regardless of whether or not it reaches "frozen".
+
cgroup.kill
A write-only single value file which exists in non-root cgroups.
The only allowed value is "1".
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index e61687d5e496..86332d83fa22 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -436,6 +436,12 @@ struct cgroup_freezer_state {
* frozen, SIGSTOPped, and PTRACEd.
*/
int nr_frozen_tasks;
+
+ /* Time when the cgroup was requested to freeze */
+ u64 freeze_time_start_ns;
+
+ /* Total duration the cgroup has spent freezing */
+ u64 freeze_time_total_ns;
};
struct cgroup {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a723b7dc6e4e..1f54d16a8713 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4050,6 +4050,23 @@ static ssize_t cgroup_freeze_write(struct kernfs_open_file *of,
return nbytes;
}
+static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
+{
+ struct cgroup *cgrp = seq_css(seq)->cgroup;
+ u64 freeze_time = 0;
+
+ spin_lock_irq(&css_set_lock);
+ if (test_bit(CGRP_FREEZE, &cgrp->flags))
+ freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
+
+ freeze_time += cgrp->freezer.freeze_time_total_ns;
+ spin_unlock_irq(&css_set_lock);
+
+ seq_printf(seq, "freeze_time_total_ns %llu\n", freeze_time);
+
+ return 0;
+}
+
static void __cgroup_kill(struct cgroup *cgrp)
{
struct css_task_iter it;
@@ -5355,6 +5372,11 @@ static struct cftype cgroup_base_files[] = {
.seq_show = cgroup_freeze_show,
.write = cgroup_freeze_write,
},
+ {
+ .name = "cgroup.freeze.stat",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = cgroup_freeze_stat_show,
+ },
{
.name = "cgroup.kill",
.flags = CFTYPE_NOT_ON_ROOT,
@@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
* if the parent has to be frozen, the child has too.
*/
cgrp->freezer.e_freeze = parent->freezer.e_freeze;
+ cgrp->freezer.freeze_time_total_ns = 0;
if (cgrp->freezer.e_freeze) {
/*
* Set the CGRP_FREEZE flag, so when a process will be
@@ -5766,6 +5789,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
* consider it frozen immediately.
*/
set_bit(CGRP_FREEZE, &cgrp->flags);
+ cgrp->freezer.freeze_time_start_ns = ktime_get_ns();
set_bit(CGRP_FROZEN, &cgrp->flags);
}
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index bf1690a167dd..6f3fab252140 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -179,10 +179,14 @@ static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
lockdep_assert_held(&cgroup_mutex);
spin_lock_irq(&css_set_lock);
- if (freeze)
+ if (freeze) {
set_bit(CGRP_FREEZE, &cgrp->flags);
- else
+ cgrp->freezer.freeze_time_start_ns = ktime_get_ns();
+ } else {
clear_bit(CGRP_FREEZE, &cgrp->flags);
+ cgrp->freezer.freeze_time_total_ns += (ktime_get_ns() -
+ cgrp->freezer.freeze_time_start_ns);
+ }
spin_unlock_irq(&css_set_lock);
if (freeze)
--
2.50.0.727.gbf7dc18ff4-goog
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
From: Michal Koutný @ 2025-07-17 12:56 UTC (permalink / raw)
To: Tiffany Yang
Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
Johannes Weiner, Rafael J. Wysocki, Pavel Machek, Roman Gushchin,
Chen Ridong, kernel-team, Jonathan Corbet, cgroups, linux-doc
Hello Tiffany.
On Sun, Jul 13, 2025 at 10:00:09PM -0700, Tiffany Yang <ynaffit@google.com> wrote:
> Other sources of delay can cause similar issues, but this change focuses
> on allowing frozen time to be accounted for in particular because of how
> large it can grow and how unevenly it can affect applications running on
> the system.
I'd like to incorporate the reason from your other mail:
| Since there isn't yet a clear way to identify a set of "lost" time
| that everyone (or at least a wider group of users) cares about, it
| seems like iterating over components of interest is the best way
into this commit message (because that's a stronger point than your use
case alone).
> Any feedback would be much appreciated!
I can see benefits of this new stat field conceptually, I have some
remarks to implementation and suggestions to conventions below.
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
> it's possible to delete a frozen (and empty) cgroup, as well as
> create new sub-cgroups.
>
> + cgroup.freeze.stat
With the given implementation (and use scenario), this'd better be
exposed in
cgroup.freeze.stat.local
I grok the hierarchical summing would make little sense and it'd make
the implementation more complex. With that I'm thinking about the formulation:
Cumulative time that the cgroup has spent between freezing and
thawing, regardless of whether by self or ancestor cgroups. NB
(not) reaching "frozen" state is not accounted here.
> + A read-only flat-keyed file which exists in non-root cgroups.
> + The following entry is defined:
> +
> + freeze_time_total_ns
> + Cumulative time that this cgroup has spent in the freezing
> + state, regardless of whether or not it reaches "frozen".
> +
Rather use microseconds, it's the cgroup API convention and I'm not
sure nanoseconds exposed here are the needed precision.

         1      _____
frozen   0 ____/     \____
               ab    cd

Yeah, I find the measurement between a and c the sanest.
> +static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
> +{
> + struct cgroup *cgrp = seq_css(seq)->cgroup;
> + u64 freeze_time = 0;
> +
> + spin_lock_irq(&css_set_lock);
> + if (test_bit(CGRP_FREEZE, &cgrp->flags))
> + freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
> +
> + freeze_time += cgrp->freezer.freeze_time_total_ns;
> + spin_unlock_irq(&css_set_lock);
I don't like taking this spinlock only for the matter of reading this
attribute. The intention should be to keep the (un)freezing mostly
unaffected at the expense of these readers (seqcount or u64 stats?).
Alternative approach: either there's an outer watcher who can be notified
by cgroup.events:frozen or it's an inner watcher who couldn't actively
read the field anyway. So the field could only show completed
freeze/thaw cycles from the past (i.e. not substitute clock_gettime(2)
when the cgroup is frozen), which could simplify querying the flag too.
> @@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
> * if the parent has to be frozen, the child has too.
> */
> cgrp->freezer.e_freeze = parent->freezer.e_freeze;
> + cgrp->freezer.freeze_time_total_ns = 0;
struct cgroup is kzalloc'd, this is unnecessary
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
From: Chen Ridong @ 2025-07-17 13:52 UTC (permalink / raw)
To: Michal Koutný, Tiffany Yang
Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
Johannes Weiner, Rafael J. Wysocki, Pavel Machek, Roman Gushchin,
Chen Ridong, kernel-team, Jonathan Corbet, cgroups, linux-doc
On 2025/7/17 20:56, Michal Koutný wrote:
> Hello Tiffany.
>
> On Sun, Jul 13, 2025 at 10:00:09PM -0700, Tiffany Yang <ynaffit@google.com> wrote:
>
>> Other sources of delay can cause similar issues, but this change focuses
>> on allowing frozen time to be accounted for in particular because of how
>> large it can grow and how unevenly it can affect applications running on
>> the system.
>
> I'd like to incorporate the reason from your other mail:
> | Since there isn't yet a clear way to identify a set of "lost" time
> | that everyone (or at least a wider group of users) cares about, it
> | seems like iterating over components of interest is the best way
> into this commit message (because that's a stronger point than your use
> case alone).
>
>
>> Any feedback would be much appreciated!
>
> I can see benefits of this new stat field conceptually, I have some
> remarks to implementation and suggestions to conventions below.
>
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
>> it's possible to delete a frozen (and empty) cgroup, as well as
>> create new sub-cgroups.
>>
>> + cgroup.freeze.stat
>
> With the given implementation (and use scenario), this'd better be
> exposed in
> cgroup.freeze.stat.local
>
Would it be possible to add this field to either cgroup.events or cgroup.stat?
Since the frozen status is already tracked in cgroup.events, this placement
would maintain better cohesion with existing metrics.
This is just a suggestion.
Best regards,
Ridong
> I grok the hierarchical summing would make little sense and it'd make
> the implementation more complex. With that I'm thinking about the formulation:
>
> Cumulative time that the cgroup has spent between freezing and
> thawing, regardless of whether by self or ancestor cgroups. NB
> (not) reaching "frozen" state is not accounted here.
>
>> + A read-only flat-keyed file which exists in non-root cgroups.
>> + The following entry is defined:
>> +
>> + freeze_time_total_ns
>> + Cumulative time that this cgroup has spent in the freezing
>> + state, regardless of whether or not it reaches "frozen".
>> +
>
> Rather use microseconds, it's the cgroup API convention and I'm not
> sure nanoseconds exposed here are the needed precision.
>
>          1      _____
> frozen   0 ____/     \____
>                ab    cd
>
> Yeah, I find the measurement between a and c the sanest.
>
>
>> +static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
>> +{
>> + struct cgroup *cgrp = seq_css(seq)->cgroup;
>> + u64 freeze_time = 0;
>> +
>> + spin_lock_irq(&css_set_lock);
>> + if (test_bit(CGRP_FREEZE, &cgrp->flags))
>> + freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
>> +
>> + freeze_time += cgrp->freezer.freeze_time_total_ns;
>> + spin_unlock_irq(&css_set_lock);
>
> I don't like taking this spinlock only for the matter of reading this
> attribute. The intention should be to keep the (un)freezing mostly
> unaffected at the expense of these readers (seqcount or u64 stats?).
>
> Alternative approach: either there's an outer watcher who can be notified
> by cgroup.events:frozen or it's an inner watcher who couldn't actively
> read the field anyway. So the field could only show completed
> freeze/thaw cycles from the past (i.e. not substitute clock_gettime(2)
> when the cgroup is frozen), which could simplify querying the flag too.
>
>> @@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>> * if the parent has to be frozen, the child has too.
>> */
>> cgrp->freezer.e_freeze = parent->freezer.e_freeze;
>> + cgrp->freezer.freeze_time_total_ns = 0;
>
> struct cgroup is kzalloc'd, this is unnecessary
>
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
From: Tejun Heo @ 2025-07-17 17:05 UTC (permalink / raw)
To: Michal Koutný
Cc: Tiffany Yang, linux-kernel, John Stultz, Thomas Gleixner,
Stephen Boyd, Anna-Maria Behnsen, Frederic Weisbecker,
Johannes Weiner, Rafael J. Wysocki, Pavel Machek, Roman Gushchin,
Chen Ridong, kernel-team, Jonathan Corbet, cgroups, linux-doc
Hello,
On Thu, Jul 17, 2025 at 02:56:13PM +0200, Michal Koutný wrote:
...
> > + cgroup.freeze.stat
>
> With the given implementation (and use scenario), this'd better be
> exposed in
> cgroup.freeze.stat.local
>
> I grok the hierarchical summing would make little sense and it'd make
> the implementation more complex. With that I'm thinking about the formulation:
>
> Cumulative time that the cgroup has spent between freezing and
> thawing, regardless of whether by self or ancestor cgroups. NB
> (not) reaching "frozen" state is not accounted here.
I wonder what hierarchical summing would look like for this. It's an
absolute time interval measurement and I'm not sure whether summing up
the descendants' durations is the best way to go about it. I.e., should
it be the total duration during which any of the descendants are
freezing, or should it be the sum of the freezing durations of all
descendants?
Thanks.
--
tejun
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
From: Tejun Heo @ 2025-07-17 17:06 UTC (permalink / raw)
To: Chen Ridong
Cc: Michal Koutný, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On Thu, Jul 17, 2025 at 09:52:38PM +0800, Chen Ridong wrote:
> > With the given implementation (and use scenario), this'd better exposed
> > in
> > cgroup.freeze.stat.local
> >
>
> > Would it be possible to add this field to either cgroup.events or cgroup.stat?
> > Since the frozen status is already tracked in cgroup.events, this placement would maintain better
> > cohesion with existing metrics.
>
> This is just a suggestion.
Yeah, given that the freezer is an integral part of cgroup core, using
cgroup.stat[.local] probably makes more sense.
Thanks.
--
tejun
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
From: Michal Koutný @ 2025-07-18 8:20 UTC (permalink / raw)
To: Tejun Heo
Cc: Tiffany Yang, linux-kernel, John Stultz, Thomas Gleixner,
Stephen Boyd, Anna-Maria Behnsen, Frederic Weisbecker,
Johannes Weiner, Rafael J. Wysocki, Pavel Machek, Roman Gushchin,
Chen Ridong, kernel-team, Jonathan Corbet, cgroups, linux-doc
On Thu, Jul 17, 2025 at 07:05:14AM -1000, Tejun Heo <tj@kernel.org> wrote:
> I wonder what hierarchical summing would look like for this.
So do I.
Thus I meant to expose this only in a *.local file not the hierarchical
one.
But I realize it should [1] match cpu.stat[.local]:throttled_usec
since they're similar quantities in principle.
- cpu.stat:throttled_usec
  - sums the time the cgroup's quota was in effect
  - not hierarchical (:-/)
- cpu.stat.local:throttled_usec
  - not hierarchical
  - sums the time the cgroup's or an ancestor's quota was in effect
  -> IIUC this is what's the motivation of the original patch
HTH,
Michal
[1] I'd find it more logical if
cpu.stat:throttled_usec were cpu.stat.local:throttling_usec and
cpu.stat.local:throttled_usec were cpu.stat.local:throttled_usec.
Only to illustrate my understanding of hierarchy in cpu.stat, it doesn't
matter since it's what it is now.
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
From: Chen Ridong @ 2025-07-18 9:26 UTC (permalink / raw)
To: Michal Koutný, Tejun Heo
Cc: Tiffany Yang, linux-kernel, John Stultz, Thomas Gleixner,
Stephen Boyd, Anna-Maria Behnsen, Frederic Weisbecker,
Johannes Weiner, Rafael J. Wysocki, Pavel Machek, Roman Gushchin,
Chen Ridong, kernel-team, Jonathan Corbet, cgroups, linux-doc
On 2025/7/18 16:20, Michal Koutný wrote:
> On Thu, Jul 17, 2025 at 07:05:14AM -1000, Tejun Heo <tj@kernel.org> wrote:
>> I wonder what hierarchical summing would look like for this.
>
> So do I.
> Thus I meant to expose this only in a *.local file not the hierarchical
> one.
>
> But I realize it should [1] match cpu.stat[.local]:throttled_usec
> since they're similar quantities in principle.
> - cpu.stat:throttled_usec
>   - sums the time the cgroup's quota was in effect
>   - not hierarchical (:-/)
> - cpu.stat.local:throttled_usec
>   - not hierarchical
>   - sums the time the cgroup's or an ancestor's quota was in effect
>   -> IIUC this is what's the motivation of the original patch
>
> HTH,
> Michal
>
> [1] I'd find it more logical if
> cpu.stat:throttled_usec were cpu.stat.local:throttling_usec and
> cpu.stat.local:throttled_usec were cpu.stat.local:throttled_usec.
> Only to illustrate my understanding of hierarchy in cpu.stat, it doesn't
> matter since it's what it is now.
Hi Michal and TJ,
I'd like to raise a separate thought unrelated to the current discussion. :)
With the recent merge of the series "cgroup: separate rstat trees," rstat is
no longer bound to the CPU subsystem. This makes me wonder: should we consider
moving the cpu.stat and cpu.stat.local interfaces to the CPU subsystem?
The CPU subsystem could then align more closely with other resource controllers like memory or I/O
subsystems. By decoupling these CPU-specific statistics from the cgroup core, it could help keep
both cgroup and rstat implementations more focused.
Is there any particular reason why the CPU subsystem must remain bound to the cgroup core?
Looking forward to your insights.
Best regards,
Ridong
* cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
From: Michal Koutný @ 2025-07-18 13:58 UTC (permalink / raw)
To: Chen Ridong
Cc: Tejun Heo, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On Fri, Jul 18, 2025 at 05:26:54PM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
> With the recent merge of the series "cgroup: separate rstat trees," rstat is no longer bound to the
> CPU subsystem. This makes me wonder: should we consider moving the cpu.stat and cpu.stat.local
> interfaces to the CPU subsystem?
Note that fields printed in cpu.stat are a combination of "core" and cpu
controller values.
> The CPU subsystem could then align more closely with other resource controllers like memory or I/O
> subsystems. By decoupling these CPU-specific statistics from the cgroup core, it could help keep
> both cgroup and rstat implementations more focused.
In my eyes, cpu controller is stuff encapsulated by cpu_cgrp_subsys. I'm
not sure I understand what you refer to as the CPU subsystem.
One thing is how it is presented to users (filenames and content)
another one is how it is implemented. The latter surely can be
refactored but it's not obvious to me from the short description, sorry.
> Is there any particular reason why the CPU subsystem must remain bound
> to the cgroup core?
The stuff that's bound to the core is essentially not "control" but only
accounting, so with this association, the accounting can have fine
granularity while control (which incurs higher overhead in principle)
may remain coarse. I find it thus quite fitting that CPU stats build on
top of rstat.
(Naturally, my previous claim about overhead is only rough and it's the
reason for existence of adjustments like in the commit 34f26a15611af
("sched/psi: Per-cgroup PSI accounting disable/re-enable interface").)
Thats how I see it, happy to discuss possible problems you see with
this.
Michal
* Re: cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
From: Chen Ridong @ 2025-07-19 2:01 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On 2025/7/18 21:58, Michal Koutný wrote:
> On Fri, Jul 18, 2025 at 05:26:54PM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> With the recent merge of the series "cgroup: separate rstat trees," rstat is no longer bound to the
>> CPU subsystem. This makes me wonder: should we consider moving the cpu.stat and cpu.stat.local
>> interfaces to the CPU subsystem?
>
> Note that fields printed in cpu.stat are a combination of "core" and cpu
> controller values.
>
Do you mean the "core" values are the ones shown below:
- usage_usec
- user_usec
- system_usec
- nice_usec
In the legacy cgroup, these values are in the cpuacct subsystem.
>> The CPU subsystem could then align more closely with other resource controllers like memory or I/O
>> subsystems. By decoupling these CPU-specific statistics from the cgroup core, it could help keep
>> both cgroup and rstat implementations more focused.
>
> In my eyes, cpu controller is stuff encapsulated by cpu_cgrp_subsys. I'm
> not sure I understand what you refer to as the CPU subsystem.
>
> One thing is how it is presented to users (filenames and content)
> another one is how it is implemented. The latter surely can be
> refactored but it's not obvious to me from the short description, sorry.
>
What I'm considering is moving the implementation of cpu.stat from cgroup_base_files to
cpu_cgrp_subsys—without changing the user-facing interface (filenames and content remain the same).
However, the interface would only appear if the CPU subsystem is enabled.
Currently, cpu.stat and cpu.stat.local are visible in every cgroup, even when the CPU subsystem is
disabled. The only populated fields in such cases are:
- usage_usec
- user_usec
- system_usec
- nice_usec
I’m unsure whether this change would be acceptable?
>> Is there any particular reason why the CPU subsystem must remain bound
>> to the cgroup core?
>
> The stuff that's bound to the core is essentially not "control" but only
> accounting, so with this association, the accounting can have fine
> granularity while control (which incurs higher overhead in principle)
> may remain coarse. I find it thus quite fitting that CPU stats build on
> top of rstat.
The implementation would still rely on rstat, similar to memory.stat and io.stat. The goal is to
decouple it from the cgroup core (cgroup.c and rstat.c) while preserving accounting granularity.
Best regards,
Ridong
> (Naturally, my previous claim about overhead is only rough and it's the
> reason for existence of adjustments like in the commit 34f26a15611af
> ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface").)
>
> Thats how I see it, happy to discuss possible problems you see with
> this.
>
> Michal
* Re: cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
From: Tejun Heo @ 2025-07-19 16:27 UTC (permalink / raw)
To: Chen Ridong
Cc: Michal Koutný, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On Sat, Jul 19, 2025 at 10:01:07AM +0800, Chen Ridong wrote:
...
> What I'm considering is moving the implementation of cpu.stat from cgroup_base_files to
> cpu_cgrp_subsys—without changing the user-facing interface (filenames and content remain the same).
> However, the interface would only appear if the CPU subsystem is enabled.
>
> Currently, cpu.stat and cpu.stat.local are visible in every cgroup, even when the CPU subsystem is
> disabled. The only populated fields in such cases are:
>
> - usage_usec
> - user_usec
> - system_usec
> - nice_usec
>
> I’m unsure whether this change would be acceptable?
I don't think so and don't really see what benefits moving the stats would
bring. Why would we move these?
Thanks.
--
tejun
* Re: cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
From: Chen Ridong @ 2025-07-22 9:01 UTC (permalink / raw)
To: Tejun Heo
Cc: Michal Koutný, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On 2025/7/20 0:27, Tejun Heo wrote:
> On Sat, Jul 19, 2025 at 10:01:07AM +0800, Chen Ridong wrote:
> ...
>> What I'm considering is moving the implementation of cpu.stat from cgroup_base_files to
>> cpu_cgrp_subsys—without changing the user-facing interface (filenames and content remain the same).
>> However, the interface would only appear if the CPU subsystem is enabled.
>>
>> Currently, cpu.stat and cpu.stat.local are visible in every cgroup, even when the CPU subsystem is
>> disabled. The only populated fields in such cases are:
>>
>> - usage_usec
>> - user_usec
>> - system_usec
>> - nice_usec
>>
>> I’m unsure whether this change would be acceptable?
>
> I don't think so and don't really see what benefits moving the stats would
> bring. Why would we move these?
>
> Thanks.
>
Thank you for your attention. My intention is to better modularize the cgroup code by moving CPU
subsystem-specific statistics out of the core cgroup implementation (cgroup.c and rstat.c).
Specifically, this change would allow us to:
1. Remove these CPU-specific callbacks from the core:
   css_extra_stat_show()
   css_local_stat_show()
2. Clean up the 'is_self' logic in rstat.c.
3. Make the stat handling consistent across subsystems (currently cpu.stat is
   the only subsystem-specific stat implemented in the core).
Best regards,
Ridong.
* Re: cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
From: Michal Koutný @ 2025-07-22 11:54 UTC (permalink / raw)
To: Chen Ridong
Cc: Tejun Heo, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On Tue, Jul 22, 2025 at 05:01:50PM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
> Specifically, this change would allow us to:
>
> 1. Remove these CPU-specific callbacks from the core:
> css_extra_stat_show()
> css_local_stat_show()
> 2. Clean up the 'is_self' logic in rstat.c.
If you see an option to organize the code better, why not. (At the same
> time, I currently also don't see the "why".)
> 3. Make the stat handling consistent across subsystems (currently cpu.stat is the only
> subsystem-specific stat implemented in the core).
But beware that the possibility of having cpu.stat without enabling the
cpu controller on v2 is a user visible behavior and I'm quite sure some
userspace relies on it, so you'd need to preserve that.
Michal
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
2025-07-17 12:56 ` Michal Koutný
2025-07-17 13:52 ` Chen Ridong
2025-07-17 17:05 ` Tejun Heo
@ 2025-07-22 22:16 ` Tiffany Yang
2 siblings, 0 replies; 18+ messages in thread
From: Tiffany Yang @ 2025-07-22 22:16 UTC (permalink / raw)
To: Michal Koutný
Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
Johannes Weiner, Rafael J. Wysocki, Pavel Machek, Roman Gushchin,
Chen Ridong, kernel-team, Jonathan Corbet, cgroups, linux-doc
Michal Koutný <mkoutny@suse.com> writes:
> I'd like to incorporate the reason from your other mail:
> | Since there isn't yet a clear way to identify a set of "lost" time
> | that everyone (or at least a wider group of users) cares about, it
> | seems like iterating over components of interest is the best way
> into this commit message (because that's a stronger point than your use
> case alone).
>> Any feedback would be much appreciated!
> I can see benefits of this new stat field conceptually, I have some
> remarks to implementation and suggestions to conventions below.
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
>> it's possible to delete a frozen (and empty) cgroup, as well as
>> create new sub-cgroups.
>> + cgroup.freeze.stat
> With the given implementation (and use scenario), this'd better be exposed
> in
> cgroup.freeze.stat.local
> I grok the hierarchical summing would make little sense and it'd make
> implementation more complex. With that I'm thinking about the formulation:
> Cumulative time that cgroup has spent between freezing and
> thawing, regardless of whether by self or ancestor cgroups. NB
> (not) reaching "frozen" state is not accounted here.
>> + A read-only flat-keyed file which exists in non-root cgroups.
>> + The following entry is defined:
>> +
>> + freeze_time_total_ns
>> + Cumulative time that this cgroup has spent in the freezing
>> + state, regardless of whether or not it reaches "frozen".
>> +
> Rather use microseconds, it's the cgroup API convention and I'm not
> sure nanoseconds exposed here are the needed precision.
Ack.
>            1       _____
>   frozen   0 _____/     \_____
>                  a b    c d
> Yeah, I find the measurement between a and c the sanest.
>> +static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
>> +{
>> + struct cgroup *cgrp = seq_css(seq)->cgroup;
>> + u64 freeze_time = 0;
>> +
>> + spin_lock_irq(&css_set_lock);
>> + if (test_bit(CGRP_FREEZE, &cgrp->flags))
>> + freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
>> +
>> + freeze_time += cgrp->freezer.freeze_time_total_ns;
>> + spin_unlock_irq(&css_set_lock);
> I don't like taking this spinlock only for the matter of reading this
> attribute. The intention should be to keep the (un)freezeing mostly
> unaffected at the expense of these readers (seqcount or u64 stats?).
Ah, thank you for this suggestion! I noticed that none of the other
seq_file read implementations took a lock, so I thought this might be a
point of contention. I'll try a seqlock in the next version of the
patch.
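To make the suggestion concrete, below is a userspace C sketch of the seqcount-style read retry loop Michal is hinting at. The variable and function names mirror the patch but are hypothetical; actual kernel code would use seqcount_t with read_seqcount_begin()/read_seqcount_retry() (or a seqlock_t) rather than raw C11 atomics.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Writers bump the sequence to an odd value while updating and back to
 * even when done; readers retry until they observe the same even value
 * before and after reading. This keeps the (un)freeze path cheap at the
 * expense of the (rare) stat readers, as Michal suggests.
 */
static atomic_uint freezer_seq;        /* even = stable, odd = write in flight */
static uint64_t freeze_time_total_ns;  /* sum of completed freeze periods */
static uint64_t freeze_time_start_ns;  /* start of current period, if frozen */
static int frozen;

static void freezer_write_begin(void)
{
	atomic_fetch_add_explicit(&freezer_seq, 1, memory_order_acq_rel); /* -> odd */
}

static void freezer_write_end(void)
{
	atomic_fetch_add_explicit(&freezer_seq, 1, memory_order_acq_rel); /* -> even */
}

static uint64_t read_freeze_time_ns(uint64_t now_ns)
{
	uint64_t total;
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&freezer_seq, memory_order_acquire);
		if (seq & 1)
			continue;	/* writer in flight, retry */
		total = freeze_time_total_ns;
		if (frozen)
			total += now_ns - freeze_time_start_ns;
	} while (atomic_load_explicit(&freezer_seq, memory_order_acquire) != seq);

	return total;
}
```

The read side never blocks the freeze/thaw path; it only loops if it races with an update, which replaces the css_set_lock critical section in the quoted cgroup_freeze_stat_show().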
> Alternative approach: either there's outer watcher who can be notified
> by cgroup.events:frozen or it's an inner watcher who couldn't actively
> read the field anyway. So the field could only show completed
> freeze/thaw cycles from the past (i.e. not substitute clock_gettime(2)
> when the cgroup is frozen), which could simplify querying the flag too.
This is a good observation. This approach does simplify things, but
even though it would work for our use case, I feel like this value
would be less useful for the outer watcher case, especially in the case
where the cgroup never reaches the frozen state.
>> @@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup
>> *parent, const char *name,
>> * if the parent has to be frozen, the child has too.
>> */
>> cgrp->freezer.e_freeze = parent->freezer.e_freeze;
>> + cgrp->freezer.freeze_time_total_ns = 0;
> struct cgroup is kzalloc'd, this is unnecessary
Thank you for all your feedback! I'll make sure to incorporate these
suggestions into the next version.
--
Tiffany Y. Yang
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
2025-07-17 13:52 ` Chen Ridong
2025-07-17 17:06 ` Tejun Heo
@ 2025-07-22 22:27 ` Tiffany Yang
1 sibling, 0 replies; 18+ messages in thread
From: Tiffany Yang @ 2025-07-22 22:27 UTC (permalink / raw)
To: Chen Ridong
Cc: Michal Koutný, linux-kernel, John Stultz, Thomas Gleixner,
Stephen Boyd, Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
Johannes Weiner, Rafael J. Wysocki, Pavel Machek, Roman Gushchin,
Chen Ridong, kernel-team, Jonathan Corbet, cgroups, linux-doc
Hi Ridong,
Chen Ridong <chenridong@huaweicloud.com> writes:
> On 2025/7/17 20:56, Michal Koutný wrote:
>> Hello Tiffany.
>> On Sun, Jul 13, 2025 at 10:00:09PM -0700, Tiffany Yang
>> <ynaffit@google.com> wrote:
>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>> @@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
>>> it's possible to delete a frozen (and empty) cgroup, as well as
>>> create new sub-cgroups.
>>> + cgroup.freeze.stat
>> With the given implementation (and use scenario), this'd better be exposed
>> in
>> cgroup.freeze.stat.local
> Would it be possible to add this field to either cgroup.event or
> cgroup.stat?
> Since the frozen status is already tracked in cgroup.event, this
> placement would maintain better
> cohesion with existing metrics.
> This is just a suggestion.
> Best regards,
> Ridong
Thanks for taking a look!
I don't think this would *quite* fit in cgroup.event because we're
measuring when the cgroup begins freezing instead of when it reaches the
frozen state. I also worry that having the value so close to
cgroup.frozen would cause additional confusion about its
meaning. cgroup.stat seems reasonable, but the values inside appear to
be accounted for hierarchically, which wouldn't suit our use case.
--
Tiffany Y. Yang
* Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer
2025-07-17 17:06 ` Tejun Heo
@ 2025-07-22 22:41 ` Tiffany Yang
0 siblings, 0 replies; 18+ messages in thread
From: Tiffany Yang @ 2025-07-22 22:41 UTC (permalink / raw)
To: Tejun Heo
Cc: Chen Ridong, Michal Koutný, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
Tejun Heo <tj@kernel.org> writes:
> On Thu, Jul 17, 2025 at 09:52:38PM +0800, Chen Ridong wrote:
>> > With the given implementation (and use scenario), this'd better be exposed
>> > in
>> > cgroup.freeze.stat.local
>> >
>> Would it be possible to add this field to either cgroup.event or
>> cgroup.stat?
>> Since the frozen status is already tracked in cgroup.event, this
>> placement would maintain better
>> cohesion with existing metrics.
>> This is just a suggestion.
> Yeah, given that the freezer is an integral part of cgroup core, using
> cgroup.stat[.local] probably makes more sense.
> Thanks.
One of the reasons I avoided cgroup.stat was because I interpreted its
purpose to be for exposing cgroup metadata (i.e., descendants and
descendants per subsystem), and I didn't think this value fit in neatly.
It doesn't seem like there currently exists a cgroup.stat.local, but if
that is the preferred location for this accounting, I could create one
and print it there!
--
Tiffany Y. Yang
* Re: cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
2025-07-22 11:54 ` Michal Koutný
@ 2025-07-23 1:28 ` Chen Ridong
2025-07-25 1:08 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Chen Ridong @ 2025-07-23 1:28 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On 2025/7/22 19:54, Michal Koutný wrote:
> On Tue, Jul 22, 2025 at 05:01:50PM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> Specifically, this change would allow us to:
>>
>> 1. Remove these CPU-specific callbacks from the core:
>> css_extra_stat_show()
>> css_local_stat_show()
>> 2. Clean up the 'is_self' logic in rstat.c.
>
> If you see an option to organize the code better, why not. (At the same
> time, I currently also don't see the "why".)
>
>
>> 3. Make the stat handling consistent across subsystems (currently cpu.stat is the only
>> subsystem-specific stat implemented in the core).
>
> But beware that the possibility of having cpu.stat without enabling the
> cpu controller on v2 is a user visible behavior and I'm quite sure some
> userspace relies on it, so you'd need to preserve that.
>
This is what I worry about. Thank you for your confirmation.
Best regards,
Ridong
* Re: cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
2025-07-23 1:28 ` Chen Ridong
@ 2025-07-25 1:08 ` Tejun Heo
2025-07-25 1:54 ` Chen Ridong
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2025-07-25 1:08 UTC (permalink / raw)
To: Chen Ridong
Cc: Michal Koutný, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On Wed, Jul 23, 2025 at 09:28:02AM +0800, Chen Ridong wrote:
> > But beware that the possibility of having cpu.stat without enabling the
> > cpu controller on v2 is a user visible behavior and I'm quite sure some
> > userspace relies on it, so you'd need to preserve that.
>
> This is what I worry about. Thank you for your confirmation.
Yeah, this was an intentional decision - sacrificing a bit of code org
cleanliness for everyday usefulness. Enabling CPU controller can have
substantial overhead and having cpu stats available by default doesn't cost
much while improving usefulness.
Thanks.
--
tejun
* Re: cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer)
2025-07-25 1:08 ` Tejun Heo
@ 2025-07-25 1:54 ` Chen Ridong
0 siblings, 0 replies; 18+ messages in thread
From: Chen Ridong @ 2025-07-25 1:54 UTC (permalink / raw)
To: Tejun Heo
Cc: Michal Koutný, Tiffany Yang, linux-kernel, John Stultz,
Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
Frederic Weisbecker, Johannes Weiner, Rafael J. Wysocki,
Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
Jonathan Corbet, cgroups, linux-doc
On 2025/7/25 9:08, Tejun Heo wrote:
> On Wed, Jul 23, 2025 at 09:28:02AM +0800, Chen Ridong wrote:
>>> But beware that the possibility of having cpu.stat without enabling the
>>> cpu controller on v2 is a user visible behavior and I'm quite sure some
>>> userspace relies on it, so you'd need to preserve that.
>>
>> This is what I worry about. Thank you for your confirmation.
>
> Yeah, this was an intentional decision - sacrificing a bit of code org
> cleanliness for everyday usefulness. Enabling CPU controller can have
> substantial overhead and having cpu stats available by default doesn't cost
> much while improving usefulness.
>
> Thanks.
>
Thank you, TJ. This is clear now.
Best regards,
Ridong
end of thread, other threads:[~2025-07-25 1:54 UTC | newest]
Thread overview: 18+ messages
2025-07-14 5:00 [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer Tiffany Yang
2025-07-17 12:56 ` Michal Koutný
2025-07-17 13:52 ` Chen Ridong
2025-07-17 17:06 ` Tejun Heo
2025-07-22 22:41 ` Tiffany Yang
2025-07-22 22:27 ` Tiffany Yang
2025-07-17 17:05 ` Tejun Heo
2025-07-18 8:20 ` Michal Koutný
2025-07-18 9:26 ` Chen Ridong
2025-07-18 13:58 ` cpu.stat in core or cpu controller (was Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer) Michal Koutný
2025-07-19 2:01 ` Chen Ridong
2025-07-19 16:27 ` Tejun Heo
2025-07-22 9:01 ` Chen Ridong
2025-07-22 11:54 ` Michal Koutný
2025-07-23 1:28 ` Chen Ridong
2025-07-25 1:08 ` Tejun Heo
2025-07-25 1:54 ` Chen Ridong
2025-07-22 22:16 ` [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer Tiffany Yang