[PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer

* [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer
@ 2025-08-22  1:37 Tiffany Yang
  2025-08-22  1:37 ` [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting Tiffany Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tiffany Yang @ 2025-08-22  1:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Tejun Heo, Johannes Weiner,
	Michal Koutný, Rafael J. Wysocki, Pavel Machek,
	Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
	Shuah Khan, cgroups, linux-doc, linux-kselftest

Hello,

The cgroup v2 freezer controller is useful for freezing background
applications so they don't contend with foreground tasks. However, this
may disrupt any internal monitoring that the application is performing,
as it may not be aware that it was frozen.

To illustrate, an application might implement a watchdog thread to
monitor a high-priority task by periodically checking its state to
ensure progress. The challenge is that the task only advances when the
application is running, but watchdog timers are set relative to system
time, not app time. If the app is frozen and misses the expected
deadline, the watchdog, unaware of this pause, may kill a healthy
process.

This series tracks the time that each cgroup spends "freezing" and
exposes it via cgroup.stat.local. It also includes several basic
selftests that demonstrate the expected behavior of this interface,
including that:
  1. Freeze time will increase while a cgroup is freezing, regardless of
     whether it is frozen or not.
  2. Each cgroup's freeze time is independent from the other cgroups in
     its hierarchy.
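
As a rough illustration of how a userspace watchdog might consume this
value, here is a minimal sketch. It assumes the watchdog samples
frozen_usec from cgroup.stat.local at the start and end of each
monitoring window and measures the window itself with CLOCK_MONOTONIC.
The helper names parse_frozen_usec() and runnable_usec() are
hypothetical and are not part of this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "frozen_usec <value>" out of the text of a flat-keyed file such
 * as cgroup.stat.local. Returns -1 if the key is absent (for example,
 * on a kernel without freeze time accounting).
 */
static long long parse_frozen_usec(const char *text)
{
	const char *key = "frozen_usec ";
	const char *line = text;
	unsigned long long val;

	assert(text != NULL);
	while (line && *line) {
		if (!strncmp(line, key, strlen(key)) &&
		    sscanf(line + strlen(key), "%llu", &val) == 1)
			return (long long)val;
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical watchdog helper: the time the application could actually
 * run is the elapsed monotonic time minus the growth in frozen_usec over
 * the same window. Clamp at 0 because the two values are sampled at
 * slightly different instants.
 */
static uint64_t runnable_usec(uint64_t wall_elapsed_usec,
			      uint64_t frozen_before_usec,
			      uint64_t frozen_after_usec)
{
	uint64_t frozen_delta = frozen_after_usec - frozen_before_usec;

	if (frozen_delta >= wall_elapsed_usec)
		return 0;
	return wall_elapsed_usec - frozen_delta;
}
```

A watchdog built this way would compare runnable_usec() against its
deadline instead of the raw elapsed time, so a window spent mostly
frozen would not count against the monitored task.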

Thanks,
Tiffany

Signed-off-by: Tiffany Yang <ynaffit@google.com>
---
v3: https://lore.kernel.org/all/20250805032940.3587891-4-ynaffit@google.com/
v2: https://lore.kernel.org/lkml/20250714050008.2167786-2-ynaffit@google.com/
v1: https://lore.kernel.org/lkml/20250603224304.3198729-3-ynaffit@google.com/

Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tiffany Yang <ynaffit@google.com>

Tiffany Yang (2):
  cgroup: cgroup.stat.local time accounting
  cgroup: selftests: Add tests for freezer time

 Documentation/admin-guide/cgroup-v2.rst       |  18 +
 include/linux/cgroup-defs.h                   |  17 +
 kernel/cgroup/cgroup.c                        |  28 +
 kernel/cgroup/freezer.c                       |  16 +-
 tools/testing/selftests/cgroup/test_freezer.c | 663 ++++++++++++++++++
 5 files changed, 738 insertions(+), 4 deletions(-)

-- 
2.51.0.rc2.233.g662b1ed5c5-goog

* [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
  2025-08-22  1:37 [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer Tiffany Yang
@ 2025-08-22  1:37 ` Tiffany Yang
  2025-08-22  6:14   ` Chen Ridong
  2025-08-22  1:37 ` [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time Tiffany Yang
  2025-08-22 17:51 ` [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer Tejun Heo
  2 siblings, 1 reply; 12+ messages in thread
From: Tiffany Yang @ 2025-08-22  1:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Tejun Heo, Johannes Weiner,
	Michal Koutný, Rafael J. Wysocki, Pavel Machek,
	Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
	Shuah Khan, cgroups, linux-doc, linux-kselftest

There isn't yet a clear way to identify a set of "lost" time that
everyone (or at least a wider group of users) cares about. However,
users can perform some delay accounting by iterating over components of
interest. This patch allows cgroup v2 freezing time to be one of those
components.

Track the cumulative time that each v2 cgroup spends freezing and expose
it to userland via a new local stat file in cgroupfs. Thank you to
Michal, who provided the ASCII art in the updated documentation.

To access this value:
  $ mkdir /sys/fs/cgroup/test
  $ cat /sys/fs/cgroup/test/cgroup.stat.local
  frozen_usec 0

Ensure consistent freeze time reads with freeze_seq, a per-cgroup
sequence counter. Writes are serialized using the css_set_lock.
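
The read/retry pattern described above can be sketched in userspace
with C11 atomics. This is only an approximation of the idea, not the
kernel's seqcount_t implementation: struct freeze_stats and both
functions are hypothetical names, every field is a sequentially
consistent atomic for simplicity, and writers are assumed to be
serialized by an outer lock (css_set_lock in the patch) and to request
only real state changes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct freeze_stats {
	atomic_uint seq;		/* odd while a write is in progress */
	atomic_bool freezing;		/* stands in for CGRP_FREEZE */
	_Atomic uint64_t start_nsec;	/* time of the last freeze request */
	_Atomic uint64_t frozen_nsec;	/* accumulated freezing time */
};

/* Writer side; callers are assumed to hold an outer lock. */
static void stats_set_freeze(struct freeze_stats *s, bool freeze,
			     uint64_t now)
{
	/* Sketch assumption: only genuine state changes are requested. */
	assert(atomic_load(&s->freezing) != freeze);

	atomic_fetch_add(&s->seq, 1);	/* begin: seq becomes odd */
	if (freeze) {
		atomic_store(&s->start_nsec, now);
	} else {
		uint64_t total = atomic_load(&s->frozen_nsec);

		total += now - atomic_load(&s->start_nsec);
		atomic_store(&s->frozen_nsec, total);
	}
	atomic_store(&s->freezing, freeze);
	atomic_fetch_add(&s->seq, 1);	/* end: seq becomes even again */
}

/* Reader side: retry until a snapshot is taken with no writer active. */
static uint64_t stats_read(struct freeze_stats *s, uint64_t now)
{
	unsigned int seq;
	uint64_t total;

	do {
		/* Wait out any write that is currently in progress. */
		do {
			seq = atomic_load(&s->seq);
		} while (seq & 1);

		total = atomic_load(&s->frozen_nsec);
		/* Add the open interval if a freeze is still in effect. */
		if (atomic_load(&s->freezing))
			total += now - atomic_load(&s->start_nsec);
	} while (atomic_load(&s->seq) != seq);

	return total;
}
```

The reader never blocks the writer. It only repeats its snapshot when
the sequence number changed underneath it, which is why the flag, the
start timestamp and the running total are all read inside the loop.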

Signed-off-by: Tiffany Yang <ynaffit@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
---
v3 -> v4:
* Replace "freeze_time_total" with "frozen_usec" and expose stats via
  cgroup.stat.local, as recommended by Tejun.
* Use the same timestamp when freezing/unfreezing a cgroup as its
  descendants, as suggested by Michal.

v2 -> v3:
* Use seqcount along with css_set_lock to guard freeze time accesses, as
  suggested by Michal.

v1 -> v2:
* Track per-cgroup freezing time instead of per-task frozen time, as
  suggested by Tejun.
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++++++++++
 include/linux/cgroup-defs.h             | 17 +++++++++++++++
 kernel/cgroup/cgroup.c                  | 28 +++++++++++++++++++++++++
 kernel/cgroup/freezer.c                 | 16 ++++++++++----
 4 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 51c0bc4c2dc5..a1e3d431974c 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1001,6 +1001,24 @@ All cgroup core files are prefixed with "cgroup."
 		Total number of dying cgroup subsystems (e.g. memory
 		cgroup) at and beneath the current cgroup.
 
+  cgroup.stat.local
+	A read-only flat-keyed file which exists in non-root cgroups.
+	The following entry is defined:
+
+	  frozen_usec
+		Cumulative time that this cgroup has spent between freezing and
+		thawing, regardless of whether by self or ancestor groups.
+		NB: (not) reaching "frozen" state is not accounted here.
+
+		Using the following ASCII representation of a cgroup's freezer
+		state, ::
+
+			       1    _____
+			frozen 0 __/     \__
+			          ab    cd
+
+		the duration being measured is the span between a and c.
+
   cgroup.freeze
 	A read-write single value file which exists on non-root cgroups.
 	Allowed values are "0" and "1". The default is "0".
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 6b93a64115fe..539c64eeef38 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -433,6 +433,23 @@ struct cgroup_freezer_state {
 	 * frozen, SIGSTOPped, and PTRACEd.
 	 */
 	int nr_frozen_tasks;
+
+	/* Freeze time data consistency protection */
+	seqcount_t freeze_seq;
+
+	/*
+	 * Most recent time the cgroup was requested to freeze.
+	 * Accesses guarded by freeze_seq counter. Writes serialized
+	 * by css_set_lock.
+	 */
+	u64 freeze_start_nsec;
+
+	/*
+	 * Total duration the cgroup has spent freezing.
+	 * Accesses guarded by freeze_seq counter. Writes serialized
+	 * by css_set_lock.
+	 */
+	u64 frozen_nsec;
 };
 
 struct cgroup {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb..ab096b884bbc 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3763,6 +3763,27 @@ static int cgroup_stat_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
+static int cgroup_core_local_stat_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	unsigned int sequence;
+	u64 freeze_time;
+
+	do {
+		sequence = read_seqcount_begin(&cgrp->freezer.freeze_seq);
+		freeze_time = cgrp->freezer.frozen_nsec;
+		/* Add in current freezer interval if the cgroup is freezing. */
+		if (test_bit(CGRP_FREEZE, &cgrp->flags))
+			freeze_time += (ktime_get_ns() -
+					cgrp->freezer.freeze_start_nsec);
+	} while (read_seqcount_retry(&cgrp->freezer.freeze_seq, sequence));
+
+	seq_printf(seq, "frozen_usec %llu\n",
+		   (unsigned long long) freeze_time / NSEC_PER_USEC);
+
+	return 0;
+}
+
 #ifdef CONFIG_CGROUP_SCHED
 /**
  * cgroup_tryget_css - try to get a cgroup's css for the specified subsystem
@@ -5354,6 +5375,11 @@ static struct cftype cgroup_base_files[] = {
 		.name = "cgroup.stat",
 		.seq_show = cgroup_stat_show,
 	},
+	{
+		.name = "cgroup.stat.local",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cgroup_core_local_stat_show,
+	},
 	{
 		.name = "cgroup.freeze",
 		.flags = CFTYPE_NOT_ON_ROOT,
@@ -5763,6 +5789,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 	 * if the parent has to be frozen, the child has too.
 	 */
 	cgrp->freezer.e_freeze = parent->freezer.e_freeze;
+	seqcount_init(&cgrp->freezer.freeze_seq);
 	if (cgrp->freezer.e_freeze) {
 		/*
 		 * Set the CGRP_FREEZE flag, so when a process will be
@@ -5771,6 +5798,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 		 * consider it frozen immediately.
 		 */
 		set_bit(CGRP_FREEZE, &cgrp->flags);
+		cgrp->freezer.freeze_start_nsec = ktime_get_ns();
 		set_bit(CGRP_FROZEN, &cgrp->flags);
 	}
 
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index bf1690a167dd..6c18854bff34 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -171,7 +171,7 @@ static void cgroup_freeze_task(struct task_struct *task, bool freeze)
 /*
  * Freeze or unfreeze all tasks in the given cgroup.
  */
-static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
+static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze, u64 ts_nsec)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -179,10 +179,16 @@ static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
 	lockdep_assert_held(&cgroup_mutex);
 
 	spin_lock_irq(&css_set_lock);
-	if (freeze)
+	write_seqcount_begin(&cgrp->freezer.freeze_seq);
+	if (freeze) {
 		set_bit(CGRP_FREEZE, &cgrp->flags);
-	else
+		cgrp->freezer.freeze_start_nsec = ts_nsec;
+	} else {
 		clear_bit(CGRP_FREEZE, &cgrp->flags);
+		cgrp->freezer.frozen_nsec += (ts_nsec -
+			cgrp->freezer.freeze_start_nsec);
+	}
+	write_seqcount_end(&cgrp->freezer.freeze_seq);
 	spin_unlock_irq(&css_set_lock);
 
 	if (freeze)
@@ -260,6 +266,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
 	struct cgroup *parent;
 	struct cgroup *dsct;
 	bool applied = false;
+	u64 ts_nsec;
 	bool old_e;
 
 	lockdep_assert_held(&cgroup_mutex);
@@ -271,6 +278,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
 		return;
 
 	cgrp->freezer.freeze = freeze;
+	ts_nsec = ktime_get_ns();
 
 	/*
 	 * Propagate changes downwards the cgroup tree.
@@ -298,7 +306,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
 		/*
 		 * Do change actual state: freeze or unfreeze.
 		 */
-		cgroup_do_freeze(dsct, freeze);
+		cgroup_do_freeze(dsct, freeze, ts_nsec);
 		applied = true;
 	}
 
-- 
2.51.0.rc2.233.g662b1ed5c5-goog


* [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time
  2025-08-22  1:37 [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer Tiffany Yang
  2025-08-22  1:37 ` [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting Tiffany Yang
@ 2025-08-22  1:37 ` Tiffany Yang
  2025-08-22  7:19   ` Chen Ridong
  2025-08-22 17:51 ` [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer Tejun Heo
  2 siblings, 1 reply; 12+ messages in thread
From: Tiffany Yang @ 2025-08-22  1:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Tejun Heo, Johannes Weiner,
	Michal Koutný, Rafael J. Wysocki, Pavel Machek,
	Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
	Shuah Khan, cgroups, linux-doc, linux-kselftest

Test cgroup v2 freezer time stat. Freezer time accounting should
be independent of other cgroups in the hierarchy and should increase
iff a cgroup is CGRP_FREEZE (regardless of whether it reaches
CGRP_FROZEN).

Skip these tests on systems without freeze time accounting.
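
To make the expected per-cgroup behavior concrete, the following is a
simplified model of the accounting for a single chain of cgroups
(A -> B -> C). It is a sketch under stated assumptions rather than the
kernel code: struct cg_model and model_freeze() are hypothetical, the
hierarchy is reduced to an array, and timestamps are plain integers
supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEPTH 3	/* chain A -> B -> C; index 0 is the topmost cgroup */

struct cg_model {
	bool freeze;		/* this cgroup's own cgroup.freeze setting */
	bool e_freeze;		/* effective: self or any ancestor freezing */
	uint64_t start;		/* timestamp of the last effective freeze */
	uint64_t frozen;	/* accumulated effective freeze time */
};

/* Set cgroup.freeze on chain[idx] and propagate down the chain. */
static void model_freeze(struct cg_model chain[DEPTH], int idx,
			 bool freeze, uint64_t ts)
{
	int i;

	assert(idx >= 0 && idx < DEPTH);
	if (chain[idx].freeze == freeze)
		return;
	chain[idx].freeze = freeze;

	for (i = idx; i < DEPTH; i++) {
		bool parent_e = i > 0 && chain[i - 1].e_freeze;
		bool new_e = chain[i].freeze || parent_e;

		/* No change here means no change further down either. */
		if (new_e == chain[i].e_freeze)
			break;
		chain[i].e_freeze = new_e;
		/* Every cgroup touched by one request shares one timestamp. */
		if (new_e)
			chain[i].start = ts;
		else
			chain[i].frozen += ts - chain[i].start;
	}
}
```

In this model, freezing C, B, A at times 1, 2, 3 and then unfreezing
A, B, C at times 10, 11, 12 leaves A with 7, B with 9 and C with 11
units of freeze time, which matches the ordering that
test_cgfreezer_time_nested checks for.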

Signed-off-by: Tiffany Yang <ynaffit@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
---
v3 -> v4:
* Clean up logic around skipping selftests and decrease granularity of
  sleep times, as suggested by Michal.
---
 tools/testing/selftests/cgroup/test_freezer.c | 663 ++++++++++++++++++
 1 file changed, 663 insertions(+)

diff --git a/tools/testing/selftests/cgroup/test_freezer.c b/tools/testing/selftests/cgroup/test_freezer.c
index 8730645d363a..dfb763819581 100644
--- a/tools/testing/selftests/cgroup/test_freezer.c
+++ b/tools/testing/selftests/cgroup/test_freezer.c
@@ -804,6 +804,662 @@ static int test_cgfreezer_vfork(const char *root)
 	return ret;
 }
 
+/*
+ * Get the current frozen_usec for the cgroup.
+ */
+static long cg_check_freezetime(const char *cgroup)
+{
+	return cg_read_key_long(cgroup, "cgroup.stat.local",
+				"frozen_usec ");
+}
+
+/*
+ * Test that the freeze time will behave as expected for an empty cgroup.
+ */
+static int test_cgfreezer_time_empty(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup = NULL;
+	long prev, curr;
+
+	cgroup = cg_name(root, "cg_time_test_empty");
+	if (!cgroup)
+		goto cleanup;
+
+	/*
+	 * 1) Create an empty cgroup and check that its freeze time
+	 *    is 0.
+	 */
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	curr = cg_check_freezetime(cgroup);
+	if (curr < 0) {
+		ret = KSFT_SKIP;
+		goto cleanup;
+	}
+	if (curr > 0) {
+		debug("Expect time (%ld) to be 0\n", curr);
+		goto cleanup;
+	}
+
+	if (cg_freeze_nowait(cgroup, true))
+		goto cleanup;
+
+	/*
+	 * 2) Sleep for 1000 us. Check that the freeze time is at
+	 *    least 1000 us.
+	 */
+	usleep(1000);
+	curr = cg_check_freezetime(cgroup);
+	if (curr < 1000) {
+		debug("Expect time (%ld) to be at least 1000 us\n",
+		      curr);
+		goto cleanup;
+	}
+
+	/*
+	 * 3) Unfreeze the cgroup. Check that the freeze time is
+	 *    larger than at 2).
+	 */
+	if (cg_freeze_nowait(cgroup, false))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr <= prev) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	/*
+	 * 4) Check the freeze time again to ensure that it has not
+	 *    changed.
+	 */
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr != prev) {
+		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+/*
+ * A simple test for cgroup freezer time accounting. This test follows
+ * the same flow as test_cgfreezer_time_empty, but with a single process
+ * in the cgroup.
+ */
+static int test_cgfreezer_time_simple(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup = NULL;
+	long prev, curr;
+
+	cgroup = cg_name(root, "cg_time_test_simple");
+	if (!cgroup)
+		goto cleanup;
+
+	/*
+	 * 1) Create a cgroup and check that its freeze time is 0.
+	 */
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	curr = cg_check_freezetime(cgroup);
+	if (curr < 0) {
+		ret = KSFT_SKIP;
+		goto cleanup;
+	}
+	if (curr > 0) {
+		debug("Expect time (%ld) to be 0\n", curr);
+		goto cleanup;
+	}
+
+	/*
+	 * 2) Populate the cgroup with one child and check that the
+	 *    freeze time is still 0.
+	 */
+	cg_run_nowait(cgroup, child_fn, NULL);
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr > prev) {
+		debug("Expect time (%ld) to be 0\n", curr);
+		goto cleanup;
+	}
+
+	if (cg_freeze_nowait(cgroup, true))
+		goto cleanup;
+
+	/*
+	 * 3) Sleep for 1000 us. Check that the freeze time is at
+	 *    least 1000 us.
+	 */
+	usleep(1000);
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr < 1000) {
+		debug("Expect time (%ld) to be at least 1000 us\n",
+		      curr);
+		goto cleanup;
+	}
+
+	/*
+	 * 4) Unfreeze the cgroup. Check that the freeze time is
+	 *    larger than at 3).
+	 */
+	if (cg_freeze_nowait(cgroup, false))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr <= prev) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	/*
+	 * 5) Sleep for 1000 us. Check that the freeze time is the
+	 *    same as at 4).
+	 */
+	usleep(1000);
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr != prev) {
+		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+/*
+ * Test that freezer time accounting works as expected, even while we're
+ * populating a cgroup with processes.
+ */
+static int test_cgfreezer_time_populate(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup = NULL;
+	long prev, curr;
+	int i;
+
+	cgroup = cg_name(root, "cg_time_test_populate");
+	if (!cgroup)
+		goto cleanup;
+
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	curr = cg_check_freezetime(cgroup);
+	if (curr < 0) {
+		ret = KSFT_SKIP;
+		goto cleanup;
+	}
+	if (curr > 0) {
+		debug("Expect time (%ld) to be 0\n", curr);
+		goto cleanup;
+	}
+
+	/*
+	 * 1) Populate the cgroup with 100 processes. Check that
+	 *    the freeze time is 0.
+	 */
+	for (i = 0; i < 100; i++)
+		cg_run_nowait(cgroup, child_fn, NULL);
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr != prev) {
+		debug("Expect time (%ld) to be 0\n", curr);
+		goto cleanup;
+	}
+
+	/*
+	 * 2) Wait for the group to become fully populated. Check
+	 *    that the freeze time is 0.
+	 */
+	if (cg_wait_for_proc_count(cgroup, 100))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr != prev) {
+		debug("Expect time (%ld) to be 0\n", curr);
+		goto cleanup;
+	}
+
+	/*
+	 * 3) Freeze the cgroup and then populate it with 100 more
+	 *    processes. Check that the freeze time continues to grow.
+	 */
+	if (cg_freeze_nowait(cgroup, true))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr <= prev) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	for (i = 0; i < 100; i++)
+		cg_run_nowait(cgroup, child_fn, NULL);
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr <= prev) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	/*
+	 * 4) Wait for the group to become fully populated. Check
+	 *    that the freeze time is larger than at 3).
+	 */
+	if (cg_wait_for_proc_count(cgroup, 200))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr <= prev) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	/*
+	 * 5) Unfreeze the cgroup. Check that the freeze time is
+	 *    larger than at 4).
+	 */
+	if (cg_freeze_nowait(cgroup, false))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr <= prev) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	/*
+	 * 6) Kill the processes. Check that the freeze time is the
+	 *    same as it was at 5).
+	 */
+	if (cg_killall(cgroup))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr != prev) {
+		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	/*
+	 * 7) Freeze and unfreeze the cgroup. Check that the freeze
+	 *    time is larger than it was at 6).
+	 */
+	if (cg_freeze_nowait(cgroup, true))
+		goto cleanup;
+	if (cg_freeze_nowait(cgroup, false))
+		goto cleanup;
+	prev = curr;
+	curr = cg_check_freezetime(cgroup);
+	if (curr <= prev) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr, prev);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+/*
+ * Test that frozen time for a cgroup continues to work as expected,
+ * even as processes are migrated. Frozen cgroup A's freeze time should
+ * continue to increase and running cgroup B's should stay 0.
+ */
+static int test_cgfreezer_time_migrate(const char *root)
+{
+	long prev_A, curr_A, curr_B;
+	char *cgroup[2] = {0};
+	int ret = KSFT_FAIL;
+	int pid;
+
+	cgroup[0] = cg_name(root, "cg_time_test_migrate_A");
+	if (!cgroup[0])
+		goto cleanup;
+
+	cgroup[1] = cg_name(root, "cg_time_test_migrate_B");
+	if (!cgroup[1])
+		goto cleanup;
+
+	if (cg_create(cgroup[0]))
+		goto cleanup;
+
+	if (cg_check_freezetime(cgroup[0]) < 0) {
+		ret = KSFT_SKIP;
+		goto cleanup;
+	}
+
+	if (cg_create(cgroup[1]))
+		goto cleanup;
+
+	pid = cg_run_nowait(cgroup[0], child_fn, NULL);
+	if (pid < 0)
+		goto cleanup;
+
+	if (cg_wait_for_proc_count(cgroup[0], 1))
+		goto cleanup;
+
+	curr_A = cg_check_freezetime(cgroup[0]);
+	if (curr_A) {
+		debug("Expect time (%ld) to be 0\n", curr_A);
+		goto cleanup;
+	}
+	curr_B = cg_check_freezetime(cgroup[1]);
+	if (curr_B) {
+		debug("Expect time (%ld) to be 0\n", curr_B);
+		goto cleanup;
+	}
+
+	/*
+	 * Freeze cgroup A.
+	 */
+	if (cg_freeze_wait(cgroup[0], true))
+		goto cleanup;
+	prev_A = curr_A;
+	curr_A = cg_check_freezetime(cgroup[0]);
+	if (curr_A <= prev_A) {
+		debug("Expect time (%ld) to be > 0\n", curr_A);
+		goto cleanup;
+	}
+
+	/*
+	 * Migrate from A (frozen) to B (running).
+	 */
+	if (cg_enter(cgroup[1], pid))
+		goto cleanup;
+
+	usleep(1000);
+	curr_B = cg_check_freezetime(cgroup[1]);
+	if (curr_B) {
+		debug("Expect time (%ld) to be 0\n", curr_B);
+		goto cleanup;
+	}
+
+	prev_A = curr_A;
+	curr_A = cg_check_freezetime(cgroup[0]);
+	if (curr_A <= prev_A) {
+		debug("Expect time (%ld) to be more than previous check (%ld)\n",
+		      curr_A, prev_A);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup[0])
+		cg_destroy(cgroup[0]);
+	free(cgroup[0]);
+	if (cgroup[1])
+		cg_destroy(cgroup[1]);
+	free(cgroup[1]);
+	return ret;
+}
+
+/*
+ * The test creates a cgroup and freezes it. Then it creates a child cgroup.
+ * After that it checks that the child cgroup has a non-zero freeze time
+ * that is less than the parent's. Next, it freezes the child, unfreezes
+ * the parent, and sleeps. Finally, it checks that the child's freeze
+ * time has grown larger than the parent's.
+ */
+static int test_cgfreezer_time_parent(const char *root)
+{
+	char *parent, *child = NULL;
+	int ret = KSFT_FAIL;
+	long ptime, ctime;
+
+	parent = cg_name(root, "cg_test_parent_A");
+	if (!parent)
+		goto cleanup;
+
+	child = cg_name(parent, "cg_test_parent_B");
+	if (!child)
+		goto cleanup;
+
+	if (cg_create(parent))
+		goto cleanup;
+
+	if (cg_check_freezetime(parent) < 0) {
+		ret = KSFT_SKIP;
+		goto cleanup;
+	}
+
+	if (cg_freeze_wait(parent, true))
+		goto cleanup;
+
+	usleep(1000);
+	if (cg_create(child))
+		goto cleanup;
+
+	if (cg_check_frozen(child, true))
+		goto cleanup;
+
+	/*
+	 * Since the parent was frozen the entire time the child cgroup
+	 * was being created, we expect the parent's freeze time to be
+	 * larger than the child's.
+	 *
+	 * Ideally, we would be able to check both times simultaneously,
+	 * but here we get the child's after we get the parent's.
+	 */
+	ptime = cg_check_freezetime(parent);
+	ctime = cg_check_freezetime(child);
+	if (ptime <= ctime) {
+		debug("Expect ptime (%ld) > ctime (%ld)\n", ptime, ctime);
+		goto cleanup;
+	}
+
+	if (cg_freeze_nowait(child, true))
+		goto cleanup;
+
+	if (cg_freeze_wait(parent, false))
+		goto cleanup;
+
+	if (cg_check_frozen(child, true))
+		goto cleanup;
+
+	usleep(100000);
+
+	ctime = cg_check_freezetime(child);
+	ptime = cg_check_freezetime(parent);
+
+	if (ctime <= ptime) {
+		debug("Expect ctime (%ld) > ptime (%ld)\n", ctime, ptime);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (child)
+		cg_destroy(child);
+	free(child);
+	if (parent)
+		cg_destroy(parent);
+	free(parent);
+	return ret;
+}
+
+/*
+ * The test creates a parent cgroup and a child cgroup. Then, it freezes
+ * the child and checks that the child's freeze time is greater than the
+ * parent's, which should be zero.
+ */
+static int test_cgfreezer_time_child(const char *root)
+{
+	char *parent, *child = NULL;
+	int ret = KSFT_FAIL;
+	long ptime, ctime;
+
+	parent = cg_name(root, "cg_test_child_A");
+	if (!parent)
+		goto cleanup;
+
+	child = cg_name(parent, "cg_test_child_B");
+	if (!child)
+		goto cleanup;
+
+	if (cg_create(parent))
+		goto cleanup;
+
+	if (cg_check_freezetime(parent) < 0) {
+		ret = KSFT_SKIP;
+		goto cleanup;
+	}
+
+	if (cg_create(child))
+		goto cleanup;
+
+	if (cg_freeze_wait(child, true))
+		goto cleanup;
+
+	ctime = cg_check_freezetime(child);
+	ptime = cg_check_freezetime(parent);
+	if (ptime != 0) {
+		debug("Expect ptime (%ld) to be 0\n", ptime);
+		goto cleanup;
+	}
+
+	if (ctime <= ptime) {
+		debug("Expect ctime (%ld) > ptime (%ld)\n", ctime, ptime);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (child)
+		cg_destroy(child);
+	free(child);
+	if (parent)
+		cg_destroy(parent);
+	free(parent);
+	return ret;
+}
+
+/*
+ * The test creates the following hierarchy:
+ *    A
+ *    |
+ *    B
+ *    |
+ *    C
+ *
+ * Then it freezes the cgroups in the order C, B, A.
+ * Then it unfreezes the cgroups in the order A, B, C.
+ * Then it checks that C's freeze time is larger than B's and
+ * that B's is larger than A's.
+ */
+static int test_cgfreezer_time_nested(const char *root)
+{
+	char *cgroup[3] = {0};
+	int ret = KSFT_FAIL;
+	long time[3] = {0};
+	int i;
+
+	cgroup[0] = cg_name(root, "cg_test_time_A");
+	if (!cgroup[0])
+		goto cleanup;
+
+	cgroup[1] = cg_name(cgroup[0], "B");
+	if (!cgroup[1])
+		goto cleanup;
+
+	cgroup[2] = cg_name(cgroup[1], "C");
+	if (!cgroup[2])
+		goto cleanup;
+
+	if (cg_create(cgroup[0]))
+		goto cleanup;
+
+	if (cg_check_freezetime(cgroup[0]) < 0) {
+		ret = KSFT_SKIP;
+		goto cleanup;
+	}
+
+	if (cg_create(cgroup[1]))
+		goto cleanup;
+
+	if (cg_create(cgroup[2]))
+		goto cleanup;
+
+	if (cg_freeze_nowait(cgroup[2], true))
+		goto cleanup;
+
+	if (cg_freeze_nowait(cgroup[1], true))
+		goto cleanup;
+
+	if (cg_freeze_nowait(cgroup[0], true))
+		goto cleanup;
+
+	usleep(1000);
+
+	if (cg_freeze_nowait(cgroup[0], false))
+		goto cleanup;
+
+	if (cg_freeze_nowait(cgroup[1], false))
+		goto cleanup;
+
+	if (cg_freeze_nowait(cgroup[2], false))
+		goto cleanup;
+
+	time[2] = cg_check_freezetime(cgroup[2]);
+	time[1] = cg_check_freezetime(cgroup[1]);
+	time[0] = cg_check_freezetime(cgroup[0]);
+
+	if (time[2] <= time[1]) {
+		debug("Expect C's time (%ld) > B's time (%ld)\n", time[2], time[1]);
+		goto cleanup;
+	}
+
+	if (time[1] <= time[0]) {
+		debug("Expect B's time (%ld) > A's time (%ld)\n", time[1], time[0]);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+
+cleanup:
+	for (i = 2; i >= 0 && cgroup[i]; i--) {
+		cg_destroy(cgroup[i]);
+		free(cgroup[i]);
+	}
+
+	return ret;
+}
+
 #define T(x) { x, #x }
 struct cgfreezer_test {
 	int (*fn)(const char *root);
@@ -819,6 +1475,13 @@ struct cgfreezer_test {
 	T(test_cgfreezer_stopped),
 	T(test_cgfreezer_ptraced),
 	T(test_cgfreezer_vfork),
+	T(test_cgfreezer_time_empty),
+	T(test_cgfreezer_time_simple),
+	T(test_cgfreezer_time_populate),
+	T(test_cgfreezer_time_migrate),
+	T(test_cgfreezer_time_parent),
+	T(test_cgfreezer_time_child),
+	T(test_cgfreezer_time_nested),
 };
 #undef T
 
-- 
2.51.0.rc2.233.g662b1ed5c5-goog


* Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
  2025-08-22  1:37 ` [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting Tiffany Yang
@ 2025-08-22  6:14   ` Chen Ridong
  2025-08-22  6:58     ` Chen Ridong
  0 siblings, 1 reply; 12+ messages in thread
From: Chen Ridong @ 2025-08-22  6:14 UTC (permalink / raw)
  To: Tiffany Yang, linux-kernel
  Cc: John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Tejun Heo, Johannes Weiner,
	Michal Koutný, Rafael J. Wysocki, Pavel Machek,
	Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
	Shuah Khan, cgroups, linux-doc, linux-kselftest



On 2025/8/22 9:37, Tiffany Yang wrote:
> There isn't yet a clear way to identify a set of "lost" time that
> everyone (or at least a wider group of users) cares about. However,
> users can perform some delay accounting by iterating over components of
> interest. This patch allows cgroup v2 freezing time to be one of those
> components.
> 
> Track the cumulative time that each v2 cgroup spends freezing and expose
> it to userland via a new local stat file in cgroupfs. Thank you to
> Michal, who provided the ASCII art in the updated documentation.
> 
> To access this value:
>   $ mkdir /sys/fs/cgroup/test
>   $ cat /sys/fs/cgroup/test/cgroup.stat.local
>   freeze_time_total 0
> 
> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
> sequence counter. Writes are serialized using the css_set_lock.
> 
> Signed-off-by: Tiffany Yang <ynaffit@google.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Michal Koutný <mkoutny@suse.com>
> ---
> v3 -> v4:
> * Replace "freeze_time_total" with "frozen" and expose stats via
>   cgroup.stat.local, as recommended by Tejun.
> * Use the same timestamp when freezing/unfreezing a cgroup as its
>   descendants, as suggested by Michal.
> 
> v2 -> v3:
> * Use seqcount along with css_set_lock to guard freeze time accesses, as
>   suggested by Michal.
> 
> v1 -> v2:
> * Track per-cgroup freezing time instead of per-task frozen time, as
>   suggested by Tejun.
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++++++++++
>  include/linux/cgroup-defs.h             | 17 +++++++++++++++
>  kernel/cgroup/cgroup.c                  | 28 +++++++++++++++++++++++++
>  kernel/cgroup/freezer.c                 | 16 ++++++++++----
>  4 files changed, 75 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 51c0bc4c2dc5..a1e3d431974c 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1001,6 +1001,24 @@ All cgroup core files are prefixed with "cgroup."
>  		Total number of dying cgroup subsystems (e.g. memory
>  		cgroup) at and beneath the current cgroup.
>  
> +  cgroup.stat.local
> +	A read-only flat-keyed file which exists in non-root cgroups.
> +	The following entry is defined:
> +
> +	  frozen_usec
> +		Cumulative time that this cgroup has spent between freezing and
> +		thawing, regardless of whether by self or ancestor groups.
> +		NB: (not) reaching "frozen" state is not accounted here.
> +
> +		Using the following ASCII representation of a cgroup's freezer
> +		state, ::
> +
> +			       1    _____
> +			frozen 0 __/     \__
> +			          ab    cd
> +
> +		the duration being measured is the span between a and c.
> +
>    cgroup.freeze
>  	A read-write single value file which exists on non-root cgroups.
>  	Allowed values are "0" and "1". The default is "0".
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 6b93a64115fe..539c64eeef38 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -433,6 +433,23 @@ struct cgroup_freezer_state {
>  	 * frozen, SIGSTOPped, and PTRACEd.
>  	 */
>  	int nr_frozen_tasks;
> +
> +	/* Freeze time data consistency protection */
> +	seqcount_t freeze_seq;
> +
> +	/*
> +	 * Most recent time the cgroup was requested to freeze.
> +	 * Accesses guarded by freeze_seq counter. Writes serialized
> +	 * by css_set_lock.
> +	 */
> +	u64 freeze_start_nsec;
> +
> +	/*
> +	 * Total duration the cgroup has spent freezing.
> +	 * Accesses guarded by freeze_seq counter. Writes serialized
> +	 * by css_set_lock.
> +	 */
> +	u64 frozen_nsec;
>  };
>  
>  struct cgroup {
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 312c6a8b55bb..ab096b884bbc 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3763,6 +3763,27 @@ static int cgroup_stat_show(struct seq_file *seq, void *v)
>  	return 0;
>  }
>  
> +static int cgroup_core_local_stat_show(struct seq_file *seq, void *v)
> +{
> +	struct cgroup *cgrp = seq_css(seq)->cgroup;
> +	unsigned int sequence;
> +	u64 freeze_time;
> +
> +	do {
> +		sequence = read_seqcount_begin(&cgrp->freezer.freeze_seq);
> +		freeze_time = cgrp->freezer.frozen_nsec;
> +		/* Add in current freezer interval if the cgroup is freezing. */
> +		if (test_bit(CGRP_FREEZE, &cgrp->flags))
> +			freeze_time += (ktime_get_ns() -
> +					cgrp->freezer.freeze_start_nsec);
> +	} while (read_seqcount_retry(&cgrp->freezer.freeze_seq, sequence));
> +
> +	seq_printf(seq, "frozen_usec %llu\n",
> +		   (unsigned long long) freeze_time / NSEC_PER_USEC);
> +
> +	return 0;
> +}
> +
>  #ifdef CONFIG_CGROUP_SCHED
>  /**
>   * cgroup_tryget_css - try to get a cgroup's css for the specified subsystem
> @@ -5354,6 +5375,11 @@ static struct cftype cgroup_base_files[] = {
>  		.name = "cgroup.stat",
>  		.seq_show = cgroup_stat_show,
>  	},
> +	{
> +		.name = "cgroup.stat.local",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = cgroup_core_local_stat_show,
> +	},
>  	{
>  		.name = "cgroup.freeze",
>  		.flags = CFTYPE_NOT_ON_ROOT,
> @@ -5763,6 +5789,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>  	 * if the parent has to be frozen, the child has too.
>  	 */
>  	cgrp->freezer.e_freeze = parent->freezer.e_freeze;
> +	seqcount_init(&cgrp->freezer.freeze_seq);
>  	if (cgrp->freezer.e_freeze) {
>  		/*
>  		 * Set the CGRP_FREEZE flag, so when a process will be
> @@ -5771,6 +5798,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>  		 * consider it frozen immediately.
>  		 */
>  		set_bit(CGRP_FREEZE, &cgrp->flags);
> +		cgrp->freezer.freeze_start_nsec = ktime_get_ns();
>  		set_bit(CGRP_FROZEN, &cgrp->flags);
>  	}
>  
> diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
> index bf1690a167dd..6c18854bff34 100644
> --- a/kernel/cgroup/freezer.c
> +++ b/kernel/cgroup/freezer.c
> @@ -171,7 +171,7 @@ static void cgroup_freeze_task(struct task_struct *task, bool freeze)
>  /*
>   * Freeze or unfreeze all tasks in the given cgroup.
>   */
> -static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
> +static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze, u64 ts_nsec)
>  {
>  	struct css_task_iter it;
>  	struct task_struct *task;
> @@ -179,10 +179,16 @@ static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
>  	lockdep_assert_held(&cgroup_mutex);
>  
>  	spin_lock_irq(&css_set_lock);
> -	if (freeze)
> +	write_seqcount_begin(&cgrp->freezer.freeze_seq);
> +	if (freeze) {
>  		set_bit(CGRP_FREEZE, &cgrp->flags);
> -	else
> +		cgrp->freezer.freeze_start_nsec = ts_nsec;
> +	} else {
>  		clear_bit(CGRP_FREEZE, &cgrp->flags);
> +		cgrp->freezer.frozen_nsec += (ts_nsec -
> +			cgrp->freezer.freeze_start_nsec);
> +	}
> +	write_seqcount_end(&cgrp->freezer.freeze_seq);
>  	spin_unlock_irq(&css_set_lock);
> 

Hello Tiffany,

I wanted to check whether there are any specific considerations behind how the ts_nsec value is
supplied.

Would it be possible to sample it directly within cgroup_do_freeze() rather than passing it as a
parameter? That might simplify the implementation and potentially improve timing accuracy when
the cgroup has many descendants.

-- 
Best regards,
Ridong

>  	if (freeze)
> @@ -260,6 +266,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
>  	struct cgroup *parent;
>  	struct cgroup *dsct;
>  	bool applied = false;
> +	u64 ts_nsec;
>  	bool old_e;
>  
>  	lockdep_assert_held(&cgroup_mutex);
> @@ -271,6 +278,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
>  		return;
>  
>  	cgrp->freezer.freeze = freeze;
> +	ts_nsec = ktime_get_ns();
>  
>  	/*
>  	 * Propagate changes downwards the cgroup tree.
> @@ -298,7 +306,7 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
>  		/*
>  		 * Do change actual state: freeze or unfreeze.
>  		 */
> -		cgroup_do_freeze(dsct, freeze);
> +		cgroup_do_freeze(dsct, freeze, ts_nsec);
>  		applied = true;
>  	}
>  



* Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
  2025-08-22  6:14   ` Chen Ridong
@ 2025-08-22  6:58     ` Chen Ridong
  2025-08-22 19:32       ` Tiffany Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Chen Ridong @ 2025-08-22  6:58 UTC (permalink / raw)
  To: Tiffany Yang, linux-kernel
  Cc: John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Tejun Heo, Johannes Weiner,
	Michal Koutný, Rafael J. Wysocki, Pavel Machek,
	Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
	Shuah Khan, cgroups, linux-doc, linux-kselftest



On 2025/8/22 14:14, Chen Ridong wrote:
> 
> 
> On 2025/8/22 9:37, Tiffany Yang wrote:
>> There isn't yet a clear way to identify a set of "lost" time that
>> everyone (or at least a wider group of users) cares about. However,
>> users can perform some delay accounting by iterating over components of
>> interest. This patch allows cgroup v2 freezing time to be one of those
>> components.
>>
>> Track the cumulative time that each v2 cgroup spends freezing and expose
>> it to userland via a new local stat file in cgroupfs. Thank you to
>> Michal, who provided the ASCII art in the updated documentation.
>>
>> To access this value:
>>   $ mkdir /sys/fs/cgroup/test
>>   $ cat /sys/fs/cgroup/test/cgroup.stat.local
>>   frozen_usec 0
>>
>> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
>> sequence counter. Writes are serialized using the css_set_lock.
>>
>> Signed-off-by: Tiffany Yang <ynaffit@google.com>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Michal Koutný <mkoutny@suse.com>
>> ---
>> v3 -> v4:
>> * Replace "freeze_time_total" with "frozen" and expose stats via
>>   cgroup.stat.local, as recommended by Tejun.
>> * Use the same timestamp when freezing/unfreezing a cgroup as its
>>   descendants, as suggested by Michal.
>>
>> v2 -> v3:
>> * Use seqcount along with css_set_lock to guard freeze time accesses, as
>>   suggested by Michal.
>>
>> v1 -> v2:
>> * Track per-cgroup freezing time instead of per-task frozen time, as
>>   suggested by Tejun.
>> ---
>>  Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++++++++++
>>  include/linux/cgroup-defs.h             | 17 +++++++++++++++
>>  kernel/cgroup/cgroup.c                  | 28 +++++++++++++++++++++++++
>>  kernel/cgroup/freezer.c                 | 16 ++++++++++----
>>  4 files changed, 75 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 51c0bc4c2dc5..a1e3d431974c 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1001,6 +1001,24 @@ All cgroup core files are prefixed with "cgroup."
>>  		Total number of dying cgroup subsystems (e.g. memory
>>  		cgroup) at and beneath the current cgroup.
>>  
>> +  cgroup.stat.local
>> +	A read-only flat-keyed file which exists in non-root cgroups.
>> +	The following entry is defined:
>> +
>> +	  frozen_usec
>> +		Cumulative time that this cgroup has spent between freezing and
>> +		thawing, regardless of whether by self or ancestor groups.
>> +		NB: (not) reaching "frozen" state is not accounted here.
>> +
>> +		Using the following ASCII representation of a cgroup's freezer
>> +		state, ::
>> +
>> +			       1    _____
>> +			frozen 0 __/     \__
>> +			          ab    cd
>> +
>> +		the duration being measured is the span between a and c.
>> +
>>    cgroup.freeze
>>  	A read-write single value file which exists on non-root cgroups.
>>  	Allowed values are "0" and "1". The default is "0".
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index 6b93a64115fe..539c64eeef38 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -433,6 +433,23 @@ struct cgroup_freezer_state {
>>  	 * frozen, SIGSTOPped, and PTRACEd.
>>  	 */
>>  	int nr_frozen_tasks;
>> +
>> +	/* Freeze time data consistency protection */
>> +	seqcount_t freeze_seq;
>> +
>> +	/*
>> +	 * Most recent time the cgroup was requested to freeze.
>> +	 * Accesses guarded by freeze_seq counter. Writes serialized
>> +	 * by css_set_lock.
>> +	 */
>> +	u64 freeze_start_nsec;
>> +
>> +	/*
>> +	 * Total duration the cgroup has spent freezing.
>> +	 * Accesses guarded by freeze_seq counter. Writes serialized
>> +	 * by css_set_lock.
>> +	 */
>> +	u64 frozen_nsec;
>>  };
>>  
>>  struct cgroup {
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index 312c6a8b55bb..ab096b884bbc 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -3763,6 +3763,27 @@ static int cgroup_stat_show(struct seq_file *seq, void *v)
>>  	return 0;
>>  }
>>  
>> +static int cgroup_core_local_stat_show(struct seq_file *seq, void *v)
>> +{
>> +	struct cgroup *cgrp = seq_css(seq)->cgroup;
>> +	unsigned int sequence;
>> +	u64 freeze_time;
>> +
>> +	do {
>> +		sequence = read_seqcount_begin(&cgrp->freezer.freeze_seq);
>> +		freeze_time = cgrp->freezer.frozen_nsec;
>> +		/* Add in current freezer interval if the cgroup is freezing. */
>> +		if (test_bit(CGRP_FREEZE, &cgrp->flags))
>> +			freeze_time += (ktime_get_ns() -
>> +					cgrp->freezer.freeze_start_nsec);
>> +	} while (read_seqcount_retry(&cgrp->freezer.freeze_seq, sequence));
>> +
>> +	seq_printf(seq, "frozen_usec %llu\n",
>> +		   (unsigned long long) freeze_time / NSEC_PER_USEC);
>> +
>> +	return 0;
>> +}
>> +
>>  #ifdef CONFIG_CGROUP_SCHED
>>  /**
>>   * cgroup_tryget_css - try to get a cgroup's css for the specified subsystem
>> @@ -5354,6 +5375,11 @@ static struct cftype cgroup_base_files[] = {
>>  		.name = "cgroup.stat",
>>  		.seq_show = cgroup_stat_show,
>>  	},
>> +	{
>> +		.name = "cgroup.stat.local",
>> +		.flags = CFTYPE_NOT_ON_ROOT,
>> +		.seq_show = cgroup_core_local_stat_show,
>> +	},
>>  	{
>>  		.name = "cgroup.freeze",
>>  		.flags = CFTYPE_NOT_ON_ROOT,
>> @@ -5763,6 +5789,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>>  	 * if the parent has to be frozen, the child has too.
>>  	 */
>>  	cgrp->freezer.e_freeze = parent->freezer.e_freeze;
>> +	seqcount_init(&cgrp->freezer.freeze_seq);
>>  	if (cgrp->freezer.e_freeze) {
>>  		/*
>>  		 * Set the CGRP_FREEZE flag, so when a process will be
>> @@ -5771,6 +5798,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>>  		 * consider it frozen immediately.
>>  		 */
>>  		set_bit(CGRP_FREEZE, &cgrp->flags);
>> +		cgrp->freezer.freeze_start_nsec = ktime_get_ns();
>>  		set_bit(CGRP_FROZEN, &cgrp->flags);
>>  	}
>>  
>> diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
>> index bf1690a167dd..6c18854bff34 100644
>> --- a/kernel/cgroup/freezer.c
>> +++ b/kernel/cgroup/freezer.c
>> @@ -171,7 +171,7 @@ static void cgroup_freeze_task(struct task_struct *task, bool freeze)
>>  /*
>>   * Freeze or unfreeze all tasks in the given cgroup.
>>   */
>> -static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
>> +static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze, u64 ts_nsec)
>>  {
>>  	struct css_task_iter it;
>>  	struct task_struct *task;
>> @@ -179,10 +179,16 @@ static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
>>  	lockdep_assert_held(&cgroup_mutex);
>>  
>>  	spin_lock_irq(&css_set_lock);
>> -	if (freeze)
>> +	write_seqcount_begin(&cgrp->freezer.freeze_seq);
>> +	if (freeze) {
>>  		set_bit(CGRP_FREEZE, &cgrp->flags);
>> -	else
>> +		cgrp->freezer.freeze_start_nsec = ts_nsec;
>> +	} else {
>>  		clear_bit(CGRP_FREEZE, &cgrp->flags);
>> +		cgrp->freezer.frozen_nsec += (ts_nsec -
>> +			cgrp->freezer.freeze_start_nsec);
>> +	}
>> +	write_seqcount_end(&cgrp->freezer.freeze_seq);
>>  	spin_unlock_irq(&css_set_lock);
>>
> 
> Hello Tiffany,
> 
> I wanted to check whether there are any specific considerations behind how the ts_nsec value is
> supplied.
> 
> Would it be possible to sample it directly within cgroup_do_freeze() rather than passing it as a
> parameter? That might simplify the implementation and potentially improve timing accuracy when
> the cgroup has many descendants.
> 

I revisited v3, and this was Michal's point.
	p
     /  |  \
    1  ...  n
When we freeze the parent cgroup p, is it expected that all descendant cgroups (1 to n) share
the same freeze timestamp?

If the cgroup tree structure is stable, the exact freeze time may not really matter. However, if
the tree is not stable, is it acceptable for all of them to get the same freeze time?
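
To make the question concrete, here is a minimal userspace sketch, not kernel code.
struct freezer_model and the model_* helpers are hypothetical names, and step_nsec is an
assumed per-cgroup walk delay. It only models the stamp-on-freeze / accumulate-on-thaw
accounting from the patch and compares one timestamp shared across the subtree (the v4
behaviour) with sampling the clock once per cgroup:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-cgroup freezer fields. */
struct freezer_model {
	int freezing;               /* models the CGRP_FREEZE flag */
	uint64_t freeze_start_nsec; /* time of the most recent freeze request */
	uint64_t frozen_nsec;       /* accumulated time spent freezing */
};

/* Same accounting as the patch: stamp on freeze, accumulate on thaw. */
static void model_do_freeze(struct freezer_model *f, int freeze,
			    uint64_t ts_nsec)
{
	if (freeze) {
		f->freezing = 1;
		f->freeze_start_nsec = ts_nsec;
	} else {
		f->freezing = 0;
		f->frozen_nsec += ts_nsec - f->freeze_start_nsec;
	}
}

/* v4 behaviour: the caller samples the clock once for the whole subtree. */
static void model_freeze_shared(struct freezer_model *nodes, size_t n,
				int freeze, uint64_t now_nsec)
{
	for (size_t i = 0; i < n; i++)
		model_do_freeze(&nodes[i], freeze, now_nsec);
}

/* Alternative: every node samples the clock; step_nsec models walk cost. */
static void model_freeze_per_node(struct freezer_model *nodes, size_t n,
				  int freeze, uint64_t now_nsec,
				  uint64_t step_nsec)
{
	for (size_t i = 0; i < n; i++)
		model_do_freeze(&nodes[i], freeze, now_nsec + i * step_nsec);
}
```

With the shared timestamp every descendant accumulates exactly the same interval for one
freeze/thaw cycle. With per-node sampling the intervals drift apart by however long the walk
takes at freeze and at thaw, which is the difference I am asking about.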

-- 
Best regards,
Ridong



* Re: [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time
  2025-08-22  1:37 ` [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time Tiffany Yang
@ 2025-08-22  7:19   ` Chen Ridong
  2025-08-22 18:50     ` Tiffany Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Chen Ridong @ 2025-08-22  7:19 UTC (permalink / raw)
  To: Tiffany Yang, linux-kernel
  Cc: John Stultz, Thomas Gleixner, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Tejun Heo, Johannes Weiner,
	Michal Koutný, Rafael J. Wysocki, Pavel Machek,
	Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
	Shuah Khan, cgroups, linux-doc, linux-kselftest



On 2025/8/22 9:37, Tiffany Yang wrote:
> Test cgroup v2 freezer time stat. Freezer time accounting should
> be independent of other cgroups in the hierarchy and should increase
> iff a cgroup is CGRP_FREEZE (regardless of whether it reaches
> CGRP_FROZEN).
> 
> Skip these tests on systems without freeze time accounting.
> 
> Signed-off-by: Tiffany Yang <ynaffit@google.com>
> Cc: Michal Koutný <mkoutny@suse.com>
> ---
> v3 -> v4:
> * Clean up logic around skipping selftests and decrease granularity of
>   sleep times, as suggested by Michal.
> ---
>  tools/testing/selftests/cgroup/test_freezer.c | 663 ++++++++++++++++++
>  1 file changed, 663 insertions(+)
> 
> diff --git a/tools/testing/selftests/cgroup/test_freezer.c b/tools/testing/selftests/cgroup/test_freezer.c
> index 8730645d363a..dfb763819581 100644
> --- a/tools/testing/selftests/cgroup/test_freezer.c
> +++ b/tools/testing/selftests/cgroup/test_freezer.c
> @@ -804,6 +804,662 @@ static int test_cgfreezer_vfork(const char *root)
>  	return ret;
>  }
>  
> +/*
> + * Get the current frozen_usec for the cgroup.
> + */
> +static long cg_check_freezetime(const char *cgroup)
> +{
> +	return cg_read_key_long(cgroup, "cgroup.stat.local",
> +				"frozen_usec ");
> +}
> +
> +/*
> + * Test that the freeze time will behave as expected for an empty cgroup.
> + */
> +static int test_cgfreezer_time_empty(const char *root)
> +{
> +	int ret = KSFT_FAIL;
> +	char *cgroup = NULL;
> +	long prev, curr;
> +
> +	cgroup = cg_name(root, "cg_time_test_empty");
> +	if (!cgroup)
> +		goto cleanup;
> +
> +	/*
> +	 * 1) Create an empty cgroup and check that its freeze time
> +	 *    is 0.
> +	 */
> +	if (cg_create(cgroup))
> +		goto cleanup;
> +
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr < 0) {
> +		ret = KSFT_SKIP;
> +		goto cleanup;
> +	}
> +	if (curr > 0) {
> +		debug("Expect time (%ld) to be 0\n", curr);
> +		goto cleanup;
> +	}
> +

Perhaps we can simply use if (curr != 0) for the condition?
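
To spell out why that should be equivalent, here is a small standalone sketch. It assumes,
as the skip logic in these tests suggests, that a negative value means the key could not be
read. flat_key_long() and check_initial() are hypothetical stand-ins written for this mail,
not the selftest helpers themselves:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "key value" in a flat-keyed buffer such as
 * the contents of cgroup.stat.local. Returns the value, or -1 if the
 * key is not present.
 */
static long flat_key_long(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (line && *line) {
		if (!strncmp(line, key, klen) && line[klen] == ' ')
			return strtol(line + klen + 1, NULL, 10);
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

enum verdict { VERDICT_PASS, VERDICT_FAIL, VERDICT_SKIP };

/* Initial check: a negative value means "no accounting", so skip. */
static enum verdict check_initial(long curr)
{
	if (curr < 0)
		return VERDICT_SKIP;
	if (curr != 0)	/* same outcome as "curr > 0" at this point */
		return VERDICT_FAIL;
	return VERDICT_PASS;
}
```

Since negative values have already been handled by the first branch, "!= 0" and "> 0" give the
same result; the former just states the "expect 0" intent more directly.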

> +	if (cg_freeze_nowait(cgroup, true))
> +		goto cleanup;
> +
> +	/*
> +	 * 2) Sleep for 1000 us. Check that the freeze time is at
> +	 *    least 1000 us.
> +	 */
> +	usleep(1000);
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr < 1000) {
> +		debug("Expect time (%ld) to be at least 1000 us\n",
> +		      curr);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 3) Unfreeze the cgroup. Check that the freeze time is
> +	 *    larger than at 2).
> +	 */
> +	if (cg_freeze_nowait(cgroup, false))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr <= prev) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 4) Check the freeze time again to ensure that it has not
> +	 *    changed.
> +	 */
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr != prev) {
> +		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +cleanup:
> +	if (cgroup)
> +		cg_destroy(cgroup);
> +	free(cgroup);
> +	return ret;
> +}
> +
> +/*
> + * A simple test for cgroup freezer time accounting. This test follows
> + * the same flow as test_cgfreezer_time_empty, but with a single process
> + * in the cgroup.
> + */
> +static int test_cgfreezer_time_simple(const char *root)
> +{
> +	int ret = KSFT_FAIL;
> +	char *cgroup = NULL;
> +	long prev, curr;
> +
> +	cgroup = cg_name(root, "cg_time_test_simple");
> +	if (!cgroup)
> +		goto cleanup;
> +
> +	/*
> +	 * 1) Create a cgroup and check that its freeze time is 0.
> +	 */
> +	if (cg_create(cgroup))
> +		goto cleanup;
> +
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr < 0) {
> +		ret = KSFT_SKIP;
> +		goto cleanup;
> +	}
> +	if (curr > 0) {
> +		debug("Expect time (%ld) to be 0\n", curr);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 2) Populate the cgroup with one child and check that the
> +	 *    freeze time is still 0.
> +	 */
> +	cg_run_nowait(cgroup, child_fn, NULL);
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr > prev) {
> +		debug("Expect time (%ld) to be 0\n", curr);
> +		goto cleanup;
> +	}
> +
> +	if (cg_freeze_nowait(cgroup, true))
> +		goto cleanup;
> +
> +	/*
> +	 * 3) Sleep for 1000 us. Check that the freeze time is at
> +	 *    least 1000 us.
> +	 */
> +	usleep(1000);
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr < 1000) {
> +		debug("Expect time (%ld) to be at least 1000 us\n",
> +		      curr);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 4) Unfreeze the cgroup. Check that the freeze time is
> +	 *    larger than at 3).
> +	 */
> +	if (cg_freeze_nowait(cgroup, false))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr <= prev) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 5) Sleep for 1000 us. Check that the freeze time is the
> +	 *    same as at 4).
> +	 */
> +	usleep(1000);
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr != prev) {
> +		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +cleanup:
> +	if (cgroup)
> +		cg_destroy(cgroup);
> +	free(cgroup);
> +	return ret;
> +}
> +
> +/*
> + * Test that freezer time accounting works as expected, even while we're
> + * populating a cgroup with processes.
> + */
> +static int test_cgfreezer_time_populate(const char *root)
> +{
> +	int ret = KSFT_FAIL;
> +	char *cgroup = NULL;
> +	long prev, curr;
> +	int i;
> +
> +	cgroup = cg_name(root, "cg_time_test_populate");
> +	if (!cgroup)
> +		goto cleanup;
> +
> +	if (cg_create(cgroup))
> +		goto cleanup;
> +
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr < 0) {
> +		ret = KSFT_SKIP;
> +		goto cleanup;
> +	}
> +	if (curr > 0) {
> +		debug("Expect time (%ld) to be 0\n", curr);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 1) Populate the cgroup with 100 processes. Check that
> +	 *    the freeze time is 0.
> +	 */
> +	for (i = 0; i < 100; i++)
> +		cg_run_nowait(cgroup, child_fn, NULL);
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr != prev) {
> +		debug("Expect time (%ld) to be 0\n", curr);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 2) Wait for the group to become fully populated. Check
> +	 *    that the freeze time is 0.
> +	 */
> +	if (cg_wait_for_proc_count(cgroup, 100))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr != prev) {
> +		debug("Expect time (%ld) to be 0\n", curr);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 3) Freeze the cgroup and then populate it with 100 more
> +	 *    processes. Check that the freeze time continues to grow.
> +	 */
> +	if (cg_freeze_nowait(cgroup, true))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr <= prev) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	for (i = 0; i < 100; i++)
> +		cg_run_nowait(cgroup, child_fn, NULL);
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr <= prev) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 4) Wait for the group to become fully populated. Check
> +	 *    that the freeze time is larger than at 3).
> +	 */
> +	if (cg_wait_for_proc_count(cgroup, 200))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr <= prev) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 5) Unfreeze the cgroup. Check that the freeze time is
> +	 *    larger than at 4).
> +	 */
> +	if (cg_freeze_nowait(cgroup, false))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr <= prev) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 6) Kill the processes. Check that the freeze time is the
> +	 *    same as it was at 5).
> +	 */
> +	if (cg_killall(cgroup))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr != prev) {
> +		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * 7) Freeze and unfreeze the cgroup. Check that the freeze
> +	 *    time is larger than it was at 6).
> +	 */
> +	if (cg_freeze_nowait(cgroup, true))
> +		goto cleanup;
> +	if (cg_freeze_nowait(cgroup, false))
> +		goto cleanup;
> +	prev = curr;
> +	curr = cg_check_freezetime(cgroup);
> +	if (curr <= prev) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr, prev);
> +		goto cleanup;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +cleanup:
> +	if (cgroup)
> +		cg_destroy(cgroup);
> +	free(cgroup);
> +	return ret;
> +}
> +
> +/*
> + * Test that frozen time for a cgroup continues to work as expected,
> + * even as processes are migrated. Frozen cgroup A's freeze time should
> + * continue to increase and running cgroup B's should stay 0.
> + */
> +static int test_cgfreezer_time_migrate(const char *root)
> +{
> +	long prev_A, curr_A, curr_B;
> +	char *cgroup[2] = {0};
> +	int ret = KSFT_FAIL;
> +	int pid;
> +
> +	cgroup[0] = cg_name(root, "cg_time_test_migrate_A");
> +	if (!cgroup[0])
> +		goto cleanup;
> +
> +	cgroup[1] = cg_name(root, "cg_time_test_migrate_B");
> +	if (!cgroup[1])
> +		goto cleanup;
> +
> +	if (cg_create(cgroup[0]))
> +		goto cleanup;
> +
> +	if (cg_check_freezetime(cgroup[0]) < 0) {
> +		ret = KSFT_SKIP;
> +		goto cleanup;
> +	}
> +
> +	if (cg_create(cgroup[1]))
> +		goto cleanup;
> +
> +	pid = cg_run_nowait(cgroup[0], child_fn, NULL);
> +	if (pid < 0)
> +		goto cleanup;
> +
> +	if (cg_wait_for_proc_count(cgroup[0], 1))
> +		goto cleanup;
> +
> +	curr_A = cg_check_freezetime(cgroup[0]);
> +	if (curr_A) {
> +		debug("Expect time (%ld) to be 0\n", curr_A);
> +		goto cleanup;
> +	}
> +	curr_B = cg_check_freezetime(cgroup[1]);
> +	if (curr_B) {
> +		debug("Expect time (%ld) to be 0\n", curr_B);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * Freeze cgroup A.
> +	 */
> +	if (cg_freeze_wait(cgroup[0], true))
> +		goto cleanup;
> +	prev_A = curr_A;
> +	curr_A = cg_check_freezetime(cgroup[0]);
> +	if (curr_A <= prev_A) {
> +		debug("Expect time (%ld) to be > 0\n", curr_A);
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * Migrate from A (frozen) to B (running).
> +	 */
> +	if (cg_enter(cgroup[1], pid))
> +		goto cleanup;
> +
> +	usleep(1000);
> +	curr_B = cg_check_freezetime(cgroup[1]);
> +	if (curr_B) {
> +		debug("Expect time (%ld) to be 0\n", curr_B);
> +		goto cleanup;
> +	}
> +
> +	prev_A = curr_A;
> +	curr_A = cg_check_freezetime(cgroup[0]);
> +	if (curr_A <= prev_A) {
> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
> +		      curr_A, prev_A);
> +		goto cleanup;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +cleanup:
> +	if (cgroup[0])
> +		cg_destroy(cgroup[0]);
> +	free(cgroup[0]);
> +	if (cgroup[1])
> +		cg_destroy(cgroup[1]);
> +	free(cgroup[1]);
> +	return ret;
> +}
> +
> +/*
> + * The test creates a cgroup and freezes it. Then it creates a child cgroup.
> + * After that it checks that the child cgroup has a non-zero freeze time
> + * that is less than the parent's. Next, it freezes the child, unfreezes
> + * the parent, and sleeps. Finally, it checks that the child's freeze
> + * time has grown larger than the parent's.
> + */
> +static int test_cgfreezer_time_parent(const char *root)
> +{
> +	char *parent, *child = NULL;
> +	int ret = KSFT_FAIL;
> +	long ptime, ctime;
> +
> +	parent = cg_name(root, "cg_test_parent_A");
> +	if (!parent)
> +		goto cleanup;
> +
> +	child = cg_name(parent, "cg_test_parent_B");
> +	if (!child)
> +		goto cleanup;
> +
> +	if (cg_create(parent))
> +		goto cleanup;
> +
> +	if (cg_check_freezetime(parent) < 0) {
> +		ret = KSFT_SKIP;
> +		goto cleanup;
> +	}
> +
> +	if (cg_freeze_wait(parent, true))
> +		goto cleanup;
> +
> +	usleep(1000);
> +	if (cg_create(child))
> +		goto cleanup;
> +
> +	if (cg_check_frozen(child, true))
> +		goto cleanup;
> +
> +	/*
> +	 * Since the parent was frozen the entire time the child cgroup
> +	 * was being created, we expect the parent's freeze time to be
> +	 * larger than the child's.
> +	 *
> +	 * Ideally, we would be able to check both times simultaneously,
> +	 * but here we get the child's after we get the parent's.
> +	 */
> +	ptime = cg_check_freezetime(parent);
> +	ctime = cg_check_freezetime(child);
> +	if (ptime <= ctime) {
> +		debug("Expect ptime (%ld) > ctime (%ld)\n", ptime, ctime);
> +		goto cleanup;
> +	}
> +
> +	if (cg_freeze_nowait(child, true))
> +		goto cleanup;
> +
> +	if (cg_freeze_wait(parent, false))
> +		goto cleanup;
> +
> +	if (cg_check_frozen(child, true))
> +		goto cleanup;
> +
> +	usleep(100000);
> +
> +	ctime = cg_check_freezetime(child);
> +	ptime = cg_check_freezetime(parent);
> +
> +	if (ctime <= ptime) {
> +		debug("Expect ctime (%ld) > ptime (%ld)\n", ctime, ptime);
> +		goto cleanup;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +cleanup:
> +	if (child)
> +		cg_destroy(child);
> +	free(child);
> +	if (parent)
> +		cg_destroy(parent);
> +	free(parent);
> +	return ret;
> +}
> +
> +/*
> + * The test creates a parent cgroup and a child cgroup. Then, it freezes
> + * the child and checks that the child's freeze time is greater than the
> + * parent's, which should be zero.
> + */
> +static int test_cgfreezer_time_child(const char *root)
> +{
> +	char *parent, *child = NULL;
> +	int ret = KSFT_FAIL;
> +	long ptime, ctime;
> +
> +	parent = cg_name(root, "cg_test_child_A");
> +	if (!parent)
> +		goto cleanup;
> +
> +	child = cg_name(parent, "cg_test_child_B");
> +	if (!child)
> +		goto cleanup;
> +
> +	if (cg_create(parent))
> +		goto cleanup;
> +
> +	if (cg_check_freezetime(parent) < 0) {
> +		ret = KSFT_SKIP;
> +		goto cleanup;
> +	}
> +
> +	if (cg_create(child))
> +		goto cleanup;
> +
> +	if (cg_freeze_wait(child, true))
> +		goto cleanup;
> +
> +	ctime = cg_check_freezetime(child);
> +	ptime = cg_check_freezetime(parent);
> +	if (ptime != 0) {
> +		debug("Expect ptime (%ld) to be 0\n", ptime);
> +		goto cleanup;
> +	}
> +
> +	if (ctime <= ptime) {
> +		debug("Expect ctime (%ld) > ptime (%ld)\n", ctime, ptime);
> +		goto cleanup;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +cleanup:
> +	if (child)
> +		cg_destroy(child);
> +	free(child);
> +	if (parent)
> +		cg_destroy(parent);
> +	free(parent);
> +	return ret;
> +}
> +
> +/*
> + * The test creates the following hierarchy:
> + *    A
> + *    |
> + *    B
> + *    |
> + *    C
> + *
> + * Then it freezes the cgroups in the order C, B, A.
> + * Then it unfreezes the cgroups in the order A, B, C.
> + * Then it checks that C's freeze time is larger than B's and
> + * that B's is larger than A's.
> + */
> +static int test_cgfreezer_time_nested(const char *root)
> +{
> +	char *cgroup[3] = {0};
> +	int ret = KSFT_FAIL;
> +	long time[3] = {0};
> +	int i;
> +
> +	cgroup[0] = cg_name(root, "cg_test_time_A");
> +	if (!cgroup[0])
> +		goto cleanup;
> +
> +	cgroup[1] = cg_name(cgroup[0], "B");
> +	if (!cgroup[1])
> +		goto cleanup;
> +
> +	cgroup[2] = cg_name(cgroup[1], "C");
> +	if (!cgroup[2])
> +		goto cleanup;
> +
> +	if (cg_create(cgroup[0]))
> +		goto cleanup;
> +
> +	if (cg_check_freezetime(cgroup[0]) < 0) {
> +		ret = KSFT_SKIP;
> +		goto cleanup;
> +	}
> +
> +	if (cg_create(cgroup[1]))
> +		goto cleanup;
> +
> +	if (cg_create(cgroup[2]))
> +		goto cleanup;
> +
> +	if (cg_freeze_nowait(cgroup[2], true))
> +		goto cleanup;
> +
> +	if (cg_freeze_nowait(cgroup[1], true))
> +		goto cleanup;
> +
> +	if (cg_freeze_nowait(cgroup[0], true))
> +		goto cleanup;
> +
> +	usleep(1000);
> +
> +	if (cg_freeze_nowait(cgroup[0], false))
> +		goto cleanup;
> +
> +	if (cg_freeze_nowait(cgroup[1], false))
> +		goto cleanup;
> +
> +	if (cg_freeze_nowait(cgroup[2], false))
> +		goto cleanup;
> +
> +	time[2] = cg_check_freezetime(cgroup[2]);
> +	time[1] = cg_check_freezetime(cgroup[1]);
> +	time[0] = cg_check_freezetime(cgroup[0]);
> +
> +	if (time[2] <= time[1]) {
> +		debug("Expect C's time (%ld) > B's time (%ld)\n", time[2], time[1]);
> +		goto cleanup;
> +	}
> +
> +	if (time[1] <= time[0]) {
> +		debug("Expect B's time (%ld) > A's time (%ld)\n", time[1], time[0]);
> +		goto cleanup;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +cleanup:
> +	for (i = 2; i >= 0 && cgroup[i]; i--) {
> +		cg_destroy(cgroup[i]);
> +		free(cgroup[i]);
> +	}
> +
> +	return ret;
> +}
> +
>  #define T(x) { x, #x }
>  struct cgfreezer_test {
>  	int (*fn)(const char *root);
> @@ -819,6 +1475,13 @@ struct cgfreezer_test {
>  	T(test_cgfreezer_stopped),
>  	T(test_cgfreezer_ptraced),
>  	T(test_cgfreezer_vfork),
> +	T(test_cgfreezer_time_empty),
> +	T(test_cgfreezer_time_simple),
> +	T(test_cgfreezer_time_populate),
> +	T(test_cgfreezer_time_migrate),
> +	T(test_cgfreezer_time_parent),
> +	T(test_cgfreezer_time_child),
> +	T(test_cgfreezer_time_nested),
>  };
>  #undef T
>  

-- 
Best regards,
Ridong



* Re: [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer
  2025-08-22  1:37 [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer Tiffany Yang
  2025-08-22  1:37 ` [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting Tiffany Yang
  2025-08-22  1:37 ` [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time Tiffany Yang
@ 2025-08-22 17:51 ` Tejun Heo
  2 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2025-08-22 17:51 UTC (permalink / raw)
  To: Tiffany Yang
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
	Anna-Maria Behnsen, Frederic Weisbecker, Johannes Weiner,
	Michal Koutný, Rafael J. Wysocki, Pavel Machek,
	Roman Gushchin, Chen Ridong, kernel-team, Jonathan Corbet,
	Shuah Khan, cgroups, linux-doc, linux-kselftest

On Thu, Aug 21, 2025 at 06:37:51PM -0700, Tiffany Yang wrote:
> Hello,
> 
> The cgroup v2 freezer controller is useful for freezing background
> applications so they don't contend with foreground tasks. However, this
> may disrupt any internal monitoring that the application is performing,
> as it may not be aware that it was frozen.
> 
> To illustrate, an application might implement a watchdog thread to
> monitor a high-priority task by periodically checking its state to
> ensure progress. The challenge is that the task only advances when the
> application is running, but watchdog timers are set relative to system
> time, not app time. If the app is frozen and misses the expected
> deadline, the watchdog, unaware of this pause, may kill a healthy
> process.
> 
> This series tracks the time that each cgroup spends "freezing" and
> exposes it via cgroup.stat.local. Include several basic selftests to
> demonstrate the expected behavior of this interface, including that:
>   1. Freeze time will increase while a cgroup is freezing, regardless of
>      whether it is frozen or not.
>   2. Each cgroup's freeze time is independent from the other cgroups in
>      its hierarchy.
> 
> Thanks,
> Tiffany
> 
> Signed-off-by: Tiffany Yang <ynaffit@google.com>

Applied to cgroup/for-6.18. Let's address further issues incrementally.

Thanks.

-- 
tejun


* Re: [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time
  2025-08-22  7:19   ` Chen Ridong
@ 2025-08-22 18:50     ` Tiffany Yang
  2025-08-23  1:47       ` Chen Ridong
  0 siblings, 1 reply; 12+ messages in thread
From: Tiffany Yang @ 2025-08-22 18:50 UTC (permalink / raw)
  To: Chen Ridong
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
	Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
	Johannes Weiner, Michal Koutný, Rafael J. Wysocki,
	Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
	Jonathan Corbet, Shuah Khan, cgroups, linux-doc, linux-kselftest

Thanks for taking the time to look at these patches!

Chen Ridong <chenridong@huaweicloud.com> writes:

> On 2025/8/22 9:37, Tiffany Yang wrote:
>> Test cgroup v2 freezer time stat. Freezer time accounting should
>> be independent of other cgroups in the hierarchy and should increase
>> iff a cgroup is CGRP_FREEZE (regardless of whether it reaches
>> CGRP_FROZEN).

...
>> +	if (curr < 0) {
>> +		ret = KSFT_SKIP;
>> +		goto cleanup;
>> +	}
>> +	if (curr > 0) {
>> +		debug("Expect time (%ld) to be 0\n", curr);
>> +		goto cleanup;
>> +	}
>> +

> Perhaps we can simply use if (curr != 0) for the condition?


Here we have 2 separate conditions because in the case where curr < 0,
it means that the interface is not available and we should skip this
test instead of failing it. In the case where curr > 0, the feature is
not working correctly, and the test should fail as a result.
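For illustration only, the three-way split described above can be sketched in isolation. This is a minimal sketch, not the selftest code: `parse_freeze_time`, `check_initial_time`, and the `RES_*` codes are hypothetical stand-ins, and it assumes the stat file contains a line of the form `freeze_time_total <value>`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kselftest result codes. */
enum result { RES_PASS, RES_FAIL, RES_SKIP };

/*
 * Hypothetical parser: return the freeze_time_total value found in the
 * text of a cgroup.stat.local file, or -1 if the key is absent (which
 * is how this sketch models "interface not available").
 */
long parse_freeze_time(const char *stat_text)
{
	const char *key = "freeze_time_total ";
	const char *p = stat_text ? strstr(stat_text, key) : NULL;

	if (!p)
		return -1;
	return strtol(p + strlen(key), NULL, 10);
}

/* Classify the freeze time of a cgroup that has never been frozen. */
enum result check_initial_time(long curr)
{
	if (curr < 0)
		return RES_SKIP;	/* interface missing: skip, don't fail */
	if (curr > 0)
		return RES_FAIL;	/* time accrued without any freeze */
	return RES_PASS;
}
```

Folding both branches into a single `curr != 0` check would report a kernel without the interface as a failure rather than a skip.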

>> +	if (cg_freeze_nowait(cgroup, true))
>> +		goto cleanup;
>> +
>> +	/*
>> +	 * 2) Sleep for 1000 us. Check that the freeze time is at
>> +	 *    least 1000 us.
>> +	 */
>> +	usleep(1000);
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr < 1000) {
>> +		debug("Expect time (%ld) to be at least 1000 us\n",
>> +		      curr);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 3) Unfreeze the cgroup. Check that the freeze time is
>> +	 *    larger than at 2).
>> +	 */
>> +	if (cg_freeze_nowait(cgroup, false))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr <= prev) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 4) Check the freeze time again to ensure that it has not
>> +	 *    changed.
>> +	 */
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr != prev) {
>> +		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = KSFT_PASS;
>> +
>> +cleanup:
>> +	if (cgroup)
>> +		cg_destroy(cgroup);
>> +	free(cgroup);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * A simple test for cgroup freezer time accounting. This test follows
>> + * the same flow as test_cgfreezer_time_empty, but with a single process
>> + * in the cgroup.
>> + */
>> +static int test_cgfreezer_time_simple(const char *root)
>> +{
>> +	int ret = KSFT_FAIL;
>> +	char *cgroup = NULL;
>> +	long prev, curr;
>> +
>> +	cgroup = cg_name(root, "cg_time_test_simple");
>> +	if (!cgroup)
>> +		goto cleanup;
>> +
>> +	/*
>> +	 * 1) Create a cgroup and check that its freeze time is 0.
>> +	 */
>> +	if (cg_create(cgroup))
>> +		goto cleanup;
>> +
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr < 0) {
>> +		ret = KSFT_SKIP;
>> +		goto cleanup;
>> +	}
>> +	if (curr > 0) {
>> +		debug("Expect time (%ld) to be 0\n", curr);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 2) Populate the cgroup with one child and check that the
>> +	 *    freeze time is still 0.
>> +	 */
>> +	cg_run_nowait(cgroup, child_fn, NULL);
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr > prev) {
>> +		debug("Expect time (%ld) to be 0\n", curr);
>> +		goto cleanup;
>> +	}
>> +
>> +	if (cg_freeze_nowait(cgroup, true))
>> +		goto cleanup;
>> +
>> +	/*
>> +	 * 3) Sleep for 1000 us. Check that the freeze time is at
>> +	 *    least 1000 us.
>> +	 */
>> +	usleep(1000);
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr < 1000) {
>> +		debug("Expect time (%ld) to be at least 1000 us\n",
>> +		      curr);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 4) Unfreeze the cgroup. Check that the freeze time is
>> +	 *    larger than at 3).
>> +	 */
>> +	if (cg_freeze_nowait(cgroup, false))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr <= prev) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 5) Sleep for 1000 us. Check that the freeze time is the
>> +	 *    same as at 4).
>> +	 */
>> +	usleep(1000);
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr != prev) {
>> +		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = KSFT_PASS;
>> +
>> +cleanup:
>> +	if (cgroup)
>> +		cg_destroy(cgroup);
>> +	free(cgroup);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Test that freezer time accounting works as expected, even while we're
>> + * populating a cgroup with processes.
>> + */
>> +static int test_cgfreezer_time_populate(const char *root)
>> +{
>> +	int ret = KSFT_FAIL;
>> +	char *cgroup = NULL;
>> +	long prev, curr;
>> +	int i;
>> +
>> +	cgroup = cg_name(root, "cg_time_test_populate");
>> +	if (!cgroup)
>> +		goto cleanup;
>> +
>> +	if (cg_create(cgroup))
>> +		goto cleanup;
>> +
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr < 0) {
>> +		ret = KSFT_SKIP;
>> +		goto cleanup;
>> +	}
>> +	if (curr > 0) {
>> +		debug("Expect time (%ld) to be 0\n", curr);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 1) Populate the cgroup with 100 processes. Check that
>> +	 *    the freeze time is 0.
>> +	 */
>> +	for (i = 0; i < 100; i++)
>> +		cg_run_nowait(cgroup, child_fn, NULL);
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr != prev) {
>> +		debug("Expect time (%ld) to be 0\n", curr);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 2) Wait for the group to become fully populated. Check
>> +	 *    that the freeze time is 0.
>> +	 */
>> +	if (cg_wait_for_proc_count(cgroup, 100))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr != prev) {
>> +		debug("Expect time (%ld) to be 0\n", curr);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 3) Freeze the cgroup and then populate it with 100 more
>> +	 *    processes. Check that the freeze time continues to grow.
>> +	 */
>> +	if (cg_freeze_nowait(cgroup, true))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr <= prev) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	for (i = 0; i < 100; i++)
>> +		cg_run_nowait(cgroup, child_fn, NULL);
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr <= prev) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 4) Wait for the group to become fully populated. Check
>> +	 *    that the freeze time is larger than at 3).
>> +	 */
>> +	if (cg_wait_for_proc_count(cgroup, 200))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr <= prev) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 5) Unfreeze the cgroup. Check that the freeze time is
>> +	 *    larger than at 4).
>> +	 */
>> +	if (cg_freeze_nowait(cgroup, false))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr <= prev) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 6) Kill the processes. Check that the freeze time is the
>> +	 *    same as it was at 5).
>> +	 */
>> +	if (cg_killall(cgroup))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr != prev) {
>> +		debug("Expect time (%ld) to be unchanged from previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * 7) Freeze and unfreeze the cgroup. Check that the freeze
>> +	 *    time is larger than it was at 6).
>> +	 */
>> +	if (cg_freeze_nowait(cgroup, true))
>> +		goto cleanup;
>> +	if (cg_freeze_nowait(cgroup, false))
>> +		goto cleanup;
>> +	prev = curr;
>> +	curr = cg_check_freezetime(cgroup);
>> +	if (curr <= prev) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr, prev);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = KSFT_PASS;
>> +
>> +cleanup:
>> +	if (cgroup)
>> +		cg_destroy(cgroup);
>> +	free(cgroup);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Test that frozen time for a cgroup continues to work as expected,
>> + * even as processes are migrated. Frozen cgroup A's freeze time should
>> + * continue to increase and running cgroup B's should stay 0.
>> + */
>> +static int test_cgfreezer_time_migrate(const char *root)
>> +{
>> +	long prev_A, curr_A, curr_B;
>> +	char *cgroup[2] = {0};
>> +	int ret = KSFT_FAIL;
>> +	int pid;
>> +
>> +	cgroup[0] = cg_name(root, "cg_time_test_migrate_A");
>> +	if (!cgroup[0])
>> +		goto cleanup;
>> +
>> +	cgroup[1] = cg_name(root, "cg_time_test_migrate_B");
>> +	if (!cgroup[1])
>> +		goto cleanup;
>> +
>> +	if (cg_create(cgroup[0]))
>> +		goto cleanup;
>> +
>> +	if (cg_check_freezetime(cgroup[0]) < 0) {
>> +		ret = KSFT_SKIP;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (cg_create(cgroup[1]))
>> +		goto cleanup;
>> +
>> +	pid = cg_run_nowait(cgroup[0], child_fn, NULL);
>> +	if (pid < 0)
>> +		goto cleanup;
>> +
>> +	if (cg_wait_for_proc_count(cgroup[0], 1))
>> +		goto cleanup;
>> +
>> +	curr_A = cg_check_freezetime(cgroup[0]);
>> +	if (curr_A) {
>> +		debug("Expect time (%ld) to be 0\n", curr_A);
>> +		goto cleanup;
>> +	}
>> +	curr_B = cg_check_freezetime(cgroup[1]);
>> +	if (curr_B) {
>> +		debug("Expect time (%ld) to be 0\n", curr_B);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * Freeze cgroup A.
>> +	 */
>> +	if (cg_freeze_wait(cgroup[0], true))
>> +		goto cleanup;
>> +	prev_A = curr_A;
>> +	curr_A = cg_check_freezetime(cgroup[0]);
>> +	if (curr_A <= prev_A) {
>> +		debug("Expect time (%ld) to be > 0\n", curr_A);
>> +		goto cleanup;
>> +	}
>> +
>> +	/*
>> +	 * Migrate from A (frozen) to B (running).
>> +	 */
>> +	if (cg_enter(cgroup[1], pid))
>> +		goto cleanup;
>> +
>> +	usleep(1000);
>> +	curr_B = cg_check_freezetime(cgroup[1]);
>> +	if (curr_B) {
>> +		debug("Expect time (%ld) to be 0\n", curr_B);
>> +		goto cleanup;
>> +	}
>> +
>> +	prev_A = curr_A;
>> +	curr_A = cg_check_freezetime(cgroup[0]);
>> +	if (curr_A <= prev_A) {
>> +		debug("Expect time (%ld) to be more than previous check (%ld)\n",
>> +		      curr_A, prev_A);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = KSFT_PASS;
>> +
>> +cleanup:
>> +	if (cgroup[0])
>> +		cg_destroy(cgroup[0]);
>> +	free(cgroup[0]);
>> +	if (cgroup[1])
>> +		cg_destroy(cgroup[1]);
>> +	free(cgroup[1]);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * The test creates a cgroup and freezes it. Then it creates a child cgroup.
>> + * After that it checks that the child cgroup has a non-zero freeze time
>> + * that is less than the parent's. Next, it freezes the child, unfreezes
>> + * the parent, and sleeps. Finally, it checks that the child's freeze
>> + * time has grown larger than the parent's.
>> + */
>> +static int test_cgfreezer_time_parent(const char *root)
>> +{
>> +	char *parent, *child = NULL;
>> +	int ret = KSFT_FAIL;
>> +	long ptime, ctime;
>> +
>> +	parent = cg_name(root, "cg_test_parent_A");
>> +	if (!parent)
>> +		goto cleanup;
>> +
>> +	child = cg_name(parent, "cg_test_parent_B");
>> +	if (!child)
>> +		goto cleanup;
>> +
>> +	if (cg_create(parent))
>> +		goto cleanup;
>> +
>> +	if (cg_check_freezetime(parent) < 0) {
>> +		ret = KSFT_SKIP;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (cg_freeze_wait(parent, true))
>> +		goto cleanup;
>> +
>> +	usleep(1000);
>> +	if (cg_create(child))
>> +		goto cleanup;
>> +
>> +	if (cg_check_frozen(child, true))
>> +		goto cleanup;
>> +
>> +	/*
>> +	 * Since the parent was frozen the entire time the child cgroup
>> +	 * was being created, we expect the parent's freeze time to be
>> +	 * larger than the child's.
>> +	 *
>> +	 * Ideally, we would be able to check both times simultaneously,
>> +	 * but here we get the child's after we get the parent's.
>> +	 */
>> +	ptime = cg_check_freezetime(parent);
>> +	ctime = cg_check_freezetime(child);
>> +	if (ptime <= ctime) {
>> +		debug("Expect ptime (%ld) > ctime (%ld)\n", ptime, ctime);
>> +		goto cleanup;
>> +	}
>> +
>> +	if (cg_freeze_nowait(child, true))
>> +		goto cleanup;
>> +
>> +	if (cg_freeze_wait(parent, false))
>> +		goto cleanup;
>> +
>> +	if (cg_check_frozen(child, true))
>> +		goto cleanup;
>> +
>> +	usleep(100000);
>> +
>> +	ctime = cg_check_freezetime(child);
>> +	ptime = cg_check_freezetime(parent);
>> +
>> +	if (ctime <= ptime) {
>> +		debug("Expect ctime (%ld) > ptime (%ld)\n", ctime, ptime);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = KSFT_PASS;
>> +
>> +cleanup:
>> +	if (child)
>> +		cg_destroy(child);
>> +	free(child);
>> +	if (parent)
>> +		cg_destroy(parent);
>> +	free(parent);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * The test creates a parent cgroup and a child cgroup. Then, it freezes
>> + * the child and checks that the child's freeze time is greater than the
>> + * parent's, which should be zero.
>> + */
>> +static int test_cgfreezer_time_child(const char *root)
>> +{
>> +	char *parent, *child = NULL;
>> +	int ret = KSFT_FAIL;
>> +	long ptime, ctime;
>> +
>> +	parent = cg_name(root, "cg_test_child_A");
>> +	if (!parent)
>> +		goto cleanup;
>> +
>> +	child = cg_name(parent, "cg_test_child_B");
>> +	if (!child)
>> +		goto cleanup;
>> +
>> +	if (cg_create(parent))
>> +		goto cleanup;
>> +
>> +	if (cg_check_freezetime(parent) < 0) {
>> +		ret = KSFT_SKIP;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (cg_create(child))
>> +		goto cleanup;
>> +
>> +	if (cg_freeze_wait(child, true))
>> +		goto cleanup;
>> +
>> +	ctime = cg_check_freezetime(child);
>> +	ptime = cg_check_freezetime(parent);
>> +	if (ptime != 0) {
>> +		debug("Expect ptime (%ld) to be 0\n", ptime);
>> +		goto cleanup;
>> +	}
>> +
>> +	if (ctime <= ptime) {
>> +		debug("Expect ctime (%ld) > ptime (%ld)\n", ctime, ptime);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = KSFT_PASS;
>> +
>> +cleanup:
>> +	if (child)
>> +		cg_destroy(child);
>> +	free(child);
>> +	if (parent)
>> +		cg_destroy(parent);
>> +	free(parent);
>> +	return ret;
>> +}
>> +
>> +/*
>> + * The test creates the following hierarchy:
>> + *    A
>> + *    |
>> + *    B
>> + *    |
>> + *    C
>> + *
>> + * Then it freezes the cgroups in the order C, B, A.
>> + * Then it unfreezes the cgroups in the order A, B, C.
>> + * Then it checks that C's freeze time is larger than B's and
>> + * that B's is larger than A's.
>> + */
>> +static int test_cgfreezer_time_nested(const char *root)
>> +{
>> +	char *cgroup[3] = {0};
>> +	int ret = KSFT_FAIL;
>> +	long time[3] = {0};
>> +	int i;
>> +
>> +	cgroup[0] = cg_name(root, "cg_test_time_A");
>> +	if (!cgroup[0])
>> +		goto cleanup;
>> +
>> +	cgroup[1] = cg_name(cgroup[0], "B");
>> +	if (!cgroup[1])
>> +		goto cleanup;
>> +
>> +	cgroup[2] = cg_name(cgroup[1], "C");
>> +	if (!cgroup[2])
>> +		goto cleanup;
>> +
>> +	if (cg_create(cgroup[0]))
>> +		goto cleanup;
>> +
>> +	if (cg_check_freezetime(cgroup[0]) < 0) {
>> +		ret = KSFT_SKIP;
>> +		goto cleanup;
>> +	}
>> +
>> +	if (cg_create(cgroup[1]))
>> +		goto cleanup;
>> +
>> +	if (cg_create(cgroup[2]))
>> +		goto cleanup;
>> +
>> +	if (cg_freeze_nowait(cgroup[2], true))
>> +		goto cleanup;
>> +
>> +	if (cg_freeze_nowait(cgroup[1], true))
>> +		goto cleanup;
>> +
>> +	if (cg_freeze_nowait(cgroup[0], true))
>> +		goto cleanup;
>> +
>> +	usleep(1000);
>> +
>> +	if (cg_freeze_nowait(cgroup[0], false))
>> +		goto cleanup;
>> +
>> +	if (cg_freeze_nowait(cgroup[1], false))
>> +		goto cleanup;
>> +
>> +	if (cg_freeze_nowait(cgroup[2], false))
>> +		goto cleanup;
>> +
>> +	time[2] = cg_check_freezetime(cgroup[2]);
>> +	time[1] = cg_check_freezetime(cgroup[1]);
>> +	time[0] = cg_check_freezetime(cgroup[0]);
>> +
>> +	if (time[2] <= time[1]) {
>> +		debug("Expect C's time (%ld) > B's time (%ld)\n", time[2], time[1]);
>> +		goto cleanup;
>> +	}
>> +
>> +	if (time[1] <= time[0]) {
>> +		debug("Expect B's time (%ld) > A's time (%ld)\n", time[1], time[0]);
>> +		goto cleanup;
>> +	}
>> +
>> +	ret = KSFT_PASS;
>> +
>> +cleanup:
>> +	for (i = 2; i >= 0 && cgroup[i]; i--) {
>> +		cg_destroy(cgroup[i]);
>> +		free(cgroup[i]);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   #define T(x) { x, #x }
>>   struct cgfreezer_test {
>>   	int (*fn)(const char *root);
>> @@ -819,6 +1475,13 @@ struct cgfreezer_test {
>>   	T(test_cgfreezer_stopped),
>>   	T(test_cgfreezer_ptraced),
>>   	T(test_cgfreezer_vfork),
>> +	T(test_cgfreezer_time_empty),
>> +	T(test_cgfreezer_time_simple),
>> +	T(test_cgfreezer_time_populate),
>> +	T(test_cgfreezer_time_migrate),
>> +	T(test_cgfreezer_time_parent),
>> +	T(test_cgfreezer_time_child),
>> +	T(test_cgfreezer_time_nested),
>>   };
>>   #undef T


-- 
Tiffany Y. Yang


* Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
  2025-08-22  6:58     ` Chen Ridong
@ 2025-08-22 19:32       ` Tiffany Yang
  2025-08-23  1:45         ` Chen Ridong
  0 siblings, 1 reply; 12+ messages in thread
From: Tiffany Yang @ 2025-08-22 19:32 UTC (permalink / raw)
  To: Chen Ridong
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
	Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
	Johannes Weiner, Michal Koutný, Rafael J. Wysocki,
	Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
	Jonathan Corbet, Shuah Khan, cgroups, linux-doc, linux-kselftest

Hi Chen,

Thanks again for taking a look!

Chen Ridong <chenridong@huaweicloud.com> writes:

> On 2025/8/22 14:14, Chen Ridong wrote:


>> On 2025/8/22 9:37, Tiffany Yang wrote:
>>> There isn't yet a clear way to identify a set of "lost" time that
>>> everyone (or at least a wider group of users) cares about. However,
>>> users can perform some delay accounting by iterating over components of
>>> interest. This patch allows cgroup v2 freezing time to be one of those
>>> components.

>>> Track the cumulative time that each v2 cgroup spends freezing and expose
>>> it to userland via a new local stat file in cgroupfs. Thank you to
>>> Michal, who provided the ASCII art in the updated documentation.

>>> To access this value:
>>>    $ mkdir /sys/fs/cgroup/test
>>>    $ cat /sys/fs/cgroup/test/cgroup.stat.local
>>>    freeze_time_total 0

>>> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
>>> sequence counter. Writes are serialized using the css_set_lock.

...

>>>   	spin_lock_irq(&css_set_lock);
>>> -	if (freeze)
>>> +	write_seqcount_begin(&cgrp->freezer.freeze_seq);
>>> +	if (freeze) {
>>>   		set_bit(CGRP_FREEZE, &cgrp->flags);
>>> -	else
>>> +		cgrp->freezer.freeze_start_nsec = ts_nsec;
>>> +	} else {
>>>   		clear_bit(CGRP_FREEZE, &cgrp->flags);
>>> +		cgrp->freezer.frozen_nsec += (ts_nsec -
>>> +			cgrp->freezer.freeze_start_nsec);
>>> +	}
>>> +	write_seqcount_end(&cgrp->freezer.freeze_seq);
>>>   	spin_unlock_irq(&css_set_lock);


>> Hello Tiffany,

>> I wanted to check if there are any specific considerations regarding how
>> we should input the ts_nsec value.

>> Would it be possible to define this directly within the cgroup_do_freeze
>> function rather than passing it as a parameter? This approach might
>> simplify the implementation and potentially improve timing accuracy when
>> there are lots of descendants.


> I revisited v3, and this was Michal's point.
>          p
>       /  |  \
>      1  ...  n
> When we freeze the parent group p, is it expected that all descendant
> cgroups (1 to n) should share the same frozen timestamp?


Yes, this is the expectation from the current change. I understand your
concern about the accuracy of this measurement (especially when there
are many descendants), but I agree with Michal's point that the time to
traverse the descendant cgroups is basically noise relative to the
quantity we're trying to measure here.
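For illustration only, here is a small userspace model of the accounting under discussion. It is a sketch under stated assumptions, not the kernel code: the field names follow the quoted hunk, but `struct freezer_model` and the `model_*` functions are hypothetical, locking and the sequence counter are left out, and the read-side formula (completed total plus any freeze in progress) is inferred from the cover letter's statement that the time keeps growing while a cgroup is freezing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the per-cgroup freezer time fields. */
struct freezer_model {
	bool freezing;			/* models the CGRP_FREEZE flag */
	uint64_t freeze_start_nsec;	/* timestamp of the latest freeze */
	uint64_t frozen_nsec;		/* total over completed freezes */
};

/* Apply one freeze or unfreeze using a caller-supplied timestamp. */
void model_do_freeze(struct freezer_model *f, bool freeze, uint64_t ts_nsec)
{
	if (freeze == f->freezing)
		return;		/* model simplification: ignore no-op writes */
	f->freezing = freeze;
	if (freeze)
		f->freeze_start_nsec = ts_nsec;
	else
		f->frozen_nsec += ts_nsec - f->freeze_start_nsec;
}

/* Freeze or unfreeze n cgroups with one timestamp taken up front. */
void model_freeze_all(struct freezer_model *cg, int n, bool freeze,
		      uint64_t ts_nsec)
{
	for (int i = 0; i < n; i++)
		model_do_freeze(&cg[i], freeze, ts_nsec);
}

/* Reported total: completed freezes plus any freeze still in progress. */
uint64_t model_freeze_time(const struct freezer_model *f, uint64_t now_nsec)
{
	uint64_t total = f->frozen_nsec;

	if (f->freezing)
		total += now_nsec - f->freeze_start_nsec;
	return total;
}
```

Because every descendant receives the same `ts_nsec`, the time spent walking the tree never shows up as a difference between their totals.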

> If the cgroup tree structure is stable, the exact frozen time may not
> really matter. However, if the tree is not stable, is obtaining the same
> frozen time acceptable?

I'm a little unclear as to what you mean about when the cgroup tree is
unstable. In the case where a new descendant of p is being created, I
believe the cgroup_mutex prevents that from happening at the same time
as we are freezing p's other descendants. If it won the race, was
created unfrozen under p, and then became frozen during cgroup_freeze,
it would have the same timestamp as the other descendants. If it lost
the race and was created as a frozen cgroup under p, it would get its
own timestamp in cgroup_create, so its freezing duration would be
slightly less than that of the others in the hierarchy. Both values
would be acceptable for our purposes, but if there was a different case
you had in mind, please let me know!

Thanks,
-- 
Tiffany Y. Yang


* Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
  2025-08-22 19:32       ` Tiffany Yang
@ 2025-08-23  1:45         ` Chen Ridong
  2025-08-25 21:00           ` Tiffany Yang
  0 siblings, 1 reply; 12+ messages in thread
From: Chen Ridong @ 2025-08-23  1:45 UTC (permalink / raw)
  To: Tiffany Yang
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
	Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
	Johannes Weiner, Michal Koutný, Rafael J. Wysocki,
	Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
	Jonathan Corbet, Shuah Khan, cgroups, linux-doc, linux-kselftest



On 2025/8/23 3:32, Tiffany Yang wrote:
> Hi Chen,
> 
> Thanks again for taking a look!
> 
> Chen Ridong <chenridong@huaweicloud.com> writes:
> 
>> On 2025/8/22 14:14, Chen Ridong wrote:
> 
> 
>>> On 2025/8/22 9:37, Tiffany Yang wrote:
>>>> There isn't yet a clear way to identify a set of "lost" time that
>>>> everyone (or at least a wider group of users) cares about. However,
>>>> users can perform some delay accounting by iterating over components of
>>>> interest. This patch allows cgroup v2 freezing time to be one of those
>>>> components.
> 
>>>> Track the cumulative time that each v2 cgroup spends freezing and expose
>>>> it to userland via a new local stat file in cgroupfs. Thank you to
>>>> Michal, who provided the ASCII art in the updated documentation.
> 
>>>> To access this value:
>>>>    $ mkdir /sys/fs/cgroup/test
>>>>    $ cat /sys/fs/cgroup/test/cgroup.stat.local
>>>>    freeze_time_total 0
> 
>>>> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
>>>> sequence counter. Writes are serialized using the css_set_lock.
> 
> ...
> 
>>>>       spin_lock_irq(&css_set_lock);
>>>> -    if (freeze)
>>>> +    write_seqcount_begin(&cgrp->freezer.freeze_seq);
>>>> +    if (freeze) {
>>>>           set_bit(CGRP_FREEZE, &cgrp->flags);
>>>> -    else
>>>> +        cgrp->freezer.freeze_start_nsec = ts_nsec;
>>>> +    } else {
>>>>           clear_bit(CGRP_FREEZE, &cgrp->flags);
>>>> +        cgrp->freezer.frozen_nsec += (ts_nsec -
>>>> +            cgrp->freezer.freeze_start_nsec);
>>>> +    }
>>>> +    write_seqcount_end(&cgrp->freezer.freeze_seq);
>>>>       spin_unlock_irq(&css_set_lock);
> 
> 
>>> Hello Tiffany,
> 
>>> I wanted to check if there are any specific considerations regarding how we should input the ts_nsec
>>> value.
> 
>>> Would it be possible to define this directly within the cgroup_do_freeze function rather than
>>> passing it as a parameter? This approach might simplify the implementation and potentially improve
>>> timing accuracy when there are lots of descendants.
> 
> 
>> I revisited v3, and this was Michal's point.
>>          p
>>       /  |  \
>>      1  ...  n
>> When we freeze the parent group p, is it expected that all descendant cgroups (1 to n) should share
>> the same frozen timestamp?
> 
> 
> Yes, this is the expectation from the current change. I understand your
> concern about the accuracy of this measurement (especially when there
> are many descendants), but I agree with Michal's point that the time to
> traverse the descendant cgroups is basically noise relative to the
> quantity we're trying to measure here.
> 
>> If the cgroup tree structure is stable, the exact frozen time may not really matter. However, if
>> the tree is not stable, is obtaining the same frozen time acceptable?
> 
> I'm a little unclear as to what you mean about when the cgroup tree is
> unstable. In the case where a new descendant of p is being created, I
> believe the cgroup_mutex prevents that from happening at the same time
> as we are freezing p's other descendants. If it won the race, was
> created unfrozen under p, and then became frozen during cgroup_freeze,
> it would have the same timestamp as the other descendants. If it lost
> the race and was created as a frozen cgroup under p, it would get its
> own timestamp in cgroup_create, so its freezing duration would be
> slightly less than that of the others in the hierarchy. Both values
> would be acceptable for our purposes, but if there was a different case
> you had in mind, please let me know!
> 
> Thanks,

What I mean by "not stable" is that cgroups 1 through n might be deleted or have more descendants
created. For example:

          n  n-1  n-2  ...  1
frozen    a  a+1  a+2  ...  a+n
unfrozen  b  b+1  b+2  ...  b+n
nsec      b-a ...

In this case, all frozen_nsec values are b - a, which I believe is correct.
However, consider a scenario where some cgroups are deleted:

          n  n-1  n-2  ...  1
frozen    a  a+1  a+2  ...  a+n
// 2 ... n-1 are deleted.
unfrozen  b                 b+1

Here, the frozen_nsec for cgroup n would be b - a, but for cgroup 1 it would be (b + 1) - (a + n).
This could introduce some discrepancy / timing inaccuracies.

-- 
Best regards,
Ridong



* Re: [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time
  2025-08-22 18:50     ` Tiffany Yang
@ 2025-08-23  1:47       ` Chen Ridong
  0 siblings, 0 replies; 12+ messages in thread
From: Chen Ridong @ 2025-08-23  1:47 UTC (permalink / raw)
  To: Tiffany Yang
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
	Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
	Johannes Weiner, Michal Koutný, Rafael J. Wysocki,
	Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
	Jonathan Corbet, Shuah Khan, cgroups, linux-doc, linux-kselftest



On 2025/8/23 2:50, Tiffany Yang wrote:
>> Perhaps we can simply use if (curr != 0) for the condition?
> 
> 
> Here we have 2 separate conditions because in the case where curr < 0,
> it means that the interface is not available and we should skip this
> test instead of failing it. In the case where curr > 0, the feature is
> not working correctly, and the test should fail as a result.

Thank you for your explanation.
I see now.

-- 
Best regards,
Ridong



* Re: [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting
  2025-08-23  1:45         ` Chen Ridong
@ 2025-08-25 21:00           ` Tiffany Yang
  0 siblings, 0 replies; 12+ messages in thread
From: Tiffany Yang @ 2025-08-25 21:00 UTC (permalink / raw)
  To: Chen Ridong
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Stephen Boyd,
	Anna-Maria Behnsen, Frederic Weisbecker, Tejun Heo,
	Johannes Weiner, Michal Koutný, Rafael J. Wysocki,
	Pavel Machek, Roman Gushchin, Chen Ridong, kernel-team,
	Jonathan Corbet, Shuah Khan, cgroups, linux-doc, linux-kselftest

Chen Ridong <chenridong@huaweicloud.com> writes:

...

>> Thanks,

> What I mean by "not stable" is that cgroups 1 through n might be deleted
> or have more descendants created. For example:

>           n  n-1  n-2  ...  1
> frozen    a  a+1  a+2  ...  a+n
> unfrozen  b  b+1  b+2  ...  b+n
> nsec      b-a ...

> In this case, all frozen_nsec values are b - a, which I believe is
> correct.
> However, consider a scenario where some cgroups are deleted:

>           n  n-1  n-2  ...  1
> frozen    a  a+1  a+2  ...  a+n
> // 2 ... n-1 are deleted.
> unfrozen  b                 b+1

> Here, the frozen_nsec for cgroup n would be b - a, but for cgroup 1 it
> would be (b + 1) - (a + n).
> This could introduce some discrepancy / timing inaccuracies.

Ah, I think I see what you're saying. I had a similar concern when I had
been looking to track this value per-task rather than per-cgroup (i.e.,
when there are many tasks, the frozen duration recorded for the cgroup
drifts from the duration that the task is actually frozen). Ultimately,
although those inaccuracies exist, for the time scales in our use case,
they would not grow large enough to make an appreciable
difference. To use your example, even with the ~(n - 1) difference, the
"true" frozen duration and the reported one are still effectively the
same (to us). For others, their systems may see a much larger "n" than
we might realistically see on ours, or they may need finer-grained
reporting, so this solution may not be adequate.
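
For illustration only, the arithmetic in the quoted example can be worked through directly. This is a sketch with hypothetical helper names; a, b, and n are abstract time steps taken from the quoted tables, not measured values.

```c
#include <stdint.h>

/* Hypothetical helper: duration between a freeze and an unfreeze stamp. */
int64_t frozen_duration(int64_t freeze_ts, int64_t unfreeze_ts)
{
	return unfreeze_ts - freeze_ts;
}

/*
 * Drift between the first and last cgroup in the second quoted table:
 * cgroup n is stamped at a and b, cgroup 1 at a + n and b + 1.
 */
int64_t example_drift(int64_t a, int64_t b, int64_t n)
{
	int64_t first = frozen_duration(a, b);
	int64_t last = frozen_duration(a + n, b + 1);

	return first - last;	/* simplifies to n - 1 */
}
```

The drift depends only on the number of cgroups visited, not on how long the freeze lasts, so it shrinks in relative terms as the frozen interval grows.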

-- 
Tiffany Y. Yang


Thread overview: 12+ messages
2025-08-22  1:37 [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer Tiffany Yang
2025-08-22  1:37 ` [PATCH v4 1/2] cgroup: cgroup.stat.local time accounting Tiffany Yang
2025-08-22  6:14   ` Chen Ridong
2025-08-22  6:58     ` Chen Ridong
2025-08-22 19:32       ` Tiffany Yang
2025-08-23  1:45         ` Chen Ridong
2025-08-25 21:00           ` Tiffany Yang
2025-08-22  1:37 ` [PATCH v4 2/2] cgroup: selftests: Add tests for freezer time Tiffany Yang
2025-08-22  7:19   ` Chen Ridong
2025-08-22 18:50     ` Tiffany Yang
2025-08-23  1:47       ` Chen Ridong
2025-08-22 17:51 ` [PATCH v4 0/2] cgroup: Track time in cgroup v2 freezer Tejun Heo
