linux-doc.vger.kernel.org archive mirror
* [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Waiman Long @ 2024-07-15 15:00 UTC
  To: Tejun Heo, Zefan Li, Johannes Weiner, Michal Koutný,
	Jonathan Corbet
  Cc: cgroups, linux-doc, linux-kernel, Kamalesh Babulal,
	Roman Gushchin, Waiman Long

Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
help manage different structures in various cgroup subsystems by being
an embedded element inside a larger structure like cpuset or mem_cgroup.

The /proc/cgroups file shows the number of cgroups for each of the
subsystems.  With cgroup v1, the number of CSSes is the same as the
number of cgroups.  That is not the case anymore with cgroup v2. The
/proc/cgroups file cannot show the actual number of CSSes for the
subsystems that are bound to cgroup v2.

So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
we can't tell by looking at /proc/cgroups which cgroup subsystems may
be responsible.

As cgroup v2 has deprecated the use of /proc/cgroups, the hierarchical
cgroup.stat file is now being extended to show the number of live and
dying CSSes associated with all the non-inhibited cgroup subsystems that
have been bound to cgroup v2. The number includes CSSes in the current
cgroup as well as in all the descendants underneath it.  This will help
us pinpoint which subsystems are responsible for the increasing number
of dying (nr_dying_descendants) cgroups.

The dying CSS counts are stored in the cgroup structure itself
instead of inside the CSS, as suggested by Johannes. This allows
us to accurately track the dying counts of cgroup subsystems that
have recently been disabled in a cgroup. It is therefore possible
for a subsystem to show a zero live count coupled with a non-zero
dying count.

The cgroup-v2.rst file is updated to discuss this new behavior.

With this patch applied, a sample output from the root cgroup.stat
file is shown below.

	nr_descendants 56
	nr_subsys_cpuset 1
	nr_subsys_cpu 43
	nr_subsys_io 43
	nr_subsys_memory 56
	nr_subsys_perf_event 57
	nr_subsys_hugetlb 1
	nr_subsys_pids 56
	nr_subsys_rdma 1
	nr_subsys_misc 1
	nr_dying_descendants 30
	nr_dying_subsys_cpuset 0
	nr_dying_subsys_cpu 0
	nr_dying_subsys_io 0
	nr_dying_subsys_memory 30
	nr_dying_subsys_perf_event 0
	nr_dying_subsys_hugetlb 0
	nr_dying_subsys_pids 0
	nr_dying_subsys_rdma 0
	nr_dying_subsys_misc 0

Another sample output from system.slice/cgroup.stat:

	nr_descendants 34
	nr_subsys_cpuset 0
	nr_subsys_cpu 32
	nr_subsys_io 32
	nr_subsys_memory 34
	nr_subsys_perf_event 35
	nr_subsys_hugetlb 0
	nr_subsys_pids 34
	nr_subsys_rdma 0
	nr_subsys_misc 0
	nr_dying_descendants 30
	nr_dying_subsys_cpuset 0
	nr_dying_subsys_cpu 0
	nr_dying_subsys_io 0
	nr_dying_subsys_memory 30
	nr_dying_subsys_perf_event 0
	nr_dying_subsys_hugetlb 0
	nr_dying_subsys_pids 0
	nr_dying_subsys_rdma 0
	nr_dying_subsys_misc 0
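
Since each cgroup.stat line is a plain "key value" pair, leak triage
can be scripted. The following is a minimal illustrative sketch, not
part of this patch; the default /sys/fs/cgroup v2 mount path is an
assumption:

	#include <stdio.h>
	#include <string.h>

	int main(int argc, char **argv)
	{
		const char *path = argc > 1 ? argv[1]
					    : "/sys/fs/cgroup/cgroup.stat";
		FILE *f = fopen(path, "r");
		char key[64];
		int val;

		if (!f) {
			perror(path);
			return 1;
		}
		/* each line is "<key> <value>", e.g. "nr_dying_subsys_memory 30" */
		while (fscanf(f, "%63s %d", key, &val) == 2) {
			if (!strncmp(key, "nr_dying_subsys_", 16) && val > 0)
				printf("%s: %d dying CSSes\n", key + 16, val);
		}
		fclose(f);
		return 0;
	}

Comparing each reported subsystem against its nr_subsys_<ss> value also
exposes the "controller disabled but CSSes still dying" case described
above.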

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 12 +++++-
 include/linux/cgroup-defs.h             | 14 +++++++
 kernel/cgroup/cgroup.c                  | 55 ++++++++++++++++++++++++-
 3 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 52763d6b2919..abf3adad04bd 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -981,6 +981,14 @@ All cgroup core files are prefixed with "cgroup."
 		A dying cgroup can consume system resources not exceeding
 		limits, which were active at the moment of cgroup deletion.
 
+	  nr_subsys_<cgroup_subsys>
+		Total number of live cgroup subsystems (e.g. memory
+		cgroup) at and beneath the current cgroup.
+
+	  nr_dying_subsys_<cgroup_subsys>
+		Total number of dying cgroup subsystems (e.g. memory
+		cgroup) at and beneath the current cgroup.
+
   cgroup.freeze
 	A read-write single value file which exists on non-root cgroups.
 	Allowed values are "0" and "1". The default is "0".
@@ -2930,8 +2938,8 @@ Deprecated v1 Core Features
 
 - "cgroup.clone_children" is removed.
 
-- /proc/cgroups is meaningless for v2.  Use "cgroup.controllers" file
-  at the root instead.
+- /proc/cgroups is meaningless for v2.  Use "cgroup.controllers" or
+  "cgroup.stat" files at the root instead.
 
 
 Issues with v1 and Rationales for v2
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index b36690ca0d3f..3cb049f104f6 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -210,6 +210,14 @@ struct cgroup_subsys_state {
 	 * fields of the containing structure.
 	 */
 	struct cgroup_subsys_state *parent;
+
+	/*
+	 * Keep track of total numbers of visible descendant CSSes.
+	 * The total number of dying CSSes is tracked in
+	 * css->cgroup->nr_dying_subsys[ssid].
+	 * Protected by cgroup_mutex.
+	 */
+	int nr_descendants;
 };
 
 /*
@@ -470,6 +478,12 @@ struct cgroup {
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
 
+	/*
+	 * Keep track of total number of dying CSSes at and below this cgroup.
+	 * Protected by cgroup_mutex.
+	 */
+	int nr_dying_subsys[CGROUP_SUBSYS_COUNT];
+
 	struct cgroup_root *root;
 
 	/*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8e4b62b436a..601600afdd20 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3669,12 +3669,40 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
 static int cgroup_stat_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgroup = seq_css(seq)->cgroup;
+	struct cgroup_subsys_state *css;
+	int dying_cnt[CGROUP_SUBSYS_COUNT];
+	int ssid;
 
 	seq_printf(seq, "nr_descendants %d\n",
 		   cgroup->nr_descendants);
+
+	/*
+	 * Show the number of live and dying csses associated with each of
+	 * the non-inhibited cgroup subsystems that are bound to cgroup v2.
+	 *
+	 * Without proper lock protection, racing is possible. So the
+	 * numbers may not be consistent when that happens.
+	 */
+	rcu_read_lock();
+	for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
+		dying_cnt[ssid] = -1;
+		if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
+		    (cgroup_subsys[ssid]->root !=  &cgrp_dfl_root))
+			continue;
+		css = rcu_dereference_raw(cgroup->subsys[ssid]);
+		dying_cnt[ssid] = cgroup->nr_dying_subsys[ssid];
+		seq_printf(seq, "nr_subsys_%s %d\n", cgroup_subsys[ssid]->name,
+			   css ? (css->nr_descendants + 1) : 0);
+	}
+
 	seq_printf(seq, "nr_dying_descendants %d\n",
 		   cgroup->nr_dying_descendants);
-
+	for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
+		if (dying_cnt[ssid] >= 0)
+			seq_printf(seq, "nr_dying_subsys_%s %d\n",
+				   cgroup_subsys[ssid]->name, dying_cnt[ssid]);
+	}
+	rcu_read_unlock();
 	return 0;
 }
 
@@ -5424,6 +5452,8 @@ static void css_release_work_fn(struct work_struct *work)
 	list_del_rcu(&css->sibling);
 
 	if (ss) {
+		struct cgroup *parent_cgrp;
+
 		/* css release path */
 		if (!list_empty(&css->rstat_css_node)) {
 			cgroup_rstat_flush(cgrp);
@@ -5433,6 +5463,14 @@ static void css_release_work_fn(struct work_struct *work)
 		cgroup_idr_replace(&ss->css_idr, NULL, css->id);
 		if (ss->css_released)
 			ss->css_released(css);
+
+		cgrp->nr_dying_subsys[ss->id]--;
+		WARN_ON_ONCE(css->nr_descendants || cgrp->nr_dying_subsys[ss->id]);
+		parent_cgrp = cgroup_parent(cgrp);
+		while (parent_cgrp) {
+			parent_cgrp->nr_dying_subsys[ss->id]--;
+			parent_cgrp = cgroup_parent(parent_cgrp);
+		}
 	} else {
 		struct cgroup *tcgrp;
 
@@ -5517,8 +5555,11 @@ static int online_css(struct cgroup_subsys_state *css)
 		rcu_assign_pointer(css->cgroup->subsys[ss->id], css);
 
 		atomic_inc(&css->online_cnt);
-		if (css->parent)
+		if (css->parent) {
 			atomic_inc(&css->parent->online_cnt);
+			while ((css = css->parent))
+				css->nr_descendants++;
+		}
 	}
 	return ret;
 }
@@ -5540,6 +5581,16 @@ static void offline_css(struct cgroup_subsys_state *css)
 	RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
 
 	wake_up_all(&css->cgroup->offline_waitq);
+
+	css->cgroup->nr_dying_subsys[ss->id]++;
+	/*
+	 * Parent css and cgroup cannot be freed until after the freeing
+	 * of child css, see css_free_rwork_fn().
+	 */
+	while ((css = css->parent)) {
+		css->nr_descendants--;
+		css->cgroup->nr_dying_subsys[ss->id]++;
+	}
 }
 
 /**
-- 
2.39.3
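
The bookkeeping behind these counters follows one pattern: each cgroup
keeps subtree-wide totals which online_css(), offline_css() and
css_release_work_fn() adjust along the parent chain, so reads in
cgroup_stat_show() are O(1) while each state change costs O(tree
depth). A standalone toy model of that pattern (illustrative C only,
collapsing the per-subsystem arrays into single counters):

	#include <stdio.h>

	struct node {
		struct node *parent;
		int nr_descendants;	/* live nodes below this one */
		int nr_dying;		/* dying nodes at and below this one */
	};

	/* mirrors online_css(): every ancestor gains a live descendant */
	static void node_online(struct node *n)
	{
		for (struct node *p = n->parent; p; p = p->parent)
			p->nr_descendants++;
	}

	/* mirrors offline_css(): live -> dying, propagated upward */
	static void node_offline(struct node *n)
	{
		n->nr_dying++;
		for (struct node *p = n->parent; p; p = p->parent) {
			p->nr_descendants--;
			p->nr_dying++;
		}
	}

	/* mirrors css_release_work_fn(): a dying node is finally freed */
	static void node_release(struct node *n)
	{
		for (struct node *p = n; p; p = p->parent)
			p->nr_dying--;
	}

	int main(void)
	{
		struct node root = { 0 };
		struct node child = { .parent = &root };

		node_online(&child);	/* root: 1 live descendant */
		node_offline(&child);	/* root: 0 live, 1 dying */
		node_release(&child);	/* root: 0 live, 0 dying */
		printf("live=%d dying=%d\n",
		       root.nr_descendants, root.nr_dying);
		return 0;
	}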



* Re: [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Johannes Weiner @ 2024-07-15 17:22 UTC
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Michal Koutný, Jonathan Corbet, cgroups,
	linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin

On Mon, Jul 15, 2024 at 11:00:34AM -0400, Waiman Long wrote:
> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
> help manage different structures in various cgroup subsystems by being
> an embedded element inside a larger structure like cpuset or mem_cgroup.
> ...
> Signed-off-by: Waiman Long <longman@redhat.com>

Looks good to me!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Roman Gushchin @ 2024-07-15 17:30 UTC
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, cgroups, linux-doc, linux-kernel,
	Kamalesh Babulal

On Mon, Jul 15, 2024 at 11:00:34AM -0400, Waiman Long wrote:
> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
> help manage different structures in various cgroup subsystems by being
> an embedded element inside a larger structure like cpuset or mem_cgroup.
> ...
> Signed-off-by: Waiman Long <longman@redhat.com>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!


* Re: [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Kamalesh Babulal @ 2024-07-16  7:05 UTC
  To: Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner,
	Michal Koutný, Jonathan Corbet
  Cc: cgroups, linux-doc, linux-kernel, Roman Gushchin



On 7/15/24 8:30 PM, Waiman Long wrote:
> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
> help manage different structures in various cgroup subsystems by being
> an embedded element inside a larger structure like cpuset or mem_cgroup.
> ...
> Signed-off-by: Waiman Long <longman@redhat.com>

The patch looks good to me.

Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>


* Re: [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Waiman Long @ 2024-07-31  0:00 UTC
  To: Tejun Heo
  Cc: Zefan Li, Johannes Weiner, Michal Koutný, Jonathan Corbet,
	cgroups, linux-doc, linux-kernel, Roman Gushchin,
	Kamalesh Babulal


On 7/15/24 13:30, Roman Gushchin wrote:
> On Mon, Jul 15, 2024 at 11:00:34AM -0400, Waiman Long wrote:
>> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
>> help manage different structures in various cgroup subsystems by being
>> an embedded element inside a larger structure like cpuset or mem_cgroup.
>> ...
>> Signed-off-by: Waiman Long <longman@redhat.com>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Tejun, is this patch ready to be merged, or do you have other
suggestions in mind?

Thanks,
Longman



* Re: [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Tejun Heo @ 2024-07-31  0:21 UTC
  To: Waiman Long
  Cc: Zefan Li, Johannes Weiner, Michal Koutný, Jonathan Corbet,
	cgroups, linux-doc, linux-kernel, Kamalesh Babulal,
	Roman Gushchin

On Mon, Jul 15, 2024 at 11:00:34AM -0400, Waiman Long wrote:
> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
> help manage different structures in various cgroup subsystems by being
> an embedded element inside a larger structure like cpuset or mem_cgroup.
> ...

Applied to cgroup/for-6.12.

Thanks.

-- 
tejun


* Re: [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Michal Koutný @ 2024-07-31  9:41 UTC
  To: Tejun Heo
  Cc: Waiman Long, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
	linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin

On Tue, Jul 30, 2024 at 02:21:56PM GMT, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Jul 15, 2024 at 11:00:34AM -0400, Waiman Long wrote:
> > Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
> > help manage different structures in various cgroup subsystems by being
> > an embedded element inside a larger structure like cpuset or mem_cgroup.
> > ...
> 
> Applied to cgroup/for-6.12.

I think the commit message is missing something like this:

| The 'debug' controller wasn't used to provide this information
| because that controller is not recommended in production kernels,
| and many of them won't enable CONFIG_CGROUP_DEBUG by default.
| 
| Similar information could be retrieved with debuggers like drgn,
| but those are not always available (e.g. under lockdown), and the
| additional cost of the runtime tracking here is deemed marginal.

or a 'Link:' to the discussion ;-)

Thanks,
Michal



* Re: [PATCH-cgroup v7] cgroup: Show # of subsystem CSSes in cgroup.stat
From: Tejun Heo @ 2024-07-31 17:02 UTC
  To: Michal Koutný
  Cc: Waiman Long, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
	linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin

On Wed, Jul 31, 2024 at 11:41:39AM +0200, Michal Koutný wrote:
...
> I think the commit message is missing something like this:
> ...
> or a 'Link:' to the discussion ;-)

I updated the commit message to include the paragraphs and a Link:
to this thread.

Thanks.

-- 
tejun

