* [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
@ 2024-07-11 2:51 Waiman Long
2024-07-11 3:26 ` Roman Gushchin
2024-07-25 13:15 ` Michal Koutný
0 siblings, 2 replies; 10+ messages in thread
From: Waiman Long @ 2024-07-11 2:51 UTC (permalink / raw)
To: Tejun Heo, Zefan Li, Johannes Weiner, Michal Koutný,
Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Kamalesh Babulal,
Roman Gushchin, Waiman Long
Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
help manage different structures in various cgroup subsystems by being
an embedded element inside a larger structure like cpuset or mem_cgroup.
The /proc/cgroups file shows the number of cgroups for each of the
subsystems. With cgroup v1, the number of CSSes is the same as the
number of cgroups. That is not the case anymore with cgroup v2. The
/proc/cgroups file cannot show the actual number of CSSes for the
subsystems that are bound to cgroup v2.
So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
we can't tell by looking at /proc/cgroups which cgroup subsystems may
be responsible.
As cgroup v2 has deprecated the use of /proc/cgroups, the hierarchical
cgroup.stat file is now being extended to show the number of live and
dying CSSes associated with each non-inhibited cgroup subsystem that is
bound to cgroup v2, as long as the count is non-zero. The number
includes CSSes in the current cgroup as well as in all the descendants
underneath it. This will help us pinpoint which subsystems are
responsible for the increasing number of dying (nr_dying_descendants)
cgroups.
The cgroup-v2.rst file is updated to discuss this new behavior.
With this patch applied, a sample output from the root cgroup.stat file
is shown below.
nr_descendants 55
nr_dying_descendants 35
nr_subsys_cpuset 1
nr_subsys_cpu 40
nr_subsys_io 40
nr_subsys_memory 55
nr_dying_subsys_memory 35
nr_subsys_perf_event 56
nr_subsys_hugetlb 1
nr_subsys_pids 55
nr_subsys_rdma 1
nr_subsys_misc 1
Another sample output from system.slice/cgroup.stat was:
nr_descendants 32
nr_dying_descendants 33
nr_subsys_cpu 30
nr_subsys_io 30
nr_subsys_memory 32
nr_dying_subsys_memory 33
nr_subsys_perf_event 33
nr_subsys_pids 32
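For illustration only (not part of the patch): because the counters are
hierarchical, a consumer only needs to read a single cgroup.stat file to
cover that cgroup's whole subtree. A minimal Python sketch of such a
consumer, assuming the key names shown in the samples above, could be:

#!/usr/bin/env python3
# Minimal sketch: rank v2 controllers by their number of dying CSSes
# beneath a given cgroup, using the nr_dying_subsys_* keys added here.
import sys
from pathlib import Path

def read_stat(cgroup_dir):
    # Parse cgroup.stat into a {key: int} dict.
    stats = {}
    for line in (Path(cgroup_dir) / "cgroup.stat").read_text().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

def main(cgroup_dir="/sys/fs/cgroup"):
    stats = read_stat(cgroup_dir)
    prefix = "nr_dying_subsys_"
    dying = {k[len(prefix):]: v for k, v in stats.items() if k.startswith(prefix)}
    for subsys, count in sorted(dying.items(), key=lambda kv: -kv[1]):
        print(f"{subsys}: {count} dying CSSes beneath {cgroup_dir}")

if __name__ == "__main__":
    main(*sys.argv[1:])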
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 14 ++++++-
include/linux/cgroup-defs.h | 7 ++++
kernel/cgroup/cgroup.c | 52 ++++++++++++++++++++++++-
3 files changed, 70 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 52763d6b2919..356cd430c888 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -981,6 +981,16 @@ All cgroup core files are prefixed with "cgroup."
A dying cgroup can consume system resources not exceeding
limits, which were active at the moment of cgroup deletion.
+ nr_subsys_<cgroup_subsys>
+ Total number of live cgroups associated with that cgroup
+ subsystem (e.g. memory) at and beneath the current
+ cgroup. An entry will only be shown if it is not zero.
+
+ nr_dying_subsys_<cgroup_subsys>
+ Total number of dying cgroups associated with that cgroup
+ subsystem (e.g. memory) beneath the current cgroup.
+ An entry will only be shown if it is not zero.
+
cgroup.freeze
A read-write single value file which exists on non-root cgroups.
Allowed values are "0" and "1". The default is "0".
@@ -2930,8 +2940,8 @@ Deprecated v1 Core Features
- "cgroup.clone_children" is removed.
-- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
- at the root instead.
+- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
+ "cgroup.stat" files at the root instead.
Issues with v1 and Rationales for v2
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index b36690ca0d3f..62de18874508 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -210,6 +210,13 @@ struct cgroup_subsys_state {
* fields of the containing structure.
*/
struct cgroup_subsys_state *parent;
+
+ /*
+ * Keep track of total numbers of visible and dying descendant CSSes.
+ * Protected by cgroup_mutex.
+ */
+ int nr_descendants;
+ int nr_dying_descendants;
};
/*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8e4b62b436a..cf4fc1c109e2 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3669,12 +3669,36 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
static int cgroup_stat_show(struct seq_file *seq, void *v)
{
struct cgroup *cgroup = seq_css(seq)->cgroup;
+ struct cgroup_subsys_state *css;
+ int ssid;
seq_printf(seq, "nr_descendants %d\n",
cgroup->nr_descendants);
seq_printf(seq, "nr_dying_descendants %d\n",
cgroup->nr_dying_descendants);
+ /*
+ * Show the number of live and dying csses associated with each of
+ * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
+ *
+ * Without proper lock protection, racing is possible. So the
+ * numbers may not be consistent when that happens.
+ */
+ rcu_read_lock();
+ for_each_css(css, ssid, cgroup) {
+ if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
+ (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
+ continue;
+
+ seq_printf(seq, "nr_subsys_%s %d\n", cgroup_subsys[ssid]->name,
+ css->nr_descendants + 1);
+ /* Current css is online */
+ if (css->nr_dying_descendants)
+ seq_printf(seq, "nr_dying_subsys_%s %d\n",
+ cgroup_subsys[ssid]->name,
+ css->nr_dying_descendants);
+ }
+ rcu_read_unlock();
return 0;
}
@@ -5424,6 +5448,8 @@ static void css_release_work_fn(struct work_struct *work)
list_del_rcu(&css->sibling);
if (ss) {
+ struct cgroup_subsys_state *parent_css;
+
/* css release path */
if (!list_empty(&css->rstat_css_node)) {
cgroup_rstat_flush(cgrp);
@@ -5433,6 +5459,14 @@ static void css_release_work_fn(struct work_struct *work)
cgroup_idr_replace(&ss->css_idr, NULL, css->id);
if (ss->css_released)
ss->css_released(css);
+
+ WARN_ON_ONCE(css->nr_descendants || css->nr_dying_descendants);
+ parent_css = css->parent;
+ while (parent_css) {
+ parent_css->nr_dying_descendants--;
+ parent_css = parent_css->parent;
+ }
+ css_put(css->parent); /* Parent can be freed now */
} else {
struct cgroup *tcgrp;
@@ -5517,8 +5551,11 @@ static int online_css(struct cgroup_subsys_state *css)
rcu_assign_pointer(css->cgroup->subsys[ss->id], css);
atomic_inc(&css->online_cnt);
- if (css->parent)
+ if (css->parent) {
atomic_inc(&css->parent->online_cnt);
+ while ((css = css->parent))
+ css->nr_descendants++;
+ }
}
return ret;
}
@@ -5540,6 +5577,19 @@ static void offline_css(struct cgroup_subsys_state *css)
RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
wake_up_all(&css->cgroup->offline_waitq);
+
+ /*
+ * Get a reference to parent css to ensure reliable access to its
+ * nr_dying_descendants until after this child css is ready to be
+ * freed.
+ */
+ if (css->parent)
+ css_get(css->parent);
+
+ while ((css = css->parent)) {
+ css->nr_descendants--;
+ css->nr_dying_descendants++;
+ }
}
/**
--
2.39.3
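To make the bookkeeping in the patch easier to follow, here is a simplified
model (illustration only, not kernel code) of when each counter moves: a live
descendant is counted in nr_descendants from online_css() until offline_css(),
while a dying one is counted in nr_dying_descendants from offline_css() until
css_release_work_fn().

class Css:
    def __init__(self, parent=None):
        self.parent = parent
        self.nr_descendants = 0        # live descendant CSSes
        self.nr_dying_descendants = 0  # offlined but not yet released

    def ancestors(self):
        p = self.parent
        while p:
            yield p
            p = p.parent

    def online(self):                  # mirrors online_css()
        for a in self.ancestors():
            a.nr_descendants += 1

    def offline(self):                 # mirrors offline_css()
        for a in self.ancestors():
            a.nr_descendants -= 1
            a.nr_dying_descendants += 1

    def release(self):                 # mirrors css_release_work_fn()
        for a in self.ancestors():
            a.nr_dying_descendants -= 1

root = Css(); child = Css(root)
child.online();  assert root.nr_descendants == 1
child.offline(); assert (root.nr_descendants, root.nr_dying_descendants) == (0, 1)
child.release(); assert root.nr_dying_descendants == 0

This also explains why cgroup_stat_show() prints css->nr_descendants + 1 for
the live count: the cgroup's own online CSS is not counted as its own
descendant.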
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 2:51 [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat Waiman Long
@ 2024-07-11 3:26 ` Roman Gushchin
2024-07-11 12:01 ` Waiman Long
2024-07-11 13:25 ` Kamalesh Babulal
2024-07-25 13:15 ` Michal Koutný
1 sibling, 2 replies; 10+ messages in thread
From: Roman Gushchin @ 2024-07-11 3:26 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Michal Koutný,
Jonathan Corbet, cgroups, linux-doc, linux-kernel,
Kamalesh Babulal
On Wed, Jul 10, 2024 at 10:51:53PM -0400, Waiman Long wrote:
> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
> help manage different structures in various cgroup subsystems by being
> an embedded element inside a larger structure like cpuset or mem_cgroup.
>
> The /proc/cgroups file shows the number of cgroups for each of the
> subsystems. With cgroup v1, the number of CSSes is the same as the
> number of cgroups. That is not the case anymore with cgroup v2. The
> /proc/cgroups file cannot show the actual number of CSSes for the
> subsystems that are bound to cgroup v2.
>
> So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
> we can't tell by looking at /proc/cgroups which cgroup subsystems may
> be responsible.
>
> As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
> cgroup.stat file is now being extended to show the number of live and
> dying CSSes associated with all the non-inhibited cgroup subsystems
> that have been bound to cgroup v2 as long as it is not zero. The number
> includes CSSes in the current cgroup as well as in all the descendants
> underneath it. This will help us pinpoint which subsystems are
> responsible for the increasing number of dying (nr_dying_descendants)
> cgroups.
>
> The cgroup-v2.rst file is updated to discuss this new behavior.
>
> With this patch applied, a sample output from root cgroup.stat file
> was shown below.
>
> nr_descendants 55
> nr_dying_descendants 35
> nr_subsys_cpuset 1
> nr_subsys_cpu 40
> nr_subsys_io 40
> nr_subsys_memory 55
> nr_dying_subsys_memory 35
> nr_subsys_perf_event 56
> nr_subsys_hugetlb 1
> nr_subsys_pids 55
> nr_subsys_rdma 1
> nr_subsys_misc 1
>
> Another sample output from system.slice/cgroup.stat was:
>
> nr_descendants 32
> nr_dying_descendants 33
> nr_subsys_cpu 30
> nr_subsys_io 30
> nr_subsys_memory 32
> nr_dying_subsys_memory 33
> nr_subsys_perf_event 33
> nr_subsys_pids 32
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 14 ++++++-
> include/linux/cgroup-defs.h | 7 ++++
> kernel/cgroup/cgroup.c | 52 ++++++++++++++++++++++++-
> 3 files changed, 70 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 52763d6b2919..356cd430c888 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -981,6 +981,16 @@ All cgroup core files are prefixed with "cgroup."
> A dying cgroup can consume system resources not exceeding
> limits, which were active at the moment of cgroup deletion.
>
> + nr_subsys_<cgroup_subsys>
> + Total number of live cgroups associated with that cgroup
> + subsystem (e.g. memory) at and beneath the current
> + cgroup. An entry will only be shown if it is not zero.
> +
> + nr_dying_subsys_<cgroup_subsys>
> + Total number of dying cgroups associated with that cgroup
> + subsystem (e.g. memory) beneath the current cgroup.
> + An entry will only be shown if it is not zero.
> +
> cgroup.freeze
> A read-write single value file which exists on non-root cgroups.
> Allowed values are "0" and "1". The default is "0".
> @@ -2930,8 +2940,8 @@ Deprecated v1 Core Features
>
> - "cgroup.clone_children" is removed.
>
> -- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
> - at the root instead.
> +- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
> + "cgroup.stat" files at the root instead.
>
>
> Issues with v1 and Rationales for v2
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index b36690ca0d3f..62de18874508 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -210,6 +210,13 @@ struct cgroup_subsys_state {
> * fields of the containing structure.
> */
> struct cgroup_subsys_state *parent;
> +
> + /*
> + * Keep track of total numbers of visible and dying descendant CSSes.
> + * Protected by cgroup_mutex.
> + */
> + int nr_descendants;
> + int nr_dying_descendants;
> };
>
> /*
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index c8e4b62b436a..cf4fc1c109e2 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3669,12 +3669,36 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
> static int cgroup_stat_show(struct seq_file *seq, void *v)
> {
> struct cgroup *cgroup = seq_css(seq)->cgroup;
> + struct cgroup_subsys_state *css;
> + int ssid;
>
> seq_printf(seq, "nr_descendants %d\n",
> cgroup->nr_descendants);
> seq_printf(seq, "nr_dying_descendants %d\n",
> cgroup->nr_dying_descendants);
>
> + /*
> + * Show the number of live and dying csses associated with each of
> + * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
> + *
> + * Without proper lock protection, racing is possible. So the
> + * numbers may not be consistent when that happens.
> + */
> + rcu_read_lock();
> + for_each_css(css, ssid, cgroup) {
> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
> + continue;
> +
> + seq_printf(seq, "nr_subsys_%s %d\n", cgroup_subsys[ssid]->name,
> + css->nr_descendants + 1);
> + /* Current css is online */
> + if (css->nr_dying_descendants)
> + seq_printf(seq, "nr_dying_subsys_%s %d\n",
> + cgroup_subsys[ssid]->name,
> + css->nr_dying_descendants);
> + }
> + rcu_read_unlock();
> return 0;
> }
>
> @@ -5424,6 +5448,8 @@ static void css_release_work_fn(struct work_struct *work)
> list_del_rcu(&css->sibling);
>
> if (ss) {
> + struct cgroup_subsys_state *parent_css;
> +
> /* css release path */
> if (!list_empty(&css->rstat_css_node)) {
> cgroup_rstat_flush(cgrp);
> @@ -5433,6 +5459,14 @@ static void css_release_work_fn(struct work_struct *work)
> cgroup_idr_replace(&ss->css_idr, NULL, css->id);
> if (ss->css_released)
> ss->css_released(css);
> +
> + WARN_ON_ONCE(css->nr_descendants || css->nr_dying_descendants);
> + parent_css = css->parent;
> + while (parent_css) {
> + parent_css->nr_dying_descendants--;
> + parent_css = parent_css->parent;
> + }
> + css_put(css->parent); /* Parent can be freed now */
> } else {
> struct cgroup *tcgrp;
>
> @@ -5517,8 +5551,11 @@ static int online_css(struct cgroup_subsys_state *css)
> rcu_assign_pointer(css->cgroup->subsys[ss->id], css);
>
> atomic_inc(&css->online_cnt);
> - if (css->parent)
> + if (css->parent) {
> atomic_inc(&css->parent->online_cnt);
> + while ((css = css->parent))
> + css->nr_descendants++;
> + }
> }
> return ret;
> }
> @@ -5540,6 +5577,19 @@ static void offline_css(struct cgroup_subsys_state *css)
> RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
>
> wake_up_all(&css->cgroup->offline_waitq);
> +
> + /*
> + * Get a reference to parent css to ensure reliable access to its
> + * nr_dying_descendants until after this child css is ready to be
> + * freed.
> + */
> + if (css->parent)
> + css_get(css->parent);
I think this blob can be dropped: a css holds a reference to its parent up to
the very last moment, see css_free_rwork_fn().
And the corresponding css_put() can be dropped too.
With this thing fixed, please, feel free to add
Acked-by: Roman Gushchin <roman.gushchin@linux.dev> .
Thank you!
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 3:26 ` Roman Gushchin
@ 2024-07-11 12:01 ` Waiman Long
2024-07-11 13:25 ` Kamalesh Babulal
1 sibling, 0 replies; 10+ messages in thread
From: Waiman Long @ 2024-07-11 12:01 UTC (permalink / raw)
To: Roman Gushchin
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Michal Koutný,
Jonathan Corbet, cgroups, linux-doc, linux-kernel,
Kamalesh Babulal
On 7/10/24 23:26, Roman Gushchin wrote:
> On Wed, Jul 10, 2024 at 10:51:53PM -0400, Waiman Long wrote:
>> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
>> help manage different structures in various cgroup subsystems by being
>> an embedded element inside a larger structure like cpuset or mem_cgroup.
>>
>> The /proc/cgroups file shows the number of cgroups for each of the
>> subsystems. With cgroup v1, the number of CSSes is the same as the
>> number of cgroups. That is not the case anymore with cgroup v2. The
>> /proc/cgroups file cannot show the actual number of CSSes for the
>> subsystems that are bound to cgroup v2.
>>
>> So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
>> we can't tell by looking at /proc/cgroups which cgroup subsystems may
>> be responsible.
>>
>> As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
>> cgroup.stat file is now being extended to show the number of live and
>> dying CSSes associated with all the non-inhibited cgroup subsystems
>> that have been bound to cgroup v2 as long as it is not zero. The number
>> includes CSSes in the current cgroup as well as in all the descendants
>> underneath it. This will help us pinpoint which subsystems are
>> responsible for the increasing number of dying (nr_dying_descendants)
>> cgroups.
>>
>> The cgroup-v2.rst file is updated to discuss this new behavior.
>>
>> With this patch applied, a sample output from root cgroup.stat file
>> was shown below.
>>
>> nr_descendants 55
>> nr_dying_descendants 35
>> nr_subsys_cpuset 1
>> nr_subsys_cpu 40
>> nr_subsys_io 40
>> nr_subsys_memory 55
>> nr_dying_subsys_memory 35
>> nr_subsys_perf_event 56
>> nr_subsys_hugetlb 1
>> nr_subsys_pids 55
>> nr_subsys_rdma 1
>> nr_subsys_misc 1
>>
>> Another sample output from system.slice/cgroup.stat was:
>>
>> nr_descendants 32
>> nr_dying_descendants 33
>> nr_subsys_cpu 30
>> nr_subsys_io 30
>> nr_subsys_memory 32
>> nr_dying_subsys_memory 33
>> nr_subsys_perf_event 33
>> nr_subsys_pids 32
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> Documentation/admin-guide/cgroup-v2.rst | 14 ++++++-
>> include/linux/cgroup-defs.h | 7 ++++
>> kernel/cgroup/cgroup.c | 52 ++++++++++++++++++++++++-
>> 3 files changed, 70 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 52763d6b2919..356cd430c888 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -981,6 +981,16 @@ All cgroup core files are prefixed with "cgroup."
>> A dying cgroup can consume system resources not exceeding
>> limits, which were active at the moment of cgroup deletion.
>>
>> + nr_subsys_<cgroup_subsys>
>> + Total number of live cgroups associated with that cgroup
>> + subsystem (e.g. memory) at and beneath the current
>> + cgroup. An entry will only be shown if it is not zero.
>> +
>> + nr_dying_subsys_<cgroup_subsys>
>> + Total number of dying cgroups associated with that cgroup
>> + subsystem (e.g. memory) beneath the current cgroup.
>> + An entry will only be shown if it is not zero.
>> +
>> cgroup.freeze
>> A read-write single value file which exists on non-root cgroups.
>> Allowed values are "0" and "1". The default is "0".
>> @@ -2930,8 +2940,8 @@ Deprecated v1 Core Features
>>
>> - "cgroup.clone_children" is removed.
>>
>> -- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
>> - at the root instead.
>> +- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
>> + "cgroup.stat" files at the root instead.
>>
>>
>> Issues with v1 and Rationales for v2
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index b36690ca0d3f..62de18874508 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -210,6 +210,13 @@ struct cgroup_subsys_state {
>> * fields of the containing structure.
>> */
>> struct cgroup_subsys_state *parent;
>> +
>> + /*
>> + * Keep track of total numbers of visible and dying descendant CSSes.
>> + * Protected by cgroup_mutex.
>> + */
>> + int nr_descendants;
>> + int nr_dying_descendants;
>> };
>>
>> /*
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index c8e4b62b436a..cf4fc1c109e2 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -3669,12 +3669,36 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
>> static int cgroup_stat_show(struct seq_file *seq, void *v)
>> {
>> struct cgroup *cgroup = seq_css(seq)->cgroup;
>> + struct cgroup_subsys_state *css;
>> + int ssid;
>>
>> seq_printf(seq, "nr_descendants %d\n",
>> cgroup->nr_descendants);
>> seq_printf(seq, "nr_dying_descendants %d\n",
>> cgroup->nr_dying_descendants);
>>
>> + /*
>> + * Show the number of live and dying csses associated with each of
>> + * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
>> + *
>> + * Without proper lock protection, racing is possible. So the
>> + * numbers may not be consistent when that happens.
>> + */
>> + rcu_read_lock();
>> + for_each_css(css, ssid, cgroup) {
>> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
>> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
>> + continue;
>> +
>> + seq_printf(seq, "nr_subsys_%s %d\n", cgroup_subsys[ssid]->name,
>> + css->nr_descendants + 1);
>> + /* Current css is online */
>> + if (css->nr_dying_descendants)
>> + seq_printf(seq, "nr_dying_subsys_%s %d\n",
>> + cgroup_subsys[ssid]->name,
>> + css->nr_dying_descendants);
>> + }
>> + rcu_read_unlock();
>> return 0;
>> }
>>
>> @@ -5424,6 +5448,8 @@ static void css_release_work_fn(struct work_struct *work)
>> list_del_rcu(&css->sibling);
>>
>> if (ss) {
>> + struct cgroup_subsys_state *parent_css;
>> +
>> /* css release path */
>> if (!list_empty(&css->rstat_css_node)) {
>> cgroup_rstat_flush(cgrp);
>> @@ -5433,6 +5459,14 @@ static void css_release_work_fn(struct work_struct *work)
>> cgroup_idr_replace(&ss->css_idr, NULL, css->id);
>> if (ss->css_released)
>> ss->css_released(css);
>> +
>> + WARN_ON_ONCE(css->nr_descendants || css->nr_dying_descendants);
>> + parent_css = css->parent;
>> + while (parent_css) {
>> + parent_css->nr_dying_descendants--;
>> + parent_css = parent_css->parent;
>> + }
>> + css_put(css->parent); /* Parent can be freed now */
>> } else {
>> struct cgroup *tcgrp;
>>
>> @@ -5517,8 +5551,11 @@ static int online_css(struct cgroup_subsys_state *css)
>> rcu_assign_pointer(css->cgroup->subsys[ss->id], css);
>>
>> atomic_inc(&css->online_cnt);
>> - if (css->parent)
>> + if (css->parent) {
>> atomic_inc(&css->parent->online_cnt);
>> + while ((css = css->parent))
>> + css->nr_descendants++;
>> + }
>> }
>> return ret;
>> }
>> @@ -5540,6 +5577,19 @@ static void offline_css(struct cgroup_subsys_state *css)
>> RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
>>
>> wake_up_all(&css->cgroup->offline_waitq);
>> +
>> + /*
>> + * Get a reference to parent css to ensure reliable access to its
>> + * nr_dying_descendants until after this child css is ready to be
>> + * freed.
>> + */
>> + if (css->parent)
>> + css_get(css->parent);
> I think this blob can be dropped: css has a reference to their parent up to
> very last moment, see css_free_rwork_fn().
> And the corresponding css_put() can be dropped too.
>
> With this thing fixed, please, feel free to add
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> .
You are right. I missed that. Will remove the unnecessary css_get/put.
Thanks,
Longman
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 3:26 ` Roman Gushchin
2024-07-11 12:01 ` Waiman Long
@ 2024-07-11 13:25 ` Kamalesh Babulal
1 sibling, 0 replies; 10+ messages in thread
From: Kamalesh Babulal @ 2024-07-11 13:25 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Michal Koutný,
Jonathan Corbet, cgroups, linux-doc, linux-kernel, Roman Gushchin
On 7/11/24 8:56 AM, Roman Gushchin wrote:
> On Wed, Jul 10, 2024 at 10:51:53PM -0400, Waiman Long wrote:
>> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
>> help manage different structures in various cgroup subsystems by being
>> an embedded element inside a larger structure like cpuset or mem_cgroup.
>>
>> The /proc/cgroups file shows the number of cgroups for each of the
>> subsystems. With cgroup v1, the number of CSSes is the same as the
>> number of cgroups. That is not the case anymore with cgroup v2. The
>> /proc/cgroups file cannot show the actual number of CSSes for the
>> subsystems that are bound to cgroup v2.
>>
>> So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
>> we can't tell by looking at /proc/cgroups which cgroup subsystems may
>> be responsible.
>>
>> As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
>> cgroup.stat file is now being extended to show the number of live and
>> dying CSSes associated with all the non-inhibited cgroup subsystems
>> that have been bound to cgroup v2 as long as it is not zero. The number
>> includes CSSes in the current cgroup as well as in all the descendants
>> underneath it. This will help us pinpoint which subsystems are
>> responsible for the increasing number of dying (nr_dying_descendants)
>> cgroups.
>>
>> The cgroup-v2.rst file is updated to discuss this new behavior.
>>
>> With this patch applied, a sample output from root cgroup.stat file
>> was shown below.
>>
>> nr_descendants 55
>> nr_dying_descendants 35
>> nr_subsys_cpuset 1
>> nr_subsys_cpu 40
>> nr_subsys_io 40
>> nr_subsys_memory 55
>> nr_dying_subsys_memory 35
>> nr_subsys_perf_event 56
>> nr_subsys_hugetlb 1
>> nr_subsys_pids 55
>> nr_subsys_rdma 1
>> nr_subsys_misc 1
>>
>> Another sample output from system.slice/cgroup.stat was:
>>
>> nr_descendants 32
>> nr_dying_descendants 33
>> nr_subsys_cpu 30
>> nr_subsys_io 30
>> nr_subsys_memory 32
>> nr_dying_subsys_memory 33
>> nr_subsys_perf_event 33
>> nr_subsys_pids 32
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> Documentation/admin-guide/cgroup-v2.rst | 14 ++++++-
>> include/linux/cgroup-defs.h | 7 ++++
>> kernel/cgroup/cgroup.c | 52 ++++++++++++++++++++++++-
>> 3 files changed, 70 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 52763d6b2919..356cd430c888 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -981,6 +981,16 @@ All cgroup core files are prefixed with "cgroup."
>> A dying cgroup can consume system resources not exceeding
>> limits, which were active at the moment of cgroup deletion.
>>
>> + nr_subsys_<cgroup_subsys>
>> + Total number of live cgroups associated with that cgroup
>> + subsystem (e.g. memory) at and beneath the current
>> + cgroup. An entry will only be shown if it is not zero.
>> +
>> + nr_dying_subsys_<cgroup_subsys>
>> + Total number of dying cgroups associated with that cgroup
>> + subsystem (e.g. memory) beneath the current cgroup.
>> + An entry will only be shown if it is not zero.
>> +
>> cgroup.freeze
>> A read-write single value file which exists on non-root cgroups.
>> Allowed values are "0" and "1". The default is "0".
>> @@ -2930,8 +2940,8 @@ Deprecated v1 Core Features
>>
>> - "cgroup.clone_children" is removed.
>>
>> -- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
>> - at the root instead.
>> +- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
>> + "cgroup.stat" files at the root instead.
>>
>>
>> Issues with v1 and Rationales for v2
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index b36690ca0d3f..62de18874508 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -210,6 +210,13 @@ struct cgroup_subsys_state {
>> * fields of the containing structure.
>> */
>> struct cgroup_subsys_state *parent;
>> +
>> + /*
>> + * Keep track of total numbers of visible and dying descendant CSSes.
>> + * Protected by cgroup_mutex.
>> + */
>> + int nr_descendants;
>> + int nr_dying_descendants;
>> };
>>
>> /*
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index c8e4b62b436a..cf4fc1c109e2 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -3669,12 +3669,36 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
>> static int cgroup_stat_show(struct seq_file *seq, void *v)
>> {
>> struct cgroup *cgroup = seq_css(seq)->cgroup;
>> + struct cgroup_subsys_state *css;
>> + int ssid;
>>
>> seq_printf(seq, "nr_descendants %d\n",
>> cgroup->nr_descendants);
>> seq_printf(seq, "nr_dying_descendants %d\n",
>> cgroup->nr_dying_descendants);
>>
>> + /*
>> + * Show the number of live and dying csses associated with each of
>> + * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
>> + *
>> + * Without proper lock protection, racing is possible. So the
>> + * numbers may not be consistent when that happens.
>> + */
>> + rcu_read_lock();
>> + for_each_css(css, ssid, cgroup) {
>> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
>> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
>> + continue;
>> +
>> + seq_printf(seq, "nr_subsys_%s %d\n", cgroup_subsys[ssid]->name,
>> + css->nr_descendants + 1);
>> + /* Current css is online */
>> + if (css->nr_dying_descendants)
>> + seq_printf(seq, "nr_dying_subsys_%s %d\n",
>> + cgroup_subsys[ssid]->name,
>> + css->nr_dying_descendants);
>> + }
>> + rcu_read_unlock();
>> return 0;
>> }
>>
>> @@ -5424,6 +5448,8 @@ static void css_release_work_fn(struct work_struct *work)
>> list_del_rcu(&css->sibling);
>>
>> if (ss) {
>> + struct cgroup_subsys_state *parent_css;
>> +
>> /* css release path */
>> if (!list_empty(&css->rstat_css_node)) {
>> cgroup_rstat_flush(cgrp);
>> @@ -5433,6 +5459,14 @@ static void css_release_work_fn(struct work_struct *work)
>> cgroup_idr_replace(&ss->css_idr, NULL, css->id);
>> if (ss->css_released)
>> ss->css_released(css);
>> +
>> + WARN_ON_ONCE(css->nr_descendants || css->nr_dying_descendants);
>> + parent_css = css->parent;
>> + while (parent_css) {
>> + parent_css->nr_dying_descendants--;
>> + parent_css = parent_css->parent;
>> + }
>> + css_put(css->parent); /* Parent can be freed now */
>> } else {
>> struct cgroup *tcgrp;
>>
>> @@ -5517,8 +5551,11 @@ static int online_css(struct cgroup_subsys_state *css)
>> rcu_assign_pointer(css->cgroup->subsys[ss->id], css);
>>
>> atomic_inc(&css->online_cnt);
>> - if (css->parent)
>> + if (css->parent) {
>> atomic_inc(&css->parent->online_cnt);
>> + while ((css = css->parent))
>> + css->nr_descendants++;
>> + }
>> }
>> return ret;
>> }
>> @@ -5540,6 +5577,19 @@ static void offline_css(struct cgroup_subsys_state *css)
>> RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
>>
>> wake_up_all(&css->cgroup->offline_waitq);
>> +
>> + /*
>> + * Get a reference to parent css to ensure reliable access to its
>> + * nr_dying_descendants until after this child css is ready to be
>> + * freed.
>> + */
>> + if (css->parent)
>> + css_get(css->parent);
>
> I think this blob can be dropped: css has a reference to their parent up to
> very last moment, see css_free_rwork_fn().
> And the corresponding css_put() can be dropped too.
>
> With this thing fixed, please, feel free to add
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> .
>
The nr_subsys_<subsys>, nr_dying_subsys_<subsys> format is more appealing.
I tested it with Roman's suggestion applied, using a simple test that creates
1000 descendant cgroups with the cpu and memory controllers enabled and then
removes them, while reading /sys/fs/cgroup/cgroup.stat in a loop as both root
and a non-root user. I ran a few such runs and saw no issues.
--
Thanks,
Kamalesh
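For illustration, a stress test along the lines Kamalesh describes above might
look like the sketch below (hypothetical paths and counts, untested; it assumes
cgroup v2 is mounted at /sys/fs/cgroup with the cpu and memory controllers
available at the root, and needs root privileges):

#!/usr/bin/env python3
# Sketch: create and remove many child cgroups while another thread keeps
# reading cgroup.stat, looking for crashes or wildly inconsistent counts.
import os
import threading
import time

ROOT = "/sys/fs/cgroup"                   # cgroup v2 mount point
PARENT = os.path.join(ROOT, "csstest")    # hypothetical test cgroup
N = 1000

def churn():
    os.makedirs(PARENT, exist_ok=True)
    with open(os.path.join(PARENT, "cgroup.subtree_control"), "w") as f:
        f.write("+cpu +memory")
    for i in range(N):
        child = os.path.join(PARENT, f"c{i}")
        os.mkdir(child)
        os.rmdir(child)
    os.rmdir(PARENT)

def reader(stop):
    while not stop.is_set():
        with open(os.path.join(ROOT, "cgroup.stat")) as f:
            f.read()
        time.sleep(0.001)

stop = threading.Event()
t = threading.Thread(target=reader, args=(stop,))
t.start()
churn()
stop.set()
t.join()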
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 2:51 [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat Waiman Long
2024-07-11 3:26 ` Roman Gushchin
@ 2024-07-25 13:15 ` Michal Koutný
2024-07-25 20:05 ` Waiman Long
1 sibling, 1 reply; 10+ messages in thread
From: Michal Koutný @ 2024-07-25 13:15 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin
Hello.
On Wed, Jul 10, 2024 at 10:51:53PM GMT, Waiman Long <longman@redhat.com> wrote:
> As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
> cgroup.stat file is now being extended to show the number of live and
> dying CSSes associated with all the non-inhibited cgroup subsystems
> that have been bound to cgroup v2 as long as it is not zero. The number
> includes CSSes in the current cgroup as well as in all the descendants
> underneath it. This will help us pinpoint which subsystems are
> responsible for the increasing number of dying (nr_dying_descendants)
> cgroups.
This implementation means every onlining/offlining additionally contends
on updating the root's css counters (even when the stats are never read).
There's also the 'debug' subsys. Have you looked at (extending) that for
troubleshooting dying csses?
It'd be good to document here why you decided against it.
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3669,12 +3669,36 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
> static int cgroup_stat_show(struct seq_file *seq, void *v)
> {
> struct cgroup *cgroup = seq_css(seq)->cgroup;
> + struct cgroup_subsys_state *css;
> + int ssid;
>
> seq_printf(seq, "nr_descendants %d\n",
> cgroup->nr_descendants);
> seq_printf(seq, "nr_dying_descendants %d\n",
> cgroup->nr_dying_descendants);
>
> + /*
> + * Show the number of live and dying csses associated with each of
> + * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
> + *
> + * Without proper lock protection, racing is possible. So the
> + * numbers may not be consistent when that happens.
> + */
> + rcu_read_lock();
> + for_each_css(css, ssid, cgroup) {
> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
> + continue;
Is this taken? (Given cgroup.stat is only on the default hierarchy.)
Thanks,
Michal
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-25 13:15 ` Michal Koutný
@ 2024-07-25 20:05 ` Waiman Long
2024-07-26 8:19 ` Michal Koutný
0 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2024-07-25 20:05 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/25/24 09:15, Michal Koutný wrote:
> Hello.
>
> On Wed, Jul 10, 2024 at 10:51:53PM GMT, Waiman Long <longman@redhat.com> wrote:
>> As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
>> cgroup.stat file is now being extended to show the number of live and
>> dying CSSes associated with all the non-inhibited cgroup subsystems
>> that have been bound to cgroup v2 as long as it is not zero. The number
>> includes CSSes in the current cgroup as well as in all the descendants
>> underneath it. This will help us pinpoint which subsystems are
>> responsible for the increasing number of dying (nr_dying_descendants)
>> cgroups.
> This implementation means every onlining/offlining (only additionally)
> contends in root's css updates (even when stats aren't ever read).
>
> There's also 'debug' subsys. Have you looked at (extending) that wrt
> dying csses troubleshooting?
> It'd be good to document here why you decided against it.
The config that I used for testing doesn't include CONFIG_CGROUP_DEBUG.
That is why "debug" doesn't show up in the sample outputs. The CSS # for
the debug subsystem should show up if it is enabled.
>
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -3669,12 +3669,36 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
>> static int cgroup_stat_show(struct seq_file *seq, void *v)
>> {
>> struct cgroup *cgroup = seq_css(seq)->cgroup;
>> + struct cgroup_subsys_state *css;
>> + int ssid;
>>
>> seq_printf(seq, "nr_descendants %d\n",
>> cgroup->nr_descendants);
>> seq_printf(seq, "nr_dying_descendants %d\n",
>> cgroup->nr_dying_descendants);
>>
>> + /*
>> + * Show the number of live and dying csses associated with each of
>> + * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
>> + *
>> + * Without proper lock protection, racing is possible. So the
>> + * numbers may not be consistent when that happens.
>> + */
>> + rcu_read_lock();
>> + for_each_css(css, ssid, cgroup) {
>> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
>> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
>> + continue;
> Is this taken? (Given cgroup.stat is only on the default hierarchy.)
I am not sure what you are asking here. Since cgroup.stat is a cgroup v2
only control file, it won't show subsystems that are bound to cgroup v1.
Cheers,
Longman
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-25 20:05 ` Waiman Long
@ 2024-07-26 8:19 ` Michal Koutný
2024-07-26 20:04 ` Tejun Heo
[not found] ` <79fcaba1-552b-4969-84fc-70c5151508a5@redhat.com>
0 siblings, 2 replies; 10+ messages in thread
From: Michal Koutný @ 2024-07-26 8:19 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin
On Thu, Jul 25, 2024 at 04:05:42PM GMT, Waiman Long <longman@redhat.com> wrote:
> > There's also 'debug' subsys. Have you looked at (extending) that wrt
> > dying csses troubleshooting?
> > It'd be good to document here why you decided against it.
> The config that I used for testing doesn't include CONFIG_CGROUP_DEBUG.
I mean if you enable CONFIG_CGROUP_DEBUG, there is 'debug' controller
that exposes files like debug.csses et al.
> That is why "debug" doesn't show up in the sample outputs. The CSS #
> for the debug subsystem should show up if it is enabled.
So these "debugging" numbers could be implemented via debug subsys. So I
wondered why it's not done this way. That reasoning is missing in the
commit message.
> > > + for_each_css(css, ssid, cgroup) {
> > > + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
> > > + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
> > > + continue;
> > Is this taken? (Given cgroup.stat is only on the default hierarchy.)
>
> I am not sure what you are asking here. Since cgroup.stat is a cgroup v2
> only control file, it won't show subsystems that are bound to cgroup v1.
So, is the if (...) ever true? (The file won't exist on v1.)
Michal
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-26 8:19 ` Michal Koutný
@ 2024-07-26 20:04 ` Tejun Heo
2024-07-26 20:36 ` Waiman Long
[not found] ` <79fcaba1-552b-4969-84fc-70c5151508a5@redhat.com>
1 sibling, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2024-07-26 20:04 UTC (permalink / raw)
To: Michal Koutný
Cc: Waiman Long, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin
Hello,
On Fri, Jul 26, 2024 at 10:19:05AM +0200, Michal Koutný wrote:
> On Thu, Jul 25, 2024 at 04:05:42PM GMT, Waiman Long <longman@redhat.com> wrote:
> > > There's also 'debug' subsys. Have you looked at (extending) that wrt
> > > dying csses troubleshooting?
> > > It'd be good to document here why you decided against it.
> > The config that I used for testing doesn't include CONFIG_CGROUP_DEBUG.
>
> I mean if you enable CONFIG_CGROUP_DEBUG, there is 'debug' controller
> that exposes files like debug.csses et al.
>
> > That is why "debug" doesn't show up in the sample outputs. The CSS #
> > for the debug subsystem should show up if it is enabled.
>
> So these "debugging" numbers could be implemented via debug subsys. So I
> wondered why it's not done this way. That reasoning is missing in the
> commit message.
While this is a bit of an implementation detail, it's also something which can
be pretty relevant in production, so my preference is to show them in
cgroup.stat. The recursive stats are not particularly easy to
collect from the debug controller proper anyway.
One problem with the debug subsys is that it's unclear whether its files are
safe to use and can be depended upon in production. Not that anything it shows
currently is particularly risky, but the contract around the debug controller
is that it's debug stuff and developers may do silly things with it (e.g.
doing high-complexity iterations and whatnot).
The debug controller, in general, I'm not sure how useful it is. It does
nothing that drgn scripts can't do and doesn't really have enough extra
benefits that make it better. We didn't have drgn back when it was added, so
it's there for historical reasons, but I don't think it's a good idea to
expand on it.
Thanks.
--
tejun
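Following up on the drgn comparison above, a sketch of a drgn script that pulls
similar numbers for the memory controller out of a live kernel could look
roughly like this (untested; it assumes drgn's Linux cgroup helpers, the
root_mem_cgroup symbol, and a DWARF-visible CSS_ONLINE enumerator):

#!/usr/bin/env drgn
# Untested sketch: count online vs. offlined-but-not-yet-released memory
# CSSes by walking the default hierarchy, roughly what nr_subsys_memory /
# nr_dying_subsys_memory expose through cgroup.stat.
from drgn.helpers.linux.cgroup import css_for_each_descendant_pre

root_css = prog["root_mem_cgroup"].css.address_of_()  # 'prog' is predefined by drgn
CSS_ONLINE = prog.constant("CSS_ONLINE")

live = dying = 0
for css in css_for_each_descendant_pre(root_css):
    if css.flags & CSS_ONLINE:
        live += 1
    else:
        dying += 1

print(f"nr_subsys_memory {live}")
print(f"nr_dying_subsys_memory {dying}")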
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-26 20:04 ` Tejun Heo
@ 2024-07-26 20:36 ` Waiman Long
0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2024-07-26 20:36 UTC (permalink / raw)
To: Tejun Heo, Michal Koutný
Cc: Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/26/24 16:04, Tejun Heo wrote:
> Hello,
>
> On Fri, Jul 26, 2024 at 10:19:05AM +0200, Michal Koutný wrote:
>> On Thu, Jul 25, 2024 at 04:05:42PM GMT, Waiman Long <longman@redhat.com> wrote:
>>>> There's also 'debug' subsys. Have you looked at (extending) that wrt
>>>> dying csses troubleshooting?
>>>> It'd be good to document here why you decided against it.
>>> The config that I used for testing doesn't include CONFIG_CGROUP_DEBUG.
>> I mean if you enable CONFIG_CGROUP_DEBUG, there is 'debug' controller
>> that exposes files like debug.csses et al.
>>
>>> That is why "debug" doesn't show up in the sample outputs. The CSS #
>>> for the debug subsystem should show up if it is enabled.
>> So these "debugging" numbers could be implemented via debug subsys. So I
>> wondered why it's not done this way. That reasoning is missing in the
>> commit message.
> While this is a bit of implementation detail, it's also something which can
> be pretty relevant in production, so my preference is to show them in
> cgroup.stat. The recursive stats is something not particularly easy to
> collect from the debug controller proper anyway.
>
> One problem with debug subsys is that it's unclear whether they are safe to
> use and can be depended upon in production. Not that anything it shows
> currently is particularly risky but the contract around the debug controller
> is that it's debug stuff and developers may do silly things with it (e.g.
> doing high complexity iterations and what not).
>
> The debug controller, in general, I'm not sure how useful it is. It does
> nothing that drgn scripts can't do and doesn't really have enough extra
> benefits that make it better. We didn't have drgn back when it was added, so
> it's there for historical reasons, but I don't think it's a good idea to
> expand on it.
I totally agree.
For RHEL, CONFIG_CGROUP_DEBUG isn't enabled in the production kernel,
but is enabled in the debug kernel. I believe it may be similar in other
distros. So we can't reliably depend on the debug
controller to get this information, which can be useful for monitoring cgroup
behavior in a production kernel.
Cheers,
Longman
[parent not found: <79fcaba1-552b-4969-84fc-70c5151508a5@redhat.com>]
* Re: [PATCH-cgroup v4] cgroup: Show # of subsystem CSSes in cgroup.stat
[not found] ` <79fcaba1-552b-4969-84fc-70c5151508a5@redhat.com>
@ 2024-07-26 20:42 ` Waiman Long
0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2024-07-26 20:42 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/26/24 16:40, Waiman Long wrote:
>
>
> On 7/26/24 04:19, Michal Koutný wrote:
>>>>> + for_each_css(css, ssid, cgroup) {
>>>>> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
>>>>> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
>>>>> + continue;
>>>> Is this taken? (Given cgroup.stat is only on the default hierarchy.)
>>> I am not sure what you are asking here. Since cgroup.stat is a cgroup v2
>>> only control file, it won't show subsystems that are bound to cgroup v1.
>> So, is the if (...) ever true? (The file won't exist on v1.)
>
> A mixed cgroup v1/v2 environment is allowed, though not encouraged. We
> can have some cgroup controllers bound to cgroup v2 and the rest to
> cgroup v1. It is in this case that cgroup.stat will only show those
> bound to cgroup v2.
>
> Regards,
> Longman
>