* [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
@ 2024-07-10 18:23 Waiman Long
2024-07-10 18:23 ` [PATCH v3 2/2] cgroup: Limit frequency of reading cgroup.stat for unprivileged users Waiman Long
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-10 18:23 UTC (permalink / raw)
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Kamalesh Babulal,
Roman Gushchin, Waiman Long
Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
help manage different structures in various cgroup subsystems by being
an embedded element inside a larger structure like cpuset or mem_cgroup.
The /proc/cgroups file shows the number of cgroups for each of the
subsystems. With cgroup v1, the number of CSSes is the same as the
number of cgroups. That is not the case anymore with cgroup v2. The
/proc/cgroups file cannot show the actual number of CSSes for the
subsystems that are bound to cgroup v2.
So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
we can't tell by looking at /proc/cgroups which cgroup subsystems may
be responsible.
As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
cgroup.stat file is now being extended to show the number of live and
dying CSSes associated with all the non-inhibited cgroup subsystems
that have been bound to cgroup v2 as long as it is not zero. The number
includes CSSes in the current cgroup as well as in all the descendants
underneath it. This will help us pinpoint which subsystems are
responsible for the increasing number of dying (nr_dying_descendants)
cgroups.
The cgroup-v2.rst file is updated to discuss this new behavior.
With this patch applied, a sample output from root cgroup.stat file
was shown below.
nr_descendants 54
nr_dying_descendants 44
nr_cpuset 1
nr_cpu 40
nr_io 40
nr_memory 54
nr_dying_memory 44
nr_perf_event 55
nr_hugetlb 1
nr_pids 54
nr_rdma 1
nr_misc 1
Another sample output from system.slice/cgroup.stat was:
nr_descendants 32
nr_dying_descendants 37
nr_cpu 30
nr_io 30
nr_memory 32
nr_dying_memory 37
nr_perf_event 33
nr_pids 32
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 14 ++++++-
include/linux/cgroup-defs.h | 7 ++++
kernel/cgroup/cgroup.c | 50 ++++++++++++++++++++++++-
3 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 52763d6b2919..9031419271cd 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -981,6 +981,16 @@ All cgroup core files are prefixed with "cgroup."
A dying cgroup can consume system resources not exceeding
limits, which were active at the moment of cgroup deletion.
+ nr_<cgroup_subsys>
+ Total number of live cgroups associated with that cgroup
+ subsystem (e.g. memory) at and beneath the current
+ cgroup. An entry will only be shown if it is not zero.
+
+ nr_dying_<cgroup_subsys>
+ Total number of dying cgroups associated with that cgroup
+ subsystem (e.g. memory) beneath the current cgroup.
+ An entry will only be shown if it is not zero.
+
cgroup.freeze
A read-write single value file which exists on non-root cgroups.
Allowed values are "0" and "1". The default is "0".
@@ -2930,8 +2940,8 @@ Deprecated v1 Core Features
- "cgroup.clone_children" is removed.
-- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
- at the root instead.
+- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
+ "cgroup.stat" files at the root instead.
Issues with v1 and Rationales for v2
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index b36690ca0d3f..62de18874508 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -210,6 +210,13 @@ struct cgroup_subsys_state {
* fields of the containing structure.
*/
struct cgroup_subsys_state *parent;
+
+ /*
+ * Keep track of total numbers of visible and dying descendant CSSes.
+ * Protected by cgroup_mutex.
+ */
+ int nr_descendants;
+ int nr_dying_descendants;
};
/*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8e4b62b436a..18c982a06446 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3669,12 +3669,34 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
static int cgroup_stat_show(struct seq_file *seq, void *v)
{
struct cgroup *cgroup = seq_css(seq)->cgroup;
+ struct cgroup_subsys_state *css;
+ int ssid;
+ /* cgroup_mutex required for for_each_css() */
+ cgroup_lock();
seq_printf(seq, "nr_descendants %d\n",
cgroup->nr_descendants);
seq_printf(seq, "nr_dying_descendants %d\n",
cgroup->nr_dying_descendants);
+ /*
+ * Show the number of live and dying csses associated with each of
+ * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
+ */
+ for_each_css(css, ssid, cgroup) {
+ if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
+ (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
+ continue;
+
+ seq_printf(seq, "nr_%s %d\n", cgroup_subsys[ssid]->name,
+ css->nr_descendants + 1);
+ /* Current css is online */
+ if (css->nr_dying_descendants)
+ seq_printf(seq, "nr_dying_%s %d\n",
+ cgroup_subsys[ssid]->name,
+ css->nr_dying_descendants);
+ }
+ cgroup_unlock();
return 0;
}
@@ -5424,6 +5446,8 @@ static void css_release_work_fn(struct work_struct *work)
list_del_rcu(&css->sibling);
if (ss) {
+ struct cgroup_subsys_state *parent_css;
+
/* css release path */
if (!list_empty(&css->rstat_css_node)) {
cgroup_rstat_flush(cgrp);
@@ -5433,6 +5457,14 @@ static void css_release_work_fn(struct work_struct *work)
cgroup_idr_replace(&ss->css_idr, NULL, css->id);
if (ss->css_released)
ss->css_released(css);
+
+ WARN_ON_ONCE(css->nr_descendants || css->nr_dying_descendants);
+ parent_css = css->parent;
+ while (parent_css) {
+ parent_css->nr_dying_descendants--;
+ parent_css = parent_css->parent;
+ }
+ css_put(css->parent); /* Parent can be freed now */
} else {
struct cgroup *tcgrp;
@@ -5517,8 +5549,11 @@ static int online_css(struct cgroup_subsys_state *css)
rcu_assign_pointer(css->cgroup->subsys[ss->id], css);
atomic_inc(&css->online_cnt);
- if (css->parent)
+ if (css->parent) {
atomic_inc(&css->parent->online_cnt);
+ while ((css = css->parent))
+ css->nr_descendants++;
+ }
}
return ret;
}
@@ -5540,6 +5575,19 @@ static void offline_css(struct cgroup_subsys_state *css)
RCU_INIT_POINTER(css->cgroup->subsys[ss->id], NULL);
wake_up_all(&css->cgroup->offline_waitq);
+
+ /*
+ * Get a reference to parent css to ensure reliable access to its
+ * nr_dying_descendants until after this child css is ready to be
+ * freed.
+ */
+ if (css->parent)
+ css_get(css->parent);
+
+ while ((css = css->parent)) {
+ css->nr_descendants--;
+ css->nr_dying_descendants++;
+ }
}
/**
--
2.39.3
^ permalink raw reply related [flat|nested] 21+ messages in thread* [PATCH v3 2/2] cgroup: Limit frequency of reading cgroup.stat for unprivileged users
2024-07-10 18:23 [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Waiman Long
@ 2024-07-10 18:23 ` Waiman Long
2024-07-10 21:43 ` [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Roman Gushchin
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-10 18:23 UTC (permalink / raw)
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Kamalesh Babulal,
Roman Gushchin, Waiman Long
Since cgroup_mutex is being acquired when reading from a world readable
cgroup.stat control file, it is possible that repeated reading of
cgroup.stat may be used as a denial of service (DoS) attack vector
by unprivileged users to greatly hinder cgroup related operations in
a system.
To prevent this, we are limiting the reading of cgroup.stat file from
unprivileged users to at most 8 times per second.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cgroup.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 18c982a06446..56ac9f14d100 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3668,12 +3668,22 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
static int cgroup_stat_show(struct seq_file *seq, void *v)
{
+ static unsigned long unpriv_timestamp;
struct cgroup *cgroup = seq_css(seq)->cgroup;
struct cgroup_subsys_state *css;
int ssid;
+ /*
+ * Unprivileged users can only read cgroup.stat up to 8 times
+ * per second to avoid potential DoS attack.
+ */
+ if (!capable(CAP_SYS_ADMIN) &&
+ time_before(jiffies, unpriv_timestamp + (HZ >> 3)))
+ msleep(HZ >> 3);
+
/* cgroup_mutex required for for_each_css() */
cgroup_lock();
+ unpriv_timestamp = jiffies;
seq_printf(seq, "nr_descendants %d\n",
cgroup->nr_descendants);
seq_printf(seq, "nr_dying_descendants %d\n",
--
2.39.3
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-10 18:23 [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Waiman Long
2024-07-10 18:23 ` [PATCH v3 2/2] cgroup: Limit frequency of reading cgroup.stat for unprivileged users Waiman Long
@ 2024-07-10 21:43 ` Roman Gushchin
2024-07-10 23:49 ` Waiman Long
2024-07-10 21:59 ` Tejun Heo
2024-07-11 13:49 ` Johannes Weiner
3 siblings, 1 reply; 21+ messages in thread
From: Roman Gushchin @ 2024-07-10 21:43 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Kamalesh Babulal
On Wed, Jul 10, 2024 at 02:23:52PM -0400, Waiman Long wrote:
> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
> help manage different structures in various cgroup subsystems by being
> an embedded element inside a larger structure like cpuset or mem_cgroup.
>
> The /proc/cgroups file shows the number of cgroups for each of the
> subsystems. With cgroup v1, the number of CSSes is the same as the
> number of cgroups. That is not the case anymore with cgroup v2. The
> /proc/cgroups file cannot show the actual number of CSSes for the
> subsystems that are bound to cgroup v2.
>
> So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
> we can't tell by looking at /proc/cgroups which cgroup subsystems may
> be responsible.
>
> As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
> cgroup.stat file is now being extended to show the number of live and
> dying CSSes associated with all the non-inhibited cgroup subsystems
> that have been bound to cgroup v2 as long as it is not zero. The number
> includes CSSes in the current cgroup as well as in all the descendants
> underneath it. This will help us pinpoint which subsystems are
> responsible for the increasing number of dying (nr_dying_descendants)
> cgroups.
>
> The cgroup-v2.rst file is updated to discuss this new behavior.
>
> With this patch applied, a sample output from root cgroup.stat file
> was shown below.
>
> nr_descendants 54
> nr_dying_descendants 44
> nr_cpuset 1
> nr_cpu 40
> nr_io 40
> nr_memory 54
> nr_dying_memory 44
> nr_perf_event 55
> nr_hugetlb 1
> nr_pids 54
> nr_rdma 1
> nr_misc 1
>
> Another sample output from system.slice/cgroup.stat was:
>
> nr_descendants 32
> nr_dying_descendants 37
> nr_cpu 30
> nr_io 30
> nr_memory 32
> nr_dying_memory 37
> nr_perf_event 33
> nr_pids 32
>
> Signed-off-by: Waiman Long <longman@redhat.com>
I like it way more than the previous version, thank you for the update.
> ---
> Documentation/admin-guide/cgroup-v2.rst | 14 ++++++-
> include/linux/cgroup-defs.h | 7 ++++
> kernel/cgroup/cgroup.c | 50 ++++++++++++++++++++++++-
> 3 files changed, 68 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 52763d6b2919..9031419271cd 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -981,6 +981,16 @@ All cgroup core files are prefixed with "cgroup."
> A dying cgroup can consume system resources not exceeding
> limits, which were active at the moment of cgroup deletion.
>
> + nr_<cgroup_subsys>
> + Total number of live cgroups associated with that cgroup
> + subsystem (e.g. memory) at and beneath the current
> + cgroup. An entry will only be shown if it is not zero.
> +
> + nr_dying_<cgroup_subsys>
> + Total number of dying cgroups associated with that cgroup
> + subsystem (e.g. memory) beneath the current cgroup.
> + An entry will only be shown if it is not zero.
> +
> cgroup.freeze
> A read-write single value file which exists on non-root cgroups.
> Allowed values are "0" and "1". The default is "0".
> @@ -2930,8 +2940,8 @@ Deprecated v1 Core Features
>
> - "cgroup.clone_children" is removed.
>
> -- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
> - at the root instead.
> +- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
> + "cgroup.stat" files at the root instead.
>
>
> Issues with v1 and Rationales for v2
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index b36690ca0d3f..62de18874508 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -210,6 +210,13 @@ struct cgroup_subsys_state {
> * fields of the containing structure.
> */
> struct cgroup_subsys_state *parent;
> +
> + /*
> + * Keep track of total numbers of visible and dying descendant CSSes.
> + * Protected by cgroup_mutex.
> + */
> + int nr_descendants;
> + int nr_dying_descendants;
> };
>
> /*
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index c8e4b62b436a..18c982a06446 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3669,12 +3669,34 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
> static int cgroup_stat_show(struct seq_file *seq, void *v)
> {
> struct cgroup *cgroup = seq_css(seq)->cgroup;
> + struct cgroup_subsys_state *css;
> + int ssid;
>
> + /* cgroup_mutex required for for_each_css() */
> + cgroup_lock();
I *guess* it can be done under a rcu_read_lock(), isn't it?
That would eliminate a need for the second patch as well, which
is questionable (e.g. one unprivileged user can block others?)
Thanks!
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-10 21:43 ` [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Roman Gushchin
@ 2024-07-10 23:49 ` Waiman Long
0 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-10 23:49 UTC (permalink / raw)
To: Roman Gushchin
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Kamalesh Babulal
On 7/10/24 17:43, Roman Gushchin wrote:
> On Wed, Jul 10, 2024 at 02:23:52PM -0400, Waiman Long wrote:
>> Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
>> help manage different structures in various cgroup subsystems by being
>> an embedded element inside a larger structure like cpuset or mem_cgroup.
>>
>> The /proc/cgroups file shows the number of cgroups for each of the
>> subsystems. With cgroup v1, the number of CSSes is the same as the
>> number of cgroups. That is not the case anymore with cgroup v2. The
>> /proc/cgroups file cannot show the actual number of CSSes for the
>> subsystems that are bound to cgroup v2.
>>
>> So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
>> we can't tell by looking at /proc/cgroups which cgroup subsystems may
>> be responsible.
>>
>> As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical
>> cgroup.stat file is now being extended to show the number of live and
>> dying CSSes associated with all the non-inhibited cgroup subsystems
>> that have been bound to cgroup v2 as long as it is not zero. The number
>> includes CSSes in the current cgroup as well as in all the descendants
>> underneath it. This will help us pinpoint which subsystems are
>> responsible for the increasing number of dying (nr_dying_descendants)
>> cgroups.
>>
>> The cgroup-v2.rst file is updated to discuss this new behavior.
>>
>> With this patch applied, a sample output from root cgroup.stat file
>> was shown below.
>>
>> nr_descendants 54
>> nr_dying_descendants 44
>> nr_cpuset 1
>> nr_cpu 40
>> nr_io 40
>> nr_memory 54
>> nr_dying_memory 44
>> nr_perf_event 55
>> nr_hugetlb 1
>> nr_pids 54
>> nr_rdma 1
>> nr_misc 1
>>
>> Another sample output from system.slice/cgroup.stat was:
>>
>> nr_descendants 32
>> nr_dying_descendants 37
>> nr_cpu 30
>> nr_io 30
>> nr_memory 32
>> nr_dying_memory 37
>> nr_perf_event 33
>> nr_pids 32
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
> I like it way more than the previous version, thank you for the update.
>
>> ---
>> Documentation/admin-guide/cgroup-v2.rst | 14 ++++++-
>> include/linux/cgroup-defs.h | 7 ++++
>> kernel/cgroup/cgroup.c | 50 ++++++++++++++++++++++++-
>> 3 files changed, 68 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 52763d6b2919..9031419271cd 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -981,6 +981,16 @@ All cgroup core files are prefixed with "cgroup."
>> A dying cgroup can consume system resources not exceeding
>> limits, which were active at the moment of cgroup deletion.
>>
>> + nr_<cgroup_subsys>
>> + Total number of live cgroups associated with that cgroup
>> + subsystem (e.g. memory) at and beneath the current
>> + cgroup. An entry will only be shown if it is not zero.
>> +
>> + nr_dying_<cgroup_subsys>
>> + Total number of dying cgroups associated with that cgroup
>> + subsystem (e.g. memory) beneath the current cgroup.
>> + An entry will only be shown if it is not zero.
>> +
>> cgroup.freeze
>> A read-write single value file which exists on non-root cgroups.
>> Allowed values are "0" and "1". The default is "0".
>> @@ -2930,8 +2940,8 @@ Deprecated v1 Core Features
>>
>> - "cgroup.clone_children" is removed.
>>
>> -- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
>> - at the root instead.
>> +- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
>> + "cgroup.stat" files at the root instead.
>>
>>
>> Issues with v1 and Rationales for v2
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index b36690ca0d3f..62de18874508 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -210,6 +210,13 @@ struct cgroup_subsys_state {
>> * fields of the containing structure.
>> */
>> struct cgroup_subsys_state *parent;
>> +
>> + /*
>> + * Keep track of total numbers of visible and dying descendant CSSes.
>> + * Protected by cgroup_mutex.
>> + */
>> + int nr_descendants;
>> + int nr_dying_descendants;
>> };
>>
>> /*
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index c8e4b62b436a..18c982a06446 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -3669,12 +3669,34 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
>> static int cgroup_stat_show(struct seq_file *seq, void *v)
>> {
>> struct cgroup *cgroup = seq_css(seq)->cgroup;
>> + struct cgroup_subsys_state *css;
>> + int ssid;
>>
>> + /* cgroup_mutex required for for_each_css() */
>> + cgroup_lock();
> I *guess* it can be done under a rcu_read_lock(), isn't it?
> That would eliminate a need for the second patch as well, which
> is questionable (e.g. one unprivileged user can block others?)
I am just following the instruction in the for_each_css() macro:
*
* Should be called under cgroup_mutex.
*/
I think taking rcu_read_lock() should also work in this case. Will try
it out and update the patch after some testing.
Thanks,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-10 18:23 [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Waiman Long
2024-07-10 18:23 ` [PATCH v3 2/2] cgroup: Limit frequency of reading cgroup.stat for unprivileged users Waiman Long
2024-07-10 21:43 ` [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Roman Gushchin
@ 2024-07-10 21:59 ` Tejun Heo
2024-07-10 23:51 ` Waiman Long
2024-07-11 13:49 ` Johannes Weiner
3 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2024-07-10 21:59 UTC (permalink / raw)
To: Waiman Long
Cc: Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
Hello,
On Wed, Jul 10, 2024 at 02:23:52PM -0400, Waiman Long wrote:
> With this patch applied, a sample output from root cgroup.stat file
> was shown below.
>
> nr_descendants 54
> nr_dying_descendants 44
> nr_cpuset 1
> nr_cpu 40
> nr_io 40
> nr_memory 54
> nr_dying_memory 44
> nr_perf_event 55
> nr_hugetlb 1
> nr_pids 54
> nr_rdma 1
> nr_misc 1
So, css may be too specific a name but this looks a bit disorganized. How
about using controller as the common prefix? Maybe something like:
nr_controllers_cpu 40
nr_controllers_io 40
nr_controllers_memory 54
nr_controller_perf_event 55
...
nr_dying_controllers_memory 44
If controllers is too long, we can shorten it somehow or use subsys, maybe?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-10 21:59 ` Tejun Heo
@ 2024-07-10 23:51 ` Waiman Long
0 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-10 23:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Zefan Li, Johannes Weiner, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/10/24 17:59, Tejun Heo wrote:
> Hello,
>
> On Wed, Jul 10, 2024 at 02:23:52PM -0400, Waiman Long wrote:
>> With this patch applied, a sample output from root cgroup.stat file
>> was shown below.
>>
>> nr_descendants 54
>> nr_dying_descendants 44
>> nr_cpuset 1
>> nr_cpu 40
>> nr_io 40
>> nr_memory 54
>> nr_dying_memory 44
>> nr_perf_event 55
>> nr_hugetlb 1
>> nr_pids 54
>> nr_rdma 1
>> nr_misc 1
> So, css may be too specific a name but this looks a bit disorganized. How
> about using controller as the common prefix? Maybe something like:
>
> nr_controllers_cpu 40
> nr_controllers_io 40
> nr_controllers_memory 54
> nr_controller_perf_event 55
> ...
> nr_dying_controllers_memory 44
>
> If controllers is too long, we can shorten it somehow or use subsys, maybe?
I think "controllers" is too long. I am fine with "subsys". Will make
change in the next version.
Thanks,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-10 18:23 [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Waiman Long
` (2 preceding siblings ...)
2024-07-10 21:59 ` Tejun Heo
@ 2024-07-11 13:49 ` Johannes Weiner
2024-07-11 14:05 ` Waiman Long
3 siblings, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2024-07-11 13:49 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On Wed, Jul 10, 2024 at 02:23:52PM -0400, Waiman Long wrote:
> @@ -3669,12 +3669,34 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
> static int cgroup_stat_show(struct seq_file *seq, void *v)
> {
> struct cgroup *cgroup = seq_css(seq)->cgroup;
> + struct cgroup_subsys_state *css;
> + int ssid;
>
> + /* cgroup_mutex required for for_each_css() */
> + cgroup_lock();
> seq_printf(seq, "nr_descendants %d\n",
> cgroup->nr_descendants);
> seq_printf(seq, "nr_dying_descendants %d\n",
> cgroup->nr_dying_descendants);
>
> + /*
> + * Show the number of live and dying csses associated with each of
> + * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
> + */
> + for_each_css(css, ssid, cgroup) {
> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
> + continue;
> +
> + seq_printf(seq, "nr_%s %d\n", cgroup_subsys[ssid]->name,
> + css->nr_descendants + 1);
> + /* Current css is online */
> + if (css->nr_dying_descendants)
> + seq_printf(seq, "nr_dying_%s %d\n",
> + cgroup_subsys[ssid]->name,
> + css->nr_dying_descendants);
> + }
I think it'd be better to print the dying count unconditionally. It
makes the output more predictable for parsers, and also it's clearer
to users which data points are being tracked and reported.
With that, and TJ's "subsys" suggestion for the name, it looks good to
me. Thanks!
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 13:49 ` Johannes Weiner
@ 2024-07-11 14:05 ` Waiman Long
2024-07-11 17:18 ` Tejun Heo
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2024-07-11 14:05 UTC (permalink / raw)
To: Johannes Weiner
Cc: Tejun Heo, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/11/24 09:49, Johannes Weiner wrote:
> On Wed, Jul 10, 2024 at 02:23:52PM -0400, Waiman Long wrote:
>> @@ -3669,12 +3669,34 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
>> static int cgroup_stat_show(struct seq_file *seq, void *v)
>> {
>> struct cgroup *cgroup = seq_css(seq)->cgroup;
>> + struct cgroup_subsys_state *css;
>> + int ssid;
>>
>> + /* cgroup_mutex required for for_each_css() */
>> + cgroup_lock();
>> seq_printf(seq, "nr_descendants %d\n",
>> cgroup->nr_descendants);
>> seq_printf(seq, "nr_dying_descendants %d\n",
>> cgroup->nr_dying_descendants);
>>
>> + /*
>> + * Show the number of live and dying csses associated with each of
>> + * non-inhibited cgroup subsystems bound to cgroup v2 if non-zero.
>> + */
>> + for_each_css(css, ssid, cgroup) {
>> + if ((BIT(ssid) & cgrp_dfl_inhibit_ss_mask) ||
>> + (cgroup_subsys[ssid]->root != &cgrp_dfl_root))
>> + continue;
>> +
>> + seq_printf(seq, "nr_%s %d\n", cgroup_subsys[ssid]->name,
>> + css->nr_descendants + 1);
>> + /* Current css is online */
>> + if (css->nr_dying_descendants)
>> + seq_printf(seq, "nr_dying_%s %d\n",
>> + cgroup_subsys[ssid]->name,
>> + css->nr_dying_descendants);
>> + }
> I think it'd be better to print the dying count unconditionally. It
> makes the output more predictable for parsers, and also it's clearer
> to users which data points are being tracked and reported.
>
> With that, and TJ's "subsys" suggestion for the name, it looks good to
> me. Thanks!
Given the fact that for_each_css() iteration is filtering out csses that
are absent, the dying counts follow the same logic of skipping it if
there is no dying css. That also makes it easier to identify cgroups
with dying descendant csses as we don't need filter out entries with a 0
dying count. It also makes the output less verbose and let user focus
more on what are significant.
I do understand that it makes it inconsistent with the ways
nr_descendants and nr_dying_descendants are being handled as entries
with 0 count are also displayed. I can update the patch to display those
entries with 0 dying subsys count if other people also think that is the
better way forward.
Cheers,
Longman
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 14:05 ` Waiman Long
@ 2024-07-11 17:18 ` Tejun Heo
2024-07-11 17:39 ` Waiman Long
0 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2024-07-11 17:18 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
Hello,
On Thu, Jul 11, 2024 at 10:05:22AM -0400, Waiman Long wrote:
> Given the fact that for_each_css() iteration is filtering out csses that are
> absent, the dying counts follow the same logic of skipping it if there is no
> dying css. That also makes it easier to identify cgroups with dying
> descendant csses as we don't need filter out entries with a 0 dying count.
> It also makes the output less verbose and let user focus more on what are
> significant.
>
> I do understand that it makes it inconsistent with the ways nr_descendants
> and nr_dying_descendants are being handled as entries with 0 count are also
> displayed. I can update the patch to display those entries with 0 dying
> subsys count if other people also think that is the better way forward.
I think it'd be better to have all the keys. There are some dynamic keys in
stat files but those are mostly for things which can come and go (e.g. block
and misc devices), so yeah, I think it'd be better to show all the keys even
when they're zero.
Also, I personally would much prefer if the same prefixes are collected
together - ie. totals first and then dying. It's just a lot easier on the
eyes that way.
nr_subsys_cpu
nr_subsys_memory
nr_subsys_io
...
nr_dying_subsys_cpu
nr_dying_subsys_memory
nr_dying_subsys_io
...
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 17:18 ` Tejun Heo
@ 2024-07-11 17:39 ` Waiman Long
2024-07-11 18:44 ` Tejun Heo
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2024-07-11 17:39 UTC (permalink / raw)
To: Tejun Heo
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/11/24 13:18, Tejun Heo wrote:
> Hello,
>
> On Thu, Jul 11, 2024 at 10:05:22AM -0400, Waiman Long wrote:
>> Given the fact that for_each_css() iteration is filtering out csses that are
>> absent, the dying counts follow the same logic of skipping it if there is no
>> dying css. That also makes it easier to identify cgroups with dying
>> descendant csses as we don't need filter out entries with a 0 dying count.
>> It also makes the output less verbose and let user focus more on what are
>> significant.
>>
>> I do understand that it makes it inconsistent with the ways nr_descendants
>> and nr_dying_descendants are being handled as entries with 0 count are also
>> displayed. I can update the patch to display those entries with 0 dying
>> subsys count if other people also think that is the better way forward.
> I think it'd be better to have all the keys. There are some dynamic keys in
> stat files but those are mostly for things which can come and go (e.g. block
> and misc devices), so yeah, I think it'd be better to show all the keys even
> when they're zero.
Currently, I use the for_each_css() macro for iteration. If you mean
displaying all the possible cgroup subsystems even if they are not
enabled for the current cgroup, I will have to manually do the iteration.
>
> Also, I personally would much prefer if the same prefixes are collected
> together - ie. totals first and then dying. It's just a lot easier on the
> eyes that way.
>
> nr_subsys_cpu
> nr_subsys_memory
> nr_subsys_io
> ...
> nr_dying_subsys_cpu
> nr_dying_subsys_memory
> nr_dying_subsys_io
> ...
That is fine. I can group entries with the same prefix together.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 17:39 ` Waiman Long
@ 2024-07-11 18:44 ` Tejun Heo
2024-07-11 18:51 ` Waiman Long
0 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2024-07-11 18:44 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
Hello,
On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
> On 7/11/24 13:18, Tejun Heo wrote:
...
> Currently, I use the for_each_css() macro for iteration. If you mean
> displaying all the possible cgroup subsystems even if they are not enabled
> for the current cgroup, I will have to manually do the iteration.
Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
iterate anything if css doesn't exist for the cgroup.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 18:44 ` Tejun Heo
@ 2024-07-11 18:51 ` Waiman Long
2024-07-11 18:59 ` Tejun Heo
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2024-07-11 18:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/11/24 14:44, Tejun Heo wrote:
> Hello,
>
> On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
>> On 7/11/24 13:18, Tejun Heo wrote:
> ...
>> Currently, I use the for_each_css() macro for iteration. If you mean
>> displaying all the possible cgroup subsystems even if they are not enabled
>> for the current cgroup, I will have to manually do the iteration.
> Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
> iterate anything if css doesn't exist for the cgroup.
OK, I wasn't sure if you were asking to list all the possible cgroup v2
cgroup subsystems even if they weren't enabled in the current cgroup.
Apparently, that is the case. I prefer it that way too.
Thanks,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 18:51 ` Waiman Long
@ 2024-07-11 18:59 ` Tejun Heo
2024-07-11 19:13 ` Waiman Long
0 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2024-07-11 18:59 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On Thu, Jul 11, 2024 at 02:51:38PM -0400, Waiman Long wrote:
>
> On 7/11/24 14:44, Tejun Heo wrote:
> > Hello,
> >
> > On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
> > > On 7/11/24 13:18, Tejun Heo wrote:
> > ...
> > > Currently, I use the for_each_css() macro for iteration. If you mean
> > > displaying all the possible cgroup subsystems even if they are not enabled
> > > for the current cgroup, I will have to manually do the iteration.
> > Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
> > iterate anything if css doesn't exist for the cgroup.
>
> OK, I wasn't sure if you were asking to list all the possible cgroup v2
> cgroup subsystems even if they weren't enabled in the current cgroup.
> Apparently, that is the case. I prefer it that way too.
Yeah, I think listing all is better. If the list corresponded directly to
cgroup.controllers, it may make sense to only show enabled ones but we can
have dying ones and implicitly enabled memory and so on, so I think it'd be
cleaner to just list them all.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 18:59 ` Tejun Heo
@ 2024-07-11 19:13 ` Waiman Long
2024-07-11 19:21 ` Tejun Heo
2024-07-11 19:59 ` Johannes Weiner
0 siblings, 2 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-11 19:13 UTC (permalink / raw)
To: Tejun Heo
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/11/24 14:59, Tejun Heo wrote:
> On Thu, Jul 11, 2024 at 02:51:38PM -0400, Waiman Long wrote:
>> On 7/11/24 14:44, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
>>>> On 7/11/24 13:18, Tejun Heo wrote:
>>> ...
>>>> Currently, I use the for_each_css() macro for iteration. If you mean
>>>> displaying all the possible cgroup subsystems even if they are not enabled
>>>> for the current cgroup, I will have to manually do the iteration.
>>> Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
>>> iterate anything if css doesn't exist for the cgroup.
>> OK, I wasn't sure if you were asking to list all the possible cgroup v2
>> cgroup subsystems even if they weren't enabled in the current cgroup.
>> Apparently, that is the case. I prefer it that way too.
> Yeah, I think listing all is better. If the list corresponded directly to
> cgroup.controllers, it may make sense to only show enabled ones but we can
> have dying ones and implicitly enabled memory and so on, so I think it'd be
> cleaner to just list them all.
That will means cgroup subsystems that are seldomly used like rdma, misc
or even hugetlb will always be shown in all the cgroup.stat output. I
actually prefer just showing those that are enabled. As for dying memory
cgroups, they will only be shown in its online ancestors. We currently
don't know how many level down are each of the dying ones.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 19:13 ` Waiman Long
@ 2024-07-11 19:21 ` Tejun Heo
2024-07-11 19:29 ` Waiman Long
2024-07-11 19:59 ` Johannes Weiner
1 sibling, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2024-07-11 19:21 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
Hello,
On Thu, Jul 11, 2024 at 03:13:12PM -0400, Waiman Long wrote:
> That will means cgroup subsystems that are seldomly used like rdma, misc or
> even hugetlb will always be shown in all the cgroup.stat output. I actually
Hmm... yeah, that does increase the amount of output, but I don't know. The
trade-off is between consistency and brevity and I think I'd go for
consistency here.
> prefer just showing those that are enabled. As for dying memory cgroups,
> they will only be shown in its online ancestors. We currently don't know how
So, one peculiarity with memory is that when you enable io, it gets
implicitly enabled together and we likely wanna show that.
> many level down are each of the dying ones.
I didn't understand the last sentence.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 19:21 ` Tejun Heo
@ 2024-07-11 19:29 ` Waiman Long
0 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-11 19:29 UTC (permalink / raw)
To: Tejun Heo
Cc: Johannes Weiner, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/11/24 15:21, Tejun Heo wrote:
> Hello,
>
> On Thu, Jul 11, 2024 at 03:13:12PM -0400, Waiman Long wrote:
>> That will means cgroup subsystems that are seldomly used like rdma, misc or
>> even hugetlb will always be shown in all the cgroup.stat output. I actually
> Hmm... yeah, that does increase the amount of output, but I don't know. The
> trade-off is between consistency and brevity and I think I'd go for
> consistency here.
>
>> prefer just showing those that are enabled. As for dying memory cgroups,
>> they will only be shown in its online ancestors. We currently don't know how
> So, one peculiarity with memory is that when you enable io, it gets
> implicitly enabled together and we likely wanna show that.
If memory is implicitly enabled, it is treated as enabled as the
corresponding cgroup->subsys[] entry should be set. I currently don't
filter out those in the cgrp_dfl_implicit_ss_mask. So perf_event, which
is implicitly enabled, is shown in all the cgroup.stat.
If you still want to display all even if some of them are not enabled, I
can certainly do that.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 19:13 ` Waiman Long
2024-07-11 19:21 ` Tejun Heo
@ 2024-07-11 19:59 ` Johannes Weiner
2024-07-11 21:00 ` Waiman Long
1 sibling, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2024-07-11 19:59 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On Thu, Jul 11, 2024 at 03:13:12PM -0400, Waiman Long wrote:
>
> On 7/11/24 14:59, Tejun Heo wrote:
> > On Thu, Jul 11, 2024 at 02:51:38PM -0400, Waiman Long wrote:
> >> On 7/11/24 14:44, Tejun Heo wrote:
> >>> Hello,
> >>>
> >>> On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
> >>>> On 7/11/24 13:18, Tejun Heo wrote:
> >>> ...
> >>>> Currently, I use the for_each_css() macro for iteration. If you mean
> >>>> displaying all the possible cgroup subsystems even if they are not enabled
> >>>> for the current cgroup, I will have to manually do the iteration.
> >>> Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
> >>> iterate anything if css doesn't exist for the cgroup.
> >> OK, I wasn't sure if you were asking to list all the possible cgroup v2
> >> cgroup subsystems even if they weren't enabled in the current cgroup.
> >> Apparently, that is the case. I prefer it that way too.
> > Yeah, I think listing all is better. If the list corresponded directly to
> > cgroup.controllers, it may make sense to only show enabled ones but we can
> > have dying ones and implicitly enabled memory and so on, so I think it'd be
> > cleaner to just list them all.
>
> That will means cgroup subsystems that are seldomly used like rdma, misc
> or even hugetlb will always be shown in all the cgroup.stat output. I
> actually prefer just showing those that are enabled. As for dying memory
> cgroups, they will only be shown in its online ancestors. We currently
> don't know how many level down are each of the dying ones.
It seems odd to me to not show dead ones after a cgroup has disabled
the controller again. They still consume memory, after all, and so
continue to be property of that cgroup afterwards.
Instead of doing for_each_css(), would it make more sense to have
struct cgroup {
...
int nr_dying_subsys[CGROUP_SUBSYS_COUNT];
...
}
and just always print them all, regardless of what is, or was,
enabled?
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 19:59 ` Johannes Weiner
@ 2024-07-11 21:00 ` Waiman Long
2024-07-11 21:57 ` Waiman Long
2024-07-12 16:29 ` Johannes Weiner
0 siblings, 2 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-11 21:00 UTC (permalink / raw)
To: Johannes Weiner
Cc: Tejun Heo, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/11/24 15:59, Johannes Weiner wrote:
> On Thu, Jul 11, 2024 at 03:13:12PM -0400, Waiman Long wrote:
>> On 7/11/24 14:59, Tejun Heo wrote:
>>> On Thu, Jul 11, 2024 at 02:51:38PM -0400, Waiman Long wrote:
>>>> On 7/11/24 14:44, Tejun Heo wrote:
>>>>> Hello,
>>>>>
>>>>> On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
>>>>>> On 7/11/24 13:18, Tejun Heo wrote:
>>>>> ...
>>>>>> Currently, I use the for_each_css() macro for iteration. If you mean
>>>>>> displaying all the possible cgroup subsystems even if they are not enabled
>>>>>> for the current cgroup, I will have to manually do the iteration.
>>>>> Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
>>>>> iterate anything if css doesn't exist for the cgroup.
>>>> OK, I wasn't sure if you were asking to list all the possible cgroup v2
>>>> cgroup subsystems even if they weren't enabled in the current cgroup.
>>>> Apparently, that is the case. I prefer it that way too.
>>> Yeah, I think listing all is better. If the list corresponded directly to
>>> cgroup.controllers, it may make sense to only show enabled ones but we can
>>> have dying ones and implicitly enabled memory and so on, so I think it'd be
>>> cleaner to just list them all.
>> That will means cgroup subsystems that are seldomly used like rdma, misc
>> or even hugetlb will always be shown in all the cgroup.stat output. I
>> actually prefer just showing those that are enabled. As for dying memory
>> cgroups, they will only be shown in its online ancestors. We currently
>> don't know how many level down are each of the dying ones.
> It seems odd to me to not show dead ones after a cgroup has disabled
> the controller again. They still consume memory, after all, and so
> continue to be property of that cgroup afterwards.
>
> Instead of doing for_each_css(), would it make more sense to have
>
> struct cgroup {
> ...
> int nr_dying_subsys[CGROUP_SUBSYS_COUNT];
What exactly does new this array for? Is this for copying out
css->nr_dying_descendants before disabling a controller? The number may
be out of date when it is used. I would think we should store the actual
css and clearing it again once the css is ready to be freed.
Anyway, I would suggest doing it as a separate add-on patch if we decide
to do it instead of adding it to the current patch.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 21:00 ` Waiman Long
@ 2024-07-11 21:57 ` Waiman Long
2024-07-12 16:29 ` Johannes Weiner
1 sibling, 0 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-11 21:57 UTC (permalink / raw)
To: Johannes Weiner
Cc: Tejun Heo, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/11/24 17:00, Waiman Long wrote:
> On 7/11/24 15:59, Johannes Weiner wrote:
>> On Thu, Jul 11, 2024 at 03:13:12PM -0400, Waiman Long wrote:
>>> On 7/11/24 14:59, Tejun Heo wrote:
>>>> On Thu, Jul 11, 2024 at 02:51:38PM -0400, Waiman Long wrote:
>>>>> On 7/11/24 14:44, Tejun Heo wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
>>>>>>> On 7/11/24 13:18, Tejun Heo wrote:
>>>>>> ...
>>>>>>> Currently, I use the for_each_css() macro for iteration. If you
>>>>>>> mean
>>>>>>> displaying all the possible cgroup subsystems even if they are
>>>>>>> not enabled
>>>>>>> for the current cgroup, I will have to manually do the iteration.
>>>>>> Just wrapping it with for_each_subsys() should do, no?
>>>>>> for_each_css() won't
>>>>>> iterate anything if css doesn't exist for the cgroup.
>>>>> OK, I wasn't sure if you were asking to list all the possible
>>>>> cgroup v2
>>>>> cgroup subsystems even if they weren't enabled in the current cgroup.
>>>>> Apparently, that is the case. I prefer it that way too.
>>>> Yeah, I think listing all is better. If the list corresponded
>>>> directly to
>>>> cgroup.controllers, it may make sense to only show enabled ones but
>>>> we can
>>>> have dying ones and implicitly enabled memory and so on, so I think
>>>> it'd be
>>>> cleaner to just list them all.
>>> That will means cgroup subsystems that are seldomly used like rdma,
>>> misc
>>> or even hugetlb will always be shown in all the cgroup.stat output. I
>>> actually prefer just showing those that are enabled. As for dying
>>> memory
>>> cgroups, they will only be shown in its online ancestors. We currently
>>> don't know how many level down are each of the dying ones.
>> It seems odd to me to not show dead ones after a cgroup has disabled
>> the controller again. They still consume memory, after all, and so
>> continue to be property of that cgroup afterwards.
>>
>> Instead of doing for_each_css(), would it make more sense to have
>>
>> struct cgroup {
>> ...
>> int nr_dying_subsys[CGROUP_SUBSYS_COUNT];
>
> What exactly does new this array for? Is this for copying out
> css->nr_dying_descendants before disabling a controller? The number
> may be out of date when it is used. I would think we should store the
> actual css and clearing it again once the css is ready to be freed.
>
> Anyway, I would suggest doing it as a separate add-on patch if we
> decide to do it instead of adding it to the current patch.
Alternatively, we could delay the clearing of cgroup->subsys[] entry
from offline time to until the css is ready to be freed. We do need to
add check about the CSS_ONLINE flag when we only want to deal with
online csses.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-11 21:00 ` Waiman Long
2024-07-11 21:57 ` Waiman Long
@ 2024-07-12 16:29 ` Johannes Weiner
2024-07-12 17:10 ` Waiman Long
1 sibling, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2024-07-12 16:29 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On Thu, Jul 11, 2024 at 05:00:41PM -0400, Waiman Long wrote:
> On 7/11/24 15:59, Johannes Weiner wrote:
> > On Thu, Jul 11, 2024 at 03:13:12PM -0400, Waiman Long wrote:
> >> On 7/11/24 14:59, Tejun Heo wrote:
> >>> On Thu, Jul 11, 2024 at 02:51:38PM -0400, Waiman Long wrote:
> >>>> On 7/11/24 14:44, Tejun Heo wrote:
> >>>>> Hello,
> >>>>>
> >>>>> On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
> >>>>>> On 7/11/24 13:18, Tejun Heo wrote:
> >>>>> ...
> >>>>>> Currently, I use the for_each_css() macro for iteration. If you mean
> >>>>>> displaying all the possible cgroup subsystems even if they are not enabled
> >>>>>> for the current cgroup, I will have to manually do the iteration.
> >>>>> Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
> >>>>> iterate anything if css doesn't exist for the cgroup.
> >>>> OK, I wasn't sure if you were asking to list all the possible cgroup v2
> >>>> cgroup subsystems even if they weren't enabled in the current cgroup.
> >>>> Apparently, that is the case. I prefer it that way too.
> >>> Yeah, I think listing all is better. If the list corresponded directly to
> >>> cgroup.controllers, it may make sense to only show enabled ones but we can
> >>> have dying ones and implicitly enabled memory and so on, so I think it'd be
> >>> cleaner to just list them all.
> >> That will means cgroup subsystems that are seldomly used like rdma, misc
> >> or even hugetlb will always be shown in all the cgroup.stat output. I
> >> actually prefer just showing those that are enabled. As for dying memory
> >> cgroups, they will only be shown in its online ancestors. We currently
> >> don't know how many level down are each of the dying ones.
> > It seems odd to me to not show dead ones after a cgroup has disabled
> > the controller again. They still consume memory, after all, and so
> > continue to be property of that cgroup afterwards.
> >
> > Instead of doing for_each_css(), would it make more sense to have
> >
> > struct cgroup {
> > ...
> > int nr_dying_subsys[CGROUP_SUBSYS_COUNT];
>
> What exactly does new this array for?
For keeping the counts. Instead of inside the css.
AFAICS, with your current patch, if somebody were to disable the
controller in cgroup.subtree_control, it would offline the css at that
level, become unreachable from cgroup->subsys[], and you'd lose
remaining counts of dead css that are still associated with that
cgroup. Re-enabling the controller would create a new css with new
descendant counts, and now the reported numbers are actively misleading.
That seems undesirable.
If you track the counts in the cgroup itself, then cgroup.stat would
reliably show the total counts of dead controllers that are associated
with the subtree, even after disabling or toggling controllers.
The hooks in online, offline, release should be the same, just update
css->cgroup->nr_dying_subsys[id] instead of css->nr_dying_descendants.
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat
2024-07-12 16:29 ` Johannes Weiner
@ 2024-07-12 17:10 ` Waiman Long
0 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2024-07-12 17:10 UTC (permalink / raw)
To: Johannes Weiner
Cc: Tejun Heo, Zefan Li, Jonathan Corbet, cgroups, linux-doc,
linux-kernel, Kamalesh Babulal, Roman Gushchin
On 7/12/24 12:29, Johannes Weiner wrote:
> On Thu, Jul 11, 2024 at 05:00:41PM -0400, Waiman Long wrote:
>> On 7/11/24 15:59, Johannes Weiner wrote:
>>> On Thu, Jul 11, 2024 at 03:13:12PM -0400, Waiman Long wrote:
>>>> On 7/11/24 14:59, Tejun Heo wrote:
>>>>> On Thu, Jul 11, 2024 at 02:51:38PM -0400, Waiman Long wrote:
>>>>>> On 7/11/24 14:44, Tejun Heo wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Thu, Jul 11, 2024 at 01:39:38PM -0400, Waiman Long wrote:
>>>>>>>> On 7/11/24 13:18, Tejun Heo wrote:
>>>>>>> ...
>>>>>>>> Currently, I use the for_each_css() macro for iteration. If you mean
>>>>>>>> displaying all the possible cgroup subsystems even if they are not enabled
>>>>>>>> for the current cgroup, I will have to manually do the iteration.
>>>>>>> Just wrapping it with for_each_subsys() should do, no? for_each_css() won't
>>>>>>> iterate anything if css doesn't exist for the cgroup.
>>>>>> OK, I wasn't sure if you were asking to list all the possible cgroup v2
>>>>>> cgroup subsystems even if they weren't enabled in the current cgroup.
>>>>>> Apparently, that is the case. I prefer it that way too.
>>>>> Yeah, I think listing all is better. If the list corresponded directly to
>>>>> cgroup.controllers, it may make sense to only show enabled ones but we can
>>>>> have dying ones and implicitly enabled memory and so on, so I think it'd be
>>>>> cleaner to just list them all.
>>>> That will means cgroup subsystems that are seldomly used like rdma, misc
>>>> or even hugetlb will always be shown in all the cgroup.stat output. I
>>>> actually prefer just showing those that are enabled. As for dying memory
>>>> cgroups, they will only be shown in its online ancestors. We currently
>>>> don't know how many level down are each of the dying ones.
>>> It seems odd to me to not show dead ones after a cgroup has disabled
>>> the controller again. They still consume memory, after all, and so
>>> continue to be property of that cgroup afterwards.
>>>
>>> Instead of doing for_each_css(), would it make more sense to have
>>>
>>> struct cgroup {
>>> ...
>>> int nr_dying_subsys[CGROUP_SUBSYS_COUNT];
>> What exactly does new this array for?
> For keeping the counts. Instead of inside the css.
>
> AFAICS, with your current patch, if somebody were to disable the
> controller in cgroup.subtree_control, it would offline the css at that
> level, become unreachable from cgroup->subsys[], and you'd lose
> remaining counts of dead css that are still associated with that
> cgroup. Re-enabling the controller would create a new css with new
> descendant counts, and now the reported numbers are actively misleading.
>
> That seems undesirable.
>
> If you track the counts in the cgroup itself, then cgroup.stat would
> reliably show the total counts of dead controllers that are associated
> with the subtree, even after disabling or toggling controllers.
>
> The hooks in online, offline, release should be the same, just update
> css->cgroup->nr_dying_subsys[id] instead of css->nr_dying_descendants.
That does make sense. Thank for the suggestion. I will update the patch
accordingly.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-07-12 17:10 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-10 18:23 [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Waiman Long
2024-07-10 18:23 ` [PATCH v3 2/2] cgroup: Limit frequency of reading cgroup.stat for unprivileged users Waiman Long
2024-07-10 21:43 ` [PATCH v3 1/2] cgroup: Show # of subsystem CSSes in cgroup.stat Roman Gushchin
2024-07-10 23:49 ` Waiman Long
2024-07-10 21:59 ` Tejun Heo
2024-07-10 23:51 ` Waiman Long
2024-07-11 13:49 ` Johannes Weiner
2024-07-11 14:05 ` Waiman Long
2024-07-11 17:18 ` Tejun Heo
2024-07-11 17:39 ` Waiman Long
2024-07-11 18:44 ` Tejun Heo
2024-07-11 18:51 ` Waiman Long
2024-07-11 18:59 ` Tejun Heo
2024-07-11 19:13 ` Waiman Long
2024-07-11 19:21 ` Tejun Heo
2024-07-11 19:29 ` Waiman Long
2024-07-11 19:59 ` Johannes Weiner
2024-07-11 21:00 ` Waiman Long
2024-07-11 21:57 ` Waiman Long
2024-07-12 16:29 ` Johannes Weiner
2024-07-12 17:10 ` Waiman Long
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox