* [PATCH -next v3] cpuset: Treat cpusets in attaching as populated
@ 2025-11-14 2:08 Chen Ridong
2025-11-17 20:50 ` Waiman Long
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Chen Ridong @ 2025-11-14 2:08 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
Currently, the check for whether a partition is populated does not
account for tasks that are still being attached to a cpuset. This corner
case can leave a task stuck in a partition with no effective CPUs.
The race condition occurs as follows:
    cpu0                            cpu1
    //cpuset A with cpu N
    migrate task p to A
    cpuset_can_attach
    // with effective cpus
    // check ok
    // cpuset_mutex is not held     // clear cpuset.cpus.exclusive
                                    // making effective cpus empty
                                    update_exclusive_cpumask
                                    // tasks_nocpu_error check ok
                                    // empty effective cpus, partition valid
    cpuset_attach
    ...
    // task p stays in A, with non-effective cpus.
To fix this issue, this patch introduces cpuset_is_populated(), which also
treats a cpuset with an attach in progress as populated. The new helper is
used in validate_change() and partition_is_populated().
Fixes: e2d59900d936 ("cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index daf813386260..8bf7c38ba320 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -356,6 +356,15 @@ static inline bool is_in_v2_mode(void)
(cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE);
}
+static inline bool cpuset_is_populated(struct cpuset *cs)
+{
+ lockdep_assert_held(&cpuset_mutex);
+
+ /* Cpusets in the process of attaching should be considered as populated */
+ return cgroup_is_populated(cs->css.cgroup) ||
+ cs->attach_in_progress;
+}
+
/**
* partition_is_populated - check if partition has tasks
* @cs: partition root to be checked
@@ -373,19 +382,29 @@ static inline bool is_in_v2_mode(void)
static inline bool partition_is_populated(struct cpuset *cs,
struct cpuset *excluded_child)
{
- struct cgroup_subsys_state *css;
- struct cpuset *child;
+ struct cpuset *cp;
+ struct cgroup_subsys_state *pos_css;
- if (cs->css.cgroup->nr_populated_csets)
+ /*
+ * We cannot call cpuset_is_populated(cs) directly, as
+ * nr_populated_domain_children may include populated
+ * csets from descendants that are partitions.
+ */
+ if (cs->css.cgroup->nr_populated_csets ||
+ cs->attach_in_progress)
return true;
rcu_read_lock();
- cpuset_for_each_child(child, css, cs) {
- if (child == excluded_child)
+ cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+ if (cp == cs || cp == excluded_child)
continue;
- if (is_partition_valid(child))
+
+ if (is_partition_valid(cp)) {
+ pos_css = css_rightmost_descendant(pos_css);
continue;
- if (cgroup_is_populated(child->css.cgroup)) {
+ }
+
+ if (cpuset_is_populated(cp)) {
rcu_read_unlock();
return true;
}
@@ -670,7 +689,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
* be changed to have empty cpus_allowed or mems_allowed.
*/
ret = -ENOSPC;
- if ((cgroup_is_populated(cur->css.cgroup) || cur->attach_in_progress)) {
+ if (cpuset_is_populated(cur)) {
if (!cpumask_empty(cur->cpus_allowed) &&
cpumask_empty(trial->cpus_allowed))
goto out;
--
2.34.1
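The effect of the change can be sketched outside the kernel. The model below is an illustration only, under stated assumptions: struct model_cpuset and the model_*/demo_* functions are hypothetical names, not kernel API. It ignores locking, RCU and the excluded_child argument, and it counts only tasks attached directly to each node, whereas the kernel's cgroup_is_populated() also covers descendants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CHILDREN 4

/* Hypothetical userspace stand-in for a cpuset; not the kernel struct. */
struct model_cpuset {
	int nr_tasks;            /* tasks already attached to this node */
	int attach_in_progress;  /* attaches that passed can_attach but are not finished */
	bool valid_partition;    /* node is a valid partition root */
	size_t nr_children;
	const struct model_cpuset *child[MODEL_MAX_CHILDREN];
};

/* A node counts as populated if it has tasks or an attach is pending. */
static bool model_self_populated(const struct model_cpuset *cs)
{
	return cs->nr_tasks > 0 || cs->attach_in_progress > 0;
}

/* Walk all descendants, skipping the whole subtree of a nested valid partition. */
static bool model_descendants_populated(const struct model_cpuset *cs)
{
	for (size_t i = 0; i < cs->nr_children; i++) {
		const struct model_cpuset *cp = cs->child[i];

		if (cp->valid_partition)
			continue;	/* its tasks belong to that partition */
		if (model_self_populated(cp) || model_descendants_populated(cp))
			return true;
	}
	return false;
}

static bool model_partition_is_populated(const struct model_cpuset *root)
{
	assert(root != NULL);
	return model_self_populated(root) || model_descendants_populated(root);
}

/* Partition root with no tasks yet, but one attach pending. */
static bool demo_attaching_counts(void)
{
	struct model_cpuset a = { .attach_in_progress = 1, .valid_partition = true };

	return model_partition_is_populated(&a);
}

/* Tasks inside a nested valid partition do not populate the parent partition. */
static bool demo_nested_partition_skipped(void)
{
	struct model_cpuset b = { .nr_tasks = 3, .valid_partition = true };
	struct model_cpuset a = { .valid_partition = true, .nr_children = 1, .child = { &b } };

	return model_partition_is_populated(&a);
}

/* A task two levels down is found by the descendant walk. */
static bool demo_grandchild_found(void)
{
	struct model_cpuset d = { .nr_tasks = 1 };
	struct model_cpuset c = { .nr_children = 1, .child = { &d } };
	struct model_cpuset a = { .valid_partition = true, .nr_children = 1, .child = { &c } };

	return model_partition_is_populated(&a);
}
```

In this model, demo_attaching_counts() returns true because the pending attach alone makes the partition count as populated, which is the case the race above slipped through.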
* Re: [PATCH -next v3] cpuset: Treat cpusets in attaching as populated
2025-11-14 2:08 [PATCH -next v3] cpuset: Treat cpusets in attaching as populated Chen Ridong
@ 2025-11-17 20:50 ` Waiman Long
2025-11-18 0:31 ` Chen Ridong
2025-11-21 0:39 ` Chen Ridong
2025-11-21 2:27 ` Tejun Heo
2 siblings, 1 reply; 5+ messages in thread
From: Waiman Long @ 2025-11-17 20:50 UTC (permalink / raw)
To: Chen Ridong, tj, hannes, mkoutny
Cc: cgroups, linux-kernel, lujialin4, chenridong
On 11/13/25 9:08 PM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Currently, the check for whether a partition is populated does not
> account for tasks in the cpuset of attaching. This is a corner case
> that can leave a task stuck in a partition with no effective CPUs.
>
> The race condition occurs as follows:
>
> cpu0 cpu1
> //cpuset A with cpu N
> migrate task p to A
> cpuset_can_attach
> // with effective cpus
> // check ok
>
> // cpuset_mutex is not held // clear cpuset.cpus.exclusive
> // making effective cpus empty
> update_exclusive_cpumask
> // tasks_nocpu_error check ok
> // empty effective cpus, partition valid
> cpuset_attach
> ...
> // task p stays in A, with non-effective cpus.
>
> To fix this issue, this patch introduces cpuset_is_populated, which considers
> tasks in the attaching cpuset. This new helper is used in validate_change
> and partition_is_populated.
>
> Fixes: e2d59900d936 ("cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset.c | 35 +++++++++++++++++++++++++++--------
> 1 file changed, 27 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index daf813386260..8bf7c38ba320 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -356,6 +356,15 @@ static inline bool is_in_v2_mode(void)
> (cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE);
> }
>
> +static inline bool cpuset_is_populated(struct cpuset *cs)
> +{
> + lockdep_assert_held(&cpuset_mutex);
> +
> + /* Cpusets in the process of attaching should be considered as populated */
> + return cgroup_is_populated(cs->css.cgroup) ||
> + cs->attach_in_progress;
> +}
> +
> /**
> * partition_is_populated - check if partition has tasks
> * @cs: partition root to be checked
> @@ -373,19 +382,29 @@ static inline bool is_in_v2_mode(void)
> static inline bool partition_is_populated(struct cpuset *cs,
> struct cpuset *excluded_child)
> {
> - struct cgroup_subsys_state *css;
> - struct cpuset *child;
> + struct cpuset *cp;
> + struct cgroup_subsys_state *pos_css;
>
> - if (cs->css.cgroup->nr_populated_csets)
> + /*
> + * We cannot call cpuset_is_populated(cs) directly, as
> + * nr_populated_domain_children may include populated
> + * csets from descendants that are partitions.
> + */
> + if (cs->css.cgroup->nr_populated_csets ||
> + cs->attach_in_progress)
> return true;
>
> rcu_read_lock();
> - cpuset_for_each_child(child, css, cs) {
> - if (child == excluded_child)
> + cpuset_for_each_descendant_pre(cp, pos_css, cs) {
> + if (cp == cs || cp == excluded_child)
> continue;
> - if (is_partition_valid(child))
> +
> + if (is_partition_valid(cp)) {
> + pos_css = css_rightmost_descendant(pos_css);
> continue;
> - if (cgroup_is_populated(child->css.cgroup)) {
> + }
> +
> + if (cpuset_is_populated(cp)) {
> rcu_read_unlock();
> return true;
> }
> @@ -670,7 +689,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
> * be changed to have empty cpus_allowed or mems_allowed.
> */
> ret = -ENOSPC;
> - if ((cgroup_is_populated(cur->css.cgroup) || cur->attach_in_progress)) {
> + if (cpuset_is_populated(cur)) {
> if (!cpumask_empty(cur->cpus_allowed) &&
> cpumask_empty(trial->cpus_allowed))
> goto out;
Reviewed-by: Waiman Long <longman@redhat.com>
* Re: [PATCH -next v3] cpuset: Treat cpusets in attaching as populated
2025-11-17 20:50 ` Waiman Long
@ 2025-11-18 0:31 ` Chen Ridong
0 siblings, 0 replies; 5+ messages in thread
From: Chen Ridong @ 2025-11-18 0:31 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny
Cc: cgroups, linux-kernel, lujialin4, chenridong
On 2025/11/18 4:50, Waiman Long wrote:
> On 11/13/25 9:08 PM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Currently, the check for whether a partition is populated does not
>> account for tasks in the cpuset of attaching. This is a corner case
>> that can leave a task stuck in a partition with no effective CPUs.
>>
>> The race condition occurs as follows:
>>
>> cpu0 cpu1
>> //cpuset A with cpu N
>> migrate task p to A
>> cpuset_can_attach
>> // with effective cpus
>> // check ok
>>
>> // cpuset_mutex is not held // clear cpuset.cpus.exclusive
>> // making effective cpus empty
>> update_exclusive_cpumask
>> // tasks_nocpu_error check ok
>> // empty effective cpus, partition valid
>> cpuset_attach
>> ...
>> // task p stays in A, with non-effective cpus.
>>
>> To fix this issue, this patch introduces cpuset_is_populated, which considers
>> tasks in the attaching cpuset. This new helper is used in validate_change
>> and partition_is_populated.
>>
>> Fixes: e2d59900d936 ("cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective")
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/cgroup/cpuset.c | 35 +++++++++++++++++++++++++++--------
>> 1 file changed, 27 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index daf813386260..8bf7c38ba320 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -356,6 +356,15 @@ static inline bool is_in_v2_mode(void)
>> (cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE);
>> }
>> +static inline bool cpuset_is_populated(struct cpuset *cs)
>> +{
>> + lockdep_assert_held(&cpuset_mutex);
>> +
>> + /* Cpusets in the process of attaching should be considered as populated */
>> + return cgroup_is_populated(cs->css.cgroup) ||
>> + cs->attach_in_progress;
>> +}
>> +
>> /**
>> * partition_is_populated - check if partition has tasks
>> * @cs: partition root to be checked
>> @@ -373,19 +382,29 @@ static inline bool is_in_v2_mode(void)
>> static inline bool partition_is_populated(struct cpuset *cs,
>> struct cpuset *excluded_child)
>> {
>> - struct cgroup_subsys_state *css;
>> - struct cpuset *child;
>> + struct cpuset *cp;
>> + struct cgroup_subsys_state *pos_css;
>> - if (cs->css.cgroup->nr_populated_csets)
>> + /*
>> + * We cannot call cpuset_is_populated(cs) directly, as
>> + * nr_populated_domain_children may include populated
>> + * csets from descendants that are partitions.
>> + */
>> + if (cs->css.cgroup->nr_populated_csets ||
>> + cs->attach_in_progress)
>> return true;
>> rcu_read_lock();
>> - cpuset_for_each_child(child, css, cs) {
>> - if (child == excluded_child)
>> + cpuset_for_each_descendant_pre(cp, pos_css, cs) {
>> + if (cp == cs || cp == excluded_child)
>> continue;
>> - if (is_partition_valid(child))
>> +
>> + if (is_partition_valid(cp)) {
>> + pos_css = css_rightmost_descendant(pos_css);
>> continue;
>> - if (cgroup_is_populated(child->css.cgroup)) {
>> + }
>> +
>> + if (cpuset_is_populated(cp)) {
>> rcu_read_unlock();
>> return true;
>> }
>> @@ -670,7 +689,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
>> * be changed to have empty cpus_allowed or mems_allowed.
>> */
>> ret = -ENOSPC;
>> - if ((cgroup_is_populated(cur->css.cgroup) || cur->attach_in_progress)) {
>> + if (cpuset_is_populated(cur)) {
>> if (!cpumask_empty(cur->cpus_allowed) &&
>> cpumask_empty(trial->cpus_allowed))
>> goto out;
> Reviewed-by: Waiman Long <longman@redhat.com>
>
Thanks.
--
Best regards,
Ridong
* Re: [PATCH -next v3] cpuset: Treat cpusets in attaching as populated
2025-11-14 2:08 [PATCH -next v3] cpuset: Treat cpusets in attaching as populated Chen Ridong
2025-11-17 20:50 ` Waiman Long
@ 2025-11-21 0:39 ` Chen Ridong
2025-11-21 2:27 ` Tejun Heo
2 siblings, 0 replies; 5+ messages in thread
From: Chen Ridong @ 2025-11-21 0:39 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
On 2025/11/14 10:08, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Currently, the check for whether a partition is populated does not
> account for tasks in the cpuset of attaching. This is a corner case
> that can leave a task stuck in a partition with no effective CPUs.
>
> The race condition occurs as follows:
>
> cpu0 cpu1
> //cpuset A with cpu N
> migrate task p to A
> cpuset_can_attach
> // with effective cpus
> // check ok
>
> // cpuset_mutex is not held // clear cpuset.cpus.exclusive
> // making effective cpus empty
> update_exclusive_cpumask
> // tasks_nocpu_error check ok
> // empty effective cpus, partition valid
> cpuset_attach
> ...
> // task p stays in A, with non-effective cpus.
>
> To fix this issue, this patch introduces cpuset_is_populated, which considers
> tasks in the attaching cpuset. This new helper is used in validate_change
> and partition_is_populated.
>
> Fixes: e2d59900d936 ("cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset.c | 35 +++++++++++++++++++++++++++--------
> 1 file changed, 27 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index daf813386260..8bf7c38ba320 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -356,6 +356,15 @@ static inline bool is_in_v2_mode(void)
> (cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE);
> }
>
> +static inline bool cpuset_is_populated(struct cpuset *cs)
> +{
> + lockdep_assert_held(&cpuset_mutex);
> +
> + /* Cpusets in the process of attaching should be considered as populated */
> + return cgroup_is_populated(cs->css.cgroup) ||
> + cs->attach_in_progress;
> +}
> +
> /**
> * partition_is_populated - check if partition has tasks
> * @cs: partition root to be checked
> @@ -373,19 +382,29 @@ static inline bool is_in_v2_mode(void)
> static inline bool partition_is_populated(struct cpuset *cs,
> struct cpuset *excluded_child)
> {
> - struct cgroup_subsys_state *css;
> - struct cpuset *child;
> + struct cpuset *cp;
> + struct cgroup_subsys_state *pos_css;
>
> - if (cs->css.cgroup->nr_populated_csets)
> + /*
> + * We cannot call cpuset_is_populated(cs) directly, as
> + * nr_populated_domain_children may include populated
> + * csets from descendants that are partitions.
> + */
> + if (cs->css.cgroup->nr_populated_csets ||
> + cs->attach_in_progress)
> return true;
>
> rcu_read_lock();
> - cpuset_for_each_child(child, css, cs) {
> - if (child == excluded_child)
> + cpuset_for_each_descendant_pre(cp, pos_css, cs) {
> + if (cp == cs || cp == excluded_child)
> continue;
> - if (is_partition_valid(child))
> +
> + if (is_partition_valid(cp)) {
> + pos_css = css_rightmost_descendant(pos_css);
> continue;
> - if (cgroup_is_populated(child->css.cgroup)) {
> + }
> +
> + if (cpuset_is_populated(cp)) {
> rcu_read_unlock();
> return true;
> }
> @@ -670,7 +689,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
> * be changed to have empty cpus_allowed or mems_allowed.
> */
> ret = -ENOSPC;
> - if ((cgroup_is_populated(cur->css.cgroup) || cur->attach_in_progress)) {
> + if (cpuset_is_populated(cur)) {
> if (!cpumask_empty(cur->cpus_allowed) &&
> cpumask_empty(trial->cpus_allowed))
> goto out;
Hi TJ,
It seems you have missed this patch?
--
Best regards,
Ridong
* Re: [PATCH -next v3] cpuset: Treat cpusets in attaching as populated
2025-11-14 2:08 [PATCH -next v3] cpuset: Treat cpusets in attaching as populated Chen Ridong
2025-11-17 20:50 ` Waiman Long
2025-11-21 0:39 ` Chen Ridong
@ 2025-11-21 2:27 ` Tejun Heo
2 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2025-11-21 2:27 UTC (permalink / raw)
To: Chen Ridong; +Cc: longman, hannes, mkoutny, cgroups, linux-kernel, lujialin4
Applied to cgroup/for-6.19.
Thanks.
--
tejun