* [cgroup/for-6.19 PATCH v3 1/5] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
2025-11-05 4:38 [cgroup/for-6.19 PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
@ 2025-11-05 4:38 ` Waiman Long
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 2/5] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Waiman Long
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2025-11-05 4:38 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker, Waiman Long, Chen Ridong
From: Gabriele Monaco <gmonaco@redhat.com>
update_unbound_workqueue_cpumask() updates unbound workqueue settings
when there is a change in isolated CPUs, but it can also be used by other
subsystems that require updates when the isolated CPUs change.
Generalise the name to update_isolation_cpumasks() in preparation for
calling other functions unrelated to workqueues from that spot.
[longman: Change the function name to update_isolation_cpumasks()]
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
---
kernel/cgroup/cpuset.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7aef59ea9627..da770dac955e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1393,7 +1393,7 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
return isolcpus_updated;
}
-static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
+static void update_isolation_cpumasks(bool isolcpus_updated)
{
int ret;
@@ -1557,7 +1557,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
list_add(&cs->remote_sibling, &remote_children);
cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_isolation_cpumasks(isolcpus_updated);
cpuset_force_rebuild();
cs->prs_err = 0;
@@ -1598,7 +1598,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
compute_excpus(cs, cs->effective_xcpus);
reset_partition_data(cs);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_isolation_cpumasks(isolcpus_updated);
cpuset_force_rebuild();
/*
@@ -1667,7 +1667,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
if (xcpus)
cpumask_copy(cs->exclusive_cpus, xcpus);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_isolation_cpumasks(isolcpus_updated);
if (adding || deleting)
cpuset_force_rebuild();
@@ -2011,7 +2011,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
tmp->delmask);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_isolation_cpumasks(isolcpus_updated);
if ((old_prs != new_prs) && (cmd == partcmd_update))
update_partition_exclusive_flag(cs, new_prs);
@@ -3029,7 +3029,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
else if (isolcpus_updated)
isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_isolation_cpumasks(isolcpus_updated);
/* Force update if switching back to member & update effective_xcpus */
update_cpumasks_hier(cs, &tmpmask, !new_prs);
--
2.51.1
* [cgroup/for-6.19 PATCH v3 2/5] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
2025-11-05 4:38 [cgroup/for-6.19 PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 1/5] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() Waiman Long
@ 2025-11-05 4:38 ` Waiman Long
2025-11-05 7:06 ` Chen Ridong
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 3/5] cgroup/cpuset: Move up prstate_housekeeping_conflict() helper Waiman Long
` (3 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2025-11-05 4:38 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker, Waiman Long
Currently the user can set up isolated CPUs via cpuset and nohz_full in
such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
domain isolated nor nohz_full). This can be a problem for other
subsystems (e.g. the timer wheel migration).
Prevent this configuration by blocking any assignment that would cause
the union of domain isolated CPUs and nohz_full to cover all CPUs.
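For illustration, assuming a machine with 8 CPUs booted with nohz_full=2-7
(so CPUs 0-1 are the only kernel-noise housekeeping CPUs), a sequence like
the one below would now be expected to end up with an invalid partition.
The CPU numbers and the exact error string are illustrative only, not
taken from a real run:
# fmt -1 /proc/cmdline | grep nohz_full
nohz_full=2-7
# cd /sys/fs/cgroup
# echo +cpuset > cgroup.subtree_control
# mkdir test
# echo 0-1 > test/cpuset.cpus
# echo isolated > test/cpuset.cpus.partition
# cat test/cpuset.cpus.partition
isolated invalid (partition config conflicts with housekeeping setup)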
[longman: Remove isolated_cpus_should_update() and rewrite the checking
in update_prstate() and update_parent_effective_cpumask()]
Originally-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 74 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 73 insertions(+), 1 deletion(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index da770dac955e..99622e90991a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
return isolcpus_updated;
}
+/*
+ * isolated_cpus_can_update - check for isolated & nohz_full conflicts
+ * @add_cpus: cpu mask for cpus that are going to be isolated
+ * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
+ * Return: false if there is conflict, true otherwise
+ *
+ * If nohz_full is enabled and we have isolated CPUs, their combination must
+ * still leave housekeeping CPUs.
+ *
+ * TBD: Should consider merging this function into
+ * prstate_housekeeping_conflict().
+ */
+static bool isolated_cpus_can_update(struct cpumask *add_cpus,
+ struct cpumask *del_cpus)
+{
+ cpumask_var_t full_hk_cpus;
+ int res = true;
+
+ if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+ return true;
+
+ if (del_cpus && cpumask_weight_and(del_cpus,
+ housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
+ return true;
+
+ if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
+ return false;
+
+ cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
+ housekeeping_cpumask(HK_TYPE_DOMAIN));
+ cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
+ cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
+ if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
+ res = false;
+
+ free_cpumask_var(full_hk_cpus);
+ return res;
+}
+
static void update_isolation_cpumasks(bool isolcpus_updated)
{
int ret;
@@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
return PERR_INVCPUS;
+ if ((new_prs == PRS_ISOLATED) &&
+ !isolated_cpus_can_update(tmp->new_cpus, NULL))
+ return PERR_HKEEPING;
spin_lock_irq(&callback_lock);
isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
@@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
cs->prs_err = PERR_NOCPUS;
+ else if ((prs == PRS_ISOLATED) &&
+ !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
+ cs->prs_err = PERR_HKEEPING;
if (cs->prs_err)
goto invalidate;
}
@@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
int part_error = PERR_NONE; /* Partition error? */
int isolcpus_updated = 0;
struct cpumask *xcpus = user_xcpus(cs);
+ int parent_prs = parent->partition_root_state;
bool nocpu;
lockdep_assert_held(&cpuset_mutex);
@@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
if (prstate_housekeeping_conflict(new_prs, xcpus))
return PERR_HKEEPING;
+ if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
+ !isolated_cpus_can_update(xcpus, NULL))
+ return PERR_HKEEPING;
+
if (tasks_nocpu_error(parent, cs, xcpus))
return PERR_NOCPUS;
@@ -1866,6 +1916,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
*
* For invalid partition:
* delmask = newmask & parent->effective_xcpus
+ * The partition may become valid soon.
*/
if (is_partition_invalid(cs)) {
adding = false;
@@ -1880,6 +1931,23 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
deleting = cpumask_and(tmp->delmask, tmp->delmask,
parent->effective_xcpus);
}
+
+ /*
+ * TBD: Invalidate a currently valid child root partition may
+ * still break isolated_cpus_can_update() rule if parent is an
+ * isolated partition.
+ */
+ if (is_partition_valid(cs) && (old_prs != parent_prs)) {
+ if ((parent_prs == PRS_ROOT) &&
+ /* Adding to parent means removing isolated CPUs */
+ !isolated_cpus_can_update(tmp->delmask, tmp->addmask))
+ part_error = PERR_HKEEPING;
+ if ((parent_prs == PRS_ISOLATED) &&
+ /* Adding to parent means adding isolated CPUs */
+ !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
+ part_error = PERR_HKEEPING;
+ }
+
/*
* The new CPUs to be removed from parent's effective CPUs
* must be present.
@@ -2994,7 +3062,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
* A change in load balance state only, no change in cpumasks.
* Need to update isolated_cpus.
*/
- isolcpus_updated = true;
+ if ((new_prs == PRS_ISOLATED) &&
+ !isolated_cpus_can_update(cs->effective_xcpus, NULL))
+ err = PERR_HKEEPING;
+ else
+ isolcpus_updated = true;
} else {
/*
* Switching back to member is always allowed even if it
--
2.51.1
* Re: [cgroup/for-6.19 PATCH v3 2/5] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 2/5] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Waiman Long
@ 2025-11-05 7:06 ` Chen Ridong
0 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-11-05 7:06 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker
On 2025/11/5 12:38, Waiman Long wrote:
> Currently the user can set up isolated CPUs via cpuset and nohz_full in
> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
> domain isolated nor nohz_full). This can be a problem for other
> subsystems (e.g. the timer wheel migration).
>
> Prevent this configuration by blocking any assignment that would cause
> the union of domain isolated CPUs and nohz_full to cover all CPUs.
>
> [longman: Remove isolated_cpus_should_update() and rewrite the checking
> in update_prstate() and update_parent_effective_cpumask()]
>
> Originally-by: Gabriele Monaco <gmonaco@redhat.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 74 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index da770dac955e..99622e90991a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
> return isolcpus_updated;
> }
>
> +/*
> + * isolated_cpus_can_update - check for isolated & nohz_full conflicts
> + * @add_cpus: cpu mask for cpus that are going to be isolated
> + * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
> + * Return: false if there is conflict, true otherwise
> + *
> + * If nohz_full is enabled and we have isolated CPUs, their combination must
> + * still leave housekeeping CPUs.
> + *
> + * TBD: Should consider merging this function into
> + * prstate_housekeeping_conflict().
> + */
> +static bool isolated_cpus_can_update(struct cpumask *add_cpus,
> + struct cpumask *del_cpus)
> +{
> + cpumask_var_t full_hk_cpus;
> + int res = true;
> +
> + if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
> + return true;
> +
> + if (del_cpus && cpumask_weight_and(del_cpus,
> + housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
> + return true;
> +
> + if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
> + return false;
> +
> + cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
> + housekeeping_cpumask(HK_TYPE_DOMAIN));
> + cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
> + cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
> + if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
> + res = false;
> +
> + free_cpumask_var(full_hk_cpus);
> + return res;
> +}
> +
> static void update_isolation_cpumasks(bool isolcpus_updated)
> {
> int ret;
> @@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
> if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
> cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
> return PERR_INVCPUS;
> + if ((new_prs == PRS_ISOLATED) &&
> + !isolated_cpus_can_update(tmp->new_cpus, NULL))
> + return PERR_HKEEPING;
>
> spin_lock_irq(&callback_lock);
> isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
> @@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
> else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
> cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
> cs->prs_err = PERR_NOCPUS;
> + else if ((prs == PRS_ISOLATED) &&
> + !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
> + cs->prs_err = PERR_HKEEPING;
> if (cs->prs_err)
> goto invalidate;
> }
> @@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
> int part_error = PERR_NONE; /* Partition error? */
> int isolcpus_updated = 0;
> struct cpumask *xcpus = user_xcpus(cs);
> + int parent_prs = parent->partition_root_state;
> bool nocpu;
>
> lockdep_assert_held(&cpuset_mutex);
> @@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
> if (prstate_housekeeping_conflict(new_prs, xcpus))
> return PERR_HKEEPING;
>
> + if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
> + !isolated_cpus_can_update(xcpus, NULL))
> + return PERR_HKEEPING;
> +
> if (tasks_nocpu_error(parent, cs, xcpus))
> return PERR_NOCPUS;
>
> @@ -1866,6 +1916,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
> *
> * For invalid partition:
> * delmask = newmask & parent->effective_xcpus
> + * The partition may become valid soon.
> */
> if (is_partition_invalid(cs)) {
> adding = false;
> @@ -1880,6 +1931,23 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
> deleting = cpumask_and(tmp->delmask, tmp->delmask,
> parent->effective_xcpus);
> }
> +
> + /*
> + * TBD: Invalidate a currently valid child root partition may
> + * still break isolated_cpus_can_update() rule if parent is an
> + * isolated partition.
> + */
> + if (is_partition_valid(cs) && (old_prs != parent_prs)) {
> + if ((parent_prs == PRS_ROOT) &&
> + /* Adding to parent means removing isolated CPUs */
> + !isolated_cpus_can_update(tmp->delmask, tmp->addmask))
> + part_error = PERR_HKEEPING;
> + if ((parent_prs == PRS_ISOLATED) &&
> + /* Adding to parent means adding isolated CPUs */
> + !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
> + part_error = PERR_HKEEPING;
> + }
> +
> /*
> * The new CPUs to be removed from parent's effective CPUs
> * must be present.
> @@ -2994,7 +3062,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
> * A change in load balance state only, no change in cpumasks.
> * Need to update isolated_cpus.
> */
> - isolcpus_updated = true;
> + if ((new_prs == PRS_ISOLATED) &&
> + !isolated_cpus_can_update(cs->effective_xcpus, NULL))
> + err = PERR_HKEEPING;
> + else
> + isolcpus_updated = true;
> } else {
> /*
> * Switching back to member is always allowed even if it
Reviewed-by: Chen Ridong <chenridong@huawei.com>
--
Best regards,
Ridong
* [cgroup/for-6.19 PATCH v3 3/5] cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
2025-11-05 4:38 [cgroup/for-6.19 PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 1/5] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() Waiman Long
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 2/5] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Waiman Long
@ 2025-11-05 4:38 ` Waiman Long
2025-11-05 6:18 ` Chen Ridong
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 4/5] cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition Waiman Long
` (2 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2025-11-05 4:38 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker, Waiman Long
Move up the prstate_housekeeping_conflict() helper so that it can be
used in remote partition code.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 99622e90991a..cc9c3402f16b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1432,6 +1432,26 @@ static bool isolated_cpus_can_update(struct cpumask *add_cpus,
return res;
}
+/*
+ * prstate_housekeeping_conflict - check for partition & housekeeping conflicts
+ * @prstate: partition root state to be checked
+ * @new_cpus: cpu mask
+ * Return: true if there is conflict, false otherwise
+ *
+ * CPUs outside of boot_hk_cpus, if defined, can only be used in an
+ * isolated partition.
+ */
+static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
+{
+ if (!have_boot_isolcpus)
+ return false;
+
+ if ((prstate != PRS_ISOLATED) && !cpumask_subset(new_cpus, boot_hk_cpus))
+ return true;
+
+ return false;
+}
+
static void update_isolation_cpumasks(bool isolcpus_updated)
{
int ret;
@@ -1727,26 +1747,6 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
remote_partition_disable(cs, tmp);
}
-/*
- * prstate_housekeeping_conflict - check for partition & housekeeping conflicts
- * @prstate: partition root state to be checked
- * @new_cpus: cpu mask
- * Return: true if there is conflict, false otherwise
- *
- * CPUs outside of boot_hk_cpus, if defined, can only be used in an
- * isolated partition.
- */
-static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
-{
- if (!have_boot_isolcpus)
- return false;
-
- if ((prstate != PRS_ISOLATED) && !cpumask_subset(new_cpus, boot_hk_cpus))
- return true;
-
- return false;
-}
-
/**
* update_parent_effective_cpumask - update effective_cpus mask of parent cpuset
* @cs: The cpuset that requests change in partition root state
--
2.51.1
* Re: [cgroup/for-6.19 PATCH v3 3/5] cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 3/5] cgroup/cpuset: Move up prstate_housekeeping_conflict() helper Waiman Long
@ 2025-11-05 6:18 ` Chen Ridong
0 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-11-05 6:18 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker
On 2025/11/5 12:38, Waiman Long wrote:
> Move up the prstate_housekeeping_conflict() helper so that it can be
> used in remote partition code.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 40 ++++++++++++++++++++--------------------
> 1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 99622e90991a..cc9c3402f16b 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1432,6 +1432,26 @@ static bool isolated_cpus_can_update(struct cpumask *add_cpus,
> return res;
> }
>
> +/*
> + * prstate_housekeeping_conflict - check for partition & housekeeping conflicts
> + * @prstate: partition root state to be checked
> + * @new_cpus: cpu mask
> + * Return: true if there is conflict, false otherwise
> + *
> + * CPUs outside of boot_hk_cpus, if defined, can only be used in an
> + * isolated partition.
> + */
> +static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
> +{
> + if (!have_boot_isolcpus)
> + return false;
> +
> + if ((prstate != PRS_ISOLATED) && !cpumask_subset(new_cpus, boot_hk_cpus))
> + return true;
> +
> + return false;
> +}
> +
> static void update_isolation_cpumasks(bool isolcpus_updated)
> {
> int ret;
> @@ -1727,26 +1747,6 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
> remote_partition_disable(cs, tmp);
> }
>
> -/*
> - * prstate_housekeeping_conflict - check for partition & housekeeping conflicts
> - * @prstate: partition root state to be checked
> - * @new_cpus: cpu mask
> - * Return: true if there is conflict, false otherwise
> - *
> - * CPUs outside of boot_hk_cpus, if defined, can only be used in an
> - * isolated partition.
> - */
> -static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
> -{
> - if (!have_boot_isolcpus)
> - return false;
> -
> - if ((prstate != PRS_ISOLATED) && !cpumask_subset(new_cpus, boot_hk_cpus))
> - return true;
> -
> - return false;
> -}
> -
> /**
> * update_parent_effective_cpumask - update effective_cpus mask of parent cpuset
> * @cs: The cpuset that requests change in partition root state
Reviewed-by: Chen Ridong <chenridong@huawei.com>
--
Best regards,
Ridong
* [cgroup/for-6.19 PATCH v3 4/5] cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
2025-11-05 4:38 [cgroup/for-6.19 PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
` (2 preceding siblings ...)
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 3/5] cgroup/cpuset: Move up prstate_housekeeping_conflict() helper Waiman Long
@ 2025-11-05 4:38 ` Waiman Long
2025-11-05 6:29 ` Chen Ridong
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 5/5] cgroup/cpuset: Globally track isolated_cpus update Waiman Long
2025-11-05 17:07 ` [PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Tejun Heo
5 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2025-11-05 4:38 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker, Waiman Long
Commit 4a74e418881f ("cgroup/cpuset: Check partition conflict with
housekeeping setup") is supposed to ensure that domain isolated CPUs
designated by the "isolcpus" boot command line option stay either in
root partition or in isolated partitions. However, the required check
wasn't implemented when a remote partition was created or when an
existing partition changed type from "root" to "isolated".
Even though this is a relatively minor issue, we still need to add the
required prstate_housekeeping_conflict() call in the right places to
ensure that the rule is strictly followed.
The following steps can be used to reproduce the problem before this
fix.
# fmt -1 /proc/cmdline | grep isolcpus
isolcpus=9
# cd /sys/fs/cgroup/
# echo +cpuset > cgroup.subtree_control
# mkdir test
# echo 9 > test/cpuset.cpus
# echo isolated > test/cpuset.cpus.partition
# cat test/cpuset.cpus.partition
isolated
# cat test/cpuset.cpus.effective
9
# echo root > test/cpuset.cpus.partition
# cat test/cpuset.cpus.effective
9
# cat test/cpuset.cpus.partition
root
With this fix, the last few steps will become:
# echo root > test/cpuset.cpus.partition
# cat test/cpuset.cpus.effective
0-8,10-95
# cat test/cpuset.cpus.partition
root invalid (partition config conflicts with housekeeping setup)
Reported-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index cc9c3402f16b..2daf58bf0bbb 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1610,8 +1610,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
return PERR_INVCPUS;
- if ((new_prs == PRS_ISOLATED) &&
- !isolated_cpus_can_update(tmp->new_cpus, NULL))
+ if (((new_prs == PRS_ISOLATED) &&
+ !isolated_cpus_can_update(tmp->new_cpus, NULL)) ||
+ prstate_housekeeping_conflict(new_prs, tmp->new_cpus))
return PERR_HKEEPING;
spin_lock_irq(&callback_lock);
@@ -3062,8 +3063,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
* A change in load balance state only, no change in cpumasks.
* Need to update isolated_cpus.
*/
- if ((new_prs == PRS_ISOLATED) &&
- !isolated_cpus_can_update(cs->effective_xcpus, NULL))
+ if (((new_prs == PRS_ISOLATED) &&
+ !isolated_cpus_can_update(cs->effective_xcpus, NULL)) ||
+ prstate_housekeeping_conflict(new_prs, cs->effective_xcpus))
err = PERR_HKEEPING;
else
isolcpus_updated = true;
--
2.51.1
* Re: [cgroup/for-6.19 PATCH v3 4/5] cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 4/5] cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition Waiman Long
@ 2025-11-05 6:29 ` Chen Ridong
0 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-11-05 6:29 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker
On 2025/11/5 12:38, Waiman Long wrote:
> Commit 4a74e418881f ("cgroup/cpuset: Check partition conflict with
> housekeeping setup") is supposed to ensure that domain isolated CPUs
> designated by the "isolcpus" boot command line option stay either in
> the root partition or in isolated partitions. However, the required check
> wasn't implemented when a remote partition was created or when an
> existing partition changed type from "isolated" to "root".
>
> Even though this is a relatively minor issue, we still need to add the
> required prstate_housekeeping_conflict() call in the right places to
> ensure that the rule is strictly followed.
>
> The following steps can be used to reproduce the problem before this
> fix.
>
> # fmt -1 /proc/cmdline | grep isolcpus
> isolcpus=9
> # cd /sys/fs/cgroup/
> # echo +cpuset > cgroup.subtree_control
> # mkdir test
> # echo 9 > test/cpuset.cpus
> # echo isolated > test/cpuset.cpus.partition
> # cat test/cpuset.cpus.partition
> isolated
> # cat test/cpuset.cpus.effective
> 9
> # echo root > test/cpuset.cpus.partition
> # cat test/cpuset.cpus.effective
> 9
> # cat test/cpuset.cpus.partition
> root
>
> With this fix, the last few steps will become:
>
> # echo root > test/cpuset.cpus.partition
> # cat test/cpuset.cpus.effective
> 0-8,10-95
> # cat test/cpuset.cpus.partition
> root invalid (partition config conflicts with housekeeping setup)
>
> Reported-by: Chen Ridong <chenridong@huawei.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index cc9c3402f16b..2daf58bf0bbb 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1610,8 +1610,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
> if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
> cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
> return PERR_INVCPUS;
> - if ((new_prs == PRS_ISOLATED) &&
> - !isolated_cpus_can_update(tmp->new_cpus, NULL))
> + if (((new_prs == PRS_ISOLATED) &&
> + !isolated_cpus_can_update(tmp->new_cpus, NULL)) ||
> + prstate_housekeeping_conflict(new_prs, tmp->new_cpus))
> return PERR_HKEEPING;
>
> spin_lock_irq(&callback_lock);
> @@ -3062,8 +3063,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
> * A change in load balance state only, no change in cpumasks.
> * Need to update isolated_cpus.
> */
> - if ((new_prs == PRS_ISOLATED) &&
> - !isolated_cpus_can_update(cs->effective_xcpus, NULL))
> + if (((new_prs == PRS_ISOLATED) &&
> + !isolated_cpus_can_update(cs->effective_xcpus, NULL)) ||
> + prstate_housekeeping_conflict(new_prs, cs->effective_xcpus))
> err = PERR_HKEEPING;
> else
> isolcpus_updated = true;
Reviewed-by: Chen Ridong <chenridong@huawei.com>
--
Best regards,
Ridong
* [cgroup/for-6.19 PATCH v3 5/5] cgroup/cpuset: Globally track isolated_cpus update
2025-11-05 4:38 [cgroup/for-6.19 PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
` (3 preceding siblings ...)
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 4/5] cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition Waiman Long
@ 2025-11-05 4:38 ` Waiman Long
2025-11-05 17:07 ` [PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Tejun Heo
5 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2025-11-05 4:38 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
Frederic Weisbecker, Waiman Long
The current cpuset code passes a local isolcpus_updated flag around in a
number of functions to determine if external isolation related cpumasks
like wq_unbound_cpumask should be updated. It is a bit cumbersome and
makes the code more complex. Simplify the code by using a global boolean
flag "isolated_cpus_updating" to track this. This flag will be set in
isolated_cpus_update() and cleared in update_isolation_cpumasks().
No functional change is expected.
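Condensed to its essentials, the pattern looks roughly like the sketch
below. The names come from this patch itself; the bodies, locking and the
full update logic are elided and only hinted at in comments:

/* Protected by cpuset_mutex; set when isolated_cpus has been changed */
static bool isolated_cpus_updating;

static void isolated_cpus_update(int old_prs, int new_prs,
				 struct cpumask *xcpus)
{
	/* ... add @xcpus to or remove them from isolated_cpus ... */
	isolated_cpus_updating = true;
}

/* Run once per cpuset_mutex critical section, after the masks settle */
static void update_isolation_cpumasks(void)
{
	if (!isolated_cpus_updating)
		return;
	/* Propagate isolated_cpus to external users (unbound workqueues) */
	WARN_ON_ONCE(workqueue_unbound_exclude_cpumask(isolated_cpus) < 0);
	isolated_cpus_updating = false;
}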
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset.c | 73 ++++++++++++++++++++----------------------
1 file changed, 35 insertions(+), 38 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2daf58bf0bbb..90288efe5367 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -81,6 +81,13 @@ static cpumask_var_t subpartitions_cpus;
*/
static cpumask_var_t isolated_cpus;
+/*
+ * isolated_cpus updating flag (protected by cpuset_mutex)
+ * Set if isolated_cpus is going to be updated in the current
+ * cpuset_mutex critical section.
+ */
+static bool isolated_cpus_updating;
+
/*
* Housekeeping (HK_TYPE_DOMAIN) CPUs at boot
*/
@@ -1327,6 +1334,8 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
cpumask_or(isolated_cpus, isolated_cpus, xcpus);
else
cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
+
+ isolated_cpus_updating = true;
}
/*
@@ -1334,15 +1343,12 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
* @new_prs: new partition_root_state
* @parent: parent cpuset
* @xcpus: exclusive CPUs to be added
- * Return: true if isolated_cpus modified, false otherwise
*
* Remote partition if parent == NULL
*/
-static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
+static void partition_xcpus_add(int new_prs, struct cpuset *parent,
struct cpumask *xcpus)
{
- bool isolcpus_updated;
-
WARN_ON_ONCE(new_prs < 0);
lockdep_assert_held(&callback_lock);
if (!parent)
@@ -1352,13 +1358,11 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
if (parent == &top_cpuset)
cpumask_or(subpartitions_cpus, subpartitions_cpus, xcpus);
- isolcpus_updated = (new_prs != parent->partition_root_state);
- if (isolcpus_updated)
+ if (new_prs != parent->partition_root_state)
isolated_cpus_update(parent->partition_root_state, new_prs,
xcpus);
cpumask_andnot(parent->effective_cpus, parent->effective_cpus, xcpus);
- return isolcpus_updated;
}
/*
@@ -1366,15 +1370,12 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
* @old_prs: old partition_root_state
* @parent: parent cpuset
* @xcpus: exclusive CPUs to be removed
- * Return: true if isolated_cpus modified, false otherwise
*
* Remote partition if parent == NULL
*/
-static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
+static void partition_xcpus_del(int old_prs, struct cpuset *parent,
struct cpumask *xcpus)
{
- bool isolcpus_updated;
-
WARN_ON_ONCE(old_prs < 0);
lockdep_assert_held(&callback_lock);
if (!parent)
@@ -1383,14 +1384,12 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
if (parent == &top_cpuset)
cpumask_andnot(subpartitions_cpus, subpartitions_cpus, xcpus);
- isolcpus_updated = (old_prs != parent->partition_root_state);
- if (isolcpus_updated)
+ if (old_prs != parent->partition_root_state)
isolated_cpus_update(old_prs, parent->partition_root_state,
xcpus);
cpumask_and(xcpus, xcpus, cpu_active_mask);
cpumask_or(parent->effective_cpus, parent->effective_cpus, xcpus);
- return isolcpus_updated;
}
/*
@@ -1452,17 +1451,24 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
return false;
}
-static void update_isolation_cpumasks(bool isolcpus_updated)
+/*
+ * update_isolation_cpumasks - Update external isolation related CPU masks
+ *
+ * The following external CPU masks will be updated if necessary:
+ * - workqueue unbound cpumask
+ */
+static void update_isolation_cpumasks(void)
{
int ret;
- lockdep_assert_cpus_held();
-
- if (!isolcpus_updated)
+ if (!isolated_cpus_updating)
return;
+ lockdep_assert_cpus_held();
+
ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
WARN_ON_ONCE(ret < 0);
+ isolated_cpus_updating = false;
}
/**
@@ -1587,8 +1593,6 @@ static inline bool is_local_partition(struct cpuset *cs)
static int remote_partition_enable(struct cpuset *cs, int new_prs,
struct tmpmasks *tmp)
{
- bool isolcpus_updated;
-
/*
* The user must have sysadmin privilege.
*/
@@ -1616,11 +1620,11 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
return PERR_HKEEPING;
spin_lock_irq(&callback_lock);
- isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
+ partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
list_add(&cs->remote_sibling, &remote_children);
cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
spin_unlock_irq(&callback_lock);
- update_isolation_cpumasks(isolcpus_updated);
+ update_isolation_cpumasks();
cpuset_force_rebuild();
cs->prs_err = 0;
@@ -1643,15 +1647,12 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
*/
static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
{
- bool isolcpus_updated;
-
WARN_ON_ONCE(!is_remote_partition(cs));
WARN_ON_ONCE(!cpumask_subset(cs->effective_xcpus, subpartitions_cpus));
spin_lock_irq(&callback_lock);
list_del_init(&cs->remote_sibling);
- isolcpus_updated = partition_xcpus_del(cs->partition_root_state,
- NULL, cs->effective_xcpus);
+ partition_xcpus_del(cs->partition_root_state, NULL, cs->effective_xcpus);
if (cs->prs_err)
cs->partition_root_state = -cs->partition_root_state;
else
@@ -1661,7 +1662,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
compute_excpus(cs, cs->effective_xcpus);
reset_partition_data(cs);
spin_unlock_irq(&callback_lock);
- update_isolation_cpumasks(isolcpus_updated);
+ update_isolation_cpumasks();
cpuset_force_rebuild();
/*
@@ -1686,7 +1687,6 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
{
bool adding, deleting;
int prs = cs->partition_root_state;
- int isolcpus_updated = 0;
if (WARN_ON_ONCE(!is_remote_partition(cs)))
return;
@@ -1722,9 +1722,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
spin_lock_irq(&callback_lock);
if (adding)
- isolcpus_updated += partition_xcpus_add(prs, NULL, tmp->addmask);
+ partition_xcpus_add(prs, NULL, tmp->addmask);
if (deleting)
- isolcpus_updated += partition_xcpus_del(prs, NULL, tmp->delmask);
+ partition_xcpus_del(prs, NULL, tmp->delmask);
/*
* Need to update effective_xcpus and exclusive_cpus now as
* update_sibling_cpumasks() below may iterate back to the same cs.
@@ -1733,7 +1733,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
if (xcpus)
cpumask_copy(cs->exclusive_cpus, xcpus);
spin_unlock_irq(&callback_lock);
- update_isolation_cpumasks(isolcpus_updated);
+ update_isolation_cpumasks();
if (adding || deleting)
cpuset_force_rebuild();
@@ -1794,7 +1794,6 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
int deleting; /* Deleting cpus from parent's effective_cpus */
int old_prs, new_prs;
int part_error = PERR_NONE; /* Partition error? */
- int isolcpus_updated = 0;
struct cpumask *xcpus = user_xcpus(cs);
int parent_prs = parent->partition_root_state;
bool nocpu;
@@ -2073,14 +2072,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
* and vice versa.
*/
if (adding)
- isolcpus_updated += partition_xcpus_del(old_prs, parent,
- tmp->addmask);
+ partition_xcpus_del(old_prs, parent, tmp->addmask);
if (deleting)
- isolcpus_updated += partition_xcpus_add(new_prs, parent,
- tmp->delmask);
+ partition_xcpus_add(new_prs, parent, tmp->delmask);
spin_unlock_irq(&callback_lock);
- update_isolation_cpumasks(isolcpus_updated);
+ update_isolation_cpumasks();
if ((old_prs != new_prs) && (cmd == partcmd_update))
update_partition_exclusive_flag(cs, new_prs);
@@ -3103,7 +3100,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
else if (isolcpus_updated)
isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
spin_unlock_irq(&callback_lock);
- update_isolation_cpumasks(isolcpus_updated);
+ update_isolation_cpumasks();
/* Force update if switching back to member & update effective_xcpus */
update_cpumasks_hier(cs, &tmpmask, !new_prs);
--
2.51.1
* Re: [PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup
2025-11-05 4:38 [cgroup/for-6.19 PATCH v3 0/5] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
` (4 preceding siblings ...)
2025-11-05 4:38 ` [cgroup/for-6.19 PATCH v3 5/5] cgroup/cpuset: Globally track isolated_cpus update Waiman Long
@ 2025-11-05 17:07 ` Tejun Heo
5 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2025-11-05 17:07 UTC (permalink / raw)
To: Waiman Long; +Cc: Gabriele Monaco, Chen Ridong, cgroups, linux-kernel
> Gabriele Monaco (1):
> cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to
> update_isolation_cpumasks()
>
> Waiman Long (4):
> cgroup/cpuset: Fail if isolated and nohz_full don't leave any
> housekeeping
> cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
> cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated
> partition
> cgroup/cpuset: Globally track isolated_cpus update
Applied 1-5 to cgroup/for-6.19.
Thanks.
--
tejun