cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [cgroup/for-6.19 PATCH v2 0/3] cgroup/cpuset: Additional housekeeping check & cleanup
@ 2025-11-04  1:30 Waiman Long
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 1/3] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Waiman Long @ 2025-11-04  1:30 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker, Waiman Long

 v2: Update patch 2 to correct some of the checking code and add
     prstate_housekeeping_conflict() as suggested by Chen Ridong.

The first two patches are based on the two cpuset patches from the
"timers: Exclude isolated cpus from timer migration" patch series [1]
with some modifications. They add additional nohz_full housekeeping
mask check to ensure that there is at least one common housekeeping
CPU that is not domain and nohz_full isolated.

The last patch is another cleanup patch to simplify the current
partition code.

[1] https://lore.kernel.org/lkml/20251020112802.102451-1-gmonaco@redhat.com/

Gabriele Monaco (1):
  cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to
    update_isolation_cpumasks()

Waiman Long (2):
  cgroup/cpuset: Fail if isolated and nohz_full don't leave any
    housekeeping
  cgroup/cpuset: Globally track isolated_cpus update

 kernel/cgroup/cpuset.c | 148 ++++++++++++++++++++++++++++++-----------
 1 file changed, 109 insertions(+), 39 deletions(-)

-- 
2.51.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [cgroup/for-6.19 PATCH v2 1/3] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
  2025-11-04  1:30 [cgroup/for-6.19 PATCH v2 0/3] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
@ 2025-11-04  1:30 ` Waiman Long
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Waiman Long
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 3/3] cgroup/cpuset: Globally track isolated_cpus update Waiman Long
  2 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2025-11-04  1:30 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker, Waiman Long, Chen Ridong

From: Gabriele Monaco <gmonaco@redhat.com>

update_unbound_workqueue_cpumask() updates unbound workqueues settings
when there's a change in isolated CPUs, but it can be used for other
subsystems requiring updated when isolated CPUs change.

Generalise the name to update_isolation_cpumasks() to prepare for other
functions unrelated to workqueues to be called in that spot.

[longman: Change the function name to update_isolation_cpumasks()]

Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
---
 kernel/cgroup/cpuset.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7aef59ea9627..da770dac955e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1393,7 +1393,7 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	return isolcpus_updated;
 }
 
-static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
+static void update_isolation_cpumasks(bool isolcpus_updated)
 {
 	int ret;
 
@@ -1557,7 +1557,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	list_add(&cs->remote_sibling, &remote_children);
 	cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 	cpuset_force_rebuild();
 	cs->prs_err = 0;
 
@@ -1598,7 +1598,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 	compute_excpus(cs, cs->effective_xcpus);
 	reset_partition_data(cs);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 	cpuset_force_rebuild();
 
 	/*
@@ -1667,7 +1667,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 	if (xcpus)
 		cpumask_copy(cs->exclusive_cpus, xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 	if (adding || deleting)
 		cpuset_force_rebuild();
 
@@ -2011,7 +2011,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 							tmp->delmask);
 
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 
 	if ((old_prs != new_prs) && (cmd == partcmd_update))
 		update_partition_exclusive_flag(cs, new_prs);
@@ -3029,7 +3029,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	else if (isolcpus_updated)
 		isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 
 	/* Force update if switching back to member & update effective_xcpus */
 	update_cpumasks_hier(cs, &tmpmask, !new_prs);
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
  2025-11-04  1:30 [cgroup/for-6.19 PATCH v2 0/3] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 1/3] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() Waiman Long
@ 2025-11-04  1:30 ` Waiman Long
  2025-11-04  2:19   ` Chen Ridong
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 3/3] cgroup/cpuset: Globally track isolated_cpus update Waiman Long
  2 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2025-11-04  1:30 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker, Waiman Long

Currently the user can set up isolated cpus via cpuset and nohz_full in
such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
domain isolated nor nohz full). This can be a problem for other
subsystems (e.g. the timer wheel imgration).

Prevent this configuration by blocking any assignation that would cause
the union of domain isolated cpus and nohz_full to covers all CPUs.

[longman: Remove isolated_cpus_should_update() and rewrite the checking
 in update_prstate() and update_parent_effective_cpumask(), also add
 prstate_housekeeping_conflict() check in update_prstate() as
 suggested by Chen Ridong]

Originally-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 75 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index da770dac955e..0c49905df394 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	return isolcpus_updated;
 }
 
+/*
+ * isolated_cpus_can_update - check for isolated & nohz_full conflicts
+ * @add_cpus: cpu mask for cpus that are going to be isolated
+ * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
+ * Return: false if there is conflict, true otherwise
+ *
+ * If nohz_full is enabled and we have isolated CPUs, their combination must
+ * still leave housekeeping CPUs.
+ *
+ * TBD: Should consider merging this function into
+ *      prstate_housekeeping_conflict().
+ */
+static bool isolated_cpus_can_update(struct cpumask *add_cpus,
+				     struct cpumask *del_cpus)
+{
+	cpumask_var_t full_hk_cpus;
+	int res = true;
+
+	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+		return true;
+
+	if (del_cpus && cpumask_weight_and(del_cpus,
+			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
+		return true;
+
+	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
+		return false;
+
+	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
+		    housekeeping_cpumask(HK_TYPE_DOMAIN));
+	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
+	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
+	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
+		res = false;
+
+	free_cpumask_var(full_hk_cpus);
+	return res;
+}
+
 static void update_isolation_cpumasks(bool isolcpus_updated)
 {
 	int ret;
@@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
 	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
 		return PERR_INVCPUS;
+	if ((new_prs == PRS_ISOLATED) &&
+	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
+		return PERR_HKEEPING;
 
 	spin_lock_irq(&callback_lock);
 	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
@@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
 			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
 			cs->prs_err = PERR_NOCPUS;
+		else if ((prs == PRS_ISOLATED) &&
+			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
+			cs->prs_err = PERR_HKEEPING;
 		if (cs->prs_err)
 			goto invalidate;
 	}
@@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 	int part_error = PERR_NONE;	/* Partition error? */
 	int isolcpus_updated = 0;
 	struct cpumask *xcpus = user_xcpus(cs);
+	int parent_prs = parent->partition_root_state;
 	bool nocpu;
 
 	lockdep_assert_held(&cpuset_mutex);
@@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		if (prstate_housekeeping_conflict(new_prs, xcpus))
 			return PERR_HKEEPING;
 
+		if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
+		    !isolated_cpus_can_update(xcpus, NULL))
+			return PERR_HKEEPING;
+
 		if (tasks_nocpu_error(parent, cs, xcpus))
 			return PERR_NOCPUS;
 
@@ -1866,6 +1916,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		 *
 		 * For invalid partition:
 		 *   delmask = newmask & parent->effective_xcpus
+		 *   The partition may become valid soon.
 		 */
 		if (is_partition_invalid(cs)) {
 			adding = false;
@@ -1880,6 +1931,23 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 			deleting = cpumask_and(tmp->delmask, tmp->delmask,
 					       parent->effective_xcpus);
 		}
+
+		/*
+		 * TBD: Invalidate a currently valid child root partition may
+		 * still break isolated_cpus_can_update() rule if parent is an
+		 * isolated partition.
+		 */
+		if (is_partition_valid(cs) && (old_prs != parent_prs)) {
+			if ((parent_prs == PRS_ROOT) &&
+			    /* Adding to parent means removing isolated CPUs */
+			    !isolated_cpus_can_update(tmp->delmask, tmp->addmask))
+				part_error = PERR_HKEEPING;
+			if ((parent_prs == PRS_ISOLATED) &&
+			    /* Adding to parent means adding isolated CPUs */
+			    !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
+				part_error = PERR_HKEEPING;
+		}
+
 		/*
 		 * The new CPUs to be removed from parent's effective CPUs
 		 * must be present.
@@ -2994,7 +3062,12 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		 * A change in load balance state only, no change in cpumasks.
 		 * Need to update isolated_cpus.
 		 */
-		isolcpus_updated = true;
+		if (((new_prs == PRS_ISOLATED) &&
+		     !isolated_cpus_can_update(cs->effective_xcpus, NULL)) ||
+		    prstate_housekeeping_conflict(new_prs, cs->effective_xcpus))
+			err = PERR_HKEEPING;
+		else
+			isolcpus_updated = true;
 	} else {
 		/*
 		 * Switching back to member is always allowed even if it
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [cgroup/for-6.19 PATCH v2 3/3] cgroup/cpuset: Globally track isolated_cpus update
  2025-11-04  1:30 [cgroup/for-6.19 PATCH v2 0/3] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 1/3] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() Waiman Long
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Waiman Long
@ 2025-11-04  1:30 ` Waiman Long
  2025-11-04  2:21   ` Chen Ridong
  2 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2025-11-04  1:30 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker, Waiman Long

The current cpuset code passes a local isolcpus_updated flag around in a
number of functions to determine if external isolation related cpumasks
like wq_unbound_cpumask should be updated. It is a bit cumbersome and
makes the code more complex. Simplify the code by using a global boolean
flag "isolated_cpus_updating" to track this. This flag will be set in
isolated_cpus_update() and cleared in update_isolation_cpumasks().

No functional change is expected.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 73 ++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 38 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0c49905df394..854fe4cc1358 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -81,6 +81,13 @@ static cpumask_var_t	subpartitions_cpus;
  */
 static cpumask_var_t	isolated_cpus;
 
+/*
+ * isolated_cpus updating flag (protected by cpuset_mutex)
+ * Set if isolated_cpus is going to be updated in the current
+ * cpuset_mutex crtical section.
+ */
+static bool isolated_cpus_updating;
+
 /*
  * Housekeeping (HK_TYPE_DOMAIN) CPUs at boot
  */
@@ -1327,6 +1334,8 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
 		cpumask_or(isolated_cpus, isolated_cpus, xcpus);
 	else
 		cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
+
+	isolated_cpus_updating = true;
 }
 
 /*
@@ -1334,15 +1343,12 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
  * @new_prs: new partition_root_state
  * @parent: parent cpuset
  * @xcpus: exclusive CPUs to be added
- * Return: true if isolated_cpus modified, false otherwise
  *
  * Remote partition if parent == NULL
  */
-static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
+static void partition_xcpus_add(int new_prs, struct cpuset *parent,
 				struct cpumask *xcpus)
 {
-	bool isolcpus_updated;
-
 	WARN_ON_ONCE(new_prs < 0);
 	lockdep_assert_held(&callback_lock);
 	if (!parent)
@@ -1352,13 +1358,11 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
 	if (parent == &top_cpuset)
 		cpumask_or(subpartitions_cpus, subpartitions_cpus, xcpus);
 
-	isolcpus_updated = (new_prs != parent->partition_root_state);
-	if (isolcpus_updated)
+	if (new_prs != parent->partition_root_state)
 		isolated_cpus_update(parent->partition_root_state, new_prs,
 				     xcpus);
 
 	cpumask_andnot(parent->effective_cpus, parent->effective_cpus, xcpus);
-	return isolcpus_updated;
 }
 
 /*
@@ -1366,15 +1370,12 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
  * @old_prs: old partition_root_state
  * @parent: parent cpuset
  * @xcpus: exclusive CPUs to be removed
- * Return: true if isolated_cpus modified, false otherwise
  *
  * Remote partition if parent == NULL
  */
-static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
+static void partition_xcpus_del(int old_prs, struct cpuset *parent,
 				struct cpumask *xcpus)
 {
-	bool isolcpus_updated;
-
 	WARN_ON_ONCE(old_prs < 0);
 	lockdep_assert_held(&callback_lock);
 	if (!parent)
@@ -1383,14 +1384,12 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	if (parent == &top_cpuset)
 		cpumask_andnot(subpartitions_cpus, subpartitions_cpus, xcpus);
 
-	isolcpus_updated = (old_prs != parent->partition_root_state);
-	if (isolcpus_updated)
+	if (old_prs != parent->partition_root_state)
 		isolated_cpus_update(old_prs, parent->partition_root_state,
 				     xcpus);
 
 	cpumask_and(xcpus, xcpus, cpu_active_mask);
 	cpumask_or(parent->effective_cpus, parent->effective_cpus, xcpus);
-	return isolcpus_updated;
 }
 
 /*
@@ -1432,17 +1431,24 @@ static bool isolated_cpus_can_update(struct cpumask *add_cpus,
 	return res;
 }
 
-static void update_isolation_cpumasks(bool isolcpus_updated)
+/*
+ * update_isolation_cpumasks - Update external isolation related CPU masks
+ *
+ * The following external CPU masks will be updated if necessary:
+ * - workqueue unbound cpumask
+ */
+static void update_isolation_cpumasks(void)
 {
 	int ret;
 
-	lockdep_assert_cpus_held();
-
-	if (!isolcpus_updated)
+	if (!isolated_cpus_updating)
 		return;
 
+	lockdep_assert_cpus_held();
+
 	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
 	WARN_ON_ONCE(ret < 0);
+	isolated_cpus_updating = false;
 }
 
 /**
@@ -1567,8 +1573,6 @@ static inline bool is_local_partition(struct cpuset *cs)
 static int remote_partition_enable(struct cpuset *cs, int new_prs,
 				   struct tmpmasks *tmp)
 {
-	bool isolcpus_updated;
-
 	/*
 	 * The user must have sysadmin privilege.
 	 */
@@ -1595,11 +1599,11 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 		return PERR_HKEEPING;
 
 	spin_lock_irq(&callback_lock);
-	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
+	partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
 	list_add(&cs->remote_sibling, &remote_children);
 	cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
 	spin_unlock_irq(&callback_lock);
-	update_isolation_cpumasks(isolcpus_updated);
+	update_isolation_cpumasks();
 	cpuset_force_rebuild();
 	cs->prs_err = 0;
 
@@ -1622,15 +1626,12 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
  */
 static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 {
-	bool isolcpus_updated;
-
 	WARN_ON_ONCE(!is_remote_partition(cs));
 	WARN_ON_ONCE(!cpumask_subset(cs->effective_xcpus, subpartitions_cpus));
 
 	spin_lock_irq(&callback_lock);
 	list_del_init(&cs->remote_sibling);
-	isolcpus_updated = partition_xcpus_del(cs->partition_root_state,
-					       NULL, cs->effective_xcpus);
+	partition_xcpus_del(cs->partition_root_state, NULL, cs->effective_xcpus);
 	if (cs->prs_err)
 		cs->partition_root_state = -cs->partition_root_state;
 	else
@@ -1640,7 +1641,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 	compute_excpus(cs, cs->effective_xcpus);
 	reset_partition_data(cs);
 	spin_unlock_irq(&callback_lock);
-	update_isolation_cpumasks(isolcpus_updated);
+	update_isolation_cpumasks();
 	cpuset_force_rebuild();
 
 	/*
@@ -1665,7 +1666,6 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 {
 	bool adding, deleting;
 	int prs = cs->partition_root_state;
-	int isolcpus_updated = 0;
 
 	if (WARN_ON_ONCE(!is_remote_partition(cs)))
 		return;
@@ -1701,9 +1701,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 
 	spin_lock_irq(&callback_lock);
 	if (adding)
-		isolcpus_updated += partition_xcpus_add(prs, NULL, tmp->addmask);
+		partition_xcpus_add(prs, NULL, tmp->addmask);
 	if (deleting)
-		isolcpus_updated += partition_xcpus_del(prs, NULL, tmp->delmask);
+		partition_xcpus_del(prs, NULL, tmp->delmask);
 	/*
 	 * Need to update effective_xcpus and exclusive_cpus now as
 	 * update_sibling_cpumasks() below may iterate back to the same cs.
@@ -1712,7 +1712,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 	if (xcpus)
 		cpumask_copy(cs->exclusive_cpus, xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_isolation_cpumasks(isolcpus_updated);
+	update_isolation_cpumasks();
 	if (adding || deleting)
 		cpuset_force_rebuild();
 
@@ -1793,7 +1793,6 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 	int deleting;	/* Deleting cpus from parent's effective_cpus	*/
 	int old_prs, new_prs;
 	int part_error = PERR_NONE;	/* Partition error? */
-	int isolcpus_updated = 0;
 	struct cpumask *xcpus = user_xcpus(cs);
 	int parent_prs = parent->partition_root_state;
 	bool nocpu;
@@ -2072,14 +2071,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 	 * and vice versa.
 	 */
 	if (adding)
-		isolcpus_updated += partition_xcpus_del(old_prs, parent,
-							tmp->addmask);
+		partition_xcpus_del(old_prs, parent, tmp->addmask);
 	if (deleting)
-		isolcpus_updated += partition_xcpus_add(new_prs, parent,
-							tmp->delmask);
+		partition_xcpus_add(new_prs, parent, tmp->delmask);
 
 	spin_unlock_irq(&callback_lock);
-	update_isolation_cpumasks(isolcpus_updated);
+	update_isolation_cpumasks();
 
 	if ((old_prs != new_prs) && (cmd == partcmd_update))
 		update_partition_exclusive_flag(cs, new_prs);
@@ -3102,7 +3099,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	else if (isolcpus_updated)
 		isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_isolation_cpumasks(isolcpus_updated);
+	update_isolation_cpumasks();
 
 	/* Force update if switching back to member & update effective_xcpus */
 	update_cpumasks_hier(cs, &tmpmask, !new_prs);
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Waiman Long
@ 2025-11-04  2:19   ` Chen Ridong
  2025-11-04  4:10     ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Chen Ridong @ 2025-11-04  2:19 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker



On 2025/11/4 9:30, Waiman Long wrote:
> Currently the user can set up isolated cpus via cpuset and nohz_full in
> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
> domain isolated nor nohz full). This can be a problem for other
> subsystems (e.g. the timer wheel imgration).
> 
> Prevent this configuration by blocking any assignation that would cause
> the union of domain isolated cpus and nohz_full to covers all CPUs.
> 
> [longman: Remove isolated_cpus_should_update() and rewrite the checking
>  in update_prstate() and update_parent_effective_cpumask(), also add
>  prstate_housekeeping_conflict() check in update_prstate() as
>  suggested by Chen Ridong]
> 
> Originally-by: Gabriele Monaco <gmonaco@redhat.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 75 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index da770dac955e..0c49905df394 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>  	return isolcpus_updated;
>  }
>  
> +/*
> + * isolated_cpus_can_update - check for isolated & nohz_full conflicts
> + * @add_cpus: cpu mask for cpus that are going to be isolated
> + * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
> + * Return: false if there is conflict, true otherwise
> + *
> + * If nohz_full is enabled and we have isolated CPUs, their combination must
> + * still leave housekeeping CPUs.
> + *
> + * TBD: Should consider merging this function into
> + *      prstate_housekeeping_conflict().
> + */
> +static bool isolated_cpus_can_update(struct cpumask *add_cpus,
> +				     struct cpumask *del_cpus)
> +{
> +	cpumask_var_t full_hk_cpus;
> +	int res = true;
> +
> +	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
> +		return true;
> +
> +	if (del_cpus && cpumask_weight_and(del_cpus,
> +			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
> +		return true;
> +
> +	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
> +		return false;
> +
> +	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
> +		    housekeeping_cpumask(HK_TYPE_DOMAIN));
> +	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
> +	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
> +	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
> +		res = false;
> +
> +	free_cpumask_var(full_hk_cpus);
> +	return res;
> +}
> +
>  static void update_isolation_cpumasks(bool isolcpus_updated)
>  {
>  	int ret;
> @@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>  	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
>  	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
>  		return PERR_INVCPUS;
> +	if ((new_prs == PRS_ISOLATED) &&
> +	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
> +		return PERR_HKEEPING;
>  

Do we also need to check prstate_housekeeping_conflict here?

>  	spin_lock_irq(&callback_lock);
>  	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
> @@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>  		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
>  			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
>  			cs->prs_err = PERR_NOCPUS;
> +		else if ((prs == PRS_ISOLATED) &&
> +			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
> +			cs->prs_err = PERR_HKEEPING;
>  		if (cs->prs_err)
>  			goto invalidate;
>  	}

Ditto.

> @@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>  	int part_error = PERR_NONE;	/* Partition error? */
>  	int isolcpus_updated = 0;
>  	struct cpumask *xcpus = user_xcpus(cs);
> +	int parent_prs = parent->partition_root_state;
>  	bool nocpu;
>  
>  	lockdep_assert_held(&cpuset_mutex);
> @@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>  		if (prstate_housekeeping_conflict(new_prs, xcpus))
>  			return PERR_HKEEPING;
>  
> +		if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
> +		    !isolated_cpus_can_update(xcpus, NULL))
> +			return PERR_HKEEPING;
> +
>  		if (tasks_nocpu_error(parent, cs, xcpus))
>  			return PERR_NOCPUS;
>  

I think isolated_cpus_can_update check should be also added to validate_partition function.

If deemed necessary, you may consider applying the patch below, which reuses validate_partition to
enable the local partition, so validate_partition can be common block.

https://lore.kernel.org/cgroups/20251025064844.495525-4-chenridong@huaweicloud.com/


> @@ -1866,6 +1916,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>  		 *
>  		 * For invalid partition:
>  		 *   delmask = newmask & parent->effective_xcpus
> +		 *   The partition may become valid soon.
>  		 */
>  		if (is_partition_invalid(cs)) {
>  			adding = false;
> @@ -1880,6 +1931,23 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>  			deleting = cpumask_and(tmp->delmask, tmp->delmask,
>  					       parent->effective_xcpus);
>  		}
> +
> +		/*
> +		 * TBD: Invalidate a currently valid child root partition may
> +		 * still break isolated_cpus_can_update() rule if parent is an
> +		 * isolated partition.
> +		 */
> +		if (is_partition_valid(cs) && (old_prs != parent_prs)) {
> +			if ((parent_prs == PRS_ROOT) &&
> +			    /* Adding to parent means removing isolated CPUs */
> +			    !isolated_cpus_can_update(tmp->delmask, tmp->addmask))
> +				part_error = PERR_HKEEPING;
> +			if ((parent_prs == PRS_ISOLATED) &&
> +			    /* Adding to parent means adding isolated CPUs */
> +			    !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
> +				part_error = PERR_HKEEPING;
> +		}
> +
>  		/*
>  		 * The new CPUs to be removed from parent's effective CPUs
>  		 * must be present.
> @@ -2994,7 +3062,12 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>  		 * A change in load balance state only, no change in cpumasks.
>  		 * Need to update isolated_cpus.
>  		 */
> -		isolcpus_updated = true;
> +		if (((new_prs == PRS_ISOLATED) &&
> +		     !isolated_cpus_can_update(cs->effective_xcpus, NULL)) ||
> +		    prstate_housekeeping_conflict(new_prs, cs->effective_xcpus))
> +			err = PERR_HKEEPING;
> +		else
> +			isolcpus_updated = true;
>  	} else {
>  		/*
>  		 * Switching back to member is always allowed even if it

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cgroup/for-6.19 PATCH v2 3/3] cgroup/cpuset: Globally track isolated_cpus update
  2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 3/3] cgroup/cpuset: Globally track isolated_cpus update Waiman Long
@ 2025-11-04  2:21   ` Chen Ridong
  0 siblings, 0 replies; 8+ messages in thread
From: Chen Ridong @ 2025-11-04  2:21 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker



On 2025/11/4 9:30, Waiman Long wrote:
> The current cpuset code passes a local isolcpus_updated flag around in a
> number of functions to determine if external isolation related cpumasks
> like wq_unbound_cpumask should be updated. It is a bit cumbersome and
> makes the code more complex. Simplify the code by using a global boolean
> flag "isolated_cpus_updating" to track this. This flag will be set in
> isolated_cpus_update() and cleared in update_isolation_cpumasks().
> 
> No functional change is expected.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 73 ++++++++++++++++++++----------------------
>  1 file changed, 35 insertions(+), 38 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 0c49905df394..854fe4cc1358 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -81,6 +81,13 @@ static cpumask_var_t	subpartitions_cpus;
>   */
>  static cpumask_var_t	isolated_cpus;
>  
> +/*
> + * isolated_cpus updating flag (protected by cpuset_mutex)
> + * Set if isolated_cpus is going to be updated in the current
> + * cpuset_mutex crtical section.
> + */
> +static bool isolated_cpus_updating;
> +
>  /*
>   * Housekeeping (HK_TYPE_DOMAIN) CPUs at boot
>   */
> @@ -1327,6 +1334,8 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
>  		cpumask_or(isolated_cpus, isolated_cpus, xcpus);
>  	else
>  		cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
> +
> +	isolated_cpus_updating = true;
>  }
>  
>  /*
> @@ -1334,15 +1343,12 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
>   * @new_prs: new partition_root_state
>   * @parent: parent cpuset
>   * @xcpus: exclusive CPUs to be added
> - * Return: true if isolated_cpus modified, false otherwise
>   *
>   * Remote partition if parent == NULL
>   */
> -static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
> +static void partition_xcpus_add(int new_prs, struct cpuset *parent,
>  				struct cpumask *xcpus)
>  {
> -	bool isolcpus_updated;
> -
>  	WARN_ON_ONCE(new_prs < 0);
>  	lockdep_assert_held(&callback_lock);
>  	if (!parent)
> @@ -1352,13 +1358,11 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
>  	if (parent == &top_cpuset)
>  		cpumask_or(subpartitions_cpus, subpartitions_cpus, xcpus);
>  
> -	isolcpus_updated = (new_prs != parent->partition_root_state);
> -	if (isolcpus_updated)
> +	if (new_prs != parent->partition_root_state)
>  		isolated_cpus_update(parent->partition_root_state, new_prs,
>  				     xcpus);
>  
>  	cpumask_andnot(parent->effective_cpus, parent->effective_cpus, xcpus);
> -	return isolcpus_updated;
>  }
>  
>  /*
> @@ -1366,15 +1370,12 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
>   * @old_prs: old partition_root_state
>   * @parent: parent cpuset
>   * @xcpus: exclusive CPUs to be removed
> - * Return: true if isolated_cpus modified, false otherwise
>   *
>   * Remote partition if parent == NULL
>   */
> -static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
> +static void partition_xcpus_del(int old_prs, struct cpuset *parent,
>  				struct cpumask *xcpus)
>  {
> -	bool isolcpus_updated;
> -
>  	WARN_ON_ONCE(old_prs < 0);
>  	lockdep_assert_held(&callback_lock);
>  	if (!parent)
> @@ -1383,14 +1384,12 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>  	if (parent == &top_cpuset)
>  		cpumask_andnot(subpartitions_cpus, subpartitions_cpus, xcpus);
>  
> -	isolcpus_updated = (old_prs != parent->partition_root_state);
> -	if (isolcpus_updated)
> +	if (old_prs != parent->partition_root_state)
>  		isolated_cpus_update(old_prs, parent->partition_root_state,
>  				     xcpus);
>  
>  	cpumask_and(xcpus, xcpus, cpu_active_mask);
>  	cpumask_or(parent->effective_cpus, parent->effective_cpus, xcpus);
> -	return isolcpus_updated;
>  }
>  
>  /*
> @@ -1432,17 +1431,24 @@ static bool isolated_cpus_can_update(struct cpumask *add_cpus,
>  	return res;
>  }
>  
> -static void update_isolation_cpumasks(bool isolcpus_updated)
> +/*
> + * update_isolation_cpumasks - Update external isolation related CPU masks
> + *
> + * The following external CPU masks will be updated if necessary:
> + * - workqueue unbound cpumask
> + */
> +static void update_isolation_cpumasks(void)
>  {
>  	int ret;
>  
> -	lockdep_assert_cpus_held();
> -
> -	if (!isolcpus_updated)
> +	if (!isolated_cpus_updating)
>  		return;
>  
> +	lockdep_assert_cpus_held();
> +
>  	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
>  	WARN_ON_ONCE(ret < 0);
> +	isolated_cpus_updating = false;
>  }
>  
>  /**
> @@ -1567,8 +1573,6 @@ static inline bool is_local_partition(struct cpuset *cs)
>  static int remote_partition_enable(struct cpuset *cs, int new_prs,
>  				   struct tmpmasks *tmp)
>  {
> -	bool isolcpus_updated;
> -
>  	/*
>  	 * The user must have sysadmin privilege.
>  	 */
> @@ -1595,11 +1599,11 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>  		return PERR_HKEEPING;
>  
>  	spin_lock_irq(&callback_lock);
> -	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
> +	partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
>  	list_add(&cs->remote_sibling, &remote_children);
>  	cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
>  	spin_unlock_irq(&callback_lock);
> -	update_isolation_cpumasks(isolcpus_updated);
> +	update_isolation_cpumasks();
>  	cpuset_force_rebuild();
>  	cs->prs_err = 0;
>  
> @@ -1622,15 +1626,12 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>   */
>  static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
>  {
> -	bool isolcpus_updated;
> -
>  	WARN_ON_ONCE(!is_remote_partition(cs));
>  	WARN_ON_ONCE(!cpumask_subset(cs->effective_xcpus, subpartitions_cpus));
>  
>  	spin_lock_irq(&callback_lock);
>  	list_del_init(&cs->remote_sibling);
> -	isolcpus_updated = partition_xcpus_del(cs->partition_root_state,
> -					       NULL, cs->effective_xcpus);
> +	partition_xcpus_del(cs->partition_root_state, NULL, cs->effective_xcpus);
>  	if (cs->prs_err)
>  		cs->partition_root_state = -cs->partition_root_state;
>  	else
> @@ -1640,7 +1641,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
>  	compute_excpus(cs, cs->effective_xcpus);
>  	reset_partition_data(cs);
>  	spin_unlock_irq(&callback_lock);
> -	update_isolation_cpumasks(isolcpus_updated);
> +	update_isolation_cpumasks();
>  	cpuset_force_rebuild();
>  
>  	/*
> @@ -1665,7 +1666,6 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>  {
>  	bool adding, deleting;
>  	int prs = cs->partition_root_state;
> -	int isolcpus_updated = 0;
>  
>  	if (WARN_ON_ONCE(!is_remote_partition(cs)))
>  		return;
> @@ -1701,9 +1701,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>  
>  	spin_lock_irq(&callback_lock);
>  	if (adding)
> -		isolcpus_updated += partition_xcpus_add(prs, NULL, tmp->addmask);
> +		partition_xcpus_add(prs, NULL, tmp->addmask);
>  	if (deleting)
> -		isolcpus_updated += partition_xcpus_del(prs, NULL, tmp->delmask);
> +		partition_xcpus_del(prs, NULL, tmp->delmask);
>  	/*
>  	 * Need to update effective_xcpus and exclusive_cpus now as
>  	 * update_sibling_cpumasks() below may iterate back to the same cs.
> @@ -1712,7 +1712,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>  	if (xcpus)
>  		cpumask_copy(cs->exclusive_cpus, xcpus);
>  	spin_unlock_irq(&callback_lock);
> -	update_isolation_cpumasks(isolcpus_updated);
> +	update_isolation_cpumasks();
>  	if (adding || deleting)
>  		cpuset_force_rebuild();
>  
> @@ -1793,7 +1793,6 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>  	int deleting;	/* Deleting cpus from parent's effective_cpus	*/
>  	int old_prs, new_prs;
>  	int part_error = PERR_NONE;	/* Partition error? */
> -	int isolcpus_updated = 0;
>  	struct cpumask *xcpus = user_xcpus(cs);
>  	int parent_prs = parent->partition_root_state;
>  	bool nocpu;
> @@ -2072,14 +2071,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>  	 * and vice versa.
>  	 */
>  	if (adding)
> -		isolcpus_updated += partition_xcpus_del(old_prs, parent,
> -							tmp->addmask);
> +		partition_xcpus_del(old_prs, parent, tmp->addmask);
>  	if (deleting)
> -		isolcpus_updated += partition_xcpus_add(new_prs, parent,
> -							tmp->delmask);
> +		partition_xcpus_add(new_prs, parent, tmp->delmask);
>  
>  	spin_unlock_irq(&callback_lock);
> -	update_isolation_cpumasks(isolcpus_updated);
> +	update_isolation_cpumasks();
>  
>  	if ((old_prs != new_prs) && (cmd == partcmd_update))
>  		update_partition_exclusive_flag(cs, new_prs);
> @@ -3102,7 +3099,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>  	else if (isolcpus_updated)
>  		isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
>  	spin_unlock_irq(&callback_lock);
> -	update_isolation_cpumasks(isolcpus_updated);
> +	update_isolation_cpumasks();
>  
>  	/* Force update if switching back to member & update effective_xcpus */
>  	update_cpumasks_hier(cs, &tmpmask, !new_prs);

Reviewed-by: Chen Ridong <chenridong@huawei.com>

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
  2025-11-04  2:19   ` Chen Ridong
@ 2025-11-04  4:10     ` Waiman Long
  2025-11-04  7:26       ` Chen Ridong
  0 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2025-11-04  4:10 UTC (permalink / raw)
  To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker

On 11/3/25 9:19 PM, Chen Ridong wrote:
>
> On 2025/11/4 9:30, Waiman Long wrote:
>> Currently the user can set up isolated cpus via cpuset and nohz_full in
>> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
>> domain isolated nor nohz full). This can be a problem for other
>> subsystems (e.g. the timer wheel imgration).
>>
>> Prevent this configuration by blocking any assignation that would cause
>> the union of domain isolated cpus and nohz_full to covers all CPUs.
>>
>> [longman: Remove isolated_cpus_should_update() and rewrite the checking
>>   in update_prstate() and update_parent_effective_cpumask(), also add
>>   prstate_housekeeping_conflict() check in update_prstate() as
>>   suggested by Chen Ridong]
>>
>> Originally-by: Gabriele Monaco <gmonaco@redhat.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/cgroup/cpuset.c | 75 +++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 74 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index da770dac955e..0c49905df394 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>>   	return isolcpus_updated;
>>   }
>>   
>> +/*
>> + * isolated_cpus_can_update - check for isolated & nohz_full conflicts
>> + * @add_cpus: cpu mask for cpus that are going to be isolated
>> + * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
>> + * Return: false if there is conflict, true otherwise
>> + *
>> + * If nohz_full is enabled and we have isolated CPUs, their combination must
>> + * still leave housekeeping CPUs.
>> + *
>> + * TBD: Should consider merging this function into
>> + *      prstate_housekeeping_conflict().
>> + */
>> +static bool isolated_cpus_can_update(struct cpumask *add_cpus,
>> +				     struct cpumask *del_cpus)
>> +{
>> +	cpumask_var_t full_hk_cpus;
>> +	int res = true;
>> +
>> +	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
>> +		return true;
>> +
>> +	if (del_cpus && cpumask_weight_and(del_cpus,
>> +			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
>> +		return true;
>> +
>> +	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
>> +		return false;
>> +
>> +	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
>> +		    housekeeping_cpumask(HK_TYPE_DOMAIN));
>> +	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
>> +	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
>> +	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
>> +		res = false;
>> +
>> +	free_cpumask_var(full_hk_cpus);
>> +	return res;
>> +}
>> +
>>   static void update_isolation_cpumasks(bool isolcpus_updated)
>>   {
>>   	int ret;
>> @@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>>   	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
>>   	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
>>   		return PERR_INVCPUS;
>> +	if ((new_prs == PRS_ISOLATED) &&
>> +	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
>> +		return PERR_HKEEPING;
>>   
> Do we also need to check prstate_housekeeping_conflict here?
Right. I missed that. Will add that in the next version.
>
>>   	spin_lock_irq(&callback_lock);
>>   	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
>> @@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>>   		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
>>   			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
>>   			cs->prs_err = PERR_NOCPUS;
>> +		else if ((prs == PRS_ISOLATED) &&
>> +			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
>> +			cs->prs_err = PERR_HKEEPING;
>>   		if (cs->prs_err)
>>   			goto invalidate;
>>   	}
> Ditto.

prstate_housekeeping_conflict() has been called earlier via 
validate_partition() from partition_cpus_change(). We don't need one 
more check here. However, I forgot that enabling a partition will not 
call validate_partition().


>
>> @@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>>   	int part_error = PERR_NONE;	/* Partition error? */
>>   	int isolcpus_updated = 0;
>>   	struct cpumask *xcpus = user_xcpus(cs);
>> +	int parent_prs = parent->partition_root_state;
>>   	bool nocpu;
>>   
>>   	lockdep_assert_held(&cpuset_mutex);
>> @@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>>   		if (prstate_housekeeping_conflict(new_prs, xcpus))
>>   			return PERR_HKEEPING;
>>   
>> +		if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
>> +		    !isolated_cpus_can_update(xcpus, NULL))
>> +			return PERR_HKEEPING;
>> +
>>   		if (tasks_nocpu_error(parent, cs, xcpus))
>>   			return PERR_NOCPUS;
>>   
> I think isolated_cpus_can_update check should be also added to validate_partition function.
>
> If deemed necessary, you may consider applying the patch below, which reuses validate_partition to
> enable the local partition, so validate_partition can be common block.
>
> https://lore.kernel.org/cgroups/20251025064844.495525-4-chenridong@huaweicloud.com/

I do think your patch series will make that simpler. You can certainly 
update your patch series to include that additional check into 
validate_partition().

Cheers,
Longman


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
  2025-11-04  4:10     ` Waiman Long
@ 2025-11-04  7:26       ` Chen Ridong
  0 siblings, 0 replies; 8+ messages in thread
From: Chen Ridong @ 2025-11-04  7:26 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: cgroups, linux-kernel, Chen Ridong, Gabriele Monaco,
	Frederic Weisbecker



On 2025/11/4 12:10, Waiman Long wrote:
> I do think your patch series will make that simpler. You can certainly update your patch series to
> include that additional check into validate_partition().

Okay. After you have reviewed the patches, I will prepare a revision that incorporates the
additional check.

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-04  7:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-04  1:30 [cgroup/for-6.19 PATCH v2 0/3] cgroup/cpuset: Additional housekeeping check & cleanup Waiman Long
2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 1/3] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks() Waiman Long
2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Waiman Long
2025-11-04  2:19   ` Chen Ridong
2025-11-04  4:10     ` Waiman Long
2025-11-04  7:26       ` Chen Ridong
2025-11-04  1:30 ` [cgroup/for-6.19 PATCH v2 3/3] cgroup/cpuset: Globally track isolated_cpus update Waiman Long
2025-11-04  2:21   ` Chen Ridong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).