* [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches
@ 2025-04-07 21:21 Waiman Long
2025-04-07 21:21 ` [PATCH 1/3] cgroup/cpuset: Always use cpu_active_mask Waiman Long
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Waiman Long @ 2025-04-07 21:21 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Waiman Long
This series contains a number of cpuset cleanup patches.
Waiman Long (3):
cgroup/cpuset: Always use cpu_active_mask
cgroup/cpuset: Fix obsolete comment in cpuset_css_offline()
cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs
kernel/cgroup/cpuset.c | 84 ++++++++++++++++++++++++++++++------------
1 file changed, 61 insertions(+), 23 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/3] cgroup/cpuset: Always use cpu_active_mask
2025-04-07 21:21 [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Waiman Long
@ 2025-04-07 21:21 ` Waiman Long
2025-04-07 21:21 ` [PATCH 2/3] cgroup/cpuset: Fix obsolete comment in cpuset_css_offline() Waiman Long
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2025-04-07 21:21 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Waiman Long
The current cpuset code uses both cpu_active_mask and cpu_online_mask
and it can be confusing which one should be used if we need to update
the code.
The top_cpuset is always synchronized to cpu_active_mask and we should
avoid using cpu_online_mask as much as possible. An active CPU is always
an online CPU, but not vice versa. cpu_active_mask and cpu_online_mask
can differ during hotplug operations.
A CPU is marked active at the last stage of CPU bringup (CPUHP_AP_ACTIVE).
It is also the stage where cpuset hotplug code will be called to update
the sched domains so that the scheduler can move a normal task to a
newly active CPU or remove tasks away from a newly inactivated CPU. The
online bit is set much earlier in the CPU bringup process and cleared
much later in CPU teardown.
If cpu_online_mask is used while a hotunplug operation is happening in
parallel, we may leave an offline CPU in cpu_allowed or have a higher
chance of leaving an offline CPU in some other masks. Avoid this
problem by always using cpu_active_mask in the cpuset code and leave
a comment as to why the use of cpu_online_mask is discouraged.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 306b60430091..583f20942802 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -192,6 +192,20 @@ static inline void notify_partition_change(struct cpuset *cs, int old_prs)
WRITE_ONCE(cs->prs_err, PERR_NONE);
}
+/*
+ * The top_cpuset is always synchronized to cpu_active_mask and we should avoid
+ * using cpu_online_mask as much as possible. An active CPU is always an online
+ * CPU, but not vice versa. cpu_active_mask and cpu_online_mask can differ
+ * during hotplug operations. A CPU is marked active at the last stage of CPU
+ * bringup (CPUHP_AP_ACTIVE). It is also the stage where cpuset hotplug code
+ * will be called to update the sched domains so that the scheduler can move
+ * a normal task to a newly active CPU or remove tasks away from a newly
+ * inactivated CPU. The online bit is set much earlier in the CPU bringup
+ * process and cleared much later in CPU teardown.
+ *
+ * If cpu_online_mask is used while a hotunplug operation is happening in
+ * parallel, we may leave an offline CPU in cpu_allowed or some other masks.
+ */
static struct cpuset top_cpuset = {
.flags = BIT(CS_ONLINE) | BIT(CS_CPU_EXCLUSIVE) |
BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
@@ -355,18 +369,18 @@ static inline bool partition_is_populated(struct cpuset *cs,
* appropriate cpus.
*
* One way or another, we guarantee to return some non-empty subset
- * of cpu_online_mask.
+ * of cpu_active_mask.
*
* Call with callback_lock or cpuset_mutex held.
*/
-static void guarantee_online_cpus(struct task_struct *tsk,
+static void guarantee_active_cpus(struct task_struct *tsk,
struct cpumask *pmask)
{
const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
struct cpuset *cs;
- if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_online_mask)))
- cpumask_copy(pmask, cpu_online_mask);
+ if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_active_mask)))
+ cpumask_copy(pmask, cpu_active_mask);
rcu_read_lock();
cs = task_cs(tsk);
@@ -2263,7 +2277,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
bool force = false;
int old_prs = cs->partition_root_state;
- /* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
+ /* top_cpuset.cpus_allowed tracks cpu_active_mask; it's read-only */
if (cs == &top_cpuset)
return -EACCES;
@@ -3082,7 +3096,7 @@ static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
lockdep_assert_held(&cpuset_mutex);
if (cs != &top_cpuset)
- guarantee_online_cpus(task, cpus_attach);
+ guarantee_active_cpus(task, cpus_attach);
else
cpumask_andnot(cpus_attach, task_cpu_possible_mask(task),
subpartitions_cpus);
@@ -4026,7 +4040,7 @@ void __init cpuset_init_smp(void)
*
* Description: Returns the cpumask_var_t cpus_allowed of the cpuset
* attached to the specified @tsk. Guaranteed to return some non-empty
- * subset of cpu_online_mask, even if this means going outside the
+ * subset of cpu_active_mask, even if this means going outside the
* tasks cpuset, except when the task is in the top cpuset.
**/
@@ -4040,7 +4054,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
cs = task_cs(tsk);
if (cs != &top_cpuset)
- guarantee_online_cpus(tsk, pmask);
+ guarantee_active_cpus(tsk, pmask);
/*
* Tasks in the top cpuset won't get update to their cpumasks
* when a hotplug online/offline event happens. So we include all
@@ -4054,7 +4068,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
* allowable online cpu left, we fall back to all possible cpus.
*/
cpumask_andnot(pmask, possible_mask, subpartitions_cpus);
- if (!cpumask_intersects(pmask, cpu_online_mask))
+ if (!cpumask_intersects(pmask, cpu_active_mask))
cpumask_copy(pmask, possible_mask);
}
--
2.48.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/3] cgroup/cpuset: Fix obsolete comment in cpuset_css_offline()
2025-04-07 21:21 [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Waiman Long
2025-04-07 21:21 ` [PATCH 1/3] cgroup/cpuset: Always use cpu_active_mask Waiman Long
@ 2025-04-07 21:21 ` Waiman Long
2025-04-07 21:21 ` [PATCH 3/3] cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs Waiman Long
2025-04-07 22:05 ` [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Tejun Heo
3 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2025-04-07 21:21 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Waiman Long
Move the partition disablement comment from cpuset_css_offline()
to cpuset_css_killed() and reword it to remove the remnant of
sched.partition.
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 583f20942802..1497a2a4b2a3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3538,11 +3538,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
* will call rebuild_sched_domains_locked(). That is not needed
* in the default hierarchy where only changes in partition
* will cause repartitioning.
- *
- * If the cpuset has the 'sched.partition' flag enabled, simulate
- * turning 'sched.partition" off.
*/
-
static void cpuset_css_offline(struct cgroup_subsys_state *css)
{
struct cpuset *cs = css_cs(css);
@@ -3560,6 +3556,11 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
cpus_read_unlock();
}
+/*
+ * If a dying cpuset has the 'cpus.partition' enabled, turn it off by
+ * changing it back to member to free its exclusive CPUs back to the pool to
+ * be used by other online cpusets.
+ */
static void cpuset_css_killed(struct cgroup_subsys_state *css)
{
struct cpuset *cs = css_cs(css);
--
2.48.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/3] cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs
2025-04-07 21:21 [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Waiman Long
2025-04-07 21:21 ` [PATCH 1/3] cgroup/cpuset: Always use cpu_active_mask Waiman Long
2025-04-07 21:21 ` [PATCH 2/3] cgroup/cpuset: Fix obsolete comment in cpuset_css_offline() Waiman Long
@ 2025-04-07 21:21 ` Waiman Long
2025-04-07 22:05 ` [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Tejun Heo
3 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2025-04-07 21:21 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Waiman Long
Add WARN_ON_ONCE() statements whenever new exclusive CPUs are being
added to a partition root to catch inconsistency in the way exclusive
CPUs are being handled in the cpuset code.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 43 ++++++++++++++++++++++++++++++++----------
1 file changed, 33 insertions(+), 10 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1497a2a4b2a3..d0143b3dce47 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1453,13 +1453,15 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
* The requested exclusive_cpus must not be allocated to other
* partitions and it can't use up all the root's effective_cpus.
*
- * Note that if there is any local partition root above it or
- * remote partition root underneath it, its exclusive_cpus must
- * have overlapped with subpartitions_cpus.
+ * The effective_xcpus mask can contain offline CPUs, but there must
+ * be at least one or more online CPUs present before it can be enabled.
+ *
+ * Note that creating a remote partition with any local partition root
+ * above it or remote partition root underneath it is not allowed.
*/
compute_effective_exclusive_cpumask(cs, tmp->new_cpus, NULL);
- if (cpumask_empty(tmp->new_cpus) ||
- cpumask_intersects(tmp->new_cpus, subpartitions_cpus) ||
+ WARN_ON_ONCE(cpumask_intersects(tmp->new_cpus, subpartitions_cpus));
+ if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
return PERR_INVCPUS;
@@ -1555,6 +1557,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
* left in the top cpuset.
*/
if (adding) {
+ WARN_ON_ONCE(cpumask_intersects(tmp->addmask, subpartitions_cpus));
if (!capable(CAP_SYS_ADMIN))
cs->prs_err = PERR_ACCESS;
else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
@@ -1664,7 +1667,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
bool nocpu;
lockdep_assert_held(&cpuset_mutex);
- WARN_ON_ONCE(is_remote_partition(cs));
+ WARN_ON_ONCE(is_remote_partition(cs)); /* For local partition only */
/*
* new_prs will only be changed for the partcmd_update and
@@ -1710,7 +1713,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
* exclusive_cpus not set. Sibling conflict should only happen
* if exclusive_cpus isn't set.
*/
- xcpus = tmp->new_cpus;
+ xcpus = tmp->delmask;
if (compute_effective_exclusive_cpumask(cs, xcpus, NULL))
WARN_ON_ONCE(!cpumask_empty(cs->exclusive_cpus));
@@ -1731,9 +1734,20 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
if (nocpu)
return PERR_NOCPUS;
- deleting = cpumask_and(tmp->delmask, xcpus, parent->effective_xcpus);
- if (deleting)
- subparts_delta++;
+ /*
+ * This function will only be called when all the preliminary
+ * checks have passed. At this point, the following condition
+ * should hold.
+ *
+ * (cs->effective_xcpus & cpu_active_mask) ⊆ parent->effective_cpus
+ *
+ * Warn if it is not the case.
+ */
+ cpumask_and(tmp->new_cpus, xcpus, cpu_active_mask);
+ WARN_ON_ONCE(!cpumask_subset(tmp->new_cpus, parent->effective_cpus));
+
+ deleting = true;
+ subparts_delta++;
new_prs = (cmd == partcmd_enable) ? PRS_ROOT : PRS_ISOLATED;
} else if (cmd == partcmd_disable) {
/*
@@ -1787,6 +1801,15 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
deleting = cpumask_and(tmp->delmask, tmp->delmask,
parent->effective_xcpus);
}
+ /*
+ * The new CPUs to be removed from parent's effective CPUs
+ * must be present.
+ */
+ if (deleting) {
+ cpumask_and(tmp->new_cpus, tmp->delmask, cpu_active_mask);
+ WARN_ON_ONCE(!cpumask_subset(tmp->new_cpus, parent->effective_cpus));
+ }
+
/*
* Make partition invalid if parent's effective_cpus could
* become empty and there are tasks in the parent.
--
2.48.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches
2025-04-07 21:21 [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Waiman Long
` (2 preceding siblings ...)
2025-04-07 21:21 ` [PATCH 3/3] cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs Waiman Long
@ 2025-04-07 22:05 ` Tejun Heo
3 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2025-04-07 22:05 UTC (permalink / raw)
To: Waiman Long; +Cc: Johannes Weiner, Michal Koutný, cgroups, linux-kernel
On Mon, Apr 07, 2025 at 05:21:02PM -0400, Waiman Long wrote:
> This series contains a number of cpuset cleanup patches.
>
> Waiman Long (3):
> cgroup/cpuset: Always use cpu_active_mask
> cgroup/cpuset: Fix obsolete comment in cpuset_css_offline()
> cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs
Applied 1-3 to cgroup/for-6.16.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-04-07 22:05 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07 21:21 [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Waiman Long
2025-04-07 21:21 ` [PATCH 1/3] cgroup/cpuset: Always use cpu_active_mask Waiman Long
2025-04-07 21:21 ` [PATCH 2/3] cgroup/cpuset: Fix obsolete comment in cpuset_css_offline() Waiman Long
2025-04-07 21:21 ` [PATCH 3/3] cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs Waiman Long
2025-04-07 22:05 ` [PATCH 0/3] cgroup/cpuset: Miscellaneous cleanup patches Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox