* [cgroup/for-6.20 PATCH v2 0/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
@ 2026-01-01 19:15 Waiman Long
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 1/4] cgroup/cpuset: Streamline rm_siblings_excl_cpus() Waiman Long
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Waiman Long @ 2026-01-01 19:15 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie,
Chen Ridong, Waiman Long
v2:
- Patch 1: additional comment
- Patch 2: simplify the conditions for triggering call to
compute_excpus().
- Patch 3: update description of cpuset.cpus.exclusive in cgroup-v2.rst
to reflect the new behavior and change the name of the new
cpus_excl_conflict() parameter to xcpus_changed.
- Patch 4: update description of cpuset.cpus.partition in cgroup-v2.rst
to clarify what exclusive CPUs will be used when a partition is
created.
This patch series is inspired by the cpuset patch sent by Sun Shaojie [1].
The idea is to avoid invalidating sibling partitions when there is a
cpuset.cpus conflict. However, this patch series does it in a slightly
different way to make its behavior more consistent with other cpuset
properties.
The first 3 patches are just some cleanup and minor bug fixes on
issues found during the investigation process. The last one is
the major patch that changes the way cpuset.cpus is being handled
during the partition creation process. Instead of invalidating sibling
partitions when there is a conflict, it will strip out the conflicting
exclusive CPUs and assign the remaining non-conflicting exclusive
CPUs to the new partition. If no CPU is left, partition creation
fails. It is similar to the idea that
cpuset.cpus.effective may only contain a subset of CPUs specified in
cpuset.cpus. So cpuset.cpus.exclusive.effective may contain only a
subset of cpuset.cpus when a partition is created without setting
cpuset.cpus.exclusive.
Even setting cpuset.cpus.exclusive instead of cpuset.cpus may not
guarantee all the requested CPUs can be granted if the parent doesn't have
access to some of those exclusive CPUs. The difference is that conflicts
from siblings are not possible with cpuset.cpus.exclusive as long as it
can be set successfully.
[1] https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
Waiman Long (4):
cgroup/cpuset: Streamline rm_siblings_excl_cpus()
cgroup/cpuset: Consistently compute effective_xcpus in
update_cpumasks_hier()
cgroup/cpuset: Don't fail cpuset.cpus change in v2
cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus
conflict
Documentation/admin-guide/cgroup-v2.rst | 40 +++--
kernel/cgroup/cpuset-internal.h | 3 +
kernel/cgroup/cpuset-v1.c | 19 +++
kernel/cgroup/cpuset.c | 141 +++++++-----------
.../selftests/cgroup/test_cpuset_prs.sh | 26 +++-
5 files changed, 125 insertions(+), 104 deletions(-)
--
2.52.0
* [cgroup/for-6.20 PATCH v2 1/4] cgroup/cpuset: Streamline rm_siblings_excl_cpus()
2026-01-01 19:15 [cgroup/for-6.20 PATCH v2 0/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
@ 2026-01-01 19:15 ` Waiman Long
2026-01-04 1:55 ` Chen Ridong
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier() Waiman Long
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-01 19:15 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie,
Chen Ridong, Waiman Long
If exclusive_cpus is set, effective_xcpus must be a subset of
exclusive_cpus. Currently, rm_siblings_excl_cpus() checks both
exclusive_cpus and effective_xcpus consecutively. It is simpler
to check only exclusive_cpus if non-empty or just effective_xcpus
otherwise.
No functional change is expected.
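
As a rough illustration of the selection rule described above (not the
kernel code itself), the per-sibling choice can be modeled with plain
64-bit masks, one bit per CPU. The toy_* names and the struct layout are
assumptions made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset: one bit per CPU. */
struct toy_cpuset {
	uint64_t exclusive_cpus;	/* models cpuset.cpus.exclusive */
	uint64_t effective_xcpus;	/* models cpuset.cpus.exclusive.effective */
};

/*
 * Pick the single mask that needs to be checked for a sibling:
 * exclusive_cpus if it is set, effective_xcpus otherwise.
 */
static uint64_t toy_sibling_xcpus(const struct toy_cpuset *sibling)
{
	return sibling->exclusive_cpus ? sibling->exclusive_cpus
				       : sibling->effective_xcpus;
}

/*
 * Strip the sibling's exclusive CPUs from *excpus.
 * Returns 1 if any CPU was removed, 0 otherwise.
 */
static int toy_strip_sibling(uint64_t *excpus, const struct toy_cpuset *sibling)
{
	uint64_t xcpus = toy_sibling_xcpus(sibling);

	if (!(*excpus & xcpus))
		return 0;
	*excpus &= ~xcpus;
	return 1;
}
```

Summing the return values over all siblings corresponds to the retval
counter kept in the patched loop.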
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 221da921b4f9..da2b3b51630e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1355,23 +1355,29 @@ static int rm_siblings_excl_cpus(struct cpuset *parent, struct cpuset *cs,
int retval = 0;
if (cpumask_empty(excpus))
- return retval;
+ return 0;
/*
- * Exclude exclusive CPUs from siblings
+ * Remove exclusive CPUs from siblings
*/
rcu_read_lock();
cpuset_for_each_child(sibling, css, parent) {
+ struct cpumask *sibling_xcpus;
+
if (sibling == cs)
continue;
- if (cpumask_intersects(excpus, sibling->exclusive_cpus)) {
- cpumask_andnot(excpus, excpus, sibling->exclusive_cpus);
- retval++;
- continue;
- }
- if (cpumask_intersects(excpus, sibling->effective_xcpus)) {
- cpumask_andnot(excpus, excpus, sibling->effective_xcpus);
+ /*
+ * If exclusive_cpus is defined, effective_xcpus will always
+ * be a subset. Otherwise, effective_xcpus will only be set
+ * in a valid partition root.
+ */
+ sibling_xcpus = cpumask_empty(sibling->exclusive_cpus)
+ ? sibling->effective_xcpus
+ : sibling->exclusive_cpus;
+
+ if (cpumask_intersects(excpus, sibling_xcpus)) {
+ cpumask_andnot(excpus, excpus, sibling_xcpus);
retval++;
}
}
--
2.52.0
* [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-01 19:15 [cgroup/for-6.20 PATCH v2 0/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 1/4] cgroup/cpuset: Streamline rm_siblings_excl_cpus() Waiman Long
@ 2026-01-01 19:15 ` Waiman Long
2026-01-04 2:48 ` Chen Ridong
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2 Waiman Long
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
3 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-01 19:15 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie,
Chen Ridong, Waiman Long
Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
& make update_cpumasks_hier() handle remote partition"), the
compute_effective_exclusive_cpumask() helper was extended to
strip exclusive CPUs from siblings when computing effective_xcpus
(cpuset.cpus.exclusive.effective). This helper was later renamed to
compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
CPU mask computation logic").
This helper is supposed to be used consistently to compute
effective_xcpus. However, there is an exception within the callback
critical section in update_cpumasks_hier() when exclusive_cpus of a
valid partition root is empty. This can cause the effective_xcpus value to
differ depending on where exactly it is last computed. Fix this by using
compute_excpus() in this case to give a consistent result.
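
The change in which code path computes effective_xcpus can be summarized
with a small model. This is only a sketch of the two conditions as they
appear in the diff below. The enum, the function names and the boolean
parameters are hypothetical, and the reset done for new_prs < 0 is left
out.

```c
#include <stdbool.h>

/* Which code path ends up writing effective_xcpus (hypothetical labels). */
enum xcpus_path {
	XCPUS_UNTOUCHED,	/* effective_xcpus is not recomputed */
	XCPUS_HELPER,		/* computed by the common helper */
	XCPUS_ADHOC_AND		/* computed by a direct mask AND */
};

/* Conditions before the patch; is_cs models "cp == cs". */
static enum xcpus_path old_path(int new_prs, bool excl_empty, bool is_cs)
{
	enum xcpus_path path = XCPUS_UNTOUCHED;

	if (!excl_empty && !is_cs)
		path = XCPUS_HELPER;
	if (new_prs > 0 && excl_empty)
		path = XCPUS_ADHOC_AND;
	return path;
}

/* Conditions after the patch: a single test, always through the helper. */
static enum xcpus_path new_path(int new_prs, bool excl_empty)
{
	if (new_prs > 0 || !excl_empty)
		return XCPUS_HELPER;
	return XCPUS_UNTOUCHED;
}
```

In this model, a valid partition root with an empty exclusive_cpus moves
from the ad hoc path to the helper path, which is the inconsistency the
commit message describes.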
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index da2b3b51630e..37d118a9ad4d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
spin_lock_irq(&callback_lock);
cpumask_copy(cp->effective_cpus, tmp->new_cpus);
cp->partition_root_state = new_prs;
- if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
- compute_excpus(cp, cp->effective_xcpus);
-
/*
- * Make sure effective_xcpus is properly set for a valid
- * partition root.
+ * Need to compute effective_xcpus if either exclusive_cpus
+ * is non-empty or it is a valid partition root.
*/
- if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
- cpumask_and(cp->effective_xcpus,
- cp->cpus_allowed, parent->effective_xcpus);
- else if (new_prs < 0)
+ if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
+ compute_excpus(cp, cp->effective_xcpus);
+ if (new_prs < 0)
reset_partition_data(cp);
spin_unlock_irq(&callback_lock);
--
2.52.0
* [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2
2026-01-01 19:15 [cgroup/for-6.20 PATCH v2 0/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 1/4] cgroup/cpuset: Streamline rm_siblings_excl_cpus() Waiman Long
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier() Waiman Long
@ 2026-01-01 19:15 ` Waiman Long
2026-01-04 7:09 ` Chen Ridong
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
3 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-01 19:15 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie,
Chen Ridong, Waiman Long
Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
until valid partition") introduced a new check to disallow the setting
of a new cpuset.cpus.exclusive value that is a superset of a sibling's
cpuset.cpus value so that there will at least be one CPU left in the
sibling in case the cpuset becomes a valid partition root. This new
check does have the side effect of failing a cpuset.cpus change that
makes it a subset of a sibling's cpuset.cpus.exclusive value.
With v2, users are supposed to be allowed to set whatever value they
want in cpuset.cpus without failure. To maintain this rule, the check
is now restricted to only when cpuset.cpus.exclusive is being changed,
not when cpuset.cpus is changed.
The cgroup-v2.rst doc file is also updated to reflect this change.
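
A minimal userspace model of the restricted check may help when reading
the diff. It assumes 64-bit masks with one bit per CPU, omits the cgroup
v1 is_cpu_exclusive() test, and uses hypothetical toy_* names that do not
exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset: one bit per CPU. */
struct toy_cpuset {
	uint64_t cpus_allowed;		/* models cpuset.cpus */
	uint64_t exclusive_cpus;	/* models cpuset.cpus.exclusive */
};

/* True if every CPU in a is also in b. */
static bool toy_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Model of the v2 part of the check after this patch:
 *  - exclusive_cpus of the two cpusets may not intersect;
 *  - only when exclusive_cpus is being changed, it may not cover
 *    all of a sibling's (non-empty) cpus_allowed.
 */
static bool toy_cpus_excl_conflict(const struct toy_cpuset *trial,
				   const struct toy_cpuset *sibling,
				   bool xcpus_changed)
{
	if (trial->exclusive_cpus & sibling->exclusive_cpus)
		return true;

	return xcpus_changed && sibling->cpus_allowed &&
	       toy_subset(sibling->cpus_allowed, trial->exclusive_cpus);
}
```

With this model, writing CPUs 1-2 to cpuset.cpus next to a sibling whose
cpuset.cpus.exclusive is 1-3 reports no conflict, while writing 3-5 to
cpuset.cpus.exclusive next to a sibling whose cpuset.cpus is 4-5 does.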
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 8 +++----
kernel/cgroup/cpuset.c | 30 ++++++++++++-------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7f5b59d95fce..510df2461aff 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2561,10 +2561,10 @@ Cpuset Interface Files
Users can manually set it to a value that is different from
"cpuset.cpus". One constraint in setting it is that the list of
CPUs must be exclusive with respect to "cpuset.cpus.exclusive"
- of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup
- isn't set, its "cpuset.cpus" value, if set, cannot be a subset
- of it to leave at least one CPU available when the exclusive
- CPUs are taken away.
+ and "cpuset.cpus.exclusive.effective" of its siblings. Another
+ constraint is that it cannot be a superset of "cpuset.cpus"
+ of its sibling in order to leave at least one CPU available to
+ that sibling when the exclusive CPUs are taken away.
For a parent cgroup, any one of its exclusive CPUs can only
be distributed to at most one of its child cgroups. Having an
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 37d118a9ad4d..30e31fac4fe3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
/**
* cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
- * @cs1: first cpuset to check
- * @cs2: second cpuset to check
+ * @trial: the trial cpuset to be checked
+ * @sibling: a sibling cpuset to be checked against
+ * @xcpus_changed: set if exclusive_cpus has been set
*
* Returns: true if CPU exclusivity conflict exists, false otherwise
*
* Conflict detection rules:
* 1. If either cpuset is CPU exclusive, they must be mutually exclusive
* 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
*/
-static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
+ bool xcpus_changed)
{
/* If either cpuset is exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
- return !cpusets_are_exclusive(cs1, cs2);
+ if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
+ return !cpusets_are_exclusive(trial, sibling);
/* Exclusive_cpus cannot intersect */
- if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
+ if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
return true;
- /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
- if (!cpumask_empty(cs1->cpus_allowed) &&
- cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
- return true;
-
- if (!cpumask_empty(cs2->cpus_allowed) &&
- cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
+ /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
+ if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
+ cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
return true;
return false;
@@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
{
struct cgroup_subsys_state *css;
struct cpuset *c, *par;
+ bool xcpus_changed;
int ret = 0;
rcu_read_lock();
@@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
* overlap. exclusive_cpus cannot overlap with each other if set.
*/
ret = -EINVAL;
+ xcpus_changed = !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cpus);
cpuset_for_each_child(c, css, par) {
if (c == cur)
continue;
- if (cpus_excl_conflict(trial, c))
+ if (cpus_excl_conflict(trial, c, xcpus_changed))
goto out;
if (mems_excl_conflict(trial, c))
goto out;
--
2.52.0
* [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
2026-01-01 19:15 [cgroup/for-6.20 PATCH v2 0/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
` (2 preceding siblings ...)
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2 Waiman Long
@ 2026-01-01 19:15 ` Waiman Long
2026-01-04 7:53 ` Chen Ridong
3 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-01 19:15 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie,
Chen Ridong, Waiman Long
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
the sibling's partition state becomes invalid. This is overly harsh and
is probably not necessary.
The cpuset.cpus.exclusive control file, if set, will override the
cpuset.cpus of the same cpuset when creating a cpuset partition.
So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
a partition. However, it cannot override a conflicting cpuset.cpus file
in a sibling cpuset and the partition creation process will fail. This
is inconsistent. That will also make using cpuset.cpus.exclusive less
valuable as a tool to set up cpuset partitions as the users have to
check if such a cpuset.cpus conflict exists or not.
Fix these problems by strictly adhering to the setting of the
following control files in descending order of priority when setting
up a partition.
1. cpuset.cpus.exclusive.effective of a valid partition
2. cpuset.cpus.exclusive
3. cpuset.cpus
So once a cpuset.cpus.exclusive is set without failure, it will
always be allowed to form a valid partition as long as at least one
CPU can be granted from its parent irrespective of the state of the
siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive
will fail if it conflicts with the cpuset.cpus.exclusive or the
cpuset.cpus.exclusive.effective value of a sibling.
A partition can still be created by setting only cpuset.cpus without
setting cpuset.cpus.exclusive. However, any conflicting CPUs in a sibling's
cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
be removed from its cpuset.cpus.exclusive.effective as long as at least
one CPU is left and can be granted from its parent. This
CPU stripping is currently done in rm_siblings_excl_cpus().
The new code will now try its best to enable the creation of new
partitions with only cpuset.cpus set without invalidating existing ones.
However, it is not guaranteed that all the CPUs requested in cpuset.cpus
will be used in the new partition even when all these CPUs can be
granted from the parent.
This is similar to the fact that cpuset.cpus.effective may not be
able to include all the CPUs requested in cpuset.cpus. In this case,
the parent may not be able to grant all the exclusive CPUs requested in
cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
already been granted to other partitions earlier.
With the creation of multiple sibling partitions by setting
only cpuset.cpus, this does have the side effect that their exact
cpuset.cpus.exclusive.effective settings will depend on the order of
partition creation if there are conflicts. Due to the exclusive nature
of the CPUs in a partition, it is not easy to make it fair other than
the old behavior of invalidating all the conflicting partitions.
For example,
# echo "0-2" > A1/cpuset.cpus
# echo "root" > A1/cpuset.cpus.partition
# cat A1/cpuset.cpus.partition
root
# cat A1/cpuset.cpus.exclusive.effective
0-2
# echo "2-4" > B1/cpuset.cpus
# echo "root" > B1/cpuset.cpus.partition
# cat B1/cpuset.cpus.partition
root
# cat B1/cpuset.cpus.exclusive.effective
3-4
# cat B1/cpuset.cpus.effective
3-4
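
The order dependence in the example above can be reproduced with a small
userspace model. This is only a sketch under stated assumptions: masks are
64-bit with one bit per CPU, the parent is assumed to own CPUs 0-7, and
the cpu_range() and toy_new_partition_xcpus() helpers are hypothetical
names, not kernel functions.

```c
#include <stdint.h>

/* Mask covering the inclusive CPU range lo..hi (lo <= hi <= 63). */
static uint64_t cpu_range(unsigned int lo, unsigned int hi)
{
	uint64_t upper = (hi == 63) ? ~0ULL : ((1ULL << (hi + 1)) - 1);

	return upper & ~((1ULL << lo) - 1);
}

/*
 * Model of partition creation with only cpuset.cpus set:
 *   requested      - the new partition's cpuset.cpus
 *   parent_xcpus   - exclusive CPUs the parent can grant
 *   siblings_xcpus - union of exclusive CPUs already held or
 *                    claimed by siblings
 * Returns the resulting exclusive CPUs; 0 means creation fails.
 */
static uint64_t toy_new_partition_xcpus(uint64_t requested,
					uint64_t parent_xcpus,
					uint64_t siblings_xcpus)
{
	return requested & parent_xcpus & ~siblings_xcpus;
}
```

In this model, creating A1 (0-2) first and B1 (2-4) second yields 0-2 and
3-4, matching the transcript. Creating them in the opposite order would
give B1 2-4 and A1 0-1.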
For users who want to be sure that they can get most of the CPUs they
want, cpuset.cpus.exclusive should be used instead, provided that it can
be set successfully. Setting cpuset.cpus.exclusive will guarantee that
sibling conflicts from then onward are no longer possible.
To make this change, we have to separate out the is_cpu_exclusive()
check in cpus_excl_conflict() into a cgroup v1 only
cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
helper is now no longer needed and can be removed.
Some existing tests in test_cpuset_prs.sh are updated and new ones are
added to reflect the new behavior. The cgroup-v2.rst doc file is also
updated to clarify what exclusive CPUs will be used when a partition
is created.
Reported-by: Sun Shaojie <sunshaojie@kylinos.cn>
Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 32 +++++---
kernel/cgroup/cpuset-internal.h | 3 +
kernel/cgroup/cpuset-v1.c | 19 +++++
kernel/cgroup/cpuset.c | 81 +++++++------------
.../selftests/cgroup/test_cpuset_prs.sh | 26 ++++--
5 files changed, 90 insertions(+), 71 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 510df2461aff..a3446db96cea 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2584,9 +2584,9 @@ Cpuset Interface Files
of this file will always be a subset of its parent's
"cpuset.cpus.exclusive.effective" if its parent is not the root
cgroup. It will also be a subset of "cpuset.cpus.exclusive"
- if it is set. If "cpuset.cpus.exclusive" is not set, it is
- treated to have an implicit value of "cpuset.cpus" in the
- formation of local partition.
+ if it is set. This file should only be non-empty if either
+ "cpuset.cpus.exclusive" is set or when the current cpuset is
+ a valid partition root.
cpuset.cpus.isolated
A read-only and root cgroup only multiple values file.
@@ -2618,13 +2618,22 @@ Cpuset Interface Files
There are two types of partitions - local and remote. A local
partition is one whose parent cgroup is also a valid partition
root. A remote partition is one whose parent cgroup is not a
- valid partition root itself. Writing to "cpuset.cpus.exclusive"
- is optional for the creation of a local partition as its
- "cpuset.cpus.exclusive" file will assume an implicit value that
- is the same as "cpuset.cpus" if it is not set. Writing the
- proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
- before the target partition root is mandatory for the creation
- of a remote partition.
+ valid partition root itself.
+
+ Writing to "cpuset.cpus.exclusive" is optional for the creation
+ of a local partition as its "cpuset.cpus.exclusive" file will
+ assume an implicit value that is the same as "cpuset.cpus" if it
+ is not set. Writing the proper "cpuset.cpus.exclusive" values
+ down the cgroup hierarchy before the target partition root is
+ mandatory for the creation of a remote partition.
+
+ Not all the CPUs requested in "cpuset.cpus.exclusive" can be
+ used to form a new partition. Only those that were present
+ in its parent's "cpuset.cpus.exclusive.effective" control
+ file can be used. For partitions created without setting
+ "cpuset.cpus.exclusive", exclusive CPUs specified in sibling's
+ "cpuset.cpus.exclusive" or "cpuset.cpus.exclusive.effective"
+ also cannot be used.
Currently, a remote partition cannot be created under a local
partition. All the ancestors of a remote partition root except
@@ -2632,6 +2641,9 @@ Cpuset Interface Files
The root cgroup is always a partition root and its state cannot
be changed. All other non-root cgroups start out as "member".
+ Even though the "cpuset.cpus.exclusive*" control files are not
+ present in the root cgroup, they are implicitly the same as
+ "cpuset.cpus".
When set to "root", the current cgroup is the root of a new
partition or scheduling domain. The set of exclusive CPUs is
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index e718a4f54360..e8e2683cb067 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -312,6 +312,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
struct cpumask *new_cpus, nodemask_t *new_mems,
bool cpus_updated, bool mems_updated);
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
void cpuset1_init(struct cpuset *cs);
void cpuset1_online_css(struct cgroup_subsys_state *css);
int cpuset1_generate_sched_domains(cpumask_var_t **domains,
@@ -326,6 +327,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
bool cpus_updated, bool mems_updated) {}
static inline int cpuset1_validate_change(struct cpuset *cur,
struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+ struct cpuset *cs2) { return false; }
static inline void cpuset1_init(struct cpuset *cs) {}
static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index ecfea7800f0d..04124c38a774 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
return ret;
}
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ * to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+ if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+ return cpumask_intersects(cs1->cpus_allowed,
+ cs2->cpus_allowed);
+
+ return false;
+}
+
#ifdef CONFIG_PROC_PID_CPUSET
/*
* proc_cpuset_show()
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 30e31fac4fe3..c091b63dd3c9 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -129,6 +129,18 @@ static bool force_sd_rebuild;
* For simplicity, a local partition can be created under a local or remote
* partition but a remote partition cannot have any partition root in its
* ancestor chain except the cgroup root.
+ *
+ * A valid partition can be formed by setting either cpus_allowed or
+ * exclusive_cpus. If there are exclusive CPU conflicts, the conflicting
+ * CPUs will be assigned to the effective_xcpus of the partition according
+ * to the appearance of those CPUs in cpumasks (in descending order of
+ * priority).
+ * 1. effective_xcpus of a valid partition
+ * 2. exclusive_cpus
+ * 3. cpus_allowed
+ *
+ * exclusive_cpus should be used for setting up partition if the users want
+ * to get as many CPUs as possible.
*/
#define PRS_MEMBER 0
#define PRS_ROOT 1
@@ -616,27 +628,25 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
* Returns: true if CPU exclusivity conflict exists, false otherwise
*
* Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
- * 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
+ * o cgroup v1
+ * See cpuset1_cpus_excl_conflict()
+ * o cgroup v2
+ * - The exclusive_cpus values cannot overlap.
+ * - New exclusive_cpus cannot be a superset of a sibling's cpus_allowed.
*/
static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
bool xcpus_changed)
{
- /* If either cpuset is exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
- return !cpusets_are_exclusive(trial, sibling);
-
- /* Exclusive_cpus cannot intersect */
- if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
- return true;
+ if (!cpuset_v2())
+ return cpuset1_cpus_excl_conflict(trial, sibling);
/* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
return true;
- return false;
+ /* Exclusive_cpus cannot intersect */
+ return cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus);
}
static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
@@ -2312,43 +2322,6 @@ static enum prs_errcode validate_partition(struct cpuset *cs, struct cpuset *tri
return PERR_NONE;
}
-static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialcs,
- struct tmpmasks *tmp)
-{
- int retval;
- struct cpuset *parent = parent_cs(cs);
-
- retval = validate_change(cs, trialcs);
-
- if ((retval == -EINVAL) && cpuset_v2()) {
- struct cgroup_subsys_state *css;
- struct cpuset *cp;
-
- /*
- * The -EINVAL error code indicates that partition sibling
- * CPU exclusivity rule has been violated. We still allow
- * the cpumask change to proceed while invalidating the
- * partition. However, any conflicting sibling partitions
- * have to be marked as invalid too.
- */
- trialcs->prs_err = PERR_NOTEXCL;
- rcu_read_lock();
- cpuset_for_each_child(cp, css, parent) {
- struct cpumask *xcpus = user_xcpus(trialcs);
-
- if (is_partition_valid(cp) &&
- cpumask_intersects(xcpus, cp->effective_xcpus)) {
- rcu_read_unlock();
- update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
- rcu_read_lock();
- }
- }
- rcu_read_unlock();
- retval = 0;
- }
- return retval;
-}
-
/**
* partition_cpus_change - Handle partition state changes due to CPU mask updates
* @cs: The target cpuset being modified
@@ -2408,15 +2381,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
return 0;
- if (alloc_tmpmasks(&tmp))
- return -ENOMEM;
-
compute_trialcs_excpus(trialcs, cs);
trialcs->prs_err = PERR_NONE;
- retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
+ retval = validate_change(cs, trialcs);
if (retval < 0)
- goto out_free;
+ return retval;
+
+ if (alloc_tmpmasks(&tmp))
+ return -ENOMEM;
/*
* Check all the descendants in update_cpumasks_hier() if
@@ -2439,7 +2412,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */
if (cs->partition_root_state)
update_partition_sd_lb(cs, old_prs);
-out_free:
+
free_tmpmasks(&tmp);
return retval;
}
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..ff4540b0490e 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -269,7 +269,7 @@ TEST_MATRIX=(
" C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
- " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+ " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
" C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
@@ -318,7 +318,7 @@ TEST_MATRIX=(
# Invalid to valid local partition direct transition tests
" C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
" C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
- " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+ " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0"
" C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
# Local partition invalidation tests
@@ -388,10 +388,10 @@ TEST_MATRIX=(
" C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
" C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
- # A non-exclusive cpuset.cpus change will invalidate partition and its siblings
- " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
- " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
- " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+ # A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+ " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0"
+ " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|XA1:0-1|B1:2-3 A1:P1|B1:P1"
+ " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
" C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
@@ -417,6 +417,14 @@ TEST_MATRIX=(
" CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-4|B1:5-6 \
A1:P0|A2:P2:B1:P-1 2-4"
+ # When multiple partitions with conflicting cpuset.cpus are created, the
+ # latter created ones will only get what are left of the available exclusive
+ # CPUs.
+ " C1-3:P1 . . . . . . C3-5:P1 0 A1:1-3|B1:4-5:XB1:4-5 A1:P1|B1:P1"
+
+ # cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive
+ " C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:1-2"
+
# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
# Failure cases:
@@ -427,7 +435,7 @@ TEST_MATRIX=(
# Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected
" C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:4-5"
- # cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
+ # cpuset.cpus.exclusive cannot be set to a superset of sibling's cpuset.cpus
" C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5"
)
@@ -477,6 +485,10 @@ REMOTE_TEST_MATRIX=(
. . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4:c22:5-6 \
p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \
1-2,4-6|1-2,5-6"
+ # c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid partition
+ " C1-5:P1:S+ . C1-4:P1 C2-3 . . \
+ . . . P1 . . p1:5|c11:1-4|c12:5 \
+ p1:P1|c11:P1|c12:P-1"
)
#
--
2.52.0
* Re: [cgroup/for-6.20 PATCH v2 1/4] cgroup/cpuset: Streamline rm_siblings_excl_cpus()
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 1/4] cgroup/cpuset: Streamline rm_siblings_excl_cpus() Waiman Long
@ 2026-01-04 1:55 ` Chen Ridong
0 siblings, 0 replies; 20+ messages in thread
From: Chen Ridong @ 2026-01-04 1:55 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/2 3:15, Waiman Long wrote:
> If exclusive_cpus is set, effective_xcpus must be a subset of
> exclusive_cpus. Currently, rm_siblings_excl_cpus() checks both
> exclusive_cpus and effective_xcpus consecutively. It is simpler
> to check only exclusive_cpus if non-empty or just effective_xcpus
> otherwise.
>
> No functional change is expected.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 221da921b4f9..da2b3b51630e 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1355,23 +1355,29 @@ static int rm_siblings_excl_cpus(struct cpuset *parent, struct cpuset *cs,
> int retval = 0;
>
> if (cpumask_empty(excpus))
> - return retval;
> + return 0;
>
> /*
> - * Exclude exclusive CPUs from siblings
> + * Remove exclusive CPUs from siblings
> */
> rcu_read_lock();
> cpuset_for_each_child(sibling, css, parent) {
> + struct cpumask *sibling_xcpus;
> +
> if (sibling == cs)
> continue;
>
> - if (cpumask_intersects(excpus, sibling->exclusive_cpus)) {
> - cpumask_andnot(excpus, excpus, sibling->exclusive_cpus);
> - retval++;
> - continue;
> - }
> - if (cpumask_intersects(excpus, sibling->effective_xcpus)) {
> - cpumask_andnot(excpus, excpus, sibling->effective_xcpus);
> + /*
> + * If exclusive_cpus is defined, effective_xcpus will always
> + * be a subset. Otherwise, effective_xcpus will only be set
> + * in a valid partition root.
> + */
> + sibling_xcpus = cpumask_empty(sibling->exclusive_cpus)
> + ? sibling->effective_xcpus
> + : sibling->exclusive_cpus;
> +
> + if (cpumask_intersects(excpus, sibling_xcpus)) {
> + cpumask_andnot(excpus, excpus, sibling_xcpus);
> retval++;
> }
> }
LGTM
Reviewed-by: Chen Ridong <chenridong@huawei.com>
--
Best regards,
Ridong
* Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier() Waiman Long
@ 2026-01-04 2:48 ` Chen Ridong
2026-01-04 21:25 ` Waiman Long
0 siblings, 1 reply; 20+ messages in thread
From: Chen Ridong @ 2026-01-04 2:48 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/2 3:15, Waiman Long wrote:
> Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
> & make update_cpumasks_hier() handle remote partition"), the
> compute_effective_exclusive_cpumask() helper was extended to
> strip exclusive CPUs from siblings when computing effective_xcpus
> (cpuset.cpus.exclusive.effective). This helper was later renamed to
> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
> CPU mask computation logic").
>
> This helper is supposed to be used consistently to compute
> effective_xcpus. However, there is an exception within the callback
> critical section in update_cpumasks_hier() when exclusive_cpus of a
> valid partition root is empty. This can cause effective_xcpus value to
> differ depending on where exactly it is last computed. Fix this by using
> compute_excpus() in this case to give a consistent result.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index da2b3b51630e..37d118a9ad4d 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
> spin_lock_irq(&callback_lock);
> cpumask_copy(cp->effective_cpus, tmp->new_cpus);
> cp->partition_root_state = new_prs;
> - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
> - compute_excpus(cp, cp->effective_xcpus);
> -
> /*
> - * Make sure effective_xcpus is properly set for a valid
> - * partition root.
> + * Need to compute effective_xcpus if either exclusive_cpus
> + * is non-empty or it is a valid partition root.
> */
> - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
> - cpumask_and(cp->effective_xcpus,
> - cp->cpus_allowed, parent->effective_xcpus);
> - else if (new_prs < 0)
> + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
> + compute_excpus(cp, cp->effective_xcpus);
> + if (new_prs < 0)
> reset_partition_data(cp);
> spin_unlock_irq(&callback_lock);
>
The code resets partition data only for new_prs < 0. My understanding is that a partition is invalid
when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
specific reason to skip the reset in that case?
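To make the case in question concrete, here is a minimal user-space sketch of the two post-patch conditions. It is not kernel code: the ACT_* flags and xcpus_actions() are hypothetical names used only for illustration, and the sketch assumes new_prs > 0 means a valid partition root, 0 means member, and < 0 means an invalid partition root.

```c
#include <stdbool.h>

/* Hypothetical flags for this sketch; they are not kernel identifiers. */
enum {
	ACT_NONE    = 0,
	ACT_COMPUTE = 1 << 0,	/* compute_excpus() would be called */
	ACT_RESET   = 1 << 1,	/* reset_partition_data() would be called */
};

/*
 * Mirror the two independent "if" statements left in the callback_lock
 * critical section of update_cpumasks_hier() after this patch.
 */
int xcpus_actions(int new_prs, bool has_exclusive_cpus)
{
	int actions = ACT_NONE;

	if (new_prs > 0 || has_exclusive_cpus)
		actions |= ACT_COMPUTE;
	if (new_prs < 0)
		actions |= ACT_RESET;

	return actions;
}
```

In this model, a member cpuset (new_prs == 0) with an empty exclusive_cpus triggers neither call, which is the case the question above is about.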
--
Best regards,
Ridong
* Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2 Waiman Long
@ 2026-01-04 7:09 ` Chen Ridong
2026-01-04 21:48 ` Waiman Long
0 siblings, 1 reply; 20+ messages in thread
From: Chen Ridong @ 2026-01-04 7:09 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/2 3:15, Waiman Long wrote:
> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
> until valid partition") introduced a new check to disallow the setting
> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
> cpuset.cpus value so that there will at least be one CPU left in the
> sibling in case the cpuset becomes a valid partition root. This new
> check does have the side effect of failing a cpuset.cpus change that
> makes it a subset of a sibling's cpuset.cpus.exclusive value.
>
> With v2, users are supposed to be allowed to set whatever value they
> want in cpuset.cpus without failure. To maintain this rule, the check
> is now restricted to only when cpuset.cpus.exclusive is being changed
> not when cpuset.cpus is changed.
>
Hi, Longman,
You've emphasized that modifying cpuset.cpus should never fail, but I haven't found this
explicitly documented. Should we add it?
More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
constraints? This seems to be the underlying assumption in this patch.
On the implementation side, the patch looks good to me.
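For reference, a minimal user-space sketch of the sibling check as reworked by this patch, leaving out the is_cpu_exclusive() branch. It is only a model under stated assumptions: CPU masks are plain 64-bit words (bit N is CPU N), and struct cs_masks and model_v2_xcpus_conflict() are hypothetical names, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two cpuset masks involved in the check. */
struct cs_masks {
	uint64_t cpus_allowed;		/* models cpuset.cpus */
	uint64_t exclusive_cpus;	/* models cpuset.cpus.exclusive */
};

bool model_v2_xcpus_conflict(const struct cs_masks *trial,
			     const struct cs_masks *sibling,
			     bool xcpus_changed)
{
	/*
	 * Only a change to exclusive_cpus can be rejected for swallowing
	 * all of a sibling's cpus_allowed.
	 */
	if (xcpus_changed && sibling->cpus_allowed &&
	    (sibling->cpus_allowed & ~trial->exclusive_cpus) == 0)
		return true;

	/* exclusive_cpus of two siblings must never intersect. */
	return (trial->exclusive_cpus & sibling->exclusive_cpus) != 0;
}
```

With these rules, the "C0-3 ... C4-5 X3-5" failure case from test_cpuset_prs.sh (A1 asking for exclusive CPUs 3-5 while B1 has cpuset.cpus 4-5) is still a conflict, while setting B1's cpuset.cpus to 1-2 under A1's cpuset.cpus.exclusive of 1-3 is not, because xcpus_changed is false for a cpuset.cpus-only write.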
> The cgroup-v2.rst doc file is also updated to reflect this change.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 8 +++----
> kernel/cgroup/cpuset.c | 30 ++++++++++++-------------
> 2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 7f5b59d95fce..510df2461aff 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2561,10 +2561,10 @@ Cpuset Interface Files
> Users can manually set it to a value that is different from
> "cpuset.cpus". One constraint in setting it is that the list of
> CPUs must be exclusive with respect to "cpuset.cpus.exclusive"
> - of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup
> - isn't set, its "cpuset.cpus" value, if set, cannot be a subset
> - of it to leave at least one CPU available when the exclusive
> - CPUs are taken away.
> + and "cpuset.cpus.exclusive.effective" of its siblings. Another
> + constraint is that it cannot be a superset of "cpuset.cpus"
> + of its sibling in order to leave at least one CPU available to
> + that sibling when the exclusive CPUs are taken away.
>
> For a parent cgroup, any one of its exclusive CPUs can only
> be distributed to at most one of its child cgroups. Having an
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 37d118a9ad4d..30e31fac4fe3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>
> /**
> * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
> - * @cs1: first cpuset to check
> - * @cs2: second cpuset to check
> + * @trial: the trial cpuset to be checked
> + * @sibling: a sibling cpuset to be checked against
> + * @xcpus_changed: set if exclusive_cpus has been set
> *
> * Returns: true if CPU exclusivity conflict exists, false otherwise
> *
> * Conflict detection rules:
> * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
> * 2. exclusive_cpus masks cannot intersect between cpusets
> - * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
> + * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
> */
> -static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> +static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
> + bool xcpus_changed)
> {
> /* If either cpuset is exclusive, check if they are mutually exclusive */
> - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> - return !cpusets_are_exclusive(cs1, cs2);
> + if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
> + return !cpusets_are_exclusive(trial, sibling);
>
> /* Exclusive_cpus cannot intersect */
> - if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
> + if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
> return true;
>
> - /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
> - if (!cpumask_empty(cs1->cpus_allowed) &&
> - cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
> - return true;
> -
> - if (!cpumask_empty(cs2->cpus_allowed) &&
> - cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
> + /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
> + if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
> + cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
> return true;
>
> return false;
> @@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
> {
> struct cgroup_subsys_state *css;
> struct cpuset *c, *par;
> + bool xcpus_changed;
> int ret = 0;
>
> rcu_read_lock();
> @@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
> * overlap. exclusive_cpus cannot overlap with each other if set.
> */
> ret = -EINVAL;
> + xcpus_changed = !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cpus);
> cpuset_for_each_child(c, css, par) {
> if (c == cur)
> continue;
> - if (cpus_excl_conflict(trial, c))
> + if (cpus_excl_conflict(trial, c, xcpus_changed))
> goto out;
> if (mems_excl_conflict(trial, c))
> goto out;
--
Best regards,
Ridong
* Re: [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
@ 2026-01-04 7:53 ` Chen Ridong
2026-01-04 22:26 ` Waiman Long
0 siblings, 1 reply; 20+ messages in thread
From: Chen Ridong @ 2026-01-04 7:53 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/2 3:15, Waiman Long wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
> the sibling's partition state becomes invalid. This is overly harsh and
> is probably not necessary.
>
> The cpuset.cpus.exclusive control file, if set, will override the
> cpuset.cpus of the same cpuset when creating a cpuset partition.
> So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
> a partition. However, it cannot override a conflicting cpuset.cpus file
> in a sibling cpuset and the partition creation process will fail. This
> is inconsistent. That will also make using cpuset.cpus.exclusive less
> valuable as a tool to set up cpuset partitions as the users have to
> check if such a cpuset.cpus conflict exists or not.
>
> Fix these problems by strictly adhering to the setting of the
> following control files in descending order of priority when setting
> up a partition.
>
> 1. cpuset.cpus.exclusive.effective of a valid partition
> 2. cpuset.cpus.exclusive
> 3. cpuset.cpus
>
Hi, Longman,
This description is a bit confusing to me. cpuset.cpus.exclusive and cpuset.cpus are user-settable
control files, while cpuset.cpus.exclusive.effective is a read-only file that reflects the result of
applying cpuset.cpus.exclusive and cpuset.cpus after conflict resolution.
A partition can be established as long as cpuset.cpus.exclusive.effective is not empty. I believe
cpuset.cpus.exclusive.effective represents the final effective CPU mask used for the partition, so
it shouldn't be compared in priority with cpuset.cpus.exclusive or cpuset.cpus. Rather, the latter
two are inputs that determine the former.
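The inputs-versus-result relationship can be sketched in user space as follows. This is a simplified model, not the kernel implementation: masks are 64-bit words (bit N is CPU N), struct cs_model, cpu_range() and model_effective_xcpus() are hypothetical names, and it assumes sibling CPUs are stripped only when the cpuset's own cpuset.cpus.exclusive is empty, as the documentation update in this patch describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cpuset fields used in this sketch. */
struct cs_model {
	uint64_t cpus_allowed;		/* input: cpuset.cpus */
	uint64_t exclusive_cpus;	/* input: cpuset.cpus.exclusive */
	uint64_t effective_xcpus;	/* result: cpuset.cpus.exclusive.effective */
};

/* Build a mask with CPUs lo..hi (inclusive) set; CPUs >= 64 are ignored. */
uint64_t cpu_range(unsigned int lo, unsigned int hi)
{
	uint64_t mask = 0;

	for (unsigned int cpu = lo; cpu <= hi && cpu < 64; cpu++)
		mask |= UINT64_C(1) << cpu;
	return mask;
}

uint64_t model_effective_xcpus(const struct cs_model *cs, uint64_t parent_xcpus,
			       const struct cs_model *siblings, size_t nr_siblings)
{
	/* exclusive_cpus, if set, overrides cpus_allowed as the request. */
	uint64_t xcpus = cs->exclusive_cpus ? cs->exclusive_cpus
					    : cs->cpus_allowed;

	/* Only CPUs the parent can grant are usable. */
	xcpus &= parent_xcpus;

	/* An explicit exclusive_cpus was already checked against siblings. */
	if (cs->exclusive_cpus)
		return xcpus;

	/* Otherwise strip whatever the siblings have claimed. */
	for (size_t i = 0; i < nr_siblings; i++) {
		uint64_t sib = siblings[i].exclusive_cpus
				? siblings[i].exclusive_cpus
				: siblings[i].effective_xcpus;

		xcpus &= ~sib;
	}
	return xcpus;
}
```

Feeding in the commit message example (A1 already a partition on CPUs 0-2, B1 requesting 2-4 through cpuset.cpus only, parent able to grant 0-7) gives 3-4 for B1. An empty result would mean no partition can be formed.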
> So once a cpuset.cpus.exclusive is set without failure, it will
> always be allowed to form a valid partition as long as at least one
> CPU can be granted from its parent irrespective of the state of the
> siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive
> will fail if it conflicts with the cpuset.cpus.exclusive or the
> cpuset.cpus.exclusive.effective value of a sibling.
>
> Partition can still be created by setting only cpuset.cpus without
> setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's
> cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
> be removed from its cpuset.cpus.exclusive.effective as long as there
> is still one or more CPUs left and can be granted from its parent. This
> CPU stripping is currently done in rm_siblings_excl_cpus().
>
> The new code will now try its best to enable the creation of new
> partitions with only cpuset.cpus set without invalidating existing ones.
> However it is not guaranteed that all the CPUs requested in cpuset.cpus
> will be used in the new partition even when all these CPUs can be
> granted from the parent.
>
> This is similar to the fact that cpuset.cpus.effective may not be
> able to include all the CPUs requested in cpuset.cpus. In this case,
> the parent may not be able to grant all the exclusive CPUs requested in
> cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
> already been granted to other partitions earlier.
>
> With the creation of multiple sibling partitions by setting
> only cpuset.cpus, this does have the side effect that their exact
> cpuset.cpus.exclusive.effective settings will depend on the order of
> partition creation if there are conflicts. Due to the exclusive nature
> of the CPUs in a partition, it is not easy to make it fair other than
> the old behavior of invalidating all the conflicting partitions.
>
> For example,
> # echo "0-2" > A1/cpuset.cpus
> # echo "root" > A1/cpuset.cpus.partition
> # cat A1/cpuset.cpus.partition
> root
> # cat A1/cpuset.cpus.exclusive.effective
> 0-2
> # echo "2-4" > B1/cpuset.cpus
> # echo "root" > B1/cpuset.cpus.partition
> # cat B1/cpuset.cpus.partition
> root
> # cat B1/cpuset.cpus.exclusive.effective
> 3-4
> # cat B1/cpuset.cpus.effective
> 3-4
>
> For users who want to be sure that they can get most of the CPUs they
> want, cpuset.cpus.exclusive should be used instead if they can set
> it successfully without failure. Setting cpuset.cpus.exclusive will
> guarantee that sibling conflicts from then onward are no longer possible.
>
> To make this change, we have to separate out the is_cpu_exclusive()
> check in cpus_excl_conflict() into a cgroup v1 only
> cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
> helper is now no longer needed and can be removed.
>
> Some existing tests in test_cpuset_prs.sh are updated and new ones are
> added to reflect the new behavior. The cgroup-v2.rst doc file is also
> updated to clarify what exclusive CPUs will be used when a partition
> is created.
>
> Reported-by: Sun Shaojie <sunshaojie@kylinos.cn>
> Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 32 +++++---
> kernel/cgroup/cpuset-internal.h | 3 +
> kernel/cgroup/cpuset-v1.c | 19 +++++
> kernel/cgroup/cpuset.c | 81 +++++++------------
> .../selftests/cgroup/test_cpuset_prs.sh | 26 ++++--
> 5 files changed, 90 insertions(+), 71 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 510df2461aff..a3446db96cea 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2584,9 +2584,9 @@ Cpuset Interface Files
> of this file will always be a subset of its parent's
> "cpuset.cpus.exclusive.effective" if its parent is not the root
> cgroup. It will also be a subset of "cpuset.cpus.exclusive"
> - if it is set. If "cpuset.cpus.exclusive" is not set, it is
> - treated to have an implicit value of "cpuset.cpus" in the
> - formation of local partition.
> + if it is set. This file should only be non-empty if either
> + "cpuset.cpus.exclusive" is set or when the current cpuset is
> + a valid partition root.
>
> cpuset.cpus.isolated
> A read-only and root cgroup only multiple values file.
> @@ -2618,13 +2618,22 @@ Cpuset Interface Files
> There are two types of partitions - local and remote. A local
> partition is one whose parent cgroup is also a valid partition
> root. A remote partition is one whose parent cgroup is not a
> - valid partition root itself. Writing to "cpuset.cpus.exclusive"
> - is optional for the creation of a local partition as its
> - "cpuset.cpus.exclusive" file will assume an implicit value that
> - is the same as "cpuset.cpus" if it is not set. Writing the
> - proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
> - before the target partition root is mandatory for the creation
> - of a remote partition.
> + valid partition root itself.
> +
> + Writing to "cpuset.cpus.exclusive" is optional for the creation
> + of a local partition as its "cpuset.cpus.exclusive" file will
> + assume an implicit value that is the same as "cpuset.cpus" if it
> + is not set. Writing the proper "cpuset.cpus.exclusive" values
> + down the cgroup hierarchy before the target partition root is
> + mandatory for the creation of a remote partition.
> +
> + Not all the CPUs requested in "cpuset.cpus.exclusive" can be
> + used to form a new partition. Only those that were present
> + in its parent's "cpuset.cpus.exclusive.effective" control
> + file can be used. For partitions created without setting
> + "cpuset.cpus.exclusive", exclusive CPUs specified in sibling's
> + "cpuset.cpus.exclusive" or "cpuset.cpus.exclusive.effective"
> + also cannot be used.
>
> Currently, a remote partition cannot be created under a local
> partition. All the ancestors of a remote partition root except
> @@ -2632,6 +2641,9 @@ Cpuset Interface Files
>
> The root cgroup is always a partition root and its state cannot
> be changed. All other non-root cgroups start out as "member".
> + Even though the "cpuset.cpus.exclusive*" control files are not
> + present in the root cgroup, they are implicitly the same as
> + "cpuset.cpus".
>
> When set to "root", the current cgroup is the root of a new
> partition or scheduling domain. The set of exclusive CPUs is
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index e718a4f54360..e8e2683cb067 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -312,6 +312,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
> struct cpumask *new_cpus, nodemask_t *new_mems,
> bool cpus_updated, bool mems_updated);
> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
> void cpuset1_init(struct cpuset *cs);
> void cpuset1_online_css(struct cgroup_subsys_state *css);
> int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> @@ -326,6 +327,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
> bool cpus_updated, bool mems_updated) {}
> static inline int cpuset1_validate_change(struct cpuset *cur,
> struct cpuset *trial) { return 0; }
> +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
> + struct cpuset *cs2) { return false; }
> static inline void cpuset1_init(struct cpuset *cs) {}
> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
> static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index ecfea7800f0d..04124c38a774 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
> return ret;
> }
>
> +/*
> + * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
> + * to legacy (v1)
> + * @cs1: first cpuset to check
> + * @cs2: second cpuset to check
> + *
> + * Returns: true if CPU exclusivity conflict exists, false otherwise
> + *
> + * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
> + */
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> +{
> + if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> + return cpumask_intersects(cs1->cpus_allowed,
> + cs2->cpus_allowed);
> +
> + return false;
> +}
> +
> #ifdef CONFIG_PROC_PID_CPUSET
> /*
> * proc_cpuset_show()
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 30e31fac4fe3..c091b63dd3c9 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -129,6 +129,18 @@ static bool force_sd_rebuild;
> * For simplicity, a local partition can be created under a local or remote
> * partition but a remote partition cannot have any partition root in its
> * ancestor chain except the cgroup root.
> + *
> + * A valid partition can be formed by setting either cpus_allowed or
> + * exclusive_cpus. If there are exclusive CPU conflicts, the conflicting
> + * CPUs will be assigned to the effective_xcpus of the partition according
> + * to the appearance of those CPUs in cpumasks (in descending order of
> + * priority).
> + * 1. effective_xcpus of a valid partition
> + * 2. exclusive_cpus
> + * 3. cpus_allowed
> + *
> + * exclusive_cpus should be used for setting up partition if the users want
> + * to get as many CPUs as possible.
> */
Ditto: the same comment about listing effective_xcpus alongside its inputs applies to this code comment.
> #define PRS_MEMBER 0
> #define PRS_ROOT 1
> @@ -616,27 +628,25 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
> * Returns: true if CPU exclusivity conflict exists, false otherwise
> *
> * Conflict detection rules:
> - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
> - * 2. exclusive_cpus masks cannot intersect between cpusets
> - * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
> + * o cgroup v1
> + * See cpuset1_cpus_excl_conflict()
> + * o cgroup v2
> + * - The exclusive_cpus values cannot overlap.
> + * - New exclusive_cpus cannot be a superset of a sibling's cpus_allowed.
> */
> static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
> bool xcpus_changed)
> {
> - /* If either cpuset is exclusive, check if they are mutually exclusive */
> - if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
> - return !cpusets_are_exclusive(trial, sibling);
> -
> - /* Exclusive_cpus cannot intersect */
> - if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
> - return true;
> + if (!cpuset_v2())
> + return cpuset1_cpus_excl_conflict(trial, sibling);
>
> /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
> if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
> cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
> return true;
>
> - return false;
> + /* Exclusive_cpus cannot intersect */
> + return cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus);
> }
>
> static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> @@ -2312,43 +2322,6 @@ static enum prs_errcode validate_partition(struct cpuset *cs, struct cpuset *tri
> return PERR_NONE;
> }
>
> -static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialcs,
> - struct tmpmasks *tmp)
> -{
> - int retval;
> - struct cpuset *parent = parent_cs(cs);
> -
> - retval = validate_change(cs, trialcs);
> -
> - if ((retval == -EINVAL) && cpuset_v2()) {
> - struct cgroup_subsys_state *css;
> - struct cpuset *cp;
> -
> - /*
> - * The -EINVAL error code indicates that partition sibling
> - * CPU exclusivity rule has been violated. We still allow
> - * the cpumask change to proceed while invalidating the
> - * partition. However, any conflicting sibling partitions
> - * have to be marked as invalid too.
> - */
> - trialcs->prs_err = PERR_NOTEXCL;
> - rcu_read_lock();
> - cpuset_for_each_child(cp, css, parent) {
> - struct cpumask *xcpus = user_xcpus(trialcs);
> -
> - if (is_partition_valid(cp) &&
> - cpumask_intersects(xcpus, cp->effective_xcpus)) {
> - rcu_read_unlock();
> - update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
> - rcu_read_lock();
> - }
> - }
> - rcu_read_unlock();
> - retval = 0;
> - }
> - return retval;
> -}
> -
> /**
> * partition_cpus_change - Handle partition state changes due to CPU mask updates
> * @cs: The target cpuset being modified
> @@ -2408,15 +2381,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
> return 0;
>
> - if (alloc_tmpmasks(&tmp))
> - return -ENOMEM;
> -
> compute_trialcs_excpus(trialcs, cs);
> trialcs->prs_err = PERR_NONE;
>
> - retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
> + retval = validate_change(cs, trialcs);
> if (retval < 0)
> - goto out_free;
> + return retval;
> +
> + if (alloc_tmpmasks(&tmp))
> + return -ENOMEM;
>
> /*
> * Check all the descendants in update_cpumasks_hier() if
> @@ -2439,7 +2412,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> /* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */
> if (cs->partition_root_state)
> update_partition_sd_lb(cs, old_prs);
> -out_free:
> +
> free_tmpmasks(&tmp);
> return retval;
> }
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index a17256d9f88a..ff4540b0490e 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -269,7 +269,7 @@ TEST_MATRIX=(
> " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
> " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> - " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
> + " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
> " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
> " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> @@ -318,7 +318,7 @@ TEST_MATRIX=(
> # Invalid to valid local partition direct transition tests
> " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
> " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
> - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
> + " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0"
> " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
>
> # Local partition invalidation tests
> @@ -388,10 +388,10 @@ TEST_MATRIX=(
> " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
> " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
>
> - # A non-exclusive cpuset.cpus change will invalidate partition and its siblings
> - " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
> - " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
> - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
> + # A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
> + " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0"
> + " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|XA1:0-1|B1:2-3 A1:P1|B1:P1"
> + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
>
> # cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
> " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
> @@ -417,6 +417,14 @@ TEST_MATRIX=(
> " CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-4|B1:5-6 \
> A1:P0|A2:P2:B1:P-1 2-4"
>
> + # When multiple partitions with conflicting cpuset.cpus are created, the
> + # latter created ones will only get what are left of the available exclusive
> + # CPUs.
> + " C1-3:P1 . . . . . . C3-5:P1 0 A1:1-3|B1:4-5:XB1:4-5 A1:P1|B1:P1"
> +
> + # cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive
> + " C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:1-2"
> +
> # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
> # ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
> # Failure cases:
> @@ -427,7 +435,7 @@ TEST_MATRIX=(
> # Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected
> " C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:4-5"
>
> - # cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
> + # cpuset.cpus.exclusive cannot be set to a superset of sibling's cpuset.cpus
> " C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5"
> )
>
> @@ -477,6 +485,10 @@ REMOTE_TEST_MATRIX=(
> . . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4:c22:5-6 \
> p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \
> 1-2,4-6|1-2,5-6"
> + # c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid partition
> + " C1-5:P1:S+ . C1-4:P1 C2-3 . . \
> + . . . P1 . . p1:5|c11:1-4|c12:5 \
> + p1:P1|c11:P1|c12:P-1"
> )
>
> #
--
Best regards,
Ridong
* Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-04 2:48 ` Chen Ridong
@ 2026-01-04 21:25 ` Waiman Long
2026-01-05 1:15 ` Chen Ridong
0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-04 21:25 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 1/3/26 9:48 PM, Chen Ridong wrote:
>
> On 2026/1/2 3:15, Waiman Long wrote:
>> Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
>> & make update_cpumasks_hier() handle remote partition"), the
>> compute_effective_exclusive_cpumask() helper was extended to
>> strip exclusive CPUs from siblings when computing effective_xcpus
>> (cpuset.cpus.exclusive.effective). This helper was later renamed to
>> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
>> CPU mask computation logic").
>>
>> This helper is supposed to be used consistently to compute
>> effective_xcpus. However, there is an exception within the callback
>> critical section in update_cpumasks_hier() when exclusive_cpus of a
>> valid partition root is empty. This can cause effective_xcpus value to
>> differ depending on where exactly it is last computed. Fix this by using
>> compute_excpus() in this case to give a consistent result.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> kernel/cgroup/cpuset.c | 14 +++++---------
>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index da2b3b51630e..37d118a9ad4d 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>> spin_lock_irq(&callback_lock);
>> cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>> cp->partition_root_state = new_prs;
>> - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
>> - compute_excpus(cp, cp->effective_xcpus);
>> -
>> /*
>> - * Make sure effective_xcpus is properly set for a valid
>> - * partition root.
>> + * Need to compute effective_xcpus if either exclusive_cpus
>> + * is non-empty or it is a valid partition root.
>> */
>> - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
>> - cpumask_and(cp->effective_xcpus,
>> - cp->cpus_allowed, parent->effective_xcpus);
>> - else if (new_prs < 0)
>> + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>> + compute_excpus(cp, cp->effective_xcpus);
>> + if (new_prs < 0)
>> reset_partition_data(cp);
>> spin_unlock_irq(&callback_lock);
>>
> The code resets partition data only for new_prs < 0. My understanding is that a partition is invalid
> when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
> specific reason to skip the reset in that case?
update_cpumasks_hier() is called when a change in a cpuset or a hotplug
event affects other cpusets in the hierarchy. The only partition state
changes it makes are from valid to invalid or vice versa. It will not
change a valid partition into a member, so new_prs can only be 0 when
old_prs is also 0. A change from valid partition to member is done in
update_prstate(), which should already have called
reset_partition_data(), even if the affected cpuset is later processed
again in update_cpumasks_hier(). That is why we only care about the
new_prs != 0 case here.
The code isn't wrong here. However, I can change the condition to
(new_prs <= 0) if it makes it easier to understand.
Cheers,
Longman
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2
2026-01-04 7:09 ` Chen Ridong
@ 2026-01-04 21:48 ` Waiman Long
2026-01-05 1:35 ` Chen Ridong
0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-04 21:48 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 1/4/26 2:09 AM, Chen Ridong wrote:
>
> On 2026/1/2 3:15, Waiman Long wrote:
>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>> until valid partition") introduced a new check to disallow the setting
>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>> cpuset.cpus value so that there will at least be one CPU left in the
>> sibling in case the cpuset becomes a valid partition root. This new
>> check does have the side effect of failing a cpuset.cpus change that
>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>
>> With v2, users are supposed to be allowed to set whatever value they
>> want in cpuset.cpus without failure. To maintain this rule, the check
>> is now restricted to only when cpuset.cpus.exclusive is being changed
>> not when cpuset.cpus is changed.
>>
> Hi, Longman,
>
> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
> explicitly documented. Should we add it?
>
> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
> constraints? This seems to be the underlying assumption in this patch.
Before the introduction of cpuset partitions, writing to cpuset.cpus
would only fail if the CPU list was invalid, e.g. if it contained CPUs
outside of the valid CPU range. What I mean by "never-fail" is that if
the CPU list is valid, the write should not fail. The rule is not
explicitly stated in the documentation, but it is pre-existing behavior
which we should try to keep to avoid breaking existing applications.

The exclusive CPU constraint does not apply to cpuset.cpus. It only
applies when setting cpuset.cpus.exclusive, with respect to the
cpuset.cpus.exclusive* files in sibling cpusets. So I would not say one
has higher priority than the other.
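As a rough illustration of that rule, here is a small user-space sketch.
It is a toy model, not the kernel's cpus_excl_conflict(): the names
cpumask_bits, cpu_range(), mask_subset() and excl_conflict() are
hypothetical, and only the xcpus_changed flag is borrowed from the patch
description. It models just the two checks discussed in this sub-thread
and ignores everything else the kernel looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one bit per CPU, so at most 64 CPUs. */
typedef uint64_t cpumask_bits;

/* Mask covering CPUs first..last inclusive. */
static cpumask_bits cpu_range(unsigned int first, unsigned int last)
{
	assert(first <= last && last < 64);
	cpumask_bits upto_last = (last == 63) ? ~0ULL : ((1ULL << (last + 1)) - 1);
	return upto_last & ~((1ULL << first) - 1);
}

/* True if every CPU in a is also in b. */
static bool mask_subset(cpumask_bits a, cpumask_bits b)
{
	return (a & ~b) == 0;
}

/*
 * Hypothetical stand-in for the sibling check (assumption, simplified).
 * trial_xcpus:   cpuset.cpus.exclusive of the cpuset being written
 * sib_cpus:      cpuset.cpus of a sibling
 * sib_xcpus:     cpuset.cpus.exclusive of that sibling
 * xcpus_changed: true only when cpuset.cpus.exclusive is being written
 */
static bool excl_conflict(cpumask_bits trial_xcpus, cpumask_bits sib_cpus,
			  cpumask_bits sib_xcpus, bool xcpus_changed)
{
	/* Exclusive CPU sets of siblings must not overlap. */
	if (trial_xcpus & sib_xcpus)
		return true;
	/* A plain cpuset.cpus write is not subject to the superset rule. */
	if (!xcpus_changed)
		return false;
	/* New exclusive set must not swallow all of a sibling's cpuset.cpus. */
	return sib_cpus != 0 && mask_subset(sib_cpus, trial_xcpus);
}
```

With this model, writing X3-5 next to a sibling with cpuset.cpus 4-5 is a
conflict, while a sibling that only writes cpuset.cpus 1-2 next to X1-3 is
not, which matches the two test matrix rows touched by patch 3.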
Cheers,
Longman
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
2026-01-04 7:53 ` Chen Ridong
@ 2026-01-04 22:26 ` Waiman Long
0 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2026-01-04 22:26 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 1/4/26 2:53 AM, Chen Ridong wrote:
>
> On 2026/1/2 3:15, Waiman Long wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
>> the sibling's partition state becomes invalid. This is overly harsh and
>> is probably not necessary.
>>
>> The cpuset.cpus.exclusive control file, if set, will override the
>> cpuset.cpus of the same cpuset when creating a cpuset partition.
>> So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
>> a partition. However, it cannot override a conflicting cpuset.cpus file
>> in a sibling cpuset and the partition creation process will fail. This
>> is inconsistent. That will also make using cpuset.cpus.exclusive less
>> valuable as a tool to set up cpuset partitions as the users have to
>> check if such a cpuset.cpus conflict exists or not.
>>
>> Fix these problems by strictly adhering to the setting of the
>> following control files in descending order of priority when setting
>> up a partition.
>>
>> 1. cpuset.cpus.exclusive.effective of a valid partition
>> 2. cpuset.cpus.exclusive
>> 3. cpuset.cpus
>>
> Hi, Longman,
>
> This description is a bit confusing to me. cpuset.cpus.exclusive and cpuset.cpus are user-settable
> control files, while cpuset.cpus.exclusive.effective is a read-only file that reflects the result of
> applying cpuset.cpus.exclusive and cpuset.cpus after conflict resolution.
>
> A partition can be established as long as cpuset.cpus.exclusive.effective is not empty. I believe
> cpuset.cpus.exclusive.effective represents the final effective CPU mask used for the partition, so
> it shouldn't be compared in priority with cpuset.cpus.exclusive or cpuset.cpus. Rather, the latter
> two are inputs that determine the former.
Yes, that priority list can be somewhat confusing. I will take out this
paragraph. The next 2 paragraphs in the commit log should be good enough.
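The input/output relationship described above, with the two writable
files as inputs and cpuset.cpus.exclusive.effective as the result, can be
sketched roughly as follows. This is a simplified user-space model under
stated assumptions, not the kernel's compute_excpus(): the helper names
are hypothetical, the parent's exclusive CPUs are ignored, and
sibling_claimed stands for the exclusive CPUs already held by valid
sibling partitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one bit per CPU, so at most 64 CPUs. */
typedef uint64_t cpumask_bits;

/* Mask covering CPUs first..last inclusive. */
static cpumask_bits cpu_range(unsigned int first, unsigned int last)
{
	assert(first <= last && last < 64);
	cpumask_bits upto_last = (last == 63) ? ~0ULL : ((1ULL << (last + 1)) - 1);
	return upto_last & ~((1ULL << first) - 1);
}

/*
 * Assumed model: cpuset.cpus.exclusive is used if it is set, otherwise
 * cpuset.cpus; CPUs already claimed by sibling partitions are stripped.
 */
static cpumask_bits model_effective_xcpus(cpumask_bits cpus, cpumask_bits xcpus,
					  cpumask_bits sibling_claimed)
{
	cpumask_bits requested = xcpus ? xcpus : cpus;

	return requested & ~sibling_claimed;
}

/* A partition can only be valid if some exclusive CPU is left for it. */
static bool partition_can_be_valid(cpumask_bits effective_xcpus)
{
	return effective_xcpus != 0;
}
```

For the new test matrix row where A1 is already a partition on CPUs 1-3
and B1 asks for 3-5, the model leaves B1 with 4-5. For the remote test
where c11 holds 1-4 and c12 asks for 2-3, nothing is left and c12 cannot
become a valid partition.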
Thanks,
Longman
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-04 21:25 ` Waiman Long
@ 2026-01-05 1:15 ` Chen Ridong
2026-01-05 3:50 ` Waiman Long
0 siblings, 1 reply; 20+ messages in thread
From: Chen Ridong @ 2026-01-05 1:15 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/5 5:25, Waiman Long wrote:
> On 1/3/26 9:48 PM, Chen Ridong wrote:
>>
>> On 2026/1/2 3:15, Waiman Long wrote:
>>> Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
>>> & make update_cpumasks_hier() handle remote partition"), the
>>> compute_effective_exclusive_cpumask() helper was extended to
>>> strip exclusive CPUs from siblings when computing effective_xcpus
>>> (cpuset.cpus.exclusive.effective). This helper was later renamed to
>>> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
>>> CPU mask computation logic").
>>>
>>> This helper is supposed to be used consistently to compute
>>> effective_xcpus. However, there is an exception within the callback
>>> critical section in update_cpumasks_hier() when exclusive_cpus of a
>>> valid partition root is empty. This can cause effective_xcpus value to
>>> differ depending on where exactly it is last computed. Fix this by using
>>> compute_excpus() in this case to give a consistent result.
>>>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>> kernel/cgroup/cpuset.c | 14 +++++---------
>>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index da2b3b51630e..37d118a9ad4d 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>>> spin_lock_irq(&callback_lock);
>>> cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>>> cp->partition_root_state = new_prs;
>>> - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
>>> - compute_excpus(cp, cp->effective_xcpus);
>>> -
>>> /*
>>> - * Make sure effective_xcpus is properly set for a valid
>>> - * partition root.
>>> + * Need to compute effective_xcpus if either exclusive_cpus
>>> + * is non-empty or it is a valid partition root.
>>> */
>>> - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
>>> - cpumask_and(cp->effective_xcpus,
>>> - cp->cpus_allowed, parent->effective_xcpus);
>>> - else if (new_prs < 0)
>>> + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>>> + compute_excpus(cp, cp->effective_xcpus);
>>> + if (new_prs < 0)
>>> reset_partition_data(cp);
>>> spin_unlock_irq(&callback_lock);
>>>
>> The code resets partition data only for new_prs < 0. My understanding is that a partition is invalid
>> when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
>> specific reason to skip the reset in that case?
>
> update_cpumasks_hier() is called when changes in a cpuset or hotplug affects other cpusets in the
> hierarchy. With respect to changes in partition state, it is either from valid to invalid or vice
> versa. It will not change from a valid partition to member. The only way new_prs = 0 is when old_prs
> = 0. Even if the affected cpuset is processed again in update_cpumask_hier(), any state change from
> valid partition to member (update_prstate()), reset_partition_data() should have been called there.
> That is why we only care about when new_prs != 0.
>
Thank you for your patience.
> The code isn't wrong here. However I can change the condition to (new_prs <= 0) if it makes it
> easier to understand.
>
I agree there's nothing wrong with the current logic. However, for clarity, I suggest changing the
condition to (new_prs <= 0). This allows the function's logic to be fully self-consistent and
focused on a single responsibility. This approach would allow us to simplify the code to:
	if (new_prs > 0)
		compute_excpus(cp, cp->effective_xcpus);
	else
		reset_partition_data(cp);
Since reset_partition_data() already handles both the case where cp->exclusive_cpus is empty and the
case where it is not, this implementation would be more concise while still covering all scenarios.
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2
2026-01-04 21:48 ` Waiman Long
@ 2026-01-05 1:35 ` Chen Ridong
2026-01-05 3:59 ` Waiman Long
0 siblings, 1 reply; 20+ messages in thread
From: Chen Ridong @ 2026-01-05 1:35 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/5 5:48, Waiman Long wrote:
> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>
>> On 2026/1/2 3:15, Waiman Long wrote:
>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>> until valid partition") introduced a new check to disallow the setting
>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>> cpuset.cpus value so that there will at least be one CPU left in the
>>> sibling in case the cpuset becomes a valid partition root. This new
>>> check does have the side effect of failing a cpuset.cpus change that
>>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>>
>>> With v2, users are supposed to be allowed to set whatever value they
>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>> not when cpuset.cpus is changed.
>>>
>> Hi, Longman,
>>
>> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
>> explicitly documented. Should we add it?
>>
>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>> constraints? This seems to be the underlying assumption in this patch.
>
> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
> existing applications.
>
There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
on device) and EBUSY.
I just want to ensure the behavior aligns with our design intent.
Consider this example:
# cd /sys/fs/cgroup/
# mkdir test
# echo 1 > test/cpuset.cpus
# echo $$ > test/cgroup.procs
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo > test/cpuset.cpus
-bash: echo: write error: No space left on device
In cgroup v2, if cpuset.cpus of the test cgroup becomes empty, the cgroup could inherit the parent's
effective CPUs. My question is: should clearing cpuset.cpus still fail (returning an error) when the
cgroup is populated?
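The v2 fallback referred to here can be sketched as follows. This is a
simplified user-space model of how cpuset.cpus.effective is described in
cgroup-v2.rst, not the kernel code; cpumask_bits, cpu_bit() and
model_effective_cpus() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per CPU, so at most 64 CPUs. */
typedef uint64_t cpumask_bits;

/* Mask with only the given CPU set. */
static cpumask_bits cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return 1ULL << cpu;
}

/*
 * Assumed v2 rule: the effective CPUs are the requested CPUs limited to
 * what the parent can offer; if nothing is left (cpuset.cpus is empty or
 * all of its CPUs are offline), fall back to the parent's effective CPUs.
 */
static cpumask_bits model_effective_cpus(cpumask_bits cpus,
					 cpumask_bits parent_effective)
{
	cpumask_bits effective = cpus & parent_effective;

	return effective ? effective : parent_effective;
}
```

In the example above, with CPU 1 offline the parent's effective CPUs no
longer include CPU 1, so the model gives the test cgroup the parent's
CPUs both before and after cpuset.cpus is cleared.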
> The exclusive CPU constraint does not apply to cpuset.cpus. It only applies when setting
> cpuset.cpus.exclusive wrt to other cpuset.cpus.exclusive* in sibling cpusets. So I will not say one
> has higher priority than the other.
>
> Cheers,
> Longman
>
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-05 1:15 ` Chen Ridong
@ 2026-01-05 3:50 ` Waiman Long
2026-01-05 3:58 ` Chen Ridong
0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-05 3:50 UTC (permalink / raw)
To: Chen Ridong, Waiman Long, Tejun Heo, Johannes Weiner,
Michal Koutný, Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 1/4/26 8:15 PM, Chen Ridong wrote:
>
> On 2026/1/5 5:25, Waiman Long wrote:
>> On 1/3/26 9:48 PM, Chen Ridong wrote:
>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>> Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
>>>> & make update_cpumasks_hier() handle remote partition"), the
>>>> compute_effective_exclusive_cpumask() helper was extended to
>>>> strip exclusive CPUs from siblings when computing effective_xcpus
>>>> (cpuset.cpus.exclusive.effective). This helper was later renamed to
>>>> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
>>>> CPU mask computation logic").
>>>>
>>>> This helper is supposed to be used consistently to compute
>>>> effective_xcpus. However, there is an exception within the callback
>>>> critical section in update_cpumasks_hier() when exclusive_cpus of a
>>>> valid partition root is empty. This can cause effective_xcpus value to
>>>> differ depending on where exactly it is last computed. Fix this by using
>>>> compute_excpus() in this case to give a consistent result.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>> kernel/cgroup/cpuset.c | 14 +++++---------
>>>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>> index da2b3b51630e..37d118a9ad4d 100644
>>>> --- a/kernel/cgroup/cpuset.c
>>>> +++ b/kernel/cgroup/cpuset.c
>>>> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>>>> spin_lock_irq(&callback_lock);
>>>> cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>>>> cp->partition_root_state = new_prs;
>>>> - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
>>>> - compute_excpus(cp, cp->effective_xcpus);
>>>> -
>>>> /*
>>>> - * Make sure effective_xcpus is properly set for a valid
>>>> - * partition root.
>>>> + * Need to compute effective_xcpus if either exclusive_cpus
>>>> + * is non-empty or it is a valid partition root.
>>>> */
>>>> - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
>>>> - cpumask_and(cp->effective_xcpus,
>>>> - cp->cpus_allowed, parent->effective_xcpus);
>>>> - else if (new_prs < 0)
>>>> + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>>>> + compute_excpus(cp, cp->effective_xcpus);
>>>> + if (new_prs < 0)
>>>> reset_partition_data(cp);
>>>> spin_unlock_irq(&callback_lock);
>>>>
>>> The code resets partition data only for new_prs < 0. My understanding is that a partition is invalid
>>> when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
>>> specific reason to skip the reset in that case?
>> update_cpumasks_hier() is called when changes in a cpuset or hotplug affects other cpusets in the
>> hierarchy. With respect to changes in partition state, it is either from valid to invalid or vice
>> versa. It will not change from a valid partition to member. The only way new_prs = 0 is when old_prs
>> = 0. Even if the affected cpuset is processed again in update_cpumask_hier(), any state change from
>> valid partition to member (update_prstate()), reset_partition_data() should have been called there.
>> That is why we only care about when new_prs != 0.
>>
> Thank you for your patience.
>
>> The code isn't wrong here. However I can change the condition to (new_prs <= 0) if it makes it
>> easier to understand.
>>
> I agree there's nothing wrong with the current logic. However, for clarity, I suggest changing the
> condition to (new_prs <= 0). This allows the function's logic to be fully self-consistent and
> focused on a single responsibility. This approach would allow us to simplify the code to:
>
> if (new_prs > 0)
> compute_excpus(cp, cp->effective_xcpus);
> else
> reset_partition_data(cp);
>
> Since reset_partition_data() already handles cases whether cp->exclusive_cpus is empty or not, this
> implementation would be more concise while correctly covering all scenarios.
effective_xcpus should be set when exclusive_cpus is not empty or when
the cpuset is a valid partition root. So just checking new_prs for
compute_excpus() is not enough.
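To make the difference concrete, here is a small stand-alone sketch that
compares the two conditions as plain predicates. The function names are
hypothetical and nothing here touches real cpuset state; new_prs follows
the sign convention used in this thread (positive for a valid partition
root, 0 for a member, negative for an invalid partition).

```c
#include <assert.h>
#include <stdbool.h>

/* Condition in the patch: valid partition root OR exclusive_cpus is set. */
static bool patch_computes_xcpus(int new_prs, bool exclusive_cpus_empty)
{
	return new_prs > 0 || !exclusive_cpus_empty;
}

/* Condition in the proposed simplification: valid partition root only. */
static bool simplified_computes_xcpus(int new_prs, bool exclusive_cpus_empty)
{
	(void)exclusive_cpus_empty;	/* not consulted at all */
	return new_prs > 0;
}

/* The two conditions disagree exactly when new_prs <= 0 and exclusive_cpus is set. */
void check_conditions(void)
{
	/* Member with cpuset.cpus.exclusive set: only the patch recomputes. */
	assert(patch_computes_xcpus(0, false));
	assert(!simplified_computes_xcpus(0, false));

	/* Invalid partition with cpuset.cpus.exclusive set: same disagreement. */
	assert(patch_computes_xcpus(-1, false));
	assert(!simplified_computes_xcpus(-1, false));

	/* Valid partition root: both recompute regardless of exclusive_cpus. */
	assert(patch_computes_xcpus(1, true) == simplified_computes_xcpus(1, true));
}
```

So a member cpuset that has cpuset.cpus.exclusive set is the case the
simplified form would skip.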
Cheers,
Longman
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-05 3:50 ` Waiman Long
@ 2026-01-05 3:58 ` Chen Ridong
2026-01-05 4:06 ` Waiman Long
0 siblings, 1 reply; 20+ messages in thread
From: Chen Ridong @ 2026-01-05 3:58 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/5 11:50, Waiman Long wrote:
> On 1/4/26 8:15 PM, Chen Ridong wrote:
>>
>> On 2026/1/5 5:25, Waiman Long wrote:
>>> On 1/3/26 9:48 PM, Chen Ridong wrote:
>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>> Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
>>>>> & make update_cpumasks_hier() handle remote partition"), the
>>>>> compute_effective_exclusive_cpumask() helper was extended to
>>>>> strip exclusive CPUs from siblings when computing effective_xcpus
>>>>> (cpuset.cpus.exclusive.effective). This helper was later renamed to
>>>>> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
>>>>> CPU mask computation logic").
>>>>>
>>>>> This helper is supposed to be used consistently to compute
>>>>> effective_xcpus. However, there is an exception within the callback
>>>>> critical section in update_cpumasks_hier() when exclusive_cpus of a
>>>>> valid partition root is empty. This can cause effective_xcpus value to
>>>>> differ depending on where exactly it is last computed. Fix this by using
>>>>> compute_excpus() in this case to give a consistent result.
>>>>>
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>> kernel/cgroup/cpuset.c | 14 +++++---------
>>>>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>>> index da2b3b51630e..37d118a9ad4d 100644
>>>>> --- a/kernel/cgroup/cpuset.c
>>>>> +++ b/kernel/cgroup/cpuset.c
>>>>> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>>>>> spin_lock_irq(&callback_lock);
>>>>> cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>>>>> cp->partition_root_state = new_prs;
>>>>> - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
>>>>> - compute_excpus(cp, cp->effective_xcpus);
>>>>> -
>>>>> /*
>>>>> - * Make sure effective_xcpus is properly set for a valid
>>>>> - * partition root.
>>>>> + * Need to compute effective_xcpus if either exclusive_cpus
>>>>> + * is non-empty or it is a valid partition root.
>>>>> */
>>>>> - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
>>>>> - cpumask_and(cp->effective_xcpus,
>>>>> - cp->cpus_allowed, parent->effective_xcpus);
>>>>> - else if (new_prs < 0)
>>>>> + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>>>>> + compute_excpus(cp, cp->effective_xcpus);
>>>>> + if (new_prs < 0)
>>>>> reset_partition_data(cp);
>>>>> spin_unlock_irq(&callback_lock);
>>>>>
>>>> The code resets partition data only for new_prs < 0. My understanding is that a partition is
>>>> invalid
>>>> when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
>>>> specific reason to skip the reset in that case?
>>> update_cpumasks_hier() is called when changes in a cpuset or hotplug affects other cpusets in the
>>> hierarchy. With respect to changes in partition state, it is either from valid to invalid or vice
>>> versa. It will not change from a valid partition to member. The only way new_prs = 0 is when old_prs
>>> = 0. Even if the affected cpuset is processed again in update_cpumask_hier(), any state change from
>>> valid partition to member (update_prstate()), reset_partition_data() should have been called there.
>>> That is why we only care about when new_prs != 0.
>>>
>> Thank you for your patience.
>>
>>> The code isn't wrong here. However I can change the condition to (new_prs <= 0) if it makes it
>>> easier to understand.
>>>
>> I agree there's nothing wrong with the current logic. However, for clarity, I suggest changing the
>> condition to (new_prs <= 0). This allows the function's logic to be fully self-consistent and
>> focused on a single responsibility. This approach would allow us to simplify the code to:
>>
>> if (new_prs > 0)
>> compute_excpus(cp, cp->effective_xcpus);
>> else
>> reset_partition_data(cp);
>>
>> Since reset_partition_data() already handles cases whether cp->exclusive_cpus is empty or not, this
>> implementation would be more concise while correctly covering all scenarios.
>
> effective_xcpus should be set when exclusive_cpus is not empty or when the cpuset is a valid
> partition root. So just checking new_prs for compute_excpus() is not enough.
>
If we change the condition to (new_prs <= 0), it will reset the partition data even when we call
compute_excpus (for !cpumask_empty(cp->exclusive_cpus)), so we should still get the same result, right?
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2
2026-01-05 1:35 ` Chen Ridong
@ 2026-01-05 3:59 ` Waiman Long
2026-01-05 7:00 ` Chen Ridong
0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-05 3:59 UTC (permalink / raw)
To: Chen Ridong, Waiman Long, Tejun Heo, Johannes Weiner,
Michal Koutný, Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 1/4/26 8:35 PM, Chen Ridong wrote:
>
> On 2026/1/5 5:48, Waiman Long wrote:
>> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>>> until valid partition") introduced a new check to disallow the setting
>>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>>> cpuset.cpus value so that there will at least be one CPU left in the
>>>> sibling in case the cpuset becomes a valid partition root. This new
>>>> check does have the side effect of failing a cpuset.cpus change that
>>>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>>>
>>>> With v2, users are supposed to be allowed to set whatever value they
>>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>>> not when cpuset.cpus is changed.
>>>>
>>> Hi, Longman,
>>>
>>> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
>>> explicitly documented. Should we add it?
>>>
>>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>>> constraints? This seems to be the underlying assumption in this patch.
>> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
>> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
>> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
>> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
>> existing applications.
>>
> There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
> on device) and EBUSY.
>
> I just want to ensure the behavior aligns with our design intent.
>
> Consider this example:
>
> # cd /sys/fs/cgroup/
> # mkdir test
> # echo 1 > test/cpuset.cpus
> # echo $$ > test/cgroup.procs
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> # echo > test/cpuset.cpus
> -bash: echo: write error: No space left on device
>
> In cgroups v2, if the test cgroup becomes empty, it could inherit the parent's effective CPUs. My
> question is: Should we still fail to clear cpuset.cpus (returning an error) when the cgroup is
> populated?
Good catch. This error is meant for v1 and shouldn't apply to v2. Yes, I
think we should fix that for v2.
Cheers,
Longman
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-05 3:58 ` Chen Ridong
@ 2026-01-05 4:06 ` Waiman Long
2026-01-05 6:29 ` Chen Ridong
0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2026-01-05 4:06 UTC (permalink / raw)
To: Chen Ridong, Waiman Long, Tejun Heo, Johannes Weiner,
Michal Koutný, Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 1/4/26 10:58 PM, Chen Ridong wrote:
>
> On 2026/1/5 11:50, Waiman Long wrote:
>> On 1/4/26 8:15 PM, Chen Ridong wrote:
>>> On 2026/1/5 5:25, Waiman Long wrote:
>>>> On 1/3/26 9:48 PM, Chen Ridong wrote:
>>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>>> Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
>>>>>> & make update_cpumasks_hier() handle remote partition"), the
>>>>>> compute_effective_exclusive_cpumask() helper was extended to
>>>>>> strip exclusive CPUs from siblings when computing effective_xcpus
>>>>>> (cpuset.cpus.exclusive.effective). This helper was later renamed to
>>>>>> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
>>>>>> CPU mask computation logic").
>>>>>>
>>>>>> This helper is supposed to be used consistently to compute
>>>>>> effective_xcpus. However, there is an exception within the callback
>>>>>> critical section in update_cpumasks_hier() when exclusive_cpus of a
>>>>>> valid partition root is empty. This can cause effective_xcpus value to
>>>>>> differ depending on where exactly it is last computed. Fix this by using
>>>>>> compute_excpus() in this case to give a consistent result.
>>>>>>
>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>> ---
>>>>>> kernel/cgroup/cpuset.c | 14 +++++---------
>>>>>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>>>> index da2b3b51630e..37d118a9ad4d 100644
>>>>>> --- a/kernel/cgroup/cpuset.c
>>>>>> +++ b/kernel/cgroup/cpuset.c
>>>>>> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>>>>>> spin_lock_irq(&callback_lock);
>>>>>> cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>>>>>> cp->partition_root_state = new_prs;
>>>>>> - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
>>>>>> - compute_excpus(cp, cp->effective_xcpus);
>>>>>> -
>>>>>> /*
>>>>>> - * Make sure effective_xcpus is properly set for a valid
>>>>>> - * partition root.
>>>>>> + * Need to compute effective_xcpus if either exclusive_cpus
>>>>>> + * is non-empty or it is a valid partition root.
>>>>>> */
>>>>>> - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
>>>>>> - cpumask_and(cp->effective_xcpus,
>>>>>> - cp->cpus_allowed, parent->effective_xcpus);
>>>>>> - else if (new_prs < 0)
>>>>>> + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>>>>>> + compute_excpus(cp, cp->effective_xcpus);
>>>>>> + if (new_prs < 0)
>>>>>> reset_partition_data(cp);
>>>>>> spin_unlock_irq(&callback_lock);
>>>>>>
>>>>> The code resets partition data only for new_prs < 0. My understanding is that a partition is
>>>>> invalid
>>>>> when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
>>>>> specific reason to skip the reset in that case?
>>>> update_cpumasks_hier() is called when changes in a cpuset or hotplug affects other cpusets in the
>>>> hierarchy. With respect to changes in partition state, it is either from valid to invalid or vice
>>>> versa. It will not change from a valid partition to member. The only way new_prs = 0 is when old_prs
>>>> = 0. Even if the affected cpuset is processed again in update_cpumasks_hier(), any state change from
>>>> valid partition to member (update_prstate()), reset_partition_data() should have been called there.
>>>> That is why we only care about when new_prs != 0.
>>>>
>>> Thank you for your patience.
>>>
>>>> The code isn't wrong here. However I can change the condition to (new_prs <= 0) if it makes it
>>>> easier to understand.
>>>>
>>> I agree there's nothing wrong with the current logic. However, for clarity, I suggest changing the
>>> condition to (new_prs <= 0). This allows the function's logic to be fully self-consistent and
>>> focused on a single responsibility. This approach would allow us to simplify the code to:
>>>
>>> if (new_prs > 0)
>>> compute_excpus(cp, cp->effective_xcpus);
>>> else
>>> reset_partition_data(cp);
>>>
>>> Since reset_partition_data() already handles cases whether cp->exclusive_cpus is empty or not, this
>>> implementation would be more concise while correctly covering all scenarios.
>> effective_xcpus should be set when exclusive_cpus is not empty or when the cpuset is a valid
>> partition root. So just checking new_prs for compute_excpus() is not enough.
>>
> If we change the condition to (new_prs <= 0), it will reset the partition data even when we call
> compute_excpus (for !cpumask_empty(cp->exclusive_cpus)), so we should still get the same result, right?
Changing the condition to (new_prs <= 0) won't affect the result except
for a bit of wasted CPU cycles. That is why I am planning to make the
change in the next version to make it easier to understand.

Cheers,
Longman
* Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
2026-01-05 4:06 ` Waiman Long
@ 2026-01-05 6:29 ` Chen Ridong
0 siblings, 0 replies; 20+ messages in thread
From: Chen Ridong @ 2026-01-05 6:29 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/5 12:06, Waiman Long wrote:
> On 1/4/26 10:58 PM, Chen Ridong wrote:
>>
>> On 2026/1/5 11:50, Waiman Long wrote:
>>> On 1/4/26 8:15 PM, Chen Ridong wrote:
>>>> On 2026/1/5 5:25, Waiman Long wrote:
>>>>> On 1/3/26 9:48 PM, Chen Ridong wrote:
>>>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>>>> Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
>>>>>>> & make update_cpumasks_hier() handle remote partition"), the
>>>>>>> compute_effective_exclusive_cpumask() helper was extended to
>>>>>>> strip exclusive CPUs from siblings when computing effective_xcpus
>>>>>>> (cpuset.cpus.exclusive.effective). This helper was later renamed to
>>>>>>> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
>>>>>>> CPU mask computation logic").
>>>>>>>
>>>>>>> This helper is supposed to be used consistently to compute
>>>>>>> effective_xcpus. However, there is an exception within the callback
>>>>>>> critical section in update_cpumasks_hier() when exclusive_cpus of a
>>>>>>> valid partition root is empty. This can cause effective_xcpus value to
>>>>>>> differ depending on where exactly it is last computed. Fix this by using
>>>>>>> compute_excpus() in this case to give a consistent result.
>>>>>>>
>>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>>> ---
>>>>>>> kernel/cgroup/cpuset.c | 14 +++++---------
>>>>>>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>>>>>>
>>>>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>>>>> index da2b3b51630e..37d118a9ad4d 100644
>>>>>>> --- a/kernel/cgroup/cpuset.c
>>>>>>> +++ b/kernel/cgroup/cpuset.c
>>>>>>> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks
>>>>>>> *tmp,
>>>>>>> spin_lock_irq(&callback_lock);
>>>>>>> cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>>>>>>> cp->partition_root_state = new_prs;
>>>>>>> - if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
>>>>>>> - compute_excpus(cp, cp->effective_xcpus);
>>>>>>> -
>>>>>>> /*
>>>>>>> - * Make sure effective_xcpus is properly set for a valid
>>>>>>> - * partition root.
>>>>>>> + * Need to compute effective_xcpus if either exclusive_cpus
>>>>>>> + * is non-empty or it is a valid partition root.
>>>>>>> */
>>>>>>> - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
>>>>>>> - cpumask_and(cp->effective_xcpus,
>>>>>>> - cp->cpus_allowed, parent->effective_xcpus);
>>>>>>> - else if (new_prs < 0)
>>>>>>> + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>>>>>>> + compute_excpus(cp, cp->effective_xcpus);
>>>>>>> + if (new_prs < 0)
>>>>>>> reset_partition_data(cp);
>>>>>>> spin_unlock_irq(&callback_lock);
>>>>>>>
>>>>>> The code resets partition data only for new_prs < 0. My understanding is that a partition is
>>>>>> invalid
>>>>>> when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
>>>>>> specific reason to skip the reset in that case?
>>>>> update_cpumasks_hier() is called when changes in a cpuset or hotplug affects other cpusets in the
>>>>> hierarchy. With respect to changes in partition state, it is either from valid to invalid or vice
>>>>> versa. It will not change from a valid partition to member. The only way new_prs = 0 is when
>>>>> old_prs
>>>>> = 0. Even if the affected cpuset is processed again in update_cpumasks_hier(), any state change
>>>>> from
>>>>> valid partition to member (update_prstate()), reset_partition_data() should have been called
>>>>> there.
>>>>> That is why we only care about when new_prs != 0.
>>>>>
>>>> Thank you for your patience.
>>>>
>>>>> The code isn't wrong here. However I can change the condition to (new_prs <= 0) if it makes it
>>>>> easier to understand.
>>>>>
>>>> I agree there's nothing wrong with the current logic. However, for clarity, I suggest changing the
>>>> condition to (new_prs <= 0). This allows the function's logic to be fully self-consistent and
>>>> focused on a single responsibility. This approach would allow us to simplify the code to:
>>>>
>>>> if (new_prs > 0)
>>>> compute_excpus(cp, cp->effective_xcpus);
>>>> else
>>>> reset_partition_data(cp);
>>>>
>>>> Since reset_partition_data() already handles cases whether cp->exclusive_cpus is empty or not, this
>>>> implementation would be more concise while correctly covering all scenarios.
>>> effective_xcpus should be set when exclusive_cpus is not empty or when the cpuset is a valid
>>> partition root. So just checking new_prs for compute_excpus() is not enough.
>>>
>> If we change the condition to (new_prs <= 0), it will reset the partition data even when we call
>> compute_excpus (for !cpumask_empty(cp->exclusive_cpus)), so we should still get the same result,
>> right?
>
> Changing the condition to (new_prs <= 0) won't affect the result except for a bit of wasted cpu
> cycles. That is why I am planning to make the change in the next version to make it easier to
> understand.
>
Sorry, I should have been clearer. If we change the condition, the code would essentially be:

	if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
		compute_excpus(cp, cp->effective_xcpus);
	if (new_prs <= 0)
		reset_partition_data(cp);

For cases where new_prs <= 0 && !cpumask_empty(cp->exclusive_cpus), both compute_excpus() and
reset_partition_data() would be called.

Is this functionally equivalent to:

	if (new_prs > 0)
		compute_excpus(cp, cp->effective_xcpus);
	else	/* new_prs <= 0 */
		reset_partition_data(cp);
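
One way to reason about the question is to enumerate the cases. The sketch below is a toy
model, not kernel code: calls_two_ifs(), calls_if_else() and same_calls() are hypothetical
helpers that only record which of the two functions each variant would invoke for a given
new_prs and exclusive_cpus state. It assumes nothing about what compute_excpus() or
reset_partition_data() do internally.

```c
#include <stdbool.h>

/* Bit flags recording which helper a variant would call (model only). */
enum {
	CALL_COMPUTE = 1 << 0,	/* stands in for compute_excpus() */
	CALL_RESET   = 1 << 1,	/* stands in for reset_partition_data() */
};

/* Variant with two independent if statements. */
static int calls_two_ifs(int new_prs, bool excl_empty)
{
	int calls = 0;

	if ((new_prs > 0) || !excl_empty)
		calls |= CALL_COMPUTE;
	if (new_prs <= 0)
		calls |= CALL_RESET;
	return calls;
}

/* Variant with a plain if/else on new_prs. */
static int calls_if_else(int new_prs, bool excl_empty)
{
	(void)excl_empty;	/* not consulted in this variant */

	if (new_prs > 0)
		return CALL_COMPUTE;
	return CALL_RESET;
}

/* True when both variants invoke the same set of helpers. */
static bool same_calls(int new_prs, bool excl_empty)
{
	return calls_two_ifs(new_prs, excl_empty) ==
	       calls_if_else(new_prs, excl_empty);
}
```

In this model the two variants differ only when new_prs <= 0 and exclusive_cpus is non-empty,
where the first variant additionally calls compute_excpus(). Whether that extra call matters
depends on whether reset_partition_data() alone leaves effective_xcpus in the state that
compute_excpus() would have produced, which the model does not answer.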
--
Best regards,
Ridong
* Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2
2026-01-05 3:59 ` Waiman Long
@ 2026-01-05 7:00 ` Chen Ridong
0 siblings, 0 replies; 20+ messages in thread
From: Chen Ridong @ 2026-01-05 7:00 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan
Cc: linux-kernel, cgroups, linux-kselftest, linux-doc, Sun Shaojie
On 2026/1/5 11:59, Waiman Long wrote:
> On 1/4/26 8:35 PM, Chen Ridong wrote:
>>
>> On 2026/1/5 5:48, Waiman Long wrote:
>>> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>>>> until valid partition") introduced a new check to disallow the setting
>>>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>>>> cpuset.cpus value so that there will at least be one CPU left in the
>>>>> sibling in case the cpuset becomes a valid partition root. This new
>>>>> check does have the side effect of failing a cpuset.cpus change that
>>>>> makes it a subset of a sibling's cpuset.cpus.exclusive value.
>>>>>
>>>>> With v2, users are supposed to be allowed to set whatever value they
>>>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>>>> not when cpuset.cpus is changed.
>>>>>
>>>> Hi, Longman,
>>>>
>>>> You've emphasized that modifying cpuset.cpus should never fail, but I haven't found this
>>>> explicitly documented. Should we add it?
>>>>
>>>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>>>> constraints? This seems to be the underlying assumption in this patch.
>>> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
>>> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
>>> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
>>> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
>>> existing applications.
>>>
>> There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
>> on device) and EBUSY.
>>
>> I just want to ensure the behavior aligns with our design intent.
>>
>> Consider this example:
>>
>> # cd /sys/fs/cgroup/
>> # mkdir test
>> # echo 1 > test/cpuset.cpus
>> # echo $$ > test/cgroup.procs
>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>> # echo > test/cpuset.cpus
>> -bash: echo: write error: No space left on device
>>
>> In cgroups v2, if the test cgroup becomes empty, it could inherit the parent's effective CPUs. My
>> question is: Should we still fail to clear cpuset.cpus (returning an error) when the cgroup is
>> populated?
>
> Good catch. This error is for v1. It shouldn't apply for v2. Yes, I think we should fix that for v2.
>
The EBUSY check (through cpuset_cpumask_can_shrink()) is necessary, correct?

Since the subsequent patch modifies exclusive checking for v1, should we consolidate all v1-related
code into a separate function like cpuset1_validate_change(), maybe with some duplicate code? It
would allow us to isolate the v1 logic and avoid having to account for v1 implementation details in
future features.

In other words:

	validate_change(...)
	{
		if (!is_in_v2_mode())
			return cpuset1_validate_change(cur, trial);
		...
		// only v2 code here
	}
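
The shape of that split can be shown with a small standalone model. Everything here is
hypothetical and simplified: struct change_req, model_v1_validate() and
model_validate_change() are illustration-only names, and the single rule modeled (clearing
cpuset.cpus on a populated cgroup returns -ENOSPC only on the v1 path) reflects the direction
discussed above, not necessarily what the kernel does today.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified description of a cpuset.cpus write. */
struct change_req {
	bool v2_mode;		/* stands in for is_in_v2_mode() */
	bool populated;		/* cgroup has tasks attached */
	bool new_cpus_empty;	/* the write clears cpuset.cpus */
};

/* Model of the v1-only checks, kept in one place. */
static int model_v1_validate(const struct change_req *req)
{
	/* v1: a populated cpuset may not end up with no CPUs. */
	if (req->populated && req->new_cpus_empty)
		return -ENOSPC;
	return 0;
}

/* Model of the dispatcher: v1 logic is isolated behind one early return. */
static int model_validate_change(const struct change_req *req)
{
	if (!req->v2_mode)
		return model_v1_validate(req);

	/* Only v2 checks would live here; none are modeled. */
	return 0;
}
```

With this layout, a new v2 feature only touches the code after the early return, and the v1
rules can be read and tested on their own.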
--
Best regards,
Ridong
end of thread, other threads:[~2026-01-05 7:00 UTC | newest]
Thread overview: 20+ messages
2026-01-01 19:15 [cgroup/for-6.20 PATCH v2 0/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 1/4] cgroup/cpuset: Streamline rm_siblings_excl_cpus() Waiman Long
2026-01-04 1:55 ` Chen Ridong
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier() Waiman Long
2026-01-04 2:48 ` Chen Ridong
2026-01-04 21:25 ` Waiman Long
2026-01-05 1:15 ` Chen Ridong
2026-01-05 3:50 ` Waiman Long
2026-01-05 3:58 ` Chen Ridong
2026-01-05 4:06 ` Waiman Long
2026-01-05 6:29 ` Chen Ridong
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2 Waiman Long
2026-01-04 7:09 ` Chen Ridong
2026-01-04 21:48 ` Waiman Long
2026-01-05 1:35 ` Chen Ridong
2026-01-05 3:59 ` Waiman Long
2026-01-05 7:00 ` Chen Ridong
2026-01-01 19:15 ` [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Waiman Long
2026-01-04 7:53 ` Chen Ridong
2026-01-04 22:26 ` Waiman Long