* [PATCH-next v3 1/5] cgroup/cpuset: Add a cpuset_reserve_dl_bw() helper
2026-05-27 15:37 [PATCH-next v3 0/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
@ 2026-05-27 15:37 ` Waiman Long
2026-05-27 15:37 ` [PATCH-next v3 2/5] cgroup/cpuset: Expand the scope of cpuset_can_attach_check() Waiman Long
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2026-05-27 15:37 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin, Waiman Long
Extract the DL bandwidth allocation code in cpuset_can_attach() to a new
cpuset_reserve_dl_bw() helper to simplify the code.
No functional change is expected.
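The shape of this refactoring can be sketched in a small userspace model. Everything named model_* and struct cs_model below is a hypothetical stand-in, not kernel code: the CPU lookup is reduced to one integer field, and model_dl_bw_alloc() only imitates a bandwidth allocator that can fail. The point is that the helper uses plain early returns, so the caller needs a single if/else instead of the out_success label.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct cpuset fields this patch touches. */
struct cs_model {
	unsigned long sum_migrate_dl_bw; /* DL bandwidth to reserve; 0 = nothing to do */
	int nr_migrate_dl_tasks;
	int first_active_cpu;            /* -1 models "no active CPU in effective_cpus" */
	int dl_bw_cpu;                   /* CPU the reservation was charged to */
	int attach_in_progress;
};

/* Hypothetical stand-in for dl_bw_alloc(): fails once capacity runs out. */
static int model_dl_bw_alloc(unsigned long *capacity, unsigned long bw)
{
	if (bw > *capacity)
		return -EBUSY;
	*capacity -= bw;
	return 0;
}

/* Mirrors the shape of cpuset_reserve_dl_bw(): early returns, no labels. */
static int model_reserve_dl_bw(struct cs_model *cs, unsigned long *capacity)
{
	int ret;

	if (!cs->sum_migrate_dl_bw)
		return 0;

	if (cs->first_active_cpu < 0)
		return -EINVAL;

	ret = model_dl_bw_alloc(capacity, cs->sum_migrate_dl_bw);
	if (ret)
		return ret;

	cs->dl_bw_cpu = cs->first_active_cpu;
	return 0;
}

/* Mirrors the new tail of cpuset_can_attach(): one call, one if/else. */
int model_can_attach_tail(struct cs_model *cs, unsigned long *capacity)
{
	int ret;

	assert(cs != NULL && capacity != NULL);

	ret = model_reserve_dl_bw(cs, capacity);
	if (ret) {
		/* models reset_migrate_dl_data() */
		cs->nr_migrate_dl_tasks = 0;
		cs->sum_migrate_dl_bw = 0;
	} else {
		/* models marking the attach as in progress */
		cs->attach_in_progress++;
	}
	return ret;
}
```

In this model, a cpuset with no pending DL bandwidth still reaches the attach_in_progress increment, which matches what the removed out_success label did.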
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 53 ++++++++++++++++++++++++------------------
1 file changed, 30 insertions(+), 23 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 51327333980a..d720bcc7ef83 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2980,6 +2980,25 @@ static int cpuset_can_attach_check(struct cpuset *cs)
return 0;
}
+static int cpuset_reserve_dl_bw(struct cpuset *cs)
+{
+ int cpu, ret;
+
+ if (!cs->sum_migrate_dl_bw)
+ return 0;
+
+ cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+ if (unlikely(cpu >= nr_cpu_ids))
+ return -EINVAL;
+
+ ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+ if (ret)
+ return ret;
+
+ cs->dl_bw_cpu = cpu;
+ return 0;
+}
+
static void reset_migrate_dl_data(struct cpuset *cs)
{
cs->nr_migrate_dl_tasks = 0;
@@ -2994,7 +3013,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
struct cpuset *cs, *oldcs;
struct task_struct *task;
bool setsched_check;
- int cpu, ret;
+ int ret;
/* used later by cpuset_attach() */
cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
@@ -3050,31 +3069,19 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
}
}
- if (!cs->sum_migrate_dl_bw)
- goto out_success;
-
- cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
- if (unlikely(cpu >= nr_cpu_ids)) {
- ret = -EINVAL;
- goto out_unlock;
- }
-
- ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
- if (ret)
- goto out_unlock;
-
- cs->dl_bw_cpu = cpu;
-
-out_success:
- /*
- * Mark attach is in progress. This makes validate_change() fail
- * changes which zero cpus/mems_allowed.
- */
- cs->attach_in_progress++;
+ ret = cpuset_reserve_dl_bw(cs);
out_unlock:
- if (ret)
+ if (ret) {
reset_migrate_dl_data(cs);
+ } else {
+ /*
+ * Mark attach is in progress. This makes validate_change() fail
+ * changes which zero cpus/mems_allowed.
+ */
+ cs->attach_in_progress++;
+ }
+
mutex_unlock(&cpuset_mutex);
return ret;
}
--
2.54.0
* [PATCH-next v3 2/5] cgroup/cpuset: Expand the scope of cpuset_can_attach_check()
2026-05-27 15:37 [PATCH-next v3 0/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
2026-05-27 15:37 ` [PATCH-next v3 1/5] cgroup/cpuset: Add a cpuset_reserve_dl_bw() helper Waiman Long
@ 2026-05-27 15:37 ` Waiman Long
2026-05-27 15:37 ` [PATCH-next v3 3/5] cgroup/cpuset: Made cpuset_attach_old_cs track task group leaders Waiman Long
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2026-05-27 15:37 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin, Waiman Long
Expand the scope of cpuset_can_attach_check() by moving the setting of
the setsched flag into it, using the new @oldcs and @psetsched
arguments. As cpuset_can_attach_check() is also called from
cpuset_can_fork(), that caller passes NULL for both new arguments.
While at it, expose the source and destination cpuset cpu/memory check
results in the new attach_cpus_updated and attach_mems_updated static
flags so that these flags can be used directly from cpuset_attach()
without the need to do the same computations again.
No functional change is expected.
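The decision logic described above can be illustrated with a userspace model. This is a sketch under assumptions, not the kernel function: CPU and node masks are reduced to plain bitmasks, and struct cs_model, struct mode_model and model_can_attach_check() are hypothetical names. The two mode flags are kept separate because the patch tests both cpuset_v2() and is_in_v2_mode().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: masks are modeled as plain bitmasks. */
struct cs_model {
	unsigned long effective_cpus;
	unsigned long effective_mems;
	unsigned long mems_allowed;
};

struct mode_model {
	bool v2;      /* models cpuset_v2() */
	bool v2_mode; /* models is_in_v2_mode() */
};

/* Model of the static flags shared by can_attach and attach. */
bool attach_cpus_updated;
bool attach_mems_updated;

int model_can_attach_check(const struct mode_model *m,
			   const struct cs_model *cs,
			   const struct cs_model *oldcs, bool *psetsched)
{
	assert(m != NULL && cs != NULL);

	if (cs->effective_cpus == 0 ||
	    (!m->v2_mode && cs->mems_allowed == 0))
		return -ENOSPC;

	/* Fork-style caller: no source cpuset, nothing more to compute. */
	if (!oldcs)
		return 0;

	assert(psetsched != NULL);

	attach_cpus_updated = cs->effective_cpus != oldcs->effective_cpus;
	attach_mems_updated = cs->effective_mems != oldcs->effective_mems;

	/* In v2 the setsched check is skipped when nothing changes. */
	*psetsched = !m->v2 || attach_cpus_updated || attach_mems_updated;

	/* A v1 source cpuset drained by hotplug lets tasks leave unchecked. */
	if (!m->v2_mode && oldcs->effective_cpus == 0)
		*psetsched = false;

	return 0;
}
```

Because the flags are computed once here, a later attach step in the same model can read attach_cpus_updated and attach_mems_updated instead of comparing the masks again.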
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 70 +++++++++++++++++++++++++-----------------
1 file changed, 42 insertions(+), 28 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index d720bcc7ef83..4457c4f11fce 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2964,19 +2964,56 @@ static int update_prstate(struct cpuset *cs, int new_prs)
return 0;
}
+/*
+ * cpuset_can_attach() and cpuset_attach() specific internal data
+ * Protected by cpuset_mutex
+ */
static struct cpuset *cpuset_attach_old_cs;
+static bool attach_cpus_updated;
+static bool attach_mems_updated;
/*
* Check to see if a cpuset can accept a new task
* For v1, cpus_allowed and mems_allowed can't be empty.
* For v2, effective_cpus can't be empty.
* Note that in v1, effective_cpus = cpus_allowed.
+ *
+ * Also set the boolean flag passed in by @psetsched depending on if
+ * security_task_setscheduler() call is needed and @oldcs is not NULL.
*/
-static int cpuset_can_attach_check(struct cpuset *cs)
+static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
+ bool *psetsched)
{
if (cpumask_empty(cs->effective_cpus) ||
(!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
return -ENOSPC;
+
+ if (!oldcs)
+ return 0;
+
+ /*
+ * Update attach specific data
+ */
+ attach_cpus_updated = !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus);
+ attach_mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
+
+ /*
+ * Skip rights over task setsched check in v2 when nothing changes,
+ * migration permission derives from hierarchy ownership in
+ * cgroup_procs_write_permission()).
+ */
+ *psetsched = !cpuset_v2() || attach_cpus_updated || attach_mems_updated;
+
+ /*
+ * A v1 cpuset with tasks will have no CPU left only when CPU hotplug
+ * brings the last online CPU offline as users are not allowed to empty
+ * cpuset.cpus when there are active tasks inside. When that happens,
+ * we should allow tasks to migrate out without security check to make
+ * sure they will be able to run after migration.
+ */
+ if (!is_in_v2_mode() && cpumask_empty(oldcs->effective_cpus))
+ *psetsched = false;
+
return 0;
}
@@ -3023,29 +3060,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
mutex_lock(&cpuset_mutex);
/* Check to see if task is allowed in the cpuset */
- ret = cpuset_can_attach_check(cs);
+ ret = cpuset_can_attach_check(cs, oldcs, &setsched_check);
if (ret)
goto out_unlock;
- /*
- * Skip rights over task setsched check in v2 when nothing changes,
- * migration permission derives from hierarchy ownership in
- * cgroup_procs_write_permission()).
- */
- setsched_check = !cpuset_v2() ||
- !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus) ||
- !nodes_equal(cs->effective_mems, oldcs->effective_mems);
-
- /*
- * A v1 cpuset with tasks will have no CPU left only when CPU hotplug
- * brings the last online CPU offline as users are not allowed to empty
- * cpuset.cpus when there are active tasks inside. When that happens,
- * we should allow tasks to migrate out without security check to make
- * sure they will be able to run after migration.
- */
- if (!is_in_v2_mode() && cpumask_empty(oldcs->effective_cpus))
- setsched_check = false;
-
cgroup_taskset_for_each(task, css, tset) {
ret = task_can_attach(task);
if (ret)
@@ -3140,7 +3158,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
struct cgroup_subsys_state *css;
struct cpuset *cs;
struct cpuset *oldcs = cpuset_attach_old_cs;
- bool cpus_updated, mems_updated;
bool queue_task_work = false;
cgroup_taskset_first(tset, &css);
@@ -3148,9 +3165,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */
mutex_lock(&cpuset_mutex);
- cpus_updated = !cpumask_equal(cs->effective_cpus,
- oldcs->effective_cpus);
- mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
/*
* In the default hierarchy, enabling cpuset in the child cgroups
@@ -3158,7 +3172,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
* in effective cpus and mems. In that case, we can optimize out
* by skipping the task iteration and update.
*/
- if (cpuset_v2() && !cpus_updated && !mems_updated) {
+ if (cpuset_v2() && !attach_cpus_updated && !attach_mems_updated) {
cpuset_attach_nodemask_to = cs->effective_mems;
goto out;
}
@@ -3175,7 +3189,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
* not set.
*/
cpuset_attach_nodemask_to = cs->effective_mems;
- if (!is_memory_migrate(cs) && !mems_updated)
+ if (!is_memory_migrate(cs) && !attach_mems_updated)
goto out;
cgroup_taskset_for_each_leader(leader, css, tset) {
@@ -3590,7 +3604,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
mutex_lock(&cpuset_mutex);
/* Check to see if task is allowed in the cpuset */
- ret = cpuset_can_attach_check(cs);
+ ret = cpuset_can_attach_check(cs, NULL, NULL);
if (ret)
goto out_unlock;
--
2.54.0
* [PATCH-next v3 3/5] cgroup/cpuset: Made cpuset_attach_old_cs track task group leaders
2026-05-27 15:37 [PATCH-next v3 0/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
2026-05-27 15:37 ` [PATCH-next v3 1/5] cgroup/cpuset: Add a cpuset_reserve_dl_bw() helper Waiman Long
2026-05-27 15:37 ` [PATCH-next v3 2/5] cgroup/cpuset: Expand the scope of cpuset_can_attach_check() Waiman Long
@ 2026-05-27 15:37 ` Waiman Long
2026-05-29 2:19 ` Guopeng Zhang
2026-05-27 15:37 ` [PATCH-next v3 4/5] cgroup/cpuset: Move mpol_rebind_mm/cpuset_migrate_mm() calls inside cpuset_attach_task() Waiman Long
2026-05-27 15:38 ` [PATCH-next v3 5/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
4 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2026-05-27 15:37 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin, Waiman Long, Ridong Chen
There are two possible ways that migration of tasks from multiple source
cpusets to a target cpuset can happen. Either a multithreaded application
with threads in different cpusets is wholly moved to a new cpuset,
or disabling the v2 cpuset controller moves all the tasks in child
cpusets to the parent cpuset.
In the former case, it is the mm setting of the group leader that really
matters. So cpuset_attach_old_cs should track the oldcs of the thread
leader. In the latter case, effective_mems of child cpusets must always
be a subset of the parent. So no real page migration will be necessary
no matter which child cpuset is selected as cpuset_attach_old_cs.
IOW, cpuset_attach_old_cs should be updated to match the latest task
group leader in cpuset_can_attach().
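The selection rule can be sketched with a small userspace model. The names struct task_model and model_pick_old_cs() are hypothetical, cpusets are reduced to integer ids, and the taskset is a plain array. The model starts from the cpuset of the first task, as the existing code does, and then lets every group leader it meets override that choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: its group leader and its old cpuset id. */
struct task_model {
	const struct task_model *group_leader;
	int cs_id;
};

/*
 * Return the id of the cpuset that would end up in the "old cpuset"
 * slot after walking the whole set: the first task's cpuset unless a
 * group leader is seen, in which case the latest leader wins.
 */
int model_pick_old_cs(const struct task_model *tasks, size_t n)
{
	int old_cs;
	size_t i;

	assert(tasks != NULL && n > 0);

	old_cs = tasks[0].cs_id;
	for (i = 0; i < n; i++) {
		if (tasks[i].group_leader == &tasks[i])
			old_cs = tasks[i].cs_id;
	}
	return old_cs;
}
```

When the set contains only non-leader threads, the model falls back to the first task's cpuset, which is the value the slot held before this patch.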
Suggested-by: Ridong Chen <ridong.chen@linux.dev>
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4457c4f11fce..b233a71f9b7c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2967,6 +2967,20 @@ static int update_prstate(struct cpuset *cs, int new_prs)
/*
* cpuset_can_attach() and cpuset_attach() specific internal data
* Protected by cpuset_mutex
+ *
+ * The cpuset_attach_old_cs is used mainly by cpuset_migrate_mm() tp get the
+ * old_mems_allowed value. There are two ways that many-to-one cpuset migration
+ * can happen:
+ * 1) A multithread application with threads in different cpusets is wholely
+ * moved to a new cpuset.
+ * 2) Disabling v2 cpuset controller will move all the tasks in child cpusets
+ * to the parent cpuset.
+ *
+ * In the former case, it is the mm setting of the group leader that really
+ * matters. So cpuset_attach_old_cs should track the oldcs of the thread
+ * leader. In the latter case, effective_mems of child cpusets must always
+ * be a subset of the parent. So no real page migration will be necessary no
+ * matter which child cpuset is selected as cpuset_attach_old_cs.
*/
static struct cpuset *cpuset_attach_old_cs;
static bool attach_cpus_updated;
@@ -3069,6 +3083,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
if (ret)
goto out_unlock;
+ /* Update cpuset_attach_old_cs to the latest group leader */
+ if (task == task->group_leader)
+ cpuset_attach_old_cs = task_cs(task);
+
if (setsched_check) {
ret = security_task_setscheduler(task);
if (ret)
--
2.54.0
* Re: [PATCH-next v3 3/5] cgroup/cpuset: Made cpuset_attach_old_cs track task group leaders
2026-05-27 15:37 ` [PATCH-next v3 3/5] cgroup/cpuset: Made cpuset_attach_old_cs track task group leaders Waiman Long
@ 2026-05-29 2:19 ` Guopeng Zhang
2026-05-29 16:54 ` Waiman Long
0 siblings, 1 reply; 10+ messages in thread
From: Guopeng Zhang @ 2026-05-29 2:19 UTC (permalink / raw)
To: Waiman Long, Chen Ridong, Tejun Heo, Johannes Weiner,
Michal Koutný, Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin, Ridong Chen
On 2026/5/27 23:37, Waiman Long wrote:
> There are two possible ways that migration of tasks from multiple source
> cpusets to a target cpuset can happen. Either a multithread application
> with threads in different cpusets is wholely moved to a new cpuset
> or disabling of v2 cpuset controller will move all the tasks in child
> cpusets to the parent cpuset.
>
> In the former case, t is the mm setting of the group leader that really
> matters. So cpuset_attach_old_cs should track the oldcs of the thread
> leader. In the latter case, effective_mems of child cpusets must always
> be a subset of the parent. So no real page migration will be necessary
> no matter which child cpuset is selected as cpuset_attach_old_cs.
>
> IOW, cpuset_attach_old_cs should be updated to match the latest task
> group leader in cpuset_can_attach().
>
> Suggested-by: Ridong Chen <ridong.chen@linux.dev>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 4457c4f11fce..b233a71f9b7c 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2967,6 +2967,20 @@ static int update_prstate(struct cpuset *cs, int new_prs)
> /*
> * cpuset_can_attach() and cpuset_attach() specific internal data
> * Protected by cpuset_mutex
> + *
> + * The cpuset_attach_old_cs is used mainly by cpuset_migrate_mm() tp get the
> + * old_mems_allowed value. There are two ways that many-to-one cpuset migration
> + * can happen:
Hi Waiman,
I applied this series locally and ran some of my test cases. I didn't
observe any issue so far.
While doing a static/checkpatch pass, I noticed a few minor issues in
patches 3, 4 and 5. They are all non-functional nits.
For this patch, I only noticed a couple of small wording/typo nits in
the new comment:
s/tp get/to get/
Best,
Guopeng
> + * 1) A multithread application with threads in different cpusets is wholely
> + * moved to a new cpuset.
> + * 2) Disabling v2 cpuset controller will move all the tasks in child cpusets
> + * to the parent cpuset.
> + *
> + * In the former case, it is the mm setting of the group leader that really
> + * matters. So cpuset_attach_old_cs should track the oldcs of the thread
> + * leader. In the latter case, effective_mems of child cpusets must always
> + * be a subset of the parent. So no real page migration will be necessary no
> + * matter which child cpuset is selected as cpuset_attach_old_cs.
> */
> static struct cpuset *cpuset_attach_old_cs;
> static bool attach_cpus_updated;
> @@ -3069,6 +3083,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
> if (ret)
> goto out_unlock;
>
> + /* Update cpuset_attach_old_cs to the latest group leader */
> + if (task == task->group_leader)
> + cpuset_attach_old_cs = task_cs(task);
> +
> if (setsched_check) {
> ret = security_task_setscheduler(task);
> if (ret)
* Re: [PATCH-next v3 3/5] cgroup/cpuset: Made cpuset_attach_old_cs track task group leaders
2026-05-29 2:19 ` Guopeng Zhang
@ 2026-05-29 16:54 ` Waiman Long
0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2026-05-29 16:54 UTC (permalink / raw)
To: Guopeng Zhang, Chen Ridong, Tejun Heo, Johannes Weiner,
Michal Koutný, Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin, Ridong Chen
On 5/28/26 10:19 PM, Guopeng Zhang wrote:
>
> On 2026/5/27 23:37, Waiman Long wrote:
>> There are two possible ways that migration of tasks from multiple source
>> cpusets to a target cpuset can happen. Either a multithread application
>> with threads in different cpusets is wholely moved to a new cpuset
>> or disabling of v2 cpuset controller will move all the tasks in child
>> cpusets to the parent cpuset.
>>
>> In the former case, t is the mm setting of the group leader that really
>> matters. So cpuset_attach_old_cs should track the oldcs of the thread
>> leader. In the latter case, effective_mems of child cpusets must always
>> be a subset of the parent. So no real page migration will be necessary
>> no matter which child cpuset is selected as cpuset_attach_old_cs.
>>
>> IOW, cpuset_attach_old_cs should be updated to match the latest task
>> group leader in cpuset_can_attach().
>>
>> Suggested-by: Ridong Chen <ridong.chen@linux.dev>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> kernel/cgroup/cpuset.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 4457c4f11fce..b233a71f9b7c 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -2967,6 +2967,20 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>> /*
>> * cpuset_can_attach() and cpuset_attach() specific internal data
>> * Protected by cpuset_mutex
>> + *
>> + * The cpuset_attach_old_cs is used mainly by cpuset_migrate_mm() tp get the
>> + * old_mems_allowed value. There are two ways that many-to-one cpuset migration
>> + * can happen:
> Hi Waiman,
>
> I applied this series locally and ran some of my test cases. I didn't
> observe any issue so far.
>
> While doing a static/checkpatch pass, I noticed a few minor issues in
> patches 3, 4 and 5. They are all non-functional nits.
>
> For this patch, I only noticed a couple of small wording/typo nits in
> the new comment:
>
> s/tp get/to get/
Thanks for the review, will fix the typo in the next version.
Cheers,
Longman
* [PATCH-next v3 4/5] cgroup/cpuset: Move mpol_rebind_mm/cpuset_migrate_mm() calls inside cpuset_attach_task()
2026-05-27 15:37 [PATCH-next v3 0/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
` (2 preceding siblings ...)
2026-05-27 15:37 ` [PATCH-next v3 3/5] cgroup/cpuset: Made cpuset_attach_old_cs track task group leaders Waiman Long
@ 2026-05-27 15:37 ` Waiman Long
2026-05-29 2:21 ` Guopeng Zhang
2026-05-27 15:38 ` [PATCH-next v3 5/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
4 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2026-05-27 15:37 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin, Waiman Long
The cpuset_attach_task() was introduced in commit 42a11bf5c543
("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
to enable the CLONE_INTO_CGROUP flag of clone(2) to behave more like
moving a task from one cpuset into another one. That commit didn't
move the mpol_rebind_mm() and cpuset_migrate_mm() calls for group leader
into cpuset_attach_task().
When the CLONE_INTO_CGROUP flag is used without CLONE_THREAD, the new
task is its own group leader. So it is still not equivalent to moving
a task between cpusets in this case. Make CLONE_INTO_CGROUP behave
more like cpuset_attach() by moving the mpol_rebind_mm() and
cpuset_migrate_mm() calls inside cpuset_attach_task(). As a result,
cpuset_attach_old_cs, attach_cpus_updated and attach_mems_updated will
also need to be updated in cpuset_fork().
Besides, the original code uses cpuset_attach_nodemask_to both for
the nodemask returned by guarantee_online_mems(), which is used only by
cpuset_change_task_nodemask(), and for cs->effective_mems in all other
cases. Such dual use becomes impractical once the two task iteration
loops are merged into one. So keep cpuset_attach_nodemask_to for the
nodemask returned by guarantee_online_mems() and reference
cs->effective_mems directly in all the other cases.
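Why the two node masks need separate storage once the loops are merged can be seen in a small userspace model. It is a sketch under assumptions: struct task_model, struct attach_ctx and model_attach_task() are hypothetical names, masks are plain bitmasks, and only the memory side of the per-task work is modeled (the CPU affinity update, the NULL-mm case and the optional page migration are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task and the masks that get rewritten. */
struct task_model {
	bool is_leader;          /* models task == task->group_leader */
	unsigned long task_mems; /* per-task nodemask */
	unsigned long mm_mems;   /* mm-wide binding, only touched via leaders */
};

/* Hypothetical per-attach context shared by every task in the loop. */
struct attach_ctx {
	bool v2;                      /* models cpuset_v2() */
	bool mems_updated;            /* models attach_mems_updated */
	unsigned long online_mems;    /* models guarantee_online_mems() result */
	unsigned long effective_mems; /* models cs->effective_mems */
};

/* One pass per task: thread-level update first, leader-only mm step last. */
void model_attach_task(const struct attach_ctx *ctx, struct task_model *t)
{
	assert(ctx != NULL && t != NULL);

	/* Nothing to do for memory in v2 when effective_mems is unchanged. */
	if (ctx->v2 && !ctx->mems_updated)
		return;

	/* models cpuset_change_task_nodemask(): uses the online mask */
	t->task_mems = ctx->online_mems;

	if (!t->is_leader)
		return;

	/* models mpol_rebind_mm(): uses effective_mems, not the online mask */
	t->mm_mems = ctx->effective_mems;
}
```

Both masks are read inside the same call, so a single shared variable that is reassigned between two loops, as in the old code, no longer works.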
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 90 ++++++++++++++++++++++--------------------
1 file changed, 47 insertions(+), 43 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b233a71f9b7c..7100575927f6 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3149,9 +3149,12 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
*/
static cpumask_var_t cpus_attach;
static nodemask_t cpuset_attach_nodemask_to;
+static bool queue_task_work;
static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
{
+ struct mm_struct *mm;
+
lockdep_assert_cpuset_lock_held();
if (cs != &top_cpuset)
@@ -3165,24 +3168,56 @@ static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
*/
WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
+ if (cpuset_v2() && !attach_mems_updated)
+ return;
+
cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
cpuset1_update_task_spread_flags(cs, task);
+
+ if (task != task->group_leader)
+ return;
+
+ /*
+ * Change mm for threadgroup leader. This is expensive and may
+ * sleep and should be moved outside migration path proper.
+ */
+ mm = get_task_mm(task);
+ if (mm) {
+ struct cpuset *oldcs = cpuset_attach_old_cs;
+
+ mpol_rebind_mm(mm, &cs->effective_mems);
+
+ /*
+ * old_mems_allowed is the same with mems_allowed
+ * here, except if this task is being moved
+ * automatically due to hotplug. In that case
+ * @mems_allowed has been updated and is empty, so
+ * @old_mems_allowed is the right nodesets that we
+ * migrate mm from.
+ */
+ if (is_memory_migrate(cs)) {
+ cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
+ &cs->effective_mems);
+ queue_task_work = true;
+ } else {
+ mmput(mm);
+ }
+ }
}
static void cpuset_attach(struct cgroup_taskset *tset)
{
struct task_struct *task;
- struct task_struct *leader;
struct cgroup_subsys_state *css;
struct cpuset *cs;
struct cpuset *oldcs = cpuset_attach_old_cs;
- bool queue_task_work = false;
cgroup_taskset_first(tset, &css);
cs = css_cs(css);
lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */
mutex_lock(&cpuset_mutex);
+ queue_task_work = false;
/*
* In the default hierarchy, enabling cpuset in the child cgroups
@@ -3190,53 +3225,18 @@ static void cpuset_attach(struct cgroup_taskset *tset)
* in effective cpus and mems. In that case, we can optimize out
* by skipping the task iteration and update.
*/
- if (cpuset_v2() && !attach_cpus_updated && !attach_mems_updated) {
- cpuset_attach_nodemask_to = cs->effective_mems;
+ if (cpuset_v2() && !attach_cpus_updated && !attach_mems_updated)
goto out;
- }
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
cgroup_taskset_for_each(task, css, tset)
cpuset_attach_task(cs, task);
- /*
- * Change mm for all threadgroup leaders. This is expensive and may
- * sleep and should be moved outside migration path proper. Skip it
- * if there is no change in effective_mems and CS_MEMORY_MIGRATE is
- * not set.
- */
- cpuset_attach_nodemask_to = cs->effective_mems;
- if (!is_memory_migrate(cs) && !attach_mems_updated)
- goto out;
-
- cgroup_taskset_for_each_leader(leader, css, tset) {
- struct mm_struct *mm = get_task_mm(leader);
-
- if (mm) {
- mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
-
- /*
- * old_mems_allowed is the same with mems_allowed
- * here, except if this task is being moved
- * automatically due to hotplug. In that case
- * @mems_allowed has been updated and is empty, so
- * @old_mems_allowed is the right nodesets that we
- * migrate mm from.
- */
- if (is_memory_migrate(cs)) {
- cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
- &cpuset_attach_nodemask_to);
- queue_task_work = true;
- } else
- mmput(mm);
- }
- }
-
out:
if (queue_task_work)
schedule_flush_migrate_mm();
- cs->old_mems_allowed = cpuset_attach_nodemask_to;
+ cs->old_mems_allowed = cs->effective_mems;
if (cs->nr_migrate_dl_tasks) {
cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
@@ -3666,15 +3666,14 @@ static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset)
*/
static void cpuset_fork(struct task_struct *task)
{
- struct cpuset *cs;
- bool same_cs;
+ struct cpuset *cs, *oldcs;
rcu_read_lock();
cs = task_cs(task);
- same_cs = (cs == task_cs(current));
+ oldcs = task_cs(current);
rcu_read_unlock();
- if (same_cs) {
+ if (cs == oldcs) {
if (cs == &top_cpuset)
return;
@@ -3686,7 +3685,12 @@ static void cpuset_fork(struct task_struct *task)
/* CLONE_INTO_CGROUP */
mutex_lock(&cpuset_mutex);
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+ /* Assume CPUs and memory nodes are updated */
+ attach_cpus_updated = attach_mems_updated = true;
+ cpuset_attach_old_cs = oldcs;
+ oldcs->old_mems_allowed = oldcs->effective_mems;
cpuset_attach_task(cs, task);
+ attach_cpus_updated = attach_mems_updated = false;
dec_attach_in_progress_locked(cs);
mutex_unlock(&cpuset_mutex);
--
2.54.0
* Re: [PATCH-next v3 4/5] cgroup/cpuset: Move mpol_rebind_mm/cpuset_migrate_mm() calls inside cpuset_attach_task()
2026-05-27 15:37 ` [PATCH-next v3 4/5] cgroup/cpuset: Move mpol_rebind_mm/cpuset_migrate_mm() calls inside cpuset_attach_task() Waiman Long
@ 2026-05-29 2:21 ` Guopeng Zhang
0 siblings, 0 replies; 10+ messages in thread
From: Guopeng Zhang @ 2026-05-29 2:21 UTC (permalink / raw)
To: Waiman Long, Chen Ridong, Tejun Heo, Johannes Weiner,
Michal Koutný, Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin
On 2026/5/27 23:37, Waiman Long wrote:
> The cpuset_attach_task() was introduced in commit 42a11bf5c543
> ("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
> to enable the CLONE_INTO_CGROUP flag of clone(2) to behave more like
> moving a task from one cpuset into another one. That commits didn't
> move the mpol_rebind_mm() and cpuset_migrate_mm() calls for group leader
> into cpuset_attach_task().
>
> When the CLONE_INTO_CGROUP flag is used without CLONE_THREAD, the new
> task is its own group leader. So it is still not equivalent to moving
> task between cpusets in this case. Make CLONE_INTO_CGROUP behaves
> more close to cpuset_attach() by moving the mpol_rebind_mm() and
> cpuset_migrate_mm() calls inside cpuset_attach_task(). As a result,
> cpuset_attach_old_cs, attach_cpus_updated and attach_mems_updated will
> also need to be updated in cpuset_fork().
>
> Besides, the original code use cpuset_attach_nodemask_to for
> both nodemask returned by guarantee_online_mems() used only by
> cpuset_change_task_nodemask() and cs->effective_mems in all other cases.
> Such dual use is now impractical by merging the two task iteration loops
> into one. So keep cpuset_attach_nodemask_to for the nodemask returned
> by guarantee_online_mems() and reference cs->effective_mems directly
> in all the other cases.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 90 ++++++++++++++++++++++--------------------
> 1 file changed, 47 insertions(+), 43 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index b233a71f9b7c..7100575927f6 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3149,9 +3149,12 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
> */
> static cpumask_var_t cpus_attach;
> static nodemask_t cpuset_attach_nodemask_to;
> +static bool queue_task_work;
...
> @@ -3686,7 +3685,12 @@ static void cpuset_fork(struct task_struct *task)
> /* CLONE_INTO_CGROUP */
> mutex_lock(&cpuset_mutex);
> guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
> + /* Assume CPUs and memory nodes are updated */
> + attach_cpus_updated = attach_mems_updated = true;
> + cpuset_attach_old_cs = oldcs;
> + oldcs->old_mems_allowed = oldcs->effective_mems;
> cpuset_attach_task(cs, task);
> + attach_cpus_updated = attach_mems_updated = false;
>
> dec_attach_in_progress_locked(cs);
> mutex_unlock(&cpuset_mutex);
Just a minor nit while running checkpatch --strict on this patch:
checkpatch reports:
CHECK: multiple assignments should be avoided
Perhaps the multiple assignments can be split to keep the patch
checkpatch-clean?
attach_cpus_updated = true;
attach_mems_updated = true;
and later:
attach_cpus_updated = false;
attach_mems_updated = false;
Just a style nit.
Best,
Guopeng
* [PATCH-next v3 5/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach()
2026-05-27 15:37 [PATCH-next v3 0/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
` (3 preceding siblings ...)
2026-05-27 15:37 ` [PATCH-next v3 4/5] cgroup/cpuset: Move mpol_rebind_mm/cpuset_migrate_mm() calls inside cpuset_attach_task() Waiman Long
@ 2026-05-27 15:38 ` Waiman Long
2026-05-29 2:26 ` Guopeng Zhang
4 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2026-05-27 15:38 UTC (permalink / raw)
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin, Waiman Long
With cgroup v2, the cgroup_taskset structure passed into the cgroup
can_attach() and attach() methods can contain task migration data with
multiple destination or source cpusets when the cpuset controller is
enabled or disabled respectively.
Since cpuset is threaded in both v1 and v2, another possible way to
cause many-to-one migration is to move a whole process with multiple
threads in different cpuset-enabled threaded cgroups into another
cpuset-enabled cgroup.
The current cpuset_can_attach() and cpuset_attach() functions still
expect task migration to be from one source cpuset to one destination
cpuset. This has been the case since cpuset was enabled for cgroup v2
in commit 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default
hierarchy").
This problem is less of an issue when enabling the cpuset controller, as
all the newly created child cpusets will have exactly the same set of
CPUs and memory nodes as their parent, except when deadline tasks are
involved in the migration, as the deadline task accounting data can be off.
It can be more problematic when the cpuset controller is disabled, as
the child cpusets' CPUs and memory nodes may differ from their parent's,
or when a multi-threaded process is moved from different threaded cgroups.
Fix that by tracking the set of source (old) and destination cpusets
in singly linked lists and iterating them all to properly update the
internal data. Also keep the current cs and oldcs variables up-to-date
with the css and task iterators.
To ensure proper DL task accounting, nr_migrate_dl_tasks in the source
and destination cpusets is decremented and incremented respectively, and
its value is added to nr_deadline_tasks when the migration is successful.
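The bookkeeping described above can be sketched in userspace. This is only an illustration and an assumption about the approach, not the patch's code: the kernel's llist API is replaced by a hand-rolled singly linked list, and struct cs_model and the model_* functions are hypothetical names. The sketch also assumes that no cpuset is both a source and a destination in the same migration, since each cpuset has a single list link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-cpuset attach and DL accounting state. */
struct cs_model {
	int nr_deadline_tasks;
	int nr_migrate_dl_tasks; /* negative on sources, positive on destinations */
	bool in_list;            /* models the "already in llist" flag */
	struct cs_model *next;   /* models the list node */
};

/* Push a cpuset on a list once; later calls for the same cpuset are no-ops. */
static void model_track(struct cs_model **head, struct cs_model *cs)
{
	if (cs->in_list)
		return;
	cs->in_list = true;
	cs->next = *head;
	*head = cs;
}

/* Record one DL task moving from src to dst during the can_attach phase. */
void model_note_dl_move(struct cs_model **src_head, struct cs_model **dst_head,
			struct cs_model *src, struct cs_model *dst)
{
	assert(src != NULL && dst != NULL && src != dst);

	model_track(src_head, src);
	model_track(dst_head, dst);
	src->nr_migrate_dl_tasks--;
	dst->nr_migrate_dl_tasks++;
}

/* Count the entries on a list (used only to inspect the model). */
size_t model_list_len(const struct cs_model *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/* On success, fold the pending deltas into the totals and empty the list. */
void model_commit(struct cs_model **head)
{
	struct cs_model *cs = *head;

	while (cs) {
		struct cs_model *next = cs->next;

		cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
		cs->nr_migrate_dl_tasks = 0;
		cs->in_list = false;
		cs->next = NULL;
		cs = next;
	}
	*head = NULL;
}
```

The in_list flag is what keeps a cpuset that contributes several tasks from being linked more than once, so the commit step visits each cpuset exactly one time.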
Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset-internal.h | 6 +
kernel/cgroup/cpuset.c | 206 +++++++++++++++++++++++---------
2 files changed, 157 insertions(+), 55 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index f7aaf01f7cd5..4c2772a7fd5e 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -161,6 +161,12 @@ struct cpuset {
*/
bool remote_partition;
+ /*
+ * cpuset_can_attach() and cpuset_attach() specific data
+ */
+ bool attach_node_in_llist;
+ struct llist_node attach_node;
+
/*
* number of SCHED_DEADLINE tasks attached to this cpuset, so that we
* know when to rebuild associated root domain bandwidth information.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7100575927f6..98ee001ef950 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -37,6 +37,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <linux/task_work.h>
+#include <linux/llist.h>
DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
@@ -2983,6 +2984,8 @@ static int update_prstate(struct cpuset *cs, int new_prs)
* matter which child cpuset is selected as cpuset_attach_old_cs.
*/
static struct cpuset *cpuset_attach_old_cs;
+static LLIST_HEAD(src_cs_head);
+static LLIST_HEAD(dst_cs_head);
static bool attach_cpus_updated;
static bool attach_mems_updated;
@@ -2995,9 +2998,10 @@ static bool attach_mems_updated;
* Also set the boolean flag passed in by @psetsched depending on if
* security_task_setscheduler() call is needed and @oldcs is not NULL.
*/
-static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
- bool *psetsched)
+static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs, bool *psetsched)
{
+ bool cpu_match, mem_match;
+
if (cpumask_empty(cs->effective_cpus) ||
(!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
return -ENOSPC;
@@ -3008,15 +3012,34 @@ static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
/*
* Update attach specific data
*/
- attach_cpus_updated = !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus);
- attach_mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
+ if (!cs->attach_node_in_llist) {
+ llist_add(&cs->attach_node, &dst_cs_head);
+ cs->attach_node_in_llist = true;
+ }
+ if (!oldcs->attach_node_in_llist) {
+ llist_add(&oldcs->attach_node, &src_cs_head);
+ oldcs->attach_node_in_llist = true;
+ }
+
+ cpu_match = cpumask_equal(cs->effective_cpus, oldcs->effective_cpus);
+ mem_match = nodes_equal(cs->effective_mems, oldcs->effective_mems);
+
+ /*
+ * Set the updated flags whenever there is a mismatch in any of the
+ * src/dst pairs.
+ */
+ if (!attach_cpus_updated)
+ attach_cpus_updated = !cpu_match;
+
+ if (!attach_mems_updated)
+ attach_mems_updated = !mem_match;
/*
* Skip rights over task setsched check in v2 when nothing changes,
* migration permission derives from hierarchy ownership in
* cgroup_procs_write_permission()).
*/
- *psetsched = !cpuset_v2() || attach_cpus_updated || attach_mems_updated;
+ *psetsched = !cpuset_v2() || !cpu_match || !mem_match;
/*
* A v1 cpuset with tasks will have no CPU left only when CPU hotplug
@@ -3031,33 +3054,103 @@ static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
return 0;
}
-static int cpuset_reserve_dl_bw(struct cpuset *cs)
+/*
+ * If reset_dl_bw is set, reset the previous dl_bw_alloc() call. Otherwise,
+ * update nr_deadline_tasks according to nr_migrate_dl_tasks in both source
+ * and destination cpusets.
+ */
+static void clear_attach_data(bool reset_dl_bw)
+{
+ struct cpuset *cs, *next;
+
+ llist_for_each_entry_safe(cs, next, src_cs_head.first, attach_node) {
+ cs->attach_node.next = NULL;
+ cs->attach_node_in_llist = false;
+ if (cs->nr_migrate_dl_tasks && !reset_dl_bw)
+ cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+ cs->nr_migrate_dl_tasks = 0;
+ }
+
+ llist_for_each_entry_safe(cs, next, dst_cs_head.first, attach_node) {
+ cs->attach_node.next = NULL;
+ cs->attach_node_in_llist = false;
+ if (reset_dl_bw && cs->dl_bw_cpu >= 0)
+ dl_bw_free(cs->dl_bw_cpu, cs->sum_migrate_dl_bw);
+ if (cs->nr_migrate_dl_tasks && !reset_dl_bw)
+ cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+ cs->nr_migrate_dl_tasks = 0;
+ cs->sum_migrate_dl_bw = 0;
+ cs->dl_bw_cpu = -1;
+ }
+
+ src_cs_head.first = NULL;
+ dst_cs_head.first = NULL;
+ attach_cpus_updated = false;
+ attach_mems_updated = false;
+}
+
+static int cpuset_reserve_dl_bw(void)
{
+ struct cpuset *cs;
int cpu, ret;
- if (!cs->sum_migrate_dl_bw)
- return 0;
+ llist_for_each_entry(cs, dst_cs_head.first, attach_node) {
+ if (!cs->sum_migrate_dl_bw)
+ continue;
- cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
- if (unlikely(cpu >= nr_cpu_ids))
- return -EINVAL;
+ cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+ if (unlikely(cpu >= nr_cpu_ids))
+ return -EINVAL;
- ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
- if (ret)
- return ret;
+ ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+ if (ret)
+ return ret;
- cs->dl_bw_cpu = cpu;
+ cs->dl_bw_cpu = cpu;
+ }
return 0;
}
-static void reset_migrate_dl_data(struct cpuset *cs)
+static void set_attach_in_progress(void)
{
- cs->nr_migrate_dl_tasks = 0;
- cs->sum_migrate_dl_bw = 0;
- cs->dl_bw_cpu = -1;
+ struct cpuset *cs;
+
+ /*
+ * Mark attach is in progress. This makes validate_change() fail
+ * changes which zero cpus/mems_allowed.
+ */
+ llist_for_each_entry(cs, dst_cs_head.first, attach_node)
+ cs->attach_in_progress++;
+}
+
+static void reset_attach_in_progress(void)
+{
+ struct cpuset *cs;
+
+ llist_for_each_entry(cs, dst_cs_head.first, attach_node)
+ dec_attach_in_progress_locked(cs);
}
-/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
+/*
+ * Called by cgroups to determine if a cpuset is usable; cpuset_mutex held.
+ *
+ * With cgroup v2, enabling of cpuset controller in a cgroup subtree can
+ * cause @tset to contain task migration data from one parent cpuset to multiple
+ * child cpusets. Not much is needed to be done here other than tracking the
+ * number of DL tasks in each cpuset as the CPUs and memory nodes of the child
+ * cpusets are exactly the same as the parent.
+ *
+ * Conversely, disabling of cpuset controller can cause @tset to contain task
+ * migration data from multiple child cpusets to one parent cpuset. Here, the
+ * CPUs and memory nodes of the child cpusets may be different from the parent,
+ * but must be a subset of its parent.
+ *
+ * Another possible many-to-one migration is the moving of the whole
+ * multithreaded process with threads in different cpusets to another cpuset.
+ *
+ * For all other use cases, @tset task migration data should be from one source
+ * cpuset to one destination cpuset.
+ */
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
struct cgroup_subsys_state *css;
@@ -3079,6 +3172,16 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
goto out_unlock;
cgroup_taskset_for_each(task, css, tset) {
+ struct cpuset *newcs = css_cs(css);
+ struct cpuset *new_oldcs = task_cs(task);
+
+ if ((newcs != cs) || (new_oldcs != oldcs)) {
+ cs = newcs;
+ oldcs = new_oldcs;
+ ret = cpuset_can_attach_check(cs, oldcs, &setsched_check);
+ if (ret)
+ goto out_unlock;
+ }
ret = task_can_attach(task);
if (ret)
goto out_unlock;
@@ -3100,23 +3203,19 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
* contribute to sum_migrate_dl_bw.
*/
cs->nr_migrate_dl_tasks++;
+ oldcs->nr_migrate_dl_tasks--;
if (dl_task_needs_bw_move(task, cs->effective_cpus))
cs->sum_migrate_dl_bw += task->dl.dl_bw;
}
}
- ret = cpuset_reserve_dl_bw(cs);
+ ret = cpuset_reserve_dl_bw();
out_unlock:
- if (ret) {
- reset_migrate_dl_data(cs);
- } else {
- /*
- * Mark attach is in progress. This makes validate_change() fail
- * changes which zero cpus/mems_allowed.
- */
- cs->attach_in_progress++;
- }
+ if (ret)
+ clear_attach_data(true);
+ else
+ set_attach_in_progress();
mutex_unlock(&cpuset_mutex);
return ret;
@@ -3131,14 +3230,8 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
cs = css_cs(css);
mutex_lock(&cpuset_mutex);
- dec_attach_in_progress_locked(cs);
-
- if (cs->dl_bw_cpu >= 0)
- dl_bw_free(cs->dl_bw_cpu, cs->sum_migrate_dl_bw);
-
- if (cs->nr_migrate_dl_tasks)
- reset_migrate_dl_data(cs);
-
+ reset_attach_in_progress();
+ clear_attach_data(true);
mutex_unlock(&cpuset_mutex);
}
@@ -3210,42 +3303,45 @@ static void cpuset_attach(struct cgroup_taskset *tset)
struct task_struct *task;
struct cgroup_subsys_state *css;
struct cpuset *cs;
- struct cpuset *oldcs = cpuset_attach_old_cs;
cgroup_taskset_first(tset, &css);
cs = css_cs(css);
-
lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */
mutex_lock(&cpuset_mutex);
queue_task_work = false;
/*
* In the default hierarchy, enabling cpuset in the child cgroups
- * will trigger a number of cpuset_attach() calls with no change
- * in effective cpus and mems. In that case, we can optimize out
- * by skipping the task iteration and update.
+ * will trigger a cpuset_attach() call with no change in effective cpus
+ * and mems. In that case, we can optimize out by skipping the task
+ * iteration and update, but the destination cpuset list is iterated to
+ * set old_mems_sllowed.
*/
- if (cpuset_v2() && !attach_cpus_updated && !attach_mems_updated)
+ if (cpuset_v2() && !attach_cpus_updated && !attach_mems_updated) {
+ llist_for_each_entry(cs, dst_cs_head.first, attach_node)
+ cs->old_mems_allowed = cs->effective_mems;
goto out;
+ }
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
- cgroup_taskset_for_each(task, css, tset)
+ cgroup_taskset_for_each(task, css, tset) {
+ struct cpuset *newcs = css_cs(css);
+
+ if (newcs != cs) {
+ cs->old_mems_allowed = cs->effective_mems;
+ cs = newcs;
+ guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+ }
cpuset_attach_task(cs, task);
+ }
-out:
if (queue_task_work)
schedule_flush_migrate_mm();
cs->old_mems_allowed = cs->effective_mems;
-
- if (cs->nr_migrate_dl_tasks) {
- cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
- oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
- reset_migrate_dl_data(cs);
- }
-
- dec_attach_in_progress_locked(cs);
-
+out:
+ reset_attach_in_progress();
+ clear_attach_data(false);
mutex_unlock(&cpuset_mutex);
}
--
2.54.0
* Re: [PATCH-next v3 5/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach()
2026-05-27 15:38 ` [PATCH-next v3 5/5] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
@ 2026-05-29 2:26 ` Guopeng Zhang
0 siblings, 0 replies; 10+ messages in thread
From: Guopeng Zhang @ 2026-05-29 2:26 UTC (permalink / raw)
To: Waiman Long, Chen Ridong, Tejun Heo, Johannes Weiner,
Michal Koutný, Ingo Molnar, Peter Zijlstra
Cc: cgroups, linux-kernel, Aaron Tomlin
On 2026/5/27 23:38, Waiman Long wrote:
> With cgroup v2, the cgroup_taskset structure passed into the cgroup
> can_attach() and attach() methods can contain task migration data with
> multiple destination or source cpusets when the cpuset controller is
> enabled or disabled respectively.
...
> -/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
> +/*
> + * Called by cgroups to determine if a cpuset is usable; cpuset_mutex held.
> + *
> + * With cgroup v2, enabling of cpuset controller in a cgroup subtree can
> + * cause @tset to contain task migration data from one parent cpuset to multiple
> + * child cpusets. Not much is needed to be done here other than tracking the
> + * number of DL tasks in each cpuset as the CPUs and memory nodes of the child
> + * cpusets are exactly the same as the parent.
> + *
> + * Conversely, disabling of cpuset controller can cause @tset to contain task
> + * migration data from multiple child cpusets to one parent cpuset. Here, the
> + * CPUs and memory nodes of the child cpusets may be different from the parent,
> + * but must be a subset of its parent.
> + *
> + * Another possible many-to-one migration is the moving of the whole
> + * multithreaded process with threads in different cpusets to another cpuset.
> + *
> + * For all other use cases, @tset task migration data should be from one source
> + * cpuset to one destination cpuset.
> + */
> static int cpuset_can_attach(struct cgroup_taskset *tset)
> {
> struct cgroup_subsys_state *css;
> @@ -3079,6 +3172,16 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
> goto out_unlock;
>
> cgroup_taskset_for_each(task, css, tset) {
> + struct cpuset *newcs = css_cs(css);
> + struct cpuset *new_oldcs = task_cs(task);
> +
> + if ((newcs != cs) || (new_oldcs != oldcs)) {
> + cs = newcs;
> + oldcs = new_oldcs;
> + ret = cpuset_can_attach_check(cs, oldcs, &setsched_check);
> + if (ret)
> + goto out_unlock;
> + }
Just a minor nit while running checkpatch --strict on this patch:
checkpatch reports unnecessary parentheses here:
if ((newcs != cs) || (new_oldcs != oldcs)) {
Perhaps this can be simplified to:
if (newcs != cs || new_oldcs != oldcs) {
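For reference, the two forms are equivalent because != binds more
tightly than || in C. A small stand-alone check, using hypothetical
helper names that are not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Form used in the patch, with the extra parentheses. */
static bool changed_with_parens(const void *newcs, const void *cs,
				const void *new_oldcs, const void *oldcs)
{
	return (newcs != cs) || (new_oldcs != oldcs);
}

/* Form suggested by checkpatch; != is evaluated before ||. */
static bool changed_without_parens(const void *newcs, const void *cs,
				   const void *new_oldcs, const void *oldcs)
{
	return newcs != cs || new_oldcs != oldcs;
}
```

Both helpers return the same result for every combination of equal and
unequal pointers, so dropping the parentheses is purely a style change.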
> ret = task_can_attach(task);
> if (ret)
...
> /*
> * In the default hierarchy, enabling cpuset in the child cgroups
> - * will trigger a number of cpuset_attach() calls with no change
> - * in effective cpus and mems. In that case, we can optimize out
> - * by skipping the task iteration and update.
> + * will trigger a cpuset_attach() call with no change in effective cpus
> + * and mems. In that case, we can optimize out by skipping the task
> + * iteration and update, but the destination cpuset list is iterated to
> + * set old_mems_sllowed.
> */
I also noticed one small typo in the added comment:
s/old_mems_sllowed/old_mems_allowed/
Best,
Guopeng