The Linux Kernel Mailing List
* [PATCH v2 0/2] cgroup/cpuset: fix DL attach bandwidth accounting
@ 2026-05-07 10:33 Guopeng Zhang
  2026-05-07 10:33 ` [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure Guopeng Zhang
  2026-05-07 10:33 ` [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask Guopeng Zhang
  0 siblings, 2 replies; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-07 10:33 UTC
  To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups, Guopeng Zhang

Hi,

cpuset_can_attach() and set_cpus_allowed_dl() must make the same
decision about whether migrating a SCHED_DEADLINE task requires moving
bandwidth accounting between root domains.

The can_attach path used the destination cpuset's effective CPU mask
for that decision. The attach path, however, applies a per-task target
mask that is constrained by task_cpu_possible_mask(), cpu_active_mask,
and the fallback walk up the cpuset hierarchy. On asymmetric CPU
systems, that per-task mask can be a strict subset of the destination
cpuset's effective mask. This can make cpuset_can_attach() skip the
destination-side bandwidth reservation while set_cpus_allowed_dl()
later still performs the source-side bandwidth subtraction.
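
Distilled to the two checks involved (a simplified sketch built from
the names visible in the patches below, not literal kernel code):

	/* can_attach side: reserve destination DL bandwidth? */
	bool reserve_dst = !cpumask_intersects(oldcs->effective_cpus,
					       cs->effective_cpus);

	/* attach side, in set_cpus_allowed_dl(): move DL bandwidth? */
	bool move_bw = !cpumask_intersects(task_rq(p)->rd->span,
					   ctx->new_mask);

When the per-task ctx->new_mask ends up narrower than
cs->effective_cpus, move_bw can be true even though reserve_dst was
false.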

There is also an internal cpuset_can_attach() failure path where
temporary DL migration state can be left behind if a later per-task
check fails before attach_in_progress is marked on the cpuset.

Patch 1 resets the temporary DL migration state on those internal
cpuset_can_attach() failure paths.

Patch 2 computes the same per-task target mask in cpuset_can_attach()
that cpuset_attach_task() later applies, and only includes DL tasks that
actually need a root-domain bandwidth move in the destination bandwidth
reservation.

The broader can_attach()/attach() transaction window is left unchanged.
This series does not attempt to rework sched_setattr() or the source
cpuset cpumask TOCTOU issues. It only aligns the reservation decision
with the attach-time bandwidth-move decision and fixes the temporary
state leak.

Guopeng Zhang (2):
  cgroup/cpuset: reset DL migration state on can_attach() failure
  cgroup/cpuset: align DL bandwidth reservation with attach target mask

 include/linux/sched/deadline.h  |   9 +++
 kernel/cgroup/cpuset-internal.h |   1 +
 kernel/cgroup/cpuset.c          | 105 ++++++++++++++++++++++----------
 kernel/sched/deadline.c         |  13 +++-
 4 files changed, 92 insertions(+), 36 deletions(-)

---
Changes since v1:
- Split the original patch into two patches.
- Reset temporary DL migration state on cpuset_can_attach() internal
  failure paths.
- Computed the same per-task attach mask in cpuset_can_attach() as
  cpuset_attach_task().
- Kept nr_migrate_dl_tasks counting all migrating DL tasks for cpuset
  task accounting, while restricting sum_migrate_dl_bw to tasks that need
  destination DL bandwidth reservation.
- Tightened Fixes tags.
- Documented the existing aggregate reservation invariant near the
  dl_bw_cpu selection.
- Removed the unnecessary RCU guard from dl_task_needs_bw_move().

v1:
  https://lore.kernel.org/all/20260421083449.95750-1-zhangguopeng@kylinos.cn

-- 
2.43.0


* [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
  2026-05-07 10:33 [PATCH v2 0/2] cgroup/cpuset: fix DL attach bandwidth accounting Guopeng Zhang
@ 2026-05-07 10:33 ` Guopeng Zhang
  2026-05-07 14:31   ` Waiman Long
  2026-05-07 10:33 ` [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask Guopeng Zhang
  1 sibling, 1 reply; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-07 10:33 UTC
  To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups, Guopeng Zhang

cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
state in the destination cpuset while walking the taskset.

If a later task_can_attach() or security_task_setscheduler() check
fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
and does not call cpuset_cancel_attach() for it. The partially
accumulated state is then left behind and can be consumed by a later
attach, corrupting cpuset DL task accounting and pending DL bandwidth
accounting.

Reset the pending DL migration state before returning from those
per-task failure paths.
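
For reference, reset_migrate_dl_data() clears all of the per-attach DL
state (a sketch reconstructed from the fields this series touches in
kernel/cgroup/cpuset.c, not a verbatim copy):

	static void reset_migrate_dl_data(struct cpuset *cs)
	{
		cs->nr_migrate_dl_tasks = 0;
		cs->sum_migrate_dl_bw = 0;
		cs->dl_bw_cpu = -1;	/* no CPU picked for dl_bw_alloc() */
	}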

Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
 kernel/cgroup/cpuset.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e3a081a07c6d..ae41736399a1 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 	cgroup_taskset_for_each(task, css, tset) {
 		ret = task_can_attach(task);
 		if (ret)
-			goto out_unlock;
+			goto out_reset_dl_data;
 
 		if (setsched_check) {
 			ret = security_task_setscheduler(task);
 			if (ret)
-				goto out_unlock;
+				goto out_reset_dl_data;
 		}
 
 		if (dl_task(task)) {
@@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 	 * changes which zero cpus/mems_allowed.
 	 */
 	cs->attach_in_progress++;
+	goto out_unlock;
+
+out_reset_dl_data:
+	reset_migrate_dl_data(cs);
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
 	return ret;
-- 
2.43.0



* [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask
  2026-05-07 10:33 [PATCH v2 0/2] cgroup/cpuset: fix DL attach bandwidth accounting Guopeng Zhang
  2026-05-07 10:33 ` [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure Guopeng Zhang
@ 2026-05-07 10:33 ` Guopeng Zhang
  2026-05-07 15:52   ` Waiman Long
  1 sibling, 1 reply; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-07 10:33 UTC
  To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups, Guopeng Zhang

cpuset_can_attach() preallocates destination SCHED_DEADLINE bandwidth
before the attach commit point, while set_cpus_allowed_dl() later
subtracts bandwidth from the source root domain when the task affinity is
actually updated.

Those two decisions must be made with the same CPU mask.
cpuset_can_attach() used the destination cpuset effective mask directly,
but cpuset_attach_task() first builds a per-task target mask which is
constrained by task_cpu_possible_mask() and, if needed, by walking up the
cpuset hierarchy. On asymmetric systems, the actual target mask can
therefore be a strict subset of cs->effective_cpus.

If the source root domain intersects cs->effective_cpus only on CPUs
outside the task's possible mask, can_attach() can skip the destination
reservation even though set_cpus_allowed_dl() later sees a real
root-domain move and subtracts from the source domain.

Extract the root-domain bandwidth-move test used by
set_cpus_allowed_dl() into dl_task_needs_bw_move(), and make
cpuset_can_attach() compute the same per-task target mask that
cpuset_attach_task() applies.

Keep nr_migrate_dl_tasks counting all migrating deadline tasks for
cpuset DL task accounting. Restrict sum_migrate_dl_bw to the subset of
tasks that need destination root-domain bandwidth reservation, because a
deadline task can move between cpusets without moving bandwidth between
root domains.
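
Concretely: when the source and destination cpusets share a root domain
(for example, two sibling non-partition cpusets), a migrating DL task
still bumps nr_migrate_dl_tasks but contributes nothing to
sum_migrate_dl_bw. Condensed, the new dl_task() branch is (a sketch of
the change in the diff below, destination-CPU selection omitted):

	if (dl_task(task)) {
		cs->nr_migrate_dl_tasks++;	/* always counted */
		cpuset_attach_task_cpus(cs, task, cpus_attach);
		if (dl_task_needs_bw_move(task, cpus_attach))
			cs->sum_migrate_dl_bw += task->dl.dl_bw;
	}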

This keeps the existing per-attach aggregate reservation model; it only
changes the per-task mask used to decide which tasks contribute to that
aggregate. The broader can_attach()/attach() transaction window is left
unchanged.

Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
 include/linux/sched/deadline.h  |  9 +++
 kernel/cgroup/cpuset-internal.h |  1 +
 kernel/cgroup/cpuset.c          | 97 ++++++++++++++++++++++-----------
 kernel/sched/deadline.c         | 13 ++++-
 4 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 1198138cb839..ddfd5216f3fc 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -33,6 +33,15 @@ struct root_domain;
 extern void dl_add_task_root_domain(struct task_struct *p);
 extern void dl_clear_root_domain(struct root_domain *rd);
 extern void dl_clear_root_domain_cpu(int cpu);
+/*
+ * Return whether moving DL task @p to @new_mask requires moving DL
+ * bandwidth accounting between root domains. This helper is specific to
+ * DL bandwidth move accounting semantics and is shared by
+ * cpuset_can_attach() and set_cpus_allowed_dl() so both paths use the
+ * same source root-domain test.
+ */
+bool dl_task_needs_bw_move(struct task_struct *p,
+			   const struct cpumask *new_mask);
 
 extern u64 dl_cookie;
 extern bool dl_bw_visited(int cpu, u64 cookie);
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index bb4e692bea30..f7aaf01f7cd5 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -167,6 +167,7 @@ struct cpuset {
 	 */
 	int nr_deadline_tasks;
 	int nr_migrate_dl_tasks;
+	/* DL bandwidth that needs destination reservation for this attach. */
 	u64 sum_migrate_dl_bw;
 	/*
 	 * CPU used for temporary DL bandwidth allocation during attach;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ae41736399a1..78c1a4071cc3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -485,6 +485,30 @@ static void guarantee_active_cpus(struct task_struct *tsk,
 	rcu_read_unlock();
 }
 
+/* Compute the effective CPU mask cpuset_attach_task() will apply to @tsk. */
+static void cpuset_attach_task_cpus(struct cpuset *cs, struct task_struct *tsk,
+				    struct cpumask *pmask)
+{
+	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
+
+	lockdep_assert_cpuset_lock_held();
+
+	if (cs == &top_cpuset) {
+		cpumask_andnot(pmask, possible_mask, subpartitions_cpus);
+		return;
+	}
+
+	if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_active_mask)))
+		cpumask_copy(pmask, cpu_active_mask);
+
+	rcu_read_lock();
+	while (!cpumask_intersects(cs->effective_cpus, pmask))
+		cs = parent_cs(cs);
+
+	cpumask_and(pmask, pmask, cs->effective_cpus);
+	rcu_read_unlock();
+}
+
 /*
  * Return in *pmask the portion of a cpusets's mems_allowed that
  * are online, with memory.  If none are online with memory, walk
@@ -2986,6 +3010,14 @@ static void reset_migrate_dl_data(struct cpuset *cs)
 	cs->dl_bw_cpu = -1;
 }
 
+/*
+ * Protected by cpuset_mutex. cpus_attach is used by the can_attach/attach
+ * paths but we can't allocate it dynamically there. Define it global and
+ * allocate from cpuset_init().
+ */
+static cpumask_var_t cpus_attach;
+static nodemask_t cpuset_attach_nodemask_to;
+
 /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
@@ -2993,7 +3025,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 	struct cpuset *cs, *oldcs;
 	struct task_struct *task;
 	bool setsched_check;
-	int ret;
+	int cpu = nr_cpu_ids, ret;
 
 	/* used later by cpuset_attach() */
 	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
@@ -3038,32 +3070,47 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		}
 
 		if (dl_task(task)) {
+			/*
+			 * Count all migrating DL tasks for cpuset task accounting.
+			 * Only tasks that need a root-domain bandwidth move
+			 * contribute to sum_migrate_dl_bw.
+			 */
 			cs->nr_migrate_dl_tasks++;
-			cs->sum_migrate_dl_bw += task->dl.dl_bw;
+			cpuset_attach_task_cpus(cs, task, cpus_attach);
+
+			if (dl_task_needs_bw_move(task, cpus_attach)) {
+				/*
+				 * Keep the existing aggregate reservation model.
+				 * Tasks in one attach enter the same destination
+				 * cpuset, so the first CPU found for a task needing
+				 * DL bandwidth reservation identifies the destination
+				 * root domain.
+				 */
+				if (cpu >= nr_cpu_ids)
+					cpu = cpumask_any_and(cpu_active_mask,
+							      cpus_attach);
+				cs->sum_migrate_dl_bw += task->dl.dl_bw;
+			}
 		}
 	}
 
-	if (!cs->nr_migrate_dl_tasks)
+	if (!cs->sum_migrate_dl_bw)
 		goto out_success;
 
-	if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
-		int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
-
-		if (unlikely(cpu >= nr_cpu_ids)) {
-			reset_migrate_dl_data(cs);
-			ret = -EINVAL;
-			goto out_unlock;
-		}
-
-		ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
-		if (ret) {
-			reset_migrate_dl_data(cs);
-			goto out_unlock;
-		}
+	if (unlikely(cpu >= nr_cpu_ids)) {
+		reset_migrate_dl_data(cs);
+		ret = -EINVAL;
+		goto out_unlock;
+	}
 
-		cs->dl_bw_cpu = cpu;
+	ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+	if (ret) {
+		reset_migrate_dl_data(cs);
+		goto out_unlock;
 	}
 
+	cs->dl_bw_cpu = cpu;
+
 out_success:
 	/*
 	 * Mark attach is in progress.  This makes validate_change() fail
@@ -3099,23 +3146,11 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
 	mutex_unlock(&cpuset_mutex);
 }
 
-/*
- * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach_task()
- * but we can't allocate it dynamically there.  Define it global and
- * allocate from cpuset_init().
- */
-static cpumask_var_t cpus_attach;
-static nodemask_t cpuset_attach_nodemask_to;
-
 static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
 {
 	lockdep_assert_cpuset_lock_held();
 
-	if (cs != &top_cpuset)
-		guarantee_active_cpus(task, cpus_attach);
-	else
-		cpumask_andnot(cpus_attach, task_cpu_possible_mask(task),
-			       subpartitions_cpus);
+	cpuset_attach_task_cpus(cs, task, cpus_attach);
 	/*
 	 * can_attach beforehand should guarantee that this doesn't
 	 * fail.  TODO: have a better way to handle failure here
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index edca7849b165..7db4c87df83b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3107,20 +3107,18 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
 static void set_cpus_allowed_dl(struct task_struct *p,
 				struct affinity_context *ctx)
 {
-	struct root_domain *src_rd;
 	struct rq *rq;
 
 	WARN_ON_ONCE(!dl_task(p));
 
 	rq = task_rq(p);
-	src_rd = rq->rd;
 	/*
 	 * Migrating a SCHED_DEADLINE task between exclusive
 	 * cpusets (different root_domains) entails a bandwidth
 	 * update. We already made space for us in the destination
 	 * domain (see cpuset_can_attach()).
 	 */
-	if (!cpumask_intersects(src_rd->span, ctx->new_mask)) {
+	if (dl_task_needs_bw_move(p, ctx->new_mask)) {
 		struct dl_bw *src_dl_b;
 
 		src_dl_b = dl_bw_of(cpu_of(rq));
@@ -3137,6 +3135,15 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 	set_cpus_allowed_common(p, ctx);
 }
 
+bool dl_task_needs_bw_move(struct task_struct *p,
+			   const struct cpumask *new_mask)
+{
+	if (!dl_task(p))
+		return false;
+
+	return !cpumask_intersects(task_rq(p)->rd->span, new_mask);
+}
+
 /* Assumes rq->lock is held */
 static void rq_online_dl(struct rq *rq)
 {
-- 
2.43.0



* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
  2026-05-07 10:33 ` [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure Guopeng Zhang
@ 2026-05-07 14:31   ` Waiman Long
  2026-05-08  2:14     ` Chen Ridong
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2026-05-07 14:31 UTC
  To: Guopeng Zhang, Tejun Heo, Michal Koutný, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups

On 5/7/26 6:33 AM, Guopeng Zhang wrote:
> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
> state in the destination cpuset while walking the taskset.
>
> If a later task_can_attach() or security_task_setscheduler() check
> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
> and does not call cpuset_cancel_attach() for it. The partially
> accumulated state is then left behind and can be consumed by a later
> attach, corrupting cpuset DL task accounting and pending DL bandwidth
> accounting.
>
> Reset the pending DL migration state before returning from those
> per-task failure paths.
>
> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> ---
>   kernel/cgroup/cpuset.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index e3a081a07c6d..ae41736399a1 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>   	cgroup_taskset_for_each(task, css, tset) {
>   		ret = task_can_attach(task);
>   		if (ret)
> -			goto out_unlock;
> +			goto out_reset_dl_data;
>   
>   		if (setsched_check) {
>   			ret = security_task_setscheduler(task);
>   			if (ret)
> -				goto out_unlock;
> +				goto out_reset_dl_data;
>   		}
>   
>   		if (dl_task(task)) {
> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>   	 * changes which zero cpus/mems_allowed.
>   	 */
>   	cs->attach_in_progress++;
> +	goto out_unlock;
> +
> +out_reset_dl_data:
> +	reset_migrate_dl_data(cs);
>   out_unlock:
>   	mutex_unlock(&cpuset_mutex);
>   	return ret;

I would prefer the likely success path be a straight line instead of 
doing a goto. IOW, move out_reset_dl_data below return. Other than that, 
this patch looks good to me.
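
The straight-line layout I have in mind is something like (untested
sketch):

```
	cs->attach_in_progress++;
out_unlock:
	mutex_unlock(&cpuset_mutex);
	return ret;

out_reset_dl_data:
	reset_migrate_dl_data(cs);
	goto out_unlock;
}
```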

Cheers,
Longman



* Re: [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask
  2026-05-07 10:33 ` [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask Guopeng Zhang
@ 2026-05-07 15:52   ` Waiman Long
  2026-05-08 13:11     ` Guopeng Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2026-05-07 15:52 UTC
  To: Guopeng Zhang, Tejun Heo, Michal Koutný, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups

On 5/7/26 6:33 AM, Guopeng Zhang wrote:
> cpuset_can_attach() preallocates destination SCHED_DEADLINE bandwidth
> before the attach commit point, while set_cpus_allowed_dl() later
> subtracts bandwidth from the source root domain when the task affinity is
> actually updated.
>
> Those two decisions must be made with the same CPU mask.
> cpuset_can_attach() used the destination cpuset effective mask directly,
> but cpuset_attach_task() first builds a per-task target mask which is
> constrained by task_cpu_possible_mask() and, if needed, by walking up the
> cpuset hierarchy. On asymmetric systems, the actual target mask can
> therefore be a strict subset of cs->effective_cpus.

The task_cpu_possible_mask() is there for a special class of arm64 
systems where only some of the cores can run legacy 32-bit applications. 
It is debatable how likely it is that a DL task would be a legacy 32-bit 
application, which is inherently slower than the same application 
compiled into native 64-bit code. Perhaps we can just disallow such a 
legacy 32-bit application from moving into the DL scheduling class in 
the first place.

I am not in favor of making the cpuset code more complex to support 
such a corner case which may never be used. Could you strip out the 
task_cpu_possible_mask() part from this patch? We can revisit this in 
another patch if such a special use case turns out to be worth 
supporting in the future.

Cheers,
Longman

>
> If the source root domain intersects cs->effective_cpus only on CPUs
> outside the task's possible mask, can_attach() can skip the destination
> reservation even though set_cpus_allowed_dl() later sees a real
> root-domain move and subtracts from the source domain.
>
> Extract the root-domain bandwidth-move test used by
> set_cpus_allowed_dl() into dl_task_needs_bw_move(), and make
> cpuset_can_attach() compute the same per-task target mask that
> cpuset_attach_task() applies.
>
> Keep nr_migrate_dl_tasks counting all migrating deadline tasks for
> cpuset DL task accounting. Restrict sum_migrate_dl_bw to the subset of
> tasks that need destination root-domain bandwidth reservation, because a
> deadline task can move between cpusets without moving bandwidth between
> root domains.
>
> This keeps the existing per-attach aggregate reservation model; it only
> changes the per-task mask used to decide which tasks contribute to that
> aggregate. The broader can_attach()/attach() transaction window is left
> unchanged.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> ---
>   include/linux/sched/deadline.h  |  9 +++
>   kernel/cgroup/cpuset-internal.h |  1 +
>   kernel/cgroup/cpuset.c          | 97 ++++++++++++++++++++++-----------
>   kernel/sched/deadline.c         | 13 ++++-
>   4 files changed, 86 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> index 1198138cb839..ddfd5216f3fc 100644
> --- a/include/linux/sched/deadline.h
> +++ b/include/linux/sched/deadline.h
> @@ -33,6 +33,15 @@ struct root_domain;
>   extern void dl_add_task_root_domain(struct task_struct *p);
>   extern void dl_clear_root_domain(struct root_domain *rd);
>   extern void dl_clear_root_domain_cpu(int cpu);
> +/*
> + * Return whether moving DL task @p to @new_mask requires moving DL
> + * bandwidth accounting between root domains. This helper is specific to
> + * DL bandwidth move accounting semantics and is shared by
> + * cpuset_can_attach() and set_cpus_allowed_dl() so both paths use the
> + * same source root-domain test.
> + */
> +bool dl_task_needs_bw_move(struct task_struct *p,
> +			   const struct cpumask *new_mask);
>   
>   extern u64 dl_cookie;
>   extern bool dl_bw_visited(int cpu, u64 cookie);
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index bb4e692bea30..f7aaf01f7cd5 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -167,6 +167,7 @@ struct cpuset {
>   	 */
>   	int nr_deadline_tasks;
>   	int nr_migrate_dl_tasks;
> +	/* DL bandwidth that needs destination reservation for this attach. */
>   	u64 sum_migrate_dl_bw;
>   	/*
>   	 * CPU used for temporary DL bandwidth allocation during attach;
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index ae41736399a1..78c1a4071cc3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -485,6 +485,30 @@ static void guarantee_active_cpus(struct task_struct *tsk,
>   	rcu_read_unlock();
>   }
>   
> +/* Compute the effective CPU mask cpuset_attach_task() will apply to @tsk. */
> +static void cpuset_attach_task_cpus(struct cpuset *cs, struct task_struct *tsk,
> +				    struct cpumask *pmask)
> +{
> +	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
> +
> +	lockdep_assert_cpuset_lock_held();
> +
> +	if (cs == &top_cpuset) {
> +		cpumask_andnot(pmask, possible_mask, subpartitions_cpus);
> +		return;
> +	}
> +
> +	if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_active_mask)))
> +		cpumask_copy(pmask, cpu_active_mask);
> +
> +	rcu_read_lock();
> +	while (!cpumask_intersects(cs->effective_cpus, pmask))
> +		cs = parent_cs(cs);
> +
> +	cpumask_and(pmask, pmask, cs->effective_cpus);
> +	rcu_read_unlock();
> +}
> +
>   /*
>    * Return in *pmask the portion of a cpusets's mems_allowed that
>    * are online, with memory.  If none are online with memory, walk
> @@ -2986,6 +3010,14 @@ static void reset_migrate_dl_data(struct cpuset *cs)
>   	cs->dl_bw_cpu = -1;
>   }
>   
> +/*
> + * Protected by cpuset_mutex. cpus_attach is used by the can_attach/attach
> + * paths but we can't allocate it dynamically there. Define it global and
> + * allocate from cpuset_init().
> + */
> +static cpumask_var_t cpus_attach;
> +static nodemask_t cpuset_attach_nodemask_to;
> +
>   /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
>   static int cpuset_can_attach(struct cgroup_taskset *tset)
>   {
> @@ -2993,7 +3025,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>   	struct cpuset *cs, *oldcs;
>   	struct task_struct *task;
>   	bool setsched_check;
> -	int ret;
> +	int cpu = nr_cpu_ids, ret;
>   
>   	/* used later by cpuset_attach() */
>   	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
> @@ -3038,32 +3070,47 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>   		}
>   
>   		if (dl_task(task)) {
> +			/*
> +			 * Count all migrating DL tasks for cpuset task accounting.
> +			 * Only tasks that need a root-domain bandwidth move
> +			 * contribute to sum_migrate_dl_bw.
> +			 */
>   			cs->nr_migrate_dl_tasks++;
> -			cs->sum_migrate_dl_bw += task->dl.dl_bw;
> +			cpuset_attach_task_cpus(cs, task, cpus_attach);
> +
> +			if (dl_task_needs_bw_move(task, cpus_attach)) {
> +				/*
> +				 * Keep the existing aggregate reservation model.
> +				 * Tasks in one attach enter the same destination
> +				 * cpuset, so the first CPU found for a task needing
> +				 * DL bandwidth reservation identifies the destination
> +				 * root domain.
> +				 */
> +				if (cpu >= nr_cpu_ids)
> +					cpu = cpumask_any_and(cpu_active_mask,
> +							      cpus_attach);
> +				cs->sum_migrate_dl_bw += task->dl.dl_bw;
> +			}
>   		}
>   	}
>   
> -	if (!cs->nr_migrate_dl_tasks)
> +	if (!cs->sum_migrate_dl_bw)
>   		goto out_success;
>   
> -	if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
> -		int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
> -
> -		if (unlikely(cpu >= nr_cpu_ids)) {
> -			reset_migrate_dl_data(cs);
> -			ret = -EINVAL;
> -			goto out_unlock;
> -		}
> -
> -		ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
> -		if (ret) {
> -			reset_migrate_dl_data(cs);
> -			goto out_unlock;
> -		}
> +	if (unlikely(cpu >= nr_cpu_ids)) {
> +		reset_migrate_dl_data(cs);
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
>   
> -		cs->dl_bw_cpu = cpu;
> +	ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
> +	if (ret) {
> +		reset_migrate_dl_data(cs);
> +		goto out_unlock;
>   	}
>   
> +	cs->dl_bw_cpu = cpu;
> +
>   out_success:
>   	/*
>   	 * Mark attach is in progress.  This makes validate_change() fail
> @@ -3099,23 +3146,11 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
>   	mutex_unlock(&cpuset_mutex);
>   }
>   
> -/*
> - * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach_task()
> - * but we can't allocate it dynamically there.  Define it global and
> - * allocate from cpuset_init().
> - */
> -static cpumask_var_t cpus_attach;
> -static nodemask_t cpuset_attach_nodemask_to;
> -
>   static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
>   {
>   	lockdep_assert_cpuset_lock_held();
>   
> -	if (cs != &top_cpuset)
> -		guarantee_active_cpus(task, cpus_attach);
> -	else
> -		cpumask_andnot(cpus_attach, task_cpu_possible_mask(task),
> -			       subpartitions_cpus);
> +	cpuset_attach_task_cpus(cs, task, cpus_attach);
>   	/*
>   	 * can_attach beforehand should guarantee that this doesn't
>   	 * fail.  TODO: have a better way to handle failure here
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index edca7849b165..7db4c87df83b 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -3107,20 +3107,18 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
>   static void set_cpus_allowed_dl(struct task_struct *p,
>   				struct affinity_context *ctx)
>   {
> -	struct root_domain *src_rd;
>   	struct rq *rq;
>   
>   	WARN_ON_ONCE(!dl_task(p));
>   
>   	rq = task_rq(p);
> -	src_rd = rq->rd;
>   	/*
>   	 * Migrating a SCHED_DEADLINE task between exclusive
>   	 * cpusets (different root_domains) entails a bandwidth
>   	 * update. We already made space for us in the destination
>   	 * domain (see cpuset_can_attach()).
>   	 */
> -	if (!cpumask_intersects(src_rd->span, ctx->new_mask)) {
> +	if (dl_task_needs_bw_move(p, ctx->new_mask)) {
>   		struct dl_bw *src_dl_b;
>   
>   		src_dl_b = dl_bw_of(cpu_of(rq));
> @@ -3137,6 +3135,15 @@ static void set_cpus_allowed_dl(struct task_struct *p,
>   	set_cpus_allowed_common(p, ctx);
>   }
>   
> +bool dl_task_needs_bw_move(struct task_struct *p,
> +			   const struct cpumask *new_mask)
> +{
> +	if (!dl_task(p))
> +		return false;
> +
> +	return !cpumask_intersects(task_rq(p)->rd->span, new_mask);
> +}
> +
>   /* Assumes rq->lock is held */
>   static void rq_online_dl(struct rq *rq)
>   {



* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
  2026-05-07 14:31   ` Waiman Long
@ 2026-05-08  2:14     ` Chen Ridong
  2026-05-08  2:26       ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: Chen Ridong @ 2026-05-08  2:14 UTC
  To: Waiman Long, Guopeng Zhang, Tejun Heo, Michal Koutný,
	Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups



On 2026/5/7 22:31, Waiman Long wrote:
> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
>> state in the destination cpuset while walking the taskset.
>>
>> If a later task_can_attach() or security_task_setscheduler() check
>> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
>> and does not call cpuset_cancel_attach() for it. The partially
>> accumulated state is then left behind and can be consumed by a later
>> attach, corrupting cpuset DL task accounting and pending DL bandwidth
>> accounting.
>>
>> Reset the pending DL migration state before returning from those
>> per-task failure paths.
>>
>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>>   kernel/cgroup/cpuset.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index e3a081a07c6d..ae41736399a1 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>       cgroup_taskset_for_each(task, css, tset) {
>>           ret = task_can_attach(task);
>>           if (ret)
>> -            goto out_unlock;
>> +            goto out_reset_dl_data;
>>             if (setsched_check) {
>>               ret = security_task_setscheduler(task);
>>               if (ret)
>> -                goto out_unlock;
>> +                goto out_reset_dl_data;
>>           }
>>             if (dl_task(task)) {
>> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>        * changes which zero cpus/mems_allowed.
>>        */
>>       cs->attach_in_progress++;
>> +    goto out_unlock;
>> +
>> +out_reset_dl_data:
>> +    reset_migrate_dl_data(cs);
>>   out_unlock:
>>       mutex_unlock(&cpuset_mutex);
>>       return ret;
> 
> I would prefer the likely success path be a straight line instead of doing a
> goto. IOW, move out_reset_dl_data below return. Other than that, this patch
> looks good to me.
> 

I've read the code and found several places that call reset_migrate_dl_data(cs).

I think it would be better to call reset_migrate_dl_data(cs) only when we
encounter an error, for example:

```
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
...
out_unlock:
	if (ret)
		reset_migrate_dl_data(cs);
	mutex_unlock(&cpuset_mutex);
	return ret;
}
```
After that, no other places would need to call reset_migrate_dl_data(cs), right?

-- 
Best regards,
Ridong



* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
  2026-05-08  2:14     ` Chen Ridong
@ 2026-05-08  2:26       ` Waiman Long
  2026-05-08 13:03         ` Guopeng Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2026-05-08  2:26 UTC
  To: Chen Ridong, Guopeng Zhang, Tejun Heo, Michal Koutný,
	Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups


On 5/7/26 10:14 PM, Chen Ridong wrote:
>
> On 2026/5/7 22:31, Waiman Long wrote:
>> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>>> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
>>> state in the destination cpuset while walking the taskset.
>>>
>>> If a later task_can_attach() or security_task_setscheduler() check
>>> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
>>> and does not call cpuset_cancel_attach() for it. The partially
>>> accumulated state is then left behind and can be consumed by a later
>>> attach, corrupting cpuset DL task accounting and pending DL bandwidth
>>> accounting.
>>>
>>> Reset the pending DL migration state before returning from those
>>> per-task failure paths.
>>>
>>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>>> ---
>>>    kernel/cgroup/cpuset.c | 8 ++++++--
>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index e3a081a07c6d..ae41736399a1 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>        cgroup_taskset_for_each(task, css, tset) {
>>>            ret = task_can_attach(task);
>>>            if (ret)
>>> -            goto out_unlock;
>>> +            goto out_reset_dl_data;
>>>              if (setsched_check) {
>>>                ret = security_task_setscheduler(task);
>>>                if (ret)
>>> -                goto out_unlock;
>>> +                goto out_reset_dl_data;
>>>            }
>>>              if (dl_task(task)) {
>>> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>         * changes which zero cpus/mems_allowed.
>>>         */
>>>        cs->attach_in_progress++;
>>> +    goto out_unlock;
>>> +
>>> +out_reset_dl_data:
>>> +    reset_migrate_dl_data(cs);
>>>    out_unlock:
>>>        mutex_unlock(&cpuset_mutex);
>>>        return ret;
>> I would prefer the likely success path be a straight line instead of doing a
>> goto. IOW, move out_reset_dl_data below return. Other than that, this patch
>> looks good to me.
>>
> I've read the code and found several places that call reset_migrate_dl_data(cs).
>
> I think it would be better to call reset_migrate_dl_data(cs) only when we
> encounter an error, for example:
>
> ```
> static int cpuset_can_attach(struct cgroup_taskset *tset)
> {
> ...
> out_unlock:
> 	if (ret)
> 		reset_migrate_dl_data(cs);
> 	mutex_unlock(&cpuset_mutex);
> 	return ret;
> }
> ```
> After that, no other places would need to call reset_migrate_dl_data(cs), right?
>
Yes, that should work too.

Cheers,
Longman



* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
  2026-05-08  2:26       ` Waiman Long
@ 2026-05-08 13:03         ` Guopeng Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-08 13:03 UTC
  To: Waiman Long, Chen Ridong, Tejun Heo, Michal Koutný,
	Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups



On 2026/5/8 10:26, Waiman Long wrote:
> 
> On 5/7/26 10:14 PM, Chen Ridong wrote:
>>
>> On 2026/5/7 22:31, Waiman Long wrote:
>>> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>>>> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
>>>> state in the destination cpuset while walking the taskset.
>>>>
>>>> If a later task_can_attach() or security_task_setscheduler() check
>>>> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
>>>> and does not call cpuset_cancel_attach() for it. The partially
>>>> accumulated state is then left behind and can be consumed by a later
>>>> attach, corrupting cpuset DL task accounting and pending DL bandwidth
>>>> accounting.
>>>>
>>>> Reset the pending DL migration state before returning from those
>>>> per-task failure paths.
>>>>
>>>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>>>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>>>> ---
>>>>    kernel/cgroup/cpuset.c | 8 ++++++--
>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>> index e3a081a07c6d..ae41736399a1 100644
>>>> --- a/kernel/cgroup/cpuset.c
>>>> +++ b/kernel/cgroup/cpuset.c
>>>> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>>        cgroup_taskset_for_each(task, css, tset) {
>>>>            ret = task_can_attach(task);
>>>>            if (ret)
>>>> -            goto out_unlock;
>>>> +            goto out_reset_dl_data;
>>>>              if (setsched_check) {
>>>>                ret = security_task_setscheduler(task);
>>>>                if (ret)
>>>> -                goto out_unlock;
>>>> +                goto out_reset_dl_data;
>>>>            }
>>>>              if (dl_task(task)) {
>>>> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>>         * changes which zero cpus/mems_allowed.
>>>>         */
>>>>        cs->attach_in_progress++;
>>>> +    goto out_unlock;
>>>> +
>>>> +out_reset_dl_data:
>>>> +    reset_migrate_dl_data(cs);
>>>>    out_unlock:
>>>>        mutex_unlock(&cpuset_mutex);
>>>>        return ret;
>>> I would prefer the likely success path be a straight line instead of doing a
>>> goto. IOW, move out_reset_dl_data below return. Other than that, this patch
>>> looks good to me.
>>>
>> I've read the code and found several places that call reset_migrate_dl_data(cs).
>>
>> I think it would be better to call reset_migrate_dl_data(cs) only when we
>> encounter an error, for example:
>>
>> ```
>> static int cpuset_can_attach(struct cgroup_taskset *tset)
>> {
>> ...
>> out_unlock:
>>     if (ret)
>>         reset_migrate_dl_data(cs);
>>     mutex_unlock(&cpuset_mutex);
>>     return ret;
>> }
>> ```
>> After that, no other places would need to call reset_migrate_dl_data(cs), right?
>>
> Yes, that should work too.
> 
Thanks for the review.

Yes, I will update cpuset_can_attach() to use the common ret-based
cleanup in out_unlock.

Thanks,
Guopeng
> Cheers,
> Longman



* Re: [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask
  2026-05-07 15:52   ` Waiman Long
@ 2026-05-08 13:11     ` Guopeng Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-08 13:11 UTC
  To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
	cgroups



On 2026/5/7 23:52, Waiman Long wrote:
> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>> cpuset_can_attach() preallocates destination SCHED_DEADLINE bandwidth
>> before the attach commit point, while set_cpus_allowed_dl() later
>> subtracts bandwidth from the source root domain when the task affinity is
>> actually updated.
>>
>> Those two decisions must be made with the same CPU mask.
>> cpuset_can_attach() used the destination cpuset effective mask directly,
>> but cpuset_attach_task() first builds a per-task target mask which is
>> constrained by task_cpu_possible_mask() and, if needed, by walking up the
>> cpuset hierarchy. On asymmetric systems, the actual target mask can
>> therefore be a strict subset of cs->effective_cpus.
> 
> The task_cpu_possible_mask() is there for a special class of arm64 systems where only some of the cores can run legacy 32-bit applications. It is debatable how likely it is that a DL task would be a legacy 32-bit application, which is inherently slower than the same application compiled into native 64-bit code. Perhaps we can just disallow such a legacy 32-bit application from moving into the DL scheduling class in the first place.
> 
> I am not in favor of making the cpuset code more complex to support such a corner case which may never be used. Could you strip out the task_cpu_possible_mask() part from this patch? We can revisit this in another patch if such a special use case turns out to be worth supporting in the future.
> 
Thanks for the review.

I agree. The task_cpu_possible_mask() case makes the fix broader and
adds more cpuset-side complexity than needed for this series.

I will drop the cpuset_attach_task() target-mask mirroring in v3 and
keep cpuset_can_attach() using cs->effective_cpus. The updated patch
will only share the root-domain bandwidth-move test with
set_cpus_allowed_dl() and only add a migrating DL task's bandwidth to
sum_migrate_dl_bw when that task actually needs a root-domain
bandwidth move.
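
The dl_task() branch in cpuset_can_attach() would then look roughly
like this (a rough sketch of the v3 plan, pending testing):

```
		if (dl_task(task)) {
			cs->nr_migrate_dl_tasks++;
			if (dl_task_needs_bw_move(task, cs->effective_cpus))
				cs->sum_migrate_dl_bw += task->dl.dl_bw;
		}
```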

The task_cpu_possible_mask() corner case can be revisited separately if
there is a real need to support that scenario.

Thanks,
Guopeng
> Cheers,
> Longman
> 
>>
>> If the source root domain intersects cs->effective_cpus only on CPUs
>> outside the task's possible mask, can_attach() can skip the destination
>> reservation even though set_cpus_allowed_dl() later sees a real
>> root-domain move and subtracts from the source domain.
>>
>> Extract the root-domain bandwidth-move test used by
>> set_cpus_allowed_dl() into dl_task_needs_bw_move(), and make
>> cpuset_can_attach() compute the same per-task target mask that
>> cpuset_attach_task() applies.
>>
>> Keep nr_migrate_dl_tasks counting all migrating deadline tasks for
>> cpuset DL task accounting. Restrict sum_migrate_dl_bw to the subset of
>> tasks that need destination root-domain bandwidth reservation, because a
>> deadline task can move between cpusets without moving bandwidth between
>> root domains.
>>
>> This keeps the existing per-attach aggregate reservation model; it only
>> changes the per-task mask used to decide which tasks contribute to that
>> aggregate. The broader can_attach()/attach() transaction window is left
>> unchanged.
>>
>> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>>   include/linux/sched/deadline.h  |  9 +++
>>   kernel/cgroup/cpuset-internal.h |  1 +
>>   kernel/cgroup/cpuset.c          | 97 ++++++++++++++++++++++-----------
>>   kernel/sched/deadline.c         | 13 ++++-
>>   4 files changed, 86 insertions(+), 34 deletions(-)
>>



