* [PATCH v2 0/2] cgroup/cpuset: fix DL attach bandwidth accounting
@ 2026-05-07 10:33 Guopeng Zhang
2026-05-07 10:33 ` [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure Guopeng Zhang
2026-05-07 10:33 ` [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask Guopeng Zhang
0 siblings, 2 replies; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-07 10:33 UTC
To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
Peter Zijlstra, Juri Lelli
Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups, Guopeng Zhang
Hi,
cpuset_can_attach() and set_cpus_allowed_dl() must make the same
decision about whether migrating a SCHED_DEADLINE task requires moving
bandwidth accounting between root domains.
The can_attach path used the destination cpuset's effective CPU mask for
that decision. The attach path, however, applies a per-task target mask
that is constrained by task_cpu_possible_mask(), cpu_active_mask, and
the fallback walk up the cpuset hierarchy. On asymmetric CPU systems,
that per-task mask can be a strict subset of the destination cpuset's
effective mask, so cpuset_can_attach() can skip the destination
bandwidth reservation while set_cpus_allowed_dl() later performs the
source-side bandwidth subtraction.
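For illustration, here is a tiny standalone sketch of the two decisions
with made-up mask values (bit i set means CPU i is in the mask; the
source root-domain span stands in for the old cpuset's effective mask):
```
#include <stdio.h>

int main(void)
{
	/* Hypothetical masks: only CPUs 0-1 can run the (32-bit) task. */
	unsigned int src_rd_span   = 0x0c; /* source root domain: CPUs 2-3 */
	unsigned int dst_effective = 0x0f; /* destination effective_cpus   */
	unsigned int task_possible = 0x03; /* task_cpu_possible_mask()     */
	unsigned int per_task_mask = dst_effective & task_possible;

	/* can_attach() decided on cpuset-level masks: they intersect,
	 * so no destination bandwidth was reserved. */
	printf("reserve at can_attach: %s\n",
	       (src_rd_span & dst_effective) ? "no" : "yes");

	/* set_cpus_allowed_dl() decides on the per-task mask: disjoint,
	 * so it subtracts bandwidth from the source root domain. */
	printf("bw move at attach:     %s\n",
	       (src_rd_span & per_task_mask) ? "no" : "yes");
	return 0;
}
```
With these values the reservation is skipped while the source-side
subtraction still happens, which is exactly the imbalance fixed here.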
There is also an internal cpuset_can_attach() failure path where
temporary DL migration state can be left behind if a later per-task
check fails before cpuset marks attach_in_progress.
Patch 1 resets the temporary DL migration state on those internal
cpuset_can_attach() failure paths.
Patch 2 computes the same per-task target mask in cpuset_can_attach()
that cpuset_attach_task() later applies, and only includes DL tasks that
actually need a root-domain bandwidth move in the destination bandwidth
reservation.
The broader can_attach()/attach() transaction window is left unchanged.
This series does not attempt to rework sched_setattr() or the TOCTOU
issues around the source cpuset's effective mask. It only aligns the
reservation decision with the attach-time bandwidth move decision and
fixes the temporary state leak.
Guopeng Zhang (2):
cgroup/cpuset: reset DL migration state on can_attach() failure
cgroup/cpuset: align DL bandwidth reservation with attach target mask
include/linux/sched/deadline.h | 9 +++
kernel/cgroup/cpuset-internal.h | 1 +
kernel/cgroup/cpuset.c | 105 ++++++++++++++++++++++----------
kernel/sched/deadline.c | 13 +++-
4 files changed, 92 insertions(+), 36 deletions(-)
---
Changes since v1:
- Split the original patch into two patches.
- Reset temporary DL migration state on cpuset_can_attach() internal
failure paths.
- Computed the same per-task attach mask in cpuset_can_attach() as
cpuset_attach_task().
- Kept nr_migrate_dl_tasks counting all migrating DL tasks for cpuset
task accounting, while restricting sum_migrate_dl_bw to tasks that need
destination DL bandwidth reservation.
- Tightened Fixes tags.
- Documented the existing aggregate reservation invariant near the
dl_bw_cpu selection.
- Removed the unnecessary RCU guard from dl_task_needs_bw_move().
v1:
https://lore.kernel.org/all/20260421083449.95750-1-zhangguopeng@kylinos.cn
--
2.43.0
* [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
2026-05-07 10:33 [PATCH v2 0/2] cgroup/cpuset: fix DL attach bandwidth accounting Guopeng Zhang
@ 2026-05-07 10:33 ` Guopeng Zhang
2026-05-07 14:31 ` Waiman Long
2026-05-07 10:33 ` [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask Guopeng Zhang
1 sibling, 1 reply; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-07 10:33 UTC
To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
Peter Zijlstra, Juri Lelli
Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups, Guopeng Zhang
cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
state in the destination cpuset while walking the taskset.
If a later task_can_attach() or security_task_setscheduler() check
fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
and does not call cpuset_cancel_attach() for it. The partially
accumulated state is then left behind and can be consumed by a later
attach, corrupting cpuset DL task accounting and pending DL bandwidth
accounting.
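The cancellation walk in cgroup_migrate_execute() stops at the failing
subsystem, so the subsystem whose can_attach() returned the error never
gets a cancel callback; paraphrased sketch (not the exact upstream
code):
```
do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
	if (ss->can_attach) {
		tset->ssid = ssid;
		ret = ss->can_attach(tset);
		if (ret) {
			failed_ssid = ssid;	/* remember who failed */
			goto out_cancel_attach;
		}
	}
} while_each_subsys_mask();
...
out_cancel_attach:
	do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
		if (ssid == failed_ssid)
			break;	/* the failing subsystem is skipped */
		if (ss->cancel_attach) {
			tset->ssid = ssid;
			ss->cancel_attach(tset);
		}
	} while_each_subsys_mask();
```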
Reset the pending DL migration state before returning from those
per-task failure paths.
Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
kernel/cgroup/cpuset.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e3a081a07c6d..ae41736399a1 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
cgroup_taskset_for_each(task, css, tset) {
ret = task_can_attach(task);
if (ret)
- goto out_unlock;
+ goto out_reset_dl_data;
if (setsched_check) {
ret = security_task_setscheduler(task);
if (ret)
- goto out_unlock;
+ goto out_reset_dl_data;
}
if (dl_task(task)) {
@@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
* changes which zero cpus/mems_allowed.
*/
cs->attach_in_progress++;
+ goto out_unlock;
+
+out_reset_dl_data:
+ reset_migrate_dl_data(cs);
out_unlock:
mutex_unlock(&cpuset_mutex);
return ret;
--
2.43.0
* [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask
2026-05-07 10:33 [PATCH v2 0/2] cgroup/cpuset: fix DL attach bandwidth accounting Guopeng Zhang
2026-05-07 10:33 ` [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure Guopeng Zhang
@ 2026-05-07 10:33 ` Guopeng Zhang
2026-05-07 15:52 ` Waiman Long
1 sibling, 1 reply; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-07 10:33 UTC
To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
Peter Zijlstra, Juri Lelli
Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups, Guopeng Zhang
cpuset_can_attach() preallocates destination SCHED_DEADLINE bandwidth
before the attach commit point, while set_cpus_allowed_dl() later
subtracts bandwidth from the source root domain when the task affinity is
actually updated.
Those two decisions must be made with the same CPU mask.
cpuset_can_attach() used the destination cpuset effective mask directly,
but cpuset_attach_task() first builds a per-task target mask which is
constrained by task_cpu_possible_mask() and, if needed, by walking up the
cpuset hierarchy. On asymmetric systems, the actual target mask can
therefore be a strict subset of cs->effective_cpus.
If the source root domain intersects cs->effective_cpus only on CPUs
outside the task's possible mask, can_attach() can skip the destination
reservation even though set_cpus_allowed_dl() later sees a real
root-domain move and subtracts from the source domain.
Extract the root-domain bandwidth-move test used by
set_cpus_allowed_dl() into dl_task_needs_bw_move(), and make
cpuset_can_attach() compute the same per-task target mask that
cpuset_attach_task() applies.
Keep nr_migrate_dl_tasks counting all migrating deadline tasks for
cpuset DL task accounting. Restrict sum_migrate_dl_bw to the subset of
tasks that need destination root-domain bandwidth reservation, because a
deadline task can move between cpusets without moving bandwidth between
root domains.
This keeps the existing per-attach aggregate reservation model; it only
changes the per-task mask used to decide which tasks contribute to that
aggregate. The broader can_attach()/attach() transaction window is left
unchanged.
Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
include/linux/sched/deadline.h | 9 +++
kernel/cgroup/cpuset-internal.h | 1 +
kernel/cgroup/cpuset.c | 97 ++++++++++++++++++++++-----------
kernel/sched/deadline.c | 13 ++++-
4 files changed, 86 insertions(+), 34 deletions(-)
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 1198138cb839..ddfd5216f3fc 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -33,6 +33,15 @@ struct root_domain;
extern void dl_add_task_root_domain(struct task_struct *p);
extern void dl_clear_root_domain(struct root_domain *rd);
extern void dl_clear_root_domain_cpu(int cpu);
+/*
+ * Return whether moving DL task @p to @new_mask requires moving DL
+ * bandwidth accounting between root domains. This helper is specific to
+ * DL bandwidth move accounting semantics and is shared by
+ * cpuset_can_attach() and set_cpus_allowed_dl() so both paths use the
+ * same source root-domain test.
+ */
+bool dl_task_needs_bw_move(struct task_struct *p,
+ const struct cpumask *new_mask);
extern u64 dl_cookie;
extern bool dl_bw_visited(int cpu, u64 cookie);
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index bb4e692bea30..f7aaf01f7cd5 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -167,6 +167,7 @@ struct cpuset {
*/
int nr_deadline_tasks;
int nr_migrate_dl_tasks;
+ /* DL bandwidth that needs destination reservation for this attach. */
u64 sum_migrate_dl_bw;
/*
* CPU used for temporary DL bandwidth allocation during attach;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ae41736399a1..78c1a4071cc3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -485,6 +485,30 @@ static void guarantee_active_cpus(struct task_struct *tsk,
rcu_read_unlock();
}
+/* Compute the effective CPU mask cpuset_attach_task() will apply to @tsk. */
+static void cpuset_attach_task_cpus(struct cpuset *cs, struct task_struct *tsk,
+ struct cpumask *pmask)
+{
+ const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
+
+ lockdep_assert_cpuset_lock_held();
+
+ if (cs == &top_cpuset) {
+ cpumask_andnot(pmask, possible_mask, subpartitions_cpus);
+ return;
+ }
+
+ if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_active_mask)))
+ cpumask_copy(pmask, cpu_active_mask);
+
+ rcu_read_lock();
+ while (!cpumask_intersects(cs->effective_cpus, pmask))
+ cs = parent_cs(cs);
+
+ cpumask_and(pmask, pmask, cs->effective_cpus);
+ rcu_read_unlock();
+}
+
/*
* Return in *pmask the portion of a cpusets's mems_allowed that
* are online, with memory. If none are online with memory, walk
@@ -2986,6 +3010,14 @@ static void reset_migrate_dl_data(struct cpuset *cs)
cs->dl_bw_cpu = -1;
}
+/*
+ * Protected by cpuset_mutex. cpus_attach is used by the can_attach/attach
+ * paths but we can't allocate it dynamically there. Define it global and
+ * allocate from cpuset_init().
+ */
+static cpumask_var_t cpus_attach;
+static nodemask_t cpuset_attach_nodemask_to;
+
/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
@@ -2993,7 +3025,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
struct cpuset *cs, *oldcs;
struct task_struct *task;
bool setsched_check;
- int ret;
+ int cpu = nr_cpu_ids, ret;
/* used later by cpuset_attach() */
cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
@@ -3038,32 +3070,47 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
}
if (dl_task(task)) {
+ /*
+ * Count all migrating DL tasks for cpuset task accounting.
+ * Only tasks that need a root-domain bandwidth move
+ * contribute to sum_migrate_dl_bw.
+ */
cs->nr_migrate_dl_tasks++;
- cs->sum_migrate_dl_bw += task->dl.dl_bw;
+ cpuset_attach_task_cpus(cs, task, cpus_attach);
+
+ if (dl_task_needs_bw_move(task, cpus_attach)) {
+ /*
+ * Keep the existing aggregate reservation model.
+ * Tasks in one attach enter the same destination
+ * cpuset, so the first CPU found for a task needing
+ * DL bandwidth reservation identifies the destination
+ * root domain.
+ */
+ if (cpu >= nr_cpu_ids)
+ cpu = cpumask_any_and(cpu_active_mask,
+ cpus_attach);
+ cs->sum_migrate_dl_bw += task->dl.dl_bw;
+ }
}
}
- if (!cs->nr_migrate_dl_tasks)
+ if (!cs->sum_migrate_dl_bw)
goto out_success;
- if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
- int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
-
- if (unlikely(cpu >= nr_cpu_ids)) {
- reset_migrate_dl_data(cs);
- ret = -EINVAL;
- goto out_unlock;
- }
-
- ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
- if (ret) {
- reset_migrate_dl_data(cs);
- goto out_unlock;
- }
+ if (unlikely(cpu >= nr_cpu_ids)) {
+ reset_migrate_dl_data(cs);
+ ret = -EINVAL;
+ goto out_unlock;
+ }
- cs->dl_bw_cpu = cpu;
+ ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+ if (ret) {
+ reset_migrate_dl_data(cs);
+ goto out_unlock;
}
+ cs->dl_bw_cpu = cpu;
+
out_success:
/*
* Mark attach is in progress. This makes validate_change() fail
@@ -3099,23 +3146,11 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
mutex_unlock(&cpuset_mutex);
}
-/*
- * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach_task()
- * but we can't allocate it dynamically there. Define it global and
- * allocate from cpuset_init().
- */
-static cpumask_var_t cpus_attach;
-static nodemask_t cpuset_attach_nodemask_to;
-
static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
{
lockdep_assert_cpuset_lock_held();
- if (cs != &top_cpuset)
- guarantee_active_cpus(task, cpus_attach);
- else
- cpumask_andnot(cpus_attach, task_cpu_possible_mask(task),
- subpartitions_cpus);
+ cpuset_attach_task_cpus(cs, task, cpus_attach);
/*
* can_attach beforehand should guarantee that this doesn't
* fail. TODO: have a better way to handle failure here
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index edca7849b165..7db4c87df83b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3107,20 +3107,18 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
static void set_cpus_allowed_dl(struct task_struct *p,
struct affinity_context *ctx)
{
- struct root_domain *src_rd;
struct rq *rq;
WARN_ON_ONCE(!dl_task(p));
rq = task_rq(p);
- src_rd = rq->rd;
/*
* Migrating a SCHED_DEADLINE task between exclusive
* cpusets (different root_domains) entails a bandwidth
* update. We already made space for us in the destination
* domain (see cpuset_can_attach()).
*/
- if (!cpumask_intersects(src_rd->span, ctx->new_mask)) {
+ if (dl_task_needs_bw_move(p, ctx->new_mask)) {
struct dl_bw *src_dl_b;
src_dl_b = dl_bw_of(cpu_of(rq));
@@ -3137,6 +3135,15 @@ static void set_cpus_allowed_dl(struct task_struct *p,
set_cpus_allowed_common(p, ctx);
}
+bool dl_task_needs_bw_move(struct task_struct *p,
+ const struct cpumask *new_mask)
+{
+ if (!dl_task(p))
+ return false;
+
+ return !cpumask_intersects(task_rq(p)->rd->span, new_mask);
+}
+
/* Assumes rq->lock is held */
static void rq_online_dl(struct rq *rq)
{
--
2.43.0
* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
2026-05-07 10:33 ` [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure Guopeng Zhang
@ 2026-05-07 14:31 ` Waiman Long
2026-05-08 2:14 ` Chen Ridong
0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2026-05-07 14:31 UTC
To: Guopeng Zhang, Tejun Heo, Michal Koutný, Ingo Molnar,
Peter Zijlstra, Juri Lelli
Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups
On 5/7/26 6:33 AM, Guopeng Zhang wrote:
> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
> state in the destination cpuset while walking the taskset.
>
> If a later task_can_attach() or security_task_setscheduler() check
> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
> and does not call cpuset_cancel_attach() for it. The partially
> accumulated state is then left behind and can be consumed by a later
> attach, corrupting cpuset DL task accounting and pending DL bandwidth
> accounting.
>
> Reset the pending DL migration state before returning from those
> per-task failure paths.
>
> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> ---
> kernel/cgroup/cpuset.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index e3a081a07c6d..ae41736399a1 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
> cgroup_taskset_for_each(task, css, tset) {
> ret = task_can_attach(task);
> if (ret)
> - goto out_unlock;
> + goto out_reset_dl_data;
>
> if (setsched_check) {
> ret = security_task_setscheduler(task);
> if (ret)
> - goto out_unlock;
> + goto out_reset_dl_data;
> }
>
> if (dl_task(task)) {
> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
> * changes which zero cpus/mems_allowed.
> */
> cs->attach_in_progress++;
> + goto out_unlock;
> +
> +out_reset_dl_data:
> + reset_migrate_dl_data(cs);
> out_unlock:
> mutex_unlock(&cpuset_mutex);
> return ret;
I would prefer the likely success path be a straight line instead of
doing a goto. IOW, move out_reset_dl_data below return. Other than that,
this patch looks good to me.
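A minimal sketch of the suggested layout (illustrative only):
```
	cs->attach_in_progress++;
out_unlock:
	mutex_unlock(&cpuset_mutex);
	return ret;

out_reset_dl_data:
	reset_migrate_dl_data(cs);
	goto out_unlock;
```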
Cheers,
Longman
* Re: [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask
2026-05-07 10:33 ` [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask Guopeng Zhang
@ 2026-05-07 15:52 ` Waiman Long
2026-05-08 13:11 ` Guopeng Zhang
0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2026-05-07 15:52 UTC
To: Guopeng Zhang, Tejun Heo, Michal Koutný, Ingo Molnar,
Peter Zijlstra, Juri Lelli
Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups
On 5/7/26 6:33 AM, Guopeng Zhang wrote:
> cpuset_can_attach() preallocates destination SCHED_DEADLINE bandwidth
> before the attach commit point, while set_cpus_allowed_dl() later
> subtracts bandwidth from the source root domain when the task affinity is
> actually updated.
>
> Those two decisions must be made with the same CPU mask.
> cpuset_can_attach() used the destination cpuset effective mask directly,
> but cpuset_attach_task() first builds a per-task target mask which is
> constrained by task_cpu_possible_mask() and, if needed, by walking up the
> cpuset hierarchy. On asymmetric systems, the actual target mask can
> therefore be a strict subset of cs->effective_cpus.
The task_cpu_possible_mask() is there for a special class of arm64
systems where only some of the cores are able to run legacy 32-bit
applications. We can argue how likely it is that a DL task would be a
legacy 32-bit application, which is inherently slower than the same
application compiled into native 64-bit code. Perhaps we can just
disallow such a legacy 32-bit application from moving into the DL
scheduling class in the first place.
I am not in favor of making the cpuset code more complex to support
such a corner case which may never be utilized. Could you strip out the
task_cpu_possible_mask() part from this patch? We can revisit this with
another patch if such a special use case turns out to be useful to
support in the future.
Cheers,
Longman
>
> If the source root domain intersects cs->effective_cpus only on CPUs
> outside the task's possible mask, can_attach() can skip the destination
> reservation even though set_cpus_allowed_dl() later sees a real
> root-domain move and subtracts from the source domain.
>
> Extract the root-domain bandwidth-move test used by
> set_cpus_allowed_dl() into dl_task_needs_bw_move(), and make
> cpuset_can_attach() compute the same per-task target mask that
> cpuset_attach_task() applies.
>
> Keep nr_migrate_dl_tasks counting all migrating deadline tasks for
> cpuset DL task accounting. Restrict sum_migrate_dl_bw to the subset of
> tasks that need destination root-domain bandwidth reservation, because a
> deadline task can move between cpusets without moving bandwidth between
> root domains.
>
> This keeps the existing per-attach aggregate reservation model; it only
> changes the per-task mask used to decide which tasks contribute to that
> aggregate. The broader can_attach()/attach() transaction window is left
> unchanged.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> ---
> include/linux/sched/deadline.h | 9 +++
> kernel/cgroup/cpuset-internal.h | 1 +
> kernel/cgroup/cpuset.c | 97 ++++++++++++++++++++++-----------
> kernel/sched/deadline.c | 13 ++++-
> 4 files changed, 86 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
> index 1198138cb839..ddfd5216f3fc 100644
> --- a/include/linux/sched/deadline.h
> +++ b/include/linux/sched/deadline.h
> @@ -33,6 +33,15 @@ struct root_domain;
> extern void dl_add_task_root_domain(struct task_struct *p);
> extern void dl_clear_root_domain(struct root_domain *rd);
> extern void dl_clear_root_domain_cpu(int cpu);
> +/*
> + * Return whether moving DL task @p to @new_mask requires moving DL
> + * bandwidth accounting between root domains. This helper is specific to
> + * DL bandwidth move accounting semantics and is shared by
> + * cpuset_can_attach() and set_cpus_allowed_dl() so both paths use the
> + * same source root-domain test.
> + */
> +bool dl_task_needs_bw_move(struct task_struct *p,
> + const struct cpumask *new_mask);
>
> extern u64 dl_cookie;
> extern bool dl_bw_visited(int cpu, u64 cookie);
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index bb4e692bea30..f7aaf01f7cd5 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -167,6 +167,7 @@ struct cpuset {
> */
> int nr_deadline_tasks;
> int nr_migrate_dl_tasks;
> + /* DL bandwidth that needs destination reservation for this attach. */
> u64 sum_migrate_dl_bw;
> /*
> * CPU used for temporary DL bandwidth allocation during attach;
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index ae41736399a1..78c1a4071cc3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -485,6 +485,30 @@ static void guarantee_active_cpus(struct task_struct *tsk,
> rcu_read_unlock();
> }
>
> +/* Compute the effective CPU mask cpuset_attach_task() will apply to @tsk. */
> +static void cpuset_attach_task_cpus(struct cpuset *cs, struct task_struct *tsk,
> + struct cpumask *pmask)
> +{
> + const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
> +
> + lockdep_assert_cpuset_lock_held();
> +
> + if (cs == &top_cpuset) {
> + cpumask_andnot(pmask, possible_mask, subpartitions_cpus);
> + return;
> + }
> +
> + if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_active_mask)))
> + cpumask_copy(pmask, cpu_active_mask);
> +
> + rcu_read_lock();
> + while (!cpumask_intersects(cs->effective_cpus, pmask))
> + cs = parent_cs(cs);
> +
> + cpumask_and(pmask, pmask, cs->effective_cpus);
> + rcu_read_unlock();
> +}
> +
> /*
> * Return in *pmask the portion of a cpusets's mems_allowed that
> * are online, with memory. If none are online with memory, walk
> @@ -2986,6 +3010,14 @@ static void reset_migrate_dl_data(struct cpuset *cs)
> cs->dl_bw_cpu = -1;
> }
>
> +/*
> + * Protected by cpuset_mutex. cpus_attach is used by the can_attach/attach
> + * paths but we can't allocate it dynamically there. Define it global and
> + * allocate from cpuset_init().
> + */
> +static cpumask_var_t cpus_attach;
> +static nodemask_t cpuset_attach_nodemask_to;
> +
> /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
> static int cpuset_can_attach(struct cgroup_taskset *tset)
> {
> @@ -2993,7 +3025,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
> struct cpuset *cs, *oldcs;
> struct task_struct *task;
> bool setsched_check;
> - int ret;
> + int cpu = nr_cpu_ids, ret;
>
> /* used later by cpuset_attach() */
> cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
> @@ -3038,32 +3070,47 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
> }
>
> if (dl_task(task)) {
> + /*
> + * Count all migrating DL tasks for cpuset task accounting.
> + * Only tasks that need a root-domain bandwidth move
> + * contribute to sum_migrate_dl_bw.
> + */
> cs->nr_migrate_dl_tasks++;
> - cs->sum_migrate_dl_bw += task->dl.dl_bw;
> + cpuset_attach_task_cpus(cs, task, cpus_attach);
> +
> + if (dl_task_needs_bw_move(task, cpus_attach)) {
> + /*
> + * Keep the existing aggregate reservation model.
> + * Tasks in one attach enter the same destination
> + * cpuset, so the first CPU found for a task needing
> + * DL bandwidth reservation identifies the destination
> + * root domain.
> + */
> + if (cpu >= nr_cpu_ids)
> + cpu = cpumask_any_and(cpu_active_mask,
> + cpus_attach);
> + cs->sum_migrate_dl_bw += task->dl.dl_bw;
> + }
> }
> }
>
> - if (!cs->nr_migrate_dl_tasks)
> + if (!cs->sum_migrate_dl_bw)
> goto out_success;
>
> - if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
> - int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
> -
> - if (unlikely(cpu >= nr_cpu_ids)) {
> - reset_migrate_dl_data(cs);
> - ret = -EINVAL;
> - goto out_unlock;
> - }
> -
> - ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
> - if (ret) {
> - reset_migrate_dl_data(cs);
> - goto out_unlock;
> - }
> + if (unlikely(cpu >= nr_cpu_ids)) {
> + reset_migrate_dl_data(cs);
> + ret = -EINVAL;
> + goto out_unlock;
> + }
>
> - cs->dl_bw_cpu = cpu;
> + ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
> + if (ret) {
> + reset_migrate_dl_data(cs);
> + goto out_unlock;
> }
>
> + cs->dl_bw_cpu = cpu;
> +
> out_success:
> /*
> * Mark attach is in progress. This makes validate_change() fail
> @@ -3099,23 +3146,11 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
> mutex_unlock(&cpuset_mutex);
> }
>
> -/*
> - * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach_task()
> - * but we can't allocate it dynamically there. Define it global and
> - * allocate from cpuset_init().
> - */
> -static cpumask_var_t cpus_attach;
> -static nodemask_t cpuset_attach_nodemask_to;
> -
> static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
> {
> lockdep_assert_cpuset_lock_held();
>
> - if (cs != &top_cpuset)
> - guarantee_active_cpus(task, cpus_attach);
> - else
> - cpumask_andnot(cpus_attach, task_cpu_possible_mask(task),
> - subpartitions_cpus);
> + cpuset_attach_task_cpus(cs, task, cpus_attach);
> /*
> * can_attach beforehand should guarantee that this doesn't
> * fail. TODO: have a better way to handle failure here
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index edca7849b165..7db4c87df83b 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -3107,20 +3107,18 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
> static void set_cpus_allowed_dl(struct task_struct *p,
> struct affinity_context *ctx)
> {
> - struct root_domain *src_rd;
> struct rq *rq;
>
> WARN_ON_ONCE(!dl_task(p));
>
> rq = task_rq(p);
> - src_rd = rq->rd;
> /*
> * Migrating a SCHED_DEADLINE task between exclusive
> * cpusets (different root_domains) entails a bandwidth
> * update. We already made space for us in the destination
> * domain (see cpuset_can_attach()).
> */
> - if (!cpumask_intersects(src_rd->span, ctx->new_mask)) {
> + if (dl_task_needs_bw_move(p, ctx->new_mask)) {
> struct dl_bw *src_dl_b;
>
> src_dl_b = dl_bw_of(cpu_of(rq));
> @@ -3137,6 +3135,15 @@ static void set_cpus_allowed_dl(struct task_struct *p,
> set_cpus_allowed_common(p, ctx);
> }
>
> +bool dl_task_needs_bw_move(struct task_struct *p,
> + const struct cpumask *new_mask)
> +{
> + if (!dl_task(p))
> + return false;
> +
> + return !cpumask_intersects(task_rq(p)->rd->span, new_mask);
> +}
> +
> /* Assumes rq->lock is held */
> static void rq_online_dl(struct rq *rq)
> {
* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
2026-05-07 14:31 ` Waiman Long
@ 2026-05-08 2:14 ` Chen Ridong
2026-05-08 2:26 ` Waiman Long
0 siblings, 1 reply; 9+ messages in thread
From: Chen Ridong @ 2026-05-08 2:14 UTC
To: Waiman Long, Guopeng Zhang, Tejun Heo, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Juri Lelli
Cc: Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups
On 2026/5/7 22:31, Waiman Long wrote:
> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
>> state in the destination cpuset while walking the taskset.
>>
>> If a later task_can_attach() or security_task_setscheduler() check
>> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
>> and does not call cpuset_cancel_attach() for it. The partially
>> accumulated state is then left behind and can be consumed by a later
>> attach, corrupting cpuset DL task accounting and pending DL bandwidth
>> accounting.
>>
>> Reset the pending DL migration state before returning from those
>> per-task failure paths.
>>
>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>> kernel/cgroup/cpuset.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index e3a081a07c6d..ae41736399a1 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>> cgroup_taskset_for_each(task, css, tset) {
>> ret = task_can_attach(task);
>> if (ret)
>> - goto out_unlock;
>> + goto out_reset_dl_data;
>> if (setsched_check) {
>> ret = security_task_setscheduler(task);
>> if (ret)
>> - goto out_unlock;
>> + goto out_reset_dl_data;
>> }
>> if (dl_task(task)) {
>> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>> * changes which zero cpus/mems_allowed.
>> */
>> cs->attach_in_progress++;
>> + goto out_unlock;
>> +
>> +out_reset_dl_data:
>> + reset_migrate_dl_data(cs);
>> out_unlock:
>> mutex_unlock(&cpuset_mutex);
>> return ret;
>
> I would prefer the likely success path be a straight line instead of doing a
> goto. IOW, move out_reset_dl_data below return. Other than that, this patch
> looks good to me.
>
I've read the code and found several places that call reset_migrate_dl_data(cs).
I think it would be better to call reset_migrate_dl_data(cs) only when we
encounter an error, for example:
```
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
...
out_unlock:
if (ret)
reset_migrate_dl_data(cs);
mutex_unlock(&cpuset_mutex);
return ret;
}
```
After that, no other places would need to call reset_migrate_dl_data(cs), right?
--
Best regards,
Ridong
* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
2026-05-08 2:14 ` Chen Ridong
@ 2026-05-08 2:26 ` Waiman Long
2026-05-08 13:03 ` Guopeng Zhang
0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2026-05-08 2:26 UTC
To: Chen Ridong, Guopeng Zhang, Tejun Heo, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Juri Lelli
Cc: Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups
On 5/7/26 10:14 PM, Chen Ridong wrote:
>
> On 2026/5/7 22:31, Waiman Long wrote:
>> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>>> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
>>> state in the destination cpuset while walking the taskset.
>>>
>>> If a later task_can_attach() or security_task_setscheduler() check
>>> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
>>> and does not call cpuset_cancel_attach() for it. The partially
>>> accumulated state is then left behind and can be consumed by a later
>>> attach, corrupting cpuset DL task accounting and pending DL bandwidth
>>> accounting.
>>>
>>> Reset the pending DL migration state before returning from those
>>> per-task failure paths.
>>>
>>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>>> ---
>>> kernel/cgroup/cpuset.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index e3a081a07c6d..ae41736399a1 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>> cgroup_taskset_for_each(task, css, tset) {
>>> ret = task_can_attach(task);
>>> if (ret)
>>> - goto out_unlock;
>>> + goto out_reset_dl_data;
>>> if (setsched_check) {
>>> ret = security_task_setscheduler(task);
>>> if (ret)
>>> - goto out_unlock;
>>> + goto out_reset_dl_data;
>>> }
>>> if (dl_task(task)) {
>>> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>> * changes which zero cpus/mems_allowed.
>>> */
>>> cs->attach_in_progress++;
>>> + goto out_unlock;
>>> +
>>> +out_reset_dl_data:
>>> + reset_migrate_dl_data(cs);
>>> out_unlock:
>>> mutex_unlock(&cpuset_mutex);
>>> return ret;
>> I would prefer the likely success path be a straight line instead of doing a
>> goto. IOW, move out_reset_dl_data below return. Other than that, this patch
>> looks good to me.
>>
> I've read the code and found several places that call reset_migrate_dl_data(cs).
>
> I think it would be better to call reset_migrate_dl_data(cs) only when we
> encounter an error, for example:
>
> ```
> static int cpuset_can_attach(struct cgroup_taskset *tset)
> {
> ...
> out_unlock:
> if (ret)
> reset_migrate_dl_data(cs);
> mutex_unlock(&cpuset_mutex);
> return ret;
> }
> ```
> After that, no other places would need to call reset_migrate_dl_data(cs), right?
>
Yes, that should work too.
Cheers,
Longman
* Re: [PATCH v2 1/2] cgroup/cpuset: reset DL migration state on can_attach() failure
2026-05-08 2:26 ` Waiman Long
@ 2026-05-08 13:03 ` Guopeng Zhang
0 siblings, 0 replies; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-08 13:03 UTC
To: Waiman Long, Chen Ridong, Tejun Heo, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Juri Lelli
Cc: Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups
On 2026/5/8 10:26, Waiman Long wrote:
>
> On 5/7/26 10:14 PM, Chen Ridong wrote:
>>
>> On 2026/5/7 22:31, Waiman Long wrote:
>>> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>>>> cpuset_can_attach() accumulates temporary SCHED_DEADLINE migration
>>>> state in the destination cpuset while walking the taskset.
>>>>
>>>> If a later task_can_attach() or security_task_setscheduler() check
>>>> fails, cgroup_migrate_execute() treats cpuset as the failing subsystem
>>>> and does not call cpuset_cancel_attach() for it. The partially
>>>> accumulated state is then left behind and can be consumed by a later
>>>> attach, corrupting cpuset DL task accounting and pending DL bandwidth
>>>> accounting.
>>>>
>>>> Reset the pending DL migration state before returning from those
>>>> per-task failure paths.
>>>>
>>>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>>>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>>>> ---
>>>> kernel/cgroup/cpuset.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>> index e3a081a07c6d..ae41736399a1 100644
>>>> --- a/kernel/cgroup/cpuset.c
>>>> +++ b/kernel/cgroup/cpuset.c
>>>> @@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>> cgroup_taskset_for_each(task, css, tset) {
>>>> ret = task_can_attach(task);
>>>> if (ret)
>>>> - goto out_unlock;
>>>> + goto out_reset_dl_data;
>>>> if (setsched_check) {
>>>> ret = security_task_setscheduler(task);
>>>> if (ret)
>>>> - goto out_unlock;
>>>> + goto out_reset_dl_data;
>>>> }
>>>> if (dl_task(task)) {
>>>> @@ -3070,6 +3070,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>> * changes which zero cpus/mems_allowed.
>>>> */
>>>> cs->attach_in_progress++;
>>>> + goto out_unlock;
>>>> +
>>>> +out_reset_dl_data:
>>>> + reset_migrate_dl_data(cs);
>>>> out_unlock:
>>>> mutex_unlock(&cpuset_mutex);
>>>> return ret;
>>> I would prefer the likely success path be a straight line instead of doing a
>>> goto. IOW, move out_reset_dl_data below return. Other than that, this patch
>>> looks good to me.
>>>
>> I've read the code and found several places that call reset_migrate_dl_data(cs).
>>
>> I think it would be better to call reset_migrate_dl_data(cs) only when we
>> encounter an error, for example:
>>
>> ```
>> static int cpuset_can_attach(struct cgroup_taskset *tset)
>> {
>> ...
>> out_unlock:
>> if (ret)
>> reset_migrate_dl_data(cs);
>> mutex_unlock(&cpuset_mutex);
>> return ret;
>> }
>> ```
>> After that, no other places would need to call reset_migrate_dl_data(cs), right?
>>
> Yes, that should work too.
>
Thanks for the review.
Yes, I will update cpuset_can_attach() to use the common ret-based
cleanup in out_unlock.
Thanks,
Guopeng
> Cheers,
> Longman
* Re: [PATCH v2 2/2] cgroup/cpuset: align DL bandwidth reservation with attach target mask
2026-05-07 15:52 ` Waiman Long
@ 2026-05-08 13:11 ` Guopeng Zhang
0 siblings, 0 replies; 9+ messages in thread
From: Guopeng Zhang @ 2026-05-08 13:11 UTC
To: Waiman Long, Tejun Heo, Michal Koutný, Ingo Molnar,
Peter Zijlstra, Juri Lelli
Cc: Chen Ridong, Johannes Weiner, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Gabriele Monaco, Will Deacon, linux-kernel,
cgroups
On 2026/5/7 23:52, Waiman Long wrote:
> On 5/7/26 6:33 AM, Guopeng Zhang wrote:
>> cpuset_can_attach() preallocates destination SCHED_DEADLINE bandwidth
>> before the attach commit point, while set_cpus_allowed_dl() later
>> subtracts bandwidth from the source root domain when the task affinity is
>> actually updated.
>>
>> Those two decisions must be made with the same CPU mask.
>> cpuset_can_attach() used the destination cpuset effective mask directly,
>> but cpuset_attach_task() first builds a per-task target mask which is
>> constrained by task_cpu_possible_mask() and, if needed, by walking up the
>> cpuset hierarchy. On asymmetric systems, the actual target mask can
>> therefore be a strict subset of cs->effective_cpus.
>
> The task_cpu_possible_mask() is there for a special class of arm64
> systems where only some of the cores are able to run legacy 32-bit
> applications. We can argue how likely it is that a DL task would be a
> legacy 32-bit application, which is inherently slower than the same
> application compiled into native 64-bit code. Perhaps we can just
> disallow such a legacy 32-bit application from moving into the DL
> scheduling class in the first place.
>
> I am not in favor of making the cpuset code more complex to support
> such a corner case which may never be utilized. Could you strip out the
> task_cpu_possible_mask() part from this patch? We can revisit this with
> another patch if such a special use case turns out to be useful to
> support in the future.
>
Thanks for the review.
I agree. The task_cpu_possible_mask() case makes the fix broader and
adds more cpuset-side complexity than needed for this series.
I will drop the cpuset_attach_task() target-mask mirroring from v3 and
keep cpuset_can_attach() using cs->effective_cpus. The updated patch will
only share the root-domain bandwidth-move test with set_cpus_allowed_dl()
and only add a migrating DL task to sum_migrate_dl_bw when that task
actually needs a root-domain bandwidth move.
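Roughly, a sketch of that direction (assuming dl_task_needs_bw_move()
keeps the signature from this patch):
```
	if (dl_task(task)) {
		cs->nr_migrate_dl_tasks++;
		/* Reserve only for tasks that cross root domains. */
		if (dl_task_needs_bw_move(task, cs->effective_cpus))
			cs->sum_migrate_dl_bw += task->dl.dl_bw;
	}
```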
The task_cpu_possible_mask() corner case can be revisited separately if
there is a real need to support that scenario.
Thanks,
Guopeng
> Cheers,
> Longman
>
>>
>> If the source root domain intersects cs->effective_cpus only on CPUs
>> outside the task's possible mask, can_attach() can skip the destination
>> reservation even though set_cpus_allowed_dl() later sees a real
>> root-domain move and subtracts from the source domain.
>>
>> Extract the root-domain bandwidth-move test used by
>> set_cpus_allowed_dl() into dl_task_needs_bw_move(), and make
>> cpuset_can_attach() compute the same per-task target mask that
>> cpuset_attach_task() applies.
>>
>> Keep nr_migrate_dl_tasks counting all migrating deadline tasks for
>> cpuset DL task accounting. Restrict sum_migrate_dl_bw to the subset of
>> tasks that need destination root-domain bandwidth reservation, because a
>> deadline task can move between cpusets without moving bandwidth between
>> root domains.
>>
>> This keeps the existing per-attach aggregate reservation model; it only
>> changes the per-task mask used to decide which tasks contribute to that
>> aggregate. The broader can_attach()/attach() transaction window is left
>> unchanged.
>>
>> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>> include/linux/sched/deadline.h | 9 +++
>> kernel/cgroup/cpuset-internal.h | 1 +
>> kernel/cgroup/cpuset.c | 97 ++++++++++++++++++++++-----------
>> kernel/sched/deadline.c | 13 ++++-
>> 4 files changed, 86 insertions(+), 34 deletions(-)
>>