From: Guopeng Zhang <zhangguopeng@kylinos.cn>
To: Waiman Long <longman@redhat.com>,
tj@kernel.org, juri.lelli@redhat.com, chenridong@huaweicloud.com,
mkoutny@suse.com
Cc: hannes@cmpxchg.org, mingo@redhat.com, peterz@infradead.org,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, kprateek.nayak@amd.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH] cgroup/cpuset: make DL attach bandwidth reservation root-domain aware
Date: Thu, 30 Apr 2026 16:53:42 +0800 [thread overview]
Message-ID: <7a993bd9-2fc2-496c-b6ba-713f2c893cdc@kylinos.cn> (raw)
In-Reply-To: <6840e385-ef47-4f83-bf4c-8f80843f8c1d@redhat.com>
在 2026/4/24 22:15, Waiman Long 写道:
> On 4/21/26 4:34 AM, Guopeng Zhang wrote:
>> cpuset_can_attach() currently sums the bandwidth of all migrating
>> SCHED_DEADLINE tasks and reserves destination bandwidth whenever the
>> old and new cpuset effective CPU masks do not overlap.
>>
>> That condition is stronger than what the scheduler uses when migrating
>> a deadline task. set_cpus_allowed_dl() only subtracts bandwidth from
>> the source side when moving the task requires a DL bandwidth move
>> between root domains.
>>
>> As a result, moving a deadline task between disjoint member cpusets that
>> still belong to the same root domain can reserve destination bandwidth
>> even though no matching source-side subtraction happens. Successful
>> back-and-forth migrations between such cpusets can monotonically
>> increase dl_bw->total_bw.
>>
>> Fix this by extracting the source root-domain test already used by
>> set_cpus_allowed_dl() into a shared helper and make cpuset DL bandwidth
>> preallocation use that same condition. Count all migrating deadline
>> tasks for cpuset task accounting, but only accumulate sum_migrate_dl_bw
>> for tasks that actually need a DL bandwidth move. Reserve and rollback
>> bandwidth only for that subset.
>>
>> This keeps successful attach accounting aligned with
>> set_cpus_allowed_dl() and avoids double-accounting within a single
>> root domain.
>>
>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>> include/linux/sched/deadline.h | 9 +++++++++
>> kernel/cgroup/cpuset-internal.h | 1 +
>> kernel/cgroup/cpuset.c | 34 ++++++++++++++++-----------------
>> kernel/sched/deadline.c | 14 +++++++++++---
>> 4 files changed, 38 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
>> index 1198138cb839..273538200a44 100644
>> --- a/include/linux/sched/deadline.h
>> +++ b/include/linux/sched/deadline.h
>> @@ -33,6 +33,15 @@ struct root_domain;
>> extern void dl_add_task_root_domain(struct task_struct *p);
>> extern void dl_clear_root_domain(struct root_domain *rd);
>> extern void dl_clear_root_domain_cpu(int cpu);
>> +/*
>> + * Return whether moving DL task @p to @new_mask requires moving DL
>> + * bandwidth accounting between root domains. This helper is specific to
>> + * DL bandwidth move accounting semantics and is shared by
>> + * cpuset_can_attach() and set_cpus_allowed_dl() so both paths use the
>> + * same source root-domain test.
>> + */
>> +extern bool dl_task_needs_bw_move(struct task_struct *p,
>> + const struct cpumask *new_mask);
>> extern u64 dl_cookie;
>> extern bool dl_bw_visited(int cpu, u64 cookie);
>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>> index bb4e692bea30..f7aaf01f7cd5 100644
>> --- a/kernel/cgroup/cpuset-internal.h
>> +++ b/kernel/cgroup/cpuset-internal.h
>> @@ -167,6 +167,7 @@ struct cpuset {
>> */
>> int nr_deadline_tasks;
>> int nr_migrate_dl_tasks;
>> + /* DL bandwidth that needs destination reservation for this attach. */
>> u64 sum_migrate_dl_bw;
>> /*
>> * CPU used for temporary DL bandwidth allocation during attach;
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index e3a081a07c6d..761098b45f23 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -2993,7 +2993,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>> struct cpuset *cs, *oldcs;
>> struct task_struct *task;
>> bool setsched_check;
>> - int ret;
>> + int cpu, ret;
>> /* used later by cpuset_attach() */
>> cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
>> @@ -3039,31 +3039,31 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>> if (dl_task(task)) {
>> cs->nr_migrate_dl_tasks++;
>> - cs->sum_migrate_dl_bw += task->dl.dl_bw;
>> +
>> + if (dl_task_needs_bw_move(task, cs->effective_cpus))
>> + cs->sum_migrate_dl_bw += task->dl.dl_bw;
>> }
>> }
>> - if (!cs->nr_migrate_dl_tasks)
>> + if (!cs->sum_migrate_dl_bw)
>> goto out_success;
> You should make sure that cs->nr_migrate_dl_tasks is cleared too. Alternatively, you can move the dl task increment under the dl_task_needs_bw_move() check. It doesn't seem to do any harm if nr_migrate_dl_tasks is non-zero, but dl_bw is 0, but it still doesn't look right.😅 Sorry, I missed this part of your review earlier and only noticed the
RCU comment at first.
For nr_migrate_dl_tasks, my understanding is that it should remain
separate from sum_migrate_dl_bw.
nr_migrate_dl_tasks is used for cpuset-level DL task accounting, while
sum_migrate_dl_bw is used for destination root-domain bandwidth
reservation. A DL task may move between cpusets without requiring a
root-domain bandwidth move, but cpuset_attach() still needs to update
the cpuset DL task counts for that task.
So, unless I am missing another use of nr_migrate_dl_tasks, moving
nr_migrate_dl_tasks++ under the dl_task_needs_bw_move() check does not
look like the right fix to me. It could miss DL tasks which still need
cpuset DL task accounting updates, even though no DL bandwidth
reservation is needed.
Please let me know if I am missing something here, or if there is a
better way to represent this state in the code.
In v2, I plan to keep nr_migrate_dl_tasks counting all migrating DL
tasks, and only restrict sum_migrate_dl_bw to tasks that actually need
destination root-domain bandwidth reservation. I will also add a separate
first patch to reset the pending DL migration state
(nr_migrate_dl_tasks, sum_migrate_dl_bw and dl_bw_cpu) on
cpuset_can_attach() internal failure paths, so these temporary states
will not leak.
I still need to do some testing after my leave, and will send v2 after
that.
Thanks,
Guopeng
next prev parent reply other threads:[~2026-04-30 8:53 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 8:34 [PATCH] cgroup/cpuset: make DL attach bandwidth reservation root-domain aware Guopeng Zhang
2026-04-22 2:14 ` Guopeng Zhang
2026-04-24 14:15 ` Waiman Long
2026-04-26 13:48 ` Guopeng Zhang
2026-04-27 13:47 ` Waiman Long
2026-04-30 8:53 ` Guopeng Zhang [this message]
2026-04-30 12:39 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7a993bd9-2fc2-496c-b6ba-713f2c893cdc@kylinos.cn \
--to=zhangguopeng@kylinos.cn \
--cc=bsegall@google.com \
--cc=cgroups@vger.kernel.org \
--cc=chenridong@huaweicloud.com \
--cc=dietmar.eggemann@arm.com \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox