Linux cgroups development
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Guopeng Zhang <zhangguopeng@kylinos.cn>,
	tj@kernel.org, juri.lelli@redhat.com, chenridong@huaweicloud.com,
	mkoutny@suse.com
Cc: hannes@cmpxchg.org, mingo@redhat.com, peterz@infradead.org,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kprateek.nayak@amd.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH] cgroup/cpuset: make DL attach bandwidth reservation root-domain aware
Date: Thu, 30 Apr 2026 08:39:51 -0400	[thread overview]
Message-ID: <77c54abb-3568-49f6-a3c4-c2743f8810f2@redhat.com> (raw)
In-Reply-To: <7a993bd9-2fc2-496c-b6ba-713f2c893cdc@kylinos.cn>

On 4/30/26 4:53 AM, Guopeng Zhang wrote:
>
> 在 2026/4/24 22:15, Waiman Long 写道:
>> On 4/21/26 4:34 AM, Guopeng Zhang wrote:
>>> cpuset_can_attach() currently sums the bandwidth of all migrating
>>> SCHED_DEADLINE tasks and reserves destination bandwidth whenever the
>>> old and new cpuset effective CPU masks do not overlap.
>>>
>>> That condition is stronger than what the scheduler uses when migrating
>>> a deadline task. set_cpus_allowed_dl() only subtracts bandwidth from
>>> the source side when moving the task requires a DL bandwidth move
>>> between root domains.
>>>
>>> As a result, moving a deadline task between disjoint member cpusets that
>>> still belong to the same root domain can reserve destination bandwidth
>>> even though no matching source-side subtraction happens. Successful
>>> back-and-forth migrations between such cpusets can monotonically
>>> increase dl_bw->total_bw.
>>>
>>> Fix this by extracting the source root-domain test already used by
>>> set_cpus_allowed_dl() into a shared helper and make cpuset DL bandwidth
>>> preallocation use that same condition. Count all migrating deadline
>>> tasks for cpuset task accounting, but only accumulate sum_migrate_dl_bw
>>> for tasks that actually need a DL bandwidth move. Reserve and rollback
>>> bandwidth only for that subset.
>>>
>>> This keeps successful attach accounting aligned with
>>> set_cpus_allowed_dl() and avoids double-accounting within a single
>>> root domain.
>>>
>>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>>> ---
>>>    include/linux/sched/deadline.h  |  9 +++++++++
>>>    kernel/cgroup/cpuset-internal.h |  1 +
>>>    kernel/cgroup/cpuset.c          | 34 ++++++++++++++++-----------------
>>>    kernel/sched/deadline.c         | 14 +++++++++++---
>>>    4 files changed, 38 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
>>> index 1198138cb839..273538200a44 100644
>>> --- a/include/linux/sched/deadline.h
>>> +++ b/include/linux/sched/deadline.h
>>> @@ -33,6 +33,15 @@ struct root_domain;
>>>    extern void dl_add_task_root_domain(struct task_struct *p);
>>>    extern void dl_clear_root_domain(struct root_domain *rd);
>>>    extern void dl_clear_root_domain_cpu(int cpu);
>>> +/*
>>> + * Return whether moving DL task @p to @new_mask requires moving DL
>>> + * bandwidth accounting between root domains. This helper is specific to
>>> + * DL bandwidth move accounting semantics and is shared by
>>> + * cpuset_can_attach() and set_cpus_allowed_dl() so both paths use the
>>> + * same source root-domain test.
>>> + */
>>> +extern bool dl_task_needs_bw_move(struct task_struct *p,
>>> +                  const struct cpumask *new_mask);
>>>      extern u64 dl_cookie;
>>>    extern bool dl_bw_visited(int cpu, u64 cookie);
>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>> index bb4e692bea30..f7aaf01f7cd5 100644
>>> --- a/kernel/cgroup/cpuset-internal.h
>>> +++ b/kernel/cgroup/cpuset-internal.h
>>> @@ -167,6 +167,7 @@ struct cpuset {
>>>         */
>>>        int nr_deadline_tasks;
>>>        int nr_migrate_dl_tasks;
>>> +    /* DL bandwidth that needs destination reservation for this attach. */
>>>        u64 sum_migrate_dl_bw;
>>>        /*
>>>         * CPU used for temporary DL bandwidth allocation during attach;
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index e3a081a07c6d..761098b45f23 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -2993,7 +2993,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>        struct cpuset *cs, *oldcs;
>>>        struct task_struct *task;
>>>        bool setsched_check;
>>> -    int ret;
>>> +    int cpu, ret;
>>>          /* used later by cpuset_attach() */
>>>        cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
>>> @@ -3039,31 +3039,31 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>>>              if (dl_task(task)) {
>>>                cs->nr_migrate_dl_tasks++;
>>> -            cs->sum_migrate_dl_bw += task->dl.dl_bw;
>>> +
>>> +            if (dl_task_needs_bw_move(task, cs->effective_cpus))
>>> +                cs->sum_migrate_dl_bw += task->dl.dl_bw;
>>>            }
>>>        }
>>>    -    if (!cs->nr_migrate_dl_tasks)
>>> +    if (!cs->sum_migrate_dl_bw)
>>>            goto out_success;
>> You should make sure that cs->nr_migrate_dl_tasks is cleared too. Alternatively, you can move the dl task increment under the dl_task_needs_bw_move() check. It doesn't seem to do any harm if nr_migrate_dl_tasks is non-zero, but dl_bw is 0, but it still doesn't look right.😅 Sorry, I missed this part of your review earlier and only noticed the
> RCU comment at first.
>
> For nr_migrate_dl_tasks, my understanding is that it should remain
> separate from sum_migrate_dl_bw.
>
> nr_migrate_dl_tasks is used for cpuset-level DL task accounting, while
> sum_migrate_dl_bw is used for destination root-domain bandwidth
> reservation. A DL task may move between cpusets without requiring a
> root-domain bandwidth move, but cpuset_attach() still needs to update
> the cpuset DL task counts for that task.
>
> So, unless I am missing another use of nr_migrate_dl_tasks, moving
> nr_migrate_dl_tasks++ under the dl_task_needs_bw_move() check does not
> look like the right fix to me. It could miss DL tasks which still need
> cpuset DL task accounting updates, even though no DL bandwidth
> reservation is needed.
>
> Please let me know if I am missing something here, or if there is a
> better way to represent this state in the code.

You are right. Cpuset structure has a nr_deadline_tasks count that 
requires proper accounting of nr_migrate_dl_tasks to set this field 
correctly.

Cheers,
Longman

>
> In v2, I plan to keep nr_migrate_dl_tasks counting all migrating DL
> tasks, and only restrict sum_migrate_dl_bw to tasks that actually need
> destination root-domain bandwidth reservation. I will also add a separate
> first patch to reset the pending DL migration state
> (nr_migrate_dl_tasks, sum_migrate_dl_bw and dl_bw_cpu) on
> cpuset_can_attach() internal failure paths, so these temporary states
> will not leak.
>
> I still need to do some testing after my leave, and will send v2 after
> that.
>
> Thanks,
> Guopeng
>


      reply	other threads:[~2026-04-30 12:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  8:34 [PATCH] cgroup/cpuset: make DL attach bandwidth reservation root-domain aware Guopeng Zhang
2026-04-22  2:14 ` Guopeng Zhang
2026-04-24 14:15 ` Waiman Long
2026-04-26 13:48   ` Guopeng Zhang
2026-04-27 13:47     ` Waiman Long
2026-04-30  8:53   ` Guopeng Zhang
2026-04-30 12:39     ` Waiman Long [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=77c54abb-3568-49f6-a3c4-c2743f8810f2@redhat.com \
    --to=longman@redhat.com \
    --cc=bsegall@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chenridong@huaweicloud.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=zhangguopeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox