Linux cgroups development
From: Waiman Long <longman@redhat.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Aaron Tomlin <atomlin@atomlin.com>,
	tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com
Cc: chenridong@huaweicloud.com, neelx@suse.com,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cpuset: Fix multi-source deadline task accounting and bandwidth bypass
Date: Wed, 13 May 2026 19:19:18 -0400
Message-ID: <7ae7fe29-6405-41e3-9f3b-6c1d0255dc9e@redhat.com>
In-Reply-To: <ddc8040f-2186-4c72-a69e-26b388cb7249@arm.com>

On 5/13/26 12:22 PM, Dietmar Eggemann wrote:
> On 12.05.26 03:03, Aaron Tomlin wrote:
>> During a batch migration where threads in a taskset originate from
>> multiple source cpusets (e.g., via cgroup.procs), cpuset_can_attach()
>> and cpuset_attach() currently evaluate the source cpuset exactly once
>> by caching the first task's oldcs.
>>
>> This creates two distinct critical flaws for SCHED_DEADLINE tasks:
>>
>>      1.  oldcs->nr_deadline_tasks is decremented solely on the first
>>          source cpuset. If tasks originated from other cpusets, their
>>          counts are permanently leaked, and the first cpuset permanently
>>          underflows.
>>
>>      2.  cpumask_intersects() is evaluated strictly against the first
>>          task's source cpuset. This allows tasks originating from
>>          entirely isolated root domains to silently bypass the
>>          dl_bw_alloc() admission control.
>>
>> This patch refactors the deadline accounting to evaluate task_cs(task)
>> on a per-task basis during the cgroup_taskset_for_each() loops. To
>> achieve accurate accounting before the core cgroup migration actually
>> executes, the permanent nr_deadline_tasks increments/decrements are
>> shifted into cpuset_can_attach(). If the migration aborts, the counts
>> are gracefully reverted via an internal rollback loop or the
>> cpuset_cancel_attach() callback.
> Is there a testcase to provoke this issue in the current code?
>
> I tried to move a process with 6 DL tasks from one cpuset to another by:
>
> echo $PID > /sys/fs/cgroup/B/cgroup.procs
>
> but in this case old_cs is the same for all these tasks.
>
> [ 1991.852034] cgroup_migrate() (7) leader=[dl_batch_cgroup 823] threadgroup=1
> [ 1991.852068] cgroup_migrate_execute() tset->nr_tasks=7
> [ 1991.852238] cpuset_can_attach() (4) [dl_batch_cgroup 832] nr_migrate_dl_tasks=1 sum_migrate_dl_bw=104857 old_cs=ffff0000c4955200
> [ 1991.852246] cpuset_can_attach() (4) [dl_batch_cgroup 833] nr_migrate_dl_tasks=2 sum_migrate_dl_bw=209714 old_cs=ffff0000c4955200
> [ 1991.852248] cpuset_can_attach() (4) [dl_batch_cgroup 834] nr_migrate_dl_tasks=3 sum_migrate_dl_bw=314571 old_cs=ffff0000c4955200
> [ 1991.852249] cpuset_can_attach() (4) [dl_batch_cgroup 835] nr_migrate_dl_tasks=4 sum_migrate_dl_bw=419428 old_cs=ffff0000c4955200
> [ 1991.852249] cpuset_can_attach() (4) [dl_batch_cgroup 836] nr_migrate_dl_tasks=5 sum_migrate_dl_bw=524285 old_cs=ffff0000c4955200
> [ 1991.852250] cpuset_can_attach() (4) [dl_batch_cgroup 837] nr_migrate_dl_tasks=6 sum_migrate_dl_bw=629142 old_cs=ffff0000c4955200
> [ 1991.852328] cpuset_attach() (5) cs=ffff0000c1e9fc00 oldcs=ffff0000c4955200 cs->nr_deadline_tasks=6 oldcs->nr_deadline_tasks=6 cs->nr_migrate_dl_tasks=6
>
> dl_batch_cgroup     823     823  19      -   0 TS
> dl_batch_cgroup     823     832 140      0   - DLN
> dl_batch_cgroup     823     833 140      0   - DLN
> dl_batch_cgroup     823     834 140      0   - DLN
> dl_batch_cgroup     823     835 140      0   - DLN
> dl_batch_cgroup     823     836 140      0   - DLN
> dl_batch_cgroup     823     837 140      0   - DLN
>
> [...]

Multiple source or destination cpusets in a task migration can only
happen when the cpuset controller is enabled or disabled in a cgroup
subtree. If there are DL tasks in two or more child cgroups, enabling or
disabling the cpuset controller for those child cgroups may lead to
incorrect DL task accounting. This patch will probably fix the DL
accounting aspect. However, there are other issues unrelated to DL
tasks that also need to be addressed, so the patch is incomplete in
that regard. I am working on a patch series to address these issues
and hope to send it out in a day or two.

Cheers,
Longman
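
The accounting problem described in the quoted patch text can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `Cpuset` and `Task` classes, both `migrate_*` functions, and `demo()` are hypothetical names invented for the illustration, and the only field borrowed from the thread is `nr_deadline_tasks`. It assumes three DL tasks from two source cpusets (A and B) are moved to one destination in a single batch, which per the reply above would only arise when the cpuset controller is enabled or disabled in a subtree.

```python
from dataclasses import dataclass


@dataclass
class Cpuset:
    """Toy stand-in for a cpuset; only the DL task counter is modelled."""
    name: str
    nr_deadline_tasks: int = 0


@dataclass
class Task:
    """Toy stand-in for a task; `cs` plays the role of task_cs(task)."""
    pid: int
    cs: Cpuset
    is_deadline: bool = True


def migrate_cached_oldcs(tasks, dst):
    """Model of the behaviour the patch description criticises:
    every departing DL task is charged to the first task's source cpuset."""
    if not tasks:
        return
    oldcs = tasks[0].cs  # source cpuset is looked up once and reused
    nr_migrate_dl_tasks = sum(t.is_deadline for t in tasks)
    dst.nr_deadline_tasks += nr_migrate_dl_tasks
    oldcs.nr_deadline_tasks -= nr_migrate_dl_tasks
    for t in tasks:
        t.cs = dst


def migrate_per_task(tasks, dst):
    """Model of per-task accounting: each DL task is subtracted from
    the cpuset it actually came from."""
    for t in tasks:
        if t.is_deadline:
            t.cs.nr_deadline_tasks -= 1
            dst.nr_deadline_tasks += 1
        t.cs = dst


def demo(migrate):
    """Move two DL tasks from A and one from B into dst; return the counters."""
    a = Cpuset("A", nr_deadline_tasks=2)
    b = Cpuset("B", nr_deadline_tasks=1)
    dst = Cpuset("dst")
    tasks = [Task(1, a), Task(2, a), Task(3, b)]
    migrate(tasks, dst)
    return a.nr_deadline_tasks, b.nr_deadline_tasks, dst.nr_deadline_tasks


if __name__ == "__main__":
    print(demo(migrate_cached_oldcs))  # (-1, 1, 3): A goes negative, B keeps a stale count
    print(demo(migrate_per_task))      # (0, 0, 3): every source is debited correctly
```

In the single-source case that Dietmar tested, both functions give the same result, which is consistent with his log showing one old_cs for all six tasks.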



Thread overview: 6+ messages
2026-05-12  1:03 [PATCH] cpuset: Fix multi-source deadline task accounting and bandwidth bypass Aaron Tomlin
2026-05-13 16:22 ` Dietmar Eggemann
2026-05-13 23:09   ` Aaron Tomlin
2026-05-13 23:19   ` Waiman Long [this message]
2026-05-13 23:39     ` Aaron Tomlin
2026-05-14  4:26       ` Waiman Long
