From: Ridong Chen <ridong.chen@linux.dev>
To: Waiman Long <longman@redhat.com>
Cc: cgroups@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cgroup/cpuset: Support multiple source/destination cpusets using pids pattern
Date: Sun, 7 Jun 2026 11:22:28 +0800
Message-ID: <67d1668e-432a-49cf-b0ae-3d0e7fcb255c@linux.dev>
In-Reply-To: <110456d5-7164-4032-ae4f-81a97ed96504@redhat.com>
On 6/6/2026 1:15 AM, Waiman Long wrote:
> On 6/5/26 3:35 AM, Ridong Chen wrote:
>>
>> On 6/4/2026 2:47 AM, Waiman Long wrote:
>>> On 6/3/26 6:26 AM, Ridong Chen wrote:
>>>> The current cpuset_can_attach() and cpuset_attach() functions assume
>>>> task migration is from one source cpuset to one destination cpuset.
>>>> This can be wrong in several scenarios:
>>>> - Moving a multi-threaded process with threads in different cpusets
>>>> - Disabling the cpuset controller (many children to one parent)
>>>> - Enabling the cpuset controller (one parent to many children)
>>>>
>>>> Fix this by adopting the pids subsystem's per-task accounting pattern.
>>>> In cpuset_can_attach(), use task_cs(task) to get the correct source
>>>> cpuset for each task (like pids_can_attach uses task_css), adjust
>>>> nr_deadline_tasks and reserve DL bandwidth per-task, and increment
>>>> attach_in_progress per-task on the destination cpuset. In
>>>> cpuset_attach(), handle destination cpuset changes within the task
>>>> iteration loop.
>>>>
>>>> A shared helper cpuset_undo_attach() reverses the per-task operations
>>>> for both partial rollback in cpuset_can_attach() and full reversal in
>>>> cpuset_cancel_attach().
>>>>
>>>> When multiple source cpusets are detected in can_attach(), set
>>>> attach_many_sources so that cpuset_attach() forces cpus_updated and
>>>> mems_updated to true, ensuring all tasks get properly updated
>>>> regardless of which source cpuset cpuset_attach_old_cs points to.
>>>>
>>>> This eliminates the need for nr_migrate_dl_tasks, sum_migrate_dl_bw,
>>>> and dl_bw_cpu fields in struct cpuset.
>>>>
>>>> Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
>>>> Signed-off-by: Ridong Chen <ridong.chen@linux.dev>
>>> It is not a problem doing per-task DL BW allocation and eliminating the
>>> *dl_bw* fields. However, updating nr_deadline_tasks before it is
>>> committed can be problematic.
>>>
>> Good to hear that.
>>
>>> nr_deadline_tasks is used in dl_rebuild_rd_accounting() which is called
>>> by partition_sched_domains_locked(). After the release of cpuset_mutex
>>> at the end of cpuset_can_attach() and before cpuset_attach() or
>>> cpuset_cancel_attach() is called, it is possible that
>>> partition_sched_domains_locked() can be called and
>>> dl_rebuild_rd_accounting() is not getting the right DL BW accounting
>>> information. So unless there is a way to confirm that this situation
>>> cannot happen, we can't change nr_deadline_tasks before the attach is
>>> committed.
>>>
>> We can keep the nr_migrate_dl_tasks field and update nr_deadline_tasks
>> once migration is complete. I think this will be much simpler than
>> fixing the issue using lists.
>>
> But we still need to track the set of source and destination cpusets to
> commit or cancel the change. Doing it task-by-task will add code in the
> cpuset_attach() and cpuset_cancel_attach() to check if a task is a DL
> task and act accordingly. So we are just trading task-by-task code with
> code to handle the lists.
>
I have resent a patch [1] that keeps nr_migrate_dl_tasks but removes
sum_migrate_dl_bw, dl_bw_cpu, attach_node, and attach_llist_head from
struct cpuset by calling dl_bw_alloc() per task directly in
cpuset_can_attach().

I am only offering it as a basis for discussing whether we can solve
this issue in a simpler way.
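
To make the idea concrete, here is a rough userspace model of the
staging I have in mind. It is not the kernel code and not the patch
itself: struct toy_cpuset, struct toy_task, and all toy_*() functions
are made up for illustration, locking is left out, and releasing the
source-side bandwidth is not modelled. The point is only that
can_attach reserves bandwidth per task and bumps the staged
nr_migrate_dl_tasks, while nr_deadline_tasks changes at commit time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; not the kernel's definitions. */
struct toy_cpuset {
	int nr_deadline_tasks;    /* committed count, what a rebuild would read */
	int nr_migrate_dl_tasks;  /* staged count of incoming DL tasks */
	uint64_t dl_bw_free;      /* bandwidth still available at the destination */
};

struct toy_task {
	struct toy_cpuset *cs;    /* cpuset the task currently belongs to */
	bool is_dl;               /* is this a deadline task? */
	uint64_t dl_bw;           /* bandwidth the task needs */
	bool bw_reserved;         /* true while a reservation is pending */
};

/* Drop pending reservations for the first n tasks (rollback and cancel). */
static void toy_undo_attach(struct toy_cpuset *dst, struct toy_task *tasks,
			    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].bw_reserved)
			continue;
		dst->dl_bw_free += tasks[i].dl_bw;
		dst->nr_migrate_dl_tasks--;
		tasks[i].bw_reserved = false;
	}
}

/* Reserve bandwidth per task; nr_deadline_tasks is left untouched here. */
static int toy_can_attach(struct toy_cpuset *dst, struct toy_task *tasks,
			  size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_task *t = &tasks[i];

		if (!t->is_dl || t->cs == dst)
			continue;
		if (dst->dl_bw_free < t->dl_bw) {
			toy_undo_attach(dst, tasks, i);  /* partial rollback */
			return -1;
		}
		dst->dl_bw_free -= t->dl_bw;
		dst->nr_migrate_dl_tasks++;
		t->bw_reserved = true;
	}
	return 0;
}

/* Commit: only now do the committed counters change, per source cpuset. */
static void toy_attach(struct toy_cpuset *dst, struct toy_task *tasks,
		       size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_task *t = &tasks[i];

		if (t->bw_reserved) {
			t->cs->nr_deadline_tasks--;
			dst->nr_deadline_tasks++;
			dst->nr_migrate_dl_tasks--;
			t->bw_reserved = false;
		}
		t->cs = dst;
	}
	/* Holds in this model because only one batch is pending at a time. */
	assert(dst->nr_migrate_dl_tasks == 0);
}

/* Cancel: release every pending reservation, counters stay as they were. */
static void toy_cancel_attach(struct toy_cpuset *dst, struct toy_task *tasks,
			      size_t n)
{
	toy_undo_attach(dst, tasks, n);
}
```

In this model each task carries its own source cpuset, so tasks coming
from different sources are handled without tracking a list of source
cpusets. The cost is the per-task DL check in the commit and cancel
paths, which is the trade-off you described above.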
[1] https://lore.kernel.org/cgroups/20260602023203.248077-1-longman@redhat.com/T/#mb2c6a3ae44f34f571db5dffa888212eaeaaea17a
--
Best regards,
Ridong