Linux cgroups development
From: Ridong Chen <ridong.chen@linux.dev>
To: Waiman Long <longman@redhat.com>
Cc: cgroups@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cgroup/cpuset: Support multiple source/destination cpusets using pids pattern
Date: Sun, 7 Jun 2026 11:22:28 +0800
Message-ID: <67d1668e-432a-49cf-b0ae-3d0e7fcb255c@linux.dev>
In-Reply-To: <110456d5-7164-4032-ae4f-81a97ed96504@redhat.com>



On 6/6/2026 1:15 AM, Waiman Long wrote:
> On 6/5/26 3:35 AM, Ridong Chen wrote:
>>
>> On 6/4/2026 2:47 AM, Waiman Long wrote:
>>> On 6/3/26 6:26 AM, Ridong Chen wrote:
>>>> The current cpuset_can_attach() and cpuset_attach() functions assume task
>>>> migration is from one source cpuset to one destination cpuset. This can be
>>>> wrong in several scenarios:
>>>>    - Moving a multi-threaded process with threads in different cpusets
>>>>    - Disabling the cpuset controller (many children to one parent)
>>>>    - Enabling the cpuset controller (one parent to many children)
>>>>
>>>> Fix this by adopting the pids subsystem's per-task accounting pattern.
>>>> In cpuset_can_attach(), use task_cs(task) to get the correct source cpuset
>>>> for each task (like pids_can_attach uses task_css), adjust nr_deadline_tasks
>>>> and reserve DL bandwidth per-task, and increment attach_in_progress per-task
>>>> on the destination cpuset. In cpuset_attach(), handle destination cpuset
>>>> changes within the task iteration loop.
>>>>
>>>> A shared helper cpuset_undo_attach() reverses the per-task operations for
>>>> both partial rollback in cpuset_can_attach() and full reversal in
>>>> cpuset_cancel_attach().
>>>>
>>>> When multiple source cpusets are detected in can_attach(), set
>>>> attach_many_sources so that cpuset_attach() forces cpus_updated and
>>>> mems_updated to true, ensuring all tasks get properly updated regardless
>>>> of which source cpuset cpuset_attach_old_cs points to.
>>>>
>>>> This eliminates the need for nr_migrate_dl_tasks, sum_migrate_dl_bw, and
>>>> dl_bw_cpu fields in struct cpuset.
>>>>
>>>> Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
>>>> Signed-off-by: Ridong Chen <ridong.chen@linux.dev>
>>> It is not a problem doing per-task DL BW allocation and eliminating the
>>> *dl_bw* fields. However, updating nr_deadline_tasks before it is
>>> committed can be problematic.
>>>
>> Good to hear that.
>>
>>> nr_deadline_tasks is used in dl_rebuild_rd_accounting(), which is called
>>> by partition_sched_domains_locked(). After cpuset_mutex is released at
>>> the end of cpuset_can_attach() and before cpuset_attach() or
>>> cpuset_cancel_attach() is called, it is possible that
>>> partition_sched_domains_locked() is called and
>>> dl_rebuild_rd_accounting() does not get the right DL BW accounting
>>> information. So unless there is a way to confirm that this situation
>>> cannot happen, we can't change nr_deadline_tasks before the attach is
>>> committed.
>>>
>> We can keep the nr_migrate_dl_tasks field and update nr_deadline_tasks
>> once migration is complete. I think this will be much simpler than
>> fixing the issue using lists.
>>
> But we still need to track the set of source and destination cpusets to
> commit or cancel the change. Doing it task-by-task will add code in
> cpuset_attach() and cpuset_cancel_attach() to check whether a task is a DL
> task and act accordingly. So we are just trading task-by-task code for
> code to handle the lists.
> 

I have resent a patch [1] that keeps nr_migrate_dl_tasks but eliminates
sum_migrate_dl_bw, dl_bw_cpu, attach_node, and attach_llist_head from
the cpuset structure by calling dl_bw_alloc() per task directly in
cpuset_can_attach().

I am only offering this as a way to discuss whether we can solve this
issue more simply.
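
To make the idea easier to discuss, here is a rough user-space model of
the flow. It is only a sketch and not the patch itself: every model_*
name, the single bandwidth pool, and the plain task array are
assumptions made for illustration. It shows DL bandwidth reserved per
task in the can_attach step with partial rollback on failure, while
nr_deadline_tasks is only touched in the attach step, so a rebuild that
runs in between still sees the committed counts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only; none of these types are the kernel's. */
struct model_rd {			/* stand-in for one DL bandwidth pool */
	unsigned long cap;
	unsigned long used;
};

struct model_cpuset {
	int nr_deadline_tasks;		/* committed count */
	int nr_migrate_dl_tasks;	/* pending count, valid during migration */
};

struct model_task {
	struct model_cpuset *cs;	/* per-task source cpuset */
	bool is_dl;
	unsigned long dl_bw;
};

static int model_dl_bw_alloc(struct model_rd *rd, unsigned long bw)
{
	if (bw > rd->cap - rd->used)
		return -EBUSY;
	rd->used += bw;
	return 0;
}

static void model_dl_bw_free(struct model_rd *rd, unsigned long bw)
{
	rd->used -= bw;
}

/* Reverse the reservations made for the first n tasks. */
static void model_undo_attach(struct model_task *tasks, size_t n,
			      struct model_cpuset *dst, struct model_rd *rd)
{
	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].is_dl)
			continue;
		model_dl_bw_free(rd, tasks[i].dl_bw);
		dst->nr_migrate_dl_tasks--;
	}
}

/* Reserve bandwidth task by task; nr_deadline_tasks is left untouched. */
static int model_can_attach(struct model_task *tasks, size_t n,
			    struct model_cpuset *dst, struct model_rd *rd)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (!tasks[i].is_dl)
			continue;
		ret = model_dl_bw_alloc(rd, tasks[i].dl_bw);
		if (ret) {
			/* Partial rollback of tasks 0..i-1. */
			model_undo_attach(tasks, i, dst, rd);
			return ret;
		}
		dst->nr_migrate_dl_tasks++;
	}
	return 0;
}

/* Full reversal when the migration is cancelled. */
static void model_cancel_attach(struct model_task *tasks, size_t n,
				struct model_cpuset *dst, struct model_rd *rd)
{
	model_undo_attach(tasks, n, dst, rd);
}

/* Commit: only here do the committed counts move, per source cpuset. */
static void model_attach(struct model_task *tasks, size_t n,
			 struct model_cpuset *dst)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].is_dl) {
			tasks[i].cs->nr_deadline_tasks--;
			dst->nr_deadline_tasks++;
			dst->nr_migrate_dl_tasks--;
		}
		tasks[i].cs = dst;
	}
	assert(dst->nr_migrate_dl_tasks == 0);
}
```

Note that the commit loop still has to check each task for DL and look
up its source cpuset, which is the task-by-task cost you pointed out.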

[1]
https://lore.kernel.org/cgroups/20260602023203.248077-1-longman@redhat.com/T/#mb2c6a3ae44f34f571db5dffa888212eaeaaea17a
-- 
Best regards,
Ridong


Thread overview: 19+ messages
2026-06-02  2:31 [PATCH-next v5 0/6] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
2026-06-02  2:31 ` [PATCH-next v5 1/6] cgroup/cpuset: Fix node inconsistencies between cpuset_update_tasks_nodemask() and cpuset_attach() Waiman Long
2026-06-02 13:37   ` Ridong Chen
2026-06-02 18:43     ` Waiman Long
2026-06-02  2:31 ` [PATCH-next v5 2/6] cgroup/cpuset: Add a cpuset_reserve_dl_bw() helper Waiman Long
2026-06-02 13:40   ` Ridong Chen
2026-06-02  2:32 ` [PATCH-next v5 3/6] cgroup/cpuset: Expand the scope of cpuset_can_attach_check() Waiman Long
2026-06-02 13:51   ` Ridong Chen
2026-06-02  2:32 ` [PATCH-next v5 4/6] cgroup/cpuset: Make cpuset_attach_old_cs track task group leaders Waiman Long
2026-06-02 13:58   ` Ridong Chen
2026-06-02  2:32 ` [PATCH-next v5 5/6] cgroup/cpuset: Move mpol_rebind_mm/cpuset_migrate_mm() calls inside cpuset_attach_task() Waiman Long
2026-06-02  2:32 ` [PATCH-next v5 6/6] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach() Waiman Long
2026-06-03 10:26   ` [PATCH] cgroup/cpuset: Support multiple source/destination cpusets using pids pattern Ridong Chen
2026-06-03 10:32     ` Ridong Chen
2026-06-03 18:47     ` Waiman Long
2026-06-05  7:35       ` Ridong Chen
2026-06-05 17:15         ` Waiman Long
2026-06-07  3:12           ` Ridong Chen
2026-06-07  3:22           ` Ridong Chen [this message]
