* [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks()
@ 2026-06-22 22:45 Waiman Long
2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long
2026-06-23 1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen
0 siblings, 2 replies; 5+ messages in thread
From: Waiman Long @ 2026-06-22 22:45 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Ridong Chen,
Jonathan Corbet, Shuah Khan
Cc: cgroups, linux-kernel, linux-doc, Waiman Long
As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform
unnecessary task iteration and updating of tasks' CPU and node masks
when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is
due to the fact that the temporary new_cpus and new_mems masks do not
inherit parent's effective_cpus/mems when they are empty which is the
expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset:
Enable cpuset controller in default hierarchy").
Fix that and avoid unnecessay work by adding the empty mask checks and
inheriting the parent's versions if empty.
[1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com
Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index aff86acea701..bc0207fd6e57 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3925,6 +3925,14 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
compute_effective_cpumask(&new_cpus, cs, parent);
nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
+ if (is_in_v2_mode()) {
+ /* Inherit parent's effective_cpus/mems if empty */
+ if (cpumask_empty(&new_cpus))
+ cpumask_copy(&new_cpus, parent->effective_cpus);
+ if (nodes_empty(new_mems))
+ new_mems = parent->effective_mems;
+ }
+
if (!tmp || !cs->partition_root_state)
goto update_tasks;
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() 2026-06-22 22:45 [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Waiman Long @ 2026-06-22 22:45 ` Waiman Long 2026-06-23 1:22 ` Ridong Chen 2026-06-23 1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen 1 sibling, 1 reply; 5+ messages in thread From: Waiman Long @ 2026-06-22 22:45 UTC (permalink / raw) To: Tejun Heo, Johannes Weiner, Michal Koutný, Ridong Chen, Jonathan Corbet, Shuah Khan Cc: cgroups, linux-kernel, linux-doc, Waiman Long As reported by sashiko [1], cpuset_update_tasks_nodemask() will do mpol_rebind_mm() and possibly cpuset_migrate_mm() for all threads of a multithreaded process. Since commit 3df9ca0a2b8b ("cpuset: migrate memory only for threadgroup leaders"), cpuset_attach() had been updated to rebind and migrate memory only for threadgroup leaders to mark the group leader as the owner of the mm_struct. To be consistent and avoid unnecessary performance overhead for heavily multithreaded processes, follow the cpuset_attach() example and perform memory rebind and migration only for threadgroup leaders. Also add a paragraph in cgroup-v2.rst under cpuset.mems that the threadgroup leader is the memory owner of that threadgroup. Therefore the non-leading threads shouldn't be in other cgroups whose "cpuset.mems" doesn't fully overleap that of the group leader. [1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com Signed-off-by: Waiman Long <longman@redhat.com> --- Documentation/admin-guide/cgroup-v2.rst | 7 +++++++ kernel/cgroup/cpuset.c | 4 ++++ 2 files changed, 11 insertions(+) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 993446ab66d0..341037c7ec9d 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2527,6 +2527,13 @@ Cpuset Interface Files a need to change "cpuset.mems" with active tasks, it shouldn't be done frequently. + For a multithreaded process, the threadgroup leader is + considered the owner of the group's memory. Memory policy + rebinding and migration will only happen with respect to the + threadgroup leader. To avoid unexpected result, non-leading + threads shouldn't be put into another cgroup whose "cpuset.mems" + doesn't full overleap that of the threadgroup leader. + cpuset.mems.effective A read-only multiple values file which exists on all cpuset-enabled cgroups. diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index bc0207fd6e57..27bc7a466468 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2659,6 +2659,10 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs) cpuset_change_task_nodemask(task, &newmems); + /* Rebind and migrate mm only for task group leader */ + if (task != task->group_leader) + continue; + mm = get_task_mm(task); if (!mm) continue; -- 2.54.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() 2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long @ 2026-06-23 1:22 ` Ridong Chen 0 siblings, 0 replies; 5+ messages in thread From: Ridong Chen @ 2026-06-23 1:22 UTC (permalink / raw) To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet, Shuah Khan Cc: cgroups, linux-kernel, linux-doc On 6/23/2026 6:45 AM, Waiman Long wrote: > As reported by sashiko [1], cpuset_update_tasks_nodemask() will do > mpol_rebind_mm() and possibly cpuset_migrate_mm() for all threads of > a multithreaded process. Since commit 3df9ca0a2b8b ("cpuset: migrate > memory only for threadgroup leaders"), cpuset_attach() had been updated > to rebind and migrate memory only for threadgroup leaders to mark the > group leader as the owner of the mm_struct. > > To be consistent and avoid unnecessary performance overhead for heavily > multithreaded processes, follow the cpuset_attach() example and perform > memory rebind and migration only for threadgroup leaders. > > Also add a paragraph in cgroup-v2.rst under cpuset.mems that the > threadgroup leader is the memory owner of that threadgroup. Therefore > the non-leading threads shouldn't be in other cgroups whose "cpuset.mems" > doesn't fully overleap that of the group leader. > > [1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com > > Signed-off-by: Waiman Long <longman@redhat.com> > --- > Documentation/admin-guide/cgroup-v2.rst | 7 +++++++ > kernel/cgroup/cpuset.c | 4 ++++ > 2 files changed, 11 insertions(+) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > index 993446ab66d0..341037c7ec9d 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -2527,6 +2527,13 @@ Cpuset Interface Files > a need to change "cpuset.mems" with active tasks, it shouldn't > be done frequently. > > + For a multithreaded process, the threadgroup leader is > + considered the owner of the group's memory. Memory policy > + rebinding and migration will only happen with respect to the > + threadgroup leader. To avoid unexpected result, non-leading > + threads shouldn't be put into another cgroup whose "cpuset.mems" > + doesn't full overleap that of the threadgroup leader. > + > cpuset.mems.effective > A read-only multiple values file which exists on all > cpuset-enabled cgroups. > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index bc0207fd6e57..27bc7a466468 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -2659,6 +2659,10 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs) > > cpuset_change_task_nodemask(task, &newmems); > > + /* Rebind and migrate mm only for task group leader */ > + if (task != task->group_leader) > + continue; > + Nit. if (!thread_group_leader(task)) continue; > mm = get_task_mm(task); > if (!mm) > continue; Reviewed-by: Ridong Chen <ridong.chen@linux.dev> -- Best regards Ridong ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() 2026-06-22 22:45 [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Waiman Long 2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long @ 2026-06-23 1:14 ` Ridong Chen 2026-06-23 5:58 ` Waiman Long 1 sibling, 1 reply; 5+ messages in thread From: Ridong Chen @ 2026-06-23 1:14 UTC (permalink / raw) To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet, Shuah Khan Cc: cgroups, linux-kernel, linux-doc On 6/23/2026 6:45 AM, Waiman Long wrote: > As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform > unnecessary task iteration and updating of tasks' CPU and node masks > when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is > due to the fact that the temporary new_cpus and new_mems masks do not > inherit parent's effective_cpus/mems when they are empty which is the > expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset: > Enable cpuset controller in default hierarchy"). > > Fix that and avoid unnecessay work by adding the empty mask checks and > inheriting the parent's versions if empty. > > [1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com > > Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy") > Signed-off-by: Waiman Long <longman@redhat.com> > --- > kernel/cgroup/cpuset.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index aff86acea701..bc0207fd6e57 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -3925,6 +3925,14 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp) > compute_effective_cpumask(&new_cpus, cs, parent); > nodes_and(new_mems, cs->mems_allowed, parent->effective_mems); > > + if (is_in_v2_mode()) { > + /* Inherit parent's effective_cpus/mems if empty */ > + if (cpumask_empty(&new_cpus)) > + cpumask_copy(&new_cpus, parent->effective_cpus); > + if (nodes_empty(new_mems)) > + new_mems = parent->effective_mems; > + } > + > if (!tmp || !cs->partition_root_state) > goto update_tasks; > I noticed that compute_effective_cpumask(...) is called in several places, so I think the logic should be consolidated into that function. ``` static void compute_effective_cpumask(struct cpumask *new_cpus, struct cpuset *cs, struct cpuset *parent) { cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus); if (cpumask_empty(&new_cpus) && is_in_v2_mode()) cpumask_copy(&new_cpus, parent->effective_cpus); } ``` Similarly, for new_mems, should we introduce a dedicated helper like compute_effective_nodemask? The same fallback logic is needed in update_nodemasks_hier: ``` static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems) { ... bool has_mems = nodes_and(*new_mems, cp->mems_allowed, parent->effective_mems); /* * If it becomes empty, inherit the effective mask of the * parent, which is guaranteed to have some MEMs. */ if (is_in_v2_mode() && !has_mems) *new_mems = parent->effective_mems; ... ``` -- Best regards Ridong ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() 2026-06-23 1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen @ 2026-06-23 5:58 ` Waiman Long 0 siblings, 0 replies; 5+ messages in thread From: Waiman Long @ 2026-06-23 5:58 UTC (permalink / raw) To: Ridong Chen, Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet, Shuah Khan Cc: cgroups, linux-kernel, linux-doc On 6/22/26 9:14 PM, Ridong Chen wrote: > > > On 6/23/2026 6:45 AM, Waiman Long wrote: >> As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform >> unnecessary task iteration and updating of tasks' CPU and node masks >> when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is >> due to the fact that the temporary new_cpus and new_mems masks do not >> inherit parent's effective_cpus/mems when they are empty which is the >> expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset: >> Enable cpuset controller in default hierarchy"). >> >> Fix that and avoid unnecessay work by adding the empty mask checks and >> inheriting the parent's versions if empty. >> >> [1] >> https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com >> >> Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default >> hierarchy") >> Signed-off-by: Waiman Long <longman@redhat.com> >> --- >> kernel/cgroup/cpuset.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c >> index aff86acea701..bc0207fd6e57 100644 >> --- a/kernel/cgroup/cpuset.c >> +++ b/kernel/cgroup/cpuset.c >> @@ -3925,6 +3925,14 @@ static void cpuset_hotplug_update_tasks(struct >> cpuset *cs, struct tmpmasks *tmp) >> compute_effective_cpumask(&new_cpus, cs, parent); >> nodes_and(new_mems, cs->mems_allowed, parent->effective_mems); >> + if (is_in_v2_mode()) { >> + /* Inherit parent's effective_cpus/mems if empty */ >> + if (cpumask_empty(&new_cpus)) >> + cpumask_copy(&new_cpus, parent->effective_cpus); >> + if (nodes_empty(new_mems)) >> + new_mems = parent->effective_mems; >> + } >> + >> if (!tmp || !cs->partition_root_state) >> goto update_tasks; > > I noticed that compute_effective_cpumask(...) is called in several > places, so I think the logic should be consolidated into that function. > > ``` > static void compute_effective_cpumask(struct cpumask *new_cpus, > struct cpuset *cs, struct cpuset *parent) > { > cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus); > if (cpumask_empty(&new_cpus) && is_in_v2_mode()) > cpumask_copy(&new_cpus, parent->effective_cpus); > } > > ``` > > Similarly, for new_mems, should we introduce a dedicated helper like > compute_effective_nodemask? The same fallback logic is needed in > update_nodemasks_hier: > > > ``` > static void update_nodemasks_hier(struct cpuset *cs, nodemask_t > *new_mems) > { > ... > bool has_mems = nodes_and(*new_mems, cp->mems_allowed, > parent->effective_mems); > > /* > * If it becomes empty, inherit the effective mask of the > * parent, which is guaranteed to have some MEMs. > */ > if (is_in_v2_mode() && !has_mems) > *new_mems = parent->effective_mems; > ... > ``` > Yes, that makes sense. Will adopt this approach in the next version. Cheers, Longman ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-23 5:58 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-22 22:45 [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Waiman Long 2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long 2026-06-23 1:22 ` Ridong Chen 2026-06-23 1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen 2026-06-23 5:58 ` Waiman Long
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox