All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks()
@ 2026-06-22 22:45 Waiman Long
  2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long
  2026-06-23  1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen
  0 siblings, 2 replies; 5+ messages in thread
From: Waiman Long @ 2026-06-22 22:45 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný, Ridong Chen,
	Jonathan Corbet, Shuah Khan
  Cc: cgroups, linux-kernel, linux-doc, Waiman Long

As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform
unnecessary task iteration and updating of tasks' CPU and node masks
when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is
due to the fact that the temporary new_cpus and new_mems masks do not
inherit parent's effective_cpus/mems when they are empty which is the
expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset:
Enable cpuset controller in default hierarchy").

Fix that and avoid unnecessay work by adding the empty mask checks and
inheriting the parent's versions if empty.

[1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com

Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index aff86acea701..bc0207fd6e57 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3925,6 +3925,14 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 	compute_effective_cpumask(&new_cpus, cs, parent);
 	nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
 
+	if (is_in_v2_mode()) {
+		/* Inherit parent's effective_cpus/mems if empty */
+		if (cpumask_empty(&new_cpus))
+			cpumask_copy(&new_cpus, parent->effective_cpus);
+		if (nodes_empty(new_mems))
+			new_mems = parent->effective_mems;
+	}
+
 	if (!tmp || !cs->partition_root_state)
 		goto update_tasks;
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask()
  2026-06-22 22:45 [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Waiman Long
@ 2026-06-22 22:45 ` Waiman Long
  2026-06-23  1:22   ` Ridong Chen
  2026-06-23  1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen
  1 sibling, 1 reply; 5+ messages in thread
From: Waiman Long @ 2026-06-22 22:45 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný, Ridong Chen,
	Jonathan Corbet, Shuah Khan
  Cc: cgroups, linux-kernel, linux-doc, Waiman Long

As reported by sashiko [1], cpuset_update_tasks_nodemask() will do
mpol_rebind_mm() and possibly cpuset_migrate_mm() for all threads of
a multithreaded process. Since commit 3df9ca0a2b8b ("cpuset: migrate
memory only for threadgroup leaders"), cpuset_attach() had been updated
to rebind and migrate memory only for threadgroup leaders to mark the
group leader as the owner of the mm_struct.

To be consistent and avoid unnecessary performance overhead for heavily
multithreaded processes, follow the cpuset_attach() example and perform
memory rebind and migration only for threadgroup leaders.

Also add a paragraph in cgroup-v2.rst under cpuset.mems that the
threadgroup leader is the memory owner of that threadgroup. Therefore
the non-leading threads shouldn't be in other cgroups whose "cpuset.mems"
doesn't fully overleap that of the group leader.

[1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 7 +++++++
 kernel/cgroup/cpuset.c                  | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 993446ab66d0..341037c7ec9d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2527,6 +2527,13 @@ Cpuset Interface Files
 	a need to change "cpuset.mems" with active tasks, it shouldn't
 	be done frequently.
 
+	For a multithreaded process, the threadgroup leader is
+	considered the owner of the group's memory. Memory policy
+	rebinding and migration will only happen with respect to the
+	threadgroup leader. To avoid unexpected result, non-leading
+	threads shouldn't be put into another cgroup whose "cpuset.mems"
+	doesn't full overleap that of the threadgroup leader.
+
   cpuset.mems.effective
 	A read-only multiple values file which exists on all
 	cpuset-enabled cgroups.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index bc0207fd6e57..27bc7a466468 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2659,6 +2659,10 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
 
 		cpuset_change_task_nodemask(task, &newmems);
 
+		/* Rebind and migrate mm only for task group leader */
+		if (task != task->group_leader)
+			continue;
+
 		mm = get_task_mm(task);
 		if (!mm)
 			continue;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks()
  2026-06-22 22:45 [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Waiman Long
  2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long
@ 2026-06-23  1:14 ` Ridong Chen
  2026-06-23  5:58   ` Waiman Long
  1 sibling, 1 reply; 5+ messages in thread
From: Ridong Chen @ 2026-06-23  1:14 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan
  Cc: cgroups, linux-kernel, linux-doc



On 6/23/2026 6:45 AM, Waiman Long wrote:
> As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform
> unnecessary task iteration and updating of tasks' CPU and node masks
> when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is
> due to the fact that the temporary new_cpus and new_mems masks do not
> inherit parent's effective_cpus/mems when they are empty which is the
> expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset:
> Enable cpuset controller in default hierarchy").
> 
> Fix that and avoid unnecessay work by adding the empty mask checks and
> inheriting the parent's versions if empty.
> 
> [1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com
> 
> Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/cgroup/cpuset.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index aff86acea701..bc0207fd6e57 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3925,6 +3925,14 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
>   	compute_effective_cpumask(&new_cpus, cs, parent);
>   	nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
>   
> +	if (is_in_v2_mode()) {
> +		/* Inherit parent's effective_cpus/mems if empty */
> +		if (cpumask_empty(&new_cpus))
> +			cpumask_copy(&new_cpus, parent->effective_cpus);
> +		if (nodes_empty(new_mems))
> +			new_mems = parent->effective_mems;
> +	}
> +
>   	if (!tmp || !cs->partition_root_state)
>   		goto update_tasks;
>   

I noticed that compute_effective_cpumask(...) is called in several 
places, so I think the logic should be consolidated into that function.

```
static void compute_effective_cpumask(struct cpumask *new_cpus,
				      struct cpuset *cs, struct cpuset *parent)
{
	cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
	if (cpumask_empty(&new_cpus) && is_in_v2_mode())
		cpumask_copy(&new_cpus, parent->effective_cpus);
}

```

Similarly, for new_mems, should we introduce a dedicated helper like 
compute_effective_nodemask? The same fallback logic is needed in 
update_nodemasks_hier:


```
static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
{
...
		bool has_mems = nodes_and(*new_mems, cp->mems_allowed, 
parent->effective_mems);

		/*
		 * If it becomes empty, inherit the effective mask of the
		 * parent, which is guaranteed to have some MEMs.
		 */
		if (is_in_v2_mode() && !has_mems)
			*new_mems = parent->effective_mems;
...
```

-- 
Best regards
Ridong


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask()
  2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long
@ 2026-06-23  1:22   ` Ridong Chen
  0 siblings, 0 replies; 5+ messages in thread
From: Ridong Chen @ 2026-06-23  1:22 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan
  Cc: cgroups, linux-kernel, linux-doc



On 6/23/2026 6:45 AM, Waiman Long wrote:
> As reported by sashiko [1], cpuset_update_tasks_nodemask() will do
> mpol_rebind_mm() and possibly cpuset_migrate_mm() for all threads of
> a multithreaded process. Since commit 3df9ca0a2b8b ("cpuset: migrate
> memory only for threadgroup leaders"), cpuset_attach() had been updated
> to rebind and migrate memory only for threadgroup leaders to mark the
> group leader as the owner of the mm_struct.
> 
> To be consistent and avoid unnecessary performance overhead for heavily
> multithreaded processes, follow the cpuset_attach() example and perform
> memory rebind and migration only for threadgroup leaders.
> 
> Also add a paragraph in cgroup-v2.rst under cpuset.mems that the
> threadgroup leader is the memory owner of that threadgroup. Therefore
> the non-leading threads shouldn't be in other cgroups whose "cpuset.mems"
> doesn't fully overleap that of the group leader.
> 
> [1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   Documentation/admin-guide/cgroup-v2.rst | 7 +++++++
>   kernel/cgroup/cpuset.c                  | 4 ++++
>   2 files changed, 11 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 993446ab66d0..341037c7ec9d 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2527,6 +2527,13 @@ Cpuset Interface Files
>   	a need to change "cpuset.mems" with active tasks, it shouldn't
>   	be done frequently.
>   
> +	For a multithreaded process, the threadgroup leader is
> +	considered the owner of the group's memory. Memory policy
> +	rebinding and migration will only happen with respect to the
> +	threadgroup leader. To avoid unexpected result, non-leading
> +	threads shouldn't be put into another cgroup whose "cpuset.mems"
> +	doesn't full overleap that of the threadgroup leader.
> +
>     cpuset.mems.effective
>   	A read-only multiple values file which exists on all
>   	cpuset-enabled cgroups.
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index bc0207fd6e57..27bc7a466468 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2659,6 +2659,10 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
>   
>   		cpuset_change_task_nodemask(task, &newmems);
>   
> +		/* Rebind and migrate mm only for task group leader */
> +		if (task != task->group_leader)
> +			continue;
> +

Nit.

if (!thread_group_leader(task))
     continue;

>   		mm = get_task_mm(task);
>   		if (!mm)
>   			continue;

Reviewed-by: Ridong Chen <ridong.chen@linux.dev>

-- 
Best regards
Ridong


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks()
  2026-06-23  1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen
@ 2026-06-23  5:58   ` Waiman Long
  0 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2026-06-23  5:58 UTC (permalink / raw)
  To: Ridong Chen, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan
  Cc: cgroups, linux-kernel, linux-doc

On 6/22/26 9:14 PM, Ridong Chen wrote:
>
>
> On 6/23/2026 6:45 AM, Waiman Long wrote:
>> As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform
>> unnecessary task iteration and updating of tasks' CPU and node masks
>> when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is
>> due to the fact that the temporary new_cpus and new_mems masks do not
>> inherit parent's effective_cpus/mems when they are empty which is the
>> expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset:
>> Enable cpuset controller in default hierarchy").
>>
>> Fix that and avoid unnecessay work by adding the empty mask checks and
>> inheriting the parent's versions if empty.
>>
>> [1] 
>> https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com
>>
>> Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default 
>> hierarchy")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/cgroup/cpuset.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index aff86acea701..bc0207fd6e57 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -3925,6 +3925,14 @@ static void cpuset_hotplug_update_tasks(struct 
>> cpuset *cs, struct tmpmasks *tmp)
>>       compute_effective_cpumask(&new_cpus, cs, parent);
>>       nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
>>   +    if (is_in_v2_mode()) {
>> +        /* Inherit parent's effective_cpus/mems if empty */
>> +        if (cpumask_empty(&new_cpus))
>> +            cpumask_copy(&new_cpus, parent->effective_cpus);
>> +        if (nodes_empty(new_mems))
>> +            new_mems = parent->effective_mems;
>> +    }
>> +
>>       if (!tmp || !cs->partition_root_state)
>>           goto update_tasks;
>
> I noticed that compute_effective_cpumask(...) is called in several 
> places, so I think the logic should be consolidated into that function.
>
> ```
> static void compute_effective_cpumask(struct cpumask *new_cpus,
>                       struct cpuset *cs, struct cpuset *parent)
> {
>     cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
>     if (cpumask_empty(&new_cpus) && is_in_v2_mode())
>         cpumask_copy(&new_cpus, parent->effective_cpus);
> }
>
> ```
>
> Similarly, for new_mems, should we introduce a dedicated helper like 
> compute_effective_nodemask? The same fallback logic is needed in 
> update_nodemasks_hier:
>
>
> ```
> static void update_nodemasks_hier(struct cpuset *cs, nodemask_t 
> *new_mems)
> {
> ...
>         bool has_mems = nodes_and(*new_mems, cp->mems_allowed, 
> parent->effective_mems);
>
>         /*
>          * If it becomes empty, inherit the effective mask of the
>          * parent, which is guaranteed to have some MEMs.
>          */
>         if (is_in_v2_mode() && !has_mems)
>             *new_mems = parent->effective_mems;
> ...
> ```
>
Yes, that makes sense. Will adopt this approach in the next version.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-23  5:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22 22:45 [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Waiman Long
2026-06-22 22:45 ` [PATCH 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long
2026-06-23  1:22   ` Ridong Chen
2026-06-23  1:14 ` [PATCH 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Ridong Chen
2026-06-23  5:58   ` Waiman Long

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.