* [PATCH v2] cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
From: Waiman Long @ 2025-05-08 19:24 UTC
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups, linux-kernel, Xi Wang, Frederic Weisbecker, Waiman Long
Commit ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask()
on top_cpuset") enabled us to pull CPUs dedicated to child partitions
from tasks in top_cpuset by ignoring per cpu kthreads. However, there
can be other kthreads that are not per cpu but have the PF_NO_SETAFFINITY
flag set to indicate that we shouldn't mess with their CPU affinity.
The affinity of these kthreads is currently changed to skip CPUs dedicated
to child partitions, whether the partition is an isolating or a scheduling one.

As all the per cpu kthreads have PF_NO_SETAFFINITY set, the
PF_NO_SETAFFINITY tasks are essentially a superset of per cpu kthreads.
Fix this issue by dropping the kthread_is_per_cpu() check and checking
the PF_NO_SETAFFINITY flag instead.
Fixes: ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset")
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index d0143b3dce47..967603300ee3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1130,9 +1130,11 @@ void cpuset_update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 
 		if (top_cs) {
 			/*
-			 * Percpu kthreads in top_cpuset are ignored
+			 * PF_NO_SETAFFINITY tasks are ignored.
+			 * All per cpu kthreads should have PF_NO_SETAFFINITY
+			 * flag set, see kthread_set_per_cpu().
 			 */
-			if (kthread_is_per_cpu(task))
+			if (task->flags & PF_NO_SETAFFINITY)
 				continue;
 			cpumask_andnot(new_cpus, possible_mask, subpartitions_cpus);
 		} else {
--
2.49.0
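
The effect of the one-line change can be modelled outside the kernel. The sketch below is a simplified userspace illustration, not kernel code: cpumasks are plain 64-bit words, and `struct demo_task`, the `FAKE_PF_*` bits and both helper functions are hypothetical stand-ins (the real flag values live in include/linux/sched.h); the `per_cpu` field models what kthread_is_per_cpu() would return. It shows a kthread that is pinned via PF_NO_SETAFFINITY but is not per-CPU: the old check rewrites its affinity, the new check leaves it alone, and ordinary tasks are treated the same either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits only; not the kernel's real values. */
#define FAKE_PF_KTHREAD        (1u << 0)
#define FAKE_PF_NO_SETAFFINITY (1u << 1)

typedef uint64_t cpumask64_t; /* one bit per CPU, CPUs 0..63 */

/* Hypothetical stand-in for the parts of struct task_struct we need. */
struct demo_task {
	unsigned int flags;
	bool per_cpu;        /* models kthread_is_per_cpu(task) */
	cpumask64_t allowed; /* current CPU affinity */
};

/* top_cpuset branch after the patch; returns true if affinity changed. */
static bool top_cpuset_update_new(struct demo_task *t, cpumask64_t possible_mask,
				  cpumask64_t subpartitions_cpus)
{
	if (t->flags & FAKE_PF_NO_SETAFFINITY)
		return false; /* covers per-CPU kthreads and other pinned tasks */
	t->allowed = possible_mask & ~subpartitions_cpus; /* like cpumask_andnot() */
	return true;
}

/* top_cpuset branch before the patch, for comparison. */
static bool top_cpuset_update_old(struct demo_task *t, cpumask64_t possible_mask,
				  cpumask64_t subpartitions_cpus)
{
	if (t->per_cpu)
		return false; /* only per-CPU kthreads were skipped */
	t->allowed = possible_mask & ~subpartitions_cpus;
	return true;
}
```

With CPUs 4-7 given to child partitions (possible 0xff, subpartitions 0xf0), a flagged but non-per-CPU kthread pinned to 0x3 is moved to 0x0f by the old check and stays on 0x3 with the new one.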
* Re: [PATCH v2] cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
From: Frederic Weisbecker @ 2025-05-09 13:18 UTC
To: Waiman Long
Cc: Tejun Heo, Johannes Weiner, Michal Koutný, cgroups,
linux-kernel, Xi Wang
On Thu, May 08, 2025 at 03:24:13PM -0400, Waiman Long wrote:
> Commit ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask()
> on top_cpuset") enabled us to pull CPUs dedicated to child partitions
> from tasks in top_cpuset by ignoring per cpu kthreads. However, there
> can be other kthreads that are not per cpu but have PF_NO_SETAFFINITY
> flag set to indicate that we shouldn't mess with their CPU affinity.
> For other kthreads, their affinity will be changed to skip CPUs dedicated
> to child partitions whether it is an isolating or a scheduling one.
>
> As all the per cpu kthreads have PF_NO_SETAFFINITY set, the
> PF_NO_SETAFFINITY tasks are essentially a superset of per cpu kthreads.
> Fix this issue by dropping the kthread_is_per_cpu() check and checking
> the PF_NO_SETAFFINITY flag instead.
>
> Fixes: ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index d0143b3dce47..967603300ee3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1130,9 +1130,11 @@ void cpuset_update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>
> if (top_cs) {
> /*
> - * Percpu kthreads in top_cpuset are ignored
> + * PF_NO_SETAFFINITY tasks are ignored.
> + * All per cpu kthreads should have PF_NO_SETAFFINITY
> + * flag set, see kthread_set_per_cpu().
> */
> - if (kthread_is_per_cpu(task))
> + if (task->flags & PF_NO_SETAFFINITY)
> continue;
> cpumask_andnot(new_cpus, possible_mask, subpartitions_cpus);
Acked-by: Frederic Weisbecker <frederic@kernel.org>
But this makes me realize I overlooked that when I introduced the unbound kthreads
centralized affinity.

cpuset_update_tasks_cpumask() seems to blindly affine tasks away from subpartitions_cpus
while unbound kthreads might have their own preferences (per-node or arbitrary cpumasks).

So I need to route that through the kthread API.
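
To make the concern concrete, here is a minimal sketch assuming an unbound kthread records a preferred cpumask (for example the CPUs of one NUMA node). The struct, field names and fallback policy are assumptions for illustration, not the kthread API: a blind update replaces the affinity with everything outside subpartitions_cpus, while a preference-aware update keeps the preferred CPUs that remain usable and falls back to the full usable set only when none are left.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask64_t; /* one bit per CPU */

/* Hypothetical model of an unbound kthread with an affinity preference. */
struct fake_kthread {
	cpumask64_t preferred; /* e.g. CPUs of one NUMA node */
	cpumask64_t allowed;   /* current affinity */
};

/* Blind update: the preference is lost. */
static void blind_update(struct fake_kthread *k, cpumask64_t possible,
			 cpumask64_t subpartitions_cpus)
{
	k->allowed = possible & ~subpartitions_cpus;
}

/* Assumed policy: stay on usable preferred CPUs, else use all usable CPUs. */
static void preference_aware_update(struct fake_kthread *k, cpumask64_t possible,
				    cpumask64_t subpartitions_cpus)
{
	cpumask64_t usable = possible & ~subpartitions_cpus;
	cpumask64_t mask = k->preferred & usable;

	k->allowed = mask ? mask : usable;
}
```

For a kthread preferring CPUs 0-3 when CPUs 0-1 go to a partition, the blind update spreads it over CPUs 2-7, while the preference-aware one keeps it on CPUs 2-3.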

It seems that subpartitions_cpus doesn't contain nohz_full= CPUs.
But it excludes isolcpus=. And it's usually sane to assume that
nohz_full= CPUs are isolated.

I think I can just rename update_unbound_workqueue_cpumask()
to update_unbound_kthreads_cpumask() and then handle unbound
kthreads from there along with workqueues, and then have
cpuset_update_tasks_cpumask() ignore kthreads completely.
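
A rough sketch of the proposed split, under the assumption that kthreads are identified by PF_KTHREAD and bound ones by PF_NO_SETAFFINITY: the cpuset side skips every kthread, and a single hypothetical update function refreshes the unbound workqueue mask and the unbound kthreads together. All names prefixed `fake_`/`split_` or suffixed `_sketch` are invented here, and `usable` stands for the possible CPUs minus the isolated ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;

/* Illustrative flag bits only. */
#define FAKE_PF_KTHREAD        (1u << 0)
#define FAKE_PF_NO_SETAFFINITY (1u << 1)

struct split_task {
	unsigned int flags;
	cpumask64_t preferred; /* only meaningful for unbound kthreads */
	cpumask64_t allowed;
};

/* Models the unbound workqueue cpumask. */
static cpumask64_t fake_wq_unbound_cpumask;

/* cpuset side: only user tasks are touched; all kthreads are left alone. */
static void cpuset_side_update(struct split_task *tasks, size_t n,
			       cpumask64_t usable)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].flags & FAKE_PF_KTHREAD)
			continue;
		tasks[i].allowed = usable;
	}
}

/*
 * Hypothetical update_unbound_kthreads_cpumask(): updates the unbound
 * workqueue mask and every unbound kthread in one place, honouring each
 * kthread's preference and skipping bound (PF_NO_SETAFFINITY) ones.
 */
static void update_unbound_kthreads_cpumask_sketch(struct split_task *tasks,
						   size_t n, cpumask64_t usable)
{
	fake_wq_unbound_cpumask = usable;
	for (size_t i = 0; i < n; i++) {
		struct split_task *t = &tasks[i];
		cpumask64_t mask;

		if (!(t->flags & FAKE_PF_KTHREAD) ||
		    (t->flags & FAKE_PF_NO_SETAFFINITY))
			continue;
		mask = t->preferred & usable;
		t->allowed = mask ? mask : usable;
	}
}
```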
Let me think about it (but feel free to apply the current patch meanwhile).
Thanks.
--
Frederic Weisbecker
SUSE Labs
* Re: [PATCH v2] cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
From: Waiman Long @ 2025-05-09 14:08 UTC
To: Frederic Weisbecker
Cc: Tejun Heo, Johannes Weiner, Michal Koutný, cgroups,
linux-kernel, Xi Wang
On 5/9/25 9:18 AM, Frederic Weisbecker wrote:
> On Thu, May 08, 2025 at 03:24:13PM -0400, Waiman Long wrote:
>> [...]
> Acked-by: Frederic Weisbecker <frederic@kernel.org>
>
> But this makes me realize I overlooked that when I introduced the unbound kthreads
> centralized affinity.
>
> cpuset_update_tasks_cpumask() seems to blindly affine tasks away from subpartitions_cpus
> while unbound kthreads might have their own preferences (per-node or arbitrary cpumasks).
>
> So I need to route that through the kthread API.
AFAIU, the kthread_bind_mask() and kthread_bind() functions set
PF_NO_SETAFFINITY.
>
> It seems that subpartitions_cpus doesn't contain nohz_full= CPUs.
> But it excludes isolcpus=. And it's usually sane to assume that
> nohz_full= CPUs are isolated.
Most users that want isolated CPUs will set both isolcpus and nohz_full
to the same set of CPUs. I do see that RH OpenShift can set nohz_full
for a collection of CPUs that may be dynamically isolated later on via
a cpuset partition.
>
> I think I can just rename update_unbound_workqueue_cpumask()
> to update_unbound_kthreads_cpumask() and then handle unbound
> kthreads from there along with workqueues. And then completely
> ignore kthreads from cpuset_update_tasks_cpumask().
I guess we can do that. Right now, update_unbound_workqueue_cpumask() is
only called to exclude isolated CPUs, whereas
cpuset_update_tasks_cpumask() updates affinity for both isolated
and scheduling partitions. I agree that there is code duplication here.
To suit Xi Wang's use case, we may have to add a sysctl parameter, for
instance, to decide whether unbound kthreads should be updated in the
scheduling partition case.
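
One way such a knob could behave, sketched with a hypothetical boolean `sysctl_kthreads_follow_sched_partitions` (no such sysctl exists; its name and semantics are assumptions): CPUs in isolated partitions are always removed from the affinity of unbound kthreads, while CPUs in scheduling partitions are removed only when the knob is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;

/* Hypothetical knob, default off: kthreads ignore scheduling partitions. */
static bool sysctl_kthreads_follow_sched_partitions;

/* CPUs an unbound kthread may use under the assumed policy. */
static cpumask64_t unbound_kthread_mask(cpumask64_t possible,
					cpumask64_t isolated_cpus,
					cpumask64_t sched_partition_cpus)
{
	cpumask64_t excluded = isolated_cpus; /* always excluded */

	if (sysctl_kthreads_follow_sched_partitions)
		excluded |= sched_partition_cpus; /* excluded only on request */
	return possible & ~excluded;
}
```

With CPUs 6-7 isolated and CPUs 4-5 in a scheduling partition, unbound kthreads would keep CPUs 0-5 by default and be confined to CPUs 0-3 with the knob set.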
Cheers,
Longman
* Re: [PATCH v2] cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
From: Tejun Heo @ 2025-05-09 17:30 UTC
To: Frederic Weisbecker
Cc: Waiman Long, Johannes Weiner, Michal Koutný, cgroups,
linux-kernel, Xi Wang
Hello,
On Fri, May 09, 2025 at 03:18:17PM +0200, Frederic Weisbecker wrote:
...
> But this makes me realize I overlooked that when I introduced the unbound kthreads
> centralized affinity.
>
> cpuset_update_tasks_cpumask() seems to blindly affine tasks away from subpartitions_cpus
> while unbound kthreads might have their own preferences (per-node or arbitrary cpumasks).
>
> So I need to route that through the kthread API.
I wonder whether it'd be cleaner if all kthread affinity restrictions go
through housekeeping instead of cpuset modifying the cpumasks directly so
that housekeeping keeps track of where different classes of kthreads can run
and tell e.g. workqueue what to do.
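
A minimal sketch of that ownership model, with every name hypothetical: housekeeping keeps an effective cpumask per kthread class, cpuset only reports which CPUs it wants isolated, and each owning subsystem (a stand-in for workqueue here) is told its new mask through a callback instead of cpuset rewriting task cpumasks itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask64_t;

/* Hypothetical kthread classes tracked by housekeeping. */
enum fake_hk_class { FAKE_HK_UNBOUND_KTHREAD, FAKE_HK_UNBOUND_WQ, FAKE_HK_NR };

typedef void (*fake_hk_notify_fn)(cpumask64_t new_mask);

struct fake_housekeeping {
	cpumask64_t base[FAKE_HK_NR];        /* per-class restriction, e.g. user-set */
	cpumask64_t effective[FAKE_HK_NR];   /* base minus isolated CPUs */
	fake_hk_notify_fn notify[FAKE_HK_NR]; /* owner to tell, may be NULL */
};

/* cpuset reports its isolated CPUs; housekeeping recomputes and notifies. */
static void fake_hk_set_isolated(struct fake_housekeeping *hk,
				 cpumask64_t isolated)
{
	for (int c = 0; c < FAKE_HK_NR; c++) {
		hk->effective[c] = hk->base[c] & ~isolated;
		if (hk->notify[c])
			hk->notify[c](hk->effective[c]);
	}
}

/* Example subscriber standing in for workqueue. */
static cpumask64_t fake_wq_mask;

static void fake_wq_notify(cpumask64_t new_mask)
{
	fake_wq_mask = new_mask;
}
```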
Thanks.
--
tejun
* Re: [PATCH v2] cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
From: Tejun Heo @ 2025-05-09 17:35 UTC
To: Waiman Long
Cc: Johannes Weiner, Michal Koutný, cgroups, linux-kernel,
Xi Wang, Frederic Weisbecker
On Thu, May 08, 2025 at 03:24:13PM -0400, Waiman Long wrote:
> Commit ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask()
> on top_cpuset") enabled us to pull CPUs dedicated to child partitions
> from tasks in top_cpuset by ignoring per cpu kthreads. However, there
> can be other kthreads that are not per cpu but have PF_NO_SETAFFINITY
> flag set to indicate that we shouldn't mess with their CPU affinity.
> For other kthreads, their affinity will be changed to skip CPUs dedicated
> to child partitions whether it is an isolating or a scheduling one.
>
> As all the per cpu kthreads have PF_NO_SETAFFINITY set, the
> PF_NO_SETAFFINITY tasks are essentially a superset of per cpu kthreads.
> Fix this issue by dropping the kthread_is_per_cpu() check and checking
> the PF_NO_SETAFFINITY flag instead.
>
> Fixes: ec5fbdfb99d1 ("cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset")
> Signed-off-by: Waiman Long <longman@redhat.com>
Applied to cgroup/for-6.15-fixes.
Thanks.
--
tejun
* Re: [PATCH v2] cgroup/cpuset: Extend kthread_is_per_cpu() check to all PF_NO_SETAFFINITY tasks
From: Frederic Weisbecker @ 2025-05-09 21:19 UTC
To: Tejun Heo
Cc: Waiman Long, Johannes Weiner, Michal Koutný, cgroups,
linux-kernel, Xi Wang
On Fri, May 09, 2025 at 07:30:51AM -1000, Tejun Heo wrote:
> Hello,
>
> On Fri, May 09, 2025 at 03:18:17PM +0200, Frederic Weisbecker wrote:
> ...
> > But this makes me realize I overlooked that when I introduced the unbound kthreads
> > centralized affinity.
> >
> > cpuset_update_tasks_cpumask() seems to blindly affine tasks away from subpartitions_cpus
> > while unbound kthreads might have their own preferences (per-node or arbitrary cpumasks).
> >
> > So I need to route that through the kthread API.
>
> I wonder whether it'd be cleaner if all kthread affinity restrictions go
> through housekeeping instead of cpuset modifying the cpumasks directly so
> that housekeeping keeps track of where different classes of kthreads can run
> and tell e.g. workqueue what to do.
Good suggestion. "isolated_cpus" should indeed be handled by housekeeping
itself. More precisely, the HK_TYPE_DOMAIN housekeeping cpumask should be
updated through some housekeeping_update() function so that it excludes the
union of the boot-time 'isolcpus=' CPUs and the isolated mask of cpuset
partitions. Waiman tried that at some point.
This will require some synchronization against the readers of HK_TYPE_DOMAIN.
It's beyond the scope of the kthreads affinity issue, but yes, that's all
planned as part of the cpuset integration of nohz_full.
Thanks.
--
Frederic Weisbecker
SUSE Labs