linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/core: Mask out offline CPUs when user_cpus_ptr is used
From: Waiman Long @ 2025-07-15 15:58 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: linux-kernel, cgroups, Chen Ridong, Johannes Weiner,
	Michal Koutný, Waiman Long

Chen Ridong reported that cpuset could report a kernel warning for a task
due to set_cpus_allowed_ptr() returning failure in the corner case that:

1) the task used sched_setaffinity(2) to set its CPU affinity mask to
   be the same as the cpuset.cpus of its cpuset,
2) all the CPUs assigned to that cpuset were taken offline, and
3) cpuset v1 is in use and the task had to be migrated to the top cpuset.

Because the CPU affinity of tasks in the top cpuset is not updated
when a CPU hotplug online/offline event happens, offline CPUs are
included in the CPU affinity of those tasks. Further masking with the
user_cpus_ptr set by sched_setaffinity(2) in __set_cpus_allowed_ptr()
can then leave only offline CPUs in the new mask, causing the
subsequent call to __set_cpus_allowed_ptr_locked() to return failure
with an effectively empty CPU affinity.

Fix this failure by masking out offline CPUs when user_cpus_ptr masking
has to be done, and by falling back to ignoring user_cpus_ptr if the
resulting cpumask is empty.
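
The corner case and the fallback can be tried outside the kernel with plain
bit masks. The following is a minimal userspace sketch, not the kernel code:
mask_t, apply_user_mask_old(), apply_user_mask_patched() and has_active_cpu()
are hypothetical names, a 64-bit word stands in for struct cpumask, and
has_active_cpu() is only an assumed stand-in for the check that makes
__set_cpus_allowed_ptr_locked() fail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t mask_t;

/* Pre-patch rule: use new_mask & user_mask whenever that intersection is non-empty. */
mask_t apply_user_mask_old(mask_t new_mask, mask_t user_mask)
{
	mask_t scratch = new_mask & user_mask;

	return scratch ? scratch : new_mask;
}

/*
 * Patched rule as described above: also mask out inactive CPUs, and
 * ignore user_mask entirely if nothing is left afterwards.
 */
mask_t apply_user_mask_patched(mask_t new_mask, mask_t user_mask, mask_t active)
{
	mask_t scratch = new_mask & user_mask;

	/* The patch comment states new_mask must have at least one online CPU. */
	assert(new_mask & active);

	if (!scratch)
		return new_mask;
	scratch &= active;
	return scratch ? scratch : new_mask;
}

/* Assumed model of the later validity check: a mask needs at least one active CPU. */
int has_active_cpu(mask_t mask, mask_t active)
{
	return (mask & active) != 0;
}
```

With active CPUs 0-1 (0x3), a top cpuset mask of CPUs 0-3 (0xF) and a user
mask of CPUs 2-3 (0xC), apply_user_mask_old() yields 0xC, which has no active
CPU, while apply_user_mask_patched() falls back to 0xF.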

Reported-by: Chen Ridong <chenridong@huaweicloud.com>
Closes: https://lore.kernel.org/lkml/20250714032311.3570157-1-chenridong@huaweicloud.com/
Fixes: da019032819a ("sched: Enforce user requested affinity")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df1..4cf25dd8827f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3172,10 +3172,15 @@ int __set_cpus_allowed_ptr(struct task_struct *p, struct affinity_context *ctx)
 	/*
 	 * Masking should be skipped if SCA_USER or any of the SCA_MIGRATE_*
 	 * flags are set.
+	 *
+	 * Even though the given new_mask must have at least one online CPU,
+	 * masking with user_cpus_ptr may strip out all online CPUs causing
+	 * failure. So offline CPUs have to be masked out too.
 	 */
 	if (p->user_cpus_ptr &&
 	    !(ctx->flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
-	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr))
+	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr) &&
+	    cpumask_and(rq->scratch_mask, rq->scratch_mask, cpu_active_mask))
 		ctx->new_mask = rq->scratch_mask;
 
 	return __set_cpus_allowed_ptr_locked(p, ctx, rq, &rf);
-- 
2.50.0



* Re: [PATCH] sched/core: Mask out offline CPUs when user_cpus_ptr is used
From: Chen Ridong @ 2025-07-18  2:42 UTC (permalink / raw)
  To: Waiman Long, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Koutný



On 2025/7/15 23:58, Waiman Long wrote:
> [...]
>  	if (p->user_cpus_ptr &&
>  	    !(ctx->flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
> -	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr))
> +	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr) &&
> +	    cpumask_and(rq->scratch_mask, rq->scratch_mask, cpu_active_mask))
>  		ctx->new_mask = rq->scratch_mask;
>  
>  	return __set_cpus_allowed_ptr_locked(p, ctx, rq, &rf);

Hi, Waiman,
Would the following modification make more sense?

  	if (p->user_cpus_ptr &&
  	    !(ctx->flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
 -	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr))
 +	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr) &&
 +	    cpumask_intersects(rq->scratch_mask, cpu_active_mask))
  		ctx->new_mask = rq->scratch_mask;

This can preserve user intent as much as possible.
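
To make the difference between the two variants concrete, here is a minimal
userspace sketch. It is not the kernel code: mask_t, narrow_with_and() and
narrow_with_intersects() are hypothetical names, and a 64-bit word stands in
for struct cpumask. The first function models the posted patch, the second
models the cpumask_intersects() variant suggested above.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t mask_t;

/*
 * Model of the posted patch: the chosen mask is reduced to
 * new_mask & user_mask & active, so currently inactive CPUs are dropped.
 */
mask_t narrow_with_and(mask_t new_mask, mask_t user_mask, mask_t active)
{
	mask_t scratch = new_mask & user_mask;

	if (!scratch)
		return new_mask;
	scratch &= active;
	return scratch ? scratch : new_mask;
}

/*
 * Model of the suggested variant: active is only used as a test. If the
 * user-restricted mask contains at least one active CPU, it is kept
 * unchanged, including any CPUs that are inactive right now.
 */
mask_t narrow_with_intersects(mask_t new_mask, mask_t user_mask, mask_t active)
{
	mask_t scratch = new_mask & user_mask;

	if (scratch && (scratch & active))
		return scratch;
	return new_mask;
}
```

With new_mask covering CPUs 0-3 (0xF), a user mask of CPUs 1-2 (0x6) and
active CPUs 0-1 (0x3), narrow_with_and() returns 0x2 (CPU 1 only), whereas
narrow_with_intersects() returns 0x6 and so keeps CPU 2 in the mask. Both
fall back to 0xF when the user mask is 0xC and contains no active CPU.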

Best regards,
Ridong



* Re: [PATCH] sched/core: Mask out offline CPUs when user_cpus_ptr is used
From: Waiman Long @ 2025-07-18 14:18 UTC (permalink / raw)
  To: Chen Ridong, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Koutný

On 7/17/25 10:42 PM, Chen Ridong wrote:
>
> On 2025/7/15 23:58, Waiman Long wrote:
>> [...]
> Hi, Waiman,
> Would the following modification make more sense?
>
>    	if (p->user_cpus_ptr &&
>    	    !(ctx->flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
>   -	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr))
>   +	    cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr) &&
>   +	    cpumask_intersects(rq->scratch_mask, cpu_active_mask))
>    		ctx->new_mask = rq->scratch_mask;
>
> This can preserve user intent as much as possible.

After sending out this patch, I realized that I should have used
cpumask_intersects() instead. It looks like you have come to the same
conclusion. I will send out a v2 with that change.

Thanks,
Longman


