* [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity()
@ 2025-09-23 17:54 Waiman Long
2025-09-23 17:54 ` [PATCH 2/2] fs/proc: Show the content of task->user_cpus_ptr in /proc/<pid>/status Waiman Long
From: Waiman Long @ 2025-09-23 17:54 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet
Cc: linux-kernel, linux-fsdevel, linux-doc, Andrew Morton,
David Hildenbrand, Catalin Marinas, Nico Pache, Phil Auld,
John Coleman, Waiman Long
Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), user-provided CPU affinity set via sched_setaffinity(2) is
preserved even if the task is moved to a different cpuset.
However, that affinity is also inherited by any subsequently created
child processes, which may not want, or even be aware of, that affinity.
One way to solve this problem is to provide a way to back off from
that user-provided CPU affinity. This patch implements such a scheme
by using a full cpumask (a cpumask with all bits set) to signal the
clearing of the user cpumask, so the task follows the default affinity
allowed by its current cpuset. In fact, with a full cpumask in
user_cpus_ptr, the task behavior should be the same as with a NULL
user_cpus_ptr. This patch just formalizes that equivalence without
causing any incompatibility and discards an otherwise useless cpumask.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/sched/syscalls.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 77ae87f36e84..d68c7a4ee525 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -1229,14 +1229,22 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
return retval;
/*
- * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
- * alloc_user_cpus_ptr() returns NULL.
+ * If a full cpumask is passed in, clear user_cpus_ptr and reset the
+ * current cpu affinity to the default for the current cpuset.
*/
- user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
- if (user_mask) {
- cpumask_copy(user_mask, in_mask);
+ if (cpumask_full(in_mask)) {
+ user_mask = NULL;
} else {
- return -ENOMEM;
+ /*
+ * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
+ * alloc_user_cpus_ptr() returns NULL.
+ */
+ user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
+ if (user_mask) {
+ cpumask_copy(user_mask, in_mask);
+ } else {
+ return -ENOMEM;
+ }
}
ac = (struct affinity_context){
--
2.51.0
* [PATCH 2/2] fs/proc: Show the content of task->user_cpus_ptr in /proc/<pid>/status
2025-09-23 17:54 [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity() Waiman Long
@ 2025-09-23 17:54 ` Waiman Long
2025-10-20 20:06 ` [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity() Waiman Long
2025-10-20 20:13 ` David Hildenbrand
From: Waiman Long @ 2025-09-23 17:54 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet
Cc: linux-kernel, linux-fsdevel, linux-doc, Andrew Morton,
David Hildenbrand, Catalin Marinas, Nico Pache, Phil Auld,
John Coleman, Waiman Long
The task->user_cpus_ptr field was introduced by commit b90ca8badbd1
("sched: Introduce task_struct::user_cpus_ptr to track requested
affinity") to keep track of user-requested CPU affinity. With commit
da019032819a ("sched: Enforce user requested affinity"), user_cpus_ptr
persistently affects how cpus_allowed is set. So it makes sense to let
users see whether a user_cpus_ptr was previously set, so they can act
on it instead of being surprised.
Add new "Cpus_user" and "Cpus_user_list" fields to the
/proc/<pid>/status output via task_cpus_allowed(), since the presence
of user_cpus_ptr affects the cpus_allowed cpumask.
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/filesystems/proc.rst | 2 ++
fs/proc/array.c | 9 +++++++++
2 files changed, 11 insertions(+)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2971551b7235..fb9e7753010c 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -311,6 +311,8 @@ It's slow but very precise.
SpeculationIndirectBranch indirect branch speculation mode
Cpus_allowed mask of CPUs on which this process may run
Cpus_allowed_list Same as previous, but in "list format"
+ Cpus_user mask of user requested CPUs from sched_setaffinity(2)
+ Cpus_user_list Same as previous, but in "list format"
Mems_allowed mask of memory nodes allowed to this process
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
diff --git a/fs/proc/array.c b/fs/proc/array.c
index d6a0369caa93..30ceab935e13 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -405,10 +405,19 @@ static inline void task_context_switch_counts(struct seq_file *m,
static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
{
+ cpumask_t *user_cpus = task->user_cpus_ptr;
+
seq_printf(m, "Cpus_allowed:\t%*pb\n",
cpumask_pr_args(&task->cpus_mask));
seq_printf(m, "Cpus_allowed_list:\t%*pbl\n",
cpumask_pr_args(&task->cpus_mask));
+
+ if (user_cpus) {
+ seq_printf(m, "Cpus_user:\t%*pb\n", cpumask_pr_args(user_cpus));
+ seq_printf(m, "Cpus_user_list:\t%*pbl\n", cpumask_pr_args(user_cpus));
+ } else {
+ seq_puts(m, "Cpus_user:\nCpus_user_list:\n");
+ }
}
static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
--
2.51.0
* Re: [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity()
2025-09-23 17:54 [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity() Waiman Long
2025-09-23 17:54 ` [PATCH 2/2] fs/proc: Show the content of task->user_cpus_ptr in /proc/<pid>/status Waiman Long
@ 2025-10-20 20:06 ` Waiman Long
2025-10-20 20:13 ` David Hildenbrand
From: Waiman Long @ 2025-10-20 20:06 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Jonathan Corbet
Cc: linux-kernel, linux-fsdevel, linux-doc, Andrew Morton,
David Hildenbrand, Catalin Marinas, Nico Pache, Phil Auld,
John Coleman
On 9/23/25 1:54 PM, Waiman Long wrote:
> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask"), user-provided CPU affinity set via sched_setaffinity(2) is
> preserved even if the task is moved to a different cpuset.
> However, that affinity is also inherited by any subsequently created
> child processes, which may not want, or even be aware of, that affinity.
>
> One way to solve this problem is to provide a way to back off from
> that user-provided CPU affinity. This patch implements such a scheme
> by using a full cpumask (a cpumask with all bits set) to signal the
> clearing of the user cpumask, so the task follows the default affinity
> allowed by its current cpuset. In fact, with a full cpumask in
> user_cpus_ptr, the task behavior should be the same as with a NULL
> user_cpus_ptr. This patch just formalizes that equivalence without
> causing any incompatibility and discards an otherwise useless cpumask.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/sched/syscalls.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index 77ae87f36e84..d68c7a4ee525 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -1229,14 +1229,22 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> return retval;
>
> /*
> - * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
> - * alloc_user_cpus_ptr() returns NULL.
> + * If a full cpumask is passed in, clear user_cpus_ptr and reset the
> + * current cpu affinity to the default for the current cpuset.
> */
> - user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
> - if (user_mask) {
> - cpumask_copy(user_mask, in_mask);
> + if (cpumask_full(in_mask)) {
> + user_mask = NULL;
> } else {
> - return -ENOMEM;
> + /*
> + * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
> + * alloc_user_cpus_ptr() returns NULL.
> + */
> + user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
> + if (user_mask) {
> + cpumask_copy(user_mask, in_mask);
> + } else {
> + return -ENOMEM;
> + }
> }
>
> ac = (struct affinity_context){
Any comment or suggested improvement on this patch and the following one?
Thanks,
Longman
* Re: [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity()
2025-09-23 17:54 [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity() Waiman Long
2025-09-23 17:54 ` [PATCH 2/2] fs/proc: Show the content of task->user_cpus_ptr in /proc/<pid>/status Waiman Long
2025-10-20 20:06 ` [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity() Waiman Long
@ 2025-10-20 20:13 ` David Hildenbrand
2025-10-20 20:21 ` Waiman Long
From: David Hildenbrand @ 2025-10-20 20:13 UTC (permalink / raw)
To: Waiman Long, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Jonathan Corbet
Cc: linux-kernel, linux-fsdevel, linux-doc, Andrew Morton,
Catalin Marinas, Nico Pache, Phil Auld, John Coleman
On 23.09.25 19:54, Waiman Long wrote:
> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask"), user-provided CPU affinity set via sched_setaffinity(2) is
> preserved even if the task is moved to a different cpuset.
> However, that affinity is also inherited by any subsequently created
> child processes, which may not want, or even be aware of, that affinity.
So I assume setting the affinity to the full bitmap would then allow any
child to essentially reset to the default, correct?
>
> One way to solve this problem is to provide a way to back off from
> that user-provided CPU affinity. This patch implements such a scheme
> by using a full cpumask (a cpumask with all bits set) to signal the
> clearing of the user cpumask, so the task follows the default affinity
> allowed by its current cpuset. In fact, with a full cpumask in
> user_cpus_ptr, the task behavior should be the same as with a NULL
> user_cpus_ptr. This patch just formalizes that equivalence without
> causing any incompatibility and discards an otherwise useless cpumask.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/sched/syscalls.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index 77ae87f36e84..d68c7a4ee525 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -1229,14 +1229,22 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> return retval;
>
> /*
> - * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
> - * alloc_user_cpus_ptr() returns NULL.
> + * If a full cpumask is passed in, clear user_cpus_ptr and reset the
> + * current cpu affinity to the default for the current cpuset.
> */
> - user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
> - if (user_mask) {
> - cpumask_copy(user_mask, in_mask);
> + if (cpumask_full(in_mask)) {
> + user_mask = NULL;
> } else {
> - return -ENOMEM;
> + /*
> + * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
> + * alloc_user_cpus_ptr() returns NULL.
> + */
> + user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
> + if (user_mask) {
> + cpumask_copy(user_mask, in_mask);
> + } else {
> + return -ENOMEM;
> + }
> }
>
> ac = (struct affinity_context){
Not an expert on this code.
I'm only wondering if there is somehow, some way we could be breaking
user space by doing that.
--
Cheers
David / dhildenb
* Re: [PATCH 1/2] sched/core: Enable full cpumask to clear user cpumask in sched_setaffinity()
2025-10-20 20:13 ` David Hildenbrand
@ 2025-10-20 20:21 ` Waiman Long
From: Waiman Long @ 2025-10-20 20:21 UTC (permalink / raw)
To: David Hildenbrand, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Jonathan Corbet
Cc: linux-kernel, linux-fsdevel, linux-doc, Andrew Morton,
Catalin Marinas, Nico Pache, Phil Auld, John Coleman
On 10/20/25 4:13 PM, David Hildenbrand wrote:
> On 23.09.25 19:54, Waiman Long wrote:
>> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
>> cpumask"), user-provided CPU affinity set via sched_setaffinity(2) is
>> preserved even if the task is moved to a different cpuset.
>> However, that affinity is also inherited by any subsequently created
>> child processes, which may not want, or even be aware of, that affinity.
>
> So I assume setting the affinity to the full bitmap would then allow
> any child to essentially reset to the default, correct?
Yes, that is the point.
>
>>
>> One way to solve this problem is to provide a way to back off from
>> that user-provided CPU affinity. This patch implements such a scheme
>> by using a full cpumask (a cpumask with all bits set) to signal the
>> clearing of the user cpumask, so the task follows the default affinity
>> allowed by its current cpuset. In fact, with a full cpumask in
>> user_cpus_ptr, the task behavior should be the same as with a NULL
>> user_cpus_ptr. This patch just formalizes that equivalence without
>> causing any incompatibility and discards an otherwise useless cpumask.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> kernel/sched/syscalls.c | 20 ++++++++++++++------
>> 1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
>> index 77ae87f36e84..d68c7a4ee525 100644
>> --- a/kernel/sched/syscalls.c
>> +++ b/kernel/sched/syscalls.c
>> @@ -1229,14 +1229,22 @@ long sched_setaffinity(pid_t pid, const
>> struct cpumask *in_mask)
>> return retval;
>> /*
>> - * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
>> - * alloc_user_cpus_ptr() returns NULL.
>> + * If a full cpumask is passed in, clear user_cpus_ptr and reset
>> the
>> + * current cpu affinity to the default for the current cpuset.
>> */
>> - user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
>> - if (user_mask) {
>> - cpumask_copy(user_mask, in_mask);
>> + if (cpumask_full(in_mask)) {
>> + user_mask = NULL;
>> } else {
>> - return -ENOMEM;
>> + /*
>> + * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
>> + * alloc_user_cpus_ptr() returns NULL.
>> + */
>> + user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
>> + if (user_mask) {
>> + cpumask_copy(user_mask, in_mask);
>> + } else {
>> + return -ENOMEM;
>> + }
>> }
>> ac = (struct affinity_context){
>
> Not an expert on this code.
>
> I'm only wondering if there is somehow, some way we could be breaking
> user space by doing that.
>
I don't think so. Setting user_cpus_ptr to a full cpumask makes the
task strictly follow the cpumask restriction imposed by its current
cpuset, just as if user_cpus_ptr weren't set at all.
Cheers,
Longman