* [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct()
2026-05-24 16:57 [PATCH 0/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec Oleg Nesterov
@ 2026-05-24 16:57 ` Oleg Nesterov
2026-05-25 2:18 ` Gregory Price
2026-05-25 2:20 ` Gregory Price
2026-05-24 16:58 ` [PATCH 2/4] mm/mempolicy: kernel_migrate_pages: use find_get_task_by_vpid() Oleg Nesterov
` (3 subsequent siblings)
4 siblings, 2 replies; 10+ messages in thread
From: Oleg Nesterov @ 2026-05-24 16:57 UTC (permalink / raw)
To: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Gregory Price, Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang,
Zi Yan
Cc: Jann Horn, Kees Cook, linux-mm, linux-kernel
kernel_migrate_pages() calls put_task_struct() twice; this complicates the
code. Move put_task_struct() to the unified exit path. This also allows us
to move the "out_put" label up and avoid another goto.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
mm/mempolicy.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4e4421b22b59..4326dff16aa6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1922,25 +1922,20 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
goto out_put;
mm = get_task_mm(task);
- put_task_struct(task);
-
if (!mm) {
err = -EINVAL;
- goto out;
+ goto out_put;
}
err = do_migrate_pages(mm, old, new,
capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
mmput(mm);
+out_put:
+ put_task_struct(task);
out:
NODEMASK_SCRATCH_FREE(scratch);
-
return err;
-
-out_put:
- put_task_struct(task);
- goto out;
}
SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct()
2026-05-24 16:57 ` [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct() Oleg Nesterov
@ 2026-05-25 2:18 ` Gregory Price
2026-05-25 2:20 ` Gregory Price
1 sibling, 0 replies; 10+ messages in thread
From: Gregory Price @ 2026-05-25 2:18 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang, Zi Yan,
Jann Horn, Kees Cook, linux-mm, linux-kernel
On Sun, May 24, 2026 at 06:57:59PM +0200, Oleg Nesterov wrote:
> kernel_migrate_pages() calls put_task_struct() twice; this complicates the
> code. Move put_task_struct() to the unified exit path. This also allows us
> to move the "out_put" label up and avoid another goto.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
> ---
> mm/mempolicy.c | 11 +++--------
> 1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4e4421b22b59..4326dff16aa6 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1922,25 +1922,20 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
> goto out_put;
>
> mm = get_task_mm(task);
> - put_task_struct(task);
> -
> if (!mm) {
> err = -EINVAL;
> - goto out;
> + goto out_put;
> }
>
> err = do_migrate_pages(mm, old, new,
> capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
>
> mmput(mm);
> +out_put:
> + put_task_struct(task);
> out:
> NODEMASK_SCRATCH_FREE(scratch);
> -
> return err;
> -
> -out_put:
> - put_task_struct(task);
> - goto out;
> }
>
> SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct()
2026-05-24 16:57 ` [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct() Oleg Nesterov
2026-05-25 2:18 ` Gregory Price
@ 2026-05-25 2:20 ` Gregory Price
2026-05-25 8:27 ` Oleg Nesterov
1 sibling, 1 reply; 10+ messages in thread
From: Gregory Price @ 2026-05-25 2:20 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang, Zi Yan,
Jann Horn, Kees Cook, linux-mm, linux-kernel
On Sun, May 24, 2026 at 06:57:59PM +0200, Oleg Nesterov wrote:
> kernel_migrate_pages() calls put_task_struct() twice; this complicates the
> code. Move put_task_struct() to the unified exit path. This also allows us
> to move the "out_put" label up and avoid another goto.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> mm/mempolicy.c | 11 +++--------
> 1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4e4421b22b59..4326dff16aa6 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1922,25 +1922,20 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
> goto out_put;
>
> mm = get_task_mm(task);
> - put_task_struct(task);
> -
Hm, I was a little trigger-happy on that review, I just realized that
this increases the scope in which there is a task reference held (i.e.
we're now holding a reference across do_migrate_pages())
I don't *think* that's an issue, but it might we worth looking at the
changelog to see if this was done intentionally.
> if (!mm) {
> err = -EINVAL;
> - goto out;
> + goto out_put;
> }
>
> err = do_migrate_pages(mm, old, new,
> capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
>
> mmput(mm);
> +out_put:
> + put_task_struct(task);
> out:
> NODEMASK_SCRATCH_FREE(scratch);
> -
> return err;
> -
> -out_put:
> - put_task_struct(task);
> - goto out;
> }
>
> SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct()
2026-05-25 2:20 ` Gregory Price
@ 2026-05-25 8:27 ` Oleg Nesterov
2026-05-26 15:22 ` Gregory Price
0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2026-05-25 8:27 UTC (permalink / raw)
To: Gregory Price
Cc: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang, Zi Yan,
Jann Horn, Kees Cook, linux-mm, linux-kernel
On 05/24, Gregory Price wrote:
>
> On Sun, May 24, 2026 at 06:57:59PM +0200, Oleg Nesterov wrote:
> >
> > mm = get_task_mm(task);
> > - put_task_struct(task);
> > -
>
> Hm, I was a little trigger-happy on that review, I just realized that
> this increases the scope in which there is a task reference held (i.e.
> we're now holding a reference across do_migrate_pages())
>
> I don't *think* that's an issue, but it might we worth looking at the
> changelog to see if this was done intentionally.
I did this to simplify the code and the next changes. We pin this task's
memory across do_migrate_pages(), so I think that delaying put_task_struct()
doesn't add too much.
But of course, this is not necessary, I can rework if you don't like it.
Thanks for review!
Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct()
2026-05-25 8:27 ` Oleg Nesterov
@ 2026-05-26 15:22 ` Gregory Price
0 siblings, 0 replies; 10+ messages in thread
From: Gregory Price @ 2026-05-26 15:22 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang, Zi Yan,
Jann Horn, Kees Cook, linux-mm, linux-kernel
On Mon, May 25, 2026 at 10:27:06AM +0200, Oleg Nesterov wrote:
> On 05/24, Gregory Price wrote:
> >
> > On Sun, May 24, 2026 at 06:57:59PM +0200, Oleg Nesterov wrote:
> > >
> > > mm = get_task_mm(task);
> > > - put_task_struct(task);
> > > -
> >
> > Hm, I was a little trigger-happy on that review, I just realized that
> > this increases the scope in which there is a task reference held (i.e.
> > we're now holding a reference across do_migrate_pages())
> >
> > I don't *think* that's an issue, but it might we worth looking at the
> > changelog to see if this was done intentionally.
>
> I did this to simplify the code and the next changes. We pin this task's
> memory across do_migrate_pages(), so I think that delaying put_task_struct()
> doesn't add too much.
>
> But of course, this is not necessary, I can rework if you don't like it.
>
Not necessary, just a curiosity whether holding the reference across the
larger scope has any functional difference. Like I said, I don't think
that's an issue, but might be worth just checking the history on the
function to see if it was done that way on purpose.
> Thanks for review!
>
> Oleg.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/4] mm/mempolicy: kernel_migrate_pages: use find_get_task_by_vpid()
2026-05-24 16:57 [PATCH 0/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec Oleg Nesterov
2026-05-24 16:57 ` [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct() Oleg Nesterov
@ 2026-05-24 16:58 ` Oleg Nesterov
2026-05-24 16:58 ` [PATCH 3/4] mm/mempolicy: kernel_migrate_pages: check ptrace_may_access() after nodes_and() Oleg Nesterov
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2026-05-24 16:58 UTC (permalink / raw)
To: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Gregory Price, Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang,
Zi Yan
Cc: Jann Horn, Kees Cook, linux-mm, linux-kernel
kernel_migrate_pages() calls ptrace_may_access() under rcu_read_lock() for
no reason. This is a leftover from before the commit 313674661925 ("Unify
migrate_pages and move_pages access checks"), where rcu_read_lock() was
needed to protect __task_cred(task).
So we can drop the RCU lock right after get_task_struct(). Better yet, if
pid != 0, we can use find_get_task_by_vpid() which does get_task_struct()
itself, and avoid get/put_task_struct() otherwise.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
mm/mempolicy.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4326dff16aa6..2ec14001e4dc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1883,15 +1883,11 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
if (err)
goto out;
- /* Find the mm_struct */
- rcu_read_lock();
- task = pid ? find_task_by_vpid(pid) : current;
+ task = pid ? find_get_task_by_vpid(pid) : current;
if (!task) {
- rcu_read_unlock();
err = -ESRCH;
goto out;
}
- get_task_struct(task);
err = -EINVAL;
@@ -1900,11 +1896,9 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
* Use the regular "ptrace_may_access()" checks.
*/
if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
- rcu_read_unlock();
err = -EPERM;
goto out_put;
}
- rcu_read_unlock();
task_nodes = cpuset_mems_allowed(task);
/* Is the user allowed to access the target nodes? */
@@ -1932,7 +1926,8 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
mmput(mm);
out_put:
- put_task_struct(task);
+ if (pid)
+ put_task_struct(task);
out:
NODEMASK_SCRATCH_FREE(scratch);
return err;
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 3/4] mm/mempolicy: kernel_migrate_pages: check ptrace_may_access() after nodes_and()
2026-05-24 16:57 [PATCH 0/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec Oleg Nesterov
2026-05-24 16:57 ` [PATCH 1/4] mm/mempolicy: kernel_migrate_pages: simplify the usage of put_task_struct() Oleg Nesterov
2026-05-24 16:58 ` [PATCH 2/4] mm/mempolicy: kernel_migrate_pages: use find_get_task_by_vpid() Oleg Nesterov
@ 2026-05-24 16:58 ` Oleg Nesterov
2026-05-24 16:58 ` [PATCH 4/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec Oleg Nesterov
2026-05-24 19:29 ` [PATCH 0/4] " Oleg Nesterov
4 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2026-05-24 16:58 UTC (permalink / raw)
To: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Gregory Price, Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang,
Zi Yan
Cc: Jann Horn, Kees Cook, linux-mm, linux-kernel
Move the ptrace_may_access() check down, after the nodemask validation.
This is just a preparation to simplify the next patch.
Note that because of this reordering, if the caller passes an invalid
nodemask and also lacks permissions, the syscall will now return -EINVAL
instead of -EPERM. Hopefully, nobody relies on the old error precedence.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
mm/mempolicy.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2ec14001e4dc..c09ff9f9aa96 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1889,17 +1889,6 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
goto out;
}
- err = -EINVAL;
-
- /*
- * Check if this process has the right to modify the specified process.
- * Use the regular "ptrace_may_access()" checks.
- */
- if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
- err = -EPERM;
- goto out_put;
- }
-
task_nodes = cpuset_mems_allowed(task);
/* Is the user allowed to access the target nodes? */
if (!nodes_subset(*new, task_nodes) && !capable(CAP_SYS_NICE)) {
@@ -1908,8 +1897,19 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
}
task_nodes = cpuset_mems_allowed(current);
- if (!nodes_and(*new, *new, task_nodes))
+ if (!nodes_and(*new, *new, task_nodes)) {
+ err = -EINVAL;
goto out_put;
+ }
+
+ /*
+ * Check if this process has the right to modify the specified process.
+ * Use the regular "ptrace_may_access()" checks.
+ */
+ if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
+ err = -EPERM;
+ goto out_put;
+ }
err = security_task_movememory(task);
if (err)
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 4/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec
2026-05-24 16:57 [PATCH 0/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec Oleg Nesterov
` (2 preceding siblings ...)
2026-05-24 16:58 ` [PATCH 3/4] mm/mempolicy: kernel_migrate_pages: check ptrace_may_access() after nodes_and() Oleg Nesterov
@ 2026-05-24 16:58 ` Oleg Nesterov
2026-05-24 19:29 ` [PATCH 0/4] " Oleg Nesterov
4 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2026-05-24 16:58 UTC (permalink / raw)
To: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Gregory Price, Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang,
Zi Yan
Cc: Jann Horn, Kees Cook, linux-mm, linux-kernel
The target task can execute a setuid binary between ptrace_may_access()
and get_task_mm(). Protect this critical section with exec_update_lock.
Sadly, we don't have DEFINE_LOCK_GUARD_1_COND(rwsem_read, _kill) yet.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
mm/mempolicy.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c09ff9f9aa96..cbb45a876a93 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1902,24 +1902,31 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
goto out_put;
}
+ err = down_read_killable(&task->signal->exec_update_lock);
+ if (err)
+ goto out_put;
/*
* Check if this process has the right to modify the specified process.
* Use the regular "ptrace_may_access()" checks.
*/
if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
err = -EPERM;
- goto out_put;
+ goto unlock;
}
err = security_task_movememory(task);
if (err)
- goto out_put;
+ goto unlock;
mm = get_task_mm(task);
if (!mm) {
err = -EINVAL;
- goto out_put;
+ goto unlock;
}
+unlock:
+ up_read(&task->signal->exec_update_lock);
+ if (err)
+ goto out_put;
err = do_migrate_pages(mm, old, new,
capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 0/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec
2026-05-24 16:57 [PATCH 0/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec Oleg Nesterov
` (3 preceding siblings ...)
2026-05-24 16:58 ` [PATCH 4/4] mm/mempolicy: kernel_migrate_pages: fix race between security checks and suid exec Oleg Nesterov
@ 2026-05-24 19:29 ` Oleg Nesterov
4 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2026-05-24 19:29 UTC (permalink / raw)
To: Alistair Popple, Andrew Morton, Byungchul Park, David Hildenbrand,
Gregory Price, Joshua Hahn, Matthew Brost, Rakie Kim, Ying Huang,
Zi Yan
Cc: Jann Horn, Kees Cook, linux-mm, linux-kernel
On 05/24, Oleg Nesterov wrote:
>
> Hopefully sashiko.dev will take a look too ;)
See https://sashiko.dev/#/patchset/ahMt6xyUNnacZU8-%40redhat.com
Thanks Sashiko, let me reply.
As for reply to 3/4:
I honestly think this is not a problem... but a) I will think
again, and b) I rely on review from mm experts anyway.
The main motivation for this patch is that we can create a common
helper for find_mm_struct() and kernel_migrate_pages(), but it is
not strictly necessary.
As for reply to 4/4:
Yes! And I have already mentioned this in 0/4 and in the previous
discussions.
Plus (say) proc_coredump_filter_write() which relies on the open-time
checks and don't use ptrace_may_access() at all. This looks wrong.
But this need other changes/discussions, so I think we can ignore this
comment.
Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread