public inbox for linux-kernel@vger.kernel.org
* [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
@ 2026-01-05  4:36 Qing Wang
  2026-01-05 22:46 ` Andrew Morton
  2026-01-06  9:04 ` Oleg Nesterov
  0 siblings, 2 replies; 17+ messages in thread
From: Qing Wang @ 2026-01-05  4:36 UTC
  To: mingo, peterz, juri.lelli, vincent.guittot, akpm, david
  Cc: dietmar.eggemann, rostedt, bsegall, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, brauner, oleg, mjguzik, jack, joel.granados,
	linux-kernel, Qing Wang, syzbot+e0378d4f4fe57aa2bdd0

Syzbot reported a slab-use-after-free issue in __task_pid_nr_ns:

    BUG: KASAN: slab-use-after-free in __task_pid_nr_ns+0x1e4/0x490...
    Read of size 8 at addr ffff88807f8058a8 by task syz.1.574/8108

The race condition occurs between the failure path of copy_process() and
getting the PIDTYPE_TGID via __task_pid_nr_ns().

Bug timeline:
                                    Task B
                                    perf_event_open()
Task A <--------------------------- clone()
copy_process()
    perf_event_init_task()
    ...
    one copy failed
    free_signal_struct()            close(event_fd)
                                        perf_child_detach()
                                            __task_pid_nr_ns()
                                                access child task->signal

This is fixed by:
1. Setting task->signal = NULL in the failure cleanup path of copy_process.
2. Adding a null check for task->signal before accessing PIDTYPE_TGID from
task->signal.

Note: This bug was reported by syzbot without a reproducer.
The fix is based on code inspection and race condition analysis.

Reported-by: syzbot+e0378d4f4fe57aa2bdd0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e0378d4f4fe57aa2bdd0
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
---
 kernel/fork.c | 8 ++++++--
 kernel/pid.c  | 6 +++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index b1f3915d5f8e..72b9b37a96c8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1975,6 +1975,7 @@ __latent_entropy struct task_struct *copy_process(
 	struct file *pidfile = NULL;
 	const u64 clone_flags = args->flags;
 	struct nsproxy *nsp = current->nsproxy;
+	struct signal_struct *free_sig = NULL;
 
 	/*
 	 * Don't allow sharing the root directory with processes in a different
@@ -2501,8 +2502,11 @@ __latent_entropy struct task_struct *copy_process(
 		mmput(p->mm);
 	}
 bad_fork_cleanup_signal:
-	if (!(clone_flags & CLONE_THREAD))
-		free_signal_struct(p->signal);
+	if (!(clone_flags & CLONE_THREAD)) {
+		free_sig = p->signal;
+		p->signal = NULL;
+		free_signal_struct(free_sig);
+	}
 bad_fork_cleanup_sighand:
 	__cleanup_sighand(p->sighand);
 bad_fork_cleanup_fs:
diff --git a/kernel/pid.c b/kernel/pid.c
index a31771bc89c1..1a012e033552 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -329,9 +329,9 @@ EXPORT_SYMBOL_GPL(find_vpid);
 
 static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
 {
-	return (type == PIDTYPE_PID) ?
-		&task->thread_pid :
-		&task->signal->pids[type];
+	if (type == PIDTYPE_PID)
+		return &task->thread_pid;
+	return task->signal ? &task->signal->pids[type] : NULL;
 }
 
 /*
-- 
2.34.1



* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-05  4:36 [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns Qing Wang
@ 2026-01-05 22:46 ` Andrew Morton
  2026-01-06  7:07   ` Qing Wang
  2026-01-06  9:04 ` Oleg Nesterov
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2026-01-05 22:46 UTC
  To: Qing Wang
  Cc: mingo, peterz, juri.lelli, vincent.guittot, david,
	dietmar.eggemann, rostedt, bsegall, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, brauner, oleg, mjguzik, jack, joel.granados,
	linux-kernel, syzbot+e0378d4f4fe57aa2bdd0, Kees Cook

On Mon,  5 Jan 2026 12:36:27 +0800 Qing Wang <wangqing7171@gmail.com> wrote:

> Syzbot reported a slab-use-after-free issue in __task_pid_nr_ns:
> 
>     BUG: KASAN: slab-use-after-free in __task_pid_nr_ns+0x1e4/0x490...
>     Read of size 8 at addr ffff88807f8058a8 by task syz.1.574/8108
> 
> The race condition occurs between the failure path of copy_process() and
> getting the PIDTYPE_TGID via __task_pid_nr_ns().
> 
> Bug timeline:
>                                     Task B
>                                     perf_event_open()
> Task A <--------------------------- clone()
> copy_process()
>     perf_event_init_task()
>     ...
>     one copy failed
>     free_signal_struct()            close(event_fd)
>                                         perf_child_detach()
>                                             __task_pid_nr_ns()
>                                                 access child task->signal
> 
> This is fixed by:
> 1. Setting task->signal = NULL in the failure cleanup path of copy_process.
> 2. Adding a null check for task->signal before accessing PIDTYPE_TGID from
> task->signal.
> 
> Note: This bug was reported by syzbot without a reproducer.
> The fix is based on code inspection and race condition analysis.

Thanks.

> 
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -329,9 +329,9 @@ EXPORT_SYMBOL_GPL(find_vpid);
>  
>  static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
>  {
> -	return (type == PIDTYPE_PID) ?
> -		&task->thread_pid :
> -		&task->signal->pids[type];
> +	if (type == PIDTYPE_PID)
> +		return &task->thread_pid;
> +	return task->signal ? &task->signal->pids[type] : NULL;
>  }

It might be helpful to have a comment here telling readers how
task->signal can be zero.

Also, what in here prevents task->signal from being zeroed after we've
tested it and before we dereference it?
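
The check-then-use window asked about above can be replayed in a small
userspace C model. This is only a sketch under stated assumptions:
struct sig_model, struct task_model and every function name below are
hypothetical stand-ins, not kernel code, and a "freed" flag replaces a
real free so that the model itself stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct sig_model {
	bool freed;	/* set instead of really freeing the object */
	int tgid;
};

struct task_model {
	struct sig_model *signal;
};

/* Reader step 1: the NULL test, modelled as taking a snapshot. */
static struct sig_model *reader_check(struct task_model *t)
{
	return t->signal;
}

/* Writer: the failure path clears the pointer, then "frees" the object. */
static void writer_cleanup(struct task_model *t)
{
	struct sig_model *sig = t->signal;

	t->signal = NULL;
	sig->freed = true;	/* stands in for free_signal_struct() */
}

/* Reader step 2: the dereference; true means it touched a freed object. */
static bool reader_use_hits_freed(const struct sig_model *snap)
{
	return snap != NULL && snap->freed;
}

/* Replay the bad interleaving: check, cleanup, then use. */
static bool replay_race(void)
{
	struct sig_model sig = { .freed = false, .tgid = 42 };
	struct task_model task = { .signal = &sig };
	struct sig_model *snap = reader_check(&task);

	assert(snap != NULL);	/* the NULL test passed ... */
	writer_cleanup(&task);	/* ... and only then did the cleanup run */
	return reader_use_hits_freed(snap);
}
```

In this model replay_race() returns true: the NULL test succeeds, yet the
later dereference still lands on an object that has been released. Nothing
in the model orders the test against the cleanup, which is the point of
the question.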


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-05 22:46 ` Andrew Morton
@ 2026-01-06  7:07   ` Qing Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Qing Wang @ 2026-01-06  7:07 UTC
  To: akpm
  Cc: Liam.Howlett, brauner, bsegall, david, dietmar.eggemann, jack,
	joel.granados, juri.lelli, keescook, linux-kernel,
	lorenzo.stoakes, mingo, mjguzik, oleg, peterz, rostedt, rppt,
	syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
	wangqing7171

> It might be helpful to have a comment here telling readers how
> task->signal can be zero.
> 
> Also, what in here prevents task->signal from being zeroed after we've
> tested it and before we dereference it?

Thank you for your feedback. Regarding the "test-and-use" race condition
you raised, I’ve thought about it extensively but haven’t found a
better solution on the access side.

However, after re-examining the issue, I guess the root cause lies in
the copy_process() flow itself, and we may not need complex handling at
the access site:

1. The signal_struct is not fully managed by reference counting: In
the normal (successful) path of copy_process(), the signal structure is
indeed reference-counted, and its lifetime should be at least longer than
the task’s. However, in the failure/cleanup path, signal is explicitly
freed via free_signal_struct(), which prematurely ends its lifetime. At
the same time, other subsystems (e.g., perf) might still hold references
and attempt to access it—even if such access may be questionable.

2. A newly created task should not be visible to other CPUs during
creation: The perf subsystem copies the parent’s events
to the child during copy_process(). Later, when the parent closes or
manipulates its own perf event, it may traverse child events and access
child_ctx->task->signal. This means that a child process that has not
yet been fully created can be referenced by other CPUs.

Based on this analysis, I propose two possible fixes—either one should
resolve the issue:

1. Remove the explicit free_signal_struct() in the cleanup path and let
reference counting fully manage the signal lifetime. Currently
put_signal_struct() is only used in __put_task_struct(), so the lifetime
of the signal is at least as long as that of the task.

2. Defer perf_event_init_task() until after copy_signal() succeeds, so
that if copy_process() fails, perf events are cleaned up before the
signal. This guarantees that no perf event can access the signal.

I believe either approach would eliminate the issue. Could you please
review whether this analysis and the proposed solutions are correct? Any
guidance would be greatly appreciated.
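
The first proposal (lifetime managed purely by reference counting) can be
sketched in userspace C as follows. This is a minimal model, not the
kernel implementation: struct sig_model and the sig_model_*() helpers are
hypothetical names, C11 atomics stand in for the kernel's refcount
primitives, and the idea that a second party holds its own reference is
an assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for signal_struct; only the refcount matters. */
struct sig_model {
	atomic_int refs;
};

static int sig_model_frees;	/* counts frees so the behaviour is observable */

static struct sig_model *sig_model_alloc(void)
{
	struct sig_model *sig = malloc(sizeof(*sig));

	if (sig)
		atomic_init(&sig->refs, 1);	/* creator owns the first reference */
	return sig;
}

static void sig_model_get(struct sig_model *sig)
{
	assert(sig != NULL);
	atomic_fetch_add(&sig->refs, 1);
}

/* Drop one reference; free only when the last holder lets go. */
static bool sig_model_put(struct sig_model *sig)
{
	assert(sig != NULL);
	if (atomic_fetch_sub(&sig->refs, 1) == 1) {
		free(sig);
		sig_model_frees++;
		return true;
	}
	return false;
}
```

With this shape, an error path that calls sig_model_put() instead of
freeing unconditionally cannot release the object while another holder
still has a reference; the object goes away on the final put, whoever
issues it.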


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-05  4:36 [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns Qing Wang
  2026-01-05 22:46 ` Andrew Morton
@ 2026-01-06  9:04 ` Oleg Nesterov
  2026-01-06 10:06   ` Qing Wang
                     ` (3 more replies)
  1 sibling, 4 replies; 17+ messages in thread
From: Oleg Nesterov @ 2026-01-06  9:04 UTC
  To: Qing Wang
  Cc: mingo, peterz, juri.lelli, vincent.guittot, akpm, david,
	dietmar.eggemann, rostedt, bsegall, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, brauner, mjguzik, jack, joel.granados, linux-kernel,
	syzbot+e0378d4f4fe57aa2bdd0

On 01/05, Qing Wang wrote:
>
> The race condition occurs between the failure path of copy_process() and
> getting the PIDTYPE_TGID via __task_pid_nr_ns().
>
> Bug timeline:
>                                     Task B
>                                     perf_event_open()
> Task A <--------------------------- clone()
> copy_process()
>     perf_event_init_task()
>     ...
>     one copy failed
>     free_signal_struct()            close(event_fd)
>                                         perf_child_detach()
>                                             __task_pid_nr_ns()
>                                                 access child task->signal

Sorry, this description is very confusing to me... Is it Task B who does
the clone? Or does another Task A do copy_process()? Could you write a
clearer changelog?

>  bad_fork_cleanup_signal:
> -	if (!(clone_flags & CLONE_THREAD))
> -		free_signal_struct(p->signal);
> +	if (!(clone_flags & CLONE_THREAD)) {
> +		free_sig = p->signal;
> +		p->signal = NULL;
> +		free_signal_struct(free_sig);
> +	}
>  bad_fork_cleanup_sighand:
>  	__cleanup_sighand(p->sighand);
>  bad_fork_cleanup_fs:
> diff --git a/kernel/pid.c b/kernel/pid.c
> index a31771bc89c1..1a012e033552 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -329,9 +329,9 @@ EXPORT_SYMBOL_GPL(find_vpid);
>
>  static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
>  {
> -	return (type == PIDTYPE_PID) ?
> -		&task->thread_pid :
> -		&task->signal->pids[type];
> +	if (type == PIDTYPE_PID)
> +		return &task->thread_pid;
> +	return task->signal ? &task->signal->pids[type] : NULL;
>  }

At first glance this is racy. Can't task->signal be freed right after
the check?

And... Can't we make another fix? If copy_process() fails and does
free_signal_struct(), the child has not been added to rcu protected
lists and init_task_pid(child) was not called yet.

So perhaps something like the patch below can work?

Oleg.
---

--- x/kernel/events/core.c
+++ x/kernel/events/core.c
@@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
 static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
 				enum pid_type type)
 {
-	u32 nr;
+	u32 nr = 0;
 	/*
 	 * only top level events have the pid namespace they were created in
 	 */
 	if (event->parent)
 		event = event->parent;
 
-	nr = __task_pid_nr_ns(p, type, event->ns);
+	if (pid_alive(p))
+		nr = __task_pid_nr_ns(p, type, event->ns);
 	/* avoid -1 if it is idle thread or runs in another ns */
-	if (!nr && !pid_alive(p))
+	if (!nr)
 		nr = -1;
 	return nr;
 }



* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06  9:04 ` Oleg Nesterov
@ 2026-01-06 10:06   ` Qing Wang
  2026-01-06 10:26   ` Qing Wang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Qing Wang @ 2026-01-06 10:06 UTC
  To: oleg
  Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
	jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
	mingo, mjguzik, peterz, rostedt, rppt,
	syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
	wangqing7171

On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> Sorry, this description is very confusing to me... Is it Task B who does
> the clone? Or does another Task A do copy_process()? Could you write a
> clearer changelog?

The "<---...---clone" graph may have misled you. What I meant was that
Task A is cloned from Task B.

The modified bug timeline with explanation:

                                    Task B
                                    perf_event_open()
Task A <--------------------------- clone()
copy_process()
    perf_event_init_task()
    ...
    one copy failed
    free_signal_struct()
                                    close(event_fd)
                                        perf_child_detach()
                                            __task_pid_nr_ns()
                                                access child task->signal
    perf_event_init_task()

1. Task B creates perf events with perf_event_open().
2. Task B clones Task A, and Task A gets perf events copied from Task B in
   this clone().
3. Task A does one clone and fails to copy one part (e.g. copy_mm) in
   copy_process(), then goes to the free_signal_struct() cleanup.
4. Task B does close(event_fd) and accesses Task A's signal after
   free_signal_struct() and before perf_event_init_task() in Task A.

> At first glance this is racy. Can't task->signal be freed right after
> the check?
> 
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
> 
> So perhaps something like the patch below can work?
> 
> Oleg.
> ---
> 
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
>  static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
>  				enum pid_type type)
>  {
> -	u32 nr;
> +	u32 nr = 0;
>  	/*
>  	 * only top level events have the pid namespace they were created in
>  	 */
>  	if (event->parent)
>  		event = event->parent;
>  
> -	nr = __task_pid_nr_ns(p, type, event->ns);
> +	if (pid_alive(p))
> +		nr = __task_pid_nr_ns(p, type, event->ns);
>  	/* avoid -1 if it is idle thread or runs in another ns */
> -	if (!nr && !pid_alive(p))
> +	if (!nr)
>  		nr = -1;
>  	return nr;
>  }

I think it doesn't work, as I explained in my previous reply to Andrew:

A newly created task should not be visible to other CPUs during
creation: The perf subsystem copies the parent’s events
to the child during copy_process(). Later, when the parent closes
its own perf event, it may traverse child events and access
child_ctx->task->signal. This means that a child process that has not
yet been fully created can be referenced by other CPUs.


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06  9:04 ` Oleg Nesterov
  2026-01-06 10:06   ` Qing Wang
@ 2026-01-06 10:26   ` Qing Wang
  2026-01-06 10:58     ` Oleg Nesterov
  2026-01-06 10:58   ` Qing Wang
  2026-01-06 12:50   ` Oleg Nesterov
  3 siblings, 1 reply; 17+ messages in thread
From: Qing Wang @ 2026-01-06 10:26 UTC
  To: oleg
  Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
	jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
	mingo, mjguzik, peterz, rostedt, rppt,
	syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
	wangqing7171

On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> At first glance this is racy. Can't task->signal be freed right after
> the check?
> 
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
> 
> So perhaps something like the patch below can work?
> 
> Oleg.
> ---
> 
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
>  static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
>  				enum pid_type type)
>  {
> -	u32 nr;
> +	u32 nr = 0;
>  	/*
>  	 * only top level events have the pid namespace they were created in
>  	 */
>  	if (event->parent)
>  		event = event->parent;
>  
> -	nr = __task_pid_nr_ns(p, type, event->ns);
> +	if (pid_alive(p))
> +		nr = __task_pid_nr_ns(p, type, event->ns);
>  	/* avoid -1 if it is idle thread or runs in another ns */
> -	if (!nr && !pid_alive(p))
> +	if (!nr)
>  		nr = -1;
>  	return nr;
>  }

Sorry, please ignore my previous reply. I've reconsidered your code, and
using pid_alive() to check the validity of tsk->signal is actually correct.
The pid is assigned after copy_signal(), so if a task has a PID, its
tsk->signal memory is guaranteed to be valid.


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06 10:26   ` Qing Wang
@ 2026-01-06 10:58     ` Oleg Nesterov
  0 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2026-01-06 10:58 UTC
  To: Qing Wang
  Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
	jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
	mingo, mjguzik, peterz, rostedt, rppt,
	syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot

On 01/06, Qing Wang wrote:
>
> On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> > At first glance this is racy. Can't task->signal be freed right after
> > the check?
> >
> > And... Can't we make another fix? If copy_process() fails and does
> > free_signal_struct(), the child has not been added to rcu protected
> > lists and init_task_pid(child) was not called yet.
> >
> > So perhaps something like the patch below can work?
> >
> > Oleg.
> > ---
> >
> > --- x/kernel/events/core.c
> > +++ x/kernel/events/core.c
> > @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
> >  static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
> >  				enum pid_type type)
> >  {
> > -	u32 nr;
> > +	u32 nr = 0;
> >  	/*
> >  	 * only top level events have the pid namespace they were created in
> >  	 */
> >  	if (event->parent)
> >  		event = event->parent;
> >
> > -	nr = __task_pid_nr_ns(p, type, event->ns);
> > +	if (pid_alive(p))
> > +		nr = __task_pid_nr_ns(p, type, event->ns);
> >  	/* avoid -1 if it is idle thread or runs in another ns */
> > -	if (!nr && !pid_alive(p))
> > +	if (!nr)
> >  		nr = -1;
> >  	return nr;
> >  }
>
> Sorry, please ignore my previous reply. I've reconsidered your code, and
> using pid_alive() to check the validity of tsk->signal is actually correct.
> The pid is assigned after copy_signal(), so if a task has a PID, its
> tsk->signal memory is guaranteed to be valid.

Yes, if the child wasn't fully created then init_task_pid(child) was not
called so pid_alive(p) can't be true.

OK, if you agree with this approach, can you make V2? Or do you prefer
another approach?

The patch above is not 100% correct wrt "avoid -1 ...", but it seems that
this can be fixed.

Oleg.
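
One way to read the "avoid -1" caveat is to model the return-value logic
before and after the proposed change in userspace C. This is a sketch
under assumptions: the parameter "alive" stands in for pid_alive(p),
"lookup" for whatever __task_pid_nr_ns() would return (0 when the task
has no pid in the event's namespace), and all function names are
hypothetical. Whether this is the exact case meant by the caveat is a
guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return-value logic before the proposed change. */
static uint32_t pid_type_old(bool alive, uint32_t lookup)
{
	uint32_t nr = lookup;

	if (!nr && !alive)
		nr = (uint32_t)-1;
	return nr;
}

/* Return-value logic with the proposed change applied. */
static uint32_t pid_type_new(bool alive, uint32_t lookup)
{
	uint32_t nr = 0;

	if (alive)
		nr = lookup;	/* the lookup is skipped for a non-alive task */
	if (!nr)
		nr = (uint32_t)-1;
	return nr;
}

/* Walk the three interesting input combinations. */
static void pid_type_demo(void)
{
	/* Alive task with a pid in the namespace: both agree. */
	assert(pid_type_old(true, 1234) == pid_type_new(true, 1234));
	/* Non-alive task: both report -1. */
	assert(pid_type_old(false, 0) == pid_type_new(false, 0));
	/* Alive task, lookup yields 0: old keeps 0, new turns it into -1. */
	assert(pid_type_old(true, 0) != pid_type_new(true, 0));
}
```

In the model, the only divergence is an alive task whose lookup yields 0:
the old logic reports 0, while the changed logic reports -1.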



* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06  9:04 ` Oleg Nesterov
  2026-01-06 10:06   ` Qing Wang
  2026-01-06 10:26   ` Qing Wang
@ 2026-01-06 10:58   ` Qing Wang
  2026-01-06 11:19     ` Oleg Nesterov
  2026-01-06 12:50   ` Oleg Nesterov
  3 siblings, 1 reply; 17+ messages in thread
From: Qing Wang @ 2026-01-06 10:58 UTC
  To: oleg
  Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
	jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
	mingo, mjguzik, peterz, rostedt, rppt,
	syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
	wangqing7171

On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> At first glance this is racy. Can't task->signal be freed right after
> the check?
> 
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
> 
> So perhaps something like the patch below can work?
> 
> Oleg.
> ---
> 
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
>  static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
>  				enum pid_type type)
>  {
> -	u32 nr;
> +	u32 nr = 0;
>  	/*
>  	 * only top level events have the pid namespace they were created in
>  	 */
>  	if (event->parent)
>  		event = event->parent;
>  
> -	nr = __task_pid_nr_ns(p, type, event->ns);
> +	if (pid_alive(p))
> +		nr = __task_pid_nr_ns(p, type, event->ns);
>  	/* avoid -1 if it is idle thread or runs in another ns */
> -	if (!nr && !pid_alive(p))
> +	if (!nr)
>  		nr = -1;
>  	return nr;
>  }

Could we put the 'pid_alive(task)' check into __task_pid_nr_ns()?
There is another similar use case here.

arch/s390/kernel/perf_cpum_sf.c
  619,9: 		pid = __task_pid_nr_ns(tsk, type, event->ns);

---

diff --git a/kernel/pid.c b/kernel/pid.c
index a31771bc89c1..e8826731fa47 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -515,7 +515,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
        rcu_read_lock();
        if (!ns)
                ns = task_active_pid_ns(current);
-       if (ns)
+       if (ns && pid_alive(task))
                nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
        rcu_read_unlock();


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06 10:58   ` Qing Wang
@ 2026-01-06 11:19     ` Oleg Nesterov
  2026-01-07  2:43       ` Qing Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2026-01-06 11:19 UTC
  To: Qing Wang
  Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
	jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
	mingo, mjguzik, peterz, rostedt, rppt,
	syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot

On 01/06, Qing Wang wrote:
>
> Could we put the 'pid_alive(task)' check into __task_pid_nr_ns()?

I don't think so... see below.

> Because there is another similar use case here.
>
>
> arch/s390/kernel/perf_cpum_sf.c
>   619,9: 		pid = __task_pid_nr_ns(tsk, type, event->ns);

This case is not similar. This tsk was found by find_task_by_pid_ns(),
it must be fully initialized.

So I don't think it makes sense to add the additional check into
__task_pid_nr_ns().

> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -515,7 +515,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>         rcu_read_lock();
>         if (!ns)
>                 ns = task_active_pid_ns(current);
> -       if (ns)
> +       if (ns && pid_alive(task))

This reminds me... the 2nd "if (ns)" check must die. I'll ping Cristian.
See https://lore.kernel.org/all/20251015123613.GA9456@redhat.com/

Oleg.



* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06  9:04 ` Oleg Nesterov
                     ` (2 preceding siblings ...)
  2026-01-06 10:58   ` Qing Wang
@ 2026-01-06 12:50   ` Oleg Nesterov
  2026-01-07  9:40     ` Qing Wang
  2026-01-07  9:43     ` Oleg Nesterov
  3 siblings, 2 replies; 17+ messages in thread
From: Oleg Nesterov @ 2026-01-06 12:50 UTC
  To: Qing Wang
  Cc: mingo, peterz, juri.lelli, vincent.guittot, akpm, david,
	dietmar.eggemann, rostedt, bsegall, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, brauner, mjguzik, jack, joel.granados, linux-kernel,
	syzbot+e0378d4f4fe57aa2bdd0

On second thought...

sched_fork() is called before perf_event_init_task(). So perhaps
sync_child_event() could also check task->__state != TASK_NEW before
perf_event_read_event() ?

Not sure, I know nothing about perf. Would be nice if perf experts can
take a look.

Oleg.

On 01/06, Oleg Nesterov wrote:
>
> On 01/05, Qing Wang wrote:
> >
> > The race condition occurs between the failure path of copy_process() and
> > getting the PIDTYPE_TGID via __task_pid_nr_ns().
> >
> > Bug timeline:
> >                                     Task B
> >                                     perf_event_open()
> > Task A <--------------------------- clone()
> > copy_process()
> >     perf_event_init_task()
> >     ...
> >     one copy failed
> >     free_signal_struct()            close(event_fd)
> >                                         perf_child_detach()
> >                                             __task_pid_nr_ns()
> >                                                 access child task->signal
> 
> Sorry, this description is very confusing to me... Is it Task B who does
> the clone? Or does another Task A do copy_process()? Could you write a
> clearer changelog?
> 
> >  bad_fork_cleanup_signal:
> > -	if (!(clone_flags & CLONE_THREAD))
> > -		free_signal_struct(p->signal);
> > +	if (!(clone_flags & CLONE_THREAD)) {
> > +		free_sig = p->signal;
> > +		p->signal = NULL;
> > +		free_signal_struct(free_sig);
> > +	}
> >  bad_fork_cleanup_sighand:
> >  	__cleanup_sighand(p->sighand);
> >  bad_fork_cleanup_fs:
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index a31771bc89c1..1a012e033552 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -329,9 +329,9 @@ EXPORT_SYMBOL_GPL(find_vpid);
> >
> >  static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
> >  {
> > -	return (type == PIDTYPE_PID) ?
> > -		&task->thread_pid :
> > -		&task->signal->pids[type];
> > +	if (type == PIDTYPE_PID)
> > +		return &task->thread_pid;
> > +	return task->signal ? &task->signal->pids[type] : NULL;
> >  }
> 
> At first glance this is racy. Can't task->signal be freed right after
> the check?
> 
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
> 
> So perhaps something like the patch below can work?
> 
> Oleg.
> ---
> 
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
>  static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
>  				enum pid_type type)
>  {
> -	u32 nr;
> +	u32 nr = 0;
>  	/*
>  	 * only top level events have the pid namespace they were created in
>  	 */
>  	if (event->parent)
>  		event = event->parent;
>  
> -	nr = __task_pid_nr_ns(p, type, event->ns);
> +	if (pid_alive(p))
> +		nr = __task_pid_nr_ns(p, type, event->ns);
>  	/* avoid -1 if it is idle thread or runs in another ns */
> -	if (!nr && !pid_alive(p))
> +	if (!nr)
>  		nr = -1;
>  	return nr;
>  }



* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06 11:19     ` Oleg Nesterov
@ 2026-01-07  2:43       ` Qing Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Qing Wang @ 2026-01-07  2:43 UTC
  To: oleg
  Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
	jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
	mingo, mjguzik, peterz, rostedt, rppt,
	syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
	wangqing7171

On Tue, 06 Jan 2026 at 19:19, Oleg Nesterov <oleg@redhat.com> wrote:
> This case is not similar. This tsk was found by find_task_by_pid_ns(),
> it must be fully initialized.
> 
> So I don't think it makes sense to add the additional check into
> __task_pid_nr_ns().

I agree with this. Let's make a new patch.

> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -515,7 +515,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
> >         rcu_read_lock();
> >         if (!ns)
> >                 ns = task_active_pid_ns(current);
> > -       if (ns)
> > +       if (ns && pid_alive(task))
> 
> This reminds me... the 2nd "if (ns)" check must die. I'll ping Cristian.
> See https://lore.kernel.org/all/20251015123613.GA9456@redhat.com/

I looked at this link. Your patches are not merged into master.


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06 12:50   ` Oleg Nesterov
@ 2026-01-07  9:40     ` Qing Wang
  2026-01-07 14:54       ` Oleg Nesterov
  2026-01-07  9:43     ` Oleg Nesterov
  1 sibling, 1 reply; 17+ messages in thread
From: Qing Wang @ 2026-01-07  9:40 UTC
  To: oleg
  Cc: thaumy.love, Liam.Howlett, akpm, brauner, bsegall, jack,
	joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes, mingo,
	mjguzik, peterz, rostedt, rppt, syzbot+e0378d4f4fe57aa2bdd0,
	wangqing7171

On Tue, 06 Jan 2026 at 20:50, Oleg Nesterov <oleg@redhat.com> wrote:
> On second thought...
> 
> sched_fork() is called before perf_event_init_task(). So perhaps
> sync_child_event() could also check task->__state != TASK_NEW before
> perf_event_read_event() ?
> 
> Not sure, I know nothing about perf. Would be nice if perf experts can
> take a look.
> 
> Oleg.

I agree with your idea. But we don't need to fix this issue anymore:
after reviewing the current mainline code, I found that it has already
been resolved by c418d8b4d7a4 ("perf/core: Fix missing read event
generation on task exit"), which moved sync_child_event() from
perf_child_detach() into perf_event_exit_event().

Here https://patch.msgid.link/20251209041600.963586-1-thaumy.love@gmail.com

As a result, perf_event_read_event() no longer occurs on the problematic
path reported (i.e., the close()->perf_release() path).

Qing.


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-06 12:50   ` Oleg Nesterov
  2026-01-07  9:40     ` Qing Wang
@ 2026-01-07  9:43     ` Oleg Nesterov
  1 sibling, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2026-01-07  9:43 UTC
  To: Qing Wang
  Cc: mingo, peterz, juri.lelli, vincent.guittot, akpm, david,
	dietmar.eggemann, rostedt, bsegall, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, brauner, mjguzik, jack, joel.granados, linux-kernel,
	syzbot+e0378d4f4fe57aa2bdd0

On 01/06, Oleg Nesterov wrote:
>
> On second thought...
>
> sched_fork() is called before perf_event_init_task(). So perhaps
> sync_child_event() could also check task->__state != TASK_NEW before
> perf_event_read_event() ?
>
> Not sure, I know nothing about perf. Would be nice if perf experts can
> take a look.

Or something else, but we can't rely on pid_alive() or ->signal != NULL
checks.

perf_event_init_task() is called soon after dup_task_struct(), so
pid_alive() is true and child->signal == current->signal.

Let's forget about use-after-free. What if perf_child_detach() paths
call __task_pid_nr_ns() before copy_signal() etc.? In this case
perf_event_pid/perf_event_tid will return the pids of the forking
process, not the child's pids.

Oleg.
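
The point about the freshly duplicated task can be illustrated with a
userspace C model. It is a sketch under assumptions: the structs and
model_*() functions are hypothetical stand-ins whose field names merely
mirror the ones discussed in this thread, model_dup_task() reduces the
early part of fork to a plain struct copy, and model_pid_alive() follows
the kernel's pid_alive(), which tests whether thread_pid is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; field names mirror the thread, layouts do not. */
struct pid_model { int nr; };
struct sig_model { struct pid_model *tgid; };

struct task_model {
	struct pid_model *thread_pid;
	struct sig_model *signal;
};

/* Modelled on pid_alive(): is thread_pid set? */
static bool model_pid_alive(const struct task_model *t)
{
	return t->thread_pid != NULL;
}

/* Stand-in for the early part of fork: the child starts as a copy. */
static struct task_model model_dup_task(const struct task_model *parent)
{
	return *parent;	/* pointer fields still point at the parent's objects */
}

/* What a reader guarded by both checks would report for the child. */
static int model_child_tgid(const struct task_model *child)
{
	assert(child != NULL);
	if (!model_pid_alive(child) || child->signal == NULL)
		return -1;
	return child->signal->tgid->nr;
}
```

In the model, a child produced by model_dup_task() passes both the
pid_alive-style check and the signal != NULL check, and
model_child_tgid() reports the parent's id. Neither check distinguishes
a half-constructed child from a finished one.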



* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-07  9:40     ` Qing Wang
@ 2026-01-07 14:54       ` Oleg Nesterov
  0 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2026-01-07 14:54 UTC
  To: Qing Wang
  Cc: thaumy.love, Liam.Howlett, akpm, brauner, bsegall, jack,
	joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes, mingo,
	mjguzik, peterz, rostedt, rppt, syzbot+e0378d4f4fe57aa2bdd0

On 01/07, Qing Wang wrote:
>
> I agree with your idea. But we don't need to fix this issue anymore,
> because after reviewing the current mainline code, I found that it has
> already been resolved (c418d8b4d7a4 "perf/core: Fix missing read event
> generation on task exit") by moving sync_child_event() from
> perf_child_detach() into perf_event_exit_event().
>
> Here https://patch.msgid.link/20251209041600.963586-1-thaumy.love@gmail.com
>
> As a result, perf_event_read_event() no longer occurs on the problematic
> path reported (i.e., the close()->perf_release() path).

Great, thanks. So we can forget this problem ;)

Oleg.



* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
       [not found] <20260105045609.1764387-1-wangqing7171@gmail.com>
@ 2026-01-07 20:39 ` Kees Cook
  2026-01-08  2:15   ` Qing Wang
  2026-01-08  3:44   ` Qing Wang
  0 siblings, 2 replies; 17+ messages in thread
From: Kees Cook @ 2026-01-07 20:39 UTC (permalink / raw)
  To: Qing Wang
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-mm,
	linux-kernel, syzbot+e0378d4f4fe57aa2bdd0

On Mon, Jan 05, 2026 at 12:56:09PM +0800, Qing Wang wrote:
> Syzbot reported a slab-use-after-free issue in __task_pid_nr_ns:
> 
>     BUG: KASAN: slab-use-after-free in __task_pid_nr_ns+0x1e4/0x490...
>     Read of size 8 at addr ffff88807f8058a8 by task syz.1.574/8108
> 
> The race condition occurs between the failure path of copy_process() and
> getting the PIDTYPE_TGID via __task_pid_nr_ns().
> 
> Bug timeline:
>                                     Task B
>                                     perf_event_open()
> Task A <--------------------------- clone()
> copy_process()
>     perf_event_init_task()
>     ...
>     one copy failed
>     free_signal_struct()            close(event_fd)
>                                         perf_child_detach()
>                                             __task_pid_nr_ns()
>                                                 access child task->signal
> 
> This is fixed by:
> 1. Setting task->signal = NULL in the failure cleanup path of copy_process.
> 2. Adding a null check for task->signal before accessing PIDTYPE_TGID from
> task->signal.
> 
> Note: This bug was reported by syzbot without a reproducer.
> The fix is based on code inspection and race condition analysis.

It seems like there is synchronization missing between the task->signal
assignment and its check in task_pid_ptr? Aren't there other ways of
checking if a task is dead? This change doesn't look right to me...

-Kees
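
[Illustration, not part of the original message.] One reading of the
synchronization concern above, as a deterministic single-threaded model.
The toy_* names are hypothetical, and the `freed` flag stands in for
free_signal_struct() so that the stale access can be observed without
actually touching freed memory. The reader's NULL check and its later
dereference are two separate steps, and nothing in the patch stops the
cleanup path from running between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; `freed` marks the object as released instead of freeing it. */
struct toy_signal {
	int tgid;
	bool freed;
};

struct toy_task {
	struct toy_signal *signal;
};

/* Reader, step 1: load ->signal; NULL means "already torn down". */
static inline struct toy_signal *toy_reader_check(struct toy_task *t)
{
	return t->signal;
}

/* Writer: same order as the patched cleanup (clear the pointer, then free). */
static inline void toy_cleanup_signal(struct toy_task *t)
{
	struct toy_signal *free_sig = t->signal;

	t->signal = NULL;
	free_sig->freed = true;	/* stands in for free_signal_struct() */
}

/* Reader, step 2: true if the pointer from step 1 now refers to a freed object. */
static inline bool toy_reader_use_is_stale(const struct toy_signal *sig)
{
	return sig && sig->freed;
}
```

Running step 1, then the writer, then step 2 leaves the reader holding a
non-NULL pointer to an object that has already been released. The model
says nothing about which primitive (a lock, RCU, a reference count) would
be appropriate in the kernel.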

> 
> Reported-by: syzbot+e0378d4f4fe57aa2bdd0@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=e0378d4f4fe57aa2bdd0
> Signed-off-by: Qing Wang <wangqing7171@gmail.com>
> ---
>  kernel/fork.c | 8 ++++++--
>  kernel/pid.c  | 6 +++---
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b1f3915d5f8e..72b9b37a96c8 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1975,6 +1975,7 @@ __latent_entropy struct task_struct *copy_process(
>  	struct file *pidfile = NULL;
>  	const u64 clone_flags = args->flags;
>  	struct nsproxy *nsp = current->nsproxy;
> +	struct signal_struct *free_sig = NULL;
>  
>  	/*
>  	 * Don't allow sharing the root directory with processes in a different
> @@ -2501,8 +2502,11 @@ __latent_entropy struct task_struct *copy_process(
>  		mmput(p->mm);
>  	}
>  bad_fork_cleanup_signal:
> -	if (!(clone_flags & CLONE_THREAD))
> -		free_signal_struct(p->signal);
> +	if (!(clone_flags & CLONE_THREAD)) {
> +		free_sig = p->signal;
> +		p->signal = NULL;
> +		free_signal_struct(free_sig);
> +	}
>  bad_fork_cleanup_sighand:
>  	__cleanup_sighand(p->sighand);
>  bad_fork_cleanup_fs:
> diff --git a/kernel/pid.c b/kernel/pid.c
> index a31771bc89c1..1a012e033552 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -329,9 +329,9 @@ EXPORT_SYMBOL_GPL(find_vpid);
>  
>  static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
>  {
> -	return (type == PIDTYPE_PID) ?
> -		&task->thread_pid :
> -		&task->signal->pids[type];
> +	if (type == PIDTYPE_PID)
> +		return &task->thread_pid;
> +	return task->signal ? &task->signal->pids[type] : NULL;
>  }
>  
>  /*
> -- 
> 2.34.1
> 

-- 
Kees Cook


* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
  2026-01-07 20:39 ` Kees Cook
@ 2026-01-08  2:15   ` Qing Wang
  2026-01-08  3:44   ` Qing Wang
  1 sibling, 0 replies; 17+ messages in thread
From: Qing Wang @ 2026-01-08  2:15 UTC (permalink / raw)
  To: kees
  Cc: Liam.Howlett, akpm, bsegall, david, dietmar.eggemann, juri.lelli,
	linux-kernel, linux-mm, lorenzo.stoakes, mgorman, mhocko, mingo,
	peterz, rostedt, rppt, surenb, syzbot+e0378d4f4fe57aa2bdd0,
	vbabka, vincent.guittot, vschneid, wangqing7171

On Thu, 08 Jan 2026 at 04:39, Kees Cook <kees@kernel.org> wrote:
> It seems like there is synchronization missing between the task->signal
> assignment and its check in task_pid_ptr? Aren't there other ways of
> checking if a task is dead? This change doesn't look right to me...
> 
> -Kees

Thanks for your reply. Oleg and I discussed this and concluded that this
issue no longer exists.

Discussion: https://lore.kernel.org/all/aV5zkjzLTwKQOn9D@redhat.com/#R

Qing.


