* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Qing Wang @ 2026-01-06 10:06 UTC
To: oleg
Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
mingo, mjguzik, peterz, rostedt, rppt,
syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
wangqing7171
On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> Sorry, this description is very confusing to me... Is it Task B that does
> the clone? Or does another Task A do copy_process()? Could you write a
> clearer changelog?
The "<---...---clone" graph may have misled you. What I meant is that
Task A is cloned from Task B.
Here is the revised bug timeline, with an explanation:
                                    Task B
                                    perf_event_open()
Task A <--------------------------- clone()
copy_process()
  perf_event_init_task()
  ...
  one copy failed
  free_signal_struct()
                                    close(event_fd)
                                      perf_child_detach()
                                        __task_pid_nr_ns()
                                          access child task->signal
  perf_event_init_task()
1. Task B creates perf events with perf_event_open().
2. Task B clones Task A; during this clone(), Task A inherits copies of
Task B's perf events.
3. While Task A is being created, one of the copy steps in copy_process()
fails (e.g. copy_mm()), so copy_process() jumps to the cleanup path and
calls free_signal_struct().
4. Task B then calls close(event_fd), which accesses Task A's ->signal after
free_signal_struct() and before perf_event_init_task() in Task A.
> At first glance this is racy. Can't task->signal be freed right after
> the check?
>
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
>
> So perhaps something like the patch below can work?
>
> Oleg.
> ---
>
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
> static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
> enum pid_type type)
> {
> - u32 nr;
> + u32 nr = 0;
> /*
> * only top level events have the pid namespace they were created in
> */
> if (event->parent)
> event = event->parent;
>
> - nr = __task_pid_nr_ns(p, type, event->ns);
> + if (pid_alive(p))
> + nr = __task_pid_nr_ns(p, type, event->ns);
> /* avoid -1 if it is idle thread or runs in another ns */
> - if (!nr && !pid_alive(p))
> + if (!nr)
> nr = -1;
> return nr;
> }
I don't think it works, as I explained in my previous reply to Andrew.
A newly created task should not be visible to other CPUs while it is
being created, but the perf subsystem copies the parent's events to the
child during copy_process(). Later, when the parent closes its own perf
event, it may traverse the child events and access
child_ctx->task->signal. This means that a child process that has not
yet been fully created can be referenced by other CPUs.
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Qing Wang @ 2026-01-06 10:26 UTC
To: oleg
Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
mingo, mjguzik, peterz, rostedt, rppt,
syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
wangqing7171
On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> At first glance this is racy. Can't task->signal be freed right after
> the check?
>
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
>
> So perhaps something like the patch below can work?
>
> Oleg.
> ---
>
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
> static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
> enum pid_type type)
> {
> - u32 nr;
> + u32 nr = 0;
> /*
> * only top level events have the pid namespace they were created in
> */
> if (event->parent)
> event = event->parent;
>
> - nr = __task_pid_nr_ns(p, type, event->ns);
> + if (pid_alive(p))
> + nr = __task_pid_nr_ns(p, type, event->ns);
> /* avoid -1 if it is idle thread or runs in another ns */
> - if (!nr && !pid_alive(p))
> + if (!nr)
> nr = -1;
> return nr;
> }
Sorry, please ignore my previous reply. I've reconsidered your code, and
using pid_alive() to check the validity of tsk->signal is actually correct.
The pid is assigned after copy_signal(), so if a task has a PID, its
tsk->signal memory is guaranteed to be valid.
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Oleg Nesterov @ 2026-01-06 10:58 UTC
To: Qing Wang
Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
mingo, mjguzik, peterz, rostedt, rppt,
syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot
On 01/06, Qing Wang wrote:
>
> On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> > At first glance this is racy. Can't task->signal be freed right after
> > the check?
> >
> > And... Can't we make another fix? If copy_process() fails and does
> > free_signal_struct(), the child has not been added to rcu protected
> > lists and init_task_pid(child) was not called yet.
> >
> > So perhaps something like the patch below can work?
> >
> > Oleg.
> > ---
> >
> > --- x/kernel/events/core.c
> > +++ x/kernel/events/core.c
> > @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
> > static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
> > enum pid_type type)
> > {
> > - u32 nr;
> > + u32 nr = 0;
> > /*
> > * only top level events have the pid namespace they were created in
> > */
> > if (event->parent)
> > event = event->parent;
> >
> > - nr = __task_pid_nr_ns(p, type, event->ns);
> > + if (pid_alive(p))
> > + nr = __task_pid_nr_ns(p, type, event->ns);
> > /* avoid -1 if it is idle thread or runs in another ns */
> > - if (!nr && !pid_alive(p))
> > + if (!nr)
> > nr = -1;
> > return nr;
> > }
>
> Sorry, please ignore my previous reply. I've reconsidered your code, and
> using pid_alive() to check the validity of tsk->signal is actually correct.
> The pid is assigned after copy_signal(), so if a task has a PID, its
> tsk->signal memory is guaranteed to be valid.
Yes, if the child wasn't fully created, then init_task_pid(child) was not
called, so pid_alive(p) can't be true.
OK, if you agree with this approach, can you send V2? Or do you prefer
another approach?
The patch above is not 100% correct wrt "avoid -1 ...", but it seems that
this can be fixed.
Oleg.
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Qing Wang @ 2026-01-06 10:58 UTC
To: oleg
Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
mingo, mjguzik, peterz, rostedt, rppt,
syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
wangqing7171
On Tue, 06 Jan 2026 at 17:04, Oleg Nesterov <oleg@redhat.com> wrote:
> At first glance this is racy. Can't task->signal be freed right after
> the check?
>
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
>
> So perhaps something like the patch below can work?
>
> Oleg.
> ---
>
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
> static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
> enum pid_type type)
> {
> - u32 nr;
> + u32 nr = 0;
> /*
> * only top level events have the pid namespace they were created in
> */
> if (event->parent)
> event = event->parent;
>
> - nr = __task_pid_nr_ns(p, type, event->ns);
> + if (pid_alive(p))
> + nr = __task_pid_nr_ns(p, type, event->ns);
> /* avoid -1 if it is idle thread or runs in another ns */
> - if (!nr && !pid_alive(p))
> + if (!nr)
> nr = -1;
> return nr;
> }
Could we put the pid_alive(task) check into __task_pid_nr_ns() instead?
There is another similar use case here:
arch/s390/kernel/perf_cpum_sf.c
619,9: pid = __task_pid_nr_ns(tsk, type, event->ns);
---
diff --git a/kernel/pid.c b/kernel/pid.c
index a31771bc89c1..e8826731fa47 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -515,7 +515,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
rcu_read_lock();
if (!ns)
ns = task_active_pid_ns(current);
- if (ns)
+ if (ns && pid_alive(task))
nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
rcu_read_unlock();
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Oleg Nesterov @ 2026-01-06 11:19 UTC
To: Qing Wang
Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
mingo, mjguzik, peterz, rostedt, rppt,
syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot
On 01/06, Qing Wang wrote:
>
> Could we put the checking 'pid_alive(task)' into __task_pid_nr_ns()?
I don't think so... see below.
> Because there is another similar use case here.
>
>
> arch/s390/kernel/perf_cpum_sf.c
> 619,9: pid = __task_pid_nr_ns(tsk, type, event->ns);
This case is not similar. This tsk was found by find_task_by_pid_ns(),
so it must be fully initialized.
So I don't think it makes sense to add the additional check into
__task_pid_nr_ns().
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -515,7 +515,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
> rcu_read_lock();
> if (!ns)
> ns = task_active_pid_ns(current);
> - if (ns)
> + if (ns && pid_alive(task))
This reminds me... the 2nd "if (ns)" check must die. I'll ping Cristian.
See https://lore.kernel.org/all/20251015123613.GA9456@redhat.com/
Oleg.
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Qing Wang @ 2026-01-07 2:43 UTC
To: oleg
Cc: Liam.Howlett, akpm, brauner, bsegall, david, dietmar.eggemann,
jack, joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes,
mingo, mjguzik, peterz, rostedt, rppt,
syzbot+e0378d4f4fe57aa2bdd0, vbabka, vincent.guittot,
wangqing7171
On Tue, 06 Jan 2026 at 19:19, Oleg Nesterov <oleg@redhat.com> wrote:
> This case is not similar. This tsk was found by find_task_by_pid_ns(),
> so it must be fully initialized.
>
> So I don't think it makes sense to add the additional check into
> __task_pid_nr_ns().
I agree with this. Let's make a new patch.
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -515,7 +515,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
> > rcu_read_lock();
> > if (!ns)
> > ns = task_active_pid_ns(current);
> > - if (ns)
> > + if (ns && pid_alive(task))
>
> This reminds me... the 2nd "if (ns)" check must die. I'll ping Cristian.
> See https://lore.kernel.org/all/20251015123613.GA9456@redhat.com/
I looked at that link; your patches have not been merged into master yet.
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Oleg Nesterov @ 2026-01-06 12:50 UTC
To: Qing Wang
Cc: mingo, peterz, juri.lelli, vincent.guittot, akpm, david,
dietmar.eggemann, rostedt, bsegall, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, brauner, mjguzik, jack, joel.granados, linux-kernel,
syzbot+e0378d4f4fe57aa2bdd0
On a second thought...
sched_fork() is called before perf_event_init_task(). So perhaps
sync_child_event() could also check task->__state != TASK_NEW before
perf_event_read_event() ?
Not sure, I know nothing about perf. Would be nice if perf experts can
take a look.
Oleg.
On 01/06, Oleg Nesterov wrote:
>
> On 01/05, Qing Wang wrote:
> >
> > The race condition occurs between the failure path of copy_process() and
> > getting the PIDTYPE_TGID via __task_pid_nr_ns().
> >
> > Bug timeline:
> >                                     Task B
> >                                     perf_event_open()
> > Task A <--------------------------- clone()
> > copy_process()
> >   perf_event_init_task()
> >   ...
> >   one copy failed
> >   free_signal_struct()              close(event_fd)
> >                                       perf_child_detach()
> >                                         __task_pid_nr_ns()
> >                                           access child task->signal
>
> Sorry, this description is very confusing to me... Is it Task B that does
> the clone? Or does another Task A do copy_process()? Could you write a
> clearer changelog?
>
> > bad_fork_cleanup_signal:
> > - if (!(clone_flags & CLONE_THREAD))
> > - free_signal_struct(p->signal);
> > + if (!(clone_flags & CLONE_THREAD)) {
> > + free_sig = p->signal;
> > + p->signal = NULL;
> > + free_signal_struct(free_sig);
> > + }
> > bad_fork_cleanup_sighand:
> > __cleanup_sighand(p->sighand);
> > bad_fork_cleanup_fs:
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index a31771bc89c1..1a012e033552 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -329,9 +329,9 @@ EXPORT_SYMBOL_GPL(find_vpid);
> >
> > static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
> > {
> > - return (type == PIDTYPE_PID) ?
> > - &task->thread_pid :
> > - &task->signal->pids[type];
> > + if (type == PIDTYPE_PID)
> > + return &task->thread_pid;
> > + return task->signal ? &task->signal->pids[type] : NULL;
> > }
>
> At first glance this is racy. Can't task->signal be freed right after
> the check?
>
> And... Can't we make another fix? If copy_process() fails and does
> free_signal_struct(), the child has not been added to rcu protected
> lists and init_task_pid(child) was not called yet.
>
> So perhaps something like the patch below can work?
>
> Oleg.
> ---
>
> --- x/kernel/events/core.c
> +++ x/kernel/events/core.c
> @@ -1422,16 +1422,17 @@ unclone_ctx(struct perf_event_context *c
> static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
> enum pid_type type)
> {
> - u32 nr;
> + u32 nr = 0;
> /*
> * only top level events have the pid namespace they were created in
> */
> if (event->parent)
> event = event->parent;
>
> - nr = __task_pid_nr_ns(p, type, event->ns);
> + if (pid_alive(p))
> + nr = __task_pid_nr_ns(p, type, event->ns);
> /* avoid -1 if it is idle thread or runs in another ns */
> - if (!nr && !pid_alive(p))
> + if (!nr)
> nr = -1;
> return nr;
> }
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Qing Wang @ 2026-01-07 9:40 UTC
To: oleg
Cc: thaumy.love, Liam.Howlett, akpm, brauner, bsegall, jack,
joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes, mingo,
mjguzik, peterz, rostedt, rppt, syzbot+e0378d4f4fe57aa2bdd0,
wangqing7171
On Tue, 06 Jan 2026 at 20:50, Oleg Nesterov <oleg@redhat.com> wrote:
> On a second thought...
>
> sched_fork() is called before perf_event_init_task(). So perhaps
> sync_child_event() could also check task->__state != TASK_NEW before
> perf_event_read_event() ?
>
> Not sure, I know nothing about perf. Would be nice if perf experts can
> take a look.
>
> Oleg.
I agree with your idea. But we don't need to fix this issue anymore:
after reviewing the current mainline code, I found that it has
already been resolved (c418d8b4d7a4 "perf/core: Fix missing read event
generation on task exit") by moving sync_child_event() from
perf_child_detach() into perf_event_exit_event().
See https://patch.msgid.link/20251209041600.963586-1-thaumy.love@gmail.com
As a result, perf_event_read_event() no longer occurs on the problematic
path reported (i.e., the close()->perf_release() path).
Qing.
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Oleg Nesterov @ 2026-01-07 14:54 UTC
To: Qing Wang
Cc: thaumy.love, Liam.Howlett, akpm, brauner, bsegall, jack,
joel.granados, juri.lelli, linux-kernel, lorenzo.stoakes, mingo,
mjguzik, peterz, rostedt, rppt, syzbot+e0378d4f4fe57aa2bdd0
On 01/07, Qing Wang wrote:
>
> I agree with your idea. But we don't need to fix this issue anymore:
> after reviewing the current mainline code, I found that it has
> already been resolved (c418d8b4d7a4 "perf/core: Fix missing read event
> generation on task exit") by moving sync_child_event() from
> perf_child_detach() into perf_event_exit_event().
>
> See https://patch.msgid.link/20251209041600.963586-1-thaumy.love@gmail.com
>
> As a result, perf_event_read_event() no longer occurs on the problematic
> path reported (i.e., the close()->perf_release() path).
Great, thanks. So we can forget this problem ;)
Oleg.
* Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Oleg Nesterov @ 2026-01-07 9:43 UTC
To: Qing Wang
Cc: mingo, peterz, juri.lelli, vincent.guittot, akpm, david,
dietmar.eggemann, rostedt, bsegall, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, brauner, mjguzik, jack, joel.granados, linux-kernel,
syzbot+e0378d4f4fe57aa2bdd0
On 01/06, Oleg Nesterov wrote:
>
> On a second thought...
>
> sched_fork() is called before perf_event_init_task(). So perhaps
> sync_child_event() could also check task->__state != TASK_NEW before
> perf_event_read_event() ?
>
> Not sure, I know nothing about perf. Would be nice if perf experts can
> take a look.
Or something else, but we can't rely on pid_alive() or ->signal != NULL
checks.
perf_event_init_task() is called soon after dup_task_struct(), so
pid_alive() is true and child->signal == current->signal.
Let's forget about the use-after-free. What if the perf_child_detach() paths
call __task_pid_nr_ns() before copy_signal() etc.? In this case
perf_event_pid/perf_event_tid will return the pids of the forking
process, not the child's pids.
Oleg.