public inbox for linux-perf-users@vger.kernel.org
* [BUG] Task stuck on global_ctx_data_rwsem
@ 2025-12-22 23:34 Namhyung Kim
  2025-12-22 23:36 ` [BUG] perf/core: " Namhyung Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2025-12-22 23:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
	linux-perf-users

Hello,

I got a report that a task is stuck in perf_event_exit_task() waiting
for global_ctx_data_rwsem.  On large systems, attach_global_ctx_data()
holds the lock for a long time because it iterates over all threads in
the system to allocate the context data.  That blocks the task exit
path, which is especially problematic under memory pressure.

  perf_event_open
    perf_event_alloc
      attach_perf_ctx_data
        attach_global_ctx_data
          percpu_down_write (global_ctx_data_rwsem)
            for_each_process_thread
              alloc_task_ctx_data
                                               do_exit
                                                 perf_event_exit_task
                                                   percpu_down_read (global_ctx_data_rwsem)

I think attach_global_ctx_data() should skip tasks with PF_EXITING and
it'd be nice if perf_event_exit_task() could release the ctx_data
unconditionally.  But I'm not sure how to synchronize them properly.

Any thoughts?

Thanks,
Namhyung



* Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
  2025-12-22 23:34 [BUG] Task stuck on global_ctx_data_rwsem Namhyung Kim
@ 2025-12-22 23:36 ` Namhyung Kim
  2026-01-06 22:34   ` Namhyung Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2025-12-22 23:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
	linux-perf-users, linux-kernel

Added a subject prefix and CC'ed LKML.

Thanks,
Namhyung



* Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
  2025-12-22 23:36 ` [BUG] perf/core: " Namhyung Kim
@ 2026-01-06 22:34   ` Namhyung Kim
  2026-01-07  9:16     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2026-01-06 22:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
	linux-perf-users, linux-kernel

Hello,

On Mon, Dec 22, 2025 at 03:36:53PM -0800, Namhyung Kim wrote:
> On Mon, Dec 22, 2025 at 03:34:23PM -0800, Namhyung Kim wrote:
> > I think attach_global_ctx_data() should skip tasks with PF_EXITING and
> > it'd be nice if perf_event_exit_task() could release the ctx_data
> > unconditionally.  But I'm not sure how to synchronize them properly.
> > 
> > Any thoughts?

I'm curious if this makes any sense..  I feel like it needs to check the
flag again before allocation.

Thanks,
Namhyung


diff --git a/kernel/events/core.c b/kernel/events/core.c
index 376fb07d869b8b50..2a8847e95d7eb698 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5469,6 +5469,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
 	/* Allocate everything */
 	scoped_guard (rcu) {
 		for_each_process_thread(g, p) {
+			if (p->flags & PF_EXITING)
+				continue;
 			cd = rcu_dereference(p->perf_ctx_data);
 			if (cd && !cd->global) {
 				cd->global = 1;
@@ -14563,7 +14565,6 @@ void perf_event_exit_task(struct task_struct *task)
 	/*
 	 * Detach the perf_ctx_data for the system-wide event.
 	 */
-	guard(percpu_read)(&global_ctx_data_rwsem);
 	detach_task_ctx_data(task);
 }
 


* Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
  2026-01-06 22:34   ` Namhyung Kim
@ 2026-01-07  9:16     ` Peter Zijlstra
  2026-01-07 19:01       ` Namhyung Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-01-07  9:16 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, linux-perf-users, linux-kernel

On Tue, Jan 06, 2026 at 02:34:40PM -0800, Namhyung Kim wrote:
> I'm curious if this makes any sense..  I feel like it needs to check the
> flag again before allocation.
> 
> Thanks,
> Namhyung
> 
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 376fb07d869b8b50..2a8847e95d7eb698 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5469,6 +5469,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  	/* Allocate everything */
>  	scoped_guard (rcu) {
>  		for_each_process_thread(g, p) {
> +			if (p->flags & PF_EXITING)
> +				continue;
>  			cd = rcu_dereference(p->perf_ctx_data);
>  			if (cd && !cd->global) {
>  				cd->global = 1;

I suppose this makes sense.

> @@ -14563,7 +14565,6 @@ void perf_event_exit_task(struct task_struct *task)
>  	/*
>  	 * Detach the perf_ctx_data for the system-wide event.
>  	 */
> -	guard(percpu_read)(&global_ctx_data_rwsem);
>  	detach_task_ctx_data(task);
>  }

This would need a comment; something like:

	/*
	 * This can be done without holding global_ctx_data_rwsem
	 * because this is done after setting PF_EXITING such that
	 * attach_global_ctx_data() will skip over this task.
	 */
	WARN_ON_ONCE(!(task->flags & PF_EXITING));

But yes, I suppose this can do. The question is however, how do you get
into this predicament to begin with? Are you creating and destroying a
lot of global LBR events or something?

Would it make sense to delay detach_global_ctx_data() for a second or
so? That is, what is your event creation pattern?


* Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
  2026-01-07  9:16     ` Peter Zijlstra
@ 2026-01-07 19:01       ` Namhyung Kim
  2026-01-07 22:28         ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2026-01-07 19:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, linux-perf-users, linux-kernel

On Wed, Jan 07, 2026 at 10:16:52AM +0100, Peter Zijlstra wrote:
> But yes, I suppose this can do. The question is however, how do you get
> into this predicament to begin with? Are you creating and destroying a
> lot of global LBR events or something?

I think it's just because there are too many tasks in the system like
O(100K).  And any thread going to exit needs to wait for
attach_global_ctx_data() to finish the iteration over every task.

> 
> Would it make sense to delay detach_global_ctx_data() for a second or
> so? That is, what is your event creation pattern?

I don't think it has a special pattern, but I'm curious how we can
handle a race like below.

  attach_global_ctx_data
    check p->flags & PF_EXITING
                                              do_exit
    (preemption)                                set PF_EXITING
                                                detach_task_ctx_data()
    check p->perf_ctx_data
    attach_task_ctx_data()   ---> memory leak

Thanks,
Namhyung
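
The interleaving in the diagram above can be replayed in user space.
What follows is only a minimal single-threaded sketch, not kernel code:
struct task, struct ctx_data, ctx_alloc(), ctx_free(), task_exit() and
replay_leak() are hypothetical stand-ins invented for illustration, and
the "exiting" field merely models PF_EXITING.  It shows that a flag
check made before the exit path runs is stale by the time the
allocation is attached, so the new object has no owner left to free it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel objects in the thread. */
struct ctx_data {
	int global;
};

struct task {
	bool exiting;		/* models PF_EXITING */
	struct ctx_data *ctx;	/* models task->perf_ctx_data */
};

static int live_objects;	/* allocations that have not been freed yet */

static struct ctx_data *ctx_alloc(void)
{
	struct ctx_data *cd = calloc(1, sizeof(*cd));

	assert(cd);
	live_objects++;
	return cd;
}

static void ctx_free(struct ctx_data *cd)
{
	if (cd) {
		free(cd);
		live_objects--;
	}
}

/* Exit side: set the flag, then detach whatever is attached right now. */
static void task_exit(struct task *t)
{
	t->exiting = true;
	ctx_free(t->ctx);
	t->ctx = NULL;
}

/*
 * Replay the diagram step by step: the attach side samples the flag,
 * the task then exits completely, and only afterwards does the attach
 * side act on its stale sample.  Returns the number of objects that
 * are still allocated once the exit path has finished.
 */
static int replay_leak(void)
{
	struct task t = { .exiting = false, .ctx = NULL };
	int leaked;
	bool skip;

	live_objects = 0;
	skip = t.exiting;		/* attach: check PF_EXITING */
	task_exit(&t);			/* exit: set flag and detach */
	if (!skip && !t.ctx)		/* attach: check ctx, then attach */
		t.ctx = ctx_alloc();

	leaked = live_objects;		/* the exit path will not run again */
	ctx_free(t.ctx);		/* tidy up so the demo itself is clean */
	return leaked;
}
```

Calling replay_leak() reports one object left behind, which is the
"memory leak" arrow in the diagram.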



* Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
  2026-01-07 19:01       ` Namhyung Kim
@ 2026-01-07 22:28         ` Peter Zijlstra
  2026-01-07 22:32           ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-01-07 22:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, linux-perf-users, linux-kernel

On Wed, Jan 07, 2026 at 11:01:53AM -0800, Namhyung Kim wrote:

> > But yes, I suppose this can do. The question is however, how do you get
> > into this predicament to begin with? Are you creating and destroying a
> > lot of global LBR events or something?
> 
> I think it's just because there are too many tasks in the system like
> O(100K).  And any thread going to exit needs to wait for
> attach_global_ctx_data() to finish the iteration over every task.

OMG, so many tasks ...

> > Would it make sense to delay detach_global_ctx_data() for a second or
> > so? That is, what is your event creation pattern?
> 
> I don't think it has a special pattern, but I'm curious how we can
> handle a race like below.
> 
>   attach_global_ctx_data
>     check p->flags & PF_EXITING
>                                               do_exit
>     (preemption)                                set PF_EXITING
>                                                 detach_task_ctx_data()
>     check p->perf_ctx_data
>     attach_task_ctx_data()   ---> memory leak

Oh right. Something like so perhaps?

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3c2a491200c6..e5e716420eb3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5421,9 +5421,19 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
 		return -ENOMEM;
 
 	for (;;) {
-		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
+		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
 			if (old)
 				perf_free_ctx_data_rcu(old);
+			/*
+			 * try_cmpxchg() pairs with try_cmpxchg() from
+			 * detach_task_ctx_data() such that
+			 * if we race with perf_event_exit_task(), we must
+			 * observe PF_EXITING.
+			 */
+			if (task->flags & PF_EXITING) {
+				task->perf_ctx_data = NULL;
+				perf_free_ctx_data_rcu(cd);
+			}
 			return 0;
 		}
 
@@ -5469,6 +5479,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
 	/* Allocate everything */
 	scoped_guard (rcu) {
 		for_each_process_thread(g, p) {
+			if (p->flags & PF_EXITING)
+				continue;
 			cd = rcu_dereference(p->perf_ctx_data);
 			if (cd && !cd->global) {
 				cd->global = 1;
@@ -14568,8 +14580,11 @@ void perf_event_exit_task(struct task_struct *task)
 
 	/*
 	 * Detach the perf_ctx_data for the system-wide event.
+	 *
+	 * Done without holding global_ctx_data_rwsem; typically
+	 * attach_global_ctx_data() will skip over this task, but otherwise
+	 * attach_task_ctx_data() will observe PF_EXITING.
 	 */
-	guard(percpu_read)(&global_ctx_data_rwsem);
 	detach_task_ctx_data(task);
 }
 


* Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
  2026-01-07 22:28         ` Peter Zijlstra
@ 2026-01-07 22:32           ` Peter Zijlstra
  2026-01-08 19:56             ` Namhyung Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-01-07 22:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, linux-perf-users, linux-kernel

On Wed, Jan 07, 2026 at 11:28:24PM +0100, Peter Zijlstra wrote:
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3c2a491200c6..e5e716420eb3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5421,9 +5421,19 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
>  		return -ENOMEM;
>  
>  	for (;;) {
> -		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
> +		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
>  			if (old)
>  				perf_free_ctx_data_rcu(old);
> +			/*
> +			 * try_cmpxchg() pairs with try_cmpxchg() from
> +			 * detach_task_ctx_data() such that
> +			 * if we race with perf_event_exit_task(), we must
> +			 * observe PF_EXITING.
> +			 */
> +			if (task->flags & PF_EXITING) {
> +				task->perf_ctx_data = NULL;
> +				perf_free_ctx_data_rcu(cd);

Ugh and now it can race and do a double free, another try_cmpxchg() is
needed here.

> +			}
>  			return 0;
>  		}
>  
> @@ -5469,6 +5479,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  	/* Allocate everything */
>  	scoped_guard (rcu) {
>  		for_each_process_thread(g, p) {
> +			if (p->flags & PF_EXITING)
> +				continue;
>  			cd = rcu_dereference(p->perf_ctx_data);
>  			if (cd && !cd->global) {
>  				cd->global = 1;
> @@ -14568,8 +14580,11 @@ void perf_event_exit_task(struct task_struct *task)
>  
>  	/*
>  	 * Detach the perf_ctx_data for the system-wide event.
> +	 *
> +	 * Done without holding global_ctx_data_rwsem; typically
> +	 * attach_global_ctx_data() will skip over this task, but otherwise
> +	 * attach_task_ctx_data() will observe PF_EXITING.
>  	 */
> -	guard(percpu_read)(&global_ctx_data_rwsem);
>  	detach_task_ctx_data(task);
>  }
>  


* Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
  2026-01-07 22:32           ` Peter Zijlstra
@ 2026-01-08 19:56             ` Namhyung Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Namhyung Kim @ 2026-01-08 19:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, linux-perf-users, linux-kernel

On Wed, Jan 07, 2026 at 11:32:56PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 07, 2026 at 11:28:24PM +0100, Peter Zijlstra wrote:
> > ---
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 3c2a491200c6..e5e716420eb3 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5421,9 +5421,19 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
> >  		return -ENOMEM;
> >  
> >  	for (;;) {
> > -		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
> > +		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
> >  			if (old)
> >  				perf_free_ctx_data_rcu(old);
> > +			/*
> > +			 * try_cmpxchg() pairs with try_cmpxchg() from
> > +			 * detach_task_ctx_data() such that
> > +			 * if we race with perf_event_exit_task(), we must
> > +			 * observe PF_EXITING.
> > +			 */
> > +			if (task->flags & PF_EXITING) {
> > +				task->perf_ctx_data = NULL;
> > +				perf_free_ctx_data_rcu(cd);
> 
> Ugh and now it can race and do a double free, another try_cmpxchg() is
> needed here.

Thanks!  Something like this?

Namhyung


diff --git a/kernel/events/core.c b/kernel/events/core.c
index 376fb07d869b8b50..cf252d8f49b2b259 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5421,9 +5421,20 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
 		return -ENOMEM;
 
 	for (;;) {
-		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
+		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
 			if (old)
 				perf_free_ctx_data_rcu(old);
+			/*
+			 * try_cmpxchg() pairs with try_cmpxchg() from
+			 * detach_task_ctx_data() such that
+			 * if we race with perf_event_exit_task(), we must
+			 * observe PF_EXITING.
+			 */
+			if (task->flags & PF_EXITING) {
+				/* detach_task_ctx_data() may have freed it already */
+				if (try_cmpxchg(&task->perf_ctx_data, &cd, NULL))
+					perf_free_ctx_data_rcu(cd);
+			}
 			return 0;
 		}
 
@@ -5469,6 +5480,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
 	/* Allocate everything */
 	scoped_guard (rcu) {
 		for_each_process_thread(g, p) {
+			if (p->flags & PF_EXITING)
+				continue;
 			cd = rcu_dereference(p->perf_ctx_data);
 			if (cd && !cd->global) {
 				cd->global = 1;
@@ -14562,8 +14575,11 @@ void perf_event_exit_task(struct task_struct *task)
 
 	/*
 	 * Detach the perf_ctx_data for the system-wide event.
+	 *
+	 * Done without holding global_ctx_data_rwsem; typically
+	 * attach_global_ctx_data() will skip over this task, but otherwise
+	 * attach_task_ctx_data() will observe PF_EXITING.
 	 */
-	guard(percpu_read)(&global_ctx_data_rwsem);
 	detach_task_ctx_data(task);
 }
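
To see why the undo step wants a compare-exchange rather than a plain
store, here is a rough user-space sketch using C11 atomics.  It is an
illustration under stated assumptions, not the kernel implementation:
struct task, struct ctx_data, detach(), undo_plain(), undo_cmpxchg()
and replay() are all hypothetical names, "exiting" stands in for
PF_EXITING, and "freeing" only bumps a counter so a double free can be
counted instead of crashing.  The interleaving is replayed in a single
thread in a fixed order; it does not model memory ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ctx_data {
	int frees;			/* how many times it was "freed" */
};

struct task {
	atomic_bool exiting;		/* models PF_EXITING */
	_Atomic(struct ctx_data *) ctx;	/* models task->perf_ctx_data */
};

typedef void (*undo_fn)(struct task *, struct ctx_data *);

static void ctx_free(struct ctx_data *cd)
{
	cd->frees++;
}

/* Exit side: claim whatever is attached, free it only if we claimed it. */
static void detach(struct task *t)
{
	struct ctx_data *cd = atomic_load(&t->ctx);

	while (cd && !atomic_compare_exchange_strong(&t->ctx, &cd, NULL))
		;
	if (cd)
		ctx_free(cd);
}

/* Attach-side undo with a plain store: frees even if detach() already did. */
static void undo_plain(struct task *t, struct ctx_data *cd)
{
	atomic_store(&t->ctx, NULL);
	ctx_free(cd);
}

/* Attach-side undo with compare-exchange: frees only if cd is still there. */
static void undo_cmpxchg(struct task *t, struct ctx_data *cd)
{
	struct ctx_data *expected = cd;

	if (atomic_compare_exchange_strong(&t->ctx, &expected, NULL))
		ctx_free(cd);
}

/*
 * Replay one interleaving and return how often the object was freed.
 * detach_first selects whether the exit side detaches before or after
 * the attach side installs its object.
 */
static int replay(undo_fn undo, bool detach_first)
{
	struct ctx_data cd = { 0 };
	struct ctx_data *old = NULL;
	struct task t;
	bool installed, saw_exiting;

	atomic_init(&t.exiting, false);
	atomic_init(&t.ctx, NULL);

	atomic_store(&t.exiting, true);		/* exit: set the flag */
	if (detach_first)
		detach(&t);			/* exit: nothing to free yet */

	installed = atomic_compare_exchange_strong(&t.ctx, &old, &cd);
	assert(installed);			/* attach: install cd */
	saw_exiting = atomic_load(&t.exiting);	/* attach: observe the flag */

	if (!detach_first)
		detach(&t);			/* exit: frees cd */
	if (saw_exiting)
		undo(&t, &cd);			/* attach: drop what it installed */

	return cd.frees;
}
```

With these definitions, replay(undo_plain, false) counts two frees,
while replay(undo_cmpxchg, false) and replay(undo_cmpxchg, true) each
count exactly one: whichever side wins the compare-exchange is the only
one that frees the object.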
 


Thread overview: 8+ messages
2025-12-22 23:34 [BUG] Task stuck on global_ctx_data_rwsem Namhyung Kim
2025-12-22 23:36 ` [BUG] perf/core: " Namhyung Kim
2026-01-06 22:34   ` Namhyung Kim
2026-01-07  9:16     ` Peter Zijlstra
2026-01-07 19:01       ` Namhyung Kim
2026-01-07 22:28         ` Peter Zijlstra
2026-01-07 22:32           ` Peter Zijlstra
2026-01-08 19:56             ` Namhyung Kim
