From: Namhyung Kim <namhyung@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Rosalie Fang <rosaliefang@google.com>
Subject: Re: [PATCH] perf/core: Fix slow perf_event_task_exit() with LBR callstacks
Date: Tue, 13 Jan 2026 13:21:18 -0800
Message-ID: <aWa3TncpY3Jfd_2c@google.com>
In-Reply-To: <20260112165157.1919624-1-namhyung@kernel.org>

On Mon, Jan 12, 2026 at 08:51:57AM -0800, Namhyung Kim wrote:
> I got a report that a task was stuck in perf_event_exit_task(), waiting
> for global_ctx_data_rwsem.  On large systems with lots of threads,
> perf_event_open() can hold that lock for a long time while it iterates
> over all threads in the system to allocate the context data.
> 
> That blocks the task exit path, which is especially problematic under
> memory pressure.
> 
>   perf_event_open
>     perf_event_alloc
>       attach_perf_ctx_data
>         attach_global_ctx_data
>           percpu_down_write (global_ctx_data_rwsem)
>             for_each_process_thread
>               alloc_task_ctx_data
>                                                do_exit
>                                                  perf_event_exit_task
>                                                    percpu_down_read (global_ctx_data_rwsem)
> 
> It should not hold the global_ctx_data_rwsem on the exit path.  Let's
> skip allocation for exiting tasks and free the data carefully.
> 
> Reported-by: Rosalie Fang <rosaliefang@google.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/events/core.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 376fb07d869b8b50..e87bb43b7bb3dd4b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5421,9 +5421,20 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
>  		return -ENOMEM;
>  
>  	for (;;) {
> -		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
> +		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {

It seems we need to keep this cast to suppress sparse warnings.

Thanks,
Namhyung


>  			if (old)
>  				perf_free_ctx_data_rcu(old);
> +			/*
> +			 * Above try_cmpxchg() pairs with try_cmpxchg() from
> +			 * detach_task_ctx_data() such that
> +			 * if we race with perf_event_exit_task(), we must
> +			 * observe PF_EXITING.
> +			 */
> +			if (task->flags & PF_EXITING) {
> +				/* detach_task_ctx_data() may free it already */
> +				if (try_cmpxchg(&task->perf_ctx_data, &cd, NULL))
> +					perf_free_ctx_data_rcu(cd);
> +			}
>  			return 0;
>  		}
>  
> @@ -5469,6 +5480,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  	/* Allocate everything */
>  	scoped_guard (rcu) {
>  		for_each_process_thread(g, p) {
> +			if (p->flags & PF_EXITING)
> +				continue;
>  			cd = rcu_dereference(p->perf_ctx_data);
>  			if (cd && !cd->global) {
>  				cd->global = 1;
> @@ -14562,8 +14575,11 @@ void perf_event_exit_task(struct task_struct *task)
>  
>  	/*
>  	 * Detach the perf_ctx_data for the system-wide event.
> +	 *
> +	 * Done without holding global_ctx_data_rwsem; typically
> +	 * attach_global_ctx_data() will skip over this task, but otherwise
> +	 * attach_task_ctx_data() will observe PF_EXITING.
>  	 */
> -	guard(percpu_read)(&global_ctx_data_rwsem);
>  	detach_task_ctx_data(task);
>  }
>  
> -- 
> 2.52.0.457.g6b5491de43-goog
> 
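
The lockless handshake in the quoted patch can be modelled in userspace.
This is only a minimal sketch, assuming C11 atomics in place of
try_cmpxchg() and plain free() in place of perf_free_ctx_data_rcu(); every
model_* name and MODEL_PF_EXITING is hypothetical, and refcounting and RCU
are left out entirely.  It shows the two cases the new comment describes:
the global attach skips a task that is already exiting, and a per-task
attach that races with exit undoes its own install.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are the kernel definitions. */
#define MODEL_PF_EXITING 0x4u

struct model_ctx_data { int global; };

struct model_task {
	atomic_uint flags;
	_Atomic(struct model_ctx_data *) ctx_data;
};

static int model_freed;	/* number of frees, so leaks and double frees show up */

static void model_task_init(struct model_task *t)
{
	atomic_init(&t->flags, 0u);
	atomic_init(&t->ctx_data, NULL);
}

static void model_free(struct model_ctx_data *cd)
{
	free(cd);
	model_freed++;
}

/* Take whatever is installed; whoever wins the exchange owns the free. */
static void model_detach(struct model_task *t)
{
	struct model_ctx_data *cd = atomic_load(&t->ctx_data);

	while (cd && !atomic_compare_exchange_weak(&t->ctx_data, &cd, NULL))
		;	/* a failed exchange reloads cd, so just retry */
	if (cd)
		model_free(cd);
}

/* Exit side: set the flag first, then detach without taking any lock. */
static void model_exit(struct model_task *t)
{
	atomic_fetch_or(&t->flags, MODEL_PF_EXITING);
	model_detach(t);
}

/* Attach side: install first, then check the flag and undo if exit began. */
static int model_attach(struct model_task *t)
{
	struct model_ctx_data *cd = calloc(1, sizeof(*cd));
	struct model_ctx_data *old = NULL;

	if (!cd)
		return -1;
	while (!atomic_compare_exchange_weak(&t->ctx_data, &old, cd))
		;	/* old now holds the current pointer; retry */
	if (old)
		model_free(old);

	if (atomic_load(&t->flags) & MODEL_PF_EXITING) {
		struct model_ctx_data *expected = cd;

		/* model_detach() may have taken and freed cd already */
		if (atomic_compare_exchange_strong(&t->ctx_data, &expected, NULL))
			model_free(cd);
	}
	return 0;
}

/* Global attach: skip tasks that are already exiting. */
static int model_attach_global(struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (atomic_load(&tasks[i].flags) & MODEL_PF_EXITING)
			continue;
		if (model_attach(&tasks[i]))
			return -1;
	}
	return 0;
}
```

In this model the sequentially consistent exchanges provide the ordering:
either the attach side sees the exiting flag and removes what it just
installed, or the exit side's exchange sees the new pointer and frees it.
Each pointer is freed by exactly one side, whichever wins the exchange to
NULL.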


Thread overview: 5+ messages
2026-01-12 16:51 [PATCH] perf/core: Fix slow perf_event_task_exit() with LBR callstacks Namhyung Kim
2026-01-13  0:24 ` kernel test robot
2026-01-13 14:58 ` kernel test robot
2026-01-13 21:21 ` Namhyung Kim [this message]
2026-01-15 21:44 ` [tip: perf/core] " tip-bot2 for Namhyung Kim
