From: Namhyung Kim <namhyung@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
James Clark <james.clark@linaro.org>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
Date: Wed, 7 Jan 2026 11:01:53 -0800
Message-ID: <aV6toexK5LMc1MNY@google.com>
In-Reply-To: <20260107091652.GB3707891@noisy.programming.kicks-ass.net>
On Wed, Jan 07, 2026 at 10:16:52AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 06, 2026 at 02:34:40PM -0800, Namhyung Kim wrote:
> > Hello,
> >
> > On Mon, Dec 22, 2025 at 03:36:53PM -0800, Namhyung Kim wrote:
> > > On Mon, Dec 22, 2025 at 03:34:23PM -0800, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > I got a report that a task is stuck in perf_event_exit_task() waiting
> > > > for global_ctx_data_rwsem. On large systems, it'd have performance
> > > > issues when it grabs the lock to iterate all threads in the system to
> > > > allocate the context data. And it'd block task exit path which is
> > > > problematic especially under memory pressure.
> > > >
> > > >   perf_event_open
> > > >     perf_event_alloc
> > > >       attach_perf_ctx_data
> > > >         attach_global_ctx_data
> > > >           percpu_down_write (global_ctx_data_rwsem)
> > > >           for_each_process_thread
> > > >             alloc_task_ctx_data
> > > >                                         do_exit
> > > >                                           perf_event_exit_task
> > > >                                             percpu_down_read (global_ctx_data_rwsem)
> > > >
> > > > I think attach_global_ctx_data() should skip tasks with PF_EXITING and
> > > > it'd be nice if perf_event_exit_task() could release the ctx_data
> > > > unconditionally. But I'm not sure how to synchronize them properly.
> > > >
> > > > Any thoughts?
> >
> > I'm curious if this makes any sense.. I feel like it needs to check the
> > flag again before allocation.
> >
> > Thanks,
> > Namhyung
> >
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 376fb07d869b8b50..2a8847e95d7eb698 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5469,6 +5469,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
> >  	/* Allocate everything */
> >  	scoped_guard (rcu) {
> >  		for_each_process_thread(g, p) {
> > +			if (p->flags & PF_EXITING)
> > +				continue;
> >  			cd = rcu_dereference(p->perf_ctx_data);
> >  			if (cd && !cd->global) {
> >  				cd->global = 1;
>
> I suppose this makes sense.
>
> > @@ -14563,7 +14565,6 @@ void perf_event_exit_task(struct task_struct *task)
> >  	/*
> >  	 * Detach the perf_ctx_data for the system-wide event.
> >  	 */
> > -	guard(percpu_read)(&global_ctx_data_rwsem);
> >  	detach_task_ctx_data(task);
> >  }
>
> This would need a comment; something like:
>
> 	/*
> 	 * This can be done without holding global_ctx_data_rwsem
> 	 * because this is done after setting PF_EXITING such that
> 	 * attach_global_ctx_data() will skip over this task.
> 	 */
> 	WARN_ON_ONCE(!(task->flags & PF_EXITING))
>
> But yes, I suppose this can do. The question is however, how do you get
> into this predicament to begin with? Are you creating and destroying a
> lot of global LBR events or something?

I think it's simply because there are so many tasks in the system, on
the order of 100K. Any thread that is about to exit has to wait for
attach_global_ctx_data() to finish iterating over every task.

>
> Would it make sense to delay detach_global_ctx_data() for a second or
> so? That is, what is your event creation pattern?

I don't think there is a special pattern, but I wonder how we can
handle a race like the one below.

  attach_global_ctx_data
    check p->flags & PF_EXITING
                                      do_exit
    (preemption)                        set PF_EXITING
                                        detach_task_ctx_data()
    check p->perf_ctx_data
    attach_task_ctx_data()  ---> memory leak

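
To make the ordering above concrete, here is a small userspace model of
it. This is only a sketch and not kernel code: struct model_task,
MODEL_PF_EXITING, model_exit() and model_attach_racing() are made-up
names, and the exit path is called inline at the point marked
"(preemption)" so that the interleaving is replayed deterministically.
The recheck argument models the idea of checking the flag again before
allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PF_EXITING; the value is arbitrary in this model. */
#define MODEL_PF_EXITING 0x1u

struct model_task {
	unsigned int flags;
	void *ctx_data;		/* stand-in for p->perf_ctx_data */
};

/* Dummy object playing the role of an allocated context. */
static int model_ctx;

/* Exit side without the rwsem: set the flag, then detach. */
void model_exit(struct model_task *t)
{
	t->flags |= MODEL_PF_EXITING;
	t->ctx_data = NULL;
}

/*
 * Attach side for a single task. model_exit() is called between the
 * flag check and the attach to replay the "(preemption)" step.
 * Returns true if ctx_data is left attached to a task that has
 * already run its detach, i.e. the leak case.
 */
bool model_attach_racing(struct model_task *t, bool recheck)
{
	if (t->flags & MODEL_PF_EXITING)
		return false;		/* skipped, nothing attached */

	model_exit(t);			/* the exit path runs here */

	if (recheck && (t->flags & MODEL_PF_EXITING))
		return false;		/* second check sees the flag */

	if (!t->ctx_data)
		t->ctx_data = &model_ctx;

	return (t->flags & MODEL_PF_EXITING) && t->ctx_data != NULL;
}
```

With recheck == false the model ends with data attached to a task that
has already been detached. With recheck == true it does not, but only
because the model is sequential. In real code the second check could
race in the same way unless the check and the attach are made atomic
with respect to the exit side, which is the part I'm not sure about.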
Thanks,
Namhyung