From: Namhyung Kim <namhyung@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
James Clark <james.clark@linaro.org>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
Date: Wed, 7 Jan 2026 11:01:53 -0800
Message-ID: <aV6toexK5LMc1MNY@google.com>
In-Reply-To: <20260107091652.GB3707891@noisy.programming.kicks-ass.net>

On Wed, Jan 07, 2026 at 10:16:52AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 06, 2026 at 02:34:40PM -0800, Namhyung Kim wrote:
> > Hello,
> >
> > On Mon, Dec 22, 2025 at 03:36:53PM -0800, Namhyung Kim wrote:
> > > On Mon, Dec 22, 2025 at 03:34:23PM -0800, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > I got a report that a task is stuck in perf_event_exit_task() waiting
> > > > for global_ctx_data_rwsem. On large systems, it'd have performance
> > > > issues when it grabs the lock to iterate all threads in the system to
> > > > allocate the context data. And it'd block task exit path which is
> > > > problematic especially under memory pressure.
> > > >
> > > >   perf_event_open
> > > >     perf_event_alloc
> > > >       attach_perf_ctx_data
> > > >         attach_global_ctx_data
> > > >           percpu_down_write (global_ctx_data_rwsem)
> > > >             for_each_process_thread
> > > >               alloc_task_ctx_data
> > > >                                                do_exit
> > > >                                                  perf_event_exit_task
> > > >                                                    percpu_down_read (global_ctx_data_rwsem)
> > > >
> > > > I think attach_global_ctx_data() should skip tasks with PF_EXITING and
> > > > it'd be nice if perf_event_exit_task() could release the ctx_data
> > > > unconditionally. But I'm not sure how to synchronize them properly.
> > > >
> > > > Any thoughts?
> >
> > I'm curious if this makes any sense.. I feel like it needs to check the
> > flag again before allocation.
> >
> > Thanks,
> > Namhyung
> >
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 376fb07d869b8b50..2a8847e95d7eb698 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5469,6 +5469,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
> >  	/* Allocate everything */
> >  	scoped_guard (rcu) {
> >  		for_each_process_thread(g, p) {
> > +			if (p->flags & PF_EXITING)
> > +				continue;
> >  			cd = rcu_dereference(p->perf_ctx_data);
> >  			if (cd && !cd->global) {
> >  				cd->global = 1;
>
> I suppose this makes sense.
>
> > @@ -14563,7 +14565,6 @@ void perf_event_exit_task(struct task_struct *task)
> >  	/*
> >  	 * Detach the perf_ctx_data for the system-wide event.
> >  	 */
> > -	guard(percpu_read)(&global_ctx_data_rwsem);
> >  	detach_task_ctx_data(task);
> >  }
>
> This would need a comment; something like:
>
> 	/*
> 	 * This can be done without holding global_ctx_data_rwsem
> 	 * because this is done after setting PF_EXITING such that
> 	 * attach_global_ctx_data() will skip over this task.
> 	 */
> 	WARN_ON_ONCE(!(task->flags & PF_EXITING))
>
> But yes, I suppose this can do. The question is however, how do you get
> into this predicament to begin with? Are you creating and destroying a
> lot of global LBR events or something?
I think it's simply because there are so many tasks in the system, on
the order of 100K. Any thread that is about to exit then has to wait for
attach_global_ctx_data() to finish iterating over every task.
>
> Would it make sense to delay detach_global_ctx_data() for a second or
> so? That is, what is your event creation pattern?
I don't think there is a special pattern, but I'm curious how we can
handle a race like the one below.
  attach_global_ctx_data
    check p->flags & PF_EXITING
                                       do_exit
    (preemption)                         set PF_EXITING
                                         detach_task_ctx_data()
    check p->perf_ctx_data
    attach_task_ctx_data()  ---> memory leak
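
To make the ordering concrete, the interleaving above can be replayed
as a single-threaded user-space model. This is only a sketch: struct
model_task, model_exit() and replay_race() are hypothetical names, not
kernel code, and the "check the flag again after attaching" step is an
assumed mitigation rather than a settled fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PF_EXITING 0x4u	/* model flag; only the name is borrowed */

/* Hypothetical stand-in for the two task_struct fields involved. */
struct model_task {
	unsigned int flags;
	void *perf_ctx_data;
};

/* Exit side: PF_EXITING is set first, then the ctx data is detached. */
static void model_exit(struct model_task *t)
{
	t->flags |= PF_EXITING;
	free(t->perf_ctx_data);
	t->perf_ctx_data = NULL;
}

/*
 * Replay the interleaving step by step in one thread.
 * Returns true if ctx data is still attached after the task has
 * exited, i.e. nobody is left to free it.
 */
static bool replay_race(bool recheck_after_attach)
{
	struct model_task t = { 0, NULL };
	bool leaked;

	/* attach side: check p->flags & PF_EXITING (not set yet) */
	if (t.flags & PF_EXITING)
		return false;

	/* (preemption) the exit side runs to completion here */
	model_exit(&t);

	/* attach side resumes: check p->perf_ctx_data, then attach */
	if (!t.perf_ctx_data) {
		t.perf_ctx_data = malloc(64);
		assert(t.perf_ctx_data);
	}

	/* Assumed mitigation: look at the flag again, undo the attach. */
	if (recheck_after_attach && (t.flags & PF_EXITING)) {
		free(t.perf_ctx_data);
		t.perf_ctx_data = NULL;
	}

	leaked = t.perf_ctx_data != NULL;
	free(t.perf_ctx_data);	/* keep the model itself leak-free */
	return leaked;
}
```

Calling replay_race(false) reports the leak and replay_race(true) does
not. The model is sequential, so it says nothing about memory ordering:
in the kernel the second check would have to be ordered against both
the PF_EXITING store and the detach for the same reasoning to hold.
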
Thanks,
Namhyung