From: Namhyung Kim <namhyung@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
James Clark <james.clark@linaro.org>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem
Date: Thu, 8 Jan 2026 11:56:59 -0800 [thread overview]
Message-ID: <aWAMC3HPskHNQeOs@google.com> (raw)
In-Reply-To: <20260107223256.GA807925@noisy.programming.kicks-ass.net>
On Wed, Jan 07, 2026 at 11:32:56PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 07, 2026 at 11:28:24PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 07, 2026 at 11:01:53AM -0800, Namhyung Kim wrote:
> >
> > > > But yes, I suppose this can do. The question is however, how do you get
> > > > into this predicament to begin with? Are you creating and destroying a
> > > > lot of global LBR events or something?
> > >
> > > I think it's just because there are too many tasks in the system like
> > > O(100K). And any thread going to exit needs to wait for
> > > attach_global_ctx_data() to finish the iteration over every task.
> >
> > OMG, so many tasks ...
> >
> > > > Would it make sense to delay detach_global_ctx_data() for a second or
> > > > so? That is, what is your event creation pattern?
> > >
> > > I don't think it has a special pattern, but I'm curious how we can
> > > handle a race like below.
> > >
> > > attach_global_ctx_data
> > > check p->flags & PF_EXITING
> > > do_exit
> > > (preemption) set PF_EXITING
> > > detach_task_ctx_data()
> > > check p->perf_ctx_data
> > > attach_task_ctx_data() ---> memory leak
> >
> > Oh right. Something like so perhaps?
> >
> > ---
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 3c2a491200c6..e5e716420eb3 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5421,9 +5421,19 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
> > return -ENOMEM;
> >
> > for (;;) {
> > - if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
> > + if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
> > if (old)
> > perf_free_ctx_data_rcu(old);
> > + /*
> > + * try_cmpxchg() pairs with try_cmpxchg() from
> > + * detach_task_ctx_data() such that
> > + * if we race with perf_event_exit_task(), we must
> > + * observe PF_EXITING.
> > + */
> > + if (task->flags & PF_EXITING) {
> > + task->perf_ctx_data = NULL;
> > + perf_free_ctx_data_rcu(cd);
>
> Ugh and now it can race and do a double free, another try_cmpxchg() is
> needed here.
Thanks! Something like this?
Namhyung
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 376fb07d869b8b50..cf252d8f49b2b259 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5421,9 +5421,20 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
return -ENOMEM;
for (;;) {
- if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
+ if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
if (old)
perf_free_ctx_data_rcu(old);
+ /*
+ * try_cmpxchg() pairs with try_cmpxchg() from
+ * detach_task_ctx_data() such that
+ * if we race with perf_event_exit_task(), we must
+ * observe PF_EXITING.
+ */
+ if (task->flags & PF_EXITING) {
+ /* detach_task_ctx_data() may free it already */
+ if (try_cmpxchg(&task->perf_ctx_data, &cd, NULL))
+ perf_free_ctx_data_rcu(cd);
+ }
return 0;
}
@@ -5469,6 +5480,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
/* Allocate everything */
scoped_guard (rcu) {
for_each_process_thread(g, p) {
+ if (p->flags & PF_EXITING)
+ continue;
cd = rcu_dereference(p->perf_ctx_data);
if (cd && !cd->global) {
cd->global = 1;
@@ -14562,8 +14575,11 @@ void perf_event_exit_task(struct task_struct *task)
/*
* Detach the perf_ctx_data for the system-wide event.
+ *
+ * Done without holding global_ctx_data_rwsem; typically
+ * attach_global_ctx_data() will skip over this task, but otherwise
+ * attach_task_ctx_data() will observe PF_EXITING.
*/
- guard(percpu_read)(&global_ctx_data_rwsem);
detach_task_ctx_data(task);
}
prev parent reply other threads:[~2026-01-08 19:57 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-22 23:34 [BUG] Task stuck on global_ctx_data_rwsem Namhyung Kim
2025-12-22 23:36 ` [BUG] perf/core: " Namhyung Kim
2026-01-06 22:34 ` Namhyung Kim
2026-01-07 9:16 ` Peter Zijlstra
2026-01-07 19:01 ` Namhyung Kim
2026-01-07 22:28 ` Peter Zijlstra
2026-01-07 22:32 ` Peter Zijlstra
2026-01-08 19:56 ` Namhyung Kim [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aWAMC3HPskHNQeOs@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=irogers@google.com \
--cc=james.clark@linaro.org \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.