From: Frederic Weisbecker <frederic@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"Liang, Kan" <kan.liang@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ian Rogers <irogers@google.com>, Jiri Olsa <jolsa@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 2/4] perf: Fix irq work dereferencing garbage
Date: Mon, 28 Apr 2025 13:11:47 +0200
Message-ID: <aA9ic6m6WAcmVBAw@pavilion.home>
In-Reply-To: <20250424163024.GC18306@noisy.programming.kicks-ass.net>

On Thu, Apr 24, 2025 at 06:30:24PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 24, 2025 at 06:11:26PM +0200, Frederic Weisbecker wrote:
> > @@ -13940,29 +13941,36 @@ perf_event_exit_event(struct perf_event *event,
> >  		 * Do destroy all inherited groups, we don't care about those
> >  		 * and being thorough is better.
> >  		 */
> > -		detach_flags |= DETACH_GROUP | DETACH_CHILD;
> > +		prd.detach_flags |= DETACH_GROUP | DETACH_CHILD;
> >  		mutex_lock(&parent_event->child_mutex);
> >  	}
> >  
> >  	if (revoke)
> > -		detach_flags |= DETACH_GROUP | DETACH_REVOKE;
> > +		prd.detach_flags |= DETACH_GROUP | DETACH_REVOKE;
> >  
> > -	perf_remove_from_context(event, detach_flags);
> > +	perf_remove_from_context(event, &prd);
> 
> Isn't all this waay to complicated?
> 
> That is, to modify state we need both ctx->mutex and ctx->lock, and this
> is what __perf_remove_from_context() has, but because of this, holding
> either one of those locks is sufficient to read the state -- it cannot
> change.
> 
> And here we already hold ctx->mutex.
> 
> So can't we simply do:
> 
> 	old_state = event->attach_state;
> 	perf_remove_from_context(event, detach_flags);
> 
> 	// do whatever with old_state

Right, but the locking scenario is a bit more complicated than that.
Most flags are set either at init time or with both the ctx mutex and the
ctx lock held. But:

- PERF_ATTACH_CHILD is instead set with the parent's child_mutex and the ctx
  lock held.
- PERF_ATTACH_ITRACE is set from pmu::start(), thus from the event context
  with just interrupts disabled. That is probably enough to synchronize
  against initialization and the remove_from_context IPIs, but
  perf_event_exit_event() needs some care.

So we must hold both the ctx mutex and child_mutex (the pmus_srcu thing should
already make PERF_ATTACH_CHILD visible, but let's keep things obvious). We also
need WRITE_ONCE() / READ_ONCE() to take care of PERF_ATTACH_ITRACE, which we
don't care about anyway.
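
For illustration only, here is a minimal userspace sketch of the
snapshot-then-remove pattern discussed above. It is not kernel code:
READ_ONCE() / WRITE_ONCE() are simplified stand-ins, and toy_event,
toy_remove(), toy_exit() and the ATTACH_* values are hypothetical names made
up for the example. It assumes the removal path clears the child flag, so the
decision to drop the reference has to be based on the state read beforehand:

```c
#include <pthread.h>
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel macros (illustrative only). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical flag values, not the kernel's PERF_ATTACH_* definitions. */
#define ATTACH_CHILD  0x1u
#define ATTACH_ITRACE 0x2u

/* Toy event: the mutex plays the role of the parent's child_mutex. */
struct toy_event {
	pthread_mutex_t child_mutex;
	unsigned int attach_state;
	int refcount;
};

/* Assumed behaviour of the removal path: it clears the child flag. */
static void toy_remove(struct toy_event *ev)
{
	WRITE_ONCE(ev->attach_state, ev->attach_state & ~ATTACH_CHILD);
}

/*
 * Snapshot the state under the lock, then remove, then act on the snapshot.
 * Returns true if this call dropped the child reference.
 */
static bool toy_exit(struct toy_event *ev)
{
	unsigned int old_state;

	pthread_mutex_lock(&ev->child_mutex);
	/* Another flag (e.g. ATTACH_ITRACE) might be set concurrently. */
	old_state = READ_ONCE(ev->attach_state);
	toy_remove(ev);
	pthread_mutex_unlock(&ev->child_mutex);

	if (old_state & ATTACH_CHILD) {
		ev->refcount--;
		return true;
	}
	return false;
}
```

Calling toy_exit() twice on the same event drops the reference only once,
because the second call no longer sees ATTACH_CHILD in its snapshot.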

With that, it now looks like this:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7bcb02ffb93a..7278ca731a55 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -208,7 +208,6 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
 }
 
 #define TASK_TOMBSTONE ((void *)-1L)
-#define EVENT_TOMBSTONE ((void *)-1L)
 
 static bool is_kernel_event(struct perf_event *event)
 {
@@ -2338,12 +2337,6 @@ static void perf_child_detach(struct perf_event *event)
 
 	sync_child_event(event);
 	list_del_init(&event->child_list);
-	/*
-	 * Cannot set to NULL, as that would confuse the situation vs
-	 * not being a child event. See for example unaccount_event().
-	 */
-	event->parent = EVENT_TOMBSTONE;
-	put_event(parent_event);
 }
 
 static bool is_orphaned_event(struct perf_event *event)
@@ -5705,7 +5698,7 @@ static void put_event(struct perf_event *event)
 	_free_event(event);
 
 	/* Matches the refcount bump in inherit_event() */
-	if (parent && parent != EVENT_TOMBSTONE)
+	if (parent)
 		put_event(parent);
 }
 
@@ -9998,7 +9991,7 @@ void perf_event_text_poke(const void *addr, const void *old_bytes,
 
 void perf_event_itrace_started(struct perf_event *event)
 {
-	event->attach_state |= PERF_ATTACH_ITRACE;
+	WRITE_ONCE(event->attach_state, event->attach_state | PERF_ATTACH_ITRACE);
 }
 
 static void perf_log_itrace_start(struct perf_event *event)
@@ -13922,10 +13915,7 @@ perf_event_exit_event(struct perf_event *event,
 {
 	struct perf_event *parent_event = event->parent;
 	unsigned long detach_flags = DETACH_EXIT;
-	bool is_child = !!parent_event;
-
-	if (parent_event == EVENT_TOMBSTONE)
-		parent_event = NULL;
+	unsigned int attach_state;
 
 	if (parent_event) {
 		/*
@@ -13942,6 +13932,8 @@ perf_event_exit_event(struct perf_event *event,
 		 */
 		detach_flags |= DETACH_GROUP | DETACH_CHILD;
 		mutex_lock(&parent_event->child_mutex);
+		/* PERF_ATTACH_ITRACE might be set concurrently */
+		attach_state = READ_ONCE(event->attach_state);
 	}
 
 	if (revoke)
@@ -13951,18 +13943,25 @@ perf_event_exit_event(struct perf_event *event,
 	/*
 	 * Child events can be freed.
 	 */
-	if (is_child) {
-		if (parent_event) {
-			mutex_unlock(&parent_event->child_mutex);
-			/*
-			 * Kick perf_poll() for is_event_hup();
-			 */
-			perf_event_wakeup(parent_event);
+	if (parent_event) {
+		mutex_unlock(&parent_event->child_mutex);
+		/*
+		 * Kick perf_poll() for is_event_hup();
+		 */
+		perf_event_wakeup(parent_event);
+
+		/*
+		 * Match the refcount initialization. Make sure it doesn't happen
+		 * twice if pmu_detach_event() calls it on an already exited task.
+		 */
+		if (attach_state & PERF_ATTACH_CHILD) {
 			/*
 			 * pmu_detach_event() will have an extra refcount.
+			 * perf_pending_task() might have one too.
 			 */
 			put_event(event);
 		}
+
 		return;
 	}
 
