All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: "mingo@kernel.org" <mingo@kernel.org>,
	"lucas.demarchi@intel.com" <lucas.demarchi@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"acme@kernel.org" <acme@kernel.org>,
	"namhyung@kernel.org" <namhyung@kernel.org>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"alexander.shishkin@linux.intel.com"
	<alexander.shishkin@linux.intel.com>,
	"jolsa@kernel.org" <jolsa@kernel.org>,
	"irogers@google.com" <irogers@google.com>,
	"adrian.hunter@intel.com" <adrian.hunter@intel.com>,
	"kan.liang@linux.intel.com" <kan.liang@linux.intel.com>
Subject: Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
Date: Fri, 17 Jan 2025 14:04:23 +0100	[thread overview]
Message-ID: <20250117130423.GI8385@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20250117000316.GB33629@noisy.programming.kicks-ass.net>

On Fri, Jan 17, 2025 at 01:03:16AM +0100, Peter Zijlstra wrote:

> > 2) A race with perf_event_release_kernel(). perf_event_release_kernel()
> >    prepares a separate "free_list" of all children events under ctx->mutex
> >    and event->child_mutex. However, the "free_list" uses the same
> >    "event->child_list" for entries. OTOH, perf_pmu_unregister() ultimately
> >    calls __perf_remove_from_context() with DETACH_CHILD, which checks if
> >    the event being removed is a child event, and if so, it will try to
> >    detach the child from parent using list_del_init(&event->child_list);
> >    i.e. two code path doing list_del on the same list entry.
> > 
> >    perf_event_release_kernel()                                        perf_pmu_unregister()
> >      /* Move children events to free_list */                            ...
> >      list_for_each_entry_safe(child, tmp, &free_list, child_list) {       perf_remove_from_context() /* with DETACH_CHILD */
> >        ...                                                                  __perf_remove_from_context()
> >        list_del(&child->child_list);                                          perf_child_detach()
> >                                                                                 list_del_init(&event->child_list);
> 
> Bah, I had figured it was taken care of, because perf_event_exit_event()
> has a similar race. I'll try and figure out what to do there.

So, the problem appears to be that perf_event_release_kernel() does not
use DETACH_CHILD, doing so will clear PERF_ATTACH_CHILD, at which point
the above is fully serialized by parent->child_mutex.

Then the next problem is that since pmu_detach_events() can hold an
extra ref on things, the free_event() from free_list will WARN, like
before.

Easily fixed by making that put_event(), except that messes up the whole
wait_var_event() scheme -- since __free_event() does the final
put_ctx().

This in turn can be fixed by pushing that wake_up_var() nonsense into
put_ctx() itself.

Which then gives me something like so.

But also, I think we can get rid of that free_list entirely.

Anyway, let me go break this up into individual patches and go test
this -- after lunch!

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1229,8 +1229,14 @@ static void put_ctx(struct perf_event_co
 	if (refcount_dec_and_test(&ctx->refcount)) {
 		if (ctx->parent_ctx)
 			put_ctx(ctx->parent_ctx);
-		if (ctx->task && ctx->task != TASK_TOMBSTONE)
-			put_task_struct(ctx->task);
+		if (ctx->task) {
+			if (ctx->task == TASK_TOMBSTONE) {
+				smp_mb();
+				wake_up_var(&ctx->refcount);
+			} else {
+				put_task_struct(ctx->task);
+			}
+		}
 		call_rcu(&ctx->rcu_head, free_ctx);
 	}
 }
@@ -5550,8 +5556,6 @@ int perf_event_release_kernel(struct per
 again:
 	mutex_lock(&event->child_mutex);
 	list_for_each_entry(child, &event->child_list, child_list) {
-		void *var = NULL;
-
 		/*
 		 * Cannot change, child events are not migrated, see the
 		 * comment with perf_event_ctx_lock_nested().
@@ -5584,46 +5588,32 @@ int perf_event_release_kernel(struct per
 		tmp = list_first_entry_or_null(&event->child_list,
 					       struct perf_event, child_list);
 		if (tmp == child) {
-			perf_remove_from_context(child, DETACH_GROUP);
-			list_move(&child->child_list, &free_list);
+			perf_remove_from_context(child, DETACH_GROUP | DETACH_CHILD);
+			/*
+			 * Can't risk calling into free_event() here, since
+			 * event->destroy() might invert with the currently
+			 * held locks, see 82d94856fa22 ("perf/core: Fix lock
+			 * inversion between perf,trace,cpuhp")
+			 */
+			list_add(&child->child_list, &free_list);
 			/*
 			 * This matches the refcount bump in inherit_event();
 			 * this can't be the last reference.
 			 */
 			put_event(event);
-		} else {
-			var = &ctx->refcount;
 		}
 
 		mutex_unlock(&event->child_mutex);
 		mutex_unlock(&ctx->mutex);
 		put_ctx(ctx);
 
-		if (var) {
-			/*
-			 * If perf_event_free_task() has deleted all events from the
-			 * ctx while the child_mutex got released above, make sure to
-			 * notify about the preceding put_ctx().
-			 */
-			smp_mb(); /* pairs with wait_var_event() */
-			wake_up_var(var);
-		}
 		goto again;
 	}
 	mutex_unlock(&event->child_mutex);
 
 	list_for_each_entry_safe(child, tmp, &free_list, child_list) {
-		void *var = &child->ctx->refcount;
-
 		list_del(&child->child_list);
-		free_event(child);
-
-		/*
-		 * Wake any perf_event_free_task() waiting for this event to be
-		 * freed.
-		 */
-		smp_mb(); /* pairs with wait_var_event() */
-		wake_up_var(var);
+		put_event(child);
 	}
 
 no_ctx:

  parent reply	other threads:[~2025-01-17 13:04 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
2024-11-04 13:39 ` [PATCH 01/19] lockdep: Fix might_fault() Peter Zijlstra
2025-03-01 20:06   ` [tip: locking/core] lockdep/mm: Fix might_fault() lockdep check of current->mm->mmap_lock tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 02/19] perf: Fix pmus_lock vs pmus_srcu ordering Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: Fix pmus_lock vs. " tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event() Peter Zijlstra
2024-11-04 15:36   ` Uros Bizjak
2024-11-05 12:01     ` Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: Fix perf_pmu_register() vs. perf_init_event() tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 04/19] perf: Simplify perf_event_alloc() error path Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2025-03-06  7:57   ` [PATCH 04/19] perf: Simplify " Lai, Yi
2025-03-06  9:24     ` Ingo Molnar
2025-03-07  3:16       ` Lai, Yi
2025-03-07 11:33         ` Ingo Molnar
2024-11-04 13:39 ` [PATCH 05/19] perf: Simplify perf_pmu_register() " Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 06/19] perf: Simplify perf_pmu_register() Peter Zijlstra
2024-11-20 13:06   ` Ravi Bangoria
2024-11-20 14:46     ` Peter Zijlstra
2024-11-20 15:53       ` Ravi Bangoria
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2025-03-25 19:47   ` [PATCH 06/19] perf: " Sidhartha Kumar
2024-11-04 13:39 ` [PATCH 07/19] perf: Simplify perf_init_event() Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 08/19] perf: Simplify perf_event_alloc() Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 09/19] perf: Merge pmu_disable_count into cpu_pmu_context Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 10/19] perf: Add this_cpc() helper Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 11/19] perf: Detach perf_cpu_pmu_context and pmu lifetimes Peter Zijlstra
2025-03-03 12:29   ` [tip: perf/core] perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes tip-bot2 for Peter Zijlstra
2025-03-04  8:56   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 12/19] perf: Introduce perf_free_addr_filters() Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 13/19] perf: Robustify perf_event_free_bpf_prog() Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/bpf: " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 14/19] perf: Simplify perf_mmap() control flow Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 15/19] perf: Fix perf_mmap() failure path Peter Zijlstra
2025-03-01 19:17   ` Ingo Molnar
2025-03-03 12:38     ` Lorenzo Stoakes
2025-03-04  8:46   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:56   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 16/19] perf: Further simplify perf_mmap() Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:57   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 17/19] perf: Remove retry loop from perf_mmap() Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:56   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 18/19] perf: Lift event->mmap_mutex in perf_mmap() Peter Zijlstra
2025-03-01 20:07   ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04  8:56   ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 19/19] perf: Make perf_pmu_unregister() useable Peter Zijlstra
2024-11-05 15:08   ` Liang, Kan
2024-11-05 15:16     ` Peter Zijlstra
2024-11-05 15:25       ` Liang, Kan
2024-11-25  4:10   ` Ravi Bangoria
2024-12-17  9:12     ` Peter Zijlstra
2024-12-17 11:52       ` Peter Zijlstra
2024-12-19  9:33         ` Ravi Bangoria
2024-12-19 10:56           ` Ravi Bangoria
2025-01-03  4:24           ` Ravi Bangoria
2025-01-17  0:03             ` Peter Zijlstra
2025-01-17  5:20               ` Ravi Bangoria
2025-01-17  8:36                 ` Peter Zijlstra
2025-01-17 13:04               ` Peter Zijlstra [this message]
2025-01-17 21:04                 ` Peter Zijlstra
2025-01-20 11:15                   ` Ravi Bangoria
2025-01-03  4:29   ` Ravi Bangoria
2024-12-16 18:02 ` [PATCH 00/19] perf: Make perf_pmu_unregister() usable Lucas De Marchi
2025-03-01 20:00 ` Ingo Molnar
2025-03-03  3:25   ` Ravi Bangoria
2025-03-03  9:16     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250117130423.GI8385@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucas.demarchi@intel.com \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=ravi.bangoria@amd.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.