All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marco Elver <elver@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Jiri Olsa <jolsa@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alexander Potapenko <glider@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Arnd Bergmann <arnd@arndb.de>,
	Christian Brauner <christian@brauner.io>,
	Dmitry Vyukov <dvyukov@google.com>, Jann Horn <jannh@google.com>,
	Jens Axboe <axboe@kernel.dk>, Matt Morehouse <mascasa@google.com>,
	Peter Collingbourne <pcc@google.com>,
	Ian Rogers <irogers@google.com>,
	kasan-dev <kasan-dev@googlegroups.com>,
	linux-arch <linux-arch@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	"open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>
Subject: Re: [PATCH RFC v2 8/8] selftests/perf: Add kselftest for remove_on_exec
Date: Tue, 23 Mar 2021 13:08:55 +0100	[thread overview]
Message-ID: <YFnaV/uY/fN9WI5+@elver.google.com> (raw)
In-Reply-To: <CANpmjNO1mRBFBQ6Rij-6ojVPKkaB6JLHD2WOVxhQeqxsqit2-Q@mail.gmail.com>

On Tue, Mar 23, 2021 at 11:41AM +0100, Marco Elver wrote:
> On Tue, 23 Mar 2021 at 11:32, Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> > > +             if (parent_event) {
> > >                       /*
> > > +                      * Remove event from parent, to avoid race where the
> > > +                      * parent concurrently iterates through its children to
> > > +                      * enable, disable, or otherwise modify an event.
> > >                        */
> > > +                     mutex_lock(&parent_event->child_mutex);
> > > +                     list_del_init(&event->child_list);
> > > +                     mutex_unlock(&parent_event->child_mutex);
> > >               }
> >
> >                 ^^^ this, right?
> >
> > But that's something perf_event_exit_event() alread does. So then you're
> > worried about the order of things.
> 
> Correct. We somehow need to prohibit the parent from doing an
> event_function_call() while we potentially deactivate the context with
> perf_remove_from_context().
> 
> > > +
> > > +             perf_remove_from_context(event, !!event->parent * DETACH_GROUP);
> > > +             perf_event_exit_event(event, ctx, current, true);
> > >       }
> >
> > perf_event_release_kernel() first does perf_remove_from_context() and
> > then clears the child_list, and that makes sense because if we're there,
> > there's no external access anymore, the filedesc is gone and nobody will
> > be iterating child_list anymore.
> >
> > perf_event_exit_task_context() and perf_event_exit_event() OTOH seem to
> > rely on ctx->task == TOMBSTONE to sabotage event_function_call() such
> > that if anybody is iterating the child_list, it'll NOP out.
> >
> > But here we don't have neither, and thus need to worry about the order
> > vs child_list iteration.
> >
> > I suppose we should stick sync_child_event() in there as well.
> >
> > And at that point there's very little value in still using
> > perf_event_exit_event()... let me see if there's something to be done
> > about that.
> 
> I don't mind dropping use of perf_event_exit_event() and open coding
> all of this. That would also avoid modifying perf_event_exit_event().
> 
> But I leave it to you what you think is nicest.

I played a bit more with it, and the below would be the version without
using perf_event_exit_event(). Perhaps it isn't too bad.

Thanks,
-- Marco

------ >8 ------

diff --git a/kernel/events/core.c b/kernel/events/core.c
index aa47e111435e..288b61820dab 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2165,8 +2165,9 @@ static void perf_group_detach(struct perf_event *event)
 	 * If this is a sibling, remove it from its group.
 	 */
 	if (leader != event) {
+		leader->nr_siblings--;
 		list_del_init(&event->sibling_list);
-		event->group_leader->nr_siblings--;
+		event->group_leader = event;
 		goto out;
 	}
 
@@ -2180,8 +2181,9 @@ static void perf_group_detach(struct perf_event *event)
 		if (sibling->event_caps & PERF_EV_CAP_SIBLING)
 			perf_remove_sibling_event(sibling);
 
-		sibling->group_leader = sibling;
+		leader->nr_siblings--;
 		list_del_init(&sibling->sibling_list);
+		sibling->group_leader = sibling;
 
 		/* Inherit group flags from the previous leader */
 		sibling->group_caps = event->group_caps;
@@ -2358,10 +2360,19 @@ __perf_remove_from_context(struct perf_event *event,
 static void perf_remove_from_context(struct perf_event *event, unsigned long flags)
 {
 	struct perf_event_context *ctx = event->ctx;
+	bool remove;
 
 	lockdep_assert_held(&ctx->mutex);
 
-	event_function_call(event, __perf_remove_from_context, (void *)flags);
+	/*
+	 * There is concurrency vs remove_on_exec().
+	 */
+	raw_spin_lock_irq(&ctx->lock);
+	remove = (event->attach_state & PERF_ATTACH_CONTEXT);
+	raw_spin_unlock_irq(&ctx->lock);
+
+	if (remove)
+		event_function_call(event, __perf_remove_from_context, (void *)flags);
 
 	/*
 	 * The above event_function_call() can NO-OP when it hits
@@ -4196,43 +4207,86 @@ static void perf_event_enable_on_exec(int ctxn)
 }
 
 static void perf_remove_from_owner(struct perf_event *event);
-static void perf_event_exit_event(struct perf_event *child_event,
-				  struct perf_event_context *child_ctx,
-				  struct task_struct *child);
+static void sync_child_event(struct perf_event *child_event,
+			     struct task_struct *child);
+static void free_event(struct perf_event *event);
 
 /*
  * Removes all events from the current task that have been marked
  * remove-on-exec, and feeds their values back to parent events.
  */
-static void perf_event_remove_on_exec(void)
+static void perf_event_remove_on_exec(int ctxn)
 {
-	int ctxn;
+	struct perf_event_context *ctx, *clone_ctx = NULL;
+	struct perf_event *event, *next;
+	LIST_HEAD(free_list);
+	unsigned long flags;
+	bool modified = false;
 
-	for_each_task_context_nr(ctxn) {
-		struct perf_event_context *ctx;
-		struct perf_event *event, *next;
+	ctx = perf_pin_task_context(current, ctxn);
+	if (!ctx)
+		return;
 
-		ctx = perf_pin_task_context(current, ctxn);
-		if (!ctx)
+	mutex_lock(&ctx->mutex);
+
+	if (WARN_ON_ONCE(ctx->task != current))
+		goto unlock;
+
+	list_for_each_entry_safe(event, next, &ctx->event_list, event_entry) {
+		struct perf_event *parent_event = event->parent;
+
+		if (!event->attr.remove_on_exec)
 			continue;
-		mutex_lock(&ctx->mutex);
 
-		list_for_each_entry_safe(event, next, &ctx->event_list, event_entry) {
-			if (!event->attr.remove_on_exec)
-				continue;
+		if (!is_kernel_event(event))
+			perf_remove_from_owner(event);
+
+		modified = true;
 
-			if (!is_kernel_event(event))
-				perf_remove_from_owner(event);
-			perf_remove_from_context(event, DETACH_GROUP);
+		if (parent_event) {
 			/*
-			 * Remove the event and feed back its values to the
-			 * parent event.
+			 * Remove event from parent *before* modifying contexts,
+			 * to avoid race where the parent concurrently iterates
+			 * through its children to enable, disable, or otherwise
+			 * modify an event.
 			 */
-			perf_event_exit_event(event, ctx, current);
+
+			sync_child_event(event, current);
+
+			WARN_ON_ONCE(parent_event->ctx->parent_ctx);
+			mutex_lock(&parent_event->child_mutex);
+			list_del_init(&event->child_list);
+			mutex_unlock(&parent_event->child_mutex);
+
+			perf_event_wakeup(parent_event);
+			put_event(parent_event);
 		}
-		mutex_unlock(&ctx->mutex);
-		put_ctx(ctx);
+
+		perf_remove_from_context(event, !!event->parent * DETACH_GROUP);
+
+		raw_spin_lock_irq(&ctx->lock);
+		WARN_ON_ONCE(ctx->is_active);
+		perf_event_set_state(event, PERF_EVENT_STATE_EXIT); /* is_event_hup() */
+		raw_spin_unlock_irq(&ctx->lock);
+
+		if (parent_event)
+			free_event(event);
+		else
+			perf_event_wakeup(event);
 	}
+
+	raw_spin_lock_irqsave(&ctx->lock, flags);
+	if (modified)
+		clone_ctx = unclone_ctx(ctx);
+	--ctx->pin_count;
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+
+unlock:
+	mutex_unlock(&ctx->mutex);
+
+	put_ctx(ctx);
+	if (clone_ctx)
+		put_ctx(clone_ctx);
 }
 
 struct perf_read_data {
@@ -7581,20 +7635,18 @@ void perf_event_exec(void)
 	struct perf_event_context *ctx;
 	int ctxn;
 
-	rcu_read_lock();
 	for_each_task_context_nr(ctxn) {
-		ctx = current->perf_event_ctxp[ctxn];
-		if (!ctx)
-			continue;
-
 		perf_event_enable_on_exec(ctxn);
+		perf_event_remove_on_exec(ctxn);
 
-		perf_iterate_ctx(ctx, perf_event_addr_filters_exec, NULL,
-				   true);
+		rcu_read_lock();
+		ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
+		if (ctx) {
+			perf_iterate_ctx(ctx, perf_event_addr_filters_exec,
+					 NULL, true);
+		}
+		rcu_read_unlock();
 	}
-	rcu_read_unlock();
-
-	perf_event_remove_on_exec();
 }
 
 struct remote_output {

  reply	other threads:[~2021-03-23 12:10 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-10 10:41 [PATCH RFC v2 0/8] Add support for synchronous signals on perf events Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 1/8] perf/core: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 2/8] perf/core: Support only inheriting events if cloned with CLONE_THREAD Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 3/8] perf/core: Add support for event removal on exec Marco Elver
2021-03-10 10:47   ` Marco Elver
2021-03-16 16:22   ` Peter Zijlstra
2021-03-22  9:20     ` Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 4/8] signal: Introduce TRAP_PERF si_code and si_perf to siginfo Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 5/8] perf/core: Add support for SIGTRAP on perf events Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 6/8] perf/core: Add breakpoint information to siginfo on SIGTRAP Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 7/8] selftests/perf: Add kselftest for process-wide sigtrap handling Marco Elver
2021-03-10 10:41 ` [PATCH RFC v2 8/8] selftests/perf: Add kselftest for remove_on_exec Marco Elver
2021-03-22 13:24   ` Marco Elver
2021-03-22 16:42     ` Peter Zijlstra
2021-03-23  9:52       ` Marco Elver
2021-03-23 10:32         ` Peter Zijlstra
2021-03-23 10:41           ` Marco Elver
2021-03-23 12:08             ` Marco Elver [this message]
2021-03-23 14:45           ` Peter Zijlstra
2021-03-23 15:58             ` Marco Elver
2021-03-23 16:19               ` Peter Zijlstra
2021-03-23  3:10     ` Ian Rogers
2021-03-23  9:47       ` Marco Elver
2021-03-23 19:16         ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YFnaV/uY/fN9WI5+@elver.google.com \
    --to=elver@google.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=axboe@kernel.dk \
    --cc=christian@brauner.io \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=irogers@google.com \
    --cc=jannh@google.com \
    --cc=jolsa@redhat.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mascasa@google.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=pcc@google.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.