Re: [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value

public inbox for linux-kernel@vger.kernel.org

From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>, Paul Mackerras <paulus@samba.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value
Date: Tue, 29 Mar 2011 10:32:12 +0200
Message-ID: <1301387532.4859.54.camel@twins>
In-Reply-To: <20110328165648.GA9304@redhat.com>

On Mon, 2011-03-28 at 18:56 +0200, Oleg Nesterov wrote:
> On 03/28, Peter Zijlstra wrote:
> >
> > Another fun race, suppose we do properly remove task_ctx and is_active,
> > but then the task gets scheduled back in before free_event() gets around
> > to disabling the jump_label..
> 
> Yes, this too...
> 
> Well, ignoring the HAVE_JUMP_LABEL case... perhaps we can split
> perf_sched_events into 2 counters? I mean,
> 
> 	atomic_t perf_sched_events_in, perf_sched_events_out;
> 
> 	static inline void perf_event_task_sched_in(struct task_struct *task)
> 	{
> 		COND_STMT(&perf_sched_events_in, __perf_event_task_sched_in(task));
> 	}
> 
> 	static inline
> 	void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
> 	{
> 		perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
> 
> 		COND_STMT(&perf_sched_events_out, __perf_event_task_sched_out(task, next));
> 	}
> 
> 	void perf_sched_events_inc(void)
> 	{
> 		atomic_inc(&perf_sched_events_out);
> 		smp_mb__after_atomic_inc();
> 		atomic_inc(&perf_sched_events_in);
> 	}
> 
> 	void perf_sched_events_dec(void)
> 	{
> 		if (atomic_dec_and_test(&perf_sched_events_in))
> 			synchronize_sched();
> 		atomic_dec(&perf_sched_events_out);
> 	}
> 
> The last 2 helpers should be used instead of jump_label_inc/dec.

Very clever. My approach was to make __perf_event_task_sched_in() a NOP
when !nr_events, which opens up another race against
perf_install_in_context(), but hey ;-) My current hackery is below.
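
The ordering argument behind the two-counter proposal can be modelled in user space. This is a minimal single-threaded sketch, not kernel code: the names sched_events_in/out, toy_sched_events_inc/dec() and wait_for_sched_in_readers() are hypothetical, C11 sequentially consistent atomics stand in for smp_mb__after_atomic_inc(), and the empty wait function only marks where the proposed helper calls synchronize_sched(). The property it checks is that the sched-out counter never drops below the sched-in counter, so the sched-out hook is enabled whenever the sched-in hook can fire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for perf_sched_events_in/out. */
static atomic_int sched_events_in;
static atomic_int sched_events_out;

/* The invariant the inc/dec ordering is meant to preserve. */
static bool hooks_consistent(void)
{
	return atomic_load(&sched_events_out) >= atomic_load(&sched_events_in);
}

/*
 * Placeholder for synchronize_sched(): in the kernel it waits until every
 * CPU that saw a non-zero "in" counter has left its sched-in path. Nothing
 * runs concurrently in this model, so it does nothing here.
 */
static void wait_for_sched_in_readers(void)
{
}

static void toy_sched_events_inc(void)
{
	/* Enable the sched-out side first... */
	atomic_fetch_add(&sched_events_out, 1);
	assert(hooks_consistent());
	/* ...and only then the sched-in side (seq_cst orders the two). */
	atomic_fetch_add(&sched_events_in, 1);
}

static void toy_sched_events_dec(void)
{
	/* atomic_fetch_sub() returns the old value; 1 means we reached 0. */
	if (atomic_fetch_sub(&sched_events_in, 1) == 1)
		wait_for_sched_in_readers();
	assert(hooks_consistent());
	/* The sched-out side is disabled last. */
	atomic_fetch_sub(&sched_events_out, 1);
}
```

Two inc calls followed by two dec calls leave both counters at zero, and the asserts never fire: the sched-out side is enabled before, and disabled after, the sched-in side.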


> As for HAVE_JUMP_LABEL, I still can't understand why this test-case
> triggers the problem. 

FWIW, I tested without that.

> But jump_label_inc/dec logic looks obviously
> racy.
> 
> jump_label_dec:
> 
> 	if (atomic_dec_and_test(key))
> 		jump_label_disable(key);
> 
> Another thread can create the PERF_ATTACH_TASK event in between
> and call jump_label_update(JUMP_LABEL_ENABLE) first. Looks like,
> jump_label_update() should ensure that "type" matches the state
> of the "*key" under jump_label_lock().

No, I think you're right. I think we fixed that, but it looks like
Ingo still hasn't merged the new jump-label patches :/
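
The race comes from making the enable/disable decision outside the lock that orders the updates: a dec-to-zero can be overtaken by a concurrent inc-from-zero, so the disable lands after the enable. Below is a minimal user-space sketch of the lock-based remedy, assuming a hypothetical struct toy_key protected by a pthread mutex. It is not the kernel's jump_label API, and the actual jump-label patches may differ in detail. Because the count and the enabled flag change together under one lock, enabled always equals count > 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical toy key; "enabled" models whether the branch is patched in. */
struct toy_key {
	pthread_mutex_t lock;
	int count;
	bool enabled;
};

static void toy_key_inc(struct toy_key *key)
{
	pthread_mutex_lock(&key->lock);
	if (key->count++ == 0)
		key->enabled = true;	/* 0 -> 1 transition enables */
	assert(key->enabled == (key->count > 0));
	pthread_mutex_unlock(&key->lock);
}

static void toy_key_dec(struct toy_key *key)
{
	pthread_mutex_lock(&key->lock);
	if (--key->count == 0)
		key->enabled = false;	/* 1 -> 0 transition disables */
	assert(key->enabled == (key->count > 0));
	pthread_mutex_unlock(&key->lock);
}

/* Hammer the key from several threads; the asserts above must hold. */
static void *toy_key_worker(void *arg)
{
	struct toy_key *key = arg;

	for (int i = 0; i < 100000; i++) {
		toy_key_inc(key);
		toy_key_dec(key);
	}
	return NULL;
}
```

Running two toy_key_worker() threads against one key ends with count == 0 and enabled == false. A lock-free dec_and_test followed by a separate disable call would offer no such guarantee.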



---

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -1461,9 +1461,6 @@ static void add_event_to_ctx(struct perf
 	event->tstamp_stopped = tstamp;
 }
 
-static void perf_event_context_sched_in(struct perf_event_context *ctx,
-					struct task_struct *tsk);
-
 /*
  * Cross CPU call to install and enable a performance event
  *
@@ -1473,20 +1470,11 @@ static int  __perf_install_in_context(vo
 {
 	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
-	struct perf_event *leader = event->group_leader;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-	int err;
-
-	/*
-	 * In case we're installing a new context to an already running task,
-	 * could also happen before perf_event_task_sched_in() on architectures
-	 * which do context switches with IRQs enabled.
-	 */
-	if (ctx->task && !cpuctx->task_ctx)
-		perf_event_context_sched_in(ctx, ctx->task);
+	struct perf_event_context *task_ctx;
+	struct task_struct *task = NULL;
 
 	raw_spin_lock(&ctx->lock);
-	ctx->is_active = 1;
 	update_context_time(ctx);
 	/*
 	 * update cgrp time only if current cgrp
@@ -1497,43 +1485,48 @@ static int  __perf_install_in_context(vo
 
 	add_event_to_ctx(event, ctx);
 
-	if (!event_filter_match(event))
-		goto unlock;
+	if (!event_filter_match(event)) {
+		raw_spin_unlock(&ctx->lock);
+		return 0;
+	}
+	raw_spin_unlock(&ctx->lock);
 
 	/*
-	 * Don't put the event on if it is disabled or if
-	 * it is in a group and the group isn't on.
+	 * Since both of these are only set during context switches
+	 * and IRQs are disabled, their values are stable.
 	 */
-	if (event->state != PERF_EVENT_STATE_INACTIVE ||
-	    (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE))
-		goto unlock;
+	task_ctx = cpuctx->task_ctx;
 
+	perf_pmu_disable(ctx->pmu);
 	/*
-	 * An exclusive event can't go on if there are already active
-	 * hardware events, and no hardware event can go on if there
-	 * is already an exclusive event on.
-	 */
-	if (!group_can_go_on(event, cpuctx, 1))
-		err = -EEXIST;
-	else
-		err = event_sched_in(event, cpuctx, ctx);
-
-	if (err) {
-		/*
-		 * This event couldn't go on.  If it is in a group
-		 * then we have to pull the whole group off.
-		 * If the event group is pinned then put it in error state.
-		 */
-		if (leader != event)
-			group_sched_out(leader, cpuctx, ctx);
-		if (leader->attr.pinned) {
-			update_group_times(leader);
-			leader->state = PERF_EVENT_STATE_ERROR;
-		}
-	}
+	 * Reschedule the PMU to possibly include the fresh event. We take the
+	 * brute-force approach of unscheduling everything and then re-adding
+	 * the events in the correct order (CPU-pinned, TASK-pinned,
+	 * CPU-flexible, TASK-flexible).
+	 *
+	 * It is possible we received this IPI before the scheduler called
+	 * perf_event_task_sched_in() on platforms that context switch with
+	 * interrupts enabled. In that case the below DTRT.
+	 */
+	cpu_ctx_sched_out(cpuctx, EVENT_ALL);
+	if (task_ctx)
+		ctx_sched_out(task_ctx, cpuctx, EVENT_ALL);
+
+	if (ctx->task) {
+		cpuctx->task_ctx = task_ctx = ctx;
+		task = ctx->task;
+	} else if (task_ctx)
+		task = task_ctx->task;
+
+	cpu_ctx_sched_in(cpuctx, EVENT_PINNED, task);
+	if (task_ctx)
+		ctx_sched_in(task_ctx, cpuctx, EVENT_PINNED, task);
+	cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task);
+	if (task_ctx)
+		ctx_sched_in(task_ctx, cpuctx, EVENT_FLEXIBLE, task);
 
-unlock:
-	raw_spin_unlock(&ctx->lock);
+	perf_pmu_rotate_start(ctx->pmu);
+	perf_pmu_enable(ctx->pmu);
 
 	return 0;
 }
@@ -2114,8 +2107,19 @@ static void perf_event_context_sched_in(
 	struct perf_cpu_context *cpuctx;
 
 	cpuctx = __get_cpu_context(ctx);
-	if (cpuctx->task_ctx == ctx)
+	raw_spin_lock(&ctx->lock);
+	/*
+	 * Serialize against perf_install_in_context(). The interesting case
+	 * is where perf_install_in_context() finds the context inactive and
+	 * another cpu is just about to schedule the task in. In that case
+	 * we need to avoid observing a stale ctx->nr_events.
+	 */
+	ctx->is_active = 1;
+	if (cpuctx->task_ctx == ctx || !ctx->nr_events) {
+		raw_spin_unlock(&ctx->lock);
 		return;
+	}
+	raw_spin_unlock(&ctx->lock);
 
 	perf_pmu_disable(ctx->pmu);
 	/*
@@ -2125,12 +2129,12 @@ static void perf_event_context_sched_in(
 	 */
 	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 
+	cpuctx->task_ctx = ctx;
+
 	ctx_sched_in(ctx, cpuctx, EVENT_PINNED, task);
 	cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task);
 	ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
 
-	cpuctx->task_ctx = ctx;
-
 	/*
 	 * Since these rotations are per-cpu, we need to ensure the
 	 * cpu-context we got scheduled on is actually rotating.
@@ -2922,15 +2926,40 @@ static void free_event(struct perf_event
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
 
-int perf_event_release_kernel(struct perf_event *event)
+static int __perf_event_release(void *info)
 {
+	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	int ret;
 
 	/*
-	 * Remove from the PMU, can't get re-enabled since we got
-	 * here because the last ref went.
+	 * Disable the event if it's still running; we're shutting down.
 	 */
-	perf_event_disable(event);
+	ret = __perf_event_disable(info);
+	if (ret)
+		return ret;
+
+	raw_spin_lock(&ctx->lock);
+	perf_group_detach(event);
+	list_del_event(event, ctx);
+	/*
+	 * If we removed the last event from an active task_ctx, deactivate
+	 * it: freeing this event may disable the perf_sched_events jump_label,
+	 * which prevents the task sched-out hook from being called.
+	 */
+	if (!ctx->nr_events && cpuctx->task_ctx == ctx) {
+		ctx->is_active = 0;
+		cpuctx->task_ctx = NULL;
+	}
+	raw_spin_unlock(&ctx->lock);
+	return 0;
+}
+
+int perf_event_release_kernel(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct task_struct *task = ctx->task;
 
 	WARN_ON_ONCE(ctx->parent_ctx);
 	/*
@@ -2946,10 +2975,28 @@ int perf_event_release_kernel(struct per
 	 *     to trigger the AB-BA case.
 	 */
 	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
+	if (!task) {
+		cpu_function_call(event->cpu, __perf_event_release, event);
+		goto unlock;
+	}
+
+retry:
+	if (!task_function_call(task, __perf_event_release, event))
+		goto unlock;
+
 	raw_spin_lock_irq(&ctx->lock);
+	if (ctx->is_active) {
+		raw_spin_unlock_irq(&ctx->lock);
+		goto retry;
+	}
+
+	WARN_ON_ONCE(event->state == PERF_EVENT_STATE_ACTIVE);
+
 	perf_group_detach(event);
 	list_del_event(event, ctx);
 	raw_spin_unlock_irq(&ctx->lock);
+
+unlock:
 	mutex_unlock(&ctx->mutex);
 
 	free_event(event);
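
The brute-force rescheduling in the new __perf_install_in_context() boils down to "clear the PMU, then refill it in priority order". Here is a minimal sketch of just that ordering, under simplifying assumptions: a hypothetical PMU with TOY_NR_COUNTERS identical counters, one counter per event, and no event groups, constraints, or error state for pinned events that do not fit (the real ctx_sched_in()/cpu_ctx_sched_in() paths deal with those). All names are illustrative, not perf API.

```c
#include <assert.h>
#include <stddef.h>

enum toy_ctx { TOY_CPU, TOY_TASK };
enum toy_kind { TOY_PINNED, TOY_FLEXIBLE };

struct toy_event {
	enum toy_ctx ctx;
	enum toy_kind kind;
	int active;		/* 1 if on the PMU after rescheduling */
};

/* Hypothetical: number of hardware counters on the toy PMU. */
#define TOY_NR_COUNTERS 2

/* Schedule in every event of one (context, kind) class that still fits. */
static int toy_sched_pass(struct toy_event *ev, size_t n, enum toy_ctx ctx,
			  enum toy_kind kind, int used)
{
	for (size_t i = 0; i < n; i++) {
		if (ev[i].ctx != ctx || ev[i].kind != kind)
			continue;
		if (used < TOY_NR_COUNTERS) {
			ev[i].active = 1;
			used++;
		}
	}
	return used;
}

/* Unschedule everything, then re-add in the order from the patch comment. */
static int toy_reschedule(struct toy_event *ev, size_t n)
{
	int used = 0;

	for (size_t i = 0; i < n; i++)
		ev[i].active = 0;

	used = toy_sched_pass(ev, n, TOY_CPU, TOY_PINNED, used);
	used = toy_sched_pass(ev, n, TOY_TASK, TOY_PINNED, used);
	used = toy_sched_pass(ev, n, TOY_CPU, TOY_FLEXIBLE, used);
	used = toy_sched_pass(ev, n, TOY_TASK, TOY_FLEXIBLE, used);
	return used;
}
```

With two counters and one event of each class, the CPU-pinned and TASK-pinned events get the counters and both flexible events stay off, regardless of their position in the array.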

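The retry loop in the new perf_event_release_kernel() follows a simple protocol. First, try to remove the event on the CPU where the task is running. If the task is not running, take ctx->lock; if the context has meanwhile become active (the task was just scheduled in), retry the cross-call, otherwise detach the event locally. Below is a hedged user-space model of that control flow only. remote_remove() stands in for task_function_call() and ctx_is_active() for the ctx->is_active check under ctx->lock; both are hypothetical and just replay a scripted number of outcomes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scripted scenario for one release attempt. */
struct toy_release {
	int remote_failures_left;	/* cross-calls that miss the task */
	int active_checks_left;		/* times the ctx still looks active */
	int retries;
	bool removed_remotely;
	bool removed_locally;
};

/* Stand-in for task_function_call(): fails while the task is not running. */
static bool remote_remove(struct toy_release *r)
{
	if (r->remote_failures_left > 0) {
		r->remote_failures_left--;
		return false;
	}
	r->removed_remotely = true;
	return true;
}

/* Stand-in for reading ctx->is_active under ctx->lock. */
static bool ctx_is_active(struct toy_release *r)
{
	if (r->active_checks_left > 0) {
		r->active_checks_left--;
		return true;
	}
	return false;
}

static void toy_release_event(struct toy_release *r)
{
	for (;;) {
		if (remote_remove(r))
			return;		/* removed on the task's CPU */
		if (!ctx_is_active(r))
			break;		/* not scheduled in: detach here */
		r->retries++;		/* raced with a sched-in: retry */
	}
	r->removed_locally = true;
}
```

Whichever branch wins, the event is removed exactly once: either by the cross-call or locally after the context was seen inactive.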

Thread overview: 41+ messages
2011-03-24 16:44 [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value Jiri Olsa
2011-03-25 19:10 ` Oleg Nesterov
2011-03-26 15:37 ` Peter Zijlstra
2011-03-26 16:13   ` Oleg Nesterov
2011-03-26 16:38     ` Peter Zijlstra
2011-03-26 17:09       ` Oleg Nesterov
2011-03-26 17:35         ` Oleg Nesterov
2011-03-26 18:29           ` Peter Zijlstra
2011-03-26 18:49             ` Oleg Nesterov
2011-03-28 13:30             ` Oleg Nesterov
2011-03-28 14:57               ` Peter Zijlstra
2011-03-28 15:00                 ` Peter Zijlstra
2011-03-28 15:15                 ` Oleg Nesterov
2011-03-28 16:27                   ` Peter Zijlstra
2011-03-28 15:39                     ` Oleg Nesterov
2011-03-28 15:49                 ` Peter Zijlstra
2011-03-28 16:56                   ` Oleg Nesterov
2011-03-29  8:32                     ` Peter Zijlstra [this message]
2011-03-29 10:49                       ` Peter Zijlstra
2011-03-29 16:28                       ` Oleg Nesterov
2011-03-29 19:01                         ` Peter Zijlstra
2011-03-30 13:09                     ` Jiri Olsa
2011-03-30 14:51                       ` Peter Zijlstra
2011-03-30 16:37                         ` Oleg Nesterov
2011-03-30 18:30                           ` Paul E. McKenney
2011-03-30 19:53                             ` Oleg Nesterov
2011-03-30 21:26                           ` Peter Zijlstra
2011-03-30 21:35                             ` Oleg Nesterov
2011-03-31 10:32                             ` Jiri Olsa
2011-03-31 12:41                             ` [tip:perf/urgent] perf: Fix task context scheduling tip-bot for Peter Zijlstra
2011-03-31 13:28                         ` [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value Oleg Nesterov
2011-03-31 13:51                           ` Peter Zijlstra
2011-03-31 14:10                             ` Oleg Nesterov
2011-04-04 16:20                             ` Oleg Nesterov
2011-03-30 15:32                       ` Oleg Nesterov
2011-03-30 15:40                         ` Peter Zijlstra
2011-03-30 15:52                           ` Oleg Nesterov
2011-03-30 15:57                             ` Peter Zijlstra
2011-03-30 16:11                         ` Peter Zijlstra
2011-03-30 17:13                           ` Oleg Nesterov
2011-03-26 17:09       ` Peter Zijlstra
