From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>, Paul Mackerras <paulus@samba.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value
Date: Tue, 29 Mar 2011 10:32:12 +0200
Message-ID: <1301387532.4859.54.camel@twins>
In-Reply-To: <20110328165648.GA9304@redhat.com>

On Mon, 2011-03-28 at 18:56 +0200, Oleg Nesterov wrote:
> On 03/28, Peter Zijlstra wrote:
> >
> > Another fun race, suppose we do properly remove task_ctx and is_active,
> > but then the task gets scheduled back in before free_event() gets around
> > to disabling the jump_label..
> 
> Yes, this too...
> 
> Well, ignoring the HAVE_JUMP_LABEL case... perhaps we can split
> perf_sched_events into 2 counters? I mean,
> 
> 	atomic_t perf_sched_events_in, perf_sched_events_out;
> 
> 	static inline void perf_event_task_sched_in(struct task_struct *task)
> 	{
> 		COND_STMT(&perf_sched_events_in, __perf_event_task_sched_in(task));
> 	}
> 
> 	static inline
> 	void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
> 	{
> 		perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
> 
> 		COND_STMT(&perf_sched_events_out, __perf_event_task_sched_out(task, next));
> 	}
> 
> 	void perf_sched_events_inc(void)
> 	{
> 		atomic_inc(&perf_sched_events_out);
> 		smp_mb__after_atomic_inc();
> 		atomic_inc(&perf_sched_events_in);
> 	}
> 
> 	void perf_sched_events_dec(void)
> 	{
> 		if (atomic_dec_and_test(&perf_sched_events_in))
> 			synchronize_sched();
> 		atomic_dec(&perf_sched_events_out);
> 	}
> 
> The last 2 helpers should be used instead of jump_label_inc/dec.

Very clever. My approach was to make __perf_event_task_sched_in() a NOP
when !nr_events, which opens up another race against
perf_install_in_context(), but hey ;-) My current hackery is below.
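
For reference, the ordering argument behind the two-counter idea quoted
above can be modelled in user space. This is only a sketch: C11 seq_cst
atomics stand in for atomic_inc() plus smp_mb__after_atomic_inc(), the
synchronize_sched() step is reduced to a comment, and model_sched_in(),
model_sched_out(), gates_consistent() and the two call counters are
hypothetical names that exist only in this example. It is single-threaded
and only shows that the "in" gate is never open while the "out" gate is
closed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space model of the two gates; names follow the quoted proposal. */
static atomic_int perf_sched_events_in;
static atomic_int perf_sched_events_out;

/* Hypothetical counters standing in for the real hook bodies. */
static int sched_in_calls, sched_out_calls;

static void model_sched_in(void)
{
	if (atomic_load(&perf_sched_events_in) > 0)
		sched_in_calls++;
}

static void model_sched_out(void)
{
	if (atomic_load(&perf_sched_events_out) > 0)
		sched_out_calls++;
}

static void perf_sched_events_inc(void)
{
	/* Open the sched-out gate first, so that any sched-in that runs
	 * afterwards is guaranteed to see a matching sched-out. */
	atomic_fetch_add(&perf_sched_events_out, 1);
	atomic_fetch_add(&perf_sched_events_in, 1);
}

static void perf_sched_events_dec(void)
{
	assert(atomic_load(&perf_sched_events_in) > 0);	/* catch unbalanced dec */

	/* Close the sched-in gate first. The kernel version would wait in
	 * synchronize_sched() here when the count reaches zero; this
	 * single-threaded model has nothing to wait for. */
	atomic_fetch_sub(&perf_sched_events_in, 1);
	atomic_fetch_sub(&perf_sched_events_out, 1);
}

/* Invariant the ordering preserves: in-gate open implies out-gate open. */
static bool gates_consistent(void)
{
	return atomic_load(&perf_sched_events_in) <=
	       atomic_load(&perf_sched_events_out);
}
```

Swapping the two statements in either helper would let the model reach a
state where sched-in runs without a later sched-out, which is the window
the proposal is trying to close.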


> As for HAVE_JUMP_LABEL, I still can't understand why this test-case
> triggers the problem. 

FWIW I tested without that..

> But jump_label_inc/dec logic looks obviously
> racy.
> 
> jump_label_dec:
> 
> 	if (atomic_dec_and_test(key))
> 		jump_label_disable(key);
> 
> Another thread can create the PERF_ATTACH_TASK event in between
> and call jump_label_update(JUMP_LABEL_ENABLE) first. Looks like,
> jump_label_update() should ensure that "type" matches the state
> of the "*key" under jump_label_lock().

No, I think you're right, and I think we fixed that, but it looks like
Ingo still hasn't merged the new jump-label patches :/
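
To make the interleaving concrete, here is a small single-threaded replay
of the jump_label_dec race described in the quote, next to the suggested
fix of deriving the state from the key instead of trusting the caller's
"type". It is a user-space sketch, not the jump-label code: struct
model_key, model_update_racy(), model_update_checked() and model_replay()
are all made up for this example, and the lock is reduced to a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a jump label key plus its patched state. */
struct model_key {
	atomic_int count;	/* number of users, like *key */
	bool patched_on;	/* whether the branch is currently enabled */
};

/* Racy variant: applies whatever the caller asked for. */
static void model_update_racy(struct model_key *key, bool enable)
{
	key->patched_on = enable;
}

/* Checked variant: ignores the request and derives the state from the
 * counter. In the kernel this read would sit under jump_label_lock(). */
static void model_update_checked(struct model_key *key, bool enable)
{
	(void)enable;
	key->patched_on = atomic_load(&key->count) > 0;
}

/* Replay: A drops the last ref, B adds one and enables, A disables late.
 * Returns true if the final patched state matches the counter. */
static bool model_replay(void (*update)(struct model_key *, bool))
{
	struct model_key key = { .count = 1, .patched_on = true };

	/* A: atomic_dec_and_test() sees the count reach zero. */
	bool a_hit_zero = atomic_fetch_sub(&key.count, 1) == 1;
	assert(a_hit_zero);

	atomic_fetch_add(&key.count, 1);	/* B: new PERF_ATTACH_TASK event */
	update(&key, true);			/* B: enable */

	update(&key, false);			/* A: disable arrives late */

	return key.patched_on == (atomic_load(&key.count) > 0);
}
```

With model_update_racy the replay ends with one user and the branch
disabled; with model_update_checked the late disable becomes a no-op.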



---

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -1461,9 +1461,6 @@ static void add_event_to_ctx(struct perf
 	event->tstamp_stopped = tstamp;
 }
 
-static void perf_event_context_sched_in(struct perf_event_context *ctx,
-					struct task_struct *tsk);
-
 /*
  * Cross CPU call to install and enable a performance event
  *
@@ -1473,20 +1470,11 @@ static int  __perf_install_in_context(vo
 {
 	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
-	struct perf_event *leader = event->group_leader;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-	int err;
-
-	/*
-	 * In case we're installing a new context to an already running task,
-	 * could also happen before perf_event_task_sched_in() on architectures
-	 * which do context switches with IRQs enabled.
-	 */
-	if (ctx->task && !cpuctx->task_ctx)
-		perf_event_context_sched_in(ctx, ctx->task);
+	struct perf_event_context *task_ctx;
+	struct task_struct *task = NULL;
 
 	raw_spin_lock(&ctx->lock);
-	ctx->is_active = 1;
 	update_context_time(ctx);
 	/*
 	 * update cgrp time only if current cgrp
@@ -1497,43 +1485,48 @@ static int  __perf_install_in_context(vo
 
 	add_event_to_ctx(event, ctx);
 
-	if (!event_filter_match(event))
-		goto unlock;
+	if (!event_filter_match(event)) {
+		raw_spin_unlock(&ctx->lock);
+		return 0;
+	}
+	raw_spin_unlock(&ctx->lock);
 
 	/*
-	 * Don't put the event on if it is disabled or if
-	 * it is in a group and the group isn't on.
+	 * Since both these are only set during context-switches
+	 * and IRQs are disabled, their value is stable.
 	 */
-	if (event->state != PERF_EVENT_STATE_INACTIVE ||
-	    (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE))
-		goto unlock;
+	task_ctx = cpuctx->task_ctx;
 
+	perf_pmu_disable(ctx->pmu);
 	/*
-	 * An exclusive event can't go on if there are already active
-	 * hardware events, and no hardware event can go on if there
-	 * is already an exclusive event on.
-	 */
-	if (!group_can_go_on(event, cpuctx, 1))
-		err = -EEXIST;
-	else
-		err = event_sched_in(event, cpuctx, ctx);
-
-	if (err) {
-		/*
-		 * This event couldn't go on.  If it is in a group
-		 * then we have to pull the whole group off.
-		 * If the event group is pinned then put it in error state.
-		 */
-		if (leader != event)
-			group_sched_out(leader, cpuctx, ctx);
-		if (leader->attr.pinned) {
-			update_group_times(leader);
-			leader->state = PERF_EVENT_STATE_ERROR;
-		}
-	}
+	 * Reschedule the PMU to possibly include the fresh event, we take the
+	 * brute force approach of unscheduling everything and then re-add the
+	 * events in the correct order (CPU-pinned, TASK-pinned, CPU-flexible,
+	 * TASK-flexible).
+	 *
+	 * It is possible we received this IPI before the scheduler called
+	 * perf_event_task_sched_in() on platforms that context switch with
+	 * interrupts enabled. In that case the below DTRT.
+	 */
+	cpu_ctx_sched_out(cpuctx, EVENT_ALL);
+	if (task_ctx)
+		ctx_sched_out(task_ctx, cpuctx, EVENT_ALL);
+
+	if (ctx->task) {
+		cpuctx->task_ctx = task_ctx = ctx;
+		task = ctx->task;
+	} else if (task_ctx)
+		task = task_ctx->task;
+
+	cpu_ctx_sched_in(cpuctx, EVENT_PINNED, task);
+	if (task_ctx)
+		ctx_sched_in(task_ctx, cpuctx, EVENT_PINNED, task);
+	cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task);
+	if (task_ctx)
+		ctx_sched_in(task_ctx, cpuctx, EVENT_FLEXIBLE, task);
 
-unlock:
-	raw_spin_unlock(&ctx->lock);
+	perf_pmu_rotate_start(ctx->pmu);
+	perf_pmu_enable(ctx->pmu);
 
 	return 0;
 }
@@ -2114,8 +2107,19 @@ static void perf_event_context_sched_in(
 	struct perf_cpu_context *cpuctx;
 
 	cpuctx = __get_cpu_context(ctx);
-	if (cpuctx->task_ctx == ctx)
+	raw_spin_lock(&ctx->lock);
+	/*
+	 * Serialize against perf_install_in_context(), the interesting case
+	 * is where perf_install_in_context() finds the context inactive and
+	 * another cpu is just about to schedule the task in. In that case
+	 * we need to avoid observing a stale ctx->nr_events.
+	 */
+	ctx->is_active = 1;
+	if (cpuctx->task_ctx == ctx || !ctx->nr_events) {
+		raw_spin_unlock(&ctx->lock);
 		return;
+	}
+	raw_spin_unlock(&ctx->lock);
 
 	perf_pmu_disable(ctx->pmu);
 	/*
@@ -2125,12 +2129,12 @@ static void perf_event_context_sched_in(
 	 */
 	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 
+	cpuctx->task_ctx = ctx;
+
 	ctx_sched_in(ctx, cpuctx, EVENT_PINNED, task);
 	cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task);
 	ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
 
-	cpuctx->task_ctx = ctx;
-
 	/*
 	 * Since these rotations are per-cpu, we need to ensure the
 	 * cpu-context we got scheduled on is actually rotating.
@@ -2922,15 +2926,40 @@ static void free_event(struct perf_event
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
 
-int perf_event_release_kernel(struct perf_event *event)
+static int __perf_event_release(void *info)
 {
+	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	int ret;
 
 	/*
-	 * Remove from the PMU, can't get re-enabled since we got
-	 * here because the last ref went.
+	 * Disable the event if it's still running, we're shutting down.
 	 */
-	perf_event_disable(event);
+	ret = __perf_event_disable(info);
+	if (ret)
+		return ret;
+
+	raw_spin_lock_irq(&ctx->lock);
+	perf_group_detach(event);
+	list_del_event(event, ctx);
+	/*
+	 * In case we removed the last event from an active task_ctx
+	 * deactivate the task_ctx because this event being freed might
+	 * lead to the perf_sched_events jump_label being disabled
+	 * which avoids the task sched-out hook from being called.
+	 */
+	if (!ctx->nr_events && cpuctx->task_ctx == ctx) {
+		ctx->is_active = 0;
+		cpuctx->task_ctx = NULL;
+	}
+	raw_spin_unlock_irq(&ctx->lock);
+}
+
+int perf_event_release_kernel(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct task_struct *task = ctx->task;
 
 	WARN_ON_ONCE(ctx->parent_ctx);
 	/*
@@ -2946,10 +2975,28 @@ int perf_event_release_kernel(struct per
 	 *     to trigger the AB-BA case.
 	 */
 	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
+	if (!task) {
+		cpu_function_call(event->cpu, __perf_event_release, event);
+		goto unlock;
+	}
+
+retry:
+	if (!task_function_call(task, __perf_event_release, event))
+		goto unlock;
+
 	raw_spin_lock_irq(&ctx->lock);
+	if (ctx->is_active) {
+		raw_spin_unlock_irq(&ctx->lock);
+		goto retry;
+	}
+
+	WARN_ON_ONCE(event->state == PERF_EVENT_STATE_ACTIVE);
+
 	perf_group_detach(event);
 	list_del_event(event, ctx);
 	raw_spin_unlock_irq(&ctx->lock);
+
+unlock:
 	mutex_unlock(&ctx->mutex);
 
 	free_event(event);
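
As a rough illustration of the brute-force reschedule described in the
__perf_install_in_context() comment above, the following user-space model
unschedules everything and re-adds events in the order CPU-pinned,
TASK-pinned, CPU-flexible, TASK-flexible. Everything here is hypothetical
(struct model_event, MODEL_SLOTS, model_sched_class(), model_reschedule());
the real PMU constraints are far richer than a fixed slot count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_SLOTS = 2 };	/* assumed number of hardware counters */

struct model_event {
	bool task;	/* belongs to the task context, else the CPU context */
	bool pinned;	/* pinned, else flexible */
	bool active;	/* currently on the (modelled) PMU */
};

/* Schedule every event of one class while slots remain; returns slots used. */
static size_t model_sched_class(struct model_event *ev, size_t n,
				bool task, bool pinned, size_t used)
{
	for (size_t i = 0; i < n; i++) {
		if (ev[i].task == task && ev[i].pinned == pinned &&
		    used < MODEL_SLOTS) {
			ev[i].active = true;
			used++;
		}
	}
	return used;
}

/* Unschedule everything, then re-add in priority order. */
static void model_reschedule(struct model_event *ev, size_t n)
{
	size_t used = 0;

	assert(ev != NULL);
	for (size_t i = 0; i < n; i++)
		ev[i].active = false;

	used = model_sched_class(ev, n, false, true, used);	/* CPU-pinned */
	used = model_sched_class(ev, n, true, true, used);	/* TASK-pinned */
	used = model_sched_class(ev, n, false, false, used);	/* CPU-flexible */
	used = model_sched_class(ev, n, true, false, used);	/* TASK-flexible */
	(void)used;
}
```

In this model, installing a fresh TASK-pinned event on a full PMU pushes
out a CPU-flexible event that was running before, which is the outcome an
incremental "just try to add the new event" path would not produce.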

