From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Olsa <jolsa@redhat.com>, Paul Mackerras <paulus@samba.org>,
Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value
Date: Sat, 26 Mar 2011 17:13:46 +0100
Message-ID: <20110326161346.GA18272@redhat.com>
In-Reply-To: <1301153868.2250.359.camel@laptop>

On 03/26, Peter Zijlstra wrote:
>
> On Thu, 2011-03-24 at 17:44 +0100, Jiri Olsa wrote:
> >
> > - close is called on event on CPU 0:
> > - the task is scheduled on CPU 0
> > - __perf_event_task_sched_in is called
> > - cpuctx->task_ctx is set
> > - perf_sched_events jump label is decremented and == 0
> > - __perf_event_task_sched_out is not called
> > - cpuctx->task_ctx on CPU 0 stays set
> >
> > - exit is called on CPU 1:
> > - the task is scheduled on CPU 1
> > - perf_event_exit_task is called
> > - task_ctx_sched_out unsets cpuctx->task_ctx on CPU 1
> > - put_ctx destroys the context
> >
> > - another call of perf_rotate_context on CPU 0 will use an invalid
> > task_ctx pointer, and eventually panic
> >
> >
> > The attached workaround makes sure that the task_ctx is not set
> > when the context is being removed. As I said, it's not meant to be
> > a fix.
>
> Still having somewhat of a cold, how does the below look?
>
> (completely untested so far, will have to bang on your testcase a bit to
> make it work).
>
> ---
> kernel/perf_event.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index c75925c..2a03cc4 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -1112,6 +1112,8 @@ static int __perf_remove_from_context(void *info)
> raw_spin_lock(&ctx->lock);
> event_sched_out(event, cpuctx, ctx);
> list_del_event(event, ctx);
> + if (cpuctx->task_ctx == event->ctx && !event->ctx->nr_active)
> + cpuctx->task_ctx = NULL;
I don't think this is right.

It is too late to clear ->task_ctx when the task exits. It is simply
wrong that cpuctx->task_ctx != NULL after context_switch(). And, once
again, ->is_active is still true.

Besides, you can't trust __get_cpu_context(): it is possible that another
CPU has cpuctx->task_ctx == event->ctx. Otherwise task_ctx_sched_out()
has already cleared cpuctx->task_ctx.

Finally, in this case there are no events attached to this context;
close(event_fd) removes the only one.
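
A minimal userspace sketch of the cross-CPU objection above, not kernel code. The names ctx_model, cpuctx_model and the model_* helpers are hypothetical stand-ins, and the assumption that the removal runs on a different CPU than the one holding the stale pointer is taken from the quoted scenario rather than verified against the tree. It only shows that a check against the local cpuctx cannot clear a pointer held by another CPU.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical stand-ins for perf_event_context and perf_cpu_context. */
struct ctx_model { int nr_active; };
struct cpuctx_model { struct ctx_model *task_ctx; };

static struct cpuctx_model cpuctx[NR_CPUS];

/* Models __perf_event_task_sched_in(): this CPU starts pointing at ctx. */
static void model_sched_in(int cpu, struct ctx_model *ctx)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpuctx[cpu].task_ctx = ctx;
}

/* Models the two added lines of the quoted hunk, executed on 'cpu'. */
static void model_remove_from_context(int cpu, struct ctx_model *ctx)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpuctx[cpu].task_ctx == ctx && !ctx->nr_active)
		cpuctx[cpu].task_ctx = NULL;
}

/*
 * The task is scheduled in on sched_cpu and the sched-out hook is skipped,
 * then the removal runs on remove_cpu. Returns how many CPUs still point
 * at the context afterwards.
 */
static int model_stale_after_remove(int sched_cpu, int remove_cpu)
{
	static struct ctx_model ctx;	/* nr_active == 0: no active events */
	int cpu, stale = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		cpuctx[cpu].task_ctx = NULL;

	model_sched_in(sched_cpu, &ctx);
	model_remove_from_context(remove_cpu, &ctx);

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpuctx[cpu].task_ctx == &ctx)
			stale++;
	return stale;
}
```

In this model, model_stale_after_remove(0, 1) leaves one stale pointer on CPU 0, while model_stale_after_remove(0, 0) leaves none.
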

But there is one thing I can't understand. Jiri can trigger this bug
even with HAVE_JUMP_LABEL. How? OK, jump_label_dec/jump_label_inc are
obviously racy, but this test-case can't trigger the race.

So, we are doing free_event()->jump_label_dec()->jump_label_update(DISABLE)
and this implies __stop_machine(). This means we have at least one
context_switch() from the task with the active ->task_ctx to the
migration thread. And this happens before the JUMP_LABEL() code is
actually changed, so perf_event_task_sched_out() should call
__perf_event_task_sched_out() and clear task_ctx. perf_sched_events
is already zero, but this shouldn't matter.
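
The expectation above can be sketched as a small userspace model, not kernel code. The struct jl_model and the jl_* helpers are hypothetical; the model assumes the counter decrement and the code patching are two separate steps, and that the switch to the migration thread happens between them. The second case, where no switch happens before patching, is included only for contrast and is not claimed to be what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU and one jump-label site. */
struct jl_model {
	int count;		/* models perf_sched_events */
	bool branch_enabled;	/* models the patched JUMP_LABEL() site */
	const void *task_ctx;	/* models cpuctx->task_ctx */
};

/* Models perf_event_task_sched_out(): only the patched branch decides. */
static void jl_sched_out(struct jl_model *m)
{
	if (m->branch_enabled)
		m->task_ctx = NULL;	/* __perf_event_task_sched_out() */
}

/*
 * Models jump_label_dec(): the counter drops first, the code is patched
 * later. With switch_before_patch set, a context switch (to the migration
 * thread in the email's reasoning) happens in between.
 */
static void jl_dec(struct jl_model *m, bool switch_before_patch)
{
	assert(m->count > 0);
	if (--m->count != 0)
		return;
	if (switch_before_patch)
		jl_sched_out(m);	/* counter is zero, branch still live */
	m->branch_enabled = false;	/* jump_label_update(DISABLE) */
}

/* Returns true if task_ctx is still set after the label is disabled. */
static bool jl_ctx_left_set(bool switch_before_patch)
{
	static const int dummy_ctx;
	struct jl_model m = { 1, true, &dummy_ctx };

	jl_dec(&m, switch_before_patch);
	jl_sched_out(&m);		/* later switches no longer take the hook */
	return m.task_ctx != NULL;
}
```

Under these assumptions jl_ctx_left_set(true) reports that task_ctx was cleared, which matches the reasoning that the zero counter by itself should not matter.
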
Confused.
Oleg.
Thread overview: 41+ messages
2011-03-24 16:44 [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value Jiri Olsa
2011-03-25 19:10 ` Oleg Nesterov
2011-03-26 15:37 ` Peter Zijlstra
2011-03-26 16:13 ` Oleg Nesterov [this message]
2011-03-26 16:38 ` Peter Zijlstra
2011-03-26 17:09 ` Oleg Nesterov
2011-03-26 17:35 ` Oleg Nesterov
2011-03-26 18:29 ` Peter Zijlstra
2011-03-26 18:49 ` Oleg Nesterov
2011-03-28 13:30 ` Oleg Nesterov
2011-03-28 14:57 ` Peter Zijlstra
2011-03-28 15:00 ` Peter Zijlstra
2011-03-28 15:15 ` Oleg Nesterov
2011-03-28 16:27 ` Peter Zijlstra
2011-03-28 15:39 ` Oleg Nesterov
2011-03-28 15:49 ` Peter Zijlstra
2011-03-28 16:56 ` Oleg Nesterov
2011-03-29 8:32 ` Peter Zijlstra
2011-03-29 10:49 ` Peter Zijlstra
2011-03-29 16:28 ` Oleg Nesterov
2011-03-29 19:01 ` Peter Zijlstra
2011-03-30 13:09 ` Jiri Olsa
2011-03-30 14:51 ` Peter Zijlstra
2011-03-30 16:37 ` Oleg Nesterov
2011-03-30 18:30 ` Paul E. McKenney
2011-03-30 19:53 ` Oleg Nesterov
2011-03-30 21:26 ` Peter Zijlstra
2011-03-30 21:35 ` Oleg Nesterov
2011-03-31 10:32 ` Jiri Olsa
2011-03-31 12:41 ` [tip:perf/urgent] perf: Fix task context scheduling tip-bot for Peter Zijlstra
2011-03-31 13:28 ` [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value Oleg Nesterov
2011-03-31 13:51 ` Peter Zijlstra
2011-03-31 14:10 ` Oleg Nesterov
2011-04-04 16:20 ` Oleg Nesterov
2011-03-30 15:32 ` Oleg Nesterov
2011-03-30 15:40 ` Peter Zijlstra
2011-03-30 15:52 ` Oleg Nesterov
2011-03-30 15:57 ` Peter Zijlstra
2011-03-30 16:11 ` Peter Zijlstra
2011-03-30 17:13 ` Oleg Nesterov
2011-03-26 17:09 ` Peter Zijlstra