From: Frederic Weisbecker <fweisbec@gmail.com>
To: David Miller <davem@davemloft.net>, Steven Rostedt <rostedt@goodmis.org>
Cc: sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org,
mingo@elte.hu, acme@redhat.com, a.p.zijlstra@chello.nl,
paulus@samba.org
Subject: Re: Random scheduler/unaligned accesses crashes with perf lock
Date: Tue, 06 Apr 2010 10:19:28 +0000
Message-ID: <20100406101925.GD5147@nowhere>
In-Reply-To: <20100406.025049.267615796.davem@davemloft.net>
On Tue, Apr 06, 2010 at 02:50:49AM -0700, David Miller wrote:
> From: Frederic Weisbecker <fweisbec@gmail.com>
> Date: Mon, 5 Apr 2010 21:40:58 +0200
>
> > It happens without CONFIG_FUNCTION_TRACER as well (but it happens
> > when the function tracer runs). And I hadn't your
> > perf_arch_save_caller_regs() when I triggered this.
>
> I figured out the problem, it's NMIs. As soon as I disable all of the
> NMI watchdog code, the problem goes away.
>
> This is because some parts of the NMI interrupt handling path are not
> marked with "notrace" and the various tracer code paths use
> local_irq_disable() (either directly or indirectly) which doesn't work
> with sparc64's NMI scheme. These essentially turn NMIs back on in the
> NMI handler before the NMI condition has been cleared, and thus we can
> re-enter with another NMI interrupt.
>
> We went through this for perf events, and we just made sure that
> local_irq_{enable,disable}() never occurs in any of the code paths in
> perf events that can be reached via the NMI interrupt handler. (the
> only one we had was sched_clock() and that was easily fixed)
>
> So, the first mcount hit we get is for rcu_nmi_enter() via
> nmi_enter().
>
> I can see two ways to handle this:
>
> 1) Pepper 'notrace' markers onto rcu_nmi_enter(), rcu_nmi_exit()
> and whatever else I can see getting hit in the NMI interrupt
> handler code paths.
>
> 2) Add a hack to __raw_local_irq_save() that keeps it from writing
> anything to the interrupt level register if we have NMI's disabled.
> (this puts the cost on the entire kernel instead of just the NMI
> paths).
>
> #1 seems to be the intent on other platforms, the majority of the NMI
> code paths are protected with 'notrace' on x86, I bet nobody noticed
> that nmi_enter() when CONFIG_NO_HZ && !CONFIG_TINY_RCU ends up calling
> a function that does tracing.
>
> The next one we'll hit is atomic_notifier_call_chain() (amusingly
> notify_die() is marked 'notrace' but the one thing it calls isn't)
>
> For example, the following are the generic notrace annotations I
> would need to get sparc64 ftrace functioning again. (Frederic I will
> send you the full patch with the sparc specific bits under separate
> cover so that you can test things...)
>
> --------------------
> kernel: Add notrace annotations to common routines invoked via NMI.
>
> This includes the atomic notifier call chain as well as the RCU
> specific NMI enter/exit handlers.

Ok, but this as a cause looks weird.
The function tracer handler disables interrupts. I don't remember exactly
why, but we also have a no-preempt mode that only disables preemption instead
(function_trace_call_preempt_only()).
It means having such interrupt reentrancy is not a problem. In fact, the
function tracer is not reentrant:

	data = tr->data[cpu];
	disabled = atomic_inc_return(&data->disabled);
	if (likely(disabled == 1))
		trace_function(tr, ip, parent_ip, flags, pc);
	atomic_dec(&data->disabled);

We do this just to prevent tracing recursion (in case we have
a traceable function in the inner function tracing path).
NMIs are just supposed to be fine with the function tracer.
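
To make the recursion guard above concrete, here is a minimal user-space sketch of the same pattern using C11 atomics. It is not the kernel code: `struct trace_cpu_data`, `record_entry()` and `traced_callback()` are hypothetical names invented for illustration, and `atomic_fetch_add(...) + 1` stands in for the kernel's `atomic_inc_return()`. The `depth` argument only simulates a traceable function being hit from inside the tracing path.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-CPU data used in the snippet above. */
struct trace_cpu_data {
	atomic_int disabled;	/* nesting depth of the tracer callback */
	int recorded;		/* entries actually recorded */
	int dropped;		/* nested invocations that were skipped */
};

static void traced_callback(struct trace_cpu_data *data, int depth);

/*
 * Stand-in for trace_function(): records one entry, then simulates a
 * traceable function being called from inside the tracing path by
 * re-entering the callback while depth > 0.
 */
static void record_entry(struct trace_cpu_data *data, int depth)
{
	data->recorded++;
	if (depth > 0)
		traced_callback(data, depth - 1);
}

/* The guard: only the outermost invocation records anything. */
static void traced_callback(struct trace_cpu_data *data, int depth)
{
	/* atomic_fetch_add() returns the old value, so add 1 to get the new one. */
	int disabled = atomic_fetch_add(&data->disabled, 1) + 1;

	if (disabled == 1)
		record_entry(data, depth);
	else
		data->dropped++;	/* nested entry: skip, do not recurse */

	atomic_fetch_sub(&data->disabled, 1);
}
```

Calling `traced_callback()` on a zero-initialized `struct trace_cpu_data` with a nonzero `depth` records exactly one entry: the nested call sees the counter at 2, is dropped, and the counter returns to 0 afterwards. The comparison has to be `==`; an assignment there would always be true and would defeat the guard.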