From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus
Date: Tue, 15 Jan 2013 08:32:21 -0800
Message-ID: <20130115163221.GN3384@linux.vnet.ibm.com>
In-Reply-To: <CAFTL4hzYkPCmkfqBafkTA5E_snkLoC_y3p9xwumj=L1xWAWK0g@mail.gmail.com>

On Tue, Jan 15, 2013 at 01:13:10AM +0100, Frederic Weisbecker wrote:
> 2013/1/11 Colin Cross <ccross@android.com>:
> > Emulate NMIs on systems where they are not available by using timer
> > interrupts on other cpus. Each cpu will use its softlockup hrtimer
> > to check that the next cpu is processing hrtimer interrupts by
> > verifying that a counter is increasing.
> >
> > This patch is useful on systems where the hardlockup detector is not
> > available due to a lack of NMIs, for example most ARM SoCs.
> > Without this patch any cpu stuck with interrupts disabled can
> > cause a hardware watchdog reset with no debugging information,
> > but with this patch the kernel can detect the lockup and panic,
> > which can result in useful debugging info.
> >
> > Signed-off-by: Colin Cross <ccross@android.com>
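
For readers following along, the neighbour check described above can be
modelled in user space roughly as follows. This is only a sketch and not
the code from the patch: struct cpu_state, check_next_cpu(), run_period()
and the "stuck" flag are hypothetical names made up for illustration, and
a plain loop stands in for the per-cpu hrtimer.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-cpu state; none of these names come from the patch. */
struct cpu_state {
	unsigned long hrtimer_interrupts; /* bumped on each timer interrupt */
	unsigned long saved_next;         /* last count seen on the next cpu */
	bool stuck;                       /* simulation knob: irqs disabled */
};

static struct cpu_state cpus[NR_CPUS];

static int next_cpu(int cpu)
{
	return (cpu + 1) % NR_CPUS;
}

/* Returns true if the next cpu's counter did not advance since last check. */
static bool check_next_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	int next = next_cpu(cpu);
	unsigned long now = cpus[next].hrtimer_interrupts;
	bool locked = (now == cpus[cpu].saved_next);

	cpus[cpu].saved_next = now;
	return locked;
}

/*
 * Simulate one timer period: every responsive cpu takes its interrupt,
 * then every responsive cpu checks its neighbour.  Returns the cpu that
 * was found locked up, or -1 if all counters advanced.
 */
static int run_period(void)
{
	int detected = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!cpus[cpu].stuck)
			cpus[cpu].hrtimer_interrupts++;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!cpus[cpu].stuck && check_next_cpu(cpu))
			detected = next_cpu(cpu);

	return detected;
}
```

In this model a cpu that stops taking timer interrupts is noticed by its
predecessor one period later, which is the property the description
relies on. A stuck cpu performs no checks of its own, so the model says
nothing about the case where every cpu is stuck at once.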
>
> I believe this is pretty much what the RCU stall detector does
> already: checks for other CPUs being responsive. The only difference
> is on how it checks that. For RCU it's about checking for CPUs
> reporting quiescent states when requested to do so. In your case it's
> about ensuring the hrtimer interrupt is well handled.
>
> One thing you can do is to enqueue an RCU callback (call_rcu()) every
> minute so you can force other CPUs to report quiescent states
> periodically and thus check for lockups.

This would work in all but one case, and that is where RCU believes
that the non-responsive CPU is in dyntick-idle mode.  In that case,
RCU would not be expecting it to respond and would therefore ignore
any non-responsiveness.

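
To make the blind spot concrete, here is a rough user-space model of that
decision. It is a sketch under stated assumptions, not RCU's actual
implementation: struct sim_cpu and find_stalled() are hypothetical names,
and the two flags stand in for what RCU believes about each CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of RCU's view of one CPU; not a kernel structure. */
struct sim_cpu {
	bool reported_qs;  /* reported a quiescent state when asked */
	bool dyntick_idle; /* what RCU believes, not necessarily the truth */
};

/*
 * Marks each CPU that would be treated as stalled and returns how many
 * there are.  A CPU believed to be in dyntick-idle mode is never
 * expected to respond, so it is never flagged.
 */
static int find_stalled(const struct sim_cpu *cpus, int n, bool *stalled)
{
	int count = 0;

	assert(cpus != NULL && stalled != NULL);

	for (int i = 0; i < n; i++) {
		stalled[i] = !cpus[i].dyntick_idle && !cpus[i].reported_qs;
		if (stalled[i])
			count++;
	}
	return count;
}
```

A CPU that locks up with interrupts disabled while RCU still believes it
to be dyntick-idle has dyntick_idle set and reported_qs clear, so this
check passes over it, whereas the same silence from a CPU believed to be
busy is flagged.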
> Now you'll face the same problem in the end: if you don't have NMIs,
> you won't have a very useful report.

Indeed, I must confess that I have thus far chickened out on solving
the general NMI problem.  The RCU stall detector does try to look at
stacks remotely in some cases, but this is often unreliable, and some
architectures seem to refuse to produce a remote stack trace.

							Thanx, Paul