linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Anton Blanchard <anton@samba.org>
Cc: mikey@neuling.org, paulus@samba.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 2/2] powerpc: Add ppc64 hard lockup detector support
Date: Mon, 11 Aug 2014 16:42:19 -0700	[thread overview]
Message-ID: <20140811234219.GH5821@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140812093137.404e8930@kryten>

On Tue, Aug 12, 2014 at 09:31:37AM +1000, Anton Blanchard wrote:
> The hard lockup detector uses a PMU event as a periodic NMI to
> detect if we are stuck (where stuck means no timer interrupts have
> occurred).
> 
> Ben's rework of the ppc64 soft disable code has made ppc64 PMU
> exceptions a partial NMI. They can get disabled if an external interrupt
> comes in, but otherwise PMU interrupts will fire in interrupt disabled
> regions.
> 
> I wrote a kernel module to test this patch and noticed we sometimes
> missed hard lockup warnings. The RCU code detected the stall first and
> issued an IPI to backtrace all CPUs. Unfortunately an IPI is an external
> interrupt and that will hard disable interrupts, preventing the hard
> lockup detector from going off.

If it helps, commit bc1dce514e9b (rcu: Don't use NMIs to dump other
CPUs' stacks) makes RCU avoid this behavior.  It instead reads the
stacks out remotely when this commit is applied.  It is in -tip, and
should make mainline this merge window.  Corresponding patch below.

						Thanx, Paul

------------------------------------------------------------------------

rcu: Don't use NMIs to dump other CPUs' stacks

Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks.  This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3f93033d3c61..8f3e4d43d736 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1013,10 +1013,7 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
- * for architectures that do not implement trigger_all_cpu_backtrace().
- * The NMI-triggered stack traces are more accurate because they are
- * printed by the target CPU.
+ * Dump stacks of all tasks running on stalled CPUs.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1094,7 +1091,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	       (long)rsp->gpnum, (long)rsp->completed, totqlen);
 	if (ndetected == 0)
 		pr_err("INFO: Stall ended before state dump start\n");
-	else if (!trigger_all_cpu_backtrace())
+	else
 		rcu_dump_cpu_stacks(rsp);
 
 	/* Complain about tasks blocking the grace period. */
@@ -1125,8 +1122,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
 		jiffies - rsp->gp_start,
 		(long)rsp->gpnum, (long)rsp->completed, totqlen);
-	if (!trigger_all_cpu_backtrace())
-		dump_stack();
+	rcu_dump_cpu_stacks(rsp);
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	if (ULONG_CMP_GE(jiffies, ACCESS_ONCE(rsp->jiffies_stall)))

  reply	other threads:[~2014-08-11 23:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-05  4:55 [PATCH 1/2] powerpc: Hard disable interrupts in xmon Anton Blanchard
2014-08-05  4:56 ` [PATCH 2/2] powerpc: Add ppc64 hard lockup detector support Anton Blanchard
2014-08-11 23:31   ` Anton Blanchard
2014-08-11 23:42     ` Paul E. McKenney [this message]
  -- strict thread matches above, loose matches on Subject: below --
2015-01-21  3:46 [PATCH 1/2] oprofile: Add HAVE_OPROFILE_NMI_TIMER Anton Blanchard
2015-01-21  3:46 ` [PATCH 2/2] powerpc: Add ppc64 hard lockup detector support Anton Blanchard
2015-04-09  2:52 [PATCH 1/2] oprofile: Disable oprofile NMI timer on ppc64 Anton Blanchard
2015-04-09  2:52 ` [PATCH 2/2] powerpc: Add ppc64 hard lockup detector support Anton Blanchard
2015-04-09 23:48   ` Scott Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140811234219.GH5821@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=anton@samba.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mikey@neuling.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).