* [PATCH 1/2] powerpc: Hard disable interrupts in xmon
@ 2014-08-05 4:55 Anton Blanchard
From: Anton Blanchard @ 2014-08-05 4:55 UTC
To: benh, paulus, mpe, paulmck; +Cc: linuxppc-dev
xmon only soft disables interrupts. This seems like a bad idea: we
certainly don't want decrementer and PMU exceptions going off when
we are debugging something inside xmon.
This issue was uncovered when the hard lockup detector went off
inside xmon. To ensure we won't get a spurious hard lockup warning,
I also call touch_nmi_watchdog() when exiting xmon.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: b/arch/powerpc/xmon/xmon.c
===================================================================
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -24,6 +24,7 @@
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/bug.h>
+#include <linux/nmi.h>
#include <asm/ptrace.h>
#include <asm/string.h>
@@ -374,6 +375,7 @@ static int xmon_core(struct pt_regs *reg
#endif
local_irq_save(flags);
+ hard_irq_disable();
bp = in_breakpoint_table(regs->nip, &offset);
if (bp != NULL) {
@@ -558,6 +560,7 @@ static int xmon_core(struct pt_regs *reg
#endif
insert_cpu_bpts();
+ touch_nmi_watchdog();
local_irq_restore(flags);
return cmd != 'X' && cmd != EOF;
* [PATCH 2/2] powerpc: Add ppc64 hard lockup detector support
@ 2014-08-05 4:56 Anton Blanchard
From: Anton Blanchard @ 2014-08-05 4:56 UTC
To: benh, paulus, mpe, paulmck; +Cc: linuxppc-dev

The hard lockup detector uses a PMU event as a periodic NMI to
detect if we are stuck (where stuck means no timer interrupts have
occurred).

Ben's rework of the ppc64 soft disable code has made ppc64 PMU
exceptions a partial NMI. They can get disabled if an external interrupt
comes in, but otherwise PMU interrupts will fire in interrupt disabled
regions.

I wrote a kernel module to test this patch and noticed we sometimes
missed hard lockup warnings. The RCU code detected the stall first and
issued an IPI to backtrace all CPUs. Unfortunately an IPI is an external
interrupt and that will hard disable interrupts, preventing the hard
lockup detector from going off.

If I reduced the hard lockup threshold to 5 seconds:

echo 5 > /proc/sys/kernel/watchdog_thresh

Then it would beat the RCU code in detecting a stall and get a correct
backtrace out.

Another downside is that our PMCs can only count to 2^31, so even when
we ask for 10 seconds of processor cycles, we end up taking a couple of
PMU exceptions a second.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: b/arch/powerpc/Kconfig
===================================================================
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -145,6 +145,7 @@ config PPC
select HAVE_IRQ_EXIT_ON_IRQ_STACK
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select HAVE_ARCH_AUDITSYSCALL
+ select HAVE_PERF_EVENTS_NMI if PPC64
config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
Index: b/arch/powerpc/include/asm/nmi.h
===================================================================
--- /dev/null
+++ b/arch/powerpc/include/asm/nmi.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_NMI_H
+#define _ASM_NMI_H
+
+#endif /* _ASM_NMI_H */
Index: b/arch/powerpc/kernel/setup_64.c
===================================================================
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -796,3 +796,10 @@ unsigned long memory_block_size_bytes(vo
struct ppc_pci_io ppc_pci_io;
EXPORT_SYMBOL(ppc_pci_io);
#endif
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+ return ppc_proc_freq * watchdog_thresh;
+}
+#endif
* [PATCH 2/2] powerpc: Add ppc64 hard lockup detector support
@ 2014-08-11 23:31 Anton Blanchard
From: Anton Blanchard @ 2014-08-11 23:31 UTC
To: benh, paulus, mpe, paulmck; +Cc: mikey, linuxppc-dev

The hard lockup detector uses a PMU event as a periodic NMI to
detect if we are stuck (where stuck means no timer interrupts have
occurred).

Ben's rework of the ppc64 soft disable code has made ppc64 PMU
exceptions a partial NMI. They can get disabled if an external interrupt
comes in, but otherwise PMU interrupts will fire in interrupt disabled
regions.

I wrote a kernel module to test this patch and noticed we sometimes
missed hard lockup warnings. The RCU code detected the stall first and
issued an IPI to backtrace all CPUs. Unfortunately an IPI is an external
interrupt and that will hard disable interrupts, preventing the hard
lockup detector from going off.

If I reduced the hard lockup threshold to 5 seconds:

echo 5 > /proc/sys/kernel/watchdog_thresh

Then it would beat the RCU code in detecting a stall and get a correct
backtrace out.

Another downside is that our PMCs can only count to 2^31, so even when
we ask for 10 seconds of processor cycles, we end up taking a couple of
PMU exceptions a second.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
v2: Mikey noticed a build issue with oprofile. Since our NMI is just
the PMU hardware it doesn't make any sense for oprofile to try and
use it.
Index: b/arch/powerpc/Kconfig
===================================================================
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -145,6 +145,7 @@ config PPC
select HAVE_IRQ_EXIT_ON_IRQ_STACK
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select HAVE_ARCH_AUDITSYSCALL
+ select HAVE_PERF_EVENTS_NMI if PPC64
config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
Index: b/arch/powerpc/include/asm/nmi.h
===================================================================
--- /dev/null
+++ b/arch/powerpc/include/asm/nmi.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_NMI_H
+#define _ASM_NMI_H
+
+#endif /* _ASM_NMI_H */
Index: b/arch/powerpc/kernel/setup_64.c
===================================================================
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -800,3 +800,10 @@ unsigned long memory_block_size_bytes(vo
struct ppc_pci_io ppc_pci_io;
EXPORT_SYMBOL(ppc_pci_io);
#endif
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+ return ppc_proc_freq * watchdog_thresh;
+}
+#endif
Index: b/arch/Kconfig
===================================================================
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -32,7 +32,7 @@ config HAVE_OPROFILE
config OPROFILE_NMI_TIMER
def_bool y
- depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
+ depends on (PERF_EVENTS && HAVE_PERF_EVENTS_NMI) && !PPC
config KPROBES
bool "Kprobes"
* Re: [PATCH 2/2] powerpc: Add ppc64 hard lockup detector support
@ 2014-08-11 23:42 Paul E. McKenney
From: Paul E. McKenney @ 2014-08-11 23:42 UTC
To: Anton Blanchard; +Cc: mikey, paulus, linuxppc-dev

On Tue, Aug 12, 2014 at 09:31:37AM +1000, Anton Blanchard wrote:
> The hard lockup detector uses a PMU event as a periodic NMI to
> detect if we are stuck (where stuck means no timer interrupts have
> occurred).
>
> Ben's rework of the ppc64 soft disable code has made ppc64 PMU
> exceptions a partial NMI. They can get disabled if an external interrupt
> comes in, but otherwise PMU interrupts will fire in interrupt disabled
> regions.
>
> I wrote a kernel module to test this patch and noticed we sometimes
> missed hard lockup warnings. The RCU code detected the stall first and
> issued an IPI to backtrace all CPUs. Unfortunately an IPI is an external
> interrupt and that will hard disable interrupts, preventing the hard
> lockup detector from going off.

If it helps, commit bc1dce514e9b (rcu: Don't use NMIs to dump other
CPUs' stacks) makes RCU avoid this behavior. It instead reads the
stacks out remotely when this commit is applied. It is in -tip, and
should make mainline this merge window.

Corresponding patch below.

Thanx, Paul

------------------------------------------------------------------------

rcu: Don't use NMIs to dump other CPUs' stacks

Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks. This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3f93033d3c61..8f3e4d43d736 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1013,10 +1013,7 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
}
/*
- * Dump stacks of all tasks running on stalled CPUs. This is a fallback
- * for architectures that do not implement trigger_all_cpu_backtrace().
- * The NMI-triggered stack traces are more accurate because they are
- * printed by the target CPU.
+ * Dump stacks of all tasks running on stalled CPUs.
*/
static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
{
@@ -1094,7 +1091,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
(long)rsp->gpnum, (long)rsp->completed, totqlen);
if (ndetected == 0)
pr_err("INFO: Stall ended before state dump start\n");
- else if (!trigger_all_cpu_backtrace())
+ else
rcu_dump_cpu_stacks(rsp);
/* Complain about tasks blocking the grace period. */
@@ -1125,8 +1122,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
jiffies - rsp->gp_start,
(long)rsp->gpnum, (long)rsp->completed, totqlen);
- if (!trigger_all_cpu_backtrace())
- dump_stack();
+ rcu_dump_cpu_stacks(rsp);
raw_spin_lock_irqsave(&rnp->lock, flags);
if (ULONG_CMP_GE(jiffies, ACCESS_ONCE(rsp->jiffies_stall)))