* [PATCH] sparc64: Normalize NMI watchdog logging and behavior.
@ 2014-05-04 5:27 David Miller
2014-05-04 7:04 ` Sam Ravnborg
2014-05-04 18:24 ` David Miller
0 siblings, 2 replies; 3+ messages in thread
From: David Miller @ 2014-05-04 5:27 UTC
To: sparclinux
Bring this code in line with the perf based generic NMI watchdog
in kernel/watchdog.c (which we should convert over to at some
point).
In particular, don't do anything super fancy when the watchdog
triggers, and specifically don't do a do_exit() which only makes
things worse.
Either panic() or WARN(); the latter takes care of all the usual
reporting, such as giving us a stack backtrace.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
I noticed this while trying to debug various kinds of hangs I can
trigger in 3.15; hopefully these adjustments make debugging easier
for other people as well.
Committed to 'sparc' GIT.
arch/sparc/kernel/nmi.c | 21 +++++----------------
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 6479256..3370945 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -68,27 +68,16 @@ EXPORT_SYMBOL(touch_nmi_watchdog);
static void die_nmi(const char *str, struct pt_regs *regs, int do_panic)
{
+ int this_cpu = smp_processor_id();
+
if (notify_die(DIE_NMIWATCHDOG, str, regs, 0,
pt_regs_trap_type(regs), SIGINT) == NOTIFY_STOP)
return;
- console_verbose();
- bust_spinlocks(1);
-
- printk(KERN_EMERG "%s", str);
- printk(" on CPU%d, ip %08lx, registers:\n",
- smp_processor_id(), regs->tpc);
- show_regs(regs);
- dump_stack();
-
- bust_spinlocks(0);
-
if (do_panic || panic_on_oops)
- panic("Non maskable interrupt");
-
- nmi_exit();
- local_irq_enable();
- do_exit(SIGBUS);
+ panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
+ else
+ WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
}
notrace __kprobes void perfctr_irq(int irq, struct pt_regs *regs)
--
1.8.1.2
* Re: [PATCH] sparc64: Normalize NMI watchdog logging and behavior.
From: Sam Ravnborg @ 2014-05-04 7:04 UTC
To: sparclinux
On Sun, May 04, 2014 at 01:27:10AM -0400, David Miller wrote:
>
> Bring this code in line with the perf based generic NMI watchdog
> in kernel/watchdog.c (which we should convert over to at some
> point).
>
> In particular, don't do anything super fancy when the watchdog
> triggers, and specifically don't do a do_exit() which only makes
> things worse.
It is always good when we can use more of the generic functionality.
Should we do something remotely similar for sparc32?
It looks like the sun4m_nmi() function is also used for sun4d + leon,
but I need to look again to make sure.
Sam
Something like this:
diff --git a/arch/sparc/kernel/sun4m_irq.c b/arch/sparc/kernel/sun4m_irq.c
index 8bb3b3f..7c8ad6f 100644
--- a/arch/sparc/kernel/sun4m_irq.c
+++ b/arch/sparc/kernel/sun4m_irq.c
@@ -308,28 +308,28 @@ static void sun4m_clear_clock_irq(void)
void sun4m_nmi(struct pt_regs *regs)
{
unsigned long afsr, afar, si;
+ char *reason = "unknown";
- printk(KERN_ERR "Aieee: sun4m NMI received!\n");
/* XXX HyperSparc hack XXX */
__asm__ __volatile__("mov 0x500, %%g1\n\t"
"lda [%%g1] 0x4, %0\n\t"
"mov 0x600, %%g1\n\t"
"lda [%%g1] 0x4, %1\n\t" :
"=r" (afsr), "=r" (afar));
- printk(KERN_ERR "afsr=%08lx afar=%08lx\n", afsr, afar);
+
si = sbus_readl(&sun4m_irq_global->pending);
printk(KERN_ERR "si=%08lx\n", si);
if (si & SUN4M_INT_MODULE_ERR)
- printk(KERN_ERR "Module async error\n");
+ reason = "Module async error";
if (si & SUN4M_INT_M2S_WRITE_ERR)
- printk(KERN_ERR "MBus/SBus async error\n");
+ reason = "MBus/SBus async error";
if (si & SUN4M_INT_ECC_ERR)
- printk(KERN_ERR "ECC memory error\n");
+ reason = "ECC memory error";
if (si & SUN4M_INT_VME_ERR)
- printk(KERN_ERR "VME async error\n");
- printk(KERN_ERR "you lose buddy boy...\n");
- show_regs(regs);
- prom_halt();
+ reason = "VME async error";
+
+ panic("sun4m NMI received (%s), afsr=%08lx afar=%08lx\n",
+ reason, afsr, afar);
}
void sun4m_unmask_profile_irq(void)
* Re: [PATCH] sparc64: Normalize NMI watchdog logging and behavior.
From: David Miller @ 2014-05-04 18:24 UTC
To: sparclinux
From: Sam Ravnborg <sam@ravnborg.org>
Date: Sun, 4 May 2014 09:04:37 +0200
> On Sun, May 04, 2014 at 01:27:10AM -0400, David Miller wrote:
>>
>> Bring this code in line with the perf based generic NMI watchdog
>> in kernel/watchdog.c (which we should convert over to at some
>> point).
>>
>> In particular, don't do anything super fancy when the watchdog
>> triggers, and specifically don't do a do_exit() which only makes
>> things worse.
>
> It is always good when we can use more of the generic functionality.
> Should we do something remotely similar for sparc32?
>
> It looks like the sun4m_nmi() function is also used for sun4d + leon,
> but I need to look again to make sure.
The sun4m NMI function is just for hard asynchronous errors, rather
than a periodic event generated by perf counters.
So it serves a different purpose, but it could use some cleanups
nonetheless. I wrote that code when I was a coding cowboy of
sorts :-)