* [PATCH AUTOSEL 4.14 031/127] powerpc/eeh: Only dump stack once if an MMIO loop is detected
[not found] <20200918021220.2066485-1-sashal@kernel.org>
@ 2020-09-18 2:10 ` Sasha Levin
2020-09-18 2:11 ` [PATCH AUTOSEL 4.14 102/127] powerpc/traps: Make unrecoverable NMIs die instead of panic Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2020-09-18 2:10 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sam Bobroff, Oliver O'Halloran, linuxppc-dev, Sasha Levin
From: Oliver O'Halloran <oohall@gmail.com>
[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]
Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result after an EEH event occurs a driver can get stuck in
a polling loop unless it some kind of internal timeout logic.
Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.
Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truely stuck the kernel's hung task detector is better suited
to reporting the probelm anyway.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kernel/eeh.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d2ba7936d0d33..7b46576962bfd 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -506,7 +506,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
rc = 1;
if (pe->state & EEH_PE_ISOLATED) {
pe->check_count++;
- if (pe->check_count % EEH_MAX_FAILS == 0) {
+ if (pe->check_count == EEH_MAX_FAILS) {
dn = pci_device_to_OF_node(dev);
if (dn)
location = of_get_property(dn, "ibm,loc-code",
--
2.25.1
^ permalink raw reply related [flat|nested] 2+ messages in thread* [PATCH AUTOSEL 4.14 102/127] powerpc/traps: Make unrecoverable NMIs die instead of panic
[not found] <20200918021220.2066485-1-sashal@kernel.org>
2020-09-18 2:10 ` [PATCH AUTOSEL 4.14 031/127] powerpc/eeh: Only dump stack once if an MMIO loop is detected Sasha Levin
@ 2020-09-18 2:11 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2020-09-18 2:11 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Christophe Leroy, linuxppc-dev, Nicholas Piggin, Sasha Levin
From: Nicholas Piggin <npiggin@gmail.com>
[ Upstream commit 265d6e588d87194c2fe2d6c240247f0264e0c19b ]
System Reset and Machine Check interrupts that are not recoverable due
to being nested or interrupting when RI=0 currently panic. This is not
necessary, and can often just kill the current context and recover.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kernel/traps.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 3c9457420aee8..0f1a888c04a84 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -357,11 +357,11 @@ out:
#ifdef CONFIG_PPC_BOOK3S_64
BUG_ON(get_paca()->in_nmi == 0);
if (get_paca()->in_nmi > 1)
- nmi_panic(regs, "Unrecoverable nested System Reset");
+ die("Unrecoverable nested System Reset", regs, SIGABRT);
#endif
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
- nmi_panic(regs, "Unrecoverable System Reset");
+ die("Unrecoverable System Reset", regs, SIGABRT);
if (!nested)
nmi_exit();
@@ -701,7 +701,7 @@ void machine_check_exception(struct pt_regs *regs)
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
- nmi_panic(regs, "Unrecoverable Machine check");
+ die("Unrecoverable Machine check", regs, SIGBUS);
return;
--
2.25.1
^ permalink raw reply related [flat|nested] 2+ messages in thread