* [PATCH 0/7] powerpc/64: machine check and other RAS fixes
@ 2020-03-17 9:09 Nicholas Piggin
2020-03-17 9:09 ` [PATCH 1/7] powerpc/64: mark emergency stacks valid to unwind Nicholas Piggin
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
We hit a number of problems bringing up fwnmi sreset and MCEs on
QEMU. These apply to PowerVM as well, but I haven't done much
testing there because it's much harder.
This series of fixes applies on top of next-test. The machine
check reconcile patch won't apply cleanly to previous kernels, but
it may be worth backporting. We can do that after upstreaming.
This doesn't solve Ganesh's machine check RMO problem, but at
least the reconciling should help squash some warnings.
Thanks,
Nick
Nicholas Piggin (7):
powerpc/64: mark emergency stacks valid to unwind
powerpc/pseries/ras: avoid calling rtas_token in NMI paths
powerpc/64s: Change irq reconcile for NMIs from reusing _DAR to RESULT
powerpc/64s: machine check reconcile irq state
powerpc/pseries/ras: FWNMI_VALID off by one
powerpc/pseries/ras: fwnmi avoid modifying r3 in error case
powerpc/pseries/ras: fwnmi sreset should not interlock
arch/powerpc/include/asm/firmware.h | 1 +
arch/powerpc/kernel/exceptions-64s.S | 29 +++++++++++---
arch/powerpc/kernel/process.c | 31 ++++++++++++++-
arch/powerpc/platforms/pseries/ras.c | 54 ++++++++++++++++++--------
arch/powerpc/platforms/pseries/setup.c | 13 +++++--
5 files changed, 103 insertions(+), 25 deletions(-)
--
2.23.0
* [PATCH 1/7] powerpc/64: mark emergency stacks valid to unwind
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
Before:
WARNING: CPU: 0 PID: 494 at arch/powerpc/kernel/irq.c:343
CPU: 0 PID: 494 Comm: a Tainted: G W
NIP: c00000000001ed2c LR: c000000000d13190 CTR: c00000000003f910
REGS: c0000001fffd3870 TRAP: 0700 Tainted: G W
MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000
CFAR: c00000000001ec90 IRQMASK: 0
GPR00: c000000000aeb12c c0000001fffd3b00 c0000000012ba300 0000000000000000
GPR04: 0000000000000000 0000000000000000 000000010bd207c8 6b00696e74657272
GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000
GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 000000010bd207bc
GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50
NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
LR [c000000000d13190] _raw_spin_unlock_irqrestore+0x50/0xc0
Call Trace:
Instruction dump:
60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000
After:
WARNING: CPU: 0 PID: 499 at arch/powerpc/kernel/irq.c:343
CPU: 0 PID: 499 Comm: a Not tainted
NIP: c00000000001ed2c LR: c000000000d13210 CTR: c00000000003f980
REGS: c0000001fffd3870 TRAP: 0700 Not tainted
MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000
CFAR: c00000000001ec90 IRQMASK: 0
GPR00: c000000000aeb1ac c0000001fffd3b00 c0000000012ba300 0000000000000000
GPR04: 0000000000000000 0000000000000000 00000001347607c8 6b00696e74657272
GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000
GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 00000001347607bc
GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50
NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
LR [c000000000d13210] _raw_spin_unlock_irqrestore+0x50/0xc0
Call Trace:
[c0000001fffd3b20] [c000000000aeb1ac] of_find_property+0x6c/0x90
[c0000001fffd3b70] [c000000000aeb1f0] of_get_property+0x20/0x40
[c0000001fffd3b90] [c000000000042cdc] rtas_token+0x3c/0x70
[c0000001fffd3bb0] [c0000000000dc318] fwnmi_release_errinfo+0x28/0x70
[c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
[c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
[c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
--- interrupt: 200 at 0x1347607c8
LR = 0x7fffafbd8328
Instruction dump:
60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/process.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1dea4d280f6f..d27bf367e929 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1983,6 +1983,32 @@ static inline int valid_irq_stack(unsigned long sp, struct task_struct *p,
return 0;
}
+static inline int valid_emergency_stack(unsigned long sp, struct task_struct *p,
+ unsigned long nbytes)
+{
+#ifdef CONFIG_PPC64
+ unsigned long stack_page;
+ unsigned long cpu = task_cpu(p);
+
+ stack_page = (unsigned long)paca_ptrs[cpu]->emergency_sp - THREAD_SIZE;
+ if (sp >= stack_page && sp <= stack_page + THREAD_SIZE - nbytes)
+ return 1;
+
+# ifdef CONFIG_PPC_BOOK3S_64
+ stack_page = (unsigned long)paca_ptrs[cpu]->nmi_emergency_sp - THREAD_SIZE;
+ if (sp >= stack_page && sp <= stack_page + THREAD_SIZE - nbytes)
+ return 1;
+
+ stack_page = (unsigned long)paca_ptrs[cpu]->mc_emergency_sp - THREAD_SIZE;
+ if (sp >= stack_page && sp <= stack_page + THREAD_SIZE - nbytes)
+ return 1;
+# endif
+#endif
+
+ return 0;
+}
+
+
int validate_sp(unsigned long sp, struct task_struct *p,
unsigned long nbytes)
{
@@ -1994,7 +2020,10 @@ int validate_sp(unsigned long sp, struct task_struct *p,
if (sp >= stack_page && sp <= stack_page + THREAD_SIZE - nbytes)
return 1;
- return valid_irq_stack(sp, p, nbytes);
+ if (valid_irq_stack(sp, p, nbytes))
+ return 1;
+
+ return valid_emergency_stack(sp, p, nbytes);
}
EXPORT_SYMBOL(validate_sp);
--
2.23.0
* [PATCH 2/7] powerpc/pseries/ras: avoid calling rtas_token in NMI paths
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
In the interest of reducing code and possible failures in the
machine check and system reset paths, grab the "ibm,nmi-interlock"
token at init time.
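As a rough userspace illustration of the pattern (not the kernel code itself), the sketch below resolves a token once at init time and lets the NMI path only read the cached value. The names `lookup_token`, `init_tokens`, `nmi_path_token`, and the token value 42 are hypothetical stand-ins; `lookup_token` plays the role of rtas_token().

```c
#include <string.h>

#define UNKNOWN_SERVICE (-1)	/* stand-in for RTAS_UNKNOWN_SERVICE */

static int lookups;		/* counts how often the slow lookup runs */
static int cached_token = UNKNOWN_SERVICE;

/* Hypothetical slow lookup standing in for rtas_token(). */
static int lookup_token(const char *name)
{
	lookups++;
	/* 42 is an arbitrary illustrative token value. */
	return strcmp(name, "ibm,nmi-interlock") == 0 ? 42 : UNKNOWN_SERVICE;
}

/* Init time: resolve the token once, outside any NMI context. */
static int init_tokens(void)
{
	cached_token = lookup_token("ibm,nmi-interlock");
	return cached_token == UNKNOWN_SERVICE ? -1 : 0;
}

/* NMI path: only reads the cached value, no lookup and no locking. */
static int nmi_path_token(void)
{
	return cached_token;
}
```

The point of the split is that anything that can warn, take locks, or walk the device tree runs once in a safe context, and the NMI path is reduced to a plain load.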
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/firmware.h | 1 +
arch/powerpc/platforms/pseries/ras.c | 2 +-
arch/powerpc/platforms/pseries/setup.c | 13 ++++++++++---
3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index ca33f4ef6cb4..6003c2e533a0 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -128,6 +128,7 @@ extern void machine_check_fwnmi(void);
/* This is true if we are using the firmware NMI handler (typically LPAR) */
extern int fwnmi_active;
+extern int ibm_nmi_interlock_token;
extern unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 1d7f973c647b..c74d5e740922 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -458,7 +458,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
*/
static void fwnmi_release_errinfo(void)
{
- int ret = rtas_call(rtas_token("ibm,nmi-interlock"), 0, 1, NULL);
+ int ret = rtas_call(ibm_nmi_interlock_token, 0, 1, NULL);
if (ret != 0)
printk(KERN_ERR "FWNMI: nmi-interlock failed: %d\n", ret);
}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 17d17f064a2d..c31acd7ce0c0 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -83,6 +83,7 @@ unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
EXPORT_SYMBOL(CMO_PageSize);
int fwnmi_active; /* TRUE if an FWNMI handler is present */
+int ibm_nmi_interlock_token;
static void pSeries_show_cpuinfo(struct seq_file *m)
{
@@ -113,9 +114,14 @@ static void __init fwnmi_init(void)
struct slb_entry *slb_ptr;
size_t size;
#endif
+ int ibm_nmi_register_token;
- int ibm_nmi_register = rtas_token("ibm,nmi-register");
- if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
+ ibm_nmi_register_token = rtas_token("ibm,nmi-register");
+ if (ibm_nmi_register_token == RTAS_UNKNOWN_SERVICE)
+ return;
+
+ ibm_nmi_interlock_token = rtas_token("ibm,nmi-interlock");
+ if (WARN_ON(ibm_nmi_interlock_token == RTAS_UNKNOWN_SERVICE))
return;
/* If the kernel's not linked at zero we point the firmware at low
@@ -123,7 +129,8 @@ static void __init fwnmi_init(void)
system_reset_addr = __pa(system_reset_fwnmi) - PHYSICAL_START;
machine_check_addr = __pa(machine_check_fwnmi) - PHYSICAL_START;
- if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
+ if (0 == rtas_call(ibm_nmi_register_token, 2, 1, NULL,
+ system_reset_addr,
machine_check_addr))
fwnmi_active = 1;
--
2.23.0
* [PATCH 3/7] powerpc/64s: Change irq reconcile for NMIs from reusing _DAR to RESULT
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
A spare interrupt stack slot is needed to save irq state when
reconciling NMIs (sreset and decrementer soft-nmi). _DAR is used
for this, but we want to reconcile machine checks as well, which
do use _DAR. Switch to using RESULT instead, since that slot is
used by system calls and is otherwise unused in these NMI paths.
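The save and restore sequence can be modelled in plain C as a minimal sketch. The struct names, function names, and constant values below are assumptions made for illustration; they are not the kernel's definitions. The model also assumes the frame's softe field was already filled in by earlier interrupt entry code.

```c
#include <stdint.h>

/* Illustrative stand-in values, not taken from kernel headers. */
#define MODEL_IRQS_ALL_DISABLED	0x03
#define MODEL_IRQ_HARD_DIS	0x01

struct paca_model {
	uint8_t irq_soft_mask;	/* models PACAIRQSOFTMASK */
	uint8_t irq_happened;	/* models PACAIRQHAPPENED */
};

struct frame_model {
	uint64_t softe;		/* models SOFTE(r1), saved at entry */
	uint64_t result;	/* models RESULT(r1), the spare slot */
};

/* NMI entry: mask everything and stash irq_happened in the spare slot. */
static void nmi_reconcile_enter(struct paca_model *paca,
				struct frame_model *frame)
{
	paca->irq_soft_mask = MODEL_IRQS_ALL_DISABLED;
	frame->result = paca->irq_happened;
	paca->irq_happened |= MODEL_IRQ_HARD_DIS;
}

/* NMI exit: put back exactly what was there before the NMI. */
static void nmi_reconcile_exit(struct paca_model *paca,
			       const struct frame_model *frame)
{
	paca->irq_happened = (uint8_t)frame->result;
	paca->irq_soft_mask = (uint8_t)frame->softe;
}
```

The only requirement on the spare slot is that nothing else writes to it between entry and exit, which is why a slot that the interrupted handler itself uses (such as _DAR for machine checks) is unsuitable.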
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/exceptions-64s.S | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6a936c9199d6..d95c4560c038 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1011,13 +1011,13 @@ EXC_COMMON_BEGIN(system_reset_common)
* the right thing. We do not want to reconcile because that goes
* through irq tracing which we don't want in NMI.
*
- * Save PACAIRQHAPPENED to _DAR (otherwise unused), and set HARD_DIS
+ * Save PACAIRQHAPPENED to RESULT (otherwise unused), and set HARD_DIS
* as we are running with MSR[EE]=0.
*/
li r10,IRQS_ALL_DISABLED
stb r10,PACAIRQSOFTMASK(r13)
lbz r10,PACAIRQHAPPENED(r13)
- std r10,_DAR(r1)
+ std r10,RESULT(r1)
ori r10,r10,PACA_IRQ_HARD_DIS
stb r10,PACAIRQHAPPENED(r13)
@@ -1038,7 +1038,7 @@ EXC_COMMON_BEGIN(system_reset_common)
/*
* Restore soft mask settings.
*/
- ld r10,_DAR(r1)
+ ld r10,RESULT(r1)
stb r10,PACAIRQHAPPENED(r13)
ld r10,SOFTE(r1)
stb r10,PACAIRQSOFTMASK(r13)
@@ -2805,7 +2805,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
li r10,IRQS_ALL_DISABLED
stb r10,PACAIRQSOFTMASK(r13)
lbz r10,PACAIRQHAPPENED(r13)
- std r10,_DAR(r1)
+ std r10,RESULT(r1)
ori r10,r10,PACA_IRQ_HARD_DIS
stb r10,PACAIRQHAPPENED(r13)
@@ -2819,7 +2819,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
/*
* Restore soft mask settings.
*/
- ld r10,_DAR(r1)
+ ld r10,RESULT(r1)
stb r10,PACAIRQHAPPENED(r13)
ld r10,SOFTE(r1)
stb r10,PACAIRQSOFTMASK(r13)
--
2.23.0
* [PATCH 4/7] powerpc/64s: machine check reconcile irq state
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
The pseries fwnmi machine check code trips the soft-irq checks in
rtas_call (after the previous patch to remove rtas_token from this
call path). Rather than playing whack-a-mole with these and forever
having fragile code, it seems better to have the early machine check
handler perform the same kind of reconcile as the other NMI interrupts.
WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343
CPU: 0 PID: 493 Comm: a Tainted: G W
NIP: c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000
REGS: c0000001fffd38b0 TRAP: 0700 Tainted: G W
MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000
CFAR: c00000000001ec90 IRQMASK: 0
GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000
GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef
GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001
GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000
NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
LR [c000000000042c40] unlock_rtas+0x30/0x90
Call Trace:
[c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable)
[c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280
[c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70
[c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
[c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
[c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
--- interrupt: 200 at 0x13f1307c8
LR = 0x7fff888b8528
Instruction dump:
60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/exceptions-64s.S | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index d95c4560c038..31bdbb94e477 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1186,11 +1186,30 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
li r10,MSR_RI
mtmsrd r10,1
+ /*
+ * Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
+ * system_reset_common)
+ */
+ li r10,IRQS_ALL_DISABLED
+ stb r10,PACAIRQSOFTMASK(r13)
+ lbz r10,PACAIRQHAPPENED(r13)
+ std r10,RESULT(r1)
+ ori r10,r10,PACA_IRQ_HARD_DIS
+ stb r10,PACAIRQHAPPENED(r13)
+
addi r3,r1,STACK_FRAME_OVERHEAD
bl machine_check_early
std r3,RESULT(r1) /* Save result */
ld r12,_MSR(r1)
+ /*
+ * Restore soft mask settings.
+ */
+ ld r10,RESULT(r1)
+ stb r10,PACAIRQHAPPENED(r13)
+ ld r10,SOFTE(r1)
+ stb r10,PACAIRQSOFTMASK(r13)
+
#ifdef CONFIG_PPC_P7_NAP
/*
* Check if thread was in power saving mode. We come here when any
--
2.23.0
* [PATCH 5/7] powerpc/pseries/ras: FWNMI_VALID off by one
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
This was discovered while developing QEMU fwnmi sreset support. This
off-by-one bug means the last 16 bytes of the rtas area cannot
be used for a 16 byte save area.
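The boundary case can be checked with a small standalone sketch. `struct region`, `valid_old`, and `valid_new` are hypothetical names standing in for the rtas.base/rtas.size pair and the two versions of the range check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rtas.base / rtas.size pair. */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Old check: the exclusive bound rejects the last valid 16-byte slot. */
static bool valid_old(const struct region *r, uint64_t a)
{
	return a >= r->base && a < r->base + r->size - 16;
}

/* New check: the inclusive bound accepts a buffer that ends exactly
 * at the end of the region. */
static bool valid_new(const struct region *r, uint64_t a)
{
	return a >= r->base && a <= r->base + r->size - 16;
}
```

With a 4kB region starting at 0x7000, a save area at 0x7ff0 occupies bytes 0x7ff0 through 0x7fff and fits entirely inside the region, yet only the inclusive check accepts it.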
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/platforms/pseries/ras.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index c74d5e740922..9a37bda47468 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -395,10 +395,11 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
/*
* Some versions of FWNMI place the buffer inside the 4kB page starting at
* 0x7000. Other versions place it inside the rtas buffer. We check both.
+ * Minimum size of the buffer is 16 bytes.
*/
#define VALID_FWNMI_BUFFER(A) \
- ((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
- (((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
+ ((((A) >= 0x7000) && ((A) <= 0x8000 - 16)) || \
+ (((A) >= rtas.base) && ((A) <= (rtas.base + rtas.size - 16))))
static inline struct rtas_error_log *fwnmi_get_errlog(void)
{
--
2.23.0
* [PATCH 6/7] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
If there is some error with the fwnmi save area, r3 has already been
modified, which doesn't help with debugging.
Only update r3 when restoring the saved value.
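A minimal sketch of the idea, using a hypothetical helper name `fwnmi_savep_ra`: the masked address is computed into a separate value, so the caller's copy of r3 is still intact if validation fails and it needs to be printed.

```c
#include <stdint.h>

/* Return the address carried in r3 with the top two bits cleared.
 * The argument is passed by value, so the caller's r3 is unchanged. */
static uint64_t fwnmi_savep_ra(uint64_t r3)
{
	return r3 & ~(UINT64_C(0x3) << 62);
}
```

An error message can then report the raw register value rather than an already masked one.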
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/platforms/pseries/ras.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 9a37bda47468..a40598e6e525 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -423,18 +423,19 @@ static inline struct rtas_error_log *fwnmi_get_errlog(void)
*/
static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
{
+ unsigned long savep_ra;
unsigned long *savep;
struct rtas_error_log *h;
/* Mask top two bits */
- regs->gpr[3] &= ~(0x3UL << 62);
+ savep_ra = regs->gpr[3] & ~(0x3UL << 62);
- if (!VALID_FWNMI_BUFFER(regs->gpr[3])) {
+ if (!VALID_FWNMI_BUFFER(savep_ra)) {
printk(KERN_ERR "FWNMI: corrupt r3 0x%016lx\n", regs->gpr[3]);
return NULL;
}
- savep = __va(regs->gpr[3]);
+ savep = __va(savep_ra);
regs->gpr[3] = be64_to_cpu(savep[0]); /* restore original r3 */
h = (struct rtas_error_log *)&savep[1];
--
2.23.0
* [PATCH 7/7] powerpc/pseries/ras: fwnmi sreset should not interlock
From: Nicholas Piggin @ 2020-03-17 9:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Ganesh Goudar, Nicholas Piggin
PAPR does not specify that fwnmi sreset should be interlocked, and
PowerVM (and therefore now QEMU) does not require it.
These "ibm,nmi-interlock" calls are ignored by firmware, but there
is a possibility that the sreset could have interrupted a machine
check and released the machine check's interlock too early, corrupting
it if another machine check came in.
This is an extremely rare case, but it should be fixed for clarity
and to reduce the code executed in the sreset path. Firmware also
does not provide error information for the sreset case to look at, so
remove that comment.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/platforms/pseries/ras.c | 48 ++++++++++++++++++++--------
1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index a40598e6e525..833ae34b7fec 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -406,6 +406,20 @@ static inline struct rtas_error_log *fwnmi_get_errlog(void)
return (struct rtas_error_log *)local_paca->mce_data_buf;
}
+static unsigned long *fwnmi_get_savep(struct pt_regs *regs)
+{
+ unsigned long savep_ra;
+
+ /* Mask top two bits */
+ savep_ra = regs->gpr[3] & ~(0x3UL << 62);
+ if (!VALID_FWNMI_BUFFER(savep_ra)) {
+ printk(KERN_ERR "FWNMI: corrupt r3 0x%016lx\n", regs->gpr[3]);
+ return NULL;
+ }
+
+ return __va(savep_ra);
+}
+
/*
* Get the error information for errors coming through the
* FWNMI vectors. The pt_regs' r3 will be updated to reflect
@@ -423,20 +437,15 @@ static inline struct rtas_error_log *fwnmi_get_errlog(void)
*/
static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
{
- unsigned long savep_ra;
unsigned long *savep;
struct rtas_error_log *h;
- /* Mask top two bits */
- savep_ra = regs->gpr[3] & ~(0x3UL << 62);
-
- if (!VALID_FWNMI_BUFFER(savep_ra)) {
- printk(KERN_ERR "FWNMI: corrupt r3 0x%016lx\n", regs->gpr[3]);
+ savep = fwnmi_get_savep(regs);
+ if (!savep)
return NULL;
- }
- savep = __va(savep_ra);
- regs->gpr[3] = be64_to_cpu(savep[0]); /* restore original r3 */
+ /* restore original r3 */
+ regs->gpr[3] = be64_to_cpu(savep[0]);
h = (struct rtas_error_log *)&savep[1];
/* Use the per cpu buffer from paca to store rtas error log */
@@ -483,11 +492,22 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
#endif
if (fwnmi_active) {
- struct rtas_error_log *errhdr = fwnmi_get_errinfo(regs);
- if (errhdr) {
- /* XXX Should look at FWNMI information */
- }
- fwnmi_release_errinfo();
+ unsigned long *savep;
+
+ /*
+ * Firmware (PowerVM and KVM) saves r3 to a save area like
+ * machine check, which is not exactly what PAPR (2.9)
+ * suggests but there is no way to detect otherwise, so this
+ * is the interface now.
+ *
+ * System resets do not save any error log or require an
+ * "ibm,nmi-interlock" rtas call to release.
+ */
+
+ savep = fwnmi_get_savep(regs);
+ /* restore original r3 */
+ if (savep)
+ regs->gpr[3] = be64_to_cpu(savep[0]);
}
if (smp_handle_nmi_ipi(regs))
--
2.23.0