* [PATCH v7 3/9] powerpc/pseries: Fix endianness while restoring r3 in MCE handler.
From: Mahesh J Salgaonkar @ 2018-08-07 14:16 UTC
To: linuxppc-dev
Cc: stable, Nicholas Piggin, Aneesh Kumar K.V, Michal Suchanek,
Ananth Narayan, Nicholas Piggin, Laurent Dufour, Michael Ellerman
In-Reply-To: <153365127532.14256.1965469477086140841.stgit@jupiter.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
During a Machine Check interrupt on the pseries platform, register r3 points
to the RTAS extended event log passed by the hypervisor. Since the hypervisor
uses r3 to pass the pointer to the RTAS log, it stores the original r3 value
at the start of the memory (first 8 bytes) pointed to by r3. The hypervisor
stores this value, like the rest of the RTAS log, in big-endian format, so
Linux must convert it to CPU endianness when restoring r3.
Without this patch, when the MCE handler returns after recovery, the code
that caused the MCE may end up taking a Data SLB Access interrupt for an
invalid address, followed by a kernel panic or hang.
[ 62.878965] Severe Machine check interrupt [Recovered]
[ 62.878968] NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[ 62.878969] Initiator: CPU
[ 62.878970] Error type: SLB [Multihit]
[ 62.878971] Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
pc: c0000000009694c0: vsnprintf+0x80/0x480
lr: c0000000009698e0: vscnprintf+0x20/0x60
sp: c0000000fc777830
msr: 8000000002009033
dar: a803a30c000000d0
current = 0xc00000000bc9ef00
paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01
pid = 8860, comm = insmod
[c0000000fc7778b0] c0000000009698e0 vscnprintf+0x20/0x60
[c0000000fc7778e0] c00000000016b6c4 vprintk_emit+0xb4/0x4b0
[c0000000fc777960] c00000000016d40c vprintk_func+0x5c/0xd0
[c0000000fc777980] c00000000016cbb4 printk+0x38/0x4c
[c0000000fc7779a0] d00000000ca301c0 init_module+0x1c0/0x338 [bork_kernel]
[c0000000fc777a40] c00000000000d9c4 do_one_initcall+0x54/0x230
[c0000000fc777b00] c0000000001b3b74 do_init_module+0x8c/0x248
[c0000000fc777b90] c0000000001b2478 load_module+0x12b8/0x15b0
[c0000000fc777d30] c0000000001b29e8 sys_finit_module+0xa8/0x110
[c0000000fc777e30] c00000000000b204 system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace
This patch fixes this issue.
Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 14a46b07ab2f..851ce326874a 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -367,7 +367,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
}
savep = __va(regs->gpr[3]);
- regs->gpr[3] = savep[0]; /* restore original r3 */
+ regs->gpr[3] = be64_to_cpu(savep[0]); /* restore original r3 */
h = (struct rtas_error_log *)&savep[1];
/* Use the per cpu buffer from paca to store rtas error log */
* [PATCH v7 4/9] powerpc/pseries: Define MCE error event section.
From: Mahesh J Salgaonkar @ 2018-08-07 14:16 UTC
To: linuxppc-dev
Cc: Aneesh Kumar K.V, Michal Suchanek, Ananth Narayan,
Nicholas Piggin, Laurent Dufour, Michael Ellerman
In-Reply-To: <153365127532.14256.1965469477086140841.stgit@jupiter.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
On pseries, the machine check error details are part of the RTAS extended
event log, passed under the Machine Check exception section. This patch adds
the definition of the RTAS MCE event section and related helper
functions.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 111 +++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 71e393c46a49..adc677c5e3a4 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -185,6 +185,13 @@ static inline uint8_t rtas_error_disposition(const struct rtas_error_log *elog)
return (elog->byte1 & 0x18) >> 3;
}
+static inline
+void rtas_set_disposition_recovered(struct rtas_error_log *elog)
+{
+ elog->byte1 &= ~0x18;
+ elog->byte1 |= (RTAS_DISP_FULLY_RECOVERED << 3);
+}
+
static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
{
return (elog->byte1 & 0x04) >> 2;
@@ -275,6 +282,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
#define PSERIES_ELOG_SECT_ID_CALL_HOME (('C' << 8) | 'H')
#define PSERIES_ELOG_SECT_ID_USER_DEF (('U' << 8) | 'D')
#define PSERIES_ELOG_SECT_ID_HOTPLUG (('H' << 8) | 'P')
+#define PSERIES_ELOG_SECT_ID_MCE (('M' << 8) | 'C')
/* Vendor specific Platform Event Log Format, Version 6, section header */
struct pseries_errorlog {
@@ -326,6 +334,109 @@ struct pseries_hp_errorlog {
#define PSERIES_HP_ELOG_ID_DRC_COUNT 3
#define PSERIES_HP_ELOG_ID_DRC_IC 4
+/* RTAS pseries MCE errorlog section */
+#pragma pack(push, 1)
+struct pseries_mc_errorlog {
+ __be32 fru_id;
+ __be32 proc_id;
+ uint8_t error_type;
+ union {
+ struct {
+ uint8_t ue_err_type;
+ /* XXXXXXXX
+ * X 1: Permanent or Transient UE.
+ * X 1: Effective address provided.
+ * X 1: Logical address provided.
+ * XX 2: Reserved.
+ * XXX 3: Type of UE error.
+ */
+ uint8_t reserved_1[6];
+ __be64 effective_address;
+ __be64 logical_address;
+ } ue_error;
+ struct {
+ uint8_t soft_err_type;
+ /* XXXXXXXX
+ * X 1: Effective address provided.
+ * XXXXX 5: Reserved.
+ * XX 2: Type of SLB/ERAT/TLB error.
+ */
+ uint8_t reserved_1[6];
+ __be64 effective_address;
+ uint8_t reserved_2[8];
+ } soft_error;
+ } u;
+};
+#pragma pack(pop)
+
+/* RTAS pseries MCE error types */
+#define PSERIES_MC_ERROR_TYPE_UE 0x00
+#define PSERIES_MC_ERROR_TYPE_SLB 0x01
+#define PSERIES_MC_ERROR_TYPE_ERAT 0x02
+#define PSERIES_MC_ERROR_TYPE_TLB 0x04
+#define PSERIES_MC_ERROR_TYPE_D_CACHE 0x05
+#define PSERIES_MC_ERROR_TYPE_I_CACHE 0x07
+
+/* RTAS pseries MCE error sub types */
+#define PSERIES_MC_ERROR_UE_INDETERMINATE 0
+#define PSERIES_MC_ERROR_UE_IFETCH 1
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_IFETCH 2
+#define PSERIES_MC_ERROR_UE_LOAD_STORE 3
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_LOAD_STORE 4
+
+#define PSERIES_MC_ERROR_SLB_PARITY 0
+#define PSERIES_MC_ERROR_SLB_MULTIHIT 1
+#define PSERIES_MC_ERROR_SLB_INDETERMINATE 2
+
+#define PSERIES_MC_ERROR_ERAT_PARITY 1
+#define PSERIES_MC_ERROR_ERAT_MULTIHIT 2
+#define PSERIES_MC_ERROR_ERAT_INDETERMINATE 3
+
+#define PSERIES_MC_ERROR_TLB_PARITY 1
+#define PSERIES_MC_ERROR_TLB_MULTIHIT 2
+#define PSERIES_MC_ERROR_TLB_INDETERMINATE 3
+
+static inline uint8_t rtas_mc_error_type(const struct pseries_mc_errorlog *mlog)
+{
+ return mlog->error_type;
+}
+
+static inline uint8_t rtas_mc_error_sub_type(
+ const struct pseries_mc_errorlog *mlog)
+{
+ switch (mlog->error_type) {
+ case PSERIES_MC_ERROR_TYPE_UE:
+ return (mlog->u.ue_error.ue_err_type & 0x07);
+ case PSERIES_MC_ERROR_TYPE_SLB:
+ case PSERIES_MC_ERROR_TYPE_ERAT:
+ case PSERIES_MC_ERROR_TYPE_TLB:
+ return (mlog->u.soft_error.soft_err_type & 0x03);
+ default:
+ return 0;
+ }
+}
+
+static inline uint64_t rtas_mc_get_effective_addr(
+ const struct pseries_mc_errorlog *mlog)
+{
+ __be64 addr = 0;
+
+ switch (mlog->error_type) {
+ case PSERIES_MC_ERROR_TYPE_UE:
+ if (mlog->u.ue_error.ue_err_type & 0x40)
+ addr = mlog->u.ue_error.effective_address;
+ break;
+ case PSERIES_MC_ERROR_TYPE_SLB:
+ case PSERIES_MC_ERROR_TYPE_ERAT:
+ case PSERIES_MC_ERROR_TYPE_TLB:
+ if (mlog->u.soft_error.soft_err_type & 0x80)
+ addr = mlog->u.soft_error.effective_address;
+ default:
+ break;
+ }
+ return be64_to_cpu(addr);
+}
+
struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
uint16_t section_id);
* [PATCH v7 5/9] powerpc/pseries: flush SLB contents on SLB MCE errors.
From: Mahesh J Salgaonkar @ 2018-08-07 14:17 UTC
To: linuxppc-dev
Cc: Michal Suchanek, Aneesh Kumar K.V, Michal Suchanek,
Ananth Narayan, Nicholas Piggin, Laurent Dufour, Michael Ellerman
In-Reply-To: <153365127532.14256.1965469477086140841.stgit@jupiter.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
On pseries, the system currently crashes if we get a machine check
exception due to SLB errors. These are soft errors and can be fixed by
flushing the SLBs, so the kernel can continue to function instead of
crashing the system. We do this in real mode before turning on the MMU;
otherwise we would run into nested machine checks. This patch fetches the
RTAS error log in real mode and flushes the SLBs on SLB errors.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.com>
---
Changes in V7:
- Fold Michal's patch into this patch.
- Handle MSR_RI=0 and evil context case in MC handler.
---
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1
arch/powerpc/include/asm/machdep.h | 1
arch/powerpc/kernel/exceptions-64s.S | 112 +++++++++++++++++++++++++
arch/powerpc/kernel/mce.c | 15 +++
arch/powerpc/mm/slb.c | 6 +
arch/powerpc/platforms/powernv/setup.c | 11 ++
arch/powerpc/platforms/pseries/pseries.h | 1
arch/powerpc/platforms/pseries/ras.c | 51 +++++++++++
arch/powerpc/platforms/pseries/setup.c | 1
9 files changed, 195 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..cc00a7088cf3 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -487,6 +487,7 @@ extern void hpte_init_native(void);
extern void slb_initialize(void);
extern void slb_flush_and_rebolt(void);
+extern void slb_flush_and_rebolt_realmode(void);
extern void slb_vmalloc_update(void);
extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index a47de82fb8e2..b4831f1338db 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -108,6 +108,7 @@ struct machdep_calls {
/* Early exception handlers called in realmode */
int (*hmi_exception_early)(struct pt_regs *regs);
+ long (*machine_check_early)(struct pt_regs *regs);
/* Called during machine check exception to retrive fixup address. */
bool (*mce_check_early_recovery)(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 285c6465324a..cb06f219570a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -332,6 +332,9 @@ TRAMP_REAL_BEGIN(machine_check_pSeries)
machine_check_fwnmi:
SET_SCRATCH0(r13) /* save r13 */
EXCEPTION_PROLOG_0(PACA_EXMC)
+BEGIN_FTR_SECTION
+ b machine_check_pSeries_early
+END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
machine_check_pSeries_0:
EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
/*
@@ -343,6 +346,90 @@ machine_check_pSeries_0:
TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
+TRAMP_REAL_BEGIN(machine_check_pSeries_early)
+BEGIN_FTR_SECTION
+ EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
+ mr r10,r1 /* Save r1 */
+ ld r1,PACAMCEMERGSP(r13) /* Use MC emergency stack */
+ subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */
+ mfspr r11,SPRN_SRR0 /* Save SRR0 */
+ mfspr r12,SPRN_SRR1 /* Save SRR1 */
+ EXCEPTION_PROLOG_COMMON_1()
+ EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
+ EXCEPTION_PROLOG_COMMON_3(0x200)
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
+ ld r12,_MSR(r1)
+ andi. r11,r12,MSR_PR /* See if coming from user. */
+ bne 2f /* continue in V mode if we are. */
+
+ /*
+ * At this point we are not sure about what context we come from.
+ * We may be in the middle of switching stacks, so r1 may not be valid.
+ * Hence stay on emergency stack, call machine_check_exception and
+ * return from the interrupt.
+ * But before that, check if this is an un-recoverable exception.
+ * If yes, then stay on emergency stack and panic.
+ */
+ andi. r11,r12,MSR_RI
+ bne 1f
+
+ /*
+ * Check if we have successfully handled/recovered from error, if not
+ * then stay on emergency stack and panic.
+ */
+ cmpdi r3,0 /* see if we handled MCE successfully */
+ bne 1f /* if handled then return from interrupt */
+
+ LOAD_HANDLER(r10,unrecover_mce)
+ mtspr SPRN_SRR0,r10
+ ld r10,PACAKMSR(r13)
+ /*
+ * We are going down. But there are chances that we might get hit by
+ * another MCE during panic path and we may run into unstable state
+ * with no way out. Hence, turn ME bit off while going down, so that
+ * when another MCE is hit during panic path, hypervisor will
+ * power cycle the lpar, instead of getting into MCE loop.
+ */
+ li r3,MSR_ME
+ andc r10,r10,r3 /* Turn off MSR_ME */
+ mtspr SPRN_SRR1,r10
+ RFI_TO_KERNEL
+ b .
+
+ /* Stay on emergency stack and return from interrupt. */
+1: LOAD_HANDLER(r10,mce_return)
+ mtspr SPRN_SRR0,r10
+ ld r10,PACAKMSR(r13)
+ mtspr SPRN_SRR1,r10
+ RFI_TO_KERNEL
+ b .
+
+ /* Move original SRR0 and SRR1 into the respective regs */
+2: ld r9,_MSR(r1)
+ mtspr SPRN_SRR1,r9
+ ld r3,_NIP(r1)
+ mtspr SPRN_SRR0,r3
+ ld r9,_CTR(r1)
+ mtctr r9
+ ld r9,_XER(r1)
+ mtxer r9
+ ld r9,_LINK(r1)
+ mtlr r9
+ REST_GPR(0, r1)
+ REST_8GPRS(2, r1)
+ REST_GPR(10, r1)
+ ld r11,_CCR(r1)
+ mtcr r11
+ REST_GPR(11, r1)
+ REST_2GPRS(12, r1)
+ /* restore original r1. */
+ ld r1,GPR1(r1)
+ SET_SCRATCH0(r13) /* save r13 */
+ EXCEPTION_PROLOG_0(PACA_EXMC)
+ b machine_check_pSeries_0
+END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
+
EXC_COMMON_BEGIN(machine_check_common)
/*
* Machine check is different because we use a different
@@ -536,6 +623,31 @@ EXC_COMMON_BEGIN(unrecover_mce)
bl unrecoverable_exception
b 1b
+EXC_COMMON_BEGIN(mce_return)
+ /* Invoke machine_check_exception to print MCE event and return. */
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ bl machine_check_exception
+ ld r9,_MSR(r1)
+ mtspr SPRN_SRR1,r9
+ ld r3,_NIP(r1)
+ mtspr SPRN_SRR0,r3
+ ld r9,_CTR(r1)
+ mtctr r9
+ ld r9,_XER(r1)
+ mtxer r9
+ ld r9,_LINK(r1)
+ mtlr r9
+ REST_GPR(0, r1)
+ REST_8GPRS(2, r1)
+ REST_GPR(10, r1)
+ ld r11,_CCR(r1)
+ mtcr r11
+ REST_GPR(11, r1)
+ REST_2GPRS(12, r1)
+ /* restore original r1. */
+ ld r1,GPR1(r1)
+ RFI_TO_KERNEL
+ b .
EXC_REAL(data_access, 0x300, 0x80)
EXC_VIRT(data_access, 0x4300, 0x80, 0x300)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index efdd16a79075..ae17d8aa60c4 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -488,10 +488,19 @@ long machine_check_early(struct pt_regs *regs)
{
long handled = 0;
- __this_cpu_inc(irq_stat.mce_exceptions);
+ /*
+ * For pSeries we count mce when we go into virtual mode machine
+ * check handler. Hence skip it. Also, we can't access per cpu
+ * variables in real mode for LPAR.
+ */
+ if (early_cpu_has_feature(CPU_FTR_HVMODE))
+ __this_cpu_inc(irq_stat.mce_exceptions);
- if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
- handled = cur_cpu_spec->machine_check_early(regs);
+ /*
+ * See if platform is capable of handling machine check.
+ */
+ if (ppc_md.machine_check_early)
+ handled = ppc_md.machine_check_early(regs);
return handled;
}
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index cb796724a6fc..e89f675f1b5e 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -145,6 +145,12 @@ void slb_flush_and_rebolt(void)
get_paca()->slb_cache_ptr = 0;
}
+void slb_flush_and_rebolt_realmode(void)
+{
+ __slb_flush_and_rebolt();
+ get_paca()->slb_cache_ptr = 0;
+}
+
void slb_vmalloc_update(void)
{
unsigned long vflags;
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index f96df0a25d05..b74c93bc2e55 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -431,6 +431,16 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu)
return ret_freq;
}
+static long pnv_machine_check_early(struct pt_regs *regs)
+{
+ long handled = 0;
+
+ if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
+ handled = cur_cpu_spec->machine_check_early(regs);
+
+ return handled;
+}
+
define_machine(powernv) {
.name = "PowerNV",
.probe = pnv_probe,
@@ -442,6 +452,7 @@ define_machine(powernv) {
.machine_shutdown = pnv_shutdown,
.power_save = NULL,
.calibrate_decr = generic_calibrate_decr,
+ .machine_check_early = pnv_machine_check_early,
#ifdef CONFIG_KEXEC_CORE
.kexec_cpu_down = pnv_kexec_cpu_down,
#endif
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 60db2ee511fb..ec2a5f61d4a4 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -24,6 +24,7 @@ struct pt_regs;
extern int pSeries_system_reset_exception(struct pt_regs *regs);
extern int pSeries_machine_check_exception(struct pt_regs *regs);
+extern long pSeries_machine_check_realmode(struct pt_regs *regs);
#ifdef CONFIG_SMP
extern void smp_init_pseries(void);
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 851ce326874a..e4420f7c8fda 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -427,6 +427,35 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
return 0; /* need to perform reset */
}
+static int mce_handle_error(struct rtas_error_log *errp)
+{
+ struct pseries_errorlog *pseries_log;
+ struct pseries_mc_errorlog *mce_log;
+ int disposition = rtas_error_disposition(errp);
+ uint8_t error_type;
+
+ if (!rtas_error_extended(errp))
+ goto out;
+
+ pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+ if (pseries_log == NULL)
+ goto out;
+
+ mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+ error_type = rtas_mc_error_type(mce_log);
+
+ if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
+ (error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
+ /* Store the old slb content someplace. */
+ slb_flush_and_rebolt_realmode();
+ disposition = RTAS_DISP_FULLY_RECOVERED;
+ rtas_set_disposition_recovered(errp);
+ }
+
+out:
+ return disposition;
+}
+
/*
* Process MCE rtas errlog event.
*/
@@ -503,11 +532,31 @@ int pSeries_machine_check_exception(struct pt_regs *regs)
struct rtas_error_log *errp;
if (fwnmi_active) {
- errp = fwnmi_get_errinfo(regs);
fwnmi_release_errinfo();
+ errp = fwnmi_get_errlog();
if (errp && recover_mce(regs, errp))
return 1;
}
return 0;
}
+
+long pSeries_machine_check_realmode(struct pt_regs *regs)
+{
+ struct rtas_error_log *errp;
+ int disposition;
+
+ if (fwnmi_active) {
+ errp = fwnmi_get_errinfo(regs);
+ /*
+ * Call to fwnmi_release_errinfo() in real mode causes kernel
+ * to panic. Hence we will call it as soon as we go into
+ * virtual mode.
+ */
+ disposition = mce_handle_error(errp);
+ if (disposition == RTAS_DISP_FULLY_RECOVERED)
+ return 1;
+ }
+
+ return 0;
+}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index b42087cd8c6b..7a9421d089d8 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -1000,6 +1000,7 @@ define_machine(pseries) {
.calibrate_decr = generic_calibrate_decr,
.progress = rtas_progress,
.system_reset_exception = pSeries_system_reset_exception,
+ .machine_check_early = pSeries_machine_check_realmode,
.machine_check_exception = pSeries_machine_check_exception,
#ifdef CONFIG_KEXEC_CORE
.machine_kexec = pSeries_machine_kexec,
* [PATCH v7 6/9] powerpc/pseries: Display machine check error details.
From: Mahesh J Salgaonkar @ 2018-08-07 14:17 UTC
To: linuxppc-dev
Cc: Aneesh Kumar K.V, Michal Suchanek, Ananth Narayan,
Nicholas Piggin, Laurent Dufour, Michael Ellerman
In-Reply-To: <153365127532.14256.1965469477086140841.stgit@jupiter.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Extract the MCE error details from the RTAS extended log and display them on
the console.
With this patch you should now see MCE logs like the following:
[ 142.371818] Severe Machine check interrupt [Recovered]
[ 142.371822] NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[ 142.371822] Initiator: CPU
[ 142.371823] Error type: SLB [Multihit]
[ 142.371824] Effective address: d00000000ca70000
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 5 +
arch/powerpc/platforms/pseries/ras.c | 132 ++++++++++++++++++++++++++++++++++
2 files changed, 137 insertions(+)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index adc677c5e3a4..9b3c6e06dad1 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -197,6 +197,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
return (elog->byte1 & 0x04) >> 2;
}
+static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
+{
+ return (elog->byte2 & 0xf0) >> 4;
+}
+
#define rtas_error_type(x) ((x)->byte3)
static inline
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index e4420f7c8fda..656b35a42d93 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -427,6 +427,135 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
return 0; /* need to perform reset */
}
+#define VAL_TO_STRING(ar, val) ((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
+
+static void pseries_print_mce_info(struct pt_regs *regs,
+ struct rtas_error_log *errp)
+{
+ const char *level, *sevstr;
+ struct pseries_errorlog *pseries_log;
+ struct pseries_mc_errorlog *mce_log;
+ uint8_t error_type, err_sub_type;
+ uint64_t addr;
+ uint8_t initiator = rtas_error_initiator(errp);
+ int disposition = rtas_error_disposition(errp);
+
+ static const char * const initiators[] = {
+ "Unknown",
+ "CPU",
+ "PCI",
+ "ISA",
+ "Memory",
+ "Power Mgmt",
+ };
+ static const char * const mc_err_types[] = {
+ "UE",
+ "SLB",
+ "ERAT",
+ "Unknown", /* 0x03: not defined */
+ "TLB",
+ "D-Cache",
+ "Unknown", "I-Cache", /* 0x06: not defined, 0x07 */
+ };
+ static const char * const mc_ue_types[] = {
+ "Indeterminate",
+ "Instruction fetch",
+ "Page table walk ifetch",
+ "Load/Store",
+ "Page table walk Load/Store",
+ };
+
+ /* Valid SLB sub-error values are 0x0, 0x1, 0x2 */
+ static const char * const mc_slb_types[] = {
+ "Parity",
+ "Multihit",
+ "Indeterminate",
+ };
+
+ /* Valid TLB and ERAT sub-error values are 0x1, 0x2, 0x3 */
+ static const char * const mc_soft_types[] = {
+ "Unknown",
+ "Parity",
+ "Multihit",
+ "Indeterminate",
+ };
+
+ if (!rtas_error_extended(errp)) {
+ pr_err("Machine check interrupt: Missing extended error log\n");
+ return;
+ }
+
+ pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+ if (pseries_log == NULL)
+ return;
+
+ mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+
+ error_type = rtas_mc_error_type(mce_log);
+ err_sub_type = rtas_mc_error_sub_type(mce_log);
+
+ switch (rtas_error_severity(errp)) {
+ case RTAS_SEVERITY_NO_ERROR:
+ level = KERN_INFO;
+ sevstr = "Harmless";
+ break;
+ case RTAS_SEVERITY_WARNING:
+ level = KERN_WARNING;
+ sevstr = "";
+ break;
+ case RTAS_SEVERITY_ERROR:
+ case RTAS_SEVERITY_ERROR_SYNC:
+ level = KERN_ERR;
+ sevstr = "Severe";
+ break;
+ case RTAS_SEVERITY_FATAL:
+ default:
+ level = KERN_ERR;
+ sevstr = "Fatal";
+ break;
+ }
+
+ printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
+ disposition == RTAS_DISP_FULLY_RECOVERED ?
+ "Recovered" : "Not recovered");
+ if (user_mode(regs)) {
+ printk("%s NIP: [%016lx] PID: %d Comm: %s\n", level,
+ regs->nip, current->pid, current->comm);
+ } else {
+ printk("%s NIP [%016lx]: %pS\n", level, regs->nip,
+ (void *)regs->nip);
+ }
+ printk("%s Initiator: %s\n", level,
+ VAL_TO_STRING(initiators, initiator));
+
+ switch (error_type) {
+ case PSERIES_MC_ERROR_TYPE_UE:
+ printk("%s Error type: %s [%s]\n", level,
+ VAL_TO_STRING(mc_err_types, error_type),
+ VAL_TO_STRING(mc_ue_types, err_sub_type));
+ break;
+ case PSERIES_MC_ERROR_TYPE_SLB:
+ printk("%s Error type: %s [%s]\n", level,
+ VAL_TO_STRING(mc_err_types, error_type),
+ VAL_TO_STRING(mc_slb_types, err_sub_type));
+ break;
+ case PSERIES_MC_ERROR_TYPE_ERAT:
+ case PSERIES_MC_ERROR_TYPE_TLB:
+ printk("%s Error type: %s [%s]\n", level,
+ VAL_TO_STRING(mc_err_types, error_type),
+ VAL_TO_STRING(mc_soft_types, err_sub_type));
+ break;
+ default:
+ printk("%s Error type: %s\n", level,
+ VAL_TO_STRING(mc_err_types, error_type));
+ break;
+ }
+
+ addr = rtas_mc_get_effective_addr(mce_log);
+ if (addr)
+ printk("%s Effective address: %016llx\n", level, addr);
+}
+
static int mce_handle_error(struct rtas_error_log *errp)
{
struct pseries_errorlog *pseries_log;
@@ -481,8 +610,11 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
int recovered = 0;
int disposition = rtas_error_disposition(err);
+ pseries_print_mce_info(regs, err);
+
if (!(regs->msr & MSR_RI)) {
/* If MSR_RI isn't set, we cannot recover */
+ pr_err("Machine check interrupt unrecoverable: MSR(RI=0)\n");
recovered = 0;
} else if (disposition == RTAS_DISP_FULLY_RECOVERED) {
* [PATCH v7 7/9] powerpc/pseries: Dump the SLB contents on SLB MCE errors.
From: Mahesh J Salgaonkar @ 2018-08-07 14:17 UTC
To: linuxppc-dev
Cc: Aneesh Kumar K.V, Michael Ellerman, Aneesh Kumar K.V,
Michal Suchanek, Ananth Narayan, Nicholas Piggin, Laurent Dufour,
Michael Ellerman
In-Reply-To: <153365127532.14256.1965469477086140841.stgit@jupiter.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
If we get a machine check exception due to SLB errors, dump the current
SLB contents, which helps in debugging the root cause of SLB errors.
Introduce an exclusive per-CPU buffer to hold faulty SLB entries. In real
mode, the MCE handler saves the old SLB contents into this buffer,
accessible through the paca, and prints them out later in virtual
mode.
With this patch the console will log SLB contents like below on SLB MCE
errors:
[ 507.297236] SLB contents of cpu 0x1
[ 507.297237] Last SLB entry inserted at slot 16
[ 507.297238] 00 c000000008000000 400ea1b217000500
[ 507.297239] 1T ESID= c00000 VSID= ea1b217 LLP:100
[ 507.297240] 01 d000000008000000 400d43642f000510
[ 507.297242] 1T ESID= d00000 VSID= d43642f LLP:110
[ 507.297243] 11 f000000008000000 400a86c85f000500
[ 507.297244] 1T ESID= f00000 VSID= a86c85f LLP:100
[ 507.297245] 12 00007f0008000000 4008119624000d90
[ 507.297246] 1T ESID= 7f VSID= 8119624 LLP:110
[ 507.297247] 13 0000000018000000 00092885f5150d90
[ 507.297247] 256M ESID= 1 VSID= 92885f5150 LLP:110
[ 507.297248] 14 0000010008000000 4009e7cb50000d90
[ 507.297249] 1T ESID= 1 VSID= 9e7cb50 LLP:110
[ 507.297250] 15 d000000008000000 400d43642f000510
[ 507.297251] 1T ESID= d00000 VSID= d43642f LLP:110
[ 507.297252] 16 d000000008000000 400d43642f000510
[ 507.297253] 1T ESID= d00000 VSID= d43642f LLP:110
[ 507.297253] ----------------------------------
[ 507.297254] SLB cache ptr value = 3
[ 507.297254] Valid SLB cache entries:
[ 507.297255] 00 EA[0-35]= 7f000
[ 507.297256] 01 EA[0-35]= 1
[ 507.297257] 02 EA[0-35]= 1000
[ 507.297257] Rest of SLB cache entries:
[ 507.297258] 03 EA[0-35]= 7f000
[ 507.297258] 04 EA[0-35]= 1
[ 507.297259] 05 EA[0-35]= 1000
[ 507.297260] 06 EA[0-35]= 12
[ 507.297260] 07 EA[0-35]= 7f000
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
Changes in V7:
- Print slb cache ptr value and slb cache data
---
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 7 ++
arch/powerpc/include/asm/paca.h | 4 +
arch/powerpc/mm/slb.c | 73 +++++++++++++++++++++++++
arch/powerpc/platforms/pseries/ras.c | 10 +++
arch/powerpc/platforms/pseries/setup.c | 10 +++
5 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index cc00a7088cf3..5a3fe282076d 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -485,9 +485,16 @@ static inline void hpte_init_pseries(void) { }
extern void hpte_init_native(void);
+struct slb_entry {
+ u64 esid;
+ u64 vsid;
+};
+
extern void slb_initialize(void);
extern void slb_flush_and_rebolt(void);
extern void slb_flush_and_rebolt_realmode(void);
+extern void slb_save_contents(struct slb_entry *slb_ptr);
+extern void slb_dump_contents(struct slb_entry *slb_ptr);
extern void slb_vmalloc_update(void);
extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 7f22929ce915..233d25ff6f64 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -254,6 +254,10 @@ struct paca_struct {
#endif
#ifdef CONFIG_PPC_PSERIES
u8 *mce_data_buf; /* buffer to hold per cpu rtas errlog */
+
+ /* Capture SLB related old contents in MCE handler. */
+ struct slb_entry *mce_faulty_slbs;
+ u16 slb_save_cache_ptr;
#endif /* CONFIG_PPC_PSERIES */
} ____cacheline_aligned;
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index e89f675f1b5e..16a53689ffd4 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -151,6 +151,79 @@ void slb_flush_and_rebolt_realmode(void)
get_paca()->slb_cache_ptr = 0;
}
+void slb_save_contents(struct slb_entry *slb_ptr)
+{
+ int i;
+ unsigned long e, v;
+
+ /* Save slb_cache_ptr value. */
+ get_paca()->slb_save_cache_ptr = get_paca()->slb_cache_ptr;
+
+ if (!slb_ptr)
+ return;
+
+ for (i = 0; i < mmu_slb_size; i++) {
+ asm volatile("slbmfee %0,%1" : "=r" (e) : "r" (i));
+ asm volatile("slbmfev %0,%1" : "=r" (v) : "r" (i));
+ slb_ptr->esid = e;
+ slb_ptr->vsid = v;
+ slb_ptr++;
+ }
+}
+
+void slb_dump_contents(struct slb_entry *slb_ptr)
+{
+ int i, n;
+ unsigned long e, v;
+ unsigned long llp;
+
+ if (!slb_ptr)
+ return;
+
+ pr_err("SLB contents of cpu 0x%x\n", smp_processor_id());
+ pr_err("Last SLB entry inserted at slot %lld\n", get_paca()->stab_rr);
+
+ for (i = 0; i < mmu_slb_size; i++) {
+ e = slb_ptr->esid;
+ v = slb_ptr->vsid;
+ slb_ptr++;
+
+ if (!e && !v)
+ continue;
+
+ pr_err("%02d %016lx %016lx\n", i, e, v);
+
+ if (!(e & SLB_ESID_V)) {
+ pr_err("\n");
+ continue;
+ }
+ llp = v & SLB_VSID_LLP;
+ if (v & SLB_VSID_B_1T) {
+ pr_err(" 1T ESID=%9lx VSID=%13lx LLP:%3lx\n",
+ GET_ESID_1T(e),
+ (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
+ llp);
+ } else {
+ pr_err(" 256M ESID=%9lx VSID=%13lx LLP:%3lx\n",
+ GET_ESID(e),
+ (v & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
+ llp);
+ }
+ }
+ pr_err("----------------------------------\n");
+
+ /* Dump slb cache entries as well. */
+ pr_err("SLB cache ptr value = %d\n", get_paca()->slb_save_cache_ptr);
+ pr_err("Valid SLB cache entries:\n");
+ n = min_t(int, get_paca()->slb_save_cache_ptr, SLB_CACHE_ENTRIES);
+ for (i = 0; i < n; i++)
+ pr_err("%02d EA[0-35]=%9x\n", i, get_paca()->slb_cache[i]);
+ pr_err("Rest of SLB cache entries:\n");
+ for (i = n; i < SLB_CACHE_ENTRIES; i++)
+ pr_err("%02d EA[0-35]=%9x\n", i, get_paca()->slb_cache[i]);
+
+}
+
void slb_vmalloc_update(void)
{
unsigned long vflags;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 656b35a42d93..117ca2ff5456 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -515,6 +515,10 @@ static void pseries_print_mce_info(struct pt_regs *regs,
break;
}
+ /* Display faulty slb contents for SLB errors. */
+ if (error_type == PSERIES_MC_ERROR_TYPE_SLB)
+ slb_dump_contents(local_paca->mce_faulty_slbs);
+
printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
disposition == RTAS_DISP_FULLY_RECOVERED ?
"Recovered" : "Not recovered");
@@ -575,7 +579,11 @@ static int mce_handle_error(struct rtas_error_log *errp)
if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
- /* Store the old slb content someplace. */
+ /*
+ * Store the old slb content in paca before flushing. Print
+ * this when we go to virtual mode.
+ */
+ slb_save_contents(local_paca->mce_faulty_slbs);
slb_flush_and_rebolt_realmode();
disposition = RTAS_DISP_FULLY_RECOVERED;
rtas_set_disposition_recovered(errp);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 7a9421d089d8..53aee58a928b 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -105,6 +105,9 @@ static void __init fwnmi_init(void)
u8 *mce_data_buf;
unsigned int i;
int nr_cpus = num_possible_cpus();
+ struct slb_entry *slb_ptr;
+ size_t size;
+
int ibm_nmi_register = rtas_token("ibm,nmi-register");
if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
@@ -130,6 +133,13 @@ static void __init fwnmi_init(void)
paca_ptrs[i]->mce_data_buf = mce_data_buf +
(RTAS_ERROR_LOG_MAX * i);
}
+
+ /* Allocate per cpu slb area to save old slb contents during MCE */
+ size = sizeof(struct slb_entry) * mmu_slb_size * nr_cpus;
+ slb_ptr = __va(memblock_alloc_base(size, sizeof(struct slb_entry),
+ ppc64_rma_size));
+ for_each_possible_cpu(i)
+ paca_ptrs[i]->mce_faulty_slbs = slb_ptr + (mmu_slb_size * i);
}
static void pseries_8259_cascade(struct irq_desc *desc)
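The allocation in fwnmi_init() above reserves one contiguous buffer and gives each CPU a fixed slice of `mmu_slb_size` entries. Passing `ppc64_rma_size` as the maximum address to memblock_alloc_base() keeps the buffer inside the real-mode area, so slb_save_contents() can write to it before the MMU is turned back on. The sketch below reproduces only the slicing arithmetic in user space. It assumes a simplified two-field `struct slb_entry` and uses calloc() in place of memblock, and `alloc_per_cpu_slbs()` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct slb_entry (assumption). */
struct slb_entry {
	unsigned long esid;
	unsigned long vsid;
};

/*
 * Hypothetical helper: allocate nr_cpus * slb_size entries in one block and
 * point per_cpu[i] at CPU i's slice, mirroring
 *	paca_ptrs[i]->mce_faulty_slbs = slb_ptr + (mmu_slb_size * i);
 * Returns the base pointer (freed by the caller) or NULL on failure.
 */
static struct slb_entry *alloc_per_cpu_slbs(int nr_cpus, int slb_size,
					    struct slb_entry **per_cpu)
{
	struct slb_entry *base;
	int i;

	base = calloc((size_t)nr_cpus * (size_t)slb_size, sizeof(*base));
	if (!base)
		return NULL;

	for (i = 0; i < nr_cpus; i++)
		per_cpu[i] = base + (size_t)slb_size * (size_t)i;

	return base;
}
```

With 4 CPUs and 32 SLB entries each, CPU 3's slice starts 96 entries into the buffer.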
* [PATCH v7 8/9] powerpc/mce: Add sysctl control for recovery action on MCE.
From: Mahesh J Salgaonkar @ 2018-08-07 14:17 UTC
To: linuxppc-dev
Cc: Aneesh Kumar K.V, Michal Suchanek, Ananth Narayan,
Nicholas Piggin, Laurent Dufour, Michael Ellerman
In-Reply-To: <153365127532.14256.1965469477086140841.stgit@jupiter.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Introduce a recovery action for recovered memory errors (MCEs). There are
soft memory errors, like SLB Multihit, that can result from bad hardware
or from a software bug. The kernel can easily recover from these soft
errors by flushing the SLB contents, and after recovery it can continue to
function without any issue. But in some scenarios we may keep getting
these soft errors until the root cause is fixed. To analyze and find the
root cause, the best approach is to gather enough data and system state at
the time of the MCE. Hence this patch introduces a sysctl knob that lets
the user decide either to continue after recovery or to panic the kernel
to capture a dump. This allows one to configure a kernel to capture a dump
on MCE and then toggle back to recovery while the dump is being analyzed.
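The knob changes only one test in machine_check_exception() (the traps.c hunk in this patch). With the knob in place, a recovered MCE skips the fatal path only when `recover_on_mce` is non-zero. Below is a minimal user-space model of that decision. A local variable stands in for the kernel global, and `classify_mce()` is a hypothetical helper.

```c
#include <assert.h>

/* Local stand-in for the kernel's recover_on_mce (0 = dump, 1 = continue). */
static int recover_on_mce;

enum mce_outcome {
	MCE_BAIL,		/* return from the handler and keep running */
	MCE_FALLTHROUGH,	/* continue to the debugger/die() path */
};

/*
 * Hypothetical model of the patched check in machine_check_exception():
 *	if ((recover > 0) && recover_on_mce)
 *		goto bail;
 * 'recover' is the platform handler's return value (> 0 means recovered).
 */
static enum mce_outcome classify_mce(int recover)
{
	if (recover > 0 && recover_on_mce)
		return MCE_BAIL;
	return MCE_FALLTHROUGH;
}
```

As posted, the variable defaults to 0 and only the powernv hunk sets it to 1. On other platforms, a recovered MCE therefore falls through unless the knob is set through sysctl or the `recover_on_mce=` boot parameter.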
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/mce.h | 2 +
arch/powerpc/kernel/mce.c | 58 ++++++++++++++++++++++++++++++++
arch/powerpc/kernel/traps.c | 3 +-
arch/powerpc/platforms/powernv/setup.c | 4 ++
4 files changed, 66 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 3a1226e9b465..d46e1903878d 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -202,6 +202,8 @@ struct mce_error_info {
#define MCE_EVENT_RELEASE true
#define MCE_EVENT_DONTRELEASE false
+extern int recover_on_mce;
+
extern void save_mce_event(struct pt_regs *regs, long handled,
struct mce_error_info *mce_err, uint64_t nip,
uint64_t addr, uint64_t phys_addr);
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index ae17d8aa60c4..5e2ab5cade81 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -28,6 +28,7 @@
#include <linux/percpu.h>
#include <linux/export.h>
#include <linux/irq_work.h>
+#include <linux/moduleparam.h>
#include <asm/machdep.h>
#include <asm/mce.h>
@@ -631,3 +632,60 @@ long hmi_exception_realmode(struct pt_regs *regs)
return 1;
}
+
+/*
+ * Recovery action for recovered memory errors.
+ *
+ * There are soft memory errors like SLB Multihit, which can be a result of
+ * a bad hardware OR software BUG. Kernel can easily recover from these
+ * soft errors by flushing SLB contents. After the recovery kernel can
+ * still continue to function without any issue. But in some scenarios we
+ * may keep getting these soft errors until the root cause is fixed. To be
+ * able to analyze and find the root cause, best way is to gather enough
+ * data and system state at the time of MCE. Introduce a sysctl knob where
+ * user can decide either to continue after recovery or panic the kernel
+ * to capture the dump. This will allow one to configure a kernel to capture
+ * dump on MCE and then toggle back to recovery while dump is being analyzed.
+ *
+ * recover_on_mce == 0
+ * panic/crash the kernel to trigger dump capture.
+ *
+ * recover_on_mce == 1
+ * continue after MCE recovery. (no panic)
+ */
+int recover_on_mce;
+
+#ifdef CONFIG_SYSCTL
+/*
+ * Register the sysctl to define memory error recovery action.
+ */
+static struct ctl_table machine_check_ctl_table[] = {
+ {
+ .procname = "recover_on_mce",
+ .data = &recover_on_mce,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {}
+};
+
+static struct ctl_table machine_check_sysctl_root[] = {
+ {
+ .procname = "kernel",
+ .mode = 0555,
+ .child = machine_check_ctl_table,
+ },
+ {}
+};
+
+static int __init register_machine_check_sysctl(void)
+{
+ register_sysctl_table(machine_check_sysctl_root);
+
+ return 0;
+}
+__initcall(register_machine_check_sysctl);
+#endif /* CONFIG_SYSCTL */
+
+core_param(recover_on_mce, recover_on_mce, int, 0644);
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0e17dcb48720..246477c790e8 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -70,6 +70,7 @@
#include <asm/hmi.h>
#include <sysdev/fsl_pci.h>
#include <asm/kprobes.h>
+#include <asm/mce.h>
#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -727,7 +728,7 @@ void machine_check_exception(struct pt_regs *regs)
else if (cur_cpu_spec->machine_check)
recover = cur_cpu_spec->machine_check(regs);
- if (recover > 0)
+ if ((recover > 0) && recover_on_mce)
goto bail;
if (debugger_fault_handler(regs))
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index b74c93bc2e55..d13278029a94 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -39,6 +39,7 @@
#include <asm/tm.h>
#include <asm/setup.h>
#include <asm/security_features.h>
+#include <asm/mce.h>
#include "powernv.h"
@@ -147,6 +148,9 @@ static void __init pnv_setup_arch(void)
/* Enable NAP mode */
powersave_nap = 1;
+ /* Recovery action on recovered MCE. By default enable it on PowerNV */
+ recover_on_mce = 1;
+
/* XXX PMCS */
}
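The ctl_table registers `recover_on_mce` under the `kernel` directory, so the knob should appear as /proc/sys/kernel/recover_on_mce. The core_param() line also makes it settable as `recover_on_mce=` on the kernel command line. The sketch below flips the knob from user space. The helper name is hypothetical, and the path is a parameter so it can be pointed at a scratch file for testing.

```c
#include <assert.h>
#include <stdio.h>

#define RECOVER_ON_MCE_PATH "/proc/sys/kernel/recover_on_mce"

/*
 * Hypothetical helper: write 0 (panic to capture a dump) or 1 (continue
 * after recovery) to the sysctl file at 'path'. Returns 0 on success,
 * -1 on an invalid value or I/O error.
 */
static int set_recover_on_mce(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	if (value != 0 && value != 1)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

On a real system, `set_recover_on_mce(RECOVER_ON_MCE_PATH, 0)` needs root. Rejecting values other than 0 and 1 is a choice made in this sketch. proc_dointvec itself accepts any int.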
* [PATCH v7 9/9] powernv/pseries: consolidate code for mce early handling.
From: Mahesh J Salgaonkar @ 2018-08-07 14:18 UTC
To: linuxppc-dev
Cc: Aneesh Kumar K.V, Michal Suchanek, Ananth Narayan,
Nicholas Piggin, Laurent Dufour, Michael Ellerman
In-Reply-To: <153365127532.14256.1965469477086140841.stgit@jupiter.in.ibm.com>
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Now that other platforms also implement a real-mode MCE handler, let's
consolidate the code by sharing the existing powernv machine check early
code. Rename machine_check_powernv_early to machine_check_common_early
and reuse the code.
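Two details of the consolidated code may be easier to follow in C:

- The shared trampoline now sets MSR_ME only inside the CPU_FTR_HVMODE feature section, while MSR_RI is always set.
- machine_check_handle_early uses `rldicl. r11,r12,4,63`. This rotates the saved MSR left by 4 and keeps the low bit, which extracts MSR_HV (bit 60). On non-HV platforms the new `b 4f` skips that test.

The sketch assumes the bit positions from arch/powerpc/include/asm/reg.h (MSR_HV_LG = 60, MSR_PR_LG = 14, MSR_ME_LG = 12, MSR_RI_LG = 1). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers (LSB = 0), assumed from asm/reg.h. */
#define MSR_HV (1ULL << 60)
#define MSR_PR (1ULL << 14)
#define MSR_ME (1ULL << 12)
#define MSR_RI (1ULL << 1)

/*
 * MSR loaded into SRR1 before jumping to machine_check_handle_early:
 *	mfmsr r11
 *	BEGIN_FTR_SECTION ori r11,r11,MSR_ME END_FTR_SECTION_IFSET(HVMODE)
 *	ori r11,r11,MSR_RI
 */
static uint64_t early_handler_msr(uint64_t cur_msr, bool hvmode)
{
	uint64_t msr = cur_msr;

	if (hvmode)
		msr |= MSR_ME;	/* only patched in on HV (powernv) */
	msr |= MSR_RI;		/* always */
	return msr;
}

/* C equivalent of "rldicl. r11,r12,4,63": rotate left by 4, keep bit 0. */
static uint64_t rldicl_4_63(uint64_t x)
{
	uint64_t rot = (x << 4) | (x >> 60);

	return rot & 1;
}

/* True if the saved MSR says the MCE hit while in HV mode. */
static bool mce_hit_in_hv_mode(uint64_t saved_msr)
{
	return rldicl_4_63(saved_msr) != 0;
}
```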
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/kernel/exceptions-64s.S | 138 +++++++---------------------------
1 file changed, 28 insertions(+), 110 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index cb06f219570a..2f85a7baf026 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -243,14 +243,13 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
SET_SCRATCH0(r13) /* save r13 */
EXCEPTION_PROLOG_0(PACA_EXMC)
BEGIN_FTR_SECTION
- b machine_check_powernv_early
+ b machine_check_common_early
FTR_SECTION_ELSE
b machine_check_pSeries_0
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
EXC_REAL_END(machine_check, 0x200, 0x100)
EXC_VIRT_NONE(0x4200, 0x100)
-TRAMP_REAL_BEGIN(machine_check_powernv_early)
-BEGIN_FTR_SECTION
+TRAMP_REAL_BEGIN(machine_check_common_early)
EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
/*
* Register contents:
@@ -306,7 +305,9 @@ BEGIN_FTR_SECTION
/* Save r9 through r13 from EXMC save area to stack frame. */
EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
mfmsr r11 /* get MSR value */
+BEGIN_FTR_SECTION
ori r11,r11,MSR_ME /* turn on ME bit */
+END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
ori r11,r11,MSR_RI /* turn on RI bit */
LOAD_HANDLER(r12, machine_check_handle_early)
1: mtspr SPRN_SRR0,r12
@@ -325,7 +326,6 @@ BEGIN_FTR_SECTION
andc r11,r11,r10 /* Turn off MSR_ME */
b 1b
b . /* prevent speculative execution */
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
TRAMP_REAL_BEGIN(machine_check_pSeries)
.globl machine_check_fwnmi
@@ -333,7 +333,7 @@ machine_check_fwnmi:
SET_SCRATCH0(r13) /* save r13 */
EXCEPTION_PROLOG_0(PACA_EXMC)
BEGIN_FTR_SECTION
- b machine_check_pSeries_early
+ b machine_check_common_early
END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
machine_check_pSeries_0:
EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
@@ -346,90 +346,6 @@ machine_check_pSeries_0:
TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
-TRAMP_REAL_BEGIN(machine_check_pSeries_early)
-BEGIN_FTR_SECTION
- EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
- mr r10,r1 /* Save r1 */
- ld r1,PACAMCEMERGSP(r13) /* Use MC emergency stack */
- subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */
- mfspr r11,SPRN_SRR0 /* Save SRR0 */
- mfspr r12,SPRN_SRR1 /* Save SRR1 */
- EXCEPTION_PROLOG_COMMON_1()
- EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
- EXCEPTION_PROLOG_COMMON_3(0x200)
- addi r3,r1,STACK_FRAME_OVERHEAD
- BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
- ld r12,_MSR(r1)
- andi. r11,r12,MSR_PR /* See if coming from user. */
- bne 2f /* continue in V mode if we are. */
-
- /*
- * At this point we are not sure about what context we come from.
- * We may be in the middle of swithing stack. r1 may not be valid.
- * Hence stay on emergency stack, call machine_check_exception and
- * return from the interrupt.
- * But before that, check if this is an un-recoverable exception.
- * If yes, then stay on emergency stack and panic.
- */
- andi. r11,r12,MSR_RI
- bne 1f
-
- /*
- * Check if we have successfully handled/recovered from error, if not
- * then stay on emergency stack and panic.
- */
- cmpdi r3,0 /* see if we handled MCE successfully */
- bne 1f /* if handled then return from interrupt */
-
- LOAD_HANDLER(r10,unrecover_mce)
- mtspr SPRN_SRR0,r10
- ld r10,PACAKMSR(r13)
- /*
- * We are going down. But there are chances that we might get hit by
- * another MCE during panic path and we may run into unstable state
- * with no way out. Hence, turn ME bit off while going down, so that
- * when another MCE is hit during panic path, hypervisor will
- * power cycle the lpar, instead of getting into MCE loop.
- */
- li r3,MSR_ME
- andc r10,r10,r3 /* Turn off MSR_ME */
- mtspr SPRN_SRR1,r10
- RFI_TO_KERNEL
- b .
-
- /* Stay on emergency stack and return from interrupt. */
-1: LOAD_HANDLER(r10,mce_return)
- mtspr SPRN_SRR0,r10
- ld r10,PACAKMSR(r13)
- mtspr SPRN_SRR1,r10
- RFI_TO_KERNEL
- b .
-
- /* Move original SRR0 and SRR1 into the respective regs */
-2: ld r9,_MSR(r1)
- mtspr SPRN_SRR1,r9
- ld r3,_NIP(r1)
- mtspr SPRN_SRR0,r3
- ld r9,_CTR(r1)
- mtctr r9
- ld r9,_XER(r1)
- mtxer r9
- ld r9,_LINK(r1)
- mtlr r9
- REST_GPR(0, r1)
- REST_8GPRS(2, r1)
- REST_GPR(10, r1)
- ld r11,_CCR(r1)
- mtcr r11
- REST_GPR(11, r1)
- REST_2GPRS(12, r1)
- /* restore original r1. */
- ld r1,GPR1(r1)
- SET_SCRATCH0(r13) /* save r13 */
- EXCEPTION_PROLOG_0(PACA_EXMC)
- b machine_check_pSeries_0
-END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
-
EXC_COMMON_BEGIN(machine_check_common)
/*
* Machine check is different because we use a different
@@ -528,6 +444,9 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
bl machine_check_early
std r3,RESULT(r1) /* Save result */
ld r12,_MSR(r1)
+BEGIN_FTR_SECTION
+ b 4f
+END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
#ifdef CONFIG_PPC_P7_NAP
/*
@@ -551,10 +470,11 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
*/
rldicl. r11,r12,4,63 /* See if MC hit while in HV mode. */
beq 5f
- andi. r11,r12,MSR_PR /* See if coming from user. */
+4: andi. r11,r12,MSR_PR /* See if coming from user. */
bne 9f /* continue in V mode if we are. */
5:
+BEGIN_FTR_SECTION
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
/*
* We are coming from kernel context. Check if we are coming from
@@ -565,6 +485,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
cmpwi r11,0 /* Check if coming from guest */
bne 9f /* continue if we are. */
#endif
+END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
/*
* At this point we are not sure about what context we come from.
* Queue up the MCE event and return from the interrupt.
@@ -598,6 +519,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
cmpdi r3,0 /* see if we handled MCE successfully */
beq 1b /* if !handled then panic */
+BEGIN_FTR_SECTION
/*
* Return from MC interrupt.
* Queue up the MCE event so that we can log it later, while
@@ -606,10 +528,24 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
bl machine_check_queue_event
MACHINE_CHECK_HANDLER_WINDUP
RFI_TO_USER_OR_KERNEL
+FTR_SECTION_ELSE
+ /*
+ * pSeries: Return from MC interrupt. Before that stay on emergency
+ * stack and call machine_check_exception to log the MCE event.
+ */
+ LOAD_HANDLER(r10,mce_return)
+ mtspr SPRN_SRR0,r10
+ ld r10,PACAKMSR(r13)
+ mtspr SPRN_SRR1,r10
+ RFI_TO_KERNEL
+ b .
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
9:
/* Deliver the machine check to host kernel in V mode. */
MACHINE_CHECK_HANDLER_WINDUP
- b machine_check_pSeries
+ SET_SCRATCH0(r13) /* save r13 */
+ EXCEPTION_PROLOG_0(PACA_EXMC)
+ b machine_check_pSeries_0
EXC_COMMON_BEGIN(unrecover_mce)
/* Invoke machine_check_exception to print MCE event and panic. */
@@ -627,25 +563,7 @@ EXC_COMMON_BEGIN(mce_return)
/* Invoke machine_check_exception to print MCE event and return. */
addi r3,r1,STACK_FRAME_OVERHEAD
bl machine_check_exception
- ld r9,_MSR(r1)
- mtspr SPRN_SRR1,r9
- ld r3,_NIP(r1)
- mtspr SPRN_SRR0,r3
- ld r9,_CTR(r1)
- mtctr r9
- ld r9,_XER(r1)
- mtxer r9
- ld r9,_LINK(r1)
- mtlr r9
- REST_GPR(0, r1)
- REST_8GPRS(2, r1)
- REST_GPR(10, r1)
- ld r11,_CCR(r1)
- mtcr r11
- REST_GPR(11, r1)
- REST_2GPRS(12, r1)
- /* restore original r1. */
- ld r1,GPR1(r1)
+ MACHINE_CHECK_HANDLER_WINDUP
RFI_TO_KERNEL
b .
* Re: [PATCH] misc: ibmvsm: Fix wrong assignment of return code
From: Bryant G. Ly @ 2018-08-07 14:22 UTC
To: Michael Ellerman, Bryant G. Ly, gregkh; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <87y3dibbt3.fsf@concordia.ellerman.id.au>
On 8/7/18 7:28 AM, Michael Ellerman wrote:
> "Bryant G. Ly" <bryantly@linux.vnet.ibm.com> writes:
>
>> From: "Bryant G. Ly" <bryantly@linux.ibm.com>
>>
>> Currently the assignment is flipped and rc is always 0.
> If you'd left rc uninitialised at the start of the function the compiler
> would have caught it for you.
>
> And what is the consequence of the bug? Nothing, complete system crash,
> subtle data corruption?
The consequence is that if CRQ registration failed the first time due to
insufficient resources, the driver would never reset and try again.
If it failed due to any other error, it would fail to send the CRQ init
message and thus just wait for the client to init, which would never
happen.
We would also leak memory, since in the error case the DMA mapping is never
unmapped and the message queue is never freed.
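The class of bug under discussion looks roughly like the sketch below. Michael's point is that if `rc` had been left uninitialised, GCC's -Wuninitialized (part of -Wall) would likely have flagged the flipped assignment. Everything here is hypothetical. The helper names and the -2 error value are illustrative and are not taken from the ibmvmc driver.

```c
#include <assert.h>

/* Hypothetical registration call that fails with a negative error code. */
static long fake_register_crq(void)
{
	return -2;
}

/* Buggy shape: the assignment is flipped, so rc keeps its initial 0. */
static long init_flipped(void)
{
	long rc = 0;
	long retrc;

	retrc = fake_register_crq();
	retrc = rc;		/* BUG: should be rc = retrc */
	(void)retrc;
	return rc;		/* always 0: the error path is never taken */
}

/* Fixed shape: the failure reaches the caller, which can reset/retry or clean up. */
static long init_fixed(void)
{
	long rc, retrc;

	retrc = fake_register_crq();
	rc = retrc;		/* correct direction */
	return rc;
}
```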
>
> Also this should be tagged:
>
> Fixes: 0eca353e7ae7 ("misc: IBM Virtual Management Channel Driver (VMC)")
>
> cheers
>
Yep, sorry, I forgot to add the Fixes: tag.
-Bryant
* Re: [PATCH v2] powerpc/tm: Print 64-bits MSR
From: Segher Boessenkool @ 2018-08-07 15:50 UTC
To: Breno Leitao; +Cc: linuxppc-dev, mikey
In-Reply-To: <1533648900-7933-1-git-send-email-leitao@debian.org>
On Tue, Aug 07, 2018 at 10:35:00AM -0300, Breno Leitao wrote:
> On a kernel TM Bad thing program exception, the Machine State Register
> (MSR) is not being properly displayed. The exception code dumps a 32-bits
> value but MSR is a 64 bits register for all platforms that have HTM
> enabled.
>
> This patch dumps the MSR value as a 64-bits value instead of 32 bits. In
> order to do so, the 'reason' variable could not be used, since it trimmed
> MSR to 32-bits (int).
So maybe reason should be a long instead of an int?
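The truncation is easy to see in isolation. On 64-bit Book3S, several relevant MSR bits live in the upper half of the register. The sketch assumes these positions from asm/reg.h: MSR_SF at bit 63, and the transactional-state bits MSR_TS_T and MSR_TS_S at bits 34 and 33. Storing the MSR in a 32-bit int discards exactly those bits. The helper names are hypothetical, and the sample value is the MSR from the oops earlier in this thread. Widening `reason` to a 64-bit type, as suggested, or printing `regs->msr` directly both avoid the loss.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions assumed from asm/reg.h (LSB = 0). */
#define MSR_SF   (1ULL << 63)	/* 64-bit mode */
#define MSR_TS_T (1ULL << 34)	/* TM state: transactional */
#define MSR_TS_S (1ULL << 33)	/* TM state: suspended */

/* What happens when the MSR is stored in a 32-bit 'reason' variable. */
static uint32_t msr_as_int(uint64_t msr)
{
	return (uint32_t)msr;
}

/* Print the truncated value next to the full 64-bit value. */
static void print_msr(uint64_t msr)
{
	printf("truncated: 0x%" PRIx32 "\n", msr_as_int(msr));
	printf("full:      0x%016" PRIx64 "\n", msr);
}
```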
Segher
* Re: [PATCH v7 5/9] powerpc/pseries: flush SLB contents on SLB MCE errors.
From: Michal Suchánek @ 2018-08-07 16:54 UTC
To: Mahesh J Salgaonkar
Cc: linuxppc-dev, Michael Ellerman, Nicholas Piggin, Ananth Narayan,
Aneesh Kumar K.V, Laurent Dufour
In-Reply-To: <153365142349.14256.9954484737438718329.stgit@jupiter.in.ibm.com>
Hello,
On Tue, 07 Aug 2018 19:47:14 +0530
"Mahesh J Salgaonkar" <mahesh@linux.vnet.ibm.com> wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> On pseries, as of today system crashes if we get a machine check
> exceptions due to SLB errors. These are soft errors and can be fixed
> by flushing the SLBs so the kernel can continue to function instead of
> system crash. We do this in real mode before turning on MMU. Otherwise
> we would run into nested machine checks. This patch now fetches the
> rtas error log in real mode and flushes the SLBs on SLB errors.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Signed-off-by: Michal Suchanek <msuchanek@suse.com>
> ---
>
> Changes in V7:
> - Fold Michal's patch into this patch.
> - Handle MSR_RI=0 and evil context case in MC handler.
> ---
> arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1
> arch/powerpc/include/asm/machdep.h | 1
> arch/powerpc/kernel/exceptions-64s.S | 112 +++++++++++++++++++++++++
> arch/powerpc/kernel/mce.c | 15 +++
> arch/powerpc/mm/slb.c | 6 +
> arch/powerpc/platforms/powernv/setup.c | 11 ++
> arch/powerpc/platforms/pseries/pseries.h | 1
> arch/powerpc/platforms/pseries/ras.c | 51 +++++++++++
> arch/powerpc/platforms/pseries/setup.c | 1
> 9 files changed, 195 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 50ed64fba4ae..cc00a7088cf3 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -487,6 +487,7 @@
> extern void hpte_init_native(void);
> extern void slb_initialize(void);
> extern void slb_flush_and_rebolt(void);
> +extern void slb_flush_and_rebolt_realmode(void);
>
> extern void slb_vmalloc_update(void);
> extern void slb_set_size(u16 size);
> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index a47de82fb8e2..b4831f1338db 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -108,6 +108,7 @@ struct machdep_calls {
>
> /* Early exception handlers called in realmode */
> int (*hmi_exception_early)(struct pt_regs *regs);
> + long (*machine_check_early)(struct pt_regs *regs);
> /* Called during machine check exception to retrive fixup address. */
> bool (*mce_check_early_recovery)(struct pt_regs *regs);
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 285c6465324a..cb06f219570a 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -332,6 +332,9 @@ TRAMP_REAL_BEGIN(machine_check_pSeries)
> machine_check_fwnmi:
> SET_SCRATCH0(r13) /* save r13 */
> EXCEPTION_PROLOG_0(PACA_EXMC)
> +BEGIN_FTR_SECTION
> + b machine_check_pSeries_early
> +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
> machine_check_pSeries_0:
> EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
> /*
> @@ -343,6 +346,90 @@ machine_check_pSeries_0:
>
> TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
>
> +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> +BEGIN_FTR_SECTION
> + EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> + mr r10,r1 /* Save r1 */
> + ld r1,PACAMCEMERGSP(r13) /* Use MC emergency stack */
> + subi r1,r1,INT_FRAME_SIZE /* alloc stack frame */
> + mfspr r11,SPRN_SRR0 /* Save SRR0 */
> + mfspr r12,SPRN_SRR1 /* Save SRR1 */
> + EXCEPTION_PROLOG_COMMON_1()
> + EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> + EXCEPTION_PROLOG_COMMON_3(0x200)
> + addi r3,r1,STACK_FRAME_OVERHEAD
> + BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> + ld r12,_MSR(r1)
> + andi. r11,r12,MSR_PR /* See if coming from user. */
> + bne 2f /* continue in V mode if we are. */
> +
> + /*
> + * At this point we are not sure about what context we come from.
> + * We may be in the middle of swithing stack. r1 may not be valid.
> + * Hence stay on emergency stack, call machine_check_exception and
> + * return from the interrupt.
> + * But before that, check if this is an un-recoverable exception.
> + * If yes, then stay on emergency stack and panic.
> + */
> + andi. r11,r12,MSR_RI
> + bne 1f
> +
> + /*
> + * Check if we have successfully handled/recovered from error, if not
> + * then stay on emergency stack and panic.
> + */
> + cmpdi r3,0 /* see if we handled MCE successfully */
> + bne 1f /* if handled then return from interrupt */
> +
> + LOAD_HANDLER(r10,unrecover_mce)
> + mtspr SPRN_SRR0,r10
> + ld r10,PACAKMSR(r13)
> + /*
> + * We are going down. But there are chances that we might get hit by
> + * another MCE during panic path and we may run into unstable state
> + * with no way out. Hence, turn ME bit off while going down, so that
> + * when another MCE is hit during panic path, hypervisor will
> + * power cycle the lpar, instead of getting into MCE loop.
> + */
> + li r3,MSR_ME
> + andc r10,r10,r3 /* Turn off MSR_ME */
> + mtspr SPRN_SRR1,r10
> + RFI_TO_KERNEL
> + b .
> +
> + /* Stay on emergency stack and return from interrupt. */
> +1: LOAD_HANDLER(r10,mce_return)
> + mtspr SPRN_SRR0,r10
> + ld r10,PACAKMSR(r13)
> + mtspr SPRN_SRR1,r10
> + RFI_TO_KERNEL
> + b .
I think that the logic should be inverted here. That is, we should check
for unrecoverable and unhandled exceptions and jump to unrecover_mce if
found, and fall through to mce_return otherwise.
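A C model of both decisions makes the difference concrete. It assumes `msr_ri` is the MSR_RI bit of the saved SRR1 and `handled` is a non-zero return from machine_check_early(). The posted assembly branches to mce_return whenever RI is set or the error was handled, so an unhandled MCE with RI set is not treated as fatal. The inverted form goes to unrecover_mce whenever either condition fails. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum mce_exit {
	MCE_EXIT_RETURN,	/* branch to mce_return */
	MCE_EXIT_UNRECOVER,	/* branch to unrecover_mce */
};

/*
 * As posted in v7 machine_check_pSeries_early:
 *	andi. r11,r12,MSR_RI ; bne 1f	-> RI set: return
 *	cmpdi r3,0           ; bne 1f	-> handled: return
 *	otherwise			-> unrecover_mce
 */
static enum mce_exit exit_as_posted(bool msr_ri, bool handled)
{
	if (msr_ri)
		return MCE_EXIT_RETURN;
	if (handled)
		return MCE_EXIT_RETURN;
	return MCE_EXIT_UNRECOVER;
}

/* Suggested: unrecoverable (RI clear) or unhandled -> unrecover_mce. */
static enum mce_exit exit_suggested(bool msr_ri, bool handled)
{
	if (!msr_ri || !handled)
		return MCE_EXIT_UNRECOVER;
	return MCE_EXIT_RETURN;
}
```

The two versions agree when both conditions hold or both fail. They differ exactly for (RI set, unhandled) and (RI clear, handled).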
Thanks
Michal
> +
> + /* Move original SRR0 and SRR1 into the respective regs */
> +2: ld r9,_MSR(r1)
> + mtspr SPRN_SRR1,r9
> + ld r3,_NIP(r1)
> + mtspr SPRN_SRR0,r3
> + ld r9,_CTR(r1)
> + mtctr r9
> + ld r9,_XER(r1)
> + mtxer r9
> + ld r9,_LINK(r1)
> + mtlr r9
> + REST_GPR(0, r1)
> + REST_8GPRS(2, r1)
> + REST_GPR(10, r1)
> + ld r11,_CCR(r1)
> + mtcr r11
> + REST_GPR(11, r1)
> + REST_2GPRS(12, r1)
> + /* restore original r1. */
> + ld r1,GPR1(r1)
> + SET_SCRATCH0(r13) /* save r13 */
> + EXCEPTION_PROLOG_0(PACA_EXMC)
> + b machine_check_pSeries_0
> +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
> +
> EXC_COMMON_BEGIN(machine_check_common)
> /*
> * Machine check is different because we use a different
> @@ -536,6 +623,31 @@ EXC_COMMON_BEGIN(unrecover_mce)
> bl unrecoverable_exception
> b 1b
>
> +EXC_COMMON_BEGIN(mce_return)
> + /* Invoke machine_check_exception to print MCE event and return. */
> + addi r3,r1,STACK_FRAME_OVERHEAD
> + bl machine_check_exception
> + ld r9,_MSR(r1)
> + mtspr SPRN_SRR1,r9
> + ld r3,_NIP(r1)
> + mtspr SPRN_SRR0,r3
> + ld r9,_CTR(r1)
> + mtctr r9
> + ld r9,_XER(r1)
> + mtxer r9
> + ld r9,_LINK(r1)
> + mtlr r9
> + REST_GPR(0, r1)
> + REST_8GPRS(2, r1)
> + REST_GPR(10, r1)
> + ld r11,_CCR(r1)
> + mtcr r11
> + REST_GPR(11, r1)
> + REST_2GPRS(12, r1)
> + /* restore original r1. */
> + ld r1,GPR1(r1)
> + RFI_TO_KERNEL
> + b .
>
> EXC_REAL(data_access, 0x300, 0x80)
> EXC_VIRT(data_access, 0x4300, 0x80, 0x300)
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index efdd16a79075..ae17d8aa60c4 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -488,10 +488,19 @@ long machine_check_early(struct pt_regs *regs)
> {
> long handled = 0;
>
> - __this_cpu_inc(irq_stat.mce_exceptions);
> + /*
> + * For pSeries we count mce when we go into virtual mode machine
> + * check handler. Hence skip it. Also, We can't access per cpu
> + * variables in real mode for LPAR.
> + */
> + if (early_cpu_has_feature(CPU_FTR_HVMODE))
> + __this_cpu_inc(irq_stat.mce_exceptions);
>
> - if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> - handled = cur_cpu_spec->machine_check_early(regs);
> + /*
> + * See if platform is capable of handling machine check.
> + */
> + if (ppc_md.machine_check_early)
> + handled = ppc_md.machine_check_early(regs);
> return handled;
> }
>
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index cb796724a6fc..e89f675f1b5e 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -145,6 +145,12 @@ void slb_flush_and_rebolt(void)
> get_paca()->slb_cache_ptr = 0;
> }
>
> +void slb_flush_and_rebolt_realmode(void)
> +{
> + __slb_flush_and_rebolt();
> + get_paca()->slb_cache_ptr = 0;
> +}
> +
> void slb_vmalloc_update(void)
> {
> unsigned long vflags;
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index f96df0a25d05..b74c93bc2e55 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -431,6 +431,16 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu)
> return ret_freq;
> }
>
> +static long pnv_machine_check_early(struct pt_regs *regs)
> +{
> + long handled = 0;
> +
> + if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> + handled = cur_cpu_spec->machine_check_early(regs);
> +
> + return handled;
> +}
> +
> define_machine(powernv) {
> .name = "PowerNV",
> .probe = pnv_probe,
> @@ -442,6 +452,7 @@ define_machine(powernv) {
> .machine_shutdown = pnv_shutdown,
> .power_save = NULL,
> .calibrate_decr = generic_calibrate_decr,
> + .machine_check_early = pnv_machine_check_early,
> #ifdef CONFIG_KEXEC_CORE
> .kexec_cpu_down = pnv_kexec_cpu_down,
> #endif
> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
> index 60db2ee511fb..ec2a5f61d4a4 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -24,6 +24,7 @@ struct pt_regs;
> extern int pSeries_system_reset_exception(struct pt_regs *regs);
> extern int pSeries_machine_check_exception(struct pt_regs *regs);
> +extern long pSeries_machine_check_realmode(struct pt_regs *regs);
>
> #ifdef CONFIG_SMP
> extern void smp_init_pseries(void);
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 851ce326874a..e4420f7c8fda 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -427,6 +427,35 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
> return 0; /* need to perform reset */
> }
>
> +static int mce_handle_error(struct rtas_error_log *errp)
> +{
> + struct pseries_errorlog *pseries_log;
> + struct pseries_mc_errorlog *mce_log;
> + int disposition = rtas_error_disposition(errp);
> + uint8_t error_type;
> +
> + if (!rtas_error_extended(errp))
> + goto out;
> +
> + pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> + if (pseries_log == NULL)
> + goto out;
> +
> + mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> + error_type = rtas_mc_error_type(mce_log);
> +
> + if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
> + (error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
> + /* Store the old slb content someplace. */
> + slb_flush_and_rebolt_realmode();
> + disposition = RTAS_DISP_FULLY_RECOVERED;
> + rtas_set_disposition_recovered(errp);
> + }
> +
> +out:
> + return disposition;
> +}
> +
> /*
> * Process MCE rtas errlog event.
> */
> @@ -503,11 +532,31 @@ int pSeries_machine_check_exception(struct pt_regs *regs)
> struct rtas_error_log *errp;
>
> if (fwnmi_active) {
> - errp = fwnmi_get_errinfo(regs);
> fwnmi_release_errinfo();
> + errp = fwnmi_get_errlog();
> if (errp && recover_mce(regs, errp))
> return 1;
> }
>
> return 0;
> }
> +
> +long pSeries_machine_check_realmode(struct pt_regs *regs)
> +{
> + struct rtas_error_log *errp;
> + int disposition;
> +
> + if (fwnmi_active) {
> + errp = fwnmi_get_errinfo(regs);
> + /*
> + * Call to fwnmi_release_errinfo() in real mode causes kernel
> + * to panic. Hence we will call it as soon as we go into
> + * virtual mode.
> + */
> + disposition = mce_handle_error(errp);
> + if (disposition == RTAS_DISP_FULLY_RECOVERED)
> + return 1;
> + }
> +
> + return 0;
> +}
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index b42087cd8c6b..7a9421d089d8 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -1000,6 +1000,7 @@ define_machine(pseries) {
> .calibrate_decr = generic_calibrate_decr,
> .progress = rtas_progress,
> .system_reset_exception = pSeries_system_reset_exception,
> + .machine_check_early = pSeries_machine_check_realmode,
> .machine_check_exception = pSeries_machine_check_exception,
> #ifdef CONFIG_KEXEC_CORE
> .machine_kexec = pSeries_machine_kexec,
>
>
* Re: [PATCH v2] powerpc/tm: Print 64-bits MSR
From: Christophe LEROY @ 2018-08-07 17:15 UTC
To: Breno Leitao, linuxppc-dev; +Cc: mikey
In-Reply-To: <1533648900-7933-1-git-send-email-leitao@debian.org>
On 07/08/2018 at 15:35, Breno Leitao wrote:
> On a kernel TM Bad thing program exception, the Machine State Register
> (MSR) is not being properly displayed. The exception code dumps a 32-bits
> value but MSR is a 64 bits register for all platforms that have HTM
> enabled.
>
> This patch dumps the MSR value as a 64-bits value instead of 32 bits. In
> order to do so, the 'reason' variable could not be used, since it trimmed
> MSR to 32-bits (int).
reason is not always regs->msr (see get_reason()), although in your case
it is.
I think it would be better to change 'reason' to 'unsigned long' instead
of replacing it with regs->msr in the printk.
Christophe
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> arch/powerpc/kernel/traps.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 0e17dcb48720..cd561fd89532 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1402,7 +1402,7 @@ void program_check_exception(struct pt_regs *regs)
> goto bail;
> } else {
> printk(KERN_EMERG "Unexpected TM Bad Thing exception "
> - "at %lx (msr 0x%x)\n", regs->nip, reason);
> + "at %lx (msr 0x%lx)\n", regs->nip, regs->msr);
> die("Unrecoverable exception", regs, SIGABRT);
> }
> }
>
* Re: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq ranges
From: Rob Herring @ 2018-08-07 18:09 UTC
To: Bharat Bhushan
Cc: benh, paulus, mpe, oss, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel, keescook, tyreld, joe
In-Reply-To: <1532684881-19310-4-git-send-email-Bharat.Bhushan@nxp.com>
On Fri, Jul 27, 2018 at 03:17:59PM +0530, Bharat Bhushan wrote:
> Freescale MPIC h/w may not support all interrupt sources reported
> by hardware, "last-interrupt-source" or platform. On these platforms
> a misconfigured device tree that assigns one of the reserved
> interrupts leaves a non-functioning system without warning.
There are lots of ways to misconfigure DTs. I don't think this is
special and needs a property. We've had some interrupt mask or valid
properties in the past, but generally don't accept those.
>
> This patch adds "supported-irq-ranges" property in device tree to
> provide the range of supported source of interrupts. If a reserved
> interrupt used then it will not be programming h/w, which it does
> currently, and through warning.
>
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
> ---
> .../devicetree/bindings/powerpc/fsl/mpic.txt | 8 ++
> arch/powerpc/include/asm/mpic.h | 9 ++
> arch/powerpc/sysdev/mpic.c | 113 +++++++++++++++++++--
> 3 files changed, 121 insertions(+), 9 deletions(-)
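Functionally, the proposal amounts to a membership check before a source is programmed. Below is a sketch of that check. It assumes the property decodes into (first source, count) pairs. The struct, the encoding and the example table are all hypothetical, since the binding text itself is not quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded form of one "supported-irq-ranges" entry. */
struct irq_range {
	unsigned int start;	/* first supported interrupt source */
	unsigned int count;	/* number of consecutive sources */
};

/* Hypothetical example table: sources 0-15 and 32-39 are usable. */
static const struct irq_range example_ranges[] = {
	{ 0, 16 },
	{ 32, 8 },
};

/* Return true if hardware source 'hw' falls inside one of the n ranges. */
static bool irq_source_supported(const struct irq_range *r, size_t n,
				 unsigned int hw)
{
	size_t i;

	assert(r != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (hw >= r[i].start && hw - r[i].start < r[i].count)
			return true;
	}
	return false;
}
```

Under the proposal, a source that fails this check would be left unprogrammed and a warning printed instead.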
* Re: [PATCH v2] powerpc/tm: Print 64-bits MSR
From: Breno Leitao @ 2018-08-07 18:41 UTC (permalink / raw)
To: Christophe LEROY, linuxppc-dev; +Cc: mikey
In-Reply-To: <95db63eb-6fb7-eed3-1ce8-8b2c053dff47@c-s.fr>
Hi,
On 08/07/2018 02:15 PM, Christophe LEROY wrote:
> On 07/08/2018 at 15:35, Breno Leitao wrote:
>> On a kernel TM Bad thing program exception, the Machine State Register
>> (MSR) is not being properly displayed. The exception code dumps a 32-bits
>> value but MSR is a 64 bits register for all platforms that have HTM
>> enabled.
>>
>> This patch dumps the MSR value as a 64-bits value instead of 32 bits. In
>> order to do so, the 'reason' variable could not be used, since it trimmed
>> MSR to 32-bits (int).
>
> reason is not always regs->msr, see get_reason(), although in your case it is.
>
> I think it would be better to change 'reason' to 'unsigned long' instead of
> replacing it by regs->msr for the printk.
That was my initial approach, but this code also seems to run on 32-bit
systems, and I do not want to change the bit width of 'reason' without at
least having a 32-bit system to test on.
Also, it is a bit weird doing something like:
printk("....(msr 0x%lx)....", reason);
I personally think that the following code is much more readable:
printk(".... (msr 0x%lx)...", regs->msr);
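The truncation under discussion can be shown with a short userspace C sketch. This is not kernel code. It uses `uint64_t` so the result does not depend on the host's data model: `unsigned long` is 64 bits on ppc64 but only 32 bits on ppc32, which is Christophe's point about ppc32. The helper names and the example MSR value are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Old path: copying the MSR into a 32-bit 'reason' drops the upper half. */
static uint32_t msr_as_reason(uint64_t msr)
{
	return (uint32_t)msr;
}

/* Nonzero if routing msr through a 32-bit variable loses information. */
static int msr_truncated(uint64_t msr)
{
	return (uint64_t)msr_as_reason(msr) != msr;
}

/*
 * Fixed path: format all 64 bits, as printing regs->msr with %lx does on
 * ppc64. Returns what snprintf() returns (the untruncated length).
 */
static int format_msr(char *buf, size_t len, uint64_t msr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "msr 0x%llx", (unsigned long long)msr);
}
```

Take an MSR of 0x8000000002009033 as an example. `msr_as_reason()` yields 0x2009033, so the old message shows only the low word. `format_msr()` writes the full "msr 0x8000000002009033".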
^ permalink raw reply
* Re: [PATCH v2] powerpc/tm: Print 64-bits MSR
From: LEROY Christophe @ 2018-08-07 18:57 UTC (permalink / raw)
To: Breno Leitao; +Cc: mikey, linuxppc-dev
In-Reply-To: <c5be0bf4-de2c-6fe4-3192-f48aade9506b@debian.org>
Breno Leitao <leitao@debian.org> wrote:
> Hi,
>
> On 08/07/2018 02:15 PM, Christophe LEROY wrote:
>> On 07/08/2018 at 15:35, Breno Leitao wrote:
>>> On a kernel TM Bad thing program exception, the Machine State Register
>>> (MSR) is not being properly displayed. The exception code dumps a 32-bits
>>> value but MSR is a 64 bits register for all platforms that have HTM
>>> enabled.
>>>
>>> This patch dumps the MSR value as a 64-bits value instead of 32 bits. In
>>> order to do so, the 'reason' variable could not be used, since it trimmed
>>> MSR to 32-bits (int).
>>
>> reason is not always regs->msr, see get_reason(), although in your
>> case it is.
>>
>> I think it would be better to change 'reason' to 'unsigned long' instead of
>> replacing it by regs->msr for the printk.
>
> That was my initial approach, but this code also seems to run on 32-bit
> systems, and I do not want to change the bit width of 'reason' without at
> least having a 32-bit system to test on.
But 'unsigned long' is still 32 bits on ppc32, so it makes no
difference compared with 'unsigned int'.
And I will test it for you if needed.
Christophe
>
> Also, it is a bit weird doing something like:
>
> printk("....(msr 0x%lx)....", reason);
>
> I personally think that the following code is much more readable:
>
> printk(".... (msr 0x%lx)...", regs->msr);
^ permalink raw reply
* Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling
From: John Allen @ 2018-08-07 19:26 UTC (permalink / raw)
To: Michael Ellerman, nfont; +Cc: linuxppc-dev
In-Reply-To: <87in4ufcrd.fsf@concordia.ellerman.id.au>
On Wed, Aug 01, 2018 at 11:16:22PM +1000, Michael Ellerman wrote:
>John Allen <jallen@linux.ibm.com> writes:
>
>> On Mon, Jul 23, 2018 at 11:41:24PM +1000, Michael Ellerman wrote:
>>>John Allen <jallen@linux.ibm.com> writes:
>>>
>>>> While handling PRRN events, the time to handle the actual hotplug events
>>>> dwarfs the time it takes to perform the device tree updates and queue the
>>>> hotplug events. In the case that PRRN events are being queued continuously,
>>>> hotplug events have been observed to be queued faster than the kernel can
>>>> actually handle them. This patch avoids the problem by waiting for a
>>>> hotplug request to complete before queueing more hotplug events.
>
>Have you tested this patch in isolation, ie. not with patch 1?
While I was away on vacation, I believe a build was tested with just
this patch and not the first, and it has been running with no problems.
However, I think they've had trouble reproducing the problem in general,
so it may just be that the environment is not set up properly to
reproduce the issue.
>
>>>So do we need the hotplug work queue at all? Can we just call
>>>handle_dlpar_errorlog() directly?
>>>
>>>Or are we using the work queue to serialise things? And if so would a
>>>mutex be better?
>>
>> Right, the workqueue is meant to serialize all hotplug events and it
>> gets used for more than just PRRN events. I believe the motivation for
>> using the workqueue over a mutex is that KVM guests initiate hotplug
>> events through the hotplug interrupt and can queue fairly large requests
>> meaning that in this scenario, waiting for a lock would block interrupts
>> for a while.
>
>OK, but that just means that path needs to schedule work to run later.
>
>> Using the workqueue allows us to serialize hotplug events
>> from different sources in the same way without worrying about the
>> context in which the event is generated.
>
>A lock would be so much simpler.
>
>It looks like we have three callers of queue_hotplug_event(), the dlpar
>code, the mobility code and the ras interrupt.
>
>The dlpar code already waits synchronously:
>
> init_completion(&hotplug_done);
> queue_hotplug_event(hp_elog, &hotplug_done, &rc);
> wait_for_completion(&hotplug_done);
>
>You're changing mobility to do the same (this patch), leaving only the
>ras interrupt that actually queues work and returns.
>
>
>So it really seems like a mutex would do the trick, and the ras
>interrupt would be the only case that needs to schedule work for later.
I think you may be right, but I would need some feedback from Nathan
Fontenot before I redesign the queue. He's been thinking about that
design for longer than I have and may know something that I don't
regarding the reason we're using a workqueue rather than a mutex.
Given that the bug this is meant to address is pretty high priority,
would you consider the wait_for_completion an acceptable stopgap while a
more substantial redesign of this code is discussed?
-John
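A userspace analogy may help weigh the design Michael suggests. Synchronous callers (dlpar, mobility) take a lock and run the handler directly, and only the interrupt path defers work. In this sketch a pthread mutex stands in for a kernel mutex and a small array stands in for the workqueue. All names are hypothetical. The sketch is single-threaded for clarity; a real interrupt path would need its own protection for the pending list, or would simply keep using the workqueue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued hotplug error log. */
struct hp_event {
	int id;
};

/* Serialises every hotplug handler, whichever path it comes from. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static int handled; /* number of events processed, for the sketch */

/* Stand-in for handle_dlpar_errorlog(); call with hotplug_lock held. */
static int handle_event_locked(const struct hp_event *ev)
{
	handled++;
	return ev->id >= 0 ? 0 : -1;
}

/* dlpar and mobility paths: run the handler directly under the lock. */
static int hotplug_event_sync(const struct hp_event *ev)
{
	int rc;

	pthread_mutex_lock(&hotplug_lock);
	rc = handle_event_locked(ev);
	pthread_mutex_unlock(&hotplug_lock);
	return rc;
}

/*
 * RAS-interrupt path: must not sleep on the lock, so it only records the
 * event. Not thread-safe as written; see the lead-in.
 */
#define MAX_PENDING 8
static struct hp_event pending[MAX_PENDING];
static size_t npending;

static int hotplug_event_defer(const struct hp_event *ev)
{
	assert(ev != NULL);
	if (npending == MAX_PENDING)
		return -1;
	pending[npending++] = *ev;
	return 0;
}

/* Deferred worker (workqueue stand-in): drains events under the same lock. */
static void hotplug_worker(void)
{
	pthread_mutex_lock(&hotplug_lock);
	for (size_t i = 0; i < npending; i++)
		handle_event_locked(&pending[i]);
	npending = 0;
	pthread_mutex_unlock(&hotplug_lock);
}
```

In this model, ordering between synchronous callers comes from the lock alone. The completion in the current patch achieves a similar effect by making the mobility caller wait for its queued work to finish.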
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-07 20:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
robin.murphy, jean-philippe.brucker, marc.zyngier
In-Reply-To: <20180807135505.GA29034@infradead.org>
On Tue, 2018-08-07 at 06:55 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 04:42:44PM +1000, Benjamin Herrenschmidt wrote:
> > Note that I can make it so that the same DMA ops (basically standard
> > swiotlb ops without arch hacks) work for both "direct virtio" and
> > "normal PCI" devices.
> >
> > The trick is simply in the arch to setup the iommu to map the swiotlb
> > bounce buffer pool 1:1 in the iommu, so the iommu essentially can be
> > ignored without affecting the physical addresses.
> >
> > If I do that, *all* I need is a way, from the guest itself (again, the
> > other side doesn't know anything about it), to force virtio to use the
> > DMA ops as if there was an iommu, that is, use whatever dma ops were
> > setup by the platform for the pci device.
>
> In that case just setting VIRTIO_F_IOMMU_PLATFORM in the flags should
> do the work (even if that isn't strictly what the current definition
> of the flag actually means). On the qemu side you'll need to make
> sure you have a way to set VIRTIO_F_IOMMU_PLATFORM without emulating
> an iommu, but with code to take dma offsets into account if your
> platform has any (various power platforms seem to have them, not sure
> if it affects your config).
Something like that, yes. I prefer a slightly different way (see below),
but in both cases it should alleviate your concerns, since it means
there would be no particular mucking around with DMA ops at all: virtio
would just use whatever "normal" ops we establish for all PCI devices
on that platform, which will be standard ones
(swiotlb ones today and the new "integrated" ones you're cooking
tomorrow).
As for the flag itself, while we could set it from qemu when we get
notified that the guest is going secure, both Michael and I think it's
rather gross: it requires qemu to iterate over all virtio devices and
"poke" something into them.
It also means qemu will need some other internal nasty flag that says
"set that bit but don't do iommu".
It's nicer if we have a way in the guest virtio driver to do something
along the lines of
if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
Which would have the same effect and means the issue is entirely
contained in the guest.
Cheers,
Ben.
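Here is a sketch of the guest-side check Ben describes, under stated assumptions. `arch_virtio_wants_dma_ops()` is the hypothetical hook from the email, modeled here as a flag, and the negotiated features are a plain 64-bit mask. VIRTIO_F_IOMMU_PLATFORM is feature bit 33 in the virtio specification (later renamed VIRTIO_F_ACCESS_PLATFORM). Because it is a bit number rather than a mask, it has to be tested as `1 << 33`, which the kernel's `virtio_has_feature()` does. A literal `flags & VIRTIO_F_IOMMU_PLATFORM` would test the wrong bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number from the virtio spec (later VIRTIO_F_ACCESS_PLATFORM). */
#define VIRTIO_F_IOMMU_PLATFORM 33

static bool has_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64); /* shifting a 64-bit value by >= 64 is undefined */
	return (features >> bit) & 1;
}

/*
 * Hypothetical arch hook from the discussion, modeled as a flag that the
 * platform would set once the guest switches to secure mode.
 */
static bool guest_is_secure;

static bool arch_virtio_wants_dma_ops(void)
{
	return guest_is_secure;
}

/* Use the platform DMA ops if the device asks for them or the arch insists. */
static bool virtio_should_use_dma_api(uint64_t features)
{
	return has_feature(features, VIRTIO_F_IOMMU_PLATFORM) ||
	       arch_virtio_wants_dma_ops();
}
```

Because the decision lives entirely in the guest, qemu needs no new internal flag for this case. That matches the argument made above.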
^ permalink raw reply
* Re: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq ranges
From: Scott Wood @ 2018-08-07 21:03 UTC (permalink / raw)
To: Rob Herring, Bharat Bhushan
Cc: benh, paulus, mpe, galak, mark.rutland, kstewart, gregkh,
devicetree, linuxppc-dev, linux-kernel, keescook, tyreld, joe
In-Reply-To: <20180807180938.GA13623@rob-hp-laptop>
On Tue, 2018-08-07 at 12:09 -0600, Rob Herring wrote:
> On Fri, Jul 27, 2018 at 03:17:59PM +0530, Bharat Bhushan wrote:
> > Freescale MPIC h/w may not support all interrupt sources reported
> > by hardware, "last-interrupt-source" or platform. On these platforms
> > a misconfigured device tree that assigns one of the reserved
> > interrupts leaves a non-functioning system without warning.
>
> There are lots of ways to misconfigure DTs. I don't think this is
> special and needs a property.
Yeah, the system will be just as non-functioning if you specify a valid-but-
wrong-for-the-device interrupt number.
> We've had some interrupt mask or valid
> properties in the past, but generally don't accept those.
FWIW, some of them like protected-sources and mpic-msgr-receive-mask aren't
for detecting errors, but are for partitioning (though the former is obsolete
with pic-no-reset).
-Scott
^ permalink raw reply
* Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
From: Scott Wood @ 2018-08-07 21:13 UTC (permalink / raw)
To: Bharat Bhushan, benh, paulus, mpe, galak, mark.rutland, kstewart,
gregkh, devicetree, linuxppc-dev, linux-kernel
Cc: robh, keescook, tyreld, joe
In-Reply-To: <1532684881-19310-6-git-send-email-Bharat.Bhushan@nxp.com>
On Fri, 2018-07-27 at 15:18 +0530, Bharat Bhushan wrote:
> MPIC on NXP (Freescale) P2020 supports following irq
> ranges:
> > 0 - 11 (External interrupt)
> > 16 - 79 (Internal interrupt)
> > 176 - 183 (Messaging interrupt)
> > 224 - 231 (Shared message signaled interrupt)
Why don't you convert to the 4-cell interrupt specifiers that make dealing
with these ranges less error-prone?
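To make the proposed check concrete, here is a minimal sketch that tests a hardware interrupt number against the P2020 ranges listed above. The struct and helper are hypothetical and are not the code in the RFC's mpic.c changes. Per the patch description, the driver would leave an unsupported source unprogrammed and print a warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range of hardware interrupt source numbers. */
struct irq_range {
	unsigned int first;
	unsigned int last;
};

/* P2020 ranges as listed in the patch description. */
static const struct irq_range p2020_ranges[] = {
	{ 0, 11 },    /* external */
	{ 16, 79 },   /* internal */
	{ 176, 183 }, /* messaging */
	{ 224, 231 }, /* shared message signaled */
};

/* True if hwirq falls inside one of the n supported ranges. */
static bool hwirq_supported(const struct irq_range *r, size_t n,
			    unsigned int hwirq)
{
	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (hwirq >= r[i].first && hwirq <= r[i].last)
			return true;
	return false;
}
```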
> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> index 1006950..49ff348 100644
> --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> @@ -57,6 +57,11 @@ void __init mpc85xx_rdb_pic_init(void)
> MPIC_BIG_ENDIAN |
> MPIC_SINGLE_DEST_CPU,
> 0, 256, " OpenPIC ");
> + } else if (of_machine_is_compatible("fsl,P2020RDB-PC")) {
> + mpic = mpic_alloc(NULL, 0,
> + MPIC_BIG_ENDIAN |
> + MPIC_SINGLE_DEST_CPU,
> + 0, 0, " OpenPIC ");
> } else {
> mpic = mpic_alloc(NULL, 0,
> MPIC_BIG_ENDIAN |
I don't think we want to grow a list of every single revision of every board
in these platform files.
-Scott
^ permalink raw reply
* [PATCH] powerpc/powernv: Add support for NPU2 relaxed-ordering mode
From: Reza Arbab @ 2018-08-08 3:17 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Alistair Popple
From: Alistair Popple <alistair@popple.id.au>
Some device drivers support out of order access to GPU memory. This does
not affect the CPU view of memory but it does affect the GPU view, so it
should only be enabled once the GPU driver has requested it. Add APIs
allowing a driver to do so.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[arbab@linux.ibm.com: Rebase, add commit log]
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
---
arch/powerpc/include/asm/opal-api.h | 4 ++-
arch/powerpc/include/asm/opal.h | 3 ++
arch/powerpc/include/asm/powernv.h | 12 ++++++++
arch/powerpc/platforms/powernv/npu-dma.c | 39 ++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal-wrappers.S | 2 ++
5 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 3bab299..be6fe23e 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -208,7 +208,9 @@
#define OPAL_SENSOR_READ_U64 162
#define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164
#define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165
-#define OPAL_LAST 165
+#define OPAL_NPU_SET_RELAXED_ORDER 168
+#define OPAL_NPU_GET_RELAXED_ORDER 169
+#define OPAL_LAST 169
#define QUIESCE_HOLD 1 /* Spin all calls at entry */
#define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e1b2910..48bea30 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -43,6 +43,9 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
uint64_t PE_handle);
int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
uint64_t rate_phys, uint32_t size);
+int64_t opal_npu_set_relaxed_order(uint64_t phb_id, uint16_t bdfn,
+ bool request_enabled);
+int64_t opal_npu_get_relaxed_order(uint64_t phb_id, uint16_t bdfn);
int64_t opal_console_write(int64_t term_number, __be64 *length,
const uint8_t *buffer);
int64_t opal_console_read(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h
index 2f3ff7a..874ec6d 100644
--- a/arch/powerpc/include/asm/powernv.h
+++ b/arch/powerpc/include/asm/powernv.h
@@ -22,6 +22,8 @@ extern void pnv_npu2_destroy_context(struct npu_context *context,
extern int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
unsigned long *flags, unsigned long *status,
int count);
+int pnv_npu2_request_relaxed_ordering(struct pci_dev *pdev, bool enable);
+int pnv_npu2_get_relaxed_ordering(struct pci_dev *pdev);
void pnv_tm_init(void);
#else
@@ -39,6 +41,16 @@ static inline int pnv_npu2_handle_fault(struct npu_context *context,
return -ENODEV;
}
+static inline int pnv_npu2_request_relaxed_ordering(struct pci_dev *pdev, bool enable)
+{
+ return -ENODEV;
+}
+
+static inline int pnv_npu2_get_relaxed_ordering(struct pci_dev *pdev)
+{
+ return -ENODEV;
+}
+
static inline void pnv_tm_init(void) { }
static inline void pnv_power9_force_smt4(void) { }
#endif
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 8cdf91f..038dc1e 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -27,6 +27,7 @@
#include <asm/pnv-pci.h>
#include <asm/msi_bitmap.h>
#include <asm/opal.h>
+#include <asm/ppc-pci.h>
#include "powernv.h"
#include "pci.h"
@@ -988,3 +989,41 @@ int pnv_npu2_init(struct pnv_phb *phb)
return 0;
}
+
+/*
+ * Request relaxed ordering be enabled or disabled for the given PCI device.
+ * This function may or may not actually enable relaxed ordering depending on
+ * the exact system configuration. Use pnv_npu2_get_relaxed_ordering() below to
+ * determine the current state of relaxed ordering.
+ */
+int pnv_npu2_request_relaxed_ordering(struct pci_dev *pdev, bool enable)
+{
+ struct pci_controller *hose;
+ struct pnv_phb *phb;
+ int rc;
+
+ hose = pci_bus_to_host(pdev->bus);
+ phb = hose->private_data;
+
+ rc = opal_npu_set_relaxed_order(phb->opal_id,
+ PCI_DEVID(pdev->bus->number, pdev->devfn),
+ enable);
+ if (rc != OPAL_SUCCESS && rc != OPAL_CONSTRAINED)
+ return -EPERM;
+
+ return 0;
+}
+EXPORT_SYMBOL(pnv_npu2_request_relaxed_ordering);
+
+int pnv_npu2_get_relaxed_ordering(struct pci_dev *pdev)
+{
+ struct pci_controller *hose;
+ struct pnv_phb *phb;
+
+ hose = pci_bus_to_host(pdev->bus);
+ phb = hose->private_data;
+
+ return opal_npu_get_relaxed_order(phb->opal_id,
+ PCI_DEVID(pdev->bus->number, pdev->devfn));
+}
+EXPORT_SYMBOL(pnv_npu2_get_relaxed_ordering);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a8d9b40..3c72faf 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -327,3 +327,5 @@ OPAL_CALL(opal_npu_tl_set, OPAL_NPU_TL_SET);
OPAL_CALL(opal_pci_get_pbcq_tunnel_bar, OPAL_PCI_GET_PBCQ_TUNNEL_BAR);
OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
OPAL_CALL(opal_sensor_read_u64, OPAL_SENSOR_READ_U64);
+OPAL_CALL(opal_npu_set_relaxed_order, OPAL_NPU_SET_RELAXED_ORDER);
+OPAL_CALL(opal_npu_get_relaxed_order, OPAL_NPU_GET_RELAXED_ORDER);
--
1.8.3.1
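A caller needs two details from this patch, shown here in a userspace sketch. The first is how the `bdfn` argument is built from the bus number and `devfn`, using the same arithmetic as the kernel's `PCI_DEVFN()` and `PCI_DEVID()` macros. The second is how `pnv_npu2_request_relaxed_ordering()` maps OPAL return codes. The `SK_` status values are placeholders, not the real `opal-api.h` numbers; only OPAL_SUCCESS being 0 is assumed. Treating OPAL_CONSTRAINED as success matches the function's comment that the request may not take effect, so a driver should read the state back with `pnv_npu2_get_relaxed_ordering()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * devfn = slot << 3 | func (as PCI_DEVFN());
 * bdfn  = bus << 8 | devfn (as PCI_DEVID()).
 */
static uint16_t sketch_bdfn(unsigned int bus, unsigned int slot,
			    unsigned int func)
{
	assert(bus <= 0xff && slot <= 0x1f && func <= 0x7);
	return (uint16_t)((bus << 8) | (slot << 3) | func);
}

/* Placeholder OPAL status codes; only SUCCESS == 0 mirrors opal-api.h. */
enum {
	SK_OPAL_SUCCESS = 0,
	SK_OPAL_CONSTRAINED = -100,
	SK_OPAL_OTHER_ERROR = -101,
};

/* Mirrors the error mapping in pnv_npu2_request_relaxed_ordering(). */
static int map_set_relaxed_rc(int64_t rc)
{
	if (rc != SK_OPAL_SUCCESS && rc != SK_OPAL_CONSTRAINED)
		return -EPERM;
	return 0;
}
```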
^ permalink raw reply related
* RE: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq ranges
From: Bharat Bhushan @ 2018-08-08 3:37 UTC (permalink / raw)
To: Scott Wood, Rob Herring
Cc: benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
galak@kernel.crashing.org, mark.rutland@arm.com,
kstewart@linuxfoundation.org, gregkh@linuxfoundation.org,
devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, keescook@chromium.org,
tyreld@linux.vnet.ibm.com, joe@perches.com
In-Reply-To: <a20f27b15161164c829f771d2b085639f6709374.camel@buserror.net>
> -----Original Message-----
> From: Scott Wood [mailto:oss@buserror.net]
> Sent: Wednesday, August 8, 2018 2:34 AM
> To: Rob Herring <robh@kernel.org>; Bharat Bhushan
> <bharat.bhushan@nxp.com>
> Cc: benh@kernel.crashing.org; paulus@samba.org; mpe@ellerman.id.au;
> galak@kernel.crashing.org; mark.rutland@arm.com;
> kstewart@linuxfoundation.org; gregkh@linuxfoundation.org;
> devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> kernel@vger.kernel.org; keescook@chromium.org;
> tyreld@linux.vnet.ibm.com; joe@perches.com
> Subject: Re: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq
> ranges
>
> On Tue, 2018-08-07 at 12:09 -0600, Rob Herring wrote:
> > On Fri, Jul 27, 2018 at 03:17:59PM +0530, Bharat Bhushan wrote:
> > > Freescale MPIC h/w may not support all interrupt sources reported by
> > > hardware, "last-interrupt-source" or platform. On these platforms a
> > > misconfigured device tree that assigns one of the reserved
> > > interrupts leaves a non-functioning system without warning.
> >
> > There are lots of ways to misconfigure DTs. I don't think this is
> > special and needs a property.
>
> Yeah, the system will be just as non-functioning if you specify a valid-but-
> wrong-for-the-device interrupt number.

There is one additional benefit of these changes: the MPIC has reserved
regions for unsupported interrupts, and reads/writes to these reserved
regions seem to have no effect. The MPIC driver reads/writes the reserved
regions during init/uninit and when saving/restoring state.

Let me know if it makes sense to have these changes for the reasons
mentioned.

Thanks
-Bharat

>
> > We've had some interrupt mask or valid properties in the past, but
> > generally don't accept those.
>
> FWIW, some of them like protected-sources and mpic-msgr-receive-mask
> aren't for detecting errors, but are for partitioning (though the former is
> obsolete with pic-no-reset).
>
> -Scott
^ permalink raw reply
* RE: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
From: Bharat Bhushan @ 2018-08-08 3:44 UTC (permalink / raw)
To: Scott Wood, benh@kernel.crashing.org, paulus@samba.org,
mpe@ellerman.id.au, galak@kernel.crashing.org,
mark.rutland@arm.com, kstewart@linuxfoundation.org,
gregkh@linuxfoundation.org, devicetree@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Cc: robh@kernel.org, keescook@chromium.org, tyreld@linux.vnet.ibm.com,
joe@perches.com
In-Reply-To: <ab2a113620123cb71364b6ae89328ae2fca49821.camel@buserror.net>
> -----Original Message-----
> From: Scott Wood [mailto:oss@buserror.net]
> Sent: Wednesday, August 8, 2018 2:44 AM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> benh@kernel.crashing.org; paulus@samba.org; mpe@ellerman.id.au;
> galak@kernel.crashing.org; mark.rutland@arm.com;
> kstewart@linuxfoundation.org; gregkh@linuxfoundation.org;
> devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> kernel@vger.kernel.org
> Cc: robh@kernel.org; keescook@chromium.org; tyreld@linux.vnet.ibm.com;
> joe@perches.com
> Subject: Re: [RFC 5/5] powerpc/fsl: Add supported-irq-ranges for P2020
>
> On Fri, 2018-07-27 at 15:18 +0530, Bharat Bhushan wrote:
> > MPIC on NXP (Freescale) P2020 supports following irq
> > ranges:
> > > 0 - 11 (External interrupt)
> > > 16 - 79 (Internal interrupt)
> > > 176 - 183 (Messaging interrupt)
> > > 224 - 231 (Shared message signaled interrupt)
>
> Why don't you convert to the 4-cell interrupt specifiers that make dealing
> with these ranges less error-prone?

OK, will do if we agree to have this series, as per the comment on the
other patch.

>
> > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> > b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> > index 1006950..49ff348 100644
> > --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> > +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> > @@ -57,6 +57,11 @@ void __init mpc85xx_rdb_pic_init(void)
> > MPIC_BIG_ENDIAN |
> > MPIC_SINGLE_DEST_CPU,
> > 0, 256, " OpenPIC ");
> > + } else if (of_machine_is_compatible("fsl,P2020RDB-PC")) {
> > + mpic = mpic_alloc(NULL, 0,
> > + MPIC_BIG_ENDIAN |
> > + MPIC_SINGLE_DEST_CPU,
> > + 0, 0, " OpenPIC ");
> > } else {
> > mpic = mpic_alloc(NULL, 0,
> > MPIC_BIG_ENDIAN |
>
> I don't think we want to grow a list of every single revision of every board in
> these platform files.

One other confusing observation I have is that "irq_count" from platform
code is given precedence over "last-interrupt-source" in the device tree.
Shouldn't the device tree take precedence? Otherwise there is no point in
using "last-interrupt-source" if platform code passes "irq_count" to
mpic_alloc().

Thanks
-Bharat

>
> -Scott
^ permalink raw reply
* [PATCH] powerpc/topology: Check at boot for topology updates
From: Srikar Dronamraju @ 2018-08-08 4:41 UTC (permalink / raw)
To: linuxppc-dev, Michael Ellerman
Cc: Michael Bringmann, Manjunatha H R, Srikar Dronamraju,
Anshuman Khandual
On a shared lpar, Phyp does not update the cpu associativity at boot
time. Just after boot, the system recognizes itself as a shared lpar and
triggers a request for the correct cpu associativity. But by then the
scheduler has already created/destroyed its sched domains.
This causes:
- Broken load balancing across nodes, causing islands of cores.
- Performance degradation, especially if the system is lightly loaded.
- dmesg to wrongly report all cpus to be in Node 0.
- Messages in dmesg saying "arch topology borken".
- With commit 051f3ca02e46432 ("sched/topology: Introduce NUMA identity
  node sched domain"), RCU stalls at boot up.
From a scheduler maintainer's perspective, moving cpus from one node to
another or creating more numa levels after boot is not appropriate
without some notification to the user space.
https://lore.kernel.org/lkml/20150406214558.GA38501@linux.vnet.ibm.com/T/#u
The sched_domains_numa_masks table, which is used to generate cpumasks, is
only created at boot time just before the sched domains are created, and is
never updated. Hence, it's better to get the topology correct before the
sched domains are created.
For example, on a 64-core POWER8 shared lpar, dmesg reports
[ 2.088360] Brought up 512 CPUs
[ 2.088368] Node 0 CPUs: 0-511
[ 2.088371] Node 1 CPUs:
[ 2.088373] Node 2 CPUs:
[ 2.088375] Node 3 CPUs:
[ 2.088376] Node 4 CPUs:
[ 2.088378] Node 5 CPUs:
[ 2.088380] Node 6 CPUs:
[ 2.088382] Node 7 CPUs:
[ 2.088386] Node 8 CPUs:
[ 2.088388] Node 9 CPUs:
[ 2.088390] Node 10 CPUs:
[ 2.088392] Node 11 CPUs:
...
[ 3.916091] BUG: arch topology borken
[ 3.916103] the DIE domain not a subset of the NUMA domain
[ 3.916105] BUG: arch topology borken
[ 3.916106] the DIE domain not a subset of the NUMA domain
...
numactl/lscpu output will still be correct with cores spreading across
all nodes.
Socket(s): 64
NUMA node(s): 12
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
NUMA node10 CPU(s): 152-159,248-255,344-351,440-447
NUMA node11 CPU(s): 160-167,256-263,352-359,448-455
Currently on this lpar, the scheduler detects 2 levels of NUMA and
creates NUMA sched domains for all cpus, but it finds a single DIE
domain consisting of all cpus. Hence it deletes all NUMA sched domains.
To address this, split the topology update init so that the first
part detects vphn/prrn soon after the cpus are set up, and force-updates
the topology just before the scheduler creates the sched domains.
With the fix, dmesg reports
[ 0.491336] numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
[ 0.491351] numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
[ 0.491359] numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
[ 0.491366] numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
[ 0.491374] numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
[ 0.491379] numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
[ 0.491384] numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
[ 0.491389] numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
[ 0.491394] numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
[ 0.491399] numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
[ 0.491404] numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
[ 0.491409] numa: Node 11 CPUs: 160-167 256-263 352-359 448-455
and lscpu would also report
Socket(s): 64
NUMA node(s): 12
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
NUMA node10 CPU(s): 152-159,248-255,344-351,440-447
NUMA node11 CPU(s): 160-167,256-263,352-359,448-455
Previous attempt to solve this problem
https://patchwork.ozlabs.org/patch/530090/
Reported-by: Manjunatha H R <manjuhr1@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/topology.h | 4 ++++
arch/powerpc/kernel/smp.c | 6 ++++++
arch/powerpc/mm/numa.c | 22 ++++++++++++++--------
3 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 16b077801a5f..a14550476bc7 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -92,6 +92,7 @@ extern int stop_topology_update(void);
extern int prrn_is_enabled(void);
extern int find_and_online_cpu_nid(int cpu);
extern int timed_topology_update(int nsecs);
+extern void check_topology_updates(void);
#else
static inline int start_topology_update(void)
{
@@ -113,6 +114,9 @@ static inline int timed_topology_update(int nsecs)
{
return 0;
}
+static inline void check_topology_updates(void)
+{
+}
#endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
#include <asm-generic/topology.h>
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 4794d6b4f4d2..2aa0ffd954c9 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1156,6 +1156,12 @@ void __init smp_cpus_done(unsigned int max_cpus)
if (smp_ops && smp_ops->bringup_done)
smp_ops->bringup_done();
+ /*
+ * On a shared LPAR, associativity needs to be requested.
+ * Hence, check for numa topology updates before dumping
+ * cpu topology
+ */
+ check_topology_updates();
dump_numa_cpu_topology();
/*
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0c7e05d89244..eab46a44436f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1515,6 +1515,7 @@ int start_topology_update(void)
lppaca_shared_proc(get_lppaca())) {
if (!vphn_enabled) {
vphn_enabled = 1;
+ topology_update_needed = 1;
setup_cpu_associativity_change_counters();
timer_setup(&topology_timer, topology_timer_fn,
TIMER_DEFERRABLE);
@@ -1551,6 +1552,19 @@ int prrn_is_enabled(void)
return prrn_enabled;
}
+void check_topology_updates(void)
+{
+ /* Do not poll for changes if disabled at boot */
+ if (topology_updates_enabled)
+ start_topology_update();
+
+ if (topology_update_needed) {
+ bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
+ nr_cpumask_bits);
+ numa_update_cpu_topology(false);
+ }
+}
+
static int topology_read(struct seq_file *file, void *v)
{
if (vphn_enabled || prrn_enabled)
@@ -1597,10 +1611,6 @@ static const struct file_operations topology_ops = {
static int topology_update_init(void)
{
- /* Do not poll for changes if disabled at boot */
- if (topology_updates_enabled)
- start_topology_update();
-
if (vphn_enabled)
topology_schedule_update();
@@ -1608,10 +1618,6 @@ static int topology_update_init(void)
return -ENOMEM;
topology_inited = 1;
- if (topology_update_needed)
- bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
- nr_cpumask_bits);
-
return 0;
}
device_initcall(topology_update_init);
--
2.17.1
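For readers following the ordering change in this patch, here is a minimal user-space model of the new check_topology_updates() flow. It is a sketch only. The bool array standing in for cpu_associativity_changes_mask, the shared_lpar flag standing in for lppaca_shared_proc(get_lppaca()), NR_CPUS, and the counting body of numa_update_cpu_topology() are hypothetical simplifications, not kernel code. The model shows the two effects of the patch. First, start_topology_update() now sets topology_update_needed. Second, check_topology_updates() marks every CPU as changed and rescans it, so the caller, smp_cpus_done(), dumps an already-updated topology.

```c
#include <stdbool.h>

/*
 * User-space model of the flow added by this patch. All of the
 * following are hypothetical stand-ins, not kernel code.
 */
#define NR_CPUS 8                    /* hypothetical CPU count */

static bool topology_updates_enabled = true;
static bool shared_lpar = true;      /* models lppaca_shared_proc(get_lppaca()) */
static bool vphn_enabled;
static bool topology_update_needed;
static bool changed[NR_CPUS];        /* models cpu_associativity_changes_mask */
static int cpus_rescanned;

static void start_topology_update(void)
{
	if (shared_lpar && !vphn_enabled) {
		vphn_enabled = true;
		topology_update_needed = true;  /* the line this patch adds */
	}
}

/* Stand-in: "re-read associativity" for each flagged CPU and count it. */
static void numa_update_cpu_topology(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (changed[cpu]) {
			changed[cpu] = false;
			cpus_rescanned++;
		}
	}
}

/* Called from smp_cpus_done(), before the NUMA topology is dumped. */
void check_topology_updates(void)
{
	/* Do not poll for changes if disabled at boot */
	if (topology_updates_enabled)
		start_topology_update();

	if (topology_update_needed) {
		/* Equivalent of bitmap_fill(): flag every CPU as changed. */
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			changed[cpu] = true;
		numa_update_cpu_topology();
	}
}
```

On a shared LPAR, a single call from the SMP bring-up path both enables VPHN polling and rescans all CPUs. That is why the patch moves this work out of the later device_initcall.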
* Re: [PATCH v6 00/11] hugetlb: Factorize hugetlb architecture primitives
From: Alex Ghiti @ 2018-08-08 5:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-mm, mike.kravetz, linux, catalin.marinas, will.deacon,
tony.luck, fenghua.yu, ralf, paul.burton, jhogan, jejb, deller,
benh, paulus, mpe, ysato, dalias, davem, tglx, mingo, hpa, x86,
arnd, linux-arm-kernel, linux-kernel, linux-ia64, linux-mips,
linux-parisc, linuxppc-dev, linux-sh, sparclinux, linux-arch
In-Reply-To: <20180807095402.GA12200@gmail.com>
Thanks for your time,
Alex
On 07/08/2018 at 09:54, Ingo Molnar wrote:
> * Alexandre Ghiti <alex@ghiti.fr> wrote:
>
>> [CC linux-mm for inclusion in -mm tree]
>>
>> In order to reduce copy/paste of functions across architectures, and thus
>> make the riscv hugetlb port (and future ports) simpler and smaller, this
>> patchset intends to factorize the numerous hugetlb primitives that are
>> defined across all the architectures.
>>
>> Except for prepare_hugepage_range, this patchset moves the versions that
>> are just pass-through to standard pte primitives into
>> asm-generic/hugetlb.h by using the same #ifdef semantic that can be
>> found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
>>
>> The s390 architecture has not been tackled in this series since it does not
>> use asm-generic/hugetlb.h at all.
>>
>> This patchset has been compiled on all addressed architectures with
>> success (except for parisc, but the problem does not come from this
>> series).
>>
>> v6:
>> - Remove nohash/32 and book3s/32 powerpc specific implementations in
>> order to use the generic ones.
>> - Add all the Reviewed-by, Acked-by and Tested-by in the commits,
>> thanks to everyone.
>>
>> v5:
>> As suggested by Mike Kravetz, there is no need to move the #include
>> <asm-generic/hugetlb.h> for the arm and x86 architectures; let it live at
>> the top of the file.
>>
>> v4:
>> Fix a powerpc build error caused by misplacing the #include
>> <asm-generic/hugetlb.h> outside of #ifdef CONFIG_HUGETLB_PAGE, as
>> pointed out by Christophe Leroy.
>>
>> v1, v2, v3:
>> Same version; just problems with the email provider and misuse of the
>> --batch-size option of git send-email.
>>
>> Alexandre Ghiti (11):
>> hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
>> hugetlb: Introduce generic version of hugetlb_free_pgd_range
>> hugetlb: Introduce generic version of set_huge_pte_at
>> hugetlb: Introduce generic version of huge_ptep_get_and_clear
>> hugetlb: Introduce generic version of huge_ptep_clear_flush
>> hugetlb: Introduce generic version of huge_pte_none
>> hugetlb: Introduce generic version of huge_pte_wrprotect
>> hugetlb: Introduce generic version of prepare_hugepage_range
>> hugetlb: Introduce generic version of huge_ptep_set_wrprotect
>> hugetlb: Introduce generic version of huge_ptep_set_access_flags
>> hugetlb: Introduce generic version of huge_ptep_get
>>
>> arch/arm/include/asm/hugetlb-3level.h | 32 +---------
>> arch/arm/include/asm/hugetlb.h | 30 ----------
>> arch/arm64/include/asm/hugetlb.h | 39 +++---------
>> arch/ia64/include/asm/hugetlb.h | 47 ++-------------
>> arch/mips/include/asm/hugetlb.h | 40 +++----------
>> arch/parisc/include/asm/hugetlb.h | 33 +++--------
>> arch/powerpc/include/asm/book3s/32/pgtable.h | 6 --
>> arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
>> arch/powerpc/include/asm/hugetlb.h | 43 ++------------
>> arch/powerpc/include/asm/nohash/32/pgtable.h | 6 --
>> arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
>> arch/sh/include/asm/hugetlb.h | 54 ++---------------
>> arch/sparc/include/asm/hugetlb.h | 40 +++----------
>> arch/x86/include/asm/hugetlb.h | 69 ----------------------
>> include/asm-generic/hugetlb.h | 88 +++++++++++++++++++++++++++-
>> 15 files changed, 135 insertions(+), 394 deletions(-)
> The x86 bits look good to me (assuming it's all tested on all relevant architectures, etc.)
>
> Acked-by: Ingo Molnar <mingo@kernel.org>
>
> Thanks,
>
> Ingo
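The __HAVE_ARCH_*** convention mentioned in the cover letter can be illustrated with a small, self-contained C sketch. The generic header supplies a pass-through default only when the architecture has not defined the corresponding macro. The pte_t layout, the bit masks, and the arch override itself are hypothetical. The macro names follow the convention described in the cover letter, but check them against include/asm-generic/hugetlb.h rather than taking them from here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's pte_t. */
typedef struct { uint64_t val; } pte_t;

/* ---- "arch" header: this architecture overrides one primitive ---- */
#define __HAVE_ARCH_HUGE_PTE_NONE
static inline int huge_pte_none(pte_t pte)
{
	/* Hypothetical arch quirk: ignore software bit 0 when testing "none". */
	return (pte.val & ~UINT64_C(1)) == 0;
}

/* ---- normal (non-huge) pte primitives, simplified ---- */
static inline int pte_none(pte_t pte) { return pte.val == 0; }
static inline pte_t pte_wrprotect(pte_t pte)
{
	pte.val &= ~UINT64_C(2);	/* hypothetical write-permission bit */
	return pte;
}

/* ---- "asm-generic" header: pass-through defaults, used only if not overridden ---- */
#ifndef __HAVE_ARCH_HUGE_PTE_NONE
static inline int huge_pte_none(pte_t pte)
{
	return pte_none(pte);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
static inline pte_t huge_pte_wrprotect(pte_t pte)
{
	return pte_wrprotect(pte);
}
#endif
```

Because the arch header is processed first, its #define wins and the generic fallback compiles out. An architecture whose hugetlb primitive is a plain pass-through simply defines nothing and inherits the generic version.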
* RE: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq ranges
From: Bharat Bhushan @ 2018-08-08 5:57 UTC (permalink / raw)
To: Scott Wood, Rob Herring
Cc: benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
galak@kernel.crashing.org, mark.rutland@arm.com,
kstewart@linuxfoundation.org, gregkh@linuxfoundation.org,
devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, keescook@chromium.org,
tyreld@linux.vnet.ibm.com, joe@perches.com
In-Reply-To: <7526e100693e4db9bb3adf576254b2161086dfe8.camel@buserror.net>
> -----Original Message-----
> From: Scott Wood [mailto:oss@buserror.net]
> Sent: Wednesday, August 8, 2018 11:21 AM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Rob Herring
> <robh@kernel.org>
> Cc: benh@kernel.crashing.org; paulus@samba.org; mpe@ellerman.id.au;
> galak@kernel.crashing.org; mark.rutland@arm.com;
> kstewart@linuxfoundation.org; gregkh@linuxfoundation.org;
> devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> kernel@vger.kernel.org; keescook@chromium.org;
> tyreld@linux.vnet.ibm.com; joe@perches.com
> Subject: Re: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq
> ranges
>
> On Wed, 2018-08-08 at 03:37 +0000, Bharat Bhushan wrote:
> > > -----Original Message-----
> > > From: Scott Wood [mailto:oss@buserror.net]
> > > Sent: Wednesday, August 8, 2018 2:34 AM
> > > To: Rob Herring <robh@kernel.org>; Bharat Bhushan
> > > <bharat.bhushan@nxp.com>
> > > Cc: benh@kernel.crashing.org; paulus@samba.org; mpe@ellerman.id.au;
> > > galak@kernel.crashing.org; mark.rutland@arm.com;
> > > kstewart@linuxfoundation.org; gregkh@linuxfoundation.org;
> > > devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> > > kernel@vger.kernel.org; keescook@chromium.org;
> > > tyreld@linux.vnet.ibm.com; joe@perches.com
> > > Subject: Re: [RFC 3/5] powerpc/mpic: Add support for non-contiguous
> > > irq ranges
> > >
> > > On Tue, 2018-08-07 at 12:09 -0600, Rob Herring wrote:
> > > > On Fri, Jul 27, 2018 at 03:17:59PM +0530, Bharat Bhushan wrote:
> > > > > Freescale MPIC h/w may not support all interrupt sources
> > > > > reported by hardware, "last-interrupt-source" or platform. On
> > > > > these platforms a misconfigured device tree that assigns one of
> > > > > the reserved interrupts leaves a non-functioning system without
> > > > > warning.
> > > >
> > > > There are lots of ways to misconfigure DTs. I don't think this is
> > > > special and needs a property.
> > >
> > > Yeah, the system will be just as non-functioning if you specify a
> > > valid-but-wrong-for-the-device interrupt number.
> >
> > One additional benefit of these changes: the MPIC has reserved
> > regions for unsupported interrupts, and reads/writes to these reserved
> > regions seem to have no effect.
> > The MPIC driver reads/writes the reserved regions during init/uninit
> > and when saving/restoring state.
> >
> > Let me know if it makes sense to have these changes for the reasons
> > mentioned.
>
> The driver has been doing this forever with no ill effect.

Yes, no issues have been reported.

> What is the motivation for this change?

On the simulation model I see a warning when accessing the reserved region,
so this patch is just an effort to improve that.

Thanks
-Bharat

>
> -Scott
* Re: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq ranges
From: Scott Wood @ 2018-08-08 5:50 UTC (permalink / raw)
To: Bharat Bhushan, Rob Herring
Cc: benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
galak@kernel.crashing.org, mark.rutland@arm.com,
kstewart@linuxfoundation.org, gregkh@linuxfoundation.org,
devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, keescook@chromium.org,
tyreld@linux.vnet.ibm.com, joe@perches.com
In-Reply-To: <AM5PR0401MB2545AF363F5DD258205177FC9A260@AM5PR0401MB2545.eurprd04.prod.outlook.com>
On Wed, 2018-08-08 at 03:37 +0000, Bharat Bhushan wrote:
> > -----Original Message-----
> > From: Scott Wood [mailto:oss@buserror.net]
> > Sent: Wednesday, August 8, 2018 2:34 AM
> > To: Rob Herring <robh@kernel.org>; Bharat Bhushan
> > <bharat.bhushan@nxp.com>
> > Cc: benh@kernel.crashing.org; paulus@samba.org; mpe@ellerman.id.au;
> > galak@kernel.crashing.org; mark.rutland@arm.com;
> > kstewart@linuxfoundation.org; gregkh@linuxfoundation.org;
> > devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> > kernel@vger.kernel.org; keescook@chromium.org;
> > tyreld@linux.vnet.ibm.com; joe@perches.com
> > Subject: Re: [RFC 3/5] powerpc/mpic: Add support for non-contiguous irq
> > ranges
> >
> > On Tue, 2018-08-07 at 12:09 -0600, Rob Herring wrote:
> > > On Fri, Jul 27, 2018 at 03:17:59PM +0530, Bharat Bhushan wrote:
> > > > Freescale MPIC h/w may not support all interrupt sources reported by
> > > > hardware, "last-interrupt-source" or platform. On these platforms a
> > > > misconfigured device tree that assigns one of the reserved
> > > > interrupts leaves a non-functioning system without warning.
> > >
> > > There are lots of ways to misconfigure DTs. I don't think this is
> > > special and needs a property.
> >
> > Yeah, the system will be just as non-functioning if you specify a
> > valid-but-wrong-for-the-device interrupt number.
>
> One additional benefit of these changes: the MPIC has reserved regions
> for unsupported interrupts, and reads/writes to these reserved regions
> seem to have no effect.
> The MPIC driver reads/writes the reserved regions during init/uninit and
> when saving/restoring state.
>
> Let me know if it makes sense to have these changes for the reasons mentioned.
The driver has been doing this forever with no ill effect. What is the
motivation for this change?
-Scott
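Here is a rough illustration of the mechanism under discussion. It is not the RFC's actual code. Suppose the implemented interrupt sources were described as a list of ranges. The driver could then skip reserved sources when saving state, instead of touching every source up to last-interrupt-source. The struct irq_range type, the example ranges, NUM_SOURCES, and the regs/saved arrays standing in for MMIO registers are all hypothetical. As Scott notes, the existing driver's accesses to reserved regions have had no observed ill effect, so this only sketches what the change would do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SOURCES 16	/* hypothetical: last-interrupt-source + 1 */

/* Hypothetical description of which sources the hardware implements. */
struct irq_range { unsigned int first, count; };

static const struct irq_range valid_ranges[] = {
	{ 0, 4 },	/* sources 0..3 implemented */
	{ 8, 6 },	/* sources 8..13 implemented; 4..7 and 14..15 reserved */
};

static uint32_t regs[NUM_SOURCES];	/* stand-in for the MMIO register file */
static uint32_t saved[NUM_SOURCES];

static bool source_is_valid(unsigned int src)
{
	for (size_t i = 0; i < sizeof(valid_ranges) / sizeof(valid_ranges[0]); i++)
		if (src >= valid_ranges[i].first &&
		    src < valid_ranges[i].first + valid_ranges[i].count)
			return true;
	return false;
}

/* Save per-source state, never touching reserved sources. Returns the number accessed. */
unsigned int save_sources(void)
{
	unsigned int touched = 0;

	for (unsigned int src = 0; src < NUM_SOURCES; src++) {
		if (!source_is_valid(src))
			continue;	/* reserved: skip the access entirely */
		saved[src] = regs[src];
		touched++;
	}
	return touched;
}
```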