LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [v3 PATCH 0/5] powerpc/pseries: Machien check handler improvements.
From: Mahesh J Salgaonkar @ 2018-06-07 17:27 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Ellerman, stable, Aneesh Kumar K.V, Aneesh Kumar K.V,
	Michael Ellerman, Laurent Dufour, Nicholas Piggin

This patch series includes some improvement to Machine check handler
for pseries. Patch 1 fixes an issue where machine check handler crashes
kernel while accessing vmalloc-ed buffer while in nmi context.
Patch 2 fixes endain bug while restoring of r3 in MCE handler.
Patch 4 dumps the SLB contents on SLB MCE errors to improve the debugability.
Patch 5 display's the MCE error details on console.

CHange in V3:
- Moved patch 5 to patch 2

Change in V2:
- patch 3: Display additional info (NIP and task info) in MCE error details.
- patch 5: Fix endain bug while restoring of r3 in MCE handler.

---

Mahesh Salgaonkar (5):
      powerpc/pseries: convert rtas_log_buf to linear allocation.
      powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
      powerpc/pseries: Define MCE error event section.
      powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
      powerpc/pseries: Display machine check error details.


 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
 arch/powerpc/include/asm/rtas.h               |  109 ++++++++++++++++++
 arch/powerpc/kernel/rtasd.c                   |    2 
 arch/powerpc/mm/slb.c                         |   35 ++++++
 arch/powerpc/platforms/pseries/ras.c          |  155 +++++++++++++++++++++++++
 5 files changed, 299 insertions(+), 3 deletions(-)

--
Signature

^ permalink raw reply

* [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation.
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: stable, Aneesh Kumar K.V, Aneesh Kumar K.V, Michael Ellerman,
	Laurent Dufour, Nicholas Piggin
In-Reply-To: <152839244928.25118.15100234720683911223.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from vmalloc
(non-linear mapping) area.

On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. This patch fixes this issue by allocating rtas_log_buf
using kmalloc.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/rtasd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index f915db93cd42..3957d4ae2ba2 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -559,7 +559,7 @@ static int __init rtas_event_scan_init(void)
 	rtas_error_log_max = rtas_get_error_log_max();
 	rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);

-	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
+	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);
 	if (!rtas_log_buf) {
 		printk(KERN_ERR "rtasd: no memory\n");
 		return -ENOMEM;

^ permalink raw reply related

* [v3 PATCH 2/5] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: stable, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour,
	Nicholas Piggin
In-Reply-To: <152839244928.25118.15100234720683911223.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.

Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.

[   62.878965] Severe Machine check interrupt [Recovered]
[   62.878968]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[   62.878969]   Initiator: CPU
[   62.878970]   Error type: SLB [Multihit]
[   62.878971]     Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
    pc: c0000000009694c0: vsnprintf+0x80/0x480
    lr: c0000000009698e0: vscnprintf+0x20/0x60
    sp: c0000000fc777830
   msr: 8000000002009033
   dar: a803a30c000000d0
  current = 0xc00000000bc9ef00
  paca    = 0xc00000001eca5c00	 softe: 3	 irq_happened: 0x01
    pid   = 8860, comm = insmod
[c0000000fc7778b0] c0000000009698e0 vscnprintf+0x20/0x60
[c0000000fc7778e0] c00000000016b6c4 vprintk_emit+0xb4/0x4b0
[c0000000fc777960] c00000000016d40c vprintk_func+0x5c/0xd0
[c0000000fc777980] c00000000016cbb4 printk+0x38/0x4c
[c0000000fc7779a0] d00000000ca301c0 init_module+0x1c0/0x338 [bork_kernel]
[c0000000fc777a40] c00000000000d9c4 do_one_initcall+0x54/0x230
[c0000000fc777b00] c0000000001b3b74 do_init_module+0x8c/0x248
[c0000000fc777b90] c0000000001b2478 load_module+0x12b8/0x15b0
[c0000000fc777d30] c0000000001b29e8 sys_finit_module+0xa8/0x110
[c0000000fc777e30] c00000000000b204 system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace

This patch fixes this issue.

Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/ras.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 5e1ef9150182..2edc673be137 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -360,7 +360,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 	}
 
 	savep = __va(regs->gpr[3]);
-	regs->gpr[3] = savep[0];	/* restore original r3 */
+	regs->gpr[3] = be64_to_cpu(savep[0]);	/* restore original r3 */
 
 	/* If it isn't an extended log we can use the per cpu 64bit buffer */
 	h = (struct rtas_error_log *)&savep[1];

^ permalink raw reply related

* [v3 PATCH 3/5] powerpc/pseries: Define MCE error event section.
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour,
	Nicholas Piggin
In-Reply-To: <152839244928.25118.15100234720683911223.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

On pseries, the machine check error details are part of RTAS extended
event log passed under Machine check exception section. This patch adds
the definition of rtas MCE event section and related helper
functions.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  104 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index ec9dd79398ee..3f2fba7ef23b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -275,6 +275,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_CALL_HOME		(('C' << 8) | 'H')
 #define PSERIES_ELOG_SECT_ID_USER_DEF		(('U' << 8) | 'D')
 #define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
+#define PSERIES_ELOG_SECT_ID_MCE		(('M' << 8) | 'C')
 
 /* Vendor specific Platform Event Log Format, Version 6, section header */
 struct pseries_errorlog {
@@ -326,6 +327,109 @@ struct pseries_hp_errorlog {
 #define PSERIES_HP_ELOG_ID_DRC_COUNT	3
 #define PSERIES_HP_ELOG_ID_DRC_IC	4
 
+/* RTAS pseries MCE errorlog section */
+#pragma pack(push, 1)
+struct pseries_mc_errorlog {
+	__be32	fru_id;
+	__be32	proc_id;
+	uint8_t	error_type;
+	union {
+		struct {
+			uint8_t	ue_err_type;
+			/* XXXXXXXX
+			 * X		1: Permanent or Transient UE.
+			 *  X		1: Effective address provided.
+			 *   X		1: Logical address provided.
+			 *    XX	2: Reserved.
+			 *      XXX	3: Type of UE error.
+			 */
+			uint8_t	reserved_1[6];
+			__be64	effective_address;
+			__be64	logical_address;
+		} ue_error;
+		struct {
+			uint8_t	soft_err_type;
+			/* XXXXXXXX
+			 * X		1: Effective address provided.
+			 *  XXXXX	5: Reserved.
+			 *       XX	2: Type of SLB/ERAT/TLB error.
+			 */
+			uint8_t	reserved_1[6];
+			__be64	effective_address;
+			uint8_t	reserved_2[8];
+		} soft_error;
+	} u;
+};
+#pragma pack(pop)
+
+/* RTAS pseries MCE error types */
+#define PSERIES_MC_ERROR_TYPE_UE		0x00
+#define PSERIES_MC_ERROR_TYPE_SLB		0x01
+#define PSERIES_MC_ERROR_TYPE_ERAT		0x02
+#define PSERIES_MC_ERROR_TYPE_TLB		0x04
+#define PSERIES_MC_ERROR_TYPE_D_CACHE		0x05
+#define PSERIES_MC_ERROR_TYPE_I_CACHE		0x07
+
+/* RTAS pseries MCE error sub types */
+#define PSERIES_MC_ERROR_UE_INDETERMINATE		0
+#define PSERIES_MC_ERROR_UE_IFETCH			1
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_IFETCH	2
+#define PSERIES_MC_ERROR_UE_LOAD_STORE			3
+#define PSERIES_MC_ERROR_UE_PAGE_TABLE_WALK_LOAD_STORE	4
+
+#define PSERIES_MC_ERROR_SLB_PARITY		0
+#define PSERIES_MC_ERROR_SLB_MULTIHIT		1
+#define PSERIES_MC_ERROR_SLB_INDETERMINATE	2
+
+#define PSERIES_MC_ERROR_ERAT_PARITY		1
+#define PSERIES_MC_ERROR_ERAT_MULTIHIT		2
+#define PSERIES_MC_ERROR_ERAT_INDETERMINATE	3
+
+#define PSERIES_MC_ERROR_TLB_PARITY		1
+#define PSERIES_MC_ERROR_TLB_MULTIHIT		2
+#define PSERIES_MC_ERROR_TLB_INDETERMINATE	3
+
+static inline uint8_t rtas_mc_error_type(const struct pseries_mc_errorlog *mlog)
+{
+	return mlog->error_type;
+}
+
+static inline uint8_t rtas_mc_error_sub_type(
+					const struct pseries_mc_errorlog *mlog)
+{
+	switch (mlog->error_type) {
+	case	PSERIES_MC_ERROR_TYPE_UE:
+		return (mlog->u.ue_error.ue_err_type & 0x07);
+	case	PSERIES_MC_ERROR_TYPE_SLB:
+	case	PSERIES_MC_ERROR_TYPE_ERAT:
+	case	PSERIES_MC_ERROR_TYPE_TLB:
+		return (mlog->u.soft_error.soft_err_type & 0x03);
+	default:
+		return 0;
+	}
+}
+
+static inline uint64_t rtas_mc_get_effective_addr(
+					const struct pseries_mc_errorlog *mlog)
+{
+	uint64_t addr = 0;
+
+	switch (mlog->error_type) {
+	case	PSERIES_MC_ERROR_TYPE_UE:
+		if (mlog->u.ue_error.ue_err_type & 0x40)
+			addr = mlog->u.ue_error.effective_address;
+		break;
+	case	PSERIES_MC_ERROR_TYPE_SLB:
+	case	PSERIES_MC_ERROR_TYPE_ERAT:
+	case	PSERIES_MC_ERROR_TYPE_TLB:
+		if (mlog->u.soft_error.soft_err_type & 0x80)
+			addr = mlog->u.soft_error.effective_address;
+	default:
+		break;
+	}
+	return be64_to_cpu(addr);
+}
+
 struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 					      uint16_t section_id);
 

^ permalink raw reply related

* [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
From: Mahesh J Salgaonkar @ 2018-06-07 17:28 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K.V, Michael Ellerman, Aneesh Kumar K.V,
	Michael Ellerman, Laurent Dufour, Nicholas Piggin
In-Reply-To: <152839244928.25118.15100234720683911223.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

If we get a machine check exceptions due to SLB errors then dump the
current SLB contents which will be very much helpful in debugging the
root cause of SLB errors. On pseries, as of today system crashes on SLB
errors. These are soft errors and can be fixed by flushing the SLBs so
the kernel can continue to function instead of system crash. This patch
fixes that also.

With this patch the console will log SLB contents like below on SLB MCE
errors:

[  822.711728] slb contents:
[  822.711730] 00 c000000008000000 400ea1b217000500
[  822.711731]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
[  822.711732] 01 d000000008000000 400d43642f000510
[  822.711733]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[  822.711734] 09 f000000008000000 400a86c85f000500
[  822.711736]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
[  822.711737] 10 00007f0008000000 400d1f26e3000d90
[  822.711738]   1T  ESID=       7f  VSID=      d1f26e3 LLP:110
[  822.711739] 11 0000000018000000 000e3615f520fd90
[  822.711740]  256M ESID=        1  VSID=   e3615f520f LLP:110
[  822.711740] 12 d000000008000000 400d43642f000510
[  822.711741]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[  822.711742] 13 d000000008000000 400d43642f000510
[  822.711743]   1T  ESID=   d00000  VSID=      d43642f LLP:110


Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 +
 arch/powerpc/mm/slb.c                         |   35 +++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/ras.c          |   29 ++++++++++++++++++++-
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..c0da68927235 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -487,6 +487,7 @@ extern void hpte_init_native(void);
 
 extern void slb_initialize(void);
 extern void slb_flush_and_rebolt(void);
+extern void slb_dump_contents(void);
 
 extern void slb_vmalloc_update(void);
 extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 66577cc66dc9..799aa117cec3 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -145,6 +145,41 @@ void slb_flush_and_rebolt(void)
 	get_paca()->slb_cache_ptr = 0;
 }
 
+void slb_dump_contents(void)
+{
+	int i;
+	unsigned long e, v;
+	unsigned long llp;
+
+	pr_err("slb contents:\n");
+	for (i = 0; i < mmu_slb_size; i++) {
+		asm volatile("slbmfee  %0,%1" : "=r" (e) : "r" (i));
+		asm volatile("slbmfev  %0,%1" : "=r" (v) : "r" (i));
+
+		if (!e && !v)
+			continue;
+
+		pr_err("%02d %016lx %016lx", i, e, v);
+
+		if (!(e & SLB_ESID_V)) {
+			pr_err("\n");
+			continue;
+		}
+		llp = v & SLB_VSID_LLP;
+		if (v & SLB_VSID_B_1T) {
+			pr_err("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx\n",
+				GET_ESID_1T(e),
+				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
+				llp);
+		} else {
+			pr_err(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx\n",
+				GET_ESID(e),
+				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
+				llp);
+		}
+	}
+}
+
 void slb_vmalloc_update(void)
 {
 	unsigned long vflags;
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 2edc673be137..e56759d92356 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 	return 0; /* need to perform reset */
 }
 
+static int mce_handle_error(struct rtas_error_log *errp)
+{
+	struct pseries_errorlog *pseries_log;
+	struct pseries_mc_errorlog *mce_log;
+	int disposition = rtas_error_disposition(errp);
+	uint8_t error_type;
+
+	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+	if (pseries_log == NULL)
+		goto out;
+
+	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+	error_type = rtas_mc_error_type(mce_log);
+
+	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
+			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
+		slb_dump_contents();
+		slb_flush_and_rebolt();
+		disposition = RTAS_DISP_FULLY_RECOVERED;
+	}
+
+out:
+	return disposition;
+}
+
 /*
  * See if we can recover from a machine check exception.
  * This is only called on power4 (or above) and only via
@@ -434,7 +459,9 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 {
 	int recovered = 0;
-	int disposition = rtas_error_disposition(err);
+	int disposition;
+
+	disposition = mce_handle_error(err);
 
 	if (!(regs->msr & MSR_RI)) {
 		/* If MSR_RI isn't set, we cannot recover */

^ permalink raw reply related

* [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.
From: Mahesh J Salgaonkar @ 2018-06-07 17:29 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour,
	Nicholas Piggin
In-Reply-To: <152839244928.25118.15100234720683911223.stgit@jupiter.in.ibm.com>

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Extract the MCE error details from RTAS extended log and display it to
console.

With this patch you should now see mce logs like below:

[  142.371818] Severe Machine check interrupt [Recovered]
[  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[  142.371822]   Initiator: CPU
[  142.371823]   Error type: SLB [Multihit]
[  142.371824]     Effective address: d00000000ca70000

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h      |    5 +
 arch/powerpc/platforms/pseries/ras.c |  128 +++++++++++++++++++++++++++++++++-
 2 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 3f2fba7ef23b..8100a95c133a 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -190,6 +190,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
 	return (elog->byte1 & 0x04) >> 2;
 }
 
+static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
+{
+	return (elog->byte2 & 0xf0) >> 4;
+}
+
 #define rtas_error_type(x)	((x)->byte3)
 
 static inline
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index e56759d92356..cd9446980092 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -422,7 +422,130 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
 	return 0; /* need to perform reset */
 }
 
-static int mce_handle_error(struct rtas_error_log *errp)
+#define VAL_TO_STRING(ar, val)	((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
+
+static void pseries_print_mce_info(struct pt_regs *regs,
+				struct rtas_error_log *errp, int disposition)
+{
+	const char *level, *sevstr;
+	struct pseries_errorlog *pseries_log;
+	struct pseries_mc_errorlog *mce_log;
+	uint8_t error_type, err_sub_type;
+	uint8_t initiator = rtas_error_initiator(errp);
+	uint64_t addr;
+
+	static const char * const initiators[] = {
+		"Unknown",
+		"CPU",
+		"PCI",
+		"ISA",
+		"Memory",
+		"Power Mgmt",
+	};
+	static const char * const mc_err_types[] = {
+		"UE",
+		"SLB",
+		"ERAT",
+		"TLB",
+		"D-Cache",
+		"Unknown",
+		"I-Cache",
+	};
+	static const char * const mc_ue_types[] = {
+		"Indeterminate",
+		"Instruction fetch",
+		"Page table walk ifetch",
+		"Load/Store",
+		"Page table walk Load/Store",
+	};
+
+	/* SLB sub errors valid values are 0x0, 0x1, 0x2 */
+	static const char * const mc_slb_types[] = {
+		"Parity",
+		"Multihit",
+		"Indeterminate",
+	};
+
+	/* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
+	static const char * const mc_soft_types[] = {
+		"Unknown",
+		"Parity",
+		"Multihit",
+		"Indeterminate",
+	};
+
+	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
+	if (pseries_log == NULL)
+		return;
+
+	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
+
+	error_type = rtas_mc_error_type(mce_log);
+	err_sub_type = rtas_mc_error_sub_type(mce_log);
+
+	switch (rtas_error_severity(errp)) {
+	case RTAS_SEVERITY_NO_ERROR:
+		level = KERN_INFO;
+		sevstr = "Harmless";
+		break;
+	case RTAS_SEVERITY_WARNING:
+		level = KERN_WARNING;
+		sevstr = "";
+		break;
+	case RTAS_SEVERITY_ERROR:
+	case RTAS_SEVERITY_ERROR_SYNC:
+		level = KERN_ERR;
+		sevstr = "Severe";
+		break;
+	case RTAS_SEVERITY_FATAL:
+	default:
+		level = KERN_ERR;
+		sevstr = "Fatal";
+		break;
+	}
+
+	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
+		disposition == RTAS_DISP_FULLY_RECOVERED ?
+		"Recovered" : "Not recovered");
+	if (user_mode(regs)) {
+		printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
+			regs->nip, current->pid, current->comm);
+	} else {
+		printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
+			(void *)regs->nip);
+	}
+	printk("%s  Initiator: %s\n", level,
+				VAL_TO_STRING(initiators, initiator));
+
+	switch (error_type) {
+	case PSERIES_MC_ERROR_TYPE_UE:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_ue_types, err_sub_type));
+		break;
+	case PSERIES_MC_ERROR_TYPE_SLB:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_slb_types, err_sub_type));
+		break;
+	case PSERIES_MC_ERROR_TYPE_ERAT:
+	case PSERIES_MC_ERROR_TYPE_TLB:
+		printk("%s  Error type: %s [%s]\n", level,
+			VAL_TO_STRING(mc_err_types, error_type),
+			VAL_TO_STRING(mc_soft_types, err_sub_type));
+		break;
+	default:
+		printk("%s  Error type: %s\n", level,
+			VAL_TO_STRING(mc_err_types, error_type));
+		break;
+	}
+
+	addr = rtas_mc_get_effective_addr(mce_log);
+	if (addr)
+		printk("%s    Effective address: %016llx\n", level, addr);
+}
+
+static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 {
 	struct pseries_errorlog *pseries_log;
 	struct pseries_mc_errorlog *mce_log;
@@ -442,6 +565,7 @@ static int mce_handle_error(struct rtas_error_log *errp)
 		slb_flush_and_rebolt();
 		disposition = RTAS_DISP_FULLY_RECOVERED;
 	}
+	pseries_print_mce_info(regs, errp, disposition);
 
 out:
 	return disposition;
@@ -461,7 +585,7 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 	int recovered = 0;
 	int disposition;
 
-	disposition = mce_handle_error(err);
+	disposition = mce_handle_error(regs, err);
 
 	if (!(regs->msr & MSR_RI)) {
 		/* If MSR_RI isn't set, we cannot recover */

^ permalink raw reply related

* UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13
From: Mathieu Malaterre @ 2018-06-07 18:26 UTC (permalink / raw)
  To: linuxppc-dev

Hi there,

I have a reproducible UBSAN appearing in dmesg after a while on my G4
(*). Could anyone suggest a way to diagnose the actual root issue here
(or is it just a false positive) ?

Thanks,

(*)
[41877.514338] ================================================================================
[41877.514364] UBSAN: Undefined behaviour in
../include/linux/percpu_counter.h:137:13
[41877.514373] signed integer overflow:
[41877.514378] 9223352809007201260 + 41997676517838 cannot be
represented in type 'long long int'
[41877.514389] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
[41877.514394] Call Trace:
[41877.514411] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
[41877.514422] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
[41877.514437] [dffeddc0] [c043aaa8] cfq_completed_request+0x560/0x1234
[41877.514446] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
[41877.514460] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
[41877.514469] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
[41877.514482] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
[41877.514496] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
[41877.514508] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
[41877.514520] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
[41877.514533] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
[41877.514541] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
[41877.514549] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
                   LR = arch_cpu_idle+0x30/0x78
[41877.514558] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
[41877.514570] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
[41877.514577] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
[41877.514585] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
[41877.514592] [c0ce5ff0] [00003444] 0x3444
[41877.514597] ================================================================================
[41886.390210] ================================================================================
[41886.390236] UBSAN: Undefined behaviour in
../include/linux/percpu_counter.h:137:13
[41886.390245] signed integer overflow:
[41886.390250] 9223366156262940402 + 42006563339289 cannot be
represented in type 'long long int'
[41886.390260] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
[41886.390265] Call Trace:
[41886.390282] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
[41886.390293] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
[41886.390309] [dffeddc0] [c043a8c4] cfq_completed_request+0x37c/0x1234
[41886.390317] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
[41886.390331] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
[41886.390340] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
[41886.390353] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
[41886.390367] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
[41886.390379] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
[41886.390391] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
[41886.390404] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
[41886.390411] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
[41886.390420] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
                   LR = arch_cpu_idle+0x30/0x78
[41886.390429] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
[41886.390441] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
[41886.390449] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
[41886.390457] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
[41886.390463] [c0ce5ff0] [00003444] 0x3444
[41886.390468] ================================================================================

^ permalink raw reply

* Re: [RFC PATCH 0/4] powerpc/pseries: Machien check handler improvements.
From: Michal Suchánek @ 2018-06-07 19:43 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Laurent Dufour, Aneesh Kumar K.V
In-Reply-To: <152825972491.24560.4626614298142965406.stgit@jupiter.in.ibm.com>

On Wed, 06 Jun 2018 10:06:23 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> This patch series includes some improvement to Machine check handler
> for pseries. Patch 1 fixes an issue where machine check handler
> crashes kernel while accessing vmalloc-ed buffer while in nmi context.
> Patch 3 dumps the SLB contents on SLB MCE errors to improve the
> debugability. Patch 4 display's the MCE error details on console.
>=20
> ---
>=20
> Mahesh Salgaonkar (4):
>       powerpc/pseries: convert rtas_log_buf to linear allocation.
>       powerpc/pseries: Define MCE error event section.
>       powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
>       powerpc/pseries: Display machine check error details.

Tested-by: Michal Such=C3=A1nek <msuchanek@suse.de>

Thanks

Michal

^ permalink raw reply

* Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
From: Benjamin Herrenschmidt @ 2018-06-07 21:54 UTC (permalink / raw)
  To: Alex Williamson, Alexey Kardashevskiy
  Cc: linuxppc-dev, David Gibson, kvm-ppc, Ram Pai, kvm,
	Alistair Popple
In-Reply-To: <20180607110409.5057ebac@w520.home>

On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:
> 
> Can we back up and discuss whether the IOMMU grouping of NVLink
> connected devices makes sense?  AIUI we have a PCI view of these
> devices and from that perspective they're isolated.  That's the view of
> the device used to generate the grouping.  However, not visible to us,
> these devices are interconnected via NVLink.  What isolation properties
> does NVLink provide given that its entire purpose for existing seems to
> be to provide a high performance link for p2p between devices?

Not entire. On POWER chips, we also have an nvlink between the device
and the CPU which is running significantly faster than PCIe.

But yes, there are cross-links and those should probably be accounted
for in the grouping.

> > Each bridge represents an additional hardware interface called "NVLink2",
> > it is not a PCI link but separate but. The design inherits from original
> > NVLink from POWER8.
> > 
> > The new feature of V100 is 16GB of cache coherent memory on GPU board.
> > This memory is presented to the host via the device tree and remains offline
> > until the NVIDIA driver loads, trains NVLink2 (via the config space of these
> > bridges above) and the nvidia-persistenced daemon then onlines it.
> > The memory remains online as long as nvidia-persistenced is running, when
> > it stops, it offlines the memory.
> > 
> > The amount of GPUs suggest passing them through to a guest. However,
> > in order to do so we cannot use the NVIDIA driver so we have a host with
> > a 128GB window (bigger or equal to actual GPU RAM size) in a system memory
> > with no page structs backing this window and we cannot touch this memory
> > before the NVIDIA driver configures it in a host or a guest as
> > HMI (hardware management interrupt?) occurs.
> 
> Having a lot of GPUs only suggests assignment to a guest if there's
> actually isolation provided between those GPUs.  Otherwise we'd need to
> assign them as one big group, which gets a lot less useful.  Thanks,
> 
> Alex
> 
> > On the example system the GPU RAM windows are located at:
> > 0x0400 0000 0000
> > 0x0420 0000 0000
> > 0x0440 0000 0000
> > 0x2400 0000 0000
> > 0x2420 0000 0000
> > 0x2440 0000 0000
> > 
> > So the complications are:
> > 
> > 1. cannot touch the GPU memory till it is trained, i.e. cannot add ptes
> > to VFIO-to-userspace or guest-to-host-physical translations till
> > the driver trains it (i.e. nvidia-persistenced has started), otherwise
> > prefetching happens and HMI occurs; I am trying to get this changed
> > somehow;
> > 
> > 2. since it appears as normal cache coherent memory, it will be used
> > for DMA which means it has to be pinned and mapped in the host. Having
> > no page structs makes it different from the usual case - we only need
> > translate user addresses to host physical and map GPU RAM memory but
> > pinning is not required.
> > 
> > This series maps GPU RAM via the GPU vfio-pci device so QEMU can then
> > register this memory as a KVM memory slot and present memory nodes to
> > the guest. Unless NVIDIA provides an userspace driver, this is no use
> > for things like DPDK.
> > 
> > 
> > There is another problem which the series does not address but worth
> > mentioning - it is not strictly necessary to map GPU RAM to the guest
> > exactly where it is in the host (I tested this to some extent), we still
> > might want to represent the memory at the same offset as on the host
> > which increases the size of a TCE table needed to cover such a huge
> > window: (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB
> > I am addressing this in a separate patchset by allocating indirect TCE
> > levels on demand and using 16MB IOMMU pages in the guest as we can now
> > back emulated pages with the smaller hardware ones.
> > 
> > 
> > This is an RFC. Please comment. Thanks.
> > 
> > 
> > 
> > Alexey Kardashevskiy (5):
> >   vfio/spapr_tce: Simplify page contained test
> >   powerpc/iommu_context: Change referencing in API
> >   powerpc/iommu: Do not pin memory of a memory device
> >   vfio_pci: Allow mapping extra regions
> >   vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver
> > 
> >  drivers/vfio/pci/Makefile              |   1 +
> >  arch/powerpc/include/asm/mmu_context.h |   5 +-
> >  drivers/vfio/pci/vfio_pci_private.h    |  11 ++
> >  include/uapi/linux/vfio.h              |   3 +
> >  arch/powerpc/kernel/iommu.c            |   8 +-
> >  arch/powerpc/mm/mmu_context_iommu.c    |  70 +++++++++---
> >  drivers/vfio/pci/vfio_pci.c            |  19 +++-
> >  drivers/vfio/pci/vfio_pci_nvlink2.c    | 190 +++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio_iommu_spapr_tce.c    |  42 +++++---
> >  drivers/vfio/pci/Kconfig               |   4 +
> >  10 files changed, 319 insertions(+), 34 deletions(-)
> >  create mode 100644 drivers/vfio/pci/vfio_pci_nvlink2.c
> > 

^ permalink raw reply

* Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
From: Alex Williamson @ 2018-06-07 22:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, linuxppc-dev, David Gibson, kvm-ppc,
	Ram Pai, kvm, Alistair Popple
In-Reply-To: <e35a7bbea8b82c17f93eb6eb438df38a94097f2d.camel@kernel.crashing.org>

On Fri, 08 Jun 2018 07:54:02 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:
> > 
> > Can we back up and discuss whether the IOMMU grouping of NVLink
> > connected devices makes sense?  AIUI we have a PCI view of these
> > devices and from that perspective they're isolated.  That's the view of
> > the device used to generate the grouping.  However, not visible to us,
> > these devices are interconnected via NVLink.  What isolation properties
> > does NVLink provide given that its entire purpose for existing seems to
> > be to provide a high performance link for p2p between devices?  
> 
> Not entire. On POWER chips, we also have an nvlink between the device
> and the CPU which is running significantly faster than PCIe.
> 
> But yes, there are cross-links and those should probably be accounted
> for in the grouping.

Then after we fix the grouping, can we just let the host driver manage
this coherent memory range and expose vGPUs to guests?  The use case of
assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
convince NVIDIA to support more than a single vGPU per VM though)
Thanks,

Alex

^ permalink raw reply

* Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
From: Benjamin Herrenschmidt @ 2018-06-07 23:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alexey Kardashevskiy, linuxppc-dev, David Gibson, kvm-ppc,
	Ram Pai, kvm, Alistair Popple
In-Reply-To: <20180607161541.21df6434@w520.home>

On Thu, 2018-06-07 at 16:15 -0600, Alex Williamson wrote:
> On Fri, 08 Jun 2018 07:54:02 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:
> > > 
> > > Can we back up and discuss whether the IOMMU grouping of NVLink
> > > connected devices makes sense?  AIUI we have a PCI view of these
> > > devices and from that perspective they're isolated.  That's the view of
> > > the device used to generate the grouping.  However, not visible to us,
> > > these devices are interconnected via NVLink.  What isolation properties
> > > does NVLink provide given that its entire purpose for existing seems to
> > > be to provide a high performance link for p2p between devices?  
> > 
> > Not entire. On POWER chips, we also have an nvlink between the device
> > and the CPU which is running significantly faster than PCIe.
> > 
> > But yes, there are cross-links and those should probably be accounted
> > for in the grouping.
> 
> Then after we fix the grouping, can we just let the host driver manage
> this coherent memory range and expose vGPUs to guests?  The use case of
> assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
> convince NVIDIA to support more than a single vGPU per VM though)
> Thanks,

I don't know about "vGPUs" and what nVidia may be cooking in that area.

The patched from Alexey allow for passing through the full thing, but
they aren't trivial (there are additional issues, I'm not sure how
covered they are, as we need to pay with the mapping attributes of
portions of the GPU memory on the host side...).

Note: The cross-links are only per-socket so that would be 2 groups of
3.

We *can* allow individual GPUs to be passed through, either if somebody
designs a system without cross links, or if the user is ok with the
security risk as the guest driver will not enable them if it doesn't
"find" both sides of them.

Cheers,
Ben.

^ permalink raw reply

* Re: linux-next: manual merge of the powerpc tree with the kbuild tree
From: Stephen Rothwell @ 2018-06-07 23:52 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Michael Ellerman, Benjamin Herrenschmidt, PowerPC,
	Linux-Next Mailing List, Linux Kernel Mailing List,
	Nicholas Piggin, Naveen N. Rao
In-Reply-To: <20180531093216.5c91aa2b@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]

Hi all,

On Thu, 31 May 2018 09:32:16 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Today's linux-next merge of the powerpc tree got a conflict in:
> 
>   arch/powerpc/kernel/module_64.c
> 
> between commit:
> 
>   06aeb9e3f2bc ("powerpc/kbuild: move -mprofile-kernel check to Kconfig")
> 
> from the kbuild tree and commit:
> 
>   250122baed29 ("powerpc64/module: Tighten detection of mcount call sites with -mprofile-kernel")
> 
> from the powerpc tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc arch/powerpc/kernel/module_64.c
> index 55bccc315e1a,f7667e2ebfcb..000000000000
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@@ -462,9 -466,12 +466,12 @@@ static unsigned long stub_for_addr(cons
>   	return (unsigned long)&stubs[i];
>   }
>   
>  -#ifdef CC_USING_MPROFILE_KERNEL
>  +#ifdef CONFIG_MPROFILE_KERNEL
> - static bool is_early_mcount_callsite(u32 *instruction)
> + static bool is_mprofile_mcount_callsite(const char *name, u32 *instruction)
>   {
> + 	if (strcmp("_mcount", name))
> + 		return false;
> + 
>   	/*
>   	 * Check if this is one of the -mprofile-kernel sequences.
>   	 */

This is now a conflict between the kbuild tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
From: Alex Williamson @ 2018-06-08  0:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, linuxppc-dev, David Gibson, kvm-ppc,
	Ram Pai, kvm, Alistair Popple
In-Reply-To: <33590885d138195c8ede78b588ddb03b132267fd.camel@kernel.crashing.org>

On Fri, 08 Jun 2018 09:20:30 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2018-06-07 at 16:15 -0600, Alex Williamson wrote:
> > On Fri, 08 Jun 2018 07:54:02 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >   
> > > On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:  
> > > > 
> > > > Can we back up and discuss whether the IOMMU grouping of NVLink
> > > > connected devices makes sense?  AIUI we have a PCI view of these
> > > > devices and from that perspective they're isolated.  That's the view of
> > > > the device used to generate the grouping.  However, not visible to us,
> > > > these devices are interconnected via NVLink.  What isolation properties
> > > > does NVLink provide given that its entire purpose for existing seems to
> > > > be to provide a high performance link for p2p between devices?    
> > > 
> > > Not entire. On POWER chips, we also have an nvlink between the device
> > > and the CPU which is running significantly faster than PCIe.
> > > 
> > > But yes, there are cross-links and those should probably be accounted
> > > for in the grouping.  
> > 
> > Then after we fix the grouping, can we just let the host driver manage
> > this coherent memory range and expose vGPUs to guests?  The use case of
> > assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
> > convince NVIDIA to support more than a single vGPU per VM though)
> > Thanks,  
> 
> I don't know about "vGPUs" and what nVidia may be cooking in that area.
> 
> The patched from Alexey allow for passing through the full thing, but
> they aren't trivial (there are additional issues, I'm not sure how
> covered they are, as we need to pay with the mapping attributes of
> portions of the GPU memory on the host side...).
> 
> Note: The cross-links are only per-socket so that would be 2 groups of
> 3.
> 
> We *can* allow individual GPUs to be passed through, either if somebody
> designs a system without cross links, or if the user is ok with the
> security risk as the guest driver will not enable them if it doesn't
> "find" both sides of them.

If GPUs are not isolated and we cannot prevent them from probing each
other via these links, then I think we have an obligation to configure
grouping in a way that doesn't rely on a benevolent userspace.  Thanks,

Alex

^ permalink raw reply

* Re: [RFC PATCH -tip v5 18/27] powerpc/kprobes: Don't call the ->break_handler() in arm kprobes code
From: Masami Hiramatsu @ 2018-06-08  0:42 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Andrew Morton, H . Peter Anvin, linux-arch, linux-kernel,
	linuxppc-dev, Ingo Molnar, Ingo Molnar, Paul Mackerras,
	Steven Rostedt, Thomas Gleixner
In-Reply-To: <1528389409.62iyaqw9yn.naveen@linux.ibm.com>

On Thu, 07 Jun 2018 22:07:26 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Masami Hiramatsu wrote:
> > On Thu, 07 Jun 2018 17:07:00 +0530
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> > 
> >> Masami Hiramatsu wrote:
> >> > Don't call the ->break_handler() from the arm kprobes code,
> >> 					    ^^^ powerpc
> >> 
> >> > because it was only used by jprobes which got removed.
> >> > 
> >> > This also makes skip_singlestep() a static function since
> >> > only ftrace-kprobe.c is using this function.
> >> > 
> >> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> >> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >> > Cc: Paul Mackerras <paulus@samba.org>
> >> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> >> > Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
> >> > Cc: linuxppc-dev@lists.ozlabs.org
> >> > ---
> >> >  arch/powerpc/include/asm/kprobes.h   |   10 ----------
> >> >  arch/powerpc/kernel/kprobes-ftrace.c |   16 +++-------------
> >> >  arch/powerpc/kernel/kprobes.c        |   31 +++++++++++--------------------
> >> >  3 files changed, 14 insertions(+), 43 deletions(-)
> >> 
> >> With 2 small comments...
> > 
> > 2 ? or 1 ?
> 
> Two, with one in the commit log above :)

Oops, sorry I missed it. yeah, the comment above is my mistake. I'll fix it.

Thanks!


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply

* Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
From: Benjamin Herrenschmidt @ 2018-06-08  0:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alexey Kardashevskiy, linuxppc-dev, David Gibson, kvm-ppc,
	Ram Pai, kvm, Alistair Popple
In-Reply-To: <20180607183417.3ff2acf1@w520.home>

On Thu, 2018-06-07 at 18:34 -0600, Alex Williamson wrote:
> > We *can* allow individual GPUs to be passed through, either if somebody
> > designs a system without cross links, or if the user is ok with the
> > security risk as the guest driver will not enable them if it doesn't
> > "find" both sides of them.
> 
> If GPUs are not isolated and we cannot prevent them from probing each
> other via these links, then I think we have an obligation to configure
> grouping in a way that doesn't rely on a benevolent userspace.  Thanks,

Well, it's a user decision, no ? Like how we used to let the user
decide whether to pass-through things that have LSIs shared out of
their domain.

Cheers,
Ben.

^ permalink raw reply

* Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
From: Alex Williamson @ 2018-06-08  1:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, linuxppc-dev, David Gibson, kvm-ppc,
	Ram Pai, kvm, Alistair Popple
In-Reply-To: <f6fbd7996a2edfc0bdb8363cb6384caeae7d47b9.camel@kernel.crashing.org>

On Fri, 08 Jun 2018 10:58:54 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2018-06-07 at 18:34 -0600, Alex Williamson wrote:
> > > We *can* allow individual GPUs to be passed through, either if somebody
> > > designs a system without cross links, or if the user is ok with the
> > > security risk as the guest driver will not enable them if it doesn't
> > > "find" both sides of them.  
> > 
> > If GPUs are not isolated and we cannot prevent them from probing each
> > other via these links, then I think we have an obligation to configure
> > grouping in a way that doesn't rely on a benevolent userspace.  Thanks,  
> 
> Well, it's a user decision, no ? Like how we used to let the user
> decide whether to pass-through things that have LSIs shared out of
> their domain.

No, users don't get to pinky swear they'll be good.  The kernel creates
IOMMU groups assuming the worst case isolation and malicious users.
Its the kernel's job to protect itself from users and to protect users
from each other.  Anything else is unsupportable.  The only way to
bypass the default grouping is to modify the kernel.  Thanks,

Alex

^ permalink raw reply

* Re: [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation.
From: Nicholas Piggin @ 2018-06-08  1:31 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Michael Ellerman,
	Laurent Dufour
In-Reply-To: <152839248387.25118.579137029955034015.stgit@jupiter.in.ibm.com>

On Thu, 07 Jun 2018 22:58:11 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> rtas_log_buf is a buffer to hold RTAS event data that are communicated
> to kernel by hypervisor. This buffer is then used to pass RTAS event
> data to user through proc fs. This buffer is allocated from vmalloc
> (non-linear mapping) area.
> 
> On Machine check interrupt, register r3 points to RTAS extended event
> log passed by hypervisor that contains the MCE event. The pseries
> machine check handler then logs this error into rtas_log_buf. The
> rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
> page fault (vector 0x300) while accessing it. Since machine check
> interrupt handler runs in NMI context we can not afford to take any
> page fault. Page faults are not honored in NMI context and causes
> kernel panic. This patch fixes this issue by allocating rtas_log_buf
> using kmalloc.
> 
> Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
> Cc: stable@vger.kernel.org
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/rtasd.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index f915db93cd42..3957d4ae2ba2 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -559,7 +559,7 @@ static int __init rtas_event_scan_init(void)
>  	rtas_error_log_max = rtas_get_error_log_max();
>  	rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
>  
> -	rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER);
> +	rtas_log_buf = kmalloc(rtas_error_log_buffer_max*LOG_NUMBER, GFP_KERNEL);

Does this have to be in the RMA region if it's to be accessed with
relocation off in the guest?

A comment about it being accessed with relocation off might be helpful
too.

Thanks,
Nick

^ permalink raw reply

* Re: [v3 PATCH 2/5] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
From: Nicholas Piggin @ 2018-06-08  1:33 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, stable, Aneesh Kumar K.V, Michael Ellerman,
	Laurent Dufour
In-Reply-To: <152839249913.25118.1191250274945665204.stgit@jupiter.in.ibm.com>

On Thu, 07 Jun 2018 22:58:33 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> During Machine Check interrupt on pseries platform, register r3 points
> RTAS extended event log passed by hypervisor. Since hypervisor uses r3
> to pass pointer to rtas log, it stores the original r3 value at the
> start of the memory (first 8 bytes) pointed by r3. Since hypervisor
> stores this info and rtas log is in BE format, linux should make
> sure to restore r3 value in correct endian format.
> 
> Without this patch when MCE handler, after recovery, returns to code that
> that caused the MCE may end up with Data SLB access interrupt for invalid
> address followed by kernel panic or hang.
> 
> [   62.878965] Severe Machine check interrupt [Recovered]
> [   62.878968]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
> [   62.878969]   Initiator: CPU
> [   62.878970]   Error type: SLB [Multihit]
> [   62.878971]     Effective address: d00000000ca70000
> cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
>     pc: c0000000009694c0: vsnprintf+0x80/0x480
>     lr: c0000000009698e0: vscnprintf+0x20/0x60
>     sp: c0000000fc777830
>    msr: 8000000002009033
>    dar: a803a30c000000d0
>   current = 0xc00000000bc9ef00
>   paca    = 0xc00000001eca5c00	 softe: 3	 irq_happened: 0x01
>     pid   = 8860, comm = insmod
> [c0000000fc7778b0] c0000000009698e0 vscnprintf+0x20/0x60
> [c0000000fc7778e0] c00000000016b6c4 vprintk_emit+0xb4/0x4b0
> [c0000000fc777960] c00000000016d40c vprintk_func+0x5c/0xd0
> [c0000000fc777980] c00000000016cbb4 printk+0x38/0x4c
> [c0000000fc7779a0] d00000000ca301c0 init_module+0x1c0/0x338 [bork_kernel]
> [c0000000fc777a40] c00000000000d9c4 do_one_initcall+0x54/0x230
> [c0000000fc777b00] c0000000001b3b74 do_init_module+0x8c/0x248
> [c0000000fc777b90] c0000000001b2478 load_module+0x12b8/0x15b0
> [c0000000fc777d30] c0000000001b29e8 sys_finit_module+0xa8/0x110
> [c0000000fc777e30] c00000000000b204 system_call+0x58/0x6c
> --- Exception: c00 (System Call) at 00007fff8bda0644
> SP (7fffdfbfe980) is in userspace
> 
> This patch fixes this issue.

LGTM

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/ras.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 5e1ef9150182..2edc673be137 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -360,7 +360,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>  	}
>  
>  	savep = __va(regs->gpr[3]);
> -	regs->gpr[3] = savep[0];	/* restore original r3 */
> +	regs->gpr[3] = be64_to_cpu(savep[0]);	/* restore original r3 */
>  
>  	/* If it isn't an extended log we can use the per cpu 64bit buffer */
>  	h = (struct rtas_error_log *)&savep[1];
> 

^ permalink raw reply

* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
From: Nicholas Piggin @ 2018-06-08  1:48 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour
In-Reply-To: <152839253238.25118.3114450844744290470.stgit@jupiter.in.ibm.com>

On Thu, 07 Jun 2018 22:58:55 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> If we get a machine check exceptions due to SLB errors then dump the
> current SLB contents which will be very much helpful in debugging the
> root cause of SLB errors. On pseries, as of today system crashes on SLB
> errors. These are soft errors and can be fixed by flushing the SLBs so
> the kernel can continue to function instead of system crash. This patch
> fixes that also.

So pseries never flushed SLB and reloaded in response to multi hit
errors? This seems like quite a good improvement then. I like
dumping SLB too.

It's a bit annoying we can't share the same code with xmon really,
that's okay but I just suggest commenting them both if you take a
copy like this with a note to keep them in synch if you re-post
the series.

> 
> With this patch the console will log SLB contents like below on SLB MCE
> errors:
> 
> [  822.711728] slb contents:

Suggest keeping the same format as the xmon dump (in particular
CPU number, even though it's probably printed elsewhere in the MCE
message it doesn't hurt.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks,
Nick

> [  822.711730] 00 c000000008000000 400ea1b217000500
> [  822.711731]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
> [  822.711732] 01 d000000008000000 400d43642f000510
> [  822.711733]   1T  ESID=   d00000  VSID=      d43642f LLP:110
> [  822.711734] 09 f000000008000000 400a86c85f000500
> [  822.711736]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
> [  822.711737] 10 00007f0008000000 400d1f26e3000d90
> [  822.711738]   1T  ESID=       7f  VSID=      d1f26e3 LLP:110
> [  822.711739] 11 0000000018000000 000e3615f520fd90
> [  822.711740]  256M ESID=        1  VSID=   e3615f520f LLP:110
> [  822.711740] 12 d000000008000000 400d43642f000510
> [  822.711741]   1T  ESID=   d00000  VSID=      d43642f LLP:110
> [  822.711742] 13 d000000008000000 400d43642f000510
> [  822.711743]   1T  ESID=   d00000  VSID=      d43642f LLP:110
> 
> 
> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 +
>  arch/powerpc/mm/slb.c                         |   35 +++++++++++++++++++++++++
>  arch/powerpc/platforms/pseries/ras.c          |   29 ++++++++++++++++++++-
>  3 files changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 50ed64fba4ae..c0da68927235 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -487,6 +487,7 @@ extern void hpte_init_native(void);
>  
>  extern void slb_initialize(void);
>  extern void slb_flush_and_rebolt(void);
> +extern void slb_dump_contents(void);
>  
>  extern void slb_vmalloc_update(void);
>  extern void slb_set_size(u16 size);
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 66577cc66dc9..799aa117cec3 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -145,6 +145,41 @@ void slb_flush_and_rebolt(void)
>  	get_paca()->slb_cache_ptr = 0;
>  }
>  
> +void slb_dump_contents(void)
> +{
> +	int i;
> +	unsigned long e, v;
> +	unsigned long llp;
> +
> +	pr_err("slb contents:\n");
> +	for (i = 0; i < mmu_slb_size; i++) {
> +		asm volatile("slbmfee  %0,%1" : "=r" (e) : "r" (i));
> +		asm volatile("slbmfev  %0,%1" : "=r" (v) : "r" (i));
> +
> +		if (!e && !v)
> +			continue;
> +
> +		pr_err("%02d %016lx %016lx", i, e, v);
> +
> +		if (!(e & SLB_ESID_V)) {
> +			pr_err("\n");
> +			continue;
> +		}
> +		llp = v & SLB_VSID_LLP;
> +		if (v & SLB_VSID_B_1T) {
> +			pr_err("  1T  ESID=%9lx  VSID=%13lx LLP:%3lx\n",
> +				GET_ESID_1T(e),
> +				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T,
> +				llp);
> +		} else {
> +			pr_err(" 256M ESID=%9lx  VSID=%13lx LLP:%3lx\n",
> +				GET_ESID(e),
> +				(v & ~SLB_VSID_B) >> SLB_VSID_SHIFT,
> +				llp);
> +		}
> +	}
> +}
> +
>  void slb_vmalloc_update(void)
>  {
>  	unsigned long vflags;
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 2edc673be137..e56759d92356 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> +static int mce_handle_error(struct rtas_error_log *errp)
> +{
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_mc_errorlog *mce_log;
> +	int disposition = rtas_error_disposition(errp);
> +	uint8_t error_type;
> +
> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> +	if (pseries_log == NULL)
> +		goto out;
> +
> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> +	error_type = rtas_mc_error_type(mce_log);
> +
> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
> +		slb_dump_contents();
> +		slb_flush_and_rebolt();
> +		disposition = RTAS_DISP_FULLY_RECOVERED;
> +	}
> +
> +out:
> +	return disposition;
> +}
> +
>  /*
>   * See if we can recover from a machine check exception.
>   * This is only called on power4 (or above) and only via
> @@ -434,7 +459,9 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
>  {
>  	int recovered = 0;
> -	int disposition = rtas_error_disposition(err);
> +	int disposition;
> +
> +	disposition = mce_handle_error(err);
>  
>  	if (!(regs->msr & MSR_RI)) {
>  		/* If MSR_RI isn't set, we cannot recover */
> 

^ permalink raw reply

* Re: [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.
From: Nicholas Piggin @ 2018-06-08  1:51 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Aneesh Kumar K.V, Michael Ellerman, Laurent Dufour
In-Reply-To: <152839254280.25118.11212831020041096859.stgit@jupiter.in.ibm.com>

On Thu, 07 Jun 2018 22:59:04 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Extract the MCE error details from RTAS extended log and display it to
> console.
> 
> With this patch you should now see mce logs like below:
> 
> [  142.371818] Severe Machine check interrupt [Recovered]
> [  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
> [  142.371822]   Initiator: CPU
> [  142.371823]   Error type: SLB [Multihit]
> [  142.371824]     Effective address: d00000000ca70000
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/rtas.h      |    5 +
>  arch/powerpc/platforms/pseries/ras.c |  128 +++++++++++++++++++++++++++++++++-
>  2 files changed, 131 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 3f2fba7ef23b..8100a95c133a 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -190,6 +190,11 @@ static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
>  	return (elog->byte1 & 0x04) >> 2;
>  }
>  
> +static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
> +{
> +	return (elog->byte2 & 0xf0) >> 4;
> +}
> +
>  #define rtas_error_type(x)	((x)->byte3)
>  
>  static inline
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index e56759d92356..cd9446980092 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -422,7 +422,130 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> -static int mce_handle_error(struct rtas_error_log *errp)
> +#define VAL_TO_STRING(ar, val)	((val < ARRAY_SIZE(ar)) ? ar[val] : "Unknown")
> +
> +static void pseries_print_mce_info(struct pt_regs *regs,
> +				struct rtas_error_log *errp, int disposition)
> +{
> +	const char *level, *sevstr;
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_mc_errorlog *mce_log;
> +	uint8_t error_type, err_sub_type;
> +	uint8_t initiator = rtas_error_initiator(errp);
> +	uint64_t addr;
> +
> +	static const char * const initiators[] = {
> +		"Unknown",
> +		"CPU",
> +		"PCI",
> +		"ISA",
> +		"Memory",
> +		"Power Mgmt",
> +	};
> +	static const char * const mc_err_types[] = {
> +		"UE",
> +		"SLB",
> +		"ERAT",
> +		"TLB",
> +		"D-Cache",
> +		"Unknown",
> +		"I-Cache",
> +	};
> +	static const char * const mc_ue_types[] = {
> +		"Indeterminate",
> +		"Instruction fetch",
> +		"Page table walk ifetch",
> +		"Load/Store",
> +		"Page table walk Load/Store",
> +	};
> +
> +	/* SLB sub errors valid values are 0x0, 0x1, 0x2 */
> +	static const char * const mc_slb_types[] = {
> +		"Parity",
> +		"Multihit",
> +		"Indeterminate",
> +	};
> +
> +	/* TLB and ERAT sub errors valid values are 0x1, 0x2, 0x3 */
> +	static const char * const mc_soft_types[] = {
> +		"Unknown",
> +		"Parity",
> +		"Multihit",
> +		"Indeterminate",
> +	};
> +
> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> +	if (pseries_log == NULL)
> +		return;
> +
> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> +
> +	error_type = rtas_mc_error_type(mce_log);
> +	err_sub_type = rtas_mc_error_sub_type(mce_log);
> +
> +	switch (rtas_error_severity(errp)) {
> +	case RTAS_SEVERITY_NO_ERROR:
> +		level = KERN_INFO;
> +		sevstr = "Harmless";
> +		break;
> +	case RTAS_SEVERITY_WARNING:
> +		level = KERN_WARNING;
> +		sevstr = "";
> +		break;
> +	case RTAS_SEVERITY_ERROR:
> +	case RTAS_SEVERITY_ERROR_SYNC:
> +		level = KERN_ERR;
> +		sevstr = "Severe";
> +		break;
> +	case RTAS_SEVERITY_FATAL:
> +	default:
> +		level = KERN_ERR;
> +		sevstr = "Fatal";
> +		break;
> +	}
> +
> +	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
> +		disposition == RTAS_DISP_FULLY_RECOVERED ?
> +		"Recovered" : "Not recovered");
> +	if (user_mode(regs)) {
> +		printk("%s  NIP: [%016lx] PID: %d Comm: %s\n", level,
> +			regs->nip, current->pid, current->comm);
> +	} else {
> +		printk("%s  NIP [%016lx]: %pS\n", level, regs->nip,
> +			(void *)regs->nip);
> +	}

I think it's probably still useful to print pid/comm for kernel mode
faults if !in_interrupt()... I see you're basically taking kernel/mce.c
and doing the same thing.

Is there any reasonable way to share code here?

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH 1/3] powerpc: make CPU selection logic generic in Makefile
From: Nicholas Piggin @ 2018-06-08  1:54 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev
In-Reply-To: <273f8ed3e980b9385c6e1b31e17f890ea08ce33c.1528365638.git.christophe.leroy@c-s.fr>

On Thu,  7 Jun 2018 10:10:18 +0000 (UTC)
Christophe Leroy <christophe.leroy@c-s.fr> wrote:

> At the time being, when adding a new CPU for selection, both
> Kconfig.cputype and Makefile have to be modified.
> 
> This patch moves into Kconfig.cputype the name of the CPU to me
> passed to the -mcpu= argument.

Seems like a good cleanup.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/Makefile                  |  8 +-------
>  arch/powerpc/platforms/Kconfig.cputype | 15 +++++++++++++++
>  2 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 9704ab360d39..9a5642552abc 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -175,13 +175,7 @@ ifdef CONFIG_MPROFILE_KERNEL
>      endif
>  endif
>  
> -CFLAGS-$(CONFIG_CELL_CPU) += $(call cc-option,-mcpu=cell)
> -CFLAGS-$(CONFIG_POWER5_CPU) += $(call cc-option,-mcpu=power5)
> -CFLAGS-$(CONFIG_POWER6_CPU) += $(call cc-option,-mcpu=power6)
> -CFLAGS-$(CONFIG_POWER7_CPU) += $(call cc-option,-mcpu=power7)
> -CFLAGS-$(CONFIG_POWER8_CPU) += $(call cc-option,-mcpu=power8)
> -CFLAGS-$(CONFIG_POWER9_CPU) += $(call cc-option,-mcpu=power9)
> -CFLAGS-$(CONFIG_PPC_8xx) += $(call cc-option,-mcpu=860)
> +CFLAGS-$(CONFIG_SPECIAL_CPU_BOOL) += $(call cc-option,-mcpu=$(CONFIG_SPECIAL_CPU))
>  
>  # Altivec option not allowed with e500mc64 in GCC.
>  ifeq ($(CONFIG_ALTIVEC),y)
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index cc892dcfa114..71ef559cc474 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -140,6 +140,21 @@ config E6500_CPU
>  
>  endchoice
>  
> +config SPECIAL_CPU_BOOL
> +	bool
> +	default !GENERIC_CPU
> +
> +config SPECIAL_CPU
> +	string
> +	depends on SPECIAL_CPU_BOOL
> +	default "cell" if CELL_CPU
> +	default "power5" if POWER5_CPU
> +	default "power6" if POWER6_CPU
> +	default "power7" if POWER7_CPU
> +	default "power8" if POWER8_CPU
> +	default "power9" if POWER9_CPU
> +	default "860" if PPC_8xx
> +
>  config PPC_BOOK3S
>  	def_bool y
>  	depends on PPC_BOOK3S_32 || PPC_BOOK3S_64

^ permalink raw reply

* [PATCH v2 00/12] macintosh: Resolve various PMU driver problems
From: Finn Thain @ 2018-06-08  2:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Schmitz, linuxppc-dev, linux-m68k, linux-kernel

This series of patches has the following aims.

1) Eliminate duplicated code. Linux presently has two drivers for
   the 68HC05-based PMU devices found in Macs: via-pmu and via-pmu68k.
   There's no value in having separate PMU drivers for each architecture.

2) Avoid further work on via-pmu68k that's not needed for via-pmu.

3) Fix some bugs in the via-pmu driver.

4) Enable the /dev/pmu and /proc/pmu/* userspace APIs on m68k Macs
   by adopting via-pmu.

5) Improve stability on early 100-series PowerBooks by loading no PMU
   driver at all. Neither via-pmu nor via-pmu68k supports the early
   M50753-based PMU device found in these models.

6) Eliminate duplicated RTC accessors for PMU and Cuda. Presently these
   can be found under both arch/m68k and arch/powerpc.

7) Assist the out-of-tree NuBus PowerMac port to support PMU designs
   shared with the m68k Mac port (e.g. PowerBooks 190 and 5300).

This patch series has been regression tested on various PowerBooks
(190, 520, 3400, Pismo G3) and PowerMacs (Beige G3, G5). These patches
did not affect userland utilities. (Note that there is a userland-
visible change to the contents of /proc/pmu/interrupts.)

Changed since v1:
1) Added blank lines after 'break' statements in patch 10.
2) Improved patch description for patch 3.
3) Added reviewed-by tags.
4) Split patch 8 to make code review easier.

Finn Thain (12):
  macintosh/via-pmu: Fix section mismatch warning
  macintosh/via-pmu: Add missing mmio accessors
  macintosh/via-pmu: Don't clear shift register interrupt flag twice
  macintosh/via-pmu: Enhance state machine with new 'uninitialized'
    state
  macintosh/via-pmu: Replace via pointer with via1 and via2 pointers
  macintosh/via-pmu: Add support for m68k PowerBooks
  macintosh/via-pmu: Make CONFIG_PPC_PMAC Kconfig deps explicit
  macintosh/via-pmu68k: Don't load driver on unsupported hardware
  macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
  macintosh: Use common code to access RTC
  macintosh/via-pmu: Clean up interrupt statistics
  macintosh/via-pmu: Disambiguate interrupt statistics

 arch/m68k/configs/mac_defconfig        |   2 +-
 arch/m68k/configs/multi_defconfig      |   2 +-
 arch/m68k/mac/config.c                 |   2 +-
 arch/m68k/mac/misc.c                   | 118 +----
 arch/powerpc/platforms/powermac/time.c |  74 +--
 drivers/macintosh/Kconfig              |  19 +-
 drivers/macintosh/Makefile             |   1 -
 drivers/macintosh/adb.c                |   2 +-
 drivers/macintosh/via-cuda.c           |  34 ++
 drivers/macintosh/via-pmu.c            | 378 ++++++++++-----
 drivers/macintosh/via-pmu68k.c         | 850 ---------------------------------
 include/linux/cuda.h                   |   3 +
 include/linux/pmu.h                    |   3 +
 include/uapi/linux/pmu.h               |   2 -
 14 files changed, 311 insertions(+), 1179 deletions(-)
 delete mode 100644 drivers/macintosh/via-pmu68k.c

-- 
2.16.4

^ permalink raw reply

* [PATCH v2 02/12] macintosh/via-pmu: Add missing mmio accessors
From: Finn Thain @ 2018-06-08  2:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Schmitz, linuxppc-dev, linux-m68k, linux-kernel
In-Reply-To: <cover.1528423341.git.fthain@telegraphics.com.au>

Add missing in_8() accessors to init_pmu() and pmu_sr_intr().

This fixes several sparse warnings:
drivers/macintosh/via-pmu.c:536:29: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:537:33: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1455:17: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1456:69: warning: dereference of noderef expression

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 drivers/macintosh/via-pmu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
index fd3c5640d586..74065ea410bd 100644
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -532,8 +532,9 @@ init_pmu(void)
 	int timeout;
 	struct adb_request req;
 
-	out_8(&via[B], via[B] | TREQ);			/* negate TREQ */
-	out_8(&via[DIRB], (via[DIRB] | TREQ) & ~TACK);	/* TACK in, TREQ out */
+	/* Negate TREQ. Set TACK to input and TREQ to output. */
+	out_8(&via[B], in_8(&via[B]) | TREQ);
+	out_8(&via[DIRB], (in_8(&via[DIRB]) | TREQ) & ~TACK);
 
 	pmu_request(&req, NULL, 2, PMU_SET_INTR_MASK, pmu_intr_mask);
 	timeout =  100000;
@@ -1455,8 +1456,8 @@ pmu_sr_intr(void)
 	struct adb_request *req;
 	int bite = 0;
 
-	if (via[B] & TREQ) {
-		printk(KERN_ERR "PMU: spurious SR intr (%x)\n", via[B]);
+	if (in_8(&via[B]) & TREQ) {
+		printk(KERN_ERR "PMU: spurious SR intr (%x)\n", in_8(&via[B]));
 		out_8(&via[IFR], SR_INT);
 		return NULL;
 	}
-- 
2.16.4

^ permalink raw reply related

* [PATCH v2 01/12] macintosh/via-pmu: Fix section mismatch warning
From: Finn Thain @ 2018-06-08  2:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Schmitz, linuxppc-dev, linux-m68k, linux-kernel
In-Reply-To: <cover.1528423341.git.fthain@telegraphics.com.au>

The pmu_init() function has the __init qualifier, but the ops struct
that holds a pointer to it does not. This causes a build warning.
The driver works fine because the pointer is only dereferenced early.

The function is so small that there's negligible benefit from using
the __init qualifier. Remove it to fix the warning, consistent with
the other ADB drivers.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 drivers/macintosh/via-pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
index 433dbeddfcf9..fd3c5640d586 100644
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -378,7 +378,7 @@ static int pmu_probe(void)
 	return vias == NULL? -ENODEV: 0;
 }
 
-static int __init pmu_init(void)
+static int pmu_init(void)
 {
 	if (vias == NULL)
 		return -ENODEV;
-- 
2.16.4

^ permalink raw reply related

* [PATCH v2 03/12] macintosh/via-pmu: Don't clear shift register interrupt flag twice
From: Finn Thain @ 2018-06-08  2:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Schmitz, linuxppc-dev, linux-m68k, linux-kernel
In-Reply-To: <cover.1528423341.git.fthain@telegraphics.com.au>

The shift register interrupt flag gets cleared in via_pmu_interrupt()
and once again in pmu_sr_intr(). Fix this theoretical race condition.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 drivers/macintosh/via-pmu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
index 74065ea410bd..4c1bae5380c2 100644
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -1458,7 +1458,6 @@ pmu_sr_intr(void)
 
 	if (in_8(&via[B]) & TREQ) {
 		printk(KERN_ERR "PMU: spurious SR intr (%x)\n", in_8(&via[B]));
-		out_8(&via[IFR], SR_INT);
 		return NULL;
 	}
 	/* The ack may not yet be low when we get the interrupt */
-- 
2.16.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox