LinuxPPC-Dev Archive on lore.kernel.org


* [RFC PATCH v2 08/10] KVM: PPC: Ultravisor: Return to UV for hcalls from SVM
From: Claudio Carvalho @ 2019-05-18 14:25 UTC
  To: Paul Mackerras, Michael Ellerman, kvm-ppc, linuxppc-dev
  Cc: Madhavan Srinivasan, Michael Anderson, Ram Pai, Bharata B Rao,
	Sukadev Bhattiprolu, Thiago Jung Bauermann, Anshuman Khandual
In-Reply-To: <20190518142524.28528-1-cclaudio@linux.ibm.com>

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

All hcalls from a secure VM go to the ultravisor from where they are
reflected into the HV. When we (HV) complete processing such hcalls,
we should return to the UV rather than to the guest kernel.

Have fast_guest_return check the kvm_arch.secure_guest field so that
even a new CPU will enter UV when started (in response to an RTAS
start-cpu call).

Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[Fix UV_RETURN token number and arch.secure_guest check]
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h       |  1 +
 arch/powerpc/include/asm/ultravisor-api.h |  1 +
 arch/powerpc/kernel/asm-offsets.c         |  1 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 30 ++++++++++++++++++++---
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e6b5bb012ccb..ba7dd35cb916 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -290,6 +290,7 @@ struct kvm_arch {
 	cpumask_t cpu_in_guest;
 	u8 radix;
 	u8 fwnmi_enabled;
+	u8 secure_guest;
 	bool threads_indep;
 	bool nested_enable;
 	pgd_t *pgtable;
diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
index 24bfb4c1737e..15e6ce77a131 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -19,5 +19,6 @@
 
 /* opcodes */
 #define UV_WRITE_PATE			0xF104
+#define UV_RETURN			0xF11C
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8e02444e9d3d..44742724513e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -508,6 +508,7 @@ int main(void)
 	OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v);
 	OFFSET(KVM_RADIX, kvm, arch.radix);
 	OFFSET(KVM_FWNMI, kvm, arch.fwnmi_enabled);
+	OFFSET(KVM_SECURE_GUEST, kvm, arch.secure_guest);
 	OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr);
 	OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar);
 	OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 938cfa5dceed..d89efa0783a2 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -36,6 +36,7 @@
 #include <asm/asm-compat.h>
 #include <asm/feature-fixups.h>
 #include <asm/cpuidle.h>
+#include <asm/ultravisor-api.h>
 
 /* Sign-extend HDEC if not on POWER9 */
 #define EXTEND_HDEC(reg)			\
@@ -1112,16 +1113,12 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 	ld	r5, VCPU_LR(r4)
-	ld	r6, VCPU_CR(r4)
 	mtlr	r5
-	mtcr	r6
 
 	ld	r1, VCPU_GPR(R1)(r4)
 	ld	r2, VCPU_GPR(R2)(r4)
 	ld	r3, VCPU_GPR(R3)(r4)
 	ld	r5, VCPU_GPR(R5)(r4)
-	ld	r6, VCPU_GPR(R6)(r4)
-	ld	r7, VCPU_GPR(R7)(r4)
 	ld	r8, VCPU_GPR(R8)(r4)
 	ld	r9, VCPU_GPR(R9)(r4)
 	ld	r10, VCPU_GPR(R10)(r4)
@@ -1139,10 +1136,35 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_HDSISR, r0
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 
+	ld	r6, VCPU_KVM(r4)
+	lbz	r7, KVM_SECURE_GUEST(r6)
+	cmpdi	r7, 0
+	bne	ret_to_ultra
+
+	lwz	r6, VCPU_CR(r4)
+	mtcr	r6
+
+	ld	r7, VCPU_GPR(R7)(r4)
+	ld	r6, VCPU_GPR(R6)(r4)
 	ld	r0, VCPU_GPR(R0)(r4)
 	ld	r4, VCPU_GPR(R4)(r4)
 	HRFI_TO_GUEST
 	b	.
+/*
+ * The hcall we just completed was from Ultravisor. Use UV_RETURN
+ * ultra call to return to the Ultravisor. Results from the hcall
+ * are already in the appropriate registers (r3:12), except for
+ * R6,7 which we used as temporary registers above. Restore them,
+ * and set R0 to the ucall number (UV_RETURN).
+ */
+ret_to_ultra:
+	lwz	r6, VCPU_CR(r4)
+	mtcr	r6
+	LOAD_REG_IMMEDIATE(r0, UV_RETURN)
+	ld	r7, VCPU_GPR(R7)(r4)
+	ld	r6, VCPU_GPR(R6)(r4)
+	ld	r4, VCPU_GPR(R4)(r4)
+	sc	2
 
 /*
  * Enter the guest on a P9 or later system where we have exactly
-- 
2.20.1
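
As a reading aid, the decision added at the tail of fast_guest_return can
be modelled in plain C. This is only a sketch: struct vcpu_model,
enum return_path and pick_return_path() are hypothetical names that do not
exist in the kernel. Only the UV_RETURN token (0xF11C) and the
secure_guest flag are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define UV_RETURN 0xF11CULL	/* ucall token from the patch */

enum return_path { RET_HRFI_TO_GUEST, RET_SC2_TO_ULTRAVISOR };

/* Hypothetical stand-in for the vcpu state the assembly reads. */
struct vcpu_model {
	uint8_t secure_guest;	/* models kvm->arch.secure_guest */
	uint64_t gpr[32];	/* models the saved guest GPRs */
};

/*
 * Model of the tail of fast_guest_return: r0 receives the saved guest
 * value for a normal guest, or the UV_RETURN token for a secure one.
 */
enum return_path pick_return_path(const struct vcpu_model *v, uint64_t *r0)
{
	if (v->secure_guest) {
		*r0 = UV_RETURN;
		return RET_SC2_TO_ULTRAVISOR;	/* "sc 2" in the patch */
	}
	*r0 = v->gpr[0];
	return RET_HRFI_TO_GUEST;		/* HRFI_TO_GUEST in the patch */
}
```

The model leaves out the CR, r4, r6 and r7 restores, which the patch
performs on both paths.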



* [RFC PATCH v2 09/10] KVM: PPC: Book3S HV: Fixed for running secure guests
From: Claudio Carvalho @ 2019-05-18 14:25 UTC
  To: Paul Mackerras, Michael Ellerman, kvm-ppc, linuxppc-dev
  Cc: Madhavan Srinivasan, Michael Anderson, Ram Pai, Bharata B Rao,
	Sukadev Bhattiprolu, Thiago Jung Bauermann, Anshuman Khandual
In-Reply-To: <20190518142524.28528-1-cclaudio@linux.ibm.com>

From: Paul Mackerras <paulus@ozlabs.org>

- Pass SRR1 in r11 for UV_RETURN because SRR0 and SRR1 get set by
  the sc 2 instruction. (Note r3 - r10 potentially have hcall return
  values in them.)

- Fix kvmppc_msr_interrupt to preserve the MSR_S bit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d89efa0783a2..1b44c85956b9 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1160,6 +1160,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 ret_to_ultra:
 	lwz	r6, VCPU_CR(r4)
 	mtcr	r6
+	mfspr	r11, SPRN_SRR1
 	LOAD_REG_IMMEDIATE(r0, UV_RETURN)
 	ld	r7, VCPU_GPR(R7)(r4)
 	ld	r6, VCPU_GPR(R6)(r4)
@@ -3360,13 +3361,16 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
  *   r0 is used as a scratch register
  */
 kvmppc_msr_interrupt:
+	andis.	r0, r11, MSR_S@h
 	rldicl	r0, r11, 64 - MSR_TS_S_LG, 62
-	cmpwi	r0, 2 /* Check if we are in transactional state..  */
+	cmpwi	cr1, r0, 2 /* Check if we are in transactional state..  */
 	ld	r11, VCPU_INTR_MSR(r9)
-	bne	1f
+	bne	cr1, 1f
 	/* ... if transactional, change to suspended */
 	li	r0, 1
 1:	rldimi	r11, r0, MSR_TS_S_LG, 63 - MSR_TS_T_LG
+	beqlr
+	oris	r11, r11, MSR_S@h		/* preserve MSR_S bit setting */
 	blr
 
 /*
-- 
2.20.1
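
For readers who do not read PowerPC assembly fluently, the patched
kvmppc_msr_interrupt can be approximated in C as below. This is a sketch
under assumptions: the bit positions (MSR_S_LG = 22, MSR_TS_S_LG = 33) are
quoted from memory of asm/reg.h and should be checked against the tree,
and msr_interrupt_model() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as recalled from asm/reg.h; treat them as assumptions. */
#define MSR_S_LG		22
#define MSR_TS_S_LG		33
#define MSR_S			(1ULL << MSR_S_LG)
#define MSR_TS_MASK		(3ULL << MSR_TS_S_LG)

#define TS_SUSPENDED		1ULL
#define TS_TRANSACTIONAL	2ULL

/*
 * guest_msr models r11 on entry, intr_msr models VCPU_INTR_MSR.
 * Returns the value left in r11 when the routine executes blr.
 */
uint64_t msr_interrupt_model(uint64_t guest_msr, uint64_t intr_msr)
{
	uint64_t ts = (guest_msr & MSR_TS_MASK) >> MSR_TS_S_LG;
	uint64_t msr;

	/* If transactional, change to suspended. */
	if (ts == TS_TRANSACTIONAL)
		ts = TS_SUSPENDED;

	/* rldimi: insert the TS field into the interrupt MSR. */
	msr = (intr_msr & ~MSR_TS_MASK) | (ts << MSR_TS_S_LG);

	/* New in this patch: carry the secure bit over. */
	if (guest_msr & MSR_S)
		msr |= MSR_S;
	return msr;
}
```

The assembly needs cr1 for the transactional compare because cr0 already
holds the MSR_S test result that beqlr consumes at the end.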



* [RFC PATCH v2 10/10] KVM: PPC: Ultravisor: Check for MSR_S during hv_reset_msr
From: Claudio Carvalho @ 2019-05-18 14:25 UTC
  To: Paul Mackerras, Michael Ellerman, kvm-ppc, linuxppc-dev
  Cc: Madhavan Srinivasan, Michael Anderson, Ram Pai, Bharata B Rao,
	Sukadev Bhattiprolu, Thiago Jung Bauermann, Anshuman Khandual
In-Reply-To: <20190518142524.28528-1-cclaudio@linux.ibm.com>

From: Michael Anderson <andmike@linux.ibm.com>

 - Check for MSR_S so that kvmppc_set_msr will include it. Prior to this
   change, the return to the guest would not have the S bit set.

 - Patch based on comment from Paul Mackerras <pmac@au1.ibm.com>

Signed-off-by: Michael Anderson <andmike@linux.ibm.com>
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index be7bc070eae5..dcc1c1fb5f9c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -295,6 +295,7 @@ static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
 		msr |= MSR_TS_S;
 	else
 		msr |= vcpu->arch.shregs.msr & MSR_TS_MASK;
+	msr |= vcpu->arch.shregs.msr & MSR_S;
 	kvmppc_set_msr(vcpu, msr);
 }
 
-- 
2.20.1



* [RESEND v4 PATCH 1/2] [PowerPC] Add simd.h implementation
From: Shawn Landden @ 2019-05-18 16:04 UTC
  Cc: Paul Mackerras, Shawn Landden, linuxppc-dev
In-Reply-To: <20190515013725.2198-1-shawn@git.icu>

Based off the x86 one.

WireGuard really wants to be able to do SIMD in interrupts,
so it can accelerate its in-bound path.

v4: allow using the may_use_simd symbol even when it always
    returns false (via include guards)
Signed-off-by: Shawn Landden <shawn@git.icu>
---
 arch/powerpc/include/asm/simd.h | 17 +++++++++++++++++
 arch/powerpc/kernel/process.c   | 30 ++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)
 create mode 100644 arch/powerpc/include/asm/simd.h

diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h
new file mode 100644
index 000000000..2fe26f258
--- /dev/null
+++ b/arch/powerpc/include/asm/simd.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+/*
+ * may_use_simd - whether it is allowable at this time to issue SIMD
+ *                instructions or access the SIMD register file
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+#ifdef CONFIG_PPC_FPU
+extern bool may_use_simd(void);
+#else
+static inline bool may_use_simd(void)
+{
+	return false;
+}
+#endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dd9e0d538..ef534831f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -345,6 +345,36 @@ static int restore_altivec(struct task_struct *tsk)
 	}
 	return 0;
 }
+
+/*
+ * Were we in user mode when we were
+ * interrupted?
+ *
+ * Doing kernel_altivec/vsx_begin/end() is ok if we are running
+ * in an interrupt context from user mode - we'll just
+ * save the FPU state as required.
+ */
+static bool interrupted_user_mode(void)
+{
+	struct pt_regs *regs = get_irq_regs();
+
+	return regs && user_mode(regs);
+}
+
+/*
+ * Can we use FPU in kernel mode with the
+ * whole "kernel_fpu/altivec/vsx_begin/end()" sequence?
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+bool may_use_simd(void)
+{
+	return !in_interrupt() ||
+		interrupted_user_mode();
+}
+EXPORT_SYMBOL(may_use_simd);
+
 #else
 #define loadvec(thr) 0
 static inline int restore_altivec(struct task_struct *tsk) { return 0; }
-- 
2.21.0.1020.gf2820cf01a
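
A caller is expected to test may_use_simd() and fall back to a scalar
path when it returns false. The userspace model below sketches that
pattern. The two model_* flags, enum path and xor_buf() are hypothetical
stand-ins; only the may_use_simd() condition mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel state, set by whoever drives the model. */
bool model_in_interrupt;	/* models in_interrupt() */
bool model_interrupted_user;	/* models interrupted_user_mode() */

/* Mirrors the patch: process context, or an interrupt taken from user mode. */
bool may_use_simd(void)
{
	return !model_in_interrupt || model_interrupted_user;
}

enum path { PATH_SCALAR, PATH_SIMD };

/* XOR src into dst and report which path a real caller would have taken. */
enum path xor_buf(uint8_t *dst, const uint8_t *src, size_t len)
{
	enum path p = may_use_simd() ? PATH_SIMD : PATH_SCALAR;
	size_t i;

	/*
	 * In the kernel the SIMD branch would be bracketed by
	 * enable_kernel_altivec()/disable_kernel_altivec(). Here both
	 * branches share one scalar loop so the model stays portable.
	 */
	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
	return p;
}
```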



* [RESEND v4 PATCH 2/2] [PowerPC] Allow use of SIMD in interrupts from kernel code
From: Shawn Landden @ 2019-05-18 16:04 UTC
  Cc: Paul Mackerras, Shawn Landden, linuxppc-dev
In-Reply-To: <20190518160441.25008-1-shawn@git.icu>

This even allows SIMD in preemptible kernel code, as x86 does,
although this is rarely safe (it could be used with
kthread_create_on_cpu). All callers disable preemption.

v4: fix build without CONFIG_AVX
    change commit message
Signed-off-by: Shawn Landden <shawn@git.icu>
---
 arch/powerpc/include/asm/switch_to.h |  15 +---
 arch/powerpc/kernel/process.c        | 117 +++++++++++++++++++--------
 2 files changed, 88 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 5b03d8a82..c79f7d24a 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -30,10 +30,7 @@ extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
 extern void giveup_fpu(struct task_struct *);
 extern void save_fpu(struct task_struct *);
-static inline void disable_kernel_fp(void)
-{
-	msr_check_and_clear(MSR_FP);
-}
+extern void disable_kernel_fp(void);
 #else
 static inline void save_fpu(struct task_struct *t) { }
 static inline void flush_fp_to_thread(struct task_struct *t) { }
@@ -44,10 +41,7 @@ extern void enable_kernel_altivec(void);
 extern void flush_altivec_to_thread(struct task_struct *);
 extern void giveup_altivec(struct task_struct *);
 extern void save_altivec(struct task_struct *);
-static inline void disable_kernel_altivec(void)
-{
-	msr_check_and_clear(MSR_VEC);
-}
+extern void disable_kernel_altivec(void);
 #else
 static inline void save_altivec(struct task_struct *t) { }
 static inline void __giveup_altivec(struct task_struct *t) { }
@@ -56,10 +50,7 @@ static inline void __giveup_altivec(struct task_struct *t) { }
 #ifdef CONFIG_VSX
 extern void enable_kernel_vsx(void);
 extern void flush_vsx_to_thread(struct task_struct *);
-static inline void disable_kernel_vsx(void)
-{
-	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
-}
+extern void disable_kernel_vsx(void);
 #endif
 
 #ifdef CONFIG_SPE
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ef534831f..0136fd132 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -170,6 +170,29 @@ void __msr_check_and_clear(unsigned long bits)
 EXPORT_SYMBOL(__msr_check_and_clear);
 
 #ifdef CONFIG_PPC_FPU
+/*
+ * Track whether the kernel is using the FPU state
+ * currently.
+ *
+ * This flag is used:
+ *
+ *   - by IRQ context code to potentially use the FPU
+ *     if it's unused.
+ *
+ *   - to debug kernel_fpu/altivec/vsx_begin()/end() correctness
+ */
+static DEFINE_PER_CPU(bool, in_kernel_fpu);
+
+static bool kernel_fpu_disabled(void)
+{
+	return this_cpu_read(in_kernel_fpu);
+}
+
+static bool interrupted_kernel_fpu_idle(void)
+{
+	return !kernel_fpu_disabled();
+}
+
 static void __giveup_fpu(struct task_struct *tsk)
 {
 	unsigned long msr;
@@ -230,7 +253,8 @@ void enable_kernel_fp(void)
 {
 	unsigned long cpumsr;
 
-	WARN_ON(preemptible());
+	WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, true);
 
 	cpumsr = msr_check_and_set(MSR_FP);
 
@@ -251,6 +275,15 @@ void enable_kernel_fp(void)
 }
 EXPORT_SYMBOL(enable_kernel_fp);
 
+void disable_kernel_fp(void)
+{
+	WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, false);
+
+	msr_check_and_clear(MSR_FP);
+}
+EXPORT_SYMBOL(disable_kernel_fp);
+
 static int restore_fp(struct task_struct *tsk)
 {
 	if (tsk->thread.load_fp || tm_active_with_fp(tsk)) {
@@ -260,6 +293,37 @@ static int restore_fp(struct task_struct *tsk)
 	}
 	return 0;
 }
+
+/*
+ * Were we in user mode when we were
+ * interrupted?
+ *
+ * Doing kernel_altivec/vsx_begin/end() is ok if we are running
+ * in an interrupt context from user mode - we'll just
+ * save the FPU state as required.
+ */
+static bool interrupted_user_mode(void)
+{
+        struct pt_regs *regs = get_irq_regs();
+
+        return regs && user_mode(regs);
+}
+
+/*
+ * Can we use FPU in kernel mode with the
+ * whole "kernel_fpu/altivec/vsx_begin/end()" sequence?
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+bool may_use_simd(void)
+{
+        return !in_interrupt() ||
+                interrupted_user_mode() ||
+                interrupted_kernel_fpu_idle();
+}
+EXPORT_SYMBOL(may_use_simd);
+
 #else
 static int restore_fp(struct task_struct *tsk) { return 0; }
 #endif /* CONFIG_PPC_FPU */
@@ -295,7 +359,8 @@ void enable_kernel_altivec(void)
 {
 	unsigned long cpumsr;
 
-	WARN_ON(preemptible());
+	WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, true);
 
 	cpumsr = msr_check_and_set(MSR_VEC);
 
@@ -316,6 +381,14 @@ void enable_kernel_altivec(void)
 }
 EXPORT_SYMBOL(enable_kernel_altivec);
 
+extern void disable_kernel_altivec(void)
+{
+	WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, false);
+	msr_check_and_clear(MSR_VEC);
+}
+EXPORT_SYMBOL(disable_kernel_altivec);
+
 /*
  * Make sure the VMX/Altivec register state in the
  * the thread_struct is up to date for task tsk.
@@ -346,35 +419,6 @@ static int restore_altivec(struct task_struct *tsk)
 	return 0;
 }
 
-/*
- * Were we in user mode when we were
- * interrupted?
- *
- * Doing kernel_altivec/vsx_begin/end() is ok if we are running
- * in an interrupt context from user mode - we'll just
- * save the FPU state as required.
- */
-static bool interrupted_user_mode(void)
-{
-	struct pt_regs *regs = get_irq_regs();
-
-	return regs && user_mode(regs);
-}
-
-/*
- * Can we use FPU in kernel mode with the
- * whole "kernel_fpu/altivec/vsx_begin/end()" sequence?
- *
- * It's always ok in process context (ie "not interrupt")
- * but it is sometimes ok even from an irq.
- */
-bool may_use_simd(void)
-{
-	return !in_interrupt() ||
-		interrupted_user_mode();
-}
-EXPORT_SYMBOL(may_use_simd);
-
 #else
 #define loadvec(thr) 0
 static inline int restore_altivec(struct task_struct *tsk) { return 0; }
@@ -411,7 +455,8 @@ void enable_kernel_vsx(void)
 {
 	unsigned long cpumsr;
 
-	WARN_ON(preemptible());
+	WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, true);
 
 	cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
 
@@ -433,6 +478,14 @@ void enable_kernel_vsx(void)
 }
 EXPORT_SYMBOL(enable_kernel_vsx);
 
+void disable_kernel_vsx(void)
+{
+	WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, false);
+	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
+}
+EXPORT_SYMBOL(disable_kernel_vsx);
+
 void flush_vsx_to_thread(struct task_struct *tsk)
 {
 	if (tsk->thread.regs) {
-- 
2.21.0.1020.gf2820cf01a
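
The in_kernel_fpu bookkeeping can be exercised with a small
single-threaded model. This is a sketch, not the kernel code: irq_ctx and
irq_from_user are hypothetical stand-ins for in_interrupt() and
interrupted_user_mode(), and warn_count only counts the conditions that
WARN_ON_ONCE() would report.

```c
#include <assert.h>
#include <stdbool.h>

bool in_kernel_fpu;	/* models the per-CPU flag from the patch */
bool irq_ctx;		/* hypothetical stand-in for in_interrupt() */
bool irq_from_user;	/* hypothetical stand-in for interrupted_user_mode() */
int warn_count;		/* counts conditions WARN_ON_ONCE() would report */

/* Models the flag handling shared by enable_kernel_fp/altivec/vsx(). */
void model_enable_kernel_simd(void)
{
	if (in_kernel_fpu)
		warn_count++;	/* nested use: the unit is already claimed */
	in_kernel_fpu = true;
}

/* Models the flag handling shared by disable_kernel_fp/altivec/vsx(). */
void model_disable_kernel_simd(void)
{
	if (!in_kernel_fpu)
		warn_count++;	/* unbalanced disable */
	in_kernel_fpu = false;
}

/* Same three-way condition as may_use_simd() after this patch. */
bool model_may_use_simd(void)
{
	return !irq_ctx || irq_from_user || !in_kernel_fpu;
}
```

An interrupt that arrives from kernel code between the enable and disable
calls sees in_kernel_fpu set, so the model reports that SIMD is
unavailable there.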



* Re: [PATCH] mm/nvdimm: Pick the right alignment default when creating dax devices
From: Aneesh Kumar K.V @ 2019-05-19  8:55 UTC
  To: dan.j.williams; +Cc: linux-mm, Vaibhav Jain, linuxppc-dev, linux-nvdimm
In-Reply-To: <de5cbe7d-bd47-6793-1f1a-2274c5c59eb5@linux.ibm.com>


Hi Dan,

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> On 5/17/19 8:19 PM, Vaibhav Jain wrote:
>> Hi Aneesh,
>> 

....

>>
>>> +	/*
>>> +	 * Check whether the we support the alignment. For Dax if the
>>> +	 * superblock alignment is not matching, we won't initialize
>>> +	 * the device.
>>> +	 */
>>> +	if (!nd_supported_alignment(align) &&
>>> +	    memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
>> Suggestion to change this check to:
>> 
>> if (memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN) &&
>>     !nd_supported_alignment(align))
>> 
>> It would look  a bit more natural i.e. "If the device has dax signature and alignment is
>> not supported".
>> 
>
> I guess that should be !memcmp()? . I will send an updated patch with 
> the hash failure details in the commit message.
>
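
The !memcmp() point above comes down to memcmp() returning 0 on a match.
A minimal sketch of the intended predicate follows. must_reject() is a
hypothetical helper, and the PFN_SIG_LEN and DAX_SIG values are quoted
from memory of drivers/nvdimm/pfn.h, so treat them as assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Values as recalled from drivers/nvdimm/pfn.h; treat them as assumptions. */
#define PFN_SIG_LEN	16
#define DAX_SIG		"NVDIMM_DAX_INFO\0"

/*
 * "If the device has a dax signature and the alignment is not supported":
 * memcmp() returns 0 when the signatures match, hence the == 0.
 */
bool must_reject(const char *sig, bool alignment_supported)
{
	bool is_dax = memcmp(sig, DAX_SIG, PFN_SIG_LEN) == 0;

	return is_dax && !alignment_supported;
}
```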

We need clarification on what the expected failure behaviour should be.
nd_pmem_probe doesn't really have a failure behaviour in this regard.
For example:

I created a dax device with 16M alignment:

{                                          
  "dev":"namespace0.0",
  "mode":"devdax",                         
  "map":"dev",                             
  "size":"9.98 GiB (10.72 GB)",
  "uuid":"ba62ef22-ebdf-4779-96f5-e6135383ed22",
  "raw_uuid":"7b2492f9-7160-4ee9-9c3d-2f547d9ef3ee",
  "daxregion":{                            
    "id":0,                                
    "size":"9.98 GiB (10.72 GB)",
    "align":16777216,
    "devices":[                            
      {                                    
        "chardev":"dax0.0",
        "size":"9.98 GiB (10.72 GB)"
      }                                    
    ]                                      
  },                                       
  "align":16777216,                        
  "numa_node":0,                           
  "supported_alignments":[
    65536,                                 
    16777216                               
  ]                                        
}      

Now what we want is to fail the initialization of the device when we
boot a kernel that doesn't support 16M page size. But with the
nd_pmem_probe failure behaviour we now end up with:

[
  {
    "dev":"namespace0.0",
    "mode":"fsdax",
    "map":"mem",
    "size":10737418240,
    "uuid":"7b2492f9-7160-4ee9-9c3d-2f547d9ef3ee",
    "blockdev":"pmem0"
  }
]

So it fell through this code:

	/* if we find a valid info-block we'll come back as that personality */
	if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
			|| nd_dax_probe(dev, ndns) == 0)
		return -ENXIO;

	/* ...otherwise we're just a raw pmem device */
	return pmem_attach_disk(dev, ndns);


Is it ok if I update the code such that we don't do that default
pmem_attach_disk if we have a label area?

-aneesh



* Re: [PATCH] mm/nvdimm: Pick the right alignment default when creating dax devices
From: Dan Williams @ 2019-05-19 16:30 UTC
  To: Aneesh Kumar K.V; +Cc: Linux MM, Vaibhav Jain, linuxppc-dev, linux-nvdimm
In-Reply-To: <87sgtaddru.fsf@linux.ibm.com>

On Sun, May 19, 2019 at 1:55 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
>
> Hi Dan,
>
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>
> > On 5/17/19 8:19 PM, Vaibhav Jain wrote:
> >> Hi Aneesh,
> >>
>
> ....
>
> >>
> >>> +   /*
> >>> +    * Check whether the we support the alignment. For Dax if the
> >>> +    * superblock alignment is not matching, we won't initialize
> >>> +    * the device.
> >>> +    */
> >>> +   if (!nd_supported_alignment(align) &&
> >>> +       memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN)) {
> >> Suggestion to change this check to:
> >>
> >> if (memcmp(pfn_sb->signature, DAX_SIG, PFN_SIG_LEN) &&
> >>     !nd_supported_alignment(align))
> >>
> >> It would look  a bit more natural i.e. "If the device has dax signature and alignment is
> >> not supported".
> >>
> >
> > I guess that should be !memcmp()? . I will send an updated patch with
> > the hash failure details in the commit message.
> >
>
> We need clarification on what the expected failure behaviour should be.
> The nd_pmem_probe doesn't really have a failure behaviour in this
> regard. For example.
>
> I created a dax device with 16M alignment
>
> {
>   "dev":"namespace0.0",
>   "mode":"devdax",
>   "map":"dev",
>   "size":"9.98 GiB (10.72 GB)",
>   "uuid":"ba62ef22-ebdf-4779-96f5-e6135383ed22",
>   "raw_uuid":"7b2492f9-7160-4ee9-9c3d-2f547d9ef3ee",
>   "daxregion":{
>     "id":0,
>     "size":"9.98 GiB (10.72 GB)",
>     "align":16777216,
>     "devices":[
>       {
>         "chardev":"dax0.0",
>         "size":"9.98 GiB (10.72 GB)"
>       }
>     ]
>   },
>   "align":16777216,
>   "numa_node":0,
>   "supported_alignments":[
>     65536,
>     16777216
>   ]
> }
>
> Now what we want is to fail the initialization of the device when we
> boot a kernel that doesn't support 16M page size. But with the
> nd_pmem_probe failure behaviour we now end up with
>
> [
>   {
>     "dev":"namespace0.0",
>     "mode":"fsdax",
>     "map":"mem",
>     "size":10737418240,
>     "uuid":"7b2492f9-7160-4ee9-9c3d-2f547d9ef3ee",
>     "blockdev":"pmem0"
>   }
> ]
>
> So it did fallthrough the
>
>         /* if we find a valid info-block we'll come back as that personality */
>         if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
>                         || nd_dax_probe(dev, ndns) == 0)
>                 return -ENXIO;
>
>         /* ...otherwise we're just a raw pmem device */
>         return pmem_attach_disk(dev, ndns);
>
>
> Is it ok if I update the code such that we don't do that default
> pmem_attach_disk if we have a label area?

Yes. This seems to be a new case where the driver finds a valid
info-block, but the capability to load that configuration is missing. So
perhaps special-case an EOPNOTSUPP return code from those info-block
probe routines as "fail, and don't fall back to a raw device".
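
A rough userspace model of that special case is sketched below. It is
not the driver code: pmem_probe_model() and RAW_ATTACHED are hypothetical,
the three arguments stand for the return codes of nd_btt_probe(),
nd_pfn_probe() and nd_dax_probe(), and propagating -EOPNOTSUPP to the
caller is an assumption about how the failure would be reported.

```c
#include <assert.h>
#include <errno.h>

enum { RAW_ATTACHED = 1 };	/* hypothetical marker: raw pmem disk attached */

/*
 * btt_rc, pfn_rc and dax_rc model the return codes of nd_btt_probe(),
 * nd_pfn_probe() and nd_dax_probe(), evaluated in that order.
 */
int pmem_probe_model(int btt_rc, int pfn_rc, int dax_rc)
{
	const int rc[] = { btt_rc, pfn_rc, dax_rc };
	int i;

	for (i = 0; i < 3; i++) {
		/* Valid info-block: we come back as that personality. */
		if (rc[i] == 0)
			return -ENXIO;
		/* Valid info-block, unsupported config: fail, no fallback. */
		if (rc[i] == -EOPNOTSUPP)
			return -EOPNOTSUPP;
	}
	/* ...otherwise we're just a raw pmem device. */
	return RAW_ATTACHED;	/* stands in for pmem_attach_disk() */
}
```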


* Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.2-2 tag
From: pr-tracker-bot @ 2019-05-19 17:45 UTC
  To: Michael Ellerman
  Cc: aneesh.kumar, linux-kernel, Linus Torvalds, tobin, linuxppc-dev
In-Reply-To: <87bm00818p.fsf@concordia.ellerman.id.au>

The pull request you sent on Sat, 18 May 2019 21:12:54 +1000:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.2-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/86a78a8b8d0414455c2174852968ce54205add82

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [Bug 203647] New: Locking API testsuite fails "mixed read-lock/lock-write ABBA" rlock on kernels >=4.14.x
From: bugzilla-daemon @ 2019-05-19 19:34 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=203647

            Bug ID: 203647
           Summary: Locking API testsuite fails "mixed
                    read-lock/lock-write ABBA" rlock on kernels >=4.14.x
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 5.1.3
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-64
          Assignee: platform_ppc-64@kernel-bugs.osdl.org
          Reporter: erhard_f@mailbox.org
        Regression: No

Created attachment 282831
  --> https://bugzilla.kernel.org/attachment.cgi?id=282831&action=edit
dmesg (5.1.3, G5 11,2)

This test has probably failed on ppc64 for as long as it has existed.
Kernel 4.9.x passes all tests, since it does not seem to contain the
"mixed read-lock/lock-write ABBA" test.

The machine is a PowerMac G5 11,2 running Gentoo Linux ppc64, Big Endian,
4 KiB pagesize.

[    0.002051] ------------------------
[    0.002065] | Locking API testsuite:
[    0.002079] ----------------------------------------------------------------------------
[    0.002111]                                  | spin |wlock |rlock |mutex | wsem | rsem |
[    0.002142]   --------------------------------------------------------------------------
[    0.002179]                      A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.007366]                  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.012471]              A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.017598]              A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.022740]          A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.027912]          A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.033083]          A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.038269]                     double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.043319]                   initialize held:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[    0.048379]   --------------------------------------------------------------------------
[    0.048411]               recursive read-lock:             |  ok  |             |  ok  |
[    0.049894]            recursive read-lock #2:             |  ok  |             |  ok  |
[    0.051375]             mixed read-write-lock:             |  ok  |             |  ok  |
[    0.052859]             mixed write-read-lock:             |  ok  |             |  ok  |
[    0.054333]   mixed read-lock/lock-write ABBA:             |FAILED|             |  ok  |
[    0.055802]    mixed read-lock/lock-read ABBA:             |  ok  |             |  ok  |
[    0.057290]  mixed write-lock/lock-write ABBA:             |  ok  |             |  ok  |
[    0.058771]   --------------------------------------------------------------------------

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 203647] Locking API testsuite fails "mixed read-lock/lock-write ABBA" rlock on kernels >=4.14.x
From: bugzilla-daemon @ 2019-05-19 19:35 UTC
  To: linuxppc-dev
In-Reply-To: <bug-203647-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203647

--- Comment #1 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 282833
  --> https://bugzilla.kernel.org/attachment.cgi?id=282833&action=edit
dmesg (5.0.17, G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 203647] Locking API testsuite fails "mixed read-lock/lock-write ABBA" rlock on kernels >=4.14.x
From: bugzilla-daemon @ 2019-05-19 19:35 UTC
  To: linuxppc-dev
In-Reply-To: <bug-203647-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203647

--- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 282835
  --> https://bugzilla.kernel.org/attachment.cgi?id=282835&action=edit
dmesg (4.19.44, G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 203647] Locking API testsuite fails "mixed read-lock/lock-write ABBA" rlock on kernels >=4.14.x
From: bugzilla-daemon @ 2019-05-19 19:36 UTC
  To: linuxppc-dev
In-Reply-To: <bug-203647-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203647

--- Comment #3 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 282837
  --> https://bugzilla.kernel.org/attachment.cgi?id=282837&action=edit
dmesg (4.14.120, G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 203647] Locking API testsuite fails "mixed read-lock/lock-write ABBA" rlock on kernels >=4.14.x
From: bugzilla-daemon @ 2019-05-19 19:36 UTC
  To: linuxppc-dev
In-Reply-To: <bug-203647-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203647

--- Comment #4 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 282839
  --> https://bugzilla.kernel.org/attachment.cgi?id=282839&action=edit
dmesg (4.9.177, G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 203647] Locking API testsuite fails "mixed read-lock/lock-write ABBA" rlock on kernels >=4.14.x
From: bugzilla-daemon @ 2019-05-19 19:37 UTC
  To: linuxppc-dev
In-Reply-To: <bug-203647-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203647

--- Comment #5 from Erhard F. (erhard_f@mailbox.org) ---
Created attachment 282841
  --> https://bugzilla.kernel.org/attachment.cgi?id=282841&action=edit
kernel .config (5.1.3, G5 11,2)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 203597] kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage
From: bugzilla-daemon @ 2019-05-19 20:11 UTC
  To: linuxppc-dev
In-Reply-To: <bug-203597-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203597

Erhard F. (erhard_f@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #3 from Erhard F. (erhard_f@mailbox.org) ---
(In reply to Christophe Leroy from comment #2)
> You are missing following commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b45ba4a51cd
Your fix landed in 4.9.177 and I can confirm my G4 boots fine now. Thanks!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* Re: [PATCH V2 3/3] soc: fsl: add RCPM driver
From: Pavel Machek @ 2019-05-19 21:38 UTC
  To: Ran Wang
  Cc: Mark Rutland, Len Brown, devicetree, Greg Kroah-Hartman, linux-pm,
	Rafael J . Wysocki, linux-kernel, Li Yang, Rob Herring,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20190517033946.30763-3-ran.wang_1@nxp.com>


Hi!


> +
> +struct rcpm {
> +	unsigned int wakeup_cells;
> +	void __iomem *ippdexpcr_base;
> +	bool	little_endian;
> +};

Inconsistent whitespace.


> +static int rcpm_pm_prepare(struct device *dev)
> +{
> +	struct device_node *np = dev->of_node;
> +	struct wakeup_source *ws;
> +	struct rcpm *rcpm;
> +	u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1], tmp;
> +	int i, ret;
> +
> +	rcpm = dev_get_drvdata(dev);
> +	if (!rcpm)
> +		return -EINVAL;
> +
> +	/* Begin with first registered wakeup source */
> +	ws = wakeup_source_get_next(NULL);
> +	while (ws) {

while (ws = wakeup_source_get_next(NULL)) ?


> +static int rcpm_probe(struct platform_device *pdev)
> +{
> +	struct device	*dev = &pdev->dev;
> +	struct resource *r;
> +	struct rcpm		*rcpm;
> +	int ret;

Whitespace.

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH V2 1/3] PM: wakeup: Add routine to help fetch wakeup source object.
From: Pavel Machek @ 2019-05-19 21:34 UTC (permalink / raw)
  To: Ran Wang
  Cc: Mark Rutland, Len Brown, devicetree, Greg Kroah-Hartman, linux-pm,
	Rafael J . Wysocki, linux-kernel, Li Yang, Rob Herring,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20190517033946.30763-1-ran.wang_1@nxp.com>

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]


> --- a/include/linux/pm_wakeup.h

> @@ -70,6 +71,7 @@ struct wakeup_source {
>  	unsigned long		wakeup_count;
>  	bool			active:1;
>  	bool			autosleep_enabled:1;
> +	struct device	*attached_dev;
>  };
>  
>  #ifdef CONFIG_PM_SLEEP

You might want to format this similarly to the rest...
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH] ocxl: Fix potential memory leak on context creation
From: Andrew Donnellan @ 2019-05-20  1:45 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, andrew.donnellan, alastair; +Cc: clombard
In-Reply-To: <20190517142054.13933-1-fbarrat@linux.ibm.com>

On 18/5/19 12:20 am, Frederic Barrat wrote:
> If we couldn't fully init a context, we were leaking memory.
> 
> Fixes: b9721d275cc2 ("ocxl: Allow external drivers to use OpenCAPI contexts")
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

Acked-by: Andrew Donnellan <ajd@linux.ibm.com>

> ---
>   drivers/misc/ocxl/context.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index bab9c9364184..ab93156aa83e 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -22,6 +22,7 @@ int ocxl_context_alloc(struct ocxl_context **context, struct ocxl_afu *afu,
>   			afu->pasid_base + afu->pasid_max, GFP_KERNEL);
>   	if (pasid < 0) {
>   		mutex_unlock(&afu->contexts_lock);
> +		kfree(*context);

(defensive programming: set *context = NULL so that if the caller 
ignores the return code we get an obvious crash)

>   		return pasid;
>   	}
>   	afu->pasid_count++;
> 

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited


^ permalink raw reply

* [Bug 203597] kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage
From: bugzilla-daemon @ 2019-05-20  1:53 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-203597-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=203597

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED
                 CC|                            |michael@ellerman.id.au

^ permalink raw reply

* Re: [PATCH] crypto: vmx - CTR: always increment IV as quadword
From: Daniel Axtens @ 2019-05-20  1:59 UTC (permalink / raw)
  To: mpe, ebiggers, linux-crypto, Herbert Xu
  Cc: leo.barbosa, Stephan Mueller, nayna, omosnacek, leitao, pfsmorigo,
	marcelo.cerri, gcwilson, linuxppc-dev
In-Reply-To: <20190515102450.30557-1-dja@axtens.net>

Daniel Axtens <dja@axtens.net> writes:

> The kernel self-tests picked up an issue with CTR mode:
> alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
>
> Test vector 3 has an IV of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD, so
> after 3 increments it should wrap around to 0.
>
> In the aesp8-ppc code from OpenSSL, there are two paths that
> increment IVs: the bulk (8 at a time) path, and the individual
> path which is used when there are fewer than 8 AES blocks to
> process.
>
> In the bulk path, the IV is incremented with vadduqm: "Vector
> Add Unsigned Quadword Modulo", which does 128-bit addition.
>
> In the individual path, however, the IV is incremented with
> vadduwm: "Vector Add Unsigned Word Modulo", which instead
> does 4 32-bit additions. Thus the IV would instead become
> FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.
>
> Use vadduqm.
>
> This was probably a typo originally, what with q and w being
> adjacent. It is a pretty narrow edge case: I am really
> impressed by the quality of the kernel self-tests!
>
> Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM")
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> ---
>
> I'll pass this along internally to get it into OpenSSL as well.

I passed this along to OpenSSL and got pretty comprehensively schooled:
https://github.com/openssl/openssl/pull/8942

It seems we tweaked the OpenSSL code to use a 128-bit counter, whereas
the original code was in fact designed for a 32-bit counter. We must
have changed the vaddu instruction in the bulk path but not in the
individual path, as they're both vadduwm (4x32-bit) upstream.

I think this change is still correct with regard to the kernel,
but I guess it's probably something where I should have done a more
thorough read of the documentation before diving into the code, and
perhaps we should note it in the code somewhere too. Ah well.

Regards,
Daniel

> ---
>  drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
> index de78282b8f44..9c6b5c1d6a1a 100644
> --- a/drivers/crypto/vmx/aesp8-ppc.pl
> +++ b/drivers/crypto/vmx/aesp8-ppc.pl
> @@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
>  	addi		$idx,$idx,16
>  	bdnz		Loop_ctr32_enc
>  
> -	vadduwm		$ivec,$ivec,$one
> +	vadduqm		$ivec,$ivec,$one
>  	 vmr		$dat,$inptail
>  	 lvx		$inptail,0,$inp
>  	 addi		$inp,$inp,16
> -- 
> 2.19.1

^ permalink raw reply

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
From: Michael Ellerman @ 2019-05-20  2:02 UTC (permalink / raw)
  To: bharata, srikanth
  Cc: aneesh.kumar, linux-kernel, npiggin, linux-next, bharata,
	linuxppc-dev
In-Reply-To: <20190518141434.GA22939@in.ibm.com>

Bharata B Rao <bharata@linux.ibm.com> writes:
> On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
>> Hello,
>> 
>> On power9 host, performing memory hotunplug from ppc64le guest results in
>> kernel oops.
>> 
>> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
>> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>> 
>> Recreation steps:
>> 
>> 1. Boot a guest with below mem configuration:
>>   <maxMemory slots='32' unit='KiB'>33554432</maxMemory>
>>   <memory unit='KiB'>8388608</memory>
>>   <currentMemory unit='KiB'>4194304</currentMemory>
>>   <cpu>
>>     <numa>
>>       <cell id='0' cpus='0-31' memory='8388608' unit='KiB'/>
>>     </numa>
>>   </cpu>
>> 
>> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
>> reboot guest -> once guest comes back try to unplug 8G memory
>> 
>> mem.xml used:
>> <memory model='dimm'>
>> <target>
>> <size unit='GiB'>8</size>
>> <node>0</node>
>> </target>
>> </memory>
>> 
>> Memory attach and detach commands used:
>>     virsh attach-device vm1 ./mem.xml --live
>>     virsh detach-device vm1 ./mem.xml --live
>> 
>> Trace seen inside guest after unplug, guest just hangs there forever:
>> 
>> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
>> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
>> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum
>> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not tainted 5.1.0-dirty #2
>> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
>> [   21.963355] NIP:  c000000000079e18 LR: c000000000c79308 CTR: 0000000000008000
>> [   21.963392] REGS: c0000003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
>> [   21.963422] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28002884  XER: 20040000
>> [   21.963470] CFAR: c000000000c79304 IRQMASK: 0
>> [   21.963470] GPR00: c000000000c79308 c0000003f8803780 c000000001521000 0000000000fff8c0
>> [   21.963470] GPR04: 0000000000000001 00000000ffe30005 0000000000000005 0000000000000020
>> [   21.963470] GPR08: 0000000000000000 0000000000000001 c00a000000fff8e0 c0000000016d21a0
>> [   21.963470] GPR12: c0000000016e7b90 c000000007ff2700 c00a000000a00000 c0000003ffe30100
>> [   21.963470] GPR16: c0000003ffe30000 c0000000014aa4de c00a0000009f0000 c0000000016d21b0
>> [   21.963470] GPR20: c0000000014de588 0000000000000001 c0000000016d21b8 c00a000000a00000
>> [   21.963470] GPR24: 0000000000000000 ffffffffffffffff c00a000000a00000 c0000003ffe96000
>> [   21.963470] GPR28: c00a000000a00000 c00a000000a00000 c0000003fffec000 c00a000000fff8c0
>> [   21.963802] NIP [c000000000079e18] pte_fragment_free+0x48/0xd0
>> [   21.963838] LR [c000000000c79308] remove_pagetable+0x49c/0x5b4
>> [   21.963873] Call Trace:
>> [   21.963890] [c0000003f8803780] [c0000003ffe997f0] 0xc0000003ffe997f0 (unreliable)
>> [   21.963933] [c0000003f88037b0] [0000000000000000] (null)
>> [   21.963969] [c0000003f88038c0] [c00000000006f038] vmemmap_free+0x218/0x2e0
>> [   21.964006] [c0000003f8803940] [c00000000036f100] sparse_remove_one_section+0xd0/0x138
>> [   21.964050] [c0000003f8803980] [c000000000383a50] __remove_pages+0x410/0x560
>> [   21.964093] [c0000003f8803a90] [c000000000c784d8] arch_remove_memory+0x68/0xdc
>> [   21.964136] [c0000003f8803ad0] [c000000000385d74] __remove_memory+0xc4/0x110
>> [   21.964180] [c0000003f8803b10] [c0000000000d44e4] dlpar_remove_lmb+0x94/0x140
>> [   21.964223] [c0000003f8803b50] [c0000000000d52b4] dlpar_memory+0x464/0xd00
>> [   21.964259] [c0000003f8803be0] [c0000000000cd5c0] handle_dlpar_errorlog+0xc0/0x190
>> [   21.964303] [c0000003f8803c50] [c0000000000cd6bc] pseries_hp_work_fn+0x2c/0x60
>> [   21.964346] [c0000003f8803c80] [c00000000013a4a0] process_one_work+0x2b0/0x5a0
>> [   21.964388] [c0000003f8803d10] [c00000000013a818] worker_thread+0x88/0x610
>> [   21.964434] [c0000003f8803db0] [c000000000143884] kthread+0x1a4/0x1b0
>> [   21.964468] [c0000003f8803e20] [c00000000000bdc4] ret_from_kernel_thread+0x5c/0x78
>> [   21.964506] Instruction dump:
>> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe90000 7fff1a14 395f0020 813f0020
>> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b090000> 7c0004ac 7d205028 3129ffff
>> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
>> [   21.966349]
>> [   21.966383] Sending IPI to other CPUs
>> [   21.978335] IPI complete
>> [   21.981354] kexec: Starting switchover sequence.
>> I'm in purgatory
>
> git bisect points to
>
> commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Fri Jul 27 21:48:17 2018 +1000
>
>     powerpc/64s: Fix page table fragment refcount race vs speculative references
>
>     The page table fragment allocator uses the main page refcount racily
>     with respect to speculative references. A customer observed a BUG due
>     to page table page refcount underflow in the fragment allocator. This
>     can be caused by the fragment allocator set_page_count stomping on a
>     speculative reference, and then the speculative failure handler
>     decrements the new reference, and the underflow eventually pops when
>     the page tables are freed.
>
>     Fix this by using a dedicated field in the struct page for the page
>     table fragment allocator.
>
>     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>     Cc: stable@vger.kernel.org # v3.10+

That's the commit that added the BUG_ON(), so prior to that you won't
see the crash.

cheers

^ permalink raw reply

* RE: [PATCH V2 1/3] PM: wakeup: Add routine to help fetch wakeup source object.
From: Ran Wang @ 2019-05-20  2:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Mark Rutland, Len Brown, devicetree@vger.kernel.org,
	Greg Kroah-Hartman, linux-pm@vger.kernel.org, Rafael J . Wysocki,
	linux-kernel@vger.kernel.org, Leo Li, Rob Herring,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190519213457.GG31403@amd>

Hi Pavel,

On Monday, May 20, 2019 05:35, Pavel Machek wrote:
> 
> > --- a/include/linux/pm_wakeup.h
> 
> > @@ -70,6 +71,7 @@ struct wakeup_source {
> >  	unsigned long		wakeup_count;
> >  	bool			active:1;
> >  	bool			autosleep_enabled:1;
> > +	struct device	*attached_dev;
> >  };
> >
> >  #ifdef CONFIG_PM_SLEEP
> 
> You might want to format this similarly to the rest...

OK, will update, thanks.

Regards,
Ran

^ permalink raw reply

* [PATCH] kbuild: do not check name uniqueness of builtin modules
From: Masahiro Yamada @ 2019-05-20  2:54 UTC (permalink / raw)
  To: linux-kbuild
  Cc: Michael Schmitz, Stephen Rothwell, linuxppc-dev, Kees Cook,
	Arnd Bergmann, Masahiro Yamada, Greg KH, Rusty Russell,
	Lucas De Marchi, linux-kernel, Lucas De Marchi, Linus Torvalds,
	Jessica Yu, Sam Ravnborg

I just thought it was a good idea to scan modules.builtin in the name
uniqueness checking, but Stephen reported a false positive.

ppc64_defconfig produces:

  warning: same basename if the following are built as modules:
    arch/powerpc/platforms/powermac/nvram.ko
    drivers/char/nvram.ko

..., which is a false positive because the former is never built as
a module as you see in arch/powerpc/platforms/powermac/Makefile:

  # CONFIG_NVRAM is an arch. independent tristate symbol, for pmac32 we really
  # need this to be a bool.  Cheat here and pretend CONFIG_NVRAM=m is really
  # CONFIG_NVRAM=y
  obj-$(CONFIG_NVRAM:m=y)         += nvram.o

Since we cannot predict how tricky Makefiles are written in the wild,
modules.builtin may potentially contain false positives. I do not
think it is a big deal as far as kmod is concerned, but false positive
warnings in the kernel build make people upset. It is better to not
do it.

Even without checking modules.builtin, we have enough (and more solid)
test coverage with allmodconfig.

While I touched this part, I replaced the sed code with a neater one
provided by Stephen.

Link: https://lkml.org/lkml/2019/5/19/120
Link: https://lkml.org/lkml/2019/5/19/123
Fixes: 3a48a91901c5 ("kbuild: check uniqueness of module names")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 scripts/modules-check.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/modules-check.sh b/scripts/modules-check.sh
index 2f659530e1ec..39e8cb36ba19 100755
--- a/scripts/modules-check.sh
+++ b/scripts/modules-check.sh
@@ -6,10 +6,10 @@ set -e
 # Check uniqueness of module names
 check_same_name_modules()
 {
-	for m in $(sed 's:.*/::' modules.order modules.builtin | sort | uniq -d)
+	for m in $(sed 's:.*/::' modules.order | sort | uniq -d)
 	do
-		echo "warning: same basename if the following are built as modules:" >&2
-		sed "/\/$m/!d;s:^kernel/:  :" modules.order modules.builtin >&2
+		echo "warning: same module names found:" >&2
+		sed -n "/\/$m/s:^kernel/:  :p" modules.order >&2
 	done
 }

-- 
2.17.1
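
For readers who find the sed | sort | uniq -d pipeline terse, the check
can be sketched in Python roughly as follows. This is an illustration
under stated assumptions, not part of the patch: the function name
duplicate_module_names and the two small file listings are hypothetical,
modelled on the ppc64_defconfig report above.

```python
import posixpath
from collections import defaultdict


def duplicate_module_names(paths):
    """Map each basename that occurs more than once to its full paths."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[posixpath.basename(path)].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


# Hypothetical file contents, one path per line in the real files.
modules_order = ["kernel/drivers/char/nvram.ko", "kernel/fs/fuse/fuse.ko"]
modules_builtin = ["kernel/arch/powerpc/platforms/powermac/nvram.ko"]

# Old behaviour: scan both files, which flags the powermac nvram.
old_check = duplicate_module_names(modules_order + modules_builtin)
# New behaviour: scan modules.order only.
new_check = duplicate_module_names(modules_order)

print(sorted(old_check))
print(sorted(new_check))
```

In this toy input the combined scan reports nvram.ko as a duplicate,
while scanning modules.order alone reports nothing, which is the
behaviour the patch aims for.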

^ permalink raw reply related

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
From: Bharata B Rao @ 2019-05-20  4:25 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: aneesh.kumar, linux-kernel, npiggin, linux-next, bharata,
	srikanth, linuxppc-dev
In-Reply-To: <878sv1993k.fsf@concordia.ellerman.id.au>

On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> > git bisect points to
> >
> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> > Author: Nicholas Piggin <npiggin@gmail.com>
> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >
> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> >
> >     The page table fragment allocator uses the main page refcount racily
> >     with respect to speculative references. A customer observed a BUG due
> >     to page table page refcount underflow in the fragment allocator. This
> >     can be caused by the fragment allocator set_page_count stomping on a
> >     speculative reference, and then the speculative failure handler
> >     decrements the new reference, and the underflow eventually pops when
> >     the page tables are freed.
> >
> >     Fix this by using a dedicated field in the struct page for the page
> >     table fragment allocator.
> >
> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >     Cc: stable@vger.kernel.org # v3.10+
> 
> That's the commit that added the BUG_ON(), so prior to that you won't
> see the crash.

Right, but the commit says it fixes page table page refcount underflow by
introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
for this pt_frag_refcount.

BTW, if I go below this commit, I don't hit the pagecount

VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);

which is in pte_fragment_free() path.

Regards,
Bharata.


^ permalink raw reply

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
From: Nicholas Piggin @ 2019-05-20  4:48 UTC (permalink / raw)
  To: bharata, Michael Ellerman
  Cc: aneesh.kumar, linux-kernel, srikanth, linux-next, bharata,
	linuxppc-dev
In-Reply-To: <20190520042533.GB22939@in.ibm.com>

Bharata B Rao's on May 20, 2019 2:25 pm:
> On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote:
>> Bharata B Rao <bharata@linux.ibm.com> writes:
>> > git bisect points to
>> >
>> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> > Author: Nicholas Piggin <npiggin@gmail.com>
>> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> >
>> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
>> >
>> >     The page table fragment allocator uses the main page refcount racily
>> >     with respect to speculative references. A customer observed a BUG due
>> >     to page table page refcount underflow in the fragment allocator. This
>> >     can be caused by the fragment allocator set_page_count stomping on a
>> >     speculative reference, and then the speculative failure handler
>> >     decrements the new reference, and the underflow eventually pops when
>> >     the page tables are freed.
>> >
>> >     Fix this by using a dedicated field in the struct page for the page
>> >     table fragment allocator.
>> >
>> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> >     Cc: stable@vger.kernel.org # v3.10+
>> 
>> That's the commit that added the BUG_ON(), so prior to that you won't
>> see the crash.
> 
> Right, but the commit says it fixes page table page refcount underflow by
> introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> for this pt_frag_refcount.

The fixed underflow is caused by a bug (race on page count) that got 
fixed by that patch. You are hitting a different underflow here. It's
not certain my patch caused it, I'm just trying to reproduce now.

> 
> BTW, if I go below this commit, I don't hit the pagecount
> 
> VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
> 
> which is in pte_fragment_free() path.

Do you have CONFIG_DEBUG_VM=y?

Thanks,
Nick


^ permalink raw reply
