LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: Declaring unrecoverable_exception() as __noreturn ?
From: Christophe Leroy @ 2021-02-11  6:13 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <87mtwbnrlf.fsf@mpe.ellerman.id.au>



Le 11/02/2021 à 05:41, Michael Ellerman a écrit :
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Excerpts from Christophe Leroy's message of February 11, 2021 2:44 am:
>>> As far as I can see, almost all callers of unrecoverable_exception() expect it to never return.
>>>
>>> Can we mark it __noreturn ?
>>
>> I don't see why not, do_exit is noreturn. We could make die() noreturn
>> as well.
> 
> I'm always nervous about that, because we can return if a debugger is
> involved:
> 
> DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)

Hum ... Is that correct to define it as an interrupt handler ?

Also, I see it declared a second time in interrupt.c, this time not as an interrupt handler. Is that 
wanted ?

> {
> 	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
> 		 regs->trap, regs->nip, regs->msr);
> 	die("Unrecoverable exception", regs, SIGABRT);
> }
> 
> void die(const char *str, struct pt_regs *regs, long err)
> {
> 	unsigned long flags;
> 
> 	/*
> 	 * system_reset_excption handles debugger, crash dump, panic, for 0x100
> 	 */
> 	if (TRAP(regs) != 0x100) {
> 		if (debugger(regs))
> 			return;
> 
> 
> We obviously don't want to optimise for that case, but it worries me
> slightly if we're marking things noreturn when they can actually return.
> 

I don't think I want to declare die() as __noreturn, need to look at it more in details first.

Christophe

^ permalink raw reply

* [PATCH] powerpc/traps: Declare unrecoverable_exception() as __noreturn
From: Christophe Leroy @ 2021-02-11  6:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linuxppc-dev, linux-kernel

unrecoverable_exception() is never expected to return, most callers
have an infiniteloop in case it returns.

Ensure it really never returns by terminating it with a BUG(), and
declare it __no_return.

It always GCC to really simplify functions calling it. In the exemple below,
it avoids the stack frame in the likely fast path and avoids code duplication
for the exit.

With this patch:

	00000348 <interrupt_exit_kernel_prepare>:
	 348:	81 43 00 84 	lwz     r10,132(r3)
	 34c:	71 48 00 02 	andi.   r8,r10,2
	 350:	41 82 00 2c 	beq     37c <interrupt_exit_kernel_prepare+0x34>
	 354:	71 4a 40 00 	andi.   r10,r10,16384
	 358:	40 82 00 20 	bne     378 <interrupt_exit_kernel_prepare+0x30>
	 35c:	80 62 00 70 	lwz     r3,112(r2)
	 360:	74 63 00 01 	andis.  r3,r3,1
	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
	 368:	7d 40 00 a6 	mfmsr   r10
	 36c:	7c 11 13 a6 	mtspr   81,r0
	 370:	7c 12 13 a6 	mtspr   82,r0
	 374:	4e 80 00 20 	blr
	 378:	48 00 00 00 	b       378 <interrupt_exit_kernel_prepare+0x30>
	 37c:	94 21 ff f0 	stwu    r1,-16(r1)
	 380:	7c 08 02 a6 	mflr    r0
	 384:	90 01 00 14 	stw     r0,20(r1)
	 388:	48 00 00 01 	bl      388 <interrupt_exit_kernel_prepare+0x40>
				388: R_PPC_REL24	unrecoverable_exception
	 38c:	38 e2 00 70 	addi    r7,r2,112
	 390:	3d 00 00 01 	lis     r8,1
	 394:	7c c0 38 28 	lwarx   r6,0,r7
	 398:	7c c6 40 78 	andc    r6,r6,r8
	 39c:	7c c0 39 2d 	stwcx.  r6,0,r7
	 3a0:	40 a2 ff f4 	bne     394 <interrupt_exit_kernel_prepare+0x4c>
	 3a4:	38 60 00 01 	li      r3,1
	 3a8:	4b ff ff c0 	b       368 <interrupt_exit_kernel_prepare+0x20>

Without this patch:

	00000348 <interrupt_exit_kernel_prepare>:
	 348:	94 21 ff f0 	stwu    r1,-16(r1)
	 34c:	93 e1 00 0c 	stw     r31,12(r1)
	 350:	7c 7f 1b 78 	mr      r31,r3
	 354:	81 23 00 84 	lwz     r9,132(r3)
	 358:	71 2a 00 02 	andi.   r10,r9,2
	 35c:	41 82 00 34 	beq     390 <interrupt_exit_kernel_prepare+0x48>
	 360:	71 29 40 00 	andi.   r9,r9,16384
	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
	 368:	80 62 00 70 	lwz     r3,112(r2)
	 36c:	74 63 00 01 	andis.  r3,r3,1
	 370:	40 82 00 3c 	bne     3ac <interrupt_exit_kernel_prepare+0x64>
	 374:	7d 20 00 a6 	mfmsr   r9
	 378:	7c 11 13 a6 	mtspr   81,r0
	 37c:	7c 12 13 a6 	mtspr   82,r0
	 380:	83 e1 00 0c 	lwz     r31,12(r1)
	 384:	38 21 00 10 	addi    r1,r1,16
	 388:	4e 80 00 20 	blr
	 38c:	48 00 00 00 	b       38c <interrupt_exit_kernel_prepare+0x44>
	 390:	7c 08 02 a6 	mflr    r0
	 394:	90 01 00 14 	stw     r0,20(r1)
	 398:	48 00 00 01 	bl      398 <interrupt_exit_kernel_prepare+0x50>
				398: R_PPC_REL24	unrecoverable_exception
	 39c:	80 01 00 14 	lwz     r0,20(r1)
	 3a0:	81 3f 00 84 	lwz     r9,132(r31)
	 3a4:	7c 08 03 a6 	mtlr    r0
	 3a8:	4b ff ff b8 	b       360 <interrupt_exit_kernel_prepare+0x18>
	 3ac:	39 02 00 70 	addi    r8,r2,112
	 3b0:	3d 40 00 01 	lis     r10,1
	 3b4:	7c e0 40 28 	lwarx   r7,0,r8
	 3b8:	7c e7 50 78 	andc    r7,r7,r10
	 3bc:	7c e0 41 2d 	stwcx.  r7,0,r8
	 3c0:	40 a2 ff f4 	bne     3b4 <interrupt_exit_kernel_prepare+0x6c>
	 3c4:	38 60 00 01 	li      r3,1
	 3c8:	4b ff ff ac 	b       374 <interrupt_exit_kernel_prepare+0x2c>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/interrupt.h | 2 +-
 arch/powerpc/kernel/interrupt.c      | 1 -
 arch/powerpc/kernel/traps.c          | 2 ++
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index dcff30e3919b..fa8bfb91f8df 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -411,7 +411,7 @@ DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
 DECLARE_INTERRUPT_HANDLER(CacheLockingException);
 DECLARE_INTERRUPT_HANDLER(SPEFloatingPointException);
 DECLARE_INTERRUPT_HANDLER(SPEFloatingPointRoundException);
-DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
+DECLARE_INTERRUPT_HANDLER(unrecoverable_exception) __noreturn;
 DECLARE_INTERRUPT_HANDLER(WatchdogException);
 DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
 
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index eca3be36c18c..7e7106641ca9 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -440,7 +440,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
 	return ret;
 }
 
-void unrecoverable_exception(struct pt_regs *regs);
 void preempt_schedule_irq(void);
 
 notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2afa05ad21c8..1ff776e9e8e3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -2173,6 +2173,8 @@ DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
 	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
 		 regs->trap, regs->nip, regs->msr);
 	die("Unrecoverable exception", regs, SIGABRT);
+	/* die() should not return */
+	BUG();
 }
 NOKPROBE_SYMBOL(unrecoverable_exception);
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc: remove interrupt handler functions from the noinstr section
From: Nicholas Piggin @ 2021-02-11  6:36 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Stephen Rothwell, Nicholas Piggin

The allyesconfig ppc64 kernel fails to link with relocations unable to
fit after commit 3a96570ffceb ("powerpc: convert interrupt handlers to
use wrappers"), which is due to the interrupt handler functions being
put into the .noinstr.text section, which the linker script places on
the opposite side of the main .text section from the interrupt entry
asm code which calls the handlers.

This results in a lot of linker stubs that overwhelm the 252-byte sized
space we allow for them, or in the case of BE a .opd relocation link
error for some reason.

It's not required to put interrupt handlers in the .noinstr section,
previously they used NOKPROBE_SYMBOL, so take them out and replace
with a NOKPROBE_SYMBOL in the wrapper macro. Remove the explicit
NOKPROBE_SYMBOL macros in the interrupt handler functions. This makes
a number of interrupt handlers nokprobe that were not prior to the
interrupt wrappers commit, but since that commit they were made
nokprobe due to being in .noinstr.text, so this fix does not change
that.

The fixes tag is different to the commit that first exposes the problem
because it is where the wrapper macros were introduced.

Fixes: 8d41fc618ab8 ("powerpc: interrupt handler wrapper functions")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/interrupt.h | 25 ++++++++++++++++++++-----
 arch/powerpc/kernel/traps.c          |  9 ---------
 arch/powerpc/mm/fault.c              |  1 -
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 4badb3e51c19..ffb568587553 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -6,6 +6,7 @@
 #include <linux/hardirq.h>
 #include <asm/cputime.h>
 #include <asm/ftrace.h>
+#include <asm/kprobes.h>
 #include <asm/runlatch.h>
 
 struct interrupt_state {
@@ -164,6 +165,15 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
 #endif
 }
 
+/*
+ * Don't use like to use noinstr here like x86, but rather add NOKPROBE_SYMBOL
+ * to each function definition. The reason for this is the noinstr section
+ * is placed after the main text section, i.e., very far away from the
+ * interrupt entry asm. That creates problems with fitting linker stubs when
+ * building large kernels.
+ */
+#define interrupt_handler __visible noinline notrace __no_kcsan __no_sanitize_address
+
 /**
  * DECLARE_INTERRUPT_HANDLER_RAW - Declare raw interrupt handler function
  * @func:	Function name of the entry point
@@ -198,7 +208,7 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter
 #define DEFINE_INTERRUPT_HANDLER_RAW(func)				\
 static __always_inline long ____##func(struct pt_regs *regs);		\
 									\
-__visible noinstr long func(struct pt_regs *regs)			\
+interrupt_handler long func(struct pt_regs *regs)			\
 {									\
 	long ret;							\
 									\
@@ -206,6 +216,7 @@ __visible noinstr long func(struct pt_regs *regs)			\
 									\
 	return ret;							\
 }									\
+NOKPROBE_SYMBOL(func);							\
 									\
 static __always_inline long ____##func(struct pt_regs *regs)
 
@@ -228,7 +239,7 @@ static __always_inline long ____##func(struct pt_regs *regs)
 #define DEFINE_INTERRUPT_HANDLER(func)					\
 static __always_inline void ____##func(struct pt_regs *regs);		\
 									\
-__visible noinstr void func(struct pt_regs *regs)			\
+interrupt_handler void func(struct pt_regs *regs)			\
 {									\
 	struct interrupt_state state;					\
 									\
@@ -238,6 +249,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 									\
 	interrupt_exit_prepare(regs, &state);				\
 }									\
+NOKPROBE_SYMBOL(func);							\
 									\
 static __always_inline void ____##func(struct pt_regs *regs)
 
@@ -262,7 +274,7 @@ static __always_inline void ____##func(struct pt_regs *regs)
 #define DEFINE_INTERRUPT_HANDLER_RET(func)				\
 static __always_inline long ____##func(struct pt_regs *regs);		\
 									\
-__visible noinstr long func(struct pt_regs *regs)			\
+interrupt_handler long func(struct pt_regs *regs)			\
 {									\
 	struct interrupt_state state;					\
 	long ret;							\
@@ -275,6 +287,7 @@ __visible noinstr long func(struct pt_regs *regs)			\
 									\
 	return ret;							\
 }									\
+NOKPROBE_SYMBOL(func);							\
 									\
 static __always_inline long ____##func(struct pt_regs *regs)
 
@@ -297,7 +310,7 @@ static __always_inline long ____##func(struct pt_regs *regs)
 #define DEFINE_INTERRUPT_HANDLER_ASYNC(func)				\
 static __always_inline void ____##func(struct pt_regs *regs);		\
 									\
-__visible noinstr void func(struct pt_regs *regs)			\
+interrupt_handler void func(struct pt_regs *regs)			\
 {									\
 	struct interrupt_state state;					\
 									\
@@ -307,6 +320,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 									\
 	interrupt_async_exit_prepare(regs, &state);			\
 }									\
+NOKPROBE_SYMBOL(func);							\
 									\
 static __always_inline void ____##func(struct pt_regs *regs)
 
@@ -331,7 +345,7 @@ static __always_inline void ____##func(struct pt_regs *regs)
 #define DEFINE_INTERRUPT_HANDLER_NMI(func)				\
 static __always_inline long ____##func(struct pt_regs *regs);		\
 									\
-__visible noinstr long func(struct pt_regs *regs)			\
+interrupt_handler long func(struct pt_regs *regs)			\
 {									\
 	struct interrupt_nmi_state state;				\
 	long ret;							\
@@ -344,6 +358,7 @@ __visible noinstr long func(struct pt_regs *regs)			\
 									\
 	return ret;							\
 }									\
+NOKPROBE_SYMBOL(func);							\
 									\
 static __always_inline long ____##func(struct pt_regs *regs)
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 39c8b7e9b91a..1583fd1c6010 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -513,7 +513,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
 
 	return 0;
 }
-NOKPROBE_SYMBOL(system_reset_exception);
 
 /*
  * I/O accesses can cause machine checks on powermacs.
@@ -798,7 +797,6 @@ void die_mce(const char *str, struct pt_regs *regs, long err)
 		nmi_exit();
 	die(str, regs, err);
 }
-NOKPROBE_SYMBOL(die_mce);
 
 /*
  * BOOK3S_64 does not call this handler as a non-maskable interrupt
@@ -851,7 +849,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
 	return 0;
 #endif
 }
-NOKPROBE_SYMBOL(machine_check_exception);
 
 DEFINE_INTERRUPT_HANDLER(SMIException) /* async? */
 {
@@ -1113,7 +1110,6 @@ DEFINE_INTERRUPT_HANDLER(single_step_exception)
 
 	_exception(SIGTRAP, regs, TRAP_TRACE, regs->nip);
 }
-NOKPROBE_SYMBOL(single_step_exception);
 
 /*
  * After we have successfully emulated an instruction, we have to
@@ -1556,7 +1552,6 @@ DEFINE_INTERRUPT_HANDLER(program_check_exception)
 {
 	do_program_check(regs);
 }
-NOKPROBE_SYMBOL(program_check_exception);
 
 /*
  * This occurs when running in hypervisor mode on POWER6 or later
@@ -1567,7 +1562,6 @@ DEFINE_INTERRUPT_HANDLER(emulation_assist_interrupt)
 	regs->msr |= REASON_ILLEGAL;
 	do_program_check(regs);
 }
-NOKPROBE_SYMBOL(emulation_assist_interrupt);
 
 DEFINE_INTERRUPT_HANDLER(alignment_exception)
 {
@@ -2034,7 +2028,6 @@ DEFINE_INTERRUPT_HANDLER(DebugException)
 	} else
 		handle_debug(regs, debug_status);
 }
-NOKPROBE_SYMBOL(DebugException);
 #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
 
 #ifdef CONFIG_ALTIVEC
@@ -2183,7 +2176,6 @@ DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
 		 regs->trap, regs->nip, regs->msr);
 	die("Unrecoverable exception", regs, SIGABRT);
 }
-NOKPROBE_SYMBOL(unrecoverable_exception);
 
 #if defined(CONFIG_BOOKE_WDT) || defined(CONFIG_40x)
 /*
@@ -2214,7 +2206,6 @@ DEFINE_INTERRUPT_HANDLER(kernel_bad_stack)
 	       regs->gpr[1], regs->nip);
 	die("Bad kernel stack pointer", regs, SIGABRT);
 }
-NOKPROBE_SYMBOL(kernel_bad_stack);
 
 void __init trap_init(void)
 {
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b26a7643fc6e..bb368257b55c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -566,7 +566,6 @@ DEFINE_INTERRUPT_HANDLER_RET(do_page_fault)
 {
 	return __do_page_fault(regs);
 }
-NOKPROBE_SYMBOL(do_page_fault);
 
 #ifdef CONFIG_PPC_BOOK3S_64
 /* Same as do_page_fault but interrupt entry has already run in do_hash_fault */
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH] powerpc/traps: Declare unrecoverable_exception() as __noreturn
From: Christophe Leroy @ 2021-02-11  7:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <f46a01750b1a00c9c43725899c9cf8eb6c6a0587.1613025208.git.christophe.leroy@csgroup.eu>



Le 11/02/2021 à 07:34, Christophe Leroy a écrit :
> unrecoverable_exception() is never expected to return, most callers
> have an infiniteloop in case it returns.
> 
> Ensure it really never returns by terminating it with a BUG(), and
> declare it __no_return.

Not so easy, gcc complains about DEFINE_INTERRUPT_HANDLER() returning while the function is declared 
__noreturn, __noreturn is needed there too.

> 
> It always GCC to really simplify functions calling it. In the exemple below,
> it avoids the stack frame in the likely fast path and avoids code duplication
> for the exit.
> 
> With this patch:
> 
> 	00000348 <interrupt_exit_kernel_prepare>:
> 	 348:	81 43 00 84 	lwz     r10,132(r3)
> 	 34c:	71 48 00 02 	andi.   r8,r10,2
> 	 350:	41 82 00 2c 	beq     37c <interrupt_exit_kernel_prepare+0x34>
> 	 354:	71 4a 40 00 	andi.   r10,r10,16384
> 	 358:	40 82 00 20 	bne     378 <interrupt_exit_kernel_prepare+0x30>
> 	 35c:	80 62 00 70 	lwz     r3,112(r2)
> 	 360:	74 63 00 01 	andis.  r3,r3,1
> 	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
> 	 368:	7d 40 00 a6 	mfmsr   r10
> 	 36c:	7c 11 13 a6 	mtspr   81,r0
> 	 370:	7c 12 13 a6 	mtspr   82,r0
> 	 374:	4e 80 00 20 	blr
> 	 378:	48 00 00 00 	b       378 <interrupt_exit_kernel_prepare+0x30>
> 	 37c:	94 21 ff f0 	stwu    r1,-16(r1)
> 	 380:	7c 08 02 a6 	mflr    r0
> 	 384:	90 01 00 14 	stw     r0,20(r1)
> 	 388:	48 00 00 01 	bl      388 <interrupt_exit_kernel_prepare+0x40>
> 				388: R_PPC_REL24	unrecoverable_exception
> 	 38c:	38 e2 00 70 	addi    r7,r2,112
> 	 390:	3d 00 00 01 	lis     r8,1
> 	 394:	7c c0 38 28 	lwarx   r6,0,r7
> 	 398:	7c c6 40 78 	andc    r6,r6,r8
> 	 39c:	7c c0 39 2d 	stwcx.  r6,0,r7
> 	 3a0:	40 a2 ff f4 	bne     394 <interrupt_exit_kernel_prepare+0x4c>
> 	 3a4:	38 60 00 01 	li      r3,1
> 	 3a8:	4b ff ff c0 	b       368 <interrupt_exit_kernel_prepare+0x20>
> 
> Without this patch:
> 
> 	00000348 <interrupt_exit_kernel_prepare>:
> 	 348:	94 21 ff f0 	stwu    r1,-16(r1)
> 	 34c:	93 e1 00 0c 	stw     r31,12(r1)
> 	 350:	7c 7f 1b 78 	mr      r31,r3
> 	 354:	81 23 00 84 	lwz     r9,132(r3)
> 	 358:	71 2a 00 02 	andi.   r10,r9,2
> 	 35c:	41 82 00 34 	beq     390 <interrupt_exit_kernel_prepare+0x48>
> 	 360:	71 29 40 00 	andi.   r9,r9,16384
> 	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
> 	 368:	80 62 00 70 	lwz     r3,112(r2)
> 	 36c:	74 63 00 01 	andis.  r3,r3,1
> 	 370:	40 82 00 3c 	bne     3ac <interrupt_exit_kernel_prepare+0x64>
> 	 374:	7d 20 00 a6 	mfmsr   r9
> 	 378:	7c 11 13 a6 	mtspr   81,r0
> 	 37c:	7c 12 13 a6 	mtspr   82,r0
> 	 380:	83 e1 00 0c 	lwz     r31,12(r1)
> 	 384:	38 21 00 10 	addi    r1,r1,16
> 	 388:	4e 80 00 20 	blr
> 	 38c:	48 00 00 00 	b       38c <interrupt_exit_kernel_prepare+0x44>
> 	 390:	7c 08 02 a6 	mflr    r0
> 	 394:	90 01 00 14 	stw     r0,20(r1)
> 	 398:	48 00 00 01 	bl      398 <interrupt_exit_kernel_prepare+0x50>
> 				398: R_PPC_REL24	unrecoverable_exception
> 	 39c:	80 01 00 14 	lwz     r0,20(r1)
> 	 3a0:	81 3f 00 84 	lwz     r9,132(r31)
> 	 3a4:	7c 08 03 a6 	mtlr    r0
> 	 3a8:	4b ff ff b8 	b       360 <interrupt_exit_kernel_prepare+0x18>
> 	 3ac:	39 02 00 70 	addi    r8,r2,112
> 	 3b0:	3d 40 00 01 	lis     r10,1
> 	 3b4:	7c e0 40 28 	lwarx   r7,0,r8
> 	 3b8:	7c e7 50 78 	andc    r7,r7,r10
> 	 3bc:	7c e0 41 2d 	stwcx.  r7,0,r8
> 	 3c0:	40 a2 ff f4 	bne     3b4 <interrupt_exit_kernel_prepare+0x6c>
> 	 3c4:	38 60 00 01 	li      r3,1
> 	 3c8:	4b ff ff ac 	b       374 <interrupt_exit_kernel_prepare+0x2c>
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>   arch/powerpc/include/asm/interrupt.h | 2 +-
>   arch/powerpc/kernel/interrupt.c      | 1 -
>   arch/powerpc/kernel/traps.c          | 2 ++
>   3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index dcff30e3919b..fa8bfb91f8df 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -411,7 +411,7 @@ DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
>   DECLARE_INTERRUPT_HANDLER(CacheLockingException);
>   DECLARE_INTERRUPT_HANDLER(SPEFloatingPointException);
>   DECLARE_INTERRUPT_HANDLER(SPEFloatingPointRoundException);
> -DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
> +DECLARE_INTERRUPT_HANDLER(unrecoverable_exception) __noreturn;
>   DECLARE_INTERRUPT_HANDLER(WatchdogException);
>   DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
>   
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index eca3be36c18c..7e7106641ca9 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -440,7 +440,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
>   	return ret;
>   }
>   
> -void unrecoverable_exception(struct pt_regs *regs);
>   void preempt_schedule_irq(void);
>   
>   notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 2afa05ad21c8..1ff776e9e8e3 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -2173,6 +2173,8 @@ DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
>   	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
>   		 regs->trap, regs->nip, regs->msr);
>   	die("Unrecoverable exception", regs, SIGABRT);
> +	/* die() should not return */
> +	BUG();
>   }
>   NOKPROBE_SYMBOL(unrecoverable_exception);
>   
> 

^ permalink raw reply

* [PATCH v2] powerpc/traps: Declare unrecoverable_exception() as __noreturn
From: Christophe Leroy @ 2021-02-11  7:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linuxppc-dev, linux-kernel

unrecoverable_exception() is never expected to return, most callers
have an infiniteloop in case it returns.

Ensure it really never returns by terminating it with a BUG(), and
declare it __no_return.

It always GCC to really simplify functions calling it. In the exemple below,
it avoids the stack frame in the likely fast path and avoids code duplication
for the exit.

With this patch:

	00000348 <interrupt_exit_kernel_prepare>:
	 348:	81 43 00 84 	lwz     r10,132(r3)
	 34c:	71 48 00 02 	andi.   r8,r10,2
	 350:	41 82 00 2c 	beq     37c <interrupt_exit_kernel_prepare+0x34>
	 354:	71 4a 40 00 	andi.   r10,r10,16384
	 358:	40 82 00 20 	bne     378 <interrupt_exit_kernel_prepare+0x30>
	 35c:	80 62 00 70 	lwz     r3,112(r2)
	 360:	74 63 00 01 	andis.  r3,r3,1
	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
	 368:	7d 40 00 a6 	mfmsr   r10
	 36c:	7c 11 13 a6 	mtspr   81,r0
	 370:	7c 12 13 a6 	mtspr   82,r0
	 374:	4e 80 00 20 	blr
	 378:	48 00 00 00 	b       378 <interrupt_exit_kernel_prepare+0x30>
	 37c:	94 21 ff f0 	stwu    r1,-16(r1)
	 380:	7c 08 02 a6 	mflr    r0
	 384:	90 01 00 14 	stw     r0,20(r1)
	 388:	48 00 00 01 	bl      388 <interrupt_exit_kernel_prepare+0x40>
				388: R_PPC_REL24	unrecoverable_exception
	 38c:	38 e2 00 70 	addi    r7,r2,112
	 390:	3d 00 00 01 	lis     r8,1
	 394:	7c c0 38 28 	lwarx   r6,0,r7
	 398:	7c c6 40 78 	andc    r6,r6,r8
	 39c:	7c c0 39 2d 	stwcx.  r6,0,r7
	 3a0:	40 a2 ff f4 	bne     394 <interrupt_exit_kernel_prepare+0x4c>
	 3a4:	38 60 00 01 	li      r3,1
	 3a8:	4b ff ff c0 	b       368 <interrupt_exit_kernel_prepare+0x20>

Without this patch:

	00000348 <interrupt_exit_kernel_prepare>:
	 348:	94 21 ff f0 	stwu    r1,-16(r1)
	 34c:	93 e1 00 0c 	stw     r31,12(r1)
	 350:	7c 7f 1b 78 	mr      r31,r3
	 354:	81 23 00 84 	lwz     r9,132(r3)
	 358:	71 2a 00 02 	andi.   r10,r9,2
	 35c:	41 82 00 34 	beq     390 <interrupt_exit_kernel_prepare+0x48>
	 360:	71 29 40 00 	andi.   r9,r9,16384
	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
	 368:	80 62 00 70 	lwz     r3,112(r2)
	 36c:	74 63 00 01 	andis.  r3,r3,1
	 370:	40 82 00 3c 	bne     3ac <interrupt_exit_kernel_prepare+0x64>
	 374:	7d 20 00 a6 	mfmsr   r9
	 378:	7c 11 13 a6 	mtspr   81,r0
	 37c:	7c 12 13 a6 	mtspr   82,r0
	 380:	83 e1 00 0c 	lwz     r31,12(r1)
	 384:	38 21 00 10 	addi    r1,r1,16
	 388:	4e 80 00 20 	blr
	 38c:	48 00 00 00 	b       38c <interrupt_exit_kernel_prepare+0x44>
	 390:	7c 08 02 a6 	mflr    r0
	 394:	90 01 00 14 	stw     r0,20(r1)
	 398:	48 00 00 01 	bl      398 <interrupt_exit_kernel_prepare+0x50>
				398: R_PPC_REL24	unrecoverable_exception
	 39c:	80 01 00 14 	lwz     r0,20(r1)
	 3a0:	81 3f 00 84 	lwz     r9,132(r31)
	 3a4:	7c 08 03 a6 	mtlr    r0
	 3a8:	4b ff ff b8 	b       360 <interrupt_exit_kernel_prepare+0x18>
	 3ac:	39 02 00 70 	addi    r8,r2,112
	 3b0:	3d 40 00 01 	lis     r10,1
	 3b4:	7c e0 40 28 	lwarx   r7,0,r8
	 3b8:	7c e7 50 78 	andc    r7,r7,r10
	 3bc:	7c e0 41 2d 	stwcx.  r7,0,r8
	 3c0:	40 a2 ff f4 	bne     3b4 <interrupt_exit_kernel_prepare+0x6c>
	 3c4:	38 60 00 01 	li      r3,1
	 3c8:	4b ff ff ac 	b       374 <interrupt_exit_kernel_prepare+0x2c>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Also add __noreturn to the definition
---
 arch/powerpc/include/asm/interrupt.h | 2 +-
 arch/powerpc/kernel/interrupt.c      | 1 -
 arch/powerpc/kernel/traps.c          | 4 +++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index dcff30e3919b..e6950352347d 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -411,7 +411,7 @@ DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
 DECLARE_INTERRUPT_HANDLER(CacheLockingException);
 DECLARE_INTERRUPT_HANDLER(SPEFloatingPointException);
 DECLARE_INTERRUPT_HANDLER(SPEFloatingPointRoundException);
-DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
+__noreturn DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
 DECLARE_INTERRUPT_HANDLER(WatchdogException);
 DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
 
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index eca3be36c18c..7e7106641ca9 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -440,7 +440,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
 	return ret;
 }
 
-void unrecoverable_exception(struct pt_regs *regs);
 void preempt_schedule_irq(void);
 
 notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2afa05ad21c8..22486d27fa82 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -2168,11 +2168,13 @@ DEFINE_INTERRUPT_HANDLER(SPEFloatingPointRoundException)
  * in the MSR is 0.  This indicates that SRR0/1 are live, and that
  * we therefore lost state by taking this exception.
  */
-DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
+__noreturn DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
 {
 	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
 		 regs->trap, regs->nip, regs->msr);
 	die("Unrecoverable exception", regs, SIGABRT);
+	/* die() should not return */
+	BUG();
 }
 NOKPROBE_SYMBOL(unrecoverable_exception);
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
From: Christophe Leroy @ 2021-02-11  7:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linuxppc-dev, linux-kernel

powerpc BUG_ON() is based on using twnei or tdnei instruction,
which obliges gcc to format the condition into a 0 or 1 value
in a register.

By using a generic implementation, gcc will generate a branch
to the unconditional trap generated by BUG().

As modern powerpc implement branch folding, that's even more efficient.

See below the difference at the entry of system_call_exception.

With the patch:

	00000000 <system_call_exception>:
	   0:	81 6a 00 84 	lwz     r11,132(r10)
	   4:	90 6a 00 88 	stw     r3,136(r10)
	   8:	71 60 00 02 	andi.   r0,r11,2
	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
	  10:	71 60 40 00 	andi.   r0,r11,16384
	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
	  18:	71 6b 80 00 	andi.   r11,r11,32768
	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
	  20:	94 21 ff e0 	stwu    r1,-32(r1)
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7d 8c 42 e6 	mftb    r12
	...
	  7c:	0f e0 00 00 	twui    r0,0
	  80:	0f e0 00 00 	twui    r0,0
	  84:	0f e0 00 00 	twui    r0,0

Without the patch:

	00000000 <system_call_exception>:
	   0:	94 21 ff e0 	stwu    r1,-32(r1)
	   4:	93 e1 00 1c 	stw     r31,28(r1)
	   8:	90 6a 00 88 	stw     r3,136(r10)
	   c:	81 6a 00 84 	lwz     r11,132(r10)
	  10:	69 60 00 02 	xori    r0,r11,2
	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
	  18:	0f 00 00 00 	twnei   r0,0
	  1c:	69 60 40 00 	xori    r0,r11,16384
	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
	  24:	0f 00 00 00 	twnei   r0,0
	  28:	69 6b 80 00 	xori    r11,r11,32768
	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
	  30:	0f 0b 00 00 	twnei   r11,0
	  34:	7d 8c 42 e6 	mftb    r12

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/bug.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index d1635ffbb179..21103d3e1f29 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -69,15 +69,6 @@
 	unreachable();						\
 } while (0)
 
-#define BUG_ON(x) do {						\
-	if (__builtin_constant_p(x)) {				\
-		if (x)						\
-			BUG();					\
-	} else {						\
-		BUG_ENTRY(PPC_TLNEI " %4, 0", 0, "r" ((__force long)(x)));	\
-	}							\
-} while (0)
-
 #define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
 
 #define WARN_ON(x) ({						\
@@ -94,7 +85,6 @@
 })
 
 #define HAVE_ARCH_BUG
-#define HAVE_ARCH_BUG_ON
 #define HAVE_ARCH_WARN_ON
 #endif /* __ASSEMBLY __ */
 #else
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH] powerpc/traps: Declare unrecoverable_exception() as __noreturn
From: Gabriel Paubert @ 2021-02-11  7:47 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linux-kernel, npiggin, Paul Mackerras, linuxppc-dev
In-Reply-To: <f46a01750b1a00c9c43725899c9cf8eb6c6a0587.1613025208.git.christophe.leroy@csgroup.eu>

On Thu, Feb 11, 2021 at 06:34:43AM +0000, Christophe Leroy wrote:
> unrecoverable_exception() is never expected to return, most callers
> have an infiniteloop in case it returns.
> 
> Ensure it really never returns by terminating it with a BUG(), and
> declare it __no_return.
> 
> It always GCC to really simplify functions calling it. In the exemple below,

s/always/allows ?

(Otherwise I can't parse it.)

> it avoids the stack frame in the likely fast path and avoids code duplication
> for the exit.

Indeed, nice code generation improvement.

> 
> With this patch:
> 
> 	00000348 <interrupt_exit_kernel_prepare>:
> 	 348:	81 43 00 84 	lwz     r10,132(r3)
> 	 34c:	71 48 00 02 	andi.   r8,r10,2
> 	 350:	41 82 00 2c 	beq     37c <interrupt_exit_kernel_prepare+0x34>
> 	 354:	71 4a 40 00 	andi.   r10,r10,16384
> 	 358:	40 82 00 20 	bne     378 <interrupt_exit_kernel_prepare+0x30>
> 	 35c:	80 62 00 70 	lwz     r3,112(r2)
> 	 360:	74 63 00 01 	andis.  r3,r3,1
> 	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
> 	 368:	7d 40 00 a6 	mfmsr   r10
> 	 36c:	7c 11 13 a6 	mtspr   81,r0
> 	 370:	7c 12 13 a6 	mtspr   82,r0
> 	 374:	4e 80 00 20 	blr
> 	 378:	48 00 00 00 	b       378 <interrupt_exit_kernel_prepare+0x30>

Infinite loop (seems to be on test of MSR_PR)?

	Gabriel

> 	 37c:	94 21 ff f0 	stwu    r1,-16(r1)
> 	 380:	7c 08 02 a6 	mflr    r0
> 	 384:	90 01 00 14 	stw     r0,20(r1)
> 	 388:	48 00 00 01 	bl      388 <interrupt_exit_kernel_prepare+0x40>
> 				388: R_PPC_REL24	unrecoverable_exception
> 	 38c:	38 e2 00 70 	addi    r7,r2,112
> 	 390:	3d 00 00 01 	lis     r8,1
> 	 394:	7c c0 38 28 	lwarx   r6,0,r7
> 	 398:	7c c6 40 78 	andc    r6,r6,r8
> 	 39c:	7c c0 39 2d 	stwcx.  r6,0,r7
> 	 3a0:	40 a2 ff f4 	bne     394 <interrupt_exit_kernel_prepare+0x4c>
> 	 3a4:	38 60 00 01 	li      r3,1
> 	 3a8:	4b ff ff c0 	b       368 <interrupt_exit_kernel_prepare+0x20>
> 
> Without this patch:
> 
> 	00000348 <interrupt_exit_kernel_prepare>:
> 	 348:	94 21 ff f0 	stwu    r1,-16(r1)
> 	 34c:	93 e1 00 0c 	stw     r31,12(r1)
> 	 350:	7c 7f 1b 78 	mr      r31,r3
> 	 354:	81 23 00 84 	lwz     r9,132(r3)
> 	 358:	71 2a 00 02 	andi.   r10,r9,2
> 	 35c:	41 82 00 34 	beq     390 <interrupt_exit_kernel_prepare+0x48>
> 	 360:	71 29 40 00 	andi.   r9,r9,16384
> 	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
> 	 368:	80 62 00 70 	lwz     r3,112(r2)
> 	 36c:	74 63 00 01 	andis.  r3,r3,1
> 	 370:	40 82 00 3c 	bne     3ac <interrupt_exit_kernel_prepare+0x64>
> 	 374:	7d 20 00 a6 	mfmsr   r9
> 	 378:	7c 11 13 a6 	mtspr   81,r0
> 	 37c:	7c 12 13 a6 	mtspr   82,r0
> 	 380:	83 e1 00 0c 	lwz     r31,12(r1)
> 	 384:	38 21 00 10 	addi    r1,r1,16
> 	 388:	4e 80 00 20 	blr
> 	 38c:	48 00 00 00 	b       38c <interrupt_exit_kernel_prepare+0x44>
> 	 390:	7c 08 02 a6 	mflr    r0
> 	 394:	90 01 00 14 	stw     r0,20(r1)
> 	 398:	48 00 00 01 	bl      398 <interrupt_exit_kernel_prepare+0x50>
> 				398: R_PPC_REL24	unrecoverable_exception
> 	 39c:	80 01 00 14 	lwz     r0,20(r1)
> 	 3a0:	81 3f 00 84 	lwz     r9,132(r31)
> 	 3a4:	7c 08 03 a6 	mtlr    r0
> 	 3a8:	4b ff ff b8 	b       360 <interrupt_exit_kernel_prepare+0x18>
> 	 3ac:	39 02 00 70 	addi    r8,r2,112
> 	 3b0:	3d 40 00 01 	lis     r10,1
> 	 3b4:	7c e0 40 28 	lwarx   r7,0,r8
> 	 3b8:	7c e7 50 78 	andc    r7,r7,r10
> 	 3bc:	7c e0 41 2d 	stwcx.  r7,0,r8
> 	 3c0:	40 a2 ff f4 	bne     3b4 <interrupt_exit_kernel_prepare+0x6c>
> 	 3c4:	38 60 00 01 	li      r3,1
> 	 3c8:	4b ff ff ac 	b       374 <interrupt_exit_kernel_prepare+0x2c>
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/include/asm/interrupt.h | 2 +-
>  arch/powerpc/kernel/interrupt.c      | 1 -
>  arch/powerpc/kernel/traps.c          | 2 ++
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index dcff30e3919b..fa8bfb91f8df 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -411,7 +411,7 @@ DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
>  DECLARE_INTERRUPT_HANDLER(CacheLockingException);
>  DECLARE_INTERRUPT_HANDLER(SPEFloatingPointException);
>  DECLARE_INTERRUPT_HANDLER(SPEFloatingPointRoundException);
> -DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
> +DECLARE_INTERRUPT_HANDLER(unrecoverable_exception) __noreturn;
>  DECLARE_INTERRUPT_HANDLER(WatchdogException);
>  DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
>  
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index eca3be36c18c..7e7106641ca9 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -440,7 +440,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
>  	return ret;
>  }
>  
> -void unrecoverable_exception(struct pt_regs *regs);
>  void preempt_schedule_irq(void);
>  
>  notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 2afa05ad21c8..1ff776e9e8e3 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -2173,6 +2173,8 @@ DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
>  	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
>  		 regs->trap, regs->nip, regs->msr);
>  	die("Unrecoverable exception", regs, SIGABRT);
> +	/* die() should not return */
> +	BUG();
>  }
>  NOKPROBE_SYMBOL(unrecoverable_exception);
>  
> -- 
> 2.25.0
> 
 


^ permalink raw reply

* Re: [PATCH] powerpc/traps: Declare unrecoverable_exception() as __noreturn
From: Christophe Leroy @ 2021-02-11  9:02 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linux-kernel, npiggin, Paul Mackerras, linuxppc-dev
In-Reply-To: <20210211074723.GA16987@lt-gp.iram.es>



Le 11/02/2021 à 08:47, Gabriel Paubert a écrit :
> On Thu, Feb 11, 2021 at 06:34:43AM +0000, Christophe Leroy wrote:
>> unrecoverable_exception() is never expected to return, most callers
>> have an infiniteloop in case it returns.
>>
>> Ensure it really never returns by terminating it with a BUG(), and
>> declare it __no_return.
>>
>> It always GCC to really simplify functions calling it. In the exemple below,
> 
> s/always/allows ?

Yes

> 
> (Otherwise I can't parse it.)
> 
>> it avoids the stack frame in the likely fast path and avoids code duplication
>> for the exit.
> 
> Indeed, nice code generation improvement.
> 
>>
>> With this patch:
>>
>> 	00000348 <interrupt_exit_kernel_prepare>:
>> 	 348:	81 43 00 84 	lwz     r10,132(r3)
>> 	 34c:	71 48 00 02 	andi.   r8,r10,2
>> 	 350:	41 82 00 2c 	beq     37c <interrupt_exit_kernel_prepare+0x34>
>> 	 354:	71 4a 40 00 	andi.   r10,r10,16384
>> 	 358:	40 82 00 20 	bne     378 <interrupt_exit_kernel_prepare+0x30>
>> 	 35c:	80 62 00 70 	lwz     r3,112(r2)
>> 	 360:	74 63 00 01 	andis.  r3,r3,1
>> 	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
>> 	 368:	7d 40 00 a6 	mfmsr   r10
>> 	 36c:	7c 11 13 a6 	mtspr   81,r0
>> 	 370:	7c 12 13 a6 	mtspr   82,r0
>> 	 374:	4e 80 00 20 	blr
>> 	 378:	48 00 00 00 	b       378 <interrupt_exit_kernel_prepare+0x30>
> 
> Infinite loop (seems to be on test of MSR_PR)?

Yes, that's what you get when CONFIG_BUG is not selected.

/include/asm-generic/bug.h:

#ifndef HAVE_ARCH_BUG
#define BUG() do {} while (1)
#endif


> 
> 	Gabriel
> 
>> 	 37c:	94 21 ff f0 	stwu    r1,-16(r1)
>> 	 380:	7c 08 02 a6 	mflr    r0
>> 	 384:	90 01 00 14 	stw     r0,20(r1)
>> 	 388:	48 00 00 01 	bl      388 <interrupt_exit_kernel_prepare+0x40>
>> 				388: R_PPC_REL24	unrecoverable_exception
>> 	 38c:	38 e2 00 70 	addi    r7,r2,112
>> 	 390:	3d 00 00 01 	lis     r8,1
>> 	 394:	7c c0 38 28 	lwarx   r6,0,r7
>> 	 398:	7c c6 40 78 	andc    r6,r6,r8
>> 	 39c:	7c c0 39 2d 	stwcx.  r6,0,r7
>> 	 3a0:	40 a2 ff f4 	bne     394 <interrupt_exit_kernel_prepare+0x4c>
>> 	 3a4:	38 60 00 01 	li      r3,1
>> 	 3a8:	4b ff ff c0 	b       368 <interrupt_exit_kernel_prepare+0x20>
>>
>> Without this patch:
>>
>> 	00000348 <interrupt_exit_kernel_prepare>:
>> 	 348:	94 21 ff f0 	stwu    r1,-16(r1)
>> 	 34c:	93 e1 00 0c 	stw     r31,12(r1)
>> 	 350:	7c 7f 1b 78 	mr      r31,r3
>> 	 354:	81 23 00 84 	lwz     r9,132(r3)
>> 	 358:	71 2a 00 02 	andi.   r10,r9,2
>> 	 35c:	41 82 00 34 	beq     390 <interrupt_exit_kernel_prepare+0x48>
>> 	 360:	71 29 40 00 	andi.   r9,r9,16384
>> 	 364:	40 82 00 28 	bne     38c <interrupt_exit_kernel_prepare+0x44>
>> 	 368:	80 62 00 70 	lwz     r3,112(r2)
>> 	 36c:	74 63 00 01 	andis.  r3,r3,1
>> 	 370:	40 82 00 3c 	bne     3ac <interrupt_exit_kernel_prepare+0x64>
>> 	 374:	7d 20 00 a6 	mfmsr   r9
>> 	 378:	7c 11 13 a6 	mtspr   81,r0
>> 	 37c:	7c 12 13 a6 	mtspr   82,r0
>> 	 380:	83 e1 00 0c 	lwz     r31,12(r1)
>> 	 384:	38 21 00 10 	addi    r1,r1,16
>> 	 388:	4e 80 00 20 	blr
>> 	 38c:	48 00 00 00 	b       38c <interrupt_exit_kernel_prepare+0x44>
>> 	 390:	7c 08 02 a6 	mflr    r0
>> 	 394:	90 01 00 14 	stw     r0,20(r1)
>> 	 398:	48 00 00 01 	bl      398 <interrupt_exit_kernel_prepare+0x50>
>> 				398: R_PPC_REL24	unrecoverable_exception
>> 	 39c:	80 01 00 14 	lwz     r0,20(r1)
>> 	 3a0:	81 3f 00 84 	lwz     r9,132(r31)
>> 	 3a4:	7c 08 03 a6 	mtlr    r0
>> 	 3a8:	4b ff ff b8 	b       360 <interrupt_exit_kernel_prepare+0x18>
>> 	 3ac:	39 02 00 70 	addi    r8,r2,112
>> 	 3b0:	3d 40 00 01 	lis     r10,1
>> 	 3b4:	7c e0 40 28 	lwarx   r7,0,r8
>> 	 3b8:	7c e7 50 78 	andc    r7,r7,r10
>> 	 3bc:	7c e0 41 2d 	stwcx.  r7,0,r8
>> 	 3c0:	40 a2 ff f4 	bne     3b4 <interrupt_exit_kernel_prepare+0x6c>
>> 	 3c4:	38 60 00 01 	li      r3,1
>> 	 3c8:	4b ff ff ac 	b       374 <interrupt_exit_kernel_prepare+0x2c>
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>>   arch/powerpc/include/asm/interrupt.h | 2 +-
>>   arch/powerpc/kernel/interrupt.c      | 1 -
>>   arch/powerpc/kernel/traps.c          | 2 ++
>>   3 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
>> index dcff30e3919b..fa8bfb91f8df 100644
>> --- a/arch/powerpc/include/asm/interrupt.h
>> +++ b/arch/powerpc/include/asm/interrupt.h
>> @@ -411,7 +411,7 @@ DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
>>   DECLARE_INTERRUPT_HANDLER(CacheLockingException);
>>   DECLARE_INTERRUPT_HANDLER(SPEFloatingPointException);
>>   DECLARE_INTERRUPT_HANDLER(SPEFloatingPointRoundException);
>> -DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
>> +DECLARE_INTERRUPT_HANDLER(unrecoverable_exception) __noreturn;
>>   DECLARE_INTERRUPT_HANDLER(WatchdogException);
>>   DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
>>   
>> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
>> index eca3be36c18c..7e7106641ca9 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -440,7 +440,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
>>   	return ret;
>>   }
>>   
>> -void unrecoverable_exception(struct pt_regs *regs);
>>   void preempt_schedule_irq(void);
>>   
>>   notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index 2afa05ad21c8..1ff776e9e8e3 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -2173,6 +2173,8 @@ DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
>>   	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
>>   		 regs->trap, regs->nip, regs->msr);
>>   	die("Unrecoverable exception", regs, SIGABRT);
>> +	/* die() should not return */
>> +	BUG();
>>   }
>>   NOKPROBE_SYMBOL(unrecoverable_exception);
>>   
>> -- 
>> 2.25.0
>>
>   
> 

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
From: Nicholas Piggin @ 2021-02-11 10:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <694c7195c81d1bcc781b3c14f452886683d6c524.1613029237.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of February 11, 2021 5:41 pm:
> powerpc BUG_ON() is based on using twnei or tdnei instruction,
> which obliges gcc to format the condition into a 0 or 1 value
> in a register.
> 
> By using a generic implementation, gcc will generate a branch
> to the unconditional trap generated by BUG().

We don't want to do this on 64s because that will lose the useful CFAR
contents.

Unfortunately the code generation is not great and the registers that 
give some useful information about the condition are often mangled :(

It would be nice if we could have a __builtin_trap_if that gcc would use 
conditional traps with, (and which never assumes following code is 
unreachable even for constant true, so we can use it with WARN and put 
explicit unreachable for BUG).

> 
> As modern powerpc implement branch folding, that's even more efficient.

I think POWER will speculate conditional traps as non faulting always
so it should be just as good if not better than the branch.

Thanks,
Nick

^ permalink raw reply

* [PATCH 1/3] powerpc/perf: Adds support for programming of Thresholding in P10
From: Michael Ellerman @ 2021-02-11 11:27 UTC (permalink / raw)
  To: linuxppc-dev

From: Kajol Jain <kjain@linux.ibm.com>

Thresholding, a performance monitoring unit feature, can be
used to identify marked instructions which take more than
expected cycles between start event and end event.
Threshold compare (thresh_cmp) bits are programmed in MMCRA
register. In Power9, thresh_cmp bits were part of the
event code. But in case of P10, thresh_cmp are not part of
event code due to inclusion of MMCR3 bits.

Patch here adds an option to use attr.config1 variable
to be used to pass thresh_cmp value to be programmed in
MMCRA register. A new ppmu flag called PPMU_HAS_ATTR_CONFIG1
has been added and this flag is used to notify the use of
attr.config1 variable.

Patch has extended the parameter list of 'compute_mmcr',
to include power_pmu's 'flags' element and parameter list of
get_constraint to include attr.config1 value. It also extend
parameter list of power_check_constraints inorder to pass
perf_event list.

As stated by commit ef0e3b650f8d ("powerpc/perf: Fix Threshold
Event Counter Multiplier width for P10"), constraint bits for
thresh_cmp is also needed to be increased to 11 bits, which is
handled as part of this patch. We added bit number 53 as part
of constraint bits of thresh_cmp for power10 to make it an
11 bit field.

Updated layout for p10:

/*
 * Layout of constraint bits:
 *
 *        60        56        52        48        44        40        36        32
 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
 *   [   fab_match   ]         [       thresh_cmp      ] [   thresh_ctl    ] [   ]
 *                                          |                                  |
 *                           [  thresh_cmp bits for p10]           thresh_sel -*
 *
 *        28        24        20        16        12         8         4         0
 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
 *               [ ] |   [ ] |  [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
 *                |  |    |  |                  |
 *      BHRB IFM -*  |    |  |*radix_scope      |      Count of events for each PMC.
 *              EBB -*    |                     |        p1, p2, p3, p4, p5, p6.
 *      L1 I/D qualifier -*                     |
 *                     nc - number of counters -*
 *
 * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints
 * we want the low bit of each field to be added to any existing value.
 *
 * Everything else is a value field.
 */

Result:
command#: cat /sys/devices/cpu/format/thresh_cmp
config1:0-17

ex. usage:

command#: perf record -I --weight -d  -e
	 cpu/event=0x67340101EC,thresh_cmp=500/ ./ebizzy -S 2 -t 1 -s 4096
1826636 records/s
real  2.00 s
user  2.00 s
sys   0.00 s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data (61 samples) ]

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210209095234.837356-1-kjain@linux.ibm.com
---
 arch/powerpc/include/asm/perf_event_server.h |  5 +-
 arch/powerpc/perf/core-book3s.c              | 15 +++--
 arch/powerpc/perf/isa207-common.c            | 67 +++++++++++++++++---
 arch/powerpc/perf/isa207-common.h            | 15 +++--
 arch/powerpc/perf/mpc7450-pmu.c              |  5 +-
 arch/powerpc/perf/power10-pmu.c              |  4 +-
 arch/powerpc/perf/power5+-pmu.c              |  5 +-
 arch/powerpc/perf/power5-pmu.c               |  5 +-
 arch/powerpc/perf/power6-pmu.c               |  5 +-
 arch/powerpc/perf/power7-pmu.c               |  5 +-
 arch/powerpc/perf/ppc970-pmu.c               |  5 +-
 11 files changed, 102 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3b7baba01c92..00e7e671bb4b 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -36,9 +36,9 @@ struct power_pmu {
 	unsigned long	test_adder;
 	int		(*compute_mmcr)(u64 events[], int n_ev,
 				unsigned int hwc[], struct mmcr_regs *mmcr,
-				struct perf_event *pevents[]);
+				struct perf_event *pevents[], u32 flags);
 	int		(*get_constraint)(u64 event_id, unsigned long *mskp,
-				unsigned long *valp);
+				unsigned long *valp, u64 event_config1);
 	int		(*get_alternatives)(u64 event_id, unsigned int flags,
 				u64 alt[]);
 	void		(*get_mem_data_src)(union perf_mem_data_src *dsrc,
@@ -83,6 +83,7 @@ struct power_pmu {
 #define PPMU_NO_SIAR		0x00000100 /* Do not use SIAR */
 #define PPMU_ARCH_31		0x00000200 /* Has MMCR3, SIER2 and SIER3 */
 #define PPMU_P10_DD1		0x00000400 /* Is power10 DD1 processor version */
+#define PPMU_HAS_ATTR_CONFIG1	0x00000800 /* Using config1 attribute */
 
 /*
  * Values for flags to get_alternatives()
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 0e31aaa0a0d2..4b4319d84c54 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -916,7 +916,7 @@ void perf_event_print_debug(void)
  */
 static int power_check_constraints(struct cpu_hw_events *cpuhw,
 				   u64 event_id[], unsigned int cflags[],
-				   int n_ev)
+				   int n_ev, struct perf_event **event)
 {
 	unsigned long mask, value, nv;
 	unsigned long smasks[MAX_HWEVENTS], svalues[MAX_HWEVENTS];
@@ -939,7 +939,7 @@ static int power_check_constraints(struct cpu_hw_events *cpuhw,
 			event_id[i] = cpuhw->alternatives[i][0];
 		}
 		if (ppmu->get_constraint(event_id[i], &cpuhw->amasks[i][0],
-					 &cpuhw->avalues[i][0]))
+					 &cpuhw->avalues[i][0], event[i]->attr.config1))
 			return -1;
 	}
 	value = mask = 0;
@@ -974,7 +974,8 @@ static int power_check_constraints(struct cpu_hw_events *cpuhw,
 		for (j = 1; j < n_alt[i]; ++j)
 			ppmu->get_constraint(cpuhw->alternatives[i][j],
 					     &cpuhw->amasks[i][j],
-					     &cpuhw->avalues[i][j]);
+					     &cpuhw->avalues[i][j],
+					     event[i]->attr.config1);
 	}
 
 	/* enumerate all possibilities and see if any will work */
@@ -1392,7 +1393,7 @@ static void power_pmu_enable(struct pmu *pmu)
 	memset(&cpuhw->mmcr, 0, sizeof(cpuhw->mmcr));
 
 	if (ppmu->compute_mmcr(cpuhw->events, cpuhw->n_events, hwc_index,
-			       &cpuhw->mmcr, cpuhw->event)) {
+			       &cpuhw->mmcr, cpuhw->event, ppmu->flags)) {
 		/* shouldn't ever get here */
 		printk(KERN_ERR "oops compute_mmcr failed\n");
 		goto out;
@@ -1580,7 +1581,7 @@ static int power_pmu_add(struct perf_event *event, int ef_flags)
 
 	if (check_excludes(cpuhw->event, cpuhw->flags, n0, 1))
 		goto out;
-	if (power_check_constraints(cpuhw, cpuhw->events, cpuhw->flags, n0 + 1))
+	if (power_check_constraints(cpuhw, cpuhw->events, cpuhw->flags, n0 + 1, cpuhw->event))
 		goto out;
 	event->hw.config = cpuhw->events[n0];
 
@@ -1790,7 +1791,7 @@ static int power_pmu_commit_txn(struct pmu *pmu)
 	n = cpuhw->n_events;
 	if (check_excludes(cpuhw->event, cpuhw->flags, 0, n))
 		return -EAGAIN;
-	i = power_check_constraints(cpuhw, cpuhw->events, cpuhw->flags, n);
+	i = power_check_constraints(cpuhw, cpuhw->events, cpuhw->flags, n, cpuhw->event);
 	if (i < 0)
 		return -EAGAIN;
 
@@ -2028,7 +2029,7 @@ static int power_pmu_event_init(struct perf_event *event)
 	local_irq_save(irq_flags);
 	cpuhw = this_cpu_ptr(&cpu_hw_events);
 
-	err = power_check_constraints(cpuhw, events, cflags, n + 1);
+	err = power_check_constraints(cpuhw, events, cflags, n + 1, ctrs);
 
 	if (has_branch_stack(event)) {
 		u64 bhrb_filter = -1;
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 6ab5b272090a..e4f577da33d8 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -108,12 +108,57 @@ static void mmcra_sdar_mode(u64 event, unsigned long *mmcra)
 		*mmcra |= MMCRA_SDAR_MODE_TLB;
 }
 
+static u64 p10_thresh_cmp_val(u64 value)
+{
+	int exp = 0;
+	u64 result = value;
+
+	if (!value)
+		return value;
+
+	/*
+	 * Incase of P10, thresh_cmp value is not part of raw event code
+	 * and provided via attr.config1 parameter. To program threshold in MMCRA,
+	 * take a 18 bit number N and shift right 2 places and increment
+	 * the exponent E by 1 until the upper 10 bits of N are zero.
+	 * Write E to the threshold exponent and write the lower 8 bits of N
+	 * to the threshold mantissa.
+	 * The max threshold that can be written is 261120.
+	 */
+	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+		if (value > 261120)
+			value = 261120;
+		while ((64 - __builtin_clzl(value)) > 8) {
+			exp++;
+			value >>= 2;
+		}
+
+		/*
+		 * Note that it is invalid to write a mantissa with the
+		 * upper 2 bits of mantissa being zero, unless the
+		 * exponent is also zero.
+		 */
+		if (!(value & 0xC0) && exp)
+			result = 0;
+		else
+			result = (exp << 8) | value;
+	}
+	return result;
+}
+
 static u64 thresh_cmp_val(u64 value)
 {
+	if (cpu_has_feature(CPU_FTR_ARCH_31))
+		value = p10_thresh_cmp_val(value);
+
+	/*
+	 * Since location of threshold compare bits in MMCRA
+	 * is different for p8, using different shift value.
+	 */
 	if (cpu_has_feature(CPU_FTR_ARCH_300))
 		return value << p9_MMCRA_THR_CMP_SHIFT;
-
-	return value << MMCRA_THR_CMP_SHIFT;
+	else
+		return value << MMCRA_THR_CMP_SHIFT;
 }
 
 static unsigned long combine_from_event(u64 event)
@@ -141,13 +186,13 @@ static bool is_thresh_cmp_valid(u64 event)
 {
 	unsigned int cmp, exp;
 
+	if (cpu_has_feature(CPU_FTR_ARCH_31))
+		return p10_thresh_cmp_val(event) != 0;
+
 	/*
 	 * Check the mantissa upper two bits are not zero, unless the
 	 * exponent is also zero. See the THRESH_CMP_MANTISSA doc.
-	 * Power10: thresh_cmp is replaced by l2_l3 event select.
 	 */
-	if (cpu_has_feature(CPU_FTR_ARCH_31))
-		return false;
 
 	cmp = (event >> EVENT_THR_CMP_SHIFT) & EVENT_THR_CMP_MASK;
 	exp = cmp >> 7;
@@ -256,7 +301,7 @@ void isa207_get_mem_weight(u64 *weight)
 		*weight = mantissa << (2 * exp);
 }
 
-int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
+int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp, u64 event_config1)
 {
 	unsigned int unit, pmc, cache, ebb;
 	unsigned long mask, value;
@@ -355,9 +400,11 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 	}
 
 	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
-		if (event_is_threshold(event)) {
+		if (event_is_threshold(event) && is_thresh_cmp_valid(event_config1)) {
 			mask  |= CNST_THRESH_CTL_SEL_MASK;
 			value |= CNST_THRESH_CTL_SEL_VAL(event >> EVENT_THRESH_SHIFT);
+			mask  |= p10_CNST_THRESH_CMP_MASK;
+			value |= p10_CNST_THRESH_CMP_VAL(p10_thresh_cmp_val(event_config1));
 		}
 	} else if (cpu_has_feature(CPU_FTR_ARCH_300))  {
 		if (event_is_threshold(event) && is_thresh_cmp_valid(event)) {
@@ -411,7 +458,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 
 int isa207_compute_mmcr(u64 event[], int n_ev,
 			       unsigned int hwc[], struct mmcr_regs *mmcr,
-			       struct perf_event *pevents[])
+			       struct perf_event *pevents[], u32 flags)
 {
 	unsigned long mmcra, mmcr1, mmcr2, unit, combine, psel, cache, val;
 	unsigned long mmcr3;
@@ -504,6 +551,10 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
 				val = (event[i] >> EVENT_THR_CMP_SHIFT) &
 					EVENT_THR_CMP_MASK;
 				mmcra |= thresh_cmp_val(val);
+			} else if (flags & PPMU_HAS_ATTR_CONFIG1) {
+				val = (pevents[i]->attr.config1 >> p10_EVENT_THR_CMP_SHIFT) &
+					p10_EVENT_THR_CMP_MASK;
+				mmcra |= thresh_cmp_val(val);
 			}
 		}
 
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 454b32c31440..1af0e8c97ac7 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -105,6 +105,10 @@
 #define p10_EVENT_RADIX_SCOPE_QUAL_MASK	0x1
 #define p10_MMCR1_RADIX_SCOPE_QUAL_SHIFT	45
 
+/* Event Threshold Compare bit constant for power10 in config1 attribute */
+#define p10_EVENT_THR_CMP_SHIFT        0
+#define p10_EVENT_THR_CMP_MASK 0x3FFFFull
+
 #define p10_EVENT_VALID_MASK		\
 	((p10_SDAR_MODE_MASK   << p10_SDAR_MODE_SHIFT		|	\
 	(p10_EVENT_THRESH_MASK  << EVENT_THRESH_SHIFT)		|	\
@@ -124,8 +128,8 @@
  *        60        56        52        48        44        40        36        32
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
  *   [   fab_match   ]         [       thresh_cmp      ] [   thresh_ctl    ] [   ]
- *                                                                             |
- *                                                                 thresh_sel -*
+ *                                          |                                  |
+ *                           [  thresh_cmp bits for p10]           thresh_sel -*
  *
  *        28        24        20        16        12         8         4         0
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
@@ -152,6 +156,9 @@
 #define CNST_THRESH_CTL_SEL_VAL(v)	(((v) & 0x7ffull) << 32)
 #define CNST_THRESH_CTL_SEL_MASK	CNST_THRESH_CTL_SEL_VAL(0x7ff)
 
+#define p10_CNST_THRESH_CMP_VAL(v) (((v) & 0x7ffull) << 43)
+#define p10_CNST_THRESH_CMP_MASK   p10_CNST_THRESH_CMP_VAL(0x7ff)
+
 #define CNST_EBB_VAL(v)		(((v) & EVENT_EBB_MASK) << 24)
 #define CNST_EBB_MASK		CNST_EBB_VAL(EVENT_EBB_MASK)
 
@@ -262,10 +269,10 @@
 #define PH(a, b)			(P(LVL, HIT) | P(a, b))
 #define PM(a, b)			(P(LVL, MISS) | P(a, b))
 
-int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp);
+int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp, u64 event_config1);
 int isa207_compute_mmcr(u64 event[], int n_ev,
 				unsigned int hwc[], struct mmcr_regs *mmcr,
-				struct perf_event *pevents[]);
+				struct perf_event *pevents[], u32 flags);
 void isa207_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr);
 int isa207_get_alternatives(u64 event, u64 alt[], int size, unsigned int flags,
 					const unsigned int ev_alt[][MAX_ALT]);
diff --git a/arch/powerpc/perf/mpc7450-pmu.c b/arch/powerpc/perf/mpc7450-pmu.c
index 1919e9df9165..e39b15b79a83 100644
--- a/arch/powerpc/perf/mpc7450-pmu.c
+++ b/arch/powerpc/perf/mpc7450-pmu.c
@@ -148,7 +148,7 @@ static u32 classbits[N_CLASSES - 1][2] = {
 };
 
 static int mpc7450_get_constraint(u64 event, unsigned long *maskp,
-				  unsigned long *valp)
+				  unsigned long *valp, u64 event_config1 __maybe_unused)
 {
 	int pmc, class;
 	u32 mask, value;
@@ -258,7 +258,8 @@ static const u32 pmcsel_mask[N_COUNTER] = {
  */
 static int mpc7450_compute_mmcr(u64 event[], int n_ev, unsigned int hwc[],
 				struct mmcr_regs *mmcr,
-				struct perf_event *pevents[])
+				struct perf_event *pevents[],
+				u32 flags __maybe_unused)
 {
 	u8 event_index[N_CLASSES][N_COUNTER];
 	int n_classevent[N_CLASSES];
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index 79e0206ca454..a901c1348cad 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -216,6 +216,7 @@ PMU_FORMAT_ATTR(invert_bit,     "config:47");
 PMU_FORMAT_ATTR(src_mask,       "config:48-53");
 PMU_FORMAT_ATTR(src_match,      "config:54-59");
 PMU_FORMAT_ATTR(radix_scope,	"config:9");
+PMU_FORMAT_ATTR(thresh_cmp,     "config1:0-17");
 
 static struct attribute *power10_pmu_format_attr[] = {
 	&format_attr_event.attr,
@@ -236,6 +237,7 @@ static struct attribute *power10_pmu_format_attr[] = {
 	&format_attr_src_mask.attr,
 	&format_attr_src_match.attr,
 	&format_attr_radix_scope.attr,
+	&format_attr_thresh_cmp.attr,
 	NULL,
 };
 
@@ -550,7 +552,7 @@ static struct power_pmu power10_pmu = {
 	.get_mem_weight		= isa207_get_mem_weight,
 	.disable_pmc		= isa207_disable_pmc,
 	.flags			= PPMU_HAS_SIER | PPMU_ARCH_207S |
-				  PPMU_ARCH_31,
+				  PPMU_ARCH_31 | PPMU_HAS_ATTR_CONFIG1,
 	.n_generic		= ARRAY_SIZE(power10_generic_events),
 	.generic_events		= power10_generic_events,
 	.cache_events		= &power10_cache_events,
diff --git a/arch/powerpc/perf/power5+-pmu.c b/arch/powerpc/perf/power5+-pmu.c
index 3e64b4a1511f..18732267993a 100644
--- a/arch/powerpc/perf/power5+-pmu.c
+++ b/arch/powerpc/perf/power5+-pmu.c
@@ -132,7 +132,7 @@ static unsigned long unit_cons[PM_LASTUNIT+1][2] = {
 };
 
 static int power5p_get_constraint(u64 event, unsigned long *maskp,
-				  unsigned long *valp)
+				  unsigned long *valp, u64 event_config1 __maybe_unused)
 {
 	int pmc, byte, unit, sh;
 	int bit, fmask;
@@ -451,7 +451,8 @@ static int power5p_marked_instr_event(u64 event)
 
 static int power5p_compute_mmcr(u64 event[], int n_ev,
 				unsigned int hwc[], struct mmcr_regs *mmcr,
-				struct perf_event *pevents[])
+				struct perf_event *pevents[],
+				u32 flags __maybe_unused)
 {
 	unsigned long mmcr1 = 0;
 	unsigned long mmcra = 0;
diff --git a/arch/powerpc/perf/power5-pmu.c b/arch/powerpc/perf/power5-pmu.c
index 017bb19b73fb..cb611c1e7abe 100644
--- a/arch/powerpc/perf/power5-pmu.c
+++ b/arch/powerpc/perf/power5-pmu.c
@@ -136,7 +136,7 @@ static unsigned long unit_cons[PM_LASTUNIT+1][2] = {
 };
 
 static int power5_get_constraint(u64 event, unsigned long *maskp,
-				 unsigned long *valp)
+				 unsigned long *valp, u64 event_config1 __maybe_unused)
 {
 	int pmc, byte, unit, sh;
 	int bit, fmask;
@@ -382,7 +382,8 @@ static int power5_marked_instr_event(u64 event)
 
 static int power5_compute_mmcr(u64 event[], int n_ev,
 			       unsigned int hwc[], struct mmcr_regs *mmcr,
-			       struct perf_event *pevents[])
+			       struct perf_event *pevents[],
+			       u32 flags __maybe_unused)
 {
 	unsigned long mmcr1 = 0;
 	unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
diff --git a/arch/powerpc/perf/power6-pmu.c b/arch/powerpc/perf/power6-pmu.c
index 189974478e9f..69ef38216418 100644
--- a/arch/powerpc/perf/power6-pmu.c
+++ b/arch/powerpc/perf/power6-pmu.c
@@ -173,7 +173,8 @@ static int power6_marked_instr_event(u64 event)
  * Assign PMC numbers and compute MMCR1 value for a set of events
  */
 static int p6_compute_mmcr(u64 event[], int n_ev,
-			   unsigned int hwc[], struct mmcr_regs *mmcr, struct perf_event *pevents[])
+			   unsigned int hwc[], struct mmcr_regs *mmcr, struct perf_event *pevents[],
+			   u32 flags __maybe_unused)
 {
 	unsigned long mmcr1 = 0;
 	unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
@@ -266,7 +267,7 @@ static int p6_compute_mmcr(u64 event[], int n_ev,
  *	32-34	select field: nest (subunit) event selector
  */
 static int p6_get_constraint(u64 event, unsigned long *maskp,
-			     unsigned long *valp)
+			     unsigned long *valp, u64 event_config1 __maybe_unused)
 {
 	int pmc, byte, sh, subunit;
 	unsigned long mask = 0, value = 0;
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index bacfab104a1a..894c17f9a762 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -81,7 +81,7 @@ enum {
  */
 
 static int power7_get_constraint(u64 event, unsigned long *maskp,
-				 unsigned long *valp)
+				 unsigned long *valp, u64 event_config1 __maybe_unused)
 {
 	int pmc, sh, unit;
 	unsigned long mask = 0, value = 0;
@@ -245,7 +245,8 @@ static int power7_marked_instr_event(u64 event)
 
 static int power7_compute_mmcr(u64 event[], int n_ev,
 			       unsigned int hwc[], struct mmcr_regs *mmcr,
-			       struct perf_event *pevents[])
+			       struct perf_event *pevents[],
+			       u32 flags __maybe_unused)
 {
 	unsigned long mmcr1 = 0;
 	unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
diff --git a/arch/powerpc/perf/ppc970-pmu.c b/arch/powerpc/perf/ppc970-pmu.c
index 7d78df97f272..1f8263785286 100644
--- a/arch/powerpc/perf/ppc970-pmu.c
+++ b/arch/powerpc/perf/ppc970-pmu.c
@@ -190,7 +190,7 @@ static unsigned long unit_cons[PM_LASTUNIT+1][2] = {
 };
 
 static int p970_get_constraint(u64 event, unsigned long *maskp,
-			       unsigned long *valp)
+			       unsigned long *valp, u64 event_config1 __maybe_unused)
 {
 	int pmc, byte, unit, sh, spcsel;
 	unsigned long mask = 0, value = 0;
@@ -256,7 +256,8 @@ static int p970_get_alternatives(u64 event, unsigned int flags, u64 alt[])
 
 static int p970_compute_mmcr(u64 event[], int n_ev,
 			     unsigned int hwc[], struct mmcr_regs *mmcr,
-			     struct perf_event *pevents[])
+			     struct perf_event *pevents[],
+			     u32 flags __maybe_unused)
 {
 	unsigned long mmcr0 = 0, mmcr1 = 0, mmcra = 0;
 	unsigned int pmc, unit, byte, psel;
-- 
2.25.1


^ permalink raw reply related

* [PATCH] powerpc/powernv/pci: Use kzalloc() for phb related allocations
From: Michael Ellerman @ 2021-02-11 11:27 UTC (permalink / raw)
  To: linuxppc-dev

As part of commit fbbefb320214 ("powerpc/pci: Move PHB discovery for
PCI_DN using platforms"), I switched some allocations from
memblock_alloc() to kmalloc(), otherwise memblock would warn that it
was being called after slab init.

However I missed that the code relied on the allocations being zeroed,
without which we could end up crashing:

  pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to ff
  BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6af7
  Faulting instruction address: 0xc0000000000dbc90
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  ...
  NIP  pnv_ioda_get_pe_state+0xe0/0x1d0
  LR   pnv_ioda_get_pe_state+0xb4/0x1d0
  Call Trace:
    pnv_ioda_get_pe_state+0xb4/0x1d0 (unreliable)
    pnv_pci_config_check_eeh.isra.9+0x78/0x270
    pnv_pci_read_config+0xf8/0x160
    pci_bus_read_config_dword+0xa4/0x120
    pci_bus_generic_read_dev_vendor_id+0x54/0x270
    pci_scan_single_device+0xb8/0x140
    pci_scan_slot+0x80/0x1b0
    pci_scan_child_bus_extend+0x94/0x490
    pcibios_scan_phb+0x1f8/0x3c0
    pcibios_init+0x8c/0x12c
    do_one_initcall+0x94/0x510
    kernel_init_freeable+0x35c/0x3fc
    kernel_init+0x2c/0x168
    ret_from_kernel_thread+0x5c/0x70

Switch them to kzalloc().

Fixes: fbbefb320214 ("powerpc/pci: Move PHB discovery for PCI_DN using platforms")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7ee14ac275bd..f0f901683a2f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2921,7 +2921,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	phb_id = be64_to_cpup(prop64);
 	pr_debug("  PHB-ID  : 0x%016llx\n", phb_id);
 
-	phb = kmalloc(sizeof(*phb), GFP_KERNEL);
+	phb = kzalloc(sizeof(*phb), GFP_KERNEL);
 	if (!phb)
 		panic("%s: Failed to allocate %zu bytes\n", __func__,
 		      sizeof(*phb));
@@ -2970,7 +2970,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	else
 		phb->diag_data_size = PNV_PCI_DIAG_BUF_SIZE;
 
-	phb->diag_data = kmalloc(phb->diag_data_size, GFP_KERNEL);
+	phb->diag_data = kzalloc(phb->diag_data_size, GFP_KERNEL);
 	if (!phb->diag_data)
 		panic("%s: Failed to allocate %u bytes\n", __func__,
 		      phb->diag_data_size);
@@ -3032,7 +3032,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	}
 	pemap_off = size;
 	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
-	aux = kmalloc(size, GFP_KERNEL);
+	aux = kzalloc(size, GFP_KERNEL);
 	if (!aux)
 		panic("%s: Failed to allocate %lu bytes\n", __func__, size);
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
From: Segher Boessenkool @ 2021-02-11 11:49 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linux-kernel, npiggin, Paul Mackerras, linuxppc-dev
In-Reply-To: <694c7195c81d1bcc781b3c14f452886683d6c524.1613029237.git.christophe.leroy@csgroup.eu>

On Thu, Feb 11, 2021 at 07:41:52AM +0000, Christophe Leroy wrote:
> powerpc BUG_ON() is based on using twnei or tdnei instruction,
> which obliges gcc to format the condition into a 0 or 1 value
> in a register.

Huh?  Why is that?

Will it work better if this used __builtin_trap?  Or does the kernel only
detect very specific forms of trap instructions?

> By using a generic implementation, gcc will generate a branch
> to the unconditional trap generated by BUG().

That is many more instructions than ideal.

> As modern powerpc implement branch folding, that's even more efficient.

What PowerPC cpus implement branch folding?  I know none.

Some example code generated via __builtin_trap:

void trap(void) { __builtin_trap(); }
void trap_if_0(int x) { if (x == 0) __builtin_trap(); }
void trap_if_not_0(int x) { if (x != 0) __builtin_trap(); }

-m64:

trap:
	trap
trap_if_0:
	tdeqi 3,0
	blr
trap_if_not_0:
	tdnei 3,0
	blr

-m32:

trap:
	trap
trap_if_0:
	tweqi 3,0
	blr
trap_if_not_0:
	twnei 3,0
	blr


Segher

^ permalink raw reply

* Re: [PATCH] arm64: defconfig: enable modern virtio pci device
From: Arnd Bergmann @ 2021-02-11 11:52 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Chris Zankel, Thomas Bogendoerfer, Michael S. Tsirkin,
	Arnd Bergmann, linuxppc-dev, Catalin Marinas, linux-xtensa,
	Paul Walmsley, virtualization, linux-kernel@vger.kernel.org,
	Russell King - ARM Linux, Max Filippov, SoC Team, Albert Ou,
	Palmer Dabbelt, linux-riscv, open list:BROADCOM NVRAM DRIVER,
	Will Deacon, Jason Wang, Linux ARM
In-Reply-To: <20210210190506.1923684-1-anders.roxell@linaro.org>

On Wed, Feb 10, 2021 at 8:05 PM Anders Roxell <anders.roxell@linaro.org> wrote:
>
> Since patch ("virtio-pci: introduce modern device module") got added it
> is not possible to boot a defconfig kernel in qemu with a virtio pci
> device.  Add CONFIG_VIRTIO_PCI_MODERN=y fragment makes the kernel able
> to boot.
>
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
>  arch/arm/configs/multi_v7_defconfig         | 1 +
>  arch/arm64/configs/defconfig                | 1 +

Acked-by: Arnd Bergmann <arnd@arndb.de>

Michael, can you pick this up in the vhost tree that introduces the regression?

         Arnd

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
From: Segher Boessenkool @ 2021-02-11 11:50 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <1613036567.zvyupcz926.astroid@bobo.none>

On Thu, Feb 11, 2021 at 08:04:55PM +1000, Nicholas Piggin wrote:
> It would be nice if we could have a __builtin_trap_if that gcc would use 
> conditional traps with, (and which never assumes following code is 
> unreachable even for constant true, so we can use it with WARN and put 
> explicit unreachable for BUG).

It automatically does that with just __builtin_trap, see my other mail :-)


Segher

^ permalink raw reply

* Re: [PATCH] tools/perf: Fix powerpc gap between kernel end and module start
From: Athira Rajeev @ 2021-02-11 12:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Madhavan Srinivasan, linuxppc-dev, Jiri Olsa,
	Kajol Jain
In-Reply-To: <20210209124712.GC1018564@kernel.org>



> On 09-Feb-2021, at 6:17 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Wed, Feb 03, 2021 at 12:31:48PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Tue, Feb 02, 2021 at 04:02:36PM +0530, Athira Rajeev escreveu:
>>> 
>>> 
>>>    On 18-Jan-2021, at 3:51 PM, kajoljain <kjain@linux.ibm.com> wrote:
>>> 
>>> 
>>> 
>>>    On 1/12/21 3:08 PM, Jiri Olsa wrote:
>>> 
>>>        On Mon, Dec 28, 2020 at 09:14:14PM -0500, Athira Rajeev wrote:
>>> 
>>>        SNIP
>>> 
>>> 
>>>            c000000002799370 b backtrace_flag
>>>            c000000002799378 B radix_tree_node_cachep
>>>            c000000002799380 B __bss_stop
>>>            c0000000027a0000 B _end
>>>            c008000003890000 t icmp_checkentry      [ip_tables]
>>>            c008000003890038 t ipt_alloc_initial_table      [ip_tables]
>>>            c008000003890468 T ipt_do_table [ip_tables]
>>>            c008000003890de8 T ipt_unregister_table_pre_exit        [ip_tables]
>>>            ...
>>> 
>>>            Perf calls function symbols__fixup_end() which sets the end of
>>>            symbol
>>>            to 0xc008000003890000, which is the next address and this is the
>>>            start
>>>            address of first module (icmp_checkentry in above) which will make
>>>            the
>>>            huge symbol size of 0x80000010f0000.
>>> 
>>>            After symbols__fixup_end:
>>>            symbols__fixup_end: sym->name: _end, sym->start:
>>>            0xc0000000027a0000,
>>>            sym->end: 0xc008000003890000
>>> 
>>>            On powerpc, kernel text segment is located at 0xc000000000000000
>>>            whereas the modules are located at very high memory addresses,
>>>            0xc00800000xxxxxxx. Since the gap between end of kernel text
>>>            segment
>>>            and beginning of first module's address is high, histogram
>>>            allocation
>>>            using calloc fails.
>>> 
>>>            Fix this by detecting the kernel's last symbol and limiting
>>>            the range of last kernel symbol to pagesize.
>>> 
>>> 
>>>    Patch looks good to me.
>>> 
>>>    Tested-By: Kajol Jain<kjain@linux.ibm.com>
>>> 
>>>    Thanks,
>>>    Kajol Jain
>>> 
>>> 
>>>            Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
>>> 
>>> 
>>>        I can't test, but since the same approach works for arm and s390,
>>>        this also looks ok
>>> 
>>>        Acked-by: Jiri Olsa <jolsa@redhat.com>
>>> 
>>>        thanks,
>>>        jirka
>>> 
>>> 
>>> Hi Arnaldo,
>>> 
>>> Can you please help review this patch and merge if this looks good..
>> 
>> Thanks, collected the Tested-by from Kajol and the Acked-by from Jiri
>> and applied to my local tree for testing, then up to my perf/core
>> branch.
> 
> Had to apply this on top.
> 
> - Arnaldo
> 
> commit 0f000f9c89182950cd3500226729977251529364
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date:   Tue Feb 9 09:41:21 2021 -0300
> 
>    perf powerpc: Fix printf conversion specifier for IP addresses
> 
>    We need to use "%#" PRIx64 for u64 values, not "%lx", fixing this build
>    problem on powerpc 32-bit:
> 
>      72    13.69 ubuntu:18.04-x-powerpc        : FAIL powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
>        arch/powerpc/util/machine.c: In function 'arch__symbols__fixup_end':
>        arch/powerpc/util/machine.c:23:12: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'u64 {aka long long unsigned int}' [-Werror=format=]
>          pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
>                    ^
>        /git/linux/tools/perf/util/debug.h:18:21: note: in definition of macro 'pr_fmt'
>         #define pr_fmt(fmt) fmt
>                             ^~~
>        /git/linux/tools/perf/util/debug.h:33:29: note: in expansion of macro 'pr_debugN'
>         #define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
>                                     ^~~~~~~~~
>        /git/linux/tools/perf/util/debug.h:33:42: note: in expansion of macro 'pr_fmt'
>         #define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
>                                                  ^~~~~~
>        arch/powerpc/util/machine.c:23:2: note: in expansion of macro 'pr_debug4'
>          pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
>          ^~~~~~~~~
>        cc1: all warnings being treated as errors
>        /git/linux/tools/build/Makefile.build:139: recipe for target 'util' failed
>        make[5]: *** [util] Error 2
>        /git/linux/tools/build/Makefile.build:139: recipe for target 'powerpc' failed
>        make[4]: *** [powerpc] Error 2
>        /git/linux/tools/build/Makefile.build:139: recipe for target 'arch' failed
>        make[3]: *** [arch] Error 2
>      73    30.47 ubuntu:18.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
> 
>    Fixes: 557c3eadb7712741 ("perf powerpc: Fix gap between kernel end and module start")
>    Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>    Cc: Jiri Olsa <jolsa@redhat.com>
>    Cc: Kajol Jain <kjain@linux.ibm.com>
>    Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
>    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Thanks Arnaldo for the fix.

Athira.
> 
> diff --git a/tools/perf/arch/powerpc/util/machine.c b/tools/perf/arch/powerpc/util/machine.c
> index c30e5cc88c1673d6..e652a1aa8132274f 100644
> --- a/tools/perf/arch/powerpc/util/machine.c
> +++ b/tools/perf/arch/powerpc/util/machine.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0
> 
> +#include <inttypes.h>
> #include <stdio.h>
> #include <string.h>
> #include <internal/lib.h> // page_size
> @@ -20,5 +21,5 @@ void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
> 		p->end += page_size;
> 	else
> 		p->end = c->start;
> -	pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
> +	pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
> }


^ permalink raw reply

* Re: [PATCH 1/3] powerpc/perf: Adds support for programming of Thresholding in P10
From: Michael Ellerman @ 2021-02-11 12:21 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20210211112728.3410517-1-mpe@ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:
> From: Kajol Jain <kjain@linux.ibm.com>
>
> Thresholding, a performance monitoring unit feature, can be
> used to identify marked instructions which take more than
> expected cycles between start event and end event.
> Threshold compare (thresh_cmp) bits are programmed in MMCRA
> register. In Power9, thresh_cmp bits were part of the
> event code. But in case of P10, thresh_cmp are not part of
> event code due to inclusion of MMCR3 bits.

Accidental resend, ignore.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
From: Segher Boessenkool @ 2021-02-11 12:22 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <1613036567.zvyupcz926.astroid@bobo.none>

On Thu, Feb 11, 2021 at 08:04:55PM +1000, Nicholas Piggin wrote:
> Excerpts from Christophe Leroy's message of February 11, 2021 5:41 pm:
> > As modern powerpc implement branch folding, that's even more efficient.

Ah, it seems you mean what Arm calls branch folding.  Sure, power4
already did that, and this has not changed.

> I think POWER will speculate conditional traps as non faulting always
> so it should be just as good if not better than the branch.

Right, these are not branch instructions, so are not branch predicted;
all trap instructions are assumed to fall through, like all other
non-branch instructions.


Segher

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
From: Christophe Leroy @ 2021-02-11 12:26 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, npiggin, Paul Mackerras, linuxppc-dev
In-Reply-To: <20210211114910.GA28121@gate.crashing.org>



Le 11/02/2021 à 12:49, Segher Boessenkool a écrit :
> On Thu, Feb 11, 2021 at 07:41:52AM +0000, Christophe Leroy wrote:
>> powerpc BUG_ON() is based on using twnei or tdnei instruction,
>> which obliges gcc to format the condition into a 0 or 1 value
>> in a register.
> 
> Huh?  Why is that?
> 
> Will it work better if this used __builtin_trap?  Or does the kernel only
> detect very specific forms of trap instructions?
> 
>> By using a generic implementation, gcc will generate a branch
>> to the unconditional trap generated by BUG().
> 
> That is many more instructions than ideal.
> 
>> As modern powerpc implement branch folding, that's even more efficient.
> 
> What PowerPC cpus implement branch folding?  I know none.

Extract from powerpc mpc8323 reference manual:

High instruction and data throughput
— Zero-cycle branch capability (branch folding)
— Programmable static branch prediction on unresolved conditional branches
— Two integer units with enhanced multipliers in thee300c2 for increased integer instruction
throughput and a maximum two-cycle latency for multiply instructions
— Instruction fetch unit capable of fetching two instructions per clock from the instruction cache
— A six-entry instruction queue (IQ) that provides lookahead capability
— Independent pipelines with feed-forwarding that reduces data dependencies in hardware
— 16-Kbyte, four-way set-associative instruction and data caches on the e300c2.
— Cache write-back or write-through operation programmable on a per-page or per-block basis
— Features for instruction and data cache locking and protection
— BPU that performs CR lookahead operations
— Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte
segment size
— A 64-entry, two-way, set-associative ITLB and DTLB
— Eight-entry data and instruction BAT arrays providing 128-Kbyte to 256-Mbyte blocks
— Software table search operations and updates supported through fast trap mechanism
— 52-bit virtual address; 32-bit physical address

Christophe

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
From: Segher Boessenkool @ 2021-02-11 12:39 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linux-kernel, npiggin, Paul Mackerras, linuxppc-dev
In-Reply-To: <3b7f3a1e-0355-b6d4-14cd-300bf4d3629a@csgroup.eu>

On Thu, Feb 11, 2021 at 01:26:12PM +0100, Christophe Leroy wrote:
> >What PowerPC cpus implement branch folding?  I know none.
> 
> Extract from powerpc mpc8323 reference manual:

> — Zero-cycle branch capability (branch folding)

Yeah, this is not what is traditionally called branch folding (which
stores the instruction being branched to in some cache, originally the
instruction cache itself; somewhat similar (but different) to the BTIC
on 750).  Overloaded terminology :-)

6xx/7xx CPUs had the branch execution unit in the frontend, and it would
not issue a branch at all if it could be resolved then already.  Power4
and later predict all branches, and most are not issued at all (those
that do need to be executed, like bdnz, are).  At completion time it is
checked if the prediction was correct (and corrective action is taken if
not).


Segher

^ permalink raw reply

* [PATCH 3/6] powerpc/64s: Use htab_convert_pte_flags() in hash__mark_rodata_ro()
From: Michael Ellerman @ 2021-02-11 13:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar
In-Reply-To: <20210211135130.3474832-1-mpe@ellerman.id.au>

In hash__mark_rodata_ro() we pass the raw PP_RXXX value to
hash__change_memory_range(). That has the effect of setting the key to
zero, because PP_RXXX contains no key value.

Fix it by using htab_convert_pte_flags(), which knows how to convert a
pgprot into a pp value, including the key.

Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/mm/book3s64/hash_pgtable.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 567e0c6b3978..03819c259f0a 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -428,12 +428,14 @@ static bool hash__change_memory_range(unsigned long start, unsigned long end,
 
 void hash__mark_rodata_ro(void)
 {
-	unsigned long start, end;
+	unsigned long start, end, pp;
 
 	start = (unsigned long)_stext;
 	end = (unsigned long)__init_begin;
 
-	WARN_ON(!hash__change_memory_range(start, end, PP_RXXX));
+	pp = htab_convert_pte_flags(pgprot_val(PAGE_KERNEL_ROX), HPTE_USE_KERNEL_KEY);
+
+	WARN_ON(!hash__change_memory_range(start, end, pp));
 }
 
 void hash__mark_initmem_nx(void)
-- 
2.25.1


^ permalink raw reply related

* [PATCH 1/6] powerpc/mm/64s: Add _PAGE_KERNEL_ROX
From: Michael Ellerman @ 2021-02-11 13:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar

In the past we had a fallback definition for _PAGE_KERNEL_ROX, but we
removed that in commit d82fd29c5a8c ("powerpc/mm: Distribute platform
specific PAGE and PMD flags and definitions") and added definitions
for each MMU family.

However we missed adding a definition for 64s, which was not really a
bug because it's currently not used.

But we'd like to use PAGE_KERNEL_ROX in a future patch so add a
definition now.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a39886681629..1358fae01d04 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -116,6 +116,7 @@
  */
 #define _PAGE_KERNEL_RW		(_PAGE_PRIVILEGED | _PAGE_RW | _PAGE_DIRTY)
 #define _PAGE_KERNEL_RO		 (_PAGE_PRIVILEGED | _PAGE_READ)
+#define _PAGE_KERNEL_ROX	 (_PAGE_PRIVILEGED | _PAGE_READ | _PAGE_EXEC)
 #define _PAGE_KERNEL_RWX	(_PAGE_PRIVILEGED | _PAGE_DIRTY |	\
 				 _PAGE_RW | _PAGE_EXEC)
 /*
-- 
2.25.1


^ permalink raw reply related

* [PATCH 2/6] powerpc/pseries: Add key to flags in pSeries_lpar_hpte_updateboltedpp()
From: Michael Ellerman @ 2021-02-11 13:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar
In-Reply-To: <20210211135130.3474832-1-mpe@ellerman.id.au>

The flags argument to plpar_pte_protect() (aka. H_PROTECT), includes
the key in bits 9-13, but currently we always set those bits to zero.

In the past that hasn't been a problem because we always used key 0
for the kernel, and updateboltedpp() is only used for kernel mappings.

However since commit d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3
for kernel mapping with hash translation") we are now inadvertently
changing the key (to zero) when we call plpar_pte_protect().

That hasn't broken anything because updateboltedpp() is only used for
STRICT_KERNEL_RWX, which is currently disabled on 64s due to other
bugs.

But we want to fix that, so first we need to pass the key correctly to
plpar_pte_protect(). In the `newpp` value the low 3 bits of the key
are already in the correct spot, but the high 2 bits of the key need
to be shifted down.

Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/platforms/pseries/lpar.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 764170fdb0f7..8bbbddff7226 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -976,11 +976,13 @@ static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
 	slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
 	BUG_ON(slot == -1);
 
-	flags = newpp & 7;
+	flags = newpp & (HPTE_R_PP | HPTE_R_N);
 	if (mmu_has_feature(MMU_FTR_KERNEL_RO))
 		/* Move pp0 into bit 8 (IBM 55) */
 		flags |= (newpp & HPTE_R_PP0) >> 55;
 
+	flags |= ((newpp & HPTE_R_KEY_HI) >> 48) | (newpp & HPTE_R_KEY_LO);
+
 	lpar_rc = plpar_pte_protect(flags, slot, 0);
 
 	BUG_ON(lpar_rc != H_SUCCESS);
-- 
2.25.1


^ permalink raw reply related

* [PATCH 4/6] powerpc/mm/64s/hash: Factor out change_memory_range()
From: Michael Ellerman @ 2021-02-11 13:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar
In-Reply-To: <20210211135130.3474832-1-mpe@ellerman.id.au>

Pull the loop calling hpte_updateboltedpp() out of
hash__change_memory_range() into a helper function. We need it to be a
separate function for the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/mm/book3s64/hash_pgtable.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 03819c259f0a..3663d3cdffac 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -400,10 +400,23 @@ EXPORT_SYMBOL_GPL(hash__has_transparent_hugepage);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+static void change_memory_range(unsigned long start, unsigned long end,
+				unsigned int step, unsigned long newpp)
+{
+	unsigned long idx;
+
+	pr_debug("Changing page protection on range 0x%lx-0x%lx, to 0x%lx, step 0x%x\n",
+		 start, end, newpp, step);
+
+	for (idx = start; idx < end; idx += step)
+		/* Not sure if we can do much with the return value */
+		mmu_hash_ops.hpte_updateboltedpp(newpp, idx, mmu_linear_psize,
+							mmu_kernel_ssize);
+}
+
 static bool hash__change_memory_range(unsigned long start, unsigned long end,
 				      unsigned long newpp)
 {
-	unsigned long idx;
 	unsigned int step, shift;
 
 	shift = mmu_psize_defs[mmu_linear_psize].shift;
@@ -415,13 +428,7 @@ static bool hash__change_memory_range(unsigned long start, unsigned long end,
 	if (start >= end)
 		return false;
 
-	pr_debug("Changing page protection on range 0x%lx-0x%lx, to 0x%lx, step 0x%x\n",
-		 start, end, newpp, step);
-
-	for (idx = start; idx < end; idx += step)
-		/* Not sure if we can do much with the return value */
-		mmu_hash_ops.hpte_updateboltedpp(newpp, idx, mmu_linear_psize,
-							mmu_kernel_ssize);
+	change_memory_range(start, end, step, newpp);
 
 	return true;
 }
-- 
2.25.1


^ permalink raw reply related

* [PATCH 5/6] powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR
From: Michael Ellerman @ 2021-02-11 13:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar
In-Reply-To: <20210211135130.3474832-1-mpe@ellerman.id.au>

When we enabled STRICT_KERNEL_RWX we received some reports of boot
failures when using the Hash MMU and running under phyp. The crashes
are intermittent, and often exhibit as a completely unresponsive
system, or possibly an oops.

One example, which was caught in xmon:

  [   14.068327][    T1] devtmpfs: mounted
  [   14.069302][    T1] Freeing unused kernel memory: 5568K
  [   14.142060][  T347] BUG: Unable to handle kernel instruction fetch
  [   14.142063][    T1] Run /sbin/init as init process
  [   14.142074][  T347] Faulting instruction address: 0xc000000000004400
  cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0]
      pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80
      lr: c0000000001862d4: update_rq_clock+0x44/0x110
      sp: c00000000c747880
     msr: 8000000040001031
    current = 0xc00000000c60d380
    paca    = 0xc00000001ec9de80   irqmask: 0x03   irq_happened: 0x01
      pid   = 347, comm = kworker/2:1
  ...
  enter ? for help
  [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable)
  [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0
  [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0
  [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390
  [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610
  [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480
  [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950
  [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120
  [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640
  [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0
  [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c

This shows that CPU 2, which was idle, woke up and then appears to
randomly take an instruction fault on a completely valid area of
kernel text.

The cause turns out to be the call to hash__mark_rodata_ro(), late in
boot. Due to the way we layout text and rodata, that function actually
changes the permissions for all of text and rodata to read-only plus
execute.

To do the permission change we use a hypervisor call, H_PROTECT. On
phyp that appears to be implemented by briefly removing the mapping of
the kernel text, before putting it back with the updated permissions.
If any other CPU is executing during that window, it will see spurious
faults on the kernel text and/or data, leading to crashes.

To fix it we use stop machine to collect all other CPUs, and then have
them drop into real mode (MMU off), while we change the mapping. That
way they are unaffected by the mapping temporarily disappearing.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/mm/book3s64/hash_pgtable.c | 105 +++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 3663d3cdffac..01de985df2c4 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/mm_types.h>
 #include <linux/mm.h>
+#include <linux/stop_machine.h>
 
 #include <asm/sections.h>
 #include <asm/mmu.h>
@@ -400,6 +401,19 @@ EXPORT_SYMBOL_GPL(hash__has_transparent_hugepage);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
+struct change_memory_parms {
+	unsigned long start, end, newpp;
+	unsigned int step, nr_cpus, master_cpu;
+	atomic_t cpu_counter;
+};
+
+// We'd rather this was on the stack but it has to be in the RMO
+static struct change_memory_parms chmem_parms;
+
+// And therefore we need a lock to protect it from concurrent use
+static DEFINE_MUTEX(chmem_lock);
+
 static void change_memory_range(unsigned long start, unsigned long end,
 				unsigned int step, unsigned long newpp)
 {
@@ -414,6 +428,73 @@ static void change_memory_range(unsigned long start, unsigned long end,
 							mmu_kernel_ssize);
 }
 
+static int notrace chmem_secondary_loop(struct change_memory_parms *parms)
+{
+	unsigned long msr, tmp, flags;
+	int *p;
+
+	p = &parms->cpu_counter.counter;
+
+	local_irq_save(flags);
+	__hard_EE_RI_disable();
+
+	asm volatile (
+	// Switch to real mode and leave interrupts off
+	"mfmsr	%[msr]			;"
+	"li	%[tmp], %[MSR_IR_DR]	;"
+	"andc	%[tmp], %[msr], %[tmp]	;"
+	"mtmsrd %[tmp]			;"
+
+	// Tell the master we are in real mode
+	"1:				"
+	"lwarx	%[tmp], 0, %[p]		;"
+	"addic	%[tmp], %[tmp], -1	;"
+	"stwcx.	%[tmp], 0, %[p]		;"
+	"bne-	1b			;"
+
+	// Spin until the counter goes to zero
+	"2:				;"
+	"lwz	%[tmp], 0(%[p])		;"
+	"cmpwi	%[tmp], 0		;"
+	"bne-	2b			;"
+
+	// Switch back to virtual mode
+	"mtmsrd %[msr]			;"
+
+	: // outputs
+	  [msr] "=&r" (msr), [tmp] "=&b" (tmp), "+m" (*p)
+	: // inputs
+	  [p] "b" (p), [MSR_IR_DR] "i" (MSR_IR | MSR_DR)
+	: // clobbers
+	  "cc", "xer"
+	);
+
+	local_irq_restore(flags);
+
+	return 0;
+}
+
+static int change_memory_range_fn(void *data)
+{
+	struct change_memory_parms *parms = data;
+
+	if (parms->master_cpu != smp_processor_id())
+		return chmem_secondary_loop(parms);
+
+	// Wait for all but one CPU (this one) to call-in
+	while (atomic_read(&parms->cpu_counter) > 1)
+		barrier();
+
+	change_memory_range(parms->start, parms->end, parms->step, parms->newpp);
+
+	mb();
+
+	// Signal the other CPUs that we're done
+	atomic_dec(&parms->cpu_counter);
+
+	return 0;
+}
+
 static bool hash__change_memory_range(unsigned long start, unsigned long end,
 				      unsigned long newpp)
 {
@@ -428,7 +509,29 @@ static bool hash__change_memory_range(unsigned long start, unsigned long end,
 	if (start >= end)
 		return false;
 
-	change_memory_range(start, end, step, newpp);
+	if (firmware_has_feature(FW_FEATURE_LPAR)) {
+		mutex_lock(&chmem_lock);
+
+		chmem_parms.start = start;
+		chmem_parms.end = end;
+		chmem_parms.step = step;
+		chmem_parms.newpp = newpp;
+		chmem_parms.master_cpu = smp_processor_id();
+
+		cpus_read_lock();
+
+		atomic_set(&chmem_parms.cpu_counter, num_online_cpus());
+
+		// Ensure state is consistent before we call the other CPUs
+		mb();
+
+		stop_machine_cpuslocked(change_memory_range_fn, &chmem_parms,
+					cpu_online_mask);
+
+		cpus_read_unlock();
+		mutex_unlock(&chmem_lock);
+	} else
+		change_memory_range(start, end, step, newpp);
 
 	return true;
 }
-- 
2.25.1


^ permalink raw reply related

* [PATCH 6/6] powerpc/mm/64s: Allow STRICT_KERNEL_RWX again
From: Michael Ellerman @ 2021-02-11 13:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar
In-Reply-To: <20210211135130.3474832-1-mpe@ellerman.id.au>

We have now fixed the known bugs in STRICT_KERNEL_RWX for Book3S
64-bit Hash and Radix MMUs, see preceding commits, so allow the
option to be selected again.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 107bb4319e0e..7d6080afbad6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,7 +135,7 @@ config PPC
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
-	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC32 && !HIBERNATION)
+	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
 	select ARCH_HAS_COPY_MC			if PPC64
-- 
2.25.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox