* [RFC][PATCH 00/14] syscall, context switch, idle performance stuff
@ 2017-06-02 7:39 Nicholas Piggin
2017-06-02 7:39 ` [PATCH 01/14] powerpc/64s: optimize hypercall/syscall Nicholas Piggin
` (13 more replies)
0 siblings, 14 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
Hi,
I'm sitting on a few performance improvements that I'm hoping to
get polished up enough to merge, but it's taking a while, so I'm
just posting them out for review now because I think most are at
the stage where they are good enough to start getting reviews.
After this series, lightweight context switch (yield, threads,
same CPU) improves by about 10% on my POWER8 (2.1m -> 2.3m per second
with powernv_defconfig and tick based time accounting).
Ping-pong context switches improve similarly, particularly when
you force them to go to nap. I'm still gathering up numbers. I
haven't been able to get POWER9 numbers yet.
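(For reference: numbers like these can be reproduced with the
context_switch selftest in tools/testing/selftests/powerpc/benchmarks.
The same-CPU yield case boils down to something like the sketch below;
this is a minimal illustration, not the actual harness.)

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    static void pin_cpu0(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);
        if (sched_setaffinity(0, sizeof(set), &set))
            exit(1);
    }

    int main(void)
    {
        unsigned long iters = 0;
        time_t end;
        pid_t child;

        pin_cpu0();
        child = fork();
        if (child == 0) {
            for (;;)        /* partner task: just yield straight back */
                sched_yield();
        }

        end = time(NULL) + 5;
        while (time(NULL) < end) {
            sched_yield();  /* each yield is roughly 2 context switches */
            iters++;
        }
        printf("%lu context switches/sec\n", iters * 2 / 5);

        kill(child, SIGKILL);
        return 0;
    }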
Thanks,
Nick
Nicholas Piggin (14):
powerpc/64s: optimize hypercall/syscall
powerpc/64: syscall avoid restore_math call if possible
powerpc/64s: idle move soft interrupt mask logic into C code
powerpc/64s: process interrupts from system reset wakeup
powerpc/64s: msgclr when handling doorbell exceptions
powerpc/64s: branch to idle handler with virtual mode offset
powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
powerpc/64s: idle set polling before enabling irqs
powerpc/64s: idle read mostly for common globals
powerpc/64: CTRL[RUN] run-latch setting optimisation
powerpc/64s: idle no memory barrier after break from idle
powerpc/64s: Leave IRQs hard enabled over context switch for radix
powerpc/64: context switch can avoid reservation clear
powerpc/64: context switch additional hwsync can be avoided
arch/powerpc/include/asm/barrier.h | 4 +
arch/powerpc/include/asm/dbell.h | 13 +++
arch/powerpc/include/asm/exception-64s.h | 17 ++-
arch/powerpc/include/asm/hw_irq.h | 1 +
arch/powerpc/include/asm/machdep.h | 1 +
arch/powerpc/include/asm/ppc-opcode.h | 3 +
arch/powerpc/include/asm/processor.h | 8 +-
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/entry_64.S | 94 +++++++++------
arch/powerpc/kernel/exceptions-64s.S | 191 ++++++++++++++++++++++++-------
arch/powerpc/kernel/idle_book3s.S | 135 ++++++----------------
arch/powerpc/kernel/process.c | 30 +++--
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +-
arch/powerpc/platforms/powernv/idle.c | 122 ++++++++++++++++++--
arch/powerpc/platforms/powernv/subcore.c | 3 +-
drivers/cpuidle/cpuidle-powernv.c | 37 +++---
drivers/cpuidle/cpuidle-pseries.c | 22 ++--
kernel/sched/core.c | 9 ++
18 files changed, 469 insertions(+), 230 deletions(-)
--
2.11.0
* [PATCH 01/14] powerpc/64s: optimize hypercall/syscall
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 02/14] powerpc/64: syscall avoid restore_math call if possible Nicholas Piggin
` (12 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
guest to host"), a getppid() system call goes from 307 cycles to 358
cycles (+17%). This is due in significant part to the scratch SPR
used by the hypercall entry.
It turns out there are some volatile registers common to both system
call and hypercall (in particular, r12, cr0, ctr), which can be used to
avoid the SPR and some other overheads for the system call case. This
brings getppid to 320 cycles (+4%).
Testing hcall entry performance by running "sc 1" in guest userspace
gives 854 cycles before this patch and 826 cycles after, a small win
there too.
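(The getppid numbers can be gathered with a tight loop along these
lines -- a sketch only; mftb counts timebase ticks rather than core
cycles, so the result must be scaled by the core:timebase clock ratio.)

    #include <stdio.h>
    #include <unistd.h>

    static inline unsigned long mftb(void)
    {
        unsigned long tb;

        asm volatile("mftb %0" : "=r" (tb));
        return tb;
    }

    int main(void)
    {
        unsigned long start, i, iters = 10000000;

        start = mftb();
        for (i = 0; i < iters; i++)
            getppid();
        printf("%lu timebase ticks per getppid()\n",
               (mftb() - start) / iters);
        return 0;
    }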
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/exceptions-64s.S | 134 +++++++++++++++++++++++++----------
1 file changed, 97 insertions(+), 37 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ae418b85c17c..2f700a15bfa3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -821,46 +821,80 @@ EXC_VIRT(trap_0b, 0x4b00, 0x100, 0xb00)
TRAMP_KVM(PACA_EXGEN, 0xb00)
EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
+/*
+ * system call / hypercall (0xc00, 0x4c00)
+ *
+ * The system call exception is invoked with "sc 0" and does not alter HV bit.
+ * There is support for kernel code to invoke system calls but there are no
+ * in-tree users.
+ *
+ * The hypercall is invoked with "sc 1" and sets HV=1.
+ *
+ * In HPT, sc 1 always goes to 0xc00 real mode. In RADIX, sc 1 can go to
+ * 0x4c00 virtual mode.
+ *
+ * Call convention:
+ *
+ * syscall register convention is in Documentation/powerpc/syscall64-abi.txt
+ *
+ * For hypercalls, the register convention is as follows:
+ * r0 volatile
+ * r1-2 nonvolatile
+ * r3 volatile parameter and return value for status
+ * r4-r10 volatile input and output value
+ * r11 volatile hypercall number and output value
+ * r12 volatile
+ * r13-r31 nonvolatile
+ * LR nonvolatile
+ * CTR volatile
+ * XER volatile
+ * CR0-1 CR5-7 volatile
+ * CR2-4 nonvolatile
+ * Other registers nonvolatile
+ *
+ * The intersection of volatile registers that don't contain possible
+ * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs
+ * upon entry without saving.
+ */
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
- /*
- * If CONFIG_KVM_BOOK3S_64_HANDLER is set, save the PPR (on systems
- * that support it) before changing to HMT_MEDIUM. That allows the KVM
- * code to save that value into the guest state (it is the guest's PPR
- * value). Otherwise just change to HMT_MEDIUM as userspace has
- * already saved the PPR.
- */
+ /*
+ * There is a little bit of juggling to get syscall and hcall
+ * working well. Save r10 in ctr to be restored in case it is a
+ * hcall.
+ *
+ * Userspace syscalls have already saved the PPR, hcalls must save
+ * it before setting HMT_MEDIUM.
+ */
#define SYSCALL_KVMTEST \
- SET_SCRATCH0(r13); \
+ mr r12,r13; \
GET_PACA(r13); \
- std r9,PACA_EXGEN+EX_R9(r13); \
- OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR); \
+ mtctr r10; \
+ KVMTEST_PR(0xc00); /* uses r10, branch to do_kvm_0xc00_system_call */ \
HMT_MEDIUM; \
- std r10,PACA_EXGEN+EX_R10(r13); \
- OPT_SAVE_REG_TO_PACA(PACA_EXGEN+EX_PPR, r9, CPU_FTR_HAS_PPR); \
- mfcr r9; \
- KVMTEST_PR(0xc00); \
- GET_SCRATCH0(r13)
+ mr r9,r12; \
#else
#define SYSCALL_KVMTEST \
- HMT_MEDIUM
+ HMT_MEDIUM; \
+ mr r9,r13; \
+ GET_PACA(r13);
#endif
#define LOAD_SYSCALL_HANDLER(reg) \
__LOAD_HANDLER(reg, system_call_common)
-/* Syscall routine is used twice, in reloc-off and reloc-on paths */
-#define SYSCALL_PSERIES_1 \
+#define SYSCALL_FASTENDIAN_TEST \
BEGIN_FTR_SECTION \
cmpdi r0,0x1ebe ; \
beq- 1f ; \
END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \
- mr r9,r13 ; \
- GET_PACA(r13) ; \
- mfspr r11,SPRN_SRR0 ; \
-0:
-#define SYSCALL_PSERIES_2_RFID \
+/*
+ * After SYSCALL_KVMTEST, we reach here with PACA in r13, r13 in r9,
+ * and HMT_MEDIUM.
+ */
+#define SYSCALL_REAL \
+ mfspr r11,SPRN_SRR0 ; \
mfspr r12,SPRN_SRR1 ; \
LOAD_SYSCALL_HANDLER(r10) ; \
mtspr SPRN_SRR0,r10 ; \
@@ -869,11 +903,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \
rfid ; \
b . ; /* prevent speculative execution */
-#define SYSCALL_PSERIES_3 \
+#define SYSCALL_FASTENDIAN \
/* Fast LE/BE switch system call */ \
1: mfspr r12,SPRN_SRR1 ; \
xori r12,r12,MSR_LE ; \
mtspr SPRN_SRR1,r12 ; \
+ mr r13,r9 ; \
rfid ; /* return to userspace */ \
b . ; /* prevent speculative execution */
@@ -882,16 +917,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \
* We can't branch directly so we do it via the CTR which
* is volatile across system calls.
*/
-#define SYSCALL_PSERIES_2_DIRECT \
- LOAD_SYSCALL_HANDLER(r12) ; \
- mtctr r12 ; \
+#define SYSCALL_VIRT \
+ LOAD_SYSCALL_HANDLER(r10) ; \
+ mtctr r10 ; \
+ mfspr r11,SPRN_SRR0 ; \
mfspr r12,SPRN_SRR1 ; \
li r10,MSR_RI ; \
mtmsrd r10,1 ; \
bctr ;
#else
/* We can branch directly */
-#define SYSCALL_PSERIES_2_DIRECT \
+#define SYSCALL_VIRT \
+ mfspr r11,SPRN_SRR0 ; \
mfspr r12,SPRN_SRR1 ; \
li r10,MSR_RI ; \
mtmsrd r10,1 ; /* Set RI (EE=0) */ \
@@ -899,20 +936,43 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \
#endif
EXC_REAL_BEGIN(system_call, 0xc00, 0x100)
- SYSCALL_KVMTEST
- SYSCALL_PSERIES_1
- SYSCALL_PSERIES_2_RFID
- SYSCALL_PSERIES_3
+ SYSCALL_KVMTEST /* loads PACA into r13, and saves r13 to r9 */
+ SYSCALL_FASTENDIAN_TEST
+ SYSCALL_REAL
+ SYSCALL_FASTENDIAN
EXC_REAL_END(system_call, 0xc00, 0x100)
EXC_VIRT_BEGIN(system_call, 0x4c00, 0x100)
- SYSCALL_KVMTEST
- SYSCALL_PSERIES_1
- SYSCALL_PSERIES_2_DIRECT
- SYSCALL_PSERIES_3
+ SYSCALL_KVMTEST /* loads PACA into r13, and saves r13 to r9 */
+ SYSCALL_FASTENDIAN_TEST
+ SYSCALL_VIRT
+ SYSCALL_FASTENDIAN
EXC_VIRT_END(system_call, 0x4c00, 0x100)
-TRAMP_KVM(PACA_EXGEN, 0xc00)
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+ /*
+ * This is a hcall, so register convention is as above, with these
+ * differences:
+ * r13 = PACA
+ * r12 = orig r13
+ * ctr = orig r10
+ */
+TRAMP_KVM_BEGIN(do_kvm_0xc00)
+ /*
+ * Save the PPR (on systems that support it) before changing to
+ * HMT_MEDIUM. That allows the KVM code to save that value into the
+ * guest state (it is the guest's PPR value).
+ */
+ OPT_GET_SPR(r0, SPRN_PPR, CPU_FTR_HAS_PPR)
+ HMT_MEDIUM
+ OPT_SAVE_REG_TO_PACA(PACA_EXGEN+EX_PPR, r0, CPU_FTR_HAS_PPR)
+ mfctr r10
+ SET_SCRATCH0(r12)
+ std r9,PACA_EXGEN+EX_R9(r13)
+ mfcr r9
+ std r10,PACA_EXGEN+EX_R10(r13)
+ KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xc00)
+#endif
EXC_REAL(single_step, 0xd00, 0x100)
--
2.11.0
* [PATCH 02/14] powerpc/64: syscall avoid restore_math call if possible
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
2017-06-02 7:39 ` [PATCH 01/14] powerpc/64s: optimize hypercall/syscall Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 03/14] powerpc/64s: idle move soft interrupt mask logic into C code Nicholas Piggin
` (11 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
The syscall exit code that branches to restore_math is quite
heavyweight on Book3S, requiring two mtmsrd instructions around the
call. This cost is paid even when restore_math decides there is
nothing to do, due to lazy math restore.
Check for the lazy restore case before calling restore_math. Move most
of that case out of line.
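In C terms, the new exit-path test amounts to the following sketch
(user_msr and regs are illustrative names; the real test is open-coded
in asm on the MSR value saved at syscall entry):

    /* Fast path: if FP and VEC are still enabled in the user MSR,
     * there is nothing to restore. */
    unsigned long math = MSR_FP;
    #ifdef CONFIG_ALTIVEC
    math |= MSR_VEC;
    #endif
    if ((user_msr & math) == math)
        return;
    /* Out of line: only enter restore_math if it would do something. */
    if (msr_tm_active(user_msr) ||
        current->thread.load_fp || loadvec(current->thread))
        restore_math(regs);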
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/entry_64.S | 62 +++++++++++++++++++++++++++++-------------
arch/powerpc/kernel/process.c | 4 +++
2 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index bfbad08a1207..019a6322b982 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -210,27 +210,17 @@ system_call: /* label this so stack traces look sane */
andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne- syscall_exit_work
- andi. r0,r8,MSR_FP
- beq 2f
+ /* If MSR_FP and MSR_VEC are set in user msr, then no need to restore */
+ li r0,MSR_FP
#ifdef CONFIG_ALTIVEC
- andis. r0,r8,MSR_VEC@h
- bne 3f
+ oris r0,r0,MSR_VEC@h
#endif
-2: addi r3,r1,STACK_FRAME_OVERHEAD
-#ifdef CONFIG_PPC_BOOK3S
- li r10,MSR_RI
- mtmsrd r10,1 /* Restore RI */
-#endif
- bl restore_math
-#ifdef CONFIG_PPC_BOOK3S
- li r11,0
- mtmsrd r11,1
-#endif
- ld r8,_MSR(r1)
- ld r3,RESULT(r1)
- li r11,-MAX_ERRNO
+ and r7,r8,r0
+ cmpd r7,r0
+ bne syscall_restore_math
+.Lsyscall_restore_math_cont:
-3: cmpld r3,r11
+ cmpld r3,r11
ld r5,_CCR(r1)
bge- syscall_error
.Lsyscall_error_cont:
@@ -263,7 +253,41 @@ syscall_error:
neg r3,r3
std r5,_CCR(r1)
b .Lsyscall_error_cont
-
+
+syscall_restore_math:
+ /*
+ * Some initial tests from restore_math to avoid the heavyweight
+ * C code entry and MSR manipulations.
+ */
+ LOAD_REG_IMMEDIATE(r0, MSR_TS_MASK)
+ and. r0,r0,r8
+ bne 1f
+
+ ld r7,PACACURRENT(r13)
+ lbz r0,THREAD+THREAD_LOAD_FP(r7)
+#ifdef CONFIG_ALTIVEC
+ lbz r6,THREAD+THREAD_LOAD_VEC(r7)
+ add r0,r0,r6
+#endif
+ cmpdi r0,0
+ beq .Lsyscall_restore_math_cont
+
+1: addi r3,r1,STACK_FRAME_OVERHEAD
+#ifdef CONFIG_PPC_BOOK3S
+ li r10,MSR_RI
+ mtmsrd r10,1 /* Restore RI */
+#endif
+ bl restore_math
+#ifdef CONFIG_PPC_BOOK3S
+ li r11,0
+ mtmsrd r11,1
+#endif
+ /* Restore volatiles, reload MSR from updated one */
+ ld r8,_MSR(r1)
+ ld r3,RESULT(r1)
+ li r11,-MAX_ERRNO
+ b .Lsyscall_restore_math_cont
+
/* Traced system call support */
syscall_dotrace:
bl save_nvgprs
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..5cbb8b1faf7e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -511,6 +511,10 @@ void restore_math(struct pt_regs *regs)
{
unsigned long msr;
+ /*
+ * Syscall exit makes a similar initial check before branching
+ * to restore_math. Keep them in synch.
+ */
if (!msr_tm_active(regs->msr) &&
!current->thread.load_fp && !loadvec(current->thread))
return;
--
2.11.0
* [PATCH 03/14] powerpc/64s: idle move soft interrupt mask logic into C code
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
2017-06-02 7:39 ` [PATCH 01/14] powerpc/64s: optimize hypercall/syscall Nicholas Piggin
2017-06-02 7:39 ` [PATCH 02/14] powerpc/64: syscall avoid restore_math call if possible Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 04/14] powerpc/64s: process interrupts from system reset wakeup Nicholas Piggin
` (10 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
This simplifies the asm and fixes irq-off tracing over sleep
instructions.
It also moves the powersave_nap check for POWER8 into C code, and
moves the PSSCR register value calculation for POWER9 into C.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/machdep.h | 1 +
arch/powerpc/include/asm/processor.h | 8 +--
arch/powerpc/kernel/idle_book3s.S | 84 +++++-----------------
arch/powerpc/platforms/powernv/idle.c | 116 +++++++++++++++++++++++++++----
arch/powerpc/platforms/powernv/subcore.c | 3 +-
drivers/cpuidle/cpuidle-powernv.c | 12 ++--
6 files changed, 132 insertions(+), 92 deletions(-)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f90b22c722e1..cd2fc1cc1cc7 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -226,6 +226,7 @@ struct machdep_calls {
extern void e500_idle(void);
extern void power4_idle(void);
extern void power7_idle(void);
+extern void power9_idle(void);
extern void ppc6xx_idle(void);
extern void book3e_idle(void);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index a2123f291ab0..7b76b69a452e 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -481,10 +481,10 @@ extern unsigned long cpuidle_disable;
enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
extern int powersave_nap; /* set if nap mode can be used in idle loop */
-extern unsigned long power7_nap(int check_irq);
-extern unsigned long power7_sleep(void);
-extern unsigned long power7_winkle(void);
-extern unsigned long power9_idle_stop(unsigned long stop_psscr_val,
+extern unsigned long power7_idle_insn(unsigned long type); /* PNV_THREAD_NAP/etc */
+extern unsigned long power7_idle_type(unsigned long type);
+extern unsigned long power9_idle_stop(unsigned long psscr_val);
+extern unsigned long power9_idle_type(unsigned long stop_psscr_val,
unsigned long stop_psscr_mask);
extern void flush_instruction_cache(void);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 4898d676dcae..c7edb374d1aa 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -106,13 +106,9 @@ core_idle_lock_held:
/*
* Pass requested state in r3:
* r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
- * - Requested STOP state in POWER9
+ * - Requested PSSCR value in POWER9
*
- * To check IRQ_HAPPENED in r4
- * 0 - don't check
- * 1 - check
- *
- * Address to 'rfid' to in r5
+ * Address of idle handler to 'rfid' to in r4
*/
pnv_powersave_common:
/* Use r3 to pass state nap/sleep/winkle */
@@ -128,30 +124,7 @@ pnv_powersave_common:
std r0,_LINK(r1)
std r0,_NIP(r1)
- /* Hard disable interrupts */
- mfmsr r9
- rldicl r9,r9,48,1
- rotldi r9,r9,16
- mtmsrd r9,1 /* hard-disable interrupts */
-
- /* Check if something happened while soft-disabled */
- lbz r0,PACAIRQHAPPENED(r13)
- andi. r0,r0,~PACA_IRQ_HARD_DIS@l
- beq 1f
- cmpwi cr0,r4,0
- beq 1f
- addi r1,r1,INT_FRAME_SIZE
- ld r0,16(r1)
- li r3,0 /* Return 0 (no nap) */
- mtlr r0
- blr
-
-1: /* We mark irqs hard disabled as this is the state we'll
- * be in when returning and we need to tell arch_local_irq_restore()
- * about it
- */
- li r0,PACA_IRQ_HARD_DIS
- stb r0,PACAIRQHAPPENED(r13)
+ mfmsr r9
/* We haven't lost state ... yet */
li r0,0
@@ -160,8 +133,8 @@ pnv_powersave_common:
/* Continue saving state */
SAVE_GPR(2, r1)
SAVE_NVGPRS(r1)
- mfcr r4
- std r4,_CCR(r1)
+ mfcr r5
+ std r5,_CCR(r1)
std r9,_MSR(r1)
std r1,PACAR1(r13)
@@ -175,7 +148,7 @@ pnv_powersave_common:
li r6, MSR_RI
andc r6, r9, r6
mtmsrd r6, 1 /* clear RI before setting SRR0/1 */
- mtspr SPRN_SRR0, r5
+ mtspr SPRN_SRR0, r4
mtspr SPRN_SRR1, r7
rfid
@@ -319,35 +292,14 @@ lwarx_loop_stop:
IDLE_STATE_ENTER_SEQ_NORET(PPC_STOP)
-_GLOBAL(power7_idle)
+/*
+ * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
+ * r3 contains desired idle state (PNV_THREAD_NAP/SLEEP/WINKLE).
+ */
+_GLOBAL(power7_idle_insn)
/* Now check if user or arch enabled NAP mode */
- LOAD_REG_ADDRBASE(r3,powersave_nap)
- lwz r4,ADDROFF(powersave_nap)(r3)
- cmpwi 0,r4,0
- beqlr
- li r3, 1
- /* fall through */
-
-_GLOBAL(power7_nap)
- mr r4,r3
- li r3,PNV_THREAD_NAP
- LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
- b pnv_powersave_common
- /* No return */
-
-_GLOBAL(power7_sleep)
- li r3,PNV_THREAD_SLEEP
- li r4,1
- LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
+ LOAD_REG_ADDR(r4, pnv_enter_arch207_idle_mode)
b pnv_powersave_common
- /* No return */
-
-_GLOBAL(power7_winkle)
- li r3,PNV_THREAD_WINKLE
- li r4,1
- LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
- b pnv_powersave_common
- /* No return */
#define CHECK_HMI_INTERRUPT \
mfspr r0,SPRN_SRR1; \
@@ -369,16 +321,12 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
20: nop;
/*
- * r3 - The PSSCR value corresponding to the stop state.
- * r4 - The PSSCR mask corrresonding to the stop state.
+ * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
+ * r3 contains desired PSSCR register value.
*/
_GLOBAL(power9_idle_stop)
- mfspr r5,SPRN_PSSCR
- andc r5,r5,r4
- or r3,r3,r5
- mtspr SPRN_PSSCR,r3
- LOAD_REG_ADDR(r5,power_enter_stop)
- li r4,1
+ mtspr SPRN_PSSCR,r3
+ LOAD_REG_ADDR(r4,power_enter_stop)
b pnv_powersave_common
/* No return */
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 445f30a2c5ef..5886657fd1b6 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -23,6 +23,7 @@
#include <asm/cpuidle.h>
#include <asm/code-patching.h>
#include <asm/smp.h>
+#include <asm/runlatch.h>
#include "powernv.h"
#include "subcore.h"
@@ -240,14 +241,6 @@ static u64 pnv_default_stop_mask;
static bool default_stop_found;
/*
- * Used for ppc_md.power_save which needs a function with no parameters
- */
-static void power9_idle(void)
-{
- power9_idle_stop(pnv_default_stop_val, pnv_default_stop_mask);
-}
-
-/*
* First deep stop state. Used to figure out when to save/restore
* hypervisor context.
*/
@@ -261,6 +254,105 @@ static u64 pnv_deepest_stop_psscr_val;
static u64 pnv_deepest_stop_psscr_mask;
static bool deepest_stop_found;
+unsigned long power7_idle_type(unsigned long type)
+{
+ unsigned long srr1;
+
+ WARN_ON(!irqs_disabled());
+
+ /*
+ * Set up soft-enabled state here. Hard disable and ensure no
+ * irqs are pending before low-level idle entry. Interrupts are
+ * effectively enabled after idle is executed, so lockdep is
+ * told that interrupts are on here.
+ *
+ * We don't use prep_irq_for_idle because the idle wakeup code
+ * actually returns with interrupts hard disabled here.
+ */
+
+ __hard_irq_disable();
+ local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+
+ /*
+ * If anything happened while we were soft-disabled,
+ * we return now and do not enter the low power state.
+ */
+ if (lazy_irq_pending())
+ return 0;
+
+ /* Tell lockdep we are about to re-enable */
+ trace_hardirqs_on();
+
+ ppc64_runlatch_off();
+ srr1 = power7_idle_insn(type);
+ ppc64_runlatch_on();
+
+ trace_hardirqs_off();
+
+ return srr1;
+}
+
+void power7_idle(void)
+{
+ if (!powersave_nap)
+ return;
+
+ power7_idle_type(PNV_THREAD_NAP);
+}
+
+unsigned long power9_idle_type(unsigned long stop_psscr_val,
+ unsigned long stop_psscr_mask)
+{
+ unsigned long psscr;
+ unsigned long srr1;
+
+ WARN_ON(!irqs_disabled());
+
+ /*
+ * Set up soft-enabled state here. Hard disable and ensure no
+ * irqs are pending before low-level idle entry. Interrupts are
+ * effectively enabled after idle is executed, so lockdep is
+ * told that interrupts are on here.
+ *
+ * We don't use prep_irq_for_idle because the idle wakeup code
+ * actually returns with interrupts hard disabled here.
+ */
+
+ __hard_irq_disable();
+ local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+
+ /*
+ * If anything happened while we were soft-disabled,
+ * we return now and do not enter the low power state.
+ */
+ if (lazy_irq_pending())
+ return 0;
+
+ /* Tell lockdep we are about to re-enable */
+ trace_hardirqs_on();
+
+ ppc64_runlatch_off();
+
+ psscr = mfspr(SPRN_PSSCR);
+ psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
+
+ srr1 = power9_idle_stop(psscr);
+
+ ppc64_runlatch_on();
+
+ trace_hardirqs_off();
+
+ return srr1;
+}
+
+/*
+ * Used for ppc_md.power_save which needs a function with no parameters
+ */
+void power9_idle(void)
+{
+ power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask);
+}
+
/*
* pnv_cpu_offline: A function that puts the CPU into the deepest
* available platform idle state on a CPU-Offline.
@@ -272,15 +364,15 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
u32 idle_states = pnv_get_supported_cpuidle_states();
if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
- srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
+ srr1 = power9_idle_type(pnv_deepest_stop_psscr_val,
pnv_deepest_stop_psscr_mask);
} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
- srr1 = power7_winkle();
+ srr1 = power7_idle_type(PNV_THREAD_WINKLE);
} else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
(idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
- srr1 = power7_sleep();
+ srr1 = power7_idle_type(PNV_THREAD_SLEEP);
} else if (idle_states & OPAL_PM_NAP_ENABLED) {
- srr1 = power7_nap(1);
+ srr1 = power7_idle_type(PNV_THREAD_NAP);
} else {
/* This is the fallback method. We emulate snooze */
while (!generic_check_cpu_restart(cpu)) {
diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
index 0babef11136f..d975d78188a9 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -18,6 +18,7 @@
#include <linux/stop_machine.h>
#include <asm/cputhreads.h>
+#include <asm/cpuidle.h>
#include <asm/kvm_ppc.h>
#include <asm/machdep.h>
#include <asm/opal.h>
@@ -182,7 +183,7 @@ static void unsplit_core(void)
cpu = smp_processor_id();
if (cpu_thread_in_core(cpu) != 0) {
while (mfspr(SPRN_HID0) & mask)
- power7_nap(0);
+ power7_idle_insn(PNV_THREAD_NAP);
per_cpu(split_state, cpu).step = SYNC_STEP_UNSPLIT;
return;
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 12409a519cc5..150b971c303b 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -73,9 +73,8 @@ static int nap_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- ppc64_runlatch_off();
- power7_idle();
- ppc64_runlatch_on();
+ power7_idle_type(PNV_THREAD_NAP);
+
return index;
}
@@ -98,7 +97,8 @@ static int fastsleep_loop(struct cpuidle_device *dev,
new_lpcr &= ~LPCR_PECE1;
mtspr(SPRN_LPCR, new_lpcr);
- power7_sleep();
+
+ power7_idle_type(PNV_THREAD_SLEEP);
mtspr(SPRN_LPCR, old_lpcr);
@@ -110,10 +110,8 @@ static int stop_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- ppc64_runlatch_off();
- power9_idle_stop(stop_psscr_table[index].val,
+ power9_idle_type(stop_psscr_table[index].val,
stop_psscr_table[index].mask);
- ppc64_runlatch_on();
return index;
}
--
2.11.0
* [PATCH 04/14] powerpc/64s: process interrupts from system reset wakeup
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (2 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 03/14] powerpc/64s: idle move soft interrupt mask logic into C code Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions Nicholas Piggin
` (9 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
When the CPU wakes from a low power state, it begins at the system reset
interrupt with the exception that caused the wakeup encoded in SRR1.
Today, powernv idle wakeup ignores the wakeup reason (except a special
case for HMI), and the regular interrupt corresponding to the
exception will fire after the idle wakeup exits.
Change this to replay the interrupt from the idle wakeup before
interrupts are hard-enabled.
Testing the context_switch selftests benchmark on POWER8 with polling
idle disabled (i.e., always nap, giving cross-CPU IPIs) gives the
following results:
original wakeup direct
Different threads, same core: 315k/s 264k/s
Different cores: 235k/s 242k/s
There is a slowdown for the doorbell IPI (same core) case because
system reset wakeup does not clear the message, so the doorbell
interrupt fires again needlessly.
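(The wake reason dispatched on here is the 4-bit field in SRR1 bits
42:45 (IBM numbering); a C rendering of the decode, using the existing
SRR1_WAKEMASK_P8 mask from reg.h, would be:)

    /* Sketch of the decode __replay_wakeup_interrupt does in asm.
     * 0x6 = decrementer, 0x8 = external, 0x5 = doorbell,
     * 0x3 = hv doorbell, 0x9 = hv virtualization irq, 0xa = HMI. */
    static unsigned int srr1_wake_reason(unsigned long srr1)
    {
        return (srr1 & SRR1_WAKEMASK_P8) >> 18;
    }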
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/hw_irq.h | 1 +
arch/powerpc/kernel/exceptions-64s.S | 27 +++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/idle.c | 6 ++++++
3 files changed, 34 insertions(+)
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index eba60416536e..0ef9a33c139f 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
#ifndef __ASSEMBLY__
extern void __replay_interrupt(unsigned int vector);
+extern void __replay_wakeup_interrupt(unsigned long srr1);
extern void timer_interrupt(struct pt_regs *);
extern void performance_monitor_exception(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 2f700a15bfa3..69fe20b2b0cd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1646,3 +1646,30 @@ FTR_SECTION_ELSE
beq doorbell_super_common
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
blr
+
+/*
+ * Similar to __replay_interrupt but called from cpu idle wakeup
+ * with SRR1 wake value in r3.
+ */
+_GLOBAL(__replay_wakeup_interrupt)
+ extrdi r3,r3,42,4 /* Get SRR1 wake reason in low bits */
+ mfmsr r12
+ mflr r11
+ mfcr r9
+ /* Don't set EE in MSR, we have hard disable set */
+ cmpwi r3,0x6
+ beq decrementer_common
+ cmpwi r3,0x8
+ beq hardware_interrupt_common
+BEGIN_FTR_SECTION
+ cmpwi r3,0x3
+ beq h_doorbell_common
+ cmpwi r3,0x9
+ beq h_virt_irq_common
+ cmpwi r3,0xa
+ beq hmi_exception_common
+FTR_SECTION_ELSE
+ cmpwi r3,0x5
+ beq doorbell_super_common
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
+ blr
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 5886657fd1b6..2ed79ab35d8d 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -289,6 +289,8 @@ unsigned long power7_idle_type(unsigned long type)
trace_hardirqs_off();
+ __replay_wakeup_interrupt(srr1);
+
return srr1;
}
@@ -342,6 +344,8 @@ unsigned long power9_idle_type(unsigned long stop_psscr_val,
trace_hardirqs_off();
+ __replay_wakeup_interrupt(srr1);
+
return srr1;
}
@@ -671,6 +675,8 @@ static int __init pnv_init_idle_states(void)
if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
ppc_md.power_save = power7_idle;
+ else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
+ ppc_md.power_save = power9_idle;
out:
return 0;
--
2.11.0
* [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (3 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 04/14] powerpc/64s: process interrupts from system reset wakeup Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 06/14] powerpc/64s: branch to idle handler with virtual mode offset Nicholas Piggin
` (8 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
msgsnd doorbell exceptions are cleared when the doorbell interrupt is
taken. However if a doorbell exception causes a system reset interrupt
wake from power saving state, the message is not cleared. Processing
the doorbell from the system reset interrupt requires msgclr to avoid
taking the exception again.
Testing this plus the previous wakeup direct patch gives:
original wakeup direct msgclr
Different threads, same core: 315k/s 264k/s 345k/s
Different cores: 235k/s 242k/s 242k/s
Net speedup is +10% for same core, and +3% for different core.
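(In C terms, what the new msgclr stubs do before entering the handler
is simply the following sketch; ppc_msgclr() is the helper added below,
and regs is illustrative:)

    /* Clear the pending message first, so re-enabling EE after the
     * handler cannot immediately take the same doorbell again. */
    ppc_msgclr(PPC_DBELL_SERVER);
    doorbell_exception(regs);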
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/dbell.h | 13 +++++++++++++
arch/powerpc/include/asm/ppc-opcode.h | 3 +++
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/exceptions-64s.S | 27 +++++++++++++++++++++++----
4 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h
index f70cbfe0ec04..9f2ae0d25e15 100644
--- a/arch/powerpc/include/asm/dbell.h
+++ b/arch/powerpc/include/asm/dbell.h
@@ -56,6 +56,19 @@ static inline void ppc_msgsync(void)
: : "i" (CPU_FTR_HVMODE|CPU_FTR_ARCH_300));
}
+static inline void _ppc_msgclr(u32 msg)
+{
+ __asm__ __volatile__ (ASM_FTR_IFSET(PPC_MSGCLR(%1), PPC_MSGCLRP(%1), %0)
+ : : "i" (CPU_FTR_HVMODE), "r" (msg));
+}
+
+static inline void ppc_msgclr(enum ppc_dbell type)
+{
+ u32 msg = PPC_DBELL_TYPE(type);
+
+ _ppc_msgclr(msg);
+}
+
#else /* CONFIG_PPC_BOOK3S */
#define PPC_DBELL_MSGTYPE PPC_DBELL
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 3a8d278e7421..3b29c54e51fa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -221,6 +221,7 @@
#define PPC_INST_MSGCLR 0x7c0001dc
#define PPC_INST_MSGSYNC 0x7c0006ec
#define PPC_INST_MSGSNDP 0x7c00011c
+#define PPC_INST_MSGCLRP 0x7c00015c
#define PPC_INST_MTTMR 0x7c0003dc
#define PPC_INST_NOP 0x60000000
#define PPC_INST_PASTE 0x7c00070c
@@ -409,6 +410,8 @@
___PPC_RB(b))
#define PPC_MSGSNDP(b) stringify_in_c(.long PPC_INST_MSGSNDP | \
___PPC_RB(b))
+#define PPC_MSGCLRP(b) stringify_in_c(.long PPC_INST_MSGCLRP | \
+ ___PPC_RB(b))
#define PPC_POPCNTB(a, s) stringify_in_c(.long PPC_INST_POPCNTB | \
__PPC_RA(a) | __PPC_RS(s))
#define PPC_POPCNTD(a, s) stringify_in_c(.long PPC_INST_POPCNTD | \
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 709e23425317..bd56c78ba87a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -745,6 +745,7 @@ int main(void)
#endif
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+ DEFINE(PPC_DBELL_MSGTYPE, PPC_DBELL_MSGTYPE);
#ifdef CONFIG_PPC_8xx
DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE));
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 69fe20b2b0cd..06f9c573b1a1 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1612,6 +1612,25 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
b 1b
/*
+ * When doorbell is triggered from system reset wakeup, the message is
+ * not cleared, so it would fire again when EE is enabled.
+ *
+ * When coming from local_irq_enable, there may be the same problem if
+ * we were hard disabled.
+ *
+ * Execute msgclr to clear pending exceptions before handling it.
+ */
+h_doorbell_common_msgclr:
+ LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+ PPC_MSGCLR(3)
+ b h_doorbell_common
+
+doorbell_super_common_msgclr:
+ LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+ PPC_MSGCLRP(3)
+ b doorbell_super_common
+
+/*
* Called from arch_local_irq_enable when an interrupt needs
* to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate
* which kind of interrupt. MSR:EE is already off. We generate a
@@ -1636,14 +1655,14 @@ _GLOBAL(__replay_interrupt)
beq hardware_interrupt_common
BEGIN_FTR_SECTION
cmpwi r3,0xe80
- beq h_doorbell_common
+ beq h_doorbell_common_msgclr
cmpwi r3,0xea0
beq h_virt_irq_common
cmpwi r3,0xe60
beq hmi_exception_common
FTR_SECTION_ELSE
cmpwi r3,0xa00
- beq doorbell_super_common
+ beq doorbell_super_common_msgclr
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
blr
@@ -1663,13 +1682,13 @@ _GLOBAL(__replay_wakeup_interrupt)
beq hardware_interrupt_common
BEGIN_FTR_SECTION
cmpwi r3,0x3
- beq h_doorbell_common
+ beq h_doorbell_common_msgclr
cmpwi r3,0x9
beq h_virt_irq_common
cmpwi r3,0xa
beq hmi_exception_common
FTR_SECTION_ELSE
cmpwi r3,0x5
- beq doorbell_super_common
+ beq doorbell_super_common_msgclr
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
blr
--
2.11.0
* [PATCH 06/14] powerpc/64s: branch to idle handler with virtual mode offset
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (4 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 07/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths Nicholas Piggin
` (7 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
Have the system reset idle wakeup handlers branched to in real mode
with the 0xc... kernel address applied. This allows the wakeup handler
to avoid rfid when switching to virtual mode, which simplifies it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/exception-64s.h | 17 ++++++++++++++---
arch/powerpc/kernel/exceptions-64s.S | 6 ++++--
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 183d73b6ed99..0912e328e1d7 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -236,15 +236,26 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
#define kvmppc_interrupt kvmppc_interrupt_pr
#endif
+/*
+ * Branch to label using its 0xC000 address. This gives the same real address
+ * when relocation is off, but allows mtmsr to set MSR[IR|DR]=1.
+ * This could set the 0xc bits for !RELOCATABLE rather than load KBASE for
+ * a slight optimisation.
+ */
+#define BRANCH_TO_C000(reg, label) \
+ __LOAD_HANDLER(reg, label); \
+ mtctr reg; \
+ bctr
+
#ifdef CONFIG_RELOCATABLE
#define BRANCH_TO_COMMON(reg, label) \
__LOAD_HANDLER(reg, label); \
mtctr reg; \
bctr
-#define BRANCH_LINK_TO_FAR(label) \
- __LOAD_FAR_HANDLER(r12, label); \
- mtctr r12; \
+#define BRANCH_LINK_TO_FAR(reg, label) \
+ __LOAD_FAR_HANDLER(reg, label); \
+ mtctr reg; \
bctrl
/*
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 06f9c573b1a1..0c07a3e3158a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -99,7 +99,9 @@ EXC_VIRT_NONE(0x4000, 0x100)
#ifdef CONFIG_PPC_P7_NAP
/*
* If running native on arch 2.06 or later, check if we are waking up
- * from nap/sleep/winkle, and branch to idle handler.
+ * from nap/sleep/winkle, and branch to idle handler. The idle wakeup
+ * handler initially runs in real mode, but we branch to the 0xc000...
+ * address so we can turn on relocation with mtmsr.
*/
#define IDLETEST(n) \
BEGIN_FTR_SECTION ; \
@@ -107,7 +109,7 @@ EXC_VIRT_NONE(0x4000, 0x100)
rlwinm. r10,r10,47-31,30,31 ; \
beq- 1f ; \
cmpwi cr3,r10,2 ; \
- BRANCH_TO_COMMON(r10, system_reset_idle_common) ; \
+ BRANCH_TO_C000(r10, system_reset_idle_common) ; \
1: \
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
#else
--
2.11.0
* [PATCH 07/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (5 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 06/14] powerpc/64s: branch to idle handler with virtual mode offset Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 13:37 ` kbuild test robot
2017-06-02 7:39 ` [PATCH 08/14] powerpc/64s: idle set polling before enabling irqs Nicholas Piggin
` (6 subsequent siblings)
13 siblings, 1 reply; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
Idle code now always runs at the 0xc... effective address whether
in real or virtual mode. This means rfid can be ditched, along
with a lot of SRR manipulations.
In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
MSR states as required.
I haven't tested KVM with this yet.
---
arch/powerpc/kernel/exceptions-64s.S | 1 +
arch/powerpc/kernel/idle_book3s.S | 57 +++++++++++++++------------------
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 ++++-
3 files changed, 33 insertions(+), 33 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0c07a3e3158a..0f87ae9ac1db 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -130,6 +130,7 @@ EXC_VIRT_NONE(0x4100, 0x100)
#ifdef CONFIG_PPC_P7_NAP
EXC_COMMON_BEGIN(system_reset_idle_common)
+ mfspr r12,SPRN_SRR1
b pnv_powersave_wakeup
#endif
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index c7edb374d1aa..2efb88da8ba3 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -108,7 +108,7 @@ core_idle_lock_held:
* r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
* - Requested PSSCR value in POWER9
*
- * Address of idle handler to 'rfid' to in r4
+ * Address of idle handler to branch to in realmode in r4
*/
pnv_powersave_common:
/* Use r3 to pass state nap/sleep/winkle */
@@ -118,14 +118,14 @@ pnv_powersave_common:
* need to save PC, some CR bits and the NV GPRs,
* but for now an interrupt frame will do.
*/
+ mtctr r4
+
mflr r0
std r0,16(r1)
stdu r1,-INT_FRAME_SIZE(r1)
std r0,_LINK(r1)
std r0,_NIP(r1)
- mfmsr r9
-
/* We haven't lost state ... yet */
li r0,0
stb r0,PACA_NAPSTATELOST(r13)
@@ -135,7 +135,6 @@ pnv_powersave_common:
SAVE_NVGPRS(r1)
mfcr r5
std r5,_CCR(r1)
- std r9,_MSR(r1)
std r1,PACAR1(r13)
/*
@@ -145,12 +144,8 @@ pnv_powersave_common:
* the MMU context to the guest.
*/
LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
- li r6, MSR_RI
- andc r6, r9, r6
- mtmsrd r6, 1 /* clear RI before setting SRR0/1 */
- mtspr SPRN_SRR0, r4
- mtspr SPRN_SRR1, r7
- rfid
+ mtmsrd r7,0
+ bctr
.globl pnv_enter_arch207_idle_mode
pnv_enter_arch207_idle_mode:
@@ -302,11 +297,10 @@ _GLOBAL(power7_idle_insn)
b pnv_powersave_common
#define CHECK_HMI_INTERRUPT \
- mfspr r0,SPRN_SRR1; \
BEGIN_FTR_SECTION_NESTED(66); \
- rlwinm r0,r0,45-31,0xf; /* extract wake reason field (P8) */ \
+ rlwinm r0,r12,45-31,0xf; /* extract wake reason field (P8) */ \
FTR_SECTION_ELSE_NESTED(66); \
- rlwinm r0,r0,45-31,0xe; /* P7 wake reason field is 3 bits */ \
+ rlwinm r0,r12,45-31,0xe; /* P7 wake reason field is 3 bits */ \
ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
cmpwi r0,0xa; /* Hypervisor maintenance ? */ \
bne 20f; \
@@ -384,17 +378,17 @@ pnv_powersave_wakeup_mce:
/*
* Now put the original SRR1 with SRR1_WAKEMCE_RESVD as the wake
- * reason into SRR1, which allows reuse of the system reset wakeup
+ * reason into r12, which allows reuse of the system reset wakeup
* code without being mistaken for another type of wakeup.
*/
- oris r3,r3,SRR1_WAKEMCE_RESVD@h
- mtspr SPRN_SRR1,r3
+ oris r12,r3,SRR1_WAKEMCE_RESVD@h
b pnv_powersave_wakeup
/*
* Called from reset vector for powersave wakeups.
* cr3 - set to gt if waking up with partial/complete hypervisor state loss
+ * r12 - SRR1
*/
.global pnv_powersave_wakeup
pnv_powersave_wakeup:
@@ -404,8 +398,10 @@ BEGIN_FTR_SECTION
BEGIN_FTR_SECTION_NESTED(70)
bl power9_dd1_recover_paca
END_FTR_SECTION_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
+ ld r1,PACAR1(r13)
bl pnv_restore_hyp_resource_arch300
FTR_SECTION_ELSE
+ ld r1,PACAR1(r13)
bl pnv_restore_hyp_resource_arch207
ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
@@ -425,7 +421,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
#endif
/* Return SRR1 from power7_nap() */
- mfspr r3,SPRN_SRR1
+ mr r3,r12
blt cr3,pnv_wakeup_noloss
b pnv_wakeup_loss
@@ -489,7 +485,6 @@ pnv_restore_hyp_resource_arch207:
* r4 - PACA_THREAD_IDLE_STATE
*/
pnv_wakeup_tb_loss:
- ld r1,PACAR1(r13)
/*
* Before entering any idle state, the NVGPRs are saved in the stack.
* If there was a state loss, or PACA_NAPSTATELOST was set, then the
@@ -515,9 +510,9 @@ pnv_wakeup_tb_loss:
* is required to return back to reset vector after hypervisor state
* restore is complete.
*/
+ mr r19,r12
mr r18,r4
mflr r17
- mfspr r16,SPRN_SRR1
BEGIN_FTR_SECTION
CHECK_HMI_INTERRUPT
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
@@ -762,7 +757,7 @@ no_segments:
hypervisor_state_restored:
- mtspr SPRN_SRR1,r16
+ mr r12,r19
mtlr r17
blr /* return to pnv_powersave_wakeup */
@@ -778,20 +773,19 @@ fastsleep_workaround_at_exit:
*/
.global pnv_wakeup_loss
pnv_wakeup_loss:
- ld r1,PACAR1(r13)
BEGIN_FTR_SECTION
CHECK_HMI_INTERRUPT
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
REST_NVGPRS(r1)
REST_GPR(2, r1)
+ ld r4,PACAKMSR(r13)
+ ld r5,_LINK(r1)
ld r6,_CCR(r1)
- ld r4,_MSR(r1)
- ld r5,_NIP(r1)
addi r1,r1,INT_FRAME_SIZE
+ mtlr r5
mtcr r6
- mtspr SPRN_SRR1,r4
- mtspr SPRN_SRR0,r5
- rfid
+ mtmsrd r4
+ blr
/*
* R3 here contains the value that will be returned to the caller
@@ -804,12 +798,11 @@ pnv_wakeup_noloss:
BEGIN_FTR_SECTION
CHECK_HMI_INTERRUPT
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
- ld r1,PACAR1(r13)
- ld r6,_CCR(r1)
- ld r4,_MSR(r1)
+ ld r4,PACAKMSR(r13)
ld r5,_NIP(r1)
+ ld r6,_CCR(r1)
addi r1,r1,INT_FRAME_SIZE
+ mtlr r5
mtcr r6
- mtspr SPRN_SRR1,r4
- mtspr SPRN_SRR0,r5
- rfid
+ mtmsrd r4
+ blr
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bdb3f76ceb6b..2c65aedf516a 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -329,15 +329,21 @@ kvm_novcpu_exit:
* We come in here when wakened from nap mode.
* Relocation is off and most register values are lost.
* r13 points to the PACA.
+ * r3 contains the SRR1 wakeup value
*/
.globl kvm_start_guest
kvm_start_guest:
-
/* Set runlatch bit the minute you wake up from nap */
mfspr r0, SPRN_CTRLF
ori r0, r0, 1
mtspr SPRN_CTRLT, r0
+ /*
+ * Should avoid this and pass it through in r3. For now,
+ * code expects it to be in SRR1.
+ */
+ mtspr SPRN_SRR1,r3
+
ld r2,PACATOC(r13)
li r0,KVM_HWTHREAD_IN_KVM
--
2.11.0
* [PATCH 08/14] powerpc/64s: idle set polling before enabling irqs
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (6 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 07/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 09/14] powerpc/64s: idle read mostly for common globals Nicholas Piggin
` (5 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
local_irq_enable can cause interrupts to be taken, which could take
a significant amount of processing time. The idle process should set
its polling flag before this, so another process that wakes it during
this time will not have to send an IPI.
Expand the TIF_POLLING_NRFLAG coverage to be as wide as possible.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
drivers/cpuidle/cpuidle-powernv.c | 4 +++-
drivers/cpuidle/cpuidle-pseries.c | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 150b971c303b..0ee4660efb5f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -51,9 +51,10 @@ static int snooze_loop(struct cpuidle_device *dev,
{
u64 snooze_exit_time;
- local_irq_enable();
set_thread_flag(TIF_POLLING_NRFLAG);
+ local_irq_enable();
+
snooze_exit_time = get_tb() + snooze_timeout;
ppc64_runlatch_off();
HMT_very_low();
@@ -66,6 +67,7 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
smp_mb();
+
return index;
}
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 166ccd711ec9..7b12bb2ea70f 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -62,9 +62,10 @@ static int snooze_loop(struct cpuidle_device *dev,
unsigned long in_purr;
u64 snooze_exit_time;
+ set_thread_flag(TIF_POLLING_NRFLAG);
+
idle_loop_prolog(&in_purr);
local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
snooze_exit_time = get_tb() + snooze_timeout;
while (!need_resched()) {
--
2.11.0
* [PATCH 09/14] powerpc/64s: idle read mostly for common globals
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (7 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 08/14] powerpc/64s: idle set polling before enabling irqs Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 10/14] powerpc/64: CTRL[RUN] run-latch setting optimisation Nicholas Piggin
` (4 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
Ensure these don't get put into bouncing cachelines.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
drivers/cpuidle/cpuidle-powernv.c | 10 +++++-----
drivers/cpuidle/cpuidle-pseries.c | 8 ++++----
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 0ee4660efb5f..f0247652d91f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -32,18 +32,18 @@ static struct cpuidle_driver powernv_idle_driver = {
.owner = THIS_MODULE,
};
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
struct stop_psscr_table {
u64 val;
u64 mask;
};
-static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX];
+static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 7b12bb2ea70f..a404f352d284 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -25,10 +25,10 @@ struct cpuidle_driver pseries_idle_driver = {
.owner = THIS_MODULE,
};
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
static inline void idle_loop_prolog(unsigned long *in_purr)
{
--
2.11.0
* [PATCH 10/14] powerpc/64: CTRL[RUN] run-latch setting optimisation
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (8 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 09/14] powerpc/64s: idle read mostly for common globals Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 11/14] powerpc/64s: idle no memory barrier after break from idle Nicholas Piggin
` (3 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
The CTRL register is read-only except bit 63, which is the run
latch control. This means it can be updated with a single mtspr
rather than a mfspr/mtspr pair.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/process.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 5cbb8b1faf7e..633ec9967141 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1964,12 +1964,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
void notrace __ppc64_runlatch_on(void)
{
struct thread_info *ti = current_thread_info();
- unsigned long ctrl;
-
- ctrl = mfspr(SPRN_CTRLF);
- ctrl |= CTRL_RUNLATCH;
- mtspr(SPRN_CTRLT, ctrl);
+ mtspr(SPRN_CTRLT, CTRL_RUNLATCH);
ti->local_flags |= _TLF_RUNLATCH;
}
@@ -1977,13 +1973,9 @@ void notrace __ppc64_runlatch_on(void)
void notrace __ppc64_runlatch_off(void)
{
struct thread_info *ti = current_thread_info();
- unsigned long ctrl;
ti->local_flags &= ~_TLF_RUNLATCH;
-
- ctrl = mfspr(SPRN_CTRLF);
- ctrl &= ~CTRL_RUNLATCH;
- mtspr(SPRN_CTRLT, ctrl);
+ mtspr(SPRN_CTRLT, 0);
}
#endif /* CONFIG_PPC64 */
--
2.11.0
* [PATCH 11/14] powerpc/64s: idle no memory barrier after break from idle
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (9 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 10/14] powerpc/64: CTRL[RUN] run-latch setting optimisation Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 12/14] powerpc/64s: Leave IRQs hard enabled over context switch for radix Nicholas Piggin
` (2 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
A memory barrier is not required after the task wakes up; it is only
required if we clear the polling flag before the wakeup arrives. The
case where we have work to do is the important one, so optimise for
it.
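(The handshake being optimised, as a sketch; the waker side is the
scheduler's standard polling-idle protocol:)

    /*
     * Waker:                      Idle loop:
     *   set need_resched            set_thread_flag(TIF_POLLING_NRFLAG);
     *   smp_mb() (or atomic op)     while (!need_resched())
     *   if (task not polling)           ...poll...
     *       send IPI
     *
     * If the loop breaks because need_resched() became true, the waker
     * saw us polling and skipped the IPI, so no barrier is needed when
     * clearing the flag. Only the snooze-timeout exit must order the
     * clear against a subsequent need_resched() test:
     */
    clear_thread_flag(TIF_POLLING_NRFLAG);
    smp_mb();
    /* ... followed by a final need_resched() check */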
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
drivers/cpuidle/cpuidle-powernv.c | 11 +++++++++--
drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++--
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index f0247652d91f..c53a8bb40471 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_off();
HMT_very_low();
while (!need_resched()) {
- if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
+ if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+ /*
+ * Task has not woken up but we are exiting the polling
+ * loop anyway. Require a barrier after polling is
+ * cleared to order subsequent test of need_resched().
+ */
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
break;
+ }
}
HMT_medium();
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
return index;
}
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index a404f352d284..e9b3853d93ea 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -71,13 +71,20 @@ static int snooze_loop(struct cpuidle_device *dev,
while (!need_resched()) {
HMT_low();
HMT_very_low();
- if (snooze_timeout_en && get_tb() > snooze_exit_time)
+ if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+ /*
+ * Task has not woken up but we are exiting the polling
+ * loop anyway. Require a barrier after polling is
+ * cleared to order subsequent test of need_resched().
+ */
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
break;
+ }
}
HMT_medium();
clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
idle_loop_epilog(in_purr);
--
2.11.0
* [PATCH 12/14] powerpc/64s: Leave IRQs hard enabled over context switch for radix
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (10 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 11/14] powerpc/64s: idle no memory barrier after break from idle Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 13/14] powerpc/64: context switch can avoid reservation clear Nicholas Piggin
2017-06-02 7:39 ` [PATCH 14/14] powerpc/64: context switch additional hwsync can be avoided Nicholas Piggin
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
hard disabled interrupts over the low level context switch, because
the SLB management can't cope with a PMU interrupt accessing the stack
in that window.
Radix based kernel mapping does not use the SLB so it does not require
interrupts disabled here. This is worth 1-2% in context switch
performance.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/process.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 633ec9967141..ea1618b62bf8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1199,12 +1199,14 @@ struct task_struct *__switch_to(struct task_struct *prev,
__switch_to_tm(prev, new);
- /*
- * We can't take a PMU exception inside _switch() since there is a
- * window where the kernel stack SLB and the kernel stack are out
- * of sync. Hard disable here.
- */
- hard_irq_disable();
+ if (!radix_enabled()) {
+ /*
+ * We can't take a PMU exception inside _switch() since there
+ * is a window where the kernel stack SLB and the kernel stack
+ * are out of sync. Hard disable here.
+ */
+ hard_irq_disable();
+ }
/*
* Call restore_sprs() before calling _switch(). If we move it after
--
2.11.0
* [PATCH 13/14] powerpc/64: context switch can avoid reservation clear
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (11 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 12/14] powerpc/64s: Leave IRQs hard enabled over context switch for radix Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
2017-06-02 7:39 ` [PATCH 14/14] powerpc/64: context switch additional hwsync can be avoided Nicholas Piggin
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
There is no need to break the reservation in _switch, because we are
guaranteed that the context switch path will include a larx/stcx.
Comment the guarantee and remove the reservation clear from _switch.
This is worth 1-2% in context switch performance.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/entry_64.S | 11 +++--------
kernel/sched/core.c | 6 ++++++
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 019a6322b982..012142fe39a4 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -521,15 +521,10 @@ _GLOBAL(_switch)
#endif /* CONFIG_SMP */
/*
- * If we optimise away the clear of the reservation in system
- * calls because we know the CPU tracks the address of the
- * reservation, then we need to clear it here to cover the
- * case that the kernel context switch path has no larx
- * instructions.
+ * The kernel context switch path must contain a spin_lock,
+ * which contains larx/stcx, which will clear any reservation
+ * of the task being switched.
*/
-BEGIN_FTR_SECTION
- ldarx r6,0,r1
-END_FTR_SECTION_IFSET(CPU_FTR_STCX_CHECKS_ADDRESS)
BEGIN_FTR_SECTION
/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 803c3bc274c4..1f0688ad09d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2875,6 +2875,12 @@ context_switch(struct rq *rq, struct task_struct *prev,
rq_unpin_lock(rq, rf);
spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
+ /*
+ * Some architectures require that a spin lock is taken before
+ * _switch. The rq_lock satisfies this condition. See powerpc
+ * _switch for details.
+ */
+
/* Here we just switch the register state and the stack. */
switch_to(prev, next, prev);
barrier();
--
2.11.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 14/14] powerpc/64: context switch additional hwsync can be avoided
2017-06-02 7:39 [RFC][PATCH 00/14] syscall, context switch, idle performance stuff Nicholas Piggin
` (12 preceding siblings ...)
2017-06-02 7:39 ` [PATCH 13/14] powerpc/64: context switch can avoid reservation clear Nicholas Piggin
@ 2017-06-02 7:39 ` Nicholas Piggin
13 siblings, 0 replies; 16+ messages in thread
From: Nicholas Piggin @ 2017-06-02 7:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Anton Blanchard
The hwsync in the context switch code, which prevents MMIO accesses
by a single process from being reordered when it migrates to another
CPU, is not required because a hwsync is already performed earlier in
the context switch path.
Comment this so it remains clear if anything changes on either the
scheduler or the powerpc side. Remove the hwsync from _switch. This
is worth 2-3% in context switch performance.
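The cacheable-store side of the argument pairs as follows in the
scheduler (a sketch of the existing pattern using the kernel
primitives named above, not new code):

	/* CPUx, switching away from prev: the release makes all of
	 * prev's prior stores visible before on_cpu reads as 0. */
	smp_store_release(&prev->on_cpu, 0);

	/* CPUy, before running prev again: the acquire spins until
	 * on_cpu is 0 and orders subsequent accesses after it. */
	smp_cond_load_acquire(&prev->on_cpu, !VAL);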
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/barrier.h | 4 ++++
arch/powerpc/kernel/entry_64.S | 21 +++++++++++++++------
kernel/sched/core.c | 3 +++
3 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index c0deafc212b8..8bbadbd3b3c7 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -74,6 +74,10 @@ do { \
___p1; \
})
+/*
+ * This must resolve to hwsync on SMP for the context switch path. See
+ * _switch.
+ */
#define smp_mb__before_spinlock() smp_mb()
#include <asm-generic/barrier.h>
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 012142fe39a4..2b1e57b33757 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -512,13 +512,22 @@ _GLOBAL(_switch)
std r23,_CCR(r1)
std r1,KSP(r3) /* Set old stack pointer */
-#ifdef CONFIG_SMP
- /* We need a sync somewhere here to make sure that if the
- * previous task gets rescheduled on another CPU, it sees all
- * stores it has performed on this one.
+ /*
+ * On SMP kernels, care must be taken because a task may be
+ * scheduled off CPUx and on to CPUy. Memory ordering must be
+ * considered.
+ *
+ * Cacheable stores on CPUx will be visible when the task is
+ * scheduled on CPUy by virtue of smp_store_release(t->on_cpu, 0)
+ * pairing with smp_cond_load_acquire(!t->on_cpu) on the other
+ * CPU.
+ *
+ * Uncacheable stores in the case of involuntary preemption must
+ * be taken care of. The smp_mb__before_spinlock() in __schedule()
+ * is a hwsync, which orders mmio too. That does not have to be
+ * in any particular place within the context switch path, because
+ * the context switch path itself does not do any mmio.
*/
- sync
-#endif /* CONFIG_SMP */
/*
* The kernel context switch path must contain a spin_lock,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f0688ad09d7..ff375012d2c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3394,6 +3394,9 @@ static void __sched notrace __schedule(bool preempt)
* Make sure that signal_pending_state()->signal_pending() below
* can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
* done by the caller to avoid the race with signal_wake_up().
+ *
+ * smp_mb__before_spinlock() must be present for powerpc
+ * (see powerpc smp_mb__before_spinlock()).
*/
smp_mb__before_spinlock();
rq_lock(rq, &rf);
--
2.11.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 07/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
2017-06-02 7:39 ` [PATCH 07/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths Nicholas Piggin
@ 2017-06-02 13:37 ` kbuild test robot
0 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2017-06-02 13:37 UTC (permalink / raw)
To: Nicholas Piggin
Cc: kbuild-all, linuxppc-dev, Anton Blanchard, Nicholas Piggin
Hi Nicholas,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.12-rc3]
[cannot apply to powerpc/next next-20170602]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Nicholas-Piggin/syscall-context-switch-idle-performance-stuff/20170602-172149
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc
All errors (new ones prefixed by >>):
In file included from arch/powerpc/kernel/head_64.S:198:0:
>> arch/powerpc/kernel/exceptions-64s.S:1034:43: error: macro "BRANCH_LINK_TO_FAR" requires 2 arguments, but only 1 given
BRANCH_LINK_TO_FAR(hmi_exception_realmode) /* Function call ABI */
^
vim +/BRANCH_LINK_TO_FAR +1034 arch/powerpc/kernel/exceptions-64s.S
62f9b03b06 Nicholas Piggin 2016-09-21 1028 mfspr r11,SPRN_HSRR0 /* Save HSRR0 */
a4087a4d38 Nicholas Piggin 2016-12-20 1029 mfspr r12,SPRN_HSRR1 /* Save HSRR1 */
a4087a4d38 Nicholas Piggin 2016-12-20 1030 EXCEPTION_PROLOG_COMMON_1()
62f9b03b06 Nicholas Piggin 2016-09-21 1031 EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
62f9b03b06 Nicholas Piggin 2016-09-21 1032 EXCEPTION_PROLOG_COMMON_3(0xe60)
62f9b03b06 Nicholas Piggin 2016-09-21 1033 addi r3,r1,STACK_FRAME_OVERHEAD
be5c5e843c Michael Ellerman 2017-04-18 @1034 BRANCH_LINK_TO_FAR(hmi_exception_realmode) /* Function call ABI */
62f9b03b06 Nicholas Piggin 2016-09-21 1035 /* Windup the stack. */
62f9b03b06 Nicholas Piggin 2016-09-21 1036 /* Move original HSRR0 and HSRR1 into the respective regs */
62f9b03b06 Nicholas Piggin 2016-09-21 1037 ld r9,_MSR(r1)
:::::: The code at line 1034 was first introduced by commit
:::::: be5c5e843c4afa1c8397cb740b6032bd4142f32d powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
:::::: TO: Michael Ellerman <mpe@ellerman.id.au>
:::::: CC: Michael Ellerman <mpe@ellerman.id.au>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23368 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread