* [RFC PATCH v3] powerpc/64s: Move idle code to powernv C code
From: Nicholas Piggin @ 2018-07-31 13:42 UTC
To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Vaidyanathan Srinivasan
Reimplement the Book3S idle code in C, in the powernv platform code.
Assembly stubs are used to save and restore the stack frame and
non-volatile GPRs before going to idle, but these are small and
mostly agnostic to microarchitecture implementation details.
The optimisation where EC=ESL=0 idle modes do not have to save
GPRs or do an mtmsrd L=0 is restored, because it's simple to do.
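As a rough userspace illustration of the EC=ESL=0 fast path, the sketch below picks between the two stop stubs from the PSSCR bits. The PSSCR_EC/PSSCR_ESL values (ISA bits 43 and 42) match the PSSCR_EC_ESL_MASK_SHIFTED test in the removed asm. fake_stop_noloss() and fake_stop_mayloss() are hypothetical stand-ins for isa3_idle_stop_noloss() and isa3_idle_stop_mayloss(); they only record which path was taken.

```c
#include <assert.h>
#include <stdint.h>

/* PSSCR bits: Exit Criterion (ISA bit 43) and Enable State Loss (ISA bit 42) */
#define PSSCR_EC  0x00100000ULL
#define PSSCR_ESL 0x00200000ULL

/* Hypothetical stand-ins for the asm stubs; they only count calls. */
static int noloss_calls, mayloss_calls;

static unsigned long fake_stop_noloss(uint64_t psscr)
{
	(void)psscr;
	noloss_calls++;
	return 0;	/* no state lost, the stub returns 0 */
}

static unsigned long fake_stop_mayloss(uint64_t psscr)
{
	(void)psscr;
	mayloss_calls++;
	return 1;	/* placeholder for the SRR1 value a real wakeup returns */
}

/*
 * With EC=ESL=0 no state is lost across stop, so the GPR save area and
 * the interrupt-driven wakeup path of the mayloss stub can be skipped.
 */
static unsigned long idle_stop_sketch(uint64_t psscr)
{
	if (!(psscr & (PSSCR_EC | PSSCR_ESL)))
		return fake_stop_noloss(psscr);
	return fake_stop_mayloss(psscr);
}
```

The sketch deliberately ignores KVM, SPR restore and the other wakeup details; it only shows the branch that makes the shallow case cheap.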
Idle wakeup no longer uses the ->cpu_restore call to reinitialise SPRs,
but saves and restores them all explicitly. This can easily be
extended to track the set of system-wide SPRs that do not have
to be saved each time.
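A minimal sketch of explicit SPR save/restore, including the suggested extension that restores system-wide SPRs from one global copy instead of saving them on every idle entry. Everything here is hypothetical: the SPR file is a plain array, fake_mfspr()/fake_mtspr() stand in for the real accessors, and the SPR indices and SYSTEM_WIDE_MASK are illustrative, not the kernel's numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SPR indices for the sketch (not real SPR numbers). */
enum { SPR_LPCR, SPR_HFSCR, SPR_PID, SPR_FSCR, SPR_MMCRA, NR_SPRS };

/* Simulated per-thread SPR file; a deep stop state may wipe it. */
static uint64_t spr_file[NR_SPRS];

static uint64_t fake_mfspr(int spr) { return spr_file[spr]; }
static void fake_mtspr(int spr, uint64_t val) { spr_file[spr] = val; }

/*
 * SPRs assumed (illustrative choice) to hold the same value on every CPU
 * once boot has finished: restored from a global copy, never saved.
 */
#define SYSTEM_WIDE_MASK ((1u << SPR_LPCR) | (1u << SPR_HFSCR))
static uint64_t system_wide_val[NR_SPRS];

struct idle_spr_save {
	uint64_t val[NR_SPRS];
};

/* Save only the per-thread SPRs before entering a state-loss idle state. */
static void idle_save_sprs(struct idle_spr_save *s)
{
	for (int i = 0; i < NR_SPRS; i++)
		if (!(SYSTEM_WIDE_MASK & (1u << i)))
			s->val[i] = fake_mfspr(i);
}

/* Restore every SPR on wakeup, picking the source per SPR. */
static void idle_restore_sprs(const struct idle_spr_save *s)
{
	for (int i = 0; i < NR_SPRS; i++) {
		if (SYSTEM_WIDE_MASK & (1u << i))
			fake_mtspr(i, system_wide_val[i]);
		else
			fake_mtspr(i, s->val[i]);
	}
}
```

Whether an SPR really is constant system-wide has to be decided per register; the comment removed from dt_cpu_ftrs.c notes that LPCR, for example, can change after early init (e.g. by radix enable), so its global copy would have to be updated at that point.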
Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
way this stuff will cope with non-trivial new CPU implementation
details, firmware changes, etc., without becoming unmaintainable.
Since RFC v1:
- Now tested and working with POWER9 hash and radix.
- KVM support added. This took a bit of work to untangle and might
still have some issues, but POWER9 seems to work, including hash on
radix with dependent threads mode.
- This snowballed a bit because KVM and other details made it
infeasible to leave the POWER7/8 code alone. That's only half done
at the moment.
- So far this trades about 800 lines of asm for 500 of C. With POWER7/8
support done it might be another hundred or so lines of C.
Since RFC v2:
- Fixed deep state SLB reloading
- Now tested and working with POWER8.
- Addressed most review feedback.
Thanks,
Nick
---
include/asm/book3s/64/mmu-hash.h | 1
include/asm/cpuidle.h | 21
include/asm/paca.h | 40 -
include/asm/processor.h | 9
include/asm/reg.h | 7
kernel/asm-offsets.c | 18
kernel/dt_cpu_ftrs.c | 21
kernel/exceptions-64s.S | 17
kernel/idle_book3s.S | 998 +++------------------------------------
kernel/setup-common.c | 4
kvm/book3s_hv_rmhandlers.S | 94 ++-
mm/slb.c | 15
platforms/powernv/idle.c | 839 ++++++++++++++++++++++++++------
platforms/powernv/subcore.c | 2
xmon/xmon.c | 25
15 files changed, 902 insertions(+), 1209 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..b68a4fe446d6 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -487,6 +487,7 @@ extern void hpte_init_native(void);
extern void slb_initialize(void);
extern void slb_flush_and_rebolt(void);
+extern void slb_shadow_reload(void);
extern void slb_vmalloc_update(void);
extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index e210a83eb196..9b5c7ec908f2 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -27,10 +27,11 @@
* the THREAD_WINKLE_BITS are set, which indicate which threads have not
* yet woken from the winkle state.
*/
-#define PNV_CORE_IDLE_LOCK_BIT 0x10000000
+#define NR_PNV_CORE_IDLE_LOCK_BIT 28
+#define PNV_CORE_IDLE_LOCK_BIT (1ULL << NR_PNV_CORE_IDLE_LOCK_BIT)
+#define PNV_CORE_IDLE_WINKLE_COUNT_SHIFT 16
#define PNV_CORE_IDLE_WINKLE_COUNT 0x00010000
-#define PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT 0x00080000
#define PNV_CORE_IDLE_WINKLE_COUNT_BITS 0x000F0000
#define PNV_CORE_IDLE_THREAD_WINKLE_BITS_SHIFT 8
#define PNV_CORE_IDLE_THREAD_WINKLE_BITS 0x0000FF00
@@ -68,22 +69,6 @@
#define ERR_DEEP_STATE_ESL_MISMATCH -2
#ifndef __ASSEMBLY__
-/* Additional SPRs that need to be saved/restored during stop */
-struct stop_sprs {
- u64 pid;
- u64 ldbar;
- u64 fscr;
- u64 hfscr;
- u64 mmcr1;
- u64 mmcr2;
- u64 mmcra;
-};
-
-extern u32 pnv_fastsleep_workaround_at_entry[];
-extern u32 pnv_fastsleep_workaround_at_exit[];
-
-extern u64 pnv_first_deep_stop_state;
-
unsigned long pnv_cpu_offline(unsigned int cpu);
int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags);
static inline void report_invalid_psscr_val(u64 psscr_val, int err)
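To make the PNV_CORE_IDLE_* layout in the cpuidle.h hunk above easier to follow, here is a userspace sketch of the ISA 2.07 sleep/winkle bookkeeping described by the pseudo-code in the removed idle_book3s.S comment. Assumptions: PNV_CORE_IDLE_THREAD_BITS is 0xFF (it is not shown in this hunk), the word starts with all eight thread bits set, and the helper names are hypothetical. The lock bit is only honoured by spinning while it is set; the fastsleep workaround, timebase resync and first_in_subcore logic are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Values copied from the cpuidle.h hunk; THREAD_BITS is assumed. */
#define NR_PNV_CORE_IDLE_LOCK_BIT		28
#define PNV_CORE_IDLE_LOCK_BIT		(1ULL << NR_PNV_CORE_IDLE_LOCK_BIT)
#define PNV_CORE_IDLE_WINKLE_COUNT_SHIFT	16
#define PNV_CORE_IDLE_WINKLE_COUNT		0x00010000
#define PNV_CORE_IDLE_WINKLE_COUNT_BITS	0x000F0000
#define PNV_CORE_IDLE_THREAD_BITS		0x000000FF

/* One word per core (the patch keeps it in thread 0's paca); all 8 running. */
static _Atomic unsigned long core_idle_state = PNV_CORE_IDLE_THREAD_BITS;

/* sleep_idle()/winkle_idle(): clear our thread bit, bump the winkle count. */
static void core_idle_enter(unsigned long thread_bit, int winkle)
{
	unsigned long old = atomic_load(&core_idle_state), new;

	do {
		while (old & PNV_CORE_IDLE_LOCK_BIT)	/* another thread holds it */
			old = atomic_load(&core_idle_state);
		new = old & ~thread_bit;
		if (winkle)
			new += PNV_CORE_IDLE_WINKLE_COUNT;
	} while (!atomic_compare_exchange_weak(&core_idle_state, &old, new));
}

/* sleep_wake()/winkle_wake(): set our bit; return 1 if first awake in core. */
static int core_idle_wake(unsigned long thread_bit, int winkle)
{
	unsigned long old = atomic_load(&core_idle_state), new;

	do {
		while (old & PNV_CORE_IDLE_LOCK_BIT)
			old = atomic_load(&core_idle_state);
		new = old | thread_bit;
		if (winkle)
			new -= PNV_CORE_IDLE_WINKLE_COUNT;
	} while (!atomic_compare_exchange_weak(&core_idle_state, &old, new));

	return (old & PNV_CORE_IDLE_THREAD_BITS) == 0;
}
```

Sequentially consistent C11 atomics are used for simplicity; the removed asm gets its ordering from lwarx/stwcx. followed by isync.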
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4e9cede5a7e7..d2cee5ebaaa1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -168,7 +168,6 @@ struct paca_struct {
u8 irq_happened; /* irq happened while soft-disabled */
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending; /* IRQ_WORK interrupt while soft-disable */
- u8 nap_state_lost; /* NV GPR values lost in power7_idle */
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
u8 pmcregs_in_use; /* pseries puts this in lppaca */
#endif
@@ -178,23 +177,30 @@ struct paca_struct {
#endif
#ifdef CONFIG_PPC_POWERNV
- /* Per-core mask tracking idle threads and a lock bit-[L][TTTTTTTT] */
- u32 *core_idle_state_ptr;
- u8 thread_idle_state; /* PNV_THREAD_RUNNING/NAP/SLEEP */
- /* Mask to indicate thread id in core */
- u8 thread_mask;
- /* Mask to denote subcore sibling threads */
- u8 subcore_sibling_mask;
- /* Flag to request this thread not to stop */
- atomic_t dont_stop;
- /* The PSSCR value that the kernel requested before going to stop */
- u64 requested_psscr;
+ /* PowerNV idle fields */
+ /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
+ unsigned long idle_state;
+ union {
+ /* P7/P8 specific fields */
+ struct {
+ /* PNV_THREAD_RUNNING/NAP/SLEEP */
+ u8 thread_idle_state;
+ /* Mask to indicate thread id in core */
+ u8 thread_mask;
+ /* Mask to denote subcore sibling threads */
+ u8 subcore_sibling_mask;
+ };
- /*
- * Save area for additional SPRs that need to be
- * saved/restored during cpuidle stop.
- */
- struct stop_sprs stop_sprs;
+ /* P9 specific fields */
+ struct {
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ /* The PSSCR value that the kernel requested before going to stop */
+ u64 requested_psscr;
+ /* Flag to request this thread not to stop */
+ atomic_t dont_stop;
+#endif
+ };
+ };
#endif
#ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 5debe337ea9d..1d241f4d8612 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -507,14 +507,17 @@ static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
}
#endif
+/* asm stubs */
+extern unsigned long isa3_idle_stop_noloss(unsigned long psscr_val);
+extern unsigned long isa3_idle_stop_mayloss(unsigned long psscr_val);
+extern unsigned long isa206_idle_insn_mayloss(unsigned long type);
+
extern unsigned long cpuidle_disable;
enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
extern int powersave_nap; /* set if nap mode can be used in idle loop */
-extern unsigned long power7_idle_insn(unsigned long type); /* PNV_THREAD_NAP/etc*/
+
extern void power7_idle_type(unsigned long type);
-extern unsigned long power9_idle_stop(unsigned long psscr_val);
-extern unsigned long power9_offline_stop(unsigned long psscr_val);
extern void power9_idle_type(unsigned long stop_psscr_val,
unsigned long stop_psscr_mask);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index d85d000d05b5..76513a2a2192 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -753,10 +753,9 @@
#define SRR1_WAKERESET 0x00100000 /* System reset */
#define SRR1_WAKEHDBELL 0x000c0000 /* Hypervisor doorbell on P8 */
#define SRR1_WAKESTATE 0x00030000 /* Powersave exit mask [46:47] */
-#define SRR1_WS_DEEPEST 0x00030000 /* Some resources not maintained,
- * may not be recoverable */
-#define SRR1_WS_DEEPER 0x00020000 /* Some resources not maintained */
-#define SRR1_WS_DEEP 0x00010000 /* All resources maintained */
+#define SRR1_WS_HVLOSS 0x00030000 /* HV resources not maintained */
+#define SRR1_WS_GPRLOSS 0x00020000 /* GPRs not maintained */
+#define SRR1_WS_NOLOSS 0x00010000 /* All resources maintained */
#define SRR1_PROGTM 0x00200000 /* TM Bad Thing */
#define SRR1_PROGFPE 0x00100000 /* Floating Point Enabled */
#define SRR1_PROGILL 0x00080000 /* Illegal instruction */
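The renamed SRR1_WS_* values above describe how much state was lost, encoded in SRR1[46:47]. The new machine check wakeup code in exceptions-64s.S extracts that field with `rlwinm r10,r3,47-31,30,31` and returns straight to the idle caller when `cmpwi cr3,r10,2` finds it below 2. A C rendering of the same test, as a sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Values from the reg.h hunk above. */
#define SRR1_WAKESTATE	0x00030000 /* Powersave exit mask [46:47] */
#define SRR1_WS_HVLOSS	0x00030000 /* HV resources not maintained */
#define SRR1_WS_GPRLOSS	0x00020000 /* GPRs not maintained */
#define SRR1_WS_NOLOSS	0x00010000 /* All resources maintained */

/* Equivalent of "rlwinm r10,r3,47-31,30,31": SRR1[46:47] as 0..3. */
static unsigned int srr1_wake_state(uint64_t srr1)
{
	return (srr1 & SRR1_WAKESTATE) >> 16;
}

/* Equivalent of "cmpwi cr3,r10,2; bltlr cr3": 2 or 3 means GPRs were lost. */
static int wake_lost_gprs(uint64_t srr1)
{
	return srr1_wake_state(srr1) >= 2;
}

/* HV state loss, the deepest level; the >= 2 test above also covers it. */
static int wake_lost_hv_state(uint64_t srr1)
{
	return (srr1 & SRR1_WAKESTATE) == SRR1_WS_HVLOSS;
}
```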
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 89cf15566c4e..7834256585f1 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -256,7 +256,6 @@ int main(void)
OFFSET(ACCOUNT_USER_TIME, paca_struct, accounting.utime);
OFFSET(ACCOUNT_SYSTEM_TIME, paca_struct, accounting.stime);
OFFSET(PACA_TRAP_SAVE, paca_struct, trap_save);
- OFFSET(PACA_NAPSTATELOST, paca_struct, nap_state_lost);
OFFSET(PACA_SPRG_VDSO, paca_struct, sprg_vdso);
#else /* CONFIG_PPC64 */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
@@ -761,23 +760,6 @@ int main(void)
OFFSET(VCPU_TIMING_LAST_ENTER_TBL, kvm_vcpu, arch.timing_last_enter.tv32.tbl);
#endif
-#ifdef CONFIG_PPC_POWERNV
- OFFSET(PACA_CORE_IDLE_STATE_PTR, paca_struct, core_idle_state_ptr);
- OFFSET(PACA_THREAD_IDLE_STATE, paca_struct, thread_idle_state);
- OFFSET(PACA_THREAD_MASK, paca_struct, thread_mask);
- OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
- OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
- OFFSET(PACA_DONT_STOP, paca_struct, dont_stop);
-#define STOP_SPR(x, f) OFFSET(x, paca_struct, stop_sprs.f)
- STOP_SPR(STOP_PID, pid);
- STOP_SPR(STOP_LDBAR, ldbar);
- STOP_SPR(STOP_FSCR, fscr);
- STOP_SPR(STOP_HFSCR, hfscr);
- STOP_SPR(STOP_MMCR1, mmcr1);
- STOP_SPR(STOP_MMCR2, mmcr2);
- STOP_SPR(STOP_MMCRA, mmcra);
-#endif
-
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
DEFINE(PPC_DBELL_MSGTYPE, PPC_DBELL_MSGTYPE);
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index f432054234a4..d635d78facdc 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -71,7 +71,6 @@ static int hv_mode;
static struct {
u64 lpcr;
- u64 lpcr_clear;
u64 hfscr;
u64 fscr;
} system_registers;
@@ -80,24 +79,7 @@ static void (*init_pmu_registers)(void);
static void __restore_cpu_cpufeatures(void)
{
- u64 lpcr;
-
- /*
- * LPCR is restored by the power on engine already. It can be changed
- * after early init e.g., by radix enable, and we have no unified API
- * for saving and restoring such SPRs.
- *
- * This ->restore hook should really be removed from idle and register
- * restore moved directly into the idle restore code, because this code
- * doesn't know how idle is implemented or what it needs restored here.
- *
- * The best we can do to accommodate secondary boot and idle restore
- * for now is "or" LPCR with existing.
- */
- lpcr = mfspr(SPRN_LPCR);
- lpcr |= system_registers.lpcr;
- lpcr &= ~system_registers.lpcr_clear;
- mtspr(SPRN_LPCR, lpcr);
+ mtspr(SPRN_LPCR, system_registers.lpcr);
if (hv_mode) {
mtspr(SPRN_LPID, 0);
mtspr(SPRN_HFSCR, system_registers.hfscr);
@@ -318,7 +300,6 @@ static int __init feat_enable_mmu_hash_v3(struct dt_cpu_feature *f)
{
u64 lpcr;
- system_registers.lpcr_clear |= (LPCR_ISL | LPCR_UPRT | LPCR_HR);
lpcr = mfspr(SPRN_LPCR);
lpcr &= ~(LPCR_ISL | LPCR_UPRT | LPCR_HR);
mtspr(SPRN_LPCR, lpcr);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 76a14702cb9c..137f0f446472 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -135,8 +135,9 @@ TRAMP_KVM(PACA_EXNMI, 0x100)
#ifdef CONFIG_PPC_P7_NAP
EXC_COMMON_BEGIN(system_reset_idle_common)
- mfspr r12,SPRN_SRR1
- b pnv_powersave_wakeup
+ mfspr r3,SPRN_SRR1
+ bltlr cr3 /* no state loss, return to idle caller */
+ b idle_return_gpr_loss
#endif
/*
@@ -415,17 +416,17 @@ EXC_COMMON_BEGIN(machine_check_idle_common)
* Then decrement MCE nesting after finishing with the stack.
*/
ld r3,_MSR(r1)
+ ld r4,_LINK(r1)
lhz r11,PACA_IN_MCE(r13)
subi r11,r11,1
sth r11,PACA_IN_MCE(r13)
- /* Turn off the RI bit because SRR1 is used by idle wakeup code. */
- /* Recoverability could be improved by reducing the use of SRR1. */
- li r11,0
- mtmsrd r11,1
-
- b pnv_powersave_wakeup_mce
+ mtlr r4
+ rlwinm r10,r3,47-31,30,31
+ cmpwi cr3,r10,2
+ bltlr cr3 /* no state loss, return to idle caller */
+ b idle_return_gpr_loss
#endif
/*
* Handle machine check early in real mode. We come here with
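isa3_idle_stop_mayloss and idle_return_gpr_loss together behave much like setjmp/longjmp: the first saves r1 in PACAR1 and the non-volatile GPRs, LR and CR in the red zone below the stack pointer, and the wakeup path above reloads them and returns to the original caller with r3 as the result. The userspace analogy below only illustrates that control flow; the jmp_buf stands in for the PACAR1/red-zone save area, and all names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

/* Stands in for PACAR1 plus the NVGPR/LR/CR save area below r1. */
static jmp_buf idle_save_area;
/* Stands in for r3 on the wakeup path (SRR1 in the real code). */
static unsigned long idle_wake_r3;

/* Analogue of idle_return_gpr_loss: resume the idle caller with r3. */
static void fake_idle_return_gpr_loss(unsigned long r3)
{
	idle_wake_r3 = r3;
	longjmp(idle_save_area, 1);
}

/*
 * Analogue of isa3_idle_stop_mayloss: save the context, then "stop".
 * The wakeup is simulated by calling the return path directly; in the
 * patch it arrives through the system reset or machine check vector.
 */
static unsigned long fake_idle_stop_mayloss(unsigned long simulated_srr1)
{
	if (setjmp(idle_save_area) != 0)
		return idle_wake_r3;	/* resumed through the wakeup path */

	fake_idle_return_gpr_loss(simulated_srr1);
	return 0;	/* not reached, like the "b ." after PPC_STOP */
}
```

Unlike a plain longjmp, the real wakeup leaves SPRs, the MSR and so on to the C caller, as the stub comments state.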
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 672ead80c702..618251b0dd1e 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -1,6 +1,7 @@
/*
- * This file contains idle entry/exit functions for POWER7,
- * POWER8 and POWER9 CPUs.
+ * This file contains general idle entry/exit functions. The platform / CPU
+ * must call the correct save/restore functions and ensure SPRs are saved
+ * and restored correctly, handle KVM, interrupts, etc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -8,218 +9,69 @@
* 2 of the License, or (at your option) any later version.
*/
-#include <linux/threads.h>
-#include <asm/processor.h>
-#include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/thread_info.h>
#include <asm/ppc_asm.h>
#include <asm/asm-offsets.h>
#include <asm/ppc-opcode.h>
-#include <asm/hw_irq.h>
-#include <asm/kvm_book3s_asm.h>
-#include <asm/opal.h>
#include <asm/cpuidle.h>
-#include <asm/exception-64s.h>
-#include <asm/book3s/64/mmu-hash.h>
-#include <asm/mmu.h>
-
-#undef DEBUG
-
-/*
- * Use unused space in the interrupt stack to save and restore
- * registers for winkle support.
- */
-#define _MMCR0 GPR0
-#define _SDR1 GPR3
-#define _PTCR GPR3
-#define _RPR GPR4
-#define _SPURR GPR5
-#define _PURR GPR6
-#define _TSCR GPR7
-#define _DSCR GPR8
-#define _AMOR GPR9
-#define _WORT GPR10
-#define _WORC GPR11
-#define _LPCR GPR12
-
-#define PSSCR_EC_ESL_MASK_SHIFTED (PSSCR_EC | PSSCR_ESL) >> 16
-
- .text
/*
- * Used by threads before entering deep idle states. Saves SPRs
- * in interrupt stack frame
- */
-save_sprs_to_stack:
- /*
- * Note all register i.e per-core, per-subcore or per-thread is saved
- * here since any thread in the core might wake up first
- */
-BEGIN_FTR_SECTION
- /*
- * Note - SDR1 is dropped in Power ISA v3. Hence not restoring
- * SDR1 here
- */
- mfspr r3,SPRN_PTCR
- std r3,_PTCR(r1)
- mfspr r3,SPRN_LPCR
- std r3,_LPCR(r1)
-FTR_SECTION_ELSE
- mfspr r3,SPRN_SDR1
- std r3,_SDR1(r1)
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
- mfspr r3,SPRN_RPR
- std r3,_RPR(r1)
- mfspr r3,SPRN_SPURR
- std r3,_SPURR(r1)
- mfspr r3,SPRN_PURR
- std r3,_PURR(r1)
- mfspr r3,SPRN_TSCR
- std r3,_TSCR(r1)
- mfspr r3,SPRN_DSCR
- std r3,_DSCR(r1)
- mfspr r3,SPRN_AMOR
- std r3,_AMOR(r1)
- mfspr r3,SPRN_WORT
- std r3,_WORT(r1)
- mfspr r3,SPRN_WORC
- std r3,_WORC(r1)
-/*
- * On POWER9, there are idle states such as stop4, invoked via cpuidle,
- * that lose hypervisor resources. In such cases, we need to save
- * additional SPRs before entering those idle states so that they can
- * be restored to their older values on wakeup from the idle state.
+ * Desired PSSCR in r3
+ *
+ * No state will be lost regardless of wakeup mechanism (interrupt or NIA).
+ * Interrupt driven wakeup may clobber volatiles, and should blr (with LR
+ * unchanged) to return to caller with r3 set according to caller's expected
+ * return code (for Book3S/64 that is SRR1).
*
- * On POWER8, the only such deep idle state is winkle which is used
- * only in the context of CPU-Hotplug, where these additional SPRs are
- * reinitiazed to a sane value. Hence there is no need to save/restore
- * these SPRs.
+ * Caller is responsible for restoring SPRs, MSR, etc.
*/
-BEGIN_FTR_SECTION
- blr
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
-
-power9_save_additional_sprs:
- mfspr r3, SPRN_PID
- mfspr r4, SPRN_LDBAR
- std r3, STOP_PID(r13)
- std r4, STOP_LDBAR(r13)
-
- mfspr r3, SPRN_FSCR
- mfspr r4, SPRN_HFSCR
- std r3, STOP_FSCR(r13)
- std r4, STOP_HFSCR(r13)
-
- mfspr r3, SPRN_MMCRA
- mfspr r4, SPRN_MMCR0
- std r3, STOP_MMCRA(r13)
- std r4, _MMCR0(r1)
-
- mfspr r3, SPRN_MMCR1
- mfspr r4, SPRN_MMCR2
- std r3, STOP_MMCR1(r13)
- std r4, STOP_MMCR2(r13)
- blr
-
-power9_restore_additional_sprs:
- ld r3,_LPCR(r1)
- ld r4, STOP_PID(r13)
- mtspr SPRN_LPCR,r3
- mtspr SPRN_PID, r4
-
- ld r3, STOP_LDBAR(r13)
- ld r4, STOP_FSCR(r13)
- mtspr SPRN_LDBAR, r3
- mtspr SPRN_FSCR, r4
-
- ld r3, STOP_HFSCR(r13)
- ld r4, STOP_MMCRA(r13)
- mtspr SPRN_HFSCR, r3
- mtspr SPRN_MMCRA, r4
-
- ld r3, _MMCR0(r1)
- ld r4, STOP_MMCR1(r13)
- mtspr SPRN_MMCR0, r3
- mtspr SPRN_MMCR1, r4
-
- ld r3, STOP_MMCR2(r13)
- ld r4, PACA_SPRG_VDSO(r13)
- mtspr SPRN_MMCR2, r3
- mtspr SPRN_SPRG3, r4
+_GLOBAL(isa3_idle_stop_noloss)
+ mtspr SPRN_PSSCR,r3
+ PPC_STOP
+ li r3,0
blr
/*
- * Used by threads when the lock bit of core_idle_state is set.
- * Threads will spin in HMT_LOW until the lock bit is cleared.
- * r14 - pointer to core_idle_state
- * r15 - used to load contents of core_idle_state
- * r9 - used as a temporary variable
+ * Desired PSSCR in r3
+ *
+ * GPRs may be lost, so they are saved here. Wakeup is by interrupt only.
+ * Wakeup can return to the caller by branching to idle_return_gpr_loss with
+ * r3 set to the return value.
+ *
+ * A wakeup without GPR loss may alternatively be handled as in
+ * isa3_idle_stop_noloss as an optimisation.
+ *
+ * Caller is responsible for restoring SPRs, MSR, etc.
*/
-
-core_idle_lock_held:
- HMT_LOW
-3: lwz r15,0(r14)
- andis. r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
- bne 3b
- HMT_MEDIUM
- lwarx r15,0,r14
- andis. r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
- bne- core_idle_lock_held
- blr
+_GLOBAL(isa3_idle_stop_mayloss)
+ mtspr SPRN_PSSCR,r3
+ std r1,PACAR1(r13)
+ mflr r4
+ mfcr r5
+ /* use stack red zone rather than a new frame */
+ addi r6,r1,-INT_FRAME_SIZE
+ SAVE_GPR(2, r6)
+ SAVE_NVGPRS(r6)
+ std r4,_LINK(r6)
+ std r5,_CCR(r6)
+ PPC_STOP
+ b . /* catch bugs */
/*
- * Pass requested state in r3:
- * r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
- * - Requested PSSCR value in POWER9
+ * Desired return value in r3
*
- * Address of idle handler to branch to in realmode in r4
+ * Idle wakeup can branch here after isa3_idle_stop_mayloss to return
+ * to the caller with r3 as the return code.
*/
-pnv_powersave_common:
- /* Use r3 to pass state nap/sleep/winkle */
- /* NAP is a state loss, we create a regs frame on the
- * stack, fill it up with the state we care about and
- * stick a pointer to it in PACAR1. We really only
- * need to save PC, some CR bits and the NV GPRs,
- * but for now an interrupt frame will do.
- */
- mtctr r4
-
- mflr r0
- std r0,16(r1)
- stdu r1,-INT_FRAME_SIZE(r1)
- std r0,_LINK(r1)
- std r0,_NIP(r1)
-
- /* We haven't lost state ... yet */
- li r0,0
- stb r0,PACA_NAPSTATELOST(r13)
-
- /* Continue saving state */
- SAVE_GPR(2, r1)
- SAVE_NVGPRS(r1)
- mfcr r5
- std r5,_CCR(r1)
- std r1,PACAR1(r13)
-
-BEGIN_FTR_SECTION
- /*
- * POWER9 does not require real mode to stop, and presently does not
- * set hwthread_state for KVM (threads don't share MMU context), so
- * we can remain in virtual mode for this.
- */
- bctr
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
- /*
- * POWER8
- * Go to real mode to do the nap, as required by the architecture.
- * Also, we need to be in real mode before setting hwthread_state,
- * because as soon as we do that, another thread can switch
- * the MMU context to the guest.
- */
- LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
- mtmsrd r7,0
- bctr
+_GLOBAL(idle_return_gpr_loss)
+ ld r1,PACAR1(r13)
+ addi r6,r1,-INT_FRAME_SIZE
+ ld r4,_LINK(r6)
+ ld r5,_CCR(r6)
+ REST_NVGPRS(r6)
+ REST_GPR(2, r6)
+ mtlr r4
+ mtcr r5
+ blr
/*
* This is the sequence required to execute idle instructions, as
@@ -232,723 +84,53 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
ld r0,0(r1); \
236: cmpd cr0,r0,r0; \
bne 236b; \
- IDLE_INST;
-
-
- .globl pnv_enter_arch207_idle_mode
-pnv_enter_arch207_idle_mode:
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
- /* Tell KVM we're entering idle */
- li r4,KVM_HWTHREAD_IN_IDLE
- /******************************************************/
- /* N O T E W E L L ! ! ! N O T E W E L L */
- /* The following store to HSTATE_HWTHREAD_STATE(r13) */
- /* MUST occur in real mode, i.e. with the MMU off, */
- /* and the MMU must stay off until we clear this flag */
- /* and test HSTATE_HWTHREAD_REQ(r13) in */
- /* pnv_powersave_wakeup in this file. */
- /* The reason is that another thread can switch the */
- /* MMU to a guest context whenever this flag is set */
- /* to KVM_HWTHREAD_IN_IDLE, and if the MMU was on, */
- /* that would potentially cause this thread to start */
- /* executing instructions from guest memory in */
- /* hypervisor mode, leading to a host crash or data */
- /* corruption, or worse. */
- /******************************************************/
- stb r4,HSTATE_HWTHREAD_STATE(r13)
-#endif
- stb r3,PACA_THREAD_IDLE_STATE(r13)
- cmpwi cr3,r3,PNV_THREAD_SLEEP
- bge cr3,2f
- IDLE_STATE_ENTER_SEQ_NORET(PPC_NAP)
- /* No return */
-2:
- /* Sleep or winkle */
- lbz r7,PACA_THREAD_MASK(r13)
- ld r14,PACA_CORE_IDLE_STATE_PTR(r13)
- li r5,0
- beq cr3,3f
- lis r5,PNV_CORE_IDLE_WINKLE_COUNT@h
-3:
-lwarx_loop1:
- lwarx r15,0,r14
-
- andis. r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
- bnel- core_idle_lock_held
-
- add r15,r15,r5 /* Add if winkle */
- andc r15,r15,r7 /* Clear thread bit */
-
- andi. r9,r15,PNV_CORE_IDLE_THREAD_BITS
-
-/*
- * If cr0 = 0, then current thread is the last thread of the core entering
- * sleep. Last thread needs to execute the hardware bug workaround code if
- * required by the platform.
- * Make the workaround call unconditionally here. The below branch call is
- * patched out when the idle states are discovered if the platform does not
- * require it.
- */
-.global pnv_fastsleep_workaround_at_entry
-pnv_fastsleep_workaround_at_entry:
- beq fastsleep_workaround_at_entry
-
- stwcx. r15,0,r14
- bne- lwarx_loop1
- isync
-
-common_enter: /* common code for all the threads entering sleep or winkle */
- bgt cr3,enter_winkle
- IDLE_STATE_ENTER_SEQ_NORET(PPC_SLEEP)
-
-fastsleep_workaround_at_entry:
- oris r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
- stwcx. r15,0,r14
- bne- lwarx_loop1
- isync
+ IDLE_INST; \
+ b . /* catch bugs */
- /* Fast sleep workaround */
- li r3,1
- li r4,1
- bl opal_config_cpu_idle_state
-
- /* Unlock */
- xoris r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
- lwsync
- stw r15,0(r14)
- b common_enter
-
-enter_winkle:
- bl save_sprs_to_stack
-
- IDLE_STATE_ENTER_SEQ_NORET(PPC_WINKLE)
-
-/*
- * r3 - PSSCR value corresponding to the requested stop state.
- */
-power_enter_stop:
/*
- * Check if we are executing the lite variant with ESL=EC=0
- */
- andis. r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
- clrldi r3,r3,60 /* r3 = Bits[60:63] = Requested Level (RL) */
- bne .Lhandle_esl_ec_set
- PPC_STOP
- li r3,0 /* Since we didn't lose state, return 0 */
- std r3, PACA_REQ_PSSCR(r13)
-
- /*
- * pnv_wakeup_noloss() expects r12 to contain the SRR1 value so
- * it can determine if the wakeup reason is an HMI in
- * CHECK_HMI_INTERRUPT.
- *
- * However, when we wakeup with ESL=0, SRR1 will not contain the wakeup
- * reason, so there is no point setting r12 to SRR1.
- *
- * Further, we clear r12 here, so that we don't accidentally enter the
- * HMI in pnv_wakeup_noloss() if the value of r12[42:45] == WAKE_HMI.
- */
- li r12, 0
- b pnv_wakeup_noloss
-
-.Lhandle_esl_ec_set:
-BEGIN_FTR_SECTION
- /*
- * POWER9 DD2.0 or earlier can incorrectly set PMAO when waking up after
- * a state-loss idle. Saving and restoring MMCR0 over idle is a
- * workaround.
- */
- mfspr r4,SPRN_MMCR0
- std r4,_MMCR0(r1)
-END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1)
-
-/*
- * Check if the requested state is a deep idle state.
- */
- LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
- ld r4,ADDROFF(pnv_first_deep_stop_state)(r5)
- cmpd r3,r4
- bge .Lhandle_deep_stop
- PPC_STOP /* Does not return (system reset interrupt) */
-
-.Lhandle_deep_stop:
-/*
- * Entering deep idle state.
- * Clear thread bit in PACA_CORE_IDLE_STATE, save SPRs to
- * stack and enter stop
- */
- lbz r7,PACA_THREAD_MASK(r13)
- ld r14,PACA_CORE_IDLE_STATE_PTR(r13)
-
-lwarx_loop_stop:
- lwarx r15,0,r14
- andis. r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
- bnel- core_idle_lock_held
- andc r15,r15,r7 /* Clear thread bit */
-
- stwcx. r15,0,r14
- bne- lwarx_loop_stop
- isync
-
- bl save_sprs_to_stack
-
- PPC_STOP /* Does not return (system reset interrupt) */
-
-/*
- * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
- * r3 contains desired idle state (PNV_THREAD_NAP/SLEEP/WINKLE).
- */
-_GLOBAL(power7_idle_insn)
- /* Now check if user or arch enabled NAP mode */
- LOAD_REG_ADDR(r4, pnv_enter_arch207_idle_mode)
- b pnv_powersave_common
-
-#define CHECK_HMI_INTERRUPT \
-BEGIN_FTR_SECTION_NESTED(66); \
- rlwinm r0,r12,45-31,0xf; /* extract wake reason field (P8) */ \
-FTR_SECTION_ELSE_NESTED(66); \
- rlwinm r0,r12,45-31,0xe; /* P7 wake reason field is 3 bits */ \
-ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
- cmpwi r0,0xa; /* Hypervisor maintenance ? */ \
- bne+ 20f; \
- /* Invoke opal call to handle hmi */ \
- ld r2,PACATOC(r13); \
- ld r1,PACAR1(r13); \
- std r3,ORIG_GPR3(r1); /* Save original r3 */ \
- li r3,0; /* NULL argument */ \
- bl hmi_exception_realmode; \
- nop; \
- ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \
-20: nop;
-
-/*
- * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
- * r3 contains desired PSSCR register value.
+ * Desired instruction type in r3
*
- * Offline (CPU unplug) case also must notify KVM that the CPU is
- * idle.
- */
-_GLOBAL(power9_offline_stop)
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
- /*
- * Tell KVM we're entering idle.
- * This does not have to be done in real mode because the P9 MMU
- * is independent per-thread. Some steppings share radix/hash mode
- * between threads, but in that case KVM has a barrier sync in real
- * mode before and after switching between radix and hash.
- */
- li r4,KVM_HWTHREAD_IN_IDLE
- stb r4,HSTATE_HWTHREAD_STATE(r13)
-#endif
- /* fall through */
-
-_GLOBAL(power9_idle_stop)
- std r3, PACA_REQ_PSSCR(r13)
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-BEGIN_FTR_SECTION
- sync
- lwz r5, PACA_DONT_STOP(r13)
- cmpwi r5, 0
- bne 1f
-END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_XER_SO_BUG)
-#endif
- mtspr SPRN_PSSCR,r3
- LOAD_REG_ADDR(r4,power_enter_stop)
- b pnv_powersave_common
- /* No return */
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-1:
- /*
- * We get here when TM / thread reconfiguration bug workaround
- * code wants to get the CPU into SMT4 mode, and therefore
- * we are being asked not to stop.
- */
- li r3, 0
- std r3, PACA_REQ_PSSCR(r13)
- blr /* return 0 for wakeup cause / SRR1 value */
-#endif
-
-/*
- * Called from machine check handler for powersave wakeups.
- * Low level machine check processing has already been done. Now just
- * go through the wake up path to get everything in order.
+ * GPRs may be lost, so they are saved here. Wakeup is by interrupt only.
+ * Wakeup can return to the caller by branching to idle_return_gpr_loss
+ * with r3 set to the return value.
*
- * r3 - The original SRR1 value.
- * Original SRR[01] have been clobbered.
- * MSR_RI is clear.
- */
-.global pnv_powersave_wakeup_mce
-pnv_powersave_wakeup_mce:
- /* Set cr3 for pnv_powersave_wakeup */
- rlwinm r11,r3,47-31,30,31
- cmpwi cr3,r11,2
-
- /*
- * Now put the original SRR1 with SRR1_WAKEMCE_RESVD as the wake
- * reason into r12, which allows reuse of the system reset wakeup
- * code without being mistaken for another type of wakeup.
- */
- oris r12,r3,SRR1_WAKEMCE_RESVD@h
-
- b pnv_powersave_wakeup
-
-/*
- * Called from reset vector for powersave wakeups.
- * cr3 - set to gt if waking up with partial/complete hypervisor state loss
- * r12 - SRR1
- */
-.global pnv_powersave_wakeup
-pnv_powersave_wakeup:
- ld r2, PACATOC(r13)
-
-BEGIN_FTR_SECTION
- bl pnv_restore_hyp_resource_arch300
-FTR_SECTION_ELSE
- bl pnv_restore_hyp_resource_arch207
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
-
- li r0,PNV_THREAD_RUNNING
- stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */
-
- mr r3,r12
-
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
- lbz r0,HSTATE_HWTHREAD_STATE(r13)
- cmpwi r0,KVM_HWTHREAD_IN_KERNEL
- beq 0f
- li r0,KVM_HWTHREAD_IN_KERNEL
- stb r0,HSTATE_HWTHREAD_STATE(r13)
- /* Order setting hwthread_state vs. testing hwthread_req */
- sync
-0: lbz r0,HSTATE_HWTHREAD_REQ(r13)
- cmpwi r0,0
- beq 1f
- b kvm_start_guest
-1:
-#endif
-
- /* Return SRR1 from power7_nap() */
- blt cr3,pnv_wakeup_noloss
- b pnv_wakeup_loss
-
-/*
- * Check whether we have woken up with hypervisor state loss.
- * If yes, restore hypervisor state and return back to link.
+ * A wakeup without GPR loss may alternatively be handled as in
+ * isa3_idle_stop_noloss as an optimisation.
*
- * cr3 - set to gt if waking up with partial/complete hypervisor state loss
- */
-pnv_restore_hyp_resource_arch300:
- /*
- * Workaround for POWER9, if we lost resources, the ERAT
- * might have been mixed up and needs flushing. We also need
- * to reload MMCR0 (see comment above). We also need to set
- * then clear bit 60 in MMCRA to ensure the PMU starts running.
- */
- blt cr3,1f
-BEGIN_FTR_SECTION
- PPC_INVALIDATE_ERAT
- ld r1,PACAR1(r13)
- ld r4,_MMCR0(r1)
- mtspr SPRN_MMCR0,r4
-END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1)
- mfspr r4,SPRN_MMCRA
- ori r4,r4,(1 << (63-60))
- mtspr SPRN_MMCRA,r4
- xori r4,r4,(1 << (63-60))
- mtspr SPRN_MMCRA,r4
-1:
- /*
- * POWER ISA 3. Use PSSCR to determine if we
- * are waking up from deep idle state
- */
- LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
- ld r4,ADDROFF(pnv_first_deep_stop_state)(r5)
-
- /*
- * 0-3 bits correspond to Power-Saving Level Status
- * which indicates the idle state we are waking up from
- */
- mfspr r5, SPRN_PSSCR
- rldicl r5,r5,4,60
- li r0, 0 /* clear requested_psscr to say we're awake */
- std r0, PACA_REQ_PSSCR(r13)
- cmpd cr4,r5,r4
- bge cr4,pnv_wakeup_tb_loss /* returns to caller */
-
- blr /* Waking up without hypervisor state loss. */
-
-/* Same calling convention as arch300 */
-pnv_restore_hyp_resource_arch207:
- /*
- * POWER ISA 2.07 or less.
- * Check if we slept with sleep or winkle.
- */
- lbz r4,PACA_THREAD_IDLE_STATE(r13)
- cmpwi cr2,r4,PNV_THREAD_NAP
- bgt cr2,pnv_wakeup_tb_loss /* Either sleep or Winkle */
-
- /*
- * We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
- * up from nap. At this stage CR3 shouldn't contains 'gt' since that
- * indicates we are waking with hypervisor state loss from nap.
- */
- bgt cr3,.
-
- blr /* Waking up without hypervisor state loss */
-
-/*
- * Called if waking up from idle state which can cause either partial or
- * complete hyp state loss.
- * In POWER8, called if waking up from fastsleep or winkle
- * In POWER9, called if waking up from stop state >= pnv_first_deep_stop_state
- *
- * r13 - PACA
- * cr3 - gt if waking up with partial/complete hypervisor state loss
- *
- * If ISA300:
- * cr4 - gt or eq if waking up from complete hypervisor state loss.
+ * Caller is responsible for restoring SPRs, MSR, etc.
*
- * If ISA207:
- * r4 - PACA_THREAD_IDLE_STATE
+ * ISA206/7 must call these in realmode, MMU disabled.
*/
-pnv_wakeup_tb_loss:
- ld r1,PACAR1(r13)
- /*
- * Before entering any idle state, the NVGPRs are saved in the stack.
- * If there was a state loss, or PACA_NAPSTATELOST was set, then the
- * NVGPRs are restored. If we are here, it is likely that state is lost,
- * but not guaranteed -- neither ISA207 nor ISA300 tests to reach
- * here are the same as the test to restore NVGPRS:
- * PACA_THREAD_IDLE_STATE test for ISA207, PSSCR test for ISA300,
- * and SRR1 test for restoring NVGPRs.
- *
- * We are about to clobber NVGPRs now, so set NAPSTATELOST to
- * guarantee they will always be restored. This might be tightened
- * with careful reading of specs (particularly for ISA300) but this
- * is already a slow wakeup path and it's simpler to be safe.
- */
- li r0,1
- stb r0,PACA_NAPSTATELOST(r13)
-
- /*
- *
- * Save SRR1 and LR in NVGPRs as they might be clobbered in
- * opal_call() (called in CHECK_HMI_INTERRUPT). SRR1 is required
- * to determine the wakeup reason if we branch to kvm_start_guest. LR
- * is required to return back to reset vector after hypervisor state
- * restore is complete.
- */
- mr r19,r12
- mr r18,r4
- mflr r17
-BEGIN_FTR_SECTION
- CHECK_HMI_INTERRUPT
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-
- ld r14,PACA_CORE_IDLE_STATE_PTR(r13)
- lbz r7,PACA_THREAD_MASK(r13)
-
- /*
- * Take the core lock to synchronize against other threads.
- *
- * Lock bit is set in one of the 2 cases-
- * a. In the sleep/winkle enter path, the last thread is executing
- * fastsleep workaround code.
- * b. In the wake up path, another thread is executing fastsleep
- * workaround undo code or resyncing timebase or restoring context
- * In either case loop until the lock bit is cleared.
- */
-1:
- lwarx r15,0,r14
- andis. r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
- bnel- core_idle_lock_held
- oris r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
- stwcx. r15,0,r14
- bne- 1b
- isync
-
- andi. r9,r15,PNV_CORE_IDLE_THREAD_BITS
- cmpwi cr2,r9,0
-
- /*
- * At this stage
- * cr2 - eq if first thread to wakeup in core
- * cr3- gt if waking up with partial/complete hypervisor state loss
- * ISA300:
- * cr4 - gt or eq if waking up from complete hypervisor state loss.
- */
-
-BEGIN_FTR_SECTION
- /*
- * Were we in winkle?
- * If yes, check if all threads were in winkle, decrement our
- * winkle count, set all thread winkle bits if all were in winkle.
- * Check if our thread has a winkle bit set, and set cr4 accordingly
- * (to match ISA300, above). Pseudo-code for core idle state
- * transitions for ISA207 is as follows (everything happens atomically
- * due to store conditional and/or lock bit):
- *
- * nap_idle() { }
- * nap_wake() { }
- *
- * sleep_idle()
- * {
- * core_idle_state &= ~thread_in_core
- * }
- *
- * sleep_wake()
- * {
- * bool first_in_core, first_in_subcore;
- *
- * first_in_core = (core_idle_state & IDLE_THREAD_BITS) == 0;
- * first_in_subcore = (core_idle_state & SUBCORE_SIBLING_MASK) == 0;
- *
- * core_idle_state |= thread_in_core;
- * }
- *
- * winkle_idle()
- * {
- * core_idle_state &= ~thread_in_core;
- * core_idle_state += 1 << WINKLE_COUNT_SHIFT;
- * }
- *
- * winkle_wake()
- * {
- * bool first_in_core, first_in_subcore, winkle_state_lost;
- *
- * first_in_core = (core_idle_state & IDLE_THREAD_BITS) == 0;
- * first_in_subcore = (core_idle_state & SUBCORE_SIBLING_MASK) == 0;
- *
- * core_idle_state |= thread_in_core;
- *
- * if ((core_idle_state & WINKLE_MASK) == (8 << WINKLE_COUNT_SIHFT))
- * core_idle_state |= THREAD_WINKLE_BITS;
- * core_idle_state -= 1 << WINKLE_COUNT_SHIFT;
- *
- * winkle_state_lost = core_idle_state &
- * (thread_in_core << WINKLE_THREAD_SHIFT);
- * core_idle_state &= ~(thread_in_core << WINKLE_THREAD_SHIFT);
- * }
- *
- */
- cmpwi r18,PNV_THREAD_WINKLE
+_GLOBAL(isa206_idle_insn_mayloss)
+ std r1,PACAR1(r13)
+ mflr r4
+ mfcr r5
+ /* use stack red zone rather than a new frame */
+ addi r6,r1,-INT_FRAME_SIZE
+ SAVE_GPR(2, r6)
+ SAVE_NVGPRS(r6)
+ std r4,_LINK(r6)
+ std r5,_CCR(r6)
+ cmpwi r3,PNV_THREAD_NAP
+ bne 1f
+ IDLE_STATE_ENTER_SEQ_NORET(PPC_NAP)
+1: cmpwi r3,PNV_THREAD_SLEEP
bne 2f
- andis. r9,r15,PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT@h
- subis r15,r15,PNV_CORE_IDLE_WINKLE_COUNT@h
- beq 2f
- ori r15,r15,PNV_CORE_IDLE_THREAD_WINKLE_BITS /* all were winkle */
-2:
- /* Shift thread bit to winkle mask, then test if this thread is set,
- * and remove it from the winkle bits */
- slwi r8,r7,8
- and r8,r8,r15
- andc r15,r15,r8
- cmpwi cr4,r8,1 /* cr4 will be gt if our bit is set, lt if not */
-
- lbz r4,PACA_SUBCORE_SIBLING_MASK(r13)
- and r4,r4,r15
- cmpwi r4,0 /* Check if first in subcore */
-
- or r15,r15,r7 /* Set thread bit */
- beq first_thread_in_subcore
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
-
- or r15,r15,r7 /* Set thread bit */
- beq cr2,first_thread_in_core
-
- /* Not first thread in core or subcore to wake up */
- b clear_lock
-
-first_thread_in_subcore:
- /*
- * If waking up from sleep, subcore state is not lost. Hence
- * skip subcore state restore
- */
- blt cr4,subcore_state_restored
-
- /* Restore per-subcore state */
- ld r4,_SDR1(r1)
- mtspr SPRN_SDR1,r4
-
- ld r4,_RPR(r1)
- mtspr SPRN_RPR,r4
- ld r4,_AMOR(r1)
- mtspr SPRN_AMOR,r4
-
-subcore_state_restored:
- /*
- * Check if the thread is also the first thread in the core. If not,
- * skip to clear_lock.
- */
- bne cr2,clear_lock
-
-first_thread_in_core:
-
- /*
- * First thread in the core waking up from any state which can cause
- * partial or complete hypervisor state loss. It needs to
- * call the fastsleep workaround code if the platform requires it.
- * Call it unconditionally here. The below branch instruction will
- * be patched out if the platform does not have fastsleep or does not
- * require the workaround. Patching will be performed during the
- * discovery of idle-states.
- */
-.global pnv_fastsleep_workaround_at_exit
-pnv_fastsleep_workaround_at_exit:
- b fastsleep_workaround_at_exit
-
-timebase_resync:
- /*
- * Use cr3 which indicates that we are waking up with atleast partial
- * hypervisor state loss to determine if TIMEBASE RESYNC is needed.
- */
- ble cr3,.Ltb_resynced
- /* Time base re-sync */
- bl opal_resync_timebase;
- /*
- * If waking up from sleep (POWER8), per core state
- * is not lost, skip to clear_lock.
- */
-.Ltb_resynced:
- blt cr4,clear_lock
-
- /*
- * First thread in the core to wake up and its waking up with
- * complete hypervisor state loss. Restore per core hypervisor
- * state.
- */
-BEGIN_FTR_SECTION
- ld r4,_PTCR(r1)
- mtspr SPRN_PTCR,r4
- ld r4,_RPR(r1)
- mtspr SPRN_RPR,r4
- ld r4,_AMOR(r1)
- mtspr SPRN_AMOR,r4
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
-
- ld r4,_TSCR(r1)
- mtspr SPRN_TSCR,r4
- ld r4,_WORC(r1)
- mtspr SPRN_WORC,r4
-
-clear_lock:
- xoris r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
- lwsync
- stw r15,0(r14)
-
-common_exit:
- /*
- * Common to all threads.
- *
- * If waking up from sleep, hypervisor state is not lost. Hence
- * skip hypervisor state restore.
- */
- blt cr4,hypervisor_state_restored
-
- /* Waking up from winkle */
-
-BEGIN_MMU_FTR_SECTION
- b no_segments
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
- /* Restore SLB from PACA */
- ld r8,PACA_SLBSHADOWPTR(r13)
-
- .rept SLB_NUM_BOLTED
- li r3, SLBSHADOW_SAVEAREA
- LDX_BE r5, r8, r3
- addi r3, r3, 8
- LDX_BE r6, r8, r3
- andis. r7,r5,SLB_ESID_V@h
- beq 1f
- slbmte r6,r5
-1: addi r8,r8,16
- .endr
-no_segments:
-
- /* Restore per thread state */
-
- ld r4,_SPURR(r1)
- mtspr SPRN_SPURR,r4
- ld r4,_PURR(r1)
- mtspr SPRN_PURR,r4
- ld r4,_DSCR(r1)
- mtspr SPRN_DSCR,r4
- ld r4,_WORT(r1)
- mtspr SPRN_WORT,r4
+ IDLE_STATE_ENTER_SEQ_NORET(PPC_SLEEP)
+ b . /* catch bugs */
+2: IDLE_STATE_ENTER_SEQ_NORET(PPC_WINKLE)
+ b . /* catch bugs */
- /* Call cur_cpu_spec->cpu_restore() */
- LOAD_REG_ADDR(r4, cur_cpu_spec)
- ld r4,0(r4)
- ld r12,CPU_SPEC_RESTORE(r4)
-#ifdef PPC64_ELF_ABI_v1
- ld r12,0(r12)
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+_GLOBAL(idle_kvm_start_guest)
+ std r1,PACAR1(r13)
+ mflr r4
+ mfcr r5
+ /* use stack red zone rather than a new frame */
+ addi r6,r1,-INT_FRAME_SIZE
+ SAVE_GPR(2, r6)
+ SAVE_NVGPRS(r6)
+ std r4,_LINK(r6)
+ std r5,_CCR(r6)
+ b kvm_start_guest
#endif
- mtctr r12
- bctrl
-
-/*
- * On POWER9, we can come here on wakeup from a cpuidle stop state.
- * Hence restore the additional SPRs to the saved value.
- *
- * On POWER8, we come here only on winkle. Since winkle is used
- * only in the case of CPU-Hotplug, we don't need to restore
- * the additional SPRs.
- */
-BEGIN_FTR_SECTION
- bl power9_restore_additional_sprs
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
-hypervisor_state_restored:
-
- mr r12,r19
- mtlr r17
- blr /* return to pnv_powersave_wakeup */
-
-fastsleep_workaround_at_exit:
- li r3,1
- li r4,0
- bl opal_config_cpu_idle_state
- b timebase_resync
-
-/*
- * R3 here contains the value that will be returned to the caller
- * of power7_nap.
- * R12 contains SRR1 for CHECK_HMI_INTERRUPT.
- */
-.global pnv_wakeup_loss
-pnv_wakeup_loss:
- ld r1,PACAR1(r13)
-BEGIN_FTR_SECTION
- CHECK_HMI_INTERRUPT
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
- REST_NVGPRS(r1)
- REST_GPR(2, r1)
- ld r4,PACAKMSR(r13)
- ld r5,_LINK(r1)
- ld r6,_CCR(r1)
- addi r1,r1,INT_FRAME_SIZE
- mtlr r5
- mtcr r6
- mtmsrd r4
- blr
-
-/*
- * R3 here contains the value that will be returned to the caller
- * of power7_nap.
- * R12 contains SRR1 for CHECK_HMI_INTERRUPT.
- */
-pnv_wakeup_noloss:
- lbz r0,PACA_NAPSTATELOST(r13)
- cmpwi r0,0
- bne pnv_wakeup_loss
- ld r1,PACAR1(r13)
-BEGIN_FTR_SECTION
- CHECK_HMI_INTERRUPT
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
- ld r4,PACAKMSR(r13)
- ld r5,_NIP(r1)
- ld r6,_CCR(r1)
- addi r1,r1,INT_FRAME_SIZE
- mtlr r5
- mtcr r6
- mtmsrd r4
- blr
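The removed ISA207 pseudo-code above documents how core_idle_state changes on sleep and winkle entry and exit. power7_idle_insn() in powernv/idle.c now does the equivalent bookkeeping in C, except that there the per-thread winkle bits are set on entry by the last thread to winkle rather than on wake. As a hedged illustration, here is a single-threaded user-space model of the removed pseudo-code that makes the bit arithmetic easy to check. The constants are assumptions matching the layout implied by the asm: thread bits in the low byte, winkle bits shifted left by 8 (as in `slwi r8,r7,8`), and a winkle count in the upper halfword. The helper names are hypothetical, and neither the lock bit nor first_in_subcore is modelled. The pseudo-code hard-codes 8 threads; the sketch compares against threads_per_core instead, as the new C code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only (not copied from asm/cpuidle.h). */
#define IDLE_THREAD_BITS     0x000000FFu /* 1 = thread running */
#define THREAD_WINKLE_SHIFT  8           /* per-thread "lost winkle state" bits */
#define THREAD_WINKLE_BITS   0x0000FF00u
#define WINKLE_COUNT_SHIFT   16          /* number of threads in winkle */
#define WINKLE_COUNT_MASK    0x000F0000u

struct core_model {
	uint32_t state;          /* stands in for *core_idle_state_ptr */
	int threads_per_core;
};

static void core_init(struct core_model *c, int threads)
{
	c->threads_per_core = threads;
	c->state = (1u << threads) - 1;  /* all threads start running */
}

static void sleep_idle(struct core_model *c, uint32_t thread_in_core)
{
	c->state &= ~thread_in_core;
}

/* Returns true if this is the first thread in the core to wake. */
static bool sleep_wake(struct core_model *c, uint32_t thread_in_core)
{
	bool first_in_core = (c->state & IDLE_THREAD_BITS) == 0;

	c->state |= thread_in_core;
	return first_in_core;
}

static void winkle_idle(struct core_model *c, uint32_t thread_in_core)
{
	c->state &= ~thread_in_core;
	c->state += 1u << WINKLE_COUNT_SHIFT;
}

/*
 * Returns true if this thread lost full winkle state; *first_in_core
 * reports whether it is the first thread in the core to wake.
 */
static bool winkle_wake(struct core_model *c, uint32_t thread_in_core,
			bool *first_in_core)
{
	uint32_t all = (uint32_t)c->threads_per_core << WINKLE_COUNT_SHIFT;
	bool state_lost;

	*first_in_core = (c->state & IDLE_THREAD_BITS) == 0;
	c->state |= thread_in_core;

	/* Every thread winkled: mark all of them as having lost state. */
	if ((c->state & WINKLE_COUNT_MASK) == all)
		c->state |= THREAD_WINKLE_BITS;
	c->state -= 1u << WINKLE_COUNT_SHIFT;

	state_lost = c->state & (thread_in_core << THREAD_WINKLE_SHIFT);
	c->state &= ~(thread_in_core << THREAD_WINKLE_SHIFT);
	return state_lost;
}
```

For example, after all four threads of a 4-thread core winkle, the first winkle_wake() reports both first_in_core and full state loss, while a lone winkling thread in a core whose siblings stayed running does not see full state loss.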
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 40b44bb53a4e..e089da156ef3 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -401,8 +401,8 @@ void __init check_for_initrd(void)
#ifdef CONFIG_SMP
-int threads_per_core, threads_per_subcore, threads_shift;
-cpumask_t threads_core_mask;
+int __read_mostly threads_per_core, threads_per_subcore, threads_shift;
+cpumask_t threads_core_mask __read_mostly;
EXPORT_SYMBOL_GPL(threads_per_core);
EXPORT_SYMBOL_GPL(threads_per_subcore);
EXPORT_SYMBOL_GPL(threads_shift);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6e4554b273f1..90e260773aa4 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -32,6 +32,7 @@
#include <asm/opal.h>
#include <asm/xive-regs.h>
#include <asm/thread_info.h>
+#include <asm/cpuidle.h>
/* Sign-extend HDEC if not on POWER9 */
#define EXTEND_HDEC(reg) \
@@ -320,43 +321,32 @@ kvm_novcpu_exit:
b kvmhv_switch_to_host
/*
- * We come in here when wakened from nap mode.
- * Relocation is off and most register values are lost.
- * r13 points to the PACA.
+ * We come in here when wakened from Linux offline idle code.
+ * Relocation is off.
* r3 contains the SRR1 wakeup value, SRR1 is trashed.
*/
.globl kvm_start_guest
kvm_start_guest:
- /* Set runlatch bit the minute you wake up from nap */
- mfspr r0, SPRN_CTRLF
- ori r0, r0, 1
- mtspr SPRN_CTRLT, r0
-
/*
* Could avoid this and pass it through in r3. For now,
* code expects it to be in SRR1.
*/
mtspr SPRN_SRR1,r3
- ld r2,PACATOC(r13)
-
li r0,0
stb r0,PACA_FTRACE_ENABLED(r13)
li r0,KVM_HWTHREAD_IN_KVM
stb r0,HSTATE_HWTHREAD_STATE(r13)
- /* NV GPR values from power7_idle() will no longer be valid */
- li r0,1
- stb r0,PACA_NAPSTATELOST(r13)
-
- /* were we napping due to cede? */
+ /* cede napping should not come through here */
lbz r0,HSTATE_NAPPING(r13)
- cmpwi r0,NAPPING_CEDE
- beq kvm_end_cede
- cmpwi r0,NAPPING_NOVCPU
- beq kvm_novcpu_wakeup
+ twnei r0,0
+/*
+ * We can come in at this point from KVM nap.
+ */
+do_start_guest:
ld r1,PACAEMERGSP(r13)
subi r1,r1,STACK_FRAME_OVERHEAD
@@ -467,19 +457,17 @@ kvm_no_guest:
lbz r3, HSTATE_HWTHREAD_REQ(r13)
cmpwi r3, 0
bne 54f
-/*
- * We jump to pnv_wakeup_loss, which will return to the caller
- * of power7_nap in the powernv cpu offline loop. The value we
- * put in r3 becomes the return value for power7_nap. pnv_wakeup_loss
- * requires SRR1 in r12.
- */
+
+ /*
+ * Jump to idle_return_gpr_loss, which returns to the
+ * idle_kvm_start_guest caller.
+ */
li r3, LPCR_PECE0
mfspr r4, SPRN_LPCR
rlwimi r4, r3, 0, LPCR_PECE0 | LPCR_PECE1
mtspr SPRN_LPCR, r4
li r3, 0
- mfspr r12,SPRN_SRR1
- b pnv_wakeup_loss
+ b idle_return_gpr_loss
53: HMT_LOW
ld r5, HSTATE_KVM_VCORE(r13)
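The rlwimi in kvm_no_guest above inserts LPCR_PECE0 into LPCR under the mask LPCR_PECE0 | LPCR_PECE1, which sets PECE0, clears PECE1 and leaves every other LPCR bit untouched. A minimal C model of that zero-rotate insert follows; the bit values are assumed for illustration and kvm_no_guest_lpcr() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values, for illustration (compare LPCR_PECE* in asm/reg.h). */
#define LPCR_PECE0 0x0000000000004000ULL
#define LPCR_PECE1 0x0000000000002000ULL

/*
 * Model of "rlwimi rA,rS,0,MASK" for a contiguous MASK lying within the
 * low 32 bits: with a zero rotate, the bits of rS selected by MASK
 * replace the same bits of rA, and all other bits of rA are preserved.
 */
static uint64_t rlwimi_sh0(uint64_t ra, uint64_t rs, uint64_t mask)
{
	return (ra & ~mask) | (rs & mask);
}

/* Hypothetical helper: the LPCR update kvm_no_guest performs. */
static uint64_t kvm_no_guest_lpcr(uint64_t lpcr)
{
	return rlwimi_sh0(lpcr, LPCR_PECE0, LPCR_PECE0 | LPCR_PECE1);
}
```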
@@ -2648,6 +2636,7 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */
* switch occurs: SLB entries, PURR, SPURR, AMOR, UAMOR, AMR, SPRG0-3,
* DAR, DSISR, DABR, DABRX, DSCR, PMCx, MMCRx, SIAR, SDAR.
*/
+#if 1
/* Save non-volatile GPRs */
std r14, VCPU_GPR(R14)(r3)
std r15, VCPU_GPR(R15)(r3)
@@ -2667,6 +2656,7 @@ _GLOBAL(kvmppc_h_cede) /* r3 = vcpu pointer, r11 = msr, r13 = paca */
std r29, VCPU_GPR(R29)(r3)
std r30, VCPU_GPR(R30)(r3)
std r31, VCPU_GPR(R31)(r3)
+#endif
/* save FP state */
bl kvmppc_save_fp
@@ -2758,21 +2748,47 @@ BEGIN_FTR_SECTION
li r4, LPCR_PECE_HVEE@higher
sldi r4, r4, 32
or r5, r5, r4
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+FTR_SECTION_ELSE
+ li r3, PNV_THREAD_NAP
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
mtspr SPRN_LPCR,r5
isync
- li r0, 0
- std r0, HSTATE_SCRATCH0(r13)
- ptesync
- ld r0, HSTATE_SCRATCH0(r13)
-1: cmpd r0, r0
- bne 1b
+
+ mr r0, r1
+ ld r1, PACAEMERGSP(r13)
+ subi r1, r1, STACK_FRAME_OVERHEAD
+ std r0, 0(r1)
+ ld r0, PACAR1(r13)
+ std r0, 8(r1)
+
BEGIN_FTR_SECTION
- nap
+ bl isa3_idle_stop_mayloss
FTR_SECTION_ELSE
- PPC_STOP
-ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
- b .
+ bl isa206_idle_insn_mayloss
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
+
+ mfspr r0, SPRN_CTRLF
+ ori r0, r0, 1
+ mtspr SPRN_CTRLT, r0
+
+ ld r0, 8(r1)
+ std r0, PACAR1(r13)
+ ld r1, 0(r1)
+
+ mtspr SPRN_SRR1, r3
+
+ li r0, 0
+ stb r0, PACA_FTRACE_ENABLED(r13)
+
+ li r0, KVM_HWTHREAD_IN_KVM
+ stb r0, HSTATE_HWTHREAD_STATE(r13)
+
+ lbz r0, HSTATE_NAPPING(r13)
+ cmpwi r0, NAPPING_CEDE
+ beq kvm_end_cede
+ cmpwi r0, NAPPING_NOVCPU
+ beq kvm_novcpu_wakeup
+ b do_start_guest
33: mr r4, r3
li r3, 0
@@ -2821,6 +2837,7 @@ END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_HV_ASSIST, 0)
subf r3, r7, r3
mtspr SPRN_DEC, r3
+#if 1
/* Load NV GPRS */
ld r14, VCPU_GPR(R14)(r4)
ld r15, VCPU_GPR(R15)(r4)
@@ -2840,6 +2857,7 @@ END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_HV_ASSIST, 0)
ld r29, VCPU_GPR(R29)(r4)
ld r30, VCPU_GPR(R30)(r4)
ld r31, VCPU_GPR(R31)(r4)
+#endif
/* Check the wake reason in SRR1 to see why we got here */
bl kvmppc_check_wake_reason
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index cb796724a6fc..1518beec3161 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -90,6 +90,21 @@ static inline void create_shadowed_slbe(unsigned long ea, int ssize,
: "memory" );
}
+void slb_shadow_reload(void)
+{
+ struct slb_shadow *p = get_slb_shadow();
+ enum slb_index index;
+
+ for (index = 0; index < SLB_NUM_BOLTED; index++) {
+ if (be64_to_cpu(p->save_area[index].esid) & SLB_ESID_V) {
+ asm volatile("slbmte %0,%1" :
+ : "r" (be64_to_cpu(p->save_area[index].vsid)),
+ "r" (be64_to_cpu(p->save_area[index].esid)));
+ }
+ }
+ isync();
+}
+
static void __slb_flush_and_rebolt(void)
{
/* If you change this make sure you change SLB_NUM_BOLTED
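slb_shadow_reload() replaces the removed `.rept SLB_NUM_BOLTED` loop from the asm wakeup path: it walks the big-endian SLB shadow save area and issues slbmte only for entries whose ESID has the valid bit set. The sketch below models that filtering in user space, with a recording stub standing in for the slbmte instruction. SLB_NUM_BOLTED, the SLB_ESID_V value and the shadow_entry layout are assumptions for illustration, and the big-endian loads are done by hand rather than with be64_to_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLB_NUM_BOLTED 3                      /* assumed, illustrative */
#define SLB_ESID_V     0x0000000008000000ULL  /* assumed valid bit */

/* Shadow entries are stored big-endian, like struct slb_shadow. */
struct shadow_entry {
	unsigned char esid[8];
	unsigned char vsid[8];
};

static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static void store_be64(unsigned char *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = v & 0xff;
		v >>= 8;
	}
}

/* Hypothetical stand-in for slbmte: records what would be written. */
struct slb_record {
	size_t n;
	uint64_t vsid[SLB_NUM_BOLTED];
	uint64_t esid[SLB_NUM_BOLTED];
};

static void fake_slbmte(struct slb_record *rec, uint64_t vsid, uint64_t esid)
{
	assert(rec->n < SLB_NUM_BOLTED);
	rec->vsid[rec->n] = vsid;
	rec->esid[rec->n] = esid;
	rec->n++;
}

/* Same filtering as slb_shadow_reload(): only entries with V set. */
static void shadow_reload_model(const struct shadow_entry *save_area,
				struct slb_record *rec)
{
	for (int i = 0; i < SLB_NUM_BOLTED; i++) {
		uint64_t esid = load_be64(save_area[i].esid);

		if (esid & SLB_ESID_V)
			fake_slbmte(rec, load_be64(save_area[i].vsid), esid);
	}
}
```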
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 12f13acee1f6..bf85e86f1064 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -16,6 +16,7 @@
#include <linux/device.h>
#include <linux/cpu.h>
+#include <asm/asm-prototypes.h>
#include <asm/firmware.h>
#include <asm/machdep.h>
#include <asm/opal.h>
@@ -35,7 +36,7 @@
#define P9_STOP_SPR_MSR 2000
#define P9_STOP_SPR_PSSCR 855
-static u32 supported_cpuidle_states;
+static u32 pm_flags_possible; /* OPAL_PM_ flags */
/*
* The default stop state that will be used by ppc_md.power_save
@@ -46,10 +47,10 @@ static u64 pnv_default_stop_mask;
static bool default_stop_found;
/*
- * First deep stop state. Used to figure out when to save/restore
- * hypervisor context.
+ * First stop state levels when HV and TB loss can occur.
*/
-u64 pnv_first_deep_stop_state = MAX_STOP_STATE;
+static u64 pnv_first_tb_loss_level = MAX_STOP_STATE + 1;
+static u64 pnv_first_hv_loss_level = MAX_STOP_STATE + 1;
/*
* psscr value and mask of the deepest stop idle state.
@@ -60,6 +61,8 @@ static u64 pnv_deepest_stop_psscr_mask;
static u64 pnv_deepest_stop_flag;
static bool deepest_stop_found;
+static unsigned long power7_offline_type;
+
static int pnv_save_sprs_for_deep_states(void)
{
int cpu;
@@ -70,12 +73,12 @@ static int pnv_save_sprs_for_deep_states(void)
* all cpus at boot. Get these reg values of current cpu and use the
* same across all cpus.
*/
- uint64_t lpcr_val = mfspr(SPRN_LPCR);
- uint64_t hid0_val = mfspr(SPRN_HID0);
- uint64_t hid1_val = mfspr(SPRN_HID1);
- uint64_t hid4_val = mfspr(SPRN_HID4);
- uint64_t hid5_val = mfspr(SPRN_HID5);
- uint64_t hmeer_val = mfspr(SPRN_HMEER);
+ uint64_t lpcr_val = mfspr(SPRN_LPCR);
+ uint64_t hid0_val = mfspr(SPRN_HID0);
+ uint64_t hid1_val = mfspr(SPRN_HID1);
+ uint64_t hid4_val = mfspr(SPRN_HID4);
+ uint64_t hid5_val = mfspr(SPRN_HID5);
+ uint64_t hmeer_val = mfspr(SPRN_HMEER);
uint64_t msr_val = MSR_IDLE;
uint64_t psscr_val = pnv_deepest_stop_psscr_val;
@@ -135,92 +138,9 @@ static int pnv_save_sprs_for_deep_states(void)
return 0;
}
-static void pnv_alloc_idle_core_states(void)
-{
- int i, j;
- int nr_cores = cpu_nr_cores();
- u32 *core_idle_state;
-
- /*
- * core_idle_state - The lower 8 bits track the idle state of
- * each thread of the core.
- *
- * The most significant bit is the lock bit.
- *
- * Initially all the bits corresponding to threads_per_core
- * are set. They are cleared when the thread enters deep idle
- * state like sleep and winkle/stop.
- *
- * Initially the lock bit is cleared. The lock bit has 2
- * purposes:
- * a. While the first thread in the core waking up from
- * idle is restoring core state, it prevents other
- * threads in the core from switching to process
- * context.
- * b. While the last thread in the core is saving the
- * core state, it prevents a different thread from
- * waking up.
- */
- for (i = 0; i < nr_cores; i++) {
- int first_cpu = i * threads_per_core;
- int node = cpu_to_node(first_cpu);
- size_t paca_ptr_array_size;
-
- core_idle_state = kmalloc_node(sizeof(u32), GFP_KERNEL, node);
- *core_idle_state = (1 << threads_per_core) - 1;
- paca_ptr_array_size = (threads_per_core *
- sizeof(struct paca_struct *));
-
- for (j = 0; j < threads_per_core; j++) {
- int cpu = first_cpu + j;
-
- paca_ptrs[cpu]->core_idle_state_ptr = core_idle_state;
- paca_ptrs[cpu]->thread_idle_state = PNV_THREAD_RUNNING;
- paca_ptrs[cpu]->thread_mask = 1 << j;
- }
- }
-
- update_subcore_sibling_mask();
-
- if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT) {
- int rc = pnv_save_sprs_for_deep_states();
-
- if (likely(!rc))
- return;
-
- /*
- * The stop-api is unable to restore hypervisor
- * resources on wakeup from platform idle states which
- * lose full context. So disable such states.
- */
- supported_cpuidle_states &= ~OPAL_PM_LOSE_FULL_CONTEXT;
- pr_warn("cpuidle-powernv: Disabling idle states that lose full context\n");
- pr_warn("cpuidle-powernv: Idle power-savings, CPU-Hotplug affected\n");
-
- if (cpu_has_feature(CPU_FTR_ARCH_300) &&
- (pnv_deepest_stop_flag & OPAL_PM_LOSE_FULL_CONTEXT)) {
- /*
- * Use the default stop state for CPU-Hotplug
- * if available.
- */
- if (default_stop_found) {
- pnv_deepest_stop_psscr_val =
- pnv_default_stop_val;
- pnv_deepest_stop_psscr_mask =
- pnv_default_stop_mask;
- pr_warn("cpuidle-powernv: Offlined CPUs will stop with psscr = 0x%016llx\n",
- pnv_deepest_stop_psscr_val);
- } else { /* Fallback to snooze loop for CPU-Hotplug */
- deepest_stop_found = false;
- pr_warn("cpuidle-powernv: Offlined CPUs will busy wait\n");
- }
- }
- }
-}
-
u32 pnv_get_supported_cpuidle_states(void)
{
- return supported_cpuidle_states;
+ return pm_flags_possible;
}
EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
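power9_idle_stop() decides what to restore by comparing the requested level (RL) in PSSCR against the two thresholds declared above, pnv_first_hv_loss_level and pnv_first_tb_loss_level. Both default to MAX_STOP_STATE + 1, so until idle-state discovery lowers them no stop level is treated as losing state. A minimal model of that classification, assuming a 4-bit RL field and using a hypothetical classify_stop() helper, could look like this:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STOP_STATE 0xF                     /* assumed */
#define PSSCR_RL_MASK  0x000000000000000FULL   /* assumed 4-bit RL field */

enum stop_loss { STOP_NO_HV_LOSS, STOP_HV_LOSS, STOP_HV_AND_TB_LOSS };

/*
 * Hypothetical helper mirroring the threshold tests in power9_idle_stop():
 * levels below first_hv_loss keep hypervisor state; levels at or above
 * first_tb_loss additionally need a timebase resync.
 */
static enum stop_loss classify_stop(uint64_t psscr, uint64_t first_hv_loss,
				    uint64_t first_tb_loss)
{
	uint64_t rl = psscr & PSSCR_RL_MASK;

	if (rl < first_hv_loss)
		return STOP_NO_HV_LOSS;
	if (rl >= first_tb_loss)
		return STOP_HV_AND_TB_LOSS;
	return STOP_HV_LOSS;
}
```

In the real code only the first thread of the core to wake restores the per-core SPRs and resyncs the timebase; the helper captures just the level comparison.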
@@ -236,6 +156,9 @@ static void pnv_fastsleep_workaround_apply(void *info)
*err = 1;
}
+static bool power7_fastsleep_workaround_entry = true;
+static bool power7_fastsleep_workaround_exit = true;
+
/*
* Used to store fastsleep workaround state
* 0 - Workaround applied/undone at fastsleep entry/exit path (Default)
@@ -275,13 +198,7 @@ static ssize_t store_fastsleep_workaround_applyonce(struct device *dev,
* offlined, as last thread of the core entering fastsleep or deeper
* state would have applied workaround.
*/
- err = patch_instruction(
- (unsigned int *)pnv_fastsleep_workaround_at_exit,
- PPC_INST_NOP);
- if (err) {
- pr_err("fastsleep_workaround_applyonce change failed while patching pnv_fastsleep_workaround_at_exit");
- goto fail;
- }
+ power7_fastsleep_workaround_exit = false;
get_online_cpus();
primary_thread_mask = cpu_online_cores_map();
@@ -294,13 +211,7 @@ static ssize_t store_fastsleep_workaround_applyonce(struct device *dev,
goto fail;
}
- err = patch_instruction(
- (unsigned int *)pnv_fastsleep_workaround_at_entry,
- PPC_INST_NOP);
- if (err) {
- pr_err("fastsleep_workaround_applyonce change failed while patching pnv_fastsleep_workaround_at_entry");
- goto fail;
- }
+ power7_fastsleep_workaround_entry = false;
fastsleep_workaround_applyonce = 1;
@@ -313,6 +224,308 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
show_fastsleep_workaround_applyonce,
store_fastsleep_workaround_applyonce);
+static inline void atomic_start_thread_idle(void)
+{
+ int cpu = raw_smp_processor_id();
+ int first = cpu_first_thread_sibling(cpu);
+ int thread_nr = cpu_thread_in_core(cpu);
+ unsigned long *state = &paca_ptrs[first]->idle_state;
+
+ clear_bit(thread_nr, state);
+}
+
+static inline void atomic_stop_thread_idle(void)
+{
+ int cpu = raw_smp_processor_id();
+ int first = cpu_first_thread_sibling(cpu);
+ int thread_nr = cpu_thread_in_core(cpu);
+ unsigned long *state = &paca_ptrs[first]->idle_state;
+
+ set_bit(thread_nr, state);
+}
+
+static inline void atomic_lock_thread_idle(void)
+{
+ int cpu = raw_smp_processor_id();
+ int first = cpu_first_thread_sibling(cpu);
+ unsigned long *state = &paca_ptrs[first]->idle_state;
+
+ while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, state)))
+ barrier();
+ isync();
+}
+
+static inline void atomic_unlock_and_stop_thread_idle(void)
+{
+ int cpu = raw_smp_processor_id();
+ int first = cpu_first_thread_sibling(cpu);
+ unsigned long thread = 1UL << cpu_thread_in_core(cpu);
+ unsigned long *state = &paca_ptrs[first]->idle_state;
+ u64 s = READ_ONCE(*state);
+ u64 new, tmp;
+
+ isync();
+
+ BUG_ON(!(s & PNV_CORE_IDLE_LOCK_BIT));
+ BUG_ON(s & thread);
+
+again:
+ new = (s | thread) & ~PNV_CORE_IDLE_LOCK_BIT;
+ tmp = cmpxchg(state, s, new);
+ if (unlikely(tmp != s)) {
+ s = tmp;
+ goto again;
+ }
+}
+
+static inline void atomic_unlock_thread_idle(void)
+{
+ int cpu = raw_smp_processor_id();
+ int first = cpu_first_thread_sibling(cpu);
+ unsigned long *state = &paca_ptrs[first]->idle_state;
+
+ BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, state));
+ clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, state);
+}
+
+struct p7_sprs {
+ /* per core */
+ u64 tscr;
+ u64 worc;
+
+ /* per subcore */
+ u64 sdr1;
+ u64 rpr;
+ u64 amor;
+
+ /* per thread */
+ u64 lpcr;
+ u64 hfscr;
+ u64 fscr;
+ u64 purr;
+ u64 spurr;
+ u64 dscr;
+ u64 wort;
+
+ u64 mmcra;
+ u32 mmcr0;
+ u32 mmcr1;
+ u64 mmcr2;
+};
+
+static unsigned long power7_idle_insn(unsigned long type)
+{
+ int cpu = raw_smp_processor_id();
+ int first = cpu_first_thread_sibling(cpu);
+ unsigned long thread = 1UL << cpu_thread_in_core(cpu);
+ unsigned long *state = &paca_ptrs[first]->idle_state;
+ unsigned long srr1;
+ bool full_winkle;
+ struct p7_sprs sprs;
+ bool sprs_saved = false;
+ int rc;
+
+ memset(&sprs, 0, sizeof(sprs));
+
+ if (unlikely(type != PNV_THREAD_NAP)) {
+ atomic_lock_thread_idle();
+
+ BUG_ON(!(*state & thread));
+ *state &= ~thread;
+
+ if (power7_fastsleep_workaround_entry) {
+ if ((*state & ((1 << threads_per_core) - 1)) == 0) {
+ rc = opal_config_cpu_idle_state(
+ OPAL_CONFIG_IDLE_FASTSLEEP,
+ OPAL_CONFIG_IDLE_APPLY);
+ BUG_ON(rc);
+ }
+ }
+
+ if (type == PNV_THREAD_WINKLE) {
+ sprs.tscr = mfspr(SPRN_TSCR);
+ sprs.worc = mfspr(SPRN_WORC);
+
+ sprs.sdr1 = mfspr(SPRN_SDR1);
+ sprs.rpr = mfspr(SPRN_RPR);
+ sprs.amor = mfspr(SPRN_AMOR);
+
+ sprs.lpcr = mfspr(SPRN_LPCR);
+ sprs.hfscr = mfspr(SPRN_HFSCR);
+ sprs.fscr = mfspr(SPRN_FSCR);
+ sprs.purr = mfspr(SPRN_PURR);
+ sprs.spurr = mfspr(SPRN_SPURR);
+ sprs.dscr = mfspr(SPRN_DSCR);
+ sprs.wort = mfspr(SPRN_WORT);
+
+ sprs.mmcra = mfspr(SPRN_MMCRA);
+ sprs.mmcr0 = mfspr(SPRN_MMCR0);
+ sprs.mmcr1 = mfspr(SPRN_MMCR1);
+ sprs.mmcr2 = mfspr(SPRN_MMCR2);
+
+ sprs_saved = true;
+
+ /*
+ * Increment the winkle counter and, if all threads are
+ * winkling, set all winkle bits. This lets the wakeup side
+ * distinguish fast sleep state loss from full winkle state
+ * loss. Fast sleep still has to resync the timebase, so
+ * this may not be a big win.
+ */
+ *state += 1 << PNV_CORE_IDLE_WINKLE_COUNT_SHIFT;
+ if ((*state & PNV_CORE_IDLE_WINKLE_COUNT_BITS) >> PNV_CORE_IDLE_WINKLE_COUNT_SHIFT == threads_per_core)
+ *state |= PNV_CORE_IDLE_THREAD_WINKLE_BITS;
+ WARN_ON((*state & PNV_CORE_IDLE_WINKLE_COUNT_BITS) == 0);
+ }
+
+ atomic_unlock_thread_idle();
+ }
+
+ srr1 = isa206_idle_insn_mayloss(type);
+
+ WARN_ON_ONCE(!srr1);
+ WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
+
+ if (unlikely((srr1 & SRR1_WAKEMASK_P8) == SRR1_WAKEHMI))
+ hmi_exception_realmode(NULL);
+
+ if (likely((srr1 & SRR1_WAKESTATE) != SRR1_WS_HVLOSS)) {
+ if (unlikely(type != PNV_THREAD_NAP)) {
+ atomic_lock_thread_idle();
+ if (type == PNV_THREAD_WINKLE) {
+ WARN_ON((*state & PNV_CORE_IDLE_WINKLE_COUNT_BITS) == 0);
+ *state -= 1 << PNV_CORE_IDLE_WINKLE_COUNT_SHIFT;
+ *state &= ~(thread << PNV_CORE_IDLE_THREAD_WINKLE_BITS_SHIFT);
+ }
+ atomic_unlock_and_stop_thread_idle();
+ }
+ return srr1;
+ }
+
+ /* HV state loss */
+ BUG_ON(type == PNV_THREAD_NAP);
+
+ atomic_lock_thread_idle();
+
+ full_winkle = false;
+ if (type == PNV_THREAD_WINKLE) {
+ WARN_ON((*state & PNV_CORE_IDLE_WINKLE_COUNT_BITS) == 0);
+ *state -= 1 << PNV_CORE_IDLE_WINKLE_COUNT_SHIFT;
+ if (*state & (thread << PNV_CORE_IDLE_THREAD_WINKLE_BITS_SHIFT)) {
+ *state &= ~(thread << PNV_CORE_IDLE_THREAD_WINKLE_BITS_SHIFT);
+ full_winkle = true;
+ BUG_ON(!sprs_saved);
+ }
+ }
+
+ WARN_ON(*state & thread);
+
+ if ((*state & ((1 << threads_per_core) - 1)) != 0)
+ goto core_woken;
+
+ /* Per-core SPRs */
+ if (full_winkle) {
+ mtspr(SPRN_TSCR, sprs.tscr);
+ mtspr(SPRN_WORC, sprs.worc);
+ }
+
+ if (power7_fastsleep_workaround_exit) {
+ rc = opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
+ OPAL_CONFIG_IDLE_UNDO);
+ BUG_ON(rc);
+ }
+
+ /* TB */
+ if (opal_resync_timebase() != OPAL_SUCCESS)
+ BUG();
+
+core_woken:
+ if (!full_winkle)
+ goto subcore_woken;
+
+ if ((*state & local_paca->subcore_sibling_mask) != 0)
+ goto subcore_woken;
+
+ /* Per-subcore SPRs */
+ mtspr(SPRN_SDR1, sprs.sdr1);
+ mtspr(SPRN_RPR, sprs.rpr);
+ mtspr(SPRN_AMOR, sprs.amor);
+
+subcore_woken:
+ atomic_unlock_and_stop_thread_idle();
+
+ /* Fast sleep does not lose SPRs */
+ if (!full_winkle)
+ return srr1;
+
+ /* Per-thread SPRs */
+ mtspr(SPRN_LPCR, sprs.lpcr);
+ mtspr(SPRN_HFSCR, sprs.hfscr);
+ mtspr(SPRN_FSCR, sprs.fscr);
+ mtspr(SPRN_PURR, sprs.purr);
+ mtspr(SPRN_SPURR, sprs.spurr);
+ mtspr(SPRN_DSCR, sprs.dscr);
+ mtspr(SPRN_WORT, sprs.wort);
+
+ mtspr(SPRN_MMCRA, sprs.mmcra);
+ mtspr(SPRN_MMCR0, sprs.mmcr0);
+ mtspr(SPRN_MMCR1, sprs.mmcr1);
+ mtspr(SPRN_MMCR2, sprs.mmcr2);
+
+ mtspr(SPRN_SPRG3, local_paca->sprg_vdso);
+
+ slb_shadow_reload();
+
+ return srr1;
+}
+
+extern unsigned long idle_kvm_start_guest(unsigned long srr1);
+
+static unsigned long power7_offline(void)
+{
+ unsigned long srr1;
+
+ mtmsr(MSR_IDLE);
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ /* Tell KVM we're entering idle. */
+ /******************************************************/
+ /* N O T E W E L L ! ! ! N O T E W E L L */
+ /* The following store to kvm_hstate.hwthread_state */
+ /* MUST occur in real mode, i.e. with the MMU off, */
+ /* and the MMU must stay off until we clear this flag */
+ /* and test kvm_hstate.hwthread_req after */
+ /* power7_idle_insn() returns, below. */
+ /* The reason is that another thread can switch the */
+ /* MMU to a guest context whenever this flag is set */
+ /* to KVM_HWTHREAD_IN_IDLE, and if the MMU was on, */
+ /* that would potentially cause this thread to start */
+ /* executing instructions from guest memory in */
+ /* hypervisor mode, leading to a host crash or data */
+ /* corruption, or worse. */
+ /******************************************************/
+ local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_IDLE;
+#endif
+
+ __ppc64_runlatch_off();
+ srr1 = power7_idle_insn(power7_offline_type);
+ __ppc64_runlatch_on();
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ if (local_paca->kvm_hstate.hwthread_state != KVM_HWTHREAD_IN_KERNEL) {
+ local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_KERNEL;
+ /* Order setting hwthread_state vs. testing hwthread_req */
+ smp_mb();
+ }
+ if (local_paca->kvm_hstate.hwthread_req)
+ srr1 = idle_kvm_start_guest(srr1);
+#endif
+
+ mtmsr(MSR_KERNEL);
+
+ return srr1;
+}
+
static unsigned long __power7_idle_type(unsigned long type)
{
unsigned long srr1;
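The core idle_state word reached through paca_ptrs[first] combines per-thread running bits with a lock bit. The helpers above take the lock with test_and_set_bit_lock() and drop it with clear_bit_unlock(). On the wakeup path they set the thread's bit and release the lock in a single cmpxchg() loop, so the thread becomes visible as running in the same update that drops the lock. The following is a hedged user-space sketch of the same protocol using C11 `<stdatomic.h>` rather than the kernel's bitops. The CORE_IDLE_LOCK_BIT value and the function names are assumptions. The sketch uses release ordering for the unlocking compare-exchange, which is weaker than the fully ordered kernel cmpxchg() but sufficient for an unlock, and it does not model the powerpc isync().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CORE_IDLE_LOCK_BIT 0x10000000UL   /* assumed value */

/* Models paca_ptrs[first]->idle_state for one core. */
typedef _Atomic unsigned long core_idle_state_t;

/* Like atomic_start_thread_idle(): clear our running bit. */
static void start_thread_idle(core_idle_state_t *state, unsigned long thread)
{
	atomic_fetch_and_explicit(state, ~thread, memory_order_relaxed);
}

/* Like test_and_set_bit_lock(): spin until we set the lock bit. */
static void lock_thread_idle(core_idle_state_t *state)
{
	while (atomic_fetch_or_explicit(state, CORE_IDLE_LOCK_BIT,
					memory_order_acquire) & CORE_IDLE_LOCK_BIT)
		;
}

/* Like clear_bit_unlock(): drop the lock with release ordering. */
static void unlock_thread_idle(core_idle_state_t *state)
{
	assert(atomic_load_explicit(state, memory_order_relaxed) &
	       CORE_IDLE_LOCK_BIT);
	atomic_fetch_and_explicit(state, ~CORE_IDLE_LOCK_BIT,
				  memory_order_release);
}

/* Set our running bit and drop the lock in one atomic update. */
static void unlock_and_stop_thread_idle(core_idle_state_t *state,
					unsigned long thread)
{
	unsigned long s = atomic_load_explicit(state, memory_order_relaxed);
	unsigned long new;

	assert(s & CORE_IDLE_LOCK_BIT);
	assert(!(s & thread));
	do {
		new = (s | thread) & ~CORE_IDLE_LOCK_BIT;
	} while (!atomic_compare_exchange_weak_explicit(state, &s, new,
				memory_order_release, memory_order_relaxed));
}

/* True if no sibling is running yet, i.e. we must restore core state. */
static bool first_thread_to_wake(unsigned long s, int threads_per_core)
{
	return (s & ((1UL << threads_per_core) - 1)) == 0;
}
```

first_thread_to_wake() mirrors the `(*state & ((1 << threads_per_core) - 1)) == 0` test that decides whether a waking thread must restore per-core state while holding the lock.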
@@ -320,9 +533,11 @@ static unsigned long __power7_idle_type(unsigned long type)
if (!prep_irq_for_idle_irqsoff())
return 0;
+ mtmsr(MSR_IDLE);
__ppc64_runlatch_off();
srr1 = power7_idle_insn(type);
__ppc64_runlatch_on();
+ mtmsr(MSR_KERNEL);
fini_irq_for_idle_irqsoff();
@@ -345,6 +560,244 @@ void power7_idle(void)
power7_idle_type(PNV_THREAD_NAP);
}
+struct p9_sprs {
+ /* per core */
+ u64 ptcr;
+ u64 rpr;
+ u64 tscr;
+ u64 ldbar;
+ u64 amor;
+
+ /* per thread */
+ u64 lpcr;
+ u64 hfscr;
+ u64 fscr;
+ u64 pid;
+ u64 purr;
+ u64 spurr;
+ u64 dscr;
+ u64 wort;
+
+ u64 mmcra;
+ u32 mmcr0;
+ u32 mmcr1;
+ u64 mmcr2;
+};
+
+static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
+{
+ int cpu = raw_smp_processor_id();
+ int first = cpu_first_thread_sibling(cpu);
+ unsigned long *state = &paca_ptrs[first]->idle_state;
+ unsigned long srr1;
+ unsigned long mmcr0 = 0;
+ struct p9_sprs sprs;
+ bool sprs_saved = false;
+
+ memset(&sprs, 0, sizeof(sprs));
+
+ if (!(psscr & (PSSCR_EC|PSSCR_ESL))) {
+ BUG_ON(!mmu_on);
+
+ /*
+ * Wake synchronously. SRESET via xscom may still cause
+ * a 0x100 powersave wakeup with SRR1 reason!
+ */
+ srr1 = isa3_idle_stop_noloss(psscr);
+ if (likely(!srr1))
+ return 0;
+
+ /*
+ * Registers were not saved, so this cannot be recovered!
+ * Reaching here would indicate a hardware bug.
+ */
+ BUG_ON((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS);
+
+ goto out;
+ }
+
+ /* EC=ESL=1 case */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ if (cpu_has_feature(CPU_FTR_P9_TM_XER_SO_BUG)) {
+ local_paca->requested_psscr = psscr;
+ /* order setting requested_psscr vs testing dont_stop */
+ smp_mb();
+ if (atomic_read(&local_paca->dont_stop)) {
+ local_paca->requested_psscr = 0;
+ return 0;
+ }
+ }
+#endif
+
+ if (!cpu_has_feature(CPU_FTR_POWER9_DD2_1)) {
+ /*
+ * POWER9 DD2.0 and earlier can incorrectly set PMAO when
+ * waking up after a state-loss idle. Saving and restoring
+ * MMCR0 over idle is a workaround.
+ */
+ mmcr0 = mfspr(SPRN_MMCR0);
+ }
+ if ((psscr & PSSCR_RL_MASK) >= pnv_first_hv_loss_level) {
+ sprs.lpcr = mfspr(SPRN_LPCR);
+ sprs.hfscr = mfspr(SPRN_HFSCR);
+ sprs.fscr = mfspr(SPRN_FSCR);
+ sprs.pid = mfspr(SPRN_PID);
+ sprs.purr = mfspr(SPRN_PURR);
+ sprs.spurr = mfspr(SPRN_SPURR);
+ sprs.dscr = mfspr(SPRN_DSCR);
+ sprs.wort = mfspr(SPRN_WORT);
+
+ sprs.mmcra = mfspr(SPRN_MMCRA);
+ sprs.mmcr0 = mfspr(SPRN_MMCR0);
+ sprs.mmcr1 = mfspr(SPRN_MMCR1);
+ sprs.mmcr2 = mfspr(SPRN_MMCR2);
+
+ sprs.ptcr = mfspr(SPRN_PTCR);
+ sprs.rpr = mfspr(SPRN_RPR);
+ sprs.tscr = mfspr(SPRN_TSCR);
+ sprs.ldbar = mfspr(SPRN_LDBAR);
+ sprs.amor = mfspr(SPRN_AMOR);
+
+ sprs_saved = true;
+
+ atomic_start_thread_idle();
+ }
+
+ srr1 = isa3_idle_stop_mayloss(psscr);
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ local_paca->requested_psscr = 0;
+#endif
+
+ psscr = mfspr(SPRN_PSSCR);
+
+ WARN_ON_ONCE(!srr1);
+ WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
+
+ if ((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS) {
+ unsigned long mmcra;
+
+ /*
+ * Workaround for POWER9 DD2.0, if we lost resources, the ERAT
+ * might have been corrupted and needs flushing. We also need
+ * to reload MMCR0 (see mmcr0 comment above).
+ */
+ if (!cpu_has_feature(CPU_FTR_POWER9_DD2_1)) {
+ asm volatile(PPC_INVALIDATE_ERAT);
+ mtspr(SPRN_MMCR0, mmcr0);
+ }
+
+ /*
+ * DD2.2 and earlier need to set then clear bit 60 in MMCRA
+ * to ensure the PMU starts running.
+ */
+ mmcra = mfspr(SPRN_MMCRA);
+ mmcra |= PPC_BIT(60);
+ mtspr(SPRN_MMCRA, mmcra);
+ mmcra &= ~PPC_BIT(60);
+ mtspr(SPRN_MMCRA, mmcra);
+ }
+
+ if (unlikely((srr1 & SRR1_WAKEMASK_P8) == SRR1_WAKEHMI))
+ hmi_exception_realmode(NULL);
+
+ /*
+ * On POWER9, the SRR1 wake-state bits do not reliably reflect
+ * the extent of state loss: SRR1_WS_GPRLOSS (10b) can also result
+ * in SPR loss. So always use the PSSCR level to detect HV state loss.
+ */
+ if (likely((psscr & PSSCR_RL_MASK) < pnv_first_hv_loss_level)) {
+ if (sprs_saved)
+ atomic_stop_thread_idle();
+ goto out;
+ }
+
+ /* HV state loss */
+ BUG_ON(!sprs_saved);
+
+ atomic_lock_thread_idle();
+
+ if ((*state & ((1 << threads_per_core) - 1)) != 0)
+ goto core_woken;
+
+ /* Per-core SPRs */
+ mtspr(SPRN_PTCR, sprs.ptcr);
+ mtspr(SPRN_RPR, sprs.rpr);
+ mtspr(SPRN_TSCR, sprs.tscr);
+ mtspr(SPRN_LDBAR, sprs.ldbar);
+ mtspr(SPRN_AMOR, sprs.amor);
+
+ if ((psscr & PSSCR_RL_MASK) >= pnv_first_tb_loss_level) {
+ /* TB loss */
+ if (opal_resync_timebase() != OPAL_SUCCESS)
+ BUG();
+ }
+
+core_woken:
+ atomic_unlock_and_stop_thread_idle();
+
+ /* Per-thread SPRs */
+ mtspr(SPRN_LPCR, sprs.lpcr);
+ mtspr(SPRN_HFSCR, sprs.hfscr);
+ mtspr(SPRN_FSCR, sprs.fscr);
+ mtspr(SPRN_PID, sprs.pid);
+ mtspr(SPRN_PURR, sprs.purr);
+ mtspr(SPRN_SPURR, sprs.spurr);
+ mtspr(SPRN_DSCR, sprs.dscr);
+ mtspr(SPRN_WORT, sprs.wort);
+
+ mtspr(SPRN_MMCRA, sprs.mmcra);
+ mtspr(SPRN_MMCR0, sprs.mmcr0);
+ mtspr(SPRN_MMCR1, sprs.mmcr1);
+ mtspr(SPRN_MMCR2, sprs.mmcr2);
+
+ mtspr(SPRN_SPRG3, local_paca->sprg_vdso);
+
+ if (!radix_enabled())
+ slb_shadow_reload();
+
+out:
+ if (mmu_on)
+ mtmsr(MSR_KERNEL);
+
+ return srr1;
+}
+
+static unsigned long power9_offline_stop(unsigned long psscr)
+{
+ unsigned long srr1;
+
+#ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ __ppc64_runlatch_off();
+ srr1 = power9_idle_stop(psscr, true);
+ __ppc64_runlatch_on();
+#else
+ /*
+ * Tell KVM we're entering idle.
+ * This does not have to be done in real mode because the P9 MMU
+ * is independent per-thread. Some steppings share radix/hash mode
+ * between threads, but in that case KVM has a barrier sync in real
+ * mode before and after switching between radix and hash.
+ */
+ local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_IDLE;
+
+ __ppc64_runlatch_off();
+ srr1 = power9_idle_stop(psscr, false);
+ __ppc64_runlatch_on();
+
+ if (local_paca->kvm_hstate.hwthread_state != KVM_HWTHREAD_IN_KERNEL) {
+ local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_KERNEL;
+ /* Order setting hwthread_state vs. testing hwthread_req */
+ smp_mb();
+ }
+ if (local_paca->kvm_hstate.hwthread_req)
+ srr1 = idle_kvm_start_guest(srr1);
+ mtmsr(MSR_KERNEL);
+#endif
+
+ return srr1;
+}
+
static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
unsigned long stop_psscr_mask)
{
@@ -358,7 +811,7 @@ static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
__ppc64_runlatch_off();
- srr1 = power9_idle_stop(psscr);
+ srr1 = power9_idle_stop(psscr, true);
__ppc64_runlatch_on();
fini_irq_for_idle_irqsoff();
@@ -407,7 +860,7 @@ void pnv_power9_force_smt4_catch(void)
atomic_inc(&paca_ptrs[cpu0+thr]->dont_stop);
}
/* order setting dont_stop vs testing requested_psscr */
- mb();
+ smp_mb();
for (thr = 0; thr < threads_per_core; ++thr) {
if (!paca_ptrs[cpu0+thr]->requested_psscr)
++awake_threads;
@@ -466,7 +919,7 @@ static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
* Program the LPCR via stop-api only if the deepest stop state
* can lose hypervisor context.
*/
- if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
+ if (pm_flags_possible & OPAL_PM_LOSE_FULL_CONTEXT)
opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
}
@@ -478,7 +931,6 @@ static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
unsigned long pnv_cpu_offline(unsigned int cpu)
{
unsigned long srr1;
- u32 idle_states = pnv_get_supported_cpuidle_states();
u64 lpcr_val;
/*
@@ -503,15 +955,8 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
pnv_deepest_stop_psscr_val;
srr1 = power9_offline_stop(psscr);
-
- } else if ((idle_states & OPAL_PM_WINKLE_ENABLED) &&
- (idle_states & OPAL_PM_LOSE_FULL_CONTEXT)) {
- srr1 = power7_idle_insn(PNV_THREAD_WINKLE);
- } else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
- (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
- srr1 = power7_idle_insn(PNV_THREAD_SLEEP);
- } else if (idle_states & OPAL_PM_NAP_ENABLED) {
- srr1 = power7_idle_insn(PNV_THREAD_NAP);
+ } else if (cpu_has_feature(CPU_FTR_ARCH_206) && power7_offline_type) {
+ srr1 = power7_offline();
} else {
/* This is the fallback method. We emulate snooze */
while (!generic_check_cpu_restart(cpu)) {
@@ -623,7 +1068,8 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
u64 *psscr_val = NULL;
u64 *psscr_mask = NULL;
u32 *residency_ns = NULL;
- u64 max_residency_ns = 0;
+ u64 max_deep_residency_ns = 0;
+ u64 max_default_residency_ns = 0;
int rc = 0, i;
psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
@@ -661,26 +1107,32 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
}
/*
- * Set pnv_first_deep_stop_state, pnv_deepest_stop_psscr_{val,mask},
- * and the pnv_default_stop_{val,mask}.
- *
- * pnv_first_deep_stop_state should be set to the first stop
- * level to cause hypervisor state loss.
- *
* pnv_deepest_stop_{val,mask} should be set to values corresponding to
* the deepest stop state.
*
* pnv_default_stop_{val,mask} should be set to values corresponding to
- * the shallowest (OPAL_PM_STOP_INST_FAST) loss-less stop state.
+ * the deepest loss-less (OPAL_PM_STOP_INST_FAST) stop state.
*/
- pnv_first_deep_stop_state = MAX_STOP_STATE;
+ pnv_first_tb_loss_level = MAX_STOP_STATE + 1;
+ pnv_first_hv_loss_level = MAX_STOP_STATE + 1;
for (i = 0; i < dt_idle_states; i++) {
int err;
u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
- if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
- (pnv_first_deep_stop_state > psscr_rl))
- pnv_first_deep_stop_state = psscr_rl;
+ /*
+ * Deep state idle entry/exit does not optimize states that
+ * lose timebase but not other SPRs, it always restores all
+ * SPRs for any HV loss level. POWER9 does not have any
+ * states where this applies.
+ */
+ if ((flags[i] & (OPAL_PM_LOSE_FULL_CONTEXT |
+ OPAL_PM_TIMEBASE_STOP)) &&
+ (pnv_first_hv_loss_level > psscr_rl))
+ pnv_first_hv_loss_level = psscr_rl;
+
+ if ((flags[i] & OPAL_PM_TIMEBASE_STOP) &&
+ (pnv_first_tb_loss_level > psscr_rl))
+ pnv_first_tb_loss_level = psscr_rl;
err = validate_psscr_val_mask(&psscr_val[i], &psscr_mask[i],
flags[i]);
@@ -689,19 +1141,21 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
continue;
}
- if (max_residency_ns < residency_ns[i]) {
- max_residency_ns = residency_ns[i];
+ if (max_deep_residency_ns < residency_ns[i]) {
+ max_deep_residency_ns = residency_ns[i];
pnv_deepest_stop_psscr_val = psscr_val[i];
pnv_deepest_stop_psscr_mask = psscr_mask[i];
pnv_deepest_stop_flag = flags[i];
deepest_stop_found = true;
}
- if (!default_stop_found &&
+ if (max_default_residency_ns < residency_ns[i] &&
(flags[i] & OPAL_PM_STOP_INST_FAST)) {
+ max_default_residency_ns = residency_ns[i];
pnv_default_stop_val = psscr_val[i];
pnv_default_stop_mask = psscr_mask[i];
default_stop_found = true;
+ WARN_ON(flags[i] & OPAL_PM_LOSE_FULL_CONTEXT);
}
}
@@ -721,15 +1175,48 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
pnv_deepest_stop_psscr_mask);
}
- pr_info("cpuidle-powernv: Requested Level (RL) value of first deep stop = 0x%llx\n",
- pnv_first_deep_stop_state);
+ pr_info("cpuidle-powernv: First stop level that may lose SPRs = 0x%llx\n",
+ pnv_first_hv_loss_level);
+
+ pr_info("cpuidle-powernv: First stop level that may lose timebase = 0x%llx\n",
+ pnv_first_tb_loss_level);
out:
kfree(psscr_val);
kfree(psscr_mask);
kfree(residency_ns);
+
return rc;
}
+static void __init pnv_disable_deep_states(void)
+{
+ /*
+ * The stop-api is unable to restore hypervisor
+ * resources on wakeup from platform idle states which
+ * lose full context. So disable such states.
+ */
+ pm_flags_possible &= ~OPAL_PM_LOSE_FULL_CONTEXT;
+ pr_warn("cpuidle-powernv: Disabling idle states that lose full context\n");
+ pr_warn("cpuidle-powernv: Idle power-savings, CPU-Hotplug affected\n");
+
+ if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+ (pnv_deepest_stop_flag & OPAL_PM_LOSE_FULL_CONTEXT)) {
+ /*
+ * Use the default stop state for CPU-Hotplug
+ * if available.
+ */
+ if (default_stop_found) {
+ pnv_deepest_stop_psscr_val = pnv_default_stop_val;
+ pnv_deepest_stop_psscr_mask = pnv_default_stop_mask;
+ pr_warn("cpuidle-powernv: Offlined CPUs will stop with psscr = 0x%016llx\n",
+ pnv_deepest_stop_psscr_val);
+ } else { /* Fallback to snooze loop for CPU-Hotplug */
+ deepest_stop_found = false;
+ pr_warn("cpuidle-powernv: Offlined CPUs will busy wait\n");
+ }
+ }
+}
+
/*
* Probe device tree for supported idle states
*/
@@ -766,42 +1253,78 @@ static void __init pnv_probe_idle_states(void)
}
for (i = 0; i < dt_idle_states; i++)
- supported_cpuidle_states |= flags[i];
+ pm_flags_possible |= flags[i];
out:
kfree(flags);
}
+
static int __init pnv_init_idle_states(void)
{
+ int cpu;
- supported_cpuidle_states = 0;
+ /* Set up PACA fields */
+ for_each_present_cpu(cpu) {
+ struct paca_struct *p = paca_ptrs[cpu];
+
+ p->idle_state = 0;
+ if (cpu == cpu_first_thread_sibling(cpu))
+ p->idle_state = (1 << threads_per_core) - 1;
+
+ if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+ /* P7/P8 nap */
+ p->thread_idle_state = PNV_THREAD_RUNNING;
+ p->thread_mask = 1 << cpu_thread_in_core(cpu);
+ } else {
+ /* P9 stop */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ p->requested_psscr = 0;
+ atomic_set(&p->dont_stop, 0);
+#endif
+ }
+ }
+
+ pm_flags_possible = 0;
if (cpuidle_disable != IDLE_NO_OVERRIDE)
goto out;
pnv_probe_idle_states();
- if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
- patch_instruction(
- (unsigned int *)pnv_fastsleep_workaround_at_entry,
- PPC_INST_NOP);
- patch_instruction(
- (unsigned int *)pnv_fastsleep_workaround_at_exit,
- PPC_INST_NOP);
- } else {
- /*
- * OPAL_PM_SLEEP_ENABLED_ER1 is set. It indicates that
- * workaround is needed to use fastsleep. Provide sysfs
- * control to choose how this workaround has to be applied.
- */
- device_create_file(cpu_subsys.dev_root,
+ if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+ if (!(pm_flags_possible & OPAL_PM_SLEEP_ENABLED_ER1)) {
+ power7_fastsleep_workaround_entry = false;
+ power7_fastsleep_workaround_exit = false;
+ } else {
+ /*
+ * OPAL_PM_SLEEP_ENABLED_ER1 is set. It indicates that
+ * workaround is needed to use fastsleep. Provide sysfs
+ * control to choose how this workaround has to be
+ * applied.
+ */
+ device_create_file(cpu_subsys.dev_root,
&dev_attr_fastsleep_workaround_applyonce);
- }
+ }
- pnv_alloc_idle_core_states();
+ update_subcore_sibling_mask();
+
+ if (pm_flags_possible & OPAL_PM_NAP_ENABLED) {
+ ppc_md.power_save = power7_idle;
+ power7_offline_type = PNV_THREAD_NAP;
+ }
- if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
- ppc_md.power_save = power7_idle;
+ if ((pm_flags_possible & OPAL_PM_WINKLE_ENABLED) &&
+ (pm_flags_possible & OPAL_PM_LOSE_FULL_CONTEXT))
+ power7_offline_type = PNV_THREAD_WINKLE;
+ else if ((pm_flags_possible & OPAL_PM_SLEEP_ENABLED) ||
+ (pm_flags_possible & OPAL_PM_SLEEP_ENABLED_ER1))
+ power7_offline_type = PNV_THREAD_SLEEP;
+ }
+
+ if (pm_flags_possible & OPAL_PM_LOSE_FULL_CONTEXT) {
+ if (pnv_save_sprs_for_deep_states())
+ pnv_disable_deep_states();
+ }
out:
return 0;
diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
index 45563004feda..1d7a9fd30dd1 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -183,7 +183,7 @@ static void unsplit_core(void)
cpu = smp_processor_id();
if (cpu_thread_in_core(cpu) != 0) {
while (mfspr(SPRN_HID0) & mask)
- power7_idle_insn(PNV_THREAD_NAP);
+ power7_idle_type(PNV_THREAD_NAP);
per_cpu(split_state, cpu).step = SYNC_STEP_UNSPLIT;
return;
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fdc43d5ccb42..4fcae8e8e741 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2417,7 +2417,6 @@ static void dump_one_paca(int cpu)
DUMP(p, irq_happened, "%#-*x");
DUMP(p, io_sync, "%#-*x");
DUMP(p, irq_work_pending, "%#-*x");
- DUMP(p, nap_state_lost, "%#-*x");
DUMP(p, sprg_vdso, "%#-*llx");
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
@@ -2425,19 +2424,17 @@ static void dump_one_paca(int cpu)
#endif
#ifdef CONFIG_PPC_POWERNV
- DUMP(p, core_idle_state_ptr, "%-*px");
- DUMP(p, thread_idle_state, "%#-*x");
- DUMP(p, thread_mask, "%#-*x");
- DUMP(p, subcore_sibling_mask, "%#-*x");
- DUMP(p, requested_psscr, "%#-*llx");
- DUMP(p, stop_sprs.pid, "%#-*llx");
- DUMP(p, stop_sprs.ldbar, "%#-*llx");
- DUMP(p, stop_sprs.fscr, "%#-*llx");
- DUMP(p, stop_sprs.hfscr, "%#-*llx");
- DUMP(p, stop_sprs.mmcr1, "%#-*llx");
- DUMP(p, stop_sprs.mmcr2, "%#-*llx");
- DUMP(p, stop_sprs.mmcra, "%#-*llx");
- DUMP(p, dont_stop.counter, "%#-*x");
+ DUMP(p, idle_state, "%#-*lx");
+ if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+ DUMP(p, thread_idle_state, "%#-*x");
+ DUMP(p, thread_mask, "%#-*x");
+ DUMP(p, subcore_sibling_mask, "%#-*x");
+ } else {
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+ DUMP(p, requested_psscr, "%#-*llx");
+ DUMP(p, dont_stop.counter, "%#-*x");
+#endif
+ }
#endif
DUMP(p, accounting.utime, "%#-*lx");
^ permalink raw reply related
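The HV-loss wakeup path above relies on a per-core word (`idle_state` in the first thread sibling's paca) in which a thread's bit is set while it is awake: `pnv_init_idle_states()` starts it at `(1 << threads_per_core) - 1`, and after HV state loss the waking thread takes a lock and checks whether any thread bit is set to decide whether it must restore the per-core SPRs (PTCR, RPR, TSCR, LDBAR, AMOR) and resync the timebase itself. The bodies of `atomic_start_thread_idle()`, `atomic_lock_thread_idle()` and `atomic_unlock_and_stop_thread_idle()` are not in this excerpt, so the following is only a user-space sketch of that protocol with C11 atomics; the four-thread core, the lock-bit position and every function name are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed geometry for the sketch: 4 threads per core, lock in bit 31. */
#define THREADS_PER_CORE 4
#define THREAD_BITS ((1UL << THREADS_PER_CORE) - 1)
#define LOCK_BIT (1UL << 31)

/* Per-core word: bit N set means thread N is awake. All threads start
 * awake, mirroring the (1 << threads_per_core) - 1 initialisation. */
static atomic_ulong core_idle_state = THREAD_BITS;

/* Entering a state-loss idle: mark this thread as no longer awake. */
static void start_thread_idle(int thread)
{
	atomic_fetch_and(&core_idle_state, ~(1UL << thread));
}

/* Spin until this caller is the one that set the lock bit. */
static void lock_thread_idle(void)
{
	while (atomic_fetch_or(&core_idle_state, LOCK_BIT) & LOCK_BIT)
		;
}

/* Set this thread's awake bit and drop the lock in one atomic update. */
static void unlock_and_stop_thread_idle(int thread)
{
	unsigned long old = atomic_load(&core_idle_state);
	unsigned long new;

	assert(old & LOCK_BIT);           /* caller must hold the lock */
	assert(!(old & (1UL << thread))); /* and must still be marked idle */
	do {
		new = (old | (1UL << thread)) & ~LOCK_BIT;
	} while (!atomic_compare_exchange_weak(&core_idle_state, &old, new));
}

/* Wakeup after HV state loss. Returns true if this thread was the first
 * in the core to wake, and therefore restored the per-core state. */
static bool wake_after_hv_loss(int thread)
{
	bool first;

	lock_thread_idle();
	first = (atomic_load(&core_idle_state) & THREAD_BITS) == 0;
	if (first) {
		/* Restore per-core SPRs (and resync the timebase if lost). */
	}
	unlock_and_stop_thread_idle(thread);
	return first;
}
```

Holding the lock across the check and the restore is what prevents a second waking thread from seeing the first thread's bit already set and continuing before the per-core SPRs have actually been restored.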
* [PATCH] powerpc/64s/radix: Fix missing global invalidations when removing copro
From: Frederic Barrat @ 2018-07-31 13:24 UTC (permalink / raw)
To: linuxppc-dev, vaibhav, npiggin; +Cc: felix, clombard
With the optimizations for TLB invalidation from commit 0cef77c7798a
("powerpc/64s/radix: flush remote CPUs out of single-threaded
mm_cpumask"), the scope of a TLBI (global vs. local) can now be
influenced by the value of the 'copros' counter of the memory context.
When calling mm_context_remove_copro(), the 'copros' counter is
decremented before flushing. This may have the unintended side
effect of sending local TLBIs when we explicitly need global
invalidations in this case, thus breaking any nMMU user in a bad and
unpredictable way.
Fix it by flushing first, before updating the 'copros' counter, so
that invalidations will be global.
Fixes: 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
arch/powerpc/include/asm/mmu_context.h | 33 ++++++++++++++++----------
1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 79d570cbf332..b2f89b621b15 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -143,24 +143,33 @@ static inline void mm_context_remove_copro(struct mm_struct *mm)
{
int c;
- c = atomic_dec_if_positive(&mm->context.copros);
-
- /* Detect imbalance between add and remove */
- WARN_ON(c < 0);
-
/*
- * Need to broadcast a global flush of the full mm before
- * decrementing active_cpus count, as the next TLBI may be
- * local and the nMMU and/or PSL need to be cleaned up.
- * Should be rare enough so that it's acceptable.
+ * When removing the last copro, we need to broadcast a global
+ * flush of the full mm, as the next TLBI may be local and the
+ * nMMU and/or PSL need to be cleaned up.
+ *
+ * Both the 'copros' and 'active_cpus' counts are looked at in
+ * flush_all_mm() to determine the scope (local/global) of the
+ * TLBIs, so we need to flush first before decrementing
+ * 'copros'. If this API is used by several callers for the
+ * same context, it can lead to over-flushing. It's hopefully
+ * not common enough to be a problem.
*
* Skip on hash, as we don't know how to do the proper flush
* for the time being. Invalidations will remain global if
- * used on hash.
+ * used on hash. Note that we can't drop 'copros' either, as
+ * it could make some invalidations local with no flush
+ * in-between.
*/
- if (c == 0 && radix_enabled()) {
+ if (radix_enabled()) {
flush_all_mm(mm);
- dec_mm_active_cpus(mm);
+
+ c = atomic_dec_if_positive(&mm->context.copros);
+ /* Detect imbalance between add and remove */
+ WARN_ON(c < 0);
+
+ if (c == 0)
+ dec_mm_active_cpus(mm);
}
}
#else
--
2.17.1
^ permalink raw reply related
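To see why the order matters, here is a deliberately simplified model of the scope decision described in the new comment: `flush_all_mm()` looks at 'copros' and 'active_cpus', and since commit 0cef77c7798a a single-threaded mm may be flushed locally. The struct, the scope rule and the function names below are hypothetical, and the real radix TLB flush code has more cases; the model only shows that decrementing 'copros' before the flush can turn the final flush local, while flushing first keeps it global.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counters that influence flush scope. */
struct mm_model {
	int copros;           /* attached nMMU/PSL users */
	int active_cpus;      /* CPUs, plus one while any copro is attached */
	bool single_threaded; /* the case optimised by commit 0cef77c7798a */
};

enum scope { SCOPE_NONE, SCOPE_LOCAL, SCOPE_GLOBAL };

/* Simplified, assumed scope rule for a full-mm flush. */
static enum scope flush_all_mm_scope(const struct mm_model *mm)
{
	assert(mm->copros >= 0);
	if (mm->copros > 0)
		return SCOPE_GLOBAL;  /* the nMMU only sees broadcast TLBIs */
	if (mm->single_threaded)
		return SCOPE_LOCAL;   /* remote CPUs trimmed, flush locally */
	return mm->active_cpus > 1 ? SCOPE_GLOBAL : SCOPE_LOCAL;
}

/* Before the fix: decrement 'copros', then flush if it reached zero. */
static enum scope remove_copro_old(struct mm_model *mm)
{
	enum scope s = SCOPE_NONE;

	if (mm->copros > 0 && --mm->copros == 0) {
		s = flush_all_mm_scope(mm);
		mm->active_cpus--;
	}
	return s;
}

/* After the fix: flush while 'copros' still counts this user. */
static enum scope remove_copro_new(struct mm_model *mm)
{
	enum scope s = flush_all_mm_scope(mm);

	if (mm->copros > 0 && --mm->copros == 0)
		mm->active_cpus--;
	return s;
}
```

As the new comment in the patch notes, the price is that every removal now flushes, not only the last one, which can over-flush when several callers share a context.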
* Re: [PATCH 08/20] powerpc/dma: remove the unused dma_nommu_ops export
From: Christoph Hellwig @ 2018-07-31 12:16 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Tony Luck, Fenghua Yu
Cc: linuxppc-dev, iommu, linux-ia64, Robin Murphy,
Konrad Rzeszutek Wilk
In-Reply-To: <20180730163824.10064-9-hch@lst.de>
It turns out cxl actually uses it. So for now skip this patch,
although random code in drivers messing with dma ops will need to
be sorted out sooner or later.
^ permalink raw reply
* [PATCH v2 2/2] selftests/powerpc: Add more version checks to alignment_handler test
From: Michael Ellerman @ 2018-07-31 12:08 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, andrew.donnellan
In-Reply-To: <20180731120842.32715-1-mpe@ellerman.id.au>
The alignment_handler is documented to only work on Power8/Power9, but
we can make it run on older CPUs by guarding more of the tests with
feature checks.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
.../powerpc/alignment/alignment_handler.c | 67 +++++++++++++++++++---
1 file changed, 59 insertions(+), 8 deletions(-)
v2: Don't incorrectly duplicate any of the tests, as noticed by @ajd.
diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0eddd16af49f..169a8b9719fb 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -49,6 +49,8 @@
#include <setjmp.h>
#include <signal.h>
+#include <asm/cputable.h>
+
#include "utils.h"
int bufsize;
@@ -289,6 +291,7 @@ int test_alignment_handler_vsx_206(void)
int rc = 0;
SKIP_IF(!can_open_fb0());
+ SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
printf("VSX: 2.06B\n");
LOAD_VSX_XFORM_TEST(lxvd2x);
@@ -306,6 +309,7 @@ int test_alignment_handler_vsx_207(void)
int rc = 0;
SKIP_IF(!can_open_fb0());
+ SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_2_07));
printf("VSX: 2.07B\n");
LOAD_VSX_XFORM_TEST(lxsspx);
@@ -380,7 +384,6 @@ int test_alignment_handler_integer(void)
LOAD_DFORM_TEST(ldu);
LOAD_XFORM_TEST(ldx);
LOAD_XFORM_TEST(ldux);
- LOAD_XFORM_TEST(ldbrx);
LOAD_DFORM_TEST(lmw);
STORE_DFORM_TEST(stb);
STORE_XFORM_TEST(stbx);
@@ -400,8 +403,23 @@ int test_alignment_handler_integer(void)
STORE_XFORM_TEST(stdx);
STORE_DFORM_TEST(stdu);
STORE_XFORM_TEST(stdux);
- STORE_XFORM_TEST(stdbrx);
STORE_DFORM_TEST(stmw);
+
+ return rc;
+}
+
+int test_alignment_handler_integer_206(void)
+{
+ int rc = 0;
+
+ SKIP_IF(!can_open_fb0());
+ SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+ printf("Integer: 2.06\n");
+
+ LOAD_XFORM_TEST(ldbrx);
+ STORE_XFORM_TEST(stdbrx);
+
return rc;
}
@@ -410,6 +428,7 @@ int test_alignment_handler_vmx(void)
int rc = 0;
SKIP_IF(!can_open_fb0());
+ SKIP_IF(!have_hwcap(PPC_FEATURE_HAS_ALTIVEC));
printf("VMX\n");
LOAD_VMX_XFORM_TEST(lvx);
@@ -441,20 +460,14 @@ int test_alignment_handler_fp(void)
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);
- LOAD_FLOAT_DFORM_TEST(lfdp);
- LOAD_FLOAT_XFORM_TEST(lfdpx);
LOAD_FLOAT_DFORM_TEST(lfdu);
LOAD_FLOAT_XFORM_TEST(lfdux);
LOAD_FLOAT_DFORM_TEST(lfs);
LOAD_FLOAT_XFORM_TEST(lfsx);
LOAD_FLOAT_DFORM_TEST(lfsu);
LOAD_FLOAT_XFORM_TEST(lfsux);
- LOAD_FLOAT_XFORM_TEST(lfiwzx);
- LOAD_FLOAT_XFORM_TEST(lfiwax);
STORE_FLOAT_DFORM_TEST(stfd);
STORE_FLOAT_XFORM_TEST(stfdx);
- STORE_FLOAT_DFORM_TEST(stfdp);
- STORE_FLOAT_XFORM_TEST(stfdpx);
STORE_FLOAT_DFORM_TEST(stfdu);
STORE_FLOAT_XFORM_TEST(stfdux);
STORE_FLOAT_DFORM_TEST(stfs);
@@ -466,6 +479,38 @@ int test_alignment_handler_fp(void)
return rc;
}
+int test_alignment_handler_fp_205(void)
+{
+ int rc = 0;
+
+ SKIP_IF(!can_open_fb0());
+ SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_05));
+
+ printf("Floating point: 2.05\n");
+
+ LOAD_FLOAT_DFORM_TEST(lfdp);
+ LOAD_FLOAT_XFORM_TEST(lfdpx);
+ LOAD_FLOAT_XFORM_TEST(lfiwax);
+ STORE_FLOAT_DFORM_TEST(stfdp);
+ STORE_FLOAT_XFORM_TEST(stfdpx);
+
+ return rc;
+}
+
+int test_alignment_handler_fp_206(void)
+{
+ int rc = 0;
+
+ SKIP_IF(!can_open_fb0());
+ SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+ printf("Floating point: 2.06\n");
+
+ LOAD_FLOAT_XFORM_TEST(lfiwzx);
+
+ return rc;
+}
+
void usage(char *prog)
{
printf("Usage: %s [options]\n", prog);
@@ -513,9 +558,15 @@ int main(int argc, char *argv[])
"test_alignment_handler_vsx_300");
rc |= test_harness(test_alignment_handler_integer,
"test_alignment_handler_integer");
+ rc |= test_harness(test_alignment_handler_integer_206,
+ "test_alignment_handler_integer_206");
rc |= test_harness(test_alignment_handler_vmx,
"test_alignment_handler_vmx");
rc |= test_harness(test_alignment_handler_fp,
"test_alignment_handler_fp");
+ rc |= test_harness(test_alignment_handler_fp_205,
+ "test_alignment_handler_fp_205");
+ rc |= test_harness(test_alignment_handler_fp_206,
+ "test_alignment_handler_fp_206");
return rc;
}
--
2.14.1
^ permalink raw reply related
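The new `SKIP_IF(!have_hwcap(...))` guards test feature bits from the ELF auxiliary vector. `have_hwcap()` itself lives in the selftests' `utils.h`, which is not part of this excerpt; the sketch below assumes it is equivalent to reading `AT_HWCAP` with glibc's `getauxval()` and requiring every requested bit to be set. The fallback `#define`s copy values from the powerpc uapi `<asm/cputable.h>` only so the sketch also builds on other architectures, and `hwcap_has()` and `have_hwcap_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>

/* Values as in the powerpc uapi <asm/cputable.h>; redefined here only
 * so that this sketch compiles on non-powerpc hosts. */
#ifndef PPC_FEATURE_ARCH_2_05
#define PPC_FEATURE_ARCH_2_05 0x00001000
#endif
#ifndef PPC_FEATURE_ARCH_2_06
#define PPC_FEATURE_ARCH_2_06 0x00000100
#endif

/* A feature mask counts as present only if every bit in it is set. */
static bool hwcap_has(unsigned long hwcap, unsigned long ftr)
{
	return (hwcap & ftr) == ftr;
}

/* Assumed equivalent of the selftests' have_hwcap(). */
static bool have_hwcap_sketch(unsigned long ftr)
{
	return hwcap_has(getauxval(AT_HWCAP), ftr);
}
```

With checks like this, the patch can put `lfiwax` in the 2.05 test and `lfiwzx` in the 2.06 test, so each is skipped on CPUs whose hwcap lacks the corresponding architecture bit instead of failing.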
* [PATCH v2 1/2] selftests/powerpc: Skip earlier in alignment_handler test
From: Michael Ellerman @ 2018-07-31 12:08 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, andrew.donnellan
Currently the alignment_handler test prints "Can't open /dev/fb0"
about 80 times per run, which is a little annoying.
Refactor it to check earlier whether it can open /dev/fb0 and skip if not.
This results in each test printing something like:
test: test_alignment_handler_vsx_206
tags: git_version:v4.18-rc3-134-gfb21a48904aa
[SKIP] Test skipped on line 291
skip: test_alignment_handler_vsx_206
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
.../powerpc/alignment/alignment_handler.c | 40 +++++++++++++++++++---
1 file changed, 35 insertions(+), 5 deletions(-)
v2: Unchanged.
diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0f2698f9fd6d..0eddd16af49f 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -40,6 +40,7 @@
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
+#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -191,7 +192,7 @@ int test_memcmp(void *s1, void *s2, int n, int offset, char *test_name)
*/
int do_test(char *test_name, void (*test_func)(char *, char *))
{
- int offset, width, fd, rc = 0, r;
+ int offset, width, fd, rc, r;
void *mem0, *mem1, *ci0, *ci1;
printf("\tDoing %s:\t", test_name);
@@ -199,8 +200,8 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
fd = open("/dev/fb0", O_RDWR);
if (fd < 0) {
printf("\n");
- perror("Can't open /dev/fb0");
- SKIP_IF(1);
+ perror("Can't open /dev/fb0 now?");
+ return 1;
}
ci0 = mmap(NULL, bufsize, PROT_WRITE, MAP_SHARED,
@@ -226,6 +227,7 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
return rc;
}
+ rc = 0;
/* offset = 0 no alignment fault, so skip */
for (offset = 1; offset < 16; offset++) {
width = 16; /* vsx == 16 bytes */
@@ -244,32 +246,50 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
r |= test_memcpy(mem1, mem0, width, offset, test_func);
if (r && !debug) {
printf("FAILED: Got signal");
+ rc = 1;
break;
}
r |= test_memcmp(mem1, ci1, width, offset, test_name);
- rc |= r;
if (r && !debug) {
printf("FAILED: Wrong Data");
+ rc = 1;
break;
}
}
- if (!r)
+
+ if (rc == 0)
printf("PASSED");
+
printf("\n");
munmap(ci0, bufsize);
munmap(ci1, bufsize);
free(mem0);
free(mem1);
+ close(fd);
return rc;
}
+static bool can_open_fb0(void)
+{
+ int fd;
+
+ fd = open("/dev/fb0", O_RDWR);
+ if (fd < 0)
+ return false;
+
+ close(fd);
+ return true;
+}
+
int test_alignment_handler_vsx_206(void)
{
int rc = 0;
+ SKIP_IF(!can_open_fb0());
+
printf("VSX: 2.06B\n");
LOAD_VSX_XFORM_TEST(lxvd2x);
LOAD_VSX_XFORM_TEST(lxvw4x);
@@ -285,6 +305,8 @@ int test_alignment_handler_vsx_207(void)
{
int rc = 0;
+ SKIP_IF(!can_open_fb0());
+
printf("VSX: 2.07B\n");
LOAD_VSX_XFORM_TEST(lxsspx);
LOAD_VSX_XFORM_TEST(lxsiwax);
@@ -298,6 +320,8 @@ int test_alignment_handler_vsx_300(void)
{
int rc = 0;
+ SKIP_IF(!can_open_fb0());
+
SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_00));
printf("VSX: 3.00B\n");
LOAD_VMX_DFORM_TEST(lxsd);
@@ -328,6 +352,8 @@ int test_alignment_handler_integer(void)
{
int rc = 0;
+ SKIP_IF(!can_open_fb0());
+
printf("Integer\n");
LOAD_DFORM_TEST(lbz);
LOAD_DFORM_TEST(lbzu);
@@ -383,6 +409,8 @@ int test_alignment_handler_vmx(void)
{
int rc = 0;
+ SKIP_IF(!can_open_fb0());
+
printf("VMX\n");
LOAD_VMX_XFORM_TEST(lvx);
@@ -408,6 +436,8 @@ int test_alignment_handler_fp(void)
{
int rc = 0;
+ SKIP_IF(!can_open_fb0());
+
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);
--
2.14.1
^ permalink raw reply related
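The probe-once pattern of `can_open_fb0()` generalises to any device node. The hypothetical `can_open_rw()` below takes the path as a parameter; like the patch, it opens read-write, closes the descriptor immediately, and leaves the caller to skip rather than fail.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical generalisation of can_open_fb0(): report whether 'path'
 * can currently be opened read-write, without keeping it open. */
static bool can_open_rw(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return false;

	close(fd);
	return true;
}
```

A test would then start with `SKIP_IF(!can_open_rw("/dev/fb0"));`. Because the probe and the later `open()` in `do_test()` are separate calls, `do_test()` still has to handle a failed open (the "now?" message) rather than assume the device is available.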
* Re: [PATCH] powerpc/pasemi: Search for PCI root bus by compatible property
From: Michael Ellerman @ 2018-07-31 12:04 UTC (permalink / raw)
To: Darren Stevens, linuxppc-dev; +Cc: Olof Johansson, Christian Zigotzky
In-Reply-To: <8736vzhb6h.fsf@concordia.ellerman.id.au>
Michael Ellerman <mpe@ellerman.id.au> writes:
> Darren Stevens <darren@stevens-zone.net> writes:
>
>> Pasemi arch code finds the root of the PCI-e bus by searching the
>> device-tree for a node called 'pxp'. But the root bus has a
>> compatible property of 'pasemi,rootbus' so search for that instead.
>>
>> Signed-off-by: Darren Stevens <darren@stevens-zone.net>
>> ---
>>
>> This works on the Amigaone X1000, I don't know if this method of
>> finding the pci bus was there because of earlier firmwares.
>
> Does anyone have another pasemi board they can test this on?
>
> The last time I plugged mine in it popped the power supply and took out
> power to half the office :) - I haven't had a chance to try it since.
Actually, I remembered I have a device tree lying around from an Electra.
It has:
[I] home:pxp@0,80000000(7)(I)> lsprop name compatible
name "pxp"
compatible "pasemi,rootbus"
"pa-pxp"
So it looks like the patch would work fine on it at least.
cheers
>> diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
>> index c7c8607..be62380 100644
>> --- a/arch/powerpc/platforms/pasemi/pci.c
>> +++ b/arch/powerpc/platforms/pasemi/pci.c
>> @@ -216,6 +216,7 @@ static int __init pas_add_bridge(struct device_node *dev)
>> void __init pas_pci_init(void)
>> {
>> struct device_node *np, *root;
>> + int res;
>>
>> root = of_find_node_by_path("/");
>> if (!root) {
>> @@ -226,11 +227,11 @@ void __init pas_pci_init(void)
>>
>> pci_set_flags(PCI_SCAN_ALL_PCIE_DEVS);
>>
>> - for (np = NULL; (np = of_get_next_child(root, np)) != NULL;)
>> - if (np->name && !strcmp(np->name, "pxp") && !pas_add_bridge(np))
>> - of_node_get(np);
>> -
>> - of_node_put(root);
>> + np = of_find_compatible_node(root, NULL, "pasemi,rootbus");
>> + if (np) {
>> + res = pas_add_bridge(np);
>> + of_node_put(np);
>> + }
>> }
>>
>> void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
^ permalink raw reply
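For context on the new lookup: in a flattened device tree, a `compatible` property such as the Electra's (`"pasemi,rootbus" "pa-pxp"` in the `lsprop` output above) is stored as NUL-terminated strings packed back to back, and matching means scanning that list for the wanted entry. The patch relies on the kernel's `of_find_compatible_node()` for this, whose matching is more involved; the function below is a hypothetical, simplified user-space version using exact, case-sensitive comparison.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the packed, NUL-separated string list 'prop' of total
 * length 'len' bytes contains an entry exactly equal to 'want'. */
static bool compat_list_contains(const char *prop, size_t len,
				 const char *want)
{
	const char *p = prop;
	const char *end = prop + len;
	size_t want_len = strlen(want);

	while (p < end) {
		size_t l = strnlen(p, (size_t)(end - p));

		if (l == want_len && memcmp(p, want, l) == 0)
			return true;
		p += l;
		if (p < end)
			p++; /* skip this entry's terminating NUL */
	}
	return false;
}
```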
* Re: [PATCH] powerpc/pasemi: Search for PCI root bus by compatible property
From: Michael Ellerman @ 2018-07-31 11:55 UTC (permalink / raw)
To: Darren Stevens, linuxppc-dev; +Cc: Olof Johansson, Christian Zigotzky
In-Reply-To: <4c4cb2b4391.4d23d508@auth.smtp.1and1.co.uk>
Darren Stevens <darren@stevens-zone.net> writes:
> Pasemi arch code finds the root of the PCI-e bus by searching the
> device-tree for a node called 'pxp'. But the root bus has a
> compatible property of 'pasemi,rootbus' so search for that instead.
>
> Signed-off-by: Darren Stevens <darren@stevens-zone.net>
> ---
>
> This works on the Amigaone X1000, I don't know if this method of
> finding the pci bus was there because of earlier firmwares.
Does anyone have another pasemi board they can test this on?
The last time I plugged mine in it popped the power supply and took out
power to half the office :) - I haven't had a chance to try it since.
cheers
> diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
> index c7c8607..be62380 100644
> --- a/arch/powerpc/platforms/pasemi/pci.c
> +++ b/arch/powerpc/platforms/pasemi/pci.c
> @@ -216,6 +216,7 @@ static int __init pas_add_bridge(struct device_node *dev)
> void __init pas_pci_init(void)
> {
> struct device_node *np, *root;
> + int res;
>
> root = of_find_node_by_path("/");
> if (!root) {
> @@ -226,11 +227,11 @@ void __init pas_pci_init(void)
>
> pci_set_flags(PCI_SCAN_ALL_PCIE_DEVS);
>
> - for (np = NULL; (np = of_get_next_child(root, np)) != NULL;)
> - if (np->name && !strcmp(np->name, "pxp") && !pas_add_bridge(np))
> - of_node_get(np);
> -
> - of_node_put(root);
> + np = of_find_compatible_node(root, NULL, "pasemi,rootbus");
> + if (np) {
> + res = pas_add_bridge(np);
> + of_node_put(np);
> + }
> }
>
> void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)
^ permalink raw reply
* Re: [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references
From: Michael Ellerman @ 2018-07-31 11:42 UTC (permalink / raw)
To: Nicholas Piggin, Matthew Wilcox
Cc: linux-mm, Linus Torvalds, Andrew Morton, linuxppc-dev,
Aneesh Kumar K . V
In-Reply-To: <20180728023255.720d594c@roar.ozlabs.ibm.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> On Fri, 27 Jul 2018 08:38:35 -0700
> Matthew Wilcox <willy@infradead.org> wrote:
>> On Sat, Jul 28, 2018 at 12:29:06AM +1000, Nicholas Piggin wrote:
>> > On Fri, 27 Jul 2018 06:41:56 -0700
>> > Matthew Wilcox <willy@infradead.org> wrote:
>> > > On Fri, Jul 27, 2018 at 09:48:17PM +1000, Nicholas Piggin wrote:
>> > > > The page table fragment allocator uses the main page refcount racily
>> > > > with respect to speculative references. A customer observed a BUG due
>> > > > to page table page refcount underflow in the fragment allocator. This
>> > > > can be caused by the fragment allocator set_page_count stomping on a
>> > > > speculative reference, and then the speculative failure handler
>> > > > decrements the new reference, and the underflow eventually pops when
>> > > > the page tables are freed.
>> > >
>> > > Oof. Can't you fix this by using page_ref_add() instead of
>> > > set_page_count()?
>> >
>> > It's ugly doing it that way. The problem is we have a page table
>> > destructor and that would be missed if the spec ref was the last
>> > put. In practice with RCU page table freeing maybe you can say
>> > there will be no spec ref there (unless something changes), but
>> > still it just seems much simpler doing this and avoiding any
>> > complexity or relying on other synchronization.
>>
>> I don't want to rely on the speculative reference not happening by the
>> time the page table is torn down; that's way too black-magic for me.
>> Another possibility would be to use, say, the top 16 bits of the
>> atomic for your counter and call the dtor once the atomic is below 64k.
>> I'm also thinking about overhauling the dtor system so it's not tied to
>> compound pages; anyone with a bit in page_type would be able to use it.
>> That way you'd always get your dtor called, even if the speculative
>> reference was the last one.
>
> Yeah we could look at doing either of those if necessary.
>
>> > > > Any objection to the struct page change to grab the arch specific
>> > > > page table page word for powerpc to use? If not, then this should
>> > > > go via powerpc tree because it's inconsequential for core mm.
>> > >
>> > > I want (eventually) to get to the point where every struct page carries
>> > > a pointer to the struct mm that it belongs to. It's good for debugging
>> > > as well as handling memory errors in page tables.
>> >
>> > That doesn't seem like it should be a problem, there's some spare
>> > words there for arch independent users.
>>
>> Could you take one of the spare words instead then? My intent was to
>> just take the 'x86 pgds only' comment off that member. _pt_pad_2 looks
>> ideal because it'll be initialised to 0 and you'll return it to 0 by
>> the time you're done.
>
> It doesn't matter for powerpc where the atomic_t goes, so I'm fine with
> moving it. But could you juggle the fields with your patch instead? I
> thought it would be nice to use this field, which has already been
> tested on x86 not to overlap with any other data, for a
> bug fix that'll have to be widely backported.
Can we come to a conclusion on this one?
As far as backporting goes, pt_mm is new in 4.18-rc, so the patch will
need to be manually backported anyway. But I agree with Nick we'd rather
use a slot that is known to be free for arch use.
cheers
^ permalink raw reply
* Re: [PATCH v3] PCI: Data corruption happening due to race condition
From: Michael Ellerman @ 2018-07-31 11:21 UTC (permalink / raw)
To: Bjorn Helgaas, Benjamin Herrenschmidt
Cc: Hari Vyas, bhelgaas, linux-pci, ray.jui, Paul Mackerras,
linuxppc-dev, Sam Bobroff
In-Reply-To: <20180727222540.GH173328@bhelgaas-glaptop.roam.corp.google.com>
Bjorn Helgaas <helgaas@kernel.org> writes:
> On Thu, Jul 19, 2018 at 02:18:09PM +1000, Benjamin Herrenschmidt wrote:
>> On Wed, 2018-07-18 at 18:29 -0500, Bjorn Helgaas wrote:
>> > [+cc Paul, Michael, linuxppc-dev]
>> >
>>
>> ..../...
>>
>> > > Debugging revealed a race condition between the PCIe core driver
>> > > setting the is_added bit (pci_bus_add_device()) and the nvme driver
>> > > reset work-queue setting the is_busmaster bit (pci_set_master()).
>> > > Both fields share the same word and are not updated atomically, so
>> > > setting is_busmaster can clear the is_added bit.
>> > >
>> > > The fix moves the is_added bit to a separate private flags
>> > > variable and uses atomic bit operations to set and retrieve the
>> > > device addition state. Because is_added now lives in a different
>> > > memory location, the race condition is avoided.
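The difference between the two layouts can be illustrated in user-space C11. This is a minimal sketch, not the patch itself: the struct and helper names (dev_racy, dev_fixed, dev_assign_added, dev_is_added, DEV_ADDED_BIT) are hypothetical, and C11 atomic_fetch_or()/atomic_fetch_and() stand in for the kernel's own atomic bit operations. It shows only the fixed pattern; the race itself depends on timing and cannot be reproduced deterministically in a snippet like this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Racy layout (simplified): both flags are bitfields sharing one storage
 * unit, so writing either one is a non-atomic read-modify-write of the
 * whole word and can overwrite a concurrent update of the other.
 */
struct dev_racy {
	unsigned int is_added:1;
	unsigned int is_busmaster:1;
};

/*
 * Fixed layout (sketch): the "added" state lives in its own word of
 * private flags that is only ever updated with atomic bit operations.
 * DEV_ADDED_BIT and the helper names below are hypothetical.
 */
#define DEV_ADDED_BIT 0u
static_assert(DEV_ADDED_BIT < 8 * sizeof(unsigned long),
	      "flag bit must fit in priv_flags");

struct dev_fixed {
	unsigned int is_busmaster:1;
	atomic_ulong priv_flags;
};

/* Atomically set or clear the "added" bit without touching other bits. */
static void dev_assign_added(struct dev_fixed *d, bool added)
{
	unsigned long mask = 1ul << DEV_ADDED_BIT;

	if (added)
		atomic_fetch_or(&d->priv_flags, mask);
	else
		atomic_fetch_and(&d->priv_flags, ~mask);
}

static bool dev_is_added(struct dev_fixed *d)
{
	return atomic_load(&d->priv_flags) & (1ul << DEV_ADDED_BIT);
}
```

In C11 terms, adjacent bitfields form a single memory location, while the atomic_ulong member is a separate one, so a plain store to is_busmaster in the fixed layout cannot clobber the added flag.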
>> >
>> > Really nice bit of debugging!
>>
>> Indeed. However, I'm not a fan of the solution. Shouldn't we instead have
>> some locking for the content of pci_dev ? I've always been wary of us
>> having other similar races in there.
>>
>> As for the powerpc bits, I'm probably the one who wrote them, however,
>> I'm on vacation this week and right now, no bandwidth to context switch
>> all that back in :-) So give me a few days and/or ping me next week.
>
> OK, here's a ping :)
>
> Some powerpc cleanup would be ideal, but I'd like to fix the race for
> v4.19, so I'm fine with this patch as-is. But I'd definitely want
> your ack before inserting the ugly #include path in the powerpc code.
Sorry, the patch didn't hit linuxppc so I forgot about it.
I'm OK with the patch, the include is a bit gross, but I guess it's
fine.
I have a change to pseries/setup.c queued that might collide, though
it's just an addition of another include so it's a trivial fixup.
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
In terms of longer term clean up, do you have a sketch of what you'd
like to see?
cheers
^ permalink raw reply
* Re: [PATCH v5 09/11] hugetlb: Introduce generic version of huge_ptep_set_wrprotect
From: Alexandre Ghiti @ 2018-07-31 11:17 UTC (permalink / raw)
To: Michael Ellerman, linux-mm, mike.kravetz, linux, catalin.marinas,
will.deacon, tony.luck, fenghua.yu, ralf, paul.burton, jhogan,
jejb, deller, benh, ysato, dalias, davem, tglx, mingo, hpa, x86,
arnd, linux-arm-kernel, linux-kernel, linux-ia64, linux-mips,
linux-parisc, linuxppc-dev, linux-sh, sparclinux, linux-arch,
aneesh.kumar@linux.ibm.com
In-Reply-To: <87h8kfhg7o.fsf@concordia.ellerman.id.au>
On 07/31/2018 12:06 PM, Michael Ellerman wrote:
> Alexandre Ghiti <alex@ghiti.fr> writes:
>
>> arm, ia64, mips, sh, x86 architectures use the same version
>> of huge_ptep_set_wrprotect, so move this generic implementation into
>> asm-generic/hugetlb.h.
>> Note: powerpc uses the same version as the above architectures twice,
>> for book3s/32 and nohash/32, but the modification was not straightforward
>> and hence has not been done.
> Do you remember what the problem was there?
>
> It looks like you should just be able to drop them like the others. I
> assume there's some header spaghetti that causes problems though?
Yes, the header spaghetti frightened me a bit. Maybe I should have tried
harder: I can try to remove them and find the right defconfigs to
compile both to begin with. To check that the functionality is
preserved, can I use the libhugetlbfs test suite with QEMU?
Alex
>
> cheers
>
>
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> arch/arm/include/asm/hugetlb-3level.h | 6 ------
>> arch/arm64/include/asm/hugetlb.h | 1 +
>> arch/ia64/include/asm/hugetlb.h | 6 ------
>> arch/mips/include/asm/hugetlb.h | 6 ------
>> arch/parisc/include/asm/hugetlb.h | 1 +
>> arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++
>> arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
>> arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++
>> arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
>> arch/sh/include/asm/hugetlb.h | 6 ------
>> arch/sparc/include/asm/hugetlb.h | 1 +
>> arch/x86/include/asm/hugetlb.h | 6 ------
>> include/asm-generic/hugetlb.h | 8 ++++++++
>> 13 files changed, 17 insertions(+), 30 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
>> index b897541520ef..8247cd6a2ac6 100644
>> --- a/arch/arm/include/asm/hugetlb-3level.h
>> +++ b/arch/arm/include/asm/hugetlb-3level.h
>> @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>> return retval;
>> }
>>
>> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> - unsigned long addr, pte_t *ptep)
>> -{
>> - ptep_set_wrprotect(mm, addr, ptep);
>> -}
>> -
>> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> unsigned long addr, pte_t *ptep,
>> pte_t pte, int dirty)
>> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
>> index 3e7f6e69b28d..f4f69ae5466e 100644
>> --- a/arch/arm64/include/asm/hugetlb.h
>> +++ b/arch/arm64/include/asm/hugetlb.h
>> @@ -48,6 +48,7 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>> extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep);
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep);
>> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
>> index cbe296271030..49d1f7949f3a 100644
>> --- a/arch/ia64/include/asm/hugetlb.h
>> +++ b/arch/ia64/include/asm/hugetlb.h
>> @@ -27,12 +27,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>> {
>> }
>>
>> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> - unsigned long addr, pte_t *ptep)
>> -{
>> - ptep_set_wrprotect(mm, addr, ptep);
>> -}
>> -
>> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> unsigned long addr, pte_t *ptep,
>> pte_t pte, int dirty)
>> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
>> index 6ff2531cfb1d..3dcf5debf8c4 100644
>> --- a/arch/mips/include/asm/hugetlb.h
>> +++ b/arch/mips/include/asm/hugetlb.h
>> @@ -63,12 +63,6 @@ static inline int huge_pte_none(pte_t pte)
>> return !val || (val == (unsigned long)invalid_pte_table);
>> }
>>
>> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> - unsigned long addr, pte_t *ptep)
>> -{
>> - ptep_set_wrprotect(mm, addr, ptep);
>> -}
>> -
>> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> unsigned long addr,
>> pte_t *ptep, pte_t pte,
>> diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
>> index fb7e0fd858a3..9c3950ca2974 100644
>> --- a/arch/parisc/include/asm/hugetlb.h
>> +++ b/arch/parisc/include/asm/hugetlb.h
>> @@ -39,6 +39,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>> {
>> }
>>
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep);
>>
>> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
>> index 02f5acd7ccc4..d2cd1d0226e9 100644
>> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
>> @@ -228,6 +228,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
>> {
>> pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO);
>> }
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep)
>> {
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 42aafba7a308..7d957f7c47cd 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -451,6 +451,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
>> pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 0);
>> }
>>
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep)
>> {
>> diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
>> index 7c46a98cc7f4..f39e200d9591 100644
>> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
>> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
>> @@ -249,6 +249,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
>> {
>> pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO);
>> }
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep)
>> {
>> diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
>> index dd0c7236208f..69fbf7e9b4db 100644
>> --- a/arch/powerpc/include/asm/nohash/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
>> @@ -238,6 +238,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
>> pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
>> }
>>
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep)
>> {
>> diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
>> index f1bbd255ee43..8df4004977b9 100644
>> --- a/arch/sh/include/asm/hugetlb.h
>> +++ b/arch/sh/include/asm/hugetlb.h
>> @@ -32,12 +32,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>> {
>> }
>>
>> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> - unsigned long addr, pte_t *ptep)
>> -{
>> - ptep_set_wrprotect(mm, addr, ptep);
>> -}
>> -
>> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> unsigned long addr, pte_t *ptep,
>> pte_t pte, int dirty)
>> diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
>> index 2101ea217f33..c41754a113f3 100644
>> --- a/arch/sparc/include/asm/hugetlb.h
>> +++ b/arch/sparc/include/asm/hugetlb.h
>> @@ -32,6 +32,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
>> {
>> }
>>
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep)
>> {
>> diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
>> index 59c056adb3c9..a3f781f7a264 100644
>> --- a/arch/x86/include/asm/hugetlb.h
>> +++ b/arch/x86/include/asm/hugetlb.h
>> @@ -13,12 +13,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
>> return 0;
>> }
>>
>> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> - unsigned long addr, pte_t *ptep)
>> -{
>> - ptep_set_wrprotect(mm, addr, ptep);
>> -}
>> -
>> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> unsigned long addr, pte_t *ptep,
>> pte_t pte, int dirty)
>> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
>> index 6c0c8b0c71e0..9b9039845278 100644
>> --- a/include/asm-generic/hugetlb.h
>> +++ b/include/asm-generic/hugetlb.h
>> @@ -102,4 +102,12 @@ static inline int prepare_hugepage_range(struct file *file,
>> }
>> #endif
>>
>> +#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> + unsigned long addr, pte_t *ptep)
>> +{
>> + ptep_set_wrprotect(mm, addr, ptep);
>> +}
>> +#endif
>> +
>> #endif /* _ASM_GENERIC_HUGETLB_H */
>> --
>> 2.16.2
^ permalink raw reply
* Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
From: Bjorn Helgaas @ 2018-07-31 10:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-pci, iommu, linuxppc-dev, x86, linux-sh, linux-kernel
In-Reply-To: <20180730073842.16092-1-hch@lst.de>
On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Applied with acks from Thomas and Michael to pci/misc for v4.19, thanks!
> ---
> arch/powerpc/kernel/dma.c | 3 ---
> arch/sh/drivers/pci/pci.c | 2 --
> arch/x86/kernel/pci-dma.c | 3 ---
> drivers/pci/pci-driver.c | 2 +-
> 4 files changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 155170d70324..dbfc7056d7df 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
>
> static int __init dma_init(void)
> {
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
> #ifdef CONFIG_IBMVIO
> dma_debug_add_bus(&vio_bus_type);
> #endif
> diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
> index e5b7437ab4af..8256626bc53c 100644
> --- a/arch/sh/drivers/pci/pci.c
> +++ b/arch/sh/drivers/pci/pci.c
> @@ -160,8 +160,6 @@ static int __init pcibios_init(void)
> for (hose = hose_head; hose; hose = hose->next)
> pcibios_scanbus(hose);
>
> - dma_debug_add_bus(&pci_bus_type);
> -
> pci_initialized = 1;
>
> return 0;
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index ab5d9dd668d2..43f58632f123 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
> {
> struct iommu_table_entry *p;
>
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
> x86_init.iommu.iommu_init();
>
> for (p = __iommu_table; p < __iommu_table_end; p++) {
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 6792292b5fc7..bef17c3fca67 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
> if (ret)
> return ret;
> #endif
> -
> + dma_debug_add_bus(&pci_bus_type);
> return 0;
> }
> postcore_initcall(pci_driver_init);
> --
> 2.18.0
>
^ permalink raw reply
* Re: [PATCH] powerpc: do not redefined NEED_DMA_MAP_STATE
From: Michael Ellerman @ 2018-07-31 10:47 UTC (permalink / raw)
To: Christoph Hellwig, benh; +Cc: linuxppc-dev, iommu
In-Reply-To: <20180730073721.15991-1-hch@lst.de>
Christoph Hellwig <hch@lst.de> writes:
> kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
> from PPC64 and NOT_COHERENT_CACHE instead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> arch/powerpc/Kconfig | 3 ---
> arch/powerpc/platforms/Kconfig.cputype | 2 ++
> 2 files changed, 2 insertions(+), 3 deletions(-)
Thanks.
I did this instead:
commit 870771ae76010c5e42ee8e0278f5823e46e96e3f (HEAD -> next-test)
Author: Christoph Hellwig <hch@lst.de>
AuthorDate: Mon Jul 30 09:37:21 2018 +0200
Commit: Michael Ellerman <mpe@ellerman.id.au>
CommitDate: Tue Jul 31 20:43:57 2018 +1000
powerpc: Do not redefine NEED_DMA_MAP_STATE
kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
from CONFIG_PPC using the same condition as an if guard.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[mpe: Move it under PPC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5eb4d969afbf..ee38fce075ee 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -226,6 +226,7 @@ config PPC
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_RELA
+ select NEED_DMA_MAP_STATE if PPC64 || NOT_COHERENT_CACHE
select NEED_SG_DMA_LENGTH
select NO_BOOTMEM
select OF
@@ -885,9 +886,6 @@ config ZONE_DMA
bool
default y
-config NEED_DMA_MAP_STATE
- def_bool (PPC64 || NOT_COHERENT_CACHE)
-
config GENERIC_ISA_DMA
bool
depends on ISA_DMA_API
cheers
^ permalink raw reply related
* Re: [PATCH v5 09/11] hugetlb: Introduce generic version of huge_ptep_set_wrprotect
From: Michael Ellerman @ 2018-07-31 10:06 UTC (permalink / raw)
To: Alexandre Ghiti, linux-mm, mike.kravetz, linux, catalin.marinas,
will.deacon, tony.luck, fenghua.yu, ralf, paul.burton, jhogan,
jejb, deller, benh, ysato, dalias, davem, tglx, mingo, hpa, x86,
arnd, linux-arm-kernel, linux-kernel, linux-ia64, linux-mips,
linux-parisc, linuxppc-dev, linux-sh, sparclinux, linux-arch,
aneesh.kumar
Cc: Alexandre Ghiti
In-Reply-To: <20180731060155.16915-10-alex@ghiti.fr>
Alexandre Ghiti <alex@ghiti.fr> writes:
> arm, ia64, mips, sh, x86 architectures use the same version
> of huge_ptep_set_wrprotect, so move this generic implementation into
> asm-generic/hugetlb.h.
> Note: powerpc uses the same version as the above architectures twice,
> for book3s/32 and nohash/32, but the modification was not straightforward
> and hence has not been done.
Do you remember what the problem was there?
It looks like you should just be able to drop them like the others. I
assume there's some header spaghetti that causes problems though?
cheers
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
> arch/arm/include/asm/hugetlb-3level.h | 6 ------
> arch/arm64/include/asm/hugetlb.h | 1 +
> arch/ia64/include/asm/hugetlb.h | 6 ------
> arch/mips/include/asm/hugetlb.h | 6 ------
> arch/parisc/include/asm/hugetlb.h | 1 +
> arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++
> arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
> arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++
> arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
> arch/sh/include/asm/hugetlb.h | 6 ------
> arch/sparc/include/asm/hugetlb.h | 1 +
> arch/x86/include/asm/hugetlb.h | 6 ------
> include/asm-generic/hugetlb.h | 8 ++++++++
> 13 files changed, 17 insertions(+), 30 deletions(-)
>
> diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
> index b897541520ef..8247cd6a2ac6 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
> return retval;
> }
>
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty)
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index 3e7f6e69b28d..f4f69ae5466e 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -48,6 +48,7 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep);
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep);
> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index cbe296271030..49d1f7949f3a 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -27,12 +27,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> {
> }
>
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty)
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 6ff2531cfb1d..3dcf5debf8c4 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -63,12 +63,6 @@ static inline int huge_pte_none(pte_t pte)
> return !val || (val == (unsigned long)invalid_pte_table);
> }
>
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr,
> pte_t *ptep, pte_t pte,
> diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
> index fb7e0fd858a3..9c3950ca2974 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -39,6 +39,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> {
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep);
>
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 02f5acd7ccc4..d2cd1d0226e9 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -228,6 +228,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
> {
> pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO);
> }
> +
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 42aafba7a308..7d957f7c47cd 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -451,6 +451,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
> pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 0);
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
> index 7c46a98cc7f4..f39e200d9591 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@ -249,6 +249,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
> {
> pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO);
> }
> +
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
> index dd0c7236208f..69fbf7e9b4db 100644
> --- a/arch/powerpc/include/asm/nohash/64/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
> @@ -238,6 +238,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
> pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
> index f1bbd255ee43..8df4004977b9 100644
> --- a/arch/sh/include/asm/hugetlb.h
> +++ b/arch/sh/include/asm/hugetlb.h
> @@ -32,12 +32,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> {
> }
>
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty)
> diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
> index 2101ea217f33..c41754a113f3 100644
> --- a/arch/sparc/include/asm/hugetlb.h
> +++ b/arch/sparc/include/asm/hugetlb.h
> @@ -32,6 +32,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> {
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
> index 59c056adb3c9..a3f781f7a264 100644
> --- a/arch/x86/include/asm/hugetlb.h
> +++ b/arch/x86/include/asm/hugetlb.h
> @@ -13,12 +13,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
> return 0;
> }
>
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
> static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty)
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index 6c0c8b0c71e0..9b9039845278 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -102,4 +102,12 @@ static inline int prepare_hugepage_range(struct file *file,
> }
> #endif
>
> +#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> +{
> + ptep_set_wrprotect(mm, addr, ptep);
> +}
> +#endif
> +
> #endif /* _ASM_GENERIC_HUGETLB_H */
> --
> 2.16.2
^ permalink raw reply
* RE: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()
From: Alastair D'Silva @ 2018-07-31 9:52 UTC (permalink / raw)
To: 'Michael Ellerman', 'Murilo Opsfelder Araujo',
'LEROY Christophe'
Cc: linux-kernel, 'Andrew Donnellan', 'Balbir Singh',
'Benjamin Herrenschmidt', 'Cyril Bur',
'Eric W . Biederman', 'Joe Perches',
'Michael Neuling', 'Nicholas Piggin',
'Paul Mackerras', 'Simon Guo',
'Sukadev Bhattiprolu', 'Tobin C . Harding',
linuxppc-dev
In-Reply-To: <87va8vhhsj.fsf@concordia.ellerman.id.au>
> -----Original Message-----
> From: Michael Ellerman <mpe@ellerman.id.au>
> Sent: Tuesday, 31 July 2018 7:32 PM
> To: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>; LEROY Christophe
> <christophe.leroy@c-s.fr>
> Cc: linux-kernel@vger.kernel.org; Alastair D'Silva <alastair@d-silva.org>;
> Andrew Donnellan <andrew.donnellan@au1.ibm.com>; Balbir Singh
> <bsingharora@gmail.com>; Benjamin Herrenschmidt
> <benh@kernel.crashing.org>; Cyril Bur <cyrilbur@gmail.com>; Eric W .
> Biederman <ebiederm@xmission.com>; Joe Perches <joe@perches.com>;
> Michael Neuling <mikey@neuling.org>; Nicholas Piggin
> <npiggin@gmail.com>; Paul Mackerras <paulus@samba.org>; Simon Guo
> <wei.guo.simon@gmail.com>; Sukadev Bhattiprolu
> <sukadev@linux.vnet.ibm.com>; Tobin C . Harding <me@tobin.cc>; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in
> show_signal_msg()
>
> Murilo Opsfelder Araujo <muriloo@linux.ibm.com> writes:
> > On Mon, Jul 30, 2018 at 06:30:47PM +0200, LEROY Christophe wrote:
> >> Murilo Opsfelder Araujo <muriloo@linux.ibm.com> wrote:
> >> > On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:
> >> > > Murilo Opsfelder Araujo <muriloo@linux.ibm.com> wrote:
> >> > >
> >> > > > Simplify the message format by using REG_FMT as the register
> >> > > > format. This avoids having two different formats and avoids
> >> > > > checking for MSR_64BIT.
> >> > >
> >> > > Are you sure it is what we want ?
> >> >
> >> > Yes.
> >> >
> >> > > Won't it change the behaviour for a 32 bits app running on a 64bits kernel ?
> >> >
> >> > In fact, this changes how many zeroes are prefixed when displaying
> >> > the registers (%016lx vs. %08lx format). For example, 32-bits
> >> > userspace, 64-bits kernel:
> >>
> >> Indeed that's what I suspected. What is the real benefit of this change ?
> >> Why not keep the current format for 32bits userspace ? All those
> >> leading zeroes are pointless to me.
> >
> > One of the benefits is simplifying the code by removing some checks.
> > Another is deduplicating almost identical format strings in favor of a unified one.
> >
> > After reading Joe's comment [1], %px seems to be the format we're looking for.
> > An extract from Documentation/core-api/printk-formats.rst:
> >
> > "%px is functionally equivalent to %lx (or %lu). %px is preferred because it
> > is more uniquely grep'able."
> >
> > So I guess we don't need to worry about the format (%016lx vs. %08lx),
> > let's just use %px, as per the guideline.
>
> I don't think I like %px.
Me neither; semantically, it's for pointers, and the data being displayed is not a pointer.
> It makes the format string cleaner, but it means we have to cast everything
> to void * which is ugly as heck.
>
> I actually don't think the leading zeroes are helpful at all in the signal
> message, ie. we should just use %lx there.
>
> They are useful in show_regs() because we want everything to line up.
>
> So I think I'll drop patch 3 and use 0x%lx in show_signal_msg(), meaning we
> end up with, eg:
>
> [ 73.414535] segv[3759]: segfault (11) at 0x0 nip 0x10000420 lr 0xfe61854 code 0x1 in segv[10000000+10000]
> [ 73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4bffff30 9421ffd0 93e1002c 7c3f0b78
> [ 73.414665] segv[3759]: code: 39200000 913f001c 813f001c 39400001 <91490000> 39200000 7d234b78 397f0030
Or better yet, "%#lx": the hash adds the appropriate prefix in the
right case for the format.
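The difference between 0x%lx and %#lx is easy to check in user space with ISO C snprintf(). One caveat worth knowing: in ISO C the # flag adds the 0x (or 0X) prefix only to non-zero values, so a zero address would print as 0 rather than 0x0. The kernel's printk() has its own formatter in lib/vsprintf.c and may not follow that rule, so treat the zero case below as a user-space illustration only. The CHECK macro and check_formats() are hypothetical helpers for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one value with one printf-style spec and compare the result. */
#define CHECK(spec, val, want) do {                             \
	char buf[32];                                           \
	snprintf(buf, sizeof(buf), spec, (unsigned long)(val)); \
	assert(strcmp(buf, want) == 0);                         \
} while (0)

static void check_formats(void)
{
	CHECK("0x%lx", 0x0, "0x0");            /* explicit prefix: always printed */
	CHECK("%#lx", 0x0, "0");               /* ISO C: no 0x for a zero value */
	CHECK("%#lx", 0x10000420, "0x10000420");
	CHECK("%#lX", 0xfe61854, "0XFE61854"); /* prefix follows the case of X */
	CHECK("%016lx", 0x10000420, "0000000010000420"); /* 64-bit style padding */
	CHECK("%08lx", 0x10000420, "10000420");          /* 32-bit style padding */
}
```

The last two checks show the %016lx vs. %08lx difference discussed above: the same 32-bit value gains eight leading zeroes under the 64-bit format.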
-- 
Alastair D'Silva mob: 0423 762 819
skype: alastair_dsilva msn: alastair@d-silva.org
blog: http://alastair.d-silva.org Twitter: @EvilDeece
^ permalink raw reply
* Re: powerpc: 32BIT vs. 64BIT (PPC32 vs. PPC64)
From: Michael Ellerman @ 2018-07-31 9:57 UTC (permalink / raw)
To: Masahiro Yamada, Randy Dunlap
Cc: Stephen Rothwell, linux-kbuild, Nicholas Piggin, linuxppc-dev
In-Reply-To: <CAK7LNASKiO8J8yTRGp4ngo2=ErC6S2_0bNENVsc4PoYpENQfTQ@mail.gmail.com>
Masahiro Yamada <yamada.masahiro@socionext.com> writes:
> 2018-07-07 23:59 GMT+09:00 Randy Dunlap <rdunlap@infradead.org>:
>> On 07/07/2018 05:13 AM, Nicholas Piggin wrote:
>>> On Fri, 6 Jul 2018 21:58:29 -0700
>>> Randy Dunlap <rdunlap@infradead.org> wrote:
>>>
>>>> On 07/06/2018 06:45 PM, Benjamin Herrenschmidt wrote:
>>>>> On Thu, 2018-07-05 at 14:30 -0700, Randy Dunlap wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Is there a good way (or a shortcut) to do something like:
>>>>>>
>>>>>> $ make ARCH=powerpc O=PPC32 [other_options] allmodconfig
>>>>>> to get a PPC32/32BIT allmodconfig
>>>>>>
>>>>>> and also be able to do:
>>>>>>
>>>>>> $make ARCH=powerpc O=PPC64 [other_options] allmodconfig
>>>>>> to get a PPC64/64BIT allmodconfig?
>>>>>
>>>>> Hrm... O= is for the separate build dir, so there must be something
>>>>> else.
>>>>>
>>>>> You mean having ARCH= aliases like ppc/ppc32 and ppc64 ?
>>>>
>>>> Yes.
>>>>
>>>>> That would be a matter of overriding some .config defaults I suppose, I
>>>>> don't know how this is done on other archs.
>>>>>
>>>>> I see the aliasing trick in the Makefile but that's about it.
>>>>>
>>>>>> Note that arch/x86, arch/sh, and arch/sparc have ways to do
>>>>>> some flavor(s) of this (from Documentation/kbuild/kbuild.txt;
>>>>>> sh and sparc based on a recent "fix" patch from me):
>>>>>
>>>>> I fail to see what you are actually talking about here ... sorry. Do
>>>>> you have concrete examples on x86 or sparc ? From what I can tell the
>>>>> "i386" or "sparc32/sparc64" aliases just change SRCARCH in Makefile and
>>>>> 32 vs 64-bit is just a Kconfig option...
>>>>
>>>> Yes, your summary is mostly correct.
>>>>
>>>> I'm just looking for a way to do cross-compile builds that are close to
>>>> ppc32 allmodconfig and ppc64 allmodconfig.
>>>
>>> Would there be a problem with adding ARCH=ppc32 / ppc64 matching? This
>>> seems to work...
>>
>> Yes, this mostly works and is similar to a patch (my patch) on my test machine.
>> And they both work for allmodconfig, which is my primary build target.
>>
>> And they both have one little quirk that is confusing when the build target
>> is defconfig:
>>
>> When ARCH=ppc32, the terminal output (stdout) is: (using O=PPC32)
>>
>> make[1]: Entering directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>> GEN ./Makefile
>> *** Default configuration is based on 'ppc64_defconfig' <<<<< NOTE <<<<<
>> #
>> # configuration written to .config
>> #
>> make[1]: Leaving directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>>
>
>
> Maybe we can set KBUILD_DEFCONFIG to one of the ppc32 defconfigs
> if ARCH is ppc32?
We could, but as I said in another reply I'd rather we didn't play
tricks with ARCH.
I've merged a patch to add three new allmodconfig targets for ppc32,
ppc64le and ppc64_book3e:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=54457&state=*
cheers
* Re: [PATCH 3/6] powerpc: factor out RapidIO Kconfig menu entry
From: Michael Ellerman @ 2018-07-31 9:45 UTC
To: Alexei Colin, Alexandre Bounine, Benjamin Herrenschmidt,
Paul Mackerras
Cc: linux-kernel, Andrew Morton, linuxppc-dev, Alexei Colin,
John Paul Walters
In-Reply-To: <20180730225035.28365-4-acolin@isi.edu>
Alexei Colin <acolin@isi.edu> writes:
> The menu entry is now defined in the rapidio subtree. Also, re-order
> the bus menu so that the platform-specific RapidIO controller appears
> after the entry for the RapidIO subsystem.
>
> Platforms with a PCI bus will be offered the RapidIO menu since they may
> want support for a RapidIO PCI device. Platforms without a PCI bus
> that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
> in the platform-/machine-specific "config ARCH_*" Kconfig entry.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Paul Walters <jwalters@isi.edu>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Alexei Colin <acolin@isi.edu>
> ---
> arch/powerpc/Kconfig | 13 +------------
> 1 file changed, 1 insertion(+), 12 deletions(-)
Looks good.
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
cheers
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 25d005af0a5b..17ea8a5f90a0 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -993,16 +993,7 @@ source "drivers/pci/Kconfig"
>
> source "drivers/pcmcia/Kconfig"
>
> -config HAS_RAPIDIO
> - bool
> - default n
> -
> -config RAPIDIO
> - tristate "RapidIO support"
> - depends on HAS_RAPIDIO || PCI
> - help
> - If you say Y here, the kernel will include drivers and
> - infrastructure code to support RapidIO interconnect devices.
> +source "drivers/rapidio/Kconfig"
>
> config FSL_RIO
> bool "Freescale Embedded SRIO Controller support"
> @@ -1012,8 +1003,6 @@ config FSL_RIO
> Include support for RapidIO controller on Freescale embedded
> processors (MPC8548, MPC8641, etc).
>
> -source "drivers/rapidio/Kconfig"
> -
> endmenu
>
> config NONSTATIC_KERNEL
> --
> 2.18.0
* Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
From: Michael Ellerman @ 2018-07-31 9:34 UTC
To: Christoph Hellwig, linux-pci
Cc: iommu, x86, linuxppc-dev, linux-kernel, linux-sh
In-Reply-To: <20180730073842.16092-1-hch@lst.de>
Christoph Hellwig <hch@lst.de> writes:
> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> arch/powerpc/kernel/dma.c | 3 ---
> arch/sh/drivers/pci/pci.c | 2 --
> arch/x86/kernel/pci-dma.c | 3 ---
> drivers/pci/pci-driver.c | 2 +-
> 4 files changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 155170d70324..dbfc7056d7df 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
>
> static int __init dma_init(void)
> {
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
> #ifdef CONFIG_IBMVIO
> dma_debug_add_bus(&vio_bus_type);
> #endif
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
cheers
* Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()
From: Michael Ellerman @ 2018-07-31 9:32 UTC
To: Murilo Opsfelder Araujo, LEROY Christophe
Cc: linux-kernel, Alastair D'Silva, Andrew Donnellan,
Balbir Singh, Benjamin Herrenschmidt, Cyril Bur,
Eric W . Biederman, Joe Perches, Michael Neuling, Nicholas Piggin,
Paul Mackerras, Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding,
linuxppc-dev
In-Reply-To: <20180730231738.GA20351@kermit-br-ibm-com>
Murilo Opsfelder Araujo <muriloo@linux.ibm.com> writes:
> On Mon, Jul 30, 2018 at 06:30:47PM +0200, LEROY Christophe wrote:
>> Murilo Opsfelder Araujo <muriloo@linux.ibm.com> wrote:
>> > On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:
>> > > Murilo Opsfelder Araujo <muriloo@linux.ibm.com> wrote:
>> > >
>> > > > Simplify the message format by using REG_FMT as the register format. This
>> > > > avoids having two different formats and avoids checking for MSR_64BIT.
>> > >
>> > > Are you sure it is what we want ?
>> >
>> > Yes.
>> >
>> > > Won't it change the behaviour for a 32-bit app running on a 64-bit kernel?
>> >
>> > In fact, this changes how many zeroes are prefixed when displaying the
>> > registers (%016lx vs. %08lx format). For example, 32-bit userspace, 64-bit kernel:
>>
>> Indeed that's what I suspected. What is the real benefit of this change?
>> Why not keep the current format for 32-bit userspace? All those leading
>> zeroes are pointless to me.
>
> One of the benefits is simplifying the code by removing some checks. Another is
> deduplicating almost identical format strings in favor of a unified one.
>
> After reading Joe's comment [1], %px seems to be the format we're looking for.
> An extract from Documentation/core-api/printk-formats.rst:
>
> "%px is functionally equivalent to %lx (or %lu). %px is preferred because it
> is more uniquely grep'able."
>
> So I guess we don't need to worry about the format (%016lx vs. %08lx), let's
> just use %px, as per the guideline.
I don't think I like %px.
It makes the format string cleaner, but it means we have to cast
everything to void * which is ugly as heck.
I actually don't think the leading zeroes are helpful at all in the
signal message, ie. we should just use %lx there.
They are useful in show_regs() because we want everything to line up.
So I think I'll drop patch 3 and use 0x%lx in show_signal_msg(), meaning
we end up with, eg:
[ 73.414535] segv[3759]: segfault (11) at 0x0 nip 0x10000420 lr 0xfe61854 code 0x1 in segv[10000000+10000]
[ 73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4bffff30 9421ffd0 93e1002c 7c3f0b78
[ 73.414665] segv[3759]: code: 39200000 913f001c 813f001c 39400001 <91490000> 39200000 7d234b78 397f0030
I'll do that unless anyone screams loudly, because it would be nice to
get this into 4.19.
cheers
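To make the trade-off concrete, here is a hypothetical user-space sketch of the two styles discussed. The first is fixed-width, zero-padded fields (`%016lx` or `%08lx`, chosen by whether the task is 64-bit), which keep columns aligned as in show_regs(). The second is the unpadded `0x%lx` proposed for show_signal_msg(). The function names and the `is_64bit` flag are illustrative stand-ins, not the kernel's code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Aligned style: pad to the task's register width so columns line up. */
static const char *reg_aligned(char *buf, size_t len, unsigned long val,
			       bool is_64bit)
{
	snprintf(buf, len, is_64bit ? "%016lx" : "%08lx", val);
	return buf;
}

/* Compact style: no padding, literal prefix (prints "0x0" for zero). */
static const char *reg_compact(char *buf, size_t len, unsigned long val)
{
	snprintf(buf, len, "0x%lx", val);
	return buf;
}
```

With the compact style, a 32-bit task's values print the same whether the kernel is 32-bit or 64-bit. This matches the sample segfault lines, such as `nip 0x10000420`.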
* Re: [PATCH v5 00/11] hugetlb: Factorize hugetlb architecture primitives
From: Catalin Marinas @ 2018-07-31 9:26 UTC
To: Alexandre Ghiti
Cc: linux-mm, mike.kravetz, linux, will.deacon, tony.luck, fenghua.yu,
ralf, paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
In-Reply-To: <20180731060155.16915-1-alex@ghiti.fr>
On Tue, Jul 31, 2018 at 06:01:44AM +0000, Alexandre Ghiti wrote:
> Alexandre Ghiti (11):
> hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
> hugetlb: Introduce generic version of hugetlb_free_pgd_range
> hugetlb: Introduce generic version of set_huge_pte_at
> hugetlb: Introduce generic version of huge_ptep_get_and_clear
> hugetlb: Introduce generic version of huge_ptep_clear_flush
> hugetlb: Introduce generic version of huge_pte_none
> hugetlb: Introduce generic version of huge_pte_wrprotect
> hugetlb: Introduce generic version of prepare_hugepage_range
> hugetlb: Introduce generic version of huge_ptep_set_wrprotect
> hugetlb: Introduce generic version of huge_ptep_set_access_flags
> hugetlb: Introduce generic version of huge_ptep_get
[...]
> arch/arm64/include/asm/hugetlb.h | 39 +++---------
For the arm64 bits in this series:
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
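The series title describes the usual asm-generic pattern. A generic header supplies a default for each primitive. An architecture that needs something different defines a `__HAVE_ARCH_*` macro and its own version before the generic header is pulled in. Below is a user-space sketch of that pattern. The `pte_t` stand-in, the bit values and the "arch" override are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in page table entry and bits (hypothetical values). */
typedef struct { uint64_t val; } pte_t;
#define PTE_WRITE	0x2ULL
#define PTE_RDONLY	0x4ULL	/* hypothetical explicit read-only bit */

static inline bool pte_none(pte_t pte) { return pte.val == 0; }

static inline pte_t pte_wrprotect(pte_t pte)
{
	pte.val &= ~PTE_WRITE;
	return pte;
}

/* ---- "arch" header: overrides one primitive, keeps the rest ---- */
#define __HAVE_ARCH_HUGE_PTE_WRPROTECT
static inline pte_t huge_pte_wrprotect(pte_t pte)
{
	pte.val &= ~PTE_WRITE;
	pte.val |= PTE_RDONLY;
	return pte;
}

/* ---- "generic" header: defaults unless the arch opted out ---- */
#ifndef __HAVE_ARCH_HUGE_PTE_NONE
static inline bool huge_pte_none(pte_t pte)
{
	return pte_none(pte);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
static inline pte_t huge_pte_wrprotect(pte_t pte)
{
	return pte_wrprotect(pte);
}
#endif
```

Here `huge_pte_none()` comes from the generic default, while `huge_pte_wrprotect()` is the arch version. The generic copy is compiled out by the `#ifndef`.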
* Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
From: Christoph Hellwig @ 2018-07-31 8:07 UTC
To: Bjorn Helgaas
Cc: Christoph Hellwig, linux-pci, iommu, linuxppc-dev, x86, linux-sh,
linux-kernel, Joerg Roedel
In-Reply-To: <20180730211713.GA45322@bhelgaas-glaptop.roam.corp.google.com>
On Mon, Jul 30, 2018 at 04:17:13PM -0500, Bjorn Helgaas wrote:
> [+cc Joerg]
>
> On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> > There is nothing arch specific about PCI or dma-debug, so move this
> > call to common code just after registering the bus type.
>
> I assume that previously, even if the user set CONFIG_DMA_API_DEBUG=y
> we only got PCI DMA debug on powerpc, sh, and x86. And after this
> patch, we'll get PCI DMA debug on *all* arches?
Yes. Note that this only covers the actual bus related part, that
is warning about outstanding dma mappings on unload. The rest of the
dma api debugging already is entirely generic.
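As described here, the per-mapping dma-debug checks are generic. Registering a bus only adds the unload-time report of mappings a driver left behind. Below is a minimal user-space model of that split, using hypothetical `struct bus` and `struct dev` stand-ins. In the kernel, the report is driven by a bus notifier on driver unbind rather than a direct call.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct bus_type and struct device. */
struct bus { const char *name; bool dma_debug; };
struct dev { const char *name; struct bus *bus; int pending_maps; };

/* Models dma_debug_add_bus(): opt the bus in to unload-time checks. */
static void debug_add_bus(struct bus *b) { b->dma_debug = true; }

/* Mapping bookkeeping happens regardless of bus registration. */
static void dev_map(struct dev *d)   { d->pending_maps++; }
static void dev_unmap(struct dev *d) { d->pending_maps--; }

/*
 * Models the driver-unbind notification: returns the number of leaked
 * mappings reported, or 0 if nothing leaked or the bus never opted in.
 */
static int driver_unbind(struct dev *d)
{
	if (!d->bus->dma_debug || d->pending_maps == 0)
		return 0;
	fprintf(stderr, "%s: driver has %d pending DMA mappings on unbind\n",
		d->name, d->pending_maps);
	return d->pending_maps;
}
```

The same leak goes unreported until the bus has been registered. This is why moving the call into common PCI code extends the unload check to every architecture.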
* Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code
From: Joerg Roedel @ 2018-07-31 7:36 UTC
To: Bjorn Helgaas
Cc: Christoph Hellwig, linux-pci, iommu, linuxppc-dev, x86, linux-sh,
linux-kernel
In-Reply-To: <20180730211713.GA45322@bhelgaas-glaptop.roam.corp.google.com>
On Mon, Jul 30, 2018 at 04:17:13PM -0500, Bjorn Helgaas wrote:
> [+cc Joerg]
>
> On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> > There is nothing arch specific about PCI or dma-debug, so move this
> > call to common code just after registering the bus type.
>
> I assume that previously, even if the user set CONFIG_DMA_API_DEBUG=y
> we only got PCI DMA debug on powerpc, sh, and x86. And after this
> patch, we'll get PCI DMA debug on *all* arches?
>
> If that's true, I'll add a comment to that effect to the commitlog
> since that new functionality might be of interest to other arches.
There should be implicit support for dma-debug for all arches that use
the generic dma_ops code. The dma_debug_add_bus() function just adds the
reporting of pending dma-allocations on driver-unload for a device.
Regards,
Joerg
* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
From: Anshuman Khandual @ 2018-07-31 7:00 UTC
To: Christoph Hellwig
Cc: robh, srikar, mst, aik, jasowang, linuxram, linux-kernel,
virtualization, paulus, joe, linuxppc-dev, elfring, haren, david
In-Reply-To: <20180730092551.GB26245@infradead.org>
On 07/30/2018 02:55 PM, Christoph Hellwig wrote:
>> +const struct dma_map_ops virtio_direct_dma_ops;
>
> This belongs into a header if it is non-static. If you only
> use it in this file anyway please mark it static and avoid a forward
> declaration.
Sure, will make it static and move the definition up in the file to avoid
the forward declaration.
>
>> +
>> int virtio_finalize_features(struct virtio_device *dev)
>> {
>> int ret = dev->config->finalize_features(dev);
>> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>> if (ret)
>> return ret;
>>
>> + if (virtio_has_iommu_quirk(dev))
>> + set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>
> This needs a big fat comment explaining what is going on here.
Sure, will do. I will also mention the Xen domain exception once
that goes into this conditional statement.
>
> Also not new, but I find the existence of virtio_has_iommu_quirk and its
> name horribly confusing. It might be better to open code it here once
> only a single caller is left.
Sure, will do. There is one definition in the tools directory which can
be removed, and then this will be the only one left.
* Re: [PATCH v3 1/1] powerpc/pseries: fix EEH recovery of some IOV devices
From: Michael Ellerman @ 2018-07-31 6:43 UTC
To: Bjorn Helgaas, Sam Bobroff; +Cc: linuxppc-dev, linux-pci, bhelgaas, bryantly
In-Reply-To: <20180730212155.GB45322@bhelgaas-glaptop.roam.corp.google.com>
Bjorn Helgaas <helgaas@kernel.org> writes:
> On Mon, Jul 30, 2018 at 11:59:14AM +1000, Sam Bobroff wrote:
>> EEH recovery currently fails on pSeries for some IOV capable PCI
>> devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
>> certain device tree properties for the device. (Found on an IOV
>> capable device using the ipr driver.)
>>
>> Recovery fails in pci_enable_resources() at the check on r->parent,
>> because r->flags is set and r->parent is not. This state is due to
>> sriov_init() setting the start, end and flags members of the IOV BARs
>> but the parent not being set later in
>> pseries_pci_fixup_iov_resources(), because the
>> "ibm,open-sriov-vf-bar-info" property is missing.
>>
>> Correct this by zeroing the resource flags for IOV BARs when they
>> can't be configured (this is the same method used by sriov_init() and
>> __pci_read_base()).
>>
>> VFs cleared this way can't be enabled later, because that requires
>> another device tree property, "ibm,number-of-configurable-vfs" as well
>> as support for the RTAS function "ibm_map_pes". These are all part of
>> hypervisor support for IOV and it seems unlikely that a hypervisor
>> would ever partially, but not fully, support it. (None are currently
>> provided by QEMU/KVM.)
>>
>> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
>
> Michael, I assume you'll take this, since it only touches powerpc.
> Let me know if you need anything from me.
Yeah I'll take it, thanks.
cheers
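Below is a user-space model of the failure and the fix described in the patch. The enable path skips resources whose flags are zero but rejects any flagged resource that has no parent. Clearing the flags of IOV BARs that could not be placed therefore lets recovery proceed. `struct res`, `enable_check()` and `disable_unplaced_iov()` are hypothetical simplifications of `struct resource`, `pci_enable_resources()` and the pseries fixup, not their real code.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct resource. */
struct res {
	unsigned long start, end, flags;
	const struct res *parent;	/* NULL: never inserted into a tree */
};

/* Models the failing check: a flagged resource must have a parent. */
static int enable_check(const struct res *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!r[i].flags)
			continue;		/* unused BAR: ignored */
		if (!r[i].parent)
			return -EINVAL;		/* flagged but never assigned */
	}
	return 0;
}

/*
 * Models the fix: when "ibm,open-sriov-vf-bar-info" is missing, the IOV
 * BARs cannot be placed, so their flags are zeroed (the same approach
 * the patch attributes to sriov_init() and __pci_read_base()).
 */
static void disable_unplaced_iov(struct res *iov, size_t n)
{
	for (size_t i = 0; i < n; i++)
		iov[i].flags = 0;
}
```

Zeroing the flags makes the unassigned BAR look unused, rather than leaving it looking assigned but broken.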
* Re: [PATCH] powerpc/mobility: Fix node detach/rename problem
From: Michael Ellerman @ 2018-07-31 6:42 UTC
To: Tyrel Datwyler, Michael Bringmann, linuxppc-dev
Cc: Nathan Fontenot, Thomas Falcon, John Allen
In-Reply-To: <9cd25a93-6c71-728c-e9bf-a16f80ef5655@linux.vnet.ibm.com>
Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree. When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs. When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel. The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>>
>> During LPAR migration some common nodes are deleted and re-added
>> e.g. "ibm,platform-facilities". If a node is re-added to the OF
>> node lists, the of_attach_node function checks to make sure that
>> the name + ibm,phandle of the to-be-added data is unique. As the
>> previous copy of a re-added node is not modified beyond the addition
>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>> notice to the console, (3) renames the to-be-added node to avoid
>> filename collisions within a directory, and (4) adds entries to
>> the sysfs/kernfs.
>
> So, this patch actually just band-aids over the real problem. This is
> a long-standing problem with several PFO drivers leaking references.
> The issue here is that, during the device tree update that follows a
> migration, the ibm,platform-facilities node and the nodes below it are
> always deleted and re-added on the destination LPAR, and
> subsequently the leaked references prevent the device nodes from
> ever actually being properly cleaned up after detach, leading
> to the issue you are observing.
Leaking references shouldn't affect the node being detached from the
tree though.
See of_detach_node() calling __of_detach_node(), none of that depends on
the refcount.
It's only the actual freeing of the node, in of_node_release() that is
prevented by leaked reference counts.
So I agree we need to do a better job with the reference counting, but I
don't see how it is causing the problem here.
cheers
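Below is a user-space model of the distinction drawn here. Detaching a node is unconditional, while freeing it waits for the last reference to be dropped. A leaked reference therefore keeps the memory around but does not keep the node attached. `struct node` and its helpers are hypothetical stand-ins for `struct device_node`, `of_detach_node()`, `of_node_get()`/`of_node_put()` and `of_node_release()`.

```c
#include <stdbool.h>

struct node {
	const char *name;
	int refcount;	/* stand-in for the kref in struct device_node */
	bool detached;	/* stand-in for the OF_DETACHED flag */
	bool released;	/* true once the release callback would run */
};

static void node_get(struct node *n)
{
	n->refcount++;
}

static void node_put(struct node *n)
{
	if (--n->refcount == 0)
		n->released = true;	/* of_node_release() analogue */
}

/* Detach never consults the refcount. */
static void node_detach(struct node *n)
{
	n->detached = true;
}
```

Suppose a driver takes a reference and never drops it. After detach and the tree's own put, the node is detached but not released, with one reference left outstanding.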
* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
From: Anshuman Khandual @ 2018-07-31 6:39 UTC
To: Christoph Hellwig, Michael S. Tsirkin
Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
elfring, david, jasowang, benh, mpe, linuxram, haren, paulus,
srikar
In-Reply-To: <20180730093027.GC26245@infradead.org>
On 07/30/2018 03:00 PM, Christoph Hellwig wrote:
>>> +
>>> + if (xen_domain())
>>> + goto skip_override;
>>> +
>>> + if (virtio_has_iommu_quirk(dev))
>>> + set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>> +
>>> + skip_override:
>>> +
>>
>> I prefer normal if scoping as opposed to goto spaghetti pls.
>> Better yet move vring_use_dma_api here and use it.
>> Less of a chance something will break.
>
> I agree about avoiding pointless gotos here, but we can do things
> perfectly well without either gotos or a confusing helper here
> if we structure it right. E.g.:
>
> // suitably detailed comment here
> if (!xen_domain() &&
> !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
I had updated this patch calling vring_use_dma_api() as a helper
as suggested by Michael but yes we can have the above condition
with a comment block. I will change this patch accordingly.
>
> and while we're at it - modifying dma ops for the parent looks very
> dangerous. I don't think we can do that, as it could break iommu
> setup interactions. IFF we set a specific dma map ops it has to be
> on the virtio device itself, of which we have full control.
I understand your concern. At present the virtio core calls the parent's
DMA ops callbacks when the device has the VIRTIO_F_IOMMU_PLATFORM flag
set. Most likely those DMA ops are architecture-specific ones which can
actually configure the IOMMU. Most probably all devices and their parents
share the same DMA ops callbacks. IIUC, as long as the entire system has
a single DMA ops structure, it should be okay, but I may be missing
other implications. I tried changing the virtio core so that it always
calls the device's DMA ops instead of its parent's DMA ops. It hit the
following WARN_ON for devices without the IOMMU flag, and hit both the
WARN_ON and BUG_ON for devices with the IOMMU flag.
static inline void *dma_alloc_attrs(struct device *dev, size_t size,
				       dma_addr_t *dma_handle, gfp_t flag,
				       unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);
	void *cpu_addr;

	BUG_ON(!ops);
	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
--------
It seems the virtio device's DMA ops and coherent_dma_mask were never
set correctly, on the assumption that the virtio core always calls the
parent's DMA ops. We may have to change virtio device init to fix
this. Any thoughts?
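Below is a user-space sketch of the direction suggested in this thread. It decides with the plain condition rather than a goto, and it installs the ops on the virtio device itself instead of on its parent. Two parts are assumptions for illustration: the simplified `struct device` and `struct dma_ops` stand-ins, and copying the parent's `coherent_dma_mask` so that the `BUG_ON`/`WARN_ON_ONCE` checks in `dma_alloc_attrs()` would pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_map_ops. */
struct dma_ops { const char *name; };

static const struct dma_ops direct_ops = { "virtio_direct" };

/* Hypothetical stand-in for struct device. */
struct device {
	struct device *parent;
	const struct dma_ops *dma_ops;
	unsigned long long coherent_dma_mask;
};

/*
 * Mirrors the condition suggested in the thread: bypass the platform
 * DMA ops only when not running under Xen and the device did not
 * negotiate VIRTIO_F_IOMMU_PLATFORM.
 */
static bool want_direct_ops(bool xen_domain, bool has_iommu_platform)
{
	return !xen_domain && !has_iommu_platform;
}

/*
 * Set ops on the virtio device itself, leaving the parent untouched.
 * Assumes vdev->parent is non-NULL. Inheriting the parent's mask is
 * an illustrative assumption, not something settled in the thread.
 */
static void setup_dma(struct device *vdev, bool xen, bool iommu_platform)
{
	if (want_direct_ops(xen, iommu_platform))
		vdev->dma_ops = &direct_ops;
	else
		vdev->dma_ops = vdev->parent->dma_ops;
	vdev->coherent_dma_mask = vdev->parent->coherent_dma_mask;
}
```

Because the parent's ops pointer is never modified, any IOMMU setup that the parent shares with other devices stays intact. That was the concern raised about overriding the parent's DMA ops.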