* [PATCH v2 0/5] Winkle support for offline cpus
From: Shreyas B. Prabhu @ 2014-10-01 7:46 UTC
To: linux-kernel
Cc: Srivatsa S. Bhat, Shreyas B. Prabhu, Paul Mackerras, Preeti U. Murthy, linuxppc-dev

Powernv already has support for nap and sleep, and these states are used
by the cpuidle framework. This patchset adds support for 'deep winkle',
a deeper idle state. In deep winkle, the entire chiplet (core/L2/L3) is
powered off, leading to higher power savings. But this results in
hypervisor state loss. This patchset adds the necessary infrastructure to
recover from hypervisor state loss and enables offline cpus to use winkle.

I've successfully tested subcore functionality with these patches.
Particularly these two scenarios:

Scenario 1:
-> Set subcore-per-core to 4.
-> Offline and online a complete core.
Check if core wakes up with 4 subcores.

Scenario 2:
-> Set subcore-per-core to 1.
-> Offline a core.
-> Set subcore-per-core to 4.
-> Online a core.
Check if core wakes up with 4 subcores.

In both these scenarios, the core wakes up with 4 subcores and can run
guests on individual subcores.

Note, these patches apply on top of the 'powernv/cpuidle: Fastsleep
workaround and fixes' series.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Srivatsa S. Bhat <srivatsa@MIT.EDU>
Cc: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org

v2:
Rebased on 3.17-rc7
Split from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
https://lkml.org/lkml/2014/8/25/446

Shreyas B. Prabhu (5):
  powerpc/powernv: Add OPAL call to save and restore
  powerpc: Adding macro for accessing Thread Switch Control Register
  powerpc/powernv: Add winkle infrastructure
  powerpc/powernv: Discover and enable winkle
  powerpc/powernv: Enter deepest supported idle state in offline

 arch/powerpc/include/asm/machdep.h             |  1 +
 arch/powerpc/include/asm/opal.h                |  3 +
 arch/powerpc/include/asm/paca.h                |  3 +
 arch/powerpc/include/asm/ppc-opcode.h          |  2 +
 arch/powerpc/include/asm/processor.h           |  2 +
 arch/powerpc/include/asm/reg.h                 |  1 +
 arch/powerpc/kernel/asm-offsets.c              |  1 +
 arch/powerpc/kernel/exceptions-64s.S           |  4 +-
 arch/powerpc/kernel/idle.c                     | 11 +++
 arch/powerpc/kernel/idle_power7.S              | 81 ++++++++++++++++++++-
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/powernv.h       |  1 +
 arch/powerpc/platforms/powernv/setup.c         | 99 ++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/smp.c           |  6 +-
 arch/powerpc/platforms/powernv/subcore.c       | 15 ++++
 15 files changed, 226 insertions(+), 5 deletions(-)

--
1.9.3
* [PATCH v2 1/5] powerpc/powernv: Add OPAL call to save and restore
From: Shreyas B. Prabhu @ 2014-10-01 7:46 UTC
To: linux-kernel
Cc: Shreyas B. Prabhu, linuxppc-dev, Paul Mackerras

PORE can be programmed to restore hypervisor registers when waking up
from deep cpu idle states like winkle.

Add call to pass SPR address and value to OPAL, which in turn will
program PORE to restore the register state.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Suggested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h                | 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 166d572..d376020 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -150,6 +150,7 @@ struct opal_sg_list {
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
 #define OPAL_CONFIG_IDLE_STATE			99
+#define OPAL_SLW_SET_REG			100
 #define OPAL_REGISTER_DUMP_REGION		101
 #define OPAL_UNREGISTER_DUMP_REGION		102

@@ -978,6 +979,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
 extern void opal_shutdown(void);
 extern int opal_resync_timebase(void);
 int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
+int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);

 extern void opal_lpc_init(void);

diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 8d1e724..12e5d46 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -246,5 +246,6 @@ OPAL_CALL(opal_get_param,			OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
 OPAL_CALL(opal_config_idle_state,		OPAL_CONFIG_IDLE_STATE);
+OPAL_CALL(opal_slw_set_reg,			OPAL_SLW_SET_REG);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
--
1.9.3
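
Editor's sketch, not part of the patch: the commit message says the kernel passes an SPR number and value to OPAL, which then programs PORE. A caller might loop over a small table of registers as below. Only the `opal_slw_set_reg(cpu_pir, sprn, val)` prototype comes from the patch. The stub body that records requests, `OPAL_SUCCESS` being 0, `struct spr_value`, and `program_wakeup_sprs()` are assumptions made so the example compiles in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPAL_SUCCESS	0	/* assumed success code for this sketch */
#define SPRN_TSCR	0x399	/* value taken from patch 2/5 */
#define SLW_LOG_MAX	16	/* hypothetical capacity of the stub log */

/* Hypothetical record of one "restore this SPR on wakeup" request. */
struct slw_entry {
	uint64_t pir, sprn, val;
};

static struct slw_entry slw_log[SLW_LOG_MAX];
static size_t slw_count;

/*
 * User-space stand-in for the firmware call. The real call is an OPAL
 * wrapper generated by OPAL_CALL(); this stub only records the request.
 */
int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val)
{
	if (slw_count >= SLW_LOG_MAX)
		return -1;
	slw_log[slw_count].pir = cpu_pir;
	slw_log[slw_count].sprn = sprn;
	slw_log[slw_count].val = val;
	slw_count++;
	return OPAL_SUCCESS;
}

/* Hypothetical (SPR number, value) pair used by the helper below. */
struct spr_value {
	uint64_t sprn, val;
};

/* Ask firmware to restore each listed SPR for one cpu; stop on first error. */
int64_t program_wakeup_sprs(uint64_t pir, const struct spr_value *regs, size_t n)
{
	size_t i;

	assert(regs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int64_t rc = opal_slw_set_reg(pir, regs[i].sprn, regs[i].val);

		if (rc != OPAL_SUCCESS)
			return rc;
	}
	return OPAL_SUCCESS;
}
```

Which SPRs the series actually hands to OPAL is decided in a later patch and is not shown in this part of the thread.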
* Re: [PATCH v2 1/5] powerpc/powernv: Add OPAL call to save and restore
From: Benjamin Herrenschmidt @ 2014-10-07 5:22 UTC
To: Shreyas B. Prabhu
Cc: Paul Mackerras, linuxppc-dev, linux-kernel

On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote:
> PORE can be programmed to restore hypervisor registers when waking up
> from deep cpu idle states like winkle.

Tell us a bit more about what "PORE" is. I.e., explain that a tiny engine
will reconfigure the core and that its ucode can be patched to provide
some registers with sane values.

> Add call to pass SPR address and value to OPAL, which in turn will
> program PORE to restore the register state.

Otherwise,

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Suggested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/opal.h                | 2 ++
>  arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
>  2 files changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 166d572..d376020 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -150,6 +150,7 @@ struct opal_sg_list {
>  #define OPAL_PCI_EEH_FREEZE_SET			97
>  #define OPAL_HANDLE_HMI				98
>  #define OPAL_CONFIG_IDLE_STATE			99
> +#define OPAL_SLW_SET_REG			100
>  #define OPAL_REGISTER_DUMP_REGION		101
>  #define OPAL_UNREGISTER_DUMP_REGION		102
>
> @@ -978,6 +979,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
>  extern void opal_shutdown(void);
>  extern int opal_resync_timebase(void);
>  int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
> +int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
>
>  extern void opal_lpc_init(void);
>
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index 8d1e724..12e5d46 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -246,5 +246,6 @@ OPAL_CALL(opal_get_param,			OPAL_GET_PARAM);
>  OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
>  OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
>  OPAL_CALL(opal_config_idle_state,		OPAL_CONFIG_IDLE_STATE);
> +OPAL_CALL(opal_slw_set_reg,			OPAL_SLW_SET_REG);
>  OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
>  OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
* [PATCH v2 2/5] powerpc: Adding macro for accessing Thread Switch Control Register
From: Shreyas B. Prabhu @ 2014-10-01 7:46 UTC
To: linux-kernel
Cc: Shreyas B. Prabhu, linuxppc-dev, Paul Mackerras

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/reg.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 0c05059..cb65a73 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -371,6 +371,7 @@
 #define SPRN_DBAT7L	0x23F	/* Data BAT 7 Lower Register */
 #define SPRN_DBAT7U	0x23E	/* Data BAT 7 Upper Register */
 #define SPRN_PPR	0x380	/* SMT Thread status Register */
+#define SPRN_TSCR	0x399	/* Thread Switch Control Register */

 #define SPRN_DEC	0x016		/* Decrement Register */
 #define SPRN_DER	0x095		/* Debug Enable Regsiter */
--
1.9.3
* Re: [PATCH v2 2/5] powerpc: Adding macro for accessing Thread Switch Control Register
From: Benjamin Herrenschmidt @ 2014-10-07 5:22 UTC
To: Shreyas B. Prabhu
Cc: linuxppc-dev, Paul Mackerras, linux-kernel

Just fold that one into the patch that uses that register.

On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote:
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/reg.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 0c05059..cb65a73 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -371,6 +371,7 @@
>  #define SPRN_DBAT7L	0x23F	/* Data BAT 7 Lower Register */
>  #define SPRN_DBAT7U	0x23E	/* Data BAT 7 Upper Register */
>  #define SPRN_PPR	0x380	/* SMT Thread status Register */
> +#define SPRN_TSCR	0x399	/* Thread Switch Control Register */
>
>  #define SPRN_DEC	0x016		/* Decrement Register */
>  #define SPRN_DER	0x095		/* Debug Enable Regsiter */
* [PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure
From: Shreyas B. Prabhu @ 2014-10-01 7:46 UTC
To: linux-kernel
Cc: Shreyas B. Prabhu, linuxppc-dev, Paul Mackerras

Winkle causes power to be gated off to the entire chiplet. Hence the
hypervisor/firmware state in the entire chiplet is lost.

This patch adds necessary infrastructure to support waking up from
hypervisor state loss. Specifically does following:
- Before entering winkle, save state of registers that need to be
  restored on wake up (SDR1, HFSCR)

- SRR1 bits 46:47 which is used to identify which power saving mode cpu
  woke up from is '11' for both winkle and sleep. Hence introduce a flag
  in PACA to distinguish b/w winkle and sleep.

- Upon waking up, restore all saved registers, recover slb

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Suggested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/machdep.h     |  1 +
 arch/powerpc/include/asm/paca.h        |  3 ++
 arch/powerpc/include/asm/ppc-opcode.h  |  2 +
 arch/powerpc/include/asm/processor.h   |  2 +
 arch/powerpc/kernel/asm-offsets.c      |  1 +
 arch/powerpc/kernel/exceptions-64s.S   |  8 ++--
 arch/powerpc/kernel/idle.c             | 11 +++++
 arch/powerpc/kernel/idle_power7.S      | 81 +++++++++++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/setup.c | 24 ++++++++++
 9 files changed, 127 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f37014f..0a3ced9 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -301,6 +301,7 @@ struct machdep_calls {
 	/* Idle handlers */
 	void		(*setup_idle)(void);
 	unsigned long	(*power7_sleep)(void);
+	unsigned long	(*power7_winkle)(void);
 };

 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index a5139ea..3358f09 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -158,6 +158,9 @@ struct paca_struct {
 	 * early exception handler for use by high level C handler
 	 */
 	struct opal_machine_check_event *opal_mc_evt;
+
+	/* Flag to distinguish b/w sleep and winkle */
+	u8 offline_state;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 6f85362..5155be7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -194,6 +194,7 @@

 #define PPC_INST_NAP			0x4c000364
 #define PPC_INST_SLEEP			0x4c0003a4
+#define PPC_INST_WINKLE			0x4c0003e4

 /* A2 specific instructions */
 #define PPC_INST_ERATWE			0x7c0001a6
@@ -374,6 +375,7 @@

 #define PPC_NAP			stringify_in_c(.long PPC_INST_NAP)
 #define PPC_SLEEP		stringify_in_c(.long PPC_INST_SLEEP)
+#define PPC_WINKLE		stringify_in_c(.long PPC_INST_WINKLE)

 /* BHRB instructions */
 #define PPC_CLRBHRB		stringify_in_c(.long PPC_INST_CLRBHRB)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 41953cd..00e3df9 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -455,6 +455,8 @@ extern void arch_setup_idle(void);
 extern void power7_nap(int check_irq);
 extern unsigned long power7_sleep(void);
 extern unsigned long __power7_sleep(void);
+extern unsigned long power7_winkle(void);
+extern unsigned long __power7_winkle(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9d7dede..ea98817 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -731,6 +731,7 @@ int main(void)
 	DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0));
 	DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1));
 	DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt));
+	DEFINE(PACAOFFLINESTATE, offsetof(struct paca_struct, offline_state));
 #endif

 	return 0;
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index c64f3cc0..261f348 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -115,9 +115,7 @@ BEGIN_FTR_SECTION
 #endif

 	/* Running native on arch 2.06 or later, check if we are
-	 * waking up from nap. We only handle no state loss and
-	 * supervisor state loss. We do -not- handle hypervisor
-	 * state loss at this time.
+	 * waking up from power saving mode.
 	 */
 	mfspr	r13,SPRN_SRR1
 	rlwinm.	r13,r13,47-31,30,31
@@ -133,8 +131,8 @@ BEGIN_FTR_SECTION
 	b	power7_wakeup_noloss
 2:	b	power7_wakeup_loss

-	/* Fast Sleep wakeup on PowerNV */
-8:	b	power7_wakeup_tb_loss
+	/* Fast Sleep / Winkle wakeup on PowerNV */
+8:	b	power7_wakeup_hv_state_loss

 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 1f268e0..ed46217 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -98,6 +98,17 @@ unsigned long power7_sleep(void)
 	return ret;
 }

+unsigned long power7_winkle(void)
+{
+	unsigned long ret;
+
+	if (ppc_md.power7_winkle)
+		ret = ppc_md.power7_winkle();
+	else
+		ret = __power7_winkle();
+	return ret;
+}
+
 int powersave_nap;

 #ifdef CONFIG_SYSCTL
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index c3481c9..87b2556 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -18,6 +18,13 @@
 #include <asm/hw_irq.h>
 #include <asm/kvm_book3s_asm.h>
 #include <asm/opal.h>
+#include <asm/mmu-hash64.h>
+
+/*
+ * Use volatile GPRs' space to save essential SPRs before entering winkle
+ */
+#define _SDR1	GPR3
+#define _TSCR	GPR4

 #undef DEBUG

@@ -39,6 +46,7 @@
  * Pass requested state in r3:
  *	0 - nap
  *	1 - sleep
+ *	2 - winkle
  *
  * To check IRQ_HAPPENED in r4
  *	0 - don't check
@@ -109,9 +117,27 @@ _GLOBAL(power7_enter_nap_mode)
 #endif
 	cmpwi	cr0,r3,1
 	beq	2f
+	cmpwi	cr0,r3,2
+	beq	3f
 	IDLE_STATE_ENTER_SEQ(PPC_NAP)
 	/* No return */
-2:	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+2:
+	li	r4,1
+	stb	r4,PACAOFFLINESTATE(r13)
+	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+	/* No return */
+
+3:
+	mfspr	r4,SPRN_SDR1
+	std	r4,_SDR1(r1)
+
+	mfspr	r4,SPRN_TSCR
+	std	r4,_TSCR(r1)
+
+	/* Enter winkle */
+	li	r4,0
+	stb	r4,PACAOFFLINESTATE(r13)
+	IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
 	/* No return */

 _GLOBAL(power7_idle)
@@ -187,6 +213,59 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
 20:	nop;


+_GLOBAL(__power7_winkle)
+	li	r3,2
+	li	r4,1
+	b	power7_powersave_common
+	/* No return */
+
+_GLOBAL(power7_wakeup_hv_state_loss)
+	/* Check paca flag to diffentiate b/w fast sleep and winkle */
+	lbz	r4,PACAOFFLINESTATE(13)
+	cmpwi	cr0,r4,0
+	bne	power7_wakeup_tb_loss
+
+	ld	r2,PACATOC(r13);
+	ld	r1,PACAR1(r13)
+
+	bl	__restore_cpu_power8
+
+	/* Time base re-sync */
+	li	r3,OPAL_RESYNC_TIMEBASE
+	bl	opal_call_realmode;
+
+	/* Restore SLB from PACA */
+	ld	r8,PACA_SLBSHADOWPTR(r13)
+
+	.rept	SLB_NUM_BOLTED
+	li	r3, SLBSHADOW_SAVEAREA
+	LDX_BE	r5, r8, r3
+	addi	r3, r3, 8
+	LDX_BE	r6, r8, r3
+	andis.	r7,r5,SLB_ESID_V@h
+	beq	1f
+	slbmte	r6,r5
+1:	addi	r8,r8,16
+	.endr
+
+	ld	r4,_SDR1(r1)
+	mtspr	SPRN_SDR1,r4
+
+	ld	r4,_TSCR(r1)
+	mtspr	SPRN_TSCR,r4
+
+	REST_NVGPRS(r1)
+	REST_GPR(2, r1)
+	ld	r3,_CCR(r1)
+	ld	r4,_MSR(r1)
+	ld	r5,_NIP(r1)
+	addi	r1,r1,INT_FRAME_SIZE
+	mtcr	r3
+	mfspr	r3,SPRN_SRR1		/* Return SRR1 */
+	mtspr	SPRN_SRR1,r4
+	mtspr	SPRN_SRR0,r5
+	rfid
+
 _GLOBAL(power7_wakeup_tb_loss)
 	ld	r2,PACATOC(r13);
 	ld	r1,PACAR1(r13)
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 9d9a898..f45b52d 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -370,6 +370,29 @@ static unsigned long pnv_power7_sleep(void)
 	return srr1;
 }

+/*
+ * We need to keep track of offline cpus also for calling
+ * fastsleep workaround appropriately
+ */
+static unsigned long pnv_power7_winkle(void)
+{
+	int cpu, primary_thread;
+	unsigned long srr1;
+
+	cpu = smp_processor_id();
+	primary_thread = cpu_first_thread_sibling(cpu);
+
+	if (need_fastsleep_workaround) {
+		pnv_apply_fastsleep_workaround(1, primary_thread);
+		srr1 = __power7_winkle();
+		pnv_apply_fastsleep_workaround(0, primary_thread);
+	} else {
+		srr1 = __power7_winkle();
+	}
+	return srr1;
+}
+
+
 static void __init pnv_setup_machdep_opal(void)
 {
 	ppc_md.get_boot_time = opal_get_boot_time;
@@ -384,6 +407,7 @@ static void __init pnv_setup_machdep_opal(void)
 	ppc_md.handle_hmi_exception = opal_handle_hmi_exception;
 	ppc_md.setup_idle = pnv_setup_idle;
 	ppc_md.power7_sleep = pnv_power7_sleep;
+	ppc_md.power7_winkle = pnv_power7_winkle;
 }

 #ifdef CONFIG_PPC_POWERNV_RTAS
--
1.9.3
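
Editor's sketch, not part of the patch: the wakeup decision described in the commit message can be modelled in C roughly as follows. The `rlwinm. r13,r13,47-31,30,31` in the exceptions-64s.S hunk extracts SRR1 bits 46:47 (IBM numbering, bit 0 is the MSB), which corresponds to `(srr1 >> 16) & 3`. The mapping of values 1 and 2 to the no-loss and loss paths is assumed from the surrounding kernel code, which this hunk does not show. The enum and function names are hypothetical. The flag polarity mirrors the assembly above: 1 is stored before sleep, 0 before winkle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the wakeup paths branched to in the assembly. */
enum wake_path {
	WAKE_NOT_POWERSAVE,	/* SRR1[46:47] == 0: not a power-saving wakeup */
	WAKE_NOLOSS,		/* power7_wakeup_noloss */
	WAKE_LOSS,		/* power7_wakeup_loss */
	WAKE_TB_LOSS,		/* power7_wakeup_tb_loss (fast sleep) */
	WAKE_WINKLE		/* full restore in power7_wakeup_hv_state_loss */
};

/* IBM bits 46:47 of a 64-bit register are bits 17:16 counted from the LSB. */
static inline unsigned int srr1_wake_state(uint64_t srr1)
{
	return (unsigned int)((srr1 >> 16) & 0x3);
}

/*
 * offline_state models the PACA byte written before entering idle:
 * nonzero means the thread went to sleep, zero means it went to winkle.
 */
enum wake_path classify_wakeup(uint64_t srr1, uint8_t offline_state)
{
	switch (srr1_wake_state(srr1)) {
	case 0:
		return WAKE_NOT_POWERSAVE;
	case 1:
		return WAKE_NOLOSS;
	case 2:
		return WAKE_LOSS;
	default:
		/* '11' is reported for both sleep and winkle. */
		return offline_state ? WAKE_TB_LOSS : WAKE_WINKLE;
	}
}
```

The need for the extra byte follows directly from the `default` case: SRR1 alone cannot tell the two deepest states apart.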
* Re: [PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure
From: Benjamin Herrenschmidt @ 2014-10-07 5:33 UTC
To: Shreyas B. Prabhu
Cc: linuxppc-dev, Paul Mackerras, linux-kernel

On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote:
> Winkle causes power to be gated off to the entire chiplet. Hence the
> hypervisor/firmware state in the entire chiplet is lost.
>
> This patch adds necessary infrastructure to support waking up from
> hypervisor state loss. Specifically does following:
> - Before entering winkle, save state of registers that need to be
>   restored on wake up (SDR1, HFSCR)

Add ... to your list, it's not exhaustive, is it?

> - SRR1 bits 46:47 which is used to identify which power saving mode cpu
>   woke up from is '11' for both winkle and sleep. Hence introduce a flag
>   in PACA to distinguish b/w winkle and sleep.
>
> - Upon waking up, restore all saved registers, recover slb
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org
> Suggested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/machdep.h     |  1 +
>  arch/powerpc/include/asm/paca.h        |  3 ++
>  arch/powerpc/include/asm/ppc-opcode.h  |  2 +
>  arch/powerpc/include/asm/processor.h   |  2 +
>  arch/powerpc/kernel/asm-offsets.c      |  1 +
>  arch/powerpc/kernel/exceptions-64s.S   |  8 ++--
>  arch/powerpc/kernel/idle.c             | 11 +++++
>  arch/powerpc/kernel/idle_power7.S      | 81 +++++++++++++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/setup.c | 24 ++++++++++
>  9 files changed, 127 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index f37014f..0a3ced9 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -301,6 +301,7 @@ struct machdep_calls {
>  	/* Idle handlers */
>  	void		(*setup_idle)(void);
>  	unsigned long	(*power7_sleep)(void);
> +	unsigned long	(*power7_winkle)(void);
>  };

Why does it need to be in ppc_md? Same comments as for sleep.

>  extern void e500_idle(void);
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index a5139ea..3358f09 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -158,6 +158,9 @@ struct paca_struct {
>  	 * early exception handler for use by high level C handler
>  	 */
>  	struct opal_machine_check_event *opal_mc_evt;
> +
> +	/* Flag to distinguish b/w sleep and winkle */
> +	u8 offline_state;

Not a fan of the name. I'd rather you call it "wakeup_state_loss" or
something a bit more explicit about what that actually means if it's
going to be a boolean value. Otherwise make it an enumeration of
constants.

>  #endif
>  #ifdef CONFIG_PPC_BOOK3S_64
>  	/* Exclusive emergency stack pointer for machine check exception. */
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index 6f85362..5155be7 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -194,6 +194,7 @@
>
>  #define PPC_INST_NAP			0x4c000364
>  #define PPC_INST_SLEEP			0x4c0003a4
> +#define PPC_INST_WINKLE			0x4c0003e4
>
>  /* A2 specific instructions */
>  #define PPC_INST_ERATWE			0x7c0001a6
> @@ -374,6 +375,7 @@
>
>  #define PPC_NAP			stringify_in_c(.long PPC_INST_NAP)
>  #define PPC_SLEEP		stringify_in_c(.long PPC_INST_SLEEP)
> +#define PPC_WINKLE		stringify_in_c(.long PPC_INST_WINKLE)
>
>  /* BHRB instructions */
>  #define PPC_CLRBHRB		stringify_in_c(.long PPC_INST_CLRBHRB)
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 41953cd..00e3df9 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -455,6 +455,8 @@ extern void arch_setup_idle(void);
>  extern void power7_nap(int check_irq);
>  extern unsigned long power7_sleep(void);
>  extern unsigned long __power7_sleep(void);
> +extern unsigned long power7_winkle(void);
> +extern unsigned long __power7_winkle(void);
>  extern void flush_instruction_cache(void);
>  extern void hard_reset_now(void);
>  extern void poweroff_now(void);
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 9d7dede..ea98817 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -731,6 +731,7 @@ int main(void)
>  	DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0));
>  	DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1));
>  	DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt));
> +	DEFINE(PACAOFFLINESTATE, offsetof(struct paca_struct, offline_state));
>  #endif
>
>  	return 0;
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index c64f3cc0..261f348 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -115,9 +115,7 @@ BEGIN_FTR_SECTION
>  #endif
>
>  	/* Running native on arch 2.06 or later, check if we are
> -	 * waking up from nap. We only handle no state loss and
> -	 * supervisor state loss. We do -not- handle hypervisor
> -	 * state loss at this time.
> +	 * waking up from power saving mode.
>  	 */
>  	mfspr	r13,SPRN_SRR1
>  	rlwinm.	r13,r13,47-31,30,31
> @@ -133,8 +131,8 @@ BEGIN_FTR_SECTION
>  	b	power7_wakeup_noloss
>  2:	b	power7_wakeup_loss
>
> -	/* Fast Sleep wakeup on PowerNV */
> -8:	b	power7_wakeup_tb_loss
> +	/* Fast Sleep / Winkle wakeup on PowerNV */
> +8:	b	power7_wakeup_hv_state_loss
>
>  9:
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index 1f268e0..ed46217 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -98,6 +98,17 @@ unsigned long power7_sleep(void)
>  	return ret;
>  }
>
> +unsigned long power7_winkle(void)
> +{
> +	unsigned long ret;
> +
> +	if (ppc_md.power7_winkle)
> +		ret = ppc_md.power7_winkle();
> +	else
> +		ret = __power7_winkle();
> +	return ret;
> +}
> +
>  int powersave_nap;
>
>  #ifdef CONFIG_SYSCTL
> diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
> index c3481c9..87b2556 100644
> --- a/arch/powerpc/kernel/idle_power7.S
> +++ b/arch/powerpc/kernel/idle_power7.S
> @@ -18,6 +18,13 @@
>  #include <asm/hw_irq.h>
>  #include <asm/kvm_book3s_asm.h>
>  #include <asm/opal.h>
> +#include <asm/mmu-hash64.h>
> +
> +/*
> + * Use volatile GPRs' space to save essential SPRs before entering winkle
> + */
> +#define _SDR1	GPR3
> +#define _TSCR	GPR4
>
>  #undef DEBUG
>
> @@ -39,6 +46,7 @@
>   * Pass requested state in r3:
>   *	0 - nap
>   *	1 - sleep
> + *	2 - winkle
>   *
>   * To check IRQ_HAPPENED in r4
>   *	0 - don't check
> @@ -109,9 +117,27 @@ _GLOBAL(power7_enter_nap_mode)
>  #endif
>  	cmpwi	cr0,r3,1
>  	beq	2f
> +	cmpwi	cr0,r3,2
> +	beq	3f
>  	IDLE_STATE_ENTER_SEQ(PPC_NAP)
>  	/* No return */
> -2:	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
> +2:
> +	li	r4,1
> +	stb	r4,PACAOFFLINESTATE(r13)
> +	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
> +	/* No return */
> +
> +3:
> +	mfspr	r4,SPRN_SDR1
> +	std	r4,_SDR1(r1)
> +
> +	mfspr	r4,SPRN_TSCR
> +	std	r4,_TSCR(r1)
> +
> +	/* Enter winkle */
> +	li	r4,0
> +	stb	r4,PACAOFFLINESTATE(r13)
> +	IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
>  	/* No return */
>
>  _GLOBAL(power7_idle)
> @@ -187,6 +213,59 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>  20:	nop;
>
>
> +_GLOBAL(__power7_winkle)
> +	li	r3,2
> +	li	r4,1
> +	b	power7_powersave_common
> +	/* No return */
> +
> +_GLOBAL(power7_wakeup_hv_state_loss)
> +	/* Check paca flag to diffentiate b/w fast sleep and winkle */
> +	lbz	r4,PACAOFFLINESTATE(13)
> +	cmpwi	cr0,r4,0
> +	bne	power7_wakeup_tb_loss
> +
> +	ld	r2,PACATOC(r13);
> +	ld	r1,PACAR1(r13)
> +
> +	bl	__restore_cpu_power8

So if I understand correctly, you use a per-cpu flag, not a per-core
flag, which means we will assume the pessimistic case of having to
restore stuff even if the core didn't actually enter winkle (because
the last thread to go down went to sleep). This is sub-optimal. Also
see below:

> +	/* Time base re-sync */
> +	li	r3,OPAL_RESYNC_TIMEBASE
> +	bl	opal_call_realmode;

You will also resync the timebase (and restore all the core shared SPRs)
for each thread. This is problematic, especially with KVM, as you could
have a situation where:

 - The first thread comes out and starts diving into KVM

 - The other threads start coming out while the first one is doing the
   above.

Thus the first thread might already be manipulating some core registers
(SDR1 etc...) while the secondaries come back and ... whack it. Worse,
the primary might have applied the TB offset using TBU40 while the
secondaries resync the timebase back to the host value, incurring a
loss of TB for the guest.

> +	/* Restore SLB from PACA */
> +	ld	r8,PACA_SLBSHADOWPTR(r13)
> +
> +	.rept	SLB_NUM_BOLTED
> +	li	r3, SLBSHADOW_SAVEAREA
> +	LDX_BE	r5, r8, r3
> +	addi	r3, r3, 8
> +	LDX_BE	r6, r8, r3
> +	andis.	r7,r5,SLB_ESID_V@h
> +	beq	1f
> +	slbmte	r6,r5
> +1:	addi	r8,r8,16
> +	.endr
> +
> +	ld	r4,_SDR1(r1)
> +	mtspr	SPRN_SDR1,r4
> +
> +	ld	r4,_TSCR(r1)
> +	mtspr	SPRN_TSCR,r4
> +
> +	REST_NVGPRS(r1)
> +	REST_GPR(2, r1)
> +	ld	r3,_CCR(r1)
> +	ld	r4,_MSR(r1)
> +	ld	r5,_NIP(r1)
> +	addi	r1,r1,INT_FRAME_SIZE
> +	mtcr	r3
> +	mfspr	r3,SPRN_SRR1		/* Return SRR1 */
> +	mtspr	SPRN_SRR1,r4
> +	mtspr	SPRN_SRR0,r5
> +	rfid
> +
>  _GLOBAL(power7_wakeup_tb_loss)
>  	ld	r2,PACATOC(r13);
>  	ld	r1,PACAR1(r13)
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 9d9a898..f45b52d 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -370,6 +370,29 @@ static unsigned long pnv_power7_sleep(void)
>  	return srr1;
>  }
>
> +/*
> + * We need to keep track of offline cpus also for calling
> + * fastsleep workaround appropriately
> + */
> +static unsigned long pnv_power7_winkle(void)
> +{
> +	int cpu, primary_thread;
> +	unsigned long srr1;
> +
> +	cpu = smp_processor_id();
> +	primary_thread = cpu_first_thread_sibling(cpu);
> +
> +	if (need_fastsleep_workaround) {
> +		pnv_apply_fastsleep_workaround(1, primary_thread);
> +		srr1 = __power7_winkle();
> +		pnv_apply_fastsleep_workaround(0, primary_thread);
> +	} else {
> +		srr1 = __power7_winkle();
> +	}
> +	return srr1;
> +}
> +
> +
>  static void __init pnv_setup_machdep_opal(void)
>  {
>  	ppc_md.get_boot_time = opal_get_boot_time;
> @@ -384,6 +407,7 @@ static void __init pnv_setup_machdep_opal(void)
>  	ppc_md.handle_hmi_exception = opal_handle_hmi_exception;
>  	ppc_md.setup_idle = pnv_setup_idle;
>  	ppc_md.power7_sleep = pnv_power7_sleep;
> +	ppc_md.power7_winkle = pnv_power7_winkle;
>  }
>
>  #ifdef CONFIG_PPC_POWERNV_RTAS
* Re: [PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure 2014-10-07 5:33 ` Benjamin Herrenschmidt @ 2014-10-07 9:56 ` Shreyas B Prabhu 0 siblings, 0 replies; 10+ messages in thread From: Shreyas B Prabhu @ 2014-10-07 9:56 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel

On Tuesday 07 October 2014 11:03 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote:
>> Winkle causes power to be gated off to the entire chiplet. Hence the
>> hypervisor/firmware state in the entire chiplet is lost.
>>
>> This patch adds necessary infrastructure to support waking up from
>> hypervisor state loss. Specifically does following:
>> - Before entering winkle, save state of registers that need to be
>> restored on wake up (SDR1, HFSCR)
>
> Add ... to your list, it's not exhaustive, is it ?

I use the interrupt stack frame only for SDR1 and HFSCR. The rest of the SPRs are restored via PORE in the next patch. I'll change the comments to better reflect this.

>
>> - SRR1 bits 46:47 which is used to identify which power saving mode cpu
>> woke up from is '11' for both winkle and sleep. Hence introduce a flag
>> in PACA to distinguish b/w winkle and sleep.
>>
>> - Upon waking up, restore all saved registers, recover slb
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Suggested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
>> Signed-off-by: Shreyas B.
Prabhu <shreyas@linux.vnet.ibm.com> >> --- >> arch/powerpc/include/asm/machdep.h | 1 + >> arch/powerpc/include/asm/paca.h | 3 ++ >> arch/powerpc/include/asm/ppc-opcode.h | 2 + >> arch/powerpc/include/asm/processor.h | 2 + >> arch/powerpc/kernel/asm-offsets.c | 1 + >> arch/powerpc/kernel/exceptions-64s.S | 8 ++-- >> arch/powerpc/kernel/idle.c | 11 +++++ >> arch/powerpc/kernel/idle_power7.S | 81 +++++++++++++++++++++++++++++++++- >> arch/powerpc/platforms/powernv/setup.c | 24 ++++++++++ >> 9 files changed, 127 insertions(+), 6 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h >> index f37014f..0a3ced9 100644 >> --- a/arch/powerpc/include/asm/machdep.h >> +++ b/arch/powerpc/include/asm/machdep.h >> @@ -301,6 +301,7 @@ struct machdep_calls { >> /* Idle handlers */ >> void (*setup_idle)(void); >> unsigned long (*power7_sleep)(void); >> + unsigned long (*power7_winkle)(void); >> }; > > Why does it need to be ppc_md ? Same comments as for sleep > >> extern void e500_idle(void); >> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h >> index a5139ea..3358f09 100644 >> --- a/arch/powerpc/include/asm/paca.h >> +++ b/arch/powerpc/include/asm/paca.h >> @@ -158,6 +158,9 @@ struct paca_struct { >> * early exception handler for use by high level C handler >> */ >> struct opal_machine_check_event *opal_mc_evt; >> + >> + /* Flag to distinguish b/w sleep and winkle */ >> + u8 offline_state; > > Not fan of the name. I'd rather you call it "wakeup_state_loss" or > something a bit more explicit about what that actually means if it's > going to be a boolean value. Otherwise make it an enumeration of > constants. > Okay. I'll change this. >> #endif >> #ifdef CONFIG_PPC_BOOK3S_64 >> /* Exclusive emergency stack pointer for machine check exception. 
*/ >> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h >> index 6f85362..5155be7 100644 >> --- a/arch/powerpc/include/asm/ppc-opcode.h >> +++ b/arch/powerpc/include/asm/ppc-opcode.h >> @@ -194,6 +194,7 @@ >> >> #define PPC_INST_NAP 0x4c000364 >> #define PPC_INST_SLEEP 0x4c0003a4 >> +#define PPC_INST_WINKLE 0x4c0003e4 >> >> /* A2 specific instructions */ >> #define PPC_INST_ERATWE 0x7c0001a6 >> @@ -374,6 +375,7 @@ >> >> #define PPC_NAP stringify_in_c(.long PPC_INST_NAP) >> #define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP) >> +#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE) >> >> /* BHRB instructions */ >> #define PPC_CLRBHRB stringify_in_c(.long PPC_INST_CLRBHRB) >> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h >> index 41953cd..00e3df9 100644 >> --- a/arch/powerpc/include/asm/processor.h >> +++ b/arch/powerpc/include/asm/processor.h >> @@ -455,6 +455,8 @@ extern void arch_setup_idle(void); >> extern void power7_nap(int check_irq); >> extern unsigned long power7_sleep(void); >> extern unsigned long __power7_sleep(void); >> +extern unsigned long power7_winkle(void); >> +extern unsigned long __power7_winkle(void); >> extern void flush_instruction_cache(void); >> extern void hard_reset_now(void); >> extern void poweroff_now(void); >> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c >> index 9d7dede..ea98817 100644 >> --- a/arch/powerpc/kernel/asm-offsets.c >> +++ b/arch/powerpc/kernel/asm-offsets.c >> @@ -731,6 +731,7 @@ int main(void) >> DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0)); >> DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1)); >> DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt)); >> + DEFINE(PACAOFFLINESTATE, offsetof(struct paca_struct, offline_state)); >> #endif >> >> return 0; >> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S >> index c64f3cc0..261f348 100644 >> --- a/arch/powerpc/kernel/exceptions-64s.S >> +++ b/arch/powerpc/kernel/exceptions-64s.S >> @@ -115,9 +115,7 @@ BEGIN_FTR_SECTION >> #endif >> >> /* Running native on arch 2.06 or later, check if we are >> - * waking up from nap. We only handle no state loss and >> - * supervisor state loss. We do -not- handle hypervisor >> - * state loss at this time. >> + * waking up from power saving mode. >> */ >> mfspr r13,SPRN_SRR1 >> rlwinm. r13,r13,47-31,30,31 >> @@ -133,8 +131,8 @@ BEGIN_FTR_SECTION >> b power7_wakeup_noloss >> 2: b power7_wakeup_loss >> >> - /* Fast Sleep wakeup on PowerNV */ >> -8: b power7_wakeup_tb_loss >> + /* Fast Sleep / Winkle wakeup on PowerNV */ >> +8: b power7_wakeup_hv_state_loss >> >> 9: >> END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) >> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c >> index 1f268e0..ed46217 100644 >> --- a/arch/powerpc/kernel/idle.c >> +++ b/arch/powerpc/kernel/idle.c >> @@ -98,6 +98,17 @@ unsigned long power7_sleep(void) >> return ret; >> } >> >> +unsigned long power7_winkle(void) >> +{ >> + unsigned long ret; >> + >> + if (ppc_md.power7_winkle) >> + ret = ppc_md.power7_winkle(); >> + else >> + ret = __power7_winkle(); >> + return ret; >> +} >> + >> int powersave_nap; >> >> #ifdef CONFIG_SYSCTL >> diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S >> index c3481c9..87b2556 100644 >> --- a/arch/powerpc/kernel/idle_power7.S >> +++ b/arch/powerpc/kernel/idle_power7.S >> @@ -18,6 +18,13 @@ >> #include <asm/hw_irq.h> >> #include <asm/kvm_book3s_asm.h> >> #include <asm/opal.h> >> +#include <asm/mmu-hash64.h> >> + >> +/* >> + * Use volatile GPRs' space to save essential SPRs before entering winkle >> + */ >> +#define _SDR1 GPR3 >> +#define _TSCR GPR4 >> >> #undef DEBUG >> >> @@ -39,6 +46,7 @@ >> * Pass requested state in r3: >> * 0 - nap >> * 1 - sleep >> + * 2 - winkle >> * >> * To check 
IRQ_HAPPENED in r4 >> * 0 - don't check >> @@ -109,9 +117,27 @@ _GLOBAL(power7_enter_nap_mode) >> #endif >> cmpwi cr0,r3,1 >> beq 2f >> + cmpwi cr0,r3,2 >> + beq 3f >> IDLE_STATE_ENTER_SEQ(PPC_NAP) >> /* No return */ >> -2: IDLE_STATE_ENTER_SEQ(PPC_SLEEP) >> +2: >> + li r4,1 >> + stb r4,PACAOFFLINESTATE(r13) >> + IDLE_STATE_ENTER_SEQ(PPC_SLEEP) >> + /* No return */ >> + >> +3: >> + mfspr r4,SPRN_SDR1 >> + std r4,_SDR1(r1) >> + >> + mfspr r4,SPRN_TSCR >> + std r4,_TSCR(r1) >> + >> + /* Enter winkle */ >> + li r4,0 >> + stb r4,PACAOFFLINESTATE(r13) >> + IDLE_STATE_ENTER_SEQ(PPC_WINKLE) >> /* No return */ >> >> _GLOBAL(power7_idle) >> @@ -187,6 +213,59 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \ >> 20: nop; >> >> >> +_GLOBAL(__power7_winkle) >> + li r3,2 >> + li r4,1 >> + b power7_powersave_common >> + /* No return */ >> + >> +_GLOBAL(power7_wakeup_hv_state_loss) >> + /* Check paca flag to diffentiate b/w fast sleep and winkle */ >> + lbz r4,PACAOFFLINESTATE(13) >> + cmpwi cr0,r4,0 >> + bne power7_wakeup_tb_loss >> + >> + ld r2,PACATOC(r13); >> + ld r1,PACAR1(r13) >> + >> + bl __restore_cpu_power8 > > So if I understand correctly, you use a per-cpu flag not a per-core flag > which means we will assume a pessimistic case of having to restore stuff > even if the core didn't actually enter winkle (because the last thread > to go down went to sleep). This is sub-optimal. Also see below: > >> + /* Time base re-sync */ >> + li r3,OPAL_RESYNC_TIMEBASE >> + bl opal_call_realmode; > > You will also resync the timebase (and restore all the core shared SPRs) > for each thread. This is problematic, especially with KVM as you could > have a situation where: > > - The first thread comes out and starts diving into KVM > > - The other threads start coming out while the first one is doing the > above. > > Thus the first thread might already be manipulating some core registers > (SDR1 etc...) while the secondaries come back and ... whack it. 
Worse,
> the primary might have applied the TB offset using TBU40 while the
> secondaries resync the timebase back to the host value, incurring a
> loss of TB for the guest.
>

Such a race is prevented by the kvm_hstate.hwthread_req and kvm_hstate.hwthread_state paca flags. The current flow when a guest is scheduled on a core:
-> Primary thread sets the kvm_hstate.hwthread_req paca flag for all the secondary threads.
-> Waits for all the secondary threads to change state to !KVM_HWTHREAD_IN_KERNEL
-> and later calls __kvmppc_vcore_entry, which down the line changes SDR1 and other per-core registers.

Therefore kvm_hstate.hwthread_req is set to 1 for all the threads in the core *before* SDR1 is switched. And when a secondary thread is woken up to execute a guest, in 0x100 we check hwthread_req and branch to kvm_start_guest if it is set. Therefore secondary threads woken up for a guest do not execute power7_wakeup_hv_state_loss, and there is no danger of overwriting SDR1 or TBU40.

Now let's consider the case where a guest is scheduled on the core and a secondary thread is woken up even though there is no vcpu to run on it (say it's woken up by a stray IPI). In this case, again in 0x100 we branch to kvm_start_guest, and since there is no vcpu to run, it executes nap. So again there is no danger of overwriting SDR1.

>> + /* Restore SLB from PACA */
>> + ld r8,PACA_SLBSHADOWPTR(r13)
>> +
>> + .rept SLB_NUM_BOLTED
>> + li r3, SLBSHADOW_SAVEAREA
>> + LDX_BE r5, r8, r3
>> + addi r3, r3, 8
>> + LDX_BE r6, r8, r3
>> + andis.
r7,r5,SLB_ESID_V@h >> + beq 1f >> + slbmte r6,r5 >> +1: addi r8,r8,16 >> + .endr >> + >> + ld r4,_SDR1(r1) >> + mtspr SPRN_SDR1,r4 >> + >> + ld r4,_TSCR(r1) >> + mtspr SPRN_TSCR,r4 >> + >> + REST_NVGPRS(r1) >> + REST_GPR(2, r1) >> + ld r3,_CCR(r1) >> + ld r4,_MSR(r1) >> + ld r5,_NIP(r1) >> + addi r1,r1,INT_FRAME_SIZE >> + mtcr r3 >> + mfspr r3,SPRN_SRR1 /* Return SRR1 */ >> + mtspr SPRN_SRR1,r4 >> + mtspr SPRN_SRR0,r5 >> + rfid >> + >> _GLOBAL(power7_wakeup_tb_loss) >> ld r2,PACATOC(r13); >> ld r1,PACAR1(r13) >> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c >> index 9d9a898..f45b52d 100644 >> --- a/arch/powerpc/platforms/powernv/setup.c >> +++ b/arch/powerpc/platforms/powernv/setup.c >> @@ -370,6 +370,29 @@ static unsigned long pnv_power7_sleep(void) >> return srr1; >> } >> >> +/* >> + * We need to keep track of offline cpus also for calling >> + * fastsleep workaround appropriately >> + */ >> +static unsigned long pnv_power7_winkle(void) >> +{ >> + int cpu, primary_thread; >> + unsigned long srr1; >> + >> + cpu = smp_processor_id(); >> + primary_thread = cpu_first_thread_sibling(cpu); >> + >> + if (need_fastsleep_workaround) { >> + pnv_apply_fastsleep_workaround(1, primary_thread); >> + srr1 = __power7_winkle(); >> + pnv_apply_fastsleep_workaround(0, primary_thread); >> + } else { >> + srr1 = __power7_winkle(); >> + } >> + return srr1; >> +} >> + >> + >> static void __init pnv_setup_machdep_opal(void) >> { >> ppc_md.get_boot_time = opal_get_boot_time; >> @@ -384,6 +407,7 @@ static void __init pnv_setup_machdep_opal(void) >> ppc_md.handle_hmi_exception = opal_handle_hmi_exception; >> ppc_md.setup_idle = pnv_setup_idle; >> ppc_md.power7_sleep = pnv_power7_sleep; >> + ppc_md.power7_winkle = pnv_power7_winkle; >> } >> >> #ifdef CONFIG_PPC_POWERNV_RTAS > > ^ permalink raw reply [flat|nested] 10+ messages in thread
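For readers following the argument in the reply above, the wake-up dispatch it relies on can be modelled in plain C. This is only a sketch and not code from the series: struct paca_model, enum wake_path and classify_wakeup() are made-up names, and the ordering (hwthread_req tested before the SRR1 bits) is an assumption taken from the description of the 0x100 vector in the reply. The flag values follow patch 3/5, where sleep stores 1 and winkle stores 0 in the PACA flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-thread state consulted at the 0x100 vector. */
struct paca_model {
	bool hwthread_req;     /* set by the primary before it enters the guest */
	uint8_t offline_state; /* 1 = entered sleep, 0 = entered winkle (patch 3/5) */
};

/* Hypothetical labels for the paths discussed in the thread. */
enum wake_path {
	WAKE_NOT_POWERSAVE,   /* SRR1[46:47] == 0 */
	WAKE_KVM_START_GUEST, /* hwthread_req set: host SPRs are left alone */
	WAKE_NO_HV_LOSS,      /* SRR1[46:47] == 1 or 2 */
	WAKE_TB_LOSS,         /* fast sleep: power7_wakeup_tb_loss */
	WAKE_FULL_RESTORE     /* winkle: restore SPRs, resync TB, reload SLB */
};

/* SRR1 bits 46:47 in IBM (MSB-0) numbering are bits 17:16 counted from the LSB. */
static unsigned int srr1_wake_bits(uint64_t srr1)
{
	return (unsigned int)((srr1 >> 16) & 0x3);
}

static enum wake_path classify_wakeup(uint64_t srr1, const struct paca_model *paca)
{
	assert(paca != NULL);

	/* Assumed ordering: a thread requested by KVM never reaches the host path. */
	if (paca->hwthread_req)
		return WAKE_KVM_START_GUEST;

	switch (srr1_wake_bits(srr1)) {
	case 0:
		return WAKE_NOT_POWERSAVE;
	case 3:
		/* '11' is reported for both sleep and winkle; the PACA flag decides. */
		return paca->offline_state ? WAKE_TB_LOSS : WAKE_FULL_RESTORE;
	default:
		return WAKE_NO_HV_LOSS;
	}
}
```

In this model the full restore is only reachable when hwthread_req is clear, which is the property the reply depends on. It does not model the per-core versus per-thread question raised in the review.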
* [PATCH v2 4/5] powerpc/powernv: Discover and enable winkle 2014-10-01 7:46 [PATCH v2 0/5] Winkle support for offline cpus Shreyas B. Prabhu ` (2 preceding siblings ...) 2014-10-01 7:46 ` [PATCH v2 3/5] powerpc/powernv: Add winkle infrastructure Shreyas B. Prabhu @ 2014-10-01 7:46 ` Shreyas B. Prabhu 2014-10-01 7:46 ` [PATCH v2 5/5] powerpc/powernv: Enter deepest supported idle state in offline Shreyas B. Prabhu 4 siblings, 0 replies; 10+ messages in thread From: Shreyas B. Prabhu @ 2014-10-01 7:46 UTC (permalink / raw) To: linux-kernel; +Cc: Shreyas B. Prabhu, linuxppc-dev, Paul Mackerras Discover winkle from device tree. If supported make OPAL calls necessary to save HIDs, HMEER, HSPRG0 and LPCR. Also make OPAL call when the HID0 value is modified during split/unsplit of cores. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> --- arch/powerpc/include/asm/opal.h | 1 + arch/powerpc/platforms/powernv/powernv.h | 1 + arch/powerpc/platforms/powernv/setup.c | 75 ++++++++++++++++++++++++++++++++ arch/powerpc/platforms/powernv/subcore.c | 15 +++++++ 4 files changed, 92 insertions(+) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index d376020..a77957f 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -778,6 +778,7 @@ extern struct device_node *opal_node; #define IDLE_INST_NAP 0x00010000 /* nap instruction can be used */ #define IDLE_INST_SLEEP 0x00020000 /* sleep instruction can be used */ #define IDLE_INST_SLEEP_ER1 0x00080000 /* Use sleep with work around*/ +#define IDLE_INST_WINKLE 0x00040000 /* winkle instruction can be used */ /* API functions */ int64_t opal_invalid_call(void); diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 31ece13..76b37f8 100644 --- 
a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ -27,6 +27,7 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask) #define IDLE_USE_NAP (1UL << 0) #define IDLE_USE_SLEEP (1UL << 1) +#define IDLE_USE_WINKLE (1UL << 3) extern unsigned int pnv_get_supported_cpuidle_states(void); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index f45b52d..13c5e49 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -273,6 +273,65 @@ unsigned int pnv_get_supported_cpuidle_states(void) return supported_cpuidle_states; } +int pnv_save_sprs_for_winkle(void) +{ + int cpu; + int rc; + + /* + * hid0, hid1, hid4, hid5, hmeer and lpcr values are symmetric accross + * all cpus at boot. Get these reg values of current cpu and use the + * same accross all cpus. + */ + uint64_t lpcr_val = mfspr(SPRN_LPCR); + uint64_t hid0_val = mfspr(SPRN_HID0); + uint64_t hid1_val = mfspr(SPRN_HID1); + uint64_t hid4_val = mfspr(SPRN_HID4); + uint64_t hid5_val = mfspr(SPRN_HID5); + uint64_t hmeer_val = mfspr(SPRN_HMEER); + + for_each_possible_cpu(cpu) { + uint64_t pir = get_hard_smp_processor_id(cpu); + uint64_t local_paca_ptr = (uint64_t)&paca[cpu]; + + rc = opal_slw_set_reg(pir, SPRN_HSPRG0, local_paca_ptr); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val); + if (rc != 0) + return rc; + + /* HIDs are per core registers */ + if (cpu_thread_in_core(cpu) == 0) { + + rc = opal_slw_set_reg(pir, SPRN_HMEER, hmeer_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID0, hid0_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val); + if (rc != 0) + return rc; + + rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val); + if (rc != 0) + return rc; + + } + + } + + return 0; + +} static int __init 
pnv_probe_idle_states(void) { struct device_node *power_mgt; @@ -318,6 +377,22 @@ static int __init pnv_probe_idle_states(void) supported_cpuidle_states |= IDLE_USE_SLEEP; need_fastsleep_workaround = 1; } + + if (flags & IDLE_INST_WINKLE) { + /* + * If winkle is supported, save HSPRG0, HIDs and LPCR + * contents via OPAL. Enable winkle only if this + * succeeds. + */ + int opal_ret_val = pnv_save_sprs_for_winkle(); + + if (!opal_ret_val) + supported_cpuidle_states |= IDLE_USE_WINKLE; + else + pr_warn("opal: opal_slw_set_reg failed with rc=%d, disabling winkle\n", + opal_ret_val); + } + } return 0; diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c index 894ecb3..47c70666e 100644 --- a/arch/powerpc/platforms/powernv/subcore.c +++ b/arch/powerpc/platforms/powernv/subcore.c @@ -24,6 +24,7 @@ #include <asm/smp.h> #include "subcore.h" +#include "powernv.h" /* @@ -159,6 +160,18 @@ static void wait_for_sync_step(int step) mb(); } +static void update_hid_in_slw(u64 hid0) +{ + u64 idle_states = pnv_get_supported_cpuidle_states(); + + if (idle_states & IDLE_USE_WINKLE) { + /* OPAL call to patch slw with the new HID0 value */ + u64 cpu_pir = hard_smp_processor_id(); + + opal_slw_set_reg(cpu_pir, SPRN_HID0, hid0); + } +} + static void unsplit_core(void) { u64 hid0, mask; @@ -178,6 +191,7 @@ static void unsplit_core(void) hid0 = mfspr(SPRN_HID0); hid0 &= ~HID0_POWER8_DYNLPARDIS; mtspr(SPRN_HID0, hid0); + update_hid_in_slw(hid0); while (mfspr(SPRN_HID0) & mask) cpu_relax(); @@ -214,6 +228,7 @@ static void split_core(int new_mode) hid0 = mfspr(SPRN_HID0); hid0 |= HID0_POWER8_DYNLPARDIS | split_parms[i].value; mtspr(SPRN_HID0, hid0); + update_hid_in_slw(hid0); /* Wait for it to happen */ while (!(mfspr(SPRN_HID0) & split_parms[i].mask)) -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
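The control flow of pnv_save_sprs_for_winkle() in the patch above (per-thread registers for every thread, per-core registers only for the first thread of each core, stop at the first failing call) can be sketched in a table-driven form. This is a user-space model under stated assumptions, not the patch's implementation: slw_set_reg_fn stands in for opal_slw_set_reg(), the thread index is used as the PIR, all values are shared across threads (so the per-thread HSPRG0 value is left out), and stub_slw_set_reg() is a test double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_slw_set_reg(); returns 0 on success. */
typedef int (*slw_set_reg_fn)(uint64_t pir, uint64_t sprn, uint64_t val);

struct slw_reg {
	uint64_t sprn;
	uint64_t val;
};

/* Program every register in a table, stopping at the first error. */
static int slw_save_regs(slw_set_reg_fn set_reg, uint64_t pir,
			 const struct slw_reg *regs, size_t nregs)
{
	for (size_t i = 0; i < nregs; i++) {
		int rc = set_reg(pir, regs[i].sprn, regs[i].val);

		if (rc != 0)
			return rc;
	}
	return 0;
}

/*
 * Per-thread registers go to every thread; per-core registers only to
 * thread 0 of each core. The thread index doubles as the PIR here,
 * which is an assumption of this model.
 */
static int save_sprs_model(slw_set_reg_fn set_reg,
			   unsigned int nr_threads, unsigned int threads_per_core,
			   const struct slw_reg *per_thread, size_t n_per_thread,
			   const struct slw_reg *per_core, size_t n_per_core)
{
	assert(set_reg != NULL);
	if (threads_per_core == 0)
		return -1;

	for (unsigned int t = 0; t < nr_threads; t++) {
		int rc = slw_save_regs(set_reg, t, per_thread, n_per_thread);

		if (rc != 0)
			return rc;
		if (t % threads_per_core != 0)
			continue;
		rc = slw_save_regs(set_reg, t, per_core, n_per_core);
		if (rc != 0)
			return rc;
	}
	return 0;
}

/* Test double: counts calls and can fail on a chosen call. */
static unsigned int slw_calls;   /* calls seen so far */
static unsigned int slw_fail_at; /* 1-based call that fails; 0 = never */

static int stub_slw_set_reg(uint64_t pir, uint64_t sprn, uint64_t val)
{
	(void)pir;
	(void)sprn;
	(void)val;
	slw_calls++;
	return (slw_fail_at != 0 && slw_calls == slw_fail_at) ? -1 : 0;
}
```

As in the patch, a failure partway through leaves the earlier registers programmed; the caller simply does not enable winkle in that case.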
* [PATCH v2 5/5] powerpc/powernv: Enter deepest supported idle state in offline 2014-10-01 7:46 [PATCH v2 0/5] Winkle support for offline cpus Shreyas B. Prabhu ` (3 preceding siblings ...) 2014-10-01 7:46 ` [PATCH v2 4/5] powerpc/powernv: Discover and enable winkle Shreyas B. Prabhu @ 2014-10-01 7:46 ` Shreyas B. Prabhu 4 siblings, 0 replies; 10+ messages in thread From: Shreyas B. Prabhu @ 2014-10-01 7:46 UTC (permalink / raw) To: linux-kernel; +Cc: Shreyas B. Prabhu, linuxppc-dev, Paul Mackerras Enter winkle during offline if supported, else revert to sleep or nap. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> --- arch/powerpc/platforms/powernv/smp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 3ad31d2..e3fc2c9 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -169,8 +169,10 @@ static void pnv_smp_cpu_kill_self(void) while (!generic_check_cpu_restart(cpu)) { ppc64_runlatch_off(); - /* If sleep is supported, go to sleep, instead of nap */ - if (idle_states & IDLE_USE_SLEEP) + /* Go to deepest supported idle state */ + if (idle_states & IDLE_USE_WINKLE) + power7_winkle(); + else if (idle_states & IDLE_USE_SLEEP) power7_sleep(); else power7_nap(1); -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
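The selection order used by the offline loop in patch 5/5 can be written as a small pure function, which makes the fallback easy to check in isolation. A minimal sketch: the IDLE_USE_* values are copied from powernv.h as shown in patch 4/5, while enum idle_choice and deepest_offline_state() are hypothetical names that do not appear in the series.

```c
/* Flag values as defined in arch/powerpc/platforms/powernv/powernv.h by patch 4/5. */
#define IDLE_USE_NAP    (1UL << 0)
#define IDLE_USE_SLEEP  (1UL << 1)
#define IDLE_USE_WINKLE (1UL << 3)

/* Hypothetical result type; the patch calls power7_*() directly instead. */
enum idle_choice {
	CHOOSE_NAP,
	CHOOSE_SLEEP,
	CHOOSE_WINKLE
};

/* Pick the deepest supported state; nap is the fallback, as in the patch. */
static enum idle_choice deepest_offline_state(unsigned long idle_states)
{
	if (idle_states & IDLE_USE_WINKLE)
		return CHOOSE_WINKLE;
	if (idle_states & IDLE_USE_SLEEP)
		return CHOOSE_SLEEP;
	return CHOOSE_NAP;
}
```

Note that nap is chosen even when IDLE_USE_NAP is not set, which mirrors the unconditional power7_nap(1) in the final else branch of the patch.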