* [PATCH 0/4] powernv: cpuidle: Redesign idle states management @ 2014-11-03 16:08 Shreyas B. Prabhu 2014-11-03 16:08 ` [PATCH 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu 2014-11-03 16:08 ` [PATCH 3/4] powernv: cpuidle: Redesign idle states management Shreyas B. Prabhu 0 siblings, 2 replies; 5+ messages in thread From: Shreyas B. Prabhu @ 2014-11-03 16:08 UTC (permalink / raw) To: linux-kernel Cc: Shreyas B. Prabhu, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Rafael J. Wysocki, linux-pm, linuxppc-dev, Vaidyanathan Srinivasan, Preeti U Murthy Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering deep idle state and similarly tasks like timebase resync, hypervisor core register restore that have to be done only by the first thread waking up from these states. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only is suboptimal, but can cause functionality issues when subcores are involved. Winkle is deeper idle state compared to fastsleep. In this state the power supply to the chiplet, i.e core, private L2 and private L3 is turned off. This results in a total hypervisor state loss. This patch set adds support for winkle and provides a way to track the idle states of the threads of the core and use it for idle state management of idle states sleep and winkle. TODO: ----- Handle the case where a thread enters nap and wakes up with supervisor/ hypervisor state loss. This can only happen due to a bug in the hardware or the kernel. One way to handle this can be restore the state, switch to the kernel process context and trigger a panic or a warning. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Paul Mackerras (1): powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode Preeti U. Murthy (1): powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu (2): powernv: cpuidle: Redesign idle states management powernv: powerpc: Add winkle support for offline cpus arch/powerpc/include/asm/cpuidle.h | 14 ++ arch/powerpc/include/asm/opal.h | 13 + arch/powerpc/include/asm/paca.h | 6 + arch/powerpc/include/asm/ppc-opcode.h | 2 + arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h | 2 + arch/powerpc/kernel/asm-offsets.c | 6 + arch/powerpc/kernel/cpu_setup_power.S | 4 + arch/powerpc/kernel/exceptions-64s.S | 30 ++- arch/powerpc/kernel/idle_power7.S | 326 +++++++++++++++++++++---- arch/powerpc/platforms/powernv/opal-wrappers.S | 39 +++ arch/powerpc/platforms/powernv/powernv.h | 2 + arch/powerpc/platforms/powernv/setup.c | 170 +++++++++++++ arch/powerpc/platforms/powernv/smp.c | 10 +- arch/powerpc/platforms/powernv/subcore.c | 35 +++ arch/powerpc/platforms/powernv/subcore.h | 1 + drivers/cpuidle/cpuidle-powernv.c | 10 +- 17 files changed, 611 insertions(+), 60 deletions(-) create mode 100644 arch/powerpc/include/asm/cpuidle.h -- 1.9.3 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states 2014-11-03 16:08 [PATCH 0/4] powernv: cpuidle: Redesign idle states management Shreyas B. Prabhu @ 2014-11-03 16:08 ` Shreyas B. Prabhu 2014-11-03 16:08 ` [PATCH 3/4] powernv: cpuidle: Redesign idle states management Shreyas B. Prabhu 1 sibling, 0 replies; 5+ messages in thread From: Shreyas B. Prabhu @ 2014-11-03 16:08 UTC (permalink / raw) To: linux-kernel Cc: Preeti U. Murthy, Shreyas B. Prabhu, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Rafael J. Wysocki, linux-pm, linuxppc-dev From: "Preeti U. Murthy" <preeti@linux.vnet.ibm.com> The secondary threads should enter deep idle states so as to gain maximum powersavings when the entire core is offline. To do so the offline path must be made aware of the available deepest idle state. Hence probe the device tree for the possible idle states in powernv core code and expose the deepest idle state through flags. Since the device tree is probed by the cpuidle driver as well, move the parameters required to discover the idle states into an appropriate common place to both the driver and the powernv core code. Another point is that fastsleep idle state may require workarounds in the kernel to function properly. This workaround is introduced in the subsequent patches. However neither the cpuidle driver or the hotplug path need be bothered about this workaround. They will be taken care of by the core powernv code. Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/opal.h | 8 ++++++ arch/powerpc/platforms/powernv/powernv.h | 2 ++ arch/powerpc/platforms/powernv/setup.c | 49 ++++++++++++++++++++++++++++++++ arch/powerpc/platforms/powernv/smp.c | 7 ++++- drivers/cpuidle/cpuidle-powernv.c | 9 ++---- 5 files changed, 68 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 9124b0e..f8b95c0 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -155,6 +155,14 @@ struct opal_sg_list { #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION 102 +/* Device tree flags */ + +/* Flags set in power-mgmt nodes in device tree if + * respective idle states are supported in the platform. + */ +#define OPAL_PM_NAP_ENABLED 0x00010000 +#define OPAL_PM_SLEEP_ENABLED 0x00020000 + #ifndef __ASSEMBLY__ #include <linux/notifier.h> diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 6c8e2d1..604c48e 100644 --- a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ -29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev) } #endif +extern u32 pnv_get_supported_cpuidle_states(void); + extern void pnv_lpc_init(void); bool cpu_core_split_required(void); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 3f9546d..34c6665 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -290,6 +290,55 @@ static void __init pnv_setup_machdep_rtas(void) } #endif /* CONFIG_PPC_POWERNV_RTAS */ +static u32 supported_cpuidle_states; + +u32 pnv_get_supported_cpuidle_states(void) +{ + return supported_cpuidle_states; +} + +static int __init pnv_init_idle_states(void) +{ + struct device_node *power_mgt; + int dt_idle_states; + const __be32 *idle_state_flags; + u32 len_flags, flags; + int i; + + supported_cpuidle_states = 0; + + if (cpuidle_disable != IDLE_NO_OVERRIDE) + return 0; + + if (!firmware_has_feature(FW_FEATURE_OPALv3)) + return 0; + + power_mgt = of_find_node_by_path("/ibm,opal/power-mgt"); + if (!power_mgt) { + pr_warn("opal: PowerMgmt Node not found\n"); + return 0; + } + + idle_state_flags = of_get_property(power_mgt, + "ibm,cpu-idle-state-flags", &len_flags); + if (!idle_state_flags) { + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); + return 0; + } + + dt_idle_states = len_flags / sizeof(u32); + + for (i = 0; i < dt_idle_states; i++) { + flags = be32_to_cpu(idle_state_flags[i]); + supported_cpuidle_states |= flags; + } + + return 0; +} + +subsys_initcall(pnv_init_idle_states); + + static int __init pnv_probe(void) { unsigned long root = of_get_flat_dt_root(); diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 4753958..3dc4cec 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void) static void pnv_smp_cpu_kill_self(void) { unsigned int cpu; + u32 idle_states; /* Standard hot unplug procedure */ local_irq_disable(); @@ -159,13 +160,17 @@ static void pnv_smp_cpu_kill_self(void) generic_set_cpu_dead(cpu); smp_wmb(); + idle_states = pnv_get_supported_cpuidle_states(); /* We don't want to take decrementer interrupts while we are offline, * so clear LPCR:PECE1. We keep PECE2 enabled. */ mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1); while (!generic_check_cpu_restart(cpu)) { ppc64_runlatch_off(); - power7_nap(1); + if (idle_states & OPAL_PM_SLEEP_ENABLED) + power7_sleep(); + else + power7_nap(1); ppc64_runlatch_on(); /* Clear the IPI that woke us up */ diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index 7d3a349..0a7d827 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -16,13 +16,10 @@ #include <asm/machdep.h> #include <asm/firmware.h> +#include <asm/opal.h> #include <asm/runlatch.h> -/* Flags and constants used in PowerNV platform */ - #define MAX_POWERNV_IDLE_STATES 8 -#define IDLE_USE_INST_NAP 0x00010000 /* Use nap instruction */ -#define IDLE_USE_INST_SLEEP 0x00020000 /* Use sleep instruction */ struct cpuidle_driver powernv_idle_driver = { .name = "powernv_idle", @@ -198,7 +195,7 @@ static int powernv_add_idle_states(void) * target residency to be 10x exit_latency */ latency_ns = be32_to_cpu(idle_state_latency[i]); - if (flags & IDLE_USE_INST_NAP) { + if (flags & OPAL_PM_NAP_ENABLED) { /* Add NAP state */ strcpy(powernv_states[nr_idle_states].name, "Nap"); strcpy(powernv_states[nr_idle_states].desc, "Nap"); @@ -211,7 +208,7 @@ static int powernv_add_idle_states(void) nr_idle_states++; } - if (flags & IDLE_USE_INST_SLEEP) { + if (flags & OPAL_PM_SLEEP_ENABLED) { /* Add FASTSLEEP state */ strcpy(powernv_states[nr_idle_states].name, "FastSleep"); strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); -- 1.9.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/4] powernv: cpuidle: Redesign idle states management 2014-11-03 16:08 [PATCH 0/4] powernv: cpuidle: Redesign idle states management Shreyas B. Prabhu 2014-11-03 16:08 ` [PATCH 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu @ 2014-11-03 16:08 ` Shreyas B. Prabhu 2014-11-12 6:51 ` Preeti U Murthy 1 sibling, 1 reply; 5+ messages in thread From: Shreyas B. Prabhu @ 2014-11-03 16:08 UTC (permalink / raw) To: linux-kernel Cc: Shreyas B. Prabhu, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Rafael J. Wysocki, linux-pm, linuxppc-dev Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering deep idle state and similarly tasks like timebase resync, hypervisor core register restore that have to be done only by the first thread waking up from these state. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only is suboptimal, but can cause functionality issues when subcores and kvm is involved. This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like fastsleep workaround and timebase resync only once per core. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/cpuidle.h | 14 ++ arch/powerpc/include/asm/opal.h | 2 + arch/powerpc/include/asm/paca.h | 4 + arch/powerpc/kernel/asm-offsets.c | 4 + arch/powerpc/kernel/exceptions-64s.S | 20 ++- arch/powerpc/kernel/idle_power7.S | 183 +++++++++++++++++++------ arch/powerpc/platforms/powernv/opal-wrappers.S | 37 +++++ arch/powerpc/platforms/powernv/setup.c | 52 ++++++- arch/powerpc/platforms/powernv/smp.c | 3 +- drivers/cpuidle/cpuidle-powernv.c | 3 +- 10 files changed, 267 insertions(+), 55 deletions(-) create mode 100644 arch/powerpc/include/asm/cpuidle.h diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h new file mode 100644 index 0000000..8c82850 --- /dev/null +++ b/arch/powerpc/include/asm/cpuidle.h @@ -0,0 +1,14 @@ +#ifndef _ASM_POWERPC_CPUIDLE_H +#define _ASM_POWERPC_CPUIDLE_H + +#ifdef CONFIG_PPC_POWERNV +/* Used in powernv idle state management */ +#define PNV_THREAD_RUNNING 0 +#define PNV_THREAD_NAP 1 +#define PNV_THREAD_SLEEP 2 +#define PNV_THREAD_WINKLE 3 +#define PNV_CORE_IDLE_LOCK_BIT 0x100 +#define PNV_CORE_IDLE_THREAD_BITS 0x0FF +#endif + +#endif diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index f8b95c0..bef7fbc 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -152,6 +152,7 @@ struct opal_sg_list { #define OPAL_PCI_ERR_INJECT 96 #define OPAL_PCI_EEH_FREEZE_SET 97 #define OPAL_HANDLE_HMI 98 +#define OPAL_CONFIG_CPU_IDLE_STATE 99 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION 102 @@ -162,6 +163,7 @@ struct opal_sg_list { */ #define OPAL_PM_NAP_ENABLED 0x00010000 #define OPAL_PM_SLEEP_ENABLED 0x00020000 +#define OPAL_PM_SLEEP_ENABLED_ER1 0x00080000 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index a5139ea..85aeedb 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -158,6 +158,10 @@ struct paca_struct { * early exception handler for use by high level C handler */ struct opal_machine_check_event *opal_mc_evt; + + /* Per-core mask tracking idle threads and a lock bit-[L][TTTTTTTT] */ + u32 *core_idle_state_ptr; + u8 thread_idle_state; /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */ #endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9d7dede..50f299e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -731,6 +731,10 @@ int main(void) DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0)); DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1)); DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt)); + DEFINE(PACA_CORE_IDLE_STATE_PTR, + offsetof(struct paca_struct, core_idle_state_ptr)); + DEFINE(PACA_THREAD_IDLE_STATE, + offsetof(struct paca_struct, thread_idle_state)); #endif return 0; diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 72e783e..3311c8d 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -15,6 +15,7 @@ #include <asm/hw_irq.h> #include <asm/exception-64s.h> #include <asm/ptrace.h> +#include <asm/cpuidle.h> /* * We layout physical memory as follows: @@ -109,15 +110,19 @@ BEGIN_FTR_SECTION rlwinm. r13,r13,47-31,30,31 beq 9f - /* waking up from powersave (nap) state */ cmpwi cr1,r13,2 - /* Total loss of HV state is fatal, we could try to use the - * PIR to locate a PACA, then use an emergency stack etc... - * OPAL v3 based powernv platforms have new idle states - * which fall in this catagory. - */ - bgt cr1,8f + GET_PACA(r13) + lbz r0,PACA_THREAD_IDLE_STATE(r13) + cmpwi cr2,r0,PNV_THREAD_NAP + bgt cr2,8f /* Either sleep or Winkle */ + + /* Waking up from nap should not cause hypervisor state loss */ + bgt cr1,. + + /* Waking up from nap */ + li r0,PNV_THREAD_RUNNING + stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */ #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE li r0,KVM_HWTHREAD_IN_KERNEL @@ -1386,6 +1391,7 @@ machine_check_handle_early: MACHINE_CHECK_HANDLER_WINDUP GET_PACA(r13) ld r1,PACAR1(r13) + li r3,PNV_THREAD_NAP b power7_enter_nap_mode 4: #endif diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index 283c603..df11acb 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -18,6 +18,7 @@ #include <asm/hw_irq.h> #include <asm/kvm_book3s_asm.h> #include <asm/opal.h> +#include <asm/cpuidle.h> #undef DEBUG @@ -123,12 +124,62 @@ power7_enter_nap_mode: li r4,KVM_HWTHREAD_IN_NAP stb r4,HSTATE_HWTHREAD_STATE(r13) #endif - cmpwi cr0,r3,1 - beq 2f + stb r3,PACA_THREAD_IDLE_STATE(r13) + cmpwi cr1,r3,PNV_THREAD_SLEEP + bge cr1,2f IDLE_STATE_ENTER_SEQ(PPC_NAP) /* No return */ -2: IDLE_STATE_ENTER_SEQ(PPC_SLEEP) - /* No return */ +2: + /* Sleep or winkle */ + li r7,1 + mfspr r8,SPRN_PIR + /* + * The last 3 bits of PIR represents the thread id of a cpu + * in power8. This will need adjusting for power7. + */ + andi. r8,r8,0x07 /* Get thread id into r8 */ + rotld r7,r7,r8 + + ld r14,PACA_CORE_IDLE_STATE_PTR(r13) +lwarx_loop1: + lwarx r15,0,r14 + andc r15,r15,r7 /* Clear thread bit */ + + andi. r15,r15,PNV_CORE_IDLE_THREAD_BITS + beq last_thread + + /* Not the last thread to goto sleep */ + stwcx. r15,0,r14 + bne- lwarx_loop1 + b common_enter + +last_thread: + LOAD_REG_ADDR(r3, pnv_need_fastsleep_workaround) + lbz r3,0(r3) + cmpwi r3,1 + bne common_enter + /* + * Last thread of the core entering sleep. Last thread needs to execute + * the hardware bug workaround code. Before that, set the lock bit to + * avoid the race of other threads waking up and undoing workaround + * before workaround is applied. + */ + ori r15,r15,PNV_CORE_IDLE_LOCK_BIT + stwcx. r15,0,r14 + bne- lwarx_loop1 + + /* Fast sleep workaround */ + li r3,1 + li r4,1 + li r0,OPAL_CONFIG_CPU_IDLE_STATE + bl opal_call_realmode + + /* Clear Lock bit */ + andi. r15,r15,PNV_CORE_IDLE_THREAD_BITS + stw r15,0(r14) + +common_enter: /* common code for all the threads entering sleep */ + IDLE_STATE_ENTER_SEQ(PPC_SLEEP) _GLOBAL(power7_idle) /* Now check if user or arch enabled NAP mode */ @@ -141,49 +192,16 @@ _GLOBAL(power7_idle) _GLOBAL(power7_nap) mr r4,r3 - li r3,0 + li r3,1 b power7_powersave_common /* No return */ _GLOBAL(power7_sleep) - li r3,1 + li r3,2 li r4,1 b power7_powersave_common /* No return */ -/* - * Make opal call in realmode. This is a generic function to be called - * from realmode from reset vector. It handles endianess. - * - * r13 - paca pointer - * r1 - stack pointer - * r3 - opal token - */ -opal_call_realmode: - mflr r12 - std r12,_LINK(r1) - ld r2,PACATOC(r13) - /* Set opal return address */ - LOAD_REG_ADDR(r0,return_from_opal_call) - mtlr r0 - /* Handle endian-ness */ - li r0,MSR_LE - mfmsr r12 - andc r12,r12,r0 - mtspr SPRN_HSRR1,r12 - mr r0,r3 /* Move opal token to r0 */ - LOAD_REG_ADDR(r11,opal) - ld r12,8(r11) - ld r2,0(r11) - mtspr SPRN_HSRR0,r12 - hrfid - -return_from_opal_call: - FIXUP_ENDIAN - ld r0,_LINK(r1) - mtlr r0 - blr - #define CHECK_HMI_INTERRUPT \ mfspr r0,SPRN_SRR1; \ BEGIN_FTR_SECTION_NESTED(66); \ @@ -196,10 +214,8 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \ /* Invoke opal call to handle hmi */ \ ld r2,PACATOC(r13); \ ld r1,PACAR1(r13); \ - std r3,ORIG_GPR3(r1); /* Save original r3 */ \ - li r3,OPAL_HANDLE_HMI; /* Pass opal token argument*/ \ + li r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/ \ bl opal_call_realmode; \ - ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \ 20: nop; @@ -210,12 +226,91 @@ _GLOBAL(power7_wakeup_tb_loss) BEGIN_FTR_SECTION CHECK_HMI_INTERRUPT END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) + + li r7,1 + mfspr r8,SPRN_PIR + /* + * The last 3 bits of PIR represents the thread id of a cpu + * in power8. This will need adjusting for power7. + */ + andi. r8,r8,0x07 /* Get thread id into r8 */ + rotld r7,r7,r8 + /* r7 now has 'thread_id'th bit set */ + + ld r14,PACA_CORE_IDLE_STATE_PTR(r13) +lwarx_loop2: + lwarx r15,0,r14 + andi. r9,r15,PNV_CORE_IDLE_LOCK_BIT + /* + * Lock bit is set in one of the 2 cases- + * a. In the sleep/winkle enter path, the last thread is executing + * fastsleep workaround code. + * b. In the wake up path, another thread is executing fastsleep + * workaround undo code or resyncing timebase or restoring context + * In either case loop until the lock bit is cleared. + */ + bne lwarx_loop2 + + cmpwi cr2,r15,0 + or r15,r15,r7 /* Set thread bit */ + + beq cr2,first_thread + + /* Not first thread in core to wake up */ + stwcx. r15,0,r14 + bne- lwarx_loop2 + b common_exit + +first_thread: + /* First thread in core to wakeup */ + ori r15,r15,PNV_CORE_IDLE_LOCK_BIT + stwcx. r15,0,r14 + bne- lwarx_loop2 + + LOAD_REG_ADDR(r3, pnv_need_fastsleep_workaround) + lbz r3,0(r3) + cmpwi r3,1 + /* skip fastsleep workaround if its not needed */ + bne timebase_resync + + /* Undo fast sleep workaround */ + mfcr r16 /* Backup CR into a non-volatile register */ + li r3,1 + li r4,0 + li r0,OPAL_CONFIG_CPU_IDLE_STATE + bl opal_call_realmode + mtcr r16 /* Restore CR */ + + /* Do timebase resync if we are waking up from sleep. Use cr1 value + * set in exceptions-64s.S */ + ble cr1,clear_lock + +timebase_resync: /* Time base re-sync */ - li r3,OPAL_RESYNC_TIMEBASE + li r0,OPAL_RESYNC_TIMEBASE bl opal_call_realmode; - /* TODO: Check r3 for failure */ +clear_lock: + andi. r15,r15,PNV_CORE_IDLE_THREAD_BITS + stw r15,0(r14) + +common_exit: + li r5,PNV_THREAD_RUNNING + stb r5,PACA_THREAD_IDLE_STATE(r13) + +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + li r0,KVM_HWTHREAD_IN_KERNEL + stb r0,HSTATE_HWTHREAD_STATE(r13) + /* Order setting hwthread_state vs. testing hwthread_req */ + sync + lbz r0,HSTATE_HWTHREAD_REQ(r13) + cmpwi r0,0 + beq 6f + b kvm_start_guest +6: +#endif + REST_NVGPRS(r1) REST_GPR(2, r1) ld r3,_CCR(r1) diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S index feb549a..b2aa93b 100644 --- a/arch/powerpc/platforms/powernv/opal-wrappers.S +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S @@ -158,6 +158,43 @@ opal_tracepoint_return: blr #endif +/* + * Make opal call in realmode. This is a generic function to be called + * from realmode. It handles endianness. + * + * r13 - paca pointer + * r1 - stack pointer + * r0 - opal token + */ +_GLOBAL(opal_call_realmode) + mflr r12 + std r12,_LINK(r1) + ld r2,PACATOC(r13) + /* Set opal return address */ + LOAD_REG_ADDR(r12,return_from_opal_call) + mtlr r12 + + mfmsr r12 +#ifdef __LITTLE_ENDIAN__ + /* Handle endian-ness */ + li r11,MSR_LE + andc r12,r12,r11 +#endif + mtspr SPRN_HSRR1,r12 + LOAD_REG_ADDR(r11,opal) + ld r12,8(r11) + ld r2,0(r11) + mtspr SPRN_HSRR0,r12 + hrfid + +return_from_opal_call: +#ifdef __LITTLE_ENDIAN__ + FIXUP_ENDIAN +#endif + ld r12,_LINK(r1) + mtlr r12 + blr + OPAL_CALL(opal_invalid_call, OPAL_INVALID_CALL); OPAL_CALL(opal_console_write, OPAL_CONSOLE_WRITE); OPAL_CALL(opal_console_read, OPAL_CONSOLE_READ); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 34c6665..980c964 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -36,6 +36,8 @@ #include <asm/opal.h> #include <asm/kexec.h> #include <asm/smp.h> +#include <asm/cputhreads.h> +#include <asm/cpuidle.h> #include "powernv.h" @@ -292,11 +294,55 @@ static void __init pnv_setup_machdep_rtas(void) static u32 supported_cpuidle_states; +static void pnv_alloc_idle_core_states(void) +{ + int i, j; + int nr_cores = cpu_nr_cores(); + u32 *core_idle_state; + + /* + * Deep idle states like sleep and winkle are per core idle states. + * A core enters these states only when all the threads enter either + * the particular idle state or a deeper one. There are tasks like + * fastsleep hardware bug workaround and hypervisor core state save + * which have to be done only by the last thread of the core entering + * deep idle state and similarly tasks like timebase resync, hypervisor + * core register restore that have to be done only by the first thread + * waking up from these states. Introducing core_idle_state, a per core + * structure which will keep track threads entering idle states deeper + * than sleep. + * core_idle_state - First 8 bits track the idle state of each thread + * of the core. The 8th bit is the lock bit. Initially all thread bits + * are set. They are cleared when the thread enters deep idle state + * like sleep and winkle. Initially the lock bit is cleared. + * The lock bit has 2 purposes + * a. While the first thread is restoring core state, it prevents + * from other threads in the core from switching to prcoess context. + * b. While the last thread in the core is saving the core state, it + * prevent a different thread from waking up. + */ + for (i = 0; i < nr_cores; i++) { + int first_cpu = i * threads_per_core; + int node = cpu_to_node(first_cpu); + + core_idle_state = kmalloc_node(sizeof(u32), GFP_KERNEL, node); + for (j = 0; j < threads_per_core; j++) { + int cpu = first_cpu + j; + + paca[cpu].core_idle_state_ptr = core_idle_state; + paca[cpu].thread_idle_state = PNV_THREAD_RUNNING; + + } + } +} + u32 pnv_get_supported_cpuidle_states(void) { return supported_cpuidle_states; } +EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states); +u8 pnv_need_fastsleep_workaround; static int __init pnv_init_idle_states(void) { struct device_node *power_mgt; @@ -306,6 +352,7 @@ static int __init pnv_init_idle_states(void) int i; supported_cpuidle_states = 0; + pnv_need_fastsleep_workaround = 0; if (cpuidle_disable != IDLE_NO_OVERRIDE) return 0; @@ -332,13 +379,14 @@ static int __init pnv_init_idle_states(void) flags = be32_to_cpu(idle_state_flags[i]); supported_cpuidle_states |= flags; } - + if (supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1) + pnv_need_fastsleep_workaround = 1; + pnv_alloc_idle_core_states(); return 0; } subsys_initcall(pnv_init_idle_states); - static int __init pnv_probe(void) { unsigned long root = of_get_flat_dt_root(); diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 3dc4cec..12b761a 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -167,7 +167,8 @@ static void pnv_smp_cpu_kill_self(void) mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1); while (!generic_check_cpu_restart(cpu)) { ppc64_runlatch_off(); - if (idle_states & OPAL_PM_SLEEP_ENABLED) + if ((idle_states & OPAL_PM_SLEEP_ENABLED) || + (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) power7_sleep(); else power7_nap(1); diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index 0a7d827..a489b56 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -208,7 +208,8 @@ static int powernv_add_idle_states(void) nr_idle_states++; } - if (flags & OPAL_PM_SLEEP_ENABLED) { + if (flags & OPAL_PM_SLEEP_ENABLED || + flags & OPAL_PM_SLEEP_ENABLED_ER1) { /* Add FASTSLEEP state */ strcpy(powernv_states[nr_idle_states].name, "FastSleep"); strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); -- 1.9.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 3/4] powernv: cpuidle: Redesign idle states management 2014-11-03 16:08 ` [PATCH 3/4] powernv: cpuidle: Redesign idle states management Shreyas B. Prabhu @ 2014-11-12 6:51 ` Preeti U Murthy 2014-11-17 5:49 ` Shreyas B Prabhu 0 siblings, 1 reply; 5+ messages in thread From: Preeti U Murthy @ 2014-11-12 6:51 UTC (permalink / raw) To: Shreyas B. Prabhu, linux-kernel Cc: linux-pm, Rafael J. Wysocki, Paul Mackerras, linuxppc-dev Hi Shreyas, On 11/03/2014 09:38 PM, Shreyas B. Prabhu wrote: > diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S > index 283c603..df11acb 100644 > --- a/arch/powerpc/kernel/idle_power7.S > +++ b/arch/powerpc/kernel/idle_power7.S > _GLOBAL(power7_idle) > /* Now check if user or arch enabled NAP mode */ > @@ -141,49 +192,16 @@ _GLOBAL(power7_idle) > > _GLOBAL(power7_nap) > mr r4,r3 > - li r3,0 > + li r3,1 The comment at the top of this file states 0 for nap and 1 for sleep. You will need to change that. As an alternative I would suggest using the macros that you have already defined:PNV_THREAD_NAP and PNV_THREAD_SLEEP to write to r3 above and remove the lines that say 0 for nap and 1 for sleep in the comments. > b power7_powersave_common > /* No return */ > <snip> > @@ -210,12 +226,91 @@ _GLOBAL(power7_wakeup_tb_loss) > BEGIN_FTR_SECTION > CHECK_HMI_INTERRUPT > END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) > + > + li r7,1 > + mfspr r8,SPRN_PIR > + /* > + * The last 3 bits of PIR represents the thread id of a cpu > + * in power8. This will need adjusting for power7. > + */ > + andi. r8,r8,0x07 /* Get thread id into r8 */ > + rotld r7,r7,r8 > + /* r7 now has 'thread_id'th bit set */ > + > + ld r14,PACA_CORE_IDLE_STATE_PTR(r13) > +lwarx_loop2: > + lwarx r15,0,r14 > + andi. r9,r15,PNV_CORE_IDLE_LOCK_BIT > + /* > + * Lock bit is set in one of the 2 cases- > + * a. In the sleep/winkle enter path, the last thread is executing > + * fastsleep workaround code. > + * b. In the wake up path, another thread is executing fastsleep > + * workaround undo code or resyncing timebase or restoring context > + * In either case loop until the lock bit is cleared. > + */ > + bne lwarx_loop2 > + > + cmpwi cr2,r15,0 > + or r15,r15,r7 /* Set thread bit */ > + > + beq cr2,first_thread > + > + /* Not first thread in core to wake up */ > + stwcx. r15,0,r14 > + bne- lwarx_loop2 > + b common_exit > + > +first_thread: > + /* First thread in core to wakeup */ > + ori r15,r15,PNV_CORE_IDLE_LOCK_BIT > + stwcx. r15,0,r14 > + bne- lwarx_loop2 > + > + LOAD_REG_ADDR(r3, pnv_need_fastsleep_workaround) > + lbz r3,0(r3) > + cmpwi r3,1 > + /* skip fastsleep workaround if its not needed */ > + bne timebase_resync > + > + /* Undo fast sleep workaround */ > + mfcr r16 /* Backup CR into a non-volatile register */ Don't you want to do this ^^ before calling opal_call_realmode for timebase resync below also? > + li r3,1 > + li r4,0 > + li r0,OPAL_CONFIG_CPU_IDLE_STATE > + bl opal_call_realmode > + mtcr r16 /* Restore CR */ > + > + /* Do timebase resync if we are waking up from sleep. Use cr1 value > + * set in exceptions-64s.S */ > + ble cr1,clear_lock > + > +timebase_resync: > /* Time base re-sync */ > - li r3,OPAL_RESYNC_TIMEBASE > + li r0,OPAL_RESYNC_TIMEBASE > bl opal_call_realmode; > - > diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c > index 34c6665..980c964 100644 > --- a/arch/powerpc/platforms/powernv/setup.c > +++ b/arch/powerpc/platforms/powernv/setup.c > @@ -36,6 +36,8 @@ > #include <asm/opal.h> > #include <asm/kexec.h> > #include <asm/smp.h> > +#include <asm/cputhreads.h> > +#include <asm/cpuidle.h> > > #include "powernv.h" > > @@ -292,11 +294,55 @@ static void __init pnv_setup_machdep_rtas(void) > > static u32 supported_cpuidle_states; > > +static void pnv_alloc_idle_core_states(void) > +{ > + int i, j; > + int nr_cores = cpu_nr_cores(); > + u32 *core_idle_state; > + > + /* > + * Deep idle states like sleep and winkle are per core idle states. > + * A core enters these states only when all the threads enter either > + * the particular idle state or a deeper one. There are tasks like > + * fastsleep hardware bug workaround and hypervisor core state save > + * which have to be done only by the last thread of the core entering > + * deep idle state and similarly tasks like timebase resync, hypervisor > + * core register restore that have to be done only by the first thread > + * waking up from these states. Introducing core_idle_state, a per core > + * structure which will keep track threads entering idle states deeper > + * than sleep. Since you already have explained ^^ in the changelog, you do not need to elaborate it here. > + * core_idle_state - First 8 bits track the idle state of each thread > + * of the core. The 8th bit is the lock bit. Initially all thread bits > + * are set. They are cleared when the thread enters deep idle state > + * like sleep and winkle. Initially the lock bit is cleared. you can simply have the comment about the bits of core_idle_state without having to mention about when they are cleared etc.. > + * The lock bit has 2 purposes > + * a. While the first thread is restoring core state, it prevents > + * from other threads in the core from switching to prcoess context. > + * b. While the last thread in the core is saving the core state, it > + * prevent a different thread from waking up. The above two points are useful. As far as I see besides explaining the bits of core_idle_state structure and the purpose of lock bit the rest of the comments is redundant. A git-blame will let people know why all this is needed. The comment section should not be used up for this purpose IMO. Regards Preeti U Murthy ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 3/4] powernv: cpuidle: Redesign idle states management 2014-11-12 6:51 ` Preeti U Murthy @ 2014-11-17 5:49 ` Shreyas B Prabhu 0 siblings, 0 replies; 5+ messages in thread From: Shreyas B Prabhu @ 2014-11-17 5:49 UTC (permalink / raw) To: Preeti U Murthy, linux-kernel Cc: Paul Mackerras, Rafael J. Wysocki, linuxppc-dev, linux-pm Hi Preeti, On Wednesday 12 November 2014 12:21 PM, Preeti U Murthy wrote: > Hi Shreyas, > > On 11/03/2014 09:38 PM, Shreyas B. Prabhu wrote: >> diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S >> index 283c603..df11acb 100644 >> --- a/arch/powerpc/kernel/idle_power7.S >> +++ b/arch/powerpc/kernel/idle_power7.S >> _GLOBAL(power7_idle) >> /* Now check if user or arch enabled NAP mode */ >> @@ -141,49 +192,16 @@ _GLOBAL(power7_idle) >> >> _GLOBAL(power7_nap) >> mr r4,r3 >> - li r3,0 >> + li r3,1 > > The comment at the top of this file states 0 for nap and 1 for sleep. > You will need to change that. As an alternative I would suggest using > the macros that you have already defined:PNV_THREAD_NAP and > PNV_THREAD_SLEEP to write to r3 above and remove the lines that say 0 > for nap and 1 for sleep in the comments. My bad, I had changed the comment in the next commit. I'll make the change. > >> b power7_powersave_common >> /* No return */ >> > > <snip> > >> @@ -210,12 +226,91 @@ _GLOBAL(power7_wakeup_tb_loss) >> BEGIN_FTR_SECTION >> CHECK_HMI_INTERRUPT >> END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) >> + >> + li r7,1 >> + mfspr r8,SPRN_PIR >> + /* >> + * The last 3 bits of PIR represents the thread id of a cpu >> + * in power8. This will need adjusting for power7. >> + */ >> + andi. r8,r8,0x07 /* Get thread id into r8 */ >> + rotld r7,r7,r8 >> + /* r7 now has 'thread_id'th bit set */ >> + >> + ld r14,PACA_CORE_IDLE_STATE_PTR(r13) >> +lwarx_loop2: >> + lwarx r15,0,r14 >> + andi. r9,r15,PNV_CORE_IDLE_LOCK_BIT >> + /* >> + * Lock bit is set in one of the 2 cases- >> + * a. In the sleep/winkle enter path, the last thread is executing >> + * fastsleep workaround code. >> + * b. In the wake up path, another thread is executing fastsleep >> + * workaround undo code or resyncing timebase or restoring context >> + * In either case loop until the lock bit is cleared. >> + */ >> + bne lwarx_loop2 >> + >> + cmpwi cr2,r15,0 >> + or r15,r15,r7 /* Set thread bit */ >> + >> + beq cr2,first_thread >> + >> + /* Not first thread in core to wake up */ >> + stwcx. r15,0,r14 >> + bne- lwarx_loop2 >> + b common_exit >> + >> +first_thread: >> + /* First thread in core to wakeup */ >> + ori r15,r15,PNV_CORE_IDLE_LOCK_BIT >> + stwcx. r15,0,r14 >> + bne- lwarx_loop2 >> + >> + LOAD_REG_ADDR(r3, pnv_need_fastsleep_workaround) >> + lbz r3,0(r3) >> + cmpwi r3,1 >> + /* skip fastsleep workaround if its not needed */ >> + bne timebase_resync >> + >> + /* Undo fast sleep workaround */ >> + mfcr r16 /* Backup CR into a non-volatile register */ > > Don't you want to do this ^^ before calling opal_call_realmode for > timebase resync below also? > I am not using any of the CR registers after the timebase resync OPAL call. Though in the next commit where I do use the CRs, I restore them after the OPAL call. >> + li r3,1 >> + li r4,0 >> + li r0,OPAL_CONFIG_CPU_IDLE_STATE >> + bl opal_call_realmode >> + mtcr r16 /* Restore CR */ >> + >> + /* Do timebase resync if we are waking up from sleep. Use cr1 value >> + * set in exceptions-64s.S */ >> + ble cr1,clear_lock >> + >> +timebase_resync: >> /* Time base re-sync */ >> - li r3,OPAL_RESYNC_TIMEBASE >> + li r0,OPAL_RESYNC_TIMEBASE >> bl opal_call_realmode; >> - >> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c >> index 34c6665..980c964 100644 >> --- a/arch/powerpc/platforms/powernv/setup.c >> +++ b/arch/powerpc/platforms/powernv/setup.c >> @@ -36,6 +36,8 @@ >> #include <asm/opal.h> >> #include <asm/kexec.h> >> #include <asm/smp.h> >> +#include <asm/cputhreads.h> >> +#include <asm/cpuidle.h> >> >> #include "powernv.h" >> >> @@ -292,11 +294,55 @@ static void __init pnv_setup_machdep_rtas(void) >> >> static u32 supported_cpuidle_states; >> >> +static void pnv_alloc_idle_core_states(void) >> +{ >> + int i, j; >> + int nr_cores = cpu_nr_cores(); >> + u32 *core_idle_state; >> + >> + /* >> + * Deep idle states like sleep and winkle are per core idle states. >> + * A core enters these states only when all the threads enter either >> + * the particular idle state or a deeper one. There are tasks like >> + * fastsleep hardware bug workaround and hypervisor core state save >> + * which have to be done only by the last thread of the core entering >> + * deep idle state and similarly tasks like timebase resync, hypervisor >> + * core register restore that have to be done only by the first thread >> + * waking up from these states. Introducing core_idle_state, a per core >> + * structure which will keep track threads entering idle states deeper >> + * than sleep. > > Since you already have explained ^^ in the changelog, you do not need to > elaborate it here. > >> + * core_idle_state - First 8 bits track the idle state of each thread >> + * of the core. The 8th bit is the lock bit. Initially all thread bits >> + * are set. They are cleared when the thread enters deep idle state >> + * like sleep and winkle. Initially the lock bit is cleared. > > you can simply have the comment about the bits of core_idle_state > without having to mention about when they are cleared etc.. Okay. Will make the change. > >> + * The lock bit has 2 purposes >> + * a. While the first thread is restoring core state, it prevents >> + * from other threads in the core from switching to prcoess context. >> + * b. While the last thread in the core is saving the core state, it >> + * prevent a different thread from waking up. > > The above two points are useful. As far as I see besides explaining the > bits of core_idle_state structure and the purpose of lock bit the rest > of the comments is redundant. A git-blame will let people know why all > this is needed. The comment section should not be used up for this > purpose IMO. > > Regards > Preeti U Murthy > _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-11-17 5:49 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-11-03 16:08 [PATCH 0/4] powernv: cpuidle: Redesign idle states management Shreyas B. Prabhu 2014-11-03 16:08 ` [PATCH 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu 2014-11-03 16:08 ` [PATCH 3/4] powernv: cpuidle: Redesign idle states management Shreyas B. Prabhu 2014-11-12 6:51 ` Preeti U Murthy 2014-11-17 5:49 ` Shreyas B Prabhu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).