* [patch 0/2] x86: Make offline cpus to go to deepest idle state using mwait @ 2009-05-22 23:19 venkatesh.pallipadi 2009-05-22 23:19 ` [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs venkatesh.pallipadi 2009-05-22 23:19 ` [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate venkatesh.pallipadi 0 siblings, 2 replies; 14+ messages in thread From: venkatesh.pallipadi @ 2009-05-22 23:19 UTC (permalink / raw) To: mingo, tglx, hpa Cc: linux-kernel, lenb, shaohua.li, svaidy, a.p.zijlstra, Venkatesh Pallipadi cpu hotplug has this long standing bug, with offline cpus going into hlt loop. That can result in offline CPU burning more power than while it is in online state (as when online it can potentially go to deeper C-state). Fix the bug using native mwait to enter deepest C-state in play_dead. -- ^ permalink raw reply [flat|nested] 14+ messages in thread
* [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs 2009-05-22 23:19 [patch 0/2] x86: Make offline cpus to go to deepest idle state using mwait venkatesh.pallipadi @ 2009-05-22 23:19 ` venkatesh.pallipadi 2009-05-23 10:44 ` Peter Zijlstra 2009-05-22 23:19 ` [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate venkatesh.pallipadi 1 sibling, 1 reply; 14+ messages in thread From: venkatesh.pallipadi @ 2009-05-22 23:19 UTC (permalink / raw) To: mingo, tglx, hpa Cc: linux-kernel, lenb, shaohua.li, svaidy, a.p.zijlstra, Venkatesh Pallipadi [-- Attachment #1: 0001-x86-Add-pm_play_dead-funcptr-to-power-efficiently-o.patch --] [-- Type: text/plain, Size: 1411 bytes --] Add a funcptr pm_play_dead (similar to pm_idle) that can take the offline CPUs to the most power efficient idle state. This patch just adds the func pointer. The pointer will get initialized by patch that follows. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> --- arch/x86/include/asm/smp.h | 2 ++ arch/x86/kernel/smpboot.c | 7 ++++++- 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 19e0d88..0388b81 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -148,6 +148,8 @@ static inline int num_booting_cpus(void) { return cpumask_weight(cpu_callout_mask); } + +extern void (*pm_play_dead)(void); #endif /* CONFIG_SMP */ extern unsigned disabled_cpus __cpuinitdata; diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 58d24ef..4f8af6a 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1195,6 +1195,8 @@ __init void prefill_possible_map(void) nr_cpu_ids = possible; } +void (*pm_play_dead)(void) = NULL; + #ifdef CONFIG_HOTPLUG_CPU static void remove_siblinginfo(int cpu) @@ -1313,7 +1315,10 @@ void play_dead_common(void) void native_play_dead(void) { play_dead_common(); - wbinvd_halt(); + if (pm_play_dead) + pm_play_dead(); + else + wbinvd_halt(); } #else /* ... !CONFIG_HOTPLUG_CPU */ -- 1.6.0.6 -- ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs 2009-05-22 23:19 ` [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs venkatesh.pallipadi @ 2009-05-23 10:44 ` Peter Zijlstra 2009-05-23 15:07 ` Pallipadi, Venkatesh 2009-06-22 17:25 ` Pallipadi, Venkatesh 0 siblings, 2 replies; 14+ messages in thread From: Peter Zijlstra @ 2009-05-23 10:44 UTC (permalink / raw) To: venkatesh.pallipadi Cc: mingo, tglx, hpa, linux-kernel, lenb, shaohua.li, svaidy On Fri, 2009-05-22 at 16:19 -0700, venkatesh.pallipadi@intel.com wrote: > plain text document attachment > (0001-x86-Add-pm_play_dead-funcptr-to-power-efficiently-o.patch) > Add a funcptr pm_play_dead (similar to pm_idle) that can take > the offline CPUs to the most power efficient idle state. > > This patch just adds the func pointer. The pointer will get initialized > by patch that follows. Since the pm_idle function pointer has given us so much grief, I don't think its wise to repeat that particular disaster. I'd much rather see a framework where idle functions can be registered, and selected from based on criteria such as wakeup latency as provided by the pm_qos stuff, and power saving. This framework should be shared between hotplug-idle and the regular idle routines. Hotplug would of course not care about things like wakeup latency and might therefore pick another idle routine. > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > --- > arch/x86/include/asm/smp.h | 2 ++ > arch/x86/kernel/smpboot.c | 7 ++++++- > 2 files changed, 8 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h > index 19e0d88..0388b81 100644 > --- a/arch/x86/include/asm/smp.h > +++ b/arch/x86/include/asm/smp.h > @@ -148,6 +148,8 @@ static inline int num_booting_cpus(void) > { > return cpumask_weight(cpu_callout_mask); > } > + > +extern void (*pm_play_dead)(void); > #endif /* CONFIG_SMP */ > > extern unsigned disabled_cpus __cpuinitdata; > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 58d24ef..4f8af6a 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -1195,6 +1195,8 @@ __init void prefill_possible_map(void) > nr_cpu_ids = possible; > } > > +void (*pm_play_dead)(void) = NULL; > + > #ifdef CONFIG_HOTPLUG_CPU > > static void remove_siblinginfo(int cpu) > @@ -1313,7 +1315,10 @@ void play_dead_common(void) > void native_play_dead(void) > { > play_dead_common(); > - wbinvd_halt(); > + if (pm_play_dead) > + pm_play_dead(); > + else > + wbinvd_halt(); > } > > #else /* ... !CONFIG_HOTPLUG_CPU */ > -- > 1.6.0.6 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs 2009-05-23 10:44 ` Peter Zijlstra @ 2009-05-23 15:07 ` Pallipadi, Venkatesh 2009-06-22 17:25 ` Pallipadi, Venkatesh 1 sibling, 0 replies; 14+ messages in thread From: Pallipadi, Venkatesh @ 2009-05-23 15:07 UTC (permalink / raw) To: Peter Zijlstra Cc: mingo@elte.hu, tglx@linutronix.de, hpa@zytor.com, linux-kernel@vger.kernel.org, lenb@kernel.org, Li, Shaohua, svaidy@linux.vnet.ibm.com >-----Original Message----- >From: Peter Zijlstra [mailto:a.p.zijlstra@chello.nl] >Sent: Saturday, May 23, 2009 3:45 AM >To: Pallipadi, Venkatesh >Cc: mingo@elte.hu; tglx@linutronix.de; hpa@zytor.com; >linux-kernel@vger.kernel.org; lenb@kernel.org; Li, Shaohua; >svaidy@linux.vnet.ibm.com >Subject: Re: [patch 1/2] x86: Add pm_play_dead funcptr to >power-efficiently offline CPUs > >On Fri, 2009-05-22 at 16:19 -0700, venkatesh.pallipadi@intel.com wrote: >> plain text document attachment >> (0001-x86-Add-pm_play_dead-funcptr-to-power-efficiently-o.patch) >> Add a funcptr pm_play_dead (similar to pm_idle) that can take >> the offline CPUs to the most power efficient idle state. >> >> This patch just adds the func pointer. The pointer will get >initialized >> by patch that follows. > >Since the pm_idle function pointer has given us so much grief, I don't >think its wise to repeat that particular disaster. > >I'd much rather see a framework where idle functions can be registered, >and selected from based on criteria such as wakeup latency as provided >by the pm_qos stuff, and power saving. > >This framework should be shared between hotplug-idle and the regular >idle routines. Hotplug would of course not care about things >like wakeup >latency and might therefore pick another idle routine. > Yes. Thought about that approach. There are issues with that approach however. - It would be ugly as we have to add logic in various low level idle handlers we have today mwait_idle, default_idle, acpi_mwait_idle, acpi_ioport_idle to make them aware of offline idle case and disable interrupt and not enable interrupt while going idle and not to wake up on interrupt. - There are limitations on what we can and cannot do when offline. We have had issues with calling acpi methods and/or IO port accesses, during suspend before, so CPU going on suspend only uses C1. Specifically, with the above approach we will have: 1) CPU 1 going offline and starts using regular ACPI IO port C-states. 2) CPU 0 starts suspend and after some points it is unsafe to access ACPI io port accesses, which can potentially result in crash/hangs. So, I would like to keep ACPI out of play_dead. And keeping idle and play_dead separate seemed to be cleaner way to do it. Thanks, Venki ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs 2009-05-23 10:44 ` Peter Zijlstra 2009-05-23 15:07 ` Pallipadi, Venkatesh @ 2009-06-22 17:25 ` Pallipadi, Venkatesh 1 sibling, 0 replies; 14+ messages in thread From: Pallipadi, Venkatesh @ 2009-06-22 17:25 UTC (permalink / raw) To: Pallipadi, Venkatesh, Peter Zijlstra Cc: mingo@elte.hu, tglx@linutronix.de, hpa@zytor.com, linux-kernel@vger.kernel.org, lenb@kernel.org, Li, Shaohua, svaidy@linux.vnet.ibm.com >From: Pallipadi, Venkatesh >>From: Peter Zijlstra [mailto:a.p.zijlstra@chello.nl] >>On Fri, 2009-05-22 at 16:19 -0700, >venkatesh.pallipadi@intel.com wrote: >>> plain text document attachment >>> (0001-x86-Add-pm_play_dead-funcptr-to-power-efficiently-o.patch) >>> Add a funcptr pm_play_dead (similar to pm_idle) that can take >>> the offline CPUs to the most power efficient idle state. >>> >>> This patch just adds the func pointer. The pointer will get >>initialized >>> by patch that follows. >> >>Since the pm_idle function pointer has given us so much grief, I don't >>think its wise to repeat that particular disaster. >> >>I'd much rather see a framework where idle functions can be >registered, >>and selected from based on criteria such as wakeup latency as provided >>by the pm_qos stuff, and power saving. >> >>This framework should be shared between hotplug-idle and the regular >>idle routines. Hotplug would of course not care about things >>like wakeup >>latency and might therefore pick another idle routine. >> > >Yes. Thought about that approach. There are issues with that >approach however. >- It would be ugly as we have to add logic in various low >level idle handlers we have today mwait_idle, default_idle, >acpi_mwait_idle, acpi_ioport_idle to make them aware of >offline idle case and disable interrupt and not enable >interrupt while going idle and not to wake up on interrupt. >- There are limitations on what we can and cannot do when >offline. We have had issues with calling acpi methods and/or >IO port accesses, during suspend before, so CPU going on >suspend only uses C1. Specifically, with the above approach we >will have: >1) CPU 1 going offline and starts using regular ACPI IO port C-states. >2) CPU 0 starts suspend and after some points it is unsafe to >access ACPI io port accesses, which can potentially result in >crash/hangs. > >So, I would like to keep ACPI out of play_dead. And keeping >idle and play_dead separate seemed to be cleaner way to do it. > Peter, Any more comments on this approach? As I mentioned above, from my POV, it is simpler and cleaner to keep ACPI out of the CPU offline path. Do you agree? Thanks, Venki ^ permalink raw reply [flat|nested] 14+ messages in thread
* [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate 2009-05-22 23:19 [patch 0/2] x86: Make offline cpus to go to deepest idle state using mwait venkatesh.pallipadi 2009-05-22 23:19 ` [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs venkatesh.pallipadi @ 2009-05-22 23:19 ` venkatesh.pallipadi 2009-05-25 0:56 ` Shaohua Li 2010-09-17 23:46 ` [tip:x86/idle] x86, hotplug: Use mwait to offline a processor, fix the legacy case tip-bot for H. Peter Anvin 1 sibling, 2 replies; 14+ messages in thread From: venkatesh.pallipadi @ 2009-05-22 23:19 UTC (permalink / raw) To: mingo, tglx, hpa Cc: linux-kernel, lenb, shaohua.li, svaidy, a.p.zijlstra, Venkatesh Pallipadi [-- Attachment #1: 0002-x86-put-offline-CPUs-into-deepest-mwait-cstate_subc.patch --] [-- Type: text/plain, Size: 2075 bytes --] Offline CPUs can save power by going into deepest cstate, subcstate instead of hlt loop. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> --- arch/x86/kernel/acpi/cstate.c | 51 +++++++++++++++++++++++++++++++++++++++++ 1 files changed, 51 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c index bbbe4bb..5b0988a 100644 --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -150,6 +150,54 @@ void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx) } EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter); +static unsigned long mwait_play_dead_eax; + +static void mwait_play_dead(void) +{ + if (boot_cpu_data.x86 >= 4) + wbinvd(); + + while (1) { + __monitor((void *)¤t_thread_info()->flags, 0, 0); + smp_mb(); + __mwait(mwait_play_dead_eax, 0); + } +} + +static void mwait_play_dead_init(void *unused) +{ + unsigned int eax, ebx, ecx, edx; + unsigned int highest_cstate = 0; + unsigned int highest_subcstate = 0; + int i; + + if (!boot_cpu_has(X86_FEATURE_MWAIT)) + return; + + if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF) + return; + + pm_play_dead = mwait_play_dead; + + cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx); + /* + * mwait_play_dead_eax will be 0 if EDX enumeration is not valid. + * Initialized below to cstate, sub_cstate value when EDX is valid. + */ + if (!(ecx & 0x1)) + return; + + edx >>= MWAIT_SUBSTATE_SIZE; + for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) { + if (edx & MWAIT_SUBSTATE_MASK) { + highest_cstate = i; + highest_subcstate = edx & MWAIT_SUBSTATE_MASK; + } + } + mwait_play_dead_eax = (highest_cstate << MWAIT_SUBSTATE_SIZE) | + (highest_subcstate - 1); +} + static int __init ffh_cstate_init(void) { struct cpuinfo_x86 *c = &boot_cpu_data; @@ -157,6 +205,9 @@ static int __init ffh_cstate_init(void) return -1; cpu_cstate_entry = alloc_percpu(struct cstate_entry); + + smp_call_function_single(0, mwait_play_dead_init, NULL, 1); + return 0; } -- 1.6.0.6 -- ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate 2009-05-22 23:19 ` [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate venkatesh.pallipadi @ 2009-05-25 0:56 ` Shaohua Li 2009-05-26 21:17 ` Pallipadi, Venkatesh 2010-09-17 23:46 ` [tip:x86/idle] x86, hotplug: Use mwait to offline a processor, fix the legacy case tip-bot for H. Peter Anvin 1 sibling, 1 reply; 14+ messages in thread From: Shaohua Li @ 2009-05-25 0:56 UTC (permalink / raw) To: Pallipadi, Venkatesh Cc: mingo@elte.hu, tglx@linutronix.de, hpa@zytor.com, linux-kernel@vger.kernel.org, lenb@kernel.org, svaidy@linux.vnet.ibm.com, a.p.zijlstra@chello.nl On Sat, May 23, 2009 at 07:19:42AM +0800, Pallipadi, Venkatesh wrote: > Offline CPUs can save power by going into deepest cstate, subcstate > instead of hlt loop. > > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > --- > arch/x86/kernel/acpi/cstate.c | 51 +++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 51 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c > index bbbe4bb..5b0988a 100644 > --- a/arch/x86/kernel/acpi/cstate.c > +++ b/arch/x86/kernel/acpi/cstate.c > @@ -150,6 +150,54 @@ void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx) > } > EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter); > > +static unsigned long mwait_play_dead_eax; > + > +static void mwait_play_dead(void) > +{ > + if (boot_cpu_data.x86 >= 4) > + wbinvd(); > + > + while (1) { > + __monitor((void *)¤t_thread_info()->flags, 0, 0); > + smp_mb(); > + __mwait(mwait_play_dead_eax, 0); > + } > +} CPU is dead, can current_thread_info() still be used? Maybe just monitor a never changed address. Looks the patch will always take the highest native C-state, is this safe, considering BIOS usually limit C-state? Thanks, Shaohua ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate 2009-05-25 0:56 ` Shaohua Li @ 2009-05-26 21:17 ` Pallipadi, Venkatesh 0 siblings, 0 replies; 14+ messages in thread From: Pallipadi, Venkatesh @ 2009-05-26 21:17 UTC (permalink / raw) To: Li, Shaohua Cc: mingo@elte.hu, tglx@linutronix.de, hpa@zytor.com, linux-kernel@vger.kernel.org, lenb@kernel.org, svaidy@linux.vnet.ibm.com, a.p.zijlstra@chello.nl On Sun, 2009-05-24 at 17:56 -0700, Li, Shaohua wrote: > On Sat, May 23, 2009 at 07:19:42AM +0800, Pallipadi, Venkatesh wrote: > > Offline CPUs can save power by going into deepest cstate, subcstate > > instead of hlt loop. > > > > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > > --- > > arch/x86/kernel/acpi/cstate.c | 51 +++++++++++++++++++++++++++++++++++++++++ > > 1 files changed, 51 insertions(+), 0 deletions(-) > > > > diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c > > index bbbe4bb..5b0988a 100644 > > --- a/arch/x86/kernel/acpi/cstate.c > > +++ b/arch/x86/kernel/acpi/cstate.c > > @@ -150,6 +150,54 @@ void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx) > > } > > EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter); > > > > +static unsigned long mwait_play_dead_eax; > > + > > +static void mwait_play_dead(void) > > +{ > > + if (boot_cpu_data.x86 >= 4) > > + wbinvd(); > > + > > + while (1) { > > + __monitor((void *)¤t_thread_info()->flags, 0, 0); > > + smp_mb(); > > + __mwait(mwait_play_dead_eax, 0); > > + } > > +} > CPU is dead, can current_thread_info() still be used? Maybe just monitor a never changed > address. > We still execute in this while loop even for an offline CPU, in case CPU just wakes up for whatever reason (HT sibling woke up etc). So, we should have stack and current_thread_info() even when offline, and we can use it here. I had a never changing variable in my earlier version of this patch. But removed it as I don't see why we should have couple of cache lines of data when we can avoid it. > Looks the patch will always take the highest native C-state, is this safe, considering > BIOS usually limit C-state? > It may not be safe in terms of wakeup latency etc. So, we use BIOS CST for normal C-states. When offline, we don't really care about latency part. If CPU supports a C-state, we should be able to enter it and we can save most power with deepest C-state. If there are functionality issues with C-state, I am sure CPU feature (hw or microcode) will have it disabled, instead of depending on BIOSes to disable it. Thanks, Venki ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tip:x86/idle] x86, hotplug: Use mwait to offline a processor, fix the legacy case 2009-05-22 23:19 ` [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate venkatesh.pallipadi 2009-05-25 0:56 ` Shaohua Li @ 2010-09-17 23:46 ` tip-bot for H. Peter Anvin 2010-09-18 0:13 ` [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop tip-bot for H. Peter Anvin 1 sibling, 1 reply; 14+ messages in thread From: tip-bot for H. Peter Anvin @ 2010-09-17 23:46 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, arjan, venkatesh.pallipadi, a.p.zijlstra, lenb, tglx, hpa, venki Commit-ID: ea53069231f9317062910d6e772cca4ce93de8c8 Gitweb: http://git.kernel.org/tip/ea53069231f9317062910d6e772cca4ce93de8c8 Author: H. Peter Anvin <hpa@linux.intel.com> AuthorDate: Fri, 17 Sep 2010 15:39:11 -0700 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Fri, 17 Sep 2010 15:39:11 -0700 x86, hotplug: Use mwait to offline a processor, fix the legacy case The code in native_play_dead() has a number of problems: 1. We should use MWAIT when available, to put ourselves into a deeper sleep state. 2. We use the existence of CLFLUSH to determine if WBINVD is safe, but that is totally bogus -- WBINVD is 486+, whereas CLFLUSH is a much later addition. 3. We should do WBINVD inside the loop, just in case of something like setting an A bit on page tables. Pointed out by Arjan van de Ven. This code is based in part of a previous patch by Venki Pallipadi, but unlike that patch this one keeps all the detection code local instead of pre-caching a bunch of information. We're shutting down the CPU; there is absolutely no hurry. This patch moves all the code to C and deletes the global wbinvd_halt() which is broken anyway. Originally-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.hl> LKML-Reference: <20090522232230.162239000@intel.com> --- arch/x86/include/asm/processor.h | 23 -------------- arch/x86/kernel/smpboot.c | 63 +++++++++++++++++++++++++++++++++++++- 2 files changed, 62 insertions(+), 24 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 325b7bd..f358241 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -764,29 +764,6 @@ extern unsigned long idle_halt; extern unsigned long idle_nomwait; extern bool c1e_detected; -/* - * on systems with caches, caches must be flashed as the absolute - * last instruction before going into a suspended halt. Otherwise, - * dirty data can linger in the cache and become stale on resume, - * leading to strange errors. - * - * perform a variety of operations to guarantee that the compiler - * will not reorder instructions. wbinvd itself is serializing - * so the processor will not reorder. - * - * Systems without cache can just go into halt. - */ -static inline void wbinvd_halt(void) -{ - mb(); - /* check for clflush to determine if wbinvd is legal */ - if (cpu_has_clflush) - asm volatile("cli; wbinvd; 1: hlt; jmp 1b" : : : "memory"); - else - while (1) - halt(); -} - extern void enable_sep_cpu(void); extern int sysenter_setup(void); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 8b3bfc4..07bf423 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -62,6 +62,7 @@ #include <asm/pgtable.h> #include <asm/tlbflush.h> #include <asm/mtrr.h> +#include <asm/mwait.h> #include <asm/vmi.h> #include <asm/apic.h> #include <asm/setup.h> @@ -1383,11 +1384,71 @@ void play_dead_common(void) local_irq_disable(); } +/* + * We need to flush the caches before going to sleep, lest we have + * dirty data in our caches when we come back up. + */ +static inline void mwait_play_dead(void) +{ + unsigned int eax, ebx, ecx, edx; + unsigned int highest_cstate = 0; + unsigned int highest_subcstate = 0; + int i; + + if (!cpu_has(¤t_cpu_data, X86_FEATURE_MWAIT)) + return; + if (current_cpu_data.cpuid_level < CPUID_MWAIT_LEAF) + return; + + eax = CPUID_MWAIT_LEAF; + ecx = 0; + native_cpuid(&eax, &ebx, &ecx, &edx); + + /* + * eax will be 0 if EDX enumeration is not valid. + * Initialized below to cstate, sub_cstate value when EDX is valid. + */ + if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED)) { + eax = 0; + } else { + edx >>= MWAIT_SUBSTATE_SIZE; + for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) { + if (edx & MWAIT_SUBSTATE_MASK) { + highest_cstate = i; + highest_subcstate = edx & MWAIT_SUBSTATE_MASK; + } + } + eax = (highest_cstate << MWAIT_SUBSTATE_SIZE) | + (highest_subcstate - 1); + } + + while (1) { + mb(); + wbinvd(); + __monitor(¤t_thread_info()->flags, 0, 0); + mb(); + __mwait(eax, 0); + } +} + +static inline void hlt_play_dead(void) +{ + while (1) { + mb(); + if (current_cpu_data.x86 >= 4) + wbinvd(); + mb(); + native_halt(); + } +} + void native_play_dead(void) { play_dead_common(); tboot_shutdown(TB_SHUTDOWN_WFS); - wbinvd_halt(); + + mwait_play_dead(); /* Only returns on failure */ + hlt_play_dead(); } #else /* ... !CONFIG_HOTPLUG_CPU */ ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop 2010-09-17 23:46 ` [tip:x86/idle] x86, hotplug: Use mwait to offline a processor, fix the legacy case tip-bot for H. Peter Anvin @ 2010-09-18 0:13 ` tip-bot for H. Peter Anvin 2010-09-18 0:48 ` Venkatesh Pallipadi 0 siblings, 1 reply; 14+ messages in thread From: tip-bot for H. Peter Anvin @ 2010-09-18 0:13 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, asit.k.mallick, a.p.zijlstra, arjan, lenb, tglx, hpa, venki Commit-ID: a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a Gitweb: http://git.kernel.org/tip/a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a Author: H. Peter Anvin <hpa@linux.intel.com> AuthorDate: Fri, 17 Sep 2010 17:06:46 -0700 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Fri, 17 Sep 2010 17:10:23 -0700 x86, hotplug: Move WBINVD back outside the play_dead loop On processors with hyperthreading, when only one thread is offlined the other thread can cause a spurious wakeup on the idled thread. We do not want to re-WBINVD when that happens. Ideally, we should simply skip WBINVD unless we're the last thread on a particular core to shut down, but there might be similar issues elsewhere in the system. Thus, revert to previous behavior of only WBINVD outside the loop. Partly as a result, remove the mb()'s around it: they are not necessary since wbinvd() is a serializing instruction, but they were intended to make sure the compiler didn't do any funny loop optimizations. Reported-by: Asit Mallick <asit.k.mallick@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Arjan van de Ven <arjan@linux.kernel.org> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.hl> LKML-Reference: <tip-ea53069231f9317062910d6e772cca4ce93de8c8@git.kernel.org> --- arch/x86/kernel/smpboot.c | 11 +++++------ 1 files changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 07bf423..55c80ffb 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1422,9 +1422,9 @@ static inline void mwait_play_dead(void) (highest_subcstate - 1); } + wbinvd(); + while (1) { - mb(); - wbinvd(); __monitor(¤t_thread_info()->flags, 0, 0); mb(); __mwait(eax, 0); @@ -1433,11 +1433,10 @@ static inline void mwait_play_dead(void) static inline void hlt_play_dead(void) { + if (current_cpu_data.x86 >= 4) + wbinvd(); + while (1) { - mb(); - if (current_cpu_data.x86 >= 4) - wbinvd(); - mb(); native_halt(); } } ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop 2010-09-18 0:13 ` [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop tip-bot for H. Peter Anvin @ 2010-09-18 0:48 ` Venkatesh Pallipadi 2010-09-20 18:20 ` H. Peter Anvin 0 siblings, 1 reply; 14+ messages in thread From: Venkatesh Pallipadi @ 2010-09-18 0:48 UTC (permalink / raw) To: mingo, hpa, linux-kernel, asit.k.mallick, arjan, a.p.zijlstra, lenb, tglx, hpa, venki Cc: linux-tip-commits On Fri, Sep 17, 2010 at 5:13 PM, tip-bot for H. Peter Anvin <hpa@linux.intel.com> wrote: > Commit-ID: a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a > Gitweb: http://git.kernel.org/tip/a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a > Author: H. Peter Anvin <hpa@linux.intel.com> > AuthorDate: Fri, 17 Sep 2010 17:06:46 -0700 > Committer: H. Peter Anvin <hpa@linux.intel.com> > CommitDate: Fri, 17 Sep 2010 17:10:23 -0700 > > x86, hotplug: Move WBINVD back outside the play_dead loop > > On processors with hyperthreading, when only one thread is offlined > the other thread can cause a spurious wakeup on the idled thread. We > do not want to re-WBINVD when that happens. > > Ideally, we should simply skip WBINVD unless we're the last thread on > a particular core to shut down, but there might be similar issues > elsewhere in the system. > > Thus, revert to previous behavior of only WBINVD outside the loop. > Partly as a result, remove the mb()'s around it: they are not > necessary since wbinvd() is a serializing instruction, but they were > intended to make sure the compiler didn't do any funny loop > optimizations. > > Reported-by: Asit Mallick <asit.k.mallick@intel.com> > Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> > Cc: Arjan van de Ven <arjan@linux.kernel.org> > Cc: Len Brown <lenb@kernel.org> > Cc: Venkatesh Pallipadi <venki@google.com> > Cc: Peter Zijlstra <a.p.zijlstra@chello.hl> > LKML-Reference: <tip-ea53069231f9317062910d6e772cca4ce93de8c8@git.kernel.org> > --- > arch/x86/kernel/smpboot.c | 11 +++++------ > 1 files changed, 5 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 07bf423..55c80ffb 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -1422,9 +1422,9 @@ static inline void mwait_play_dead(void) > (highest_subcstate - 1); > } > > + wbinvd(); > + > while (1) { > - mb(); > - wbinvd(); > __monitor(¤t_thread_info()->flags, 0, 0); > mb(); Just one observation. There are some CPUs with errata that need clflush before monitor. So, if that CPU wakesup spuriously it may have problem reentering idle. Not sure whether that will be a problem as that errata also depended on read happening on the flag. May be its better to do monitor (0, 0, 0). Thanks, Venki ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop 2010-09-18 0:48 ` Venkatesh Pallipadi @ 2010-09-20 18:20 ` H. Peter Anvin 2010-09-20 20:11 ` H. Peter Anvin 0 siblings, 1 reply; 14+ messages in thread From: H. Peter Anvin @ 2010-09-20 18:20 UTC (permalink / raw) To: Venkatesh Pallipadi Cc: mingo, linux-kernel, asit.k.mallick, arjan, a.p.zijlstra, lenb, tglx, hpa, linux-tip-commits On 09/17/2010 05:48 PM, Venkatesh Pallipadi wrote: >> >> + wbinvd(); >> + >> while (1) { >> - mb(); >> - wbinvd(); >> __monitor(¤t_thread_info()->flags, 0, 0); >> mb(); > > > Just one observation. There are some CPUs with errata that need > clflush before monitor. So, if that CPU wakesup spuriously it may have > problem reentering idle. Not sure whether that will be a problem as > that errata also depended on read happening on the flag. May be its > better to do monitor (0, 0, 0). > It seems the easy way to deal with that would be to just add clflush before monitor... it is *probably* redundant, but it should be safe to do. It means depending on X86_FEATURE_CLFLUSH as well as X86_FEATURE_MWAIT, but I don't think there is any x86 processor which has MWAIT and not CLFLUSH, and I highly doubt there ever will be. Does anyone see any downside? -hpa ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop 2010-09-20 18:20 ` H. Peter Anvin @ 2010-09-20 20:11 ` H. Peter Anvin 2010-09-20 22:34 ` Venkatesh Pallipadi 0 siblings, 1 reply; 14+ messages in thread From: H. Peter Anvin @ 2010-09-20 20:11 UTC (permalink / raw) To: Venkatesh Pallipadi Cc: mingo, linux-kernel, asit.k.mallick, arjan, a.p.zijlstra, lenb, tglx, hpa, linux-tip-commits, Andi Kleen [-- Attachment #1: Type: text/plain, Size: 1074 bytes --] On 09/20/2010 11:20 AM, H. Peter Anvin wrote: > On 09/17/2010 05:48 PM, Venkatesh Pallipadi wrote: >>> >>> + wbinvd(); >>> + >>> while (1) { >>> - mb(); >>> - wbinvd(); >>> __monitor(¤t_thread_info()->flags, 0, 0); >>> mb(); >> >> >> Just one observation. There are some CPUs with errata that need >> clflush before monitor. So, if that CPU wakesup spuriously it may have >> problem reentering idle. Not sure whether that will be a problem as >> that errata also depended on read happening on the flag. May be its >> better to do monitor (0, 0, 0). >> > > It seems the easy way to deal with that would be to just add clflush > before monitor... it is *probably* redundant, but it should be safe to > do. It means depending on X86_FEATURE_CLFLUSH as well as > X86_FEATURE_MWAIT, but I don't think there is any x86 processor which > has MWAIT and not CLFLUSH, and I highly doubt there ever will be. > > Does anyone see any downside? > Patch for review... anyone see any problem with it? -hpa [-- Attachment #2: 0001-x86-hotplug-In-the-MWAIT-case-of-play_dead-CLFLUSH-t.patch --] [-- Type: text/x-patch, Size: 2265 bytes --] >From 83c70ace9aad19ba7c41986f3c34a0808fbaf78f Mon Sep 17 00:00:00 2001 From: H. Peter Anvin <hpa@linux.intel.com> Date: Mon, 20 Sep 2010 13:04:45 -0700 Subject: [PATCH] x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line When we're using MWAIT for play_dead, explicitly CLFLUSH the cache line before executing MONITOR. This is a potential workaround for the Xeon 7400 erratum AAI65 after having a spurious wakeup and returning around the loop. "Potential" here because it is not certain that that erratum could actually trigger; however, the CLFLUSH should be harmless. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.kernel.org> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venki@google.com> --- arch/x86/kernel/smpboot.c | 20 +++++++++++++++++++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 55c80ffb..fdccfe9 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1394,9 +1394,12 @@ static inline void mwait_play_dead(void) unsigned int highest_cstate = 0; unsigned int highest_subcstate = 0; int i; + void *mwait_ptr; if (!cpu_has(¤t_cpu_data, X86_FEATURE_MWAIT)) return; + if (!cpu_has(¤t_cpu_data, X86_FEATURE_CLFLSH)) + return; if (current_cpu_data.cpuid_level < CPUID_MWAIT_LEAF) return; @@ -1422,10 +1425,25 @@ static inline void mwait_play_dead(void) (highest_subcstate - 1); } + /* + * This should be a memory location in a cache line which is + * unlikely to be touched by other processors. The actual + * content is immaterial as it is not actually modified in any way. + */ + mwait_ptr = ¤t_thread_info()->flags; + wbinvd(); while (1) { - __monitor(¤t_thread_info()->flags, 0, 0); + /* + * The CLFLUSH is a workaround for erratum AAI65 for + * the Xeon 7400 series. It's not clear it is actually + * needed, but it should be harmless in either case. + * The WBINVD is insufficient due to the spurious-wakeup + * case where we return around the loop. + */ + clflush(mwait_ptr); + __monitor(mwait_ptr, 0, 0); mb(); __mwait(eax, 0); } -- 1.7.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop 2010-09-20 20:11 ` H. Peter Anvin @ 2010-09-20 22:34 ` Venkatesh Pallipadi 0 siblings, 0 replies; 14+ messages in thread From: Venkatesh Pallipadi @ 2010-09-20 22:34 UTC (permalink / raw) To: H. Peter Anvin Cc: mingo, linux-kernel, asit.k.mallick, arjan, a.p.zijlstra, lenb, tglx, hpa, linux-tip-commits, Andi Kleen On Mon, Sep 20, 2010 at 1:11 PM, H. Peter Anvin <hpa@zytor.com> wrote: > On 09/20/2010 11:20 AM, H. Peter Anvin wrote: >> On 09/17/2010 05:48 PM, Venkatesh Pallipadi wrote: >>>> >>>> + wbinvd(); >>>> + >>>> while (1) { >>>> - mb(); >>>> - wbinvd(); >>>> __monitor(¤t_thread_info()->flags, 0, 0); >>>> mb(); >>> >>> >>> Just one observation. There are some CPUs with errata that need >>> clflush before monitor. So, if that CPU wakesup spuriously it may have >>> problem reentering idle. Not sure whether that will be a problem as >>> that errata also depended on read happening on the flag. May be its >>> better to do monitor (0, 0, 0). >>> >> >> It seems the easy way to deal with that would be to just add clflush >> before monitor... it is *probably* redundant, but it should be safe to >> do. It means depending on X86_FEATURE_CLFLUSH as well as >> X86_FEATURE_MWAIT, but I don't think there is any x86 processor which >> has MWAIT and not CLFLUSH, and I highly doubt there ever will be. >> >> Does anyone see any downside? >> > > Patch for review... anyone see any problem with it? > Looks good. We can also close one of the longest standing bugs in bugzilla.kernel.org now :-) https://bugzilla.kernel.org/show_bug.cgi?id=5471 Acked-by: Venkatesh Pallipadi <venki@google.com> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2010-09-20 22:39 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-22 23:19 [patch 0/2] x86: Make offline cpus to go to deepest idle state using mwait venkatesh.pallipadi 2009-05-22 23:19 ` [patch 1/2] x86: Add pm_play_dead funcptr to power-efficiently offline CPUs venkatesh.pallipadi 2009-05-23 10:44 ` Peter Zijlstra 2009-05-23 15:07 ` Pallipadi, Venkatesh 2009-06-22 17:25 ` Pallipadi, Venkatesh 2009-05-22 23:19 ` [patch 2/2] x86: put offline CPUs into deepest mwait cstate_subcstate venkatesh.pallipadi 2009-05-25 0:56 ` Shaohua Li 2009-05-26 21:17 ` Pallipadi, Venkatesh 2010-09-17 23:46 ` [tip:x86/idle] x86, hotplug: Use mwait to offline a processor, fix the legacy case tip-bot for H. Peter Anvin 2010-09-18 0:13 ` [tip:x86/idle] x86, hotplug: Move WBINVD back outside the play_dead loop tip-bot for H. Peter Anvin 2010-09-18 0:48 ` Venkatesh Pallipadi 2010-09-20 18:20 ` H. Peter Anvin 2010-09-20 20:11 ` H. Peter Anvin 2010-09-20 22:34 ` Venkatesh Pallipadi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox