* [PATCH] x86 idle: repair large-server 50-watt idle-power regression
@ 2013-12-18 21:44 Len Brown
2013-12-19 12:22 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: Len Brown @ 2013-12-18 21:44 UTC (permalink / raw)
To: x86; +Cc: linux-pm, linux-kernel, Len Brown, stable
From: Len Brown <len.brown@intel.com>
Linux 3.10 changed the timing of how thread_info->flags is touched:
x86: Use generic idle loop
(7d1a941731fabf27e5fb6edbebb79fe856edb4e5)
This caused Intel NHM-EX and WSM-EX servers to experience a large number
of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote
from deep C-states to shallow C-states, which caused these platforms
to experience a significant increase in idle power.
Note that this issue was already present before the commit above,
however, it wasn't seen often enough to be noticed in power measurements.
Here we extend an errata workaround from the Core2 EX "Dunnington"
to extend to NHM-EX and WSM-EX, to prevent these immediate
returns from MWAIT, reducing idle power on these platforms.
While only acpi_idle ran on Dunnington, intel_idle
may also run on these two newer systems.
As of today, there are no other models that are known
to need this tweak.
ref: https://lkml.org/lkml/2013/12/7/22
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x
---
arch/x86/kernel/cpu/intel.c | 3 ++-
drivers/idle/intel_idle.c | 3 +++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index dc1ec0d..ea04b34 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -387,7 +387,8 @@ static void init_intel(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_PEBS);
}
- if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush)
+ if (c->x86 == 6 && cpu_has_clflush &&
+ (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR);
#ifdef CONFIG_X86_64
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 92d1206..f80b700 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev,
if (!current_set_polling_and_test()) {
+ if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ clflush((void *)¤t_thread_info()->flags);
+
__monitor((void *)¤t_thread_info()->flags, 0, 0);
smp_mb();
if (!need_resched())
--
1.8.5.1.19.gdaad3aa
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-18 21:44 [PATCH] x86 idle: repair large-server 50-watt idle-power regression Len Brown
@ 2013-12-19 12:22 ` Ingo Molnar
2013-12-19 14:40 ` H. Peter Anvin
2013-12-19 15:55 ` H. Peter Anvin
0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 12:22 UTC (permalink / raw)
To: Len Brown
Cc: x86, linux-pm, linux-kernel, Len Brown, stable, Linus Torvalds,
H. Peter Anvin, Thomas Gleixner, Peter Zijlstra, Mike Galbraith,
Borislav Petkov
* Len Brown <lenb@kernel.org> wrote:
> From: Len Brown <len.brown@intel.com>
>
> Linux 3.10 changed the timing of how thread_info->flags is touched:
>
> x86: Use generic idle loop
> (7d1a941731fabf27e5fb6edbebb79fe856edb4e5)
>
> This caused Intel NHM-EX and WSM-EX servers to experience a large number
> of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote
> from deep C-states to shallow C-states, which caused these platforms
> to experience a significant increase in idle power.
>
> Note that this issue was already present before the commit above,
> however, it wasn't seen often enough to be noticed in power measurements.
>
> Here we extend an errata workaround from the Core2 EX "Dunnington"
> to extend to NHM-EX and WSM-EX, to prevent these immediate
> returns from MWAIT, reducing idle power on these platforms.
>
> While only acpi_idle ran on Dunnington, intel_idle
> may also run on these two newer systems.
> As of today, there are no other models that are known
> to need this tweak.
>
> ref: https://lkml.org/lkml/2013/12/7/22
> Signed-off-by: Len Brown <len.brown@intel.com>
> Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x
> ---
> arch/x86/kernel/cpu/intel.c | 3 ++-
> drivers/idle/intel_idle.c | 3 +++
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index dc1ec0d..ea04b34 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -387,7 +387,8 @@ static void init_intel(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_PEBS);
> }
>
> - if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush)
> + if (c->x86 == 6 && cpu_has_clflush &&
> + (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
> set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR);
>
> #ifdef CONFIG_X86_64
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 92d1206..f80b700 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev,
>
> if (!current_set_polling_and_test()) {
>
> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> + clflush((void *)¤t_thread_info()->flags);
> +
> __monitor((void *)¤t_thread_info()->flags, 0, 0);
I don't think either of these casts to '(void *)' is needed, both the
clflush() and __monitor() will take pointers.
Looks good to me otherwise - except that maybe the best way to
represent this quirk would be for the CLFLUSH+MONITOR sequence to be a
single 'instruction' which is patched in dynamically during bootup,
using our usual alternatives framework.
On non-affected CPUs a NOP would remain in place of the CLFLUSH,
eliminating the branch above.
So the whole thing could be thought of as a slightly more complex
'monitor' instruction - not exposing the quirk details to actual usage
sites.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 12:22 ` Ingo Molnar
@ 2013-12-19 14:40 ` H. Peter Anvin
2013-12-19 15:45 ` Borislav Petkov
2013-12-19 15:55 ` H. Peter Anvin
1 sibling, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 14:40 UTC (permalink / raw)
To: Ingo Molnar, Len Brown
Cc: x86, linux-pm, linux-kernel, Len Brown, stable, Linus Torvalds,
Thomas Gleixner, Peter Zijlstra, Mike Galbraith, Borislav Petkov
... or just use static_cpu_has() maybe?
Ingo Molnar <mingo@kernel.org> wrote:
>
>* Len Brown <lenb@kernel.org> wrote:
>
>> From: Len Brown <len.brown@intel.com>
>>
>> Linux 3.10 changed the timing of how thread_info->flags is touched:
>>
>> x86: Use generic idle loop
>> (7d1a941731fabf27e5fb6edbebb79fe856edb4e5)
>>
>> This caused Intel NHM-EX and WSM-EX servers to experience a large
>number
>> of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to
>demote
>> from deep C-states to shallow C-states, which caused these platforms
>> to experience a significant increase in idle power.
>>
>> Note that this issue was already present before the commit above,
>> however, it wasn't seen often enough to be noticed in power
>measurements.
>>
>> Here we extend an errata workaround from the Core2 EX "Dunnington"
>> to extend to NHM-EX and WSM-EX, to prevent these immediate
>> returns from MWAIT, reducing idle power on these platforms.
>>
>> While only acpi_idle ran on Dunnington, intel_idle
>> may also run on these two newer systems.
>> As of today, there are no other models that are known
>> to need this tweak.
>>
>> ref: https://lkml.org/lkml/2013/12/7/22
>> Signed-off-by: Len Brown <len.brown@intel.com>
>> Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x
>> ---
>> arch/x86/kernel/cpu/intel.c | 3 ++-
>> drivers/idle/intel_idle.c | 3 +++
>> 2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/intel.c
>b/arch/x86/kernel/cpu/intel.c
>> index dc1ec0d..ea04b34 100644
>> --- a/arch/x86/kernel/cpu/intel.c
>> +++ b/arch/x86/kernel/cpu/intel.c
>> @@ -387,7 +387,8 @@ static void init_intel(struct cpuinfo_x86 *c)
>> set_cpu_cap(c, X86_FEATURE_PEBS);
>> }
>>
>> - if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush)
>> + if (c->x86 == 6 && cpu_has_clflush &&
>> + (c->x86_model == 29 || c->x86_model == 46 || c->x86_model ==
>47))
>> set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR);
>>
>> #ifdef CONFIG_X86_64
>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>> index 92d1206..f80b700 100644
>> --- a/drivers/idle/intel_idle.c
>> +++ b/drivers/idle/intel_idle.c
>> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev,
>>
>> if (!current_set_polling_and_test()) {
>>
>> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
>> + clflush((void *)¤t_thread_info()->flags);
>> +
>> __monitor((void *)¤t_thread_info()->flags, 0, 0);
>
>I don't think either of these casts to '(void *)' is needed, both the
>clflush() and __monitor() will take pointers.
>
>Looks good to me otherwise - except that maybe the best way to
>represent this quirk would be for the CLFLUSH+MONITOR sequence to be a
>single 'instruction' which is patched in dynamically during bootup,
>using our usual alternatives framework.
>
>On non-affected CPUs a NOP would remain in place of the CLFLUSH,
>eliminating the branch above.
>
>So the whole thing could be thought of as a slightly more complex
>'monitor' instruction - not exposing the quirk details to actual usage
>sites.
>
>Thanks,
>
> Ingo
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 14:40 ` H. Peter Anvin
@ 2013-12-19 15:45 ` Borislav Petkov
0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2013-12-19 15:45 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Peter Zijlstra,
Mike Galbraith
On Thu, Dec 19, 2013 at 06:40:41AM -0800, H. Peter Anvin wrote:
> ... or just use static_cpu_has() maybe?
Right, if we can get the compiler to generate the shortest CLFLUSH of 3
bytes with register indirect addressing for the operand, then stomping
over those 3 bytes with the alternatives would be the fastest solution.
If, of course, three NOPs which get discarded in the front end are
cheaper than the near JMP from static_cpu_has.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 12:22 ` Ingo Molnar
2013-12-19 14:40 ` H. Peter Anvin
@ 2013-12-19 15:55 ` H. Peter Anvin
2013-12-19 16:02 ` Ingo Molnar
1 sibling, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 15:55 UTC (permalink / raw)
To: Ingo Molnar, Len Brown
Cc: x86, linux-pm, linux-kernel, Len Brown, stable, Linus Torvalds,
Thomas Gleixner, Peter Zijlstra, Mike Galbraith, Borislav Petkov
On 12/19/2013 04:22 AM, Ingo Molnar wrote:
>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>> index 92d1206..f80b700 100644
>> --- a/drivers/idle/intel_idle.c
>> +++ b/drivers/idle/intel_idle.c
>> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev,
>>
>> if (!current_set_polling_and_test()) {
>>
>> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
>> + clflush((void *)¤t_thread_info()->flags);
>> +
>> __monitor((void *)¤t_thread_info()->flags, 0, 0);
>
> I don't think either of these casts to '(void *)' is needed, both the
> clflush() and __monitor() will take pointers.
__monitor() currently doesn't, which is idiotic.
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 15:55 ` H. Peter Anvin
@ 2013-12-19 16:02 ` Ingo Molnar
2013-12-19 16:09 ` H. Peter Anvin
2013-12-19 16:13 ` H. Peter Anvin
0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 16:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Len Brown, x86, linux-pm, linux-kernel, Len Brown, stable,
Linus Torvalds, Thomas Gleixner, Peter Zijlstra, Mike Galbraith,
Borislav Petkov
* H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/19/2013 04:22 AM, Ingo Molnar wrote:
> >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> >> index 92d1206..f80b700 100644
> >> --- a/drivers/idle/intel_idle.c
> >> +++ b/drivers/idle/intel_idle.c
> >> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev,
> >>
> >> if (!current_set_polling_and_test()) {
> >>
> >> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> >> + clflush((void *)¤t_thread_info()->flags);
> >> +
> >> __monitor((void *)¤t_thread_info()->flags, 0, 0);
> >
> > I don't think either of these casts to '(void *)' is needed, both the
> > clflush() and __monitor() will take pointers.
>
> __monitor() currently doesn't, which is idiotic.
Hm, __monitor() seems to take a void *:
arch/x86/include/asm/processor.h:static inline void __monitor(const void *eax, unsigned long ecx,
So writing:
if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush(¤t_thread_info()->flags);
__monitor(¤t_thread_info()->flags, 0, 0);
ought to work just fine.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 16:02 ` Ingo Molnar
@ 2013-12-19 16:09 ` H. Peter Anvin
2013-12-19 16:13 ` H. Peter Anvin
1 sibling, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 16:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Len Brown, x86, linux-pm, linux-kernel, Len Brown, stable,
Linus Torvalds, Thomas Gleixner, Peter Zijlstra, Mike Galbraith,
Borislav Petkov
On 12/19/2013 08:02 AM, Ingo Molnar wrote:
>>
>> __monitor() currently doesn't, which is idiotic.
>
> Hm, __monitor() seems to take a void *:
>
> arch/x86/include/asm/processor.h:static inline void __monitor(const void *eax, unsigned long ecx,
>
> So writing:
>
> if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> clflush(¤t_thread_info()->flags);
>
> __monitor(¤t_thread_info()->flags, 0, 0);
>
Yes, brain failure on my part.
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 16:02 ` Ingo Molnar
2013-12-19 16:09 ` H. Peter Anvin
@ 2013-12-19 16:13 ` H. Peter Anvin
2013-12-19 16:21 ` Peter Zijlstra
1 sibling, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 16:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Len Brown, x86, linux-pm, linux-kernel, Len Brown, stable,
Linus Torvalds, Thomas Gleixner, Peter Zijlstra, Mike Galbraith,
Borislav Petkov
[-- Attachment #1: Type: text/plain, Size: 189 bytes --]
How does this look? Completely untested, of course.
I do wonder if we need more memory barriers, though.
An alternative would be to move everything into mwait_idle_with_hints().
-hpa
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 1626 bytes --]
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7b034a4057f9..6dce588f94b4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -723,6 +723,23 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
:: "a" (eax), "c" (ecx));
}
+/*
+ * Issue a clflush in preparation for a monitor instruction if the CPU
+ * needs it. We force the address into the ax register to get a fixed
+ * length for the instruction, however, this is what the monitor instruction
+ * is going to need anyway, so it shouldn't add any additional code.
+ */
+static inline void clflush_monitor(const void *addr, unsigned long ecx,
+ unsigned long edx)
+{
+ alternative_input(ASM_NOP3,
+ "clflush (%0)",
+ X86_FEATURE_CLFLUSH_MONITOR,
+ "a" (addr));
+ __monitor(addr, eax, edx);
+ smp_mb();
+}
+
extern void select_idle_routine(const struct cpuinfo_x86 *c);
extern void init_amd_e400_c1e_mask(void);
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index d2b7f27781bc..b14d02354134 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -163,11 +163,7 @@ EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
{
if (!need_resched()) {
- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
- clflush((void *)¤t_thread_info()->flags);
-
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
- smp_mb();
+ clflush_monitor(¤t_thread_info()->flags, 0, 0);
if (!need_resched())
__mwait(ax, cx);
}
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 16:13 ` H. Peter Anvin
@ 2013-12-19 16:21 ` Peter Zijlstra
2013-12-19 16:50 ` H. Peter Anvin
0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2013-12-19 16:21 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On Thu, Dec 19, 2013 at 08:13:21AM -0800, H. Peter Anvin wrote:
> How does this look? Completely untested, of course.
>
> I do wonder if we need more memory barriers, though.
>
> An alternative would be to move everything into mwait_idle_with_hints().
>
> -hpa
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 7b034a4057f9..6dce588f94b4 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -723,6 +723,23 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> :: "a" (eax), "c" (ecx));
> }
>
> +/*
> + * Issue a clflush in preparation for a monitor instruction if the CPU
> + * needs it. We force the address into the ax register to get a fixed
> + * length for the instruction, however, this is what the monitor instruction
> + * is going to need anyway, so it shouldn't add any additional code.
> + */
> +static inline void clflush_monitor(const void *addr, unsigned long ecx,
> + unsigned long edx)
> +{
> + alternative_input(ASM_NOP3,
> + "clflush (%0)",
> + X86_FEATURE_CLFLUSH_MONITOR,
> + "a" (addr));
> + __monitor(addr, eax, edx);
> + smp_mb();
> +}
What's that mb for?
Also, can you please merge:
http://marc.info/?l=linux-kernel&m=138685838420632
Thomas said he would pick up that series, but seems to have gone missing
the past week or so.
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 16:21 ` Peter Zijlstra
@ 2013-12-19 16:50 ` H. Peter Anvin
2013-12-19 17:07 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 16:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On 12/19/2013 08:21 AM, Peter Zijlstra wrote:
>
> What's that mb for?
>
It already exists in mwait_idle_with_hints(); I just moved it into this
common function. It is a bit odd, I have to admit; it seems like it
should be *before* the monitor (and possibly we should have one after
the CLFLUSH as well?)
> Also, can you please merge:
>
> http://marc.info/?l=linux-kernel&m=138685838420632
>
> Thomas said he would pick up that series, but seems to have gone missing
> the past week or so.
>
We need to get the immediate regression fixed.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 16:50 ` H. Peter Anvin
@ 2013-12-19 17:07 ` Ingo Molnar
2013-12-19 17:25 ` Peter Zijlstra
2013-12-19 18:09 ` H. Peter Anvin
0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 17:07 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
* H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/19/2013 08:21 AM, Peter Zijlstra wrote:
> >
> > What's that mb for?
> >
>
> It already exists in mwait_idle_with_hints(); I just moved it into
> this common function. It is a bit odd, I have to admit; it seems
> like it should be *before* the monitor (and possibly we should have
> one after the CLFLUSH as well?)
Yes, I think we need a barrier before the CLFLUSH, because according
to my reading of the Intel documentation CLFLUSH has no implicit
ordering so it might get reordered with the store to ->flags in
current_set_polling_and_test(), which might result in spurious wakeup
problems again.
(And CLFLUSH is a store in a sense, so special in that the regular
ordering for stores does not apply.)
Likewise, having a barrier before the MONITOR looks sensible as well.
Having it _after_ monitor looks weird and is probably wrong. [It might
have been the effects of someone seeing the spurious wakeup problems
with realizing the true source, or so.]
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 17:07 ` Ingo Molnar
@ 2013-12-19 17:25 ` Peter Zijlstra
2013-12-19 17:36 ` Peter Zijlstra
` (2 more replies)
2013-12-19 18:09 ` H. Peter Anvin
1 sibling, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2013-12-19 17:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: H. Peter Anvin, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote:
>
> * H. Peter Anvin <hpa@zytor.com> wrote:
>
> > On 12/19/2013 08:21 AM, Peter Zijlstra wrote:
> > >
> > > What's that mb for?
> > >
> >
> > It already exists in mwait_idle_with_hints(); I just moved it into
> > this common function. It is a bit odd, I have to admit; it seems
> > like it should be *before* the monitor (and possibly we should have
> > one after the CLFLUSH as well?)
>
> Yes, I think we need a barrier before the CLFLUSH, because according
> to my reading of the Intel documentation CLFLUSH has no implicit
> ordering so it might get reordered with the store to ->flags in
> current_set_polling_and_test(), which might result in spurious wakeup
> problems again.
No it cannot; since current_set_polling_and_test() already has a barrier
to prevent that.
Also, the location patched by hpa doesn't actually call that at all.
That said, I would find it very strange indeed if a CLFLUSH doesn't also
flush the store buffer.
> (And CLFLUSH is a store in a sense, so special in that the regular
> ordering for stores does not apply.)
>
> Likewise, having a barrier before the MONITOR looks sensible as well.
> Having it _after_ monitor looks weird and is probably wrong. [It might
> have been the effects of someone seeing the spurious wakeup problems
> with realizing the true source, or so.]
I again have to disagree, one would expect monitor to flush all that is
required to start the monitor -- and it actually does so. As is
testified by this extra CLFLUSH being called a bug workaround.
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 17:25 ` Peter Zijlstra
@ 2013-12-19 17:36 ` Peter Zijlstra
2013-12-19 18:05 ` H. Peter Anvin
2013-12-19 17:50 ` Peter Zijlstra
2013-12-19 18:10 ` Ingo Molnar
2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2013-12-19 17:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: H. Peter Anvin, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote:
> That said, I would find it very strange indeed if a CLFLUSH doesn't also
> flush the store buffer.
OK, it explicitly states it does not do that and you indeed need an
mfence before the clflush.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 17:36 ` Peter Zijlstra
@ 2013-12-19 18:05 ` H. Peter Anvin
2013-12-19 18:14 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 18:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Len Brown, x86, linux-pm, linux-kernel, Len Brown, stable,
Linus Torvalds, Thomas Gleixner, Mike Galbraith, Borislav Petkov
On 12/19/2013 09:36 AM, Peter Zijlstra wrote:
> On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote:
>> That said, I would find it very strange indeed if a CLFLUSH doesn't also
>> flush the store buffer.
>
> OK, it explicitly states it does not do that and you indeed need an
> mfence before the clflush.
>
So, MONITOR is defined to be ordered as a load, which I think should be
adequate, but I really wonder if we should have mfence on both sides of
clflush. This now is up to 9 bytes, and perhaps pushing it a bit with
how much we would be willing to patch out.
On the other hand - the CLFLUSH seems to have worked well enough by
itself, and this is all probabilistic anyway, so perhaps we should just
leave the naked CLFLUSH in and not worry about it unless measurements
say otherwise?
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 18:05 ` H. Peter Anvin
@ 2013-12-19 18:14 ` Ingo Molnar
0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 18:14 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
* H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/19/2013 09:36 AM, Peter Zijlstra wrote:
> > On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote:
> >> That said, I would find it very strange indeed if a CLFLUSH doesn't also
> >> flush the store buffer.
> >
> > OK, it explicitly states it does not do that and you indeed need
> > an mfence before the clflush.
>
> So, MONITOR is defined to be ordered as a load, which I think should
> be adequate, but I really wonder if we should have mfence on both
> sides of clflush. This now is up to 9 bytes, and perhaps pushing it
> a bit with how much we would be willing to patch out.
>
> On the other hand - the CLFLUSH seems to have worked well enough by
> itself, and this is all probabilistic anyway, so perhaps we should
> just leave the naked CLFLUSH in and not worry about it unless
> measurements say otherwise?
So I think the window of breakage was rather large here, and since it
seems to trigger on rare types of hardware I think we'd be better off
by erring on the side of robustness this time around ...
This is the 'go to idle' path, which isn't as time-critical as the
'get out of idle' code path.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 17:25 ` Peter Zijlstra
2013-12-19 17:36 ` Peter Zijlstra
@ 2013-12-19 17:50 ` Peter Zijlstra
2013-12-19 18:18 ` Ingo Molnar
2013-12-19 18:10 ` Ingo Molnar
2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2013-12-19 17:50 UTC (permalink / raw)
To: Ingo Molnar
Cc: H. Peter Anvin, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote:
> > Likewise, having a barrier before the MONITOR looks sensible as well.
>
> I again have to disagree, one would expect monitor to flush all that is
> required to start the monitor -- and it actually does so. As is
> testified by this extra CLFLUSH being called a bug workaround.
SDM states that MONITOR is ordered like a LOAD, and a LOAD cannot pass
a previous STORE to the same address.
That said; there's enough holes in there to swim a titanic through,
seeing how MONITOR stares at an entire cacheline and LOAD/STORE order is
specified on location, whatever that means.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 17:50 ` Peter Zijlstra
@ 2013-12-19 18:18 ` Ingo Molnar
2013-12-19 21:05 ` H. Peter Anvin
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 18:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote:
> > > Likewise, having a barrier before the MONITOR looks sensible as well.
> >
> > I again have to disagree, one would expect monitor to flush all that is
> > required to start the monitor -- and it actually does so. As is
> > testified by this extra CLFLUSH being called a bug workaround.
>
> SDM states that MONITOR is ordered like a LOAD, and a LOAD cannot
> pass a previous STORE to the same address.
Yes ... but you could argue that CLFLUSH is neither a load nor a
store, it's a _cache sync_ operation, with its special ordering
properties.
> That said; there's enough holes in there to swim a titanic through,
> seeing how MONITOR stares at an entire cacheline and LOAD/STORE
> order is specified on location, whatever that means.
I think assuming that MONITOR is ordered as a load or better is a
pretty safe one (and in fact the Intel documentation seems to say so)
- I'd say MONITOR is in micro-code and essentially snoops on cache
events on that specific cache line, and loads the cache line on a
snoop hit?
Btw., what state is the cache line after a MONITOR instruction, is it
loaded as shared, or as excusive? (exclusive would probably be better
for performance.)
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 18:18 ` Ingo Molnar
@ 2013-12-19 21:05 ` H. Peter Anvin
2013-12-19 21:17 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 21:05 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: Len Brown, x86, linux-pm, linux-kernel, Len Brown, stable,
Linus Torvalds, Thomas Gleixner, Mike Galbraith, Borislav Petkov
CLFLUSH is only (guaranteed to be) ordered with respect to MFENCE according to the SDM.
Ingo Molnar <mingo@kernel.org> wrote:
>
>* Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote:
>> > On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote:
>> > > Likewise, having a barrier before the MONITOR looks sensible as
>well.
>> >
>> > I again have to disagree, one would expect monitor to flush all
>that is
>> > required to start the monitor -- and it actually does so. As is
>> > testified by this extra CLFLUSH being called a bug workaround.
>>
>> SDM states that MONITOR is ordered like a LOAD, and a LOAD cannot
>> pass a previous STORE to the same address.
>
>Yes ... but you could argue that CLFLUSH is neither a load nor a
>store, it's a _cache sync_ operation, with its special ordering
>properties.
>
>> That said; there's enough holes in there to swim a titanic through,
>> seeing how MONITOR stares at an entire cacheline and LOAD/STORE
>> order is specified on location, whatever that means.
>
>I think assuming that MONITOR is ordered as a load or better is a
>pretty safe one (and in fact the Intel documentation seems to say so)
>- I'd say MONITOR is in micro-code and essentially snoops on cache
>events on that specific cache line, and loads the cache line on a
>snoop hit?
>
>Btw., what state is the cache line after a MONITOR instruction, is it
>loaded as shared, or as excusive? (exclusive would probably be better
>for performance.)
>
>Thanks,
>
> Ingo
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 21:05 ` H. Peter Anvin
@ 2013-12-19 21:17 ` Ingo Molnar
0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 21:17 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
* H. Peter Anvin <hpa@zytor.com> wrote:
> CLFLUSH is only (guaranteed to be) ordered with respect to MFENCE
> according to the SDM.
Yes, I quoted the relevant text a couple of hours ago, CLFLUSH is
pretty special, so it's not ordered against atomics or serializing
instructions for example - only ordered against explict MFENCE calls,
which on x86 is mb()/smp_mb().
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 17:25 ` Peter Zijlstra
2013-12-19 17:36 ` Peter Zijlstra
2013-12-19 17:50 ` Peter Zijlstra
@ 2013-12-19 18:10 ` Ingo Molnar
2 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 18:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote:
> >
> > * H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > > On 12/19/2013 08:21 AM, Peter Zijlstra wrote:
> > > >
> > > > What's that mb for?
> > > >
> > >
> > > It already exists in mwait_idle_with_hints(); I just moved it into
> > > this common function. It is a bit odd, I have to admit; it seems
> > > like it should be *before* the monitor (and possibly we should have
> > > one after the CLFLUSH as well?)
> >
> > Yes, I think we need a barrier before the CLFLUSH, because according
> > to my reading of the Intel documentation CLFLUSH has no implicit
> > ordering so it might get reordered with the store to ->flags in
> > current_set_polling_and_test(), which might result in spurious wakeup
> > problems again.
>
> No it cannot; since current_set_polling_and_test() already has a
> barrier to prevent that.
See below:
> Also, the location patched by hpa doesn't actually call that at all.
>
> That said, I would find it very strange indeed if a CLFLUSH doesn't
> also flush the store buffer.
So, the Intel documentation says (sorry about the lazy-link):
http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc31.htm
"CLFLUSH is only ordered by the MFENCE instruction. It is not
guaranteed to be ordered by any other fencing, serializing or other
CLFLUSH instruction. For example, software can use an MFENCE
instruction to insure that previous stores are included in the
write-back."
So a specific MFENCE barrier is needed.
Also note that this wording excludes implicit serialization such as
LOCK prefix or XCHG barriers. As it happens
current_set_polling_and_test() uses smp_mb(), which happens to map to
MFENCE on all CPUs that can do CLFLUSH, but that's really just an
accident and in no way engineered.
_At minimum_ we need a prominent comment at the clflush usage site
that we rely on the MFENCE in current_set_polling_and_test() ...
> > (And CLFLUSH is a store in a sense, so special in that the regular
> > ordering for stores does not apply.)
> >
> > Likewise, having a barrier before the MONITOR looks sensible as
> > well. Having it _after_ monitor looks weird and is probably wrong.
> > [It might have been the effects of someone seeing the spurious
> > wakeup problems with realizing the true source, or so.]
>
> I again have to disagree, one would expect monitor to flush all that
> is required to start the monitor -- and it actually does so. As is
> testified by this extra CLFLUSH being called a bug workaround.
This assumption would be safer - although AFAICS the Intel
MONITOR/MWAIT documentation is quiet about this aspect.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 17:07 ` Ingo Molnar
2013-12-19 17:25 ` Peter Zijlstra
@ 2013-12-19 18:09 ` H. Peter Anvin
2013-12-19 18:19 ` H. Peter Anvin
2013-12-19 18:19 ` Ingo Molnar
1 sibling, 2 replies; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 18:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On 12/19/2013 09:07 AM, Ingo Molnar wrote:
>
> Likewise, having a barrier before the MONITOR looks sensible as well.
> Having it _after_ monitor looks weird and is probably wrong. [It might
> have been the effects of someone seeing the spurious wakeup problems
> with realizing the true source, or so.]
>
Does anyone know the history of this barrier after the monitor? I know
Len is looking for a minimal patchset that can go into -stable, and it
seems prudent to not preturb the code more than necessary, but going
forward it would be nice to know...
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 18:09 ` H. Peter Anvin
@ 2013-12-19 18:19 ` H. Peter Anvin
2013-12-19 18:23 ` Ingo Molnar
2013-12-19 18:19 ` Ingo Molnar
1 sibling, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 18:19 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On 12/19/2013 10:09 AM, H. Peter Anvin wrote:
> On 12/19/2013 09:07 AM, Ingo Molnar wrote:
>>
>> Likewise, having a barrier before the MONITOR looks sensible as well.
>> Having it _after_ monitor looks weird and is probably wrong. [It might
>> have been the effects of someone seeing the spurious wakeup problems
>> with realizing the true source, or so.]
>>
>
> Does anyone know the history of this barrier after the monitor? I know
> Len is looking for a minimal patchset that can go into -stable, and it
> seems prudent to not preturb the code more than necessary, but going
> forward it would be nice to know...
>
Hmm... it *looks* like it is intended to be part of the construct:
smp_mb();
if (!need_resched())
...
I found a note in the HLT variant of the function saying:
/*
* TS_POLLING-cleared state must be visible before we
* test NEED_RESCHED:
*/
... which presumably has been copied elsewhere.
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 18:19 ` H. Peter Anvin
@ 2013-12-19 18:23 ` Ingo Molnar
[not found] ` <CA+55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com>
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 18:23 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
* H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/19/2013 10:09 AM, H. Peter Anvin wrote:
> > On 12/19/2013 09:07 AM, Ingo Molnar wrote:
> >>
> >> Likewise, having a barrier before the MONITOR looks sensible as well.
> >> Having it _after_ monitor looks weird and is probably wrong. [It might
> >> have been the effects of someone seeing the spurious wakeup problems
> >> with realizing the true source, or so.]
> >>
> >
> > Does anyone know the history of this barrier after the monitor? I know
> > Len is looking for a minimal patchset that can go into -stable, and it
> > seems prudent to not preturb the code more than necessary, but going
> > forward it would be nice to know...
> >
>
> Hmm... it *looks* like it is intended to be part of the construct:
>
> smp_mb();
> if (!need_resched())
> ...
>
> I found a note in the HLT variant of the function saying:
>
> /*
> * TS_POLLING-cleared state must be visible before we
> * test NEED_RESCHED:
> */
Yes, that makes sense: the need_resched test is a load, and MONITOR is
a load as well. Can the two ever cross, or does the CPU guarantee that
because it's the same address, the loads don't cross?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 18:09 ` H. Peter Anvin
2013-12-19 18:19 ` H. Peter Anvin
@ 2013-12-19 18:19 ` Ingo Molnar
2013-12-19 19:22 ` H. Peter Anvin
1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2013-12-19 18:19 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
* H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/19/2013 09:07 AM, Ingo Molnar wrote:
> >
> > Likewise, having a barrier before the MONITOR looks sensible as
> > well. Having it _after_ monitor looks weird and is probably wrong.
> > [It might have been the effects of someone seeing the spurious
> > wakeup problems with realizing the true source, or so.]
>
> Does anyone know the history of this barrier after the monitor? I
> know Len is looking for a minimal patchset that can go into -stable,
> and it seems prudent to not preturb the code more than necessary,
> but going forward it would be nice to know...
For the minimal fix I don't think we should change it - but for v3.14
it looks like a speedup for the from-idle codepath, which is
performance sensitive.
( It would also be nice to know whether MONITOR loads that cacheline
into the CPUs cache, and in what state it loads it. )
Thanks,
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 18:19 ` Ingo Molnar
@ 2013-12-19 19:22 ` H. Peter Anvin
2013-12-19 19:27 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2013-12-19 19:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On 12/19/2013 10:19 AM, Ingo Molnar wrote:
>
> ( It would also be nice to know whether MONITOR loads that cacheline
> into the CPUs cache, and in what state it loads it. )
>
I would assume that is implementation-dependent. However, one plausible
implementation is to load the cache line into the cache in shared state
and monitor for evictions.
-hpa
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
2013-12-19 19:22 ` H. Peter Anvin
@ 2013-12-19 19:27 ` Peter Zijlstra
0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2013-12-19 19:27 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Len Brown, x86, linux-pm, linux-kernel, Len Brown,
stable, Linus Torvalds, Thomas Gleixner, Mike Galbraith,
Borislav Petkov
On Thu, Dec 19, 2013 at 11:22:01AM -0800, H. Peter Anvin wrote:
> On 12/19/2013 10:19 AM, Ingo Molnar wrote:
> >
> > ( It would also be nice to know whether MONITOR loads that cacheline
> > into the CPUs cache, and in what state it loads it. )
> >
>
> I would assume that is implementation-dependent. However, one plausible
> implementation is to load the cache line into the cache in shared state
> and monitor for evictions.
I suppose the monitor part is the important part because certain C
states drop all cache, presumably including the cacheline we're actually
monitoring.
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2013-12-19 21:17 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-18 21:44 [PATCH] x86 idle: repair large-server 50-watt idle-power regression Len Brown
2013-12-19 12:22 ` Ingo Molnar
2013-12-19 14:40 ` H. Peter Anvin
2013-12-19 15:45 ` Borislav Petkov
2013-12-19 15:55 ` H. Peter Anvin
2013-12-19 16:02 ` Ingo Molnar
2013-12-19 16:09 ` H. Peter Anvin
2013-12-19 16:13 ` H. Peter Anvin
2013-12-19 16:21 ` Peter Zijlstra
2013-12-19 16:50 ` H. Peter Anvin
2013-12-19 17:07 ` Ingo Molnar
2013-12-19 17:25 ` Peter Zijlstra
2013-12-19 17:36 ` Peter Zijlstra
2013-12-19 18:05 ` H. Peter Anvin
2013-12-19 18:14 ` Ingo Molnar
2013-12-19 17:50 ` Peter Zijlstra
2013-12-19 18:18 ` Ingo Molnar
2013-12-19 21:05 ` H. Peter Anvin
2013-12-19 21:17 ` Ingo Molnar
2013-12-19 18:10 ` Ingo Molnar
2013-12-19 18:09 ` H. Peter Anvin
2013-12-19 18:19 ` H. Peter Anvin
2013-12-19 18:23 ` Ingo Molnar
[not found] ` <CA+55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com>
2013-12-19 18:43 ` Ingo Molnar
2013-12-19 18:19 ` Ingo Molnar
2013-12-19 19:22 ` H. Peter Anvin
2013-12-19 19:27 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox