* [BISECTED] Lots of "rescheduling IPIs" in powertop
@ 2008-05-13 20:42 Vegard Nossum
2008-05-13 20:54 ` Andi Kleen
2008-05-14 6:56 ` Ingo Molnar
0 siblings, 2 replies; 15+ messages in thread
From: Vegard Nossum @ 2008-05-13 20:42 UTC
To: Andi Kleen, Ingo Molnar, Thomas Gleixner
Cc: Andreas Herrmann, S.Çağlar Onur, Valdis.Kletnieks,
Matt Mackall, linux-kernel
Hi,
Recap: powertop shows between 200-400 wakeups/second with the description
"<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
I need to run two busy-loops on my 2-CPU system for this to show up).
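The trigger is just every CPU kept busy. Below is a minimal load generator in C. It assumes a POSIX system, and `spin_children()` is a hypothetical helper, because the thread does not show the busy-loops that were actually used. It starts one forked process per CPU rather than one thread per CPU:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in seconds, used to bound each busy loop. */
static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Fork `nproc` children that each spin for `seconds` without sleeping,
 * then reap them all. Returns how many children exited cleanly; a value
 * below `nproc` means a fork() failed or a child died abnormally.
 */
static int spin_children(int nproc, double seconds)
{
	int ok = 0, status;

	assert(nproc > 0 && seconds >= 0.0);
	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;		/* stop forking, but still reap below */
		if (pid == 0) {
			/* Child: busy loop until the deadline, then exit. */
			double end = now_seconds() + seconds;

			while (now_seconds() < end)
				;	/* spin */
			_exit(0);
		}
	}
	/* Parent: wait() returns -1 once no children are left. */
	while (wait(&status) > 0)
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	return ok;
}
```

For example, calling `spin_children(2, 60.0)` from `main()` on a 2-CPU machine keeps both CPUs busy for a minute while powertop samples wakeups.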
The bisect resulted in this commit:
commit 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec
Author: Andi Kleen <ak@suse.de>
Date: Wed Jan 30 13:33:16 2008 +0100
x86: use the correct cpuid method to detect MWAIT support for C states
Previously there was an AMD-specific quirk to handle the case of
AMD Fam10h MWAIT not supporting any C states. But it turns out
that CPUID already has ways to detect that without
using special quirks.
The new code simply checks if MWAIT supports at least C1 and doesn't
use it if it doesn't. No more vendor-specific code.
Note this does not simply clear MWAIT because MWAIT can still be
useful even without C states.
Credit goes to Ben Serebrin for pointing out the (nearly) obvious.
Cc: "Andreas Herrmann" <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I was not able to revert this without conflicts, so I will leave that part
to you. Here is the bisect log in case it is useful:
git-bisect start 'v2.6.26-rc1' 'v2.6.24'
git-bisect bad 3f85d63ea4ff922f6abdb509f4aaf6993b3273a3
git-bisect bad f77bc6a420eba845605ff1d53cadf55f94c5e8b7
git-bisect good c76c04758b8fd24a1c38b19742e3437e954e945b
git-bisect good 0ba6c33bcddc64a54b5f1c25a696c4767dc76292
git-bisect good fde1b3fa947c2512e3715962ebb1d3a6a9b9bb7d
git-bisect bad d690b2afd5a7a02816386aa704c8c0b1aca8d2de
git-bisect good 773221f46f82dc2f277dacc331d9d2ef2c690cb6
git-bisect bad f8f76481bc2803aea03ff213c7e1405b53f7e488
git-bisect bad 9042219cd8d43b81322b826d463dd6e52acae6cf
git-bisect bad ca74a6f84e68b44867022f4a4f3ec17c087c864e
git-bisect good 064954761254ef17fae2b84fb5a034d48a769143
git-bisect good 30d432dfab2bcfd021d352e2058fae6b9405caeb
git-bisect bad 27cc2a812eb504f4aadff5baa862da715fb0f886
git-bisect bad 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec
This initially showed up on my 2.6.24.5-85.fc8 (Fedora) kernel.
I have two different systems (one laptop, one desktop) which both show the
same symptoms; please do tell me if you need any more info.
Vegard
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-13 20:42 [BISECTED] Lots of "rescheduling IPIs" in powertop Vegard Nossum
@ 2008-05-13 20:54 ` Andi Kleen
2008-05-13 21:09 ` Vegard Nossum
2008-05-14 6:56 ` Ingo Molnar
1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-05-13 20:54 UTC
To: Vegard Nossum
Cc: Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
Vegard Nossum <vegard.nossum@gmail.com> writes:
>
> This initially showed up on my 2.6.24.5-85.fc8 (Fedora) kernel.
>
> I have two different systems (one laptop, one desktop) which both show the
> same symptoms; please do tell me if you need any more info.
What CPUs do they have? And does the problem really go away when
you boot with idle=mwait? And what does the following program
output?
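#include <stdio.h>
#include <cpuid.h>	/* GCC/Clang: __get_cpuid() */
int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))
		return 1;	/* CPUID leaf 5 not supported */
	printf("%x\n", edx);	/* CPUID.5:EDX, MWAIT C-state info */
	return 0;
}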
-Andi
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-13 20:54 ` Andi Kleen
@ 2008-05-13 21:09 ` Vegard Nossum
2008-05-13 21:19 ` Andi Kleen
0 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2008-05-13 21:09 UTC
To: Andi Kleen
Cc: Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
On Tue, May 13, 2008 at 10:54 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Vegard Nossum <vegard.nossum@gmail.com> writes:
> >
> > This initially showed up on my 2.6.24.5-85.fc8 (Fedora) kernel.
> >
> > I have two different systems (one laptop, one desktop) which both show the
> > same symptoms; please do tell me if you need any more info.
>
> What CPUs do they have? And does the problem really go away when
> you boot with idle=mwait? And what does the following program
> output?
Yes, the problem goes away with idle=mwait.
The desktop is a P4:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 5
cpu MHz : 2992.624
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx
lm constant_tsc pebs bts pni monitor ds_cpl est tm2 cid cx16 xtpr
lahf_lm
bogomips : 5990.81
clflush size : 64
(similar for processor 1)
# msr
0
The laptop is a Pentium Dual-Core:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Pentium(R) Dual CPU T2310 @ 1.46GHz
stepping : 13
cpu MHz : 800.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3
cx16 xtpr lahf_lm
bogomips : 2930.23
clflush size : 64
(similar for processor 1)
# ./msr
1110
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-13 21:09 ` Vegard Nossum
@ 2008-05-13 21:19 ` Andi Kleen
2008-05-14 4:02 ` Arjan van de Ven
2008-05-14 6:00 ` Vegard Nossum
0 siblings, 2 replies; 15+ messages in thread
From: Andi Kleen @ 2008-05-13 21:19 UTC
To: Vegard Nossum
Cc: Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
"S.Çağlar Onur", Valdis.Kletnieks, Matt Mackall,
linux-kernel, arjan
> The desktop is a P4:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 6
> model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
> stepping : 5
> cpu MHz : 2992.624
> cache size : 2048 KB
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx
> lm constant_tsc pebs bts pni monitor ds_cpl est tm2 cid cx16 xtpr
> lahf_lm
> bogomips : 5990.81
> clflush size : 64
>
> (similar for processor 1)
>
> # msr
> 0
Ok the CPU reports it doesn't support any C states in MWAIT. If that is
correct then it would be correct to not use MWAIT idle and might
actually save more power to not use it.
I don't know if that's true or not. Do you have a power meter perhaps?
If yes, can you measure if there's a difference between idle=mwait /
default on your box when it is idle?
[cc Arjan, he might know if that CPU is supposed to support C1 in MWAIT]
>
> The laptop is a Pentium Dual-Core:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 15
> model name : Intel(R) Pentium(R) Dual CPU T2310 @ 1.46GHz
> stepping : 13
> cpu MHz : 800.000
> cache size : 1024 KB
...
> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
> constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3
> cx16 xtpr lahf_lm
> bogomips : 2930.23
> clflush size : 64
>
> (similar for processor 1)
>
> # ./msr
> 1110
CPU reports it supports C1/C2/C3. Are you sure there is a difference on
that box? The code should have kept using MWAIT because it checks C1.
Please double check.
-Andi
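The test program prints EDX of CPUID leaf 5 in hex. Here is a minimal decoding sketch. It assumes the layout Intel documents for that leaf, where each 4-bit field n of EDX gives the number of MWAIT sub-states for C-state Cn. The function names are hypothetical.

Applied to the two outputs above, 0 on the P4 advertises no C-state at all. The 1110 on the T2310 advertises one sub-state each for C1, C2 and C3, which matches Andi's reading.

```c
#include <assert.h>

/*
 * CPUID.05H:EDX packs one 4-bit field per C-state: bits 3:0 for C0,
 * bits 7:4 for C1, bits 11:8 for C2, and so on. Each field is the
 * number of MWAIT sub-states supported for that C-state.
 */
static unsigned int mwait_substates(unsigned int edx, unsigned int cstate)
{
	assert(cstate < 8);	/* 8 nibbles in a 32-bit register */
	return (edx >> (4 * cstate)) & 0xf;
}

/* Mirrors the C1 test in mwait_usable(): is any C1 sub-state listed? */
static int mwait_has_c1(unsigned int edx)
{
	return mwait_substates(edx, 1) > 0;
}
```

So `mwait_has_c1(0x0)` is false for the P4 and `mwait_has_c1(0x1110)` is true for the laptop.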
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-13 21:19 ` Andi Kleen
@ 2008-05-14 4:02 ` Arjan van de Ven
2008-05-14 6:58 ` Andi Kleen
2008-05-14 6:00 ` Vegard Nossum
1 sibling, 1 reply; 15+ messages in thread
From: Arjan van de Ven @ 2008-05-14 4:02 UTC
To: Andi Kleen
Cc: Vegard Nossum, Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
On Tue, 13 May 2008 23:19:47 +0200
Andi Kleen <andi@firstfloor.org> wrote:
>
> Ok the CPU reports it doesn't support any C states in MWAIT. If that
> is correct then it would be correct to not use MWAIT idle and might
> actually save more power to not use it.
what does the current SVN powertop say on this cpu?
>
> I don't know if that's true or not. Do you have a power meter perhaps?
> If yes, can you measure if there's a difference between idle=mwait /
> default on your box when it is idle?
>
> [cc Arjan, he might know if that CPU is supposed to support C1 in MWAIT]
I wasn't aware that P4s supported mwait in this way; I thought it was
Core and later.
> CPU reports it supports C1/C2/C3. Are you sure there is a difference
> on that box? The code should have kept using MWAIT because it checks
> C1. Please double check.
The check is .. dubious I suspect... because the cpuid bits are not
actually the prime source of information, the BIOS is.
If the bios says mwait is usable, we need to use it with the values IT
gives us.
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 4:02 ` Arjan van de Ven
@ 2008-05-14 6:58 ` Andi Kleen
2008-05-14 13:55 ` Arjan van de Ven
0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-05-14 6:58 UTC
To: Arjan van de Ven
Cc: Vegard Nossum, Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
"S.Çağlar Onur", Valdis.Kletnieks, Matt Mackall,
linux-kernel, lenb
Arjan van de Ven wrote:
[cc Len]
> On Tue, 13 May 2008 23:19:47 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
>
>> Ok the CPU reports it doesn't support any C states in MWAIT. If that
>> is correct then it would be correct to not use MWAIT idle and might
>> actually save more power to not use it.
>
> what does the current SVN powertop say on this cpu?
>
>> I don't know if that's true or not. Do you have a power meter perhaps?
>> If yes, can you measure if there's a difference between idle=mwait /
>> default on your box when it is idle?
>>
>> [cc Arjan, he might know if that CPU is supposed to support C1 in MWAIT]
>
> I wasn't aware that P4s supported mwait in this way; I thought it was
> Core and later.
Not even C1? I generally consider MWAIT without C1 to be unusable.
Anyway, if C1 doesn't work, then it would be correct not to use MWAIT.
>
>> CPU reports it supports C1/C2/C3. Are you sure there is a difference
>> on that box? The code should have kept using MWAIT because it checks
>> C1. Please double check.
>
> The check is .. dubious I suspect...
I don't think so.
> because the cpuid bits are not
> actually the prime source of information, the BIOS is.
Hmmm? What BIOS information are you referring to?
Normally it's my experience that CPUID is more reliable than the BIOS.
> If the bios says mwait is usable, we need to use it with the values IT
> gives us.
At least to my knowledge the ACPI FADT just says what C states are
available, not if they are implemented with MWAIT or using IO ports.
-Andi
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 6:58 ` Andi Kleen
@ 2008-05-14 13:55 ` Arjan van de Ven
2008-05-14 15:21 ` Andi Kleen
0 siblings, 1 reply; 15+ messages in thread
From: Arjan van de Ven @ 2008-05-14 13:55 UTC
To: Andi Kleen
Cc: Vegard Nossum, Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall, linux-kernel,
lenb
On Wed, 14 May 2008 08:58:05 +0200
> > If the bios says mwait is usable, we need to use it with the values
> > IT gives us.
>
> At least to my knowledge the ACPI FADT just says what C states are
> available, not if they are implemented with MWAIT or using IO ports.
_CST does; which is what the mwait C-state selection code uses.
(and we've seen quirky systems where C1 mwait didn't work, SMM did
something wrong or something, because the other OS doesn't seem to use
it for that, so making it the default is not really an option)
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 13:55 ` Arjan van de Ven
@ 2008-05-14 15:21 ` Andi Kleen
0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2008-05-14 15:21 UTC
To: Arjan van de Ven
Cc: Vegard Nossum, Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
caglar, Valdis.Kletnieks, Matt Mackall, linux-kernel, lenb
Arjan van de Ven wrote:
> On Wed, 14 May 2008 08:58:05 +0200
>>> If the bios says mwait is usable, we need to use it with the values
>>> IT gives us.
>> At least to my knowledge the ACPI FADT just says what C states are
>> available, not if they are implemented with MWAIT or using IO ports.
>
> _CST does; which is what the mwait C-state selection code uses.
>
> (and we've seen quirky systems where C1 mwait didn't work, SMM did
> something wrong or something,
What do you mean by "didn't work"? And they did not report it in CPUID?
And are they widespread? I haven't heard of any such problems before.
Even the P4 problem is not really that serious btw.
> because the other OS doesn't seem to use
> it for that, so making it the default is not really an option)
But it has been the default for practically forever. It is strange
to suddenly declare what has been Linux's default behavior for years
"not really an option".
I think it's a sane default. And MWAIT is a valuable optimization.
I don't think we should stop using sanely architected interfaces at the first
BIOS quirk someone reports. I'm sure for nearly every feature there's
some broken BIOS out there that gets it wrong somehow. If we did this in
general soon we couldn't use anything.
-Andi
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-13 21:19 ` Andi Kleen
2008-05-14 4:02 ` Arjan van de Ven
@ 2008-05-14 6:00 ` Vegard Nossum
1 sibling, 0 replies; 15+ messages in thread
From: Vegard Nossum @ 2008-05-14 6:00 UTC
To: Andi Kleen
Cc: Ingo Molnar, Thomas Gleixner, Andreas Herrmann,
"S.Çağlar Onur", Valdis.Kletnieks, Matt Mackall,
linux-kernel, arjan
On Tue, May 13, 2008 at 11:19 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
> > The desktop is a P4:
> >
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 15
> > model : 6
> > model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
> > stepping : 5
> > cpu MHz : 2992.624
> > cache size : 2048 KB
>
>
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> > mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx
> > lm constant_tsc pebs bts pni monitor ds_cpl est tm2 cid cx16 xtpr
> > lahf_lm
> > bogomips : 5990.81
> > clflush size : 64
> >
> > (similar for processor 1)
> >
> > # msr
> > 0
>
> Ok the CPU reports it doesn't support any C states in MWAIT. If that is
> correct then it would be correct to not use MWAIT idle and might
> actually save more power to not use it.
>
> I don't know if that's true or not. Do you have a power meter perhaps?
> If yes, can you measure if there's a difference between idle=mwait /
> default on your box when it is idle?
>
> [cc Arjan, he might know if that CPU is supposed to support C1 in MWAIT]
No, sorry, no power meter :-/
> > The laptop is a Pentium Dual-Core:
> >
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 15
> > model name : Intel(R) Pentium(R) Dual CPU T2310 @ 1.46GHz
> > stepping : 13
> > cpu MHz : 800.000
> > cache size : 1024 KB
> ...
>
> > flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
> > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
> > constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3
> > cx16 xtpr lahf_lm
> > bogomips : 2930.23
> > clflush size : 64
> >
> > (similar for processor 1)
> >
> > # ./msr
> > 1110
>
> CPU reports it supports C1/C2/C3. Are you sure there is a difference on
> that box? The code should have kept using MWAIT because it checks C1.
> Please double check.
Yes, sorry, you are correct. I tested idle=mwait only on the
desktop machine (P4, msr = 0), and it improved the IPI problem. (I
even rechecked right now, and it really does.)
Now I tested it on the laptop as well, and here it makes no difference.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-13 20:42 [BISECTED] Lots of "rescheduling IPIs" in powertop Vegard Nossum
2008-05-13 20:54 ` Andi Kleen
@ 2008-05-14 6:56 ` Ingo Molnar
2008-05-14 7:11 ` Vegard Nossum
2008-05-14 9:09 ` Andreas Herrmann
1 sibling, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-05-14 6:56 UTC
To: Vegard Nossum
Cc: Andi Kleen, Thomas Gleixner, Andreas Herrmann,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
* Vegard Nossum <vegard.nossum@gmail.com> wrote:
> Hi,
>
> Recap: powertop shows between 200-400 wakeups/second with the
> description "<kernel IPI>: Rescheduling interrupts" when all
> processors have load (e.g. I need to run two busy-loops on my 2-CPU
> system for this to show up).
ok, could you try the fix below? It was a mistake to make mwait use
dependent on power considerations - on a desktop CPU it is unlikely to
use more power than a simple HLT - and the IPIs are extra scheduling
latency and extra power used.
Ingo
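Ingo's argument depends on how a remote CPU wakes an idle one. The toy model below rests on an assumption that is not spelled out in this thread. The x86 idle loop of this era tells the scheduler it is polling while it waits in MWAIT, because the monitored write to the thread flags wakes it. It clears that hint before HLT, so waking a HLT-idle CPU costs a rescheduling IPI.

Everything named `toy_*` is hypothetical and only loosely follows the scheduler's resched path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-CPU state; none of these names are kernel identifiers. */
struct toy_cpu {
	bool need_resched;	/* plays the role of TIF_NEED_RESCHED */
	bool polling;		/* "watching need_resched, no IPI needed" */
	int ipis;		/* reschedule IPIs this CPU has received */
};

/*
 * Assumed idle behaviour: MWAIT wakes on a write to the monitored
 * flags word, so the CPU keeps advertising that it polls; HLT only
 * wakes on an interrupt, so the hint is cleared before halting.
 */
static void toy_enter_idle(struct toy_cpu *cpu, bool use_mwait)
{
	assert(cpu != NULL);
	cpu->need_resched = false;
	cpu->polling = use_mwait;
}

/* Remote wakeup: set the flag; interrupt the CPU only if it is not polling. */
static void toy_resched(struct toy_cpu *cpu)
{
	assert(cpu != NULL);
	if (cpu->need_resched)
		return;		/* already pending, nothing more to do */
	cpu->need_resched = true;
	if (!cpu->polling)
		cpu->ipis++;	/* stands in for sending a reschedule IPI */
}
```

For example, after `toy_enter_idle(&cpu, false)` a single `toy_resched(&cpu)` increments `cpu.ipis`. After `toy_enter_idle(&cpu, true)`, it leaves the count at zero.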
-------------->
Subject: x86: remove mwait C-state capability
From: Ingo Molnar <mingo@elte.hu>
Date: Wed May 14 08:47:40 CEST 2008
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/process.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
Index: linux/arch/x86/kernel/process.c
===================================================================
--- linux.orig/arch/x86/kernel/process.c
+++ linux/arch/x86/kernel/process.c
@@ -99,15 +99,6 @@ static void mwait_idle(void)
local_irq_enable();
}
-
-static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
-{
- if (force_mwait)
- return 1;
- /* Any C1 states supported? */
- return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
-}
-
/*
* On SMP it's slightly faster (but much more power-consuming!)
* to poll the ->work.need_resched flag instead of waiting for the
@@ -131,7 +122,7 @@ void __cpuinit select_idle_routine(const
" performance may degrade.\n");
}
#endif
- if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
+ if (cpu_has(c, X86_FEATURE_MWAIT)) {
/*
* Skip, if setup has overridden idle.
* One CPU supports mwait => All CPUs supports mwait
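The behavioural change in this patch can be sketched as two pure functions over the inputs the thread mentions. This is a simplification that assumes only the lines visible in the diff matter; select_idle_routine() also honours other idle= overrides, which are ignored here.

The struct, enum and function names are hypothetical. `IDLE_DEFAULT` stands for whatever non-MWAIT routine the kernel falls back to. force_mwait is assumed to be what idle=mwait sets, which is consistent with Vegard's observation that idle=mwait cures the P4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inputs to the choice; these field names are made up for the sketch. */
struct idle_inputs {
	bool has_mwait;			/* "monitor" flag in /proc/cpuinfo */
	unsigned int cpuid_level;	/* highest standard CPUID leaf */
	unsigned int leaf5_edx;		/* CPUID.05H:EDX, meaningful if level >= 5 */
	bool force_mwait;		/* idle=mwait on the kernel command line */
};

enum idle_choice { IDLE_DEFAULT, IDLE_MWAIT };

/* Before the patch: mwait_usable() also demands a C1 sub-state. */
static enum idle_choice choose_before(const struct idle_inputs *in)
{
	bool usable;

	assert(in != NULL);
	usable = in->force_mwait ||
		 (in->cpuid_level >= 5 && ((in->leaf5_edx >> 4) & 0xf) > 0);
	return (in->has_mwait && usable) ? IDLE_MWAIT : IDLE_DEFAULT;
}

/* After the patch: the MWAIT feature bit alone is enough. */
static enum idle_choice choose_after(const struct idle_inputs *in)
{
	assert(in != NULL);
	return in->has_mwait ? IDLE_MWAIT : IDLE_DEFAULT;
}
```

With the values reported in this thread (P4: CPUID level 6, EDX 0; T2310: level 10, EDX 0x1110), `choose_before()` keeps the P4 off MWAIT unless idle=mwait is given. `choose_after()` selects MWAIT on both machines. The laptop's choice is the same either way, which matches Vegard's finding that idle=mwait made no difference there.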
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 6:56 ` Ingo Molnar
@ 2008-05-14 7:11 ` Vegard Nossum
2008-05-14 9:09 ` Andreas Herrmann
1 sibling, 0 replies; 15+ messages in thread
From: Vegard Nossum @ 2008-05-14 7:11 UTC
To: Ingo Molnar
Cc: Andi Kleen, Thomas Gleixner, Andreas Herrmann,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
Hi,
On Wed, May 14, 2008 at 8:56 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Vegard Nossum <vegard.nossum@gmail.com> wrote:
>
> > Hi,
> >
> > Recap: powertop shows between 200-400 wakeups/second with the
> > description "<kernel IPI>: Rescheduling interrupts" when all
> > processors have load (e.g. I need to run two busy-loops on my 2-CPU
> > system for this to show up).
>
> ok, could you try the fix below? It was a mistake to make mwait use
> dependent on power considerations - on a desktop CPU it is unlikely to
> use more power than a simple HLT - and the IPIs are extra scheduling
> latency and extra power used.
This fixes it for the desktop machine at least. I guess I should try
on the laptop as well, but that will have to wait. Time for school :-)
Thanks.
Vegard
> -------------->
> Subject: x86: remove mwait C-state capability
> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed May 14 08:47:40 CEST 2008
>
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> arch/x86/kernel/process.c | 11 +----------
> 1 file changed, 1 insertion(+), 10 deletions(-)
>
> Index: linux/arch/x86/kernel/process.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/process.c
> +++ linux/arch/x86/kernel/process.c
> @@ -99,15 +99,6 @@ static void mwait_idle(void)
> local_irq_enable();
> }
>
> -
> -static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
> -{
> - if (force_mwait)
> - return 1;
> - /* Any C1 states supported? */
> - return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
> -}
> -
> /*
> * On SMP it's slightly faster (but much more power-consuming!)
> * to poll the ->work.need_resched flag instead of waiting for the
> @@ -131,7 +122,7 @@ void __cpuinit select_idle_routine(const
> " performance may degrade.\n");
> }
> #endif
> - if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
> + if (cpu_has(c, X86_FEATURE_MWAIT)) {
> /*
> * Skip, if setup has overridden idle.
> * One CPU supports mwait => All CPUs supports mwait
>
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 6:56 ` Ingo Molnar
2008-05-14 7:11 ` Vegard Nossum
@ 2008-05-14 9:09 ` Andreas Herrmann
2008-05-14 11:42 ` Andi Kleen
2008-05-16 8:40 ` Ingo Molnar
1 sibling, 2 replies; 15+ messages in thread
From: Andreas Herrmann @ 2008-05-14 9:09 UTC
To: Ingo Molnar
Cc: Vegard Nossum, Andi Kleen, Thomas Gleixner,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
On Wed, May 14, 2008 at 08:56:05AM +0200, Ingo Molnar wrote:
>
> * Vegard Nossum <vegard.nossum@gmail.com> wrote:
>
> > Hi,
> >
> > Recap: powertop shows between 200-400 wakeups/second with the
> > description "<kernel IPI>: Rescheduling interrupts" when all
> > processors have load (e.g. I need to run two busy-loops on my 2-CPU
> > system for this to show up).
>
> ok, could you try the fix below? It was a mistake to make mwait use
> dependent on power considerations - on a desktop CPU it is unlikely to
> use more power than a simple HLT - and the IPIs are extra scheduling
> latency and extra power used.
It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.
If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.
Thus HLT saves more power than MWAIT here.
It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit
f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD
Family 10)
Andreas
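Andreas's two claims can be captured in a toy model that takes his description at face value. A package enters C1E only once every core is halted with HLT, so a single core idling in MWAIT keeps the package out of C1E.

The enum and functions are hypothetical. `prefer_hlt_over_mwait()` only sketches the shape of the family check he proposes; it is not the code of commit f039b754.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum core_idle { CORE_RUNNING, CORE_HLT, CORE_MWAIT };

/* C1E is only reachable when every core of the package executed HLT. */
static bool package_can_enter_c1e(const enum core_idle *cores, size_t n)
{
	assert(cores != NULL && n > 0);
	for (size_t i = 0; i < n; i++)
		if (cores[i] != CORE_HLT)
			return false;
	return true;
}

/* Proposed policy: on AMD families 0x10 and 0x11, idle with HLT. */
static bool prefer_hlt_over_mwait(bool is_amd, unsigned int family)
{
	return is_amd && (family == 0x10 || family == 0x11);
}
```

Under this model, a two-core package with both cores in `CORE_HLT` can reach C1E. Switching either core to `CORE_MWAIT` blocks it, which is why Andreas wants HLT on these families.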
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 9:09 ` Andreas Herrmann
@ 2008-05-14 11:42 ` Andi Kleen
2008-05-15 16:10 ` Andreas Herrmann
2008-05-16 8:40 ` Ingo Molnar
1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-05-14 11:42 UTC
To: Andreas Herrmann
Cc: Ingo Molnar, Vegard Nossum, Thomas Gleixner,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
Andreas Herrmann <andreas.herrmann3@amd.com> writes:
>
> It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
> Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
^ not
> It might be best to switch off the mwait flag for these AMD CPU
> families like it was introduced with commit
> f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD
> Family 10)
Then you have to special-case everything again. We still need to
work out if the P4 is even correct here or not, but if it's not
I would rather quirk the CPUID reporting on it.
-Andi
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 11:42 ` Andi Kleen
@ 2008-05-15 16:10 ` Andreas Herrmann
0 siblings, 0 replies; 15+ messages in thread
From: Andreas Herrmann @ 2008-05-15 16:10 UTC
To: Andi Kleen
Cc: Ingo Molnar, Vegard Nossum, Thomas Gleixner,
S.Çağlar Onur, Valdis.Kletnieks, Matt Mackall,
linux-kernel
On Wed, May 14, 2008 at 01:42:54PM +0200, Andi Kleen wrote:
> Andreas Herrmann <andreas.herrmann3@amd.com> writes:
> >
> > It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
> > Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
> ^ not
Not sure what you meant by your comment.
Maybe you should re-read the paragraph.
It's as simple as that:
If the OS executes HLT, the core enters C1.
The core exits C1 if an interrupt is received.
> > It might be best to switch off the mwait flag for these AMD CPU
> > families like it was introduced with commit
> > f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD
> > Family 10)
>
> Then you have to special-case everything again. We still need to
> work out if the P4 is even correct here or not, but if it's not
> I would rather quirk the CPUID reporting on it.
I just want to ensure that for AMD family 0x10 and 0x11 halt and not
mwait is executed when the core is idle.
Andreas
* Re: [BISECTED] Lots of "rescheduling IPIs" in powertop
2008-05-14 9:09 ` Andreas Herrmann
2008-05-14 11:42 ` Andi Kleen
@ 2008-05-16 8:40 ` Ingo Molnar
1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-05-16 8:40 UTC
To: Andreas Herrmann
Cc: Vegard Nossum, Andi Kleen, Thomas Gleixner, S.Çağlar Onur,
Valdis.Kletnieks, Matt Mackall, linux-kernel
* Andreas Herrmann <andreas.herrmann3@amd.com> wrote:
> > > Recap: powertop shows between 200-400 wakeups/second with the
> > > description "<kernel IPI>: Rescheduling interrupts" when all
> > > processors have load (e.g. I need to run two busy-loops on my
> > > 2-CPU system for this to show up).
> >
> > ok, could you try the fix below? It was a mistake to make mwait use
> > dependent on power considerations - on a desktop CPU it is unlikely
> > to use more power than a simple HLT - and the IPIs are extra
> > scheduling latency and extra power used.
>
> It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
> Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
> depend on a clock divisor and current Pstate of the core.
>
> If all cores of a processor are in halt state (C1) the processor can
> enter the C1E (C1 enhanced) state. If mwait is used this will never
> happen.
>
> Thus HLT saves more power than MWAIT here.
>
> It might be best to switch off the mwait flag for these AMD CPU
> families like it was introduced with commit
> f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD
> Family 10)
agreed, that looks like the cleanest solution. Could you please send a
patch for that, on top of my patch?
Ingo
Thread overview: 15+ messages
2008-05-13 20:42 [BISECTED] Lots of "rescheduling IPIs" in powertop Vegard Nossum
2008-05-13 20:54 ` Andi Kleen
2008-05-13 21:09 ` Vegard Nossum
2008-05-13 21:19 ` Andi Kleen
2008-05-14 4:02 ` Arjan van de Ven
2008-05-14 6:58 ` Andi Kleen
2008-05-14 13:55 ` Arjan van de Ven
2008-05-14 15:21 ` Andi Kleen
2008-05-14 6:00 ` Vegard Nossum
2008-05-14 6:56 ` Ingo Molnar
2008-05-14 7:11 ` Vegard Nossum
2008-05-14 9:09 ` Andreas Herrmann
2008-05-14 11:42 ` Andi Kleen
2008-05-15 16:10 ` Andreas Herrmann
2008-05-16 8:40 ` Ingo Molnar