linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/5] Skip timer works.patch
@ 2006-10-20  0:09 Zachary Amsden
  2006-10-27 14:56 ` Andi Kleen
  2006-11-16 22:53 ` Andrew Morton
  0 siblings, 2 replies; 15+ messages in thread
From: Zachary Amsden @ 2006-10-20  0:09 UTC (permalink / raw)
  To: Andrew Morton, Rusty Russell, Andi Kleen, Jeremy Fitzhardinge,
	Chris Wright, Virtualization Mailing List,
	Linux Kernel Mailing List, Zachary Amsden

Add a way to disable the timer IRQ routing check via a boot option.  The
VMI timer code uses this to avoid triggering the pester Mingo code, which
probes for some very unusual and broken motherboard routings.  It fires
100% of the time when using a paravirtual delay mechanism instead of
using a realtime delay, since there is no elapsed real time, and the 4 timer
IRQs have not yet been delivered.

In addition, it is entirely possible, though improbable, that this bug
could surface on real hardware which picks a particularly bad time to enter
SMM mode, causing a long latency during one of the timer IRQs.

While here, make check_timer be __init.

Signed-off-by: Zachary Amsden <zach@vmware.com>

===================================================================
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -603,8 +603,6 @@ and is between 256 and 4096 characters. 
 
 	hugepages=	[HW,IA-32,IA-64] Maximal number of HugeTLB pages.
 
-	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
-
 	i8042.direct	[HW] Put keyboard port into non-translated mode
 	i8042.dumbkbd	[HW] Pretend that controller can only read data from
 			     keyboard and cannot control its state
@@ -1060,8 +1058,13 @@ and is between 256 and 4096 characters. 
 			in certain environments such as networked servers or
 			real-time systems.
 
+	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
+
 	noirqdebug	[IA-32] Disables the code which attempts to detect and
 			disable unhandled interrupt sources.
+
+	noirqtest	[IA-32,APIC] Disables the code which tests for broken
+			timer IRQ sources.
 
 	noisapnp	[ISAPNP] Disables ISA PnP code.
 
===================================================================
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1864,6 +1864,15 @@ static void __init setup_ioapic_ids_from
 static void __init setup_ioapic_ids_from_mpc(void) { }
 #endif
 
+int timer_irq_really_works __initdata;
+int __init irqtest_disable(char *str)
+{
+	timer_irq_really_works = 1;
+	return 1;
+}
+
+__setup("noirqtest", irqtest_disable);
+
 /*
  * There is a nasty bug in some older SMP boards, their mptable lies
  * about the timer IRQ. We do the following to work around the situation:
@@ -1872,9 +1881,12 @@ static void __init setup_ioapic_ids_from
  *	- if this function detects that timer IRQs are defunct, then we fall
  *	  back to ISA timer IRQs
  */
-static int __init timer_irq_works(void)
+int __init timer_irq_works(void)
 {
 	unsigned long t1 = jiffies;
+
+	if (timer_irq_really_works)
+		return 1;
 
 	local_irq_enable();
 	/* Let ten ticks pass... */
@@ -2146,7 +2158,7 @@ int timer_uses_ioapic_pin_0;
  * is so screwy.  Thanks to Brian Perkins for testing/hacking this beast
  * fanatically on his truly buggy board.
  */
-static inline void check_timer(void)
+static inline void __init check_timer(void)
 {
 	int apic1, pin1, apic2, pin2;
 	int vector;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-20  0:09 [PATCH 1/5] Skip timer works.patch Zachary Amsden
@ 2006-10-27 14:56 ` Andi Kleen
  2006-10-27 19:09   ` Zachary Amsden
  2006-11-16 22:53 ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-10-27 14:56 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Andrew Morton, Rusty Russell, Jeremy Fitzhardinge, Chris Wright,
	Virtualization Mailing List, Linux Kernel Mailing List

On Thu, Oct 19, 2006 at 05:09:22PM -0700, Zachary Amsden wrote:
> Add a way to disable the timer IRQ routing check via a boot option.  The
> VMI timer code uses this to avoid triggering the pester Mingo code, which
> probes for some very unusual and broken motherboard routings.  It fires
> 100% of the time when using a paravirtual delay mechanism instead of
> using a realtime delay, since there is no elapsed real time, and the 4 timer
> IRQs have not yet been delivered.

You mean paravirtualized udelay will not actually wait? 

This implies that you can't ever use any real timer in that kind of guest,
right?

> 
> In addition, it is entirely possible, though improbable, that this bug
> could surface on real hardware which picks a particularly bad time to enter
> SMM mode, causing a long latency during one of the timer IRQs.

We already have a no timer check option. But:

> 
> While here, make check_timer be __init.

So how is this supposed to work? The hypervisor would always pass that 
option?  If yes that would seem rather hackish to me. We should probably
instead probe in some way if we have the required timer hardware.
The paravirt kernel should know anyways it is paravirt and that it doesn't
need to probe for flakey hardware.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-27 14:56 ` Andi Kleen
@ 2006-10-27 19:09   ` Zachary Amsden
  2006-10-27 21:16     ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-10-27 19:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Rusty Russell, Jeremy Fitzhardinge, Chris Wright,
	Virtualization Mailing List, Linux Kernel Mailing List

Andi Kleen wrote:
> On Thu, Oct 19, 2006 at 05:09:22PM -0700, Zachary Amsden wrote:
>   
>> Add a way to disable the timer IRQ routing check via a boot option.  The
>> VMI timer code uses this to avoid triggering the pester Mingo code, which
>> probes for some very unusual and broken motherboard routings.  It fires
>> 100% of the time when using a paravirtual delay mechanism instead of
>> using a realtime delay, since there is no elapsed real time, and the 4 timer
>> IRQs have not yet been delivered.
>>     
>
> You mean paravirtualized udelay will not actually wait? 
>   

Yes, but even putting that problem aside, the timing element here is 
tricky to get right in a VM.

> This implies that you can't ever use any real timer in that kind of guest,
> right?
>   

No.  You can use a real timer just fine.  But there is no reason ever to 
use udelay to busy wait for "hardware" in a virtual machine.  Drivers 
which are used for real hardware may turn udelay back on selectively; 
but this is another patch.


>> In addition, it is entirely possible, though improbable, that this bug
>> could surface on real hardware which picks a particularly bad time to enter
>> SMM mode, causing a long latency during one of the timer IRQs.
>>     
>
> We already have a no timer check option. But:
>   

Really?  I didn't see one that disabled the broken motherboard detection 
/ workaround code, which is what we are trying to avoid here.


>> While here, make check_timer be __init.
>>     
>
> So how is this supposed to work? The hypervisor would always pass that 
> option?  If yes that would seem rather hackish to me. We should probably
> instead probe in some way if we have the required timer hardware.
> The paravirt kernel should know anyways it is paravirt and that it doesn't
> need to probe for flakey hardware.
>   

That is what this patch is building towards, but the boot option is 
"free", so why not?  In the meantime, it helps non-paravirt kernels 
booted in a VM.

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-27 19:09   ` Zachary Amsden
@ 2006-10-27 21:16     ` Andi Kleen
  2006-10-30 20:54       ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-10-27 21:16 UTC (permalink / raw)
  To: virtualization
  Cc: Zachary Amsden, Andi Kleen, Andrew Morton, Chris Wright,
	Linux Kernel Mailing List

On Friday 27 October 2006 12:09, Zachary Amsden wrote:
> Andi Kleen wrote:
> > On Thu, Oct 19, 2006 at 05:09:22PM -0700, Zachary Amsden wrote:
> >> Add a way to disable the timer IRQ routing check via a boot option.  The
> >> VMI timer code uses this to avoid triggering the pester Mingo code,
> >> which probes for some very unusual and broken motherboard routings.  It
> >> fires 100% of the time when using a paravirtual delay mechanism instead
> >> of using a realtime delay, since there is no elapsed real time, and the
> >> 4 timer IRQs have not yet been delivered.
> >
> > You mean paravirtualized udelay will not actually wait?
>
> Yes, but even putting that problem aside, the timing element here is
> tricky to get right in a VM.
>
> > This implies that you can't ever use any real timer in that kind of
> > guest, right?
>
> No.  You can use a real timer just fine.  But there is no reason ever to
> use udelay to busy wait for "hardware" in a virtual machine.  Drivers
> which are used for real hardware may turn udelay back on selectively;
> but this is another patch.
>
> >> In addition, it is entirely possible, though improbable, that this bug
> >> could surface on real hardware which picks a particularly bad time to
> >> enter SMM mode, causing a long latency during one of the timer IRQs.
> >
> > We already have a no timer check option. But:
>
> Really?  I didn't see one that disabled the broken motherboard detection
> / workaround code, which is what we are trying to avoid here.

no_timer_check. But it's only there on x86-64 in mainline - although there
were some patches to add it to i386 too.

> That is what this patch is building towards, but the boot option is
> "free", so why not?  In the meantime, it helps non-paravirt kernels
> booted in a VM.

Hmm, you meant they paniced before?  If they just fail a few tests
that is not particularly worrying (real hardware does that often too)

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-27 21:16     ` Andi Kleen
@ 2006-10-30 20:54       ` Zachary Amsden
  2006-10-30 22:50         ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-10-30 20:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: virtualization, Andi Kleen, Andrew Morton, Chris Wright,
	Linux Kernel Mailing List

Andi Kleen wrote:
> no_timer_check. But it's only there on x86-64 in mainline - although there
> were some patches to add it to i386 too.
>   

I can rename to match the x86-64 name.


>> That is what this patch is building towards, but the boot option is
>> "free", so why not?  In the meantime, it helps non-paravirt kernels
>> booted in a VM.
>>     
>
> Hmm, you meant they paniced before?  If they just fail a few tests
> that is not particularly worrying (real hardware does that often too)
>   

Yes, they sometimes fail to boot, and the failure message used to ask us 
to pester mingo.

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-30 20:54       ` Zachary Amsden
@ 2006-10-30 22:50         ` Andi Kleen
  2006-10-30 23:09           ` Zachary Amsden
  2006-11-15  8:03           ` Chris Wright
  0 siblings, 2 replies; 15+ messages in thread
From: Andi Kleen @ 2006-10-30 22:50 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Andi Kleen, virtualization, Andrew Morton, Chris Wright,
	Linux Kernel Mailing List

On Mon, Oct 30, 2006 at 12:54:55PM -0800, Zachary Amsden wrote:
> Andi Kleen wrote:
> >no_timer_check. But it's only there on x86-64 in mainline - although there
> >were some patches to add it to i386 too.
> >  
> 
> I can rename to match the x86-64 name.

I will do that in my tree.

> >>That is what this patch is building towards, but the boot option is
> >>"free", so why not?  In the meantime, it helps non-paravirt kernels
> >>booted in a VM.
> >>    
> >
> >Hmm, you meant they paniced before?  If they just fail a few tests
> >that is not particularly worrying (real hardware does that often too)
> >  
> 
> Yes, they sometimes fail to boot, and the failure message used to ask us 
> to pester mingo.

I still think we should figure that out automatically. Letting
the Hypervisor pass magic boot options seems somehow unclean.

But i suppose it will only work for the paravirtualized case,
not for the case of kernel running "native" under a hypervisor
I suppose? Or does that one not panic?


-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-30 22:50         ` Andi Kleen
@ 2006-10-30 23:09           ` Zachary Amsden
  2006-10-30 23:12             ` Andi Kleen
  2006-11-15  8:03           ` Chris Wright
  1 sibling, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-10-30 23:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, virtualization, Andrew Morton, Chris Wright,
	Linux Kernel Mailing List

Andi Kleen wrote:
> On Mon, Oct 30, 2006 at 12:54:55PM -0800, Zachary Amsden wrote:
>   
>> Andi Kleen wrote:
>>     
>>> no_timer_check. But it's only there on x86-64 in mainline - although there
>>> were some patches to add it to i386 too.
>>>  
>>>       
>> I can rename to match the x86-64 name.
>>     
>
> I will do that in my tree.
>
>   
>>>> That is what this patch is building towards, but the boot option is
>>>> "free", so why not?  In the meantime, it helps non-paravirt kernels
>>>> booted in a VM.
>>>>    
>>>>         
>>> Hmm, you meant they paniced before?  If they just fail a few tests
>>> that is not particularly worrying (real hardware does that often too)
>>>  
>>>       
>> Yes, they sometimes fail to boot, and the failure message used to ask us 
>> to pester mingo.
>>     
>
> I still think we should figure that out automatically. Letting
> the Hypervisor pass magic boot options seems somehow unclean.
>
> But i suppose it will only work for the paravirtualized case,
> not for the case of kernel running "native" under a hypervisor
> I suppose? Or does that one not panic?
>   

That is the one that can panic, for now.  Fixing the paravirtualized 
case is easy, but we can't assume paravirtualization just yet.

Zach



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-30 23:09           ` Zachary Amsden
@ 2006-10-30 23:12             ` Andi Kleen
  2006-10-30 23:24               ` Zachary Amsden
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-10-30 23:12 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Andi Kleen, virtualization, Andrew Morton, Chris Wright,
	Linux Kernel Mailing List

> That is the one that can panic, for now.  Fixing the paravirtualized 
> case is easy, but we can't assume paravirtualization just yet.

Hmm, this means standard vmware boot is not reliable unless that magic option
is set?   That doesn't sound good.  

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-30 23:12             ` Andi Kleen
@ 2006-10-30 23:24               ` Zachary Amsden
  2006-10-30 23:50                 ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2006-10-30 23:24 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, virtualization, Andrew Morton, Chris Wright,
	Linux Kernel Mailing List

Andi Kleen wrote:
>> That is the one that can panic, for now.  Fixing the paravirtualized 
>> case is easy, but we can't assume paravirtualization just yet.
>>     
>
> Hmm, this means standard vmware boot is not reliable unless that magic option
> is set?   That doesn't sound good.  
>   

It doesn't happen often, but it is a possibility that the kernel 
calibrates the delay wrong because of timing glitches caused by CPU 
migration, paging, or other phenomena which are supposed to be 
transparent to the kernel (but cause temporal lapse).  In that case, the 
kernel may not make enough progress in a spin delay loop to properly 
reach the number of microseconds required for N number of timer ticks to 
occur.  In theory this can happen on a real machine, as SMM mode could 
be active, doing USB device emulation or something that takes a while 
during the lpj calibration and throwing the computation off.

By changing the parameters (N ticks at K Hz in T seconds), it is easy to 
create an unstable measurement that can achieve high failure rates, 
although in practice the Linux parameters appear to be reasonable enough 
that it is not a major problem.

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-30 23:24               ` Zachary Amsden
@ 2006-10-30 23:50                 ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2006-10-30 23:50 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: virtualization, Andrew Morton, Chris Wright,
	Linux Kernel Mailing List


> It doesn't happen often, but it is a possibility that the kernel 
> calibrates the delay wrong because of timing glitches caused by CPU 
> migration, paging, or other phenomena which are supposed to be 
> transparent to the kernel (but cause temporal lapse).

We're supposed to handle those because they happen on real hardware
too with long running SMM handlers. Or at least there was a effort some time ago
to do this. If it wasn't enough we'll likely need to fix the code.

> In that case, the  
> kernel may not make enough progress in a spin delay loop to properly 
> reach the number of microseconds required for N number of timer ticks to 
> occur.  

Hmm, mdelay is polling RDTSC and assumes it makes forward progress
and waits until the time that was estimated at the original TSC<->PIT
calibration passed.  While there is a spin loop it is definitely 
polling a timer that is supposed to tick properly even in virtualization.

You're saying that doesn't work on vmware? Does it have trouble
with RDTSC?

Anyways if polling against TSC doesn't work I suppose we could
change it to poll against some other timer.
 
> In theory this can happen on a real machine, as SMM mode could 
> be active, doing USB device emulation or something that takes a while 
> during the lpj calibration and throwing the computation off.

Yep.

> By changing the parameters (N ticks at K Hz in T seconds), it is easy to 
> create an unstable measurement that can achieve high failure rates, 
> although in practice the Linux parameters appear to be reasonable enough 
> that it is not a major problem.

Hmm, why exactly? 

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-30 22:50         ` Andi Kleen
  2006-10-30 23:09           ` Zachary Amsden
@ 2006-11-15  8:03           ` Chris Wright
  1 sibling, 0 replies; 15+ messages in thread
From: Chris Wright @ 2006-11-15  8:03 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zachary Amsden, Andi Kleen, virtualization, Andrew Morton,
	Chris Wright, Linux Kernel Mailing List

* Andi Kleen (ak@muc.de) wrote:
> On Mon, Oct 30, 2006 at 12:54:55PM -0800, Zachary Amsden wrote:
> > Andi Kleen wrote:
> > >no_timer_check. But it's only there on x86-64 in mainline - although there
> > >were some patches to add it to i386 too.
> > >  
> > 
> > I can rename to match the x86-64 name.
> 
> I will do that in my tree.

Looks like this one might have been lost.  Here's the renamed version
(it's against 2.6.19-rc5-mm2, which should have your tree in it).

thanks,
-chris
--

From: Zachary Amsden <zach@vmware.com>

Add a way to disable the timer IRQ routing check via a boot option.  The
VMI timer code uses this to avoid triggering the pester Mingo code, which
probes for some very unusual and broken motherboard routings.  It fires
100% of the time when using a paravirtual delay mechanism instead of
using a realtime delay, since there is no elapsed real time, and the 4 timer
IRQs have not yet been delivered.

In addition, it is entirely possible, though improbable, that this bug
could surface on real hardware which picks a particularly bad time to enter
SMM mode, causing a long latency during one of the timer IRQs.

While here, make check_timer be __init.

Signed-off-by: Zachary Amsden <zach@vmware.com>
[chrisw: use no_timer_check to bring inline with x86_64 as per Andi's request]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

===================================================================
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -610,8 +610,6 @@ and is between 256 and 4096 characters. 
 
 	hugepages=	[HW,IA-32,IA-64] Maximal number of HugeTLB pages.
 
-	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
-
 	i8042.direct	[HW] Put keyboard port into non-translated mode
 	i8042.dumbkbd	[HW] Pretend that controller can only read data from
 			     keyboard and cannot control its state
@@ -1081,8 +1079,13 @@ and is between 256 and 4096 characters. 
 			in certain environments such as networked servers or
 			real-time systems.
 
+	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
+
 	noirqdebug	[IA-32] Disables the code which attempts to detect and
 			disable unhandled interrupt sources.
+
+	no_timer_check	[IA-32,X86_64,APIC] Disables the code which tests for
+			broken timer IRQ sources.
 
 	noisapnp	[ISAPNP] Disables ISA PnP code.
 
===================================================================
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -1931,6 +1931,15 @@ static void __init setup_ioapic_ids_from
 static void __init setup_ioapic_ids_from_mpc(void) { }
 #endif
 
+static int no_timer_check __initdata;
+
+static int __init notimercheck(char *s)
+{
+	no_timer_check = 1;
+	return 1;
+}
+__setup("no_timer_check", notimercheck);
+
 /*
  * There is a nasty bug in some older SMP boards, their mptable lies
  * about the timer IRQ. We do the following to work around the situation:
@@ -1939,9 +1948,12 @@ static void __init setup_ioapic_ids_from
  *	- if this function detects that timer IRQs are defunct, then we fall
  *	  back to ISA timer IRQs
  */
-static int __init timer_irq_works(void)
+int __init timer_irq_works(void)
 {
 	unsigned long t1 = jiffies;
+
+	if (no_timer_check)
+		return 1;
 
 	local_irq_enable();
 	/* Let ten ticks pass... */
@@ -2219,7 +2231,7 @@ int timer_uses_ioapic_pin_0;
  * is so screwy.  Thanks to Brian Perkins for testing/hacking this beast
  * fanatically on his truly buggy board.
  */
-static inline void check_timer(void)
+static inline void __init check_timer(void)
 {
 	int apic1, pin1, apic2, pin2;
 	int vector;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-10-20  0:09 [PATCH 1/5] Skip timer works.patch Zachary Amsden
  2006-10-27 14:56 ` Andi Kleen
@ 2006-11-16 22:53 ` Andrew Morton
  2006-11-16 23:08   ` Zachary Amsden
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Andrew Morton @ 2006-11-16 22:53 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Rusty Russell, Andi Kleen, Jeremy Fitzhardinge, Chris Wright,
	Virtualization Mailing List, Linux Kernel Mailing List

On Thu, 19 Oct 2006 17:09:22 -0700
Zachary Amsden <zach@vmware.com> wrote:

> Add a way to disable the timer IRQ routing check via a boot option.  The
> VMI timer code uses this to avoid triggering the pester Mingo code, which
> probes for some very unusual and broken motherboard routings.  It fires
> 100% of the time when using a paravirtual delay mechanism instead of
> using a realtime delay, since there is no elapsed real time, and the 4 timer
> IRQs have not yet been delivered.
> 
> In addition, it is entirely possible, though improbable, that this bug
> could surface on real hardware which picks a particularly bad time to enter
> SMM mode, causing a long latency during one of the timer IRQs.
> 
> While here, make check_timer be __init.
> 

Andi seems to have merged this patch but from somewhere I picked up a
different version, below.

I think the version I have is better.  Because the patch Andi has merged is
cast in terms of "irq testing", which is broad.  But that's not what the
patch does - the patch handles only timers.

IOW, this:

> +
> +	noirqtest	[IA-32,APIC] Disables the code which tests for broken
> +			timer IRQ sources.

is misleadingly named.  This:

+       no_timer_check  [IA-32,X86_64,APIC] Disables the code which tests for
+                       broken timer IRQ sources.
+

is better, no?

But right now, I'll settle for anything which usually compiles.



From: Zachary Amsden <zach@vmware.com>

Add a way to disable the timer IRQ routing check via a boot option.  The
VMI timer code uses this to avoid triggering the pester Mingo code, which
probes for some very unusual and broken motherboard routings.  It fires
100% of the time when using a paravirtual delay mechanism instead of using
a realtime delay, since there is no elapsed real time, and the 4 timer IRQs
have not yet been delivered.

In addition, it is entirely possible, though improbable, that this bug
could surface on real hardware which picks a particularly bad time to enter
SMM mode, causing a long latency during one of the timer IRQs.

While here, make check_timer be __init.

Signed-off-by: Zachary Amsden <zach@vmware.com>
[chrisw: use no_timer_check to bring inline with x86_64 as per Andi's request]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 Documentation/kernel-parameters.txt |    7 +++++--
 arch/i386/kernel/io_apic.c          |   16 ++++++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff -puN arch/i386/kernel/io_apic.c~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option arch/i386/kernel/io_apic.c
--- a/arch/i386/kernel/io_apic.c~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option
+++ a/arch/i386/kernel/io_apic.c
@@ -1930,6 +1930,15 @@ static void __init setup_ioapic_ids_from
 static void __init setup_ioapic_ids_from_mpc(void) { }
 #endif
 
+static int no_timer_check __initdata;
+
+static int __init notimercheck(char *s)
+{
+	no_timer_check = 1;
+	return 1;
+}
+__setup("no_timer_check", notimercheck);
+
 /*
  * There is a nasty bug in some older SMP boards, their mptable lies
  * about the timer IRQ. We do the following to work around the situation:
@@ -1938,10 +1947,13 @@ static void __init setup_ioapic_ids_from
  *	- if this function detects that timer IRQs are defunct, then we fall
  *	  back to ISA timer IRQs
  */
-static int __init timer_irq_works(void)
+int __init timer_irq_works(void)
 {
 	unsigned long t1 = jiffies;
 
+	if (no_timer_check)
+		return 1;
+
 	local_irq_enable();
 	/* Let ten ticks pass... */
 	mdelay((10 * 1000) / HZ);
@@ -2212,7 +2224,7 @@ int timer_uses_ioapic_pin_0;
  * is so screwy.  Thanks to Brian Perkins for testing/hacking this beast
  * fanatically on his truly buggy board.
  */
-static inline void check_timer(void)
+static inline void __init check_timer(void)
 {
 	int apic1, pin1, apic2, pin2;
 	int vector;
diff -puN Documentation/kernel-parameters.txt~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option Documentation/kernel-parameters.txt
--- a/Documentation/kernel-parameters.txt~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option
+++ a/Documentation/kernel-parameters.txt
@@ -599,8 +599,6 @@ and is between 256 and 4096 characters. 
 
 	hugepages=	[HW,IA-32,IA-64] Maximal number of HugeTLB pages.
 
-	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
-
 	i8042.direct	[HW] Put keyboard port into non-translated mode
 	i8042.dumbkbd	[HW] Pretend that controller can only read data from
 			     keyboard and cannot control its state
@@ -1056,9 +1054,14 @@ and is between 256 and 4096 characters. 
 			in certain environments such as networked servers or
 			real-time systems.
 
+	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
+
 	noirqdebug	[IA-32] Disables the code which attempts to detect and
 			disable unhandled interrupt sources.
 
+	no_timer_check	[IA-32,X86_64,APIC] Disables the code which tests for
+			broken timer IRQ sources.
+
 	noisapnp	[ISAPNP] Disables ISA PnP code.
 
 	noinitrd	[RAM] Tells the kernel not to load any configured
_


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-11-16 22:53 ` Andrew Morton
@ 2006-11-16 23:08   ` Zachary Amsden
  2006-11-16 23:10   ` Chris Wright
  2006-11-17  5:05   ` Andi Kleen
  2 siblings, 0 replies; 15+ messages in thread
From: Zachary Amsden @ 2006-11-16 23:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rusty Russell, Andi Kleen, Jeremy Fitzhardinge, Chris Wright,
	Virtualization Mailing List, Linux Kernel Mailing List

Andrew Morton wrote:
> On Thu, 19 Oct 2006 17:09:22 -0700
> Zachary Amsden <zach@vmware.com> wrote:
>
>   
>> Add a way to disable the timer IRQ routing check via a boot option.  The
>> VMI timer code uses this to avoid triggering the pester Mingo code, which
>> probes for some very unusual and broken motherboard routings.  It fires
>> 100% of the time when using a paravirtual delay mechanism instead of
>> using a realtime delay, since there is no elapsed real time, and the 4 timer
>> IRQs have not yet been delivered.
>>
>> In addition, it is entirely possible, though improbable, that this bug
>> could surface on real hardware which picks a particularly bad time to enter
>> SMM mode, causing a long latency during one of the timer IRQs.
>>
>> While here, make check_timer be __init.
>>
>>     
>
> Andi seems to have merged this patch but from somewhere I picked up a
> different version, below.
>
> I think the version I have is better.  Because the patch Andi has merged is
> cast in terms of "irq testing", which is broad.  But that's not what the
> patch does - the patch handles only timers.
>
> IOW, this:
>
>   
>> +
>> +	noirqtest	[IA-32,APIC] Disables the code which tests for broken
>> +			timer IRQ sources.
>>     
>
> is misleadingly named.  This:
>
> +       no_timer_check  [IA-32,X86_64,APIC] Disables the code which tests for
> +                       broken timer IRQ sources.
> +
>
> is better, no?
>
> But right now, I'll settle for anything which usually compiles.
>
>   

Yes, the name sucks.  There is no real reason to actually have a boot 
parameter at all once the paravirt / VMI patches are in, but I wanted 
something to be able to set timer_irq_really_works until then to avoid 
someone accidentally removing it.

Zach

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-11-16 22:53 ` Andrew Morton
  2006-11-16 23:08   ` Zachary Amsden
@ 2006-11-16 23:10   ` Chris Wright
  2006-11-17  5:05   ` Andi Kleen
  2 siblings, 0 replies; 15+ messages in thread
From: Chris Wright @ 2006-11-16 23:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Zachary Amsden, Rusty Russell, Andi Kleen, Jeremy Fitzhardinge,
	Chris Wright, Virtualization Mailing List,
	Linux Kernel Mailing List

* Andrew Morton (akpm@osdl.org) wrote:
> Andi seems to have merged this patch but from somewhere I picked up a
> different version, below.

That was from me.  The earlier one had fallen through the cracks, so
I picked it up and reworked it (so that it uses no_timer_check) as per
Andi's suggestion.

> I think the version I have is better.

Agreed ;-)

> Because the patch Andi has merged is
> cast in terms of "irq testing", which is broad.  But that's not what the
> patch does - the patch handles only timers.
> 
> IOW, this:
> 
> > +
> > +	noirqtest	[IA-32,APIC] Disables the code which tests for broken
> > +			timer IRQ sources.
> 
> is misleadingly named.  This:
> 
> +       no_timer_check  [IA-32,X86_64,APIC] Disables the code which tests for
> +                       broken timer IRQ sources.
> +
> 
> is better, no?

Clearer, and also same as x86_64.

> But right now, I'll settle for anything which usually compiles.

I believe the current round has fixed all the know compilation issues.
There's one outstanding runtime issue from Andi, which I believe is due
to an old glibc which requires COMPAT_VDSO (and which CONFIG_PARAVIRT
must disable).  So current set is clean from the reports I have (and
assuming above vdso analysis is correct).

thanks,
-chris
--
> From: Zachary Amsden <zach@vmware.com>
> 
> Add a way to disable the timer IRQ routing check via a boot option.  The
> VMI timer code uses this to avoid triggering the pester Mingo code, which
> probes for some very unusual and broken motherboard routings.  It fires
> 100% of the time when using a paravirtual delay mechanism instead of using
> a realtime delay, since there is no elapsed real time, and the 4 timer IRQs
> have not yet been delivered.
> 
> In addition, it is entirely possible, though improbable, that this bug
> could surface on real hardware which picks a particularly bad time to enter
> SMM mode, causing a long latency during one of the timer IRQs.
> 
> While here, make check_timer be __init.
> 
> Signed-off-by: Zachary Amsden <zach@vmware.com>
> [chrisw: use no_timer_check to bring inline with x86_64 as per Andi's request]
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
> Cc: Andi Kleen <ak@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  Documentation/kernel-parameters.txt |    7 +++++--
>  arch/i386/kernel/io_apic.c          |   16 ++++++++++++++--
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff -puN arch/i386/kernel/io_apic.c~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option arch/i386/kernel/io_apic.c
> --- a/arch/i386/kernel/io_apic.c~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option
> +++ a/arch/i386/kernel/io_apic.c
> @@ -1930,6 +1930,15 @@ static void __init setup_ioapic_ids_from
>  static void __init setup_ioapic_ids_from_mpc(void) { }
>  #endif
>  
> +static int no_timer_check __initdata;
> +
> +static int __init notimercheck(char *s)
> +{
> +	no_timer_check = 1;
> +	return 1;
> +}
> +__setup("no_timer_check", notimercheck);
> +
>  /*
>   * There is a nasty bug in some older SMP boards, their mptable lies
>   * about the timer IRQ. We do the following to work around the situation:
> @@ -1938,10 +1947,13 @@ static void __init setup_ioapic_ids_from
>   *	- if this function detects that timer IRQs are defunct, then we fall
>   *	  back to ISA timer IRQs
>   */
> -static int __init timer_irq_works(void)
> +int __init timer_irq_works(void)
>  {
>  	unsigned long t1 = jiffies;
>  
> +	if (no_timer_check)
> +		return 1;
> +
>  	local_irq_enable();
>  	/* Let ten ticks pass... */
>  	mdelay((10 * 1000) / HZ);
> @@ -2212,7 +2224,7 @@ int timer_uses_ioapic_pin_0;
>   * is so screwy.  Thanks to Brian Perkins for testing/hacking this beast
>   * fanatically on his truly buggy board.
>   */
> -static inline void check_timer(void)
> +static inline void __init check_timer(void)
>  {
>  	int apic1, pin1, apic2, pin2;
>  	int vector;
> diff -puN Documentation/kernel-parameters.txt~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option Documentation/kernel-parameters.txt
> --- a/Documentation/kernel-parameters.txt~i386-add-a-way-to-disable-the-timer-irq-routing-check-via-a-boot-option
> +++ a/Documentation/kernel-parameters.txt
> @@ -599,8 +599,6 @@ and is between 256 and 4096 characters. 
>  
>  	hugepages=	[HW,IA-32,IA-64] Maximal number of HugeTLB pages.
>  
> -	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
> -
>  	i8042.direct	[HW] Put keyboard port into non-translated mode
>  	i8042.dumbkbd	[HW] Pretend that controller can only read data from
>  			     keyboard and cannot control its state
> @@ -1056,9 +1054,14 @@ and is between 256 and 4096 characters. 
>  			in certain environments such as networked servers or
>  			real-time systems.
>  
> +	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
> +
>  	noirqdebug	[IA-32] Disables the code which attempts to detect and
>  			disable unhandled interrupt sources.
>  
> +	no_timer_check	[IA-32,X86_64,APIC] Disables the code which tests for
> +			broken timer IRQ sources.
> +
>  	noisapnp	[ISAPNP] Disables ISA PnP code.
>  
>  	noinitrd	[RAM] Tells the kernel not to load any configured
> _

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] Skip timer works.patch
  2006-11-16 22:53 ` Andrew Morton
  2006-11-16 23:08   ` Zachary Amsden
  2006-11-16 23:10   ` Chris Wright
@ 2006-11-17  5:05   ` Andi Kleen
  2 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2006-11-17  5:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Zachary Amsden, Rusty Russell, Jeremy Fitzhardinge, Chris Wright,
	Virtualization Mailing List, Linux Kernel Mailing List


> Andi seems to have merged this patch but from somewhere I picked up a
> different version, below.
> 
> I think the version I have is better.  Because the patch Andi has merged is
> cast in terms of "irq testing", which is broad.  But that's not what the
> patch does - the patch handles only timers.

Agreed. I updated to your version now.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-11-17  5:05 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-20  0:09 [PATCH 1/5] Skip timer works.patch Zachary Amsden
2006-10-27 14:56 ` Andi Kleen
2006-10-27 19:09   ` Zachary Amsden
2006-10-27 21:16     ` Andi Kleen
2006-10-30 20:54       ` Zachary Amsden
2006-10-30 22:50         ` Andi Kleen
2006-10-30 23:09           ` Zachary Amsden
2006-10-30 23:12             ` Andi Kleen
2006-10-30 23:24               ` Zachary Amsden
2006-10-30 23:50                 ` Andi Kleen
2006-11-15  8:03           ` Chris Wright
2006-11-16 22:53 ` Andrew Morton
2006-11-16 23:08   ` Zachary Amsden
2006-11-16 23:10   ` Chris Wright
2006-11-17  5:05   ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).