public inbox for linux-acpi@vger.kernel.org
* Re: 2.6.21-rc[123] regression with NOAPIC
       [not found] <4601573A.8070602@madrabbit.org>
@ 2007-03-22 13:42 ` Adrian Bunk
  2007-03-22 14:10   ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Adrian Bunk @ 2007-03-22 13:42 UTC (permalink / raw)
  To: Ray Lee
  Cc: LKML, Thomas Gleixner, Ingo Molnar, john stultz, Len Brown,
	Andi Kleen, linux-acpi

On Wed, Mar 21, 2007 at 09:03:06AM -0700, Ray Lee wrote:
> Hey Thomas, Ingo, et al.
> 
> I'm having a problem, and tracked it down to what looks like a harmless
> commit of yours. I didn't quite believe the bisect at first, so tested
> it multiple times.
> 
> The original problem report, when I boot with NOAPIC on the command line
> on my x86_64 box, the boot hangs and the system is totally unresponsive:
> 
> > For me, this is a minor regression as I no longer need to boot with
> > NOAPIC, it just happened to still be the default when I tried 2.6.21-rc3.
> > 
> > During boot, the computer wedges, hard. It's unresponsive to SysRq
> > combos, or ctrl-alt-del, but the fan kicks in pretty quickly, so I'm
> > guessing the CPU is still going wild.
> > 
> > The boot locks right before, or during, or immediately after the ** line
> > below:
> > 
> >    [   15.020037] ACPI: CPU0 (power states: C1[C1] C3[C3])
> > ** [   15.020221] ACPI: Processor [C000] (supports 8 throttling states)
> >    [   15.036059] ACPI: Thermal Zone [TZ1] (66 C)
> >    [   15.041893] ACPI: Thermal Zone [TZ2] (53 C)
> >    [   15.051285] ACPI: Thermal Zone [TZ3] (33 C)
> >    [   15.054922] ACPI: Thermal Zone [TZ4] (34 C)
> > 
> > This is an x86-64 system, HP/Compaq nx6125, ATI chipset (sigh). Booting
> > with NOAPIC worked in 2.6.20, fails in 2.6.21-rc1 (and 2, and 3).
> (& UP, not compiled for SMP.)
> 
> Starting with head as of yesterday and reverting two commits (that are
> duplicates of each other -- the same commit came into Linus's tree via
> two different paths) 'fixes' the problem for me. I'll let those with the
> big brains decide just why.
> 
> The two commits are 5c95d3f5783ab184f64b7848f0a871352c35c3cf and
> 3434933b17fa64adddf83059603c61296f6e1ee2 . The net reverse diff of those
> two is below.
>...

Thanks for tracking it down.

It's quite possible that these commits trigger your problem.

Does it work if you do _not_ revert the commits, and instead replace in
drivers/acpi/processor_idle.c the
  #ifdef ARCH_APICTIMER_STOPS_ON_C3
with an
  #if 0
?
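
To make the suggested experiment concrete, here is a minimal standalone
sketch (plain C, not kernel code) of what swapping the guard does.
hypothetical_timer_switch(), both idle_enter_*() functions and
compare_variants() are invented for illustration, and the #define only
stands in for however the architecture provides the macro. The real
contents of the guarded block in processor_idle.c are not reproduced here.

```c
#include <assert.h>

/* Stand-in: assume the architecture defines this macro somewhere. */
#define ARCH_APICTIMER_STOPS_ON_C3 1

/* Counts how often the (hypothetical) timer-switch path is reached. */
int timer_switch_calls;

void hypothetical_timer_switch(void)
{
	timer_switch_calls++;
}

/* Original form: the block is compiled in whenever the macro is defined. */
void idle_enter_original(void)
{
#ifdef ARCH_APICTIMER_STOPS_ON_C3
	hypothetical_timer_switch();
#endif
}

/* Experiment: "#if 0" drops the block no matter what the macro says. */
void idle_enter_experiment(void)
{
#if 0
	hypothetical_timer_switch();
#endif
}

/* Self-check: only the original variant reaches the switch path. */
void compare_variants(void)
{
	timer_switch_calls = 0;
	idle_enter_original();
	assert(timer_switch_calls == 1);
	idle_enter_experiment();
	assert(timer_switch_calls == 1);
}
```

If the hang disappears with the block compiled out, the guarded code is
at least involved in triggering it. That alone would not show where the
actual fault lies.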

> Ray
>...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: 2.6.21-rc[123] regression with NOAPIC
  2007-03-22 13:42 ` 2.6.21-rc[123] regression with NOAPIC Adrian Bunk
@ 2007-03-22 14:10   ` Thomas Gleixner
  2007-03-22 14:16     ` Adrian Bunk
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2007-03-22 14:10 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Ray Lee, LKML, Ingo Molnar, john stultz, Len Brown, Andi Kleen,
	linux-acpi

On Thu, 2007-03-22 at 14:42 +0100, Adrian Bunk wrote:
> > Starting with head as of yesterday and reverting two commits (that are
> > duplicates of each other -- the same commit came into Linus's tree via
> > two different paths) 'fixes' the problem for me. I'll let those with the
> > big brains decide just why.
> > 
> > The two commits are 5c95d3f5783ab184f64b7848f0a871352c35c3cf and
> > 3434933b17fa64adddf83059603c61296f6e1ee2 . The net reverse diff of those
> > two is below.
> >...
> 
> Thanks for tracking it down.
> 
> It's quite possible that these commits trigger your problem.
> 
> Does it work if you do _not_ revert the commits, and instead replace in
> drivers/acpi/processor_idle.c the
>   #ifdef ARCH_APICTIMER_STOPS_ON_C3
> with an
>   #if 0
> ?

Then NOAPIC probably works again, but booting w/o NOAPIC fails.

	tglx




* Re: 2.6.21-rc[123] regression with NOAPIC
  2007-03-22 14:10   ` Thomas Gleixner
@ 2007-03-22 14:16     ` Adrian Bunk
  2007-03-22 15:16       ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Adrian Bunk @ 2007-03-22 14:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ray Lee, LKML, Ingo Molnar, john stultz, Len Brown, Andi Kleen,
	linux-acpi

On Thu, Mar 22, 2007 at 03:10:03PM +0100, Thomas Gleixner wrote:
> On Thu, 2007-03-22 at 14:42 +0100, Adrian Bunk wrote:
> > > Starting with head as of yesterday and reverting two commits (that are
> > > duplicates of each other -- the same commit came into Linus's tree via
> > > two different paths) 'fixes' the problem for me. I'll let those with the
> > > big brains decide just why.
> > > 
> > > The two commits are 5c95d3f5783ab184f64b7848f0a871352c35c3cf and
> > > 3434933b17fa64adddf83059603c61296f6e1ee2 . The net reverse diff of those
> > > two is below.
> > >...
> > 
> > Thanks for tracking it down.
> > 
> > It's quite possible that these commits trigger your problem.
> > 
> > Does it work if you do _not_ revert the commits, and instead replace in
> > drivers/acpi/processor_idle.c the
> >   #ifdef ARCH_APICTIMER_STOPS_ON_C3
> > with an
> >   #if 0
> > ?
> 
> Then NOAPIC probably works again, but booting w/o NOAPIC fails.

But we'll know that it's this code that has a problem with noapic
in the CONFIG_GENERIC_CLOCKEVENTS=n case.

> 	tglx

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: 2.6.21-rc[123] regression with NOAPIC
  2007-03-22 14:16     ` Adrian Bunk
@ 2007-03-22 15:16       ` Thomas Gleixner
  2007-03-23  5:35         ` Ray Lee
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2007-03-22 15:16 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Ray Lee, LKML, Ingo Molnar, john stultz, Len Brown, Andi Kleen,
	linux-acpi

On Thu, 2007-03-22 at 15:16 +0100, Adrian Bunk wrote:
> > > Does it work if you do _not_ revert the commits, and instead replace in
> > > drivers/acpi/processor_idle.c the
> > >   #ifdef ARCH_APICTIMER_STOPS_ON_C3
> > > with an
> > >   #if 0
> > > ?
> > 
> > Then NOAPIC probably works again, but booting w/o NOAPIC fails.
> 
> But we'll know that it's this code that has a problem with noapic
> in the CONFIG_GENERIC_CLOCKEVENTS=n case.

Nope. This code does not have a problem. It causes a problem elsewhere:

It calls switch_ipi_to_APIC_timer() or switch_APIC_timer_to_ipi(), which
sets/clears a bit in the broadcast mask and enables / disables the local
APIC timer.
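
As a rough illustration of that mechanism, the following is a toy
userspace model, not the kernel implementation. All model_* names, the
fixed limit of 8 CPUs and the plain unsigned long mask are assumptions
made for the sketch. Which switch sets the bit and which clears it is
inferred from the function names.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Bit n set: CPU n relies on the broadcast IPI instead of its own timer. */
unsigned long broadcast_mask;

/* Simulated per-CPU state of the local APIC timer. */
bool local_timer_enabled[MODEL_NR_CPUS];

/* Model of switch_APIC_timer_to_ipi(): mark the CPU, stop its local timer. */
void model_switch_timer_to_ipi(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	broadcast_mask |= 1UL << cpu;
	local_timer_enabled[cpu] = false;
}

/* Model of switch_ipi_to_APIC_timer(): clear the bit, restart the timer. */
void model_switch_ipi_to_timer(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	broadcast_mask &= ~(1UL << cpu);
	local_timer_enabled[cpu] = true;
}

/* Model of the broadcast path: send only if an online CPU is marked. */
bool model_broadcast_needed(unsigned long online_mask)
{
	return (online_mask & broadcast_mask) != 0;
}
```

In this model a CPU that has been switched to IPI mode gets no ticks of
its own, so it depends entirely on another source delivering the
broadcast.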

I don't see right now, why this causes the box to lock up hard, but
maybe the debug printk's below give us some hint.

	tglx

diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 723417d..29376e2 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -886,6 +886,8 @@ void disable_APIC_timer(void)
 	if (using_apic_timer) {
 		unsigned long v;
 
+		printk("Disabling local APIC timer %d\n", apic_runs_main_timer);
+
 		v = apic_read(APIC_LVTT);
 		/*
 		 * When an illegal vector value (0-15) is written to an LVT
@@ -910,6 +912,7 @@ void enable_APIC_timer(void)
 	    !cpu_isset(cpu, timer_interrupt_broadcast_ipi_mask)) {
 		unsigned long v;
 
+		printk("Enabling local APIC timer: %d\n", apic_runs_main_timer);
 		v = apic_read(APIC_LVTT);
 		apic_write(APIC_LVTT, v & ~APIC_LVT_MASKED);
 	}
@@ -934,6 +937,7 @@ void smp_send_timer_broadcast_ipi(void)
 
 	cpus_and(mask, cpu_online_map, timer_interrupt_broadcast_ipi_mask);
 	if (!cpus_empty(mask)) {
+		printk("Send IPI\n");
 		send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
 	}
 }


* Re: 2.6.21-rc[123] regression with NOAPIC
  2007-03-22 15:16       ` Thomas Gleixner
@ 2007-03-23  5:35         ` Ray Lee
  0 siblings, 0 replies; 5+ messages in thread
From: Ray Lee @ 2007-03-23  5:35 UTC (permalink / raw)
  To: tglx
  Cc: Adrian Bunk, LKML, Ingo Molnar, john stultz, Len Brown,
	Andi Kleen, linux-acpi

Thomas Gleixner wrote:
> On Thu, 2007-03-22 at 15:16 +0100, Adrian Bunk wrote:
>>>> Does it work if you do _not_ revert the commits, and instead replace in
>>>> drivers/acpi/processor_idle.c the
>>>>   #ifdef ARCH_APICTIMER_STOPS_ON_C3
>>>> with an
>>>>   #if 0
>>>> ?
>>> Then NOAPIC probably works again, but booting w/o NOAPIC fails.
>> But we'll know that it's this code that has a problem with noapic
>> in the CONFIG_GENERIC_CLOCKEVENTS=n case.
> 
> Nope. This code does not have a problem. It causes a problem elsewhere:

I can still try the above if it ends up being a useful data point.

> 
> It calls switch_ipi_to_APIC_timer() or switch_APIC_timer_to_ipi(), which
> sets/clears a bit in the broadcast mask and enables / disables the local
> APIC timer.
> 
> I don't see right now, why this causes the box to lock up hard, but
> maybe the debug printk's below give us some hint.
> 
> 	tglx
> 
> diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
> index 723417d..29376e2 100644
> --- a/arch/x86_64/kernel/apic.c
> +++ b/arch/x86_64/kernel/apic.c
> @@ -886,6 +886,8 @@ void disable_APIC_timer(void)
>  	if (using_apic_timer) {
>  		unsigned long v;
>  
> +		printk("Disabling local APIC timer %d\n", apic_runs_main_timer);
> +
>  		v = apic_read(APIC_LVTT);
>  		/*
>  		 * When an illegal vector value (0-15) is written to an LVT
> @@ -910,6 +912,7 @@ void enable_APIC_timer(void)
>  	    !cpu_isset(cpu, timer_interrupt_broadcast_ipi_mask)) {
>  		unsigned long v;
>  
> +		printk("Enabling local APIC timer: %d\n", apic_runs_main_timer);
>  		v = apic_read(APIC_LVTT);
>  		apic_write(APIC_LVTT, v & ~APIC_LVT_MASKED);
>  	}
> @@ -934,6 +937,7 @@ void smp_send_timer_broadcast_ipi(void)
>  
>  	cpus_and(mask, cpu_online_map, timer_interrupt_broadcast_ipi_mask);
>  	if (!cpus_empty(mask)) {
> +		printk("Send IPI\n");
>  		send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
>  	}
>  }
> 
> 

I didn't see the first two print, but I'm having to watch the bad
bootups (with NOAPIC) by eyesight alone, as I don't have a second system
to run netconsole on at the moment.

However, on the NOAPIC, locking boot, the last thing that prints out is
the final printk, Send IPI.

On the boots without NOAPIC, at the same spot roughly a thousand
(estimated) "Send IPI" messages hit the screen before transitioning to
the initramfs and continuing normally.

In the morning, I can rework the patch to set a global in the first two
cases (Disabling/Enabling local APIC timer), and print the result of
those in the last case, as we know the system will hang there. (I would
have done this before sending the message, but given our timezone
difference, figured this was a good start.)
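
A minimal userspace sketch of that rework, assuming nothing about the
final patch: the enable/disable paths only record what happened in a
global, and the send path reports it in the one message known to reach
the screen. The enum, the note_*() helpers and
format_send_ipi_message() are hypothetical names; in the kernel the
formatted text would go through printk rather than a buffer.

```c
#include <assert.h>
#include <stdio.h>

enum timer_event {
	TIMER_EVT_NONE,		/* neither path has run yet */
	TIMER_EVT_DISABLED,	/* disable path ran last */
	TIMER_EVT_ENABLED	/* enable path ran last */
};

/* Global written by the two quiet paths, read by the visible one. */
enum timer_event last_timer_event = TIMER_EVT_NONE;

void note_timer_disabled(void)
{
	last_timer_event = TIMER_EVT_DISABLED;
}

void note_timer_enabled(void)
{
	last_timer_event = TIMER_EVT_ENABLED;
}

const char *timer_event_name(enum timer_event e)
{
	switch (e) {
	case TIMER_EVT_DISABLED:
		return "disabled";
	case TIMER_EVT_ENABLED:
		return "enabled";
	default:
		return "none";
	}
}

/* Builds the message the send-IPI path would print; returns its length. */
int format_send_ipi_message(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "Send IPI (last timer event: %s)",
			timer_event_name(last_timer_event));
}
```

Recording a single value keeps the extra work in the enable/disable
paths small, at the cost of showing only the most recent transition.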

Ray

