* Re: 2.6.21-rc[123] regression with NOAPIC
  [not found] <4601573A.8070602@madrabbit.org>
From: Adrian Bunk @ 2007-03-22 13:42 UTC
To: Ray Lee
Cc: LKML, Thomas Gleixner, Ingo Molnar, john stultz, Len Brown, Andi Kleen, linux-acpi

On Wed, Mar 21, 2007 at 09:03:06AM -0700, Ray Lee wrote:
> Hey Thomas, Ingo, et al.
>
> I'm having a problem, and tracked it down to what looks like a harmless
> commit of yours. I didn't quite believe the bisect at first, so tested
> it multiple times.
>
> The original problem report, when I boot with NOAPIC on the command line
> on my x86_64 box, the boot hangs and the system is totally unresponsive:
>
> > For me, this is a minor regression as I no longer need to boot with
> > NOAPIC, it just happened to still be the default when I tried 2.6.21-rc3.
> >
> > During boot, the computer wedges, hard. It's unresponsive to SysRq
> > combos, or ctrl-alt-del, but the fan kicks in pretty quickly, so I'm
> > guessing the CPU is still going wild.
> >
> > The boot locks right before, or during, or immediately after the ** line
> > below:
> >
> > [ 15.020037] ACPI: CPU0 (power states: C1[C1] C3[C3])
> > ** [ 15.020221] ACPI: Processor [C000] (supports 8 throttling states)
> > [ 15.036059] ACPI: Thermal Zone [TZ1] (66 C)
> > [ 15.041893] ACPI: Thermal Zone [TZ2] (53 C)
> > [ 15.051285] ACPI: Thermal Zone [TZ3] (33 C)
> > [ 15.054922] ACPI: Thermal Zone [TZ4] (34 C)
> >
> > This is an x86-64 system, HP/Compaq nx6125, ATI chipset (sigh). Booting
> > with NOAPIC worked in 2.6.20, fails in 2.6.21-rc1 (and 2, and 3).
> (& UP, not compiled for SMP.)
>
> Starting with head as of yesterday and reverting two commits (that are
> duplicates of each other -- the same commit came into Linus's tree via
> two different paths) 'fixes' the problem for me. I'll let those with the
> big brains decide just why.
>
> The two commits are 5c95d3f5783ab184f64b7848f0a871352c35c3cf and
> 3434933b17fa64adddf83059603c61296f6e1ee2 . The net reverse diff of those
> two is below.
>...

Thanks for tracking it down.

It's quite possible that these commits trigger your problem.

Does it work if you do _not_ revert the commits, and instead replace in
drivers/acpi/processor_idle.c the
  #ifdef ARCH_APICTIMER_STOPS_ON_C3
with an
  #if 0
?

> Ray
>...

cu
Adrian

--
    "Is there not promise of rain?" Ling Tan asked suddenly out
    of the darkness. There had been need of rain for many days.
    "Only a promise," Lao Er said.
                                    Pearl S. Buck - Dragon Seed
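
The experiment suggested above relies on how the preprocessor treats the two guards. The following stand-alone sketch illustrates only that mechanism; it is not the code from drivers/acpi/processor_idle.c. The function names are hypothetical, and the macro is defined locally here purely for illustration (in the kernel it comes from the architecture headers).

```c
#include <stdio.h>

/* Assumption for this sketch: define the macro ourselves so that the
 * "#ifdef" branch is taken, as it would be on an affected arch. */
#define ARCH_APICTIMER_STOPS_ON_C3 1

/* Hypothetical helper: returns 1 if the guarded path is compiled in. */
int guarded_path_original(void)
{
#ifdef ARCH_APICTIMER_STOPS_ON_C3
	return 1;	/* guarded code is built */
#else
	return 0;
#endif
}

/* Same helper with the guard replaced as proposed in the mail. */
int guarded_path_if0(void)
{
#if 0	/* was: #ifdef ARCH_APICTIMER_STOPS_ON_C3 */
	return 1;
#else
	return 0;	/* guarded code is dropped, whatever the macro says */
#endif
}

void report_guards(void)
{
	printf("original guard: %d, #if 0 guard: %d\n",
	       guarded_path_original(), guarded_path_if0());
}
```

With "#if 0" the guarded block is never compiled, regardless of the macro, which is why the test isolates that block without reverting the commits.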
* Re: 2.6.21-rc[123] regression with NOAPIC
From: Thomas Gleixner @ 2007-03-22 14:10 UTC
To: Adrian Bunk
Cc: Ray Lee, LKML, Ingo Molnar, john stultz, Len Brown, Andi Kleen, linux-acpi

On Thu, 2007-03-22 at 14:42 +0100, Adrian Bunk wrote:
> > Starting with head as of yesterday and reverting two commits (that are
> > duplicates of each other -- the same commit came into Linus's tree via
> > two different paths) 'fixes' the problem for me. I'll let those with the
> > big brains decide just why.
> >
> > The two commits are 5c95d3f5783ab184f64b7848f0a871352c35c3cf and
> > 3434933b17fa64adddf83059603c61296f6e1ee2 . The net reverse diff of those
> > two is below.
> >...
>
> Thanks for tracking it down.
>
> It's quite possible that these commits trigger your problem.
>
> Does it work if you do _not_ revert the commits, and instead replace in
> drivers/acpi/processor_idle.c the
>   #ifdef ARCH_APICTIMER_STOPS_ON_C3
> with an
>   #if 0
> ?

Then NOAPIC probably works again, but booting w/o NOAPIC fails.

	tglx
* Re: 2.6.21-rc[123] regression with NOAPIC
From: Adrian Bunk @ 2007-03-22 14:16 UTC
To: Thomas Gleixner
Cc: Ray Lee, LKML, Ingo Molnar, john stultz, Len Brown, Andi Kleen, linux-acpi

On Thu, Mar 22, 2007 at 03:10:03PM +0100, Thomas Gleixner wrote:
> On Thu, 2007-03-22 at 14:42 +0100, Adrian Bunk wrote:
> > > Starting with head as of yesterday and reverting two commits (that are
> > > duplicates of each other -- the same commit came into Linus's tree via
> > > two different paths) 'fixes' the problem for me. I'll let those with the
> > > big brains decide just why.
> > >
> > > The two commits are 5c95d3f5783ab184f64b7848f0a871352c35c3cf and
> > > 3434933b17fa64adddf83059603c61296f6e1ee2 . The net reverse diff of those
> > > two is below.
> > >...
> >
> > Thanks for tracking it down.
> >
> > It's quite possible that these commits trigger your problem.
> >
> > Does it work if you do _not_ revert the commits, and instead replace in
> > drivers/acpi/processor_idle.c the
> >   #ifdef ARCH_APICTIMER_STOPS_ON_C3
> > with an
> >   #if 0
> > ?
>
> Then NOAPIC probably works again, but booting w/o NOAPIC fails.

But we'll know that it's this code that has a problem with noapic
in the CONFIG_GENERIC_CLOCKEVENTS=n case.

> 	tglx

cu
Adrian

--
    "Is there not promise of rain?" Ling Tan asked suddenly out
    of the darkness. There had been need of rain for many days.
    "Only a promise," Lao Er said.
                                    Pearl S. Buck - Dragon Seed
* Re: 2.6.21-rc[123] regression with NOAPIC
From: Thomas Gleixner @ 2007-03-22 15:16 UTC
To: Adrian Bunk
Cc: Ray Lee, LKML, Ingo Molnar, john stultz, Len Brown, Andi Kleen, linux-acpi

On Thu, 2007-03-22 at 15:16 +0100, Adrian Bunk wrote:
> > > Does it work if you do _not_ revert the commits, and instead replace in
> > > drivers/acpi/processor_idle.c the
> > >   #ifdef ARCH_APICTIMER_STOPS_ON_C3
> > > with an
> > >   #if 0
> > > ?
> >
> > Then NOAPIC probably works again, but booting w/o NOAPIC fails.
>
> But we'll know that it's this code that has a problem with noapic
> in the CONFIG_GENERIC_CLOCKEVENTS=n case.

Nope. This code does not have a problem. It causes a problem elsewhere:

It calls switch_ipi_to_APIC_timer() or switch_APIC_timer_to_ipi(), which
sets/clears a bit in the broadcast mask and enables / disables the local
APIC timer.

I don't see right now why this causes the box to lock up hard, but
maybe the debug printk's below give us some hint.

	tglx

diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 723417d..29376e2 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -886,6 +886,8 @@ void disable_APIC_timer(void)
 	if (using_apic_timer) {
 		unsigned long v;
 
+		printk("Disabling local APIC timer %d\n", apic_runs_main_timer);
+
 		v = apic_read(APIC_LVTT);
 		/*
 		 * When an illegal vector value (0-15) is written to an LVT
@@ -910,6 +912,7 @@ void enable_APIC_timer(void)
 	    !cpu_isset(cpu, timer_interrupt_broadcast_ipi_mask)) {
 		unsigned long v;
 
+		printk("Enabling local APIC timer: %d\n", apic_runs_main_timer);
 		v = apic_read(APIC_LVTT);
 		apic_write(APIC_LVTT, v & ~APIC_LVT_MASKED);
 	}
@@ -934,6 +937,7 @@ void smp_send_timer_broadcast_ipi(void)
 
 	cpus_and(mask, cpu_online_map, timer_interrupt_broadcast_ipi_mask);
 	if (!cpus_empty(mask)) {
+		printk("Send IPI\n");
 		send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
 	}
 }
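
The interaction described in the message above (a broadcast mask bit toggled together with the local APIC timer, and an IPI sent whenever the mask is non-empty) can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: every model_* name is hypothetical, the masks are plain integers rather than cpumask_t, and the ordering inside the two switch helpers is an assumption. The guards mirror only what is visible in the diff context (using_apic_timer, the broadcast mask test, the online mask), and the model makes no claim about why the box hangs.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_NR_CPUS 8			/* arbitrary size for the model */

unsigned int model_broadcast_mask;	/* stands in for timer_interrupt_broadcast_ipi_mask */
unsigned int model_online_mask = 1u;	/* UP box: only CPU 0 is online */
int model_using_apic_timer;		/* stands in for using_apic_timer */
int model_lvtt_masked = 1;		/* 1 = local APIC timer interrupt masked */
int model_ipis_sent;

/* Like disable_APIC_timer(): acts only if the APIC timer is in use. */
void model_disable_apic_timer(void)
{
	if (model_using_apic_timer)
		model_lvtt_masked = 1;
}

/* Like enable_APIC_timer(): also requires the CPU's broadcast bit to be clear. */
void model_enable_apic_timer(int cpu)
{
	if (model_using_apic_timer && !(model_broadcast_mask & (1u << cpu)))
		model_lvtt_masked = 0;
}

/* Entering C3 (assumed order): stop the local timer, then request broadcast. */
void model_switch_apic_timer_to_ipi(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!(model_broadcast_mask & (1u << cpu))) {
		model_disable_apic_timer();
		model_broadcast_mask |= 1u << cpu;
	}
}

/* Leaving C3 (assumed order): drop the broadcast request, then restart the timer. */
void model_switch_ipi_to_apic_timer(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (model_broadcast_mask & (1u << cpu)) {
		model_broadcast_mask &= ~(1u << cpu);
		model_enable_apic_timer(cpu);
	}
}

/* Like smp_send_timer_broadcast_ipi(): returns 1 if an IPI would be sent. */
int model_send_timer_broadcast_ipi(void)
{
	unsigned int mask = model_online_mask & model_broadcast_mask;

	if (mask) {
		model_ipis_sent++;
		return 1;
	}
	return 0;
}
```

In this model, setting the broadcast bit does not depend on model_using_apic_timer, so the "Send IPI" path can be reached even when neither of the two timer printk's would fire. Whether the real code behaves this way with noapic is exactly what the debug patch is meant to show.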
* Re: 2.6.21-rc[123] regression with NOAPIC
From: Ray Lee @ 2007-03-23 5:35 UTC
To: tglx
Cc: Adrian Bunk, LKML, Ingo Molnar, john stultz, Len Brown, Andi Kleen, linux-acpi

Thomas Gleixner wrote:
> On Thu, 2007-03-22 at 15:16 +0100, Adrian Bunk wrote:
>>>> Does it work if you do _not_ revert the commits, and instead replace in
>>>> drivers/acpi/processor_idle.c the
>>>>   #ifdef ARCH_APICTIMER_STOPS_ON_C3
>>>> with an
>>>>   #if 0
>>>> ?
>>> Then NOAPIC probably works again, but booting w/o NOAPIC fails.
>> But we'll know that it's this code that has a problem with noapic
>> in the CONFIG_GENERIC_CLOCKEVENTS=n case.
>
> Nope. This code does not have a problem. It causes a problem elsewhere:

I can still try the above if it ends up being a useful data point.

>
> It calls switch_ipi_to_APIC_timer() or switch_APIC_timer_to_ipi(), which
> sets/clears a bit in the broadcast mask and enables / disables the local
> APIC timer.
>
> I don't see right now, why this causes the box to lock up hard, but
> maybe the debug printk's below give us some hint.
>
> 	tglx
>
> diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
> index 723417d..29376e2 100644
> --- a/arch/x86_64/kernel/apic.c
> +++ b/arch/x86_64/kernel/apic.c
> @@ -886,6 +886,8 @@ void disable_APIC_timer(void)
>  	if (using_apic_timer) {
>  		unsigned long v;
>  
> +		printk("Disabling local APIC timer %d\n", apic_runs_main_timer);
> +
>  		v = apic_read(APIC_LVTT);
>  		/*
>  		 * When an illegal vector value (0-15) is written to an LVT
> @@ -910,6 +912,7 @@ void enable_APIC_timer(void)
>  	    !cpu_isset(cpu, timer_interrupt_broadcast_ipi_mask)) {
>  		unsigned long v;
>  
> +		printk("Enabling local APIC timer: %d\n", apic_runs_main_timer);
>  		v = apic_read(APIC_LVTT);
>  		apic_write(APIC_LVTT, v & ~APIC_LVT_MASKED);
>  	}
> @@ -934,6 +937,7 @@ void smp_send_timer_broadcast_ipi(void)
>  
>  	cpus_and(mask, cpu_online_map, timer_interrupt_broadcast_ipi_mask);
>  	if (!cpus_empty(mask)) {
> +		printk("Send IPI\n");
>  		send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
>  	}
>  }
>

I didn't see the first two print, but I'm having to watch the bad
bootups (with NOAPIC) by eyesight alone, as I don't have a second
system to run netconsole on at the moment. However, on the NOAPIC,
locking boot, the last thing that prints out is the final printk,
Send IPI.

On the boots without NOAPIC, at the same spot roughly a thousand
(estimated) "Send IPI" messages hit the screen before transitioning to
the initramfs and continuing normally.

In the morning, I can rework the patch to set a global in the first two
cases (Disabling/Enabling local APIC timer), and print the result of
those in the last case, as we know the system will hang there. (I would
have done this before sending the message, but given our timezone
difference, figured this was a good start.)

Ray
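
The rework proposed at the end of the message above (record the two timer events in a global, then report them from the Send IPI path, which is known to be the last output before the hang) could look roughly like the following user-space sketch. All dbg_* names and the message format are hypothetical and not part of any posted patch. In the kernel the two note functions would be called from disable_APIC_timer() and enable_APIC_timer(), and the formatted line would go out through printk() rather than snprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event codes recorded by the two timer paths. */
enum dbg_timer_event {
	DBG_TIMER_NONE = 0,	/* neither path has run yet */
	DBG_TIMER_DISABLED = 1,	/* last event: local APIC timer disabled */
	DBG_TIMER_ENABLED = 2	/* last event: local APIC timer enabled */
};

int dbg_last_timer_event = DBG_TIMER_NONE;
int dbg_disable_calls;
int dbg_enable_calls;

/* Would be called where the "Disabling local APIC timer" printk sits. */
void dbg_note_disable(void)
{
	dbg_last_timer_event = DBG_TIMER_DISABLED;
	dbg_disable_calls++;
}

/* Would be called where the "Enabling local APIC timer" printk sits. */
void dbg_note_enable(void)
{
	dbg_last_timer_event = DBG_TIMER_ENABLED;
	dbg_enable_calls++;
}

/* Builds the line to emit from the Send IPI path; returns snprintf()'s result. */
int dbg_format_send_ipi(char *buf, size_t len)
{
	return snprintf(buf, len, "Send IPI (last=%d dis=%d en=%d)\n",
			dbg_last_timer_event, dbg_disable_calls,
			dbg_enable_calls);
}
```

Folding the state into the one message that is known to appear means the earlier output no longer has to be caught by eye before it scrolls away.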