* Re: [patch 0/3] Clocksource / clockevent updates [not found] <20070430102837.748238000@linutronix.de> @ 2007-05-02 0:33 ` Andrew Morton 2007-05-02 6:09 ` Thomas Gleixner [not found] ` <20070430102852.042964000@linutronix.de> [not found] ` <20070430102851.987296000@linutronix.de> 2 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-02 0:33 UTC (permalink / raw) To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, John Stultz On Mon, 30 Apr 2007 10:43:31 -0000 Thomas Gleixner <tglx@linutronix.de> wrote: > Andrew, > > please pick up the following updates to clocksource / clockevents: > > - Fixups to the resume logic > - Keep TSC stable, when lapic_timer_c2_ok is set > Should we be targetting these at 2.6.20.x? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 0/3] Clocksource / clockevent updates 2007-05-02 0:33 ` [patch 0/3] Clocksource / clockevent updates Andrew Morton @ 2007-05-02 6:09 ` Thomas Gleixner 2007-05-02 6:14 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-02 6:09 UTC (permalink / raw) To: Andrew Morton; +Cc: LKML, Ingo Molnar, John Stultz On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote: > On Mon, 30 Apr 2007 10:43:31 -0000 > Thomas Gleixner <tglx@linutronix.de> wrote: > > > Andrew, > > > > please pick up the following updates to clocksource / clockevents: > > > > - Fixups to the resume logic > > - Keep TSC stable, when lapic_timer_c2_ok is set > > > > Should we be targetting these at 2.6.20.x? 2.6.21.x ? Hmm. They should get some testing first, but otherwise yes. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 0/3] Clocksource / clockevent updates 2007-05-02 6:09 ` Thomas Gleixner @ 2007-05-02 6:14 ` Andrew Morton 0 siblings, 0 replies; 61+ messages in thread From: Andrew Morton @ 2007-05-02 6:14 UTC (permalink / raw) To: tglx; +Cc: LKML, Ingo Molnar, John Stultz On Wed, 02 May 2007 08:09:29 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote: > > On Mon, 30 Apr 2007 10:43:31 -0000 > > Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > Andrew, > > > > > > please pick up the following updates to clocksource / clockevents: > > > > > > - Fixups to the resume logic > > > - Keep TSC stable, when lapic_timer_c2_ok is set > > > > > > > Should we be targetting these at 2.6.20.x? > > 2.6.21.x ? > > Hmm. They should get some testing first, but otherwise yes. > OK, I added the cc. The second patch won't apply to 2.6.21 when we get around to it, but it'll be pretty simple to repair. ^ permalink raw reply [flat|nested] 61+ messages in thread
[parent not found: <20070430102852.042964000@linutronix.de>]
* Re: [patch 3/3] clockevents: Fix resume logic [not found] ` <20070430102852.042964000@linutronix.de> @ 2007-05-05 7:34 ` Andrew Morton 2007-05-05 11:51 ` Ingo Molnar 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-05 7:34 UTC (permalink / raw) To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, John Stultz On Mon, 30 Apr 2007 10:43:34 -0000 Thomas Gleixner <tglx@linutronix.de> wrote: > We need to make sure, that the clockevent devices are resumed, before > the tick is resumed. The current resume logic does not guarantee this. > > Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock > event devices before resuming the tick / oneshot functionality. > > Fixup the existing users. This one makes the Vaio-of-fun hang during suspend to disk. It gets up to "swsusp: critical section/: done (%d pages copied)" then it freezes. I retested only 2.6.21 plus origin clocksource-fix-resume-logic clockevents-fix-resume-logic and that hung in the same fashion. Config at http://userweb.kernel.org/~akpm/config-sony.txt ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic 2007-05-05 7:34 ` [patch 3/3] clockevents: Fix resume logic Andrew Morton @ 2007-05-05 11:51 ` Ingo Molnar 2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Ingo Molnar @ 2007-05-05 11:51 UTC (permalink / raw) To: Andrew Morton; +Cc: Thomas Gleixner, LKML, John Stultz * Andrew Morton <akpm@linux-foundation.org> wrote: > > Fixup the existing users. > > This one makes the Vaio-of-fun hang during suspend to disk. It gets > up to "swsusp: critical section/: done (%d pages copied)" then it > freezes. after trying to reproduce it on 2 boxes without success it did trigger some sw-suspend weirdness on a third box :) We are debugging it now. Ingo ^ permalink raw reply [flat|nested] 61+ messages in thread
* [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-05 11:51 ` Ingo Molnar @ 2007-05-06 15:03 ` Thomas Gleixner 2007-05-08 10:37 ` Andrew Morton 2007-05-09 5:59 ` Andrew Morton 0 siblings, 2 replies; 61+ messages in thread From: Thomas Gleixner @ 2007-05-06 15:03 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, LKML, John Stultz Andrew, On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote: > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > Fixup the existing users. > > > > This one makes the Vaio-of-fun hang during suspend to disk. It gets > > up to "swsusp: critical section/: done (%d pages copied)" then it > > freezes. > > after trying to reproduce it on 2 boxes without success it did trigger > some sw-suspend weirdness on a third box :) We are debugging it now. find an updated patch below. It fixes the problem on Ingo's VAIO-of-fun-emulator and I got confirmation from several other affected users, that the patch series is still solving their problems. tglx ------------------------> Subject: clockevents: Fix resume logic We need to make sure, that the clockevent devices are resumed, before the tick is resumed. The current resume logic does not guarantee this. Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock event devices before resuming the tick / oneshot functionality. Fixup the existing users. Fix the PIT handling of CLOCK_EVT_SHUTDOWN by disabling the interrupt instead of switching to oneshot mode, as this causes slowness on a couple of systems. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/i386/kernel/apic.c | 3 + arch/i386/kernel/hpet.c | 71 +++---------------------------------------- arch/i386/kernel/i8253.c | 43 +++++++++++++++++--------- include/linux/clockchips.h | 1 kernel/time/tick-broadcast.c | 4 +- kernel/time/tick-common.c | 16 ++++++--- 6 files changed, 51 insertions(+), 87 deletions(-) Index: linux-2.6.21/arch/i386/kernel/apic.c =================================================================== --- linux-2.6.21.orig/arch/i386/kernel/apic.c +++ linux-2.6.21/arch/i386/kernel/apic.c @@ -242,6 +242,9 @@ static void lapic_timer_setup(enum clock v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); apic_write_around(APIC_LVTT, v); break; + case CLOCK_EVT_MODE_RESUME: + /* Nothing to do here */ + break; } local_irq_restore(flags); Index: linux-2.6.21/arch/i386/kernel/hpet.c =================================================================== --- linux-2.6.21.orig/arch/i386/kernel/hpet.c +++ linux-2.6.21/arch/i386/kernel/hpet.c @@ -187,6 +187,10 @@ static void hpet_set_mode(enum clock_eve cfg &= ~HPET_TN_ENABLE; hpet_writel(cfg, HPET_T0_CFG); break; + + case CLOCK_EVT_MODE_RESUME: + hpet_enable_int(); + break; } } @@ -217,6 +221,7 @@ static struct clocksource clocksource_hp .mask = HPET_MASK, .shift = HPET_SHIFT, .flags = CLOCK_SOURCE_IS_CONTINUOUS, + .resume = hpet_start_counter, }; /* @@ -291,7 +296,6 @@ int __init hpet_enable(void) clocksource_register(&clocksource_hpet); - if (id & HPET_ID_LEGSUP) { hpet_enable_int(); hpet_reserve_platform_timers(id); @@ -524,68 +528,3 @@ irqreturn_t hpet_rtc_interrupt(int irq, return IRQ_HANDLED; } #endif - - -/* - * Suspend/resume part - */ - -#ifdef CONFIG_PM - -static int hpet_suspend(struct sys_device *sys_device, pm_message_t state) -{ - unsigned long cfg = hpet_readl(HPET_CFG); - - cfg &= ~(HPET_CFG_ENABLE|HPET_CFG_LEGACY); - hpet_writel(cfg, HPET_CFG); - - return 0; -} - -static int hpet_resume(struct sys_device *sys_device) -{ - unsigned int id; - - hpet_start_counter(); - - id = hpet_readl(HPET_ID); - - if (id & HPET_ID_LEGSUP) - hpet_enable_int(); - - return 0; -} - -static struct sysdev_class hpet_class = { - set_kset_name("hpet"), - .suspend = hpet_suspend, - .resume = hpet_resume, -}; - -static struct sys_device hpet_device = { - .id = 0, - .cls = &hpet_class, -}; - - -static __init int hpet_register_sysfs(void) -{ - int err; - - if (!is_hpet_capable()) - return 0; - - err = sysdev_class_register(&hpet_class); - - if (!err) { - err = sysdev_register(&hpet_device); - if (err) - sysdev_class_unregister(&hpet_class); - } - - return err; -} - -device_initcall(hpet_register_sysfs); - -#endif Index: linux-2.6.21/arch/i386/kernel/i8253.c =================================================================== --- linux-2.6.21.orig/arch/i386/kernel/i8253.c +++ linux-2.6.21/arch/i386/kernel/i8253.c @@ -3,11 +3,11 @@ * */ #include <linux/clockchips.h> -#include <linux/spinlock.h> +#include <linux/init.h> +#include <linux/interrupt.h> #include <linux/jiffies.h> -#include <linux/sysdev.h> #include <linux/module.h> -#include <linux/init.h> +#include <linux/spinlock.h> #include <asm/smp.h> #include <asm/delay.h> @@ -25,6 +25,24 @@ EXPORT_SYMBOL(i8253_lock); */ struct clock_event_device *global_clock_event; +/* Status of the PIT interrupt */ +static int pit_irq_disabled; + +/* + * Control pit interrupt enable / disable + */ +static void pit_control_irq(int disable) +{ + if (pit_irq_disabled == disable) + return; + + pit_irq_disabled = disable; + if (disable) + disable_irq(0); + else + enable_irq(0); +} + /* * Initialize the PIT timer. * @@ -41,26 +59,23 @@ static void init_pit_timer(enum clock_ev case CLOCK_EVT_MODE_PERIODIC: /* binary, mode 2, LSB/MSB, ch 0 */ outb_p(0x34, PIT_MODE); - udelay(10); outb_p(LATCH & 0xff , PIT_CH0); /* LSB */ - udelay(10); outb(LATCH >> 8 , PIT_CH0); /* MSB */ + pit_control_irq(0); break; - /* - * Avoid unnecessary state transitions, as it confuses - * Geode / Cyrix based boxen. - */ case CLOCK_EVT_MODE_SHUTDOWN: - if (evt->mode == CLOCK_EVT_MODE_UNUSED) - break; case CLOCK_EVT_MODE_UNUSED: - if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN) - break; + pit_control_irq(1); + break; case CLOCK_EVT_MODE_ONESHOT: /* One shot setup */ outb_p(0x38, PIT_MODE); - udelay(10); + pit_control_irq(0); + break; + + case CLOCK_EVT_MODE_RESUME: + /* Nothing to do here */ break; } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6.21/include/linux/clockchips.h =================================================================== --- linux-2.6.21.orig/include/linux/clockchips.h +++ linux-2.6.21/include/linux/clockchips.h @@ -23,6 +23,7 @@ enum clock_event_mode { CLOCK_EVT_MODE_SHUTDOWN, CLOCK_EVT_MODE_PERIODIC, CLOCK_EVT_MODE_ONESHOT, + CLOCK_EVT_MODE_RESUME, }; /* Clock event notification values */ Index: linux-2.6.21/kernel/time/tick-broadcast.c =================================================================== --- linux-2.6.21.orig/kernel/time/tick-broadcast.c +++ linux-2.6.21/kernel/time/tick-broadcast.c @@ -292,7 +292,7 @@ void tick_suspend_broadcast(void) spin_lock_irqsave(&tick_broadcast_lock, flags); bc = tick_broadcast_device.evtdev; - if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC) + if (bc) clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN); spin_unlock_irqrestore(&tick_broadcast_lock, flags); @@ -309,6 +309,8 @@ int tick_resume_broadcast(void) bc = tick_broadcast_device.evtdev; if (bc) { + clockevents_set_mode(bc, CLOCK_EVT_MODE_RESUME); + switch (tick_broadcast_device.mode) { case TICKDEV_MODE_PERIODIC: if(!cpus_empty(tick_broadcast_mask)) Index: linux-2.6.21/kernel/time/tick-common.c =================================================================== --- linux-2.6.21.orig/kernel/time/tick-common.c +++ linux-2.6.21/kernel/time/tick-common.c @@ -318,12 +318,17 @@ static void tick_resume(void) { struct tick_device *td = &__get_cpu_var(tick_cpu_device); unsigned long flags; + int broadcast = tick_resume_broadcast(); spin_lock_irqsave(&tick_device_lock, flags); - if (td->mode == TICKDEV_MODE_PERIODIC) - tick_setup_periodic(td->evtdev, 0); - else - tick_resume_oneshot(); + clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_RESUME); + + if (!broadcast) { + if (td->mode == TICKDEV_MODE_PERIODIC) + tick_setup_periodic(td->evtdev, 0); + else + tick_resume_oneshot(); + } spin_unlock_irqrestore(&tick_device_lock, flags); } @@ -360,8 +365,7 @@ static int tick_notify(struct notifier_b break; case CLOCK_EVT_NOTIFY_RESUME: - if (!tick_resume_broadcast()) - tick_resume(); + tick_resume(); break; default: ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner @ 2007-05-08 10:37 ` Andrew Morton 2007-05-09 5:59 ` Andrew Morton 1 sibling, 0 replies; 61+ messages in thread From: Andrew Morton @ 2007-05-08 10:37 UTC (permalink / raw) To: tglx; +Cc: Ingo Molnar, LKML, John Stultz On Sun, 06 May 2007 17:03:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote: > > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > Fixup the existing users. > > > > > > This one makes the Vaio-of-fun hang during suspend to disk. It gets > > > up to "swsusp: critical section/: done (%d pages copied)" then it > > > freezes. > > > > after trying to reproduce it on 2 boxes without success it did trigger > > some sw-suspend weirdness on a third box :) We are debugging it now. > > find an updated patch below. It fixes the problem on Ingo's > VAIO-of-fun-emulator and I got confirmation from several other affected > users, that the patch series is still solving their problems. yup, that now survives suspend-to-disk and resume, thanks. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner 2007-05-08 10:37 ` Andrew Morton @ 2007-05-09 5:59 ` Andrew Morton 2007-05-09 7:10 ` Andrew Morton 1 sibling, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-09 5:59 UTC (permalink / raw) To: tglx; +Cc: Ingo Molnar, LKML, John Stultz On Sun, 06 May 2007 17:03:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > Andrew, > > On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote: > > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > Fixup the existing users. > > > > > > This one makes the Vaio-of-fun hang during suspend to disk. It gets > > > up to "swsusp: critical section/: done (%d pages copied)" then it > > > freezes. > > > > after trying to reproduce it on 2 boxes without success it did trigger > > some sw-suspend weirdness on a third box :) We are debugging it now. > > find an updated patch below. It fixes the problem on Ingo's > VAIO-of-fun-emulator and I got confirmation from several other affected > users, that the patch series is still solving their problems. > The machine is still hanging with this patch applied. suspend-to-disk gets up to "swsusp: critical section: done (NNN pages copied)" No netconsole, no printk-timestamping. ho hum, I guess I get to debug this. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 5:59 ` Andrew Morton @ 2007-05-09 7:10 ` Andrew Morton 2007-05-09 8:18 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-09 7:10 UTC (permalink / raw) To: tglx, Ingo Molnar, LKML, John Stultz; +Cc: linux-acpi On Tue, 8 May 2007 22:59:20 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > On Sun, 06 May 2007 17:03:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > Andrew, > > > > On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote: > > > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > Fixup the existing users. > > > > > > > > This one makes the Vaio-of-fun hang during suspend to disk. It gets > > > > up to "swsusp: critical section/: done (%d pages copied)" then it > > > > freezes. > > > > > > after trying to reproduce it on 2 boxes without success it did trigger > > > some sw-suspend weirdness on a third box :) We are debugging it now. > > > > find an updated patch below. It fixes the problem on Ingo's > > VAIO-of-fun-emulator and I got confirmation from several other affected > > users, that the patch series is still solving their problems. > > > > The machine is still hanging with this patch applied. > > suspend-to-disk gets up to "swsusp: critical section: done (NNN pages copied)" > > No netconsole, no printk-timestamping. > > ho hum, I guess I get to debug this. It got ugly. We finish swsusp_save() and a few other functions then we go hibernate ->platform_finish ->acpi_hibernation_finish ->acpi_leave_sleep_state ->acpi_evaluate_object and there it dies, in this call: status = acpi_evaluate_object(NULL, METHOD_NAME__WAK, &arg_list, NULL); I wonder how your patch caused that? <debugs further> OK, it gets to the last statement in acpi_evaluate_object(): return_ACPI_STATUS(status); but doesn't hit the printk on return to the caller, acpi_leave_sleep_state(). A working theory would be that something we did trashed the stack in acpi_evaluate_object(). <switches from 8k stacks to 4k. No change> foo. I'm not sure what to do now. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 7:10 ` Andrew Morton @ 2007-05-09 8:18 ` Thomas Gleixner 2007-05-09 8:22 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-09 8:18 UTC (permalink / raw) To: Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 2007-05-09 at 00:10 -0700, Andrew Morton wrote: > > > find an updated patch below. It fixes the problem on Ingo's > > > VAIO-of-fun-emulator and I got confirmation from several other affected > > > users, that the patch series is still solving their problems. > > > > > > > The machine is still hanging with this patch applied. What did change since yesterday ? > yup, that now survives suspend-to-disk and resume, thanks. > > suspend-to-disk gets up to "swsusp: critical section: done (NNN pages copied)" > > > > No netconsole, no printk-timestamping. > > > > ho hum, I guess I get to debug this. > > It got ugly. > > We finish swsusp_save() and a few other functions then we go > > hibernate > ->platform_finish > ->acpi_hibernation_finish > ->acpi_leave_sleep_state > ->acpi_evaluate_object > > and there it dies, in this call: > > status = acpi_evaluate_object(NULL, METHOD_NAME__WAK, &arg_list, NULL); > > I wonder how your patch caused that? hmm, that's more than strange > <debugs further> > > OK, it gets to the last statement in acpi_evaluate_object(): > > return_ACPI_STATUS(status); > > but doesn't hit the printk on return to the caller, > acpi_leave_sleep_state(). > > A working theory would be that something we did trashed the stack in > acpi_evaluate_object(). > > <switches from 8k stacks to 4k. No change> > > foo. I'm not sure what to do now. Hmm. Do we enable interrupts somewhere where we should not ? /me goes looking for such places. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 8:18 ` Thomas Gleixner @ 2007-05-09 8:22 ` Andrew Morton 2007-05-09 8:31 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-09 8:22 UTC (permalink / raw) To: tglx; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 09 May 2007 10:18:17 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Wed, 2007-05-09 at 00:10 -0700, Andrew Morton wrote: > > > > find an updated patch below. It fixes the problem on Ingo's > > > > VAIO-of-fun-emulator and I got confirmation from several other affected > > > > users, that the patch series is still solving their problems. > > > > > > > > > > The machine is still hanging with this patch applied. > > What did change since yesterday ? I suspect I just tested the wrong thing yesterday. Let me recheck just these patches against 2.6.21. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 8:22 ` Andrew Morton @ 2007-05-09 8:31 ` Andrew Morton 2007-05-09 8:59 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-09 8:31 UTC (permalink / raw) To: tglx, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 9 May 2007 01:22:57 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > On Wed, 09 May 2007 10:18:17 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Wed, 2007-05-09 at 00:10 -0700, Andrew Morton wrote: > > > > > find an updated patch below. It fixes the problem on Ingo's > > > > > VAIO-of-fun-emulator and I got confirmation from several other affected > > > > > users, that the patch series is still solving their problems. > > > > > > > > > > > > > The machine is still hanging with this patch applied. > > > > What did change since yesterday ? > > I suspect I just tested the wrong thing yesterday. Let me recheck just > these patches against 2.6.21. yup, same hang with just these three: origin clocksource-fix-resume-logic clockevents-fix-resume-logic-updated-version ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 8:31 ` Andrew Morton @ 2007-05-09 8:59 ` Thomas Gleixner 2007-05-09 11:45 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-09 8:59 UTC (permalink / raw) To: Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote: > > I suspect I just tested the wrong thing yesterday. Let me recheck just > > these patches against 2.6.21. > > yup, same hang with just these three: > > origin > clocksource-fix-resume-logic > clockevents-fix-resume-logic-updated-version I have no idea, how this affects acpi_evaluate_object() tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 8:59 ` Thomas Gleixner @ 2007-05-09 11:45 ` Rafael J. Wysocki 2007-05-09 12:24 ` Thomas Gleixner 2007-05-09 12:52 ` Rafael J. Wysocki 0 siblings, 2 replies; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-09 11:45 UTC (permalink / raw) To: tglx, Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote: > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote: > > > I suspect I just tested the wrong thing yesterday. Let me recheck just > > > these patches against 2.6.21. > > > > yup, same hang with just these three: > > > > origin > > clocksource-fix-resume-logic > > clockevents-fix-resume-logic-updated-version > > I have no idea, how this affects acpi_evaluate_object() I think the problem is that the ACPI code ordering here is broken in a difficult to fix way. Definitely, we shouldn't execute the _BFS method after creating the image and most probably _WAK shouldn't be executed here either. Moreover, acpi_leave_sleep_state() enables the runtime GPEs, which AFAICS is equivalent to allowing ACPI to generate SCIs. I'm not sure if this is a good idea to do such a thing in this particular place. Andrew, could you please apply the appended patch and see if that helps (should apply to -mm2)? Rafael --- NOTE: This is not a complete solution, because it removes the enabling of GPEs from the resume-during-hibernation code path entirely, which probbably is not a good idea in general. --- kernel/power/disk.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6.21/kernel/power/disk.c =================================================================== --- linux-2.6.21.orig/kernel/power/disk.c +++ linux-2.6.21/kernel/power/disk.c @@ -131,7 +131,9 @@ int hibernation_snapshot(int platform_mo } enable_nonboot_cpus(); Resume_devices: - platform_finish(platform_mode); + if (!in_suspend || error) + platform_finish(platform_mode); + device_resume(); resume_console(); Finish: ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 11:45 ` Rafael J. Wysocki @ 2007-05-09 12:24 ` Thomas Gleixner 2007-05-09 13:12 ` Rafael J. Wysocki 2007-05-09 12:52 ` Rafael J. Wysocki 1 sibling, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-09 12:24 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote: > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote: > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote: > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just > > > > these patches against 2.6.21. > > > > > > yup, same hang with just these three: > > > > > > origin > > > clocksource-fix-resume-logic > > > clockevents-fix-resume-logic-updated-version > > > > I have no idea, how this affects acpi_evaluate_object() > > I think the problem is that the ACPI code ordering here is broken in a > difficult to fix way. Any explanation aside of witchcraft why this is affected by the clock event resume changes ? tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 12:24 ` Thomas Gleixner @ 2007-05-09 13:12 ` Rafael J. Wysocki 2007-05-09 13:19 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-09 13:12 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wednesday, 9 May 2007 14:24, Thomas Gleixner wrote: > On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote: > > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote: > > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote: > > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just > > > > > these patches against 2.6.21. > > > > > > > > yup, same hang with just these three: > > > > > > > > origin > > > > clocksource-fix-resume-logic > > > > clockevents-fix-resume-logic-updated-version > > > > > > I have no idea, how this affects acpi_evaluate_object() > > > > I think the problem is that the ACPI code ordering here is broken in a > > difficult to fix way. > > Any explanation aside of witchcraft why this is affected by the clock > event resume changes ? Well, where is unregister_time_interpolator() called from? Rafael ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 13:12 ` Rafael J. Wysocki @ 2007-05-09 13:19 ` Thomas Gleixner 2007-05-09 17:09 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-09 13:19 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 2007-05-09 at 15:12 +0200, Rafael J. Wysocki wrote: > On Wednesday, 9 May 2007 14:24, Thomas Gleixner wrote: > > On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote: > > > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote: > > > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote: > > > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just > > > > > > these patches against 2.6.21. > > > > > > > > > > yup, same hang with just these three: > > > > > > > > > > origin > > > > > clocksource-fix-resume-logic > > > > > clockevents-fix-resume-logic-updated-version > > > > > > > > I have no idea, how this affects acpi_evaluate_object() > > > > > > I think the problem is that the ACPI code ordering here is broken in a > > > difficult to fix way. > > > > Any explanation aside of witchcraft why this is affected by the clock > > event resume changes ? > > Well, where is unregister_time_interpolator() called from? # grep -rn unregister_time_interpolator . ./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti) ./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *); I don't see a caller. i386 does not use time interpolator anyway. # find -iname Kconfig | xargs grep TIME_INTERPOLATION ./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION ./arch/ia64/Kconfig:60:config TIME_INTERPOLATION tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 13:19 ` Thomas Gleixner @ 2007-05-09 17:09 ` Rafael J. Wysocki 2007-05-09 17:15 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-09 17:09 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wednesday, 9 May 2007 15:19, Thomas Gleixner wrote: > On Wed, 2007-05-09 at 15:12 +0200, Rafael J. Wysocki wrote: > > On Wednesday, 9 May 2007 14:24, Thomas Gleixner wrote: > > > On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote: > > > > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote: > > > > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote: > > > > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just > > > > > > > these patches against 2.6.21. > > > > > > > > > > > > yup, same hang with just these three: > > > > > > > > > > > > origin > > > > > > clocksource-fix-resume-logic > > > > > > clockevents-fix-resume-logic-updated-version > > > > > > > > > > I have no idea, how this affects acpi_evaluate_object() > > > > > > > > I think the problem is that the ACPI code ordering here is broken in a > > > > difficult to fix way. > > > > > > Any explanation aside of witchcraft why this is affected by the clock > > > event resume changes ? > > > > Well, where is unregister_time_interpolator() called from? > > # grep -rn unregister_time_interpolator . > ./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti) > ./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *); > > I don't see a caller. i386 does not use time interpolator anyway. > > # find -iname Kconfig | xargs grep TIME_INTERPOLATION > ./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION > ./arch/ia64/Kconfig:60:config TIME_INTERPOLATION But clocksource_resume() has no other caller, AFAICS ... Well, I don't see any explanation that wouldn't involve witchcraft. Rafael ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 17:09 ` Rafael J. Wysocki @ 2007-05-09 17:15 ` Thomas Gleixner 2007-05-09 18:36 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-09 17:15 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 2007-05-09 at 19:09 +0200, Rafael J. Wysocki wrote: > > > Well, where is unregister_time_interpolator() called from? > > > > # grep -rn unregister_time_interpolator . > > ./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti) > > ./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *); > > > > I don't see a caller. i386 does not use time interpolator anyway. > > > > # find -iname Kconfig | xargs grep TIME_INTERPOLATION > > ./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION > > ./arch/ia64/Kconfig:60:config TIME_INTERPOLATION > > But clocksource_resume() has no other caller, AFAICS ... Eeep ? clocksource_resume is called from timekeeping_resume() timestatic int timekeeping_resume(struct sys_device *dev) { unsigned long flags; unsigned long now = read_persistent_clock(); clocksource_resume(); .... } keeping_resume() called via the sysdev resume static struct sysdev_class timekeeping_sysclass = { .resume = timekeeping_resume, .suspend = timekeeping_suspend, set_kset_name("timekeeping"), }; tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 17:15 ` Thomas Gleixner @ 2007-05-09 18:36 ` Rafael J. Wysocki 2007-05-09 20:45 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-09 18:36 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wednesday, 9 May 2007 19:15, Thomas Gleixner wrote: > On Wed, 2007-05-09 at 19:09 +0200, Rafael J. Wysocki wrote: > > > > Well, where is unregister_time_interpolator() called from? > > > > > > # grep -rn unregister_time_interpolator . > > > ./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti) > > > ./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *); > > > > > > I don't see a caller. i386 does not use time interpolator anyway. > > > > > > # find -iname Kconfig | xargs grep TIME_INTERPOLATION > > > ./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION > > > ./arch/ia64/Kconfig:60:config TIME_INTERPOLATION > > > > But clocksource_resume() has no other caller, AFAICS ... > > Eeep ? > > clocksource_resume is called from timekeeping_resume() > > timestatic int timekeeping_resume(struct sys_device *dev) > { > unsigned long flags; > unsigned long now = read_persistent_clock(); > > clocksource_resume(); > .... > } > > keeping_resume() called via the sysdev resume > > static struct sysdev_class timekeeping_sysclass = { > .resume = timekeeping_resume, > .suspend = timekeeping_suspend, > set_kset_name("timekeeping"), > }; Well, apparently, not in -mm2: rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' * kernel/time/timekeeping.c rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c rafael@albercik:~/src/mm/linux-2.6.21-mm2> Hmm? Rafael ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 18:36 ` Rafael J. Wysocki @ 2007-05-09 20:45 ` Thomas Gleixner 2007-05-09 20:53 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-09 20:45 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 2007-05-09 at 20:36 +0200, Rafael J. Wysocki wrote: > Well, apparently, not in -mm2: > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' * > kernel/time/timekeeping.c > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c > rafael@albercik:~/src/mm/linux-2.6.21-mm2> Andrew dropped the patch because it did not work on his jinxed VAIO, but he debugged with the patch applied. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 20:45 ` Thomas Gleixner @ 2007-05-09 20:53 ` Rafael J. Wysocki 2007-05-09 21:26 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-09 20:53 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wednesday, 9 May 2007 22:45, Thomas Gleixner wrote: > On Wed, 2007-05-09 at 20:36 +0200, Rafael J. Wysocki wrote: > > Well, apparently, not in -mm2: > > > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' * > > kernel/time/timekeeping.c > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> > > Andrew dropped the patch because it did not work on his jinxed VAIO, but > he debugged with the patch applied. I see. In that case, since timekeeping_resume() is called via sysdev_resume, then it's executed before acpi_leave_sleep_state() and may very well interfere with the ACPI methods executed from there, depending on what's happening in the cs->resume() callbacks in clocksource_resume(). Greetings, Rafael ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 20:53 ` Rafael J. Wysocki @ 2007-05-09 21:26 ` Thomas Gleixner 2007-05-10 8:46 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-09 21:26 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 2007-05-09 at 22:53 +0200, Rafael J. Wysocki wrote: > On Wednesday, 9 May 2007 22:45, Thomas Gleixner wrote: > > On Wed, 2007-05-09 at 20:36 +0200, Rafael J. Wysocki wrote: > > > Well, apparently, not in -mm2: > > > > > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' * > > > kernel/time/timekeeping.c > > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c > > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> > > > > Andrew dropped the patch because it did not work on his jinxed VAIO, but > > he debugged with the patch applied. > > I see. > > In that case, since timekeeping_resume() is called via sysdev_resume, then it's > executed before acpi_leave_sleep_state() and may very well interfere with the > ACPI methods executed from there, depending on what's happening in the > cs->resume() callbacks in clocksource_resume(). The only difference in the clocksource_resume() path with the patch applied on Andrews laptop is, that the watchdog for the TSC is reset. But that's only touching some variables. The other change is the way how the clock events are resumed. It touches the PIT, local APIC and the interrupt controller. Andrew, can you test the alternative replacement patch for clockevents: Fix resume logic - updated version It does not touch the interrupt controller, it does the PIT restart different. That's a patch from Chris Wright and confirmed to work on the affected machines - except the @%$#! VAIO of course. If that patch makes the problem go away, then we should have a quite good hint what we need to look at. tglx Index: linux-2.6.21-git-x86-64/arch/i386/kernel/apic.c =================================================================== --- linux-2.6.21-git-x86-64.orig/arch/i386/kernel/apic.c +++ linux-2.6.21-git-x86-64/arch/i386/kernel/apic.c @@ -264,6 +264,9 @@ static void lapic_timer_setup(enum clock v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); apic_write_around(APIC_LVTT, v); break; + case CLOCK_EVT_MODE_RESUME: + /* Nothing to do here */ + break; } local_irq_restore(flags); Index: linux-2.6.21-git-x86-64/arch/i386/kernel/hpet.c =================================================================== --- linux-2.6.21-git-x86-64.orig/arch/i386/kernel/hpet.c +++ linux-2.6.21-git-x86-64/arch/i386/kernel/hpet.c @@ -187,6 +187,10 @@ static void hpet_set_mode(enum clock_eve cfg &= ~HPET_TN_ENABLE; hpet_writel(cfg, HPET_T0_CFG); break; + + case CLOCK_EVT_MODE_RESUME: + hpet_enable_int(); + break; } } @@ -217,6 +221,7 @@ static struct clocksource clocksource_hp .mask = HPET_MASK, .shift = HPET_SHIFT, .flags = CLOCK_SOURCE_IS_CONTINUOUS, + .resume = hpet_start_counter, }; /* @@ -291,7 +296,6 @@ int __init hpet_enable(void) clocksource_register(&clocksource_hpet); - if (id & HPET_ID_LEGSUP) { hpet_enable_int(); hpet_reserve_platform_timers(id); @@ -524,68 +528,3 @@ irqreturn_t hpet_rtc_interrupt(int irq, return IRQ_HANDLED; } #endif - - -/* - * Suspend/resume part - */ - -#ifdef CONFIG_PM - -static int hpet_suspend(struct sys_device *sys_device, pm_message_t state) -{ - unsigned long cfg = hpet_readl(HPET_CFG); - - cfg &= ~(HPET_CFG_ENABLE|HPET_CFG_LEGACY); - hpet_writel(cfg, HPET_CFG); - - return 0; -} - -static int hpet_resume(struct sys_device *sys_device) -{ - unsigned int id; - - hpet_start_counter(); - - id = hpet_readl(HPET_ID); - - if (id & HPET_ID_LEGSUP) - hpet_enable_int(); - - return 0; -} - -static struct sysdev_class hpet_class = { - set_kset_name("hpet"), - .suspend = hpet_suspend, - .resume = hpet_resume, -}; - -static struct sys_device hpet_device = { - .id = 0, - .cls = &hpet_class, -}; - - -static __init int hpet_register_sysfs(void) -{ - int err; - - if (!is_hpet_capable()) - return 0; - - err = sysdev_class_register(&hpet_class); - - if (!err) { - err = sysdev_register(&hpet_device); - if (err) - sysdev_class_unregister(&hpet_class); - } - - return err; -} - -device_initcall(hpet_register_sysfs); - -#endif Index: linux-2.6.21-git-x86-64/arch/i386/kernel/i8253.c =================================================================== --- linux-2.6.21-git-x86-64.orig/arch/i386/kernel/i8253.c +++ linux-2.6.21-git-x86-64/arch/i386/kernel/i8253.c @@ -3,11 +3,11 @@ * */ #include <linux/clockchips.h> -#include <linux/spinlock.h> +#include <linux/init.h> +#include <linux/interrupt.h> #include <linux/jiffies.h> -#include <linux/sysdev.h> #include <linux/module.h> -#include <linux/init.h> +#include <linux/spinlock.h> #include <asm/smp.h> #include <asm/delay.h> @@ -41,26 +41,24 @@ static void init_pit_timer(enum clock_ev case CLOCK_EVT_MODE_PERIODIC: /* binary, mode 2, LSB/MSB, ch 0 */ outb_p(0x34, PIT_MODE); - udelay(10); outb_p(LATCH & 0xff , PIT_CH0); /* LSB */ - udelay(10); outb(LATCH >> 8 , PIT_CH0); /* MSB */ break; - /* - * Avoid unnecessary state transitions, as it confuses - * Geode / Cyrix based boxen. - */ case CLOCK_EVT_MODE_SHUTDOWN: - if (evt->mode == CLOCK_EVT_MODE_UNUSED) - break; case CLOCK_EVT_MODE_UNUSED: - if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN) - break; + outb_p(0x30, PIT_MODE); + outb_p(0, PIT_CH0); /* LSB */ + outb_p(0, PIT_CH0); /* MSB */ + break; + case CLOCK_EVT_MODE_ONESHOT: /* One shot setup */ outb_p(0x38, PIT_MODE); - udelay(10); + break; + + case CLOCK_EVT_MODE_RESUME: + /* Nothing to do here */ break; } spin_unlock_irqrestore(&i8253_lock, flags); Index: linux-2.6.21-git-x86-64/include/linux/clockchips.h =================================================================== --- linux-2.6.21-git-x86-64.orig/include/linux/clockchips.h +++ linux-2.6.21-git-x86-64/include/linux/clockchips.h @@ -23,6 +23,7 @@ enum clock_event_mode { CLOCK_EVT_MODE_SHUTDOWN, CLOCK_EVT_MODE_PERIODIC, CLOCK_EVT_MODE_ONESHOT, + CLOCK_EVT_MODE_RESUME, }; /* Clock event notification values */ Index: linux-2.6.21-git-x86-64/kernel/time/tick-broadcast.c =================================================================== --- linux-2.6.21-git-x86-64.orig/kernel/time/tick-broadcast.c +++ linux-2.6.21-git-x86-64/kernel/time/tick-broadcast.c @@ -292,7 +292,7 @@ void tick_suspend_broadcast(void) spin_lock_irqsave(&tick_broadcast_lock, flags); bc = tick_broadcast_device.evtdev; - if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC) + if (bc) clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN); spin_unlock_irqrestore(&tick_broadcast_lock, flags); @@ -309,6 +309,8 @@ int tick_resume_broadcast(void) bc = tick_broadcast_device.evtdev; if (bc) { + clockevents_set_mode(bc, CLOCK_EVT_MODE_RESUME); + switch (tick_broadcast_device.mode) { case TICKDEV_MODE_PERIODIC: if(!cpus_empty(tick_broadcast_mask)) Index: linux-2.6.21-git-x86-64/kernel/time/tick-common.c =================================================================== --- linux-2.6.21-git-x86-64.orig/kernel/time/tick-common.c +++ linux-2.6.21-git-x86-64/kernel/time/tick-common.c @@ -318,12 +318,17 @@ static void tick_resume(void) { struct tick_device *td = &__get_cpu_var(tick_cpu_device); unsigned long flags; + int broadcast = tick_resume_broadcast(); spin_lock_irqsave(&tick_device_lock, flags); - if (td->mode == TICKDEV_MODE_PERIODIC) - tick_setup_periodic(td->evtdev, 0); - else - tick_resume_oneshot(); + clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_RESUME); + + if (!broadcast) { + if (td->mode == TICKDEV_MODE_PERIODIC) + tick_setup_periodic(td->evtdev, 0); + else + tick_resume_oneshot(); + } spin_unlock_irqrestore(&tick_device_lock, flags); } @@ -360,8 +365,7 @@ static int tick_notify(struct notifier_b break; case CLOCK_EVT_NOTIFY_RESUME: - if (!tick_resume_broadcast()) - tick_resume(); + tick_resume(); break; default: ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 21:26 ` Thomas Gleixner @ 2007-05-10 8:46 ` Andrew Morton 2007-05-10 8:55 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-10 8:46 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 09 May 2007 23:26:22 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > Andrew, > > can you test the alternative replacement patch for > > clockevents: Fix resume logic - updated version > > It does not touch the interrupt controller, it does the PIT restart > different. That's a patch from Chris Wright and confirmed to work on the > affected machines - except the @%$#! VAIO of course. > > If that patch makes the problem go away, then we should have a quite > good hint what we need to look at. No joy, sorry. It still hangs at the last statement in acpi_evaluate_object(). ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-10 8:46 ` Andrew Morton @ 2007-05-10 8:55 ` Thomas Gleixner 2007-05-10 9:18 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-10 8:55 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Thu, 2007-05-10 at 01:46 -0700, Andrew Morton wrote: > On Wed, 09 May 2007 23:26:22 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > Andrew, > > > > can you test the alternative replacement patch for > > > > clockevents: Fix resume logic - updated version > > > > It does not touch the interrupt controller, it does the PIT restart > > different. That's a patch from Chris Wright and confirmed to work on the > > affected machines - except the @%$#! VAIO of course. > > > > If that patch makes the problem go away, then we should have a quite > > good hint what we need to look at. > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object(). Can you add "nolapic_timer" to the command line please ? tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-10 8:55 ` Thomas Gleixner @ 2007-05-10 9:18 ` Andrew Morton 2007-05-10 9:27 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-10 9:18 UTC (permalink / raw) To: Thomas Gleixner Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Thu, 10 May 2007 10:55:45 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Thu, 2007-05-10 at 01:46 -0700, Andrew Morton wrote: > > On Wed, 09 May 2007 23:26:22 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > Andrew, > > > > > > can you test the alternative replacement patch for > > > > > > clockevents: Fix resume logic - updated version > > > > > > It does not touch the interrupt controller, it does the PIT restart > > > different. That's a patch from Chris Wright and confirmed to work on the > > > affected machines - except the @%$#! VAIO of course. > > > > > > If that patch makes the problem go away, then we should have a quite > > > good hint what we need to look at. > > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object(). > > Can you add "nolapic_timer" to the command line please ? > That works. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-10 9:18 ` Andrew Morton @ 2007-05-10 9:27 ` Thomas Gleixner 2007-05-10 20:12 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-10 9:27 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote: > > > > If that patch makes the problem go away, then we should have a quite > > > > good hint what we need to look at. > > > > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object(). > > > > Can you add "nolapic_timer" to the command line please ? > > > > That works. Ok, that boils it down to the change, which affects the lapic timer resume. It does nothing else, than fiddling in some APIC registers, but I don't see how this affects the ACPI stuff. This smells extremly fishy. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-10 9:27 ` Thomas Gleixner @ 2007-05-10 20:12 ` Rafael J. Wysocki 2007-05-11 16:47 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-10 20:12 UTC (permalink / raw) To: Thomas Gleixner, Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi On Thursday, 10 May 2007 11:27, Thomas Gleixner wrote: > On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote: > > > > > If that patch makes the problem go away, then we should have a quite > > > > > good hint what we need to look at. > > > > > > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object(). > > > > > > Can you add "nolapic_timer" to the command line please ? > > > > > > > That works. > > Ok, that boils it down to the change, which affects the lapic timer > resume. It does nothing else, than fiddling in some APIC registers, but > I don't see how this affects the ACPI stuff. This smells extremly fishy. Hmm, I wonder if it might be related to the execution of the _GTS ACPI control method in acpi_enter_sleep_state_prep() in violation of the spec. Andrew, could you please try to apply the following change on top of the previous ones and see if the machine still hangs in the same place without "nolapic_timer"? --- drivers/acpi/hardware/hwsleep.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c =================================================================== --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep( return_ACPI_STATUS(status); } - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) { return_ACPI_STATUS(status); - } + }*/ /* Setup the argument to _SST */ ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-10 20:12 ` Rafael J. Wysocki @ 2007-05-11 16:47 ` Andrew Morton 2007-05-11 20:10 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-11 16:47 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi On Thu, 10 May 2007 22:12:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > On Thursday, 10 May 2007 11:27, Thomas Gleixner wrote: > > On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote: > > > > > > If that patch makes the problem go away, then we should have a quite > > > > > > good hint what we need to look at. > > > > > > > > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object(). > > > > > > > > Can you add "nolapic_timer" to the command line please ? > > > > > > > > > > That works. > > > > Ok, that boils it down to the change, which affects the lapic timer > > resume. It does nothing else, than fiddling in some APIC registers, but > > I don't see how this affects the ACPI stuff. This smells extremly fishy. > > Hmm, I wonder if it might be related to the execution of the _GTS ACPI control > method in acpi_enter_sleep_state_prep() in violation of the spec. > > Andrew, could you please try to apply the following change on top of the > previous ones and see if the machine still hangs in the same place without > "nolapic_timer"? > > --- > drivers/acpi/hardware/hwsleep.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c > =================================================================== > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep( > return_ACPI_STATUS(status); > } > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) { > return_ACPI_STATUS(status); > - } > + }*/ > > /* Setup the argument to _SST */ > I tested this against Thomas's original patch (below). Still hangs, in the same way. From: Thomas Gleixner <tglx@linutronix.de> We need to make sure that the clockevent devices are resumed before the tick is resumed. The current resume logic does not guarantee this. Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock event devices before resuming the tick / oneshot functionality. Fixup the existing users. Fix the PIT handling of CLOCK_EVT_SHUTDOWN by disabling the interrupt instead of switching to oneshot mode, as this causes slowness on a couple of systems. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/i386/kernel/apic.c | 3 + arch/i386/kernel/hpet.c | 71 ++------------------------------- arch/i386/kernel/i8253.c | 43 +++++++++++++------ include/linux/clockchips.h | 1 kernel/time/tick-broadcast.c | 4 + kernel/time/tick-common.c | 16 ++++--- 6 files changed, 51 insertions(+), 87 deletions(-) diff -puN arch/i386/kernel/apic.c~clockevents-fix-resume-logic-updated-version arch/i386/kernel/apic.c --- a/arch/i386/kernel/apic.c~clockevents-fix-resume-logic-updated-version +++ a/arch/i386/kernel/apic.c @@ -263,6 +263,9 @@ static void lapic_timer_setup(enum clock v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); apic_write_around(APIC_LVTT, v); break; + case CLOCK_EVT_MODE_RESUME: + /* Nothing to do here */ + break; } local_irq_restore(flags); diff -puN arch/i386/kernel/hpet.c~clockevents-fix-resume-logic-updated-version arch/i386/kernel/hpet.c --- a/arch/i386/kernel/hpet.c~clockevents-fix-resume-logic-updated-version +++ a/arch/i386/kernel/hpet.c @@ -187,6 +187,10 @@ static void hpet_set_mode(enum clock_eve cfg &= ~HPET_TN_ENABLE; hpet_writel(cfg, HPET_T0_CFG); break; + + case CLOCK_EVT_MODE_RESUME: + hpet_enable_int(); + break; } } @@ -217,6 +221,7 @@ static struct clocksource clocksource_hp .mask = HPET_MASK, .shift = HPET_SHIFT, .flags = CLOCK_SOURCE_IS_CONTINUOUS, + .resume = hpet_start_counter, }; /* @@ -291,7 +296,6 @@ int __init hpet_enable(void) clocksource_register(&clocksource_hpet); - if (id & HPET_ID_LEGSUP) { hpet_enable_int(); hpet_reserve_platform_timers(id); @@ -524,68 +528,3 @@ irqreturn_t hpet_rtc_interrupt(int irq, return IRQ_HANDLED; } #endif - - -/* - * Suspend/resume part - */ - -#ifdef CONFIG_PM - -static int hpet_suspend(struct sys_device *sys_device, pm_message_t state) -{ - unsigned long cfg = hpet_readl(HPET_CFG); - - cfg &= ~(HPET_CFG_ENABLE|HPET_CFG_LEGACY); - hpet_writel(cfg, HPET_CFG); - - return 0; -} - -static int hpet_resume(struct sys_device *sys_device) -{ - unsigned int id; - - hpet_start_counter(); - - id = hpet_readl(HPET_ID); - - if (id & HPET_ID_LEGSUP) - hpet_enable_int(); - - return 0; -} - -static struct sysdev_class hpet_class = { - set_kset_name("hpet"), - .suspend = hpet_suspend, - .resume = hpet_resume, -}; - -static struct sys_device hpet_device = { - .id = 0, - .cls = &hpet_class, -}; - - -static __init int hpet_register_sysfs(void) -{ - int err; - - if (!is_hpet_capable()) - return 0; - - err = sysdev_class_register(&hpet_class); - - if (!err) { - err = sysdev_register(&hpet_device); - if (err) - sysdev_class_unregister(&hpet_class); - } - - return err; -} - -device_initcall(hpet_register_sysfs); - -#endif diff -puN arch/i386/kernel/i8253.c~clockevents-fix-resume-logic-updated-version arch/i386/kernel/i8253.c --- a/arch/i386/kernel/i8253.c~clockevents-fix-resume-logic-updated-version +++ a/arch/i386/kernel/i8253.c @@ -3,11 +3,11 @@ * */ #include <linux/clockchips.h> -#include <linux/spinlock.h> +#include <linux/init.h> +#include <linux/interrupt.h> #include <linux/jiffies.h> -#include <linux/sysdev.h> #include <linux/module.h> -#include <linux/init.h> +#include <linux/spinlock.h> #include <asm/smp.h> #include <asm/delay.h> @@ -26,6 +26,24 @@ EXPORT_SYMBOL(i8253_lock); */ struct clock_event_device *global_clock_event; +/* Status of the PIT interrupt */ +static int pit_irq_disabled; + +/* + * Control pit interrupt enable / disable + */ +static void pit_control_irq(int disable) +{ + if (pit_irq_disabled == disable) + return; + + pit_irq_disabled = disable; + if (disable) + disable_irq(0); + else + enable_irq(0); +} + /* * Initialize the PIT timer. * @@ -42,26 +60,23 @@ static void init_pit_timer(enum clock_ev case CLOCK_EVT_MODE_PERIODIC: /* binary, mode 2, LSB/MSB, ch 0 */ outb_p(0x34, PIT_MODE); - udelay(10); outb_p(LATCH & 0xff , PIT_CH0); /* LSB */ - udelay(10); outb(LATCH >> 8 , PIT_CH0); /* MSB */ + pit_control_irq(0); break; - /* - * Avoid unnecessary state transitions, as it confuses - * Geode / Cyrix based boxen. - */ case CLOCK_EVT_MODE_SHUTDOWN: - if (evt->mode == CLOCK_EVT_MODE_UNUSED) - break; case CLOCK_EVT_MODE_UNUSED: - if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN) - break; + pit_control_irq(1); + break; case CLOCK_EVT_MODE_ONESHOT: /* One shot setup */ outb_p(0x38, PIT_MODE); - udelay(10); + pit_control_irq(0); + break; + + case CLOCK_EVT_MODE_RESUME: + /* Nothing to do here */ break; } spin_unlock_irqrestore(&i8253_lock, flags); diff -puN include/linux/clockchips.h~clockevents-fix-resume-logic-updated-version include/linux/clockchips.h --- a/include/linux/clockchips.h~clockevents-fix-resume-logic-updated-version +++ a/include/linux/clockchips.h @@ -23,6 +23,7 @@ enum clock_event_mode { CLOCK_EVT_MODE_SHUTDOWN, CLOCK_EVT_MODE_PERIODIC, CLOCK_EVT_MODE_ONESHOT, + CLOCK_EVT_MODE_RESUME, }; /* Clock event notification values */ diff -puN kernel/time/tick-broadcast.c~clockevents-fix-resume-logic-updated-version kernel/time/tick-broadcast.c --- a/kernel/time/tick-broadcast.c~clockevents-fix-resume-logic-updated-version +++ a/kernel/time/tick-broadcast.c @@ -292,7 +292,7 @@ void tick_suspend_broadcast(void) spin_lock_irqsave(&tick_broadcast_lock, flags); bc = tick_broadcast_device.evtdev; - if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC) + if (bc) clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN); spin_unlock_irqrestore(&tick_broadcast_lock, flags); @@ -309,6 +309,8 @@ int tick_resume_broadcast(void) bc = tick_broadcast_device.evtdev; if (bc) { + clockevents_set_mode(bc, CLOCK_EVT_MODE_RESUME); + switch (tick_broadcast_device.mode) { case TICKDEV_MODE_PERIODIC: if(!cpus_empty(tick_broadcast_mask)) diff -puN kernel/time/tick-common.c~clockevents-fix-resume-logic-updated-version kernel/time/tick-common.c --- a/kernel/time/tick-common.c~clockevents-fix-resume-logic-updated-version +++ a/kernel/time/tick-common.c @@ -318,12 +318,17 @@ static void tick_resume(void) { struct tick_device *td = &__get_cpu_var(tick_cpu_device); unsigned long flags; + int broadcast = tick_resume_broadcast(); spin_lock_irqsave(&tick_device_lock, flags); - if (td->mode == TICKDEV_MODE_PERIODIC) - tick_setup_periodic(td->evtdev, 0); - else - tick_resume_oneshot(); + clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_RESUME); + + if (!broadcast) { + if (td->mode == TICKDEV_MODE_PERIODIC) + tick_setup_periodic(td->evtdev, 0); + else + tick_resume_oneshot(); + } spin_unlock_irqrestore(&tick_device_lock, flags); } @@ -360,8 +365,7 @@ static int tick_notify(struct notifier_b break; case CLOCK_EVT_NOTIFY_RESUME: - if (!tick_resume_broadcast()) - tick_resume(); + tick_resume(); break; default: _ ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-11 16:47 ` Andrew Morton @ 2007-05-11 20:10 ` Rafael J. Wysocki 2007-05-11 20:28 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-11 20:10 UTC (permalink / raw) To: Andrew Morton; +Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi On Friday, 11 May 2007 18:47, Andrew Morton wrote: > On Thu, 10 May 2007 22:12:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > On Thursday, 10 May 2007 11:27, Thomas Gleixner wrote: > > > On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote: > > > > > > > If that patch makes the problem go away, then we should have a quite > > > > > > > good hint what we need to look at. > > > > > > > > > > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object(). > > > > > > > > > > Can you add "nolapic_timer" to the command line please ? > > > > > > > > > > > > > That works. > > > > > > Ok, that boils it down to the change, which affects the lapic timer > > > resume. It does nothing else, than fiddling in some APIC registers, but > > > I don't see how this affects the ACPI stuff. This smells extremly fishy. > > > > Hmm, I wonder if it might be related to the execution of the _GTS ACPI control > > method in acpi_enter_sleep_state_prep() in violation of the spec. > > > > Andrew, could you please try to apply the following change on top of the > > previous ones and see if the machine still hangs in the same place without > > "nolapic_timer"? > > > > --- > > drivers/acpi/hardware/hwsleep.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > =================================================================== > > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c > > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep( > > return_ACPI_STATUS(status); > > } > > > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) { > > return_ACPI_STATUS(status); > > - } > > + }*/ > > > > /* Setup the argument to _SST */ > > > > I tested this against Thomas's original patch (below). Still hangs, in the > same way. Well, I'm out of ideas. :-( Could you please send me the Vaio's DSDT? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-11 20:10 ` Rafael J. Wysocki @ 2007-05-11 20:28 ` Andrew Morton 2007-05-11 21:02 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-11 20:28 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi On Fri, 11 May 2007 22:10:24 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > > =================================================================== > > > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c > > > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep( > > > return_ACPI_STATUS(status); > > > } > > > > > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) { > > > return_ACPI_STATUS(status); > > > - } > > > + }*/ > > > > > > /* Setup the argument to _SST */ > > > > > > > I tested this against Thomas's original patch (below). Still hangs, in the > > same way. > > Well, I'm out of ideas. :-( > > Could you please send me the Vaio's DSDT? Maybe I should send you the Vaio ;) I have this 1999-era pre-ACPI, pre-everything-else Dual-PIII-based Supermicro machine and it just *always* works. Sigh. hm, Fedora don't seem to want to give me an RPM which contains acpidump and all the yum servers are featuring scrogged checksums. I could build it, I guess, but there's a principle involved ;) http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-11 20:28 ` Andrew Morton @ 2007-05-11 21:02 ` Rafael J. Wysocki 2007-05-11 21:09 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-11 21:02 UTC (permalink / raw) To: Andrew Morton; +Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi On Friday, 11 May 2007 22:28, Andrew Morton wrote: > On Fri, 11 May 2007 22:10:24 +0200 > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > > > =================================================================== > > > > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c > > > > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > > > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep( > > > > return_ACPI_STATUS(status); > > > > } > > > > > > > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > > > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > > > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) { > > > > return_ACPI_STATUS(status); > > > > - } > > > > + }*/ > > > > > > > > /* Setup the argument to _SST */ > > > > > > > > > > I tested this against Thomas's original patch (below). Still hangs, in the > > > same way. > > > > Well, I'm out of ideas. :-( > > > > Could you please send me the Vaio's DSDT? > > Maybe I should send you the Vaio ;) Well, machines with so much potential of exposing problems are priceless. ;-) > I have this 1999-era pre-ACPI, pre-everything-else Dual-PIII-based > Supermicro machine and it just *always* works. Sigh. > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and > all the yum servers are featuring scrogged checksums. I could build it, I > guess, but there's a principle involved ;) > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK? Yes, thanks. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-11 21:02 ` Rafael J. Wysocki @ 2007-05-11 21:09 ` Rafael J. Wysocki 2007-05-12 6:56 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-11 21:09 UTC (permalink / raw) To: Andrew Morton; +Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi On Friday, 11 May 2007 23:02, Rafael J. Wysocki wrote: > On Friday, 11 May 2007 22:28, Andrew Morton wrote: > > On Fri, 11 May 2007 22:10:24 +0200 > > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > > > > =================================================================== > > > > > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c > > > > > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c > > > > > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep( > > > > > return_ACPI_STATUS(status); > > > > > } > > > > > > > > > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > > > > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL); > > > > > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) { > > > > > return_ACPI_STATUS(status); > > > > > - } > > > > > + }*/ > > > > > > > > > > /* Setup the argument to _SST */ > > > > > > > > > > > > > I tested this against Thomas's original patch (below). Still hangs, in the > > > > same way. > > > > > > Well, I'm out of ideas. :-( > > > > > > Could you please send me the Vaio's DSDT? > > > > Maybe I should send you the Vaio ;) > > Well, machines with so much potential of exposing problems are priceless. ;-) > > > I have this 1999-era pre-ACPI, pre-everything-else Dual-PIII-based > > Supermicro machine and it just *always* works. Sigh. > > > > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and > > all the yum servers are featuring scrogged checksums. I could build it, I > > guess, but there's a principle involved ;) > > > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK? > > Yes, thanks. Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the hibernation? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-11 21:09 ` Rafael J. Wysocki @ 2007-05-12 6:56 ` Andrew Morton 2007-05-12 8:46 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-12 6:56 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and > > > all the yum servers are featuring scrogged checksums. I could build it, I > > > guess, but there's a principle involved ;) > > > > > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK? > > > > Yes, thanks. > > Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the > hibernation? That didn't change the behaviour. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 6:56 ` Andrew Morton @ 2007-05-12 8:46 ` Thomas Gleixner 2007-05-12 9:00 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 8:46 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Fri, 2007-05-11 at 23:56 -0700, Andrew Morton wrote: > On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > > > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and > > > > all the yum servers are featuring scrogged checksums. I could build it, I > > > > guess, but there's a principle involved ;) > > > > > > > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK? > > > > > > Yes, thanks. > > > > Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the > > hibernation? > > That didn't change the behaviour. Andrew, can you try the desperate witchcraft patch below ? tglx Index: linux-2.6.21/arch/i386/kernel/apic.c =================================================================== --- linux-2.6.21.orig/arch/i386/kernel/apic.c +++ linux-2.6.21/arch/i386/kernel/apic.c @@ -238,9 +238,13 @@ static void lapic_timer_setup(enum clock break; case CLOCK_EVT_MODE_UNUSED: case CLOCK_EVT_MODE_SHUTDOWN: - v = apic_read(APIC_LVTT); - v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); - apic_write_around(APIC_LVTT, v); + + if (evt->mode == CLOCK_EVT_MODE_PERIODIC || + evt->mode == CLOCK_EVT_MODE_ONESHOT) { + v = apic_read(APIC_LVTT); + v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); + apic_write_around(APIC_LVTT, v); + } break; case CLOCK_EVT_MODE_RESUME: /* Nothing to do here */ ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 8:46 ` Thomas Gleixner @ 2007-05-12 9:00 ` Andrew Morton 2007-05-12 9:18 ` Thomas Gleixner 2007-05-13 16:12 ` Thomas Gleixner 0 siblings, 2 replies; 61+ messages in thread From: Andrew Morton @ 2007-05-12 9:00 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 12 May 2007 10:46:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Fri, 2007-05-11 at 23:56 -0700, Andrew Morton wrote: > > On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > > > > > > > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and > > > > > all the yum servers are featuring scrogged checksums. I could build it, I > > > > > guess, but there's a principle involved ;) > > > > > > > > > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK? > > > > > > > > Yes, thanks. > > > > > > Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the > > > hibernation? > > > > That didn't change the behaviour. > > Andrew, > > can you try the desperate witchcraft patch below ? > > tglx > > Index: linux-2.6.21/arch/i386/kernel/apic.c > =================================================================== > --- linux-2.6.21.orig/arch/i386/kernel/apic.c > +++ linux-2.6.21/arch/i386/kernel/apic.c > @@ -238,9 +238,13 @@ static void lapic_timer_setup(enum clock > break; > case CLOCK_EVT_MODE_UNUSED: > case CLOCK_EVT_MODE_SHUTDOWN: > - v = apic_read(APIC_LVTT); > - v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); > - apic_write_around(APIC_LVTT, v); > + > + if (evt->mode == CLOCK_EVT_MODE_PERIODIC || > + evt->mode == CLOCK_EVT_MODE_ONESHOT) { > + v = apic_read(APIC_LVTT); > + v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR); > + apic_write_around(APIC_LVTT, v); > + } > break; > case CLOCK_EVT_MODE_RESUME: > /* Nothing to do here */ > Still hangs in the same fashion, sorry. It's peculiar that the hang happens when acpi_evaluate_object() hits its return statement. Any theories there? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 9:00 ` Andrew Morton @ 2007-05-12 9:18 ` Thomas Gleixner 2007-05-12 10:07 ` Andrew Morton 2007-05-13 16:12 ` Thomas Gleixner 1 sibling, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 9:18 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote: > > Still hangs in the same fashion, sorry. I did not really expect that it fixes the problem. It just restored the local APIC suspend/resume register fiddling which we had before the resume logic patch. > It's peculiar that the hang happens when acpi_evaluate_object() hits its > return statement. Any theories there? Only stack or memory corruption come into mind, but I have no clue how this is related to the resume logic changes. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 9:18 ` Thomas Gleixner @ 2007-05-12 10:07 ` Andrew Morton 2007-05-12 11:44 ` Thomas Gleixner ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Andrew Morton @ 2007-05-12 10:07 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 12 May 2007 11:18:09 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > It's peculiar that the hang happens when acpi_evaluate_object() hits its > > return statement. Any theories there? > > Only stack or memory corruption come into mind, but I have no clue how > this is related to the resume logic changes. So I had the brilliant idea of turning on some kernel debugging. It's a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC. [ 73.533454] swsusp: Basic memory bitmaps created [ 73.550429] Stopping tasks ... BUG: at kernel/lockdep.c:2414 check_flags() [ 73.550988] [<c0104c14>] show_trace_log_lvl+0x1a/0x30 [ 73.551143] [<c0105769>] show_trace+0x12/0x14 [ 73.551279] [<c01057c1>] dump_stack+0x15/0x17 [ 73.551412] [<c0132732>] check_flags+0x93/0x13d [ 73.551554] [<c0135558>] lock_acquire+0x28/0x7f [ 73.551691] [<c0310e35>] _spin_lock+0x2b/0x38 [ 73.551827] [<c013dc43>] refrigerator+0x16/0xc7 [ 73.551965] [<c0125d2e>] get_signal_to_deliver+0x32/0x387 [ 73.552124] [<c010336d>] do_notify_resume+0x91/0x6a9 [ 73.552271] [<c0103df1>] work_notifysig+0x13/0x1a [ 73.552413] ======================= [ 73.552507] irq event stamp: 3075 [ 73.552595] hardirqs last enabled at (3075): [<c0103e51>] syscall_exit_work+0x11/0x26 [ 73.552821] hardirqs last disabled at (3074): [<c0103d35>] syscall_exit+0x9/0x1a [ 73.553046] softirqs last enabled at (2778): [<c01209f2>] __do_softirq+0x92/0x9a [ 73.553255] softirqs last disabled at (2693): [<c0120a27>] do_softirq+0x2d/0x46 [ 73.559504] done. [ 73.559569] Shrinking memory... \b-\bdone (0 pages freed) [ 73.646511] Freed 0 kbytes in 0.08 seconds (0.00 MB/s) [ 73.649595] platform sonypi: freeze [ 73.649707] platform bluetooth: freeze [ 73.649817] usb_endpoint usbdev5.1_ep81: PM: suspend 0->1, parent 5-0:1.0 already 2 [ 73.650023] hub 5-0:1.0: PM: suspend 2-->1 <snippage> [ 73.739499] ipw2200 0000:06:0b.0: freeze [ 73.743860] eth1: Going into suspend... [ 73.748444] e100 0000:06:08.0: freeze at this point I lost netconsole (earlier testing was without netconsole btw) The lockdep spew is coming out of here: static void check_flags(unsigned long flags) { #if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS) if (!debug_locks) return; if (irqs_disabled_flags(flags)) --> DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled); else DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled); and the callsite is: void refrigerator(void) { /* Hmm, should we be allowed to suspend when there are realtime processes around? */ long save; --> task_lock(current); if (freezing(current)) { frozen_process(); task_unlock(current); } else { I don't really know what lockdep is complaining about there. I assume I'm not supposed to, given that whoever wrote that couldn't be bothered documenting any of it. I _think_ it means that lockdep believes that local irqs are enabled (according to its state tracking), only it turns out that they're not. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 10:07 ` Andrew Morton @ 2007-05-12 11:44 ` Thomas Gleixner 2007-05-12 16:51 ` Andrew Morton 2007-05-12 11:52 ` Thomas Gleixner 2007-05-15 16:52 ` Pavel Machek 2 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 11:44 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 03:07 -0700, Andrew Morton wrote: > On Sat, 12 May 2007 11:18:09 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > > It's peculiar that the hang happens when acpi_evaluate_object() hits its > > > return statement. Any theories there? > > > > Only stack or memory corruption come into mind, but I have no clue how > > this is related to the resume logic changes. > > So I had the brilliant idea of turning on some kernel debugging. It's > a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC. Really brilliant. I tried to reproduce your problem and stumbled across something else. tglx -------------------------------> Subject: clocksource fix lock order in the resume path lockdep complains about the lock nesting of clocksource and watchdog lock in the resume path. Move watchdog resume out of the clocksource lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Index: linux-2.6.21/kernel/time/clocksource.c =================================================================== --- linux-2.6.21.orig/kernel/time/clocksource.c +++ linux-2.6.21/kernel/time/clocksource.c @@ -151,9 +151,11 @@ static void clocksource_watchdog(unsigne } static void clocksource_resume_watchdog(void) { - spin_lock(&watchdog_lock); + unsigned long flags; + + spin_lock_irqsave(&watchdog_lock, flags); watchdog_resumed = 1; - spin_unlock(&watchdog_lock); + spin_unlock_irqrestore(&watchdog_lock, flags); } static void clocksource_check_watchdog(struct clocksource *cs) @@ -224,9 +226,9 @@ void clocksource_resume(void) cs->resume(); } - clocksource_resume_watchdog(); - spin_unlock_irqrestore(&clocksource_lock, flags); + + clocksource_resume_watchdog(); } /** ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 11:44 ` Thomas Gleixner @ 2007-05-12 16:51 ` Andrew Morton 2007-05-12 17:01 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-12 16:51 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 12 May 2007 13:44:13 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > lockdep complains about the lock nesting of clocksource and watchdog > lock in the resume path. Move watchdog resume out of the clocksource > lock. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > > Index: linux-2.6.21/kernel/time/clocksource.c > =================================================================== > --- linux-2.6.21.orig/kernel/time/clocksource.c > +++ linux-2.6.21/kernel/time/clocksource.c > @@ -151,9 +151,11 @@ static void clocksource_watchdog(unsigne > } > static void clocksource_resume_watchdog(void) > { > - spin_lock(&watchdog_lock); > + unsigned long flags; > + > + spin_lock_irqsave(&watchdog_lock, flags); > watchdog_resumed = 1; > - spin_unlock(&watchdog_lock); > + spin_unlock_irqrestore(&watchdog_lock, flags); > } > > static void clocksource_check_watchdog(struct clocksource *cs) > @@ -224,9 +226,9 @@ void clocksource_resume(void) > cs->resume(); > } > > - clocksource_resume_watchdog(); > - > spin_unlock_irqrestore(&clocksource_lock, flags); > + > + clocksource_resume_watchdog(); > } > The locking in clocksource_resume_watchdog looks pretty pointless anyway. Can't we just delete it? The only thing it can race against is, conceivably, resumed = watchdog_resumed; if (unlikely(resumed)) watchdog_resumed = 0; which could be solved by using test_and_clear_bit(). Does anyone have any theories about my lockdep warning? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 16:51 ` Andrew Morton @ 2007-05-12 17:01 ` Thomas Gleixner 2007-05-12 17:23 ` Andrew Morton 2007-05-12 18:49 ` Thomas Gleixner 0 siblings, 2 replies; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 17:01 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 09:51 -0700, Andrew Morton wrote: > The locking in clocksource_resume_watchdog looks pretty pointless anyway. > Can't we just delete it? > > The only thing it can race against is, conceivably, > > resumed = watchdog_resumed; > if (unlikely(resumed)) > watchdog_resumed = 0; > > which could be solved by using test_and_clear_bit(). True. I'll redo it. > Does anyone have any theories about my lockdep warning? Can you upload a snapshot of your current queue ? tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 17:01 ` Thomas Gleixner @ 2007-05-12 17:23 ` Andrew Morton 2007-05-12 19:36 ` Thomas Gleixner 2007-05-12 19:56 ` Thomas Gleixner 2007-05-12 18:49 ` Thomas Gleixner 1 sibling, 2 replies; 61+ messages in thread From: Andrew Morton @ 2007-05-12 17:23 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 12 May 2007 19:01:41 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > Can you upload a snapshot of your current queue ? Yeah, that's super-easy. It just happens that it all compiles and runs at present ;) The single-big-patch is at http://userweb.kernel.org/~akpm/tg.bz2 The broken-out queue should turn up at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ in a few minutes. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 17:23 ` Andrew Morton @ 2007-05-12 19:36 ` Thomas Gleixner 2007-05-12 19:56 ` Thomas Gleixner 1 sibling, 0 replies; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 19:36 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote: > On Sat, 12 May 2007 19:01:41 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > Can you upload a snapshot of your current queue ? > > Yeah, that's super-easy. It just happens that it all compiles > and runs at present ;) Really ? /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c: In function ‘pcxhr_trigger’: /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:640: error: expected expression before ‘<<’ token /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:642: error: expected expression before ‘==’ token /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:644: warning: ISO C90 forbids mixed declarations and code /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:645: error: expected expression before ‘>>’ token /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: ‘s’ undeclared (first use in this function) /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: (Each undeclared identifier is reported only once /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: for each function it appears in.) /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: warning: type defaults to ‘int’ in declaration of ‘type name’ /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: request for member ‘link_list’ in something not a structure or union /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: warning: type defaults to ‘int’ in declaration of ‘__mptr’ /home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: warning: initialization from incompatible pointer type ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 17:23 ` Andrew Morton 2007-05-12 19:36 ` Thomas Gleixner @ 2007-05-12 19:56 ` Thomas Gleixner 2007-05-13 1:11 ` Andrew Morton 1 sibling, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 19:56 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote: > The broken-out queue should turn up at > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ > in a few minutes. Sigh. I can't reproduce your lockdep problem :( Can you send me your .config please ? tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 19:56 ` Thomas Gleixner @ 2007-05-13 1:11 ` Andrew Morton 2007-05-13 8:07 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-13 1:11 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 12 May 2007 21:56:10 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote: > > The broken-out queue should turn up at > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ > > in a few minutes. > > Sigh. I can't reproduce your lockdep problem :( We should write a Vaio emulator. > Can you send me your .config please ? > http://userweb.kernel.org/~akpm/config-sony.txt ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-13 1:11 ` Andrew Morton @ 2007-05-13 8:07 ` Thomas Gleixner 2007-05-13 16:48 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-13 8:07 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 18:11 -0700, Andrew Morton wrote: > On Sat, 12 May 2007 21:56:10 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote: > > > The broken-out queue should turn up at > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ > > > in a few minutes. > > > > Sigh. I can't reproduce your lockdep problem :( > > We should write a Vaio emulator. :) > > Can you send me your .config please ? > > > > http://userweb.kernel.org/~akpm/config-sony.txt No luck. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-13 8:07 ` Thomas Gleixner @ 2007-05-13 16:48 ` Thomas Gleixner 2007-05-13 19:09 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-13 16:48 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sun, 2007-05-13 at 10:07 +0200, Thomas Gleixner wrote: > > > Can you send me your .config please ? > > > > > > > http://userweb.kernel.org/~akpm/config-sony.txt > > No luck. I'm making progress. After fiddling with the config options I get a solid lockup of netconsole during boot :( But also lockdep itself is broken on my box, so I would not see warnings anyway: soft-irqs-on + irq-safe-A/21: ok | ok | ok | sirq-safe-A => hirqs-on/12: ok | ok |irq event stamp: 472 hardirqs last enabled at (472): [<c01e8bba>] irqsafe2A_rlock_12+0x94/0xa1 hardirqs last disabled at (471): [<c0109541>] native_sched_clock+0x55/0xe8 softirqs last enabled at (468): [<c01e8ba5>] irqsafe2A_rlock_12+0x7f/0xa1 softirqs last disabled at (464): [<c01e8b31>] irqsafe2A_rlock_12+0xb/0xa1 FAILED| [<c0104d21>] dump_trace+0x61/0x1e7 [<c0104ec1>] show_trace_log_lvl+0x1a/0x30 [<c0105a1b>] show_trace+0x12/0x14 [<c0105a73>] dump_stack+0x15/0x17 [<c01e6971>] dotest+0x6b/0x3ce [<c01efc1f>] locking_selftest+0x915/0x1a5a [<c04a0a0a>] start_kernel+0x1d5/0x2a7 ======================= .... more of those ----------------------------------------------------------------- BUG: 6 unexpected failures (out of 218) - debugging disabled! | ----------------------------------------------------------------- tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-13 16:48 ` Thomas Gleixner @ 2007-05-13 19:09 ` Andrew Morton 2007-05-13 20:07 ` Ingo Molnar 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-13 19:09 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sun, 13 May 2007 18:48:07 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Sun, 2007-05-13 at 10:07 +0200, Thomas Gleixner wrote: > > > > Can you send me your .config please ? > > > > > > > > > > http://userweb.kernel.org/~akpm/config-sony.txt > > > > No luck. > > I'm making progress. After fiddling with the config options I get a > solid lockup of netconsole during boot :( neat. Sometimes this can be a device driver failure. > But also lockdep itself is broken on my box, so I would not see warnings > anyway: > > soft-irqs-on + irq-safe-A/21: ok | ok | ok | > sirq-safe-A => hirqs-on/12: ok | ok |irq event stamp: 472 > hardirqs last enabled at (472): [<c01e8bba>] irqsafe2A_rlock_12+0x94/0xa1 > hardirqs last disabled at (471): [<c0109541>] native_sched_clock+0x55/0xe8 > softirqs last enabled at (468): [<c01e8ba5>] irqsafe2A_rlock_12+0x7f/0xa1 > softirqs last disabled at (464): [<c01e8b31>] irqsafe2A_rlock_12+0xb/0xa1 > FAILED| [<c0104d21>] dump_trace+0x61/0x1e7 > [<c0104ec1>] show_trace_log_lvl+0x1a/0x30 > [<c0105a1b>] show_trace+0x12/0x14 > [<c0105a73>] dump_stack+0x15/0x17 > [<c01e6971>] dotest+0x6b/0x3ce > [<c01efc1f>] locking_selftest+0x915/0x1a5a > [<c04a0a0a>] start_kernel+0x1d5/0x2a7 > ======================= > > .... more of those > > ----------------------------------------------------------------- > BUG: 6 unexpected failures (out of 218) - debugging disabled! | > ----------------------------------------------------------------- > Yeah, I expect quite a few people will start seeing that. iirc it was triggered by a couple of changes: a local_irq_save/local_irq_restore in sched_clock() and a change Jeremy made to the softlockup detector. There was no reasonable explanation why either of those changes should have triggered the six failures so I just went ahead with them so we can flush the real bug out. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-13 19:09 ` Andrew Morton @ 2007-05-13 20:07 ` Ingo Molnar 2007-05-13 22:35 ` Andrew Morton 0 siblings, 1 reply; 61+ messages in thread From: Ingo Molnar @ 2007-05-13 20:07 UTC (permalink / raw) To: Andrew Morton; +Cc: tglx, Rafael J. Wysocki, LKML, John Stultz, linux-acpi * Andrew Morton <akpm@linux-foundation.org> wrote: > Yeah, I expect quite a few people will start seeing that. iirc it was > triggered by a couple of changes: a local_irq_save/local_irq_restore > in sched_clock() and a change Jeremy made to the softlockup detector. hm, i specifically asked to not do local_irq_save/restore in sched_clock(). It's the _scheduler_ clock and it very absolutely positively needs no flags save/restore. Ingo ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-13 20:07 ` Ingo Molnar @ 2007-05-13 22:35 ` Andrew Morton 0 siblings, 0 replies; 61+ messages in thread From: Andrew Morton @ 2007-05-13 22:35 UTC (permalink / raw) To: Ingo Molnar; +Cc: tglx, Rafael J. Wysocki, LKML, John Stultz, linux-acpi On Sun, 13 May 2007 22:07:39 +0200 Ingo Molnar <mingo@elte.hu> wrote: > > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > Yeah, I expect quite a few people will start seeing that. iirc it was > > triggered by a couple of changes: a local_irq_save/local_irq_restore > > in sched_clock() and a change Jeremy made to the softlockup detector. > > hm, i specifically asked to not do local_irq_save/restore in > sched_clock(). It's the _scheduler_ clock and it very absolutely > positively needs no flags save/restore. > It got taken out again. It doesn't matter though: a local_irq_save/restore in some callee shouldn't cause the locking API tests to break. And they're apparently now breaking without that local_irq_save/restore in there anyway. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 17:01 ` Thomas Gleixner 2007-05-12 17:23 ` Andrew Morton @ 2007-05-12 18:49 ` Thomas Gleixner 1 sibling, 0 replies; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 18:49 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 19:01 +0200, Thomas Gleixner wrote: > On Sat, 2007-05-12 at 09:51 -0700, Andrew Morton wrote: > > The locking in clocksource_resume_watchdog looks pretty pointless anyway. > > Can't we just delete it? > > > > The only thing it can race against is, conceivably, > > > > resumed = watchdog_resumed; > > if (unlikely(resumed)) > > watchdog_resumed = 0; > > > > which could be solved by using test_and_clear_bit(). > > True. I'll redo it. Here you go. tglx -------------------------------> Subject: clocksource fix lock order in the resume path lockdep complains about the lock nesting of clocksource and watchdog lock in the resume path. Change the resume marker to a bit operation and remove the lock from this path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -74,7 +74,7 @@ static struct clocksource *watchdog; static struct timer_list watchdog_timer; static DEFINE_SPINLOCK(watchdog_lock); static cycle_t watchdog_last; -static int watchdog_resumed; +static unsigned long watchdog_resumed; /* * Interval: 0.5sec Threshold: 0.0625s @@ -104,9 +104,7 @@ static void clocksource_watchdog(unsigned long data) spin_lock(&watchdog_lock); - resumed = watchdog_resumed; - if (unlikely(resumed)) - watchdog_resumed = 0; + resumed = test_and_clear_bit(0, &watchdog_resumed); wdnow = watchdog->read(); wd_nsec = cyc2ns(watchdog, (wdnow - watchdog_last) & watchdog->mask); @@ -151,9 +149,7 @@ static void clocksource_watchdog(unsigned long data) } static void clocksource_resume_watchdog(void) { - spin_lock(&watchdog_lock); - watchdog_resumed = 1; - spin_unlock(&watchdog_lock); + set_bit(0, &watchdog_resumed); } static void clocksource_check_watchdog(struct clocksource *cs) ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 10:07 ` Andrew Morton 2007-05-12 11:44 ` Thomas Gleixner @ 2007-05-12 11:52 ` Thomas Gleixner 2007-05-15 16:52 ` Pavel Machek 2 siblings, 0 replies; 61+ messages in thread From: Thomas Gleixner @ 2007-05-12 11:52 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 03:07 -0700, Andrew Morton wrote: > On Sat, 12 May 2007 11:18:09 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > > It's peculiar that the hang happens when acpi_evaluate_object() hits its > > > return statement. Any theories there? > > > > Only stack or memory corruption come into mind, but I have no clue how > > this is related to the resume logic changes. > > So I had the brilliant idea of turning on some kernel debugging. It's > a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC. ... > I don't really know what lockdep is complaining about there. I assume I'm > not supposed to, given that whoever wrote that couldn't be bothered > documenting any of it. > > I _think_ it means that lockdep believes that local irqs are enabled > (according to its state tracking), only it turns out that they're not. Right. Lockdep checks, whether the tracked interrupt disabled/enabled state is the same as the state in the hardware. When the check triggers, then we disabled / enabled interrupts somewhere without going through lockdeps state tracker. /me suspects ASM code missing the TRACE_IFQS_OFF macro in some place, which was recently modified. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 10:07 ` Andrew Morton 2007-05-12 11:44 ` Thomas Gleixner 2007-05-12 11:52 ` Thomas Gleixner @ 2007-05-15 16:52 ` Pavel Machek 2 siblings, 0 replies; 61+ messages in thread From: Pavel Machek @ 2007-05-15 16:52 UTC (permalink / raw) To: Andrew Morton, Rafael J. Wysocki Cc: tglx, Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi Hi! > > > It's peculiar that the hang happens when acpi_evaluate_object() hits its > > > return statement. Any theories there? > > > > Only stack or memory corruption come into mind, but I have no clue how > > this is related to the resume logic changes. > > So I had the brilliant idea of turning on some kernel debugging. It's > a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC. We now switch to separate page tables for hibernation, IIRC, so we may be able to get DEBUG_PAGEALLOC to work. I'll take a look. > [ 73.533454] swsusp: Basic memory bitmaps created > [ 73.550429] Stopping tasks ... BUG: at kernel/lockdep.c:2414 check_flags() > [ 73.550988] [<c0104c14>] show_trace_log_lvl+0x1a/0x30 > [ 73.551143] [<c0105769>] show_trace+0x12/0x14 > [ 73.551279] [<c01057c1>] dump_stack+0x15/0x17 > [ 73.551412] [<c0132732>] check_flags+0x93/0x13d > [ 73.551554] [<c0135558>] lock_acquire+0x28/0x7f > [ 73.551691] [<c0310e35>] _spin_lock+0x2b/0x38 > [ 73.551827] [<c013dc43>] refrigerator+0x16/0xc7 > [ 73.551965] [<c0125d2e>] get_signal_to_deliver+0x32/0x387 > [ 73.552124] [<c010336d>] do_notify_resume+0x91/0x6a9 > [ 73.552271] [<c0103df1>] work_notifysig+0x13/0x1a > [ 73.552413] ======================= > [ 73.552507] irq event stamp: 3075 > [ 73.552595] hardirqs last enabled at (3075): [<c0103e51>] syscall_exit_work+0x11/0x26 > [ 73.552821] hardirqs last disabled at (3074): [<c0103d35>] syscall_exit+0x9/0x1a > [ 73.553046] softirqs last enabled at (2778): [<c01209f2>] __do_softirq+0x92/0x9a > [ 73.553255] softirqs last disabled at (2693): [<c0120a27>] do_softirq+0x2d/0x46 > [ 73.559504] done. > [ 73.559569] Shrinking memory... \b-\bdone (0 pages freed) > [ 73.646511] Freed 0 kbytes in 0.08 seconds (0.00 MB/s) > [ 73.649595] platform sonypi: freeze > [ 73.649707] platform bluetooth: freeze > [ 73.649817] usb_endpoint usbdev5.1_ep81: PM: suspend 0->1, parent 5-0:1.0 already 2 > [ 73.650023] hub 5-0:1.0: PM: suspend 2-->1 > > <snippage> > > [ 73.739499] ipw2200 0000:06:0b.0: freeze > [ 73.743860] eth1: Going into suspend... > [ 73.748444] e100 0000:06:08.0: freeze > > at this point I lost netconsole (earlier testing was without netconsole > btw) Commenting out e100's suspend routine might do the trick. > void refrigerator(void) > { > /* Hmm, should we be allowed to suspend when there are realtime > processes around? */ > long save; > > --> task_lock(current); > if (freezing(current)) { > frozen_process(); > task_unlock(current); > } else { > > > I don't really know what lockdep is complaining about there. I assume I'm > not supposed to, given that whoever wrote that couldn't be bothered > documenting any of it. > I _think_ it means that lockdep believes that local irqs are enabled > (according to its state tracking), only it turns out that they're not. Huh, running refrigerator with interrupts disabled? I do not think we are doing _that_. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-12 9:00 ` Andrew Morton 2007-05-12 9:18 ` Thomas Gleixner @ 2007-05-13 16:12 ` Thomas Gleixner 2007-05-14 6:42 ` Andrew Morton 1 sibling, 1 reply; 61+ messages in thread From: Thomas Gleixner @ 2007-05-13 16:12 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote: > Still hangs in the same fashion, sorry. > > It's peculiar that the hang happens when acpi_evaluate_object() hits its > return statement. Any theories there? I assume you have a printk right before the return and one after the call to acpi_evaluate_object(). Can you dump the stack at the point before the return, so we can see if the stack is corrupt there ? A WARN_ON(1) should do the trick. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-13 16:12 ` Thomas Gleixner @ 2007-05-14 6:42 ` Andrew Morton 2007-05-14 6:58 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-14 6:42 UTC (permalink / raw) To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sun, 13 May 2007 18:12:50 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote: > > Still hangs in the same fashion, sorry. > > > > It's peculiar that the hang happens when acpi_evaluate_object() hits its > > return statement. Any theories there? > > I assume you have a printk right before the return and one after the > call to acpi_evaluate_object(). > > Can you dump the stack at the point before the return, so we can see if > the stack is corrupt there ? A WARN_ON(1) should do the trick. > It all looks clean. I spose I should do a hex dump and a byte-by-byte walkthough, but time is short... ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-14 6:42 ` Andrew Morton @ 2007-05-14 6:58 ` Thomas Gleixner 0 siblings, 0 replies; 61+ messages in thread From: Thomas Gleixner @ 2007-05-14 6:58 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi On Sun, 2007-05-13 at 23:42 -0700, Andrew Morton wrote: > On Sun, 13 May 2007 18:12:50 +0200 Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote: > > > Still hangs in the same fashion, sorry. > > > > > > It's peculiar that the hang happens when acpi_evaluate_object() hits its > > > return statement. Any theories there? > > > > I assume you have a printk right before the return and one after the > > call to acpi_evaluate_object(). > > > > Can you dump the stack at the point before the return, so we can see if > > the stack is corrupt there ? A WARN_ON(1) should do the trick. > > > > It all looks clean. I spose I should do a hex dump and a byte-by-byte > walkthough, but time is short... I could backport the highres/dyntick stuff to 2.6.20, so we have the other changes out of the picture. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 11:45 ` Rafael J. Wysocki 2007-05-09 12:24 ` Thomas Gleixner @ 2007-05-09 12:52 ` Rafael J. Wysocki 2007-05-09 17:14 ` Andrew Morton 1 sibling, 1 reply; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-09 12:52 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi On Wednesday, 9 May 2007 13:45, Rafael J. Wysocki wrote: > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote: > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote: > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just > > > > these patches against 2.6.21. > > > > > > yup, same hang with just these three: > > > > > > origin > > > clocksource-fix-resume-logic > > > clockevents-fix-resume-logic-updated-version > > > > I have no idea, how this affects acpi_evaluate_object() > > I think the problem is that the ACPI code ordering here is broken in a > difficult to fix way. > > Definitely, we shouldn't execute the _BFS method after creating the image > and most probably _WAK shouldn't be executed here either. Moreover, > acpi_leave_sleep_state() enables the runtime GPEs, which AFAICS > is equivalent to allowing ACPI to generate SCIs. I'm not sure if this is a > good idea to do such a thing in this particular place. > > Andrew, could you please apply the appended patch and see if that > helps (should apply to -mm2)? Argh, sorry. This needs yet another patch (sent for review to linux-pm) to be applied. The following one is against -mm2: --- NOTE: This is not a complete solution, because it removes the enabling of GPEs from the resume-during-hibernation code path entirely, which probbably is not a good idea in general. --- kernel/power/disk.c | 1 - 1 file changed, 1 deletion(-) Index: linux-2.6.21-mm2/kernel/power/disk.c =================================================================== --- linux-2.6.21-mm2.orig/kernel/power/disk.c +++ linux-2.6.21-mm2/kernel/power/disk.c @@ -205,7 +205,6 @@ int hibernate(void) if (in_suspend) { enable_nonboot_cpus(); - platform_finish(); device_resume(); resume_console(); pr_debug("PM: writing image.\n"); ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 12:52 ` Rafael J. Wysocki @ 2007-05-09 17:14 ` Andrew Morton 2007-05-09 18:55 ` Rafael J. Wysocki 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-09 17:14 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: tglx, Ingo Molnar, LKML, John Stultz, linux-acpi On Wed, 9 May 2007 14:52:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > Andrew, could you please apply the appended patch and see if that > > helps (should apply to -mm2)? > > Argh, sorry. This needs yet another patch (sent for review to linux-pm) to > be applied. The following one is against -mm2: > > --- > NOTE: This is not a complete solution, because it removes the enabling of GPEs > from the resume-during-hibernation code path entirely, which probbably is not a > good idea in general. > --- > kernel/power/disk.c | 1 - > 1 file changed, 1 deletion(-) > > Index: linux-2.6.21-mm2/kernel/power/disk.c > =================================================================== > --- linux-2.6.21-mm2.orig/kernel/power/disk.c > +++ linux-2.6.21-mm2/kernel/power/disk.c > @@ -205,7 +205,6 @@ int hibernate(void) > > if (in_suspend) { > enable_nonboot_cpus(); > - platform_finish(); > device_resume(); > resume_console(); > pr_debug("PM: writing image.\n"); It now hangs in a similar fashion in the device_resume() call. If I back off Thomas's clockevents-fix-resume-logic-updated-version.patch and include just the above patch the machine does suspend and resume correctly. I can delve further into the device_resume() hang if we think it would be useful. Maybe Linus's clock-stomping tracer can help here, if I can remember how to use it. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version 2007-05-09 17:14 ` Andrew Morton @ 2007-05-09 18:55 ` Rafael J. Wysocki 0 siblings, 0 replies; 61+ messages in thread From: Rafael J. Wysocki @ 2007-05-09 18:55 UTC (permalink / raw) To: Andrew Morton; +Cc: tglx, Ingo Molnar, LKML, John Stultz, linux-acpi On Wednesday, 9 May 2007 19:14, Andrew Morton wrote: > On Wed, 9 May 2007 14:52:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > Andrew, could you please apply the appended patch and see if that > > > helps (should apply to -mm2)? > > > > Argh, sorry. This needs yet another patch (sent for review to linux-pm) to > > be applied. The following one is against -mm2: > > > > --- > > NOTE: This is not a complete solution, because it removes the enabling of GPEs > > from the resume-during-hibernation code path entirely, which probbably is not a > > good idea in general. > > --- > > kernel/power/disk.c | 1 - > > 1 file changed, 1 deletion(-) > > > > Index: linux-2.6.21-mm2/kernel/power/disk.c > > =================================================================== > > --- linux-2.6.21-mm2.orig/kernel/power/disk.c > > +++ linux-2.6.21-mm2/kernel/power/disk.c > > @@ -205,7 +205,6 @@ int hibernate(void) > > > > if (in_suspend) { > > enable_nonboot_cpus(); > > - platform_finish(); > > device_resume(); > > resume_console(); > > pr_debug("PM: writing image.\n"); > > It now hangs in a similar fashion in the device_resume() call. > > If I back off Thomas's clockevents-fix-resume-logic-updated-version.patch > and include just the above patch the machine does suspend and resume > correctly. > > I can delve further into the device_resume() hang if we think it would be > useful. Frankly, I'm not sure. On the one hand, I'd like to know wth is going on in there, but on the other hand, I don't know if that knowledge will help us in general. It looks like we'll need to revamp the ACPI stuff in the hibernation/suspend code paths anyway. > Maybe Linus's clock-stomping tracer can help here, if I can remember how to > use it. It's described a bit in Documentation/power/s2ram.txt . ^ permalink raw reply [flat|nested] 61+ messages in thread
[parent not found: <20070430102851.987296000@linutronix.de>]
* Re: [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set [not found] ` <20070430102851.987296000@linutronix.de> @ 2007-05-07 6:43 ` Andrew Morton 2007-05-07 7:01 ` Thomas Gleixner 0 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2007-05-07 6:43 UTC (permalink / raw) To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, John Stultz, Len Brown, linux-acpi On Mon, 30 Apr 2007 10:43:33 -0000 Thomas Gleixner <tglx@linutronix.de> wrote: > The local apic timer stop in C2 resp. C3 states is coupled with the > stop of the TSC. When the local apic timer is marked stable in C2 > on the kernel commandline, then keep the TSC marked stable in C2 as well. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > > --- > drivers/acpi/processor_idle.c | 19 ++++++++----------- > 1 file changed, 8 insertions(+), 11 deletions(-) > > Index: linux-2.6/drivers/acpi/processor_idle.c > =================================================================== > --- linux-2.6.orig/drivers/acpi/processor_idle.c > +++ linux-2.6/drivers/acpi/processor_idle.c > @@ -305,18 +305,23 @@ static void acpi_state_timer_broadcast(s > struct acpi_processor_cx *cx, > int broadcast) > { > -#ifdef CONFIG_GENERIC_CLOCKEVENTS > - > int state = cx - pr->power.states; > > if (state >= pr->power.timer_broadcast_on_state) { > + > +#ifdef CONFIG_GENERIC_CLOCKEVENTS > unsigned long reason; > > reason = broadcast ? CLOCK_EVT_NOTIFY_BROADCAST_ENTER : > CLOCK_EVT_NOTIFY_BROADCAST_EXIT; > clockevents_notify(reason, &pr->id); > - } > #endif > + > +#ifdef CONFIG_GENERIC_TIME > + /* TSC halts in C2/3, so notify users */ > + mark_tsc_unstable(); > +#endif > + } > } > > #else > @@ -481,10 +486,6 @@ static void acpi_processor_idle(void) > /* Get end time (ticks) */ > t2 = inl(acpi_gbl_FADT.xpm_timer_block.address); > > -#ifdef CONFIG_GENERIC_TIME > - /* TSC halts in C2, so notify users */ > - mark_tsc_unstable(); > -#endif > /* Re-enable interrupts */ > local_irq_enable(); > current_thread_info()->status |= TS_POLLING; > @@ -523,10 +524,6 @@ static void acpi_processor_idle(void) > acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0); > } > > -#ifdef CONFIG_GENERIC_TIME > - /* TSC halts in C3, so notify users */ > - mark_tsc_unstable(); > -#endif > /* Re-enable interrupts */ > local_irq_enable(); > current_thread_info()->status |= TS_POLLING; > > -- I'm now up to the third or fourth time where I need to try to repair this patch against churn in Len's tree. I'm not sure what I'm doing and that changelog is of little help to me. I think I'll drop it. Please work it with Len. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set 2007-05-07 6:43 ` [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set Andrew Morton @ 2007-05-07 7:01 ` Thomas Gleixner 0 siblings, 0 replies; 61+ messages in thread From: Thomas Gleixner @ 2007-05-07 7:01 UTC (permalink / raw) To: Andrew Morton; +Cc: LKML, Ingo Molnar, John Stultz, Len Brown, linux-acpi On Sun, 2007-05-06 at 23:43 -0700, Andrew Morton wrote: > I'm now up to the third or fourth time where I need to try to repair this > patch against churn in Len's tree. I'm not sure what I'm doing and that > changelog is of little help to me. > > I think I'll drop it. Please work it with Len. Will do. tglx ^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2007-05-16 13:36 UTC | newest]
Thread overview: 61+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20070430102837.748238000@linutronix.de>
2007-05-02 0:33 ` [patch 0/3] Clocksource / clockevent updates Andrew Morton
2007-05-02 6:09 ` Thomas Gleixner
2007-05-02 6:14 ` Andrew Morton
[not found] ` <20070430102852.042964000@linutronix.de>
2007-05-05 7:34 ` [patch 3/3] clockevents: Fix resume logic Andrew Morton
2007-05-05 11:51 ` Ingo Molnar
2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner
2007-05-08 10:37 ` Andrew Morton
2007-05-09 5:59 ` Andrew Morton
2007-05-09 7:10 ` Andrew Morton
2007-05-09 8:18 ` Thomas Gleixner
2007-05-09 8:22 ` Andrew Morton
2007-05-09 8:31 ` Andrew Morton
2007-05-09 8:59 ` Thomas Gleixner
2007-05-09 11:45 ` Rafael J. Wysocki
2007-05-09 12:24 ` Thomas Gleixner
2007-05-09 13:12 ` Rafael J. Wysocki
2007-05-09 13:19 ` Thomas Gleixner
2007-05-09 17:09 ` Rafael J. Wysocki
2007-05-09 17:15 ` Thomas Gleixner
2007-05-09 18:36 ` Rafael J. Wysocki
2007-05-09 20:45 ` Thomas Gleixner
2007-05-09 20:53 ` Rafael J. Wysocki
2007-05-09 21:26 ` Thomas Gleixner
2007-05-10 8:46 ` Andrew Morton
2007-05-10 8:55 ` Thomas Gleixner
2007-05-10 9:18 ` Andrew Morton
2007-05-10 9:27 ` Thomas Gleixner
2007-05-10 20:12 ` Rafael J. Wysocki
2007-05-11 16:47 ` Andrew Morton
2007-05-11 20:10 ` Rafael J. Wysocki
2007-05-11 20:28 ` Andrew Morton
2007-05-11 21:02 ` Rafael J. Wysocki
2007-05-11 21:09 ` Rafael J. Wysocki
2007-05-12 6:56 ` Andrew Morton
2007-05-12 8:46 ` Thomas Gleixner
2007-05-12 9:00 ` Andrew Morton
2007-05-12 9:18 ` Thomas Gleixner
2007-05-12 10:07 ` Andrew Morton
2007-05-12 11:44 ` Thomas Gleixner
2007-05-12 16:51 ` Andrew Morton
2007-05-12 17:01 ` Thomas Gleixner
2007-05-12 17:23 ` Andrew Morton
2007-05-12 19:36 ` Thomas Gleixner
2007-05-12 19:56 ` Thomas Gleixner
2007-05-13 1:11 ` Andrew Morton
2007-05-13 8:07 ` Thomas Gleixner
2007-05-13 16:48 ` Thomas Gleixner
2007-05-13 19:09 ` Andrew Morton
2007-05-13 20:07 ` Ingo Molnar
2007-05-13 22:35 ` Andrew Morton
2007-05-12 18:49 ` Thomas Gleixner
2007-05-12 11:52 ` Thomas Gleixner
2007-05-15 16:52 ` Pavel Machek
2007-05-13 16:12 ` Thomas Gleixner
2007-05-14 6:42 ` Andrew Morton
2007-05-14 6:58 ` Thomas Gleixner
2007-05-09 12:52 ` Rafael J. Wysocki
2007-05-09 17:14 ` Andrew Morton
2007-05-09 18:55 ` Rafael J. Wysocki
[not found] ` <20070430102851.987296000@linutronix.de>
2007-05-07 6:43 ` [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set Andrew Morton
2007-05-07 7:01 ` Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox