* spin_lock error in arch/i386/kernel/time.c on APM resume @ 2005-03-12 13:11 J. Bruce Fields 2005-03-12 15:21 ` [PATCH] APM: fix interrupts enabled in device_power_up Zwane Mwaikambo 2005-03-12 15:56 ` spin_lock error in arch/i386/kernel/time.c on APM resume George Anzinger 0 siblings, 2 replies; 18+ messages in thread From: J. Bruce Fields @ 2005-03-12 13:11 UTC (permalink / raw) To: LKML On APM resume this morning on my Thinkpad X31, I got a "spin_lock is already locked" error; see below. This doesn't happen on every resume, though it's happened before. The kernel is 2.6.11 plus a bunch of (hopefully unrelated...) NFS patches. Any ideas? --Bruce Fields Mar 12 07:07:29 puzzle kernel: PCI: Setting latency timer of device 0000:00:1d.0 to 64 Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device 0000:00:1d.1 to 64 Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device 0000:00:1d.2 to 64 Mar 12 07:07:31 puzzle kernel: PCI: cache line size of 32 is not supported by device 0000:00:1d.7 Mar 12 07:07:31 puzzle kernel: ehci_hcd 0000:00:1d.7: USB 2.0 restarted, EHCI 1.00, driver 10 Dec 2004 Mar 12 07:07:31 puzzle kernel: PCI: Found IRQ 11 for device 0000:00:1f.1 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:00:1d.2 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:02:00.2 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:02:02.0 Mar 12 07:07:31 puzzle kernel: PCI: Found IRQ 11 for device 0000:00:1f.5 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:00:1f.3 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:00:1f.6 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:02:00.1 Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64 Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:179: spin_lock(arch/i386/kernel/time.c:c0603c28) already locked by arch/i386/kernel/time.c/309 Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:316: spin_unlock(arch/i386/kernel/time.c:c0603c28) not locked Mar 12 07:07:31 puzzle kernel: PCI: Found IRQ 11 for device 0000:01:00.0 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:00:1d.0 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:02:00.0 Mar 12 07:07:31 puzzle kernel: PCI: Found IRQ 11 for device 0000:02:00.2 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:00:1d.2 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:00:1f.1 Mar 12 07:07:31 puzzle kernel: PCI: Sharing IRQ 11 with 0000:02:02.0 Mar 12 07:07:31 puzzle kernel: PCI: Found IRQ 11 for device 0000:02:08.0 Mar 12 07:07:31 puzzle kernel: agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0. Mar 12 07:07:31 puzzle kernel: agpgart: Putting AGP V2 device at 0000:00:00.0 into 1x mode Mar 12 07:07:31 puzzle kernel: agpgart: Putting AGP V2 device at 0000:01:00.0 into 1x mode ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] APM: fix interrupts enabled in device_power_up 2005-03-12 13:11 spin_lock error in arch/i386/kernel/time.c on APM resume J. Bruce Fields @ 2005-03-12 15:21 ` Zwane Mwaikambo 2005-03-15 22:33 ` J. Bruce Fields 2005-03-12 15:56 ` spin_lock error in arch/i386/kernel/time.c on APM resume George Anzinger 1 sibling, 1 reply; 18+ messages in thread From: Zwane Mwaikambo @ 2005-03-12 15:21 UTC (permalink / raw) To: J. Bruce Fields; +Cc: LKML, Andrew Morton, Pavel Machek On Sat, 12 Mar 2005, J. Bruce Fields wrote: > On APM resume this morning on my Thinkpad X31, I got a "spin_lock is > already locked" error; see below. This doesn't happen on every resume, > though it's happened before. The kernel is 2.6.11 plus a bunch of > (hopefully unrelated...) NFS patches. > > Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64 > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:179: spin_lock(arch/i386/kernel/time.c:c0603c28) already locked by arch/i386/kernel/time.c/309 > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:316: spin_unlock(arch/i386/kernel/time.c:c0603c28) not locked APM was calling device_power_down and device_power_up with interrupts enabled, resulting in a few calls to get_cmos_time being done with interrupts enabled (rtc_lock needs to be acquired with interrupts disabled). Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> ===== arch/i386/kernel/apm.c 1.72 vs edited ===== --- 1.72/arch/i386/kernel/apm.c 2005-01-20 22:02:11 -07:00 +++ edited/arch/i386/kernel/apm.c 2005-03-12 08:17:52 -07:00 @@ -1202,10 +1202,11 @@ } device_suspend(PMSG_SUSPEND); + local_irq_disable(); device_power_down(PMSG_SUSPEND); /* serialize with the timer interrupt */ - write_seqlock_irq(&xtime_lock); + write_seqlock(&xtime_lock); /* protect against access to timer chip registers */ spin_lock(&i8253_lock); @@ -1216,20 +1217,22 @@ * We'll undo any timer changes due to interrupts below. */ spin_unlock(&i8253_lock); - write_sequnlock_irq(&xtime_lock); + write_sequnlock(&xtime_lock); + local_irq_enable(); save_processor_state(); err = set_system_power_state(APM_STATE_SUSPEND); restore_processor_state(); - write_seqlock_irq(&xtime_lock); + local_irq_disable(); + write_seqlock(&xtime_lock); spin_lock(&i8253_lock); reinit_timer(); set_time(); ignore_normal_resume = 1; spin_unlock(&i8253_lock); - write_sequnlock_irq(&xtime_lock); + write_sequnlock(&xtime_lock); if (err == APM_NO_ERROR) err = APM_SUCCESS; @@ -1237,6 +1240,7 @@ apm_error("suspend", err); err = (err == APM_SUCCESS) ? 0 : -EIO; device_power_up(); + local_irq_enable(); device_resume(); pm_send_all(PM_RESUME, (void *)0); queue_event(APM_NORMAL_RESUME, NULL); @@ -1255,17 +1259,22 @@ { int err; + local_irq_disable(); device_power_down(PMSG_SUSPEND); /* serialize with the timer interrupt */ - write_seqlock_irq(&xtime_lock); + write_seqlock(&xtime_lock); /* If needed, notify drivers here */ get_time_diff(); - write_sequnlock_irq(&xtime_lock); + write_sequnlock(&xtime_lock); + local_irq_enable(); err = set_system_power_state(APM_STATE_STANDBY); if ((err != APM_SUCCESS) && (err != APM_NO_ERROR)) apm_error("standby", err); + + local_irq_disable(); device_power_up(); + local_irq_enable(); } static apm_event_t get_event(void) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] APM: fix interrupts enabled in device_power_up 2005-03-12 15:21 ` [PATCH] APM: fix interrupts enabled in device_power_up Zwane Mwaikambo @ 2005-03-15 22:33 ` J. Bruce Fields 2005-03-15 22:40 ` Zwane Mwaikambo 0 siblings, 1 reply; 18+ messages in thread From: J. Bruce Fields @ 2005-03-15 22:33 UTC (permalink / raw) To: Zwane Mwaikambo; +Cc: LKML, Andrew Morton, Pavel Machek On Sat, Mar 12, 2005 at 08:21:29AM -0700, Zwane Mwaikambo wrote: > On Sat, 12 Mar 2005, J. Bruce Fields wrote: > > > On APM resume this morning on my Thinkpad X31, I got a "spin_lock is > > already locked" error; see below. This doesn't happen on every resume, > > though it's happened before. The kernel is 2.6.11 plus a bunch of > > (hopefully unrelated...) NFS patches. > > > > Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64 > > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:179: spin_lock(arch/i386/kernel/time.c:c0603c28) already locked by arch/i386/kernel/time.c/309 > > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:316: spin_unlock(arch/i386/kernel/time.c:c0603c28) not locked > > APM was calling device_power_down and device_power_up with interrupts > enabled, resulting in a few calls to get_cmos_time being done with > interrupts enabled (rtc_lock needs to be acquired with interrupts > disabled). Thanks, I haven't been following the discussion carefully, but for what's it worth I did apply that patch and now (a few suspend-resume cycles later) haven't seen the spin_lock warning or seen any other ill effects. Let me know if there's any testing it would be useful for me to do. --Bruce Fields > Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> > > ===== arch/i386/kernel/apm.c 1.72 vs edited ===== > --- 1.72/arch/i386/kernel/apm.c 2005-01-20 22:02:11 -07:00 > +++ edited/arch/i386/kernel/apm.c 2005-03-12 08:17:52 -07:00 > @@ -1202,10 +1202,11 @@ > } > > device_suspend(PMSG_SUSPEND); > + local_irq_disable(); > device_power_down(PMSG_SUSPEND); > > /* serialize with the timer interrupt */ > - write_seqlock_irq(&xtime_lock); > + write_seqlock(&xtime_lock); > > /* protect against access to timer chip registers */ > spin_lock(&i8253_lock); > @@ -1216,20 +1217,22 @@ > * We'll undo any timer changes due to interrupts below. > */ > spin_unlock(&i8253_lock); > - write_sequnlock_irq(&xtime_lock); > + write_sequnlock(&xtime_lock); > + local_irq_enable(); > > save_processor_state(); > err = set_system_power_state(APM_STATE_SUSPEND); > restore_processor_state(); > > - write_seqlock_irq(&xtime_lock); > + local_irq_disable(); > + write_seqlock(&xtime_lock); > spin_lock(&i8253_lock); > reinit_timer(); > set_time(); > ignore_normal_resume = 1; > > spin_unlock(&i8253_lock); > - write_sequnlock_irq(&xtime_lock); > + write_sequnlock(&xtime_lock); > > if (err == APM_NO_ERROR) > err = APM_SUCCESS; > @@ -1237,6 +1240,7 @@ > apm_error("suspend", err); > err = (err == APM_SUCCESS) ? 0 : -EIO; > device_power_up(); > + local_irq_enable(); > device_resume(); > pm_send_all(PM_RESUME, (void *)0); > queue_event(APM_NORMAL_RESUME, NULL); > @@ -1255,17 +1259,22 @@ > { > int err; > > + local_irq_disable(); > device_power_down(PMSG_SUSPEND); > /* serialize with the timer interrupt */ > - write_seqlock_irq(&xtime_lock); > + write_seqlock(&xtime_lock); > /* If needed, notify drivers here */ > get_time_diff(); > - write_sequnlock_irq(&xtime_lock); > + write_sequnlock(&xtime_lock); > + local_irq_enable(); > > err = set_system_power_state(APM_STATE_STANDBY); > if ((err != APM_SUCCESS) && (err != APM_NO_ERROR)) > apm_error("standby", err); > + > + local_irq_disable(); > device_power_up(); > + local_irq_enable(); > } > > static apm_event_t get_event(void) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] APM: fix interrupts enabled in device_power_up 2005-03-15 22:33 ` J. Bruce Fields @ 2005-03-15 22:40 ` Zwane Mwaikambo 0 siblings, 0 replies; 18+ messages in thread From: Zwane Mwaikambo @ 2005-03-15 22:40 UTC (permalink / raw) To: J. Bruce Fields; +Cc: LKML, Andrew Morton, Pavel Machek On Tue, 15 Mar 2005, J. Bruce Fields wrote: > On Sat, Mar 12, 2005 at 08:21:29AM -0700, Zwane Mwaikambo wrote: > > On Sat, 12 Mar 2005, J. Bruce Fields wrote: > > > > > On APM resume this morning on my Thinkpad X31, I got a "spin_lock is > > > already locked" error; see below. This doesn't happen on every resume, > > > though it's happened before. The kernel is 2.6.11 plus a bunch of > > > (hopefully unrelated...) NFS patches. > > > > > > Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64 > > > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:179: spin_lock(arch/i386/kernel/time.c:c0603c28) already locked by arch/i386/kernel/time.c/309 > > > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:316: spin_unlock(arch/i386/kernel/time.c:c0603c28) not locked > > > > APM was calling device_power_down and device_power_up with interrupts > > enabled, resulting in a few calls to get_cmos_time being done with > > interrupts enabled (rtc_lock needs to be acquired with interrupts > > disabled). > > Thanks, I haven't been following the discussion carefully, but for > what's it worth I did apply that patch and now (a few suspend-resume > cycles later) haven't seen the spin_lock warning or seen any other ill > effects. > > Let me know if there's any testing it would be useful for me to do. Thanks for testing it, suspend/resume cycles are what i was most interested in. Zwane ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 13:11 spin_lock error in arch/i386/kernel/time.c on APM resume J. Bruce Fields 2005-03-12 15:21 ` [PATCH] APM: fix interrupts enabled in device_power_up Zwane Mwaikambo @ 2005-03-12 15:56 ` George Anzinger 2005-03-12 16:25 ` Zwane Mwaikambo ` (2 more replies) 1 sibling, 3 replies; 18+ messages in thread From: George Anzinger @ 2005-03-12 15:56 UTC (permalink / raw) To: Andrew Morton Cc: J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1498 bytes --] J. Bruce Fields wrote: > On APM resume this morning on my Thinkpad X31, I got a "spin_lock is > already locked" error; see below. This doesn't happen on every resume, > though it's happened before. The kernel is 2.6.11 plus a bunch of > (hopefully unrelated...) NFS patches. > > Any ideas? > Yesterday's night mare, todays bug :( Looks like we need the irq on the read clock also. This is true both before and after the prior cmos_time changes. Andrew, The attached replaces the patch I sent yesterday. For those wanting to fix the kernel with out those patches, all that is needed its the chunk that applies, i.e. the _irq on the get_cmos_time() spinlocks. And more... That this occures implies we are attempting to update the cmos clock on resume seems wrong. One would presume that the time is wrong at this time and we are about to save that wrong time. Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep (or what ever it is called). Who should we ping with this? ~ > Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64 > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:179: spin_lock(arch/i386/kernel/time.c:c0603c28) already locked by arch/i386/kernel/time.c/309 > Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:316: spin_unlock(arch/i386/kernel/time.c:c0603c28) not locked ~ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ [-- Attachment #2: cmos_time_lock.patch --] [-- Type: text/plain, Size: 1515 bytes --] Source: MontaVista Software, Inc. Type: Defect Fix Disposition: Pending Description: I was not happy with the locking on this. Two changes: 1) Turn off irq while setting the clock. 2) Call the timer code only through the timer interface (set a short timer to do it from the ntp call). Signed-off-by: George Anzinger <george@mvista.com> time.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6.12-rc/arch/i386/kernel/time.c =================================================================== --- linux-2.6.12-rc.orig/arch/i386/kernel/time.c +++ linux-2.6.12-rc/arch/i386/kernel/time.c @@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no int retval; /* gets recalled with irq locally disabled */ - spin_lock(&rtc_lock); + spin_lock_irq(&rtc_lock); if (efi_enabled) retval = efi_set_rtc_mmss(nowtime); else retval = mach_set_rtc_mmss(nowtime); - spin_unlock(&rtc_lock); + spin_unlock_irq(&rtc_lock); return retval; } @@ -282,14 +282,14 @@ unsigned long get_cmos_time(void) { unsigned long retval; - spin_lock(&rtc_lock); + spin_lock_irq(&rtc_lock); if (efi_enabled) retval = efi_get_time(); else retval = mach_get_cmos_time(); - spin_unlock(&rtc_lock); + spin_unlock_irq(&rtc_lock); return retval; } @@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon } void notify_arch_cmos_timer(void) { - sync_cmos_clock(0); + mod_timer(&sync_cmos_timer, jiffies + 1); } static long clock_cmos_diff, sleep_start; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 15:56 ` spin_lock error in arch/i386/kernel/time.c on APM resume George Anzinger @ 2005-03-12 16:25 ` Zwane Mwaikambo 2005-03-12 16:36 ` Venkatesh Pallipadi 2005-03-12 18:04 ` George Anzinger 2005-03-12 21:14 ` Barry K. Nathan 2005-03-12 21:18 ` Lee Revell 2 siblings, 2 replies; 18+ messages in thread From: Zwane Mwaikambo @ 2005-03-12 16:25 UTC (permalink / raw) To: George Anzinger Cc: Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org On Sat, 12 Mar 2005, George Anzinger wrote: > Looks like we need the irq on the read clock also. This is true both before > and after the prior cmos_time changes. > > The attached replaces the patch I sent yesterday. > > For those wanting to fix the kernel with out those patches, all that is needed > its the chunk that applies, i.e. the _irq on the get_cmos_time() spinlocks. > > And more... That this occures implies we are attempting to update the cmos > clock on resume seems wrong. One would presume that the time is wrong at this > time and we are about to save that wrong time. Possibly the APM code should > change time_status to STA_UNSYNC on the way into the sleep (or what ever it is > called). Who should we ping with this? timer_resume, which appears to be the problem, wants to calculate amount of time was spent suspended, also your unconditional irq enable in get_cmos_time breaks the atomicity of device_power_up and would deadlock in sections of code which call get_time_diff() with xtime_lock held. I sent a patch subject "APM: fix interrupts enabled in device_power_up" which should address this. Thanks, Zwane ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 16:25 ` Zwane Mwaikambo @ 2005-03-12 16:36 ` Venkatesh Pallipadi 2005-03-12 16:46 ` Zwane Mwaikambo 2005-03-12 18:04 ` George Anzinger 1 sibling, 1 reply; 18+ messages in thread From: Venkatesh Pallipadi @ 2005-03-12 16:36 UTC (permalink / raw) To: Zwane Mwaikambo Cc: George Anzinger, Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org On Sat, Mar 12, 2005 at 09:25:13AM -0700, Zwane Mwaikambo wrote: > On Sat, 12 Mar 2005, George Anzinger wrote: > > > And more... That this occures implies we are attempting to update the cmos > > clock on resume seems wrong. One would presume that the time is wrong at this > > time and we are about to save that wrong time. Possibly the APM code should > > change time_status to STA_UNSYNC on the way into the sleep (or what ever it is > > called). Who should we ping with this? > > timer_resume, which appears to be the problem, wants to calculate amount > of time was spent suspended, also your unconditional irq enable in > get_cmos_time breaks the atomicity of device_power_up and would deadlock > in sections of code which call get_time_diff() with xtime_lock held. I > sent a patch subject "APM: fix interrupts enabled in device_power_up" > which should address this. > How about this patch? Also fixes one other use of rtc_lock in acpi/sleep/proc.c Thanks, Venki rtc_lock is held during timer interrupts. So, we should block interrupts while holding it. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> --- linux-2.6.10/arch/i386/kernel/time.c.org 2005-03-12 10:38:23.000000000 -0800 +++ linux-2.6.10/arch/i386/kernel/time.c 2005-03-12 10:40:26.000000000 -0800 @@ -305,15 +305,16 @@ irqreturn_t timer_interrupt(int irq, voi unsigned long get_cmos_time(void) { unsigned long retval; + unsigned long flags; - spin_lock(&rtc_lock); + spin_lock_irqsave(&rtc_lock, flags); if (efi_enabled) retval = efi_get_time(); else retval = mach_get_cmos_time(); - spin_unlock(&rtc_lock); + spin_unlock_restore(&rtc_lock, flags); return retval; } --- linux-2.6.10/drivers/acpi/sleep/proc.c.org 2005-03-12 10:50:40.000000000 -0800 +++ linux-2.6.10/drivers/acpi/sleep/proc.c 2005-03-12 10:53:08.000000000 -0800 @@ -84,10 +84,11 @@ static int acpi_system_alarm_seq_show(st u32 sec, min, hr; u32 day, mo, yr; unsigned char rtc_control = 0; + unsigned long flags; ACPI_FUNCTION_TRACE("acpi_system_alarm_seq_show"); - spin_lock(&rtc_lock); + spin_lock_irqsave(&rtc_lock, flags); sec = CMOS_READ(RTC_SECONDS_ALARM); min = CMOS_READ(RTC_MINUTES_ALARM); @@ -109,7 +110,7 @@ static int acpi_system_alarm_seq_show(st else yr = CMOS_READ(RTC_YEAR); - spin_unlock(&rtc_lock); + spin_unlock_restore(&rtc_lock, flags); if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { BCD_TO_BIN(sec); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 16:36 ` Venkatesh Pallipadi @ 2005-03-12 16:46 ` Zwane Mwaikambo 2005-03-12 17:45 ` Venkatesh Pallipadi 0 siblings, 1 reply; 18+ messages in thread From: Zwane Mwaikambo @ 2005-03-12 16:46 UTC (permalink / raw) To: Venkatesh Pallipadi Cc: George Anzinger, Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org On Sat, 12 Mar 2005, Venkatesh Pallipadi wrote: > On Sat, Mar 12, 2005 at 09:25:13AM -0700, Zwane Mwaikambo wrote: > > On Sat, 12 Mar 2005, George Anzinger wrote: > > > > > And more... That this occures implies we are attempting to update the cmos > > > clock on resume seems wrong. One would presume that the time is wrong at this > > > time and we are about to save that wrong time. Possibly the APM code should > > > change time_status to STA_UNSYNC on the way into the sleep (or what ever it is > > > called). Who should we ping with this? > > > > timer_resume, which appears to be the problem, wants to calculate amount > > of time was spent suspended, also your unconditional irq enable in > > get_cmos_time breaks the atomicity of device_power_up and would deadlock > > in sections of code which call get_time_diff() with xtime_lock held. I > > sent a patch subject "APM: fix interrupts enabled in device_power_up" > > which should address this. > > > > How about this patch? Also fixes one other use of rtc_lock in acpi/sleep/proc.c > > rtc_lock is held during timer interrupts. So, we should block interrupts > while holding it. It's certainly a lot safer with saving/restoring eflags and the drivers/acpi/sleep/proc.c change is a good catch, but i think the get_cmos_time() callers should take care of the interrupt disabling. btw, s/spin_unlock_restore/spin_unlock_irqrestore/. Please submit the proc.c change. Thanks, Zwane > Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> > > --- linux-2.6.10/arch/i386/kernel/time.c.org 2005-03-12 10:38:23.000000000 -0800 > +++ linux-2.6.10/arch/i386/kernel/time.c 2005-03-12 10:40:26.000000000 -0800 > @@ -305,15 +305,16 @@ irqreturn_t timer_interrupt(int irq, voi > unsigned long get_cmos_time(void) > { > unsigned long retval; > + unsigned long flags; > > - spin_lock(&rtc_lock); > + spin_lock_irqsave(&rtc_lock, flags); > > if (efi_enabled) > retval = efi_get_time(); > else > retval = mach_get_cmos_time(); > > - spin_unlock(&rtc_lock); > + spin_unlock_restore(&rtc_lock, flags); > > return retval; > } > --- linux-2.6.10/drivers/acpi/sleep/proc.c.org 2005-03-12 10:50:40.000000000 -0800 > +++ linux-2.6.10/drivers/acpi/sleep/proc.c 2005-03-12 10:53:08.000000000 -0800 > @@ -84,10 +84,11 @@ static int acpi_system_alarm_seq_show(st > u32 sec, min, hr; > u32 day, mo, yr; > unsigned char rtc_control = 0; > + unsigned long flags; > > ACPI_FUNCTION_TRACE("acpi_system_alarm_seq_show"); > > - spin_lock(&rtc_lock); > + spin_lock_irqsave(&rtc_lock, flags); > > sec = CMOS_READ(RTC_SECONDS_ALARM); > min = CMOS_READ(RTC_MINUTES_ALARM); > @@ -109,7 +110,7 @@ static int acpi_system_alarm_seq_show(st > else > yr = CMOS_READ(RTC_YEAR); > > - spin_unlock(&rtc_lock); > + spin_unlock_restore(&rtc_lock, flags); > > if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { > BCD_TO_BIN(sec); > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 16:46 ` Zwane Mwaikambo @ 2005-03-12 17:45 ` Venkatesh Pallipadi 0 siblings, 0 replies; 18+ messages in thread From: Venkatesh Pallipadi @ 2005-03-12 17:45 UTC (permalink / raw) To: Zwane Mwaikambo Cc: Venkatesh Pallipadi, George Anzinger, Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org, Len Brown On Sat, Mar 12, 2005 at 09:46:54AM -0700, Zwane Mwaikambo wrote: > On Sat, 12 Mar 2005, Venkatesh Pallipadi wrote: > > > On Sat, Mar 12, 2005 at 09:25:13AM -0700, Zwane Mwaikambo wrote: > > > > How about this patch? Also fixes one other use of rtc_lock in acpi/sleep/proc.c > > > > rtc_lock is held during timer interrupts. So, we should block interrupts > > while holding it. > > It's certainly a lot safer with saving/restoring eflags and the > drivers/acpi/sleep/proc.c change is a good catch, but i think the > get_cmos_time() callers should take care of the interrupt disabling. btw, > s/spin_unlock_restore/spin_unlock_irqrestore/. Please submit the proc.c > change. > So, I will assume get_cmos_time issue will be fixed in some other way. Sending the proper proc.c change this time. Andrew: Please apply. Thanks, Venki rtc_lock is held during timer interrupts. So, we should block interrupts while holding it. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> --- linux-2.6.10/drivers/acpi/sleep/proc.c.org 2005-03-12 10:50:40.000000000 -0800 +++ linux-2.6.10/drivers/acpi/sleep/proc.c 2005-03-12 10:53:08.000000000 -0800 @@ -84,10 +84,11 @@ static int acpi_system_alarm_seq_show(st u32 sec, min, hr; u32 day, mo, yr; unsigned char rtc_control = 0; + unsigned long flags; ACPI_FUNCTION_TRACE("acpi_system_alarm_seq_show"); - spin_lock(&rtc_lock); + spin_lock_irqsave(&rtc_lock, flags); sec = CMOS_READ(RTC_SECONDS_ALARM); min = CMOS_READ(RTC_MINUTES_ALARM); @@ -109,7 +110,7 @@ static int acpi_system_alarm_seq_show(st else yr = CMOS_READ(RTC_YEAR); - spin_unlock(&rtc_lock); + spin_unlock_irqrestore(&rtc_lock, flags); if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { BCD_TO_BIN(sec); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 16:25 ` Zwane Mwaikambo 2005-03-12 16:36 ` Venkatesh Pallipadi @ 2005-03-12 18:04 ` George Anzinger 2005-03-12 19:58 ` Zwane Mwaikambo 2005-03-13 18:35 ` Pavel Machek 1 sibling, 2 replies; 18+ messages in thread From: George Anzinger @ 2005-03-12 18:04 UTC (permalink / raw) To: Zwane Mwaikambo Cc: Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org Zwane Mwaikambo wrote: > On Sat, 12 Mar 2005, George Anzinger wrote: > > >>Looks like we need the irq on the read clock also. This is true both before >>and after the prior cmos_time changes. >> >>The attached replaces the patch I sent yesterday. >> >>For those wanting to fix the kernel with out those patches, all that is needed >>its the chunk that applies, i.e. the _irq on the get_cmos_time() spinlocks. >> >>And more... That this occures implies we are attempting to update the cmos >>clock on resume seems wrong. One would presume that the time is wrong at this >>time and we are about to save that wrong time. Possibly the APM code should >>change time_status to STA_UNSYNC on the way into the sleep (or what ever it is >>called). Who should we ping with this? > > > timer_resume, which appears to be the problem, wants to calculate amount > of time was spent suspended, also your unconditional irq enable in > get_cmos_time breaks the atomicity of device_power_up and would deadlock > in sections of code which call get_time_diff() with xtime_lock held. I > sent a patch subject "APM: fix interrupts enabled in device_power_up" > which should address this. I agree. Still in all that follows, no one has addressed the apparent race described above. The reason the system reported the errors that started this thread is that the APM restore code was trying to read the cmos clock (I assume to set the xtime clock) WHILE the timer interrupt code what trying to set the cmos clock from xtime. In other words, it is destroying the time it is trying to read. I repeat "Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep." I am not sure how ntp is supposed to react to the resume but I suspect that the system time is rather out of sync... -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 18:04 ` George Anzinger @ 2005-03-12 19:58 ` Zwane Mwaikambo 2005-03-12 20:25 ` George Anzinger 2005-03-13 18:35 ` Pavel Machek 1 sibling, 1 reply; 18+ messages in thread From: Zwane Mwaikambo @ 2005-03-12 19:58 UTC (permalink / raw) To: George Anzinger Cc: Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org On Sat, 12 Mar 2005, George Anzinger wrote: > I agree. Still in all that follows, no one has addressed the apparent race > described above. The reason the system reported the errors that started this > thread is that the APM restore code was trying to read the cmos clock (I > assume to set the xtime clock) WHILE the timer interrupt code what trying to > set the cmos clock from xtime. Doesn't my reply explain the actual problem? The code path being; arch/i386/kernel/apm.c suspend() write_seqlock_irq(xtime_lock) ... write_sequnlock_irq(xtime_lock) <interrupts enabled> device_power_up() timer_resume() get_cmos_time(); So this covers the problem that the reporter reported, so yes it's setting xtime but we shouldn't be taking interrupts in the first place, so i posted the patch to cover that. APM was clearly violating PM resume procedures. Thanks, Zwane ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 19:58 ` Zwane Mwaikambo @ 2005-03-12 20:25 ` George Anzinger 0 siblings, 0 replies; 18+ messages in thread From: George Anzinger @ 2005-03-12 20:25 UTC (permalink / raw) To: Zwane Mwaikambo Cc: Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org Zwane Mwaikambo wrote: > On Sat, 12 Mar 2005, George Anzinger wrote: > > >>I agree. Still in all that follows, no one has addressed the apparent race >>described above. The reason the system reported the errors that started this >>thread is that the APM restore code was trying to read the cmos clock (I >>assume to set the xtime clock) WHILE the timer interrupt code what trying to >>set the cmos clock from xtime. > > > Doesn't my reply explain the actual problem? The code path being; Sorry, I just didn't look at the apm code. My bad. -g > > arch/i386/kernel/apm.c > suspend() > write_seqlock_irq(xtime_lock) > ... > write_sequnlock_irq(xtime_lock) > <interrupts enabled> > device_power_up() > timer_resume() > get_cmos_time(); S > So this covers the problem that the reporter reported, so yes it's setting > xtime but we shouldn't be taking interrupts in the first place, so i > posted the patch to cover that. APM was clearly violating PM resume > procedures. > > Thanks, > Zwane -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 18:04 ` George Anzinger 2005-03-12 19:58 ` Zwane Mwaikambo @ 2005-03-13 18:35 ` Pavel Machek 2005-03-14 23:49 ` George Anzinger 1 sibling, 1 reply; 18+ messages in thread From: Pavel Machek @ 2005-03-13 18:35 UTC (permalink / raw) To: George Anzinger Cc: Zwane Mwaikambo, Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org Hi! > >>And more... That this occures implies we are attempting to update the cmos > >>clock on resume seems wrong. One would presume that the time is wrong at > >>this > >>time and we are about to save that wrong time. Possibly the APM code > >>should > >>change time_status to STA_UNSYNC on the way into the sleep (or what ever > >>it is > >>called). Who should we ping with this? > > > > > >timer_resume, which appears to be the problem, wants to calculate amount > >of time was spent suspended, also your unconditional irq enable in > >get_cmos_time breaks the atomicity of device_power_up and would deadlock > >in sections of code which call get_time_diff() with xtime_lock held. I > >sent a patch subject "APM: fix interrupts enabled in device_power_up" > >which should address this. > > I agree. Still in all that follows, no one has addressed the apparent race > described above. The reason the system reported the errors that started > this thread is that the APM restore code was trying to read the cmos clock > (I assume to set the xtime clock) WHILE the timer interrupt code what > trying to set the cmos clock from xtime. In other words, it is destroying > the time it is trying to read. I repeat "Possibly the APM code should > change time_status to STA_UNSYNC on the way into the sleep." I am not sure > how ntp is supposed to react to the resume but I suspect that the system > time is rather out of sync... It needs to work without NTP, too. You don't get NTP on plane (etc) where suspend is most usefull. We have CMOS clock, it should be possible to get time from there without resorting to NTP.. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-13 18:35 ` Pavel Machek @ 2005-03-14 23:49 ` George Anzinger 2005-03-15 0:08 ` Pavel Machek 0 siblings, 1 reply; 18+ messages in thread From: George Anzinger @ 2005-03-14 23:49 UTC (permalink / raw) To: Pavel Machek Cc: Zwane Mwaikambo, Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org Pavel Machek wrote: > Hi! > > >>>>And more... That this occures implies we are attempting to update the cmos >>>>clock on resume seems wrong. One would presume that the time is wrong at >>>>this >>>>time and we are about to save that wrong time. Possibly the APM code >>>>should >>>>change time_status to STA_UNSYNC on the way into the sleep (or what ever >>>>it is >>>>called). Who should we ping with this? >>> >>> >>>timer_resume, which appears to be the problem, wants to calculate amount >>>of time was spent suspended, also your unconditional irq enable in >>>get_cmos_time breaks the atomicity of device_power_up and would deadlock >>>in sections of code which call get_time_diff() with xtime_lock held. I >>>sent a patch subject "APM: fix interrupts enabled in device_power_up" >>>which should address this. >> >>I agree. Still in all that follows, no one has addressed the apparent race >>described above. The reason the system reported the errors that started >>this thread is that the APM restore code was trying to read the cmos clock >>(I assume to set the xtime clock) WHILE the timer interrupt code what >>trying to set the cmos clock from xtime. In other words, it is destroying >>the time it is trying to read. I repeat "Possibly the APM code should >>change time_status to STA_UNSYNC on the way into the sleep." I am not sure >>how ntp is supposed to react to the resume but I suspect that the system >>time is rather out of sync... > > > It needs to work without NTP, too. You don't get NTP on plane (etc) > where suspend is most usefull. > > We have CMOS clock, it should be possible to get time from there > without resorting to NTP.. Eh... sure, but... the bug was reported because the system was attempting to update the cmos clock (which it does every ~11 min.) during APM exit. It does this IF AND ONLY IF the system is synced to an external source as indicated by the STA_UNSYNC bit being cleared in the time_state. Now, I don't know what or how APM and NTP are supposed to play together, but I suspect that on entry to APM time is no longer synced, thus my comment. As to your comment, the bug would never have shown its ugly face if the system wasn't using NTP. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-14 23:49 ` George Anzinger @ 2005-03-15 0:08 ` Pavel Machek 2005-03-15 9:04 ` George Anzinger 0 siblings, 1 reply; 18+ messages in thread From: Pavel Machek @ 2005-03-15 0:08 UTC (permalink / raw) To: George Anzinger Cc: Zwane Mwaikambo, Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org Hi! > >>I agree. Still in all that follows, no one has addressed the apparent > >>race described above. The reason the system reported the errors that > >>started this thread is that the APM restore code was trying to read the > >>cmos clock (I assume to set the xtime clock) WHILE the timer interrupt > >>code what trying to set the cmos clock from xtime. In other words, it is > >>destroying the time it is trying to read. I repeat "Possibly the APM > >>code should change time_status to STA_UNSYNC on the way into the sleep." > >>I am not sure how ntp is supposed to react to the resume but I suspect > >>that the system time is rather out of sync... > > > > > >It needs to work without NTP, too. You don't get NTP on plane (etc) > >where suspend is most usefull. > > > >We have CMOS clock, it should be possible to get time from there > >without resorting to NTP.. > > Eh... sure, but... the bug was reported because the system was attempting > to update the cmos clock (which it does every ~11 min.) during APM exit. > It does this IF AND ONLY IF the system is synced to an external source as > indicated by the STA_UNSYNC bit being cleared in the time_state. Now, I > don't know what or how APM and NTP are supposed to play together, but I > suspect that on entry to APM time is no longer synced, thus my comment. > > As to your comment, the bug would never have shown its ugly face if the > system wasn't using NTP. Uh, ok, you are right. We should set time to STA_UNSYNC so that we do not write back to CMOS during/shortly after resume. I did not realize what STA_UNSYNC means. Perhaps you have patch to do that somewhere? ;-)))) Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-15 0:08 ` Pavel Machek @ 2005-03-15 9:04 ` George Anzinger 0 siblings, 0 replies; 18+ messages in thread From: George Anzinger @ 2005-03-15 9:04 UTC (permalink / raw) To: Pavel Machek Cc: Zwane Mwaikambo, Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org Pavel Machek wrote: > Hi! > > >>>>I agree. Still in all that follows, no one has addressed the apparent >>>>race described above. The reason the system reported the errors that >>>>started this thread is that the APM restore code was trying to read the >>>>cmos clock (I assume to set the xtime clock) WHILE the timer interrupt >>>>code what trying to set the cmos clock from xtime. In other words, it is >>>>destroying the time it is trying to read. I repeat "Possibly the APM >>>>code should change time_status to STA_UNSYNC on the way into the sleep." >>>>I am not sure how ntp is supposed to react to the resume but I suspect >>>>that the system time is rather out of sync... >>> >>> >>>It needs to work without NTP, too. You don't get NTP on plane (etc) >>>where suspend is most usefull. >>> >>>We have CMOS clock, it should be possible to get time from there >>>without resorting to NTP.. >> >>Eh... sure, but... the bug was reported because the system was attempting >>to update the cmos clock (which it does every ~11 min.) during APM exit. >>It does this IF AND ONLY IF the system is synced to an external source as >>indicated by the STA_UNSYNC bit being cleared in the time_state. Now, I >>don't know what or how APM and NTP are supposed to play together, but I >>suspect that on entry to APM time is no longer synced, thus my comment. >> >>As to your comment, the bug would never have shown its ugly face if the >>system wasn't using NTP. > > > Uh, ok, you are right. We should set time to STA_UNSYNC so that we do > not write back to CMOS during/shortly after resume. I did not realize > what STA_UNSYNC means. Perhaps you have patch to do that somewhere? > ;-)))) Zwane has convinced me that the real problem is doing the right thing (tm) in the APM code, i.e. not allowing the timer interrupt until after reading the cmos clock, for which he has a patch tendered. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 15:56 ` spin_lock error in arch/i386/kernel/time.c on APM resume George Anzinger 2005-03-12 16:25 ` Zwane Mwaikambo @ 2005-03-12 21:14 ` Barry K. Nathan 2005-03-12 21:18 ` Lee Revell 2 siblings, 0 replies; 18+ messages in thread From: Barry K. Nathan @ 2005-03-12 21:14 UTC (permalink / raw) To: George Anzinger Cc: Andrew Morton, J. Bruce Fields, Lee Revell, Ingo Molnar, linux-kernel@vger.kernel.org On Sat, Mar 12, 2005 at 07:56:10AM -0800, George Anzinger wrote: > Yesterday's night mare, todays bug :( Actually, I think people have been hitting this bug for a few months on Fedora Core, so it's not really *today's* bug... This might be the first time it's getting discussed on LKML though. (I haven't been reading LKML closely enough to say that with 100% certainty, however.) Summary: kernel-2.6.9-1.724_FC3 breaks APM suspend on Thinkpad https://bugzilla.redhat.com/beta/show_bug.cgi?id=144415 -Barry K. Nathan <barryn@pobox.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: spin_lock error in arch/i386/kernel/time.c on APM resume 2005-03-12 15:56 ` spin_lock error in arch/i386/kernel/time.c on APM resume George Anzinger 2005-03-12 16:25 ` Zwane Mwaikambo 2005-03-12 21:14 ` Barry K. Nathan @ 2005-03-12 21:18 ` Lee Revell 2 siblings, 0 replies; 18+ messages in thread From: Lee Revell @ 2005-03-12 21:18 UTC (permalink / raw) To: george Cc: Andrew Morton, J. Bruce Fields, Ingo Molnar, linux-kernel@vger.kernel.org On Sat, 2005-03-12 at 07:56 -0800, George Anzinger wrote: > J. Bruce Fields wrote: > > On APM resume this morning on my Thinkpad X31, I got a "spin_lock is > > already locked" error; see below. This doesn't happen on every resume, > > though it's happened before. The kernel is 2.6.11 plus a bunch of > > (hopefully unrelated...) NFS patches. > > > > Any ideas? > > > Yesterday's night mare, todays bug :( > Wait, so this is the same theoretical bug you discussed in the do_timer_interrupt thread? Wow, a real schroedinbug... Lee ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2005-03-15 22:57 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-03-12 13:11 spin_lock error in arch/i386/kernel/time.c on APM resume J. Bruce Fields 2005-03-12 15:21 ` [PATCH] APM: fix interrupts enabled in device_power_up Zwane Mwaikambo 2005-03-15 22:33 ` J. Bruce Fields 2005-03-15 22:40 ` Zwane Mwaikambo 2005-03-12 15:56 ` spin_lock error in arch/i386/kernel/time.c on APM resume George Anzinger 2005-03-12 16:25 ` Zwane Mwaikambo 2005-03-12 16:36 ` Venkatesh Pallipadi 2005-03-12 16:46 ` Zwane Mwaikambo 2005-03-12 17:45 ` Venkatesh Pallipadi 2005-03-12 18:04 ` George Anzinger 2005-03-12 19:58 ` Zwane Mwaikambo 2005-03-12 20:25 ` George Anzinger 2005-03-13 18:35 ` Pavel Machek 2005-03-14 23:49 ` George Anzinger 2005-03-15 0:08 ` Pavel Machek 2005-03-15 9:04 ` George Anzinger 2005-03-12 21:14 ` Barry K. Nathan 2005-03-12 21:18 ` Lee Revell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox