* [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt()
@ 2008-08-23 9:48 Mikael Pettersson
2008-08-23 16:01 ` [PATCH] rtc: fix deadlock Ingo Molnar
2008-08-24 9:14 ` [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Vegard Nossum
0 siblings, 2 replies; 15+ messages in thread
From: Mikael Pettersson @ 2008-08-23 9:48 UTC (permalink / raw)
To: linux-kernel; +Cc: hpa, mingo, tglx
Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses
from hpet_rtc_interrupt, usually during shutdown or reboot,
but occasionally also early in init. Today I finally managed
to capture one via a serial cable:
INIT: version 2.86 booting
Welcome to Fedora Core
Press 'I' to enter interactive startup.
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers:
Modules linked in: ehci_hcd uhci_hcd usbcore
Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1)
EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0
EIP is at hpet_rtc_interrupt+0x2d2/0x310
EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002
ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000)
Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000
00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400
Call Trace:
[<c0139cc0>] hrtimer_run_pending+0x20/0x90
[<c01478a8>] handle_IRQ_event+0x28/0x50
[<c0148f71>] handle_edge_irq+0xa1/0x120
[<c010615b>] do_IRQ+0x3b/0x70
[<c0113225>] smp_apic_timer_interrupt+0x55/0x80
[<c0103c4f>] common_interrupt+0x23/0x28
[<c02c0000>] unix_release_sock+0xc0/0x220
=======================
Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26
This points to the following loop in hpet_rtc_interrupt:
0xc0117090 <hpet_rtc_interrupt+720>: pause
0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax
0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax
0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax
0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720>
Note: 0xc035ba80 == &jiffies
This loop originates from asm-generic/rtc.h:get_rtc_time()
while (jiffies - uip_watchdog < 2*HZ/100) {
barrier();
cpu_relax();
}
Note: HZ == CONFIG_HZ == 100
The bug may not originate from the 2.6.27-rc series as I only recently
enabled HPET in this machine's kernels (not due to HPET problems, it
inherited its .config way back from an older machine w/o HPET).
/Mikael
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH] rtc: fix deadlock 2008-08-23 9:48 [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Mikael Pettersson @ 2008-08-23 16:01 ` Ingo Molnar 2008-08-23 16:58 ` Mikael Pettersson 2008-08-29 11:48 ` [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 Frank van Maarseveen 2008-08-24 9:14 ` [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Vegard Nossum 1 sibling, 2 replies; 15+ messages in thread From: Ingo Molnar @ 2008-08-23 16:01 UTC (permalink / raw) To: Mikael Pettersson; +Cc: linux-kernel, hpa, mingo, tglx, Andrew Morton * Mikael Pettersson <mikpe@it.uu.se> wrote: > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses > from hpet_rtc_interrupt, usually during shutdown or reboot, > but occasionally also early in init. Today I finally managed > to capture one via a serial cable: > > INIT: version 2.86 booting > Welcome to Fedora Core > Press 'I' to enter interactive startup. > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers: > Modules linked in: ehci_hcd uhci_hcd usbcore > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1) > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0 > EIP is at hpet_rtc_interrupt+0x2d2/0x310 > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002 > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28 > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000) > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 > Call Trace: > [<c0139cc0>] hrtimer_run_pending+0x20/0x90 > [<c01478a8>] handle_IRQ_event+0x28/0x50 > [<c0148f71>] handle_edge_irq+0xa1/0x120 > [<c010615b>] do_IRQ+0x3b/0x70 > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 > [<c0103c4f>] common_interrupt+0x23/0x28 > [<c02c0000>] unix_release_sock+0xc0/0x220 > ======================= > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 > > This points to the following loop in hpet_rtc_interrupt: > > 0xc0117090 <hpet_rtc_interrupt+720>: pause > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720> > > Note: 0xc035ba80 == &jiffies > > This loop originates from asm-generic/rtc.h:get_rtc_time() > > while (jiffies - uip_watchdog < 2*HZ/100) { > barrier(); > cpu_relax(); > } > > Note: HZ == CONFIG_HZ == 100 > > The bug may not originate from the 2.6.27-rc series as I only recently > enabled HPET in this machine's kernels (not due to HPET problems, it > inherited its .config way back from an older machine w/o HPET). argh, that loop in asm-generic/rtc.h:get_rtc_time looks extremely fragile, we'll lock up if it's ever called with hardirqs off! Does the patch below do the trick? Ingo -----------------> >From 2273cc870b52a7ed09eb225142a6db97299e4f39 Mon Sep 17 00:00:00 2001 From: Ingo Molnar <mingo@elte.hu> Date: Sat, 23 Aug 2008 17:59:07 +0200 Subject: [PATCH] rtc: fix deadlock if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly in it, waiting for jiffies to increment. So make the code more robust by doing an explicit mdelay(20). This solves a very hard to reproduce/debug hard lockup reported by Mikael Pettersson. Reported-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- include/asm-generic/rtc.h | 12 ++++-------- 1 files changed, 4 insertions(+), 8 deletions(-) diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h index be4af00..71ef3f0 100644 --- a/include/asm-generic/rtc.h +++ b/include/asm-generic/rtc.h @@ -15,6 +15,7 @@ #include <linux/mc146818rtc.h> #include <linux/rtc.h> #include <linux/bcd.h> +#include <linux/delay.h> #define RTC_PIE 0x40 /* periodic interrupt enable */ #define RTC_AIE 0x20 /* alarm interrupt enable */ @@ -43,7 +44,6 @@ static inline unsigned char rtc_is_updating(void) static inline unsigned int get_rtc_time(struct rtc_time *time) { - unsigned long uip_watchdog = jiffies; unsigned char ctrl; unsigned long flags; @@ -53,19 +53,15 @@ static inline unsigned int get_rtc_time(struct rtc_time *time) /* * read RTC once any update in progress is done. The update - * can take just over 2ms. We wait 10 to 20ms. There is no need to + * can take just over 2ms. We wait 20ms. There is no need to * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. * If you need to know *exactly* when a second has started, enable * periodic update complete interrupts, (via ioctl) and then * immediately read /dev/rtc which will block until you get the IRQ. * Once the read clears, read the RTC time (again via ioctl). Easy. */ - - if (rtc_is_updating() != 0) - while (jiffies - uip_watchdog < 2*HZ/100) { - barrier(); - cpu_relax(); - } + if (rtc_is_updating()) + mdelay(20); /* * Only the values that we read from the RTC are set. We leave ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] rtc: fix deadlock 2008-08-23 16:01 ` [PATCH] rtc: fix deadlock Ingo Molnar @ 2008-08-23 16:58 ` Mikael Pettersson 2008-08-23 17:11 ` Ingo Molnar 2008-08-29 11:48 ` [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 Frank van Maarseveen 1 sibling, 1 reply; 15+ messages in thread From: Mikael Pettersson @ 2008-08-23 16:58 UTC (permalink / raw) To: Ingo Molnar Cc: Mikael Pettersson, linux-kernel, hpa, mingo, tglx, Andrew Morton Ingo Molnar writes: > > * Mikael Pettersson <mikpe@it.uu.se> wrote: > > > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses > > from hpet_rtc_interrupt, usually during shutdown or reboot, > > but occasionally also early in init. Today I finally managed > > to capture one via a serial cable: > > > > INIT: version 2.86 booting > > Welcome to Fedora Core > > Press 'I' to enter interactive startup. > > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers: > > Modules linked in: ehci_hcd uhci_hcd usbcore > > > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1) > > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0 > > EIP is at hpet_rtc_interrupt+0x2d2/0x310 > > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002 > > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28 > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000) > > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 > > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 > > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 > > Call Trace: > > [<c0139cc0>] hrtimer_run_pending+0x20/0x90 > > [<c01478a8>] handle_IRQ_event+0x28/0x50 > > [<c0148f71>] handle_edge_irq+0xa1/0x120 > > [<c010615b>] do_IRQ+0x3b/0x70 > > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 > > [<c0103c4f>] common_interrupt+0x23/0x28 > > [<c02c0000>] unix_release_sock+0xc0/0x220 > > ======================= > > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 > > > > This points to the following loop in hpet_rtc_interrupt: > > > > 0xc0117090 <hpet_rtc_interrupt+720>: pause > > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax > > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax > > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax > > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720> > > > > Note: 0xc035ba80 == &jiffies > > > > This loop originates from asm-generic/rtc.h:get_rtc_time() > > > > while (jiffies - uip_watchdog < 2*HZ/100) { > > barrier(); > > cpu_relax(); > > } > > > > Note: HZ == CONFIG_HZ == 100 > > > > The bug may not originate from the 2.6.27-rc series as I only recently > > enabled HPET in this machine's kernels (not due to HPET problems, it > > inherited its .config way back from an older machine w/o HPET). > > argh, that loop in asm-generic/rtc.h:get_rtc_time looks extremely > fragile, we'll lock up if it's ever called with hardirqs off! > > Does the patch below do the trick? Thanks for the patch, I'll give it a try ASAP. The sporadic nature of the bug means that it will probably take a couple of days of testing and dozens of reboots w/o problems before I'm confident to say that the problem's been fixed. /Mikael > > Ingo > > -----------------> > >From 2273cc870b52a7ed09eb225142a6db97299e4f39 Mon Sep 17 00:00:00 2001 > From: Ingo Molnar <mingo@elte.hu> > Date: Sat, 23 Aug 2008 17:59:07 +0200 > Subject: [PATCH] rtc: fix deadlock > > if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly > in it, waiting for jiffies to increment. > > So make the code more robust by doing an explicit mdelay(20). > > This solves a very hard to reproduce/debug hard lockup reported > by Mikael Pettersson. > > Reported-by: Mikael Pettersson <mikpe@it.uu.se> > Signed-off-by: Ingo Molnar <mingo@elte.hu> > --- > include/asm-generic/rtc.h | 12 ++++-------- > 1 files changed, 4 insertions(+), 8 deletions(-) > > diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h > index be4af00..71ef3f0 100644 > --- a/include/asm-generic/rtc.h > +++ b/include/asm-generic/rtc.h > @@ -15,6 +15,7 @@ > #include <linux/mc146818rtc.h> > #include <linux/rtc.h> > #include <linux/bcd.h> > +#include <linux/delay.h> > > #define RTC_PIE 0x40 /* periodic interrupt enable */ > #define RTC_AIE 0x20 /* alarm interrupt enable */ > @@ -43,7 +44,6 @@ static inline unsigned char rtc_is_updating(void) > > static inline unsigned int get_rtc_time(struct rtc_time *time) > { > - unsigned long uip_watchdog = jiffies; > unsigned char ctrl; > unsigned long flags; > > @@ -53,19 +53,15 @@ static inline unsigned int get_rtc_time(struct rtc_time *time) > > /* > * read RTC once any update in progress is done. The update > - * can take just over 2ms. We wait 10 to 20ms. There is no need to > + * can take just over 2ms. We wait 20ms. There is no need to > * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. > * If you need to know *exactly* when a second has started, enable > * periodic update complete interrupts, (via ioctl) and then > * immediately read /dev/rtc which will block until you get the IRQ. > * Once the read clears, read the RTC time (again via ioctl). Easy. > */ > - > - if (rtc_is_updating() != 0) > - while (jiffies - uip_watchdog < 2*HZ/100) { > - barrier(); > - cpu_relax(); > - } > + if (rtc_is_updating()) > + mdelay(20); > > /* > * Only the values that we read from the RTC are set. We leave > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] rtc: fix deadlock 2008-08-23 16:58 ` Mikael Pettersson @ 2008-08-23 17:11 ` Ingo Molnar 2008-08-23 18:45 ` Maciej W. Rozycki 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2008-08-23 17:11 UTC (permalink / raw) To: Mikael Pettersson; +Cc: linux-kernel, hpa, mingo, tglx, Andrew Morton * Mikael Pettersson <mikpe@it.uu.se> wrote: > Thanks for the patch, I'll give it a try ASAP. > > The sporadic nature of the bug means that it will probably take a > couple of days of testing and dozens of reboots w/o problems before > I'm confident to say that the problem's been fixed. btw., i've seen that lockup too on a testbox - but havent been able to track it down yet. You went further and presented the bug on a plate by tracking it down to the loop in get_rtc_time() - so now we can both test the fix and see whether the lockups are gone. Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] rtc: fix deadlock 2008-08-23 17:11 ` Ingo Molnar @ 2008-08-23 18:45 ` Maciej W. Rozycki 2008-08-23 19:46 ` Mikael Pettersson 0 siblings, 1 reply; 15+ messages in thread From: Maciej W. Rozycki @ 2008-08-23 18:45 UTC (permalink / raw) To: Ingo Molnar Cc: Mikael Pettersson, linux-kernel, hpa, mingo, tglx, Andrew Morton On Sat, 23 Aug 2008, Ingo Molnar wrote: > > The sporadic nature of the bug means that it will probably take a > > couple of days of testing and dozens of reboots w/o problems before > > I'm confident to say that the problem's been fixed. > > btw., i've seen that lockup too on a testbox - but havent been able to > track it down yet. You went further and presented the bug on a plate by > tracking it down to the loop in get_rtc_time() - so now we can both test > the fix and see whether the lockups are gone. Hmm, waiting for 20ms with interrupts disabled sounds like a bad idea -- a lot of system timer ticks may be lost and some devices may not be happy about such a delay either (serial?). I think your proposal only papers over the actual bug and it should simply be forbidden to call this function with interrupts disabled. I can see this is a part of the legacy RTC interface, so except from the long delay the function itself may have a chance to work with interrupts disabled. But with the modern RTC library this is no longer true as some RTC chips it supports can only be accessed with interrupts enabled, because their accessor functions may sleep. This is for example common with RTC chips wired through I2C. Therefore I think to avoid propagating the design problem with places calling get_rtc_time() when they eventually switch to the RTC library it would be better to put: BUG_ON(irqs_disabled()); into get_rtc_time() now and fix the callers which happen to trigger it. Maciej ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] rtc: fix deadlock 2008-08-23 18:45 ` Maciej W. Rozycki @ 2008-08-23 19:46 ` Mikael Pettersson 0 siblings, 0 replies; 15+ messages in thread From: Mikael Pettersson @ 2008-08-23 19:46 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ingo Molnar, Mikael Pettersson, linux-kernel, hpa, mingo, tglx, Andrew Morton Maciej W. Rozycki writes: > On Sat, 23 Aug 2008, Ingo Molnar wrote: > > > > The sporadic nature of the bug means that it will probably take a > > > couple of days of testing and dozens of reboots w/o problems before > > > I'm confident to say that the problem's been fixed. > > > > btw., i've seen that lockup too on a testbox - but havent been able to > > track it down yet. You went further and presented the bug on a plate by > > tracking it down to the loop in get_rtc_time() - so now we can both test > > the fix and see whether the lockups are gone. > > Hmm, waiting for 20ms with interrupts disabled sounds like a bad idea -- > a lot of system timer ticks may be lost and some devices may not be happy > about such a delay either (serial?). I think your proposal only papers > over the actual bug and it should simply be forbidden to call this > function with interrupts disabled. > > I can see this is a part of the legacy RTC interface, so except from the > long delay the function itself may have a chance to work with interrupts > disabled. But with the modern RTC library this is no longer true as some > RTC chips it supports can only be accessed with interrupts enabled, > because their accessor functions may sleep. This is for example common > with RTC chips wired through I2C. > > Therefore I think to avoid propagating the design problem with places > calling get_rtc_time() when they eventually switch to the RTC library it > would be better to put: > > BUG_ON(irqs_disabled()); > > into get_rtc_time() now and fix the callers which happen to trigger it. On my C2D, I've only seen the lockup happen when user-space is using hwclock to init system time (early init) or to flush system time to the hw (end of shutdown/reboot). And the lockup is always in hpet_rtc_interrupt. I've never ever seen the lockup while the system is up and running normally. So I suspect a bad interaction between the HPET RTC stuff and the RTC user-space interface. /Mikael ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 2008-08-23 16:01 ` [PATCH] rtc: fix deadlock Ingo Molnar 2008-08-23 16:58 ` Mikael Pettersson @ 2008-08-29 11:48 ` Frank van Maarseveen 2008-09-06 18:32 ` Ingo Molnar 1 sibling, 1 reply; 15+ messages in thread From: Frank van Maarseveen @ 2008-08-29 11:48 UTC (permalink / raw) To: Ingo Molnar Cc: Mikael Pettersson, linux-kernel, hpa, mingo, tglx, Andrew Morton On Sat, Aug 23, 2008 at 06:01:51PM +0200, Ingo Molnar wrote: > > * Mikael Pettersson <mikpe@it.uu.se> wrote: > > > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses > > from hpet_rtc_interrupt, usually during shutdown or reboot, > > but occasionally also early in init. Today I finally managed > > to capture one via a serial cable: > > > > INIT: version 2.86 booting > > Welcome to Fedora Core > > Press 'I' to enter interactive startup. > > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers: > > Modules linked in: ehci_hcd uhci_hcd usbcore > > > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1) > > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0 > > EIP is at hpet_rtc_interrupt+0x2d2/0x310 > > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002 > > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28 > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000) > > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 > > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 > > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 > > Call Trace: > > [<c0139cc0>] hrtimer_run_pending+0x20/0x90 > > [<c01478a8>] handle_IRQ_event+0x28/0x50 > > [<c0148f71>] handle_edge_irq+0xa1/0x120 > > [<c010615b>] do_IRQ+0x3b/0x70 > > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 > > [<c0103c4f>] common_interrupt+0x23/0x28 > > [<c02c0000>] unix_release_sock+0xc0/0x220 > > ======================= > > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 > > > > This points to the following loop in hpet_rtc_interrupt: > > > > 0xc0117090 <hpet_rtc_interrupt+720>: pause > > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax > > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax > > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax > > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720> > > > > Note: 0xc035ba80 == &jiffies > > > > This loop originates from asm-generic/rtc.h:get_rtc_time() > > > > while (jiffies - uip_watchdog < 2*HZ/100) { > > barrier(); > > cpu_relax(); > > } > > > > Note: HZ == CONFIG_HZ == 100 > > > > The bug may not originate from the 2.6.27-rc series as I only recently > > enabled HPET in this machine's kernels (not due to HPET problems, it > > inherited its .config way back from an older machine w/o HPET). > > argh, that loop in asm-generic/rtc.h:get_rtc_time looks extremely > fragile, we'll lock up if it's ever called with hardirqs off! > > Does the patch below do the trick? > > Ingo > > -----------------> > >From 2273cc870b52a7ed09eb225142a6db97299e4f39 Mon Sep 17 00:00:00 2001 > From: Ingo Molnar <mingo@elte.hu> > Date: Sat, 23 Aug 2008 17:59:07 +0200 > Subject: [PATCH] rtc: fix deadlock > > if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly > in it, waiting for jiffies to increment. > > So make the code more robust by doing an explicit mdelay(20). > > This solves a very hard to reproduce/debug hard lockup reported > by Mikael Pettersson. > > Reported-by: Mikael Pettersson <mikpe@it.uu.se> > Signed-off-by: Ingo Molnar <mingo@elte.hu> > --- > include/asm-generic/rtc.h | 12 ++++-------- > 1 files changed, 4 insertions(+), 8 deletions(-) > > diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h > index be4af00..71ef3f0 100644 > --- a/include/asm-generic/rtc.h > +++ b/include/asm-generic/rtc.h > @@ -15,6 +15,7 @@ > #include <linux/mc146818rtc.h> > #include <linux/rtc.h> > #include <linux/bcd.h> > +#include <linux/delay.h> > > #define RTC_PIE 0x40 /* periodic interrupt enable */ > #define RTC_AIE 0x20 /* alarm interrupt enable */ > @@ -43,7 +44,6 @@ static inline unsigned char rtc_is_updating(void) > > static inline unsigned int get_rtc_time(struct rtc_time *time) > { > - unsigned long uip_watchdog = jiffies; > unsigned char ctrl; > unsigned long flags; > > @@ -53,19 +53,15 @@ static inline unsigned int get_rtc_time(struct rtc_time *time) > > /* > * read RTC once any update in progress is done. The update > - * can take just over 2ms. We wait 10 to 20ms. There is no need to > + * can take just over 2ms. We wait 20ms. There is no need to > * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. > * If you need to know *exactly* when a second has started, enable > * periodic update complete interrupts, (via ioctl) and then > * immediately read /dev/rtc which will block until you get the IRQ. > * Once the read clears, read the RTC time (again via ioctl). Easy. > */ > - > - if (rtc_is_updating() != 0) > - while (jiffies - uip_watchdog < 2*HZ/100) { > - barrier(); > - cpu_relax(); > - } > + if (rtc_is_updating()) > + mdelay(20); > > /* > * Only the values that we read from the RTC are set. We leave This patch fixes a regression since 2.6.24: 2.6.25 and 2.6.26 occasionally locked up hard here without a trace and even alt-sysrq did not work anymore. It's easy to reproduce with while :; do hwclock; done Others are experiencing this issue too: - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494036 - http://kerneltrap.org/mailarchive/message-id/20080821163920.GA19140@gamma.logic.tuwien.ac.at/linux-kernel - people (me included) experienced booting problems because of this (lockup after initscripts says "Setting the system clock"). maybe this is 2.6.25.x and 2.6.26.x material too? Anyway, get_rtc_time() is called by interrupt handler(s). I think 20ms is a awful lot of time for an interrupt handler. -- Frank ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 2008-08-29 11:48 ` [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 Frank van Maarseveen @ 2008-09-06 18:32 ` Ingo Molnar 2008-10-01 23:51 ` [stable] " Greg KH 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2008-09-06 18:32 UTC (permalink / raw) To: Frank van Maarseveen Cc: Mikael Pettersson, linux-kernel, hpa, mingo, tglx, Andrew Morton, stable * Frank van Maarseveen <frankvm@frankvm.com> wrote: > On Sat, Aug 23, 2008 at 06:01:51PM +0200, Ingo Molnar wrote: > > > > * Mikael Pettersson <mikpe@it.uu.se> wrote: > > > > > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses > > > from hpet_rtc_interrupt, usually during shutdown or reboot, > > > but occasionally also early in init. Today I finally managed > > > to capture one via a serial cable: > > > > > > INIT: version 2.86 booting > > > Welcome to Fedora Core > > > Press 'I' to enter interactive startup. > > > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers: > > > Modules linked in: ehci_hcd uhci_hcd usbcore > > > > > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1) > > > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0 > > > EIP is at hpet_rtc_interrupt+0x2d2/0x310 > > > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002 > > > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28 > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000) > > > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 > > > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 > > > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 > > > Call Trace: > > > [<c0139cc0>] hrtimer_run_pending+0x20/0x90 > > > [<c01478a8>] handle_IRQ_event+0x28/0x50 > > > [<c0148f71>] handle_edge_irq+0xa1/0x120 > > > [<c010615b>] do_IRQ+0x3b/0x70 > > > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 > > > [<c0103c4f>] common_interrupt+0x23/0x28 > > > [<c02c0000>] unix_release_sock+0xc0/0x220 > > > ======================= > > > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 > > > > > > This points to the following loop in hpet_rtc_interrupt: > > > > > > 0xc0117090 <hpet_rtc_interrupt+720>: pause > > > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax > > > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax > > > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax > > > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720> > > > > > > Note: 0xc035ba80 == &jiffies > > > > > > This loop originates from asm-generic/rtc.h:get_rtc_time() > > > > > > while (jiffies - uip_watchdog < 2*HZ/100) { > > > barrier(); > > > cpu_relax(); > > > } > > > > > > Note: HZ == CONFIG_HZ == 100 > > > > > > The bug may not originate from the 2.6.27-rc series as I only recently > > > enabled HPET in this machine's kernels (not due to HPET problems, it > > > inherited its .config way back from an older machine w/o HPET). > > > > argh, that loop in asm-generic/rtc.h:get_rtc_time looks extremely > > fragile, we'll lock up if it's ever called with hardirqs off! > > > > Does the patch below do the trick? > > > > Ingo > > > > -----------------> > > >From 2273cc870b52a7ed09eb225142a6db97299e4f39 Mon Sep 17 00:00:00 2001 > > From: Ingo Molnar <mingo@elte.hu> > > Date: Sat, 23 Aug 2008 17:59:07 +0200 > > Subject: [PATCH] rtc: fix deadlock > > > > if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly > > in it, waiting for jiffies to increment. > > > > So make the code more robust by doing an explicit mdelay(20). > > > > This solves a very hard to reproduce/debug hard lockup reported > > by Mikael Pettersson. > > > > Reported-by: Mikael Pettersson <mikpe@it.uu.se> > > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > --- > > include/asm-generic/rtc.h | 12 ++++-------- > > 1 files changed, 4 insertions(+), 8 deletions(-) > > > > diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h > > index be4af00..71ef3f0 100644 > > --- a/include/asm-generic/rtc.h > > +++ b/include/asm-generic/rtc.h > > @@ -15,6 +15,7 @@ > > #include <linux/mc146818rtc.h> > > #include <linux/rtc.h> > > #include <linux/bcd.h> > > +#include <linux/delay.h> > > > > #define RTC_PIE 0x40 /* periodic interrupt enable */ > > #define RTC_AIE 0x20 /* alarm interrupt enable */ > > @@ -43,7 +44,6 @@ static inline unsigned char rtc_is_updating(void) > > > > static inline unsigned int get_rtc_time(struct rtc_time *time) > > { > > - unsigned long uip_watchdog = jiffies; > > unsigned char ctrl; > > unsigned long flags; > > > > @@ -53,19 +53,15 @@ static inline unsigned int get_rtc_time(struct rtc_time *time) > > > > /* > > * read RTC once any update in progress is done. The update > > - * can take just over 2ms. We wait 10 to 20ms. There is no need to > > + * can take just over 2ms. We wait 20ms. There is no need to > > * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. > > * If you need to know *exactly* when a second has started, enable > > * periodic update complete interrupts, (via ioctl) and then > > * immediately read /dev/rtc which will block until you get the IRQ. > > * Once the read clears, read the RTC time (again via ioctl). Easy. > > */ > > - > > - if (rtc_is_updating() != 0) > > - while (jiffies - uip_watchdog < 2*HZ/100) { > > - barrier(); > > - cpu_relax(); > > - } > > + if (rtc_is_updating()) > > + mdelay(20); > > > > /* > > * Only the values that we read from the RTC are set. We leave > > This patch fixes a regression since 2.6.24: 2.6.25 and 2.6.26 occasionally > locked up hard here without a trace and even alt-sysrq did not work > anymore. It's easy to reproduce with > > while :; do hwclock; done > > Others are experiencing this issue too: > - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494036 > - http://kerneltrap.org/mailarchive/message-id/20080821163920.GA19140@gamma.logic.tuwien.ac.at/linux-kernel > - people (me included) experienced booting problems because of > this (lockup after initscripts says "Setting the system clock"). > > maybe this is 2.6.25.x and 2.6.26.x material too? agreed - stable Cc:-ed. It's about this upstream commit: | commit 38c052f8cff1bd323ccfa968136a9556652ee420 | Author: Ingo Molnar <mingo@elte.hu> | Date: Sat Aug 23 17:59:07 2008 +0200 | | rtc: fix deadlock please backport it into -stable .26 and .25. Thanks, Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [stable] [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 2008-09-06 18:32 ` Ingo Molnar @ 2008-10-01 23:51 ` Greg KH 0 siblings, 0 replies; 15+ messages in thread From: Greg KH @ 2008-10-01 23:51 UTC (permalink / raw) To: Ingo Molnar Cc: Frank van Maarseveen, Mikael Pettersson, linux-kernel, stable, mingo, hpa, tglx, Andrew Morton On Sat, Sep 06, 2008 at 08:32:11PM +0200, Ingo Molnar wrote: > > * Frank van Maarseveen <frankvm@frankvm.com> wrote: > > > On Sat, Aug 23, 2008 at 06:01:51PM +0200, Ingo Molnar wrote: > > > > > > * Mikael Pettersson <mikpe@it.uu.se> wrote: > > > > > > > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses > > > > from hpet_rtc_interrupt, usually during shutdown or reboot, > > > > but occasionally also early in init. Today I finally managed > > > > to capture one via a serial cable: > > > > > > > > INIT: version 2.86 booting > > > > Welcome to Fedora Core > > > > Press 'I' to enter interactive startup. > > > > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers: > > > > Modules linked in: ehci_hcd uhci_hcd usbcore > > > > > > > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1) > > > > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0 > > > > EIP is at hpet_rtc_interrupt+0x2d2/0x310 > > > > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002 > > > > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28 > > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > > > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000) > > > > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 > > > > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 > > > > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 > > > > Call Trace: > > > > [<c0139cc0>] hrtimer_run_pending+0x20/0x90 > > > > [<c01478a8>] handle_IRQ_event+0x28/0x50 > > > > [<c0148f71>] handle_edge_irq+0xa1/0x120 > > > > [<c010615b>] do_IRQ+0x3b/0x70 > > > > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 > > > > [<c0103c4f>] common_interrupt+0x23/0x28 > > > > [<c02c0000>] unix_release_sock+0xc0/0x220 > > > > ======================= > > > > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 > > > > > > > > This points to the following loop in hpet_rtc_interrupt: > > > > > > > > 0xc0117090 <hpet_rtc_interrupt+720>: pause > > > > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax > > > > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax > > > > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax > > > > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720> > > > > > > > > Note: 0xc035ba80 == &jiffies > > > > > > > > This loop originates from asm-generic/rtc.h:get_rtc_time() > > > > > > > > while (jiffies - uip_watchdog < 2*HZ/100) { > > > > barrier(); > > > > cpu_relax(); > > > > } > > > > > > > > Note: HZ == CONFIG_HZ == 100 > > > > > > > > The bug may not originate from the 2.6.27-rc series as I only recently > > > > enabled HPET in this machine's kernels (not due to HPET problems, it > > > > inherited its .config way back from an older machine w/o HPET). > > > > > > argh, that loop in asm-generic/rtc.h:get_rtc_time looks extremely > > > fragile, we'll lock up if it's ever called with hardirqs off! > > > > > > Does the patch below do the trick? > > > > > > Ingo > > > > > > -----------------> > > > >From 2273cc870b52a7ed09eb225142a6db97299e4f39 Mon Sep 17 00:00:00 2001 > > > From: Ingo Molnar <mingo@elte.hu> > > > Date: Sat, 23 Aug 2008 17:59:07 +0200 > > > Subject: [PATCH] rtc: fix deadlock > > > > > > if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly > > > in it, waiting for jiffies to increment. > > > > > > So make the code more robust by doing an explicit mdelay(20). > > > > > > This solves a very hard to reproduce/debug hard lockup reported > > > by Mikael Pettersson. > > > > > > Reported-by: Mikael Pettersson <mikpe@it.uu.se> > > > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > > --- > > > include/asm-generic/rtc.h | 12 ++++-------- > > > 1 files changed, 4 insertions(+), 8 deletions(-) > > > > > > diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h > > > index be4af00..71ef3f0 100644 > > > --- a/include/asm-generic/rtc.h > > > +++ b/include/asm-generic/rtc.h > > > @@ -15,6 +15,7 @@ > > > #include <linux/mc146818rtc.h> > > > #include <linux/rtc.h> > > > #include <linux/bcd.h> > > > +#include <linux/delay.h> > > > > > > #define RTC_PIE 0x40 /* periodic interrupt enable */ > > > #define RTC_AIE 0x20 /* alarm interrupt enable */ > > > @@ -43,7 +44,6 @@ static inline unsigned char rtc_is_updating(void) > > > > > > static inline unsigned int get_rtc_time(struct rtc_time *time) > > > { > > > - unsigned long uip_watchdog = jiffies; > > > unsigned char ctrl; > > > unsigned long flags; > > > > > > @@ -53,19 +53,15 @@ static inline unsigned int get_rtc_time(struct rtc_time *time) > > > > > > /* > > > * read RTC once any update in progress is done. The update > > > - * can take just over 2ms. We wait 10 to 20ms. There is no need to > > > + * can take just over 2ms. We wait 20ms. There is no need to > > > * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. > > > * If you need to know *exactly* when a second has started, enable > > > * periodic update complete interrupts, (via ioctl) and then > > > * immediately read /dev/rtc which will block until you get the IRQ. > > > * Once the read clears, read the RTC time (again via ioctl). Easy. > > > */ > > > - > > > - if (rtc_is_updating() != 0) > > > - while (jiffies - uip_watchdog < 2*HZ/100) { > > > - barrier(); > > > - cpu_relax(); > > > - } > > > + if (rtc_is_updating()) > > > + mdelay(20); > > > > > > /* > > > * Only the values that we read from the RTC are set. We leave > > > > This patch fixes a regression since 2.6.24: 2.6.25 and 2.6.26 occasionally > > locked up hard here without a trace and even alt-sysrq did not work > > anymore. It's easy to reproduce with > > > > while :; do hwclock; done > > > > Others are experiencing this issue too: > > - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494036 > > - http://kerneltrap.org/mailarchive/message-id/20080821163920.GA19140@gamma.logic.tuwien.ac.at/linux-kernel > > - people (me included) experienced booting problems because of > > this (lockup after initscripts says "Setting the system clock"). > > > > maybe this is 2.6.25.x and 2.6.26.x material too? > > agreed - stable Cc:-ed. > > It's about this upstream commit: > > | commit 38c052f8cff1bd323ccfa968136a9556652ee420 > | Author: Ingo Molnar <mingo@elte.hu> > | Date: Sat Aug 23 17:59:07 2008 +0200 > | > | rtc: fix deadlock > > please backport it into -stable .26 and .25. Thanks, backported. thanks, greg k-h ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() 2008-08-23 9:48 [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Mikael Pettersson 2008-08-23 16:01 ` [PATCH] rtc: fix deadlock Ingo Molnar @ 2008-08-24 9:14 ` Vegard Nossum 2008-08-24 10:32 ` Mikael Pettersson 1 sibling, 1 reply; 15+ messages in thread From: Vegard Nossum @ 2008-08-24 9:14 UTC (permalink / raw) To: Mikael Pettersson; +Cc: linux-kernel, hpa, mingo, tglx Hi, On Sat, Aug 23, 2008 at 11:48 AM, Mikael Pettersson <mikpe@it.uu.se> wrote: > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses > from hpet_rtc_interrupt, usually during shutdown or reboot, > but occasionally also early in init. Today I finally managed > to capture one via a serial cable: > > INIT: version 2.86 booting > Welcome to Fedora Core > Press 'I' to enter interactive startup. > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers: > Modules linked in: ehci_hcd uhci_hcd usbcore > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1) > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0 > EIP is at hpet_rtc_interrupt+0x2d2/0x310 > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002 > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28 > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000) > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 > Call Trace: > [<c0139cc0>] hrtimer_run_pending+0x20/0x90 > [<c01478a8>] handle_IRQ_event+0x28/0x50 > [<c0148f71>] handle_edge_irq+0xa1/0x120 > [<c010615b>] do_IRQ+0x3b/0x70 > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 > [<c0103c4f>] common_interrupt+0x23/0x28 > [<c02c0000>] unix_release_sock+0xc0/0x220 > ======================= > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 > > This points to the following loop in hpet_rtc_interrupt: > > 0xc0117090 <hpet_rtc_interrupt+720>: pause > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720> > > Note: 0xc035ba80 == &jiffies > > This loop originates from asm-generic/rtc.h:get_rtc_time() > > while (jiffies - uip_watchdog < 2*HZ/100) { > barrier(); > cpu_relax(); > } > > Note: HZ == CONFIG_HZ == 100 > > The bug may not originate from the 2.6.27-rc series as I only recently > enabled HPET in this machine's kernels (not due to HPET problems, it > inherited its .config way back from an older machine w/o HPET). I also just got this during shutdown: Syncing hardware clock to system time BUG: NMI Watchdog detected LOCKUP on CPU0, ip c011d922, registers: Pid: 4181, comm: hwclock Not tainted (2.6.27-rc3-00464-g1fca254-dirty #42) EIP: 0060:[<c011d922>] EFLAGS: 00200097 CPU: 0 EIP is at hpet_rtc_interrupt+0x282/0x2e0 EAX: 00000000 EBX: 00200096 ECX: f3990000 EDX: 00010000 ESI: 000000a6 EDI: 0004f806 EBP: f3991edc ESP: f3991e98 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process hwclock (pid: 4181, ti=f3990000 task=f359d340 task.ti=f3990000) Stack: f359d340 c08621c0 00000000 f359d340 00001d12 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 f6b4f788 00000000 00000008 f3991ef4 c017ac08 00000000 c0862180 f6b4f788 00000008 f3991f0c Call Trace: [<c017ac08>] ? handle_IRQ_event+0x28/0x70 [<c017c1cf>] ? handle_edge_irq+0xaf/0x140 [<c0107138>] ? do_IRQ+0x48/0xa0 [<c036e254>] ? trace_hardirqs_off_thunk+0xc/0x18 [<c0104a3c>] ? common_interrupt+0x28/0x30 [<c03c007b>] ? tty_get_baud_rate+0x3b/0x60 [<c036e641>] ? copy_from_user+0x1/0x80 [<c01bba1f>] ? sys_select+0x5f/0x190 [<c01ba157>] ? do_vfs_ioctl+0x57/0x2b0 [<c036e244>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c015bc64>] ? trace_hardirqs_on_caller+0xd4/0x160 [<c0103f3b>] ? sysenter_do_call+0x12/0x3f ======================= Code: 65 10 25 00 89 45 d8 0f b6 45 cc e8 59 10 25 00 89 45 dc 0f b6 45 c8 e8 4d 10 25 00 83 e8 01 89 45 e0 e9 04 fe f f ff 66 90 f3 90 <a1> 00 1b 86 c0 29 f8 83 f8 13 76 f2 e9 01 ff ff ff 83 c0 64 89 ------------[ cut here ]------------ WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/mutex.c:351 mutex_trylock+0x123/0x160() Pid: 4181, comm: hwclock Not tainted 2.6.27-rc3-00464-g1fca254-dirty #42 [<c0137ddf>] warn_on_slowpath+0x4f/0x80 [<c03c856d>] ? vt_console_print+0x1dd/0x2a0 [<c01589cb>] ? trace_hardirqs_off+0xb/0x10 [<c01500fb>] ? up+0x2b/0x40 [<c066fc85>] ? _spin_lock_irqsave+0x85/0xa0 [<c01383c1>] ? release_console_sem+0x1c1/0x1f0 [<c01589cb>] ? trace_hardirqs_off+0xb/0x10 [<c0670093>] ? _spin_unlock_irqrestore+0x43/0x70 [<c01383d5>] ? release_console_sem+0x1d5/0x1f0 [<c0138763>] ? vprintk+0x163/0x3c0 [<c01051f5>] ? print_trace_address+0x45/0x50 [<c0103f3b>] ? sysenter_do_call+0x12/0x3f [<c066da53>] mutex_trylock+0x123/0x160 [<c0670093>] ? _spin_unlock_irqrestore+0x43/0x70 [<c01694cb>] crash_kexec+0x1b/0xc0 [<c038bffe>] ? vgacon_cursor+0x16e/0x1d0 [<c03c8083>] ? set_cursor+0x53/0x70 [<c03ca20b>] ? do_unblank_screen+0xbb/0x130 [<c0105dd9>] ? die_nmi+0xb9/0x100 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c0105e16>] die_nmi+0xf6/0x100 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c0118bc5>] nmi_watchdog_tick+0x1d5/0x1e0 [<c0106227>] do_nmi+0x97/0x2d0 [<c06704f3>] nmi_stack_correct+0x26/0x2b [<c015007b>] ? srcu_read_lock+0x3b/0x50 [<c06700d8>] ? _spin_unlock_irq+0x18/0x60 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c017ac08>] handle_IRQ_event+0x28/0x70 [<c017c1cf>] handle_edge_irq+0xaf/0x140 [<c0107138>] do_IRQ+0x48/0xa0 [<c036e254>] ? trace_hardirqs_off_thunk+0xc/0x18 [<c0104a3c>] common_interrupt+0x28/0x30 [<c03c007b>] ? tty_get_baud_rate+0x3b/0x60 [<c036e641>] ? copy_from_user+0x1/0x80 [<c01bba1f>] ? sys_select+0x5f/0x190 [<c01ba157>] ? do_vfs_ioctl+0x57/0x2b0 [<c036e244>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c015bc64>] ? trace_hardirqs_on_caller+0xd4/0x160 [<c0103f3b>] sysenter_do_call+0x12/0x3f ======================= ---[ end trace de495b709f3b3b84 ]--- Kernel panic - not syncing: Aiee, killing interrupt handler! ------------[ cut here ]------------ WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:332 smp_call_function_mask+0x1b1/0x1c0() Pid: 4181, comm: hwclock Tainted: G W 2.6.27-rc3-00464-g1fca254-dirty #42 [<c0137ddf>] warn_on_slowpath+0x4f/0x80 [<c038be14>] ? vgacon_set_cursor_size+0xa4/0x120 [<c01589cb>] ? trace_hardirqs_off+0xb/0x10 [<c017dd2b>] ? __rcu_read_unlock+0x9b/0xc0 [<c01506ec>] ? __atomic_notifier_call_chain+0x3c/0x50 [<c03c856d>] ? vt_console_print+0x1dd/0x2a0 [<c015071a>] ? atomic_notifier_call_chain+0x1a/0x20 [<c037d056>] ? _raw_spin_unlock+0x46/0x80 [<c066ff87>] ? _spin_unlock+0x27/0x50 [<c03c856d>] ? vt_console_print+0x1dd/0x2a0 [<c01589cb>] ? trace_hardirqs_off+0xb/0x10 [<c01500fb>] ? up+0x2b/0x40 [<c066fc85>] ? _spin_lock_irqsave+0x85/0xa0 [<c01383c1>] ? release_console_sem+0x1c1/0x1f0 [<c01589cb>] ? trace_hardirqs_off+0xb/0x10 [<c0163371>] smp_call_function_mask+0x1b1/0x1c0 [<c01383d5>] ? release_console_sem+0x1d5/0x1f0 [<c01170e0>] ? stop_this_cpu+0x0/0x50 [<c066df98>] ? mutex_unlock+0x8/0x10 [<c01589cb>] ? trace_hardirqs_off+0xb/0x10 [<c066ded4>] ? __mutex_unlock_slowpath+0xa4/0x160 [<c066df98>] ? mutex_unlock+0x8/0x10 [<c016951d>] ? crash_kexec+0x6d/0xc0 [<c01051f5>] ? print_trace_address+0x45/0x50 [<c0103f3b>] ? sysenter_do_call+0x12/0x3f [<c01170e0>] ? stop_this_cpu+0x0/0x50 [<c01633b0>] smp_call_function+0x30/0x60 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c011718e>] native_smp_send_stop+0x1e/0x70 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c0137ccf>] panic+0x5f/0x120 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c013b282>] do_exit+0x7e2/0x880 [<c03c8083>] ? set_cursor+0x53/0x70 [<c03ca20b>] ? do_unblank_screen+0xbb/0x130 [<c0105dd9>] ? die_nmi+0xb9/0x100 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c0105dff>] die_nmi+0xdf/0x100 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c0118bc5>] nmi_watchdog_tick+0x1d5/0x1e0 [<c0106227>] do_nmi+0x97/0x2d0 [<c06704f3>] nmi_stack_correct+0x26/0x2b [<c015007b>] ? srcu_read_lock+0x3b/0x50 [<c06700d8>] ? _spin_unlock_irq+0x18/0x60 [<c011d922>] ? hpet_rtc_interrupt+0x282/0x2e0 [<c017ac08>] handle_IRQ_event+0x28/0x70 [<c017c1cf>] handle_edge_irq+0xaf/0x140 [<c0107138>] do_IRQ+0x48/0xa0 [<c036e254>] ? trace_hardirqs_off_thunk+0xc/0x18 [<c0104a3c>] common_interrupt+0x28/0x30 [<c03c007b>] ? tty_get_baud_rate+0x3b/0x60 [<c036e641>] ? copy_from_user+0x1/0x80 [<c01bba1f>] ? sys_select+0x5f/0x190 [<c01ba157>] ? do_vfs_ioctl+0x57/0x2b0 [<c036e244>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c015bc64>] ? trace_hardirqs_on_caller+0xd4/0x160 [<c0103f3b>] sysenter_do_call+0x12/0x3f ======================= ---[ end trace de495b709f3b3b84 ]--- Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() 2008-08-24 9:14 ` [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Vegard Nossum @ 2008-08-24 10:32 ` Mikael Pettersson 2008-08-24 11:48 ` Vegard Nossum 0 siblings, 1 reply; 15+ messages in thread From: Mikael Pettersson @ 2008-08-24 10:32 UTC (permalink / raw) To: Vegard Nossum; +Cc: Mikael Pettersson, linux-kernel, hpa, mingo, tglx Vegard Nossum writes: > Hi, > > On Sat, Aug 23, 2008 at 11:48 AM, Mikael Pettersson <mikpe@it.uu.se> wrote: > > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses > > from hpet_rtc_interrupt, usually during shutdown or reboot, > > but occasionally also early in init. Today I finally managed > > to capture one via a serial cable: > > > > INIT: version 2.86 booting > > Welcome to Fedora Core > > Press 'I' to enter interactive startup. > > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers: > > Modules linked in: ehci_hcd uhci_hcd usbcore > > > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1) > > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0 > > EIP is at hpet_rtc_interrupt+0x2d2/0x310 > > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002 > > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28 > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000) > > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000 > > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000 > > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400 > > Call Trace: > > [<c0139cc0>] hrtimer_run_pending+0x20/0x90 > > [<c01478a8>] handle_IRQ_event+0x28/0x50 > > [<c0148f71>] handle_edge_irq+0xa1/0x120 > > [<c010615b>] do_IRQ+0x3b/0x70 > > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 > > [<c0103c4f>] common_interrupt+0x23/0x28 > > [<c02c0000>] unix_release_sock+0xc0/0x220 > > ======================= > > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26 > > > > This points to the following loop in hpet_rtc_interrupt: > > > > 0xc0117090 <hpet_rtc_interrupt+720>: pause > > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax > > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax > > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax > > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720> > > > > Note: 0xc035ba80 == &jiffies > > > > This loop originates from asm-generic/rtc.h:get_rtc_time() > > > > while (jiffies - uip_watchdog < 2*HZ/100) { > > barrier(); > > cpu_relax(); > > } > > > > Note: HZ == CONFIG_HZ == 100 > > > > The bug may not originate from the 2.6.27-rc series as I only recently > > enabled HPET in this machine's kernels (not due to HPET problems, it > > inherited its .config way back from an older machine w/o HPET). > > I also just got this during shutdown: > > Syncing hardware clock to system time BUG: NMI Watchdog detected > LOCKUP on CPU0, ip c011d922, registers: > Pid: 4181, comm: hwclock Not tainted (2.6.27-rc3-00464-g1fca254-dirty #42) > EIP: 0060:[<c011d922>] EFLAGS: 00200097 CPU: 0 > EIP is at hpet_rtc_interrupt+0x282/0x2e0 > EAX: 00000000 EBX: 00200096 ECX: f3990000 EDX: 00010000 > ESI: 000000a6 EDI: 0004f806 EBP: f3991edc ESP: f3991e98 > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > Process hwclock (pid: 4181, ti=f3990000 task=f359d340 task.ti=f3990000) > Stack: f359d340 c08621c0 00000000 f359d340 00001d12 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 00000000 f6b4f788 00000000 > 00000008 f3991ef4 c017ac08 00000000 c0862180 f6b4f788 00000008 f3991f0c > Call Trace: > [<c017ac08>] ? handle_IRQ_event+0x28/0x70 > [<c017c1cf>] ? handle_edge_irq+0xaf/0x140 > [<c0107138>] ? do_IRQ+0x48/0xa0 > [<c036e254>] ? trace_hardirqs_off_thunk+0xc/0x18 > [<c0104a3c>] ? common_interrupt+0x28/0x30 > [<c03c007b>] ? tty_get_baud_rate+0x3b/0x60 > [<c036e641>] ? copy_from_user+0x1/0x80 > [<c01bba1f>] ? sys_select+0x5f/0x190 > [<c01ba157>] ? do_vfs_ioctl+0x57/0x2b0 > [<c036e244>] ? trace_hardirqs_on_thunk+0xc/0x10 > [<c015bc64>] ? trace_hardirqs_on_caller+0xd4/0x160 > [<c0103f3b>] ? sysenter_do_call+0x12/0x3f > ======================= > Code: 65 10 25 00 89 45 d8 0f b6 45 cc e8 59 10 25 00 89 45 dc 0f b6 > 45 c8 e8 4d 10 25 00 83 e8 01 89 45 e0 e9 04 fe f > f ff 66 90 f3 90 <a1> 00 1b 86 c0 29 f8 83 f8 13 76 f2 e9 01 ff ff ff > 83 c0 64 89 See my reply in the thread following Ingo's patch <http://lkml.org/lkml/2008/8/23/158>. I've only seen the lockup while hwclock was setting or flushing the system time, so I suspect broken interaction between the hpet rtc emulation and the rtc user-space interface. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() 2008-08-24 10:32 ` Mikael Pettersson @ 2008-08-24 11:48 ` Vegard Nossum 2008-08-26 10:25 ` Alan Jenkins 0 siblings, 1 reply; 15+ messages in thread From: Vegard Nossum @ 2008-08-24 11:48 UTC (permalink / raw) To: Mikael Pettersson; +Cc: linux-kernel, hpa, mingo, tglx On Sun, Aug 24, 2008 at 12:32 PM, Mikael Pettersson <mikpe@it.uu.se> wrote: > > I also just got this during shutdown: > > > > Syncing hardware clock to system time BUG: NMI Watchdog detected > > LOCKUP on CPU0, ip c011d922, registers: > > Pid: 4181, comm: hwclock Not tainted (2.6.27-rc3-00464-g1fca254-dirty #42) [...] > See my reply in the thread following Ingo's patch > <http://lkml.org/lkml/2008/8/23/158>. I've only seen the lockup > while hwclock was setting or flushing the system time, so I suspect > broken interaction between the hpet rtc emulation and the rtc > user-space interface. Oops, I picked the wrong thread :-) Anyway, I guess it wouldn't be too hard to make a test cast by running "hwclock" (or the corresponding initscript) in a tight loop? (Sorry, I am not able to test this myself at the moment). That would make it easier to test the patch too, I guess. Maybe something like: $ while true; do /etc/init.d/ntpdate up; /etc/init.d/ntpdate down; done (At least "ntpdate" is the one that runs hwclock for me on F9.) I will try this later unless you beat me to it :-) Thanks, Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. Dijkstra, EWD1036 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() 2008-08-24 11:48 ` Vegard Nossum @ 2008-08-26 10:25 ` Alan Jenkins 2008-08-26 10:39 ` Ingo Molnar 0 siblings, 1 reply; 15+ messages in thread From: Alan Jenkins @ 2008-08-26 10:25 UTC (permalink / raw) To: Vegard Nossum; +Cc: Mikael Pettersson, hpa, mingo, tglx, linux-kernel Vegard Nossum wrote: > On Sun, Aug 24, 2008 at 12:32 PM, Mikael Pettersson <mikpe@it.uu.se> wrote: > >> > I also just got this during shutdown: >> > >> > Syncing hardware clock to system time BUG: NMI Watchdog detected >> > LOCKUP on CPU0, ip c011d922, registers: >> > Pid: 4181, comm: hwclock Not tainted (2.6.27-rc3-00464-g1fca254-dirty #42) >> > [...] > > >> See my reply in the thread following Ingo's patch >> <http://lkml.org/lkml/2008/8/23/158>. I've only seen the lockup >> while hwclock was setting or flushing the system time, so I suspect >> broken interaction between the hpet rtc emulation and the rtc >> user-space interface. >> > > Oops, I picked the wrong thread :-) > > Anyway, I guess it wouldn't be too hard to make a test cast by running > "hwclock" (or the corresponding initscript) in a tight loop? (Sorry, I > am not able to test this myself at the moment). That would make it > easier to test the patch too, I guess. Maybe something like: > > $ while true; do /etc/init.d/ntpdate up; /etc/init.d/ntpdate down; done > > (At least "ntpdate" is the one that runs hwclock for me on F9.) > > I will try this later unless you beat me to it :-) > > I beat you :-). I noticed the problem myself. It does trigger if I run hwclock in a loop. while true; do hwclock; done However it is fine in latest git. Looks like Ingo's patch has been merged, and has fixed it. Alan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() 2008-08-26 10:25 ` Alan Jenkins @ 2008-08-26 10:39 ` Ingo Molnar 2008-08-27 8:54 ` Alan Jenkins 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2008-08-26 10:39 UTC (permalink / raw) To: Alan Jenkins Cc: Vegard Nossum, Mikael Pettersson, hpa, mingo, tglx, linux-kernel * Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote: > I beat you :-). I noticed the problem myself. It does trigger if I > run hwclock in a loop. > > while true; do hwclock; done > > However it is fine in latest git. Looks like Ingo's patch has been > merged, and has fixed it. ah, good - thanks for testing it. I havent seen a lockup on the T60 either - but it was very sporadic to begin with. (and now we understand why: hwclock is running only once per reboot.) Plus, the real bug has to be found as well: who in the rtc ioctl path calls this code with irqs disabled - it's not nice. Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() 2008-08-26 10:39 ` Ingo Molnar @ 2008-08-27 8:54 ` Alan Jenkins 0 siblings, 0 replies; 15+ messages in thread From: Alan Jenkins @ 2008-08-27 8:54 UTC (permalink / raw) To: Ingo Molnar Cc: Vegard Nossum, Mikael Pettersson, hpa, mingo, tglx, linux-kernel Ingo Molnar wrote: > * Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote: > > >> I beat you :-). I noticed the problem myself. It does trigger if I >> run hwclock in a loop. >> >> while true; do hwclock; done >> >> However it is fine in latest git. Looks like Ingo's patch has been >> merged, and has fixed it. >> > > ah, good - thanks for testing it. I havent seen a lockup on the T60 > either - but it was very sporadic to begin with. (and now we understand > why: hwclock is running only once per reboot.) > > Plus, the real bug has to be found as well: who in the rtc ioctl path > calls this code with irqs disabled - it's not nice. > Isn't the problem that it gets called from a timer IRQ? So irqs are enabled globally, but the timer IRQ is masked. [<c0139cc0>] hrtimer_run_pending+0x20/0x90 [<c01478a8>] handle_IRQ_event+0x28/0x50 [<c0148f71>] handle_edge_irq+0xa1/0x120 [<c010615b>] do_IRQ+0x3b/0x70 [<c0113225>] smp_apic_timer_interrupt+0x55/0x80 [<c0103c4f>] common_interrupt+0x23/0x28 Which in turn is probably because the userspace "RTC interrupt" interface is emulated using RTC reads within the main timer interrupt handler. Alan ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2008-10-02 0:03 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-08-23 9:48 [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Mikael Pettersson 2008-08-23 16:01 ` [PATCH] rtc: fix deadlock Ingo Molnar 2008-08-23 16:58 ` Mikael Pettersson 2008-08-23 17:11 ` Ingo Molnar 2008-08-23 18:45 ` Maciej W. Rozycki 2008-08-23 19:46 ` Mikael Pettersson 2008-08-29 11:48 ` [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 Frank van Maarseveen 2008-09-06 18:32 ` Ingo Molnar 2008-10-01 23:51 ` [stable] " Greg KH 2008-08-24 9:14 ` [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Vegard Nossum 2008-08-24 10:32 ` Mikael Pettersson 2008-08-24 11:48 ` Vegard Nossum 2008-08-26 10:25 ` Alan Jenkins 2008-08-26 10:39 ` Ingo Molnar 2008-08-27 8:54 ` Alan Jenkins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox