From: Greg KH <greg@kroah.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Frank van Maarseveen <frankvm@frankvm.com>,
Mikael Pettersson <mikpe@it.uu.se>,
linux-kernel@vger.kernel.org, stable@kernel.org,
mingo@redhat.com, hpa@zytor.com, tglx@linutronix.de,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [stable] [PATCH] rtc: fix deadlock: fixes regression since 2.6.24
Date: Wed, 1 Oct 2008 16:51:11 -0700
Message-ID: <20081001235111.GF31609@kroah.com>
In-Reply-To: <20080906183211.GD21872@elte.hu>
On Sat, Sep 06, 2008 at 08:32:11PM +0200, Ingo Molnar wrote:
>
> * Frank van Maarseveen <frankvm@frankvm.com> wrote:
>
> > On Sat, Aug 23, 2008 at 06:01:51PM +0200, Ingo Molnar wrote:
> > >
> > > * Mikael Pettersson <mikpe@it.uu.se> wrote:
> > >
> > > > Since 2.6.27-rc1 my Core2Duo has been getting sporadic oopses
> > > > from hpet_rtc_interrupt, usually during shutdown or reboot,
> > > > but occasionally also early in init. Today I finally managed
> > > > to capture one via a serial cable:
> > > >
> > > > INIT: version 2.86 booting
> > > > Welcome to Fedora Core
> > > > Press 'I' to enter interactive startup.
> > > > BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0117092, registers:
> > > > Modules linked in: ehci_hcd uhci_hcd usbcore
> > > >
> > > > Pid: 311, comm: nash-hotplug Not tainted (2.6.27-rc4 #1)
> > > > EIP: 0060:[<c0117092>] EFLAGS: 00000097 CPU: 0
> > > > EIP is at hpet_rtc_interrupt+0x2d2/0x310
> > > > EAX: 00000000 EBX: 00000002 ECX: 00000046 EDX: 00000002
> > > > ESI: 000000a6 EDI: ffff8e25 EBP: 00000008 ESP: f7bd7f28
> > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > Process nash-hotplug (pid: 311, ti=f7bd6000 task=f7b70460 task.ti=f7bd6000)
> > > > Stack: f7bd7f6c c0139cc0 00000000 c035ba04 00000000 00000000 00000000 00000000
> > > > 00000000 00000000 00000000 00000000 00000000 f7b845a0 00000000 00000000
> > > > 00000008 c01478a8 c035bf80 f7b845a0 c035bfb0 00000008 c0148f71 00000400
> > > > Call Trace:
> > > > [<c0139cc0>] hrtimer_run_pending+0x20/0x90
> > > > [<c01478a8>] handle_IRQ_event+0x28/0x50
> > > > [<c0148f71>] handle_edge_irq+0xa1/0x120
> > > > [<c010615b>] do_IRQ+0x3b/0x70
> > > > [<c0113225>] smp_apic_timer_interrupt+0x55/0x80
> > > > [<c0103c4f>] common_interrupt+0x23/0x28
> > > > [<c02c0000>] unix_release_sock+0xc0/0x220
> > > > =======================
> > > > Code: 89 44 24 18 0f b6 c2 e8 5d 74 0c 00 8b 0d d8 9c 3b c0 89 44 24 1c 8b 44 24 0c 48 89 44 24 20 e9 84 fd ff ff 90 8d 74 26 00 f3 90 <a1> 80 ba 35 c0 29 f8 83 f8 01 76 f2 e9 e1 fe ff ff 90 8d 74 26
> > > >
> > > > This points to the following loop in hpet_rtc_interrupt:
> > > >
> > > > 0xc0117090 <hpet_rtc_interrupt+720>: pause
> > > > 0xc0117092 <hpet_rtc_interrupt+722>: mov 0xc035ba80,%eax
> > > > 0xc0117097 <hpet_rtc_interrupt+727>: sub %edi,%eax
> > > > 0xc0117099 <hpet_rtc_interrupt+729>: cmp $0x1,%eax
> > > > 0xc011709c <hpet_rtc_interrupt+732>: jbe 0xc0117090 <hpet_rtc_interrupt+720>
> > > >
> > > > Note: 0xc035ba80 == &jiffies
> > > >
> > > > This loop originates from asm-generic/rtc.h:get_rtc_time()
> > > >
> > > > while (jiffies - uip_watchdog < 2*HZ/100) {
> > > > barrier();
> > > > cpu_relax();
> > > > }
> > > >
> > > > Note: HZ == CONFIG_HZ == 100
> > > >
> > > > The bug may not originate from the 2.6.27-rc series as I only recently
> > > > enabled HPET in this machine's kernels (not due to HPET problems, it
> > > > inherited its .config way back from an older machine w/o HPET).
> > >
> > > argh, that loop in asm-generic/rtc.h:get_rtc_time looks extremely
> > > fragile, we'll lock up if it's ever called with hardirqs off!
> > >
> > > Does the patch below do the trick?
> > >
> > > Ingo
> > >
> > > ----------------->
> > > >From 2273cc870b52a7ed09eb225142a6db97299e4f39 Mon Sep 17 00:00:00 2001
> > > From: Ingo Molnar <mingo@elte.hu>
> > > Date: Sat, 23 Aug 2008 17:59:07 +0200
> > > Subject: [PATCH] rtc: fix deadlock
> > >
> > > if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly
> > > in it, waiting for jiffies to increment.
> > >
> > > So make the code more robust by doing an explicit mdelay(20).
> > >
> > > This solves a very hard to reproduce/debug hard lockup reported
> > > by Mikael Pettersson.
> > >
> > > Reported-by: Mikael Pettersson <mikpe@it.uu.se>
> > > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > ---
> > > include/asm-generic/rtc.h | 12 ++++--------
> > > 1 files changed, 4 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h
> > > index be4af00..71ef3f0 100644
> > > --- a/include/asm-generic/rtc.h
> > > +++ b/include/asm-generic/rtc.h
> > > @@ -15,6 +15,7 @@
> > > #include <linux/mc146818rtc.h>
> > > #include <linux/rtc.h>
> > > #include <linux/bcd.h>
> > > +#include <linux/delay.h>
> > >
> > > #define RTC_PIE 0x40 /* periodic interrupt enable */
> > > #define RTC_AIE 0x20 /* alarm interrupt enable */
> > > @@ -43,7 +44,6 @@ static inline unsigned char rtc_is_updating(void)
> > >
> > > static inline unsigned int get_rtc_time(struct rtc_time *time)
> > > {
> > > - unsigned long uip_watchdog = jiffies;
> > > unsigned char ctrl;
> > > unsigned long flags;
> > >
> > > @@ -53,19 +53,15 @@ static inline unsigned int get_rtc_time(struct rtc_time *time)
> > >
> > > /*
> > > * read RTC once any update in progress is done. The update
> > > - * can take just over 2ms. We wait 10 to 20ms. There is no need to
> > > + * can take just over 2ms. We wait 20ms. There is no need to
> > > * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP.
> > > * If you need to know *exactly* when a second has started, enable
> > > * periodic update complete interrupts, (via ioctl) and then
> > > * immediately read /dev/rtc which will block until you get the IRQ.
> > > * Once the read clears, read the RTC time (again via ioctl). Easy.
> > > */
> > > -
> > > - if (rtc_is_updating() != 0)
> > > - while (jiffies - uip_watchdog < 2*HZ/100) {
> > > - barrier();
> > > - cpu_relax();
> > > - }
> > > + if (rtc_is_updating())
> > > + mdelay(20);
> > >
> > > /*
> > > * Only the values that we read from the RTC are set. We leave
> >
> > This patch fixes a regression since 2.6.24: 2.6.25 and 2.6.26 occasionally
> > locked up hard here without a trace and even alt-sysrq did not work
> > anymore. It's easy to reproduce with
> >
> > while :; do hwclock; done
> >
> > Others are experiencing this issue too:
> > - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494036
> > - http://kerneltrap.org/mailarchive/message-id/20080821163920.GA19140@gamma.logic.tuwien.ac.at/linux-kernel
> > - people (me included) experienced booting problems because of
> > this (lockup after initscripts says "Setting the system clock").
> >
> > maybe this is 2.6.25.x and 2.6.26.x material too?
>
> agreed - stable Cc:-ed.
>
> It's about this upstream commit:
>
> | commit 38c052f8cff1bd323ccfa968136a9556652ee420
> | Author: Ingo Molnar <mingo@elte.hu>
> | Date: Sat Aug 23 17:59:07 2008 +0200
> |
> | rtc: fix deadlock
>
> please backport it into -stable .26 and .25. Thanks,
backported.
thanks,
greg k-h
Thread overview: 15+ messages
2008-08-23 9:48 [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Mikael Pettersson
2008-08-23 16:01 ` [PATCH] rtc: fix deadlock Ingo Molnar
2008-08-23 16:58 ` Mikael Pettersson
2008-08-23 17:11 ` Ingo Molnar
2008-08-23 18:45 ` Maciej W. Rozycki
2008-08-23 19:46 ` Mikael Pettersson
2008-08-29 11:48 ` [PATCH] rtc: fix deadlock: fixes regression since 2.6.24 Frank van Maarseveen
2008-09-06 18:32 ` Ingo Molnar
2008-10-01 23:51 ` Greg KH [this message]
2008-08-24 9:14 ` [BUG] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Vegard Nossum
2008-08-24 10:32 ` Mikael Pettersson
2008-08-24 11:48 ` Vegard Nossum
2008-08-26 10:25 ` Alan Jenkins
2008-08-26 10:39 ` Ingo Molnar
2008-08-27 8:54 ` Alan Jenkins