* Mysterious RTC hangs on x86_64 - fixed, sort of
@ 2007-05-02 22:36 Zachary Amsden
  2007-05-02 22:56 ` Chuck Ebbert
  0 siblings, 1 reply; 5+ messages in thread
From: Zachary Amsden @ 2007-05-02 22:36 UTC (permalink / raw)
  To: Marcos Pinto, Andi Kleen, Linux Kernel Mailing List,
	Alessandro Zummo

[-- Attachment #1: Type: text/plain, Size: 447 bytes --]

With this patch, /sbin/hwclock no longer hangs my AMD64 machine when run 
after reaching multiuser.  What I don't understand is why.  I have the 
RTC based sound sequencer timer as a module, but not loaded, and the 
error message I added to indicate broken rtc control does not fire.

So why is it that if I stop taking rtc_task_lock, which should never be
held, and stop issuing the callbacks, which should not exist, my system
no longer hard freezes?

Zach

[-- Attachment #2: x86_64-rtc-mystery.patch --]
[-- Type: text/x-patch, Size: 810 bytes --]

--- /tmp/a      2007-05-03 15:36:07.451256181 -0700
+++ drivers/char/rtc.c  2007-05-03 15:27:49.000000000 -0700
@@ -265,10 +265,10 @@
        spin_unlock (&rtc_lock);
 
        /* Now do the rest of the actions */
-       spin_lock(&rtc_task_lock);
-       if (rtc_callback)
-               rtc_callback->func(rtc_callback->private_data);
-       spin_unlock(&rtc_task_lock);
+/*     spin_lock(&rtc_task_lock); */
+//     if (rtc_callback)
+//             rtc_callback->func(rtc_callback->private_data);
+/*     spin_unlock(&rtc_task_lock); */
        wake_up_interruptible(&rtc_wait);       
 
        kill_fasync (&rtc_async_queue, SIGIO, POLL_IN);
@@ -811,6 +811,7 @@
 
 int rtc_register(rtc_task_t *task)
 {
+       printk(KERN_ERR "rtc_register is busted\n");
 #ifndef RTC_IRQ
        return -EIO;
 #else

[-- Attachment #3: rtc.config --]
[-- Type: text/plain, Size: 323 bytes --]

CONFIG_HPET_EMULATE_RTC=y
CONFIG_RTC=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_SND_RTCTIMER=m
CONFIG_SND_SEQ_RTCTIMER_DEFAULT=y
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
CONFIG_RTC_DEBUG=y
# RTC interfaces
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y


* Re: Mysterious RTC hangs on x86_64 - fixed, sort of
  2007-05-02 22:36 Mysterious RTC hangs on x86_64 - fixed, sort of Zachary Amsden
@ 2007-05-02 22:56 ` Chuck Ebbert
  2007-05-03  0:05   ` Zachary Amsden
  2007-05-03  4:23   ` Zachary Amsden
  0 siblings, 2 replies; 5+ messages in thread
From: Chuck Ebbert @ 2007-05-02 22:56 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Marcos Pinto, Andi Kleen, Linux Kernel Mailing List,
	Alessandro Zummo

Zachary Amsden wrote:
> With this patch, /sbin/hwclock no longer hangs my AMD64 machine when run
> after reaching multiuser.  What I don't understand is why.  I have the
> RTC based sound sequencer timer as a module, but not loaded, and the
> error message I added to indicate broken rtc control does not fire.
> 
> So why is it that if I stop taking rtc_task_lock, which should never be
> held, and stop issuing the callbacks, which should not exist, my system
> no longer hard freezes?
> 
> --- /tmp/a      2007-05-03 15:36:07.451256181 -0700
> +++ drivers/char/rtc.c  2007-05-03 15:27:49.000000000 -0700
> @@ -265,10 +265,10 @@
>         spin_unlock (&rtc_lock);
>  
>         /* Now do the rest of the actions */
> -       spin_lock(&rtc_task_lock);
> -       if (rtc_callback)
> -               rtc_callback->func(rtc_callback->private_data);
> -       spin_unlock(&rtc_task_lock);
> +/*     spin_lock(&rtc_task_lock); */
> +//     if (rtc_callback)
> +//             rtc_callback->func(rtc_callback->private_data);
> +/*     spin_unlock(&rtc_task_lock); */
>         wake_up_interruptible(&rtc_wait);       

Try leaving the spinlocks and just disabling the callbacks. And maybe
enable spinlock debugging...

> 
> CONFIG_HPET_EMULATE_RTC=y

Did you try without that?



* Re: Mysterious RTC hangs on x86_64 - fixed, sort of
  2007-05-02 22:56 ` Chuck Ebbert
@ 2007-05-03  0:05   ` Zachary Amsden
  2007-05-03  4:23   ` Zachary Amsden
  1 sibling, 0 replies; 5+ messages in thread
From: Zachary Amsden @ 2007-05-03  0:05 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Marcos Pinto, Andi Kleen, Linux Kernel Mailing List,
	Alessandro Zummo

Chuck Ebbert wrote:

Well, turns out this is a heisenbug.  Which is good, since it means the 
nop patch didn't change anything.

> Try leaving the spinlocks and just disabling the callbacks. And maybe
> enable spinlock debugging...
>   

I tried removing all the spinlocks inside the interrupt handler.  It
seemed to work fine for a while, but it still hung.  (At worst, it looks
like missing locks would make us read or write the wrong CMOS register,
not hang or crash.)

So I took the second CPU down with hotplug (I have not tried a UP kernel
yet).  It took longer, but it still hung.  It seems not to be a spinlock
problem, but I'll turn on debugging anyway.

>   
>> CONFIG_HPET_EMULATE_RTC=y
>>     
>
> Did you try without that?
>   

Will do.  That looks much more suspicious.  I thought I had already
disabled it, but I had only turned off this:

# CONFIG_HPET_RTC_IRQ is not set

If that still crashes, I'll try running CMOS accesses in a loop in
userspace to see whether the port I/O is tickling a chipset bug (the
only other report I know of is on the same chipset, nVidia MCP51).
Maybe the SMM handler is accessing CMOS or something equally whacked
out.  <laughs hysterically... stops laughing when the theory actually
sounds plausible>.  Being stuck in SMM is not good for CPU thermal
throttling ... hopefully Turions don't reach the nuclear emission point.

That might also explain why the NMI watchdog doesn't seem to notice
anything wrong.


Thanks,
Zach


* Re: Mysterious RTC hangs on x86_64 - fixed, sort of
  2007-05-02 22:56 ` Chuck Ebbert
  2007-05-03  0:05   ` Zachary Amsden
@ 2007-05-03  4:23   ` Zachary Amsden
  2007-05-03  9:00     ` Andi Kleen
  1 sibling, 1 reply; 5+ messages in thread
From: Zachary Amsden @ 2007-05-03  4:23 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Marcos Pinto, Andi Kleen, Linux Kernel Mailing List,
	Alessandro Zummo

Chuck Ebbert wrote:
>
>   
>> CONFIG_HPET_EMULATE_RTC=y
>>     
>
> Did you try without that?
>   

Just did.  It still hangs the same way; strace shows /sbin/hwclock dying
after hundreds of RTC_RD_TIME calls.  And now /proc/interrupts shows no
rtc interrupts being generated (expected, I guess).  It seems to take
longer to crash, but this is a heisenbug.

Enough crashing for today.  Strangest thing is the NMI watchdog does not 
fire...

Zach


* Re: Mysterious RTC hangs on x86_64 - fixed, sort of
  2007-05-03  4:23   ` Zachary Amsden
@ 2007-05-03  9:00     ` Andi Kleen
  0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2007-05-03  9:00 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Chuck Ebbert, Marcos Pinto, Linux Kernel Mailing List,
	Alessandro Zummo


> Enough crashing for today.  Strangest thing is the NMI watchdog does not 
> fire...

It's disabled now by default.

-Andi
