public inbox for linux-kernel@vger.kernel.org
* lost ticks and Hangcheck
@ 2005-08-19  7:41 Nathan Becker
  2005-08-19  9:45 ` Kurt Wall
  2005-08-30 13:47 ` Frank van Maarseveen
  0 siblings, 2 replies; 8+ messages in thread
From: Nathan Becker @ 2005-08-19  7:41 UTC (permalink / raw)
  To: linux-kernel

Hi,

I'm running kernel 2.6.12.5 with x86_64 target on an AMD X2 4800+ and 
Gigabyte GA-K8NXP-SLI motherboard (bios version F8).  I'm having a problem 
with lost clock ticks.  The dmesg says

warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts

Also if I enable hangcheck, then I get a huge number of Hangcheck messages 
in dmesg.

The main other symptom is that the system clock runs fast and 
inaccurately.  It seems to run more inaccurately when I'm using the CPU, 
and be basically OK when idling.

I've tried various workarounds that I found suggested on this list and 
others but the problem is still there.  I tried using noapic, turning on 
RTC interrupt, also no_timer_check.  I also tried patching the CPU 
frequency scaling code with the latest version from the AMD website 
(1.50.03), and then finally turning that option off. Nothing helped.

I'm not sure if this is a bug in the kernel or if I'm just doing something 
incorrectly.  Any thoughts or suggestions, or, if it is a bug, an ETA for a 
fix, would be much appreciated.

I'm not a regular subscriber to this list, so please cc any responses 
directly to me.

thanks very much,

Nathan


* Re: lost ticks and Hangcheck
  2005-08-19  7:41 lost ticks and Hangcheck Nathan Becker
@ 2005-08-19  9:45 ` Kurt Wall
  2005-08-20  0:22   ` Nathan Becker
  2005-08-30 13:47 ` Frank van Maarseveen
  1 sibling, 1 reply; 8+ messages in thread
From: Kurt Wall @ 2005-08-19  9:45 UTC (permalink / raw)
  To: Nathan Becker; +Cc: linux-kernel

On Fri, Aug 19, 2005 at 12:41:07AM -0700, Nathan Becker took 37 lines to write:
> Hi,
> 
> I'm running kernel 2.6.12.5 with x86_64 target on an AMD X2 4800+ and 
> Gigabyte GA-K8NXP-SLI motherboard (bios version F8).  I'm having a problem 
> with lost clock ticks.  The dmesg says
> 
> warning: many lost ticks.
> Your time source seems to be instable or some driver is hogging interupts
> 
> Also if I enable hangcheck, then I get a huge number of Hangcheck messages 
> in dmesg.
> 
> The main other symptom is that the system clock runs fast and 
> inaccurately.  It seems to run more inaccurately when I'm using the CPU, 
> and be basically OK when idling.
> 
> I've tried various workarounds that I found suggested on this list and 
> others but the problem is still there.  I tried using noapic, turning on 
> RTC interrupt, also no_timer_check.  I also tried patching the CPU 
> frequency scaling code with the latest version from the AMD website 
> (1.50.03), and then finally turning that option off. Nothing helped.

I use the no_timer_check kernel parm and that keeps the clock from
running at double speed. I still see some other annoying boot-time
messages related to timers, but at least my time source is sane:

..MP-BIOS bug: 8254 timer not connected to IO-APIC
 failed.
timer doesn't work through the IO-APIC - disabling NMI Watchdog!
Uhhuh. NMI received for unknown reason 3d.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 works.
Using local APIC timer interrupts.
Detected 12.436 MHz APIC timer.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (1->1)!

Kurt
-- 
"I think it is true for all n. I was just playing it safe with n >= 3
because I couldn't remember the proof."
		-- Baker, Pure Math 351a


* Re: lost ticks and Hangcheck
  2005-08-19  9:45 ` Kurt Wall
@ 2005-08-20  0:22   ` Nathan Becker
  2005-08-20  0:34     ` john stultz
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Becker @ 2005-08-20  0:22 UTC (permalink / raw)
  To: linux-kernel

> I use the no_timer_check kernel parm and that keeps the clock from
> running at double speed. I still see some other annoying boot-time

As I mentioned, no_timer_check doesn't fix it for me.  In fact it makes 
the problem significantly worse.  I tried it again just to be sure.  Also 
I tried noapic again and it doesn't help either.

I found there was an upgrade to the NVIDIA graphics driver that addressed 
a clock issue (I don't know if it's related to my problem).  I upgraded 
from version 7667 to 7676.  That seemed to help a little bit, at least in 
prolonging the amount of time I could reasonably use the system.  Someone 
in another thread mentioned that they thought this problem might be caused 
by something in x.org, which I am using.

Any other ideas or patches would be much appreciated.

thanks for your help,

Nathan


* Re: lost ticks and Hangcheck
  2005-08-20  0:22   ` Nathan Becker
@ 2005-08-20  0:34     ` john stultz
  2005-08-20  9:50       ` Nathan Becker
  0 siblings, 1 reply; 8+ messages in thread
From: john stultz @ 2005-08-20  0:34 UTC (permalink / raw)
  To: Nathan Becker; +Cc: linux-kernel

On Fri, 2005-08-19 at 17:22 -0700, Nathan Becker wrote:
> > I use the no_timer_check kernel parm and that keeps the clock from
> > running at double speed. I still see some other annoying boot-time
> 
> As I mentioned, no_timer_check doesn't fix it for me.  In fact it makes 
> the problem significantly worse.  I tried it again just to be sure.  Also 
> I tried noapic again and it doesn't help either.
> 
> I found there was an upgrade to the NVIDIA graphics driver that addressed 
> a clock issue (I don't know if it's related to my problem).  I upgraded 
> from version 7667 to 7676.  That seemed to help a little bit, at least in 
> prolonging the amount of time I could reasonably use the system.  Someone 
> in another thread mentioned that they thought this problem might be caused 
> by something in x.org, which I am using.

Please make sure this issue is reproducible without any binary only
drivers.

> Any other ideas or patches would be much appreciated.


If it happens w/o binary only drivers, could you open a bug at
bugzilla.kernel.org and provide full dmesg output?

Also check bug #3341 to see if it is at all similar to what you are
seeing.

thanks
-john




* Re: lost ticks and Hangcheck
  2005-08-20  0:34     ` john stultz
@ 2005-08-20  9:50       ` Nathan Becker
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Becker @ 2005-08-20  9:50 UTC (permalink / raw)
  To: linux-kernel

> Please make sure this issue is reproducible without any binary only
> drivers.

I uninstalled the NVIDIA drivers and tried again with the nv x.org driver. 
Same problem.  I also tried remaining in text mode (with no NVIDIA drivers 
loaded).  Same problem.  In both cases it occurs when I start seriously 
loading the CPU.

One thing that may be of interest is that the message in dmesg is 
different if I'm in text mode vs. x.org.  If I'm in text mode the message 
is:

Losing some ticks... checking if CPU frequency changed.

If I'm running x.org then I get

warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip default_idle+0x20/0x30

OK, I'll open a bug report in bugzilla.  I don't think this is the same as 
bug #3341.  My clock comes up normal 100% of the time on boot up.  Things 
only go awry when I start putting a load on the CPU.

Thanks very much for your help, and please cc. me if you find anything out 
since I'm not a regular subscriber.

Nathan


* Re: lost ticks and Hangcheck
  2005-08-19  7:41 lost ticks and Hangcheck Nathan Becker
  2005-08-19  9:45 ` Kurt Wall
@ 2005-08-30 13:47 ` Frank van Maarseveen
  2005-09-04 11:39   ` 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast Frank van Maarseveen
  1 sibling, 1 reply; 8+ messages in thread
From: Frank van Maarseveen @ 2005-08-30 13:47 UTC (permalink / raw)
  To: Nathan Becker; +Cc: linux-kernel

On Fri, Aug 19, 2005 at 12:41:07AM -0700, Nathan Becker wrote:
> Hi,
> 
> I'm running kernel 2.6.12.5 with x86_64 target on an AMD X2 4800+ and 
> Gigabyte GA-K8NXP-SLI motherboard (bios version F8).  I'm having a problem 
> with lost clock ticks.  The dmesg says
> 
> warning: many lost ticks.
> Your time source seems to be instable or some driver is hogging interupts
> 
> Also if I enable hangcheck, then I get a huge number of Hangcheck messages 
> in dmesg.

I get a lot of "kernel: Hangcheck: hangcheck value past margin!" messages
from 2.6.13-rc7 on AMD64 X2 3800+ and Asus A8V deluxe motherboard. No lost
ticks messages however.

> 
> The main other symptom is that the system clock runs fast and 
> inaccurately.  It seems to run more inaccurately when I'm using the CPU, 
> and be basically OK when idling.

That seems to be the case here too: clock runs too fast under heavy load
(burn-in tests involving kernel builds and large disk copies).

-- 
Frank


* 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast
  2005-08-30 13:47 ` Frank van Maarseveen
@ 2005-09-04 11:39   ` Frank van Maarseveen
  2005-09-04 14:27     ` Daniel Jacobowitz
  0 siblings, 1 reply; 8+ messages in thread
From: Frank van Maarseveen @ 2005-09-04 11:39 UTC (permalink / raw)
  To: linux-kernel

After replacing the kernel on a fresh FC4 install with a stock 2.6.13
(using gcc 3.2) and my own config it appears that the clock is going too
fast: it gains at least an hour every 12 hours or so. The FC4 kernel (rpm:
kernel-2.6.11-1.1369_FC4) seems OK.
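
For scale, a gain of one hour per twelve hours of real time is a clock
running roughly 8.3% fast. A minimal sketch of that arithmetic, assuming a
POSIX shell with awk; drift_percent is a hypothetical helper name used only
for illustration:

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: drift_percent GAINED_SECONDS ELAPSED_SECONDS
# Prints how fast the clock runs, as a percentage of real time.
drift_percent() {
	awk -v g="$1" -v e="$2" 'BEGIN { printf "%.1f\n", 100 * g / e }'
}

# One hour (3600 s) gained over twelve hours (43200 s) of real time.
drift_percent 3600 43200
```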

I tried the following from another system with reliable clock:

	for i in `yes|head -100`
	do
		/usr/bin/time -f %e rsh system_with_buggy_clock sleep 1
	done | cat -n

annotated output:

     1	1.03
     2	1.03
     3	1.03
     4	1.03
     5	1.03
     6	1.03
     7	1.02
     8	1.03
     9	1.03
    10	1.03
    11	1.03
    12	1.03
    13	1.03
    14	1.03
    15	1.03
    16	0.72		<==
    17	1.03
    18	1.03
    19	1.03
    20	1.03
    21	1.03
    22	1.03
    23	1.03
    24	1.02
    25	1.03
    26	1.03
    27	1.03
    28	1.03
    29	1.03
    30	1.03
    31	1.03
    32	1.03
    33	1.03
    34	0.14		<==
    35	1.03
    36	1.03
    37	1.03
    38	1.03
    39	1.03
    40	1.03
    41	1.03
    42	1.02
    43	1.03
    44	1.03
    45	1.03
    46	1.03
    47	1.03
    48	1.03
    49	1.03
    50	1.03
    51	1.03
    52	0.18		<==
    53	1.03
    54	1.03
    55	1.03
    56	1.03
    57	1.03
    58	1.03
    59	1.03
    60	1.02
    61	1.03
    62	1.03
    63	1.03
    64	1.04
    65	1.03
    66	1.03
    67	1.03
    68	1.03
    69	1.03
    70	0.13		<==
    71	1.03
    72	1.03
    73	1.03
    74	1.03
    75	1.03
    76	1.03
    77	1.03
    78	1.02
    79	1.03
    80	1.03
    81	1.03
    82	1.03
    83	1.03
    84	1.03
    85	1.03
    86	1.03
    87	1.03
    88	0.15		<==
    89	1.03
    90	1.03
    91	1.03
    92	1.03
    93	1.03
    94	1.03
    95	1.03
    96	1.02
    97	1.03
    98	1.03
    99	1.03
   100	1.03
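
The runs marked above can also be picked out mechanically. A minimal
sketch, assuming the timing loop's output ("N elapsed" per line, as printed
by cat -n) is piped in on stdin; flag_short_sleeps and the 0.9 second
default threshold are hypothetical choices, not part of the original test:

```shell
#!/bin/sh
# Hypothetical helper: not part of the original test.
# Reads "N elapsed" lines on stdin and prints the run number and elapsed
# time of every run where "sleep 1" appeared to finish in under MIN seconds.
# Usage: flag_short_sleeps [MIN]     (MIN defaults to 0.9)
flag_short_sleeps() {
	awk -v min="${1:-0.9}" 'NF >= 2 && $2 + 0 < min + 0 { print $1, $2 }'
}

# A few samples copied from the output above.
printf '    15\t1.03\n    16\t0.72\n    17\t1.03\n    34\t0.14\n' |
	flag_short_sleeps
```

The threshold can be tightened or loosened by passing a different MIN;
lines that carry a trailing annotation are still parsed, since only the
first two fields are used.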

I also ran the script below on the system with the unstable clock. It
measures timer interrupts per CPU as visible in /proc/interrupts, which
looks like this:

           CPU0       CPU1       
  0:    6741707    5860969    IO-APIC-edge  timer
  1:         45         10    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 14:     807745     907612    IO-APIC-edge  ide0
 15:     834978     871118    IO-APIC-edge  ide1
 17:   45336986   45939432   IO-APIC-level  SysKonnect SK-98xx
 18:          0          0   IO-APIC-level  libata
 21:          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 22:          0          0   IO-APIC-level  VIA8237
NMI:          0          0 
LOC:   12601494   12601519 
ERR:          0
MIS:          0

script:

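	#!/bin/sh
	# Sample /proc/interrupts once per second, 100 times, and print how
	# many timer interrupts (IRQ 0) each CPU handled in that interval.

	for i in `yes|head -100`
	do
		s1=`cat /proc/interrupts`
		sleep 1
		s2=`cat /proc/interrupts`

		# Fields 2 and 3 of the "0:" row are the CPU0 and CPU1 counts.
		t10=`echo "$s1" | awk '$1=="0:"{ print $2}'`
		t11=`echo "$s1" | awk '$1=="0:"{ print $3}'`
		t20=`echo "$s2" | awk '$1=="0:"{ print $2}'`
		t21=`echo "$s2" | awk '$1=="0:"{ print $3}'`
		# Per-CPU deltas over the interval, then their sum.
		d1=`expr $t20 - $t10`
		d2=`expr $t21 - $t11`
		echo $d1 + $d2 = `expr $d1 + $d2`
	done | cat -n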

annotated output:

      CPU0 CPU1   Total
-----------------------
     1  0 + 251 = 251
     2  0 + 251 = 251
     3  0 + 251 = 251
     4  0 + 251 = 251
     5  0 + 251 = 251
     6  52 + 196 = 248		<== (?)
     7  251 + 0 = 251
     8  251 + 0 = 251
     9  251 + 0 = 251
    10  251 + 0 = 251
    11  251 + 0 = 251
    12  251 + 0 = 251
    13  251 + 0 = 251
    14  251 + 0 = 251
    15  251 + 0 = 251
    16  147 + 1 = 148		<==
    17  0 + 252 = 252
    18  0 + 251 = 251
    19  0 + 251 = 251
    20  0 + 251 = 251
    21  0 + 251 = 251
    22  0 + 252 = 252
    23  0 + 251 = 251
    24  72 + 177 = 249		<== (?)
    25  252 + 0 = 252
    26  252 + 0 = 252
    27  252 + 0 = 252
    28  252 + 0 = 252
    29  252 + 0 = 252
    30  252 + 0 = 252
    31  252 + 0 = 252
    32  253 + 0 = 253
    33  253 + 0 = 253
    34  118 + 2 = 120		<==
    35  0 + 253 = 253
    36  0 + 253 = 253
    37  0 + 253 = 253
    38  0 + 253 = 253
    39  0 + 252 = 252
    40  0 + 252 = 252
    41  0 + 252 = 252
    42  78 + 171 = 249		<== (?)
    43  252 + 0 = 252
    44  252 + 0 = 252
    45  252 + 0 = 252
    46  252 + 0 = 252
    47  251 + 0 = 251
    48  251 + 0 = 251
    49  251 + 0 = 251
    50  251 + 0 = 251
    51  251 + 0 = 251
    52  121 + 1 = 122		<==
    53  0 + 251 = 251
    54  0 + 251 = 251
    55  0 + 251 = 251
    56  0 + 251 = 251
    57  0 + 251 = 251
    58  0 + 251 = 251
    59  0 + 251 = 251
    60  69 + 179 = 248		<== (?)
    61  251 + 0 = 251
    62  251 + 0 = 251
    63  251 + 0 = 251
    64  251 + 0 = 251
    65  251 + 0 = 251
    66  251 + 0 = 251
    67  251 + 0 = 251
    68  251 + 0 = 251
    69  251 + 0 = 251
    70  130 + 1 = 131		<==
    71  0 + 252 = 252
    72  0 + 252 = 252
    73  0 + 252 = 252
    74  0 + 252 = 252
    75  0 + 252 = 252
    76  0 + 252 = 252
    77  0 + 252 = 252
    78  77 + 172 = 249		<== (?)
    79  253 + 0 = 253
    80  253 + 0 = 253
    81  253 + 0 = 253
    82  253 + 0 = 253
    83  253 + 0 = 253
    84  253 + 0 = 253
    85  252 + 0 = 252
    86  252 + 0 = 252
    87  252 + 0 = 252
    88  112 + 2 = 114		<==
    89  0 + 252 = 252
    90  0 + 252 = 252
    91  0 + 252 = 252
    92  0 + 252 = 252
    93  0 + 252 = 252
    94  0 + 252 = 252
    95  0 + 251 = 251
    96  0 + 251 = 251
    97  0 + 251 = 251
    98  53 + 195 = 248		<== (?)
    99  251 + 0 = 251
   100  251 + 0 = 251
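
The intervals marked above can be found without reading the whole list by
eye. A minimal sketch, assuming the script's "N  d1 + d2 = total" lines are
piped in on stdin; flag_tick_anomalies is a hypothetical helper, and both
the expected rate of 250 ticks per second (inferred from the ~251 readings)
and the tolerance of 10 are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: not part of the original script.
# Reads "N  d1 + d2 = total" lines on stdin and prints the sample number
# and total for every interval whose total differs from HZ by more than TOL.
# Usage: flag_tick_anomalies [HZ] [TOL]   (defaults: 250 and 10, both assumed)
flag_tick_anomalies() {
	awk -v hz="${1:-250}" -v tol="${2:-10}" '
		NF >= 6 && $5 == "=" {
			diff = $6 - hz
			if (diff < 0) diff = -diff
			if (diff + 0 > tol + 0) print $1, $6
		}'
}

# A few samples copied from the output above.
printf '    15  251 + 0 = 251\n    16  147 + 1 = 148\n    17  0 + 252 = 252\n' |
	flag_tick_anomalies
```

With these defaults only the intervals with a large shortfall are reported;
the intervals where the interrupt merely moves from one CPU to the other
(totals of 248 or 249) stay within tolerance.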

The hangcheck timer goes off when configured.

-- 
Frank


* Re: 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast
  2005-09-04 11:39   ` 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast Frank van Maarseveen
@ 2005-09-04 14:27     ` Daniel Jacobowitz
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Jacobowitz @ 2005-09-04 14:27 UTC (permalink / raw)
  To: Frank van Maarseveen; +Cc: linux-kernel

On Sun, Sep 04, 2005 at 01:39:15PM +0200, Frank van Maarseveen wrote:
> After replacing the kernel on a fresh FC4 install with a stock 2.6.13
> (using gcc 3.2) and my own config it appears that the clock is going too
> fast: it gains at least an hour every 12 hours or so. The FC4 kernel (rpm:
> kernel-2.6.11-1.1369_FC4) seems OK.

Mind sticking this information in bugzilla.kernel.org, bug 5105?

> annotated output:
> 
>       CPU0 CPU1   Total
> -----------------------
>      1  0 + 251 = 251
>      2  0 + 251 = 251
>      3  0 + 251 = 251
>      4  0 + 251 = 251
>      5  0 + 251 = 251
>      6  52 + 196 = 248		<== (?)
>      7  251 + 0 = 251
>      8  251 + 0 = 251
>      9  251 + 0 = 251
>     10  251 + 0 = 251
>     11  251 + 0 = 251
>     12  251 + 0 = 251
>     13  251 + 0 = 251
>     14  251 + 0 = 251
>     15  251 + 0 = 251
>     16  147 + 1 = 148		<==
>     17  0 + 252 = 252

Hmmmmmmmmmmmmmmmmmmmmmm, very interesting.


-- 
Daniel Jacobowitz
CodeSourcery, LLC



