* lost ticks and Hangcheck
@ 2005-08-19 7:41 Nathan Becker
2005-08-19 9:45 ` Kurt Wall
2005-08-30 13:47 ` Frank van Maarseveen
0 siblings, 2 replies; 8+ messages in thread
From: Nathan Becker @ 2005-08-19 7:41 UTC (permalink / raw)
To: linux-kernel
Hi,
I'm running kernel 2.6.12.5 with x86_64 target on an AMD X2 4800+ and
Gigabyte GA-K8NXP-SLI motherboard (bios version F8). I'm having a problem
with lost clock ticks. The dmesg says
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
Also if I enable hangcheck, then I get a huge number of Hangcheck messages
in dmesg.
The main other symptom is that the system clock runs fast and
inaccurately. It seems to run more inaccurately when I'm using the CPU,
and be basically OK when idling.
I've tried various workarounds that I found suggested on this list and
others but the problem is still there. I tried using noapic, turning on
RTC interrupt, also no_timer_check. I also tried patching the CPU
frequency scaling code with the latest version from the AMD website
(1.50.03), and then finally turning that option off. Nothing helped.
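When juggling several boot-time workarounds like these, it can help to confirm which ones the running kernel was actually booted with. The following is only a minimal sketch: it assumes /proc/cmdline is readable, and has_boot_param is a hypothetical helper name, not something from this thread.

```shell
#!/bin/sh
# has_boot_param is a hypothetical helper (an assumption for illustration).
# $1 = exact parameter name, $2 = kernel command line text.
has_boot_param() {
    # Unquoted $2 is split on whitespace into individual boot parameters.
    for word in $2; do
        [ "$word" = "$1" ] && return 0
    done
    return 1
}

# Report the two parameters mentioned above for the running kernel.
cmdline=$(cat /proc/cmdline 2>/dev/null)
for p in noapic no_timer_check; do
    if has_boot_param "$p" "$cmdline"; then
        echo "$p: set"
    else
        echo "$p: not set"
    fi
done
```

The match is on whole words, so a check for "apic" does not accidentally match "noapic".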
I'm not sure if this is a bug in the kernel or if I'm just doing something
incorrectly. Any thoughts or suggestions, or, if it is a bug, an ETA for a
fix, would be much appreciated.
I'm not a regular subscriber to this list, so please cc any responses
directly to me.
thanks very much,
Nathan
* Re: lost ticks and Hangcheck
2005-08-19 7:41 lost ticks and Hangcheck Nathan Becker
@ 2005-08-19 9:45 ` Kurt Wall
2005-08-20 0:22 ` Nathan Becker
2005-08-30 13:47 ` Frank van Maarseveen
1 sibling, 1 reply; 8+ messages in thread
From: Kurt Wall @ 2005-08-19 9:45 UTC (permalink / raw)
To: Nathan Becker; +Cc: linux-kernel
On Fri, Aug 19, 2005 at 12:41:07AM -0700, Nathan Becker took 37 lines to write:
> Hi,
>
> I'm running kernel 2.6.12.5 with x86_64 target on an AMD X2 4800+ and
> Gigabyte GA-K8NXP-SLI motherboard (bios version F8). I'm having a problem
> with lost clock ticks. The dmesg says
>
> warning: many lost ticks.
> Your time source seems to be instable or some driver is hogging interupts
>
> Also if I enable hangcheck, then I get a huge number of Hangcheck messages
> in dmesg.
>
> The main other symptom is that the system clock runs fast and
> inaccurately. It seems to run more inaccurately when I'm using the CPU,
> and be basically OK when idling.
>
> I've tried various workarounds that I found suggested on this list and
> others but the problem is still there. I tried using noapic, turning on
> RTC interrupt, also no_timer_check. I also tried patching the CPU
> frequency scaling code with the latest version from the AMD website
> (1.50.03), and then finally turning that option off. Nothing helped.
I use the no_timer_check kernel parm and that keeps the clock from
running at double speed. I still see some other annoying boot-time
messages related to timers, but at least my time source is sane:
..MP-BIOS bug: 8254 timer not connected to IO-APIC
failed.
timer doesn't work through the IO-APIC - disabling NMI Watchdog!
Uhhuh. NMI received for unknown reason 3d.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
works.
Using local APIC timer interrupts.
Detected 12.436 MHz APIC timer.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (1->1)!
Kurt
--
"I think it is true for all n. I was just playing it safe with n >= 3
because I couldn't remember the proof."
-- Baker, Pure Math 351a
* Re: lost ticks and Hangcheck
2005-08-19 9:45 ` Kurt Wall
@ 2005-08-20 0:22 ` Nathan Becker
2005-08-20 0:34 ` john stultz
0 siblings, 1 reply; 8+ messages in thread
From: Nathan Becker @ 2005-08-20 0:22 UTC (permalink / raw)
To: linux-kernel
> I use the no_timer_check kernel parm and that keeps the clock from
> running at double speed. I still see some other annoying boot-time
As I mentioned, no_timer_check doesn't fix it for me. In fact it makes
the problem significantly worse. I tried it again just to be sure. Also
I tried noapic again and it doesn't help either.
I found there was an upgrade to the NVIDIA graphics driver that addressed
a clock issue (I don't know if it's related to my problem). I upgraded
from version 7667 to 7676. That seemed to help a little bit, at least in
prolonging the amount of time I could reasonably use the system. Someone
in another thread mentioned that they thought this problem might be caused
by something in x.org, which I am using.
Any other ideas or patches would be much appreciated.
thanks for your help,
Nathan
* Re: lost ticks and Hangcheck
2005-08-20 0:22 ` Nathan Becker
@ 2005-08-20 0:34 ` john stultz
2005-08-20 9:50 ` Nathan Becker
0 siblings, 1 reply; 8+ messages in thread
From: john stultz @ 2005-08-20 0:34 UTC (permalink / raw)
To: Nathan Becker; +Cc: linux-kernel
On Fri, 2005-08-19 at 17:22 -0700, Nathan Becker wrote:
> > I use the no_timer_check kernel parm and that keeps the clock from
> > running at double speed. I still see some other annoying boot-time
>
> As I mentioned, no_timer_check doesn't fix it for me. In fact it makes
> the problem significantly worse. I tried it again just to be sure. Also
> I tried noapic again and it doesn't help either.
>
> I found there was an upgrade to the NVIDIA graphics driver that addressed
> a clock issue (I don't know if it's related to my problem). I upgraded
> from version 7667 to 7676. That seemed to help a little bit, at least in
> prolonging the amount of time I could reasonably use the system. Someone
> in another thread mentioned that they thought this problem might be caused
> by something in x.org, which I am using.
Please make sure this issue is reproducible without any binary only
drivers.
> Any other ideas or patches would be much appreciated.
If it happens w/o binary only drivers, could you open a bug at
bugzilla.kernel.org and provide full dmesg output?
Also check bug #3341 to see if it is at all similar to what you are
seeing.
thanks
-john
* Re: lost ticks and Hangcheck
2005-08-20 0:34 ` john stultz
@ 2005-08-20 9:50 ` Nathan Becker
0 siblings, 0 replies; 8+ messages in thread
From: Nathan Becker @ 2005-08-20 9:50 UTC (permalink / raw)
To: linux-kernel
> Please make sure this issue is reproducible without any binary only
> drivers.
I uninstalled the NVIDIA drivers and tried again with the nv x.org driver.
Same problem. I also tried remaining in text mode (with no NVIDIA drivers
loaded). Same problem. In both cases it occurs when I start seriously
loading the CPU.
One thing that may be of interest is that the message in dmesg is
different if I'm in text mode vs. x.org. If I'm in text mode the message
is:
Losing some ticks... checking if CPU frequency changed.
If I'm running x.org then I get
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip default_idle+0x20/0x30
OK, I'll open a bug report in bugzilla. I don't think this is the same as
bug #3341. My clock comes up normal 100% of the time on boot up. Things
only go awry when I start putting a load on the CPU.
Thanks very much for your help, and please cc me if you find anything out,
since I'm not a regular subscriber.
Nathan
* Re: lost ticks and Hangcheck
2005-08-19 7:41 lost ticks and Hangcheck Nathan Becker
2005-08-19 9:45 ` Kurt Wall
@ 2005-08-30 13:47 ` Frank van Maarseveen
2005-09-04 11:39 ` 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast Frank van Maarseveen
1 sibling, 1 reply; 8+ messages in thread
From: Frank van Maarseveen @ 2005-08-30 13:47 UTC (permalink / raw)
To: Nathan Becker; +Cc: linux-kernel
On Fri, Aug 19, 2005 at 12:41:07AM -0700, Nathan Becker wrote:
> Hi,
>
> I'm running kernel 2.6.12.5 with x86_64 target on an AMD X2 4800+ and
> Gigabyte GA-K8NXP-SLI motherboard (bios version F8). I'm having a problem
> with lost clock ticks. The dmesg says
>
> warning: many lost ticks.
> Your time source seems to be instable or some driver is hogging interupts
>
> Also if I enable hangcheck, then I get a huge number of Hangcheck messages
> in dmesg.
I get a lot of "kernel: Hangcheck: hangcheck value past margin!" messages
from 2.6.13-rc7 on an AMD64 X2 3800+ with an Asus A8V Deluxe motherboard.
No "lost ticks" messages, however.
>
> The main other symptom is that the system clock runs fast and
> inaccurately. It seems to run more inaccurately when I'm using the CPU,
> and be basically OK when idling.
That seems to be the case here too: clock runs too fast under heavy load
(burn-in tests involving kernel builds and large disk copies).
--
Frank
* 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast
2005-08-30 13:47 ` Frank van Maarseveen
@ 2005-09-04 11:39 ` Frank van Maarseveen
2005-09-04 14:27 ` Daniel Jacobowitz
0 siblings, 1 reply; 8+ messages in thread
From: Frank van Maarseveen @ 2005-09-04 11:39 UTC (permalink / raw)
To: linux-kernel
After replacing the kernel on a fresh FC4 install with a stock 2.6.13
(using gcc 3.2) and my own config it appears that the clock is going too
fast: it gains at least an hour every 12 hours or so. The FC4 kernel (rpm:
kernel-2.6.11-1.1369_FC4) seems OK.
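To put a number on "an hour every 12 hours", here is a small arithmetic sketch. drift_pct is a hypothetical helper name, and the two inputs (seconds elapsed on the suspect clock and on a trusted reference over the same interval) are assumed to have been collected by some other means.

```shell
#!/bin/sh
# drift_pct is a hypothetical helper (an assumption for illustration).
# $1 = seconds elapsed according to the suspect clock
# $2 = seconds elapsed according to a trusted reference clock
# Prints how far the suspect clock ran ahead (positive) or behind
# (negative), as a percentage of the reference interval.
drift_pct() {
    awk -v s="$1" -v r="$2" 'BEGIN { printf "%.1f\n", (s - r) * 100 / r }'
}

# 12 reference hours = 43200 s; a clock that gained one hour shows 46800 s.
drift_pct 46800 43200
```

For the figures above this works out to a gain of roughly 8.3%.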
I tried the following from another system with a reliable clock:
for i in `yes|head -100`
do
    /usr/bin/time -f %e rsh system_with_buggy_clock sleep 1
done | cat -n
annotated output:
1 1.03
2 1.03
3 1.03
4 1.03
5 1.03
6 1.03
7 1.02
8 1.03
9 1.03
10 1.03
11 1.03
12 1.03
13 1.03
14 1.03
15 1.03
16 0.72 <==
17 1.03
18 1.03
19 1.03
20 1.03
21 1.03
22 1.03
23 1.03
24 1.02
25 1.03
26 1.03
27 1.03
28 1.03
29 1.03
30 1.03
31 1.03
32 1.03
33 1.03
34 0.14 <==
35 1.03
36 1.03
37 1.03
38 1.03
39 1.03
40 1.03
41 1.03
42 1.02
43 1.03
44 1.03
45 1.03
46 1.03
47 1.03
48 1.03
49 1.03
50 1.03
51 1.03
52 0.18 <==
53 1.03
54 1.03
55 1.03
56 1.03
57 1.03
58 1.03
59 1.03
60 1.02
61 1.03
62 1.03
63 1.03
64 1.04
65 1.03
66 1.03
67 1.03
68 1.03
69 1.03
70 0.13 <==
71 1.03
72 1.03
73 1.03
74 1.03
75 1.03
76 1.03
77 1.03
78 1.02
79 1.03
80 1.03
81 1.03
82 1.03
83 1.03
84 1.03
85 1.03
86 1.03
87 1.03
88 0.15 <==
89 1.03
90 1.03
91 1.03
92 1.03
93 1.03
94 1.03
95 1.03
96 1.02
97 1.03
98 1.03
99 1.03
100 1.03
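The "<==" markers in the output above were added by hand. As a sketch of how the same annotation could be produced automatically, the filter below marks every run whose elapsed time falls under a threshold. flag_short is a hypothetical helper name, and the 0.9 second default is an assumed cutoff, not a value from the test itself.

```shell
#!/bin/sh
# flag_short is a hypothetical helper (an assumption for illustration).
# Reads "index elapsed_seconds" lines on stdin and appends "  <==" to
# every line whose elapsed time is below the threshold in $1
# (default 0.9, an assumed cutoff).
flag_short() {
    awk -v min="${1:-0.9}" '{
        mark = ($2 + 0 < min + 0) ? "  <==" : ""
        print $1, $2 mark
    }'
}

# Example with three lines taken from the output above.
printf '%s\n' '15 1.03' '16 0.72' '17 1.03' | flag_short 0.9
```

Piping the numbered output of the rsh loop through flag_short should mark the same lines as the manual annotation, given a suitable threshold.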
I also ran the script below on the system with the unstable clock. It counts
timer interrupts per CPU over one-second intervals, as reported in
/proc/interrupts, which looks like this on that system:
           CPU0       CPU1
  0:    6741707    5860969    IO-APIC-edge  timer
  1:         45         10    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 14:     807745     907612    IO-APIC-edge  ide0
 15:     834978     871118    IO-APIC-edge  ide1
 17:   45336986   45939432   IO-APIC-level  SysKonnect SK-98xx
 18:          0          0   IO-APIC-level  libata
 21:          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 22:          0          0   IO-APIC-level  VIA8237
NMI:          0          0
LOC:   12601494   12601519
ERR:          0
MIS:          0
script:
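#!/bin/sh
# Sample the IRQ 0 (timer) counters 100 times, one "sleep 1" apart, and
# print how much each CPU's counter grew during each interval.
for i in `yes|head -100`
do
    s1=`cat /proc/interrupts`
    sleep 1
    s2=`cat /proc/interrupts`
    # Fields 2 and 3 of the "0:" line are the CPU0 and CPU1 counts.
    t10=`echo "$s1" | awk '$1=="0:"{ print $2}'`
    t11=`echo "$s1" | awk '$1=="0:"{ print $3}'`
    t20=`echo "$s2" | awk '$1=="0:"{ print $2}'`
    t21=`echo "$s2" | awk '$1=="0:"{ print $3}'`
    # Per-CPU deltas over the interval, then their sum.
    d1=`expr $t20 - $t10`
    d2=`expr $t21 - $t11`
    echo $d1 + $d2 = `expr $d1 + $d2`
done | cat -n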
annotated output:
CPU0 CPU1 Total
-----------------------
1 0 + 251 = 251
2 0 + 251 = 251
3 0 + 251 = 251
4 0 + 251 = 251
5 0 + 251 = 251
6 52 + 196 = 248 <== (?)
7 251 + 0 = 251
8 251 + 0 = 251
9 251 + 0 = 251
10 251 + 0 = 251
11 251 + 0 = 251
12 251 + 0 = 251
13 251 + 0 = 251
14 251 + 0 = 251
15 251 + 0 = 251
16 147 + 1 = 148 <==
17 0 + 252 = 252
18 0 + 251 = 251
19 0 + 251 = 251
20 0 + 251 = 251
21 0 + 251 = 251
22 0 + 252 = 252
23 0 + 251 = 251
24 72 + 177 = 249 <== (?)
25 252 + 0 = 252
26 252 + 0 = 252
27 252 + 0 = 252
28 252 + 0 = 252
29 252 + 0 = 252
30 252 + 0 = 252
31 252 + 0 = 252
32 253 + 0 = 253
33 253 + 0 = 253
34 118 + 2 = 120 <==
35 0 + 253 = 253
36 0 + 253 = 253
37 0 + 253 = 253
38 0 + 253 = 253
39 0 + 252 = 252
40 0 + 252 = 252
41 0 + 252 = 252
42 78 + 171 = 249 <== (?)
43 252 + 0 = 252
44 252 + 0 = 252
45 252 + 0 = 252
46 252 + 0 = 252
47 251 + 0 = 251
48 251 + 0 = 251
49 251 + 0 = 251
50 251 + 0 = 251
51 251 + 0 = 251
52 121 + 1 = 122 <==
53 0 + 251 = 251
54 0 + 251 = 251
55 0 + 251 = 251
56 0 + 251 = 251
57 0 + 251 = 251
58 0 + 251 = 251
59 0 + 251 = 251
60 69 + 179 = 248 <== (?)
61 251 + 0 = 251
62 251 + 0 = 251
63 251 + 0 = 251
64 251 + 0 = 251
65 251 + 0 = 251
66 251 + 0 = 251
67 251 + 0 = 251
68 251 + 0 = 251
69 251 + 0 = 251
70 130 + 1 = 131 <==
71 0 + 252 = 252
72 0 + 252 = 252
73 0 + 252 = 252
74 0 + 252 = 252
75 0 + 252 = 252
76 0 + 252 = 252
77 0 + 252 = 252
78 77 + 172 = 249 <== (?)
79 253 + 0 = 253
80 253 + 0 = 253
81 253 + 0 = 253
82 253 + 0 = 253
83 253 + 0 = 253
84 253 + 0 = 253
85 252 + 0 = 252
86 252 + 0 = 252
87 252 + 0 = 252
88 112 + 2 = 114 <==
89 0 + 252 = 252
90 0 + 252 = 252
91 0 + 252 = 252
92 0 + 252 = 252
93 0 + 252 = 252
94 0 + 252 = 252
95 0 + 251 = 251
96 0 + 251 = 251
97 0 + 251 = 251
98 53 + 195 = 248 <== (?)
99 251 + 0 = 251
100 251 + 0 = 251
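The script above is hard-wired to two CPUs. As a sketch only, the per-CPU extraction could be done in a single awk pass that works for any number of CPU columns. irq_counts is a hypothetical helper name, and the assumption is that the count columns are the leading all-digit fields after the IRQ label, as in the /proc/interrupts listing shown earlier.

```shell
#!/bin/sh
# irq_counts is a hypothetical helper (an assumption for illustration).
# Reads a /proc/interrupts snapshot on stdin and, for the IRQ named in
# $1, prints each per-CPU count followed by "= total".
irq_counts() {
    awk -v irq="$1:" '
        $1 == irq {
            total = 0
            # The per-CPU counts are the leading all-digit fields.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) {
                printf "%d ", $i
                total += $i
            }
            printf "= %d\n", total
        }'
}

# Example using the timer line from the listing above.
printf '%s\n' '  0:    6741707    5860969    IO-APIC-edge  timer' | irq_counts 0
```

On a live system the snapshot would come from the file itself, for example by redirecting /proc/interrupts into irq_counts 0.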
The hangcheck timer goes off when configured.
--
Frank
* Re: 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast
2005-09-04 11:39 ` 2.6.13 SMP on Athlon X2: nanosleep returning waay to soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast Frank van Maarseveen
@ 2005-09-04 14:27 ` Daniel Jacobowitz
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Jacobowitz @ 2005-09-04 14:27 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: linux-kernel
On Sun, Sep 04, 2005 at 01:39:15PM +0200, Frank van Maarseveen wrote:
> After replacing the kernel on a fresh FC4 install with a stock 2.6.13
> (using gcc 3.2) and my own config it appears that the clock is going too
> fast: it gains at least an hour every 12 hours or so. FC4 kernel (rpm:
> kernel-2.6.11-1.1369_FC4) seems ok
Mind sticking this information in bugzilla.kernel.org, bug 5105?
> annotated output:
>
> CPU0 CPU1 Total
> -----------------------
> 1 0 + 251 = 251
> 2 0 + 251 = 251
> 3 0 + 251 = 251
> 4 0 + 251 = 251
> 5 0 + 251 = 251
> 6 52 + 196 = 248 <== (?)
> 7 251 + 0 = 251
> 8 251 + 0 = 251
> 9 251 + 0 = 251
> 10 251 + 0 = 251
> 11 251 + 0 = 251
> 12 251 + 0 = 251
> 13 251 + 0 = 251
> 14 251 + 0 = 251
> 15 251 + 0 = 251
> 16 147 + 1 = 148 <==
> 17 0 + 252 = 252
Hmmmmmmmmmmmmmmmmmmmmmm, very interesting.
--
Daniel Jacobowitz
CodeSourcery, LLC