* system (not HW) clock advancing really fast
@ 2004-02-16 5:46 Bill Anderson
2004-02-16 6:24 ` Michael Frank
0 siblings, 1 reply; 7+ messages in thread
From: Bill Anderson @ 2004-02-16 5:46 UTC (permalink / raw)
To: LKML
Kernel version:
2.4.24-xfs
We've apparently had this problem for a while.
OK, I've got an HP LPr, a dual 700MHz Intel machine, whose
system clock is gaining seconds very quickly. This, I am told, has been
happening for several kernels.
At first, others on the team insisted it was the hardware clock at
fault, as rebooting the system gives the appearance of fixing it.
However, the system is currently having this issue, and the HW clock is
actually keeping accurate time, as I expected.
The time gain is not consistent. It can gain 3 seconds in one, or 12 in
11, but it always runs fast. This speedup is too much for ntp to
keep up with. If I sync from hwclock or ntpdate every second, I'm
correcting about 1-3 seconds each time. This is a mail server, so I am
sure you can appreciate the need for accurate timestamps. ;)
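To put those figures in perspective, here is a minimal Python sketch (not part of the original post) that turns a pair of elapsed-time readings into a drift figure in parts per million. The function names and the sample readings in the comments are illustrative assumptions. The 500 ppm constant is the maximum rate at which ntpd will slew a clock, which is why a gain on the order of a second per second is far beyond what it can discipline.

```python
# Illustrative sketch: estimate how fast a system clock runs relative to a
# trusted reference (for example the hardware clock), in parts per million.

NTP_MAX_SLEW_PPM = 500.0  # ntpd will not slew the clock faster than this


def drift_ppm(ref_elapsed: float, sys_elapsed: float) -> float:
    """Return the system clock's drift in ppm over one interval.

    ref_elapsed: seconds that passed according to the reference clock
    sys_elapsed: seconds that passed according to the system clock
    Positive values mean the system clock runs fast.
    """
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    return (sys_elapsed - ref_elapsed) / ref_elapsed * 1_000_000


def ntp_can_correct(ppm: float) -> bool:
    """True if the drift is within what ntpd is able to slew away."""
    return abs(ppm) <= NTP_MAX_SLEW_PPM


# Hypothetical reading: the system clock advanced 2 s while the reference
# advanced 1 s, i.e. a gain of one second per second.
example_ppm = drift_ppm(1.0, 2.0)
```

With the hypothetical reading above, `example_ppm` is one million ppm, several orders of magnitude past the slew limit, so periodic stepping (as with ntpdate or hwclock) is the only thing that can mask the problem.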
I've seen many messages in the archives about *losing* time, but only a
few about gaining it. Personally, I am opposed to the "just reboot it"
mentality; one reason I run Linux.
Given that we are talking about system clock, not HW, and that this
happens with or w/o ntpd/ntpdate, I am suspecting something in the
kernel. Also, this thread leads me there too:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105465355622844&w=2
Am I off base here? I can probably keep the hwclock sync method running
for a day or so before I'm forced to reboot it, so if there is anything
you need to know or want me to try while it is in this state, let me
know.
This address is not subscribed, so please cc me on responses.
Thanks,
Bill
--
Bill Anderson <banderson@hp.com>
Red Hat Certified Engineer
* Re: system (not HW) clock advancing really fast
2004-02-16 5:46 system (not HW) clock advancing really fast Bill Anderson
@ 2004-02-16 6:24 ` Michael Frank
2004-02-16 7:26 ` Bill Anderson
0 siblings, 1 reply; 7+ messages in thread
From: Michael Frank @ 2004-02-16 6:24 UTC (permalink / raw)
To: Bill Anderson, LKML
I had this sometimes when ntpd did a step time update,
resulting in silly values in /etc/adjtime. Try:
# mv /etc/adjtime /tmp
# hwclock --systohc
and see if it goes away.
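Before moving the file away, it can be worth looking at what it contains. The following Python sketch (an illustration, not something posted in the thread) parses the first line of an adjtime file, which hwclock writes as three numbers: the drift factor in seconds per day, the time of the last adjustment in seconds since the epoch, and a status field. The `SILLY_DRIFT_SEC_PER_DAY` threshold and all function names are assumptions chosen for the example.

```python
# Illustrative sketch: inspect the drift factor stored in an adjtime file.

from typing import NamedTuple

# Hypothetical threshold: treat more than a minute per day as suspicious.
SILLY_DRIFT_SEC_PER_DAY = 60.0


class AdjtimeHeader(NamedTuple):
    drift_sec_per_day: float  # systematic drift hwclock attributes to the RTC
    last_adjust_epoch: int    # time of the last adjustment, seconds since epoch
    status: int               # third field, normally zero


def parse_adjtime_header(text: str) -> AdjtimeHeader:
    """Parse the first line of adjtime content: "<drift> <epoch> <status>"."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("adjtime content is empty")
    fields = lines[0].split()
    if len(fields) < 3:
        raise ValueError("expected three fields on the first line")
    # The integer-valued fields are sometimes written with a decimal point.
    return AdjtimeHeader(
        float(fields[0]), int(float(fields[1])), int(float(fields[2]))
    )


def drift_looks_silly(header: AdjtimeHeader,
                      limit: float = SILLY_DRIFT_SEC_PER_DAY) -> bool:
    """True if the stored drift factor exceeds the (assumed) sanity limit."""
    return abs(header.drift_sec_per_day) > limit
```

To use it on a live system, pass the result of reading `/etc/adjtime` to `parse_adjtime_header`. A very large drift factor would fit the "silly values" described above.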
Regards
Michael
* Re: system (not HW) clock advancing really fast
2004-02-16 6:24 ` Michael Frank
@ 2004-02-16 7:26 ` Bill Anderson
2004-02-16 7:45 ` Michael Frank
0 siblings, 1 reply; 7+ messages in thread
From: Bill Anderson @ 2004-02-16 7:26 UTC (permalink / raw)
To: LKML
On Sun, 2004-02-15 at 23:24, Michael Frank wrote:
> I had this sometimes when ntpd did a step time update,
> resulting in silly values in /etc/adjtime. Try:
>
> # mv /etc/adjtime /tmp
> # hwclock --systohc
>
> and see if it goes away.
Thanks, though it didn't work. :(
--
Bill Anderson <banderson@hp.com>
Red Hat Certified Engineer
* Re: system (not HW) clock advancing really fast
2004-02-16 7:26 ` Bill Anderson
@ 2004-02-16 7:45 ` Michael Frank
2004-02-16 7:48 ` Bill Anderson
0 siblings, 1 reply; 7+ messages in thread
From: Michael Frank @ 2004-02-16 7:45 UTC (permalink / raw)
To: Bill Anderson, LKML
On Monday 16 February 2004 15:26, Bill Anderson wrote:
> On Sun, 2004-02-15 at 23:24, Michael Frank wrote:
> > I had this sometimes when ntpd did a step time update,
> > resulting in silly values in /etc/adjtime. Try:
> >
> > # mv /etc/adjtime /tmp
> > # hwclock --systohc
> >
> > and see if it goes away.
>
> Thanks, though it didn't work. :(
>
Please check your /etc/ntp/drift; the value in it is
usually between -30.0 and 30.0.
If its magnitude is much larger than that, set it to 0.0 and restart ntpd.
Also move /etc/adjtime away again.
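The drift check can be expressed as a small Python sketch (added for illustration, not from the thread). An ntpd drift file holds a single number, the frequency correction in parts per million. The 30.0 limit below is simply the typical range quoted above, not a constant defined by ntpd, and the function names are assumptions.

```python
# Illustrative sketch: check whether an ntpd drift value is in the usual range.

TYPICAL_DRIFT_LIMIT_PPM = 30.0  # range quoted in the message, not an ntpd constant


def parse_drift(text: str) -> float:
    """Return the frequency correction (ppm) stored in drift-file content."""
    fields = text.split()
    if not fields:
        raise ValueError("drift file is empty")
    return float(fields[0])


def drift_is_typical(ppm: float, limit: float = TYPICAL_DRIFT_LIMIT_PPM) -> bool:
    """True if the drift magnitude falls inside the typical range."""
    return abs(ppm) <= limit
```

Reading `/etc/ntp/drift` and passing its content to `parse_drift` gives the stored value. A value near ntpd's 500 ppm ceiling would suggest that ntpd has been fighting a clock it cannot discipline.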
Regards
Michael
* Re: system (not HW) clock advancing really fast
2004-02-16 7:45 ` Michael Frank
@ 2004-02-16 7:48 ` Bill Anderson
2004-02-16 8:41 ` Michael Frank
0 siblings, 1 reply; 7+ messages in thread
From: Bill Anderson @ 2004-02-16 7:48 UTC (permalink / raw)
To: LKML
On Mon, 2004-02-16 at 00:45, Michael Frank wrote:
> On Monday 16 February 2004 15:26, Bill Anderson wrote:
> > On Sun, 2004-02-15 at 23:24, Michael Frank wrote:
> > > I had this sometimes when ntpd did a step time update,
> > > resulting in silly values in /etc/adjtime. Try:
> > >
> > > # mv /etc/adjtime /tmp
> > > # hwclock --systohc
> > >
> > > and see if it goes away.
> >
> > Thanks, though it didn't work. :(
> >
>
> Please check your /etc/ntp/drift; the value in it is
> usually between -30.0 and 30.0.
>
> If it is much larger than that, set it to 0.0 and restart ntpd.
Done that, too. In fact, that was my first target,
along with stop ntpd, sync, clear drift, clear adjtime, sync again, and
restart ntpd. Sorry, should have said that. It's been a *looong* time
since I've posted here.
I just tried some new stuff that is interesting.
MachineA is the one with the problem. MachineB is an identical machine
(as far as two machines can be).
On MachineA I am seeing some interesting things with /proc/interrupts
and the timer interrupt line.
On MachineA:
Over 10 seconds (wall clock):
CPU0: 107 interrupts/second (avg)
CPU1: 102.5 interrupts/second (avg)
[Over 10K interrupts difference between the two]
On MachineB:
Over 10 seconds (wall clock):
CPU0: 46.4 interrupts/second (avg)
CPU1: 45.5 interrupts/second (avg)
[Over 10K interrupts difference between the two]
Now, the CPU differences don't make me blink. However, the timer
interrupt rate on the problem machine being slightly more than double is
interesting to me. Or is it a red herring/blind alley? Especially since
it now seems to be ~2 seconds per second fast.
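A measurement like the one above can be reproduced with a short Python sketch (an illustration, not the method actually used in the thread). It pulls the per-CPU counts from the IRQ 0 "timer" line of /proc/interrupts and divides the difference between two snapshots by the elapsed time. The function names are assumptions. The elapsed time is deliberately a parameter: on a machine whose clock is suspect, the interval should be timed against an external wall clock rather than the machine's own sleep or clock calls.

```python
# Illustrative sketch: per-CPU timer interrupt rates from /proc/interrupts.

from typing import List


def timer_counts(proc_interrupts: str) -> List[int]:
    """Return the per-CPU counts from the IRQ 0 "timer" line."""
    lines = proc_interrupts.splitlines()
    if not lines:
        raise ValueError("empty input")
    ncpus = len(lines[0].split())  # header row looks like "CPU0 CPU1 ..."
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > ncpus and fields[0] == "0:" and fields[-1] == "timer":
            return [int(f) for f in fields[1:1 + ncpus]]
    raise ValueError("no IRQ 0 timer line found")


def per_cpu_rates(before: List[int], after: List[int],
                  seconds: float) -> List[float]:
    """Interrupts per second on each CPU between two snapshots.

    seconds must come from an external reference clock.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return [(a - b) / seconds for b, a in zip(before, after)]


def read_timer_counts(path: str = "/proc/interrupts") -> List[int]:
    """Take one snapshot of the timer counts from the running system."""
    with open(path) as f:
        return timer_counts(f.read())
```

Taking one snapshot with `read_timer_counts()`, waiting a known wall-clock interval, taking a second snapshot, and passing both to `per_cpu_rates` gives figures comparable to those listed for MachineA and MachineB.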
Cheers, and thanks for the help so far, Michael.
Bill
--
Bill Anderson <banderson@hp.com>
Red Hat Certified Engineer
* Re: system (not HW) clock advancing really fast
2004-02-16 7:48 ` Bill Anderson
@ 2004-02-16 8:41 ` Michael Frank
2004-02-16 8:50 ` Michael Frank
0 siblings, 1 reply; 7+ messages in thread
From: Michael Frank @ 2004-02-16 8:41 UTC (permalink / raw)
To: Bill Anderson, LKML
On Monday 16 February 2004 15:48, Bill Anderson wrote:
> On Mon, 2004-02-16 at 00:45, Michael Frank wrote:
> > On Monday 16 February 2004 15:26, Bill Anderson wrote:
> > > On Sun, 2004-02-15 at 23:24, Michael Frank wrote:
> > > > I had this sometimes when ntpd did a step time update,
> > > > resulting in silly values in /etc/adjtime. Try:
> > > >
> > > > # mv /etc/adjtime /tmp
> > > > # hwclock --systohc
> > > >
> > > > and see if it goes away.
> > >
> > > Thanks, though it didn't work. :(
> > >
> >
> > Please check your /etc/ntp/drift; the value in it is
> > usually between -30.0 and 30.0.
> >
> > If it is much larger than that, set it to 0.0 and restart ntpd.
>
>
> Done that, too. In fact, that was my first target,
> along with stop ntpd, sync, clear drift, clear adjtime, sync again, and
> restart ntpd. Sorry, should have said that. It's been a *looong* time
> since I've posted here.
The basic suggestions were bound to be redundant ;)
>
> I just tried some new stuff that is interesting.
>
> MachineA is the one with the problem. MachineB is an identical machine
> (as far as two machines can be).
>
> On MachineA I am seeing some interesting things with /proc/interrupts
> and the timer interrupt line.
>
> On MachineA:
> Over 10 seconds (wall clock):
> CPU0: 107 interrupts/second (avg)
> CPU1: 102.5 interrupts/second (avg)
> [Over 10K interrupts difference between the two]
> On MachineB:
> Over 10 seconds (wall clock):
> CPU0: 46.4 interrupts/second (avg)
> CPU1: 45.5 interrupts/second (avg)
> [Over 10K interrupts difference between the two]
>
> Now, the CPU differences don't make me blink. However, the timer
> interrupt rate on the problem machine being slightly more than double is
> interesting to me. Or is it a red herring/blind alley? Especially since
> it now seems to be ~2 seconds per second fast.
>
When running vmstat on 2.4, 100+ interrupts/s at idle is normal:
100 for the clock, and a few extra for whatever else goes on.
If the machines are _identical_, your problem definitely points
to hardware:
a) Timer broken - too fast.
b) Timer generates IRQs on both edges.
c) The clock interrupt is being routed to both CPUs
at the same time.
You could boot with nosmp to rule out c).
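As a rough cross-check of the hypotheses above, this Python sketch (added for illustration) compares the timer interrupt rates reported earlier in the thread against the nominal 100 ticks per second. It assumes that each tick is counted once on whichever CPU handled it, so the per-CPU rates are summed. That assumption, the constant name, and the function name are not from the thread.

```python
# Illustrative sketch: compare observed timer interrupt rates with the
# nominal tick rate, using the figures reported in the thread.

from typing import Sequence

NOMINAL_HZ = 100.0  # clock interrupts per second on 2.4, as stated above


def tick_ratio(per_cpu_rates: Sequence[float], hz: float = NOMINAL_HZ) -> float:
    """Ratio of observed timer interrupts/s (summed over CPUs) to nominal."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return sum(per_cpu_rates) / hz


machine_a = tick_ratio([107.0, 102.5])  # problem machine: about 2.1
machine_b = tick_ratio([46.4, 45.5])    # healthy machine: about 0.9
```

A ratio near 2 on the problem machine is consistent with either b) or c), and on its own it cannot tell them apart, which is what the nosmp boot is meant to settle.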
Weird breakdown anyway,
Regards
Michael
* Re: system (not HW) clock advancing really fast
2004-02-16 8:41 ` Michael Frank
@ 2004-02-16 8:50 ` Michael Frank
0 siblings, 0 replies; 7+ messages in thread
From: Michael Frank @ 2004-02-16 8:50 UTC (permalink / raw)
To: Bill Anderson, LKML
On Monday 16 February 2004 16:41, Michael Frank wrote:
>
> a) Timer broken - too fast.
> b) Timer generates IRQs on both edges.
> c) The clock interrupt is being routed to both CPUs
> at the same time.
Actually, I think the crystal oscillator is nuts,
which would explain why it sometimes jumps by several seconds.
Could you get hold of some freezing spray and find the crystal... ;)
And let's keep further mail off LKML - enough traffic there.
>
> You could boot with nosmp to rule out c).
>
> Weird breakdown anyway,
>
> Regards
> Michael
>