From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265438AbUBPIcJ (ORCPT ); Mon, 16 Feb 2004 03:32:09 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265445AbUBPIcJ (ORCPT ); Mon, 16 Feb 2004 03:32:09 -0500 Received: from nsmtp.pacific.net.th ([203.121.130.117]:42138 "EHLO nsmtp.pacific.net.th") by vger.kernel.org with ESMTP id S265438AbUBPIb4 (ORCPT ); Mon, 16 Feb 2004 03:31:56 -0500 From: Michael Frank To: Bill Anderson , LKML Subject: Re: system (not HW) clock advancing really fast Date: Mon, 16 Feb 2004 16:41:33 +0800 User-Agent: KMail/1.5.4 References: <1076910368.25980.12.camel@perseus> <200402161545.09901.mhf@linuxmail.org> <1076917697.25980.35.camel@perseus> In-Reply-To: <1076917697.25980.35.camel@perseus> X-OS: KDE 3 on GNU/Linux MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200402161641.33323.mhf@linuxmail.org> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Monday 16 February 2004 15:48, Bill Anderson wrote: > On Mon, 2004-02-16 at 00:45, Michael Frank wrote: > > On Monday 16 February 2004 15:26, Bill Anderson wrote: > > > On Sun, 2004-02-15 at 23:24, Michael Frank wrote: > > > > I had this somtetimes when using ntpd doing step time update > > > > resulting in silly values in /etc/adjtime . > > > > > > > > # mv /etc/adjtime /tmp > > > > # hwclock --systohc > > > > > > > > and see if it goes away. > > > > > > Thanks, though it didn't work. :( > > > > > > > Please check your /etc/ntp/drift , the value in it is > > usually between -30.0 and 30.0 > > > > If it is much larger than that, set it to 0.0 and restart ntpd. > > > Done that, too. in fact, that was my first target. > Along with stop ntpd, sync, clear drift, clear adjtime, sync again, and > restart ntpd. Sorry, should have said that. It's been a *looong* time > since I've posted here. The basic suggestions were bound to be redundant ;) > > I just tried some new stuff that is interesting. > > MachineA is the one with the problem. MachineB is an identical machine > (as far as two machines can be). > > On MachineA I am seeing some interesting things with /proc/interrupts > and the timer interrupt line. > > On MachineA: > Over 10 seconds (wall clock): > CPU0: 107 interrupts/second (avg) > CPU1: 102.5 interrupts/second (avg) > [Over 10K interrupts difference between the two] > On MachineB: > Over 10 seconds (wall clock): > CPU0: 46.4 interrupts/second (avg) > CPU1: 45.5 interrupts/second (avg) > [Over 10K interrupts difference between the two] > > Now, the CPU differences don't make me blink. However, the slightly more > than double the rate of timer interrupts on the problem machine is > interesting to me. or is it a red herring/blind alley? Especially since > it now seems to be ~2 seconds per second fast. > When running vmstat, on 2.4 100+ idle interrupts/s is normal, 100 for the clock, and a few extra for what else goes on. If the machines are _identical_, your problem points definitely to hardware. a) Timer broken - too fast. b) Generates IRQ's on both edges c) the clock interrupt being routed into both CPU's at the same time. You could boot with NOSMP to rule out c) Weird breakdown anyway, Regards Michael