From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Thu, 23 May 2002 13:38:38 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Thu, 23 May 2002 13:38:37 -0400 Received: from gateway-1237.mvista.com ([12.44.186.158]:9982 "EHLO av.mvista.com") by vger.kernel.org with ESMTP id ; Thu, 23 May 2002 13:38:36 -0400 Message-ID: <3CED1AF4.7170970@mvista.com> Date: Thu, 23 May 2002 09:38:12 -0700 From: george anzinger Organization: Monta Vista Software X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.12-20b i686) X-Accept-Language: en MIME-Version: 1.0 To: Eric Seppanen , "linux-kernel@vger.kernel.org" Subject: Re: via timer/clock problem workaround In-Reply-To: <20020522221859.A24041@reric.net> <3CECB24E.D41B1527@mvista.com> <20020523103340.A27767@reric.net> <3CED076C.20C24B23@mvista.com> <20020523113943.A28069@reric.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Eric Seppanen wrote: > > On Thu, May 23, 2002 at 08:14:52AM -0700, george anzinger wrote: > > It occurs to me that you could be MUCH more aggressive in > > your fault detection. Since the code you are detecting the > > fault in is interruptable, (i.e. if a timer interrupts were > > happening they surly would happen at this time) you should > > be able to safely assume that a value greater than, oh say > > 1.5*1/HZ IS a fault. You could then, immediately do the > > "fix" code. Note that it is not enough to just restart the > > time since it is now out of step with xtime. The update I > > suggested should fix things to oh say about a 10 usec or > > less hiccup. > > OK, makes sense. My focus for now was to figure out how to detect the > situation. Since the offset returned is 71 minutes into the future, the > number I chose (~1 second) wasn't real important. > > > The unfortunate thing about the fix is that execution of the > > detection code requires some one to request the time of > > day. This, of course, could be delayed by an arbitrary > > time, depending on system activity. > > Good point. Looking at it from that perspective, it may be a waste of > time to put fixes in do_gettimeofday(). A dead timer is pretty serious, > and I can't think of a simple way to detect it. A slower, backup timer > would be an option. Or maybe doing a timer-sanity-check every _x_ > interrupts. But good luck getting that past Linus/Marcelo :) > > Best thing would be to figure out what's wrong with the timer and try to > find a way to use the timer that doesn't suffer this problem. I don't > have any good ideas to try here, however. It would be good to have some input from VIA on this, but I can not find any errata on this (or anything else for that matter-- they must make perfect chips :). If you have an SMP box, there are other timers (in the APIC) that generate interrupts, but, of course, most do not have SMP boxen. The ONLY thing in a UP box that generates a stream of interrupts with out being acknowledged is the PIT. This is why it drives the NMI stuff in SMP boxes, for example. There are other watchdogs, but I think they require special hardware. On the other hand, WHY does the box need to be busy for this to fail? From the timer point of view, exactly what is busy? I wonder if it is a voltage sag or some such. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml