From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tim Deegan <tim@xen.org>
Subject: Re: [PATCH] x86/watchdog: Use real timestamps for
	watchdog timeout
Date: Fri, 24 May 2013 18:10:11 +0100
Message-ID: <20130524171011.GA60007@ocelot.phlegethon.org>
References: <519F3AED.2090209@citrix.com>
	<ebb0070be9fd3fb26bec.1369341126@andrewcoop.uk.xensource.com>
	<519F2E5D02000078000D8AA7@nat28.tlf.novell.com>
	<519F3994.7040008@citrix.com>
	<20130524101312.GB54769@ocelot.phlegethon.org>
	<519F6CD602000078000D8BE5@nat28.tlf.novell.com>
	<20130524124158.GC54769@ocelot.phlegethon.org>
	<519F61AF.2070203@citrix.com>
	<20130524135549.GA57961@ocelot.phlegethon.org>
	<519F794E.5050802@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Content-Disposition: inline
In-Reply-To: <519F794E.5050802@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@suse.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

At 15:29 +0100 on 24 May (1369409374), Andrew Cooper wrote:
> On 24/05/13 14:55, Tim Deegan wrote:
> > At 13:48 +0100 on 24 May (1369403327), Andrew Cooper wrote:
> >> On 24/05/13 13:41, Tim Deegan wrote:
> >>> Of those two, I prefer (1), just because it doesn't add any cost to the
> >>> normal users of NOW().
> >> I was not planning to make any requirement to change users of NOW(). 
> > Well, you were planning to make NOW() slightly more expensive by needing
> > to look up which of the banked alternatives is valid.  In any case, I
> > think some sort of approximate version based on tsc will do.
> >
> > Tim.
> 
> I was planning to memcpy the shadow set over the main set as part of
> calibration, leaving no alteration whatsoever to NOW().

Sorry, yes, I see how that works now.  And so I too prefer (2). :)

> An approximation from the TSC alone would be better so long as it is a
> reasonable approximation.  I am concerned about how accurate a dumb
> approximation would be for non-stable TSCs etc.

Yep.  I'm more and more convinced that we should gate on the number of
NMIs we've taken without seeing a timer tick.  I'm more afraid of funny
TSC edge cases (and remember we might take an NMI anywhere in the S3
wakeup) than I am of machines with really bad NMI storms.  That way, even
if the approximate time is wildly off, we just print the wrong thing.
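
To make that concrete, here is a rough sketch of gating on NMIs taken
without a timer tick.  It is standalone C rather than Xen code: the
struct, the function name and the threshold of 5 are all made up for
illustration, and the caller is assumed to pass in whatever per-cpu tick
counter the timer side increments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: NMIs tolerated with no timer tick in between. */
#define NMI_STUCK_THRESHOLD 5u

/* Hypothetical per-cpu watchdog state (not the real Xen layout). */
struct wd_state {
    uint64_t last_tick_count;       /* tick counter seen at the previous NMI */
    unsigned int nmis_without_tick; /* consecutive NMIs with no new tick */
};

/*
 * Called from the NMI path with the current value of the tick counter.
 * Returns true once NMI_STUCK_THRESHOLD consecutive NMIs have arrived
 * without the counter moving, i.e. when the watchdog should fire.
 * No timestamps are consulted, so TSC oddities cannot affect the decision.
 */
static bool wd_nmi_check(struct wd_state *s, uint64_t tick_count)
{
    if (tick_count != s->last_tick_count) {
        /* The timer ticked since the last NMI: the cpu is making progress. */
        s->last_tick_count = tick_count;
        s->nmis_without_tick = 0;
        return false;
    }

    return ++s->nmis_without_tick >= NMI_STUCK_THRESHOLD;
}
```

Any approximate elapsed time would then only go into the message that is
printed, never into the decision to fire.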

In the case where you saw this (and cpu0 was alive for a while before
it managed a burst of enough NMIs), would detecting and warning
about high NMI rates be enough to point out what's gone wrong?
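
By "detecting and warning" I mean something along these lines.  Again
this is only a sketch under assumptions: the names and the limit of 4
NMIs per tick window are hypothetical, and it relies on the timer tick
still arriving, which was the case on the machine you describe.

```c
#include <stdbool.h>

/* Hypothetical limit: NMIs allowed between two consecutive timer ticks. */
#define NMI_RATE_LIMIT 4u

/* Hypothetical per-cpu bookkeeping (not taken from the Xen source). */
struct nmi_rate {
    unsigned int nmis_this_window; /* NMIs seen since the last timer tick */
    bool warned;                   /* previous window was already excessive */
};

/* Called from the NMI path: just count the NMI. */
static void nmi_rate_note_nmi(struct nmi_rate *r)
{
    r->nmis_this_window++;
}

/*
 * Called on each timer tick.  Returns true when a warning should be
 * printed: the window that just ended exceeded the limit and the window
 * before it did not.  The warning re-arms once the rate drops again.
 */
static bool nmi_rate_on_tick(struct nmi_rate *r)
{
    bool excessive = r->nmis_this_window > NMI_RATE_LIMIT;
    bool warn = excessive && !r->warned;

    r->warned = excessive;
    r->nmis_this_window = 0;
    return warn;
}
```

Warning only on the transition keeps a sustained storm from flooding the
console.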

Tim.