xen-devel.lists.xenproject.org archive mirror
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Tim Deegan <tim@xen.org>
Cc: "Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH] x86/watchdog: Use real timestamps for watchdog timeout
Date: Fri, 24 May 2013 18:27:14 +0100
Message-ID: <519FA2F2.3070705@citrix.com>
In-Reply-To: <20130524171011.GA60007@ocelot.phlegethon.org>

On 24/05/13 18:10, Tim Deegan wrote:
> At 15:29 +0100 on 24 May (1369409374), Andrew Cooper wrote:
>> On 24/05/13 14:55, Tim Deegan wrote:
>>> At 13:48 +0100 on 24 May (1369403327), Andrew Cooper wrote:
>>>> On 24/05/13 13:41, Tim Deegan wrote:
>>>>> Of those two, I prefer (1), just because it doesn't add any cost to the
>>>>> normal users of NOW().
>>>> I was not planning to make any requirement to change users of NOW(). 
>>> Well, you were planning to make NOW() slightly more expensive by needing
>>> to look up which of the banked alternatives is valid.  In any case, I
>>> think some sort of approximate version based on tsc will do.
>>>
>>> Tim.
>> I was planning to memcpy the shadow set over the main set as part of
>> calibration, leaving no alteration whatsoever to NOW().
> Sorry, yes, I see how that works now.  And so I too prefer (2). :)
>
>> An approximation from the TSC alone would be better so long as it is a
>> reasonable approximation.  I am concerned about how accurate a dumb
>> approximation would be for non-stable TSCs etc.
> Yep.  I'm more and more convinced that we should gate on the number of
> NMIs we've taken without seeing a timer tick.  I'm more afraid of funny
> TSC edge cases (and remember we might take an NMI anywhere in the s3
> wakeup) than I am of machines with really bad NMI storms.  So even if
> the approximate time is wildly off we just print the wrong thing. 
>
> In the case where you saw this (and cpu0 was alive for a while before
> it managed a burst of enough NMIs), would detecting and warning
> about high NMI rates be enough to point out what's gone wrong?
>
> Tim.

Not directly, but for my debugging case knowing when the NMI storm has
started is very useful so I can dump lspci -vvvxxxx to get the debug
state from whichever PCI device is the original cause of the SERR storm.

I certainly don't think there is anything useful Xen could automatically
do when discovering an NMI storm, but leaving a message on the serial or
in a crash state certainly helps someone trying to investigate why the
server reset.  It is certainly better than finding an NMI watchdog
timeout with serial timestamps proving that the watchdog didn't actually
time out :), and starting debugging wondering WTF was going on.

~Andrew
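
For illustration only, and not part of the patch under discussion: a
minimal sketch of the gating described above, where the count of NMIs
taken without a timer tick makes the decision and the approximate time
only chooses which message to print. Every name here (struct nmi_watch,
nmi_watch_check, the verdict enum, the limit and timeout parameters) is
hypothetical and does not correspond to existing Xen code.

```c
#include <stdint.h>

/* Hypothetical verdicts; Xen itself defines nothing by these names. */
enum nmi_verdict { NMI_OK, NMI_STORM_SUSPECTED, NMI_TIMEOUT };

/* Hypothetical per-CPU state for the sketch. */
struct nmi_watch {
    uint64_t last_tick;     /* timer tick counter when progress was last seen */
    uint64_t first_nmi_ns;  /* approximate time of the first NMI since then */
    unsigned int nmis;      /* NMIs taken since progress was last seen */
};

/*
 * Called from the (assumed) NMI handler.
 *   tick          - current value of a counter bumped by the timer tick
 *   approx_now_ns - a rough timestamp, e.g. scaled from the TSC
 *   nmi_limit     - NMIs tolerated without a tick before acting
 *   timeout_ns    - the nominal watchdog timeout
 */
enum nmi_verdict nmi_watch_check(struct nmi_watch *w, uint64_t tick,
                                 uint64_t approx_now_ns,
                                 unsigned int nmi_limit,
                                 uint64_t timeout_ns)
{
    uint64_t elapsed;

    if (tick != w->last_tick) {
        /* The timer ticked since the last NMI: the CPU is making progress. */
        w->last_tick = tick;
        w->nmis = 0;
        return NMI_OK;
    }

    if (w->nmis == 0)
        w->first_nmi_ns = approx_now_ns;
    w->nmis++;

    /* The decision is gated on the NMI count alone. */
    if (w->nmis < nmi_limit)
        return NMI_OK;

    /*
     * The approximate time only classifies the report.  If the clock
     * appears to run backwards, the unsigned subtraction wraps to a
     * large value and the event is reported as a plain timeout.
     */
    elapsed = approx_now_ns - w->first_nmi_ns;
    return elapsed < timeout_ns ? NMI_STORM_SUSPECTED : NMI_TIMEOUT;
}
```

If the limit is reached well inside the nominal timeout, the caller
could log a storm warning with the approximate timestamp. A wildly
wrong timestamp then only mislabels the message.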


Thread overview: 19+ messages
2013-05-23 20:32 [PATCH] x86/watchdog: Use real timestamps for watchdog timeout Andrew Cooper
2013-05-24  7:09 ` Jan Beulich
2013-05-24  9:57   ` Andrew Cooper
2013-05-24 10:13     ` Tim Deegan
2013-05-24 10:33       ` Andrew Cooper
2013-05-24 11:42         ` Jan Beulich
2013-05-24 12:00           ` Andrew Cooper
2013-05-24 13:11             ` Jan Beulich
2013-05-24 11:36       ` Jan Beulich
2013-05-24 12:41         ` Tim Deegan
2013-05-24 12:48           ` Andrew Cooper
2013-05-24 13:55             ` Tim Deegan
2013-05-24 14:29               ` Andrew Cooper
2013-05-24 17:10                 ` Tim Deegan
2013-05-24 17:27                   ` Andrew Cooper [this message]
2013-05-24 13:17           ` Jan Beulich
2013-05-24 14:01             ` Tim Deegan
2013-05-24  9:37 ` Tim Deegan
2013-05-24 10:03   ` Andrew Cooper
