From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH] x86/watchdog: Use real timestamps for watchdog timeout
Date: Fri, 24 May 2013 11:33:21 +0100
Message-ID: <519F41F1.6050402@citrix.com>
References: <20130524093712.GA54769@ocelot.phlegethon.org>
 <519F3AED.2090209@citrix.com>
 <519F2E5D02000078000D8AA7@nat28.tlf.novell.com>
 <519F3994.7040008@citrix.com>
 <20130524101312.GB54769@ocelot.phlegethon.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20130524101312.GB54769@ocelot.phlegethon.org>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Tim Deegan
Cc: "Keir (Xen.org)", Jan Beulich, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 24/05/13 11:13, Tim Deegan wrote:
> At 10:57 +0100 on 24 May (1369393060), Andrew Cooper wrote:
>> On 24/05/13 08:09, Jan Beulich wrote:
>>> You can't use NOW() here - while the time updating code is safe
>>> against normal interrupts, it's not atomic wrt NMIs.
>> But NMIs are latched at the hardware level.  If we get a nested NMI then
>> Xen will be toast on the exit path anyway.
> The problem is that an NMI can arrive while local_time_calibration() is
> writing its results, so calling NOW() in the NMI handler might return
> garbage.

Aah - I see.  Sorry - I misunderstood the original point.

Yes - that is an issue.

Two solutions come to mind.

1) Alongside the local_irq_disable()/enable() pairs in
local_time_calibration(), add an atomic_t indicating "time data update
in progress", allowing the NMI handler to decide to bail early.

2) Modify local_time_calibration() to fill in a shadow cpu_time set, and
use a different atomic_t to indicate which set is consistent.  This would
allow the NMI handler to always use one consistent set of timing
information.
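
For illustration, option 2 might look roughly like the sketch below.  It
is untested against the real tree and only shows the shape of the idea.
The struct, its field names, and the functions calibration_publish() and
nmi_safe_now() are made up for this example and are not Xen's actual
cpu_time code; a single global pair of slots stands in for what would be
per-CPU data, and C11 atomics stand in for atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU time record (names are illustrative). */
struct time_snapshot {
    uint64_t tsc_base;           /* TSC value at the last calibration */
    uint64_t ns_base;            /* system time in ns at the last calibration */
    uint32_t ns_per_1024_ticks;  /* simplified TSC-to-ns scale factor */
};

/* Two copies: one is always complete, the other may be mid-update. */
static struct time_snapshot slots[2];
static atomic_uint live_slot;    /* index of the consistent copy */

/* Writer side: what local_time_calibration() would do with its results. */
void calibration_publish(const struct time_snapshot *fresh)
{
    unsigned int next;

    assert(fresh->ns_per_1024_ticks != 0);  /* a zero scale would freeze time */

    next = atomic_load_explicit(&live_slot, memory_order_relaxed) ^ 1u;

    /* An NMI arriving here still reads the other, untouched slot. */
    slots[next] = *fresh;

    /* Flip the index only once the shadow copy is fully written. */
    atomic_store_explicit(&live_slot, next, memory_order_release);
}

/* Reader side: usable from the NMI handler instead of NOW(). */
uint64_t nmi_safe_now(uint64_t tsc)
{
    unsigned int cur = atomic_load_explicit(&live_slot, memory_order_acquire);
    const struct time_snapshot *t = &slots[cur];

    return t->ns_base + (((tsc - t->tsc_base) * t->ns_per_1024_ticks) >> 10);
}
```

This relies on there being a single writer per CPU and on the NMI handler
running to completion before the interrupted writer resumes, so the slot
the handler picks cannot change underneath it.  Option 1 would be smaller
(just a flag set around the update), at the cost of the NMI handler
having to skip its check whenever it lands inside the update.
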
>
>>> Handling this case is nice, but I wonder whether this patch ought to
>>> detect and report ludicrous NMI rates rather than silently ignoring
>>> them.  I guess that's hard to do in an NMI handler, other than by
>>> adjusting the printk when we crash.
> Actually on second thoughts it's easier: as well as having this patch
> (or near equivalent) to avoid premature watchdog expiry, we can detect
> the NMI rate in, say, the timer softirq and report if it's gone mad.
>
> Cheers,
>
> Tim.

I was thinking along that line, but had not yet worked out where to put
it.  That looks like the best place.

~Andrew