From: Tim Deegan
Subject: Re: Audit of NMI and MCE paths
Date: Thu, 6 Dec 2012 10:27:14 +0000
Message-ID: <20121206102714.GG82725@ocelot.phlegethon.org>
References: <50BE5732.2050801@citrix.com>
In-Reply-To: <50BE5732.2050801@citrix.com>
To: Andrew Cooper
Cc: Keir Fraser, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

At 20:04 +0000 on 04 Dec (1354651442), Andrew Cooper wrote:
> I have just started auditing the NMI path and found that the oprofile
> code calls into a fair amount of common code.
>
> So far, down the first leg of the call graph, I have found several
> ASSERT()s, a BUG() and many {rd,wr}msr()s. Given that these are common
> code, and sensible in their places, removing them for the sake of being
> on the NMI path seems silly.
>
> As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
> NMI/MCE safe, from a printk spinlock point of view.

WARN()s would need to be removed, since they involve a non-fatal fault.

> Either we can modify the macros to do a console_force_unlock(), which is
> fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
> the printing to a tasklet won't work if we want a stack trace).
> Alternatively, we could change the console lock to be a recursive lock,
> at which point it is safe from the deadlock point of view.

It's only safe if the console lock is the _only_ lock that can be taken
both in NMI/MCE context and in 'normal' IRQ context. Otherwise we'd end
up with exactly the class of deadlocks we had before with IRQ/non-IRQ.
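To make the recursive-lock idea concrete, here is a minimal userspace sketch in C11 of a lock that the CPU already holding it can take again, e.g. when an NMI interrupts a printk() on that CPU. Everything here is hypothetical: rec_lock, rec_lock_try(), rec_lock_acquire() and rec_lock_release() are illustrative names, not Xen's console lock or spinlock API, and the CPU id is passed in explicitly rather than read from the running CPU. Owner and depth are packed into one atomic word so there is no window in which the lock is held but its owner has not yet been recorded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical recursive lock (not Xen's API). state == 0 means free;
 * otherwise the top 16 bits hold (owner_cpu + 1) and the low 16 bits
 * hold the recursion depth. Depth overflow is not handled.
 */
struct rec_lock {
    atomic_uint state;
};

#define DEPTH_MASK 0xffffu
#define OWNER(s)   ((s) >> 16) /* owner cpu + 1, or 0 when free */

/* Try once to take the lock on behalf of 'cpu'; never spins. */
static bool rec_lock_try(struct rec_lock *l, unsigned int cpu)
{
    unsigned int s = atomic_load(&l->state);

    if (OWNER(s) == cpu + 1) {
        /*
         * This CPU already holds it, e.g. an NMI/MCE handler calling
         * printk() while the interrupted code was inside printk().
         * Only the owning CPU writes the word while it is held, and a
         * nested user releases before returning, so a plain store is
         * enough here.
         */
        atomic_store(&l->state, s + 1);
        return true;
    }

    /* Free -> owned by 'cpu' with depth 1, in a single atomic step. */
    unsigned int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected,
                                          ((cpu + 1) << 16) | 1u);
}

static void rec_lock_acquire(struct rec_lock *l, unsigned int cpu)
{
    while (!rec_lock_try(l, cpu))
        ; /* spin: another CPU holds the lock */
}

static void rec_lock_release(struct rec_lock *l)
{
    unsigned int s = atomic_load(&l->state);

    assert((s & DEPTH_MASK) != 0); /* caller must hold the lock */
    /* Drop one level; the outermost release frees the lock. */
    atomic_store(&l->state, (s & DEPTH_MASK) == 1 ? 0u : s - 1);
}
```

If CPU 0 holds the lock and calls rec_lock_acquire(&l, 0) again, the depth simply goes to 2, while rec_lock_try(&l, 1) keeps failing until both levels are released. Note that this only removes self-deadlock on this one lock: if the NMI handler also needs any other lock that the interrupted code on the same CPU might hold, it still spins forever, which is the class of deadlock described above.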
> For the {rd,wr}msr()s, we can assume that the Xen code is good and is
> not going to fault on access to the MSR, but we certainly can't
> guarantee this.

As Jan points out, it's *msr_safe() we need to worry about.

> As a result, I do not think it is practical or indeed sensible to remove
> all possibility of faults from the NMI path (and MCE to a lesser
> extent).

I'm not sure what the problem is -- the printk() locking issue is,
AFAICT, unrelated to the nested-NMI one, and will have to be fixed
separately from whatever we do for nested NMI. So AFAICT we have to
audit for WARN()s and non-fatal printk()s in NMI/MCE code regardless.

Tim.