From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Audit of NMI and MCE paths
Date: Tue, 4 Dec 2012 20:04:02 +0000
Message-ID: <50BE5732.2050801@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "xen-devel@lists.xen.org"
Cc: Tim Deegan, Keir Fraser, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

I have just started auditing the NMI path and found that the oprofile code calls into a fair amount of common code.

So far, down the first leg of the call graph, I have found several ASSERT()s, a BUG() and many {rd,wr}msr()s. Given that these are in common code, and sensible in their places, removing them for the sake of being on the NMI path seems silly.

As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s NMI/MCE safe, from a printk spinlock point of view.

One option is to modify the macros to do a console_force_unlock(), which is fine for BUG() and ASSERT(), but problematic for WARN() (and deferring the printing to a tasklet won't work if we want a stack trace). Alternatively, we could change the console lock to be a recursive lock, at which point it is safe from the deadlock point of view. Are there any performance concerns from changing to a recursive lock?

As for spinlocks themselves, as far as I can reason, recursive locks are safe to use, as are per-cpu spinlocks which are used exclusively in the NMI handler or the MCE handler (but not both), given the proviso that we have C level reentrance protection for do_{nmi,mce}().

For the {rd,wr}msr()s, we can assume that the Xen code is good and is not going to fault on access to the MSR, but we certainly can't guarantee this.

As a result, I do not think it is practical or indeed sensible to remove all possibility of faults from the NMI path (and MCE to a lesser extent).

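To make the recursive-lock idea above concrete, here is a minimal sketch in portable C11. It is not Xen's spinlock implementation: struct rec_lock, rec_lock_acquire(), rec_lock_release() and the packing of owner and depth into a single word are all hypothetical names and choices made for illustration, and the cpu argument is assumed to be a small CPU id (below 65535). The owner and nesting depth share one atomic word so that an NMI arriving part-way through an acquire or release on the owning CPU always sees a consistent state.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEPTH_BITS 16u
#define DEPTH_MASK ((1u << DEPTH_BITS) - 1u)

/* Hypothetical recursive spinlock (illustration only, not Xen's spinlock_t).
 * state == 0 means free; otherwise ((cpu + 1) << DEPTH_BITS) | depth. */
struct rec_lock {
    atomic_uint state;
};

static unsigned int owner_of(unsigned int state)
{
    return state >> DEPTH_BITS;
}

static unsigned int rec_lock_depth(struct rec_lock *l)
{
    return atomic_load(&l->state) & DEPTH_MASK;
}

static void rec_lock_init(struct rec_lock *l)
{
    atomic_init(&l->state, 0u);
}

static void rec_lock_acquire(struct rec_lock *l, unsigned int cpu)
{
    unsigned int s = atomic_load_explicit(&l->state, memory_order_relaxed);

    if (owner_of(s) == cpu + 1) {
        /* Already held by this CPU (e.g. an NMI interrupted a holder on
         * the same CPU): bump the depth instead of spinning forever. */
        assert((s & DEPTH_MASK) < DEPTH_MASK);
        atomic_store_explicit(&l->state, s + 1, memory_order_relaxed);
        return;
    }

    /* Held by nobody or by another CPU: spin until we can claim it. */
    unsigned int expected = 0u;
    unsigned int want = ((cpu + 1) << DEPTH_BITS) | 1u;
    while (!atomic_compare_exchange_weak_explicit(&l->state, &expected, want,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0u;
}

static void rec_lock_release(struct rec_lock *l, unsigned int cpu)
{
    unsigned int s = atomic_load_explicit(&l->state, memory_order_relaxed);

    assert(owner_of(s) == cpu + 1 && (s & DEPTH_MASK) > 0);

    /* Drop one nesting level; the outermost release frees the lock. */
    unsigned int depth = (s & DEPTH_MASK) - 1;
    atomic_store_explicit(&l->state, depth ? s - 1 : 0u, memory_order_release);
}
```

Note that this only addresses the deadlock: a nested acquisition from NMI context still runs in the middle of whatever the interrupted holder was doing, so output may be interleaved and any data protected by the lock may be observed half-updated.
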
Would it however be acceptable to change the console lock to a recursive lock, and rely on the Linux-inspired extended solution which will correctly deal with some nested cases, and panic verbosely in all other cases?

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com