From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Audit of NMI and MCE paths
Date: Tue, 4 Dec 2012 20:04:02 +0000
Message-ID: <50BE5732.2050801@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: Tim Deegan <tim@xen.org>, Keir Fraser <keir@xen.org>, Jan Beulich <jbeulich@suse.com>
List-Id: xen-devel@lists.xenproject.org

I have just starting auditing the NMI path and found that the oprofile
code calls into a fair amount of common code.

So far, down the first leg of the call graph, I have found several
ASSERT()s, a BUG() and many {rd,wr}msr()s.  Given that these are common
code, and sensible in their places, removing them for the sake of being
on the NMI path seems silly.

As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
NMI/MCE safe, from a printk spinlock point of view.

Either we can modify the macros to do a console_force_unlock(), which is
fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
the printing to a tasklet wont work if we want a stack trace). 
Alternativly, we could change the console lock to be a recursive lock,
at which point it is safe from the deadlock point of view.  Are there
any performance concerns from changing to a recursive lock?

As for spinlocks themselves, as far as I can reason, recursive locks are
safe to use, as are per-cpu spinlocks which are used exclusivly in the
NMI handler or MCE handler (but not both), given the proviso that we
have C level reentrance protection for do_{nmi,mce}().

For the {rd,wr}msr()s, we can assume that the Xen code is good and is
not going to fault on access to the MSR, but we certainly cant guarantee
this.


As a result, I do not think it is practical or indeed sensible to remove
all possibility of faults from the NMI path (and MCE to a lesser
extent).  Would it however be acceptable to change the console lock to a
recursive lock, and rely on the Linux-inspired extended solution which
will correctly deal with some nested cases, and panic verbosely in all
other cases?

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com