Audit of NMI and MCE paths

xen-devel.lists.xenproject.org archive mirror

* Audit of NMI and MCE paths
@ 2012-12-04 20:04 Andrew Cooper
  2012-12-05 10:26 ` Jan Beulich
  2012-12-06 10:27 ` Tim Deegan
  0 siblings, 2 replies; 3+ messages in thread
From: Andrew Cooper @ 2012-12-04 20:04 UTC (permalink / raw)
  To: xen-devel@lists.xen.org; +Cc: Tim Deegan, Keir Fraser, Jan Beulich

I have just started auditing the NMI path and found that the oprofile
code calls into a fair amount of common code.

So far, down the first leg of the call graph, I have found several
ASSERT()s, a BUG() and many {rd,wr}msr()s.  Given that these are in common
code, and sensible in their places, removing them for the sake of being
on the NMI path seems silly.

As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
NMI/MCE safe, from a printk spinlock point of view.

Either we can modify the macros to do a console_force_unlock(), which is
fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
the printing to a tasklet won't work if we want a stack trace).
Alternatively, we could change the console lock to be a recursive lock,
at which point it is safe from the deadlock point of view.  Are there
any performance concerns from changing to a recursive lock?
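
To make the recursive-lock option concrete, here is a minimal sketch,
assuming C11 atomics.  The names (struct rec_lock, rec_lock_acquire(),
rec_lock_release()) are hypothetical and this is not Xen's actual
spinlock code; it only shows why a second acquisition on the same CPU
does not spin.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Hypothetical recursive lock; not Xen's real spinlock layout. */
struct rec_lock {
    atomic_flag  held;   /* the underlying non-recursive lock */
    atomic_int   owner;  /* CPU id of the current holder, or NO_OWNER */
    unsigned int depth;  /* nesting depth, only touched by the owner */
};

#define REC_LOCK_INIT { ATOMIC_FLAG_INIT, NO_OWNER, 0 }

void rec_lock_acquire(struct rec_lock *l, int cpu)
{
    /* Only spin if some other CPU (or nobody) holds the lock. */
    if (atomic_load(&l->owner) != cpu) {
        while (atomic_flag_test_and_set_explicit(&l->held,
                                                 memory_order_acquire))
            ; /* spin until the holder releases */
        atomic_store(&l->owner, cpu);
    }
    l->depth++;
}

void rec_lock_release(struct rec_lock *l, int cpu)
{
    assert(atomic_load(&l->owner) == cpu && l->depth > 0);
    /* Drop the underlying lock only when the outermost holder leaves. */
    if (--l->depth == 0) {
        atomic_store(&l->owner, NO_OWNER);
        atomic_flag_clear_explicit(&l->held, memory_order_release);
    }
}
```

Note that the sketch leaves a small window between taking the flag and
recording the owner; an NMI landing in that window would still spin, so
this illustrates the idea rather than a complete fix.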

As for spinlocks themselves, as far as I can reason, recursive locks are
safe to use, as are per-cpu spinlocks which are used exclusively in the
NMI handler or MCE handler (but not both), given the proviso that we
have C level reentrance protection for do_{nmi,mce}().
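
The "C level reentrance protection" mentioned above could look roughly
like the following sketch.  It assumes a small fixed CPU count and uses a
plain array in place of a real per-CPU variable; nmi_guard_enter() and
nmi_guard_exit() are hypothetical names, not existing Xen functions.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4            /* assumption: small fixed CPU count */

static bool in_nmi[SKETCH_NR_CPUS]; /* stand-in for a per-CPU variable */

/* Returns true on a clean first entry, false if this CPU is already inside. */
bool nmi_guard_enter(unsigned int cpu)
{
    if (in_nmi[cpu])
        return false;   /* nested entry: a real handler would panic here */
    in_nmi[cpu] = true;
    return true;
}

void nmi_guard_exit(unsigned int cpu)
{
    assert(in_nmi[cpu]);    /* exit must pair with a successful enter */
    in_nmi[cpu] = false;
}
```

A handler would call nmi_guard_enter() first and bail out (or panic) on
a false return; the same pattern with a separate flag would apply to the
MCE handler.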

For the {rd,wr}msr()s, we can assume that the Xen code is good and is
not going to fault on access to the MSR, but we certainly can't guarantee
this.

As a result, I do not think it is practical or indeed sensible to remove
all possibility of faults from the NMI path (and MCE to a lesser
extent).  Would it however be acceptable to change the console lock to a
recursive lock, and rely on the Linux-inspired extended solution which
will correctly deal with some nested cases, and panic verbosely in all
other cases?

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

* Re: Audit of NMI and MCE paths
  2012-12-04 20:04 Audit of NMI and MCE paths Andrew Cooper
@ 2012-12-05 10:26 ` Jan Beulich
  2012-12-06 10:27 ` Tim Deegan
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Beulich @ 2012-12-05 10:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Keir Fraser, xen-devel@lists.xen.org

>>> On 04.12.12 at 21:04, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
> NMI/MCE safe, from a printk spinlock point of view.
> 
> Either we can modify the macros to do a console_force_unlock(), which is
> fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
> the printing to a tasklet won't work if we want a stack trace).
> Alternatively, we could change the console lock to be a recursive lock,
> at which point it is safe from the deadlock point of view.  Are there
> any performance concerns from changing to a recursive lock?

Not really, and the console lock isn't performance critical anyway.

> As for spinlocks themselves, as far as I can reason, recursive locks are
> safe to use, as are per-cpu spinlocks which are used exclusively in the
> NMI handler or MCE handler (but not both), given the proviso that we
> have C level reentrance protection for do_{nmi,mce}().
> 
> For the {rd,wr}msr()s, we can assume that the Xen code is good and is
> not going to fault on access to the MSR, but we certainly can't guarantee
> this.

{rd,wr}msr() are of no concern - if they fault it's exactly like a #PF
or #GP from a bad memory reference: a bug that will bring down the
hypervisor. Their _safe counterparts are what needs to be looked
for, as there the fault is being recovered from (and it's this recovery's
side effect of re-enabling NMIs that we don't want).
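
The side effect can be shown with a toy model of one CPU's NMI-blocking
state.  On x86, delivering an NMI blocks further NMIs until the next
IRET, and returning from a recovered fault also executes an IRET.  The
struct and function names below are hypothetical; this models the
hardware behaviour only and is not Xen code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; not Xen code. */
struct cpu_model {
    bool nmi_blocked;  /* set by NMI delivery, cleared by any IRET */
    int  nmi_depth;    /* number of NMI handlers currently active */
    int  max_depth;    /* deepest nesting observed */
};

/* Hardware delivers an NMI only while NMIs are not blocked. */
bool deliver_nmi(struct cpu_model *c)
{
    if (c->nmi_blocked)
        return false;              /* NMI stays pending */
    c->nmi_blocked = true;
    if (++c->nmi_depth > c->max_depth)
        c->max_depth = c->nmi_depth;
    return true;
}

/* A fault that is recovered from (as in a *msr_safe() accessor)
 * returns via IRET, which unblocks NMIs as a side effect. */
void recovered_fault(struct cpu_model *c)
{
    c->nmi_blocked = false;
}

/* The NMI handler's own return is also an IRET. */
void nmi_return(struct cpu_model *c)
{
    assert(c->nmi_depth > 0);
    c->nmi_depth--;
    c->nmi_blocked = false;
}
```

In this model a second deliver_nmi() is refused while the first handler
runs, but succeeds once recovered_fault() has been called inside that
handler, which is the nested-NMI case to avoid.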

Jan

* Re: Audit of NMI and MCE paths
  2012-12-04 20:04 Audit of NMI and MCE paths Andrew Cooper
  2012-12-05 10:26 ` Jan Beulich
@ 2012-12-06 10:27 ` Tim Deegan
  1 sibling, 0 replies; 3+ messages in thread
From: Tim Deegan @ 2012-12-06 10:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, xen-devel@lists.xen.org

At 20:04 +0000 on 04 Dec (1354651442), Andrew Cooper wrote:
> I have just started auditing the NMI path and found that the oprofile
> code calls into a fair amount of common code.
> 
> So far, down the first leg of the call graph, I have found several
> ASSERT()s, a BUG() and many {rd,wr}msr()s.  Given that these are in common
> code, and sensible in their places, removing them for the sake of being
> on the NMI path seems silly.
> 
> As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
> NMI/MCE safe, from a printk spinlock point of view.

WARN()s would need to be removed, since they involve a non-fatal fault.

> Either we can modify the macros to do a console_force_unlock(), which is
> fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
> the printing to a tasklet won't work if we want a stack trace).
> Alternatively, we could change the console lock to be a recursive lock,
> at which point it is safe from the deadlock point of view.

It's only safe if the console lock is the _only_ lock that can be taken
both in NMI/MCE context and in 'normal' IRQ context.  Otherwise
we'd end up with exactly the class of deadlocks we had before with
IRQ/non-IRQ.
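
The simplest single-CPU form of that deadlock can be sketched as below,
assuming C11 atomics.  other_lock stands for any hypothetical second,
non-recursive lock shared between normal and NMI context; the simulated
handler uses a trylock only so that the sketch terminates where a real
spin_lock() would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical second lock, taken in both normal and NMI context. */
static atomic_flag other_lock = ATOMIC_FLAG_INIT;

bool sketch_trylock(atomic_flag *l)
{
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

void sketch_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Simulated NMI handler: reports whether a blocking acquire would hang. */
bool nmi_handler_would_deadlock(void)
{
    if (!sketch_trylock(&other_lock))
        return true;    /* holder is the context we interrupted */
    sketch_unlock(&other_lock);
    return false;
}

/* Normal context takes the lock, then an NMI arrives on the same CPU. */
bool demo_shared_lock(void)
{
    bool got = sketch_trylock(&other_lock);
    bool dead;

    assert(got);
    dead = nmi_handler_would_deadlock();
    sketch_unlock(&other_lock);
    return dead;
}
```

Making only the console lock recursive does nothing for other_lock here,
which is why every lock reachable from NMI/MCE context needs the same
scrutiny.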

> For the {rd,wr}msr()s, we can assume that the Xen code is good and is
> not going to fault on access to the MSR, but we certainly can't guarantee
> this.

As Jan points out, it's *msr_safe() we need to worry about.

> As a result, I do not think it is practical or indeed sensible to remove
> all possibility of faults from the NMI path (and MCE to a lesser
> extent).

I'm not sure what the problem is -- the printk() locking issue is AFAICT
unrelated to the nested-NMI one, and will have to be fixed separately
from whatever we do for nested NMI.  So AFAICT we have to audit for
WARN()s and non-fatal printk()s in NMI/MCE code regardless.

Tim.
