From: Tim Deegan <tim@xen.org>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Mats Petersson <mats.petersson@citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Woes of NMIs and MCEs, and possibly how to fix
Date: Sat, 1 Dec 2012 09:09:16 +0000 [thread overview]
Message-ID: <20121201090916.GA7987@ocelot.phlegethon.org> (raw)
In-Reply-To: <50B93964.4040609@citrix.com>
At 22:55 +0000 on 30 Nov (1354316132), Andrew Cooper wrote:
> Any spinlocking whatsoever is out, until we completely fix the
> re-entrant entry to do_{nmi,mce}(), at which point spinlocks which are
> ok so long as they are exclusively used inside their respective
> handlers.
AFAICT spinlocks are bad news even if they're only in their respective
handlers, as one CPU can get (NMI (MCE)) while the other gets (MCE (NMI)).
> >>> 2) Faults on the MCE path will re-enable NMIs, as will the iret of the
> >>> MCE itself if an MCE interrupts an NMI.
> >> The same questions apply as to #1 (just replace NMI with MCE)
> > Andrew pointed out that some MCE code uses rdmsr_safe().
> >
> > FWIW, I think that constraining MCE and NMI code not to do anything that
> > can fault is perfectly reasonable. The MCE code has grown a lot
> > recently and probably needs an audit to check for spinlocks, faults &c.
>
> Yes. However, being able to deal gracefully with the case were we miss
> things on code review which touches the NMI/MCE paths is certainly
> better than crashing in subtle ways.
Agreed.
> At the moment, the clearing of the MCIP bit is quite early in a few of
> the cpu family specific MCE handlers. As it is an architectural MSR, I
> was considering moving it outside the family specific handlers, and make
> one of the last things on the MCE path, to help reduce the race
> condition window until we properly fix reentrant MCEs.
Why not have a per-cpu mce-in-progress flag, and clear MCIP early? That
way you get a panic instead of silently losing a CPU.
> >>> [1] In an effort to prevent a flamewar with my comment, the situation we
> >>> find outself in now is almost certainly the result of unforseen
> >>> interactions of individual features, but we are left to pick up the many
> >>> pieces in way which cant completely be solved.
> >>>
> >> Happy to have my comments completely shot down into little bits, but I'm
> >> worrying that we're looking to solve a problem that doesn't actually
> >> need solving - at least as long as the code in the respective handlers
> >> are "doing the right thing", and if that happens to be broken, then we
> >> should fix THAT, not build lots of extra code to recover from such a thing.
> > I agree. The only things we can't fix by DTRT in do_nmi() and do_mce()
> > are:
> > - IRET in SMM mode re-enabling NMIs; and
> > - detecting _every_ case where we get a nested NMI/MCE (all we can
> > do is detect _most_ cases, but the detection is just so we can print
> > a message before the crash).
>
> We would need to modify the asm stubs to detect nesting.
We can detect _almost all_ nesting from C with an in-progress flag. We
can probably expand that to cover all nesting by pushing the
flag-setting/flag-clearing out to the asm but that'd still be only a
couple of lines of change - a lot simpler than what we'd need to allow
nested MCE/NMIs to continue without crashing.
> I think it is
> unreasonable to expect the C functions do_{nmi,mce}() to be reentrantly
> safe.
Good. :) I'm not suggesting that.
Tim.
next prev parent reply other threads:[~2012-12-01 9:09 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-30 17:34 Woes of NMIs and MCEs, and possibly how to fix Andrew Cooper
2012-11-30 17:56 ` Tim Deegan
2012-11-30 21:59 ` Andrew Cooper
2012-12-03 8:28 ` Jan Beulich
2012-11-30 19:12 ` Mats Petersson
2012-11-30 20:24 ` Tim Deegan
2012-11-30 22:55 ` Andrew Cooper
2012-12-01 9:09 ` Tim Deegan [this message]
2012-12-03 8:09 ` Jan Beulich
2012-12-03 8:26 ` Jan Beulich
2012-12-03 11:24 ` George Dunlap
2012-12-03 11:45 ` Mats Petersson
2012-12-03 12:31 ` Jan Beulich
2012-12-03 11:50 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121201090916.GA7987@ocelot.phlegethon.org \
--to=tim@xen.org \
--cc=andrew.cooper3@citrix.com \
--cc=mats.petersson@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.