From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tim Deegan <tim@xen.org>
Subject: Re: Woes of NMIs and MCEs, and possibly how to fix
Date: Sat, 1 Dec 2012 09:09:16 +0000
Message-ID: <20121201090916.GA7987@ocelot.phlegethon.org>
References: <50B8EE13.6070301@citrix.com> <50B9050B.7090709@citrix.com>
	<20121130202420.GC95877@ocelot.phlegethon.org>
	<50B93964.4040609@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Content-Disposition: inline
In-Reply-To: <50B93964.4040609@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Mats Petersson <mats.petersson@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

At 22:55 +0000 on 30 Nov (1354316132), Andrew Cooper wrote:
> Any spinlocking whatsoever is out, until we completely fix the
> re-entrant entry to do_{nmi,mce}(), at which point spinlocks which are
> ok so long as they are exclusively used inside their respective
> handlers.

AFAICT spinlocks are bad news even if they're only in their respective
handlers, as one CPU can get (NMI (MCE)) while the other gets (MCE (NMI)).

> >>> 2) Faults on the MCE path will re-enable NMIs, as will the iret of the
> >>> MCE itself if an MCE interrupts an NMI.
> >> The same questions apply as to #1 (just replace NMI with MCE)
> > Andrew pointed out that some MCE code uses rdmsr_safe().
> >
> > FWIW, I think that constraining MCE and NMI code not to do anything that
> > can fault is perfectly reasonable.  The MCE code has grown a lot
> > recently and probably needs an audit to check for spinlocks, faults &c.
> 
> Yes.  However, being able to deal gracefully with the case were we miss
> things on code review which touches the NMI/MCE paths is certainly
> better than crashing in subtle ways.

Agreed.

> At the moment, the clearing of the MCIP bit is quite early in a few of
> the cpu family specific MCE handlers.  As it is an architectural MSR, I
> was considering moving it outside the family specific handlers, and make
> one of the last things on the MCE path, to help reduce the race
> condition window until we properly fix reentrant MCEs.

Why not have a per-cpu mce-in-progress flag, and clear MCIP early?  That
way you get a panic instead of silently losing a CPU.

> >>> [1] In an effort to prevent a flamewar with my comment, the situation we
> >>> find outself in now is almost certainly the result of unforseen
> >>> interactions of individual features, but we are left to pick up the many
> >>> pieces in way which cant completely be solved.
> >>>
> >> Happy to have my comments completely shot down into little bits, but I'm 
> >> worrying that we're looking to solve a problem that doesn't actually 
> >> need solving - at least as long as the code in the respective handlers 
> >> are "doing the right thing", and if that happens to be broken, then we 
> >> should fix THAT, not build lots of extra code to recover from such a thing.
> > I agree.  The only things we can't fix by DTRT in do_nmi() and do_mce()
> > are:
> >  - IRET in SMM mode re-enabling NMIs; and
> >  - detecting _every_ case where we get a nested NMI/MCE (all we can 
> >    do is detect _most_ cases, but the detection is just so we can print
> >    a message before the crash).
> 
> We would need to modify the asm stubs to detect nesting.

We can detect _almost all_ nesting from C with an in-progress flag.  We
can probably expand that to cover all nesting by pushing the
flag-setting/flag-clearing out to the asm but that'd still be only a
couple of lines of change - a lot simpler than what we'd need to allow
nested MCE/NMIs to continue without crashing.

>  I think it is
> unreasonable to expect the C functions do_{nmi,mce}() to be reentrantly
> safe.

Good. :)  I'm not suggesting that.

Tim.