Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

linuxppc-dev.lists.ozlabs.org archive mirror

From: David Gibson <david@gibson.dropbear.id.au>
To: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Daniel Axtens <dja@axtens.net>,
	kvm@vger.kernel.org, michaele@au1.ibm.com,
	mahesh@linux.vnet.ibm.com, agraf@suse.de,
	kvm-ppc@vger.kernel.org, linuxppc-dev@ozlabs.org
Subject: Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
Date: Fri, 13 Nov 2015 12:50:15 +1100
Message-ID: <20151113015015.GI4886@voom.redhat.com>
In-Reply-To: <5644D1DD.1020201@linux.vnet.ibm.com>

On Thu, Nov 12, 2015 at 11:22:29PM +0530, Aravinda Prasad wrote:
> 
> 
> On Thursday 12 November 2015 10:13 AM, David Gibson wrote:
> > On Thu, Nov 12, 2015 at 10:02:10AM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Thursday 12 November 2015 09:08 AM, David Gibson wrote:
> >>> On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote:
> >>>> Aravinda Prasad <aravinda@linux.vnet.ibm.com> writes:
> >>>>
> >>>>> This patch modifies KVM to cause a guest exit with
> >>>>> KVM_EXIT_NMI instead of immediately delivering a 0x200
> >>>>> interrupt to guest upon machine check exception in
> >>>>> guest address. Exiting the guest enables QEMU to build
> >>>>> error log and deliver machine check exception to guest
> >>>>> OS (either via guest OS registered machine check
> >>>>> handler or via 0x200 guest OS interrupt vector).
> >>>>>
> >>>>> This approach simplifies the delivering of machine
> >>>>> check exception to guest OS compared to the earlier approach
> >>>>> of KVM directly invoking 0x200 guest interrupt vector.
> >>>>> In the earlier approach QEMU patched the 0x200 interrupt
> >>>>> vector during boot. The patched code at 0x200 issued a
> >>>>> private hcall to pass the control to QEMU to build the
> >>>>> error log.
> >>>>>
> >>>>> This design/approach is based on the feedback for the
> >>>>> QEMU patches to handle machine check exception. Details
> >>>>> of earlier approach of handling machine check exception
> >>>>> in QEMU and related discussions can be found at:
> >>>>>
> >>>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> >>>>
> >>>> I've poked at the MCE code, but not the KVM MCE code, so I may be
> >>>> mistaken here, but I'm not clear on how this handles errors that the
> >>>> guest can recover without terminating.
> >>>>
> >>>> For example, a Linux guest can handle a UE in guest userspace by killing
> >>>> the guest process. A hypothetical non-Linux guest with a microkernel
> >>>> could even survive UEs in drivers.
> >>>>
> >>>> It sounds from your patch like you're changing this behaviour. Is this
> >>>> right?
> >>>
> >>> So, IIUC, once the qemu pieces are in place as well, it shouldn't
> >>> change this behaviour: KVM will exit to qemu, qemu will log the error
> >>> information (new), then reinject the MC to the guest which can still
> >>> handle it as you describe above.
> >>
> >> Yes. With KVM and QEMU both in place this will not change the behavior.
> >> QEMU will inject the UE to guest and the guest handles the UE based on
> >> where it occurred. For example if a UE happens in a guest process
> >> address space, that process will be killed.
> >>
> >>>
> >>> But, there could be a problem if you have a new kernel with an old
> >>> qemu, in that case qemu might not understand the new exit type and
> >>> treat it as a fatal error, even though the guest could actually cope
> >>> with it.
> >>
> >> In case of new kernel and old QEMU, the guest terminates as old QEMU
> >> does not understand the NMI exit reason. However, this is the case with
> >> old kernel and old QEMU as they do not handle UE belonging to guest. The
> >> difference is that the guest kernel terminates with different error
> >> code.
> > 
> > Ok.. assuming the guest has code to handle the UE in 0x200, why would
> > the guest terminate with old kernel and old qemu?  I haven't quite
> > followed the logic.
> 
> I overlooked it. I think I need to take into consideration whether guest
> issued "ibm,nmi-register". If the guest has issued "ibm,nmi-register"
> then we should not jump to 0x200 upon UE. With the old kernel and old
> QEMU this is broken as we always jump to 0x200.
> 
> However, if the guest has not issued "ibm,nmi-register" then upon UE we
> should jump to 0x200. If new kernel is used with old QEMU this
> functionality breaks as it causes guest to terminate with unhandled NMI
> exit.
> 
> So thinking whether qemu should explicitly enable the new NMI
> behavior.

So, I think the reasoning above tends towards having qemu control the
MC behaviour.  If qemu does nothing, MCs are delivered directly to
0x200; if it enables the new handling, they cause a KVM exit and qemu
will deliver the MC.

Then I'd expect qemu to switch on the new-style handling from
ibm,nmi-register.
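
As a rough sketch of that opt-in, with entirely hypothetical names (none
of these types or functions exist in KVM or qemu, and the real interface
would presumably be some capability that qemu enables), the routing
decision could look something like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not KVM or qemu APIs. */
enum mc_route {
    MC_DELIVER_0x200,   /* KVM delivers the MC straight to the guest's 0x200 vector */
    MC_EXIT_TO_QEMU     /* KVM exits to qemu, which logs and delivers the MC */
};

struct vm_state {
    bool new_mc_handling;   /* false until qemu explicitly opts in */
};

/*
 * Models qemu's handling of the guest's ibm,nmi-register call.
 * An old qemu that knows nothing about the new exit never opts in,
 * so the flag stays false and the old behaviour is preserved.
 */
void on_nmi_register(struct vm_state *vm, bool qemu_supports_new_mc)
{
    assert(vm);
    if (qemu_supports_new_mc)
        vm->new_mc_handling = true;
}

/* Models the kernel-side choice when a machine check hits guest memory. */
enum mc_route route_machine_check(const struct vm_state *vm)
{
    return vm->new_mc_handling ? MC_EXIT_TO_QEMU : MC_DELIVER_0x200;
}
```

The point of the default is that a new kernel with an old qemu keeps
delivering to 0x200, so a guest that can cope with the MC there is not
terminated by an exit reason qemu doesn't understand.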

> 
> Regards,
> Aravinda
> 
> > 
> >>
> >> old kernel and old QEMU -> guest panics [1] irrespective of where UE
> >>                            happened in guest address space.
> >> old kernel and new QEMU -> guest panics. same as above.
> >> new kernel and old QEMU -> guest terminates with unhandled NMI error
> >>                            irrespective of where UE happened in guest
> >>                            address space.
> >> new kernel and new QEMU -> guest handles UEs in process address space
> >>                            by killing the process. guest terminates
> >>                            for UEs in guest kernel address space.
> >>
> >> [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118329.html
> >>
> >>>
> >>> Aravinda, do we need to change this so that qemu has to explicitly
> >>> enable the new NMI behaviour?  Or have I missed something that will
> >>> make that case work already.
> >>
> >> I think we don't need to explicitly enable the new behavior. With new
> >> kernel and new QEMU this should just work. As mentioned above this is
> >> already broken for old kernel/QEMU. Any thoughts?
> >>
> >> Regards,
> >> Aravinda
> >>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Linuxppc-dev mailing list
> >>> Linuxppc-dev@lists.ozlabs.org
> >>> https://lists.ozlabs.org/listinfo/linuxppc-dev
> >>>
> >>
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
