Re: [PATCH -v2] QEMU-KVM: MCE: Relay UCR MCE to guest

kvm.vger.kernel.org archive mirror

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Avi Kivity <avi@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	Anthony Liguori <aliguori@us.ibm.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH -v2] QEMU-KVM: MCE: Relay UCR MCE to guest
Date: Thu, 17 Sep 2009 18:36:56 -0300
Message-ID: <20090917213656.GC13907@amt.cnet>
In-Reply-To: <1253150009.15717.462.camel@yhuang-dev.sh.intel.com>

On Thu, Sep 17, 2009 at 09:13:29AM +0800, Huang Ying wrote:
> On Thu, 2009-09-17 at 01:59 +0800, Marcelo Tosatti wrote: 
> > On Wed, Sep 09, 2009 at 10:28:02AM +0800, Huang Ying wrote:
> > > UCR (uncorrected recoverable) MCE is supported in recent Intel CPUs,
> > > where some hardware errors, such as some memory errors, can be reported
> > > without PCC (processor context corrupted). To recover from such an MCE,
> > > the corresponding memory will be unmapped, and all processes accessing
> > > the memory will be killed via SIGBUS.
> > > 
> > > For KVM, if QEMU/KVM is killed, all guest processes will be killed
> > > too. So we relay the SIGBUS from the host OS to the guest system via a
> > > UCR MCE injection. The guest OS can then isolate the corresponding
> > > memory and kill only the necessary guest processes. A SIGBUS sent to
> > > the main thread (not a VCPU thread) will be broadcast to all VCPU
> > > threads as a UCR MCE.
> > > 
> > > v2:
> > > 
> > > - Use qemu_ram_addr_from_host instead of a self-made one to convert from
> > >   host address to guest RAM address. Thanks Anthony Liguori.
> > > 
> > > Signed-off-by: Huang Ying <ying.huang@intel.com>
> > > 
> > > ---
> > >  cpu-common.h      |    1 
> > >  exec.c            |   20 +++++--
> > >  qemu-kvm.c        |  154 ++++++++++++++++++++++++++++++++++++++++++++++++++----
> > >  target-i386/cpu.h |   20 ++++++-
> > >  4 files changed, 178 insertions(+), 17 deletions(-)
> > > 
> > > --- a/qemu-kvm.c
> > > +++ b/qemu-kvm.c
> > > @@ -27,10 +27,23 @@
> > >  #include <sys/mman.h>
> > >  #include <sys/ioctl.h>
> > >  #include <signal.h>
> > > +#include <sys/signalfd.h>
> > > +#include <sys/prctl.h>
> > >  
> > >  #define false 0
> > >  #define true 1
> > >  
> > > +#ifndef PR_MCE_KILL
> > > +#define PR_MCE_KILL 33
> > > +#endif
> > > +
> > > +#ifndef BUS_MCEERR_AR
> > > +#define BUS_MCEERR_AR 4
> > > +#endif
> > > +#ifndef BUS_MCEERR_AO
> > > +#define BUS_MCEERR_AO 5
> > > +#endif
> > > +
> > >  #define EXPECTED_KVM_API_VERSION 12
> > >  
> > >  #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
> > > @@ -1507,6 +1520,37 @@ static void sig_ipi_handler(int n)
> > >  {
> > >  }
> > >  
> > > +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx)
> > > +{
> > > +    if (siginfo->ssi_code == BUS_MCEERR_AO) {
> > > +        uint64_t status;
> > > +        unsigned long paddr;
> > > +        CPUState *cenv;
> > > +
> > > +        /* Hope we are lucky for AO MCE */
> > > +        if (do_qemu_ram_addr_from_host((void *)siginfo->ssi_addr, &paddr)) {
> > > +            fprintf(stderr, "Hardware memory error for memory used by "
> > > +                    "QEMU itself instead of guest system!: %llx\n",
> > > +                    (unsigned long long)siginfo->ssi_addr);
> > > +            return;
> > 
> > qemu-kvm should die here?
> 
> There are two kinds of UCR MCE. One is triggered by a user space/guest
> read/write; the other is triggered by an asynchronously detected error
> (e.g. patrol scrubbing). The latter is reported as an AO (Action
> Optional) MCE, and it has nothing to do with the current execution path.
> So if we are lucky enough, we can survive. And when we finally touch the
> erroneous memory reported by the AO MCE, another AR (Action Required) MCE
> will be triggered. We have another chance to deal with it then.

OK.

> 
> > > +        }
> > > +        status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
> > > +            | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
> > > +            | 0xc0;
> > > +        kvm_inject_x86_mce(first_cpu, 9, status,
> > > +                           MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
> > > +                           (MCM_ADDR_PHYS << 6) | 0xc);
> > > +        for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu)
> > > +            kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
> > > +                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
> > > +        return;
> > 
> > Should abort if kvm_inject_x86_mce fails?
> 
> kvm_inject_x86_mce will abort by itself.

OK.

> 
> > > +    } else if (siginfo->ssi_code == BUS_MCEERR_AR)
> > > +        fprintf(stderr, "Hardware memory error!\n");
> > > +    else
> > > +        fprintf(stderr, "Internal error in QEMU!\n");
> > 
> > Can you re-raise SIGBUS so we get a core dump on non-MCE SIGBUS as
> > usual?
> 
> We discussed this before. The earlier comments are copied below; please
> comment on them. :)
> 
> Avi:
> (also, if we can't handle guest-mode SIGBUS I think it would be nice 
> to raise it again so the process terminates due to the SIGBUS).
> 
> Huang Ying:
> For a SIGBUS we cannot relay to the guest as an MCE, we can either abort
> or reset SIGBUS to SIG_DFL and re-raise it. Both are OK with me. Do you
> prefer the latter?
> 
> Andi:
> I think a suitable error message and exit would be better than a plain 
> signal kill. It shouldn't look like qemu crashed due to a software
> bug. Ideally the error message would be in a form that can be parsed by
> libvirt etc. and reported in a suitable way.
> 
> However, qemu itself getting killed is very unlikely; it doesn't have
> much of a memory footprint compared to the guest and other data.
> So this should be a very rare condition.
> 
> Avi:
> libvirt etc. can/should wait() for qemu to terminate abnormally and 
> report the reason why.  However it doesn't seem there is a way to get 
> extended signal information from wait(), so it looks like internal 
> handling by qemu is better.

I'm not talking about SIGBUS generated by MCE.

What I mean is: for SIGBUS signals that are not due to MCE errors, the
current behaviour is to generate a core dump (which is useful
information for debugging).

With your patch, qemu-kvm handles the signal and prints a message before
exiting.

This is annoying. It seems the discussion above is about SIGBUS
initiated by MCE errors.

Thread overview: 15+ messages
2009-09-09  2:28 [PATCH -v2] QEMU-KVM: MCE: Relay UCR MCE to guest Huang Ying
2009-09-09 12:06 ` Avi Kivity
2009-09-09 12:16   ` Avi Kivity
2009-09-09 12:18     ` Avi Kivity
2009-09-10  2:40   ` Huang Ying
2009-09-10  9:35     ` Andi Kleen
2009-09-14  2:55       ` Huang Ying
2009-09-14  5:10         ` Avi Kivity
2009-09-16  1:09           ` Huang Ying
2009-09-16  8:10             ` Avi Kivity
2009-09-14  5:10       ` Avi Kivity
2009-09-16 17:59 ` Marcelo Tosatti
2009-09-17  1:13   ` Huang Ying
2009-09-17 21:36     ` Marcelo Tosatti [this message]
2009-09-18  3:01       ` Huang Ying
