From: Marcelo Tosatti <mtosatti@redhat.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Avi Kivity <avi@redhat.com>, Andi Kleen <andi@firstfloor.org>,
Anthony Liguori <aliguori@us.ibm.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH -v2] QEMU-KVM: MCE: Relay UCR MCE to guest
Date: Thu, 17 Sep 2009 18:36:56 -0300
Message-ID: <20090917213656.GC13907@amt.cnet>
In-Reply-To: <1253150009.15717.462.camel@yhuang-dev.sh.intel.com>

On Thu, Sep 17, 2009 at 09:13:29AM +0800, Huang Ying wrote:
> On Thu, 2009-09-17 at 01:59 +0800, Marcelo Tosatti wrote:
> > On Wed, Sep 09, 2009 at 10:28:02AM +0800, Huang Ying wrote:
> > > UCR (uncorrected recovery) MCE is supported in recent Intel CPUs,
> > > where some hardware error such as some memory error can be reported
> > > without PCC (processor context corrupted). To recover from such MCE,
> > > the corresponding memory will be unmapped, and all processes accessing
> > > the memory will be killed via SIGBUS.
> > >
> > > For KVM, if QEMU/KVM is killed, all guest processes will be killed
> > > too. So we relay SIGBUS from host OS to guest system via a UCR MCE
> > > injection. Then guest OS can isolate corresponding memory and kill
> > > necessary guest processes only. SIGBUS sent to main thread (not VCPU
> > > threads) will be broadcast to all VCPU threads as UCR MCE.
> > >
> > > v2:
> > >
> > > > - Use qemu_ram_addr_from_host instead of a self-made one to convert from
> > > > host address to guest RAM address. Thanks Anthony Liguori.
> > >
> > > Signed-off-by: Huang Ying <ying.huang@intel.com>
> > >
> > > ---
> > > cpu-common.h | 1
> > > exec.c | 20 +++++--
> > > qemu-kvm.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++----
> > > target-i386/cpu.h | 20 ++++++-
> > > 4 files changed, 178 insertions(+), 17 deletions(-)
> > >
> > > --- a/qemu-kvm.c
> > > +++ b/qemu-kvm.c
> > > @@ -27,10 +27,23 @@
> > > #include <sys/mman.h>
> > > #include <sys/ioctl.h>
> > > #include <signal.h>
> > > +#include <sys/signalfd.h>
> > > +#include <sys/prctl.h>
> > >
> > > #define false 0
> > > #define true 1
> > >
> > > +#ifndef PR_MCE_KILL
> > > +#define PR_MCE_KILL 33
> > > +#endif
> > > +
> > > +#ifndef BUS_MCEERR_AR
> > > +#define BUS_MCEERR_AR 4
> > > +#endif
> > > +#ifndef BUS_MCEERR_AO
> > > +#define BUS_MCEERR_AO 5
> > > +#endif
> > > +
> > > #define EXPECTED_KVM_API_VERSION 12
> > >
> > > #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
> > > @@ -1507,6 +1520,37 @@ static void sig_ipi_handler(int n)
> > > {
> > > }
> > >
> > > +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx)
> > > +{
> > > + if (siginfo->ssi_code == BUS_MCEERR_AO) {
> > > + uint64_t status;
> > > + unsigned long paddr;
> > > + CPUState *cenv;
> > > +
> > > + /* Hope we are lucky for AO MCE */
> > > + if (do_qemu_ram_addr_from_host((void *)siginfo->ssi_addr, &paddr)) {
> > > + fprintf(stderr, "Hardware memory error for memory used by "
> > > + "QEMU itself instead of guest system!: %llx\n",
> > > + (unsigned long long)siginfo->ssi_addr);
> > > + return;
> >
> > qemu-kvm should die here?
>
> There are two kinds of UCR MCE. One is triggered by user space/guest
> read/write, the other is triggered by asynchronously detected error
> (e.g. patrol scrubbing). The latter one is reported as AO (Action
> Optional) MCE, and it has nothing to do with current path. So if we are
> lucky enough, we can survive. And when we finally touch the error memory
> reported by AO MCE, another AR (Action Required) MCE will be triggered.
> We have another chance to deal with it.
OK.
>
> > > + }
> > > + status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
> > > + | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
> > > + | 0xc0;
> > > + kvm_inject_x86_mce(first_cpu, 9, status,
> > > + MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
> > > + (MCM_ADDR_PHYS << 6) | 0xc);
> > > + for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu)
> > > + kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
> > > + MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
> > > + return;
> >
> > Should abort if kvm_inject_x86_mce fails?
>
> kvm_inject_x86_mce will abort by itself.
OK.
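
For anyone decoding the constants in the hunk above, here is a small
standalone sketch of how the injected MCi_STATUS and MCi_MISC values are
composed. The bit positions restate the MCI_STATUS_* and MCM_ADDR_PHYS
definitions as I understand them from target-i386/cpu.h and the Intel SDM,
so treat them as assumptions. The helper names srao_status() and
srao_misc() are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Assumed MCi_STATUS bit layout (restated here, not copied from the patch). */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents are valid */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* MCi_MISC is valid */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCi_ADDR is valid */
#define MCI_STATUS_S     (1ULL << 56)  /* signaled via machine check */

/* Assumed MCi_MISC address mode: 2 means "physical address". */
#define MCM_ADDR_PHYS    2ULL

/* Status word for an action-optional memory error; 0xc0 is the MCA
 * error code the patch places in the low bits. */
static uint64_t srao_status(void)
{
    return MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
         | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
         | 0xc0;
}

/* MISC word: address mode in bits 8:6, and 0xc (12) in bits 5:0 as the
 * least significant valid address bit, i.e. 4 KiB granularity. */
static uint64_t srao_misc(void)
{
    return (MCM_ADDR_PHYS << 6) | 0xc;
}
```

Note that PCC and AR are left clear, which matches the "action optional"
case discussed above.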
>
> > > + } else if (siginfo->ssi_code == BUS_MCEERR_AR)
> > > + fprintf(stderr, "Hardware memory error!\n");
> > > + else
> > > + fprintf(stderr, "Internal error in QEMU!\n");
> >
> > Can you re-raise SIGBUS so that we get a coredump on non-MCE SIGBUS as
> > usual?
>
> We discussed this before. The discussion is copied below; please comment
> on it. :)
>
> Avi:
> (also, if we can't handle guest-mode SIGBUS I think it would be nice
> to raise it again so the process terminates due to the SIGBUS).
>
> Huang Ying:
> For a SIGBUS that we cannot relay to the guest as an MCE, we can either
> abort or reset the SIGBUS handler to SIG_DFL and re-raise it. Both are OK
> for me. Do you prefer the latter?
>
> Andi:
> I think a suitable error message and exit would be better than a plain
> signal kill. It shouldn't look like qemu crashed due to a software
> bug. Ideally an error message in a form that can be parsed by libvirt
> etc. and reported in a suitable way.
>
> However, qemu itself getting killed is very unlikely; it doesn't
> have much of a memory footprint compared to the guest and other data.
> So this should be a very rare condition.
>
> Avi:
> libvirt etc. can/should wait() for qemu to terminate abnormally and
> report the reason why. However it doesn't seem there is a way to get
> extended signal information from wait(), so it looks like internal
> handling by qemu is better.

I'm not talking about SIGBUS generated by MCE.

What I mean is, for SIGBUS signals that are not due to MCE errors, the
current behaviour is to generate a core dump (which is useful
information for debugging).

With your patch, qemu-kvm handles the signal and prints a message before
exiting, so the core dump is lost.

This is annoying. It seems the discussion above is about SIGBUS
initiated by MCE errors.