From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: [PATCH -v2] QEMU-KVM: MCE: Relay UCR MCE to guest
Date: Thu, 17 Sep 2009 18:36:56 -0300
Message-ID: <20090917213656.GC13907@amt.cnet>
References: <1252463282.5212.44.camel@yhuang-dev.sh.intel.com> <20090916175931.GA7997@amt.cnet> <1253150009.15717.462.camel@yhuang-dev.sh.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Avi Kivity, Andi Kleen, Anthony Liguori, "kvm@vger.kernel.org"
To: Huang Ying
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:5080 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754421AbZIQVht (ORCPT ); Thu, 17 Sep 2009 17:37:49 -0400
Content-Disposition: inline
In-Reply-To: <1253150009.15717.462.camel@yhuang-dev.sh.intel.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Thu, Sep 17, 2009 at 09:13:29AM +0800, Huang Ying wrote:
> On Thu, 2009-09-17 at 01:59 +0800, Marcelo Tosatti wrote:
> > On Wed, Sep 09, 2009 at 10:28:02AM +0800, Huang Ying wrote:
> > > UCR (uncorrected recovery) MCE is supported in recent Intel CPUs,
> > > where some hardware error such as some memory error can be reported
> > > without PCC (processor context corrupted). To recover from such MCE,
> > > the corresponding memory will be unmapped, and all processes accessing
> > > the memory will be killed via SIGBUS.
> > >
> > > For KVM, if QEMU/KVM is killed, all guest processes will be killed
> > > too. So we relay SIGBUS from host OS to guest system via a UCR MCE
> > > injection. Then guest OS can isolate corresponding memory and kill
> > > necessary guest processes only. SIGBUS sent to main thread (not VCPU
> > > threads) will be broadcast to all VCPU threads as UCR MCE.
> > >
> > > v2:
> > >
> > > - Use qemu_ram_addr_from_host instead of self made one to covert from
> > >   host address to guest RAM address. Thanks Anthony Liguori.
> > >
> > > Signed-off-by: Huang Ying
> > >
> > > ---
> > >  cpu-common.h      |    1
> > >  exec.c            |   20 +++++--
> > >  qemu-kvm.c        |  154 ++++++++++++++++++++++++++++++++++++++++++++++++++----
> > >  target-i386/cpu.h |   20 ++++++-
> > >  4 files changed, 178 insertions(+), 17 deletions(-)
> > >
> > > --- a/qemu-kvm.c
> > > +++ b/qemu-kvm.c
> > > @@ -27,10 +27,23 @@
> > >  #include
> > >  #include
> > >  #include
> > > +#include
> > > +#include
> > >
> > >  #define false 0
> > >  #define true 1
> > >
> > > +#ifndef PR_MCE_KILL
> > > +#define PR_MCE_KILL 33
> > > +#endif
> > > +
> > > +#ifndef BUS_MCEERR_AR
> > > +#define BUS_MCEERR_AR 4
> > > +#endif
> > > +#ifndef BUS_MCEERR_AO
> > > +#define BUS_MCEERR_AO 5
> > > +#endif
> > > +
> > >  #define EXPECTED_KVM_API_VERSION 12
> > >
> > >  #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
> > > @@ -1507,6 +1520,37 @@ static void sig_ipi_handler(int n)
> > >  {
> > >  }
> > >
> > > +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx)
> > > +{
> > > +    if (siginfo->ssi_code == BUS_MCEERR_AO) {
> > > +        uint64_t status;
> > > +        unsigned long paddr;
> > > +        CPUState *cenv;
> > > +
> > > +        /* Hope we are lucky for AO MCE */
> > > +        if (do_qemu_ram_addr_from_host((void *)siginfo->ssi_addr, &paddr)) {
> > > +            fprintf(stderr, "Hardware memory error for memory used by "
> > > +                    "QEMU itself instead of guest system!: %llx\n",
> > > +                    (unsigned long long)siginfo->ssi_addr);
> > > +            return;
> >
> > qemu-kvm should die here?
>
> There are two kinds of UCR MCE. One is triggered by user space/guest
> read/write, the other is triggered by asynchronously detected error
> (e.g. patrol scrubbing). The latter one is reported as AO (Action
> Optional) MCE, and it has nothing to do with current path. So if we are
> lucky enough, we can survive. And when we finally touch the error memory
> reported by AO MCE, another AR (Action Required) MCE will be triggered.
> We have another chance to deal with it.

OK.

>
> > > +        }
> > > +        status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
> > > +            | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
> > > +            | 0xc0;
> > > +        kvm_inject_x86_mce(first_cpu, 9, status,
> > > +                           MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
> > > +                           (MCM_ADDR_PHYS << 6) | 0xc);
> > > +        for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu)
> > > +            kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
> > > +                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
> > > +        return;
> >
> > Should abort if kvm_inject_x86_mce fails?
>
> kvm_inject_x86_mce will abort by itself.

OK.

>
> > > +    } else if (siginfo->ssi_code == BUS_MCEERR_AR)
> > > +        fprintf(stderr, "Hardware memory error!\n");
> > > +    else
> > > +        fprintf(stderr, "Internal error in QEMU!\n");
> >
> > Can you re-raise SIGBUS so you we get a coredump on non-MCE SIGBUS as
> > usual?
>
> We discuss this before. Copied below, please comment the comments
> below, :)
>
> Avi:
> (also, I if we can't handle guest-mode SIGBUS I think it would be nice
> to raise it again so the process terminates due to the SIGBUS).
>
> Huang Ying:
> For SIGBUS we can not relay to guest as MCE, we can either abort or
> reset SIGBUS to SIGDFL and re-raise it. Both are OK for me. You prefer
> the latter one?
>
> Andi:
> I think a suitable error message and exit would be better than a plain
> signal kill. It shouldn't look like qemu crashed due to a software
> bug. Ideally a error message in a way that it can be parsed by libvirt
> etc. and reported in a suitable way.
>
> However qemu getting killed itself is very unlikely, it doesn't
> have much memory foot print compared to the guest and other data.
> So this should be a very rare condition.
>
> Avi:
> libvirt etc. can/should wait() for qemu to terminate abnormally and
> report the reason why. However it doesn't seem there is a way to get
> extended signal information from wait(), so it looks like internal
> handling by qemu is better.

I'm not talking about SIGBUS generated by MCE. What I mean is that for
SIGBUS signals that are not due to MCE errors, the current behaviour is
to generate a core dump (which is useful information for debugging).

With your patch, qemu-kvm handles the signal and prints a message before
exiting, instead of dumping core. This is annoying.

It seems the discussion above is only about SIGBUS initiated by MCE
errors.