Re: [PATCH] QEMU-KVM: MCE: Relay UCR MCE to guest

public inbox for kvm@vger.kernel.org

From: Huang Ying <ying.huang@intel.com>
To: Anthony Liguori <aliguori@us.ibm.com>
Cc: Avi Kivity <avi@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH] QEMU-KVM: MCE: Relay UCR MCE to guest
Date: Tue, 08 Sep 2009 13:41:03 +0800
Message-ID: <1252388463.14648.975.camel@yhuang-dev.sh.intel.com>
In-Reply-To: <4AA57187.5020502@us.ibm.com>

On Tue, 2009-09-08 at 04:48 +0800, Anthony Liguori wrote: 
> Hi Huang,
> 
> Huang Ying wrote:
> > UCR (uncorrected recoverable) MCE is supported in recent Intel CPUs,
> > where some hardware errors, such as some memory errors, can be reported
> > without PCC (processor context corrupted). To recover from such an MCE,
> > the corresponding memory is unmapped, and all processes accessing
> > the memory are killed via SIGBUS.
> >
> > For KVM, if QEMU/KVM is killed, all guest processes are killed
> > too. So we relay the SIGBUS from the host OS to the guest system via
> > UCR MCE injection. The guest OS can then isolate the corresponding
> > memory and kill only the necessary guest processes. A SIGBUS sent to
> > the main thread (not a VCPU thread) is broadcast to all VCPU threads
> > as a UCR MCE.
> >
> > Signed-off-by: Huang Ying <ying.huang@intel.com>
> >
> > ---
> >  qemu-kvm.c        |  173 ++++++++++++++++++++++++++++++++++++++++++++++++++----
> >  target-i386/cpu.h |   20 +++++-
> >  2 files changed, 181 insertions(+), 12 deletions(-)
> >
> > --- a/qemu-kvm.c
> > +++ b/qemu-kvm.c
> > @@ -27,10 +27,23 @@
> >  #include <sys/mman.h>
> >  #include <sys/ioctl.h>
> >  #include <signal.h>
> > +#include <sys/signalfd.h>
> > +#include <sys/prctl.h>
> >
> >  #define false 0
> >  #define true 1
> >
> > +#ifndef PR_MCE_KILL
> > +#define PR_MCE_KILL 33
> > +#endif
> > +
> > +#ifndef BUS_MCEERR_AR
> > +#define BUS_MCEERR_AR 4
> > +#endif
> > +#ifndef BUS_MCEERR_AO
> > +#define BUS_MCEERR_AO 5
> > +#endif
> > +
> >  #define EXPECTED_KVM_API_VERSION 12
> >
> >  #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
> > @@ -702,6 +715,24 @@ int kvm_get_dirty_pages_range(kvm_contex
> >      return 0;
> >  }
> >
> > +static int kvm_addr_userspace_to_phys(unsigned long userspace_addr,
> > +                               unsigned long *phys_addr)
> > +{
> > +	int i;
> > +	struct slot_info *slot;
> > +
> > +	for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) {
> > +		slot = &slots[i];
> > +		if (slot->len && slot->userspace_addr <= userspace_addr &&
> > +		    (slot->userspace_addr + slot->len) > userspace_addr) {
> > +			*phys_addr = userspace_addr - slot->userspace_addr +
> > +				slot->phys_addr;
> > +			return 0;
> > +		}
> > +	}
> > +	return -1;
> > +}
> > +
> >   
> 
> The slot mapping is actually a copy of QEMU's ram_blocks structure 
> (see exec.c).  If you base your check on that, it will Just Work for 
> QEMU too.

I found that there is already a function named qemu_ram_addr_from_host,
which translates a user space virtual address into a QEMU RAM address.
But I need a function that returns an error code, instead of aborting,
when no RAM address corresponds to the specified user space virtual
address. So I plan to use the following code to deal with that.

int do_qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
ram_addr_t qemu_ram_addr_from_host(void *ptr);

Does this follow the QEMU coding style?
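A rough sketch of the proposed split, assuming a simplified singly linked RAMBlock list modelled on exec.c (the struct layout and field names below are assumptions for illustration, not copied from QEMU): the non-aborting do_qemu_ram_addr_from_host() returns 0 or -1, and qemu_ram_addr_from_host() becomes a thin wrapper that keeps the existing abort.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for QEMU types; an assumption for illustration. */
typedef unsigned long ram_addr_t;

typedef struct RAMBlock {
    uint8_t *host;          /* start of the block in QEMU's virtual address space */
    ram_addr_t offset;      /* start of the block in QEMU RAM address space */
    ram_addr_t length;      /* size of the block in bytes */
    struct RAMBlock *next;
} RAMBlock;

static RAMBlock *ram_blocks;

/* Translate a host pointer into a RAM address.
 * Returns 0 on success, -1 if ptr lies outside every RAM block. */
int do_qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
{
    uintptr_t host = (uintptr_t)ptr;
    RAMBlock *block;

    for (block = ram_blocks; block; block = block->next) {
        uintptr_t start = (uintptr_t)block->host;

        if (host >= start && host - start < block->length) {
            *ram_addr = block->offset + (ram_addr_t)(host - start);
            return 0;
        }
    }
    return -1;
}

/* Existing interface: same lookup, but abort() on failure. */
ram_addr_t qemu_ram_addr_from_host(void *ptr)
{
    ram_addr_t ram_addr;

    if (do_qemu_ram_addr_from_host(ptr, &ram_addr) < 0) {
        fprintf(stderr, "Bad ram pointer %p\n", ptr);
        abort();
    }
    return ram_addr;
}
```

Keeping the abort in the wrapper leaves current callers unchanged, while a SIGBUS path could call the do_ variant and simply ignore addresses that fall outside guest RAM.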

> >  #ifdef KVM_CAP_IRQCHIP
> >
> >  int kvm_set_irq_level(kvm_context_t kvm, int irq, int level, int *status)
> > @@ -1515,6 +1546,38 @@ static void sig_ipi_handler(int n)
> >  {
> >  }
> >
> > +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx)
> > +{
> > +    if (siginfo->ssi_code == BUS_MCEERR_AO) {
> > +        uint64_t status;
> > +        unsigned long paddr;
> > +        CPUState *cenv;
> > +
> > +        /* Hope we are lucky for AO MCE */
> >   
> 
> Even if the error was limited to guest memory, it could have been 
> generated by either the kernel or userspace reading guest memory, no?
> 
> Does this potentially open a security hole for us?  Consider the following:
> 
> 1) We happen to read guest memory and that causes an MCE.  For instance, 
> say we're in virtio.c and we read the virtio ring.
> 2) That should trigger the kernel to generate a sigbus.
> 3) We catch sigbus, and queue an MCE for delivery.
> 4) After sigbus handler completes, we're back in virtio.c, what was the 
> value of the memory operation we just completed?
> 
> If the instruction gets skipped, we may be leaking host memory because 
> the access never happened.

There are two kinds of recoverable MCE: SRAO (Software Recoverable
Action Optional) and SRAR (Software Recoverable Action Required). Your
example is an SRAR error. In that case, the kernel will unmap the error
page and send SIGBUS to qemu via force_sig_info(), which will unblock
SIGBUS and reset its action to SIG_DFL, so qemu will be terminated.
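To make the SRAO/SRAR distinction concrete: on CPUs that advertise software error recovery (MCG_CAP.MCG_SER_P), the Intel SDM separates the UCR classes by the UC, PCC, S and AR flags of IA32_MCi_STATUS. The following is a minimal sketch of that classification plus a hypothetical helper that builds a status word a VMM might inject; the macro and function names are illustrative and are not taken from the patch's target-i386/cpu.h.

```c
#include <stdint.h>

/* IA32_MCi_STATUS flag bits per the Intel SDM, vol. 3, Machine-Check
 * Architecture. The macro names are illustrative. */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* error overflow */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* MCi_MISC valid */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCi_ADDR valid */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupted */
#define MCI_STATUS_S     (1ULL << 56)  /* signaled via machine-check exception */
#define MCI_STATUS_AR    (1ULL << 55)  /* action required */

enum ucr_class { MCE_NOT_UCR, MCE_UCNA, MCE_SRAO, MCE_SRAR };

/* Classify an uncorrected error that did not corrupt processor context. */
static enum ucr_class classify_ucr(uint64_t status)
{
    if (!(status & MCI_STATUS_VAL) || !(status & MCI_STATUS_UC) ||
        (status & MCI_STATUS_PCC))
        return MCE_NOT_UCR;
    if (!(status & MCI_STATUS_S))
        return MCE_UCNA;    /* not signaled: uncorrected, no action */
    return (status & MCI_STATUS_AR) ? MCE_SRAR : MCE_SRAO;
}

/* Hypothetical: build a UCR status word for injection into a guest.
 * mca_code is the low 16-bit MCA error code chosen by the caller. */
static uint64_t make_ucr_status(int action_required, uint16_t mca_code)
{
    uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S |
                      mca_code;

    if (action_required)
        status |= MCI_STATUS_AR;
    return status;
}
```

For example, make_ucr_status(0, 0xc0) would describe an SRAO error whose MCA code falls in the range the SDM lists for memory scrubbing; which code a VMM should actually report is a design choice not settled here.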

If the error interrupts guest mode instead, qemu can catch SIGBUS,
thanks to the signal mask handling in the kernel part of KVM.
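In user space, the si_code of the SIGBUS is what tells the two cases apart. The sketch below uses a plain sigaction() handler rather than the signalfd-based dispatch in the patch, and the enum, the pending_ao flag and the helper names are hypothetical: an action-optional error is recorded for later injection into the guest, while an action-required error (or any ordinary bus error) falls back to the default action, which terminates the process as described above.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Fallbacks mirroring the patch, for C libraries that lack these codes. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification; not QEMU's actual code. */
enum sigbus_kind {
    SIGBUS_OTHER,   /* ordinary bus error, not a memory failure */
    SIGBUS_MCE_AO,  /* SRAO: page poisoned, not consumed by this access */
    SIGBUS_MCE_AR,  /* SRAR: the current access hit the poisoned page */
};

static enum sigbus_kind classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:
        return SIGBUS_MCE_AO;
    case BUS_MCEERR_AR:
        return SIGBUS_MCE_AR;
    default:
        return SIGBUS_OTHER;
    }
}

static volatile sig_atomic_t pending_ao;  /* an AO error awaits injection */
static void *volatile poisoned_addr;      /* host address reported by the kernel */

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    if (classify_sigbus(info->si_code) == SIGBUS_MCE_AO) {
        /* Asynchronous: remember the address; a main loop would translate
         * it to a guest physical address and queue an MCE. */
        poisoned_addr = info->si_addr;
        pending_ao = 1;
        return;
    }
    /* Synchronous fault in our own code: returning would retry the access.
     * Restore the default action and re-raise; SIGBUS is blocked while this
     * handler runs, so it is delivered (terminating us) once we return. */
    signal(sig, SIG_DFL);
    raise(sig);
}

static int install_sigbus_handler(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = sigbus_handler;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGBUS, &act, NULL);
}
```

Deferring the actual injection out of the handler keeps the handler itself limited to async-signal-safe work (setting flags, signal(), raise()).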

For more details on recoverable MCEs (SRAO and SRAR), please refer to
the latest Intel Software Developer's Manual.

Best Regards,
Huang Ying

