Re: [PATCH uq/master -v2 2/2] KVM, MCE, unpoison memory address across reboot


From: Jan Kiszka <jan.kiszka@web.de>
To: Huang Ying <ying.huang@intel.com>
Cc: Avi Kivity <avi@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Anthony Liguori <aliguori@linux.vnet.ibm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Dean Nelson <dnelson@redhat.com>,
	Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH uq/master -v2 2/2] KVM, MCE, unpoison memory address across reboot
Date: Wed, 09 Feb 2011 09:00:13 +0100
Message-ID: <4D52498D.9060706@web.de>
In-Reply-To: <1297220431.5180.15.camel@yhuang-dev>

On 2011-02-09 04:00, Huang Ying wrote:
> In the Linux kernel's HWPoison implementation, the virtual
> address in each process that maps the faulty physical page is marked
> as HWPoison.  Any further access to that virtual
> address then kills the corresponding process with SIGBUS.
> 
> If the faulty physical page is used by a KVM guest, the SIGBUS
> is sent to QEMU, and QEMU simulates an MCE to report the
> memory error to the guest OS.  If the guest OS cannot recover from
> the error (for example, because the page is accessed by kernel code),
> it reboots the system.  But because the host virtual
> address backing the guest physical memory is still poisoned, any
> guest access to the corresponding guest physical memory, even
> after the reboot, still sends SIGBUS to QEMU and an MCE is
> simulated again.  That is, the guest cannot recover by rebooting.

Yeah, saw this already during my test...

> 
> In fact, the contents of the guest physical memory page
> need not be kept across a reboot.  We can allocate a new host
> physical page to back the corresponding guest physical address.

I'm just wondering what would be architecturally suboptimal if we simply
remapped on SIGBUS directly. That would at least save us the bookkeeping.
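
To illustrate what remapping a page in place amounts to at the mmap
level, here is a minimal standalone sketch. It is not QEMU code:
remap_page_in_place() and demo_remap_middle_page() are hypothetical
names, and it assumes Linux with anonymous private mappings. It only
shows the mapping mechanics on a healthy page; no real memory error is
involved, and whether a real qemu_ram_remap() does exactly this is an
assumption here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of QEMU): replace the page at 'page'
 * with a fresh zero-filled anonymous page at the same virtual address.
 * MAP_FIXED drops the old mapping for exactly that range.
 * Returns 0 on success, -1 on failure.
 */
static int remap_page_in_place(void *page, size_t page_size)
{
    void *p = mmap(page, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return p == page ? 0 : -1;
}

/*
 * Map three pages, fill them with a pattern, remap only the middle one.
 * Returns 1 if the middle page reads back as zero while its neighbours
 * keep their contents, 0 if not, -1 on setup failure.
 */
static int demo_remap_middle_page(void)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *ram = mmap(NULL, 3 * ps, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ok;

    if (ram == MAP_FAILED) {
        return -1;
    }
    memset(ram, 0xAB, 3 * ps);

    if (remap_page_in_place(ram + ps, ps) != 0) {
        munmap(ram, 3 * ps);
        return -1;
    }

    ok = ram[0] == 0xAB && ram[ps - 1] == 0xAB &&      /* first page kept  */
         ram[ps] == 0 && ram[2 * ps - 1] == 0 &&       /* middle page new  */
         ram[2 * ps] == 0xAB && ram[3 * ps - 1] == 0xAB; /* last page kept */

    munmap(ram, 3 * ps);
    return ok;
}
```

The point of the sketch is that the old contents of the remapped page
are gone afterwards, which is fine across a reboot but is the thing to
think about when remapping directly on SIGBUS.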

> 
> This patch fixes the issue in QEMU-KVM by calling qemu_ram_remap()
> to clear the corresponding page table entry, which makes it possible
> to allocate a new page and recover from the error.
> 
> Signed-off-by: Huang Ying <ying.huang@intel.com>
> ---
>  target-i386/kvm.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -508,6 +508,42 @@ static int kvm_get_supported_msrs(KVMSta
>      return ret;
>  }
>  
> +struct HWPoisonPage;
> +typedef struct HWPoisonPage HWPoisonPage;
> +struct HWPoisonPage
> +{
> +    ram_addr_t ram_addr;
> +    QLIST_ENTRY(HWPoisonPage) list;
> +};
> +
> +static QLIST_HEAD(hwpoison_page_list, HWPoisonPage) hwpoison_page_list =
> +    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
> +
> +static void kvm_unpoison_all(void *param)
> +{
> +    HWPoisonPage *page, *next_page;
> +
> +    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
> +        QLIST_REMOVE(page, list);
> +        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
> +        qemu_free(page);
> +    }
> +}
> +
> +static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
> +{
> +    HWPoisonPage *page;
> +
> +    QLIST_FOREACH(page, &hwpoison_page_list, list) {
> +        if (page->ram_addr == ram_addr)
> +            return;
> +    }
> +
> +    page = qemu_malloc(sizeof(HWPoisonPage));
> +    page->ram_addr = ram_addr;
> +    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
> +}
> +
>  int kvm_arch_init(KVMState *s)
>  {
>      uint64_t identity_base = 0xfffbc000;
> @@ -556,6 +592,7 @@ int kvm_arch_init(KVMState *s)
>          fprintf(stderr, "e820_add_entry() table is full\n");
>          return ret;
>      }
> +    qemu_register_reset(kvm_unpoison_all, NULL);
>  
>      return 0;
>  }
> @@ -1882,6 +1919,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *en
>                  hardware_memory_error();
>              }
>          }
> +        kvm_hwpoison_page_add(ram_addr);
>  
>          if (code == BUS_MCEERR_AR) {
>              /* Fake an Intel architectural Data Load SRAR UCR */
> @@ -1926,6 +1964,7 @@ int kvm_arch_on_sigbus(int code, void *a
>                      "QEMU itself instead of guest system!: %p\n", addr);
>              return 0;
>          }
> +        kvm_hwpoison_page_add(ram_addr);
>          kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
>      } else
>  #endif
> 
> 

Looks fine otherwise. Unless that simplification makes sense, I could
offer to include this in my MCE rework (there is a minor conflict).
If all goes well, that series should be posted this week.
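
For reference, the bookkeeping in question boils down to a
de-duplicated list of poisoned addresses that is drained on reset. A
standalone sketch of that pattern follows, using a plain singly linked
list instead of QLIST. All names here (PoisonNode, poison_add,
poison_drain) are hypothetical, ram_addr_t is only a stand-in typedef,
and the remap step is passed in as a callback rather than calling
qemu_ram_remap().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t ram_addr_t;   /* stand-in for QEMU's type (assumption) */

typedef struct PoisonNode {
    ram_addr_t ram_addr;
    struct PoisonNode *next;
} PoisonNode;

static PoisonNode *poison_head;

/*
 * Record a poisoned address once.
 * Returns 1 if added, 0 if already recorded, -1 on allocation failure.
 */
static int poison_add(ram_addr_t ram_addr)
{
    PoisonNode *n;

    for (n = poison_head; n != NULL; n = n->next) {
        if (n->ram_addr == ram_addr) {
            return 0;
        }
    }

    n = malloc(sizeof(*n));
    if (n == NULL) {
        return -1;
    }
    n->ram_addr = ram_addr;
    n->next = poison_head;
    poison_head = n;
    return 1;
}

/*
 * Empty the list, as a reset handler would. For each entry the optional
 * 'remap' callback is invoked before the node is freed.
 * Returns the number of entries that were drained.
 */
static size_t poison_drain(void (*remap)(ram_addr_t addr, void *opaque),
                           void *opaque)
{
    size_t count = 0;

    while (poison_head != NULL) {
        PoisonNode *n = poison_head;

        poison_head = n->next;   /* unlink before freeing */
        if (remap != NULL) {
            remap(n->ram_addr, opaque);
        }
        free(n);
        count++;
    }
    return count;
}
```

Remapping directly on SIGBUS would make both functions unnecessary,
which is the saving I had in mind above.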

Jan


