public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] KVM, HWPoison, unpoison address across rebooting
@ 2010-12-22  2:51 Huang Ying
  2010-12-22  2:51 ` [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally Huang Ying
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-22  2:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, Tony Luck, ying.huang, Dean Nelson

Unpoison address across rebooting, to make it possible that a new
memory page can be allocated, so that guest system can successfully
reboot.

[RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
[RFC 2/3] KVM, Replace is_hwpoison_address with get_user_pages_hwpoison
[RFC 3/3] KVM, HWPoison, unpoison address across rebooting

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
  2010-12-22  2:51 [RFC 0/3] KVM, HWPoison, unpoison address across rebooting Huang Ying
@ 2010-12-22  2:51 ` Huang Ying
  2010-12-22 23:07   ` Andrew Morton
  2010-12-22  2:51 ` [RFC 2/3] KVM, Replace is_hwpoison_address with get_user_pages_hwpoison Huang Ying
  2010-12-22  2:51 ` [RFC 3/3] KVM, HWPoison, unpoison address across rebooting Huang Ying
  2 siblings, 1 reply; 7+ messages in thread
From: Huang Ying @ 2010-12-22  2:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, Tony Luck, ying.huang, Dean Nelson,
	Andrew Morton

Make __get_user_pages return -EHWPOISON for HWPOISON page only if
FOLL_HWPOISON is specified.  With this patch, the interested callers
can distinguish HWPOISON page from general FAULT page, while other
callers will still get -EFAULT for pages, so the user space interface
need not to be changed.

get_user_pages_hwpoison is added as a variant of get_user_pages that
can return -EHWPOISON for HWPOISON page.

This feature is needed by KVM, where UCR MCE should be relayed to
guest for HWPOISON page, while instruction emulation and MMIO will be
tried for general FAULT page.

The idea comes from Andrew Morton.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/asm-generic/errno.h |    2 +
 include/linux/mm.h          |    5 ++++
 mm/memory.c                 |   53 +++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 57 insertions(+), 3 deletions(-)

--- a/include/asm-generic/errno.h
+++ b/include/asm-generic/errno.h
@@ -108,4 +108,6 @@
 
 #define ERFKILL		132	/* Operation not possible due to RF-kill */
 
+#define EHWPOISON	133	/* Memory page has hardware error */
+
 #endif
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -860,6 +860,10 @@ int get_user_pages(struct task_struct *t
 			struct page **pages, struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
+int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
+			    unsigned long start, int nr_pages, int write,
+			    int force, struct page **pages,
+			    struct vm_area_struct **vmas);
 struct page *get_dump_page(unsigned long addr);
 
 extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
@@ -1415,6 +1419,7 @@ struct page *follow_page(struct vm_area_
 #define FOLL_GET	0x04	/* do get_page on page */
 #define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
 #define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
+#define FOLL_HWPOISON	0x20	/* check page is hwpoisoned */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1449,9 +1449,16 @@ int __get_user_pages(struct task_struct
 				if (ret & VM_FAULT_ERROR) {
 					if (ret & VM_FAULT_OOM)
 						return i ? i : -ENOMEM;
-					if (ret &
-					    (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE|
-					     VM_FAULT_SIGBUS))
+					if (ret & (VM_FAULT_HWPOISON |
+						   VM_FAULT_HWPOISON_LARGE)) {
+						if (i)
+							return i;
+						else if (gup_flags & FOLL_HWPOISON)
+							return -EHWPOISON;
+						else
+							return -EFAULT;
+					}
+					if (ret & VM_FAULT_SIGBUS)
 						return i ? i : -EFAULT;
 					BUG();
 				}
@@ -1564,6 +1571,46 @@ int get_user_pages(struct task_struct *t
 EXPORT_SYMBOL(get_user_pages);
 
 /**
+ * get_user_pages_hwpoison() - pin user pages in memory, return hwpoison status
+ * @tsk:	task_struct of target task
+ * @mm:		mm_struct of target mm
+ * @start:	starting user address
+ * @nr_pages:	number of pages from start to pin
+ * @write:	whether pages will be written to by the caller
+ * @force:	whether to force write access even if user mapping is
+ *		readonly. This will result in the page being COWed even
+ *		in MAP_SHARED mappings. You do not want this.
+ * @pages:	array that receives pointers to the pages pinned.
+ *		Should be at least nr_pages long. Or NULL, if caller
+ *		only intends to ensure the pages are faulted in.
+ * @vmas:	array of pointers to vmas corresponding to each page.
+ *		Or NULL if the caller does not require them.
+ *
+ * Returns number of pages pinned.
+ *
+ * If the page table or memory page is hwpoisoned, return -EHWPOISON.
+ *
+ * Otherwise, same as get_user_pages.
+ */
+int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
+			    unsigned long start, int nr_pages, int write,
+			    int force, struct page **pages,
+			    struct vm_area_struct **vmas)
+{
+	int flags = FOLL_TOUCH | FOLL_HWPOISON;
+
+	if (pages)
+		flags |= FOLL_GET;
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas);
+}
+EXPORT_SYMBOL(get_user_pages_hwpoison);
+
+/**
  * get_dump_page() - pin user page in memory while writing it to core dump
  * @addr: user address
  *

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC 2/3] KVM, Replace is_hwpoison_address with get_user_pages_hwpoison
  2010-12-22  2:51 [RFC 0/3] KVM, HWPoison, unpoison address across rebooting Huang Ying
  2010-12-22  2:51 ` [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally Huang Ying
@ 2010-12-22  2:51 ` Huang Ying
  2010-12-22  2:51 ` [RFC 3/3] KVM, HWPoison, unpoison address across rebooting Huang Ying
  2 siblings, 0 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-22  2:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, Tony Luck, ying.huang, Dean Nelson

is_hwpoison_address only checks whether the page table entry is
hwpoisoned, regardless the memory page mapped.  While
get_user_pages_hwpoison will check both.

In a following patch, we will introduce unpoison_address, which will
clear the poisoned page table entry to make it possible to allocate a
new memory page for the virtual address.  But it is also possible that
the underlying memory page is kept poisoned even after the
corresponding page table entry is cleared, that is, a new memory page
can not be allocated.  get_user_pages_hwpoison can catch these
situations.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 include/linux/mm.h  |    8 --------
 mm/memory-failure.c |   32 --------------------------------
 virt/kvm/kvm_main.c |    4 +++-
 3 files changed, 3 insertions(+), 41 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1512,14 +1512,6 @@ extern int sysctl_memory_failure_recover
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t mce_bad_pages;
 extern int soft_offline_page(struct page *page, int flags);
-#ifdef CONFIG_MEMORY_FAILURE
-int is_hwpoison_address(unsigned long addr);
-#else
-static inline int is_hwpoison_address(unsigned long addr)
-{
-	return 0;
-}
-#endif
 
 extern void dump_page(struct page *page);
 
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1433,35 +1433,3 @@ done:
 	/* keep elevated page count for bad page */
 	return ret;
 }
-
-/*
- * The caller must hold current->mm->mmap_sem in read mode.
- */
-int is_hwpoison_address(unsigned long addr)
-{
-	pgd_t *pgdp;
-	pud_t pud, *pudp;
-	pmd_t pmd, *pmdp;
-	pte_t pte, *ptep;
-	swp_entry_t entry;
-
-	pgdp = pgd_offset(current->mm, addr);
-	if (!pgd_present(*pgdp))
-		return 0;
-	pudp = pud_offset(pgdp, addr);
-	pud = *pudp;
-	if (!pud_present(pud) || pud_large(pud))
-		return 0;
-	pmdp = pmd_offset(pudp, addr);
-	pmd = *pmdp;
-	if (!pmd_present(pmd) || pmd_large(pmd))
-		return 0;
-	ptep = pte_offset_map(pmdp, addr);
-	pte = *ptep;
-	pte_unmap(ptep);
-	if (!is_swap_pte(pte))
-		return 0;
-	entry = pte_to_swp_entry(pte);
-	return is_hwpoison_entry(entry);
-}
-EXPORT_SYMBOL_GPL(is_hwpoison_address);
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -966,7 +966,9 @@ static pfn_t hva_to_pfn(struct kvm *kvm,
 			goto return_fault_page;
 
 		down_read(&current->mm->mmap_sem);
-		if (is_hwpoison_address(addr)) {
+		npages = get_user_pages_hwpoison(current, current->mm,
+						 addr, 1, 1, 0, page, NULL);
+		if (npages == -EHWPOISON) {
 			up_read(&current->mm->mmap_sem);
 			get_page(hwpoison_page);
 			return page_to_pfn(hwpoison_page);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC 3/3] KVM, HWPoison, unpoison address across rebooting
  2010-12-22  2:51 [RFC 0/3] KVM, HWPoison, unpoison address across rebooting Huang Ying
  2010-12-22  2:51 ` [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally Huang Ying
  2010-12-22  2:51 ` [RFC 2/3] KVM, Replace is_hwpoison_address with get_user_pages_hwpoison Huang Ying
@ 2010-12-22  2:51 ` Huang Ying
  2 siblings, 0 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-22  2:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, Tony Luck, ying.huang, Dean Nelson

In HWPoison processing code, not only the struct page corresponding
the error physical memory page is marked as HWPoison, but also the
virtual address in processes mapping the error physical memory page is
marked as HWPoison.  So that, the further accessing to the virtual
address will kill corresponding processes with SIGBUS.

If the error physical memory page is used by a KVM guest, the SIGBUS
will be sent to QEMU, and QEMU will simulate a MCE to report that
memory error to the guest OS.  If the guest OS can not recover from
the error (for example, the page is accessed by kernel code), guest OS
will reboot the system.  But because the underlying host virtual
address backing the guest physical memory is still poisoned, if the
guest system accesses the corresponding guest physical memory even
after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
simulated.  That is, guest system can not recover via rebooting.

In fact, across rebooting, the contents of guest physical memory page
need not to be kept.  We can allocate a new host physical page to
back the corresponding guest physical address.

To do that, a mechanism in KVM to "unpoison" poisoned virtual address
by clearing the corresponding PTE is provided.  So that, when doing
rebooting, QEMU can unpoison the poisoned virtual address, and when
the unpoisoned memory page is accessed, a new physical memory may be
allocated if possible.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 include/linux/kvm.h |    1 +
 include/linux/mm.h  |    8 ++++++++
 mm/memory-failure.c |   39 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c |   14 ++++++++++++++
 4 files changed, 62 insertions(+)

--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -676,6 +676,7 @@ struct kvm_clock_data {
 #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
 /* Available with KVM_CAP_PPC_GET_PVINFO */
 #define KVM_PPC_GET_PVINFO	  _IOW(KVMIO,  0xa1, struct kvm_ppc_pvinfo)
+#define KVM_UNPOISON_ADDRESS	  _IO(KVMIO,  0xa2)
 
 /*
  * ioctls for vcpu fds
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1512,6 +1512,14 @@ extern int sysctl_memory_failure_recover
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t mce_bad_pages;
 extern int soft_offline_page(struct page *page, int flags);
+#ifdef CONFIG_MEMORY_FAILURE
+int unpoison_address(unsigned long addr);
+#else
+static inline int unpoison_address(unsigned long addr)
+{
+	return -EINVAL;
+}
+#endif
 
 extern void dump_page(struct page *page);
 
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1433,3 +1433,42 @@ done:
 	/* keep elevated page count for bad page */
 	return ret;
 }
+
+int unpoison_address(unsigned long addr)
+{
+	struct mm_struct *mm;
+	pgd_t *pgdp;
+	pud_t pud, *pudp;
+	pmd_t pmd, *pmdp;
+	pte_t pte, *ptep;
+	spinlock_t *ptl;
+	swp_entry_t entry;
+	int rc;
+
+	mm = current->mm;
+	pgdp = pgd_offset(mm, addr);
+	if (!pgd_present(*pgdp))
+		return -EINVAL;
+	pudp = pud_offset(pgdp, addr);
+	pud = *pudp;
+	if (!pud_present(pud) || pud_large(pud))
+		return -EINVAL;
+	pmdp = pmd_offset(pudp, addr);
+	pmd = *pmdp;
+	/* can not unpoison huge page yet */
+	if (!pmd_present(pmd) || pmd_large(pmd))
+		return -EINVAL;
+	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+	pte = *ptep;
+	rc = -EINVAL;
+	if (!is_swap_pte(pte))
+		goto out;
+	entry = pte_to_swp_entry(pte);
+	if (!is_hwpoison_entry(entry))
+		goto out;
+	pte_clear(mm, addr, ptep);
+out:
+	pte_unmap_unlock(ptep, ptl);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(unpoison_address);
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -774,6 +774,17 @@ int kvm_vm_ioctl_set_memory_region(struc
 	return kvm_set_memory_region(kvm, mem, user_alloc);
 }
 
+static int kvm_unpoison_address(struct kvm *kvm, unsigned long address)
+{
+	int r;
+
+	down_read(&current->mm->mmap_sem);
+	r = unpoison_address(address);
+	up_read(&current->mm->mmap_sem);
+
+	return r;
+}
+
 int kvm_get_dirty_log(struct kvm *kvm,
 			struct kvm_dirty_log *log, int *is_dirty)
 {
@@ -1728,6 +1739,9 @@ static long kvm_vm_ioctl(struct file *fi
 		mutex_unlock(&kvm->lock);
 		break;
 #endif
+	case KVM_UNPOISON_ADDRESS:
+		r = kvm_unpoison_address(kvm, arg);
+		break;
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 		if (r == -ENOTTY)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
  2010-12-22  2:51 ` [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally Huang Ying
@ 2010-12-22 23:07   ` Andrew Morton
  2010-12-23  0:39     ` Huang Ying
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-12-22 23:07 UTC (permalink / raw)
  To: Huang Ying
  Cc: Avi Kivity, Marcelo Tosatti, linux-kernel, kvm, Andi Kleen,
	Tony Luck, Dean Nelson

On Wed, 22 Dec 2010 10:51:55 +0800
Huang Ying <ying.huang@intel.com> wrote:

> Make __get_user_pages return -EHWPOISON for HWPOISON page only if
> FOLL_HWPOISON is specified.  With this patch, the interested callers
> can distinguish HWPOISON page from general FAULT page, while other
> callers will still get -EFAULT for pages, so the user space interface
> need not to be changed.
> 
> get_user_pages_hwpoison is added as a variant of get_user_pages that
> can return -EHWPOISON for HWPOISON page.
> 
> This feature is needed by KVM, where UCR MCE should be relayed to
> guest for HWPOISON page, while instruction emulation and MMIO will be
> tried for general FAULT page.
> 
> The idea comes from Andrew Morton.

hm, I don't remember that.  I suspect it came from someone else?

> Signed-off-by: Huang Ying <ying.huang@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  include/asm-generic/errno.h |    2 +
>  include/linux/mm.h          |    5 ++++
>  mm/memory.c                 |   53 +++++++++++++++++++++++++++++++++++++++++---
>  3 files changed, 57 insertions(+), 3 deletions(-)
> 
> --- a/include/asm-generic/errno.h
> +++ b/include/asm-generic/errno.h
> @@ -108,4 +108,6 @@
>  
>  #define ERFKILL		132	/* Operation not possible due to RF-kill */
>  
> +#define EHWPOISON	133	/* Memory page has hardware error */

So this can be propagated to userspace?

If so, which syscalls might return EHWPOISON and are there any manpage
implications?

>  #endif
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -860,6 +860,10 @@ int get_user_pages(struct task_struct *t
>  			struct page **pages, struct vm_area_struct **vmas);
>  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  			struct page **pages);
> +int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
> +			    unsigned long start, int nr_pages, int write,
> +			    int force, struct page **pages,
> +			    struct vm_area_struct **vmas);
>  struct page *get_dump_page(unsigned long addr);
>  
>  extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
> @@ -1415,6 +1419,7 @@ struct page *follow_page(struct vm_area_
>  #define FOLL_GET	0x04	/* do get_page on page */
>  #define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
>  #define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
> +#define FOLL_HWPOISON	0x20	/* check page is hwpoisoned */
>  
>  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>  			void *data);
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1449,9 +1449,16 @@ int __get_user_pages(struct task_struct
>  				if (ret & VM_FAULT_ERROR) {
>  					if (ret & VM_FAULT_OOM)
>  						return i ? i : -ENOMEM;
> -					if (ret &
> -					    (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE|
> -					     VM_FAULT_SIGBUS))
> +					if (ret & (VM_FAULT_HWPOISON |
> +						   VM_FAULT_HWPOISON_LARGE)) {
> +						if (i)
> +							return i;
> +						else if (gup_flags & FOLL_HWPOISON)
> +							return -EHWPOISON;
> +						else
> +							return -EFAULT;
> +					}
> +					if (ret & VM_FAULT_SIGBUS)

hm, that function is getting a bit unweildy.

>  /**
> + * get_user_pages_hwpoison() - pin user pages in memory, return hwpoison status
> + * @tsk:	task_struct of target task
> + * @mm:		mm_struct of target mm
> + * @start:	starting user address
> + * @nr_pages:	number of pages from start to pin
> + * @write:	whether pages will be written to by the caller
> + * @force:	whether to force write access even if user mapping is
> + *		readonly. This will result in the page being COWed even
> + *		in MAP_SHARED mappings. You do not want this.
> + * @pages:	array that receives pointers to the pages pinned.
> + *		Should be at least nr_pages long. Or NULL, if caller
> + *		only intends to ensure the pages are faulted in.
> + * @vmas:	array of pointers to vmas corresponding to each page.
> + *		Or NULL if the caller does not require them.
> + *
> + * Returns number of pages pinned.
> + *
> + * If the page table or memory page is hwpoisoned, return -EHWPOISON.
> + *
> + * Otherwise, same as get_user_pages.
> + */
> +int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
> +			    unsigned long start, int nr_pages, int write,
> +			    int force, struct page **pages,
> +			    struct vm_area_struct **vmas)
> +{
> +	int flags = FOLL_TOUCH | FOLL_HWPOISON;
> +
> +	if (pages)
> +		flags |= FOLL_GET;
> +	if (write)
> +		flags |= FOLL_WRITE;
> +	if (force)
> +		flags |= FOLL_FORCE;
> +
> +	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas);
> +}
> +EXPORT_SYMBOL(get_user_pages_hwpoison);

Seems that this patch will add unused bloat for CONFIG_PAGE_POISONING=n
kernels?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
  2010-12-22 23:07   ` Andrew Morton
@ 2010-12-23  0:39     ` Huang Ying
  2010-12-23  0:54       ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Huang Ying @ 2010-12-23  0:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Avi Kivity, Marcelo Tosatti, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, Andi Kleen, Luck, Tony, Dean Nelson

Hi, Andrew,

Thanks for comments.

On Thu, 2010-12-23 at 07:07 +0800, Andrew Morton wrote:
> On Wed, 22 Dec 2010 10:51:55 +0800
> Huang Ying <ying.huang@intel.com> wrote:
> 
> > Make __get_user_pages return -EHWPOISON for HWPOISON page only if
> > FOLL_HWPOISON is specified.  With this patch, the interested callers
> > can distinguish HWPOISON page from general FAULT page, while other
> > callers will still get -EFAULT for pages, so the user space interface
> > need not to be changed.
> > 
> > get_user_pages_hwpoison is added as a variant of get_user_pages that
> > can return -EHWPOISON for HWPOISON page.
> > 
> > This feature is needed by KVM, where UCR MCE should be relayed to
> > guest for HWPOISON page, while instruction emulation and MMIO will be
> > tried for general FAULT page.
> > 
> > The idea comes from Andrew Morton.
> 
> hm, I don't remember that.  I suspect it came from someone else?

It's long long ago.  I use another solution at that time, but now I
think your one is better.

http://marc.info/?l=linux-kernel&m=127241995131034&w=2

> > Signed-off-by: Huang Ying <ying.huang@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >  include/asm-generic/errno.h |    2 +
> >  include/linux/mm.h          |    5 ++++
> >  mm/memory.c                 |   53 +++++++++++++++++++++++++++++++++++++++++---
> >  3 files changed, 57 insertions(+), 3 deletions(-)
> > 
> > --- a/include/asm-generic/errno.h
> > +++ b/include/asm-generic/errno.h
> > @@ -108,4 +108,6 @@
> >  
> >  #define ERFKILL		132	/* Operation not possible due to RF-kill */
> >  
> > +#define EHWPOISON	133	/* Memory page has hardware error */
> 
> So this can be propagated to userspace?
> 
> If so, which syscalls might return EHWPOISON and are there any manpage
> implications?

No. EHWPOISON will only be returned if FOLL_HWPOISON is specified in
parameter flags of __get_user_pages.

> >  #endif
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -860,6 +860,10 @@ int get_user_pages(struct task_struct *t
> >  			struct page **pages, struct vm_area_struct **vmas);
> >  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >  			struct page **pages);
> > +int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
> > +			    unsigned long start, int nr_pages, int write,
> > +			    int force, struct page **pages,
> > +			    struct vm_area_struct **vmas);
> >  struct page *get_dump_page(unsigned long addr);
> >  
> >  extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
> > @@ -1415,6 +1419,7 @@ struct page *follow_page(struct vm_area_
> >  #define FOLL_GET	0x04	/* do get_page on page */
> >  #define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
> >  #define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
> > +#define FOLL_HWPOISON	0x20	/* check page is hwpoisoned */
> >  
> >  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
> >  			void *data);
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1449,9 +1449,16 @@ int __get_user_pages(struct task_struct
> >  				if (ret & VM_FAULT_ERROR) {
> >  					if (ret & VM_FAULT_OOM)
> >  						return i ? i : -ENOMEM;
> > -					if (ret &
> > -					    (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE|
> > -					     VM_FAULT_SIGBUS))
> > +					if (ret & (VM_FAULT_HWPOISON |
> > +						   VM_FAULT_HWPOISON_LARGE)) {
> > +						if (i)
> > +							return i;
> > +						else if (gup_flags & FOLL_HWPOISON)
> > +							return -EHWPOISON;
> > +						else
> > +							return -EFAULT;
> > +					}
> > +					if (ret & VM_FAULT_SIGBUS)
> 
> hm, that function is getting a bit unweildy.

Yes. Do you think the following code is better?

return i ? i : (gup_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;

> >  /**
> > + * get_user_pages_hwpoison() - pin user pages in memory, return hwpoison status
> > + * @tsk:	task_struct of target task
> > + * @mm:		mm_struct of target mm
> > + * @start:	starting user address
> > + * @nr_pages:	number of pages from start to pin
> > + * @write:	whether pages will be written to by the caller
> > + * @force:	whether to force write access even if user mapping is
> > + *		readonly. This will result in the page being COWed even
> > + *		in MAP_SHARED mappings. You do not want this.
> > + * @pages:	array that receives pointers to the pages pinned.
> > + *		Should be at least nr_pages long. Or NULL, if caller
> > + *		only intends to ensure the pages are faulted in.
> > + * @vmas:	array of pointers to vmas corresponding to each page.
> > + *		Or NULL if the caller does not require them.
> > + *
> > + * Returns number of pages pinned.
> > + *
> > + * If the page table or memory page is hwpoisoned, return -EHWPOISON.
> > + *
> > + * Otherwise, same as get_user_pages.
> > + */
> > +int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
> > +			    unsigned long start, int nr_pages, int write,
> > +			    int force, struct page **pages,
> > +			    struct vm_area_struct **vmas)
> > +{
> > +	int flags = FOLL_TOUCH | FOLL_HWPOISON;
> > +
> > +	if (pages)
> > +		flags |= FOLL_GET;
> > +	if (write)
> > +		flags |= FOLL_WRITE;
> > +	if (force)
> > +		flags |= FOLL_FORCE;
> > +
> > +	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas);
> > +}
> > +EXPORT_SYMBOL(get_user_pages_hwpoison);
> 
> Seems that this patch will add unused bloat for CONFIG_PAGE_POISONING=n
> kernels?

Sorry. I will enclose this function with
#ifdef CONFIG_MEMORY_FAILURE

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
  2010-12-23  0:39     ` Huang Ying
@ 2010-12-23  0:54       ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2010-12-23  0:54 UTC (permalink / raw)
  To: Huang Ying
  Cc: Avi Kivity, Marcelo Tosatti, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, Andi Kleen, Luck, Tony, Dean Nelson

On Thu, 23 Dec 2010 08:39:59 +0800
Huang Ying <ying.huang@intel.com> wrote:

> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1449,9 +1449,16 @@ int __get_user_pages(struct task_struct
> > >  				if (ret & VM_FAULT_ERROR) {
> > >  					if (ret & VM_FAULT_OOM)
> > >  						return i ? i : -ENOMEM;
> > > -					if (ret &
> > > -					    (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE|
> > > -					     VM_FAULT_SIGBUS))
> > > +					if (ret & (VM_FAULT_HWPOISON |
> > > +						   VM_FAULT_HWPOISON_LARGE)) {
> > > +						if (i)
> > > +							return i;
> > > +						else if (gup_flags & FOLL_HWPOISON)
> > > +							return -EHWPOISON;
> > > +						else
> > > +							return -EFAULT;
> > > +					}
> > > +					if (ret & VM_FAULT_SIGBUS)
> > 
> > hm, that function is getting a bit unweildy.
> 
> Yes. Do you think the following code is better?
> 
> return i ? i : (gup_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;

nope ;)

The function just needs to be ripped up and redone somehow.  That's not
an appropriate activity in the context of this patch though.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-12-23  0:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-22  2:51 [RFC 0/3] KVM, HWPoison, unpoison address across rebooting Huang Ying
2010-12-22  2:51 ` [RFC 1/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally Huang Ying
2010-12-22 23:07   ` Andrew Morton
2010-12-23  0:39     ` Huang Ying
2010-12-23  0:54       ` Andrew Morton
2010-12-22  2:51 ` [RFC 2/3] KVM, Replace is_hwpoison_address with get_user_pages_hwpoison Huang Ying
2010-12-22  2:51 ` [RFC 3/3] KVM, HWPoison, unpoison address across rebooting Huang Ying

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox