* HWPOISON huge page signal fixes
@ 2010-10-06 20:57 Andi Kleen
2010-10-06 20:57 ` [PATCH 1/2] Encode huge page size for VM_FAULT_HWPOISON errors Andi Kleen
2010-10-06 20:57 ` [PATCH 2/2] x86: HWPOISON: Report correct address granuality for huge hwpoison faults Andi Kleen
0 siblings, 2 replies; 5+ messages in thread
From: Andi Kleen @ 2010-10-06 20:57 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, fengguang.wu, n-horiguchi, x86
These patches fix the address granularity reporting for hugepage
hwpoison errors. This requires some straightforward changes
in the MM (to return this information from handle_mm_fault)
and in the x86 fault handler (to pass this information on).
Any reviews and acks appreciated.
I plan to carry this in my tree for now. Targeted for 2.6.37.
Thanks,
-Andi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/2] Encode huge page size for VM_FAULT_HWPOISON errors
2010-10-06 20:57 HWPOISON huge page signal fixes Andi Kleen
@ 2010-10-06 20:57 ` Andi Kleen
2010-10-07 2:25 ` Wu Fengguang
2010-10-06 20:57 ` [PATCH 2/2] x86: HWPOISON: Report correct address granuality for huge hwpoison faults Andi Kleen
1 sibling, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2010-10-06 20:57 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, fengguang.wu, n-horiguchi, x86, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
This fixes a problem introduced with the hugetlb hwpoison handling.
The user space SIGBUS signalling wants to know the size of the hugepage
that caused a HWPOISON fault.
Unfortunately the architecture page fault handlers do not have easy
access to the struct page.
Pass the information out in the fault error code instead.
I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encode
the hpage index in some free upper bits of the fault code. The small
page hwpoison case stays with the VM_FAULT_HWPOISON name to minimize
changes.
Also add code to hugetlb.h to convert that index into a page shift.
Will be used in a further patch.
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: fengguang.wu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/hugetlb.h | 6 ++++++
include/linux/mm.h | 12 ++++++++++--
mm/hugetlb.c | 6 ++++--
mm/memory.c | 3 ++-
4 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 796f30e..943c76b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -307,6 +307,11 @@ static inline struct hstate *page_hstate(struct page *page)
return size_to_hstate(PAGE_SIZE << compound_order(page));
}
+static inline unsigned hstate_index_to_shift(unsigned index)
+{
+ return hstates[index].order + PAGE_SHIFT;
+}
+
#else
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
@@ -324,6 +329,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
{
return 1;
}
+#define hstate_index_to_shift(index) 0
#endif
#endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 74949fb..f7e9efc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -718,12 +718,20 @@ static inline int page_mapped(struct page *page)
#define VM_FAULT_SIGBUS 0x0002
#define VM_FAULT_MAJOR 0x0004
#define VM_FAULT_WRITE 0x0008 /* Special case for get_user_pages */
-#define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned page */
+#define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned small page */
+#define VM_FAULT_HWPOISON_LARGE 0x0020 /* Hit poisoned large page. Index encoded in upper bits */
#define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */
#define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */
-#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON)
+#define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
+
+#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON | \
+ VM_FAULT_HWPOISON_LARGE)
+
+/* Encode hstate index for a hwpoisoned large page */
+#define VM_FAULT_SET_HINDEX(x) ((x) << 12)
+#define VM_FAULT_GET_HINDEX(x) (((x) >> 12) & 0xf)
/*
* Can be called by the pagefault handler when it gets a VM_FAULT_OOM.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67cd032..96991de 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2589,7 +2589,8 @@ retry:
* So we need to block hugepage fault by PG_hwpoison bit check.
*/
if (unlikely(PageHWPoison(page))) {
- ret = VM_FAULT_HWPOISON;
+ ret = VM_FAULT_HWPOISON |
+ VM_FAULT_SET_HINDEX(h - hstates);
goto backout_unlocked;
}
page_dup_rmap(page);
@@ -2656,7 +2657,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
migration_entry_wait(mm, (pmd_t *)ptep, address);
return 0;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
- return VM_FAULT_HWPOISON;
+ return VM_FAULT_HWPOISON_LARGE |
+ VM_FAULT_SET_HINDEX(h - hstates);
}
ptep = huge_pte_alloc(mm, address, huge_page_size(h));
diff --git a/mm/memory.c b/mm/memory.c
index 71b161b..8cea8f3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1450,7 +1450,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (ret & VM_FAULT_OOM)
return i ? i : -ENOMEM;
if (ret &
- (VM_FAULT_HWPOISON|VM_FAULT_SIGBUS))
+ (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE|
+ VM_FAULT_SIGBUS))
return i ? i : -EFAULT;
BUG();
}
--
1.7.1
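
For readers following the encoding described in the changelog above, here
is a minimal user-space sketch of how the hstate index round-trips through
the fault code. The VM_FAULT_* values and the SET/GET macros are copied
from the patch; everything prefixed demo_ or DEMO_ is hypothetical, and
the two-entry hstate table (2 MiB and 1 GiB on top of 4 KiB base pages)
is only an assumption for illustration, not something the patch defines.

```c
#include <assert.h>

/* Values copied from the include/linux/mm.h hunk in the patch. */
#define VM_FAULT_HWPOISON_LARGE      0x0020
#define VM_FAULT_HWPOISON_LARGE_MASK 0xf000
#define VM_FAULT_SET_HINDEX(x) ((x) << 12)
#define VM_FAULT_GET_HINDEX(x) (((x) >> 12) & 0xf)

/* Assumption for illustration: 4 KiB base pages. */
#define DEMO_PAGE_SHIFT 12

/* Assumption for illustration: orders of a 2 MiB and a 1 GiB hstate. */
static const unsigned demo_hstate_order[] = { 9, 18 };

/* Hypothetical stand-in for hstate_index_to_shift(). */
unsigned demo_index_to_shift(unsigned index)
{
	assert(index < sizeof(demo_hstate_order) / sizeof(demo_hstate_order[0]));
	return demo_hstate_order[index] + DEMO_PAGE_SHIFT;
}

/* Build a fault code the way the hugetlb_fault() hunk does. */
unsigned demo_encode_large(unsigned index)
{
	assert(index <= 0xf);	/* only four bits are reserved for the index */
	return VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(index);
}

/* Recover the page shift from a fault code; 0 if it is not a large-page hit. */
unsigned demo_decode_shift(unsigned fault)
{
	if (!(fault & VM_FAULT_HWPOISON_LARGE))
		return 0;
	return demo_index_to_shift(VM_FAULT_GET_HINDEX(fault));
}
```

The four-bit field limits the scheme to 16 hstates, which is why the
index rather than the size itself is stored.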
* [PATCH 2/2] x86: HWPOISON: Report correct address granuality for huge hwpoison faults
2010-10-06 20:57 HWPOISON huge page signal fixes Andi Kleen
2010-10-06 20:57 ` [PATCH 1/2] Encode huge page size for VM_FAULT_HWPOISON errors Andi Kleen
@ 2010-10-06 20:57 ` Andi Kleen
2010-10-07 2:27 ` Wu Fengguang
1 sibling, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2010-10-06 20:57 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, fengguang.wu, n-horiguchi, x86, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
An earlier patch fixed the hwpoison fault handling to encode the
huge page size in the fault code of the page fault handler.
This is needed to report this information in SIGBUS to user space.
This is a straightforward patch to pass this information
through to the signal handling in the x86-specific fault.c.
Cc: x86@kernel.org
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: fengguang.wu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/mm/fault.c | 19 +++++++++++++------
1 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4c4508e..1d15a27 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -11,6 +11,7 @@
#include <linux/kprobes.h> /* __kprobes, ... */
#include <linux/mmiotrace.h> /* kmmio_handler, ... */
#include <linux/perf_event.h> /* perf_sw_event */
+#include <linux/hugetlb.h> /* hstate_index_to_shift */
#include <asm/traps.h> /* dotraplinkage, ... */
#include <asm/pgalloc.h> /* pgd_*(), ... */
@@ -160,15 +161,20 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
static void
force_sig_info_fault(int si_signo, int si_code, unsigned long address,
- struct task_struct *tsk)
+ struct task_struct *tsk, int fault)
{
+ unsigned lsb = 0;
siginfo_t info;
info.si_signo = si_signo;
info.si_errno = 0;
info.si_code = si_code;
info.si_addr = (void __user *)address;
- info.si_addr_lsb = si_code == BUS_MCEERR_AR ? PAGE_SHIFT : 0;
+ if (fault & VM_FAULT_HWPOISON_LARGE)
+ lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+ if (fault & VM_FAULT_HWPOISON)
+ lsb = PAGE_SHIFT;
+ info.si_addr_lsb = lsb;
force_sig_info(si_signo, &info, tsk);
}
@@ -731,7 +737,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
tsk->thread.error_code = error_code | (address >= TASK_SIZE);
tsk->thread.trap_no = 14;
- force_sig_info_fault(SIGSEGV, si_code, address, tsk);
+ force_sig_info_fault(SIGSEGV, si_code, address, tsk, 0);
return;
}
@@ -816,14 +822,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
tsk->thread.trap_no = 14;
#ifdef CONFIG_MEMORY_FAILURE
- if (fault & VM_FAULT_HWPOISON) {
+ if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
printk(KERN_ERR
"MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
tsk->comm, tsk->pid, address);
code = BUS_MCEERR_AR;
}
#endif
- force_sig_info_fault(SIGBUS, code, address, tsk);
+ force_sig_info_fault(SIGBUS, code, address, tsk, fault);
}
static noinline void
@@ -833,7 +839,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
if (fault & VM_FAULT_OOM) {
out_of_memory(regs, error_code, address);
} else {
- if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON))
+ if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
+ VM_FAULT_HWPOISON_LARGE))
do_sigbus(regs, error_code, address, fault);
else
BUG();
--
1.7.1
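
As a rough illustration of what user space can do with the value this
patch reports, the sketch below turns si_addr and si_addr_lsb from a
SIGBUS with si_code BUS_MCEERR_AR into the affected address range. The
siginfo_t fields and BUS_MCEERR_AR are the existing Linux interface; the
struct, the function names and the handler itself are hypothetical and
only show one possible way to consume the information.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>

/* Hypothetical helper type: the naturally aligned region that was lost. */
struct poison_range {
	uintptr_t start;
	uintptr_t len;
};

/* si_addr_lsb is the log2 of the granularity of the reported address. */
struct poison_range poison_range_from(uintptr_t addr, unsigned lsb)
{
	struct poison_range r;

	assert(lsb < sizeof(uintptr_t) * 8);
	r.len = (uintptr_t)1 << lsb;
	r.start = addr & ~(r.len - 1);
	return r;
}

/* Last range seen by the handler, for the application to inspect later. */
volatile uintptr_t last_poison_start;
volatile uintptr_t last_poison_len;

/* Hypothetical SA_SIGINFO handler; real recovery logic is application specific. */
void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	if (info->si_code == BUS_MCEERR_AR) {
		struct poison_range r =
			poison_range_from((uintptr_t)info->si_addr,
					  (unsigned)info->si_addr_lsb);
		last_poison_start = r.start;
		last_poison_len = r.len;
	}
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, 0);
}
```

With the old fixed PAGE_SHIFT value, a handler like this would assume
only one small page was lost even when the fault hit a huge page.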
* Re: [PATCH 1/2] Encode huge page size for VM_FAULT_HWPOISON errors
2010-10-06 20:57 ` [PATCH 1/2] Encode huge page size for VM_FAULT_HWPOISON errors Andi Kleen
@ 2010-10-07 2:25 ` Wu Fengguang
0 siblings, 0 replies; 5+ messages in thread
From: Wu Fengguang @ 2010-10-07 2:25 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
n-horiguchi@ah.jp.nec.com, x86@kernel.org, Andi Kleen
On Thu, Oct 07, 2010 at 04:57:20AM +0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> This fixes a problem introduced with the hugetlb hwpoison handling
>
> The user space SIGBUS signalling wants to know the size of the hugepage
> that caused a HWPOISON fault.
>
> Unfortunately the architecture page fault handlers do not have easy
> access to the struct page.
>
> Pass the information out in the fault error code instead.
>
> I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encode
> the hpage index in some free upper bits of the fault code. The small
> page hwpoison case stays with the VM_FAULT_HWPOISON name to minimize
> changes.
>
> Also add code to hugetlb.h to convert that index into a page shift.
Using the hstate index is space efficient, but at the cost of
more code and tight coupling with hugetlb. If we directly encode
page_order-PAGE_SHIFT, a mask of 0x3f (6 bits) can represent a
max order of 63+12=75, which is sufficiently large. We still have plenty
of free bits in the 32-bit fault code :)
Thanks,
Fengguang
> Will be used in a further patch.
>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: fengguang.wu@intel.com
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> include/linux/hugetlb.h | 6 ++++++
> include/linux/mm.h | 12 ++++++++++--
> mm/hugetlb.c | 6 ++++--
> mm/memory.c | 3 ++-
> 4 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 796f30e..943c76b 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -307,6 +307,11 @@ static inline struct hstate *page_hstate(struct page *page)
> return size_to_hstate(PAGE_SIZE << compound_order(page));
> }
>
> +static inline unsigned hstate_index_to_shift(unsigned index)
> +{
> + return hstates[index].order + PAGE_SHIFT;
> +}
> +
> #else
> struct hstate {};
> #define alloc_huge_page_node(h, nid) NULL
> @@ -324,6 +329,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
> {
> return 1;
> }
> +#define hstate_index_to_shift(index) 0
> #endif
>
> #endif /* _LINUX_HUGETLB_H */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 74949fb..f7e9efc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -718,12 +718,20 @@ static inline int page_mapped(struct page *page)
> #define VM_FAULT_SIGBUS 0x0002
> #define VM_FAULT_MAJOR 0x0004
> #define VM_FAULT_WRITE 0x0008 /* Special case for get_user_pages */
> -#define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned page */
> +#define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned small page */
> +#define VM_FAULT_HWPOISON_LARGE 0x0020 /* Hit poisoned large page. Index encoded in upper bits */
>
> #define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */
> #define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */
>
> -#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON)
> +#define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
> +
> +#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON | \
> + VM_FAULT_HWPOISON_LARGE)
> +
> +/* Encode hstate index for a hwpoisoned large page */
> +#define VM_FAULT_SET_HINDEX(x) ((x) << 12)
> +#define VM_FAULT_GET_HINDEX(x) (((x) >> 12) & 0xf)
>
> /*
> * Can be called by the pagefault handler when it gets a VM_FAULT_OOM.
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67cd032..96991de 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2589,7 +2589,8 @@ retry:
> * So we need to block hugepage fault by PG_hwpoison bit check.
> */
> if (unlikely(PageHWPoison(page))) {
> - ret = VM_FAULT_HWPOISON;
> + ret = VM_FAULT_HWPOISON |
> + VM_FAULT_SET_HINDEX(h - hstates);
> goto backout_unlocked;
> }
> page_dup_rmap(page);
> @@ -2656,7 +2657,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> migration_entry_wait(mm, (pmd_t *)ptep, address);
> return 0;
> } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> - return VM_FAULT_HWPOISON;
> + return VM_FAULT_HWPOISON_LARGE |
> + VM_FAULT_SET_HINDEX(h - hstates);
> }
>
> ptep = huge_pte_alloc(mm, address, huge_page_size(h));
> diff --git a/mm/memory.c b/mm/memory.c
> index 71b161b..8cea8f3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1450,7 +1450,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> if (ret & VM_FAULT_OOM)
> return i ? i : -ENOMEM;
> if (ret &
> - (VM_FAULT_HWPOISON|VM_FAULT_SIGBUS))
> + (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE|
> + VM_FAULT_SIGBUS))
> return i ? i : -EFAULT;
> BUG();
> }
> --
> 1.7.1
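
The alternative raised in the review comment above (storing the order
directly in six bits instead of an hstate index) could look roughly like
the sketch below. None of these names exist in the patch; the macro
names, the choice of bit 12 as the field position and the 4 KiB base
page size are all assumptions made only to illustrate the idea.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB base pages. */
#define DEMO_PAGE_SHIFT  12

/* Hypothetical layout: a 6-bit order field starting at bit 12. */
#define DEMO_ORDER_SHIFT 12
#define DEMO_ORDER_MASK  0x3f
#define DEMO_SET_ORDER(o) (((o) & DEMO_ORDER_MASK) << DEMO_ORDER_SHIFT)
#define DEMO_GET_ORDER(f) (((f) >> DEMO_ORDER_SHIFT) & DEMO_ORDER_MASK)

/* Flag value copied from the patch under review. */
#define VM_FAULT_HWPOISON_LARGE 0x0020

/* Encode the order itself, so no hstate table lookup is needed later. */
unsigned demo_encode_order(unsigned order)
{
	assert(order <= DEMO_ORDER_MASK);
	return VM_FAULT_HWPOISON_LARGE | DEMO_SET_ORDER(order);
}

/* The arch fault handler can compute the shift with plain arithmetic. */
unsigned demo_fault_to_shift(unsigned fault)
{
	return DEMO_GET_ORDER(fault) + DEMO_PAGE_SHIFT;
}
```

The trade-off is two more bits in the fault code in exchange for dropping
the dependency on hugetlb.h in the architecture fault handler.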
* Re: [PATCH 2/2] x86: HWPOISON: Report correct address granuality for huge hwpoison faults
2010-10-06 20:57 ` [PATCH 2/2] x86: HWPOISON: Report correct address granuality for huge hwpoison faults Andi Kleen
@ 2010-10-07 2:27 ` Wu Fengguang
0 siblings, 0 replies; 5+ messages in thread
From: Wu Fengguang @ 2010-10-07 2:27 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
n-horiguchi@ah.jp.nec.com, x86@kernel.org, Andi Kleen
On Thu, Oct 07, 2010 at 04:57:21AM +0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> An earlier patch fixed the hwpoison fault handling to encode the
> huge page size in the fault code of the page fault handler.
>
> This is needed to report this information in SIGBUS to user space.
>
> This is a straightforward patch to pass this information
> through to the signal handling in the x86-specific fault.c.
>
> Cc: x86@kernel.org
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: fengguang.wu@intel.com
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> arch/x86/mm/fault.c | 19 +++++++++++++------
> 1 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 4c4508e..1d15a27 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -11,6 +11,7 @@
> #include <linux/kprobes.h> /* __kprobes, ... */
> #include <linux/mmiotrace.h> /* kmmio_handler, ... */
> #include <linux/perf_event.h> /* perf_sw_event */
> +#include <linux/hugetlb.h> /* hstate_index_to_shift */
>
> #include <asm/traps.h> /* dotraplinkage, ... */
> #include <asm/pgalloc.h> /* pgd_*(), ... */
> @@ -160,15 +161,20 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
>
> static void
> force_sig_info_fault(int si_signo, int si_code, unsigned long address,
> - struct task_struct *tsk)
> + struct task_struct *tsk, int fault)
> {
> + unsigned lsb = 0;
> siginfo_t info;
>
> info.si_signo = si_signo;
> info.si_errno = 0;
> info.si_code = si_code;
> info.si_addr = (void __user *)address;
> - info.si_addr_lsb = si_code == BUS_MCEERR_AR ? PAGE_SHIFT : 0;
Ah you changed the conditional 0..
> + if (fault & VM_FAULT_HWPOISON_LARGE)
> + lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> + if (fault & VM_FAULT_HWPOISON)
> + lsb = PAGE_SHIFT;
> + info.si_addr_lsb = lsb;
>
> force_sig_info(si_signo, &info, tsk);
> }
> @@ -731,7 +737,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> tsk->thread.error_code = error_code | (address >= TASK_SIZE);
> tsk->thread.trap_no = 14;
>
> - force_sig_info_fault(SIGSEGV, si_code, address, tsk);
> + force_sig_info_fault(SIGSEGV, si_code, address, tsk, 0);
..and it's sure reasonable.
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Thanks,
Fengguang