linux-mm.kvack.org archive mirror
* [PATCH 0/8] HWPOISON for hugepage (v6)
@ 2010-05-28  0:29 Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugetlb_inline.h Naoya Horiguchi
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

Hi,

Here is a "HWPOISON for hugepage" patchset which reflects
Mel's comments on hugepage rmapping code.
Only patch 1/8 and 2/8 are changed since the previous post.

Mel, could you please restart reviewing and testing?

 include/linux/hugetlb.h        |   14 +---
 include/linux/hugetlb_inline.h |   22 +++++++
 include/linux/pagemap.h        |    9 +++-
 include/linux/poison.h         |    9 ---
 include/linux/rmap.h           |    5 ++
 mm/hugetlb.c                   |  100 ++++++++++++++++++++++++++++++++-
 mm/hwpoison-inject.c           |   15 +++--
 mm/memory-failure.c            |  120 ++++++++++++++++++++++++++++++----------
 mm/rmap.c                      |   59 ++++++++++++++++++++
 9 files changed, 295 insertions(+), 58 deletions(-)

ChangeLog from v5:
- rebased to 2.6.34
- fix logic error (in case that private mapping and shared mapping coexist)
- move is_vm_hugetlb_page() into include/linux/mm.h to use this function
  from linear_page_index()
- define and use linear_hugepage_index() instead of compound_order()
- use page_move_anon_rmap() in hugetlb_cow()
- copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
- revert commit 23be7468 completely
- create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
- move functions setting up anon_vma for hugepage into mm/rmap.c.

ChangeLog from v4:
- rebased to 2.6.34-rc7
- add isolation code for free/reserved hugepage in me_huge_page()
- set/clear PG_hwpoison bits of all pages in hugepage.
- mce_bad_pages counts all pages in hugepage.
- rename __hugepage_set_anon_rmap() to hugepage_add_anon_rmap()
- add huge_pte_offset() dummy function in header file on !CONFIG_HUGETLBFS

ChangeLog from v3:
- rebased to 2.6.34-rc5
- support for privately mapped hugepage

ChangeLog from v2:
- rebase to 2.6.34-rc3
- consider mapcount of hugepage
- rename pointer "head" into "hpage"

ChangeLog from v1:
- rebase to 2.6.34-rc1
- add comment from Wu Fengguang

Thanks,
Naoya Horiguchi


* [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugetlb_inline.h
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-28 10:03   ` Mel Gorman
  2010-05-28  0:29 ` [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage Naoya Horiguchi
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

is_vm_hugetlb_page() is a widely used inline function for inserting
hooks into hugetlb code, but it cannot be used in pagemap.h because of
a circular dependency between the header files. This patch removes
that limitation.
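
The helper itself needs nothing beyond vm_area_struct and one flag bit,
which is why it can live in a tiny header with no further dependencies.
A minimal userspace sketch of that idea (not kernel code; the struct
and the VM_HUGETLB value below are simplified stand-ins):

#include <stdio.h>

#define VM_HUGETLB 0x00400000UL	/* illustrative stand-in for the real flag */

struct vm_area_struct { unsigned long vm_flags; };

static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
{
	return vma->vm_flags & VM_HUGETLB;
}

int main(void)
{
	struct vm_area_struct vma = { .vm_flags = VM_HUGETLB };
	printf("hugetlb vma? %s\n", is_vm_hugetlb_page(&vma) ? "yes" : "no");
	return 0;
}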

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb.h        |   11 +----------
 include/linux/hugetlb_inline.h |   22 ++++++++++++++++++++++
 include/linux/pagemap.h        |    1 +
 3 files changed, 24 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/hugetlb_inline.h

diff --git v2.6.34/include/linux/hugetlb.h v2.6.34/include/linux/hugetlb.h
index 78b4bc6..d47a7c4 100644
--- v2.6.34/include/linux/hugetlb.h
+++ v2.6.34/include/linux/hugetlb.h
@@ -2,6 +2,7 @@
 #define _LINUX_HUGETLB_H
 
 #include <linux/fs.h>
+#include <linux/hugetlb_inline.h>
 
 struct ctl_table;
 struct user_struct;
@@ -14,11 +15,6 @@ struct user_struct;
 
 int PageHuge(struct page *page);
 
-static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
-{
-	return vma->vm_flags & VM_HUGETLB;
-}
-
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
@@ -77,11 +73,6 @@ static inline int PageHuge(struct page *page)
 	return 0;
 }
 
-static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
-{
-	return 0;
-}
-
 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 {
 }
diff --git v2.6.34/include/linux/hugetlb_inline.h v2.6.34/include/linux/hugetlb_inline.h
new file mode 100644
index 0000000..cf00b6d
--- /dev/null
+++ v2.6.34/include/linux/hugetlb_inline.h
@@ -0,0 +1,22 @@
+#ifndef _LINUX_HUGETLB_INLINE_H
+#define _LINUX_HUGETLB_INLINE_H 1
+
+#ifdef CONFIG_HUGETLBFS
+
+#include <linux/mm.h>
+
+static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_HUGETLB;
+}
+
+#else
+
+static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
+{
+	return 0;
+}
+
+#endif
+
+#endif
diff --git v2.6.34/include/linux/pagemap.h v2.6.34/include/linux/pagemap.h
index 3c62ed4..b2bd2ba 100644
--- v2.6.34/include/linux/pagemap.h
+++ v2.6.34/include/linux/pagemap.h
@@ -13,6 +13,7 @@
 #include <linux/gfp.h>
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
+#include <linux/hugetlb_inline.h>
 
 /*
  * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
-- 
1.7.0


* [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugetlb_inline.h Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-28 14:48   ` Mel Gorman
  2010-06-02 18:16   ` Andrew Morton
  2010-05-28  0:29 ` [PATCH 3/8] HWPOISON, hugetlb: enable error handling path for hugepage Naoya Horiguchi
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman,
	Andrea Arcangeli, Larry Woodman, Lee Schermerhorn

This patch adds a reverse mapping feature for hugepages by introducing
a mapcount for shared/privately-mapped hugepages and an anon_vma for
privately-mapped hugepages.

While hugepages are not currently swappable, reverse mapping is still
useful for the memory error handler.

Without this patch, the memory error handler can neither identify the
processes using a bad hugepage nor unmap it from them. That is:
- for a shared hugepage:
  we can collect the processes using the hugepage through the pagecache,
  but cannot unmap the hugepage because of the lack of a mapcount.
- for a privately mapped hugepage:
  we can neither collect the processes nor unmap the hugepage.
This patch solves these problems.

This patch includes the bug fix made by commit 23be7468e8, so that
commit is reverted here.
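
The unmap side relies on the usual _mapcount convention: the counter
starts at -1 (unmapped), the first mapper brings it to 0, and each
sharer increments it, so page_mapped() is simply "_mapcount >= 0".
A minimal userspace model of that lifecycle (not kernel code; the
names mirror the kernel helpers but the types are simplified):

#include <stdatomic.h>
#include <stdio.h>

struct page { atomic_int _mapcount; };

static void page_init(struct page *p)        { atomic_init(&p->_mapcount, -1); }
/* first mapping; like hugepage_add_new_anon_rmap() */
static int  page_add_rmap(struct page *p)    { return atomic_fetch_add(&p->_mapcount, 1) == -1; }
/* an extra sharer; like page_dup_rmap() in copy_hugetlb_page_range() */
static void page_dup_rmap(struct page *p)    { atomic_fetch_add(&p->_mapcount, 1); }
/* one unmap; like page_remove_rmap() in __unmap_hugepage_range() */
static void page_remove_rmap(struct page *p) { atomic_fetch_sub(&p->_mapcount, 1); }
static int  page_mapped(struct page *p)      { return atomic_load(&p->_mapcount) >= 0; }

int main(void)
{
	struct page hpage;

	page_init(&hpage);
	printf("first mapper? %d\n", page_add_rmap(&hpage));	/* 1 */
	page_dup_rmap(&hpage);		/* e.g. fork() duplicating the mapping */
	page_remove_rmap(&hpage);
	page_remove_rmap(&hpage);
	printf("still mapped? %d\n", page_mapped(&hpage));	/* 0 */
	return 0;
}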

Dependency:
  "hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"

ChangeLog since May 24.
- create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
- move functions setting up anon_vma for hugepage into mm/rmap.c.

ChangeLog since May 13.
- rebased to 2.6.34
- fix logic error (in case that private mapping and shared mapping coexist)
- move is_vm_hugetlb_page() into include/linux/mm.h to use this function
  from linear_page_index()
- define and use linear_hugepage_index() instead of compound_order()
- use page_move_anon_rmap() in hugetlb_cow()
- copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
- revert commit 23be7468 completely
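
On the linear_page_index() change below: a hugetlb VMA keeps its page
cache index in hugepage-sized units, not base-page units, which is why
the function must branch to linear_hugepage_index(). A standalone model
of the two computations (not kernel code; assumes 4KB base pages and
2MB hugepages):

#include <stdio.h>

#define PAGE_SHIFT	12				/* 4KB base pages */
#define HPAGE_SHIFT	21				/* 2MB hugepages */
#define HPAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)	/* 9 */

typedef unsigned long pgoff_t;

static pgoff_t linear_page_index(unsigned long vm_start, pgoff_t vm_pgoff,
				 unsigned long address)
{
	return ((address - vm_start) >> PAGE_SHIFT) + vm_pgoff;
}

static pgoff_t linear_hugepage_index(unsigned long vm_start, pgoff_t vm_pgoff,
				     unsigned long address)
{
	/* mirrors vma_hugecache_offset(): index in hugepage-sized units */
	return ((address - vm_start) >> HPAGE_SHIFT) + (vm_pgoff >> HPAGE_ORDER);
}

int main(void)
{
	unsigned long vm_start = 0x40000000UL;
	unsigned long addr     = vm_start + 3 * (1UL << HPAGE_SHIFT);

	printf("base-page index: %lu\n", linear_page_index(vm_start, 0, addr));     /* 1536 */
	printf("hugepage index:  %lu\n", linear_hugepage_index(vm_start, 0, addr)); /* 3 */
	return 0;
}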

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
---
 include/linux/hugetlb.h |    1 +
 include/linux/pagemap.h |    8 +++++-
 include/linux/poison.h  |    9 -------
 include/linux/rmap.h    |    5 ++++
 mm/hugetlb.c            |   44 +++++++++++++++++++++++++++++++++-
 mm/rmap.c               |   59 +++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 114 insertions(+), 12 deletions(-)

diff --git v2.6.34/include/linux/hugetlb.h v2.6.34/include/linux/hugetlb.h
index d47a7c4..e688fd8 100644
--- v2.6.34/include/linux/hugetlb.h
+++ v2.6.34/include/linux/hugetlb.h
@@ -99,6 +99,7 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
 #define is_hugepage_only_range(mm, addr, len)	0
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
+#define huge_pte_offset(mm, address)	0
 
 #define hugetlb_change_protection(vma, address, end, newprot)
 
diff --git v2.6.34/include/linux/pagemap.h v2.6.34/include/linux/pagemap.h
index b2bd2ba..a547d96 100644
--- v2.6.34/include/linux/pagemap.h
+++ v2.6.34/include/linux/pagemap.h
@@ -282,10 +282,16 @@ static inline loff_t page_offset(struct page *page)
 	return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
 }
 
+extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
+				     unsigned long address);
+
 static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 					unsigned long address)
 {
-	pgoff_t pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
+	pgoff_t pgoff;
+	if (unlikely(is_vm_hugetlb_page(vma)))
+		return linear_hugepage_index(vma, address);
+	pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
 	pgoff += vma->vm_pgoff;
 	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 }
diff --git v2.6.34/include/linux/poison.h v2.6.34/include/linux/poison.h
index 34066ff..2110a81 100644
--- v2.6.34/include/linux/poison.h
+++ v2.6.34/include/linux/poison.h
@@ -48,15 +48,6 @@
 #define POISON_FREE	0x6b	/* for use-after-free poisoning */
 #define	POISON_END	0xa5	/* end-byte of poisoning */
 
-/********** mm/hugetlb.c **********/
-/*
- * Private mappings of hugetlb pages use this poisoned value for
- * page->mapping. The core VM should not be doing anything with this mapping
- * but futex requires the existence of some page->mapping value even though it
- * is unused if PAGE_MAPPING_ANON is set.
- */
-#define HUGETLB_POISON	((void *)(0x00300300 + POISON_POINTER_DELTA + PAGE_MAPPING_ANON))
-
 /********** arch/$ARCH/mm/init.c **********/
 #define POISON_FREE_INITMEM	0xcc
 
diff --git v2.6.34/include/linux/rmap.h v2.6.34/include/linux/rmap.h
index d25bd22..18cbe4b 100644
--- v2.6.34/include/linux/rmap.h
+++ v2.6.34/include/linux/rmap.h
@@ -131,6 +131,11 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned lon
 void page_add_file_rmap(struct page *);
 void page_remove_rmap(struct page *);
 
+void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
+			    unsigned long);
+void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *,
+				unsigned long);
+
 static inline void page_dup_rmap(struct page *page)
 {
 	atomic_inc(&page->_mapcount);
diff --git v2.6.34/mm/hugetlb.c v2.6.34/mm/hugetlb.c
index 4c9e6bb..b1aa0d8 100644
--- v2.6.34/mm/hugetlb.c
+++ v2.6.34/mm/hugetlb.c
@@ -18,6 +18,7 @@
 #include <linux/bootmem.h>
 #include <linux/sysfs.h>
 #include <linux/slab.h>
+#include <linux/rmap.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -220,6 +221,12 @@ static pgoff_t vma_hugecache_offset(struct hstate *h,
 			(vma->vm_pgoff >> huge_page_order(h));
 }
 
+pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
+				     unsigned long address)
+{
+	return vma_hugecache_offset(hstate_vma(vma), vma, address);
+}
+
 /*
  * Return the size of the pages allocated when backing a VMA. In the majority
  * cases this will be same size as used by the page table entries.
@@ -548,6 +555,7 @@ static void free_huge_page(struct page *page)
 	set_page_private(page, 0);
 	page->mapping = NULL;
 	BUG_ON(page_count(page));
+	BUG_ON(page_mapcount(page));
 	INIT_LIST_HEAD(&page->lru);
 
 	spin_lock(&hugetlb_lock);
@@ -2125,6 +2133,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
 			get_page(ptepage);
+			page_dup_rmap(ptepage);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		}
 		spin_unlock(&src->page_table_lock);
@@ -2203,6 +2212,7 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 	flush_tlb_range(vma, start, end);
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	list_for_each_entry_safe(page, tmp, &page_list, lru) {
+		page_remove_rmap(page);
 		list_del(&page->lru);
 		put_page(page);
 	}
@@ -2268,6 +2278,9 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 	return 1;
 }
 
+/*
+ * hugetlb_cow() should be called with the page lock of the original hugepage held.
+ */
 static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, pte_t *ptep, pte_t pte,
 			struct page *pagecache_page)
@@ -2282,8 +2295,11 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 retry_avoidcopy:
 	/* If no-one else is actually using this page, avoid the copy
 	 * and just make the page writable */
-	avoidcopy = (page_count(old_page) == 1);
+	avoidcopy = (page_mapcount(old_page) == 1);
 	if (avoidcopy) {
+		if (!trylock_page(old_page))
+			if (PageAnon(old_page))
+				page_move_anon_rmap(old_page, vma, address);
 		set_huge_ptep_writable(vma, address, ptep);
 		return 0;
 	}
@@ -2334,6 +2350,13 @@ retry_avoidcopy:
 		return -PTR_ERR(new_page);
 	}
 
+	/*
+	 * When the original hugepage is a shared one, it does not have
+	 * an anon_vma prepared.
+	 */
+	if (unlikely(anon_vma_prepare(vma)))
+		return VM_FAULT_OOM;
+
 	copy_huge_page(new_page, old_page, address, vma);
 	__SetPageUptodate(new_page);
 
@@ -2348,6 +2371,8 @@ retry_avoidcopy:
 		huge_ptep_clear_flush(vma, address, ptep);
 		set_huge_pte_at(mm, address, ptep,
 				make_huge_pte(vma, new_page, 1));
+		page_remove_rmap(old_page);
+		hugepage_add_anon_rmap(new_page, vma, address);
 		/* Make the old page be freed below */
 		new_page = old_page;
 	}
@@ -2448,10 +2473,17 @@ retry:
 			spin_lock(&inode->i_lock);
 			inode->i_blocks += blocks_per_huge_page(h);
 			spin_unlock(&inode->i_lock);
+			page_dup_rmap(page);
 		} else {
 			lock_page(page);
-			page->mapping = HUGETLB_POISON;
+			if (unlikely(anon_vma_prepare(vma))) {
+				ret = VM_FAULT_OOM;
+				goto backout_unlocked;
+			}
+			hugepage_add_new_anon_rmap(page, vma, address);
 		}
+	} else {
+		page_dup_rmap(page);
 	}
 
 	/*
@@ -2503,6 +2535,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *ptep;
 	pte_t entry;
 	int ret;
+	struct page *page = NULL;
 	struct page *pagecache_page = NULL;
 	static DEFINE_MUTEX(hugetlb_instantiation_mutex);
 	struct hstate *h = hstate_vma(vma);
@@ -2544,6 +2577,11 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 								vma, address);
 	}
 
+	if (!pagecache_page) {
+		page = pte_page(entry);
+		lock_page(page);
+	}
+
 	spin_lock(&mm->page_table_lock);
 	/* Check for a racing update before calling hugetlb_cow */
 	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
@@ -2569,6 +2607,8 @@ out_page_table_lock:
 	if (pagecache_page) {
 		unlock_page(pagecache_page);
 		put_page(pagecache_page);
+	} else {
+		unlock_page(page);
 	}
 
 out_mutex:
diff --git v2.6.34/mm/rmap.c v2.6.34/mm/rmap.c
index 0feeef8..5278371 100644
--- v2.6.34/mm/rmap.c
+++ v2.6.34/mm/rmap.c
@@ -56,6 +56,7 @@
 #include <linux/memcontrol.h>
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
+#include <linux/hugetlb.h>
 
 #include <asm/tlbflush.h>
 
@@ -326,6 +327,8 @@ vma_address(struct page *page, struct vm_area_struct *vma)
 	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 	unsigned long address;
 
+	if (unlikely(is_vm_hugetlb_page(vma)))
+		pgoff = page->index << huge_page_order(page_hstate(page));
 	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
 		/* page should be within @vma mapping range */
@@ -369,6 +372,12 @@ pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 	pte_t *pte;
 	spinlock_t *ptl;
 
+	if (unlikely(PageHuge(page))) {
+		pte = huge_pte_offset(mm, address);
+		ptl = &mm->page_table_lock;
+		goto check;
+	}
+
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
 		return NULL;
@@ -389,6 +398,7 @@ pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 	}
 
 	ptl = pte_lockptr(mm, pmd);
+check:
 	spin_lock(ptl);
 	if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
 		*ptlp = ptl;
@@ -873,6 +883,12 @@ void page_remove_rmap(struct page *page)
 		page_clear_dirty(page);
 		set_page_dirty(page);
 	}
+	/*
+	 * Hugepages are not counted in NR_ANON_PAGES or NR_FILE_MAPPED,
+	 * and are not charged by memcg for now.
+	 */
+	if (unlikely(PageHuge(page)))
+		return;
 	if (PageAnon(page)) {
 		mem_cgroup_uncharge_page(page);
 		__dec_zone_page_state(page, NR_ANON_PAGES);
@@ -1419,3 +1435,46 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 		return rmap_walk_file(page, rmap_one, arg);
 }
 #endif /* CONFIG_MIGRATION */
+
+#ifdef CONFIG_HUGETLBFS
+/*
+ * The following three functions are for anonymous (private mapped) hugepages.
+ * Unlike common anonymous pages, anonymous hugepages have no accounting code
+ * and no lru code, because we handle hugepages differently from common pages.
+ */
+static void __hugepage_set_anon_rmap(struct page *page,
+	struct vm_area_struct *vma, unsigned long address, int exclusive)
+{
+	struct anon_vma *anon_vma = vma->anon_vma;
+	BUG_ON(!anon_vma);
+	if (!exclusive) {
+		struct anon_vma_chain *avc;
+		avc = list_entry(vma->anon_vma_chain.prev,
+				 struct anon_vma_chain, same_vma);
+		anon_vma = avc->anon_vma;
+	}
+	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
+	page->mapping = (struct address_space *) anon_vma;
+	page->index = linear_page_index(vma, address);
+}
+
+void hugepage_add_anon_rmap(struct page *page,
+			    struct vm_area_struct *vma, unsigned long address)
+{
+	struct anon_vma *anon_vma = vma->anon_vma;
+	int first;
+	BUG_ON(!anon_vma);
+	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
+	first = atomic_inc_and_test(&page->_mapcount);
+	if (first)
+		__hugepage_set_anon_rmap(page, vma, address, 0);
+}
+
+void hugepage_add_new_anon_rmap(struct page *page,
+			struct vm_area_struct *vma, unsigned long address)
+{
+	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
+	atomic_set(&page->_mapcount, 0);
+	__hugepage_set_anon_rmap(page, vma, address, 1);
+}
+#endif /* CONFIG_HUGETLBFS */
-- 
1.7.0


* [PATCH 3/8] HWPOISON, hugetlb: enable error handling path for hugepage
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugetlb_inline.h Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 4/8] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage Naoya Horiguchi
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

This patch just enables the error handling path for hugepages.
The real containment and recovery operations will be implemented
in the following patches.
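
Throughout this series the handler switches from the raw error page to
compound_head(p), because for a hugepage the lock, mapcount and mapping
all live on the head page. A userspace sketch of the idea (not kernel
code; the real compound_head() follows page->first_page rather than
doing pfn arithmetic):

#include <stdio.h>

#define HPAGE_ORDER 9	/* order of a 2MB hugepage with 4KB base pages */

static unsigned long compound_head_pfn(unsigned long pfn, unsigned int order)
{
	return pfn & ~((1UL << order) - 1);	/* round down to the head */
}

int main(void)
{
	unsigned long bad_pfn = 0x12345;	/* error hit a tail page */

	printf("head pfn: %#lx\n", compound_head_pfn(bad_pfn, HPAGE_ORDER));
	return 0;
}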

Dependency:
  "hugetlb, rmap: add reverse mapping for hugepage."

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/memory-failure.c |   39 ++++++++++++++++++++++-----------------
 1 files changed, 22 insertions(+), 17 deletions(-)

diff --git v2.6.34/mm/memory-failure.c v2.6.34/mm/memory-failure.c
index 620b0b4..1ec68c8 100644
--- v2.6.34/mm/memory-failure.c
+++ v2.6.34/mm/memory-failure.c
@@ -45,6 +45,7 @@
 #include <linux/page-isolation.h>
 #include <linux/suspend.h>
 #include <linux/slab.h>
+#include <linux/hugetlb.h>
 #include "internal.h"
 
 int sysctl_memory_failure_early_kill __read_mostly = 0;
@@ -837,6 +838,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	int ret;
 	int i;
 	int kill = 1;
+	struct page *hpage = compound_head(p);
 
 	if (PageReserved(p) || PageSlab(p))
 		return SWAP_SUCCESS;
@@ -845,10 +847,10 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * This check implies we don't kill processes if their pages
 	 * are in the swap cache early. Those are always late kills.
 	 */
-	if (!page_mapped(p))
+	if (!page_mapped(hpage))
 		return SWAP_SUCCESS;
 
-	if (PageCompound(p) || PageKsm(p))
+	if (PageKsm(p))
 		return SWAP_FAIL;
 
 	if (PageSwapCache(p)) {
@@ -863,10 +865,11 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * XXX: the dirty test could be racy: set_page_dirty() may not always
 	 * be called inside page lock (it's recommended but not enforced).
 	 */
-	mapping = page_mapping(p);
-	if (!PageDirty(p) && mapping && mapping_cap_writeback_dirty(mapping)) {
-		if (page_mkclean(p)) {
-			SetPageDirty(p);
+	mapping = page_mapping(hpage);
+	if (!PageDirty(hpage) && mapping &&
+	    mapping_cap_writeback_dirty(mapping)) {
+		if (page_mkclean(hpage)) {
+			SetPageDirty(hpage);
 		} else {
 			kill = 0;
 			ttu |= TTU_IGNORE_HWPOISON;
@@ -885,14 +888,14 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * there's nothing that can be done.
 	 */
 	if (kill)
-		collect_procs(p, &tokill);
+		collect_procs(hpage, &tokill);
 
 	/*
 	 * try_to_unmap can fail temporarily due to races.
 	 * Try a few times (RED-PEN better strategy?)
 	 */
 	for (i = 0; i < N_UNMAP_TRIES; i++) {
-		ret = try_to_unmap(p, ttu);
+		ret = try_to_unmap(hpage, ttu);
 		if (ret == SWAP_SUCCESS)
 			break;
 		pr_debug("MCE %#lx: try_to_unmap retry needed %d\n", pfn,  ret);
@@ -900,7 +903,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 
 	if (ret != SWAP_SUCCESS)
 		printk(KERN_ERR "MCE %#lx: failed to unmap page (mapcount=%d)\n",
-				pfn, page_mapcount(p));
+				pfn, page_mapcount(hpage));
 
 	/*
 	 * Now that the dirty bit has been propagated to the
@@ -911,7 +914,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * use a more force-full uncatchable kill to prevent
 	 * any accesses to the poisoned memory.
 	 */
-	kill_procs_ao(&tokill, !!PageDirty(p), trapno,
+	kill_procs_ao(&tokill, !!PageDirty(hpage), trapno,
 		      ret != SWAP_SUCCESS, pfn);
 
 	return ret;
@@ -921,6 +924,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 {
 	struct page_state *ps;
 	struct page *p;
+	struct page *hpage;
 	int res;
 
 	if (!sysctl_memory_failure_recovery)
@@ -934,6 +938,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	}
 
 	p = pfn_to_page(pfn);
+	hpage = compound_head(p);
 	if (TestSetPageHWPoison(p)) {
 		printk(KERN_ERR "MCE %#lx: already hardware poisoned\n", pfn);
 		return 0;
@@ -953,7 +958,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	 * that may make page_freeze_refs()/page_unfreeze_refs() mismatch.
 	 */
 	if (!(flags & MF_COUNT_INCREASED) &&
-		!get_page_unless_zero(compound_head(p))) {
+		!get_page_unless_zero(hpage)) {
 		if (is_free_buddy_page(p)) {
 			action_result(pfn, "free buddy", DELAYED);
 			return 0;
@@ -971,9 +976,9 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	 * The check (unnecessarily) ignores LRU pages being isolated and
 	 * walked by the page reclaim code, however that's not a big loss.
 	 */
-	if (!PageLRU(p))
+	if (!PageLRU(p) && !PageHuge(p))
 		shake_page(p, 0);
-	if (!PageLRU(p)) {
+	if (!PageLRU(p) && !PageHuge(p)) {
 		/*
 		 * shake_page could have turned it free.
 		 */
@@ -991,7 +996,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	 * It's very difficult to mess with pages currently under IO
 	 * and in many cases impossible, so we just avoid it here.
 	 */
-	lock_page_nosync(p);
+	lock_page_nosync(hpage);
 
 	/*
 	 * unpoison always clear PG_hwpoison inside page lock
@@ -1004,8 +1009,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			atomic_long_dec(&mce_bad_pages);
-		unlock_page(p);
-		put_page(p);
+		unlock_page(hpage);
+		put_page(hpage);
 		return 0;
 	}
 
@@ -1038,7 +1043,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 		}
 	}
 out:
-	unlock_page(p);
+	unlock_page(hpage);
 	return res;
 }
 EXPORT_SYMBOL_GPL(__memory_failure);
-- 
1.7.0


* [PATCH 4/8] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2010-05-28  0:29 ` [PATCH 3/8] HWPOISON, hugetlb: enable error handling path for hugepage Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 5/8] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error Naoya Horiguchi
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

To avoid a race condition between concurrent memory errors on the same
hugepage, we atomically test and set the PG_hwpoison bit on the head
page. All pages in an error hugepage are considered hwpoisoned for now,
so we set and clear all of the PG_hwpoison bits in the hugepage with
the page lock of the head page held.
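
A minimal userspace model of the containment scheme (not kernel code):
the race between concurrent errors is decided by one atomic
test-and-set on the head page's flag, after which the tail flags can be
set non-atomically because the head page lock is held.

#include <stdatomic.h>
#include <stdio.h>

#define NR_PAGES 512	/* order-9 (2MB) hugepage */

static atomic_flag head_hwpoison = ATOMIC_FLAG_INIT;
static int tail_hwpoison[NR_PAGES];

static int poison_hugepage(void)
{
	/* like TestSetPageHWPoison() on the head page */
	if (atomic_flag_test_and_set(&head_hwpoison))
		return 0;	/* already poisoned; the second error is ignored */
	for (int i = 1; i < NR_PAGES; i++)	/* tails, under the head lock */
		tail_hwpoison[i] = 1;
	return 1;
}

int main(void)
{
	printf("first error handled: %d\n", poison_hugepage());	/* 1 */
	printf("second error handled: %d\n", poison_hugepage());	/* 0 */
	return 0;
}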

Dependency:
  "HWPOISON, hugetlb: enable error handling path for hugepage"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/memory-failure.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git v2.6.34/mm/memory-failure.c v2.6.34/mm/memory-failure.c
index 1ec68c8..fee648b 100644
--- v2.6.34/mm/memory-failure.c
+++ v2.6.34/mm/memory-failure.c
@@ -920,6 +920,22 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	return ret;
 }
 
+static void set_page_hwpoison_huge_page(struct page *hpage)
+{
+	int i;
+	int nr_pages = 1 << compound_order(hpage);
+	for (i = 0; i < nr_pages; i++)
+		SetPageHWPoison(hpage + i);
+}
+
+static void clear_page_hwpoison_huge_page(struct page *hpage)
+{
+	int i;
+	int nr_pages = 1 << compound_order(hpage);
+	for (i = 0; i < nr_pages; i++)
+		ClearPageHWPoison(hpage + i);
+}
+
 int __memory_failure(unsigned long pfn, int trapno, int flags)
 {
 	struct page_state *ps;
@@ -1014,6 +1030,26 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 		return 0;
 	}
 
+	/*
+	 * For an error on a tail page, we should set PG_hwpoison
+	 * on the head page to show that the hugepage is hwpoisoned.
+	 */
+	if (PageTail(p) && TestSetPageHWPoison(hpage)) {
+		action_result(pfn, "hugepage already hardware poisoned",
+				IGNORED);
+		unlock_page(hpage);
+		put_page(hpage);
+		return 0;
+	}
+	/*
+	 * Set PG_hwpoison on all pages in an error hugepage,
+	 * because containment is done in hugepage unit for now.
+	 * Since we have done TestSetPageHWPoison() for the head page with
+	 * page lock held, we can safely set PG_hwpoison bits on tail pages.
+	 */
+	if (PageHuge(p))
+		set_page_hwpoison_huge_page(hpage);
+
 	wait_on_page_writeback(p);
 
 	/*
@@ -1118,6 +1154,8 @@ int unpoison_memory(unsigned long pfn)
 		atomic_long_dec(&mce_bad_pages);
 		freeit = 1;
 	}
+	if (PageHuge(p))
+		clear_page_hwpoison_huge_page(page);
 	unlock_page(page);
 
 	put_page(page);
-- 
1.7.0


* [PATCH 5/8] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
                   ` (3 preceding siblings ...)
  2010-05-28  0:29 ` [PATCH 4/8] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 6/8] HWPOISON, hugetlb: isolate corrupted hugepage Naoya Horiguchi
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

For now, all pages in an error hugepage are considered hwpoisoned,
so count all of them in mce_bad_pages.
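
Concretely, one error on an order-9 (2MB) hugepage moves the counter by
1 << compound_order(hpage) = 512 base pages, and unpoison subtracts the
same amount. A tiny standalone model of the bookkeeping (not kernel
code):

#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
	atomic_long mce_bad_pages;
	unsigned int order = 9;		/* 2MB hugepage, 4KB base pages */
	long nr_pages = 1L << order;	/* 512 */

	atomic_init(&mce_bad_pages, 0);
	atomic_fetch_add(&mce_bad_pages, nr_pages);	/* __memory_failure() */
	atomic_fetch_sub(&mce_bad_pages, nr_pages);	/* unpoison_memory() */
	printf("balance: %ld\n", atomic_load(&mce_bad_pages));	/* 0 */
	return 0;
}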

Dependency:
  "HWPOISON, hugetlb: enable error handling path for hugepage"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/memory-failure.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git v2.6.34/mm/memory-failure.c v2.6.34/mm/memory-failure.c
index fee648b..473f15a 100644
--- v2.6.34/mm/memory-failure.c
+++ v2.6.34/mm/memory-failure.c
@@ -942,6 +942,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	struct page *p;
 	struct page *hpage;
 	int res;
+	unsigned int nr_pages;
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure from trap %d on page %lx", trapno, pfn);
@@ -960,7 +961,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 		return 0;
 	}
 
-	atomic_long_add(1, &mce_bad_pages);
+	nr_pages = 1 << compound_order(hpage);
+	atomic_long_add(nr_pages, &mce_bad_pages);
 
 	/*
 	 * We need/can do nothing about count=0 pages.
@@ -1024,7 +1026,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
-			atomic_long_dec(&mce_bad_pages);
+			atomic_long_sub(nr_pages, &mce_bad_pages);
 		unlock_page(hpage);
 		put_page(hpage);
 		return 0;
@@ -1123,6 +1125,7 @@ int unpoison_memory(unsigned long pfn)
 	struct page *page;
 	struct page *p;
 	int freeit = 0;
+	unsigned int nr_pages;
 
 	if (!pfn_valid(pfn))
 		return -ENXIO;
@@ -1135,9 +1138,11 @@ int unpoison_memory(unsigned long pfn)
 		return 0;
 	}
 
+	nr_pages = 1 << compound_order(page);
+
 	if (!get_page_unless_zero(page)) {
 		if (TestClearPageHWPoison(p))
-			atomic_long_dec(&mce_bad_pages);
+			atomic_long_sub(nr_pages, &mce_bad_pages);
 		pr_debug("MCE: Software-unpoisoned free page %#lx\n", pfn);
 		return 0;
 	}
@@ -1149,9 +1154,9 @@ int unpoison_memory(unsigned long pfn)
 	 * the PG_hwpoison page will be caught and isolated on the entrance to
 	 * the free buddy page pool.
 	 */
-	if (TestClearPageHWPoison(p)) {
+	if (TestClearPageHWPoison(page)) {
 		pr_debug("MCE: Software-unpoisoned page %#lx\n", pfn);
-		atomic_long_dec(&mce_bad_pages);
+		atomic_long_sub(nr_pages, &mce_bad_pages);
 		freeit = 1;
 	}
 	if (PageHuge(p))
-- 
1.7.0


* [PATCH 6/8] HWPOISON, hugetlb: isolate corrupted hugepage
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
                   ` (4 preceding siblings ...)
  2010-05-28  0:29 ` [PATCH 5/8] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 7/8] HWPOISON, hugetlb: detect hwpoison in hugetlb code Naoya Horiguchi
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

If the error hugepage is not in use, we can fully recover from the
error by dequeuing it from the free list, so return RECOVERED.
Otherwise, whether we can recover depends on the user processes,
so return DELAYED.
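
A compact userspace model of the me_huge_page() decision (not kernel
code): a hugepage with neither a page_mapping() nor PageAnon() set is
free or reserved and can be dequeued; anything else stays with the user
processes.

#include <stdbool.h>
#include <stdio.h>

enum outcome { RECOVERED, DELAYED };

struct hugepage { void *mapping; bool anon; };

static enum outcome me_huge_page(const struct hugepage *hp)
{
	if (!(hp->mapping || hp->anon))
		return RECOVERED;	/* free/reserved: dequeue from the free list */
	return DELAYED;			/* in use: recovery depends on user processes */
}

int main(void)
{
	struct hugepage free_hpage   = { NULL, false };
	struct hugepage mapped_hpage = { (void *)1, false };

	printf("free: %d, mapped: %d\n",
	       me_huge_page(&free_hpage), me_huge_page(&mapped_hpage));	/* 0, 1 */
	return 0;
}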

Dependency:
  "HWPOISON, hugetlb: enable error handling path for hugepage"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/hugetlb.h |    2 ++
 mm/hugetlb.c            |   16 ++++++++++++++++
 mm/memory-failure.c     |   28 ++++++++++++++++++++--------
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git v2.6.34/include/linux/hugetlb.h v2.6.34/include/linux/hugetlb.h
index e688fd8..f479700 100644
--- v2.6.34/include/linux/hugetlb.h
+++ v2.6.34/include/linux/hugetlb.h
@@ -43,6 +43,7 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
 						int acctflags);
 void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
+void __isolate_hwpoisoned_huge_page(struct page *page);
 
 extern unsigned long hugepages_treat_as_movable;
 extern const unsigned long hugetlb_zero, hugetlb_infinity;
@@ -100,6 +101,7 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define huge_pte_offset(mm, address)	0
+#define __isolate_hwpoisoned_huge_page(page)	0
 
 #define hugetlb_change_protection(vma, address, end, newprot)
 
diff --git v2.6.34/mm/hugetlb.c v2.6.34/mm/hugetlb.c
index b1aa0d8..aaba3cc 100644
--- v2.6.34/mm/hugetlb.c
+++ v2.6.34/mm/hugetlb.c
@@ -2821,3 +2821,19 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	hugetlb_put_quota(inode->i_mapping, (chg - freed));
 	hugetlb_acct_memory(h, -(chg - freed));
 }
+
+/*
+ * This function is called from memory failure code.
+ * The caller is assumed to hold the page lock of the head page.
+ */
+void __isolate_hwpoisoned_huge_page(struct page *hpage)
+{
+	struct hstate *h = page_hstate(hpage);
+	int nid = page_to_nid(hpage);
+
+	spin_lock(&hugetlb_lock);
+	list_del(&hpage->lru);
+	h->free_huge_pages--;
+	h->free_huge_pages_node[nid]--;
+	spin_unlock(&hugetlb_lock);
+}
diff --git v2.6.34/mm/memory-failure.c v2.6.34/mm/memory-failure.c
index 473f15a..d0b420a 100644
--- v2.6.34/mm/memory-failure.c
+++ v2.6.34/mm/memory-failure.c
@@ -690,17 +690,29 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
 /*
  * Huge pages. Needs work.
  * Issues:
- * No rmap support so we cannot find the original mapper. In theory could walk
- * all MMs and look for the mappings, but that would be non atomic and racy.
- * Need rmap for hugepages for this. Alternatively we could employ a heuristic,
- * like just walking the current process and hoping it has it mapped (that
- * should be usually true for the common "shared database cache" case)
- * Should handle free huge pages and dequeue them too, but this needs to
- * handle huge page accounting correctly.
+ * - Error on hugepage is contained in hugepage unit (not in raw page unit.)
+ *   To narrow down kill region to one page, we need to break up pmd.
+ * - To support soft-offlining for hugepage, we need to support hugepage
+ *   migration.
  */
 static int me_huge_page(struct page *p, unsigned long pfn)
 {
-	return FAILED;
+	struct page *hpage = compound_head(p);
+	/*
+	 * We can safely recover from error on free or reserved (i.e.
+	 * not in-use) hugepage by dequeuing it from freelist.
+	 * To check whether a hugepage is in-use or not, we can't use
+	 * page->lru because it can be used in other hugepage operations,
+	 * such as __unmap_hugepage_range() and gather_surplus_pages().
+	 * So instead we use page_mapping() and PageAnon().
+	 * We assume that this function is called with page lock held,
+	 * so there is no race between isolation and mapping/unmapping.
+	 */
+	if (!(page_mapping(hpage) || PageAnon(hpage))) {
+		__isolate_hwpoisoned_huge_page(hpage);
+		return RECOVERED;
+	}
+	return DELAYED;
 }
 
 /*
-- 
1.7.0


* [PATCH 7/8] HWPOISON, hugetlb: detect hwpoison in hugetlb code
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
                   ` (5 preceding siblings ...)
  2010-05-28  0:29 ` [PATCH 6/8] HWPOISON, hugetlb: isolate corrupted hugepage Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-28  0:29 ` [PATCH 8/8] HWPOISON, hugetlb: support hwpoison injection for hugepage Naoya Horiguchi
  2010-05-31  9:30 ` [PATCH 0/8] HWPOISON for hugepage (v6) Andi Kleen
  8 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

This patch blocks access to a hwpoisoned hugepage and also skips
unmapping it, since the error handler has already unmapped it and
dropped its reference.
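
A userspace model of the is_hugetlb_entry_hwpoisoned() test added below
(not kernel code; the entry encoding here is invented for illustration):
a pte that is neither none nor present holds a software swap entry, and
one tagged as hwpoison marks the hugepage as already handled.

#include <stdio.h>

enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP_MIGRATION, PTE_SWAP_HWPOISON };

static int is_hugetlb_entry_hwpoisoned(enum pte_kind pte)
{
	if (pte == PTE_NONE || pte == PTE_PRESENT)
		return 0;	/* a normal entry cannot be a hwpoison marker */
	return pte == PTE_SWAP_HWPOISON;
}

int main(void)
{
	printf("present: %d, hwpoison: %d\n",
	       is_hugetlb_entry_hwpoisoned(PTE_PRESENT),
	       is_hugetlb_entry_hwpoisoned(PTE_SWAP_HWPOISON));	/* 0, 1 */
	return 0;
}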

Dependency:
  "HWPOISON, hugetlb: enable error handling path for hugepage"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/hugetlb.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git v2.6.34/mm/hugetlb.c v2.6.34/mm/hugetlb.c
index aaba3cc..5580568 100644
--- v2.6.34/mm/hugetlb.c
+++ v2.6.34/mm/hugetlb.c
@@ -19,6 +19,8 @@
 #include <linux/sysfs.h>
 #include <linux/slab.h>
 #include <linux/rmap.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -2145,6 +2147,19 @@ nomem:
 	return -ENOMEM;
 }
 
+static int is_hugetlb_entry_hwpoisoned(pte_t pte)
+{
+	swp_entry_t swp;
+
+	if (huge_pte_none(pte) || pte_present(pte))
+		return 0;
+	swp = pte_to_swp_entry(pte);
+	if (non_swap_entry(swp) && is_hwpoison_entry(swp)) {
+		return 1;
+	} else
+		return 0;
+}
+
 void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end, struct page *ref_page)
 {
@@ -2203,6 +2218,12 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		if (huge_pte_none(pte))
 			continue;
 
+		/*
+		 * A HWPoisoned hugepage is already unmapped and its reference dropped
+		 */
+		if (unlikely(is_hugetlb_entry_hwpoisoned(pte)))
+			continue;
+
 		page = pte_page(pte);
 		if (pte_dirty(pte))
 			set_page_dirty(page);
@@ -2487,6 +2508,18 @@ retry:
 	}
 
 	/*
+	 * Since the memory error handler replaces the pte with a hwpoison
+	 * swap entry at error handling time, a process which has reserved
+	 * but not yet mapped the error hugepage carries no hwpoison swap
+	 * entry. So we need to block accesses from such a process by
+	 * checking the PG_hwpoison bit here.
+	 */
+	if (unlikely(PageHWPoison(page))) {
+		ret = VM_FAULT_HWPOISON;
+		goto backout_unlocked;
+	}
+
+	/*
 	 * If we are going to COW a private mapping later, we examine the
 	 * pending reservations for this page now. This will ensure that
 	 * any allocations necessary to record that reservation occur outside
@@ -2540,6 +2573,13 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	static DEFINE_MUTEX(hugetlb_instantiation_mutex);
 	struct hstate *h = hstate_vma(vma);
 
+	ptep = huge_pte_offset(mm, address);
+	if (ptep) {
+		entry = huge_ptep_get(ptep);
+		if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
+			return VM_FAULT_HWPOISON;
+	}
+
 	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
 	if (!ptep)
 		return VM_FAULT_OOM;
-- 
1.7.0


* [PATCH 8/8] HWPOISON, hugetlb: support hwpoison injection for hugepage
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
                   ` (6 preceding siblings ...)
  2010-05-28  0:29 ` [PATCH 7/8] HWPOISON, hugetlb: detect hwpoison in hugetlb code Naoya Horiguchi
@ 2010-05-28  0:29 ` Naoya Horiguchi
  2010-05-31  9:30 ` [PATCH 0/8] HWPOISON for hugepage (v6) Andi Kleen
  8 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-28  0:29 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman

This patch enables hwpoison injection through the debugfs hwpoison
interface, with which we can test memory error handling for free or
reserved hugepages (which cannot be tested by the madvise() injector).
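
For reference, assuming debugfs is mounted at /sys/kernel/debug and
CONFIG_HWPOISON_INJECT is enabled, typical usage of the injector is
along these lines:

  echo $pfn > /sys/kernel/debug/hwpoison/corrupt-pfn    # inject an error
  echo $pfn > /sys/kernel/debug/hwpoison/unpoison-pfn   # undo, for testing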

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/hwpoison-inject.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git v2.6.34/mm/hwpoison-inject.c v2.6.34/mm/hwpoison-inject.c
index 10ea719..0948f10 100644
--- v2.6.34/mm/hwpoison-inject.c
+++ v2.6.34/mm/hwpoison-inject.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/pagemap.h>
+#include <linux/hugetlb.h>
 #include "internal.h"
 
 static struct dentry *hwpoison_dir;
@@ -13,6 +14,7 @@ static int hwpoison_inject(void *data, u64 val)
 {
 	unsigned long pfn = val;
 	struct page *p;
+	struct page *hpage;
 	int err;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -24,18 +26,19 @@ static int hwpoison_inject(void *data, u64 val)
 		return -ENXIO;
 
 	p = pfn_to_page(pfn);
+	hpage = compound_head(p);
 	/*
 	 * This implies unable to support free buddy pages.
 	 */
-	if (!get_page_unless_zero(p))
+	if (!get_page_unless_zero(hpage))
 		return 0;
 
-	if (!PageLRU(p))
+	if (!PageLRU(p) && !PageHuge(p))
 		shake_page(p, 0);
 	/*
 	 * This implies unable to support non-LRU pages.
 	 */
-	if (!PageLRU(p))
+	if (!PageLRU(p) && !PageHuge(p))
 		return 0;
 
 	/*
@@ -44,9 +47,9 @@ static int hwpoison_inject(void *data, u64 val)
 	 * We temporarily take page lock for try_get_mem_cgroup_from_page().
 	 * __memory_failure() will redo the check reliably inside page lock.
 	 */
-	lock_page(p);
-	err = hwpoison_filter(p);
-	unlock_page(p);
+	lock_page(hpage);
+	err = hwpoison_filter(hpage);
+	unlock_page(hpage);
 	if (err)
 		return 0;
 
-- 
1.7.0


* Re: [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugetlb_inline.h
  2010-05-28  0:29 ` [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugetlb_inline.h Naoya Horiguchi
@ 2010-05-28 10:03   ` Mel Gorman
  2010-08-10 19:53     ` Wu Fengguang
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2010-05-28 10:03 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang

On Fri, May 28, 2010 at 09:29:15AM +0900, Naoya Horiguchi wrote:
> is_vm_hugetlb_page() is a widely used inline function for inserting
> hooks into hugetlb code, but it cannot be used in pagemap.h because of
> a circular dependency between the header files. This patch removes
> that limitation.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  include/linux/hugetlb.h        |   11 +----------
>  include/linux/hugetlb_inline.h |   22 ++++++++++++++++++++++
>  include/linux/pagemap.h        |    1 +
>  3 files changed, 24 insertions(+), 10 deletions(-)
>  create mode 100644 include/linux/hugetlb_inline.h
> 
> diff --git v2.6.34/include/linux/hugetlb.h v2.6.34/include/linux/hugetlb.h
> index 78b4bc6..d47a7c4 100644
> --- v2.6.34/include/linux/hugetlb.h
> +++ v2.6.34/include/linux/hugetlb.h
> @@ -2,6 +2,7 @@
>  #define _LINUX_HUGETLB_H
>  
>  #include <linux/fs.h>
> +#include <linux/hugetlb_inline.h>
>  
>  struct ctl_table;
>  struct user_struct;
> @@ -14,11 +15,6 @@ struct user_struct;
>  
>  int PageHuge(struct page *page);
>  
> -static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
> -{
> -	return vma->vm_flags & VM_HUGETLB;
> -}
> -
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
>  int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> @@ -77,11 +73,6 @@ static inline int PageHuge(struct page *page)
>  	return 0;
>  }
>  
> -static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
> -{
> -	return 0;
> -}
> -
>  static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
>  {
>  }
> diff --git v2.6.34/include/linux/hugetlb_inline.h v2.6.34/include/linux/hugetlb_inline.h
> new file mode 100644
> index 0000000..cf00b6d
> --- /dev/null
> +++ v2.6.34/include/linux/hugetlb_inline.h
> @@ -0,0 +1,22 @@
> +#ifndef _LINUX_HUGETLB_INLINE_H
> +#define _LINUX_HUGETLB_INLINE_H 1
> +

Just #define __LINUX_HUGETLB_INLINE_H is fine. No need for the 1

> +#ifdef CONFIG_HUGETLBFS
> +

Should be CONFIG_HUGETLB_PAGE

With those corrections:

Acked-by: Mel Gorman <mel@csn.ul.ie>

> +#include <linux/mm.h>
> +
> +static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
> +{
> +	return vma->vm_flags & VM_HUGETLB;
> +}
> +
> +#else
> +
> +static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
> +{
> +	return 0;
> +}
> +
> +#endif
> +
> +#endif
> diff --git v2.6.34/include/linux/pagemap.h v2.6.34/include/linux/pagemap.h
> index 3c62ed4..b2bd2ba 100644
> --- v2.6.34/include/linux/pagemap.h
> +++ v2.6.34/include/linux/pagemap.h
> @@ -13,6 +13,7 @@
>  #include <linux/gfp.h>
>  #include <linux/bitops.h>
>  #include <linux/hardirq.h> /* for in_interrupt() */
> +#include <linux/hugetlb_inline.h>
>  
>  /*
>   * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
> -- 
> 1.7.0
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage
  2010-05-28  0:29 ` [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage Naoya Horiguchi
@ 2010-05-28 14:48   ` Mel Gorman
  2010-08-10 23:23     ` Wu Fengguang
  2010-06-02 18:16   ` Andrew Morton
  1 sibling, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2010-05-28 14:48 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang,
	Andrea Arcangeli, Larry Woodman, Lee Schermerhorn

On Fri, May 28, 2010 at 09:29:16AM +0900, Naoya Horiguchi wrote:
> This patch adds a reverse mapping feature for hugepages by introducing
> a mapcount for shared/privately-mapped hugepages and an anon_vma for
> privately-mapped hugepages.
> 
> While hugepages are not currently swappable, reverse mapping is still
> useful for the memory error handler.
> 
> Without this patch, the memory error handler can neither identify the
> processes using a bad hugepage nor unmap it from them. That is:
> - for a shared hugepage:
>   we can collect the processes using the hugepage through the pagecache,
>   but cannot unmap the hugepage because of the lack of a mapcount.
> - for a privately mapped hugepage:
>   we can neither collect the processes nor unmap the hugepage.
> This patch solves these problems.
> 
> This patch includes the bug fix made by commit 23be7468e8, so that
> commit is reverted here.
> 
> Dependency:
>   "hugetlb: move definition of is_vm_hugetlb_page() to hugetlb_inline.h"
> 
> ChangeLog since May 24.
> - create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
> - move functions setting up anon_vma for hugepage into mm/rmap.c.
> 
> ChangeLog since May 13.
> - rebased to 2.6.34
> - fix logic error (in case that private mapping and shared mapping coexist)
> - move is_vm_hugetlb_page() into include/linux/mm.h to use this function
>   from linear_page_index()
> - define and use linear_hugepage_index() instead of compound_order()
> - use page_move_anon_rmap() in hugetlb_cow()
> - copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
> - revert commit 23be7468 completely
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Larry Woodman <lwoodman@redhat.com>
> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Ok, I could find no other problems with the hugetlb side of things in the
first two patches. I haven't looked at the hwpoison parts but I'm assuming
Andi has looked at those already. Thanks

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 0/8] HWPOISON for hugepage (v6)
  2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
                   ` (7 preceding siblings ...)
  2010-05-28  0:29 ` [PATCH 8/8] HWPOISON, hugetlb: support hwpoison injection for hugepage Naoya Horiguchi
@ 2010-05-31  9:30 ` Andi Kleen
  2010-05-31 10:17   ` Naoya Horiguchi
  8 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2010-05-31  9:30 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton, Wu Fengguang,
	Mel Gorman

On Fri, May 28, 2010 at 09:29:14AM +0900, Naoya Horiguchi wrote:
> Hi,
> 
> Here is the "HWPOISON for hugepage" patchset, which reflects
> Mel's comments on the hugepage rmapping code.
> Only patches 1/8 and 2/8 have changed since the previous post.
> 
> Mel, could you please resume reviewing and testing?

Thanks everyone, I merged this patch series in the hwpoison
tree now, aimed for 2.6.36. It should appear in linux-next
shortly.

Question is how to proceed now: the next steps would
be early kill support and soft offline/migration support for
hugetlb too. Horiguchi-san, is this something you're interested
in working on?

-Andi


* Re: [PATCH 0/8] HWPOISON for hugepage (v6)
  2010-05-31  9:30 ` [PATCH 0/8] HWPOISON for hugepage (v6) Andi Kleen
@ 2010-05-31 10:17   ` Naoya Horiguchi
  0 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-05-31 10:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-mm, linux-kernel, Andrew Morton, Wu Fengguang, Mel Gorman

On Mon, May 31, 2010 at 11:30:09AM +0200, Andi Kleen wrote:
> On Fri, May 28, 2010 at 09:29:14AM +0900, Naoya Horiguchi wrote:
> > Hi,
> > 
> > Here is the "HWPOISON for hugepage" patchset, which reflects
> > Mel's comments on the hugepage rmapping code.
> > Only patches 1/8 and 2/8 have changed since the previous post.
> > 
> > Mel, could you please resume reviewing and testing?
> 
> Thanks everyone, I merged this patch series in the hwpoison
> tree now, aimed for 2.6.36. It should appear in linux-next
> shortly.

Thank you.

> Question is how to proceed now: the next steps would
> be early kill support

Early kill for hugetlb already works with this patchset, doesn't it?
Or do you mean something else?

> and soft offline/migration support for
> hugetlb too. Horiguchi-san, is this something you're interested
> in working on?

Yes, it is.
I'll do it with pleasure :)

Thanks,
Naoya Horiguchi


* Re: [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage
  2010-05-28  0:29 ` [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage Naoya Horiguchi
  2010-05-28 14:48   ` Mel Gorman
@ 2010-06-02 18:16   ` Andrew Morton
  2010-06-03  1:38     ` [PATCH] replace ifdef CONFIG_HUGETLBFS with ifdef CONFIG_HUGETLB_PAGE (Re: [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage) Naoya Horiguchi
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-06-02 18:16 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Andi Kleen, Wu Fengguang, Mel Gorman,
	Andrea Arcangeli, Larry Woodman, Lee Schermerhorn

On Fri, 28 May 2010 09:29:16 +0900
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> +#ifdef CONFIG_HUGETLBFS
> +/*
> + * The following three functions are for anonymous (private mapped) hugepages.
> + * Unlike common anonymous pages, anonymous hugepages have no accounting code
> + * and no lru code, because we handle hugepages differently from common pages.
> + */
> +static void __hugepage_set_anon_rmap(struct page *page,
> +	struct vm_area_struct *vma, unsigned long address, int exclusive)
> +{
> +	struct anon_vma *anon_vma = vma->anon_vma;
> +	BUG_ON(!anon_vma);
> +	if (!exclusive) {
> +		struct anon_vma_chain *avc;
> +		avc = list_entry(vma->anon_vma_chain.prev,
> +				 struct anon_vma_chain, same_vma);
> +		anon_vma = avc->anon_vma;
> +	}
> +	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> +	page->mapping = (struct address_space *) anon_vma;
> +	page->index = linear_page_index(vma, address);
> +}
> +
> +void hugepage_add_anon_rmap(struct page *page,
> +			    struct vm_area_struct *vma, unsigned long address)
> +{
> +	struct anon_vma *anon_vma = vma->anon_vma;
> +	int first;
> +	BUG_ON(!anon_vma);
> +	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> +	first = atomic_inc_and_test(&page->_mapcount);
> +	if (first)
> +		__hugepage_set_anon_rmap(page, vma, address, 0);
> +}
> +
> +void hugepage_add_new_anon_rmap(struct page *page,
> +			struct vm_area_struct *vma, unsigned long address)
> +{
> +	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> +	atomic_set(&page->_mapcount, 0);
> +	__hugepage_set_anon_rmap(page, vma, address, 1);
> +}
> +#endif /* CONFIG_HUGETLBFS */

This code still makes sense if CONFIG_HUGETLBFS=n, I think?  Should it
instead depend on CONFIG_HUGETLB_PAGE?

I have a feeling we run into that confusion relatively often.  Perhaps
CONFIG_HUGETLB_PAGE=y && CONFIG_HUGETLBFS=n makes no sense and we
should unify them...


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] replace ifdef CONFIG_HUGETLBFS with ifdef CONFIG_HUGETLB_PAGE (Re: [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage)
  2010-06-02 18:16   ` Andrew Morton
@ 2010-06-03  1:38     ` Naoya Horiguchi
  0 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2010-06-03  1:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Andi Kleen, Wu Fengguang, Mel Gorman,
	Andrea Arcangeli, Larry Woodman, Lee Schermerhorn

On Wed, Jun 02, 2010 at 11:16:17AM -0700, Andrew Morton wrote:
> On Fri, 28 May 2010 09:29:16 +0900
> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > +#ifdef CONFIG_HUGETLBFS
> > +/*
> > + * The following three functions are for anonymous (private mapped) hugepages.
> > + * Unlike common anonymous pages, anonymous hugepages have no accounting code
> > + * and no lru code, because we handle hugepages differently from common pages.
> > + */
> > +static void __hugepage_set_anon_rmap(struct page *page,
> > +	struct vm_area_struct *vma, unsigned long address, int exclusive)
> > +{
> > +	struct anon_vma *anon_vma = vma->anon_vma;
> > +	BUG_ON(!anon_vma);
> > +	if (!exclusive) {
> > +		struct anon_vma_chain *avc;
> > +		avc = list_entry(vma->anon_vma_chain.prev,
> > +				 struct anon_vma_chain, same_vma);
> > +		anon_vma = avc->anon_vma;
> > +	}
> > +	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> > +	page->mapping = (struct address_space *) anon_vma;
> > +	page->index = linear_page_index(vma, address);
> > +}
> > +
> > +void hugepage_add_anon_rmap(struct page *page,
> > +			    struct vm_area_struct *vma, unsigned long address)
> > +{
> > +	struct anon_vma *anon_vma = vma->anon_vma;
> > +	int first;
> > +	BUG_ON(!anon_vma);
> > +	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> > +	first = atomic_inc_and_test(&page->_mapcount);
> > +	if (first)
> > +		__hugepage_set_anon_rmap(page, vma, address, 0);
> > +}
> > +
> > +void hugepage_add_new_anon_rmap(struct page *page,
> > +			struct vm_area_struct *vma, unsigned long address)
> > +{
> > +	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> > +	atomic_set(&page->_mapcount, 0);
> > +	__hugepage_set_anon_rmap(page, vma, address, 1);
> > +}
> > +#endif /* CONFIG_HUGETLBFS */
> 
> This code still makes sense if CONFIG_HUGETLBFS=n, I think?  Should it
> instead depend on CONFIG_HUGETLB_PAGE?

Yes.
CONFIG_HUGETLBFS controls the hugetlbfs filesystem interface code.
OTOH, CONFIG_HUGETLB_PAGE controls the hugepage management code.
So we should use CONFIG_HUGETLB_PAGE here.
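
For reference, fs/Kconfig tied the two symbols together at the time
roughly as follows; HUGETLB_PAGE is simply derived from HUGETLBFS, which
is why the =y/=n split discussed below cannot actually occur today:

config HUGETLBFS
	bool "HugeTLB file system support"

config HUGETLB_PAGE
	def_bool HUGETLBFS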

I attached a fix patch below. It also includes another fix in
include/linux/hugetlb_inline.h (pointed out by Mel Gorman).

Andi-san, could you add this patch on top of your tree?

> I have a feeling we run into that confusion relatively often.  Perhaps
> CONFIG_HUGETLB_PAGE=y && CONFIG_HUGETLBFS=n makes no sense and we
> should unify them...

Agreed.

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date: Thu, 3 Jun 2010 10:32:08 +0900
Subject: [PATCH] replace ifdef CONFIG_HUGETLBFS with ifdef CONFIG_HUGETLB_PAGE

CONFIG_HUGETLBFS controls the hugetlbfs filesystem interface code.
OTOH, CONFIG_HUGETLB_PAGE controls the hugepage management code.
So we should use CONFIG_HUGETLB_PAGE here.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb_inline.h |    4 ++--
 mm/rmap.c                      |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
index cf00b6d..6931489 100644
--- a/include/linux/hugetlb_inline.h
+++ b/include/linux/hugetlb_inline.h
@@ -1,7 +1,7 @@
 #ifndef _LINUX_HUGETLB_INLINE_H
-#define _LINUX_HUGETLB_INLINE_H 1
+#define _LINUX_HUGETLB_INLINE_H
 
-#ifdef CONFIG_HUGETLBFS
+#ifdef CONFIG_HUGETLB_PAGE
 
 #include <linux/mm.h>
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 5278371..f7114c6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1436,7 +1436,7 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 }
 #endif /* CONFIG_MIGRATION */
 
-#ifdef CONFIG_HUGETLBFS
+#ifdef CONFIG_HUGETLB_PAGE
 /*
  * The following three functions are for anonymous (private mapped) hugepages.
  * Unlike common anonymous pages, anonymous hugepages have no accounting code
@@ -1477,4 +1477,4 @@ void hugepage_add_new_anon_rmap(struct page *page,
 	atomic_set(&page->_mapcount, 0);
 	__hugepage_set_anon_rmap(page, vma, address, 1);
 }
-#endif /* CONFIG_HUGETLBFS */
+#endif /* CONFIG_HUGETLB_PAGE */
-- 
1.7.0.1
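
As orientation for where the two exported helpers touched above are
meant to be called, here is a hedged sketch simplified from the
fault-path call sites this series adds in mm/hugetlb.c;
map_anon_hugepage() itself is an invented wrapper for illustration:

/*
 * A freshly faulted or COWed private hugepage is its first and
 * exclusive mapper, so it takes the _new_ variant; mapping an
 * already-anonymous hugepage once more takes the plain variant.
 */
static void map_anon_hugepage(struct page *hpage,
			      struct vm_area_struct *vma,
			      unsigned long haddr, int new_page)
{
	if (new_page)
		hugepage_add_new_anon_rmap(hpage, vma, haddr);
	else
		hugepage_add_anon_rmap(hpage, vma, haddr);
}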


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h
  2010-05-28 10:03   ` Mel Gorman
@ 2010-08-10 19:53     ` Wu Fengguang
  0 siblings, 0 replies; 17+ messages in thread
From: Wu Fengguang @ 2010-08-10 19:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Naoya Horiguchi, linux-mm, linux-kernel, Andi Kleen,
	Andrew Morton

> > +#ifndef _LINUX_HUGETLB_INLINE_H
> > +#define _LINUX_HUGETLB_INLINE_H 1
> > +
> 
> Just #define _LINUX_HUGETLB_INLINE_H is fine. No need for the 1.
> 
> > +#ifdef CONFIG_HUGETLBFS
> > +
> 
> Should be CONFIG_HUGETLB_PAGE
> 
> With those corrections;
> 
> Acked-by: Mel Gorman <mel@csn.ul.ie>

Both fixed in Andi's tree, so

Acked-by: Wu Fengguang <fengguang.wu@intel.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage
  2010-05-28 14:48   ` Mel Gorman
@ 2010-08-10 23:23     ` Wu Fengguang
  0 siblings, 0 replies; 17+ messages in thread
From: Wu Fengguang @ 2010-08-10 23:23 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Naoya Horiguchi, linux-mm, linux-kernel, Andi Kleen,
	Andrew Morton, Andrea Arcangeli, Larry Woodman, Lee Schermerhorn

On Fri, May 28, 2010 at 03:48:24PM +0100, Mel Gorman wrote:
> On Fri, May 28, 2010 at 09:29:16AM +0900, Naoya Horiguchi wrote:
> > This patch adds a reverse mapping feature for hugepages by introducing
> > a mapcount for shared/privately mapped hugepages and an anon_vma for
> > privately mapped hugepages.
> > 
> > While hugepages are not currently swappable, reverse mapping is useful
> > for the memory error handler.
> > 
> > Without this patch, the memory error handler can neither identify the
> > processes using a bad hugepage nor unmap it from them. That is:
> > - for a shared hugepage:
> >   we can collect the processes using it through the pagecache,
> >   but cannot unmap it because of the lack of a mapcount.
> > - for a privately mapped hugepage:
> >   we can neither collect the processes nor unmap the hugepage.
> > This patch solves these problems.
> > 
> > This patch includes the bug fix from commit 23be7468e8, so it reverts
> > that commit.
> > 
> > Dependency:
> >   "hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"
> > 
> > ChangeLog since May 24.
> > - create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
> > - move the functions setting up the anon_vma for hugepages into mm/rmap.c.
> > 
> > ChangeLog since May 13.
> > - rebased to 2.6.34
> > - fix a logic error (when private and shared mappings coexist)
> > - move is_vm_hugetlb_page() into include/linux/mm.h to use this function
> >   from linear_page_index()
> > - define and use linear_hugepage_index() instead of compound_order()
> > - use page_move_anon_rmap() in hugetlb_cow()
> > - copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
> > - revert commit 23be7468 completely
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: Andi Kleen <andi@firstfloor.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Wu Fengguang <fengguang.wu@intel.com>
> > Cc: Mel Gorman <mel@csn.ul.ie>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Larry Woodman <lwoodman@redhat.com>
> > Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
> 
> Ok, I could find no other problems with the hugetlb side of things in the
> first two patches. I haven't looked at the hwpoison parts but I'm assuming
> Andi has looked at those already. Thanks
> 
> Acked-by: Mel Gorman <mel@csn.ul.ie>

The hwpoison part looks good. We actually started by adding ugly code in
memory-failure.c to special-case huge pages, then decided that
hugetlb rmap is the way to go :)

Acked-by: Wu Fengguang <fengguang.wu@intel.com>
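
A rough paraphrase, not the actual memory-failure.c code, of the step
this rmap enables: with a mapcount and, for private mappings, an
anon_vma in place, the handler can find and unmap every user of a bad
hugepage through the generic reverse-mapping walk. The helper name below
is invented, and the killing of the collected tasks is omitted:

static int unmap_poisoned_hugepage(struct page *hpage)
{
	LIST_HEAD(tokill);	/* tasks to signal with SIGBUS later */

	if (!page_mapped(hpage))
		return SWAP_SUCCESS;

	/* walk the anon_vma/pagecache rmap to collect mapping tasks */
	collect_procs(hpage, &tokill);

	/* tear down every mapping of the huge page via the same rmap */
	return try_to_unmap(hpage, TTU_UNMAP | TTU_IGNORE_MLOCK |
				   TTU_IGNORE_ACCESS);
}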


^ permalink raw reply	[flat|nested] 17+ messages in thread

Thread overview: 17+ messages
2010-05-28  0:29 [PATCH 0/8] HWPOISON for hugepage (v6) Naoya Horiguchi
2010-05-28  0:29 ` [PATCH 1/8] hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h Naoya Horiguchi
2010-05-28 10:03   ` Mel Gorman
2010-08-10 19:53     ` Wu Fengguang
2010-05-28  0:29 ` [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage Naoya Horiguchi
2010-05-28 14:48   ` Mel Gorman
2010-08-10 23:23     ` Wu Fengguang
2010-06-02 18:16   ` Andrew Morton
2010-06-03  1:38     ` [PATCH] replace ifdef CONFIG_HUGETLBFS with ifdef CONFIG_HUGETLB_PAGE (Re: [PATCH 2/8] hugetlb, rmap: add reverse mapping for hugepage) Naoya Horiguchi
2010-05-28  0:29 ` [PATCH 3/8] HWPOISON, hugetlb: enable error handling path for hugepage Naoya Horiguchi
2010-05-28  0:29 ` [PATCH 4/8] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage Naoya Horiguchi
2010-05-28  0:29 ` [PATCH 5/8] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error Naoya Horiguchi
2010-05-28  0:29 ` [PATCH 6/8] HWPOISON, hugetlb: isolate corrupted hugepage Naoya Horiguchi
2010-05-28  0:29 ` [PATCH 7/8] HWPOISON, hugetlb: detect hwpoison in hugetlb code Naoya Horiguchi
2010-05-28  0:29 ` [PATCH 8/8] HWPOISON, hugetlb: support hwpoison injection for hugepage Naoya Horiguchi
2010-05-31  9:30 ` [PATCH 0/8] HWPOISON for hugepage (v6) Andi Kleen
2010-05-31 10:17   ` Naoya Horiguchi
