* [RFC][PATCH 0/9] extend hugepage migration
@ 2013-02-21 19:41 Naoya Horiguchi
  2013-02-21 19:41 ` [PATCH 1/9] migrate: add migrate_entry_wait_huge() Naoya Horiguchi
                   ` (9 more replies)
  0 siblings, 10 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
Hi,
Hugepage migration is currently available only for soft offlining
(moving the data on a half-corrupted page to another page to save the
data). But it is also useful for some other users of page migration, so
this patchset extends some of those users to support hugepages.
The targets of this patchset are the NUMA-related system calls (i.e.
migrate_pages(2), move_pages(2), and mbind(2)) and memory hotplug.
This patchset does not extend page migration in memory compaction,
because I think that users of memory compaction mainly expect it to
construct thp by rearranging raw pages, and hugepage migration doesn't
help with that.
CMA, another user of page migration, can benefit from hugepage
migration, but is not enabled to support it yet. This is because
I've never used CMA and need to learn more before I can extend and/or
test hugepage migration in CMA. I'll add this in a later version if it
becomes ready, or will post it as a separate patchset.
Migration of 1GB hugepages is not enabled for now, because
I'm not sure whether users of 1GB hugepages really want it.
We need to keep a free hugepage in reserve in order to do migration,
but I don't think that users want 1GB of memory to sit idle for that
purpose (currently we can't expand/shrink the 1GB hugepage pool after
boot).
Could you review this and give me some comments/feedback?
Thanks,
Naoya Horiguchi
---
Easy patch access:
  git@github.com:Naoya-Horiguchi/linux.git
  branch:extend_hugepage_migration
Test code:
  git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git
Naoya Horiguchi (9):
      migrate: add migrate_entry_wait_huge()
      migrate: make core migration code aware of hugepage
      soft-offline: use migrate_pages() instead of migrate_huge_page()
      migrate: clean up migrate_huge_page()
      migrate: enable migrate_pages() to migrate hugepage
      migrate: enable move_pages() to migrate hugepage
      mbind: enable mbind() to migrate hugepage
      memory-hotplug: enable memory hotplug to handle hugepage
      remove /proc/sys/vm/hugepages_treat_as_movable
 Documentation/sysctl/vm.txt |  16 ------
 include/linux/hugetlb.h     |  25 ++++++++--
 include/linux/mempolicy.h   |   2 +-
 include/linux/migrate.h     |  12 ++---
 include/linux/swapops.h     |   4 ++
 kernel/sysctl.c             |   7 ---
 mm/hugetlb.c                |  98 ++++++++++++++++++++++++++++--------
 mm/memory-failure.c         |  20 ++++++--
 mm/memory.c                 |   6 ++-
 mm/memory_hotplug.c         |  51 +++++++++++++++----
 mm/mempolicy.c              |  61 +++++++++++++++--------
 mm/migrate.c                | 119 ++++++++++++++++++++++++++++++--------------
 mm/page_alloc.c             |  12 +++++
 mm/page_isolation.c         |   5 ++
 14 files changed, 311 insertions(+), 127 deletions(-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/9] migrate: add migrate_entry_wait_huge()
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-03-18 14:51   ` Michal Hocko
  2013-03-19 23:57   ` Simon Jeons
  2013-02-21 19:41 ` [PATCH 2/9] migrate: make core migration code aware of hugepage Naoya Horiguchi
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
When a page fault occurs at an address backed by a hugepage that is
under migration, the kernel can't wait correctly until the migration
finishes. This is because pte_offset_map_lock() can't get a correct
migration entry for a hugepage. This patch adds migration_entry_wait_huge()
to separate the code paths for normal pages and hugepages.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb.h |  2 ++
 include/linux/swapops.h |  4 ++++
 mm/hugetlb.c            |  4 ++--
 mm/migrate.c            | 24 ++++++++++++++++++++++++
 4 files changed, 32 insertions(+), 2 deletions(-)
diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index 0c80d3f..40b27f6 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -43,6 +43,7 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int,
 #endif
 
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
+int is_hugetlb_entry_migration(pte_t pte);
 int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			struct page **, struct vm_area_struct **,
 			unsigned long *, int *, int, unsigned int flags);
@@ -109,6 +110,7 @@ static inline unsigned long hugetlb_total_pages(void)
 #define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
+#define is_hugetlb_entry_migration(pte)		({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
 static inline void hugetlb_report_meminfo(struct seq_file *m)
 {
diff --git v3.8.orig/include/linux/swapops.h v3.8/include/linux/swapops.h
index 47ead51..f68efdd 100644
--- v3.8.orig/include/linux/swapops.h
+++ v3.8/include/linux/swapops.h
@@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry)
 
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
+extern void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
+					unsigned long address);
 #else
 
 #define make_migration_entry(page, write) swp_entry(0, 0)
@@ -148,6 +150,8 @@ static inline int is_migration_entry(swp_entry_t swp)
 static inline void make_migration_entry_read(swp_entry_t *entryp) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					 unsigned long address) { }
+static inline void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
+					 unsigned long address) { }
 static inline int is_write_migration_entry(swp_entry_t entry)
 {
 	return 0;
diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
index 546db81..351025e 100644
--- v3.8.orig/mm/hugetlb.c
+++ v3.8/mm/hugetlb.c
@@ -2313,7 +2313,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	return -ENOMEM;
 }
 
-static int is_hugetlb_entry_migration(pte_t pte)
+int is_hugetlb_entry_migration(pte_t pte)
 {
 	swp_entry_t swp;
 
@@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
-			migration_entry_wait(mm, (pmd_t *)ptep, address);
+			migration_entry_wait_huge(mm, (pmd_t *)ptep, address);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 2fd8b4a..7d84f4c 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 	pte_unmap_unlock(ptep, ptl);
 }
 
+void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
+				unsigned long address)
+{
+	spinlock_t *ptl = pte_lockptr(mm, pmd);
+	pte_t pte;
+	swp_entry_t entry;
+	struct page *page;
+
+	spin_lock(ptl);
+	pte = huge_ptep_get((pte_t *)pmd);
+	if (!is_hugetlb_entry_migration(pte))
+		goto out;
+	entry = pte_to_swp_entry(pte);
+	page = migration_entry_to_page(entry);
+	if (!get_page_unless_zero(page))
+		goto out;
+	spin_unlock(ptl);
+	wait_on_page_locked(page);
+	put_page(page);
+	return;
+out:
+	spin_unlock(ptl);
+}
+
 #ifdef CONFIG_BLOCK
 /* Returns true if all buffers are successfully locked */
 static bool buffer_migrate_lock_buffers(struct buffer_head *head,
-- 
1.7.11.7
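A note on the get_page_unless_zero() step in migration_entry_wait_huge()
above: the page is pinned before the lock is dropped, and only if it is
still live. The following is a minimal userspace sketch of that
"take a reference unless the count is already zero" idea using C11
atomics. It is not the kernel implementation; struct toy_page,
toy_get_unless_zero() and toy_put() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct toy_page {
	atomic_int refcount;
};

/*
 * Take a reference only if the object is still live (refcount > 0).
 * Returns true if the reference was taken.
 */
static bool toy_get_unless_zero(struct toy_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool toy_put(struct toy_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A waiter in this model would call toy_get_unless_zero(), release its
lock, wait, and then call toy_put(). If the count has already dropped
to zero, the waiter backs off instead of touching an object that is
being freed.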
* [PATCH 2/9] migrate: make core migration code aware of hugepage
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
  2013-02-21 19:41 ` [PATCH 1/9] migrate: add migrate_entry_wait_huge() Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-03-18 15:22   ` Michal Hocko
  2013-02-21 19:41 ` [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Naoya Horiguchi
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
Before enabling each user of page migration to support hugepages,
this patch makes the necessary changes to the core migration code.
The main change is that the list of pages to migrate can now link
not only LRU pages but also hugepages.
Along with this, functions such as migrate_pages() and
putback_movable_pages() need to be changed to handle hugepages.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb.h   |  4 ++++
 include/linux/mempolicy.h |  2 +-
 include/linux/migrate.h   |  6 ++++++
 mm/hugetlb.c              | 16 ++++++++++++++++
 mm/migrate.c              | 27 +++++++++++++++++++++++++--
 5 files changed, 52 insertions(+), 3 deletions(-)
diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index 40b27f6..8f87115 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -67,6 +67,8 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						vm_flags_t vm_flags);
 void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
 int dequeue_hwpoisoned_huge_page(struct page *page);
+void putback_active_hugepage(struct page *page);
+void putback_active_hugepages(struct list_head *l);
 void copy_huge_page(struct page *dst, struct page *src);
 
 extern unsigned long hugepages_treat_as_movable;
@@ -130,6 +132,8 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
 	return 0;
 }
 
+#define putback_active_hugepage(p) 0
+#define putback_active_hugepages(l) 0
 static inline void copy_huge_page(struct page *dst, struct page *src)
 {
 }
diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h
index 0d7df39..2e475b5 100644
--- v3.8.orig/include/linux/mempolicy.h
+++ v3.8/include/linux/mempolicy.h
@@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
 /* Check if a vma is migratable */
 static inline int vma_migratable(struct vm_area_struct *vma)
 {
-	if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
+	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
 		return 0;
 	/*
 	 * Migration allocates pages in the highest zone. If we cannot
diff --git v3.8.orig/include/linux/migrate.h v3.8/include/linux/migrate.h
index 1e9f627..d626c27 100644
--- v3.8.orig/include/linux/migrate.h
+++ v3.8/include/linux/migrate.h
@@ -42,6 +42,9 @@ extern int migrate_page(struct address_space *,
 extern int migrate_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
 			enum migrate_mode mode, int reason);
+extern int migrate_movable_pages(struct list_head *from,
+		new_page_t get_new_page, unsigned long private, bool offlining,
+		enum migrate_mode mode, int reason);
 extern int migrate_huge_page(struct page *, new_page_t x,
 			unsigned long private, bool offlining,
 			enum migrate_mode mode);
@@ -64,6 +67,9 @@ static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode, int reason) { return -ENOSYS; }
+static inline int migrate_movable_pages(struct list_head *from,
+		new_page_t get_new_page, unsigned long private, bool offlining,
+		enum migrate_mode mode, int reason) { return -ENOSYS; }
 static inline int migrate_huge_page(struct page *page, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode) { return -ENOSYS; }
diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
index 351025e..cb9d43b8 100644
--- v3.8.orig/mm/hugetlb.c
+++ v3.8/mm/hugetlb.c
@@ -3186,3 +3186,19 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
 	return ret;
 }
 #endif
+
+void putback_active_hugepage(struct page *page)
+{
+	VM_BUG_ON(!PageHead(page));
+	list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
+	put_page(page);
+}
+
+void putback_active_hugepages(struct list_head *l)
+{
+	struct page *page;
+	struct page *page2;
+
+	list_for_each_entry_safe(page, page2, l, lru)
+		putback_active_hugepage(page);
+}
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 7d84f4c..e305dc0 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -100,6 +100,10 @@ void putback_movable_pages(struct list_head *l)
 	struct page *page2;
 
 	list_for_each_entry_safe(page, page2, l, lru) {
+		if (unlikely(PageHuge(page))) {
+			putback_active_hugepage(page);
+			continue;
+		}
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
@@ -1046,8 +1050,12 @@ int migrate_pages(struct list_head *from,
 
 		list_for_each_entry_safe(page, page2, from, lru) {
 			cond_resched();
-
-			rc = unmap_and_move(get_new_page, private,
+			if (PageHuge(page))
+				rc = unmap_and_move_huge_page(get_new_page,
+						private, page, pass > 2,
+						offlining, mode);
+			else
+				rc = unmap_and_move(get_new_page, private,
 						page, pass > 2, offlining,
 						mode);
 
@@ -1081,6 +1089,21 @@ int migrate_pages(struct list_head *from,
 	return rc;
 }
 
+int migrate_movable_pages(struct list_head *from, new_page_t get_new_page,
+			unsigned long private, bool offlining,
+			enum migrate_mode mode, int reason)
+{
+	int err = 0;
+
+	if (!list_empty(from)) {
+		err = migrate_pages(from, get_new_page, private,
+				    offlining, mode, reason);
+		if (err)
+			putback_movable_pages(from);
+	}
+	return err;
+}
+
 int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
 		      unsigned long private, bool offlining,
 		      enum migrate_mode mode)
-- 
1.7.11.7
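To illustrate the control flow this patch introduces (one list holding
both normal pages and hugepages, per-page dispatch, and a putback of
whatever is left on failure), here is a small userspace toy model. It
is only a sketch under simplifying assumptions: every toy_* name is
hypothetical, a page that fails simply stays on the list, and there is
no retry, locking or reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
	bool huge;		/* models PageHuge() */
	bool fail;		/* force a migration failure for this page */
	bool migrated;
	bool put_back;
	struct toy_page *next;	/* singly linked "to migrate" list */
};

static int toy_huge_calls, toy_normal_calls;

static int toy_unmap_and_move(struct toy_page *p)
{
	toy_normal_calls++;
	if (p->fail)
		return -1;
	p->migrated = true;
	return 0;
}

static int toy_unmap_and_move_huge(struct toy_page *p)
{
	toy_huge_calls++;
	if (p->fail)
		return -1;
	p->migrated = true;
	return 0;
}

/* Returns the number of pages that failed; those stay on the list. */
static int toy_migrate_pages(struct toy_page **list)
{
	struct toy_page **link = list;
	int nr_failed = 0;

	assert(list != NULL);
	while (*link) {
		struct toy_page *p = *link;
		int rc = p->huge ? toy_unmap_and_move_huge(p)
				 : toy_unmap_and_move(p);

		if (rc) {
			nr_failed++;
			link = &p->next;	/* leave it on the list */
		} else {
			*link = p->next;	/* unlink the migrated page */
			p->next = NULL;
		}
	}
	return nr_failed;
}

/* Hand every page still on the list back to its owner. */
static void toy_putback(struct toy_page **list)
{
	while (*list) {
		struct toy_page *p = *list;

		*list = p->next;
		p->next = NULL;
		p->put_back = true;
	}
}

/* Same shape as the wrapper added above: migrate, then clean up. */
static int toy_migrate_movable_pages(struct toy_page **list)
{
	int err = 0;

	if (*list) {
		err = toy_migrate_pages(list);
		if (err)
			toy_putback(list);
	}
	return err;
}
```

The point of the wrapper shape is that callers no longer need to know
which kind of page is on the list: the dispatch and the cleanup both
live in one place.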
* [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
  2013-02-21 19:41 ` [PATCH 1/9] migrate: add migrate_entry_wait_huge() Naoya Horiguchi
  2013-02-21 19:41 ` [PATCH 2/9] migrate: make core migration code aware of hugepage Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-02-27  7:25   ` Chen Gong
  2013-02-21 19:41 ` [PATCH 4/9] migrate: clean up migrate_huge_page() Naoya Horiguchi
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
Currently migrate_huge_page() takes a pointer to the hugepage to be
migrated as an argument, instead of taking a pointer to a list of
hugepages to be migrated. This behavior was introduced in commit
189ebff28 ("hugetlb: simplify migrate_huge_page()"), and was OK
because until now hugepage migration has been enabled only for
soft-offlining, which handles only one hugepage in a single call.
But the situation will change in the later patches in this series,
which enable other users of page migration to support hugepage migration.
They can kick off migration of both normal pages and hugepages
in a single call, so we need to go back to the original implementation,
which uses linked lists to collect the hugepages to be migrated.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 20 ++++++++++++++++----
 mm/migrate.c        |  2 ++
 2 files changed, 18 insertions(+), 4 deletions(-)
diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c
index bc126f6..01e4676 100644
--- v3.8.orig/mm/memory-failure.c
+++ v3.8/mm/memory-failure.c
@@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
 	struct page *hpage = compound_head(page);
+	LIST_HEAD(pagelist);
 
 	/* Synchronized using the page lock with memory_failure() */
 	lock_page(hpage);
@@ -1479,13 +1480,24 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	unlock_page(hpage);
 
 	/* Keep page count to indicate a given hugepage is isolated. */
-	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false,
-				MIGRATE_SYNC);
-	put_page(hpage);
+	list_move(&hpage->lru, &pagelist);
+	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, false,
+				MIGRATE_SYNC, MR_MEMORY_FAILURE);
 	if (ret) {
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 			pfn, ret, page->flags);
-		return ret;
+		/*
+		 * We know that soft_offline_huge_page() tries to migrate
+		 * only one hugepage pointed to by hpage, so we need not
+		 * run through the pagelist here.
+		 */
+		putback_active_hugepage(hpage);
+		if (ret > 0)
+			ret = -EIO;
+	} else {
+		set_page_hwpoison_huge_page(hpage);
+		dequeue_hwpoisoned_huge_page(hpage);
+		atomic_long_add(1<<compound_trans_order(hpage), &mce_bad_pages);
 	}
 	/* keep elevated page count for bad page */
 	return ret;
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index e305dc0..8c13cc5 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -1004,6 +1004,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	unlock_page(hpage);
 out:
+	if (rc != -EAGAIN)
+		putback_active_hugepage(hpage);
 	put_page(new_hpage);
 	if (result) {
 		if (rc)
-- 
1.7.11.7
* [PATCH 4/9] migrate: clean up migrate_huge_page()
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2013-02-21 19:41 ` [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-02-21 19:41 ` [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Naoya Horiguchi
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
With the previous patch, soft_offline_huge_page() switched to using
migrate_pages(), so migrate_huge_page() is no longer used.
So let's remove it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/migrate.h |  6 ------
 mm/migrate.c            | 28 ----------------------------
 2 files changed, 34 deletions(-)
diff --git v3.8.orig/include/linux/migrate.h v3.8/include/linux/migrate.h
index d626c27..dc085e1 100644
--- v3.8.orig/include/linux/migrate.h
+++ v3.8/include/linux/migrate.h
@@ -45,9 +45,6 @@ extern int migrate_pages(struct list_head *l, new_page_t x,
 extern int migrate_movable_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
 		enum migrate_mode mode, int reason);
-extern int migrate_huge_page(struct page *, new_page_t x,
-			unsigned long private, bool offlining,
-			enum migrate_mode mode);
 
 extern int fail_migrate_page(struct address_space *,
 			struct page *, struct page *);
@@ -70,9 +67,6 @@ static inline int migrate_pages(struct list_head *l, new_page_t x,
 static inline int migrate_movable_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
 		enum migrate_mode mode, int reason) { return -ENOSYS; }
-static inline int migrate_huge_page(struct page *page, new_page_t x,
-		unsigned long private, bool offlining,
-		enum migrate_mode mode) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 8c13cc5..7b2ca1a 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -1106,34 +1106,6 @@ int migrate_movable_pages(struct list_head *from, new_page_t get_new_page,
 	return err;
 }
 
-int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
-		      unsigned long private, bool offlining,
-		      enum migrate_mode mode)
-{
-	int pass, rc;
-
-	for (pass = 0; pass < 10; pass++) {
-		rc = unmap_and_move_huge_page(get_new_page,
-					      private, hpage, pass > 2, offlining,
-					      mode);
-		switch (rc) {
-		case -ENOMEM:
-			goto out;
-		case -EAGAIN:
-			/* try again */
-			cond_resched();
-			break;
-		case MIGRATEPAGE_SUCCESS:
-			goto out;
-		default:
-			rc = -EIO;
-			goto out;
-		}
-	}
-out:
-	return rc;
-}
-
 #ifdef CONFIG_NUMA
 /*
  * Move a list of individual pages
-- 
1.7.11.7
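For reference, the retry policy of the removed migrate_huge_page()
(up to 10 passes, retry only on -EAGAIN, stop on -ENOMEM or success,
report anything else as -EIO) can be modelled in userspace as below.
This is a sketch, not kernel code: the toy_* names and the callback
type are hypothetical, and the cond_resched() call of the original is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MIGRATEPAGE_SUCCESS	0
#define TOY_MAX_PASSES		10

/* One migration attempt; 'force' is set on later passes. */
typedef int (*toy_attempt_fn)(void *ctx, int force);

static int toy_migrate_with_retry(toy_attempt_fn attempt, void *ctx)
{
	int pass, rc = -EAGAIN;

	assert(attempt != NULL);
	for (pass = 0; pass < TOY_MAX_PASSES; pass++) {
		rc = attempt(ctx, pass > 2);
		switch (rc) {
		case -ENOMEM:
		case TOY_MIGRATEPAGE_SUCCESS:
			return rc;
		case -EAGAIN:
			break;		/* transient: try another pass */
		default:
			return -EIO;	/* any other failure is permanent */
		}
	}
	return rc;	/* still -EAGAIN after all passes */
}

/* Example attempt: returns -EAGAIN until the counter reaches zero. */
static int toy_busy_then_ok(void *ctx, int force)
{
	int *remaining = ctx;

	(void)force;
	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return TOY_MIGRATEPAGE_SUCCESS;
}
```

With a counter of 3 the fourth pass succeeds; with a counter larger
than the pass limit the caller gets -EAGAIN back after ten attempts.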
* [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
                   ` (3 preceding siblings ...)
  2013-02-21 19:41 ` [PATCH 4/9] migrate: clean up migrate_huge_page() Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-03-18 15:40   ` Michal Hocko
  2013-02-21 19:41 ` [PATCH 6/9] migrate: enable move_pages() " Naoya Horiguchi
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
This patch extends check_range() to handle vmas with VM_HUGETLB set.
With this change, we can migrate hugepages with migrate_pages(2).
Note that larger hugepages (those covered by pud entries, e.g. 1GB on
x86_64) are simply skipped for now.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 10 ++++++++++
 mm/mempolicy.c          | 46 ++++++++++++++++++++++++++++++++++------------
 3 files changed, 48 insertions(+), 14 deletions(-)
diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index 8f87115..eb33df5 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
 int dequeue_hwpoisoned_huge_page(struct page *page);
 void putback_active_hugepage(struct page *page);
 void putback_active_hugepages(struct list_head *l);
+void migrate_hugepage_add(struct page *page, struct list_head *list);
 void copy_huge_page(struct page *dst, struct page *src);
 
 extern unsigned long hugepages_treat_as_movable;
@@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 				pmd_t *pmd, int write);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
 				pud_t *pud, int write);
-int pmd_huge(pmd_t pmd);
-int pud_huge(pud_t pmd);
+extern int pmd_huge(pmd_t pmd);
+extern int pud_huge(pud_t pmd);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot);
 
@@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
 
 #define putback_active_hugepage(p) 0
 #define putback_active_hugepages(l) 0
+#define migrate_hugepage_add(p, l) 0
 static inline void copy_huge_page(struct page *dst, struct page *src)
 {
 }
diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
index cb9d43b8..86ffcb7 100644
--- v3.8.orig/mm/hugetlb.c
+++ v3.8/mm/hugetlb.c
@@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
 	list_for_each_entry_safe(page, page2, l, lru)
 		putback_active_hugepage(page);
 }
+
+void migrate_hugepage_add(struct page *page, struct list_head *list)
+{
+	VM_BUG_ON(!PageHuge(page));
+	get_page(page);
+	spin_lock(&hugetlb_lock);
+	list_move_tail(&page->lru, list);
+	spin_unlock(&hugetlb_lock);
+	return;
+}
diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
index e2df1c1..8627135 100644
--- v3.8.orig/mm/mempolicy.c
+++ v3.8/mm/mempolicy.c
@@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	return addr != end;
 }
 
+static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
+		const nodemask_t *nodes, unsigned long flags,
+				    void *private)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	int nid;
+	struct page *page;
+
+	spin_lock(&vma->vm_mm->page_table_lock);
+	page = pte_page(huge_ptep_get((pte_t *)pmd));
+	spin_unlock(&vma->vm_mm->page_table_lock);
+	nid = page_to_nid(page);
+	if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
+	    && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
+		|| flags & MPOL_MF_MOVE_ALL))
+		migrate_hugepage_add(page, private);
+#else
+	BUG();
+#endif
+}
+
 static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 		unsigned long addr, unsigned long end,
 		const nodemask_t *nodes, unsigned long flags,
@@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
+		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
+			check_hugetlb_pmd_range(vma, pmd, nodes,
+						flags, private);
+			continue;
+		}
 		split_huge_page_pmd(vma, addr, pmd);
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			continue;
@@ -557,6 +583,8 @@ static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
+		if (pud_huge(*pud) && is_vm_hugetlb_page(vma))
+			continue;
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		if (check_pmd_range(vma, pud, addr, next, nodes,
@@ -648,9 +676,6 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 				return ERR_PTR(-EFAULT);
 		}
 
-		if (is_vm_hugetlb_page(vma))
-			goto next;
-
 		if (flags & MPOL_MF_LAZY) {
 			change_prot_numa(vma, start, endvma);
 			goto next;
@@ -999,7 +1024,11 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 
 static struct page *new_node_page(struct page *page, unsigned long node, int **x)
 {
-	return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
+	if (PageHuge(page))
+		return alloc_huge_page_node(page_hstate(compound_head(page)),
+					node);
+	else
+		return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
 }
 
 /*
@@ -1011,7 +1040,6 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 {
 	nodemask_t nmask;
 	LIST_HEAD(pagelist);
-	int err = 0;
 
 	nodes_clear(nmask);
 	node_set(source, nmask);
@@ -1025,15 +1053,9 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 	check_range(mm, mm->mmap->vm_start, mm->task_size, &nmask,
 			flags | MPOL_MF_DISCONTIG_OK, &pagelist);
 
-	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_node_page, dest,
+	return migrate_movable_pages(&pagelist, new_node_page, dest,
 							false, MIGRATE_SYNC,
 							MR_SYSCALL);
-		if (err)
-			putback_lru_pages(&pagelist);
-	}
-
-	return err;
 }
 
 /*
-- 
1.7.11.7
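The condition in check_hugetlb_pmd_range() above packs three decisions
into one expression: does the page's node match the mask (possibly
inverted), is the page exclusively mapped when only MPOL_MF_MOVE is
given, or was MPOL_MF_MOVE_ALL requested. The userspace sketch below
spells that predicate out. The TOY_MF_* values are illustrative
assumptions and should not be taken as the kernel's definitions, and
toy_should_queue_hugepage() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's own definitions may differ. */
#define TOY_MF_MOVE	(1u << 1)
#define TOY_MF_MOVE_ALL	(1u << 2)
#define TOY_MF_INVERT	(1u << 4)

/*
 * Decide whether a hugepage should be queued for migration.
 *  node_in_mask: the page's node is set in the caller's nodemask
 *  mapcount:     number of mappings of the page
 */
static bool toy_should_queue_hugepage(bool node_in_mask, int mapcount,
				      unsigned int flags)
{
	bool invert = (flags & TOY_MF_INVERT) != 0;
	/* With INVERT set, pages *outside* the mask are the ones selected. */
	bool node_matches = node_in_mask != invert;
	/* MOVE only takes exclusively mapped pages; MOVE_ALL takes any. */
	bool may_move = ((flags & TOY_MF_MOVE) && mapcount == 1) ||
			(flags & TOY_MF_MOVE_ALL);

	return node_matches && may_move;
}
```

For example, a hugepage on a node in the mask that is mapped twice is
skipped under TOY_MF_MOVE but queued under TOY_MF_MOVE_ALL.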
* [PATCH 6/9] migrate: enable move_pages() to migrate hugepage
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
                   ` (4 preceding siblings ...)
  2013-02-21 19:41 ` [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-02-21 19:41 ` [PATCH 7/9] mbind: enable mbind() " Naoya Horiguchi
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
This patch extends move_pages() to handle vmas with VM_HUGETLB set,
which makes it possible to migrate hugepages with move_pages(2).
We avoid getting a refcount on tail pages of a hugepage because, unlike
thp, a hugepage is never split, so we need not care about races with
splitting.
Migration of larger hugepages (1GB on x86_64) is not enabled.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory.c  |  6 ++++--
 mm/migrate.c | 29 ++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 11 deletions(-)
diff --git v3.8.orig/mm/memory.c v3.8/mm/memory.c
index bb1369f..d7cfd11 100644
--- v3.8.orig/mm/memory.c
+++ v3.8/mm/memory.c
@@ -1495,7 +1495,8 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	if (pud_none(*pud))
 		goto no_page_table;
 	if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
-		BUG_ON(flags & FOLL_GET);
+		if (flags & FOLL_GET)
+			goto out;
 		page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
 		goto out;
 	}
@@ -1506,8 +1507,9 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	if (pmd_none(*pmd))
 		goto no_page_table;
 	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
-		BUG_ON(flags & FOLL_GET);
 		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
+		if (flags & FOLL_GET && PageHead(page))
+			get_page_foll(page);
 		goto out;
 	}
 	if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 7b2ca1a..36959d6 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -1130,7 +1130,11 @@ static struct page *new_page_node(struct page *p, unsigned long private,
 
 	*result = &pm->status;
 
-	return alloc_pages_exact_node(pm->node,
+	if (PageHuge(p))
+		return alloc_huge_page_node(page_hstate(compound_head(p)),
+					pm->node);
+	else
+		return alloc_pages_exact_node(pm->node,
 				GFP_HIGHUSER_MOVABLE | GFP_THISNODE, 0);
 }
 
@@ -1176,6 +1180,13 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 		if (PageReserved(page) || PageKsm(page))
 			goto put_and_set;
 
+		/*
+		 * follow_page(FOLL_GET) didn't get refcount for tail pages of
+		 * hugepage, so here we skip putting it.
+		 */
+		if (PageHuge(page) && PageTail(page))
+			goto set_status;
+
 		pp->page = page;
 		err = page_to_nid(page);
 
@@ -1190,6 +1201,12 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 				!migrate_all)
 			goto put_and_set;
 
+		if (PageHuge(page)) {
+			get_page(page);
+			list_move_tail(&page->lru, &pagelist);
+			goto put_and_set;
+		}
+
 		err = isolate_lru_page(page);
 		if (!err) {
 			list_add_tail(&page->lru, &pagelist);
@@ -1207,14 +1224,8 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 		pp->status = err;
 	}
 
-	err = 0;
-	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm, 0, MIGRATE_SYNC,
-				MR_SYSCALL);
-		if (err)
-			putback_lru_pages(&pagelist);
-	}
+	err = migrate_movable_pages(&pagelist, new_page_node,
+				(unsigned long)pm, 0, MIGRATE_SYNC, MR_SYSCALL);
 
 	up_read(&mm->mmap_sem);
 	return err;
-- 
1.7.11.7
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 7/9] mbind: enable mbind() to migrate hugepage
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
                   ` (5 preceding siblings ...)
  2013-02-21 19:41 ` [PATCH 6/9] migrate: enable move_pages() " Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
This patch enables mbind(2) to migrate hugepages.
The page-collecting function check_range() is already aware of hugepages
thanks to a previous patch in this series.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb.c            |  2 +-
 mm/mempolicy.c          | 15 ++++++---------
 mm/migrate.c            |  7 ++++++-
 4 files changed, 16 insertions(+), 11 deletions(-)
diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index eb33df5..86a4d78 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -263,6 +263,8 @@ struct huge_bootmem_page {
 #endif
 };
 
+struct page *alloc_huge_page(struct vm_area_struct *vma,
+				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
 
 /* arch callback */
@@ -358,6 +360,7 @@ static inline int hstate_index(struct hstate *h)
 
 #else
 struct hstate {};
+#define alloc_huge_page(v, a, r) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_bootmem_huge_page(h) NULL
 #define hstate_file(f) NULL
diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
index 86ffcb7..ccf9995 100644
--- v3.8.orig/mm/hugetlb.c
+++ v3.8/mm/hugetlb.c
@@ -1116,7 +1116,7 @@ static void vma_commit_reservation(struct hstate *h,
 	}
 }
 
-static struct page *alloc_huge_page(struct vm_area_struct *vma,
+struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
index 8627135..9f56c40 100644
--- v3.8.orig/mm/mempolicy.c
+++ v3.8/mm/mempolicy.c
@@ -1187,6 +1187,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int *
 		vma = vma->vm_next;
 	}
 
+	if (PageHuge(page))
+		return alloc_huge_page(vma, address, 1);
 	/*
 	 * if !vma, alloc_page_vma() will use task or system default policy
 	 */
@@ -1291,15 +1293,10 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (!err) {
 		int nr_failed = 0;
 
-		if (!list_empty(&pagelist)) {
-			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
-			nr_failed = migrate_pages(&pagelist, new_vma_page,
-						(unsigned long)vma,
-						false, MIGRATE_SYNC,
-						MR_MEMPOLICY_MBIND);
-			if (nr_failed)
-				putback_lru_pages(&pagelist);
-		}
+		WARN_ON_ONCE(flags & MPOL_MF_LAZY);
+		nr_failed = migrate_movable_pages(&pagelist, new_vma_page,
+					(unsigned long)vma, false,
+					MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
 
 		if (nr_failed && (flags & MPOL_MF_STRICT))
 			err = -EIO;
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 36959d6..8c457e7 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -974,7 +974,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	struct page *new_hpage = get_new_page(hpage, private, &result);
 	struct anon_vma *anon_vma = NULL;
 
-	if (!new_hpage)
+	/*
+	 * Getting a new hugepage with alloc_huge_page() (which can happen
+	 * when migration is caused by mbind()) can return ERR_PTR value,
+	 * so we need to take care of the case here.
+	 */
+	if (!new_hpage || IS_ERR_VALUE(new_hpage))
 		return -ENOMEM;
 
 	rc = -EAGAIN;
-- 
1.7.11.7
* [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
                   ` (6 preceding siblings ...)
  2013-02-21 19:41 ` [PATCH 7/9] mbind: enable mbind() " Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-02-23  7:05   ` Hillf Danton
                     ` (3 more replies)
  2013-02-21 19:41 ` [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Naoya Horiguchi
  2013-03-19 23:43 ` [RFC][PATCH 0/9] extend hugepage migration Simon Jeons
  9 siblings, 4 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
Currently we can't offline memory blocks which contain hugepages because
a hugepage is considered an unmovable page. With this patch series
hugepages become movable, so we can use hugepage migration to offline
such memory blocks.
Unlike other users of hugepage migration, memory hotplug needs to
decompose all the hugepages inside the target memory block into free
buddy pages after migration, because free hugepages remaining in the
memory block would otherwise interfere with memory offlining.
For this reason we introduce new functions dissolve_free_huge_page() and
dissolve_free_huge_pages().
Other than that, this patch simply adds hugepage migration code: it
teaches the functions which scan over pfns to collect hugepages to be
migrated, and adds a hugepage allocation path to alloc_migrate_target().
As for larger hugepages (1GB on x86_64), hot-removing them is not easy
because they are larger than a memory block, so for now we simply let
offlining fail in that case.
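The scan-then-dissolve flow described in this changelog can be modeled in
user space. The following is only a minimal sketch under stated assumptions,
not the kernel code: `struct mock_page`, `enum mock_kind` and the `mock_*`
helpers are hypothetical stand-ins for `struct page`, `PageLRU()`/`PageHuge()`,
`is_hugepage_movable()` and `dissolve_free_huge_pages()`, and locking is
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel definition. */
enum mock_kind { MOCK_FREE, MOCK_LRU, MOCK_HUGE_HEAD, MOCK_HUGE_TAIL };

struct mock_page {
	enum mock_kind kind;
	unsigned int order;	/* meaningful for MOCK_HUGE_HEAD only */
	bool in_use;		/* head is on the active list */
};

/* Mark pages[head .. head + 2^order) as one hugepage. */
static void mock_make_hugepage(struct mock_page *pages, size_t head,
			       unsigned int order, bool in_use)
{
	size_t nr = (size_t)1 << order;

	pages[head].kind = MOCK_HUGE_HEAD;
	pages[head].order = order;
	pages[head].in_use = in_use;
	for (size_t i = 1; i < nr; i++)
		pages[head + i].kind = MOCK_HUGE_TAIL;
}

/*
 * Find the first movable page in [start, end): an LRU page or the head of
 * an in-use hugepage. Returns end when nothing movable is found (the patch
 * itself returns 0 in that case).
 */
static size_t mock_scan_movable(const struct mock_page *pages,
				size_t start, size_t end)
{
	assert(start <= end);
	for (size_t pfn = start; pfn < end; pfn++) {
		const struct mock_page *p = &pages[pfn];

		if (p->kind == MOCK_LRU)
			return pfn;
		if (p->kind == MOCK_HUGE_HEAD) {
			if (p->in_use)
				return pfn;
			/* Free hugepage: jump to its last tail page. */
			pfn += ((size_t)1 << p->order) - 1;
		}
	}
	return end;
}

/*
 * Walk [start, end) in hugepage-sized steps and turn every free hugepage
 * of the given order back into ordinary free pages. Returns how many
 * hugepages were dissolved.
 */
static size_t mock_dissolve_free(struct mock_page *pages, size_t start,
				 size_t end, unsigned int order)
{
	size_t step = (size_t)1 << order;
	size_t dissolved = 0;

	assert(start <= end);
	for (size_t pfn = start; pfn + step <= end; pfn += step) {
		if (pages[pfn].kind != MOCK_HUGE_HEAD ||
		    pages[pfn].in_use || pages[pfn].order != order)
			continue;
		for (size_t i = 0; i < step; i++) {
			pages[pfn + i].kind = MOCK_FREE;
			pages[pfn + i].order = 0;
		}
		dissolved++;
	}
	return dissolved;
}
```

With a 16-page array holding a free order-2 hugepage at pfn 0 and an in-use
one at pfn 4, `mock_scan_movable()` skips the first and reports pfn 4. Once
that hugepage is marked as no longer in use (standing in for a successful
migration), `mock_dissolve_free()` converts both hugepages back to free pages.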
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb.h |  8 ++++++++
 mm/hugetlb.c            | 43 +++++++++++++++++++++++++++++++++++++++++
 mm/memory_hotplug.c     | 51 ++++++++++++++++++++++++++++++++++++++++---------
 mm/migrate.c            | 12 +++++++++++-
 mm/page_alloc.c         | 12 ++++++++++++
 mm/page_isolation.c     |  5 +++++
 6 files changed, 121 insertions(+), 10 deletions(-)
diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index 86a4d78..e33f07f 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
 void putback_active_hugepage(struct page *page);
 void putback_active_hugepages(struct list_head *l);
 void migrate_hugepage_add(struct page *page, struct list_head *list);
+int is_hugepage_movable(struct page *page);
 void copy_huge_page(struct page *dst, struct page *src);
 
 extern unsigned long hugepages_treat_as_movable;
@@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
 #define putback_active_hugepage(p) 0
 #define putback_active_hugepages(l) 0
 #define migrate_hugepage_add(p, l) 0
+#define is_hugepage_movable(x) 0
 static inline void copy_huge_page(struct page *dst, struct page *src)
 {
 }
@@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h)
 	return h - hstates;
 }
 
+extern void dissolve_free_huge_page(struct page *page);
+extern void dissolve_free_huge_pages(unsigned long start_pfn,
+				     unsigned long end_pfn);
+
 #else
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
@@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 }
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
+#define dissolve_free_huge_page(p) 0
+#define dissolve_free_huge_pages(s, e) 0
 #endif
 
 #endif /* _LINUX_HUGETLB_H */
diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
index ccf9995..c28e6c9 100644
--- v3.8.orig/mm/hugetlb.c
+++ v3.8/mm/hugetlb.c
@@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 	return ret;
 }
 
+/* Dissolve a given free hugepage into free pages. */
+void dissolve_free_huge_page(struct page *page)
+{
+	if (PageHuge(page) && !page_count(page)) {
+		struct hstate *h = page_hstate(page);
+		int nid = page_to_nid(page);
+		spin_lock(&hugetlb_lock);
+		list_del(&page->lru);
+		h->free_huge_pages--;
+		h->free_huge_pages_node[nid]--;
+		update_and_free_page(h, page);
+		spin_unlock(&hugetlb_lock);
+	}
+}
+
+/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */
+void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	unsigned int step = 1 << (HUGETLB_PAGE_ORDER);
+	for (pfn = start_pfn; pfn < end_pfn; pfn += step)
+		dissolve_free_huge_page(pfn_to_page(pfn));
+}
+
 static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 {
 	struct page *page;
@@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage)
 	return 0;
 }
 
+/* Returns true for head pages of in-use hugepages, otherwise returns false. */
+int is_hugepage_movable(struct page *hpage)
+{
+	struct page *page;
+	struct page *tmp;
+	struct hstate *h = page_hstate(hpage);
+	int ret = 0;
+
+	VM_BUG_ON(!PageHuge(hpage));
+	if (PageTail(hpage))
+		return 0;
+	spin_lock(&hugetlb_lock);
+	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
+		if (page == hpage)
+			ret = 1;
+	spin_unlock(&hugetlb_lock);
+	return ret;
+}
+
 /*
  * This function is called from memory failure code.
  * Assume the caller holds page lock of the head page.
diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c
index d04ed87..6418de2 100644
--- v3.8.orig/mm/memory_hotplug.c
+++ v3.8/mm/memory_hotplug.c
@@ -29,6 +29,7 @@
 #include <linux/suspend.h>
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
+#include <linux/hugetlb.h>
 
 #include <asm/tlbflush.h>
 
@@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
 }
 
 /*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
+ * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
+ * and hugepages). We scan pfn because it's much easier than scanning over
+ * linked list. This function returns the pfn of the first found movable
+ * page if it's found, otherwise 0.
  */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
 			page = pfn_to_page(pfn);
 			if (PageLRU(page))
 				return pfn;
+			if (PageHuge(page)) {
+				if (is_hugepage_movable(page))
+					return pfn;
+				else
+					pfn += (1 << compound_order(page)) - 1;
+			}
 		}
 	}
 	return 0;
@@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = pfn_to_page(pfn);
 		if (!get_page_unless_zero(page))
 			continue;
+		if (PageHuge(page)) {
+			/*
+			 * Larger hugepage (1GB for x86_64) is larger than
+			 * memory block, so pfn scan can start at the tail
+			 * page of larger hugepage. In such case,
+			 * we simply skip the hugepage and move the cursor
+			 * to the last tail page.
+			 */
+			if (PageTail(page)) {
+				struct page *head = compound_head(page);
+				pfn = page_to_pfn(head) +
+					(1 << compound_order(head)) - 1;
+				put_page(page);
+				continue;
+			}
+			pfn += (1 << compound_order(page)) - 1;
+			if (huge_page_size(page_hstate(page)) != PMD_SIZE) {
+				put_page(page);
+				continue;
+			}
+			list_move_tail(&page->lru, &source);
+			move_pages -= 1 << compound_order(page);
+			continue;
+		}
 		/*
 		 * We can skip free pages. And we can only deal with pages on
 		 * LRU.
@@ -1049,7 +1082,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	}
 	if (!list_empty(&source)) {
 		if (not_managed) {
-			putback_lru_pages(&source);
+			putback_movable_pages(&source);
 			goto out;
 		}
 
@@ -1057,11 +1090,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		 * alloc_migrate_target should be improooooved!!
 		 * migrate_pages returns # of failed pages.
 		 */
-		ret = migrate_pages(&source, alloc_migrate_target, 0,
+		ret = migrate_movable_pages(&source, alloc_migrate_target, 0,
 							true, MIGRATE_SYNC,
 							MR_MEMORY_HOTPLUG);
-		if (ret)
-			putback_lru_pages(&source);
 	}
 out:
 	return ret;
@@ -1304,8 +1335,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		drain_all_pages();
 	}
 
-	pfn = scan_lru_pages(start_pfn, end_pfn);
-	if (pfn) { /* We have page on LRU */
+	pfn = scan_movable_pages(start_pfn, end_pfn);
+	if (pfn) { /* We have movable pages */
 		ret = do_migrate_range(pfn, end_pfn);
 		if (!ret) {
 			drain = 1;
@@ -1324,6 +1355,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	yield();
 	/* drain pcp pages, this is synchronous. */
 	drain_all_pages();
+	/* dissolve all free hugepages inside the memory block */
+	dissolve_free_huge_pages(start_pfn, end_pfn);
 	/* check again */
 	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
 	if (offlined_pages < 0) {
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 8c457e7..a491a98 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -1009,8 +1009,18 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	unlock_page(hpage);
 out:
-	if (rc != -EAGAIN)
+	if (rc != -EAGAIN) {
 		putback_active_hugepage(hpage);
+
+		/*
+		 * After hugepage migration from memory hotplug, the original
+		 * hugepage should never be allocated again. This will be
+		 * done by dissolving it into free normal pages, because
+		 * we already set migratetype to MIGRATE_ISOLATE for them.
+		 */
+		if (offlining)
+			dissolve_free_huge_page(hpage);
+	}
 	put_page(new_hpage);
 	if (result) {
 		if (rc)
diff --git v3.8.orig/mm/page_alloc.c v3.8/mm/page_alloc.c
index 6a83cd3..c37951d 100644
--- v3.8.orig/mm/page_alloc.c
+++ v3.8/mm/page_alloc.c
@@ -58,6 +58,7 @@
 #include <linux/prefetch.h>
 #include <linux/migrate.h>
 #include <linux/page-debug-flags.h>
+#include <linux/hugetlb.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -5686,6 +5687,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			continue;
 
 		page = pfn_to_page(check);
+
+		/*
+		 * Hugepages are not in LRU lists, but they're movable.
+		 * We need not scan over tail pages because we don't
+		 * handle each tail page individually in migration.
+		 */
+		if (PageHuge(page)) {
+			iter += (1 << compound_order(page)) - 1;
+			continue;
+		}
+
 		/*
 		 * We can't use page_count without pin a page
 		 * because another CPU can free compound page.
diff --git v3.8.orig/mm/page_isolation.c v3.8/mm/page_isolation.c
index 383bdbb..cf48ef6 100644
--- v3.8.orig/mm/page_isolation.c
+++ v3.8/mm/page_isolation.c
@@ -6,6 +6,7 @@
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
 #include <linux/memory.h>
+#include <linux/hugetlb.h>
 #include "internal.h"
 
 int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
@@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
 {
 	gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
 
+	if (PageHuge(page))
+		return alloc_huge_page_node(page_hstate(compound_head(page)),
+					    numa_node_id());
+
 	if (PageHighMem(page))
 		gfp_mask |= __GFP_HIGHMEM;
 
-- 
1.7.11.7
* [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
                   ` (7 preceding siblings ...)
  2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
@ 2013-02-21 19:41 ` Naoya Horiguchi
  2013-02-28  6:02   ` KOSAKI Motohiro
  2013-03-18 15:51   ` Michal Hocko
  2013-03-19 23:43 ` [RFC][PATCH 0/9] extend hugepage migration Simon Jeons
  9 siblings, 2 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-21 19:41 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro,
	Andi Kleen, linux-kernel
Now hugepages are definitely movable. So allocating hugepages from
ZONE_MOVABLE is natural and we have no reason to keep this parameter.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 Documentation/sysctl/vm.txt | 16 ----------------
 include/linux/hugetlb.h     |  2 --
 kernel/sysctl.c             |  7 -------
 mm/hugetlb.c                | 23 +++++------------------
 4 files changed, 5 insertions(+), 43 deletions(-)
diff --git v3.8.orig/Documentation/sysctl/vm.txt v3.8/Documentation/sysctl/vm.txt
index 078701f..997350a 100644
--- v3.8.orig/Documentation/sysctl/vm.txt
+++ v3.8/Documentation/sysctl/vm.txt
@@ -167,22 +167,6 @@ fragmentation index is <= extfrag_threshold. The default value is 500.
 
 ==============================================================
 
-hugepages_treat_as_movable
-
-This parameter is only useful when kernelcore= is specified at boot time to
-create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
-are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
-value written to hugepages_treat_as_movable allows huge pages to be allocated
-from ZONE_MOVABLE.
-
-Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge
-pages pool can easily grow or shrink within. Assuming that applications are
-not running that mlock() a lot of memory, it is likely the huge pages pool
-can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
-into nr_hugepages and triggering page reclaim.
-
-==============================================================
-
 hugetlb_shm_group
 
 hugetlb_shm_group contains group id that is allowed to create SysV
diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index e33f07f..c97e5c5 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -35,7 +35,6 @@ int PageHuge(struct page *page);
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
-int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
 
 #ifdef CONFIG_NUMA
 int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int,
@@ -73,7 +72,6 @@ void migrate_hugepage_add(struct page *page, struct list_head *list);
 int is_hugepage_movable(struct page *page);
 void copy_huge_page(struct page *dst, struct page *src);
 
-extern unsigned long hugepages_treat_as_movable;
 extern const unsigned long hugetlb_zero, hugetlb_infinity;
 extern int sysctl_hugetlb_shm_group;
 extern struct list_head huge_boot_pages;
diff --git v3.8.orig/kernel/sysctl.c v3.8/kernel/sysctl.c
index c88878d..a98bcf2 100644
--- v3.8.orig/kernel/sysctl.c
+++ v3.8/kernel/sysctl.c
@@ -1189,13 +1189,6 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	 },
-	 {
-		.procname	= "hugepages_treat_as_movable",
-		.data		= &hugepages_treat_as_movable,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= hugetlb_treat_movable_handler,
-	},
 	{
 		.procname	= "nr_overcommit_hugepages",
 		.data		= NULL,
diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
index c28e6c9..c60d203 100644
--- v3.8.orig/mm/hugetlb.c
+++ v3.8/mm/hugetlb.c
@@ -33,7 +33,6 @@
 #include "internal.h"
 
 const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
-static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
 int hugetlb_max_hstate __read_mostly;
@@ -542,7 +541,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 retry_cpuset:
 	cpuset_mems_cookie = get_mems_allowed();
 	zonelist = huge_zonelist(vma, address,
-					htlb_alloc_mask, &mpol, &nodemask);
+					GFP_HIGHUSER_MOVABLE, &mpol, &nodemask);
 	/*
 	 * A child process with MAP_PRIVATE mappings created by their parent
 	 * have no page reserves. This check ensures that reservations are
@@ -558,7 +557,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 						MAX_NR_ZONES - 1, nodemask) {
-		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
+		if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) {
 			page = dequeue_huge_page_node(h, zone_to_nid(zone));
 			if (page) {
 				if (!avoid_reserve)
@@ -698,7 +697,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 		return NULL;
 
 	page = alloc_pages_exact_node(nid,
-		htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
+		GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
 						__GFP_REPEAT|__GFP_NOWARN,
 		huge_page_order(h));
 	if (page) {
@@ -909,12 +908,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 	spin_unlock(&hugetlb_lock);
 
 	if (nid == NUMA_NO_NODE)
-		page = alloc_pages(htlb_alloc_mask|__GFP_COMP|
+		page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP|
 				   __GFP_REPEAT|__GFP_NOWARN,
 				   huge_page_order(h));
 	else
 		page = alloc_pages_exact_node(nid,
-			htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
+			GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
 			__GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));
 
 	if (page && arch_prepare_hugepage(page)) {
@@ -2078,18 +2077,6 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *table, int write,
 }
 #endif /* CONFIG_NUMA */
 
-int hugetlb_treat_movable_handler(struct ctl_table *table, int write,
-			void __user *buffer,
-			size_t *length, loff_t *ppos)
-{
-	proc_dointvec(table, write, buffer, length, ppos);
-	if (hugepages_treat_as_movable)
-		htlb_alloc_mask = GFP_HIGHUSER_MOVABLE;
-	else
-		htlb_alloc_mask = GFP_HIGHUSER;
-	return 0;
-}
-
 int hugetlb_overcommit_handler(struct ctl_table *table, int write,
 			void __user *buffer,
 			size_t *length, loff_t *ppos)
-- 
1.7.11.7
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
@ 2013-02-23  7:05   ` Hillf Danton
  2013-02-25 16:57     ` Naoya Horiguchi
  2013-02-27  7:36   ` Chen Gong
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 55+ messages in thread
From: Hillf Danton @ 2013-02-23  7:05 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel, Hillf Danton,
	Michal Hocko
Hello Naoya
[add Michal in cc list]
On Fri, Feb 22, 2013 at 3:41 AM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
>
> +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> +int is_hugepage_movable(struct page *hpage)
s/int/bool/  can we?
> +{
> +       struct page *page;
> +       struct page *tmp;
> +       struct hstate *h = page_hstate(hpage);
Make sense to compute hstate for a tail page?
> +       int ret = 0;
> +
> +       VM_BUG_ON(!PageHuge(hpage));
> +       if (PageTail(hpage))
> +               return 0;
VM_BUG_ON(!PageHuge(hpage) || PageTail(hpage)), can we?
> +       spin_lock(&hugetlb_lock);
> +       list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
s/_safe//  can we?
> +               if (page == hpage)
> +                       ret = 1;
Can we bail out with ret set to be true?
> +       spin_unlock(&hugetlb_lock);
> +       return ret;
> +}
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-02-23  7:05   ` Hillf Danton
@ 2013-02-25 16:57     ` Naoya Horiguchi
  0 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-25 16:57 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel, Michal Hocko
Hi Hillf,
On Sat, Feb 23, 2013 at 03:05:30PM +0800, Hillf Danton wrote:
> Hello Naoya
> 
> [add Michal in cc list]
> 
> On Fri, Feb 22, 2013 at 3:41 AM, Naoya Horiguchi
> <n-horiguchi@ah.jp.nec.com> wrote:
> >
> > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> > +int is_hugepage_movable(struct page *hpage)
> s/int/bool/  can we?
Yes, we can. I'll do this.
> > +{
> > +       struct page *page;
> > +       struct page *tmp;
> > +       struct hstate *h = page_hstate(hpage);
> Make sense to compute hstate for a tail page?
No need to do this here.
It's better to put it after PageTail check.
> > +       int ret = 0;
> > +
> > +       VM_BUG_ON(!PageHuge(hpage));
> > +       if (PageTail(hpage))
> > +               return 0;
> VM_BUG_ON(!PageHuge(hpage) || PageTail(hpage)), can we?
I think that firing BUG_ON() for tail pages is overkill.
The pfn range that scan_movable_pages() walks can start in the
middle of a hugepage when we try to hot-remove a memory block
that is part of a 1GB hugepage. In that case,
is_hugepage_movable() is legitimately called for tail pages.
Anyway, I'll add a comment for this corner case.
> > +       spin_lock(&hugetlb_lock);
> > +       list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> s/_safe//  can we?
OK.
> > +               if (page == hpage)
> > +                       ret = 1;
> Can we bail out with ret set to be true?
Yes, inserting break is good for performance.
> > +       spin_unlock(&hugetlb_lock);
> > +       return ret;
> > +}
Thank you!
Naoya
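For illustration, the changes agreed on in this reply (bool return type,
early return for tail pages, a plain list walk instead of the _safe variant,
and a break on the first match) might look like the user-space sketch below.
It is not the follow-up patch: `struct mock_hpage`, `struct mock_hstate` and
`mock_is_hugepage_movable()` are hypothetical stand-ins, the hstate is passed
in rather than derived from the page, and hugetlb_lock is only mentioned in
a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct page and its lru link. */
struct mock_hpage {
	bool is_tail;
	struct mock_hpage *next;	/* link on the active list */
};

/* Hypothetical stand-in for struct hstate and its hugepage_activelist. */
struct mock_hstate {
	struct mock_hpage *active;	/* first in-use hugepage, or NULL */
};

/* Returns true only for head pages found on the active list. */
static bool mock_is_hugepage_movable(const struct mock_hstate *h,
				     const struct mock_hpage *hpage)
{
	bool ret = false;

	assert(h != NULL && hpage != NULL);

	/* Tail pages are a normal input here, so return instead of BUG. */
	if (hpage->is_tail)
		return false;

	/* The real code would hold hugetlb_lock around this walk. */
	for (const struct mock_hpage *p = h->active; p != NULL; p = p->next) {
		if (p == hpage) {
			ret = true;
			break;	/* stop at the first match */
		}
	}
	return ret;
}
```

Nothing is removed during the walk, so a plain forward iteration is enough,
and breaking out early avoids visiting the rest of the list once the page has
been found.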
* Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
  2013-02-21 19:41 ` [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Naoya Horiguchi
@ 2013-02-27  7:25   ` Chen Gong
  2013-02-27 17:06     ` Naoya Horiguchi
  0 siblings, 1 reply; 55+ messages in thread
From: Chen Gong @ 2013-02-27  7:25 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu, Feb 21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote:
> Date: Thu, 21 Feb 2013 14:41:42 -0500
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> To: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>, Mel Gorman <mel@csn.ul.ie>,
>  Hugh Dickins <hughd@google.com>, KOSAKI Motohiro
>  <kosaki.motohiro@jp.fujitsu.com>, Andi Kleen <andi@firstfloor.org>,
>  linux-kernel@vger.kernel.org
> Subject: [PATCH 3/9] soft-offline: use migrate_pages() instead of
>  migrate_huge_page()
> 
> Currently migrate_huge_page() takes a pointer to a hugepage to be
> migrated as an argument, instead of taking a pointer to the list of
> hugepages to be migrated. This behavior was introduced in commit
> 189ebff28 ("hugetlb: simplify migrate_huge_page()"), and was OK
> because until now hugepage migration is enabled only for soft-offlining
> which takes only one hugepage in a single call.
> 
> But the situation will change in the later patches in this series
> which enable other users of page migration to support hugepage migration.
> They can kick migration for both of normal pages and hugepages
> in a single call, so we need to go back to original implementation
> of using linked lists to collect the hugepages to be migrated.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/memory-failure.c | 20 ++++++++++++++++----
>  mm/migrate.c        |  2 ++
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c
> index bc126f6..01e4676 100644
> --- v3.8.orig/mm/memory-failure.c
> +++ v3.8/mm/memory-failure.c
> @@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	int ret;
>  	unsigned long pfn = page_to_pfn(page);
>  	struct page *hpage = compound_head(page);
> +	LIST_HEAD(pagelist);
>  
>  	/* Synchronized using the page lock with memory_failure() */
>  	lock_page(hpage);
> @@ -1479,13 +1480,24 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	unlock_page(hpage);
>  
>  	/* Keep page count to indicate a given hugepage is isolated. */
> -	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false,
> -				MIGRATE_SYNC);
> -	put_page(hpage);
> +	list_move(&hpage->lru, &pagelist);
> +	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, false,
> +				MIGRATE_SYNC, MR_MEMORY_FAILURE);
>  	if (ret) {
>  		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  			pfn, ret, page->flags);
> -		return ret;
> +		/*
> +		 * We know that soft_offline_huge_page() tries to migrate
> +		 * only one hugepage pointed to by hpage, so we need not
> +		 * run through the pagelist here.
> +		 */
> +		putback_active_hugepage(hpage);
> +		if (ret > 0)
> +			ret = -EIO;
> +	} else {
> +		set_page_hwpoison_huge_page(hpage);
> +		dequeue_hwpoisoned_huge_page(hpage);
> +		atomic_long_add(1<<compound_trans_order(hpage), &mce_bad_pages);
mce_bad_pages has been replaced by num_poisoned_pages.
[...]
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
  2013-02-23  7:05   ` Hillf Danton
@ 2013-02-27  7:36   ` Chen Gong
  2013-02-27 17:16     ` Naoya Horiguchi
  2013-03-18 16:07   ` Michal Hocko
  2013-03-20  1:03   ` Simon Jeons
  3 siblings, 1 reply; 55+ messages in thread
From: Chen Gong @ 2013-02-27  7:36 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu, Feb 21, 2013 at 02:41:47PM -0500, Naoya Horiguchi wrote:
> Date: Thu, 21 Feb 2013 14:41:47 -0500
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> To: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>, Mel Gorman <mel@csn.ul.ie>,
>  Hugh Dickins <hughd@google.com>, KOSAKI Motohiro
>  <kosaki.motohiro@jp.fujitsu.com>, Andi Kleen <andi@firstfloor.org>,
>  linux-kernel@vger.kernel.org
> Subject: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle
>  hugepage
> 
> Currently we can't offline memory blocks which contain hugepages because
> a hugepage is considered as an unmovable page. But now with this patch
> series, a hugepage has become movable, so by using hugepage migration we
> can offline such memory blocks.
> 
> What's different from other users of hugepage migration is that we need
> to decompose all the hugepages inside the target memory block into free
> buddy pages after hugepage migration, because otherwise free hugepages
> remaining in the memory block intervene the memory offlining.
> For this reason we introduce new functions dissolve_free_huge_page() and
> dissolve_free_huge_pages().
> 
> Other than that, what this patch does is straightforwardly to add hugepage
> migration code, that is, adding hugepage code to the functions which scan
> over pfn and collect hugepages to be migrated, and adding a hugepage
> allocation function to alloc_migrate_target().
> 
> As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
> over them because it's larger than memory block. So we now simply leave
> it to fail as it is.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  include/linux/hugetlb.h |  8 ++++++++
>  mm/hugetlb.c            | 43 +++++++++++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c     | 51 ++++++++++++++++++++++++++++++++++++++++---------
>  mm/migrate.c            | 12 +++++++++++-
>  mm/page_alloc.c         | 12 ++++++++++++
>  mm/page_isolation.c     |  5 +++++
>  6 files changed, 121 insertions(+), 10 deletions(-)
> 
> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> index 86a4d78..e33f07f 100644
> --- v3.8.orig/include/linux/hugetlb.h
> +++ v3.8/include/linux/hugetlb.h
> @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
>  void putback_active_hugepage(struct page *page);
>  void putback_active_hugepages(struct list_head *l);
>  void migrate_hugepage_add(struct page *page, struct list_head *list);
> +int is_hugepage_movable(struct page *page);
>  void copy_huge_page(struct page *dst, struct page *src);
>  
>  extern unsigned long hugepages_treat_as_movable;
> @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
>  #define putback_active_hugepage(p) 0
>  #define putback_active_hugepages(l) 0
>  #define migrate_hugepage_add(p, l) 0
> +#define is_hugepage_movable(x) 0
>  static inline void copy_huge_page(struct page *dst, struct page *src)
>  {
>  }
> @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h)
>  	return h - hstates;
>  }
>  
> +extern void dissolve_free_huge_page(struct page *page);
> +extern void dissolve_free_huge_pages(unsigned long start_pfn,
> +				     unsigned long end_pfn);
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
> @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  }
>  #define hstate_index_to_shift(index) 0
>  #define hstate_index(h) 0
> +#define dissolve_free_huge_page(p) 0
> +#define dissolve_free_huge_pages(s, e) 0
>  #endif
>  
>  #endif /* _LINUX_HUGETLB_H */
> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> index ccf9995..c28e6c9 100644
> --- v3.8.orig/mm/hugetlb.c
> +++ v3.8/mm/hugetlb.c
> @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>  	return ret;
>  }
>  
> +/* Dissolve a given free hugepage into free pages. */
> +void dissolve_free_huge_page(struct page *page)
> +{
> +	if (PageHuge(page) && !page_count(page)) {
> +		struct hstate *h = page_hstate(page);
> +		int nid = page_to_nid(page);
> +		spin_lock(&hugetlb_lock);
> +		list_del(&page->lru);
> +		h->free_huge_pages--;
> +		h->free_huge_pages_node[nid]--;
> +		update_and_free_page(h, page);
> +		spin_unlock(&hugetlb_lock);
> +	}
> +}
> +
> +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */
> +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +	unsigned int step = 1 << (HUGETLB_PAGE_ORDER);
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += step)
> +		dissolve_free_huge_page(pfn_to_page(pfn));
> +}
> +
>  static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  {
>  	struct page *page;
> @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage)
>  	return 0;
>  }
>  
> +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> +int is_hugepage_movable(struct page *hpage)
> +{
> +	struct page *page;
> +	struct page *tmp;
> +	struct hstate *h = page_hstate(hpage);
> +	int ret = 0;
> +
> +	VM_BUG_ON(!PageHuge(hpage));
> +	if (PageTail(hpage))
> +		return 0;
> +	spin_lock(&hugetlb_lock);
> +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> +		if (page == hpage)
> +			ret = 1;
I don't understand the logic here. 1) The page is not removed from the
list, so why is tmp (i.e. the _safe iterator) used?
2) Why doesn't the loop break once (page == hpage) is hit?
> +	spin_unlock(&hugetlb_lock);
> +	return ret;
> +}
> +
> [...]
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
  2013-02-27  7:25   ` Chen Gong
@ 2013-02-27 17:06     ` Naoya Horiguchi
  2013-02-27 17:57       ` Naoya Horiguchi
  0 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-27 17:06 UTC (permalink / raw)
  To: gong.chen
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed, Feb 27, 2013 at 02:25:17AM -0500, Chen Gong wrote:
> On Thu, Feb 21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote:
> > Date: Thu, 21 Feb 2013 14:41:42 -0500
...
> > diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c
> > index bc126f6..01e4676 100644
> > --- v3.8.orig/mm/memory-failure.c
> > +++ v3.8/mm/memory-failure.c
...
> > +		atomic_long_add(1<<compound_trans_order(hpage), &mce_bad_pages);
> 
> mce_bad_pages has been substituted by num_poisoned_pages.
This patchset is based on v3.8 (as shown in the diff header), where the
replacing patch "memory-failure: use num_poisoned_pages instead of
mce_bad_pages" is not merged yet. I'll rebase on v3.8-rc1 in the
next post.
Thanks,
Naoya
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-02-27  7:36   ` Chen Gong
@ 2013-02-27 17:16     ` Naoya Horiguchi
  0 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-27 17:16 UTC (permalink / raw)
  To: gong.chen
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed, Feb 27, 2013 at 02:36:04AM -0500, Chen Gong wrote:
> On Thu, Feb 21, 2013 at 02:41:47PM -0500, Naoya Horiguchi wrote:
...
> > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage)
> >  	return 0;
> >  }
> >  
> > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> > +int is_hugepage_movable(struct page *hpage)
> > +{
> > +	struct page *page;
> > +	struct page *tmp;
> > +	struct hstate *h = page_hstate(hpage);
> > +	int ret = 0;
> > +
> > +	VM_BUG_ON(!PageHuge(hpage));
> > +	if (PageTail(hpage))
> > +		return 0;
> > +	spin_lock(&hugetlb_lock);
> > +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> > +		if (page == hpage)
> > +			ret = 1;
> 
> I don't understand the logic here. 1) page is not removed why tmp is used?
> 2) why hitting (page ==hpage) but not breaking from the loop?
For question 1), using list_for_each_entry_safe() was a remnant of
trial and error and will be fixed. And for question 2), I will add
a break in a later version.
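
For illustration only, here is a userspace sketch of the two fixes
discussed above: a plain (non-_safe) walk, since nothing is unlinked
during the scan, and a break on the first match. The list helpers,
struct fake_page and on_active_list() are hypothetical stand-ins, not
the kernel's list.h or the code of the next version, and the
hugetlb_lock locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct page; only the lru linkage matters. */
struct fake_page {
	int id;
	struct list_head lru;
};

/* Recover the containing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Read-only membership test. No entry is removed while walking, so no
 * lookahead pointer is needed, and the walk stops at the first match.
 */
static int on_active_list(struct list_head *activelist,
			  struct fake_page *hpage)
{
	struct list_head *pos;
	int ret = 0;

	for (pos = activelist->next; pos != activelist; pos = pos->next) {
		if (list_entry(pos, struct fake_page, lru) == hpage) {
			ret = 1;
			break;
		}
	}
	return ret;
}
```

The result is the same as in the posted loop; the only difference is
that the walk no longer visits the rest of the list after a hit.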
Thanks,
Naoya
> > +	spin_unlock(&hugetlb_lock);
> > +	return ret;
> > +}
> > +
> > [...]
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
  2013-02-27 17:06     ` Naoya Horiguchi
@ 2013-02-27 17:57       ` Naoya Horiguchi
  0 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-27 17:57 UTC (permalink / raw)
  To: gong.chen
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed, Feb 27, 2013 at 12:06:27PM -0500, Naoya Horiguchi wrote:
> On Wed, Feb 27, 2013 at 02:25:17AM -0500, Chen Gong wrote:
> > On Thu, Feb 21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote:
> > > Date: Thu, 21 Feb 2013 14:41:42 -0500
> ...
> > > diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c
> > > index bc126f6..01e4676 100644
> > > --- v3.8.orig/mm/memory-failure.c
> > > +++ v3.8/mm/memory-failure.c
> ...
> > > +		atomic_long_add(1<<compound_trans_order(hpage), &mce_bad_pages);
> > 
> > mce_bad_pages has been substituted by num_poisoned_pages.
> 
> This patchset is based on v3.8 (as show in diff header), where the
> replacing patch "memory-failure: use num_poisoned_pages instead of
> mce_bad_pages" is not merged yet. I'll rebase on v3.8-rc1 in the
> next post.
sorry, s/v3.8-rc1/v3.9-rc1/
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable
  2013-02-21 19:41 ` [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Naoya Horiguchi
@ 2013-02-28  6:02   ` KOSAKI Motohiro
  2013-02-28 18:16     ` Naoya Horiguchi
  2013-03-18 15:51   ` Michal Hocko
  1 sibling, 1 reply; 55+ messages in thread
From: KOSAKI Motohiro @ 2013-02-28  6:02 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm@kvack.org, Andrew Morton, Mel Gorman, Hugh Dickins,
	Andi Kleen, LKML
> -        {
> -               .procname       = "hugepages_treat_as_movable",
> -               .data           = &hugepages_treat_as_movable,
> -               .maxlen         = sizeof(int),
> -               .mode           = 0644,
> -               .proc_handler   = hugetlb_treat_movable_handler,
> -       },
Sorry, no.
Removing this outright is too aggressive. Consider that a lot of shell
scripts don't have any error checking.
I suggest keeping this file but changing it to a no-op (printing a warning
would be better). After about 1-2 years, we can remove this file safely.
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable
  2013-02-28  6:02   ` KOSAKI Motohiro
@ 2013-02-28 18:16     ` Naoya Horiguchi
  0 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-02-28 18:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm@kvack.org, Andrew Morton, Mel Gorman, Hugh Dickins,
	Andi Kleen, LKML
On Thu, Feb 28, 2013 at 01:02:37AM -0500, KOSAKI Motohiro wrote:
> > -        {
> > -               .procname       = "hugepages_treat_as_movable",
> > -               .data           = &hugepages_treat_as_movable,
> > -               .maxlen         = sizeof(int),
> > -               .mode           = 0644,
> > -               .proc_handler   = hugetlb_treat_movable_handler,
> > -       },
> 
> Sorry, no.
> 
> This is too aggressive remove. Imagine, a lot of shell script don't
> have any error check.
Sure, it could break userspace applications.
> I suggest to keep this file but change to nop (to output warning is better).
> About 1-2 years after, we can remove this file safely.
OK, so I'll leave it for a while with the comment saying that this
parameter is obsolete and shouldn't be used.
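
As a rough userspace model of that plan (not the actual sysctl handler):
writes to the obsolete knob are accepted so that unchecked scripts keep
working, the value is ignored, and a warning is printed only once. The
enum values, treat_movable_write() and the warning text are all
hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the gfp masks; the values are arbitrary. */
enum alloc_mask { MASK_HIGHUSER, MASK_HIGHUSER_MOVABLE };

/* With hugepages movable, the allocation mask no longer depends on the knob. */
static enum alloc_mask htlb_alloc_mask = MASK_HIGHUSER_MOVABLE;

/* Number of warnings emitted so far. */
static int obsolete_warnings;

/*
 * Model of a write to the obsolete knob: accept any value, ignore it,
 * warn on the first write only, and always report success.
 */
static int treat_movable_write(long value)
{
	(void)value;	/* intentionally ignored: the knob is a no-op */
	if (obsolete_warnings == 0) {
		fprintf(stderr,
			"hugepages_treat_as_movable is obsolete and has no effect\n");
		obsolete_warnings++;
	}
	return 0;
}
```

Returning success unconditionally matches the removed handler, which
also returned 0 regardless of the value written.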
Thanks,
Naoya
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
  2013-02-21 19:41 ` [PATCH 1/9] migrate: add migrate_entry_wait_huge() Naoya Horiguchi
@ 2013-03-18 14:51   ` Michal Hocko
  2013-03-19  0:06     ` Naoya Horiguchi
  2013-03-19 23:57   ` Simon Jeons
  1 sibling, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-18 14:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu 21-02-13 14:41:40, Naoya Horiguchi wrote:
[...]
> diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
> index 2fd8b4a..7d84f4c 100644
> --- v3.8.orig/mm/migrate.c
> +++ v3.8/mm/migrate.c
> @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  	pte_unmap_unlock(ptep, ptl);
>  }
>  
> +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
> +				unsigned long address)
> +{
> +	spinlock_t *ptl = pte_lockptr(mm, pmd);
> +	pte_t pte;
> +	swp_entry_t entry;
> +	struct page *page;
> +
> +	spin_lock(ptl);
> +	pte = huge_ptep_get((pte_t *)pmd);
> +	if (!is_hugetlb_entry_migration(pte))
> +		goto out;
> +	entry = pte_to_swp_entry(pte);
> +	page = migration_entry_to_page(entry);
> +	if (!get_page_unless_zero(page))
> +		goto out;
> +	spin_unlock(ptl);
> +	wait_on_page_locked(page);
> +	put_page(page);
> +	return;
> +out:
> +	spin_unlock(ptl);
> +}
This duplicates a lot of code from migration_entry_wait. Can we just
teach the generic one to be HugePage aware instead?
All it takes is just open-coding pte_offset_map_lock and calling
huge_ptep_get for HugePage and pte_offset_map otherwise.
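
A toy userspace model of that refactoring, under heavy assumptions: the
two entry points differ only in how they locate the pte, and everything
after that lives in one shared helper. All types and names here
(fake_pte, fake_page, fake_lock, wait_common() and so on) are
hypothetical; the counters merely stand in for get_page_unless_zero(),
wait_on_page_locked() and put_page().

```c
#include <assert.h>
#include <stddef.h>

struct fake_page { int refcount; int waited; };
struct fake_pte  { int is_migration; struct fake_page *page; };
struct fake_lock { int held; };

static void lock_ptl(struct fake_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void unlock_ptl(struct fake_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Shared body: runs once the pte pointer and its lock are known. */
static void wait_common(struct fake_pte *ptep, struct fake_lock *ptl)
{
	struct fake_page *page;

	lock_ptl(ptl);
	if (!ptep->is_migration)
		goto out;
	page = ptep->page;
	if (page->refcount == 0)	/* models get_page_unless_zero() failing */
		goto out;
	page->refcount++;
	unlock_ptl(ptl);
	page->waited++;			/* models wait_on_page_locked() */
	page->refcount--;		/* models put_page() */
	return;
out:
	unlock_ptl(ptl);
}

/* Normal pages: the pte is looked up inside a page table (an array here). */
static void wait_normal(struct fake_pte *table, size_t idx,
			struct fake_lock *ptl)
{
	wait_common(&table[idx], ptl);
}

/* Hugepages: the pmd slot itself is read as the pte. */
static void wait_huge(struct fake_pte *pmd_as_pte, struct fake_lock *ptl)
{
	wait_common(pmd_as_pte, ptl);
}
```

Only the lookup step differs between the two wrappers, which is the
point of the suggestion; whether the real locking can be shared this
way is for the actual patch to show.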
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 2/9] migrate: make core migration code aware of hugepage
  2013-02-21 19:41 ` [PATCH 2/9] migrate: make core migration code aware of hugepage Naoya Horiguchi
@ 2013-03-18 15:22   ` Michal Hocko
  2013-03-18 15:33     ` Michal Hocko
  0 siblings, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-18 15:22 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote:
[...]
> diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h
> index 0d7df39..2e475b5 100644
> --- v3.8.orig/include/linux/mempolicy.h
> +++ v3.8/include/linux/mempolicy.h
> @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
>  /* Check if a vma is migratable */
>  static inline int vma_migratable(struct vm_area_struct *vma)
>  {
> -	if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
> +	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>  		return 0;
Is this safe? At least check_*_range don't seem to be hugetlb aware.
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 2/9] migrate: make core migration code aware of hugepage
  2013-03-18 15:22   ` Michal Hocko
@ 2013-03-18 15:33     ` Michal Hocko
  2013-03-19  0:06       ` Naoya Horiguchi
  0 siblings, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-18 15:33 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Mon 18-03-13 16:22:24, Michal Hocko wrote:
> On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote:
> [...]
> > diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h
> > index 0d7df39..2e475b5 100644
> > --- v3.8.orig/include/linux/mempolicy.h
> > +++ v3.8/include/linux/mempolicy.h
> > @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
> >  /* Check if a vma is migratable */
> >  static inline int vma_migratable(struct vm_area_struct *vma)
> >  {
> > -	if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
> > +	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
> >  		return 0;
> 
> Is this safe? At least check_*_range don't seem to be hugetlb aware.
Ohh, they become hugetlb aware in 5/9. Should that one be reordered then?
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-02-21 19:41 ` [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Naoya Horiguchi
@ 2013-03-18 15:40   ` Michal Hocko
  2013-03-19  0:07     ` Naoya Horiguchi
  0 siblings, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-18 15:40 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
> This patch extends check_range() to handle vma with VM_HUGETLB set.
> With this changes, we can migrate hugepage with migrate_pages(2).
> Note that for larger hugepages (covered by pud entries, 1GB for
> x86_64 for example), we simply skip it now.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  include/linux/hugetlb.h |  6 ++++--
>  mm/hugetlb.c            | 10 ++++++++++
>  mm/mempolicy.c          | 46 ++++++++++++++++++++++++++++++++++------------
>  3 files changed, 48 insertions(+), 14 deletions(-)
> 
> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> index 8f87115..eb33df5 100644
> --- v3.8.orig/include/linux/hugetlb.h
> +++ v3.8/include/linux/hugetlb.h
> @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
>  int dequeue_hwpoisoned_huge_page(struct page *page);
>  void putback_active_hugepage(struct page *page);
>  void putback_active_hugepages(struct list_head *l);
> +void migrate_hugepage_add(struct page *page, struct list_head *list);
>  void copy_huge_page(struct page *dst, struct page *src);
>  
>  extern unsigned long hugepages_treat_as_movable;
> @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  				pmd_t *pmd, int write);
>  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  				pud_t *pud, int write);
> -int pmd_huge(pmd_t pmd);
> -int pud_huge(pud_t pmd);
> +extern int pmd_huge(pmd_t pmd);
> +extern int pud_huge(pud_t pmd);
extern is not needed here.
>  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>  		unsigned long address, unsigned long end, pgprot_t newprot);
>  
> @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
>  
>  #define putback_active_hugepage(p) 0
>  #define putback_active_hugepages(l) 0
> +#define migrate_hugepage_add(p, l) 0
>  static inline void copy_huge_page(struct page *dst, struct page *src)
>  {
>  }
> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> index cb9d43b8..86ffcb7 100644
> --- v3.8.orig/mm/hugetlb.c
> +++ v3.8/mm/hugetlb.c
> @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
>  	list_for_each_entry_safe(page, page2, l, lru)
>  		putback_active_hugepage(page);
>  }
> +
> +void migrate_hugepage_add(struct page *page, struct list_head *list)
> +{
> +	VM_BUG_ON(!PageHuge(page));
> +	get_page(page);
> +	spin_lock(&hugetlb_lock);
Why hugetlb_lock? Comment for this lock says that it protects
hugepage_freelists, nr_huge_pages, and free_huge_pages.
> +	list_move_tail(&page->lru, list);
> +	spin_unlock(&hugetlb_lock);
> +	return;
> +}
> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> index e2df1c1..8627135 100644
> --- v3.8.orig/mm/mempolicy.c
> +++ v3.8/mm/mempolicy.c
> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  	return addr != end;
>  }
>  
> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> +		const nodemask_t *nodes, unsigned long flags,
> +				    void *private)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> +	int nid;
> +	struct page *page;
> +
> +	spin_lock(&vma->vm_mm->page_table_lock);
> +	page = pte_page(huge_ptep_get((pte_t *)pmd));
> +	spin_unlock(&vma->vm_mm->page_table_lock);
I am a bit confused why page_table_lock is used here and why it doesn't
cover the page usage.
> +	nid = page_to_nid(page);
> +	if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> +	    && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> +		|| flags & MPOL_MF_MOVE_ALL))
> +		migrate_hugepage_add(page, private);
> +#else
> +	BUG();
> +#endif
> +}
> +
>  static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
>  		unsigned long addr, unsigned long end,
>  		const nodemask_t *nodes, unsigned long flags,
> @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
>  	pmd = pmd_offset(pud, addr);
>  	do {
>  		next = pmd_addr_end(addr, end);
> +		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
sufficient?
> +			check_hugetlb_pmd_range(vma, pmd, nodes,
> +						flags, private);
> +			continue;
> +		}
>  		split_huge_page_pmd(vma, addr, pmd);
>  		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
>  			continue;
[...]
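
For readers untangling the condition in check_hugetlb_pmd_range()
quoted above, it can be read as the predicate below. This is a
standalone sketch: the SK_MF_* bits are placeholders with arbitrary
values (the real MPOL_MF_* definitions live in the kernel headers), and
should_queue_hugepage() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits; not the kernel's MPOL_MF_* values. */
#define SK_MF_MOVE	(1u << 0)
#define SK_MF_MOVE_ALL	(1u << 1)
#define SK_MF_INVERT	(1u << 2)

/*
 * A hugepage is queued for migration when
 *  - its node matches the nodemask (or does not match, if INVERT is set), and
 *  - either MOVE is set and the page is mapped exactly once,
 *    or MOVE_ALL is set.
 */
static bool should_queue_hugepage(bool node_in_mask, int mapcount,
				  unsigned int flags)
{
	bool invert = (flags & SK_MF_INVERT) != 0;
	bool node_matches = node_in_mask != invert;
	bool may_move = ((flags & SK_MF_MOVE) && mapcount == 1) ||
			(flags & SK_MF_MOVE_ALL);

	return node_matches && may_move;
}
```

For example, with only SK_MF_MOVE set, a page mapped by two processes
is skipped, while SK_MF_MOVE_ALL queues it regardless of mapcount.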
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable
  2013-02-21 19:41 ` [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Naoya Horiguchi
  2013-02-28  6:02   ` KOSAKI Motohiro
@ 2013-03-18 15:51   ` Michal Hocko
  2013-03-19  0:07     ` Naoya Horiguchi
  1 sibling, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-18 15:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu 21-02-13 14:41:48, Naoya Horiguchi wrote:
> Now hugepages are definitely movable. So allocating hugepages from
> ZONE_MOVABLE is natural and we have no reason to keep this parameter.
The sysctl is part of the user interface, so you shouldn't remove it right
away. What we can do is make it a noop and only WARN() that the
interface will be removed later, so that userspace can prepare for that.
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  Documentation/sysctl/vm.txt | 16 ----------------
>  include/linux/hugetlb.h     |  2 --
>  kernel/sysctl.c             |  7 -------
>  mm/hugetlb.c                | 23 +++++------------------
>  4 files changed, 5 insertions(+), 43 deletions(-)
> 
> diff --git v3.8.orig/Documentation/sysctl/vm.txt v3.8/Documentation/sysctl/vm.txt
> index 078701f..997350a 100644
> --- v3.8.orig/Documentation/sysctl/vm.txt
> +++ v3.8/Documentation/sysctl/vm.txt
> @@ -167,22 +167,6 @@ fragmentation index is <= extfrag_threshold. The default value is 500.
>  
>  ==============================================================
>  
> -hugepages_treat_as_movable
> -
> -This parameter is only useful when kernelcore= is specified at boot time to
> -create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
> -are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
> -value written to hugepages_treat_as_movable allows huge pages to be allocated
> -from ZONE_MOVABLE.
> -
> -Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge
> -pages pool can easily grow or shrink within. Assuming that applications are
> -not running that mlock() a lot of memory, it is likely the huge pages pool
> -can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
> -into nr_hugepages and triggering page reclaim.
> -
> -==============================================================
> -
>  hugetlb_shm_group
>  
>  hugetlb_shm_group contains group id that is allowed to create SysV
> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> index e33f07f..c97e5c5 100644
> --- v3.8.orig/include/linux/hugetlb.h
> +++ v3.8/include/linux/hugetlb.h
> @@ -35,7 +35,6 @@ int PageHuge(struct page *page);
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
>  int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> -int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
>  
>  #ifdef CONFIG_NUMA
>  int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int,
> @@ -73,7 +72,6 @@ void migrate_hugepage_add(struct page *page, struct list_head *list);
>  int is_hugepage_movable(struct page *page);
>  void copy_huge_page(struct page *dst, struct page *src);
>  
> -extern unsigned long hugepages_treat_as_movable;
>  extern const unsigned long hugetlb_zero, hugetlb_infinity;
>  extern int sysctl_hugetlb_shm_group;
>  extern struct list_head huge_boot_pages;
> diff --git v3.8.orig/kernel/sysctl.c v3.8/kernel/sysctl.c
> index c88878d..a98bcf2 100644
> --- v3.8.orig/kernel/sysctl.c
> +++ v3.8/kernel/sysctl.c
> @@ -1189,13 +1189,6 @@ static struct ctl_table vm_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	 },
> -	 {
> -		.procname	= "hugepages_treat_as_movable",
> -		.data		= &hugepages_treat_as_movable,
> -		.maxlen		= sizeof(int),
> -		.mode		= 0644,
> -		.proc_handler	= hugetlb_treat_movable_handler,
> -	},
>  	{
>  		.procname	= "nr_overcommit_hugepages",
>  		.data		= NULL,
> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> index c28e6c9..c60d203 100644
> --- v3.8.orig/mm/hugetlb.c
> +++ v3.8/mm/hugetlb.c
> @@ -33,7 +33,6 @@
>  #include "internal.h"
>  
>  const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
> -static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>  
>  int hugetlb_max_hstate __read_mostly;
> @@ -542,7 +541,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  retry_cpuset:
>  	cpuset_mems_cookie = get_mems_allowed();
>  	zonelist = huge_zonelist(vma, address,
> -					htlb_alloc_mask, &mpol, &nodemask);
> +					GFP_HIGHUSER_MOVABLE, &mpol, &nodemask);
>  	/*
>  	 * A child process with MAP_PRIVATE mappings created by their parent
>  	 * have no page reserves. This check ensures that reservations are
> @@ -558,7 +557,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  
>  	for_each_zone_zonelist_nodemask(zone, z, zonelist,
>  						MAX_NR_ZONES - 1, nodemask) {
> -		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
> +		if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) {
>  			page = dequeue_huge_page_node(h, zone_to_nid(zone));
>  			if (page) {
>  				if (!avoid_reserve)
> @@ -698,7 +697,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
>  		return NULL;
>  
>  	page = alloc_pages_exact_node(nid,
> -		htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
> +		GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
>  						__GFP_REPEAT|__GFP_NOWARN,
>  		huge_page_order(h));
>  	if (page) {
> @@ -909,12 +908,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  	spin_unlock(&hugetlb_lock);
>  
>  	if (nid == NUMA_NO_NODE)
> -		page = alloc_pages(htlb_alloc_mask|__GFP_COMP|
> +		page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP|
>  				   __GFP_REPEAT|__GFP_NOWARN,
>  				   huge_page_order(h));
>  	else
>  		page = alloc_pages_exact_node(nid,
> -			htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
> +			GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
>  			__GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));
>  
>  	if (page && arch_prepare_hugepage(page)) {
> @@ -2078,18 +2077,6 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *table, int write,
>  }
>  #endif /* CONFIG_NUMA */
>  
> -int hugetlb_treat_movable_handler(struct ctl_table *table, int write,
> -			void __user *buffer,
> -			size_t *length, loff_t *ppos)
> -{
> -	proc_dointvec(table, write, buffer, length, ppos);
> -	if (hugepages_treat_as_movable)
> -		htlb_alloc_mask = GFP_HIGHUSER_MOVABLE;
> -	else
> -		htlb_alloc_mask = GFP_HIGHUSER;
> -	return 0;
> -}
> -
>  int hugetlb_overcommit_handler(struct ctl_table *table, int write,
>  			void __user *buffer,
>  			size_t *length, loff_t *ppos)
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
  2013-02-23  7:05   ` Hillf Danton
  2013-02-27  7:36   ` Chen Gong
@ 2013-03-18 16:07   ` Michal Hocko
  2013-03-20  3:55     ` Naoya Horiguchi
  2013-03-20  1:03   ` Simon Jeons
  3 siblings, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-18 16:07 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote:
> Currently we can't offline memory blocks which contain hugepages because
> a hugepage is considered as an unmovable page. But now with this patch
> series, a hugepage has become movable, so by using hugepage migration we
> can offline such memory blocks.
> 
> What's different from other users of hugepage migration is that we need
> to decompose all the hugepages inside the target memory block into free
> buddy pages after hugepage migration, because otherwise free hugepages
> remaining in the memory block intervene the memory offlining.
> For this reason we introduce new functions dissolve_free_huge_page() and
> dissolve_free_huge_pages().
> 
> Other than that, what this patch does is straightforwardly to add hugepage
> migration code, that is, adding hugepage code to the functions which scan
> over pfn and collect hugepages to be migrated, and adding a hugepage
> allocation function to alloc_migrate_target().
> 
> As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
> over them because it's larger than memory block. So we now simply leave
> it to fail as it is.
What we could do is check whether there is a free 1GB huge page on
another node and migrate there.
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  include/linux/hugetlb.h |  8 ++++++++
>  mm/hugetlb.c            | 43 +++++++++++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c     | 51 ++++++++++++++++++++++++++++++++++++++++---------
>  mm/migrate.c            | 12 +++++++++++-
>  mm/page_alloc.c         | 12 ++++++++++++
>  mm/page_isolation.c     |  5 +++++
>  6 files changed, 121 insertions(+), 10 deletions(-)
> 
> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> index 86a4d78..e33f07f 100644
> --- v3.8.orig/include/linux/hugetlb.h
> +++ v3.8/include/linux/hugetlb.h
> @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
>  void putback_active_hugepage(struct page *page);
>  void putback_active_hugepages(struct list_head *l);
>  void migrate_hugepage_add(struct page *page, struct list_head *list);
> +int is_hugepage_movable(struct page *page);
>  void copy_huge_page(struct page *dst, struct page *src);
>  
>  extern unsigned long hugepages_treat_as_movable;
> @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
>  #define putback_active_hugepage(p) 0
>  #define putback_active_hugepages(l) 0
>  #define migrate_hugepage_add(p, l) 0
> +#define is_hugepage_movable(x) 0
>  static inline void copy_huge_page(struct page *dst, struct page *src)
>  {
>  }
> @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h)
>  	return h - hstates;
>  }
>  
> +extern void dissolve_free_huge_page(struct page *page);
> +extern void dissolve_free_huge_pages(unsigned long start_pfn,
> +				     unsigned long end_pfn);
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
> @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  }
>  #define hstate_index_to_shift(index) 0
>  #define hstate_index(h) 0
> +#define dissolve_free_huge_page(p) 0
> +#define dissolve_free_huge_pages(s, e) 0
>  #endif
>  
>  #endif /* _LINUX_HUGETLB_H */
> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> index ccf9995..c28e6c9 100644
> --- v3.8.orig/mm/hugetlb.c
> +++ v3.8/mm/hugetlb.c
> @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>  	return ret;
>  }
>  
> +/* Dissolve a given free hugepage into free pages. */
> +void dissolve_free_huge_page(struct page *page)
> +{
> +	if (PageHuge(page) && !page_count(page)) {
Could you clarify why you are checking page_count here? I assume it is to
make sure the page is free, but what prevents it from being increased before
you take hugetlb_lock?
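One way to close such a window is to repeat the free check once the lock
is held. Below is a minimal userspace sketch of that pattern, not kernel
code: struct pool_model, struct hpage_model and dissolve_if_free() are
hypothetical stand-ins, and a C11 atomic_flag spinlock stands in for
hugetlb_lock. It only helps under the assumption that every path taking a
page off the free list (and raising its refcount) holds the same lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hugepage pool guarded by one lock. */
struct pool_model {
	atomic_flag lock;		/* models hugetlb_lock */
	unsigned long free_pages;	/* models h->free_huge_pages */
};

/* Hypothetical stand-in for the head page of a hugepage. */
struct hpage_model {
	atomic_int refcount;		/* models page_count() */
	bool on_freelist;
};

static void pool_lock(struct pool_model *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void pool_unlock(struct pool_model *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/*
 * Dissolve the page only if it is still free once the lock is held.
 * Returns true when the page was removed from the pool.
 */
static bool dissolve_if_free(struct pool_model *p, struct hpage_model *hp)
{
	bool done = false;

	pool_lock(p);
	if (hp->on_freelist && atomic_load(&hp->refcount) == 0) {
		hp->on_freelist = false;
		p->free_pages--;
		done = true;
	}
	pool_unlock(p);
	return done;
}
```

With the check inside the locked region, a page whose refcount was raised
by a concurrent allocation is simply left alone.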
> +		struct hstate *h = page_hstate(page);
> +		int nid = page_to_nid(page);
> +		spin_lock(&hugetlb_lock);
> +		list_del(&page->lru);
> +		h->free_huge_pages--;
> +		h->free_huge_pages_node[nid]--;
> +		update_and_free_page(h, page);
> +		spin_unlock(&hugetlb_lock);
> +	}
> +}
> +
> +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */
> +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +	unsigned int step = 1 << (HUGETLB_PAGE_ORDER);
hugetlb pages could be present in different sizes, so this doesn't work
in general. You need to get the order from page_hstate.
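One possible way to handle several sizes in a single pfn walk is to step
by the smallest configured hugepage size: hugepages are naturally aligned
to their own size, so the head of every larger hugepage also falls on a
multiple of the smaller step, provided start_pfn is aligned to that step.
A minimal userspace sketch of the step computation follows; struct
hstate_model, min_hstate_order() and scan_step() are hypothetical names
used only for illustration, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct hstate: only the page order matters. */
struct hstate_model {
	unsigned int order;	/* hugepage size is (1 << order) base pages */
};

/*
 * Return the smallest order among the n configured hugepage sizes, or
 * fallback_order when none are configured.
 */
static unsigned int min_hstate_order(const struct hstate_model *hs, size_t n,
				     unsigned int fallback_order)
{
	unsigned int order;
	size_t i;

	if (n == 0)
		return fallback_order;

	order = hs[0].order;
	for (i = 1; i < n; i++)
		if (hs[i].order < order)
			order = hs[i].order;
	return order;
}

/* Number of pfns to advance per iteration when scanning a pfn range. */
static unsigned long scan_step(const struct hstate_model *hs, size_t n,
			       unsigned int fallback_order)
{
	return 1UL << min_hstate_order(hs, n, fallback_order);
}
```

Each page visited with this step would still need its own size looked up
(via page_hstate in the kernel) before it is dissolved.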
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += step)
> +		dissolve_free_huge_page(pfn_to_page(pfn));
> +}
> +
>  static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  {
>  	struct page *page;
> @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage)
>  	return 0;
>  }
>  
> +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> +int is_hugepage_movable(struct page *hpage)
> +{
> +	struct page *page;
> +	struct page *tmp;
> +	struct hstate *h = page_hstate(hpage);
> +	int ret = 0;
> +
> +	VM_BUG_ON(!PageHuge(hpage));
> +	if (PageTail(hpage))
> +		return 0;
> +	spin_lock(&hugetlb_lock);
> +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> +		if (page == hpage)
> +			ret = 1;
> +	spin_unlock(&hugetlb_lock);
> +	return ret;
> +}
> +
>  /*
>   * This function is called from memory failure code.
>   * Assume the caller holds page lock of the head page.
> diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c
> index d04ed87..6418de2 100644
> --- v3.8.orig/mm/memory_hotplug.c
> +++ v3.8/mm/memory_hotplug.c
> @@ -29,6 +29,7 @@
>  #include <linux/suspend.h>
>  #include <linux/mm_inline.h>
>  #include <linux/firmware-map.h>
> +#include <linux/hugetlb.h>
>  
>  #include <asm/tlbflush.h>
>  
> @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
>  }
>  
>  /*
> - * Scanning pfn is much easier than scanning lru list.
> - * Scan pfn from start to end and Find LRU page.
> + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
> + * and hugepages). We scan pfn because it's much easier than scanning over
> + * linked list. This function returns the pfn of the first found movable
> + * page if it's found, otherwise 0.
>   */
> -static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> +static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>  {
>  	unsigned long pfn;
>  	struct page *page;
> @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
>  			page = pfn_to_page(pfn);
>  			if (PageLRU(page))
>  				return pfn;
> +			if (PageHuge(page)) {
> +				if (is_hugepage_movable(page))
> +					return pfn;
> +				else
> +					pfn += (1 << compound_order(page)) - 1;
> +			}
scan_lru_pages's name gets really confusing after this change because
hugetlb pages are not on the LRU. Maybe it would be good to rename it.
>  		}
>  	}
>  	return 0;
> @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		page = pfn_to_page(pfn);
>  		if (!get_page_unless_zero(page))
>  			continue;
All tail pages have 0 reference count (according to prep_compound_page)
so they would be skipped anyway. This makes the below pfn tweaks
pointless.
> +		if (PageHuge(page)) {
> +			/*
> +			 * Larger hugepage (1GB for x86_64) is larger than
> +			 * memory block, so pfn scan can start at the tail
> +			 * page of larger hugepage. In such case,
> +			 * we simply skip the hugepage and move the cursor
> +			 * to the last tail page.
> +			 */
> +			if (PageTail(page)) {
> +				struct page *head = compound_head(page);
> +				pfn = page_to_pfn(head) +
> +					(1 << compound_order(head)) - 1;
> +				put_page(page);
> +				continue;
> +			}
> +			pfn = (1 << compound_order(page)) - 1;
> +			if (huge_page_size(page_hstate(page)) != PMD_SIZE) {
> +				put_page(page);
> +				continue;
> +			}
There might be other hugepage sizes which fit into memblock so this test
doesn't seem right.
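One reading of this is that the test should compare the hugepage size
against the memory block size rather than against PMD_SIZE. A minimal
userspace sketch of such a predicate is shown here; the helper name
hugepage_fits_in_block() is hypothetical, and whether this is the right
criterion for the hotplug path is an assumption, not something settled in
this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: can a hugepage of the given order be handled within
 * one memory block that spans block_nr_pages base pages?
 */
static bool hugepage_fits_in_block(unsigned int order,
				   unsigned long block_nr_pages)
{
	return (1UL << order) <= block_nr_pages;
}
```

For example, assuming 4KB base pages and a 128MB memory block (32768
pages, an example value), an order-9 (2MB) hugepage passes the test and an
order-18 (1GB) hugepage does not.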
> +			list_move_tail(&page->lru, &source);
> +			move_pages -= 1 << compound_order(page);
> +			continue;
> +		}
>  		/*
>  		 * We can skip free pages. And we can only deal with pages on
>  		 * LRU.
[...]
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
  2013-03-18 14:51   ` Michal Hocko
@ 2013-03-19  0:06     ` Naoya Horiguchi
  0 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-19  0:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Mon, Mar 18, 2013 at 03:51:59PM +0100, Michal Hocko wrote:
> On Thu 21-02-13 14:41:40, Naoya Horiguchi wrote:
> [...]
> > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
> > index 2fd8b4a..7d84f4c 100644
> > --- v3.8.orig/mm/migrate.c
> > +++ v3.8/mm/migrate.c
> > @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> >  	pte_unmap_unlock(ptep, ptl);
> >  }
> >  
> > +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
> > +				unsigned long address)
> > +{
> > +	spinlock_t *ptl = pte_lockptr(mm, pmd);
> > +	pte_t pte;
> > +	swp_entry_t entry;
> > +	struct page *page;
> > +
> > +	spin_lock(ptl);
> > +	pte = huge_ptep_get((pte_t *)pmd);
> > +	if (!is_hugetlb_entry_migration(pte))
> > +		goto out;
> > +	entry = pte_to_swp_entry(pte);
> > +	page = migration_entry_to_page(entry);
> > +	if (!get_page_unless_zero(page))
> > +		goto out;
> > +	spin_unlock(ptl);
> > +	wait_on_page_locked(page);
> > +	put_page(page);
> > +	return;
> > +out:
> > +	spin_unlock(ptl);
> > +}
> 
> This duplicates a lot of code from migration_entry_wait. Can we just
> teach the generic one to be HugePage aware instead?
> All it takes is just open-coding pte_offset_map_lock and calling
> huge_ptep_get for HugePage and pte_offset_map otherwise.
Yes, it's possible with some cleanup. I'll do this.
Thanks,
Naoya
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 2/9] migrate: make core migration code aware of hugepage
  2013-03-18 15:33     ` Michal Hocko
@ 2013-03-19  0:06       ` Naoya Horiguchi
  0 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-19  0:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Mon, Mar 18, 2013 at 04:33:00PM +0100, Michal Hocko wrote:
> On Mon 18-03-13 16:22:24, Michal Hocko wrote:
> > On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote:
> > [...]
> > > diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h
> > > index 0d7df39..2e475b5 100644
> > > --- v3.8.orig/include/linux/mempolicy.h
> > > +++ v3.8/include/linux/mempolicy.h
> > > @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
> > >  /* Check if a vma is migratable */
> > >  static inline int vma_migratable(struct vm_area_struct *vma)
> > >  {
> > > -	if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
> > > +	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
> > >  		return 0;
> > 
> > Is this safe? At least check_*_range don't seem to be hugetlb aware.
> 
> Ohh, they become hugetlb aware in 5/9. Should that one be reordered then?
OK, I'll move this change to after patch 5/9.
Naoya
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-03-18 15:40   ` Michal Hocko
@ 2013-03-19  0:07     ` Naoya Horiguchi
  2013-03-19  7:11       ` Michal Hocko
  2013-03-20  0:31       ` Simon Jeons
  0 siblings, 2 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-19  0:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
> > This patch extends check_range() to handle vma with VM_HUGETLB set.
> > With this changes, we can migrate hugepage with migrate_pages(2).
> > Note that for larger hugepages (covered by pud entries, 1GB for
> > x86_64 for example), we simply skip it now.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  include/linux/hugetlb.h |  6 ++++--
> >  mm/hugetlb.c            | 10 ++++++++++
> >  mm/mempolicy.c          | 46 ++++++++++++++++++++++++++++++++++------------
> >  3 files changed, 48 insertions(+), 14 deletions(-)
> > 
> > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> > index 8f87115..eb33df5 100644
> > --- v3.8.orig/include/linux/hugetlb.h
> > +++ v3.8/include/linux/hugetlb.h
> > @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> >  int dequeue_hwpoisoned_huge_page(struct page *page);
> >  void putback_active_hugepage(struct page *page);
> >  void putback_active_hugepages(struct list_head *l);
> > +void migrate_hugepage_add(struct page *page, struct list_head *list);
> >  void copy_huge_page(struct page *dst, struct page *src);
> >  
> >  extern unsigned long hugepages_treat_as_movable;
> > @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  				pmd_t *pmd, int write);
> >  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
> >  				pud_t *pud, int write);
> > -int pmd_huge(pmd_t pmd);
> > -int pud_huge(pud_t pmd);
> > +extern int pmd_huge(pmd_t pmd);
> > +extern int pud_huge(pud_t pmd);
> 
> extern is not needed here.
OK.
> >  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> >  		unsigned long address, unsigned long end, pgprot_t newprot);
> >  
> > @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> >  
> >  #define putback_active_hugepage(p) 0
> >  #define putback_active_hugepages(l) 0
> > +#define migrate_hugepage_add(p, l) 0
> >  static inline void copy_huge_page(struct page *dst, struct page *src)
> >  {
> >  }
> > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> > index cb9d43b8..86ffcb7 100644
> > --- v3.8.orig/mm/hugetlb.c
> > +++ v3.8/mm/hugetlb.c
> > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
> >  	list_for_each_entry_safe(page, page2, l, lru)
> >  		putback_active_hugepage(page);
> >  }
> > +
> > +void migrate_hugepage_add(struct page *page, struct list_head *list)
> > +{
> > +	VM_BUG_ON(!PageHuge(page));
> > +	get_page(page);
> > +	spin_lock(&hugetlb_lock);
> 
> Why hugetlb_lock? Comment for this lock says that it protects
> hugepage_freelists, nr_huge_pages, and free_huge_pages.
I think that this comment is out of date. hugepage_activelist, which was
introduced recently, should also be protected by hugetlb_lock, because this
patchset adds is_hugepage_movable(), which walks that list.
So I'll update the comment in the next post.
> > +	list_move_tail(&page->lru, list);
> > +	spin_unlock(&hugetlb_lock);
> > +	return;
> > +}
> > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> > index e2df1c1..8627135 100644
> > --- v3.8.orig/mm/mempolicy.c
> > +++ v3.8/mm/mempolicy.c
> > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> >  	return addr != end;
> >  }
> >  
> > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> > +		const nodemask_t *nodes, unsigned long flags,
> > +				    void *private)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	int nid;
> > +	struct page *page;
> > +
> > +	spin_lock(&vma->vm_mm->page_table_lock);
> > +	page = pte_page(huge_ptep_get((pte_t *)pmd));
> > +	spin_unlock(&vma->vm_mm->page_table_lock);
> 
> I am a bit confused why page_table_lock is used here and why it doesn't
> cover the page usage.
I expected this function to do the same for pmd as check_pte_range() does
for pte, but the above code didn't do it. I should've put spin_unlock
below migrate_hugepage_add(). Sorry for the confusion.
> > +	nid = page_to_nid(page);
> > +	if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> > +	    && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> > +		|| flags & MPOL_MF_MOVE_ALL))
> > +		migrate_hugepage_add(page, private);
> > +#else
> > +	BUG();
> > +#endif
> > +}
> > +
> >  static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> >  		unsigned long addr, unsigned long end,
> >  		const nodemask_t *nodes, unsigned long flags,
> > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> >  	pmd = pmd_offset(pud, addr);
> >  	do {
> >  		next = pmd_addr_end(addr, end);
> > +		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> 
> Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> sufficient?
I think we need both checks here, because with pmd_huge() alone a pmd
for thp would wrongly take this branch.
Thanks,
Naoya
> > +			check_hugetlb_pmd_range(vma, pmd, nodes,
> > +						flags, private);
> > +			continue;
> > +		}
> >  		split_huge_page_pmd(vma, addr, pmd);
> >  		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
> >  			continue;
> [...]
> -- 
> Michal Hocko
> SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable
  2013-03-18 15:51   ` Michal Hocko
@ 2013-03-19  0:07     ` Naoya Horiguchi
  0 siblings, 0 replies; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-19  0:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Mon, Mar 18, 2013 at 04:51:25PM +0100, Michal Hocko wrote:
> On Thu 21-02-13 14:41:48, Naoya Horiguchi wrote:
> > Now hugepages are definitely movable. So allocating hugepages from
> > ZONE_MOVABLE is natural and we have no reason to keep this parameter.
> 
> The sysctl is a part of user interface so you shouldn't remove it right
> away. What we can do is to make it noop and only WARN() that the
> interface will be removed later so that userspace can prepare for that.
> 
Yes, you're right. I'll replace the handler with a no-op.
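The deprecation pattern discussed here (accept the write, ignore the
value, warn that the interface is going away) can be modelled in
userspace as follows. This is only a sketch: struct deprecated_knob and
deprecated_knob_write() are hypothetical names, and printing the notice
only once is an assumption made for the example rather than something
decided in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a deprecated tunable: writes succeed but do nothing. */
struct deprecated_knob {
	const char *name;
	bool warned;		/* true once the deprecation notice was printed */
	unsigned int warnings;	/* number of notices printed so far */
};

/*
 * Handle a write to the deprecated knob. The value is discarded and the
 * first write prints a one-time notice. Always returns 0 so that existing
 * userspace scripts keep working.
 */
static int deprecated_knob_write(struct deprecated_knob *k, long value)
{
	(void)value;	/* intentionally unused */

	if (!k->warned) {
		k->warned = true;
		k->warnings++;
		fprintf(stderr, "%s is deprecated and will be removed\n",
			k->name);
	}
	return 0;
}
```

Returning success keeps existing userspace from breaking while the notice
gives it time to stop using the interface.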
Thanks,
Naoya
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-03-19  0:07     ` Naoya Horiguchi
@ 2013-03-19  7:11       ` Michal Hocko
  2013-03-20  6:12         ` Naoya Horiguchi
  2013-03-20  0:31       ` Simon Jeons
  1 sibling, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-19  7:11 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote:
> On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
[...]
> > > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
> > >  	list_for_each_entry_safe(page, page2, l, lru)
> > >  		putback_active_hugepage(page);
> > >  }
> > > +
> > > +void migrate_hugepage_add(struct page *page, struct list_head *list)
> > > +{
> > > +	VM_BUG_ON(!PageHuge(page));
> > > +	get_page(page);
> > > +	spin_lock(&hugetlb_lock);
> > 
> > Why hugetlb_lock? Comment for this lock says that it protects
> > hugepage_freelists, nr_huge_pages, and free_huge_pages.
> 
> I think that this comment is out of date. hugepage_activelist, which was
> introduced recently, should also be protected by hugetlb_lock, because this
> patchset adds is_hugepage_movable(), which walks that list.
> So I'll update the comment in the next post.
> 
> > > +	list_move_tail(&page->lru, list);
> > > +	spin_unlock(&hugetlb_lock);
> > > +	return;
> > > +}
> > > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> > > index e2df1c1..8627135 100644
> > > --- v3.8.orig/mm/mempolicy.c
> > > +++ v3.8/mm/mempolicy.c
> > > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> > >  	return addr != end;
> > >  }
> > >  
> > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> > > +		const nodemask_t *nodes, unsigned long flags,
> > > +				    void *private)
> > > +{
> > > +#ifdef CONFIG_HUGETLB_PAGE
> > > +	int nid;
> > > +	struct page *page;
> > > +
> > > +	spin_lock(&vma->vm_mm->page_table_lock);
> > > +	page = pte_page(huge_ptep_get((pte_t *)pmd));
> > > +	spin_unlock(&vma->vm_mm->page_table_lock);
> > 
> > I am a bit confused why page_table_lock is used here and why it doesn't
> > cover the page usage.
> 
> I expected this function to do the same for pmd as check_pte_range() does
> for pte, but the above code didn't do it. I should've put spin_unlock
> below migrate_hugepage_add(). Sorry for the confusion.
OK, I see. So you want to prevent a race with pmd unmap.
> > > +	nid = page_to_nid(page);
> > > +	if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> > > +	    && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> > > +		|| flags & MPOL_MF_MOVE_ALL))
> > > +		migrate_hugepage_add(page, private);
> > > +#else
> > > +	BUG();
> > > +#endif
> > > +}
> > > +
> > >  static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > >  		unsigned long addr, unsigned long end,
> > >  		const nodemask_t *nodes, unsigned long flags,
> > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > >  	pmd = pmd_offset(pud, addr);
> > >  	do {
> > >  		next = pmd_addr_end(addr, end);
> > > +		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> > 
> > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> > sufficient?
> 
> I think we need both checks here, because with pmd_huge() alone a pmd
> for thp would wrongly take this branch.
Bahh. You are right. I thought that pmd_huge was a hugetlb-specific thing,
but it obviously checks only _PAGE_PSE, same as pmd_large(), which is really
unfortunate and confusing. Can we make it hugetlb specific?
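The distinction can be spelled out as a small decision table. The sketch
below is a userspace model, not kernel code: enum pmd_kind and
classify_pmd() are hypothetical names, large_bit_set stands for what
pmd_huge() reports, and vma_is_hugetlb for what is_vm_hugetlb_page()
reports.

```c
#include <stdbool.h>

/* Hypothetical classification of a pmd entry, for illustration only. */
enum pmd_kind {
	PMD_KIND_NORMAL,	/* points to a page table of ptes */
	PMD_KIND_THP,		/* large mapping in an ordinary vma */
	PMD_KIND_HUGETLB,	/* large mapping in a hugetlb vma */
};

/*
 * The "large" bit alone cannot tell thp from hugetlb; the vma type is
 * needed as a second input.
 */
static enum pmd_kind classify_pmd(bool large_bit_set, bool vma_is_hugetlb)
{
	if (!large_bit_set)
		return PMD_KIND_NORMAL;
	return vma_is_hugetlb ? PMD_KIND_HUGETLB : PMD_KIND_THP;
}
```

Only the PMD_KIND_HUGETLB case should reach check_hugetlb_pmd_range(); a
thp pmd has to fall through to the split path instead.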
> 
> Thanks,
> Naoya
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
                   ` (8 preceding siblings ...)
  2013-02-21 19:41 ` [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Naoya Horiguchi
@ 2013-03-19 23:43 ` Simon Jeons
  2013-03-20 21:35   ` Naoya Horiguchi
  9 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-03-19 23:43 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
> Hi,
>
> Hugepage migration is now available only for soft offlining (moving
> data on the half corrupted page to another page to save the data).
> But it's also useful some other users of page migration, so this
> patchset tries to extend some of such users to support hugepage.
>
> The targets of this patchset are NUMA related system calls (i.e.
> migrate_pages(2), move_pages(2), and mbind(2)), and memory hotplug.
> This patchset does not extend page migration in memory compaction,
> because I think that users of memory compaction mainly expect to
> construct thp by arranging raw pages but hugepage migration doesn't
> help it.
> CMA, another user of page migration, can have benefit from hugepage
> migration, but is not enabled to support it now. This is because
> I've never used CMA and need to learn more to extend and/or test
> hugepage migration in CMA. I'll add this in later version if it
> becomes ready, or will post as a separate patchset.
>
> Hugepage migration of 1GB hugepage is not enabled for now, because
> I'm not sure whether users of 1GB hugepage really want it.
> We need to spare free hugepage in order to do migration, but I don't
> think that users want to 1GB memory to idle for that purpose
> (currently we can't expand/shrink 1GB hugepage pool after boot).
>
> Could you review and give me some comments/feedbacks?
>
> Thanks,
> Naoya Horiguchi
> ---
> Easy patch access:
>    git@github.com:Naoya-Horiguchi/linux.git
>    branch:extend_hugepage_migration
>
> Test code:
>    git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git
git clone git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git
Cloning into test_hugepage_migration_extension...
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
>
> Naoya Horiguchi (9):
>        migrate: add migrate_entry_wait_huge()
>        migrate: make core migration code aware of hugepage
>        soft-offline: use migrate_pages() instead of migrate_huge_page()
>        migrate: clean up migrate_huge_page()
>        migrate: enable migrate_pages() to migrate hugepage
>        migrate: enable move_pages() to migrate hugepage
>        mbind: enable mbind() to migrate hugepage
>        memory-hotplug: enable memory hotplug to handle hugepage
>        remove /proc/sys/vm/hugepages_treat_as_movable
>
>   Documentation/sysctl/vm.txt |  16 ------
>   include/linux/hugetlb.h     |  25 ++++++++--
>   include/linux/mempolicy.h   |   2 +-
>   include/linux/migrate.h     |  12 ++---
>   include/linux/swapops.h     |   4 ++
>   kernel/sysctl.c             |   7 ---
>   mm/hugetlb.c                |  98 ++++++++++++++++++++++++++++--------
>   mm/memory-failure.c         |  20 ++++++--
>   mm/memory.c                 |   6 ++-
>   mm/memory_hotplug.c         |  51 +++++++++++++++----
>   mm/mempolicy.c              |  61 +++++++++++++++--------
>   mm/migrate.c                | 119 ++++++++++++++++++++++++++++++--------------
>   mm/page_alloc.c             |  12 +++++
>   mm/page_isolation.c         |   5 ++
>   14 files changed, 311 insertions(+), 127 deletions(-)
>
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
  2013-02-21 19:41 ` [PATCH 1/9] migrate: add migrate_entry_wait_huge() Naoya Horiguchi
  2013-03-18 14:51   ` Michal Hocko
@ 2013-03-19 23:57   ` Simon Jeons
  2013-03-20 21:53     ` Naoya Horiguchi
  1 sibling, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-03-19 23:57 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
> When we have a page fault for the address which is backed by a hugepage
> under migration, the kernel can't wait correctly until the migration
> finishes. This is because pte_offset_map_lock() can't get a correct
It seems that the current hugetlb_fault still waits for a hugetlb page
under migration; how can it work without locking the 2MB memory?
> migration entry for hugepage. This patch adds migration_entry_wait_huge()
> to separate code path between normal pages and hugepages.
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>   include/linux/hugetlb.h |  2 ++
>   include/linux/swapops.h |  4 ++++
>   mm/hugetlb.c            |  4 ++--
>   mm/migrate.c            | 24 ++++++++++++++++++++++++
>   4 files changed, 32 insertions(+), 2 deletions(-)
>
> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> index 0c80d3f..40b27f6 100644
> --- v3.8.orig/include/linux/hugetlb.h
> +++ v3.8/include/linux/hugetlb.h
> @@ -43,6 +43,7 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int,
>   #endif
>   
>   int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
> +int is_hugetlb_entry_migration(pte_t pte);
>   int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
>   			struct page **, struct vm_area_struct **,
>   			unsigned long *, int *, int, unsigned int flags);
> @@ -109,6 +110,7 @@ static inline unsigned long hugetlb_total_pages(void)
>   #define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
>   #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
>   #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
> +#define is_hugetlb_entry_migration(pte)		({ BUG(); 0; })
>   #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
>   static inline void hugetlb_report_meminfo(struct seq_file *m)
>   {
> diff --git v3.8.orig/include/linux/swapops.h v3.8/include/linux/swapops.h
> index 47ead51..f68efdd 100644
> --- v3.8.orig/include/linux/swapops.h
> +++ v3.8/include/linux/swapops.h
> @@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry)
>   
>   extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>   					unsigned long address);
> +extern void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
> +					unsigned long address);
>   #else
>   
>   #define make_migration_entry(page, write) swp_entry(0, 0)
> @@ -148,6 +150,8 @@ static inline int is_migration_entry(swp_entry_t swp)
>   static inline void make_migration_entry_read(swp_entry_t *entryp) { }
>   static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>   					 unsigned long address) { }
> +static inline void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
> +					 unsigned long address) { }
>   static inline int is_write_migration_entry(swp_entry_t entry)
>   {
>   	return 0;
> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> index 546db81..351025e 100644
> --- v3.8.orig/mm/hugetlb.c
> +++ v3.8/mm/hugetlb.c
> @@ -2313,7 +2313,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   	return -ENOMEM;
>   }
>   
> -static int is_hugetlb_entry_migration(pte_t pte)
> +int is_hugetlb_entry_migration(pte_t pte)
>   {
>   	swp_entry_t swp;
>   
> @@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>   	if (ptep) {
>   		entry = huge_ptep_get(ptep);
>   		if (unlikely(is_hugetlb_entry_migration(entry))) {
> -			migration_entry_wait(mm, (pmd_t *)ptep, address);
> +			migration_entry_wait_huge(mm, (pmd_t *)ptep, address);
>   			return 0;
>   		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
>   			return VM_FAULT_HWPOISON_LARGE |
> diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
> index 2fd8b4a..7d84f4c 100644
> --- v3.8.orig/mm/migrate.c
> +++ v3.8/mm/migrate.c
> @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>   	pte_unmap_unlock(ptep, ptl);
>   }
>   
> +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
> +				unsigned long address)
> +{
> +	spinlock_t *ptl = pte_lockptr(mm, pmd);
> +	pte_t pte;
> +	swp_entry_t entry;
> +	struct page *page;
> +
> +	spin_lock(ptl);
> +	pte = huge_ptep_get((pte_t *)pmd);
> +	if (!is_hugetlb_entry_migration(pte))
> +		goto out;
> +	entry = pte_to_swp_entry(pte);
> +	page = migration_entry_to_page(entry);
> +	if (!get_page_unless_zero(page))
> +		goto out;
> +	spin_unlock(ptl);
> +	wait_on_page_locked(page);
> +	put_page(page);
> +	return;
> +out:
> +	spin_unlock(ptl);
> +}
> +
>   #ifdef CONFIG_BLOCK
>   /* Returns true if all buffers are successfully locked */
>   static bool buffer_migrate_lock_buffers(struct buffer_head *head,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-03-19  0:07     ` Naoya Horiguchi
  2013-03-19  7:11       ` Michal Hocko
@ 2013-03-20  0:31       ` Simon Jeons
  2013-03-20 21:59         ` Naoya Horiguchi
  1 sibling, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-03-20  0:31 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Michal Hocko, linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 03/19/2013 08:07 AM, Naoya Horiguchi wrote:
> On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
>> On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
>>> This patch extends check_range() to handle vma with VM_HUGETLB set.
>>> With this changes, we can migrate hugepage with migrate_pages(2).
>>> Note that for larger hugepages (covered by pud entries, 1GB for
>>> x86_64 for example), we simply skip it now.
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> ---
>>>  include/linux/hugetlb.h |  6 ++++--
>>>  mm/hugetlb.c            | 10 ++++++++++
>>>  mm/mempolicy.c          | 46 ++++++++++++++++++++++++++++++++++------------
>>>  3 files changed, 48 insertions(+), 14 deletions(-)
>>>
>>> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
>>> index 8f87115..eb33df5 100644
>>> --- v3.8.orig/include/linux/hugetlb.h
>>> +++ v3.8/include/linux/hugetlb.h
>>> @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
>>>  int dequeue_hwpoisoned_huge_page(struct page *page);
>>>  void putback_active_hugepage(struct page *page);
>>>  void putback_active_hugepages(struct list_head *l);
>>> +void migrate_hugepage_add(struct page *page, struct list_head *list);
>>>  void copy_huge_page(struct page *dst, struct page *src);
>>>  
>>>  extern unsigned long hugepages_treat_as_movable;
>>> @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>>>  				pmd_t *pmd, int write);
>>>  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>>>  				pud_t *pud, int write);
>>> -int pmd_huge(pmd_t pmd);
>>> -int pud_huge(pud_t pmd);
>>> +extern int pmd_huge(pmd_t pmd);
>>> +extern int pud_huge(pud_t pmd);
>> extern is not needed here.
> OK.
>
>>>  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>>>  		unsigned long address, unsigned long end, pgprot_t newprot);
>>>  
>>> @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
>>>  
>>>  #define putback_active_hugepage(p) 0
>>>  #define putback_active_hugepages(l) 0
>>> +#define migrate_hugepage_add(p, l) 0
>>>  static inline void copy_huge_page(struct page *dst, struct page *src)
>>>  {
>>>  }
>>> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
>>> index cb9d43b8..86ffcb7 100644
>>> --- v3.8.orig/mm/hugetlb.c
>>> +++ v3.8/mm/hugetlb.c
>>> @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
>>>  	list_for_each_entry_safe(page, page2, l, lru)
>>>  		putback_active_hugepage(page);
>>>  }
>>> +
>>> +void migrate_hugepage_add(struct page *page, struct list_head *list)
>>> +{
>>> +	VM_BUG_ON(!PageHuge(page));
>>> +	get_page(page);
>>> +	spin_lock(&hugetlb_lock);
>> Why hugetlb_lock? Comment for this lock says that it protects
>> hugepage_freelists, nr_huge_pages, and free_huge_pages.
> I think that this comment is out of date and hugepage_activelists,
> which was introduced recently, should be protected because this
> patchset adds is_hugepage_movable() which runs through the list.
> So I'll update the comment in the next post.
>
>>> +	list_move_tail(&page->lru, list);
>>> +	spin_unlock(&hugetlb_lock);
>>> +	return;
>>> +}
>>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
>>> index e2df1c1..8627135 100644
>>> --- v3.8.orig/mm/mempolicy.c
>>> +++ v3.8/mm/mempolicy.c
>>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>>>  	return addr != end;
>>>  }
>>>  
>>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
>>> +		const nodemask_t *nodes, unsigned long flags,
>>> +				    void *private)
>>> +{
>>> +#ifdef CONFIG_HUGETLB_PAGE
>>> +	int nid;
>>> +	struct page *page;
>>> +
>>> +	spin_lock(&vma->vm_mm->page_table_lock);
>>> +	page = pte_page(huge_ptep_get((pte_t *)pmd));
>>> +	spin_unlock(&vma->vm_mm->page_table_lock);
>> I am a bit confused why page_table_lock is used here and why it doesn't
>> cover the page usage.
> I expected this function to do the same for pmd as check_pte_range() does
> for pte, but the above code didn't do it. I should've put spin_unlock
> below migrate_hugepage_add(). Sorry for the confusion.
I'm still confused. Could you explain in more detail?
>
>>> +	nid = page_to_nid(page);
>>> +	if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
>>> +	    && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
>>> +		|| flags & MPOL_MF_MOVE_ALL))
>>> +		migrate_hugepage_add(page, private);
>>> +#else
>>> +	BUG();
>>> +#endif
>>> +}
>>> +
>>>  static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
>>>  		unsigned long addr, unsigned long end,
>>>  		const nodemask_t *nodes, unsigned long flags,
>>> @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
>>>  	pmd = pmd_offset(pud, addr);
>>>  	do {
>>>  		next = pmd_addr_end(addr, end);
>>> +		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
>> Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
>> sufficient?
> I think we need both check here because if we use only pmd_huge(),
> pmd for thp goes into this branch wrongly. 
>
> Thanks,
> Naoya
>
>>> +			check_hugetlb_pmd_range(vma, pmd, nodes,
>>> +						flags, private);
>>> +			continue;
>>> +		}
>>>  		split_huge_page_pmd(vma, addr, pmd);
>>>  		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
>>>  			continue;
>> [...]
>> -- 
>> Michal Hocko
>> SUSE Labs
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
                     ` (2 preceding siblings ...)
  2013-03-18 16:07   ` Michal Hocko
@ 2013-03-20  1:03   ` Simon Jeons
  2013-03-20 22:05     ` Naoya Horiguchi
  3 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-03-20  1:03 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
> Currently we can't offline memory blocks which contain hugepages because
> a hugepage is considered as an unmovable page. But now with this patch
> series, a hugepage has become movable, so by using hugepage migration we
> can offline such memory blocks.
>
> What's different from other users of hugepage migration is that we need
> to decompose all the hugepages inside the target memory block into free
For other users of hugepage migration, the hugepage should be freed to
hugepage_freelists after migration, but I don't see any code that does this. Why?
> buddy pages after hugepage migration, because otherwise free hugepages
> remaining in the memory block intervene the memory offlining.
> For this reason we introduce new functions dissolve_free_huge_page() and
> dissolve_free_huge_pages().
>
> Other than that, what this patch does is straightforwardly to add hugepage
> migration code, that is, adding hugepage code to the functions which scan
> over pfn and collect hugepages to be migrated, and adding a hugepage
> allocation function to alloc_migrate_target().
>
> As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
> over them because it's larger than memory block. So we now simply leave
> it to fail as it is.
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>   include/linux/hugetlb.h |  8 ++++++++
>   mm/hugetlb.c            | 43 +++++++++++++++++++++++++++++++++++++++++
>   mm/memory_hotplug.c     | 51 ++++++++++++++++++++++++++++++++++++++++---------
>   mm/migrate.c            | 12 +++++++++++-
>   mm/page_alloc.c         | 12 ++++++++++++
>   mm/page_isolation.c     |  5 +++++
>   6 files changed, 121 insertions(+), 10 deletions(-)
>
> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> index 86a4d78..e33f07f 100644
> --- v3.8.orig/include/linux/hugetlb.h
> +++ v3.8/include/linux/hugetlb.h
> @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
>   void putback_active_hugepage(struct page *page);
>   void putback_active_hugepages(struct list_head *l);
>   void migrate_hugepage_add(struct page *page, struct list_head *list);
> +int is_hugepage_movable(struct page *page);
>   void copy_huge_page(struct page *dst, struct page *src);
>   
>   extern unsigned long hugepages_treat_as_movable;
> @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
>   #define putback_active_hugepage(p) 0
>   #define putback_active_hugepages(l) 0
>   #define migrate_hugepage_add(p, l) 0
> +#define is_hugepage_movable(x) 0
>   static inline void copy_huge_page(struct page *dst, struct page *src)
>   {
>   }
> @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h)
>   	return h - hstates;
>   }
>   
> +extern void dissolve_free_huge_page(struct page *page);
> +extern void dissolve_free_huge_pages(unsigned long start_pfn,
> +				     unsigned long end_pfn);
> +
>   #else
>   struct hstate {};
>   #define alloc_huge_page(v, a, r) NULL
> @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>   }
>   #define hstate_index_to_shift(index) 0
>   #define hstate_index(h) 0
> +#define dissolve_free_huge_page(p) 0
> +#define dissolve_free_huge_pages(s, e) 0
>   #endif
>   
>   #endif /* _LINUX_HUGETLB_H */
> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> index ccf9995..c28e6c9 100644
> --- v3.8.orig/mm/hugetlb.c
> +++ v3.8/mm/hugetlb.c
> @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>   	return ret;
>   }
>   
> +/* Dissolve a given free hugepage into free pages. */
> +void dissolve_free_huge_page(struct page *page)
> +{
> +	if (PageHuge(page) && !page_count(page)) {
> +		struct hstate *h = page_hstate(page);
> +		int nid = page_to_nid(page);
> +		spin_lock(&hugetlb_lock);
> +		list_del(&page->lru);
> +		h->free_huge_pages--;
> +		h->free_huge_pages_node[nid]--;
> +		update_and_free_page(h, page);
> +		spin_unlock(&hugetlb_lock);
> +	}
> +}
> +
> +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */
> +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +	unsigned int step = 1 << (HUGETLB_PAGE_ORDER);
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += step)
> +		dissolve_free_huge_page(pfn_to_page(pfn));
> +}
> +
>   static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>   {
>   	struct page *page;
> @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage)
>   	return 0;
>   }
>   
> +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> +int is_hugepage_movable(struct page *hpage)
> +{
> +	struct page *page;
> +	struct page *tmp;
> +	struct hstate *h = page_hstate(hpage);
> +	int ret = 0;
> +
> +	VM_BUG_ON(!PageHuge(hpage));
> +	if (PageTail(hpage))
> +		return 0;
> +	spin_lock(&hugetlb_lock);
> +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> +		if (page == hpage)
> +			ret = 1;
> +	spin_unlock(&hugetlb_lock);
> +	return ret;
> +}
> +
>   /*
>    * This function is called from memory failure code.
>    * Assume the caller holds page lock of the head page.
> diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c
> index d04ed87..6418de2 100644
> --- v3.8.orig/mm/memory_hotplug.c
> +++ v3.8/mm/memory_hotplug.c
> @@ -29,6 +29,7 @@
>   #include <linux/suspend.h>
>   #include <linux/mm_inline.h>
>   #include <linux/firmware-map.h>
> +#include <linux/hugetlb.h>
>   
>   #include <asm/tlbflush.h>
>   
> @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
>   }
>   
>   /*
> - * Scanning pfn is much easier than scanning lru list.
> - * Scan pfn from start to end and Find LRU page.
> + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
> + * and hugepages). We scan pfn because it's much easier than scanning over
> + * linked list. This function returns the pfn of the first found movable
> + * page if it's found, otherwise 0.
>    */
> -static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> +static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>   {
>   	unsigned long pfn;
>   	struct page *page;
> @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
>   			page = pfn_to_page(pfn);
>   			if (PageLRU(page))
>   				return pfn;
> +			if (PageHuge(page)) {
> +				if (is_hugepage_movable(page))
> +					return pfn;
> +				else
> +					pfn += (1 << compound_order(page)) - 1;
> +			}
>   		}
>   	}
>   	return 0;
> @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   		page = pfn_to_page(pfn);
>   		if (!get_page_unless_zero(page))
>   			continue;
> +		if (PageHuge(page)) {
> +			/*
> +			 * Larger hugepage (1GB for x86_64) is larger than
> +			 * memory block, so pfn scan can start at the tail
> +			 * page of larger hugepage. In such case,
> +			 * we simply skip the hugepage and move the cursor
> +			 * to the last tail page.
> +			 */
> +			if (PageTail(page)) {
> +				struct page *head = compound_head(page);
> +				pfn = page_to_pfn(head) +
> +					(1 << compound_order(head)) - 1;
> +				put_page(page);
> +				continue;
> +			}
> +			pfn = (1 << compound_order(page)) - 1;
> +			if (huge_page_size(page_hstate(page)) != PMD_SIZE) {
> +				put_page(page);
> +				continue;
> +			}
> +			list_move_tail(&page->lru, &source);
> +			move_pages -= 1 << compound_order(page);
> +			continue;
> +		}
>   		/*
>   		 * We can skip free pages. And we can only deal with pages on
>   		 * LRU.
> @@ -1049,7 +1082,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   	}
>   	if (!list_empty(&source)) {
>   		if (not_managed) {
> -			putback_lru_pages(&source);
> +			putback_movable_pages(&source);
>   			goto out;
>   		}
>   
> @@ -1057,11 +1090,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   		 * alloc_migrate_target should be improooooved!!
>   		 * migrate_pages returns # of failed pages.
>   		 */
> -		ret = migrate_pages(&source, alloc_migrate_target, 0,
> +		ret = migrate_movable_pages(&source, alloc_migrate_target, 0,
>   							true, MIGRATE_SYNC,
>   							MR_MEMORY_HOTPLUG);
> -		if (ret)
> -			putback_lru_pages(&source);
>   	}
>   out:
>   	return ret;
> @@ -1304,8 +1335,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
>   		drain_all_pages();
>   	}
>   
> -	pfn = scan_lru_pages(start_pfn, end_pfn);
> -	if (pfn) { /* We have page on LRU */
> +	pfn = scan_movable_pages(start_pfn, end_pfn);
> +	if (pfn) { /* We have movable pages */
>   		ret = do_migrate_range(pfn, end_pfn);
>   		if (!ret) {
>   			drain = 1;
> @@ -1324,6 +1355,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
>   	yield();
>   	/* drain pcp pages, this is synchronous. */
>   	drain_all_pages();
> +	/* dissolve all free hugepages inside the memory block */
> +	dissolve_free_huge_pages(start_pfn, end_pfn);
>   	/* check again */
>   	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
>   	if (offlined_pages < 0) {
> diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
> index 8c457e7..a491a98 100644
> --- v3.8.orig/mm/migrate.c
> +++ v3.8/mm/migrate.c
> @@ -1009,8 +1009,18 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>   
>   	unlock_page(hpage);
>   out:
> -	if (rc != -EAGAIN)
> +	if (rc != -EAGAIN) {
>   		putback_active_hugepage(hpage);
> +
> +		/*
> +		 * After hugepage migration from memory hotplug, the original
> +		 * hugepage should never be allocated again. This will be
> +		 * done by dissolving it into free normal pages, because
> +		 * we already set migratetype to MIGRATE_ISOLATE for them.
> +		 */
> +		if (offlining)
> +			dissolve_free_huge_page(hpage);
> +	}
>   	put_page(new_hpage);
>   	if (result) {
>   		if (rc)
> diff --git v3.8.orig/mm/page_alloc.c v3.8/mm/page_alloc.c
> index 6a83cd3..c37951d 100644
> --- v3.8.orig/mm/page_alloc.c
> +++ v3.8/mm/page_alloc.c
> @@ -58,6 +58,7 @@
>   #include <linux/prefetch.h>
>   #include <linux/migrate.h>
>   #include <linux/page-debug-flags.h>
> +#include <linux/hugetlb.h>
>   
>   #include <asm/tlbflush.h>
>   #include <asm/div64.h>
> @@ -5686,6 +5687,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>   			continue;
>   
>   		page = pfn_to_page(check);
> +
> +		/*
> +		 * Hugepages are not in LRU lists, but they're movable.
> +		 * We need not scan over tail pages bacause we don't
> +		 * handle each tail page individually in migration.
> +		 */
> +		if (PageHuge(page)) {
> +			iter += (1 << compound_order(page)) - 1;
> +			continue;
> +		}
> +
>   		/*
>   		 * We can't use page_count without pin a page
>   		 * because another CPU can free compound page.
> diff --git v3.8.orig/mm/page_isolation.c v3.8/mm/page_isolation.c
> index 383bdbb..cf48ef6 100644
> --- v3.8.orig/mm/page_isolation.c
> +++ v3.8/mm/page_isolation.c
> @@ -6,6 +6,7 @@
>   #include <linux/page-isolation.h>
>   #include <linux/pageblock-flags.h>
>   #include <linux/memory.h>
> +#include <linux/hugetlb.h>
>   #include "internal.h"
>   
>   int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> @@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
>   {
>   	gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
>   
> +	if (PageHuge(page))
> +		return alloc_huge_page_node(page_hstate(compound_head(page)),
> +					    numa_node_id());
> +
>   	if (PageHighMem(page))
>   		gfp_mask |= __GFP_HIGHMEM;
>   
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-03-18 16:07   ` Michal Hocko
@ 2013-03-20  3:55     ` Naoya Horiguchi
  2013-03-20  7:57       ` Michal Hocko
  0 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-20  3:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Mon, Mar 18, 2013 at 05:07:37PM +0100, Michal Hocko wrote:
> On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote:
> > Currently we can't offline memory blocks which contain hugepages because
> > a hugepage is considered as an unmovable page. But now with this patch
> > series, a hugepage has become movable, so by using hugepage migration we
> > can offline such memory blocks.
> > 
> > What's different from other users of hugepage migration is that we need
> > to decompose all the hugepages inside the target memory block into free
> > buddy pages after hugepage migration, because otherwise free hugepages
> > remaining in the memory block intervene the memory offlining.
> > For this reason we introduce new functions dissolve_free_huge_page() and
> > dissolve_free_huge_pages().
> > 
> > Other than that, what this patch does is straightforwardly to add hugepage
> > migration code, that is, adding hugepage code to the functions which scan
> > over pfn and collect hugepages to be migrated, and adding a hugepage
> > allocation function to alloc_migrate_target().
> > 
> > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
> > over them because it's larger than memory block. So we now simply leave
> > it to fail as it is.
> 
> What we could do is to check whether there is a free gb huge page on
> other node and migrate there.
Correct, and 1GB page migration needs more code in the migration core
(mainly for handling migration entries in pud) and enough testing,
so I want to do it in a separate patchset.
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  include/linux/hugetlb.h |  8 ++++++++
> >  mm/hugetlb.c            | 43 +++++++++++++++++++++++++++++++++++++++++
> >  mm/memory_hotplug.c     | 51 ++++++++++++++++++++++++++++++++++++++++---------
> >  mm/migrate.c            | 12 +++++++++++-
> >  mm/page_alloc.c         | 12 ++++++++++++
> >  mm/page_isolation.c     |  5 +++++
> >  6 files changed, 121 insertions(+), 10 deletions(-)
> > 
> > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> > index 86a4d78..e33f07f 100644
> > --- v3.8.orig/include/linux/hugetlb.h
> > +++ v3.8/include/linux/hugetlb.h
> > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
> >  void putback_active_hugepage(struct page *page);
> >  void putback_active_hugepages(struct list_head *l);
> >  void migrate_hugepage_add(struct page *page, struct list_head *list);
> > +int is_hugepage_movable(struct page *page);
> >  void copy_huge_page(struct page *dst, struct page *src);
> >  
> >  extern unsigned long hugepages_treat_as_movable;
> > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> >  #define putback_active_hugepage(p) 0
> >  #define putback_active_hugepages(l) 0
> >  #define migrate_hugepage_add(p, l) 0
> > +#define is_hugepage_movable(x) 0
> >  static inline void copy_huge_page(struct page *dst, struct page *src)
> >  {
> >  }
> > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h)
> >  	return h - hstates;
> >  }
> >  
> > +extern void dissolve_free_huge_page(struct page *page);
> > +extern void dissolve_free_huge_pages(unsigned long start_pfn,
> > +				     unsigned long end_pfn);
> > +
> >  #else
> >  struct hstate {};
> >  #define alloc_huge_page(v, a, r) NULL
> > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
> >  }
> >  #define hstate_index_to_shift(index) 0
> >  #define hstate_index(h) 0
> > +#define dissolve_free_huge_page(p) 0
> > +#define dissolve_free_huge_pages(s, e) 0
> >  #endif
> >  
> >  #endif /* _LINUX_HUGETLB_H */
> > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> > index ccf9995..c28e6c9 100644
> > --- v3.8.orig/mm/hugetlb.c
> > +++ v3.8/mm/hugetlb.c
> > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> >  	return ret;
> >  }
> >  
> > +/* Dissolve a given free hugepage into free pages. */
> > +void dissolve_free_huge_page(struct page *page)
> > +{
> > +	if (PageHuge(page) && !page_count(page)) {
> 
> Could you clarify why you are cheking page_count here? I assume it is to
> make sure the page is free but what prevents it being increased before
> you take hugetlb_lock?
There's nothing to prevent it, so it's not safe to check refcount outside
hugetlb_lock.
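A minimal userspace sketch of the safer ordering, where the free check is made only after the lock is taken. All names here (model_pool, model_hugepage, model_dissolve_free_huge_page) are hypothetical stand-ins, not kernel definitions, and the sketch assumes that every path which takes a page off the free list holds the same lock:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct model_hugepage {
	int refcount;      /* 0 means nobody is using the page */
	bool on_freelist;  /* true while the page sits on the free list */
};

struct model_pool {
	atomic_flag lock;      /* plays the role of hugetlb_lock */
	long free_huge_pages;  /* counter protected by the lock */
};

static void model_pool_init(struct model_pool *pool, long nr_free)
{
	atomic_flag_clear(&pool->lock);
	pool->free_huge_pages = nr_free;
}

static void model_lock(struct model_pool *pool)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&pool->lock,
						 memory_order_acquire))
		;
}

static void model_unlock(struct model_pool *pool)
{
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/*
 * Dissolve the page only if it is still free once the lock is held.
 * Returns true if the page was dissolved.
 */
static bool model_dissolve_free_huge_page(struct model_pool *pool,
					   struct model_hugepage *page)
{
	bool dissolved = false;

	model_lock(pool);
	/* The check runs under the lock, so it cannot go stale before use. */
	if (page->refcount == 0 && page->on_freelist) {
		page->on_freelist = false;
		pool->free_huge_pages--;
		dissolved = true;
	}
	model_unlock(pool);
	return dissolved;
}
```

Calling the function twice on the same free page dissolves it once and returns false the second time, which is the property the unlocked check cannot guarantee.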
> > +		struct hstate *h = page_hstate(page);
> > +		int nid = page_to_nid(page);
> > +		spin_lock(&hugetlb_lock);
> > +		list_del(&page->lru);
> > +		h->free_huge_pages--;
> > +		h->free_huge_pages_node[nid]--;
> > +		update_and_free_page(h, page);
> > +		spin_unlock(&hugetlb_lock);
> > +	}
> > +}
> > +
> > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */
> > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > +	unsigned long pfn;
> > +	unsigned int step = 1 << (HUGETLB_PAGE_ORDER);
> 
> hugetlb pages could be present in different sizes so this doesn't work
> in general. You need to to get order from page_hstate.
OK.
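One possible shape for such a scan, sketched in userspace C: the step is taken from each hugepage's own order instead of a single HUGETLB_PAGE_ORDER constant. The model_page layout and model_dissolve_range() are assumptions made for illustration and are not the form the next version of the patch necessarily takes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pfn descriptor; not the kernel's struct page. */
struct model_page {
	bool huge_head;      /* first page of a hugepage */
	bool free;           /* the hugepage is on the free list */
	unsigned int order;  /* log2 of the hugepage size in pages (heads only) */
};

/* Returns the number of free hugepages dissolved in [start_pfn, end_pfn). */
static size_t model_dissolve_range(struct model_page *map,
				   size_t start_pfn, size_t end_pfn)
{
	size_t dissolved = 0;
	size_t pfn = start_pfn;

	while (pfn < end_pfn) {
		struct model_page *page = &map[pfn];

		if (!page->huge_head) {
			pfn++;  /* ordinary page: advance one pfn */
			continue;
		}
		if (page->free) {
			/* Model of dissolving: the range stops being a hugepage. */
			page->free = false;
			page->huge_head = false;
			dissolved++;
		}
		/* Step by this hugepage's own size, not a global constant. */
		pfn += (size_t)1 << page->order;
	}
	return dissolved;
}
```

With hugepages of order 2, 1 and 3 in one range, a fixed step would either land inside the larger pages or skip over the smaller ones; the per-page step visits each head exactly once.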
> > +	for (pfn = start_pfn; pfn < end_pfn; pfn += step)
> > +		dissolve_free_huge_page(pfn_to_page(pfn));
> > +}
> > +
> >  static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
> >  {
> >  	struct page *page;
> > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage)
> >  	return 0;
> >  }
> >  
> > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> > +int is_hugepage_movable(struct page *hpage)
> > +{
> > +	struct page *page;
> > +	struct page *tmp;
> > +	struct hstate *h = page_hstate(hpage);
> > +	int ret = 0;
> > +
> > +	VM_BUG_ON(!PageHuge(hpage));
> > +	if (PageTail(hpage))
> > +		return 0;
> > +	spin_lock(&hugetlb_lock);
> > +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> > +		if (page == hpage)
> > +			ret = 1;
> > +	spin_unlock(&hugetlb_lock);
> > +	return ret;
> > +}
> > +
> >  /*
> >   * This function is called from memory failure code.
> >   * Assume the caller holds page lock of the head page.
> > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c
> > index d04ed87..6418de2 100644
> > --- v3.8.orig/mm/memory_hotplug.c
> > +++ v3.8/mm/memory_hotplug.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/suspend.h>
> >  #include <linux/mm_inline.h>
> >  #include <linux/firmware-map.h>
> > +#include <linux/hugetlb.h>
> >  
> >  #include <asm/tlbflush.h>
> >  
> > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
> >  }
> >  
> >  /*
> > - * Scanning pfn is much easier than scanning lru list.
> > - * Scan pfn from start to end and Find LRU page.
> > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
> > + * and hugepages). We scan pfn because it's much easier than scanning over
> > + * linked list. This function returns the pfn of the first found movable
> > + * page if it's found, otherwise 0.
> >   */
> > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> >  {
> >  	unsigned long pfn;
> >  	struct page *page;
> > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> >  			page = pfn_to_page(pfn);
> >  			if (PageLRU(page))
> >  				return pfn;
> > +			if (PageHuge(page)) {
> > +				if (is_hugepage_movable(page))
> > +					return pfn;
> > +				else
> > +					pfn += (1 << compound_order(page)) - 1;
> > +			}
> 
> scan_lru_pages's name gets really confusing after this change because
> hugetlb pages are not on the LRU. Maybe it would be good to rename it.
Yes, and that's done in the chunk right above.
> 
> >  		}
> >  	}
> >  	return 0;
> > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> >  		page = pfn_to_page(pfn);
> >  		if (!get_page_unless_zero(page))
> >  			continue;
> 
> All tail pages have 0 reference count (according to prep_compound_page)
> so they would be skipped anyway. This makes the below pfn tweaks
> pointless.
I was totally mistaken about what we should do here, sorry. If we call
do_migrate_range() for a 1GB hugepage, we should return an error (maybe -EBUSY)
instead of just skipping it, otherwise the caller __offline_pages() repeats
'goto repeat' until timeout. In order to do that, we had better insert the
if (PageHuge) block before getting the refcount. And ...
> > +		if (PageHuge(page)) {
> > +			/*
> > +			 * Larger hugepage (1GB for x86_64) is larger than
> > +			 * memory block, so pfn scan can start at the tail
> > +			 * page of larger hugepage. In such case,
> > +			 * we simply skip the hugepage and move the cursor
> > +			 * to the last tail page.
> > +			 */
> > +			if (PageTail(page)) {
> > +				struct page *head = compound_head(page);
> > +				pfn = page_to_pfn(head) +
> > +					(1 << compound_order(head)) - 1;
> > +				put_page(page);
> > +				continue;
> > +			}
> > +			pfn = (1 << compound_order(page)) - 1;
> > +			if (huge_page_size(page_hstate(page)) != PMD_SIZE) {
> > +				put_page(page);
> > +				continue;
> > +			}
> 
> There might be other hugepage sizes which fit into memblock so this test
> doesn't seem right.
Yes, so compound_order(head) > PFN_SECTION_SHIFT would be better.
I'll replace this chunk with the following if I don't get any other
suggestions.
@@ -1017,6 +1026,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
+
+		if (PageHuge(page)) {
+			struct page *head = compound_head(page);
+			pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1;
+			if (compound_order(head) > PFN_SECTION_SHIFT) {
+				ret = -EBUSY;
+				break;
+			}
+			if (!get_page_unless_zero(page))
+				continue;
+			list_move_tail(&head->lru, &source);
+			move_pages -= 1 << compound_order(head);
+			continue;
+		}
+
 		if (!get_page_unless_zero(page))
 			continue;
 		/*
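The control flow of the chunk above can be modelled in userspace roughly as follows. This is only a sketch: MODEL_SECTION_SHIFT, model_page and model_collect_hugepages() are hypothetical, and the model pins the head page (Michal's comment above notes that tail pages carry a zero reference count), which is a choice made for the model rather than a statement about the posted code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SECTION_SHIFT 3  /* hypothetical: 8 pages per memory section */

/* Hypothetical per-pfn descriptor; not the kernel's struct page. */
struct model_page {
	bool huge;           /* part of a hugepage (head or tail) */
	size_t head_pfn;     /* pfn of the head page, valid when huge */
	unsigned int order;  /* hugepage order, stored on the head */
	int refcount;        /* stored on the head; 0 means free */
};

/*
 * Walk [start_pfn, end_pfn) and record in out[] the head pfn of every
 * in-use hugepage that should be queued for migration.
 * Returns the number queued, or -EBUSY if a hugepage larger than a
 * memory section is found (the caller should give up, not retry).
 * A real caller would also drop the references already taken on error.
 */
static long model_collect_hugepages(struct model_page *map,
				    size_t start_pfn, size_t end_pfn,
				    size_t *out)
{
	long queued = 0;

	for (size_t pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct model_page *page = &map[pfn];
		struct model_page *head;

		if (!page->huge)
			continue;
		head = &map[page->head_pfn];
		if (head->order > MODEL_SECTION_SHIFT)
			return -EBUSY;
		/* Move the cursor to the last tail; the loop steps past it. */
		pfn = page->head_pfn + ((size_t)1 << head->order) - 1;
		if (head->refcount == 0)
			continue;  /* free hugepage: nothing to migrate */
		head->refcount++;  /* pin the page while it is queued */
		out[queued++] = page->head_pfn;
	}
	return queued;
}
```

Starting the walk inside a hugepage still queues it once through its head, and an oversized hugepage makes the walk fail immediately instead of being silently skipped.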
Thanks,
Naoya
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-03-19  7:11       ` Michal Hocko
@ 2013-03-20  6:12         ` Naoya Horiguchi
  2013-03-20  7:41           ` Michal Hocko
  0 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-20  6:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote:
> On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote:
> > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
...
> > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > > >  	pmd = pmd_offset(pud, addr);
> > > >  	do {
> > > >  		next = pmd_addr_end(addr, end);
> > > > +		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> > > 
> > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> > > sufficient?
> > 
> > I think we need both check here because if we use only pmd_huge(),
> > pmd for thp goes into this branch wrongly. 
> 
> Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it
> obviously checks only _PAGE_PSE same as pmd_large() which is really
> unfortunate and confusing. Can we make it hugetlb specific?
I agree that we had better fix this confusion.
What pmd_huge() (or pmd_large() on some architectures) does is just check
whether a given pmd points to a huge/large page or not.
It does not say which type of hugepage it is.
So it shouldn't be used to decide whether the hugepage belongs to hugetlbfs or not.
I think it would be better to introduce pmd_hugetlb(), which takes a pmd and a vma
as arguments and returns true only for a hugetlbfs pmd.
The pmd_hugetlb() check should come before the pmd_trans_huge() check because
pmd_trans_huge() implicitly assumes that the vma which covers the virtual
address of a given pmd is not a hugetlbfs vma.
I'm interested in this cleanup, so I will work on it after this patchset.
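For illustration only, the proposed helper could look roughly like the userspace model below. It is a sketch, not kernel code: every model_* name, both flag values, and the two struct types are stand-ins invented for the example, and pmd_hugetlb() itself is only a proposal at this point in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: these flag values and types are stand-ins,
 * not the kernel's definitions. */
#define MODEL_PAGE_PSE   (1u << 7)   /* "this entry maps a large page" */
#define MODEL_VM_HUGETLB (1u << 0)   /* "this vma is backed by hugetlbfs" */

struct model_pmd { uint64_t val; };
struct model_vma { unsigned int flags; };

/* Like pmd_huge(): true for any large mapping, hugetlbfs or THP. */
static bool model_pmd_huge(struct model_pmd pmd)
{
	return (pmd.val & MODEL_PAGE_PSE) != 0;
}

/* Like is_vm_hugetlb_page(): only the vma tells the two kinds apart. */
static bool model_is_vm_hugetlb_page(const struct model_vma *vma)
{
	assert(vma != NULL);
	return (vma->flags & MODEL_VM_HUGETLB) != 0;
}

/* Hypothetical pmd_hugetlb(): true only for a hugetlbfs pmd. */
static bool model_pmd_hugetlb(struct model_pmd pmd, const struct model_vma *vma)
{
	return model_pmd_huge(pmd) && model_is_vm_hugetlb_page(vma);
}
```

A thp pmd has the large-page bit set but sits in a non-hugetlbfs vma, so the model returns false for it, which is the case that pmd_huge() alone gets wrong.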
Thanks,
Naoya
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-03-20  6:12         ` Naoya Horiguchi
@ 2013-03-20  7:41           ` Michal Hocko
  0 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2013-03-20  7:41 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed 20-03-13 02:12:54, Naoya Horiguchi wrote:
> On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote:
> > On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote:
> > > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> > > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
> ...
> > > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > > > >  	pmd = pmd_offset(pud, addr);
> > > > >  	do {
> > > > >  		next = pmd_addr_end(addr, end);
> > > > > +		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> > > > 
> > > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> > > > sufficient?
> > > 
> > > I think we need both check here because if we use only pmd_huge(),
> > > pmd for thp goes into this branch wrongly. 
> > 
> > Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it
> > obviously checks only _PAGE_PSE same as pmd_large() which is really
> > unfortunate and confusing. Can we make it hugetlb specific?
> 
> I agree that we had better fix this confusion.
> 
> What pmd_huge() (or pmd_large() in some architectures) does is just
> checking whether a given pmd is pointing to huge/large page or not.
> It does not say which type of hugepage it is.
> So it shouldn't be used to decide whether the hugepage are hugetlbfs or not.
> I think it would be better to introduce pmd_hugetlb() which has pmd and vma
> as arguments and returns true only for hugetlbfs pmd.
> Checking pmd_hugetlb() should come before checking pmd_trans_huge() because
> pmd_trans_huge() implicitly assumes that the vma which covers the virtual
> address of a given pmd is not hugetlbfs vma.
> 
> I'm interested in this cleanup, so will work on it after this patchset.
pmd_huge is used in only a few places, so the change shouldn't be very big. On the
other hand, you do not always have the vma available, so it gets tricky.
Thanks
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-03-20  3:55     ` Naoya Horiguchi
@ 2013-03-20  7:57       ` Michal Hocko
  0 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2013-03-20  7:57 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Tue 19-03-13 23:55:33, Naoya Horiguchi wrote:
> On Mon, Mar 18, 2013 at 05:07:37PM +0100, Michal Hocko wrote:
> > On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote:
[...]
> > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
> > > over them because it's larger than memory block. So we now simply leave
> > > it to fail as it is.
> > 
> > What we could do is to check whether there is a free gb huge page on
> > other node and migrate there.
> 
> Correct, and 1GB page migration needs more code in migration core code
> (mainly it's related to migration entry in pud) and enough testing,
> so I want to do it in separate patchset.
Sure, this was just a note that it is achievable, not that it has to be
done in this patchset.
[...]
> > > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> > > index ccf9995..c28e6c9 100644
> > > --- v3.8.orig/mm/hugetlb.c
> > > +++ v3.8/mm/hugetlb.c
> > > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> > >  	return ret;
> > >  }
> > >  
> > > +/* Dissolve a given free hugepage into free pages. */
> > > +void dissolve_free_huge_page(struct page *page)
> > > +{
> > > +	if (PageHuge(page) && !page_count(page)) {
> > 
> > Could you clarify why you are checking page_count here? I assume it is to
> > make sure the page is free but what prevents it being increased before
> > you take hugetlb_lock?
> 
> There's nothing to prevent it, so it's not safe to check refcount outside
> hugetlb_lock.
OK, so the lock has to be moved up.
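As a rough userspace sketch of "moving the lock up": the refcount test and the removal from the pool sit in one critical section. The model_* names, the toy spinlock standing in for hugetlb_lock, and the struct layout are all assumptions made for the example, not the patch's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for hugetlb_lock; not the kernel's spinlock. */
static atomic_flag model_hugetlb_lock = ATOMIC_FLAG_INIT;

struct model_hugepage {
	int refcount;       /* 0 means nobody is using the page */
	bool in_free_pool;  /* true while the page sits in the free pool */
};

static void model_lock(void)
{
	while (atomic_flag_test_and_set(&model_hugetlb_lock))
		; /* spin until the lock is free */
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_hugetlb_lock);
}

/* Returns true if the page was dissolved. The refcount test and the
 * removal from the pool happen under the same lock, so an allocation
 * that also takes the lock cannot slip in between them. */
static bool model_dissolve_free_huge_page(struct model_hugepage *page)
{
	bool dissolved = false;

	assert(page != NULL);
	model_lock();
	if (page->in_free_pool && page->refcount == 0) {
		page->in_free_pool = false;
		dissolved = true;
	}
	model_unlock();
	return dissolved;
}
```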
[...]
> > > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c
> > > index d04ed87..6418de2 100644
> > > --- v3.8.orig/mm/memory_hotplug.c
> > > +++ v3.8/mm/memory_hotplug.c
> > > @@ -29,6 +29,7 @@
> > >  #include <linux/suspend.h>
> > >  #include <linux/mm_inline.h>
> > >  #include <linux/firmware-map.h>
> > > +#include <linux/hugetlb.h>
> > >  
> > >  #include <asm/tlbflush.h>
> > >  
> > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
> > >  }
> > >  
> > >  /*
> > > - * Scanning pfn is much easier than scanning lru list.
> > > - * Scan pfn from start to end and Find LRU page.
> > > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
> > > + * and hugepages). We scan pfn because it's much easier than scanning over
> > > + * linked list. This function returns the pfn of the first found movable
> > > + * page if it's found, otherwise 0.
> > >   */
> > > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> > >  {
> > >  	unsigned long pfn;
> > >  	struct page *page;
> > > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > >  			page = pfn_to_page(pfn);
> > >  			if (PageLRU(page))
> > >  				return pfn;
> > > +			if (PageHuge(page)) {
> > > +				if (is_hugepage_movable(page))
> > > +					return pfn;
> > > +				else
> > > +					pfn += (1 << compound_order(page)) - 1;
> > > +			}
> > 
> > scan_lru_pages's name gets really confusing after this change because
> > hugetlb pages are not on the LRU. Maybe it would be good to rename it.
> 
> Yes, and that's done in right above chunk.
bahh, I am blind. I got confused by the name in the hunk header. Sorry
about that.
> 
> > 
> > >  		}
> > >  	}
> > >  	return 0;
> > > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> > >  		page = pfn_to_page(pfn);
> > >  		if (!get_page_unless_zero(page))
> > >  			continue;
> > 
> > All tail pages have 0 reference count (according to prep_compound_page)
> > so they would be skipped anyway. This makes the below pfn tweaks
> > pointless.
> 
> I was totally mistaken about what we should do here, sorry. If we call
> do_migrate_range() for 1GB hugepage, we should return with error (maybe -EBUSY)
> instead of just skipping it, otherwise the caller __offline_pages() repeats
> 'goto repeat' until timeout. In order to do that, we had better insert
> if(PageHuge) block before getting refcount. And ...
> 
> > > +		if (PageHuge(page)) {
> > > +			/*
> > > +			 * Larger hugepage (1GB for x86_64) is larger than
> > > +			 * memory block, so pfn scan can start at the tail
> > > +			 * page of larger hugepage. In such case,
> > > +			 * we simply skip the hugepage and move the cursor
> > > +			 * to the last tail page.
> > > +			 */
> > > +			if (PageTail(page)) {
> > > +				struct page *head = compound_head(page);
> > > +				pfn = page_to_pfn(head) +
> > > +					(1 << compound_order(head)) - 1;
> > > +				put_page(page);
> > > +				continue;
> > > +			}
> > > +			pfn = (1 << compound_order(page)) - 1;
> > > +			if (huge_page_size(page_hstate(page)) != PMD_SIZE) {
> > > +				put_page(page);
> > > +				continue;
> > > +			}
> > 
> > There might be other hugepage sizes which fit into memblock so this test
> > doesn't seem right.
> 
> yes, so compound_order(head) > PFN_SECTION_SHIFT would be better.
I would rather see compound_order(head) < MAX_ORDER to be more coupled
with the allocator.
> I'll replace this chunk with the following if I don't get any other
> suggestion.
> 
> @@ -1017,6 +1026,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		if (!pfn_valid(pfn))
>  			continue;
>  		page = pfn_to_page(pfn);
> +
> +		if (PageHuge(page)) {
> +			struct page *head = compound_head(page);
> +			pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1;
I do not think this is safe without an elevated ref count. Your page
might be on the way to be freed. So you need to put get_page_unless_zero
before compound_order check.
Besides that, I do not see much point in optimizing this path at the cost
of code complexity. Sure, we would call get_page_unless_zero
pointlessly for all tail pages, but this is hardly a hot path.
> +			if (compound_order(head) > PFN_SECTION_SHIFT) {
> +				ret = -EBUSY;
> +				break;
> +			}
> +			if (!get_page_unless_zero(page))
Should be head.
> +				continue;
> +			list_move_tail(&head->lru, &source);
> +			move_pages -= 1 << compound_order(head);
> +			continue;
> +		}
> +
>  		if (!get_page_unless_zero(page))
>  			continue;
>  		/*
> 
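The pfn arithmetic in the hunk above can be checked in isolation with a small userspace sketch. The model_* names are invented for the example, and MODEL_PFN_SECTION_SHIFT is an assumed value (x86_64 with 128MB sections and 4KB pages); the sketch uses the comparison from the proposed hunk and does not model the reference counting that the review asks for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed value for this sketch; the real PFN_SECTION_SHIFT depends on
 * architecture and configuration. */
#define MODEL_PFN_SECTION_SHIFT 15

/* Last pfn covered by a compound page whose head is at head_pfn. */
static unsigned long model_last_pfn(unsigned long head_pfn, unsigned int order)
{
	return head_pfn + (1UL << order) - 1;
}

/* Returns -EBUSY for a hugepage larger than a memory section. Otherwise
 * returns 0 and stores the pfn at which the scan cursor should be left,
 * so that the loop's own increment steps past the whole hugepage. */
static int model_handle_hugepage(unsigned long head_pfn, unsigned int order,
				 unsigned long *cursor)
{
	assert(cursor != NULL);
	if (order > MODEL_PFN_SECTION_SHIFT)
		return -EBUSY;
	*cursor = model_last_pfn(head_pfn, order);
	return 0;
}
```

With these assumptions a 2MB hugepage (order 9) is skipped over in one step, while a 1GB hugepage (order 18) makes the scan fail instead of retrying until timeout.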
> Thanks,
> Naoya
> 
-- 
Michal Hocko
SUSE Labs
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-03-19 23:43 ` [RFC][PATCH 0/9] extend hugepage migration Simon Jeons
@ 2013-03-20 21:35   ` Naoya Horiguchi
  2013-03-20 23:49     ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-20 21:35 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed, Mar 20, 2013 at 07:43:44AM +0800, Simon Jeons wrote:
...
> >Easy patch access:
> >   git@github.com:Naoya-Horiguchi/linux.git
> >   branch:extend_hugepage_migration
> >
> >Test code:
> >   git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git
> 
> git clone
> git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git
> Cloning into test_hugepage_migration_extension...
> Permission denied (publickey).
> fatal: The remote end hung up unexpectedly
Sorry, wrong url.
git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git
or
https://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git
should work.
Thanks,
Naoya
* Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
  2013-03-19 23:57   ` Simon Jeons
@ 2013-03-20 21:53     ` Naoya Horiguchi
  2013-03-20 23:36       ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-20 21:53 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote:
> Hi Naoya,
> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
> >When we have a page fault for the address which is backed by a hugepage
> >under migration, the kernel can't wait correctly until the migration
> >finishes. This is because pte_offset_map_lock() can't get a correct
> 
> It seems that current hugetlb_fault still wait hugetlb page under
> migration, how can it work without lock 2MB memory?
hugetlb_fault() does call migration_entry_wait(), but it returns immediately.
So the page fault happens over and over again until the migration completes.
IOW, migration_entry_wait() is currently broken for hugepages and doesn't work
as expected.
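A toy model of the symptom, purely for illustration: it counts how many faults a task takes when the wait returns at once versus when it really blocks. Nothing here is kernel code; the model_* names and the "one tick per retry" accounting are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a migration that finishes after ticks_left units of time. */
struct model_migration { int ticks_left; };

/* Stand-in for the broken behaviour: the wait returns without sleeping. */
static void model_wait_returns_immediately(struct model_migration *m)
{
	(void)m;
}

/* Stand-in for a working wait: block until the migration is done. */
static void model_wait_until_done(struct model_migration *m)
{
	m->ticks_left = 0;
}

/* Count how many page faults the task takes before it can proceed. */
static int model_count_faults(struct model_migration *m,
			      void (*wait)(struct model_migration *))
{
	int faults = 0;

	assert(m != NULL && wait != NULL);
	do {
		faults++;                /* the fault hits the migration entry */
		wait(m);
		if (m->ticks_left > 0)
			m->ticks_left--;     /* time passes before the retry */
	} while (m->ticks_left > 0);
	return faults;
}
```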
Thanks,
Naoya
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-03-20  0:31       ` Simon Jeons
@ 2013-03-20 21:59         ` Naoya Horiguchi
  2013-03-21  0:06           ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-20 21:59 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Michal Hocko, linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote:
...
> >>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> >>> index e2df1c1..8627135 100644
> >>> --- v3.8.orig/mm/mempolicy.c
> >>> +++ v3.8/mm/mempolicy.c
> >>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> >>>  	return addr != end;
> >>>  }
> >>>  
> >>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> >>> +		const nodemask_t *nodes, unsigned long flags,
> >>> +				    void *private)
> >>> +{
> >>> +#ifdef CONFIG_HUGETLB_PAGE
> >>> +	int nid;
> >>> +	struct page *page;
> >>> +
> >>> +	spin_lock(&vma->vm_mm->page_table_lock);
> >>> +	page = pte_page(huge_ptep_get((pte_t *)pmd));
> >>> +	spin_unlock(&vma->vm_mm->page_table_lock);
> >> I am a bit confused why page_table_lock is used here and why it doesn't
> >> cover the page usage.
> > I expected this function to do the same for pmd as check_pte_range() does
> > for pte, but the above code didn't do it. I should've put spin_unlock
> > below migrate_hugepage_add(). Sorry for the confusion.
> 
> I still confuse! Could you explain more in details?
With the above code, check_hugetlb_pmd_range() checks page_mapcount
outside the page table lock, but mapcount can be decremented by
__unmap_hugepage_range(), so there's a race.
__unmap_hugepage_range() calls page_remove_rmap() inside the page table lock,
so we can avoid this race by doing all of check_hugetlb_pmd_range()'s work
inside the page table lock.
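The intended shape can be sketched in userspace as below. This is a single-threaded model with invented model_* names; the "queue only when mapcount is exactly 1" rule is an assumption of the example, and the lock is just a flag that lets the helpers assert they are called with it held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mm { bool page_table_locked; };
struct model_page { int mapcount; bool queued; };

static void model_spin_lock(struct model_mm *mm)
{
	assert(mm != NULL && !mm->page_table_locked);
	mm->page_table_locked = true;
}

static void model_spin_unlock(struct model_mm *mm)
{
	assert(mm != NULL && mm->page_table_locked);
	mm->page_table_locked = false;
}

/* Only valid with the lock held, because an unmap changes mapcount
 * under that same lock. */
static int model_page_mapcount(const struct model_mm *mm,
			       const struct model_page *page)
{
	assert(mm->page_table_locked);
	return page->mapcount;
}

/* The mapcount check and the queueing sit in one critical section, so
 * the value checked is still the value in force when the page is queued. */
static bool model_check_hugetlb_pmd(struct model_mm *mm, struct model_page *page)
{
	bool queued = false;

	model_spin_lock(mm);
	if (model_page_mapcount(mm, page) == 1) {
		page->queued = true;
		queued = true;
	}
	model_spin_unlock(mm);
	return queued;
}
```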
Thanks,
Naoya
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-03-20  1:03   ` Simon Jeons
@ 2013-03-20 22:05     ` Naoya Horiguchi
  2013-03-20 23:55       ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Naoya Horiguchi @ 2013-03-20 22:05 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
On Wed, Mar 20, 2013 at 09:03:20AM +0800, Simon Jeons wrote:
> Hi Naoya,
> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
> >Currently we can't offline memory blocks which contain hugepages because
> >a hugepage is considered as an unmovable page. But now with this patch
> >series, a hugepage has become movable, so by using hugepage migration we
> >can offline such memory blocks.
> >
> >What's different from other users of hugepage migration is that we need
> >to decompose all the hugepages inside the target memory block into free
> 
> For other hugepage migration users, hugepage should be freed to
> hugepage_freelists after migration, but why I don't see any codes do
> this?
The source hugepages which are migrated by NUMA related system calls
(migrate_pages(2), move_pages(2), and mbind(2)) are still usable,
so we simply free them into the free hugepage pool.
OTOH, the source hugepages migrated by memory hotremove should not be
reused, because users of memory hotremove want to remove the memory
from the system. So we need to forcibly free such hugepages into
buddy pages, otherwise memory offlining doesn't work.
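The two outcomes for a source hugepage can be pictured with a small bookkeeping model. It is only a sketch: the model_* names, the enum, and the counters are invented for the example and do not correspond to the kernel's hstate fields one to one.

```c
#include <assert.h>
#include <stddef.h>

enum model_migrate_reason { MODEL_NUMA_SYSCALL, MODEL_MEMORY_HOTREMOVE };

struct model_hstate {
	unsigned int order;             /* log2 of base pages per hugepage */
	unsigned long nr_huge_pages;    /* hugepages owned by the pool */
	unsigned long free_huge_pages;  /* of those, how many are free */
	unsigned long buddy_pages;      /* base pages handed back to the buddy allocator */
};

/* Release the source hugepage after its contents have been migrated. */
static void model_release_source_hugepage(struct model_hstate *h,
					   enum model_migrate_reason reason)
{
	assert(h != NULL && h->nr_huge_pages > h->free_huge_pages);

	/* Step 1: the page becomes free in the hugepage pool. */
	h->free_huge_pages++;

	if (reason == MODEL_MEMORY_HOTREMOVE) {
		/* Step 2: dissolve it so the memory block holds no hugepage. */
		h->free_huge_pages--;
		h->nr_huge_pages--;
		h->buddy_pages += 1UL << h->order;
	}
}
```

In the NUMA system call case the pool keeps its size and gains a free page; in the hotremove case the pool shrinks by one and 2^order base pages go back to the buddy allocator.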
Thanks,
Naoya
* Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
  2013-03-20 21:53     ` Naoya Horiguchi
@ 2013-03-20 23:36       ` Simon Jeons
  2013-04-04  4:57         ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-03-20 23:36 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 03/21/2013 05:53 AM, Naoya Horiguchi wrote:
> On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote:
>> Hi Naoya,
>> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
>>> When we have a page fault for the address which is backed by a hugepage
>>> under migration, the kernel can't wait correctly until the migration
>>> finishes. This is because pte_offset_map_lock() can't get a correct
>> It seems that current hugetlb_fault still wait hugetlb page under
>> migration, how can it work without lock 2MB memory?
> Hugetlb_fault() does call migration_entry_wait(), but returns immediately.
Could you point out to me which code in migration_entry_wait()
makes it return immediately?
> So page fault happens over and over again until the migration completes.
> IOW, migration_entry_wait() is now broken for hugepage and doesn't work
> as expected.
>
> Thanks,
> Naoya
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-03-20 21:35   ` Naoya Horiguchi
@ 2013-03-20 23:49     ` Simon Jeons
  2013-03-21 12:56       ` Michal Hocko
  0 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-03-20 23:49 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 03/21/2013 05:35 AM, Naoya Horiguchi wrote:
> On Wed, Mar 20, 2013 at 07:43:44AM +0800, Simon Jeons wrote:
> ...
>>> Easy patch access:
>>>   git@github.com:Naoya-Horiguchi/linux.git
>>>   branch:extend_hugepage_migration
>>>
>>> Test code:
>>>   git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git
>> git clone
>> git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git
>> Cloning into test_hugepage_migration_extension...
>> Permission denied (publickey).
>> fatal: The remote end hung up unexpectedly
> Sorry, wrong url.
> git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git
> or
> https://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git
> should work.
When I hack arch/x86/mm/hugetlbpage.c like this:
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index ae1aa71..87f34ee 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
 
-#ifdef CONFIG_X86_64
 static __init int setup_hugepagesz(char *opt)
 {
 	unsigned long ps = memparse(opt, &opt);
 	if (ps == PMD_SIZE) {
 		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
-	} else if (ps == PUD_SIZE && cpu_has_gbpages) {
-		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+	} else if (ps == PUD_SIZE) {
+		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
 	} else {
 		printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
 			ps >> 20);
I set the boot parameters hugepagesz=1G hugepages=10, and then I got ten 32MB huge pages.
What is the difference between these hacked pages and normal
huge pages?
>
> Thanks,
> Naoya
* Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
  2013-03-20 22:05     ` Naoya Horiguchi
@ 2013-03-20 23:55       ` Simon Jeons
  0 siblings, 0 replies; 55+ messages in thread
From: Simon Jeons @ 2013-03-20 23:55 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 03/21/2013 06:05 AM, Naoya Horiguchi wrote:
> On Wed, Mar 20, 2013 at 09:03:20AM +0800, Simon Jeons wrote:
>> Hi Naoya,
>> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
>>> Currently we can't offline memory blocks which contain hugepages because
>>> a hugepage is considered as an unmovable page. But now with this patch
>>> series, a hugepage has become movable, so by using hugepage migration we
>>> can offline such memory blocks.
>>>
>>> What's different from other users of hugepage migration is that we need
>>> to decompose all the hugepages inside the target memory block into free
>> For other hugepage migration users, hugepage should be freed to
>> hugepage_freelists after migration, but why I don't see any codes do
>> this?
> The source hugepages which are migrated by NUMA related system calls
> (migrate_pages(2), move_pages(2), and mbind(2)) are still useable,
> so we simply free them into free hugepage pool.
It seems that you misunderstood what confuses me. I can't find the code that
frees huge pages to the hugepage pool. Could you point it out to me?
> OTOH, the source hugepages migrated by memory hotremove should not be
> reusable, because users of memory hotremove want to remove the memory
> from the system. So we need to free such hugepages forcibly into the
> buddy pages, otherwise memory offining doesn't work.
>
> Thanks,
> Naoya
* Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
  2013-03-20 21:59         ` Naoya Horiguchi
@ 2013-03-21  0:06           ` Simon Jeons
  0 siblings, 0 replies; 55+ messages in thread
From: Simon Jeons @ 2013-03-21  0:06 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Michal Hocko, linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Naoya,
On 03/21/2013 05:59 AM, Naoya Horiguchi wrote:
> On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote:
> ...
>>>>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
>>>>> index e2df1c1..8627135 100644
>>>>> --- v3.8.orig/mm/mempolicy.c
>>>>> +++ v3.8/mm/mempolicy.c
>>>>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>>>>>  	return addr != end;
>>>>>  }
>>>>>  
>>>>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
>>>>> +		const nodemask_t *nodes, unsigned long flags,
>>>>> +				    void *private)
>>>>> +{
>>>>> +#ifdef CONFIG_HUGETLB_PAGE
>>>>> +	int nid;
>>>>> +	struct page *page;
>>>>> +
>>>>> +	spin_lock(&vma->vm_mm->page_table_lock);
>>>>> +	page = pte_page(huge_ptep_get((pte_t *)pmd));
>>>>> +	spin_unlock(&vma->vm_mm->page_table_lock);
>>>> I am a bit confused why page_table_lock is used here and why it doesn't
>>>> cover the page usage.
>>> I expected this function to do the same for pmd as check_pte_range() does
>>> for pte, but the above code didn't do it. I should've put spin_unlock
>>> below migrate_hugepage_add(). Sorry for the confusion.
>> I still confuse! Could you explain more in details?
> With the above code, check_hugetlb_pmd_range() checks page_mapcount
> outside the page table lock, but mapcount can be decremented by
> __unmap_hugepage_range(), so there's a race.
> __unmap_hugepage_range() calls page_remove_rmap() inside page table lock,
> so we can avoid this race by doing whole check_hugetlb_pmd_range()'s work
> inside the page table lock.
Why do you use page_table_lock instead of the split ptlock to protect the 2MB page?
>
> Thanks,
> Naoya
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-03-20 23:49     ` Simon Jeons
@ 2013-03-21 12:56       ` Michal Hocko
  2013-03-21 23:46         ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-03-21 12:56 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, Mel Gorman,
	Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel
On Thu 21-03-13 07:49:48, Simon Jeons wrote:
[...]
> When I hacking arch/x86/mm/hugetlbpage.c like this,
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index ae1aa71..87f34ee 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file,
> unsigned long addr,
> 
> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
> 
> -#ifdef CONFIG_X86_64
> static __init int setup_hugepagesz(char *opt)
> {
> unsigned long ps = memparse(opt, &opt);
> if (ps == PMD_SIZE) {
> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> - } else if (ps == PUD_SIZE && cpu_has_gbpages) {
> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> + } else if (ps == PUD_SIZE) {
> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
> } else {
> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
> ps >> 20);
> 
> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages.
> What's the difference between these pages which I hacking and normal
> huge pages?
How is this related to the patch set?
Please _stop_ diverting the discussion to unrelated topics!
Nothing personal but this is just wasting our time.
-- 
Michal Hocko
SUSE Labs
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-03-21 12:56       ` Michal Hocko
@ 2013-03-21 23:46         ` Simon Jeons
       [not found]           ` <20130322081532.GC31457@dhcp22.suse.cz>
  0 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-03-21 23:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, Mel Gorman,
	Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel
Hi Michal,
On 03/21/2013 08:56 PM, Michal Hocko wrote:
> On Thu 21-03-13 07:49:48, Simon Jeons wrote:
> [...]
>> When I hacking arch/x86/mm/hugetlbpage.c like this,
>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>> index ae1aa71..87f34ee 100644
>> --- a/arch/x86/mm/hugetlbpage.c
>> +++ b/arch/x86/mm/hugetlbpage.c
>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file,
>> unsigned long addr,
>>
>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
>>
>> -#ifdef CONFIG_X86_64
>> static __init int setup_hugepagesz(char *opt)
>> {
>> unsigned long ps = memparse(opt, &opt);
>> if (ps == PMD_SIZE) {
>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) {
>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
>> + } else if (ps == PUD_SIZE) {
>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
>> } else {
>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
>> ps >> 20);
>>
>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages.
>> What's the difference between these pages which I hacking and normal
>> huge pages?
> How is this related to the patch set?
> Please _stop_ distracting discussion to unrelated topics!
>
> Nothing personal but this is just wasting our time.
Sorry, Michal, my bad.
Btw, could you answer this question for me? Very sorry to waste your time.
* Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
  2013-03-20 23:36       ` Simon Jeons
@ 2013-04-04  4:57         ` Simon Jeons
  0 siblings, 0 replies; 55+ messages in thread
From: Simon Jeons @ 2013-04-04  4:57 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, Mel Gorman, Hugh Dickins,
	KOSAKI Motohiro, Andi Kleen, linux-kernel
Ping!
On 03/21/2013 07:36 AM, Simon Jeons wrote:
> Hi Naoya,
> On 03/21/2013 05:53 AM, Naoya Horiguchi wrote:
>> On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote:
>>> Hi Naoya,
>>> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote:
>>>> When we have a page fault for the address which is backed by a hugepage
>>>> under migration, the kernel can't wait correctly until the migration
>>>> finishes. This is because pte_offset_map_lock() can't get a correct
>>> It seems that current hugetlb_fault still wait hugetlb page under
>>> migration, how can it work without lock 2MB memory?
>> Hugetlb_fault() does call migration_entry_wait(), but returns immediately.
> Could you point out to me which code in function migration_entry_wait()
> lead to return immediately?
>
>> So page fault happens over and over again until the migration completes.
>> IOW, migration_entry_wait() is now broken for hugepage and doesn't work
>> as expected.
>>
>> Thanks,
>> Naoya
* Re: [RFC][PATCH 0/9] extend hugepage migration
       [not found]           ` <20130322081532.GC31457@dhcp22.suse.cz>
@ 2013-04-05  1:14             ` Simon Jeons
  2013-04-05  8:08               ` Michal Hocko
  0 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-04-05  1:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Memory Management List, Andrew Morton, Mel Gorman,
	Hugh Dickins, KOSAKI Motohiro, Andi Kleen,
	Linux kernel Mailing List, Naoya Horiguchi, David Rientjes
Hi Michal,
On 03/22/2013 04:15 PM, Michal Hocko wrote:
> [getting off-list]
>
> On Fri 22-03-13 07:46:32, Simon Jeons wrote:
>> Hi Michal,
>> On 03/21/2013 08:56 PM, Michal Hocko wrote:
>>> On Thu 21-03-13 07:49:48, Simon Jeons wrote:
>>> [...]
>>>> When I hacking arch/x86/mm/hugetlbpage.c like this,
>>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>>>> index ae1aa71..87f34ee 100644
>>>> --- a/arch/x86/mm/hugetlbpage.c
>>>> +++ b/arch/x86/mm/hugetlbpage.c
>>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file,
>>>> unsigned long addr,
>>>>
>>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
>>>>
>>>> -#ifdef CONFIG_X86_64
>>>> static __init int setup_hugepagesz(char *opt)
>>>> {
>>>> unsigned long ps = memparse(opt, &opt);
>>>> if (ps == PMD_SIZE) {
>>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
>>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) {
>>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
>>>> + } else if (ps == PUD_SIZE) {
>>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
>>>> } else {
>>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
>>>> ps >> 20);
>>>>
>>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages.
>>>> What's the difference between these pages which I hacking and normal
>>>> huge pages?
>>> How is this related to the patch set?
>>> Please _stop_ distracting discussion to unrelated topics!
>>>
>>> Nothing personal but this is just wasting our time.
>> Sorry kindly Michal, my bad.
>> Btw, could you explain this question for me? very sorry waste your time.
> Your CPU has to support GB pages. You have removed cpu_has_gbpages test
> and added a hstate for order 13 pages which is a weird number on its
> own (32MB) because there is no page table level to support them.
But after the hack, /sys/kernel/mm/hugepages/hugepages-* exists and
shows the same number of 32MB huge pages that I set in the boot
parameter. If there is no page table level to support them, how can
they be present? The hack works for me on Ubuntu, but not on Fedora.
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-04-05  1:14             ` Simon Jeons
@ 2013-04-05  8:08               ` Michal Hocko
  2013-04-05  9:00                 ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-04-05  8:08 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Linux Memory Management List, Andrew Morton, Mel Gorman,
	Hugh Dickins, KOSAKI Motohiro, Andi Kleen,
	Linux kernel Mailing List, Naoya Horiguchi, David Rientjes
On Fri 05-04-13 09:14:58, Simon Jeons wrote:
> Hi Michal,
> On 03/22/2013 04:15 PM, Michal Hocko wrote:
> >[getting off-list]
> >
> >On Fri 22-03-13 07:46:32, Simon Jeons wrote:
> >>Hi Michal,
> >>On 03/21/2013 08:56 PM, Michal Hocko wrote:
> >>>On Thu 21-03-13 07:49:48, Simon Jeons wrote:
> >>>[...]
> >>>>When I hacking arch/x86/mm/hugetlbpage.c like this,
> >>>>diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> >>>>index ae1aa71..87f34ee 100644
> >>>>--- a/arch/x86/mm/hugetlbpage.c
> >>>>+++ b/arch/x86/mm/hugetlbpage.c
> >>>>@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file,
> >>>>unsigned long addr,
> >>>>
> >>>>#endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
> >>>>
> >>>>-#ifdef CONFIG_X86_64
> >>>>static __init int setup_hugepagesz(char *opt)
> >>>>{
> >>>>unsigned long ps = memparse(opt, &opt);
> >>>>if (ps == PMD_SIZE) {
> >>>>hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> >>>>- } else if (ps == PUD_SIZE && cpu_has_gbpages) {
> >>>>- hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> >>>>+ } else if (ps == PUD_SIZE) {
> >>>>+ hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
> >>>>} else {
> >>>>printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
> >>>>ps >> 20);
> >>>>
> >>>>I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages.
> >>>>What's the difference between these pages which I hacking and normal
> >>>>huge pages?
> >>>How is this related to the patch set?
> >>>Please _stop_ distracting discussion to unrelated topics!
> >>>
> >>>Nothing personal but this is just wasting our time.
> >>Sorry kindly Michal, my bad.
> >>Btw, could you explain this question for me? very sorry waste your time.
> >Your CPU has to support GB pages. You have removed cpu_has_gbpages test
> >and added a hstate for order 13 pages which is a weird number on its
> >own (32MB) because there is no page table level to support them.
> 
> But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*,
> and have equal number of 32MB huge pages which I set up in boot
> parameter.
Because hugetlb_add_hstate creates an hstate for those pages and
hugetlb_init_hstates allocates them later on.
> If there is no page table level to support them, how can
> them present?
Because the hugetlb hstate handling code doesn't care about page tables,
or about how those pages are going to be mapped, _at all_. To put it
another way: nobody prevents you from allocating an order-5 page for a
single pte, but that would be pure waste. The page fault code expects
that pages of the proper size are allocated.
-- 
Michal Hocko
SUSE Labs
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-04-05  8:08               ` Michal Hocko
@ 2013-04-05  9:00                 ` Simon Jeons
  2013-04-05  9:30                   ` Michal Hocko
  0 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-04-05  9:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Memory Management List, Andrew Morton, Mel Gorman,
	Hugh Dickins, KOSAKI Motohiro, Andi Kleen,
	Linux kernel Mailing List, Naoya Horiguchi, David Rientjes
Hi Michal,
On 04/05/2013 04:08 PM, Michal Hocko wrote:
> On Fri 05-04-13 09:14:58, Simon Jeons wrote:
>> Hi Michal,
>> On 03/22/2013 04:15 PM, Michal Hocko wrote:
>>> [getting off-list]
>>>
>>> On Fri 22-03-13 07:46:32, Simon Jeons wrote:
>>>> Hi Michal,
>>>> On 03/21/2013 08:56 PM, Michal Hocko wrote:
>>>>> On Thu 21-03-13 07:49:48, Simon Jeons wrote:
>>>>> [...]
>>>>>> When I hacking arch/x86/mm/hugetlbpage.c like this,
>>>>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>>>>>> index ae1aa71..87f34ee 100644
>>>>>> --- a/arch/x86/mm/hugetlbpage.c
>>>>>> +++ b/arch/x86/mm/hugetlbpage.c
>>>>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file,
>>>>>> unsigned long addr,
>>>>>>
>>>>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
>>>>>>
>>>>>> -#ifdef CONFIG_X86_64
>>>>>> static __init int setup_hugepagesz(char *opt)
>>>>>> {
>>>>>> unsigned long ps = memparse(opt, &opt);
>>>>>> if (ps == PMD_SIZE) {
>>>>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
>>>>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) {
>>>>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
>>>>>> + } else if (ps == PUD_SIZE) {
>>>>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
>>>>>> } else {
>>>>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
>>>>>> ps >> 20);
>>>>>>
>>>>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages.
>>>>>> What's the difference between these pages which I hacking and normal
>>>>>> huge pages?
>>>>> How is this related to the patch set?
>>>>> Please _stop_ distracting discussion to unrelated topics!
>>>>>
>>>>> Nothing personal but this is just wasting our time.
>>>> Sorry kindly Michal, my bad.
>>>> Btw, could you explain this question for me? very sorry waste your time.
>>> Your CPU has to support GB pages. You have removed cpu_has_gbpages test
>>> and added a hstate for order 13 pages which is a weird number on its
>>> own (32MB) because there is no page table level to support them.
>> But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*,
>> and have equal number of 32MB huge pages which I set up in boot
>> parameter.
> because hugetlb_add_hstate creates hstate for those pages and
> hugetlb_init_hstates allocates them later on.
>
>> If there is no page table level to support them, how can
>> them present?
> Because hugetlb hstate handling code doesn't care about page tables and
> the way how those pages are going to be mapped _at all_. Or put it in
> another way. Nobody prevents you to allocate order-5 page for a single
> pte but that would be a pure waste. Page fault code expects that pages
> with a proper size are allocated.
Do you mean that a 32MB page will be mapped by a single pmd, which is
meant to map a 2MB page?
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-04-05  9:00                 ` Simon Jeons
@ 2013-04-05  9:30                   ` Michal Hocko
  2013-04-07  0:32                     ` Simon Jeons
  0 siblings, 1 reply; 55+ messages in thread
From: Michal Hocko @ 2013-04-05  9:30 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Linux Memory Management List, Andrew Morton, Mel Gorman,
	Hugh Dickins, KOSAKI Motohiro, Andi Kleen,
	Linux kernel Mailing List, Naoya Horiguchi, David Rientjes
On Fri 05-04-13 17:00:58, Simon Jeons wrote:
> Hi Michal,
> On 04/05/2013 04:08 PM, Michal Hocko wrote:
> >On Fri 05-04-13 09:14:58, Simon Jeons wrote:
> >>Hi Michal,
> >>On 03/22/2013 04:15 PM, Michal Hocko wrote:
> >>>[getting off-list]
> >>>
> >>>On Fri 22-03-13 07:46:32, Simon Jeons wrote:
> >>>>Hi Michal,
> >>>>On 03/21/2013 08:56 PM, Michal Hocko wrote:
> >>>>>On Thu 21-03-13 07:49:48, Simon Jeons wrote:
> >>>>>[...]
> >>>>>>When I hacking arch/x86/mm/hugetlbpage.c like this,
> >>>>>>diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> >>>>>>index ae1aa71..87f34ee 100644
> >>>>>>--- a/arch/x86/mm/hugetlbpage.c
> >>>>>>+++ b/arch/x86/mm/hugetlbpage.c
> >>>>>>@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file,
> >>>>>>unsigned long addr,
> >>>>>>
> >>>>>>#endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
> >>>>>>
> >>>>>>-#ifdef CONFIG_X86_64
> >>>>>>static __init int setup_hugepagesz(char *opt)
> >>>>>>{
> >>>>>>unsigned long ps = memparse(opt, &opt);
> >>>>>>if (ps == PMD_SIZE) {
> >>>>>>hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> >>>>>>- } else if (ps == PUD_SIZE && cpu_has_gbpages) {
> >>>>>>- hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> >>>>>>+ } else if (ps == PUD_SIZE) {
> >>>>>>+ hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
> >>>>>>} else {
> >>>>>>printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
> >>>>>>ps >> 20);
> >>>>>>
> >>>>>>I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages.
> >>>>>>What's the difference between these pages which I hacking and normal
> >>>>>>huge pages?
> >>>>>How is this related to the patch set?
> >>>>>Please _stop_ distracting discussion to unrelated topics!
> >>>>>
> >>>>>Nothing personal but this is just wasting our time.
> >>>>Sorry kindly Michal, my bad.
> >>>>Btw, could you explain this question for me? very sorry waste your time.
> >>>Your CPU has to support GB pages. You have removed cpu_has_gbpages test
> >>>and added a hstate for order 13 pages which is a weird number on its
> >>>own (32MB) because there is no page table level to support them.
> >>But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*,
> >>and have equal number of 32MB huge pages which I set up in boot
> >>parameter.
> >because hugetlb_add_hstate creates hstate for those pages and
> >hugetlb_init_hstates allocates them later on.
> >
> >>If there is no page table level to support them, how can
> >>them present?
> >Because hugetlb hstate handling code doesn't care about page tables and
> >the way how those pages are going to be mapped _at all_. Or put it in
> >another way. Nobody prevents you to allocate order-5 page for a single
> >pte but that would be a pure waste. Page fault code expects that pages
> >with a proper size are allocated.
> Do you mean 32MB pages will map to one pmd which should map 2MB pages?
> 
Please refer to hugetlb_fault for more information.
-- 
Michal Hocko
SUSE Labs
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-04-05  9:30                   ` Michal Hocko
@ 2013-04-07  0:32                     ` Simon Jeons
  2013-04-07 14:05                       ` KOSAKI Motohiro
  0 siblings, 1 reply; 55+ messages in thread
From: Simon Jeons @ 2013-04-07  0:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Memory Management List, Andrew Morton, Mel Gorman,
	Hugh Dickins, KOSAKI Motohiro, Andi Kleen,
	Linux kernel Mailing List, Naoya Horiguchi, David Rientjes
Hi Michal,
On 04/05/2013 05:30 PM, Michal Hocko wrote:
> On Fri 05-04-13 17:00:58, Simon Jeons wrote:
>> Hi Michal,
>> On 04/05/2013 04:08 PM, Michal Hocko wrote:
>>> On Fri 05-04-13 09:14:58, Simon Jeons wrote:
>>>> Hi Michal,
>>>> On 03/22/2013 04:15 PM, Michal Hocko wrote:
>>>>> [getting off-list]
>>>>>
>>>>> On Fri 22-03-13 07:46:32, Simon Jeons wrote:
>>>>>> Hi Michal,
>>>>>> On 03/21/2013 08:56 PM, Michal Hocko wrote:
>>>>>>> On Thu 21-03-13 07:49:48, Simon Jeons wrote:
>>>>>>> [...]
>>>>>>>> When I hacking arch/x86/mm/hugetlbpage.c like this,
>>>>>>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>>>>>>>> index ae1aa71..87f34ee 100644
>>>>>>>> --- a/arch/x86/mm/hugetlbpage.c
>>>>>>>> +++ b/arch/x86/mm/hugetlbpage.c
>>>>>>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file,
>>>>>>>> unsigned long addr,
>>>>>>>>
>>>>>>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
>>>>>>>>
>>>>>>>> -#ifdef CONFIG_X86_64
>>>>>>>> static __init int setup_hugepagesz(char *opt)
>>>>>>>> {
>>>>>>>> unsigned long ps = memparse(opt, &opt);
>>>>>>>> if (ps == PMD_SIZE) {
>>>>>>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
>>>>>>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) {
>>>>>>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
>>>>>>>> + } else if (ps == PUD_SIZE) {
>>>>>>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
>>>>>>>> } else {
>>>>>>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
>>>>>>>> ps >> 20);
>>>>>>>>
>>>>>>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages.
>>>>>>>> What's the difference between these pages which I hacking and normal
>>>>>>>> huge pages?
>>>>>>> How is this related to the patch set?
>>>>>>> Please _stop_ distracting discussion to unrelated topics!
>>>>>>>
>>>>>>> Nothing personal but this is just wasting our time.
>>>>>> Sorry kindly Michal, my bad.
>>>>>> Btw, could you explain this question for me? very sorry waste your time.
>>>>> Your CPU has to support GB pages. You have removed cpu_has_gbpages test
>>>>> and added a hstate for order 13 pages which is a weird number on its
>>>>> own (32MB) because there is no page table level to support them.
>>>> But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*,
>>>> and have equal number of 32MB huge pages which I set up in boot
>>>> parameter.
>>> because hugetlb_add_hstate creates hstate for those pages and
>>> hugetlb_init_hstates allocates them later on.
>>>
>>>> If there is no page table level to support them, how can
>>>> them present?
>>> Because hugetlb hstate handling code doesn't care about page tables and
>>> the way how those pages are going to be mapped _at all_. Or put it in
>>> another way. Nobody prevents you to allocate order-5 page for a single
>>> pte but that would be a pure waste. Page fault code expects that pages
>>> with a proper size are allocated.
>> Do you mean 32MB pages will map to one pmd which should map 2MB pages?
>>
> Please refer to hugetlb_fault for more information.
Thanks for pointing that out. So my assumption is correct, isn't it? Can
a pmd, which supports 2MB mappings, map 32MB pages and still work well?
>
* Re: [RFC][PATCH 0/9] extend hugepage migration
  2013-04-07  0:32                     ` Simon Jeons
@ 2013-04-07 14:05                       ` KOSAKI Motohiro
  0 siblings, 0 replies; 55+ messages in thread
From: KOSAKI Motohiro @ 2013-04-07 14:05 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Michal Hocko, Linux Memory Management List, Andrew Morton,
	Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen,
	Linux kernel Mailing List, Naoya Horiguchi, David Rientjes,
	kosaki.motohiro
>> Please refer to hugetlb_fault for more information.
> 
> Thanks for your pointing out. So my assume is correct, is it? Can pmd 
> which support 2MB map 32MB pages work well?
Simon, please stop hijacking unrelated threads. This is not a
question-and-answer thread.
end of thread, other threads:[~2013-04-07 14:05 UTC | newest]
Thread overview: 55+ messages
2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 1/9] migrate: add migrate_entry_wait_huge() Naoya Horiguchi
2013-03-18 14:51   ` Michal Hocko
2013-03-19  0:06     ` Naoya Horiguchi
2013-03-19 23:57   ` Simon Jeons
2013-03-20 21:53     ` Naoya Horiguchi
2013-03-20 23:36       ` Simon Jeons
2013-04-04  4:57         ` Simon Jeons
2013-02-21 19:41 ` [PATCH 2/9] migrate: make core migration code aware of hugepage Naoya Horiguchi
2013-03-18 15:22   ` Michal Hocko
2013-03-18 15:33     ` Michal Hocko
2013-03-19  0:06       ` Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Naoya Horiguchi
2013-02-27  7:25   ` Chen Gong
2013-02-27 17:06     ` Naoya Horiguchi
2013-02-27 17:57       ` Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 4/9] migrate: clean up migrate_huge_page() Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Naoya Horiguchi
2013-03-18 15:40   ` Michal Hocko
2013-03-19  0:07     ` Naoya Horiguchi
2013-03-19  7:11       ` Michal Hocko
2013-03-20  6:12         ` Naoya Horiguchi
2013-03-20  7:41           ` Michal Hocko
2013-03-20  0:31       ` Simon Jeons
2013-03-20 21:59         ` Naoya Horiguchi
2013-03-21  0:06           ` Simon Jeons
2013-02-21 19:41 ` [PATCH 6/9] migrate: enable move_pages() " Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 7/9] mbind: enable mbind() " Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
2013-02-23  7:05   ` Hillf Danton
2013-02-25 16:57     ` Naoya Horiguchi
2013-02-27  7:36   ` Chen Gong
2013-02-27 17:16     ` Naoya Horiguchi
2013-03-18 16:07   ` Michal Hocko
2013-03-20  3:55     ` Naoya Horiguchi
2013-03-20  7:57       ` Michal Hocko
2013-03-20  1:03   ` Simon Jeons
2013-03-20 22:05     ` Naoya Horiguchi
2013-03-20 23:55       ` Simon Jeons
2013-02-21 19:41 ` [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Naoya Horiguchi
2013-02-28  6:02   ` KOSAKI Motohiro
2013-02-28 18:16     ` Naoya Horiguchi
2013-03-18 15:51   ` Michal Hocko
2013-03-19  0:07     ` Naoya Horiguchi
2013-03-19 23:43 ` [RFC][PATCH 0/9] extend hugepage migration Simon Jeons
2013-03-20 21:35   ` Naoya Horiguchi
2013-03-20 23:49     ` Simon Jeons
2013-03-21 12:56       ` Michal Hocko
2013-03-21 23:46         ` Simon Jeons
     [not found]           ` <20130322081532.GC31457@dhcp22.suse.cz>
2013-04-05  1:14             ` Simon Jeons
2013-04-05  8:08               ` Michal Hocko
2013-04-05  9:00                 ` Simon Jeons
2013-04-05  9:30                   ` Michal Hocko
2013-04-07  0:32                     ` Simon Jeons
2013-04-07 14:05                       ` KOSAKI Motohiro