From: Wanpeng Li <liwanp@linux.vnet.ibm.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andi Kleen <andi@firstfloor.org>, Hillf Danton <dhillf@gmail.com>,
Michal Hocko <mhocko@suse.cz>, Rik van Riel <riel@redhat.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org,
Naoya Horiguchi <nao.horiguchi@gmail.com>
Subject: Re: [PATCH 7/8] memory-hotplug: enable memory hotplug to handle hugepage
Date: Wed, 24 Jul 2013 14:10:07 +0800
Message-ID: <20130724061007.GA23915@hacker.(null)>
In-Reply-To: <1374183272-10153-8-git-send-email-n-horiguchi@ah.jp.nec.com>
On Thu, Jul 18, 2013 at 05:34:31PM -0400, Naoya Horiguchi wrote:
>Until now we could not offline memory blocks which contain hugepages,
>because a hugepage was considered unmovable. But with this patch
>series a hugepage becomes movable, so by using hugepage migration we
>can offline such memory blocks.
>
>What's different from other users of hugepage migration is that we need
>to decompose all the hugepages inside the target memory block into free
>buddy pages after hugepage migration, because otherwise free hugepages
>remaining in the memory block interfere with the memory offlining.
>For this reason we introduce new functions dissolve_free_huge_page() and
>dissolve_free_huge_pages().
>
>Other than that, this patch straightforwardly adds the hugepage
>migration code: hugepage handling in the functions which scan over
>pfns and collect pages to be migrated, and a hugepage allocation path
>in alloc_migrate_target().
>
>As for larger hugepages (1GB for x86_64), hotremove over them is not
>easy because such a hugepage is larger than a memory block. So for now
>we simply let it fail.
>
>ChangeLog v3:
> - revert introducing migrate_movable_pages (the function was open-coded)
> - add migratetype check in dequeue_huge_page_node to close the race
> between scan and allocation
> - make is_hugepage_movable use refcount to find active hugepages
> instead of running through hugepage_activelist
> - rename is_hugepage_movable to is_hugepage_active
> - add alignment check in dissolve_free_huge_pages
> - use round_up in calculating next scanning pfn
> - use isolate_huge_page
>
>ChangeLog v2:
> - changed return value type of is_hugepage_movable() to bool
> - is_hugepage_movable() uses list_for_each_entry() instead of *_safe()
> - moved if(PageHuge) block before get_page_unless_zero() in do_migrate_range()
> - do_migrate_range() returns -EBUSY for hugepages larger than memory block
> - dissolve_free_huge_pages() calculates scan step and sets it to minimum
> hugepage size
>
>Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>---
> include/linux/hugetlb.h | 6 +++++
> mm/hugetlb.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++--
> mm/memory_hotplug.c | 42 +++++++++++++++++++++++++------
> mm/page_alloc.c | 12 +++++++++
> mm/page_isolation.c | 5 ++++
> 5 files changed, 123 insertions(+), 9 deletions(-)
>
>diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
>index 768ebbe..bb7651e 100644
>--- v3.11-rc1.orig/include/linux/hugetlb.h
>+++ v3.11-rc1/include/linux/hugetlb.h
>@@ -69,6 +69,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
> bool isolate_huge_page(struct page *page, struct list_head *l);
> void putback_active_hugepage(struct page *page);
> void putback_active_hugepages(struct list_head *l);
>+bool is_hugepage_active(struct page *page);
> void copy_huge_page(struct page *dst, struct page *src);
>
> #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
>@@ -140,6 +141,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> #define isolate_huge_page(p, l) false
> #define putback_active_hugepage(p)
> #define putback_active_hugepages(l)
>+#define is_hugepage_active(x) false
> static inline void copy_huge_page(struct page *dst, struct page *src)
> {
> }
>@@ -379,6 +381,9 @@ static inline pgoff_t basepage_index(struct page *page)
> return __basepage_index(page);
> }
>
>+extern void dissolve_free_huge_pages(unsigned long start_pfn,
>+ unsigned long end_pfn);
>+
> #else /* CONFIG_HUGETLB_PAGE */
> struct hstate {};
> #define alloc_huge_page_node(h, nid) NULL
>@@ -405,6 +410,7 @@ static inline pgoff_t basepage_index(struct page *page)
> {
> return page->index;
> }
>+#define dissolve_free_huge_pages(s, e)
> #endif /* CONFIG_HUGETLB_PAGE */
>
> #endif /* _LINUX_HUGETLB_H */
>diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
>index fab29a1..9575e8a 100644
>--- v3.11-rc1.orig/mm/hugetlb.c
>+++ v3.11-rc1/mm/hugetlb.c
>@@ -21,6 +21,7 @@
> #include <linux/rmap.h>
> #include <linux/swap.h>
> #include <linux/swapops.h>
>+#include <linux/page-isolation.h>
>
> #include <asm/page.h>
> #include <asm/pgtable.h>
>@@ -518,9 +519,11 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
> {
> struct page *page;
>
>- if (list_empty(&h->hugepage_freelists[nid]))
>+ list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
>+ if (!is_migrate_isolate_page(page))
>+ break;
>+ if (&h->hugepage_freelists[nid] == &page->lru)
> return NULL;
>- page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> list_move(&page->lru, &h->hugepage_activelist);
> set_page_refcounted(page);
> h->free_huge_pages--;
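
Side note for other readers: the termination test in the hunk above
relies on list_for_each_entry() leaving the cursor pointing back at
the list head when no entry matched. A generic sketch of the idiom,
my own illustration rather than anything from the patch:

	/*
	 * After list_for_each_entry() runs to completion without a
	 * break, 'pos' is container_of(head, ...), so comparing
	 * &pos->lru against the head detects "nothing found".
	 */
	struct foo {
		struct list_head lru;
		bool usable;
	};
	struct foo *pos;
	LIST_HEAD(head);

	list_for_each_entry(pos, &head, lru)
		if (pos->usable)
			break;
	if (&head == &pos->lru)
		pos = NULL;	/* no usable entry on the list */
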
>@@ -861,6 +864,44 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> return ret;
> }
>
>+/*
>+ * Dissolve a given free hugepage into free buddy pages. This function does
>+ * nothing for in-use (including surplus) hugepages.
>+ */
>+static void dissolve_free_huge_page(struct page *page)
>+{
>+ spin_lock(&hugetlb_lock);
>+ if (PageHuge(page) && !page_count(page)) {
>+ struct hstate *h = page_hstate(page);
>+ int nid = page_to_nid(page);
>+ list_del(&page->lru);
>+ h->free_huge_pages--;
>+ h->free_huge_pages_node[nid]--;
>+ update_and_free_page(h, page);
>+ }
>+ spin_unlock(&hugetlb_lock);
>+}
>+
>+/*
>+ * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
>+ * make specified memory blocks removable from the system.
>+ * Note that start_pfn should be aligned to the (minimum) hugepage size.
>+ */
>+void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>+{
>+ unsigned int order = 8 * sizeof(void *);
>+ unsigned long pfn;
>+ struct hstate *h;
>+
>+ /* Set scan step to minimum hugepage size */
>+ for_each_hstate(h)
>+ if (order > huge_page_order(h))
>+ order = huge_page_order(h);
>+ VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << order));
>+ for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
>+ dissolve_free_huge_page(pfn_to_page(pfn));
>+}
>+
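
A quick worked example of the scan step above (my own numbers,
assuming x86_64 with 4K base pages and only the default 2MB hstate
configured, so huge_page_order(h) == 9):

	unsigned int order = 8 * sizeof(void *);	/* 64: above any real order */

	/* the for_each_hstate() loop lowers it to the smallest order */
	if (order > 9)
		order = 9;
	/*
	 * The scan then advances 1 << 9 == 512 pfns (2MB) per step,
	 * which is why VM_BUG_ON() demands start_pfn be 2MB-aligned.
	 */
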
> static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
> {
> struct page *page;
>@@ -3418,6 +3459,28 @@ static int is_hugepage_on_freelist(struct page *hpage)
> return 0;
> }
>
>+bool is_hugepage_active(struct page *page)
>+{
>+ VM_BUG_ON(!PageHuge(page));
>+ /*
>+ * This function can be called for a tail page because the caller,
>+ * scan_movable_pages, scans through a given pfn-range which typically
>+ * covers one memory block. In systems using gigantic hugepages (1GB
>+ * for x86_64), a hugepage is larger than a memory block, and we don't
>+ * support migrating such large hugepages for now, so return false
>+ * when called for tail pages.
>+ */
>+ if (PageTail(page))
>+ return false;
>+ /*
>+ * The refcount of a hwpoisoned hugepage is 1, but such pages are not
>+ * active, so we should return false for them.
>+ */
>+ if (unlikely(PageHWPoison(page)))
>+ return false;
>+ return page_count(page) > 0;
>+}
>+
> /*
> * This function is called from memory failure code.
> * Assume the caller holds page lock of the head page.
>diff --git v3.11-rc1.orig/mm/memory_hotplug.c v3.11-rc1/mm/memory_hotplug.c
>index ca1dd3a..31f08fa 100644
>--- v3.11-rc1.orig/mm/memory_hotplug.c
>+++ v3.11-rc1/mm/memory_hotplug.c
>@@ -30,6 +30,7 @@
> #include <linux/mm_inline.h>
> #include <linux/firmware-map.h>
> #include <linux/stop_machine.h>
>+#include <linux/hugetlb.h>
>
> #include <asm/tlbflush.h>
>
>@@ -1208,10 +1209,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
> }
>
> /*
>- * Scanning pfn is much easier than scanning lru list.
>- * Scan pfn from start to end and Find LRU page.
>+ * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
>+ * and hugepages). We scan pfns because it's much easier than scanning
>+ * over a linked list. This function returns the pfn of the first found
>+ * movable page if one is found, otherwise 0.
> */
>-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
>+static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> {
> unsigned long pfn;
> struct page *page;
>@@ -1220,6 +1223,13 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> page = pfn_to_page(pfn);
> if (PageLRU(page))
> return pfn;
>+ if (PageHuge(page)) {
>+ if (is_hugepage_active(page))
>+ return pfn;
>+ else
>+ pfn = round_up(pfn + 1,
>+ 1 << compound_order(page)) - 1;
>+ }
> }
> }
> return 0;
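
The round_up() arithmetic in scan_movable_pages() is worth spelling
out. With my own example numbers (4K base pages, an inactive 2MB
hugepage, so compound_order(page) == 9):

	/*
	 * pfn 0x1003 lies inside the hugepage covering [0x1000, 0x1200).
	 * round_up(0x1003 + 1, 1 << 9) == 0x1200; subtracting 1 gives
	 * 0x11ff, and the for-loop's pfn++ then resumes the scan at
	 * 0x1200, the first pfn past the hugepage.
	 */
	unsigned long pfn = 0x1003;

	pfn = round_up(pfn + 1, 1 << 9) - 1;	/* 0x11ff */
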
>@@ -1240,6 +1250,19 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> if (!pfn_valid(pfn))
> continue;
> page = pfn_to_page(pfn);
>+
>+ if (PageHuge(page)) {
>+ struct page *head = compound_head(page);
>+ pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
>+ if (compound_order(head) > PFN_SECTION_SHIFT) {
>+ ret = -EBUSY;
>+ break;
>+ }
>+ if (isolate_huge_page(page, &source))
>+ move_pages -= 1 << compound_order(head);
>+ continue;
>+ }
>+
> if (!get_page_unless_zero(page))
> continue;
> /*
>@@ -1272,7 +1295,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> }
> if (!list_empty(&source)) {
> if (not_managed) {
>- putback_lru_pages(&source);
>+ putback_movable_pages(&source);
> goto out;
> }
>
>@@ -1283,7 +1306,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> ret = migrate_pages(&source, alloc_migrate_target, 0,
> MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> if (ret)
>- putback_lru_pages(&source);
>+ putback_movable_pages(&source);
> }
> out:
> return ret;
>@@ -1527,8 +1550,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
> drain_all_pages();
> }
>
>- pfn = scan_lru_pages(start_pfn, end_pfn);
>- if (pfn) { /* We have page on LRU */
>+ pfn = scan_movable_pages(start_pfn, end_pfn);
>+ if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> if (!ret) {
> drain = 1;
>@@ -1547,6 +1570,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
> yield();
> /* drain pcp pages, this is synchronous. */
> drain_all_pages();
>+ /*
>+ * Dissolve free hugepages in the memory block before actually doing
>+ * the offlining, to keep hugetlbfs's object counting consistent.
>+ */
>+ dissolve_free_huge_pages(start_pfn, end_pfn);
> /* check again */
> offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> if (offlined_pages < 0) {
>diff --git v3.11-rc1.orig/mm/page_alloc.c v3.11-rc1/mm/page_alloc.c
>index b100255..24fe228 100644
>--- v3.11-rc1.orig/mm/page_alloc.c
>+++ v3.11-rc1/mm/page_alloc.c
>@@ -60,6 +60,7 @@
> #include <linux/page-debug-flags.h>
> #include <linux/hugetlb.h>
> #include <linux/sched/rt.h>
>+#include <linux/hugetlb.h>
>
> #include <asm/sections.h>
> #include <asm/tlbflush.h>
>@@ -5928,6 +5929,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> continue;
>
> page = pfn_to_page(check);
>+
>+ /*
>+ * Hugepages are not in LRU lists, but they're movable.
>+ * We need not scan over tail pages because we don't
>+ * handle each tail page individually in migration.
>+ */
>+ if (PageHuge(page)) {
>+ iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
>+ continue;
>+ }
>+
> /*
> * We can't use page_count without pin a page
> * because another CPU can free compound page.
>diff --git v3.11-rc1.orig/mm/page_isolation.c v3.11-rc1/mm/page_isolation.c
>index 383bdbb..cf48ef6 100644
>--- v3.11-rc1.orig/mm/page_isolation.c
>+++ v3.11-rc1/mm/page_isolation.c
>@@ -6,6 +6,7 @@
> #include <linux/page-isolation.h>
> #include <linux/pageblock-flags.h>
> #include <linux/memory.h>
>+#include <linux/hugetlb.h>
> #include "internal.h"
>
> int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>@@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
> {
> gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
>
>+ if (PageHuge(page))
>+ return alloc_huge_page_node(page_hstate(compound_head(page)),
>+ numa_node_id());
>+
Why specify the current node? The current node may itself be under removal.
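Something along these lines might be safer (an untested sketch of
mine; the next_online_node() fallback is only my assumption, not
something the patch proposes):

	if (PageHuge(page)) {
		/*
		 * Sketch: prefer some other online node, since the node
		 * that owns 'page' may be the one being offlined.
		 */
		int nid = next_online_node(page_to_nid(page));

		if (nid == MAX_NUMNODES)
			nid = first_online_node;
		return alloc_huge_page_node(page_hstate(compound_head(page)),
					    nid);
	}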
Regards,
Wanpeng Li
> if (PageHighMem(page))
> gfp_mask |= __GFP_HIGHMEM;
>
>--
>1.8.3.1
>