From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx132.postini.com [74.125.245.132])
	by kanga.kvack.org (Postfix) with SMTP id A9E9E6B0007;
	Thu, 21 Feb 2013 14:42:43 -0500 (EST)
From: Naoya Horiguchi
Subject: [RFC][PATCH 0/9] extend hugepage migration
Date: Thu, 21 Feb 2013 14:41:39 -0500
Message-Id: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

Hi,

Hugepage migration is currently available only for soft offlining (moving
the data on a half-corrupted page to another page in order to save it).
But it is also useful for some other users of page migration, so this
patchset extends some of those users to support hugepages.

The targets of this patchset are the NUMA-related system calls (i.e.
migrate_pages(2), move_pages(2), and mbind(2)) and memory hotplug.
This patchset does not extend page migration in memory compaction,
because I think that users of memory compaction mainly expect it to
construct thp by rearranging raw pages, and hugepage migration doesn't
help with that.
CMA, another user of page migration, can benefit from hugepage migration,
but is not enabled to support it yet. This is because I've never used CMA
and need to learn more before I can extend and/or test hugepage migration
in CMA. I'll add this in a later version if it becomes ready, or post it
as a separate patchset.

Migration of 1GB hugepages is not enabled for now, because I'm not sure
whether users of 1GB hugepages really want it. We need to keep a spare
free hugepage in order to do migration, but I don't think that users want
1GB of memory to sit idle for that purpose (currently we can't
expand/shrink the 1GB hugepage pool after boot).

Could you review this and give me some comments/feedback?

Thanks,
Naoya Horiguchi
---
Easy patch access:
  git@github.com:Naoya-Horiguchi/linux.git
  branch:extend_hugepage_migration

Test code:
  git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git

Naoya Horiguchi (9):
      migrate: add migrate_entry_wait_huge()
      migrate: make core migration code aware of hugepage
      soft-offline: use migrate_pages() instead of migrate_huge_page()
      migrate: clean up migrate_huge_page()
      migrate: enable migrate_pages() to migrate hugepage
      migrate: enable move_pages() to migrate hugepage
      mbind: enable mbind() to migrate hugepage
      memory-hotplug: enable memory hotplug to handle hugepage
      remove /proc/sys/vm/hugepages_treat_as_movable

 Documentation/sysctl/vm.txt |  16 ------
 include/linux/hugetlb.h     |  25 ++++++++--
 include/linux/mempolicy.h   |   2 +-
 include/linux/migrate.h     |  12 ++---
 include/linux/swapops.h     |   4 ++
 kernel/sysctl.c             |   7 ---
 mm/hugetlb.c                |  98 ++++++++++++++++++++++++++++--------
 mm/memory-failure.c         |  20 ++++++--
 mm/memory.c                 |   6 ++-
 mm/memory_hotplug.c         |  51 +++++++++++++++----
 mm/mempolicy.c              |  61 +++++++++++++++--------
 mm/migrate.c                | 119 ++++++++++++++++++++++++++++++--------------
 mm/page_alloc.c             |  12 +++++
 mm/page_isolation.c         |   5 ++
 14 files changed, 311 insertions(+), 127 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx153.postini.com [74.125.245.153])
	by kanga.kvack.org (Postfix) with SMTP id A79076B0002;
	Thu, 21 Feb 2013 14:42:43 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 4/9] migrate: clean up migrate_huge_page()
Date: Thu, 21 Feb 2013 14:41:43 -0500
Message-Id: <1361475708-25991-5-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

Since the previous patch switched soft_offline_huge_page() over to
migrate_pages(), migrate_huge_page() is no longer used.
So let's remove it.

Signed-off-by: Naoya Horiguchi
---
 include/linux/migrate.h |  6 ------
 mm/migrate.c            | 28 ----------------------------
 2 files changed, 34 deletions(-)

diff --git v3.8.orig/include/linux/migrate.h v3.8/include/linux/migrate.h
index d626c27..dc085e1 100644
--- v3.8.orig/include/linux/migrate.h
+++ v3.8/include/linux/migrate.h
@@ -45,9 +45,6 @@ extern int migrate_pages(struct list_head *l, new_page_t x,
 extern int migrate_movable_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
 		enum migrate_mode mode, int reason);
-extern int migrate_huge_page(struct page *, new_page_t x,
-			unsigned long private, bool offlining,
-			enum migrate_mode mode);
 
 extern int fail_migrate_page(struct address_space *,
 			struct page *, struct page *);
@@ -70,9 +67,6 @@ static inline int migrate_pages(struct list_head *l, new_page_t x,
 static inline int migrate_movable_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
 		enum migrate_mode mode, int reason) { return -ENOSYS; }
-static inline int migrate_huge_page(struct page *page, new_page_t x,
-		unsigned long private, bool offlining,
-		enum migrate_mode mode) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 8c13cc5..7b2ca1a 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -1106,34 +1106,6 @@ int migrate_movable_pages(struct list_head *from, new_page_t get_new_page,
 	return err;
 }
 
-int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
-		      unsigned long private, bool offlining,
-		      enum migrate_mode mode)
-{
-	int pass, rc;
-
-	for (pass = 0; pass < 10; pass++) {
-		rc = unmap_and_move_huge_page(get_new_page,
-				private, hpage, pass > 2, offlining,
-				mode);
-		switch (rc) {
-		case -ENOMEM:
-			goto out;
-		case -EAGAIN:
-			/* try again */
-			cond_resched();
-			break;
-		case MIGRATEPAGE_SUCCESS:
-			goto out;
-		default:
-			rc = -EIO;
-			goto out;
-		}
-	}
-out:
-	return rc;
-}
-
 #ifdef CONFIG_NUMA
 /*
  * Move a list of individual pages
--
1.7.11.7
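
For readers who want the removed retry policy at a glance, the following is a minimal user-space sketch of the control flow of migrate_huge_page() as it appears in the hunk above. It is not kernel code: the model_ names, the callback type, and the demo callback are hypothetical stand-ins, and MODEL_MIGRATEPAGE_SUCCESS is assumed to play the role of MIGRATEPAGE_SUCCESS.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's MIGRATEPAGE_SUCCESS; the name is hypothetical. */
#define MODEL_MIGRATEPAGE_SUCCESS 0

/* Hypothetical callback modelling one unmap_and_move_huge_page() attempt. */
typedef int (*model_move_fn)(void *ctx, bool force);

/* Mirrors the control flow of the removed migrate_huge_page(). */
static int model_migrate_huge_page(model_move_fn move, void *ctx)
{
	int pass, rc = -EAGAIN;

	for (pass = 0; pass < 10; pass++) {
		rc = move(ctx, pass > 2);	/* force from the 4th pass on */
		if (rc == -EAGAIN)
			continue;	/* the kernel calls cond_resched() here */
		if (rc == -ENOMEM || rc == MODEL_MIGRATEPAGE_SUCCESS)
			return rc;
		return -EIO;		/* any other error becomes -EIO */
	}
	return rc;			/* still -EAGAIN after 10 passes */
}

/* Demo callback: fails with -EAGAIN 'remaining' times, then succeeds. */
struct model_flaky {
	int remaining;
	int calls;
	bool saw_force;
};

static int model_flaky_move(void *ctx, bool force)
{
	struct model_flaky *f = ctx;

	f->calls++;
	if (force)
		f->saw_force = true;
	if (f->remaining > 0) {
		f->remaining--;
		return -EAGAIN;
	}
	return MODEL_MIGRATEPAGE_SUCCESS;
}
```

The pass loop in migrate_pages() follows the same pattern (note the identical `pass > 2` argument in patch 2/9), which is why the dedicated helper can go away once soft offlining passes a list.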

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx106.postini.com [74.125.245.106])
	by kanga.kvack.org (Postfix) with SMTP id C406C6B0008;
	Thu, 21 Feb 2013 14:42:43 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
Date: Thu, 21 Feb 2013 14:41:40 -0500
Message-Id: <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

When a page fault occurs at an address that is backed by a hugepage under
migration, the kernel can't wait correctly until the migration finishes.
This is because pte_offset_map_lock() can't get a correct migration entry
for a hugepage. This patch adds migration_entry_wait_huge() to separate
the code paths for normal pages and hugepages.

Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 2 ++ include/linux/swapops.h | 4 ++++ mm/hugetlb.c | 4 ++-- mm/migrate.c | 24 ++++++++++++++++++++++++ 4 files changed, 32 insertions(+), 2 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index 0c80d3f..40b27f6 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -43,6 +43,7 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, #endif int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); +int is_hugetlb_entry_migration(pte_t pte); int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, unsigned int flags); @@ -109,6 +110,7 @@ static inline unsigned long hugetlb_total_pages(void) #define follow_hugetlb_page(m,v,p,vs,a,b,i,w) ({ BUG(); 0; }) #define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) +#define is_hugetlb_entry_migration(pte) ({ BUG(); 0; }) #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) static inline void hugetlb_report_meminfo(struct seq_file *m) { diff --git v3.8.orig/include/linux/swapops.h v3.8/include/linux/swapops.h index 47ead51..f68efdd 100644 --- v3.8.orig/include/linux/swapops.h +++ v3.8/include/linux/swapops.h @@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry) extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); +extern void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, + unsigned long address); #else #define make_migration_entry(page, write) swp_entry(0, 0) @@ -148,6 +150,8 @@ static inline int is_migration_entry(swp_entry_t swp) static inline void make_migration_entry_read(swp_entry_t *entryp) { } static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } +static inline void 
migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, + unsigned long address) { } static inline int is_write_migration_entry(swp_entry_t entry) { return 0; diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index 546db81..351025e 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -2313,7 +2313,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, return -ENOMEM; } -static int is_hugetlb_entry_migration(pte_t pte) +int is_hugetlb_entry_migration(pte_t pte) { swp_entry_t swp; @@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (ptep) { entry = huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { - migration_entry_wait(mm, (pmd_t *)ptep, address); + migration_entry_wait_huge(mm, (pmd_t *)ptep, address); return 0; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 2fd8b4a..7d84f4c 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, pte_unmap_unlock(ptep, ptl); } +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, + unsigned long address) +{ + spinlock_t *ptl = pte_lockptr(mm, pmd); + pte_t pte; + swp_entry_t entry; + struct page *page; + + spin_lock(ptl); + pte = huge_ptep_get((pte_t *)pmd); + if (!is_hugetlb_entry_migration(pte)) + goto out; + entry = pte_to_swp_entry(pte); + page = migration_entry_to_page(entry); + if (!get_page_unless_zero(page)) + goto out; + spin_unlock(ptl); + wait_on_page_locked(page); + put_page(page); + return; +out: + spin_unlock(ptl); +} + #ifdef CONFIG_BLOCK /* Returns true if all buffers are successfully locked */ static bool buffer_migrate_lock_buffers(struct buffer_head *head, -- 1.7.11.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
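
The get_page_unless_zero() call in migration_entry_wait_huge() is what keeps the waiter from pinning a page whose last reference has already been dropped. As a rough illustration only, here is a user-space sketch of that "take a reference unless the count is zero" idiom using C11 atomics. The struct and the model_ function names are hypothetical and this is not how the kernel implements it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct model_page {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is currently non-zero.
 * Returns false when the object is already on its way to being freed.
 */
static bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure the CAS reloads 'old', so the loop re-checks it. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with model_get_unless_zero(). */
static void model_put(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}
```

In the patch, the reference is taken under the page table lock, the lock is dropped, and only then does the fault handler sleep in wait_on_page_locked() before releasing the reference with put_page().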

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx145.postini.com [74.125.245.145])
	by kanga.kvack.org (Postfix) with SMTP id A89A46B0006;
	Thu, 21 Feb 2013 14:42:43 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 2/9] migrate: make core migration code aware of hugepage
Date: Thu, 21 Feb 2013 14:41:41 -0500
Message-Id: <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

Before enabling each user of page migration to support hugepages,
this patch makes the necessary changes to the core migration code.
The main change is that the list of pages to migrate can now link
not only LRU pages but also hugepages.
Along with this, functions such as migrate_pages() and
putback_movable_pages() need to be changed to handle hugepages.

Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 4 ++++ include/linux/mempolicy.h | 2 +- include/linux/migrate.h | 6 ++++++ mm/hugetlb.c | 16 ++++++++++++++++ mm/migrate.c | 27 +++++++++++++++++++++++++-- 5 files changed, 52 insertions(+), 3 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index 40b27f6..8f87115 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -67,6 +67,8 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to, vm_flags_t vm_flags); void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); int dequeue_hwpoisoned_huge_page(struct page *page); +void putback_active_hugepage(struct page *page); +void putback_active_hugepages(struct list_head *l); void copy_huge_page(struct page *dst, struct page *src); extern unsigned long hugepages_treat_as_movable; @@ -130,6 +132,8 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) return 0; } +#define putback_active_hugepage(p) 0 +#define putback_active_hugepages(l) 0 static inline void copy_huge_page(struct page *dst, struct page *src) { } diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h index 0d7df39..2e475b5 100644 --- v3.8.orig/include/linux/mempolicy.h +++ v3.8/include/linux/mempolicy.h @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); /* Check if a vma is migratable */ static inline int vma_migratable(struct vm_area_struct *vma) { - if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP)) + if (vma->vm_flags & (VM_IO | VM_PFNMAP)) return 0; /* * Migration allocates pages in the highest zone. 
If we cannot diff --git v3.8.orig/include/linux/migrate.h v3.8/include/linux/migrate.h index 1e9f627..d626c27 100644 --- v3.8.orig/include/linux/migrate.h +++ v3.8/include/linux/migrate.h @@ -42,6 +42,9 @@ extern int migrate_page(struct address_space *, extern int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode, int reason); +extern int migrate_movable_pages(struct list_head *from, + new_page_t get_new_page, unsigned long private, bool offlining, + enum migrate_mode mode, int reason); extern int migrate_huge_page(struct page *, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode); @@ -64,6 +67,9 @@ static inline void putback_movable_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode, int reason) { return -ENOSYS; } +static inline int migrate_movable_pages(struct list_head *from, + new_page_t get_new_page, unsigned long private, bool offlining, + enum migrate_mode mode, int reason) { return -ENOSYS; } static inline int migrate_huge_page(struct page *page, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode) { return -ENOSYS; } diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index 351025e..cb9d43b8 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -3186,3 +3186,19 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage) return ret; } #endif + +void putback_active_hugepage(struct page *page) +{ + VM_BUG_ON(!PageHead(page)); + list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist); + put_page(page); +} + +void putback_active_hugepages(struct list_head *l) +{ + struct page *page; + struct page *page2; + + list_for_each_entry_safe(page, page2, l, lru) + putback_active_hugepage(page); +} diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 7d84f4c..e305dc0 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -100,6 
+100,10 @@ void putback_movable_pages(struct list_head *l) struct page *page2; list_for_each_entry_safe(page, page2, l, lru) { + if (unlikely(PageHuge(page))) { + putback_active_hugepage(page); + continue; + } list_del(&page->lru); dec_zone_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); @@ -1046,8 +1050,12 @@ int migrate_pages(struct list_head *from, list_for_each_entry_safe(page, page2, from, lru) { cond_resched(); - - rc = unmap_and_move(get_new_page, private, + if (PageHuge(page)) + rc = unmap_and_move_huge_page(get_new_page, + private, page, pass > 2, + offlining, mode); + else + rc = unmap_and_move(get_new_page, private, page, pass > 2, offlining, mode); @@ -1081,6 +1089,21 @@ int migrate_pages(struct list_head *from, return rc; } +int migrate_movable_pages(struct list_head *from, new_page_t get_new_page, + unsigned long private, bool offlining, + enum migrate_mode mode, int reason) +{ + int err = 0; + + if (!list_empty(from)) { + err = migrate_pages(from, get_new_page, private, + offlining, mode, reason); + if (err) + putback_movable_pages(from); + } + return err; +} + int migrate_huge_page(struct page *hpage, new_page_t get_new_page, unsigned long private, bool offlining, enum migrate_mode mode) -- 1.7.11.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
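
The shape of the new migrate_movable_pages() helper (skip empty lists, migrate, and put back whatever is left on failure, with hugepages and LRU pages taking different putback paths) can be sketched in user space as follows. This is a simplified model under stated assumptions: the singly linked list, the model_ names, the counters, and the demo callbacks are all hypothetical and stand in for list_head, the real putback functions, and migrate_pages().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page on a migration list. */
struct model_page {
	bool huge;
	struct model_page *next;
};

/* Counts how many pages went down each putback path. */
struct model_stats {
	int lru_putback;
	int huge_putback;
};

/* Models putback_movable_pages(): hugepages take their own path. */
static void model_putback_movable_pages(struct model_page **list,
					struct model_stats *st)
{
	while (*list) {
		struct model_page *p = *list;

		*list = p->next;
		p->next = NULL;
		if (p->huge)
			st->huge_putback++;	/* putback_active_hugepage() */
		else
			st->lru_putback++;	/* normal LRU putback */
	}
}

/* Hypothetical callback modelling migrate_pages(): 0 means success. */
typedef int (*model_migrate_fn)(struct model_page **list);

/* Models migrate_movable_pages(): migrate, then clean up on failure. */
static int model_migrate_movable_pages(struct model_page **list,
				       model_migrate_fn migrate,
				       struct model_stats *st)
{
	int err = 0;

	if (*list) {
		err = migrate(list);
		if (err)
			model_putback_movable_pages(list, st);
	}
	return err;
}

/* Demo callback: migrates nothing, reports how many pages remain. */
static int model_migrate_none(struct model_page **list)
{
	int n = 0;

	for (struct model_page *p = *list; p; p = p->next)
		n++;
	return n;
}

/* Demo callback: pretends every page was migrated. */
static int model_migrate_all(struct model_page **list)
{
	*list = NULL;
	return 0;
}
```

Folding the cleanup into one helper is what lets the later patches replace the repeated "if the list is not empty, migrate, and on error put the pages back" blocks in mempolicy.c and migrate.c with a single call.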

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx180.postini.com [74.125.245.180])
	by kanga.kvack.org (Postfix) with SMTP id D09516B0009;
	Thu, 21 Feb 2013 14:42:43 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
Date: Thu, 21 Feb 2013 14:41:42 -0500
Message-Id: <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

Currently migrate_huge_page() takes a pointer to the hugepage to be
migrated as an argument, instead of taking a pointer to a list of
hugepages to be migrated. This behavior was introduced in commit
189ebff28 ("hugetlb: simplify migrate_huge_page()"), and was OK
because until now hugepage migration has been enabled only for soft
offlining, which migrates only one hugepage per call.

But the situation will change with the later patches in this series,
which enable other users of page migration to support hugepage
migration. They can kick off migration of both normal pages and
hugepages in a single call, so we need to go back to the original
implementation, which uses linked lists to collect the hugepages to be
migrated.

Signed-off-by: Naoya Horiguchi --- mm/memory-failure.c | 20 ++++++++++++++++---- mm/migrate.c | 2 ++ 2 files changed, 18 insertions(+), 4 deletions(-) diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c index bc126f6..01e4676 100644 --- v3.8.orig/mm/memory-failure.c +++ v3.8/mm/memory-failure.c @@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags) int ret; unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_head(page); + LIST_HEAD(pagelist); /* Synchronized using the page lock with memory_failure() */ lock_page(hpage); @@ -1479,13 +1480,24 @@ static int soft_offline_huge_page(struct page *page, int flags) unlock_page(hpage); /* Keep page count to indicate a given hugepage is isolated. */ - ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false, - MIGRATE_SYNC); - put_page(hpage); + list_move(&hpage->lru, &pagelist); + ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, false, + MIGRATE_SYNC, MR_MEMORY_FAILURE); if (ret) { pr_info("soft offline: %#lx: migration failed %d, type %lx\n", pfn, ret, page->flags); - return ret; + /* + * We know that soft_offline_huge_page() tries to migrate + * only one hugepage pointed to by hpage, so we need not + * run through the pagelist here. 
+		 */
+		putback_active_hugepage(hpage);
+		if (ret > 0)
+			ret = -EIO;
+	} else {
+		set_page_hwpoison_huge_page(hpage);
+		dequeue_hwpoisoned_huge_page(hpage);
+		atomic_long_add(1<

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx171.postini.com [74.125.245.171])
	by kanga.kvack.org (Postfix) with SMTP id D37D46B000D;
	Thu, 21 Feb 2013 14:42:44 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
Date: Thu, 21 Feb 2013 14:41:44 -0500
Message-Id: <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

This patch extends check_range() to handle vmas with VM_HUGETLB set.
With this change, we can migrate hugepages with migrate_pages(2).
Note that larger hugepages (those covered by pud entries, e.g. 1GB on
x86_64) are simply skipped for now.

Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 10 ++++++++++ mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ 3 files changed, 48 insertions(+), 14 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index 8f87115..eb33df5 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); int dequeue_hwpoisoned_huge_page(struct page *page); void putback_active_hugepage(struct page *page); void putback_active_hugepages(struct list_head *l); +void migrate_hugepage_add(struct page *page, struct list_head *list); void copy_huge_page(struct page *dst, struct page *src); extern unsigned long hugepages_treat_as_movable; @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write); struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int write); -int pmd_huge(pmd_t pmd); -int pud_huge(pud_t pmd); +extern int pmd_huge(pmd_t pmd); +extern int pud_huge(pud_t pmd); unsigned long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot); @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) #define putback_active_hugepage(p) 0 #define putback_active_hugepages(l) 0 +#define migrate_hugepage_add(p, l) 0 static inline void copy_huge_page(struct page *dst, struct page *src) { } diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index cb9d43b8..86ffcb7 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) list_for_each_entry_safe(page, page2, l, lru) putback_active_hugepage(page); } + +void migrate_hugepage_add(struct page *page, struct list_head *list) +{ + VM_BUG_ON(!PageHuge(page)); + get_page(page); + spin_lock(&hugetlb_lock); + 
list_move_tail(&page->lru, list); + spin_unlock(&hugetlb_lock); + return; +} diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c index e2df1c1..8627135 100644 --- v3.8.orig/mm/mempolicy.c +++ v3.8/mm/mempolicy.c @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, return addr != end; } +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, + const nodemask_t *nodes, unsigned long flags, + void *private) +{ +#ifdef CONFIG_HUGETLB_PAGE + int nid; + struct page *page; + + spin_lock(&vma->vm_mm->page_table_lock); + page = pte_page(huge_ptep_get((pte_t *)pmd)); + spin_unlock(&vma->vm_mm->page_table_lock); + nid = page_to_nid(page); + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) + || flags & MPOL_MF_MOVE_ALL)) + migrate_hugepage_add(page, private); +#else + BUG(); +#endif +} + static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, const nodemask_t *nodes, unsigned long flags, @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { + check_hugetlb_pmd_range(vma, pmd, nodes, + flags, private); + continue; + } split_huge_page_pmd(vma, addr, pmd); if (pmd_none_or_trans_huge_or_clear_bad(pmd)) continue; @@ -557,6 +583,8 @@ static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd, pud = pud_offset(pgd, addr); do { next = pud_addr_end(addr, end); + if (pud_huge(*pud) && is_vm_hugetlb_page(vma)) + continue; if (pud_none_or_clear_bad(pud)) continue; if (check_pmd_range(vma, pud, addr, next, nodes, @@ -648,9 +676,6 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end, return ERR_PTR(-EFAULT); } - if (is_vm_hugetlb_page(vma)) - goto next; - if (flags & MPOL_MF_LAZY) { change_prot_numa(vma, start, endvma); 
goto next; @@ -999,7 +1024,11 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist, static struct page *new_node_page(struct page *page, unsigned long node, int **x) { - return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0); + if (PageHuge(page)) + return alloc_huge_page_node(page_hstate(compound_head(page)), + node); + else + return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0); } /* @@ -1011,7 +1040,6 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, { nodemask_t nmask; LIST_HEAD(pagelist); - int err = 0; nodes_clear(nmask); node_set(source, nmask); @@ -1025,15 +1053,9 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, check_range(mm, mm->mmap->vm_start, mm->task_size, &nmask, flags | MPOL_MF_DISCONTIG_OK, &pagelist); - if (!list_empty(&pagelist)) { - err = migrate_pages(&pagelist, new_node_page, dest, + return migrate_movable_pages(&pagelist, new_node_page, dest, false, MIGRATE_SYNC, MR_SYSCALL); - if (err) - putback_lru_pages(&pagelist); - } - - return err; } /* -- 1.7.11.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
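
The condition that check_hugetlb_pmd_range() uses to decide whether a hugepage is queued for migration is compact but easy to misread. The sketch below restates it as a pure user-space function so that the cases can be checked one by one. The MODEL_MF_ constants are local stand-ins for MPOL_MF_MOVE, MPOL_MF_MOVE_ALL and MPOL_MF_INVERT (their values here are assumptions made for the sketch, not taken from the patch), and model_should_queue_hugepage() is a hypothetical name.

```c
#include <stdbool.h>

/* Local stand-ins for the mempolicy flags; values are assumptions. */
enum {
	MODEL_MF_MOVE     = 1 << 1,
	MODEL_MF_MOVE_ALL = 1 << 2,
	MODEL_MF_INVERT   = 1 << 5,
};

/*
 * Restates the test in check_hugetlb_pmd_range():
 *
 *   node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
 *   && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
 *       || flags & MPOL_MF_MOVE_ALL)
 *
 * node_in_mask models node_isset(nid, *nodes); mapcount models
 * page_mapcount(page).
 */
static bool model_should_queue_hugepage(bool node_in_mask, int mapcount,
					unsigned long flags)
{
	bool invert = (flags & MODEL_MF_INVERT) != 0;

	/* The node test must disagree with the INVERT flag. */
	if (node_in_mask == invert)
		return false;

	/* MOVE_ALL takes the page regardless of how many times it is mapped. */
	if (flags & MODEL_MF_MOVE_ALL)
		return true;

	/* MOVE only takes pages that are mapped exactly once. */
	return (flags & MODEL_MF_MOVE) && mapcount == 1;
}
```

For example, with only the MOVE stand-in set, a hugepage on a node in the mask is queued when it is mapped once and skipped when it is mapped twice; with the INVERT stand-in also set, the node test flips.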

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx176.postini.com [74.125.245.176])
	by kanga.kvack.org (Postfix) with SMTP id 386B56B0009;
	Thu, 21 Feb 2013 14:42:46 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 6/9] migrate: enable move_pages() to migrate hugepage
Date: Thu, 21 Feb 2013 14:41:45 -0500
Message-Id: <1361475708-25991-7-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

This patch extends move_pages() to handle vmas with VM_HUGETLB set,
which enables hugepage migration with move_pages(2).

We avoid taking a refcount on the tail pages of a hugepage because,
unlike thp, a hugepage is never split, so we need not care about races
with splitting.

Migration of larger (1GB on x86_64) hugepages is not enabled.

Signed-off-by: Naoya Horiguchi --- mm/memory.c | 6 ++++-- mm/migrate.c | 29 ++++++++++++++++++++--------- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git v3.8.orig/mm/memory.c v3.8/mm/memory.c index bb1369f..d7cfd11 100644 --- v3.8.orig/mm/memory.c +++ v3.8/mm/memory.c @@ -1495,7 +1495,8 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, if (pud_none(*pud)) goto no_page_table; if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) { - BUG_ON(flags & FOLL_GET); + if (flags & FOLL_GET) + goto out; page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE); goto out; } @@ -1506,8 +1507,9 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, if (pmd_none(*pmd)) goto no_page_table; if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) { - BUG_ON(flags & FOLL_GET); page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE); + if (flags & FOLL_GET && PageHead(page)) + get_page_foll(page); goto out; } if ((flags & FOLL_NUMA) && pmd_numa(*pmd)) diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 7b2ca1a..36959d6 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -1130,7 +1130,11 @@ static struct page *new_page_node(struct page *p, unsigned long private, *result = &pm->status; - return alloc_pages_exact_node(pm->node, + if (PageHuge(p)) + return alloc_huge_page_node(page_hstate(compound_head(p)), + pm->node); + else + return alloc_pages_exact_node(pm->node, GFP_HIGHUSER_MOVABLE | GFP_THISNODE, 0); } @@ -1176,6 +1180,13 @@ static int do_move_page_to_node_array(struct mm_struct *mm, if (PageReserved(page) || PageKsm(page)) goto put_and_set; + /* + * follow_page(FOLL_GET) didn't get refcount for tail pages of + * hugepage, so here we skip putting it. 
+		 */
+		if (PageHuge(page) && PageTail(page))
+			goto set_status;
+
 		pp->page = page;
 		err = page_to_nid(page);
 
@@ -1190,6 +1201,12 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 				!migrate_all)
 			goto put_and_set;
 
+		if (PageHuge(page)) {
+			get_page(page);
+			list_move_tail(&page->lru, &pagelist);
+			goto put_and_set;
+		}
+
 		err = isolate_lru_page(page);
 		if (!err) {
 			list_add_tail(&page->lru, &pagelist);
@@ -1207,14 +1224,8 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 		pp->status = err;
 	}
 
-	err = 0;
-	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm, 0, MIGRATE_SYNC,
-				MR_SYSCALL);
-		if (err)
-			putback_lru_pages(&pagelist);
-	}
+	err = migrate_movable_pages(&pagelist, new_page_node,
+			(unsigned long)pm, 0, MIGRATE_SYNC, MR_SYSCALL);
 
 	up_read(&mm->mmap_sem);
 	return err;
--
1.7.11.7

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx163.postini.com [74.125.245.163])
	by kanga.kvack.org (Postfix) with SMTP id C6DCB6B0009;
	Thu, 21 Feb 2013 14:42:46 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 7/9] mbind: enable mbind() to migrate hugepage
Date: Thu, 21 Feb 2013 14:41:46 -0500
Message-Id: <1361475708-25991-8-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

This patch enables mbind(2) to migrate hugepages. The page-collecting
function check_range() was already made aware of hugepages by an earlier
patch in this series.

Signed-off-by: Naoya Horiguchi
---
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb.c            |  2 +-
 mm/mempolicy.c          | 15 ++++++---------
 mm/migrate.c            |  7 ++++++-
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index eb33df5..86a4d78 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -263,6 +263,8 @@ struct huge_bootmem_page {
 #endif
 };
 
+struct page *alloc_huge_page(struct vm_area_struct *vma,
+				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
 
 /* arch callback */
@@ -358,6 +360,7 @@ static inline int hstate_index(struct hstate *h)
 
 #else
 struct hstate {};
+#define alloc_huge_page(v, a, r) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_bootmem_huge_page(h) NULL
 #define hstate_file(f) NULL
diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
index 86ffcb7..ccf9995 100644
--- v3.8.orig/mm/hugetlb.c
+++ v3.8/mm/hugetlb.c
@@ -1116,7 +1116,7 @@ static void vma_commit_reservation(struct hstate *h,
 	}
 }
 
-static struct page *alloc_huge_page(struct vm_area_struct *vma,
+struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
index 8627135..9f56c40 100644
--- v3.8.orig/mm/mempolicy.c
+++ v3.8/mm/mempolicy.c
@@ -1187,6 +1187,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int *
 		vma = vma->vm_next;
 	}
 
+	if (PageHuge(page))
+		return alloc_huge_page(vma, address, 1);
 	/*
 	 * if !vma, alloc_page_vma() will use task or system default policy
 	 */
@@ -1291,15 +1293,10 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (!err) {
 		int nr_failed = 0;
 
-		if (!list_empty(&pagelist)) {
-			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
-			nr_failed = migrate_pages(&pagelist, new_vma_page,
-						(unsigned long)vma,
-						false, MIGRATE_SYNC,
-						MR_MEMPOLICY_MBIND);
-			if (nr_failed)
-				putback_lru_pages(&pagelist);
-		}
+		WARN_ON_ONCE(flags & MPOL_MF_LAZY);
+		nr_failed = migrate_movable_pages(&pagelist, new_vma_page,
+					(unsigned long)vma, false,
+					MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
 
 		if (nr_failed && (flags & MPOL_MF_STRICT))
 			err = -EIO;
diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
index 36959d6..8c457e7 100644
--- v3.8.orig/mm/migrate.c
+++ v3.8/mm/migrate.c
@@ -974,7 +974,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	struct page *new_hpage = get_new_page(hpage, private, &result);
 	struct anon_vma *anon_vma = NULL;
 
-	if (!new_hpage)
+	/*
+	 * Getting a new hugepage with alloc_huge_page() (which can happen
+	 * when migration is caused by mbind()) can return ERR_PTR value,
+	 * so we need take care of the case here.
+	 */
+	if (!new_hpage || IS_ERR_VALUE(new_hpage))
 		return -ENOMEM;
 
 	rc = -EAGAIN;
--
1.7.11.7

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172])
	by kanga.kvack.org (Postfix) with SMTP id EF5EB6B000E;
	Thu, 21 Feb 2013 14:42:48 -0500 (EST)
From: Naoya Horiguchi
Subject: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
Date: Thu, 21 Feb 2013 14:41:47 -0500
Message-Id: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

Currently we can't offline memory blocks which contain hugepages, because
a hugepage is considered an unmovable page. But with this patch series
a hugepage becomes movable, so by using hugepage migration we can offline
such memory blocks.

What's different from other users of hugepage migration is that we need
to decompose all the hugepages inside the target memory block into free
buddy pages after hugepage migration, because otherwise the free hugepages
remaining in the memory block would interfere with memory offlining.
For this reason we introduce the new functions dissolve_free_huge_page()
and dissolve_free_huge_pages().

Other than that, this patch straightforwardly adds hugepage migration
code: it adds hugepage handling to the functions which scan over pfns and
collect the hugepages to be migrated, and adds a hugepage allocation
function to alloc_migrate_target().

As for larger hugepages (1GB on x86_64), it's not easy to hot-remove them
because they are larger than a memory block. So for now we simply let
offlining fail as it does today.

Signed-off-by: Naoya Horiguchi
---
 include/linux/hugetlb.h |  8 ++++++++
 mm/hugetlb.c            | 43 +++++++++++++++++++++++++++++++++++++++++
 mm/memory_hotplug.c     | 51 ++++++++++++++++++++++++++++++++++++++++---------
 mm/migrate.c            | 12 +++++++++++-
 mm/page_alloc.c         | 12 ++++++++++++
 mm/page_isolation.c     |  5 +++++
 6 files changed, 121 insertions(+), 10 deletions(-)

diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
index 86a4d78..e33f07f 100644
--- v3.8.orig/include/linux/hugetlb.h
+++ v3.8/include/linux/hugetlb.h
@@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
 void putback_active_hugepage(struct page *page);
 void putback_active_hugepages(struct list_head *l);
 void migrate_hugepage_add(struct page *page, struct list_head *list);
+int is_hugepage_movable(struct page *page);
 void copy_huge_page(struct page *dst, struct page *src);
 
 extern unsigned long hugepages_treat_as_movable;
@@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
 #define putback_active_hugepage(p) 0
 #define
putback_active_hugepages(l) 0 #define migrate_hugepage_add(p, l) 0 +#define is_hugepage_movable(x) 0 static inline void copy_huge_page(struct page *dst, struct page *src) { } @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) return h - hstates; } +extern void dissolve_free_huge_page(struct page *page); +extern void dissolve_free_huge_pages(unsigned long start_pfn, + unsigned long end_pfn); + #else struct hstate {}; #define alloc_huge_page(v, a, r) NULL @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) } #define hstate_index_to_shift(index) 0 #define hstate_index(h) 0 +#define dissolve_free_huge_page(p) 0 +#define dissolve_free_huge_pages(s, e) 0 #endif #endif /* _LINUX_HUGETLB_H */ diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index ccf9995..c28e6c9 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, return ret; } +/* Dissolve a given free hugepage into free pages. */ +void dissolve_free_huge_page(struct page *page) +{ + if (PageHuge(page) && !page_count(page)) { + struct hstate *h = page_hstate(page); + int nid = page_to_nid(page); + spin_lock(&hugetlb_lock); + list_del(&page->lru); + h->free_huge_pages--; + h->free_huge_pages_node[nid]--; + update_and_free_page(h, page); + spin_unlock(&hugetlb_lock); + } +} + +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) +{ + unsigned long pfn; + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); + for (pfn = start_pfn; pfn < end_pfn; pfn += step) + dissolve_free_huge_page(pfn_to_page(pfn)); +} + static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) { struct page *page; @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) return 0; } +/* Returns true for head pages of in-use hugepages, otherwise returns false. 
*/ +int is_hugepage_movable(struct page *hpage) +{ + struct page *page; + struct page *tmp; + struct hstate *h = page_hstate(hpage); + int ret = 0; + + VM_BUG_ON(!PageHuge(hpage)); + if (PageTail(hpage)) + return 0; + spin_lock(&hugetlb_lock); + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) + if (page == hpage) + ret = 1; + spin_unlock(&hugetlb_lock); + return ret; +} + /* * This function is called from memory failure code. * Assume the caller holds page lock of the head page. diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c index d04ed87..6418de2 100644 --- v3.8.orig/mm/memory_hotplug.c +++ v3.8/mm/memory_hotplug.c @@ -29,6 +29,7 @@ #include #include #include +#include #include @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) } /* - * Scanning pfn is much easier than scanning lru list. - * Scan pfn from start to end and Find LRU page. + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages + * and hugepages). We scan pfn because it's much easier than scanning over + * linked list. This function returns the pfn of the first found movable + * page if it's found, otherwise 0. 
*/ -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) { unsigned long pfn; struct page *page; @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) page = pfn_to_page(pfn); if (PageLRU(page)) return pfn; + if (PageHuge(page)) { + if (is_hugepage_movable(page)) + return pfn; + else + pfn += (1 << compound_order(page)) - 1; + } } } return 0; @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) page = pfn_to_page(pfn); if (!get_page_unless_zero(page)) continue; + if (PageHuge(page)) { + /* + * Larger hugepage (1GB for x86_64) is larger than + * memory block, so pfn scan can start at the tail + * page of larger hugepage. In such case, + * we simply skip the hugepage and move the cursor + * to the last tail page. + */ + if (PageTail(page)) { + struct page *head = compound_head(page); + pfn = page_to_pfn(head) + + (1 << compound_order(head)) - 1; + put_page(page); + continue; + } + pfn += (1 << compound_order(page)) - 1; + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { + put_page(page); + continue; + } + list_move_tail(&page->lru, &source); + move_pages -= 1 << compound_order(page); + continue; + } /* * We can skip free pages. And we can only deal with pages on * LRU. @@ -1049,7 +1082,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) } if (!list_empty(&source)) { if (not_managed) { - putback_lru_pages(&source); + putback_movable_pages(&source); goto out; } @@ -1057,11 +1090,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) * alloc_migrate_target should be improooooved!! * migrate_pages returns # of failed pages. 
*/ - ret = migrate_pages(&source, alloc_migrate_target, 0, + ret = migrate_movable_pages(&source, alloc_migrate_target, 0, true, MIGRATE_SYNC, MR_MEMORY_HOTPLUG); - if (ret) - putback_lru_pages(&source); } out: return ret; @@ -1304,8 +1335,8 @@ static int __ref __offline_pages(unsigned long start_pfn, drain_all_pages(); } - pfn = scan_lru_pages(start_pfn, end_pfn); - if (pfn) { /* We have page on LRU */ + pfn = scan_movable_pages(start_pfn, end_pfn); + if (pfn) { /* We have movable pages */ ret = do_migrate_range(pfn, end_pfn); if (!ret) { drain = 1; @@ -1324,6 +1355,8 @@ static int __ref __offline_pages(unsigned long start_pfn, yield(); /* drain pcp pages, this is synchronous. */ drain_all_pages(); + /* dissolve all free hugepages inside the memory block */ + dissolve_free_huge_pages(start_pfn, end_pfn); /* check again */ offlined_pages = check_pages_isolated(start_pfn, end_pfn); if (offlined_pages < 0) { diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 8c457e7..a491a98 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -1009,8 +1009,18 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, unlock_page(hpage); out: - if (rc != -EAGAIN) + if (rc != -EAGAIN) { putback_active_hugepage(hpage); + + /* + * After hugepage migration from memory hotplug, the original + * hugepage should never be allocated again. This will be + * done by dissolving it into free normal pages, because + * we already set migratetype to MIGRATE_ISOLATE for them. + */ + if (offlining) + dissolve_free_huge_page(hpage); + } put_page(new_hpage); if (result) { if (rc) diff --git v3.8.orig/mm/page_alloc.c v3.8/mm/page_alloc.c index 6a83cd3..c37951d 100644 --- v3.8.orig/mm/page_alloc.c +++ v3.8/mm/page_alloc.c @@ -58,6 +58,7 @@ #include #include #include +#include #include #include @@ -5686,6 +5687,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, continue; page = pfn_to_page(check); + + /* + * Hugepages are not in LRU lists, but they're movable. 
+ * We need not scan over tail pages because we don't + * handle each tail page individually in migration. + */ + if (PageHuge(page)) { + iter += (1 << compound_order(page)) - 1; + continue; + } + /* * We can't use page_count without pin a page * because another CPU can free compound page. diff --git v3.8.orig/mm/page_isolation.c v3.8/mm/page_isolation.c index 383bdbb..cf48ef6 100644 --- v3.8.orig/mm/page_isolation.c +++ v3.8/mm/page_isolation.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "internal.h" int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages) @@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, { gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE; + if (PageHuge(page)) + return alloc_huge_page_node(page_hstate(compound_head(page)), + numa_node_id()); + if (PageHighMem(page)) gfp_mask |= __GFP_HIGHMEM; -- 1.7.11.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx110.postini.com [74.125.245.110]) by kanga.kvack.org (Postfix) with SMTP id E8E0F6B0010 for ; Thu, 21 Feb 2013 14:42:49 -0500 (EST) From: Naoya Horiguchi Subject: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Date: Thu, 21 Feb 2013 14:41:48 -0500 Message-Id: <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Now hugepages are definitely movable. So allocating hugepages from ZONE_MOVABLE is natural and we have no reason to keep this parameter. 
Signed-off-by: Naoya Horiguchi --- Documentation/sysctl/vm.txt | 16 ---------------- include/linux/hugetlb.h | 2 -- kernel/sysctl.c | 7 ------- mm/hugetlb.c | 23 +++++------------------ 4 files changed, 5 insertions(+), 43 deletions(-) diff --git v3.8.orig/Documentation/sysctl/vm.txt v3.8/Documentation/sysctl/vm.txt index 078701f..997350a 100644 --- v3.8.orig/Documentation/sysctl/vm.txt +++ v3.8/Documentation/sysctl/vm.txt @@ -167,22 +167,6 @@ fragmentation index is <= extfrag_threshold. The default value is 500. ============================================================== -hugepages_treat_as_movable - -This parameter is only useful when kernelcore= is specified at boot time to -create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages -are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero -value written to hugepages_treat_as_movable allows huge pages to be allocated -from ZONE_MOVABLE. - -Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge -pages pool can easily grow or shrink within. Assuming that applications are -not running that mlock() a lot of memory, it is likely the huge pages pool -can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value -into nr_hugepages and triggering page reclaim. 
- -============================================================== - hugetlb_shm_group hugetlb_shm_group contains group id that is allowed to create SysV diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index e33f07f..c97e5c5 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -35,7 +35,6 @@ int PageHuge(struct page *page); void reset_vma_resv_huge_pages(struct vm_area_struct *vma); int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); -int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); #ifdef CONFIG_NUMA int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, @@ -73,7 +72,6 @@ void migrate_hugepage_add(struct page *page, struct list_head *list); int is_hugepage_movable(struct page *page); void copy_huge_page(struct page *dst, struct page *src); -extern unsigned long hugepages_treat_as_movable; extern const unsigned long hugetlb_zero, hugetlb_infinity; extern int sysctl_hugetlb_shm_group; extern struct list_head huge_boot_pages; diff --git v3.8.orig/kernel/sysctl.c v3.8/kernel/sysctl.c index c88878d..a98bcf2 100644 --- v3.8.orig/kernel/sysctl.c +++ v3.8/kernel/sysctl.c @@ -1189,13 +1189,6 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, - { - .procname = "hugepages_treat_as_movable", - .data = &hugepages_treat_as_movable, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = hugetlb_treat_movable_handler, - }, { .procname = "nr_overcommit_hugepages", .data = NULL, diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index c28e6c9..c60d203 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -33,7 +33,6 @@ #include "internal.h" const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; -static gfp_t htlb_alloc_mask = GFP_HIGHUSER; unsigned long hugepages_treat_as_movable; int 
hugetlb_max_hstate __read_mostly; @@ -542,7 +541,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, retry_cpuset: cpuset_mems_cookie = get_mems_allowed(); zonelist = huge_zonelist(vma, address, - htlb_alloc_mask, &mpol, &nodemask); + GFP_HIGHUSER_MOVABLE, &mpol, &nodemask); /* * A child process with MAP_PRIVATE mappings created by their parent * have no page reserves. This check ensures that reservations are @@ -558,7 +557,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, for_each_zone_zonelist_nodemask(zone, z, zonelist, MAX_NR_ZONES - 1, nodemask) { - if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) { + if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) { page = dequeue_huge_page_node(h, zone_to_nid(zone)); if (page) { if (!avoid_reserve) @@ -698,7 +697,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid) return NULL; page = alloc_pages_exact_node(nid, - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); if (page) { @@ -909,12 +908,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) spin_unlock(&hugetlb_lock); if (nid == NUMA_NO_NODE) - page = alloc_pages(htlb_alloc_mask|__GFP_COMP| + page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); else page = alloc_pages_exact_node(nid, - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); if (page && arch_prepare_hugepage(page)) { @@ -2078,18 +2077,6 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *table, int write, } #endif /* CONFIG_NUMA */ -int hugetlb_treat_movable_handler(struct ctl_table *table, int write, - void __user *buffer, - size_t *length, loff_t *ppos) -{ - proc_dointvec(table, write, buffer, length, ppos); - if (hugepages_treat_as_movable) - htlb_alloc_mask = GFP_HIGHUSER_MOVABLE; - 
else - htlb_alloc_mask = GFP_HIGHUSER; - return 0; -} - int hugetlb_overcommit_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos) -- 1.7.11.7 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx199.postini.com [74.125.245.199]) by kanga.kvack.org (Postfix) with SMTP id 32B326B0002 for ; Sat, 23 Feb 2013 02:05:31 -0500 (EST) Received: by mail-ob0-f172.google.com with SMTP id tb18so1254766obb.3 for ; Fri, 22 Feb 2013 23:05:30 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Date: Sat, 23 Feb 2013 15:05:30 +0800 Message-ID: Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage From: Hillf Danton Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org, Hillf Danton , Michal Hocko Hello Naoya [add Michal in cc list] On Fri, Feb 22, 2013 at 3:41 AM, Naoya Horiguchi wrote: > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ > +int is_hugepage_movable(struct page *hpage) s/int/bool/ can we? > +{ > + struct page *page; > + struct page *tmp; > + struct hstate *h = page_hstate(hpage); Make sense to compute hstate for a tail page? > + int ret = 0; > + > + VM_BUG_ON(!PageHuge(hpage)); > + if (PageTail(hpage)) > + return 0; VM_BUG_ON(!PageHuge(hpage) || PageTail(hpage)), can we? > + spin_lock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) s/_safe// can we? 
> + if (page == hpage) > + ret = 1; Can we bail out with ret set to be true? > + spin_unlock(&hugetlb_lock); > + return ret; > +} -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx155.postini.com [74.125.245.155]) by kanga.kvack.org (Postfix) with SMTP id 7CC346B0005 for ; Mon, 25 Feb 2013 11:58:14 -0500 (EST) Date: Mon, 25 Feb 2013 11:57:56 -0500 From: Naoya Horiguchi Message-ID: <1361811476-la4ql3y2-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Hillf Danton Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org, Michal Hocko Hi Hillf, On Sat, Feb 23, 2013 at 03:05:30PM +0800, Hillf Danton wrote: > Hello Naoya > > [add Michal in cc list] > > On Fri, Feb 22, 2013 at 3:41 AM, Naoya Horiguchi > wrote: > > > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ > > +int is_hugepage_movable(struct page *hpage) > s/int/bool/ can we? Yes, we can. I'll do this. > > +{ > > + struct page *page; > > + struct page *tmp; > > + struct hstate *h = page_hstate(hpage); > Make sense to compute hstate for a tail page? No need to do this here. It's better to put it after PageTail check. > > + int ret = 0; > > + > > + VM_BUG_ON(!PageHuge(hpage)); > > + if (PageTail(hpage)) > > + return 0; > VM_BUG_ON(!PageHuge(hpage) || PageTail(hpage)), can we? 
I think that firing BUG_ON() for tail pages is overkill. Pfn range over which scan_movable_pages() runs could start at the pfn inside the hugepage when we try to hot-remove the memory block used by 1GB hugepage. In that case, is_hugepage_movable() can be called for tail pages as a normal behavior. But anyway, I'll add the comment for this corner case. > > + spin_lock(&hugetlb_lock); > > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > s/_safe// can we? OK. > > + if (page == hpage) > > + ret = 1; > Can we bail out with ret set to be true? Yes, inserting break is good for performance. > > + spin_unlock(&hugetlb_lock); > > + return ret; > > +} Thank you! Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx126.postini.com [74.125.245.126]) by kanga.kvack.org (Postfix) with SMTP id C01AC6B0005 for ; Wed, 27 Feb 2013 02:27:20 -0500 (EST) Date: Wed, 27 Feb 2013 02:25:17 -0500 From: Chen Gong Subject: Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Message-ID: <20130227072517.GA30971@gchen.bj.intel.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="MGYHOYXEY6WxJCY8" Content-Disposition: inline In-Reply-To: <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org --MGYHOYXEY6WxJCY8 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 
21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote: > Date: Thu, 21 Feb 2013 14:41:42 -0500 > From: Naoya Horiguchi > To: linux-mm@kvack.org > Cc: Andrew Morton , Mel Gorman , > Hugh Dickins , KOSAKI Motohiro > , Andi Kleen , > linux-kernel@vger.kernel.org > Subject: [PATCH 3/9] soft-offline: use migrate_pages() instead of > migrate_huge_page() > > Currently migrate_huge_page() takes a pointer to a hugepage to be > migrated as an argument, instead of taking a pointer to the list of > hugepages to be migrated. This behavior was introduced in commit > 189ebff28 ("hugetlb: simplify migrate_huge_page()"), and was OK > because until now hugepage migration is enabled only for soft-offlining > which takes only one hugepage in a single call. > > But the situation will change in the later patches in this series > which enable other users of page migration to support hugepage migration. > They can kick migration for both of normal pages and hugepages > in a single call, so we need to go back to original implementation > of using linked lists to collect the hugepages to be migrated. > > Signed-off-by: Naoya Horiguchi > --- > mm/memory-failure.c | 20 ++++++++++++++++---- > mm/migrate.c | 2 ++ > 2 files changed, 18 insertions(+), 4 deletions(-) > > diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c > index bc126f6..01e4676 100644 > --- v3.8.orig/mm/memory-failure.c > +++ v3.8/mm/memory-failure.c > @@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags) > int ret; > unsigned long pfn = page_to_pfn(page); > struct page *hpage = compound_head(page); > + LIST_HEAD(pagelist); > > /* Synchronized using the page lock with memory_failure() */ > lock_page(hpage); > @@ -1479,13 +1480,24 @@ static int soft_offline_huge_page(struct page *page, int flags) > unlock_page(hpage); > > /* Keep page count to indicate a given hugepage is isolated. 
*/ > - ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false, > - MIGRATE_SYNC); > - put_page(hpage); > + list_move(&hpage->lru, &pagelist); > + ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, false, > + MIGRATE_SYNC, MR_MEMORY_FAILURE); > if (ret) { > pr_info("soft offline: %#lx: migration failed %d, type %lx\n", > pfn, ret, page->flags); > - return ret; > + /* > + * We know that soft_offline_huge_page() tries to migrate > + * only one hugepage pointed to by hpage, so we need not > + * run through the pagelist here. > + */ > + putback_active_hugepage(hpage); > + if (ret > 0) > + ret = -EIO; > + } else { > + set_page_hwpoison_huge_page(hpage); > + dequeue_hwpoisoned_huge_page(hpage); > + atomic_long_add(1< email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx104.postini.com [74.125.245.104]) by kanga.kvack.org (Postfix) with SMTP id 06A4A6B0005 for ; Wed, 27 Feb 2013 02:38:06 -0500 (EST) Date: Wed, 27 Feb 2013 02:36:04 -0500 From: Chen Gong Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Message-ID: <20130227073604.GB30971@gchen.bj.intel.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="hHWLQfXTYDoKhP50" Content-Disposition: inline In-Reply-To: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org --hHWLQfXTYDoKhP50 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Feb 21, 2013 at 02:41:47PM -0500, Naoya Horiguchi wrote: > Date: Thu, 21 Feb 2013 14:41:47 -0500 > From: Naoya Horiguchi > 
To: linux-mm@kvack.org > Cc: Andrew Morton , Mel Gorman , > Hugh Dickins , KOSAKI Motohiro > , Andi Kleen , > linux-kernel@vger.kernel.org > Subject: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle > hugepage > > Currently we can't offline memory blocks which contain hugepages because > a hugepage is considered as an unmovable page. But now with this patch > series, a hugepage has become movable, so by using hugepage migration we > can offline such memory blocks. > > What's different from other users of hugepage migration is that we need > to decompose all the hugepages inside the target memory block into free > buddy pages after hugepage migration, because otherwise free hugepages > remaining in the memory block intervene the memory offlining. > For this reason we introduce new functions dissolve_free_huge_page() and > dissolve_free_huge_pages(). > > Other than that, what this patch does is straightforwardly to add hugepage > migration code, that is, adding hugepage code to the functions which scan > over pfn and collect hugepages to be migrated, and adding a hugepage > allocation function to alloc_migrate_target(). > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > over them because it's larger than memory block. So we now simply leave > it to fail as it is. 
> > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 8 ++++++++ > mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ > mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- > mm/migrate.c | 12 +++++++++++- > mm/page_alloc.c | 12 ++++++++++++ > mm/page_isolation.c | 5 +++++ > 6 files changed, 121 insertions(+), 10 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 86a4d78..e33f07f 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); > void putback_active_hugepage(struct page *page); > void putback_active_hugepages(struct list_head *l); > void migrate_hugepage_add(struct page *page, struct list_head *list); > +int is_hugepage_movable(struct page *page); > void copy_huge_page(struct page *dst, struct page *src); > > extern unsigned long hugepages_treat_as_movable; > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > #define putback_active_hugepage(p) 0 > #define putback_active_hugepages(l) 0 > #define migrate_hugepage_add(p, l) 0 > +#define is_hugepage_movable(x) 0 > static inline void copy_huge_page(struct page *dst, struct page *src) > { > } > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) > return h - hstates; > } > > +extern void dissolve_free_huge_page(struct page *page); > +extern void dissolve_free_huge_pages(unsigned long start_pfn, > + unsigned long end_pfn); > + > #else > struct hstate {}; > #define alloc_huge_page(v, a, r) NULL > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > } > #define hstate_index_to_shift(index) 0 > #define hstate_index(h) 0 > +#define dissolve_free_huge_page(p) 0 > +#define dissolve_free_huge_pages(s, e) 0 > #endif > > #endif /* _LINUX_HUGETLB_H */ > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index 
ccf9995..c28e6c9 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > return ret; > } > > +/* Dissolve a given free hugepage into free pages. */ > +void dissolve_free_huge_page(struct page *page) > +{ > + if (PageHuge(page) && !page_count(page)) { > + struct hstate *h = page_hstate(page); > + int nid = page_to_nid(page); > + spin_lock(&hugetlb_lock); > + list_del(&page->lru); > + h->free_huge_pages--; > + h->free_huge_pages_node[nid]--; > + update_and_free_page(h, page); > + spin_unlock(&hugetlb_lock); > + } > +} > + > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > +{ > + unsigned long pfn; > + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); > + for (pfn = start_pfn; pfn < end_pfn; pfn += step) > + dissolve_free_huge_page(pfn_to_page(pfn)); > +} > + > static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > { > struct page *page; > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > return 0; > } > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ > +int is_hugepage_movable(struct page *hpage) > +{ > + struct page *page; > + struct page *tmp; > + struct hstate *h = page_hstate(hpage); > + int ret = 0; > + > + VM_BUG_ON(!PageHuge(hpage)); > + if (PageTail(hpage)) > + return 0; > + spin_lock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > + if (page == hpage) > + ret = 1; I don't understand the logic here. 1) page is not removed why tmp is used? 2) why hitting (page ==hpage) but not breaking from the loop? > + spin_unlock(&hugetlb_lock); > + return ret; > +} > + > [...] 

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx175.postini.com [74.125.245.175]) by kanga.kvack.org (Postfix) with SMTP id 881C26B0002 for ; Wed, 27 Feb 2013 12:06:41 -0500 (EST)
Date: Wed, 27 Feb 2013 12:06:27 -0500
From: Naoya Horiguchi
Message-ID: <1361984787-yx7rovrg-mutt-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <20130227072517.GA30971@gchen.bj.intel.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com> <20130227072517.GA30971@gchen.bj.intel.com>
Subject: Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
List-ID:
To: gong.chen@linux.intel.com
Cc: linux-mm@kvack.org, Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

On Wed, Feb 27, 2013 at 02:25:17AM -0500, Chen Gong wrote:
> On Thu, Feb 21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote:
> > Date: Thu, 21 Feb 2013 14:41:42 -0500
...
> > diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c
> > index bc126f6..01e4676 100644
> > --- v3.8.orig/mm/memory-failure.c
> > +++ v3.8/mm/memory-failure.c
...
> > + atomic_long_add(1<
> mce_bad_pages has been substituted by num_poisoned_pages.

This patchset is based on v3.8 (as shown in the diff header), where the
replacing patch "memory-failure: use num_poisoned_pages instead of
mce_bad_pages" is not merged yet. I'll rebase on v3.8-rc1 in the
next post.

Thanks,
Naoya

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx159.postini.com [74.125.245.159]) by kanga.kvack.org (Postfix) with SMTP id A68BA6B0002 for ; Wed, 27 Feb 2013 12:17:06 -0500 (EST)
Date: Wed, 27 Feb 2013 12:16:55 -0500
From: Naoya Horiguchi
Message-ID: <1361985415-3tashl9l-mutt-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <20130227073604.GB30971@gchen.bj.intel.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <20130227073604.GB30971@gchen.bj.intel.com>
Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
List-ID:
To: gong.chen@linux.intel.com
Cc: linux-mm@kvack.org, Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

On Wed, Feb 27, 2013 at 02:36:04AM -0500, Chen Gong wrote:
> On Thu, Feb 21, 2013 at 02:41:47PM -0500, Naoya Horiguchi wrote:
...
> > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage)
> >  	return 0;
> >  }
> >
> > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */
> > +int is_hugepage_movable(struct page *hpage)
> > +{
> > +	struct page *page;
> > +	struct page *tmp;
> > +	struct hstate *h = page_hstate(hpage);
> > +	int ret = 0;
> > +
> > +	VM_BUG_ON(!PageHuge(hpage));
> > +	if (PageTail(hpage))
> > +		return 0;
> > +	spin_lock(&hugetlb_lock);
> > +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> > +		if (page == hpage)
> > +			ret = 1;
>
> I don't understand the logic here. 1) page is not removed why tmp is used?
> 2) why hitting (page ==hpage) but not breaking from the loop?

For question 1), using list_for_each_entry_safe() was a remnant of trial and error and will be fixed.
And for question 2), I will add break in later version. Thanks, Naoya > > + spin_unlock(&hugetlb_lock); > > + return ret; > > +} > > + > > [...] -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx203.postini.com [74.125.245.203]) by kanga.kvack.org (Postfix) with SMTP id 9F38D6B0002 for ; Wed, 27 Feb 2013 12:58:08 -0500 (EST) Date: Wed, 27 Feb 2013 12:57:57 -0500 From: Naoya Horiguchi Message-ID: <1361987877-6x88p62s-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361984787-yx7rovrg-mutt-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com> <20130227072517.GA30971@gchen.bj.intel.com> <1361984787-yx7rovrg-mutt-n-horiguchi@ah.jp.nec.com> Subject: Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: gong.chen@linux.intel.com Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Wed, Feb 27, 2013 at 12:06:27PM -0500, Naoya Horiguchi wrote: > On Wed, Feb 27, 2013 at 02:25:17AM -0500, Chen Gong wrote: > > On Thu, Feb 21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote: > > > Date: Thu, 21 Feb 2013 14:41:42 -0500 > ... > > > diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c > > > index bc126f6..01e4676 100644 > > > --- v3.8.orig/mm/memory-failure.c > > > +++ v3.8/mm/memory-failure.c > ... > > > + atomic_long_add(1< > > > mce_bad_pages has been substituted by num_poisoned_pages. 
> > This patchset is based on v3.8 (as show in diff header), where the > replacing patch "memory-failure: use num_poisoned_pages instead of > mce_bad_pages" is not merged yet. I'll rebase on v3.8-rc1 in the > next post. sorry, s/v3.8-rc1/v3.9-rc1/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx204.postini.com [74.125.245.204]) by kanga.kvack.org (Postfix) with SMTP id 132F26B0011 for ; Thu, 28 Feb 2013 01:02:58 -0500 (EST) Received: by mail-ee0-f50.google.com with SMTP id e51so1152942eek.9 for ; Wed, 27 Feb 2013 22:02:57 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> From: KOSAKI Motohiro Date: Thu, 28 Feb 2013 01:02:37 -0500 Message-ID: Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: "linux-mm@kvack.org" , Andrew Morton , Mel Gorman , Hugh Dickins , Andi Kleen , LKML > - { > - .procname = "hugepages_treat_as_movable", > - .data = &hugepages_treat_as_movable, > - .maxlen = sizeof(int), > - .mode = 0644, > - .proc_handler = hugetlb_treat_movable_handler, > - }, Sorry, no. This is too aggressive remove. Imagine, a lot of shell script don't have any error check. I suggest to keep this file but change to nop (to output warning is better). About 1-2 years after, we can remove this file safely. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx110.postini.com [74.125.245.110]) by kanga.kvack.org (Postfix) with SMTP id 886B56B0002 for ; Thu, 28 Feb 2013 13:17:05 -0500 (EST)
Date: Thu, 28 Feb 2013 13:16:52 -0500
From: Naoya Horiguchi
Message-ID: <1362075412-779292mh-mutt-n-horiguchi@ah.jp.nec.com>
In-Reply-To:
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
List-ID:
To: KOSAKI Motohiro
Cc: "linux-mm@kvack.org", Andrew Morton, Mel Gorman, Hugh Dickins, Andi Kleen, LKML

On Thu, Feb 28, 2013 at 01:02:37AM -0500, KOSAKI Motohiro wrote:
> > -	{
> > -		.procname = "hugepages_treat_as_movable",
> > -		.data = &hugepages_treat_as_movable,
> > -		.maxlen = sizeof(int),
> > -		.mode = 0644,
> > -		.proc_handler = hugetlb_treat_movable_handler,
> > -	},
>
> Sorry, no.
>
> This is too aggressive remove. Imagine, a lot of shell script don't
> have any error check.

Sure, it could break userspace applications.

> I suggest to keep this file but change to nop (to output warning is better).
> About 1-2 years after, we can remove this file safely.

OK, so I'll leave it in place for a while, with a comment saying that this
parameter is obsolete and shouldn't be used.

Thanks,
Naoya
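
One way to picture the "keep the file but make it a nop" approach agreed on above is the userspace sketch below. struct legacy_knob and legacy_knob_write() are hypothetical names made up for illustration; this is not the kernel sysctl API and not the handler from any later version of the patch. The assumed behaviour is: writes still succeed so that scripts without error checking keep working, a warning is printed once, and the stored value is no longer consulted by anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a tunable that is being phased out. */
struct legacy_knob {
	const char *name;
	int value;	/* still stored so reads return what was written */
	int warned;	/* nonzero once the deprecation notice was printed */
};

/*
 * Accept a write exactly as before (return 0 for success), but treat
 * the knob as a no-op: warn on the first write only and do not change
 * any other state based on new_value.
 */
int legacy_knob_write(struct legacy_knob *k, int new_value)
{
	assert(k != NULL);
	if (!k->warned) {
		fprintf(stderr,
			"%s is obsolete, has no effect and will be removed later\n",
			k->name);
		k->warned = 1;
	}
	k->value = new_value;
	return 0;
}
```

A caller that writes the knob twice gets 0 back both times and sees a single warning on stderr, which is roughly the compatibility property the review asks for before the file is finally removed.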
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx158.postini.com [74.125.245.158]) by kanga.kvack.org (Postfix) with SMTP id 5C5B96B0006 for ; Mon, 18 Mar 2013 10:52:03 -0400 (EDT) Date: Mon, 18 Mar 2013 15:51:59 +0100 From: Michal Hocko Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() Message-ID: <20130318145159.GP10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:40, Naoya Horiguchi wrote: [...] > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c > index 2fd8b4a..7d84f4c 100644 > --- v3.8.orig/mm/migrate.c > +++ v3.8/mm/migrate.c > @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > pte_unmap_unlock(ptep, ptl); > } > > +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address) > +{ > + spinlock_t *ptl = pte_lockptr(mm, pmd); > + pte_t pte; > + swp_entry_t entry; > + struct page *page; > + > + spin_lock(ptl); > + pte = huge_ptep_get((pte_t *)pmd); > + if (!is_hugetlb_entry_migration(pte)) > + goto out; > + entry = pte_to_swp_entry(pte); > + page = migration_entry_to_page(entry); > + if (!get_page_unless_zero(page)) > + goto out; > + spin_unlock(ptl); > + wait_on_page_locked(page); > + put_page(page); > + return; > +out: > + spin_unlock(ptl); > +} This duplicates a lot of code from migration_entry_wait. Can we just teach the generic one to be HugePage aware instead? 
All it takes is open-coding pte_offset_map_lock and calling huge_ptep_get
for a HugePage and pte_offset_map otherwise.
--
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx133.postini.com [74.125.245.133]) by kanga.kvack.org (Postfix) with SMTP id 313DF6B0005 for ; Mon, 18 Mar 2013 11:22:26 -0400 (EDT)
Date: Mon, 18 Mar 2013 16:22:24 +0100
From: Michal Hocko
Subject: Re: [PATCH 2/9] migrate: make core migration code aware of hugepage
Message-ID: <20130318152224.GQ10192@dhcp22.suse.cz>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Naoya Horiguchi
Cc: linux-mm@kvack.org, Andrew Morton, Mel Gorman, Hugh Dickins, KOSAKI Motohiro, Andi Kleen, linux-kernel@vger.kernel.org

On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote:
[...]
> diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h
> index 0d7df39..2e475b5 100644
> --- v3.8.orig/include/linux/mempolicy.h
> +++ v3.8/include/linux/mempolicy.h
> @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
>  /* Check if a vma is migratable */
>  static inline int vma_migratable(struct vm_area_struct *vma)
>  {
> -	if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
> +	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>  		return 0;

Is this safe? At least the check_*_range functions don't seem to be hugetlb aware.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx128.postini.com [74.125.245.128]) by kanga.kvack.org (Postfix) with SMTP id 74FF96B0005 for ; Mon, 18 Mar 2013 11:33:02 -0400 (EDT) Date: Mon, 18 Mar 2013 16:33:00 +0100 From: Michal Hocko Subject: Re: [PATCH 2/9] migrate: make core migration code aware of hugepage Message-ID: <20130318153300.GR10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318152224.GQ10192@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130318152224.GQ10192@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Mon 18-03-13 16:22:24, Michal Hocko wrote: > On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote: > [...] > > diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h > > index 0d7df39..2e475b5 100644 > > --- v3.8.orig/include/linux/mempolicy.h > > +++ v3.8/include/linux/mempolicy.h > > @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); > > /* Check if a vma is migratable */ > > static inline int vma_migratable(struct vm_area_struct *vma) > > { > > - if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP)) > > + if (vma->vm_flags & (VM_IO | VM_PFNMAP)) > > return 0; > > Is this safe? At least check_*_range don't seem to be hugetlb aware. Ohh, they become in 5/9. Should that one be reordered then? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx149.postini.com [74.125.245.149]) by kanga.kvack.org (Postfix) with SMTP id 9F77E6B0005 for ; Mon, 18 Mar 2013 11:40:59 -0400 (EDT) Date: Mon, 18 Mar 2013 16:40:57 +0100 From: Michal Hocko Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Message-ID: <20130318154057.GS10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: > This patch extends check_range() to handle vma with VM_HUGETLB set. > With this changes, we can migrate hugepage with migrate_pages(2). > Note that for larger hugepages (covered by pud entries, 1GB for > x86_64 for example), we simply skip it now. 
> > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 6 ++++-- > mm/hugetlb.c | 10 ++++++++++ > mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ > 3 files changed, 48 insertions(+), 14 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 8f87115..eb33df5 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); > int dequeue_hwpoisoned_huge_page(struct page *page); > void putback_active_hugepage(struct page *page); > void putback_active_hugepages(struct list_head *l); > +void migrate_hugepage_add(struct page *page, struct list_head *list); > void copy_huge_page(struct page *dst, struct page *src); > > extern unsigned long hugepages_treat_as_movable; > @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, > pmd_t *pmd, int write); > struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, > pud_t *pud, int write); > -int pmd_huge(pmd_t pmd); > -int pud_huge(pud_t pmd); > +extern int pmd_huge(pmd_t pmd); > +extern int pud_huge(pud_t pmd); extern is not needed here. 
> unsigned long hugetlb_change_protection(struct vm_area_struct *vma, > unsigned long address, unsigned long end, pgprot_t newprot); > > @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > > #define putback_active_hugepage(p) 0 > #define putback_active_hugepages(l) 0 > +#define migrate_hugepage_add(p, l) 0 > static inline void copy_huge_page(struct page *dst, struct page *src) > { > } > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index cb9d43b8..86ffcb7 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) > list_for_each_entry_safe(page, page2, l, lru) > putback_active_hugepage(page); > } > + > +void migrate_hugepage_add(struct page *page, struct list_head *list) > +{ > + VM_BUG_ON(!PageHuge(page)); > + get_page(page); > + spin_lock(&hugetlb_lock); Why hugetlb_lock? Comment for this lock says that it protects hugepage_freelists, nr_huge_pages, and free_huge_pages. > + list_move_tail(&page->lru, list); > + spin_unlock(&hugetlb_lock); > + return; > +} > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > index e2df1c1..8627135 100644 > --- v3.8.orig/mm/mempolicy.c > +++ v3.8/mm/mempolicy.c > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > return addr != end; > } > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > + const nodemask_t *nodes, unsigned long flags, > + void *private) > +{ > +#ifdef CONFIG_HUGETLB_PAGE > + int nid; > + struct page *page; > + > + spin_lock(&vma->vm_mm->page_table_lock); > + page = pte_page(huge_ptep_get((pte_t *)pmd)); > + spin_unlock(&vma->vm_mm->page_table_lock); I am a bit confused why page_table_lock is used here and why it doesn't cover the page usage. 
> + nid = page_to_nid(page); > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) > + || flags & MPOL_MF_MOVE_ALL)) > + migrate_hugepage_add(page, private); > +#else > + BUG(); > +#endif > +} > + > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > unsigned long addr, unsigned long end, > const nodemask_t *nodes, unsigned long flags, > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > pmd = pmd_offset(pud, addr); > do { > next = pmd_addr_end(addr, end); > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() sufficient? > + check_hugetlb_pmd_range(vma, pmd, nodes, > + flags, private); > + continue; > + } > split_huge_page_pmd(vma, addr, pmd); > if (pmd_none_or_trans_huge_or_clear_bad(pmd)) > continue; [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
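
The queueing condition in the quoted check_hugetlb_pmd_range() packs three decisions into one expression: the node test (optionally inverted), the "move only if mapped once" case, and the "move regardless of sharing" case. A rough standalone sketch of the same predicate follows. It assumes made-up flag values (F_MOVE, F_MOVE_ALL and F_INVERT stand in for MPOL_MF_MOVE, MPOL_MF_MOVE_ALL and MPOL_MF_INVERT and do not use the kernel's numeric values) and plain integers in place of the page and nodemask lookups; should_queue_for_migration() is a hypothetical name.

```c
#include <assert.h>

/* Illustrative flag bits only; not the kernel's values. */
enum {
	F_MOVE     = 1 << 0,	/* move pages mapped by exactly one process */
	F_MOVE_ALL = 1 << 1,	/* move pages even if shared */
	F_INVERT   = 1 << 2	/* invert the sense of the node test */
};

/*
 * Mirror of the quoted condition:
 *   node_isset(nid, *nodes) != !!(flags & INVERT)
 *     && ((flags & MOVE && mapcount == 1) || flags & MOVE_ALL)
 * nid_in_nodes is nonzero if the page's node is in the nodemask.
 */
int should_queue_for_migration(int nid_in_nodes, unsigned long flags,
			       int mapcount)
{
	int invert = !!(flags & F_INVERT);

	assert(mapcount >= 0);
	if (!!nid_in_nodes == invert)
		return 0;		/* node test does not select this page */
	if (flags & F_MOVE_ALL)
		return 1;		/* sharing does not matter */
	return (flags & F_MOVE) && mapcount == 1;
}
```

For example, a page on a selected node with F_MOVE set is queued when its mapcount is 1 but not when it is 2, while F_MOVE_ALL queues it in both cases; adding F_INVERT flips which nodes are selected.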
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx197.postini.com [74.125.245.197]) by kanga.kvack.org (Postfix) with SMTP id D42586B0005 for ; Mon, 18 Mar 2013 11:51:26 -0400 (EDT) Date: Mon, 18 Mar 2013 16:51:25 +0100 From: Michal Hocko Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Message-ID: <20130318155125.GT10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:48, Naoya Horiguchi wrote: > Now hugepages are definitely movable. So allocating hugepages from > ZONE_MOVABLE is natural and we have no reason to keep this parameter. The sysctl is a part of user interface so you shouldn't remove it right away. What we can do is to make it noop and only WARN() that the interface will be removed later so that userspace can prepare for that. > Signed-off-by: Naoya Horiguchi > --- > Documentation/sysctl/vm.txt | 16 ---------------- > include/linux/hugetlb.h | 2 -- > kernel/sysctl.c | 7 ------- > mm/hugetlb.c | 23 +++++------------------ > 4 files changed, 5 insertions(+), 43 deletions(-) > > diff --git v3.8.orig/Documentation/sysctl/vm.txt v3.8/Documentation/sysctl/vm.txt > index 078701f..997350a 100644 > --- v3.8.orig/Documentation/sysctl/vm.txt > +++ v3.8/Documentation/sysctl/vm.txt > @@ -167,22 +167,6 @@ fragmentation index is <= extfrag_threshold. The default value is 500. 
> > ============================================================== > > -hugepages_treat_as_movable > - > -This parameter is only useful when kernelcore= is specified at boot time to > -create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages > -are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero > -value written to hugepages_treat_as_movable allows huge pages to be allocated > -from ZONE_MOVABLE. > - > -Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge > -pages pool can easily grow or shrink within. Assuming that applications are > -not running that mlock() a lot of memory, it is likely the huge pages pool > -can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value > -into nr_hugepages and triggering page reclaim. > - > -============================================================== > - > hugetlb_shm_group > > hugetlb_shm_group contains group id that is allowed to create SysV > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index e33f07f..c97e5c5 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -35,7 +35,6 @@ int PageHuge(struct page *page); > void reset_vma_resv_huge_pages(struct vm_area_struct *vma); > int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > -int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > > #ifdef CONFIG_NUMA > int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, > @@ -73,7 +72,6 @@ void migrate_hugepage_add(struct page *page, struct list_head *list); > int is_hugepage_movable(struct page *page); > void copy_huge_page(struct page *dst, struct page *src); > > -extern unsigned long hugepages_treat_as_movable; > extern const unsigned long hugetlb_zero, hugetlb_infinity; > extern int sysctl_hugetlb_shm_group; > extern 
struct list_head huge_boot_pages; > diff --git v3.8.orig/kernel/sysctl.c v3.8/kernel/sysctl.c > index c88878d..a98bcf2 100644 > --- v3.8.orig/kernel/sysctl.c > +++ v3.8/kernel/sysctl.c > @@ -1189,13 +1189,6 @@ static struct ctl_table vm_table[] = { > .mode = 0644, > .proc_handler = proc_dointvec, > }, > - { > - .procname = "hugepages_treat_as_movable", > - .data = &hugepages_treat_as_movable, > - .maxlen = sizeof(int), > - .mode = 0644, > - .proc_handler = hugetlb_treat_movable_handler, > - }, > { > .procname = "nr_overcommit_hugepages", > .data = NULL, > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index c28e6c9..c60d203 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -33,7 +33,6 @@ > #include "internal.h" > > const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; > -static gfp_t htlb_alloc_mask = GFP_HIGHUSER; > unsigned long hugepages_treat_as_movable; > > int hugetlb_max_hstate __read_mostly; > @@ -542,7 +541,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, > retry_cpuset: > cpuset_mems_cookie = get_mems_allowed(); > zonelist = huge_zonelist(vma, address, > - htlb_alloc_mask, &mpol, &nodemask); > + GFP_HIGHUSER_MOVABLE, &mpol, &nodemask); > /* > * A child process with MAP_PRIVATE mappings created by their parent > * have no page reserves. 
This check ensures that reservations are > @@ -558,7 +557,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, > > for_each_zone_zonelist_nodemask(zone, z, zonelist, > MAX_NR_ZONES - 1, nodemask) { > - if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) { > + if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) { > page = dequeue_huge_page_node(h, zone_to_nid(zone)); > if (page) { > if (!avoid_reserve) > @@ -698,7 +697,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid) > return NULL; > > page = alloc_pages_exact_node(nid, > - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| > + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| > __GFP_REPEAT|__GFP_NOWARN, > huge_page_order(h)); > if (page) { > @@ -909,12 +908,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > spin_unlock(&hugetlb_lock); > > if (nid == NUMA_NO_NODE) > - page = alloc_pages(htlb_alloc_mask|__GFP_COMP| > + page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP| > __GFP_REPEAT|__GFP_NOWARN, > huge_page_order(h)); > else > page = alloc_pages_exact_node(nid, > - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| > + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| > __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); > > if (page && arch_prepare_hugepage(page)) { > @@ -2078,18 +2077,6 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *table, int write, > } > #endif /* CONFIG_NUMA */ > > -int hugetlb_treat_movable_handler(struct ctl_table *table, int write, > - void __user *buffer, > - size_t *length, loff_t *ppos) > -{ > - proc_dointvec(table, write, buffer, length, ppos); > - if (hugepages_treat_as_movable) > - htlb_alloc_mask = GFP_HIGHUSER_MOVABLE; > - else > - htlb_alloc_mask = GFP_HIGHUSER; > - return 0; > -} > - > int hugetlb_overcommit_handler(struct ctl_table *table, int write, > void __user *buffer, > size_t *length, loff_t *ppos) > -- > 1.7.11.7 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the 
body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx186.postini.com [74.125.245.186]) by kanga.kvack.org (Postfix) with SMTP id CD2536B0005 for ; Mon, 18 Mar 2013 12:07:40 -0400 (EDT) Date: Mon, 18 Mar 2013 17:07:37 +0100 From: Michal Hocko Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Message-ID: <20130318160737.GU10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote: > Currently we can't offline memory blocks which contain hugepages because > a hugepage is considered as an unmovable page. But now with this patch > series, a hugepage has become movable, so by using hugepage migration we > can offline such memory blocks. > > What's different from other users of hugepage migration is that we need > to decompose all the hugepages inside the target memory block into free > buddy pages after hugepage migration, because otherwise free hugepages > remaining in the memory block intervene the memory offlining. > For this reason we introduce new functions dissolve_free_huge_page() and > dissolve_free_huge_pages(). 
> > Other than that, what this patch does is straightforwardly to add hugepage > migration code, that is, adding hugepage code to the functions which scan > over pfn and collect hugepages to be migrated, and adding a hugepage > allocation function to alloc_migrate_target(). > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > over them because it's larger than memory block. So we now simply leave > it to fail as it is. What we could do is to check whether there is a free gb huge page on other node and migrate there. > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 8 ++++++++ > mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ > mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- > mm/migrate.c | 12 +++++++++++- > mm/page_alloc.c | 12 ++++++++++++ > mm/page_isolation.c | 5 +++++ > 6 files changed, 121 insertions(+), 10 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 86a4d78..e33f07f 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); > void putback_active_hugepage(struct page *page); > void putback_active_hugepages(struct list_head *l); > void migrate_hugepage_add(struct page *page, struct list_head *list); > +int is_hugepage_movable(struct page *page); > void copy_huge_page(struct page *dst, struct page *src); > > extern unsigned long hugepages_treat_as_movable; > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > #define putback_active_hugepage(p) 0 > #define putback_active_hugepages(l) 0 > #define migrate_hugepage_add(p, l) 0 > +#define is_hugepage_movable(x) 0 > static inline void copy_huge_page(struct page *dst, struct page *src) > { > } > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) > return h - hstates; > } > > +extern void dissolve_free_huge_page(struct page *page); > 
+extern void dissolve_free_huge_pages(unsigned long start_pfn, > + unsigned long end_pfn); > + > #else > struct hstate {}; > #define alloc_huge_page(v, a, r) NULL > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > } > #define hstate_index_to_shift(index) 0 > #define hstate_index(h) 0 > +#define dissolve_free_huge_page(p) 0 > +#define dissolve_free_huge_pages(s, e) 0 > #endif > > #endif /* _LINUX_HUGETLB_H */ > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index ccf9995..c28e6c9 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > return ret; > } > > +/* Dissolve a given free hugepage into free pages. */ > +void dissolve_free_huge_page(struct page *page) > +{ > + if (PageHuge(page) && !page_count(page)) { Could you clarify why you are checking page_count here? I assume it is to make sure the page is free, but what prevents it from being increased before you take hugetlb_lock? > + struct hstate *h = page_hstate(page); > + int nid = page_to_nid(page); > + spin_lock(&hugetlb_lock); > + list_del(&page->lru); > + h->free_huge_pages--; > + h->free_huge_pages_node[nid]--; > + update_and_free_page(h, page); > + spin_unlock(&hugetlb_lock); > + } > +} > + > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > +{ > + unsigned long pfn; > + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); hugetlb pages could be present in different sizes so this doesn't work in general. You need to get the order from page_hstate. 
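To illustrate the step-size concern, here is a minimal userspace sketch, not kernel code. It assumes a made-up array in which each pfn slot stores the order of the hugepage whose head sits there (0 for anything else); model_order_at() is a hypothetical stand-in for reading the order through page_hstate(). It compares a walk that steps by each page's own order against a walk with one fixed step.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. orders[pfn] holds the order of the hugepage whose
 * head page is at pfn, or 0 if no hugepage head is there.
 * model_order_at() is a hypothetical stand-in for deriving the order from
 * page_hstate() on a real struct page.
 */
static unsigned int model_order_at(const unsigned char *orders,
				   unsigned long pfn)
{
	return orders[pfn];
}

/* Walk the range, advancing by the size of whatever page is found. */
static unsigned long count_heads_per_page_step(const unsigned char *orders,
					       unsigned long start_pfn,
					       unsigned long end_pfn)
{
	unsigned long pfn = start_pfn;
	unsigned long heads = 0;

	while (pfn < end_pfn) {
		unsigned int order = model_order_at(orders, pfn);

		if (order)
			heads++;
		/* Order 0 advances one pfn; a hugepage advances past itself. */
		pfn += 1UL << order;
	}
	return heads;
}

/* Walk the range with a single fixed step, as the reviewed loop does. */
static unsigned long count_heads_fixed_step(const unsigned char *orders,
					    unsigned long start_pfn,
					    unsigned long end_pfn,
					    unsigned int fixed_order)
{
	unsigned long pfn;
	unsigned long heads = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn += 1UL << fixed_order)
		if (model_order_at(orders, pfn))
			heads++;
	return heads;
}
```

With mixed sizes in one range, the fixed-step walk can jump over the head of a smaller hugepage, which is why the order probably has to come from the page itself.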
> + for (pfn = start_pfn; pfn < end_pfn; pfn += step) > + dissolve_free_huge_page(pfn_to_page(pfn)); > +} > + > static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > { > struct page *page; > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > return 0; > } > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ > +int is_hugepage_movable(struct page *hpage) > +{ > + struct page *page; > + struct page *tmp; > + struct hstate *h = page_hstate(hpage); > + int ret = 0; > + > + VM_BUG_ON(!PageHuge(hpage)); > + if (PageTail(hpage)) > + return 0; > + spin_lock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > + if (page == hpage) > + ret = 1; > + spin_unlock(&hugetlb_lock); > + return ret; > +} > + > /* > * This function is called from memory failure code. > * Assume the caller holds page lock of the head page. > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > index d04ed87..6418de2 100644 > --- v3.8.orig/mm/memory_hotplug.c > +++ v3.8/mm/memory_hotplug.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #include > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > } > > /* > - * Scanning pfn is much easier than scanning lru list. > - * Scan pfn from start to end and Find LRU page. > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > + * and hugepages). We scan pfn because it's much easier than scanning over > + * linked list. This function returns the pfn of the first found movable > + * page if it's found, otherwise 0. 
> */ > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > { > unsigned long pfn; > struct page *page; > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > page = pfn_to_page(pfn); > if (PageLRU(page)) > return pfn; > + if (PageHuge(page)) { > + if (is_hugepage_movable(page)) > + return pfn; > + else > + pfn += (1 << compound_order(page)) - 1; > + } scan_lru_pages's name gets really confusing after this change because hugetlb pages are not on the LRU. Maybe it would be good to rename it. > } > } > return 0; > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > page = pfn_to_page(pfn); > if (!get_page_unless_zero(page)) > continue; All tail pages have 0 reference count (according to prep_compound_page) so they would be skipped anyway. This makes the below pfn tweaks pointless. > + if (PageHuge(page)) { > + /* > + * Larger hugepage (1GB for x86_64) is larger than > + * memory block, so pfn scan can start at the tail > + * page of larger hugepage. In such case, > + * we simply skip the hugepage and move the cursor > + * to the last tail page. > + */ > + if (PageTail(page)) { > + struct page *head = compound_head(page); > + pfn = page_to_pfn(head) + > + (1 << compound_order(head)) - 1; > + put_page(page); > + continue; > + } > + pfn = (1 << compound_order(page)) - 1; > + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { > + put_page(page); > + continue; > + } There might be other hugepage sizes which fit into memblock so this test doesn't seem right. > + list_move_tail(&page->lru, &source); > + move_pages -= 1 << compound_order(page); > + continue; > + } > /* > * We can skip free pages. And we can only deal with pages on > * LRU. [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx123.postini.com [74.125.245.123]) by kanga.kvack.org (Postfix) with SMTP id 3E90C6B0005 for ; Mon, 18 Mar 2013 20:06:36 -0400 (EDT) Date: Mon, 18 Mar 2013 20:06:23 -0400 From: Naoya Horiguchi Message-ID: <1363651583-dzi7mg86-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318145159.GP10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318145159.GP10192@dhcp22.suse.cz> Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 03:51:59PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:40, Naoya Horiguchi wrote: > [...] 
> > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c > > index 2fd8b4a..7d84f4c 100644 > > --- v3.8.orig/mm/migrate.c > > +++ v3.8/mm/migrate.c > > @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > > pte_unmap_unlock(ptep, ptl); > > } > > > > +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > > + unsigned long address) > > +{ > > + spinlock_t *ptl = pte_lockptr(mm, pmd); > > + pte_t pte; > > + swp_entry_t entry; > > + struct page *page; > > + > > + spin_lock(ptl); > > + pte = huge_ptep_get((pte_t *)pmd); > > + if (!is_hugetlb_entry_migration(pte)) > > + goto out; > > + entry = pte_to_swp_entry(pte); > > + page = migration_entry_to_page(entry); > > + if (!get_page_unless_zero(page)) > > + goto out; > > + spin_unlock(ptl); > > + wait_on_page_locked(page); > > + put_page(page); > > + return; > > +out: > > + spin_unlock(ptl); > > +} > > This duplicates a lot of code from migration_entry_wait. Can we just > teach the generic one to be HugePage aware instead? > All it takes is just opencoding pte_offset_map_lock and calling > huge_ptep_get for HugePage and pte_offset_map otherwise. Yes, it's possible with some cleanup. I'll do this. Thanks, Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx196.postini.com [74.125.245.196]) by kanga.kvack.org (Postfix) with SMTP id 8C4A06B0006 for ; Mon, 18 Mar 2013 20:06:45 -0400 (EDT) Date: Mon, 18 Mar 2013 20:06:35 -0400 From: Naoya Horiguchi Message-ID: <1363651595-ewr7efx1-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318153300.GR10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318152224.GQ10192@dhcp22.suse.cz> <20130318153300.GR10192@dhcp22.suse.cz> Subject: Re: [PATCH 2/9] migrate: make core migration code aware of hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 04:33:00PM +0100, Michal Hocko wrote: > On Mon 18-03-13 16:22:24, Michal Hocko wrote: > > On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote: > > [...] > > > diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h > > > index 0d7df39..2e475b5 100644 > > > --- v3.8.orig/include/linux/mempolicy.h > > > +++ v3.8/include/linux/mempolicy.h > > > @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); > > > /* Check if a vma is migratable */ > > > static inline int vma_migratable(struct vm_area_struct *vma) > > > { > > > - if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP)) > > > + if (vma->vm_flags & (VM_IO | VM_PFNMAP)) > > > return 0; > > > > Is this safe? At least check_*_range don't seem to be hugetlb aware. > > Ohh, they become in 5/9. Should that one be reordered then? OK, I'll shift this change after 5/9 patch. 
Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx116.postini.com [74.125.245.116]) by kanga.kvack.org (Postfix) with SMTP id 37F046B0027 for ; Mon, 18 Mar 2013 20:07:26 -0400 (EDT) Date: Mon, 18 Mar 2013 20:07:16 -0400 From: Naoya Horiguchi Message-ID: <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318154057.GS10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: > > This patch extends check_range() to handle vma with VM_HUGETLB set. > > With this changes, we can migrate hugepage with migrate_pages(2). > > Note that for larger hugepages (covered by pud entries, 1GB for > > x86_64 for example), we simply skip it now. 
> > > > Signed-off-by: Naoya Horiguchi > > --- > > include/linux/hugetlb.h | 6 ++++-- > > mm/hugetlb.c | 10 ++++++++++ > > mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ > > 3 files changed, 48 insertions(+), 14 deletions(-) > > > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > > index 8f87115..eb33df5 100644 > > --- v3.8.orig/include/linux/hugetlb.h > > +++ v3.8/include/linux/hugetlb.h > > @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); > > int dequeue_hwpoisoned_huge_page(struct page *page); > > void putback_active_hugepage(struct page *page); > > void putback_active_hugepages(struct list_head *l); > > +void migrate_hugepage_add(struct page *page, struct list_head *list); > > void copy_huge_page(struct page *dst, struct page *src); > > > > extern unsigned long hugepages_treat_as_movable; > > @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, > > pmd_t *pmd, int write); > > struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, > > pud_t *pud, int write); > > -int pmd_huge(pmd_t pmd); > > -int pud_huge(pud_t pmd); > > +extern int pmd_huge(pmd_t pmd); > > +extern int pud_huge(pud_t pmd); > > extern is not needed here. OK. 
> > unsigned long hugetlb_change_protection(struct vm_area_struct *vma, > > unsigned long address, unsigned long end, pgprot_t newprot); > > > > @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > > > > #define putback_active_hugepage(p) 0 > > #define putback_active_hugepages(l) 0 > > +#define migrate_hugepage_add(p, l) 0 > > static inline void copy_huge_page(struct page *dst, struct page *src) > > { > > } > > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > > index cb9d43b8..86ffcb7 100644 > > --- v3.8.orig/mm/hugetlb.c > > +++ v3.8/mm/hugetlb.c > > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) > > list_for_each_entry_safe(page, page2, l, lru) > > putback_active_hugepage(page); > > } > > + > > +void migrate_hugepage_add(struct page *page, struct list_head *list) > > +{ > > + VM_BUG_ON(!PageHuge(page)); > > + get_page(page); > > + spin_lock(&hugetlb_lock); > > Why hugetlb_lock? Comment for this lock says that it protects > hugepage_freelists, nr_huge_pages, and free_huge_pages. I think that this comment is out of date and hugepage_activelists, which was introduced recently, should be protected because this patchset adds is_hugepage_movable() which runs through the list. So I'll update the comment in the next post. 
> > + list_move_tail(&page->lru, list); > > + spin_unlock(&hugetlb_lock); > > + return; > > +} > > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > > index e2df1c1..8627135 100644 > > --- v3.8.orig/mm/mempolicy.c > > +++ v3.8/mm/mempolicy.c > > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > > return addr != end; > > } > > > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > > + const nodemask_t *nodes, unsigned long flags, > > + void *private) > > +{ > > +#ifdef CONFIG_HUGETLB_PAGE > > + int nid; > > + struct page *page; > > + > > + spin_lock(&vma->vm_mm->page_table_lock); > > + page = pte_page(huge_ptep_get((pte_t *)pmd)); > > + spin_unlock(&vma->vm_mm->page_table_lock); > > I am a bit confused why page_table_lock is used here and why it doesn't > cover the page usage. I expected this function to do the same for pmd as check_pte_range() does for pte, but the above code didn't do it. I should've put spin_unlock below migrate_hugepage_add(). Sorry for the confusion. > > + nid = page_to_nid(page); > > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) > > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) > > + || flags & MPOL_MF_MOVE_ALL)) > > + migrate_hugepage_add(page, private); > > +#else > > + BUG(); > > +#endif > > +} > > + > > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > unsigned long addr, unsigned long end, > > const nodemask_t *nodes, unsigned long flags, > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > pmd = pmd_offset(pud, addr); > > do { > > next = pmd_addr_end(addr, end); > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() > sufficient? I think we need both check here because if we use only pmd_huge(), pmd for thp goes into this branch wrongly. 
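The point about needing both checks can be shown with a small userspace model (not kernel code). The MODEL_* flag values and model_* names below are made up for this sketch. The assumption is that the "huge" bit on a pmd only says the entry maps a large page, and that it is the vma flag which tells a hugetlbfs mapping from a thp one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, used by this userspace model only. */
#define MODEL_PAGE_PSE   0x1UL	/* pmd maps a large page (hugetlb or thp) */
#define MODEL_VM_HUGETLB 0x2UL	/* vma is a hugetlbfs mapping */

struct model_pmd { unsigned long flags; };
struct model_vma { unsigned long vm_flags; };

/* Stand-in for pmd_huge(): looks at the large-page bit and nothing else. */
static bool model_pmd_huge(struct model_pmd pmd)
{
	return (pmd.flags & MODEL_PAGE_PSE) != 0;
}

/* Stand-in for is_vm_hugetlb_page(): looks at the vma flag only. */
static bool model_is_vm_hugetlb_page(const struct model_vma *vma)
{
	return (vma->vm_flags & MODEL_VM_HUGETLB) != 0;
}

/* The combined test: take the hugetlb branch only when both agree. */
static bool model_takes_hugetlb_branch(const struct model_vma *vma,
				       struct model_pmd pmd)
{
	return model_pmd_huge(pmd) && model_is_vm_hugetlb_page(vma);
}
```

In this model a thp mapping is a large pmd inside a vma without the hugetlb flag. The pmd-only test accepts it, while the combined test rejects it.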
Thanks, Naoya > > + check_hugetlb_pmd_range(vma, pmd, nodes, > > + flags, private); > > + continue; > > + } > > split_huge_page_pmd(vma, addr, pmd); > > if (pmd_none_or_trans_huge_or_clear_bad(pmd)) > > continue; > [...] > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx126.postini.com [74.125.245.126]) by kanga.kvack.org (Postfix) with SMTP id 0F92A6B0037 for ; Mon, 18 Mar 2013 20:07:40 -0400 (EDT) Date: Mon, 18 Mar 2013 20:07:32 -0400 From: Naoya Horiguchi Message-ID: <1363651652-dcf5qvg4-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318155125.GT10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318155125.GT10192@dhcp22.suse.cz> Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 04:51:25PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:48, Naoya Horiguchi wrote: > > Now hugepages are definitely movable. So allocating hugepages from > > ZONE_MOVABLE is natural and we have no reason to keep this parameter. > > The sysctl is a part of user interface so you shouldn't remove it right > away. What we can do is to make it noop and only WARN() that the > interface will be removed later so that userspace can prepare for that. > Yes, you're right. I'll replace the handler with noop. 
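As a rough userspace sketch of the "noop plus warning" idea (not the actual sysctl handler; every model_* name here is hypothetical): writes are accepted so that existing scripts keep working, the value is ignored, and a deprecation message is printed once.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only; these names do not exist in the kernel. */
static int model_warned;
static unsigned long model_treat_as_movable;	/* stays at its default */

/* Reads keep returning the (now meaningless) stored value. */
static unsigned long model_treat_as_movable_read(void)
{
	return model_treat_as_movable;
}

/*
 * Writes succeed but change nothing. The first write prints a one-time
 * notice so that userspace can prepare for the interface going away.
 */
static int model_treat_as_movable_write(unsigned long value)
{
	(void)value;	/* deliberately ignored */
	if (!model_warned) {
		fprintf(stderr,
			"hugepages_treat_as_movable is deprecated and has no effect\n");
		model_warned = 1;
	}
	return 0;
}
```

Returning success rather than an error is the design choice here: an existing tool that still writes the knob does not start failing before the interface is removed.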
Thanks, Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx132.postini.com [74.125.245.132]) by kanga.kvack.org (Postfix) with SMTP id E26486B0005 for ; Tue, 19 Mar 2013 03:11:17 -0400 (EDT) Received: by mail-wi0-f179.google.com with SMTP id ez12so110562wid.12 for ; Tue, 19 Mar 2013 00:11:16 -0700 (PDT) Date: Tue, 19 Mar 2013 08:11:13 +0100 From: Michal Hocko Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Message-ID: <20130319071113.GD5112@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote: > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: [...] > > > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) > > > list_for_each_entry_safe(page, page2, l, lru) > > > putback_active_hugepage(page); > > > } > > > + > > > +void migrate_hugepage_add(struct page *page, struct list_head *list) > > > +{ > > > + VM_BUG_ON(!PageHuge(page)); > > > + get_page(page); > > > + spin_lock(&hugetlb_lock); > > > > Why hugetlb_lock? Comment for this lock says that it protects > > hugepage_freelists, nr_huge_pages, and free_huge_pages. 
> > I think that this comment is out of date and hugepage_activelists, > which was introduced recently, should be protected because this > patchset adds is_hugepage_movable() which runs through the list. > So I'll update the comment in the next post. > > > > + list_move_tail(&page->lru, list); > > > + spin_unlock(&hugetlb_lock); > > > + return; > > > +} > > > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > > > index e2df1c1..8627135 100644 > > > --- v3.8.orig/mm/mempolicy.c > > > +++ v3.8/mm/mempolicy.c > > > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > > > return addr != end; > > > } > > > > > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > > > + const nodemask_t *nodes, unsigned long flags, > > > + void *private) > > > +{ > > > +#ifdef CONFIG_HUGETLB_PAGE > > > + int nid; > > > + struct page *page; > > > + > > > + spin_lock(&vma->vm_mm->page_table_lock); > > > + page = pte_page(huge_ptep_get((pte_t *)pmd)); > > > + spin_unlock(&vma->vm_mm->page_table_lock); > > > > I am a bit confused why page_table_lock is used here and why it doesn't > > cover the page usage. > > I expected this function to do the same for pmd as check_pte_range() does > for pte, but the above code didn't do it. I should've put spin_unlock > below migrate_hugepage_add(). Sorry for the confusion. OK, I see. So you want to prevent from racing with pmd unmap. 
> > > + nid = page_to_nid(page); > > > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) > > > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) > > > + || flags & MPOL_MF_MOVE_ALL)) > > > + migrate_hugepage_add(page, private); > > > +#else > > > + BUG(); > > > +#endif > > > +} > > > + > > > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > > unsigned long addr, unsigned long end, > > > const nodemask_t *nodes, unsigned long flags, > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > > pmd = pmd_offset(pud, addr); > > > do { > > > next = pmd_addr_end(addr, end); > > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { > > > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() > > sufficient? > > I think we need both check here because if we use only pmd_huge(), > pmd for thp goes into this branch wrongly. Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it obviously checks only _PAGE_PSE same as pmd_large() which is really unfortunate and confusing. Can we make it hugetlb specific? > > Thanks, > Naoya -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx148.postini.com [74.125.245.148]) by kanga.kvack.org (Postfix) with SMTP id 8B0D76B0005 for ; Tue, 19 Mar 2013 19:43:50 -0400 (EDT) Received: by mail-da0-f43.google.com with SMTP id u36so607167dak.2 for ; Tue, 19 Mar 2013 16:43:49 -0700 (PDT) Message-ID: <5148F830.3070601@gmail.com> Date: Wed, 20 Mar 2013 07:43:44 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > Hi, > > Hugepage migration is now available only for soft offlining (moving > data on the half corrupted page to another page to save the data). > But it's also useful some other users of page migration, so this > patchset tries to extend some of such users to support hugepage. > > The targets of this patchset are NUMA related system calls (i.e. > migrate_pages(2), move_pages(2), and mbind(2)), and memory hotplug. > This patchset does not extend page migration in memory compaction, > because I think that users of memory compaction mainly expect to > construct thp by arranging raw pages but hugepage migration doesn't > help it. > CMA, another user of page migration, can have benefit from hugepage > migration, but is not enabled to support it now. This is because > I've never used CMA and need to learn more to extend and/or test > hugepage migration in CMA. I'll add this in later version if it > becomes ready, or will post as a separate patchset. 
> > Hugepage migration of 1GB hugepage is not enabled for now, because > I'm not sure whether users of 1GB hugepage really want it. > We need to spare free hugepage in order to do migration, but I don't > think that users want to 1GB memory to idle for that purpose > (currently we can't expand/shrink 1GB hugepage pool after boot). > > Could you review and give me some comments/feedbacks? > > Thanks, > Naoya Horiguchi > --- > Easy patch access: > git@github.com:Naoya-Horiguchi/linux.git > branch:extend_hugepage_migration > > Test code: > git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git git clone git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git Cloning into test_hugepage_migration_extension... Permission denied (publickey). fatal: The remote end hung up unexpectedly > > Naoya Horiguchi (9): > migrate: add migrate_entry_wait_huge() > migrate: make core migration code aware of hugepage > soft-offline: use migrate_pages() instead of migrate_huge_page() > migrate: clean up migrate_huge_page() > migrate: enable migrate_pages() to migrate hugepage > migrate: enable move_pages() to migrate hugepage > mbind: enable mbind() to migrate hugepage > memory-hotplug: enable memory hotplug to handle hugepage > remove /proc/sys/vm/hugepages_treat_as_movable > > Documentation/sysctl/vm.txt | 16 ------ > include/linux/hugetlb.h | 25 ++++++++-- > include/linux/mempolicy.h | 2 +- > include/linux/migrate.h | 12 ++--- > include/linux/swapops.h | 4 ++ > kernel/sysctl.c | 7 --- > mm/hugetlb.c | 98 ++++++++++++++++++++++++++++-------- > mm/memory-failure.c | 20 ++++++-- > mm/memory.c | 6 ++- > mm/memory_hotplug.c | 51 +++++++++++++++---- > mm/mempolicy.c | 61 +++++++++++++++-------- > mm/migrate.c | 119 ++++++++++++++++++++++++++++++-------------- > mm/page_alloc.c | 12 +++++ > mm/page_isolation.c | 5 ++ > 14 files changed, 311 insertions(+), 127 deletions(-) > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to 
majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx147.postini.com [74.125.245.147]) by kanga.kvack.org (Postfix) with SMTP id 798E46B0006 for ; Tue, 19 Mar 2013 19:57:39 -0400 (EDT) Received: by mail-pd0-f169.google.com with SMTP id 3so366862pdj.0 for ; Tue, 19 Mar 2013 16:57:38 -0700 (PDT) Message-ID: <5148FB6C.4070202@gmail.com> Date: Wed, 20 Mar 2013 07:57:32 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > When we have a page fault for the address which is backed by a hugepage > under migration, the kernel can't wait correctly until the migration > finishes. This is because pte_offset_map_lock() can't get a correct It seems that current hugetlb_fault still wait hugetlb page under migration, how can it work without lock 2MB memory? > migration entry for hugepage. This patch adds migration_entry_wait_huge() > to separate code path between normal pages and hugepages. 
> > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 2 ++ > include/linux/swapops.h | 4 ++++ > mm/hugetlb.c | 4 ++-- > mm/migrate.c | 24 ++++++++++++++++++++++++ > 4 files changed, 32 insertions(+), 2 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 0c80d3f..40b27f6 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -43,6 +43,7 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, > #endif > > int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); > +int is_hugetlb_entry_migration(pte_t pte); > int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, > struct page **, struct vm_area_struct **, > unsigned long *, int *, int, unsigned int flags); > @@ -109,6 +110,7 @@ static inline unsigned long hugetlb_total_pages(void) > #define follow_hugetlb_page(m,v,p,vs,a,b,i,w) ({ BUG(); 0; }) > #define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) > #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) > +#define is_hugetlb_entry_migration(pte) ({ BUG(); 0; }) > #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) > static inline void hugetlb_report_meminfo(struct seq_file *m) > { > diff --git v3.8.orig/include/linux/swapops.h v3.8/include/linux/swapops.h > index 47ead51..f68efdd 100644 > --- v3.8.orig/include/linux/swapops.h > +++ v3.8/include/linux/swapops.h > @@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry) > > extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > unsigned long address); > +extern void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address); > #else > > #define make_migration_entry(page, write) swp_entry(0, 0) > @@ -148,6 +150,8 @@ static inline int is_migration_entry(swp_entry_t swp) > static inline void make_migration_entry_read(swp_entry_t *entryp) { } > static inline void 
migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > unsigned long address) { } > +static inline void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address) { } > static inline int is_write_migration_entry(swp_entry_t entry) > { > return 0; > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index 546db81..351025e 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -2313,7 +2313,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, > return -ENOMEM; > } > > -static int is_hugetlb_entry_migration(pte_t pte) > +int is_hugetlb_entry_migration(pte_t pte) > { > swp_entry_t swp; > > @@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > if (ptep) { > entry = huge_ptep_get(ptep); > if (unlikely(is_hugetlb_entry_migration(entry))) { > - migration_entry_wait(mm, (pmd_t *)ptep, address); > + migration_entry_wait_huge(mm, (pmd_t *)ptep, address); > return 0; > } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) > return VM_FAULT_HWPOISON_LARGE | > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c > index 2fd8b4a..7d84f4c 100644 > --- v3.8.orig/mm/migrate.c > +++ v3.8/mm/migrate.c > @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > pte_unmap_unlock(ptep, ptl); > } > > +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address) > +{ > + spinlock_t *ptl = pte_lockptr(mm, pmd); > + pte_t pte; > + swp_entry_t entry; > + struct page *page; > + > + spin_lock(ptl); > + pte = huge_ptep_get((pte_t *)pmd); > + if (!is_hugetlb_entry_migration(pte)) > + goto out; > + entry = pte_to_swp_entry(pte); > + page = migration_entry_to_page(entry); > + if (!get_page_unless_zero(page)) > + goto out; > + spin_unlock(ptl); > + wait_on_page_locked(page); > + put_page(page); > + return; > +out: > + spin_unlock(ptl); > +} > + > #ifdef CONFIG_BLOCK > /* Returns true if all buffers are successfully locked */ > static 
bool buffer_migrate_lock_buffers(struct buffer_head *head, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx176.postini.com [74.125.245.176]) by kanga.kvack.org (Postfix) with SMTP id 526AA6B0037 for ; Tue, 19 Mar 2013 20:31:13 -0400 (EDT) Received: by mail-da0-f51.google.com with SMTP id g27so536137dan.10 for ; Tue, 19 Mar 2013 17:31:12 -0700 (PDT) Message-ID: <5149034A.5050907@gmail.com> Date: Wed, 20 Mar 2013 08:31:06 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: Michal Hocko , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 03/19/2013 08:07 AM, Naoya Horiguchi wrote: > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: >> On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: >>> This patch extends check_range() to handle vma with VM_HUGETLB set. >>> With this changes, we can migrate hugepage with migrate_pages(2). >>> Note that for larger hugepages (covered by pud entries, 1GB for >>> x86_64 for example), we simply skip it now. 
>>> >>> Signed-off-by: Naoya Horiguchi >>> --- >>> include/linux/hugetlb.h | 6 ++++-- >>> mm/hugetlb.c | 10 ++++++++++ >>> mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ >>> 3 files changed, 48 insertions(+), 14 deletions(-) >>> >>> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h >>> index 8f87115..eb33df5 100644 >>> --- v3.8.orig/include/linux/hugetlb.h >>> +++ v3.8/include/linux/hugetlb.h >>> @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); >>> int dequeue_hwpoisoned_huge_page(struct page *page); >>> void putback_active_hugepage(struct page *page); >>> void putback_active_hugepages(struct list_head *l); >>> +void migrate_hugepage_add(struct page *page, struct list_head *list); >>> void copy_huge_page(struct page *dst, struct page *src); >>> >>> extern unsigned long hugepages_treat_as_movable; >>> @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, >>> pmd_t *pmd, int write); >>> struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, >>> pud_t *pud, int write); >>> -int pmd_huge(pmd_t pmd); >>> -int pud_huge(pud_t pmd); >>> +extern int pmd_huge(pmd_t pmd); >>> +extern int pud_huge(pud_t pmd); >> extern is not needed here. > OK. 
> >>> unsigned long hugetlb_change_protection(struct vm_area_struct *vma, >>> unsigned long address, unsigned long end, pgprot_t newprot); >>> >>> @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) >>> >>> #define putback_active_hugepage(p) 0 >>> #define putback_active_hugepages(l) 0 >>> +#define migrate_hugepage_add(p, l) 0 >>> static inline void copy_huge_page(struct page *dst, struct page *src) >>> { >>> } >>> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c >>> index cb9d43b8..86ffcb7 100644 >>> --- v3.8.orig/mm/hugetlb.c >>> +++ v3.8/mm/hugetlb.c >>> @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) >>> list_for_each_entry_safe(page, page2, l, lru) >>> putback_active_hugepage(page); >>> } >>> + >>> +void migrate_hugepage_add(struct page *page, struct list_head *list) >>> +{ >>> + VM_BUG_ON(!PageHuge(page)); >>> + get_page(page); >>> + spin_lock(&hugetlb_lock); >> Why hugetlb_lock? Comment for this lock says that it protects >> hugepage_freelists, nr_huge_pages, and free_huge_pages. > I think that this comment is out of date and hugepage_activelists, > which was introduced recently, should be protected because this > patchset adds is_hugepage_movable() which runs through the list. > So I'll update the comment in the next post. 
> >>> + list_move_tail(&page->lru, list); >>> + spin_unlock(&hugetlb_lock); >>> + return; >>> +} >>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c >>> index e2df1c1..8627135 100644 >>> --- v3.8.orig/mm/mempolicy.c >>> +++ v3.8/mm/mempolicy.c >>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, >>> return addr != end; >>> } >>> >>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, >>> + const nodemask_t *nodes, unsigned long flags, >>> + void *private) >>> +{ >>> +#ifdef CONFIG_HUGETLB_PAGE >>> + int nid; >>> + struct page *page; >>> + >>> + spin_lock(&vma->vm_mm->page_table_lock); >>> + page = pte_page(huge_ptep_get((pte_t *)pmd)); >>> + spin_unlock(&vma->vm_mm->page_table_lock); >> I am a bit confused why page_table_lock is used here and why it doesn't >> cover the page usage. > I expected this function to do the same for pmd as check_pte_range() does > for pte, but the above code didn't do it. I should've put spin_unlock > below migrate_hugepage_add(). Sorry for the confusion. I still confuse! Could you explain more in details? > >>> + nid = page_to_nid(page); >>> + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) >>> + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) >>> + || flags & MPOL_MF_MOVE_ALL)) >>> + migrate_hugepage_add(page, private); >>> +#else >>> + BUG(); >>> +#endif >>> +} >>> + >>> static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, >>> unsigned long addr, unsigned long end, >>> const nodemask_t *nodes, unsigned long flags, >>> @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, >>> pmd = pmd_offset(pud, addr); >>> do { >>> next = pmd_addr_end(addr, end); >>> + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { >> Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() >> sufficient? 
> I think we need both check here because if we use only pmd_huge(), > pmd for thp goes into this branch wrongly. > > Thanks, > Naoya > >>> + check_hugetlb_pmd_range(vma, pmd, nodes, >>> + flags, private); >>> + continue; >>> + } >>> split_huge_page_pmd(vma, addr, pmd); >>> if (pmd_none_or_trans_huge_or_clear_bad(pmd)) >>> continue; >> [...] >> -- >> Michal Hocko >> SUSE Labs > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx159.postini.com [74.125.245.159]) by kanga.kvack.org (Postfix) with SMTP id E5D506B0005 for ; Tue, 19 Mar 2013 21:03:27 -0400 (EDT) Received: by mail-pd0-f178.google.com with SMTP id u10so377695pdi.9 for ; Tue, 19 Mar 2013 18:03:26 -0700 (PDT) Message-ID: <51490AD8.9050308@gmail.com> Date: Wed, 20 Mar 2013 09:03:20 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > Currently we can't offline memory blocks which contain hugepages because > a hugepage is considered as an unmovable page. 
But now with this patch
> series, a hugepage has become movable, so by using hugepage migration we
> can offline such memory blocks.
>
> What's different from other users of hugepage migration is that we need
> to decompose all the hugepages inside the target memory block into free

For other users of hugepage migration, the hugepage should be freed to
hugepage_freelists after migration, but I don't see any code that does
this. Why not?

> buddy pages after hugepage migration, because otherwise free hugepages
> remaining in the memory block intervene the memory offlining.
> For this reason we introduce new functions dissolve_free_huge_page() and
> dissolve_free_huge_pages().
>
> Other than that, what this patch does is straightforwardly to add hugepage
> migration code, that is, adding hugepage code to the functions which scan
> over pfn and collect hugepages to be migrated, and adding a hugepage
> allocation function to alloc_migrate_target().
>
> As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
> over them because it's larger than memory block. So we now simply leave
> it to fail as it is.
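The dissolve step described in the quoted changelog can be pictured with a
small userspace model. This is only a sketch under stated assumptions:
struct mock_page, mock_dissolve_one() and mock_dissolve_range() are
hypothetical names invented for illustration, not kernel APIs, and the model
ignores locking, per-node counters and the buddy allocator. It shows the one
idea the description relies on: walk a pfn range, turn every free hugepage
into free base pages, and leave in-use hugepages alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mock_page {
	bool huge_head;		/* first base page of a hugepage */
	unsigned int order;	/* hugepage spans 1 << order base pages */
	int refcount;		/* 0 means the hugepage is free */
	bool buddy_free;	/* base page was handed back as a free page */
};

/*
 * Dissolve the free hugepage whose head sits at pfn.
 * Returns the number of base pages released, or 0 if pfn is not the
 * head of a free hugepage.
 */
static unsigned long mock_dissolve_one(struct mock_page *map, unsigned long pfn)
{
	struct mock_page *head = &map[pfn];
	unsigned long i, nr;

	if (!head->huge_head || head->refcount != 0)
		return 0;	/* not a hugepage head, or still in use */

	nr = 1UL << head->order;
	head->huge_head = false;
	for (i = 0; i < nr; i++)
		map[pfn + i].buddy_free = true;
	return nr;
}

/*
 * Walk [start, end) and dissolve every free hugepage found.
 * Returns the total number of base pages released.
 */
static unsigned long mock_dissolve_range(struct mock_page *map,
					 unsigned long start,
					 unsigned long end)
{
	unsigned long pfn = start, released = 0;

	assert(start <= end);
	while (pfn < end) {
		unsigned long nr = mock_dissolve_one(map, pfn);

		released += nr;
		/* Skip the dissolved hugepage, otherwise advance one page. */
		pfn += nr ? nr : 1;
	}
	return released;
}
```

With three order-2 hugepages at pfn 0, 4 and 8, of which only the one at
pfn 4 is in use, mock_dissolve_range(map, 0, 16) releases 8 base pages and
leaves pfn 4..7 untouched.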
> > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 8 ++++++++ > mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ > mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- > mm/migrate.c | 12 +++++++++++- > mm/page_alloc.c | 12 ++++++++++++ > mm/page_isolation.c | 5 +++++ > 6 files changed, 121 insertions(+), 10 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 86a4d78..e33f07f 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); > void putback_active_hugepage(struct page *page); > void putback_active_hugepages(struct list_head *l); > void migrate_hugepage_add(struct page *page, struct list_head *list); > +int is_hugepage_movable(struct page *page); > void copy_huge_page(struct page *dst, struct page *src); > > extern unsigned long hugepages_treat_as_movable; > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > #define putback_active_hugepage(p) 0 > #define putback_active_hugepages(l) 0 > #define migrate_hugepage_add(p, l) 0 > +#define is_hugepage_movable(x) 0 > static inline void copy_huge_page(struct page *dst, struct page *src) > { > } > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) > return h - hstates; > } > > +extern void dissolve_free_huge_page(struct page *page); > +extern void dissolve_free_huge_pages(unsigned long start_pfn, > + unsigned long end_pfn); > + > #else > struct hstate {}; > #define alloc_huge_page(v, a, r) NULL > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > } > #define hstate_index_to_shift(index) 0 > #define hstate_index(h) 0 > +#define dissolve_free_huge_page(p) 0 > +#define dissolve_free_huge_pages(s, e) 0 > #endif > > #endif /* _LINUX_HUGETLB_H */ > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index ccf9995..c28e6c9 100644 > --- 
v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > return ret; > } > > +/* Dissolve a given free hugepage into free pages. */ > +void dissolve_free_huge_page(struct page *page) > +{ > + if (PageHuge(page) && !page_count(page)) { > + struct hstate *h = page_hstate(page); > + int nid = page_to_nid(page); > + spin_lock(&hugetlb_lock); > + list_del(&page->lru); > + h->free_huge_pages--; > + h->free_huge_pages_node[nid]--; > + update_and_free_page(h, page); > + spin_unlock(&hugetlb_lock); > + } > +} > + > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > +{ > + unsigned long pfn; > + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); > + for (pfn = start_pfn; pfn < end_pfn; pfn += step) > + dissolve_free_huge_page(pfn_to_page(pfn)); > +} > + > static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > { > struct page *page; > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > return 0; > } > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ > +int is_hugepage_movable(struct page *hpage) > +{ > + struct page *page; > + struct page *tmp; > + struct hstate *h = page_hstate(hpage); > + int ret = 0; > + > + VM_BUG_ON(!PageHuge(hpage)); > + if (PageTail(hpage)) > + return 0; > + spin_lock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > + if (page == hpage) > + ret = 1; > + spin_unlock(&hugetlb_lock); > + return ret; > +} > + > /* > * This function is called from memory failure code. > * Assume the caller holds page lock of the head page. 
> diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > index d04ed87..6418de2 100644 > --- v3.8.orig/mm/memory_hotplug.c > +++ v3.8/mm/memory_hotplug.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #include > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > } > > /* > - * Scanning pfn is much easier than scanning lru list. > - * Scan pfn from start to end and Find LRU page. > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > + * and hugepages). We scan pfn because it's much easier than scanning over > + * linked list. This function returns the pfn of the first found movable > + * page if it's found, otherwise 0. > */ > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > { > unsigned long pfn; > struct page *page; > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > page = pfn_to_page(pfn); > if (PageLRU(page)) > return pfn; > + if (PageHuge(page)) { > + if (is_hugepage_movable(page)) > + return pfn; > + else > + pfn += (1 << compound_order(page)) - 1; > + } > } > } > return 0; > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > page = pfn_to_page(pfn); > if (!get_page_unless_zero(page)) > continue; > + if (PageHuge(page)) { > + /* > + * Larger hugepage (1GB for x86_64) is larger than > + * memory block, so pfn scan can start at the tail > + * page of larger hugepage. In such case, > + * we simply skip the hugepage and move the cursor > + * to the last tail page. 
> + */ > + if (PageTail(page)) { > + struct page *head = compound_head(page); > + pfn = page_to_pfn(head) + > + (1 << compound_order(head)) - 1; > + put_page(page); > + continue; > + } > + pfn = (1 << compound_order(page)) - 1; > + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { > + put_page(page); > + continue; > + } > + list_move_tail(&page->lru, &source); > + move_pages -= 1 << compound_order(page); > + continue; > + } > /* > * We can skip free pages. And we can only deal with pages on > * LRU. > @@ -1049,7 +1082,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > } > if (!list_empty(&source)) { > if (not_managed) { > - putback_lru_pages(&source); > + putback_movable_pages(&source); > goto out; > } > > @@ -1057,11 +1090,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > * alloc_migrate_target should be improooooved!! > * migrate_pages returns # of failed pages. > */ > - ret = migrate_pages(&source, alloc_migrate_target, 0, > + ret = migrate_movable_pages(&source, alloc_migrate_target, 0, > true, MIGRATE_SYNC, > MR_MEMORY_HOTPLUG); > - if (ret) > - putback_lru_pages(&source); > } > out: > return ret; > @@ -1304,8 +1335,8 @@ static int __ref __offline_pages(unsigned long start_pfn, > drain_all_pages(); > } > > - pfn = scan_lru_pages(start_pfn, end_pfn); > - if (pfn) { /* We have page on LRU */ > + pfn = scan_movable_pages(start_pfn, end_pfn); > + if (pfn) { /* We have movable pages */ > ret = do_migrate_range(pfn, end_pfn); > if (!ret) { > drain = 1; > @@ -1324,6 +1355,8 @@ static int __ref __offline_pages(unsigned long start_pfn, > yield(); > /* drain pcp pages, this is synchronous. 
*/ > drain_all_pages(); > + /* dissolve all free hugepages inside the memory block */ > + dissolve_free_huge_pages(start_pfn, end_pfn); > /* check again */ > offlined_pages = check_pages_isolated(start_pfn, end_pfn); > if (offlined_pages < 0) { > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c > index 8c457e7..a491a98 100644 > --- v3.8.orig/mm/migrate.c > +++ v3.8/mm/migrate.c > @@ -1009,8 +1009,18 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, > > unlock_page(hpage); > out: > - if (rc != -EAGAIN) > + if (rc != -EAGAIN) { > putback_active_hugepage(hpage); > + > + /* > + * After hugepage migration from memory hotplug, the original > + * hugepage should never be allocated again. This will be > + * done by dissolving it into free normal pages, because > + * we already set migratetype to MIGRATE_ISOLATE for them. > + */ > + if (offlining) > + dissolve_free_huge_page(hpage); > + } > put_page(new_hpage); > if (result) { > if (rc) > diff --git v3.8.orig/mm/page_alloc.c v3.8/mm/page_alloc.c > index 6a83cd3..c37951d 100644 > --- v3.8.orig/mm/page_alloc.c > +++ v3.8/mm/page_alloc.c > @@ -58,6 +58,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -5686,6 +5687,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > continue; > > page = pfn_to_page(check); > + > + /* > + * Hugepages are not in LRU lists, but they're movable. > + * We need not scan over tail pages bacause we don't > + * handle each tail page individually in migration. > + */ > + if (PageHuge(page)) { > + iter += (1 << compound_order(page)) - 1; > + continue; > + } > + > /* > * We can't use page_count without pin a page > * because another CPU can free compound page. 
> diff --git v3.8.orig/mm/page_isolation.c v3.8/mm/page_isolation.c > index 383bdbb..cf48ef6 100644 > --- v3.8.orig/mm/page_isolation.c > +++ v3.8/mm/page_isolation.c > @@ -6,6 +6,7 @@ > #include > #include > #include > +#include > #include "internal.h" > > int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages) > @@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, > { > gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE; > > + if (PageHuge(page)) > + return alloc_huge_page_node(page_hstate(compound_head(page)), > + numa_node_id()); > + > if (PageHighMem(page)) > gfp_mask |= __GFP_HIGHMEM; > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id 038226B0005 for ; Tue, 19 Mar 2013 23:55:46 -0400 (EDT) Date: Tue, 19 Mar 2013 23:55:33 -0400 From: Naoya Horiguchi Message-ID: <1363751733-1fg9kic6-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318160737.GU10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318160737.GU10192@dhcp22.suse.cz> Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 05:07:37PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote: > > Currently we can't offline memory blocks which contain hugepages because > 
> a hugepage is considered as an unmovable page. But now with this patch > > series, a hugepage has become movable, so by using hugepage migration we > > can offline such memory blocks. > > > > What's different from other users of hugepage migration is that we need > > to decompose all the hugepages inside the target memory block into free > > buddy pages after hugepage migration, because otherwise free hugepages > > remaining in the memory block intervene the memory offlining. > > For this reason we introduce new functions dissolve_free_huge_page() and > > dissolve_free_huge_pages(). > > > > Other than that, what this patch does is straightforwardly to add hugepage > > migration code, that is, adding hugepage code to the functions which scan > > over pfn and collect hugepages to be migrated, and adding a hugepage > > allocation function to alloc_migrate_target(). > > > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > > over them because it's larger than memory block. So we now simply leave > > it to fail as it is. > > What we could do is to check whether there is a free gb huge page on > other node and migrate there. Correct, and 1GB page migration needs more code in migration core code (mainly it's related to migration entry in pud) and enough testing, so I want to do it in separate patchset. 
> > Signed-off-by: Naoya Horiguchi > > --- > > include/linux/hugetlb.h | 8 ++++++++ > > mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ > > mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- > > mm/migrate.c | 12 +++++++++++- > > mm/page_alloc.c | 12 ++++++++++++ > > mm/page_isolation.c | 5 +++++ > > 6 files changed, 121 insertions(+), 10 deletions(-) > > > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > > index 86a4d78..e33f07f 100644 > > --- v3.8.orig/include/linux/hugetlb.h > > +++ v3.8/include/linux/hugetlb.h > > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); > > void putback_active_hugepage(struct page *page); > > void putback_active_hugepages(struct list_head *l); > > void migrate_hugepage_add(struct page *page, struct list_head *list); > > +int is_hugepage_movable(struct page *page); > > void copy_huge_page(struct page *dst, struct page *src); > > > > extern unsigned long hugepages_treat_as_movable; > > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > > #define putback_active_hugepage(p) 0 > > #define putback_active_hugepages(l) 0 > > #define migrate_hugepage_add(p, l) 0 > > +#define is_hugepage_movable(x) 0 > > static inline void copy_huge_page(struct page *dst, struct page *src) > > { > > } > > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) > > return h - hstates; > > } > > > > +extern void dissolve_free_huge_page(struct page *page); > > +extern void dissolve_free_huge_pages(unsigned long start_pfn, > > + unsigned long end_pfn); > > + > > #else > > struct hstate {}; > > #define alloc_huge_page(v, a, r) NULL > > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > > } > > #define hstate_index_to_shift(index) 0 > > #define hstate_index(h) 0 > > +#define dissolve_free_huge_page(p) 0 > > +#define dissolve_free_huge_pages(s, e) 0 > > #endif > > > > #endif /* _LINUX_HUGETLB_H */ 
> > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > > index ccf9995..c28e6c9 100644 > > --- v3.8.orig/mm/hugetlb.c > > +++ v3.8/mm/hugetlb.c > > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > return ret; > > } > > > > +/* Dissolve a given free hugepage into free pages. */ > > +void dissolve_free_huge_page(struct page *page) > > +{ > > + if (PageHuge(page) && !page_count(page)) { > > Could you clarify why you are cheking page_count here? I assume it is to > make sure the page is free but what prevents it being increased before > you take hugetlb_lock? There's nothing to prevent it, so it's not safe to check refcount outside hugetlb_lock. > > + struct hstate *h = page_hstate(page); > > + int nid = page_to_nid(page); > > + spin_lock(&hugetlb_lock); > > + list_del(&page->lru); > > + h->free_huge_pages--; > > + h->free_huge_pages_node[nid]--; > > + update_and_free_page(h, page); > > + spin_unlock(&hugetlb_lock); > > + } > > +} > > + > > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ > > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > > +{ > > + unsigned long pfn; > > + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); > > hugetlb pages could be present in different sizes so this doesn't work > in general. You need to to get order from page_hstate. OK. > > + for (pfn = start_pfn; pfn < end_pfn; pfn += step) > > + dissolve_free_huge_page(pfn_to_page(pfn)); > > +} > > + > > static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > > { > > struct page *page; > > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > > return 0; > > } > > > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. 
*/ > > +int is_hugepage_movable(struct page *hpage) > > +{ > > + struct page *page; > > + struct page *tmp; > > + struct hstate *h = page_hstate(hpage); > > + int ret = 0; > > + > > + VM_BUG_ON(!PageHuge(hpage)); > > + if (PageTail(hpage)) > > + return 0; > > + spin_lock(&hugetlb_lock); > > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > > + if (page == hpage) > > + ret = 1; > > + spin_unlock(&hugetlb_lock); > > + return ret; > > +} > > + > > /* > > * This function is called from memory failure code. > > * Assume the caller holds page lock of the head page. > > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > > index d04ed87..6418de2 100644 > > --- v3.8.orig/mm/memory_hotplug.c > > +++ v3.8/mm/memory_hotplug.c > > @@ -29,6 +29,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > > } > > > > /* > > - * Scanning pfn is much easier than scanning lru list. > > - * Scan pfn from start to end and Find LRU page. > > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > > + * and hugepages). We scan pfn because it's much easier than scanning over > > + * linked list. This function returns the pfn of the first found movable > > + * page if it's found, otherwise 0. 
> > */ > > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > > { > > unsigned long pfn; > > struct page *page; > > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > page = pfn_to_page(pfn); > > if (PageLRU(page)) > > return pfn; > > + if (PageHuge(page)) { > > + if (is_hugepage_movable(page)) > > + return pfn; > > + else > > + pfn += (1 << compound_order(page)) - 1; > > + } > > scan_lru_pages's name gets really confusing after this change because > hugetlb pages are not on the LRU. Maybe it would be good to rename it. Yes, and that's done in right above chunk. > > > } > > } > > return 0; > > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > > page = pfn_to_page(pfn); > > if (!get_page_unless_zero(page)) > > continue; > > All tail pages have 0 reference count (according to prep_compound_page) > so they would be skipped anyway. This makes the below pfn tweaks > pointless. I was totally mistaken about what we should do here, sorry. If we call do_migrate_range() for 1GB hugepage, we should return with error (maybe -EBUSY) instead of just skipping it, otherwise the caller __offline_pages() repeats 'goto repeat' until timeout. In order to do that, we had better insert if(PageHuge) block before getting refcount. And ... > > + if (PageHuge(page)) { > > + /* > > + * Larger hugepage (1GB for x86_64) is larger than > > + * memory block, so pfn scan can start at the tail > > + * page of larger hugepage. In such case, > > + * we simply skip the hugepage and move the cursor > > + * to the last tail page. 
> > +			 */
> > +			if (PageTail(page)) {
> > +				struct page *head = compound_head(page);
> > +				pfn = page_to_pfn(head) +
> > +					(1 << compound_order(head)) - 1;
> > +				put_page(page);
> > +				continue;
> > +			}
> > +			pfn = (1 << compound_order(page)) - 1;
> > +			if (huge_page_size(page_hstate(page)) != PMD_SIZE) {
> > +				put_page(page);
> > +				continue;
> > +			}
>
> There might be other hugepage sizes which fit into memblock so this test
> doesn't seem right.

yes, so compound_order(head) > PFN_SECTION_SHIFT would be better.
I'll replace this chunk with the following if I don't get any other
suggestion.

@@ -1017,6 +1026,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
+
+		if (PageHuge(page)) {
+			struct page *head = compound_head(page);
+			pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1;
+			if (compound_order(head) > PFN_SECTION_SHIFT) {
+				ret = -EBUSY;
+				break;
+			}
+			if (!get_page_unless_zero(page))
+				continue;
+			list_move_tail(&head->lru, &source);
+			move_pages -= 1 << compound_order(head);
+			continue;
+		}
+
 		if (!get_page_unless_zero(page))
 			continue;
 		/*

Thanks,
Naoya

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx193.postini.com [74.125.245.193])
	by kanga.kvack.org (Postfix) with SMTP id 05C0D6B0002
	for ; Wed, 20 Mar 2013 02:13:08 -0400 (EDT)
Date: Wed, 20 Mar 2013 02:12:54 -0400
From: Naoya Horiguchi
Message-ID: <1363759974-38t0k25g-mutt-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <20130319071113.GD5112@dhcp22.suse.cz>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com>
	<1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com>
	<20130318154057.GS10192@dhcp22.suse.cz>
	<1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com>
	<20130319071113.GD5112@dhcp22.suse.cz>
Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org

On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote:
> On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote:
> > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
...
> > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > > >  	pmd = pmd_offset(pud, addr);
> > > >  	do {
> > > >  		next = pmd_addr_end(addr, end);
> > > > +		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> > >
> > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> > > sufficient?
> >
> > I think we need both check here because if we use only pmd_huge(),
> > pmd for thp goes into this branch wrongly.
>
> Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it
> obviously checks only _PAGE_PSE same as pmd_large() which is really
> unfortunate and confusing. Can we make it hugetlb specific?
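The point made in the quoted exchange, that a large-page bit alone cannot
tell a hugetlbfs mapping from a transparent hugepage, can be shown with a
small userspace model. All names below (mock_pmd, mock_vma, MOCK_PAGE_PSE,
MOCK_VM_HUGETLB and the mock_* helpers) are hypothetical stand-ins rather
than kernel definitions, and the bit values are arbitrary. The sketch
assumes, as the thread states, that the pmd_huge()-style test looks only at
the large-page bit, so the vma has to be consulted as well.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not kernel definitions, bit values arbitrary. */
#define MOCK_PAGE_PSE	(1UL << 7)	/* "large page" bit in a pmd */
#define MOCK_VM_HUGETLB	(1UL << 0)	/* vma belongs to hugetlbfs */

struct mock_pmd { unsigned long val; };
struct mock_vma { unsigned long flags; };

/* Models a pmd_huge()-style test: only the large-page bit is checked. */
static bool mock_pmd_huge(struct mock_pmd pmd)
{
	return (pmd.val & MOCK_PAGE_PSE) != 0;
}

/* Models an is_vm_hugetlb_page()-style test on the covering vma. */
static bool mock_is_vm_hugetlb_page(const struct mock_vma *vma)
{
	return (vma->flags & MOCK_VM_HUGETLB) != 0;
}

/*
 * Combined test in the spirit of the patch: a large pmd counts as
 * hugetlbfs only when the vma says so. A THP pmd sets the same
 * large-page bit but lives in a non-hugetlbfs vma, so it is rejected.
 */
static bool mock_pmd_is_hugetlb(struct mock_pmd pmd,
				const struct mock_vma *vma)
{
	return mock_pmd_huge(pmd) && mock_is_vm_hugetlb_page(vma);
}
```

In this model a large pmd in a vma without MOCK_VM_HUGETLB passes
mock_pmd_huge() but fails mock_pmd_is_hugetlb(), which is the THP case the
reply is worried about.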
I agree that we had better fix this confusion. What pmd_huge() (or pmd_large() in some architectures) does is just checking whether a given pmd is pointing to huge/large page or not. It does not say which type of hugepage it is. So it shouldn't be used to decide whether the hugepage are hugetlbfs or not. I think it would be better to introduce pmd_hugetlb() which has pmd and vma as arguments and returns true only for hugetlbfs pmd. Checking pmd_hugetlb() should come before checking pmd_trans_huge() because pmd_trans_huge() implicitly assumes that the vma which covers the virtual address of a given pmd is not hugetlbfs vma. I'm interested in this cleanup, so will work on it after this patchset. Thanks, Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx166.postini.com [74.125.245.166]) by kanga.kvack.org (Postfix) with SMTP id 68E076B0027 for ; Wed, 20 Mar 2013 03:41:22 -0400 (EDT) Date: Wed, 20 Mar 2013 08:41:18 +0100 From: Michal Hocko Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Message-ID: <20130320074118.GB20045@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> <20130319071113.GD5112@dhcp22.suse.cz> <1363759974-38t0k25g-mutt-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1363759974-38t0k25g-mutt-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Wed 20-03-13 
02:12:54, Naoya Horiguchi wrote: > On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote: > > On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote: > > > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: > > > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: > ... > > > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > > > > pmd = pmd_offset(pud, addr); > > > > > do { > > > > > next = pmd_addr_end(addr, end); > > > > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { > > > > > > > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() > > > > sufficient? > > > > > > I think we need both check here because if we use only pmd_huge(), > > > pmd for thp goes into this branch wrongly. > > > > Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it > > obviously checks only _PAGE_PSE same as pmd_large() which is really > > unfortunate and confusing. Can we make it hugetlb specific? > > I agree that we had better fix this confusion. > > What pmd_huge() (or pmd_large() in some architectures) does is just > checking whether a given pmd is pointing to huge/large page or not. > It does not say which type of hugepage it is. > So it shouldn't be used to decide whether the hugepage are hugetlbfs or not. > I think it would be better to introduce pmd_hugetlb() which has pmd and vma > as arguments and returns true only for hugetlbfs pmd. > Checking pmd_hugetlb() should come before checking pmd_trans_huge() because > pmd_trans_huge() implicitly assumes that the vma which covers the virtual > address of a given pmd is not hugetlbfs vma. > > I'm interested in this cleanup, so will work on it after this patchset. pmd_huge is used in only a few places, so the cleanup shouldn't be very big. On the other hand, you do not always have the vma available, so it gets tricky. Thanks -- Michal Hocko SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx149.postini.com [74.125.245.149]) by kanga.kvack.org (Postfix) with SMTP id E70926B0002 for ; Wed, 20 Mar 2013 03:57:41 -0400 (EDT) Date: Wed, 20 Mar 2013 08:57:36 +0100 From: Michal Hocko Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Message-ID: <20130320075736.GC20045@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318160737.GU10192@dhcp22.suse.cz> <1363751733-1fg9kic6-mutt-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1363751733-1fg9kic6-mutt-n-horiguchi@ah.jp.nec.com> Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Tue 19-03-13 23:55:33, Naoya Horiguchi wrote: > On Mon, Mar 18, 2013 at 05:07:37PM +0100, Michal Hocko wrote: > > On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote: [...] > > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > > > over them because it's larger than memory block. So we now simply leave > > > it to fail as it is. > > > > What we could do is to check whether there is a free gb huge page on > > other node and migrate there. > > Correct, and 1GB page migration needs more code in migration core code > (mainly it's related to migration entry in pud) and enough testing, > so I want to do it in separate patchset. Sure, this was just a note that it is achievable not that it has to be done in the patchset. [...]
> > > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > > > index ccf9995..c28e6c9 100644 > > > --- v3.8.orig/mm/hugetlb.c > > > +++ v3.8/mm/hugetlb.c > > > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > > return ret; > > > } > > > > > > +/* Dissolve a given free hugepage into free pages. */ > > > +void dissolve_free_huge_page(struct page *page) > > > +{ > > > + if (PageHuge(page) && !page_count(page)) { > > > > Could you clarify why you are cheking page_count here? I assume it is to > > make sure the page is free but what prevents it being increased before > > you take hugetlb_lock? > > There's nothing to prevent it, so it's not safe to check refcount outside > hugetlb_lock. OK, so the lock has to be moved up. [...] > > > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > > > index d04ed87..6418de2 100644 > > > --- v3.8.orig/mm/memory_hotplug.c > > > +++ v3.8/mm/memory_hotplug.c > > > @@ -29,6 +29,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > > > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > > > } > > > > > > /* > > > - * Scanning pfn is much easier than scanning lru list. > > > - * Scan pfn from start to end and Find LRU page. > > > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > > > + * and hugepages). We scan pfn because it's much easier than scanning over > > > + * linked list. This function returns the pfn of the first found movable > > > + * page if it's found, otherwise 0. 
> > > */ > > > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > > > { > > > unsigned long pfn; > > > struct page *page; > > > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > > page = pfn_to_page(pfn); > > > if (PageLRU(page)) > > > return pfn; > > > + if (PageHuge(page)) { > > > + if (is_hugepage_movable(page)) > > > + return pfn; > > > + else > > > + pfn += (1 << compound_order(page)) - 1; > > > + } > > > > scan_lru_pages's name gets really confusing after this change because > > hugetlb pages are not on the LRU. Maybe it would be good to rename it. > > Yes, and that's done in right above chunk. bahh, I am blind. I got confused by the name in the hunk header. Sorry about that. > > > > > > } > > > } > > > return 0; > > > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > > > page = pfn_to_page(pfn); > > > if (!get_page_unless_zero(page)) > > > continue; > > > > All tail pages have 0 reference count (according to prep_compound_page) > > so they would be skipped anyway. This makes the below pfn tweaks > > pointless. > > I was totally mistaken about what we should do here, sorry. If we call > do_migrate_range() for 1GB hugepage, we should return with error (maybe -EBUSY) > instead of just skipping it, otherwise the caller __offline_pages() repeats > 'goto repeat' until timeout. In order to do that, we had better insert > if(PageHuge) block before getting refcount. And ... > > > > + if (PageHuge(page)) { > > > + /* > > > + * Larger hugepage (1GB for x86_64) is larger than > > > + * memory block, so pfn scan can start at the tail > > > + * page of larger hugepage. In such case, > > > + * we simply skip the hugepage and move the cursor > > > + * to the last tail page. 
> > > + */ > > > + if (PageTail(page)) { > > > + struct page *head = compound_head(page); > > > + pfn = page_to_pfn(head) + > > > + (1 << compound_order(head)) - 1; > > > + put_page(page); > > > + continue; > > > + } > > > + pfn = (1 << compound_order(page)) - 1; > > > + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { > > > + put_page(page); > > > + continue; > > > + } > > > > There might be other hugepage sizes which fit into memblock so this test > > doesn't seem right. > > yes, so compound_order(head) > PFN_SECTION_SHIFT would be better. I would rather see compound_order(head) < MAX_ORDER to be more coupled with the allocator. > I'll replace this chunk with the following if I don't get any other > suggestion. > > @@ -1017,6 +1026,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > if (!pfn_valid(pfn)) > continue; > page = pfn_to_page(pfn); > + > + if (PageHuge(page)) { > + struct page *head = compound_head(page); > + pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1; I do not think this is safe without an elevated ref count. Your page might be on the way to be freed. So you need to put get_page_unless_zero before compound_order check. Besides that I do not see too much point in optimizing this path on the code complexity behalf. Sure we would call get_page_unless_zero pointlessly for all tail pages but this is hardly a hot path. > + if (compound_order(head) > PFN_SECTION_SHIFT) { > + ret = -EBUSY; > + break; > + } > + if (!get_page_unless_zero(page)) Should be head. > + continue; > + list_move_tail(&head->lru, &source); > + move_pages -= 1 << compound_order(head); > + continue; > + } > + > if (!get_page_unless_zero(page)) > continue; > /* > > Thanks, > Naoya > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
-- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id 582126B0002 for ; Wed, 20 Mar 2013 17:35:43 -0400 (EDT) Date: Wed, 20 Mar 2013 17:35:26 -0400 From: Naoya Horiguchi Message-ID: <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <5148F830.3070601@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 07:43:44AM +0800, Simon Jeons wrote: ... > >Easy patch access: > > git@github.com:Naoya-Horiguchi/linux.git > > branch:extend_hugepage_migration > > > >Test code: > > git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git > > git clone > git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git > Cloning into test_hugepage_migration_extension... > Permission denied (publickey). > fatal: The remote end hung up unexpectedly Sorry, wrong url. git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git or https://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git should work. Thanks, Naoya
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx109.postini.com [74.125.245.109]) by kanga.kvack.org (Postfix) with SMTP id 70DE56B0002 for ; Wed, 20 Mar 2013 17:53:35 -0400 (EDT) Date: Wed, 20 Mar 2013 17:53:19 -0400 From: Naoya Horiguchi Message-ID: <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <5148FB6C.4070202@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <5148FB6C.4070202@gmail.com> Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote: > Hi Naoya, > On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > >When we have a page fault for the address which is backed by a hugepage > >under migration, the kernel can't wait correctly until the migration > >finishes. This is because pte_offset_map_lock() can't get a correct > > It seems that current hugetlb_fault still wait hugetlb page under > migration, how can it work without lock 2MB memory? hugetlb_fault() does call migration_entry_wait(), but that call returns immediately. So the page fault happens over and over again until the migration completes. IOW, migration_entry_wait() is currently broken for hugepages and doesn't work as expected. Thanks, Naoya
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx143.postini.com [74.125.245.143]) by kanga.kvack.org (Postfix) with SMTP id 689E96B0036 for ; Wed, 20 Mar 2013 18:00:18 -0400 (EDT) Date: Wed, 20 Mar 2013 17:59:53 -0400 From: Naoya Horiguchi Message-ID: <1363816793-7eq6pu0l-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <5149034A.5050907@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> <5149034A.5050907@gmail.com> Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: Michal Hocko , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote: ... > >>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > >>> index e2df1c1..8627135 100644 > >>> --- v3.8.orig/mm/mempolicy.c > >>> +++ v3.8/mm/mempolicy.c > >>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > >>> return addr != end; > >>> } > >>> > >>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > >>> + const nodemask_t *nodes, unsigned long flags, > >>> + void *private) > >>> +{ > >>> +#ifdef CONFIG_HUGETLB_PAGE > >>> + int nid; > >>> + struct page *page; > >>> + > >>> + spin_lock(&vma->vm_mm->page_table_lock); > >>> + page = pte_page(huge_ptep_get((pte_t *)pmd)); > >>> + spin_unlock(&vma->vm_mm->page_table_lock); > >> I am a bit confused why page_table_lock is used here and why it doesn't > >> cover the page usage. 
> > I expected this function to do the same for pmd as check_pte_range() does > > for pte, but the above code didn't do it. I should've put spin_unlock > > below migrate_hugepage_add(). Sorry for the confusion. > > I still confuse! Could you explain more in details? With the above code, check_hugetlb_pmd_range() checks page_mapcount outside the page table lock, but the mapcount can be decremented concurrently by __unmap_hugepage_range(), so there's a race. __unmap_hugepage_range() calls page_remove_rmap() inside the page table lock, so we can avoid this race by doing all of check_hugetlb_pmd_range()'s work inside the page table lock. Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115]) by kanga.kvack.org (Postfix) with SMTP id 5A3A26B0039 for ; Wed, 20 Mar 2013 18:06:00 -0400 (EDT) Date: Wed, 20 Mar 2013 18:05:48 -0400 From: Naoya Horiguchi Message-ID: <1363817148-rlt5mp5n-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <51490AD8.9050308@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <51490AD8.9050308@gmail.com> Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 09:03:20AM +0800, Simon Jeons wrote: > Hi Naoya, > On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > >Currently we can't offline memory blocks which contain hugepages because > >a hugepage is considered as an
unmovable page. But now with this patch > >series, a hugepage has become movable, so by using hugepage migration we > >can offline such memory blocks. > > > >What's different from other users of hugepage migration is that we need > >to decompose all the hugepages inside the target memory block into free > > For other hugepage migration users, hugepage should be freed to > hugepage_freelists after migration, but why I don't see any codes do > this? The source hugepages which are migrated by NUMA related system calls (migrate_pages(2), move_pages(2), and mbind(2)) are still usable, so we simply free them back into the free hugepage pool. OTOH, the source hugepages migrated by memory hotremove should not be reusable, because users of memory hotremove want to remove the memory from the system. So we need to forcibly free such hugepages into buddy pages, otherwise memory offlining doesn't work. Thanks, Naoya
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx121.postini.com [74.125.245.121]) by kanga.kvack.org (Postfix) with SMTP id 8F01A6B0002 for ; Wed, 20 Mar 2013 19:37:00 -0400 (EDT) Received: by mail-pb0-f49.google.com with SMTP id xa12so1755521pbc.36 for ; Wed, 20 Mar 2013 16:36:59 -0700 (PDT) Message-ID: <514A4815.4040206@gmail.com> Date: Thu, 21 Mar 2013 07:36:53 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <5148FB6C.4070202@gmail.com> <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 03/21/2013 05:53 AM, Naoya Horiguchi wrote: > On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote: >> Hi Naoya, >> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: >>> When we have a page fault for the address which is backed by a hugepage >>> under migration, the kernel can't wait correctly until the migration >>> finishes. This is because pte_offset_map_lock() can't get a correct >> It seems that current hugetlb_fault still wait hugetlb page under >> migration, how can it work without lock 2MB memory? > Hugetlb_fault() does call migration_entry_wait(), but returns immediately. Could you point out to me which code in function migration_entry_wait() lead to return immediately? > So page fault happens over and over again until the migration completes. > IOW, migration_entry_wait() is now broken for hugepage and doesn't work > as expected. 
> > Thanks, > Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx133.postini.com [74.125.245.133]) by kanga.kvack.org (Postfix) with SMTP id E70B36B0002 for ; Wed, 20 Mar 2013 19:49:54 -0400 (EDT) Received: by mail-pd0-f180.google.com with SMTP id g10so817036pdj.25 for ; Wed, 20 Mar 2013 16:49:54 -0700 (PDT) Message-ID: <514A4B1C.6020201@gmail.com> Date: Thu, 21 Mar 2013 07:49:48 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 03/21/2013 05:35 AM, Naoya Horiguchi wrote: > On Wed, Mar 20, 2013 at 07:43:44AM +0800, Simon Jeons wrote: > ... >>> Easy patch access: >>> git@github.com:Naoya-Horiguchi/linux.git >>> branch:extend_hugepage_migration >>> >>> Test code: >>> git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git >> git clone >> git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git >> Cloning into test_hugepage_migration_extension... >> Permission denied (publickey). >> fatal: The remote end hung up unexpectedly > Sorry, wrong url. > git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git > or > https://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git > should work. 
When I hacking arch/x86/mm/hugetlbpage.c like this, diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index ae1aa71..87f34ee 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ -#ifdef CONFIG_X86_64 static __init int setup_hugepagesz(char *opt) { unsigned long ps = memparse(opt, &opt); if (ps == PMD_SIZE) { hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); - } else if (ps == PUD_SIZE && cpu_has_gbpages) { - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); + } else if (ps == PUD_SIZE) { + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); } else { printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", ps >> 20); I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. What's the difference between these pages which I hacking and normal huge pages? > > Thanks, > Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx134.postini.com [74.125.245.134]) by kanga.kvack.org (Postfix) with SMTP id ECB456B0005 for ; Wed, 20 Mar 2013 19:55:34 -0400 (EDT) Received: by mail-da0-f46.google.com with SMTP id y19so1269330dan.5 for ; Wed, 20 Mar 2013 16:55:34 -0700 (PDT) Message-ID: <514A4C70.2020303@gmail.com> Date: Thu, 21 Mar 2013 07:55:28 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <51490AD8.9050308@gmail.com> <1363817148-rlt5mp5n-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363817148-rlt5mp5n-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 03/21/2013 06:05 AM, Naoya Horiguchi wrote: > On Wed, Mar 20, 2013 at 09:03:20AM +0800, Simon Jeons wrote: >> Hi Naoya, >> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: >>> Currently we can't offline memory blocks which contain hugepages because >>> a hugepage is considered as an unmovable page. But now with this patch >>> series, a hugepage has become movable, so by using hugepage migration we >>> can offline such memory blocks. >>> >>> What's different from other users of hugepage migration is that we need >>> to decompose all the hugepages inside the target memory block into free >> For other hugepage migration users, hugepage should be freed to >> hugepage_freelists after migration, but why I don't see any codes do >> this? 
> The source hugepages which are migrated by NUMA related system calls > (migrate_pages(2), move_pages(2), and mbind(2)) are still useable, > so we simply free them into free hugepage pool. It seems that you misunderstand why I confuse. I can't find where free huge pages to hugepage pool, could you point out to me? > OTOH, the source hugepages migrated by memory hotremove should not be > reusable, because users of memory hotremove want to remove the memory > from the system. So we need to free such hugepages forcibly into the > buddy pages, otherwise memory offining doesn't work. > > Thanks, > Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx177.postini.com [74.125.245.177]) by kanga.kvack.org (Postfix) with SMTP id 2DE166B0006 for ; Wed, 20 Mar 2013 20:06:13 -0400 (EDT) Received: by mail-ie0-f181.google.com with SMTP id 17so2870528iea.12 for ; Wed, 20 Mar 2013 17:06:12 -0700 (PDT) Message-ID: <514A4EEE.1080405@gmail.com> Date: Thu, 21 Mar 2013 08:06:06 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> <5149034A.5050907@gmail.com> <1363816793-7eq6pu0l-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363816793-7eq6pu0l-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: Michal Hocko , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Naoya, On 
03/21/2013 05:59 AM, Naoya Horiguchi wrote: > On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote: > ... >>>>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c >>>>> index e2df1c1..8627135 100644 >>>>> --- v3.8.orig/mm/mempolicy.c >>>>> +++ v3.8/mm/mempolicy.c >>>>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, >>>>> return addr != end; >>>>> } >>>>> >>>>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, >>>>> + const nodemask_t *nodes, unsigned long flags, >>>>> + void *private) >>>>> +{ >>>>> +#ifdef CONFIG_HUGETLB_PAGE >>>>> + int nid; >>>>> + struct page *page; >>>>> + >>>>> + spin_lock(&vma->vm_mm->page_table_lock); >>>>> + page = pte_page(huge_ptep_get((pte_t *)pmd)); >>>>> + spin_unlock(&vma->vm_mm->page_table_lock); >>>> I am a bit confused why page_table_lock is used here and why it doesn't >>>> cover the page usage. >>> I expected this function to do the same for pmd as check_pte_range() does >>> for pte, but the above code didn't do it. I should've put spin_unlock >>> below migrate_hugepage_add(). Sorry for the confusion. >> I still confuse! Could you explain more in details? > With the above code, check_hugetlb_pmd_range() checks page_mapcount > outside the page table lock, but mapcount can be decremented by > __unmap_hugepage_range(), so there's a race. > __unmap_hugepage_range() calls page_remove_rmap() inside page table lock, > so we can avoid this race by doing whole check_hugetlb_pmd_range()'s work > inside the page table lock. Why you use page_table_lock instead of split ptlock to protect 2MB? > > Thanks, > Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx125.postini.com [74.125.245.125]) by kanga.kvack.org (Postfix) with SMTP id 35E2B6B0002 for ; Thu, 21 Mar 2013 08:56:33 -0400 (EDT) Date: Thu, 21 Mar 2013 13:56:28 +0100 From: Michal Hocko Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Message-ID: <20130321125628.GB6051@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <514A4B1C.6020201@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: Naoya Horiguchi , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org On Thu 21-03-13 07:49:48, Simon Jeons wrote: [...] > When I hacking arch/x86/mm/hugetlbpage.c like this, > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > index ae1aa71..87f34ee 100644 > --- a/arch/x86/mm/hugetlbpage.c > +++ b/arch/x86/mm/hugetlbpage.c > @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, > unsigned long addr, > > #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ > > -#ifdef CONFIG_X86_64 > static __init int setup_hugepagesz(char *opt) > { > unsigned long ps = memparse(opt, &opt); > if (ps == PMD_SIZE) { > hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); > - } else if (ps == PUD_SIZE && cpu_has_gbpages) { > - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); > + } else if (ps == PUD_SIZE) { > + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); > } else { > printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", > ps >> 20); > > I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. > What's the difference between these pages which I hacking and normal > huge pages? 
How is this related to the patch set? Please _stop_ distracting discussion to unrelated topics! Nothing personal but this is just wasting our time. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx193.postini.com [74.125.245.193]) by kanga.kvack.org (Postfix) with SMTP id ABFDF6B0039 for ; Thu, 21 Mar 2013 19:46:39 -0400 (EDT) Received: by mail-ie0-f176.google.com with SMTP id x14so4117713ief.7 for ; Thu, 21 Mar 2013 16:46:39 -0700 (PDT) Message-ID: <514B9BD8.9050207@gmail.com> Date: Fri, 22 Mar 2013 07:46:32 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> In-Reply-To: <20130321125628.GB6051@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Naoya Horiguchi , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Hi Michal, On 03/21/2013 08:56 PM, Michal Hocko wrote: > On Thu 21-03-13 07:49:48, Simon Jeons wrote: > [...] 
>> When I hacking arch/x86/mm/hugetlbpage.c like this, >> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >> index ae1aa71..87f34ee 100644 >> --- a/arch/x86/mm/hugetlbpage.c >> +++ b/arch/x86/mm/hugetlbpage.c >> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, >> unsigned long addr, >> >> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ >> >> -#ifdef CONFIG_X86_64 >> static __init int setup_hugepagesz(char *opt) >> { >> unsigned long ps = memparse(opt, &opt); >> if (ps == PMD_SIZE) { >> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); >> - } else if (ps == PUD_SIZE && cpu_has_gbpages) { >> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); >> + } else if (ps == PUD_SIZE) { >> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); >> } else { >> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", >> ps >> 20); >> >> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. >> What's the difference between these pages which I hacking and normal >> huge pages? > How is this related to the patch set? > Please _stop_ distracting discussion to unrelated topics! > > Nothing personal but this is just wasting our time. Sorry kindly Michal, my bad. Btw, could you explain this question for me? very sorry waste your time. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx113.postini.com [74.125.245.113]) by kanga.kvack.org (Postfix) with SMTP id CF26E6B0005 for ; Thu, 4 Apr 2013 00:57:38 -0400 (EDT) Received: by mail-ob0-f174.google.com with SMTP id 16so2163051obc.5 for ; Wed, 03 Apr 2013 21:57:38 -0700 (PDT) Message-ID: <515D083A.4010704@gmail.com> Date: Thu, 04 Apr 2013 12:57:30 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <5148FB6C.4070202@gmail.com> <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> <514A4815.4040206@gmail.com> In-Reply-To: <514A4815.4040206@gmail.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Ping! On 03/21/2013 07:36 AM, Simon Jeons wrote: > Hi Naoya, > On 03/21/2013 05:53 AM, Naoya Horiguchi wrote: >> On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote: >>> Hi Naoya, >>> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: >>>> When we have a page fault for the address which is backed by a hugepage >>>> under migration, the kernel can't wait correctly until the migration >>>> finishes. This is because pte_offset_map_lock() can't get a correct >>> It seems that current hugetlb_fault still wait hugetlb page under >>> migration, how can it work without lock 2MB memory? >> Hugetlb_fault() does call migration_entry_wait(), but returns immediately. > Could you point out to me which code in function migration_entry_wait() > lead to return immediately? > >> So page fault happens over and over again until the migration completes. 
>> IOW, migration_entry_wait() is now broken for hugepage and doesn't work >> as expected. >> >> Thanks, >> Naoya -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx110.postini.com [74.125.245.110]) by kanga.kvack.org (Postfix) with SMTP id 87FA16B0027 for ; Thu, 4 Apr 2013 21:15:05 -0400 (EDT) Received: by mail-ob0-f169.google.com with SMTP id wp18so2573577obc.0 for ; Thu, 04 Apr 2013 18:15:04 -0700 (PDT) Message-ID: <515E2592.7020607@gmail.com> Date: Fri, 05 Apr 2013 09:14:58 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> In-Reply-To: <20130322081532.GC31457@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Hi Michal, On 03/22/2013 04:15 PM, Michal Hocko wrote: > [getting off-list] > > On Fri 22-03-13 07:46:32, Simon Jeons wrote: >> Hi Michal, >> On 03/21/2013 08:56 PM, Michal Hocko wrote: >>> On Thu 21-03-13 07:49:48, Simon Jeons wrote: >>> [...] 
>>>> When I hacking arch/x86/mm/hugetlbpage.c like this, >>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >>>> index ae1aa71..87f34ee 100644 >>>> --- a/arch/x86/mm/hugetlbpage.c >>>> +++ b/arch/x86/mm/hugetlbpage.c >>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, >>>> unsigned long addr, >>>> >>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ >>>> >>>> -#ifdef CONFIG_X86_64 >>>> static __init int setup_hugepagesz(char *opt) >>>> { >>>> unsigned long ps = memparse(opt, &opt); >>>> if (ps == PMD_SIZE) { >>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); >>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) { >>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); >>>> + } else if (ps == PUD_SIZE) { >>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); >>>> } else { >>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", >>>> ps >> 20); >>>> >>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. >>>> What's the difference between these pages which I hacking and normal >>>> huge pages? >>> How is this related to the patch set? >>> Please _stop_ distracting discussion to unrelated topics! >>> >>> Nothing personal but this is just wasting our time. >> Sorry kindly Michal, my bad. >> Btw, could you explain this question for me? very sorry waste your time. > Your CPU has to support GB pages. You have removed cpu_has_gbpages test > and added a hstate for order 13 pages which is a weird number on its > own (32MB) because there is no page table level to support them. But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, and have equal number of 32MB huge pages which I set up in boot parameter. If there is no page table level to support them, how can them present? I can hacking this successfully in ubuntu, but not in fedora. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx141.postini.com [74.125.245.141]) by kanga.kvack.org (Postfix) with SMTP id E1DBE6B0027 for ; Fri, 5 Apr 2013 04:08:31 -0400 (EDT) Date: Fri, 5 Apr 2013 10:08:28 +0200 From: Michal Hocko Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Message-ID: <20130405080828.GA14882@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <515E2592.7020607@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes On Fri 05-04-13 09:14:58, Simon Jeons wrote: > Hi Michal, > On 03/22/2013 04:15 PM, Michal Hocko wrote: > >[getting off-list] > > > >On Fri 22-03-13 07:46:32, Simon Jeons wrote: > >>Hi Michal, > >>On 03/21/2013 08:56 PM, Michal Hocko wrote: > >>>On Thu 21-03-13 07:49:48, Simon Jeons wrote: > >>>[...] 
> >>>>When I hacking arch/x86/mm/hugetlbpage.c like this, > >>>>diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > >>>>index ae1aa71..87f34ee 100644 > >>>>--- a/arch/x86/mm/hugetlbpage.c > >>>>+++ b/arch/x86/mm/hugetlbpage.c > >>>>@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, > >>>>unsigned long addr, > >>>> > >>>>#endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ > >>>> > >>>>-#ifdef CONFIG_X86_64 > >>>>static __init int setup_hugepagesz(char *opt) > >>>>{ > >>>>unsigned long ps = memparse(opt, &opt); > >>>>if (ps == PMD_SIZE) { > >>>>hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); > >>>>- } else if (ps == PUD_SIZE && cpu_has_gbpages) { > >>>>- hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); > >>>>+ } else if (ps == PUD_SIZE) { > >>>>+ hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); > >>>>} else { > >>>>printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", > >>>>ps >> 20); > >>>> > >>>>I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. > >>>>What's the difference between these pages which I hacking and normal > >>>>huge pages? > >>>How is this related to the patch set? > >>>Please _stop_ distracting discussion to unrelated topics! > >>> > >>>Nothing personal but this is just wasting our time. > >>Sorry kindly Michal, my bad. > >>Btw, could you explain this question for me? very sorry waste your time. > >Your CPU has to support GB pages. You have removed cpu_has_gbpages test > >and added a hstate for order 13 pages which is a weird number on its > >own (32MB) because there is no page table level to support them. > > But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, > and have equal number of 32MB huge pages which I set up in boot > parameter. because hugetlb_add_hstate creates hstate for those pages and hugetlb_init_hstates allocates them later on. > If there is no page table level to support them, how can > them present? 
Because hugetlb hstate handling code doesn't care about page tables and the way how those pages are going to be mapped _at all_. Or put it in another way. Nobody prevents you to allocate order-5 page for a single pte but that would be a pure waste. Page fault code expects that pages with a proper size are allocated. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx124.postini.com [74.125.245.124]) by kanga.kvack.org (Postfix) with SMTP id 8C8296B0005 for ; Fri, 5 Apr 2013 05:01:06 -0400 (EDT) Received: by mail-da0-f48.google.com with SMTP id p8so1502348dan.21 for ; Fri, 05 Apr 2013 02:01:05 -0700 (PDT) Message-ID: <515E92CA.4000507@gmail.com> Date: Fri, 05 Apr 2013 17:00:58 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> In-Reply-To: <20130405080828.GA14882@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Hi Michal, On 04/05/2013 04:08 PM, Michal Hocko wrote: > On Fri 05-04-13 09:14:58, Simon Jeons wrote: >> Hi Michal, >> On 03/22/2013 04:15 PM, Michal Hocko wrote: >>> [getting off-list] >>> >>> On Fri 22-03-13 07:46:32, Simon 
Jeons wrote: >>>> Hi Michal, >>>> On 03/21/2013 08:56 PM, Michal Hocko wrote: >>>>> On Thu 21-03-13 07:49:48, Simon Jeons wrote: >>>>> [...] >>>>>> When I hacking arch/x86/mm/hugetlbpage.c like this, >>>>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >>>>>> index ae1aa71..87f34ee 100644 >>>>>> --- a/arch/x86/mm/hugetlbpage.c >>>>>> +++ b/arch/x86/mm/hugetlbpage.c >>>>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, >>>>>> unsigned long addr, >>>>>> >>>>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ >>>>>> >>>>>> -#ifdef CONFIG_X86_64 >>>>>> static __init int setup_hugepagesz(char *opt) >>>>>> { >>>>>> unsigned long ps = memparse(opt, &opt); >>>>>> if (ps == PMD_SIZE) { >>>>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); >>>>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) { >>>>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); >>>>>> + } else if (ps == PUD_SIZE) { >>>>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); >>>>>> } else { >>>>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", >>>>>> ps >> 20); >>>>>> >>>>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. >>>>>> What's the difference between these pages which I hacking and normal >>>>>> huge pages? >>>>> How is this related to the patch set? >>>>> Please _stop_ distracting discussion to unrelated topics! >>>>> >>>>> Nothing personal but this is just wasting our time. >>>> Sorry kindly Michal, my bad. >>>> Btw, could you explain this question for me? very sorry waste your time. >>> Your CPU has to support GB pages. You have removed cpu_has_gbpages test >>> and added a hstate for order 13 pages which is a weird number on its >>> own (32MB) because there is no page table level to support them. >> But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, >> and have equal number of 32MB huge pages which I set up in boot >> parameter. 
> because hugetlb_add_hstate creates hstate for those pages and > hugetlb_init_hstates allocates them later on. > >> If there is no page table level to support them, how can >> them present? > Because hugetlb hstate handling code doesn't care about page tables and > the way how those pages are going to be mapped _at all_. Or put it in > another way. Nobody prevents you to allocate order-5 page for a single > pte but that would be a pure waste. Page fault code expects that pages > with a proper size are allocated. Do you mean 32MB pages will map to one pmd which should map 2MB pages? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx153.postini.com [74.125.245.153]) by kanga.kvack.org (Postfix) with SMTP id BF6686B0027 for ; Fri, 5 Apr 2013 05:30:37 -0400 (EDT) Date: Fri, 5 Apr 2013 11:30:34 +0200 From: Michal Hocko Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Message-ID: <20130405093034.GB31132@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> <515E92CA.4000507@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <515E92CA.4000507@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes On Fri 05-04-13 17:00:58, Simon Jeons wrote: > Hi Michal, > On 04/05/2013 04:08 PM, 
Michal Hocko wrote: > >On Fri 05-04-13 09:14:58, Simon Jeons wrote: > >>Hi Michal, > >>On 03/22/2013 04:15 PM, Michal Hocko wrote: > >>>[getting off-list] > >>> > >>>On Fri 22-03-13 07:46:32, Simon Jeons wrote: > >>>>Hi Michal, > >>>>On 03/21/2013 08:56 PM, Michal Hocko wrote: > >>>>>On Thu 21-03-13 07:49:48, Simon Jeons wrote: > >>>>>[...] > >>>>>>When I hacking arch/x86/mm/hugetlbpage.c like this, > >>>>>>diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > >>>>>>index ae1aa71..87f34ee 100644 > >>>>>>--- a/arch/x86/mm/hugetlbpage.c > >>>>>>+++ b/arch/x86/mm/hugetlbpage.c > >>>>>>@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, > >>>>>>unsigned long addr, > >>>>>> > >>>>>>#endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ > >>>>>> > >>>>>>-#ifdef CONFIG_X86_64 > >>>>>>static __init int setup_hugepagesz(char *opt) > >>>>>>{ > >>>>>>unsigned long ps = memparse(opt, &opt); > >>>>>>if (ps == PMD_SIZE) { > >>>>>>hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); > >>>>>>- } else if (ps == PUD_SIZE && cpu_has_gbpages) { > >>>>>>- hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); > >>>>>>+ } else if (ps == PUD_SIZE) { > >>>>>>+ hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); > >>>>>>} else { > >>>>>>printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", > >>>>>>ps >> 20); > >>>>>> > >>>>>>I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. > >>>>>>What's the difference between these pages which I hacking and normal > >>>>>>huge pages? > >>>>>How is this related to the patch set? > >>>>>Please _stop_ distracting discussion to unrelated topics! > >>>>> > >>>>>Nothing personal but this is just wasting our time. > >>>>Sorry kindly Michal, my bad. > >>>>Btw, could you explain this question for me? very sorry waste your time. > >>>Your CPU has to support GB pages. You have removed cpu_has_gbpages test > >>>and added a hstate for order 13 pages which is a weird number on its > >>>own (32MB) because there is no page table level to support them. 
> >>But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, > >>and have equal number of 32MB huge pages which I set up in boot > >>parameter. > >because hugetlb_add_hstate creates hstate for those pages and > >hugetlb_init_hstates allocates them later on. > > > >>If there is no page table level to support them, how can > >>them present? > >Because hugetlb hstate handling code doesn't care about page tables and > >the way how those pages are going to be mapped _at all_. Or put it in > >another way. Nobody prevents you to allocate order-5 page for a single > >pte but that would be a pure waste. Page fault code expects that pages > >with a proper size are allocated. > Do you mean 32MB pages will map to one pmd which should map 2MB pages? > Please refer to hugetlb_fault for more information. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx146.postini.com [74.125.245.146]) by kanga.kvack.org (Postfix) with SMTP id 7BE116B0005 for ; Sat, 6 Apr 2013 20:32:37 -0400 (EDT) Received: by mail-oa0-f45.google.com with SMTP id o6so5194691oag.32 for ; Sat, 06 Apr 2013 17:32:36 -0700 (PDT) Message-ID: <5160BE9E.1050905@gmail.com> Date: Sun, 07 Apr 2013 08:32:30 +0800 From: Simon Jeons MIME-Version: 1.0 Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> <515E92CA.4000507@gmail.com> <20130405093034.GB31132@dhcp22.suse.cz> In-Reply-To: 
<20130405093034.GB31132@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Hi Michal, On 04/05/2013 05:30 PM, Michal Hocko wrote: > On Fri 05-04-13 17:00:58, Simon Jeons wrote: >> Hi Michal, >> On 04/05/2013 04:08 PM, Michal Hocko wrote: >>> On Fri 05-04-13 09:14:58, Simon Jeons wrote: >>>> Hi Michal, >>>> On 03/22/2013 04:15 PM, Michal Hocko wrote: >>>>> [getting off-list] >>>>> >>>>> On Fri 22-03-13 07:46:32, Simon Jeons wrote: >>>>>> Hi Michal, >>>>>> On 03/21/2013 08:56 PM, Michal Hocko wrote: >>>>>>> On Thu 21-03-13 07:49:48, Simon Jeons wrote: >>>>>>> [...] >>>>>>>> When I hacking arch/x86/mm/hugetlbpage.c like this, >>>>>>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >>>>>>>> index ae1aa71..87f34ee 100644 >>>>>>>> --- a/arch/x86/mm/hugetlbpage.c >>>>>>>> +++ b/arch/x86/mm/hugetlbpage.c >>>>>>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, >>>>>>>> unsigned long addr, >>>>>>>> >>>>>>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ >>>>>>>> >>>>>>>> -#ifdef CONFIG_X86_64 >>>>>>>> static __init int setup_hugepagesz(char *opt) >>>>>>>> { >>>>>>>> unsigned long ps = memparse(opt, &opt); >>>>>>>> if (ps == PMD_SIZE) { >>>>>>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); >>>>>>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) { >>>>>>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); >>>>>>>> + } else if (ps == PUD_SIZE) { >>>>>>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); >>>>>>>> } else { >>>>>>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", >>>>>>>> ps >> 20); >>>>>>>> >>>>>>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. 
>>>>>>>> What's the difference between these pages which I hacking and normal >>>>>>>> huge pages? >>>>>>> How is this related to the patch set? >>>>>>> Please _stop_ distracting discussion to unrelated topics! >>>>>>> >>>>>>> Nothing personal but this is just wasting our time. >>>>>> Sorry kindly Michal, my bad. >>>>>> Btw, could you explain this question for me? very sorry waste your time. >>>>> Your CPU has to support GB pages. You have removed cpu_has_gbpages test >>>>> and added a hstate for order 13 pages which is a weird number on its >>>>> own (32MB) because there is no page table level to support them. >>>> But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, >>>> and have equal number of 32MB huge pages which I set up in boot >>>> parameter. >>> because hugetlb_add_hstate creates hstate for those pages and >>> hugetlb_init_hstates allocates them later on. >>> >>>> If there is no page table level to support them, how can >>>> them present? >>> Because hugetlb hstate handling code doesn't care about page tables and >>> the way how those pages are going to be mapped _at all_. Or put it in >>> another way. Nobody prevents you to allocate order-5 page for a single >>> pte but that would be a pure waste. Page fault code expects that pages >>> with a proper size are allocated. >> Do you mean 32MB pages will map to one pmd which should map 2MB pages? >> > Please refer to hugetlb_fault for more information. Thanks for your pointing out. So my assume is correct, is it? Can pmd which support 2MB map 32MB pages work well? > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx190.postini.com [74.125.245.190]) by kanga.kvack.org (Postfix) with SMTP id 625FC6B0005 for ; Sun, 7 Apr 2013 10:05:40 -0400 (EDT) Received: by mail-ve0-f170.google.com with SMTP id 15so4719459vea.15 for ; Sun, 07 Apr 2013 07:05:39 -0700 (PDT) Message-ID: <51617D37.1020502@gmail.com> Date: Sun, 07 Apr 2013 10:05:43 -0400 From: KOSAKI Motohiro MIME-Version: 1.0 Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> <515E92CA.4000507@gmail.com> <20130405093034.GB31132@dhcp22.suse.cz> <5160BE9E.1050905@gmail.com> In-Reply-To: <5160BE9E.1050905@gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Simon Jeons Cc: Michal Hocko , Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes , kosaki.motohiro@gmail.com >> Please refer to hugetlb_fault for more information. > > Thanks for your pointing out. So my assume is correct, is it? Can pmd > which support 2MB map 32MB pages work well? Simon, Please stop hijacking unrelated threads. This is not a question-and-answer thread. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
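A side note on the numbers in the sub-thread above: the hstate order passed by the hacked setup_hugepagesz() is PMD_SHIFT - PAGE_SHIFT + 4, which is where the "order 13 (32MB)" figure comes from. The user-space sketch below only redoes that arithmetic. It assumes the usual x86_64 shifts with 4KB base pages (12, 21 and 30), hard-coded under made-up EX_* names instead of being taken from kernel headers; order_to_bytes() and show_orders() are likewise hypothetical helpers, not kernel functions.

```c
#include <stdio.h>

/*
 * Assumed x86_64 values with 4KB base pages. These are hard-coded for
 * illustration only and are not read from any kernel header.
 */
enum {
	EX_PAGE_SHIFT = 12,	/* 4KB base page */
	EX_PMD_SHIFT  = 21,	/* 2MB, covered by one PMD entry */
	EX_PUD_SHIFT  = 30	/* 1GB, covered by one PUD entry */
};

/* Hypothetical helper: size in bytes of a page of the given order. */
static unsigned long long order_to_bytes(unsigned int order)
{
	return 1ULL << (order + EX_PAGE_SHIFT);
}

/* Print the order and size for the three cases discussed in the thread. */
static void show_orders(void)
{
	unsigned int pmd_order    = EX_PMD_SHIFT - EX_PAGE_SHIFT;	/* 9 */
	unsigned int pud_order    = EX_PUD_SHIFT - EX_PAGE_SHIFT;	/* 18 */
	/* The value the quoted hack passes to hugetlb_add_hstate(). */
	unsigned int hacked_order = EX_PMD_SHIFT - EX_PAGE_SHIFT + 4;	/* 13 */

	printf("PMD-sized hugepage: order %2u -> %4llu MB\n",
	       pmd_order, order_to_bytes(pmd_order) >> 20);
	printf("PUD-sized hugepage: order %2u -> %4llu MB\n",
	       pud_order, order_to_bytes(pud_order) >> 20);
	printf("hacked hstate:      order %2u -> %4llu MB\n",
	       hacked_order, order_to_bytes(hacked_order) >> 20);
}
```

Calling show_orders() from a main() should list order 9 as 2 MB, order 18 as 1024 MB and order 13 as 32 MB. Only the first two line up with a page table level (PMD and PUD) under these assumptions, which is the point made in the replies above.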
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755487Ab3BUTmx (ORCPT ); Thu, 21 Feb 2013 14:42:53 -0500 Received: from mx1.redhat.com ([209.132.183.28]:20816 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754110Ab3BUTmu (ORCPT ); Thu, 21 Feb 2013 14:42:50 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , 
linux-kernel@vger.kernel.org Subject: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Date: Thu, 21 Feb 2013 14:41:47 -0500 Message-Id: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently we can't offline memory blocks which contain hugepages because a hugepage is considered an unmovable page. With this patch series a hugepage becomes movable, so we can offline such memory blocks by using hugepage migration. What differs from other users of hugepage migration is that, after migration, we need to dissolve all the hugepages inside the target memory block into free buddy pages, because free hugepages remaining in the memory block would otherwise interfere with memory offlining. For this reason we introduce new functions dissolve_free_huge_page() and dissolve_free_huge_pages(). Other than that, this patch straightforwardly adds hugepage migration code: it adds hugepage handling to the functions which scan over pfns and collect hugepages to be migrated, and adds a hugepage allocation function to alloc_migrate_target(). As for larger hugepages (1GB for x86_64), hot-removing them is not easy because they are larger than a memory block, so for now we simply let the offlining fail as before. 
Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 8 ++++++++ mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- mm/migrate.c | 12 +++++++++++- mm/page_alloc.c | 12 ++++++++++++ mm/page_isolation.c | 5 +++++ 6 files changed, 121 insertions(+), 10 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index 86a4d78..e33f07f 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); void putback_active_hugepage(struct page *page); void putback_active_hugepages(struct list_head *l); void migrate_hugepage_add(struct page *page, struct list_head *list); +int is_hugepage_movable(struct page *page); void copy_huge_page(struct page *dst, struct page *src); extern unsigned long hugepages_treat_as_movable; @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) #define putback_active_hugepage(p) 0 #define putback_active_hugepages(l) 0 #define migrate_hugepage_add(p, l) 0 +#define is_hugepage_movable(x) 0 static inline void copy_huge_page(struct page *dst, struct page *src) { } @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) return h - hstates; } +extern void dissolve_free_huge_page(struct page *page); +extern void dissolve_free_huge_pages(unsigned long start_pfn, + unsigned long end_pfn); + #else struct hstate {}; #define alloc_huge_page(v, a, r) NULL @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) } #define hstate_index_to_shift(index) 0 #define hstate_index(h) 0 +#define dissolve_free_huge_page(p) 0 +#define dissolve_free_huge_pages(s, e) 0 #endif #endif /* _LINUX_HUGETLB_H */ diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index ccf9995..c28e6c9 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, 
nodemask_t *nodes_allowed, return ret; } +/* Dissolve a given free hugepage into free pages. */ +void dissolve_free_huge_page(struct page *page) +{ + if (PageHuge(page) && !page_count(page)) { + struct hstate *h = page_hstate(page); + int nid = page_to_nid(page); + spin_lock(&hugetlb_lock); + list_del(&page->lru); + h->free_huge_pages--; + h->free_huge_pages_node[nid]--; + update_and_free_page(h, page); + spin_unlock(&hugetlb_lock); + } +} + +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) +{ + unsigned long pfn; + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); + for (pfn = start_pfn; pfn < end_pfn; pfn += step) + dissolve_free_huge_page(pfn_to_page(pfn)); +} + static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) { struct page *page; @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) return 0; } +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ +int is_hugepage_movable(struct page *hpage) +{ + struct page *page; + struct page *tmp; + struct hstate *h = page_hstate(hpage); + int ret = 0; + + VM_BUG_ON(!PageHuge(hpage)); + if (PageTail(hpage)) + return 0; + spin_lock(&hugetlb_lock); + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) + if (page == hpage) + ret = 1; + spin_unlock(&hugetlb_lock); + return ret; +} + /* * This function is called from memory failure code. * Assume the caller holds page lock of the head page. diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c index d04ed87..6418de2 100644 --- v3.8.orig/mm/memory_hotplug.c +++ v3.8/mm/memory_hotplug.c @@ -29,6 +29,7 @@ #include #include #include +#include #include @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) } /* - * Scanning pfn is much easier than scanning lru list. - * Scan pfn from start to end and Find LRU page. 
+ * Scan pfn range [start,end) to find movable/migratable pages (LRU pages + * and hugepages). We scan pfn because it's much easier than scanning over + * linked list. This function returns the pfn of the first found movable + * page if it's found, otherwise 0. */ -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) { unsigned long pfn; struct page *page; @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) page = pfn_to_page(pfn); if (PageLRU(page)) return pfn; + if (PageHuge(page)) { + if (is_hugepage_movable(page)) + return pfn; + else + pfn += (1 << compound_order(page)) - 1; + } } } return 0; @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) page = pfn_to_page(pfn); if (!get_page_unless_zero(page)) continue; + if (PageHuge(page)) { + /* + * Larger hugepage (1GB for x86_64) is larger than + * memory block, so pfn scan can start at the tail + * page of larger hugepage. In such case, + * we simply skip the hugepage and move the cursor + * to the last tail page. + */ + if (PageTail(page)) { + struct page *head = compound_head(page); + pfn = page_to_pfn(head) + + (1 << compound_order(head)) - 1; + put_page(page); + continue; + } + pfn = (1 << compound_order(page)) - 1; + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { + put_page(page); + continue; + } + list_move_tail(&page->lru, &source); + move_pages -= 1 << compound_order(page); + continue; + } /* * We can skip free pages. And we can only deal with pages on * LRU. @@ -1049,7 +1082,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) } if (!list_empty(&source)) { if (not_managed) { - putback_lru_pages(&source); + putback_movable_pages(&source); goto out; } @@ -1057,11 +1090,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) * alloc_migrate_target should be improooooved!! 
* migrate_pages returns # of failed pages. */ - ret = migrate_pages(&source, alloc_migrate_target, 0, + ret = migrate_movable_pages(&source, alloc_migrate_target, 0, true, MIGRATE_SYNC, MR_MEMORY_HOTPLUG); - if (ret) - putback_lru_pages(&source); } out: return ret; @@ -1304,8 +1335,8 @@ static int __ref __offline_pages(unsigned long start_pfn, drain_all_pages(); } - pfn = scan_lru_pages(start_pfn, end_pfn); - if (pfn) { /* We have page on LRU */ + pfn = scan_movable_pages(start_pfn, end_pfn); + if (pfn) { /* We have movable pages */ ret = do_migrate_range(pfn, end_pfn); if (!ret) { drain = 1; @@ -1324,6 +1355,8 @@ static int __ref __offline_pages(unsigned long start_pfn, yield(); /* drain pcp pages, this is synchronous. */ drain_all_pages(); + /* dissolve all free hugepages inside the memory block */ + dissolve_free_huge_pages(start_pfn, end_pfn); /* check again */ offlined_pages = check_pages_isolated(start_pfn, end_pfn); if (offlined_pages < 0) { diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 8c457e7..a491a98 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -1009,8 +1009,18 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, unlock_page(hpage); out: - if (rc != -EAGAIN) + if (rc != -EAGAIN) { putback_active_hugepage(hpage); + + /* + * After hugepage migration from memory hotplug, the original + * hugepage should never be allocated again. This will be + * done by dissolving it into free normal pages, because + * we already set migratetype to MIGRATE_ISOLATE for them. 
+ */ + if (offlining) + dissolve_free_huge_page(hpage); + } put_page(new_hpage); if (result) { if (rc) diff --git v3.8.orig/mm/page_alloc.c v3.8/mm/page_alloc.c index 6a83cd3..c37951d 100644 --- v3.8.orig/mm/page_alloc.c +++ v3.8/mm/page_alloc.c @@ -58,6 +58,7 @@ #include #include #include +#include #include #include @@ -5686,6 +5687,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, continue; page = pfn_to_page(check); + + /* + * Hugepages are not in LRU lists, but they're movable. + * We need not scan over tail pages bacause we don't + * handle each tail page individually in migration. + */ + if (PageHuge(page)) { + iter += (1 << compound_order(page)) - 1; + continue; + } + /* * We can't use page_count without pin a page * because another CPU can free compound page. diff --git v3.8.orig/mm/page_isolation.c v3.8/mm/page_isolation.c index 383bdbb..cf48ef6 100644 --- v3.8.orig/mm/page_isolation.c +++ v3.8/mm/page_isolation.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "internal.h" int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages) @@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, { gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE; + if (PageHuge(page)) + return alloc_huge_page_node(page_hstate(compound_head(page)), + numa_node_id()); + if (PageHighMem(page)) gfp_mask |= __GFP_HIGHMEM; -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755591Ab3BUTnF (ORCPT ); Thu, 21 Feb 2013 14:43:05 -0500 Received: from mx1.redhat.com ([209.132.183.28]:37902 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752523Ab3BUTmw (ORCPT ); Thu, 21 Feb 2013 14:42:52 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 9/9] remove 
/proc/sys/vm/hugepages_treat_as_movable Date: Thu, 21 Feb 2013 14:41:48 -0500 Message-Id: <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Now hugepages are definitely movable. So allocating hugepages from ZONE_MOVABLE is natural and we have no reason to keep this parameter. Signed-off-by: Naoya Horiguchi --- Documentation/sysctl/vm.txt | 16 ---------------- include/linux/hugetlb.h | 2 -- kernel/sysctl.c | 7 ------- mm/hugetlb.c | 23 +++++------------------ 4 files changed, 5 insertions(+), 43 deletions(-) diff --git v3.8.orig/Documentation/sysctl/vm.txt v3.8/Documentation/sysctl/vm.txt index 078701f..997350a 100644 --- v3.8.orig/Documentation/sysctl/vm.txt +++ v3.8/Documentation/sysctl/vm.txt @@ -167,22 +167,6 @@ fragmentation index is <= extfrag_threshold. The default value is 500. ============================================================== -hugepages_treat_as_movable - -This parameter is only useful when kernelcore= is specified at boot time to -create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages -are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero -value written to hugepages_treat_as_movable allows huge pages to be allocated -from ZONE_MOVABLE. - -Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge -pages pool can easily grow or shrink within. Assuming that applications are -not running that mlock() a lot of memory, it is likely the huge pages pool -can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value -into nr_hugepages and triggering page reclaim. 
- -============================================================== - hugetlb_shm_group hugetlb_shm_group contains group id that is allowed to create SysV diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index e33f07f..c97e5c5 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -35,7 +35,6 @@ int PageHuge(struct page *page); void reset_vma_resv_huge_pages(struct vm_area_struct *vma); int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); -int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); #ifdef CONFIG_NUMA int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, @@ -73,7 +72,6 @@ void migrate_hugepage_add(struct page *page, struct list_head *list); int is_hugepage_movable(struct page *page); void copy_huge_page(struct page *dst, struct page *src); -extern unsigned long hugepages_treat_as_movable; extern const unsigned long hugetlb_zero, hugetlb_infinity; extern int sysctl_hugetlb_shm_group; extern struct list_head huge_boot_pages; diff --git v3.8.orig/kernel/sysctl.c v3.8/kernel/sysctl.c index c88878d..a98bcf2 100644 --- v3.8.orig/kernel/sysctl.c +++ v3.8/kernel/sysctl.c @@ -1189,13 +1189,6 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, - { - .procname = "hugepages_treat_as_movable", - .data = &hugepages_treat_as_movable, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = hugetlb_treat_movable_handler, - }, { .procname = "nr_overcommit_hugepages", .data = NULL, diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index c28e6c9..c60d203 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -33,7 +33,6 @@ #include "internal.h" const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; -static gfp_t htlb_alloc_mask = GFP_HIGHUSER; unsigned long hugepages_treat_as_movable; int 
hugetlb_max_hstate __read_mostly; @@ -542,7 +541,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, retry_cpuset: cpuset_mems_cookie = get_mems_allowed(); zonelist = huge_zonelist(vma, address, - htlb_alloc_mask, &mpol, &nodemask); + GFP_HIGHUSER_MOVABLE, &mpol, &nodemask); /* * A child process with MAP_PRIVATE mappings created by their parent * have no page reserves. This check ensures that reservations are @@ -558,7 +557,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, for_each_zone_zonelist_nodemask(zone, z, zonelist, MAX_NR_ZONES - 1, nodemask) { - if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) { + if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) { page = dequeue_huge_page_node(h, zone_to_nid(zone)); if (page) { if (!avoid_reserve) @@ -698,7 +697,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid) return NULL; page = alloc_pages_exact_node(nid, - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); if (page) { @@ -909,12 +908,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) spin_unlock(&hugetlb_lock); if (nid == NUMA_NO_NODE) - page = alloc_pages(htlb_alloc_mask|__GFP_COMP| + page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); else page = alloc_pages_exact_node(nid, - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); if (page && arch_prepare_hugepage(page)) { @@ -2078,18 +2077,6 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *table, int write, } #endif /* CONFIG_NUMA */ -int hugetlb_treat_movable_handler(struct ctl_table *table, int write, - void __user *buffer, - size_t *length, loff_t *ppos) -{ - proc_dointvec(table, write, buffer, length, ppos); - if (hugepages_treat_as_movable) - htlb_alloc_mask = GFP_HIGHUSER_MOVABLE; - 
else - htlb_alloc_mask = GFP_HIGHUSER; - return 0; -} - int hugetlb_overcommit_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos) -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754852Ab3BUTnl (ORCPT ); Thu, 21 Feb 2013 14:43:41 -0500 Received: from mx1.redhat.com ([209.132.183.28]:4295 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755257Ab3BUTms (ORCPT ); Thu, 21 Feb 2013 14:42:48 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 7/9] mbind: enable mbind() to migrate hugepage Date: Thu, 21 Feb 2013 14:41:46 -0500 Message-Id: <1361475708-25991-8-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch enables mbind(2) to migrate hugepages. The page-collecting function check_range() was already made aware of hugepages by an earlier patch in this series. 
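The diff below adds a check in unmap_and_move_huge_page() because alloc_huge_page() can hand back an ERR_PTR-style value rather than NULL. The following is only a minimal userspace sketch of that pattern, not the kernel code: every toy_* name is hypothetical, and toy_err_ptr()/toy_is_err() merely imitate the idea behind the kernel's ERR_PTR()/IS_ERR() helpers (an errno encoded in the top 4095 values of the address space).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: like the kernel, reserve the top 4095 addresses for errnos. */
#define TOY_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value (stand-in for ERR_PTR()). */
static inline void *toy_err_ptr(long error)
{
	assert(error < 0 && error >= -TOY_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if ptr carries an encoded errno (stand-in for IS_ERR()). */
static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_page { int huge; };

static struct toy_page toy_target = { 1 };

enum toy_alloc_mode { TOY_ALLOC_OK, TOY_ALLOC_NULL, TOY_ALLOC_ERR };

/*
 * Hypothetical allocation callback. Depending on mode it succeeds,
 * returns NULL, or returns an encoded errno, which are the three
 * outcomes the caller has to distinguish.
 */
static struct toy_page *toy_get_new_page(enum toy_alloc_mode mode)
{
	if (mode == TOY_ALLOC_NULL)
		return NULL;
	if (mode == TOY_ALLOC_ERR)
		return toy_err_ptr(-ENOSPC);
	return &toy_target;
}

/* Caller side: both failure flavours are folded into -ENOMEM. */
static int toy_unmap_and_move(enum toy_alloc_mode mode)
{
	struct toy_page *new_page = toy_get_new_page(mode);

	if (!new_page || toy_is_err(new_page))
		return -ENOMEM;
	return 0;
}
```

With this model, toy_unmap_and_move(TOY_ALLOC_OK) returns 0, while both TOY_ALLOC_NULL and TOY_ALLOC_ERR yield -ENOMEM. A plain NULL test alone would let the encoded errno through as if it were a valid page.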
Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 3 +++ mm/hugetlb.c | 2 +- mm/mempolicy.c | 15 ++++++--------- mm/migrate.c | 7 ++++++- 4 files changed, 16 insertions(+), 11 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index eb33df5..86a4d78 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -263,6 +263,8 @@ struct huge_bootmem_page { #endif }; +struct page *alloc_huge_page(struct vm_area_struct *vma, + unsigned long addr, int avoid_reserve); struct page *alloc_huge_page_node(struct hstate *h, int nid); /* arch callback */ @@ -358,6 +360,7 @@ static inline int hstate_index(struct hstate *h) #else struct hstate {}; +#define alloc_huge_page(v, a, r) NULL #define alloc_huge_page_node(h, nid) NULL #define alloc_bootmem_huge_page(h) NULL #define hstate_file(f) NULL diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index 86ffcb7..ccf9995 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -1116,7 +1116,7 @@ static void vma_commit_reservation(struct hstate *h, } } -static struct page *alloc_huge_page(struct vm_area_struct *vma, +struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) { struct hugepage_subpool *spool = subpool_vma(vma); diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c index 8627135..9f56c40 100644 --- v3.8.orig/mm/mempolicy.c +++ v3.8/mm/mempolicy.c @@ -1187,6 +1187,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int * vma = vma->vm_next; } + if (PageHuge(page)) + return alloc_huge_page(vma, address, 1); /* * if !vma, alloc_page_vma() will use task or system default policy */ @@ -1291,15 +1293,10 @@ static long do_mbind(unsigned long start, unsigned long len, if (!err) { int nr_failed = 0; - if (!list_empty(&pagelist)) { - WARN_ON_ONCE(flags & MPOL_MF_LAZY); - nr_failed = migrate_pages(&pagelist, new_vma_page, - (unsigned long)vma, - false, MIGRATE_SYNC, - MR_MEMPOLICY_MBIND); - if 
(nr_failed) - putback_lru_pages(&pagelist); - } + WARN_ON_ONCE(flags & MPOL_MF_LAZY); + nr_failed = migrate_movable_pages(&pagelist, new_vma_page, + (unsigned long)vma, false, + MIGRATE_SYNC, MR_MEMPOLICY_MBIND); if (nr_failed && (flags & MPOL_MF_STRICT)) err = -EIO; diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 36959d6..8c457e7 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -974,7 +974,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, struct page *new_hpage = get_new_page(hpage, private, &result); struct anon_vma *anon_vma = NULL; - if (!new_hpage) + /* + * Getting a new hugepage with alloc_huge_page() (which can happen + * when migration is caused by mbind()) can return ERR_PTR value, + * so we need take care of the case here. + */ + if (!new_hpage || IS_ERR_VALUE(new_hpage)) return -ENOMEM; rc = -EAGAIN; -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755087Ab3BUTmr (ORCPT ); Thu, 21 Feb 2013 14:42:47 -0500 Received: from mx1.redhat.com ([209.132.183.28]:12632 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754006Ab3BUTmp (ORCPT ); Thu, 21 Feb 2013 14:42:45 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 4/9] migrate: clean up migrate_huge_page() Date: Thu, 21 Feb 2013 14:41:43 -0500 Message-Id: <1361475708-25991-5-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Due to the previous patch, soft_offline_huge_page() switches to use migrate_pages(), and migrate_huge_page() is not used any more. So let's remove it. 
Signed-off-by: Naoya Horiguchi --- include/linux/migrate.h | 6 ------ mm/migrate.c | 28 ---------------------------- 2 files changed, 34 deletions(-) diff --git v3.8.orig/include/linux/migrate.h v3.8/include/linux/migrate.h index d626c27..dc085e1 100644 --- v3.8.orig/include/linux/migrate.h +++ v3.8/include/linux/migrate.h @@ -45,9 +45,6 @@ extern int migrate_pages(struct list_head *l, new_page_t x, extern int migrate_movable_pages(struct list_head *from, new_page_t get_new_page, unsigned long private, bool offlining, enum migrate_mode mode, int reason); -extern int migrate_huge_page(struct page *, new_page_t x, - unsigned long private, bool offlining, - enum migrate_mode mode); extern int fail_migrate_page(struct address_space *, struct page *, struct page *); @@ -70,9 +67,6 @@ static inline int migrate_pages(struct list_head *l, new_page_t x, static inline int migrate_movable_pages(struct list_head *from, new_page_t get_new_page, unsigned long private, bool offlining, enum migrate_mode mode, int reason) { return -ENOSYS; } -static inline int migrate_huge_page(struct page *page, new_page_t x, - unsigned long private, bool offlining, - enum migrate_mode mode) { return -ENOSYS; } static inline int migrate_prep(void) { return -ENOSYS; } static inline int migrate_prep_local(void) { return -ENOSYS; } diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 8c13cc5..7b2ca1a 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -1106,34 +1106,6 @@ int migrate_movable_pages(struct list_head *from, new_page_t get_new_page, return err; } -int migrate_huge_page(struct page *hpage, new_page_t get_new_page, - unsigned long private, bool offlining, - enum migrate_mode mode) -{ - int pass, rc; - - for (pass = 0; pass < 10; pass++) { - rc = unmap_and_move_huge_page(get_new_page, - private, hpage, pass > 2, offlining, - mode); - switch (rc) { - case -ENOMEM: - goto out; - case -EAGAIN: - /* try again */ - cond_resched(); - break; - case MIGRATEPAGE_SUCCESS: - goto out; - 
default: - rc = -EIO; - goto out; - } - } -out: - return rc; -} - #ifdef CONFIG_NUMA /* * Move a list of individual pages -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753920Ab3BUToD (ORCPT ); Thu, 21 Feb 2013 14:44:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:52287 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754006Ab3BUTmr (ORCPT ); Thu, 21 Feb 2013 14:42:47 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 6/9] migrate: enable move_pages() to migrate hugepage Date: Thu, 21 Feb 2013 14:41:45 -0500 Message-Id: <1361475708-25991-7-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch extends move_pages() to handle vmas with VM_HUGETLB set, which enables hugepage migration with move_pages(2). We avoid taking a refcount on tail pages of a hugepage because, unlike thp, a hugepage is never split, so we need not care about races with splitting. Migration of larger (1GB for x86_64) hugepages is not enabled. 
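To make the refcount rule above concrete, here is a small userspace model. It is a sketch under stated assumptions, not the kernel implementation: the ref_* names are hypothetical, the lookup pins a page only when it is the head of the hugepage, and the caller skips the put for tail pages, mirroring the "goto set_status" path in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one base page inside a hugepage. */
struct ref_page {
	struct ref_page *head; /* points to itself for the head page */
	int refcount;          /* tracked on the head page only in this model */
};

/* Set up n base pages as one hugepage; the head starts with one reference. */
static void ref_init_hugepage(struct ref_page *pages, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		pages[i].head = &pages[0];
		pages[i].refcount = 0;
	}
	pages[0].refcount = 1;
}

static bool ref_is_tail(const struct ref_page *p)
{
	return p->head != p;
}

/* Model of a FOLL_GET lookup: pin only if the looked-up page is the head. */
static struct ref_page *ref_follow_page_get(struct ref_page *p)
{
	if (!ref_is_tail(p))
		p->refcount++;
	return p;
}

/*
 * Model of the caller. Tail pages are skipped without a put because no
 * reference was taken for them. Head pages get a second reference for
 * the migration list, then the lookup reference is dropped.
 */
static bool ref_collect(struct ref_page *p, int *queued)
{
	struct ref_page *page = ref_follow_page_get(p);

	if (ref_is_tail(page))
		return false;
	page->refcount++;  /* reference held by the migration list */
	(*queued)++;
	page->refcount--;  /* drop the lookup reference */
	return true;
}
```

In this model, collecting through the head address queues the hugepage once and leaves exactly one extra reference on it, while collecting through any tail address changes nothing, so the counts stay balanced.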
Signed-off-by: Naoya Horiguchi --- mm/memory.c | 6 ++++-- mm/migrate.c | 29 ++++++++++++++++++++--------- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git v3.8.orig/mm/memory.c v3.8/mm/memory.c index bb1369f..d7cfd11 100644 --- v3.8.orig/mm/memory.c +++ v3.8/mm/memory.c @@ -1495,7 +1495,8 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, if (pud_none(*pud)) goto no_page_table; if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) { - BUG_ON(flags & FOLL_GET); + if (flags & FOLL_GET) + goto out; page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE); goto out; } @@ -1506,8 +1507,9 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, if (pmd_none(*pmd)) goto no_page_table; if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) { - BUG_ON(flags & FOLL_GET); page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE); + if (flags & FOLL_GET && PageHead(page)) + get_page_foll(page); goto out; } if ((flags & FOLL_NUMA) && pmd_numa(*pmd)) diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 7b2ca1a..36959d6 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -1130,7 +1130,11 @@ static struct page *new_page_node(struct page *p, unsigned long private, *result = &pm->status; - return alloc_pages_exact_node(pm->node, + if (PageHuge(p)) + return alloc_huge_page_node(page_hstate(compound_head(p)), + pm->node); + else + return alloc_pages_exact_node(pm->node, GFP_HIGHUSER_MOVABLE | GFP_THISNODE, 0); } @@ -1176,6 +1180,13 @@ static int do_move_page_to_node_array(struct mm_struct *mm, if (PageReserved(page) || PageKsm(page)) goto put_and_set; + /* + * follow_page(FOLL_GET) didn't get refcount for tail pages of + * hugepage, so here we skip putting it. 
+ */ + if (PageHuge(page) && PageTail(page)) + goto set_status; + pp->page = page; err = page_to_nid(page); @@ -1190,6 +1201,12 @@ static int do_move_page_to_node_array(struct mm_struct *mm, !migrate_all) goto put_and_set; + if (PageHuge(page)) { + get_page(page); + list_move_tail(&page->lru, &pagelist); + goto put_and_set; + } + err = isolate_lru_page(page); if (!err) { list_add_tail(&page->lru, &pagelist); @@ -1207,14 +1224,8 @@ static int do_move_page_to_node_array(struct mm_struct *mm, pp->status = err; } - err = 0; - if (!list_empty(&pagelist)) { - err = migrate_pages(&pagelist, new_page_node, - (unsigned long)pm, 0, MIGRATE_SYNC, - MR_SYSCALL); - if (err) - putback_lru_pages(&pagelist); - } + err = migrate_movable_pages(&pagelist, new_page_node, + (unsigned long)pm, 0, MIGRATE_SYNC, MR_SYSCALL); up_read(&mm->mmap_sem); return err; -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753815Ab3BUTof (ORCPT ); Thu, 21 Feb 2013 14:44:35 -0500 Received: from mx1.redhat.com ([209.132.183.28]:16299 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754745Ab3BUTmq (ORCPT ); Thu, 21 Feb 2013 14:42:46 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Date: Thu, 21 Feb 2013 14:41:44 -0500 Message-Id: <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch extends check_range() to handle vmas with VM_HUGETLB set. With this change, we can migrate hugepages with migrate_pages(2). 
Note that for larger hugepages (covered by pud entries, 1GB for x86_64 for example), we simply skip it now. Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 10 ++++++++++ mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ 3 files changed, 48 insertions(+), 14 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index 8f87115..eb33df5 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); int dequeue_hwpoisoned_huge_page(struct page *page); void putback_active_hugepage(struct page *page); void putback_active_hugepages(struct list_head *l); +void migrate_hugepage_add(struct page *page, struct list_head *list); void copy_huge_page(struct page *dst, struct page *src); extern unsigned long hugepages_treat_as_movable; @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write); struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int write); -int pmd_huge(pmd_t pmd); -int pud_huge(pud_t pmd); +extern int pmd_huge(pmd_t pmd); +extern int pud_huge(pud_t pmd); unsigned long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot); @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) #define putback_active_hugepage(p) 0 #define putback_active_hugepages(l) 0 +#define migrate_hugepage_add(p, l) 0 static inline void copy_huge_page(struct page *dst, struct page *src) { } diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index cb9d43b8..86ffcb7 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) list_for_each_entry_safe(page, page2, l, lru) putback_active_hugepage(page); } + +void migrate_hugepage_add(struct page *page, struct 
list_head *list) +{ + VM_BUG_ON(!PageHuge(page)); + get_page(page); + spin_lock(&hugetlb_lock); + list_move_tail(&page->lru, list); + spin_unlock(&hugetlb_lock); + return; +} diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c index e2df1c1..8627135 100644 --- v3.8.orig/mm/mempolicy.c +++ v3.8/mm/mempolicy.c @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, return addr != end; } +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, + const nodemask_t *nodes, unsigned long flags, + void *private) +{ +#ifdef CONFIG_HUGETLB_PAGE + int nid; + struct page *page; + + spin_lock(&vma->vm_mm->page_table_lock); + page = pte_page(huge_ptep_get((pte_t *)pmd)); + spin_unlock(&vma->vm_mm->page_table_lock); + nid = page_to_nid(page); + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) + || flags & MPOL_MF_MOVE_ALL)) + migrate_hugepage_add(page, private); +#else + BUG(); +#endif +} + static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, const nodemask_t *nodes, unsigned long flags, @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { + check_hugetlb_pmd_range(vma, pmd, nodes, + flags, private); + continue; + } split_huge_page_pmd(vma, addr, pmd); if (pmd_none_or_trans_huge_or_clear_bad(pmd)) continue; @@ -557,6 +583,8 @@ static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd, pud = pud_offset(pgd, addr); do { next = pud_addr_end(addr, end); + if (pud_huge(*pud) && is_vm_hugetlb_page(vma)) + continue; if (pud_none_or_clear_bad(pud)) continue; if (check_pmd_range(vma, pud, addr, next, nodes, @@ -648,9 +676,6 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end, return ERR_PTR(-EFAULT); } - if 
(is_vm_hugetlb_page(vma)) - goto next; - if (flags & MPOL_MF_LAZY) { change_prot_numa(vma, start, endvma); goto next; @@ -999,7 +1024,11 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist, static struct page *new_node_page(struct page *page, unsigned long node, int **x) { - return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0); + if (PageHuge(page)) + return alloc_huge_page_node(page_hstate(compound_head(page)), + node); + else + return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0); } /* @@ -1011,7 +1040,6 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, { nodemask_t nmask; LIST_HEAD(pagelist); - int err = 0; nodes_clear(nmask); node_set(source, nmask); @@ -1025,15 +1053,9 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, check_range(mm, mm->mmap->vm_start, mm->task_size, &nmask, flags | MPOL_MF_DISCONTIG_OK, &pagelist); - if (!list_empty(&pagelist)) { - err = migrate_pages(&pagelist, new_node_page, dest, + return migrate_movable_pages(&pagelist, new_node_page, dest, false, MIGRATE_SYNC, MR_SYSCALL); - if (err) - putback_lru_pages(&pagelist); - } - - return err; } /* -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754763Ab3BUTpI (ORCPT ); Thu, 21 Feb 2013 14:45:08 -0500 Received: from mx1.redhat.com ([209.132.183.28]:58044 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754024Ab3BUTmp (ORCPT ); Thu, 21 Feb 2013 14:42:45 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 1/9] migrate: add migrate_entry_wait_huge() Date: Thu, 21 Feb 2013 14:41:40 -0500 Message-Id: <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: 
<1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When we have a page fault for the address which is backed by a hugepage under migration, the kernel can't wait correctly until the migration finishes. This is because pte_offset_map_lock() can't get a correct migration entry for hugepage. This patch adds migration_entry_wait_huge() to separate code path between normal pages and hugepages. Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 2 ++ include/linux/swapops.h | 4 ++++ mm/hugetlb.c | 4 ++-- mm/migrate.c | 24 ++++++++++++++++++++++++ 4 files changed, 32 insertions(+), 2 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index 0c80d3f..40b27f6 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -43,6 +43,7 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, #endif int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); +int is_hugetlb_entry_migration(pte_t pte); int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, unsigned int flags); @@ -109,6 +110,7 @@ static inline unsigned long hugetlb_total_pages(void) #define follow_hugetlb_page(m,v,p,vs,a,b,i,w) ({ BUG(); 0; }) #define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) +#define is_hugetlb_entry_migration(pte) ({ BUG(); 0; }) #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) static inline void hugetlb_report_meminfo(struct seq_file *m) { diff --git v3.8.orig/include/linux/swapops.h v3.8/include/linux/swapops.h index 47ead51..f68efdd 100644 --- v3.8.orig/include/linux/swapops.h +++ v3.8/include/linux/swapops.h @@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry) extern void 
migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); +extern void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, + unsigned long address); #else #define make_migration_entry(page, write) swp_entry(0, 0) @@ -148,6 +150,8 @@ static inline int is_migration_entry(swp_entry_t swp) static inline void make_migration_entry_read(swp_entry_t *entryp) { } static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } +static inline void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, + unsigned long address) { } static inline int is_write_migration_entry(swp_entry_t entry) { return 0; diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index 546db81..351025e 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -2313,7 +2313,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, return -ENOMEM; } -static int is_hugetlb_entry_migration(pte_t pte) +int is_hugetlb_entry_migration(pte_t pte) { swp_entry_t swp; @@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (ptep) { entry = huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { - migration_entry_wait(mm, (pmd_t *)ptep, address); + migration_entry_wait_huge(mm, (pmd_t *)ptep, address); return 0; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 2fd8b4a..7d84f4c 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, pte_unmap_unlock(ptep, ptl); } +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, + unsigned long address) +{ + spinlock_t *ptl = pte_lockptr(mm, pmd); + pte_t pte; + swp_entry_t entry; + struct page *page; + + spin_lock(ptl); + pte = huge_ptep_get((pte_t *)pmd); + if (!is_hugetlb_entry_migration(pte)) + goto out; + entry = pte_to_swp_entry(pte); + 
page = migration_entry_to_page(entry); + if (!get_page_unless_zero(page)) + goto out; + spin_unlock(ptl); + wait_on_page_locked(page); + put_page(page); + return; +out: + spin_unlock(ptl); +} + #ifdef CONFIG_BLOCK /* Returns true if all buffers are successfully locked */ static bool buffer_migrate_lock_buffers(struct buffer_head *head, -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754538Ab3BUTpH (ORCPT ); Thu, 21 Feb 2013 14:45:07 -0500 Received: from mx1.redhat.com ([209.132.183.28]:4589 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754162Ab3BUTmp (ORCPT ); Thu, 21 Feb 2013 14:42:45 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 2/9] migrate: make core migration code aware of hugepage Date: Thu, 21 Feb 2013 14:41:41 -0500 Message-Id: <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Before enabling each user of page migration to support hugepage, this patch adds necessary changes on core migration code. The main change is that the list of pages to migrate can link not only LRU pages, but also hugepages. Along with this, functions such as migrate_pages() and putback_movable_pages() need to be changed to handle hugepages. 
Signed-off-by: Naoya Horiguchi --- include/linux/hugetlb.h | 4 ++++ include/linux/mempolicy.h | 2 +- include/linux/migrate.h | 6 ++++++ mm/hugetlb.c | 16 ++++++++++++++++ mm/migrate.c | 27 +++++++++++++++++++++++++-- 5 files changed, 52 insertions(+), 3 deletions(-) diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h index 40b27f6..8f87115 100644 --- v3.8.orig/include/linux/hugetlb.h +++ v3.8/include/linux/hugetlb.h @@ -67,6 +67,8 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to, vm_flags_t vm_flags); void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); int dequeue_hwpoisoned_huge_page(struct page *page); +void putback_active_hugepage(struct page *page); +void putback_active_hugepages(struct list_head *l); void copy_huge_page(struct page *dst, struct page *src); extern unsigned long hugepages_treat_as_movable; @@ -130,6 +132,8 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) return 0; } +#define putback_active_hugepage(p) 0 +#define putback_active_hugepages(l) 0 static inline void copy_huge_page(struct page *dst, struct page *src) { } diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h index 0d7df39..2e475b5 100644 --- v3.8.orig/include/linux/mempolicy.h +++ v3.8/include/linux/mempolicy.h @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); /* Check if a vma is migratable */ static inline int vma_migratable(struct vm_area_struct *vma) { - if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP)) + if (vma->vm_flags & (VM_IO | VM_PFNMAP)) return 0; /* * Migration allocates pages in the highest zone. 
If we cannot diff --git v3.8.orig/include/linux/migrate.h v3.8/include/linux/migrate.h index 1e9f627..d626c27 100644 --- v3.8.orig/include/linux/migrate.h +++ v3.8/include/linux/migrate.h @@ -42,6 +42,9 @@ extern int migrate_page(struct address_space *, extern int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode, int reason); +extern int migrate_movable_pages(struct list_head *from, + new_page_t get_new_page, unsigned long private, bool offlining, + enum migrate_mode mode, int reason); extern int migrate_huge_page(struct page *, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode); @@ -64,6 +67,9 @@ static inline void putback_movable_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode, int reason) { return -ENOSYS; } +static inline int migrate_movable_pages(struct list_head *from, + new_page_t get_new_page, unsigned long private, bool offlining, + enum migrate_mode mode, int reason) { return -ENOSYS; } static inline int migrate_huge_page(struct page *page, new_page_t x, unsigned long private, bool offlining, enum migrate_mode mode) { return -ENOSYS; } diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c index 351025e..cb9d43b8 100644 --- v3.8.orig/mm/hugetlb.c +++ v3.8/mm/hugetlb.c @@ -3186,3 +3186,19 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage) return ret; } #endif + +void putback_active_hugepage(struct page *page) +{ + VM_BUG_ON(!PageHead(page)); + list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist); + put_page(page); +} + +void putback_active_hugepages(struct list_head *l) +{ + struct page *page; + struct page *page2; + + list_for_each_entry_safe(page, page2, l, lru) + putback_active_hugepage(page); +} diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c index 7d84f4c..e305dc0 100644 --- v3.8.orig/mm/migrate.c +++ v3.8/mm/migrate.c @@ -100,6 
+100,10 @@ void putback_movable_pages(struct list_head *l) struct page *page2; list_for_each_entry_safe(page, page2, l, lru) { + if (unlikely(PageHuge(page))) { + putback_active_hugepage(page); + continue; + } list_del(&page->lru); dec_zone_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); @@ -1046,8 +1050,12 @@ int migrate_pages(struct list_head *from, list_for_each_entry_safe(page, page2, from, lru) { cond_resched(); - - rc = unmap_and_move(get_new_page, private, + if (PageHuge(page)) + rc = unmap_and_move_huge_page(get_new_page, + private, page, pass > 2, + offlining, mode); + else + rc = unmap_and_move(get_new_page, private, page, pass > 2, offlining, mode); @@ -1081,6 +1089,21 @@ int migrate_pages(struct list_head *from, return rc; } +int migrate_movable_pages(struct list_head *from, new_page_t get_new_page, + unsigned long private, bool offlining, + enum migrate_mode mode, int reason) +{ + int err = 0; + + if (!list_empty(from)) { + err = migrate_pages(from, get_new_page, private, + offlining, mode, reason); + if (err) + putback_movable_pages(from); + } + return err; +} + int migrate_huge_page(struct page *hpage, new_page_t get_new_page, unsigned long private, bool offlining, enum migrate_mode mode) -- 1.7.11.7 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753750Ab3BUTpG (ORCPT ); Thu, 21 Feb 2013 14:45:06 -0500 Received: from mx1.redhat.com ([209.132.183.28]:58762 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754340Ab3BUTmp (ORCPT ); Thu, 21 Feb 2013 14:42:45 -0500 From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Date: Thu, 21 Feb 2013 14:41:42 -0500 Message-Id: <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: 
<1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently migrate_huge_page() takes a pointer to a hugepage to be migrated as an argument, instead of taking a pointer to the list of hugepages to be migrated. This behavior was introduced in commit 189ebff28 ("hugetlb: simplify migrate_huge_page()"), and was OK because until now hugepage migration is enabled only for soft-offlining which takes only one hugepage in a single call. But the situation will change in the later patches in this series which enable other users of page migration to support hugepage migration. They can kick migration for both of normal pages and hugepages in a single call, so we need to go back to original implementation of using linked lists to collect the hugepages to be migrated. Signed-off-by: Naoya Horiguchi --- mm/memory-failure.c | 20 ++++++++++++++++---- mm/migrate.c | 2 ++ 2 files changed, 18 insertions(+), 4 deletions(-) diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c index bc126f6..01e4676 100644 --- v3.8.orig/mm/memory-failure.c +++ v3.8/mm/memory-failure.c @@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags) int ret; unsigned long pfn = page_to_pfn(page); struct page *hpage = compound_head(page); + LIST_HEAD(pagelist); /* Synchronized using the page lock with memory_failure() */ lock_page(hpage); @@ -1479,13 +1480,24 @@ static int soft_offline_huge_page(struct page *page, int flags) unlock_page(hpage); /* Keep page count to indicate a given hugepage is isolated. 
*/ - ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false, - MIGRATE_SYNC); - put_page(hpage); + list_move(&hpage->lru, &pagelist); + ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, false, + MIGRATE_SYNC, MR_MEMORY_FAILURE); if (ret) { pr_info("soft offline: %#lx: migration failed %d, type %lx\n", pfn, ret, page->flags); - return ret; + /* + * We know that soft_offline_huge_page() tries to migrate + * only one hugepage pointed to by hpage, so we need not + * run through the pagelist here. + */ + putback_active_hugepage(hpage); + if (ret > 0) + ret = -EIO; + } else { + set_page_hwpoison_huge_page(hpage); + dequeue_hwpoisoned_huge_page(hpage); + atomic_long_add(1< Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753952Ab3BWHFc (ORCPT ); Sat, 23 Feb 2013 02:05:32 -0500 Received: from mail-ob0-f175.google.com ([209.85.214.175]:32811 "EHLO mail-ob0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753311Ab3BWHFb (ORCPT ); Sat, 23 Feb 2013 02:05:31 -0500 MIME-Version: 1.0 In-Reply-To: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Date: Sat, 23 Feb 2013 15:05:30 +0800 Message-ID: Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage From: Hillf Danton To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org, Hillf Danton , Michal Hocko Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Naoya [add Michal in cc list] On Fri, Feb 22, 2013 at 3:41 AM, Naoya Horiguchi wrote: > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ > +int is_hugepage_movable(struct page *hpage) s/int/bool/ can we? 
> +{ > + struct page *page; > + struct page *tmp; > + struct hstate *h = page_hstate(hpage); Make sense to compute hstate for a tail page? > + int ret = 0; > + > + VM_BUG_ON(!PageHuge(hpage)); > + if (PageTail(hpage)) > + return 0; VM_BUG_ON(!PageHuge(hpage) || PageTail(hpage)), can we? > + spin_lock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) s/_safe// can we? > + if (page == hpage) > + ret = 1; Can we bail out with ret set to be true? > + spin_unlock(&hugetlb_lock); > + return ret; > +} From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757181Ab3BYQ67 (ORCPT ); Mon, 25 Feb 2013 11:58:59 -0500 Received: from mx1.redhat.com ([209.132.183.28]:55511 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752168Ab3BYQ66 (ORCPT ); Mon, 25 Feb 2013 11:58:58 -0500 Date: Mon, 25 Feb 2013 11:57:56 -0500 From: Naoya Horiguchi To: Hillf Danton Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org, Michal Hocko Message-ID: <1361811476-la4ql3y2-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Hillf, On Sat, Feb 23, 2013 at 03:05:30PM +0800, Hillf Danton wrote: > Hello Naoya > > [add Michal in cc list] > > On Fri, Feb 22, 2013 at 3:41 AM, Naoya Horiguchi > wrote: > > > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. 
*/
> > +int is_hugepage_movable(struct page *hpage)
> s/int/bool/ can we?

Yes, we can. I'll do this.

> > +{
> > +	struct page *page;
> > +	struct page *tmp;
> > +	struct hstate *h = page_hstate(hpage);
> Make sense to compute hstate for a tail page?

No need to do this here. It's better to put it after the PageTail check.

> > +	int ret = 0;
> > +
> > +	VM_BUG_ON(!PageHuge(hpage));
> > +	if (PageTail(hpage))
> > +		return 0;
> VM_BUG_ON(!PageHuge(hpage) || PageTail(hpage)), can we?

I think that firing BUG_ON() for tail pages is overkill. The pfn range
over which scan_movable_pages() runs could start at a pfn inside the
hugepage when we try to hot-remove a memory block used by a 1GB hugepage.
In that case, is_hugepage_movable() can be called for tail pages as
normal behavior. But anyway, I'll add a comment for this corner case.

> > +	spin_lock(&hugetlb_lock);
> > +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> s/_safe// can we?

OK.

> > +		if (page == hpage)
> > +			ret = 1;
> Can we bail out with ret set to be true?

Yes, inserting break is good for performance.

> > +	spin_unlock(&hugetlb_lock);
> > +	return ret;
> > +}

Thank you!
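
[Illustration only, not part of the posted series.] The review points
agreed on in this reply (bool return type, plain list iteration, early
exit on a match, and an early return for tail pages instead of a BUG_ON)
can be pictured with a small userspace model. Everything named toy_* is
invented for this sketch, the lock is modelled as a counter, and the
hstate is passed in as a parameter rather than derived from the page, so
this is a rough picture of the control flow rather than kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct page and struct hstate; not the kernel types. */
struct toy_page {
	bool is_tail;            /* models PageTail() */
	struct toy_page *next;   /* models the lru link on the active list */
};

struct toy_hstate {
	struct toy_page *active; /* models hugepage_activelist */
	int lock_depth;          /* models hugetlb_lock: 0 means unlocked */
};

/* Returns true only for head pages found on the active (in-use) list. */
static bool toy_is_hugepage_movable(struct toy_hstate *h,
				    const struct toy_page *hpage)
{
	const struct toy_page *page;
	bool ret = false;

	/* Tail pages are a legitimate input, so return early, do not assert. */
	if (hpage->is_tail)
		return false;

	h->lock_depth++;         /* spin_lock() in the quoted code */
	for (page = h->active; page; page = page->next) {
		if (page == hpage) {
			ret = true;
			break;   /* stop scanning at the first match */
		}
	}
	h->lock_depth--;         /* spin_unlock() in the quoted code */
	return ret;
}
```

Using break rather than returning from inside the loop keeps a single
unlock path, which is the shape the quoted function already has.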
Naoya

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760321Ab3B0RGr (ORCPT ); Wed, 27 Feb 2013 12:06:47 -0500
Received: from mx1.redhat.com ([209.132.183.28]:42236 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750927Ab3B0RGq (ORCPT ); Wed, 27 Feb 2013 12:06:46 -0500
Date: Wed, 27 Feb 2013 12:06:27 -0500
From: Naoya Horiguchi
To: gong.chen@linux.intel.com
Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org
Message-ID: <1361984787-yx7rovrg-mutt-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <20130227072517.GA30971@gchen.bj.intel.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com> <20130227072517.GA30971@gchen.bj.intel.com>
Subject: Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Mutt-Fcc: ~/Maildir/sent/
User-Agent: Mutt 1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 27, 2013 at 02:25:17AM -0500, Chen Gong wrote:
> On Thu, Feb 21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote:
> > Date: Thu, 21 Feb 2013 14:41:42 -0500
...
> > diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c
> > index bc126f6..01e4676 100644
> > --- v3.8.orig/mm/memory-failure.c
> > +++ v3.8/mm/memory-failure.c
...
> > +		atomic_long_add(1<
> mce_bad_pages has been substituted by num_poisoned_pages.

This patchset is based on v3.8 (as shown in the diff header), where the
replacing patch "memory-failure: use num_poisoned_pages instead of
mce_bad_pages" is not merged yet. I'll rebase on v3.8-rc1 in the
next post.
Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760463Ab3B0RRN (ORCPT ); Wed, 27 Feb 2013 12:17:13 -0500 Received: from mx1.redhat.com ([209.132.183.28]:62482 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760426Ab3B0RRK (ORCPT ); Wed, 27 Feb 2013 12:17:10 -0500 Date: Wed, 27 Feb 2013 12:16:55 -0500 From: Naoya Horiguchi To: gong.chen@linux.intel.com Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1361985415-3tashl9l-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130227073604.GB30971@gchen.bj.intel.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <20130227073604.GB30971@gchen.bj.intel.com> Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <20130227073604.GB30971@gchen.bj.intel.com> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 27, 2013 at 02:36:04AM -0500, Chen Gong wrote: > On Thu, Feb 21, 2013 at 02:41:47PM -0500, Naoya Horiguchi wrote: ... > > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > > return 0; > > } > > > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. 
*/
> > +int is_hugepage_movable(struct page *hpage)
> > +{
> > +	struct page *page;
> > +	struct page *tmp;
> > +	struct hstate *h = page_hstate(hpage);
> > +	int ret = 0;
> > +
> > +	VM_BUG_ON(!PageHuge(hpage));
> > +	if (PageTail(hpage))
> > +		return 0;
> > +	spin_lock(&hugetlb_lock);
> > +	list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru)
> > +		if (page == hpage)
> > +			ret = 1;
>
> I don't understand the logic here. 1) page is not removed why tmp is used?
> 2) why hitting (page ==hpage) but not breaking from the loop?

For question 1), using list_for_each_entry_safe() was a remnant of trial
and error and will be fixed. And for question 2), I will add a break in a
later version.

Thanks,
Naoya

> > +	spin_unlock(&hugetlb_lock);
> > +	return ret;
> > +}
> > +
> > [...]

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760669Ab3B0R6Y (ORCPT ); Wed, 27 Feb 2013 12:58:24 -0500
Received: from mx1.redhat.com ([209.132.183.28]:51948 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760263Ab3B0R6K (ORCPT ); Wed, 27 Feb 2013 12:58:10 -0500
Date: Wed, 27 Feb 2013 12:57:57 -0500
From: Naoya Horiguchi
To: gong.chen@linux.intel.com
Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org
Message-ID: <1361987877-6x88p62s-mutt-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1361984787-yx7rovrg-mutt-n-horiguchi@ah.jp.nec.com>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-4-git-send-email-n-horiguchi@ah.jp.nec.com> <20130227072517.GA30971@gchen.bj.intel.com> <1361984787-yx7rovrg-mutt-n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page()
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Mutt-References:
<1361984787-yx7rovrg-mutt-n-horiguchi@ah.jp.nec.com> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 27, 2013 at 12:06:27PM -0500, Naoya Horiguchi wrote: > On Wed, Feb 27, 2013 at 02:25:17AM -0500, Chen Gong wrote: > > On Thu, Feb 21, 2013 at 02:41:42PM -0500, Naoya Horiguchi wrote: > > > Date: Thu, 21 Feb 2013 14:41:42 -0500 > ... > > > diff --git v3.8.orig/mm/memory-failure.c v3.8/mm/memory-failure.c > > > index bc126f6..01e4676 100644 > > > --- v3.8.orig/mm/memory-failure.c > > > +++ v3.8/mm/memory-failure.c > ... > > > + atomic_long_add(1< > > > mce_bad_pages has been substituted by num_poisoned_pages. > > This patchset is based on v3.8 (as show in diff header), where the > replacing patch "memory-failure: use num_poisoned_pages instead of > mce_bad_pages" is not merged yet. I'll rebase on v3.8-rc1 in the > next post. sorry, s/v3.8-rc1/v3.9-rc1/ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751700Ab3B1GC7 (ORCPT ); Thu, 28 Feb 2013 01:02:59 -0500 Received: from mail-ee0-f46.google.com ([74.125.83.46]:57928 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750775Ab3B1GC6 (ORCPT ); Thu, 28 Feb 2013 01:02:58 -0500 MIME-Version: 1.0 In-Reply-To: <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> From: KOSAKI Motohiro Date: Thu, 28 Feb 2013 01:02:37 -0500 X-Google-Sender-Auth: w6bLqmpydUB5QzzcewFK9UdWVpI Message-ID: Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable To: Naoya Horiguchi Cc: "linux-mm@kvack.org" , Andrew Morton , Mel Gorman , Hugh Dickins , Andi Kleen , LKML Content-Type: text/plain; charset=ISO-8859-1 Sender: 
linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > - { > - .procname = "hugepages_treat_as_movable", > - .data = &hugepages_treat_as_movable, > - .maxlen = sizeof(int), > - .mode = 0644, > - .proc_handler = hugetlb_treat_movable_handler, > - }, Sorry, no. This is too aggressive remove. Imagine, a lot of shell script don't have any error check. I suggest to keep this file but change to nop (to output warning is better). About 1-2 years after, we can remove this file safely. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932955Ab3B1SRL (ORCPT ); Thu, 28 Feb 2013 13:17:11 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45428 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932894Ab3B1SRH (ORCPT ); Thu, 28 Feb 2013 13:17:07 -0500 Date: Thu, 28 Feb 2013 13:16:52 -0500 From: Naoya Horiguchi To: KOSAKI Motohiro Cc: "linux-mm@kvack.org" , Andrew Morton , Mel Gorman , Hugh Dickins , Andi Kleen , LKML Message-ID: <1362075412-779292mh-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 28, 2013 at 01:02:37AM -0500, KOSAKI Motohiro wrote: > > - { > > - .procname = "hugepages_treat_as_movable", > > - .data = &hugepages_treat_as_movable, > > - .maxlen = sizeof(int), > > - .mode = 0644, > > - .proc_handler = hugetlb_treat_movable_handler, > > - }, > > Sorry, no. > > This is too aggressive remove. 
Imagine, a lot of shell script don't
> have any error check.

Sure, it could break userspace applications.

> I suggest to keep this file but change to nop (to output warning is better).
> About 1-2 years after, we can remove this file safely.

OK, so I'll leave it for a while with a comment saying that this
parameter is obsolete and shouldn't be used.

Thanks,
Naoya

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753484Ab3CROwG (ORCPT ); Mon, 18 Mar 2013 10:52:06 -0400
Received: from cantor2.suse.de ([195.135.220.15]:47988 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753312Ab3CROwE (ORCPT ); Mon, 18 Mar 2013 10:52:04 -0400
Date: Mon, 18 Mar 2013 15:51:59 +0100
From: Michal Hocko
To: Naoya Horiguchi
Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge()
Message-ID: <20130318145159.GP10192@dhcp22.suse.cz>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 21-02-13 14:41:40, Naoya Horiguchi wrote:
[...]
> diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c
> index 2fd8b4a..7d84f4c 100644
> --- v3.8.orig/mm/migrate.c
> +++ v3.8/mm/migrate.c
> @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  	pte_unmap_unlock(ptep, ptl);
>  }
>  
> +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd,
> +				unsigned long address)
> +{
> +	spinlock_t *ptl = pte_lockptr(mm, pmd);
> +	pte_t pte;
> +	swp_entry_t entry;
> +	struct page *page;
> +
> +	spin_lock(ptl);
> +	pte = huge_ptep_get((pte_t *)pmd);
> +	if (!is_hugetlb_entry_migration(pte))
> +		goto out;
> +	entry = pte_to_swp_entry(pte);
> +	page = migration_entry_to_page(entry);
> +	if (!get_page_unless_zero(page))
> +		goto out;
> +	spin_unlock(ptl);
> +	wait_on_page_locked(page);
> +	put_page(page);
> +	return;
> +out:
> +	spin_unlock(ptl);
> +}

This duplicates a lot of code from migration_entry_wait. Can we just
teach the generic one to be HugePage aware instead?
All it takes is just open-coding pte_offset_map_lock and calling
huge_ptep_get for HugePage and pte_offset_map otherwise.
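
[Illustration only, not from the thread.] One way to picture the
de-duplication suggested above is a shared helper that does the
check/pin/wait/unpin sequence, with thin wrappers that differ only in how
they locate the entry. The following is a userspace toy under stated
assumptions: every toy_* name is invented here, the page table lock is a
counter, and waiting is modelled by bumping a counter. It is a sketch of
the structure, not of any real kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real kernel definitions. */
struct toy_page { int refcount; int waits; };
struct toy_lock { int held; };
/* A NULL pointer models "this entry is not a migration entry". */
typedef struct { struct toy_page *migrating; } toy_pte_t;

static void toy_lock_acquire(struct toy_lock *l) { l->held++; }
static void toy_lock_release(struct toy_lock *l) { l->held--; }

/*
 * Shared tail of both wait paths. The caller has already taken ptl and
 * located the entry; this helper always drops ptl before returning.
 * Returns true if it actually waited on the page.
 */
static bool toy_entry_wait_common(const toy_pte_t *ptep, struct toy_lock *ptl)
{
	struct toy_page *page = ptep->migrating;

	/* Models the migration-entry check and get_page_unless_zero(). */
	if (!page || page->refcount == 0) {
		toy_lock_release(ptl);
		return false;
	}
	page->refcount++;
	toy_lock_release(ptl);
	page->waits++;      /* models wait_on_page_locked() */
	page->refcount--;   /* models put_page() */
	return true;
}

/* Normal pages: the entry is found by indexing into a table. */
static bool toy_entry_wait(const toy_pte_t *table, size_t index,
			   struct toy_lock *ptl)
{
	toy_lock_acquire(ptl);
	return toy_entry_wait_common(&table[index], ptl);
}

/* Hugepages: the pointer handed in already is the entry. */
static bool toy_entry_wait_huge(const toy_pte_t *hugepte, struct toy_lock *ptl)
{
	toy_lock_acquire(ptl);
	return toy_entry_wait_common(hugepte, ptl);
}
```

The point of the split is that only the lookup differs between the two
paths, so the lock release and reference handling live in one place.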
-- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753936Ab3CRPW2 (ORCPT ); Mon, 18 Mar 2013 11:22:28 -0400 Received: from cantor2.suse.de ([195.135.220.15]:49434 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753915Ab3CRPWZ (ORCPT ); Mon, 18 Mar 2013 11:22:25 -0400 Date: Mon, 18 Mar 2013 16:22:24 +0100 From: Michal Hocko To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/9] migrate: make core migration code aware of hugepage Message-ID: <20130318152224.GQ10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote: [...] > diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h > index 0d7df39..2e475b5 100644 > --- v3.8.orig/include/linux/mempolicy.h > +++ v3.8/include/linux/mempolicy.h > @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); > /* Check if a vma is migratable */ > static inline int vma_migratable(struct vm_area_struct *vma) > { > - if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP)) > + if (vma->vm_flags & (VM_IO | VM_PFNMAP)) > return 0; Is this safe? At least check_*_range don't seem to be hugetlb aware. 
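
[Illustration only, not from the thread.] The concern raised above is
about what the one-line mask change lets through. A toy version of the
flag test makes the effect concrete. The TOY_VM_* bit values are
placeholders chosen for this sketch, not the kernel's definitions, and
the zone check that follows in the real vma_migratable() is left out.

```c
#include <stdbool.h>

/* Placeholder flag bits for illustration; not the kernel's values. */
#define TOY_VM_IO      0x1UL
#define TOY_VM_PFNMAP  0x2UL
#define TOY_VM_HUGETLB 0x4UL

/* Shape of the check before the patch: hugetlb VMAs are rejected. */
static bool toy_vma_migratable_old(unsigned long vm_flags)
{
	return !(vm_flags & (TOY_VM_IO | TOY_VM_HUGETLB | TOY_VM_PFNMAP));
}

/*
 * Shape of the check after the patch: hugetlb VMAs pass, so every
 * caller that walks such a VMA must be able to handle hugepages.
 */
static bool toy_vma_migratable_new(unsigned long vm_flags)
{
	return !(vm_flags & (TOY_VM_IO | TOY_VM_PFNMAP));
}
```

A VMA carrying only the hugetlb bit is refused by the old check and
accepted by the new one, which is why the ordering of the patches that
teach the range walkers about hugepages matters.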
-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753799Ab3CRPdE (ORCPT ); Mon, 18 Mar 2013 11:33:04 -0400
Received: from cantor2.suse.de ([195.135.220.15]:49896 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753409Ab3CRPdC (ORCPT ); Mon, 18 Mar 2013 11:33:02 -0400
Date: Mon, 18 Mar 2013 16:33:00 +0100
From: Michal Hocko
To: Naoya Horiguchi
Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/9] migrate: make core migration code aware of hugepage
Message-ID: <20130318153300.GR10192@dhcp22.suse.cz>
References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318152224.GQ10192@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130318152224.GQ10192@dhcp22.suse.cz>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 18-03-13 16:22:24, Michal Hocko wrote:
> On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote:
> [...]
> > diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h
> > index 0d7df39..2e475b5 100644
> > --- v3.8.orig/include/linux/mempolicy.h
> > +++ v3.8/include/linux/mempolicy.h
> > @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
> >  /* Check if a vma is migratable */
> >  static inline int vma_migratable(struct vm_area_struct *vma)
> >  {
> > -	if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
> > +	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
> >  		return 0;
>
> Is this safe? At least check_*_range don't seem to be hugetlb aware.

Ohh, they become hugetlb aware in 5/9. Should that one be reordered then?
-- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753862Ab3CRPlA (ORCPT ); Mon, 18 Mar 2013 11:41:00 -0400 Received: from cantor2.suse.de ([195.135.220.15]:50413 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752598Ab3CRPk7 (ORCPT ); Mon, 18 Mar 2013 11:40:59 -0400 Date: Mon, 18 Mar 2013 16:40:57 +0100 From: Michal Hocko To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Message-ID: <20130318154057.GS10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: > This patch extends check_range() to handle vma with VM_HUGETLB set. > With this changes, we can migrate hugepage with migrate_pages(2). > Note that for larger hugepages (covered by pud entries, 1GB for > x86_64 for example), we simply skip it now. 
> > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 6 ++++-- > mm/hugetlb.c | 10 ++++++++++ > mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ > 3 files changed, 48 insertions(+), 14 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 8f87115..eb33df5 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); > int dequeue_hwpoisoned_huge_page(struct page *page); > void putback_active_hugepage(struct page *page); > void putback_active_hugepages(struct list_head *l); > +void migrate_hugepage_add(struct page *page, struct list_head *list); > void copy_huge_page(struct page *dst, struct page *src); > > extern unsigned long hugepages_treat_as_movable; > @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, > pmd_t *pmd, int write); > struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, > pud_t *pud, int write); > -int pmd_huge(pmd_t pmd); > -int pud_huge(pud_t pmd); > +extern int pmd_huge(pmd_t pmd); > +extern int pud_huge(pud_t pmd); extern is not needed here. 
> unsigned long hugetlb_change_protection(struct vm_area_struct *vma, > unsigned long address, unsigned long end, pgprot_t newprot); > > @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > > #define putback_active_hugepage(p) 0 > #define putback_active_hugepages(l) 0 > +#define migrate_hugepage_add(p, l) 0 > static inline void copy_huge_page(struct page *dst, struct page *src) > { > } > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index cb9d43b8..86ffcb7 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) > list_for_each_entry_safe(page, page2, l, lru) > putback_active_hugepage(page); > } > + > +void migrate_hugepage_add(struct page *page, struct list_head *list) > +{ > + VM_BUG_ON(!PageHuge(page)); > + get_page(page); > + spin_lock(&hugetlb_lock); Why hugetlb_lock? Comment for this lock says that it protects hugepage_freelists, nr_huge_pages, and free_huge_pages. > + list_move_tail(&page->lru, list); > + spin_unlock(&hugetlb_lock); > + return; > +} > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > index e2df1c1..8627135 100644 > --- v3.8.orig/mm/mempolicy.c > +++ v3.8/mm/mempolicy.c > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > return addr != end; > } > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > + const nodemask_t *nodes, unsigned long flags, > + void *private) > +{ > +#ifdef CONFIG_HUGETLB_PAGE > + int nid; > + struct page *page; > + > + spin_lock(&vma->vm_mm->page_table_lock); > + page = pte_page(huge_ptep_get((pte_t *)pmd)); > + spin_unlock(&vma->vm_mm->page_table_lock); I am a bit confused why page_table_lock is used here and why it doesn't cover the page usage. 
> + nid = page_to_nid(page); > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) > + || flags & MPOL_MF_MOVE_ALL)) > + migrate_hugepage_add(page, private); > +#else > + BUG(); > +#endif > +} > + > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > unsigned long addr, unsigned long end, > const nodemask_t *nodes, unsigned long flags, > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > pmd = pmd_offset(pud, addr); > do { > next = pmd_addr_end(addr, end); > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() sufficient? > + check_hugetlb_pmd_range(vma, pmd, nodes, > + flags, private); > + continue; > + } > split_huge_page_pmd(vma, addr, pmd); > if (pmd_none_or_trans_huge_or_clear_bad(pmd)) > continue; [...] -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753805Ab3CRPv2 (ORCPT ); Mon, 18 Mar 2013 11:51:28 -0400 Received: from cantor2.suse.de ([195.135.220.15]:50888 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752567Ab3CRPv1 (ORCPT ); Mon, 18 Mar 2013 11:51:27 -0400 Date: Mon, 18 Mar 2013 16:51:25 +0100 From: Michal Hocko To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Message-ID: <20130318155125.GT10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: 
linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:48, Naoya Horiguchi wrote: > Now hugepages are definitely movable. So allocating hugepages from > ZONE_MOVABLE is natural and we have no reason to keep this parameter. The sysctl is a part of user interface so you shouldn't remove it right away. What we can do is to make it noop and only WARN() that the interface will be removed later so that userspace can prepare for that. > Signed-off-by: Naoya Horiguchi > --- > Documentation/sysctl/vm.txt | 16 ---------------- > include/linux/hugetlb.h | 2 -- > kernel/sysctl.c | 7 ------- > mm/hugetlb.c | 23 +++++------------------ > 4 files changed, 5 insertions(+), 43 deletions(-) > > diff --git v3.8.orig/Documentation/sysctl/vm.txt v3.8/Documentation/sysctl/vm.txt > index 078701f..997350a 100644 > --- v3.8.orig/Documentation/sysctl/vm.txt > +++ v3.8/Documentation/sysctl/vm.txt > @@ -167,22 +167,6 @@ fragmentation index is <= extfrag_threshold. The default value is 500. > > ============================================================== > > -hugepages_treat_as_movable > - > -This parameter is only useful when kernelcore= is specified at boot time to > -create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages > -are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero > -value written to hugepages_treat_as_movable allows huge pages to be allocated > -from ZONE_MOVABLE. > - > -Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge > -pages pool can easily grow or shrink within. Assuming that applications are > -not running that mlock() a lot of memory, it is likely the huge pages pool > -can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value > -into nr_hugepages and triggering page reclaim. 
> - > -============================================================== > - > hugetlb_shm_group > > hugetlb_shm_group contains group id that is allowed to create SysV > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index e33f07f..c97e5c5 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -35,7 +35,6 @@ int PageHuge(struct page *page); > void reset_vma_resv_huge_pages(struct vm_area_struct *vma); > int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > -int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > > #ifdef CONFIG_NUMA > int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, > @@ -73,7 +72,6 @@ void migrate_hugepage_add(struct page *page, struct list_head *list); > int is_hugepage_movable(struct page *page); > void copy_huge_page(struct page *dst, struct page *src); > > -extern unsigned long hugepages_treat_as_movable; > extern const unsigned long hugetlb_zero, hugetlb_infinity; > extern int sysctl_hugetlb_shm_group; > extern struct list_head huge_boot_pages; > diff --git v3.8.orig/kernel/sysctl.c v3.8/kernel/sysctl.c > index c88878d..a98bcf2 100644 > --- v3.8.orig/kernel/sysctl.c > +++ v3.8/kernel/sysctl.c > @@ -1189,13 +1189,6 @@ static struct ctl_table vm_table[] = { > .mode = 0644, > .proc_handler = proc_dointvec, > }, > - { > - .procname = "hugepages_treat_as_movable", > - .data = &hugepages_treat_as_movable, > - .maxlen = sizeof(int), > - .mode = 0644, > - .proc_handler = hugetlb_treat_movable_handler, > - }, > { > .procname = "nr_overcommit_hugepages", > .data = NULL, > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index c28e6c9..c60d203 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -33,7 +33,6 @@ > #include "internal.h" > > const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; 
> -static gfp_t htlb_alloc_mask = GFP_HIGHUSER; > unsigned long hugepages_treat_as_movable; > > int hugetlb_max_hstate __read_mostly; > @@ -542,7 +541,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, > retry_cpuset: > cpuset_mems_cookie = get_mems_allowed(); > zonelist = huge_zonelist(vma, address, > - htlb_alloc_mask, &mpol, &nodemask); > + GFP_HIGHUSER_MOVABLE, &mpol, &nodemask); > /* > * A child process with MAP_PRIVATE mappings created by their parent > * have no page reserves. This check ensures that reservations are > @@ -558,7 +557,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, > > for_each_zone_zonelist_nodemask(zone, z, zonelist, > MAX_NR_ZONES - 1, nodemask) { > - if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) { > + if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) { > page = dequeue_huge_page_node(h, zone_to_nid(zone)); > if (page) { > if (!avoid_reserve) > @@ -698,7 +697,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid) > return NULL; > > page = alloc_pages_exact_node(nid, > - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| > + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| > __GFP_REPEAT|__GFP_NOWARN, > huge_page_order(h)); > if (page) { > @@ -909,12 +908,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > spin_unlock(&hugetlb_lock); > > if (nid == NUMA_NO_NODE) > - page = alloc_pages(htlb_alloc_mask|__GFP_COMP| > + page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP| > __GFP_REPEAT|__GFP_NOWARN, > huge_page_order(h)); > else > page = alloc_pages_exact_node(nid, > - htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE| > + GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE| > __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); > > if (page && arch_prepare_hugepage(page)) { > @@ -2078,18 +2077,6 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *table, int write, > } > #endif /* CONFIG_NUMA */ > > -int hugetlb_treat_movable_handler(struct ctl_table *table, int 
write, > - void __user *buffer, > - size_t *length, loff_t *ppos) > -{ > - proc_dointvec(table, write, buffer, length, ppos); > - if (hugepages_treat_as_movable) > - htlb_alloc_mask = GFP_HIGHUSER_MOVABLE; > - else > - htlb_alloc_mask = GFP_HIGHUSER; > - return 0; > -} > - > int hugetlb_overcommit_handler(struct ctl_table *table, int write, > void __user *buffer, > size_t *length, loff_t *ppos) > -- > 1.7.11.7 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754065Ab3CRQHm (ORCPT ); Mon, 18 Mar 2013 12:07:42 -0400 Received: from cantor2.suse.de ([195.135.220.15]:51865 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752894Ab3CRQHl (ORCPT ); Mon, 18 Mar 2013 12:07:41 -0400 Date: Mon, 18 Mar 2013 17:07:37 +0100 From: Michal Hocko To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Message-ID: <20130318160737.GU10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote: > Currently we can't offline memory blocks which contain hugepages because > a hugepage is considered as 
an unmovable page. But now with this patch > series, a hugepage has become movable, so by using hugepage migration we > can offline such memory blocks. > > What's different from other users of hugepage migration is that we need > to decompose all the hugepages inside the target memory block into free > buddy pages after hugepage migration, because otherwise free hugepages > remaining in the memory block intervene the memory offlining. > For this reason we introduce new functions dissolve_free_huge_page() and > dissolve_free_huge_pages(). > > Other than that, what this patch does is straightforwardly to add hugepage > migration code, that is, adding hugepage code to the functions which scan > over pfn and collect hugepages to be migrated, and adding a hugepage > allocation function to alloc_migrate_target(). > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > over them because it's larger than memory block. So we now simply leave > it to fail as it is. What we could do is check whether there is a free 1GB hugepage on another node and migrate there.
> Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 8 ++++++++ > mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ > mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- > mm/migrate.c | 12 +++++++++++- > mm/page_alloc.c | 12 ++++++++++++ > mm/page_isolation.c | 5 +++++ > 6 files changed, 121 insertions(+), 10 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 86a4d78..e33f07f 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); > void putback_active_hugepage(struct page *page); > void putback_active_hugepages(struct list_head *l); > void migrate_hugepage_add(struct page *page, struct list_head *list); > +int is_hugepage_movable(struct page *page); > void copy_huge_page(struct page *dst, struct page *src); > > extern unsigned long hugepages_treat_as_movable; > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > #define putback_active_hugepage(p) 0 > #define putback_active_hugepages(l) 0 > #define migrate_hugepage_add(p, l) 0 > +#define is_hugepage_movable(x) 0 > static inline void copy_huge_page(struct page *dst, struct page *src) > { > } > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) > return h - hstates; > } > > +extern void dissolve_free_huge_page(struct page *page); > +extern void dissolve_free_huge_pages(unsigned long start_pfn, > + unsigned long end_pfn); > + > #else > struct hstate {}; > #define alloc_huge_page(v, a, r) NULL > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > } > #define hstate_index_to_shift(index) 0 > #define hstate_index(h) 0 > +#define dissolve_free_huge_page(p) 0 > +#define dissolve_free_huge_pages(s, e) 0 > #endif > > #endif /* _LINUX_HUGETLB_H */ > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index ccf9995..c28e6c9 100644 > --- 
v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > return ret; > } > > +/* Dissolve a given free hugepage into free pages. */ > +void dissolve_free_huge_page(struct page *page) > +{ > + if (PageHuge(page) && !page_count(page)) { Could you clarify why you are checking page_count here? I assume it is to make sure the page is free, but what prevents the count from being increased before you take hugetlb_lock? > + struct hstate *h = page_hstate(page); > + int nid = page_to_nid(page); > + spin_lock(&hugetlb_lock); > + list_del(&page->lru); > + h->free_huge_pages--; > + h->free_huge_pages_node[nid]--; > + update_and_free_page(h, page); > + spin_unlock(&hugetlb_lock); > + } > +} > + > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > +{ > + unsigned long pfn; > + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); Hugetlb pages can be present in different sizes, so this doesn't work in general. You need to get the order from page_hstate. > + for (pfn = start_pfn; pfn < end_pfn; pfn += step) > + dissolve_free_huge_page(pfn_to_page(pfn)); > +} > + > static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > { > struct page *page; > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > return 0; > } > > +/* Returns true for head pages of in-use hugepages, otherwise returns false.
*/ > +int is_hugepage_movable(struct page *hpage) > +{ > + struct page *page; > + struct page *tmp; > + struct hstate *h = page_hstate(hpage); > + int ret = 0; > + > + VM_BUG_ON(!PageHuge(hpage)); > + if (PageTail(hpage)) > + return 0; > + spin_lock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > + if (page == hpage) > + ret = 1; > + spin_unlock(&hugetlb_lock); > + return ret; > +} > + > /* > * This function is called from memory failure code. > * Assume the caller holds page lock of the head page. > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > index d04ed87..6418de2 100644 > --- v3.8.orig/mm/memory_hotplug.c > +++ v3.8/mm/memory_hotplug.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #include > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > } > > /* > - * Scanning pfn is much easier than scanning lru list. > - * Scan pfn from start to end and Find LRU page. > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > + * and hugepages). We scan pfn because it's much easier than scanning over > + * linked list. This function returns the pfn of the first found movable > + * page if it's found, otherwise 0. > */ > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > { > unsigned long pfn; > struct page *page; > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > page = pfn_to_page(pfn); > if (PageLRU(page)) > return pfn; > + if (PageHuge(page)) { > + if (is_hugepage_movable(page)) > + return pfn; > + else > + pfn += (1 << compound_order(page)) - 1; > + } scan_lru_pages's name gets really confusing after this change because hugetlb pages are not on the LRU. Maybe it would be good to rename it. 
> } > } > return 0; > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > page = pfn_to_page(pfn); > if (!get_page_unless_zero(page)) > continue; All tail pages have 0 reference count (according to prep_compound_page) so they would be skipped anyway. This makes the below pfn tweaks pointless. > + if (PageHuge(page)) { > + /* > + * Larger hugepage (1GB for x86_64) is larger than > + * memory block, so pfn scan can start at the tail > + * page of larger hugepage. In such case, > + * we simply skip the hugepage and move the cursor > + * to the last tail page. > + */ > + if (PageTail(page)) { > + struct page *head = compound_head(page); > + pfn = page_to_pfn(head) + > + (1 << compound_order(head)) - 1; > + put_page(page); > + continue; > + } > + pfn = (1 << compound_order(page)) - 1; > + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { > + put_page(page); > + continue; > + } There might be other hugepage sizes which fit into memblock so this test doesn't seem right. > + list_move_tail(&page->lru, &source); > + move_pages -= 1 << compound_order(page); > + continue; > + } > /* > * We can skip free pages. And we can only deal with pages on > * LRU. [...] 
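To make the size-related comments in this review concrete (the fixed HUGETLB_PAGE_ORDER step and the PMD_SIZE test), here is a minimal userspace sketch, not kernel code. It assumes a hypothetical struct model_page array standing in for the memmap, and a made-up helper count_free_huge_pages() that walks a pfn range by each hugepage's own order instead of one fixed step. A real version would have to take the order from page_hstate() and deal with locking, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct model_page {
	int is_huge_head;	/* 1 if this pfn is the head of a hugepage */
	int is_free;		/* 1 if that hugepage is currently free */
	unsigned int order;	/* hugepage order, valid for head pages only */
};

/*
 * Count free hugepages whose head lies in [start_pfn, end_pfn).
 * The cursor advances by the size of the hugepage it just saw, so
 * hugepages of different orders in the same range are all visited.
 */
size_t count_free_huge_pages(const struct model_page *map,
			     size_t start_pfn, size_t end_pfn)
{
	size_t pfn = start_pfn;
	size_t found = 0;

	assert(map != NULL);
	assert(start_pfn <= end_pfn);

	while (pfn < end_pfn) {
		const struct model_page *p = &map[pfn];

		if (p->is_huge_head) {
			if (p->is_free)
				found++;
			/* Skip over the tail pages of this hugepage. */
			pfn += (size_t)1 << p->order;
		} else {
			pfn++;
		}
	}
	return found;
}
```

In a map with a free order-2 hugepage at pfn 0, an in-use order-1 hugepage at pfn 4 and a free order-3 hugepage at pfn 6, this walk sees both free hugepages, while a loop with a fixed step of 4 would only ever look at pfns 0, 4, 8 and 12 and miss the one at pfn 6.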
-- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756916Ab3CSAHW (ORCPT ); Mon, 18 Mar 2013 20:07:22 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40688 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753836Ab3CSAHU (ORCPT ); Mon, 18 Mar 2013 20:07:20 -0400 Date: Mon, 18 Mar 2013 20:06:23 -0400 From: Naoya Horiguchi To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363651583-dzi7mg86-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318145159.GP10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318145159.GP10192@dhcp22.suse.cz> Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 03:51:59PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:40, Naoya Horiguchi wrote: > [...] 
> > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c > > index 2fd8b4a..7d84f4c 100644 > > --- v3.8.orig/mm/migrate.c > > +++ v3.8/mm/migrate.c > > @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > > pte_unmap_unlock(ptep, ptl); > > } > > > > +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > > + unsigned long address) > > +{ > > + spinlock_t *ptl = pte_lockptr(mm, pmd); > > + pte_t pte; > > + swp_entry_t entry; > > + struct page *page; > > + > > + spin_lock(ptl); > > + pte = huge_ptep_get((pte_t *)pmd); > > + if (!is_hugetlb_entry_migration(pte)) > > + goto out; > > + entry = pte_to_swp_entry(pte); > > + page = migration_entry_to_page(entry); > > + if (!get_page_unless_zero(page)) > > + goto out; > > + spin_unlock(ptl); > > + wait_on_page_locked(page); > > + put_page(page); > > + return; > > +out: > > + spin_unlock(ptl); > > +} > > This duplicates a lot of code from migration_entry_wait. Can we just > teach the generic one to be HugePage aware instead? > All it takes is just opencoding pte_offset_map_lock and calling > huge_ptep_get for HugePage and pte_offset_map otherwise. Yes, it's possible with some cleanup. I'll do this.
Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932368Ab3CSAHd (ORCPT ); Mon, 18 Mar 2013 20:07:33 -0400 Received: from mx1.redhat.com ([209.132.183.28]:4851 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932306Ab3CSAH2 (ORCPT ); Mon, 18 Mar 2013 20:07:28 -0400 Date: Mon, 18 Mar 2013 20:06:35 -0400 From: Naoya Horiguchi To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363651595-ewr7efx1-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318153300.GR10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-3-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318152224.GQ10192@dhcp22.suse.cz> <20130318153300.GR10192@dhcp22.suse.cz> Subject: Re: [PATCH 2/9] migrate: make core migration code aware of hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <20130318153300.GR10192@dhcp22.suse.cz> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 04:33:00PM +0100, Michal Hocko wrote: > On Mon 18-03-13 16:22:24, Michal Hocko wrote: > > On Thu 21-02-13 14:41:41, Naoya Horiguchi wrote: > > [...] 
> > > diff --git v3.8.orig/include/linux/mempolicy.h v3.8/include/linux/mempolicy.h > > > index 0d7df39..2e475b5 100644 > > > --- v3.8.orig/include/linux/mempolicy.h > > > +++ v3.8/include/linux/mempolicy.h > > > @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); > > > /* Check if a vma is migratable */ > > > static inline int vma_migratable(struct vm_area_struct *vma) > > > { > > > - if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP)) > > > + if (vma->vm_flags & (VM_IO | VM_PFNMAP)) > > > return 0; > > > > Is this safe? At least check_*_range don't seem to be hugetlb aware. > > Ohh, they become in 5/9. Should that one be reordered then? OK, I'll shift this change after 5/9 patch. Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932483Ab3CSAIL (ORCPT ); Mon, 18 Mar 2013 20:08:11 -0400 Received: from mx1.redhat.com ([209.132.183.28]:7585 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932378Ab3CSAIJ (ORCPT ); Mon, 18 Mar 2013 20:08:09 -0400 Date: Mon, 18 Mar 2013 20:07:16 -0400 From: Naoya Horiguchi To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318154057.GS10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <20130318154057.GS10192@dhcp22.suse.cz> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: > > This patch extends check_range() to handle vma with VM_HUGETLB set. > > With this changes, we can migrate hugepage with migrate_pages(2). > > Note that for larger hugepages (covered by pud entries, 1GB for > > x86_64 for example), we simply skip it now. > > > > Signed-off-by: Naoya Horiguchi > > --- > > include/linux/hugetlb.h | 6 ++++-- > > mm/hugetlb.c | 10 ++++++++++ > > mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ > > 3 files changed, 48 insertions(+), 14 deletions(-) > > > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > > index 8f87115..eb33df5 100644 > > --- v3.8.orig/include/linux/hugetlb.h > > +++ v3.8/include/linux/hugetlb.h > > @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); > > int dequeue_hwpoisoned_huge_page(struct page *page); > > void putback_active_hugepage(struct page *page); > > void putback_active_hugepages(struct list_head *l); > > +void migrate_hugepage_add(struct page *page, struct list_head *list); > > void copy_huge_page(struct page *dst, struct page *src); > > > > extern unsigned long hugepages_treat_as_movable; > > @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, > > pmd_t *pmd, int write); > > struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, > > pud_t *pud, int write); > > -int pmd_huge(pmd_t pmd); > > -int pud_huge(pud_t pmd); > > +extern int pmd_huge(pmd_t pmd); > > +extern int pud_huge(pud_t pmd); > > extern is not needed here. OK. 
> > unsigned long hugetlb_change_protection(struct vm_area_struct *vma, > > unsigned long address, unsigned long end, pgprot_t newprot); > > > > @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > > > > #define putback_active_hugepage(p) 0 > > #define putback_active_hugepages(l) 0 > > +#define migrate_hugepage_add(p, l) 0 > > static inline void copy_huge_page(struct page *dst, struct page *src) > > { > > } > > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > > index cb9d43b8..86ffcb7 100644 > > --- v3.8.orig/mm/hugetlb.c > > +++ v3.8/mm/hugetlb.c > > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) > > list_for_each_entry_safe(page, page2, l, lru) > > putback_active_hugepage(page); > > } > > + > > +void migrate_hugepage_add(struct page *page, struct list_head *list) > > +{ > > + VM_BUG_ON(!PageHuge(page)); > > + get_page(page); > > + spin_lock(&hugetlb_lock); > > Why hugetlb_lock? Comment for this lock says that it protects > hugepage_freelists, nr_huge_pages, and free_huge_pages. I think that this comment is out of date and hugepage_activelists, which was introduced recently, should be protected because this patchset adds is_hugepage_movable() which runs through the list. So I'll update the comment in the next post. 
> > + list_move_tail(&page->lru, list); > > + spin_unlock(&hugetlb_lock); > > + return; > > +} > > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > > index e2df1c1..8627135 100644 > > --- v3.8.orig/mm/mempolicy.c > > +++ v3.8/mm/mempolicy.c > > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > > return addr != end; > > } > > > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > > + const nodemask_t *nodes, unsigned long flags, > > + void *private) > > +{ > > +#ifdef CONFIG_HUGETLB_PAGE > > + int nid; > > + struct page *page; > > + > > + spin_lock(&vma->vm_mm->page_table_lock); > > + page = pte_page(huge_ptep_get((pte_t *)pmd)); > > + spin_unlock(&vma->vm_mm->page_table_lock); > > I am a bit confused why page_table_lock is used here and why it doesn't > cover the page usage. I expected this function to do the same for pmd as check_pte_range() does for pte, but the above code didn't do it. I should've put spin_unlock below migrate_hugepage_add(). Sorry for the confusion. > > + nid = page_to_nid(page); > > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) > > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) > > + || flags & MPOL_MF_MOVE_ALL)) > > + migrate_hugepage_add(page, private); > > +#else > > + BUG(); > > +#endif > > +} > > + > > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > unsigned long addr, unsigned long end, > > const nodemask_t *nodes, unsigned long flags, > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > pmd = pmd_offset(pud, addr); > > do { > > next = pmd_addr_end(addr, end); > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() > sufficient? I think we need both checks here, because if we use only pmd_huge(), a pmd for THP wrongly goes into this branch.
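The point about needing both tests can be sketched in userspace C as follows. This is only a model, not kernel code: it assumes, as the reply above argues, that the pmd-level test is true for any large mapping (hugetlbfs or THP) and that only the VMA flag tells the two apart. The bit values and the names thp_model_vma, model_pmd_huge(), model_is_vm_hugetlb_page() and takes_hugetlb_branch() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bits for this model only; not the kernel's encodings. */
#define MODEL_PMD_LARGE   0x1u	/* pmd maps a large page (hugetlbfs or THP) */
#define MODEL_VMA_HUGETLB 0x1u	/* VMA is a hugetlbfs mapping */

/* Hypothetical stand-in for struct vm_area_struct. */
struct thp_model_vma {
	unsigned int vm_flags;
};

/* Models pmd_huge(): true for any large pmd, THP included. */
static int model_pmd_huge(unsigned int pmd)
{
	return (pmd & MODEL_PMD_LARGE) != 0;
}

/* Models is_vm_hugetlb_page(): true only for hugetlbfs VMAs. */
static int model_is_vm_hugetlb_page(const struct thp_model_vma *vma)
{
	return (vma->vm_flags & MODEL_VMA_HUGETLB) != 0;
}

/* The combined test from the quoted hunk: both conditions must hold. */
int takes_hugetlb_branch(unsigned int pmd, const struct thp_model_vma *vma)
{
	assert(vma != NULL);
	return model_pmd_huge(pmd) && model_is_vm_hugetlb_page(vma);
}
```

In this model a large pmd in a non-hugetlb VMA (the THP case) does not take the hugetlb branch and falls through to the split path, which is the behaviour the combined test is meant to keep.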
Thanks, Naoya > > + check_hugetlb_pmd_range(vma, pmd, nodes, > > + flags, private); > > + continue; > > + } > > split_huge_page_pmd(vma, addr, pmd); > > if (pmd_none_or_trans_huge_or_clear_bad(pmd)) > > continue; > [...] > -- > Michal Hocko > SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932584Ab3CSAIV (ORCPT ); Mon, 18 Mar 2013 20:08:21 -0400 Received: from mx1.redhat.com ([209.132.183.28]:7274 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932378Ab3CSAIT (ORCPT ); Mon, 18 Mar 2013 20:08:19 -0400 Date: Mon, 18 Mar 2013 20:07:32 -0400 From: Naoya Horiguchi To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363651652-dcf5qvg4-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318155125.GT10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-10-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318155125.GT10192@dhcp22.suse.cz> Subject: Re: [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <20130318155125.GT10192@dhcp22.suse.cz> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 04:51:25PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:48, Naoya Horiguchi wrote: > > Now hugepages are definitely movable. So allocating hugepages from > > ZONE_MOVABLE is natural and we have no reason to keep this parameter. > > The sysctl is a part of user interface so you shouldn't remove it right > away. 
What we can do is to make it noop and only WARN() that the > interface will be removed later so that userspace can prepare for that. > Yes, you're right. I'll replace the handler with noop. Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934063Ab3CSHLT (ORCPT ); Tue, 19 Mar 2013 03:11:19 -0400 Received: from mail-we0-f169.google.com ([74.125.82.169]:58180 "EHLO mail-we0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932866Ab3CSHLS (ORCPT ); Tue, 19 Mar 2013 03:11:18 -0400 Date: Tue, 19 Mar 2013 08:11:13 +0100 From: Michal Hocko To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Message-ID: <20130319071113.GD5112@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote: > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: [...] 
> > > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) > > > list_for_each_entry_safe(page, page2, l, lru) > > > putback_active_hugepage(page); > > > } > > > + > > > +void migrate_hugepage_add(struct page *page, struct list_head *list) > > > +{ > > > + VM_BUG_ON(!PageHuge(page)); > > > + get_page(page); > > > + spin_lock(&hugetlb_lock); > > > > Why hugetlb_lock? Comment for this lock says that it protects > > hugepage_freelists, nr_huge_pages, and free_huge_pages. > > I think that this comment is out of date and hugepage_activelists, > which was introduced recently, should be protected because this > patchset adds is_hugepage_movable() which runs through the list. > So I'll update the comment in the next post. > > > > + list_move_tail(&page->lru, list); > > > + spin_unlock(&hugetlb_lock); > > > + return; > > > +} > > > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > > > index e2df1c1..8627135 100644 > > > --- v3.8.orig/mm/mempolicy.c > > > +++ v3.8/mm/mempolicy.c > > > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > > > return addr != end; > > > } > > > > > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > > > + const nodemask_t *nodes, unsigned long flags, > > > + void *private) > > > +{ > > > +#ifdef CONFIG_HUGETLB_PAGE > > > + int nid; > > > + struct page *page; > > > + > > > + spin_lock(&vma->vm_mm->page_table_lock); > > > + page = pte_page(huge_ptep_get((pte_t *)pmd)); > > > + spin_unlock(&vma->vm_mm->page_table_lock); > > > > I am a bit confused why page_table_lock is used here and why it doesn't > > cover the page usage. > > I expected this function to do the same for pmd as check_pte_range() does > for pte, but the above code didn't do it. I should've put spin_unlock > below migrate_hugepage_add(). Sorry for the confusion. OK, I see. So you want to prevent from racing with pmd unmap. 
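The locking point raised in this exchange, that the page lookup and the later use of the page belong inside the same critical section, can be sketched in userspace C. This is only an illustrative model under stated assumptions: struct model_mm, model_mm_init(), model_lock(), model_unlock() and model_check_and_queue() are hypothetical names, and a C11 atomic_flag stands in for mm->page_table_lock. It is not kernel code and not the patch author's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not kernel APIs. */
struct model_mm {
	atomic_flag page_table_lock;	/* stands in for mm->page_table_lock */
	int *mapped_page;		/* page the pmd maps, NULL once unmapped */
};

static void model_mm_init(struct model_mm *mm, int *page)
{
	atomic_flag_clear(&mm->page_table_lock);
	mm->mapped_page = page;
}

/* Spin until the flag is acquired. */
static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l)) {
	}
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/*
 * Look up the mapped page and queue it while still holding the lock,
 * so that an unmap path taking the same lock cannot run in between.
 * Returns 1 if a page was queued, 0 otherwise.
 */
static int model_check_and_queue(struct model_mm *mm, int **queue,
				 size_t *queued)
{
	int added = 0;

	assert(mm != NULL && queue != NULL && queued != NULL);

	model_lock(&mm->page_table_lock);
	if (mm->mapped_page != NULL) {
		queue[(*queued)++] = mm->mapped_page;
		added = 1;
	}
	/* Unlock only after the page has been queued. */
	model_unlock(&mm->page_table_lock);
	return added;
}
```

Dropping the lock right after the lookup, as the posted hunk does, would leave a window in which the mapping can change before the page is queued; keeping the unlock at the end closes that window in this model.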
> > > + nid = page_to_nid(page); > > > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) > > > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) > > > + || flags & MPOL_MF_MOVE_ALL)) > > > + migrate_hugepage_add(page, private); > > > +#else > > > + BUG(); > > > +#endif > > > +} > > > + > > > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > > unsigned long addr, unsigned long end, > > > const nodemask_t *nodes, unsigned long flags, > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > > pmd = pmd_offset(pud, addr); > > > do { > > > next = pmd_addr_end(addr, end); > > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { > > > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() > > sufficient? > > I think we need both check here because if we use only pmd_huge(), > pmd for thp goes into this branch wrongly. Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it obviously checks only _PAGE_PSE same as pmd_large() which is really unfortunate and confusing. Can we make it hugetlb specific? 
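A minimal userspace sketch of why both tests are needed, assuming (as stated above) that pmd_huge() only inspects the large-page bit. The flag values and the model_* names are hypothetical and chosen for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
#define MODEL_PAGE_PSE   0x080UL	/* "large page" bit in a pmd entry */
#define MODEL_VM_HUGETLB 0x1UL		/* VMA is a hugetlbfs mapping */

struct model_vma {
	unsigned long vm_flags;
};

/* Models pmd_huge() as described in the thread: it only tests the PSE bit. */
static bool model_pmd_huge(unsigned long pmd)
{
	return (pmd & MODEL_PAGE_PSE) != 0;
}

/* Models is_vm_hugetlb_page(): the VMA, not the pmd, identifies hugetlb. */
static bool model_is_vm_hugetlb_page(const struct model_vma *vma)
{
	assert(vma != NULL);
	return (vma->vm_flags & MODEL_VM_HUGETLB) != 0;
}

/* Only a large pmd inside a hugetlb VMA takes the hugetlb path. */
static bool model_takes_hugetlb_path(const struct model_vma *vma,
				     unsigned long pmd)
{
	return model_pmd_huge(pmd) && model_is_vm_hugetlb_page(vma);
}
```

In this model a THP mapping and a hugetlb mapping produce the same answer from model_pmd_huge(); only the VMA flag separates them, which is the reason given for keeping the explicit is_vm_hugetlb_page() check.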
> > Thanks, > Naoya -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934001Ab3CSXnv (ORCPT ); Tue, 19 Mar 2013 19:43:51 -0400 Received: from mail-da0-f49.google.com ([209.85.210.49]:34612 "EHLO mail-da0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932137Ab3CSXnu (ORCPT ); Tue, 19 Mar 2013 19:43:50 -0400 Message-ID: <5148F830.3070601@gmail.com> Date: Wed, 20 Mar 2013 07:43:44 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > Hi, > > Hugepage migration is now available only for soft offlining (moving > data on the half corrupted page to another page to save the data). > But it's also useful some other users of page migration, so this > patchset tries to extend some of such users to support hugepage. > > The targets of this patchset are NUMA related system calls (i.e. > migrate_pages(2), move_pages(2), and mbind(2)), and memory hotplug. > This patchset does not extend page migration in memory compaction, > because I think that users of memory compaction mainly expect to > construct thp by arranging raw pages but hugepage migration doesn't > help it. > CMA, another user of page migration, can have benefit from hugepage > migration, but is not enabled to support it now. 
This is because > I've never used CMA and need to learn more to extend and/or test > hugepage migration in CMA. I'll add this in later version if it > becomes ready, or will post as a separate patchset. > > Hugepage migration of 1GB hugepage is not enabled for now, because > I'm not sure whether users of 1GB hugepage really want it. > We need to spare free hugepage in order to do migration, but I don't > think that users want to 1GB memory to idle for that purpose > (currently we can't expand/shrink 1GB hugepage pool after boot). > > Could you review and give me some comments/feedbacks? > > Thanks, > Naoya Horiguchi > --- > Easy patch access: > git@github.com:Naoya-Horiguchi/linux.git > branch:extend_hugepage_migration > > Test code: > git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git git clone git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git Cloning into test_hugepage_migration_extension... Permission denied (publickey). fatal: The remote end hung up unexpectedly > > Naoya Horiguchi (9): > migrate: add migrate_entry_wait_huge() > migrate: make core migration code aware of hugepage > soft-offline: use migrate_pages() instead of migrate_huge_page() > migrate: clean up migrate_huge_page() > migrate: enable migrate_pages() to migrate hugepage > migrate: enable move_pages() to migrate hugepage > mbind: enable mbind() to migrate hugepage > memory-hotplug: enable memory hotplug to handle hugepage > remove /proc/sys/vm/hugepages_treat_as_movable > > Documentation/sysctl/vm.txt | 16 ------ > include/linux/hugetlb.h | 25 ++++++++-- > include/linux/mempolicy.h | 2 +- > include/linux/migrate.h | 12 ++--- > include/linux/swapops.h | 4 ++ > kernel/sysctl.c | 7 --- > mm/hugetlb.c | 98 ++++++++++++++++++++++++++++-------- > mm/memory-failure.c | 20 ++++++-- > mm/memory.c | 6 ++- > mm/memory_hotplug.c | 51 +++++++++++++++---- > mm/mempolicy.c | 61 +++++++++++++++-------- > mm/migrate.c | 119 ++++++++++++++++++++++++++++++-------------- > 
mm/page_alloc.c | 12 +++++ > mm/page_isolation.c | 5 ++ > 14 files changed, 311 insertions(+), 127 deletions(-) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932834Ab3CSX5k (ORCPT ); Tue, 19 Mar 2013 19:57:40 -0400 Received: from mail-da0-f44.google.com ([209.85.210.44]:41207 "EHLO mail-da0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752651Ab3CSX5j (ORCPT ); Tue, 19 Mar 2013 19:57:39 -0400 Message-ID: <5148FB6C.4070202@gmail.com> Date: Wed, 20 Mar 2013 07:57:32 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > When we have a page fault for the address which is backed by a hugepage > under migration, the kernel can't wait correctly until the migration > finishes. This is because pte_offset_map_lock() can't get a correct It seems that the current hugetlb_fault() still waits for a hugetlb page under migration. How can that work without locking the 2MB memory? > migration entry for hugepage.
This patch adds migration_entry_wait_huge() > to separate code path between normal pages and hugepages. > > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 2 ++ > include/linux/swapops.h | 4 ++++ > mm/hugetlb.c | 4 ++-- > mm/migrate.c | 24 ++++++++++++++++++++++++ > 4 files changed, 32 insertions(+), 2 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 0c80d3f..40b27f6 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -43,6 +43,7 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, > #endif > > int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); > +int is_hugetlb_entry_migration(pte_t pte); > int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, > struct page **, struct vm_area_struct **, > unsigned long *, int *, int, unsigned int flags); > @@ -109,6 +110,7 @@ static inline unsigned long hugetlb_total_pages(void) > #define follow_hugetlb_page(m,v,p,vs,a,b,i,w) ({ BUG(); 0; }) > #define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) > #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) > +#define is_hugetlb_entry_migration(pte) ({ BUG(); 0; }) > #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) > static inline void hugetlb_report_meminfo(struct seq_file *m) > { > diff --git v3.8.orig/include/linux/swapops.h v3.8/include/linux/swapops.h > index 47ead51..f68efdd 100644 > --- v3.8.orig/include/linux/swapops.h > +++ v3.8/include/linux/swapops.h > @@ -137,6 +137,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry) > > extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > unsigned long address); > +extern void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address); > #else > > #define make_migration_entry(page, write) swp_entry(0, 0) > @@ -148,6 +150,8 @@ static inline int is_migration_entry(swp_entry_t swp) > static 
inline void make_migration_entry_read(swp_entry_t *entryp) { } > static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > unsigned long address) { } > +static inline void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address) { } > static inline int is_write_migration_entry(swp_entry_t entry) > { > return 0; > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index 546db81..351025e 100644 > --- v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -2313,7 +2313,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, > return -ENOMEM; > } > > -static int is_hugetlb_entry_migration(pte_t pte) > +int is_hugetlb_entry_migration(pte_t pte) > { > swp_entry_t swp; > > @@ -2823,7 +2823,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > if (ptep) { > entry = huge_ptep_get(ptep); > if (unlikely(is_hugetlb_entry_migration(entry))) { > - migration_entry_wait(mm, (pmd_t *)ptep, address); > + migration_entry_wait_huge(mm, (pmd_t *)ptep, address); > return 0; > } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) > return VM_FAULT_HWPOISON_LARGE | > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c > index 2fd8b4a..7d84f4c 100644 > --- v3.8.orig/mm/migrate.c > +++ v3.8/mm/migrate.c > @@ -236,6 +236,30 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, > pte_unmap_unlock(ptep, ptl); > } > > +void migration_entry_wait_huge(struct mm_struct *mm, pmd_t *pmd, > + unsigned long address) > +{ > + spinlock_t *ptl = pte_lockptr(mm, pmd); > + pte_t pte; > + swp_entry_t entry; > + struct page *page; > + > + spin_lock(ptl); > + pte = huge_ptep_get((pte_t *)pmd); > + if (!is_hugetlb_entry_migration(pte)) > + goto out; > + entry = pte_to_swp_entry(pte); > + page = migration_entry_to_page(entry); > + if (!get_page_unless_zero(page)) > + goto out; > + spin_unlock(ptl); > + wait_on_page_locked(page); > + put_page(page); > + return; > +out: > + spin_unlock(ptl); > +} > + > #ifdef 
CONFIG_BLOCK > /* Returns true if all buffers are successfully locked */ > static bool buffer_migrate_lock_buffers(struct buffer_head *head, From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757773Ab3CTAbO (ORCPT ); Tue, 19 Mar 2013 20:31:14 -0400 Received: from mail-da0-f47.google.com ([209.85.210.47]:53365 "EHLO mail-da0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755046Ab3CTAbM (ORCPT ); Tue, 19 Mar 2013 20:31:12 -0400 Message-ID: <5149034A.5050907@gmail.com> Date: Wed, 20 Mar 2013 08:31:06 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: Michal Hocko , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 03/19/2013 08:07 AM, Naoya Horiguchi wrote: > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: >> On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: >>> This patch extends check_range() to handle vma with VM_HUGETLB set. >>> With this changes, we can migrate hugepage with migrate_pages(2). >>> Note that for larger hugepages (covered by pud entries, 1GB for >>> x86_64 for example), we simply skip it now. 
>>> >>> Signed-off-by: Naoya Horiguchi >>> --- >>> include/linux/hugetlb.h | 6 ++++-- >>> mm/hugetlb.c | 10 ++++++++++ >>> mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------ >>> 3 files changed, 48 insertions(+), 14 deletions(-) >>> >>> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h >>> index 8f87115..eb33df5 100644 >>> --- v3.8.orig/include/linux/hugetlb.h >>> +++ v3.8/include/linux/hugetlb.h >>> @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); >>> int dequeue_hwpoisoned_huge_page(struct page *page); >>> void putback_active_hugepage(struct page *page); >>> void putback_active_hugepages(struct list_head *l); >>> +void migrate_hugepage_add(struct page *page, struct list_head *list); >>> void copy_huge_page(struct page *dst, struct page *src); >>> >>> extern unsigned long hugepages_treat_as_movable; >>> @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, >>> pmd_t *pmd, int write); >>> struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, >>> pud_t *pud, int write); >>> -int pmd_huge(pmd_t pmd); >>> -int pud_huge(pud_t pmd); >>> +extern int pmd_huge(pmd_t pmd); >>> +extern int pud_huge(pud_t pmd); >> extern is not needed here. > OK. 
> >>> unsigned long hugetlb_change_protection(struct vm_area_struct *vma, >>> unsigned long address, unsigned long end, pgprot_t newprot); >>> >>> @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) >>> >>> #define putback_active_hugepage(p) 0 >>> #define putback_active_hugepages(l) 0 >>> +#define migrate_hugepage_add(p, l) 0 >>> static inline void copy_huge_page(struct page *dst, struct page *src) >>> { >>> } >>> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c >>> index cb9d43b8..86ffcb7 100644 >>> --- v3.8.orig/mm/hugetlb.c >>> +++ v3.8/mm/hugetlb.c >>> @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l) >>> list_for_each_entry_safe(page, page2, l, lru) >>> putback_active_hugepage(page); >>> } >>> + >>> +void migrate_hugepage_add(struct page *page, struct list_head *list) >>> +{ >>> + VM_BUG_ON(!PageHuge(page)); >>> + get_page(page); >>> + spin_lock(&hugetlb_lock); >> Why hugetlb_lock? Comment for this lock says that it protects >> hugepage_freelists, nr_huge_pages, and free_huge_pages. > I think that this comment is out of date and hugepage_activelists, > which was introduced recently, should be protected because this > patchset adds is_hugepage_movable() which runs through the list. > So I'll update the comment in the next post. 
> >>> + list_move_tail(&page->lru, list); >>> + spin_unlock(&hugetlb_lock); >>> + return; >>> +} >>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c >>> index e2df1c1..8627135 100644 >>> --- v3.8.orig/mm/mempolicy.c >>> +++ v3.8/mm/mempolicy.c >>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, >>> return addr != end; >>> } >>> >>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, >>> + const nodemask_t *nodes, unsigned long flags, >>> + void *private) >>> +{ >>> +#ifdef CONFIG_HUGETLB_PAGE >>> + int nid; >>> + struct page *page; >>> + >>> + spin_lock(&vma->vm_mm->page_table_lock); >>> + page = pte_page(huge_ptep_get((pte_t *)pmd)); >>> + spin_unlock(&vma->vm_mm->page_table_lock); >> I am a bit confused why page_table_lock is used here and why it doesn't >> cover the page usage. > I expected this function to do the same for pmd as check_pte_range() does > for pte, but the above code didn't do it. I should've put spin_unlock > below migrate_hugepage_add(). Sorry for the confusion. I'm still confused. Could you explain in more detail? > >>> + nid = page_to_nid(page); >>> + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT) >>> + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1) >>> + || flags & MPOL_MF_MOVE_ALL)) >>> + migrate_hugepage_add(page, private); >>> +#else >>> + BUG(); >>> +#endif >>> +} >>> + >>> static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, >>> unsigned long addr, unsigned long end, >>> const nodemask_t *nodes, unsigned long flags, >>> @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, >>> pmd = pmd_offset(pud, addr); >>> do { >>> next = pmd_addr_end(addr, end); >>> + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { >> Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() >> sufficient?
> I think we need both check here because if we use only pmd_huge(), > pmd for thp goes into this branch wrongly. > > Thanks, > Naoya > >>> + check_hugetlb_pmd_range(vma, pmd, nodes, >>> + flags, private); >>> + continue; >>> + } >>> split_huge_page_pmd(vma, addr, pmd); >>> if (pmd_none_or_trans_huge_or_clear_bad(pmd)) >>> continue; >> [...] >> -- >> Michal Hocko >> SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964873Ab3CTBD3 (ORCPT ); Tue, 19 Mar 2013 21:03:29 -0400 Received: from mail-pb0-f50.google.com ([209.85.160.50]:40356 "EHLO mail-pb0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932679Ab3CTBD1 (ORCPT ); Tue, 19 Mar 2013 21:03:27 -0400 Message-ID: <51490AD8.9050308@gmail.com> Date: Wed, 20 Mar 2013 09:03:20 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > Currently we can't offline memory blocks which contain hugepages because > a hugepage is considered as an unmovable page.
But now with this patch > series, a hugepage has become movable, so by using hugepage migration we > can offline such memory blocks. > > What's different from other users of hugepage migration is that we need > to decompose all the hugepages inside the target memory block into free For the other hugepage migration users, the source hugepage should be freed back to hugepage_freelists after migration, but I don't see any code that does this. Why? > buddy pages after hugepage migration, because otherwise free hugepages > remaining in the memory block intervene the memory offlining. > For this reason we introduce new functions dissolve_free_huge_page() and > dissolve_free_huge_pages(). > > Other than that, what this patch does is straightforwardly to add hugepage > migration code, that is, adding hugepage code to the functions which scan > over pfn and collect hugepages to be migrated, and adding a hugepage > allocation function to alloc_migrate_target(). > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > over them because it's larger than memory block. So we now simply leave > it to fail as it is.
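The pfn scan described in this commit message (find the first movable page, treating in-use hugepages as movable and stepping over other hugepages as a unit) can be modeled in userspace roughly as follows. This is a sketch under stated assumptions: struct model_page and its fields are hypothetical stand-ins for struct page, PageLRU(), is_hugepage_movable() and compound_order(), and the model returns a MODEL_NOT_FOUND sentinel where the patch's scan_movable_pages() returns 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct page; not the kernel layout. */
struct model_page {
	int on_lru;		/* models PageLRU() */
	int huge_head;		/* models the head page of a hugepage */
	int huge_active;	/* models is_hugepage_movable(): in-use hugepage */
	unsigned int order;	/* models compound_order() of a head page */
};

#define MODEL_NOT_FOUND ((size_t)-1)

/*
 * Return the index of the first movable page in [start, end), or
 * MODEL_NOT_FOUND. A hugepage that is not in use is skipped as a
 * whole by advancing the cursor past its tail pages.
 */
static size_t model_scan_movable_pages(const struct model_page *pages,
				       size_t start, size_t end)
{
	size_t pfn;

	assert(pages != NULL);

	for (pfn = start; pfn < end; pfn++) {
		const struct model_page *page = &pages[pfn];

		if (page->on_lru)
			return pfn;
		if (page->huge_head) {
			if (page->huge_active)
				return pfn;
			/* The loop's pfn++ then lands just past the last tail. */
			pfn += ((size_t)1 << page->order) - 1;
		}
	}
	return MODEL_NOT_FOUND;
}
```

With an order-2 free hugepage at index 0 and an LRU page at index 4, the scan in this model skips indices 1 to 3 without examining them and reports index 4.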
> > Signed-off-by: Naoya Horiguchi > --- > include/linux/hugetlb.h | 8 ++++++++ > mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ > mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- > mm/migrate.c | 12 +++++++++++- > mm/page_alloc.c | 12 ++++++++++++ > mm/page_isolation.c | 5 +++++ > 6 files changed, 121 insertions(+), 10 deletions(-) > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > index 86a4d78..e33f07f 100644 > --- v3.8.orig/include/linux/hugetlb.h > +++ v3.8/include/linux/hugetlb.h > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); > void putback_active_hugepage(struct page *page); > void putback_active_hugepages(struct list_head *l); > void migrate_hugepage_add(struct page *page, struct list_head *list); > +int is_hugepage_movable(struct page *page); > void copy_huge_page(struct page *dst, struct page *src); > > extern unsigned long hugepages_treat_as_movable; > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > #define putback_active_hugepage(p) 0 > #define putback_active_hugepages(l) 0 > #define migrate_hugepage_add(p, l) 0 > +#define is_hugepage_movable(x) 0 > static inline void copy_huge_page(struct page *dst, struct page *src) > { > } > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) > return h - hstates; > } > > +extern void dissolve_free_huge_page(struct page *page); > +extern void dissolve_free_huge_pages(unsigned long start_pfn, > + unsigned long end_pfn); > + > #else > struct hstate {}; > #define alloc_huge_page(v, a, r) NULL > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > } > #define hstate_index_to_shift(index) 0 > #define hstate_index(h) 0 > +#define dissolve_free_huge_page(p) 0 > +#define dissolve_free_huge_pages(s, e) 0 > #endif > > #endif /* _LINUX_HUGETLB_H */ > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > index ccf9995..c28e6c9 100644 > --- 
v3.8.orig/mm/hugetlb.c > +++ v3.8/mm/hugetlb.c > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > return ret; > } > > +/* Dissolve a given free hugepage into free pages. */ > +void dissolve_free_huge_page(struct page *page) > +{ > + if (PageHuge(page) && !page_count(page)) { > + struct hstate *h = page_hstate(page); > + int nid = page_to_nid(page); > + spin_lock(&hugetlb_lock); > + list_del(&page->lru); > + h->free_huge_pages--; > + h->free_huge_pages_node[nid]--; > + update_and_free_page(h, page); > + spin_unlock(&hugetlb_lock); > + } > +} > + > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > +{ > + unsigned long pfn; > + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); > + for (pfn = start_pfn; pfn < end_pfn; pfn += step) > + dissolve_free_huge_page(pfn_to_page(pfn)); > +} > + > static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > { > struct page *page; > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > return 0; > } > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. */ > +int is_hugepage_movable(struct page *hpage) > +{ > + struct page *page; > + struct page *tmp; > + struct hstate *h = page_hstate(hpage); > + int ret = 0; > + > + VM_BUG_ON(!PageHuge(hpage)); > + if (PageTail(hpage)) > + return 0; > + spin_lock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > + if (page == hpage) > + ret = 1; > + spin_unlock(&hugetlb_lock); > + return ret; > +} > + > /* > * This function is called from memory failure code. > * Assume the caller holds page lock of the head page. 
> diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > index d04ed87..6418de2 100644 > --- v3.8.orig/mm/memory_hotplug.c > +++ v3.8/mm/memory_hotplug.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #include > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > } > > /* > - * Scanning pfn is much easier than scanning lru list. > - * Scan pfn from start to end and Find LRU page. > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > + * and hugepages). We scan pfn because it's much easier than scanning over > + * linked list. This function returns the pfn of the first found movable > + * page if it's found, otherwise 0. > */ > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > { > unsigned long pfn; > struct page *page; > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > page = pfn_to_page(pfn); > if (PageLRU(page)) > return pfn; > + if (PageHuge(page)) { > + if (is_hugepage_movable(page)) > + return pfn; > + else > + pfn += (1 << compound_order(page)) - 1; > + } > } > } > return 0; > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > page = pfn_to_page(pfn); > if (!get_page_unless_zero(page)) > continue; > + if (PageHuge(page)) { > + /* > + * Larger hugepage (1GB for x86_64) is larger than > + * memory block, so pfn scan can start at the tail > + * page of larger hugepage. In such case, > + * we simply skip the hugepage and move the cursor > + * to the last tail page. 
> + */ > + if (PageTail(page)) { > + struct page *head = compound_head(page); > + pfn = page_to_pfn(head) + > + (1 << compound_order(head)) - 1; > + put_page(page); > + continue; > + } > + pfn = (1 << compound_order(page)) - 1; > + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { > + put_page(page); > + continue; > + } > + list_move_tail(&page->lru, &source); > + move_pages -= 1 << compound_order(page); > + continue; > + } > /* > * We can skip free pages. And we can only deal with pages on > * LRU. > @@ -1049,7 +1082,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > } > if (!list_empty(&source)) { > if (not_managed) { > - putback_lru_pages(&source); > + putback_movable_pages(&source); > goto out; > } > > @@ -1057,11 +1090,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > * alloc_migrate_target should be improooooved!! > * migrate_pages returns # of failed pages. > */ > - ret = migrate_pages(&source, alloc_migrate_target, 0, > + ret = migrate_movable_pages(&source, alloc_migrate_target, 0, > true, MIGRATE_SYNC, > MR_MEMORY_HOTPLUG); > - if (ret) > - putback_lru_pages(&source); > } > out: > return ret; > @@ -1304,8 +1335,8 @@ static int __ref __offline_pages(unsigned long start_pfn, > drain_all_pages(); > } > > - pfn = scan_lru_pages(start_pfn, end_pfn); > - if (pfn) { /* We have page on LRU */ > + pfn = scan_movable_pages(start_pfn, end_pfn); > + if (pfn) { /* We have movable pages */ > ret = do_migrate_range(pfn, end_pfn); > if (!ret) { > drain = 1; > @@ -1324,6 +1355,8 @@ static int __ref __offline_pages(unsigned long start_pfn, > yield(); > /* drain pcp pages, this is synchronous. 
*/ > drain_all_pages(); > + /* dissolve all free hugepages inside the memory block */ > + dissolve_free_huge_pages(start_pfn, end_pfn); > /* check again */ > offlined_pages = check_pages_isolated(start_pfn, end_pfn); > if (offlined_pages < 0) { > diff --git v3.8.orig/mm/migrate.c v3.8/mm/migrate.c > index 8c457e7..a491a98 100644 > --- v3.8.orig/mm/migrate.c > +++ v3.8/mm/migrate.c > @@ -1009,8 +1009,18 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, > > unlock_page(hpage); > out: > - if (rc != -EAGAIN) > + if (rc != -EAGAIN) { > putback_active_hugepage(hpage); > + > + /* > + * After hugepage migration from memory hotplug, the original > + * hugepage should never be allocated again. This will be > + * done by dissolving it into free normal pages, because > + * we already set migratetype to MIGRATE_ISOLATE for them. > + */ > + if (offlining) > + dissolve_free_huge_page(hpage); > + } > put_page(new_hpage); > if (result) { > if (rc) > diff --git v3.8.orig/mm/page_alloc.c v3.8/mm/page_alloc.c > index 6a83cd3..c37951d 100644 > --- v3.8.orig/mm/page_alloc.c > +++ v3.8/mm/page_alloc.c > @@ -58,6 +58,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -5686,6 +5687,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > continue; > > page = pfn_to_page(check); > + > + /* > + * Hugepages are not in LRU lists, but they're movable. > + * We need not scan over tail pages bacause we don't > + * handle each tail page individually in migration. > + */ > + if (PageHuge(page)) { > + iter += (1 << compound_order(page)) - 1; > + continue; > + } > + > /* > * We can't use page_count without pin a page > * because another CPU can free compound page. 
> diff --git v3.8.orig/mm/page_isolation.c v3.8/mm/page_isolation.c > index 383bdbb..cf48ef6 100644 > --- v3.8.orig/mm/page_isolation.c > +++ v3.8/mm/page_isolation.c > @@ -6,6 +6,7 @@ > #include > #include > #include > +#include > #include "internal.h" > > int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages) > @@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, > { > gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE; > > + if (PageHuge(page)) > + return alloc_huge_page_node(page_hstate(compound_head(page)), > + numa_node_id()); > + > if (PageHighMem(page)) > gfp_mask |= __GFP_HIGHMEM; > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758032Ab3CTD4e (ORCPT ); Tue, 19 Mar 2013 23:56:34 -0400 Received: from mx1.redhat.com ([209.132.183.28]:31417 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752781Ab3CTD4d (ORCPT ); Tue, 19 Mar 2013 23:56:33 -0400 Date: Tue, 19 Mar 2013 23:55:33 -0400 From: Naoya Horiguchi To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363751733-1fg9kic6-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130318160737.GU10192@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318160737.GU10192@dhcp22.suse.cz> Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 18, 2013 at 05:07:37PM +0100, Michal Hocko wrote: > On Thu 21-02-13 14:41:47, Naoya 
Horiguchi wrote: > > Currently we can't offline memory blocks which contain hugepages because > > a hugepage is considered as an unmovable page. But now with this patch > > series, a hugepage has become movable, so by using hugepage migration we > > can offline such memory blocks. > > > > What's different from other users of hugepage migration is that we need > > to decompose all the hugepages inside the target memory block into free > > buddy pages after hugepage migration, because otherwise free hugepages > > remaining in the memory block intervene the memory offlining. > > For this reason we introduce new functions dissolve_free_huge_page() and > > dissolve_free_huge_pages(). > > > > Other than that, what this patch does is straightforwardly to add hugepage > > migration code, that is, adding hugepage code to the functions which scan > > over pfn and collect hugepages to be migrated, and adding a hugepage > > allocation function to alloc_migrate_target(). > > > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > > over them because it's larger than memory block. So we now simply leave > > it to fail as it is. > > What we could do is to check whether there is a free gb huge page on > other node and migrate there. Correct, and 1GB page migration needs more code in migration core code (mainly it's related to migration entry in pud) and enough testing, so I want to do it in separate patchset. 
> > Signed-off-by: Naoya Horiguchi > > --- > > include/linux/hugetlb.h | 8 ++++++++ > > mm/hugetlb.c | 43 +++++++++++++++++++++++++++++++++++++++++ > > mm/memory_hotplug.c | 51 ++++++++++++++++++++++++++++++++++++++++--------- > > mm/migrate.c | 12 +++++++++++- > > mm/page_alloc.c | 12 ++++++++++++ > > mm/page_isolation.c | 5 +++++ > > 6 files changed, 121 insertions(+), 10 deletions(-) > > > > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h > > index 86a4d78..e33f07f 100644 > > --- v3.8.orig/include/linux/hugetlb.h > > +++ v3.8/include/linux/hugetlb.h > > @@ -70,6 +70,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); > > void putback_active_hugepage(struct page *page); > > void putback_active_hugepages(struct list_head *l); > > void migrate_hugepage_add(struct page *page, struct list_head *list); > > +int is_hugepage_movable(struct page *page); > > void copy_huge_page(struct page *dst, struct page *src); > > > > extern unsigned long hugepages_treat_as_movable; > > @@ -136,6 +137,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page) > > #define putback_active_hugepage(p) 0 > > #define putback_active_hugepages(l) 0 > > #define migrate_hugepage_add(p, l) 0 > > +#define is_hugepage_movable(x) 0 > > static inline void copy_huge_page(struct page *dst, struct page *src) > > { > > } > > @@ -358,6 +360,10 @@ static inline int hstate_index(struct hstate *h) > > return h - hstates; > > } > > > > +extern void dissolve_free_huge_page(struct page *page); > > +extern void dissolve_free_huge_pages(unsigned long start_pfn, > > + unsigned long end_pfn); > > + > > #else > > struct hstate {}; > > #define alloc_huge_page(v, a, r) NULL > > @@ -378,6 +384,8 @@ static inline unsigned int pages_per_huge_page(struct hstate *h) > > } > > #define hstate_index_to_shift(index) 0 > > #define hstate_index(h) 0 > > +#define dissolve_free_huge_page(p) 0 > > +#define dissolve_free_huge_pages(s, e) 0 > > #endif > > > > #endif /* _LINUX_HUGETLB_H */ 
> > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > > index ccf9995..c28e6c9 100644 > > --- v3.8.orig/mm/hugetlb.c > > +++ v3.8/mm/hugetlb.c > > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > return ret; > > } > > > > +/* Dissolve a given free hugepage into free pages. */ > > +void dissolve_free_huge_page(struct page *page) > > +{ > > + if (PageHuge(page) && !page_count(page)) { > > Could you clarify why you are cheking page_count here? I assume it is to > make sure the page is free but what prevents it being increased before > you take hugetlb_lock? There's nothing to prevent it, so it's not safe to check refcount outside hugetlb_lock. > > + struct hstate *h = page_hstate(page); > > + int nid = page_to_nid(page); > > + spin_lock(&hugetlb_lock); > > + list_del(&page->lru); > > + h->free_huge_pages--; > > + h->free_huge_pages_node[nid]--; > > + update_and_free_page(h, page); > > + spin_unlock(&hugetlb_lock); > > + } > > +} > > + > > +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */ > > +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) > > +{ > > + unsigned long pfn; > > + unsigned int step = 1 << (HUGETLB_PAGE_ORDER); > > hugetlb pages could be present in different sizes so this doesn't work > in general. You need to to get order from page_hstate. OK. > > + for (pfn = start_pfn; pfn < end_pfn; pfn += step) > > + dissolve_free_huge_page(pfn_to_page(pfn)); > > +} > > + > > static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) > > { > > struct page *page; > > @@ -3158,6 +3182,25 @@ static int is_hugepage_on_freelist(struct page *hpage) > > return 0; > > } > > > > +/* Returns true for head pages of in-use hugepages, otherwise returns false. 
*/ > > +int is_hugepage_movable(struct page *hpage) > > +{ > > + struct page *page; > > + struct page *tmp; > > + struct hstate *h = page_hstate(hpage); > > + int ret = 0; > > + > > + VM_BUG_ON(!PageHuge(hpage)); > > + if (PageTail(hpage)) > > + return 0; > > + spin_lock(&hugetlb_lock); > > + list_for_each_entry_safe(page, tmp, &h->hugepage_activelist, lru) > > + if (page == hpage) > > + ret = 1; > > + spin_unlock(&hugetlb_lock); > > + return ret; > > +} > > + > > /* > > * This function is called from memory failure code. > > * Assume the caller holds page lock of the head page. > > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > > index d04ed87..6418de2 100644 > > --- v3.8.orig/mm/memory_hotplug.c > > +++ v3.8/mm/memory_hotplug.c > > @@ -29,6 +29,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > > } > > > > /* > > - * Scanning pfn is much easier than scanning lru list. > > - * Scan pfn from start to end and Find LRU page. > > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > > + * and hugepages). We scan pfn because it's much easier than scanning over > > + * linked list. This function returns the pfn of the first found movable > > + * page if it's found, otherwise 0. 
> > */ > > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > > { > > unsigned long pfn; > > struct page *page; > > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > page = pfn_to_page(pfn); > > if (PageLRU(page)) > > return pfn; > > + if (PageHuge(page)) { > > + if (is_hugepage_movable(page)) > > + return pfn; > > + else > > + pfn += (1 << compound_order(page)) - 1; > > + } > > scan_lru_pages's name gets really confusing after this change because > hugetlb pages are not on the LRU. Maybe it would be good to rename it. Yes, and that's done in right above chunk. > > > } > > } > > return 0; > > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > > page = pfn_to_page(pfn); > > if (!get_page_unless_zero(page)) > > continue; > > All tail pages have 0 reference count (according to prep_compound_page) > so they would be skipped anyway. This makes the below pfn tweaks > pointless. I was totally mistaken about what we should do here, sorry. If we call do_migrate_range() for 1GB hugepage, we should return with error (maybe -EBUSY) instead of just skipping it, otherwise the caller __offline_pages() repeats 'goto repeat' until timeout. In order to do that, we had better insert if(PageHuge) block before getting refcount. And ... > > + if (PageHuge(page)) { > > + /* > > + * Larger hugepage (1GB for x86_64) is larger than > > + * memory block, so pfn scan can start at the tail > > + * page of larger hugepage. In such case, > > + * we simply skip the hugepage and move the cursor > > + * to the last tail page. 
> > + */ > > + if (PageTail(page)) { > > + struct page *head = compound_head(page); > > + pfn = page_to_pfn(head) + > > + (1 << compound_order(head)) - 1; > > + put_page(page); > > + continue; > > + } > > + pfn = (1 << compound_order(page)) - 1; > > + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { > > + put_page(page); > > + continue; > > + } > > There might be other hugepage sizes which fit into memblock so this test > doesn't seem right. yes, so compound_order(head) > PFN_SECTION_SHIFT would be better. I'll replace this chunk with the following if I don't get any other suggestion. @@ -1017,6 +1026,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) if (!pfn_valid(pfn)) continue; page = pfn_to_page(pfn); + + if (PageHuge(page)) { + struct page *head = compound_head(page); + pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1; + if (compound_order(head) > PFN_SECTION_SHIFT) { + ret = -EBUSY; + break; + } + if (!get_page_unless_zero(page)) + continue; + list_move_tail(&head->lru, &source); + move_pages -= 1 << compound_order(head); + continue; + } + if (!get_page_unless_zero(page)) continue; /* Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758008Ab3CTGNz (ORCPT ); Wed, 20 Mar 2013 02:13:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21597 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757958Ab3CTGNx (ORCPT ); Wed, 20 Mar 2013 02:13:53 -0400 Date: Wed, 20 Mar 2013 02:12:54 -0400 From: Naoya Horiguchi To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363759974-38t0k25g-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <20130319071113.GD5112@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> 
<20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> <20130319071113.GD5112@dhcp22.suse.cz> Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote: > On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote: > > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: > > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: ... > > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > > > pmd = pmd_offset(pud, addr); > > > > do { > > > > next = pmd_addr_end(addr, end); > > > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { > > > > > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() > > > sufficient? > > > > I think we need both check here because if we use only pmd_huge(), > > pmd for thp goes into this branch wrongly. > > Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it > obviously checks only _PAGE_PSE same as pmd_large() which is really > unfortunate and confusing. Can we make it hugetlb specific? I agree that we had better fix this confusion. What pmd_huge() (or pmd_large() in some architectures) does is just checking whether a given pmd is pointing to huge/large page or not. It does not say which type of hugepage it is. So it shouldn't be used to decide whether the hugepage are hugetlbfs or not. I think it would be better to introduce pmd_hugetlb() which has pmd and vma as arguments and returns true only for hugetlbfs pmd. 
Checking pmd_hugetlb() should come before checking pmd_trans_huge() because pmd_trans_huge() implicitly assumes that the vma which covers the virtual address of a given pmd is not hugetlbfs vma. I'm interested in this cleanup, so will work on it after this patchset. Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752541Ab3CTHlX (ORCPT ); Wed, 20 Mar 2013 03:41:23 -0400 Received: from cantor2.suse.de ([195.135.220.15]:46466 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751422Ab3CTHlW (ORCPT ); Wed, 20 Mar 2013 03:41:22 -0400 Date: Wed, 20 Mar 2013 08:41:18 +0100 From: Michal Hocko To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Message-ID: <20130320074118.GB20045@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> <20130319071113.GD5112@dhcp22.suse.cz> <1363759974-38t0k25g-mutt-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1363759974-38t0k25g-mutt-n-horiguchi@ah.jp.nec.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 20-03-13 02:12:54, Naoya Horiguchi wrote: > On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote: > > On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote: > > > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote: > > > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote: > ... 
> > > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud, > > > > > pmd = pmd_offset(pud, addr); > > > > > do { > > > > > next = pmd_addr_end(addr, end); > > > > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { > > > > > > > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge() > > > > sufficient? > > > > > > I think we need both check here because if we use only pmd_huge(), > > > pmd for thp goes into this branch wrongly. > > > > Bahh. You are right. I thought that pmd_huge is hugetlb thingy but it > > obviously checks only _PAGE_PSE same as pmd_large() which is really > > unfortunate and confusing. Can we make it hugetlb specific? > > I agree that we had better fix this confusion. > > What pmd_huge() (or pmd_large() in some architectures) does is just > checking whether a given pmd is pointing to huge/large page or not. > It does not say which type of hugepage it is. > So it shouldn't be used to decide whether the hugepage are hugetlbfs or not. > I think it would be better to introduce pmd_hugetlb() which has pmd and vma > as arguments and returns true only for hugetlbfs pmd. > Checking pmd_hugetlb() should come before checking pmd_trans_huge() because > pmd_trans_huge() implicitly assumes that the vma which covers the virtual > address of a given pmd is not hugetlbfs vma. > > I'm interested in this cleanup, so will work on it after this patchset. pmd_huge is used in only a few places, so the cleanup shouldn't be very big. On the other hand, you do not always have the vma available, so it is getting tricky.
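For illustration only, the pmd_hugetlb() idea discussed above could be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not kernel code: the model_* types, the flag values and the choice to return false when no vma is available are all hypothetical stand-ins picked for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions (assumptions, not real values). */
#define MODEL_PAGE_PSE   0x080UL /* "this pmd maps a large page" bit */
#define MODEL_VM_HUGETLB 0x001UL /* "this vma is backed by hugetlbfs" flag */

struct model_pmd { unsigned long val; };
struct model_vma { unsigned long vm_flags; };

/* Models pmd_huge(): says only that the entry maps a large page,
 * not whether that page is hugetlbfs or thp. */
static bool model_pmd_huge(struct model_pmd pmd)
{
	return (pmd.val & MODEL_PAGE_PSE) != 0;
}

/* Models is_vm_hugetlb_page(): the vma is what identifies hugetlbfs. */
static bool model_is_vm_hugetlb_page(const struct model_vma *vma)
{
	assert(vma != NULL);
	return (vma->vm_flags & MODEL_VM_HUGETLB) != 0;
}

/* Sketch of the proposed helper: true only for a large pmd inside a
 * hugetlbfs vma. Returning false for a missing vma is an assumption of
 * this sketch; the thread leaves that case open. */
static bool model_pmd_hugetlb(struct model_pmd pmd, const struct model_vma *vma)
{
	if (vma == NULL)
		return false;
	return model_pmd_huge(pmd) && model_is_vm_hugetlb_page(vma);
}
```

In this model a thp pmd (large bit set, non-hugetlbfs vma) is rejected, which is the case a bare pmd_huge() check gets wrong. The NULL branch is where the "vma is not always available" concern shows up.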
Thanks -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756213Ab3CTH5o (ORCPT ); Wed, 20 Mar 2013 03:57:44 -0400 Received: from cantor2.suse.de ([195.135.220.15]:46922 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752134Ab3CTH5m (ORCPT ); Wed, 20 Mar 2013 03:57:42 -0400 Date: Wed, 20 Mar 2013 08:57:36 +0100 From: Michal Hocko To: Naoya Horiguchi Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Message-ID: <20130320075736.GC20045@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318160737.GU10192@dhcp22.suse.cz> <1363751733-1fg9kic6-mutt-n-horiguchi@ah.jp.nec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1363751733-1fg9kic6-mutt-n-horiguchi@ah.jp.nec.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 19-03-13 23:55:33, Naoya Horiguchi wrote: > On Mon, Mar 18, 2013 at 05:07:37PM +0100, Michal Hocko wrote: > > On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote: [...] > > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove > > > over them because it's larger than memory block. So we now simply leave > > > it to fail as it is. > > > > What we could do is to check whether there is a free gb huge page on > > other node and migrate there. > > Correct, and 1GB page migration needs more code in migration core code > (mainly it's related to migration entry in pud) and enough testing, > so I want to do it in separate patchset. 
Sure, this was just a note that it is achievable not that it has to be done in the patchset. [...] > > > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c > > > index ccf9995..c28e6c9 100644 > > > --- v3.8.orig/mm/hugetlb.c > > > +++ v3.8/mm/hugetlb.c > > > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, > > > return ret; > > > } > > > > > > +/* Dissolve a given free hugepage into free pages. */ > > > +void dissolve_free_huge_page(struct page *page) > > > +{ > > > + if (PageHuge(page) && !page_count(page)) { > > > > Could you clarify why you are cheking page_count here? I assume it is to > > make sure the page is free but what prevents it being increased before > > you take hugetlb_lock? > > There's nothing to prevent it, so it's not safe to check refcount outside > hugetlb_lock. OK, so the lock has to be moved up. [...] > > > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c > > > index d04ed87..6418de2 100644 > > > --- v3.8.orig/mm/memory_hotplug.c > > > +++ v3.8/mm/memory_hotplug.c > > > @@ -29,6 +29,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > > > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) > > > } > > > > > > /* > > > - * Scanning pfn is much easier than scanning lru list. > > > - * Scan pfn from start to end and Find LRU page. > > > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages > > > + * and hugepages). We scan pfn because it's much easier than scanning over > > > + * linked list. This function returns the pfn of the first found movable > > > + * page if it's found, otherwise 0. 
> > > */ > > > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end) > > > { > > > unsigned long pfn; > > > struct page *page; > > > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end) > > > page = pfn_to_page(pfn); > > > if (PageLRU(page)) > > > return pfn; > > > + if (PageHuge(page)) { > > > + if (is_hugepage_movable(page)) > > > + return pfn; > > > + else > > > + pfn += (1 << compound_order(page)) - 1; > > > + } > > > > scan_lru_pages's name gets really confusing after this change because > > hugetlb pages are not on the LRU. Maybe it would be good to rename it. > > Yes, and that's done in right above chunk. bahh, I am blind. I got confused by the name in the hunk header. Sorry about that. > > > > > > } > > > } > > > return 0; > > > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > > > page = pfn_to_page(pfn); > > > if (!get_page_unless_zero(page)) > > > continue; > > > > All tail pages have 0 reference count (according to prep_compound_page) > > so they would be skipped anyway. This makes the below pfn tweaks > > pointless. > > I was totally mistaken about what we should do here, sorry. If we call > do_migrate_range() for 1GB hugepage, we should return with error (maybe -EBUSY) > instead of just skipping it, otherwise the caller __offline_pages() repeats > 'goto repeat' until timeout. In order to do that, we had better insert > if(PageHuge) block before getting refcount. And ... > > > > + if (PageHuge(page)) { > > > + /* > > > + * Larger hugepage (1GB for x86_64) is larger than > > > + * memory block, so pfn scan can start at the tail > > > + * page of larger hugepage. In such case, > > > + * we simply skip the hugepage and move the cursor > > > + * to the last tail page. 
> > > + */ > > > + if (PageTail(page)) { > > > + struct page *head = compound_head(page); > > > + pfn = page_to_pfn(head) + > > > + (1 << compound_order(head)) - 1; > > > + put_page(page); > > > + continue; > > > + } > > > + pfn = (1 << compound_order(page)) - 1; > > > + if (huge_page_size(page_hstate(page)) != PMD_SIZE) { > > > + put_page(page); > > > + continue; > > > + } > > > > There might be other hugepage sizes which fit into memblock so this test > > doesn't seem right. > > yes, so compound_order(head) > PFN_SECTION_SHIFT would be better. I would rather see compound_order(head) < MAX_ORDER to be more coupled with the allocator. > I'll replace this chunk with the following if I don't get any other > suggestion. > > @@ -1017,6 +1026,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) > if (!pfn_valid(pfn)) > continue; > page = pfn_to_page(pfn); > + > + if (PageHuge(page)) { > + struct page *head = compound_head(page); > + pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1; I do not think this is safe without an elevated ref count. Your page might be on the way to be freed. So you need to put get_page_unless_zero before compound_order check. Besides that I do not see too much point in optimizing this path on the code complexity behalf. Sure we would call get_page_unless_zero pointlessly for all tail pages but this is hardly a hot path. > + if (compound_order(head) > PFN_SECTION_SHIFT) { > + ret = -EBUSY; > + break; > + } > + if (!get_page_unless_zero(page)) Should be head. > + continue; > + list_move_tail(&head->lru, &source); > + move_pages -= 1 << compound_order(head); > + continue; > + } > + > if (!get_page_unless_zero(page)) > continue; > /* > > Thanks, > Naoya > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: email@kvack.org -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752493Ab3CTVfq (ORCPT ); Wed, 20 Mar 2013 17:35:46 -0400 Received: from mx1.redhat.com ([209.132.183.28]:18448 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751197Ab3CTVfp (ORCPT ); Wed, 20 Mar 2013 17:35:45 -0400 Date: Wed, 20 Mar 2013 17:35:26 -0400 From: Naoya Horiguchi To: Simon Jeons Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <5148F830.3070601@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <5148F830.3070601@gmail.com> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 07:43:44AM +0800, Simon Jeons wrote: ... > >Easy patch access: > > git@github.com:Naoya-Horiguchi/linux.git > > branch:extend_hugepage_migration > > > >Test code: > > git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git > > git clone > git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git > Cloning into test_hugepage_migration_extension... > Permission denied (publickey). > fatal: The remote end hung up unexpectedly Sorry, wrong url. git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git or https://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git should work. 
Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752816Ab3CTVxi (ORCPT ); Wed, 20 Mar 2013 17:53:38 -0400 Received: from mx1.redhat.com ([209.132.183.28]:19185 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751755Ab3CTVxh (ORCPT ); Wed, 20 Mar 2013 17:53:37 -0400 Date: Wed, 20 Mar 2013 17:53:19 -0400 From: Naoya Horiguchi To: Simon Jeons Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <5148FB6C.4070202@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <5148FB6C.4070202@gmail.com> Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <5148FB6C.4070202@gmail.com> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote: > Hi Naoya, > On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > >When we have a page fault for the address which is backed by a hugepage > >under migration, the kernel can't wait correctly until the migration > >finishes. This is because pte_offset_map_lock() can't get a correct > > It seems that current hugetlb_fault still wait hugetlb page under > migration, how can it work without lock 2MB memory? Hugetlb_fault() does call migration_entry_wait(), but returns immediately. So page fault happens over and over again until the migration completes. IOW, migration_entry_wait() is now broken for hugepage and doesn't work as expected. 
Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752702Ab3CTWBF (ORCPT ); Wed, 20 Mar 2013 18:01:05 -0400 Received: from mx1.redhat.com ([209.132.183.28]:61137 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751659Ab3CTWBD (ORCPT ); Wed, 20 Mar 2013 18:01:03 -0400 Date: Wed, 20 Mar 2013 17:59:53 -0400 From: Naoya Horiguchi To: Simon Jeons Cc: Michal Hocko , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363816793-7eq6pu0l-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <5149034A.5050907@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com> <5149034A.5050907@gmail.com> Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <5149034A.5050907@gmail.com> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote: ... 
> >>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c > >>> index e2df1c1..8627135 100644 > >>> --- v3.8.orig/mm/mempolicy.c > >>> +++ v3.8/mm/mempolicy.c > >>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > >>> return addr != end; > >>> } > >>> > >>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, > >>> + const nodemask_t *nodes, unsigned long flags, > >>> + void *private) > >>> +{ > >>> +#ifdef CONFIG_HUGETLB_PAGE > >>> + int nid; > >>> + struct page *page; > >>> + > >>> + spin_lock(&vma->vm_mm->page_table_lock); > >>> + page = pte_page(huge_ptep_get((pte_t *)pmd)); > >>> + spin_unlock(&vma->vm_mm->page_table_lock); > >> I am a bit confused why page_table_lock is used here and why it doesn't > >> cover the page usage. > > I expected this function to do the same for pmd as check_pte_range() does > > for pte, but the above code didn't do it. I should've put spin_unlock > > below migrate_hugepage_add(). Sorry for the confusion. > > I still confuse! Could you explain more in details? With the above code, check_hugetlb_pmd_range() checks page_mapcount() outside the page table lock, but the mapcount can be decremented concurrently by __unmap_hugepage_range(), so there's a race. __unmap_hugepage_range() calls page_remove_rmap() inside the page table lock, so we can avoid this race by doing all of check_hugetlb_pmd_range()'s work inside the page table lock.
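As a rough illustration of the locking point above (not the kernel code itself), the following user-space C sketch keeps the mapcount check and the queueing step inside one critical section. Every struct, field, and function name here is a hypothetical stand-in, a pthread mutex plays the role of page_table_lock, and the "mapcount == 1" condition is an assumption made only for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the state guarded by page_table_lock. */
struct fake_mm {
    pthread_mutex_t page_table_lock; /* plays the role of mm->page_table_lock */
    int mapcount;                    /* plays the role of page_mapcount(page) */
    int queued;                      /* number of pages queued for migration */
};

/* Analogue of the unmap path: drops one mapping while holding the lock. */
static void fake_unmap(struct fake_mm *mm)
{
    pthread_mutex_lock(&mm->page_table_lock);
    assert(mm->mapcount > 0);
    mm->mapcount--;
    pthread_mutex_unlock(&mm->page_table_lock);
}

/*
 * Analogue of the fixed check: the mapcount test and the queueing step
 * share one critical section, so fake_unmap() cannot run between them.
 */
static bool fake_check_and_queue(struct fake_mm *mm)
{
    bool queued = false;

    pthread_mutex_lock(&mm->page_table_lock);
    if (mm->mapcount == 1) { /* assumed condition, for illustration only */
        mm->queued++;
        queued = true;
    }
    pthread_mutex_unlock(&mm->page_table_lock);
    return queued;
}
```

A caller would initialise the mutex with PTHREAD_MUTEX_INITIALIZER, set mapcount, and then call fake_check_and_queue(). The racy shape corresponds to releasing the mutex before the "if", which would let fake_unmap() change mapcount between the read and the decision.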
Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753426Ab3CTWGE (ORCPT ); Wed, 20 Mar 2013 18:06:04 -0400 Received: from mx1.redhat.com ([209.132.183.28]:49801 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751348Ab3CTWGC (ORCPT ); Wed, 20 Mar 2013 18:06:02 -0400 Date: Wed, 20 Mar 2013 18:05:48 -0400 From: Naoya Horiguchi To: Simon Jeons Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Message-ID: <1363817148-rlt5mp5n-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <51490AD8.9050308@gmail.com> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <51490AD8.9050308@gmail.com> Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Mime-Version: 1.0 Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mutt-References: <51490AD8.9050308@gmail.com> X-Mutt-Fcc: ~/Maildir/sent/ User-Agent: Mutt 1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 20, 2013 at 09:03:20AM +0800, Simon Jeons wrote: > Hi Naoya, > On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: > >Currently we can't offline memory blocks which contain hugepages because > >a hugepage is considered as an unmovable page. But now with this patch > >series, a hugepage has become movable, so by using hugepage migration we > >can offline such memory blocks. > > > >What's different from other users of hugepage migration is that we need > >to decompose all the hugepages inside the target memory block into free > > For other hugepage migration users, hugepage should be freed to > hugepage_freelists after migration, but why I don't see any codes do > this? 
The source hugepages which are migrated by NUMA related system calls (migrate_pages(2), move_pages(2), and mbind(2)) are still usable, so we simply free them into the free hugepage pool. OTOH, the source hugepages migrated by memory hotremove should not be reusable, because users of memory hotremove want to remove the memory from the system. So we need to free such hugepages forcibly into the buddy pages, otherwise memory offlining doesn't work. Thanks, Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753106Ab3CTXhB (ORCPT ); Wed, 20 Mar 2013 19:37:01 -0400 Received: from mail-da0-f50.google.com ([209.85.210.50]:52239 "EHLO mail-da0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752021Ab3CTXhA (ORCPT ); Wed, 20 Mar 2013 19:37:00 -0400 Message-ID: <514A4815.4040206@gmail.com> Date: Thu, 21 Mar 2013 07:36:53 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <5148FB6C.4070202@gmail.com> <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 03/21/2013 05:53 AM, Naoya Horiguchi wrote: > On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote: >> Hi Naoya, >> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: >>> When we have a page fault for the address which is backed by a hugepage >>> under
migration, the kernel can't wait correctly until the migration >>> finishes. This is because pte_offset_map_lock() can't get a correct >> It seems that current hugetlb_fault still wait hugetlb page under >> migration, how can it work without lock 2MB memory? > Hugetlb_fault() does call migration_entry_wait(), but returns immediately. Could you point out which code in migration_entry_wait() causes it to return immediately? > So page fault happens over and over again until the migration completes. > IOW, migration_entry_wait() is now broken for hugepage and doesn't work > as expected. > > Thanks, > Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754523Ab3CTXtz (ORCPT ); Wed, 20 Mar 2013 19:49:55 -0400 Received: from mail-pd0-f179.google.com ([209.85.192.179]:56411 "EHLO mail-pd0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751054Ab3CTXty (ORCPT ); Wed, 20 Mar 2013 19:49:54 -0400 X-Greylist: delayed 172591 seconds by postgrey-1.27 at vger.kernel.org; Wed, 20 Mar 2013 19:49:54 EDT Message-ID: <514A4B1C.6020201@gmail.com> Date: Thu, 21 Mar 2013 07:49:48 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 03/21/2013 05:35 AM, Naoya Horiguchi wrote: > On Wed, Mar 20,
2013 at 07:43:44AM +0800, Simon Jeons wrote: > ... >>> Easy patch access: >>> git@github.com:Naoya-Horiguchi/linux.git >>> branch:extend_hugepage_migration >>> >>> Test code: >>> git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git >> git clone >> git@github.com:Naoya-Horiguchi/test_hugepage_migration_extension.git >> Cloning into test_hugepage_migration_extension... >> Permission denied (publickey). >> fatal: The remote end hung up unexpectedly > Sorry, wrong url. > git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git > or > https://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git > should work.

When I hack arch/x86/mm/hugetlbpage.c like this:

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index ae1aa71..87f34ee 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 
 #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
 
-#ifdef CONFIG_X86_64
 static __init int setup_hugepagesz(char *opt)
 {
         unsigned long ps = memparse(opt, &opt);
         if (ps == PMD_SIZE) {
                 hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
-        } else if (ps == PUD_SIZE && cpu_has_gbpages) {
-                hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+        } else if (ps == PUD_SIZE) {
+                hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
         } else {
                 printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
                         ps >> 20);

I set the boot parameters hugepagesz=1G hugepages=10, and then I got 10 32MB huge pages. What's the difference between the pages my hack produces and normal huge pages?
> > Thanks, > Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753722Ab3CTXzf (ORCPT ); Wed, 20 Mar 2013 19:55:35 -0400 Received: from mail-da0-f54.google.com ([209.85.210.54]:38046 "EHLO mail-da0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751151Ab3CTXze (ORCPT ); Wed, 20 Mar 2013 19:55:34 -0400 Message-ID: <514A4C70.2020303@gmail.com> Date: Thu, 21 Mar 2013 07:55:28 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-9-git-send-email-n-horiguchi@ah.jp.nec.com> <51490AD8.9050308@gmail.com> <1363817148-rlt5mp5n-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363817148-rlt5mp5n-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 03/21/2013 06:05 AM, Naoya Horiguchi wrote: > On Wed, Mar 20, 2013 at 09:03:20AM +0800, Simon Jeons wrote: >> Hi Naoya, >> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: >>> Currently we can't offline memory blocks which contain hugepages because >>> a hugepage is considered as an unmovable page. But now with this patch >>> series, a hugepage has become movable, so by using hugepage migration we >>> can offline such memory blocks. 
>>> >>> What's different from other users of hugepage migration is that we need >>> to decompose all the hugepages inside the target memory block into free >> For other hugepage migration users, hugepage should be freed to >> hugepage_freelists after migration, but why I don't see any codes do >> this? > The source hugepages which are migrated by NUMA related system calls > (migrate_pages(2), move_pages(2), and mbind(2)) are still useable, > so we simply free them into free hugepage pool. It seems you misunderstood what confuses me. I can't find the code that frees the huge pages back to the hugepage pool; could you point it out to me? > OTOH, the source hugepages migrated by memory hotremove should not be > reusable, because users of memory hotremove want to remove the memory > from the system. So we need to free such hugepages forcibly into the > buddy pages, otherwise memory offining doesn't work. > > Thanks, > Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754052Ab3CUAGO (ORCPT ); Wed, 20 Mar 2013 20:06:14 -0400 Received: from mail-ie0-f179.google.com ([209.85.223.179]:42408 "EHLO mail-ie0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752260Ab3CUAGM (ORCPT ); Wed, 20 Mar 2013 20:06:12 -0400 Message-ID: <514A4EEE.1080405@gmail.com> Date: Thu, 21 Mar 2013 08:06:06 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: Michal Hocko , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-6-git-send-email-n-horiguchi@ah.jp.nec.com> <20130318154057.GS10192@dhcp22.suse.cz> <1363651636-3lsf20se-mutt-n-horiguchi@ah.jp.nec.com>
<5149034A.5050907@gmail.com> <1363816793-7eq6pu0l-mutt-n-horiguchi@ah.jp.nec.com> In-Reply-To: <1363816793-7eq6pu0l-mutt-n-horiguchi@ah.jp.nec.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Naoya, On 03/21/2013 05:59 AM, Naoya Horiguchi wrote: > On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote: > ... >>>>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c >>>>> index e2df1c1..8627135 100644 >>>>> --- v3.8.orig/mm/mempolicy.c >>>>> +++ v3.8/mm/mempolicy.c >>>>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd, >>>>> return addr != end; >>>>> } >>>>> >>>>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd, >>>>> + const nodemask_t *nodes, unsigned long flags, >>>>> + void *private) >>>>> +{ >>>>> +#ifdef CONFIG_HUGETLB_PAGE >>>>> + int nid; >>>>> + struct page *page; >>>>> + >>>>> + spin_lock(&vma->vm_mm->page_table_lock); >>>>> + page = pte_page(huge_ptep_get((pte_t *)pmd)); >>>>> + spin_unlock(&vma->vm_mm->page_table_lock); >>>> I am a bit confused why page_table_lock is used here and why it doesn't >>>> cover the page usage. >>> I expected this function to do the same for pmd as check_pte_range() does >>> for pte, but the above code didn't do it. I should've put spin_unlock >>> below migrate_hugepage_add(). Sorry for the confusion. >> I still confuse! Could you explain more in details? > With the above code, check_hugetlb_pmd_range() checks page_mapcount > outside the page table lock, but mapcount can be decremented by > __unmap_hugepage_range(), so there's a race. > __unmap_hugepage_range() calls page_remove_rmap() inside page table lock, > so we can avoid this race by doing whole check_hugetlb_pmd_range()'s work > inside the page table lock. Why do you use page_table_lock instead of the split ptlock to protect the 2MB page?
> > Thanks, > Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758115Ab3CUM4e (ORCPT ); Thu, 21 Mar 2013 08:56:34 -0400 Received: from cantor2.suse.de ([195.135.220.15]:58667 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754724Ab3CUM4d (ORCPT ); Thu, 21 Mar 2013 08:56:33 -0400 Date: Thu, 21 Mar 2013 13:56:28 +0100 From: Michal Hocko To: Simon Jeons Cc: Naoya Horiguchi , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Message-ID: <20130321125628.GB6051@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <514A4B1C.6020201@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 21-03-13 07:49:48, Simon Jeons wrote: [...] 
> When I hacking arch/x86/mm/hugetlbpage.c like this, > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > index ae1aa71..87f34ee 100644 > --- a/arch/x86/mm/hugetlbpage.c > +++ b/arch/x86/mm/hugetlbpage.c > @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, > unsigned long addr, > > #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ > > -#ifdef CONFIG_X86_64 > static __init int setup_hugepagesz(char *opt) > { > unsigned long ps = memparse(opt, &opt); > if (ps == PMD_SIZE) { > hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); > - } else if (ps == PUD_SIZE && cpu_has_gbpages) { > - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); > + } else if (ps == PUD_SIZE) { > + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); > } else { > printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", > ps >> 20); > > I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. > What's the difference between these pages which I hacking and normal > huge pages? How is this related to the patch set? Please _stop_ distracting discussion to unrelated topics! Nothing personal but this is just wasting our time. 
-- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754068Ab3CUXqk (ORCPT ); Thu, 21 Mar 2013 19:46:40 -0400 Received: from mail-ie0-f171.google.com ([209.85.223.171]:47997 "EHLO mail-ie0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751943Ab3CUXqj (ORCPT ); Thu, 21 Mar 2013 19:46:39 -0400 Message-ID: <514B9BD8.9050207@gmail.com> Date: Fri, 22 Mar 2013 07:46:32 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Michal Hocko CC: Naoya Horiguchi , linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> In-Reply-To: <20130321125628.GB6051@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Michal, On 03/21/2013 08:56 PM, Michal Hocko wrote: > On Thu 21-03-13 07:49:48, Simon Jeons wrote: > [...] 
>> When I hacking arch/x86/mm/hugetlbpage.c like this, >> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >> index ae1aa71..87f34ee 100644 >> --- a/arch/x86/mm/hugetlbpage.c >> +++ b/arch/x86/mm/hugetlbpage.c >> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, >> unsigned long addr, >> >> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ >> >> -#ifdef CONFIG_X86_64 >> static __init int setup_hugepagesz(char *opt) >> { >> unsigned long ps = memparse(opt, &opt); >> if (ps == PMD_SIZE) { >> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); >> - } else if (ps == PUD_SIZE && cpu_has_gbpages) { >> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); >> + } else if (ps == PUD_SIZE) { >> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); >> } else { >> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", >> ps >> 20); >> >> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. >> What's the difference between these pages which I hacking and normal >> huge pages? > How is this related to the patch set? > Please _stop_ distracting discussion to unrelated topics! > > Nothing personal but this is just wasting our time. Sorry, Michal, my bad. Btw, could you kindly answer this question for me? I'm very sorry to waste your time.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763256Ab3DDE5j (ORCPT ); Thu, 4 Apr 2013 00:57:39 -0400 Received: from mail-oa0-f41.google.com ([209.85.219.41]:33367 "EHLO mail-oa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1763197Ab3DDE5i (ORCPT ); Thu, 4 Apr 2013 00:57:38 -0400 Message-ID: <515D083A.4010704@gmail.com> Date: Thu, 04 Apr 2013 12:57:30 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Naoya Horiguchi CC: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/9] migrate: add migrate_entry_wait_huge() References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1361475708-25991-2-git-send-email-n-horiguchi@ah.jp.nec.com> <5148FB6C.4070202@gmail.com> <1363816399-c6e7mofc-mutt-n-horiguchi@ah.jp.nec.com> <514A4815.4040206@gmail.com> In-Reply-To: <514A4815.4040206@gmail.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Ping! On 03/21/2013 07:36 AM, Simon Jeons wrote: > Hi Naoya, > On 03/21/2013 05:53 AM, Naoya Horiguchi wrote: >> On Wed, Mar 20, 2013 at 07:57:32AM +0800, Simon Jeons wrote: >>> Hi Naoya, >>> On 02/22/2013 03:41 AM, Naoya Horiguchi wrote: >>>> When we have a page fault for the address which is backed by a hugepage >>>> under migration, the kernel can't wait correctly until the migration >>>> finishes. This is because pte_offset_map_lock() can't get a correct >>> It seems that current hugetlb_fault still wait hugetlb page under >>> migration, how can it work without lock 2MB memory? >> Hugetlb_fault() does call migration_entry_wait(), but returns immediately. 
> Could you point out to me which code in function migration_entry_wait() > lead to return immediately? > >> So page fault happens over and over again until the migration completes. >> IOW, migration_entry_wait() is now broken for hugepage and doesn't work >> as expected. >> >> Thanks, >> Naoya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1765928Ab3DEBPH (ORCPT ); Thu, 4 Apr 2013 21:15:07 -0400 Received: from mail-oa0-f41.google.com ([209.85.219.41]:42586 "EHLO mail-oa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1765909Ab3DEBPF (ORCPT ); Thu, 4 Apr 2013 21:15:05 -0400 Message-ID: <515E2592.7020607@gmail.com> Date: Fri, 05 Apr 2013 09:14:58 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Michal Hocko CC: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> In-Reply-To: <20130322081532.GC31457@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Michal, On 03/22/2013 04:15 PM, Michal Hocko wrote: > [getting off-list] > > On Fri 22-03-13 07:46:32, Simon Jeons wrote: >> Hi Michal, >> On 03/21/2013 08:56 PM, Michal Hocko wrote: >>> On Thu 21-03-13 07:49:48, Simon Jeons wrote: >>> [...] 
>>>> When I hacking arch/x86/mm/hugetlbpage.c like this, >>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >>>> index ae1aa71..87f34ee 100644 >>>> --- a/arch/x86/mm/hugetlbpage.c >>>> +++ b/arch/x86/mm/hugetlbpage.c >>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, >>>> unsigned long addr, >>>> >>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ >>>> >>>> -#ifdef CONFIG_X86_64 >>>> static __init int setup_hugepagesz(char *opt) >>>> { >>>> unsigned long ps = memparse(opt, &opt); >>>> if (ps == PMD_SIZE) { >>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); >>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) { >>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); >>>> + } else if (ps == PUD_SIZE) { >>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); >>>> } else { >>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", >>>> ps >> 20); >>>> >>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. >>>> What's the difference between these pages which I hacking and normal >>>> huge pages? >>> How is this related to the patch set? >>> Please _stop_ distracting discussion to unrelated topics! >>> >>> Nothing personal but this is just wasting our time. >> Sorry kindly Michal, my bad. >> Btw, could you explain this question for me? very sorry waste your time. > Your CPU has to support GB pages. You have removed cpu_has_gbpages test > and added a hstate for order 13 pages which is a weird number on its > own (32MB) because there is no page table level to support them. But after the hack, /sys/kernel/mm/hugepages/hugepages-* exists and shows the same number of 32MB huge pages that I requested in the boot parameters. If there is no page table level to support them, how can they be present? The hack works for me on Ubuntu, but not on Fedora.
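The 32MB figure discussed above follows from the order arithmetic in the hacked branch. Here is a minimal sketch, assuming the usual x86-64 values for 4KB base pages (PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30), which are not stated in the thread; the helper names are hypothetical and are not kernel functions.

```c
#include <assert.h>

/* Assumed x86-64 constants for 4KB base pages; not stated in the thread. */
enum { PAGE_SHIFT = 12, PMD_SHIFT = 21, PUD_SHIFT = 30 };

/* Hypothetical helper: size in bytes of one huge page of the given order. */
static unsigned long long hstate_size_bytes(unsigned int order)
{
    assert(order + PAGE_SHIFT < 64); /* keep the shift well defined */
    return 1ULL << (order + PAGE_SHIFT);
}

/* Order registered by the hacked branch: PMD_SHIFT - PAGE_SHIFT + 4. */
static unsigned int hacked_order(void)
{
    return PMD_SHIFT - PAGE_SHIFT + 4;
}

/* Order the unmodified PUD_SIZE branch would have registered. */
static unsigned int gbpage_order(void)
{
    return PUD_SHIFT - PAGE_SHIFT;
}
```

Under these assumptions hacked_order() is 13, so each page is 2^(13+12) bytes = 32MB, whereas gbpage_order() is 18, which gives 1GB. An order-9 page (PMD_SHIFT - PAGE_SHIFT) is the regular 2MB huge page that a pmd entry maps.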
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161884Ab3DEIIl (ORCPT ); Fri, 5 Apr 2013 04:08:41 -0400 Received: from cantor2.suse.de ([195.135.220.15]:45430 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161874Ab3DEIIc (ORCPT ); Fri, 5 Apr 2013 04:08:32 -0400 Date: Fri, 5 Apr 2013 10:08:28 +0200 From: Michal Hocko To: Simon Jeons Cc: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Message-ID: <20130405080828.GA14882@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <515E2592.7020607@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 05-04-13 09:14:58, Simon Jeons wrote: > Hi Michal, > On 03/22/2013 04:15 PM, Michal Hocko wrote: > >[getting off-list] > > > >On Fri 22-03-13 07:46:32, Simon Jeons wrote: > >>Hi Michal, > >>On 03/21/2013 08:56 PM, Michal Hocko wrote: > >>>On Thu 21-03-13 07:49:48, Simon Jeons wrote: > >>>[...] 
> >>>>When I hacking arch/x86/mm/hugetlbpage.c like this, > >>>>diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > >>>>index ae1aa71..87f34ee 100644 > >>>>--- a/arch/x86/mm/hugetlbpage.c > >>>>+++ b/arch/x86/mm/hugetlbpage.c > >>>>@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, > >>>>unsigned long addr, > >>>> > >>>>#endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ > >>>> > >>>>-#ifdef CONFIG_X86_64 > >>>>static __init int setup_hugepagesz(char *opt) > >>>>{ > >>>>unsigned long ps = memparse(opt, &opt); > >>>>if (ps == PMD_SIZE) { > >>>>hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); > >>>>- } else if (ps == PUD_SIZE && cpu_has_gbpages) { > >>>>- hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); > >>>>+ } else if (ps == PUD_SIZE) { > >>>>+ hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); > >>>>} else { > >>>>printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", > >>>>ps >> 20); > >>>> > >>>>I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. > >>>>What's the difference between these pages which I hacking and normal > >>>>huge pages? > >>>How is this related to the patch set? > >>>Please _stop_ distracting discussion to unrelated topics! > >>> > >>>Nothing personal but this is just wasting our time. > >>Sorry kindly Michal, my bad. > >>Btw, could you explain this question for me? very sorry waste your time. > >Your CPU has to support GB pages. You have removed cpu_has_gbpages test > >and added a hstate for order 13 pages which is a weird number on its > >own (32MB) because there is no page table level to support them. > > But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, > and have equal number of 32MB huge pages which I set up in boot > parameter. because hugetlb_add_hstate creates hstate for those pages and hugetlb_init_hstates allocates them later on. > If there is no page table level to support them, how can > them present? 
Because the hugetlb hstate handling code doesn't care about page tables or how those pages are going to be mapped _at all_. Or, to put it another way: nobody prevents you from allocating an order-5 page for a single pte, but that would be a pure waste. The page fault code expects that pages of the proper size are allocated. -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752196Ab3DEJBK (ORCPT ); Fri, 5 Apr 2013 05:01:10 -0400 Received: from mail-pd0-f180.google.com ([209.85.192.180]:61987 "EHLO mail-pd0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750723Ab3DEJBG (ORCPT ); Fri, 5 Apr 2013 05:01:06 -0400 Message-ID: <515E92CA.4000507@gmail.com> Date: Fri, 05 Apr 2013 17:00:58 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Michal Hocko CC: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> In-Reply-To: <20130405080828.GA14882@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Michal, On 04/05/2013 04:08 PM, Michal Hocko wrote: > On Fri 05-04-13 09:14:58, Simon Jeons wrote: >> Hi Michal, >> On 03/22/2013 04:15 PM, Michal Hocko wrote: >>> [getting off-list] >>> >>> On Fri 22-03-13
07:46:32, Simon Jeons wrote: >>>> Hi Michal, >>>> On 03/21/2013 08:56 PM, Michal Hocko wrote: >>>>> On Thu 21-03-13 07:49:48, Simon Jeons wrote: >>>>> [...] >>>>>> When I hacking arch/x86/mm/hugetlbpage.c like this, >>>>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c >>>>>> index ae1aa71..87f34ee 100644 >>>>>> --- a/arch/x86/mm/hugetlbpage.c >>>>>> +++ b/arch/x86/mm/hugetlbpage.c >>>>>> @@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, >>>>>> unsigned long addr, >>>>>> >>>>>> #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ >>>>>> >>>>>> -#ifdef CONFIG_X86_64 >>>>>> static __init int setup_hugepagesz(char *opt) >>>>>> { >>>>>> unsigned long ps = memparse(opt, &opt); >>>>>> if (ps == PMD_SIZE) { >>>>>> hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); >>>>>> - } else if (ps == PUD_SIZE && cpu_has_gbpages) { >>>>>> - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); >>>>>> + } else if (ps == PUD_SIZE) { >>>>>> + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); >>>>>> } else { >>>>>> printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", >>>>>> ps >> 20); >>>>>> >>>>>> I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. >>>>>> What's the difference between these pages which I hacking and normal >>>>>> huge pages? >>>>> How is this related to the patch set? >>>>> Please _stop_ distracting discussion to unrelated topics! >>>>> >>>>> Nothing personal but this is just wasting our time. >>>> Sorry kindly Michal, my bad. >>>> Btw, could you explain this question for me? very sorry waste your time. >>> Your CPU has to support GB pages. You have removed cpu_has_gbpages test >>> and added a hstate for order 13 pages which is a weird number on its >>> own (32MB) because there is no page table level to support them. >> But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, >> and have equal number of 32MB huge pages which I set up in boot >> parameter. 
> because hugetlb_add_hstate creates hstate for those pages and > hugetlb_init_hstates allocates them later on. > >> If there is no page table level to support them, how can >> them present? > Because hugetlb hstate handling code doesn't care about page tables and > the way how those pages are going to be mapped _at all_. Or put it in > another way. Nobody prevents you to allocate order-5 page for a single > pte but that would be a pure waste. Page fault code expects that pages > with a proper size are allocated. Do you mean 32MB pages will map to one pmd which should map 2MB pages? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161315Ab3DEJaj (ORCPT ); Fri, 5 Apr 2013 05:30:39 -0400 Received: from cantor2.suse.de ([195.135.220.15]:48267 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161203Ab3DEJah (ORCPT ); Fri, 5 Apr 2013 05:30:37 -0400 Date: Fri, 5 Apr 2013 11:30:34 +0200 From: Michal Hocko To: Simon Jeons Cc: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Subject: Re: [RFC][PATCH 0/9] extend hugepage migration Message-ID: <20130405093034.GB31132@dhcp22.suse.cz> References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> <515E92CA.4000507@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <515E92CA.4000507@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 05-04-13 17:00:58, 
Simon Jeons wrote: > Hi Michal, > On 04/05/2013 04:08 PM, Michal Hocko wrote: > >On Fri 05-04-13 09:14:58, Simon Jeons wrote: > >>Hi Michal, > >>On 03/22/2013 04:15 PM, Michal Hocko wrote: > >>>[getting off-list] > >>> > >>>On Fri 22-03-13 07:46:32, Simon Jeons wrote: > >>>>Hi Michal, > >>>>On 03/21/2013 08:56 PM, Michal Hocko wrote: > >>>>>On Thu 21-03-13 07:49:48, Simon Jeons wrote: > >>>>>[...] > >>>>>>When I hacking arch/x86/mm/hugetlbpage.c like this, > >>>>>>diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > >>>>>>index ae1aa71..87f34ee 100644 > >>>>>>--- a/arch/x86/mm/hugetlbpage.c > >>>>>>+++ b/arch/x86/mm/hugetlbpage.c > >>>>>>@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, > >>>>>>unsigned long addr, > >>>>>> > >>>>>>#endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/ > >>>>>> > >>>>>>-#ifdef CONFIG_X86_64 > >>>>>>static __init int setup_hugepagesz(char *opt) > >>>>>>{ > >>>>>>unsigned long ps = memparse(opt, &opt); > >>>>>>if (ps == PMD_SIZE) { > >>>>>>hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); > >>>>>>- } else if (ps == PUD_SIZE && cpu_has_gbpages) { > >>>>>>- hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); > >>>>>>+ } else if (ps == PUD_SIZE) { > >>>>>>+ hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4); > >>>>>>} else { > >>>>>>printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", > >>>>>>ps >> 20); > >>>>>> > >>>>>>I set boot=hugepagesz=1G hugepages=10, then I got 10 32MB huge pages. > >>>>>>What's the difference between these pages which I hacking and normal > >>>>>>huge pages? > >>>>>How is this related to the patch set? > >>>>>Please _stop_ distracting discussion to unrelated topics! > >>>>> > >>>>>Nothing personal but this is just wasting our time. > >>>>Sorry kindly Michal, my bad. > >>>>Btw, could you explain this question for me? very sorry waste your time. > >>>Your CPU has to support GB pages. 
You have removed cpu_has_gbpages test > >>>and added a hstate for order 13 pages which is a weird number on its > >>>own (32MB) because there is no page table level to support them. > >>But after hacking, there is /sys/kernel/mm/hugepages/hugepages-*, > >>and have equal number of 32MB huge pages which I set up in boot > >>parameter. > >because hugetlb_add_hstate creates hstate for those pages and > >hugetlb_init_hstates allocates them later on. > > > >>If there is no page table level to support them, how can > >>them present? > >Because hugetlb hstate handling code doesn't care about page tables and > >the way how those pages are going to be mapped _at all_. Or put it in > >another way. Nobody prevents you to allocate order-5 page for a single > >pte but that would be a pure waste. Page fault code expects that pages > >with a proper size are allocated. > Do you mean 32MB pages will map to one pmd which should map 2MB pages? > Please refer to hugetlb_fault for more information. -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161232Ab3DGAci (ORCPT ); Sat, 6 Apr 2013 20:32:38 -0400 Received: from mail-ob0-f170.google.com ([209.85.214.170]:41189 "EHLO mail-ob0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759597Ab3DGAch (ORCPT ); Sat, 6 Apr 2013 20:32:37 -0400 Message-ID: <5160BE9E.1050905@gmail.com> Date: Sun, 07 Apr 2013 08:32:30 +0800 From: Simon Jeons User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: Michal Hocko CC: Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> 
<1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> <515E92CA.4000507@gmail.com> <20130405093034.GB31132@dhcp22.suse.cz> In-Reply-To: <20130405093034.GB31132@dhcp22.suse.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Michal, On 04/05/2013 05:30 PM, Michal Hocko wrote: > On Fri 05-04-13 17:00:58, Simon Jeons wrote: >> Hi Michal, >> On 04/05/2013 04:08 PM, Michal Hocko wrote: >>> On Fri 05-04-13 09:14:58, Simon Jeons wrote: >>>> Hi Michal, >>>> On 03/22/2013 04:15 PM, Michal Hocko wrote: >>>>> [getting off-list] >>>>> >>>>> On Fri 22-03-13 07:46:32, Simon Jeons wrote: >>>>>> Hi Michal, >>>>>> On 03/21/2013 08:56 PM, Michal Hocko wrote: >>>>>>> On Thu 21-03-13 07:49:48, Simon Jeons wrote: >>>>>>> [...] 
>>> >>>> If there is no page table level to support them, how can >>>> them present? >>> Because hugetlb hstate handling code doesn't care about page tables and >>> the way how those pages are going to be mapped _at all_. Or put it in >>> another way. Nobody prevents you to allocate order-5 page for a single >>> pte but that would be a pure waste. Page fault code expects that pages >>> with a proper size are allocated. >> Do you mean 32MB pages will map to one pmd which should map 2MB pages? >> > Please refer to hugetlb_fault for more information. Thanks for your pointing out. So my assume is correct, is it? Can pmd which support 2MB map 32MB pages work well? > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933793Ab3DGOFl (ORCPT ); Sun, 7 Apr 2013 10:05:41 -0400 Received: from mail-vc0-f176.google.com ([209.85.220.176]:53259 "EHLO mail-vc0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933717Ab3DGOFk (ORCPT ); Sun, 7 Apr 2013 10:05:40 -0400 Message-ID: <51617D37.1020502@gmail.com> Date: Sun, 07 Apr 2013 10:05:43 -0400 From: KOSAKI Motohiro User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Simon Jeons CC: Michal Hocko , Linux Memory Management List , Andrew Morton , Mel Gorman , Hugh Dickins , KOSAKI Motohiro , Andi Kleen , Linux kernel Mailing List , Naoya Horiguchi , David Rientjes , kosaki.motohiro@gmail.com Subject: Re: [RFC][PATCH 0/9] extend hugepage migration References: <1361475708-25991-1-git-send-email-n-horiguchi@ah.jp.nec.com> <5148F830.3070601@gmail.com> <1363815326-urchkyxr-mutt-n-horiguchi@ah.jp.nec.com> <514A4B1C.6020201@gmail.com> <20130321125628.GB6051@dhcp22.suse.cz> <514B9BD8.9050207@gmail.com> <20130322081532.GC31457@dhcp22.suse.cz> <515E2592.7020607@gmail.com> <20130405080828.GA14882@dhcp22.suse.cz> <515E92CA.4000507@gmail.com> <20130405093034.GB31132@dhcp22.suse.cz> 
<5160BE9E.1050905@gmail.com> In-Reply-To: <5160BE9E.1050905@gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org >> Please refer to hugetlb_fault for more information. > > Thanks for your pointing out. So my assume is correct, is it? Can pmd > which support 2MB map 32MB pages work well? Simon, Please stop hijacking unrelated threads. This is not a question-and-answer thread.