linux-mm.kvack.org archive mirror
* [PATCHv2 0/4] thp: simplify freeze_page() and unfreeze_page()
@ 2016-03-07 11:57 Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 1/4] rmap: introduce rmap_walk_locked() Kirill A. Shutemov
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2016-03-07 11:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, linux-mm, Kirill A. Shutemov

This patchset rewrites freeze_page() and unfreeze_page() using try_to_unmap()
and remove_migration_ptes(). The result is much simpler, but somewhat slower.

Compared to v1, I've recovered most of the performance for PMD-mapped THPs
with a few shortcuts.

Migrating 8GiB worth of PMD-mapped THP:

Baseline	20.21 ± 0.393
Patched		20.73 ± 0.082
Slowdown	1.03x

It's 3% slower, compared to 14% in v1. I don't think it should be a stopper.

Splitting of PTE-mapped pages slowed down more, but that is a much less
common case.

Migrating 8GiB worth of PTE-mapped THP:

Baseline	20.39 ± 0.225
Patched		22.43 ± 0.496
Slowdown	1.10x

Please, consider applying.

Kirill A. Shutemov (4):
  rmap: introduce rmap_walk_locked()
  rmap: extend try_to_unmap() to be usable by split_huge_page()
  mm: make remove_migration_ptes() beyond mm/migration.c
  thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers

 include/linux/huge_mm.h |  13 ++-
 include/linux/rmap.h    |   6 ++
 mm/huge_memory.c        | 204 +++++++-----------------------------------------
 mm/migrate.c            |  15 ++--
 mm/rmap.c               |  70 +++++++++++++----
 5 files changed, 106 insertions(+), 202 deletions(-)

-- 
2.7.0


* [PATCHv2 1/4] rmap: introduce rmap_walk_locked()
  2016-03-07 11:57 [PATCHv2 0/4] thp: simplify freeze_page() and unfreeze_page() Kirill A. Shutemov
@ 2016-03-07 11:57 ` Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 2/4] rmap: extend try_to_unmap() to be usable by split_huge_page() Kirill A. Shutemov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2016-03-07 11:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, linux-mm, Kirill A. Shutemov

rmap_walk_locked() is the same as rmap_walk(), but the caller takes care
of the relevant rmap lock.

This is preparation for switching THP splitting from the custom rmap walk
in freeze_page()/unfreeze_page() to the generic one.

No support for KSM pages for now: it's not clear which lock is implied.
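
For illustration, a caller that already holds the anon_vma lock (the
anonymous-page case) would use it roughly like this; a minimal sketch, not
part of this patch, with a hypothetical callback and argument:

	struct rmap_walk_control rwc = {
		.rmap_one = my_rmap_one,	/* hypothetical per-VMA callback */
		.arg = my_arg,
	};

	anon_vma_lock_read(anon_vma);	/* rmap lock taken by the caller */
	rmap_walk_locked(page, &rwc);	/* walk without re-taking the lock */
	anon_vma_unlock_read(anon_vma);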

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  1 +
 mm/rmap.c            | 41 ++++++++++++++++++++++++++++++++---------
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index a07f42bedda3..a5875e9b4a27 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -266,6 +266,7 @@ struct rmap_walk_control {
 };
 
 int rmap_walk(struct page *page, struct rmap_walk_control *rwc);
+int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc);
 
 #else	/* !CONFIG_MMU */
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 02f0bfc3c80a..30b739ce0ffa 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1715,14 +1715,21 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
  * vm_flags for that VMA.  That should be OK, because that vma shouldn't be
  * LOCKED.
  */
-static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
+static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
+		bool locked)
 {
 	struct anon_vma *anon_vma;
 	pgoff_t pgoff;
 	struct anon_vma_chain *avc;
 	int ret = SWAP_AGAIN;
 
-	anon_vma = rmap_walk_anon_lock(page, rwc);
+	if (locked) {
+		anon_vma = page_anon_vma(page);
+		/* anon_vma disappear under us? */
+		VM_BUG_ON_PAGE(!anon_vma, page);
+	} else {
+		anon_vma = rmap_walk_anon_lock(page, rwc);
+	}
 	if (!anon_vma)
 		return ret;
 
@@ -1742,7 +1749,9 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
 		if (rwc->done && rwc->done(page))
 			break;
 	}
-	anon_vma_unlock_read(anon_vma);
+
+	if (!locked)
+		anon_vma_unlock_read(anon_vma);
 	return ret;
 }
 
@@ -1759,9 +1768,10 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
  * vm_flags for that VMA.  That should be OK, because that vma shouldn't be
  * LOCKED.
  */
-static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
+static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
+		bool locked)
 {
-	struct address_space *mapping = page->mapping;
+	struct address_space *mapping = page_mapping(page);
 	pgoff_t pgoff;
 	struct vm_area_struct *vma;
 	int ret = SWAP_AGAIN;
@@ -1778,7 +1788,8 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 		return ret;
 
 	pgoff = page_to_pgoff(page);
-	i_mmap_lock_read(mapping);
+	if (!locked)
+		i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
 		unsigned long address = vma_address(page, vma);
 
@@ -1795,7 +1806,8 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
 	}
 
 done:
-	i_mmap_unlock_read(mapping);
+	if (!locked)
+		i_mmap_unlock_read(mapping);
 	return ret;
 }
 
@@ -1804,9 +1816,20 @@ int rmap_walk(struct page *page, struct rmap_walk_control *rwc)
 	if (unlikely(PageKsm(page)))
 		return rmap_walk_ksm(page, rwc);
 	else if (PageAnon(page))
-		return rmap_walk_anon(page, rwc);
+		return rmap_walk_anon(page, rwc, false);
+	else
+		return rmap_walk_file(page, rwc, false);
+}
+
+/* Like rmap_walk, but caller holds relevant rmap lock */
+int rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc)
+{
+	/* no ksm support for now */
+	VM_BUG_ON_PAGE(PageKsm(page), page);
+	if (PageAnon(page))
+		return rmap_walk_anon(page, rwc, true);
 	else
-		return rmap_walk_file(page, rwc);
+		return rmap_walk_file(page, rwc, true);
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-- 
2.7.0


* [PATCHv2 2/4] rmap: extend try_to_unmap() to be usable by split_huge_page()
  2016-03-07 11:57 [PATCHv2 0/4] thp: simplify freeze_page() and unfreeze_page() Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 1/4] rmap: introduce rmap_walk_locked() Kirill A. Shutemov
@ 2016-03-07 11:57 ` Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 3/4] mm: make remove_migration_ptes() beyond mm/migration.c Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers Kirill A. Shutemov
  3 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2016-03-07 11:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, linux-mm, Kirill A. Shutemov

The patch adds support for two ttu_flags:

  - TTU_SPLIT_HUGE_PMD splits the PMD, if one is present, before trying
    to unmap the page;

  - TTU_RMAP_LOCKED indicates that the caller holds the relevant rmap
    lock.

Apart from these flags, the patch changes rwc->done to !page_mapcount()
instead of !page_mapped(). try_to_unmap() works at the PTE level, so we
are really interested in whether this small page is mapped, not the
compound page it's part of.
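
For context, this is how the new freeze_page() in the last patch of the
series ends up combining the flags (the per-subpage loop is omitted here;
the caller holds the page lock and the anon_vma lock):

	enum ttu_flags ttu_flags = TTU_MIGRATION | TTU_IGNORE_MLOCK |
		TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED;

	/* TTU_SPLIT_HUGE_PMD is only needed once, for the head page */
	ret = try_to_unmap(page, ttu_flags | TTU_SPLIT_HUGE_PMD);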

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/huge_mm.h |  7 +++++++
 include/linux/rmap.h    |  3 +++
 mm/huge_memory.c        |  5 +----
 mm/rmap.c               | 24 ++++++++++++++++--------
 4 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 459fd25b378e..c47067151ffd 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -111,6 +111,9 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			__split_huge_pmd(__vma, __pmd, __address);	\
 	}  while (0)
 
+
+void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address);
+
 #if HPAGE_PMD_ORDER >= MAX_ORDER
 #error "hugepages can't be allocated by the buddy allocator"
 #endif
@@ -178,6 +181,10 @@ static inline int split_huge_page(struct page *page)
 static inline void deferred_split_huge_page(struct page *page) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
+
+static inline void split_huge_pmd_address(struct vm_area_struct *vma,
+		unsigned long address) {}
+
 static inline int hugepage_madvise(struct vm_area_struct *vma,
 				   unsigned long *vm_flags, int advice)
 {
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index a5875e9b4a27..3d975e2252d4 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -86,6 +86,7 @@ enum ttu_flags {
 	TTU_MIGRATION = 2,		/* migration mode */
 	TTU_MUNLOCK = 4,		/* munlock mode */
 	TTU_LZFREE = 8,			/* lazy free mode */
+	TTU_SPLIT_HUGE_PMD = 16,	/* split huge PMD if any */
 
 	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */
 	TTU_IGNORE_ACCESS = (1 << 9),	/* don't age */
@@ -93,6 +94,8 @@ enum ttu_flags {
 	TTU_BATCH_FLUSH = (1 << 11),	/* Batch TLB flushes where possible
 					 * and caller guarantees they will
 					 * do a final flush if necessary */
+	TTU_RMAP_LOCKED = (1 << 12)	/* do not grab rmap lock:
+					 * caller holds it */
 };
 
 #ifdef CONFIG_MMU
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 09de368c742b..354464b484a7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3049,15 +3049,12 @@ out:
 	}
 }
 
-static void split_huge_pmd_address(struct vm_area_struct *vma,
-				    unsigned long address)
+void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 
-	VM_BUG_ON(!(address & ~HPAGE_PMD_MASK));
-
 	pgd = pgd_offset(vma->vm_mm, address);
 	if (!pgd_present(*pgd))
 		return;
diff --git a/mm/rmap.c b/mm/rmap.c
index 30b739ce0ffa..945933a01010 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1431,6 +1431,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		goto out;
 
+	if (flags & TTU_SPLIT_HUGE_PMD)
+		split_huge_pmd_address(vma, address);
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
@@ -1576,10 +1578,10 @@ static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg)
 	return is_vma_temporary_stack(vma);
 }
 
-static int page_not_mapped(struct page *page)
+static int page_mapcount_is_zero(struct page *page)
 {
-	return !page_mapped(page);
-};
+	return !page_mapcount(page);
+}
 
 /**
  * try_to_unmap - try to remove all page table mappings to a page
@@ -1606,12 +1608,10 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 	struct rmap_walk_control rwc = {
 		.rmap_one = try_to_unmap_one,
 		.arg = &rp,
-		.done = page_not_mapped,
+		.done = page_mapcount_is_zero,
 		.anon_lock = page_lock_anon_vma_read,
 	};
 
-	VM_BUG_ON_PAGE(!PageHuge(page) && PageTransHuge(page), page);
-
 	/*
 	 * During exec, a temporary VMA is setup and later moved.
 	 * The VMA is moved under the anon_vma lock but not the
@@ -1623,9 +1623,12 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 	if ((flags & TTU_MIGRATION) && !PageKsm(page) && PageAnon(page))
 		rwc.invalid_vma = invalid_migration_vma;
 
-	ret = rmap_walk(page, &rwc);
+	if (flags & TTU_RMAP_LOCKED)
+		ret = rmap_walk_locked(page, &rwc);
+	else
+		ret = rmap_walk(page, &rwc);
 
-	if (ret != SWAP_MLOCK && !page_mapped(page)) {
+	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
 		ret = SWAP_SUCCESS;
 		if (rp.lazyfreed && !PageDirty(page))
 			ret = SWAP_LZFREE;
@@ -1633,6 +1636,11 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 	return ret;
 }
 
+static int page_not_mapped(struct page *page)
+{
+	return !page_mapped(page);
+};
+
 /**
  * try_to_munlock - try to munlock a page
  * @page: the page to be munlocked
-- 
2.7.0


* [PATCHv2 3/4] mm: make remove_migration_ptes() beyond mm/migration.c
  2016-03-07 11:57 [PATCHv2 0/4] thp: simplify freeze_page() and unfreeze_page() Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 1/4] rmap: introduce rmap_walk_locked() Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 2/4] rmap: extend try_to_unmap() to be usable by split_huge_page() Kirill A. Shutemov
@ 2016-03-07 11:57 ` Kirill A. Shutemov
  2016-03-07 11:57 ` [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers Kirill A. Shutemov
  3 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2016-03-07 11:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, linux-mm, Kirill A. Shutemov

The patch makes remove_migration_ptes() available for use in
split_huge_page().

A new parameter 'locked' is added: as with try_to_unmap(), we need a way
to indicate that the caller holds the rmap lock.

We also shouldn't try to mlock() PTE-mapped huge pages: PTE-mapped THP
pages are never mlocked.
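
For context, the new unfreeze_page() in the last patch of the series ends
up calling it with 'locked' set, once per small page of the THP:

	/* caller holds the anon_vma lock */
	for (i = 0; i < HPAGE_PMD_NR; i++)
		remove_migration_ptes(page + i, page + i, true);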

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/rmap.h |  2 ++
 mm/migrate.c         | 15 +++++++++------
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 3d975e2252d4..49eb4f8ebac9 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -243,6 +243,8 @@ int page_mkclean(struct page *);
  */
 int try_to_munlock(struct page *);
 
+void remove_migration_ptes(struct page *old, struct page *new, bool locked);
+
 /*
  * Called by memory-failure.c to kill processes.
  */
diff --git a/mm/migrate.c b/mm/migrate.c
index 577c94b8e959..6c822a7b27e0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -172,7 +172,7 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
 	else
 		page_add_file_rmap(new);
 
-	if (vma->vm_flags & VM_LOCKED)
+	if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
 		mlock_vma_page(new);
 
 	/* No need to invalidate - it was non-present before */
@@ -187,14 +187,17 @@ out:
  * Get rid of all migration entries and replace them by
  * references to the indicated page.
  */
-static void remove_migration_ptes(struct page *old, struct page *new)
+void remove_migration_ptes(struct page *old, struct page *new, bool locked)
 {
 	struct rmap_walk_control rwc = {
 		.rmap_one = remove_migration_pte,
 		.arg = old,
 	};
 
-	rmap_walk(new, &rwc);
+	if (locked)
+		rmap_walk_locked(new, &rwc);
+	else
+		rmap_walk(new, &rwc);
 }
 
 /*
@@ -702,7 +705,7 @@ static int writeout(struct address_space *mapping, struct page *page)
 	 * At this point we know that the migration attempt cannot
 	 * be successful.
 	 */
-	remove_migration_ptes(page, page);
+	remove_migration_ptes(page, page, false);
 
 	rc = mapping->a_ops->writepage(page, &wbc);
 
@@ -900,7 +903,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 
 	if (page_was_mapped)
 		remove_migration_ptes(page,
-			rc == MIGRATEPAGE_SUCCESS ? newpage : page);
+			rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
 
 out_unlock_both:
 	unlock_page(newpage);
@@ -1070,7 +1073,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	if (page_was_mapped)
 		remove_migration_ptes(hpage,
-			rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage);
+			rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage, false);
 
 	unlock_page(new_hpage);
 
-- 
2.7.0


* [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-03-07 11:57 [PATCHv2 0/4] thp: simplify freeze_page() and unfreeze_page() Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2016-03-07 11:57 ` [PATCHv2 3/4] mm: make remove_migration_ptes() beyond mm/migration.c Kirill A. Shutemov
@ 2016-03-07 11:57 ` Kirill A. Shutemov
  2016-03-11  9:42   ` Kirill A. Shutemov
  2016-04-19 13:55   ` Sasha Levin
  3 siblings, 2 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2016-03-07 11:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, linux-mm, Kirill A. Shutemov

The freeze_page() and unfreeze_page() helpers have evolved into rather
complex beasts. It would be nice to cut the complexity of this code.

This patch rewrites freeze_page() using the standard try_to_unmap().
unfreeze_page() is rewritten with remove_migration_ptes().

The result is much simpler.

But the new variant is somewhat slower for PTE-mapped THPs. The current
helpers iterate over the VMAs the compound page is mapped to, and then
over the PTEs within each VMA. The new helpers iterate over each small
page, then over the VMAs the small page is mapped to, and only then find
the relevant PTE.

We have a shortcut for PMD-mapped THPs: migration entries are installed
directly on PMD split.

I don't think the slowdown is critical, considering how much simpler the
result is and that split_huge_page() is quite rare nowadays. It only
happens due to memory pressure or migration.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/huge_mm.h |  10 ++-
 mm/huge_memory.c        | 201 +++++++-----------------------------------------
 mm/rmap.c               |   9 ++-
 3 files changed, 40 insertions(+), 180 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c47067151ffd..74271f9e3c03 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -101,18 +101,20 @@ static inline int split_huge_page(struct page *page)
 void deferred_split_huge_page(struct page *page);
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long address);
+		unsigned long address, bool freeze);
 
 #define split_huge_pmd(__vma, __pmd, __address)				\
 	do {								\
 		pmd_t *____pmd = (__pmd);				\
 		if (pmd_trans_huge(*____pmd)				\
 					|| pmd_devmap(*____pmd))	\
-			__split_huge_pmd(__vma, __pmd, __address);	\
+			__split_huge_pmd(__vma, __pmd, __address,	\
+						false);			\
 	}  while (0)
 
 
-void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address);
+void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
+		bool freeze);
 
 #if HPAGE_PMD_ORDER >= MAX_ORDER
 #error "hugepages can't be allocated by the buddy allocator"
@@ -183,7 +185,7 @@ static inline void deferred_split_huge_page(struct page *page) {}
 	do { } while (0)
 
 static inline void split_huge_pmd_address(struct vm_area_struct *vma,
-		unsigned long address) {}
+		unsigned long address, bool freeze) {}
 
 static inline int hugepage_madvise(struct vm_area_struct *vma,
 				   unsigned long *vm_flags, int advice)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 354464b484a7..1f18687b96d3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3020,7 +3020,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 }
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long address)
+		unsigned long address, bool freeze)
 {
 	spinlock_t *ptl;
 	struct mm_struct *mm = vma->vm_mm;
@@ -3037,7 +3037,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			page = NULL;
 	} else if (!pmd_devmap(*pmd))
 		goto out;
-	__split_huge_pmd_locked(vma, pmd, haddr, false);
+	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
 out:
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(mm, haddr, haddr + HPAGE_PMD_SIZE);
@@ -3049,7 +3049,8 @@ out:
 	}
 }
 
-void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address)
+void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
+		bool freeze)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -3070,7 +3071,7 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address)
 	 * Caller holds the mmap_sem write mode, so a huge pmd cannot
 	 * materialize from under us.
 	 */
-	split_huge_pmd(vma, pmd, address);
+	__split_huge_pmd(vma, pmd, address, freeze);
 }
 
 void vma_adjust_trans_huge(struct vm_area_struct *vma,
@@ -3086,7 +3087,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 	if (start & ~HPAGE_PMD_MASK &&
 	    (start & HPAGE_PMD_MASK) >= vma->vm_start &&
 	    (start & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end)
-		split_huge_pmd_address(vma, start);
+		split_huge_pmd_address(vma, start, false);
 
 	/*
 	 * If the new end address isn't hpage aligned and it could
@@ -3096,7 +3097,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 	if (end & ~HPAGE_PMD_MASK &&
 	    (end & HPAGE_PMD_MASK) >= vma->vm_start &&
 	    (end & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end)
-		split_huge_pmd_address(vma, end);
+		split_huge_pmd_address(vma, end, false);
 
 	/*
 	 * If we're also updating the vma->vm_next->vm_start, if the new
@@ -3110,184 +3111,36 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 		if (nstart & ~HPAGE_PMD_MASK &&
 		    (nstart & HPAGE_PMD_MASK) >= next->vm_start &&
 		    (nstart & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= next->vm_end)
-			split_huge_pmd_address(next, nstart);
+			split_huge_pmd_address(next, nstart, false);
 	}
 }
 
-static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
-		unsigned long address)
+static void freeze_page(struct page *page)
 {
-	unsigned long haddr = address & HPAGE_PMD_MASK;
-	spinlock_t *ptl;
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-	int i, nr = HPAGE_PMD_NR;
-
-	/* Skip pages which doesn't belong to the VMA */
-	if (address < vma->vm_start) {
-		int off = (vma->vm_start - address) >> PAGE_SHIFT;
-		page += off;
-		nr -= off;
-		address = vma->vm_start;
-	}
-
-	pgd = pgd_offset(vma->vm_mm, address);
-	if (!pgd_present(*pgd))
-		return;
-	pud = pud_offset(pgd, address);
-	if (!pud_present(*pud))
-		return;
-	pmd = pmd_offset(pud, address);
-	ptl = pmd_lock(vma->vm_mm, pmd);
-	if (!pmd_present(*pmd)) {
-		spin_unlock(ptl);
-		return;
-	}
-	if (pmd_trans_huge(*pmd)) {
-		if (page == pmd_page(*pmd))
-			__split_huge_pmd_locked(vma, pmd, haddr, true);
-		spin_unlock(ptl);
-		return;
-	}
-	spin_unlock(ptl);
-
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
-	for (i = 0; i < nr; i++, address += PAGE_SIZE, page++, pte++) {
-		pte_t entry, swp_pte;
-		swp_entry_t swp_entry;
-
-		/*
-		 * We've just crossed page table boundary: need to map next one.
-		 * It can happen if THP was mremaped to non PMD-aligned address.
-		 */
-		if (unlikely(address == haddr + HPAGE_PMD_SIZE)) {
-			pte_unmap_unlock(pte - 1, ptl);
-			pmd = mm_find_pmd(vma->vm_mm, address);
-			if (!pmd)
-				return;
-			pte = pte_offset_map_lock(vma->vm_mm, pmd,
-					address, &ptl);
-		}
-
-		if (!pte_present(*pte))
-			continue;
-		if (page_to_pfn(page) != pte_pfn(*pte))
-			continue;
-		flush_cache_page(vma, address, page_to_pfn(page));
-		entry = ptep_clear_flush(vma, address, pte);
-		if (pte_dirty(entry))
-			SetPageDirty(page);
-		swp_entry = make_migration_entry(page, pte_write(entry));
-		swp_pte = swp_entry_to_pte(swp_entry);
-		if (pte_soft_dirty(entry))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		set_pte_at(vma->vm_mm, address, pte, swp_pte);
-		page_remove_rmap(page, false);
-		put_page(page);
-	}
-	pte_unmap_unlock(pte - 1, ptl);
-}
-
-static void freeze_page(struct anon_vma *anon_vma, struct page *page)
-{
-	struct anon_vma_chain *avc;
-	pgoff_t pgoff = page_to_pgoff(page);
+	enum ttu_flags ttu_flags = TTU_MIGRATION | TTU_IGNORE_MLOCK |
+		TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED;
+	int i, ret;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff,
-			pgoff + HPAGE_PMD_NR - 1) {
-		unsigned long address = __vma_address(page, avc->vma);
-
-		mmu_notifier_invalidate_range_start(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-		freeze_page_vma(avc->vma, page, address);
-		mmu_notifier_invalidate_range_end(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-	}
-}
-
-static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
-		unsigned long address)
-{
-	spinlock_t *ptl;
-	pmd_t *pmd;
-	pte_t *pte, entry;
-	swp_entry_t swp_entry;
-	unsigned long haddr = address & HPAGE_PMD_MASK;
-	int i, nr = HPAGE_PMD_NR;
-
-	/* Skip pages which doesn't belong to the VMA */
-	if (address < vma->vm_start) {
-		int off = (vma->vm_start - address) >> PAGE_SHIFT;
-		page += off;
-		nr -= off;
-		address = vma->vm_start;
-	}
-
-	pmd = mm_find_pmd(vma->vm_mm, address);
-	if (!pmd)
-		return;
-
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
-	for (i = 0; i < nr; i++, address += PAGE_SIZE, page++, pte++) {
-		/*
-		 * We've just crossed page table boundary: need to map next one.
-		 * It can happen if THP was mremaped to non-PMD aligned address.
-		 */
-		if (unlikely(address == haddr + HPAGE_PMD_SIZE)) {
-			pte_unmap_unlock(pte - 1, ptl);
-			pmd = mm_find_pmd(vma->vm_mm, address);
-			if (!pmd)
-				return;
-			pte = pte_offset_map_lock(vma->vm_mm, pmd,
-					address, &ptl);
-		}
-
-		if (!is_swap_pte(*pte))
-			continue;
-
-		swp_entry = pte_to_swp_entry(*pte);
-		if (!is_migration_entry(swp_entry))
-			continue;
-		if (migration_entry_to_page(swp_entry) != page)
-			continue;
-
-		get_page(page);
-		page_add_anon_rmap(page, vma, address, false);
-
-		entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
-		if (PageDirty(page))
-			entry = pte_mkdirty(entry);
-		if (is_write_migration_entry(swp_entry))
-			entry = maybe_mkwrite(entry, vma);
-
-		flush_dcache_page(page);
-		set_pte_at(vma->vm_mm, address, pte, entry);
+	/* We only need TTU_SPLIT_HUGE_PMD once */
+	ret = try_to_unmap(page, ttu_flags | TTU_SPLIT_HUGE_PMD);
+	for (i = 1; !ret && i < HPAGE_PMD_NR; i++) {
+		/* Cut short if the page is unmapped */
+		if (page_count(page) == 1)
+			return;
 
-		/* No need to invalidate - it was non-present before */
-		update_mmu_cache(vma, address, pte);
+		ret = try_to_unmap(page + i, ttu_flags);
 	}
-	pte_unmap_unlock(pte - 1, ptl);
+	VM_BUG_ON(ret);
 }
 
-static void unfreeze_page(struct anon_vma *anon_vma, struct page *page)
+static void unfreeze_page(struct page *page)
 {
-	struct anon_vma_chain *avc;
-	pgoff_t pgoff = page_to_pgoff(page);
-
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
-			pgoff, pgoff + HPAGE_PMD_NR - 1) {
-		unsigned long address = __vma_address(page, avc->vma);
+	int i;
 
-		mmu_notifier_invalidate_range_start(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-		unfreeze_page_vma(avc->vma, page, address);
-		mmu_notifier_invalidate_range_end(avc->vma->vm_mm,
-				address, address + HPAGE_PMD_SIZE);
-	}
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		remove_migration_ptes(page + i, page + i, true);
 }
 
 static void __split_huge_page_tail(struct page *head, int tail,
@@ -3365,7 +3218,7 @@ static void __split_huge_page(struct page *page, struct list_head *list)
 	ClearPageCompound(head);
 	spin_unlock_irq(&zone->lru_lock);
 
-	unfreeze_page(page_anon_vma(head), head);
+	unfreeze_page(head);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
@@ -3461,7 +3314,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	}
 
 	mlocked = PageMlocked(page);
-	freeze_page(anon_vma, head);
+	freeze_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
 	/* Make sure the page is not on per-CPU pagevec as it takes pin */
@@ -3490,7 +3343,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		BUG();
 	} else {
 		spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
-		unfreeze_page(anon_vma, head);
+		unfreeze_page(head);
 		ret = -EBUSY;
 	}
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 945933a01010..8fa96fa3225e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1431,8 +1431,13 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		goto out;
 
-	if (flags & TTU_SPLIT_HUGE_PMD)
-		split_huge_pmd_address(vma, address);
+	if (flags & TTU_SPLIT_HUGE_PMD) {
+		split_huge_pmd_address(vma, address, flags & TTU_MIGRATION);
+		/* check if we have anything to do after split */
+		if (page_mapcount(page) == 0)
+			goto out;
+	}
+
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
-- 
2.7.0


* Re: [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-03-07 11:57 ` [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers Kirill A. Shutemov
@ 2016-03-11  9:42   ` Kirill A. Shutemov
  2016-04-19 13:55   ` Sasha Levin
  1 sibling, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2016-03-11  9:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, linux-mm

On Mon, Mar 07, 2016 at 02:57:18PM +0300, Kirill A. Shutemov wrote:
> The freeze_page() and unfreeze_page() helpers have evolved into rather
> complex beasts. It would be nice to cut the complexity of this code.
> 
> This patch rewrites freeze_page() using the standard try_to_unmap().
> unfreeze_page() is rewritten with remove_migration_ptes().
> 
> The result is much simpler.
> 
> But the new variant is somewhat slower for PTE-mapped THPs. The current
> helpers iterate over the VMAs the compound page is mapped to, and then
> over the PTEs within each VMA. The new helpers iterate over each small
> page, then over the VMAs the small page is mapped to, and only then find
> the relevant PTE.
> 
> We have a shortcut for PMD-mapped THPs: migration entries are installed
> directly on PMD split.
> 
> I don't think the slowdown is critical, considering how much simpler the
> result is and that split_huge_page() is quite rare nowadays. It only
> happens due to memory pressure or migration.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  include/linux/huge_mm.h |  10 ++-
>  mm/huge_memory.c        | 201 +++++++-----------------------------------------
>  mm/rmap.c               |   9 ++-
>  3 files changed, 40 insertions(+), 180 deletions(-)

Andrew, could you please fold the patch below into this one.


* Re: [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
  2016-03-07 11:57 ` [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers Kirill A. Shutemov
  2016-03-11  9:42   ` Kirill A. Shutemov
@ 2016-04-19 13:55   ` Sasha Levin
  1 sibling, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2016-04-19 13:55 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: Andrea Arcangeli, linux-mm

On 03/07/2016 06:57 AM, Kirill A. Shutemov wrote:
> The freeze_page() and unfreeze_page() helpers have evolved into rather
> complex beasts. It would be nice to cut the complexity of this code.
> 
> This patch rewrites freeze_page() using the standard try_to_unmap().
> unfreeze_page() is rewritten with remove_migration_ptes().
> 
> The result is much simpler.
> 
> But the new variant is somewhat slower for PTE-mapped THPs. The current
> helpers iterate over the VMAs the compound page is mapped to, and then
> over the PTEs within each VMA. The new helpers iterate over each small
> page, then over the VMAs the small page is mapped to, and only then find
> the relevant PTE.
> 
> We have a shortcut for PMD-mapped THPs: migration entries are installed
> directly on PMD split.
> 
> I don't think the slowdown is critical, considering how much simpler the
> result is and that split_huge_page() is quite rare nowadays. It only
> happens due to memory pressure or migration.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Hey Kirill,

I'm seeing the following while fuzzing:

[  302.029712] page:ffffea0002fa7fc0 count:0 mapcount:0 mapping:dead000000000400 index:0x0 compound_mapcount: 0

[  302.033158] flags: 0x1fffff80000000()

[  302.037878] ------------[ cut here ]------------

[  302.038574] kernel BUG at include/linux/page-flags.h:332!

[  302.038574] invalid opcode: 0000 [#1] PREEMPT SMP KASAN

[  302.038574] Modules linked in:

[  302.038574] CPU: 0 PID: 9538 Comm: trinity-c394 Not tainted 4.6.0-rc3-next-20160412-sasha-00024-geaec67e-dirty #3002

[  302.038574] task: ffff8800c29fc000 ti: ffff8800c2a90000 task.ti: ffff8800c2a90000

[  302.046951] RIP: clear_pages_mlock (include/linux/page-flags.h:332 mm/mlock.c:82)
[  302.046951] RSP: 0018:ffff8800c2a970e0  EFLAGS: 00010286

[  302.046951] RAX: 0000000000000000 RBX: ffffea0002fa7fc0 RCX: 0000000000000000

[  302.046951] RDX: 1ffffd40005f4fff RSI: 0000000000000282 RDI: ffffea0002fa7ff8

[  302.046951] RBP: ffff8800c2a97120 R08: 6d75642065676170 R09: 6163656220646570

[  302.046951] R10: ffff8800c2a97ad8 R11: 5f4d56203a657375 R12: ffffea0002fa7fe0

[  302.046951] R13: ffffea0002fa8000 R14: dffffc0000000000 R15: 0000000000000000

[  302.046951] FS:  00007f26730a5700(0000) GS:ffff8801b1a00000(0000) knlGS:0000000000000000

[  302.063233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[  302.063233] CR2: 00000000025cff00 CR3: 00000000c2a6d000 CR4: 00000000000006b0

[  302.063233] Stack:

[  302.063233]  0000000000000008 ffff8801b7fd3000 0000000100000000 ffffea0002fa7fc0

[  302.063233]  ffffea0002fa7fe0 ffffea0002fa0000 ffffea0002fa0001 0000000000000000

[  302.063233]  ffff8800c2a97170 ffffffff81712322 ffffffff817221ab 0000000000e00000

[  302.063233] Call Trace:

[  302.063233] page_remove_rmap (include/linux/page-flags.h:157 include/linux/page-flags.h:522 mm/rmap.c:1383)
[  302.063233] __split_huge_pmd_locked (include/linux/compiler.h:222 (discriminator 3) include/linux/page-flags.h:143 (discriminator 3) include/linux/mm.h:736 (discriminator 3) mm/huge_memory.c:3075 (discriminator 3))
[  302.063233] __split_huge_pmd (include/linux/spinlock.h:347 mm/huge_memory.c:3102)
[  302.063233] split_huge_pmd_address (mm/huge_memory.c:3137)
[  302.063233] try_to_unmap_one (include/linux/compiler.h:222 include/linux/page-flags.h:143 include/linux/page-flags.h:268 include/linux/mm.h:495 mm/rmap.c:1425)
[  302.063233] rmap_walk_anon (mm/rmap.c:1762)
[  302.063233] rmap_walk_locked (mm/rmap.c:1845)
[  302.063233] try_to_unmap (mm/rmap.c:1643)
[  302.063233] split_huge_page_to_list (mm/huge_memory.c:3191 mm/huge_memory.c:3380)
[  302.063233] queue_pages_pte_range (mm/mempolicy.c:505)
[  302.063233] __walk_page_range (mm/pagewalk.c:51 mm/pagewalk.c:90 mm/pagewalk.c:116 mm/pagewalk.c:204)
[  302.063233] walk_page_range (mm/pagewalk.c:282)
[  302.063233] queue_pages_range (mm/mempolicy.c:667)
[  302.063233] migrate_to_node (include/linux/compiler.h:222 include/linux/list.h:189 mm/mempolicy.c:1002)
[  302.063233] do_migrate_pages (mm/mempolicy.c:1105)
[  302.063233] SYSC_migrate_pages (mm/mempolicy.c:1451)
[  302.063233] SyS_migrate_pages (mm/mempolicy.c:1369)
[  302.063233] do_syscall_64 (arch/x86/entry/common.c:350)
[  302.063233] entry_SYSCALL64_slow_path (arch/x86/entry/entry_64.S:251)
[ 302.063233] Code: 42 80 3c 30 00 74 08 4c 89 e7 e8 c7 f8 08 00 48 8b 43 20 a8 01 74 22 e8 da e2 ea ff 48 c7 c6 e0 9b 31 8b 48 89 df e8 0b 01 fe ff <0f> 0b 48 c7 c7 e0 3b 52 8f e8 5f 3b 9d 01 e8 b8 e2 ea ff 48 8b

All code
========
   0:	42 80 3c 30 00       	cmpb   $0x0,(%rax,%r14,1)
   5:	74 08                	je     0xf
   7:	4c 89 e7             	mov    %r12,%rdi
   a:	e8 c7 f8 08 00       	callq  0x8f8d6
   f:	48 8b 43 20          	mov    0x20(%rbx),%rax
  13:	a8 01                	test   $0x1,%al
  15:	74 22                	je     0x39
  17:	e8 da e2 ea ff       	callq  0xffffffffffeae2f6
  1c:	48 c7 c6 e0 9b 31 8b 	mov    $0xffffffff8b319be0,%rsi
  23:	48 89 df             	mov    %rbx,%rdi
  26:	e8 0b 01 fe ff       	callq  0xfffffffffffe0136
  2b:*	0f 0b                	ud2    		<-- trapping instruction
  2d:	48 c7 c7 e0 3b 52 8f 	mov    $0xffffffff8f523be0,%rdi
  34:	e8 5f 3b 9d 01       	callq  0x19d3b98
  39:	e8 b8 e2 ea ff       	callq  0xffffffffffeae2f6
  3e:	48 8b 00             	mov    (%rax),%rax

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	48 c7 c7 e0 3b 52 8f 	mov    $0xffffffff8f523be0,%rdi
   9:	e8 5f 3b 9d 01       	callq  0x19d3b6d
   e:	e8 b8 e2 ea ff       	callq  0xffffffffffeae2cb
  13:	48 8b 00             	mov    (%rax),%rax
[  302.063233] RIP clear_pages_mlock (include/linux/page-flags.h:332 mm/mlock.c:82)
[  302.063233]  RSP <ffff8800c2a970e0>




Thanks,
Sasha



Thread overview: 7+ messages:
2016-03-07 11:57 [PATCHv2 0/4] thp: simplify freeze_page() and unfreeze_page() Kirill A. Shutemov
2016-03-07 11:57 ` [PATCHv2 1/4] rmap: introduce rmap_walk_locked() Kirill A. Shutemov
2016-03-07 11:57 ` [PATCHv2 2/4] rmap: extend try_to_unmap() to be usable by split_huge_page() Kirill A. Shutemov
2016-03-07 11:57 ` [PATCHv2 3/4] mm: make remove_migration_ptes() beyond mm/migration.c Kirill A. Shutemov
2016-03-07 11:57 ` [PATCHv2 4/4] thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers Kirill A. Shutemov
2016-03-11  9:42   ` Kirill A. Shutemov
2016-04-19 13:55   ` Sasha Levin
