linux-mm.kvack.org archive mirror
* [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()
@ 2014-05-21 19:04 Kirill A. Shutemov
  2014-05-21 19:19 ` Cyrill Gorcunov
  2014-05-21 19:34 ` Andrew Morton
  0 siblings, 2 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2014-05-21 19:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Andrea Arcangeli,
	Pavel Emelyanov, Cyrill Gorcunov, Dave Hansen

Currently we split all THP pages on any clear_refs request. It's not
necessary. We can handle this on PMD level.

One side effect is that soft dirty will potentially see more dirty
memory, since we will mark whole THP page dirty at once.

Sanity checked with CRIU test suite. More testing is required.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 fs/proc/task_mmu.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 442177b1119a..9f5ae29f3037 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -716,10 +716,10 @@ struct clear_refs_private {
 	enum clear_refs_types type;
 };
 
+#ifdef CONFIG_MEM_SOFT_DIRTY
 static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *pte)
 {
-#ifdef CONFIG_MEM_SOFT_DIRTY
 	/*
 	 * The soft-dirty tracker uses #PF-s to catch writes
 	 * to pages, so write-protect the pte as well. See the
@@ -741,9 +741,34 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		vma->vm_flags &= ~VM_SOFTDIRTY;
 
 	set_pte_at(vma->vm_mm, addr, pte, ptent);
-#endif
 }
 
+static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+	pmd_t pmd = *pmdp;
+
+	pmd = pmd_wrprotect(pmd);
+	pmd = pmd_clear_flags(pmd, _PAGE_SOFT_DIRTY);
+
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		vma->vm_flags &= ~VM_SOFTDIRTY;
+
+	set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+}
+
+#else
+static inline void clear_soft_dirty(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *pte)
+{
+}
+
+static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+}
+#endif
+
 static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
@@ -753,7 +778,22 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 	spinlock_t *ptl;
 	struct page *page;
 
-	split_huge_page_pmd(vma, addr, pmd);
+	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
+		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
+			clear_soft_dirty_pmd(vma, addr, pmd);
+			spin_unlock(ptl);
+			return 0;
+		}
+
+		page = pmd_page(*pmd);
+
+		/* Clear accessed and referenced bits. */
+		pmdp_test_and_clear_young(vma, addr, pmd);
+		ClearPageReferenced(page);
+		spin_unlock(ptl);
+		return 0;
+	}
+
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
-- 
2.0.0.rc2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()
  2014-05-21 19:04 [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page() Kirill A. Shutemov
@ 2014-05-21 19:19 ` Cyrill Gorcunov
  2014-05-21 19:34 ` Andrew Morton
  1 sibling, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2014-05-21 19:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, linux-kernel, linux-mm, Andrea Arcangeli,
	Pavel Emelyanov, Dave Hansen

On Wed, May 21, 2014 at 10:04:22PM +0300, Kirill A. Shutemov wrote:
> Currently we split all THP pages on any clear_refs request. It's not
> necessary. We can handle this on PMD level.
> 
> One side effect is that soft dirty will potentially see more dirty
> memory, since we will mark whole THP page dirty at once.
> 
> Sanity checked with CRIU test suite. More testing is required.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Dave Hansen <dave.hansen@intel.com>

Looks reasonable to me, thanks!

Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>


* Re: [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()
  2014-05-21 19:04 [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page() Kirill A. Shutemov
  2014-05-21 19:19 ` Cyrill Gorcunov
@ 2014-05-21 19:34 ` Andrew Morton
  2014-05-21 19:57   ` Cyrill Gorcunov
  2014-05-22  1:11   ` Kirill A. Shutemov
  1 sibling, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2014-05-21 19:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Andrea Arcangeli, Pavel Emelyanov,
	Cyrill Gorcunov, Dave Hansen

On Wed, 21 May 2014 22:04:22 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> Currently we split all THP pages on any clear_refs request. It's not
> necessary. We can handle this on PMD level.
> 
> One side effect is that soft dirty will potentially see more dirty
> memory, since we will mark whole THP page dirty at once.

This clashes pretty badly with
http://ozlabs.org/~akpm/mmots/broken-out/clear_refs-redefine-callback-functions-for-page-table-walker.patch

> Sanity checked with CRIU test suite. More testing is required.

Will you be doing that testing or was this a request for Cyrill & co to
help?

Perhaps this is post-3.15 material.


* Re: [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()
  2014-05-21 19:34 ` Andrew Morton
@ 2014-05-21 19:57   ` Cyrill Gorcunov
  2014-05-22  1:11   ` Kirill A. Shutemov
  1 sibling, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2014-05-21 19:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, linux-kernel, linux-mm, Andrea Arcangeli,
	Pavel Emelyanov, Dave Hansen

On Wed, May 21, 2014 at 12:34:46PM -0700, Andrew Morton wrote:
> On Wed, 21 May 2014 22:04:22 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > Currently we split all THP pages on any clear_refs request. It's not
> > necessary. We can handle this on PMD level.
> > 
> > One side effect is that soft dirty will potentially see more dirty
> > memory, since we will mark whole THP page dirty at once.
> 
> This clashes pretty badly with
> http://ozlabs.org/~akpm/mmots/broken-out/clear_refs-redefine-callback-functions-for-page-table-walker.patch
> 
> > Sanity checked with CRIU test suite. More testing is required.
> 
> Will you be doing that testing or was this a request for Cyrill & co to
> help?

Kirill and I have been discussing how to test it and concluded that
criu is the best candidate (though I think I'll write a selftest for
vanilla too, hopefully tomorrow).


* Re: [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()
  2014-05-21 19:34 ` Andrew Morton
  2014-05-21 19:57   ` Cyrill Gorcunov
@ 2014-05-22  1:11   ` Kirill A. Shutemov
  2014-05-22  5:32     ` Cyrill Gorcunov
  1 sibling, 1 reply; 7+ messages in thread
From: Kirill A. Shutemov @ 2014-05-22  1:11 UTC (permalink / raw)
  To: Andrew Morton, Pavel Emelyanov, Cyrill Gorcunov
  Cc: Kirill A. Shutemov, linux-kernel, linux-mm, Andrea Arcangeli,
	Dave Hansen, Naoya Horiguchi

Andrew Morton wrote:
> On Wed, 21 May 2014 22:04:22 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > Currently we split all THP pages on any clear_refs request. It's not
> > necessary. We can handle this on PMD level.
> > 
> > One side effect is that soft dirty will potentially see more dirty
> > memory, since we will mark whole THP page dirty at once.
> 
> This clashes pretty badly with
> http://ozlabs.org/~akpm/mmots/broken-out/clear_refs-redefine-callback-functions-for-page-table-walker.patch

Hm.. For some reason CRIU memory-snapshotting test cases fail on current
linux-next. I didn't debug why. Mainline works. Folks?

Below is a patch that applies on linux-next, but I wasn't able to test it.

> > Sanity checked with CRIU test suite. More testing is required.
> 
> Will you be doing that testing or was this a request for Cyrill & co to
> help?

Cyrill, Pavel, could you take care of this?

> Perhaps this is post-3.15 material.

Sure.

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Thu, 22 May 2014 03:44:38 +0300
Subject: [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()

Currently pagewalker splits all THP pages on any clear_refs request.
It's not necessary. We can handle this on PMD level.

One side effect is that soft dirty will potentially see more dirty
memory, since we will mark whole THP page dirty at once.

Sanity checked with CRIU test suite. More testing is required.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/proc/task_mmu.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fa6d6a4e85b3..0cc47a44d016 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -702,10 +702,10 @@ struct clear_refs_private {
 	enum clear_refs_types type;
 };
 
+#ifdef CONFIG_MEM_SOFT_DIRTY
 static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *pte)
 {
-#ifdef CONFIG_MEM_SOFT_DIRTY
 	/*
 	 * The soft-dirty tracker uses #PF-s to catch writes
 	 * to pages, so write-protect the pte as well. See the
@@ -724,9 +724,35 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 	}
 
 	set_pte_at(vma->vm_mm, addr, pte, ptent);
-#endif
 }
 
+static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+	pmd_t pmd = *pmdp;
+
+	pmd = pmd_wrprotect(pmd);
+	pmd = pmd_clear_flags(pmd, _PAGE_SOFT_DIRTY);
+
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		vma->vm_flags &= ~VM_SOFTDIRTY;
+
+	set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+}
+
+#else
+
+static inline void clear_soft_dirty(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *pte)
+{
+}
+
+static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+}
+#endif
+
 static int clear_refs_pte(pte_t *pte, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
@@ -749,6 +775,33 @@ static int clear_refs_pte(pte_t *pte, unsigned long addr,
 	return 0;
 }
 
+static int clear_refs_pmd(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct clear_refs_private *cp = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	struct page *page;
+	spinlock_t *ptl;
+
+	if (pmd_trans_huge_lock(pmd, vma, &ptl) != 1)
+		return 0;
+	if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
+		clear_soft_dirty_pmd(vma, addr, pmd);
+		goto out;
+	}
+
+	page = pmd_page(*pmd);
+
+	/* Clear accessed and referenced bits. */
+	pmdp_test_and_clear_young(vma, addr, pmd);
+	ClearPageReferenced(page);
+out:
+	spin_unlock(ptl);
+	/* handled as pmd, no need to call clear_refs_pte() */
+	walk->skip = 1;
+	return 0;
+}
+
 static int clear_refs_test_walk(unsigned long start, unsigned long end,
 				struct mm_walk *walk)
 {
@@ -812,6 +865,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		};
 		struct mm_walk clear_refs_walk = {
 			.pte_entry = clear_refs_pte,
+			.pmd_entry = clear_refs_pmd,
 			.test_walk = clear_refs_test_walk,
 			.mm = mm,
 			.private = &cp,
-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()
  2014-05-22  1:11   ` Kirill A. Shutemov
@ 2014-05-22  5:32     ` Cyrill Gorcunov
  2014-05-22  8:35       ` Cyrill Gorcunov
  0 siblings, 1 reply; 7+ messages in thread
From: Cyrill Gorcunov @ 2014-05-22  5:32 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Pavel Emelyanov, linux-kernel, linux-mm,
	Andrea Arcangeli, Dave Hansen, Naoya Horiguchi

On Thu, May 22, 2014 at 04:11:10AM +0300, Kirill A. Shutemov wrote:
> Andrew Morton wrote:
> > On Wed, 21 May 2014 22:04:22 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > 
> > > Currently we split all THP pages on any clear_refs request. It's not
> > > necessary. We can handle this on PMD level.
> > > 
> > > One side effect is that soft dirty will potentially see more dirty
> > > memory, since we will mark whole THP page dirty at once.
> > 
> > This clashes pretty badly with
> > http://ozlabs.org/~akpm/mmots/broken-out/clear_refs-redefine-callback-functions-for-page-table-walker.patch
> 
> Hm.. For some reason CRIU memory-snapshotting test cases fail on current
> linux-next. I didn't debug why. Mainline works. Folks?

Thanks for noticing, Kirill! I don't test linux-next regularly; I'll try
it and report the results.

	Cyrill


* Re: [PATCH] mm: /proc/pid/clear_refs: avoid split_huge_page()
  2014-05-22  5:32     ` Cyrill Gorcunov
@ 2014-05-22  8:35       ` Cyrill Gorcunov
  0 siblings, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2014-05-22  8:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Pavel Emelyanov, linux-kernel, linux-mm,
	Andrea Arcangeli, Dave Hansen, Naoya Horiguchi

On Thu, May 22, 2014 at 09:32:47AM +0400, Cyrill Gorcunov wrote:
> On Thu, May 22, 2014 at 04:11:10AM +0300, Kirill A. Shutemov wrote:
> > Andrew Morton wrote:
> > > On Wed, 21 May 2014 22:04:22 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > > 
> > > > Currently we split all THP pages on any clear_refs request. It's not
> > > > necessary. We can handle this on PMD level.
> > > > 
> > > > One side effect is that soft dirty will potentially see more dirty
> > > > memory, since we will mark whole THP page dirty at once.
> > > 
> > > This clashes pretty badly with
> > > http://ozlabs.org/~akpm/mmots/broken-out/clear_refs-redefine-callback-functions-for-page-table-walker.patch
> > 
> > Hm.. For some reason CRIU memory-snapshotting test cases fail on current
> > linux-next. I didn't debug why. Mainline works. Folks?
> 
> Thanks for noticing, Kirill! I don't test linux-next regularly; I'll try
> it and report the results.

OK, I managed to try criu on linux-next. Due to changes in the vdso it
is no longer able to run there. I'll handle it in criu and ping you then.

