All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Suleiman Souhlal <suleiman@google.com>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [patch for-3.17] mm, thp: fix collapsing of hugepages on madvise
Date: Sun, 5 Oct 2014 21:41:15 +0300	[thread overview]
Message-ID: <20141005184115.GA21713@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.DEB.2.02.1410041947080.7055@chino.kir.corp.google.com>

On Sat, Oct 04, 2014 at 07:48:04PM -0700, David Rientjes wrote:
> If an anonymous mapping is not allowed to fault thp memory and then
> madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
> collapse this memory into thp memory.
> 
> This occurs because the madvise(2) handler for thp, hugepage_advise(),
> clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
> until the final action of madvise_behavior().  This causes the
> khugepaged_enter_vma_merge() to be a no-op in hugepage_advise() when the
> vma had previously had VM_NOHUGEPAGE set.
> 
> Fix this by passing the correct vma flags to the khugepaged mm slot
> handler.  There's no chance khugepaged can run on this vma until after
> madvise_behavior() returns since we hold mm->mmap_sem.
> 
> It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
> in hugepage_advise(), but I didn't want to introduce special case
> behavior into madvise_behavior().  I think it's best to just let it
> always set vma->vm_flags itself.
> 
> Cc: <stable@vger.kernel.org>
> Reported-by: Suleiman Souhlal <suleiman@google.com>
> Signed-off-by: David Rientjes <rientjes@google.com>

Look like rather complex fix for a not that complex bug.
What about untested patch below?

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Sun, 5 Oct 2014 21:22:43 +0300
Subject: [PATCH] thp: fix registering VMA into khugepaged on
 madvise(MADV_HUGEPAGE)

hugepage_madvise() tries to register VMA into khugepaged with
khugepaged_enter_vma_merge() on madvise(MADV_HUGEPAGE). Unfortunately
it's effectevely nop, since khugepaged_enter_vma_merge() rely on
vma->vm_flags which has not yet updated by the time of
hugepage_madvise().

Let's move khugepaged_enter_vma_merge() to the end of madvise_behavior().
Now we also have chance to catch VMAs which become good for THP after
vma_merge().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 8 +++-----
 mm/madvise.c     | 6 ++++++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f8ffd9412ec5..f84d52158a66 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1966,12 +1966,10 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
 		/*
-		 * If the vma become good for khugepaged to scan,
-		 * register it here without waiting a page fault that
-		 * may not happen any time soon.
+		 * vma->vm_flags is not yet updated here. madvise_behavior()
+		 * will take care to register it in khugepaged once flags
+		 * updated.
 		 */
-		if (unlikely(khugepaged_enter_vma_merge(vma)))
-			return -ENOMEM;
 		break;
 	case MADV_NOHUGEPAGE:
 		/*
diff --git a/mm/madvise.c b/mm/madvise.c
index 0938b30da4ab..60effd2c5e9c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -128,6 +128,12 @@ success:
 	 */
 	vma->vm_flags = new_flags;
 
+	/*
+	 * If the vma become good for khugepaged to scan, register it here
+	 * without waiting a page fault that may not happen any time soon.
+	 */
+	if (unlikely(khugepaged_enter_vma_merge(vma)))
+		error = -ENOMEM;
 out:
 	if (error == -ENOMEM)
 		error = -EAGAIN;
-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Suleiman Souhlal <suleiman@google.com>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [patch for-3.17] mm, thp: fix collapsing of hugepages on madvise
Date: Sun, 5 Oct 2014 21:41:15 +0300	[thread overview]
Message-ID: <20141005184115.GA21713@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.DEB.2.02.1410041947080.7055@chino.kir.corp.google.com>

On Sat, Oct 04, 2014 at 07:48:04PM -0700, David Rientjes wrote:
> If an anonymous mapping is not allowed to fault thp memory and then
> madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
> collapse this memory into thp memory.
> 
> This occurs because the madvise(2) handler for thp, hugepage_advise(),
> clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
> until the final action of madvise_behavior().  This causes the
> khugepaged_enter_vma_merge() to be a no-op in hugepage_advise() when the
> vma had previously had VM_NOHUGEPAGE set.
> 
> Fix this by passing the correct vma flags to the khugepaged mm slot
> handler.  There's no chance khugepaged can run on this vma until after
> madvise_behavior() returns since we hold mm->mmap_sem.
> 
> It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
> in hugepage_advise(), but I didn't want to introduce special case
> behavior into madvise_behavior().  I think it's best to just let it
> always set vma->vm_flags itself.
> 
> Cc: <stable@vger.kernel.org>
> Reported-by: Suleiman Souhlal <suleiman@google.com>
> Signed-off-by: David Rientjes <rientjes@google.com>

Look like rather complex fix for a not that complex bug.
What about untested patch below?

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Sun, 5 Oct 2014 21:22:43 +0300
Subject: [PATCH] thp: fix registering VMA into khugepaged on
 madvise(MADV_HUGEPAGE)

hugepage_madvise() tries to register VMA into khugepaged with
khugepaged_enter_vma_merge() on madvise(MADV_HUGEPAGE). Unfortunately
it's effectevely nop, since khugepaged_enter_vma_merge() rely on
vma->vm_flags which has not yet updated by the time of
hugepage_madvise().

Let's move khugepaged_enter_vma_merge() to the end of madvise_behavior().
Now we also have chance to catch VMAs which become good for THP after
vma_merge().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 8 +++-----
 mm/madvise.c     | 6 ++++++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f8ffd9412ec5..f84d52158a66 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1966,12 +1966,10 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
 		/*
-		 * If the vma become good for khugepaged to scan,
-		 * register it here without waiting a page fault that
-		 * may not happen any time soon.
+		 * vma->vm_flags is not yet updated here. madvise_behavior()
+		 * will take care to register it in khugepaged once flags
+		 * updated.
 		 */
-		if (unlikely(khugepaged_enter_vma_merge(vma)))
-			return -ENOMEM;
 		break;
 	case MADV_NOHUGEPAGE:
 		/*
diff --git a/mm/madvise.c b/mm/madvise.c
index 0938b30da4ab..60effd2c5e9c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -128,6 +128,12 @@ success:
 	 */
 	vma->vm_flags = new_flags;
 
+	/*
+	 * If the vma become good for khugepaged to scan, register it here
+	 * without waiting a page fault that may not happen any time soon.
+	 */
+	if (unlikely(khugepaged_enter_vma_merge(vma)))
+		error = -ENOMEM;
 out:
 	if (error == -ENOMEM)
 		error = -EAGAIN;
-- 
 Kirill A. Shutemov

  parent reply	other threads:[~2014-10-05 18:41 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-05  2:48 [patch for-3.17] mm, thp: fix collapsing of hugepages on madvise David Rientjes
2014-10-05  2:48 ` David Rientjes
2014-10-05 17:15 ` Linus Torvalds
2014-10-05 17:15   ` Linus Torvalds
2014-10-05 18:41 ` Kirill A. Shutemov [this message]
2014-10-05 18:41   ` Kirill A. Shutemov
2014-10-05 18:51   ` Linus Torvalds
2014-10-05 18:51     ` Linus Torvalds
2014-10-06  9:42   ` David Rientjes
2014-10-06  9:42     ` David Rientjes
2014-10-06 15:03 ` Kirill A. Shutemov
2014-10-06 15:03   ` Kirill A. Shutemov
2014-10-06 20:53   ` David Rientjes
2014-10-06 20:53     ` David Rientjes
2014-10-15 21:13 ` [patch resend] " David Rientjes
2014-10-15 21:13   ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141005184115.GA21713@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=suleiman@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.