Re: [PATCH 3/3] mm,migration: Remove straggling migration PTEs when page tables are being moved after the VMA has already moved


From: Andrea Arcangeli <aarcange@redhat.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>, Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 3/3] mm,migration: Remove straggling migration PTEs when page tables are being moved after the VMA has already moved
Date: Wed, 28 Apr 2010 04:42:27 +0200
Message-ID: <20100428024227.GN510@random.random>
In-Reply-To: <20100428111248.2797801c.kamezawa.hiroyu@jp.fujitsu.com>

On Wed, Apr 28, 2010 at 11:12:48AM +0900, KAMEZAWA Hiroyuki wrote:
> The page can be replaced with migration_pte before the 1st vma_adjust.
> 
> The key is 
> 	(vma, page) <-> address <-> pte <-> page
> relationship.
> 
> 	vma_adjust() 
> 	(*)
> 	move_pagetables();
> 	(**)
> 	vma_adjust();
> 
> At (*), vma_address(vma, page) returns a _new_ address, but the pte is not
> updated yet. This is critical for rmap_walk. We're safe at (**).
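The (*) window can be replayed in a small user-space model. This is a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, a single page stands in for the stack, the first rmap walk stands in for the unmap pass that installs a migration entry, and the second stands in for remove_migration_ptes(). The address rmap looks at follows the vma_address() formula vm_start + ((index - vm_pgoff) << PAGE_SHIFT).

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define NSLOTS 32	/* toy address space: 32 virtual pages */

/* Hypothetical pte states, loosely mirroring present vs. migration ptes. */
enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_MIGRATION };

struct toy_vma {
	unsigned long vm_start;		/* bytes */
	unsigned long vm_pgoff;		/* page offset of vm_start */
	enum toy_pte pte[NSLOTS];	/* indexed by virtual page number */
};

/* Slot rmap would look at, using the vma_address() formula. */
static unsigned long toy_slot(const struct toy_vma *vma, unsigned long index)
{
	assert(index >= vma->vm_pgoff);
	return (vma->vm_start >> PAGE_SHIFT) + (index - vma->vm_pgoff);
}

/* One rmap walk: if the pte rmap computes is in state `from`, set `to`. */
static void toy_rmap_update(struct toy_vma *vma, unsigned long index,
			    enum toy_pte from, enum toy_pte to)
{
	unsigned long slot = toy_slot(vma, index);

	if (slot < NSLOTS && vma->pte[slot] == from)
		vma->pte[slot] = to;
}

/*
 * Shift a one-page stack down by `shift` pages. The second rmap walk
 * runs at (*) if `walk_at_star2` is 0, at (**) otherwise.
 * Returns the number of migration entries left behind.
 */
static int toy_stragglers(unsigned long shift, int walk_at_star2)
{
	struct toy_vma vma = { .vm_start = 20UL << PAGE_SHIFT, .vm_pgoff = 7 };
	unsigned long index = 7;	/* page->index: page sits at vm_start */
	unsigned long old_slot = 20, new_slot;
	enum toy_pte moved;
	int left = 0;

	assert(shift <= old_slot);
	new_slot = old_slot - shift;
	vma.pte[old_slot] = TOY_PTE_PRESENT;

	/* 1st walk, before the first vma_adjust(): install migration entry */
	toy_rmap_update(&vma, index, TOY_PTE_PRESENT, TOY_PTE_MIGRATION);

	vma.vm_start -= shift << PAGE_SHIFT;	/* first vma_adjust() */
	if (!walk_at_star2)			/* (*): 2nd walk */
		toy_rmap_update(&vma, index, TOY_PTE_MIGRATION, TOY_PTE_PRESENT);

	/* move_page_tables(): the pte follows the new vm_start */
	moved = vma.pte[old_slot];
	vma.pte[old_slot] = TOY_PTE_NONE;
	vma.pte[new_slot] = moved;

	if (walk_at_star2)			/* (**): 2nd walk */
		toy_rmap_update(&vma, index, TOY_PTE_MIGRATION, TOY_PTE_PRESENT);

	for (size_t i = 0; i < NSLOTS; i++)
		left += (vma.pte[i] == TOY_PTE_MIGRATION);
	return left;
}
```

With a 4-page shift, toy_stragglers(4, 0) should report one leftover migration entry (the second walk looks at the new address while the pte still sits at the old one), and toy_stragglers(4, 1) none.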

Yes, I agree we can move the unlock to (**), because the last vma_adjust
is only there to truncate vm_end. In fact it looks very heavyweight to
call vma_adjust for that instead of just setting
vma->vm_end = new_end, considering we're under mmap_sem, the vma is fully
anonymous, etc. Even the first vma_adjust looks too heavyweight, and it
brings no simplicity or added safety: this works in place, so there is
nothing to worry about with vm_next, vma_merge, vm_file, or anything
else that vma_adjust is good at.

My confusion about vm_pgoff came from the fact that everything else
that moves vm_start down (like stack growsdown) also moves vm_pgoff
down, but of course those paths don't move the pages down too. Here we
must not alter vm_pgoff, only vm_start, and we must update it together
with the pagetables inside the anon_vma lock to be fully safe. Also, I
forgot to unlock in the -ENOMEM case last time ;)
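The vm_pgoff reasoning above follows from how rmap derives an anonymous page's address from its vma. A minimal user-space sketch, assuming the vma_address() formula vm_start + ((index - vm_pgoff) << PAGE_SHIFT); the toy_* names are hypothetical, not kernel APIs. Growing down must leave existing pages where they are, while shift_arg_pages() must make rmap follow pages that moved.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hypothetical toy vma: only the two fields vma_address() depends on. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

/* Address rmap computes for an anonymous page with this page->index. */
static unsigned long toy_vma_address(const struct toy_vma *vma,
				     unsigned long index)
{
	assert(index >= vma->vm_pgoff);
	return vma->vm_start + ((index - vma->vm_pgoff) << PAGE_SHIFT);
}

/* Stack growsdown: the vma grows, existing pages stay put. */
static void toy_grow_down(struct toy_vma *vma, unsigned long npages)
{
	vma->vm_start -= npages * PAGE_SIZE;
	vma->vm_pgoff -= npages;
}

/* shift_arg_pages(): the pages move down, page->index does not change. */
static void toy_shift_down(struct toy_vma *vma, unsigned long npages)
{
	vma->vm_start -= npages * PAGE_SIZE;
}

/* Growing down must leave every existing page's address unchanged. */
static bool toy_growsdown_keeps_address(void)
{
	struct toy_vma vma = { .vm_start = 16 * PAGE_SIZE, .vm_pgoff = 100 };
	unsigned long before = toy_vma_address(&vma, 102);

	toy_grow_down(&vma, 3);
	return toy_vma_address(&vma, 102) == before;
}

/*
 * Shifting down by 3 pages moves the pte to old address - 3 pages: does
 * rmap compute that address? `also_lower_pgoff` treats it like growsdown.
 */
static bool toy_shift_tracks_ptes(bool also_lower_pgoff)
{
	struct toy_vma vma = { .vm_start = 16 * PAGE_SIZE, .vm_pgoff = 100 };
	unsigned long new_addr = toy_vma_address(&vma, 102) - 3 * PAGE_SIZE;

	if (also_lower_pgoff)
		toy_grow_down(&vma, 3);
	else
		toy_shift_down(&vma, 3);
	return toy_vma_address(&vma, 102) == new_addr;
}
```

toy_growsdown_keeps_address() should hold, and toy_shift_tracks_ptes() should hold only when vm_pgoff is left alone; lowering vm_pgoff as well would send rmap back to the old, now empty, address.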

Here is a new try; anything further will have to wait until later... hope this helps!

Thanks!

----
Subject: fix race between shift_arg_pages and rmap_walk

From: Andrea Arcangeli <aarcange@redhat.com>

migrate.c requires rmap to be able to find all ptes mapping a page at
all times; otherwise the migration entry can be instantiated but then
can't be removed if the second rmap_walk fails to find the page.

So shift_arg_pages must run atomically with respect to rmap_walk, and
running it under the anon_vma lock is enough to make it atomic.

split_huge_page() will have the same requirement that migrate.c
already has.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -55,6 +55,7 @@
 #include <linux/fsnotify.h>
 #include <linux/fs_struct.h>
 #include <linux/pipe_fs_i.h>
+#include <linux/rmap.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -502,6 +503,7 @@ static int shift_arg_pages(struct vm_are
 	unsigned long length = old_end - old_start;
 	unsigned long new_start = old_start - shift;
 	unsigned long new_end = old_end - shift;
+	unsigned long moved_length;
 	struct mmu_gather *tlb;
 
 	BUG_ON(new_start > new_end);
@@ -514,16 +516,26 @@ static int shift_arg_pages(struct vm_are
 		return -EFAULT;
 
 	/*
+	 * Block rmap walks, or they won't find the stack pages: we have
+	 * to keep the lock held until all pages are moved to the new
+	 * vm_start, so that their page->index is always found
+	 * consistent with the unchanged vm_pgoff.
+	 */
+	spin_lock(&vma->anon_vma->lock);
+
+	/*
 	 * cover the whole range: [new_start, old_end)
 	 */
-	vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL);
+	vma->vm_start = new_start;
 
 	/*
 	 * move the page tables downwards, on failure we rely on
 	 * process cleanup to remove whatever mess we made.
 	 */
-	if (length != move_page_tables(vma, old_start,
-				       vma, new_start, length))
+	moved_length = move_page_tables(vma, old_start,
+					vma, new_start, length);
+	spin_unlock(&vma->anon_vma->lock);
+	if (length != moved_length)
 		return -ENOMEM;
 
 	lru_add_drain();
@@ -549,7 +561,7 @@ static int shift_arg_pages(struct vm_are
 	/*
 	 * shrink the vma to just the new range.
 	 */
-	vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
+	vma->vm_end = new_end;
 
 	return 0;
 }
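The locking in the diff can be modelled the same way. A minimal single-threaded sketch, assuming (as the commit message implies) that rmap_walk takes the same anon_vma lock: a C11 atomic_flag stands in for the spinlock, all toy_* names are hypothetical, and a walker that would spin is reported as TOY_WOULD_SPIN instead of blocking, so the interleaving stays deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define NSLOTS 32

enum toy_walk { TOY_FOUND, TOY_MISSED, TOY_WOULD_SPIN };

struct toy_vma {
	atomic_flag lock;		/* stands in for anon_vma->lock */
	unsigned long vm_start;
	unsigned long vm_pgoff;
	bool pte[NSLOTS];		/* true: page mapped at this virtual page */
};

static bool toy_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void toy_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/*
 * One rmap walk attempt. A real spin_lock() would wait for the holder;
 * the toy reports TOY_WOULD_SPIN instead to stay single-threaded.
 */
static enum toy_walk toy_rmap_walk(struct toy_vma *vma, unsigned long index)
{
	unsigned long slot;
	bool hit;

	if (!toy_trylock(&vma->lock))
		return TOY_WOULD_SPIN;
	slot = (vma->vm_start >> PAGE_SHIFT) + (index - vma->vm_pgoff);
	hit = slot < NSLOTS && vma->pte[slot];
	toy_unlock(&vma->lock);
	return hit ? TOY_FOUND : TOY_MISSED;
}

/*
 * Core of shift_arg_pages(), old ordering (use_lock == false) or patched
 * ordering (use_lock == true), with a walker arriving after vm_start has
 * changed but before the ptes moved. Returns what that walker saw.
 */
static enum toy_walk toy_shift_with_walker(unsigned long shift, bool use_lock)
{
	struct toy_vma vma = { .vm_start = 20UL << PAGE_SHIFT, .vm_pgoff = 7 };
	unsigned long index = 7, old_slot = 20, new_slot;
	enum toy_walk seen;

	assert(shift <= old_slot);
	new_slot = old_slot - shift;
	atomic_flag_clear(&vma.lock);		/* start unlocked */
	vma.pte[old_slot] = true;

	if (use_lock)
		while (!toy_trylock(&vma.lock))
			;			/* spin_lock() */
	vma.vm_start -= shift << PAGE_SHIFT;	/* vma->vm_start = new_start */

	seen = toy_rmap_walk(&vma, index);	/* the racing walker */

	vma.pte[old_slot] = false;		/* move_page_tables() */
	vma.pte[new_slot] = true;
	if (use_lock)
		toy_unlock(&vma.lock);		/* spin_unlock() */

	/* Once the lock is dropped, a walk finds the page at its new address. */
	assert(toy_rmap_walk(&vma, index) == TOY_FOUND);
	return seen;
}
```

toy_shift_with_walker(4, false) should return TOY_MISSED, since the unlocked ordering exposes the (*) window, while toy_shift_with_walker(4, true) returns TOY_WOULD_SPIN: the walker never observes the half-updated state and only proceeds once vm_start and the ptes agree.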
