linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	David Rientjes <rientjes@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 10/12] mm: Check for an empty VMA list in rmap_walk_anon
Date: Wed, 17 Feb 2010 18:22:47 +0000	[thread overview]
Message-ID: <20100217182247.GA15943@csn.ul.ie> (raw)
In-Reply-To: <1265976059-7459-11-git-send-email-mel@csn.ul.ie>

On Fri, Feb 12, 2010 at 12:00:57PM +0000, Mel Gorman wrote:
> There appears to be a race in rmap_walk_anon() that can be triggered by using
> page migration under heavy load on pages that do not belong to the process
> doing the migration - e.g. during memory compaction. The bug triggered is
> a NULL pointer deference in list_for_each_entry().
> 

Incidentally, I hate patches 10-12 with the last one in particular being
outright flaky. I'm looking at copying the KSM approach instead and have
started tests using the following patch. Any comments on this approach
for ensuring the anon_vma exists for as long as it's needed? Obviously,
the refcounts for KSM and migrate can be collapsed to save space but this
is clearer to review.

=== CUT HERE ====
mm,migration: Take a reference to the anon_vma before migrating

rmap_walk_anon() does not use page_lock_anon_vma() for looking up and
locking an anon_vma and it does not appear to have sufficient locking to
ensure the anon_vma does not disappear from under it.

This patch copies an approach used by KSM to take a reference on the
anon_vma while pages are being migrated. This should prevent rmap_walk()
running into nasty surprises later because anon_vma has been freed.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/rmap.h |   23 +++++++++++++++++++++++
 mm/migrate.c         |   12 ++++++++++++
 mm/rmap.c            |   10 +++++-----
 3 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b019ae6..6b5a1a9 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -29,6 +29,9 @@ struct anon_vma {
 #ifdef CONFIG_KSM
 	atomic_t ksm_refcount;
 #endif
+#ifdef CONFIG_MIGRATION
+	atomic_t migrate_refcount;
+#endif
 	/*
 	 * NOTE: the LSB of the head.next is set by
 	 * mm_take_all_locks() _after_ taking the above lock. So the
@@ -61,6 +64,26 @@ static inline int ksm_refcount(struct anon_vma *anon_vma)
 	return 0;
 }
 #endif /* CONFIG_KSM */
+#ifdef CONFIG_MIGRATION
+static inline void migrate_refcount_init(struct anon_vma *anon_vma)
+{
+	atomic_set(&anon_vma->migrate_refcount, 0);
+}
+
+static inline int migrate_refcount(struct anon_vma *anon_vma)
+{
+	return atomic_read(&anon_vma->migrate_refcount);
+}
+#else
+static inline void migrate_refcount_init(struct anon_vma *anon_vma)
+{
+}
+
+static inline int migrate_refcount(struct anon_vma *anon_vma)
+{
+	return 0;
+}
+#endif /* CONFIG_MIGRATE */
 
 static inline struct anon_vma *page_anon_vma(struct page *page)
 {
diff --git a/mm/migrate.c b/mm/migrate.c
index 9a0db5b..63addfa 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -551,6 +551,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 	int rcu_locked = 0;
 	int charge = 0;
 	struct mem_cgroup *mem = NULL;
+	struct anon_vma *anon_vma = NULL;
 
 	if (!newpage)
 		return -ENOMEM;
@@ -607,6 +608,8 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 	if (PageAnon(page)) {
 		rcu_read_lock();
 		rcu_locked = 1;
+		anon_vma = page_anon_vma(page);
+		atomic_inc(&anon_vma->migrate_refcount);
 	}
 
 	/*
@@ -646,6 +649,15 @@ skip_unmap:
 	if (rc)
 		remove_migration_ptes(page, page);
 rcu_unlock:
+
+	/* Drop an anon_vma reference if we took one */
+	if (anon_vma && atomic_dec_and_lock(&anon_vma->migrate_refcount, &anon_vma->lock)) {
+		int empty = list_empty(&anon_vma->head);
+		spin_unlock(&anon_vma->lock);
+		if (empty)
+			anon_vma_free(anon_vma);
+	}
+
 	if (rcu_locked)
 		rcu_read_unlock();
 uncharge:
diff --git a/mm/rmap.c b/mm/rmap.c
index 278cd27..11ba74a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -172,7 +172,8 @@ void anon_vma_unlink(struct vm_area_struct *vma)
 	list_del(&vma->anon_vma_node);
 
 	/* We must garbage collect the anon_vma if it's empty */
-	empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma);
+	empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma) &&
+					!migrate_refcount(anon_vma);
 	spin_unlock(&anon_vma->lock);
 
 	if (empty)
@@ -185,6 +186,7 @@ static void anon_vma_ctor(void *data)
 
 	spin_lock_init(&anon_vma->lock);
 	ksm_refcount_init(anon_vma);
+	migrate_refcount_init(anon_vma);
 	INIT_LIST_HEAD(&anon_vma->head);
 }
 
@@ -1228,10 +1230,8 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
 	/*
 	 * Note: remove_migration_ptes() cannot use page_lock_anon_vma()
 	 * because that depends on page_mapped(); but not all its usages
-	 * are holding mmap_sem, which also gave the necessary guarantee
-	 * (that this anon_vma's slab has not already been destroyed).
-	 * This needs to be reviewed later: avoiding page_lock_anon_vma()
-	 * is risky, and currently limits the usefulness of rmap_walk().
+	 * are holding mmap_sem. Users without mmap_sem are required to
+	 * take a reference count to prevent the anon_vma disappearing
 	 */
 	anon_vma = page_anon_vma(page);
 	if (!anon_vma)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-02-17 18:23 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-12 12:00 [PATCH 0/12] Memory Compaction v2r12 Mel Gorman
2010-02-12 12:00 ` [PATCH 01/12] mm: Document /proc/pagetypeinfo Mel Gorman
2010-02-12 15:54   ` Christoph Lameter
2010-02-16  7:05   ` KOSAKI Motohiro
2010-02-12 12:00 ` [PATCH 02/12] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA or memory hot-remove Mel Gorman
2010-02-16 17:43   ` Rik van Riel
2010-02-12 12:00 ` [PATCH 03/12] Export unusable free space index via /proc/pagetypeinfo Mel Gorman
2010-02-16  7:03   ` KOSAKI Motohiro
2010-02-16  8:36     ` Mel Gorman
2010-02-16  8:41       ` KOSAKI Motohiro
2010-02-16  8:50         ` Mel Gorman
2010-02-16 18:28   ` Rik van Riel
2010-02-18 15:23   ` Minchan Kim
2010-02-18 15:32     ` Mel Gorman
2010-02-12 12:00 ` [PATCH 04/12] Export fragmentation " Mel Gorman
2010-02-16  7:59   ` KOSAKI Motohiro
2010-02-16  8:41     ` Mel Gorman
2010-02-16  8:49       ` KOSAKI Motohiro
2010-02-17  1:44   ` Rik van Riel
2010-02-18 15:37   ` Minchan Kim
2010-02-12 12:00 ` [PATCH 05/12] Memory compaction core Mel Gorman
2010-02-16  8:31   ` KOSAKI Motohiro
2010-02-16  8:48     ` Mel Gorman
2010-02-16 14:55       ` Christoph Lameter
2010-02-16 14:59         ` Mel Gorman
2010-02-18 19:37           ` Christoph Lameter
2010-02-18 21:35             ` Mel Gorman
2010-02-19  0:04             ` KAMEZAWA Hiroyuki
2010-02-17 13:29     ` Mel Gorman
2010-02-17 15:45       ` Rik van Riel
2010-02-18 16:58   ` Minchan Kim
2010-02-18 17:34     ` Mel Gorman
2010-02-19  1:21       ` Minchan Kim
2010-02-19 14:33         ` Mel Gorman
2010-02-12 12:00 ` [PATCH 06/12] Add /proc trigger for memory compaction Mel Gorman
2010-02-12 18:34   ` Valdis.Kletnieks
2010-02-12 18:38     ` Mel Gorman
2010-02-17 16:30   ` Rik van Riel
2010-02-18 19:51   ` Christoph Lameter
2010-02-19  1:56   ` Minchan Kim
2010-02-12 12:00 ` [PATCH 07/12] Add /sys trigger for per-node " Mel Gorman
2010-02-17 16:30   ` Rik van Riel
2010-02-12 12:00 ` [PATCH 08/12] Direct compact when a high-order allocation fails Mel Gorman
2010-02-18  3:57   ` Rik van Riel
2010-02-12 12:00 ` [PATCH 09/12] Do not compact within a preferred zone after a compaction failure Mel Gorman
2010-02-18  4:09   ` Rik van Riel
2010-02-12 12:00 ` [PATCH 10/12] mm: Check for an empty VMA list in rmap_walk_anon Mel Gorman
2010-02-17 18:22   ` Mel Gorman [this message]
2010-02-12 12:00 ` [PATCH 11/12] mm: Take the RCU read lock " Mel Gorman
2010-02-12 12:00 ` [PATCH 12/12] mm: Check the anon_vma is still valid in rmap_walk_anon() Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100217182247.GA15943@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=aarcange@redhat.com \
    --cc=agl@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=cl@linux-foundation.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).