linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] always lock the root anon_vma
@ 2010-05-12 17:38 Rik van Riel
  2010-05-12 17:39 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
                   ` (5 more replies)
  0 siblings, 6 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 17:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

This patch series implements Linus's suggestion of always locking the
root anon_vma.  Because the lock in other anon_vmas no longer protects
anything at all, we cannot do the "lock dance" that Mel's earlier
patches implements and instead need a root pointer in the anon_vma.

The only subtlety these patches rely on is that the same_vma list
is ordered from new to old, with the root anon_vma at the very end.

This, together with the forward list walking in unlink_anon_vmas,
ensures that the root anon_vma is the last one freed.

The KSM refcount adds some additional complexity, because an anon_vma
can stick around after the processes it was attached to have already
exited.  Patch 5/5 deals with that issue.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-12 17:38 [PATCH 0/5] always lock the root anon_vma Rik van Riel
@ 2010-05-12 17:39 ` Rik van Riel
  2010-05-12 20:57   ` Mel Gorman
  2010-05-13  0:30   ` KAMEZAWA Hiroyuki
  2010-05-12 17:39 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 17:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

Subject: rename anon_vma_lock to vma_lock_anon_vma

Rename anon_vma_lock to vma_lock_anon_vma.  This matches the
naming style used in page_lock_anon_vma and will come in really
handy further down in this patch series.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |    4 ++--
 mm/mmap.c            |   14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d25bd22..88cae59 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -90,14 +90,14 @@ static inline struct anon_vma *page_anon_vma(struct page *page)
 	return page_rmapping(page);
 }
 
-static inline void anon_vma_lock(struct vm_area_struct *vma)
+static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
 		spin_lock(&anon_vma->lock);
 }
 
-static inline void anon_vma_unlock(struct vm_area_struct *vma)
+static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
diff --git a/mm/mmap.c b/mm/mmap.c
index 456ec6f..d30bed3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -452,12 +452,12 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
 		spin_lock(&mapping->i_mmap_lock);
 		vma->vm_truncate_count = mapping->truncate_count;
 	}
-	anon_vma_lock(vma);
+	vma_lock_anon_vma(vma);
 
 	__vma_link(mm, vma, prev, rb_link, rb_parent);
 	__vma_link_file(vma);
 
-	anon_vma_unlock(vma);
+	vma_unlock_anon_vma(vma);
 	if (mapping)
 		spin_unlock(&mapping->i_mmap_lock);
 
@@ -1710,7 +1710,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 	 */
 	if (unlikely(anon_vma_prepare(vma)))
 		return -ENOMEM;
-	anon_vma_lock(vma);
+	vma_lock_anon_vma(vma);
 
 	/*
 	 * vma->vm_start/vm_end cannot change under us because the caller
@@ -1721,7 +1721,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 	if (address < PAGE_ALIGN(address+4))
 		address = PAGE_ALIGN(address+4);
 	else {
-		anon_vma_unlock(vma);
+		vma_unlock_anon_vma(vma);
 		return -ENOMEM;
 	}
 	error = 0;
@@ -1737,7 +1737,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 		if (!error)
 			vma->vm_end = address;
 	}
-	anon_vma_unlock(vma);
+	vma_unlock_anon_vma(vma);
 	return error;
 }
 #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
@@ -1762,7 +1762,7 @@ static int expand_downwards(struct vm_area_struct *vma,
 	if (error)
 		return error;
 
-	anon_vma_lock(vma);
+	vma_lock_anon_vma(vma);
 
 	/*
 	 * vma->vm_start/vm_end cannot change under us because the caller
@@ -1783,7 +1783,7 @@ static int expand_downwards(struct vm_area_struct *vma,
 			vma->vm_pgoff -= grow;
 		}
 	}
-	anon_vma_unlock(vma);
+	vma_unlock_anon_vma(vma);
 	return error;
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-12 17:38 [PATCH 0/5] always lock the root anon_vma Rik van Riel
  2010-05-12 17:39 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
@ 2010-05-12 17:39 ` Rik van Riel
  2010-05-12 20:59   ` Mel Gorman
  2010-05-13  0:38   ` KAMEZAWA Hiroyuki
  2010-05-12 17:40 ` [PATCH 4/5] always lock " Rik van Riel
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 17:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

Subject: track the root (oldest) anon_vma

Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
take the lock on the root anon_vma, we cannot use the lock on higher-up
anon_vmas to lock anything.  This makes it impossible to do an indirect
lookup of the root anon_vma, since the data structures could go away from
under us.

However, a direct pointer is safe because the root anon_vma is always the
last one that gets freed on munmap or exit, by virtue of the same_vma list
order and unlink_anon_vmas walking the list forward.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |    1 +
 mm/rmap.c            |   20 +++++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 72ecd87..457ae1e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -26,6 +26,7 @@
  */
 struct anon_vma {
 	spinlock_t lock;	/* Serialize access to vma list */
+	struct anon_vma *root;	/* Root of this anon_vma tree */
 #ifdef CONFIG_KSM
 	atomic_t ksm_refcount;
 #endif
diff --git a/mm/rmap.c b/mm/rmap.c
index 6102f77..e34cb56 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -132,6 +132,11 @@ int anon_vma_prepare(struct vm_area_struct *vma)
 			if (unlikely(!anon_vma))
 				goto out_enomem_free_avc;
 			allocated = anon_vma;
+			/*
+			 * This VMA had no anon_vma yet.  This anon_vma is
+			 * the root of any anon_vma tree that might form.
+			 */
+			anon_vma->root = anon_vma;
 		}
 
 		anon_vma_lock(anon_vma);
@@ -203,7 +208,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
  */
 int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 {
-	struct anon_vma_chain *avc;
+	struct anon_vma_chain *avc, *root_avc;
 	struct anon_vma *anon_vma;
 
 	/* Don't bother if the parent process has no anon_vma here. */
@@ -224,9 +229,18 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	avc = anon_vma_chain_alloc();
 	if (!avc)
 		goto out_error_free_anon_vma;
-	anon_vma_chain_link(vma, avc, anon_vma);
+
+	/*
+	 * Get the root anon_vma on the list by depending on the ordering
+	 * of the same_vma list setup by previous invocations of anon_vma_fork.
+	 * The root anon_vma will always be referenced by the last item
+	 * in the anon_vma_chain list.
+	 */
+	root_avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma);
+	anon_vma->root = root_avc->anon_vma;
 	/* Mark this anon_vma as the one where our new (COWed) pages go. */
 	vma->anon_vma = anon_vma;
+	anon_vma_chain_link(vma, avc, anon_vma);
 
 	return 0;
 
@@ -261,7 +275,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 {
 	struct anon_vma_chain *avc, *next;
 
-	/* Unlink each anon_vma chained to the VMA. */
+	/* Unlink each anon_vma chained to the VMA, from newest to oldest. */
 	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
 		anon_vma_unlink(avc);
 		list_del(&avc->same_vma);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-12 17:38 [PATCH 0/5] always lock the root anon_vma Rik van Riel
  2010-05-12 17:39 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
  2010-05-12 17:39 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
@ 2010-05-12 17:40 ` Rik van Riel
  2010-05-12 21:02   ` Mel Gorman
  2010-05-12 21:55   ` [PATCH 4/5] always lock the root (oldest) anon_vma Linus Torvalds
  2010-05-12 17:41 ` [PATCH 5/5] extend KSM refcounts to the anon_vma root Rik van Riel
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 17:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

Subject: always lock the root (oldest) anon_vma

Always (and only) lock the root (oldest) anon_vma whenever we do something in an
anon_vma.  The recently introduced anon_vma scalability is due to the rmap code
scanning only the VMAs that need to be scanned.  Many common operations still
took the anon_vma lock on the root anon_vma, so always taking that lock is not
expected to introduce any scalability issues.

However, always taking the same lock does mean we only need to take one lock,
which means rmap_walk on pages from any anon_vma in the vma is excluded from
occurring during an munmap, expand_stack or other operation that needs to
exclude rmap_walk and similar functions.

Also add the proper locking to vma_adjust.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |    8 ++++----
 mm/ksm.c             |    2 +-
 mm/mmap.c            |    6 +++++-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 457ae1e..33ffe14 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -95,24 +95,24 @@ static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_lock(&anon_vma->lock);
+		spin_lock(&anon_vma->root->lock);
 }
 
 static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_unlock(&anon_vma->lock);
+		spin_unlock(&anon_vma->root->lock);
 }
 
 static inline void anon_vma_lock(struct anon_vma *anon_vma)
 {
-	spin_lock(&anon_vma->lock);
+	spin_lock(&anon_vma->root->lock);
 }
 
 static inline void anon_vma_unlock(struct anon_vma *anon_vma)
 {
-	spin_unlock(&anon_vma->lock);
+	spin_unlock(&anon_vma->root->lock);
 }
 
 /*
diff --git a/mm/ksm.c b/mm/ksm.c
index d488012..7ca0dd7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -325,7 +325,7 @@ static void drop_anon_vma(struct rmap_item *rmap_item)
 {
 	struct anon_vma *anon_vma = rmap_item->anon_vma;
 
-	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->lock)) {
+	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
 		int empty = list_empty(&anon_vma->head);
 		anon_vma_unlock(anon_vma);
 		if (empty)
diff --git a/mm/mmap.c b/mm/mmap.c
index f70bc65..b7dfe30 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -553,6 +553,8 @@ again:			remove_next = 1 + (end > next->vm_end);
 		}
 	}
 
+	vma_lock_anon_vma(vma);
+
 	if (file) {
 		mapping = file->f_mapping;
 		if (!(vma->vm_flags & VM_NONLINEAR))
@@ -600,6 +602,8 @@ again:			remove_next = 1 + (end > next->vm_end);
 		flush_dcache_mmap_unlock(mapping);
 	}
 
+	vma_unlock_anon_vma(vma);
+
 	if (remove_next) {
 		/*
 		 * vma_merge has merged next into vma, and needs
@@ -2471,7 +2475,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 		 * The LSB of head.next can't change from under us
 		 * because we hold the mm_all_locks_mutex.
 		 */
-		spin_lock_nest_lock(&anon_vma->lock, &mm->mmap_sem);
+		spin_lock_nest_lock(&anon_vma->root->lock, &mm->mmap_sem);
 		/*
 		 * We can safely modify head.next after taking the
 		 * anon_vma->lock. If some other vma in this mm shares

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 5/5] extend KSM refcounts to the anon_vma root
  2010-05-12 17:38 [PATCH 0/5] always lock the root anon_vma Rik van Riel
                   ` (2 preceding siblings ...)
  2010-05-12 17:40 ` [PATCH 4/5] always lock " Rik van Riel
@ 2010-05-12 17:41 ` Rik van Riel
  2010-05-12 21:07   ` Mel Gorman
  2010-05-12 17:41 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
  2010-05-20 22:42 ` [PATCH 6/5] adjust mm_take_all_locks to anon-vma-root locking Andrea Arcangeli
  5 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 17:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

Subject: extend KSM refcounts to the anon_vma root

KSM reference counts can cause an anon_vma to exist after the processe
it belongs to have already exited.  Because the anon_vma lock now lives
in the root anon_vma, we need to ensure that the root anon_vma stays
around until after all the "child" anon_vmas have been freed.

The obvious way to do this is to have a "child" anon_vma take a
reference to the root in anon_vma_fork.  When the anon_vma is freed
at munmap or process exit, we drop the refcount in anon_vma_unlink
and possibly free the root anon_vma.

The KSM anon_vma reference count function also needs to be modified
to deal with the possibility of freeing 2 levels of anon_vma.  The
easiest way to do this is to break out the KSM magic and make it
generic.

When compiling without CONFIG_KSM, this code is compiled out.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |   12 ++++++++++++
 mm/ksm.c             |   17 ++++++-----------
 mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 33ffe14..387d40c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -126,6 +126,18 @@ int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
 void __anon_vma_link(struct vm_area_struct *);
 void anon_vma_free(struct anon_vma *);
 
+#ifdef CONFIG_KSM
+static inline void get_anon_vma(struct anon_vma *anon_vma)
+{
+	atomic_inc(&anon_vma->ksm_refcount);
+}
+
+void drop_anon_vma(struct anon_vma *);
+#else
+#define get_anon_vma(x)		do {} while(0)
+#define drop_anon_vma(x)	do {} while(0)
+#endif
+
 static inline void anon_vma_merge(struct vm_area_struct *vma,
 				  struct vm_area_struct *next)
 {
diff --git a/mm/ksm.c b/mm/ksm.c
index 7ca0dd7..9f2acc9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_item *rmap_item,
 			  struct anon_vma *anon_vma)
 {
 	rmap_item->anon_vma = anon_vma;
-	atomic_inc(&anon_vma->ksm_refcount);
+	get_anon_vma(anon_vma);
 }
 
-static void drop_anon_vma(struct rmap_item *rmap_item)
+static void ksm_drop_anon_vma(struct rmap_item *rmap_item)
 {
 	struct anon_vma *anon_vma = rmap_item->anon_vma;
 
-	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
-		int empty = list_empty(&anon_vma->head);
-		anon_vma_unlock(anon_vma);
-		if (empty)
-			anon_vma_free(anon_vma);
-	}
+	drop_anon_vma(anon_vma);
 }
 
 /*
@@ -415,7 +410,7 @@ static void break_cow(struct rmap_item *rmap_item)
 	 * It is not an accident that whenever we want to break COW
 	 * to undo, we also need to drop a reference to the anon_vma.
 	 */
-	drop_anon_vma(rmap_item);
+	ksm_drop_anon_vma(rmap_item);
 
 	down_read(&mm->mmap_sem);
 	if (ksm_test_exit(mm))
@@ -470,7 +465,7 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
 			ksm_pages_sharing--;
 		else
 			ksm_pages_shared--;
-		drop_anon_vma(rmap_item);
+		ksm_drop_anon_vma(rmap_item);
 		rmap_item->address &= PAGE_MASK;
 		cond_resched();
 	}
@@ -558,7 +553,7 @@ static void remove_rmap_item_from_tree(struct rmap_item *rmap_item)
 		else
 			ksm_pages_shared--;
 
-		drop_anon_vma(rmap_item);
+		ksm_drop_anon_vma(rmap_item);
 		rmap_item->address &= PAGE_MASK;
 
 	} else if (rmap_item->address & UNSTABLE_FLAG) {
diff --git a/mm/rmap.c b/mm/rmap.c
index f0ba648..d63cd91 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -238,6 +238,12 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	 */
 	root_avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma);
 	anon_vma->root = root_avc->anon_vma;
+	/*
+	 * With KSM refcounts, an anon_vma can stay around longer than the
+	 * process it belongs to.  The root anon_vma needs to be pinned,
+	 * because that is where the lock lives.
+	 */
+	get_anon_vma(anon_vma->root);
 	/* Mark this anon_vma as the one where our new (COWed) pages go. */
 	vma->anon_vma = anon_vma;
 	anon_vma_chain_link(vma, avc, anon_vma);
@@ -267,8 +273,11 @@ static void anon_vma_unlink(struct anon_vma_chain *anon_vma_chain)
 	empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma);
 	anon_vma_unlock(anon_vma);
 
-	if (empty)
+	if (empty) {
+		/* We no longer need the root anon_vma */
+		drop_anon_vma(anon_vma->root);
 		anon_vma_free(anon_vma);
+	}
 }
 
 void unlink_anon_vmas(struct vm_area_struct *vma)
@@ -1355,6 +1364,40 @@ int try_to_munlock(struct page *page)
 		return try_to_unmap_file(page, TTU_MUNLOCK);
 }
 
+#ifdef CONFIG_KSM
+/*
+ * Drop an anon_vma refcount, freeing the anon_vma and anon_vma->root
+ * if necessary.  Be careful to do all the tests under the lock.  Once
+ * we know we are the last user, nobody else can get a reference and we
+ * can do the freeing without the lock.
+ */
+void drop_anon_vma(struct anon_vma *anon_vma)
+{
+	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
+		struct anon_vma *root = anon_vma->root;
+		int empty list_empty(&anon_vma->head);
+		int last_root_user = 0;
+		int root_empty = 0;
+
+		/*
+		 * The refcount on a non-root anon_vma got dropped.  Drop
+		 * the refcount on the root and check if we need to free it.
+		 */
+		if (empty && anon_vma != root) {
+			last_root_user = atomic_dec_and_test(&root->ksm_refcount);
+			root_empty = list_empty(&root->head);
+		}
+		anon_vma_unlock(anon_vma);
+
+		if (empty) {
+			anon_vma_free(anon_vma);
+			if (root_empty && last_root_user)
+				anon_vma_free(root);
+		}
+	}
+}
+#endif
+
 #ifdef CONFIG_MIGRATION
 /*
  * rmap_walk() and its helpers rmap_walk_anon() and rmap_walk_file():

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function
  2010-05-12 17:38 [PATCH 0/5] always lock the root anon_vma Rik van Riel
                   ` (3 preceding siblings ...)
  2010-05-12 17:41 ` [PATCH 5/5] extend KSM refcounts to the anon_vma root Rik van Riel
@ 2010-05-12 17:41 ` Rik van Riel
  2010-05-12 20:58   ` Mel Gorman
  2010-05-13  0:32   ` KAMEZAWA Hiroyuki
  2010-05-20 22:42 ` [PATCH 6/5] adjust mm_take_all_locks to anon-vma-root locking Andrea Arcangeli
  5 siblings, 2 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 17:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

Subject: change direct call of spin_lock(anon_vma->lock) to inline function

Subsitute a direct call of spin_lock(anon_vma->lock) with
an inline function doing exactly the same.

This makes it easier to do the substitution to the root
anon_vma lock in a following patch.

We will deal with the handful of special locks (nested,
dec_and_lock, etc) separately.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |   10 ++++++++++
 mm/ksm.c             |   18 +++++++++---------
 mm/mmap.c            |    2 +-
 mm/rmap.c            |   20 ++++++++++----------
 4 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 88cae59..72ecd87 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -104,6 +104,16 @@ static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 		spin_unlock(&anon_vma->lock);
 }
 
+static inline void anon_vma_lock(struct anon_vma *anon_vma)
+{
+	spin_lock(&anon_vma->lock);
+}
+
+static inline void anon_vma_unlock(struct anon_vma *anon_vma)
+{
+	spin_unlock(&anon_vma->lock);
+}
+
 /*
  * anon_vma helper functions.
  */
diff --git a/mm/ksm.c b/mm/ksm.c
index 956880f..d488012 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -327,7 +327,7 @@ static void drop_anon_vma(struct rmap_item *rmap_item)
 
 	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->lock)) {
 		int empty = list_empty(&anon_vma->head);
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 		if (empty)
 			anon_vma_free(anon_vma);
 	}
@@ -1566,7 +1566,7 @@ again:
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -1589,7 +1589,7 @@ again:
 			if (!search_new_forks || !mapcount)
 				break;
 		}
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 		if (!mapcount)
 			goto out;
 	}
@@ -1619,7 +1619,7 @@ again:
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -1637,11 +1637,11 @@ again:
 			ret = try_to_unmap_one(page, vma,
 					rmap_item->address, flags);
 			if (ret != SWAP_AGAIN || !page_mapped(page)) {
-				spin_unlock(&anon_vma->lock);
+				anon_vma_unlock(anon_vma);
 				goto out;
 			}
 		}
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 	}
 	if (!search_new_forks++)
 		goto again;
@@ -1671,7 +1671,7 @@ again:
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -1688,11 +1688,11 @@ again:
 
 			ret = rmap_one(page, vma, rmap_item->address, arg);
 			if (ret != SWAP_AGAIN) {
-				spin_unlock(&anon_vma->lock);
+				anon_vma_unlock(anon_vma);
 				goto out;
 			}
 		}
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 	}
 	if (!search_new_forks++)
 		goto again;
diff --git a/mm/mmap.c b/mm/mmap.c
index d30bed3..f70bc65 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2589,7 +2589,7 @@ static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
 		if (!__test_and_clear_bit(0, (unsigned long *)
 					  &anon_vma->head.next))
 			BUG();
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 	}
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 07fc947..4eb8937 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -134,7 +134,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
 			allocated = anon_vma;
 		}
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		/* page_table_lock to protect against threads */
 		spin_lock(&mm->page_table_lock);
 		if (likely(!vma->anon_vma)) {
@@ -147,7 +147,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
 			avc = NULL;
 		}
 		spin_unlock(&mm->page_table_lock);
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 
 		if (unlikely(allocated))
 			anon_vma_free(allocated);
@@ -170,9 +170,9 @@ static void anon_vma_chain_link(struct vm_area_struct *vma,
 	avc->anon_vma = anon_vma;
 	list_add(&avc->same_vma, &vma->anon_vma_chain);
 
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 	list_add_tail(&avc->same_anon_vma, &anon_vma->head);
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 }
 
 /*
@@ -246,12 +246,12 @@ static void anon_vma_unlink(struct anon_vma_chain *anon_vma_chain)
 	if (!anon_vma)
 		return;
 
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 	list_del(&anon_vma_chain->same_anon_vma);
 
 	/* We must garbage collect the anon_vma if it's empty */
 	empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma);
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 
 	if (empty)
 		anon_vma_free(anon_vma);
@@ -302,7 +302,7 @@ struct anon_vma *page_lock_anon_vma(struct page *page)
 		goto out;
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 	return anon_vma;
 out:
 	rcu_read_unlock();
@@ -311,7 +311,7 @@ out:
 
 void page_unlock_anon_vma(struct anon_vma *anon_vma)
 {
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 	rcu_read_unlock();
 }
 
@@ -1364,7 +1364,7 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
 	anon_vma = page_anon_vma(page);
 	if (!anon_vma)
 		return ret;
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);
@@ -1374,7 +1374,7 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
 		if (ret != SWAP_AGAIN)
 			break;
 	}
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 	return ret;
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-12 17:39 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
@ 2010-05-12 20:57   ` Mel Gorman
  2010-05-13  0:30   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 51+ messages in thread
From: Mel Gorman @ 2010-05-12 20:57 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Wed, May 12, 2010 at 01:39:31PM -0400, Rik van Riel wrote:
> Subject: rename anon_vma_lock to vma_lock_anon_vma
> 
> Rename anon_vma_lock to vma_lock_anon_vma.  This matches the
> naming style used in page_lock_anon_vma and will come in really
> handy further down in this patch series.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
>  include/linux/rmap.h |    4 ++--
>  mm/mmap.c            |   14 +++++++-------
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index d25bd22..88cae59 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -90,14 +90,14 @@ static inline struct anon_vma *page_anon_vma(struct page *page)
>  	return page_rmapping(page);
>  }
>  
> -static inline void anon_vma_lock(struct vm_area_struct *vma)
> +static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;
>  	if (anon_vma)
>  		spin_lock(&anon_vma->lock);
>  }
>  
> -static inline void anon_vma_unlock(struct vm_area_struct *vma)
> +static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;
>  	if (anon_vma)
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 456ec6f..d30bed3 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -452,12 +452,12 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
>  		spin_lock(&mapping->i_mmap_lock);
>  		vma->vm_truncate_count = mapping->truncate_count;
>  	}
> -	anon_vma_lock(vma);
> +	vma_lock_anon_vma(vma);
>  
>  	__vma_link(mm, vma, prev, rb_link, rb_parent);
>  	__vma_link_file(vma);
>  
> -	anon_vma_unlock(vma);
> +	vma_unlock_anon_vma(vma);
>  	if (mapping)
>  		spin_unlock(&mapping->i_mmap_lock);
>  
> @@ -1710,7 +1710,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  	 */
>  	if (unlikely(anon_vma_prepare(vma)))
>  		return -ENOMEM;
> -	anon_vma_lock(vma);
> +	vma_lock_anon_vma(vma);
>  
>  	/*
>  	 * vma->vm_start/vm_end cannot change under us because the caller
> @@ -1721,7 +1721,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  	if (address < PAGE_ALIGN(address+4))
>  		address = PAGE_ALIGN(address+4);
>  	else {
> -		anon_vma_unlock(vma);
> +		vma_unlock_anon_vma(vma);
>  		return -ENOMEM;
>  	}
>  	error = 0;
> @@ -1737,7 +1737,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		if (!error)
>  			vma->vm_end = address;
>  	}
> -	anon_vma_unlock(vma);
> +	vma_unlock_anon_vma(vma);
>  	return error;
>  }
>  #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
> @@ -1762,7 +1762,7 @@ static int expand_downwards(struct vm_area_struct *vma,
>  	if (error)
>  		return error;
>  
> -	anon_vma_lock(vma);
> +	vma_lock_anon_vma(vma);
>  
>  	/*
>  	 * vma->vm_start/vm_end cannot change under us because the caller
> @@ -1783,7 +1783,7 @@ static int expand_downwards(struct vm_area_struct *vma,
>  			vma->vm_pgoff -= grow;
>  		}
>  	}
> -	anon_vma_unlock(vma);
> +	vma_unlock_anon_vma(vma);
>  	return error;
>  }
>  
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function
  2010-05-12 17:41 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
@ 2010-05-12 20:58   ` Mel Gorman
  2010-05-13  0:32   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 51+ messages in thread
From: Mel Gorman @ 2010-05-12 20:58 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Wed, May 12, 2010 at 01:41:18PM -0400, Rik van Riel wrote:
> Subject: change direct call of spin_lock(anon_vma->lock) to inline function
> 
> Subsitute a direct call of spin_lock(anon_vma->lock) with
> an inline function doing exactly the same.
> 
> This makes it easier to do the substitution to the root
> anon_vma lock in a following patch.
> 
> We will deal with the handful of special locks (nested,
> dec_and_lock, etc) separately.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
>  include/linux/rmap.h |   10 ++++++++++
>  mm/ksm.c             |   18 +++++++++---------
>  mm/mmap.c            |    2 +-
>  mm/rmap.c            |   20 ++++++++++----------
>  4 files changed, 30 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 88cae59..72ecd87 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -104,6 +104,16 @@ static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
>  		spin_unlock(&anon_vma->lock);
>  }
>  
> +static inline void anon_vma_lock(struct anon_vma *anon_vma)
> +{
> +	spin_lock(&anon_vma->lock);
> +}
> +
> +static inline void anon_vma_unlock(struct anon_vma *anon_vma)
> +{
> +	spin_unlock(&anon_vma->lock);
> +}
> +
>  /*
>   * anon_vma helper functions.
>   */
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 956880f..d488012 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -327,7 +327,7 @@ static void drop_anon_vma(struct rmap_item *rmap_item)
>  
>  	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->lock)) {
>  		int empty = list_empty(&anon_vma->head);
> -		spin_unlock(&anon_vma->lock);
> +		anon_vma_unlock(anon_vma);
>  		if (empty)
>  			anon_vma_free(anon_vma);
>  	}
> @@ -1566,7 +1566,7 @@ again:
>  		struct anon_vma_chain *vmac;
>  		struct vm_area_struct *vma;
>  
> -		spin_lock(&anon_vma->lock);
> +		anon_vma_lock(anon_vma);
>  		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
>  			vma = vmac->vma;
>  			if (rmap_item->address < vma->vm_start ||
> @@ -1589,7 +1589,7 @@ again:
>  			if (!search_new_forks || !mapcount)
>  				break;
>  		}
> -		spin_unlock(&anon_vma->lock);
> +		anon_vma_unlock(anon_vma);
>  		if (!mapcount)
>  			goto out;
>  	}
> @@ -1619,7 +1619,7 @@ again:
>  		struct anon_vma_chain *vmac;
>  		struct vm_area_struct *vma;
>  
> -		spin_lock(&anon_vma->lock);
> +		anon_vma_lock(anon_vma);
>  		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
>  			vma = vmac->vma;
>  			if (rmap_item->address < vma->vm_start ||
> @@ -1637,11 +1637,11 @@ again:
>  			ret = try_to_unmap_one(page, vma,
>  					rmap_item->address, flags);
>  			if (ret != SWAP_AGAIN || !page_mapped(page)) {
> -				spin_unlock(&anon_vma->lock);
> +				anon_vma_unlock(anon_vma);
>  				goto out;
>  			}
>  		}
> -		spin_unlock(&anon_vma->lock);
> +		anon_vma_unlock(anon_vma);
>  	}
>  	if (!search_new_forks++)
>  		goto again;
> @@ -1671,7 +1671,7 @@ again:
>  		struct anon_vma_chain *vmac;
>  		struct vm_area_struct *vma;
>  
> -		spin_lock(&anon_vma->lock);
> +		anon_vma_lock(anon_vma);
>  		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
>  			vma = vmac->vma;
>  			if (rmap_item->address < vma->vm_start ||
> @@ -1688,11 +1688,11 @@ again:
>  
>  			ret = rmap_one(page, vma, rmap_item->address, arg);
>  			if (ret != SWAP_AGAIN) {
> -				spin_unlock(&anon_vma->lock);
> +				anon_vma_unlock(anon_vma);
>  				goto out;
>  			}
>  		}
> -		spin_unlock(&anon_vma->lock);
> +		anon_vma_unlock(anon_vma);
>  	}
>  	if (!search_new_forks++)
>  		goto again;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d30bed3..f70bc65 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2589,7 +2589,7 @@ static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
>  		if (!__test_and_clear_bit(0, (unsigned long *)
>  					  &anon_vma->head.next))
>  			BUG();
> -		spin_unlock(&anon_vma->lock);
> +		anon_vma_unlock(anon_vma);
>  	}
>  }
>  
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 07fc947..4eb8937 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -134,7 +134,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
>  			allocated = anon_vma;
>  		}
>  
> -		spin_lock(&anon_vma->lock);
> +		anon_vma_lock(anon_vma);
>  		/* page_table_lock to protect against threads */
>  		spin_lock(&mm->page_table_lock);
>  		if (likely(!vma->anon_vma)) {
> @@ -147,7 +147,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
>  			avc = NULL;
>  		}
>  		spin_unlock(&mm->page_table_lock);
> -		spin_unlock(&anon_vma->lock);
> +		anon_vma_unlock(anon_vma);
>  
>  		if (unlikely(allocated))
>  			anon_vma_free(allocated);
> @@ -170,9 +170,9 @@ static void anon_vma_chain_link(struct vm_area_struct *vma,
>  	avc->anon_vma = anon_vma;
>  	list_add(&avc->same_vma, &vma->anon_vma_chain);
>  
> -	spin_lock(&anon_vma->lock);
> +	anon_vma_lock(anon_vma);
>  	list_add_tail(&avc->same_anon_vma, &anon_vma->head);
> -	spin_unlock(&anon_vma->lock);
> +	anon_vma_unlock(anon_vma);
>  }
>  
>  /*
> @@ -246,12 +246,12 @@ static void anon_vma_unlink(struct anon_vma_chain *anon_vma_chain)
>  	if (!anon_vma)
>  		return;
>  
> -	spin_lock(&anon_vma->lock);
> +	anon_vma_lock(anon_vma);
>  	list_del(&anon_vma_chain->same_anon_vma);
>  
>  	/* We must garbage collect the anon_vma if it's empty */
>  	empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma);
> -	spin_unlock(&anon_vma->lock);
> +	anon_vma_unlock(anon_vma);
>  
>  	if (empty)
>  		anon_vma_free(anon_vma);
> @@ -302,7 +302,7 @@ struct anon_vma *page_lock_anon_vma(struct page *page)
>  		goto out;
>  
>  	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> -	spin_lock(&anon_vma->lock);
> +	anon_vma_lock(anon_vma);
>  	return anon_vma;
>  out:
>  	rcu_read_unlock();
> @@ -311,7 +311,7 @@ out:
>  
>  void page_unlock_anon_vma(struct anon_vma *anon_vma)
>  {
> -	spin_unlock(&anon_vma->lock);
> +	anon_vma_unlock(anon_vma);
>  	rcu_read_unlock();
>  }
>  
> @@ -1364,7 +1364,7 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
>  	anon_vma = page_anon_vma(page);
>  	if (!anon_vma)
>  		return ret;
> -	spin_lock(&anon_vma->lock);
> +	anon_vma_lock(anon_vma);
>  	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
>  		struct vm_area_struct *vma = avc->vma;
>  		unsigned long address = vma_address(page, vma);
> @@ -1374,7 +1374,7 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
>  		if (ret != SWAP_AGAIN)
>  			break;
>  	}
> -	spin_unlock(&anon_vma->lock);
> +	anon_vma_unlock(anon_vma);
>  	return ret;
>  }
>  
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-12 17:39 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
@ 2010-05-12 20:59   ` Mel Gorman
  2010-05-12 21:01     ` Rik van Riel
  2010-05-13  0:38   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 51+ messages in thread
From: Mel Gorman @ 2010-05-12 20:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Wed, May 12, 2010 at 01:39:58PM -0400, Rik van Riel wrote:
> Subject: track the root (oldest) anon_vma
> 
> Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
> take the lock on the root anon_vma, we cannot use the lock on higher-up
> anon_vmas to lock anything.  This makes it impossible to do an indirect
> lookup of the root anon_vma, since the data structures could go away from
> under us.
> 
> However, a direct pointer is safe because the root anon_vma is always the
> last one that gets freed on munmap or exit, by virtue of the same_vma list
> order and unlink_anon_vmas walking the list forward.
> 

Shouldn't this be "usually the last one that gets freed" because of the
ref-counting by KSM aspect? Minor nit anyway.

> Signed-off-by: Rik van Riel <riel@redhat.com>

Otherwise

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
>  include/linux/rmap.h |    1 +
>  mm/rmap.c            |   20 +++++++++++++++++---
>  2 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 72ecd87..457ae1e 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -26,6 +26,7 @@
>   */
>  struct anon_vma {
>  	spinlock_t lock;	/* Serialize access to vma list */
> +	struct anon_vma *root;	/* Root of this anon_vma tree */
>  #ifdef CONFIG_KSM
>  	atomic_t ksm_refcount;
>  #endif
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 6102f77..e34cb56 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -132,6 +132,11 @@ int anon_vma_prepare(struct vm_area_struct *vma)
>  			if (unlikely(!anon_vma))
>  				goto out_enomem_free_avc;
>  			allocated = anon_vma;
> +			/*
> +			 * This VMA had no anon_vma yet.  This anon_vma is
> +			 * the root of any anon_vma tree that might form.
> +			 */
> +			anon_vma->root = anon_vma;
>  		}
>  
>  		anon_vma_lock(anon_vma);
> @@ -203,7 +208,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>   */
>  int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>  {
> -	struct anon_vma_chain *avc;
> +	struct anon_vma_chain *avc, *root_avc;
>  	struct anon_vma *anon_vma;
>  
>  	/* Don't bother if the parent process has no anon_vma here. */
> @@ -224,9 +229,18 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>  	avc = anon_vma_chain_alloc();
>  	if (!avc)
>  		goto out_error_free_anon_vma;
> -	anon_vma_chain_link(vma, avc, anon_vma);
> +
> +	/*
> +	 * Get the root anon_vma on the list by depending on the ordering
> +	 * of the same_vma list setup by previous invocations of anon_vma_fork.
> +	 * The root anon_vma will always be referenced by the last item
> +	 * in the anon_vma_chain list.
> +	 */
> +	root_avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma);
> +	anon_vma->root = root_avc->anon_vma;
>  	/* Mark this anon_vma as the one where our new (COWed) pages go. */
>  	vma->anon_vma = anon_vma;
> +	anon_vma_chain_link(vma, avc, anon_vma);
>  
>  	return 0;
>  
> @@ -261,7 +275,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
>  {
>  	struct anon_vma_chain *avc, *next;
>  
> -	/* Unlink each anon_vma chained to the VMA. */
> +	/* Unlink each anon_vma chained to the VMA, from newest to oldest. */
>  	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
>  		anon_vma_unlink(avc);
>  		list_del(&avc->same_vma);
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-12 20:59   ` Mel Gorman
@ 2010-05-12 21:01     ` Rik van Riel
  0 siblings, 0 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 21:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On 05/12/2010 04:59 PM, Mel Gorman wrote:
> On Wed, May 12, 2010 at 01:39:58PM -0400, Rik van Riel wrote:
>> Subject: track the root (oldest) anon_vma
>>
>> Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
>> take the lock on the root anon_vma, we cannot use the lock on higher-up
>> anon_vmas to lock anything.  This makes it impossible to do an indirect
>> lookup of the root anon_vma, since the data structures could go away from
>> under us.
>>
>> However, a direct pointer is safe because the root anon_vma is always the
>> last one that gets freed on munmap or exit, by virtue of the same_vma list
>> order and unlink_anon_vmas walking the list forward.
>>
>
> Shouldn't this be "usually the last one that gets freed" because of the
> ref-counting by KSM aspect? Minor nit anyway.

It needs to be the last one that gets freed.  Patch 5/5 makes
sure that it is when KSM refcounting is involved.

>> Signed-off-by: Rik van Riel<riel@redhat.com>
>
> Otherwise
>
> Acked-by: Mel Gorman<mel@csn.ul.ie>

Thank you for reviewing these patches.

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-12 17:40 ` [PATCH 4/5] always lock " Rik van Riel
@ 2010-05-12 21:02   ` Mel Gorman
  2010-05-12 21:08     ` Rik van Riel
  2010-05-12 21:55   ` [PATCH 4/5] always lock the root (oldest) anon_vma Linus Torvalds
  1 sibling, 1 reply; 51+ messages in thread
From: Mel Gorman @ 2010-05-12 21:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Wed, May 12, 2010 at 01:40:29PM -0400, Rik van Riel wrote:
> Subject: always lock the root (oldest) anon_vma
> 
> Always (and only) lock the root (oldest) anon_vma whenever we do something in an
> anon_vma.  The recently introduced anon_vma scalability is due to the rmap code
> scanning only the VMAs that need to be scanned.  Many common operations still
> took the anon_vma lock on the root anon_vma, so always taking that lock is not
> expected to introduce any scalability issues.
> 
> However, always taking the same lock does mean we only need to take one lock,
> which means rmap_walk on pages from any anon_vma in the vma is excluded from
> occurring during an munmap, expand_stack or other operation that needs to
> exclude rmap_walk and similar functions.
> 
> Also add the proper locking to vma_adjust.
> 

This last comment is a bit light. It's actually restoring the lock that
was taken in 2.6.33 to some extent except we are always taking it now.
In 2.6.33, it was resricted to

       if (vma->anon_vma && (insert || importer || start != vma->vm_start))
                anon_vma = vma->anon_vma;

but now it's always. Has it been determined that the locking in 2.6.33
was insufficient or are we playing it safe now?

> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  include/linux/rmap.h |    8 ++++----
>  mm/ksm.c             |    2 +-
>  mm/mmap.c            |    6 +++++-
>  3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 457ae1e..33ffe14 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -95,24 +95,24 @@ static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;
>  	if (anon_vma)
> -		spin_lock(&anon_vma->lock);
> +		spin_lock(&anon_vma->root->lock);
>  }
>  
>  static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;
>  	if (anon_vma)
> -		spin_unlock(&anon_vma->lock);
> +		spin_unlock(&anon_vma->root->lock);
>  }
>  
>  static inline void anon_vma_lock(struct anon_vma *anon_vma)
>  {
> -	spin_lock(&anon_vma->lock);
> +	spin_lock(&anon_vma->root->lock);
>  }
>  
>  static inline void anon_vma_unlock(struct anon_vma *anon_vma)
>  {
> -	spin_unlock(&anon_vma->lock);
> +	spin_unlock(&anon_vma->root->lock);
>  }
>  
>  /*
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d488012..7ca0dd7 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -325,7 +325,7 @@ static void drop_anon_vma(struct rmap_item *rmap_item)
>  {
>  	struct anon_vma *anon_vma = rmap_item->anon_vma;
>  
> -	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->lock)) {
> +	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
>  		int empty = list_empty(&anon_vma->head);
>  		anon_vma_unlock(anon_vma);
>  		if (empty)
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f70bc65..b7dfe30 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -553,6 +553,8 @@ again:			remove_next = 1 + (end > next->vm_end);
>  		}
>  	}
>  
> +	vma_lock_anon_vma(vma);
> +
>  	if (file) {
>  		mapping = file->f_mapping;
>  		if (!(vma->vm_flags & VM_NONLINEAR))
> @@ -600,6 +602,8 @@ again:			remove_next = 1 + (end > next->vm_end);
>  		flush_dcache_mmap_unlock(mapping);
>  	}
>  
> +	vma_unlock_anon_vma(vma);
> +
>  	if (remove_next) {
>  		/*
>  		 * vma_merge has merged next into vma, and needs
> @@ -2471,7 +2475,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
>  		 * The LSB of head.next can't change from under us
>  		 * because we hold the mm_all_locks_mutex.
>  		 */
> -		spin_lock_nest_lock(&anon_vma->lock, &mm->mmap_sem);
> +		spin_lock_nest_lock(&anon_vma->root->lock, &mm->mmap_sem);
>  		/*
>  		 * We can safely modify head.next after taking the
>  		 * anon_vma->lock. If some other vma in this mm shares
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] extend KSM refcounts to the anon_vma root
  2010-05-12 17:41 ` [PATCH 5/5] extend KSM refcounts to the anon_vma root Rik van Riel
@ 2010-05-12 21:07   ` Mel Gorman
  2010-05-12 21:09     ` Rik van Riel
  0 siblings, 1 reply; 51+ messages in thread
From: Mel Gorman @ 2010-05-12 21:07 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Wed, May 12, 2010 at 01:41:11PM -0400, Rik van Riel wrote:
> Subject: extend KSM refcounts to the anon_vma root
> 
> KSM reference counts can cause an anon_vma to exist after the processe
> it belongs to have already exited.  Because the anon_vma lock now lives
> in the root anon_vma, we need to ensure that the root anon_vma stays
> around until after all the "child" anon_vmas have been freed.
> 
> The obvious way to do this is to have a "child" anon_vma take a
> reference to the root in anon_vma_fork.  When the anon_vma is freed
> at munmap or process exit, we drop the refcount in anon_vma_unlink
> and possibly free the root anon_vma.
> 
> The KSM anon_vma reference count function also needs to be modified
> to deal with the possibility of freeing 2 levels of anon_vma.  The
> easiest way to do this is to break out the KSM magic and make it
> generic.
> 
> When compiling without CONFIG_KSM, this code is compiled out.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  include/linux/rmap.h |   12 ++++++++++++
>  mm/ksm.c             |   17 ++++++-----------
>  mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 62 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 33ffe14..387d40c 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -126,6 +126,18 @@ int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>  void __anon_vma_link(struct vm_area_struct *);
>  void anon_vma_free(struct anon_vma *);
>  
> +#ifdef CONFIG_KSM
> +static inline void get_anon_vma(struct anon_vma *anon_vma)
> +{
> +	atomic_inc(&anon_vma->ksm_refcount);
> +}
> +
> +void drop_anon_vma(struct anon_vma *);
> +#else
> +#define get_anon_vma(x)		do {} while(0)
> +#define drop_anon_vma(x)	do {} while(0)
> +#endif
> +
>  static inline void anon_vma_merge(struct vm_area_struct *vma,
>  				  struct vm_area_struct *next)
>  {
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 7ca0dd7..9f2acc9 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_item *rmap_item,
>  			  struct anon_vma *anon_vma)
>  {
>  	rmap_item->anon_vma = anon_vma;
> -	atomic_inc(&anon_vma->ksm_refcount);
> +	get_anon_vma(anon_vma);
>  }

I'm not quite getting this. Here, we get the local anon_vma so we
increment its reference count and later we drop it but without a
refcount taken on the root anon_vma, why is it guaranteed to stay
around?

>  
> -static void drop_anon_vma(struct rmap_item *rmap_item)
> +static void ksm_drop_anon_vma(struct rmap_item *rmap_item)
>  {
>  	struct anon_vma *anon_vma = rmap_item->anon_vma;
>  
> -	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
> -		int empty = list_empty(&anon_vma->head);
> -		anon_vma_unlock(anon_vma);
> -		if (empty)
> -			anon_vma_free(anon_vma);
> -	}
> +	drop_anon_vma(anon_vma);
>  }
>  
>  /*
> @@ -415,7 +410,7 @@ static void break_cow(struct rmap_item *rmap_item)
>  	 * It is not an accident that whenever we want to break COW
>  	 * to undo, we also need to drop a reference to the anon_vma.
>  	 */
> -	drop_anon_vma(rmap_item);
> +	ksm_drop_anon_vma(rmap_item);
>  
>  	down_read(&mm->mmap_sem);
>  	if (ksm_test_exit(mm))
> @@ -470,7 +465,7 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
>  			ksm_pages_sharing--;
>  		else
>  			ksm_pages_shared--;
> -		drop_anon_vma(rmap_item);
> +		ksm_drop_anon_vma(rmap_item);
>  		rmap_item->address &= PAGE_MASK;
>  		cond_resched();
>  	}
> @@ -558,7 +553,7 @@ static void remove_rmap_item_from_tree(struct rmap_item *rmap_item)
>  		else
>  			ksm_pages_shared--;
>  
> -		drop_anon_vma(rmap_item);
> +		ksm_drop_anon_vma(rmap_item);
>  		rmap_item->address &= PAGE_MASK;
>  
>  	} else if (rmap_item->address & UNSTABLE_FLAG) {
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f0ba648..d63cd91 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -238,6 +238,12 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>  	 */
>  	root_avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma);
>  	anon_vma->root = root_avc->anon_vma;
> +	/*
> +	 * With KSM refcounts, an anon_vma can stay around longer than the
> +	 * process it belongs to.  The root anon_vma needs to be pinned,
> +	 * because that is where the lock lives.
> +	 */
> +	get_anon_vma(anon_vma->root);
>  	/* Mark this anon_vma as the one where our new (COWed) pages go. */
>  	vma->anon_vma = anon_vma;
>  	anon_vma_chain_link(vma, avc, anon_vma);
> @@ -267,8 +273,11 @@ static void anon_vma_unlink(struct anon_vma_chain *anon_vma_chain)
>  	empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma);
>  	anon_vma_unlock(anon_vma);
>  
> -	if (empty)
> +	if (empty) {
> +		/* We no longer need the root anon_vma */
> +		drop_anon_vma(anon_vma->root);
>  		anon_vma_free(anon_vma);
> +	}
>  }
>  
>  void unlink_anon_vmas(struct vm_area_struct *vma)
> @@ -1355,6 +1364,40 @@ int try_to_munlock(struct page *page)
>  		return try_to_unmap_file(page, TTU_MUNLOCK);
>  }
>  
> +#ifdef CONFIG_KSM
> +/*
> + * Drop an anon_vma refcount, freeing the anon_vma and anon_vma->root
> + * if necessary.  Be careful to do all the tests under the lock.  Once
> + * we know we are the last user, nobody else can get a reference and we
> + * can do the freeing without the lock.
> + */
> +void drop_anon_vma(struct anon_vma *anon_vma)
> +{
> +	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
> +		struct anon_vma *root = anon_vma->root;
> +		int empty list_empty(&anon_vma->head);
> +		int last_root_user = 0;
> +		int root_empty = 0;
> +
> +		/*
> +		 * The refcount on a non-root anon_vma got dropped.  Drop
> +		 * the refcount on the root and check if we need to free it.
> +		 */
> +		if (empty && anon_vma != root) {
> +			last_root_user = atomic_dec_and_test(&root->ksm_refcount);
> +			root_empty = list_empty(&root->head);
> +		}
> +		anon_vma_unlock(anon_vma);
> +
> +		if (empty) {
> +			anon_vma_free(anon_vma);
> +			if (root_empty && last_root_user)
> +				anon_vma_free(root);
> +		}
> +	}
> +}
> +#endif
> +
>  #ifdef CONFIG_MIGRATION
>  /*
>   * rmap_walk() and its helpers rmap_walk_anon() and rmap_walk_file():
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-12 21:02   ` Mel Gorman
@ 2010-05-12 21:08     ` Rik van Riel
  2010-05-13  9:54       ` Mel Gorman
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 21:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On 05/12/2010 05:02 PM, Mel Gorman wrote:

> This last comment is a bit light. It's actually restoring the lock that
> was taken in 2.6.33 to some extent except we are always taking it now.
> In 2.6.33, it was resricted to
>
>         if (vma->anon_vma&&  (insert || importer || start != vma->vm_start))
>                  anon_vma = vma->anon_vma;
>
> but now it's always. Has it been determined that the locking in 2.6.33
> was insufficient or are we playing it safe now?

Playing it safe, mostly.

Another aspect is that, if you look at the if condition above,
the number of cases where we have an anon_vma and do not take
the lock is pretty small.

Basically only the case where we expand a VMA upward or merge
VMAs in an mprotect.  I believe in pretty much all other cases
we end up needing to take the lock.

I am not entirely convinced the old code took the lock in all
of the required cases.

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] extend KSM refcounts to the anon_vma root
  2010-05-12 21:07   ` Mel Gorman
@ 2010-05-12 21:09     ` Rik van Riel
  2010-05-13 11:26       ` Mel Gorman
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 21:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On 05/12/2010 05:07 PM, Mel Gorman wrote:
> On Wed, May 12, 2010 at 01:41:11PM -0400, Rik van Riel wrote:
>> Subject: extend KSM refcounts to the anon_vma root
>>
>> KSM reference counts can cause an anon_vma to exist after the processe
>> it belongs to have already exited.  Because the anon_vma lock now lives
>> in the root anon_vma, we need to ensure that the root anon_vma stays
>> around until after all the "child" anon_vmas have been freed.
>>
>> The obvious way to do this is to have a "child" anon_vma take a
>> reference to the root in anon_vma_fork.  When the anon_vma is freed
>> at munmap or process exit, we drop the refcount in anon_vma_unlink
>> and possibly free the root anon_vma.
>>
>> The KSM anon_vma reference count function also needs to be modified
>> to deal with the possibility of freeing 2 levels of anon_vma.  The
>> easiest way to do this is to break out the KSM magic and make it
>> generic.
>>
>> When compiling without CONFIG_KSM, this code is compiled out.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.com>
>> ---
>>   include/linux/rmap.h |   12 ++++++++++++
>>   mm/ksm.c             |   17 ++++++-----------
>>   mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
>>   3 files changed, 62 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index 33ffe14..387d40c 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -126,6 +126,18 @@ int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>   void __anon_vma_link(struct vm_area_struct *);
>>   void anon_vma_free(struct anon_vma *);
>>
>> +#ifdef CONFIG_KSM
>> +static inline void get_anon_vma(struct anon_vma *anon_vma)
>> +{
>> +	atomic_inc(&anon_vma->ksm_refcount);
>> +}
>> +
>> +void drop_anon_vma(struct anon_vma *);
>> +#else
>> +#define get_anon_vma(x)		do {} while(0)
>> +#define drop_anon_vma(x)	do {} while(0)
>> +#endif
>> +
>>   static inline void anon_vma_merge(struct vm_area_struct *vma,
>>   				  struct vm_area_struct *next)
>>   {
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 7ca0dd7..9f2acc9 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_item *rmap_item,
>>   			  struct anon_vma *anon_vma)
>>   {
>>   	rmap_item->anon_vma = anon_vma;
>> -	atomic_inc(&anon_vma->ksm_refcount);
>> +	get_anon_vma(anon_vma);
>>   }
>
> I'm not quite getting this. Here, we get the local anon_vma so we
> increment its reference count and later we drop it but without a
> refcount taken on the root anon_vma, why is it guaranteed to stay
> around?

Because anon_vma_fork takes a reference count on the root anon_vma,
the VMA we take a refcount on will either have a refcount on the
root, or it is the root.

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-12 17:40 ` [PATCH 4/5] always lock " Rik van Riel
  2010-05-12 21:02   ` Mel Gorman
@ 2010-05-12 21:55   ` Linus Torvalds
  2010-05-12 22:18     ` Rik van Riel
  1 sibling, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2010-05-12 21:55 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, KAMEZAWA Hiroyuki, LKML



On Wed, 12 May 2010, Rik van Riel wrote:
> 
> Always (and only) lock the root (oldest) anon_vma whenever we do 
> something in an anon_vma.  The recently introduced anon_vma scalability 
> is due to the rmap code scanning only the VMAs that need to be scanned.  
> Many common operations still took the anon_vma lock on the root 
> anon_vma, so always taking that lock is not expected to introduce any 
> scalability issues.

Ack for this (and the whole series, for that matter - looks fine to me). 

Somebody should run the performance numbers with AIM7 or whatever, just to 
check that the lock isn't a problem, but this approach certainly gets rid 
of all my objections about crazy locking. 

That patch #5 is pretty ugly, though. And I think this part (in 
drop_anon_vma) is approaching being wrong:

+       if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {

because I do _not_ believe that you need to decrement that ksm_refcount 
under the lock, do you? It's just a refcount, isn't it?

Wouldn't it be sufficient to do

	if (atomic_dec_and_test(&anon_vma->ksm_refcount)) {
		anon_vma_lock(anon_vma);

instead? The "atomic_dec_and_lock()" semantics are _much_ stricter than a 
regular "decrement and test and then lock", and that strictness means that 
it's way more complicated and expensive. So if you don't need the 
semantics, you shouldn't use them.

But maybe we do need those "lock before decrementing to zero" semantics. 
The old ksm.c code had it too, although I suspect it's just being 
confused.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-12 21:55   ` [PATCH 4/5] always lock the root (oldest) anon_vma Linus Torvalds
@ 2010-05-12 22:18     ` Rik van Riel
  2010-05-12 22:26       ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-12 22:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, KAMEZAWA Hiroyuki, LKML

On 05/12/2010 05:55 PM, Linus Torvalds wrote:

> Wouldn't it be sufficient to do
>
> 	if (atomic_dec_and_test(&anon_vma->ksm_refcount)) {
> 		anon_vma_lock(anon_vma);
>
> instead? The "atomic_dec_and_lock()" semantics are _much_ stricter than a
> regular "decrement and test and then lock", and that strictness means that
> it's way more complicated and expensive. So if you don't need the
> semantics, you shouldn't use them.

I suspect the atomic_dec_and_lock in the KVM code is being used
to prevent the following race:

1) KSM code reduces the refcount to 0

2)                               munmap on other CPU frees the anon_vma

3) KSM code takes the anon_vma lock,
    which now lives in freed memory

Am I totally confused by this and can we use a nicer approach?

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-12 22:18     ` Rik van Riel
@ 2010-05-12 22:26       ` Linus Torvalds
  0 siblings, 0 replies; 51+ messages in thread
From: Linus Torvalds @ 2010-05-12 22:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, KAMEZAWA Hiroyuki, LKML



On Wed, 12 May 2010, Rik van Riel wrote:
> 
> I suspect the atomic_dec_and_lock in the KVM code is being used
> to prevent the following race:
> 
> 1) KSM code reduces the refcount to 0
> 
> 2)     munmap on other CPU frees the anon_vma
> 
> 3) KSM code takes the anon_vma lock,
>    which now lives in freed memory

Hmm. Well, if it were just about the lock, then that would be fine. That's 
why we do the whole anon_vma RCU freeing dance, after all.

But I guess you're right - although not because of the lock. You're right 
because it would be a double-free - both parties would decide that they 
can free the damn thing, because it's not a pure atomic refcount, it's a 
"refcount or list_empty()" thing.

If _everybody_ was using the refcount, we could just do the 
atomic_dec_and_test(). But they aren't. So yeah, I guess we do want that 
nasty dec-and-lock version.

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-12 17:39 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
  2010-05-12 20:57   ` Mel Gorman
@ 2010-05-13  0:30   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 51+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-13  0:30 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, LKML, Linus Torvalds

On Wed, 12 May 2010 13:39:31 -0400
Rik van Riel <riel@redhat.com> wrote:

> Subject: rename anon_vma_lock to vma_lock_anon_vma
> 
> Rename anon_vma_lock to vma_lock_anon_vma.  This matches the
> naming style used in page_lock_anon_vma and will come in really
> handy further down in this patch series.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function
  2010-05-12 17:41 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
  2010-05-12 20:58   ` Mel Gorman
@ 2010-05-13  0:32   ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 51+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-13  0:32 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, LKML, Linus Torvalds

On Wed, 12 May 2010 13:41:18 -0400
Rik van Riel <riel@redhat.com> wrote:

> Subject: change direct call of spin_lock(anon_vma->lock) to inline function
> 
> Subsitute a direct call of spin_lock(anon_vma->lock) with
> an inline function doing exactly the same.
> 
> This makes it easier to do the substitution to the root
> anon_vma lock in a following patch.
> 
> We will deal with the handful of special locks (nested,
> dec_and_lock, etc) separately.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-12 17:39 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
  2010-05-12 20:59   ` Mel Gorman
@ 2010-05-13  0:38   ` KAMEZAWA Hiroyuki
  2010-05-13  2:25     ` Rik van Riel
  1 sibling, 1 reply; 51+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-13  0:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, LKML, Linus Torvalds

On Wed, 12 May 2010 13:39:58 -0400
Rik van Riel <riel@redhat.com> wrote:

> Subject: track the root (oldest) anon_vma
> 
> Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
> take the lock on the root anon_vma, we cannot use the lock on higher-up
> anon_vmas to lock anything.  This makes it impossible to do an indirect
> lookup of the root anon_vma, since the data structures could go away from
> under us.
> 
> However, a direct pointer is safe because the root anon_vma is always the
> last one that gets freed on munmap or exit, by virtue of the same_vma list
> order and unlink_anon_vmas walking the list forward.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>


Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I welcome this. Thank you!

Reading 4/5, I felt I'm grad if you add a Documentation or very-precise-comment
about the new anon_vma rules and the _meaning_ of anon_vma_root_lock.
I cannot fully convice myself that I understand them all.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-13  0:38   ` KAMEZAWA Hiroyuki
@ 2010-05-13  2:25     ` Rik van Riel
  2010-05-14  0:04       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-13  2:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, LKML, Linus Torvalds

On 05/12/2010 08:38 PM, KAMEZAWA Hiroyuki wrote:
> On Wed, 12 May 2010 13:39:58 -0400
> Rik van Riel<riel@redhat.com>  wrote:
>
>> Subject: track the root (oldest) anon_vma
>>
>> Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
>> take the lock on the root anon_vma, we cannot use the lock on higher-up
>> anon_vmas to lock anything.  This makes it impossible to do an indirect
>> lookup of the root anon_vma, since the data structures could go away from
>> under us.
>>
>> However, a direct pointer is safe because the root anon_vma is always the
>> last one that gets freed on munmap or exit, by virtue of the same_vma list
>> order and unlink_anon_vmas walking the list forward.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.com>
>
>
> Acked-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> I welcome this. Thank you!
>
> Reading 4/5, I felt I'm grad if you add a Documentation or very-precise-comment
> about the new anon_vma rules and the _meaning_ of anon_vma_root_lock.
> I cannot fully convice myself that I understand them all.

Please send me a list of all the questions that come up
when you read the patches, and I'll prepare a patch 6/5
with just documentation :)

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-12 21:08     ` Rik van Riel
@ 2010-05-13  9:54       ` Mel Gorman
  2010-05-13 14:33         ` [PATCH -v2 " Rik van Riel
  0 siblings, 1 reply; 51+ messages in thread
From: Mel Gorman @ 2010-05-13  9:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

On Wed, May 12, 2010 at 05:08:11PM -0400, Rik van Riel wrote:
> On 05/12/2010 05:02 PM, Mel Gorman wrote:
>
>> This last comment is a bit light. It's actually restoring the lock that
>> was taken in 2.6.33 to some extent except we are always taking it now.
>> In 2.6.33, it was resricted to
>>
>>         if (vma->anon_vma&&  (insert || importer || start != vma->vm_start))
>>                  anon_vma = vma->anon_vma;
>>
>> but now it's always. Has it been determined that the locking in 2.6.33
>> was insufficient or are we playing it safe now?
>
> Playing it safe, mostly.
>

Sure. I did the same, got the same question from Andrea and more or less
gave the same answer :) . I asked again in case you spotted something I
didn't.

> Another aspect is that, if you look at the if condition above,
> the number of cases where we have an anon_vma and do not take
> the lock is pretty small.
>
> Basically only the case where we expand a VMA upward or merge
> VMAs in an mprotect.  I believe in pretty much all other cases
> we end up needing to take the lock.
>

Looking at the if condition, brk() would appear to be the most important
case, right? This would appear to correlate with the reasoning behind
that condition in the first place in commit
252c5f94d944487e9f50ece7942b0fbf659c5c31 where sbrk contended on the
lock heavily.

I can't convince myself 100% but it is possible we will regress on that
test case again if the same logic is not applied to the locking. I ran a
brk() microbenchmark from aim9 and the results were really bad - 48%
regression. I didn't rerun with the old logic to see the results
unfortunately and right now I'm on the road. It'll be tomorrow morning
before I get the chance.

> I am not entirely convinced the old code took the lock in all
> of the required cases.
>

I have vague worries about expand_upwards but otherwise the reasoning
seemed solid and even with the new anon_vma code, we are not doing
anything fundamentally different in this area. Maybe it's best to play
it safe now and always take the lock, but it's worth reconsidering later
particularly if this patch gets fingered in some performance-related
bisection later.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] extend KSM refcounts to the anon_vma root
  2010-05-12 21:09     ` Rik van Riel
@ 2010-05-13 11:26       ` Mel Gorman
  2010-05-13 13:11         ` Rik van Riel
  0 siblings, 1 reply; 51+ messages in thread
From: Mel Gorman @ 2010-05-13 11:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Wed, May 12, 2010 at 05:09:18PM -0400, Rik van Riel wrote:
> On 05/12/2010 05:07 PM, Mel Gorman wrote:
>> On Wed, May 12, 2010 at 01:41:11PM -0400, Rik van Riel wrote:
>>> Subject: extend KSM refcounts to the anon_vma root
>>>
>>> KSM reference counts can cause an anon_vma to exist after the processe
>>> it belongs to have already exited.  Because the anon_vma lock now lives
>>> in the root anon_vma, we need to ensure that the root anon_vma stays
>>> around until after all the "child" anon_vmas have been freed.
>>>
>>> The obvious way to do this is to have a "child" anon_vma take a
>>> reference to the root in anon_vma_fork.  When the anon_vma is freed
>>> at munmap or process exit, we drop the refcount in anon_vma_unlink
>>> and possibly free the root anon_vma.
>>>
>>> The KSM anon_vma reference count function also needs to be modified
>>> to deal with the possibility of freeing 2 levels of anon_vma.  The
>>> easiest way to do this is to break out the KSM magic and make it
>>> generic.
>>>
>>> When compiling without CONFIG_KSM, this code is compiled out.
>>>
>>> Signed-off-by: Rik van Riel<riel@redhat.com>
>>> ---
>>>   include/linux/rmap.h |   12 ++++++++++++
>>>   mm/ksm.c             |   17 ++++++-----------
>>>   mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
>>>   3 files changed, 62 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index 33ffe14..387d40c 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -126,6 +126,18 @@ int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>>   void __anon_vma_link(struct vm_area_struct *);
>>>   void anon_vma_free(struct anon_vma *);
>>>
>>> +#ifdef CONFIG_KSM
>>> +static inline void get_anon_vma(struct anon_vma *anon_vma)
>>> +{
>>> +	atomic_inc(&anon_vma->ksm_refcount);
>>> +}
>>> +
>>> +void drop_anon_vma(struct anon_vma *);
>>> +#else
>>> +#define get_anon_vma(x)		do {} while(0)
>>> +#define drop_anon_vma(x)	do {} while(0)
>>> +#endif
>>> +
>>>   static inline void anon_vma_merge(struct vm_area_struct *vma,
>>>   				  struct vm_area_struct *next)
>>>   {
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index 7ca0dd7..9f2acc9 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_item *rmap_item,
>>>   			  struct anon_vma *anon_vma)
>>>   {
>>>   	rmap_item->anon_vma = anon_vma;
>>> -	atomic_inc(&anon_vma->ksm_refcount);
>>> +	get_anon_vma(anon_vma);
>>>   }
>>
>> I'm not quite getting this. Here, we get the local anon_vma so we
>> increment its reference count and later we drop it but without a
>> refcount taken on the root anon_vma, why is it guaranteed to stay
>> around?
>
> Because anon_vma_fork takes a reference count on the root anon_vma,
> the VMA we take a refcount on will either have a refcount on the
> root, or it is the root.
>

Sorry, I'm still not getting it. anon_vma_fork keeps the refcount around
during fork but what about during exit? Lets say anon_vma_unlink is called
on the following arrangement;

root_anon_vma->refcounted_anon_vma

We walk the list but the root_anon_vma doesn't have a refcount so it
gets freed. drop_anon_vma gets called on refcounted_anon_vma which does

if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock))

but the root anon_vma is now gone. Are you depending on the lifecycle of
anon_vma's within KSM for this to work? If so, then the migration-related
fixes in mmotm that take a refcount on anon_vma during migration will also
need to take a refcount on the root.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] extend KSM refcounts to the anon_vma root
  2010-05-13 11:26       ` Mel Gorman
@ 2010-05-13 13:11         ` Rik van Riel
  2010-05-13 13:24           ` Mel Gorman
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-13 13:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On 05/13/2010 07:26 AM, Mel Gorman wrote:
> On Wed, May 12, 2010 at 05:09:18PM -0400, Rik van Riel wrote:
>> On 05/12/2010 05:07 PM, Mel Gorman wrote:
>>> On Wed, May 12, 2010 at 01:41:11PM -0400, Rik van Riel wrote:
>>>> Subject: extend KSM refcounts to the anon_vma root
>>>>
>>>> KSM reference counts can cause an anon_vma to exist after the processe
>>>> it belongs to have already exited.  Because the anon_vma lock now lives
>>>> in the root anon_vma, we need to ensure that the root anon_vma stays
>>>> around until after all the "child" anon_vmas have been freed.
>>>>
>>>> The obvious way to do this is to have a "child" anon_vma take a
>>>> reference to the root in anon_vma_fork.  When the anon_vma is freed
>>>> at munmap or process exit, we drop the refcount in anon_vma_unlink
>>>> and possibly free the root anon_vma.
>>>>
>>>> The KSM anon_vma reference count function also needs to be modified
>>>> to deal with the possibility of freeing 2 levels of anon_vma.  The
>>>> easiest way to do this is to break out the KSM magic and make it
>>>> generic.
>>>>
>>>> When compiling without CONFIG_KSM, this code is compiled out.
>>>>
>>>> Signed-off-by: Rik van Riel<riel@redhat.com>
>>>> ---
>>>>    include/linux/rmap.h |   12 ++++++++++++
>>>>    mm/ksm.c             |   17 ++++++-----------
>>>>    mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
>>>>    3 files changed, 62 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 33ffe14..387d40c 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -126,6 +126,18 @@ int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>>>    void __anon_vma_link(struct vm_area_struct *);
>>>>    void anon_vma_free(struct anon_vma *);
>>>>
>>>> +#ifdef CONFIG_KSM
>>>> +static inline void get_anon_vma(struct anon_vma *anon_vma)
>>>> +{
>>>> +	atomic_inc(&anon_vma->ksm_refcount);
>>>> +}
>>>> +
>>>> +void drop_anon_vma(struct anon_vma *);
>>>> +#else
>>>> +#define get_anon_vma(x)		do {} while(0)
>>>> +#define drop_anon_vma(x)	do {} while(0)
>>>> +#endif
>>>> +
>>>>    static inline void anon_vma_merge(struct vm_area_struct *vma,
>>>>    				  struct vm_area_struct *next)
>>>>    {
>>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>>> index 7ca0dd7..9f2acc9 100644
>>>> --- a/mm/ksm.c
>>>> +++ b/mm/ksm.c
>>>> @@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_item *rmap_item,
>>>>    			  struct anon_vma *anon_vma)
>>>>    {
>>>>    	rmap_item->anon_vma = anon_vma;
>>>> -	atomic_inc(&anon_vma->ksm_refcount);
>>>> +	get_anon_vma(anon_vma);
>>>>    }
>>>
>>> I'm not quite getting this. Here, we get the local anon_vma so we
>>> increment its reference count and later we drop it but without a
>>> refcount taken on the root anon_vma, why is it guaranteed to stay
>>> around?
>>
>> Because anon_vma_fork takes a reference count on the root anon_vma,
>> the VMA we take a refcount on will either have a refcount on the
>> root, or it is the root.
>>
>
> Sorry, I'm still not getting it. anon_vma_fork keeps the refcount around
> during fork but what about during exit?

It is kept around all the way from fork until exit.

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/5] extend KSM refcounts to the anon_vma root
  2010-05-13 13:11         ` Rik van Riel
@ 2010-05-13 13:24           ` Mel Gorman
  2010-05-13 14:34             ` [PATCH -v2 " Rik van Riel
  0 siblings, 1 reply; 51+ messages in thread
From: Mel Gorman @ 2010-05-13 13:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Thu, May 13, 2010 at 09:11:30AM -0400, Rik van Riel wrote:
> On 05/13/2010 07:26 AM, Mel Gorman wrote:
>> On Wed, May 12, 2010 at 05:09:18PM -0400, Rik van Riel wrote:
>>> On 05/12/2010 05:07 PM, Mel Gorman wrote:
>>>> On Wed, May 12, 2010 at 01:41:11PM -0400, Rik van Riel wrote:
>>>>> Subject: extend KSM refcounts to the anon_vma root
>>>>>
>>>>> KSM reference counts can cause an anon_vma to exist after the processe
>>>>> it belongs to have already exited.  Because the anon_vma lock now lives
>>>>> in the root anon_vma, we need to ensure that the root anon_vma stays
>>>>> around until after all the "child" anon_vmas have been freed.
>>>>>
>>>>> The obvious way to do this is to have a "child" anon_vma take a
>>>>> reference to the root in anon_vma_fork.  When the anon_vma is freed
>>>>> at munmap or process exit, we drop the refcount in anon_vma_unlink
>>>>> and possibly free the root anon_vma.
>>>>>
>>>>> The KSM anon_vma reference count function also needs to be modified
>>>>> to deal with the possibility of freeing 2 levels of anon_vma.  The
>>>>> easiest way to do this is to break out the KSM magic and make it
>>>>> generic.
>>>>>
>>>>> When compiling without CONFIG_KSM, this code is compiled out.
>>>>>
>>>>> Signed-off-by: Rik van Riel<riel@redhat.com>
>>>>> ---
>>>>>    include/linux/rmap.h |   12 ++++++++++++
>>>>>    mm/ksm.c             |   17 ++++++-----------
>>>>>    mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
>>>>>    3 files changed, 62 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>>> index 33ffe14..387d40c 100644
>>>>> --- a/include/linux/rmap.h
>>>>> +++ b/include/linux/rmap.h
>>>>> @@ -126,6 +126,18 @@ int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>>>>    void __anon_vma_link(struct vm_area_struct *);
>>>>>    void anon_vma_free(struct anon_vma *);
>>>>>
>>>>> +#ifdef CONFIG_KSM
>>>>> +static inline void get_anon_vma(struct anon_vma *anon_vma)
>>>>> +{
>>>>> +	atomic_inc(&anon_vma->ksm_refcount);
>>>>> +}
>>>>> +
>>>>> +void drop_anon_vma(struct anon_vma *);
>>>>> +#else
>>>>> +#define get_anon_vma(x)		do {} while(0)
>>>>> +#define drop_anon_vma(x)	do {} while(0)
>>>>> +#endif
>>>>> +
>>>>>    static inline void anon_vma_merge(struct vm_area_struct *vma,
>>>>>    				  struct vm_area_struct *next)
>>>>>    {
>>>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>>>> index 7ca0dd7..9f2acc9 100644
>>>>> --- a/mm/ksm.c
>>>>> +++ b/mm/ksm.c
>>>>> @@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_item *rmap_item,
>>>>>    			  struct anon_vma *anon_vma)
>>>>>    {
>>>>>    	rmap_item->anon_vma = anon_vma;
>>>>> -	atomic_inc(&anon_vma->ksm_refcount);
>>>>> +	get_anon_vma(anon_vma);
>>>>>    }
>>>>
>>>> I'm not quite getting this. Here, we get the local anon_vma so we
>>>> increment its reference count and later we drop it but without a
>>>> refcount taken on the root anon_vma, why is it guaranteed to stay
>>>> around?
>>>
>>> Because anon_vma_fork takes a reference count on the root anon_vma,
>>> the VMA we take a refcount on will either have a refcount on the
>>> root, or it is the root.
>>>
>>
>> Sorry, I'm still not getting it. anon_vma_fork keeps the refcount around
>> during fork but what about during exit?
>
> It is kept around all the way from fork until exit.
>

Ok, now I get it. I was thinking in terms of a reference count being for
the duration of a specific operation. In this case, the root anon_vma
has an elevated reference count for the lifetime of the anon_vma forest.
Thanks.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-13  9:54       ` Mel Gorman
@ 2010-05-13 14:33         ` Rik van Riel
  2010-05-13 21:09           ` Andrew Morton
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-13 14:33 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

> Looking at the if condition, brk() would appear to be the most important
> case, right? This would appear to correlate with the reasoning behind
> that condition in the first place in commit
> 252c5f94d944487e9f50ece7942b0fbf659c5c31 where sbrk contended on the
> lock heavily.

You are right.  Here is a new patch 4/5:
---------------------

Subject: always lock the root (oldest) anon_vma

Always (and only) lock the root (oldest) anon_vma whenever we do something in an
anon_vma.  The recently introduced anon_vma scalability is due to the rmap code
scanning only the VMAs that need to be scanned.  Many common operations still
took the anon_vma lock on the root anon_vma, so always taking that lock is not
expected to introduce any scalability issues.

However, always taking the same lock does mean we only need to take one lock,
which means rmap_walk on pages from any anon_vma in the vma is excluded from
occurring during an munmap, expand_stack or other operation that needs to
exclude rmap_walk and similar functions.

Also add the proper locking to vma_adjust.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
v2:
 - conditionally take the anon_vma lock in vma_adjust, like introduced
   in 252c5f94d944487e9f50ece7942b0fbf659c5c31  (with a proper comment)

 include/linux/rmap.h |    8 ++++----
 mm/ksm.c             |    2 +-
 mm/mmap.c            |   16 +++++++++++++++-
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 457ae1e..33ffe14 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -95,24 +95,24 @@ static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_lock(&anon_vma->lock);
+		spin_lock(&anon_vma->root->lock);
 }
 
 static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_unlock(&anon_vma->lock);
+		spin_unlock(&anon_vma->root->lock);
 }
 
 static inline void anon_vma_lock(struct anon_vma *anon_vma)
 {
-	spin_lock(&anon_vma->lock);
+	spin_lock(&anon_vma->root->lock);
 }
 
 static inline void anon_vma_unlock(struct anon_vma *anon_vma)
 {
-	spin_unlock(&anon_vma->lock);
+	spin_unlock(&anon_vma->root->lock);
 }
 
 /*
diff --git a/mm/ksm.c b/mm/ksm.c
index d488012..7ca0dd7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -325,7 +325,7 @@ static void drop_anon_vma(struct rmap_item *rmap_item)
 {
 	struct anon_vma *anon_vma = rmap_item->anon_vma;
 
-	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->lock)) {
+	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
 		int empty = list_empty(&anon_vma->head);
 		anon_vma_unlock(anon_vma);
 		if (empty)
diff --git a/mm/mmap.c b/mm/mmap.c
index f70bc65..a543359 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -506,6 +506,7 @@ int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	struct vm_area_struct *importer = NULL;
 	struct address_space *mapping = NULL;
 	struct prio_tree_root *root = NULL;
+	struct anon_vma *anon_vma = NULL;
 	struct file *file = vma->vm_file;
 	long adjust_next = 0;
 	int remove_next = 0;
@@ -553,6 +554,17 @@ again:			remove_next = 1 + (end > next->vm_end);
 		}
 	}
 
+	/*
+	 * When changing only vma->vm_end, we don't really need anon_vma
+	 * lock. This is a fairly rare case by itself, but the anon_vma
+	 * lock may be shared between many sibling processes.  Skipping
+	 * the lock for brk adjustments makes a difference sometimes.
+	 */
+	if (vma->anon_vma && (insert || importer || start != vma->vm_start)) {
+		anon_vma = vma->anon_vma;
+		anon_vma_lock(anon_vma);
+	}
+
 	if (file) {
 		mapping = file->f_mapping;
 		if (!(vma->vm_flags & VM_NONLINEAR))
@@ -619,6 +631,8 @@ again:			remove_next = 1 + (end > next->vm_end);
 
 	if (mapping)
 		spin_unlock(&mapping->i_mmap_lock);
+	if (anon_vma)
+		anon_vma_unlock(anon_vma);
 
 	if (remove_next) {
 		if (file) {
@@ -2471,7 +2485,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 		 * The LSB of head.next can't change from under us
 		 * because we hold the mm_all_locks_mutex.
 		 */
-		spin_lock_nest_lock(&anon_vma->lock, &mm->mmap_sem);
+		spin_lock_nest_lock(&anon_vma->root->lock, &mm->mmap_sem);
 		/*
 		 * We can safely modify head.next after taking the
 		 * anon_vma->lock. If some other vma in this mm shares

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH -v2 5/5] extend KSM refcounts to the anon_vma root
  2010-05-13 13:24           ` Mel Gorman
@ 2010-05-13 14:34             ` Rik van Riel
  2010-05-19  1:05               ` Andrea Arcangeli
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-13 14:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds


> Ok, now I get it. I was thinking in terms of a reference count being for
> the duration of a specific operation. In this case, the root anon_vma
> has an elevated reference count for the lifetime of the anon_vma forest.
> Thanks.

I have updated the comment in anon_vma_fork to reflect that the refcount
lasts for the lifetime of the anon_vma.
-------------------------------------

Subject: extend KSM refcounts to the anon_vma root

KSM reference counts can cause an anon_vma to exist after the processe
it belongs to have already exited.  Because the anon_vma lock now lives
in the root anon_vma, we need to ensure that the root anon_vma stays
around until after all the "child" anon_vmas have been freed.

The obvious way to do this is to have a "child" anon_vma take a
reference to the root in anon_vma_fork.  When the anon_vma is freed
at munmap or process exit, we drop the refcount in anon_vma_unlink
and possibly free the root anon_vma.

The KSM anon_vma reference count function also needs to be modified
to deal with the possibility of freeing 2 levels of anon_vma.  The
easiest way to do this is to break out the KSM magic and make it
generic.

When compiling without CONFIG_KSM, this code is compiled out.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
v2:
 - improve the anon_vma refcount comment in anon_vma_fork with the refcount lifetime

 include/linux/rmap.h |   12 ++++++++++++
 mm/ksm.c             |   17 ++++++-----------
 mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 33ffe14..387d40c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -126,6 +126,18 @@ int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
 void __anon_vma_link(struct vm_area_struct *);
 void anon_vma_free(struct anon_vma *);
 
+#ifdef CONFIG_KSM
+static inline void get_anon_vma(struct anon_vma *anon_vma)
+{
+	atomic_inc(&anon_vma->ksm_refcount);
+}
+
+void drop_anon_vma(struct anon_vma *);
+#else
+#define get_anon_vma(x)		do {} while(0)
+#define drop_anon_vma(x)	do {} while(0)
+#endif
+
 static inline void anon_vma_merge(struct vm_area_struct *vma,
 				  struct vm_area_struct *next)
 {
diff --git a/mm/ksm.c b/mm/ksm.c
index 7ca0dd7..9f2acc9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_item *rmap_item,
 			  struct anon_vma *anon_vma)
 {
 	rmap_item->anon_vma = anon_vma;
-	atomic_inc(&anon_vma->ksm_refcount);
+	get_anon_vma(anon_vma);
 }
 
-static void drop_anon_vma(struct rmap_item *rmap_item)
+static void ksm_drop_anon_vma(struct rmap_item *rmap_item)
 {
 	struct anon_vma *anon_vma = rmap_item->anon_vma;
 
-	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
-		int empty = list_empty(&anon_vma->head);
-		anon_vma_unlock(anon_vma);
-		if (empty)
-			anon_vma_free(anon_vma);
-	}
+	drop_anon_vma(anon_vma);
 }
 
 /*
@@ -415,7 +410,7 @@ static void break_cow(struct rmap_item *rmap_item)
 	 * It is not an accident that whenever we want to break COW
 	 * to undo, we also need to drop a reference to the anon_vma.
 	 */
-	drop_anon_vma(rmap_item);
+	ksm_drop_anon_vma(rmap_item);
 
 	down_read(&mm->mmap_sem);
 	if (ksm_test_exit(mm))
@@ -470,7 +465,7 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
 			ksm_pages_sharing--;
 		else
 			ksm_pages_shared--;
-		drop_anon_vma(rmap_item);
+		ksm_drop_anon_vma(rmap_item);
 		rmap_item->address &= PAGE_MASK;
 		cond_resched();
 	}
@@ -558,7 +553,7 @@ static void remove_rmap_item_from_tree(struct rmap_item *rmap_item)
 		else
 			ksm_pages_shared--;
 
-		drop_anon_vma(rmap_item);
+		ksm_drop_anon_vma(rmap_item);
 		rmap_item->address &= PAGE_MASK;
 
 	} else if (rmap_item->address & UNSTABLE_FLAG) {
diff --git a/mm/rmap.c b/mm/rmap.c
index f0ba648..af87ef0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -238,6 +238,12 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	 */
 	root_avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma);
 	anon_vma->root = root_avc->anon_vma;
+	/*
+	 * With KSM refcounts, an anon_vma can stay around longer than the
+	 * process it belongs to.  The root anon_vma needs to be pinned
+	 * until this anon_vma is freed, because that is where the lock lives.
+	 */
+	get_anon_vma(anon_vma->root);
 	/* Mark this anon_vma as the one where our new (COWed) pages go. */
 	vma->anon_vma = anon_vma;
 	anon_vma_chain_link(vma, avc, anon_vma);
@@ -267,8 +273,11 @@ static void anon_vma_unlink(struct anon_vma_chain *anon_vma_chain)
 	empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma);
 	anon_vma_unlock(anon_vma);
 
-	if (empty)
+	if (empty) {
+		/* We no longer need the root anon_vma */
+		drop_anon_vma(anon_vma->root);
 		anon_vma_free(anon_vma);
+	}
 }
 
 void unlink_anon_vmas(struct vm_area_struct *vma)
@@ -1355,6 +1364,40 @@ int try_to_munlock(struct page *page)
 		return try_to_unmap_file(page, TTU_MUNLOCK);
 }
 
+#ifdef CONFIG_KSM
+/*
+ * Drop an anon_vma refcount, freeing the anon_vma and anon_vma->root
+ * if necessary.  Be careful to do all the tests under the lock.  Once
+ * we know we are the last user, nobody else can get a reference and we
+ * can do the freeing without the lock.
+ */
+void drop_anon_vma(struct anon_vma *anon_vma)
+{
+	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
+		struct anon_vma *root = anon_vma->root;
+		int empty list_empty(&anon_vma->head);
+		int last_root_user = 0;
+		int root_empty = 0;
+
+		/*
+		 * The refcount on a non-root anon_vma got dropped.  Drop
+		 * the refcount on the root and check if we need to free it.
+		 */
+		if (empty && anon_vma != root) {
+			last_root_user = atomic_dec_and_test(&root->ksm_refcount);
+			root_empty = list_empty(&root->head);
+		}
+		anon_vma_unlock(anon_vma);
+
+		if (empty) {
+			anon_vma_free(anon_vma);
+			if (root_empty && last_root_user)
+				anon_vma_free(root);
+		}
+	}
+}
+#endif
+
 #ifdef CONFIG_MIGRATION
 /*
  * rmap_walk() and its helpers rmap_walk_anon() and rmap_walk_file():

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-13 14:33         ` [PATCH -v2 " Rik van Riel
@ 2010-05-13 21:09           ` Andrew Morton
  2010-05-13 22:50             ` Rik van Riel
  2010-05-26  4:00             ` Rik van Riel
  0 siblings, 2 replies; 51+ messages in thread
From: Andrew Morton @ 2010-05-13 21:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

On Thu, 13 May 2010 10:33:56 -0400
Rik van Riel <riel@redhat.com> wrote:

> > Looking at the if condition, brk() would appear to be the most important
> > case, right? This would appear to correlate with the reasoning behind
> > that condition in the first place in commit
> > 252c5f94d944487e9f50ece7942b0fbf659c5c31 where sbrk contended on the
> > lock heavily.
> 
> You are right.  Here is a new patch 4/5:
> ---------------------
> 
> Subject: always lock the root (oldest) anon_vma
> 
> Always (and only) lock the root (oldest) anon_vma whenever we do something in an
> anon_vma.  The recently introduced anon_vma scalability is due to the rmap code
> scanning only the VMAs that need to be scanned.  Many common operations still
> took the anon_vma lock on the root anon_vma, so always taking that lock is not
> expected to introduce any scalability issues.
> 
> However, always taking the same lock does mean we only need to take one lock,
> which means rmap_walk on pages from any anon_vma in the vma is excluded from
> occurring during an munmap, expand_stack or other operation that needs to
> exclude rmap_walk and similar functions.
> 
> Also add the proper locking to vma_adjust.
> 
> ...
>
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -325,7 +325,7 @@ static void drop_anon_vma(struct rmap_item *rmap_item)
>  {
>  	struct anon_vma *anon_vma = rmap_item->anon_vma;
>  
> -	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->lock)) {
> +	if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->root->lock)) {
>  		int empty = list_empty(&anon_vma->head);
>  		anon_vma_unlock(anon_vma);
>  		if (empty)

Well that had me confused for a while.  The anon_vma_unlock(anon_vma)
looks like it's unlocking a different lock from the one which
atomic_dec_and_lock() took.  But I worked it out!  I guess one could
add an anon_vma_atomic_dec_and_lock() to make things nice and
symmetrical, but there seems little point.  A comment would suffice.

It wouldn't hurt to add some nice descriptions to these functions, IMO.


General comment on these patches: I had to fix quite a lot of rejects
and some instances of spin_lock(vma->lock) were missed.  It would have
been a good idea to rename anon_vma.lock to something else early in the
patch series so that unconverted code fails to compile, rather than
causing mysterious bugs.  And if the requirement is that all code
should use the helper functions, the lock should be renamed to
double-underscore-something, with a suitable comment telling people not
to use it directly.


I'm still not very confident that I got them all.

<greps or a while>

What's this, in 

mm/migrate.c:unmap_and_move()?

	/* Drop an anon_vma reference if we took one */
	if (anon_vma && atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->lock)) {
		int empty = list_empty(&anon_vma->head);
		spin_unlock(&anon_vma->lock);
		if (empty)
			anon_vma_free(anon_vma);
	}

it looks awfully similar to drop_anon_vma().


I'm not very confident in merging all these onto the current MM pile.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-13 21:09           ` Andrew Morton
@ 2010-05-13 22:50             ` Rik van Riel
  2010-05-14  9:33               ` Mel Gorman
  2010-05-26  4:00             ` Rik van Riel
  1 sibling, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-13 22:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

On 05/13/2010 05:09 PM, Andrew Morton wrote:

> I'm not very confident in merging all these onto the current MM pile.

My apologies, I built them onto Linus's latest git
tree (where I know I did get all the anon_vma->lock
instances).

Andrew, Mel, want me to make a version of this series
against -mmotm, or does the migrate & compaction code
need to be modified in some non-obvious way that would
require Mel to create a new compaction series on top
of these anon_vma patches?

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-13  2:25     ` Rik van Riel
@ 2010-05-14  0:04       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 51+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-14  0:04 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, LKML, Linus Torvalds

On Wed, 12 May 2010 22:25:15 -0400
Rik van Riel <riel@redhat.com> wrote:

> On 05/12/2010 08:38 PM, KAMEZAWA Hiroyuki wrote:
> > On Wed, 12 May 2010 13:39:58 -0400
> > Rik van Riel<riel@redhat.com>  wrote:
> >
> >> Subject: track the root (oldest) anon_vma
> >>
> >> Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
> >> take the lock on the root anon_vma, we cannot use the lock on higher-up
> >> anon_vmas to lock anything.  This makes it impossible to do an indirect
> >> lookup of the root anon_vma, since the data structures could go away from
> >> under us.
> >>
> >> However, a direct pointer is safe because the root anon_vma is always the
> >> last one that gets freed on munmap or exit, by virtue of the same_vma list
> >> order and unlink_anon_vmas walking the list forward.
> >>
> >> Signed-off-by: Rik van Riel<riel@redhat.com>
> >
> >
> > Acked-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> >
> > I welcome this. Thank you!
> >
> > Reading 4/5, I felt I'm grad if you add a Documentation or very-precise-comment
> > about the new anon_vma rules and the _meaning_ of anon_vma_root_lock.
> > I cannot fully convice myself that I understand them all.
> 
> Please send me a list of all the questions that come up
> when you read the patches, and I'll prepare a patch 6/5
> with just documentation :)
> 

0. Why it's dangerous to take vma->anon_vma->lock ?

1. What kinds of anon_vmas we'll found in
     page->mapping => anon_vma->head and avc->same_anon_vma ?
   IOW, what kinds of avc->vmas will see when we walk anon_vma->head.

2. Why we have to walk from the root ?

3. What anon_vma_lock guards, actually ?


etc....the facts which is unclear for guys who are not involved in this fix.
Preparing some explanation seems to be kindly rather than "plz ask google"

Bye.
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-13 22:50             ` Rik van Riel
@ 2010-05-14  9:33               ` Mel Gorman
  0 siblings, 0 replies; 51+ messages in thread
From: Mel Gorman @ 2010-05-14  9:33 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

On Thu, May 13, 2010 at 06:50:29PM -0400, Rik van Riel wrote:
> On 05/13/2010 05:09 PM, Andrew Morton wrote:
>
>> I'm not very confident in merging all these onto the current MM pile.
>
> My apologies, I built them onto Linus's latest git
> tree (where I know I did get all the anon_vma->lock
> instances).
>

And the instance Andrew ran into was specific to a migration fix in
mmotm where it was possible for anon_vma to disappear during migration.

> Andrew, Mel, want me to make a version of this series
> against -mmotm, or does the migrate & compaction code
> need to be modified in some non-obvious way that would
> require Mel to create a new compaction series on top
> of these anon_vma patches?
>

I'd like to see a version on top of mmotm at least. It isn't clear to me what
order these anon_vma changes were going in. Compaction should not be
affected by this series but the fixes to migration are. I'd expect the
main collision points to be with these patches.

mmmigration-take-a-reference-to-the-anon_vma-before-migrating.patch
mmmigration-share-the-anon_vma-ref-counts-between-ksm-and-page-migration.patch

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 5/5] extend KSM refcounts to the anon_vma root
  2010-05-13 14:34             ` [PATCH -v2 " Rik van Riel
@ 2010-05-19  1:05               ` Andrea Arcangeli
  0 siblings, 0 replies; 51+ messages in thread
From: Andrea Arcangeli @ 2010-05-19  1:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, Andrew Morton, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On Thu, May 13, 2010 at 10:34:46AM -0400, Rik van Riel wrote:
> +		int empty list_empty(&anon_vma->head);

Adding "=" and hope it all works.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 6/5] adjust mm_take_all_locks to anon-vma-root locking
  2010-05-12 17:38 [PATCH 0/5] always lock the root anon_vma Rik van Riel
                   ` (4 preceding siblings ...)
  2010-05-12 17:41 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
@ 2010-05-20 22:42 ` Andrea Arcangeli
  2010-05-20 23:07   ` Rik van Riel
  5 siblings, 1 reply; 51+ messages in thread
From: Andrea Arcangeli @ 2010-05-20 22:42 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

This is needed as 6/5 to avoid lockups in mm_take_all_locks.

======
Subject: adjust mm_take_all_locks to the root_anon_vma locking

From: Andrea Arcangeli <aarcange@redhat.com>

Track the anon_vma->root->lock.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2480,7 +2480,7 @@ static DEFINE_MUTEX(mm_all_locks_mutex);
 
 static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 {
-	if (!test_bit(0, (unsigned long *) &anon_vma->head.next)) {
+	if (!test_bit(0, (unsigned long *) &anon_vma->root->head.next)) {
 		/*
 		 * The LSB of head.next can't change from under us
 		 * because we hold the mm_all_locks_mutex.
@@ -2488,15 +2488,15 @@ static void vm_lock_anon_vma(struct mm_s
 		spin_lock_nest_lock(&anon_vma->root->lock, &mm->mmap_sem);
 		/*
 		 * We can safely modify head.next after taking the
-		 * anon_vma->lock. If some other vma in this mm shares
+		 * anon_vma->root->lock. If some other vma in this mm shares
 		 * the same anon_vma we won't take it again.
 		 *
 		 * No need of atomic instructions here, head.next
 		 * can't change from under us thanks to the
-		 * anon_vma->lock.
+		 * anon_vma->root->lock.
 		 */
 		if (__test_and_set_bit(0, (unsigned long *)
-				       &anon_vma->head.next))
+				       &anon_vma->root->head.next))
 			BUG();
 	}
 }
@@ -2537,12 +2537,12 @@ static void vm_lock_mapping(struct mm_st
  * A single task can't take more than one mm_take_all_locks() in a row
  * or it would deadlock.
  *
- * The LSB in anon_vma->head.next and the AS_MM_ALL_LOCKS bitflag in
+ * The LSB in anon_vma->root->head.next and the AS_MM_ALL_LOCKS bitflag in
  * mapping->flags avoid to take the same lock twice, if more than one
  * vma in this mm is backed by the same anon_vma or address_space.
  *
  * We can take all the locks in random order because the VM code
- * taking i_mmap_lock or anon_vma->lock outside the mmap_sem never
+ * taking i_mmap_lock or anon_vma->root->lock outside the mmap_sem never
  * takes more than one of them in a row. Secondly we're protected
  * against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
  *
@@ -2587,21 +2587,21 @@ out_unlock:
 
 static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
 {
-	if (test_bit(0, (unsigned long *) &anon_vma->head.next)) {
+	if (test_bit(0, (unsigned long *) &anon_vma->root->head.next)) {
 		/*
 		 * The LSB of head.next can't change to 0 from under
 		 * us because we hold the mm_all_locks_mutex.
 		 *
 		 * We must however clear the bitflag before unlocking
-		 * the vma so the users using the anon_vma->head will
+		 * the vma so the users using the anon_vma->root->head will
 		 * never see our bitflag.
 		 *
 		 * No need of atomic instructions here, head.next
 		 * can't change from under us until we release the
-		 * anon_vma->lock.
+		 * anon_vma->root->lock.
 		 */
 		if (!__test_and_clear_bit(0, (unsigned long *)
-					  &anon_vma->head.next))
+					  &anon_vma->root->head.next))
 			BUG();
 		anon_vma_unlock(anon_vma);
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 6/5] adjust mm_take_all_locks to anon-vma-root locking
  2010-05-20 22:42 ` [PATCH 6/5] adjust mm_take_all_locks to anon-vma-root locking Andrea Arcangeli
@ 2010-05-20 23:07   ` Rik van Riel
  0 siblings, 0 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-20 23:07 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Mel Gorman, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds

On 05/20/2010 06:42 PM, Andrea Arcangeli wrote:
> This is needed as 6/5 to avoid lockups in mm_take_all_locks.
>
> ======
> Subject: adjust mm_take_all_locks to the root_anon_vma locking
>
> From: Andrea Arcangeli<aarcange@redhat.com>
>
> Track the anon_vma->root->lock.
>
> Signed-off-by: Andrea Arcangeli<aarcange@redhat.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-13 21:09           ` Andrew Morton
  2010-05-13 22:50             ` Rik van Riel
@ 2010-05-26  4:00             ` Rik van Riel
  2010-05-26  4:15               ` Andrew Morton
  2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
  1 sibling, 2 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26  4:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

On 05/13/2010 05:09 PM, Andrew Morton wrote:

> I'm not very confident in merging all these onto the current MM pile.

Blah.  I thought I just did that (and wondered why it was
so easy), and then I saw that the MMOTM git tree is old
and does not have the COMPACTION code :(

On to doing this thing again :/

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-26  4:00             ` Rik van Riel
@ 2010-05-26  4:15               ` Andrew Morton
  2010-05-26  5:46                 ` james toy
  2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
  1 sibling, 1 reply; 51+ messages in thread
From: Andrew Morton @ 2010-05-26  4:15 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn,
	james toy, james toy, James Toy

On Wed, 26 May 2010 00:00:15 -0400 Rik van Riel <riel@redhat.com> wrote:

> On 05/13/2010 05:09 PM, Andrew Morton wrote:
> 
> > I'm not very confident in merging all these onto the current MM pile.
> 
> Blah.  I thought I just did that (and wondered why it was
> so easy), and then I saw that the MMOTM git tree is old
> and does not have the COMPACTION code :(
> 

Oh.  James's mmotm->git bot might have broken.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-26  4:15               ` Andrew Morton
@ 2010-05-26  5:46                 ` james toy
  2010-06-01  0:57                   ` james toy
  0 siblings, 1 reply; 51+ messages in thread
From: james toy @ 2010-05-26  5:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn,
	james toy, james toy, James Toy

I'll get after this asap; sorry.  I'm finishing my last trimester of
my B.S.  I'll send a message when it's back up with the offending patch.

=jt

On May 26, 2010, at 0:15, Andrew Morton <akpm@linux-foundation.org>
wrote:

> On Wed, 26 May 2010 00:00:15 -0400 Rik van Riel <riel@redhat.com>
> wrote:
>
>> On 05/13/2010 05:09 PM, Andrew Morton wrote:
>>
>>> I'm not very confident in merging all these onto the current MM
>>> pile.
>>
>> Blah.  I thought I just did that (and wondered why it was
>> so easy), and then I saw that the MMOTM git tree is old
>> and does not have the COMPACTION code :(
>>
>
> Oh.  James's mmotm->git bot might have broken.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH -v2 0/5] always lock the root anon_vma
  2010-05-26  4:00             ` Rik van Riel
  2010-05-26  4:15               ` Andrew Morton
@ 2010-05-26 15:24               ` Rik van Riel
  2010-05-26 15:25                 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
                                   ` (4 more replies)
  1 sibling, 5 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 15:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

On Wed, 26 May 2010 00:00:15 -0400
Rik van Riel <riel@redhat.com> wrote:
> On 05/13/2010 05:09 PM, Andrew Morton wrote:
> 
> > I'm not very confident in merging all these onto the current MM pile.
> 
> Blah.  I thought I just did that (and wondered why it was
> so easy), and then I saw that the MMOTM git tree is old
> and does not have the COMPACTION code :(
> 
> On to doing this thing again :/

Andrew, here are the patches to always lock the root anon_vma,
ported to the latest -mm tree.

These patches implement Linus's idea of always locking the root
anon_vma and contain all the fixes and improvements suggested 
by Andrea.

This should fix the last bits of the anon_vma locking.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
@ 2010-05-26 15:25                 ` Rik van Riel
  2010-05-26 17:25                   ` Linus Torvalds
  2010-05-26 15:25                 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
                                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 15:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

Subject: rename anon_vma_lock to vma_lock_anon_vma

Rename anon_vma_lock to vma_lock_anon_vma.  This matches the
naming style used in page_lock_anon_vma and will come in really
handy further down in this patch series.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |    4 ++--
 mm/mmap.c            |   14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6.34/include/linux/rmap.h
===================================================================
--- linux-2.6.34.orig/include/linux/rmap.h
+++ linux-2.6.34/include/linux/rmap.h
@@ -99,14 +99,14 @@ static inline struct anon_vma *page_anon
 	return page_rmapping(page);
 }
 
-static inline void anon_vma_lock(struct vm_area_struct *vma)
+static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
 		spin_lock(&anon_vma->lock);
 }
 
-static inline void anon_vma_unlock(struct vm_area_struct *vma)
+static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
Index: linux-2.6.34/mm/mmap.c
===================================================================
--- linux-2.6.34.orig/mm/mmap.c
+++ linux-2.6.34/mm/mmap.c
@@ -452,12 +452,12 @@ static void vma_link(struct mm_struct *m
 		spin_lock(&mapping->i_mmap_lock);
 		vma->vm_truncate_count = mapping->truncate_count;
 	}
-	anon_vma_lock(vma);
+	vma_lock_anon_vma(vma);
 
 	__vma_link(mm, vma, prev, rb_link, rb_parent);
 	__vma_link_file(vma);
 
-	anon_vma_unlock(vma);
+	vma_unlock_anon_vma(vma);
 	if (mapping)
 		spin_unlock(&mapping->i_mmap_lock);
 
@@ -1710,7 +1710,7 @@ int expand_upwards(struct vm_area_struct
 	 */
 	if (unlikely(anon_vma_prepare(vma)))
 		return -ENOMEM;
-	anon_vma_lock(vma);
+	vma_lock_anon_vma(vma);
 
 	/*
 	 * vma->vm_start/vm_end cannot change under us because the caller
@@ -1721,7 +1721,7 @@ int expand_upwards(struct vm_area_struct
 	if (address < PAGE_ALIGN(address+4))
 		address = PAGE_ALIGN(address+4);
 	else {
-		anon_vma_unlock(vma);
+		vma_unlock_anon_vma(vma);
 		return -ENOMEM;
 	}
 	error = 0;
@@ -1737,7 +1737,7 @@ int expand_upwards(struct vm_area_struct
 		if (!error)
 			vma->vm_end = address;
 	}
-	anon_vma_unlock(vma);
+	vma_unlock_anon_vma(vma);
 	return error;
 }
 #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
@@ -1762,7 +1762,7 @@ static int expand_downwards(struct vm_ar
 	if (error)
 		return error;
 
-	anon_vma_lock(vma);
+	vma_lock_anon_vma(vma);
 
 	/*
 	 * vma->vm_start/vm_end cannot change under us because the caller
@@ -1783,7 +1783,7 @@ static int expand_downwards(struct vm_ar
 			vma->vm_pgoff -= grow;
 		}
 	}
-	anon_vma_unlock(vma);
+	vma_unlock_anon_vma(vma);
 	return error;
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function
  2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
  2010-05-26 15:25                 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
@ 2010-05-26 15:25                 ` Rik van Riel
  2010-05-26 15:26                 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 15:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

Subject: change direct call of spin_lock(anon_vma->lock) to inline function

Subsitute a direct call of spin_lock(anon_vma->lock) with
an inline function doing exactly the same.

This makes it easier to do the substitution to the root
anon_vma lock in a following patch.

We will deal with the handful of special locks (nested,
dec_and_lock, etc) separately.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |   10 ++++++++++
 mm/ksm.c             |   18 +++++++++---------
 mm/migrate.c         |    2 +-
 mm/mmap.c            |    2 +-
 mm/rmap.c            |   22 +++++++++++-----------
 5 files changed, 32 insertions(+), 22 deletions(-)

Index: linux-2.6.34/include/linux/rmap.h
===================================================================
--- linux-2.6.34.orig/include/linux/rmap.h
+++ linux-2.6.34/include/linux/rmap.h
@@ -113,6 +113,16 @@ static inline void vma_unlock_anon_vma(s
 		spin_unlock(&anon_vma->lock);
 }
 
+static inline void anon_vma_lock(struct anon_vma *anon_vma)
+{
+	spin_lock(&anon_vma->lock);
+}
+
+static inline void anon_vma_unlock(struct anon_vma *anon_vma)
+{
+	spin_unlock(&anon_vma->lock);
+}
+
 /*
  * anon_vma helper functions.
  */
Index: linux-2.6.34/mm/ksm.c
===================================================================
--- linux-2.6.34.orig/mm/ksm.c
+++ linux-2.6.34/mm/ksm.c
@@ -327,7 +327,7 @@ static void drop_anon_vma(struct rmap_it
 
 	if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->lock)) {
 		int empty = list_empty(&anon_vma->head);
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 		if (empty)
 			anon_vma_free(anon_vma);
 	}
@@ -1566,7 +1566,7 @@ again:
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -1589,7 +1589,7 @@ again:
 			if (!search_new_forks || !mapcount)
 				break;
 		}
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 		if (!mapcount)
 			goto out;
 	}
@@ -1619,7 +1619,7 @@ again:
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -1637,11 +1637,11 @@ again:
 			ret = try_to_unmap_one(page, vma,
 					rmap_item->address, flags);
 			if (ret != SWAP_AGAIN || !page_mapped(page)) {
-				spin_unlock(&anon_vma->lock);
+				anon_vma_unlock(anon_vma);
 				goto out;
 			}
 		}
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 	}
 	if (!search_new_forks++)
 		goto again;
@@ -1671,7 +1671,7 @@ again:
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		list_for_each_entry(vmac, &anon_vma->head, same_anon_vma) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -1688,11 +1688,11 @@ again:
 
 			ret = rmap_one(page, vma, rmap_item->address, arg);
 			if (ret != SWAP_AGAIN) {
-				spin_unlock(&anon_vma->lock);
+				anon_vma_unlock(anon_vma);
 				goto out;
 			}
 		}
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 	}
 	if (!search_new_forks++)
 		goto again;
Index: linux-2.6.34/mm/mmap.c
===================================================================
--- linux-2.6.34.orig/mm/mmap.c
+++ linux-2.6.34/mm/mmap.c
@@ -2589,7 +2589,7 @@ static void vm_unlock_anon_vma(struct an
 		if (!__test_and_clear_bit(0, (unsigned long *)
 					  &anon_vma->head.next))
 			BUG();
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 	}
 }
 
Index: linux-2.6.34/mm/rmap.c
===================================================================
--- linux-2.6.34.orig/mm/rmap.c
+++ linux-2.6.34/mm/rmap.c
@@ -134,7 +134,7 @@ int anon_vma_prepare(struct vm_area_stru
 			allocated = anon_vma;
 		}
 
-		spin_lock(&anon_vma->lock);
+		anon_vma_lock(anon_vma);
 		/* page_table_lock to protect against threads */
 		spin_lock(&mm->page_table_lock);
 		if (likely(!vma->anon_vma)) {
@@ -147,7 +147,7 @@ int anon_vma_prepare(struct vm_area_stru
 			avc = NULL;
 		}
 		spin_unlock(&mm->page_table_lock);
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 
 		if (unlikely(allocated))
 			anon_vma_free(allocated);
@@ -170,9 +170,9 @@ static void anon_vma_chain_link(struct v
 	avc->anon_vma = anon_vma;
 	list_add(&avc->same_vma, &vma->anon_vma_chain);
 
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 	list_add_tail(&avc->same_anon_vma, &anon_vma->head);
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 }
 
 /*
@@ -246,12 +246,12 @@ static void anon_vma_unlink(struct anon_
 	if (!anon_vma)
 		return;
 
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 	list_del(&anon_vma_chain->same_anon_vma);
 
 	/* We must garbage collect the anon_vma if it's empty */
 	empty = list_empty(&anon_vma->head) && !anonvma_external_refcount(anon_vma);
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 
 	if (empty)
 		anon_vma_free(anon_vma);
@@ -303,10 +303,10 @@ again:
 		goto out;
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 
 	if (page_rmapping(page) != anon_vma) {
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 		goto again;
 	}
 
@@ -318,7 +318,7 @@ out:
 
 void page_unlock_anon_vma(struct anon_vma *anon_vma)
 {
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 	rcu_read_unlock();
 }
 
@@ -1396,7 +1396,7 @@ static int rmap_walk_anon(struct page *p
 	anon_vma = page_anon_vma(page);
 	if (!anon_vma)
 		return ret;
-	spin_lock(&anon_vma->lock);
+	anon_vma_lock(anon_vma);
 	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);
@@ -1406,7 +1406,7 @@ static int rmap_walk_anon(struct page *p
 		if (ret != SWAP_AGAIN)
 			break;
 	}
-	spin_unlock(&anon_vma->lock);
+	anon_vma_unlock(anon_vma);
 	return ret;
 }
 
Index: linux-2.6.34/mm/migrate.c
===================================================================
--- linux-2.6.34.orig/mm/migrate.c
+++ linux-2.6.34/mm/migrate.c
@@ -684,7 +684,7 @@ rcu_unlock:
 	/* Drop an anon_vma reference if we took one */
 	if (anon_vma && atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->lock)) {
 		int empty = list_empty(&anon_vma->head);
-		spin_unlock(&anon_vma->lock);
+		anon_vma_unlock(anon_vma);
 		if (empty)
 			anon_vma_free(anon_vma);
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
  2010-05-26 15:25                 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
  2010-05-26 15:25                 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
@ 2010-05-26 15:26                 ` Rik van Riel
  2010-05-26 15:27                 ` [PATCH 4/5] always lock " Rik van Riel
  2010-05-26 15:27                 ` [PATCH 5/5] extend KSM refcounts to the anon_vma root Rik van Riel
  4 siblings, 0 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 15:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

Subject: track the root (oldest) anon_vma

Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
take the lock on the root anon_vma, we cannot use the lock on higher-up
anon_vmas to lock anything.  This makes it impossible to do an indirect
lookup of the root anon_vma, since the data structures could go away from
under us.

However, a direct pointer is safe because the root anon_vma is always the
last one that gets freed on munmap or exit, by virtue of the same_vma list
order and unlink_anon_vmas walking the list forward.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/rmap.h |    1 +
 mm/rmap.c            |   18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6.34/include/linux/rmap.h
===================================================================
--- linux-2.6.34.orig/include/linux/rmap.h
+++ linux-2.6.34/include/linux/rmap.h
@@ -26,6 +26,7 @@
  */
 struct anon_vma {
 	spinlock_t lock;	/* Serialize access to vma list */
+	struct anon_vma *root;	/* Root of this anon_vma tree */
 #if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
 
 	/*
Index: linux-2.6.34/mm/rmap.c
===================================================================
--- linux-2.6.34.orig/mm/rmap.c
+++ linux-2.6.34/mm/rmap.c
@@ -132,6 +132,11 @@ int anon_vma_prepare(struct vm_area_stru
 			if (unlikely(!anon_vma))
 				goto out_enomem_free_avc;
 			allocated = anon_vma;
+			/*
+			 * This VMA had no anon_vma yet.  This anon_vma is
+			 * the root of any anon_vma tree that might form.
+			 */
+			anon_vma->root = anon_vma;
 		}
 
 		anon_vma_lock(anon_vma);
@@ -224,9 +229,15 @@ int anon_vma_fork(struct vm_area_struct 
 	avc = anon_vma_chain_alloc();
 	if (!avc)
 		goto out_error_free_anon_vma;
-	anon_vma_chain_link(vma, avc, anon_vma);
+
+	/*
+	 * The root anon_vm's spinlock is the lock actually used when we
+	 * lock any of the anon_vmas in this anon_vma tree.
+	 */
+	anon_vma->root = pvma->anon_vma->root;
 	/* Mark this anon_vma as the one where our new (COWed) pages go. */
 	vma->anon_vma = anon_vma;
+	anon_vma_chain_link(vma, avc, anon_vma);
 
 	return 0;
 
@@ -261,7 +272,10 @@ void unlink_anon_vmas(struct vm_area_str
 {
 	struct anon_vma_chain *avc, *next;
 
-	/* Unlink each anon_vma chained to the VMA. */
+	/*
+	 * Unlink each anon_vma chained to the VMA.  This list is ordered
+	 * from newest to oldest, ensuring the root anon_vma gets freed last.
+	 */
 	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
 		anon_vma_unlink(avc);
 		list_del(&avc->same_vma);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 4/5] always lock the root (oldest) anon_vma
  2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
                                   ` (2 preceding siblings ...)
  2010-05-26 15:26                 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
@ 2010-05-26 15:27                 ` Rik van Riel
  2010-05-26 15:27                 ` [PATCH 5/5] extend KSM refcounts to the anon_vma root Rik van Riel
  4 siblings, 0 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 15:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

Subject: always lock the root (oldest) anon_vma

Always (and only) lock the root (oldest) anon_vma whenever we do something in an
anon_vma.  The recently introduced anon_vma scalability is due to the rmap code
scanning only the VMAs that need to be scanned.  Many common operations still
took the anon_vma lock on the root anon_vma, so always taking that lock is not
expected to introduce any scalability issues.

However, always taking the same lock does mean we only need to take one lock,
which means rmap_walk on pages from any anon_vma in the vma is excluded from
occurring during an munmap, expand_stack or other operation that needs to
exclude rmap_walk and similar functions.

Also add the proper locking to vma_adjust.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
v3:
 - fix locking inversion in vma_adjust, spotted by Andrea
 - mm_take_all locks needs to use the bitflag in the root anon_vma,
   since that is the one that gets locked (Andrea Arcangeli)
v2:
 - conditionally take the anon_vma lock in vma_adjust, like introduced
   in 252c5f94d944487e9f50ece7942b0fbf659c5c31  (with a proper comment)

 include/linux/rmap.h |    8 ++++----
 mm/ksm.c             |    2 +-
 mm/migrate.c         |    2 +-
 mm/mmap.c            |   30 ++++++++++++++++++++++--------
 4 files changed, 28 insertions(+), 14 deletions(-)

Index: linux-2.6.34/include/linux/rmap.h
===================================================================
--- linux-2.6.34.orig/include/linux/rmap.h
+++ linux-2.6.34/include/linux/rmap.h
@@ -104,24 +104,24 @@ static inline void vma_lock_anon_vma(str
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_lock(&anon_vma->lock);
+		spin_lock(&anon_vma->root->lock);
 }
 
 static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 	if (anon_vma)
-		spin_unlock(&anon_vma->lock);
+		spin_unlock(&anon_vma->root->lock);
 }
 
 static inline void anon_vma_lock(struct anon_vma *anon_vma)
 {
-	spin_lock(&anon_vma->lock);
+	spin_lock(&anon_vma->root->lock);
 }
 
 static inline void anon_vma_unlock(struct anon_vma *anon_vma)
 {
-	spin_unlock(&anon_vma->lock);
+	spin_unlock(&anon_vma->root->lock);
 }
 
 /*
Index: linux-2.6.34/mm/ksm.c
===================================================================
--- linux-2.6.34.orig/mm/ksm.c
+++ linux-2.6.34/mm/ksm.c
@@ -325,7 +325,7 @@ static void drop_anon_vma(struct rmap_it
 {
 	struct anon_vma *anon_vma = rmap_item->anon_vma;
 
-	if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->lock)) {
+	if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
 		int empty = list_empty(&anon_vma->head);
 		anon_vma_unlock(anon_vma);
 		if (empty)
Index: linux-2.6.34/mm/mmap.c
===================================================================
--- linux-2.6.34.orig/mm/mmap.c
+++ linux-2.6.34/mm/mmap.c
@@ -506,6 +506,7 @@ int vma_adjust(struct vm_area_struct *vm
 	struct vm_area_struct *importer = NULL;
 	struct address_space *mapping = NULL;
 	struct prio_tree_root *root = NULL;
+	struct anon_vma *anon_vma = NULL;
 	struct file *file = vma->vm_file;
 	long adjust_next = 0;
 	int remove_next = 0;
@@ -578,6 +579,17 @@ again:			remove_next = 1 + (end > next->
 		}
 	}
 
+	/*
+	 * When changing only vma->vm_end, we don't really need anon_vma
+	 * lock. This is a fairly rare case by itself, but the anon_vma
+	 * lock may be shared between many sibling processes.  Skipping
+	 * the lock for brk adjustments makes a difference sometimes.
+	 */
+	if (vma->anon_vma && (insert || importer || start != vma->vm_start)) {
+		anon_vma = vma->anon_vma;
+		anon_vma_lock(anon_vma);
+	}
+
 	if (root) {
 		flush_dcache_mmap_lock(mapping);
 		vma_prio_tree_remove(vma, root);
@@ -617,6 +629,8 @@ again:			remove_next = 1 + (end > next->
 		__insert_vm_struct(mm, insert);
 	}
 
+	if (anon_vma)
+		anon_vma_unlock(anon_vma);
 	if (mapping)
 		spin_unlock(&mapping->i_mmap_lock);
 
@@ -2466,23 +2480,23 @@ static DEFINE_MUTEX(mm_all_locks_mutex);
 
 static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 {
-	if (!test_bit(0, (unsigned long *) &anon_vma->head.next)) {
+	if (!test_bit(0, (unsigned long *) &anon_vma->root->head.next)) {
 		/*
 		 * The LSB of head.next can't change from under us
 		 * because we hold the mm_all_locks_mutex.
 		 */
-		spin_lock_nest_lock(&anon_vma->lock, &mm->mmap_sem);
+		spin_lock_nest_lock(&anon_vma->root->lock, &mm->mmap_sem);
 		/*
 		 * We can safely modify head.next after taking the
-		 * anon_vma->lock. If some other vma in this mm shares
+		 * anon_vma->root->lock. If some other vma in this mm shares
 		 * the same anon_vma we won't take it again.
 		 *
 		 * No need of atomic instructions here, head.next
 		 * can't change from under us thanks to the
-		 * anon_vma->lock.
+		 * anon_vma->root->lock.
 		 */
 		if (__test_and_set_bit(0, (unsigned long *)
-				       &anon_vma->head.next))
+				       &anon_vma->root->head.next))
 			BUG();
 	}
 }
@@ -2573,7 +2587,7 @@ out_unlock:
 
 static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
 {
-	if (test_bit(0, (unsigned long *) &anon_vma->head.next)) {
+	if (test_bit(0, (unsigned long *) &anon_vma->root->head.next)) {
 		/*
 		 * The LSB of head.next can't change to 0 from under
 		 * us because we hold the mm_all_locks_mutex.
@@ -2584,10 +2598,10 @@ static void vm_unlock_anon_vma(struct an
 		 *
 		 * No need of atomic instructions here, head.next
 		 * can't change from under us until we release the
-		 * anon_vma->lock.
+		 * anon_vma->root->lock.
 		 */
 		if (!__test_and_clear_bit(0, (unsigned long *)
-					  &anon_vma->head.next))
+					  &anon_vma->root->head.next))
 			BUG();
 		anon_vma_unlock(anon_vma);
 	}
Index: linux-2.6.34/mm/migrate.c
===================================================================
--- linux-2.6.34.orig/mm/migrate.c
+++ linux-2.6.34/mm/migrate.c
@@ -682,7 +682,7 @@ skip_unmap:
 rcu_unlock:
 
 	/* Drop an anon_vma reference if we took one */
-	if (anon_vma && atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->lock)) {
+	if (anon_vma && atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
 		int empty = list_empty(&anon_vma->head);
 		anon_vma_unlock(anon_vma);
 		if (empty)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 5/5] extend KSM refcounts to the anon_vma root
  2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
                                   ` (3 preceding siblings ...)
  2010-05-26 15:27                 ` [PATCH 4/5] always lock " Rik van Riel
@ 2010-05-26 15:27                 ` Rik van Riel
  4 siblings, 0 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 15:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, Minchan Kim, Linux-MM,
	KAMEZAWA Hiroyuki, LKML, Linus Torvalds, Lee Schermerhorn

Subject: extend KSM refcounts to the anon_vma root

KSM reference counts can cause an anon_vma to exist after the processe
it belongs to have already exited.  Because the anon_vma lock now lives
in the root anon_vma, we need to ensure that the root anon_vma stays
around until after all the "child" anon_vmas have been freed.

The obvious way to do this is to have a "child" anon_vma take a
reference to the root in anon_vma_fork.  When the anon_vma is freed
at munmap or process exit, we drop the refcount in anon_vma_unlink
and possibly free the root anon_vma.

The KSM anon_vma reference count function also needs to be modified
to deal with the possibility of freeing 2 levels of anon_vma.  The
easiest way to do this is to break out the KSM magic and make it
generic.

When compiling without CONFIG_KSM, this code is compiled out.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
v2:
 - merge with -mm and the compaction code
 - improve the anon_vma refcount comment in anon_vma_fork with the
   refcount lifetime

 include/linux/rmap.h |   15 +++++++++++++++
 mm/ksm.c             |   17 ++++++-----------
 mm/migrate.c         |   10 +++-------
 mm/rmap.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 68 insertions(+), 19 deletions(-)

Index: linux-2.6.34/include/linux/rmap.h
===================================================================
--- linux-2.6.34.orig/include/linux/rmap.h
+++ linux-2.6.34/include/linux/rmap.h
@@ -81,6 +81,13 @@ static inline int anonvma_external_refco
 {
 	return atomic_read(&anon_vma->external_refcount);
 }
+
+static inline void get_anon_vma(struct anon_vma *anon_vma)
+{
+	atomic_inc(&anon_vma->external_refcount);
+}
+
+void drop_anon_vma(struct anon_vma *);
 #else
 static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
 {
@@ -90,6 +97,14 @@ static inline int anonvma_external_refco
 {
 	return 0;
 }
+
+static inline void get_anon_vma(struct anon_vma *anon_vma)
+{
+}
+
+static inline void drop_anon_vma(struct anon_vma *anon_vma)
+{
+}
 #endif /* CONFIG_KSM */
 
 static inline struct anon_vma *page_anon_vma(struct page *page)
Index: linux-2.6.34/mm/ksm.c
===================================================================
--- linux-2.6.34.orig/mm/ksm.c
+++ linux-2.6.34/mm/ksm.c
@@ -318,19 +318,14 @@ static void hold_anon_vma(struct rmap_it
 			  struct anon_vma *anon_vma)
 {
 	rmap_item->anon_vma = anon_vma;
-	atomic_inc(&anon_vma->external_refcount);
+	get_anon_vma(anon_vma);
 }
 
-static void drop_anon_vma(struct rmap_item *rmap_item)
+static void ksm_drop_anon_vma(struct rmap_item *rmap_item)
 {
 	struct anon_vma *anon_vma = rmap_item->anon_vma;
 
-	if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
-		int empty = list_empty(&anon_vma->head);
-		anon_vma_unlock(anon_vma);
-		if (empty)
-			anon_vma_free(anon_vma);
-	}
+	drop_anon_vma(anon_vma);
 }
 
 /*
@@ -415,7 +410,7 @@ static void break_cow(struct rmap_item *
 	 * It is not an accident that whenever we want to break COW
 	 * to undo, we also need to drop a reference to the anon_vma.
 	 */
-	drop_anon_vma(rmap_item);
+	ksm_drop_anon_vma(rmap_item);
 
 	down_read(&mm->mmap_sem);
 	if (ksm_test_exit(mm))
@@ -470,7 +465,7 @@ static void remove_node_from_stable_tree
 			ksm_pages_sharing--;
 		else
 			ksm_pages_shared--;
-		drop_anon_vma(rmap_item);
+		ksm_drop_anon_vma(rmap_item);
 		rmap_item->address &= PAGE_MASK;
 		cond_resched();
 	}
@@ -558,7 +553,7 @@ static void remove_rmap_item_from_tree(s
 		else
 			ksm_pages_shared--;
 
-		drop_anon_vma(rmap_item);
+		ksm_drop_anon_vma(rmap_item);
 		rmap_item->address &= PAGE_MASK;
 
 	} else if (rmap_item->address & UNSTABLE_FLAG) {
Index: linux-2.6.34/mm/rmap.c
===================================================================
--- linux-2.6.34.orig/mm/rmap.c
+++ linux-2.6.34/mm/rmap.c
@@ -235,6 +235,12 @@ int anon_vma_fork(struct vm_area_struct 
 	 * lock any of the anon_vmas in this anon_vma tree.
 	 */
 	anon_vma->root = pvma->anon_vma->root;
+	/*
+	 * With KSM refcounts, an anon_vma can stay around longer than the
+	 * process it belongs to.  The root anon_vma needs to be pinned
+	 * until this anon_vma is freed, because the lock lives in the root.
+	 */
+	get_anon_vma(anon_vma->root);
 	/* Mark this anon_vma as the one where our new (COWed) pages go. */
 	vma->anon_vma = anon_vma;
 	anon_vma_chain_link(vma, avc, anon_vma);
@@ -264,8 +270,11 @@ static void anon_vma_unlink(struct anon_
 	empty = list_empty(&anon_vma->head) && !anonvma_external_refcount(anon_vma);
 	anon_vma_unlock(anon_vma);
 
-	if (empty)
+	if (empty) {
+		/* We no longer need the root anon_vma */
+		drop_anon_vma(anon_vma->root);
 		anon_vma_free(anon_vma);
+	}
 }
 
 void unlink_anon_vmas(struct vm_area_struct *vma)
@@ -1389,6 +1398,40 @@ int try_to_munlock(struct page *page)
 		return try_to_unmap_file(page, TTU_MUNLOCK);
 }
 
+#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
+/*
+ * Drop an anon_vma refcount, freeing the anon_vma and anon_vma->root
+ * if necessary.  Be careful to do all the tests under the lock.  Once
+ * we know we are the last user, nobody else can get a reference and we
+ * can do the freeing without the lock.
+ */
+void drop_anon_vma(struct anon_vma *anon_vma)
+{
+	if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
+		struct anon_vma *root = anon_vma->root;
+		int empty = list_empty(&anon_vma->head);
+		int last_root_user = 0;
+		int root_empty = 0;
+
+		/*
+		 * The refcount on a non-root anon_vma got dropped.  Drop
+		 * the refcount on the root and check if we need to free it.
+		 */
+		if (empty && anon_vma != root) {
+			last_root_user = atomic_dec_and_test(&root->external_refcount);
+			root_empty = list_empty(&root->head);
+		}
+		anon_vma_unlock(anon_vma);
+
+		if (empty) {
+			anon_vma_free(anon_vma);
+			if (root_empty && last_root_user)
+				anon_vma_free(root);
+		}
+	}
+}
+#endif
+
 #ifdef CONFIG_MIGRATION
 /*
  * rmap_walk() and its helpers rmap_walk_anon() and rmap_walk_file():
Index: linux-2.6.34/mm/migrate.c
===================================================================
--- linux-2.6.34.orig/mm/migrate.c
+++ linux-2.6.34/mm/migrate.c
@@ -639,7 +639,7 @@ static int unmap_and_move(new_page_t get
 			 * exist when the page is remapped later
 			 */
 			anon_vma = page_anon_vma(page);
-			atomic_inc(&anon_vma->external_refcount);
+			get_anon_vma(anon_vma);
 		}
 	}
 
@@ -682,12 +682,8 @@ skip_unmap:
 rcu_unlock:
 
 	/* Drop an anon_vma reference if we took one */
-	if (anon_vma && atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->root->lock)) {
-		int empty = list_empty(&anon_vma->head);
-		anon_vma_unlock(anon_vma);
-		if (empty)
-			anon_vma_free(anon_vma);
-	}
+	if (anon_vma)
+		drop_anon_vma(anon_vma);
 
 	if (rcu_locked)
 		rcu_read_unlock();

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-26 15:25                 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
@ 2010-05-26 17:25                   ` Linus Torvalds
  2010-05-26 19:01                     ` Rik van Riel
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2010-05-26 17:25 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, KAMEZAWA Hiroyuki, LKML, Lee Schermerhorn



On Wed, 26 May 2010, Rik van Riel wrote:
>
> Subject: rename anon_vma_lock to vma_lock_anon_vma
> 
> Rename anon_vma_lock to vma_lock_anon_vma.  This matches the
> naming style used in page_lock_anon_vma and will come in really
> handy further down in this patch series.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

This v2 series seems to be missing all the ack's etc from the previous 
series. At least Mel and Hiroyuki-san had acked several of the patches.

What's the point of making a v2 if it doesn't actually take the input from 
v1 into account?

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-26 17:25                   ` Linus Torvalds
@ 2010-05-26 19:01                     ` Rik van Riel
  2010-05-26 19:25                       ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, KAMEZAWA Hiroyuki, LKML, Lee Schermerhorn

On 05/26/2010 01:25 PM, Linus Torvalds wrote:
>
>
> On Wed, 26 May 2010, Rik van Riel wrote:
>>
>> Subject: rename anon_vma_lock to vma_lock_anon_vma
>>
>> Rename anon_vma_lock to vma_lock_anon_vma.  This matches the
>> naming style used in page_lock_anon_vma and will come in really
>> handy further down in this patch series.
>>
>> Signed-off-by: Rik van Riel<riel@redhat.com>
>
> This v2 series seems to be missing all the ack's etc from the previous
> series. At least Mel and Hiroyuki-san had acked several of the patches.
>
> What's the point of making a v2 if it doesn't actually take the input from
> v1 into account?

It did take all the input into account (I made all the
suggested improvements, enhanced comments where people
had questions, etc).

I just forgot to add all the Acked-by's :(

If you want I can send you and Andrew a duplicate of
what I sent this morning, with the only difference
being added Acked-by's.

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-26 19:01                     ` Rik van Riel
@ 2010-05-26 19:25                       ` Linus Torvalds
  2010-05-26 19:35                         ` Rik van Riel
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2010-05-26 19:25 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, KAMEZAWA Hiroyuki, LKML, Lee Schermerhorn



On Wed, 26 May 2010, Rik van Riel wrote:
> 
> If you want I can send you and Andrew a duplicate of
> what I sent this morning, with the only difference
> being added Acked-by's.

I assume this will come through Andrew, so it depends on whether he is so 
resigned to manually adding acks from following threads or not.

Personally, I think it's one of the responsibilities of the person pushing 
the patch to do. Having the acked-by and reviewed-by trail should be one 
of the things that makes it _way_ easier for upstream to decide whether a 
patch should go in - especially if the people in question are active 
maintainers.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma
  2010-05-26 19:25                       ` Linus Torvalds
@ 2010-05-26 19:35                         ` Rik van Riel
  0 siblings, 0 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 19:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	Linux-MM, KAMEZAWA Hiroyuki, LKML, Lee Schermerhorn

On 05/26/2010 03:25 PM, Linus Torvalds wrote:
>
>
> On Wed, 26 May 2010, Rik van Riel wrote:
>>
>> If you want I can send you and Andrew a duplicate of
>> what I sent this morning, with the only difference
>> being added Acked-by's.
>
> I assume this will come through Andrew, so it depends on whether he is so
> resigned to manually adding acks from following threads or not.
>
> Personally, I think it's one of the responsibilities of the person pushing
> the patch to do. Having the acked-by and reviewed-by trail should be one
> of the things that makes it _way_ easier for upstream to decide whether a
> patch should go in - especially if the people in question are active
> maintainers.

I'll do a repost, gathering up the acks I have so far.

I just saw that patches 4 and 5 have no acks yet.

I've fixed up the comments to make the code clearer and
made the "find the root anon_vma" thing a little simpler
in anon_vma_fork.

Hopefully those patches will get acks now :)

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-26 19:38 [PATCH -v3 0/5] always lock the root anon_vma Rik van Riel
@ 2010-05-26 19:40 ` Rik van Riel
  2010-05-26 20:34   ` Larry Woodman
  2010-05-27 13:48   ` Minchan Kim
  0 siblings, 2 replies; 51+ messages in thread
From: Rik van Riel @ 2010-05-26 19:40 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli, Minchan Kim,
	KAMEZAWA Hiroyuki, Lee Schermerhorn

Subject: track the root (oldest) anon_vma

Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
take the lock on the root anon_vma, we cannot use the lock on higher-up
anon_vmas to lock anything.  This makes it impossible to do an indirect
lookup of the root anon_vma, since the data structures could go away from
under us.

However, a direct pointer is safe because the root anon_vma is always the
last one that gets freed on munmap or exit, by virtue of the same_vma list
order and unlink_anon_vmas walking the list forward.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/rmap.h |    1 +
 mm/rmap.c            |   18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6.34/include/linux/rmap.h
===================================================================
--- linux-2.6.34.orig/include/linux/rmap.h
+++ linux-2.6.34/include/linux/rmap.h
@@ -26,6 +26,7 @@
  */
 struct anon_vma {
 	spinlock_t lock;	/* Serialize access to vma list */
+	struct anon_vma *root;	/* Root of this anon_vma tree */
 #if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
 
 	/*
Index: linux-2.6.34/mm/rmap.c
===================================================================
--- linux-2.6.34.orig/mm/rmap.c
+++ linux-2.6.34/mm/rmap.c
@@ -132,6 +132,11 @@ int anon_vma_prepare(struct vm_area_stru
 			if (unlikely(!anon_vma))
 				goto out_enomem_free_avc;
 			allocated = anon_vma;
+			/*
+			 * This VMA had no anon_vma yet.  This anon_vma is
+			 * the root of any anon_vma tree that might form.
+			 */
+			anon_vma->root = anon_vma;
 		}
 
 		anon_vma_lock(anon_vma);
@@ -224,9 +229,15 @@ int anon_vma_fork(struct vm_area_struct 
 	avc = anon_vma_chain_alloc();
 	if (!avc)
 		goto out_error_free_anon_vma;
-	anon_vma_chain_link(vma, avc, anon_vma);
+
+	/*
+	 * The root anon_vm's spinlock is the lock actually used when we
+	 * lock any of the anon_vmas in this anon_vma tree.
+	 */
+	anon_vma->root = pvma->anon_vma->root;
 	/* Mark this anon_vma as the one where our new (COWed) pages go. */
 	vma->anon_vma = anon_vma;
+	anon_vma_chain_link(vma, avc, anon_vma);
 
 	return 0;
 
@@ -261,7 +272,10 @@ void unlink_anon_vmas(struct vm_area_str
 {
 	struct anon_vma_chain *avc, *next;
 
-	/* Unlink each anon_vma chained to the VMA. */
+	/*
+	 * Unlink each anon_vma chained to the VMA.  This list is ordered
+	 * from newest to oldest, ensuring the root anon_vma gets freed last.
+	 */
 	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
 		anon_vma_unlink(avc);
 		list_del(&avc->same_vma);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-26 19:40 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
@ 2010-05-26 20:34   ` Larry Woodman
  2010-05-27 13:48   ` Minchan Kim
  1 sibling, 0 replies; 51+ messages in thread
From: Larry Woodman @ 2010-05-26 20:34 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli,
	Minchan Kim, KAMEZAWA Hiroyuki, Lee Schermerhorn

On Wed, 2010-05-26 at 15:40 -0400, Rik van Riel wrote:
> Subject: track the root (oldest) anon_vma
> 
> Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
> take the lock on the root anon_vma, we cannot use the lock on higher-up
> anon_vmas to lock anything.  This makes it impossible to do an indirect
> lookup of the root anon_vma, since the data structures could go away from
> under us.
> 
> However, a direct pointer is safe because the root anon_vma is always the
> last one that gets freed on munmap or exit, by virtue of the same_vma list
> order and unlink_anon_vmas walking the list forward.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Tested and Acked-by: Larry Woodman <lwoodman@redhat.com>

> ---
>  include/linux/rmap.h |    1 +
>  mm/rmap.c            |   18 ++++++++++++++++--
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.34/include/linux/rmap.h
> ===================================================================
> --- linux-2.6.34.orig/include/linux/rmap.h
> +++ linux-2.6.34/include/linux/rmap.h
> @@ -26,6 +26,7 @@
>   */
>  struct anon_vma {
>  	spinlock_t lock;	/* Serialize access to vma list */
> +	struct anon_vma *root;	/* Root of this anon_vma tree */
>  #if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
>  
>  	/*
> Index: linux-2.6.34/mm/rmap.c
> ===================================================================
> --- linux-2.6.34.orig/mm/rmap.c
> +++ linux-2.6.34/mm/rmap.c
> @@ -132,6 +132,11 @@ int anon_vma_prepare(struct vm_area_stru
>  			if (unlikely(!anon_vma))
>  				goto out_enomem_free_avc;
>  			allocated = anon_vma;
> +			/*
> +			 * This VMA had no anon_vma yet.  This anon_vma is
> +			 * the root of any anon_vma tree that might form.
> +			 */
> +			anon_vma->root = anon_vma;
>  		}
>  
>  		anon_vma_lock(anon_vma);
> @@ -224,9 +229,15 @@ int anon_vma_fork(struct vm_area_struct 
>  	avc = anon_vma_chain_alloc();
>  	if (!avc)
>  		goto out_error_free_anon_vma;
> -	anon_vma_chain_link(vma, avc, anon_vma);
> +
> +	/*
> +	 * The root anon_vm's spinlock is the lock actually used when we
> +	 * lock any of the anon_vmas in this anon_vma tree.
> +	 */
> +	anon_vma->root = pvma->anon_vma->root;
>  	/* Mark this anon_vma as the one where our new (COWed) pages go. */
>  	vma->anon_vma = anon_vma;
> +	anon_vma_chain_link(vma, avc, anon_vma);
>  
>  	return 0;
>  
> @@ -261,7 +272,10 @@ void unlink_anon_vmas(struct vm_area_str
>  {
>  	struct anon_vma_chain *avc, *next;
>  
> -	/* Unlink each anon_vma chained to the VMA. */
> +	/*
> +	 * Unlink each anon_vma chained to the VMA.  This list is ordered
> +	 * from newest to oldest, ensuring the root anon_vma gets freed last.
> +	 */
>  	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
>  		anon_vma_unlink(avc);
>  		list_del(&avc->same_vma);
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/5] track the root (oldest) anon_vma
  2010-05-26 19:40 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
  2010-05-26 20:34   ` Larry Woodman
@ 2010-05-27 13:48   ` Minchan Kim
  1 sibling, 0 replies; 51+ messages in thread
From: Minchan Kim @ 2010-05-27 13:48 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, linux-mm, Mel Gorman, Andrea Arcangeli,
	KAMEZAWA Hiroyuki, Lee Schermerhorn

On Wed, May 26, 2010 at 03:40:10PM -0400, Rik van Riel wrote:
> Subject: track the root (oldest) anon_vma
> 
> Track the root (oldest) anon_vma in each anon_vma tree.   Because we only
> take the lock on the root anon_vma, we cannot use the lock on higher-up
> anon_vmas to lock anything.  This makes it impossible to do an indirect
> lookup of the root anon_vma, since the data structures could go away from
> under us.
> 
> However, a direct pointer is safe because the root anon_vma is always the
> last one that gets freed on munmap or exit, by virtue of the same_vma list
> order and unlink_anon_vmas walking the list forward.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Except below one minor type. 

> ---
>  include/linux/rmap.h |    1 +
>  mm/rmap.c            |   18 ++++++++++++++++--
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.34/include/linux/rmap.h
> ===================================================================
> --- linux-2.6.34.orig/include/linux/rmap.h
> +++ linux-2.6.34/include/linux/rmap.h
> @@ -26,6 +26,7 @@
>   */
>  struct anon_vma {
>  	spinlock_t lock;	/* Serialize access to vma list */
> +	struct anon_vma *root;	/* Root of this anon_vma tree */
>  #if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
>  
>  	/*
> Index: linux-2.6.34/mm/rmap.c
> ===================================================================
> --- linux-2.6.34.orig/mm/rmap.c
> +++ linux-2.6.34/mm/rmap.c
> @@ -132,6 +132,11 @@ int anon_vma_prepare(struct vm_area_stru
>  			if (unlikely(!anon_vma))
>  				goto out_enomem_free_avc;
>  			allocated = anon_vma;
> +			/*
> +			 * This VMA had no anon_vma yet.  This anon_vma is
> +			 * the root of any anon_vma tree that might form.
> +			 */
> +			anon_vma->root = anon_vma;
>  		}
>  
>  		anon_vma_lock(anon_vma);
> @@ -224,9 +229,15 @@ int anon_vma_fork(struct vm_area_struct 
>  	avc = anon_vma_chain_alloc();
>  	if (!avc)
>  		goto out_error_free_anon_vma;
> -	anon_vma_chain_link(vma, avc, anon_vma);
> +
> +	/*
> +	 * The root anon_vm's spinlock is the lock actually used when we
                    anon_vma's
-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH -v2 4/5] always lock the root (oldest) anon_vma
  2010-05-26  5:46                 ` james toy
@ 2010-06-01  0:57                   ` james toy
  0 siblings, 0 replies; 51+ messages in thread
From: james toy @ 2010-06-01  0:57 UTC (permalink / raw)
  To: james toy
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Andrea Arcangeli,
	Minchan Kim, Linux-MM, KAMEZAWA Hiroyuki, LKML, Linus Torvalds,
	Lee Schermerhorn, james toy, James Toy

This is now fixed; I am truly sorry for the delay but it was out of my
hands.  The sysadmin had git 1.7.1 but an old version of guilt (0.3.2)
which was incompatible.  After the upgrade the watchdog program is
working fine.  Again thanks for alerting me to this and i'll be
keeping a closer eye on it since i'm working on the kernel more now
days.

respectfully,

=jt

On Wed, May 26, 2010 at 01:46, james toy <nil@0xabadba.be> wrote:
> I'll get after this asap; sorry.  I'm finishing my last trimester of
> my B.S.  I'll send a message when it's back up with the offending patch.
>
> =jt
>
> On May 26, 2010, at 0:15, Andrew Morton <akpm@linux-foundation.org>
> wrote:
>
>> On Wed, 26 May 2010 00:00:15 -0400 Rik van Riel <riel@redhat.com>
>> wrote:
>>
>>> On 05/13/2010 05:09 PM, Andrew Morton wrote:
>>>
>>>> I'm not very confident in merging all these onto the current MM
>>>> pile.
>>>
>>> Blah.  I thought I just did that (and wondered why it was
>>> so easy), and then I saw that the MMOTM git tree is old
>>> and does not have the COMPACTION code :(
>>>
>>
>> Oh.  James's mmotm->git bot might have broken.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2010-06-01  0:57 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-12 17:38 [PATCH 0/5] always lock the root anon_vma Rik van Riel
2010-05-12 17:39 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
2010-05-12 20:57   ` Mel Gorman
2010-05-13  0:30   ` KAMEZAWA Hiroyuki
2010-05-12 17:39 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
2010-05-12 20:59   ` Mel Gorman
2010-05-12 21:01     ` Rik van Riel
2010-05-13  0:38   ` KAMEZAWA Hiroyuki
2010-05-13  2:25     ` Rik van Riel
2010-05-14  0:04       ` KAMEZAWA Hiroyuki
2010-05-12 17:40 ` [PATCH 4/5] always lock " Rik van Riel
2010-05-12 21:02   ` Mel Gorman
2010-05-12 21:08     ` Rik van Riel
2010-05-13  9:54       ` Mel Gorman
2010-05-13 14:33         ` [PATCH -v2 " Rik van Riel
2010-05-13 21:09           ` Andrew Morton
2010-05-13 22:50             ` Rik van Riel
2010-05-14  9:33               ` Mel Gorman
2010-05-26  4:00             ` Rik van Riel
2010-05-26  4:15               ` Andrew Morton
2010-05-26  5:46                 ` james toy
2010-06-01  0:57                   ` james toy
2010-05-26 15:24               ` [PATCH -v2 0/5] always lock the root anon_vma Rik van Riel
2010-05-26 15:25                 ` [PATCH 1/5] rename anon_vma_lock to vma_lock_anon_vma Rik van Riel
2010-05-26 17:25                   ` Linus Torvalds
2010-05-26 19:01                     ` Rik van Riel
2010-05-26 19:25                       ` Linus Torvalds
2010-05-26 19:35                         ` Rik van Riel
2010-05-26 15:25                 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
2010-05-26 15:26                 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
2010-05-26 15:27                 ` [PATCH 4/5] always lock " Rik van Riel
2010-05-26 15:27                 ` [PATCH 5/5] extend KSM refcounts to the anon_vma root Rik van Riel
2010-05-12 21:55   ` [PATCH 4/5] always lock the root (oldest) anon_vma Linus Torvalds
2010-05-12 22:18     ` Rik van Riel
2010-05-12 22:26       ` Linus Torvalds
2010-05-12 17:41 ` [PATCH 5/5] extend KSM refcounts to the anon_vma root Rik van Riel
2010-05-12 21:07   ` Mel Gorman
2010-05-12 21:09     ` Rik van Riel
2010-05-13 11:26       ` Mel Gorman
2010-05-13 13:11         ` Rik van Riel
2010-05-13 13:24           ` Mel Gorman
2010-05-13 14:34             ` [PATCH -v2 " Rik van Riel
2010-05-19  1:05               ` Andrea Arcangeli
2010-05-12 17:41 ` [PATCH 2/5] change direct call of spin_lock(anon_vma->lock) to inline function Rik van Riel
2010-05-12 20:58   ` Mel Gorman
2010-05-13  0:32   ` KAMEZAWA Hiroyuki
2010-05-20 22:42 ` [PATCH 6/5] adjust mm_take_all_locks to anon-vma-root locking Andrea Arcangeli
2010-05-20 23:07   ` Rik van Riel
  -- strict thread matches above, loose matches on Subject: below --
2010-05-26 19:38 [PATCH -v3 0/5] always lock the root anon_vma Rik van Riel
2010-05-26 19:40 ` [PATCH 3/5] track the root (oldest) anon_vma Rik van Riel
2010-05-26 20:34   ` Larry Woodman
2010-05-27 13:48   ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).