From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Turner <pjt@google.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 2/2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
Date: Mon, 3 Dec 2012 14:17:01 +0000
Message-ID: <20121203141701.GN8218@suse.de>
In-Reply-To: <20121201201538.GB2704@gmail.com>

On Sat, Dec 01, 2012 at 09:15:38PM +0100, Ingo Molnar wrote:
> 
> Note, with this optimization I went a bit farther than the
> boundaries of the migration code - it seemed worthwhile to do and
> I've reviewed all the other users of page_lock_anon_vma() as 
> well and none seemed to be modifying the list inside that lock.
> 
> Please review this patch carefully - in particular the SMP races 
> outlined in anon_vma_free() are exciting: I have updated the 
> reasoning and it still appears to hold, but please double check 
> the changes nevertheless ...
> 
> Thanks,
> 
> 	Ingo
> 
> ------------------->
> From: Ingo Molnar <mingo@kernel.org>
> Date: Sat Dec 1 20:43:04 CET 2012
> 
> rmap_walk_anon() and try_to_unmap_anon() appear to be too careful
> about locking the anon vma: while they need protection against anon
> vma list modifications, they do not need exclusive access to the
> list itself.
> 
> Transforming this exclusive lock to a read-locked rwsem removes a
> global lock from the hot path of page-migration intense threaded
> workloads which can cause pathological performance like this:
> 
>     96.43%        process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
>                   |
>                   --- perf_trace_sched_switch
>                       __schedule
>                       schedule
>                       schedule_preempt_disabled
>                       __mutex_lock_common.isra.6
>                       __mutex_lock_slowpath
>                       mutex_lock
>                      |
>                      |--50.61%-- rmap_walk
>                      |          move_to_new_page
>                      |          migrate_pages
>                      |          migrate_misplaced_page
>                      |          __do_numa_page.isra.69
>                      |          handle_pte_fault
>                      |          handle_mm_fault
>                      |          __do_page_fault
>                      |          do_page_fault
>                      |          page_fault
>                      |          __memset_sse2
>                      |          |
>                      |           --100.00%-- worker_thread
>                      |                     |
>                      |                      --100.00%-- start_thread
>                      |
>                       --49.39%-- page_lock_anon_vma
>                                 try_to_unmap_anon
>                                 try_to_unmap
>                                 migrate_pages
>                                 migrate_misplaced_page
>                                 __do_numa_page.isra.69
>                                 handle_pte_fault
>                                 handle_mm_fault
>                                 __do_page_fault
>                                 do_page_fault
>                                 page_fault
>                                 __memset_sse2
>                                 |
>                                  --100.00%-- worker_thread
>                                            start_thread
> 
> With this change applied the profile is now nicely flat
> and there's no anon-vma related scheduling/blocking.
> 
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  include/linux/rmap.h |   15 +++++++++++++--
>  mm/huge_memory.c     |    4 ++--
>  mm/memory-failure.c  |    4 ++--
>  mm/migrate.c         |    2 +-
>  mm/rmap.c            |   40 ++++++++++++++++++++--------------------
>  5 files changed, 38 insertions(+), 27 deletions(-)
> 
> Index: linux/include/linux/rmap.h
> ===================================================================
> --- linux.orig/include/linux/rmap.h
> +++ linux/include/linux/rmap.h
> @@ -128,6 +128,17 @@ static inline void anon_vma_unlock(struc
>  	up_write(&anon_vma->root->rwsem);
>  }
>  
> +static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
> +{
> +	down_read(&anon_vma->root->rwsem);
> +}
> +
> +static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
> +{
> +	up_read(&anon_vma->root->rwsem);
> +}
> +
> +
>  /*
>   * anon_vma helper functions.
>   */
> @@ -220,8 +231,8 @@ int try_to_munlock(struct page *);
>  /*
>   * Called by memory-failure.c to kill processes.
>   */
> -struct anon_vma *page_lock_anon_vma(struct page *page);
> -void page_unlock_anon_vma(struct anon_vma *anon_vma);
> +struct anon_vma *page_lock_anon_vma_read(struct page *page);
> +void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
>  int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
>  
>  /*
> Index: linux/mm/huge_memory.c
> ===================================================================
> --- linux.orig/mm/huge_memory.c
> +++ linux/mm/huge_memory.c
> @@ -1645,7 +1645,7 @@ int split_huge_page(struct page *page)
>  	int ret = 1;
>  
>  	BUG_ON(!PageAnon(page));
> -	anon_vma = page_lock_anon_vma(page);
> +	anon_vma = page_lock_anon_vma_read(page);
>  	if (!anon_vma)
>  		goto out;
>  	ret = 0;
> @@ -1658,7 +1658,7 @@ int split_huge_page(struct page *page)
>  
>  	BUG_ON(PageCompound(page));
>  out_unlock:
> -	page_unlock_anon_vma(anon_vma);
> +	page_unlock_anon_vma_read(anon_vma);
>  out:
>  	return ret;
>  }
> Index: linux/mm/memory-failure.c
> ===================================================================
> --- linux.orig/mm/memory-failure.c
> +++ linux/mm/memory-failure.c
> @@ -402,7 +402,7 @@ static void collect_procs_anon(struct pa
>  	struct anon_vma *av;
>  	pgoff_t pgoff;
>  
> -	av = page_lock_anon_vma(page);
> +	av = page_lock_anon_vma_read(page);
>  	if (av == NULL)	/* Not actually mapped anymore */
>  		return;
>  

Probably no real benefit on this one: it takes tasklist_lock immediately
afterwards, which is a much heavier lock anyway. I don't think there is
anything wrong with the change, though.

> @@ -423,7 +423,7 @@ static void collect_procs_anon(struct pa
>  		}
>  	}
>  	read_unlock(&tasklist_lock);
> -	page_unlock_anon_vma(av);
> +	page_unlock_anon_vma_read(av);
>  }
>  
>  /*
> Index: linux/mm/migrate.c
> ===================================================================
> --- linux.orig/mm/migrate.c
> +++ linux/mm/migrate.c
> @@ -751,7 +751,7 @@ static int __unmap_and_move(struct page
>  	 */
>  	if (PageAnon(page)) {
>  		/*
> -		 * Only page_lock_anon_vma() understands the subtleties of
> +		 * Only page_lock_anon_vma_read() understands the subtleties of
>  		 * getting a hold on an anon_vma from outside one of its mms.
>  		 */
>  		anon_vma = page_get_anon_vma(page);
> Index: linux/mm/rmap.c
> ===================================================================
> --- linux.orig/mm/rmap.c
> +++ linux/mm/rmap.c
> @@ -87,18 +87,18 @@ static inline void anon_vma_free(struct
>  	VM_BUG_ON(atomic_read(&anon_vma->refcount));
>  
>  	/*
> -	 * Synchronize against page_lock_anon_vma() such that
> +	 * Synchronize against page_lock_anon_vma_read() such that
>  	 * we can safely hold the lock without the anon_vma getting
>  	 * freed.
>  	 *
>  	 * Relies on the full mb implied by the atomic_dec_and_test() from
>  	 * put_anon_vma() against the acquire barrier implied by
> -	 * mutex_trylock() from page_lock_anon_vma(). This orders:
> +	 * down_read_trylock() from page_lock_anon_vma_read(). This orders:
>  	 *
> -	 * page_lock_anon_vma()		VS	put_anon_vma()
> -	 *   mutex_trylock()			  atomic_dec_and_test()
> +	 * page_lock_anon_vma_read()	VS	put_anon_vma()
> +	 *   down_read_trylock()		  atomic_dec_and_test()
>  	 *   LOCK				  MB
> -	 *   atomic_read()			  mutex_is_locked()
> +	 *   atomic_read()			  rwsem_is_locked()
>  	 *
>  	 * LOCK should suffice since the actual taking of the lock must
>  	 * happen _before_ what follows.
> @@ -146,7 +146,7 @@ static void anon_vma_chain_link(struct v
>   * allocate a new one.
>   *
>   * Anon-vma allocations are very subtle, because we may have
> - * optimistically looked up an anon_vma in page_lock_anon_vma()
> + * optimistically looked up an anon_vma in page_lock_anon_vma_read()
>   * and that may actually touch the spinlock even in the newly
>   * allocated vma (it depends on RCU to make sure that the
>   * anon_vma isn't actually destroyed).
> @@ -442,7 +442,7 @@ out:
>   * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
>   * reference like with page_get_anon_vma() and then block on the mutex.
>   */
> -struct anon_vma *page_lock_anon_vma(struct page *page)
> +struct anon_vma *page_lock_anon_vma_read(struct page *page)
>  {
>  	struct anon_vma *anon_vma = NULL;
>  	struct anon_vma *root_anon_vma;
> @@ -457,14 +457,14 @@ struct anon_vma *page_lock_anon_vma(stru
>  
>  	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
>  	root_anon_vma = ACCESS_ONCE(anon_vma->root);
> -	if (down_write_trylock(&root_anon_vma->rwsem)) {
> +	if (down_read_trylock(&root_anon_vma->rwsem)) {
>  		/*
>  		 * If the page is still mapped, then this anon_vma is still
>  		 * its anon_vma, and holding the mutex ensures that it will
>  		 * not go away, see anon_vma_free().
>  		 */
>  		if (!page_mapped(page)) {
> -			up_write(&root_anon_vma->rwsem);
> +			up_read(&root_anon_vma->rwsem);
>  			anon_vma = NULL;
>  		}
>  		goto out;
> @@ -484,7 +484,7 @@ struct anon_vma *page_lock_anon_vma(stru
>  
>  	/* we pinned the anon_vma, its safe to sleep */
>  	rcu_read_unlock();
> -	anon_vma_lock(anon_vma);
> +	anon_vma_lock_read(anon_vma);
>  
>  	if (atomic_dec_and_test(&anon_vma->refcount)) {
>  		/*
> @@ -492,7 +492,7 @@ struct anon_vma *page_lock_anon_vma(stru
>  		 * and bail -- can't simply use put_anon_vma() because
>  		 * we'll deadlock on the anon_vma_lock() recursion.
>  		 */
> -		anon_vma_unlock(anon_vma);
> +		anon_vma_unlock_read(anon_vma);
>  		__put_anon_vma(anon_vma);
>  		anon_vma = NULL;
>  	}
> @@ -504,9 +504,9 @@ out:
>  	return anon_vma;
>  }
>  
> -void page_unlock_anon_vma(struct anon_vma *anon_vma)
> +void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
>  {
> -	anon_vma_unlock(anon_vma);
> +	anon_vma_unlock_read(anon_vma);
>  }
>  
>  /*
> @@ -732,7 +732,7 @@ static int page_referenced_anon(struct p
>  	struct anon_vma_chain *avc;
>  	int referenced = 0;
>  
> -	anon_vma = page_lock_anon_vma(page);
> +	anon_vma = page_lock_anon_vma_read(page);
>  	if (!anon_vma)
>  		return referenced;
>  

This one is slightly trickier as this path is called from reclaim. It opens
the possibility that reclaim, while holding the rwsem for read, stalls a
parallel fork or anything else that needs the anon_vma rwsem for write. I
strongly doubt it will be a problem in practice, but keep an eye out for bug
reports about delayed mmap/fork/anything_needing_write_lock during page reclaim.
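
To make the concern concrete, here is a minimal userspace sketch. It assumes
a toy reader/writer counter: struct toy_rwsem and the toy_* helpers are
hypothetical names, not the kernel rwsem implementation. It only shows the
trylock semantics: while a reclaim-like reader holds the lock, a fork-like
writer cannot take it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy lock: count > 0 is the number of readers,
 * 0 means free, -1 means one writer. Not the kernel's rwsem. */
struct toy_rwsem {
	atomic_int count;
};

/* Take the lock for read unless a writer holds it. Returns 1 on success. */
static int toy_down_read_trylock(struct toy_rwsem *s)
{
	int c = atomic_load(&s->count);

	while (c >= 0) {
		/* On failure c is reloaded and the writer check is repeated. */
		if (atomic_compare_exchange_weak(&s->count, &c, c + 1))
			return 1;
	}
	return 0;
}

static void toy_up_read(struct toy_rwsem *s)
{
	atomic_fetch_sub(&s->count, 1);
}

/* Take the lock for write only if nobody holds it. Returns 1 on success. */
static int toy_down_write_trylock(struct toy_rwsem *s)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&s->count, &expected, -1);
}

static void toy_up_write(struct toy_rwsem *s)
{
	atomic_store(&s->count, 0);
}

/* Returns 1 if a writer (think fork) has to wait while a reader
 * (think reclaim in page_referenced_anon()) holds the lock,
 * -1 if the read lock could not be taken at all. */
static int writer_stalls_behind_reader(struct toy_rwsem *s)
{
	int stalled;

	if (!toy_down_read_trylock(s))
		return -1;
	stalled = !toy_down_write_trylock(s);
	toy_up_read(s);
	return stalled;
}
```

In the sketch the writer only gets in once the last reader has dropped the
lock, which is the window in which a long reclaim walk could delay a writer.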

> @@ -754,7 +754,7 @@ static int page_referenced_anon(struct p
>  			break;
>  	}
>  
> -	page_unlock_anon_vma(anon_vma);
> +	page_unlock_anon_vma_read(anon_vma);
>  	return referenced;
>  }
>  
> @@ -1474,7 +1474,7 @@ static int try_to_unmap_anon(struct page
>  	struct anon_vma_chain *avc;
>  	int ret = SWAP_AGAIN;
>  
> -	anon_vma = page_lock_anon_vma(page);
> +	anon_vma = page_lock_anon_vma_read(page);
>  	if (!anon_vma)
>  		return ret;
>  
> @@ -1501,7 +1501,7 @@ static int try_to_unmap_anon(struct page
>  			break;
>  	}
>  
> -	page_unlock_anon_vma(anon_vma);
> +	page_unlock_anon_vma_read(anon_vma);
>  	return ret;
>  }
>  
> @@ -1696,7 +1696,7 @@ static int rmap_walk_anon(struct page *p
>  	int ret = SWAP_AGAIN;
>  
>  	/*
> -	 * Note: remove_migration_ptes() cannot use page_lock_anon_vma()
> +	 * Note: remove_migration_ptes() cannot use page_lock_anon_vma_read()
>  	 * because that depends on page_mapped(); but not all its usages
>  	 * are holding mmap_sem. Users without mmap_sem are required to
>  	 * take a reference count to prevent the anon_vma disappearing
> @@ -1704,7 +1704,7 @@ static int rmap_walk_anon(struct page *p
>  	anon_vma = page_anon_vma(page);
>  	if (!anon_vma)
>  		return ret;
> -	anon_vma_lock(anon_vma);
> +	anon_vma_lock_read(anon_vma);
>  	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
>  		struct vm_area_struct *vma = avc->vma;
>  		unsigned long address = vma_address(page, vma);
> @@ -1712,7 +1712,7 @@ static int rmap_walk_anon(struct page *p
>  		if (ret != SWAP_AGAIN)
>  			break;
>  	}
> -	anon_vma_unlock(anon_vma);
> +	anon_vma_unlock_read(anon_vma);
>  	return ret;
>  }
>  

I can't think of any reason why this would not work. Good stuff!
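
For reference, the trylock fast path of page_lock_anon_vma_read() from the
patch can be sketched in userspace roughly as follows. This is an
illustration under stated assumptions: struct toy_rwsem, struct toy_page and
the toy_* helpers are hypothetical stand-ins, and the refcount-and-sleep slow
path of the real function is omitted (the sketch simply returns NULL there).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy lock: count > 0 readers, 0 free, -1 writer. */
struct toy_rwsem {
	atomic_int count;
};

/* Hypothetical stand-ins for struct anon_vma and struct page. */
struct toy_anon_vma {
	struct toy_rwsem rwsem;
};

struct toy_page {
	struct toy_anon_vma *av;
	int mapcount;		/* 0 means the page is no longer mapped */
};

static int toy_down_read_trylock(struct toy_rwsem *s)
{
	int c = atomic_load(&s->count);

	while (c >= 0) {
		if (atomic_compare_exchange_weak(&s->count, &c, c + 1))
			return 1;
	}
	return 0;
}

static void toy_up_read(struct toy_rwsem *s)
{
	atomic_fetch_sub(&s->count, 1);
}

/* Fast path only: try the read lock, then re-check that the page is
 * still mapped, and drop the lock again if it is not. */
static struct toy_anon_vma *toy_page_lock_read(struct toy_page *page)
{
	struct toy_anon_vma *av = page->av;

	if (!av)
		return NULL;
	if (!toy_down_read_trylock(&av->rwsem))
		return NULL;	/* real code pins a reference and sleeps here */
	if (page->mapcount == 0) {
		toy_up_read(&av->rwsem);
		return NULL;
	}
	return av;
}
```

Two callers walking the same anon_vma can both succeed here, which is the
scalability gain over the exclusive lock; an unmapped page leaves the lock
count untouched.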

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs
