From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Avi Kivity <avi@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
akpm@linux-foundation.org,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Miller <davem@davemloft.net>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Mel Gorman <mel@csn.ul.ie>, Nick Piggin <npiggin@suse.de>
Subject: Re: [PATCH 13/13] mm: Optimize page_lock_anon_vma
Date: Thu, 8 Apr 2010 15:18:17 -0700 [thread overview]
Message-ID: <20100408221817.GE2520@linux.vnet.ibm.com> (raw)
In-Reply-To: <20100408192723.158842551@chello.nl>
On Thu, Apr 08, 2010 at 09:17:50PM +0200, Peter Zijlstra wrote:
> Optimize page_lock_anon_vma() by removing the atomic ref count
> ops from the fast path.
>
> Rather complicates the code a lot, but might be worth it.
Some questions and a disclaimer below.
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> mm/rmap.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 67 insertions(+), 4 deletions(-)
>
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c
> +++ linux-2.6/mm/rmap.c
> @@ -78,6 +78,12 @@ static inline struct anon_vma *anon_vma_
> void anon_vma_free(struct anon_vma *anon_vma)
> {
> VM_BUG_ON(atomic_read(&anon_vma->ref));
> + /*
> + * Sync against the anon_vma->lock, so that we can hold the
> + * lock without requiring a reference. See page_lock_anon_vma().
> + */
> + mutex_lock(&anon_vma->lock);
On some systems, the CPU is permitted to pull references into the critical
section from either side. So, do we also need an smp_mb() here?
> + mutex_unlock(&anon_vma->lock);
So, a question...
Can the above mutex be contended? If yes, what happens when the
competing mutex_lock() acquires the lock at this point? Or, worse yet,
after the kmem_cache_free()?
If no, what do we accomplish by acquiring the lock?
If the above mutex can be contended, can we fix by substituting
synchronize_rcu_expedited()? Which will soon require some scalability
attention if it gets used here, but what else is new? ;-)
> kmem_cache_free(anon_vma_cachep, anon_vma);
> }
>
> @@ -291,7 +297,7 @@ void __init anon_vma_init(void)
>
> /*
> * Getting a lock on a stable anon_vma from a page off the LRU is
> - * tricky: page_lock_anon_vma relies on RCU to guard against the races.
> + * tricky: anon_vma_get relies on RCU to guard against the races.
> */
> struct anon_vma *anon_vma_get(struct page *page)
> {
> @@ -320,12 +326,70 @@ out:
> return anon_vma;
> }
>
> +/*
> + * Similar to anon_vma_get(), however it relies on the anon_vma->lock
> + * to pin the object. However since we cannot wait for the mutex
> + * acquisition inside the RCU read lock, we use the ref count
> + * in the slow path.
> + */
> struct anon_vma *page_lock_anon_vma(struct page *page)
> {
> - struct anon_vma *anon_vma = anon_vma_get(page);
> + struct anon_vma *anon_vma = NULL;
> + unsigned long anon_mapping;
> +
> +again:
> + rcu_read_lock();
This is interesting. You have an RCU read-side critical section with
no rcu_dereference().
This strange state of affairs is actually legal (assuming that
anon_mapping is the RCU-protected structure) because all dereferences
of the anon_vma variable are atomic operations that guarantee ordering
(the mutex_trylock() and the atomic_inc_not_zero().
The other dereferences (the atomic_read()s) are under the lock, so
are also OK assuming that the lock is held when initializing and
updating these fields, and even more OK due to the smp_rmb() below.
But see below.
> + anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
> + if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
> + goto unlock;
> + if (!page_mapped(page))
> + goto unlock;
> +
> + anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> + if (!mutex_trylock(&anon_vma->lock)) {
> + /*
> + * We failed to acquire the lock, take a ref so we can
> + * drop the RCU read lock and sleep on it.
> + */
> + if (!atomic_inc_not_zero(&anon_vma->ref)) {
> + /*
> + * Failed to get a ref, we're dead, bail.
> + */
> + anon_vma = NULL;
> + goto unlock;
> + }
> + rcu_read_unlock();
>
> - if (anon_vma)
> mutex_lock(&anon_vma->lock);
> + /*
> + * We got the lock, drop the temp. ref, if it was the last
> + * one free it and bail.
> + */
> + if (atomic_dec_and_test(&anon_vma->ref)) {
> + mutex_unlock(&anon_vma->lock);
> + anon_vma_free(anon_vma);
> + anon_vma = NULL;
> + }
> + goto out;
> + }
> + /*
> + * Got the lock, check we're still alive. Seeing a ref
> + * here guarantees the object will stay alive due to
> + * anon_vma_free() syncing against the lock we now hold.
> + */
> + smp_rmb(); /* Order against anon_vma_put() */
This is ordering the fetch into anon_vma against the atomic_read() below?
If so, smp_read_barrier_depends() will cover it more cheaply. Alternatively,
use rcu_dereference() when fetching into anon_vma.
Or am I misunderstanding the purpose of this barrier?
(Disclaimer: I have not yet found anon_vma_put(), so I am assuming that
anon_vma_free() plays the role of a grace period.)
> + if (!atomic_read(&anon_vma->ref)) {
> + mutex_unlock(&anon_vma->lock);
> + anon_vma = NULL;
> + }
> +
> +unlock:
> + rcu_read_unlock();
> +out:
> + if (anon_vma && page_rmapping(page) != anon_vma) {
> + mutex_unlock(&anon_vma->lock);
> + goto again;
> + }
>
> return anon_vma;
> }
> @@ -333,7 +397,6 @@ struct anon_vma *page_lock_anon_vma(stru
> void page_unlock_anon_vma(struct anon_vma *anon_vma)
> {
> mutex_unlock(&anon_vma->lock);
> - anon_vma_put(anon_vma);
> }
>
> /*
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2010-04-08 22:18 UTC|newest]
Thread overview: 113+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-08 19:17 [PATCH 00/13] mm: preemptibility -v2 Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 01/13] powerpc: Add rcu_read_lock() to gup_fast() implementation Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 20:31 ` Rik van Riel
2010-04-09 3:11 ` Nick Piggin
2010-04-13 1:05 ` Benjamin Herrenschmidt
2010-04-13 3:43 ` Paul E. McKenney
2010-04-14 13:51 ` Peter Zijlstra
2010-04-15 14:28 ` Paul E. McKenney
2010-04-16 6:54 ` Benjamin Herrenschmidt
2010-04-16 13:43 ` Paul E. McKenney
2010-04-16 13:43 ` Paul E. McKenney
2010-04-16 23:25 ` Benjamin Herrenschmidt
2010-04-16 13:51 ` Peter Zijlstra
2010-04-16 14:17 ` Paul E. McKenney
2010-04-16 14:23 ` Peter Zijlstra
2010-04-16 14:32 ` Paul E. McKenney
2010-04-16 14:56 ` Peter Zijlstra
2010-04-16 15:09 ` Paul E. McKenney
2010-04-16 15:14 ` Peter Zijlstra
2010-04-16 16:45 ` Paul E. McKenney
2010-04-16 19:37 ` Peter Zijlstra
2010-04-16 20:28 ` Paul E. McKenney
2010-04-18 3:06 ` James Bottomley
2010-04-18 13:55 ` Paul E. McKenney
2010-04-18 18:55 ` James Bottomley
2010-04-16 6:51 ` Benjamin Herrenschmidt
2010-04-16 8:18 ` Nick Piggin
2010-04-16 8:29 ` Benjamin Herrenschmidt
2010-04-16 9:22 ` Nick Piggin
2010-04-08 19:17 ` [PATCH 02/13] mm: Revalidate anon_vma in page_lock_anon_vma() Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 20:50 ` Rik van Riel
2010-04-08 21:20 ` Andrew Morton
2010-04-08 21:54 ` Peter Zijlstra
2010-04-08 21:54 ` Peter Zijlstra
2010-04-09 2:19 ` KOSAKI Motohiro
2010-04-09 2:19 ` Minchan Kim
2010-04-09 3:16 ` Nick Piggin
2010-04-09 4:56 ` KAMEZAWA Hiroyuki
2010-04-09 6:34 ` KOSAKI Motohiro
2010-04-09 6:47 ` KAMEZAWA Hiroyuki
2010-04-09 7:29 ` KOSAKI Motohiro
2010-04-09 7:57 ` KAMEZAWA Hiroyuki
2010-04-09 8:03 ` KAMEZAWA Hiroyuki
2010-04-09 8:24 ` KAMEZAWA Hiroyuki
2010-04-09 8:01 ` Minchan Kim
2010-04-09 8:17 ` KOSAKI Motohiro
2010-04-09 14:41 ` mlock and pageout race? Minchan Kim
2010-04-09 8:44 ` [PATCH 02/13] mm: Revalidate anon_vma in page_lock_anon_vma() Peter Zijlstra
2010-05-24 19:32 ` Andrew Morton
2010-05-25 9:01 ` Peter Zijlstra
2010-04-09 12:57 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 03/13] x86: Remove last traces of quicklist usage Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 20:51 ` Rik van Riel
2010-04-08 19:17 ` [PATCH 04/13] mm: Move anon_vma ref out from under CONFIG_KSM Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-09 12:35 ` Rik van Riel
2010-04-08 19:17 ` [PATCH 05/13] mm: Make use of the anon_vma ref count Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-09 7:04 ` Christian Ehrhardt
2010-04-09 9:57 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 06/13] mm: Preemptible mmu_gather Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-09 3:25 ` Nick Piggin
2010-04-09 8:18 ` Peter Zijlstra
2010-04-09 20:36 ` Peter Zijlstra
2010-04-19 19:16 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 07/13] powerpc: " Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-09 4:07 ` Nick Piggin
2010-04-09 8:14 ` Peter Zijlstra
2010-04-09 8:46 ` Nick Piggin
2010-04-09 9:22 ` Peter Zijlstra
2010-04-13 2:06 ` Benjamin Herrenschmidt
2010-04-13 1:56 ` Benjamin Herrenschmidt
2010-04-13 1:23 ` Benjamin Herrenschmidt
2010-04-13 10:22 ` Peter Zijlstra
2010-04-14 13:34 ` Peter Zijlstra
2010-04-14 13:51 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 08/13] sparc: " Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 09/13] mm, powerpc: Move the RCU page-table freeing into generic code Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-09 3:35 ` Nick Piggin
2010-04-09 8:08 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 10/13] lockdep, mutex: Provide mutex_lock_nest_lock Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-09 15:36 ` Rik van Riel
2010-04-08 19:17 ` [PATCH 11/13] mutex: Provide mutex_is_contended Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-09 15:37 ` Rik van Riel
2010-04-08 19:17 ` [PATCH 12/13] mm: Convert i_mmap_lock and anon_vma->lock to mutexes Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 13/13] mm: Optimize page_lock_anon_vma Peter Zijlstra
2010-04-08 19:17 ` Peter Zijlstra
2010-04-08 22:18 ` Paul E. McKenney [this message]
2010-04-09 8:35 ` Peter Zijlstra
2010-04-09 19:22 ` Paul E. McKenney
2010-04-08 20:29 ` [PATCH 00/13] mm: preemptibility -v2 David Miller
2010-04-08 20:35 ` Peter Zijlstra
2010-04-09 1:00 ` David Miller
2010-04-09 4:14 ` Nick Piggin
2010-04-09 8:35 ` Peter Zijlstra
2010-04-09 8:50 ` Nick Piggin
2010-04-09 8:58 ` Peter Zijlstra
2010-04-09 8:58 ` Martin Schwidefsky
2010-04-09 9:53 ` Peter Zijlstra
2010-04-09 9:03 ` David Howells
2010-04-09 9:22 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100408221817.GE2520@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=a.p.zijlstra@chello.nl \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=avi@redhat.com \
--cc=benh@kernel.crashing.org \
--cc=davem@davemloft.net \
--cc=hugh.dickins@tiscali.co.uk \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=mingo@elte.hu \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.