From: Andrea Arcangeli <aarcange@redhat.com>
To: qemu-devel@nongnu.org, kvm@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Cc: Robert Love <rlove@google.com>, Dave Hansen <dave@sr71.net>,
Jan Kara <jack@suse.cz>, Neil Brown <neilb@suse.de>,
Stefan Hajnoczi <stefanha@gmail.com>,
Andrew Jones <drjones@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
Michel Lespinasse <walken@google.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Taras Glek <tglek@mozilla.com>,
Juan Quintela <quintela@redhat.com>,
Hugh Dickins <hughd@google.com>,
Isaku Yamahata <yamahata@valinux.co.jp>,
Mel Gorman <mgorman@suse.de>,
Android Kernel Team <kernel-team@android.com>,
Mel Gorman <mel@csn.ul.ie>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
Anthony Liguori <anthony@codemonkey.ws>,
Mike Hommey <mh@glandium.org>, Keith Packard <keithp@keithp.com>,
Wenchao Xia <wenchaoqemu@gmail.com>,
Minchan Kim <minchan@kernel.org>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: [Qemu-devel] [PATCH 04/10] mm: rmap preparation for remap_anon_pages
Date: Wed, 2 Jul 2014 18:50:10 +0200
Message-ID: <1404319816-30229-5-git-send-email-aarcange@redhat.com>
In-Reply-To: <1404319816-30229-1-git-send-email-aarcange@redhat.com>

remap_anon_pages (unlike remap_file_pages) tries to be non-intrusive
in the rmap code.

As far as the rmap code is concerned, remap_anon_pages only alters
page->mapping and page->index, and it does so while holding the page
lock. However, a few places are allowed to do rmap walks on anon
pages without the page lock (split_huge_page and
page_referenced_anon). Those places must be updated to re-check that
page->mapping didn't change after they obtained the anon_vma lock.
remap_anon_pages takes the anon_vma lock for writing before altering
page->mapping, so if page->mapping is still the same after obtaining
the anon_vma lock (without the page lock), the rmap walks can go
ahead safely (and remap_anon_pages will wait for them to complete
before proceeding).
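
For illustration only, the lock-then-recheck pattern described above can
be sketched in userspace C. Everything here is an assumption made for the
example: struct fake_page, struct fake_anon_vma, lock_stable_mapping() and
remap_page() are hypothetical stand-ins, a pthread mutex replaces the
anon_vma rwsem, and the refcounting and RCU that the real code relies on
are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct anon_vma: just a lock. */
struct fake_anon_vma {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for struct page: only the mapping pointer. */
struct fake_page {
	_Atomic(struct fake_anon_vma *) mapping;
};

/*
 * Lock the object page->mapping points to without holding any page
 * lock. The mapping is sampled, the lock is taken, and the mapping is
 * sampled again: if it changed in between, the wrong object was
 * locked, so drop the lock and retry.
 * Returns the locked object, or NULL if the page has no mapping.
 */
static struct fake_anon_vma *lock_stable_mapping(struct fake_page *page)
{
	for (;;) {
		struct fake_anon_vma *av = atomic_load(&page->mapping);

		if (!av)
			return NULL;
		pthread_mutex_lock(&av->lock);
		if (atomic_load(&page->mapping) == av)
			return av;	/* mapping is stable while we hold the lock */
		pthread_mutex_unlock(&av->lock);
	}
}

/*
 * Writer side: change page->mapping only while holding the lock of the
 * current mapping, so walkers that already hold it finish first.
 * Assumes a single writer at a time (the patch uses the page lock for
 * that).
 */
static void remap_page(struct fake_page *page, struct fake_anon_vma *to)
{
	struct fake_anon_vma *from = atomic_load(&page->mapping);

	assert(from != NULL && to != NULL);
	pthread_mutex_lock(&from->lock);
	atomic_store(&page->mapping, to);
	pthread_mutex_unlock(&from->lock);
}
```

A walker would call lock_stable_mapping(), do its walk, and then unlock
the returned object. A walker that loses the race with remap_page() simply
loops and locks the new mapping instead.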

remap_anon_pages serializes against itself with the page lock.

All other places that take the anon_vma lock while holding the
mmap_sem for writing don't need to check whether page->mapping has
changed after taking the anon_vma lock, regardless of the page lock,
because remap_anon_pages holds the mmap_sem for reading.

Overall this looks like a fairly small change to the rmap code,
notably less intrusive than the nonlinear vmas created by
remap_file_pages.

There's one constraint enforced to allow this simplification: the
source pages passed to remap_anon_pages must be mapped in only one
vma, but this is not a limitation when used to handle userland page
faults with MADV_USERFAULT. The source addresses passed to
remap_anon_pages should be set as VM_DONTCOPY with MADV_DONTFORK to
avoid any risk of the mapcount of the pages increasing if fork runs
in parallel in another thread, before or while remap_anon_pages runs.
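
As a rough userland sketch of the MADV_DONTFORK recommendation above, a
source buffer could be prepared as follows. The helper name
alloc_dontfork_region() is hypothetical; only the standard mmap(2) and
madvise(2) calls are used, and the proposed remap_anon_pages syscall
itself is not invoked.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of private anonymous memory and
 * mark the range MADV_DONTFORK (VM_DONTCOPY in the kernel), so that a
 * fork() in another thread does not copy the range into the child and
 * cannot raise the mapcount of its pages.
 * Returns the mapping, or NULL on failure.
 */
static void *alloc_dontfork_region(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

The madvise call is made right after the mmap, before the buffer is handed
to any other thread, so there is no window in which a fork could duplicate
the mapping.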
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
mm/huge_memory.c | 24 ++++++++++++++++++++----
mm/rmap.c | 9 +++++++++
2 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1928463..94c37ca 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1907,6 +1907,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct anon_vma *anon_vma;
int ret = 1;
+ struct address_space *mapping;
BUG_ON(is_huge_zero_page(page));
BUG_ON(!PageAnon(page));
@@ -1918,10 +1919,24 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
* page_lock_anon_vma_read except the write lock is taken to serialise
* against parallel split or collapse operations.
*/
- anon_vma = page_get_anon_vma(page);
- if (!anon_vma)
- goto out;
- anon_vma_lock_write(anon_vma);
+ for (;;) {
+ mapping = ACCESS_ONCE(page->mapping);
+ anon_vma = page_get_anon_vma(page);
+ if (!anon_vma)
+ goto out;
+ anon_vma_lock_write(anon_vma);
+ /*
+ * We don't hold the page lock here so
+ * remap_anon_pages_huge_pmd can change the anon_vma
+ * from under us until we obtain the anon_vma
+ * lock. Verify that we obtained the anon_vma lock
+ * before remap_anon_pages did.
+ */
+ if (likely(mapping == ACCESS_ONCE(page->mapping)))
+ break;
+ anon_vma_unlock_write(anon_vma);
+ put_anon_vma(anon_vma);
+ }
ret = 0;
if (!PageCompound(page))
@@ -2420,6 +2435,7 @@ static void collapse_huge_page(struct mm_struct *mm,
* Prevent all access to pagetables with the exception of
* gup_fast later hanlded by the ptep_clear_flush and the VM
* handled by the anon_vma lock + PG_lock.
+ * remap_anon_pages is prevented from racing as well by the mmap_sem.
*/
down_write(&mm->mmap_sem);
if (unlikely(khugepaged_test_exit(mm)))
diff --git a/mm/rmap.c b/mm/rmap.c
index b7e94eb..59a7e7d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -450,6 +450,7 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
struct anon_vma *root_anon_vma;
unsigned long anon_mapping;
+repeat:
rcu_read_lock();
anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
@@ -488,6 +489,14 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
rcu_read_unlock();
anon_vma_lock_read(anon_vma);
+ /* check if remap_anon_pages changed the anon_vma */
+ if (unlikely((unsigned long) ACCESS_ONCE(page->mapping) != anon_mapping)) {
+ anon_vma_unlock_read(anon_vma);
+ put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ goto repeat;
+ }
+
if (atomic_dec_and_test(&anon_vma->refcount)) {
/*
* Oops, we held the last refcount, release the lock