linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: qemu-devel@nongnu.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-api@vger.kernel.org,
	Android Kernel Team <kernel-team@android.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Pavel Emelyanov <xemul@parallels.com>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	zhang.zhanghailiang@huawei.com,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Dave Hansen <dave@sr71.net>, Paolo Bonzini <pbonzini@redhat.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Andy Lutomirski <luto@amacapital.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Peter Feiner <pfeiner@google.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Christopher Covington <cov@codeaurora.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Robert Love <rlove@google.com>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Neil Brown <neilb@suse.de>, Mike Hommey <mh@glandium.org>,
	Taras Glek <tglek@mozilla.com>, Jan Kara <jack@suse.cz>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Michel Lespinasse <walken@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Keith Packard <keithp@keithp.com>,
	"Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
	Anthony Liguori <anthony@codemonkey.ws>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	Wenchao Xia <wenchaoqemu@gmail.com>,
	Andrew Jones <drjones@redhat.com>,
	Juan Quintela <quintela@redhat.com>
Subject: [PATCH 16/21] userfaultfd: remap_pages: rmap preparation
Date: Thu,  5 Mar 2015 18:17:59 +0100	[thread overview]
Message-ID: <1425575884-2574-17-git-send-email-aarcange@redhat.com> (raw)
In-Reply-To: <1425575884-2574-1-git-send-email-aarcange@redhat.com>

As far as the rmap code is concerned, rmap_pages only alters the
page->mapping and page->index. It does it while holding the page
lock. However there are a few places that in presence of anon pages
are allowed to do rmap walks without the page lock (split_huge_page
and page_referenced_anon). Those places that are doing rmap walks
without taking the page lock first, must be updated to re-check that
the page->mapping didn't change after they obtained the anon_vma
lock. remap_pages takes the anon_vma lock for writing before altering
the page->mapping, so if the page->mapping is still the same after
obtaining the anon_vma lock (without the page lock), the rmap walks
can go ahead safely (and remap_pages will wait them to complete before
proceeding).

remap_pages serializes against itself with the page lock.

All other places taking the anon_vma lock while holding the mmap_sem
for writing, don't need to check if the page->mapping has changed
after taking the anon_vma lock, regardless of the page lock, because
remap_pages holds the mmap_sem for reading.

There's one constraint enforced to allow this simplification: the
source pages passed to remap_pages must be mapped only in one vma, but
this is not a limitation when used to handle userland page faults. The
source addresses passed to remap_pages should be set as VM_DONTCOPY
with MADV_DONTFORK to avoid any risk of the mapcount of the pages
increasing, if fork runs in parallel in another thread, before or
while remap_pages runs.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/huge_memory.c | 23 +++++++++++++++++++----
 mm/rmap.c        |  9 +++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8f1b6a5..1e25cb3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1902,6 +1902,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct anon_vma *anon_vma;
 	int ret = 1;
+	struct address_space *mapping;
 
 	BUG_ON(is_huge_zero_page(page));
 	BUG_ON(!PageAnon(page));
@@ -1913,10 +1914,24 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	 * page_lock_anon_vma_read except the write lock is taken to serialise
 	 * against parallel split or collapse operations.
 	 */
-	anon_vma = page_get_anon_vma(page);
-	if (!anon_vma)
-		goto out;
-	anon_vma_lock_write(anon_vma);
+	for (;;) {
+		mapping = ACCESS_ONCE(page->mapping);
+		anon_vma = page_get_anon_vma(page);
+		if (!anon_vma)
+			goto out;
+		anon_vma_lock_write(anon_vma);
+		/*
+		 * We don't hold the page lock here so
+		 * remap_pages_huge_pmd can change the anon_vma from
+		 * under us until we obtain the anon_vma lock. Verify
+		 * that we obtained the anon_vma lock before
+		 * remap_pages did.
+		 */
+		if (likely(mapping == ACCESS_ONCE(page->mapping)))
+			break;
+		anon_vma_unlock_write(anon_vma);
+		put_anon_vma(anon_vma);
+	}
 
 	ret = 0;
 	if (!PageCompound(page))
diff --git a/mm/rmap.c b/mm/rmap.c
index 5e3e090..5ab2df1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -492,6 +492,7 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	struct anon_vma *root_anon_vma;
 	unsigned long anon_mapping;
 
+repeat:
 	rcu_read_lock();
 	anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
 	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
@@ -530,6 +531,14 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
 	rcu_read_unlock();
 	anon_vma_lock_read(anon_vma);
 
+	/* check if remap_anon_pages changed the anon_vma */
+	if (unlikely((unsigned long) ACCESS_ONCE(page->mapping) != anon_mapping)) {
+		anon_vma_unlock_read(anon_vma);
+		put_anon_vma(anon_vma);
+		anon_vma = NULL;
+		goto repeat;
+	}
+
 	if (atomic_dec_and_test(&anon_vma->refcount)) {
 		/*
 		 * Oops, we held the last refcount, release the lock

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2015-03-05 17:19 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-05 17:17 [PATCH 00/21] RFC: userfaultfd v3 Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 01/21] userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 02/21] userfaultfd: linux/Documentation/vm/userfaultfd.txt Andrea Arcangeli
2015-03-06 15:39   ` [Qemu-devel] " Eric Blake
2015-03-05 17:17 ` [PATCH 03/21] userfaultfd: uAPI Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 04/21] userfaultfd: linux/userfaultfd_k.h Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 05/21] userfaultfd: add vm_userfaultfd_ctx to the vm_area_struct Andrea Arcangeli
2015-03-05 17:48   ` Pavel Emelyanov
2015-03-05 17:17 ` [PATCH 06/21] userfaultfd: add VM_UFFD_MISSING and VM_UFFD_WP Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 07/21] userfaultfd: call handle_userfault() for userfaultfd_missing() faults Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 08/21] userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 09/21] userfaultfd: prevent khugepaged to merge if userfaultfd is armed Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 10/21] userfaultfd: add new syscall to provide memory externalization Andrea Arcangeli
2015-03-05 17:57   ` Pavel Emelyanov
2015-03-06 10:48   ` Michael Kerrisk (man-pages)
2015-03-05 17:17 ` [PATCH 11/21] userfaultfd: buildsystem activation Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 12/21] userfaultfd: activate syscall Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 13/21] userfaultfd: UFFDIO_COPY|UFFDIO_ZEROPAGE uAPI Andrea Arcangeli
2015-03-05 17:17 ` [PATCH 14/21] userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation Andrea Arcangeli
2015-03-05 18:07   ` Pavel Emelyanov
2015-03-05 17:17 ` [PATCH 15/21] userfaultfd: UFFDIO_COPY and UFFDIO_ZEROPAGE Andrea Arcangeli
2015-03-05 17:17 ` Andrea Arcangeli [this message]
2015-03-05 17:18 ` [PATCH 17/21] userfaultfd: remap_pages: swp_entry_swapcount() preparation Andrea Arcangeli
2015-03-05 17:18 ` [PATCH 18/21] userfaultfd: UFFDIO_REMAP uABI Andrea Arcangeli
2015-03-05 17:18 ` [PATCH 19/21] userfaultfd: remap_pages: UFFDIO_REMAP preparation Andrea Arcangeli
2015-03-05 17:39   ` Linus Torvalds
2015-03-05 18:51     ` Andrea Arcangeli
2015-03-05 19:32       ` Linus Torvalds
2015-03-05 18:01   ` Pavel Emelyanov
2015-03-05 17:18 ` [PATCH 20/21] userfaultfd: UFFDIO_REMAP Andrea Arcangeli
2015-03-05 17:18 ` [PATCH 21/21] userfaultfd: add userfaultfd_wp mm helpers Andrea Arcangeli
2015-03-05 18:15 ` [PATCH 00/21] RFC: userfaultfd v3 Pavel Emelyanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1425575884-2574-17-git-send-email-aarcange@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=anthony@codemonkey.ws \
    --cc=cov@codeaurora.org \
    --cc=dave@sr71.net \
    --cc=dgilbert@redhat.com \
    --cc=dmitry.adamushko@gmail.com \
    --cc=drjones@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=keithp@keithp.com \
    --cc=kernel-team@android.com \
    --cc=kirill@shutemov.name \
    --cc=kosaki.motohiro@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mgorman@suse.de \
    --cc=mh@glandium.org \
    --cc=minchan@kernel.org \
    --cc=neilb@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peter.huangpeng@huawei.com \
    --cc=pfeiner@google.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=riel@redhat.com \
    --cc=rlove@google.com \
    --cc=sanidhya.gatech@gmail.com \
    --cc=sasha.levin@oracle.com \
    --cc=stefanha@gmail.com \
    --cc=tglek@mozilla.com \
    --cc=torvalds@linux-foundation.org \
    --cc=walken@google.com \
    --cc=wenchaoqemu@gmail.com \
    --cc=xemul@parallels.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).