linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Jan Vorlicek <janvorli@microsoft.com>,
	Aditya Mandaleeka <adityam@microsoft.com>
Subject: [PATCH 0/2] vma_merge vs rmap_walk SMP race condition fix
Date: Thu, 15 Sep 2016 19:41:42 +0200	[thread overview]
Message-ID: <1473961304-19370-1-git-send-email-aarcange@redhat.com> (raw)

Hello,

Aditya reported the CLR JIT was segfaulting on older kernels.

It turns out newer kernels have do_numa_page using pte_modify instead
of pte_mknotnuma, that by luck corrects any broken invariant of a
PAGE_NONE pte on a VM_READ|WRITE vma even on non-NUMA systems, and
establishes a proper pte with PAGE_USER set.

However the opposite version of the race condition can still happen
even upstream and that generates silent data corruption with a pte
that should be PAGE_NONE and is not.

The testcase source is on github.

https://github.com/dotnet/coreclr/tree/784f4d93d43f29cc52d41ec9fae5a96ad68633cb/src

To verify the bug upstream, I verified that do_page_numa indeed is
invoked on non-NUMA guests, despite NUMA balancing obviously was never
enabled on non-NUMA guests. With the fix applied do_numa_page is never
executed anymore.

-->	*pprev = vma_merge(mm, *pprev, start, end, newflags,
			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
			   vma->vm_userfaultfd_ctx);
	if (*pprev) {
		vma = *pprev;
		goto success;
[..]
success:
	/*
	 * vm_flags and vm_page_prot are protected by the mmap_sem
	 * held in write mode.
	 */
	vma->vm_flags = newflags;
	dirty_accountable = vma_wants_writenotify(vma);
-->	vma_set_page_prot(vma);

If remove_migration_ptes runs before vma_set_page_prot run but after
the current vma was extended and the "next" vma was removed, and the
page migration happened in an address in the "next->vma_start/end"
range, the pte of the "next" range gets established by
remove_migration_ptes using the vm_page_prot of the prev vma, but
change_protection of course won't run on the next range, but only in
the current vma range.

There are already fields in the vma that must be updated inside the
anon_vma/i_mmap locks, to avoid screwing the rmap_walk,
vm_start/vm_pgoff are two of them, but vm_page_prot/vm_flags got
overlooked and wasn't properly transferred from the removed_next vma
to the importer current vma, before releasing the rmap locks.

Now the bug happens because of vm_page_prot, but I copied over
vm_flags too as often it's used by maybe_mkwrite and other pte/pmd
manipulations that might run in rmap walks. It's just safer to update
both before releasing the rmap serializing locks.

This fixes the race and it is confirmed by do_numa_page never running
anymore as after mprotect(PROT_READ|WRITE) returns on non-NUMA
systems.

So no pte is left as PAGE_NONE by mistake (the other way around too
even though it's harder to verify as the other way is silent, but the
fix is obviously going to correct the reverse condition too).

Then there's patch 1/2 too where half updated vm_page_prot could
become visible to rmap walks but that is a theoretical problem noticed
only during code review that I fixed along the way. The real practical
fix that solves the pgtable going out of sync with vm_flags is 2/2.

Andrea Arcangeli (2):
  mm: vm_page_prot: update with WRITE_ONCE/READ_ONCE
  mm: vma_merge: fix race vm_page_prot race condition against rmap_walk

 mm/huge_memory.c |  2 +-
 mm/migrate.c     |  2 +-
 mm/mmap.c        | 23 +++++++++++++++++++----
 3 files changed, 21 insertions(+), 6 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

             reply	other threads:[~2016-09-15 17:41 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-15 17:41 Andrea Arcangeli [this message]
2016-09-15 17:41 ` [PATCH 1/2] mm: vm_page_prot: update with WRITE_ONCE/READ_ONCE Andrea Arcangeli
2016-09-15 18:27   ` Rik van Riel
2016-09-15 17:41 ` [PATCH 2/2] mm: vma_merge: fix race vm_page_prot race condition against rmap_walk Andrea Arcangeli
2016-09-15 18:28   ` Rik van Riel
2016-09-16 18:42   ` Hugh Dickins
2016-09-16 20:54     ` Andrea Arcangeli
2016-09-17 16:05       ` [PATCH 0/1] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk v2 Andrea Arcangeli
2016-09-17 16:05         ` [PATCH 1/1] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk Andrea Arcangeli
2016-09-18  0:36           ` Andrea Arcangeli
2016-09-19 18:25             ` [PATCH 1/2] " Andrea Arcangeli
2016-09-19 18:25               ` [PATCH 2/2] mm: vma_adjust: remove superfluous check for next not NULL Andrea Arcangeli
2016-09-22 10:36               ` [PATCH 1/2] mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk Hugh Dickins
2016-09-23 19:18                 ` Andrea Arcangeli
2016-09-23 20:25                   ` Hugh Dickins
2016-09-28  5:09               ` [lkp] [mm] 2129957506: kernel BUG at mm/mmap.c:329! kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1473961304-19370-1-git-send-email-aarcange@redhat.com \
    --to=aarcange@redhat.com \
    --cc=adityam@microsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=janvorli@microsoft.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).