From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jiri Slaby <jslaby@suse.cz>
Subject: [PATCH 3.14 11/11] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
Date: Fri,  9 Sep 2016 17:33:46 +0200
Message-ID: <20160909153157.937965065@linuxfoundation.org>
In-Reply-To: <20160909153156.152470606@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrea Arcangeli <aarcange@redhat.com>

commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 upstream.

pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessly (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can keep transitioning
between pmd_none() and pmd_trans_huge() from under us, while we hold
the mmap_sem only for reading (not for writing).

While we hold the mmap_sem only for reading, MADV_DONTNEED can run from
under us, so before we can assume the pmd is a regular, stable pmd we
need to check it against both pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable().  The old pmd_trans_huge()-only
check left a tiny window for a race.
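A minimal userspace sketch of why both tests have to be applied to a
single read of the pmd. It assumes a made-up pmd encoding
(PMD_NONE_VAL, PMD_HUGE_BIT, PMD_TABLE_BIT) and models the concurrent
thread with a scripted sequence of loads; struct racy_pmd, racy_load(),
unstable_two_reads() and unstable_one_snapshot() are hypothetical names,
not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical pmd encoding for illustration; not the kernel's layout. */
typedef uint64_t pmd_val_t;
#define PMD_NONE_VAL  ((pmd_val_t)0x0)  /* empty entry */
#define PMD_HUGE_BIT  ((pmd_val_t)0x1)  /* entry maps a huge page directly */
#define PMD_TABLE_BIT ((pmd_val_t)0x2)  /* entry points to a page of ptes */

static int pmd_is_none(pmd_val_t v) { return v == PMD_NONE_VAL; }
static int pmd_is_huge(pmd_val_t v) { return (v & PMD_HUGE_BIT) != 0; }

/*
 * Models a pmd that another thread keeps rewriting: each load returns the
 * next value from a script, so two loads may observe two different values.
 */
struct racy_pmd {
	const pmd_val_t *script;
	int next;
};

static pmd_val_t racy_load(struct racy_pmd *p)
{
	return p->script[p->next++];
}

/* Re-reads the entry for each test, so the two tests can disagree. */
static int unstable_two_reads(struct racy_pmd *p)
{
	return pmd_is_none(racy_load(p)) || pmd_is_huge(racy_load(p));
}

/* Reads the entry once and applies both tests to that one value. */
static int unstable_one_snapshot(struct racy_pmd *p)
{
	pmd_val_t v = racy_load(p);

	return pmd_is_none(v) || pmd_is_huge(v);
}
```

With the script { PMD_HUGE_BIT, PMD_NONE_VAL } (a huge pmd that
MADV_DONTNEED zaps between the two reads), unstable_two_reads() returns
0 and would let the caller treat the entry as a regular pmd, while
unstable_one_snapshot() returns 1.  In the kernel the single snapshot
comes from the atomic read that pmd_trans_unstable() performs; the
scripted load only imitates it.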

Useful applications are unlikely to notice the difference, as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.

[js] 3.12 backport: no pmd_devmap in 3.12 yet.

[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 mm/memory.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3770,8 +3770,18 @@ static int __handle_mm_fault(struct mm_s
 	if (unlikely(pmd_none(*pmd)) &&
 	    unlikely(__pte_alloc(mm, vma, pmd, address)))
 		return VM_FAULT_OOM;
-	/* if an huge pmd materialized from under us just retry later */
-	if (unlikely(pmd_trans_huge(*pmd)))
+	/*
+	 * If a huge pmd materialized under us just retry later.  Use
+	 * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
+	 * didn't become pmd_trans_huge under us and then back to pmd_none, as
+	 * a result of MADV_DONTNEED running immediately after a huge pmd fault
+	 * in a different thread of this mm, in turn leading to a misleading
+	 * pmd_trans_huge() retval.  All we have to ensure is that it is a
+	 * regular pmd that we can walk with pte_offset_map() and we can do that
+	 * through an atomic read in C, which is what pmd_trans_unstable()
+	 * provides.
+	 */
+	if (unlikely(pmd_trans_unstable(pmd)))
 		return 0;
 	/*
 	 * A regular pmd is established and it can't morph into a huge pmd
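The change in the hunk above can be illustrated by comparing the
decision the old and new tests make once the pmd has become huge and
then gone back to none.  The pmd encoding, enum fault_action,
old_check() and new_check() are hypothetical stand-ins for
pmd_trans_huge() and pmd_trans_unstable() on a real pmd_t, and the
sketch omits the bad-entry handling implied by the name
pmd_none_or_trans_huge_or_clear_bad().

```c
#include <stdint.h>

/* Hypothetical pmd encoding for illustration; not the kernel's layout. */
typedef uint64_t pmd_val_t;
#define PMD_NONE_VAL  ((pmd_val_t)0x0)  /* empty entry */
#define PMD_HUGE_BIT  ((pmd_val_t)0x1)  /* entry maps a huge page directly */
#define PMD_TABLE_BIT ((pmd_val_t)0x2)  /* entry points to a page of ptes */

enum fault_action {
	FAULT_RETRY,      /* return 0 and let the access fault again later */
	FAULT_WALK_PTES,  /* go on to pte_offset_map() and handle the pte */
};

/* Old test: bail out only if the entry is huge at this instant. */
static enum fault_action old_check(pmd_val_t v)
{
	if (v & PMD_HUGE_BIT)
		return FAULT_RETRY;
	return FAULT_WALK_PTES;	/* also taken when v is none: the bug */
}

/* New test: walk ptes only when the entry is neither none nor huge. */
static enum fault_action new_check(pmd_val_t v)
{
	if (v == PMD_NONE_VAL || (v & PMD_HUGE_BIT))
		return FAULT_RETRY;
	return FAULT_WALK_PTES;
}
```

After the huge-then-none transition the check sees PMD_NONE_VAL:
old_check() returns FAULT_WALK_PTES, i.e. it would walk a pmd with no
pte table behind it, while new_check() returns FAULT_RETRY.  For
PMD_TABLE_BIT both return FAULT_WALK_PTES, so in this sketch the
regular path is unchanged.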


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160909153157.937965065@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=jslaby@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.