From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Rientjes <rientjes@google.com>,
	Holger Kiehl <Holger.Kiehl@dwd.de>,
	Christoph Lameter <cl@linux.com>,
	Rafael Aquini <aquini@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.cz>,
	Mel Gorman <mgorman@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.13 20/22] mm: close PageTail race
Date: Mon, 31 Mar 2014 21:08:50 -0700
Message-ID: <20140401040708.038012110@linuxfoundation.org>
In-Reply-To: <20140401040703.045139933@linuxfoundation.org>

3.13-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit 668f9abbd4334e6c29fa8acd71635c4f9101caa7 upstream.

Commit bf6bddf1924e ("mm: introduce compaction and migration for
ballooned pages") introduces a page_count(page) call into memory
compaction; page_count() dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that calls
compound_head(), including page_count(), is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.
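To make the NULL case concrete, here is a deterministic, single-threaded C model of the pre-patch ordering. It is a sketch, not kernel code: `struct model_page`, `old_compound_head()`, `old_prep_step1()` and `old_prep_step2()` are hypothetical names. The old prep_compound_page() called __SetPageTail(p) before assigning p->first_page, so a reader that runs between those two stores sees PageTail set and follows a NULL pointer. On real hardware the "reader in between" is another CPU, and store reordering can open the same window even when the source order is correct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the two fields that matter. */
struct model_page {
	bool tail;                     /* models PG_tail */
	struct model_page *first_page; /* models page->first_page */
};

/* Pre-patch compound_head(): trusts first_page as soon as PageTail is seen. */
static struct model_page *old_compound_head(struct model_page *page)
{
	if (page->tail)
		return page->first_page;
	return page;
}

/*
 * Pre-patch prep_compound_page() ordering for one tail page, split into
 * two explicit steps so that a reader can be "scheduled" between them.
 */
static void old_prep_step1(struct model_page *p)
{
	p->tail = true;             /* __SetPageTail(p) came first ... */
}

static void old_prep_step2(struct model_page *p, struct model_page *head)
{
	p->first_page = head;       /* ... and p->first_page was set afterwards */
}
```

Calling old_prep_step1() and then old_compound_head() on the tail page returns NULL; only after old_prep_step2() does it return the head page.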

This patch takes Andrea's implementation of compound_trans_head(), which
deals with such a race, and makes it the default compound_head()
implementation.  It includes a read memory barrier that ensures that,
if PageTail(page) is still true on the recheck, we return a head page
that is neither NULL nor dangling.  The patch then adds a store memory
barrier to prep_compound_page() to ensure page->first_page is set
before PageTail is set on the tail page.
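The barrier pairing described above can be sketched with C11 atomics in user space. This is an analogy under stated assumptions, not the kernel implementation: `struct sketch_page` and the `sketch_*` functions are hypothetical, `atomic_thread_fence(memory_order_release)` stands in for smp_wmb(), and `atomic_thread_fence(memory_order_acquire)` stands in for smp_rmb(). The first tail load is an acquire load so that, in the C11 model, a visible tail flag formally implies a visible first_page; the kernel's compound_head() does a plain PageTail() test at that point. sketch_split_one() follows the ordering stated in the removed compound_trans_head() comment (PageTail cleared before first_page is overwritten); that the real split path uses exactly this barrier is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; the head flag is omitted. */
struct sketch_page {
	atomic_bool tail;                          /* models PG_tail */
	_Atomic(struct sketch_page *) first_page;  /* models page->first_page */
};

static void sketch_init(struct sketch_page *pages, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++) {
		atomic_init(&pages[i].tail, false);
		atomic_init(&pages[i].first_page, NULL);
	}
}

/* Writer, modelled on the patched prep_compound_page(). */
static void sketch_prep_compound(struct sketch_page *pages, size_t nr_pages)
{
	assert(nr_pages >= 1);
	for (size_t i = 1; i < nr_pages; i++) {
		struct sketch_page *p = &pages[i];

		atomic_store_explicit(&p->first_page, &pages[0],
				      memory_order_relaxed);
		/* smp_wmb() analogue: first_page is published before tail. */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&p->tail, true, memory_order_relaxed);
	}
}

/* Hypothetical splitter: clear the tail flag, then reuse first_page. */
static void sketch_split_one(struct sketch_page *p, struct sketch_page *other)
{
	atomic_store_explicit(&p->tail, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->first_page, other, memory_order_relaxed);
}

/* Reader, modelled on the patched compound_head(). */
static struct sketch_page *sketch_compound_head(struct sketch_page *page)
{
	/* Acquire pairs with the release fence in sketch_prep_compound(). */
	if (atomic_load_explicit(&page->tail, memory_order_acquire)) {
		struct sketch_page *head =
			atomic_load_explicit(&page->first_page,
					     memory_order_relaxed);

		/*
		 * smp_rmb() analogue: if the load above saw the splitter's
		 * overwrite, the recheck below is guaranteed to see the
		 * cleared tail flag, so a stale head is never returned.
		 */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&page->tail, memory_order_relaxed))
			return head;
	}
	return page;
}
```

Run single-threaded, sketch_compound_head() returns the head for a prepared tail page and the page itself after sketch_split_one(); the fences only change behaviour when writer and reader run on different threads.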

This is the safest way to ensure we see the head page that we are
expecting.  PageTail(page) is already in the unlikely() path, and the
memory barriers are unfortunately required.

Hugetlbfs is the exception: we don't enforce a store memory barrier
during its initialization, since no race is possible there.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/block/aoe/aoecmd.c      |    4 ++--
 drivers/vfio/vfio_iommu_type1.c |    4 ++--
 fs/proc/page.c                  |    2 +-
 include/linux/huge_mm.h         |   18 ------------------
 include/linux/mm.h              |   14 ++++++++++++--
 mm/ksm.c                        |    2 +-
 mm/memory-failure.c             |    2 +-
 mm/page_alloc.c                 |    4 +++-
 mm/swap.c                       |    4 ++--
 9 files changed, 24 insertions(+), 30 deletions(-)

--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -905,7 +905,7 @@ bio_pageinc(struct bio *bio)
 		/* Non-zero page count for non-head members of
 		 * compound pages is no longer allowed by the kernel.
 		 */
-		page = compound_trans_head(bv->bv_page);
+		page = compound_head(bv->bv_page);
 		atomic_inc(&page->_count);
 	}
 }
@@ -918,7 +918,7 @@ bio_pagedec(struct bio *bio)
 	int i;
 
 	bio_for_each_segment(bv, bio, i) {
-		page = compound_trans_head(bv->bv_page);
+		page = compound_head(bv->bv_page);
 		atomic_dec(&page->_count);
 	}
 }
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -186,12 +186,12 @@ static bool is_invalid_reserved_pfn(unsi
 	if (pfn_valid(pfn)) {
 		bool reserved;
 		struct page *tail = pfn_to_page(pfn);
-		struct page *head = compound_trans_head(tail);
+		struct page *head = compound_head(tail);
 		reserved = !!(PageReserved(head));
 		if (head != tail) {
 			/*
 			 * "head" is not a dangling pointer
-			 * (compound_trans_head takes care of that)
+			 * (compound_head takes care of that)
 			 * but the hugepage may have been split
 			 * from under us (and we may not hold a
 			 * reference count on the head page so it can
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -121,7 +121,7 @@ u64 stable_page_flags(struct page *page)
 	 * just checks PG_head/PG_tail, so we need to check PageLRU to make
 	 * sure a given page is a thp, not a non-huge compound page.
 	 */
-	else if (PageTransCompound(page) && PageLRU(compound_trans_head(page)))
+	else if (PageTransCompound(page) && PageLRU(compound_head(page)))
 		u |= 1 << KPF_THP;
 
 	/*
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -157,23 +157,6 @@ static inline int hpage_nr_pages(struct
 		return HPAGE_PMD_NR;
 	return 1;
 }
-static inline struct page *compound_trans_head(struct page *page)
-{
-	if (PageTail(page)) {
-		struct page *head;
-		head = page->first_page;
-		smp_rmb();
-		/*
-		 * head may be a dangling pointer.
-		 * __split_huge_page_refcount clears PageTail before
-		 * overwriting first_page, so if PageTail is still
-		 * there it means the head pointer isn't dangling.
-		 */
-		if (PageTail(page))
-			return head;
-	}
-	return page;
-}
 
 extern int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				unsigned long addr, pmd_t pmd, pmd_t *pmdp);
@@ -203,7 +186,6 @@ static inline int split_huge_page(struct
 	do { } while (0)
 #define split_huge_page_pmd_mm(__mm, __address, __pmd)	\
 	do { } while (0)
-#define compound_trans_head(page) compound_head(page)
 static inline int hugepage_madvise(struct vm_area_struct *vma,
 				   unsigned long *vm_flags, int advice)
 {
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -389,8 +389,18 @@ static inline void compound_unlock_irqre
 
 static inline struct page *compound_head(struct page *page)
 {
-	if (unlikely(PageTail(page)))
-		return page->first_page;
+	if (unlikely(PageTail(page))) {
+		struct page *head = page->first_page;
+
+		/*
+		 * page->first_page may be a dangling pointer to an old
+		 * compound page, so recheck that it is still a tail
+		 * page before returning.
+		 */
+		smp_rmb();
+		if (likely(PageTail(page)))
+			return head;
+	}
 	return page;
 }
 
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -444,7 +444,7 @@ static void break_cow(struct rmap_item *
 static struct page *page_trans_compound_anon(struct page *page)
 {
 	if (PageTransCompound(page)) {
-		struct page *head = compound_trans_head(page);
+		struct page *head = compound_head(page);
 		/*
 		 * head may actually be splitted and freed from under
 		 * us but it's ok here.
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1645,7 +1645,7 @@ int soft_offline_page(struct page *page,
 {
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
-	struct page *hpage = compound_trans_head(page);
+	struct page *hpage = compound_head(page);
 
 	if (PageHWPoison(page)) {
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -369,9 +369,11 @@ void prep_compound_page(struct page *pag
 	__SetPageHead(page);
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
-		__SetPageTail(p);
 		set_page_count(p, 0);
 		p->first_page = page;
+		/* Make sure p->first_page is always valid for PageTail() */
+		smp_wmb();
+		__SetPageTail(p);
 	}
 }
 
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -84,7 +84,7 @@ static void put_compound_page(struct pag
 {
 	if (unlikely(PageTail(page))) {
 		/* __split_huge_page_refcount can run under us */
-		struct page *page_head = compound_trans_head(page);
+		struct page *page_head = compound_head(page);
 
 		if (likely(page != page_head &&
 			   get_page_unless_zero(page_head))) {
@@ -222,7 +222,7 @@ bool __get_page_tail(struct page *page)
 	 */
 	unsigned long flags;
 	bool got = false;
-	struct page *page_head = compound_trans_head(page);
+	struct page *page_head = compound_head(page);
 
 	if (likely(page != page_head && get_page_unless_zero(page_head))) {
 		/* Ref to put_compound_page() comment. */


