From: <gregkh@linuxfoundation.org>
To: dan.j.williams@intel.com, akpm@linux-foundation.org,
	darrick.wong@oracle.com, dave.hansen@intel.com,
	dave.jiang@intel.com, eguan@redhat.com,
	gregkh@linuxfoundation.org, hch@lst.de, jack@suse.cz,
	kirill.shutemov@linux.intel.com, mawilcox@microsoft.com,
	pawel.lebioda@intel.com, ross.zwisler@linux.intel.com,
	stable@vger.kernel.org, torvalds@linux-foundation.org,
	viro@zeniv.linux.org.uk, xzhou@redhat.com
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "mm: avoid spurious 'bad pmd' warning messages" has been added to the 4.9-stable tree
Date: Mon, 26 Feb 2018 20:58:18 +0100
Message-ID: <151967509843252@kroah.com>
In-Reply-To: <151942352781.21775.15841303754448120195.stgit@dwillia2-desk3.amr.corp.intel.com>


This is a note to let you know that I've just added the patch titled

    mm: avoid spurious 'bad pmd' warning messages

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-avoid-spurious-bad-pmd-warning-messages.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From foo@baz Mon Feb 26 20:55:53 CET 2018
From: Dan Williams <dan.j.williams@intel.com>
Date: Fri, 23 Feb 2018 14:05:27 -0800
Subject: mm: avoid spurious 'bad pmd' warning messages
To: gregkh@linuxfoundation.org
Cc: Jan Kara <jack@suse.cz>, Eryu Guan <eguan@redhat.com>, Xiong Zhou <xzhou@redhat.com>, linux-kernel@vger.kernel.org, Matthew Wilcox <mawilcox@microsoft.com>, Christoph Hellwig <hch@lst.de>, stable@vger.kernel.org, Pawel Lebioda <pawel.lebioda@intel.com>, Dave Hansen <dave.hansen@intel.com>, Alexander Viro <viro@zeniv.linux.org.uk>, Ross Zwisler <ross.zwisler@linux.intel.com>, Dave Jiang <dave.jiang@intel.com>, Andrew Morton <akpm@linux-foundation.org>, Linus Torvalds <torvalds@linux-foundation.org>, "Darrick J. Wong" <darrick.wong@oracle.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Message-ID: <151942352781.21775.15841303754448120195.stgit@dwillia2-desk3.amr.corp.intel.com>

From: Ross Zwisler <ross.zwisler@linux.intel.com>

commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream.

When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks.  So, things like:

  -       if (pmd_trans_huge(*pmd))
  +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))

When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b5c ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:

  +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))

This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:

  mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)

Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.
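
As an illustration only, the effect of the reordering can be sketched in
user space. Everything below is a hypothetical model and not kernel code:
struct model_pmd, the model_* helpers and the bad_pmd_warnings counter are
invented names, and model_pmd_trans_unstable() merely assumes the behaviour
described above (it returns 1 for a devmap pmd, but only after warning).
The point is that || short-circuits, so testing devmap first means the
warning path is never reached for DAX huge pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for a pmd entry; not the kernel's pmd_t. */
struct model_pmd {
	bool devmap;	/* models _PAGE_DEVMAP being set */
	bool huge;	/* models a transparent huge pmd */
};

static int bad_pmd_warnings;	/* counts modelled "bad pmd" messages */

/* Models pmd_devmap(): a pure predicate with no side effects. */
static int model_pmd_devmap(struct model_pmd pmd)
{
	return pmd.devmap;
}

/*
 * Models pmd_trans_unstable() as described in the commit message: it
 * returns 1 for anything that is not a regular pmd, but "warns" first
 * when it is handed a devmap entry. This is an assumption-level
 * simplification of the real helper.
 */
static int model_pmd_trans_unstable(const struct model_pmd *pmd)
{
	if (pmd->devmap) {
		bad_pmd_warnings++;
		return 1;
	}
	return pmd->huge;
}

/* Old ordering: the warning path runs before the devmap test. */
static int old_check(const struct model_pmd *pmd)
{
	return model_pmd_trans_unstable(pmd) || model_pmd_devmap(*pmd);
}

/* New ordering: || short-circuits, so devmap pmds never reach the warning. */
static int new_check(const struct model_pmd *pmd)
{
	return model_pmd_devmap(*pmd) || model_pmd_trans_unstable(pmd);
}

/* Runs both orderings on a devmap pmd and prints the warning counts. */
static void run_demo(void)
{
	struct model_pmd dax = { .devmap = true, .huge = false };

	bad_pmd_warnings = 0;
	assert(old_check(&dax) == 1);
	printf("old ordering: %d warning(s)\n", bad_pmd_warnings);

	bad_pmd_warnings = 0;
	assert(new_check(&dax) == 1);
	printf("new ordering: %d warning(s)\n", bad_pmd_warnings);
}
```

Both orderings return the same value for every input in this model; only
the side effect differs, which matches the "we do end up doing the right
thing" observation above.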

Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory.c |   40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,6 +2848,17 @@ static int __do_fault(struct fault_env *
 	return ret;
 }
 
+/*
+ * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
+ * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
 static int pte_alloc_one_map(struct fault_env *fe)
 {
 	struct vm_area_struct *vma = fe->vma;
@@ -2871,18 +2882,27 @@ static int pte_alloc_one_map(struct faul
 map_pte:
 	/*
 	 * If a huge pmd materialized under us just retry later.  Use
-	 * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
-	 * didn't become pmd_trans_huge under us and then back to pmd_none, as
-	 * a result of MADV_DONTNEED running immediately after a huge pmd fault
-	 * in a different thread of this mm, in turn leading to a misleading
-	 * pmd_trans_huge() retval.  All we have to ensure is that it is a
-	 * regular pmd that we can walk with pte_offset_map() and we can do that
-	 * through an atomic read in C, which is what pmd_trans_unstable()
-	 * provides.
+	 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
+	 * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
+	 * under us and then back to pmd_none, as a result of MADV_DONTNEED
+	 * running immediately after a huge pmd fault in a different thread of
+	 * this mm, in turn leading to a misleading pmd_trans_huge() retval.
+	 * All we have to ensure is that it is a regular pmd that we can walk
+	 * with pte_offset_map() and we can do that through an atomic read in
+	 * C, which is what pmd_trans_unstable() provides.
 	 */
-	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+	if (pmd_devmap_trans_unstable(fe->pmd))
 		return VM_FAULT_NOPAGE;
 
+	/*
+	 * At this point we know that our vmf->pmd points to a page of ptes
+	 * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
+	 * for the duration of the fault.  If a racing MADV_DONTNEED runs and
+	 * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
+	 * be valid and we will re-check to make sure the vmf->pte isn't
+	 * pte_none() under vmf->ptl protection when we return to
+	 * alloc_set_pte().
+	 */
 	fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
 			&fe->ptl);
 	return 0;
@@ -3456,7 +3476,7 @@ static int handle_pte_fault(struct fault
 		fe->pte = NULL;
 	} else {
 		/* See comment in pte_alloc_one_map() */
-		if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+		if (pmd_devmap_trans_unstable(fe->pmd))
 			return 0;
 		/*
 		 * A regular pmd is established and it can't morph into a huge


Patches currently in stable-queue which might be from dan.j.williams@intel.com are

queue-4.9/mm-fix-devm_memremap_pages-collision-handling.patch
queue-4.9/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.9/mm-avoid-spurious-bad-pmd-warning-messages.patch
queue-4.9/mm-introduce-get_user_pages_longterm.patch
queue-4.9/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.9/fs-dax.c-fix-inefficiency-in-dax_writeback_mapping_range.patch
queue-4.9/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.9/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.9/libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
queue-4.9/x86-entry-64-clear-extra-registers-beyond-syscall-arguments-to-reduce-speculation-attack-surface.patch
queue-4.9/libnvdimm-fix-integer-overflow-static-analysis-warning.patch
