From: <gregkh@linuxfoundation.org>
To: dan.j.williams@intel.com, akpm@linux-foundation.org,
darrick.wong@oracle.com, dave.hansen@intel.com,
dave.jiang@intel.com, eguan@redhat.com,
gregkh@linuxfoundation.org, hch@lst.de, jack@suse.cz,
kirill.shutemov@linux.intel.com, mawilcox@microsoft.com,
pawel.lebioda@intel.com, ross.zwisler@linux.intel.com,
stable@vger.kernel.org, torvalds@linux-foundation.org,
viro@zeniv.linux.org.uk, xzhou@redhat.com
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "mm: avoid spurious 'bad pmd' warning messages" has been added to the 4.9-stable tree
Date: Mon, 26 Feb 2018 20:58:18 +0100
Message-ID: <151967509843252@kroah.com>
In-Reply-To: <151942352781.21775.15841303754448120195.stgit@dwillia2-desk3.amr.corp.intel.com>
This is a note to let you know that I've just added the patch titled
mm: avoid spurious 'bad pmd' warning messages
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
mm-avoid-spurious-bad-pmd-warning-messages.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
>From foo@baz Mon Feb 26 20:55:53 CET 2018
From: Dan Williams <dan.j.williams@intel.com>
Date: Fri, 23 Feb 2018 14:05:27 -0800
Subject: mm: avoid spurious 'bad pmd' warning messages
To: gregkh@linuxfoundation.org
Cc: Jan Kara <jack@suse.cz>, Eryu Guan <eguan@redhat.com>, Xiong Zhou <xzhou@redhat.com>, linux-kernel@vger.kernel.org, Matthew Wilcox <mawilcox@microsoft.com>, Christoph Hellwig <hch@lst.de>, stable@vger.kernel.org, Pawel Lebioda <pawel.lebioda@intel.com>, Dave Hansen <dave.hansen@intel.com>, Alexander Viro <viro@zeniv.linux.org.uk>, Ross Zwisler <ross.zwisler@linux.intel.com>, Dave Jiang <dave.jiang@intel.com>, Andrew Morton <akpm@linux-foundation.org>, Linus Torvalds <torvalds@linux-foundation.org>, "Darrick J. Wong" <darrick.wong@oracle.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Message-ID: <151942352781.21775.15841303754448120195.stgit@dwillia2-desk3.amr.corp.intel.com>
From: Ross Zwisler <ross.zwisler@linux.intel.com>
commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream.
When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks. So, things like:
- if (pmd_trans_huge(*pmd))
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b5c ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:
+ if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1. So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:
mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.
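
The effect of the ordering can be illustrated outside the kernel. The
following is a minimal userspace sketch, not kernel code: mock_pmd_t, the
mock_* helpers and the warnings_printed counter are hypothetical stand-ins
invented for this illustration, and the mock only models the behaviour
described above (a DAX huge pmd makes the "unstable" check print a warning
and return 1). It relies on C's short-circuit evaluation of ||, which skips
the right-hand operand once the left-hand one is nonzero.

```c
#include <stdio.h>

/* Hypothetical model of the pmd states relevant to this patch. */
enum mock_pmd_kind {
	MOCK_PMD_NONE,        /* empty entry */
	MOCK_PMD_TABLE,       /* regular pmd pointing to a page of ptes */
	MOCK_PMD_TRANS_HUGE,  /* transparent huge page */
	MOCK_PMD_DEVMAP_HUGE, /* DAX huge page (models _PAGE_DEVMAP) */
};

typedef struct { enum mock_pmd_kind kind; } mock_pmd_t;

/* Counts how many "bad pmd" warnings the mock has emitted. */
static int warnings_printed;

/* Stand-in for pmd_devmap(): true only for DAX huge pages. */
static int mock_pmd_devmap(mock_pmd_t pmd)
{
	return pmd.kind == MOCK_PMD_DEVMAP_HUGE;
}

/*
 * Stand-in for pmd_trans_unstable(): none and trans-huge entries return 1
 * quietly, a devmap entry is treated as "bad" and returns 1 only after
 * printing a warning, and a regular table returns 0.
 */
static int mock_pmd_trans_unstable(const mock_pmd_t *pmd)
{
	switch (pmd->kind) {
	case MOCK_PMD_NONE:
	case MOCK_PMD_TRANS_HUGE:
		return 1;
	case MOCK_PMD_DEVMAP_HUGE:
		fprintf(stderr, "mock: bad pmd\n");
		warnings_printed++;
		return 1;
	default:
		return 0;
	}
}

/* Ordering before the patch: the noisy check runs first. */
static int old_order(const mock_pmd_t *pmd)
{
	return mock_pmd_trans_unstable(pmd) || mock_pmd_devmap(*pmd);
}

/* Ordering after the patch: devmap short-circuits the noisy check. */
static int new_order(const mock_pmd_t *pmd)
{
	return mock_pmd_devmap(*pmd) || mock_pmd_trans_unstable(pmd);
}
```

In this model both orderings return 1 for a MOCK_PMD_DEVMAP_HUGE entry, but
only old_order() increments warnings_printed, which mirrors the "right
result, noisy log" behaviour the patch removes.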
Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memory.c | 40 ++++++++++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 10 deletions(-)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,6 +2848,17 @@ static int __do_fault(struct fault_env *
return ret;
}
+/*
+ * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
+ * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+ return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
static int pte_alloc_one_map(struct fault_env *fe)
{
struct vm_area_struct *vma = fe->vma;
@@ -2871,18 +2882,27 @@ static int pte_alloc_one_map(struct faul
map_pte:
/*
* If a huge pmd materialized under us just retry later. Use
- * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
- * didn't become pmd_trans_huge under us and then back to pmd_none, as
- * a result of MADV_DONTNEED running immediately after a huge pmd fault
- * in a different thread of this mm, in turn leading to a misleading
- * pmd_trans_huge() retval. All we have to ensure is that it is a
- * regular pmd that we can walk with pte_offset_map() and we can do that
- * through an atomic read in C, which is what pmd_trans_unstable()
- * provides.
+ * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
+ * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
+ * under us and then back to pmd_none, as a result of MADV_DONTNEED
+ * running immediately after a huge pmd fault in a different thread of
+ * this mm, in turn leading to a misleading pmd_trans_huge() retval.
+ * All we have to ensure is that it is a regular pmd that we can walk
+ * with pte_offset_map() and we can do that through an atomic read in
+ * C, which is what pmd_trans_unstable() provides.
*/
- if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+ if (pmd_devmap_trans_unstable(fe->pmd))
return VM_FAULT_NOPAGE;
+ /*
+ * At this point we know that our vmf->pmd points to a page of ptes
+ * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
+ * for the duration of the fault. If a racing MADV_DONTNEED runs and
+ * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
+ * be valid and we will re-check to make sure the vmf->pte isn't
+ * pte_none() under vmf->ptl protection when we return to
+ * alloc_set_pte().
+ */
fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
&fe->ptl);
return 0;
@@ -3456,7 +3476,7 @@ static int handle_pte_fault(struct fault
fe->pte = NULL;
} else {
/* See comment in pte_alloc_one_map() */
- if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+ if (pmd_devmap_trans_unstable(fe->pmd))
return 0;
/*
* A regular pmd is established and it can't morph into a huge
Patches currently in stable-queue which might be from dan.j.williams@intel.com are
queue-4.9/mm-fix-devm_memremap_pages-collision-handling.patch
queue-4.9/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.9/mm-avoid-spurious-bad-pmd-warning-messages.patch
queue-4.9/mm-introduce-get_user_pages_longterm.patch
queue-4.9/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.9/fs-dax.c-fix-inefficiency-in-dax_writeback_mapping_range.patch
queue-4.9/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.9/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.9/libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
queue-4.9/x86-entry-64-clear-extra-registers-beyond-syscall-arguments-to-reduce-speculation-attack-surface.patch
queue-4.9/libnvdimm-fix-integer-overflow-static-analysis-warning.patch