All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jan Kara <jack@suse.cz>, Pawel Lebioda <pawel.lebioda@intel.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Jiang <dave.jiang@intel.com>, Xiong Zhou <xzhou@redhat.com>,
	Eryu Guan <eguan@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.9 29/39] mm: avoid spurious bad pmd warning messages
Date: Mon, 26 Feb 2018 21:20:50 +0100	[thread overview]
Message-ID: <20180226201644.948091150@linuxfoundation.org> (raw)
In-Reply-To: <20180226201643.660109883@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ross Zwisler <ross.zwisler@linux.intel.com>

commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream.

When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks.  So, things like:

  -       if (pmd_trans_huge(*pmd))
  +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))

When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b5c ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:

  +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))

This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:

  mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)

Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.

Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory.c |   40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,6 +2848,17 @@ static int __do_fault(struct fault_env *
 	return ret;
 }
 
+/*
+ * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
+ * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
 static int pte_alloc_one_map(struct fault_env *fe)
 {
 	struct vm_area_struct *vma = fe->vma;
@@ -2871,18 +2882,27 @@ static int pte_alloc_one_map(struct faul
 map_pte:
 	/*
 	 * If a huge pmd materialized under us just retry later.  Use
-	 * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
-	 * didn't become pmd_trans_huge under us and then back to pmd_none, as
-	 * a result of MADV_DONTNEED running immediately after a huge pmd fault
-	 * in a different thread of this mm, in turn leading to a misleading
-	 * pmd_trans_huge() retval.  All we have to ensure is that it is a
-	 * regular pmd that we can walk with pte_offset_map() and we can do that
-	 * through an atomic read in C, which is what pmd_trans_unstable()
-	 * provides.
+	 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
+	 * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
+	 * under us and then back to pmd_none, as a result of MADV_DONTNEED
+	 * running immediately after a huge pmd fault in a different thread of
+	 * this mm, in turn leading to a misleading pmd_trans_huge() retval.
+	 * All we have to ensure is that it is a regular pmd that we can walk
+	 * with pte_offset_map() and we can do that through an atomic read in
+	 * C, which is what pmd_trans_unstable() provides.
 	 */
-	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+	if (pmd_devmap_trans_unstable(fe->pmd))
 		return VM_FAULT_NOPAGE;
 
+	/*
+	 * At this point we know that our vmf->pmd points to a page of ptes
+	 * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
+	 * for the duration of the fault.  If a racing MADV_DONTNEED runs and
+	 * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
+	 * be valid and we will re-check to make sure the vmf->pte isn't
+	 * pte_none() under vmf->ptl protection when we return to
+	 * alloc_set_pte().
+	 */
 	fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
 			&fe->ptl);
 	return 0;
@@ -3456,7 +3476,7 @@ static int handle_pte_fault(struct fault
 		fe->pte = NULL;
 	} else {
 		/* See comment in pte_alloc_one_map() */
-		if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+		if (pmd_devmap_trans_unstable(fe->pmd))
 			return 0;
 		/*
 		 * A regular pmd is established and it can't morph into a huge

  parent reply	other threads:[~2018-02-26 20:20 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-26 20:20 [PATCH 4.9 00/39] 4.9.85-stable review Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 01/39] netfilter: drop outermost socket lock in getsockopt() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 02/39] xtensa: fix high memory/reserved memory collision Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 03/39] scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 04/39] cfg80211: fix cfg80211_beacon_dup Greg Kroah-Hartman
2018-02-26 20:20   ` Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 05/39] X.509: fix BUG_ON() when hash algorithm is unsupported Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 06/39] PKCS#7: fix certificate chain verification Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 07/39] RDMA/uverbs: Protect from command mask overflow Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 08/39] iio: buffer: check if a buffer has been set up when poll is called Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 09/39] iio: adis_lib: Initialize trigger before requesting interrupt Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 10/39] x86/oprofile: Fix bogus GCC-8 warning in nmi_setup() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 11/39] irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 12/39] PCI/cxgb4: Extend T3 PCI quirk to T4+ devices Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 13/39] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 14/39] usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 15/39] arm64: Disable unhandled signal log messages by default Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 16/39] Add delay-init quirk for Corsair K70 RGB keyboards Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 17/39] drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 18/39] usb: dwc3: gadget: Set maxpacket size for ep0 IN Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 19/39] usb: ldusb: add PIDs for new CASSY devices supported by this driver Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 20/39] Revert "usb: musb: host: dont start next rx urb if current one failed" Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 21/39] usb: gadget: f_fs: Process all descriptors during bind Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 22/39] usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 23/39] drm/amdgpu: Add dpm quirk for Jet PRO (v2) Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 24/39] drm/amdgpu: add atpx quirk handling (v2) Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 25/39] drm/amdgpu: Avoid leaking PM domain on driver unbind (v2) Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 26/39] drm/amdgpu: add new device to use atpx quirk Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 27/39] binder: add missing binder_unlock() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 28/39] X.509: fix NULL dereference when restricting key with unsupported_sig Greg Kroah-Hartman
2018-02-26 20:20 ` Greg Kroah-Hartman [this message]
2018-02-26 20:20 ` [PATCH 4.9 30/39] fs/dax.c: fix inefficiency in dax_writeback_mapping_range() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 31/39] libnvdimm: fix integer overflow static analysis warning Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 32/39] device-dax: implement ->split() to catch invalid munmap attempts Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 33/39] mm: introduce get_user_pages_longterm Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 34/39] v4l2: disable filesystem-dax mapping support Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 35/39] IB/core: disable memory registration of filesystem-dax vmas Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 36/39] libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 37/39] mm: Fix devm_memremap_pages() collision handling Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 38/39] mm: fail get_vaddr_frames() for filesystem-dax mappings Greg Kroah-Hartman
2018-02-26 20:21 ` [PATCH 4.9 39/39] x86/entry/64: Clear extra registers beyond syscall arguments, to reduce speculation attack surface Greg Kroah-Hartman
2018-02-27  0:57 ` [PATCH 4.9 00/39] 4.9.85-stable review Shuah Khan
2018-02-27  7:12 ` Naresh Kamboju
2018-02-27 14:56 ` Guenter Roeck
2018-02-27 18:34   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180226201644.948091150@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=eguan@redhat.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=pawel.lebioda@intel.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xzhou@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.