From: Peter Zijlstra <peterz@infradead.org>
To: x86@kernel.org, willy@infradead.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
aarcange@redhat.com, kirill.shutemov@linux.intel.com,
jroedel@suse.de, peterz@infradead.org
Subject: [RFC][PATCH 4/9] mm: Fix pmd_read_atomic()
Date: Mon, 30 Nov 2020 12:27:09 +0100 [thread overview]
Message-ID: <20201130113603.018510568@infradead.org> (raw)
In-Reply-To: 20201130112705.900705277@infradead.org
AFAICT there's no reason to do anything different than what we do for
PTEs. Make it so (also affects SH).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/pgtable-3level.h | 56 ----------------------------------
include/linux/pgtable.h | 49 +++++++++++++++++++++++------
2 files changed, 39 insertions(+), 66 deletions(-)
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -34,62 +34,6 @@ static inline void native_set_pte(pte_t
ptep->pte_low = pte.pte_low;
}
-#define pmd_read_atomic pmd_read_atomic
-/*
- * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
- * a "*pmdp" dereference done by GCC. Problem is, in certain places
- * where pte_offset_map_lock() is called, concurrent page faults are
- * allowed, if the mmap_lock is hold for reading. An example is mincore
- * vs page faults vs MADV_DONTNEED. On the page fault side
- * pmd_populate() rightfully does a set_64bit(), but if we're reading the
- * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
- * because GCC will not read the 64-bit value of the pmd atomically.
- *
- * To fix this all places running pte_offset_map_lock() while holding the
- * mmap_lock in read mode, shall read the pmdp pointer using this
- * function to know if the pmd is null or not, and in turn to know if
- * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
- * operations.
- *
- * Without THP if the mmap_lock is held for reading, the pmd can only
- * transition from null to not null while pmd_read_atomic() runs. So
- * we can always return atomic pmd values with this function.
- *
- * With THP if the mmap_lock is held for reading, the pmd can become
- * trans_huge or none or point to a pte (and in turn become "stable")
- * at any time under pmd_read_atomic(). We could read it truly
- * atomically here with an atomic64_read() for the THP enabled case (and
- * it would be a whole lot simpler), but to avoid using cmpxchg8b we
- * only return an atomic pmdval if the low part of the pmdval is later
- * found to be stable (i.e. pointing to a pte). We are also returning a
- * 'none' (zero) pmdval if the low part of the pmd is zero.
- *
- * In some cases the high and low part of the pmdval returned may not be
- * consistent if THP is enabled (the low part may point to previously
- * mapped hugepage, while the high part may point to a more recently
- * mapped hugepage), but pmd_none_or_trans_huge_or_clear_bad() only
- * needs the low part of the pmd to be read atomically to decide if the
- * pmd is unstable or not, with the only exception when the low part
- * of the pmd is zero, in which case we return a 'none' pmd.
- */
-static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
-{
- pmdval_t ret;
- u32 *tmp = (u32 *)pmdp;
-
- ret = (pmdval_t) (*tmp);
- if (ret) {
- /*
- * If the low part is null, we must not read the high part
- * or we can end up with a partial pmd.
- */
- smp_rmb();
- ret |= ((pmdval_t)*(tmp + 1)) << 32;
- }
-
- return (pmd_t) { .pmd = ret };
-}
-
static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
{
set_64bit((unsigned long long *)(ptep), native_pte_val(pte));
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -258,6 +258,13 @@ static inline pte_t ptep_get(pte_t *ptep
}
#endif
+#ifndef __HAVE_ARCH_PMDP_GET
+static inline pmd_t pmdp_get(pmd_t *pmdp)
+{
+ return READ_ONCE(*pmdp);
+}
+#endif
+
#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
/*
* For walking the pagetables without holding any locks. Some architectures
@@ -302,15 +309,44 @@ static inline pte_t ptep_get_lockless(pt
return pte;
}
-#else /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+#define ptep_get_lockless ptep_get_lockless
+
+#if CONFIG_PGTABLE_LEVELS > 2
+static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
+{
+ pmd_t pmd;
+
+ lockdep_assert_irqs_disabled();
+
+ do {
+ pmd.pmd_low = pmdp->pmd_low;
+ smp_rmb();
+ pmd.pmd_high = pmdp->pmd_high;
+ smp_rmb();
+ } while (unlikely(pmd.pmd_low != pmdp->pmd_low));
+
+ return pmd;
+}
+#define pmdp_get_lockless pmdp_get_lockless
+#endif /* CONFIG_PGTABLE_LEVELS > 2 */
+#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+
/*
* We require that the PTE can be read atomically.
*/
+#ifndef ptep_get_lockless
static inline pte_t ptep_get_lockless(pte_t *ptep)
{
return ptep_get(ptep);
}
-#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+#endif
+
+#ifndef pmdp_get_lockless
+static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
+{
+ return pmdp_get(pmdp);
+}
+#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
@@ -1211,17 +1247,10 @@ static inline int pud_trans_unstable(pud
#endif
}
-#ifndef pmd_read_atomic
static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
{
- /*
- * Depend on compiler for an atomic pmd read. NOTE: this is
- * only going to work, if the pmdval_t isn't larger than
- * an unsigned long.
- */
- return *pmdp;
+ return pmdp_get_lockless(pmdp);
}
-#endif
#ifndef arch_needs_pgtable_deposit
#define arch_needs_pgtable_deposit() (false)
next prev parent reply other threads:[~2020-11-30 11:38 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-30 11:27 [RFC][PATCH 0/9] Clean up i386-PAE Peter Zijlstra
2020-11-30 11:27 ` [RFC][PATCH 1/9] mm: Update ptep_get_lockless()s comment Peter Zijlstra
2020-11-30 11:27 ` [RFC][PATCH 2/9] x86/mm/pae: Make pmd_t similar to pte_t Peter Zijlstra
2020-11-30 11:27 ` [RFC][PATCH 3/9] sh/mm: " Peter Zijlstra
2020-11-30 14:10 ` David Laight
2020-11-30 14:21 ` Peter Zijlstra
2020-11-30 11:27 ` Peter Zijlstra [this message]
2020-11-30 11:27 ` [RFC][PATCH 5/9] mm: Rename pmd_read_atomic() Peter Zijlstra
2020-11-30 15:31 ` Jason Gunthorpe
2020-12-01 8:57 ` Peter Zijlstra
2020-11-30 11:27 ` [RFC][PATCH 6/9] mm/gup: Fix the lockless walkers Peter Zijlstra
2020-11-30 11:27 ` [RFC][PATCH 7/9] x86/mm/pae: Dont (ab)use atomic64 Peter Zijlstra
2020-11-30 11:27 ` [RFC][PATCH 8/9] x86/mm/pae: Use WRITE_ONCE() Peter Zijlstra
2020-11-30 11:27 ` [RFC][PATCH 9/9] x86/mm/pae: Be consistent with pXXp_get_and_clear() Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201130113603.018510568@infradead.org \
--to=peterz@infradead.org \
--cc=aarcange@redhat.com \
--cc=jroedel@suse.de \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).