From: Nicholas Piggin <npiggin@gmail.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: Nicholas Piggin <npiggin@gmail.com>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: [PATCH v4 5/7] powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags
Date: Fri, 1 Jun 2018 20:01:19 +1000 [thread overview]
Message-ID: <20180601100121.393-6-npiggin@gmail.com> (raw)
In-Reply-To: <20180601100121.393-1-npiggin@gmail.com>
The ISA suggests ptesync after setting a pte, to prevent a table walk
initiated by a subsequent access from missing that store and causing a
spurious fault. This is an architectual allowance that allows an
implementation's page table walker to be incoherent with the store
queue.
However there is no correctness problem in taking a spurious fault in
userspace -- the kernel copes with these at any time, so the updated
pte will be found eventually. Spurious kernel faults on vmap memory
must be avoided, so a ptesync is put into flush_cache_vmap.
On POWER9 so far I have not found a measurable window where this can
result in more minor faults, so as an optimisation, remove the costly
ptesync from pte updates. If an implementation benefits from ptesync,
it would be better to add it back in update_mmu_cache, so it's not
done for things like fork(2).
fork --fork --exec benchmark improved 5.2% (12400->13100).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/book3s/64/radix.h | 19 ++++++++++++++++++-
arch/powerpc/include/asm/cacheflush.h | 13 +++++++++++++
arch/powerpc/mm/pgtable-radix.c | 2 +-
3 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 01f6c2ca7ecd..9c567d243f61 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -202,7 +202,24 @@ static inline void radix__set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
{
*ptep = pte;
- asm volatile("ptesync" : : : "memory");
+
+ /*
+ * The architecture suggests a ptesync after setting the pte, which
+ * orders the store that updates the pte with subsequent page table
+ * walk accesses which may load the pte. Without this it may be
+ * possible for a subsequent access to result in spurious fault.
+ *
+ * This is not necessary for correctness, because a spurious fault
+ * is tolerated by the page fault handler, and this store will
+ * eventually be seen. In testing, there was no noticable increase
+ * in user faults on POWER9. Avoiding ptesync here is a significant
+ * win for things like fork. If a future microarchitecture benefits
+ * from ptesync, it should probably go into update_mmu_cache, rather
+ * than set_pte_at (which is used to set ptes unrelated to faults).
+ *
+ * Spurious faults to vmalloc region are not tolerated, so there is
+ * a ptesync in flush_cache_vmap.
+ */
}
static inline int radix__pmd_bad(pmd_t pmd)
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 11843e37d9cf..e9662648e72d 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -26,6 +26,19 @@
#define flush_cache_vmap(start, end) do { } while (0)
#define flush_cache_vunmap(start, end) do { } while (0)
+#ifdef CONFIG_BOOK3S_64
+/*
+ * Book3s has no ptesync after setting a pte, so without this ptesync it's
+ * possible for a kernel virtual mapping access to return a spurious fault
+ * if it's accessed right after the pte is set. The page fault handler does
+ * not expect this type of fault. flush_cache_vmap is not exactly the right
+ * place to put this, but it seems to work well enough.
+ */
+#define flush_cache_vmap(start, end) do { asm volatile("ptesync"); } while (0)
+#else
+#define flush_cache_vmap(start, end) do { } while (0)
+#endif
+
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *page);
#define flush_dcache_mmap_lock(mapping) do { } while (0)
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index d6f74cbf0fed..96f68c5aa1f5 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -1115,5 +1115,5 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
* an access fault, which is defined by the architectue.
*/
}
- asm volatile("ptesync" : : : "memory");
+ /* See ptesync comment in radix__set_pte_at */
}
--
2.17.0
next prev parent reply other threads:[~2018-06-01 10:01 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-01 10:01 [PATCH v4 0/7] Various TLB and PTE improvements Nicholas Piggin
2018-06-01 10:01 ` [PATCH v4 1/7] powerpc/64s/radix: do not flush TLB when relaxing access Nicholas Piggin
2018-06-04 14:11 ` [v4, " Michael Ellerman
2018-06-01 10:01 ` [PATCH v4 2/7] powerpc/64s/radix: do not flush TLB on spurious fault Nicholas Piggin
2018-06-01 10:01 ` [PATCH v4 3/7] powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case Nicholas Piggin
2018-06-01 10:01 ` [PATCH v4 4/7] powerpc/64s/radix: prefetch user address in update_mmu_cache Nicholas Piggin
2018-06-01 10:01 ` Nicholas Piggin [this message]
2018-06-01 10:01 ` [PATCH v4 6/7] powerpc/64s/radix: optimise pte_update Nicholas Piggin
2018-06-01 10:01 ` [PATCH v4 7/7] powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask Nicholas Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180601100121.393-6-npiggin@gmail.com \
--to=npiggin@gmail.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).