From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org, yuzhao@google.com, will@kernel.org,
tglx@linutronix.de, peterz@infradead.org, peterx@redhat.com,
npiggin@gmail.com, luto@kernel.org, dave.hansen@linux.intel.com,
andrew.cooper3@citrix.com, aarcange@redhat.com, namit@vmware.com,
akpm@linux-foundation.org
Subject: + mm-avoid-unnecessary-flush-on-change_huge_pmd.patch added to -mm tree
Date: Mon, 04 Apr 2022 15:36:42 -0700 [thread overview]
Message-ID: <20220404223643.66C2BC2BBE4@smtp.kernel.org> (raw)
The patch titled
Subject: mm: avoid unnecessary flush on change_huge_pmd()
has been added to the -mm tree. Its filename is
mm-avoid-unnecessary-flush-on-change_huge_pmd.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-avoid-unnecessary-flush-on-change_huge_pmd.patch
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-avoid-unnecessary-flush-on-change_huge_pmd.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Nadav Amit <namit@vmware.com>
Subject: mm: avoid unnecessary flush on change_huge_pmd()
Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and then another one later (that can be
batched) when change_protection_range() finishes.
The first TLB flush is only necessary to prevent the dirty bit (and with a
lesser importance the access bit) from changing while the PTE is modified.
However, this is not necessary as the x86 CPUs set the dirty-bit
atomically with an additional check that the PTE is (still) present. One
caveat is Intel's Knights Landing that has a bug and does not do so.
Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch specific pmdp_invalidate_ad()
that only invalidates the access and dirty bit from further changes.
Link: https://lkml.kernel.org/r/20220401180821.1986781-4-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/x86/include/asm/pgtable.h | 5 +++++
arch/x86/mm/pgtable.c | 10 ++++++++++
include/linux/pgtable.h | 20 ++++++++++++++++++++
mm/huge_memory.c | 4 ++--
mm/pgtable-generic.c | 8 ++++++++
5 files changed, 45 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/pgtable.h~mm-avoid-unnecessary-flush-on-change_huge_pmd
+++ a/arch/x86/include/asm/pgtable.h
@@ -1173,6 +1173,11 @@ static inline pmd_t pmdp_establish(struc
}
}
#endif
+
+#define __HAVE_ARCH_PMDP_INVALIDATE_AD
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+
/*
* Page table pages are page-aligned. The lower half of the top
* level is used for userspace and the top half for the kernel.
--- a/arch/x86/mm/pgtable.c~mm-avoid-unnecessary-flush-on-change_huge_pmd
+++ a/arch/x86/mm/pgtable.c
@@ -608,6 +608,16 @@ int pmdp_clear_flush_young(struct vm_are
return young;
}
+
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+ /*
+ * No flush is necessary. Once an invalid PTE is established, the PTE's
+ * access and dirty bits cannot be updated.
+ */
+ return pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+}
#endif
/**
--- a/include/linux/pgtable.h~mm-avoid-unnecessary-flush-on-change_huge_pmd
+++ a/include/linux/pgtable.h
@@ -570,6 +570,26 @@ extern pmd_t pmdp_invalidate(struct vm_a
pmd_t *pmdp);
#endif
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+
+/*
+ * pmdp_invalidate_ad() invalidates the PMD while changing a transparent
+ * hugepage mapping in the page tables. This function is similar to
+ * pmdp_invalidate(), but should only be used if the access and dirty bits would
+ * not be cleared by the software in the new PMD value. The function ensures
+ * that hardware changes of the access and dirty bits updates would not be lost.
+ *
+ * Doing so can allow in certain architectures to avoid a TLB flush in most
+ * cases. Yet, another TLB flush might be necessary later if the PMD update
+ * itself requires such flush (e.g., if protection was set to be stricter). Yet,
+ * even when a TLB flush is needed because of the update, the caller may be able
+ * to batch these TLB flushing operations, so fewer TLB flush operations are
+ * needed.
+ */
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+#endif
+
#ifndef __HAVE_ARCH_PTE_SAME
static inline int pte_same(pte_t pte_a, pte_t pte_b)
{
--- a/mm/huge_memory.c~mm-avoid-unnecessary-flush-on-change_huge_pmd
+++ a/mm/huge_memory.c
@@ -1801,10 +1801,10 @@ int change_huge_pmd(struct mmu_gather *t
* The race makes MADV_DONTNEED miss the huge pmd and don't clear it
* which may break userspace.
*
- * pmdp_invalidate() is required to make sure we don't miss
+ * pmdp_invalidate_ad() is required to make sure we don't miss
* dirty/young flags set by hardware.
*/
- oldpmd = pmdp_invalidate(vma, addr, pmd);
+ oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
entry = pmd_modify(oldpmd, newprot);
if (preserve_write)
--- a/mm/pgtable-generic.c~mm-avoid-unnecessary-flush-on-change_huge_pmd
+++ a/mm/pgtable-generic.c
@@ -201,6 +201,14 @@ pmd_t pmdp_invalidate(struct vm_area_str
}
#endif
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+ return pmdp_invalidate(vma, address, pmdp);
+}
+#endif
+
#ifndef pmdp_collapse_flush
pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
_
Patches currently in -mm which might be from namit@vmware.com are
userfaultfd-mark-uffd_wp-regardless-of-vm_write-flag.patch
mm-mprotect-use-mmu_gather.patch
mm-mprotect-do-not-flush-when-not-required-architecturally.patch
mm-avoid-unnecessary-flush-on-change_huge_pmd.patch
reply other threads:[~2022-04-04 23:06 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220404223643.66C2BC2BBE4@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=aarcange@redhat.com \
--cc=andrew.cooper3@citrix.com \
--cc=dave.hansen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mm-commits@vger.kernel.org \
--cc=namit@vmware.com \
--cc=npiggin@gmail.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.