From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: npiggin@gmail.com, paulus@samba.org, mpe@ellerman.id.au,
	kirill@shutemov.name
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: [RFC PATCH 21/21] powerpc/mm/book3s64: Avoid sending IPI on clearing PMD
Date: Thu, 27 Feb 2020 12:26:20 +0530
Message-ID: <20200227065620.50804-22-aneesh.kumar@linux.ibm.com>
In-Reply-To: <20200227065620.50804-1-aneesh.kumar@linux.ibm.com>

Now that all the lockless page table walks are careful w.r.t. the PTE
address returned, we can revert commit 13bd817bb884
("powerpc/thp: Serialize pmd clear against a linux page table walk.").

This speeds up QEMU termination for guests with a large amount of RAM.

We also drop the equivalent IPI from the other PTE update routines. We still
keep the IPI in hash pmdp collapse, to take care of a parallel hash page table
insert. The radix pmdp collapse flush can possibly be removed once I am sure
generic code doesn't have any expectations around a parallel gup walk.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
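A minimal user-space sketch, not kernel code and not taken from this series,
of the idea the changelog above relies on: if a lockless walker loads the
PMD-like slot exactly once and acts only on that snapshot, the updater can
clear the slot without waiting (via IPI) for walkers to finish. The bit
encoding (ENTRY_PRESENT, ENTRY_HUGE) and the helpers walk_slot() and
clear_slot() are hypothetical names for illustration, and C11 atomics stand
in for the kernel's own accessors.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical entry encoding for this sketch only; not the powerpc layout. */
#define ENTRY_PRESENT 0x1ULL
#define ENTRY_HUGE    0x2ULL	/* entry maps a huge page directly */

enum walk_result { WALK_NONE, WALK_HUGE, WALK_TABLE };

/*
 * Walker side: classify the slot from a single atomic load. The slot is
 * never re-read, so a concurrent clear or replacement cannot make the
 * walker interpret one format as the other.
 */
enum walk_result walk_slot(_Atomic uint64_t *slot, uint64_t *snapshot)
{
	uint64_t val;

	assert(slot && snapshot);
	val = atomic_load_explicit(slot, memory_order_acquire);
	*snapshot = val;

	if (!(val & ENTRY_PRESENT))
		return WALK_NONE;
	return (val & ENTRY_HUGE) ? WALK_HUGE : WALK_TABLE;
}

/*
 * Updater side: clear the slot and return the old value. Nothing here
 * waits for walkers; correctness rests on walkers using their snapshot.
 */
uint64_t clear_slot(_Atomic uint64_t *slot)
{
	assert(slot);
	return atomic_exchange_explicit(slot, 0, memory_order_acq_rel);
}
```

A walker that took its snapshot before clear_slot() ran still sees a
self-consistent entry; one that runs afterwards sees WALK_NONE. The sketch
says nothing about when the memory behind the old entry may be freed, which
has to be handled separately.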
arch/powerpc/mm/book3s64/hash_pgtable.c | 11 -----------
arch/powerpc/mm/book3s64/pgtable.c | 8 --------
arch/powerpc/mm/book3s64/radix_pgtable.c | 20 ++++++++------------
3 files changed, 8 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 64733b9cb20a..64ca375278dc 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -363,17 +363,6 @@ pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
* hash fault look at them.
*/
memset(pgtable, 0, PTE_FRAG_SIZE);
- /*
- * Serialize against find_current_mm_pte variants which does lock-less
- * lookup in page tables with local interrupts disabled. For huge pages
- * it casts pmd_t to pte_t. Since format of pte_t is different from
- * pmd_t we want to prevent transit from pmd pointing to page table
- * to pmd pointing to huge page (and back) while interrupts are disabled.
- * We clear pmd to possibly replace it with page table pointer in
- * different code paths. So make sure we wait for the parallel
- * find_curren_mm_pte to finish.
- */
- serialize_against_pte_lookup(mm);
return old_pmd;
}
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 2bf7e1b4fd82..93fc3be41ed9 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -109,14 +109,6 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
- /*
- * This ensures that generic code that rely on IRQ disabling
- * to prevent a parallel THP split work as expected.
- *
- * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
- * a special case check in pmd_access_permitted.
- */
- serialize_against_pte_lookup(vma->vm_mm);
return __pmd(old_pmd);
}
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index dd1bea45325c..1ae77568221d 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -957,7 +957,14 @@ pmd_t radix__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long addre
pmd = *pmdp;
pmd_clear(pmdp);
- /*FIXME!! Verify whether we need this kick below */
+ /*
+ * pmdp_collapse_flush needs to ensure that there is no parallel gup
+ * walk after this call. We can ensure that by sending an IPI, because
+ * the gup walk happens with IRQs disabled.
+ * FIXME!!
+ * At this point I am not sure why there is a comment above pmdp_collapse_flush
+ * saying that a parallel gup walk should be avoided.
+ */
serialize_against_pte_lookup(vma->vm_mm);
radix__flush_tlb_collapsed_pmd(vma->vm_mm, address);
@@ -1018,17 +1025,6 @@ pmd_t radix__pmdp_huge_get_and_clear(struct mm_struct *mm,
old = radix__pmd_hugepage_update(mm, addr, pmdp, ~0UL, 0);
old_pmd = __pmd(old);
- /*
- * Serialize against find_current_mm_pte which does lock-less
- * lookup in page tables with local interrupts disabled. For huge pages
- * it casts pmd_t to pte_t. Since format of pte_t is different from
- * pmd_t we want to prevent transit from pmd pointing to page table
- * to pmd pointing to huge page (and back) while interrupts are disabled.
- * We clear pmd to possibly replace it with page table pointer in
- * different code paths. So make sure we wait for the parallel
- * find_current_mm_pte to finish.
- */
- serialize_against_pte_lookup(mm);
return old_pmd;
}
--
2.24.1