From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: peterz@infradead.org, kaleshsingh@google.com,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
joel@joelfernandes.org, linuxppc-dev@lists.ozlabs.org
Subject: [RFC PATCH 4/6] mm/mremap: Use mmu gather interface instead of flush_tlb_range
Date: Tue, 2 Feb 2021 14:41:14 +0530 [thread overview]
Message-ID: <20210202091116.196134-4-aneesh.kumar@linux.ibm.com> (raw)
In-Reply-To: <20210202091116.196134-1-aneesh.kumar@linux.ibm.com>
Some architectures do have the concept of page walk cache and only mmu gather
interface supports flushing them. A fast mremap that involves moving page
table pages instead of copying pte entries should flush page walk cache since
the old translation cache is no more valid. Hence switch to mm gather to flush
TLB and mark tlb.freed_tables = 1. No page table pages need to be freed here.
With this the tlb flush is done outside page table lock (ptl).
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/mremap.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 54fd2302b99d..14778d215011 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -217,6 +217,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
{
spinlock_t *old_ptl, *new_ptl;
struct mm_struct *mm = vma->vm_mm;
+ struct mmu_gather tlb;
pmd_t pmd;
/*
@@ -245,11 +246,12 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
return false;
+ tlb_gather_mmu(&tlb, mm, old_addr, PMD_SIZE);
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
*/
- old_ptl = pmd_lock(vma->vm_mm, old_pmd);
+ old_ptl = pmd_lock(mm, old_pmd);
new_ptl = pmd_lockptr(mm, new_pmd);
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
@@ -258,13 +260,23 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
pmd = *old_pmd;
pmd_clear(old_pmd);
+ /*
+ * Mark the range. We are not freeing page table pages nor
+ * regular pages. Hence we don't need to call tlb_remove_table()
+ * or tlb_remove_page().
+ */
+ tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
+ tlb.freed_tables = 1;
VM_BUG_ON(!pmd_none(*new_pmd));
pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
- flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
+ /*
+ * This will invalidate both the old TLB and page table walk caches.
+ */
+ tlb_finish_mmu(&tlb, old_addr, PMD_SIZE);
return true;
}
@@ -283,6 +295,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
{
spinlock_t *old_ptl, *new_ptl;
struct mm_struct *mm = vma->vm_mm;
+ struct mmu_gather tlb;
pud_t pud;
/*
@@ -292,11 +305,12 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
if (WARN_ON_ONCE(!pud_none(*new_pud)))
return false;
+ tlb_gather_mmu(&tlb, mm, old_addr, PUD_SIZE);
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
*/
- old_ptl = pud_lock(vma->vm_mm, old_pud);
+ old_ptl = pud_lock(mm, old_pud);
new_ptl = pud_lockptr(mm, new_pud);
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
@@ -305,14 +319,25 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
pud = *old_pud;
pud_clear(old_pud);
+ /*
+ * Mark the range. We are not freeing page table pages nor
+ * regular pages. Hence we don't need to call tlb_remove_table()
+ * or tlb_remove_page().
+ */
+ tlb_flush_pte_range(&tlb, old_addr, PUD_SIZE);
+ tlb.freed_tables = 1;
VM_BUG_ON(!pud_none(*new_pud));
pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
- flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
+
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
+ /*
+ * This will invalidate both the old TLB and page table walk caches.
+ */
+ tlb_finish_mmu(&tlb, old_addr, PUD_SIZE);
return true;
}
#else
--
2.29.2
next prev parent reply other threads:[~2021-02-03 0:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-02 9:11 [RFC PATCH 1/6] selftest/mremap_test: Update the test to handle pagesize other than 4K Aneesh Kumar K.V
2021-02-02 9:11 ` [RFC PATCH 2/6] selftest/mremap_test: Avoid crash with static build Aneesh Kumar K.V
2021-02-02 9:11 ` [RFC PATCH 3/6] mm/mremap: Use pmd/pud_poplulate to update page table entries Aneesh Kumar K.V
2021-02-02 10:47 ` Peter Zijlstra
2021-02-02 14:50 ` Aneesh Kumar K.V
2021-02-02 9:11 ` Aneesh Kumar K.V [this message]
2021-02-02 9:11 ` [RFC PATCH 5/6] mm/mremap: Allow arch runtime override Aneesh Kumar K.V
2021-02-02 9:11 ` [RFC PATCH 6/6] powerpc/mm: Enable move pmd/pud Aneesh Kumar K.V
2021-02-02 13:29 ` [RFC PATCH 1/6] selftest/mremap_test: Update the test to handle pagesize other than 4K Li Xinhai
2021-02-02 14:51 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210202091116.196134-4-aneesh.kumar@linux.ibm.com \
--to=aneesh.kumar@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=joel@joelfernandes.org \
--cc=kaleshsingh@google.com \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).