[PATCH v3 2/2] mm: Optimize mremap() by PTE batching

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
	vbabka@suse.cz, jannh@google.com, pfalcato@suse.de,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	david@redhat.com, peterx@redhat.com, ryan.roberts@arm.com,
	mingo@kernel.org, libang.li@antgroup.com, maobibo@loongson.cn,
	zhengqi.arch@bytedance.com, baohua@kernel.org,
	anshuman.khandual@arm.com, willy@infradead.org,
	ioworker0@gmail.com, yang@os.amperecomputing.com,
	baolin.wang@linux.alibaba.com, ziy@nvidia.com, hughd@google.com,
	Dev Jain <dev.jain@arm.com>
Subject: [PATCH v3 2/2] mm: Optimize mremap() by PTE batching
Date: Tue, 27 May 2025 13:20:49 +0530	[thread overview]
Message-ID: <20250527075049.60215-3-dev.jain@arm.com> (raw)
In-Reply-To: <20250527075049.60215-1-dev.jain@arm.com>

Use folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
are painted with the contig bit, then ptep_get() will iterate through all 16
entries to collect a/d bits. Hence this optimization will result in a 16x
reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
will eventually call contpte_try_unfold() on every contig block, thus
flushing the TLB for the complete large folio range. Instead, use
get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
do them on the starting and ending contig block.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 mm/mremap.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 0163e02e5aa8..580b41f8d169 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -170,6 +170,24 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 	return pte;
 }
 
+/* mremap a batch of PTEs mapping the same large folio */
+static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, pte_t pte, int max_nr)
+{
+	const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	struct folio *folio;
+
+	if (max_nr == 1)
+		return 1;
+
+	folio = vm_normal_folio(vma, addr, pte);
+	if (!folio || !folio_test_large(folio))
+		return 1;
+
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, flags, NULL,
+			       NULL, NULL);
+}
+
 static int move_ptes(struct pagetable_move_control *pmc,
 		unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd)
 {
@@ -177,7 +195,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 	bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *old_ptep, *new_ptep;
-	pte_t pte;
+	pte_t old_pte, pte;
 	pmd_t dummy_pmdval;
 	spinlock_t *old_ptl, *new_ptl;
 	bool force_flush = false;
@@ -185,6 +203,8 @@ static int move_ptes(struct pagetable_move_control *pmc,
 	unsigned long new_addr = pmc->new_addr;
 	unsigned long old_end = old_addr + extent;
 	unsigned long len = old_end - old_addr;
+	int max_nr_ptes;
+	int nr_ptes;
 	int err = 0;
 
 	/*
@@ -236,12 +256,14 @@ static int move_ptes(struct pagetable_move_control *pmc,
 	flush_tlb_batched_pending(vma->vm_mm);
 	arch_enter_lazy_mmu_mode();
 
-	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
-				   new_ptep++, new_addr += PAGE_SIZE) {
-		if (pte_none(ptep_get(old_ptep)))
+	for (; old_addr < old_end; old_ptep += nr_ptes, old_addr += nr_ptes * PAGE_SIZE,
+		new_ptep += nr_ptes, new_addr += nr_ptes * PAGE_SIZE) {
+		nr_ptes = 1;
+		max_nr_ptes = (old_end - old_addr) >> PAGE_SHIFT;
+		old_pte = ptep_get(old_ptep);
+		if (pte_none(old_pte))
 			continue;
 
-		pte = ptep_get_and_clear(mm, old_addr, old_ptep);
 		/*
 		 * If we are remapping a valid PTE, make sure
 		 * to flush TLB before we drop the PTL for the
@@ -253,8 +275,12 @@ static int move_ptes(struct pagetable_move_control *pmc,
 		 * the TLB entry for the old mapping has been
 		 * flushed.
 		 */
-		if (pte_present(pte))
+		if (pte_present(old_pte)) {
+			nr_ptes = mremap_folio_pte_batch(vma, old_addr, old_ptep,
+							 old_pte, max_nr_ptes);
 			force_flush = true;
+		}
+		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);
 		pte = move_pte(pte, old_addr, new_addr);
 		pte = move_soft_dirty_pte(pte);
 
@@ -267,7 +293,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 				else if (is_swap_pte(pte))
 					pte = pte_swp_clear_uffd_wp(pte);
 			}
-			set_pte_at(mm, new_addr, new_ptep, pte);
+			set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
 		}
 	}
 
-- 
2.30.2

next prev parent reply	other threads:[~2025-05-27  7:51 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-27  7:50 [PATCH v3 0/2] Optimize mremap() for large folios Dev Jain
2025-05-27  7:50 ` [PATCH v3 1/2] mm: Call pointers to ptes as ptep Dev Jain
2025-05-27  7:50 ` Dev Jain [this message]
2025-05-27 10:45   ` [PATCH v3 2/2] mm: Optimize mremap() by PTE batching Lorenzo Stoakes
2025-05-27 16:22     ` Dev Jain
2025-05-27 16:29       ` Lorenzo Stoakes
2025-05-27 16:38         ` Dev Jain
2025-05-27 16:46           ` Lorenzo Stoakes
2025-05-28  3:32             ` Dev Jain
2025-05-28  4:49               ` Lorenzo Stoakes
2025-05-28  6:15                 ` Dev Jain
2025-05-27 10:50 ` [PATCH v3 0/2] Optimize mremap() for large folios Lorenzo Stoakes
2025-05-27 16:26   ` Dev Jain
2025-05-27 16:32     ` Lorenzo Stoakes

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:0163e02e5aa dfblob:580b41f8d16 )
 OR (
bs:"[PATCH v3 2/2] mm: Optimize mremap() by PTE batching" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250527075049.60215-3-dev.jain@arm.com \
    --to=dev.jain@arm.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=ioworker0@gmail.com \
    --cc=jannh@google.com \
    --cc=libang.li@antgroup.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=maobibo@loongson.cn \
    --cc=mingo@kernel.org \
    --cc=peterx@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=ryan.roberts@arm.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=zhengqi.arch@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.