All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	Pedro Falcato <pfalcato@suse.de>, Rik van Riel <riel@surriel.com>,
	Harry Yoo <harry.yoo@oracle.com>
Subject: [PATCH v1 3/4] mm: split folio_pte_batch() into folio_pte_batch() and folio_pte_batch_ext()
Date: Fri, 27 Jun 2025 13:55:09 +0200	[thread overview]
Message-ID: <20250627115510.3273675-4-david@redhat.com> (raw)
In-Reply-To: <20250627115510.3273675-1-david@redhat.com>

Many users (including upcoming ones) don't really need the flags etc,
and can live with a function call.

So let's provide a basic, non-inlined folio_pte_batch().

In zap_present_ptes(), where we care about performance, the compiler
already seem to generate a call to a common inlined folio_pte_batch()
variant, shared with fork() code. So calling the new non-inlined variant
should not make a difference.

While at it, drop the "addr" parameter that is unused.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/internal.h  | 11 ++++++++---
 mm/madvise.c   |  4 ++--
 mm/memory.c    |  6 ++----
 mm/mempolicy.c |  3 +--
 mm/mlock.c     |  3 +--
 mm/mremap.c    |  3 +--
 mm/rmap.c      |  3 +--
 mm/util.c      | 29 +++++++++++++++++++++++++++++
 8 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index ca6590c6d9eab..6000b683f68ee 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -218,9 +218,8 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
 }
 
 /**
- * folio_pte_batch - detect a PTE batch for a large folio
+ * folio_pte_batch_ext - detect a PTE batch for a large folio
  * @folio: The large folio to detect a PTE batch for.
- * @addr: The user virtual address the first page is mapped at.
  * @ptep: Page table pointer for the first entry.
  * @pte: Page table entry for the first page.
  * @max_nr: The maximum number of table entries to consider.
@@ -243,9 +242,12 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
  * must be limited by the caller so scanning cannot exceed a single VMA and
  * a single page table.
  *
+ * This function will be inlined to optimize based on the input parameters;
+ * consider using folio_pte_batch() instead if applicable.
+ *
  * Return: the number of table entries in the batch.
  */
-static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr,
+static inline unsigned int folio_pte_batch_ext(struct folio *folio,
 		pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags,
 		bool *any_writable, bool *any_young, bool *any_dirty)
 {
@@ -293,6 +295,9 @@ static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long ad
 	return min(nr, max_nr);
 }
 
+unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
+		unsigned int max_nr);
+
 /**
  * pte_move_swp_offset - Move the swap entry offset field of a swap pte
  *	 forward or backward by delta
diff --git a/mm/madvise.c b/mm/madvise.c
index 661bb743d2216..9b9c35a398ed0 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -349,8 +349,8 @@ static inline int madvise_folio_pte_batch(unsigned long addr, unsigned long end,
 {
 	int max_nr = (end - addr) / PAGE_SIZE;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, 0, NULL,
-			       any_young, any_dirty);
+	return folio_pte_batch_ext(folio, ptep, pte, max_nr, 0, NULL,
+				   any_young, any_dirty);
 }
 
 static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
diff --git a/mm/memory.c b/mm/memory.c
index ab2d6c1425691..43d35d6675f2e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -995,7 +995,7 @@ copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 		if (vma_soft_dirty_enabled(src_vma))
 			flags |= FPB_HONOR_SOFT_DIRTY;
 
-		nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr, flags,
+		nr = folio_pte_batch_ext(folio, src_pte, pte, max_nr, flags,
 				     &any_writable, NULL, NULL);
 		folio_ref_add(folio, nr);
 		if (folio_test_anon(folio)) {
@@ -1564,9 +1564,7 @@ static inline int zap_present_ptes(struct mmu_gather *tlb,
 	 * by keeping the batching logic separate.
 	 */
 	if (unlikely(folio_test_large(folio) && max_nr != 1)) {
-		nr = folio_pte_batch(folio, addr, pte, ptent, max_nr, 0,
-				     NULL, NULL, NULL);
-
+		nr = folio_pte_batch(folio, pte, ptent, max_nr);
 		zap_present_folio_ptes(tlb, vma, folio, page, pte, ptent, nr,
 				       addr, details, rss, force_flush,
 				       force_break, any_skipped);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2a25eedc3b1c0..eb83cff7db8c3 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -711,8 +711,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		if (!folio || folio_is_zone_device(folio))
 			continue;
 		if (folio_test_large(folio) && max_nr != 1)
-			nr = folio_pte_batch(folio, addr, pte, ptent,
-					     max_nr, 0, NULL, NULL, NULL);
+			nr = folio_pte_batch(folio, pte, ptent, max_nr);
 		/*
 		 * vm_normal_folio() filters out zero pages, but there might
 		 * still be reserved folios to skip, perhaps in a VDSO.
diff --git a/mm/mlock.c b/mm/mlock.c
index 2238cdc5eb1c1..a1d93ad33c6db 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -313,8 +313,7 @@ static inline unsigned int folio_mlock_step(struct folio *folio,
 	if (!folio_test_large(folio))
 		return 1;
 
-	return folio_pte_batch(folio, addr, pte, ptent, count, 0, NULL,
-			       NULL, NULL);
+	return folio_pte_batch(folio, pte, ptent, count);
 }
 
 static inline bool allow_mlock_munlock(struct folio *folio,
diff --git a/mm/mremap.c b/mm/mremap.c
index d4d3ffc931502..1f5bebbb9c0cb 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -182,8 +182,7 @@ static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr
 	if (!folio || !folio_test_large(folio))
 		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, 0, NULL,
-			       NULL, NULL);
+	return folio_pte_batch(folio, ptep, pte, max_nr);
 }
 
 static int move_ptes(struct pagetable_move_control *pmc,
diff --git a/mm/rmap.c b/mm/rmap.c
index a29d7d29c7283..6658968600b72 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1859,8 +1859,7 @@ static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
 	if (pte_pfn(pte) != folio_pfn(folio))
 		return false;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, 0, NULL,
-			       NULL, NULL) == max_nr;
+	return folio_pte_batch(folio, ptep, pte, max_nr);
 }
 
 /*
diff --git a/mm/util.c b/mm/util.c
index 0b270c43d7d12..d29dcc135ad28 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1171,3 +1171,32 @@ int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 EXPORT_SYMBOL(compat_vma_mmap_prepare);
+
+#ifdef CONFIG_MMU
+/**
+ * folio_pte_batch - detect a PTE batch for a large folio
+ * @folio: The large folio to detect a PTE batch for.
+ * @ptep: Page table pointer for the first entry.
+ * @pte: Page table entry for the first page.
+ * @max_nr: The maximum number of table entries to consider.
+ *
+ * This is a simplified variant of folio_pte_batch_ext().
+ *
+ * Detect a PTE batch: consecutive (present) PTEs that map consecutive
+ * pages of the same large folio in a single VMA and a single page table.
+ *
+ * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
+ * the accessed bit, writable bit, dirt-bit and soft-dirty bit.
+ *
+ * ptep must map any page of the folio. max_nr must be at least one and
+ * must be limited by the caller so scanning cannot exceed a single VMA and
+ * a single page table.
+ *
+ * Return: the number of table entries in the batch.
+ */
+unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
+		unsigned int max_nr)
+{
+	return folio_pte_batch_ext(folio, ptep, pte, max_nr, 0, NULL, NULL, NULL);
+}
+#endif /* CONFIG_MMU */
-- 
2.49.0



  parent reply	other threads:[~2025-06-27 11:55 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-27 11:55 [PATCH v1 0/4] mm: folio_pte_batch() improvements David Hildenbrand
2025-06-27 11:55 ` [PATCH v1 1/4] mm: convert FPB_IGNORE_* into FPB_HONOR_* David Hildenbrand
2025-06-27 13:40   ` Lance Yang
2025-06-27 16:28   ` Lorenzo Stoakes
2025-06-27 16:30     ` David Hildenbrand
2025-06-27 16:33       ` Lorenzo Stoakes
2025-06-29  8:59         ` Mike Rapoport
2025-06-30 13:47           ` David Hildenbrand
2025-06-28  3:37   ` Dev Jain
2025-06-28 21:00     ` David Hildenbrand
2025-06-30  3:34       ` Dev Jain
2025-06-30  9:04         ` Ryan Roberts
2025-06-30  9:08           ` David Hildenbrand
2025-06-30  9:18             ` Ryan Roberts
2025-06-30  9:24               ` David Hildenbrand
2025-06-30 10:57                 ` Ryan Roberts
2025-06-30 11:01                   ` David Hildenbrand
2025-06-30 14:35   ` Zi Yan
2025-07-02  8:31   ` Oscar Salvador
2025-06-27 11:55 ` [PATCH v1 2/4] mm: smaller folio_pte_batch() improvements David Hildenbrand
2025-06-27 13:58   ` Lance Yang
2025-06-27 16:51   ` Lorenzo Stoakes
2025-06-27 17:02     ` David Hildenbrand
2025-06-27 18:39       ` Lorenzo Stoakes
2025-06-30 17:40   ` Zi Yan
2025-07-02  8:42   ` Oscar Salvador
2025-07-02  8:48     ` David Hildenbrand
2025-07-02  8:51       ` Lorenzo Stoakes
2025-07-02  9:00         ` David Hildenbrand
2025-07-02  9:08           ` Lorenzo Stoakes
2025-07-02  9:11             ` David Hildenbrand
2025-06-27 11:55 ` David Hildenbrand [this message]
2025-06-27 14:19   ` [PATCH v1 3/4] mm: split folio_pte_batch() into folio_pte_batch() and folio_pte_batch_ext() Lance Yang
2025-06-27 15:09     ` David Hildenbrand
2025-06-27 15:45       ` Lance Yang
2025-06-27 18:48   ` Lorenzo Stoakes
2025-06-30  9:19     ` David Hildenbrand
2025-06-30 10:41       ` Lorenzo Stoakes
2025-06-30 10:54         ` David Hildenbrand
2025-06-30 17:45   ` Zi Yan
2025-07-02  9:02   ` Oscar Salvador
2025-07-02  9:05     ` David Hildenbrand
2025-07-02  9:07       ` Oscar Salvador
2025-07-02  9:11         ` David Hildenbrand
2025-07-02  9:09   ` Oscar Salvador
2025-06-27 11:55 ` [PATCH v1 4/4] mm: remove boolean output parameters from folio_pte_batch_ext() David Hildenbrand
2025-06-27 14:34   ` Lance Yang
2025-06-27 15:11     ` David Hildenbrand
2025-06-27 15:40       ` Lance Yang
2025-06-27 19:04   ` Lorenzo Stoakes
2025-06-30  9:32     ` David Hildenbrand
2025-06-30 11:08       ` Lorenzo Stoakes
2025-06-30 11:16         ` David Hildenbrand
2025-06-30 11:18           ` Lorenzo Stoakes
2025-06-30 11:21             ` David Hildenbrand
2025-06-30 17:59   ` Zi Yan
2025-07-02  9:08     ` David Hildenbrand
2025-07-02  9:09       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250627115510.3273675-4-david@redhat.com \
    --to=david@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=byungchul@sk.com \
    --cc=gourry@gourry.net \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=pfalcato@suse.de \
    --cc=rakie.kim@sk.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=ying.huang@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.