* [PATCH 0/7] Optimize mprotect for large folios
@ 2025-04-28 12:04 Dev Jain
2025-04-28 12:04 ` [PATCH 1/7] mm: Refactor code in mprotect Dev Jain
` (8 more replies)
0 siblings, 9 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
This patchset optimizes the mprotect() system call for large folios
by PTE-batching.
We use the following test cases to measure performance, mprotect()'ing
the mapped memory to read-only then read-write 40 times:
Test case 1: Mapping 1G of memory, touching it to get PMD-THPs, then
pte-mapping those THPs
Test case 2: Mapping 1G of memory with 64K mTHPs
Test case 3: Mapping 1G of memory with 4K pages
Average execution time on arm64, Apple M3:
Before the patchset:
T1: 7.9 seconds T2: 7.9 seconds T3: 4.2 seconds
After the patchset:
T1: 2.1 seconds T2: 2.2 seconds T3: 4.2 seconds
Comparing T1/T2 with T3 before the patchset, we can see that the patchset
also removes the regression introduced by ptep_get() on a contpte block.
And, for large folios, we get an almost 276% performance improvement.
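For reference, here is a minimal userspace sketch of what test case 1 looks
like; the madvise() usage and the way the THPs are subsequently pte-mapped
are simplifications, not the exact harness:

	#include <string.h>
	#include <sys/mman.h>

	#define SZ	(1UL << 30)	/* 1G */

	int main(void)
	{
		char *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		madvise(p, SZ, MADV_HUGEPAGE);	/* ask for PMD-THPs */
		memset(p, 1, SZ);		/* touch to populate */
		/* for T1 the THPs are then pte-mapped (e.g. by splitting
		 * the PMD mappings); elided here */
		for (int i = 0; i < 40; i++) {
			mprotect(p, SZ, PROT_READ);
			mprotect(p, SZ, PROT_READ | PROT_WRITE);
		}
		return 0;
	}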
Dev Jain (7):
mm: Refactor code in mprotect
mm: Optimize mprotect() by batch-skipping PTEs
mm: Add batched versions of ptep_modify_prot_start/commit
arm64: Add batched version of ptep_modify_prot_start
arm64: Add batched version of ptep_modify_prot_commit
mm: Batch around can_change_pte_writable()
mm: Optimize mprotect() through PTE-batching
arch/arm64/include/asm/pgtable.h | 10 ++
arch/arm64/mm/mmu.c | 21 +++-
include/linux/mm.h | 4 +-
include/linux/pgtable.h | 42 ++++++++
mm/gup.c | 2 +-
mm/huge_memory.c | 4 +-
mm/memory.c | 6 +-
mm/mprotect.c | 163 +++++++++++++++++++++----------
mm/pgtable-generic.c | 16 ++-
9 files changed, 198 insertions(+), 70 deletions(-)
--
2.30.2
* [PATCH 1/7] mm: Refactor code in mprotect
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
@ 2025-04-28 12:04 ` Dev Jain
2025-04-28 12:04 ` [PATCH 2/7] mm: Optimize mprotect() by batch-skipping PTEs Dev Jain
` (7 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
Reduce indentation in change_pte_range() by refactoring some of the code
into a new function. No functional change.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
mm/mprotect.c | 116 +++++++++++++++++++++++++++++---------------------
1 file changed, 68 insertions(+), 48 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 62c1f7945741..8d635c7fc81f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -83,6 +83,71 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
return pte_dirty(pte);
}
+
+
+static bool prot_numa_skip(struct vm_area_struct *vma, struct folio *folio,
+ int target_node)
+{
+ bool toptier;
+ int nid;
+
+ /* Also skip shared copy-on-write pages */
+ if (is_cow_mapping(vma->vm_flags) &&
+ (folio_maybe_dma_pinned(folio) ||
+ folio_maybe_mapped_shared(folio)))
+ return true;
+
+ /*
+ * While migration can move some dirty pages,
+ * it cannot move them all from MIGRATE_ASYNC
+ * context.
+ */
+ if (folio_is_file_lru(folio) &&
+ folio_test_dirty(folio))
+ return true;
+
+ /*
+ * Don't mess with PTEs if page is already on the node
+ * a single-threaded process is running on.
+ */
+ nid = folio_nid(folio);
+ if (target_node == nid)
+ return true;
+ toptier = node_is_toptier(nid);
+
+ /*
+ * Skip scanning top tier node if normal numa
+ * balancing is disabled
+ */
+ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
+ toptier)
+ return true;
+ return false;
+}
+
+static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
+ unsigned long addr, pte_t oldpte, int target_node)
+{
+ struct folio *folio;
+ int ret;
+
+ /* Avoid TLB flush if possible */
+ if (pte_protnone(oldpte))
+ return true;
+
+ folio = vm_normal_folio(vma, addr, oldpte);
+ if (!folio || folio_is_zone_device(folio) ||
+ folio_test_ksm(folio))
+ return true;
+ ret = prot_numa_skip(vma, folio, target_node);
+ if (ret)
+ return ret;
+ if (folio_use_access_time(folio))
+ folio_xchg_access_time(folio,
+ jiffies_to_msecs(jiffies));
+ return false;
+}
+
static long change_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t newprot, unsigned long cp_flags)
@@ -116,56 +181,11 @@ static long change_pte_range(struct mmu_gather *tlb,
* Avoid trapping faults against the zero or KSM
* pages. See similar comment in change_huge_pmd.
*/
- if (prot_numa) {
- struct folio *folio;
- int nid;
- bool toptier;
-
- /* Avoid TLB flush if possible */
- if (pte_protnone(oldpte))
- continue;
-
- folio = vm_normal_folio(vma, addr, oldpte);
- if (!folio || folio_is_zone_device(folio) ||
- folio_test_ksm(folio))
- continue;
-
- /* Also skip shared copy-on-write pages */
- if (is_cow_mapping(vma->vm_flags) &&
- (folio_maybe_dma_pinned(folio) ||
- folio_maybe_mapped_shared(folio)))
- continue;
-
- /*
- * While migration can move some dirty pages,
- * it cannot move them all from MIGRATE_ASYNC
- * context.
- */
- if (folio_is_file_lru(folio) &&
- folio_test_dirty(folio))
+ if (prot_numa &&
+ prot_numa_avoid_fault(vma, addr,
+ oldpte, target_node))
continue;
- /*
- * Don't mess with PTEs if page is already on the node
- * a single-threaded process is running on.
- */
- nid = folio_nid(folio);
- if (target_node == nid)
- continue;
- toptier = node_is_toptier(nid);
-
- /*
- * Skip scanning top tier node if normal numa
- * balancing is disabled
- */
- if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
- toptier)
- continue;
- if (folio_use_access_time(folio))
- folio_xchg_access_time(folio,
- jiffies_to_msecs(jiffies));
- }
-
oldpte = ptep_modify_prot_start(vma, addr, pte);
ptent = pte_modify(oldpte, newprot);
--
2.30.2
* [PATCH 2/7] mm: Optimize mprotect() by batch-skipping PTEs
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
2025-04-28 12:04 ` [PATCH 1/7] mm: Refactor code in mprotect Dev Jain
@ 2025-04-28 12:04 ` Dev Jain
2025-04-28 12:04 ` [PATCH 3/7] mm: Add batched versions of ptep_modify_prot_start/commit Dev Jain
` (6 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
In the prot_numa case, there are various conditions under which we can skip
to the next iteration. Since the skip decision is based on the folio and not
on the individual PTEs, we can skip an entire PTE batch.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
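A note on the folio_pte_batch() call added below: it returns the number of
consecutive PTEs, starting at 'pte', that still map into 'folio', capped at
max_nr. Roughly (the trailing out-parameters are named here as I read them
in mm/internal.h; we do not need any of them):

	/* how many consecutive PTEs, starting at 'pte', map into 'folio'? */
	*nr = folio_pte_batch(folio, addr, pte, oldpte, max_nr, flags,
			      /* any_writable */ NULL,
			      /* any_young */ NULL,
			      /* any_dirty */ NULL);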
mm/mprotect.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 8d635c7fc81f..33eabc995584 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -91,6 +91,9 @@ static bool prot_numa_skip(struct vm_area_struct *vma, struct folio *folio,
bool toptier;
int nid;
+ if (folio_is_zone_device(folio) || folio_test_ksm(folio))
+ return true;
+
/* Also skip shared copy-on-write pages */
if (is_cow_mapping(vma->vm_flags) &&
(folio_maybe_dma_pinned(folio) ||
@@ -126,8 +129,10 @@ static bool prot_numa_skip(struct vm_area_struct *vma, struct folio *folio,
}
static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
- unsigned long addr, pte_t oldpte, int target_node)
+ unsigned long addr, pte_t *pte, pte_t oldpte, int target_node,
+ int max_nr, int *nr)
{
+ const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
struct folio *folio;
int ret;
@@ -136,12 +141,16 @@ static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
return true;
folio = vm_normal_folio(vma, addr, oldpte);
- if (!folio || folio_is_zone_device(folio) ||
- folio_test_ksm(folio))
+ if (!folio)
return true;
+
ret = prot_numa_skip(vma, folio, target_node);
- if (ret)
+ if (ret) {
+ if (folio_test_large(folio) && max_nr != 1)
+ *nr = folio_pte_batch(folio, addr, pte, oldpte,
+ max_nr, flags, NULL, NULL, NULL);
return ret;
+ }
if (folio_use_access_time(folio))
folio_xchg_access_time(folio,
jiffies_to_msecs(jiffies));
@@ -159,6 +168,7 @@ static long change_pte_range(struct mmu_gather *tlb,
bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+ int nr;
tlb_change_page_size(tlb, PAGE_SIZE);
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
@@ -173,8 +183,10 @@ static long change_pte_range(struct mmu_gather *tlb,
flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();
do {
+ nr = 1;
oldpte = ptep_get(pte);
if (pte_present(oldpte)) {
+ int max_nr = (end - addr) >> PAGE_SHIFT;
pte_t ptent;
/*
@@ -182,8 +194,9 @@ static long change_pte_range(struct mmu_gather *tlb,
* pages. See similar comment in change_huge_pmd.
*/
if (prot_numa &&
- prot_numa_avoid_fault(vma, addr,
- oldpte, target_node))
+ prot_numa_avoid_fault(vma, addr, pte,
+ oldpte, target_node,
+ max_nr, &nr))
continue;
oldpte = ptep_modify_prot_start(vma, addr, pte);
@@ -300,7 +313,7 @@ static long change_pte_range(struct mmu_gather *tlb,
pages++;
}
}
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ } while (pte += nr, addr += nr * PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
--
2.30.2
* [PATCH 3/7] mm: Add batched versions of ptep_modify_prot_start/commit
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
2025-04-28 12:04 ` [PATCH 1/7] mm: Refactor code in mprotect Dev Jain
2025-04-28 12:04 ` [PATCH 2/7] mm: Optimize mprotect() by batch-skipping PTEs Dev Jain
@ 2025-04-28 12:04 ` Dev Jain
2025-04-28 12:04 ` [PATCH 4/7] arm64: Add batched version of ptep_modify_prot_start Dev Jain
` (5 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
Batch ptep_modify_prot_start/commit in preparation for optimizing mprotect.
Architectures can override these helpers.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
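The intended calling pattern for the pair (this is how patch 7 ends up using
them; shown here only as a sketch):

	/* clear nr PTEs, accumulating the dirty/young bits into oldpte */
	oldpte = modify_prot_start_ptes(vma, addr, pte, nr);
	/* compute the new protection */
	ptent = pte_modify(oldpte, newprot);
	/* install nr PTEs again, with the pfn advanced for each one */
	modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr);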
include/linux/pgtable.h | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b50447ef1c92..ed287289335f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -891,6 +891,44 @@ static inline void wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
}
#endif
+/* See the comment for ptep_modify_prot_start */
+#ifndef modify_prot_start_ptes
+static inline pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+ pte_t pte, tmp_pte;
+
+ pte = ptep_modify_prot_start(vma, addr, ptep);
+ while (--nr) {
+ ptep++;
+ addr += PAGE_SIZE;
+ tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
+ if (pte_dirty(tmp_pte))
+ pte = pte_mkdirty(pte);
+ if (pte_young(tmp_pte))
+ pte = pte_mkyoung(pte);
+ }
+ return pte;
+}
+#endif
+
+/* See the comment for ptep_modify_prot_commit */
+#ifndef modify_prot_commit_ptes
+static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
+{
+ for (;;) {
+ ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
+ if (--nr == 0)
+ break;
+ ptep++;
+ addr += PAGE_SIZE;
+ old_pte = pte_next_pfn(old_pte);
+ pte = pte_next_pfn(pte);
+ }
+}
+#endif
+
/*
* On some architectures hardware does not set page access bit when accessing
* memory page, it is responsibility of software setting this bit. It brings
--
2.30.2
* [PATCH 4/7] arm64: Add batched version of ptep_modify_prot_start
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
` (2 preceding siblings ...)
2025-04-28 12:04 ` [PATCH 3/7] mm: Add batched versions of ptep_modify_prot_start/commit Dev Jain
@ 2025-04-28 12:04 ` Dev Jain
2025-04-28 18:06 ` Zi Yan
2025-04-28 12:04 ` [PATCH 5/7] arm64: Add batched version of ptep_modify_prot_commit Dev Jain
` (4 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
Override the generic definition to use get_and_clear_full_ptes(), so that
we potentially do a TLBI only on the "contpte-edges" of the large PTE block,
instead of on every contpte block, as happens with ptep_get_and_clear().
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
arch/arm64/include/asm/pgtable.h | 5 +++++
arch/arm64/mm/mmu.c | 12 +++++++++---
include/linux/pgtable.h | 4 ++++
mm/pgtable-generic.c | 16 +++++++++++-----
4 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2a77f11b78d5..8872ea5f0642 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1553,6 +1553,11 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t old_pte, pte_t new_pte);
+#define modify_prot_start_ptes modify_prot_start_ptes
+extern pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep,
+ unsigned int nr);
+
#ifdef CONFIG_ARM64_CONTPTE
/*
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8fcf59ba39db..fe60be8774f4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1523,7 +1523,8 @@ static int __init prevent_bootmem_remove_init(void)
early_initcall(prevent_bootmem_remove_init);
#endif
-pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
+pte_t modify_prot_start_ptes(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, unsigned int nr)
{
if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
/*
@@ -1532,9 +1533,14 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
* in cases where cpu is affected with errata #2645198.
*/
if (pte_user_exec(ptep_get(ptep)))
- return ptep_clear_flush(vma, addr, ptep);
+ return clear_flush_ptes(vma, addr, ptep, nr);
}
- return ptep_get_and_clear(vma->vm_mm, addr, ptep);
+ return get_and_clear_full_ptes(vma->vm_mm, addr, ptep, nr, 0);
+}
+
+pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
+{
+ return modify_prot_start_ptes(vma, addr, ptep, 1);
}
void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index ed287289335f..10cdb87ccecf 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -828,6 +828,10 @@ extern pte_t ptep_clear_flush(struct vm_area_struct *vma,
pte_t *ptep);
#endif
+extern pte_t clear_flush_ptes(struct vm_area_struct *vma,
+ unsigned long address,
+ pte_t *ptep, unsigned int nr);
+
#ifndef __HAVE_ARCH_PMDP_HUGE_CLEAR_FLUSH
extern pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
unsigned long address,
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 5a882f2b10f9..e238f88c3cac 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -90,17 +90,23 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
}
#endif
-#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
-pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
- pte_t *ptep)
+pte_t clear_flush_ptes(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, unsigned int nr)
{
struct mm_struct *mm = (vma)->vm_mm;
pte_t pte;
- pte = ptep_get_and_clear(mm, address, ptep);
+ pte = get_and_clear_full_ptes(mm, address, ptep, nr, 0);
if (pte_accessible(mm, pte))
- flush_tlb_page(vma, address);
+ flush_tlb_range(vma, address, address + nr * PAGE_SIZE);
return pte;
}
+
+#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
+pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep)
+{
+ return clear_flush_ptes(vma, address, ptep, 1);
+}
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
--
2.30.2
* [PATCH 5/7] arm64: Add batched version of ptep_modify_prot_commit
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
` (3 preceding siblings ...)
2025-04-28 12:04 ` [PATCH 4/7] arm64: Add batched version of ptep_modify_prot_start Dev Jain
@ 2025-04-28 12:04 ` Dev Jain
2025-04-28 12:04 ` [PATCH 6/7] mm: Batch around can_change_pte_writable() Dev Jain
` (3 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
Override the generic definition to simply use set_ptes() to map the new
PTEs into the page table.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
arch/arm64/include/asm/pgtable.h | 5 +++++
arch/arm64/mm/mmu.c | 9 ++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 8872ea5f0642..0b13ca38f80c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1558,6 +1558,11 @@ extern pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
unsigned int nr);
+#define modify_prot_commit_ptes modify_prot_commit_ptes
+extern void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t old_pte, pte_t pte,
+ unsigned int nr);
+
#ifdef CONFIG_ARM64_CONTPTE
/*
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index fe60be8774f4..5f04bcdcd946 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1543,10 +1543,17 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
return modify_prot_start_ptes(vma, addr, ptep, 1);
}
+void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t old_pte, pte_t pte,
+ unsigned int nr)
+{
+ set_ptes(vma->vm_mm, addr, ptep, pte, nr);
+}
+
void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
pte_t old_pte, pte_t pte)
{
- set_pte_at(vma->vm_mm, addr, ptep, pte);
+ modify_prot_commit_ptes(vma, addr, ptep, old_pte, pte, 1);
}
/*
--
2.30.2
* [PATCH 6/7] mm: Batch around can_change_pte_writable()
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
` (4 preceding siblings ...)
2025-04-28 12:04 ` [PATCH 5/7] arm64: Add batched version of ptep_modify_prot_commit Dev Jain
@ 2025-04-28 12:04 ` Dev Jain
2025-04-28 12:50 ` Lance Yang
2025-04-28 12:04 ` [PATCH 7/7] mm: Optimize mprotect() through PTE-batching Dev Jain
` (2 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
In preparation for patch 7, we need to properly batch around
can_change_pte_writable(). We batch around pte_needs_soft_dirty_wp() via
the corresponding fpb flag, and we batch around the page-anon exclusive
check using folio_maybe_mapped_shared(); modify_prot_start_ptes() collects
the dirty and access bits across the batch, therefore batching across
pte_dirty(): this is correct since the dirty bit on the PTE really
is just an indication that the folio got written to, so even if
the PTE is not actually dirty (but one of the PTEs in the batch is),
the wp-fault optimization can be made.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
include/linux/mm.h | 4 ++--
mm/gup.c | 2 +-
mm/huge_memory.c | 4 ++--
mm/memory.c | 6 +++---
mm/mprotect.c | 9 ++++++---
5 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5eb0d77c4438..ffa02e15863f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2710,8 +2710,8 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
#define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
MM_CP_UFFD_WP_RESOLVE)
-bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
- pte_t pte);
+bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
+ pte_t pte, struct folio *folio, unsigned int nr);
extern long change_protection(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
unsigned long end, unsigned long cp_flags);
diff --git a/mm/gup.c b/mm/gup.c
index 84461d384ae2..6a605fc5f2cb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -614,7 +614,7 @@ static inline bool can_follow_write_common(struct page *page,
return false;
/*
- * See can_change_pte_writable(): we broke COW and could map the page
+ * See can_change_ptes_writable(): we broke COW and could map the page
* writable if we have an exclusive anonymous page ...
*/
return page && PageAnon(page) && PageAnonExclusive(page);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 28c87e0e036f..e5496c0d9e7e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2032,12 +2032,12 @@ static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
return false;
if (!(vma->vm_flags & VM_SHARED)) {
- /* See can_change_pte_writable(). */
+ /* See can_change_ptes_writable(). */
page = vm_normal_page_pmd(vma, addr, pmd);
return page && PageAnon(page) && PageAnonExclusive(page);
}
- /* See can_change_pte_writable(). */
+ /* See can_change_ptes_writable(). */
return pmd_dirty(pmd);
}
diff --git a/mm/memory.c b/mm/memory.c
index b9e8443aaa86..b1fda3de8d27 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -750,7 +750,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
pte = pte_mkuffd_wp(pte);
if ((vma->vm_flags & VM_WRITE) &&
- can_change_pte_writable(vma, address, pte)) {
+ can_change_ptes_writable(vma, address, pte, NULL, 1)) {
if (folio_test_dirty(folio))
pte = pte_mkdirty(pte);
pte = pte_mkwrite(pte, vma);
@@ -5767,7 +5767,7 @@ static void numa_rebuild_large_mapping(struct vm_fault *vmf, struct vm_area_stru
ptent = pte_modify(ptent, vma->vm_page_prot);
writable = pte_write(ptent);
if (!writable && pte_write_upgrade &&
- can_change_pte_writable(vma, addr, ptent))
+ can_change_ptes_writable(vma, addr, ptent, NULL, 1))
writable = true;
}
@@ -5808,7 +5808,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
*/
writable = pte_write(pte);
if (!writable && pte_write_upgrade &&
- can_change_pte_writable(vma, vmf->address, pte))
+ can_change_ptes_writable(vma, vmf->address, pte, NULL, 1))
writable = true;
folio = vm_normal_folio(vma, vmf->address, pte);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 33eabc995584..362fd7e5457d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -40,8 +40,8 @@
#include "internal.h"
-bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
- pte_t pte)
+bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
+ pte_t pte, struct folio *folio, unsigned int nr)
{
struct page *page;
@@ -67,6 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
* write-fault handler similarly would map them writable without
* any additional checks while holding the PT lock.
*/
+ if (unlikely(nr != 1))
+ return !folio_maybe_mapped_shared(folio);
+
page = vm_normal_page(vma, addr, pte);
return page && PageAnon(page) && PageAnonExclusive(page);
}
@@ -222,7 +225,7 @@ static long change_pte_range(struct mmu_gather *tlb,
*/
if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
!pte_write(ptent) &&
- can_change_pte_writable(vma, addr, ptent))
+ can_change_ptes_writable(vma, addr, ptent, folio, 1))
ptent = pte_mkwrite(ptent, vma);
ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
--
2.30.2
* [PATCH 7/7] mm: Optimize mprotect() through PTE-batching
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
` (5 preceding siblings ...)
2025-04-28 12:04 ` [PATCH 6/7] mm: Batch around can_change_pte_writable() Dev Jain
@ 2025-04-28 12:04 ` Dev Jain
2025-04-28 12:52 ` [PATCH 0/7] Optimize mprotect for large folios Dev Jain
2025-04-28 13:31 ` Lance Yang
8 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:04 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy, Dev Jain
The common pte_present case does not require the folio. Elide the overhead of
vm_normal_folio() for the small-folio case by making an approximation: on
arm64, pte_batch_hint() is conclusive; on other arches, if the pfns pointed
to by the current and the next PTE are contiguous, check whether a large
folio is actually mapped, and only then apply the batch optimization.
Reuse the folio from the prot_numa case if possible. Since
modify_prot_start_ptes() gathers the access/dirty bits, it lets us batch
around pte_needs_flush() (on parisc, the definition includes the access bit).
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
mm/mprotect.c | 49 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 362fd7e5457d..d382d57bc796 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -131,7 +131,7 @@ static bool prot_numa_skip(struct vm_area_struct *vma, struct folio *folio,
return false;
}
-static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
+static struct folio *prot_numa_avoid_fault(struct vm_area_struct *vma,
unsigned long addr, pte_t *pte, pte_t oldpte, int target_node,
int max_nr, int *nr)
{
@@ -141,25 +141,37 @@ static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
/* Avoid TLB flush if possible */
if (pte_protnone(oldpte))
- return true;
+ return NULL;
folio = vm_normal_folio(vma, addr, oldpte);
if (!folio)
- return true;
+ return NULL;
ret = prot_numa_skip(vma, folio, target_node);
if (ret) {
if (folio_test_large(folio) && max_nr != 1)
*nr = folio_pte_batch(folio, addr, pte, oldpte,
max_nr, flags, NULL, NULL, NULL);
- return ret;
+ return NULL;
}
if (folio_use_access_time(folio))
folio_xchg_access_time(folio,
jiffies_to_msecs(jiffies));
- return false;
+ return folio;
}
+static bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
+{
+ pte_t *next_ptep, next_pte;
+
+ if (pte_batch_hint(ptep, pte) != 1)
+ return true;
+
+ next_ptep = ptep + 1;
+ next_pte = ptep_get(next_ptep);
+
+ return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == 1);
+}
static long change_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t newprot, unsigned long cp_flags)
@@ -190,19 +202,28 @@ static long change_pte_range(struct mmu_gather *tlb,
oldpte = ptep_get(pte);
if (pte_present(oldpte)) {
int max_nr = (end - addr) >> PAGE_SHIFT;
+ const fpb_t flags = FPB_IGNORE_DIRTY;
+ struct folio *folio = NULL;
pte_t ptent;
/*
* Avoid trapping faults against the zero or KSM
* pages. See similar comment in change_huge_pmd.
*/
- if (prot_numa &&
- prot_numa_avoid_fault(vma, addr, pte,
- oldpte, target_node,
- max_nr, &nr))
+ if (prot_numa) {
+ folio = prot_numa_avoid_fault(vma, addr, pte,
+ oldpte, target_node, max_nr, &nr);
+ if (!folio)
continue;
+ }
- oldpte = ptep_modify_prot_start(vma, addr, pte);
+ if (!folio && (max_nr != 1) && maybe_contiguous_pte_pfns(pte, oldpte)) {
+ folio = vm_normal_folio(vma, addr, oldpte);
+ if (folio && folio_test_large(folio))
+ nr = folio_pte_batch(folio, addr, pte,
+ oldpte, max_nr, flags, NULL, NULL, NULL);
+ }
+ oldpte = modify_prot_start_ptes(vma, addr, pte, nr);
ptent = pte_modify(oldpte, newprot);
if (uffd_wp)
@@ -225,13 +246,13 @@ static long change_pte_range(struct mmu_gather *tlb,
*/
if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
!pte_write(ptent) &&
- can_change_ptes_writable(vma, addr, ptent, folio, 1))
+ can_change_ptes_writable(vma, addr, ptent, folio, nr))
ptent = pte_mkwrite(ptent, vma);
- ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
+ modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr);
if (pte_needs_flush(oldpte, ptent))
- tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
- pages++;
+ tlb_flush_pte_range(tlb, addr, nr * PAGE_SIZE);
+ pages += nr;
} else if (is_swap_pte(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
pte_t newpte;
--
2.30.2
* Re: [PATCH 6/7] mm: Batch around can_change_pte_writable()
2025-04-28 12:04 ` [PATCH 6/7] mm: Batch around can_change_pte_writable() Dev Jain
@ 2025-04-28 12:50 ` Lance Yang
2025-04-28 12:59 ` Dev Jain
2025-04-28 13:16 ` Lance Yang
0 siblings, 2 replies; 19+ messages in thread
From: Lance Yang @ 2025-04-28 12:50 UTC (permalink / raw)
To: Dev Jain, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
Hey Dev,
On 2025/4/28 20:04, Dev Jain wrote:
> In preparation for patch 7, we need to properly batch around
> can_change_pte_writable(). We batch around pte_needs_soft_dirty_wp() by
> the corresponding fpb flag, we batch around the page-anon exclusive check
> using folio_maybe_mapped_shared(); modify_prot_start_ptes() collects the
> dirty and access bits across the batch, therefore batching across
> pte_dirty(): this is correct since the dirty bit on the PTE really
> is just an indication that the folio got written to, so even if
> the PTE is not actually dirty (but one of the PTEs in the batch is),
> the wp-fault optimization can be made.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> include/linux/mm.h | 4 ++--
> mm/gup.c | 2 +-
> mm/huge_memory.c | 4 ++--
> mm/memory.c | 6 +++---
> mm/mprotect.c | 9 ++++++---
> 5 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5eb0d77c4438..ffa02e15863f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2710,8 +2710,8 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
> MM_CP_UFFD_WP_RESOLVE)
>
> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> - pte_t pte);
> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
> + pte_t pte, struct folio *folio, unsigned int nr);
> extern long change_protection(struct mmu_gather *tlb,
> struct vm_area_struct *vma, unsigned long start,
> unsigned long end, unsigned long cp_flags);
> diff --git a/mm/gup.c b/mm/gup.c
> index 84461d384ae2..6a605fc5f2cb 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -614,7 +614,7 @@ static inline bool can_follow_write_common(struct page *page,
> return false;
>
> /*
> - * See can_change_pte_writable(): we broke COW and could map the page
> + * See can_change_ptes_writable(): we broke COW and could map the page
> * writable if we have an exclusive anonymous page ...
> */
> return page && PageAnon(page) && PageAnonExclusive(page);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 28c87e0e036f..e5496c0d9e7e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2032,12 +2032,12 @@ static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
> return false;
>
> if (!(vma->vm_flags & VM_SHARED)) {
> - /* See can_change_pte_writable(). */
> + /* See can_change_ptes_writable(). */
> page = vm_normal_page_pmd(vma, addr, pmd);
> return page && PageAnon(page) && PageAnonExclusive(page);
> }
>
> - /* See can_change_pte_writable(). */
> + /* See can_change_ptes_writable(). */
> return pmd_dirty(pmd);
> }
>
> diff --git a/mm/memory.c b/mm/memory.c
> index b9e8443aaa86..b1fda3de8d27 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -750,7 +750,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
> pte = pte_mkuffd_wp(pte);
>
> if ((vma->vm_flags & VM_WRITE) &&
> - can_change_pte_writable(vma, address, pte)) {
> + can_change_ptes_writable(vma, address, pte, NULL, 1)) {
> if (folio_test_dirty(folio))
> pte = pte_mkdirty(pte);
> pte = pte_mkwrite(pte, vma);
> @@ -5767,7 +5767,7 @@ static void numa_rebuild_large_mapping(struct vm_fault *vmf, struct vm_area_stru
> ptent = pte_modify(ptent, vma->vm_page_prot);
> writable = pte_write(ptent);
> if (!writable && pte_write_upgrade &&
> - can_change_pte_writable(vma, addr, ptent))
> + can_change_ptes_writable(vma, addr, ptent, NULL, 1))
> writable = true;
> }
>
> @@ -5808,7 +5808,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> */
> writable = pte_write(pte);
> if (!writable && pte_write_upgrade &&
> - can_change_pte_writable(vma, vmf->address, pte))
> + can_change_ptes_writable(vma, vmf->address, pte, NULL, 1))
> writable = true;
>
> folio = vm_normal_folio(vma, vmf->address, pte);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 33eabc995584..362fd7e5457d 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -40,8 +40,8 @@
>
> #include "internal.h"
>
> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> - pte_t pte)
> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
> + pte_t pte, struct folio *folio, unsigned int nr)
> {
> struct page *page;
>
> @@ -67,6 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> * write-fault handler similarly would map them writable without
> * any additional checks while holding the PT lock.
> */
> + if (unlikely(nr != 1))
> + return !folio_maybe_mapped_shared(folio);
> +
> page = vm_normal_page(vma, addr, pte);
> return page && PageAnon(page) && PageAnonExclusive(page);
> }
IIUC, as mentioned in the comment above, we should apply the same anonymous
check to large folios. And folio_maybe_mapped_shared() already handles both
order-0 and large folios nicely, so we could simplify the logic as follows:
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 1605e89349d2..df56a30bb241 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -43,8 +43,6 @@
 bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
                               pte_t pte, struct folio *folio, unsigned int nr)
 {
-        struct page *page;
-
         if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
                 return false;

@@ -67,11 +65,7 @@ bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
          * write-fault handler similarly would map them writable without
          * any additional checks while holding the PT lock.
          */
-        if (unlikely(nr != 1))
-                return !folio_maybe_mapped_shared(folio);
-
-        page = vm_normal_page(vma, addr, pte);
-        return page && PageAnon(page) && PageAnonExclusive(page);
+        return folio_test_anon(folio) && !folio_maybe_mapped_shared(folio);
 }

         VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
--
Thanks,
Lance
> @@ -222,7 +225,7 @@ static long change_pte_range(struct mmu_gather *tlb,
> */
> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
> !pte_write(ptent) &&
> - can_change_pte_writable(vma, addr, ptent))
> + can_change_ptes_writable(vma, addr, ptent, folio, 1))
> ptent = pte_mkwrite(ptent, vma);
>
> ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
* Re: [PATCH 0/7] Optimize mprotect for large folios
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
` (6 preceding siblings ...)
2025-04-28 12:04 ` [PATCH 7/7] mm: Optimize mprotect() through PTE-batching Dev Jain
@ 2025-04-28 12:52 ` Dev Jain
2025-04-28 13:31 ` Lance Yang
8 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:52 UTC (permalink / raw)
To: akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
On 28/04/25 5:34 pm, Dev Jain wrote:
> This patchset optimizes the mprotect() system call for large folios
> by PTE-batching.
>
> We use the following test cases to measure performance, mprotect()'ing
> the mapped memory to read-only then read-write 40 times:
>
> Test case 1: Mapping 1G of memory, touching it to get PMD-THPs, then
> pte-mapping those THPs
> Test case 2: Mapping 1G of memory with 64K mTHPs
> Test case 3: Mapping 1G of memory with 4K pages
>
> Average execution time on arm64, Apple M3:
> Before the patchset:
> T1: 7.9 seconds T2: 7.9 seconds T3: 4.2 seconds
>
> After the patchset:
> T1: 2.1 seconds T2: 2.2 seconds T3: 4.2 seconds
>
> Observing T1/T2 and T3 before the patchset, we also remove the regression
> introduced by ptep_get() on a contpte block. And, for large folios we get
> an almost 276% performance improvement.
Messed up the denominator, I mean ((7.9 - 2.1) / 7.9) * 100 = 73%.
>
> Dev Jain (7):
> mm: Refactor code in mprotect
> mm: Optimize mprotect() by batch-skipping PTEs
> mm: Add batched versions of ptep_modify_prot_start/commit
> arm64: Add batched version of ptep_modify_prot_start
> arm64: Add batched version of ptep_modify_prot_commit
> mm: Batch around can_change_pte_writable()
> mm: Optimize mprotect() through PTE-batching
>
> arch/arm64/include/asm/pgtable.h | 10 ++
> arch/arm64/mm/mmu.c | 21 +++-
> include/linux/mm.h | 4 +-
> include/linux/pgtable.h | 42 ++++++++
> mm/gup.c | 2 +-
> mm/huge_memory.c | 4 +-
> mm/memory.c | 6 +-
> mm/mprotect.c | 163 +++++++++++++++++++++----------
> mm/pgtable-generic.c | 16 ++-
> 9 files changed, 198 insertions(+), 70 deletions(-)
>
* Re: [PATCH 6/7] mm: Batch around can_change_pte_writable()
2025-04-28 12:50 ` Lance Yang
@ 2025-04-28 12:59 ` Dev Jain
2025-04-28 13:23 ` Lance Yang
2025-04-28 13:16 ` Lance Yang
1 sibling, 1 reply; 19+ messages in thread
From: Dev Jain @ 2025-04-28 12:59 UTC (permalink / raw)
To: Lance Yang, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
On 28/04/25 6:20 pm, Lance Yang wrote:
> Hey Dev,
>
> On 2025/4/28 20:04, Dev Jain wrote:
>> In preparation for patch 7, we need to properly batch around
>> can_change_pte_writable(). We batch around pte_needs_soft_dirty_wp() by
>> the corresponding fpb flag, we batch around the page-anon exclusive check
>> using folio_maybe_mapped_shared(); modify_prot_start_ptes() collects the
>> dirty and access bits across the batch, therefore batching across
>> pte_dirty(): this is correct since the dirty bit on the PTE really
>> is just an indication that the folio got written to, so even if
>> the PTE is not actually dirty (but one of the PTEs in the batch is),
>> the wp-fault optimization can be made.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>> include/linux/mm.h | 4 ++--
>> mm/gup.c | 2 +-
>> mm/huge_memory.c | 4 ++--
>> mm/memory.c | 6 +++---
>> mm/mprotect.c | 9 ++++++---
>> 5 files changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5eb0d77c4438..ffa02e15863f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2710,8 +2710,8 @@ int get_cmdline(struct task_struct *task, char
>> *buffer, int buflen);
>> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
>> MM_CP_UFFD_WP_RESOLVE)
>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> - pte_t pte);
>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> + pte_t pte, struct folio *folio, unsigned int nr);
>> extern long change_protection(struct mmu_gather *tlb,
>> struct vm_area_struct *vma, unsigned long start,
>> unsigned long end, unsigned long cp_flags);
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 84461d384ae2..6a605fc5f2cb 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -614,7 +614,7 @@ static inline bool can_follow_write_common(struct
>> page *page,
>> return false;
>> /*
>> - * See can_change_pte_writable(): we broke COW and could map the
>> page
>> + * See can_change_ptes_writable(): we broke COW and could map the
>> page
>> * writable if we have an exclusive anonymous page ...
>> */
>> return page && PageAnon(page) && PageAnonExclusive(page);
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 28c87e0e036f..e5496c0d9e7e 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2032,12 +2032,12 @@ static inline bool
>> can_change_pmd_writable(struct vm_area_struct *vma,
>> return false;
>> if (!(vma->vm_flags & VM_SHARED)) {
>> - /* See can_change_pte_writable(). */
>> + /* See can_change_ptes_writable(). */
>> page = vm_normal_page_pmd(vma, addr, pmd);
>> return page && PageAnon(page) && PageAnonExclusive(page);
>> }
>> - /* See can_change_pte_writable(). */
>> + /* See can_change_ptes_writable(). */
>> return pmd_dirty(pmd);
>> }
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b9e8443aaa86..b1fda3de8d27 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -750,7 +750,7 @@ static void restore_exclusive_pte(struct
>> vm_area_struct *vma,
>> pte = pte_mkuffd_wp(pte);
>> if ((vma->vm_flags & VM_WRITE) &&
>> - can_change_pte_writable(vma, address, pte)) {
>> + can_change_ptes_writable(vma, address, pte, NULL, 1)) {
>> if (folio_test_dirty(folio))
>> pte = pte_mkdirty(pte);
>> pte = pte_mkwrite(pte, vma);
>> @@ -5767,7 +5767,7 @@ static void numa_rebuild_large_mapping(struct
>> vm_fault *vmf, struct vm_area_stru
>> ptent = pte_modify(ptent, vma->vm_page_prot);
>> writable = pte_write(ptent);
>> if (!writable && pte_write_upgrade &&
>> - can_change_pte_writable(vma, addr, ptent))
>> + can_change_ptes_writable(vma, addr, ptent, NULL, 1))
>> writable = true;
>> }
>> @@ -5808,7 +5808,7 @@ static vm_fault_t do_numa_page(struct vm_fault
>> *vmf)
>> */
>> writable = pte_write(pte);
>> if (!writable && pte_write_upgrade &&
>> - can_change_pte_writable(vma, vmf->address, pte))
>> + can_change_ptes_writable(vma, vmf->address, pte, NULL, 1))
>> writable = true;
>> folio = vm_normal_folio(vma, vmf->address, pte);
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 33eabc995584..362fd7e5457d 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -40,8 +40,8 @@
>> #include "internal.h"
>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> - pte_t pte)
>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> + pte_t pte, struct folio *folio, unsigned int nr)
>> {
>> struct page *page;
>> @@ -67,6 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct
>> *vma, unsigned long addr,
>> * write-fault handler similarly would map them writable
>> without
>> * any additional checks while holding the PT lock.
>> */
>> + if (unlikely(nr != 1))
>> + return !folio_maybe_mapped_shared(folio);
>> +
>> page = vm_normal_page(vma, addr, pte);
>> return page && PageAnon(page) && PageAnonExclusive(page);
>> }
>
> IIUC, As mentioned in the comment above, we should do the same anonymous
> check
> to large folios. And folio_maybe_mapped_shared() already handles both
> order-0
> and large folios nicely, so we could simplify the logic as follows:
Thanks. Although we will have to call vm_normal_folio() in case of
!folio, since we may not have the folio already for the nr == 1 case.
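i.e. something along these lines (untested sketch):

	if (unlikely(!folio))
		folio = vm_normal_folio(vma, addr, pte);
	return folio && folio_test_anon(folio) &&
	       !folio_maybe_mapped_shared(folio);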
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 1605e89349d2..df56a30bb241 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -43,8 +43,6 @@
> bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
> long addr,
> pte_t pte, struct folio *folio, unsigned
> int nr)
> {
> - struct page *page;
> -
> if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
> return false;
>
> @@ -67,11 +65,7 @@ bool can_change_ptes_writable(struct vm_area_struct
> *vma, unsigned long addr,
> * write-fault handler similarly would map them
> writable without
> * any additional checks while holding the PT lock.
> */
> - if (unlikely(nr != 1))
> - return !folio_maybe_mapped_shared(folio);
> -
> - page = vm_normal_page(vma, addr, pte);
> - return page && PageAnon(page) && PageAnonExclusive(page);
> + return folio_test_anon(folio) && !
> folio_maybe_mapped_shared(folio);
> }
>
> VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
> --
>
> Thanks,
> Lance
>
>> @@ -222,7 +225,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>> */
>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>> !pte_write(ptent) &&
>> - can_change_pte_writable(vma, addr, ptent))
>> + can_change_ptes_writable(vma, addr, ptent, folio, 1))
>> ptent = pte_mkwrite(ptent, vma);
>> ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>
>
* Re: [PATCH 6/7] mm: Batch around can_change_pte_writable()
2025-04-28 12:50 ` Lance Yang
2025-04-28 12:59 ` Dev Jain
@ 2025-04-28 13:16 ` Lance Yang
2025-04-28 15:54 ` Lance Yang
1 sibling, 1 reply; 19+ messages in thread
From: Lance Yang @ 2025-04-28 13:16 UTC (permalink / raw)
To: Dev Jain, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
On 2025/4/28 20:50, Lance Yang wrote:
> Hey Dev,
>
> On 2025/4/28 20:04, Dev Jain wrote:
>> In preparation for patch 7, we need to properly batch around
>> can_change_pte_writable(). We batch around pte_needs_soft_dirty_wp() by
>> the corresponding fpb flag, we batch around the page-anon exclusive check
>> using folio_maybe_mapped_shared(); modify_prot_start_ptes() collects the
>> dirty and access bits across the batch, therefore batching across
>> pte_dirty(): this is correct since the dirty bit on the PTE really
>> is just an indication that the folio got written to, so even if
>> the PTE is not actually dirty (but one of the PTEs in the batch is),
>> the wp-fault optimization can be made.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>> include/linux/mm.h | 4 ++--
>> mm/gup.c | 2 +-
>> mm/huge_memory.c | 4 ++--
>> mm/memory.c | 6 +++---
>> mm/mprotect.c | 9 ++++++---
>> 5 files changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5eb0d77c4438..ffa02e15863f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2710,8 +2710,8 @@ int get_cmdline(struct task_struct *task, char
>> *buffer, int buflen);
>> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
>> MM_CP_UFFD_WP_RESOLVE)
>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> - pte_t pte);
>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> + pte_t pte, struct folio *folio, unsigned int nr);
>> extern long change_protection(struct mmu_gather *tlb,
>> struct vm_area_struct *vma, unsigned long start,
>> unsigned long end, unsigned long cp_flags);
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 84461d384ae2..6a605fc5f2cb 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -614,7 +614,7 @@ static inline bool can_follow_write_common(struct
>> page *page,
>> return false;
>> /*
>> - * See can_change_pte_writable(): we broke COW and could map the
>> page
>> + * See can_change_ptes_writable(): we broke COW and could map the
>> page
>> * writable if we have an exclusive anonymous page ...
>> */
>> return page && PageAnon(page) && PageAnonExclusive(page);
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 28c87e0e036f..e5496c0d9e7e 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2032,12 +2032,12 @@ static inline bool
>> can_change_pmd_writable(struct vm_area_struct *vma,
>> return false;
>> if (!(vma->vm_flags & VM_SHARED)) {
>> - /* See can_change_pte_writable(). */
>> + /* See can_change_ptes_writable(). */
>> page = vm_normal_page_pmd(vma, addr, pmd);
>> return page && PageAnon(page) && PageAnonExclusive(page);
>> }
>> - /* See can_change_pte_writable(). */
>> + /* See can_change_ptes_writable(). */
>> return pmd_dirty(pmd);
>> }
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b9e8443aaa86..b1fda3de8d27 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -750,7 +750,7 @@ static void restore_exclusive_pte(struct
>> vm_area_struct *vma,
>> pte = pte_mkuffd_wp(pte);
>> if ((vma->vm_flags & VM_WRITE) &&
>> - can_change_pte_writable(vma, address, pte)) {
>> + can_change_ptes_writable(vma, address, pte, NULL, 1)) {
>> if (folio_test_dirty(folio))
>> pte = pte_mkdirty(pte);
>> pte = pte_mkwrite(pte, vma);
>> @@ -5767,7 +5767,7 @@ static void numa_rebuild_large_mapping(struct
>> vm_fault *vmf, struct vm_area_stru
>> ptent = pte_modify(ptent, vma->vm_page_prot);
>> writable = pte_write(ptent);
>> if (!writable && pte_write_upgrade &&
>> - can_change_pte_writable(vma, addr, ptent))
>> + can_change_ptes_writable(vma, addr, ptent, NULL, 1))
>> writable = true;
>> }
>> @@ -5808,7 +5808,7 @@ static vm_fault_t do_numa_page(struct vm_fault
>> *vmf)
>> */
>> writable = pte_write(pte);
>> if (!writable && pte_write_upgrade &&
>> - can_change_pte_writable(vma, vmf->address, pte))
>> + can_change_ptes_writable(vma, vmf->address, pte, NULL, 1))
>> writable = true;
>> folio = vm_normal_folio(vma, vmf->address, pte);
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 33eabc995584..362fd7e5457d 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -40,8 +40,8 @@
>> #include "internal.h"
>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> - pte_t pte)
>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> + pte_t pte, struct folio *folio, unsigned int nr)
>> {
>> struct page *page;
>> @@ -67,6 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct
>> *vma, unsigned long addr,
>> * write-fault handler similarly would map them writable
>> without
>> * any additional checks while holding the PT lock.
>> */
>> + if (unlikely(nr != 1))
>> + return !folio_maybe_mapped_shared(folio);
>> +
>> page = vm_normal_page(vma, addr, pte);
>> return page && PageAnon(page) && PageAnonExclusive(page);
>> }
>
> IIUC, As mentioned in the comment above, we should do the same anonymous
> check
> to large folios. And folio_maybe_mapped_shared() already handles both
> order-0
> and large folios nicely, so we could simplify the logic as follows:
Forgot to add:
Note that the exclusive flag is set only for non-large folios or the head
page of large folios during mapping, so PageAnonExclusive() will always
return false for tail pages of large folios, IIUC.
Thanks,
Lance
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 1605e89349d2..df56a30bb241 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -43,8 +43,6 @@
> bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
> long addr,
> pte_t pte, struct folio *folio, unsigned
> int nr)
> {
> - struct page *page;
> -
> if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
> return false;
>
> @@ -67,11 +65,7 @@ bool can_change_ptes_writable(struct vm_area_struct
> *vma, unsigned long addr,
> * write-fault handler similarly would map them
> writable without
> * any additional checks while holding the PT lock.
> */
> - if (unlikely(nr != 1))
> - return !folio_maybe_mapped_shared(folio);
> -
> - page = vm_normal_page(vma, addr, pte);
> - return page && PageAnon(page) && PageAnonExclusive(page);
> + return folio_test_anon(folio) && !
> folio_maybe_mapped_shared(folio);
> }
>
> VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
> --
>
> Thanks,
> Lance
>
>> @@ -222,7 +225,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>> */
>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>> !pte_write(ptent) &&
>> - can_change_pte_writable(vma, addr, ptent))
>> + can_change_ptes_writable(vma, addr, ptent, folio, 1))
>> ptent = pte_mkwrite(ptent, vma);
>> ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>
* Re: [PATCH 6/7] mm: Batch around can_change_pte_writable()
2025-04-28 12:59 ` Dev Jain
@ 2025-04-28 13:23 ` Lance Yang
2025-04-29 4:59 ` Dev Jain
0 siblings, 1 reply; 19+ messages in thread
From: Lance Yang @ 2025-04-28 13:23 UTC (permalink / raw)
To: Dev Jain, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
On 2025/4/28 20:59, Dev Jain wrote:
>
>
> On 28/04/25 6:20 pm, Lance Yang wrote:
>> Hey Dev,
>>
>> On 2025/4/28 20:04, Dev Jain wrote:
>>> In preparation for patch 7, we need to properly batch around
>>> can_change_pte_writable(). We batch around pte_needs_soft_dirty_wp() by
>>> the corresponding fpb flag, we batch around the page-anon exclusive
>>> check
>>> using folio_maybe_mapped_shared(); modify_prot_start_ptes() collects the
>>> dirty and access bits across the batch, therefore batching across
>>> pte_dirty(): this is correct since the dirty bit on the PTE really
>>> is just an indication that the folio got written to, so even if
>>> the PTE is not actually dirty (but one of the PTEs in the batch is),
>>> the wp-fault optimization can be made.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>> include/linux/mm.h | 4 ++--
>>> mm/gup.c | 2 +-
>>> mm/huge_memory.c | 4 ++--
>>> mm/memory.c | 6 +++---
>>> mm/mprotect.c | 9 ++++++---
>>> 5 files changed, 14 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 5eb0d77c4438..ffa02e15863f 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -2710,8 +2710,8 @@ int get_cmdline(struct task_struct *task, char
>>> *buffer, int buflen);
>>> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
>>> MM_CP_UFFD_WP_RESOLVE)
>>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> - pte_t pte);
>>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> + pte_t pte, struct folio *folio, unsigned int nr);
>>> extern long change_protection(struct mmu_gather *tlb,
>>> struct vm_area_struct *vma, unsigned long start,
>>> unsigned long end, unsigned long cp_flags);
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 84461d384ae2..6a605fc5f2cb 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -614,7 +614,7 @@ static inline bool can_follow_write_common(struct
>>> page *page,
>>> return false;
>>> /*
>>> - * See can_change_pte_writable(): we broke COW and could map the
>>> page
>>> + * See can_change_ptes_writable(): we broke COW and could map
>>> the page
>>> * writable if we have an exclusive anonymous page ...
>>> */
>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 28c87e0e036f..e5496c0d9e7e 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2032,12 +2032,12 @@ static inline bool
>>> can_change_pmd_writable(struct vm_area_struct *vma,
>>> return false;
>>> if (!(vma->vm_flags & VM_SHARED)) {
>>> - /* See can_change_pte_writable(). */
>>> + /* See can_change_ptes_writable(). */
>>> page = vm_normal_page_pmd(vma, addr, pmd);
>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>> }
>>> - /* See can_change_pte_writable(). */
>>> + /* See can_change_ptes_writable(). */
>>> return pmd_dirty(pmd);
>>> }
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index b9e8443aaa86..b1fda3de8d27 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -750,7 +750,7 @@ static void restore_exclusive_pte(struct
>>> vm_area_struct *vma,
>>> pte = pte_mkuffd_wp(pte);
>>> if ((vma->vm_flags & VM_WRITE) &&
>>> - can_change_pte_writable(vma, address, pte)) {
>>> + can_change_ptes_writable(vma, address, pte, NULL, 1)) {
>>> if (folio_test_dirty(folio))
>>> pte = pte_mkdirty(pte);
>>> pte = pte_mkwrite(pte, vma);
>>> @@ -5767,7 +5767,7 @@ static void numa_rebuild_large_mapping(struct
>>> vm_fault *vmf, struct vm_area_stru
>>> ptent = pte_modify(ptent, vma->vm_page_prot);
>>> writable = pte_write(ptent);
>>> if (!writable && pte_write_upgrade &&
>>> - can_change_pte_writable(vma, addr, ptent))
>>> + can_change_ptes_writable(vma, addr, ptent, NULL, 1))
>>> writable = true;
>>> }
>>> @@ -5808,7 +5808,7 @@ static vm_fault_t do_numa_page(struct vm_fault
>>> *vmf)
>>> */
>>> writable = pte_write(pte);
>>> if (!writable && pte_write_upgrade &&
>>> - can_change_pte_writable(vma, vmf->address, pte))
>>> + can_change_ptes_writable(vma, vmf->address, pte, NULL, 1))
>>> writable = true;
>>> folio = vm_normal_folio(vma, vmf->address, pte);
>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>> index 33eabc995584..362fd7e5457d 100644
>>> --- a/mm/mprotect.c
>>> +++ b/mm/mprotect.c
>>> @@ -40,8 +40,8 @@
>>> #include "internal.h"
>>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> - pte_t pte)
>>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> + pte_t pte, struct folio *folio, unsigned int nr)
>>> {
>>> struct page *page;
>>> @@ -67,6 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct
>>> *vma, unsigned long addr,
>>> * write-fault handler similarly would map them writable
>>> without
>>> * any additional checks while holding the PT lock.
>>> */
>>> + if (unlikely(nr != 1))
>>> + return !folio_maybe_mapped_shared(folio);
>>> +
>>> page = vm_normal_page(vma, addr, pte);
>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>> }
>>
>> IIUC, as mentioned in the comment above, we should apply the same
>> anonymous check to large folios. And folio_maybe_mapped_shared()
>> already handles both order-0 and large folios nicely, so we could
>> simplify the logic as follows:
>
> Thanks. We will have to call vm_normal_folio() in the !folio case, though,
> since we may not already have the folio when nr == 1.
Ah, I see. Should we still check folio_test_anon() when nr != 1?
Thanks,
Lance
>
>>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 1605e89349d2..df56a30bb241 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -43,8 +43,6 @@
>> bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> pte_t pte, struct folio *folio,
>> unsigned int nr)
>> {
>> - struct page *page;
>> -
>> if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
>> return false;
>>
>> @@ -67,11 +65,7 @@ bool can_change_ptes_writable(struct vm_area_struct
>> *vma, unsigned long addr,
>> * write-fault handler similarly would map them
>> writable without
>> * any additional checks while holding the PT lock.
>> */
>> - if (unlikely(nr != 1))
>> - return !folio_maybe_mapped_shared(folio);
>> -
>> - page = vm_normal_page(vma, addr, pte);
>> - return page && PageAnon(page) && PageAnonExclusive(page);
>> +       return folio_test_anon(folio) && !folio_maybe_mapped_shared(folio);
>> }
>>
>> VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
>> --
>>
>> Thanks,
>> Lance
>>
>>> @@ -222,7 +225,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>>> */
>>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>>> !pte_write(ptent) &&
>>> - can_change_pte_writable(vma, addr, ptent))
>>> + can_change_ptes_writable(vma, addr, ptent, folio, 1))
>>> ptent = pte_mkwrite(ptent, vma);
>>> ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>>
>>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/7] Optimize mprotect for large folios
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
` (7 preceding siblings ...)
2025-04-28 12:52 ` [PATCH 0/7] Optimize mprotect for large folios Dev Jain
@ 2025-04-28 13:31 ` Lance Yang
2025-04-29 4:40 ` Dev Jain
8 siblings, 1 reply; 19+ messages in thread
From: Lance Yang @ 2025-04-28 13:31 UTC (permalink / raw)
To: Dev Jain, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
I'm hitting the following compilation errors after applying this patch
series:
In file included from ./include/linux/kasan.h:37,
from ./include/linux/slab.h:260,
from ./include/linux/crypto.h:19,
from arch/x86/kernel/asm-offsets.c:9:
./include/linux/pgtable.h: In function ‘modify_prot_start_ptes’:
./include/linux/pgtable.h:905:15: error: implicit declaration of
function ‘ptep_modify_prot_start’ [-Werror=implicit-function-declaration]
905 | pte = ptep_modify_prot_start(vma, addr, ptep);
| ^~~~~~~~~~~~~~~~~~~~~~
./include/linux/pgtable.h:905:15: error: incompatible types when
assigning to type ‘pte_t’ from type ‘int’
./include/linux/pgtable.h:909:27: error: incompatible types when
assigning to type ‘pte_t’ from type ‘int’
909 | tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
| ^~~~~~~~~~~~~~~~~~~~~~
./include/linux/pgtable.h: In function ‘modify_prot_commit_ptes’:
./include/linux/pgtable.h:925:17: error: implicit declaration of
function ‘ptep_modify_prot_commit’ [-Werror=implicit-function-declaration]
925 | ptep_modify_prot_commit(vma, addr, ptep,
old_pte, pte);
| ^~~~~~~~~~~~~~~~~~~~~~~
./include/linux/pgtable.h: At top level:
./include/linux/pgtable.h:1360:21: error: conflicting types for
‘ptep_modify_prot_start’; have ‘pte_t(struct vm_area_struct *, long
unsigned int, pte_t *)’
1360 | static inline pte_t ptep_modify_prot_start(struct
vm_area_struct *vma,
| ^~~~~~~~~~~~~~~~~~~~~~
./include/linux/pgtable.h:905:15: note: previous implicit declaration of
‘ptep_modify_prot_start’ with type ‘int()’
905 | pte = ptep_modify_prot_start(vma, addr, ptep);
| ^~~~~~~~~~~~~~~~~~~~~~
./include/linux/pgtable.h:1371:20: warning: conflicting types for
‘ptep_modify_prot_commit’; have ‘void(struct vm_area_struct *, long
unsigned int, pte_t *, pte_t, pte_t)’
1371 | static inline void ptep_modify_prot_commit(struct
vm_area_struct *vma,
| ^~~~~~~~~~~~~~~~~~~~~~~
./include/linux/pgtable.h:1371:20: error: static declaration of
‘ptep_modify_prot_commit’ follows non-static declaration
./include/linux/pgtable.h:925:17: note: previous implicit declaration of
‘ptep_modify_prot_commit’ with type ‘void(struct vm_area_struct *, long
unsigned int, pte_t *, pte_t, pte_t)’
925 | ptep_modify_prot_commit(vma, addr, ptep,
old_pte, pte);
| ^~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:98: arch/x86/kernel/asm-offsets.s]
Error 1
make[1]: ***
[/home/runner/work/mm-test-robot/mm-test-robot/linux/Makefile:1280:
prepare0] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Based on:
mm-unstable b18dec6a6ad3d051dadc3c16fb838e4abddf8d3c ("mm/numa: remove
unnecessary local variable in alloc_node_data()")
Thanks,
Lance
On 2025/4/28 20:04, Dev Jain wrote:
> This patchset optimizes the mprotect() system call for large folios
> by PTE-batching.
>
> We use the following test cases to measure performance, mprotect()'ing
> the mapped memory to read-only then read-write 40 times:
>
> Test case 1: Mapping 1G of memory, touching it to get PMD-THPs, then
> pte-mapping those THPs
> Test case 2: Mapping 1G of memory with 64K mTHPs
> Test case 3: Mapping 1G of memory with 4K pages
>
> Average execution time on arm64, Apple M3:
> Before the patchset:
> T1: 7.9 seconds T2: 7.9 seconds T3: 4.2 seconds
>
> After the patchset:
> T1: 2.1 seconds T2: 2.2 seconds T3: 4.2 seconds
>
> Observing T1/T2 and T3 before the patchset, we also remove the regression
> introduced by ptep_get() on a contpte block. And, for large folios we get
> an almost 276% performance improvement.
>
> Dev Jain (7):
> mm: Refactor code in mprotect
> mm: Optimize mprotect() by batch-skipping PTEs
> mm: Add batched versions of ptep_modify_prot_start/commit
> arm64: Add batched version of ptep_modify_prot_start
> arm64: Add batched version of ptep_modify_prot_commit
> mm: Batch around can_change_pte_writable()
> mm: Optimize mprotect() through PTE-batching
>
> arch/arm64/include/asm/pgtable.h | 10 ++
> arch/arm64/mm/mmu.c | 21 +++-
> include/linux/mm.h | 4 +-
> include/linux/pgtable.h | 42 ++++++++
> mm/gup.c | 2 +-
> mm/huge_memory.c | 4 +-
> mm/memory.c | 6 +-
> mm/mprotect.c | 163 +++++++++++++++++++++----------
> mm/pgtable-generic.c | 16 ++-
> 9 files changed, 198 insertions(+), 70 deletions(-)
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 6/7] mm: Batch around can_change_pte_writable()
2025-04-28 13:16 ` Lance Yang
@ 2025-04-28 15:54 ` Lance Yang
0 siblings, 0 replies; 19+ messages in thread
From: Lance Yang @ 2025-04-28 15:54 UTC (permalink / raw)
To: Dev Jain, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
On 2025/4/28 21:16, Lance Yang wrote:
>
>
> On 2025/4/28 20:50, Lance Yang wrote:
>> Hey Dev,
>>
>> On 2025/4/28 20:04, Dev Jain wrote:
>>> In preparation for patch 7, we need to properly batch around
>>> can_change_pte_writable(). We batch around pte_needs_soft_dirty_wp() by
>>> the corresponding fpb flag, we batch around the page-anon exclusive
>>> check
>>> using folio_maybe_mapped_shared(); modify_prot_start_ptes() collects the
>>> dirty and access bits across the batch, therefore batching across
>>> pte_dirty(): this is correct since the dirty bit on the PTE really
>>> is just an indication that the folio got written to, so even if
>>> the PTE is not actually dirty (but one of the PTEs in the batch is),
>>> the wp-fault optimization can be made.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>> include/linux/mm.h | 4 ++--
>>> mm/gup.c | 2 +-
>>> mm/huge_memory.c | 4 ++--
>>> mm/memory.c | 6 +++---
>>> mm/mprotect.c | 9 ++++++---
>>> 5 files changed, 14 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 5eb0d77c4438..ffa02e15863f 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -2710,8 +2710,8 @@ int get_cmdline(struct task_struct *task, char
>>> *buffer, int buflen);
>>> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
>>> MM_CP_UFFD_WP_RESOLVE)
>>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> - pte_t pte);
>>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> + pte_t pte, struct folio *folio, unsigned int nr);
>>> extern long change_protection(struct mmu_gather *tlb,
>>> struct vm_area_struct *vma, unsigned long start,
>>> unsigned long end, unsigned long cp_flags);
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 84461d384ae2..6a605fc5f2cb 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -614,7 +614,7 @@ static inline bool can_follow_write_common(struct
>>> page *page,
>>> return false;
>>> /*
>>> - * See can_change_pte_writable(): we broke COW and could map the
>>> page
>>> + * See can_change_ptes_writable(): we broke COW and could map
>>> the page
>>> * writable if we have an exclusive anonymous page ...
>>> */
>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 28c87e0e036f..e5496c0d9e7e 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2032,12 +2032,12 @@ static inline bool
>>> can_change_pmd_writable(struct vm_area_struct *vma,
>>> return false;
>>> if (!(vma->vm_flags & VM_SHARED)) {
>>> - /* See can_change_pte_writable(). */
>>> + /* See can_change_ptes_writable(). */
>>> page = vm_normal_page_pmd(vma, addr, pmd);
>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>> }
>>> - /* See can_change_pte_writable(). */
>>> + /* See can_change_ptes_writable(). */
>>> return pmd_dirty(pmd);
>>> }
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index b9e8443aaa86..b1fda3de8d27 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -750,7 +750,7 @@ static void restore_exclusive_pte(struct
>>> vm_area_struct *vma,
>>> pte = pte_mkuffd_wp(pte);
>>> if ((vma->vm_flags & VM_WRITE) &&
>>> - can_change_pte_writable(vma, address, pte)) {
>>> + can_change_ptes_writable(vma, address, pte, NULL, 1)) {
>>> if (folio_test_dirty(folio))
>>> pte = pte_mkdirty(pte);
>>> pte = pte_mkwrite(pte, vma);
>>> @@ -5767,7 +5767,7 @@ static void numa_rebuild_large_mapping(struct
>>> vm_fault *vmf, struct vm_area_stru
>>> ptent = pte_modify(ptent, vma->vm_page_prot);
>>> writable = pte_write(ptent);
>>> if (!writable && pte_write_upgrade &&
>>> - can_change_pte_writable(vma, addr, ptent))
>>> + can_change_ptes_writable(vma, addr, ptent, NULL, 1))
>>> writable = true;
>>> }
>>> @@ -5808,7 +5808,7 @@ static vm_fault_t do_numa_page(struct vm_fault
>>> *vmf)
>>> */
>>> writable = pte_write(pte);
>>> if (!writable && pte_write_upgrade &&
>>> - can_change_pte_writable(vma, vmf->address, pte))
>>> + can_change_ptes_writable(vma, vmf->address, pte, NULL, 1))
>>> writable = true;
>>> folio = vm_normal_folio(vma, vmf->address, pte);
>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>> index 33eabc995584..362fd7e5457d 100644
>>> --- a/mm/mprotect.c
>>> +++ b/mm/mprotect.c
>>> @@ -40,8 +40,8 @@
>>> #include "internal.h"
>>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> - pte_t pte)
>>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> + pte_t pte, struct folio *folio, unsigned int nr)
>>> {
>>> struct page *page;
>>> @@ -67,6 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct
>>> *vma, unsigned long addr,
>>> * write-fault handler similarly would map them writable
>>> without
>>> * any additional checks while holding the PT lock.
>>> */
>>> + if (unlikely(nr != 1))
>>> + return !folio_maybe_mapped_shared(folio);
>>> +
>>> page = vm_normal_page(vma, addr, pte);
>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>> }
>>
>> IIUC, as mentioned in the comment above, we should apply the same
>> anonymous check to large folios. And folio_maybe_mapped_shared()
>> already handles both order-0 and large folios nicely, so we could
>> simplify the logic as follows:
>
> Forgot to add:
>
> Note that the exclusive flag is set only for non-large folios or the head
> page of large folios during mapping, so PageAnonExclusive() will always
> return false for tail pages of large folios, IIUC.
Correction: the exclusive flag would be set for all subpages of large
folios during mapping.
Thanks,
Lance
>
> Thanks,
> Lance
>
>>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 1605e89349d2..df56a30bb241 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -43,8 +43,6 @@
>> bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>> long addr,
>> pte_t pte, struct folio *folio,
>> unsigned int nr)
>> {
>> - struct page *page;
>> -
>> if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
>> return false;
>>
>> @@ -67,11 +65,7 @@ bool can_change_ptes_writable(struct vm_area_struct
>> *vma, unsigned long addr,
>> * write-fault handler similarly would map them
>> writable without
>> * any additional checks while holding the PT lock.
>> */
>> - if (unlikely(nr != 1))
>> - return !folio_maybe_mapped_shared(folio);
>> -
>> - page = vm_normal_page(vma, addr, pte);
>> - return page && PageAnon(page) && PageAnonExclusive(page);
>> +       return folio_test_anon(folio) && !folio_maybe_mapped_shared(folio);
>> }
>>
>> VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
>> --
>>
>> Thanks,
>> Lance
>>
>>> @@ -222,7 +225,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>>> */
>>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>>> !pte_write(ptent) &&
>>> - can_change_pte_writable(vma, addr, ptent))
>>> + can_change_ptes_writable(vma, addr, ptent, folio, 1))
>>> ptent = pte_mkwrite(ptent, vma);
>>> ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/7] arm64: Add batched version of ptep_modify_prot_start
2025-04-28 12:04 ` [PATCH 4/7] arm64: Add batched version of ptep_modify_prot_start Dev Jain
@ 2025-04-28 18:06 ` Zi Yan
2025-04-29 4:44 ` Dev Jain
0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2025-04-28 18:06 UTC (permalink / raw)
To: Dev Jain
Cc: akpm, ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang
On 28 Apr 2025, at 8:04, Dev Jain wrote:
> Override the generic definition to use get_and_clear_full_ptes(), so that
> we do a TLBI possibly only on the "contpte-edges" of the large PTE block,
What do you mean by “contpte-edges”? Can you provide an example?
> instead of doing it for every contpte block, which happens for ptep_get_and_clear().
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> arch/arm64/include/asm/pgtable.h | 5 +++++
> arch/arm64/mm/mmu.c | 12 +++++++++---
> include/linux/pgtable.h | 4 ++++
> mm/pgtable-generic.c | 16 +++++++++++-----
> 4 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 2a77f11b78d5..8872ea5f0642 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1553,6 +1553,11 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t old_pte, pte_t new_pte);
>
> +#define modify_prot_start_ptes modify_prot_start_ptes
> +extern pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep,
> + unsigned int nr);
> +
> #ifdef CONFIG_ARM64_CONTPTE
>
> /*
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 8fcf59ba39db..fe60be8774f4 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1523,7 +1523,8 @@ static int __init prevent_bootmem_remove_init(void)
> early_initcall(prevent_bootmem_remove_init);
> #endif
>
> -pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
> +pte_t modify_prot_start_ptes(struct vm_area_struct *vma, unsigned long addr,
> + pte_t *ptep, unsigned int nr)
Putting ptes at the end seems to break the naming convention. How about
ptep_modify_prot_range_start? ptes_modify_prot_start might be OK too.
> {
> if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
> /*
> @@ -1532,9 +1533,14 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
> * in cases where cpu is affected with errata #2645198.
> */
> if (pte_user_exec(ptep_get(ptep)))
> - return ptep_clear_flush(vma, addr, ptep);
> + return clear_flush_ptes(vma, addr, ptep, nr);
> }
> - return ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + return get_and_clear_full_ptes(vma->vm_mm, addr, ptep, nr, 0);
> +}
> +
> +pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
> +{
> + return modify_prot_start_ptes(vma, addr, ptep, 1);
> }
>
> void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index ed287289335f..10cdb87ccecf 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -828,6 +828,10 @@ extern pte_t ptep_clear_flush(struct vm_area_struct *vma,
> pte_t *ptep);
> #endif
>
> +extern pte_t clear_flush_ptes(struct vm_area_struct *vma,
> + unsigned long address,
> + pte_t *ptep, unsigned int nr);
> +
> #ifndef __HAVE_ARCH_PMDP_HUGE_CLEAR_FLUSH
> extern pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
> unsigned long address,
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 5a882f2b10f9..e238f88c3cac 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -90,17 +90,23 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
> }
> #endif
>
> -#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
> -pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
> - pte_t *ptep)
> +pte_t clear_flush_ptes(struct vm_area_struct *vma, unsigned long address,
> + pte_t *ptep, unsigned int nr)
> {
Ditto.
> struct mm_struct *mm = (vma)->vm_mm;
> pte_t pte;
> - pte = ptep_get_and_clear(mm, address, ptep);
> + pte = get_and_clear_full_ptes(mm, address, ptep, nr, 0);
> if (pte_accessible(mm, pte))
> - flush_tlb_page(vma, address);
> + flush_tlb_range(vma, address, address + nr * PAGE_SIZE);
> return pte;
> }
> +
> +#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
> +pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
> + pte_t *ptep)
> +{
> + return clear_flush_ptes(vma, address, ptep, 1);
> +}
> #endif
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> --
> 2.30.2
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/7] Optimize mprotect for large folios
2025-04-28 13:31 ` Lance Yang
@ 2025-04-29 4:40 ` Dev Jain
0 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-29 4:40 UTC (permalink / raw)
To: Lance Yang, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
On 28/04/25 7:01 pm, Lance Yang wrote:
> I'm hitting the following compilation errors after applying this patch
> series:
Not sure why that is. I cherry-picked my commits onto
6ebffe676fcf8d259e3fb5d5fbf1a8227f22182c and the kernel builds for me.
Let me send a v2 rebased onto this.
>
> In file included from ./include/linux/kasan.h:37,
> from ./include/linux/slab.h:260,
> from ./include/linux/crypto.h:19,
> from arch/x86/kernel/asm-offsets.c:9:
> ./include/linux/pgtable.h: In function ‘modify_prot_start_ptes’:
> ./include/linux/pgtable.h:905:15: error: implicit declaration of
> function ‘ptep_modify_prot_start’ [-Werror=implicit-function-declaration]
> 905 | pte = ptep_modify_prot_start(vma, addr, ptep);
> | ^~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/pgtable.h:905:15: error: incompatible types when
> assigning to type ‘pte_t’ from type ‘int’
> ./include/linux/pgtable.h:909:27: error: incompatible types when
> assigning to type ‘pte_t’ from type ‘int’
> 909 | tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
> | ^~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/pgtable.h: In function ‘modify_prot_commit_ptes’:
> ./include/linux/pgtable.h:925:17: error: implicit declaration of
> function ‘ptep_modify_prot_commit’ [-Werror=implicit-function-declaration]
> 925 | ptep_modify_prot_commit(vma, addr, ptep,
> old_pte, pte);
> | ^~~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/pgtable.h: At top level:
> ./include/linux/pgtable.h:1360:21: error: conflicting types for
> ‘ptep_modify_prot_start’; have ‘pte_t(struct vm_area_struct *, long
> unsigned int, pte_t *)’
> 1360 | static inline pte_t ptep_modify_prot_start(struct
> vm_area_struct *vma,
> | ^~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/pgtable.h:905:15: note: previous implicit declaration of
> ‘ptep_modify_prot_start’ with type ‘int()’
> 905 | pte = ptep_modify_prot_start(vma, addr, ptep);
> | ^~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/pgtable.h:1371:20: warning: conflicting types for
> ‘ptep_modify_prot_commit’; have ‘void(struct vm_area_struct *, long
> unsigned int, pte_t *, pte_t, pte_t)’
> 1371 | static inline void ptep_modify_prot_commit(struct
> vm_area_struct *vma,
> | ^~~~~~~~~~~~~~~~~~~~~~~
> ./include/linux/pgtable.h:1371:20: error: static declaration of
> ‘ptep_modify_prot_commit’ follows non-static declaration
> ./include/linux/pgtable.h:925:17: note: previous implicit declaration of
> ‘ptep_modify_prot_commit’ with type ‘void(struct vm_area_struct *, long
> unsigned int, pte_t *, pte_t, pte_t)’
> 925 | ptep_modify_prot_commit(vma, addr, ptep,
> old_pte, pte);
> | ^~~~~~~~~~~~~~~~~~~~~~~
> cc1: some warnings being treated as errors
> make[2]: *** [scripts/Makefile.build:98: arch/x86/kernel/asm-offsets.s]
> Error 1
> make[1]: *** [/home/runner/work/mm-test-robot/mm-test-robot/linux/
> Makefile:1280: prepare0] Error 2
> make: *** [Makefile:248: __sub-make] Error 2
>
> Based on:
>
> mm-unstable b18dec6a6ad3d051dadc3c16fb838e4abddf8d3c ("mm/numa: remove
> unnecessary local variable in alloc_node_data()")
>
>
> Thanks,
> Lance
>
>
>
> On 2025/4/28 20:04, Dev Jain wrote:
>> This patchset optimizes the mprotect() system call for large folios
>> by PTE-batching.
>>
>> We use the following test cases to measure performance, mprotect()'ing
>> the mapped memory to read-only then read-write 40 times:
>>
>> Test case 1: Mapping 1G of memory, touching it to get PMD-THPs, then
>> pte-mapping those THPs
>> Test case 2: Mapping 1G of memory with 64K mTHPs
>> Test case 3: Mapping 1G of memory with 4K pages
>>
>> Average execution time on arm64, Apple M3:
>> Before the patchset:
>> T1: 7.9 seconds T2: 7.9 seconds T3: 4.2 seconds
>>
>> After the patchset:
>> T1: 2.1 seconds T2: 2.2 seconds T3: 4.2 seconds
>>
>> Observing T1/T2 and T3 before the patchset, we also remove the regression
>> introduced by ptep_get() on a contpte block. And, for large folios we get
>> an almost 276% performance improvement.
>>
>> Dev Jain (7):
>> mm: Refactor code in mprotect
>> mm: Optimize mprotect() by batch-skipping PTEs
>> mm: Add batched versions of ptep_modify_prot_start/commit
>> arm64: Add batched version of ptep_modify_prot_start
>> arm64: Add batched version of ptep_modify_prot_commit
>> mm: Batch around can_change_pte_writable()
>> mm: Optimize mprotect() through PTE-batching
>>
>> arch/arm64/include/asm/pgtable.h | 10 ++
>> arch/arm64/mm/mmu.c | 21 +++-
>> include/linux/mm.h | 4 +-
>> include/linux/pgtable.h | 42 ++++++++
>> mm/gup.c | 2 +-
>> mm/huge_memory.c | 4 +-
>> mm/memory.c | 6 +-
>> mm/mprotect.c | 163 +++++++++++++++++++++----------
>> mm/pgtable-generic.c | 16 ++-
>> 9 files changed, 198 insertions(+), 70 deletions(-)
>>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/7] arm64: Add batched version of ptep_modify_prot_start
2025-04-28 18:06 ` Zi Yan
@ 2025-04-29 4:44 ` Dev Jain
0 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-29 4:44 UTC (permalink / raw)
To: Zi Yan
Cc: akpm, ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang
On 28/04/25 11:36 pm, Zi Yan wrote:
> On 28 Apr 2025, at 8:04, Dev Jain wrote:
>
>> Override the generic definition to use get_and_clear_full_ptes(), so that
>> we do a TLBI possibly only on the "contpte-edges" of the large PTE block,
>
> What do you mean by “contpte-edges”? Can you provide an example?
get_and_clear_full_ptes -> contpte_get_and_clear_full_ptes ->
contpte_try_unfold_partial, which unfolds only the start and end
contpte blocks, whereas ptep_get_and_clear -> contpte_try_unfold
unfolds every contpte block it touches.
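
Roughly, the idea is the sketch below (simplified pseudo-C; unfold_block()
and clear_helper() are made-up names, not the real arm64 contpte code):

  static void clear_batch_sketch(unsigned long addr, pte_t *ptep, unsigned int nr)
  {
          unsigned long end = addr + nr * PAGE_SIZE;

          /* Unfold (and TLBI) only the partially covered blocks at the edges. */
          if (!IS_ALIGNED(addr, CONT_PTE_SIZE))
                  unfold_block(addr);                    /* leading edge */
          if (!IS_ALIGNED(end, CONT_PTE_SIZE))
                  unfold_block(end - PAGE_SIZE);         /* trailing edge */

          /* Fully covered blocks in the middle are cleared with no per-block
           * unfold, hence no TLBI for them. */
          clear_helper(ptep, nr);
  }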
>
>> instead of doing it for every contpte block, which happens for ptep_get_and_clear().
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>> arch/arm64/include/asm/pgtable.h | 5 +++++
>> arch/arm64/mm/mmu.c | 12 +++++++++---
>> include/linux/pgtable.h | 4 ++++
>> mm/pgtable-generic.c | 16 +++++++++++-----
>> 4 files changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 2a77f11b78d5..8872ea5f0642 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1553,6 +1553,11 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
>> unsigned long addr, pte_t *ptep,
>> pte_t old_pte, pte_t new_pte);
>>
>> +#define modify_prot_start_ptes modify_prot_start_ptes
>> +extern pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep,
>> + unsigned int nr);
>> +
>> #ifdef CONFIG_ARM64_CONTPTE
>>
>> /*
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 8fcf59ba39db..fe60be8774f4 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1523,7 +1523,8 @@ static int __init prevent_bootmem_remove_init(void)
>> early_initcall(prevent_bootmem_remove_init);
>> #endif
>>
>> -pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
>> +pte_t modify_prot_start_ptes(struct vm_area_struct *vma, unsigned long addr,
>> + pte_t *ptep, unsigned int nr)
>
> Putting ptes at the end seems to break the naming convention. How about
> ptep_modify_prot_range_start? ptes_modify_prot_start might be OK too.
I was actually following the convention present in
include/linux/pgtable.h; see all the functions with the _ptes suffix.
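For instance, roughly (declarations abbreviated from that header):

  void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
                pte_t pte, unsigned int nr);
  void wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
                      pte_t *ptep, unsigned int nr);
  pte_t get_and_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
                                pte_t *ptep, unsigned int nr, int full);

so modify_prot_start_ptes()/modify_prot_commit_ptes() keep that pattern.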
>
>> {
>> if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
>> /*
>> @@ -1532,9 +1533,14 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
>> * in cases where cpu is affected with errata #2645198.
>> */
>> if (pte_user_exec(ptep_get(ptep)))
>> - return ptep_clear_flush(vma, addr, ptep);
>> + return clear_flush_ptes(vma, addr, ptep, nr);
>> }
>> - return ptep_get_and_clear(vma->vm_mm, addr, ptep);
>> + return get_and_clear_full_ptes(vma->vm_mm, addr, ptep, nr, 0);
>> +}
>> +
>> +pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
>> +{
>> + return modify_prot_start_ptes(vma, addr, ptep, 1);
>> }
>>
>> void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index ed287289335f..10cdb87ccecf 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -828,6 +828,10 @@ extern pte_t ptep_clear_flush(struct vm_area_struct *vma,
>> pte_t *ptep);
>> #endif
>>
>> +extern pte_t clear_flush_ptes(struct vm_area_struct *vma,
>> + unsigned long address,
>> + pte_t *ptep, unsigned int nr);
>> +
>> #ifndef __HAVE_ARCH_PMDP_HUGE_CLEAR_FLUSH
>> extern pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
>> unsigned long address,
>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>> index 5a882f2b10f9..e238f88c3cac 100644
>> --- a/mm/pgtable-generic.c
>> +++ b/mm/pgtable-generic.c
>> @@ -90,17 +90,23 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
>> }
>> #endif
>>
>> -#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
>> -pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
>> - pte_t *ptep)
>> +pte_t clear_flush_ptes(struct vm_area_struct *vma, unsigned long address,
>> + pte_t *ptep, unsigned int nr)
>> {
>
> Ditto.
>
>> struct mm_struct *mm = (vma)->vm_mm;
>> pte_t pte;
>> - pte = ptep_get_and_clear(mm, address, ptep);
>> + pte = get_and_clear_full_ptes(mm, address, ptep, nr, 0);
>> if (pte_accessible(mm, pte))
>> - flush_tlb_page(vma, address);
>> + flush_tlb_range(vma, address, address + nr * PAGE_SIZE);
>> return pte;
>> }
>> +
>> +#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
>> +pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
>> + pte_t *ptep)
>> +{
>> + return clear_flush_ptes(vma, address, ptep, 1);
>> +}
>> #endif
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> --
>> 2.30.2
>
>
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 6/7] mm: Batch around can_change_pte_writable()
2025-04-28 13:23 ` Lance Yang
@ 2025-04-29 4:59 ` Dev Jain
0 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-04-29 4:59 UTC (permalink / raw)
To: Lance Yang, akpm
Cc: ryan.roberts, david, willy, linux-mm, linux-kernel,
catalin.marinas, will, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, anshuman.khandual, peterx, joey.gouly, ioworker0, baohua,
kevin.brodsky, quic_zhenhuah, christophe.leroy, yangyicong,
linux-arm-kernel, namit, hughd, yang, ziy
On 28/04/25 6:53 pm, Lance Yang wrote:
>
>
> On 2025/4/28 20:59, Dev Jain wrote:
>>
>>
>> On 28/04/25 6:20 pm, Lance Yang wrote:
>>> Hey Dev,
>>>
>>> On 2025/4/28 20:04, Dev Jain wrote:
>>>> In preparation for patch 7, we need to properly batch around
>>>> can_change_pte_writable(). We batch around pte_needs_soft_dirty_wp() by
>>>> the corresponding fpb flag, we batch around the page-anon exclusive
>>>> check
>>>> using folio_maybe_mapped_shared(); modify_prot_start_ptes() collects
>>>> the
>>>> dirty and access bits across the batch, therefore batching across
>>>> pte_dirty(): this is correct since the dirty bit on the PTE really
>>>> is just an indication that the folio got written to, so even if
>>>> the PTE is not actually dirty (but one of the PTEs in the batch is),
>>>> the wp-fault optimization can be made.
>>>>
>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>> ---
>>>> include/linux/mm.h | 4 ++--
>>>> mm/gup.c | 2 +-
>>>> mm/huge_memory.c | 4 ++--
>>>> mm/memory.c | 6 +++---
>>>> mm/mprotect.c | 9 ++++++---
>>>> 5 files changed, 14 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 5eb0d77c4438..ffa02e15863f 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -2710,8 +2710,8 @@ int get_cmdline(struct task_struct *task, char
>>>> *buffer, int buflen);
>>>> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
>>>> MM_CP_UFFD_WP_RESOLVE)
>>>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>>>> long addr,
>>>> - pte_t pte);
>>>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>>>> long addr,
>>>> + pte_t pte, struct folio *folio, unsigned int nr);
>>>> extern long change_protection(struct mmu_gather *tlb,
>>>> struct vm_area_struct *vma, unsigned long start,
>>>> unsigned long end, unsigned long cp_flags);
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 84461d384ae2..6a605fc5f2cb 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -614,7 +614,7 @@ static inline bool
>>>> can_follow_write_common(struct page *page,
>>>> return false;
>>>> /*
>>>> - * See can_change_pte_writable(): we broke COW and could map
>>>> the page
>>>> + * See can_change_ptes_writable(): we broke COW and could map
>>>> the page
>>>> * writable if we have an exclusive anonymous page ...
>>>> */
>>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 28c87e0e036f..e5496c0d9e7e 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2032,12 +2032,12 @@ static inline bool
>>>> can_change_pmd_writable(struct vm_area_struct *vma,
>>>> return false;
>>>> if (!(vma->vm_flags & VM_SHARED)) {
>>>> - /* See can_change_pte_writable(). */
>>>> + /* See can_change_ptes_writable(). */
>>>> page = vm_normal_page_pmd(vma, addr, pmd);
>>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>>> }
>>>> - /* See can_change_pte_writable(). */
>>>> + /* See can_change_ptes_writable(). */
>>>> return pmd_dirty(pmd);
>>>> }
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index b9e8443aaa86..b1fda3de8d27 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -750,7 +750,7 @@ static void restore_exclusive_pte(struct
>>>> vm_area_struct *vma,
>>>> pte = pte_mkuffd_wp(pte);
>>>> if ((vma->vm_flags & VM_WRITE) &&
>>>> - can_change_pte_writable(vma, address, pte)) {
>>>> + can_change_ptes_writable(vma, address, pte, NULL, 1)) {
>>>> if (folio_test_dirty(folio))
>>>> pte = pte_mkdirty(pte);
>>>> pte = pte_mkwrite(pte, vma);
>>>> @@ -5767,7 +5767,7 @@ static void numa_rebuild_large_mapping(struct
>>>> vm_fault *vmf, struct vm_area_stru
>>>> ptent = pte_modify(ptent, vma->vm_page_prot);
>>>> writable = pte_write(ptent);
>>>> if (!writable && pte_write_upgrade &&
>>>> - can_change_pte_writable(vma, addr, ptent))
>>>> + can_change_ptes_writable(vma, addr, ptent, NULL, 1))
>>>> writable = true;
>>>> }
>>>> @@ -5808,7 +5808,7 @@ static vm_fault_t do_numa_page(struct vm_fault
>>>> *vmf)
>>>> */
>>>> writable = pte_write(pte);
>>>> if (!writable && pte_write_upgrade &&
>>>> - can_change_pte_writable(vma, vmf->address, pte))
>>>> + can_change_ptes_writable(vma, vmf->address, pte, NULL, 1))
>>>> writable = true;
>>>> folio = vm_normal_folio(vma, vmf->address, pte);
>>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>>> index 33eabc995584..362fd7e5457d 100644
>>>> --- a/mm/mprotect.c
>>>> +++ b/mm/mprotect.c
>>>> @@ -40,8 +40,8 @@
>>>> #include "internal.h"
>>>> -bool can_change_pte_writable(struct vm_area_struct *vma, unsigned
>>>> long addr,
>>>> - pte_t pte)
>>>> +bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>>>> long addr,
>>>> + pte_t pte, struct folio *folio, unsigned int nr)
>>>> {
>>>> struct page *page;
>>>> @@ -67,6 +67,9 @@ bool can_change_pte_writable(struct vm_area_struct
>>>> *vma, unsigned long addr,
>>>> * write-fault handler similarly would map them writable
>>>> without
>>>> * any additional checks while holding the PT lock.
>>>> */
>>>> + if (unlikely(nr != 1))
>>>> + return !folio_maybe_mapped_shared(folio);
>>>> +
>>>> page = vm_normal_page(vma, addr, pte);
>>>> return page && PageAnon(page) && PageAnonExclusive(page);
>>>> }
>>>
>>> IIUC, as mentioned in the comment above, we should apply the same
>>> anonymous check to large folios. And folio_maybe_mapped_shared()
>>> already handles both order-0 and large folios nicely, so we could
>>> simplify the logic as follows:
>>
>> Thanks. We will have to call vm_normal_folio() in the !folio case, though,
>> since we may not already have the folio when nr == 1.
>
> Ah, I see. Should we still check folio_test_anon() when nr != 1?
According to the comment, "We can only special-case on exclusive
anonymous pages", I would say yes.
>
> Thanks,
> Lance
>
>>
>>>
>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>> index 1605e89349d2..df56a30bb241 100644
>>> --- a/mm/mprotect.c
>>> +++ b/mm/mprotect.c
>>> @@ -43,8 +43,6 @@
>>> bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned
>>> long addr,
>>> pte_t pte, struct folio *folio,
>>> unsigned int nr)
>>> {
>>> - struct page *page;
>>> -
>>> if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
>>> return false;
>>>
>>> @@ -67,11 +65,7 @@ bool can_change_ptes_writable(struct
>>> vm_area_struct *vma, unsigned long addr,
>>> * write-fault handler similarly would map them
>>> writable without
>>> * any additional checks while holding the PT lock.
>>> */
>>> - if (unlikely(nr != 1))
>>> - return !folio_maybe_mapped_shared(folio);
>>> -
>>> - page = vm_normal_page(vma, addr, pte);
>>> -           return page && PageAnon(page) && PageAnonExclusive(page);
>>> +           return folio_test_anon(folio) && !folio_maybe_mapped_shared(folio);
>>> }
>>>
>>> VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
>>> --
>>>
>>> Thanks,
>>> Lance
>>>
>>>> @@ -222,7 +225,7 @@ static long change_pte_range(struct mmu_gather
>>>> *tlb,
>>>> */
>>>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>>>> !pte_write(ptent) &&
>>>> - can_change_pte_writable(vma, addr, ptent))
>>>> + can_change_ptes_writable(vma, addr, ptent, folio, 1))
>>>> ptent = pte_mkwrite(ptent, vma);
>>>> ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>>>
>>>
>>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-04-29 5:01 UTC | newest]
Thread overview: 19+ messages
2025-04-28 12:04 [PATCH 0/7] Optimize mprotect for large folios Dev Jain
2025-04-28 12:04 ` [PATCH 1/7] mm: Refactor code in mprotect Dev Jain
2025-04-28 12:04 ` [PATCH 2/7] mm: Optimize mprotect() by batch-skipping PTEs Dev Jain
2025-04-28 12:04 ` [PATCH 3/7] mm: Add batched versions of ptep_modify_prot_start/commit Dev Jain
2025-04-28 12:04 ` [PATCH 4/7] arm64: Add batched version of ptep_modify_prot_start Dev Jain
2025-04-28 18:06 ` Zi Yan
2025-04-29 4:44 ` Dev Jain
2025-04-28 12:04 ` [PATCH 5/7] arm64: Add batched version of ptep_modify_prot_commit Dev Jain
2025-04-28 12:04 ` [PATCH 6/7] mm: Batch around can_change_pte_writable() Dev Jain
2025-04-28 12:50 ` Lance Yang
2025-04-28 12:59 ` Dev Jain
2025-04-28 13:23 ` Lance Yang
2025-04-29 4:59 ` Dev Jain
2025-04-28 13:16 ` Lance Yang
2025-04-28 15:54 ` Lance Yang
2025-04-28 12:04 ` [PATCH 7/7] mm: Optimize mprotect() through PTE-batching Dev Jain
2025-04-28 12:52 ` [PATCH 0/7] Optimize mprotect for large folios Dev Jain
2025-04-28 13:31 ` Lance Yang
2025-04-29 4:40 ` Dev Jain