From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org
Cc: ryan.roberts@arm.com, david@redhat.com, willy@infradead.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
catalin.marinas@arm.com, will@kernel.org,
Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
vbabka@suse.cz, jannh@google.com, anshuman.khandual@arm.com,
peterx@redhat.com, joey.gouly@arm.com, ioworker0@gmail.com,
baohua@kernel.org, kevin.brodsky@arm.com,
quic_zhenhuah@quicinc.com, christophe.leroy@csgroup.eu,
yangyicong@hisilicon.com, linux-arm-kernel@lists.infradead.org,
namit@vmware.com, hughd@google.com, yang@os.amperecomputing.com,
ziy@nvidia.com, Dev Jain <dev.jain@arm.com>
Subject: [PATCH v2 7/7] mm: Optimize mprotect() through PTE-batching
Date: Tue, 29 Apr 2025 10:53:36 +0530
Message-ID: <20250429052336.18912-8-dev.jain@arm.com>
In-Reply-To: <20250429052336.18912-1-dev.jain@arm.com>
The common pte_present case does not need the folio. To avoid the overhead
of vm_normal_folio() for small folios, use a cheap approximation before
looking the folio up: on arm64, pte_batch_hint() is conclusive. On other
arches, if the pfns mapped by the current and the next PTE are contiguous,
check whether a large folio is actually mapped, and only then attempt the
batch optimization. Reuse the folio obtained in the prot_numa case when
possible. Since modify_prot_start_ptes() gathers the access/dirty bits
across the batch, we can also batch around pte_needs_flush() (on parisc,
its definition includes the access bit).
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
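For readers who want to see the heuristic in isolation, here is a minimal
userspace sketch, not kernel code. Everything prefixed with sim_ is
hypothetical and invented for illustration: struct sim_pte stands in for a
present PTE, and sim_batch_len() stands in for the expensive
vm_normal_folio() + folio_pte_batch() step. It assumes that contiguous
page frames have PFNs that differ by exactly 1, since a PFN counts pages
rather than bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a present PTE; not a kernel type. */
struct sim_pte {
	unsigned long pfn;	/* page frame number mapped by this entry */
	int folio_id;		/* id of the simulated folio owning the page */
};

/*
 * Cheap pre-check: can ptes[0] and ptes[1] belong to one large folio?
 * The caller must guarantee that ptes[1] exists. PFNs count pages, so
 * adjacent frames differ by 1.
 */
static bool sim_maybe_contiguous(const struct sim_pte *ptes)
{
	return ptes[1].pfn - ptes[0].pfn == 1;
}

/*
 * Stand-in for the expensive folio lookup and batch scan: count how many
 * consecutive entries map consecutive pages of the same simulated folio.
 */
static int sim_batch_len(const struct sim_pte *ptes, int max_nr)
{
	int nr = 1;

	assert(max_nr >= 1);
	while (nr < max_nr &&
	       ptes[nr].folio_id == ptes[0].folio_id &&
	       ptes[nr].pfn == ptes[0].pfn + (unsigned long)nr)
		nr++;
	return nr;
}

/*
 * Walk n entries. Returns the number of batches processed and stores the
 * number of expensive lookups performed in *lookups.
 */
static int sim_walk(const struct sim_pte *ptes, int n, int *lookups)
{
	int batches = 0;

	*lookups = 0;
	for (int i = 0; i < n; ) {
		int max_nr = n - i;
		int nr = 1;

		/* Only pay for the lookup when the cheap check passes. */
		if (max_nr != 1 && sim_maybe_contiguous(&ptes[i])) {
			(*lookups)++;
			nr = sim_batch_len(&ptes[i], max_nr);
		}
		batches++;
		i += nr;
	}
	return batches;
}
```

In this sketch the pre-check can give a false positive (two unrelated
pages that happen to be physically adjacent), which costs one wasted
lookup but never a wrong batch, because sim_batch_len() still verifies
folio membership. Isolated small pages skip the lookup entirely.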
mm/mprotect.c | 49 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index baff009fc981..f8382806611f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -129,7 +129,7 @@ static bool prot_numa_skip(struct vm_area_struct *vma, struct folio *folio,
return false;
}
-static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
+static struct folio *prot_numa_avoid_fault(struct vm_area_struct *vma,
unsigned long addr, pte_t *pte, pte_t oldpte, int target_node,
int max_nr, int *nr)
{
@@ -139,25 +139,37 @@ static bool prot_numa_avoid_fault(struct vm_area_struct *vma,
/* Avoid TLB flush if possible */
if (pte_protnone(oldpte))
- return true;
+ return NULL;
folio = vm_normal_folio(vma, addr, oldpte);
if (!folio)
- return true;
+ return NULL;
ret = prot_numa_skip(vma, folio, target_node);
if (ret) {
if (folio_test_large(folio) && max_nr != 1)
*nr = folio_pte_batch(folio, addr, pte, oldpte,
max_nr, flags, NULL, NULL, NULL);
- return ret;
+ return NULL;
}
if (folio_use_access_time(folio))
folio_xchg_access_time(folio,
jiffies_to_msecs(jiffies));
- return false;
+ return folio;
}
+static bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
+{
+ pte_t *next_ptep, next_pte;
+
+ if (pte_batch_hint(ptep, pte) != 1)
+ return true;
+
+ next_ptep = ptep + 1;
+ next_pte = ptep_get(next_ptep);
+
+ return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == 1);
+}
static long change_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t newprot, unsigned long cp_flags)
@@ -188,19 +200,28 @@ static long change_pte_range(struct mmu_gather *tlb,
oldpte = ptep_get(pte);
if (pte_present(oldpte)) {
int max_nr = (end - addr) >> PAGE_SHIFT;
+ const fpb_t flags = FPB_IGNORE_DIRTY;
+ struct folio *folio = NULL;
pte_t ptent;
/*
* Avoid trapping faults against the zero or KSM
* pages. See similar comment in change_huge_pmd.
*/
- if (prot_numa &&
- prot_numa_avoid_fault(vma, addr, pte,
- oldpte, target_node,
- max_nr, &nr))
+ if (prot_numa) {
+ folio = prot_numa_avoid_fault(vma, addr, pte,
+ oldpte, target_node, max_nr, &nr);
+ if (!folio)
continue;
+ }
- oldpte = ptep_modify_prot_start(vma, addr, pte);
+ if (!folio && (max_nr != 1) && maybe_contiguous_pte_pfns(pte, oldpte)) {
+ folio = vm_normal_folio(vma, addr, oldpte);
+ if (folio && folio_test_large(folio))
+ nr = folio_pte_batch(folio, addr, pte,
+ oldpte, max_nr, flags, NULL, NULL, NULL);
+ }
+ oldpte = modify_prot_start_ptes(vma, addr, pte, nr);
ptent = pte_modify(oldpte, newprot);
if (uffd_wp)
@@ -223,13 +244,13 @@ static long change_pte_range(struct mmu_gather *tlb,
*/
if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
!pte_write(ptent) &&
- can_change_ptes_writable(vma, addr, ptent, folio, 1))
+ can_change_ptes_writable(vma, addr, ptent, folio, nr))
ptent = pte_mkwrite(ptent, vma);
- ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
+ modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr);
if (pte_needs_flush(oldpte, ptent))
- tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
- pages++;
+ tlb_flush_pte_range(tlb, addr, nr * PAGE_SIZE);
+ pages += nr;
} else if (is_swap_pte(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
pte_t newpte;
--
2.30.2