public inbox for linux-mm@kvack.org
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>, Luke Yang <luyang@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>, Jann Horn <jannh@google.com>,
	Dev Jain <dev.jain@arm.com>,
	jhladky@redhat.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Nico Pache <npache@redhat.com>
Subject: Re: [PATCH v2 0/2] mm/mprotect: micro-optimization work
Date: Wed, 1 Apr 2026 10:25:40 +0200	[thread overview]
Message-ID: <bb40b9b5-d71e-4968-8c14-86af46e33fee@kernel.org> (raw)
In-Reply-To: <20260330130635.1d8bf3f1f9dfcafc0317f5e7@linux-foundation.org>

On 3/30/26 22:06, Andrew Morton wrote:
> On Mon, 30 Mar 2026 15:55:51 -0400 Luke Yang <luyang@redhat.com> wrote:
> 
>> Thanks for working on this. I just wanted to share that we've created a
>> test kernel with your patches and tested on the following CPUs:
>>
>> --- aarch64 ---
>> Ampere Altra
>> Ampere Altra Max
>>
>> --- x86_64 ---
>> AMD EPYC 7713
>> AMD EPYC 7351
>> AMD EPYC 7542
>> AMD EPYC 7573X
>> AMD EPYC 7702
>> AMD EPYC 9754
>> Intel Xeon Gold 6126
>> Into Xeon Gold 6330
>> Intel Xeon Gold 6530
>> Intel Xeon Platinum 8351N
>> Intel Core i7-6820HQ
>>
>> --- ppc64le ---
>> IBM Power 10
>>
>> On average, we see improvements ranging from a minimum of 5% to a
>> maximum of 55%, with most improvements showing around a 25% speed up in
>> the libmicro/mprot_tw4m micro benchmark.
> 
> Thanks, that's nice.  I've added some of the above into the changelog
> and I took the liberty of adding your Tested-by: to both patches.
> 
> fyi, regarding [2/2]: it's unclear to me whether the discussion with
> David will result in any alterations.  If there's something I need to
> do, it always helps to lmk ;)

I think we first want a better understanding of which of the __always_inline
annotations in patch #2 is actually helpful, and of where to apply the forced
nr_ptes == 1 specialization.

I updated the microbenchmark I use for fork+unmap etc. so it measures
mprotect as well:

	https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/pte-mapped-folio-benchmarks.c?ref_type=heads

Running some simple tests with order-0 folios on 1 GiB of memory:


Upstream Linus:

$ ./pte-mapped-folio-benchmarks 0 write-protect 5
0.005779
...
$ ./pte-mapped-folio-benchmarks 0 write-unprotect 5
0.009113
...


With Pedro's patch #2:
$ ./pte-mapped-folio-benchmarks 0 write-protect 5
0.003941
...
$ ./pte-mapped-folio-benchmarks 0 write-unprotect 5
0.006163
...


With the patch below:

$ ./pte-mapped-folio-benchmarks 0 write-protect 5
0.003364

$ ./pte-mapped-folio-benchmarks 0 write-unprotect 5
0.005729


So patch #2 might be improved. And the forced inlining of
mprotect_folio_pte_batch() should likely not go into the same patch.


---

From cf1a2a4a6ef95ed541947f2fd9d8351bef664426 Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@kernel.org>
Date: Wed, 1 Apr 2026 08:15:44 +0000
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 mm/mprotect.c | 79 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 31 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index c0571445bef7..8d14c05a11a2 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -117,7 +117,7 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
 }
 
 /* Set nr_ptes number of ptes, starting from idx */
-static void prot_commit_flush_ptes(struct vm_area_struct *vma, unsigned long addr,
+static __always_inline void prot_commit_flush_ptes(struct vm_area_struct *vma, unsigned long addr,
                pte_t *ptep, pte_t oldpte, pte_t ptent, int nr_ptes,
                int idx, bool set_write, struct mmu_gather *tlb)
 {
@@ -143,7 +143,7 @@ static void prot_commit_flush_ptes(struct vm_area_struct *vma, unsigned long add
  * !PageAnonExclusive() pages, starting from start_idx. Caller must enforce
  * that the ptes point to consecutive pages of the same anon large folio.
  */
-static int page_anon_exclusive_sub_batch(int start_idx, int max_len,
+static __always_inline int page_anon_exclusive_sub_batch(int start_idx, int max_len,
                struct page *first_page, bool expected_anon_exclusive)
 {
        int idx;
@@ -169,7 +169,7 @@ static int page_anon_exclusive_sub_batch(int start_idx, int max_len,
  * pte of the batch. Therefore, we must individually check all pages and
  * retrieve sub-batches.
  */
-static void commit_anon_folio_batch(struct vm_area_struct *vma,
+static __always_inline void commit_anon_folio_batch(struct vm_area_struct *vma,
                struct folio *folio, struct page *first_page, unsigned long addr, pte_t *ptep,
                pte_t oldpte, pte_t ptent, int nr_ptes, struct mmu_gather *tlb)
 {
@@ -188,7 +188,7 @@ static void commit_anon_folio_batch(struct vm_area_struct *vma,
        }
 }
 
-static void set_write_prot_commit_flush_ptes(struct vm_area_struct *vma,
+static __always_inline void set_write_prot_commit_flush_ptes(struct vm_area_struct *vma,
                struct folio *folio, struct page *page, unsigned long addr, pte_t *ptep,
                pte_t oldpte, pte_t ptent, int nr_ptes, struct mmu_gather *tlb)
 {
@@ -211,6 +211,41 @@ static void set_write_prot_commit_flush_ptes(struct vm_area_struct *vma,
        commit_anon_folio_batch(vma, folio, page, addr, ptep, oldpte, ptent, nr_ptes, tlb);
 }
 
+static __always_inline void change_present_ptes(struct mmu_gather *tlb,
+               struct vm_area_struct *vma, unsigned long addr,
+               pgprot_t newprot, unsigned long cp_flags,
+               struct folio *folio, struct page *page, pte_t *pte,
+               unsigned int nr_ptes)
+{
+       bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
+       bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+       pte_t oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
+       pte_t ptent = pte_modify(oldpte, newprot);
+
+       if (uffd_wp)
+               ptent = pte_mkuffd_wp(ptent);
+       else if (uffd_wp_resolve)
+               ptent = pte_clear_uffd_wp(ptent);
+
+       /*
+        * In some writable, shared mappings, we might want to catch actual
+        * write access -- see vma_wants_writenotify().
+        *
+        * In all writable, private mappings, we have to properly handle COW.
+        *
+        * In both cases, we can sometimes still change PTEs writable and avoid
+        * the write-fault handler, for example, if a PTE is already dirty and
+        * no other COW or special handling is required.
+        */
+       if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
+            !pte_write(ptent))
+               set_write_prot_commit_flush_ptes(vma, folio, page, addr, pte,
+                                                oldpte, ptent, nr_ptes, tlb);
+       else
+               prot_commit_flush_ptes(vma, addr, pte, oldpte, ptent,
+                       nr_ptes, /* idx = */ 0, /* set_write = */ false, tlb);
+}
+
 static long change_pte_range(struct mmu_gather *tlb,
                struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
                unsigned long end, pgprot_t newprot, unsigned long cp_flags)
@@ -242,7 +277,6 @@ static long change_pte_range(struct mmu_gather *tlb,
                        int max_nr_ptes = (end - addr) >> PAGE_SHIFT;
                        struct folio *folio = NULL;
                        struct page *page;
-                       pte_t ptent;
 
                        /* Already in the desired state. */
                        if (prot_numa && pte_protnone(oldpte))
@@ -268,34 +302,17 @@ static long change_pte_range(struct mmu_gather *tlb,
 
                        nr_ptes = mprotect_folio_pte_batch(folio, pte, oldpte, max_nr_ptes, flags);
 
-                       oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
-                       ptent = pte_modify(oldpte, newprot);
-
-                       if (uffd_wp)
-                               ptent = pte_mkuffd_wp(ptent);
-                       else if (uffd_wp_resolve)
-                               ptent = pte_clear_uffd_wp(ptent);
-
                        /*
-                        * In some writable, shared mappings, we might want
-                        * to catch actual write access -- see
-                        * vma_wants_writenotify().
-                        *
-                        * In all writable, private mappings, we have to
-                        * properly handle COW.
-                        *
-                        * In both cases, we can sometimes still change PTEs
-                        * writable and avoid the write-fault handler, for
-                        * example, if a PTE is already dirty and no other
-                        * COW or special handling is required.
+                        * Optimize for order-0 folios by optimizing out all
+                        * loops.
                         */
-                       if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
-                            !pte_write(ptent))
-                               set_write_prot_commit_flush_ptes(vma, folio, page,
-                               addr, pte, oldpte, ptent, nr_ptes, tlb);
-                       else
-                               prot_commit_flush_ptes(vma, addr, pte, oldpte, ptent,
-                                       nr_ptes, /* idx = */ 0, /* set_write = */ false, tlb);
+                       if (nr_ptes == 1) {
+                               change_present_ptes(tlb, vma, addr, newprot,
+                                                   cp_flags, folio, page, pte, 1);
+                       } else {
+                               change_present_ptes(tlb, vma, addr, newprot,
+                                                   cp_flags, folio, page, pte, nr_ptes);
+                       }
                        pages += nr_ptes;
                } else if (pte_none(oldpte)) {
                        /*
-- 
2.53.0


-- 
Cheers,

David


Thread overview: 17+ messages
2026-03-24 15:43 [PATCH v2 0/2] mm/mprotect: micro-optimization work Pedro Falcato
2026-03-24 15:43 ` [PATCH v2 1/2] mm/mprotect: move softleaf code out of the main function Pedro Falcato
2026-03-24 20:12   ` David Hildenbrand (Arm)
2026-03-24 15:43 ` [PATCH v2 2/2] mm/mprotect: special-case small folios when applying write permissions Pedro Falcato
2026-03-24 20:18   ` David Hildenbrand (Arm)
2026-03-25 11:37     ` Pedro Falcato
2026-03-30 15:16       ` David Hildenbrand (Arm)
2026-04-02  0:09         ` Andrew Morton
2026-04-02  3:44           ` Andrew Morton
2026-04-02  7:11             ` David Hildenbrand (Arm)
2026-03-30 19:55 ` [PATCH v2 0/2] mm/mprotect: micro-optimization work Luke Yang
2026-03-30 20:06   ` Andrew Morton
2026-04-01  8:25     ` David Hildenbrand (Arm) [this message]
2026-04-01 14:14       ` Pedro Falcato
2026-04-01 14:10   ` Pedro Falcato
2026-04-02 13:55     ` Luke Yang
2026-04-06 14:32       ` Luke Yang
