From: Lorenzo Stoakes <ljs@kernel.org>
To: Lance Yang <lance.yang@linux.dev>
Cc: sarthak.sharma@arm.com, akpm@linux-foundation.org,
david@kernel.org, liam@infradead.org, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
dev.jain@arm.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/memory: refactor finish_fault
Date: Tue, 30 Jun 2026 10:57:11 +0100 [thread overview]
Message-ID: <akOS23WSqp_6oXpW@lucifer> (raw)
In-Reply-To: <20260630075153.78201-1-lance.yang@linux.dev>
On Tue, Jun 30, 2026 at 03:51:53PM +0800, Lance Yang wrote:
>
> On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote:
> >finish_fault() currently has a goto fallback implementation
> >where we try to map a large folio with PTEs. If that cannot be
> >installed, we goto fallback and go through the fallback mapping
> >path again. This looks weird and is tough to comprehend.
> >
> >Remove the goto fallback implementation and try to map the
> >whole folio if allowed. If the whole folio cannot be mapped,
> >fall back to single page mapping without repeating the whole
> >function.
> >
> >The cleanup of finish_fault() was suggested by David in [1].
> >
> >[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@kernel.org/
> >
> >Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> >Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
> >---
> >Tested this patch by running mm selftests on baseline and patched 7.1
> >kernels. No regressions were observed.
> >
> > mm/memory.c | 78 ++++++++++++++++++++++++++++-------------------------
> > 1 file changed, 41 insertions(+), 37 deletions(-)
> >
> >diff --git a/mm/memory.c b/mm/memory.c
> >index 86a973119bd4..f883b3f3850e 100644
> >--- a/mm/memory.c
> >+++ b/mm/memory.c
> >@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf)
> > vm_fault_t finish_fault(struct vm_fault *vmf)
> > {
> > struct vm_area_struct *vma = vmf->vma;
> >- struct page *page;
> >+ struct page *page, *map_page;
> > struct folio *folio;
> > vm_fault_t ret;
> > bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> > !(vma->vm_flags & VM_SHARED);
> >- int type, nr_pages;
> >- unsigned long addr;
> >+ pte_t *fault_pte, *start_pte;
> >+ int type, nr_pages, nr_ptes;
> >+ unsigned long start_addr;
> > bool needs_fallback = false;
> >
> >-fallback:
> >- addr = vmf->address;
> >-
> > /* Did we COW the page? */
> > if (is_cow)
> > page = vmf->cow_page;
> >@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> > return ret;
> > }
> >
> >- if (!needs_fallback && vma->vm_file) {
> >+ if (vma->vm_file) {
> > struct address_space *mapping = vma->vm_file->f_mapping;
> > pgoff_t file_end;
> >
> >@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >
> > nr_pages = folio_nr_pages(folio);
> >
> >+ /* Initialize for single page mapping */
> >+ map_page = page;
> >+ start_addr = vmf->address;
> >+ nr_ptes = 1;
> >+
> > /* Using per-page fault to maintain the uffd semantics */
> >- if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) {
> >- nr_pages = 1;
> >- } else if (nr_pages > 1) {
> >- pgoff_t idx = folio_page_idx(folio, page);
> >- /* The page offset of vmf->address within the VMA. */
> >- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> >+ if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) {
> >+ unsigned long fault_page_idx = folio_page_idx(folio, page);
> >+ unsigned long pages_to_folio_end = nr_pages - fault_page_idx;
> >+
> >+ bool fits_vma = folio_within_vma(folio, vma);
> >+
> > /* The index of the entry in the pagetable for fault page. */
> > pgoff_t pte_off = pte_index(vmf->address);
> >
> >- /*
> >- * Fallback to per-page fault in case the folio size in page
> >- * cache beyond the VMA limits and PMD pagetable limits.
> >- */
> >- if (unlikely(vma_off < idx ||
> >- vma_off + (nr_pages - idx) > vma_pages(vma) ||
> >- pte_off < idx ||
> >- pte_off + (nr_pages - idx) > PTRS_PER_PTE)) {
> >- nr_pages = 1;
>
> TL;DR: nothing looks broken to me here. Just wanted to spelling out what
> this relies on :)
>
> Old code above had four range checks before installing multiple PTEs:
>
> 1) vma_off < idx
> 2) vma_off + (nr_pages - idx) > vma_pages(vma)
> 3) pte_off < idx
> 4) pte_off + (nr_pages - idx) > PTRS_PER_PTE
>
> PTE-table part looks equivalent after refactor. 3) and 4) become:
>
> 3) pte_off >= fault_page_idx
> 4) pte_off + pages_to_folio_end <= PTRS_PER_PTE
>
> where fault_page_idx is old idx, and pages_to_folio_end is old
> nr_pages - idx.
>
> Only VMA part looks different to me. Old code checked same range, based
> on fault address, that it later installed:
>
> [vma_off - idx, vma_off + (nr_pages - idx))
>
> After refactor, folio_within_vma() checks folio range:
>
> [folio->index, folio->index + folio_nr_pages(folio))
>
> while installed range is still based on fault address:
>
> [vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages)
>
> So these line up when:
>
> vmf->pgoff == folio->index + fault_page_idx
>
> Looks fine for page-cache users, since vmf->page comes from
> folio_file_page(folio, vmf->pgoff).
This to me all strongly suggests a comment might be helpful here :)
>
> [...]
>
> Cheers, Lance
Cheers, Lorenzo
prev parent reply other threads:[~2026-06-30 9:57 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-24 10:20 [PATCH] mm/memory: refactor finish_fault Sarthak Sharma
2026-06-25 8:52 ` David Hildenbrand (Arm)
2026-06-25 9:25 ` Sarthak Sharma
2026-06-30 7:51 ` Lance Yang
2026-06-30 9:57 ` Lorenzo Stoakes [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=akOS23WSqp_6oXpW@lucifer \
--to=ljs@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=lance.yang@linux.dev \
--cc=liam@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=sarthak.sharma@arm.com \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox