Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <ljs@kernel.org>
To: Lance Yang <lance.yang@linux.dev>
Cc: sarthak.sharma@arm.com, akpm@linux-foundation.org,
	david@kernel.org,  liam@infradead.org, vbabka@kernel.org,
	rppt@kernel.org, surenb@google.com,  mhocko@suse.com,
	dev.jain@arm.com, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/memory: refactor finish_fault
Date: Tue, 30 Jun 2026 10:57:11 +0100	[thread overview]
Message-ID: <akOS23WSqp_6oXpW@lucifer> (raw)
In-Reply-To: <20260630075153.78201-1-lance.yang@linux.dev>

On Tue, Jun 30, 2026 at 03:51:53PM +0800, Lance Yang wrote:
>
> On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote:
> >finish_fault() currently has a goto fallback implementation
> >where we try to map a large folio with PTEs. If that cannot be
> >installed, we goto fallback and go through the fallback mapping
> >path again. This looks weird and is tough to comprehend.
> >
> >Remove the goto fallback implementation and try to map the
> >whole folio if allowed. If the whole folio cannot be mapped,
> >fall back to single page mapping without repeating the whole
> >function.
> >
> >The cleanup of finish_fault() was suggested by David in [1].
> >
> >[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@kernel.org/
> >
> >Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> >Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
> >---
> >Tested this patch by running mm selftests on baseline and patched 7.1
> >kernels. No regressions were observed.
> >
> > mm/memory.c | 78 ++++++++++++++++++++++++++++-------------------------
> > 1 file changed, 41 insertions(+), 37 deletions(-)
> >
> >diff --git a/mm/memory.c b/mm/memory.c
> >index 86a973119bd4..f883b3f3850e 100644
> >--- a/mm/memory.c
> >+++ b/mm/memory.c
> >@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf)
> > vm_fault_t finish_fault(struct vm_fault *vmf)
> > {
> > 	struct vm_area_struct *vma = vmf->vma;
> >-	struct page *page;
> >+	struct page *page, *map_page;
> > 	struct folio *folio;
> > 	vm_fault_t ret;
> > 	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> > 		      !(vma->vm_flags & VM_SHARED);
> >-	int type, nr_pages;
> >-	unsigned long addr;
> >+	pte_t *fault_pte, *start_pte;
> >+	int type, nr_pages, nr_ptes;
> >+	unsigned long start_addr;
> > 	bool needs_fallback = false;
> >
> >-fallback:
> >-	addr = vmf->address;
> >-
> > 	/* Did we COW the page? */
> > 	if (is_cow)
> > 		page = vmf->cow_page;
> >@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> > 			return ret;
> > 	}
> >
> >-	if (!needs_fallback && vma->vm_file) {
> >+	if (vma->vm_file) {
> > 		struct address_space *mapping = vma->vm_file->f_mapping;
> > 		pgoff_t file_end;
> >
> >@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >
> > 	nr_pages = folio_nr_pages(folio);
> >
> >+	/* Initialize for single page mapping */
> >+	map_page = page;
> >+	start_addr = vmf->address;
> >+	nr_ptes = 1;
> >+
> > 	/* Using per-page fault to maintain the uffd semantics */
> >-	if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) {
> >-		nr_pages = 1;
> >-	} else if (nr_pages > 1) {
> >-		pgoff_t idx = folio_page_idx(folio, page);
> >-		/* The page offset of vmf->address within the VMA. */
> >-		pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> >+	if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) {
> >+		unsigned long fault_page_idx = folio_page_idx(folio, page);
> >+		unsigned long pages_to_folio_end = nr_pages - fault_page_idx;
> >+
> >+		bool fits_vma = folio_within_vma(folio, vma);
> >+
> > 		/* The index of the entry in the pagetable for fault page. */
> > 		pgoff_t pte_off = pte_index(vmf->address);
> >
> >-		/*
> >-		 * Fallback to per-page fault in case the folio size in page
> >-		 * cache beyond the VMA limits and PMD pagetable limits.
> >-		 */
> >-		if (unlikely(vma_off < idx ||
> >-			    vma_off + (nr_pages - idx) > vma_pages(vma) ||
> >-			    pte_off < idx ||
> >-			    pte_off + (nr_pages - idx)  > PTRS_PER_PTE)) {
> >-			nr_pages = 1;
>
> TL;DR: nothing looks broken to me here. Just wanted to spelling out what
> this relies on :)
>
> Old code above had four range checks before installing multiple PTEs:
>
>   1) vma_off < idx
>   2) vma_off + (nr_pages - idx) > vma_pages(vma)
>   3) pte_off < idx
>   4) pte_off + (nr_pages - idx) > PTRS_PER_PTE
>
> PTE-table part looks equivalent after refactor. 3) and 4) become:
>
>   3) pte_off >= fault_page_idx
>   4) pte_off + pages_to_folio_end <= PTRS_PER_PTE
>
> where fault_page_idx is old idx, and pages_to_folio_end is old
> nr_pages - idx.
>
> Only VMA part looks different to me. Old code checked same range, based
> on fault address, that it later installed:
>
>   [vma_off - idx, vma_off + (nr_pages - idx))
>
> After refactor, folio_within_vma() checks folio range:
>
>   [folio->index, folio->index + folio_nr_pages(folio))
>
> while installed range is still based on fault address:
>
>   [vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages)
>
> So these line up when:
>
>   vmf->pgoff == folio->index + fault_page_idx
>
> Looks fine for page-cache users, since vmf->page comes from
> folio_file_page(folio, vmf->pgoff).

This to me all strongly suggests a comment might be helpful here :)

>
> [...]
>
> Cheers, Lance

Cheers, Lorenzo


      reply	other threads:[~2026-06-30  9:57 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24 10:20 [PATCH] mm/memory: refactor finish_fault Sarthak Sharma
2026-06-25  8:52 ` David Hildenbrand (Arm)
2026-06-25  9:25   ` Sarthak Sharma
2026-06-30  7:51 ` Lance Yang
2026-06-30  9:57   ` Lorenzo Stoakes [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=akOS23WSqp_6oXpW@lucifer \
    --to=ljs@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=lance.yang@linux.dev \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=sarthak.sharma@arm.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox