Re: [PATCH] mm/memory: refactor finish_fault

Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Lance Yang <lance.yang@linux.dev>
To: sarthak.sharma@arm.com
Cc: akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, dev.jain@arm.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH] mm/memory: refactor finish_fault
Date: Tue, 30 Jun 2026 15:51:53 +0800	[thread overview]
Message-ID: <20260630075153.78201-1-lance.yang@linux.dev> (raw)
In-Reply-To: <20260624102047.144543-1-sarthak.sharma@arm.com>


On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote:
>finish_fault() currently has a goto fallback implementation
>where we try to map a large folio with PTEs. If that cannot be
>installed, we goto fallback and go through the fallback mapping
>path again. This looks weird and is tough to comprehend.
>
>Remove the goto fallback implementation and try to map the
>whole folio if allowed. If the whole folio cannot be mapped,
>fall back to single page mapping without repeating the whole
>function.
>
>The cleanup of finish_fault() was suggested by David in [1].
>
>[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@kernel.org/
>
>Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
>Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
>---
>Tested this patch by running mm selftests on baseline and patched 7.1
>kernels. No regressions were observed.
>
> mm/memory.c | 78 ++++++++++++++++++++++++++++-------------------------
> 1 file changed, 41 insertions(+), 37 deletions(-)
>
>diff --git a/mm/memory.c b/mm/memory.c
>index 86a973119bd4..f883b3f3850e 100644
>--- a/mm/memory.c
>+++ b/mm/memory.c
>@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf)
> vm_fault_t finish_fault(struct vm_fault *vmf)
> {
> 	struct vm_area_struct *vma = vmf->vma;
>-	struct page *page;
>+	struct page *page, *map_page;
> 	struct folio *folio;
> 	vm_fault_t ret;
> 	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> 		      !(vma->vm_flags & VM_SHARED);
>-	int type, nr_pages;
>-	unsigned long addr;
>+	pte_t *fault_pte, *start_pte;
>+	int type, nr_pages, nr_ptes;
>+	unsigned long start_addr;
> 	bool needs_fallback = false;
> 
>-fallback:
>-	addr = vmf->address;
>-
> 	/* Did we COW the page? */
> 	if (is_cow)
> 		page = vmf->cow_page;
>@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> 			return ret;
> 	}
> 
>-	if (!needs_fallback && vma->vm_file) {
>+	if (vma->vm_file) {
> 		struct address_space *mapping = vma->vm_file->f_mapping;
> 		pgoff_t file_end;
> 
>@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> 
> 	nr_pages = folio_nr_pages(folio);
> 
>+	/* Initialize for single page mapping */
>+	map_page = page;
>+	start_addr = vmf->address;
>+	nr_ptes = 1;
>+
> 	/* Using per-page fault to maintain the uffd semantics */
>-	if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) {
>-		nr_pages = 1;
>-	} else if (nr_pages > 1) {
>-		pgoff_t idx = folio_page_idx(folio, page);
>-		/* The page offset of vmf->address within the VMA. */
>-		pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>+	if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) {
>+		unsigned long fault_page_idx = folio_page_idx(folio, page);
>+		unsigned long pages_to_folio_end = nr_pages - fault_page_idx;
>+
>+		bool fits_vma = folio_within_vma(folio, vma);
>+
> 		/* The index of the entry in the pagetable for fault page. */
> 		pgoff_t pte_off = pte_index(vmf->address);
> 
>-		/*
>-		 * Fallback to per-page fault in case the folio size in page
>-		 * cache beyond the VMA limits and PMD pagetable limits.
>-		 */
>-		if (unlikely(vma_off < idx ||
>-			    vma_off + (nr_pages - idx) > vma_pages(vma) ||
>-			    pte_off < idx ||
>-			    pte_off + (nr_pages - idx)  > PTRS_PER_PTE)) {
>-			nr_pages = 1;

TL;DR: nothing looks broken to me here. Just wanted to spelling out what
this relies on :)

Old code above had four range checks before installing multiple PTEs:

  1) vma_off < idx
  2) vma_off + (nr_pages - idx) > vma_pages(vma)
  3) pte_off < idx
  4) pte_off + (nr_pages - idx) > PTRS_PER_PTE

PTE-table part looks equivalent after refactor. 3) and 4) become:

  3) pte_off >= fault_page_idx
  4) pte_off + pages_to_folio_end <= PTRS_PER_PTE

where fault_page_idx is old idx, and pages_to_folio_end is old
nr_pages - idx.

Only VMA part looks different to me. Old code checked same range, based
on fault address, that it later installed:

  [vma_off - idx, vma_off + (nr_pages - idx))

After refactor, folio_within_vma() checks folio range:

  [folio->index, folio->index + folio_nr_pages(folio))

while installed range is still based on fault address:

  [vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages)

So these line up when:

  vmf->pgoff == folio->index + fault_page_idx

Looks fine for page-cache users, since vmf->page comes from
folio_file_page(folio, vmf->pgoff).

[...]

Cheers, Lance

next prev parent reply	other threads:[~2026-06-30  7:52 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24 10:20 [PATCH] mm/memory: refactor finish_fault Sarthak Sharma
2026-06-25  8:52 ` David Hildenbrand (Arm)
2026-06-25  9:25   ` Sarthak Sharma
2026-06-30  7:51 ` Lance Yang [this message]
2026-06-30  9:57   ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260630075153.78201-1-lance.yang@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=sarthak.sharma@arm.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox