From: Lance Yang <lance.yang@linux.dev>
To: sarthak.sharma@arm.com
Cc: akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, dev.jain@arm.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH] mm/memory: refactor finish_fault
Date: Tue, 30 Jun 2026 15:51:53 +0800 [thread overview]
Message-ID: <20260630075153.78201-1-lance.yang@linux.dev> (raw)
In-Reply-To: <20260624102047.144543-1-sarthak.sharma@arm.com>
On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote:
>finish_fault() currently has a goto fallback implementation
>where we try to map a large folio with PTEs. If that cannot be
>installed, we goto fallback and go through the fallback mapping
>path again. This looks weird and is tough to comprehend.
>
>Remove the goto fallback implementation and try to map the
>whole folio if allowed. If the whole folio cannot be mapped,
>fall back to single page mapping without repeating the whole
>function.
>
>The cleanup of finish_fault() was suggested by David in [1].
>
>[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@kernel.org/
>
>Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
>Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
>---
>Tested this patch by running mm selftests on baseline and patched 7.1
>kernels. No regressions were observed.
>
> mm/memory.c | 78 ++++++++++++++++++++++++++++-------------------------
> 1 file changed, 41 insertions(+), 37 deletions(-)
>
>diff --git a/mm/memory.c b/mm/memory.c
>index 86a973119bd4..f883b3f3850e 100644
>--- a/mm/memory.c
>+++ b/mm/memory.c
>@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf)
> vm_fault_t finish_fault(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
>- struct page *page;
>+ struct page *page, *map_page;
> struct folio *folio;
> vm_fault_t ret;
> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> !(vma->vm_flags & VM_SHARED);
>- int type, nr_pages;
>- unsigned long addr;
>+ pte_t *fault_pte, *start_pte;
>+ int type, nr_pages, nr_ptes;
>+ unsigned long start_addr;
> bool needs_fallback = false;
>
>-fallback:
>- addr = vmf->address;
>-
> /* Did we COW the page? */
> if (is_cow)
> page = vmf->cow_page;
>@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> return ret;
> }
>
>- if (!needs_fallback && vma->vm_file) {
>+ if (vma->vm_file) {
> struct address_space *mapping = vma->vm_file->f_mapping;
> pgoff_t file_end;
>
>@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>
> nr_pages = folio_nr_pages(folio);
>
>+ /* Initialize for single page mapping */
>+ map_page = page;
>+ start_addr = vmf->address;
>+ nr_ptes = 1;
>+
> /* Using per-page fault to maintain the uffd semantics */
>- if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) {
>- nr_pages = 1;
>- } else if (nr_pages > 1) {
>- pgoff_t idx = folio_page_idx(folio, page);
>- /* The page offset of vmf->address within the VMA. */
>- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>+ if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) {
>+ unsigned long fault_page_idx = folio_page_idx(folio, page);
>+ unsigned long pages_to_folio_end = nr_pages - fault_page_idx;
>+
>+ bool fits_vma = folio_within_vma(folio, vma);
>+
> /* The index of the entry in the pagetable for fault page. */
> pgoff_t pte_off = pte_index(vmf->address);
>
>- /*
>- * Fallback to per-page fault in case the folio size in page
>- * cache beyond the VMA limits and PMD pagetable limits.
>- */
>- if (unlikely(vma_off < idx ||
>- vma_off + (nr_pages - idx) > vma_pages(vma) ||
>- pte_off < idx ||
>- pte_off + (nr_pages - idx) > PTRS_PER_PTE)) {
>- nr_pages = 1;
TL;DR: nothing looks broken to me here. Just wanted to spelling out what
this relies on :)
Old code above had four range checks before installing multiple PTEs:
1) vma_off < idx
2) vma_off + (nr_pages - idx) > vma_pages(vma)
3) pte_off < idx
4) pte_off + (nr_pages - idx) > PTRS_PER_PTE
PTE-table part looks equivalent after refactor. 3) and 4) become:
3) pte_off >= fault_page_idx
4) pte_off + pages_to_folio_end <= PTRS_PER_PTE
where fault_page_idx is old idx, and pages_to_folio_end is old
nr_pages - idx.
Only VMA part looks different to me. Old code checked same range, based
on fault address, that it later installed:
[vma_off - idx, vma_off + (nr_pages - idx))
After refactor, folio_within_vma() checks folio range:
[folio->index, folio->index + folio_nr_pages(folio))
while installed range is still based on fault address:
[vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages)
So these line up when:
vmf->pgoff == folio->index + fault_page_idx
Looks fine for page-cache users, since vmf->page comes from
folio_file_page(folio, vmf->pgoff).
[...]
Cheers, Lance
next prev parent reply other threads:[~2026-06-30 7:52 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-24 10:20 [PATCH] mm/memory: refactor finish_fault Sarthak Sharma
2026-06-25 8:52 ` David Hildenbrand (Arm)
2026-06-25 9:25 ` Sarthak Sharma
2026-06-30 7:51 ` Lance Yang [this message]
2026-06-30 9:57 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260630075153.78201-1-lance.yang@linux.dev \
--to=lance.yang@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=liam@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=sarthak.sharma@arm.com \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox