From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC81B377006
	for <linux-kernel@vger.kernel.org>; Tue, 30 Jun 2026 09:57:20 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1782813442; cv=none; b=GiPkGZD59wjOBeUawFHB3DY176eg6QuN+ZDZg0xOPFhrL4NNuzOQwkRgC/ObOeFXUqQmzArAYmAphqDh4VA5a4iHLKqxJi/+Nk3qs9+8MAwn2zMLL+HlVb2MeVm4popW6/v15Jjkdfks16Cf2LOHV0soHDhRUB9XTcO/YP5DbYs=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1782813442; c=relaxed/simple;
	bh=vYuWM4twKo1pdxyOLrbys26V1TpydzI3rOpCBkk7D+c=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=Bb4buqcHqC63+3IF3Yq0mi83wqnEcVNsxV3xefcOZy2Bjy1EA2aFlUaDl2arRZaiyRdMbZIWhi51C4MOg5agE1/PffiOWtF3kEqqtKRA627GHmX1cSoriXqOXAI7d/dJnvK0zgUPa9ShSTLnpaAIOEUKNrZcEWaJV3bkJ8pVEnE=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ADB7zKnZ; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ADB7zKnZ"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id BBF7A1F000E9;
	Tue, 30 Jun 2026 09:57:17 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1782813440;
	bh=uswHKiiboqANt8oiY8a09HzTyTmoJjrXo6oo4QGSw9M=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To;
	b=ADB7zKnZGmNXADJ3tRtD7hXSgF5hL/kkBTFtpzCPUvF3XbQcGrg3n7rtqvwwR6OwX
	 XBrXKMYMidEbp89jDOiawcrj7PKb9s00KRDW5uEtpgXXyOBJNrMW/Frb8imnz0cueA
	 QxAXTJ2HEIJ9Ct/mGjniTh/vbFojk09inOK+nv3ai21XnTBU0zgIrCAPLJAssboAQw
	 BgmTSC/FZGVVsouKJsO+nyUwkFVz0ZF9zpZ/b1TbI1/K9ASBajsaclG8gKNFWT3FkV
	 TibgAv/SKZ4zDlwqbFH27HXMPAvw0g0l8B6poyEvj7vfbbUU95s+VDiJ2T+iJQ+xNW
	 4oX4m+Xcr0Cog==
Date: Tue, 30 Jun 2026 10:57:11 +0100
From: Lorenzo Stoakes <ljs@kernel.org>
To: Lance Yang <lance.yang@linux.dev>
Cc: sarthak.sharma@arm.com, akpm@linux-foundation.org, david@kernel.org, 
	liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, 
	mhocko@suse.com, dev.jain@arm.com, linux-mm@kvack.org, 
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/memory: refactor finish_fault
Message-ID: <akOS23WSqp_6oXpW@lucifer>
References: <20260624102047.144543-1-sarthak.sharma@arm.com>
 <20260630075153.78201-1-lance.yang@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20260630075153.78201-1-lance.yang@linux.dev>

On Tue, Jun 30, 2026 at 03:51:53PM +0800, Lance Yang wrote:
>
> On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote:
> >finish_fault() currently has a goto fallback implementation
> >where we try to map a large folio with PTEs. If that cannot be
> >installed, we goto fallback and go through the fallback mapping
> >path again. This looks weird and is tough to comprehend.
> >
> >Remove the goto fallback implementation and try to map the
> >whole folio if allowed. If the whole folio cannot be mapped,
> >fall back to single page mapping without repeating the whole
> >function.
> >
> >The cleanup of finish_fault() was suggested by David in [1].
> >
> >[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@kernel.org/
> >
> >Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> >Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
> >---
> >Tested this patch by running mm selftests on baseline and patched 7.1
> >kernels. No regressions were observed.
> >
> > mm/memory.c | 78 ++++++++++++++++++++++++++++-------------------------
> > 1 file changed, 41 insertions(+), 37 deletions(-)
> >
> >diff --git a/mm/memory.c b/mm/memory.c
> >index 86a973119bd4..f883b3f3850e 100644
> >--- a/mm/memory.c
> >+++ b/mm/memory.c
> >@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf)
> > vm_fault_t finish_fault(struct vm_fault *vmf)
> > {
> > 	struct vm_area_struct *vma = vmf->vma;
> >-	struct page *page;
> >+	struct page *page, *map_page;
> > 	struct folio *folio;
> > 	vm_fault_t ret;
> > 	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> > 		      !(vma->vm_flags & VM_SHARED);
> >-	int type, nr_pages;
> >-	unsigned long addr;
> >+	pte_t *fault_pte, *start_pte;
> >+	int type, nr_pages, nr_ptes;
> >+	unsigned long start_addr;
> > 	bool needs_fallback = false;
> >
> >-fallback:
> >-	addr = vmf->address;
> >-
> > 	/* Did we COW the page? */
> > 	if (is_cow)
> > 		page = vmf->cow_page;
> >@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> > 			return ret;
> > 	}
> >
> >-	if (!needs_fallback && vma->vm_file) {
> >+	if (vma->vm_file) {
> > 		struct address_space *mapping = vma->vm_file->f_mapping;
> > 		pgoff_t file_end;
> >
> >@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >
> > 	nr_pages = folio_nr_pages(folio);
> >
> >+	/* Initialize for single page mapping */
> >+	map_page = page;
> >+	start_addr = vmf->address;
> >+	nr_ptes = 1;
> >+
> > 	/* Using per-page fault to maintain the uffd semantics */
> >-	if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) {
> >-		nr_pages = 1;
> >-	} else if (nr_pages > 1) {
> >-		pgoff_t idx = folio_page_idx(folio, page);
> >-		/* The page offset of vmf->address within the VMA. */
> >-		pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> >+	if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) {
> >+		unsigned long fault_page_idx = folio_page_idx(folio, page);
> >+		unsigned long pages_to_folio_end = nr_pages - fault_page_idx;
> >+
> >+		bool fits_vma = folio_within_vma(folio, vma);
> >+
> > 		/* The index of the entry in the pagetable for fault page. */
> > 		pgoff_t pte_off = pte_index(vmf->address);
> >
> >-		/*
> >-		 * Fallback to per-page fault in case the folio size in page
> >-		 * cache beyond the VMA limits and PMD pagetable limits.
> >-		 */
> >-		if (unlikely(vma_off < idx ||
> >-			    vma_off + (nr_pages - idx) > vma_pages(vma) ||
> >-			    pte_off < idx ||
> >-			    pte_off + (nr_pages - idx)  > PTRS_PER_PTE)) {
> >-			nr_pages = 1;
>
> TL;DR: nothing looks broken to me here. Just wanted to spelling out what
> this relies on :)
>
> Old code above had four range checks before installing multiple PTEs:
>
>   1) vma_off < idx
>   2) vma_off + (nr_pages - idx) > vma_pages(vma)
>   3) pte_off < idx
>   4) pte_off + (nr_pages - idx) > PTRS_PER_PTE
>
> PTE-table part looks equivalent after refactor. 3) and 4) become:
>
>   3) pte_off >= fault_page_idx
>   4) pte_off + pages_to_folio_end <= PTRS_PER_PTE
>
> where fault_page_idx is old idx, and pages_to_folio_end is old
> nr_pages - idx.
>
> Only VMA part looks different to me. Old code checked same range, based
> on fault address, that it later installed:
>
>   [vma_off - idx, vma_off + (nr_pages - idx))
>
> After refactor, folio_within_vma() checks folio range:
>
>   [folio->index, folio->index + folio_nr_pages(folio))
>
> while installed range is still based on fault address:
>
>   [vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages)
>
> So these line up when:
>
>   vmf->pgoff == folio->index + fault_page_idx
>
> Looks fine for page-cache users, since vmf->page comes from
> folio_file_page(folio, vmf->pgoff).

This to me all strongly suggests a comment might be helpful here :)

>
> [...]
>
> Cheers, Lance

Cheers, Lorenzo