From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC81B377006 for ; Tue, 30 Jun 2026 09:57:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782813442; cv=none; b=GiPkGZD59wjOBeUawFHB3DY176eg6QuN+ZDZg0xOPFhrL4NNuzOQwkRgC/ObOeFXUqQmzArAYmAphqDh4VA5a4iHLKqxJi/+Nk3qs9+8MAwn2zMLL+HlVb2MeVm4popW6/v15Jjkdfks16Cf2LOHV0soHDhRUB9XTcO/YP5DbYs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782813442; c=relaxed/simple; bh=vYuWM4twKo1pdxyOLrbys26V1TpydzI3rOpCBkk7D+c=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Bb4buqcHqC63+3IF3Yq0mi83wqnEcVNsxV3xefcOZy2Bjy1EA2aFlUaDl2arRZaiyRdMbZIWhi51C4MOg5agE1/PffiOWtF3kEqqtKRA627GHmX1cSoriXqOXAI7d/dJnvK0zgUPa9ShSTLnpaAIOEUKNrZcEWaJV3bkJ8pVEnE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ADB7zKnZ; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ADB7zKnZ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BBF7A1F000E9; Tue, 30 Jun 2026 09:57:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782813440; bh=uswHKiiboqANt8oiY8a09HzTyTmoJjrXo6oo4QGSw9M=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=ADB7zKnZGmNXADJ3tRtD7hXSgF5hL/kkBTFtpzCPUvF3XbQcGrg3n7rtqvwwR6OwX XBrXKMYMidEbp89jDOiawcrj7PKb9s00KRDW5uEtpgXXyOBJNrMW/Frb8imnz0cueA QxAXTJ2HEIJ9Ct/mGjniTh/vbFojk09inOK+nv3ai21XnTBU0zgIrCAPLJAssboAQw BgmTSC/FZGVVsouKJsO+nyUwkFVz0ZF9zpZ/b1TbI1/K9ASBajsaclG8gKNFWT3FkV TibgAv/SKZ4zDlwqbFH27HXMPAvw0g0l8B6poyEvj7vfbbUU95s+VDiJ2T+iJQ+xNW 4oX4m+Xcr0Cog== Date: Tue, 30 Jun 2026 10:57:11 +0100 From: Lorenzo Stoakes To: Lance Yang Cc: sarthak.sharma@arm.com, akpm@linux-foundation.org, david@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, dev.jain@arm.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] mm/memory: refactor finish_fault Message-ID: References: <20260624102047.144543-1-sarthak.sharma@arm.com> <20260630075153.78201-1-lance.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20260630075153.78201-1-lance.yang@linux.dev> On Tue, Jun 30, 2026 at 03:51:53PM +0800, Lance Yang wrote: > > On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote: > >finish_fault() currently has a goto fallback implementation > >where we try to map a large folio with PTEs. If that cannot be > >installed, we goto fallback and go through the fallback mapping > >path again. This looks weird and is tough to comprehend. > > > >Remove the goto fallback implementation and try to map the > >whole folio if allowed. If the whole folio cannot be mapped, > >fall back to single page mapping without repeating the whole > >function. > > > >The cleanup of finish_fault() was suggested by David in [1]. > > > >[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@kernel.org/ > > > >Suggested-by: David Hildenbrand (Arm) > >Signed-off-by: Sarthak Sharma > >--- > >Tested this patch by running mm selftests on baseline and patched 7.1 > >kernels. No regressions were observed. > > > > mm/memory.c | 78 ++++++++++++++++++++++++++++------------------------- > > 1 file changed, 41 insertions(+), 37 deletions(-) > > > >diff --git a/mm/memory.c b/mm/memory.c > >index 86a973119bd4..f883b3f3850e 100644 > >--- a/mm/memory.c > >+++ b/mm/memory.c > >@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf) > > vm_fault_t finish_fault(struct vm_fault *vmf) > > { > > struct vm_area_struct *vma = vmf->vma; > >- struct page *page; > >+ struct page *page, *map_page; > > struct folio *folio; > > vm_fault_t ret; > > bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) && > > !(vma->vm_flags & VM_SHARED); > >- int type, nr_pages; > >- unsigned long addr; > >+ pte_t *fault_pte, *start_pte; > >+ int type, nr_pages, nr_ptes; > >+ unsigned long start_addr; > > bool needs_fallback = false; > > > >-fallback: > >- addr = vmf->address; > >- > > /* Did we COW the page? */ > > if (is_cow) > > page = vmf->cow_page; > >@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf) > > return ret; > > } > > > >- if (!needs_fallback && vma->vm_file) { > >+ if (vma->vm_file) { > > struct address_space *mapping = vma->vm_file->f_mapping; > > pgoff_t file_end; > > > >@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf) > > > > nr_pages = folio_nr_pages(folio); > > > >+ /* Initialize for single page mapping */ > >+ map_page = page; > >+ start_addr = vmf->address; > >+ nr_ptes = 1; > >+ > > /* Using per-page fault to maintain the uffd semantics */ > >- if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) { > >- nr_pages = 1; > >- } else if (nr_pages > 1) { > >- pgoff_t idx = folio_page_idx(folio, page); > >- /* The page offset of vmf->address within the VMA. */ > >- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff; > >+ if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) { > >+ unsigned long fault_page_idx = folio_page_idx(folio, page); > >+ unsigned long pages_to_folio_end = nr_pages - fault_page_idx; > >+ > >+ bool fits_vma = folio_within_vma(folio, vma); > >+ > > /* The index of the entry in the pagetable for fault page. */ > > pgoff_t pte_off = pte_index(vmf->address); > > > >- /* > >- * Fallback to per-page fault in case the folio size in page > >- * cache beyond the VMA limits and PMD pagetable limits. > >- */ > >- if (unlikely(vma_off < idx || > >- vma_off + (nr_pages - idx) > vma_pages(vma) || > >- pte_off < idx || > >- pte_off + (nr_pages - idx) > PTRS_PER_PTE)) { > >- nr_pages = 1; > > TL;DR: nothing looks broken to me here. Just wanted to spelling out what > this relies on :) > > Old code above had four range checks before installing multiple PTEs: > > 1) vma_off < idx > 2) vma_off + (nr_pages - idx) > vma_pages(vma) > 3) pte_off < idx > 4) pte_off + (nr_pages - idx) > PTRS_PER_PTE > > PTE-table part looks equivalent after refactor. 3) and 4) become: > > 3) pte_off >= fault_page_idx > 4) pte_off + pages_to_folio_end <= PTRS_PER_PTE > > where fault_page_idx is old idx, and pages_to_folio_end is old > nr_pages - idx. > > Only VMA part looks different to me. Old code checked same range, based > on fault address, that it later installed: > > [vma_off - idx, vma_off + (nr_pages - idx)) > > After refactor, folio_within_vma() checks folio range: > > [folio->index, folio->index + folio_nr_pages(folio)) > > while installed range is still based on fault address: > > [vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages) > > So these line up when: > > vmf->pgoff == folio->index + fault_page_idx > > Looks fine for page-cache users, since vmf->page comes from > folio_file_page(folio, vmf->pgoff). This to me all strongly suggests a comment might be helpful here :) > > [...] > > Cheers, Lance Cheers, Lorenzo