From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-182.mta0.migadu.com (out-182.mta0.migadu.com [91.218.175.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ECC83317148 for ; Tue, 30 Jun 2026 07:52:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782805939; cv=none; b=NgQjugEZtMQDj5U7ghsITeKrZCGgHnq9L/SW397HJgy3Og9ClspX7J/qdUOs/bjYbRWI3m419une/XD+W+u/WOd4IhhCJtQSc98j8vlj90gkKIS9Dc4sFzTWtN6EmU57Y+EY8217EB57hdfRS+S/ix1kiATEh3jmsrQhgLEUgJg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782805939; c=relaxed/simple; bh=TOiLA957ZbR7V4TqUk8yAQirNE0svrC0JT5rjfwYxX8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=eBB5fMhlwyv/oyrpk7JxDnIsTEkk+AO0TJRO5m3kYy70LcKoXsrzsx1ZPk3dprtWUeom/VXK4SoGAn+LFVbWveLb7UZ2y1usb04uUNKTwJsG1o4tCUWLYWjBJsg25bh0g2H9akyh6WKz9r67FLdAizzYqTeDcAzOTvX598U/wxw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=co63CCcd; arc=none smtp.client-ip=91.218.175.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="co63CCcd" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1782805935; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xjV7MUM0fTW0l2bLsP56lxGOgmMniewaSqOOi/jChAI=; b=co63CCcdteWmQ39aIJfmwuv/Vfh6Ko3VzwQKXa/xqAC/FI35TrMiiQtsmbS1RdHREqEjuH 1EikgbTwEQ8Jkmv1Zbu1YKJ8BlHzM5S3eWL841cha6+Yo1QtuS1SU7qYZVB2c8Suq8cdha PwX/onpf3ZAtPtkuLmPRZ0Z/AHBh3Mc= From: Lance Yang To: sarthak.sharma@arm.com Cc: akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, dev.jain@arm.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lance Yang Subject: Re: [PATCH] mm/memory: refactor finish_fault Date: Tue, 30 Jun 2026 15:51:53 +0800 Message-Id: <20260630075153.78201-1-lance.yang@linux.dev> In-Reply-To: <20260624102047.144543-1-sarthak.sharma@arm.com> References: <20260624102047.144543-1-sarthak.sharma@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote: >finish_fault() currently has a goto fallback implementation >where we try to map a large folio with PTEs. If that cannot be >installed, we goto fallback and go through the fallback mapping >path again. This looks weird and is tough to comprehend. > >Remove the goto fallback implementation and try to map the >whole folio if allowed. If the whole folio cannot be mapped, >fall back to single page mapping without repeating the whole >function. > >The cleanup of finish_fault() was suggested by David in [1]. > >[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@kernel.org/ > >Suggested-by: David Hildenbrand (Arm) >Signed-off-by: Sarthak Sharma >--- >Tested this patch by running mm selftests on baseline and patched 7.1 >kernels. No regressions were observed. > > mm/memory.c | 78 ++++++++++++++++++++++++++++------------------------- > 1 file changed, 41 insertions(+), 37 deletions(-) > >diff --git a/mm/memory.c b/mm/memory.c >index 86a973119bd4..f883b3f3850e 100644 >--- a/mm/memory.c >+++ b/mm/memory.c >@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf) > vm_fault_t finish_fault(struct vm_fault *vmf) > { > struct vm_area_struct *vma = vmf->vma; >- struct page *page; >+ struct page *page, *map_page; > struct folio *folio; > vm_fault_t ret; > bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) && > !(vma->vm_flags & VM_SHARED); >- int type, nr_pages; >- unsigned long addr; >+ pte_t *fault_pte, *start_pte; >+ int type, nr_pages, nr_ptes; >+ unsigned long start_addr; > bool needs_fallback = false; > >-fallback: >- addr = vmf->address; >- > /* Did we COW the page? */ > if (is_cow) > page = vmf->cow_page; >@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf) > return ret; > } > >- if (!needs_fallback && vma->vm_file) { >+ if (vma->vm_file) { > struct address_space *mapping = vma->vm_file->f_mapping; > pgoff_t file_end; > >@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf) > > nr_pages = folio_nr_pages(folio); > >+ /* Initialize for single page mapping */ >+ map_page = page; >+ start_addr = vmf->address; >+ nr_ptes = 1; >+ > /* Using per-page fault to maintain the uffd semantics */ >- if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) { >- nr_pages = 1; >- } else if (nr_pages > 1) { >- pgoff_t idx = folio_page_idx(folio, page); >- /* The page offset of vmf->address within the VMA. */ >- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff; >+ if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) { >+ unsigned long fault_page_idx = folio_page_idx(folio, page); >+ unsigned long pages_to_folio_end = nr_pages - fault_page_idx; >+ >+ bool fits_vma = folio_within_vma(folio, vma); >+ > /* The index of the entry in the pagetable for fault page. */ > pgoff_t pte_off = pte_index(vmf->address); > >- /* >- * Fallback to per-page fault in case the folio size in page >- * cache beyond the VMA limits and PMD pagetable limits. >- */ >- if (unlikely(vma_off < idx || >- vma_off + (nr_pages - idx) > vma_pages(vma) || >- pte_off < idx || >- pte_off + (nr_pages - idx) > PTRS_PER_PTE)) { >- nr_pages = 1; TL;DR: nothing looks broken to me here. Just wanted to spelling out what this relies on :) Old code above had four range checks before installing multiple PTEs: 1) vma_off < idx 2) vma_off + (nr_pages - idx) > vma_pages(vma) 3) pte_off < idx 4) pte_off + (nr_pages - idx) > PTRS_PER_PTE PTE-table part looks equivalent after refactor. 3) and 4) become: 3) pte_off >= fault_page_idx 4) pte_off + pages_to_folio_end <= PTRS_PER_PTE where fault_page_idx is old idx, and pages_to_folio_end is old nr_pages - idx. Only VMA part looks different to me. Old code checked same range, based on fault address, that it later installed: [vma_off - idx, vma_off + (nr_pages - idx)) After refactor, folio_within_vma() checks folio range: [folio->index, folio->index + folio_nr_pages(folio)) while installed range is still based on fault address: [vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages) So these line up when: vmf->pgoff == folio->index + fault_page_idx Looks fine for page-cache users, since vmf->page comes from folio_file_page(folio, vmf->pgoff). [...] Cheers, Lance