Date: Fri, 19 Jun 2026 13:34:53 +0100
From: Lorenzo Stoakes
To: Rik van Riel
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org,
	Thomas Gleixner, Ingo Molnar, Dmitry Ilvokhin, Borislav Petkov,
	Dave Hansen, Andrew Morton, David Hildenbrand, "Liam R. Howlett",
	Vlastimil Babka, Suren Baghdasaryan
Subject: Re: [PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock
References: <20260616190300.1509639-1-riel@surriel.com>
	<20260616190300.1509639-3-riel@surriel.com>
In-Reply-To: <20260616190300.1509639-3-riel@surriel.com>

On Tue, Jun 16, 2026 at 03:02:59PM -0400, Rik van Riel wrote:
> folio_walk_start() asserts that the mmap lock is held. For callers that
> only need to read a single, already-present page, the mmap lock is a
> heavy and often badly contended hammer: the VMA can instead be
> stabilized with the per-VMA lock, and the page table pages that are
> walked are kept alive by RCU page-table freeing
> (CONFIG_MMU_GATHER_RCU_TABLE_FREE).

See below, I don't think this is correct?

>
> Add an FW_VMA_LOCKED flag. When passed, folio_walk_start() asserts the
> per-VMA lock instead of the mmap lock, requires RCU-freed page tables,
> and refuses hugetlb VMAs (PMD sharing cannot be walked safely this way).

This is mostly superfluous. You can just say you added the flag to use a VMA
lock.

You put in parens the key thing about hugetlb, I think you should break that
out.

> Everything else folio_walk_start() relies on -- the page table locks,
> pmdp_get_lockless() and pte_offset_map_lock() -- is already safe without

is -> are.

> the mmap lock, mirroring the per-VMA lock page fault path.

I'm not sure I understand why you have to have RCU-freed page tables but then
say that you didn't need it here?

Strange to reference arbitrary functions from folio_walk_start() too.

>
> No existing caller passes FW_VMA_LOCKED, so behaviour is unchanged.
>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Rik van Riel
> ---
>  include/linux/pagewalk.h |  5 +++++
>  mm/pagewalk.c            | 18 ++++++++++++++++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index b41d7265c01b..84dd0d68f747 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -150,6 +150,11 @@ typedef int __bitwise folio_walk_flags_t;
>
>  /* Walk shared zeropages (small + huge) as well. */
>  #define FW_ZEROPAGE ((__force folio_walk_flags_t)BIT(0))
> +/*
> + * The caller holds the per-VMA lock instead of the mmap lock. Only valid with
> + * RCU-freed page tables (CONFIG_MMU_GATHER_RCU_TABLE_FREE) and not for hugetlb.
> + */

Hang on, how could we be freeing higher-level page tables of a VMA that's still
locked?

A VMA lock stabilises page tables for traversal, so why do you require
CONFIG_MMU_GATHER_RCU_TABLE_FREE here? What will free the higher-level page
tables?

Ref: https://origin.kernel.org/doc/html/latest/mm/process_addrs.html#page-table

> +#define FW_VMA_LOCKED ((__force folio_walk_flags_t)BIT(1))
>
>  enum folio_walk_level {
>  	FW_LEVEL_PTE,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 3ae2586ff45b..c85364b73e12 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -890,7 +890,9 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
>   * huge_ptep_set_*, ...). Note that the page table entry stored in @fw might
>   * not correspond to the first physical entry of a logical hugetlb entry.
>   *
> - * The mmap lock must be held in read mode.
> + * The mmap lock must be held in read mode. Alternatively, if @FW_VMA_LOCKED is
> + * passed, the VMA's per-VMA lock must be held (only supported with RCU-freed
> + * page tables, i.e. CONFIG_MMU_GATHER_RCU_TABLE_FREE, and not for hugetlb).
>   *
>   * Return: folio pointer on success, otherwise NULL.
>   */
> @@ -908,7 +910,19 @@ struct folio *folio_walk_start(struct folio_walk *fw,
>  	pgd_t *pgdp;
>  	p4d_t *p4dp;
>
> -	mmap_assert_locked(vma->vm_mm);
> +	if (flags & FW_VMA_LOCKED) {
> +		/*
> +		 * Lockless walk: the per-VMA lock keeps the VMA stable, and
> +		 * RCU-freed page tables keep the walked page table pages alive
> +		 * across the lockless upper-level walk and pte_offset_map_lock().

Err, but we take locks as normal on the walk?

> +		 * Hugetlb (PMD sharing) is not supported on this path.

I don't get the explanation above, and then you just write a line that says what
your assert is doing with zero explanation here.

You should explain why hugetlb isn't supported.

> +		 */
> +		VM_WARN_ON_ONCE(!IS_ENABLED(CONFIG_MMU_GATHER_RCU_TABLE_FREE));
> +		VM_WARN_ON_ONCE(is_vm_hugetlb_page(vma));
> +		vma_assert_locked(vma);
> +	} else {
> +		mmap_assert_locked(vma->vm_mm);
> +	}
>  	vma_pgtable_walk_begin(vma);
>
>  	if (WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end))
> --
> 2.53.0-Meta
>

Thanks, Lorenzo