Date: Thu, 27 Nov 2025 16:57:06 +0000
From: Ryan Roberts
Subject: Re: [PATCH v3 08/22] mm: Allow page table accessors to be non-idempotent
To: Samuel Holland, Palmer Dabbelt, Paul Walmsley,
    linux-riscv@lists.infradead.org, Andrew Morton, David Hildenbrand,
    linux-mm@kvack.org
Cc: devicetree@vger.kernel.org, Suren Baghdasaryan,
    linux-kernel@vger.kernel.org, Mike Rapoport, Michal Hocko,
    Conor Dooley, Lorenzo Stoakes, Krzysztof Kozlowski, Alexandre Ghiti,
    Emil Renner Berthing, Rob Herring, Vlastimil Babka, Liam R. Howlett
References: <20251113014656.2605447-1-samuel.holland@sifive.com>
    <20251113014656.2605447-9-samuel.holland@sifive.com>
In-Reply-To: <20251113014656.2605447-9-samuel.holland@sifive.com>

On 13/11/2025 01:45, Samuel Holland wrote:
> Currently, some functions such as pte_offset_map() are passed both
> pointers to hardware page tables, and pointers to previously-read PMD
> entries on the stack. To ensure correctness in the first case, these
> functions must use the page table accessor function (pmdp_get()) to
> dereference the supplied pointer. However, this means pmdp_get() is
> called twice in the second case. This double call must be avoided if
> pmdp_get() applies some non-idempotent transformation to the value.
>
> Avoid the double transformation by calling set_pmd() on the stack
> variables where necessary to keep set_pmd()/pmdp_get() calls balanced.

I don't think this is a good solution. arm64, at least, expects and
requires that only pointers to entries in pgtables are passed to the
arch helpers (e.g. set_pte(), ptep_get(), etc.).
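To make the problem being worked around concrete, here is a
deliberately simplified user-space sketch. The shift-based hw<->sw
transformation and the helper bodies are invented for illustration;
no architecture does exactly this:

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t val; } pmd_t;

/* hw -> sw: a hypothetical non-idempotent transformation */
static pmd_t pmdp_get(const pmd_t *pmdp)
{
	return (pmd_t){ pmdp->val << 1 };
}

/* sw -> hw: the inverse transformation, applied on store */
static void set_pmd(pmd_t *pmdp, pmd_t pmd)
{
	pmdp->val = pmd.val >> 1;
}

/* stands in for pte_offset_map(): dereferences via pmdp_get() */
static uint64_t read_via_accessor(const pmd_t *pmdp)
{
	return pmdp_get(pmdp).val;
}

int main(void)
{
	pmd_t slot;				/* a "real" pgtable slot */
	pmd_t pmd;

	set_pmd(&slot, (pmd_t){ 8 });		/* stored in hw format: 4 */
	pmd = pmdp_get(&slot);			/* sw format, on the stack: 8 */

	/* BUG: &pmd gets the hw->sw transform applied a second time */
	printf("%llu\n", (unsigned long long)read_via_accessor(&pmd));	/* 16 */

	/* the patch's workaround: rebalance with set_pmd() first */
	set_pmd(&pmd, pmd);			/* back to hw format: 4 */
	printf("%llu\n", (unsigned long long)read_via_accessor(&pmd));	/* 8 */

	return 0;
}

The patch keeps the calls balanced by sprinkling set_pmd() onto stack
variables, i.e. it relies on &pmd being usable as if it pointed into a
pgtable. That is exactly the assumption that does not hold on arm64: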
For PTEs, arm64 accesses adjacent entries within the page table to
manage contiguous mappings. If it is passed a pointer to a stack
variable, it may erroneously access other stuff on the stack thinking
it is an entry in a page table. (A simplified sketch of this hazard
follows the quoted patch below.)

I think we should formalize this as a clear requirement for all these
functions: all pte/pmd/pud/p4d/pgd pointers passed to the arch pgtable
helpers must always point to entries in pgtables. arm64 will very
likely take advantage of this in future in the pmd/pud/... helpers, as
it does today at the pte level. But even today, arm64's set_pmd() will
emit barriers which are totally unnecessary when operating on a stack
variable that the HW PTW will never see.

Thanks,
Ryan

> 
> Signed-off-by: Samuel Holland
> ---
> 
> (no changes since v2)
> 
> Changes in v2:
> - New patch for v2
> 
>  kernel/events/core.c  | 2 ++
>  mm/gup.c              | 3 +++
>  mm/khugepaged.c       | 6 ++++--
>  mm/page_table_check.c | 3 +++
>  mm/pgtable-generic.c  | 2 ++
>  5 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index fa4f9165bd94..7969b060bf2d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8154,6 +8154,8 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm, unsigned long addr)
>  	if (pmd_leaf(pmd))
>  		return pmd_leaf_size(pmd);
>  
> +	/* transform pmd as if &pmd pointed to a hardware page table */
> +	set_pmd(&pmd, pmd);
>  	ptep = pte_offset_map(&pmd, addr);
>  	if (!ptep)
>  		goto again;
> diff --git a/mm/gup.c b/mm/gup.c
> index 549f9e868311..aba61704049e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2844,7 +2844,10 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
>  	int ret = 0;
>  	pte_t *ptep, *ptem;
>  
> +	/* transform pmd as if &pmd pointed to a hardware page table */
> +	set_pmd(&pmd, pmd);
>  	ptem = ptep = pte_offset_map(&pmd, addr);
> +	pmd = pmdp_get(&pmd);
>  	if (!ptep)
>  		return 0;
>  	do {
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1bff8ade751a..ab1f68a7bc83 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1724,7 +1724,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  	struct mmu_notifier_range range;
>  	struct mm_struct *mm;
>  	unsigned long addr;
> -	pmd_t *pmd, pgt_pmd;
> +	pmd_t *pmd, pgt_pmd, pmdval;
>  	spinlock_t *pml;
>  	spinlock_t *ptl;
>  	bool success = false;
> @@ -1777,7 +1777,9 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  	 */
>  	if (check_pmd_state(pmd) != SCAN_SUCCEED)
>  		goto drop_pml;
> -	ptl = pte_lockptr(mm, pmd);
> +	/* pte_lockptr() needs a value, not a pointer to a page table */
> +	pmdval = pmdp_get(pmd);
> +	ptl = pte_lockptr(mm, &pmdval);
>  	if (ptl != pml)
>  		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>  
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 31f4c39d20ef..77d6688db0de 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -260,7 +260,10 @@ void __page_table_check_pte_clear_range(struct mm_struct *mm,
>  		return;
>  
>  	if (!pmd_bad(pmd) && !pmd_leaf(pmd)) {
> +		/* transform pmd as if &pmd pointed to a hardware page table */
> +		set_pmd(&pmd, pmd);
>  		pte_t *ptep = pte_offset_map(&pmd, addr);
> +		pmd = pmdp_get(&pmd);
>  		unsigned long i;
>  
>  		if (WARN_ON(!ptep))
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 63a573306bfa..6602deb002f1 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -299,6 +299,8 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>  		pmd_clear_bad(pmd);
>  		goto nomap;
>  	}
> +	/* transform pmdval as if &pmdval pointed to a hardware page table */
> +	set_pmd(&pmdval, pmdval);
>  	return __pte_map(&pmdval, addr);
>  nomap:
>  	rcu_read_unlock();
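Here is the adjacent-entry hazard sketched out, again as a deliberately
simplified user-space illustration: CONT_PTES, PTE_CONT, the alignment
math and the folded-in bits are invented stand-ins, not the real arm64
contpte implementation.

#include <stdint.h>
#include <stdalign.h>

#define CONT_PTES	16		/* entries per contiguous span */
#define PTE_CONT	(1ULL << 52)	/* invented "contiguous" bit */

typedef struct { uint64_t val; } pte_t;

/*
 * A contpte-style accessor: for an entry marked contiguous, fold in
 * state from the *neighbouring* slots. It silently assumes that ptep
 * points into an array of CONT_PTES naturally aligned entries, i.e.
 * into a real page table.
 */
static uint64_t contpte_style_get(pte_t *ptep)
{
	uint64_t val = ptep->val;

	if (val & PTE_CONT) {
		/* align down to the start of the contiguous block */
		pte_t *first = (pte_t *)((uintptr_t)ptep &
					 ~(CONT_PTES * sizeof(pte_t) - 1));

		/* gather per-entry bits (stand-in for AF/dirty) */
		for (int i = 0; i < CONT_PTES; i++)
			val |= first[i].val & 0x3;
	}
	return val;
}

int main(void)
{
	/* fake pgtable block, aligned like the real thing */
	static alignas(CONT_PTES * sizeof(pte_t)) pte_t table[CONT_PTES];

	table[3].val = PTE_CONT;
	(void)contpte_style_get(&table[3]);	/* fine: neighbours are valid */

	pte_t local = table[3];			/* previously-read copy */
	/*
	 * Broken: contpte_style_get(&local) would "align down" from a
	 * stack address and read CONT_PTES entries' worth of whatever
	 * happens to live there. That is the failure mode if the arch
	 * helpers are handed &pmd once the pmd/pud level learns the
	 * same neighbour walk the arm64 pte helpers do today.
	 */
	(void)local;
	return 0;
}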