From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Lance Yang <lance.yang@linux.dev>, dev.jain@arm.com
Cc: linmiaohe@huawei.com, muchun.song@linux.dev, osalvador@suse.de,
	akpm@linux-foundation.org, ljs@kernel.org, liam@infradead.org,
	riel@surriel.com, vbabka@kernel.org, harry@kernel.org,
	jannh@google.com, kas@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, rcampbell@nvidia.com,
	apopple@nvidia.com, ziy@nvidia.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net, ying.huang@linux.alibaba.com, mel@csn.ul.ie,
	nao.horiguchi@gmail.com, ak@linux.intel.com,
	j-nomura@ce.jp.nec.com, pfalcato@suse.de, dave.hansen@intel.com,
	tglx@kernel.org, jpoimboe@kernel.org, ryan.roberts@arm.com,
	anshuman.khandual@arm.com, stable@vger.kernel.org
Subject: Re: [PATCH 4/5] mm/page_vma_mapped: use huge_ptep_get() for hugetlb
Date: Mon, 29 Jun 2026 10:05:34 +0200	[thread overview]
Message-ID: <a1c6c3dd-8db1-4db6-b032-e350bacc4577@kernel.org> (raw)
In-Reply-To: <20260629074802.42727-1-lance.yang@linux.dev>

On 6/29/26 09:48, Lance Yang wrote:
> 
> On Mon, Jun 29, 2026 at 09:25:48AM +0200, David Hildenbrand (Arm) wrote:
>> On 6/29/26 08:48, Dev Jain wrote:
>>>
>>>
>>>
>>> Sashiko notes other places:
>>>
>>> https://sashiko.dev/#/patchset/20260625112955.3254283-1-dev.jain%40arm.com
>>
>> Yeah, that looks shaky. We do seem to have a bunch of these cases, primarily
>> from pagewalk code (where some users like pagemap need the actual address).
> 
> Indeed ...
> 
>> I think we have two options
>>
>> 1) To prevent any (further) issues, make huge_ptep_get() always consume the
>> hstate, and let the arch code deal with aligning it. Invasive.
> 
> Kinda lean toward option 1, even if it's more invasive. If we pass the
> hstate down, each arch can figure out the right addr from there.
> 
>> 2) Make the arch code handle aligning without the hstate.
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 30772a909aea3..303a1b74796c9 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -126,6 +126,9 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>>                return orig_pte;
>>
>>        ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>> +       ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * ncontig);
>> +       orig_pte = __ptep_get(ptep);
>> +
>>        for (i = 0; i < ncontig; i++, ptep++) {
>>                pte_t pte = __ptep_get(ptep);
>>
>> (nshift/order instead of ncontig might avoid a multiplication, but not sure if that matters in practice)
>>
>> IIUC, that's similar to what huge_ptep_get() does on ppc.
>>
>>
>> static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>> {
>> 	if (ptep_is_8m_pmdp(mm, addr, ptep))
>> 		ptep = pte_offset_kernel((pmd_t *)ptep, ALIGN_DOWN(addr, SZ_8M));
>> 	return ptep_get(ptep);
>> }
>>
>> I'd assume we could do the same on riscv. Besides that, I don't think any arch has cont
>> entries.
> 
> AFAICT, for huge_ptep_get() the addr users are arm64 and powerpc, riscv
> doesn't really care about addr there. Looks mostly arm64-specific ...

powerpc handles it correctly in the weird "span two PMD entries" case by
aligning the PMD down.

RISC-V copied this from arm64, but it can derive the number of entries from the
PTE value itself, so it doesn't have to re-walk the table using the address.
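A toy model of that point follows. It assumes, as in the Svnapot encoding, that each entry of a NAPOT range encodes the size in its low PFN bits (pfn[3:0] == 0b1000 for a 64 KiB mapping), and it generalizes that pattern to other orders. The toy_* names are hypothetical and are not kernel API. __builtin_ctzll() is a GCC/Clang builtin. The sketch only shows that the order can be read from any single entry, without looking at the address or walking the table:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not kernel code: a "PTE" is just its PFN plus a NAPOT flag.
 * Assumed encoding: a NAPOT mapping of 2^order base pages clears PFN bits
 * [order-2:0] and sets bit (order-1), e.g. order 4 (64 KiB) -> pfn[3:0] = 0b1000.
 */
struct toy_pte {
	uint64_t pfn;
	int napot;
};

/* Derive the contiguous order from the entry itself: no table walk needed. */
static unsigned int toy_napot_order(struct toy_pte pte)
{
	assert(pte.napot && pte.pfn != 0);
	return (unsigned int)__builtin_ctzll(pte.pfn) + 1;
}

/* Number of PTEs covered by the NAPOT mapping, i.e. 1 << order. */
static unsigned long toy_napot_pte_num(struct toy_pte pte)
{
	return 1UL << toy_napot_order(pte);
}
```

Because the size lives in every entry, the same answer comes back no matter which entry of the range ptep happens to point at.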

But I think the following is still required to fix riscv, no?

diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index a6d217112cf46..7e25cc13b3dba 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -5,6 +5,7 @@
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
 pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
-       unsigned long pte_num;
+       unsigned long pte_num, pte_order;
        int i;
        pte_t orig_pte = ptep_get(ptep);
@@ -12,7 +13,11 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
        if (!pte_present(orig_pte) || !pte_napot(orig_pte))
                return orig_pte;

-       pte_num = napot_pte_num(napot_cont_order(orig_pte));
+       pte_order = napot_cont_order(orig_pte);
+       pte_num = napot_pte_num(pte_order);
+
+       ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) << pte_order);
+       orig_pte = ptep_get(ptep);

        for (i = 0; i < pte_num; i++, ptep++) {
                pte_t pte = ptep_get(ptep);
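
The effect of the two added lines can be checked in a userspace toy. ptep may point at any entry of a block of (1 << order) entries. Rounding it down with the same arithmetic as PTR_ALIGN_DOWN(ptep, sizeof(*ptep) << pte_order) lands on the first entry before the dirty/young bits are folded in. toy_pte_t, TOY_DIRTY/TOY_YOUNG and the table layout are assumptions for illustration. The sketch relies on the table being 4 KiB aligned and on each block starting at a multiple of its own size, as with real page-table pages:

```c
#include <assert.h>
#include <stdint.h>

/* Toy PTE: a single 64-bit word, like pte_t on 64-bit kernels. */
typedef struct {
	uint64_t val;
} toy_pte_t;

#define TOY_DIRTY 0x1ULL
#define TOY_YOUNG 0x2ULL

/*
 * Mirrors the fixed pattern: align ptep down to the first of the
 * (1 << order) entries, re-read the head entry, then fold in the
 * dirty/young bits of every entry in the block.
 */
static toy_pte_t toy_huge_ptep_get(toy_pte_t *ptep, unsigned int order)
{
	unsigned long i, num;
	toy_pte_t orig;

	assert(order < 8 * sizeof(unsigned long));
	num = 1UL << order;

	/* Same arithmetic as PTR_ALIGN_DOWN(ptep, sizeof(*ptep) << order). */
	ptep = (toy_pte_t *)((uintptr_t)ptep &
			     ~(((uintptr_t)sizeof(*ptep) << order) - 1));
	orig = *ptep;

	for (i = 0; i < num; i++, ptep++)
		orig.val |= ptep->val & (TOY_DIRTY | TOY_YOUNG);
	return orig;
}

/* A 4 KiB-aligned toy table, like a real page-table page. */
static _Alignas(4096) toy_pte_t toy_table[64];

/* Entries [16, 32) form one order-4 block; entry 20 is dirty, 27 young. */
static void toy_fill_table(void)
{
	for (int i = 16; i < 32; i++)
		toy_table[i].val = 0x1000;
	toy_table[20].val |= TOY_DIRTY;
	toy_table[27].val |= TOY_YOUNG;
}
```

Without the alignment step, a lookup starting at &toy_table[25] would scan entries 25..40. It would miss the dirty bit on entry 20 and read past the end of the block.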



I'd prefer (2) first, as a simple fix suitable for stable.

If we then do (1) on top, huge_ptep_get() on arm64 could stop walking the page
table a second time.

If we also pass the hstate (or vma) to set_huge_pte_at(), huge_pte_clear() and
huge_ptep_get_and_clear(), we could likely get rid of the re-walk in
num_contig_ptes() entirely, and possibly remove it.

That would probably be cleanest.
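The following sketch shows what passing the size down would buy. With huge_page_size(h) available from the hstate, the number of entries and the size each one maps follow from the size alone, with no page-table walk. The constants assume arm64 with a 4 KiB granule (64 KiB contiguous PTEs, 32 MiB contiguous PMDs). toy_num_contig() is a hypothetical helper, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed arm64 4 KiB-granule sizes, for illustration only. */
#define TOY_PAGE_SIZE      (1UL << 12)  /* 4 KiB */
#define TOY_CONT_PTE_SIZE  (1UL << 16)  /* 16 x 4 KiB = 64 KiB */
#define TOY_PMD_SIZE       (1UL << 21)  /* 2 MiB */
#define TOY_CONT_PMD_SIZE  (1UL << 25)  /* 16 x 2 MiB = 32 MiB */
#define TOY_PUD_SIZE       (1UL << 30)  /* 1 GiB */

/*
 * Given the huge page size (e.g. huge_page_size(h)), return how many
 * entries back the mapping and store the size each entry maps in *pgsize.
 * Returns 0 for sizes this toy does not know about.
 */
static int toy_num_contig(unsigned long size, size_t *pgsize)
{
	assert(pgsize != NULL);

	switch (size) {
	case TOY_PUD_SIZE:
	case TOY_PMD_SIZE:
		*pgsize = size;
		return 1;
	case TOY_CONT_PMD_SIZE:
		*pgsize = TOY_PMD_SIZE;
		return (int)(TOY_CONT_PMD_SIZE / TOY_PMD_SIZE);
	case TOY_CONT_PTE_SIZE:
		*pgsize = TOY_PAGE_SIZE;
		return (int)(TOY_CONT_PTE_SIZE / TOY_PAGE_SIZE);
	default:
		return 0;
	}
}
```

Combined with aligning ptep down to the first entry, this count is all that is needed to visit the whole contiguous range.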

-- 
Cheers,

David

Thread overview: 39+ messages
2026-06-25 11:29 [PATCH 0/5] Fix incorrect access of hugetlb pte entries Dev Jain
2026-06-25 11:29 ` [PATCH 1/5] mm/rmap: use huge_ptep_get() in try_to_unmap_one() Dev Jain
2026-06-26  3:17   ` Muchun Song
2026-06-26  4:03     ` Dev Jain
2026-06-26  4:16       ` Muchun Song
2026-06-25 11:29 ` [PATCH 2/5] mm/rmap: use huge_ptep_get() in try_to_migrate_one() Dev Jain
2026-06-26  3:24   ` Muchun Song
2026-06-25 11:29 ` [PATCH 3/5] mm/migrate: use huge_ptep_get() in remove_migration_pte() Dev Jain
2026-06-26  3:32   ` Muchun Song
2026-06-25 11:29 ` [PATCH 4/5] mm/page_vma_mapped: use huge_ptep_get() for hugetlb Dev Jain
2026-06-26  2:31   ` Lance Yang
2026-06-26  4:06     ` Dev Jain
2026-06-26  7:48   ` Lance Yang
2026-06-26  9:14     ` Lance Yang
2026-06-26 13:23     ` Dev Jain
2026-06-26 14:10       ` Lance Yang
2026-06-26 15:26         ` Dev Jain
2026-06-26 16:46           ` Lance Yang
2026-06-27  3:54             ` Miaohe Lin
2026-06-27  7:13             ` Dev Jain
2026-06-28  5:44               ` Lance Yang
2026-06-29  6:39                 ` David Hildenbrand (Arm)
2026-06-29  6:48                   ` Dev Jain
2026-06-29  7:25                     ` David Hildenbrand (Arm)
2026-06-29  7:48                       ` Lance Yang
2026-06-29  8:05                         ` David Hildenbrand (Arm) [this message]
2026-06-29  8:22                           ` Lance Yang
2026-06-30 11:34                           ` Dev Jain
2026-06-30 12:46                             ` David Hildenbrand (Arm)
2026-06-30 13:53                               ` Dev Jain
2026-06-30 16:40                                 ` David Hildenbrand (Arm)
2026-06-29  6:59                   ` Lance Yang
2026-06-25 11:29 ` [PATCH 5/5] mm/mprotect: " Dev Jain
2026-06-26  3:40   ` Muchun Song
2026-06-26  4:08     ` Dev Jain
2026-06-26  4:21       ` Muchun Song
2026-06-26  4:42         ` Dev Jain
2026-06-25 13:59 ` [PATCH 0/5] Fix incorrect access of hugetlb pte entries Zi Yan
2026-06-26  4:09   ` Dev Jain
