From: Lance Yang <lance.yang@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>, dev.jain@arm.com
Cc: linmiaohe@huawei.com, muchun.song@linux.dev, osalvador@suse.de,
akpm@linux-foundation.org, ljs@kernel.org, liam@infradead.org,
riel@surriel.com, vbabka@kernel.org, harry@kernel.org,
jannh@google.com, kas@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, rcampbell@nvidia.com,
apopple@nvidia.com, ziy@nvidia.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com, mel@csn.ul.ie,
nao.horiguchi@gmail.com, ak@linux.intel.com,
j-nomura@ce.jp.nec.com, pfalcato@suse.de, dave.hansen@intel.com,
tglx@kernel.org, jpoimboe@kernel.org, ryan.roberts@arm.com,
anshuman.khandual@arm.com, stable@vger.kernel.org
Subject: Re: [PATCH 4/5] mm/page_vma_mapped: use huge_ptep_get() for hugetlb
Date: Mon, 29 Jun 2026 16:22:49 +0800
Message-ID: <458f63e2-6ee4-44c7-a230-636e1927f857@linux.dev>
In-Reply-To: <a1c6c3dd-8db1-4db6-b032-e350bacc4577@kernel.org>

On 2026/6/29 16:05, David Hildenbrand (Arm) wrote:
> On 6/29/26 09:48, Lance Yang wrote:
>>
>> On Mon, Jun 29, 2026 at 09:25:48AM +0200, David Hildenbrand (Arm) wrote:
>>> On 6/29/26 08:48, Dev Jain wrote:
>>>>
>>>>
>>>>
>>>> Sashiko notes other places:
>>>>
>>>> https://sashiko.dev/#/patchset/20260625112955.3254283-1-dev.jain%40arm.com
>>>
>>> Yeah, that looks shaky. We do seem to have a bunch of these cases, primarily
>>> from pagewalk code (where some users like pagemap need the actual address).
>>
>> Indeed ...
>>
>>> I think we have two options
>>>
>>> 1) To prevent any (further) issues, make huge_ptep_get() always consume the
>>> hstate, and let the arch code deal with aligning it. Invasive.
>>
>> Kinda lean toward option 1, even if it's more invasive. If we pass the
>> hstate down, each arch can figure out the right addr from there.
>>
>>> 2) Make the arch code handle aligning without the hstate.
>>>
>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>> index 30772a909aea3..303a1b74796c9 100644
>>> --- a/arch/arm64/mm/hugetlbpage.c
>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>> @@ -126,6 +126,9 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>>> return orig_pte;
>>>
>>> ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>>> + ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * ncontig);
>>> + orig_pte = __ptep_get(ptep);
>>> +
>>> for (i = 0; i < ncontig; i++, ptep++) {
>>> pte_t pte = __ptep_get(ptep);
>>>
>>> (nshift/order instead of ncontig might avoid a multiplication, but not sure if that matters in practice)
>>>
>>> IIUC, that's similar to what huge_ptep_get() does on ppc.
>>>
>>>
>>> static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>>> {
>>> if (ptep_is_8m_pmdp(mm, addr, ptep))
>>> ptep = pte_offset_kernel((pmd_t *)ptep, ALIGN_DOWN(addr, SZ_8M));
>>> return ptep_get(ptep);
>>> }
>>>
>>> I'd assume we could do the same on riscv. Besides that, I don't think any arch has cont
>>> entries.
>>
>> AFAICT, for huge_ptep_get() the addr users are arm64 and powerpc, riscv
>> doesn't really care about addr there. Looks mostly arm64-specific ...
> powerpc handles it correctly in the weird "span two PMD entries" case by
> aligning the PMD down.
>
> RISC-V copied from arm64, but can simply derive the #entries from the PTE value;
> it doesn't have to re-walk the table using the address.

Yeah, fair enough, thanks for spelling that out!

> But I think the following is required to fix, no?
>
> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
> index a6d217112cf46..7e25cc13b3dba 100644
> --- a/arch/riscv/mm/hugetlbpage.c
> +++ b/arch/riscv/mm/hugetlbpage.c
> @@ -5,6 +5,7 @@
> #ifdef CONFIG_RISCV_ISA_SVNAPOT
> pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> {
> - unsigned long pte_num;
> + unsigned long pte_num, pte_order;
> int i;
> pte_t orig_pte = ptep_get(ptep);
> @@ -12,7 +13,11 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> if (!pte_present(orig_pte) || !pte_napot(orig_pte))
> return orig_pte;
>
> - pte_num = napot_pte_num(napot_cont_order(orig_pte));
> + pte_order = napot_cont_order(orig_pte);
> + pte_num = napot_pte_num(pte_order);
> +
> + ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) << pte_order);
> + orig_pte = ptep_get(ptep);
>
> for (i = 0; i < pte_num; i++, ptep++) {
> pte_t pte = ptep_get(ptep);
>
>
>
> I'd prefer (2) as a simple stable fix first.

Right. I'm good with (2) as the stable fix first :)
Still pretty new to arch code, but happy to stare at it some more.

> If we do (1) on top, huge_ptep_get() on arm64 could stop walking the page table
> another time.
>
> If we pass the hstate (or vma) to set_huge_pte_at(), huge_pte_clear(),
> huge_ptep_get_and_clear(), we could likely get rid of the re-walk in
> num_contig_ptes() entirely and possibly just remove it.
>
> That would probably be cleanest.

Agreed!
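
FWIW, here is a userspace sketch of what the PTR_ALIGN_DOWN() in (2) boils
down to, as far as I understand it. This is not kernel code: pte_t is just a
uint64_t stand-in, and fake_table, cont_block_start(),
cont_block_start_order() and cont_block_demo() are made-up names for
illustration. It assumes the number of contiguous entries is a power of two
and that the table itself is aligned to its own size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;	/* stand-in for the kernel's pte_t (assumption) */

/* Fake page table, aligned to its own size like a real one would be. */
static _Alignas(4096) pte_t fake_table[512];

/*
 * Round ptep down to the first entry of the naturally aligned block of
 * nr entries that contains it. nr must be a power of two.
 */
static pte_t *cont_block_start(pte_t *ptep, size_t nr)
{
	uintptr_t mask = (uintptr_t)(sizeof(*ptep) * nr) - 1;

	return (pte_t *)((uintptr_t)ptep & ~mask);
}

/* Same thing with an order instead of a count, as in the riscv hunk. */
static pte_t *cont_block_start_order(pte_t *ptep, unsigned int order)
{
	uintptr_t mask = ((uintptr_t)sizeof(*ptep) << order) - 1;

	return (pte_t *)((uintptr_t)ptep & ~mask);
}

/* Entry 21 sits in the 16-entry block that starts at entry 16. */
void cont_block_demo(void)
{
	assert(cont_block_start(&fake_table[21], 16) == &fake_table[16]);
	assert(cont_block_start_order(&fake_table[21], 4) == &fake_table[16]);
}
```

So a caller that hands in a pointer into the middle of a contiguous block
still ends up iterating from the first entry, which is the point of
re-reading orig_pte after the align-down in both hunks.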