From: Lance Yang <lance.yang@linux.dev>
To: Dev Jain <dev.jain@arm.com>, linmiaohe@huawei.com
Cc: muchun.song@linux.dev, osalvador@suse.de,
akpm@linux-foundation.org, ljs@kernel.org, david@kernel.org,
liam@infradead.org, riel@surriel.com, vbabka@kernel.org,
harry@kernel.org, jannh@google.com, kas@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
rcampbell@nvidia.com, apopple@nvidia.com, ziy@nvidia.com,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, mel@csn.ul.ie,
nao.horiguchi@gmail.com, ak@linux.intel.com,
j-nomura@ce.jp.nec.com, pfalcato@suse.de, dave.hansen@intel.com,
tglx@kernel.org, jpoimboe@kernel.org, ryan.roberts@arm.com,
anshuman.khandual@arm.com, stable@vger.kernel.org
Subject: Re: [PATCH 4/5] mm/page_vma_mapped: use huge_ptep_get() for hugetlb
Date: Sat, 27 Jun 2026 00:46:38 +0800
Message-ID: <edee8461-34c6-41e9-ae0c-076380d92ebb@linux.dev>
In-Reply-To: <61e9fcf7-02a5-4285-948b-62fba4dcd69c@arm.com>

On 2026/6/26 23:26, Dev Jain wrote:
>
>
> On 26/06/26 7:40 pm, Lance Yang wrote:
>>
>> On Fri, Jun 26, 2026 at 06:53:10PM +0530, Dev Jain wrote:
>>>
>>>
>>> On 26/06/26 1:18 pm, Lance Yang wrote:
>>>>
>>>> On Thu, Jun 25, 2026 at 11:29:53AM +0000, Dev Jain wrote:
>>>>> check_pte() is the final validation step in page_vma_mapped_walk().
>>>>> It reads pvmw->pte with ptep_get() to decide whether the entry maps
>>>>> the PFN range being walked. For hugetlb VMAs, that pointer refers
>>>>> to a hugetlb entry.
>>>>>
>>>>> On arches which provide their own huge_ptep_get() to dereference a huge
>>>>> pte pointer, accessing via ptep_get() would cause pte_pfn(),
>>>>> pte_present() etc to misbehave.
>>>>>
>>>>> It is not clear whether this has a trivially visible effect to userspace.
>>>>>
>>>>> Use huge_ptep_get() to dereference a huge pte pointer.
>>>>>
>>>>> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>> ---
>>>>> mm/page_vma_mapped.c | 8 +++++++-
>>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>>>> index 2ccbabfb2cc17..18e1d341f463c 100644
>>>>> --- a/mm/page_vma_mapped.c
>>>>> +++ b/mm/page_vma_mapped.c
>>>>> @@ -107,7 +107,13 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
>>>>> static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>>>>> {
>>>>> unsigned long pfn;
>>>>> - pte_t ptent = ptep_get(pvmw->pte);
>>>>> + pte_t ptent;
>>>>> +
>>>>> + if (is_vm_hugetlb_page(pvmw->vma))
>>>>> + ptent = huge_ptep_get(pvmw->vma->vm_mm, pvmw->address,
>>>>> + pvmw->pte);
>>>>
>>>> I think check_pte() can pass a wrong address to huge_ptep_get() ...
>>>
>>> Won't this be handled by rmap_walk_anon/rmap_walk_file - they are the ones
>>> performing the rmap traversal and passing address to try_to_unmap_one/folio_referenced_one
>>> etc ...
>>
>> Right, that should cover the rmap callbacks. The bit I was worried about
>> is page_mapped_in_vma() though.
>>
>>>>
>>>> Not sure that is wrong in the first place. For memory failure,
>>>> page_mapped_in_vma() can be called with a poisoned tail page of a hugetlb
>>>> folio. In that case, pvmw->address need not be hugepage-aligned.
>>>>
>>>> @Miaohe
>>
>> For hugetlb memory failure we start with the poisoned PFN:
>>
>> static int try_memory_failure_hugetlb(unsigned long pfn, int flags)
>> {
>> ...
>> struct page *p = pfn_to_page(pfn);
>> struct folio *folio;
>> ...
>>
>> folio = page_folio(p);
>>
>> ...
>>
>> if (!hwpoison_user_mappings(folio, p, pfn, flags)) {
>> ...
>> }
>>
>> ...
>> }
>>
>> and pass the same p down:
>>
>> static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
>> unsigned long pfn, int flags)
>> {
>> ...
>>
>> collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>
>> ...
>> }
>>
>> static void collect_procs(const struct folio *folio, const struct page *page,
>> struct list_head *tokill, int force_early)
>> {
>> ...
>>
>> if (unlikely(folio_test_ksm(folio)))
>> ...
>> else if (folio_test_anon(folio))
>> collect_procs_anon(folio, page, tokill, force_early);
>> else
>> ...
>> }
>>
>> So collect_procs_anon() still gets the poisoned page, not &folio->page:
>>
>> static void collect_procs_anon(const struct folio *folio,
>> const struct page *page, struct list_head *to_kill,
>> int force_early)
>> {
>> ...
>>
>> pgoff = page_pgoff(folio, page);
>> rcu_read_lock();
>> for_each_process(tsk) {
>> ...
>>
>> anon_vma_interval_tree_foreach(vmac, &av->rb_root,
>> pgoff, pgoff) {
>> ...
>> addr = page_mapped_in_vma(page, vma);
>> ...
>> }
>> }
>> rcu_read_unlock();
>> anon_vma_unlock_read(av);
>> }
>>
>> page_mapped_in_vma() then builds pvmw for that page:
>>
>> unsigned long page_mapped_in_vma(const struct page *page,
>> struct vm_area_struct *vma)
>> {
>> const struct folio *folio = page_folio(page);
>> struct page_vma_mapped_walk pvmw = {
>> .pfn = page_to_pfn(page),
>> .nr_pages = 1,
>> .vma = vma,
>> .flags = PVMW_SYNC,
>> };
>>
>> pvmw.address = vma_address(vma, page_pgoff(folio, page), 1);
>> ...
>> }
>>
>> and page_pgoff() includes the subpage index:
>>
>> static inline pgoff_t page_pgoff(const struct folio *folio,
>> const struct page *page)
>> {
>> return folio->index + folio_page_idx(folio, page);
>> }
>>
>> So if the poisoned PFN points to a tail page, pvmw->address can be offset
>> from the start of the hugetlb mapping by
>>
>> folio_page_idx(folio, page) << PAGE_SHIFT
>>
>> Should check_pte() pass the hugepage-aligned address to huge_ptep_get()
>> for that case?
>
> Thanks! This looks correct.
>
> I can indeed fix this up in check_pte. But in the memory-failure path
> it has always been confusing to me for hugetlb folios why we are bothering
> with the tail page. I am sure that area can also be simplified. But for
> now I'll just do a simple fix here itself.

Just thinking out loud: given that huge_ptep_get() already assumes that
addr matches the huge pte, at least on arm64, would it make sense to
have a small hugetlb wrapper around it that takes hstate and aligns
the address before calling the arch helper?

Might make the rule clearer, and a bit harder to get wrong again :)
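
To make the idea concrete, here is a rough userspace model of such a
wrapper. It is only a sketch and not kernel code: every model_* name is
hypothetical, the hstate is reduced to a single order field, and the
geometry (16K base pages, 32M per PMD entry, 32 contiguous PMDs, 128
contiguous PTEs) is assumed from the arm64 numbers quoted further down.
model_find_num_contig() mirrors the comparison in the quoted
find_num_contig(); the wrapper just masks the address with the huge page
mask first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. Every name here is a stand-in invented for this
 * sketch; none of it is the kernel's real definition.
 * Assumed geometry: arm64 with 16K base pages.
 */
#define MODEL_PAGE_SHIFT	14UL
#define MODEL_PMD_SHIFT		25UL	/* 32M covered by one PMD entry */
#define MODEL_PTRS_PER_PMD	2048UL
#define MODEL_CONT_PMDS		32
#define MODEL_CONT_PTES		128

typedef unsigned long model_pte_t;

/* Simplified hstate: only the huge page order, in base pages. */
struct model_hstate {
	unsigned int order;
};

/* A single PMD table standing in for the real page table walk. */
static model_pte_t model_pmd_table[MODEL_PTRS_PER_PMD];

static unsigned long model_huge_page_mask(const struct model_hstate *h)
{
	return ~((1UL << (h->order + MODEL_PAGE_SHIFT)) - 1);
}

/* Slot in the model PMD table that covers addr. */
static model_pte_t *model_pmd_offset(unsigned long addr)
{
	unsigned long idx = (addr >> MODEL_PMD_SHIFT) & (MODEL_PTRS_PER_PMD - 1);

	return &model_pmd_table[idx];
}

/*
 * Same decision as the quoted find_num_contig(): if the PMD slot derived
 * from addr is ptep, treat the entry as CONT_PMD, otherwise as CONT_PTE.
 */
static int model_find_num_contig(unsigned long addr, const model_pte_t *ptep)
{
	if (model_pmd_offset(addr) == ptep)
		return MODEL_CONT_PMDS;
	return MODEL_CONT_PTES;
}

/*
 * Hypothetical wrapper: align addr to the huge page described by h
 * before handing it to the arch-style helper, so callers may pass any
 * address inside the mapping.
 */
static int model_hugetlb_num_contig(const struct model_hstate *h,
				    unsigned long addr,
				    const model_pte_t *ptep)
{
	assert(h && ptep);
	return model_find_num_contig(addr & model_huge_page_mask(h), ptep);
}
```

With order 16 (a 1G CONT_PMD page in this model), a mapping at
0x40000000 and a tail address 40M into it, the unaligned lookup picks
the CONT_PTES count, while the wrapper picks CONT_PMDS.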

Thanks, Lance

>
>>
>> Cheers, Lance
>>
>>>>
>>>> For arm64, CONT_PMD_SIZE is one supported HugeTLB size. With such a VMA,
>>>> page_vma_mapped_walk() passes that size to hugetlb_walk():
>>>>
>>>> bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>>> {
>>>> ...
>>>> if (unlikely(is_vm_hugetlb_page(vma))) {
>>>> ...
>>>> pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
>>>> ...
>>>> }
>>>> ...
>>>> }
>>>>
>>>> hugetlb_walk() then calls arm64 huge_pte_offset(mm, addr, sz). For
>>>> sz == CONT_PMD_SIZE, huge_pte_offset() aligns its local addr before
>>>> calculating pmdp:
>>>>
>>>> pte_t *huge_pte_offset(struct mm_struct *mm,
>>>> unsigned long addr, unsigned long sz)
>>>> {
>>>> ...
>>>> if (sz == CONT_PMD_SIZE)
>>>> addr &= CONT_PMD_MASK;
>>>>
>>>> pmdp = pmd_offset(pudp, addr);
>>>> pmd = READ_ONCE(*pmdp);
>>>> ...
>>>> }
>>>>
>>>> So for that case, pvmw->pte is calculated from the aligned addr, not
>>>> necessarily from the original pvmw->address. But check_pte() passes the
>>>> original address together with pvmw->pte:
>>>>
>>>> + ptent = huge_ptep_get(pvmw->vma->vm_mm, pvmw->address,
>>>> + pvmw->pte);
>>>>
>>>> arm64 then uses that addr again to choose ncontig:
>>>>
>>>> pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>>>> {
>>>> ...
>>>> ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>>>> for (i = 0; i < ncontig; i++, ptep++) {
>>>> ...
>>>> }
>>>> return orig_pte;
>>>> }
>>>>
>>>> static int find_num_contig(struct mm_struct *mm, unsigned long addr,
>>>> pte_t *ptep, size_t *pgsize)
>>>> {
>>>> pgd_t *pgdp = pgd_offset(mm, addr);
>>>> p4d_t *p4dp;
>>>> pud_t *pudp;
>>>> pmd_t *pmdp;
>>>>
>>>> *pgsize = PAGE_SIZE;
>>>> p4dp = p4d_offset(pgdp, addr);
>>>> pudp = pud_offset(p4dp, addr);
>>>> pmdp = pmd_offset(pudp, addr);
>>>> if ((pte_t *)pmdp == ptep) {
>>>> *pgsize = PMD_SIZE;
>>>> return CONT_PMDS;
>>>> }
>>>> return CONT_PTES;
>>>> }
>>>>
>>>> With a tail address, pmdp may no longer point at pvmw->pte, so
>>>> find_num_contig() can return CONT_PTES for a CONT_PMD HugeTLB mapping.
>>>>
>>>> On 16K arm64, that changes ncontig from 32 to 128. So huge_ptep_get()
>>>> can walk past the CONT_PMD entries, and possibly past the PMD table.
>>>>
>>>> Should check_pte() pass the address matching pvmw->pte, sth like:
>>>>
>>>> ---8<---
>>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>>> index 406fd50bbd8f..58463493bd3d 100644
>>>> --- a/mm/page_vma_mapped.c
>>>> +++ b/mm/page_vma_mapped.c
>>>> @@ -109,11 +109,14 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>>>> unsigned long pfn;
>>>> pte_t ptent;
>>>>
>>>> - if (is_vm_hugetlb_page(pvmw->vma))
>>>> - ptent = huge_ptep_get(pvmw->vma->vm_mm, pvmw->address,
>>>> - pvmw->pte);
>>>> - else
>>>> + if (is_vm_hugetlb_page(pvmw->vma)) {
>>>> + struct hstate *hstate = hstate_vma(pvmw->vma);
>>>> + unsigned long haddr = pvmw->address & huge_page_mask(hstate);
>>>> +
>>>> + ptent = huge_ptep_get(pvmw->vma->vm_mm, haddr, pvmw->pte);
>>>> + } else {
>>>> ptent = ptep_get(pvmw->pte);
>>>> + }
>>>>
>>>> if (pvmw->flags & PVMW_MIGRATION) {
>>>> const softleaf_t entry = softleaf_from_pte(ptent);
>>>> --
>>>>
>>>> while leaving pvmw->address unchanged for page_mapped_in_vma()?
>>>>
>>>> Cheers, Lance
>>>>
>>>>> + else
>>>>> + ptent = ptep_get(pvmw->pte);
>>>>>
>>>>> if (pvmw->flags & PVMW_MIGRATION) {
>>>>> const softleaf_t entry = softleaf_from_pte(ptent);
>>>>> --
>>>>> 2.43.0
>>>>>
>>>>>
>>>
>>>
>>
>