* [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD @ 2025-04-24 8:30 Ming Wang 2025-04-24 18:48 ` Peter Xu 2025-04-26 10:13 ` Huacai Chen 0 siblings, 2 replies; 5+ messages in thread From: Ming Wang @ 2025-04-24 8:30 UTC (permalink / raw) To: Huacai Chen, WANG Xuerui, Peter Xu, Andrew Morton, Hongchen Zhang, Ming Wang, loongarch, linux-kernel Cc: lixuefeng, gaojuxin, chenhuacai LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot even if the underlying entry points to invalid_pte_table (indicating no mapping). Callers like smaps_hugetlb_range() fetch this invalid entry value (the address of invalid_pte_table) via this pointer. The generic is_swap_pte() check then incorrectly identifies this address as a swap entry on LoongArch, because it satisfies the !pte_present() && !pte_none() conditions. This misinterpretation, combined with a coincidental match by is_migration_entry() on the address bits, leads to kernel crashes in pfn_swap_entry_to_page(). Fix this at the architecture level by modifying huge_pte_offset() to check the PMD entry's content using pmd_none() before returning. If the entry is none (i.e., it points to invalid_pte_table), return NULL instead of the pointer to the slot. Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Ming Wang <wangming01@loongson.cn> --- arch/loongarch/mm/hugetlbpage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c index e4068906143b..cea84d7f2b91 100644 --- a/arch/loongarch/mm/hugetlbpage.c +++ b/arch/loongarch/mm/hugetlbpage.c @@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, pmd = pmd_offset(pud, addr); } } - return (pte_t *) pmd; + return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd; } uint64_t pmd_to_entrylo(unsigned long pmd_val) -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD 2025-04-24 8:30 [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD Ming Wang @ 2025-04-24 18:48 ` Peter Xu 2025-04-25 2:47 ` Ming Wang 2025-04-26 10:13 ` Huacai Chen 1 sibling, 1 reply; 5+ messages in thread From: Peter Xu @ 2025-04-24 18:48 UTC (permalink / raw) To: Ming Wang Cc: Huacai Chen, WANG Xuerui, Andrew Morton, Hongchen Zhang, loongarch, linux-kernel, lixuefeng, gaojuxin, chenhuacai On Thu, Apr 24, 2025 at 04:30:37PM +0800, Ming Wang wrote: > LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot > even if the underlying entry points to invalid_pte_table (indicating no > mapping). Callers like smaps_hugetlb_range() fetch this invalid entry > value (the address of invalid_pte_table) via this pointer. So it's in smaps_hugetlb_range() context, then.. > > The generic is_swap_pte() check then incorrectly identifies this address > as a swap entry on LoongArch, because it satisfies the !pte_present() > && !pte_none() conditions. This misinterpretation, combined with a > coincidental match by is_migration_entry() on the address bits, leads > to kernel crashes in pfn_swap_entry_to_page(). .. you mentioned !pte_none() is checked. Could you explain why it's pte_none() rather than huge_pte_none()? As I saw loongarch has exactly this.. static inline int huge_pte_none(pte_t pte) { unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL; return !val || (val == (unsigned long)invalid_pte_table); } I'm not familiar with loongarch's invalid_pte_table, but I would expect at least for now in hugetlb specific paths it should be reported as none even without this patch. One more thing to mention is the kernel (AFAIU) is trying to moving away from hugetlb specific pgtable parsing to generic pgtable walkers. It means it could happen at some point where the kernel walks the hugetlb pgtables using normal pgtable APIs. I'm not yet sure what would happen then, but maybe at some point the invalid_pte_table check is needed in pte_none() too for loongarch. Thanks, > > Fix this at the architecture level by modifying huge_pte_offset() to > check the PMD entry's content using pmd_none() before returning. If the > entry is none (i.e., it points to invalid_pte_table), return NULL > instead of the pointer to the slot. > > Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn> > Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > Signed-off-by: Ming Wang <wangming01@loongson.cn> > --- > arch/loongarch/mm/hugetlbpage.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c > index e4068906143b..cea84d7f2b91 100644 > --- a/arch/loongarch/mm/hugetlbpage.c > +++ b/arch/loongarch/mm/hugetlbpage.c > @@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, > pmd = pmd_offset(pud, addr); > } > } > - return (pte_t *) pmd; > + return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd; > } > > uint64_t pmd_to_entrylo(unsigned long pmd_val) > -- > 2.43.0 > -- Peter Xu ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD 2025-04-24 18:48 ` Peter Xu @ 2025-04-25 2:47 ` Ming Wang 2025-04-25 15:28 ` Peter Xu 0 siblings, 1 reply; 5+ messages in thread From: Ming Wang @ 2025-04-25 2:47 UTC (permalink / raw) To: Peter Xu Cc: Huacai Chen, WANG Xuerui, Andrew Morton, Hongchen Zhang, loongarch, linux-kernel, lixuefeng, gaojuxin, chenhuacai Hi Peter Xu, Thanks for taking a look and raising these important points! On 4/25/25 02:48, Peter Xu wrote: > On Thu, Apr 24, 2025 at 04:30:37PM +0800, Ming Wang wrote: >> LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot >> even if the underlying entry points to invalid_pte_table (indicating no >> mapping). Callers like smaps_hugetlb_range() fetch this invalid entry >> value (the address of invalid_pte_table) via this pointer. > > So it's in smaps_hugetlb_range() context, then.. The root cause involves several steps: 1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD (or relevant level) entry is often populated by the kernel during mmap() with a non-present entry pointing to the architecture's invalid_pte_table On the affected LoongArch system, this address was observed to be 0x90000000031e4000. 2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads this entry. 3. The generic is_swap_pte() macro checks `!pte_present() && !pte_none()`. The entry (invalid_pte_table address) is not present. Crucially, the generic pte_none() check (`!(pte_val(pte) & ~_PAGE_GLOBAL)`) returns false because the invalid_pte_table address is non-zero. Therefore, is_swap_pte() incorrectly returns true. 4. The code enters the `else if (is_swap_pte(...))` block. 5. Inside this block, it checks `is_pfn_swap_entry()`. Due to a bit pattern coincidence in the invalid_pte_table address on LoongArch, the embedded generic `is_migration_entry()` check happens to return true (misinterpreting parts of the address as a migration type). 6. This leads to a call to pfn_swap_entry_to_page() with the bogus swap entry derived from the invalid table address. 7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an unrelated struct page, checks its lock status (unlocked), and hits the `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion. > >> >> The generic is_swap_pte() check then incorrectly identifies this address >> as a swap entry on LoongArch, because it satisfies the !pte_present() >> && !pte_none() conditions. This misinterpretation, combined with a >> coincidental match by is_migration_entry() on the address bits, leads >> to kernel crashes in pfn_swap_entry_to_page(). > > .. you mentioned !pte_none() is checked. Could you explain why it's > pte_none() rather than huge_pte_none()? As I saw loongarch has exactly > this.. > > static inline int huge_pte_none(pte_t pte) > { > unsigned long val = pte_val(pte) & ~_PAGE_GLOBAL; > return !val || (val == (unsigned long)invalid_pte_table); > } > > I'm not familiar with loongarch's invalid_pte_table, but I would expect at > least for now in hugetlb specific paths it should be reported as none even > without this patch. > > One more thing to mention is the kernel (AFAIU) is trying to moving away > from hugetlb specific pgtable parsing to generic pgtable walkers. It means > it could happen at some point where the kernel walks the hugetlb pgtables > using normal pgtable APIs. I'm not yet sure what would happen then, but > maybe at some point the invalid_pte_table check is needed in pte_none() too > for loongarch. > > Thanks, You asked why the check involves pte_none() rather than huge_pte_none(), given that LoongArch provides the latter which correctly identifies the invalid_pte_table address. That's a great question, and the crux seems to be in how the generic code path works. The crash originates within smaps_hugetlb_range() after the generic `is_swap_pte(ptent)` macro returns true. Looking at the definition of `is_swap_pte()` (in include/linux/mm.h or similar), it typically expands to `!pte_present(pte) && !pte_none(pte)`. Critically, even though `smaps_hugetlb_range()` deals with HugeTLB entries (often PMDs cast to pte_t), the generic `is_swap_pte()` macro itself, when expanded, calls the **generic `pte_none()` macro**, not the specialized `huge_pte_none()`. LoongArch's generic `pte_none()` macro is defined as: `#define pte_none(pte) (!(pte_val(pte) & ~_PAGE_GLOBAL))` This definition does *not* check for the `invalid_pte_table` address and thus returns false for it, leading to `is_swap_pte()` incorrectly returning true. So, while LoongArch does provide `huge_pte_none()` which *could* correctly identify the state, it's not actually invoked in the code path triggered by `is_swap_pte()` within `smaps_hugetlb_range()`. This is why modifying `huge_pte_offset()` seems necessary and appropriate at the architecture level. By returning NULL when the underlying PMD entry is none (checked using the correct `pmd_none()`, which *does* check for invalid_pte_table on LoongArch), we prevent the invalid pointer and its problematic value from reaching `smaps_hugetlb_range()` and subsequently fooling the generic `is_swap_pte()` check that uses the generic `pte_none()`. Regarding your point about generic page table walkers possibly needing `pte_none()` itself to handle `invalid_pte_table` in the future – I understand the concern. That might indeed be a separate, future enhancement needed for LoongArch's generic page table support. However, the current patch addresses the immediate crash within the existing hugetlb-specific walker (`smaps_hugetlb_range`) by stopping the problematic value at the source (`huge_pte_offset`), which seems like a necessary and correct fix for the present issue. Does this explanation clarify the interaction between the generic macros and the arch-specific helpers in this context? Thanks, Ming > >> >> Fix this at the architecture level by modifying huge_pte_offset() to >> check the PMD entry's content using pmd_none() before returning. If the >> entry is none (i.e., it points to invalid_pte_table), return NULL >> instead of the pointer to the slot. >> >> Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn> >> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> >> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> >> Signed-off-by: Ming Wang <wangming01@loongson.cn> >> --- >> arch/loongarch/mm/hugetlbpage.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c >> index e4068906143b..cea84d7f2b91 100644 >> --- a/arch/loongarch/mm/hugetlbpage.c >> +++ b/arch/loongarch/mm/hugetlbpage.c >> @@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, >> pmd = pmd_offset(pud, addr); >> } >> } >> - return (pte_t *) pmd; >> + return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd; >> } >> >> uint64_t pmd_to_entrylo(unsigned long pmd_val) >> -- >> 2.43.0 >> > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD 2025-04-25 2:47 ` Ming Wang @ 2025-04-25 15:28 ` Peter Xu 0 siblings, 0 replies; 5+ messages in thread From: Peter Xu @ 2025-04-25 15:28 UTC (permalink / raw) To: Ming Wang Cc: Huacai Chen, WANG Xuerui, Andrew Morton, Hongchen Zhang, loongarch, linux-kernel, lixuefeng, gaojuxin, chenhuacai On Fri, Apr 25, 2025 at 10:47:24AM +0800, Ming Wang wrote: > Hi Peter Xu, Hi, Ming, [...] > You asked why the check involves pte_none() rather than huge_pte_none(), given that LoongArch > provides the latter which correctly identifies the invalid_pte_table address. > > That's a great question, and the crux seems to be in how the generic code path works. The crash > originates within smaps_hugetlb_range() after the generic `is_swap_pte(ptent)` macro returns true. > Looking at the definition of `is_swap_pte()` (in include/linux/mm.h or similar), it typically > expands to `!pte_present(pte) && !pte_none(pte)`. > > Critically, even though `smaps_hugetlb_range()` deals with HugeTLB entries (often PMDs cast to pte_t), > the generic `is_swap_pte()` macro itself, when expanded, calls the **generic `pte_none()` macro**, not > the specialized `huge_pte_none()`. > > LoongArch's generic `pte_none()` macro is defined as: > `#define pte_none(pte) (!(pte_val(pte) & ~_PAGE_GLOBAL))` > This definition does *not* check for the `invalid_pte_table` address and thus returns false for it, > leading to `is_swap_pte()` incorrectly returning true. > > So, while LoongArch does provide `huge_pte_none()` which *could* correctly identify the state, it's not > actually invoked in the code path triggered by `is_swap_pte()` within `smaps_hugetlb_range()`. > > This is why modifying `huge_pte_offset()` seems necessary and appropriate at the architecture level. > By returning NULL when the underlying PMD entry is none (checked using the correct `pmd_none()`, which *does* > check for invalid_pte_table on LoongArch), we prevent the invalid pointer and its problematic value from reaching > `smaps_hugetlb_range()` and subsequently fooling the generic `is_swap_pte()` check that uses the generic `pte_none()`. > > Regarding your point about generic page table walkers possibly needing `pte_none()` itself to handle `invalid_pte_table` > in the future – I understand the concern. That might indeed be a separate, future enhancement needed for LoongArch's > generic page table support. However, the current patch addresses the immediate crash within the existing hugetlb-specific > walker (`smaps_hugetlb_range`) by stopping the problematic value at the source (`huge_pte_offset`), which seems like a > necessary and correct fix for the present issue. > > Does this explanation clarify the interaction between the generic macros and the arch-specific helpers in this context? I see what's off here - I'm looking at Andrew's latest mm-unstable, which contains your other fix already: commit 2f46598ca15065ff7efac3dba466899608bfc659 Author: Ming Wang <wangming01@loongson.cn> Date: Wed Apr 23 09:03:59 2025 +0800 smaps: fix crash in smaps_hugetlb_range for non-present hugetlb entries So we're reading different code base.. Looks like the generic mm code used is_swap_pte() in multiple occurances on hugetlb ptes already. Besides smap code you mentioned, I at least also see page_vma_mapped_walk -> check_pte also does it. I'm not sure what's the best to fix this, and if it means is_swap_pte() should work on hugetlb pte_t's for all archs. However you're right if that's the case your current patch can fix all of them by fixing huge_pte_offset() in loongarch's impl. So it looks like at least the simplest. Acked-by: Peter Xu <peterx@redhat.com> Maybe you want to ping the other patch to drop that in mm-unstable too, if that's not your intention to merge. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD 2025-04-24 8:30 [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD Ming Wang 2025-04-24 18:48 ` Peter Xu @ 2025-04-26 10:13 ` Huacai Chen 1 sibling, 0 replies; 5+ messages in thread From: Huacai Chen @ 2025-04-26 10:13 UTC (permalink / raw) To: Ming Wang Cc: WANG Xuerui, Peter Xu, Andrew Morton, Hongchen Zhang, loongarch, linux-kernel, lixuefeng, gaojuxin, chenhuacai Applied, thanks. Huacai On Thu, Apr 24, 2025 at 4:30 PM Ming Wang <wangming01@loongson.cn> wrote: > > LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot > even if the underlying entry points to invalid_pte_table (indicating no > mapping). Callers like smaps_hugetlb_range() fetch this invalid entry > value (the address of invalid_pte_table) via this pointer. > > The generic is_swap_pte() check then incorrectly identifies this address > as a swap entry on LoongArch, because it satisfies the !pte_present() > && !pte_none() conditions. This misinterpretation, combined with a > coincidental match by is_migration_entry() on the address bits, leads > to kernel crashes in pfn_swap_entry_to_page(). > > Fix this at the architecture level by modifying huge_pte_offset() to > check the PMD entry's content using pmd_none() before returning. If the > entry is none (i.e., it points to invalid_pte_table), return NULL > instead of the pointer to the slot. > > Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn> > Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > Signed-off-by: Ming Wang <wangming01@loongson.cn> > --- > arch/loongarch/mm/hugetlbpage.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/loongarch/mm/hugetlbpage.c b/arch/loongarch/mm/hugetlbpage.c > index e4068906143b..cea84d7f2b91 100644 > --- a/arch/loongarch/mm/hugetlbpage.c > +++ b/arch/loongarch/mm/hugetlbpage.c > @@ -47,7 +47,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, > pmd = pmd_offset(pud, addr); > } > } > - return (pte_t *) pmd; > + return pmd_none(pmdp_get(pmd)) ? NULL : (pte_t *) pmd; > } > > uint64_t pmd_to_entrylo(unsigned long pmd_val) > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-04-26 10:13 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-24 8:30 [PATCH] mm/hugetlb: LoongArch: Return NULL from huge_pte_offset() for none PMD Ming Wang 2025-04-24 18:48 ` Peter Xu 2025-04-25 2:47 ` Ming Wang 2025-04-25 15:28 ` Peter Xu 2025-04-26 10:13 ` Huacai Chen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox