[PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()

From: Jeongjun Park <aha310510@gmail.com>
Date: 2025-08-22 5:58 UTC
To: muchun.song, osalvador, david, akpm
Cc: leitao, linux-mm, linux-kernel, syzbot+417aeb05fd190f3a6da9, Jeongjun Park

When restoring a reservation for an anonymous page, we need to check
whether we are freeing a surplus page. However, __unmap_hugepage_range()
reads h->surplus_huge_pages without holding hugetlb_lock, which is a
data race.

Therefore, add the missing hugetlb_lock around the check.

Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
 mm/hugetlb.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b4c718..e8d95a314df2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 * If there we are freeing a surplus, do not set the restore
 		 * reservation bit.
 		 */
+		spin_lock_irq(&hugetlb_lock);
+
 		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
 		    folio_test_anon(folio)) {
 			folio_set_hugetlb_restore_reserve(folio);
@@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			adjust_reservation = true;
 		}
 
+		spin_unlock_irq(&hugetlb_lock);
 		spin_unlock(ptl);
 
 		/*
--
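For context, the race this patch targets looks roughly like the sketch
below. It is only an illustration: the helper names are made up, and the
second side stands in for any path that adjusts the surplus count (those
paths take hugetlb_lock), while the first side is the unlocked read in
__unmap_hugepage_range() before this patch.

```
/*
 * Minimal sketch of the race (illustrative; not the real call sites,
 * and both function names are invented for the example).
 *
 * CPU 0 - unmap path before this patch: plain read, no hugetlb_lock.
 */
static void unmap_side_sketch(struct hstate *h, struct vm_area_struct *vma,
			      struct folio *folio)
{
	if (!h->surplus_huge_pages &&		/* racy, unlocked read */
	    __vma_private_lock(vma) && folio_test_anon(folio))
		folio_set_hugetlb_restore_reserve(folio);
}

/*
 * CPU 1 - e.g. a surplus folio being accounted elsewhere: the counter
 * is written with hugetlb_lock held.
 */
static void surplus_update_side_sketch(struct hstate *h, int nid)
{
	spin_lock_irq(&hugetlb_lock);
	h->surplus_huge_pages++;
	h->surplus_huge_pages_node[nid]++;
	spin_unlock_irq(&hugetlb_lock);
}
```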
Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()

From: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date: 2025-08-22 15:19 UTC
To: Jeongjun Park, muchun.song, osalvador, david, akpm
Cc: leitao, linux-mm, linux-kernel, syzbot+417aeb05fd190f3a6da9

On 8/22/25 1:58 AM, Jeongjun Park wrote:
> When restoring a reservation for an anonymous page, we need to check
> whether we are freeing a surplus page. However, __unmap_hugepage_range()
> reads h->surplus_huge_pages without holding hugetlb_lock, which is a
> data race.
>
> Therefore, add the missing hugetlb_lock around the check.
>

Makes sense, as alloc_surplus_hugetlb_folio() takes the hugetlb_lock when
reading the hstate.

Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

> Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
> Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> ---
>  mm/hugetlb.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b4c718..e8d95a314df2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		 * If there we are freeing a surplus, do not set the restore
>  		 * reservation bit.
>  		 */
> +		spin_lock_irq(&hugetlb_lock);
> +
>  		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
>  		    folio_test_anon(folio)) {
>  			folio_set_hugetlb_restore_reserve(folio);
> @@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  			adjust_reservation = true;
>  		}
>
> +		spin_unlock_irq(&hugetlb_lock);
>  		spin_unlock(ptl);
>
>  		/*
> --
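The pattern being referred to is, roughly, the one below. This is a
paraphrased sketch rather than the exact upstream code, and the function
name is invented for illustration; the point is that the surplus and
overcommit counters are consulted and updated only with hugetlb_lock held.

```
/*
 * Paraphrased sketch of the surplus accounting pattern in the
 * allocation path (hypothetical helper, not the real function).
 */
static bool surplus_try_account_sketch(struct hstate *h, int nid)
{
	bool ok = false;

	spin_lock_irq(&hugetlb_lock);
	if (h->surplus_huge_pages < h->nr_overcommit_huge_pages) {
		/* Counters only change while hugetlb_lock is held. */
		h->surplus_huge_pages++;
		h->surplus_huge_pages_node[nid]++;
		ok = true;
	}
	spin_unlock_irq(&hugetlb_lock);

	return ok;
}
```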
Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()

From: Andrew Morton <akpm@linux-foundation.org>
Date: 2025-08-23 1:07 UTC
To: Jeongjun Park
Cc: muchun.song, osalvador, david, leitao, linux-mm, linux-kernel, syzbot+417aeb05fd190f3a6da9

On Fri, 22 Aug 2025 14:58:57 +0900 Jeongjun Park <aha310510@gmail.com> wrote:

> When restoring a reservation for an anonymous page, we need to check
> whether we are freeing a surplus page. However, __unmap_hugepage_range()
> reads h->surplus_huge_pages without holding hugetlb_lock, which is a
> data race.
>
> Therefore, add the missing hugetlb_lock around the check.
>
> ...
>
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		 * If there we are freeing a surplus, do not set the restore
>  		 * reservation bit.
>  		 */
> +		spin_lock_irq(&hugetlb_lock);
> +
>  		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
>  		    folio_test_anon(folio)) {
>  			folio_set_hugetlb_restore_reserve(folio);
> @@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  			adjust_reservation = true;
>  		}
>
> +		spin_unlock_irq(&hugetlb_lock);
>  		spin_unlock(ptl);
>

Does hugetlb_lock nest inside page_table_lock?

It's a bit sad to be taking a global lock just to defend against some
alleged data race which probably never happens.  Doing it once per
hugepage probably won't matter but still, is there something more
proportionate that we can do here?
Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()

From: Giorgi Tchankvetadze <giorgitchankvetadze1997@gmail.com>
Date: 2025-08-23 8:52 UTC
To: akpm
Cc: aha310510, david, leitao, linux-kernel, linux-mm, muchun.song, osalvador, syzbot+417aeb05fd190f3a6da9

+	/*
+	 * Check surplus_huge_pages without taking hugetlb_lock.
+	 * A race here is okay:
+	 * - If surplus goes 0 -> nonzero, we skip restore.
+	 * - If surplus goes nonzero -> 0, we also skip.
+	 * In both cases we just miss a restore, which is safe.
+	 */
+	{
+		unsigned long surplus = READ_ONCE(h->surplus_huge_pages);
+
+		if (!surplus &&
+		    __vma_private_lock(vma) &&
+		    folio_test_anon(folio) &&
+		    READ_ONCE(h->surplus_huge_pages) == surplus) {
+			folio_set_hugetlb_restore_reserve(folio);
+			adjust_reservation = true;
+		}
+	}

 	spin_unlock(ptl);

On 8/23/2025 5:07 AM, Andrew Morton wrote:
> On Fri, 22 Aug 2025 14:58:57 +0900 Jeongjun Park <aha310510@gmail.com> wrote:
>
>> When restoring a reservation for an anonymous page, we need to check
>> whether we are freeing a surplus page. However, __unmap_hugepage_range()
>> reads h->surplus_huge_pages without holding hugetlb_lock, which is a
>> data race.
>>
>> Therefore, add the missing hugetlb_lock around the check.
>>
>> ...
>>
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>  		 * If there we are freeing a surplus, do not set the restore
>>  		 * reservation bit.
>>  		 */
>> +		spin_lock_irq(&hugetlb_lock);
>> +
>>  		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
>>  		    folio_test_anon(folio)) {
>>  			folio_set_hugetlb_restore_reserve(folio);
>> @@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>  			adjust_reservation = true;
>>  		}
>>
>> +		spin_unlock_irq(&hugetlb_lock);
>>  		spin_unlock(ptl);
>
> Does hugetlb_lock nest inside page_table_lock?
>
> It's a bit sad to be taking a global lock just to defend against some
> alleged data race which probably never happens.  Doing it once per
> hugepage probably won't matter but still, is there something more
> proportionate that we can do here?
Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()

From: Jeongjun Park <aha310510@gmail.com>
Date: 2025-08-23 14:40 UTC
To: Giorgi Tchankvetadze
Cc: akpm, david, leitao, linux-kernel, linux-mm, muchun.song, osalvador, syzbot+417aeb05fd190f3a6da9

Hello Giorgi,

Giorgi Tchankvetadze <giorgitchankvetadze1997@gmail.com> wrote:
>
> +	/*
> +	 * Check surplus_huge_pages without taking hugetlb_lock.
> +	 * A race here is okay:
> +	 * - If surplus goes 0 -> nonzero, we skip restore.
> +	 * - If surplus goes nonzero -> 0, we also skip.
> +	 * In both cases we just miss a restore, which is safe.
> +	 */
> +	{
> +		unsigned long surplus = READ_ONCE(h->surplus_huge_pages);
> +
> +		if (!surplus &&
> +		    __vma_private_lock(vma) &&
> +		    folio_test_anon(folio) &&
> +		    READ_ONCE(h->surplus_huge_pages) == surplus) {
> +			folio_set_hugetlb_restore_reserve(folio);
> +			adjust_reservation = true;
> +		}
> +	}
>
>  	spin_unlock(ptl);
>

Why do you think skipping the restore is safe? As the comment says, if the
reservation for an anonymous page is not restored in time, its backing page
can be stolen. If the original owner later faults on the stolen page, the
fault fails and the task gets a SIGBUS.

Of course, this is a rare race, but in workloads that use hugetlb heavily,
surplus_huge_pages goes up and down frequently, and the backing pages that
miss their restore because of this race keep accumulating, so this is not a
race we can ignore.

> On 8/23/2025 5:07 AM, Andrew Morton wrote:
> > On Fri, 22 Aug 2025 14:58:57 +0900 Jeongjun Park <aha310510@gmail.com> wrote:
> >
> >> When restoring a reservation for an anonymous page, we need to check
> >> whether we are freeing a surplus page. However, __unmap_hugepage_range()
> >> reads h->surplus_huge_pages without holding hugetlb_lock, which is a
> >> data race.
> >>
> >> Therefore, add the missing hugetlb_lock around the check.
> >>
> >> ...
> >>
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>  		 * If there we are freeing a surplus, do not set the restore
> >>  		 * reservation bit.
> >>  		 */
> >> +		spin_lock_irq(&hugetlb_lock);
> >> +
> >>  		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
> >>  		    folio_test_anon(folio)) {
> >>  			folio_set_hugetlb_restore_reserve(folio);
> >> @@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>  			adjust_reservation = true;
> >>  		}
> >>
> >> +		spin_unlock_irq(&hugetlb_lock);
> >>  		spin_unlock(ptl);
> >
> > Does hugetlb_lock nest inside page_table_lock?
> >
> > It's a bit sad to be taking a global lock just to defend against some
> > alleged data race which probably never happens.  Doing it once per
> > hugepage probably won't matter but still, is there something more
> > proportionate that we can do here?
>

Regards,
Jeongjun Park
Re: [PATCH] mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()

From: Jeongjun Park <aha310510@gmail.com>
Date: 2025-08-23 15:07 UTC
To: Andrew Morton
Cc: muchun.song, osalvador, david, leitao, linux-mm, linux-kernel, syzbot+417aeb05fd190f3a6da9

Hello Andrew,

Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 22 Aug 2025 14:58:57 +0900 Jeongjun Park <aha310510@gmail.com> wrote:
>
> > When restoring a reservation for an anonymous page, we need to check
> > whether we are freeing a surplus page. However, __unmap_hugepage_range()
> > reads h->surplus_huge_pages without holding hugetlb_lock, which is a
> > data race.
> >
> > Therefore, add the missing hugetlb_lock around the check.
> >
> > ...
> >
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5951,6 +5951,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  		 * If there we are freeing a surplus, do not set the restore
> >  		 * reservation bit.
> >  		 */
> > +		spin_lock_irq(&hugetlb_lock);
> > +
> >  		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
> >  		    folio_test_anon(folio)) {
> >  			folio_set_hugetlb_restore_reserve(folio);
> > @@ -5958,6 +5960,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  			adjust_reservation = true;
> >  		}
> >
> > +		spin_unlock_irq(&hugetlb_lock);
> >  		spin_unlock(ptl);
> >
>
> Does hugetlb_lock nest inside page_table_lock?
>
> It's a bit sad to be taking a global lock just to defend against some
> alleged data race which probably never happens.  Doing it once per
> hugepage probably won't matter but still, is there something more
> proportionate that we can do here?

I think it would be better to move the spin_unlock(ptl) call up, so that
the page table lock is dropped right after hugetlb_remove_rmap():

```
		pte = huge_ptep_get_and_clear(mm, address, ptep, sz);
		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
		if (huge_pte_dirty(pte))
			folio_mark_dirty(folio);

		/* Leave a uffd-wp pte marker if needed */
		if (huge_pte_uffd_wp(pte) &&
		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
			set_huge_pte_at(mm, address, ptep,
					make_pte_marker(PTE_MARKER_UFFD_WP), sz);
		hugetlb_count_sub(pages_per_huge_page(h), mm);
		hugetlb_remove_rmap(folio);
```

In __unmap_hugepage_range(), once all of the code above has run, the PTE,
the TLB entry, and the rmap have all been cleaned up. So there is no need
to keep holding the page table lock while we call
folio_set_hugetlb_restore_reserve(), which only sets a bit in
folio->private.

Regards,
Jeongjun Park
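Concretely, the reordering suggested above would look something like the
following sketch. It is an illustration of the idea only, not a tested
patch, and it assumes the rest of __unmap_hugepage_range() stays as it is:
the PTL is dropped once the rmap work is done, and hugetlb_lock is then
taken only around the surplus check, so the two locks never nest.

```
		/* ...pte cleared, TLB entry queued, rmap removed above... */
		hugetlb_count_sub(pages_per_huge_page(h), mm);
		hugetlb_remove_rmap(folio);
		spin_unlock(ptl);		/* PTL no longer needed here */

		/*
		 * Restore the reservation for an anonymous page, otherwise
		 * the backing page could be stolen by someone.
		 * If we are freeing a surplus, do not set the restore
		 * reservation bit.
		 */
		spin_lock_irq(&hugetlb_lock);
		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
		    folio_test_anon(folio)) {
			folio_set_hugetlb_restore_reserve(folio);
			/* Reservation is adjusted after dropping the lock. */
			adjust_reservation = true;
		}
		spin_unlock_irq(&hugetlb_lock);
```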