* [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios @ 2025-05-22 3:22 yangge1116 2025-05-22 3:47 ` Muchun Song ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: yangge1116 @ 2025-05-22 3:22 UTC (permalink / raw) To: akpm Cc: linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, muchun.song, osalvador, liuzixing, Ge Yang From: Ge Yang <yangge1116@126.com> A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Cc: <stable@vger.kernel.org> --- mm/hugetlb.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3d3ca6b..6c2e007 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) while (start_pfn < end_pfn) { folio = pfn_folio(start_pfn); + + /* + * The folio might have been dissolved from under our feet, so make sure + * to carefully check the state under the lock. + */ + spin_lock_irq(&hugetlb_lock); if (folio_test_hugetlb(folio)) { h = folio_hstate(folio); } else { + spin_unlock_irq(&hugetlb_lock); start_pfn++; continue; } + spin_unlock_irq(&hugetlb_lock); if (!folio_ref_count(folio)) { ret = alloc_and_dissolve_hugetlb_folio(h, folio, -- 2.7.4 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116 @ 2025-05-22 3:47 ` Muchun Song 2025-05-22 5:34 ` Oscar Salvador 2025-05-22 11:50 ` Oscar Salvador 2025-05-26 12:41 ` David Hildenbrand 2 siblings, 1 reply; 18+ messages in thread From: Muchun Song @ 2025-05-22 3:47 UTC (permalink / raw) To: yangge1116 Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, osalvador, liuzixing > On May 22, 2025, at 11:22, yangge1116@126.com wrote: > > From: Ge Yang <yangge1116@126.com> > > A kernel crash was observed when replacing free hugetlb folios: > > BUG: kernel NULL pointer dereference, address: 0000000000000028 > PGD 0 P4D 0 > Oops: Oops: 0000 [#1] SMP NOPTI > CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) > RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 > RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 > RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 > RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 > R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 > R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 > FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 > Call Trace: > <TASK> > replace_free_hugepage_folios+0xb6/0x100 > alloc_contig_range_noprof+0x18a/0x590 > ? srso_return_thunk+0x5/0x5f > ? down_read+0x12/0xa0 > ? srso_return_thunk+0x5/0x5f > cma_range_alloc.constprop.0+0x131/0x290 > __cma_alloc+0xcf/0x2c0 > cma_alloc_write+0x43/0xb0 > simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 > debugfs_attr_write+0x46/0x70 > full_proxy_write+0x62/0xa0 > vfs_write+0xf8/0x420 > ? srso_return_thunk+0x5/0x5f > ? filp_flush+0x86/0xa0 > ? srso_return_thunk+0x5/0x5f > ? filp_close+0x1f/0x30 > ? srso_return_thunk+0x5/0x5f > ? do_dup2+0xaf/0x160 > ? srso_return_thunk+0x5/0x5f > ksys_write+0x65/0xe0 > do_syscall_64+0x64/0x170 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > There is a potential race between __update_and_free_hugetlb_folio() > and replace_free_hugepage_folios(): > > CPU1 CPU2 > __update_and_free_hugetlb_folio replace_free_hugepage_folios > folio_test_hugetlb(folio) > -- It's still hugetlb folio. > > __folio_clear_hugetlb(folio) > hugetlb_free_folio(folio) > h = folio_hstate(folio) > -- Here, h is NULL pointer > > When the above race condition occurs, folio_hstate(folio) returns > NULL, and subsequent access to this NULL pointer will cause the > system to crash. To resolve this issue, execute folio_hstate(folio) > under the protection of the hugetlb_lock lock, ensuring that > folio_hstate(folio) does not return NULL. > > Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") > Signed-off-by: Ge Yang <yangge1116@126.com> > Cc: <stable@vger.kernel.org> Thanks for fixing this problem. BTW, in order to catch future similar problem, it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock is not held when folio's reference count is zero. For this fix, LGTM. Reviewed-by: Muchun Song <muchun.song@linux.dev> Thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 3:47 ` Muchun Song @ 2025-05-22 5:34 ` Oscar Salvador 2025-05-22 7:13 ` Muchun Song 2025-05-22 11:34 ` Ge Yang 0 siblings, 2 replies; 18+ messages in thread From: Oscar Salvador @ 2025-05-22 5:34 UTC (permalink / raw) To: Muchun Song Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote: > Thanks for fixing this problem. BTW, in order to catch future similar problem, > it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock > is not held when folio's reference count is zero. For this fix, LGTM. Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(), which will again check things under the lock? I mean, I would be ok to save cycles and check upfront in replace_free_hugepage_folios(), but the latter has only one user which is alloc_contig_range(), which is not really an expected-to-be optimized function. diff --git a/mm/hugetlb.c b/mm/hugetlb.c index bd8971388236..b4d937732256 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) while (start_pfn < end_pfn) { folio = pfn_folio(start_pfn); - if (folio_test_hugetlb(folio)) { - h = folio_hstate(folio); - } else { - start_pfn++; - continue; - } - if (!folio_ref_count(folio)) { ret = alloc_and_dissolve_hugetlb_folio(h, folio, &isolate_list); -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 5:34 ` Oscar Salvador @ 2025-05-22 7:13 ` Muchun Song 2025-05-22 10:13 ` Oscar Salvador 2025-05-22 11:34 ` Ge Yang 1 sibling, 1 reply; 18+ messages in thread From: Muchun Song @ 2025-05-22 7:13 UTC (permalink / raw) To: Oscar Salvador Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing > On May 22, 2025, at 13:34, Oscar Salvador <osalvador@suse.de> wrote: > > On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote: >> Thanks for fixing this problem. BTW, in order to catch future similar problem, >> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock >> is not held when folio's reference count is zero. For this fix, LGTM. > > Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(), > which will again check things under the lock? I've also considered about this choice, because there is another similar case in isolate_or_dissolve_huge_page() which could benefit from this change. I am fine with both approaches. Anyway, adding an assertion into folio_hstate() is an improvement for capturing invalid users in the future. Because any user of folio_hstate() should hold a reference to folio or hold the hugetlb_lock to make sure it returns a valid hstate for a hugetlb folio. Muchun, Thanks. > I mean, I would be ok to save cycles and check upfront in > replace_free_hugepage_folios(), but the latter has only one user which > is alloc_contig_range(), which is not really an expected-to-be optimized > function. > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index bd8971388236..b4d937732256 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) > > while (start_pfn < end_pfn) { > folio = pfn_folio(start_pfn); > - if (folio_test_hugetlb(folio)) { > - h = folio_hstate(folio); > - } else { > - start_pfn++; > - continue; > - } > - > if (!folio_ref_count(folio)) { > ret = alloc_and_dissolve_hugetlb_folio(h, folio, > &isolate_list); > > > > -- > Oscar Salvador > SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 7:13 ` Muchun Song @ 2025-05-22 10:13 ` Oscar Salvador 0 siblings, 0 replies; 18+ messages in thread From: Oscar Salvador @ 2025-05-22 10:13 UTC (permalink / raw) To: Muchun Song Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing On Thu, May 22, 2025 at 03:13:31PM +0800, Muchun Song wrote: > > > > On May 22, 2025, at 13:34, Oscar Salvador <osalvador@suse.de> wrote: > > > > On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote: > >> Thanks for fixing this problem. BTW, in order to catch future similar problem, > >> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock > >> is not held when folio's reference count is zero. For this fix, LGTM. > > > > Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(), > > which will again check things under the lock? > > I've also considered about this choice, because there is another similar > case in isolate_or_dissolve_huge_page() which could benefit from this > change. I am fine with both approaches. Anyway, adding an assertion into > folio_hstate() is an improvement for capturing invalid users in the future. > Because any user of folio_hstate() should hold a reference to folio or > hold the hugetlb_lock to make sure it returns a valid hstate for a hugetlb > folio. Yes, I am not arguing about that, it makes perfect sense to me, but I am just kinda against these micro-optimizations for taking the lock to check things when we are calling in a function that will again the lock to check things. Actually, I think that the folio_test_hugetlb() check in replace_free_hugepage_folios() was put there to try tro be smart and save cycles in case we were not dealing with a hugetlb page (so freed under us). Now that we learnt that we cannot do that without 1) taking a refcount 2) or holding the lock, that becomes superfluos, so I would just wipe that out. -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 5:34 ` Oscar Salvador 2025-05-22 7:13 ` Muchun Song @ 2025-05-22 11:34 ` Ge Yang 2025-05-22 11:49 ` Oscar Salvador 1 sibling, 1 reply; 18+ messages in thread From: Ge Yang @ 2025-05-22 11:34 UTC (permalink / raw) To: Oscar Salvador, Muchun Song Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing 在 2025/5/22 13:34, Oscar Salvador 写道: > On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote: >> Thanks for fixing this problem. BTW, in order to catch future similar problem, >> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock >> is not held when folio's reference count is zero. For this fix, LGTM. > > Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(), > which will again check things under the lock? > I mean, I would be ok to save cycles and check upfront in > replace_free_hugepage_folios(), but the latter has only one user which > is alloc_contig_range(), which is not really an expected-to-be optimized > function. > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index bd8971388236..b4d937732256 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) > > while (start_pfn < end_pfn) { > folio = pfn_folio(start_pfn); > - if (folio_test_hugetlb(folio)) { > - h = folio_hstate(folio); > - } else { > - start_pfn++; > - continue; > - } > - > if (!folio_ref_count(folio)) { > ret = alloc_and_dissolve_hugetlb_folio(h, folio, > &isolate_list); > > > It seems that we cannot simply remove the folio_test_hugetlb() check. The reasons are as follows: 1)If we remove it, we will be unable to obtain the hstat corresponding to the folio, and consequently, we won't be able to call alloc_and_dissolve_hugetlb_folio(). 2)The alloc_and_dissolve_hugetlb_folio() function is also called within the isolate_or_dissolve_huge_folio() function. However, the folio_test_hugetlb() check within the isolate_or_dissolve_huge_folio() function cannot be removed. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 11:34 ` Ge Yang @ 2025-05-22 11:49 ` Oscar Salvador 2025-05-22 12:39 ` Muchun Song 0 siblings, 1 reply; 18+ messages in thread From: Oscar Salvador @ 2025-05-22 11:49 UTC (permalink / raw) To: Ge Yang Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing On Thu, May 22, 2025 at 07:34:56PM +0800, Ge Yang wrote: > It seems that we cannot simply remove the folio_test_hugetlb() check. The > reasons are as follows: Yeah, my thought was whether we could move the folio_hstate within alloc_and_dissolve_hugetlb_folio(), since the latter really needs to take the lock. But isolate_or_dissolve_huge_page() also needs the 'hstate' not only to pass it onto alloc_and_dissolve_hugetlb_folio() but to check whether hstate is gigantic. Umh, kinda hate sparkling the locks all around. -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 11:49 ` Oscar Salvador @ 2025-05-22 12:39 ` Muchun Song 2025-05-22 19:32 ` Oscar Salvador 0 siblings, 1 reply; 18+ messages in thread From: Muchun Song @ 2025-05-22 12:39 UTC (permalink / raw) To: Oscar Salvador Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing > On May 22, 2025, at 19:49, Oscar Salvador <osalvador@suse.de> wrote: > > On Thu, May 22, 2025 at 07:34:56PM +0800, Ge Yang wrote: >> It seems that we cannot simply remove the folio_test_hugetlb() check. The >> reasons are as follows: > > Yeah, my thought was whether we could move the folio_hstate within > alloc_and_dissolve_hugetlb_folio(), since the latter really needs to take the > lock. > But isolate_or_dissolve_huge_page() also needs the 'hstate' not only to > pass it onto alloc_and_dissolve_hugetlb_folio() but to check whether > hstate is gigantic. But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it. > > Umh, kinda hate sparkling the locks all around. > > > -- > Oscar Salvador > SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 12:39 ` Muchun Song @ 2025-05-22 19:32 ` Oscar Salvador 2025-05-23 3:27 ` Muchun Song 0 siblings, 1 reply; 18+ messages in thread From: Oscar Salvador @ 2025-05-22 19:32 UTC (permalink / raw) To: Muchun Song Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote: > But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check > of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate > from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it. Yes, I think we can do that. So something like the following (compily-tested only) maybe? From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001 From: Oscar Salvador <osalvador@suse.de> Date: Thu, 22 May 2025 19:51:04 +0200 Subject: [PATCH] TMP --- mm/hugetlb.c | 38 +++++++++----------------------------- 1 file changed, 9 insertions(+), 29 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index bd8971388236..20f08de9e37d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2787,15 +2787,13 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, /* * alloc_and_dissolve_hugetlb_folio - Allocate a new folio and dissolve * the old one - * @h: struct hstate old page belongs to * @old_folio: Old folio to dissolve * @list: List to isolate the page in case we need to * Returns 0 on success, otherwise negated error. */ -static int alloc_and_dissolve_hugetlb_folio(struct hstate *h, - struct folio *old_folio, struct list_head *list) +static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio, struct list_head *list) { - gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE; + struct hstate *h; int nid = folio_nid(old_folio); struct folio *new_folio = NULL; int ret = 0; @@ -2829,7 +2827,11 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h, cond_resched(); goto retry; } else { + h = folio_hstate(old_folio); + if (!new_folio) { + gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE; + spin_unlock_irq(&hugetlb_lock); new_folio = alloc_buddy_hugetlb_folio(h, gfp_mask, nid, NULL, NULL); @@ -2874,35 +2876,20 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h, int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list) { - struct hstate *h; int ret = -EBUSY; - /* - * The page might have been dissolved from under our feet, so make sure - * to carefully check the state under the lock. - * Return success when racing as if we dissolved the page ourselves. - */ - spin_lock_irq(&hugetlb_lock); - if (folio_test_hugetlb(folio)) { - h = folio_hstate(folio); - } else { - spin_unlock_irq(&hugetlb_lock); - return 0; - } - spin_unlock_irq(&hugetlb_lock); - /* * Fence off gigantic pages as there is a cyclic dependency between * alloc_contig_range and them. Return -ENOMEM as this has the effect * of bailing out right away without further retrying. */ - if (hstate_is_gigantic(h)) + if (folio_order(folio) > MAX_PAGE_ORDER) return -ENOMEM; if (folio_ref_count(folio) && folio_isolate_hugetlb(folio, list)) ret = 0; else if (!folio_ref_count(folio)) - ret = alloc_and_dissolve_hugetlb_folio(h, folio, list); + ret = alloc_and_dissolve_hugetlb_folio(folio, list); return ret; } @@ -2916,7 +2903,6 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list) */ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) { - struct hstate *h; struct folio *folio; int ret = 0; @@ -2924,15 +2910,9 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) while (start_pfn < end_pfn) { folio = pfn_folio(start_pfn); - if (folio_test_hugetlb(folio)) { - h = folio_hstate(folio); - } else { - start_pfn++; - continue; - } if (!folio_ref_count(folio)) { - ret = alloc_and_dissolve_hugetlb_folio(h, folio, + ret = alloc_and_dissolve_hugetlb_folio(folio, &isolate_list); if (ret) break; -- 2.49.0 -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 19:32 ` Oscar Salvador @ 2025-05-23 3:27 ` Muchun Song 2025-05-23 3:46 ` Ge Yang 0 siblings, 1 reply; 18+ messages in thread From: Muchun Song @ 2025-05-23 3:27 UTC (permalink / raw) To: Oscar Salvador Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing > On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote: > > On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote: >> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check >> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate >> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it. > > Yes, I think we can do that. > So something like the following (compily-tested only) maybe? > > From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001 > From: Oscar Salvador <osalvador@suse.de> > Date: Thu, 22 May 2025 19:51:04 +0200 > Subject: [PATCH] TMP > > --- > mm/hugetlb.c | 38 +++++++++----------------------------- > 1 file changed, 9 insertions(+), 29 deletions(-) Pretty simple. The code LGTM. Thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-23 3:27 ` Muchun Song @ 2025-05-23 3:46 ` Ge Yang 2025-05-23 3:56 ` Muchun Song 2025-05-23 5:30 ` Oscar Salvador 0 siblings, 2 replies; 18+ messages in thread From: Ge Yang @ 2025-05-23 3:46 UTC (permalink / raw) To: Muchun Song, Oscar Salvador Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing 在 2025/5/23 11:27, Muchun Song 写道: > > >> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote: >> >> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote: >>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check >>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate >>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it. >> >> Yes, I think we can do that. >> So something like the following (compily-tested only) maybe? >> >> From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001 >> From: Oscar Salvador <osalvador@suse.de> >> Date: Thu, 22 May 2025 19:51:04 +0200 >> Subject: [PATCH] TMP >> >> --- >> mm/hugetlb.c | 38 +++++++++----------------------------- >> 1 file changed, 9 insertions(+), 29 deletions(-) > > Pretty simple. The code LGTM. > > Thanks. Thanks. The implementation of alloc_and_dissolve_hugetlb_folio differs between kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to submit another patch based on Oscar Salvador's suggestion. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-23 3:46 ` Ge Yang @ 2025-05-23 3:56 ` Muchun Song 2025-05-23 5:30 ` Oscar Salvador 1 sibling, 0 replies; 18+ messages in thread From: Muchun Song @ 2025-05-23 3:56 UTC (permalink / raw) To: Ge Yang Cc: Oscar Salvador, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing > On May 23, 2025, at 11:46, Ge Yang <yangge1116@126.com> wrote: > > > > 在 2025/5/23 11:27, Muchun Song 写道: >>> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote: >>> >>> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote: >>>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check >>>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate >>>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it. >>> >>> Yes, I think we can do that. >>> So something like the following (compily-tested only) maybe? >>> >>> From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001 >>> From: Oscar Salvador <osalvador@suse.de> >>> Date: Thu, 22 May 2025 19:51:04 +0200 >>> Subject: [PATCH] TMP >>> >>> --- >>> mm/hugetlb.c | 38 +++++++++----------------------------- >>> 1 file changed, 9 insertions(+), 29 deletions(-) >> Pretty simple. The code LGTM. >> Thanks. > > Thanks. > > The implementation of alloc_and_dissolve_hugetlb_folio differs between kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to submit another patch based on Oscar Salvador's suggestion. A separate improving patch LGTM. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-23 3:46 ` Ge Yang 2025-05-23 3:56 ` Muchun Song @ 2025-05-23 5:30 ` Oscar Salvador 2025-05-23 8:07 ` Ge Yang 1 sibling, 1 reply; 18+ messages in thread From: Oscar Salvador @ 2025-05-23 5:30 UTC (permalink / raw) To: Ge Yang Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing On Fri, May 23, 2025 at 11:46:51AM +0800, Ge Yang wrote: > The implementation of alloc_and_dissolve_hugetlb_folio differs between > kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to > submit another patch based on Oscar Salvador's suggestion. If the code differs a lot between a stable version and upstream, the way to go is to submit a specific patch for the stable one that applies to that tree, e.g: [PATCH 6.6]... -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-23 5:30 ` Oscar Salvador @ 2025-05-23 8:07 ` Ge Yang 0 siblings, 0 replies; 18+ messages in thread From: Ge Yang @ 2025-05-23 8:07 UTC (permalink / raw) To: Oscar Salvador Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, liuzixing 在 2025/5/23 13:30, Oscar Salvador 写道: > On Fri, May 23, 2025 at 11:46:51AM +0800, Ge Yang wrote: >> The implementation of alloc_and_dissolve_hugetlb_folio differs between >> kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to >> submit another patch based on Oscar Salvador's suggestion. > > If the code differs a lot between a stable version and upstream, the way > to go is to submit a specific patch for the stable one that applies > to that tree, e.g: [PATCH 6.6]... > > Ok, thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116 2025-05-22 3:47 ` Muchun Song @ 2025-05-22 11:50 ` Oscar Salvador 2025-05-26 12:41 ` David Hildenbrand 2 siblings, 0 replies; 18+ messages in thread From: Oscar Salvador @ 2025-05-22 11:50 UTC (permalink / raw) To: yangge1116 Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang, muchun.song, liuzixing On Thu, May 22, 2025 at 11:22:17AM +0800, yangge1116@126.com wrote: > From: Ge Yang <yangge1116@126.com> > > A kernel crash was observed when replacing free hugetlb folios: > > BUG: kernel NULL pointer dereference, address: 0000000000000028 > PGD 0 P4D 0 > Oops: Oops: 0000 [#1] SMP NOPTI > CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) > RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 > RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 > RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 > RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 > R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 > R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 > FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 > Call Trace: > <TASK> > replace_free_hugepage_folios+0xb6/0x100 > alloc_contig_range_noprof+0x18a/0x590 > ? srso_return_thunk+0x5/0x5f > ? down_read+0x12/0xa0 > ? srso_return_thunk+0x5/0x5f > cma_range_alloc.constprop.0+0x131/0x290 > __cma_alloc+0xcf/0x2c0 > cma_alloc_write+0x43/0xb0 > simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 > debugfs_attr_write+0x46/0x70 > full_proxy_write+0x62/0xa0 > vfs_write+0xf8/0x420 > ? srso_return_thunk+0x5/0x5f > ? filp_flush+0x86/0xa0 > ? srso_return_thunk+0x5/0x5f > ? filp_close+0x1f/0x30 > ? srso_return_thunk+0x5/0x5f > ? do_dup2+0xaf/0x160 > ? srso_return_thunk+0x5/0x5f > ksys_write+0x65/0xe0 > do_syscall_64+0x64/0x170 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > There is a potential race between __update_and_free_hugetlb_folio() > and replace_free_hugepage_folios(): > > CPU1 CPU2 > __update_and_free_hugetlb_folio replace_free_hugepage_folios > folio_test_hugetlb(folio) > -- It's still hugetlb folio. > > __folio_clear_hugetlb(folio) > hugetlb_free_folio(folio) > h = folio_hstate(folio) > -- Here, h is NULL pointer > > When the above race condition occurs, folio_hstate(folio) returns > NULL, and subsequent access to this NULL pointer will cause the > system to crash. To resolve this issue, execute folio_hstate(folio) > under the protection of the hugetlb_lock lock, ensuring that > folio_hstate(folio) does not return NULL. > > Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") > Signed-off-by: Ge Yang <yangge1116@126.com> > Cc: <stable@vger.kernel.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116 2025-05-22 3:47 ` Muchun Song 2025-05-22 11:50 ` Oscar Salvador @ 2025-05-26 12:41 ` David Hildenbrand 2025-05-26 12:57 ` Ge Yang 2 siblings, 1 reply; 18+ messages in thread From: David Hildenbrand @ 2025-05-26 12:41 UTC (permalink / raw) To: yangge1116, akpm Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song, osalvador, liuzixing On 22.05.25 05:22, yangge1116@126.com wrote: > From: Ge Yang <yangge1116@126.com> > > A kernel crash was observed when replacing free hugetlb folios: > > BUG: kernel NULL pointer dereference, address: 0000000000000028 > PGD 0 P4D 0 > Oops: Oops: 0000 [#1] SMP NOPTI > CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) > RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 > RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 > RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 > RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 > R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 > R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 > FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 > Call Trace: > <TASK> > replace_free_hugepage_folios+0xb6/0x100 > alloc_contig_range_noprof+0x18a/0x590 > ? srso_return_thunk+0x5/0x5f > ? down_read+0x12/0xa0 > ? srso_return_thunk+0x5/0x5f > cma_range_alloc.constprop.0+0x131/0x290 > __cma_alloc+0xcf/0x2c0 > cma_alloc_write+0x43/0xb0 > simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 > debugfs_attr_write+0x46/0x70 > full_proxy_write+0x62/0xa0 > vfs_write+0xf8/0x420 > ? srso_return_thunk+0x5/0x5f > ? filp_flush+0x86/0xa0 > ? srso_return_thunk+0x5/0x5f > ? filp_close+0x1f/0x30 > ? srso_return_thunk+0x5/0x5f > ? do_dup2+0xaf/0x160 > ? srso_return_thunk+0x5/0x5f > ksys_write+0x65/0xe0 > do_syscall_64+0x64/0x170 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > There is a potential race between __update_and_free_hugetlb_folio() > and replace_free_hugepage_folios(): > > CPU1 CPU2 > __update_and_free_hugetlb_folio replace_free_hugepage_folios > folio_test_hugetlb(folio) > -- It's still hugetlb folio. > > __folio_clear_hugetlb(folio) > hugetlb_free_folio(folio) > h = folio_hstate(folio) > -- Here, h is NULL pointer > > When the above race condition occurs, folio_hstate(folio) returns > NULL, and subsequent access to this NULL pointer will cause the > system to crash. To resolve this issue, execute folio_hstate(folio) > under the protection of the hugetlb_lock lock, ensuring that > folio_hstate(folio) does not return NULL. > > Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") > Signed-off-by: Ge Yang <yangge1116@126.com> > Cc: <stable@vger.kernel.org> > --- > mm/hugetlb.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 3d3ca6b..6c2e007 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) > > while (start_pfn < end_pfn) { > folio = pfn_folio(start_pfn); > + > + /* > + * The folio might have been dissolved from under our feet, so make sure > + * to carefully check the state under the lock. > + */ > + spin_lock_irq(&hugetlb_lock); > if (folio_test_hugetlb(folio)) { > h = folio_hstate(folio); > } else { > + spin_unlock_irq(&hugetlb_lock); > start_pfn++; > continue; > } > + spin_unlock_irq(&hugetlb_lock); As mentioned elsewhere, this will grab the hugetlb_lock for each and every pfn in the range if there are no hugetlb folios (common case). That should certainly *not* be done. In case we see !folio_test_hugetlb(), we should just move on. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-26 12:41 ` David Hildenbrand @ 2025-05-26 12:57 ` Ge Yang 2025-05-26 12:59 ` David Hildenbrand 0 siblings, 1 reply; 18+ messages in thread From: Ge Yang @ 2025-05-26 12:57 UTC (permalink / raw) To: David Hildenbrand, akpm Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song, osalvador, liuzixing 在 2025/5/26 20:41, David Hildenbrand 写道: > On 22.05.25 05:22, yangge1116@126.com wrote: >> From: Ge Yang <yangge1116@126.com> >> >> A kernel crash was observed when replacing free hugetlb folios: >> >> BUG: kernel NULL pointer dereference, address: 0000000000000028 >> PGD 0 P4D 0 >> Oops: Oops: 0000 [#1] SMP NOPTI >> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 >> PREEMPT(voluntary) >> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 >> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 >> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 >> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 >> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 >> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 >> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 >> FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) >> knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 >> Call Trace: >> <TASK> >> replace_free_hugepage_folios+0xb6/0x100 >> alloc_contig_range_noprof+0x18a/0x590 >> ? srso_return_thunk+0x5/0x5f >> ? down_read+0x12/0xa0 >> ? srso_return_thunk+0x5/0x5f >> cma_range_alloc.constprop.0+0x131/0x290 >> __cma_alloc+0xcf/0x2c0 >> cma_alloc_write+0x43/0xb0 >> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 >> debugfs_attr_write+0x46/0x70 >> full_proxy_write+0x62/0xa0 >> vfs_write+0xf8/0x420 >> ? srso_return_thunk+0x5/0x5f >> ? filp_flush+0x86/0xa0 >> ? srso_return_thunk+0x5/0x5f >> ? filp_close+0x1f/0x30 >> ? srso_return_thunk+0x5/0x5f >> ? do_dup2+0xaf/0x160 >> ? srso_return_thunk+0x5/0x5f >> ksys_write+0x65/0xe0 >> do_syscall_64+0x64/0x170 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> There is a potential race between __update_and_free_hugetlb_folio() >> and replace_free_hugepage_folios(): >> >> CPU1 CPU2 >> __update_and_free_hugetlb_folio replace_free_hugepage_folios >> folio_test_hugetlb(folio) >> -- It's still hugetlb folio. >> >> __folio_clear_hugetlb(folio) >> hugetlb_free_folio(folio) >> h = folio_hstate(folio) >> -- Here, h is NULL pointer >> >> When the above race condition occurs, folio_hstate(folio) returns >> NULL, and subsequent access to this NULL pointer will cause the >> system to crash. To resolve this issue, execute folio_hstate(folio) >> under the protection of the hugetlb_lock lock, ensuring that >> folio_hstate(folio) does not return NULL. >> >> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") >> Signed-off-by: Ge Yang <yangge1116@126.com> >> Cc: <stable@vger.kernel.org> >> --- >> mm/hugetlb.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 3d3ca6b..6c2e007 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long >> start_pfn, unsigned long end_pfn) >> while (start_pfn < end_pfn) { >> folio = pfn_folio(start_pfn); >> + >> + /* >> + * The folio might have been dissolved from under our feet, >> so make sure >> + * to carefully check the state under the lock. >> + */ >> + spin_lock_irq(&hugetlb_lock); >> if (folio_test_hugetlb(folio)) { >> h = folio_hstate(folio); >> } else { >> + spin_unlock_irq(&hugetlb_lock); >> start_pfn++; >> continue; >> } >> + spin_unlock_irq(&hugetlb_lock); > > As mentioned elsewhere, this will grab the hugetlb_lock for each and > every pfn in the range if there are no hugetlb folios (common case). > > That should certainly *not* be done. > > In case we see !folio_test_hugetlb(), we should just move on. > The main reason for acquiring the hugetlb_lock here is to obtain a valid hstate, as the alloc_and_dissolve_hugetlb_folio() function requires hstate as a parameter. This approach is indeed not performance-friendly. However, in the patch available at https://lore.kernel.org/lkml/1747987559-23082-1-git-send-email-yangge1116@126.com/, all these operations will be removed. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios 2025-05-26 12:57 ` Ge Yang @ 2025-05-26 12:59 ` David Hildenbrand 0 siblings, 0 replies; 18+ messages in thread From: David Hildenbrand @ 2025-05-26 12:59 UTC (permalink / raw) To: Ge Yang, akpm Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song, osalvador, liuzixing On 26.05.25 14:57, Ge Yang wrote: > > > 在 2025/5/26 20:41, David Hildenbrand 写道: >> On 22.05.25 05:22, yangge1116@126.com wrote: >>> From: Ge Yang <yangge1116@126.com> >>> >>> A kernel crash was observed when replacing free hugetlb folios: >>> >>> BUG: kernel NULL pointer dereference, address: 0000000000000028 >>> PGD 0 P4D 0 >>> Oops: Oops: 0000 [#1] SMP NOPTI >>> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 >>> PREEMPT(voluntary) >>> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 >>> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 >>> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 >>> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 >>> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 >>> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 >>> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 >>> FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) >>> knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 >>> Call Trace: >>> <TASK> >>> replace_free_hugepage_folios+0xb6/0x100 >>> alloc_contig_range_noprof+0x18a/0x590 >>> ? srso_return_thunk+0x5/0x5f >>> ? down_read+0x12/0xa0 >>> ? srso_return_thunk+0x5/0x5f >>> cma_range_alloc.constprop.0+0x131/0x290 >>> __cma_alloc+0xcf/0x2c0 >>> cma_alloc_write+0x43/0xb0 >>> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 >>> debugfs_attr_write+0x46/0x70 >>> full_proxy_write+0x62/0xa0 >>> vfs_write+0xf8/0x420 >>> ? srso_return_thunk+0x5/0x5f >>> ? filp_flush+0x86/0xa0 >>> ? srso_return_thunk+0x5/0x5f >>> ? filp_close+0x1f/0x30 >>> ? srso_return_thunk+0x5/0x5f >>> ? do_dup2+0xaf/0x160 >>> ? srso_return_thunk+0x5/0x5f >>> ksys_write+0x65/0xe0 >>> do_syscall_64+0x64/0x170 >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e >>> >>> There is a potential race between __update_and_free_hugetlb_folio() >>> and replace_free_hugepage_folios(): >>> >>> CPU1 CPU2 >>> __update_and_free_hugetlb_folio replace_free_hugepage_folios >>> folio_test_hugetlb(folio) >>> -- It's still hugetlb folio. >>> >>> __folio_clear_hugetlb(folio) >>> hugetlb_free_folio(folio) >>> h = folio_hstate(folio) >>> -- Here, h is NULL pointer >>> >>> When the above race condition occurs, folio_hstate(folio) returns >>> NULL, and subsequent access to this NULL pointer will cause the >>> system to crash. To resolve this issue, execute folio_hstate(folio) >>> under the protection of the hugetlb_lock lock, ensuring that >>> folio_hstate(folio) does not return NULL. >>> >>> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") >>> Signed-off-by: Ge Yang <yangge1116@126.com> >>> Cc: <stable@vger.kernel.org> >>> --- >>> mm/hugetlb.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>> index 3d3ca6b..6c2e007 100644 >>> --- a/mm/hugetlb.c >>> +++ b/mm/hugetlb.c >>> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long >>> start_pfn, unsigned long end_pfn) >>> while (start_pfn < end_pfn) { >>> folio = pfn_folio(start_pfn); >>> + >>> + /* >>> + * The folio might have been dissolved from under our feet, >>> so make sure >>> + * to carefully check the state under the lock. >>> + */ >>> + spin_lock_irq(&hugetlb_lock); >>> if (folio_test_hugetlb(folio)) { >>> h = folio_hstate(folio); >>> } else { >>> + spin_unlock_irq(&hugetlb_lock); >>> start_pfn++; >>> continue; >>> } >>> + spin_unlock_irq(&hugetlb_lock); >> >> As mentioned elsewhere, this will grab the hugetlb_lock for each and >> every pfn in the range if there are no hugetlb folios (common case). >> >> That should certainly *not* be done. >> >> In case we see !folio_test_hugetlb(), we should just move on. >> > > The main reason for acquiring the hugetlb_lock here is to obtain a valid > hstate, as the alloc_and_dissolve_hugetlb_folio() function requires > hstate as a parameter. This approach is indeed not performance-friendly. > However, in the patch available at > https://lore.kernel.org/lkml/1747987559-23082-1-git-send-email-yangge1116@126.com/, > all these operations will be removed. No, you still take locks on anything that is !folio_ref_count(folio). We should really have an early folio_test_hugetlb(folio) check. hugetlb is on most systems out there the absolute *corner case*. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-05-26 13:00 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116 2025-05-22 3:47 ` Muchun Song 2025-05-22 5:34 ` Oscar Salvador 2025-05-22 7:13 ` Muchun Song 2025-05-22 10:13 ` Oscar Salvador 2025-05-22 11:34 ` Ge Yang 2025-05-22 11:49 ` Oscar Salvador 2025-05-22 12:39 ` Muchun Song 2025-05-22 19:32 ` Oscar Salvador 2025-05-23 3:27 ` Muchun Song 2025-05-23 3:46 ` Ge Yang 2025-05-23 3:56 ` Muchun Song 2025-05-23 5:30 ` Oscar Salvador 2025-05-23 8:07 ` Ge Yang 2025-05-22 11:50 ` Oscar Salvador 2025-05-26 12:41 ` David Hildenbrand 2025-05-26 12:57 ` Ge Yang 2025-05-26 12:59 ` David Hildenbrand
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).