[PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
@ 2025-05-22  3:22 yangge1116
  2025-05-22  3:47 ` Muchun Song
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: yangge1116 @ 2025-05-22  3:22 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	muchun.song, osalvador, liuzixing, Ge Yang

From: Ge Yang <yangge1116@126.com>

A kernel crash was observed when replacing free hugetlb folios:

BUG: kernel NULL pointer dereference, address: 0000000000000028
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
Call Trace:
<TASK>
 replace_free_hugepage_folios+0xb6/0x100
 alloc_contig_range_noprof+0x18a/0x590
 ? srso_return_thunk+0x5/0x5f
 ? down_read+0x12/0xa0
 ? srso_return_thunk+0x5/0x5f
 cma_range_alloc.constprop.0+0x131/0x290
 __cma_alloc+0xcf/0x2c0
 cma_alloc_write+0x43/0xb0
 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
 debugfs_attr_write+0x46/0x70
 full_proxy_write+0x62/0xa0
 vfs_write+0xf8/0x420
 ? srso_return_thunk+0x5/0x5f
 ? filp_flush+0x86/0xa0
 ? srso_return_thunk+0x5/0x5f
 ? filp_close+0x1f/0x30
 ? srso_return_thunk+0x5/0x5f
 ? do_dup2+0xaf/0x160
 ? srso_return_thunk+0x5/0x5f
 ksys_write+0x65/0xe0
 do_syscall_64+0x64/0x170
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

There is a potential race between __update_and_free_hugetlb_folio()
and replace_free_hugepage_folios():

CPU1                              CPU2
__update_and_free_hugetlb_folio   replace_free_hugepage_folios
                                    folio_test_hugetlb(folio)
                                    -- It's still hugetlb folio.

  __folio_clear_hugetlb(folio)
  hugetlb_free_folio(folio)
                                    h = folio_hstate(folio)
                                    -- Here, h is NULL pointer

When the above race condition occurs, folio_hstate(folio) returns
NULL, and subsequent access to this NULL pointer will cause the
system to crash. To resolve this issue, execute folio_hstate(folio)
under the protection of the hugetlb_lock lock, ensuring that
folio_hstate(folio) does not return NULL.

Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
Signed-off-by: Ge Yang <yangge1116@126.com>
Cc: <stable@vger.kernel.org>
---
 mm/hugetlb.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d3ca6b..6c2e007 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 
 	while (start_pfn < end_pfn) {
 		folio = pfn_folio(start_pfn);
+
+		/*
+		 * The folio might have been dissolved from under our feet, so make sure
+		 * to carefully check the state under the lock.
+		 */
+		spin_lock_irq(&hugetlb_lock);
 		if (folio_test_hugetlb(folio)) {
 			h = folio_hstate(folio);
 		} else {
+			spin_unlock_irq(&hugetlb_lock);
 			start_pfn++;
 			continue;
 		}
+		spin_unlock_irq(&hugetlb_lock);
 
 		if (!folio_ref_count(folio)) {
 			ret = alloc_and_dissolve_hugetlb_folio(h, folio,
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22  3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
@ 2025-05-22  3:47 ` Muchun Song
  2025-05-22  5:34   ` Oscar Salvador
  2025-05-22 11:50 ` Oscar Salvador
  2025-05-26 12:41 ` David Hildenbrand
  2 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-22  3:47 UTC (permalink / raw)
  To: yangge1116
  Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	osalvador, liuzixing



> On May 22, 2025, at 11:22, yangge1116@126.com wrote:
> 
> From: Ge Yang <yangge1116@126.com>
> 
> A kernel crash was observed when replacing free hugetlb folios:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000028
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
> FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> replace_free_hugepage_folios+0xb6/0x100
> alloc_contig_range_noprof+0x18a/0x590
> ? srso_return_thunk+0x5/0x5f
> ? down_read+0x12/0xa0
> ? srso_return_thunk+0x5/0x5f
> cma_range_alloc.constprop.0+0x131/0x290
> __cma_alloc+0xcf/0x2c0
> cma_alloc_write+0x43/0xb0
> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
> debugfs_attr_write+0x46/0x70
> full_proxy_write+0x62/0xa0
> vfs_write+0xf8/0x420
> ? srso_return_thunk+0x5/0x5f
> ? filp_flush+0x86/0xa0
> ? srso_return_thunk+0x5/0x5f
> ? filp_close+0x1f/0x30
> ? srso_return_thunk+0x5/0x5f
> ? do_dup2+0xaf/0x160
> ? srso_return_thunk+0x5/0x5f
> ksys_write+0x65/0xe0
> do_syscall_64+0x64/0x170
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> There is a potential race between __update_and_free_hugetlb_folio()
> and replace_free_hugepage_folios():
> 
> CPU1                              CPU2
> __update_and_free_hugetlb_folio   replace_free_hugepage_folios
>                                    folio_test_hugetlb(folio)
>                                    -- It's still hugetlb folio.
> 
>  __folio_clear_hugetlb(folio)
>  hugetlb_free_folio(folio)
>                                    h = folio_hstate(folio)
>                                    -- Here, h is NULL pointer
> 
> When the above race condition occurs, folio_hstate(folio) returns
> NULL, and subsequent access to this NULL pointer will cause the
> system to crash. To resolve this issue, execute folio_hstate(folio)
> under the protection of the hugetlb_lock lock, ensuring that
> folio_hstate(folio) does not return NULL.
> 
> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>

Thanks for fixing this problem. BTW, in order to catch future similar problem,
it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
is not held when folio's reference count is zero. For this fix, LGTM.

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22  3:47 ` Muchun Song
@ 2025-05-22  5:34   ` Oscar Salvador
  2025-05-22  7:13     ` Muchun Song
  2025-05-22 11:34     ` Ge Yang
  0 siblings, 2 replies; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22  5:34 UTC (permalink / raw)
  To: Muchun Song
  Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing

On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
> Thanks for fixing this problem. BTW, in order to catch future similar problem,
> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
> is not held when folio's reference count is zero. For this fix, LGTM.

Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
which will again check things under the lock?
I mean, I would be ok to save cycles and check upfront in
replace_free_hugepage_folios(), but the latter has only one user which
is alloc_contig_range(), which is not really an expected-to-be optimized
function.

 diff --git a/mm/hugetlb.c b/mm/hugetlb.c
 index bd8971388236..b4d937732256 100644
 --- a/mm/hugetlb.c
 +++ b/mm/hugetlb.c
 @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 
  	while (start_pfn < end_pfn) {
  		folio = pfn_folio(start_pfn);
 -		if (folio_test_hugetlb(folio)) {
 -			h = folio_hstate(folio);
 -		} else {
 -			start_pfn++;
 -			continue;
 -		}
 -
  		if (!folio_ref_count(folio)) {
  			ret = alloc_and_dissolve_hugetlb_folio(h, folio,
  							       &isolate_list);

 

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22  5:34   ` Oscar Salvador
@ 2025-05-22  7:13     ` Muchun Song
  2025-05-22 10:13       ` Oscar Salvador
  2025-05-22 11:34     ` Ge Yang
  1 sibling, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-22  7:13 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing



> On May 22, 2025, at 13:34, Oscar Salvador <osalvador@suse.de> wrote:
> 
> On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
>> Thanks for fixing this problem. BTW, in order to catch future similar problem,
>> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
>> is not held when folio's reference count is zero. For this fix, LGTM.
> 
> Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
> which will again check things under the lock?

I've also considered about this choice, because there is another similar
case in isolate_or_dissolve_huge_page() which could benefit from this
change. I am fine with both approaches. Anyway, adding an assertion into
folio_hstate() is an improvement for capturing invalid users in the future.
Because any user of folio_hstate() should hold a reference to folio or
hold the hugetlb_lock to make sure it returns a valid hstate for a hugetlb
folio.

Muchun,
Thanks.

> I mean, I would be ok to save cycles and check upfront in
> replace_free_hugepage_folios(), but the latter has only one user which
> is alloc_contig_range(), which is not really an expected-to-be optimized
> function.
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bd8971388236..b4d937732256 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
> 
>   	while (start_pfn < end_pfn) {
>   		folio = pfn_folio(start_pfn);
> - 		if (folio_test_hugetlb(folio)) {
> - 			h = folio_hstate(folio);
> - 		} else {
> - 			start_pfn++;
> - 			continue;
> - 		}
> -
>   		if (!folio_ref_count(folio)) {
>   			ret = alloc_and_dissolve_hugetlb_folio(h, folio,
>         						&isolate_list);
> 
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22  7:13     ` Muchun Song
@ 2025-05-22 10:13       ` Oscar Salvador
  0 siblings, 0 replies; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 10:13 UTC (permalink / raw)
  To: Muchun Song
  Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing

On Thu, May 22, 2025 at 03:13:31PM +0800, Muchun Song wrote:
> 
> 
> > On May 22, 2025, at 13:34, Oscar Salvador <osalvador@suse.de> wrote:
> > 
> > On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
> >> Thanks for fixing this problem. BTW, in order to catch future similar problem,
> >> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
> >> is not held when folio's reference count is zero. For this fix, LGTM.
> > 
> > Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
> > which will again check things under the lock?
> 
> I've also considered about this choice, because there is another similar
> case in isolate_or_dissolve_huge_page() which could benefit from this
> change. I am fine with both approaches. Anyway, adding an assertion into
> folio_hstate() is an improvement for capturing invalid users in the future.
> Because any user of folio_hstate() should hold a reference to folio or
> hold the hugetlb_lock to make sure it returns a valid hstate for a hugetlb
> folio.

Yes, I am not arguing about that, it makes perfect sense to me, but I am
just kinda against these micro-optimizations for taking the lock to check
things when we are calling in a function that will again the lock to
check things.

Actually, I think that the folio_test_hugetlb() check in
replace_free_hugepage_folios() was put there to try tro be smart and save cycles in
case we were not dealing with a hugetlb page (so freed under us).
Now that we learnt that we cannot do that without 1) taking a refcount
2) or holding the lock, that becomes superfluos, so I would just wipe that
out.

 

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22  5:34   ` Oscar Salvador
  2025-05-22  7:13     ` Muchun Song
@ 2025-05-22 11:34     ` Ge Yang
  2025-05-22 11:49       ` Oscar Salvador
  1 sibling, 1 reply; 18+ messages in thread
From: Ge Yang @ 2025-05-22 11:34 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song
  Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	liuzixing



在 2025/5/22 13:34, Oscar Salvador 写道:
> On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
>> Thanks for fixing this problem. BTW, in order to catch future similar problem,
>> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
>> is not held when folio's reference count is zero. For this fix, LGTM.
> 
> Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
> which will again check things under the lock?
> I mean, I would be ok to save cycles and check upfront in
> replace_free_hugepage_folios(), but the latter has only one user which
> is alloc_contig_range(), which is not really an expected-to-be optimized
> function.
> 
>   diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>   index bd8971388236..b4d937732256 100644
>   --- a/mm/hugetlb.c
>   +++ b/mm/hugetlb.c
>   @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>   
>    	while (start_pfn < end_pfn) {
>    		folio = pfn_folio(start_pfn);
>   -		if (folio_test_hugetlb(folio)) {
>   -			h = folio_hstate(folio);
>   -		} else {
>   -			start_pfn++;
>   -			continue;
>   -		}
>   -
>    		if (!folio_ref_count(folio)) {
>    			ret = alloc_and_dissolve_hugetlb_folio(h, folio,
>    							       &isolate_list);
> 
>   
> 
It seems that we cannot simply remove the folio_test_hugetlb() check. 
The reasons are as follows:

1）If we remove it, we will be unable to obtain the hstat corresponding 
to the folio, and consequently, we won't be able to call 
alloc_and_dissolve_hugetlb_folio().

2）The alloc_and_dissolve_hugetlb_folio() function is also called within 
the isolate_or_dissolve_huge_folio() function. However, the 
folio_test_hugetlb() check within the isolate_or_dissolve_huge_folio() 
function cannot be removed.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22 11:34     ` Ge Yang
@ 2025-05-22 11:49       ` Oscar Salvador
  2025-05-22 12:39         ` Muchun Song
  0 siblings, 1 reply; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 11:49 UTC (permalink / raw)
  To: Ge Yang
  Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing

On Thu, May 22, 2025 at 07:34:56PM +0800, Ge Yang wrote:
> It seems that we cannot simply remove the folio_test_hugetlb() check. The
> reasons are as follows:

Yeah, my thought was whether we could move the folio_hstate within
alloc_and_dissolve_hugetlb_folio(), since the latter really needs to take the
lock.
But isolate_or_dissolve_huge_page() also needs the 'hstate' not only to
pass it onto alloc_and_dissolve_hugetlb_folio() but to check whether
hstate is gigantic.

Umh, kinda hate sparkling the locks all around.

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22 11:49       ` Oscar Salvador
@ 2025-05-22 12:39         ` Muchun Song
  2025-05-22 19:32           ` Oscar Salvador
  0 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-22 12:39 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing



> On May 22, 2025, at 19:49, Oscar Salvador <osalvador@suse.de> wrote:
> 
> On Thu, May 22, 2025 at 07:34:56PM +0800, Ge Yang wrote:
>> It seems that we cannot simply remove the folio_test_hugetlb() check. The
>> reasons are as follows:
> 
> Yeah, my thought was whether we could move the folio_hstate within
> alloc_and_dissolve_hugetlb_folio(), since the latter really needs to take the
> lock.
> But isolate_or_dissolve_huge_page() also needs the 'hstate' not only to
> pass it onto alloc_and_dissolve_hugetlb_folio() but to check whether
> hstate is gigantic.

But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.

> 
> Umh, kinda hate sparkling the locks all around.
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22 12:39         ` Muchun Song
@ 2025-05-22 19:32           ` Oscar Salvador
  2025-05-23  3:27             ` Muchun Song
  0 siblings, 1 reply; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 19:32 UTC (permalink / raw)
  To: Muchun Song
  Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing

On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.

Yes, I think we can do that.
So something like the following (compily-tested only) maybe?

 From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
 From: Oscar Salvador <osalvador@suse.de>
 Date: Thu, 22 May 2025 19:51:04 +0200
 Subject: [PATCH] TMP
 
 ---
  mm/hugetlb.c | 38 +++++++++-----------------------------
  1 file changed, 9 insertions(+), 29 deletions(-)
 
 diff --git a/mm/hugetlb.c b/mm/hugetlb.c
 index bd8971388236..20f08de9e37d 100644
 --- a/mm/hugetlb.c
 +++ b/mm/hugetlb.c
 @@ -2787,15 +2787,13 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
  /*
   * alloc_and_dissolve_hugetlb_folio - Allocate a new folio and dissolve
   * the old one
 - * @h: struct hstate old page belongs to
   * @old_folio: Old folio to dissolve
   * @list: List to isolate the page in case we need to
   * Returns 0 on success, otherwise negated error.
   */
 -static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
 -			struct folio *old_folio, struct list_head *list)
 +static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio, struct list_head *list)
  {
 -	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 +	struct hstate *h;
  	int nid = folio_nid(old_folio);
  	struct folio *new_folio = NULL;
  	int ret = 0;
 @@ -2829,7 +2827,11 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
  		cond_resched();
  		goto retry;
  	} else {
 +		h = folio_hstate(old_folio);
 +
  		if (!new_folio) {
 +			gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 +
  			spin_unlock_irq(&hugetlb_lock);
  			new_folio = alloc_buddy_hugetlb_folio(h, gfp_mask, nid,
  							      NULL, NULL);
 @@ -2874,35 +2876,20 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
  
  int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list)
  {
 -	struct hstate *h;
  	int ret = -EBUSY;
  
 -	/*
 -	 * The page might have been dissolved from under our feet, so make sure
 -	 * to carefully check the state under the lock.
 -	 * Return success when racing as if we dissolved the page ourselves.
 -	 */
 -	spin_lock_irq(&hugetlb_lock);
 -	if (folio_test_hugetlb(folio)) {
 -		h = folio_hstate(folio);
 -	} else {
 -		spin_unlock_irq(&hugetlb_lock);
 -		return 0;
 -	}
 -	spin_unlock_irq(&hugetlb_lock);
 -
  	/*
  	 * Fence off gigantic pages as there is a cyclic dependency between
  	 * alloc_contig_range and them. Return -ENOMEM as this has the effect
  	 * of bailing out right away without further retrying.
  	 */
 -	if (hstate_is_gigantic(h))
 +	if (folio_order(folio) > MAX_PAGE_ORDER)
  		return -ENOMEM;
  
  	if (folio_ref_count(folio) && folio_isolate_hugetlb(folio, list))
  		ret = 0;
  	else if (!folio_ref_count(folio))
 -		ret = alloc_and_dissolve_hugetlb_folio(h, folio, list);
 +		ret = alloc_and_dissolve_hugetlb_folio(folio, list);
  
  	return ret;
  }
 @@ -2916,7 +2903,6 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list)
   */
  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
  {
 -	struct hstate *h;
  	struct folio *folio;
  	int ret = 0;
  
 @@ -2924,15 +2910,9 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
  
  	while (start_pfn < end_pfn) {
  		folio = pfn_folio(start_pfn);
 -		if (folio_test_hugetlb(folio)) {
 -			h = folio_hstate(folio);
 -		} else {
 -			start_pfn++;
 -			continue;
 -		}
  
  		if (!folio_ref_count(folio)) {
 -			ret = alloc_and_dissolve_hugetlb_folio(h, folio,
 +			ret = alloc_and_dissolve_hugetlb_folio(folio,
  							       &isolate_list);
  			if (ret)
  				break;
 -- 
 2.49.0

 

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22 19:32           ` Oscar Salvador
@ 2025-05-23  3:27             ` Muchun Song
  2025-05-23  3:46               ` Ge Yang
  0 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-23  3:27 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing



> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote:
> 
> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
> 
> Yes, I think we can do that.
> So something like the following (compily-tested only) maybe?
> 
> From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
> From: Oscar Salvador <osalvador@suse.de>
> Date: Thu, 22 May 2025 19:51:04 +0200
> Subject: [PATCH] TMP
> 
> ---
>  mm/hugetlb.c | 38 +++++++++-----------------------------
>  1 file changed, 9 insertions(+), 29 deletions(-)

Pretty simple. The code LGTM.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-23  3:27             ` Muchun Song
@ 2025-05-23  3:46               ` Ge Yang
  2025-05-23  3:56                 ` Muchun Song
  2025-05-23  5:30                 ` Oscar Salvador
  0 siblings, 2 replies; 18+ messages in thread
From: Ge Yang @ 2025-05-23  3:46 UTC (permalink / raw)
  To: Muchun Song, Oscar Salvador
  Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	liuzixing



在 2025/5/23 11:27, Muchun Song 写道:
> 
> 
>> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote:
>>
>> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
>>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
>>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
>>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
>>
>> Yes, I think we can do that.
>> So something like the following (compily-tested only) maybe?
>>
>>  From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
>> From: Oscar Salvador <osalvador@suse.de>
>> Date: Thu, 22 May 2025 19:51:04 +0200
>> Subject: [PATCH] TMP
>>
>> ---
>>   mm/hugetlb.c | 38 +++++++++-----------------------------
>>   1 file changed, 9 insertions(+), 29 deletions(-)
> 
> Pretty simple. The code LGTM.
> 
> Thanks.

Thanks.

The implementation of alloc_and_dissolve_hugetlb_folio differs between 
kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to 
submit another patch based on Oscar Salvador's suggestion.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-23  3:46               ` Ge Yang
@ 2025-05-23  3:56                 ` Muchun Song
  2025-05-23  5:30                 ` Oscar Salvador
  1 sibling, 0 replies; 18+ messages in thread
From: Muchun Song @ 2025-05-23  3:56 UTC (permalink / raw)
  To: Ge Yang
  Cc: Oscar Salvador, akpm, linux-mm, linux-kernel, stable, 21cnbao,
	david, baolin.wang, liuzixing



> On May 23, 2025, at 11:46, Ge Yang <yangge1116@126.com> wrote:
> 
> 
> 
> 在 2025/5/23 11:27, Muchun Song 写道:
>>> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote:
>>> 
>>> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
>>>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
>>>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
>>>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
>>> 
>>> Yes, I think we can do that.
>>> So something like the following (compily-tested only) maybe?
>>> 
>>> From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
>>> From: Oscar Salvador <osalvador@suse.de>
>>> Date: Thu, 22 May 2025 19:51:04 +0200
>>> Subject: [PATCH] TMP
>>> 
>>> ---
>>>  mm/hugetlb.c | 38 +++++++++-----------------------------
>>>  1 file changed, 9 insertions(+), 29 deletions(-)
>> Pretty simple. The code LGTM.
>> Thanks.
> 
> Thanks.
> 
> The implementation of alloc_and_dissolve_hugetlb_folio differs between kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to submit another patch based on Oscar Salvador's suggestion.

A separate improving patch LGTM.

> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-23  3:46               ` Ge Yang
  2025-05-23  3:56                 ` Muchun Song
@ 2025-05-23  5:30                 ` Oscar Salvador
  2025-05-23  8:07                   ` Ge Yang
  1 sibling, 1 reply; 18+ messages in thread
From: Oscar Salvador @ 2025-05-23  5:30 UTC (permalink / raw)
  To: Ge Yang
  Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing

On Fri, May 23, 2025 at 11:46:51AM +0800, Ge Yang wrote:
> The implementation of alloc_and_dissolve_hugetlb_folio differs between
> kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to
> submit another patch based on Oscar Salvador's suggestion.

If the code differs a lot between a stable version and upstream, the way
to go is to submit a specific patch for the stable one that applies
to that tree, e.g: [PATCH 6.6]...


-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-23  5:30                 ` Oscar Salvador
@ 2025-05-23  8:07                   ` Ge Yang
  0 siblings, 0 replies; 18+ messages in thread
From: Ge Yang @ 2025-05-23  8:07 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, liuzixing



在 2025/5/23 13:30, Oscar Salvador 写道:
> On Fri, May 23, 2025 at 11:46:51AM +0800, Ge Yang wrote:
>> The implementation of alloc_and_dissolve_hugetlb_folio differs between
>> kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to
>> submit another patch based on Oscar Salvador's suggestion.
> 
> If the code differs a lot between a stable version and upstream, the way
> to go is to submit a specific patch for the stable one that applies
> to that tree, e.g: [PATCH 6.6]...
> 
> 

Ok, thanks.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22  3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
  2025-05-22  3:47 ` Muchun Song
@ 2025-05-22 11:50 ` Oscar Salvador
  2025-05-26 12:41 ` David Hildenbrand
  2 siblings, 0 replies; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 11:50 UTC (permalink / raw)
  To: yangge1116
  Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	muchun.song, liuzixing

On Thu, May 22, 2025 at 11:22:17AM +0800, yangge1116@126.com wrote:
> From: Ge Yang <yangge1116@126.com>
> 
> A kernel crash was observed when replacing free hugetlb folios:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000028
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
> FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
>  replace_free_hugepage_folios+0xb6/0x100
>  alloc_contig_range_noprof+0x18a/0x590
>  ? srso_return_thunk+0x5/0x5f
>  ? down_read+0x12/0xa0
>  ? srso_return_thunk+0x5/0x5f
>  cma_range_alloc.constprop.0+0x131/0x290
>  __cma_alloc+0xcf/0x2c0
>  cma_alloc_write+0x43/0xb0
>  simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
>  debugfs_attr_write+0x46/0x70
>  full_proxy_write+0x62/0xa0
>  vfs_write+0xf8/0x420
>  ? srso_return_thunk+0x5/0x5f
>  ? filp_flush+0x86/0xa0
>  ? srso_return_thunk+0x5/0x5f
>  ? filp_close+0x1f/0x30
>  ? srso_return_thunk+0x5/0x5f
>  ? do_dup2+0xaf/0x160
>  ? srso_return_thunk+0x5/0x5f
>  ksys_write+0x65/0xe0
>  do_syscall_64+0x64/0x170
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> There is a potential race between __update_and_free_hugetlb_folio()
> and replace_free_hugepage_folios():
> 
> CPU1                              CPU2
> __update_and_free_hugetlb_folio   replace_free_hugepage_folios
>                                     folio_test_hugetlb(folio)
>                                     -- It's still hugetlb folio.
> 
>   __folio_clear_hugetlb(folio)
>   hugetlb_free_folio(folio)
>                                     h = folio_hstate(folio)
>                                     -- Here, h is NULL pointer
> 
> When the above race condition occurs, folio_hstate(folio) returns
> NULL, and subsequent access to this NULL pointer will cause the
> system to crash. To resolve this issue, execute folio_hstate(folio)
> under the protection of the hugetlb_lock lock, ensuring that
> folio_hstate(folio) does not return NULL.
> 
> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>

Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-22  3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
  2025-05-22  3:47 ` Muchun Song
  2025-05-22 11:50 ` Oscar Salvador
@ 2025-05-26 12:41 ` David Hildenbrand
  2025-05-26 12:57   ` Ge Yang
  2 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2025-05-26 12:41 UTC (permalink / raw)
  To: yangge1116, akpm
  Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song,
	osalvador, liuzixing

On 22.05.25 05:22, yangge1116@126.com wrote:
> From: Ge Yang <yangge1116@126.com>
> 
> A kernel crash was observed when replacing free hugetlb folios:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000028
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
> FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
>   replace_free_hugepage_folios+0xb6/0x100
>   alloc_contig_range_noprof+0x18a/0x590
>   ? srso_return_thunk+0x5/0x5f
>   ? down_read+0x12/0xa0
>   ? srso_return_thunk+0x5/0x5f
>   cma_range_alloc.constprop.0+0x131/0x290
>   __cma_alloc+0xcf/0x2c0
>   cma_alloc_write+0x43/0xb0
>   simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
>   debugfs_attr_write+0x46/0x70
>   full_proxy_write+0x62/0xa0
>   vfs_write+0xf8/0x420
>   ? srso_return_thunk+0x5/0x5f
>   ? filp_flush+0x86/0xa0
>   ? srso_return_thunk+0x5/0x5f
>   ? filp_close+0x1f/0x30
>   ? srso_return_thunk+0x5/0x5f
>   ? do_dup2+0xaf/0x160
>   ? srso_return_thunk+0x5/0x5f
>   ksys_write+0x65/0xe0
>   do_syscall_64+0x64/0x170
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> There is a potential race between __update_and_free_hugetlb_folio()
> and replace_free_hugepage_folios():
> 
> CPU1                              CPU2
> __update_and_free_hugetlb_folio   replace_free_hugepage_folios
>                                      folio_test_hugetlb(folio)
>                                      -- It's still hugetlb folio.
> 
>    __folio_clear_hugetlb(folio)
>    hugetlb_free_folio(folio)
>                                      h = folio_hstate(folio)
>                                      -- Here, h is NULL pointer
> 
> When the above race condition occurs, folio_hstate(folio) returns
> NULL, and subsequent access to this NULL pointer will cause the
> system to crash. To resolve this issue, execute folio_hstate(folio)
> under the protection of the hugetlb_lock lock, ensuring that
> folio_hstate(folio) does not return NULL.
> 
> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>
> ---
>   mm/hugetlb.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3d3ca6b..6c2e007 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>   
>   	while (start_pfn < end_pfn) {
>   		folio = pfn_folio(start_pfn);
> +
> +		/*
> +		 * The folio might have been dissolved from under our feet, so make sure
> +		 * to carefully check the state under the lock.
> +		 */
> +		spin_lock_irq(&hugetlb_lock);
>   		if (folio_test_hugetlb(folio)) {
>   			h = folio_hstate(folio);
>   		} else {
> +			spin_unlock_irq(&hugetlb_lock);
>   			start_pfn++;
>   			continue;
>   		}
> +		spin_unlock_irq(&hugetlb_lock);

As mentioned elsewhere, this will grab the hugetlb_lock for each and 
every pfn in the range if there are no hugetlb folios (common case).

That should certainly *not* be done.

In case we see !folio_test_hugetlb(), we should just move on.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-26 12:41 ` David Hildenbrand
@ 2025-05-26 12:57   ` Ge Yang
  2025-05-26 12:59     ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Ge Yang @ 2025-05-26 12:57 UTC (permalink / raw)
  To: David Hildenbrand, akpm
  Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song,
	osalvador, liuzixing



在 2025/5/26 20:41, David Hildenbrand 写道:
> On 22.05.25 05:22, yangge1116@126.com wrote:
>> From: Ge Yang <yangge1116@126.com>
>>
>> A kernel crash was observed when replacing free hugetlb folios:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000028
>> PGD 0 P4D 0
>> Oops: Oops: 0000 [#1] SMP NOPTI
>> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 
>> PREEMPT(voluntary)
>> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
>> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
>> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
>> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
>> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
>> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
>> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
>> FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) 
>> knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
>> Call Trace:
>> <TASK>
>>   replace_free_hugepage_folios+0xb6/0x100
>>   alloc_contig_range_noprof+0x18a/0x590
>>   ? srso_return_thunk+0x5/0x5f
>>   ? down_read+0x12/0xa0
>>   ? srso_return_thunk+0x5/0x5f
>>   cma_range_alloc.constprop.0+0x131/0x290
>>   __cma_alloc+0xcf/0x2c0
>>   cma_alloc_write+0x43/0xb0
>>   simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
>>   debugfs_attr_write+0x46/0x70
>>   full_proxy_write+0x62/0xa0
>>   vfs_write+0xf8/0x420
>>   ? srso_return_thunk+0x5/0x5f
>>   ? filp_flush+0x86/0xa0
>>   ? srso_return_thunk+0x5/0x5f
>>   ? filp_close+0x1f/0x30
>>   ? srso_return_thunk+0x5/0x5f
>>   ? do_dup2+0xaf/0x160
>>   ? srso_return_thunk+0x5/0x5f
>>   ksys_write+0x65/0xe0
>>   do_syscall_64+0x64/0x170
>>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> There is a potential race between __update_and_free_hugetlb_folio()
>> and replace_free_hugepage_folios():
>>
>> CPU1                              CPU2
>> __update_and_free_hugetlb_folio   replace_free_hugepage_folios
>>                                      folio_test_hugetlb(folio)
>>                                      -- It's still hugetlb folio.
>>
>>    __folio_clear_hugetlb(folio)
>>    hugetlb_free_folio(folio)
>>                                      h = folio_hstate(folio)
>>                                      -- Here, h is NULL pointer
>>
>> When the above race condition occurs, folio_hstate(folio) returns
>> NULL, and subsequent access to this NULL pointer will cause the
>> system to crash. To resolve this issue, execute folio_hstate(folio)
>> under the protection of the hugetlb_lock lock, ensuring that
>> folio_hstate(folio) does not return NULL.
>>
>> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
>> Signed-off-by: Ge Yang <yangge1116@126.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>>   mm/hugetlb.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 3d3ca6b..6c2e007 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long 
>> start_pfn, unsigned long end_pfn)
>>       while (start_pfn < end_pfn) {
>>           folio = pfn_folio(start_pfn);
>> +
>> +        /*
>> +         * The folio might have been dissolved from under our feet, 
>> so make sure
>> +         * to carefully check the state under the lock.
>> +         */
>> +        spin_lock_irq(&hugetlb_lock);
>>           if (folio_test_hugetlb(folio)) {
>>               h = folio_hstate(folio);
>>           } else {
>> +            spin_unlock_irq(&hugetlb_lock);
>>               start_pfn++;
>>               continue;
>>           }
>> +        spin_unlock_irq(&hugetlb_lock);
> 
> As mentioned elsewhere, this will grab the hugetlb_lock for each and 
> every pfn in the range if there are no hugetlb folios (common case).
> 
> That should certainly *not* be done.
> 
> In case we see !folio_test_hugetlb(), we should just move on.
> 

The main reason for acquiring the hugetlb_lock here is to obtain a valid 
hstate, as the alloc_and_dissolve_hugetlb_folio() function requires 
hstate as a parameter. This approach is indeed not performance-friendly. 
However, in the patch available at 
https://lore.kernel.org/lkml/1747987559-23082-1-git-send-email-yangge1116@126.com/, 
all these operations will be removed.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  2025-05-26 12:57   ` Ge Yang
@ 2025-05-26 12:59     ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2025-05-26 12:59 UTC (permalink / raw)
  To: Ge Yang, akpm
  Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song,
	osalvador, liuzixing

On 26.05.25 14:57, Ge Yang wrote:
> 
> 
> 在 2025/5/26 20:41, David Hildenbrand 写道:
>> On 22.05.25 05:22, yangge1116@126.com wrote:
>>> From: Ge Yang <yangge1116@126.com>
>>>
>>> A kernel crash was observed when replacing free hugetlb folios:
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000028
>>> PGD 0 P4D 0
>>> Oops: Oops: 0000 [#1] SMP NOPTI
>>> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41
>>> PREEMPT(voluntary)
>>> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
>>> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
>>> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
>>> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
>>> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
>>> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
>>> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
>>> FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000)
>>> knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
>>> Call Trace:
>>> <TASK>
>>>    replace_free_hugepage_folios+0xb6/0x100
>>>    alloc_contig_range_noprof+0x18a/0x590
>>>    ? srso_return_thunk+0x5/0x5f
>>>    ? down_read+0x12/0xa0
>>>    ? srso_return_thunk+0x5/0x5f
>>>    cma_range_alloc.constprop.0+0x131/0x290
>>>    __cma_alloc+0xcf/0x2c0
>>>    cma_alloc_write+0x43/0xb0
>>>    simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
>>>    debugfs_attr_write+0x46/0x70
>>>    full_proxy_write+0x62/0xa0
>>>    vfs_write+0xf8/0x420
>>>    ? srso_return_thunk+0x5/0x5f
>>>    ? filp_flush+0x86/0xa0
>>>    ? srso_return_thunk+0x5/0x5f
>>>    ? filp_close+0x1f/0x30
>>>    ? srso_return_thunk+0x5/0x5f
>>>    ? do_dup2+0xaf/0x160
>>>    ? srso_return_thunk+0x5/0x5f
>>>    ksys_write+0x65/0xe0
>>>    do_syscall_64+0x64/0x170
>>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> There is a potential race between __update_and_free_hugetlb_folio()
>>> and replace_free_hugepage_folios():
>>>
>>> CPU1                              CPU2
>>> __update_and_free_hugetlb_folio   replace_free_hugepage_folios
>>>                                       folio_test_hugetlb(folio)
>>>                                       -- It's still hugetlb folio.
>>>
>>>     __folio_clear_hugetlb(folio)
>>>     hugetlb_free_folio(folio)
>>>                                       h = folio_hstate(folio)
>>>                                       -- Here, h is NULL pointer
>>>
>>> When the above race condition occurs, folio_hstate(folio) returns
>>> NULL, and subsequent access to this NULL pointer will cause the
>>> system to crash. To resolve this issue, execute folio_hstate(folio)
>>> under the protection of the hugetlb_lock lock, ensuring that
>>> folio_hstate(folio) does not return NULL.
>>>
>>> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
>>> Signed-off-by: Ge Yang <yangge1116@126.com>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>>    mm/hugetlb.c | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 3d3ca6b..6c2e007 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long
>>> start_pfn, unsigned long end_pfn)
>>>        while (start_pfn < end_pfn) {
>>>            folio = pfn_folio(start_pfn);
>>> +
>>> +        /*
>>> +         * The folio might have been dissolved from under our feet,
>>> so make sure
>>> +         * to carefully check the state under the lock.
>>> +         */
>>> +        spin_lock_irq(&hugetlb_lock);
>>>            if (folio_test_hugetlb(folio)) {
>>>                h = folio_hstate(folio);
>>>            } else {
>>> +            spin_unlock_irq(&hugetlb_lock);
>>>                start_pfn++;
>>>                continue;
>>>            }
>>> +        spin_unlock_irq(&hugetlb_lock);
>>
>> As mentioned elsewhere, this will grab the hugetlb_lock for each and
>> every pfn in the range if there are no hugetlb folios (common case).
>>
>> That should certainly *not* be done.
>>
>> In case we see !folio_test_hugetlb(), we should just move on.
>>
> 
> The main reason for acquiring the hugetlb_lock here is to obtain a valid
> hstate, as the alloc_and_dissolve_hugetlb_folio() function requires
> hstate as a parameter. This approach is indeed not performance-friendly.
> However, in the patch available at
> https://lore.kernel.org/lkml/1747987559-23082-1-git-send-email-yangge1116@126.com/,
> all these operations will be removed.

No, you still take locks on anything that is !folio_ref_count(folio).

We should really have an early folio_test_hugetlb(folio) check.

hugetlb is on most systems out there the absolute *corner case*.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-05-26 13:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22  3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
2025-05-22  3:47 ` Muchun Song
2025-05-22  5:34   ` Oscar Salvador
2025-05-22  7:13     ` Muchun Song
2025-05-22 10:13       ` Oscar Salvador
2025-05-22 11:34     ` Ge Yang
2025-05-22 11:49       ` Oscar Salvador
2025-05-22 12:39         ` Muchun Song
2025-05-22 19:32           ` Oscar Salvador
2025-05-23  3:27             ` Muchun Song
2025-05-23  3:46               ` Ge Yang
2025-05-23  3:56                 ` Muchun Song
2025-05-23  5:30                 ` Oscar Salvador
2025-05-23  8:07                   ` Ge Yang
2025-05-22 11:50 ` Oscar Salvador
2025-05-26 12:41 ` David Hildenbrand
2025-05-26 12:57   ` Ge Yang
2025-05-26 12:59     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).