* [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
@ 2025-05-22 3:22 yangge1116
2025-05-22 3:47 ` Muchun Song
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: yangge1116 @ 2025-05-22 3:22 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
muchun.song, osalvador, liuzixing, Ge Yang
From: Ge Yang <yangge1116@126.com>
A kernel crash was observed when replacing free hugetlb folios:
BUG: kernel NULL pointer dereference, address: 0000000000000028
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
Call Trace:
<TASK>
replace_free_hugepage_folios+0xb6/0x100
alloc_contig_range_noprof+0x18a/0x590
? srso_return_thunk+0x5/0x5f
? down_read+0x12/0xa0
? srso_return_thunk+0x5/0x5f
cma_range_alloc.constprop.0+0x131/0x290
__cma_alloc+0xcf/0x2c0
cma_alloc_write+0x43/0xb0
simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
debugfs_attr_write+0x46/0x70
full_proxy_write+0x62/0xa0
vfs_write+0xf8/0x420
? srso_return_thunk+0x5/0x5f
? filp_flush+0x86/0xa0
? srso_return_thunk+0x5/0x5f
? filp_close+0x1f/0x30
? srso_return_thunk+0x5/0x5f
? do_dup2+0xaf/0x160
? srso_return_thunk+0x5/0x5f
ksys_write+0x65/0xe0
do_syscall_64+0x64/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
There is a potential race between __update_and_free_hugetlb_folio()
and replace_free_hugepage_folios():
CPU1 CPU2
__update_and_free_hugetlb_folio replace_free_hugepage_folios
folio_test_hugetlb(folio)
-- It's still hugetlb folio.
__folio_clear_hugetlb(folio)
hugetlb_free_folio(folio)
h = folio_hstate(folio)
-- Here, h is NULL pointer
When the above race condition occurs, folio_hstate(folio) returns
NULL, and subsequent access to this NULL pointer will cause the
system to crash. To resolve this issue, execute folio_hstate(folio)
under the protection of the hugetlb_lock lock, ensuring that
folio_hstate(folio) does not return NULL.
Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
Signed-off-by: Ge Yang <yangge1116@126.com>
Cc: <stable@vger.kernel.org>
---
mm/hugetlb.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d3ca6b..6c2e007 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
while (start_pfn < end_pfn) {
folio = pfn_folio(start_pfn);
+
+ /*
+ * The folio might have been dissolved from under our feet, so make sure
+ * to carefully check the state under the lock.
+ */
+ spin_lock_irq(&hugetlb_lock);
if (folio_test_hugetlb(folio)) {
h = folio_hstate(folio);
} else {
+ spin_unlock_irq(&hugetlb_lock);
start_pfn++;
continue;
}
+ spin_unlock_irq(&hugetlb_lock);
if (!folio_ref_count(folio)) {
ret = alloc_and_dissolve_hugetlb_folio(h, folio,
--
2.7.4
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
@ 2025-05-22 3:47 ` Muchun Song
2025-05-22 5:34 ` Oscar Salvador
2025-05-22 11:50 ` Oscar Salvador
2025-05-26 12:41 ` David Hildenbrand
2 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-22 3:47 UTC (permalink / raw)
To: yangge1116
Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
osalvador, liuzixing
> On May 22, 2025, at 11:22, yangge1116@126.com wrote:
>
> From: Ge Yang <yangge1116@126.com>
>
> A kernel crash was observed when replacing free hugetlb folios:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000028
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
> FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> replace_free_hugepage_folios+0xb6/0x100
> alloc_contig_range_noprof+0x18a/0x590
> ? srso_return_thunk+0x5/0x5f
> ? down_read+0x12/0xa0
> ? srso_return_thunk+0x5/0x5f
> cma_range_alloc.constprop.0+0x131/0x290
> __cma_alloc+0xcf/0x2c0
> cma_alloc_write+0x43/0xb0
> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
> debugfs_attr_write+0x46/0x70
> full_proxy_write+0x62/0xa0
> vfs_write+0xf8/0x420
> ? srso_return_thunk+0x5/0x5f
> ? filp_flush+0x86/0xa0
> ? srso_return_thunk+0x5/0x5f
> ? filp_close+0x1f/0x30
> ? srso_return_thunk+0x5/0x5f
> ? do_dup2+0xaf/0x160
> ? srso_return_thunk+0x5/0x5f
> ksys_write+0x65/0xe0
> do_syscall_64+0x64/0x170
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> There is a potential race between __update_and_free_hugetlb_folio()
> and replace_free_hugepage_folios():
>
> CPU1 CPU2
> __update_and_free_hugetlb_folio replace_free_hugepage_folios
> folio_test_hugetlb(folio)
> -- It's still hugetlb folio.
>
> __folio_clear_hugetlb(folio)
> hugetlb_free_folio(folio)
> h = folio_hstate(folio)
> -- Here, h is NULL pointer
>
> When the above race condition occurs, folio_hstate(folio) returns
> NULL, and subsequent access to this NULL pointer will cause the
> system to crash. To resolve this issue, execute folio_hstate(folio)
> under the protection of the hugetlb_lock lock, ensuring that
> folio_hstate(folio) does not return NULL.
>
> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>
Thanks for fixing this problem. BTW, in order to catch future similar problem,
it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
is not held when folio's reference count is zero. For this fix, LGTM.
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Thanks.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 3:47 ` Muchun Song
@ 2025-05-22 5:34 ` Oscar Salvador
2025-05-22 7:13 ` Muchun Song
2025-05-22 11:34 ` Ge Yang
0 siblings, 2 replies; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 5:34 UTC (permalink / raw)
To: Muchun Song
Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
> Thanks for fixing this problem. BTW, in order to catch future similar problem,
> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
> is not held when folio's reference count is zero. For this fix, LGTM.
Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
which will again check things under the lock?
I mean, I would be ok to save cycles and check upfront in
replace_free_hugepage_folios(), but the latter has only one user which
is alloc_contig_range(), which is not really an expected-to-be optimized
function.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bd8971388236..b4d937732256 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
while (start_pfn < end_pfn) {
folio = pfn_folio(start_pfn);
- if (folio_test_hugetlb(folio)) {
- h = folio_hstate(folio);
- } else {
- start_pfn++;
- continue;
- }
-
if (!folio_ref_count(folio)) {
ret = alloc_and_dissolve_hugetlb_folio(h, folio,
&isolate_list);
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 5:34 ` Oscar Salvador
@ 2025-05-22 7:13 ` Muchun Song
2025-05-22 10:13 ` Oscar Salvador
2025-05-22 11:34 ` Ge Yang
1 sibling, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-22 7:13 UTC (permalink / raw)
To: Oscar Salvador
Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
> On May 22, 2025, at 13:34, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
>> Thanks for fixing this problem. BTW, in order to catch future similar problem,
>> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
>> is not held when folio's reference count is zero. For this fix, LGTM.
>
> Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
> which will again check things under the lock?
I've also considered about this choice, because there is another similar
case in isolate_or_dissolve_huge_page() which could benefit from this
change. I am fine with both approaches. Anyway, adding an assertion into
folio_hstate() is an improvement for capturing invalid users in the future.
Because any user of folio_hstate() should hold a reference to folio or
hold the hugetlb_lock to make sure it returns a valid hstate for a hugetlb
folio.
Muchun,
Thanks.
> I mean, I would be ok to save cycles and check upfront in
> replace_free_hugepage_folios(), but the latter has only one user which
> is alloc_contig_range(), which is not really an expected-to-be optimized
> function.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bd8971388236..b4d937732256 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>
> while (start_pfn < end_pfn) {
> folio = pfn_folio(start_pfn);
> - if (folio_test_hugetlb(folio)) {
> - h = folio_hstate(folio);
> - } else {
> - start_pfn++;
> - continue;
> - }
> -
> if (!folio_ref_count(folio)) {
> ret = alloc_and_dissolve_hugetlb_folio(h, folio,
> &isolate_list);
>
>
>
> --
> Oscar Salvador
> SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 7:13 ` Muchun Song
@ 2025-05-22 10:13 ` Oscar Salvador
0 siblings, 0 replies; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 10:13 UTC (permalink / raw)
To: Muchun Song
Cc: yangge1116, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
On Thu, May 22, 2025 at 03:13:31PM +0800, Muchun Song wrote:
>
>
> > On May 22, 2025, at 13:34, Oscar Salvador <osalvador@suse.de> wrote:
> >
> > On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
> >> Thanks for fixing this problem. BTW, in order to catch future similar problem,
> >> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
> >> is not held when folio's reference count is zero. For this fix, LGTM.
> >
> > Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
> > which will again check things under the lock?
>
> I've also considered about this choice, because there is another similar
> case in isolate_or_dissolve_huge_page() which could benefit from this
> change. I am fine with both approaches. Anyway, adding an assertion into
> folio_hstate() is an improvement for capturing invalid users in the future.
> Because any user of folio_hstate() should hold a reference to folio or
> hold the hugetlb_lock to make sure it returns a valid hstate for a hugetlb
> folio.
Yes, I am not arguing about that, it makes perfect sense to me, but I am
just kinda against these micro-optimizations for taking the lock to check
things when we are calling in a function that will again the lock to
check things.
Actually, I think that the folio_test_hugetlb() check in
replace_free_hugepage_folios() was put there to try tro be smart and save cycles in
case we were not dealing with a hugetlb page (so freed under us).
Now that we learnt that we cannot do that without 1) taking a refcount
2) or holding the lock, that becomes superfluos, so I would just wipe that
out.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 5:34 ` Oscar Salvador
2025-05-22 7:13 ` Muchun Song
@ 2025-05-22 11:34 ` Ge Yang
2025-05-22 11:49 ` Oscar Salvador
1 sibling, 1 reply; 18+ messages in thread
From: Ge Yang @ 2025-05-22 11:34 UTC (permalink / raw)
To: Oscar Salvador, Muchun Song
Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
liuzixing
在 2025/5/22 13:34, Oscar Salvador 写道:
> On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
>> Thanks for fixing this problem. BTW, in order to catch future similar problem,
>> it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock
>> is not held when folio's reference count is zero. For this fix, LGTM.
>
> Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(),
> which will again check things under the lock?
> I mean, I would be ok to save cycles and check upfront in
> replace_free_hugepage_folios(), but the latter has only one user which
> is alloc_contig_range(), which is not really an expected-to-be optimized
> function.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bd8971388236..b4d937732256 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2924,13 +2924,6 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>
> while (start_pfn < end_pfn) {
> folio = pfn_folio(start_pfn);
> - if (folio_test_hugetlb(folio)) {
> - h = folio_hstate(folio);
> - } else {
> - start_pfn++;
> - continue;
> - }
> -
> if (!folio_ref_count(folio)) {
> ret = alloc_and_dissolve_hugetlb_folio(h, folio,
> &isolate_list);
>
>
>
It seems that we cannot simply remove the folio_test_hugetlb() check.
The reasons are as follows:
1)If we remove it, we will be unable to obtain the hstat corresponding
to the folio, and consequently, we won't be able to call
alloc_and_dissolve_hugetlb_folio().
2)The alloc_and_dissolve_hugetlb_folio() function is also called within
the isolate_or_dissolve_huge_folio() function. However, the
folio_test_hugetlb() check within the isolate_or_dissolve_huge_folio()
function cannot be removed.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 11:34 ` Ge Yang
@ 2025-05-22 11:49 ` Oscar Salvador
2025-05-22 12:39 ` Muchun Song
0 siblings, 1 reply; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 11:49 UTC (permalink / raw)
To: Ge Yang
Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
On Thu, May 22, 2025 at 07:34:56PM +0800, Ge Yang wrote:
> It seems that we cannot simply remove the folio_test_hugetlb() check. The
> reasons are as follows:
Yeah, my thought was whether we could move the folio_hstate within
alloc_and_dissolve_hugetlb_folio(), since the latter really needs to take the
lock.
But isolate_or_dissolve_huge_page() also needs the 'hstate' not only to
pass it onto alloc_and_dissolve_hugetlb_folio() but to check whether
hstate is gigantic.
Umh, kinda hate sparkling the locks all around.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
2025-05-22 3:47 ` Muchun Song
@ 2025-05-22 11:50 ` Oscar Salvador
2025-05-26 12:41 ` David Hildenbrand
2 siblings, 0 replies; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 11:50 UTC (permalink / raw)
To: yangge1116
Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
muchun.song, liuzixing
On Thu, May 22, 2025 at 11:22:17AM +0800, yangge1116@126.com wrote:
> From: Ge Yang <yangge1116@126.com>
>
> A kernel crash was observed when replacing free hugetlb folios:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000028
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
> FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> replace_free_hugepage_folios+0xb6/0x100
> alloc_contig_range_noprof+0x18a/0x590
> ? srso_return_thunk+0x5/0x5f
> ? down_read+0x12/0xa0
> ? srso_return_thunk+0x5/0x5f
> cma_range_alloc.constprop.0+0x131/0x290
> __cma_alloc+0xcf/0x2c0
> cma_alloc_write+0x43/0xb0
> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
> debugfs_attr_write+0x46/0x70
> full_proxy_write+0x62/0xa0
> vfs_write+0xf8/0x420
> ? srso_return_thunk+0x5/0x5f
> ? filp_flush+0x86/0xa0
> ? srso_return_thunk+0x5/0x5f
> ? filp_close+0x1f/0x30
> ? srso_return_thunk+0x5/0x5f
> ? do_dup2+0xaf/0x160
> ? srso_return_thunk+0x5/0x5f
> ksys_write+0x65/0xe0
> do_syscall_64+0x64/0x170
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> There is a potential race between __update_and_free_hugetlb_folio()
> and replace_free_hugepage_folios():
>
> CPU1 CPU2
> __update_and_free_hugetlb_folio replace_free_hugepage_folios
> folio_test_hugetlb(folio)
> -- It's still hugetlb folio.
>
> __folio_clear_hugetlb(folio)
> hugetlb_free_folio(folio)
> h = folio_hstate(folio)
> -- Here, h is NULL pointer
>
> When the above race condition occurs, folio_hstate(folio) returns
> NULL, and subsequent access to this NULL pointer will cause the
> system to crash. To resolve this issue, execute folio_hstate(folio)
> under the protection of the hugetlb_lock lock, ensuring that
> folio_hstate(folio) does not return NULL.
>
> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 11:49 ` Oscar Salvador
@ 2025-05-22 12:39 ` Muchun Song
2025-05-22 19:32 ` Oscar Salvador
0 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-22 12:39 UTC (permalink / raw)
To: Oscar Salvador
Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
> On May 22, 2025, at 19:49, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, May 22, 2025 at 07:34:56PM +0800, Ge Yang wrote:
>> It seems that we cannot simply remove the folio_test_hugetlb() check. The
>> reasons are as follows:
>
> Yeah, my thought was whether we could move the folio_hstate within
> alloc_and_dissolve_hugetlb_folio(), since the latter really needs to take the
> lock.
> But isolate_or_dissolve_huge_page() also needs the 'hstate' not only to
> pass it onto alloc_and_dissolve_hugetlb_folio() but to check whether
> hstate is gigantic.
But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
>
> Umh, kinda hate sparkling the locks all around.
>
>
> --
> Oscar Salvador
> SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 12:39 ` Muchun Song
@ 2025-05-22 19:32 ` Oscar Salvador
2025-05-23 3:27 ` Muchun Song
0 siblings, 1 reply; 18+ messages in thread
From: Oscar Salvador @ 2025-05-22 19:32 UTC (permalink / raw)
To: Muchun Song
Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
Yes, I think we can do that.
So something like the following (compily-tested only) maybe?
From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
From: Oscar Salvador <osalvador@suse.de>
Date: Thu, 22 May 2025 19:51:04 +0200
Subject: [PATCH] TMP
---
mm/hugetlb.c | 38 +++++++++-----------------------------
1 file changed, 9 insertions(+), 29 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bd8971388236..20f08de9e37d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2787,15 +2787,13 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
/*
* alloc_and_dissolve_hugetlb_folio - Allocate a new folio and dissolve
* the old one
- * @h: struct hstate old page belongs to
* @old_folio: Old folio to dissolve
* @list: List to isolate the page in case we need to
* Returns 0 on success, otherwise negated error.
*/
-static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
- struct folio *old_folio, struct list_head *list)
+static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio, struct list_head *list)
{
- gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+ struct hstate *h;
int nid = folio_nid(old_folio);
struct folio *new_folio = NULL;
int ret = 0;
@@ -2829,7 +2827,11 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
cond_resched();
goto retry;
} else {
+ h = folio_hstate(old_folio);
+
if (!new_folio) {
+ gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+
spin_unlock_irq(&hugetlb_lock);
new_folio = alloc_buddy_hugetlb_folio(h, gfp_mask, nid,
NULL, NULL);
@@ -2874,35 +2876,20 @@ static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list)
{
- struct hstate *h;
int ret = -EBUSY;
- /*
- * The page might have been dissolved from under our feet, so make sure
- * to carefully check the state under the lock.
- * Return success when racing as if we dissolved the page ourselves.
- */
- spin_lock_irq(&hugetlb_lock);
- if (folio_test_hugetlb(folio)) {
- h = folio_hstate(folio);
- } else {
- spin_unlock_irq(&hugetlb_lock);
- return 0;
- }
- spin_unlock_irq(&hugetlb_lock);
-
/*
* Fence off gigantic pages as there is a cyclic dependency between
* alloc_contig_range and them. Return -ENOMEM as this has the effect
* of bailing out right away without further retrying.
*/
- if (hstate_is_gigantic(h))
+ if (folio_order(folio) > MAX_PAGE_ORDER)
return -ENOMEM;
if (folio_ref_count(folio) && folio_isolate_hugetlb(folio, list))
ret = 0;
else if (!folio_ref_count(folio))
- ret = alloc_and_dissolve_hugetlb_folio(h, folio, list);
+ ret = alloc_and_dissolve_hugetlb_folio(folio, list);
return ret;
}
@@ -2916,7 +2903,6 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list)
*/
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
{
- struct hstate *h;
struct folio *folio;
int ret = 0;
@@ -2924,15 +2910,9 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
while (start_pfn < end_pfn) {
folio = pfn_folio(start_pfn);
- if (folio_test_hugetlb(folio)) {
- h = folio_hstate(folio);
- } else {
- start_pfn++;
- continue;
- }
if (!folio_ref_count(folio)) {
- ret = alloc_and_dissolve_hugetlb_folio(h, folio,
+ ret = alloc_and_dissolve_hugetlb_folio(folio,
&isolate_list);
if (ret)
break;
--
2.49.0
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 19:32 ` Oscar Salvador
@ 2025-05-23 3:27 ` Muchun Song
2025-05-23 3:46 ` Ge Yang
0 siblings, 1 reply; 18+ messages in thread
From: Muchun Song @ 2025-05-23 3:27 UTC (permalink / raw)
To: Oscar Salvador
Cc: Ge Yang, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
>
> Yes, I think we can do that.
> So something like the following (compily-tested only) maybe?
>
> From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
> From: Oscar Salvador <osalvador@suse.de>
> Date: Thu, 22 May 2025 19:51:04 +0200
> Subject: [PATCH] TMP
>
> ---
> mm/hugetlb.c | 38 +++++++++-----------------------------
> 1 file changed, 9 insertions(+), 29 deletions(-)
Pretty simple. The code LGTM.
Thanks.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-23 3:27 ` Muchun Song
@ 2025-05-23 3:46 ` Ge Yang
2025-05-23 3:56 ` Muchun Song
2025-05-23 5:30 ` Oscar Salvador
0 siblings, 2 replies; 18+ messages in thread
From: Ge Yang @ 2025-05-23 3:46 UTC (permalink / raw)
To: Muchun Song, Oscar Salvador
Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
liuzixing
在 2025/5/23 11:27, Muchun Song 写道:
>
>
>> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote:
>>
>> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
>>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
>>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
>>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
>>
>> Yes, I think we can do that.
>> So something like the following (compily-tested only) maybe?
>>
>> From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
>> From: Oscar Salvador <osalvador@suse.de>
>> Date: Thu, 22 May 2025 19:51:04 +0200
>> Subject: [PATCH] TMP
>>
>> ---
>> mm/hugetlb.c | 38 +++++++++-----------------------------
>> 1 file changed, 9 insertions(+), 29 deletions(-)
>
> Pretty simple. The code LGTM.
>
> Thanks.
Thanks.
The implementation of alloc_and_dissolve_hugetlb_folio differs between
kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to
submit another patch based on Oscar Salvador's suggestion.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-23 3:46 ` Ge Yang
@ 2025-05-23 3:56 ` Muchun Song
2025-05-23 5:30 ` Oscar Salvador
1 sibling, 0 replies; 18+ messages in thread
From: Muchun Song @ 2025-05-23 3:56 UTC (permalink / raw)
To: Ge Yang
Cc: Oscar Salvador, akpm, linux-mm, linux-kernel, stable, 21cnbao,
david, baolin.wang, liuzixing
> On May 23, 2025, at 11:46, Ge Yang <yangge1116@126.com> wrote:
>
>
>
> 在 2025/5/23 11:27, Muchun Song 写道:
>>> On May 23, 2025, at 03:32, Oscar Salvador <osalvador@suse.de> wrote:
>>>
>>> On Thu, May 22, 2025 at 08:39:39PM +0800, Muchun Song wrote:
>>>> But I think we could use "folio_order() > MAX_PAGE_ORDER" to replace the check
>>>> of hstate_is_gigantic(), right? Then ee could remove the first parameter of hstate
>>>> from alloc_and_dissolve_hugetlb_folio() and obtain hstate in it.
>>>
>>> Yes, I think we can do that.
>>> So something like the following (compily-tested only) maybe?
>>>
>>> From d7199339e905f83b54d22849e8f21f631916ce94 Mon Sep 17 00:00:00 2001
>>> From: Oscar Salvador <osalvador@suse.de>
>>> Date: Thu, 22 May 2025 19:51:04 +0200
>>> Subject: [PATCH] TMP
>>>
>>> ---
>>> mm/hugetlb.c | 38 +++++++++-----------------------------
>>> 1 file changed, 9 insertions(+), 29 deletions(-)
>> Pretty simple. The code LGTM.
>> Thanks.
>
> Thanks.
>
> The implementation of alloc_and_dissolve_hugetlb_folio differs between kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to submit another patch based on Oscar Salvador's suggestion.
A separate improving patch LGTM.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-23 3:46 ` Ge Yang
2025-05-23 3:56 ` Muchun Song
@ 2025-05-23 5:30 ` Oscar Salvador
2025-05-23 8:07 ` Ge Yang
1 sibling, 1 reply; 18+ messages in thread
From: Oscar Salvador @ 2025-05-23 5:30 UTC (permalink / raw)
To: Ge Yang
Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
On Fri, May 23, 2025 at 11:46:51AM +0800, Ge Yang wrote:
> The implementation of alloc_and_dissolve_hugetlb_folio differs between
> kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to
> submit another patch based on Oscar Salvador's suggestion.
If the code differs a lot between a stable version and upstream, the way
to go is to submit a specific patch for the stable one that applies
to that tree, e.g: [PATCH 6.6]...
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-23 5:30 ` Oscar Salvador
@ 2025-05-23 8:07 ` Ge Yang
0 siblings, 0 replies; 18+ messages in thread
From: Ge Yang @ 2025-05-23 8:07 UTC (permalink / raw)
To: Oscar Salvador
Cc: Muchun Song, akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
baolin.wang, liuzixing
在 2025/5/23 13:30, Oscar Salvador 写道:
> On Fri, May 23, 2025 at 11:46:51AM +0800, Ge Yang wrote:
>> The implementation of alloc_and_dissolve_hugetlb_folio differs between
>> kernel 6.6 and kernel 6.15. To facilitate backporting, I'm planning to
>> submit another patch based on Oscar Salvador's suggestion.
>
> If the code differs a lot between a stable version and upstream, the way
> to go is to submit a specific patch for the stable one that applies
> to that tree, e.g: [PATCH 6.6]...
>
>
Ok, thanks.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
2025-05-22 3:47 ` Muchun Song
2025-05-22 11:50 ` Oscar Salvador
@ 2025-05-26 12:41 ` David Hildenbrand
2025-05-26 12:57 ` Ge Yang
2 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2025-05-26 12:41 UTC (permalink / raw)
To: yangge1116, akpm
Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song,
osalvador, liuzixing
On 22.05.25 05:22, yangge1116@126.com wrote:
> From: Ge Yang <yangge1116@126.com>
>
> A kernel crash was observed when replacing free hugetlb folios:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000028
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
> FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> replace_free_hugepage_folios+0xb6/0x100
> alloc_contig_range_noprof+0x18a/0x590
> ? srso_return_thunk+0x5/0x5f
> ? down_read+0x12/0xa0
> ? srso_return_thunk+0x5/0x5f
> cma_range_alloc.constprop.0+0x131/0x290
> __cma_alloc+0xcf/0x2c0
> cma_alloc_write+0x43/0xb0
> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
> debugfs_attr_write+0x46/0x70
> full_proxy_write+0x62/0xa0
> vfs_write+0xf8/0x420
> ? srso_return_thunk+0x5/0x5f
> ? filp_flush+0x86/0xa0
> ? srso_return_thunk+0x5/0x5f
> ? filp_close+0x1f/0x30
> ? srso_return_thunk+0x5/0x5f
> ? do_dup2+0xaf/0x160
> ? srso_return_thunk+0x5/0x5f
> ksys_write+0x65/0xe0
> do_syscall_64+0x64/0x170
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> There is a potential race between __update_and_free_hugetlb_folio()
> and replace_free_hugepage_folios():
>
> CPU1 CPU2
> __update_and_free_hugetlb_folio replace_free_hugepage_folios
> folio_test_hugetlb(folio)
> -- It's still hugetlb folio.
>
> __folio_clear_hugetlb(folio)
> hugetlb_free_folio(folio)
> h = folio_hstate(folio)
> -- Here, h is NULL pointer
>
> When the above race condition occurs, folio_hstate(folio) returns
> NULL, and subsequent access to this NULL pointer will cause the
> system to crash. To resolve this issue, execute folio_hstate(folio)
> under the protection of the hugetlb_lock lock, ensuring that
> folio_hstate(folio) does not return NULL.
>
> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>
> ---
> mm/hugetlb.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3d3ca6b..6c2e007 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>
> while (start_pfn < end_pfn) {
> folio = pfn_folio(start_pfn);
> +
> + /*
> + * The folio might have been dissolved from under our feet, so make sure
> + * to carefully check the state under the lock.
> + */
> + spin_lock_irq(&hugetlb_lock);
> if (folio_test_hugetlb(folio)) {
> h = folio_hstate(folio);
> } else {
> + spin_unlock_irq(&hugetlb_lock);
> start_pfn++;
> continue;
> }
> + spin_unlock_irq(&hugetlb_lock);
As mentioned elsewhere, this will grab the hugetlb_lock for each and
every pfn in the range if there are no hugetlb folios (common case).
That should certainly *not* be done.
In case we see !folio_test_hugetlb(), we should just move on.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-26 12:41 ` David Hildenbrand
@ 2025-05-26 12:57 ` Ge Yang
2025-05-26 12:59 ` David Hildenbrand
0 siblings, 1 reply; 18+ messages in thread
From: Ge Yang @ 2025-05-26 12:57 UTC (permalink / raw)
To: David Hildenbrand, akpm
Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song,
osalvador, liuzixing
在 2025/5/26 20:41, David Hildenbrand 写道:
> On 22.05.25 05:22, yangge1116@126.com wrote:
>> From: Ge Yang <yangge1116@126.com>
>>
>> A kernel crash was observed when replacing free hugetlb folios:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000028
>> PGD 0 P4D 0
>> Oops: Oops: 0000 [#1] SMP NOPTI
>> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41
>> PREEMPT(voluntary)
>> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
>> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
>> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
>> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
>> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
>> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
>> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
>> FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
>> Call Trace:
>> <TASK>
>> replace_free_hugepage_folios+0xb6/0x100
>> alloc_contig_range_noprof+0x18a/0x590
>> ? srso_return_thunk+0x5/0x5f
>> ? down_read+0x12/0xa0
>> ? srso_return_thunk+0x5/0x5f
>> cma_range_alloc.constprop.0+0x131/0x290
>> __cma_alloc+0xcf/0x2c0
>> cma_alloc_write+0x43/0xb0
>> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
>> debugfs_attr_write+0x46/0x70
>> full_proxy_write+0x62/0xa0
>> vfs_write+0xf8/0x420
>> ? srso_return_thunk+0x5/0x5f
>> ? filp_flush+0x86/0xa0
>> ? srso_return_thunk+0x5/0x5f
>> ? filp_close+0x1f/0x30
>> ? srso_return_thunk+0x5/0x5f
>> ? do_dup2+0xaf/0x160
>> ? srso_return_thunk+0x5/0x5f
>> ksys_write+0x65/0xe0
>> do_syscall_64+0x64/0x170
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> There is a potential race between __update_and_free_hugetlb_folio()
>> and replace_free_hugepage_folios():
>>
>> CPU1 CPU2
>> __update_and_free_hugetlb_folio replace_free_hugepage_folios
>> folio_test_hugetlb(folio)
>> -- It's still hugetlb folio.
>>
>> __folio_clear_hugetlb(folio)
>> hugetlb_free_folio(folio)
>> h = folio_hstate(folio)
>> -- Here, h is NULL pointer
>>
>> When the above race condition occurs, folio_hstate(folio) returns
>> NULL, and subsequent access to this NULL pointer will cause the
>> system to crash. To resolve this issue, execute folio_hstate(folio)
>> under the protection of the hugetlb_lock lock, ensuring that
>> folio_hstate(folio) does not return NULL.
>>
>> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
>> Signed-off-by: Ge Yang <yangge1116@126.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>> mm/hugetlb.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 3d3ca6b..6c2e007 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long
>> start_pfn, unsigned long end_pfn)
>> while (start_pfn < end_pfn) {
>> folio = pfn_folio(start_pfn);
>> +
>> + /*
>> + * The folio might have been dissolved from under our feet,
>> so make sure
>> + * to carefully check the state under the lock.
>> + */
>> + spin_lock_irq(&hugetlb_lock);
>> if (folio_test_hugetlb(folio)) {
>> h = folio_hstate(folio);
>> } else {
>> + spin_unlock_irq(&hugetlb_lock);
>> start_pfn++;
>> continue;
>> }
>> + spin_unlock_irq(&hugetlb_lock);
>
> As mentioned elsewhere, this will grab the hugetlb_lock for each and
> every pfn in the range if there are no hugetlb folios (common case).
>
> That should certainly *not* be done.
>
> In case we see !folio_test_hugetlb(), we should just move on.
>
The main reason for acquiring the hugetlb_lock here is to obtain a valid
hstate, as the alloc_and_dissolve_hugetlb_folio() function requires
hstate as a parameter. This approach is indeed not performance-friendly.
However, in the patch available at
https://lore.kernel.org/lkml/1747987559-23082-1-git-send-email-yangge1116@126.com/,
all these operations will be removed.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
2025-05-26 12:57 ` Ge Yang
@ 2025-05-26 12:59 ` David Hildenbrand
0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2025-05-26 12:59 UTC (permalink / raw)
To: Ge Yang, akpm
Cc: linux-mm, linux-kernel, stable, 21cnbao, baolin.wang, muchun.song,
osalvador, liuzixing
On 26.05.25 14:57, Ge Yang wrote:
>
>
> 在 2025/5/26 20:41, David Hildenbrand 写道:
>> On 22.05.25 05:22, yangge1116@126.com wrote:
>>> From: Ge Yang <yangge1116@126.com>
>>>
>>> A kernel crash was observed when replacing free hugetlb folios:
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000028
>>> PGD 0 P4D 0
>>> Oops: Oops: 0000 [#1] SMP NOPTI
>>> CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41
>>> PREEMPT(voluntary)
>>> RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
>>> RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
>>> RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
>>> RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
>>> RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
>>> R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
>>> R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
>>> FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000)
>>> knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
>>> Call Trace:
>>> <TASK>
>>> replace_free_hugepage_folios+0xb6/0x100
>>> alloc_contig_range_noprof+0x18a/0x590
>>> ? srso_return_thunk+0x5/0x5f
>>> ? down_read+0x12/0xa0
>>> ? srso_return_thunk+0x5/0x5f
>>> cma_range_alloc.constprop.0+0x131/0x290
>>> __cma_alloc+0xcf/0x2c0
>>> cma_alloc_write+0x43/0xb0
>>> simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
>>> debugfs_attr_write+0x46/0x70
>>> full_proxy_write+0x62/0xa0
>>> vfs_write+0xf8/0x420
>>> ? srso_return_thunk+0x5/0x5f
>>> ? filp_flush+0x86/0xa0
>>> ? srso_return_thunk+0x5/0x5f
>>> ? filp_close+0x1f/0x30
>>> ? srso_return_thunk+0x5/0x5f
>>> ? do_dup2+0xaf/0x160
>>> ? srso_return_thunk+0x5/0x5f
>>> ksys_write+0x65/0xe0
>>> do_syscall_64+0x64/0x170
>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> There is a potential race between __update_and_free_hugetlb_folio()
>>> and replace_free_hugepage_folios():
>>>
>>> CPU1 CPU2
>>> __update_and_free_hugetlb_folio replace_free_hugepage_folios
>>> folio_test_hugetlb(folio)
>>> -- It's still hugetlb folio.
>>>
>>> __folio_clear_hugetlb(folio)
>>> hugetlb_free_folio(folio)
>>> h = folio_hstate(folio)
>>> -- Here, h is NULL pointer
>>>
>>> When the above race condition occurs, folio_hstate(folio) returns
>>> NULL, and subsequent access to this NULL pointer will cause the
>>> system to crash. To resolve this issue, execute folio_hstate(folio)
>>> under the protection of the hugetlb_lock lock, ensuring that
>>> folio_hstate(folio) does not return NULL.
>>>
>>> Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
>>> Signed-off-by: Ge Yang <yangge1116@126.com>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>> mm/hugetlb.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 3d3ca6b..6c2e007 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long
>>> start_pfn, unsigned long end_pfn)
>>> while (start_pfn < end_pfn) {
>>> folio = pfn_folio(start_pfn);
>>> +
>>> + /*
>>> + * The folio might have been dissolved from under our feet,
>>> so make sure
>>> + * to carefully check the state under the lock.
>>> + */
>>> + spin_lock_irq(&hugetlb_lock);
>>> if (folio_test_hugetlb(folio)) {
>>> h = folio_hstate(folio);
>>> } else {
>>> + spin_unlock_irq(&hugetlb_lock);
>>> start_pfn++;
>>> continue;
>>> }
>>> + spin_unlock_irq(&hugetlb_lock);
>>
>> As mentioned elsewhere, this will grab the hugetlb_lock for each and
>> every pfn in the range if there are no hugetlb folios (common case).
>>
>> That should certainly *not* be done.
>>
>> In case we see !folio_test_hugetlb(), we should just move on.
>>
>
> The main reason for acquiring the hugetlb_lock here is to obtain a valid
> hstate, as the alloc_and_dissolve_hugetlb_folio() function requires
> hstate as a parameter. This approach is indeed not performance-friendly.
> However, in the patch available at
> https://lore.kernel.org/lkml/1747987559-23082-1-git-send-email-yangge1116@126.com/,
> all these operations will be removed.
No, you still take locks on anything that is !folio_ref_count(folio).
We should really have an early folio_test_hugetlb(folio) check.
hugetlb is on most systems out there the absolute *corner case*.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-05-26 13:00 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22 3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
2025-05-22 3:47 ` Muchun Song
2025-05-22 5:34 ` Oscar Salvador
2025-05-22 7:13 ` Muchun Song
2025-05-22 10:13 ` Oscar Salvador
2025-05-22 11:34 ` Ge Yang
2025-05-22 11:49 ` Oscar Salvador
2025-05-22 12:39 ` Muchun Song
2025-05-22 19:32 ` Oscar Salvador
2025-05-23 3:27 ` Muchun Song
2025-05-23 3:46 ` Ge Yang
2025-05-23 3:56 ` Muchun Song
2025-05-23 5:30 ` Oscar Salvador
2025-05-23 8:07 ` Ge Yang
2025-05-22 11:50 ` Oscar Salvador
2025-05-26 12:41 ` David Hildenbrand
2025-05-26 12:57 ` Ge Yang
2025-05-26 12:59 ` David Hildenbrand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).