linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
@ 2025-05-22  3:22 yangge1116
  2025-05-22  3:47 ` Muchun Song
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: yangge1116 @ 2025-05-22  3:22 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	muchun.song, osalvador, liuzixing, Ge Yang

From: Ge Yang <yangge1116@126.com>

A kernel crash was observed when replacing free hugetlb folios:

BUG: kernel NULL pointer dereference, address: 0000000000000028
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary)
RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0
RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000
RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000
RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000
R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000
R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004
FS:  00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0
Call Trace:
<TASK>
 replace_free_hugepage_folios+0xb6/0x100
 alloc_contig_range_noprof+0x18a/0x590
 ? srso_return_thunk+0x5/0x5f
 ? down_read+0x12/0xa0
 ? srso_return_thunk+0x5/0x5f
 cma_range_alloc.constprop.0+0x131/0x290
 __cma_alloc+0xcf/0x2c0
 cma_alloc_write+0x43/0xb0
 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110
 debugfs_attr_write+0x46/0x70
 full_proxy_write+0x62/0xa0
 vfs_write+0xf8/0x420
 ? srso_return_thunk+0x5/0x5f
 ? filp_flush+0x86/0xa0
 ? srso_return_thunk+0x5/0x5f
 ? filp_close+0x1f/0x30
 ? srso_return_thunk+0x5/0x5f
 ? do_dup2+0xaf/0x160
 ? srso_return_thunk+0x5/0x5f
 ksys_write+0x65/0xe0
 do_syscall_64+0x64/0x170
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

There is a potential race between __update_and_free_hugetlb_folio()
and replace_free_hugepage_folios():

CPU1                              CPU2
__update_and_free_hugetlb_folio   replace_free_hugepage_folios
                                    folio_test_hugetlb(folio)
                                    -- It's still hugetlb folio.

  __folio_clear_hugetlb(folio)
  hugetlb_free_folio(folio)
                                    h = folio_hstate(folio)
                                    -- Here, h is NULL pointer

When the above race condition occurs, folio_hstate(folio) returns
NULL, and subsequent access to this NULL pointer will cause the
system to crash. To resolve this issue, execute folio_hstate(folio)
under the protection of the hugetlb_lock lock, ensuring that
folio_hstate(folio) does not return NULL.

Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration")
Signed-off-by: Ge Yang <yangge1116@126.com>
Cc: <stable@vger.kernel.org>
---
 mm/hugetlb.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d3ca6b..6c2e007 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2924,12 +2924,20 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 
 	while (start_pfn < end_pfn) {
 		folio = pfn_folio(start_pfn);
+
+		/*
+		 * The folio might have been dissolved from under our feet, so make sure
+		 * to carefully check the state under the lock.
+		 */
+		spin_lock_irq(&hugetlb_lock);
 		if (folio_test_hugetlb(folio)) {
 			h = folio_hstate(folio);
 		} else {
+			spin_unlock_irq(&hugetlb_lock);
 			start_pfn++;
 			continue;
 		}
+		spin_unlock_irq(&hugetlb_lock);
 
 		if (!folio_ref_count(folio)) {
 			ret = alloc_and_dissolve_hugetlb_folio(h, folio,
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-05-26 13:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22  3:22 [PATCH] mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios yangge1116
2025-05-22  3:47 ` Muchun Song
2025-05-22  5:34   ` Oscar Salvador
2025-05-22  7:13     ` Muchun Song
2025-05-22 10:13       ` Oscar Salvador
2025-05-22 11:34     ` Ge Yang
2025-05-22 11:49       ` Oscar Salvador
2025-05-22 12:39         ` Muchun Song
2025-05-22 19:32           ` Oscar Salvador
2025-05-23  3:27             ` Muchun Song
2025-05-23  3:46               ` Ge Yang
2025-05-23  3:56                 ` Muchun Song
2025-05-23  5:30                 ` Oscar Salvador
2025-05-23  8:07                   ` Ge Yang
2025-05-22 11:50 ` Oscar Salvador
2025-05-26 12:41 ` David Hildenbrand
2025-05-26 12:57   ` Ge Yang
2025-05-26 12:59     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).