public inbox for linux-mm@kvack.org
* [PATCH v2] mm/hugetlb: fix memory offline failure due to hwpoisoned file hugetlb
@ 2026-03-21  2:10 Jinjiang Tu
  2026-03-21  2:50 ` Andrew Morton
  2026-03-23 12:14 ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 5+ messages in thread
From: Jinjiang Tu @ 2026-03-21  2:10 UTC (permalink / raw)
  To: akpm, muchun.song, osalvador, david, linmiaohe, nao.horiguchi,
	linux-mm
  Cc: wangkefeng.wang, sunnanyong, tujinjiang

When a file-backed hugetlb folio hits an uncorrectable error (UCE),
me_huge_page() keeps the folio in the page cache with an elevated refcount
and PG_hwpoison set. Even after the hugetlbfs file is deleted, the folio
is never freed, so it is leaked.

Offlining the memory block that contains the hwpoisoned hugetlb folio then
fails in dissolve_free_hugetlb_folios(), because the hwpoisoned folio is
not free.

I can reproduce this issue with the following steps in qemu:
 1) echo offline >/sys/devices/system/memory/auto_online_blocks
 2) in qemu monitor:
       object_add memory-backend-ram,id=mem10,size=1G
       device_add pc-dimm,id=dimm1,memdev=mem10,node=2
 3) echo online_movable > /sys/devices/system/node/node2/memory136/state
 4) echo 5 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
 5) run ./hugetlb_file. This process will receive SIGBUS.
 6) remove the hugetlbfs file.
 7) echo offline > /sys/devices/system/node/node2/memory136/state

hugetlb_file.c:
  fd = open("/dev/hugepages/my_hugepage_file", O_CREAT | O_RDWR, 0755);
  fallocate(fd, 0, 0, HUGEPAGE_SIZE * 2);
  addr = mmap(NULL, HUGEPAGE_SIZE * 2, PROT_READ | PROT_WRITE,
		MAP_SHARED | MAP_HUGETLB, fd, 0);
  memset(addr, 0xaa, HUGEPAGE_SIZE * 2);
  madvise(addr, HUGEPAGE_SIZE, MADV_HWPOISON);
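
The fragment above omits includes and error handling. A fuller version
might look like the sketch below. It assumes 2 MiB huge pages (matching
hugepages-2048kB in step 4) and hugetlbfs mounted at /dev/hugepages; the
helper name poison_first_hugepage() is hypothetical and not part of the
original test program.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB huge pages, as in hugepages-2048kB. */
#define HUGEPAGE_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper wrapping the calls from hugetlb_file.c.
 * Returns 0 if every call succeeded and -1 on the first failure.
 * Per the reproduction steps, the calling process is expected to
 * receive SIGBUS once the first huge page has been poisoned.
 */
int poison_first_hugepage(const char *path)
{
	size_t len = HUGEPAGE_SIZE * 2;
	char *addr;
	int fd, rc;

	fd = open(path, O_CREAT | O_RDWR, 0755);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* Back the file with two huge pages. */
	if (fallocate(fd, 0, 0, len) < 0) {
		perror("fallocate");
		close(fd);
		return -1;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_HUGETLB, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return -1;
	}

	/* Touch both huge pages so they are faulted into the page cache. */
	memset(addr, 0xaa, len);

	/* Inject a software-simulated memory error into the first huge page. */
	rc = madvise(addr, HUGEPAGE_SIZE, MADV_HWPOISON);
	if (rc < 0)
		perror("madvise");

	munmap(addr, len);
	close(fd);
	return rc < 0 ? -1 : 0;
}
```

Calling poison_first_hugepage("/dev/hugepages/my_hugepage_file") from
main() corresponds to step 5. MADV_HWPOISON needs CAP_SYS_ADMIN and a
kernel built with CONFIG_MEMORY_FAILURE.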

To fix it, drop the reference that memory-failure holds on a hwpoisoned
hugetlb folio during memory offline, so that the folio is freed and can be
dissolved. Races cannot be fully avoided here, as with commit b023f46813cd
("memory-hotplug: skip HWPoisoned page when offlining pages"), which skips
hwpoisoned pages regardless of their refcount.
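
The refcount reasoning can be illustrated with a small userspace model.
This is only a sketch: struct toy_folio and the toy_*() helpers are
hypothetical stand-ins, not kernel APIs, and a refcount of zero stands in
for "the folio is free".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
	int refcount;	/* models folio_ref_count() */
	bool hugetlb;	/* models folio_test_hugetlb() */
	bool hwpoison;	/* models folio_test_hwpoison() */
};

/* Models folio_put(): drop one reference. */
static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Models the relevant part of dissolve_free_hugetlb_folio():
 * a hugetlb folio that is still referenced cannot be dissolved.
 */
static int toy_dissolve(const struct toy_folio *f)
{
	if (!f->hugetlb)
		return 0;
	return f->refcount ? -EBUSY : 0;
}

/*
 * Models one iteration of the loop in dissolve_free_hugetlb_folios()
 * with the patch applied: drop the reference left on a hwpoisoned
 * hugetlb folio, then try to dissolve it.
 */
static int toy_offline_one(struct toy_folio *f)
{
	if (f->hwpoison && f->hugetlb && f->refcount)
		toy_put(f);
	return toy_dissolve(f);
}
```

In this model, a hwpoisoned hugetlb folio left with refcount 1 makes
toy_dissolve() return -EBUSY, while toy_offline_one() drops that reference
first and returns 0.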

Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/hugetlb.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 327eaa4074d3..b7d6c905b4b1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2115,6 +2115,15 @@ int dissolve_free_hugetlb_folios(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order) {
 		folio = pfn_folio(pfn);
+
+		/*
+		 * For a hwpoisoned hugetlb folio, drop the reference taken by
+		 * memory-failure so that the folio can be dissolved.
+		 */
+		if (unlikely(folio_test_hwpoison(folio) && folio_test_hugetlb(folio)
+				&& folio_ref_count(folio)))
+			folio_put(folio);
+
 		rc = dissolve_free_hugetlb_folio(folio);
 		if (rc)
 			break;
-- 
2.43.0



Thread overview: 5+ messages
2026-03-21  2:10 [PATCH v2] mm/hugetlb: fix memory offline failure due to hwpoisoned file hugetlb Jinjiang Tu
2026-03-21  2:50 ` Andrew Morton
2026-03-23 12:14 ` David Hildenbrand (Arm)
2026-03-24  6:41   ` Jinjiang Tu
2026-03-24  8:00     ` David Hildenbrand (Arm)
