Linux Kernel Selftest development
 help / color / mirror / Atom feed
* [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages
@ 2026-05-11 15:38 Breno Leitao
  2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Breno Leitao @ 2026-05-11 15:38 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team, Lance Yang

Changes from previouis version

* Replaced the late, action-time recheck for MF_MSG_KERNEL_HIGH_ORDER
  with early failure classification inside get_any_page() (new patch
  2/4).

* David Hildenbrand pointed out that the recheck inspected refcount and
  folio mapping without holding a folio reference, which is unsafe
  (concurrent split can trigger VM_WARN_ON_FOLIO).

* Lance Yang suggested moving the disambiguation to the call site that
  still knows *why* the page reference could not be taken, which is what
  this version does via a new enum mf_get_page_status (MF_GET_PAGE_OK
  / RACE / UNHANDLABLE) plumbed out through get_hwpoison_page().

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v6:
- Dropped the selftest given the value was not clear
- Get the status of the failure from get_any_page()
- Small nits from different people/AIs.
- Link to v5: https://patch.msgid.link/20260424-ecc_panic-v5-0-a35f4b50425c@debian.org

Changes in v5:
- Add vm.panic_on_unrecoverable_memory_failure sysctl to panic on
  unrecoverable kernel page hwpoison events (reserved pages, refcount-0
  non-buddy pages, unknown state), with a recheck to avoid racing with
  concurrent buddy allocations. (Miaohe)
- Distinguish reserved pages as MF_MSG_KERNEL in memory_failure(),
  document the new sysctl in Documentation/admin-guide/sysctl/vm.rst,
  and add a selftest verifying SIGBUS recovery on userspace pages still
  works when the sysctl is enabled. (Miaohe)
- Added a selftest
- Link to v4:
  https://patch.msgid.link/20260415-ecc_panic-v4-0-2d0277f8f601@debian.org

Changes in v4:
- Drop CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option.
- Split the reserved page classification (MF_MSG_KERNEL) into its own
  patch, separate from the panic mechanism.
- Document why the buddy allocator TOCTOU race (between
  get_hwpoison_page() and is_free_buddy_page()) cannot cause false
  positives: PG_hwpoison is set beforehand and check_new_page() in the
  page allocator rejects hwpoisoned pages.
- Document the narrow LRU isolation race window for MF_MSG_UNKNOWN and
  its mitigation via identify_page_state()'s two-pass design.
- Explicitly document why MF_MSG_GET_HWPOISON is excluded from the
  panic conditions (shared path with transient races and non-reserved
  kernel memory).
- Link to v3: https://patch.msgid.link/20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org

Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
  as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
  similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
  how the retry mechanism mitigates false positives from buddy allocator
  races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org

Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
  instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
  instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org

To: Miaohe Lin <linmiaohe@huawei.com>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org

---
Breno Leitao (4):
      mm/memory-failure: report MF_MSG_KERNEL for reserved pages
      mm/memory-failure: classify get_any_page() failures by reason
      mm/memory-failure: add panic option for unrecoverable pages
      Documentation: document panic_on_unrecoverable_memory_failure sysctl

 Documentation/admin-guide/sysctl/vm.rst | 70 ++++++++++++++++++++++++++
 include/trace/events/memory-failure.h   |  2 +-
 mm/memory-failure.c                     | 89 ++++++++++++++++++++++++++++++---
 3 files changed, 152 insertions(+), 9 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260323-ecc_panic-4e473b83087c

Best regards,
--  
Breno Leitao <leitao@debian.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
  2026-05-11 15:38 [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
@ 2026-05-11 15:38 ` Breno Leitao
  2026-05-12  8:17   ` David Hildenbrand (Arm)
  2026-05-11 15:38 ` [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason Breno Leitao
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Breno Leitao @ 2026-05-11 15:38 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team, Lance Yang

When get_hwpoison_page() returns a negative value, distinguish
reserved pages from other failure cases by reporting MF_MSG_KERNEL
instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
and should be classified accordingly for proper handling.

Sample PG_reserved before the get_hwpoison_page() call. In the
MF_COUNT_INCREASED path get_any_page() can drop the caller's
reference before returning -EIO, after which the underlying page may
have been freed and reallocated with page->flags reset; reading
PageReserved(p) at that point would observe stale or unrelated state.
The pre-call snapshot reflects what the page actually was at the
time of the failure event.

Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/memory-failure.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 866c4428ac7ef..f112fb27a8ff6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
 	unsigned long page_flags;
 	bool retry = true;
 	int hugetlb = 0;
+	bool is_reserved;
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
 	 * In fact it's dangerous to directly bump up page count from 0,
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
+	/*
+	 * Pages with PG_reserved set are not currently managed by the
+	 * page allocator (memblock-reserved memory, driver reservations,
+	 * etc.), so classify them as kernel-owned for reporting.
+	 *
+	 * Sample the flag before get_hwpoison_page(): in the
+	 * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
+	 * reference before returning -EIO, after which page->flags may
+	 * have been reset by the allocator.
+	 */
+	is_reserved = PageReserved(p);
+
 	res = get_hwpoison_page(p, flags);
 	if (!res) {
 		if (is_free_buddy_page(p)) {
@@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
 		}
 		goto unlock_mutex;
 	} else if (res < 0) {
-		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+		if (is_reserved)
+			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		else
+			res = action_result(pfn, MF_MSG_GET_HWPOISON,
+					    MF_IGNORED);
 		goto unlock_mutex;
 	}
 

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason
  2026-05-11 15:38 [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
  2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
@ 2026-05-11 15:38 ` Breno Leitao
  2026-05-12  8:21   ` David Hildenbrand (Arm)
  2026-05-11 15:38 ` [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
  2026-05-11 15:38 ` [PATCH v6 4/4] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao
  3 siblings, 1 reply; 13+ messages in thread
From: Breno Leitao @ 2026-05-11 15:38 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team, Lance Yang

When get_any_page() fails to grab a page reference, the *reason* it
failed is known at the call site but is not surfaced to callers: the
HWPoisonHandlable() rejection path (a stable kernel page hwpoison cannot
handle — slab, vmalloc, page tables, kernel stacks, ...) and the
page_count() / put_page race paths (a transient page-allocator lifecycle
race) all collapse to a single negative errno by the time
memory_failure() sees them. memory_failure() can only observe the
conflated result and reports both as MF_MSG_GET_HWPOISON.

Surface the diagnosis explicitly. Add an mf_get_page_status enum,
plumbed out through get_any_page() and get_hwpoison_page() (NULL is
accepted by callers that do not care — unpoison_memory() and
soft_offline_page() pass NULL). get_any_page() sets the status at the
moment it gives up:

  MF_GET_PAGE_UNHANDLABLE — HWPoisonHandlable() rejected the page
                            after retries.
  MF_GET_PAGE_RACE        — exhausted retries on a refcount /
                            lifecycle race with the allocator.

memory_failure() then promotes the unhandlable case to MF_MSG_KERNEL
alongside the existing PageReserved branch, and leaves the
transient-race case as MF_MSG_GET_HWPOISON. This forms the foundation
a later patch will rely on to decide whether an unrecoverable failure
should panic.

Drop the "reserved" qualifier from action_page_types[MF_MSG_KERNEL]
and the matching tracepoint string in MF_PAGE_TYPE: the enum value
now covers both PageReserved pages and unhandlable kernel pages
(slab, vmalloc, page tables, kernel stacks, ...), so "kernel page"
is the accurate label for both populations.

Suggested-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/trace/events/memory-failure.h |  2 +-
 mm/memory-failure.c                   | 46 +++++++++++++++++++++++++++++------
 2 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
index aa57cc8f896be..8a860e6fcb4e9 100644
--- a/include/trace/events/memory-failure.h
+++ b/include/trace/events/memory-failure.h
@@ -24,7 +24,7 @@
 	EMe ( MF_RECOVERED, "Recovered" )
 
 #define MF_PAGE_TYPE		\
-	EM ( MF_MSG_KERNEL, "reserved kernel page" )			\
+	EM ( MF_MSG_KERNEL, "kernel page" )				\
 	EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )	\
 	EM ( MF_MSG_HUGE, "huge page" )					\
 	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f112fb27a8ff6..4210173060aac 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -878,7 +878,7 @@ static const char *action_name[] = {
 };
 
 static const char * const action_page_types[] = {
-	[MF_MSG_KERNEL]			= "reserved kernel page",
+	[MF_MSG_KERNEL]			= "kernel page",
 	[MF_MSG_KERNEL_HIGH_ORDER]	= "high-order kernel page",
 	[MF_MSG_HUGE]			= "huge page",
 	[MF_MSG_FREE_HUGE]		= "free huge page",
@@ -1389,11 +1389,29 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
 
 #define GET_PAGE_MAX_RETRY_NUM 3
 
-static int get_any_page(struct page *p, unsigned long flags)
+enum mf_get_page_status {
+	MF_GET_PAGE_OK = 0,
+	MF_GET_PAGE_RACE,
+	MF_GET_PAGE_UNHANDLABLE,
+};
+
+static void set_mf_get_page_status(enum mf_get_page_status *gp_status,
+				   enum mf_get_page_status value)
+{
+	if (!gp_status)
+		return;
+
+	*gp_status = value;
+}
+
+static int get_any_page(struct page *p, unsigned long flags,
+			enum mf_get_page_status *gp_status)
 {
 	int ret = 0, pass = 0;
 	bool count_increased = false;
 
+	set_mf_get_page_status(gp_status, MF_GET_PAGE_OK);
+
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
@@ -1406,11 +1424,13 @@ static int get_any_page(struct page *p, unsigned long flags)
 				if (pass++ < GET_PAGE_MAX_RETRY_NUM)
 					goto try_again;
 				ret = -EBUSY;
+				set_mf_get_page_status(gp_status, MF_GET_PAGE_RACE);
 			} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
 				/* We raced with put_page, retry. */
 				if (pass++ < GET_PAGE_MAX_RETRY_NUM)
 					goto try_again;
 				ret = -EIO;
+				set_mf_get_page_status(gp_status, MF_GET_PAGE_RACE);
 			}
 			goto out;
 		} else if (ret == -EBUSY) {
@@ -1423,6 +1443,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 				goto try_again;
 			}
 			ret = -EIO;
+			set_mf_get_page_status(gp_status, MF_GET_PAGE_UNHANDLABLE);
 			goto out;
 		}
 	}
@@ -1442,6 +1463,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		}
 		put_page(p);
 		ret = -EIO;
+		set_mf_get_page_status(gp_status, MF_GET_PAGE_UNHANDLABLE);
 	}
 out:
 	if (ret == -EIO)
@@ -1480,6 +1502,7 @@ static int __get_unpoison_page(struct page *page)
  * get_hwpoison_page() - Get refcount for memory error handling
  * @p:		Raw error page (hit by memory error)
  * @flags:	Flags controlling behavior of error handling
+ * @gp_status:	Optional output for the reason get_any_page() failed
  *
  * get_hwpoison_page() takes a page refcount of an error page to handle memory
  * error on it, after checking that the error page is in a well-defined state
@@ -1503,7 +1526,8 @@ static int __get_unpoison_page(struct page *page)
  *         operations like allocation and free,
  *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
  */
-static int get_hwpoison_page(struct page *p, unsigned long flags)
+static int get_hwpoison_page(struct page *p, unsigned long flags,
+			     enum mf_get_page_status *gp_status)
 {
 	int ret;
 
@@ -1511,7 +1535,7 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
 	if (flags & MF_UNPOISON)
 		ret = __get_unpoison_page(p);
 	else
-		ret = get_any_page(p, flags);
+		ret = get_any_page(p, flags, gp_status);
 	zone_pcp_enable(page_zone(p));
 
 	return ret;
@@ -2349,6 +2373,7 @@ int memory_failure(unsigned long pfn, int flags)
 	bool retry = true;
 	int hugetlb = 0;
 	bool is_reserved;
+	enum mf_get_page_status gp_status = MF_GET_PAGE_OK;
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -2424,7 +2449,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	is_reserved = PageReserved(p);
 
-	res = get_hwpoison_page(p, flags);
+	res = get_hwpoison_page(p, flags, &gp_status);
 	if (!res) {
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
@@ -2445,7 +2470,12 @@ int memory_failure(unsigned long pfn, int flags)
 		}
 		goto unlock_mutex;
 	} else if (res < 0) {
-		if (is_reserved)
+		/*
+		 * Promote a stable unhandlable kernel page diagnosed by
+		 * get_hwpoison_page() to MF_MSG_KERNEL alongside reserved
+		 * pages; transient lifecycle races stay as MF_MSG_GET_HWPOISON.
+		 */
+		if (is_reserved || gp_status == MF_GET_PAGE_UNHANDLABLE)
 			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
 		else
 			res = action_result(pfn, MF_MSG_GET_HWPOISON,
@@ -2750,7 +2780,7 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	ghp = get_hwpoison_page(p, MF_UNPOISON);
+	ghp = get_hwpoison_page(p, MF_UNPOISON, NULL);
 	if (!ghp) {
 		if (folio_test_hugetlb(folio)) {
 			huge = true;
@@ -2957,7 +2987,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 retry:
 	get_online_mems();
-	ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE);
+	ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE, NULL);
 	put_online_mems();
 
 	if (hwpoison_filter(page)) {

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages
  2026-05-11 15:38 [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
  2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
  2026-05-11 15:38 ` [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason Breno Leitao
@ 2026-05-11 15:38 ` Breno Leitao
  2026-05-12  8:22   ` David Hildenbrand (Arm)
  2026-05-11 15:38 ` [PATCH v6 4/4] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao
  3 siblings, 1 reply; 13+ messages in thread
From: Breno Leitao @ 2026-05-11 15:38 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team

Add a sysctl panic_on_unrecoverable_memory_failure that triggers a
kernel panic when memory_failure() encounters pages that cannot be
recovered. This provides a clean crash with useful debug information
rather than allowing silent data corruption or a delayed crash at an
unrelated code path.

Panic eligibility is intentionally narrow: only MF_MSG_KERNEL with
result == MF_IGNORED panics. That covers reserved pages (PageReserved)
and the kernel page types that the prior patch promotes from
MF_MSG_GET_HWPOISON via MF_GET_PAGE_UNHANDLABLE — slab, vmalloc, page
tables, kernel stacks, and similar non-LRU/non-buddy kernel-owned pages.

All other action types are excluded:

- MF_MSG_GET_HWPOISON and MF_MSG_KERNEL_HIGH_ORDER can be reached by
  transient refcount races with the page allocator (an in-flight buddy
  allocation has refcount 0 and is no longer on the buddy free list,
  briefly), and panicking on them would risk killing the box for what
  is actually a recoverable userspace page.

- MF_MSG_UNKNOWN means identify_page_state() could not classify the
  page; that is precisely the wrong basis for a panic decision.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/memory-failure.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4210173060aac..e4a9ceacaf36b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -1281,6 +1292,18 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+				      enum mf_result result)
+{
+	if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
+		return false;
+
+	if (type == MF_MSG_KERNEL)
+		return true;
+
+	return false;
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1298,6 +1321,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 
+	if (panic_on_unrecoverable_mf(type, result))
+		panic("Memory failure: %#lx: unrecoverable page", pfn);
+
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 4/4] Documentation: document panic_on_unrecoverable_memory_failure sysctl
  2026-05-11 15:38 [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
                   ` (2 preceding siblings ...)
  2026-05-11 15:38 ` [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
@ 2026-05-11 15:38 ` Breno Leitao
  3 siblings, 0 replies; 13+ messages in thread
From: Breno Leitao @ 2026-05-11 15:38 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team

Add documentation for the new vm.panic_on_unrecoverable_memory_failure
sysctl, describing which failures trigger a panic (kernel-owned pages
the handler cannot recover) and which are intentionally left out
(transient allocator races and unclassified pages).

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/sysctl/vm.rst | 70 +++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 97e12359775c9..802c51ba8c43b 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,75 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
+that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation.  This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+When enabled, this sysctl triggers a panic on kernel-owned pages that
+the memory failure handler cannot recover: reserved pages
+(``PageReserved``) and stable kernel pages that hwpoison cannot handle
+(slab, vmalloc, page tables, kernel stacks, and similar non-LRU,
+non-buddy pages).
+
+Other failure paths are intentionally left out because they can be
+reached by transient races with the page allocator (an in-flight
+buddy allocation has refcount 0 and is no longer on the buddy free
+list, briefly), and panicking on them would risk killing the box for
+a page that was actually destined for userspace where the standard
+SIGBUS recovery path applies. Pages whose state could not be
+classified at all are also not covered, since an unknown state is
+not a sound basis for a panic decision.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+Use cases
+---------
+
+This option is most useful in environments where unattributed crashes
+are expensive to debug or where data integrity must take precedence
+over availability:
+
+* Large fleets, where multi-bit ECC errors on kernel pages are observed
+  regularly and post-mortem analysis of an unrelated downstream crash
+  (often seconds to minutes after the original error) consumes
+  significant engineering effort.
+
+* Systems configured with kdump, where panicking at the moment of the
+  hardware error produces a vmcore that still contains the faulting
+  address, the affected page state, and the originating MCE/GHES
+  record — context that is typically lost by the time a delayed crash
+  occurs.
+
+* High-availability clusters that rely on fast, deterministic node
+  failure for failover, and prefer an immediate panic over silent data
+  corruption propagating to replicas or persistent storage.
+
+* Kernel and platform developers reproducing hwpoison issues with
+  tools such as ``mce-inject`` or error-injection debugfs interfaces,
+  where panicking on the unrecoverable path makes regressions
+  immediately visible instead of surfacing as later, unrelated
+  failures.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= =====================================================================
+
+Example::
+
+     echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
  2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
@ 2026-05-12  8:17   ` David Hildenbrand (Arm)
  2026-05-12 12:48     ` Lance Yang
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-12  8:17 UTC (permalink / raw)
  To: Breno Leitao, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
	Jonathan Corbet, Shuah Khan, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team, Lance Yang

On 5/11/26 17:38, Breno Leitao wrote:
> When get_hwpoison_page() returns a negative value, distinguish
> reserved pages from other failure cases by reporting MF_MSG_KERNEL
> instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
> and should be classified accordingly for proper handling.
> 
> Sample PG_reserved before the get_hwpoison_page() call. In the
> MF_COUNT_INCREASED path get_any_page() can drop the caller's
> reference before returning -EIO, after which the underlying page may
> have been freed and reallocated with page->flags reset; reading
> PageReserved(p) at that point would observe stale or unrelated state.
> The pre-call snapshot reflects what the page actually was at the
> time of the failure event.
> 
> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  mm/memory-failure.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 866c4428ac7ef..f112fb27a8ff6 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
>  	unsigned long page_flags;
>  	bool retry = true;
>  	int hugetlb = 0;
> +	bool is_reserved;
>  
>  	if (!sysctl_memory_failure_recovery)
>  		panic("Memory failure on page %lx", pfn);
> @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
>  	 * In fact it's dangerous to directly bump up page count from 0,
>  	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>  	 */
> +	/*
> +	 * Pages with PG_reserved set are not currently managed by the
> +	 * page allocator (memblock-reserved memory, driver reservations,
> +	 * etc.), so classify them as kernel-owned for reporting.
> +	 *
> +	 * Sample the flag before get_hwpoison_page(): in the
> +	 * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
> +	 * reference before returning -EIO, after which page->flags may
> +	 * have been reset by the allocator.
> +	 */
> +	is_reserved = PageReserved(p);
> +
>  	res = get_hwpoison_page(p, flags);
>  	if (!res) {
>  		if (is_free_buddy_page(p)) {
> @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
>  		}
>  		goto unlock_mutex;
>  	} else if (res < 0) {
> -		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
> +		if (is_reserved)
> +			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> +		else
> +			res = action_result(pfn, MF_MSG_GET_HWPOISON,
> +					    MF_IGNORED);
>  		goto unlock_mutex;
>  	}
>  
> 

It's a bit odd that we need this handling when we already have handling for
reserved pages in error_states[].

HWPoisonHandlable() would always essentially reject PG_reserved pages. So
__get_hwpoison_page() ... would always fail? Making
get_hwpoison_page()->get_any_page() always fail?

But then, we never call identify_page_state()? And never call me_kernel()?

This all looks very odd.

Why would you even want to call get_hwpoison_page() in the first place if you
find PageReserved?

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason
  2026-05-11 15:38 ` [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason Breno Leitao
@ 2026-05-12  8:21   ` David Hildenbrand (Arm)
  2026-05-12 13:33     ` Breno Leitao
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-12  8:21 UTC (permalink / raw)
  To: Breno Leitao, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
	Jonathan Corbet, Shuah Khan, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team, Lance Yang


>  		}
>  		goto unlock_mutex;
>  	} else if (res < 0) {
> -		if (is_reserved)
> +		/*
> +		 * Promote a stable unhandlable kernel page diagnosed by
> +		 * get_hwpoison_page() to MF_MSG_KERNEL alongside reserved
> +		 * pages; transient lifecycle races stay as MF_MSG_GET_HWPOISON.
> +		 */
> +		if (is_reserved || gp_status == MF_GET_PAGE_UNHANDLABLE)
>  			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);


It's all a bit of a mess. get_hwpoison_page() should just indicate that a page
is unhandable if it is PG_reserved?

Why can't we just return a special error code from  get_hwpoison_page()? We ahve
plenty of errno values to chose from.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages
  2026-05-11 15:38 ` [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
@ 2026-05-12  8:22   ` David Hildenbrand (Arm)
  2026-05-12 13:05     ` Breno Leitao
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-12  8:22 UTC (permalink / raw)
  To: Breno Leitao, Miaohe Lin, Naoya Horiguchi, Andrew Morton,
	Jonathan Corbet, Shuah Khan, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team


> @@ -1281,6 +1292,18 @@ static void update_per_node_mf_stats(unsigned long pfn,
>  	++mf_stats->total;
>  }
>  
> +static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
> +				      enum mf_result result)
> +{
> +	if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
> +		return false;
> +
> +	if (type == MF_MSG_KERNEL)
> +		return true;
> +
> +	return false;

return type == MF_MSG_KERNEL;

might be simpler.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
  2026-05-12  8:17   ` David Hildenbrand (Arm)
@ 2026-05-12 12:48     ` Lance Yang
  2026-05-12 13:04     ` Breno Leitao
  2026-05-12 17:58     ` jane.chu
  2 siblings, 0 replies; 13+ messages in thread
From: Lance Yang @ 2026-05-12 12:48 UTC (permalink / raw)
  To: david
  Cc: leitao, linmiaohe, nao.horiguchi, akpm, corbet, skhan, ljs,
	vbabka, rppt, surenb, mhocko, shuah, rostedt, mhiramat,
	mathieu.desnoyers, liam, linux-mm, linux-kernel, linux-doc,
	linux-kselftest, linux-trace-kernel, kernel-team, lance.yang


On Tue, May 12, 2026 at 10:17:00AM +0200, David Hildenbrand (Arm) wrote:
>On 5/11/26 17:38, Breno Leitao wrote:
>> When get_hwpoison_page() returns a negative value, distinguish
>> reserved pages from other failure cases by reporting MF_MSG_KERNEL
>> instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
>> and should be classified accordingly for proper handling.
>> 
>> Sample PG_reserved before the get_hwpoison_page() call. In the
>> MF_COUNT_INCREASED path get_any_page() can drop the caller's
>> reference before returning -EIO, after which the underlying page may
>> have been freed and reallocated with page->flags reset; reading
>> PageReserved(p) at that point would observe stale or unrelated state.
>> The pre-call snapshot reflects what the page actually was at the
>> time of the failure event.
>> 
>> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>>  mm/memory-failure.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 866c4428ac7ef..f112fb27a8ff6 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
>>  	unsigned long page_flags;
>>  	bool retry = true;
>>  	int hugetlb = 0;
>> +	bool is_reserved;
>>  
>>  	if (!sysctl_memory_failure_recovery)
>>  		panic("Memory failure on page %lx", pfn);
>> @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
>>  	 * In fact it's dangerous to directly bump up page count from 0,
>>  	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>>  	 */
>> +	/*
>> +	 * Pages with PG_reserved set are not currently managed by the
>> +	 * page allocator (memblock-reserved memory, driver reservations,
>> +	 * etc.), so classify them as kernel-owned for reporting.
>> +	 *
>> +	 * Sample the flag before get_hwpoison_page(): in the
>> +	 * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
>> +	 * reference before returning -EIO, after which page->flags may
>> +	 * have been reset by the allocator.
>> +	 */
>> +	is_reserved = PageReserved(p);
>> +
>>  	res = get_hwpoison_page(p, flags);
>>  	if (!res) {
>>  		if (is_free_buddy_page(p)) {
>> @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
>>  		}
>>  		goto unlock_mutex;
>>  	} else if (res < 0) {
>> -		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>> +		if (is_reserved)
>> +			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>> +		else
>> +			res = action_result(pfn, MF_MSG_GET_HWPOISON,
>> +					    MF_IGNORED);
>>  		goto unlock_mutex;
>>  	}
>>  
>> 
>
>It's a bit odd that we need this handling when we already have handling for
>reserved pages in error_states[].
>
>HWPoisonHandlable() would always essentially reject PG_reserved pages. So
>__get_hwpoison_page() ... would always fail? Making
>get_hwpoison_page()->get_any_page() always fail?
>
>But then, we never call identify_page_state()? And never call me_kernel()?

Looks like we never get that far ...

>This all looks very odd.
>
>Why would you even want to call get_hwpoison_page() in the first place if you
>find PageReserved?

Ah, I see :)

For a PG_reserved page, I would not expect PageLRU to be set, nor would
I expect it to be in the buddy allocator.

include/linux/page-flags.h also says:

"
Once (if ever) freed, PG_reserved is cleared and they will be given to
the page allocator.
"

So maybe special-case PageReserved() before get_hwpoison_page()?
Something like:

	if (PageReserved(p)){
		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
		goto unlock_mutex; 	
	}

	res = get_hwpoison_page(p, flags, &gp_status);

Cheers, Lance

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
  2026-05-12  8:17   ` David Hildenbrand (Arm)
  2026-05-12 12:48     ` Lance Yang
@ 2026-05-12 13:04     ` Breno Leitao
  2026-05-12 17:58     ` jane.chu
  2 siblings, 0 replies; 13+ messages in thread
From: Breno Leitao @ 2026-05-12 13:04 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett, linux-mm,
	linux-kernel, linux-doc, linux-kselftest, linux-trace-kernel,
	kernel-team, Lance Yang

On Tue, May 12, 2026 at 10:17:00AM +0200, David Hildenbrand (Arm) wrote:
> > @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
> >  	unsigned long page_flags;
> >  	bool retry = true;
> >  	int hugetlb = 0;
> > +	bool is_reserved;
> >  
> >  	if (!sysctl_memory_failure_recovery)
> >  		panic("Memory failure on page %lx", pfn);
> > @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
> >  	 * In fact it's dangerous to directly bump up page count from 0,
> >  	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
> >  	 */
> > +	/*
> > +	 * Pages with PG_reserved set are not currently managed by the
> > +	 * page allocator (memblock-reserved memory, driver reservations,
> > +	 * etc.), so classify them as kernel-owned for reporting.
> > +	 *
> > +	 * Sample the flag before get_hwpoison_page(): in the
> > +	 * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
> > +	 * reference before returning -EIO, after which page->flags may
> > +	 * have been reset by the allocator.
> > +	 */
> > +	is_reserved = PageReserved(p);
> > +
> >  	res = get_hwpoison_page(p, flags);
> >  	if (!res) {
> >  		if (is_free_buddy_page(p)) {
> > @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
> >  		}
> >  		goto unlock_mutex;
> >  	} else if (res < 0) {
> > -		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
> > +		if (is_reserved)
> > +			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> > +		else
> > +			res = action_result(pfn, MF_MSG_GET_HWPOISON,
> > +					    MF_IGNORED);
> >  		goto unlock_mutex;
> >  	}
> >  
> > 
> 
> It's a bit odd that we need this handling when we already have handling for
> reserved pages in error_states[].
> 
> HWPoisonHandlable() would always essentially reject PG_reserved pages. So
> __get_hwpoison_page() ... would always fail? Making
> get_hwpoison_page()->get_any_page() always fail?
> 
> But then, we never call identify_page_state()? And never call me_kernel()?

From what I read, it seems that error_states[0] = { reserved, reserved, MF_MSG_KERNEL, me_kernel }
has been effectively dead code on the hwpoison-from-MCE path for a
while.

My v6 patch relabels the failure-path output to match what me_kernel() would
have reported anyway.

> This all looks very odd.
> 
> Why would you even want to call get_hwpoison_page() in the first place if you
> find PageReserved?

Are you suggesting we should all the page action as soon as we detect the page
is reserved and get out?

Something as:

    if (PageReserved(p)) {
        res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
        goto unlock_mutex;
    }

    res = get_hwpoison_page(p, flags);

Thanks for the review,
--breno

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages
  2026-05-12  8:22   ` David Hildenbrand (Arm)
@ 2026-05-12 13:05     ` Breno Leitao
  0 siblings, 0 replies; 13+ messages in thread
From: Breno Leitao @ 2026-05-12 13:05 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett, linux-mm,
	linux-kernel, linux-doc, linux-kselftest, linux-trace-kernel,
	kernel-team

On Tue, May 12, 2026 at 10:22:38AM +0200, David Hildenbrand (Arm) wrote:
> 
> > @@ -1281,6 +1292,18 @@ static void update_per_node_mf_stats(unsigned long pfn,
> >  	++mf_stats->total;
> >  }
> >  
> > +static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
> > +				      enum mf_result result)
> > +{
> > +	if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
> > +		return false;
> > +
> > +	if (type == MF_MSG_KERNEL)
> > +		return true;
> > +
> > +	return false;
> 
> return type == MF_MSG_KERNEL;
> 
> might be simpler.

Ack, I will update once we decide about the other pendencies.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason
  2026-05-12  8:21   ` David Hildenbrand (Arm)
@ 2026-05-12 13:33     ` Breno Leitao
  0 siblings, 0 replies; 13+ messages in thread
From: Breno Leitao @ 2026-05-12 13:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett, linux-mm,
	linux-kernel, linux-doc, linux-kselftest, linux-trace-kernel,
	kernel-team, Lance Yang

On Tue, May 12, 2026 at 10:21:50AM +0200, David Hildenbrand (Arm) wrote:
> 
> >  		}
> >  		goto unlock_mutex;
> >  	} else if (res < 0) {
> > -		if (is_reserved)
> > +		/*
> > +		 * Promote a stable unhandlable kernel page diagnosed by
> > +		 * get_hwpoison_page() to MF_MSG_KERNEL alongside reserved
> > +		 * pages; transient lifecycle races stay as MF_MSG_GET_HWPOISON.
> > +		 */
> > +		if (is_reserved || gp_status == MF_GET_PAGE_UNHANDLABLE)
> >  			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> 
> 
> It's all a bit of a mess. get_hwpoison_page() should just indicate that a page
> is unhandable if it is PG_reserved?

Are you saying that we should identify if the page is PG_reserved in
get_hwpoison_page() instead of in memory_failure(), as done in the
previous patch ("mm/memory-failure: report MF_MSG_KERNEL for reserved
pages") ?

> Why can't we just return a special error code from  get_hwpoison_page()? We ahve
> plenty of errno values to chose from.

Something like:


diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 866c4428ac7ef..0a6d83575833e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -878,7 +878,7 @@ static const char *action_name[] = {
 };
 
 static const char * const action_page_types[] = {
-	[MF_MSG_KERNEL]			= "reserved kernel page",
+	[MF_MSG_KERNEL]			= "unrecoverable kernel page",
 	[MF_MSG_KERNEL_HIGH_ORDER]	= "high-order kernel page",
 	[MF_MSG_HUGE]			= "huge page",
 	[MF_MSG_FREE_HUGE]		= "free huge page",
@@ -1394,6 +1394,21 @@ static int get_any_page(struct page *p, unsigned long flags)
 	int ret = 0, pass = 0;
 	bool count_increased = false;

+	if (PageReserved(p)) {
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
@@ -1422,7 +1437,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 				shake_page(p);
 				goto try_again;
 			}
-			ret = -EIO;
+			ret = -ENOTRECOVERABLE;
 			goto out;
 		}
 	}
@@ -1441,10 +1456,10 @@ static int get_any_page(struct page *p, unsigned long flags)
 			goto try_again;
 		}
 		put_page(p);
-		ret = -EIO;
+		ret = -ENOTRECOVERABLE;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -2431,6 +2448,9 @@ int memory_failure(unsigned long pfn, int flags)
 			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 		}
 		goto unlock_mutex;
+	} else if (res == -ENOTRECOVERABLE) {
+		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		goto unlock_mutex;
 	} else if (res < 0) {
 		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 		goto unlock_mutex;


If that is what you are suggestion, maybe we can create another
MF_MSG_RESERVED? and another return value for get_any_page() to track
the reserve pages ?

Thanks for the review and suggestions,
--breno

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
  2026-05-12  8:17   ` David Hildenbrand (Arm)
  2026-05-12 12:48     ` Lance Yang
  2026-05-12 13:04     ` Breno Leitao
@ 2026-05-12 17:58     ` jane.chu
  2 siblings, 0 replies; 13+ messages in thread
From: jane.chu @ 2026-05-12 17:58 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Breno Leitao, Miaohe Lin,
	Naoya Horiguchi, Andrew Morton, Jonathan Corbet, Shuah Khan,
	Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team, Lance Yang



On 5/12/2026 1:17 AM, David Hildenbrand (Arm) wrote:
> On 5/11/26 17:38, Breno Leitao wrote:
>> When get_hwpoison_page() returns a negative value, distinguish
>> reserved pages from other failure cases by reporting MF_MSG_KERNEL
>> instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
>> and should be classified accordingly for proper handling.
>>
>> Sample PG_reserved before the get_hwpoison_page() call. In the
>> MF_COUNT_INCREASED path get_any_page() can drop the caller's
>> reference before returning -EIO, after which the underlying page may
>> have been freed and reallocated with page->flags reset; reading
>> PageReserved(p) at that point would observe stale or unrelated state.
>> The pre-call snapshot reflects what the page actually was at the
>> time of the failure event.
>>
>> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>>   mm/memory-failure.c | 19 ++++++++++++++++++-
>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 866c4428ac7ef..f112fb27a8ff6 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
>>   	unsigned long page_flags;
>>   	bool retry = true;
>>   	int hugetlb = 0;
>> +	bool is_reserved;
>>   
>>   	if (!sysctl_memory_failure_recovery)
>>   		panic("Memory failure on page %lx", pfn);
>> @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
>>   	 * In fact it's dangerous to directly bump up page count from 0,
>>   	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>>   	 */
>> +	/*
>> +	 * Pages with PG_reserved set are not currently managed by the
>> +	 * page allocator (memblock-reserved memory, driver reservations,
>> +	 * etc.), so classify them as kernel-owned for reporting.
>> +	 *
>> +	 * Sample the flag before get_hwpoison_page(): in the
>> +	 * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
>> +	 * reference before returning -EIO, after which page->flags may
>> +	 * have been reset by the allocator.
>> +	 */
>> +	is_reserved = PageReserved(p);
>> +
>>   	res = get_hwpoison_page(p, flags);
>>   	if (!res) {
>>   		if (is_free_buddy_page(p)) {
>> @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
>>   		}
>>   		goto unlock_mutex;
>>   	} else if (res < 0) {
>> -		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>> +		if (is_reserved)
>> +			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>> +		else
>> +			res = action_result(pfn, MF_MSG_GET_HWPOISON,
>> +					    MF_IGNORED);
>>   		goto unlock_mutex;
>>   	}
>>   
>>
> 
> It's a bit odd that we need this handling when we already have handling for
> reserved pages in error_states[].
> 
> HWPoisonHandlable() would always essentially reject PG_reserved pages. So
> __get_hwpoison_page() ... would always fail? Making
> get_hwpoison_page()->get_any_page() always fail?
> 
> But then, we never call identify_page_state()? And never call me_kernel()?
> 
> This all looks very odd.
> 
> Why would you even want to call get_hwpoison_page() in the first place if you
> find PageReserved?
> 

Ah, good point!
It seems to me that all unhandable pages should head out to 
identify_page_state:

--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2411,6 +2411,10 @@ int memory_failure(unsigned long pfn, int flags)
          * In fact it's dangerous to directly bump up page count from 0,
          * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
          */
+
+       if (!HWPoisonHandlable(page, flags)
+               goto identify_page_state;
+
         res = get_hwpoison_page(p, flags);
         if (!res) {
                 if (is_free_buddy_page(p)) {

thanks,
-jane





^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-12 17:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-11 15:38 [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
2026-05-12  8:17   ` David Hildenbrand (Arm)
2026-05-12 12:48     ` Lance Yang
2026-05-12 13:04     ` Breno Leitao
2026-05-12 17:58     ` jane.chu
2026-05-11 15:38 ` [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason Breno Leitao
2026-05-12  8:21   ` David Hildenbrand (Arm)
2026-05-12 13:33     ` Breno Leitao
2026-05-11 15:38 ` [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
2026-05-12  8:22   ` David Hildenbrand (Arm)
2026-05-12 13:05     ` Breno Leitao
2026-05-11 15:38 ` [PATCH v6 4/4] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox