Linux Documentation
 help / color / mirror / Atom feed
* [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages
@ 2026-05-11 15:38 Breno Leitao
  2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Breno Leitao @ 2026-05-11 15:38 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team, Lance Yang

Changes from previouis version

* Replaced the late, action-time recheck for MF_MSG_KERNEL_HIGH_ORDER
  with early failure classification inside get_any_page() (new patch
  2/4).

* David Hildenbrand pointed out that the recheck inspected refcount and
  folio mapping without holding a folio reference, which is unsafe
  (concurrent split can trigger VM_WARN_ON_FOLIO).

* Lance Yang suggested moving the disambiguation to the call site that
  still knows *why* the page reference could not be taken, which is what
  this version does via a new enum mf_get_page_status (MF_GET_PAGE_OK
  / RACE / UNHANDLABLE) plumbed out through get_hwpoison_page().

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v6:
- Dropped the selftest given the value was not clear
- Get the status of the failure from get_any_page()
- Small nits from different people/AIs.
- Link to v5: https://patch.msgid.link/20260424-ecc_panic-v5-0-a35f4b50425c@debian.org

Changes in v5:
- Add vm.panic_on_unrecoverable_memory_failure sysctl to panic on
  unrecoverable kernel page hwpoison events (reserved pages, refcount-0
  non-buddy pages, unknown state), with a recheck to avoid racing with
  concurrent buddy allocations. (Miaohe)
- Distinguish reserved pages as MF_MSG_KERNEL in memory_failure(),
  document the new sysctl in Documentation/admin-guide/sysctl/vm.rst,
  and add a selftest verifying SIGBUS recovery on userspace pages still
  works when the sysctl is enabled. (Miaohe)
- Added a selftest
- Link to v4:
  https://patch.msgid.link/20260415-ecc_panic-v4-0-2d0277f8f601@debian.org

Changes in v4:
- Drop CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option.
- Split the reserved page classification (MF_MSG_KERNEL) into its own
  patch, separate from the panic mechanism.
- Document why the buddy allocator TOCTOU race (between
  get_hwpoison_page() and is_free_buddy_page()) cannot cause false
  positives: PG_hwpoison is set beforehand and check_new_page() in the
  page allocator rejects hwpoisoned pages.
- Document the narrow LRU isolation race window for MF_MSG_UNKNOWN and
  its mitigation via identify_page_state()'s two-pass design.
- Explicitly document why MF_MSG_GET_HWPOISON is excluded from the
  panic conditions (shared path with transient races and non-reserved
  kernel memory).
- Link to v3: https://patch.msgid.link/20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org

Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
  as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
  similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
  how the retry mechanism mitigates false positives from buddy allocator
  races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org

Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
  instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
  instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org

To: Miaohe Lin <linmiaohe@huawei.com>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org

---
Breno Leitao (4):
      mm/memory-failure: report MF_MSG_KERNEL for reserved pages
      mm/memory-failure: classify get_any_page() failures by reason
      mm/memory-failure: add panic option for unrecoverable pages
      Documentation: document panic_on_unrecoverable_memory_failure sysctl

 Documentation/admin-guide/sysctl/vm.rst | 70 ++++++++++++++++++++++++++
 include/trace/events/memory-failure.h   |  2 +-
 mm/memory-failure.c                     | 89 ++++++++++++++++++++++++++++++---
 3 files changed, 152 insertions(+), 9 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260323-ecc_panic-4e473b83087c

Best regards,
--  
Breno Leitao <leitao@debian.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-12 17:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-11 15:38 [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
2026-05-12  8:17   ` David Hildenbrand (Arm)
2026-05-12 12:48     ` Lance Yang
2026-05-12 13:04     ` Breno Leitao
2026-05-12 17:58     ` jane.chu
2026-05-11 15:38 ` [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason Breno Leitao
2026-05-12  8:21   ` David Hildenbrand (Arm)
2026-05-12 13:33     ` Breno Leitao
2026-05-11 15:38 ` [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
2026-05-12  8:22   ` David Hildenbrand (Arm)
2026-05-12 13:05     ` Breno Leitao
2026-05-11 15:38 ` [PATCH v6 4/4] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox