From: Breno Leitao <leitao@debian.org>
To: Miaohe Lin <linmiaohe@huawei.com>,
Naoya Horiguchi <nao.horiguchi@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
"Liam R. Howlett" <liam@infradead.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
Breno Leitao <leitao@debian.org>,
linux-trace-kernel@vger.kernel.org, kernel-team@meta.com,
Lance Yang <lance.yang@linux.dev>
Subject: [PATCH v6 0/4] mm/memory-failure: add panic option for unrecoverable pages
Date: Mon, 11 May 2026 08:38:34 -0700 [thread overview]
Message-ID: <20260511-ecc_panic-v6-0-183012ba7d4b@debian.org> (raw)
Changes from previouis version
* Replaced the late, action-time recheck for MF_MSG_KERNEL_HIGH_ORDER
with early failure classification inside get_any_page() (new patch
2/4).
* David Hildenbrand pointed out that the recheck inspected refcount and
folio mapping without holding a folio reference, which is unsafe
(concurrent split can trigger VM_WARN_ON_FOLIO).
* Lance Yang suggested moving the disambiguation to the call site that
still knows *why* the page reference could not be taken, which is what
this version does via a new enum mf_get_page_status (MF_GET_PAGE_OK
/ RACE / UNHANDLABLE) plumbed out through get_hwpoison_page().
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v6:
- Dropped the selftest given the value was not clear
- Get the status of the failure from get_any_page()
- Small nits from different people/AIs.
- Link to v5: https://patch.msgid.link/20260424-ecc_panic-v5-0-a35f4b50425c@debian.org
Changes in v5:
- Add vm.panic_on_unrecoverable_memory_failure sysctl to panic on
unrecoverable kernel page hwpoison events (reserved pages, refcount-0
non-buddy pages, unknown state), with a recheck to avoid racing with
concurrent buddy allocations. (Miaohe)
- Distinguish reserved pages as MF_MSG_KERNEL in memory_failure(),
document the new sysctl in Documentation/admin-guide/sysctl/vm.rst,
and add a selftest verifying SIGBUS recovery on userspace pages still
works when the sysctl is enabled. (Miaohe)
- Added a selftest
- Link to v4:
https://patch.msgid.link/20260415-ecc_panic-v4-0-2d0277f8f601@debian.org
Changes in v4:
- Drop CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option.
- Split the reserved page classification (MF_MSG_KERNEL) into its own
patch, separate from the panic mechanism.
- Document why the buddy allocator TOCTOU race (between
get_hwpoison_page() and is_free_buddy_page()) cannot cause false
positives: PG_hwpoison is set beforehand and check_new_page() in the
page allocator rejects hwpoisoned pages.
- Document the narrow LRU isolation race window for MF_MSG_UNKNOWN and
its mitigation via identify_page_state()'s two-pass design.
- Explicitly document why MF_MSG_GET_HWPOISON is excluded from the
panic conditions (shared path with transient races and non-reserved
kernel memory).
- Link to v3: https://patch.msgid.link/20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org
Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
how the retry mechanism mitigates false positives from buddy allocator
races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org
Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org
To: Miaohe Lin <linmiaohe@huawei.com>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
Breno Leitao (4):
mm/memory-failure: report MF_MSG_KERNEL for reserved pages
mm/memory-failure: classify get_any_page() failures by reason
mm/memory-failure: add panic option for unrecoverable pages
Documentation: document panic_on_unrecoverable_memory_failure sysctl
Documentation/admin-guide/sysctl/vm.rst | 70 ++++++++++++++++++++++++++
include/trace/events/memory-failure.h | 2 +-
mm/memory-failure.c | 89 ++++++++++++++++++++++++++++++---
3 files changed, 152 insertions(+), 9 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260323-ecc_panic-4e473b83087c
Best regards,
--
Breno Leitao <leitao@debian.org>
next reply other threads:[~2026-05-11 15:39 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 15:38 Breno Leitao [this message]
2026-05-11 15:38 ` [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Breno Leitao
2026-05-12 8:17 ` David Hildenbrand (Arm)
2026-05-12 12:48 ` Lance Yang
2026-05-12 13:04 ` Breno Leitao
2026-05-12 17:58 ` jane.chu
2026-05-11 15:38 ` [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason Breno Leitao
2026-05-12 8:21 ` David Hildenbrand (Arm)
2026-05-12 13:33 ` Breno Leitao
2026-05-11 15:38 ` [PATCH v6 3/4] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
2026-05-12 8:22 ` David Hildenbrand (Arm)
2026-05-12 13:05 ` Breno Leitao
2026-05-11 15:38 ` [PATCH v6 4/4] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260511-ecc_panic-v6-0-183012ba7d4b@debian.org \
--to=leitao@debian.org \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=david@kernel.org \
--cc=kernel-team@meta.com \
--cc=lance.yang@linux.dev \
--cc=liam@infradead.org \
--cc=linmiaohe@huawei.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=ljs@kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@suse.com \
--cc=nao.horiguchi@gmail.com \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
--cc=shuah@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox