From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Breno Leitao <leitao@debian.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	lance.yang@linux.dev, Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v9 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
Date: Tue, 9 Jun 2026 20:41:25 +0200
Message-ID: <cf2bb24e-9341-4ded-b238-064dca442a92@kernel.org>
In-Reply-To: <aig7jzwDHfVCxikl@gmail.com>

On 6/9/26 18:15, Breno Leitao wrote:
> On Tue, Jun 09, 2026 at 04:41:01PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/9/26 12:56, Breno Leitao wrote:
>>> get_any_page() collapses every HWPoisonHandlable() rejection into a
>>> single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page()
>>> -> retry path.  That is correct for the transient case (a userspace
>>> folio briefly off LRU during migration or compaction, which a later
>>> shake can drag back), but wrong for stable kernel-owned pages: slab,
>>> page-table, large-kmalloc and PG_reserved pages will never become
>>> HWPoisonHandlable(), so the retry loop is wasted work and the final
>>> -EIO loses the "this is structurally unrecoverable" information.
>>> memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
>>> panic-on-unrecoverable sysctl deliberately does not act on.
>>>
>>> Introduce HWPoisonKernelOwned(), a small predicate that positively
>>> identifies pages the hwpoison handler cannot recover from:
>>>
>>>   HWPoisonKernelOwned(p, flags) :=
>>>       !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) &&
>>>       (PageReserved(p) ||
>>>        PageSlab(head) || PageTable(head) || PageLargeKmalloc(head))
>>>
>>>   where head = compound_head(p).
>>>
>>> PG_reserved is a per-page flag (PF_NO_COMPOUND) and is tested on the
>>> page directly.  The slab, page-table and large-kmalloc page-type bits
>>> are only stored on the head page, so those tests resolve the compound
>>> head first, then re-read compound_head(page) afterwards: a concurrent
>>> split or compound free that moves head invalidates the just-read flags
>>> and the loop retries.  The lookup still takes no refcount, mirroring
>>> the rest of get_any_page(); the recheck closes the common split race,
>>> and a residual free->alloc->free in the same window can only mis-tag
>>> a genuinely poisoned page, never reclassify a handlable one.
>>>
>>> The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the
>>> same exception in HWPoisonHandlable(): soft-offline is allowed to
>>> migrate movable_ops pages even though they are not on the LRU, and
>>> we must not pre-empt that with an unrecoverable verdict.
>>>
>>> The list is intentionally not exhaustive.  vmalloc and kernel-stack
>>> pages, for example, do not carry a page_type bit and would need a
>>> different oracle; they keep going through the existing retry path
>>> unchanged.  This is the smallest set we can identify with certainty
>>> by page type.
>>>
>>> Wire the helper into the top of get_any_page() to short-circuit
>>> those pages before the retry loop runs.  On a hit, drop the caller's
>>> MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE
>>> straight away.  Pages outside the helper's positive list still take
>>> the existing retry path and return -EIO, leaving operator-visible
>>> behaviour for those cases unchanged.
>>>
>>> Extend the unhandlable-page pr_err() to fire for either errno and
>>> update the get_hwpoison_page() kerneldoc to document the new return.
>>>
>>> memory_failure() still folds every negative return into
>>> MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so
>>> this patch on its own only changes the errno that soft_offline_page()
>>> can propagate to its callers.  A follow-up wires -ENOTRECOVERABLE
>>> through memory_failure() and reports MF_MSG_KERNEL for the
>>> unrecoverable cases, which is what the
>>> panic_on_unrecoverable_memory_failure sysctl observes.
>>>
>>> Suggested-by: David Hildenbrand <david@kernel.org>
>>> Suggested-by: Lance Yang <lance.yang@linux.dev>
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> ---
>>>  mm/memory-failure.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 58 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index f4d3e6e20e13..eed9de387694 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1325,6 +1325,46 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>>>  	return PageLRU(page) || is_free_buddy_page(page);
>>>  }
>>>  
>>> +/*
>>> + * Positive identification of pages the hwpoison handler cannot recover.
>>> + * These page types are owned by kernel internals (no userspace mapping
>>> + * to unmap, no file mapping to invalidate, no migration target), so the
>>> + * shake_page() / retry loop in get_any_page() can never turn them into
>>> + * something HWPoisonHandlable() will accept.  Short-circuit them to
>>> + * -ENOTRECOVERABLE so callers can panic on operator request instead of
>>> + * spinning through retries that exit as a transient-looking -EIO.
>>> + *
>>> + * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors
>>> + * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops
>>> + * pages even though they are not on the LRU.
>>> + */
>>> +static inline bool HWPoisonKernelOwned(struct page *page, unsigned long flags)
>>> +{
>>> +	struct page *head;
>>> +
>>> +	if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page))
>>> +		return false;
>>> +
>>
>> On a second look: Do we really need that? The page types below never support
>> migration. So I guess that check is not required?
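
To make the question concrete, here is a minimal userspace sketch (not the
kernel code) of the predicate with the movable_ops opt-out dropped, plus the
short-circuit the commit message describes for get_any_page(). Everything
prefixed with model_ is hypothetical and exists only for illustration; the
page kinds stand in for PageReserved(), PageSlab(), PageTable() and
PageLargeKmalloc(). Refcounting, compound-head resolution and shake_page()
are deliberately left out, and the sketch assumes the listed page types never
have movable ops, which is exactly the point still under discussion.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
enum model_page_kind {
	MODEL_LRU,		/* userspace folio on the LRU */
	MODEL_OFF_LRU,		/* off the LRU; here retries never succeed */
	MODEL_RESERVED,		/* stands in for PageReserved() */
	MODEL_SLAB,		/* stands in for PageSlab() */
	MODEL_PGTABLE,		/* stands in for PageTable() */
	MODEL_LARGE_KMALLOC,	/* stands in for PageLargeKmalloc() */
};

struct model_page {
	enum model_page_kind kind;
};

/* Positive list only: anything not listed keeps the existing retry path. */
static bool model_is_kernel_owned_page(const struct model_page *p)
{
	switch (p->kind) {
	case MODEL_RESERVED:
	case MODEL_SLAB:
	case MODEL_PGTABLE:
	case MODEL_LARGE_KMALLOC:
		return true;
	default:
		return false;
	}
}

/* Models the return value only; no references are taken or dropped. */
static int model_get_any_page(const struct model_page *p)
{
	if (model_is_kernel_owned_page(p))
		return -ENOTRECOVERABLE;	/* short-circuit, no retries */
	if (p->kind == MODEL_LRU)
		return 1;			/* handlable page */
	return -EIO;				/* retries exhausted */
}
```

In this model a slab page yields -ENOTRECOVERABLE immediately, while a page
that is merely off the LRU still ends up as -EIO, matching the split the
patch is after.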
>>
>> Apart from that, looks good with two comments:
>>
>> a) HWPoisonKernelOwned: this is not our usual style for naming functions.
>>
>> is_kernel_owned_page() or something like that would do.
> 
> Ack, I will rename it to is_kernel_owned_page().
> 
> In my defence, most of the functions similar to HWPoisonKernelOwned()
> have this name format, and I had this discussion earlier (with Lance,
> I think). Here are the similar function names in that file:
> 
>  * HWPoisonHandlable()
>  * PageHWPoisonTakenOff()
>  * SetPageHWPoisonTakenOff()

Some of these probably date back to our old way of handling page flags and
things, like PageLRU.

But we really should stop :)

> 
> I will update in the new version.

Thanks! Probably best to wait a bit: the merge window is coming up, so this
will have to wait either way.

-- 
Cheers,

David

Thread overview:
2026-06-09 10:56 [PATCH v9 0/6] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
2026-06-09 10:56 ` [PATCH v9 1/6] mm/memory-failure: drop dead error_states[] entry for reserved pages Breno Leitao
2026-06-09 10:56 ` [PATCH v9 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE Breno Leitao
2026-06-09 14:41   ` David Hildenbrand (Arm)
2026-06-09 16:15     ` Breno Leitao
2026-06-09 18:41       ` David Hildenbrand (Arm) [this message]
2026-06-09 10:56 ` [PATCH v9 3/6] mm/memory-failure: report MF_MSG_KERNEL for unrecoverable kernel pages Breno Leitao
2026-06-09 10:56 ` [PATCH v9 4/6] mm/memory-failure: add panic option for unrecoverable pages Breno Leitao
2026-06-09 10:56 ` [PATCH v9 5/6] Documentation: document panic_on_unrecoverable_memory_failure sysctl Breno Leitao
2026-06-09 10:57 ` [PATCH v9 6/6] selftests/mm: add hwpoison-panic destructive test Breno Leitao
