Re: mm/hwpoison: persist poisoned PFN list across kexec via KHO [RFC]

From: Breno Leitao <leitao@debian.org>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>,
	nao.horiguchi@gmail.com,  linmiaohe@huawei.com, david@kernel.org,
	lance.yang@linux.dev,  akpm@linux-foundation.org,
	baoquan.he@linux.dev, rppt@kernel.org, pratyush@kernel.org,
	 kexec@lists.infradead.org, linux-mm@kvack.org, rneu@meta.com,
	riel@surriel.com,  caggio@meta.com
Subject: Re: mm/hwpoison: persist poisoned PFN list across kexec via KHO [RFC]
Date: Wed, 24 Jun 2026 08:21:16 -0700
Message-ID: <ajvzUg8KK9gtFTYe@gmail.com>
In-Reply-To: <aju5_WjBtagTOSJw@thinkstation>

Hello Kiryl, 

First of all, thanks for the review and topics raised!

On Wed, Jun 24, 2026 at 01:04:19PM +0100, Kiryl Shutsemau wrote:
> On Wed, Jun 24, 2026 at 03:39:38AM -0700, Breno Leitao wrote:
> >   * Consumer: early in the next boot (fs_initcall_sync, before the
> >     buddy allocator has handed anything out) it restores that array
> >     and re-runs memory_failure() on each PFN, re-offlining the frame
> >     and rebuilding the full hwpoison state (PG_hwpoison, counters,
> >     HardwareCorrupted).
> 
> fs_initcall_sync is not before buddy hands anything out - buddy has been
> live since memblock_free_all() in start_kernel(), and every initcall before
> this one has allocated freely. So this is recovery, not prevention: you may
> be running memory_failure() against a frame already in use, possibly by a
> kernel allocation.

Agreed - that wording was wrong. It is recovery, not prevention, and running
memory_failure() against an already-allocated (possibly kernel) frame is not
ideal, but it is still better than what we have today.

> Two windows are missed entirely:
> 
>   - memblock allocations between setup_arch() and memblock_free_all()
>     (page tables, mem_map[], percpu) can land on the bad frame.
> 
>   - The kernel image itself: KASLR picks its location in the
>     decompressor/stub, long before any initcall. The next kernel can end
>     up running *on* the bad frame.
> 
> So I don't think this should be a memory_failure() replay. The frames need
> to leave the next kernel's view at the memory-map level, before memblock
> and KASLR.

Agreed, this is the right approach.

> > Possible solutions
> > ==================
> ...
> > 
> > 2. e820 / EFI memory map (E820_TYPE_UNUSABLE). Tempting because the
> >    frame would simply never become RAM (no allocator race at all).
> >    But: it is x86-only (no arm64 equivalent in the same mechanism;
> >    this series is tested on arm64);
> 
> (+Ard. I might get some details around EFI wrong.)
> 
> This isn't accurate, and I think it's the right direction for EFI
> platforms. EFI_UNUSABLE_MEMORY is honored on both arches today, no new
> consumer code:
> 
>   - arm64: reserve_regions() marks non-usable memory nomap.

Is that also true for non-UEFI arm64 hosts?

>   - x86: do_add_efi_memmap() maps it to E820_TYPE_UNUSABLE.
> 
> And it closes the KASLR window for free, because the image is only placed in
> EFI_CONVENTIONAL_MEMORY on both (x86 process_efi_entries(), arm64
> randomalloc.c). So the bad frame is invisible to both the allocator and
> KASLR, which is exactly what fs_initcall_sync can't give you.
> 
> There's also LINUX_EFI_MEMRESERVE (efi_mem_reserve_persistent()) -
> cross-arch, reserved pre-buddy in efi_init() - and looks otherwise fine, but
> it's parsed too late to keep KASLR off the frame.

Thanks. I am wondering: if we piggy-back on EFI_UNUSABLE_MEMORY (or something
similar), then we don't need to use KHO at all. We would basically just mark
the page as EFI_UNUSABLE_MEMORY at poison time, and rely on kexec to avoid
passing this page forward.
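
As a rough illustration of that idea, here is a userspace model (not kernel
code) of carving one poisoned PFN out of a conventional region of a memory
map, so that a later consumer only ever sees the remaining conventional
ranges. Everything here is hypothetical: struct memmap, memmap_poison_pfn()
and memmap_pfn_usable() are made-up names, not existing kernel or EFI
interfaces, and the region types are only loosely modelled on
EFI_CONVENTIONAL_MEMORY / EFI_UNUSABLE_MEMORY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types, loosely modelled on EFI memory types. */
enum region_type { REGION_CONVENTIONAL, REGION_UNUSABLE };

/* One entry of a simplified memory map: PFNs in [start, end). */
struct region {
	uint64_t start;
	uint64_t end;
	enum region_type type;
};

#define MAP_CAP 16

struct memmap {
	struct region r[MAP_CAP];
	size_t n;
};

/* Insert a region at index idx, shifting later entries up by one. */
static bool memmap_insert(struct memmap *m, size_t idx, struct region reg)
{
	if (m->n >= MAP_CAP || idx > m->n)
		return false;
	for (size_t i = m->n; i > idx; i--)
		m->r[i] = m->r[i - 1];
	m->r[idx] = reg;
	m->n++;
	return true;
}

/*
 * Carve a single poisoned PFN out of a conventional region and mark it
 * unusable. Returns false if the PFN is not inside a conventional region
 * (which includes the already-poisoned case) or if the map is full.
 */
bool memmap_poison_pfn(struct memmap *m, uint64_t pfn)
{
	assert(m != NULL);

	for (size_t i = 0; i < m->n; i++) {
		struct region old = m->r[i];

		if (pfn < old.start || pfn >= old.end)
			continue;
		if (old.type != REGION_CONVENTIONAL)
			return false;

		/* A split adds one entry per non-empty neighbour range. */
		size_t extra = (pfn > old.start) + (pfn + 1 < old.end);
		if (m->n + extra > MAP_CAP)
			return false;

		/* Turn the entry into the one-page hole ... */
		m->r[i] = (struct region){ pfn, pfn + 1, REGION_UNUSABLE };
		/* ... then re-add what is left above and below it. */
		if (pfn + 1 < old.end)
			memmap_insert(m, i + 1, (struct region){
				pfn + 1, old.end, REGION_CONVENTIONAL });
		if (pfn > old.start)
			memmap_insert(m, i, (struct region){
				old.start, pfn, REGION_CONVENTIONAL });
		return true;
	}
	return false;
}

/* A consumer (allocator, image placement) may only use conventional PFNs. */
bool memmap_pfn_usable(const struct memmap *m, uint64_t pfn)
{
	for (size_t i = 0; i < m->n; i++)
		if (pfn >= m->r[i].start && pfn < m->r[i].end)
			return m->r[i].type == REGION_CONVENTIONAL;
	return false;
}
```

In this model, poisoning PFN 42 in a single region [0, 100) leaves three
entries: [0, 42) conventional, [42, 43) unusable, [43, 100) conventional.
How the real memory map would be updated and handed to the next kernel is
exactly the open question above; the sketch only shows the bookkeeping.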

Thanks for the discussion,
--breno
