From: Pranjal Shrivastava <praan@google.com>
To: Mike Rapoport <rppt@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>,
Samiullah Khawaja <skhawaja@google.com>,
David Matlack <dmatlack@google.com>,
kexec@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Pranjal Shrivastava <praan@google.com>
Subject: [RFC PATCH 0/4] kho: Support preserving unsplit high-order pages
Date: Fri, 3 Jul 2026 02:08:28 +0000
Message-ID: <20260703020832.1731864-1-praan@google.com>
This series is required for the ongoing effort to preserve DMA allocations
across KHO [1]. It addresses a fundamental mismatch between the current KHO
restoration logic and the refcount layout of high-order buddy allocations,
and adds support for preserving such allocations unsplit.
The Problem
===========
The current KHO restore implementation treats all multi-page blocks as
split pages during restoration, i.e. kho_restore_pages() initializes
every 4KB page with a refcount of 1.
However, many kernel subsystems, most notably the DMA allocator (via
dma_alloc_coherent()), frequently return high-order non-compound pages.
In this unsplit state, only the head page carries a refcount of 1,
while all tail pages have a reference count of 0.
Consequently, when KHO restores these contiguous but unsplit blocks in the
new kernel, the forced refcount of 1 on the tail pages trips the buddy
allocator's sanity checks once the block is eventually freed. On the free
path, __free_pages_prepare() [2] calls page_expected_state() [3] when
is_check_pages_enabled() returns true (i.e. only with CONFIG_DEBUG_VM or
debug_pagealloc=on).
This check detects the non-zero refcounts on the tail pages [4] and
incorrectly taints the kernel while leaking the pages in question.
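To make the failure mode concrete, here is a small user-space sketch of the
two refcount patterns and of a free-time check in the spirit of
page_expected_state(). It is only a toy model: struct toy_page and the toy_*
helpers are hypothetical names invented for this illustration and are not
kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the refcount matters here. */
struct toy_page {
	int refcount;
};

/* Split pattern: every base page holds its own reference. */
static void toy_init_split(struct toy_page *pages, unsigned int order)
{
	for (size_t i = 0; i < ((size_t)1 << order); i++)
		pages[i].refcount = 1;
}

/* Unsplit (contiguous) pattern: only the head page holds a reference. */
static void toy_init_contig(struct toy_page *pages, unsigned int order)
{
	pages[0].refcount = 1;
	for (size_t i = 1; i < ((size_t)1 << order); i++)
		pages[i].refcount = 0;
}

/*
 * Model of freeing the block as one high-order unit: drop the head
 * reference, then count the pages whose refcount is still non-zero.
 * Any such page is what a debug-enabled kernel would flag as bad.
 */
static size_t toy_free_block_bad_pages(struct toy_page *pages,
				       unsigned int order)
{
	size_t bad = 0;

	assert(pages[0].refcount > 0);
	pages[0].refcount--;

	for (size_t i = 0; i < ((size_t)1 << order); i++)
		if (pages[i].refcount != 0)
			bad++;

	return bad;
}
```

In this model an order-3 block initialized with toy_init_contig() frees
cleanly, while the same block initialized with toy_init_split() leaves all
seven tail pages with a stale reference.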
Proposed Solution
=================
This series introduces a "Page Type" field to the KHO ABI to track the
refcount pattern of the preserved pages.
1. KHO detects the physical state (CONTIG vs SPLIT) during preservation
by peeking at the refcount of the second page in each buddy block.
2. The type bit is preserved in the high bits of the KHO radix tree key
(Bit 63) and stashed in page->private metadata during boot.
3. kho_restore_page() applies the correct refcount pattern based on the
preserved metadata.
4. A new helper, kho_split_preserved_pages(), is provided for subsystems
that may need to split memory after it has already been preserved.
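Steps 1-3 above can be sketched in user space roughly as follows. This is a
minimal illustration and not the code from the patches: the TOY_* and toy_*
names are hypothetical, the polarity of the type bit (set meaning CONTIG) is
an assumption, and order-0 blocks are arbitrarily classified as SPLIT since
they have no second page to inspect. Only the use of bit 63 of the key and
the "peek at the second page" heuristic come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only "bit 63 of the key" comes from the cover letter. */
#define TOY_KEY_TYPE_BIT (UINT64_C(1) << 63)

enum toy_page_type { TOY_SPLIT = 0, TOY_CONTIG = 1 };

/* Step 1: peek at the second page's refcount to classify the block. */
static enum toy_page_type toy_detect_type(const int *refcounts,
					  unsigned int order)
{
	/* Assumption: an order-0 block has no second page, call it SPLIT. */
	if (order == 0)
		return TOY_SPLIT;
	return refcounts[1] == 0 ? TOY_CONTIG : TOY_SPLIT;
}

/* Step 2: fold the type into bit 63 (assumed here: set means CONTIG). */
static uint64_t toy_key_encode(uint64_t key, enum toy_page_type type)
{
	assert((key & TOY_KEY_TYPE_BIT) == 0);
	return type == TOY_CONTIG ? (key | TOY_KEY_TYPE_BIT) : key;
}

/* Recover the type from an encoded key. */
static enum toy_page_type toy_key_type(uint64_t key)
{
	return (key & TOY_KEY_TYPE_BIT) ? TOY_CONTIG : TOY_SPLIT;
}

/* Recover the original key with the type bit masked off. */
static uint64_t toy_key_payload(uint64_t key)
{
	return key & ~TOY_KEY_TYPE_BIT;
}

/* Step 3: re-apply the refcount pattern that matches the recorded type. */
static void toy_restore_refcounts(int *refcounts, unsigned int order,
				  enum toy_page_type type)
{
	refcounts[0] = 1;
	for (size_t i = 1; i < ((size_t)1 << order); i++)
		refcounts[i] = (type == TOY_SPLIT) ? 1 : 0;
}
```

Keeping the type in an otherwise unused high bit of the key means the
restore side can recover it without any extra per-block storage, which is
the property the sketch is meant to show.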
Considerations
==============
1. A primary goal of this approach is to prevent driver/subsystem code
from peeking into MM internals. Drivers should not need to understand
the distinction between head/tail pages or compound metadata. The KHO
core handles this internally.
2. To handle rare cases where a caller might wish to split a high-order
block after preservation, we provide kho_split_preserved_pages().
3. Callers must ensure that split_page() does not race with
kho_preserve_pages(), so that the recorded type stays consistent.
4. Folios are always implicitly considered to be of the CONTIG type.
Thanks,
Praan
[1] https://lore.kernel.org/all/20260505002737.2213734-1-skhawaja@google.com/
[2] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1370
[3] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1027
[4] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1034
Pranjal Shrivastava (4):
kho: Introduce infrastructure to track preserved page types
kho: Detect preserved page types
kho: Implement page-aware refcount restoration
kho: Introduce kho_split_preserved_pages() helper
include/linux/kexec_handover.h | 7 ++
include/linux/kho_radix_tree.h | 17 +++-
kernel/liveupdate/kexec_handover.c | 144 +++++++++++++++++++++--------
3 files changed, 124 insertions(+), 44 deletions(-)
base-commit: 87320be9f0d24fce67631b7eef919f0b79c3e45c
--
2.55.0.rc0.799.gd6f94ed593-goog