Linux-mm Archive on lore.kernel.org
From: Pranjal Shrivastava <praan@google.com>
To: Mike Rapoport <rppt@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	 Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>,
	Samiullah Khawaja <skhawaja@google.com>,
	 David Matlack <dmatlack@google.com>,
	kexec@lists.infradead.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org,
	Pranjal Shrivastava <praan@google.com>
Subject: [RFC PATCH 0/4] kho: Support preserving unsplit high-order pages
Date: Fri,  3 Jul 2026 02:08:28 +0000
Message-ID: <20260703020832.1731864-1-praan@google.com>

This series is required for the ongoing effort to preserve DMA allocations
across KHO [1]. It addresses a fundamental mismatch between the current KHO
restoration logic and the refcount layout of unsplit high-order buddy
allocations, and adds support for preserving such allocations.

The Problem
===========
The current KHO restore implementation treats all multi-page blocks as 
split pages during restoration, i.e. kho_restore_pages() initializes 
every 4KB page with a refcount of 1.

However, many kernel subsystems, most notably the DMA allocator (via
dma_alloc_coherent), frequently return high-order non-compound pages. 
In this unsplit state, only the head page carries a refcount of 1, 
while all tail pages have a reference count of 0.

Consequently, when these contiguous but unsplit blocks are restored by
KHO in the new kernel, the forced refcount of 1 on the tail pages conflicts
with what the buddy allocator expects when the block is freed. On the
eventual free path, __free_pages_prepare() [2] calls page_expected_state()
[3] when is_check_pages_enabled() returns true (only with CONFIG_DEBUG_VM
or debug_pagealloc=on).

This check detects the non-zero refcounts on the tail pages [4], so the
kernel is tainted and the pages in question are leaked.
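
To make the mismatch concrete, here is a small user-space model of the two
refcount patterns and of a simplified free-time check. It is only a sketch:
struct toy_page and the toy_*() helpers are made up for illustration and are
not kernel APIs, and the check only mirrors the "refcount must be zero"
part of page_expected_state().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count is modeled. */
struct toy_page {
	int refcount;
};

/* Unsplit high-order block: the first page holds the only reference. */
static void toy_init_unsplit(struct toy_page *pages, unsigned int order)
{
	size_t nr = (size_t)1 << order;

	for (size_t i = 0; i < nr; i++)
		pages[i].refcount = (i == 0) ? 1 : 0;
}

/* Restoration as described above: every page gets a refcount of 1. */
static void toy_restore_as_split(struct toy_page *pages, unsigned int order)
{
	size_t nr = (size_t)1 << order;

	for (size_t i = 0; i < nr; i++)
		pages[i].refcount = 1;
}

/*
 * Simplified free path: the owner drops its single reference on the first
 * page, then every page in the block is expected to have a refcount of 0.
 * Returns the number of pages that would fail that expectation.
 */
static size_t toy_free_block(struct toy_page *pages, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	size_t bad = 0;

	assert(pages[0].refcount > 0);
	pages[0].refcount--;

	for (size_t i = 0; i < nr; i++)
		if (pages[i].refcount != 0)
			bad++;

	return bad;
}
```

In this model, an order-2 block set up with toy_init_unsplit() frees with no
bad pages, while the same block after toy_restore_as_split() reports its
three tail pages as bad.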

Proposed Solution
=================
This series introduces a "Page Type" field to the KHO ABI to track the
refcount pattern of the preserved pages.

1. KHO detects the physical state (CONTIG vs SPLIT) during preservation
   by peeking at the refcount of the second page in each buddy block.

2. The type is preserved in the top bit (bit 63) of the KHO radix tree
   key and stashed in page->private metadata during boot.

3. kho_restore_page() applies the correct refcount pattern based on the
   preserved metadata.

4. A new helper, kho_split_preserved_pages(), is provided for subsystems
   that may need to split memory after it has already been preserved.
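
Steps 1 to 3 above can be sketched in user space roughly as follows. This is
an illustration under assumptions, not the code in the patches: the demo_*
names are hypothetical, the choice that a set bit 63 means CONTIG is an
assumption (the cover letter only says the type lives in bit 63), and
order-0 blocks are reported as SPLIT here simply because the two patterns
coincide for a single page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_page_type { DEMO_SPLIT = 0, DEMO_CONTIG = 1 };

/* Assumption: bit 63 set means CONTIG, clear means SPLIT. */
#define DEMO_TYPE_BIT	(UINT64_C(1) << 63)

/* Toy stand-in for struct page: only the reference count is modeled. */
struct demo_page {
	int refcount;
};

/* Step 1: classify a block by peeking at the refcount of its second page. */
static enum demo_page_type demo_detect_type(const struct demo_page *pages,
					    unsigned int order)
{
	if (order == 0)
		return DEMO_SPLIT;	/* single page, nothing to distinguish */

	return pages[1].refcount == 0 ? DEMO_CONTIG : DEMO_SPLIT;
}

/* Step 2: fold the type into the top bit of a 64-bit key. */
static uint64_t demo_key_encode(uint64_t key, enum demo_page_type type)
{
	assert((key & DEMO_TYPE_BIT) == 0);	/* bit 63 must be free */
	return type == DEMO_CONTIG ? (key | DEMO_TYPE_BIT) : key;
}

static enum demo_page_type demo_key_type(uint64_t encoded)
{
	return (encoded & DEMO_TYPE_BIT) ? DEMO_CONTIG : DEMO_SPLIT;
}

static uint64_t demo_key_payload(uint64_t encoded)
{
	return encoded & ~DEMO_TYPE_BIT;
}

/* Step 3: reapply the refcount pattern that matches the recorded type. */
static void demo_restore(struct demo_page *pages, unsigned int order,
			 enum demo_page_type type)
{
	size_t nr = (size_t)1 << order;

	for (size_t i = 0; i < nr; i++)
		pages[i].refcount = (i == 0 || type == DEMO_SPLIT) ? 1 : 0;
}
```

Round-tripping a block through demo_detect_type(), demo_key_encode(),
demo_key_type() and demo_restore() gives back the refcount pattern the
block had when it was preserved, which is the property the series is after.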

Considerations
==============

1. A primary goal of this approach is to prevent driver/subsystem code
   from peeking into MM internals. Drivers should not need to understand
   the distinction between head/tail pages or compound metadata. The KHO
   core handles this internally.

2. To handle rare cases where a caller might wish to split a high-order 
   block after preservation, we provide kho_split_preserved_pages().
 
3. Callers must ensure that split_page() does not race with
   kho_preserve_pages(), so that the recorded type stays consistent with
   the actual state of the pages.

4. Folios are always implicitly treated as the CONTIG type.
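
To illustrate why points 2 and 3 matter, the user-space model below shows a
recorded type going stale when a block is split behind KHO's back, and
staying consistent when the split and the record update happen together.
Everything here is hypothetical: the model_* names are invented,
model_split_preserved() is only a guess at the kind of bookkeeping a helper
like kho_split_preserved_pages() would need, and its signature is not taken
from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_type { MODEL_SPLIT, MODEL_CONTIG };

/* Toy stand-in for struct page: only the reference count is modeled. */
struct model_page {
	int refcount;
};

/* Toy stand-in for what was recorded about a block at preservation time. */
struct model_record {
	unsigned int order;
	enum model_type type;
};

/* Mimics split_page()'s effect on refcounts: each tail page gets its own. */
static void model_split_page(struct model_page *pages, unsigned int order)
{
	size_t nr = (size_t)1 << order;

	for (size_t i = 1; i < nr; i++) {
		assert(pages[i].refcount == 0);
		pages[i].refcount = 1;
	}
}

/* Hypothetical: split the pages and update the record in one step. */
static void model_split_preserved(struct model_record *rec,
				  struct model_page *pages)
{
	if (rec->type == MODEL_SPLIT)
		return;		/* already split, nothing to do */

	model_split_page(pages, rec->order);
	rec->type = MODEL_SPLIT;
}

/* True if the recorded type matches what the second page's refcount says. */
static bool model_record_consistent(const struct model_record *rec,
				    const struct model_page *pages)
{
	if (rec->order == 0)
		return true;

	return (pages[1].refcount == 0) == (rec->type == MODEL_CONTIG);
}
```

Calling model_split_page() directly on a block recorded as MODEL_CONTIG
leaves model_record_consistent() returning false, whereas going through
model_split_preserved() keeps the record and the pages in agreement.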

Thanks,
Praan

[1] https://lore.kernel.org/all/20260505002737.2213734-1-skhawaja@google.com/
[2] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1370
[3] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1027
[4] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1034

Pranjal Shrivastava (4):
  kho: Introduce infrastructure to track preserved page types
  kho: Detect preserved page types
  kho: Implement page-aware refcount restoration
  kho: Introduce kho_split_preserved_pages() helper

 include/linux/kexec_handover.h     |   7 ++
 include/linux/kho_radix_tree.h     |  17 +++-
 kernel/liveupdate/kexec_handover.c | 144 +++++++++++++++++++++--------
 3 files changed, 124 insertions(+), 44 deletions(-)


base-commit: 87320be9f0d24fce67631b7eef919f0b79c3e45c
-- 
2.55.0.rc0.799.gd6f94ed593-goog


