* [RFC PATCH 0/4] kho: Support preserving unsplit high-order pages
From: Pranjal Shrivastava @ 2026-07-03 2:08 UTC
To: Mike Rapoport, Pasha Tatashin, Pratyush Yadav
Cc: Alexander Graf, Samiullah Khawaja, David Matlack, kexec, linux-mm,
linux-kernel, Pranjal Shrivastava
This series is required for the ongoing effort to preserve DMA allocations
across KHO [1]. It addresses a fundamental mismatch between the current KHO
restoration logic and the refcount layout of unsplit high-order buddy
allocations, and adds support for preserving such allocations.
The Problem
===========
The current KHO restore implementation treats all multi-page blocks as
split pages during restoration, i.e. kho_restore_pages() initializes
every 4KB page with a refcount of 1.
However, many kernel subsystems, most notably the DMA allocator (via
dma_alloc_coherent), frequently return high-order non-compound pages.
In this unsplit state, only the head page carries a refcount of 1,
while all tail pages have a reference count of 0.
Consequently, when KHO restores these contiguous but unsplit blocks in the
new kernel, the forced refcount of 1 on the tail pages trips up the buddy
allocator. On the eventual free path, __free_pages_prepare() [2] ends up
calling page_expected_state() [3] whenever is_check_pages_enabled() returns
true (i.e. only with CONFIG_DEBUG_VM or debug_pagealloc=on).
That check detects the non-zero refcounts on the tail pages [4], taints the
kernel, and leaks the pages in question.
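To make the failure mode concrete, here is a minimal userspace model of it.
This is only a sketch: struct page_model and both helpers are hypothetical
names, and the check is reduced to the refcount test alone, whereas
page_expected_state() also looks at other page fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct page_model {
	int refcount;
};

/* Pre-series restore behaviour: every page of the block gets refcount 1. */
static void model_restore_all_ones(struct page_model *pages, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		pages[i].refcount = 1;
}

/*
 * Simplified free path: the owner drops its single reference on the head
 * page, after which every page in the block is expected to have a
 * refcount of 0. Returns the number of pages that would be flagged as bad.
 */
static size_t model_free_block(struct page_model *pages, size_t nr_pages)
{
	size_t bad = 0;

	assert(nr_pages > 0);
	pages[0].refcount--;
	for (size_t i = 0; i < nr_pages; i++) {
		if (pages[i].refcount != 0)
			bad++;
	}
	return bad;
}
```

In this model, an order-2 block restored with model_restore_all_ones() has
three tail pages flagged on free, while a block with the head at 1 and the
tails at 0 frees cleanly.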
Proposed Solution
=================
This series introduces a "Page Type" field to the KHO ABI to track the
refcount pattern of the preserved pages.
1. KHO detects the physical state (CONTIG vs SPLIT) during preservation
by peeking at the refcount of the second page in each buddy block.
2. The type bit is preserved in the high bit of the KHO radix tree key
(bit 63) and stashed in page->private metadata during boot.
3. kho_restore_page() applies the correct refcount pattern based on the
preserved metadata.
4. A new helper, kho_split_preserved_pages(), is provided for subsystems
that may need to split memory after it has already been preserved.
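For step 2, the stash in page->private can be pictured with the following
userspace sketch. It mirrors the union layout used in patch 1, but the
model_* names are hypothetical, the magic value is illustrative only, and
the size check assumes an LP64 target where unsigned long is 64 bits.

```c
#include <assert.h>

/* Illustrative value only; the real KHO_PAGE_MAGIC is defined by KHO. */
#define MODEL_KHO_PAGE_MAGIC 0x4b484f50U

enum kho_page_type { KHO_PAGE_CONTIG = 0, KHO_PAGE_SPLIT };

/* Same shape as the patched union kho_page_info: order shrinks to 31 bits. */
union kho_page_info_model {
	unsigned long page_private;
	struct {
		unsigned int order : 31;
		unsigned int type : 1;
		unsigned int magic;
	};
};

/* Assumption: LP64, so the packed fields exactly fill an unsigned long. */
static_assert(sizeof(union kho_page_info_model) == sizeof(unsigned long),
	      "metadata must fit in page->private");

/* Pack order and type into the value that would be stored in ->private. */
static unsigned long model_pack(unsigned int order, enum kho_page_type type)
{
	union kho_page_info_model info = { .page_private = 0 };

	info.magic = MODEL_KHO_PAGE_MAGIC;
	info.order = order;
	info.type = type;
	return info.page_private;
}

/* Unpack; returns -1 if the magic does not match, 0 otherwise. */
static int model_unpack(unsigned long priv, unsigned int *order,
			enum kho_page_type *type)
{
	union kho_page_info_model info = { .page_private = priv };

	if (info.magic != MODEL_KHO_PAGE_MAGIC)
		return -1;
	*order = info.order;
	*type = (enum kho_page_type)info.type;
	return 0;
}
```

Because the type takes one bit from the order field, the existing 64-bit
footprint of the metadata does not change.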
Considerations
==============
1. A primary goal of this approach is to prevent driver/subsystem code
from peeking into MM internals. Drivers should not need to understand
the distinction between head/tail pages or compound metadata. The KHO
core handles this internally.
2. To handle rare cases where a caller might wish to split a high-order
block after preservation, we provide kho_split_preserved_pages().
3. Callers must ensure that split_page() does not race with
kho_preserve_pages(), so that the recorded type stays consistent.
4. Folios are always implicitly treated as the CONTIG type.
Thanks,
Praan
[1] https://lore.kernel.org/all/20260505002737.2213734-1-skhawaja@google.com/
[2] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1370
[3] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1027
[4] https://elixir.bootlin.com/linux/v7.1.1/source/mm/page_alloc.c#L1034
Pranjal Shrivastava (4):
kho: Introduce infrastructure to track preserved page types
kho: Detect preserved page types
kho: Implement page-aware refcount restoration
kho: Introduce kho_split_preserved_pages() helper
include/linux/kexec_handover.h | 7 ++
include/linux/kho_radix_tree.h | 17 +++-
kernel/liveupdate/kexec_handover.c | 144 +++++++++++++++++++++--------
3 files changed, 124 insertions(+), 44 deletions(-)
base-commit: 87320be9f0d24fce67631b7eef919f0b79c3e45c
--
2.55.0.rc0.799.gd6f94ed593-goog
* [RFC PATCH 1/4] kho: Introduce infrastructure to track preserved page types
From: Pranjal Shrivastava @ 2026-07-03 2:08 UTC
To: Mike Rapoport, Pasha Tatashin, Pratyush Yadav
Cc: Alexander Graf, Samiullah Khawaja, David Matlack, kexec, linux-mm,
linux-kernel, Pranjal Shrivastava
The KHO mechanism currently treats all multi-page blocks preserved across
a kexec as split pages during restoration, i.e. every page carries a
refcount of 1.
However, many kernel allocations, most notably DMA buffer allocations
via dma_alloc_coherent(), return high-order non-compound pages. In this
unsplit state, only the head page has a reference count of 1, while tail
pages have a reference count of 0.
Restoring these contiguous, unsplit blocks with the current KHO restore
path forces a refcount of 1 on every tail page. When CONFIG_DEBUG_VM is
enabled, this makes the buddy allocator report a bad page state on the
free path in the new kernel, as it does not expect the tail pages of a
high-order block to be refcounted.
Introduce a page_type field to track the refcount pattern of preserved
pages to avoid refcounting the tail pages of high-order non-compound
pages during restore.
The type is stored in the unused high bit (bit 63) of the KHO radix tree
key to ensure it survives the kexec journey (ABI), and is stashed in the
page->private metadata during early boot of the new kernel.
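The key layout can be exercised in userspace with a small round-trip
sketch. The encode and decode bodies follow the diff below, but the model_*
names are hypothetical, and the values of PAGE_SHIFT (4 KiB pages) and
KHO_ORDER_0_LOG2 (64 - PAGE_SHIFT) are assumptions made for illustration,
not taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT		12			/* assumption: 4 KiB pages */
#define KHO_ORDER_0_LOG2	(64 - PAGE_SHIFT)	/* assumption */
#define KHO_KEY_TYPE_SHIFT	63
#define KHO_KEY_TYPE_MASK	(UINT64_C(1) << KHO_KEY_TYPE_SHIFT)

enum kho_page_type { KHO_PAGE_CONTIG = 0, KHO_PAGE_SPLIT };

/* Userspace replacement for fls64(): 1-based index of the highest set bit. */
static unsigned int model_fls64(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

static uint64_t model_encode_key(uint64_t phys, unsigned int order,
				 enum kho_page_type type)
{
	uint64_t h = UINT64_C(1) << (KHO_ORDER_0_LOG2 - order);	/* order marker */
	uint64_t l = phys >> (PAGE_SHIFT + order);		/* shifted address */
	uint64_t t = (uint64_t)type << KHO_KEY_TYPE_SHIFT;	/* type bit */

	return h | l | t;
}

static uint64_t model_decode_key(uint64_t key, unsigned int *order,
				 enum kho_page_type *type)
{
	unsigned int order_bit;

	/* Strip the type bit first so it cannot be mistaken for the order marker. */
	*type = (enum kho_page_type)((key & KHO_KEY_TYPE_MASK) >> KHO_KEY_TYPE_SHIFT);
	key &= ~KHO_KEY_TYPE_MASK;

	order_bit = model_fls64(key);
	*order = KHO_ORDER_0_LOG2 - order_bit + 1;
	/* The order marker is shifted out of the 64-bit value here. */
	return key << (PAGE_SHIFT + *order);
}
```

One consequence worth noting: the same physical address and order produce
two different keys depending on the type, so a lookup or delete has to use
the same type that was used on insertion.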
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
include/linux/kho_radix_tree.h | 17 +++++---
kernel/liveupdate/kexec_handover.c | 62 ++++++++++++++++++++----------
2 files changed, 53 insertions(+), 26 deletions(-)
diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h
index 84e918b96e53..9244a3f7a2d4 100644
--- a/include/linux/kho_radix_tree.h
+++ b/include/linux/kho_radix_tree.h
@@ -34,16 +34,22 @@ struct kho_radix_tree {
struct mutex lock; /* protects the tree's structure and root pointer */
};
+enum kho_page_type {
+ KHO_PAGE_CONTIG = 0,
+ KHO_PAGE_SPLIT,
+};
+
typedef int (*kho_radix_tree_walk_callback_t)(phys_addr_t phys,
- unsigned int order);
+ unsigned int order,
+ enum kho_page_type type);
#ifdef CONFIG_KEXEC_HANDOVER
int kho_radix_add_page(struct kho_radix_tree *tree, unsigned long pfn,
- unsigned int order);
+ unsigned int order, enum kho_page_type type);
void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn,
- unsigned int order);
+ unsigned int order, enum kho_page_type type);
int kho_radix_walk_tree(struct kho_radix_tree *tree,
kho_radix_tree_walk_callback_t cb);
@@ -51,13 +57,14 @@ int kho_radix_walk_tree(struct kho_radix_tree *tree,
#else /* #ifdef CONFIG_KEXEC_HANDOVER */
static inline int kho_radix_add_page(struct kho_radix_tree *tree, long pfn,
- unsigned int order)
+ unsigned int order, enum kho_page_type type)
{
return -EOPNOTSUPP;
}
static inline void kho_radix_del_page(struct kho_radix_tree *tree,
- unsigned long pfn, unsigned int order) { }
+ unsigned long pfn, unsigned int order,
+ enum kho_page_type type) { }
static inline int kho_radix_walk_tree(struct kho_radix_tree *tree,
kho_radix_tree_walk_callback_t cb)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 4834a809985a..f829ffdd00f4 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -43,18 +43,22 @@
/*
* KHO uses page->private, which is an unsigned long, to store page metadata.
- * Use it to store both the magic and the order.
+ * Use it to store the magic, the order, and the type bit.
*/
union kho_page_info {
unsigned long page_private;
struct {
- unsigned int order;
+ unsigned int order : 31;
+ unsigned int type : 1;
unsigned int magic;
};
};
static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private));
+#define KHO_KEY_TYPE_SHIFT 63
+#define KHO_KEY_TYPE_MASK BIT(KHO_KEY_TYPE_SHIFT)
+
static bool kho_enable __ro_after_init = IS_ENABLED(CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT);
bool kho_is_enabled(void)
@@ -85,42 +89,52 @@ static struct kho_out kho_out = {
};
/**
- * kho_radix_encode_key - Encodes a physical address and order into a radix key.
+ * kho_radix_encode_key - Encodes a physical address, order and type into a radix key.
* @phys: The physical address of the page.
* @order: The order of the page.
+ * @type: The page type.
*
- * This function combines a page's physical address and its order into a
+ * This function combines a page's physical address, its order, and its type into a
* single unsigned long, which is used as a key for all radix tree
* operations.
*
* Return: The encoded unsigned long radix key.
*/
-static unsigned long kho_radix_encode_key(phys_addr_t phys, unsigned int order)
+static unsigned long kho_radix_encode_key(phys_addr_t phys, unsigned int order,
+ enum kho_page_type type)
{
/* Order bits part */
unsigned long h = 1UL << (KHO_ORDER_0_LOG2 - order);
/* Shifted physical address part */
unsigned long l = phys >> (PAGE_SHIFT + order);
+ /* Type bit part */
+ unsigned long t = (unsigned long)type << KHO_KEY_TYPE_SHIFT;
- return h | l;
+ return h | l | t;
}
/**
- * kho_radix_decode_key - Decodes a radix key back into a physical address and order.
+ * kho_radix_decode_key - Decodes a radix key back into physical address, order, and type.
* @key: The unsigned long key to decode.
* @order: An output parameter, a pointer to an unsigned int where the decoded
* page order will be stored.
+ * @type: An output parameter, a pointer to where the decoded type will be stored.
*
* This function reverses the encoding performed by kho_radix_encode_key(),
- * extracting the original physical address and page order from a given key.
+ * extracting the original physical address, page order, and type from a given key.
*
* Return: The decoded physical address.
*/
-static phys_addr_t kho_radix_decode_key(unsigned long key, unsigned int *order)
+static phys_addr_t kho_radix_decode_key(unsigned long key, unsigned int *order,
+ enum kho_page_type *type)
{
- unsigned int order_bit = fls64(key);
+ unsigned int order_bit;
phys_addr_t phys;
+ *type = (key & KHO_KEY_TYPE_MASK) >> KHO_KEY_TYPE_SHIFT;
+ key &= ~KHO_KEY_TYPE_MASK;
+
+ order_bit = fls64(key);
/* order_bit is numbered starting at 1 from fls64 */
*order = KHO_ORDER_0_LOG2 - order_bit + 1;
/* The order is discarded by the shift */
@@ -148,6 +162,7 @@ static unsigned long kho_radix_get_table_index(unsigned long key,
* @tree: The KHO radix tree.
* @pfn: The page frame number of the page to preserve.
* @order: The order of the page.
+ * @type: The page type.
*
* This function traverses the radix tree based on the key derived from @pfn
* and @order. It sets the corresponding bit in the leaf bitmap to mark the
@@ -157,11 +172,12 @@ static unsigned long kho_radix_get_table_index(unsigned long key,
* Return: 0 on success, or a negative error code on failure.
*/
int kho_radix_add_page(struct kho_radix_tree *tree,
- unsigned long pfn, unsigned int order)
+ unsigned long pfn, unsigned int order,
+ enum kho_page_type type)
{
/* Newly allocated nodes for error cleanup */
struct kho_radix_node *intermediate_nodes[KHO_TREE_MAX_DEPTH] = { 0 };
- unsigned long key = kho_radix_encode_key(PFN_PHYS(pfn), order);
+ unsigned long key = kho_radix_encode_key(PFN_PHYS(pfn), order, type);
struct kho_radix_node *anchor_node = NULL;
struct kho_radix_node *node = tree->root;
struct kho_radix_node *new_node;
@@ -231,15 +247,16 @@ EXPORT_SYMBOL_GPL(kho_radix_add_page);
* @tree: The KHO radix tree.
* @pfn: The page frame number of the page to unpreserve.
* @order: The order of the page.
+ * @type: The page type.
*
* This function traverses the radix tree and clears the bit corresponding to
* the page, effectively removing its "preserved" status. It does not free
* the tree's intermediate nodes, even if they become empty.
*/
void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn,
- unsigned int order)
+ unsigned int order, enum kho_page_type type)
{
- unsigned long key = kho_radix_encode_key(PFN_PHYS(pfn), order);
+ unsigned long key = kho_radix_encode_key(PFN_PHYS(pfn), order, type);
struct kho_radix_node *node = tree->root;
struct kho_radix_leaf *leaf;
unsigned int i, idx;
@@ -277,14 +294,15 @@ static int kho_radix_walk_leaf(struct kho_radix_leaf *leaf,
kho_radix_tree_walk_callback_t cb)
{
unsigned long *bitmap = (unsigned long *)leaf;
+ enum kho_page_type type;
unsigned int order;
phys_addr_t phys;
unsigned int i;
int err;
for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) {
- phys = kho_radix_decode_key(key | i, &order);
- err = cb(phys, order);
+ phys = kho_radix_decode_key(key | i, &order, &type);
+ err = cb(phys, order, type);
if (err)
return err;
}
@@ -485,7 +503,8 @@ static struct page *__init kho_get_preserved_page(phys_addr_t phys,
}
static int __init kho_preserved_memory_reserve(phys_addr_t phys,
- unsigned int order)
+ unsigned int order,
+ enum kho_page_type type)
{
union kho_page_info info;
struct page *page;
@@ -499,6 +518,7 @@ static int __init kho_preserved_memory_reserve(phys_addr_t phys,
memblock_reserved_mark_noinit(phys, sz);
info.magic = KHO_PAGE_MAGIC;
info.order = order;
+ info.type = type;
page->private = info.page_private;
return 0;
@@ -859,7 +879,7 @@ int kho_preserve_folio(struct folio *folio)
if (WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order)))
return -EINVAL;
- return kho_radix_add_page(tree, pfn, order);
+ return kho_radix_add_page(tree, pfn, order, KHO_PAGE_CONTIG);
}
EXPORT_SYMBOL_GPL(kho_preserve_folio);
@@ -877,7 +897,7 @@ void kho_unpreserve_folio(struct folio *folio)
const unsigned long pfn = folio_pfn(folio);
const unsigned int order = folio_order(folio);
- kho_radix_del_page(tree, pfn, order);
+ kho_radix_del_page(tree, pfn, order, KHO_PAGE_CONTIG);
}
EXPORT_SYMBOL_GPL(kho_unpreserve_folio);
@@ -906,7 +926,7 @@ static void __kho_unpreserve(struct kho_radix_tree *tree,
while (pfn < end_pfn) {
order = __kho_preserve_pages_order(pfn, end_pfn);
- kho_radix_del_page(tree, pfn, order);
+ kho_radix_del_page(tree, pfn, order, KHO_PAGE_CONTIG);
pfn += 1 << order;
}
@@ -939,7 +959,7 @@ int kho_preserve_pages(struct page *page, unsigned long nr_pages)
while (pfn < end_pfn) {
unsigned int order = __kho_preserve_pages_order(pfn, end_pfn);
- err = kho_radix_add_page(tree, pfn, order);
+ err = kho_radix_add_page(tree, pfn, order, KHO_PAGE_CONTIG);
if (err) {
failed_pfn = pfn;
break;
--
2.55.0.rc0.799.gd6f94ed593-goog
* [RFC PATCH 2/4] kho: Detect preserved page types
From: Pranjal Shrivastava @ 2026-07-03 2:08 UTC
To: Mike Rapoport, Pasha Tatashin, Pratyush Yadav
Cc: Alexander Graf, Samiullah Khawaja, David Matlack, kexec, linux-mm,
linux-kernel, Pranjal Shrivastava
Detect page types in kho_preserve_pages() by peeking into the refcount
of the second page in each block of order > 0. If the second page has
a non-zero reference count, it indicates that the block has been split.
Otherwise, treat it as a contiguous, unsplit block.
Since the page type is now part of the radix tree key, update the
unpreserve path to use the same detection logic to ensure it can
correctly locate and delete existing entries.
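The detection rule can be sketched in userspace as follows. This is a
simplified model, not the kernel code: the refcounts live in a plain int
array indexed by pfn, and model_detect_type() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

enum kho_page_type { KHO_PAGE_CONTIG = 0, KHO_PAGE_SPLIT };

/*
 * Classify the block starting at @pfn by looking at the refcount of its
 * second page. @refcounts has @nr entries and is indexed by pfn.
 */
static enum kho_page_type model_detect_type(const int *refcounts, size_t nr,
					    size_t pfn, unsigned int order)
{
	/* The whole block must lie inside the modelled range. */
	assert(pfn + ((size_t)1 << order) <= nr);

	if (order > 0 && refcounts[pfn + 1] != 0)
		return KHO_PAGE_SPLIT;

	/* Order-0 blocks have no second page and are always CONTIG. */
	return KHO_PAGE_CONTIG;
}
```

Since the result feeds into the radix tree key, the preserve and unpreserve
calls for a block need to observe the same refcount state; otherwise the
delete computes a different key from the one that was inserted.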
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
kernel/liveupdate/kexec_handover.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index f829ffdd00f4..d6e81f72fe5d 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -918,15 +918,29 @@ static unsigned int __kho_preserve_pages_order(unsigned long start_pfn,
return order;
}
+static bool kho_is_page_split(unsigned long pfn, unsigned int order)
+{
+ /*
+ * If the refcount of the second page is non-zero, this block
+ * has been split.
+ */
+ if (order > 0 && page_ref_count(pfn_to_page(pfn + 1)) != 0)
+ return true;
+
+ return false;
+}
+
static void __kho_unpreserve(struct kho_radix_tree *tree,
unsigned long pfn, unsigned long end_pfn)
{
- unsigned int order;
-
while (pfn < end_pfn) {
- order = __kho_preserve_pages_order(pfn, end_pfn);
+ unsigned int order = __kho_preserve_pages_order(pfn, end_pfn);
+ enum kho_page_type type;
- kho_radix_del_page(tree, pfn, order, KHO_PAGE_CONTIG);
+ type = kho_is_page_split(pfn, order) ? KHO_PAGE_SPLIT :
+ KHO_PAGE_CONTIG;
+
+ kho_radix_del_page(tree, pfn, order, type);
pfn += 1 << order;
}
@@ -958,8 +972,12 @@ int kho_preserve_pages(struct page *page, unsigned long nr_pages)
while (pfn < end_pfn) {
unsigned int order = __kho_preserve_pages_order(pfn, end_pfn);
+ enum kho_page_type type;
+
+ type = kho_is_page_split(pfn, order) ? KHO_PAGE_SPLIT :
+ KHO_PAGE_CONTIG;
- err = kho_radix_add_page(tree, pfn, order, KHO_PAGE_CONTIG);
+ err = kho_radix_add_page(tree, pfn, order, type);
if (err) {
failed_pfn = pfn;
break;
--
2.55.0.rc0.799.gd6f94ed593-goog
* [RFC PATCH 3/4] kho: Implement page-aware refcount restoration
From: Pranjal Shrivastava @ 2026-07-03 2:08 UTC
To: Mike Rapoport, Pasha Tatashin, Pratyush Yadav
Cc: Alexander Graf, Samiullah Khawaja, David Matlack, kexec, linux-mm,
linux-kernel, Pranjal Shrivastava
The KHO restoration logic currently forces a refcount of 1 on every
page of a multi-page block. While that is correct for split pages, it
violates the expectations of the buddy allocator for high-order
non-compound pages allocated by kernel users (such as the DMA allocator),
where tail pages are expected to have a refcount of 0.
Update the restoration path to respect the preserved page type stored
in the page->private metadata. For KHO_PAGE_CONTIG blocks, only the
head page is given a reference count of 1. For KHO_PAGE_SPLIT blocks,
every page is given a reference count of 1.
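A compact userspace sketch of the restore rule, including the sanity check
that rejects restoring a split block as a folio. The names are
hypothetical, refcounts are modelled as a plain int array, and the -1
return stands in for the NULL that kho_restore_page() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum kho_page_type { KHO_PAGE_CONTIG = 0, KHO_PAGE_SPLIT };

/*
 * Apply the refcount pattern for a preserved block of 2^order pages.
 * Returns 0 on success, or -1 if a folio restore is requested for memory
 * that was recorded as split.
 */
static int model_restore_block(int *refcounts, unsigned int order,
			       enum kho_page_type type, bool is_folio)
{
	size_t nr_pages = (size_t)1 << order;

	assert(refcounts != NULL);

	if (is_folio && type == KHO_PAGE_SPLIT)
		return -1;

	/* The head page always holds one reference. */
	refcounts[0] = 1;

	/* Tail pages are refcounted only if the block had been split. */
	for (size_t i = 1; i < nr_pages; i++)
		refcounts[i] = (type == KHO_PAGE_SPLIT) ? 1 : 0;

	return 0;
}
```

The rejected case leaves the refcounts untouched in this model, so a caller
can tell that nothing was restored.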
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
kernel/liveupdate/kexec_handover.c | 35 +++++++++++++++++-------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index d6e81f72fe5d..f6ca5e24c740 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -375,11 +375,18 @@ int kho_radix_walk_tree(struct kho_radix_tree *tree,
}
EXPORT_SYMBOL_GPL(kho_radix_walk_tree);
-/* For physically contiguous 0-order pages. */
-static void kho_init_pages(struct page *page, unsigned long nr_pages)
+/* For physically contiguous pages. */
+static void kho_restore_refcounts(struct page *page, unsigned long nr_pages,
+ enum kho_page_type type)
{
- for (unsigned long i = 0; i < nr_pages; i++) {
- set_page_count(page + i, 1);
+ /* Head page always gets refcount of 1. */
+ set_page_count(page, 1);
+ clear_page_tag_ref(page);
+
+ for (unsigned long i = 1; i < nr_pages; i++) {
+ unsigned int count = (type == KHO_PAGE_SPLIT) ? 1 : 0;
+
+ set_page_count(page + i, count);
/* Clear each page's codetag to avoid accounting mismatch. */
clear_page_tag_ref(page + i);
}
@@ -387,16 +394,7 @@ static void kho_init_pages(struct page *page, unsigned long nr_pages)
static void kho_init_folio(struct page *page, unsigned int order)
{
- unsigned long nr_pages = (1 << order);
-
- /* Head page gets refcount of 1. */
- set_page_count(page, 1);
- /* Clear head page's codetag to avoid accounting mismatch. */
- clear_page_tag_ref(page);
-
- /* For higher order folios, tail pages get a page count of zero. */
- for (unsigned long i = 1; i < nr_pages; i++)
- set_page_count(page + i, 0);
+ kho_restore_refcounts(page, 1 << order, KHO_PAGE_CONTIG);
if (order > 0)
prep_compound_page(page, order);
@@ -421,13 +419,20 @@ static struct page *kho_restore_page(phys_addr_t phys, bool is_folio)
return NULL;
nr_pages = (1 << info.order);
+ /*
+ * If we want to restore a folio, but the memory was split in the
+ * previous kernel, something is wrong.
+ */
+ if (WARN_ON_ONCE(is_folio && info.type == KHO_PAGE_SPLIT))
+ return NULL;
+
/* Clear private to make sure later restores on this page error out. */
page->private = 0;
if (is_folio)
kho_init_folio(page, info.order);
else
- kho_init_pages(page, nr_pages);
+ kho_restore_refcounts(page, nr_pages, info.type);
adjust_managed_page_count(page, nr_pages);
return page;
--
2.55.0.rc0.799.gd6f94ed593-goog
* [RFC PATCH 4/4] kho: Introduce kho_split_preserved_pages() helper
From: Pranjal Shrivastava @ 2026-07-03 2:08 UTC
To: Mike Rapoport, Pasha Tatashin, Pratyush Yadav
Cc: Alexander Graf, Samiullah Khawaja, David Matlack, kexec, linux-mm,
linux-kernel, Pranjal Shrivastava
A driver may need to split a high-order allocation that has already
been preserved. If the pages are split using split_page() manually,
the refcounts would change but KHO won't record the change in the
preserved page-type, resulting in a metadata mismatch during
restoration in the new kernel.
Introduce kho_split_preserved_pages() to handle splitting of preserved
pages. The helper follows an unpreserve -> split -> re-preserve sequence,
while ensuring that the KHO radix tree is updated with the correct
KHO_PAGE_SPLIT type bits.
The helper returns 0 on success, or a negative error code if the
re-preservation fails. Callers must ensure the provided order matches
the original allocation and that the operation is serialized against
other preservation API calls.
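The ordering matters because the type is detected from the live refcounts
at both preserve and unpreserve time. The userspace sketch below models
this with a tiny table in place of the radix tree. All names are
hypothetical, and the re-preserve is modelled as a single entry of the same
order, which is an assumption about how kho_preserve_pages() chunks the
range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum kho_page_type { KHO_PAGE_CONTIG = 0, KHO_PAGE_SPLIT };

#define MODEL_NR_PAGES	8
#define MODEL_ENTRIES	4

struct model_entry {
	bool used;
	size_t pfn;
	unsigned int order;
	enum kho_page_type type;
};

struct model {
	int refcount[MODEL_NR_PAGES];
	struct model_entry tree[MODEL_ENTRIES];	/* stand-in for the radix tree */
};

static enum kho_page_type model_detect(const struct model *m, size_t pfn,
				       unsigned int order)
{
	return (order > 0 && m->refcount[pfn + 1] != 0) ?
		KHO_PAGE_SPLIT : KHO_PAGE_CONTIG;
}

static int model_preserve(struct model *m, size_t pfn, unsigned int order)
{
	for (size_t i = 0; i < MODEL_ENTRIES; i++) {
		if (!m->tree[i].used) {
			m->tree[i] = (struct model_entry){
				true, pfn, order, model_detect(m, pfn, order)
			};
			return 0;
		}
	}
	return -1;
}

/* Returns true if a matching entry (including the type) was found and removed. */
static bool model_unpreserve(struct model *m, size_t pfn, unsigned int order)
{
	enum kho_page_type type = model_detect(m, pfn, order);

	for (size_t i = 0; i < MODEL_ENTRIES; i++) {
		struct model_entry *e = &m->tree[i];

		if (e->used && e->pfn == pfn && e->order == order &&
		    e->type == type) {
			e->used = false;
			return true;
		}
	}
	return false;
}

/* Models split_page(): every tail page gains its own reference. */
static void model_split_page(struct model *m, size_t pfn, unsigned int order)
{
	for (size_t i = 1; i < ((size_t)1 << order); i++)
		m->refcount[pfn + i] = 1;
}

/* Models the helper: unpreserve -> split -> re-preserve. */
static int model_split_preserved(struct model *m, size_t pfn, unsigned int order)
{
	model_unpreserve(m, pfn, order);
	model_split_page(m, pfn, order);
	return model_preserve(m, pfn, order);
}
```

In this model, calling model_split_page() directly on a preserved block
leaves a stale CONTIG entry that a later model_unpreserve() can no longer
find, whereas going through model_split_preserved() keeps the table and the
refcounts in agreement.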
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
include/linux/kexec_handover.h | 7 +++++++
kernel/liveupdate/kexec_handover.c | 23 +++++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
index 8968c56d2d73..452e38bb2076 100644
--- a/include/linux/kexec_handover.h
+++ b/include/linux/kexec_handover.h
@@ -24,6 +24,7 @@ int kho_preserve_folio(struct folio *folio);
void kho_unpreserve_folio(struct folio *folio);
int kho_preserve_pages(struct page *page, unsigned long nr_pages);
void kho_unpreserve_pages(struct page *page, unsigned long nr_pages);
+int kho_split_preserved_pages(struct page *page, unsigned int order);
int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation);
void kho_unpreserve_vmalloc(struct kho_vmalloc *preservation);
void *kho_alloc_preserve(size_t size);
@@ -65,6 +66,12 @@ static inline int kho_preserve_pages(struct page *page, unsigned int nr_pages)
static inline void kho_unpreserve_pages(struct page *page, unsigned int nr_pages) { }
+static inline int kho_split_preserved_pages(struct page *page,
+ unsigned int order)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int kho_preserve_vmalloc(void *ptr,
struct kho_vmalloc *preservation)
{
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index f6ca5e24c740..ea08248901b5 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -1018,6 +1018,29 @@ void kho_unpreserve_pages(struct page *page, unsigned long nr_pages)
}
EXPORT_SYMBOL_GPL(kho_unpreserve_pages);
+/**
+ * kho_split_preserved_pages - split contiguous pages that are preserved
+ * @page: first page in the list.
+ * @order: the order of the original allocation.
+ *
+ * This function splits a high-order allocation that has been
+ * preserved across kexec. It unpreserves the pages, splits them using
+ * split_page() and then re-preserves them as individual pages.
+ *
+ * This function MUST only be called on pages that are currently preserved.
+ * The @order provided MUST match the order used during the initial
+ * preservation.
+ *
+ * Return: 0 on success, or a negative error code on failure.
+ */
+int kho_split_preserved_pages(struct page *page, unsigned int order)
+{
+ kho_unpreserve_pages(page, 1UL << order);
+ split_page(page, order);
+ return kho_preserve_pages(page, 1UL << order);
+}
+EXPORT_SYMBOL_GPL(kho_split_preserved_pages);
+
/* vmalloc flags KHO supports */
#define KHO_VMALLOC_SUPPORTED_FLAGS (VM_ALLOC | VM_ALLOW_HUGE_VMAP)
--
2.55.0.rc0.799.gd6f94ed593-goog