From: Jork Loeser <jloeser@linux.microsoft.com>
To: linux-hyperv@vger.kernel.org, linux-mm@kvack.org,
kexec@lists.infradead.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
Long Li <longli@microsoft.com>, Mike Rapoport <rppt@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Pratyush Yadav <pratyush@kernel.org>,
Alexander Graf <graf@amazon.com>, Jason Miu <jasonmiu@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>, Baoquan He <bhe@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Thomas Gleixner <tglx@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>, Kees Cook <kees@kernel.org>,
Ran Xiaokai <ran.xiaokai@zte.com.cn>,
Justinien Bouron <jbouron@amazon.com>,
Sourabh Jain <sourabhjain@linux.ibm.com>,
Pingfan Liu <piliu@redhat.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Mario Limonciello <mario.limonciello@amd.com>,
linux-arm-kernel@lists.infradead.org, x86@kernel.org,
linux-kernel@vger.kernel.org,
Michael Kelley <mhklinux@outlook.com>,
Jork Loeser <jloeser@linux.microsoft.com>
Subject: [RFC PATCH 01/20] kho: generalize radix tree APIs
Date: Wed, 27 May 2026 17:41:43 -0700 [thread overview]
Message-ID: <20260528004204.1484584-2-jloeser@linux.microsoft.com> (raw)
In-Reply-To: <20260528004204.1484584-1-jloeser@linux.microsoft.com>
From: "Pratyush Yadav (Google)" <pratyush@kernel.org>
The KHO radix tree is a data structure that can track the presence or
absence of an arbitrary key, with nothing inherently tied to KHO memory
preservation tracking. This was one of the design goals of the radix
tree. This was done to enable it to be re-used by other users of KHO.
Despite that, the radix tree APIs are very closely tied to KHO memory
preservation tracking. Adding a key is done by kho_radix_add_page(),
which encodes it as a page tracking operation and takes in PFN and
order. kho_radix_del_page() does the same. These functions encode the
key internally that goes into the radix tree. kho_radix_walk_tree() does
the same by baking the PFN and order into the callback arguments.
Generalize the APIs by taking the key directly and doing the encoding at
the callers. Rename the functions to kho_radix_add_key() and
kho_radix_del_key(). In practice, this removes a line each from the
functions and moves the encoding function call to the callers.
Similarly, update kho_radix_tree_walk_callback_t to take the key
directly.
To keep the naming convention clearer, rename
kho_radix_{encode,decode}_key() to kho_{encode,decode}_radix_key().
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
---
include/linux/kho_radix_tree.h | 18 ++---
kernel/liveupdate/kexec_handover.c | 119 ++++++++++++++---------------
2 files changed, 63 insertions(+), 74 deletions(-)
diff --git a/include/linux/kho_radix_tree.h b/include/linux/kho_radix_tree.h
index 84e918b96e53..f368f3b9f923 100644
--- a/include/linux/kho_radix_tree.h
+++ b/include/linux/kho_radix_tree.h
@@ -34,30 +34,24 @@ struct kho_radix_tree {
struct mutex lock; /* protects the tree's structure and root pointer */
};
-typedef int (*kho_radix_tree_walk_callback_t)(phys_addr_t phys,
- unsigned int order);
+typedef int (*kho_radix_tree_walk_callback_t)(unsigned long key);
#ifdef CONFIG_KEXEC_HANDOVER
-int kho_radix_add_page(struct kho_radix_tree *tree, unsigned long pfn,
- unsigned int order);
-
-void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn,
- unsigned int order);
-
+int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key);
+void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key);
int kho_radix_walk_tree(struct kho_radix_tree *tree,
kho_radix_tree_walk_callback_t cb);
#else /* #ifdef CONFIG_KEXEC_HANDOVER */
-static inline int kho_radix_add_page(struct kho_radix_tree *tree, long pfn,
- unsigned int order)
+static inline int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key)
{
return -EOPNOTSUPP;
}
-static inline void kho_radix_del_page(struct kho_radix_tree *tree,
- unsigned long pfn, unsigned int order) { }
+static inline void kho_radix_del_key(struct kho_radix_tree *tree,
+ unsigned long key) { }
static inline int kho_radix_walk_tree(struct kho_radix_tree *tree,
kho_radix_tree_walk_callback_t cb)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 4834a809985a..05a6eb56e176 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -85,7 +85,7 @@ static struct kho_out kho_out = {
};
/**
- * kho_radix_encode_key - Encodes a physical address and order into a radix key.
+ * kho_encode_radix_key - Encodes a physical address and order into a radix key.
* @phys: The physical address of the page.
* @order: The order of the page.
*
@@ -95,7 +95,7 @@ static struct kho_out kho_out = {
*
* Return: The encoded unsigned long radix key.
*/
-static unsigned long kho_radix_encode_key(phys_addr_t phys, unsigned int order)
+static unsigned long kho_encode_radix_key(phys_addr_t phys, unsigned int order)
{
/* Order bits part */
unsigned long h = 1UL << (KHO_ORDER_0_LOG2 - order);
@@ -106,17 +106,17 @@ static unsigned long kho_radix_encode_key(phys_addr_t phys, unsigned int order)
}
/**
- * kho_radix_decode_key - Decodes a radix key back into a physical address and order.
+ * kho_decode_radix_key - Decodes a radix key back into a physical address and order.
* @key: The unsigned long key to decode.
* @order: An output parameter, a pointer to an unsigned int where the decoded
* page order will be stored.
*
- * This function reverses the encoding performed by kho_radix_encode_key(),
+ * This function reverses the encoding performed by kho_encode_radix_key(),
* extracting the original physical address and page order from a given key.
*
* Return: The decoded physical address.
*/
-static phys_addr_t kho_radix_decode_key(unsigned long key, unsigned int *order)
+static phys_addr_t kho_decode_radix_key(unsigned long key, unsigned int *order)
{
unsigned int order_bit = fls64(key);
phys_addr_t phys;
@@ -144,24 +144,21 @@ static unsigned long kho_radix_get_table_index(unsigned long key,
}
/**
- * kho_radix_add_page - Marks a page as preserved in the radix tree.
+ * kho_radix_add_key - Add a key to the radix tree.
* @tree: The KHO radix tree.
- * @pfn: The page frame number of the page to preserve.
- * @order: The order of the page.
+ * @key: The key to add.
*
- * This function traverses the radix tree based on the key derived from @pfn
- * and @order. It sets the corresponding bit in the leaf bitmap to mark the
- * page for preservation. If intermediate nodes do not exist along the path,
- * they are allocated and added to the tree.
+ * This function traverses the radix tree based on the key provided. It sets the
+ * corresponding bit in the leaf bitmap to mark the key as present. If
+ * intermediate nodes do not exist along the path, they are allocated and added
+ * to the tree.
*
* Return: 0 on success, or a negative error code on failure.
*/
-int kho_radix_add_page(struct kho_radix_tree *tree,
- unsigned long pfn, unsigned int order)
+int kho_radix_add_key(struct kho_radix_tree *tree, unsigned long key)
{
/* Newly allocated nodes for error cleanup */
struct kho_radix_node *intermediate_nodes[KHO_TREE_MAX_DEPTH] = { 0 };
- unsigned long key = kho_radix_encode_key(PFN_PHYS(pfn), order);
struct kho_radix_node *anchor_node = NULL;
struct kho_radix_node *node = tree->root;
struct kho_radix_node *new_node;
@@ -224,22 +221,19 @@ int kho_radix_add_page(struct kho_radix_tree *tree,
return err;
}
-EXPORT_SYMBOL_GPL(kho_radix_add_page);
+EXPORT_SYMBOL_GPL(kho_radix_add_key);
/**
- * kho_radix_del_page - Removes a page's preservation status from the radix tree.
+ * kho_radix_del_key - Removes the key from the radix tree.
* @tree: The KHO radix tree.
- * @pfn: The page frame number of the page to unpreserve.
- * @order: The order of the page.
+ * @key: The key to remove.
*
* This function traverses the radix tree and clears the bit corresponding to
- * the page, effectively removing its "preserved" status. It does not free
- * the tree's intermediate nodes, even if they become empty.
+ * the key, effectively removing it from the tree. It does not free the tree's
+ * intermediate nodes, even if they become empty.
*/
-void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn,
- unsigned int order)
+void kho_radix_del_key(struct kho_radix_tree *tree, unsigned long key)
{
- unsigned long key = kho_radix_encode_key(PFN_PHYS(pfn), order);
struct kho_radix_node *node = tree->root;
struct kho_radix_leaf *leaf;
unsigned int i, idx;
@@ -270,21 +264,18 @@ void kho_radix_del_page(struct kho_radix_tree *tree, unsigned long pfn,
idx = kho_radix_get_bitmap_index(key);
__clear_bit(idx, leaf->bitmap);
}
-EXPORT_SYMBOL_GPL(kho_radix_del_page);
+EXPORT_SYMBOL_GPL(kho_radix_del_key);
static int kho_radix_walk_leaf(struct kho_radix_leaf *leaf,
unsigned long key,
kho_radix_tree_walk_callback_t cb)
{
unsigned long *bitmap = (unsigned long *)leaf;
- unsigned int order;
- phys_addr_t phys;
unsigned int i;
int err;
for_each_set_bit(i, bitmap, PAGE_SIZE * BITS_PER_BYTE) {
- phys = kho_radix_decode_key(key | i, &order);
- err = cb(phys, order);
+ err = cb(key | i);
if (err)
return err;
}
@@ -332,15 +323,14 @@ static int __kho_radix_walk_tree(struct kho_radix_node *root,
}
/**
- * kho_radix_walk_tree - Traverses the radix tree and calls a callback for each preserved page.
+ * kho_radix_walk_tree - Traverses the radix tree and calls a callback for each key.
* @tree: A pointer to the KHO radix tree to walk.
* @cb: A callback function of type kho_radix_tree_walk_callback_t that will be
- * invoked for each preserved page found in the tree. The callback receives
- * the physical address and order of the preserved page.
+ * invoked for each key in the tree.
*
* This function walks the radix tree, searching from the specified top level
- * down to the lowest level (level 0). For each preserved page found, it invokes
- * the provided callback, passing the page's physical address and order.
+ * down to the lowest level (level 0). For each key found, it invokes the
+ * provided callback.
*
* Return: 0 if the walk completed the specified tree, or the non-zero return
* value from the callback that stopped the walk.
@@ -484,13 +474,16 @@ static struct page *__init kho_get_preserved_page(phys_addr_t phys,
return pfn_to_page(pfn);
}
-static int __init kho_preserved_memory_reserve(phys_addr_t phys,
- unsigned int order)
+static int __init kho_preserved_memory_reserve(unsigned long key)
{
union kho_page_info info;
struct page *page;
+ unsigned int order;
+ phys_addr_t phys;
u64 sz;
+ phys = kho_decode_radix_key(key, &order);
+
sz = 1 << (order + PAGE_SHIFT);
page = kho_get_preserved_page(phys, order);
@@ -618,30 +611,20 @@ early_param("kho_scratch", kho_parse_scratch_size);
static void __init scratch_size_update(void)
{
- /*
- * If fixed sizes are not provided via command line, calculate them
- * now.
- */
- if (scratch_scale) {
- phys_addr_t size;
+ phys_addr_t size;
- size = memblock_reserved_kern_size(ARCH_LOW_ADDRESS_LIMIT,
- NUMA_NO_NODE);
- size = size * scratch_scale / 100;
- scratch_size_lowmem = size;
+ if (!scratch_scale)
+ return;
- size = memblock_reserved_kern_size(MEMBLOCK_ALLOC_ANYWHERE,
- NUMA_NO_NODE);
- size = size * scratch_scale / 100 - scratch_size_lowmem;
- scratch_size_global = size;
- }
+ size = memblock_reserved_kern_size(ARCH_LOW_ADDRESS_LIMIT,
+ NUMA_NO_NODE);
+ size = size * scratch_scale / 100;
+ scratch_size_lowmem = round_up(size, CMA_MIN_ALIGNMENT_BYTES);
- /*
- * Scratch areas are released as MIGRATE_CMA. Round them up to the right
- * size.
- */
- scratch_size_lowmem = round_up(scratch_size_lowmem, CMA_MIN_ALIGNMENT_BYTES);
- scratch_size_global = round_up(scratch_size_global, CMA_MIN_ALIGNMENT_BYTES);
+ size = memblock_reserved_kern_size(MEMBLOCK_ALLOC_ANYWHERE,
+ NUMA_NO_NODE);
+ size = size * scratch_scale / 100 - scratch_size_lowmem;
+ scratch_size_global = round_up(size, CMA_MIN_ALIGNMENT_BYTES);
}
static phys_addr_t __init scratch_size_node(int nid)
@@ -859,7 +842,8 @@ int kho_preserve_folio(struct folio *folio)
if (WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order)))
return -EINVAL;
- return kho_radix_add_page(tree, pfn, order);
+ return kho_radix_add_key(tree, kho_encode_radix_key(PFN_PHYS(pfn),
+ order));
}
EXPORT_SYMBOL_GPL(kho_preserve_folio);
@@ -877,7 +861,7 @@ void kho_unpreserve_folio(struct folio *folio)
const unsigned long pfn = folio_pfn(folio);
const unsigned int order = folio_order(folio);
- kho_radix_del_page(tree, pfn, order);
+ kho_radix_del_key(tree, kho_encode_radix_key(PFN_PHYS(pfn), order));
}
EXPORT_SYMBOL_GPL(kho_unpreserve_folio);
@@ -906,7 +890,8 @@ static void __kho_unpreserve(struct kho_radix_tree *tree,
while (pfn < end_pfn) {
order = __kho_preserve_pages_order(pfn, end_pfn);
- kho_radix_del_page(tree, pfn, order);
+ kho_radix_del_key(tree, kho_encode_radix_key(PFN_PHYS(pfn),
+ order));
pfn += 1 << order;
}
@@ -937,9 +922,19 @@ int kho_preserve_pages(struct page *page, unsigned long nr_pages)
}
while (pfn < end_pfn) {
- unsigned int order = __kho_preserve_pages_order(pfn, end_pfn);
+ unsigned int order =
+ min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn));
+
+ /*
+ * Make sure all the pages in a single preservation are in the
+ * same NUMA node. The restore machinery can not cope with a
+ * preservation spanning multiple NUMA nodes.
+ */
+ while (pfn_to_nid(pfn) != pfn_to_nid(pfn + (1UL << order) - 1))
+ order--;
- err = kho_radix_add_page(tree, pfn, order);
+ err = kho_radix_add_key(tree, kho_encode_radix_key(PFN_PHYS(pfn),
+ order));
if (err) {
failed_pfn = pfn;
break;
--
2.43.0
next prev parent reply other threads:[~2026-05-28 0:42 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-28 0:41 [RFC PATCH 00/20] mshv: enable kexec with Hyper-V donated pages and partitions Jork Loeser
2026-05-28 0:41 ` Jork Loeser [this message]
2026-05-28 0:41 ` [RFC PATCH 02/20] kho: store incoming radix tree in kho_in Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 03/20] kho: add a struct for radix callbacks Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 04/20] kho: add callback for table pages Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 05/20] kho: add data argument to radix walk callback Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 06/20] kho: allow early-boot usage of the KHO radix tree Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 07/20] kho: allow destroying " Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 08/20] kho: add kho_radix_init_tree() Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 09/20] memblock: introduce MEMBLOCK_KHO_SCRATCH_EXT Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 10/20] kho: extended scratch Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 11/20] kho: return virtual address of mem_map Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 12/20] mm/hugetlb: make bootmem allocation work with KHO Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 13/20] kho: add radix tree freeze and del_key() error reporting Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 14/20] kho: Add crash-kernel-safe radix tree presence check Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 15/20] mshv: Use page tracker to manage MSHV-owned pages and preserve with KHO Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 16/20] mshv: Add debugfs interface to page tracker Jork Loeser
2026-05-28 0:41 ` [RFC PATCH 17/20] hyperv: Reserve crash MSR P2 for page preservation root PA Jork Loeser
2026-05-28 0:42 ` [RFC PATCH 18/20] mshv: Exclude Hyper-V donated pages from crash dump collection Jork Loeser
2026-05-28 0:42 ` [RFC PATCH 19/20] kexec: export kexec_in_progress for modules Jork Loeser
2026-05-28 0:42 ` [RFC PATCH 20/20] mshv: freeze and vacuum partitions across kexec Jork Loeser
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260528004204.1484584-2-jloeser@linux.microsoft.com \
--to=jloeser@linux.microsoft.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=decui@microsoft.com \
--cc=graf@amazon.com \
--cc=haiyangz@microsoft.com \
--cc=hpa@zytor.com \
--cc=jasonmiu@google.com \
--cc=jbouron@amazon.com \
--cc=kees@kernel.org \
--cc=kexec@lists.infradead.org \
--cc=kys@microsoft.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longli@microsoft.com \
--cc=mario.limonciello@amd.com \
--cc=mhklinux@outlook.com \
--cc=mingo@redhat.com \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
--cc=pasha.tatashin@soleen.com \
--cc=piliu@redhat.com \
--cc=pratyush@kernel.org \
--cc=rafael.j.wysocki@intel.com \
--cc=ran.xiaokai@zte.com.cn \
--cc=rppt@kernel.org \
--cc=sourabhjain@linux.ibm.com \
--cc=tglx@kernel.org \
--cc=wei.liu@kernel.org \
--cc=will@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox