* [PATCH v7 0/3] kho: add support for deferred struct page init
@ 2026-03-17 14:15 Michal Clapinski
From: Michal Clapinski @ 2026-03-17 14:15 UTC (permalink / raw)
To: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport,
Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec,
linux-mm
Cc: linux-kernel, Andrew Morton, Michal Clapinski
When CONFIG_DEFERRED_STRUCT_PAGE_INIT (hereinafter DEFERRED) is
enabled, struct page initialization is deferred to parallel kthreads
that run later in the boot process.
Currently, KHO is incompatible with DEFERRED.
This series fixes that incompatibility.
---
v7:
- reimplemented the initialization of kho scratch again
v6:
- reimplemented the initialization of kho scratch
v5:
- rebased
v4:
- added a new commit to fix deferred init of kho scratch
- switched to ulong when referring to pfn
v3:
- changed commit msg
- don't invoke early_pfn_to_nid if CONFIG_DEFERRED_STRUCT_PAGE_INIT=n
v2:
- updated a comment
I took Evangelos's test code:
https://git.infradead.org/?p=users/vpetrog/linux.git;a=shortlog;h=refs/heads/kho-deferred-struct-page-init
and then modified it into this monster test that does 2 allocations:
one at core_initcall (early) and one at module_init (late). Then kexec,
then 2 more allocations at the same points, then restore the original 2,
then kexec, then restore the other 2. Basically I test preservation of
early and late allocations both on cold and on warm boot.
Tested it both with and without DEFERRED.
Tested it both with and without DEFERRED.
Evangelos Petrongonas (1):
kho: make preserved pages compatible with deferred struct page init
Michal Clapinski (2):
kho: make kho_scratch_overlap usable outside debugging
kho: fix deferred init of kho scratch
include/linux/kexec_handover.h | 6 ++
include/linux/memblock.h | 2 -
kernel/liveupdate/Kconfig | 2 -
kernel/liveupdate/Makefile | 1 -
kernel/liveupdate/kexec_handover.c | 65 ++++++++++++++++++---
kernel/liveupdate/kexec_handover_debug.c | 25 --------
kernel/liveupdate/kexec_handover_internal.h | 7 ++-
mm/memblock.c | 22 -------
mm/page_alloc.c | 7 +++
9 files changed, 74 insertions(+), 63 deletions(-)
delete mode 100644 kernel/liveupdate/kexec_handover_debug.c
--
2.53.0.851.ga537e3e6e9-goog

* [PATCH v7 1/3] kho: make kho_scratch_overlap usable outside debugging
From: Michal Clapinski @ 2026-03-17 14:15 UTC (permalink / raw)
To: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport,
	Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec,
	linux-mm
Cc: linux-kernel, Andrew Morton, Michal Clapinski

Also return false if kho_scratch is NULL.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
---
 include/linux/kexec_handover.h              |  6 +++++
 kernel/liveupdate/Makefile                  |  1 -
 kernel/liveupdate/kexec_handover.c          | 28 ++++++++++++++++++---
 kernel/liveupdate/kexec_handover_debug.c    | 25 ------------------
 kernel/liveupdate/kexec_handover_internal.h |  7 ++++--
 5 files changed, 35 insertions(+), 32 deletions(-)
 delete mode 100644 kernel/liveupdate/kexec_handover_debug.c

diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
index ac4129d1d741..6a0e572c3adc 100644
--- a/include/linux/kexec_handover.h
+++ b/include/linux/kexec_handover.h
@@ -35,6 +35,7 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation);
 int kho_add_subtree(const char *name, void *fdt);
 void kho_remove_subtree(void *fdt);
 int kho_retrieve_subtree(const char *name, phys_addr_t *phys);
+bool kho_scratch_overlap(phys_addr_t phys, size_t size);

 void kho_memory_init(void);

@@ -109,6 +110,11 @@ static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
 	return -EOPNOTSUPP;
 }

+static inline bool kho_scratch_overlap(phys_addr_t phys, size_t size)
+{
+	return false;
+}
+
 static inline void kho_memory_init(void) { }

 static inline void kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index d2f779cbe279..dc352839ccf0 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -7,7 +7,6 @@ luo-y := \
 	luo_session.o

 obj-$(CONFIG_KEXEC_HANDOVER)		+= kexec_handover.o
-obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)	+= kexec_handover_debug.o
 obj-$(CONFIG_KEXEC_HANDOVER_DEBUGFS)	+= kexec_handover_debugfs.o

 obj-$(CONFIG_LIVEUPDATE)		+= luo.o
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 532f455c5d4f..c9b982372d6e 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -820,7 +820,8 @@ int kho_preserve_folio(struct folio *folio)
 	const unsigned long pfn = folio_pfn(folio);
 	const unsigned int order = folio_order(folio);

-	if (WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order)))
+	if (WARN_ON(kho_scratch_overlap_debug(pfn << PAGE_SHIFT,
+					      PAGE_SIZE << order)))
 		return -EINVAL;

 	return kho_radix_add_page(tree, pfn, order);
@@ -864,10 +865,9 @@ int kho_preserve_pages(struct page *page, unsigned long nr_pages)
 	unsigned long failed_pfn = 0;
 	int err = 0;

-	if (WARN_ON(kho_scratch_overlap(start_pfn << PAGE_SHIFT,
-					nr_pages << PAGE_SHIFT))) {
+	if (WARN_ON(kho_scratch_overlap_debug(start_pfn << PAGE_SHIFT,
+					      nr_pages << PAGE_SHIFT)))
 		return -EINVAL;
-	}

 	while (pfn < end_pfn) {
 		unsigned int order =
@@ -1327,6 +1327,26 @@ int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
 }
 EXPORT_SYMBOL_GPL(kho_retrieve_subtree);

+bool kho_scratch_overlap(phys_addr_t phys, size_t size)
+{
+	phys_addr_t scratch_start, scratch_end;
+	unsigned int i;
+
+	if (!kho_scratch)
+		return false;
+
+	for (i = 0; i < kho_scratch_cnt; i++) {
+		scratch_start = kho_scratch[i].addr;
+		scratch_end = kho_scratch[i].addr + kho_scratch[i].size;
+
+		if (phys < scratch_end && (phys + size) > scratch_start)
+			return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(kho_scratch_overlap);
+
 static int __init kho_mem_retrieve(const void *fdt)
 {
 	struct kho_radix_tree tree;
diff --git a/kernel/liveupdate/kexec_handover_debug.c b/kernel/liveupdate/kexec_handover_debug.c
deleted file mode 100644
index 6efb696f5426..000000000000
--- a/kernel/liveupdate/kexec_handover_debug.c
+++ /dev/null
@@ -1,25 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * kexec_handover_debug.c - kexec handover optional debug functionality
- * Copyright (C) 2025 Google LLC, Pasha Tatashin <pasha.tatashin@soleen.com>
- */
-
-#define pr_fmt(fmt) "KHO: " fmt
-
-#include "kexec_handover_internal.h"
-
-bool kho_scratch_overlap(phys_addr_t phys, size_t size)
-{
-	phys_addr_t scratch_start, scratch_end;
-	unsigned int i;
-
-	for (i = 0; i < kho_scratch_cnt; i++) {
-		scratch_start = kho_scratch[i].addr;
-		scratch_end = kho_scratch[i].addr + kho_scratch[i].size;
-
-		if (phys < scratch_end && (phys + size) > scratch_start)
-			return true;
-	}
-
-	return false;
-}
diff --git a/kernel/liveupdate/kexec_handover_internal.h b/kernel/liveupdate/kexec_handover_internal.h
index 9a832a35254c..804d6a1209b8 100644
--- a/kernel/liveupdate/kexec_handover_internal.h
+++ b/kernel/liveupdate/kexec_handover_internal.h
@@ -41,9 +41,12 @@ static inline void kho_debugfs_fdt_remove(struct kho_debugfs *dbg,
 #endif /* CONFIG_KEXEC_HANDOVER_DEBUGFS */

 #ifdef CONFIG_KEXEC_HANDOVER_DEBUG
-bool kho_scratch_overlap(phys_addr_t phys, size_t size);
+static inline bool kho_scratch_overlap_debug(phys_addr_t phys, size_t size)
+{
+	return kho_scratch_overlap(phys, size);
+}
 #else
-static inline bool kho_scratch_overlap(phys_addr_t phys, size_t size)
+static inline bool kho_scratch_overlap_debug(phys_addr_t phys, size_t size)
 {
 	return false;
 }
--
2.53.0.851.ga537e3e6e9-goog

* Re: [PATCH v7 1/3] kho: make kho_scratch_overlap usable outside debugging
From: Mike Rapoport @ 2026-03-18 9:16 UTC (permalink / raw)
To: Michal Clapinski
Cc: Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav,
	Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel,
	Andrew Morton

Hi Michal,

On Tue, Mar 17, 2026 at 03:15:32PM +0100, Michal Clapinski wrote:
> Also return false if kho_scratch is NULL.
>
> Signed-off-by: Michal Clapinski <mclapinski@google.com>
> ---
>  include/linux/kexec_handover.h              |  6 +++++
>  kernel/liveupdate/Makefile                  |  1 -
>  kernel/liveupdate/kexec_handover.c          | 28 ++++++++++++++++++---
>  kernel/liveupdate/kexec_handover_debug.c    | 25 ------------------
>  kernel/liveupdate/kexec_handover_internal.h |  7 ++++--
>  5 files changed, 35 insertions(+), 32 deletions(-)
>  delete mode 100644 kernel/liveupdate/kexec_handover_debug.c
>
> diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h
> index ac4129d1d741..6a0e572c3adc 100644
> --- a/include/linux/kexec_handover.h
> +++ b/include/linux/kexec_handover.h
> @@ -35,6 +35,7 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation);
>  int kho_add_subtree(const char *name, void *fdt);
>  void kho_remove_subtree(void *fdt);
>  int kho_retrieve_subtree(const char *name, phys_addr_t *phys);
> +bool kho_scratch_overlap(phys_addr_t phys, size_t size);
>
>  void kho_memory_init(void);
>
> @@ -109,6 +110,11 @@ static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
>  	return -EOPNOTSUPP;
>  }
>
> +static inline bool kho_scratch_overlap(phys_addr_t phys, size_t size)
> +{
> +	return false;
> +}
> +
>  static inline void kho_memory_init(void) { }
>
>  static inline void kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
> diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> index d2f779cbe279..dc352839ccf0 100644
> --- a/kernel/liveupdate/Makefile
> +++ b/kernel/liveupdate/Makefile
> @@ -7,7 +7,6 @@ luo-y := \
>  	luo_session.o
>
>  obj-$(CONFIG_KEXEC_HANDOVER)		+= kexec_handover.o
> -obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)	+= kexec_handover_debug.o
>  obj-$(CONFIG_KEXEC_HANDOVER_DEBUGFS)	+= kexec_handover_debugfs.o
>
>  obj-$(CONFIG_LIVEUPDATE)		+= luo.o
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index 532f455c5d4f..c9b982372d6e 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -820,7 +820,8 @@ int kho_preserve_folio(struct folio *folio)
>  	const unsigned long pfn = folio_pfn(folio);
>  	const unsigned int order = folio_order(folio);
>
> -	if (WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order)))
> +	if (WARN_ON(kho_scratch_overlap_debug(pfn << PAGE_SHIFT,
> +					      PAGE_SIZE << order)))

Can't say I'm fond of kho_scratch_overlap_debug(). How about we make it

	if (IS_ENABLED(CONFIG_KEXEC_HANDOVER_DEBUG) &&
	    WARN_ON(kho_scratch_overlap(...))

>  		return -EINVAL;
>
>  	return kho_radix_add_page(tree, pfn, order);
> @@ -864,10 +865,9 @@ int kho_preserve_pages(struct page *page, unsigned long nr_pages)
>  	unsigned long failed_pfn = 0;
>  	int err = 0;
>
> -	if (WARN_ON(kho_scratch_overlap(start_pfn << PAGE_SHIFT,
> -					nr_pages << PAGE_SHIFT))) {
> +	if (WARN_ON(kho_scratch_overlap_debug(start_pfn << PAGE_SHIFT,
> +					      nr_pages << PAGE_SHIFT)))

Ditto.

>  		return -EINVAL;
> -	}
>
>  	while (pfn < end_pfn) {
>  		unsigned int order =
> @@ -1327,6 +1327,26 @@ int kho_retrieve_subtree(const char *name, phys_addr_t *phys)
>  }
>  EXPORT_SYMBOL_GPL(kho_retrieve_subtree);
>
> +bool kho_scratch_overlap(phys_addr_t phys, size_t size)
> +{
> +	phys_addr_t scratch_start, scratch_end;
> +	unsigned int i;
> +
> +	if (!kho_scratch)
> +		return false;
> +
> +	for (i = 0; i < kho_scratch_cnt; i++) {
> +		scratch_start = kho_scratch[i].addr;
> +		scratch_end = kho_scratch[i].addr + kho_scratch[i].size;
> +
> +		if (phys < scratch_end && (phys + size) > scratch_start)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(kho_scratch_overlap);

I don't think we need to EXPORT_SYMBOL() it, and it'd be better grouped
with the scratch allocation functions.

> +
>  static int __init kho_mem_retrieve(const void *fdt)
>  {
>  	struct kho_radix_tree tree;

--
Sincerely yours,
Mike.

* [PATCH v7 2/3] kho: fix deferred init of kho scratch
From: Michal Clapinski @ 2026-03-17 14:15 UTC (permalink / raw)
To: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport,
	Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec,
	linux-mm
Cc: linux-kernel, Andrew Morton, Michal Clapinski

Currently, if DEFERRED is enabled, kho_release_scratch will initialize
the struct pages and set migratetype of kho scratch. Unless the whole
scratch fit below first_deferred_pfn, some of that will be overwritten
either by deferred_init_pages or memmap_init_reserved_pages.

To fix it, I modified kho_release_scratch to only set the migratetype
on already initialized pages. Then, modified init_pageblock_migratetype
to set the migratetype to CMA if the page is located inside scratch.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
---
 include/linux/memblock.h           |  2 --
 kernel/liveupdate/kexec_handover.c | 10 ++++++----
 mm/memblock.c                      | 22 ----------------------
 mm/page_alloc.c                    |  7 +++++++
 4 files changed, 13 insertions(+), 28 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 6ec5e9ac0699..3e217414e12d 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -614,11 +614,9 @@ static inline void memtest_report_meminfo(struct seq_file *m) { }
 #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
 void memblock_set_kho_scratch_only(void);
 void memblock_clear_kho_scratch_only(void);
-void memmap_init_kho_scratch_pages(void);
 #else
 static inline void memblock_set_kho_scratch_only(void) { }
 static inline void memblock_clear_kho_scratch_only(void) { }
-static inline void memmap_init_kho_scratch_pages(void) {}
 #endif

 #endif /* _LINUX_MEMBLOCK_H */
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index c9b982372d6e..e511a50fab9c 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -1477,8 +1477,7 @@ static void __init kho_release_scratch(void)
 {
 	phys_addr_t start, end;
 	u64 i;
-
-	memmap_init_kho_scratch_pages();
+	int nid;

 	/*
 	 * Mark scratch mem as CMA before we return it. That way we
@@ -1486,10 +1485,13 @@ static void __init kho_release_scratch(void)
 	 * we can reuse it as scratch memory again later.
 	 */
 	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			     MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
+			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
 		ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
 		ulong end_pfn = pageblock_align(PFN_UP(end));
 		ulong pfn;
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+		end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
+#endif

 		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
 			init_pageblock_migratetype(pfn_to_page(pfn),
@@ -1500,8 +1502,8 @@ static void __init kho_release_scratch(void)
 void __init kho_memory_init(void)
 {
 	if (kho_in.scratch_phys) {
-		kho_scratch = phys_to_virt(kho_in.scratch_phys);
 		kho_release_scratch();
+		kho_scratch = phys_to_virt(kho_in.scratch_phys);

 		if (kho_mem_retrieve(kho_get_fdt()))
 			kho_in.fdt_phys = 0;
diff --git a/mm/memblock.c b/mm/memblock.c
index b3ddfdec7a80..ae6a5af46bd7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -959,28 +959,6 @@ __init void memblock_clear_kho_scratch_only(void)
 {
 	kho_scratch_only = false;
 }
-
-__init void memmap_init_kho_scratch_pages(void)
-{
-	phys_addr_t start, end;
-	unsigned long pfn;
-	int nid;
-	u64 i;
-
-	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
-		return;
-
-	/*
-	 * Initialize struct pages for free scratch memory.
-	 * The struct pages for reserved scratch memory will be set up in
-	 * reserve_bootmem_region()
-	 */
-	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
-		for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
-			init_deferred_page(pfn, nid);
-	}
-}
 #endif

 /**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee81f5c67c18..5ca078dde61d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,6 +55,7 @@
 #include <linux/cacheinfo.h>
 #include <linux/pgalloc_tag.h>
 #include <linux/mmzone_lock.h>
+#include <linux/kexec_handover.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"
@@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page,
 			 migratetype < MIGRATE_PCPTYPES))
 		migratetype = MIGRATE_UNMOVABLE;

+	/*
+	 * Mark KHO scratch as CMA so no unmovable allocations are made there.
+	 */
+	if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
+		migratetype = MIGRATE_CMA;
+
 	flags = migratetype;

 #ifdef CONFIG_MEMORY_ISOLATION
--
2.53.0.851.ga537e3e6e9-goog

* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
From: Vishal Moola (Oracle) @ 2026-03-17 23:23 UTC (permalink / raw)
To: Michal Clapinski
Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport,
	Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec,
	linux-mm, linux-kernel, Andrew Morton

On Tue, Mar 17, 2026 at 03:15:33PM +0100, Michal Clapinski wrote:
> Currently, if DEFERRED is enabled, kho_release_scratch will initialize
> the struct pages and set migratetype of kho scratch. Unless the whole
> scratch fit below first_deferred_pfn, some of that will be overwritten
> either by deferred_init_pages or memmap_init_reserved_pages.
>
> To fix it, I modified kho_release_scratch to only set the migratetype
> on already initialized pages. Then, modified init_pageblock_migratetype
> to set the migratetype to CMA if the page is located inside scratch.
>
> Signed-off-by: Michal Clapinski <mclapinski@google.com>
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee81f5c67c18..5ca078dde61d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -55,6 +55,7 @@
>  #include <linux/cacheinfo.h>
>  #include <linux/pgalloc_tag.h>
>  #include <linux/mmzone_lock.h>
> +#include <linux/kexec_handover.h>
>  #include <asm/div64.h>
>  #include "internal.h"
>  #include "shuffle.h"
> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page,
>  			 migratetype < MIGRATE_PCPTYPES))
>  		migratetype = MIGRATE_UNMOVABLE;
>
> +	/*
> +	 * Mark KHO scratch as CMA so no unmovable allocations are made there.
> +	 */
> +	if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
> +		migratetype = MIGRATE_CMA;
> +
>  	flags = migratetype;
>
>  #ifdef CONFIG_MEMORY_ISOLATION

I've just tried to build the current mm-new tree. I'm getting this
error:

error: ‘MIGRATE_CMA’ undeclared (first use in this function); did you mean ‘MIGRATE_SYNC’?
  557 |                 migratetype = MIGRATE_CMA;
      |                               ^~~~~~~~~~~
      |                               MIGRATE_SYNC

From what I can tell, MIGRATE_CMA is only defined if CONFIG_CMA is
enabled (for x86 defconfig it's disabled).

* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
From: SeongJae Park @ 2026-03-18 0:08 UTC (permalink / raw)
To: Vishal Moola (Oracle)
Cc: SeongJae Park, Michal Clapinski, Evangelos Petrongonas,
	Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf,
	Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton

On Tue, 17 Mar 2026 16:23:07 -0700 "Vishal Moola (Oracle)" <vishal.moola@gmail.com> wrote:

> On Tue, Mar 17, 2026 at 03:15:33PM +0100, Michal Clapinski wrote:
> > Currently, if DEFERRED is enabled, kho_release_scratch will initialize
> > the struct pages and set migratetype of kho scratch. Unless the whole
> > scratch fit below first_deferred_pfn, some of that will be overwritten
> > either by deferred_init_pages or memmap_init_reserved_pages.
> >
> > To fix it, I modified kho_release_scratch to only set the migratetype
> > on already initialized pages. Then, modified init_pageblock_migratetype
> > to set the migratetype to CMA if the page is located inside scratch.
> >
> > Signed-off-by: Michal Clapinski <mclapinski@google.com>
> > ---
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index ee81f5c67c18..5ca078dde61d 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -55,6 +55,7 @@
> >  #include <linux/cacheinfo.h>
> >  #include <linux/pgalloc_tag.h>
> >  #include <linux/mmzone_lock.h>
> > +#include <linux/kexec_handover.h>
> >  #include <asm/div64.h>
> >  #include "internal.h"
> >  #include "shuffle.h"
> > @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page,
> >  			 migratetype < MIGRATE_PCPTYPES))
> >  		migratetype = MIGRATE_UNMOVABLE;
> >
> > +	/*
> > +	 * Mark KHO scratch as CMA so no unmovable allocations are made there.
> > +	 */
> > +	if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
> > +		migratetype = MIGRATE_CMA;
> > +
> >  	flags = migratetype;
> >
> >  #ifdef CONFIG_MEMORY_ISOLATION
>
> I've just tried to build the current mm-new tree. I'm getting this
> error:
>
> error: ‘MIGRATE_CMA’ undeclared (first use in this function); did you mean ‘MIGRATE_SYNC’?
>   557 |                 migratetype = MIGRATE_CMA;
>       |                               ^~~~~~~~~~~
>       |                               MIGRATE_SYNC
>
> From what I can tell, MIGRATE_CMA is only defined if CONFIG_CMA is
> enabled (for x86 defconfig it's disabled).

I also got the same issue, and was about to report. I added the
workaround patch below to my test setup. It is just a temporary quick
fix for only my setup. Michal or others may find a better fix.

=== >8 ===
From 9d89a12e1a17edd68750e97b0c8b1970e3adc648 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj@kernel.org>
Date: Tue, 17 Mar 2026 16:56:17 -0700
Subject: [PATCH] mm/page_alloc: enclose kho-specific code with
 CONFIG_KEXEC_HANDOVER

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5ca078dde61d6..ed4d585f46202 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -550,11 +550,13 @@ void __meminit init_pageblock_migratetype(struct page *page,
 			 migratetype < MIGRATE_PCPTYPES))
 		migratetype = MIGRATE_UNMOVABLE;

+#ifdef CONFIG_KEXEC_HANDOVER
 	/*
 	 * Mark KHO scratch as CMA so no unmovable allocations are made there.
 	 */
 	if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
 		migratetype = MIGRATE_CMA;
+#endif

 	flags = migratetype;
--
2.47.3

* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
From: Andrew Morton @ 2026-03-18 0:23 UTC (permalink / raw)
To: SeongJae Park
Cc: Vishal Moola (Oracle), Michal Clapinski, Evangelos Petrongonas,
	Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf,
	Samiullah Khawaja, kexec, linux-mm, linux-kernel

On Tue, 17 Mar 2026 17:08:15 -0700 SeongJae Park <sj@kernel.org> wrote:

> > error: ‘MIGRATE_CMA’ undeclared (first use in this function); did you mean ‘MIGRATE_SYNC’?
> >   557 |                 migratetype = MIGRATE_CMA;
> >       |                               ^~~~~~~~~~~
> >       |                               MIGRATE_SYNC
> >
> > From what I can tell, MIGRATE_CMA is only defined if CONFIG_CMA is
> > enabled (for x86 defconfig it's disabled).
>
> I also got the same issue, and was about to report.

Thanks. I dropped this series and repushed.

* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
From: Mike Rapoport @ 2026-03-18 9:33 UTC (permalink / raw)
To: Michal Clapinski
Cc: Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav,
	Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel,
	Andrew Morton

Hi Michal,

On Tue, Mar 17, 2026 at 03:15:33PM +0100, Michal Clapinski wrote:
> Currently, if DEFERRED is enabled, kho_release_scratch will initialize

Please spell out CONFIG_DEFERRED_STRUCT_PAGE_INIT

> the struct pages and set migratetype of kho scratch. Unless the whole
> scratch fit below first_deferred_pfn, some of that will be overwritten
> either by deferred_init_pages or memmap_init_reserved_pages.

Usually we put brackets after function names to make them more visible.

> To fix it, I modified kho_release_scratch to only set the migratetype

Prefer an imperative mood please, e.g. "To fix it, modify
kho_release_scratch() ..."

> on already initialized pages. Then, modified init_pageblock_migratetype
> to set the migratetype to CMA if the page is located inside scratch.
>
> Signed-off-by: Michal Clapinski <mclapinski@google.com>
> ---
>  include/linux/memblock.h           |  2 --
>  kernel/liveupdate/kexec_handover.c | 10 ++++++----
>  mm/memblock.c                      | 22 ----------------------
>  mm/page_alloc.c                    |  7 +++++++
>  4 files changed, 13 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 6ec5e9ac0699..3e217414e12d 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -614,11 +614,9 @@ static inline void memtest_report_meminfo(struct seq_file *m) { }
>  #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
>  void memblock_set_kho_scratch_only(void);
>  void memblock_clear_kho_scratch_only(void);
> -void memmap_init_kho_scratch_pages(void);
>  #else
>  static inline void memblock_set_kho_scratch_only(void) { }
>  static inline void memblock_clear_kho_scratch_only(void) { }
> -static inline void memmap_init_kho_scratch_pages(void) {}
>  #endif
>
>  #endif /* _LINUX_MEMBLOCK_H */
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index c9b982372d6e..e511a50fab9c 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -1477,8 +1477,7 @@ static void __init kho_release_scratch(void)
>  {
>  	phys_addr_t start, end;
>  	u64 i;
> -
> -	memmap_init_kho_scratch_pages();
> +	int nid;
>
>  	/*
>  	 * Mark scratch mem as CMA before we return it. That way we
> @@ -1486,10 +1485,13 @@ static void __init kho_release_scratch(void)
>  	 * we can reuse it as scratch memory again later.
>  	 */
>  	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> -			     MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
> +			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
>  		ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
>  		ulong end_pfn = pageblock_align(PFN_UP(end));
>  		ulong pfn;
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +		end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
> +#endif

A helper that returns first_deferred_pfn or ULONG_MAX might be better
looking.

>
>  		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
>  			init_pageblock_migratetype(pfn_to_page(pfn),
> @@ -1500,8 +1502,8 @@ static void __init kho_release_scratch(void)
>  void __init kho_memory_init(void)
>  {
>  	if (kho_in.scratch_phys) {
> -		kho_scratch = phys_to_virt(kho_in.scratch_phys);
>  		kho_release_scratch();
> +		kho_scratch = phys_to_virt(kho_in.scratch_phys);

Why is this change needed?

>
>  		if (kho_mem_retrieve(kho_get_fdt()))
>  			kho_in.fdt_phys = 0;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b3ddfdec7a80..ae6a5af46bd7 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -959,28 +959,6 @@ __init void memblock_clear_kho_scratch_only(void)
>  {
>  	kho_scratch_only = false;
>  }
> -
> -__init void memmap_init_kho_scratch_pages(void)
> -{
> -	phys_addr_t start, end;
> -	unsigned long pfn;
> -	int nid;
> -	u64 i;
> -
> -	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
> -		return;
> -
> -	/*
> -	 * Initialize struct pages for free scratch memory.
> -	 * The struct pages for reserved scratch memory will be set up in
> -	 * reserve_bootmem_region()
> -	 */
> -	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> -			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
> -		for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
> -			init_deferred_page(pfn, nid);
> -	}
> -}
>  #endif
>
>  /**
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee81f5c67c18..5ca078dde61d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -55,6 +55,7 @@
>  #include <linux/cacheinfo.h>
>  #include <linux/pgalloc_tag.h>
>  #include <linux/mmzone_lock.h>
> +#include <linux/kexec_handover.h>
>  #include <asm/div64.h>
>  #include "internal.h"
>  #include "shuffle.h"
> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page,
>  			 migratetype < MIGRATE_PCPTYPES))
>  		migratetype = MIGRATE_UNMOVABLE;
>
> +	/*
> +	 * Mark KHO scratch as CMA so no unmovable allocations are made there.
> +	 */
> +	if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
> +		migratetype = MIGRATE_CMA;
> +

Please pick SJ's fixup for the next respin :)

>  	flags = migratetype;
>
>  #ifdef CONFIG_MEMORY_ISOLATION
> --
> 2.53.0.851.ga537e3e6e9-goog

--
Sincerely yours,
Mike.

* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
From: Michał Cłapiński @ 2026-03-18 10:28 UTC (permalink / raw)
To: Mike Rapoport
Cc: Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav,
	Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel,
	Andrew Morton

On Wed, Mar 18, 2026 at 10:33 AM Mike Rapoport <rppt@kernel.org> wrote:
> Hi Michal,
>
> On Tue, Mar 17, 2026 at 03:15:33PM +0100, Michal Clapinski wrote:
> > Currently, if DEFERRED is enabled, kho_release_scratch will initialize
>
> Please spell out CONFIG_DEFERRED_STRUCT_PAGE_INIT
>
> > the struct pages and set migratetype of kho scratch. Unless the whole
> > scratch fit below first_deferred_pfn, some of that will be overwritten
> > either by deferred_init_pages or memmap_init_reserved_pages.
>
> Usually we put brackets after function names to make them more visible.
>
> > To fix it, I modified kho_release_scratch to only set the migratetype
>
> Prefer an imperative mood please, e.g. "To fix it, modify
> kho_release_scratch() ..."
>
> > on already initialized pages. Then, modified init_pageblock_migratetype
> > to set the migratetype to CMA if the page is located inside scratch.
> >
> > Signed-off-by: Michal Clapinski <mclapinski@google.com>
> > ---
> >  include/linux/memblock.h           |  2 --
> >  kernel/liveupdate/kexec_handover.c | 10 ++++++----
> >  mm/memblock.c                      | 22 ----------------------
> >  mm/page_alloc.c                    |  7 +++++++
> >  4 files changed, 13 insertions(+), 28 deletions(-)
> >
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index 6ec5e9ac0699..3e217414e12d 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -614,11 +614,9 @@ static inline void memtest_report_meminfo(struct seq_file *m) { }
> >  #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
> >  void memblock_set_kho_scratch_only(void);
> >  void memblock_clear_kho_scratch_only(void);
> > -void memmap_init_kho_scratch_pages(void);
> >  #else
> >  static inline void memblock_set_kho_scratch_only(void) { }
> >  static inline void memblock_clear_kho_scratch_only(void) { }
> > -static inline void memmap_init_kho_scratch_pages(void) {}
> >  #endif
> >
> >  #endif /* _LINUX_MEMBLOCK_H */
> > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> > index c9b982372d6e..e511a50fab9c 100644
> > --- a/kernel/liveupdate/kexec_handover.c
> > +++ b/kernel/liveupdate/kexec_handover.c
> > @@ -1477,8 +1477,7 @@ static void __init kho_release_scratch(void)
> >  {
> >  	phys_addr_t start, end;
> >  	u64 i;
> > -
> > -	memmap_init_kho_scratch_pages();
> > +	int nid;
> >
> >  	/*
> >  	 * Mark scratch mem as CMA before we return it. That way we
> > @@ -1486,10 +1485,13 @@ static void __init kho_release_scratch(void)
> >  	 * we can reuse it as scratch memory again later.
> >  	 */
> >  	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> > -			     MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
> > +			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
> >  		ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
> >  		ulong end_pfn = pageblock_align(PFN_UP(end));
> >  		ulong pfn;
> > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> > +		end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
> > +#endif
>
> A helper that returns first_deferred_pfn or ULONG_MAX might be better
> looking.
>
> >
> >  		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
> >  			init_pageblock_migratetype(pfn_to_page(pfn),
> > @@ -1500,8 +1502,8 @@ static void __init kho_release_scratch(void)
> >  void __init kho_memory_init(void)
> >  {
> >  	if (kho_in.scratch_phys) {
> > -		kho_scratch = phys_to_virt(kho_in.scratch_phys);
> >  		kho_release_scratch();
> > +		kho_scratch = phys_to_virt(kho_in.scratch_phys);
>
> Why is this change needed?

It's not necessary but kho_release_scratch() will call
kho_scratch_overlap(). If kho_scratch is NULL, kho_scratch_overlap()
will return early, making it slightly faster. Alternatively, I could
skip invoking kho_scratch_overlap() if migratetype is already
MIGRATE_CMA.

> >
> >  		if (kho_mem_retrieve(kho_get_fdt()))
> >  			kho_in.fdt_phys = 0;
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index b3ddfdec7a80..ae6a5af46bd7 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -959,28 +959,6 @@ __init void memblock_clear_kho_scratch_only(void)
> >  {
> >  	kho_scratch_only = false;
> >  }
> > -
> > -__init void memmap_init_kho_scratch_pages(void)
> > -{
> > -	phys_addr_t start, end;
> > -	unsigned long pfn;
> > -	int nid;
> > -	u64 i;
> > -
> > -	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
> > -		return;
> > -
> > -	/*
> > -	 * Initialize struct pages for free scratch memory.
> > -	 * The struct pages for reserved scratch memory will be set up in
> > -	 * reserve_bootmem_region()
> > -	 */
> > -	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> > -			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
> > -		for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
> > -			init_deferred_page(pfn, nid);
> > -	}
> > -}
> >  #endif
> >
> >  /**
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index ee81f5c67c18..5ca078dde61d 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -55,6 +55,7 @@
> >  #include <linux/cacheinfo.h>
> >  #include <linux/pgalloc_tag.h>
> >  #include <linux/mmzone_lock.h>
> > +#include <linux/kexec_handover.h>
> >  #include <asm/div64.h>
> >  #include "internal.h"
> >  #include "shuffle.h"
> > @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page,
> >  			 migratetype < MIGRATE_PCPTYPES))
> >  		migratetype = MIGRATE_UNMOVABLE;
> >
> > +	/*
> > +	 * Mark KHO scratch as CMA so no unmovable allocations are made there.
> > +	 */
> > +	if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
> > +		migratetype = MIGRATE_CMA;
> > +
>
> Please pick SJ's fixup for the next respin :)
>
> >  	flags = migratetype;
> >
> >  #ifdef CONFIG_MEMORY_ISOLATION
> > --
> > 2.53.0.851.ga537e3e6e9-goog
>
> --
> Sincerely yours,
> Mike.
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 9:33 ` Mike Rapoport 2026-03-18 10:28 ` Michał Cłapiński @ 2026-03-18 10:33 ` Michał Cłapiński 2026-03-18 11:02 ` Mike Rapoport 1 sibling, 1 reply; 25+ messages in thread From: Michał Cłapiński @ 2026-03-18 10:33 UTC (permalink / raw) To: Mike Rapoport Cc: Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On Wed, Mar 18, 2026 at 10:33 AM Mike Rapoport <rppt@kernel.org> wrote: > > Hi Michal, > > On Tue, Mar 17, 2026 at 03:15:33PM +0100, Michal Clapinski wrote: > > Currently, if DEFERRED is enabled, kho_release_scratch will initialize > > Please spell out CONFIG_DEFERRED_STRUCT_PAGE_INIT > > > the struct pages and set migratetype of kho scratch. Unless the whole > > scratch fit below first_deferred_pfn, some of that will be overwritten > > either by deferred_init_pages or memmap_init_reserved_pages. > > Usually we put brackets after function names to make them more visible. > > > To fix it, I modified kho_release_scratch to only set the migratetype > > Prefer an imperative mood please, e.g. "To fix it, modify > kho_release_scratch() ..." > > > on already initialized pages. Then, modified init_pageblock_migratetype > > to set the migratetype to CMA if the page is located inside scratch. 
> > > > Signed-off-by: Michal Clapinski <mclapinski@google.com> > > --- > > include/linux/memblock.h | 2 -- > > kernel/liveupdate/kexec_handover.c | 10 ++++++---- > > mm/memblock.c | 22 ---------------------- > > mm/page_alloc.c | 7 +++++++ > > 4 files changed, 13 insertions(+), 28 deletions(-) > > > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h > > index 6ec5e9ac0699..3e217414e12d 100644 > > --- a/include/linux/memblock.h > > +++ b/include/linux/memblock.h > > @@ -614,11 +614,9 @@ static inline void memtest_report_meminfo(struct seq_file *m) { } > > #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH > > void memblock_set_kho_scratch_only(void); > > void memblock_clear_kho_scratch_only(void); > > -void memmap_init_kho_scratch_pages(void); > > #else > > static inline void memblock_set_kho_scratch_only(void) { } > > static inline void memblock_clear_kho_scratch_only(void) { } > > -static inline void memmap_init_kho_scratch_pages(void) {} > > #endif > > > > #endif /* _LINUX_MEMBLOCK_H */ > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > > index c9b982372d6e..e511a50fab9c 100644 > > --- a/kernel/liveupdate/kexec_handover.c > > +++ b/kernel/liveupdate/kexec_handover.c > > @@ -1477,8 +1477,7 @@ static void __init kho_release_scratch(void) > > { > > phys_addr_t start, end; > > u64 i; > > - > > - memmap_init_kho_scratch_pages(); > > + int nid; > > > > /* > > * Mark scratch mem as CMA before we return it. That way we > > @@ -1486,10 +1485,13 @@ static void __init kho_release_scratch(void) > > * we can reuse it as scratch memory again later. 
> > */ > > __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, > > - MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) { > > + MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) { > > ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start)); > > ulong end_pfn = pageblock_align(PFN_UP(end)); > > ulong pfn; > > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > > + end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn); > > +#endif > > A helper that returns first_deferred_pfn or ULONG_MAX might be beeter > looking. > > > > > for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) > > init_pageblock_migratetype(pfn_to_page(pfn), > > @@ -1500,8 +1502,8 @@ static void __init kho_release_scratch(void) > > void __init kho_memory_init(void) > > { > > if (kho_in.scratch_phys) { > > - kho_scratch = phys_to_virt(kho_in.scratch_phys); > > kho_release_scratch(); > > + kho_scratch = phys_to_virt(kho_in.scratch_phys); > > Why this change is needed? It's not necessary but kho_release_scratch() will call kho_scratch_overlap(). If kho_scratch is NULL, kho_scratch_overlap() will return early, making it slightly faster. Alternatively, I skip invoking kho_scratch_overlap() if migratetype is already MIGRATE_CMA. (resending this since the last email was html) > > > > if (kho_mem_retrieve(kho_get_fdt())) > > kho_in.fdt_phys = 0; > > diff --git a/mm/memblock.c b/mm/memblock.c > > index b3ddfdec7a80..ae6a5af46bd7 100644 > > --- a/mm/memblock.c > > +++ b/mm/memblock.c > > @@ -959,28 +959,6 @@ __init void memblock_clear_kho_scratch_only(void) > > { > > kho_scratch_only = false; > > } > > - > > -__init void memmap_init_kho_scratch_pages(void) > > -{ > > - phys_addr_t start, end; > > - unsigned long pfn; > > - int nid; > > - u64 i; > > - > > - if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) > > - return; > > - > > - /* > > - * Initialize struct pages for free scratch memory. 
> > - * The struct pages for reserved scratch memory will be set up in > > - * reserve_bootmem_region() > > - */ > > - __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, > > - MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) { > > - for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++) > > - init_deferred_page(pfn, nid); > > - } > > -} > > #endif > > > > /** > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index ee81f5c67c18..5ca078dde61d 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -55,6 +55,7 @@ > > #include <linux/cacheinfo.h> > > #include <linux/pgalloc_tag.h> > > #include <linux/mmzone_lock.h> > > +#include <linux/kexec_handover.h> > > #include <asm/div64.h> > > #include "internal.h" > > #include "shuffle.h" > > @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, > > migratetype < MIGRATE_PCPTYPES)) > > migratetype = MIGRATE_UNMOVABLE; > > > > + /* > > + * Mark KHO scratch as CMA so no unmovable allocations are made there. > > + */ > > + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) > > + migratetype = MIGRATE_CMA; > > + > > Please pick SJ's fixup for the next respin :) > > > flags = migratetype; > > > > #ifdef CONFIG_MEMORY_ISOLATION > > -- > > 2.53.0.851.ga537e3e6e9-goog > > > > -- > Sincerely yours, > Mike. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
  2026-03-18 10:33 ` Michał Cłapiński
@ 2026-03-18 11:02 ` Mike Rapoport
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2026-03-18 11:02 UTC (permalink / raw)
To: Michał Cłapiński
Cc: Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav,
    Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel,
    Andrew Morton

On Wed, Mar 18, 2026 at 11:33:09AM +0100, Michał Cłapiński wrote:
> On Wed, Mar 18, 2026 at 10:33 AM Mike Rapoport <rppt@kernel.org> wrote:
> > > @@ -1500,8 +1502,8 @@ static void __init kho_release_scratch(void)
> > >  void __init kho_memory_init(void)
> > >  {
> > >  	if (kho_in.scratch_phys) {
> > > -		kho_scratch = phys_to_virt(kho_in.scratch_phys);
> > >  		kho_release_scratch();
> > > +		kho_scratch = phys_to_virt(kho_in.scratch_phys);
> >
> > Why this change is needed?
>
> It's not necessary but kho_release_scratch() will call
> kho_scratch_overlap(). If kho_scratch is NULL, kho_scratch_overlap()
> will return early, making it slightly faster. Alternatively, I skip
> invoking kho_scratch_overlap() if migratetype is already MIGRATE_CMA.
> (resending this since the last email was html)

Thanks for the explanation. Let's keep the change and add a sentence
about it to the changelog.

--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-17 14:15 ` [PATCH v7 2/3] kho: fix deferred init of kho scratch Michal Clapinski 2026-03-17 23:23 ` Vishal Moola (Oracle) 2026-03-18 9:33 ` Mike Rapoport @ 2026-03-18 15:10 ` Zi Yan 2026-03-18 15:18 ` Michał Cłapiński 2 siblings, 1 reply; 25+ messages in thread From: Zi Yan @ 2026-03-18 15:10 UTC (permalink / raw) To: Michal Clapinski Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On 17 Mar 2026, at 10:15, Michal Clapinski wrote: > Currently, if DEFERRED is enabled, kho_release_scratch will initialize > the struct pages and set migratetype of kho scratch. Unless the whole > scratch fit below first_deferred_pfn, some of that will be overwritten > either by deferred_init_pages or memmap_init_reserved_pages. > > To fix it, I modified kho_release_scratch to only set the migratetype > on already initialized pages. Then, modified init_pageblock_migratetype > to set the migratetype to CMA if the page is located inside scratch. > > Signed-off-by: Michal Clapinski <mclapinski@google.com> > --- > include/linux/memblock.h | 2 -- > kernel/liveupdate/kexec_handover.c | 10 ++++++---- > mm/memblock.c | 22 ---------------------- > mm/page_alloc.c | 7 +++++++ > 4 files changed, 13 insertions(+), 28 deletions(-) > <snip> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index ee81f5c67c18..5ca078dde61d 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -55,6 +55,7 @@ > #include <linux/cacheinfo.h> > #include <linux/pgalloc_tag.h> > #include <linux/mmzone_lock.h> > +#include <linux/kexec_handover.h> > #include <asm/div64.h> > #include "internal.h" > #include "shuffle.h" > @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, > migratetype < MIGRATE_PCPTYPES)) > migratetype = MIGRATE_UNMOVABLE; > > + /* > + * Mark KHO scratch as CMA so no unmovable allocations are made there. 
> + */ > + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) > + migratetype = MIGRATE_CMA; > + If this is only for deferred init code, why not put it in deferred_free_pages()? Otherwise, all init_pageblock_migratetype() callers need to pay the penalty of traversing kho_scratch array. > flags = migratetype; > > #ifdef CONFIG_MEMORY_ISOLATION Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 15:10 ` Zi Yan @ 2026-03-18 15:18 ` Michał Cłapiński 2026-03-18 15:26 ` Zi Yan 0 siblings, 1 reply; 25+ messages in thread From: Michał Cłapiński @ 2026-03-18 15:18 UTC (permalink / raw) To: Zi Yan Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy@nvidia.com> wrote: > > On 17 Mar 2026, at 10:15, Michal Clapinski wrote: > > > Currently, if DEFERRED is enabled, kho_release_scratch will initialize > > the struct pages and set migratetype of kho scratch. Unless the whole > > scratch fit below first_deferred_pfn, some of that will be overwritten > > either by deferred_init_pages or memmap_init_reserved_pages. > > > > To fix it, I modified kho_release_scratch to only set the migratetype > > on already initialized pages. Then, modified init_pageblock_migratetype > > to set the migratetype to CMA if the page is located inside scratch. 
> > > > Signed-off-by: Michal Clapinski <mclapinski@google.com> > > --- > > include/linux/memblock.h | 2 -- > > kernel/liveupdate/kexec_handover.c | 10 ++++++---- > > mm/memblock.c | 22 ---------------------- > > mm/page_alloc.c | 7 +++++++ > > 4 files changed, 13 insertions(+), 28 deletions(-) > > > > <snip> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index ee81f5c67c18..5ca078dde61d 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -55,6 +55,7 @@ > > #include <linux/cacheinfo.h> > > #include <linux/pgalloc_tag.h> > > #include <linux/mmzone_lock.h> > > +#include <linux/kexec_handover.h> > > #include <asm/div64.h> > > #include "internal.h" > > #include "shuffle.h" > > @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, > > migratetype < MIGRATE_PCPTYPES)) > > migratetype = MIGRATE_UNMOVABLE; > > > > + /* > > + * Mark KHO scratch as CMA so no unmovable allocations are made there. > > + */ > > + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) > > + migratetype = MIGRATE_CMA; > > + > > If this is only for deferred init code, why not put it in deferred_free_pages()? > Otherwise, all init_pageblock_migratetype() callers need to pay the penalty > of traversing kho_scratch array. Because reserve_bootmem_region() doesn't call deferred_free_pages(). So I would also have to modify it. And the early initialization won't pay the penalty of traversing the kho_scratch array, since then kho_scratch is NULL. > > flags = migratetype; > > > > #ifdef CONFIG_MEMORY_ISOLATION > > > > Best Regards, > Yan, Zi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 15:18 ` Michał Cłapiński @ 2026-03-18 15:26 ` Zi Yan 2026-03-18 15:45 ` Michał Cłapiński 0 siblings, 1 reply; 25+ messages in thread From: Zi Yan @ 2026-03-18 15:26 UTC (permalink / raw) To: Michał Cłapiński Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On 18 Mar 2026, at 11:18, Michał Cłapiński wrote: > On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy@nvidia.com> wrote: >> >> On 17 Mar 2026, at 10:15, Michal Clapinski wrote: >> >>> Currently, if DEFERRED is enabled, kho_release_scratch will initialize >>> the struct pages and set migratetype of kho scratch. Unless the whole >>> scratch fit below first_deferred_pfn, some of that will be overwritten >>> either by deferred_init_pages or memmap_init_reserved_pages. >>> >>> To fix it, I modified kho_release_scratch to only set the migratetype >>> on already initialized pages. Then, modified init_pageblock_migratetype >>> to set the migratetype to CMA if the page is located inside scratch. 
>>> >>> Signed-off-by: Michal Clapinski <mclapinski@google.com> >>> --- >>> include/linux/memblock.h | 2 -- >>> kernel/liveupdate/kexec_handover.c | 10 ++++++---- >>> mm/memblock.c | 22 ---------------------- >>> mm/page_alloc.c | 7 +++++++ >>> 4 files changed, 13 insertions(+), 28 deletions(-) >>> >> >> <snip> >> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index ee81f5c67c18..5ca078dde61d 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -55,6 +55,7 @@ >>> #include <linux/cacheinfo.h> >>> #include <linux/pgalloc_tag.h> >>> #include <linux/mmzone_lock.h> >>> +#include <linux/kexec_handover.h> >>> #include <asm/div64.h> >>> #include "internal.h" >>> #include "shuffle.h" >>> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, >>> migratetype < MIGRATE_PCPTYPES)) >>> migratetype = MIGRATE_UNMOVABLE; >>> >>> + /* >>> + * Mark KHO scratch as CMA so no unmovable allocations are made there. >>> + */ >>> + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) >>> + migratetype = MIGRATE_CMA; >>> + >> >> If this is only for deferred init code, why not put it in deferred_free_pages()? >> Otherwise, all init_pageblock_migratetype() callers need to pay the penalty >> of traversing kho_scratch array. > > Because reserve_bootmem_region() doesn't call deferred_free_pages(). > So I would also have to modify it. > > And the early initialization won't pay the penalty of traversing the > kho_scratch array, since then kho_scratch is NULL. How about hugetlb_bootmem_init_migratetype(), init_cma_pageblock(), init_cma_reserved_pageblock(), __init_page_from_nid(), memmap_init_range(), __init_zone_device_page()? 1. are they having any PFN range overlapping with kho? 2. is kho_scratch NULL for them? 1 tells us whether putting code in init_pageblock_migratetype() could save the hassle of changing all above locations. 2 tells us how many callers are affected by traversing kho_scratch. Thanks. 
Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 15:26 ` Zi Yan @ 2026-03-18 15:45 ` Michał Cłapiński 2026-03-18 17:08 ` Zi Yan 0 siblings, 1 reply; 25+ messages in thread From: Michał Cłapiński @ 2026-03-18 15:45 UTC (permalink / raw) To: Zi Yan Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On Wed, Mar 18, 2026 at 4:26 PM Zi Yan <ziy@nvidia.com> wrote: > > On 18 Mar 2026, at 11:18, Michał Cłapiński wrote: > > > On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy@nvidia.com> wrote: > >> > >> On 17 Mar 2026, at 10:15, Michal Clapinski wrote: > >> > >>> Currently, if DEFERRED is enabled, kho_release_scratch will initialize > >>> the struct pages and set migratetype of kho scratch. Unless the whole > >>> scratch fit below first_deferred_pfn, some of that will be overwritten > >>> either by deferred_init_pages or memmap_init_reserved_pages. > >>> > >>> To fix it, I modified kho_release_scratch to only set the migratetype > >>> on already initialized pages. Then, modified init_pageblock_migratetype > >>> to set the migratetype to CMA if the page is located inside scratch. 
> >>> > >>> Signed-off-by: Michal Clapinski <mclapinski@google.com> > >>> --- > >>> include/linux/memblock.h | 2 -- > >>> kernel/liveupdate/kexec_handover.c | 10 ++++++---- > >>> mm/memblock.c | 22 ---------------------- > >>> mm/page_alloc.c | 7 +++++++ > >>> 4 files changed, 13 insertions(+), 28 deletions(-) > >>> > >> > >> <snip> > >> > >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>> index ee81f5c67c18..5ca078dde61d 100644 > >>> --- a/mm/page_alloc.c > >>> +++ b/mm/page_alloc.c > >>> @@ -55,6 +55,7 @@ > >>> #include <linux/cacheinfo.h> > >>> #include <linux/pgalloc_tag.h> > >>> #include <linux/mmzone_lock.h> > >>> +#include <linux/kexec_handover.h> > >>> #include <asm/div64.h> > >>> #include "internal.h" > >>> #include "shuffle.h" > >>> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, > >>> migratetype < MIGRATE_PCPTYPES)) > >>> migratetype = MIGRATE_UNMOVABLE; > >>> > >>> + /* > >>> + * Mark KHO scratch as CMA so no unmovable allocations are made there. > >>> + */ > >>> + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) > >>> + migratetype = MIGRATE_CMA; > >>> + > >> > >> If this is only for deferred init code, why not put it in deferred_free_pages()? > >> Otherwise, all init_pageblock_migratetype() callers need to pay the penalty > >> of traversing kho_scratch array. > > > > Because reserve_bootmem_region() doesn't call deferred_free_pages(). > > So I would also have to modify it. > > > > And the early initialization won't pay the penalty of traversing the > > kho_scratch array, since then kho_scratch is NULL. > > How about hugetlb_bootmem_init_migratetype(), init_cma_pageblock(), > init_cma_reserved_pageblock(), __init_page_from_nid(), memmap_init_range(), > __init_zone_device_page()? > > 1. are they having any PFN range overlapping with kho? > 2. is kho_scratch NULL for them? > > 1 tells us whether putting code in init_pageblock_migratetype() could save > the hassle of changing all above locations. 
>
> 1 tells us whether putting code in init_pageblock_migratetype() could save
> the hassle of changing all above locations.
> 2 tells us how many callers are affected by traversing kho_scratch.

I could try answering those questions but

1. I'm new to this and I'm not sure how correct the answers will be.

2. If you're not using CONFIG_KEXEC_HANDOVER, the performance penalty
will be zero.
If you are using it, currently you have to disable
CONFIG_DEFERRED_STRUCT_PAGE_INIT and the performance hit from this is
far, far greater. This solution saves 0.5s on my setup (100GB of
memory). We can always improve the performance further in the future.

> Thanks.
>
> Best Regards,
> Yan, Zi

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 15:45 ` Michał Cłapiński @ 2026-03-18 17:08 ` Zi Yan 2026-03-18 17:19 ` Michał Cłapiński 0 siblings, 1 reply; 25+ messages in thread From: Zi Yan @ 2026-03-18 17:08 UTC (permalink / raw) To: Michał Cłapiński Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On 18 Mar 2026, at 11:45, Michał Cłapiński wrote: > On Wed, Mar 18, 2026 at 4:26 PM Zi Yan <ziy@nvidia.com> wrote: >> >> On 18 Mar 2026, at 11:18, Michał Cłapiński wrote: >> >>> On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy@nvidia.com> wrote: >>>> >>>> On 17 Mar 2026, at 10:15, Michal Clapinski wrote: >>>> >>>>> Currently, if DEFERRED is enabled, kho_release_scratch will initialize >>>>> the struct pages and set migratetype of kho scratch. Unless the whole >>>>> scratch fit below first_deferred_pfn, some of that will be overwritten >>>>> either by deferred_init_pages or memmap_init_reserved_pages. >>>>> >>>>> To fix it, I modified kho_release_scratch to only set the migratetype >>>>> on already initialized pages. Then, modified init_pageblock_migratetype >>>>> to set the migratetype to CMA if the page is located inside scratch. 
>>>>> >>>>> Signed-off-by: Michal Clapinski <mclapinski@google.com> >>>>> --- >>>>> include/linux/memblock.h | 2 -- >>>>> kernel/liveupdate/kexec_handover.c | 10 ++++++---- >>>>> mm/memblock.c | 22 ---------------------- >>>>> mm/page_alloc.c | 7 +++++++ >>>>> 4 files changed, 13 insertions(+), 28 deletions(-) >>>>> >>>> >>>> <snip> >>>> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>> index ee81f5c67c18..5ca078dde61d 100644 >>>>> --- a/mm/page_alloc.c >>>>> +++ b/mm/page_alloc.c >>>>> @@ -55,6 +55,7 @@ >>>>> #include <linux/cacheinfo.h> >>>>> #include <linux/pgalloc_tag.h> >>>>> #include <linux/mmzone_lock.h> >>>>> +#include <linux/kexec_handover.h> >>>>> #include <asm/div64.h> >>>>> #include "internal.h" >>>>> #include "shuffle.h" >>>>> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, >>>>> migratetype < MIGRATE_PCPTYPES)) >>>>> migratetype = MIGRATE_UNMOVABLE; >>>>> >>>>> + /* >>>>> + * Mark KHO scratch as CMA so no unmovable allocations are made there. >>>>> + */ >>>>> + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) >>>>> + migratetype = MIGRATE_CMA; >>>>> + >>>> >>>> If this is only for deferred init code, why not put it in deferred_free_pages()? >>>> Otherwise, all init_pageblock_migratetype() callers need to pay the penalty >>>> of traversing kho_scratch array. >>> >>> Because reserve_bootmem_region() doesn't call deferred_free_pages(). >>> So I would also have to modify it. >>> >>> And the early initialization won't pay the penalty of traversing the >>> kho_scratch array, since then kho_scratch is NULL. >> >> How about hugetlb_bootmem_init_migratetype(), init_cma_pageblock(), >> init_cma_reserved_pageblock(), __init_page_from_nid(), memmap_init_range(), >> __init_zone_device_page()? >> >> 1. are they having any PFN range overlapping with kho? >> 2. is kho_scratch NULL for them? 
>>
>> 1 tells us whether putting code in init_pageblock_migratetype() could save
>> the hassle of changing all above locations.
>> 2 tells us how many callers are affected by traversing kho_scratch.
>
> I could try answering those questions but
>
> 1. I'm new to this and I'm not sure how correct the answers will be.
>
> 2. If you're not using CONFIG_KEXEC_HANDOVER, the performance penalty
> will be zero.
> If you are using it, currently you have to disable
> CONFIG_DEFERRED_STRUCT_PAGE_INIT and the performance hit from this is
> far, far greater. This solution saves 0.5s on my setup (100GB of
> memory). We can always improve the performance further in the future.

OK, I asked Claude for help and the answer is that not all callers of
init_pageblock_migratetype() touch kho scratch memory regions. Basically,
you only need to perform the kho_scratch_overlap() check in
__init_page_from_nid() to achieve the same end result.

The below is the analysis from Claude.

Based on my understanding,
1. memmap_init_range() is done before kho_memory_init(), so it does not
need the check.

2. __init_zone_device_page() is not relevant.

3. init_cma_reserved_pageblock() / init_cma_pageblock() are already set
to MIGRATE_CMA.

4. hugetlb is not used by kho scratch, so also does not need the check.

5. kho_release_scratch() already takes care of it.

The remaining memblock_free_pages() needs a check, but I am not 100% sure.

# kho_scratch_overlap() in init_pageblock_migratetype() — scope analysis

## Context

Commit a7700b3c6779 ("kho: fix deferred init of kho scratch") added a
kho_scratch_overlap() call inside init_pageblock_migratetype() in
mm/page_alloc.c:

```c
if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
	migratetype = MIGRATE_CMA;
```

kho_scratch_overlap() does a NULL check followed by a loop over the
kho_scratch array. For non-KHO boots (kho_scratch == NULL) the cost is a
single NULL load and branch. For KHO boots the loop runs on every call to
init_pageblock_migratetype().
## Question

Does this add overhead for callers whose memory range cannot overlap with
scratch? Can the check be moved to the caller side?

## Call site analysis

init_pageblock_migratetype() has nine call sites. The init call ordering
relevant to scratch is:

```
setup_arch()
  zone_sizes_init() -> free_area_init() -> memmap_init_range()    [1]
mm_init_free_all() / start_kernel():
  kho_memory_init() -> kho_release_scratch()                      [2]
  memblock_free_all()
    free_low_memory_core_early()
      memmap_init_reserved_pages()
        reserve_bootmem_region() -> __init_deferred_page()
          -> __init_page_from_nid()                               [3]
deferred init kthreads -> __init_page_from_nid()                  [4]
```

### Per call site

**mm/mm_init.c — __init_page_from_nid() (deferred init)**

Called for every deferred pfn (>= first_deferred_pfn). Scratch pages in
the deferred range are not touched by kho_release_scratch() (new code
clips end_pfn to first_deferred_pfn) and not touched by
memmap_init_range() (stops at first_deferred_pfn). This path sets
MIGRATE_MOVABLE on deferred scratch pageblocks after kho_release_scratch()
has already run.

**Needs the fix: yes.** Both sub-paths that reach this function for
deferred scratch pages:
- deferred init kthreads [4]
- reserve_bootmem_region() -> __init_deferred_page() [3]
  (early_page_initialised() returns early for non-deferred pfns, so
  __init_page_from_nid() is only reached for deferred pfns here too)

**mm/mm_init.c — memmap_init_range()**

Runs during setup_arch() [1], before kho_memory_init() [2]. Sets
MIGRATE_MOVABLE on scratch pageblocks, but kho_release_scratch() runs
afterward and correctly overrides to MIGRATE_CMA for non-deferred scratch.
For deferred scratch, memmap_init_range() stops at first_deferred_pfn and
never processes them.

**Needs the fix: no.**

**mm/mm_init.c — __init_zone_device_page()**

ZONE_DEVICE path only. Scratch is normal RAM, not ZONE_DEVICE.
**Needs the fix: no.**

**mm/mm_init.c — memblock_free_pages() (lines ~2012 and ~2023)**

Called by memblock_free_all() for free (non-reserved) memblock regions.
Scratch is memblock-reserved and released through the CMA path, not
through memblock_free_all().

**Needs the fix: no.**

**mm/mm_init.c — init_cma_reserved_pageblock() / init_cma_pageblock()**

Both already pass MIGRATE_CMA. The kho_scratch_overlap() check would be
redundant even if scratch reaches these paths.

**Needs the fix: no (redundant).**

**mm/hugetlb.c — __prep_compound_gigantic_folio()**

Gigantic hugepage setup. Scratch regions are not used for gigantic
hugepages.

**Needs the fix: no.**

**kernel/liveupdate/kexec_handover.c — kho_release_scratch()**

Already passes MIGRATE_CMA. Additionally, kho_scratch is NULL at the point
kho_release_scratch() runs (kho_memory_init() sets kho_scratch only after
kho_release_scratch() returns), so kho_scratch_overlap() would return
false regardless.

**Needs the fix: no.**

## Conclusion

The only path that actually requires the MIGRATE_CMA override is
__init_page_from_nid(). All problematic sub-paths (deferred init kthreads
and reserve_bootmem_region()) converge there. The check could be moved to
__init_page_from_nid() to keep the KHO-specific concern out of the generic
init_pageblock_migratetype():

```c
/* mm/mm_init.c: __init_page_from_nid() */
if (pageblock_aligned(pfn)) {
	enum migratetype mt = MIGRATE_MOVABLE;

	if (kho_scratch_overlap(PFN_PHYS(pfn), PAGE_SIZE))
		mt = MIGRATE_CMA;
	init_pageblock_migratetype(pfn_to_page(pfn), mt, false);
}
```

__init_page_from_nid() is only compiled under
CONFIG_DEFERRED_STRUCT_PAGE_INIT, which is the only configuration where
the bug can occur, so the kho_scratch_overlap() call would be naturally
gated by that config.

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 17:08 ` Zi Yan @ 2026-03-18 17:19 ` Michał Cłapiński 2026-03-18 17:36 ` Zi Yan 0 siblings, 1 reply; 25+ messages in thread From: Michał Cłapiński @ 2026-03-18 17:19 UTC (permalink / raw) To: Zi Yan Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On Wed, Mar 18, 2026 at 6:08 PM Zi Yan <ziy@nvidia.com> wrote: > > On 18 Mar 2026, at 11:45, Michał Cłapiński wrote: > > > On Wed, Mar 18, 2026 at 4:26 PM Zi Yan <ziy@nvidia.com> wrote: > >> > >> On 18 Mar 2026, at 11:18, Michał Cłapiński wrote: > >> > >>> On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy@nvidia.com> wrote: > >>>> > >>>> On 17 Mar 2026, at 10:15, Michal Clapinski wrote: > >>>> > >>>>> Currently, if DEFERRED is enabled, kho_release_scratch will initialize > >>>>> the struct pages and set migratetype of kho scratch. Unless the whole > >>>>> scratch fit below first_deferred_pfn, some of that will be overwritten > >>>>> either by deferred_init_pages or memmap_init_reserved_pages. > >>>>> > >>>>> To fix it, I modified kho_release_scratch to only set the migratetype > >>>>> on already initialized pages. Then, modified init_pageblock_migratetype > >>>>> to set the migratetype to CMA if the page is located inside scratch. 
> >>>>> > >>>>> Signed-off-by: Michal Clapinski <mclapinski@google.com> > >>>>> --- > >>>>> include/linux/memblock.h | 2 -- > >>>>> kernel/liveupdate/kexec_handover.c | 10 ++++++---- > >>>>> mm/memblock.c | 22 ---------------------- > >>>>> mm/page_alloc.c | 7 +++++++ > >>>>> 4 files changed, 13 insertions(+), 28 deletions(-) > >>>>> > >>>> > >>>> <snip> > >>>> > >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>>>> index ee81f5c67c18..5ca078dde61d 100644 > >>>>> --- a/mm/page_alloc.c > >>>>> +++ b/mm/page_alloc.c > >>>>> @@ -55,6 +55,7 @@ > >>>>> #include <linux/cacheinfo.h> > >>>>> #include <linux/pgalloc_tag.h> > >>>>> #include <linux/mmzone_lock.h> > >>>>> +#include <linux/kexec_handover.h> > >>>>> #include <asm/div64.h> > >>>>> #include "internal.h" > >>>>> #include "shuffle.h" > >>>>> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, > >>>>> migratetype < MIGRATE_PCPTYPES)) > >>>>> migratetype = MIGRATE_UNMOVABLE; > >>>>> > >>>>> + /* > >>>>> + * Mark KHO scratch as CMA so no unmovable allocations are made there. > >>>>> + */ > >>>>> + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) > >>>>> + migratetype = MIGRATE_CMA; > >>>>> + > >>>> > >>>> If this is only for deferred init code, why not put it in deferred_free_pages()? > >>>> Otherwise, all init_pageblock_migratetype() callers need to pay the penalty > >>>> of traversing kho_scratch array. > >>> > >>> Because reserve_bootmem_region() doesn't call deferred_free_pages(). > >>> So I would also have to modify it. > >>> > >>> And the early initialization won't pay the penalty of traversing the > >>> kho_scratch array, since then kho_scratch is NULL. > >> > >> How about hugetlb_bootmem_init_migratetype(), init_cma_pageblock(), > >> init_cma_reserved_pageblock(), __init_page_from_nid(), memmap_init_range(), > >> __init_zone_device_page()? > >> > >> 1. are they having any PFN range overlapping with kho? > >> 2. is kho_scratch NULL for them? 
> >> > >> 1 tells us whether putting code in init_pageblock_migratetype() could save > >> the hassle of changing all above locations. > >> 2 tells us how many callers are affected by traversing kho_scratch. > > > > I could try answering those questions but > > > > 1. I'm new to this and I'm not sure how correct the answers will be. > > > > 2. If you're not using CONFIG_KEXEC_HANDOVER, the performance penalty > > will be zero. > > If you are using it, currently you have to disable > > CONFIG_DEFERRED_STRUCT_PAGE_INIT and the performance hit from this is > > far, far greater. This solution saves 0.5s on my setup (100GB of > > memory). We can always improve the performance further in the future. > > > > OK, I asked Claude for help and the answer is that not all callers of > init_pageblock_migratetype() touch kho scratch memory regions. Basically, > you only need to perform the kho_scratch_overlap() check in > __init_page_from_nid() to achieve the same end result. > > > The below is the analysis from Claude. > Based on my understanding, > 1. memmap_init_range() is done before kho_memory_init(), so it does not need > the check. > > 2. __init_zone_device_page() is not relevant. > > 3. init_cma_reserved_pageblock() / init_cma_pageblock() are already set > to MIGRATE_CMA. > > 4. hugetlb is not used by kho scratch, so also does not need the check. > > 5. kho_release_scratch() already takes care of it. > > The remaining memblock_free_pages() needs a check, but I am not 100%. > > > # kho_scratch_overlap() in init_pageblock_migratetype() — scope analysis > > ## Context > > Commit a7700b3c6779 ("kho: fix deferred init of kho scratch") added a > kho_scratch_overlap() call inside init_pageblock_migratetype() in > mm/page_alloc.c: > > ```c > if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) > migratetype = MIGRATE_CMA; > ``` > > kho_scratch_overlap() does a NULL check followed by a loop over the > kho_scratch array. 
For non-KHO boots (kho_scratch == NULL) the cost is > a single NULL load and branch. For KHO boots the loop runs on every call > to init_pageblock_migratetype(). > > ## Question > > Does this add overhead for callers whose memory range cannot overlap > with scratch? Can the check be moved to the caller side? > > ## Call site analysis > > init_pageblock_migratetype() has nine call sites. The init call ordering > relevant to scratch is: > > ``` > setup_arch() > zone_sizes_init() -> free_area_init() -> memmap_init_range() [1] > > mm_init_free_all() / start_kernel(): > kho_memory_init() -> kho_release_scratch() [2] > memblock_free_all() > free_low_memory_core_early() > memmap_init_reserved_pages() > reserve_bootmem_region() -> __init_deferred_page() > -> __init_page_from_nid() [3] > deferred init kthreads -> __init_page_from_nid() [4] > ``` I don't understand this. deferred_free_pages() doesn't call __init_page_from_nid(). So I would clearly need to modify both deferred_free_pages and __init_page_from_nid. > > ### Per call site > > **mm/mm_init.c — __init_page_from_nid() (deferred init)** > > Called for every deferred pfn (>= first_deferred_pfn). Scratch pages > in the deferred range are not touched by kho_release_scratch() (new > code clips end_pfn to first_deferred_pfn) and not touched by > memmap_init_range() (stops at first_deferred_pfn). This path sets > MIGRATE_MOVABLE on deferred scratch pageblocks after > kho_release_scratch() has already run. > > **Needs the fix: yes.** > > Both sub-paths that reach this function for deferred scratch pages: > - deferred init kthreads [4] > - reserve_bootmem_region() -> __init_deferred_page() [3] > (early_page_initialised() returns early for non-deferred pfns, so > __init_page_from_nid() is only reached for deferred pfns here too) > > **mm/mm_init.c — memmap_init_range()** > > Runs during setup_arch() [1], before kho_memory_init() [2]. 
Sets > MIGRATE_MOVABLE on scratch pageblocks, but kho_release_scratch() runs > afterward and correctly overrides to MIGRATE_CMA for non-deferred > scratch. For deferred scratch, memmap_init_range() stops at > first_deferred_pfn and never processes them. > > **Needs the fix: no.** > > **mm/mm_init.c — __init_zone_device_page()** > > ZONE_DEVICE path only. Scratch is normal RAM, not ZONE_DEVICE. > > **Needs the fix: no.** > > **mm/mm_init.c — memblock_free_pages() (lines ~2012 and ~2023)** > > Called by memblock_free_all() for free (non-reserved) memblock regions. > Scratch is memblock-reserved and released through the CMA path, not > through memblock_free_all(). > > **Needs the fix: no.** > > **mm/mm_init.c — init_cma_reserved_pageblock() / init_cma_pageblock()** > > Both already pass MIGRATE_CMA. The kho_scratch_overlap() check would > be redundant even if scratch reaches these paths. > > **Needs the fix: no (redundant).** > > **mm/hugetlb.c — __prep_compound_gigantic_folio()** > > Gigantic hugepage setup. Scratch regions are not used for gigantic > hugepages. > > **Needs the fix: no.** > > **kernel/liveupdate/kexec_handover.c — kho_release_scratch()** > > Already passes MIGRATE_CMA. Additionally, kho_scratch is NULL at the > point kho_release_scratch() runs (kho_memory_init() sets kho_scratch > only after kho_release_scratch() returns), so kho_scratch_overlap() > would return false regardless. > > **Needs the fix: no.** > > ## Conclusion > > The only path that actually requires the MIGRATE_CMA override is > __init_page_from_nid(). All problematic sub-paths (deferred init > kthreads and reserve_bootmem_region()) converge there. 
> > The check could be moved to __init_page_from_nid() to keep the > KHO-specific concern out of the generic init_pageblock_migratetype(): > > ```c > /* mm/mm_init.c: __init_page_from_nid() */ > if (pageblock_aligned(pfn)) { > enum migratetype mt = MIGRATE_MOVABLE; > if (kho_scratch_overlap(PFN_PHYS(pfn), PAGE_SIZE)) > mt = MIGRATE_CMA; > init_pageblock_migratetype(pfn_to_page(pfn), mt, false); > } > ``` > > __init_page_from_nid() is only compiled under CONFIG_DEFERRED_STRUCT_PAGE_INIT, > which is the only configuration where the bug can occur, so the > kho_scratch_overlap() call would be naturally gated by that config. > > > > Best Regards, > Yan, Zi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 17:19 ` Michał Cłapiński @ 2026-03-18 17:36 ` Zi Yan 2026-03-19 7:54 ` Mike Rapoport 0 siblings, 1 reply; 25+ messages in thread From: Zi Yan @ 2026-03-18 17:36 UTC (permalink / raw) To: Michał Cłapiński Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On 18 Mar 2026, at 13:19, Michał Cłapiński wrote: > On Wed, Mar 18, 2026 at 6:08 PM Zi Yan <ziy@nvidia.com> wrote: >> >> On 18 Mar 2026, at 11:45, Michał Cłapiński wrote: >> >>> On Wed, Mar 18, 2026 at 4:26 PM Zi Yan <ziy@nvidia.com> wrote: >>>> >>>> On 18 Mar 2026, at 11:18, Michał Cłapiński wrote: >>>> >>>>> On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy@nvidia.com> wrote: >>>>>> >>>>>> On 17 Mar 2026, at 10:15, Michal Clapinski wrote: >>>>>> >>>>>>> Currently, if DEFERRED is enabled, kho_release_scratch will initialize >>>>>>> the struct pages and set migratetype of kho scratch. Unless the whole >>>>>>> scratch fit below first_deferred_pfn, some of that will be overwritten >>>>>>> either by deferred_init_pages or memmap_init_reserved_pages. >>>>>>> >>>>>>> To fix it, I modified kho_release_scratch to only set the migratetype >>>>>>> on already initialized pages. Then, modified init_pageblock_migratetype >>>>>>> to set the migratetype to CMA if the page is located inside scratch. 
>>>>>>> >>>>>>> Signed-off-by: Michal Clapinski <mclapinski@google.com> >>>>>>> --- >>>>>>> include/linux/memblock.h | 2 -- >>>>>>> kernel/liveupdate/kexec_handover.c | 10 ++++++---- >>>>>>> mm/memblock.c | 22 ---------------------- >>>>>>> mm/page_alloc.c | 7 +++++++ >>>>>>> 4 files changed, 13 insertions(+), 28 deletions(-) >>>>>>> >>>>>> >>>>>> <snip> >>>>>> >>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>>> index ee81f5c67c18..5ca078dde61d 100644 >>>>>>> --- a/mm/page_alloc.c >>>>>>> +++ b/mm/page_alloc.c >>>>>>> @@ -55,6 +55,7 @@ >>>>>>> #include <linux/cacheinfo.h> >>>>>>> #include <linux/pgalloc_tag.h> >>>>>>> #include <linux/mmzone_lock.h> >>>>>>> +#include <linux/kexec_handover.h> >>>>>>> #include <asm/div64.h> >>>>>>> #include "internal.h" >>>>>>> #include "shuffle.h" >>>>>>> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page, >>>>>>> migratetype < MIGRATE_PCPTYPES)) >>>>>>> migratetype = MIGRATE_UNMOVABLE; >>>>>>> >>>>>>> + /* >>>>>>> + * Mark KHO scratch as CMA so no unmovable allocations are made there. >>>>>>> + */ >>>>>>> + if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) >>>>>>> + migratetype = MIGRATE_CMA; >>>>>>> + >>>>>> >>>>>> If this is only for deferred init code, why not put it in deferred_free_pages()? >>>>>> Otherwise, all init_pageblock_migratetype() callers need to pay the penalty >>>>>> of traversing kho_scratch array. >>>>> >>>>> Because reserve_bootmem_region() doesn't call deferred_free_pages(). >>>>> So I would also have to modify it. >>>>> >>>>> And the early initialization won't pay the penalty of traversing the >>>>> kho_scratch array, since then kho_scratch is NULL. >>>> >>>> How about hugetlb_bootmem_init_migratetype(), init_cma_pageblock(), >>>> init_cma_reserved_pageblock(), __init_page_from_nid(), memmap_init_range(), >>>> __init_zone_device_page()? >>>> >>>> 1. are they having any PFN range overlapping with kho? >>>> 2. is kho_scratch NULL for them? 
>>>> >>>> 1 tells us whether putting code in init_pageblock_migratetype() could save >>>> the hassle of changing all above locations. >>>> 2 tells us how many callers are affected by traversing kho_scratch. >>> >>> I could try answering those questions but >>> >>> 1. I'm new to this and I'm not sure how correct the answers will be. >>> >>> 2. If you're not using CONFIG_KEXEC_HANDOVER, the performance penalty >>> will be zero. >>> If you are using it, currently you have to disable >>> CONFIG_DEFERRED_STRUCT_PAGE_INIT and the performance hit from this is >>> far, far greater. This solution saves 0.5s on my setup (100GB of >>> memory). We can always improve the performance further in the future. >>> >> >> OK, I asked Claude for help and the answer is that not all callers of >> init_pageblock_migratetype() touch kho scratch memory regions. Basically, >> you only need to perform the kho_scratch_overlap() check in >> __init_page_from_nid() to achieve the same end result. >> >> >> The below is the analysis from Claude. >> Based on my understanding, >> 1. memmap_init_range() is done before kho_memory_init(), so it does not need >> the check. >> >> 2. __init_zone_device_page() is not relevant. >> >> 3. init_cma_reserved_pageblock() / init_cma_pageblock() are already set >> to MIGRATE_CMA. >> >> 4. hugetlb is not used by kho scratch, so also does not need the check. >> >> 5. kho_release_scratch() already takes care of it. >> >> The remaining memblock_free_pages() needs a check, but I am not 100%. >> >> >> # kho_scratch_overlap() in init_pageblock_migratetype() — scope analysis >> >> ## Context >> >> Commit a7700b3c6779 ("kho: fix deferred init of kho scratch") added a >> kho_scratch_overlap() call inside init_pageblock_migratetype() in >> mm/page_alloc.c: >> >> ```c >> if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE))) >> migratetype = MIGRATE_CMA; >> ``` >> >> kho_scratch_overlap() does a NULL check followed by a loop over the >> kho_scratch array. 
For non-KHO boots (kho_scratch == NULL) the cost is >> a single NULL load and branch. For KHO boots the loop runs on every call >> to init_pageblock_migratetype(). >> >> ## Question >> >> Does this add overhead for callers whose memory range cannot overlap >> with scratch? Can the check be moved to the caller side? >> >> ## Call site analysis >> >> init_pageblock_migratetype() has nine call sites. The init call ordering >> relevant to scratch is: >> >> ``` >> setup_arch() >> zone_sizes_init() -> free_area_init() -> memmap_init_range() [1] >> >> mm_init_free_all() / start_kernel(): >> kho_memory_init() -> kho_release_scratch() [2] >> memblock_free_all() >> free_low_memory_core_early() >> memmap_init_reserved_pages() >> reserve_bootmem_region() -> __init_deferred_page() >> -> __init_page_from_nid() [3] >> deferred init kthreads -> __init_page_from_nid() [4] >> ``` > > I don't understand this. deferred_free_pages() doesn't call > __init_page_from_nid(). So I would clearly need to modify both > deferred_free_pages and __init_page_from_nid. Sure. But other callers I mentioned above do not need to check kho_scratch, right? Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-18 17:36 ` Zi Yan @ 2026-03-19 7:54 ` Mike Rapoport 2026-03-19 18:17 ` Michał Cłapiński 0 siblings, 1 reply; 25+ messages in thread From: Mike Rapoport @ 2026-03-19 7:54 UTC (permalink / raw) To: Zi Yan Cc: Michał Cłapiński, Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton Hi, On Wed, Mar 18, 2026 at 01:36:07PM -0400, Zi Yan wrote: > On 18 Mar 2026, at 13:19, Michał Cłapiński wrote: > > On Wed, Mar 18, 2026 at 6:08 PM Zi Yan <ziy@nvidia.com> wrote: > >> > >> ## Call site analysis > >> > >> init_pageblock_migratetype() has nine call sites. The init call ordering > >> relevant to scratch is: > >> > >> ``` > >> setup_arch() > >> zone_sizes_init() -> free_area_init() -> memmap_init_range() [1] Hmm, this is slightly outdated, but largely correct :) > >> > >> mm_init_free_all() / start_kernel(): > >> kho_memory_init() -> kho_release_scratch() [2] > >> memblock_free_all() > >> free_low_memory_core_early() > >> memmap_init_reserved_pages() > >> reserve_bootmem_region() -> __init_deferred_page() > >> -> __init_page_from_nid() [3] > >> deferred init kthreads -> __init_page_from_nid() [4] And this is wrong, deferred init does not call __init_page_from_nid, only reserve_bootmem_region() does. And there's a case claude missed: hugetlb_bootmem_free_invalid_page() -> __init_page_from_nid() that shouldn't check for KHO. Well, at least until we have support for hugetlb persistence and most probably even afterwards. I don't think we should modify reserve_bootmem_region(). If there are reserved pages in a pageblock, it does not matter if it's initialized to MIGRATE_CMA. It only becomes important if the reserved pages freed, so we can update pageblock migrate type in free_reserved_area(). When we boot with KHO, all memblock allocations come from scratch, so anything freed in free_reserved_area() should become CMA again. 
> >> ``` > > > > I don't understand this. deferred_free_pages() doesn't call > > __init_page_from_nid(). So I would clearly need to modify both > > deferred_free_pages and __init_page_from_nid. For deferred_free_pages() we don't need kho_scratch_overlap(), we already have memblock_region (almost) at hand and it's enough to check if it's MEMBLOCK_KHO_SCRATCH. Something along these lines (compile tested only) should do the trick: diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 3e217414e12d..b9b1e0991ec8 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -275,6 +275,8 @@ static inline void __next_physmem_range(u64 *idx, struct memblock_type *type, __for_each_mem_range(i, &memblock.reserved, NULL, NUMA_NO_NODE, \ MEMBLOCK_NONE, p_start, p_end, NULL) +struct memblock_region *memblock_region_from_iter(u64 iterator); + static inline bool memblock_is_hotpluggable(struct memblock_region *m) { return m->flags & MEMBLOCK_HOTPLUG; diff --git a/mm/memblock.c b/mm/memblock.c index ae6a5af46bd7..9cf99f32279f 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1359,6 +1359,16 @@ void __init_memblock __next_mem_range_rev(u64 *idx, int nid, *idx = ULLONG_MAX; } +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator) +{ + int index = iterator & 0xffffffff; + + if (index < 0 || index >= memblock.memory.cnt) + return NULL; + + return &memblock.memory.regions[index]; +} + /* * Common iterator interface used to define for_each_mem_pfn_range(). 
*/ diff --git a/mm/mm_init.c b/mm/mm_init.c index cec7bb758bdd..96b25895ffbe 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1996,7 +1996,7 @@ unsigned long __init node_map_pfn_alignment(void) #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT static void __init deferred_free_pages(unsigned long pfn, - unsigned long nr_pages) + unsigned long nr_pages, enum migratetype mt) { struct page *page; unsigned long i; @@ -2009,8 +2009,7 @@ static void __init deferred_free_pages(unsigned long pfn, /* Free a large naturally-aligned chunk if possible */ if (nr_pages == MAX_ORDER_NR_PAGES && IS_MAX_ORDER_ALIGNED(pfn)) { for (i = 0; i < nr_pages; i += pageblock_nr_pages) - init_pageblock_migratetype(page + i, MIGRATE_MOVABLE, - false); + init_pageblock_migratetype(page + i, mt, false); __free_pages_core(page, MAX_PAGE_ORDER, MEMINIT_EARLY); return; } @@ -2020,8 +2019,7 @@ static void __init deferred_free_pages(unsigned long pfn, for (i = 0; i < nr_pages; i++, page++, pfn++) { if (pageblock_aligned(pfn)) - init_pageblock_migratetype(page, MIGRATE_MOVABLE, - false); + init_pageblock_migratetype(page, mt, false); __free_pages_core(page, 0, MEMINIT_EARLY); } } @@ -2077,6 +2075,8 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, u64 i = 0; for_each_free_mem_range(i, nid, 0, &start, &end, NULL) { + struct memblock_region *region = memblock_region_from_iter(i); + enum migratetype mt = MIGRATE_MOVABLE; unsigned long spfn = PFN_UP(start); unsigned long epfn = PFN_DOWN(end); @@ -2086,12 +2086,15 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, spfn = max(spfn, start_pfn); epfn = min(epfn, end_pfn); + if (memblock_is_kho_scratch(region)) + mt = MIGRATE_CMA; + while (spfn < epfn) { unsigned long mo_pfn = ALIGN(spfn + 1, MAX_ORDER_NR_PAGES); unsigned long chunk_end = min(mo_pfn, epfn); nr_pages += deferred_init_pages(zone, spfn, chunk_end); - deferred_free_pages(spfn, chunk_end - spfn); + deferred_free_pages(spfn, chunk_end - spfn, mt); spfn = 
chunk_end; -- Sincerely yours, Mike. ^ permalink raw reply related [flat|nested] 25+ messages in thread
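The idea behind the draft above can be reduced to a user-space sketch: pick each free range's pageblock migratetype once, from the memblock region's flags, instead of testing every page against the kho_scratch array. The `toy_*` names and the flag bit value are illustrative assumptions; only the `memblock_is_kho_scratch(region) ? MIGRATE_CMA : MIGRATE_MOVABLE` selection mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bit; not the kernel's MEMBLOCK_KHO_SCRATCH value. */
#define TOY_MEMBLOCK_KHO_SCRATCH 0x1

enum migratetype { MIGRATE_MOVABLE, MIGRATE_CMA };

struct toy_region {
        uint64_t base, size;
        unsigned int flags;
};

/* O(1) per free range, versus an O(#scratch) overlap scan per page:
 * a scratch-flagged region's pageblocks all become MIGRATE_CMA. */
static enum migratetype toy_range_migratetype(const struct toy_region *r)
{
        return (r->flags & TOY_MEMBLOCK_KHO_SCRATCH) ? MIGRATE_CMA
                                                     : MIGRATE_MOVABLE;
}
```

A scratch-flagged region maps to MIGRATE_CMA and an unflagged one to MIGRATE_MOVABLE; the cost no longer depends on how many scratch areas exist.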
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-19 7:54 ` Mike Rapoport @ 2026-03-19 18:17 ` Michał Cłapiński 2026-03-22 14:45 ` Mike Rapoport 0 siblings, 1 reply; 25+ messages in thread From: Michał Cłapiński @ 2026-03-19 18:17 UTC (permalink / raw) To: Mike Rapoport Cc: Zi Yan, Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On Thu, Mar 19, 2026 at 8:54 AM Mike Rapoport <rppt@kernel.org> wrote: > > Hi, > > On Wed, Mar 18, 2026 at 01:36:07PM -0400, Zi Yan wrote: > > On 18 Mar 2026, at 13:19, Michał Cłapiński wrote: > > > On Wed, Mar 18, 2026 at 6:08 PM Zi Yan <ziy@nvidia.com> wrote: > > >> > > >> ## Call site analysis > > >> > > >> init_pageblock_migratetype() has nine call sites. The init call ordering > > >> relevant to scratch is: > > >> > > >> ``` > > >> setup_arch() > > >> zone_sizes_init() -> free_area_init() -> memmap_init_range() [1] > > Hmm, this is slightly outdated, but largely correct :) > > > >> > > >> mm_init_free_all() / start_kernel(): > > >> kho_memory_init() -> kho_release_scratch() [2] > > >> memblock_free_all() > > >> free_low_memory_core_early() > > >> memmap_init_reserved_pages() > > >> reserve_bootmem_region() -> __init_deferred_page() > > >> -> __init_page_from_nid() [3] > > >> deferred init kthreads -> __init_page_from_nid() [4] > > And this is wrong, deferred init does not call __init_page_from_nid, only > reserve_bootmem_region() does. > > And there's a case claude missed: > > hugetlb_bootmem_free_invalid_page() -> __init_page_from_nid() that > shouldn't check for KHO. Well, at least until we have support for hugetlb > persistence and most probably even afterwards. > > I don't think we should modify reserve_bootmem_region(). If there are > reserved pages in a pageblock, it does not matter if it's initialized to > MIGRATE_CMA. 
It only becomes important if the reserved pages freed, so we > can update pageblock migrate type in free_reserved_area(). > When we boot with KHO, all memblock allocations come from scratch, so > anything freed in free_reserved_area() should become CMA again. What happens if the reserved area covers one page and that page is pageblock aligned? Then it won't be marked as CMA until it is freed and unmovable allocation might appear in that pageblock, right? > > > >> ``` > > > > > > I don't understand this. deferred_free_pages() doesn't call > > > __init_page_from_nid(). So I would clearly need to modify both > > > deferred_free_pages and __init_page_from_nid. > > For deferred_free_pages() we don't need kho_scratch_overlap(), we already > have memblock_region (almost) at hand and it's enough to check if it's > MEMBLOCK_KHO_SCRATCH. > > Something along these lines (compile tested only) should do the trick: > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h > index 3e217414e12d..b9b1e0991ec8 100644 > --- a/include/linux/memblock.h > +++ b/include/linux/memblock.h > @@ -275,6 +275,8 @@ static inline void __next_physmem_range(u64 *idx, struct memblock_type *type, > __for_each_mem_range(i, &memblock.reserved, NULL, NUMA_NO_NODE, \ > MEMBLOCK_NONE, p_start, p_end, NULL) > > +struct memblock_region *memblock_region_from_iter(u64 iterator); > + > static inline bool memblock_is_hotpluggable(struct memblock_region *m) > { > return m->flags & MEMBLOCK_HOTPLUG; > diff --git a/mm/memblock.c b/mm/memblock.c > index ae6a5af46bd7..9cf99f32279f 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -1359,6 +1359,16 @@ void __init_memblock __next_mem_range_rev(u64 *idx, int nid, > *idx = ULLONG_MAX; > } > > +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator) > +{ > + int index = iterator & 0xffffffff; I'm not sure about this. __next_mem_range() has this code: /* * The region which ends first is * advanced for the next iteration. 
*/ if (m_end <= r_end) idx_a++; else idx_b++; Therefore, the index you get from this might be correct or it might already be incremented. > + > + if (index < 0 || index >= memblock.memory.cnt) > + return NULL; > + > + return &memblock.memory.regions[index]; > +} > + > /* > * Common iterator interface used to define for_each_mem_pfn_range(). > */ > diff --git a/mm/mm_init.c b/mm/mm_init.c > index cec7bb758bdd..96b25895ffbe 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -1996,7 +1996,7 @@ unsigned long __init node_map_pfn_alignment(void) > > #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > static void __init deferred_free_pages(unsigned long pfn, > - unsigned long nr_pages) > + unsigned long nr_pages, enum migratetype mt) > { > struct page *page; > unsigned long i; > @@ -2009,8 +2009,7 @@ static void __init deferred_free_pages(unsigned long pfn, > /* Free a large naturally-aligned chunk if possible */ > if (nr_pages == MAX_ORDER_NR_PAGES && IS_MAX_ORDER_ALIGNED(pfn)) { > for (i = 0; i < nr_pages; i += pageblock_nr_pages) > - init_pageblock_migratetype(page + i, MIGRATE_MOVABLE, > - false); > + init_pageblock_migratetype(page + i, mt, false); > __free_pages_core(page, MAX_PAGE_ORDER, MEMINIT_EARLY); > return; > } > @@ -2020,8 +2019,7 @@ static void __init deferred_free_pages(unsigned long pfn, > > for (i = 0; i < nr_pages; i++, page++, pfn++) { > if (pageblock_aligned(pfn)) > - init_pageblock_migratetype(page, MIGRATE_MOVABLE, > - false); > + init_pageblock_migratetype(page, mt, false); > __free_pages_core(page, 0, MEMINIT_EARLY); > } > } > @@ -2077,6 +2075,8 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, > u64 i = 0; > > for_each_free_mem_range(i, nid, 0, &start, &end, NULL) { > + struct memblock_region *region = memblock_region_from_iter(i); > + enum migratetype mt = MIGRATE_MOVABLE; > unsigned long spfn = PFN_UP(start); > unsigned long epfn = PFN_DOWN(end); > > @@ -2086,12 +2086,15 @@ deferred_init_memmap_chunk(unsigned long 
start_pfn, unsigned long end_pfn, > spfn = max(spfn, start_pfn); > epfn = min(epfn, end_pfn); > > + if (memblock_is_kho_scratch(region)) > + mt = MIGRATE_CMA; > + > while (spfn < epfn) { > unsigned long mo_pfn = ALIGN(spfn + 1, MAX_ORDER_NR_PAGES); > unsigned long chunk_end = min(mo_pfn, epfn); > > nr_pages += deferred_init_pages(zone, spfn, chunk_end); > - deferred_free_pages(spfn, chunk_end - spfn); > + deferred_free_pages(spfn, chunk_end - spfn, mt); > > spfn = chunk_end; > > -- > Sincerely yours, > Mike. ^ permalink raw reply [flat|nested] 25+ messages in thread
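The objection about the iterator can be reproduced with a toy model of `__next_mem_range()`'s index bookkeeping. This sketch is a deliberate simplification: the real helper also clips against `memblock.reserved`, packs a second index into the high 32 bits, and only advances whichever index's region ends first; here only the "advance before storing the iterator" behavior is kept, which is enough to show that decoding the low bits afterwards names the region for the *next* iteration, not the one just returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_range { uint64_t base, size; };

/* Toy model of __next_mem_range() bookkeeping: return mem[idx_a], then
 * advance idx_a *before* storing the packed iterator value, as the
 * kernel helper does ("advanced for the next iteration"). */
static bool toy_next_mem_range(uint64_t *it, const struct toy_range *mem,
                               size_t cnt, struct toy_range *out)
{
        uint32_t idx_a = (uint32_t)*it;

        if (idx_a >= cnt)
                return false;
        *out = mem[idx_a];
        idx_a++;                /* region already consumed -> advance */
        *it = idx_a;            /* high 32 bits elided in this model */
        return true;
}
```

After the first successful call, the range of region 0 is returned, yet `it & 0xffffffff` already reads 1, so a `memblock_region_from_iter()`-style decode would inspect the wrong region's flags.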
* Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch 2026-03-19 18:17 ` Michał Cłapiński @ 2026-03-22 14:45 ` Mike Rapoport 0 siblings, 0 replies; 25+ messages in thread From: Mike Rapoport @ 2026-03-22 14:45 UTC (permalink / raw) To: Michał Cłapiński Cc: Zi Yan, Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel, Andrew Morton On Thu, Mar 19, 2026 at 07:17:48PM +0100, Michał Cłapiński wrote: > On Thu, Mar 19, 2026 at 8:54 AM Mike Rapoport <rppt@kernel.org> wrote: > > > > Hi, > > > > On Wed, Mar 18, 2026 at 01:36:07PM -0400, Zi Yan wrote: > > > On 18 Mar 2026, at 13:19, Michał Cłapiński wrote: > > > > On Wed, Mar 18, 2026 at 6:08 PM Zi Yan <ziy@nvidia.com> wrote: > > > >> > > > >> ## Call site analysis > > > >> > > > >> init_pageblock_migratetype() has nine call sites. The init call ordering > > > >> relevant to scratch is: > > > >> > > > >> ``` > > > >> setup_arch() > > > >> zone_sizes_init() -> free_area_init() -> memmap_init_range() [1] > > > > Hmm, this is slightly outdated, but largely correct :) > > > > > >> > > > >> mm_init_free_all() / start_kernel(): > > > >> kho_memory_init() -> kho_release_scratch() [2] > > > >> memblock_free_all() > > > >> free_low_memory_core_early() > > > >> memmap_init_reserved_pages() > > > >> reserve_bootmem_region() -> __init_deferred_page() > > > >> -> __init_page_from_nid() [3] > > > >> deferred init kthreads -> __init_page_from_nid() [4] > > > > And this is wrong, deferred init does not call __init_page_from_nid, only > > reserve_bootmem_region() does. > > > > And there's a case claude missed: > > > > hugetlb_bootmem_free_invalid_page() -> __init_page_from_nid() that > > shouldn't check for KHO. Well, at least until we have support for hugetlb > > persistence and most probably even afterwards. > > > > I don't think we should modify reserve_bootmem_region(). 
If there are > > reserved pages in a pageblock, it does not matter if it's initialized to > > MIGRATE_CMA. It only becomes important if the reserved pages freed, so we > > can update pageblock migrate type in free_reserved_area(). > > When we boot with KHO, all memblock allocations come from scratch, so > > anything freed in free_reserved_area() should become CMA again. > > What happens if the reserved area covers one page and that page is > pageblock aligned? Then it won't be marked as CMA until it is freed > and unmovable allocation might appear in that pageblock, right? > > > +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator) > > +{ > > + int index = iterator & 0xffffffff; > > I'm not sure about this. __next_mem_range() has this code: > /* > * The region which ends first is > * advanced for the next iteration. > */ > if (m_end <= r_end) > idx_a++; > else > idx_b++; > > Therefore, the index you get from this might be correct or it might > already be incremented. Hmm, right, missed that :/ Still, we can check if an address is inside scratch in reserve_bootmem_regions() and in deferred_init_pages() and set migrate type to CMA in that case. I think something like the patch below should work. It might not be the most optimized, but it localizes the changes to mm_init and memblock and does not complicate the code (well, almost). The patch is on top of https://lore.kernel.org/linux-mm/20260322143144.3540679-1-rppt@kernel.org/T/#u and I pushed the entire set here: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=kho-deferred-init It compiles and passes kho self test with both deferred pages enabled and disabled, but I didn't do further testing yet.
From 97aa1ea8e085a128dd5add73f81a5a1e4e0aad5e Mon Sep 17 00:00:00 2001
From: Michal Clapinski <mclapinski@google.com>
Date: Tue, 17 Mar 2026 15:15:33 +0100
Subject: [PATCH] kho: fix deferred initialization of scratch areas

Currently, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
kho_release_scratch() will initialize the struct pages and set the
migratetype of KHO scratch. Unless the whole scratch fits below
first_deferred_pfn, some of that will be overwritten either by
deferred_init_pages() or memmap_init_reserved_range().

To fix it, modify kho_release_scratch() to only set the migratetype on
already initialized pages, and make deferred_init_pages() and
memmap_init_reserved_range() recognize KHO scratch regions and set the
migratetype of pageblocks in those regions to MIGRATE_CMA.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 include/linux/memblock.h           |  7 ++++--
 kernel/liveupdate/kexec_handover.c | 10 +++++---
 mm/memblock.c                      | 39 +++++++++++++-----------------
 mm/mm_init.c                       | 14 ++++++-----
 4 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 6ec5e9ac0699..410f2a399691 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -614,11 +614,14 @@ static inline void memtest_report_meminfo(struct seq_file *m) { }
 #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
 void memblock_set_kho_scratch_only(void);
 void memblock_clear_kho_scratch_only(void);
-void memmap_init_kho_scratch_pages(void);
+bool memblock_is_kho_scratch_memory(phys_addr_t addr);
 #else
 static inline void memblock_set_kho_scratch_only(void) { }
 static inline void memblock_clear_kho_scratch_only(void) { }
-static inline void memmap_init_kho_scratch_pages(void) {}
+static inline bool memblock_is_kho_scratch_memory(phys_addr_t addr)
+{
+	return false;
+}
 #endif

 #endif /* _LINUX_MEMBLOCK_H */
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 532f455c5d4f..12292b83bf49 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -1457,8 +1457,7 @@ static void __init kho_release_scratch(void)
 {
 	phys_addr_t start, end;
 	u64 i;
-
-	memmap_init_kho_scratch_pages();
+	int nid;

 	/*
 	 * Mark scratch mem as CMA before we return it. That way we
@@ -1466,10 +1465,13 @@ static void __init kho_release_scratch(void)
 	 * we can reuse it as scratch memory again later.
 	 */
 	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			     MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
+			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
 		ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
 		ulong end_pfn = pageblock_align(PFN_UP(end));
 		ulong pfn;

+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+		end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
+#endif
 		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
 			init_pageblock_migratetype(pfn_to_page(pfn),
@@ -1480,8 +1482,8 @@ static void __init kho_release_scratch(void)
 void __init kho_memory_init(void)
 {
 	if (kho_in.scratch_phys) {
-		kho_scratch = phys_to_virt(kho_in.scratch_phys);
 		kho_release_scratch();
+		kho_scratch = phys_to_virt(kho_in.scratch_phys);

 		if (kho_mem_retrieve(kho_get_fdt()))
 			kho_in.fdt_phys = 0;
diff --git a/mm/memblock.c b/mm/memblock.c
index 17aa8661b84d..fe50d60db9c6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -17,6 +17,7 @@
 #include <linux/seq_file.h>
 #include <linux/memblock.h>
 #include <linux/mutex.h>
+#include <linux/page-isolation.h>

 #ifdef CONFIG_KEXEC_HANDOVER
 #include <linux/libfdt.h>
@@ -959,28 +960,6 @@ __init void memblock_clear_kho_scratch_only(void)
 {
 	kho_scratch_only = false;
 }
-
-__init void memmap_init_kho_scratch_pages(void)
-{
-	phys_addr_t start, end;
-	unsigned long pfn;
-	int nid;
-	u64 i;
-
-	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
-		return;
-
-	/*
-	 * Initialize struct pages for free scratch memory.
-	 * The struct pages for reserved scratch memory will be set up in
-	 * reserve_bootmem_region()
-	 */
-	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
-			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
-		for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
-			init_deferred_page(pfn, nid);
-	}
-}
 #endif

 /**
@@ -1971,6 +1950,18 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
 	return !memblock_is_nomap(&memblock.memory.regions[i]);
 }

+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
+bool __init_memblock memblock_is_kho_scratch_memory(phys_addr_t addr)
+{
+	int i = memblock_search(&memblock.memory, addr);
+
+	if (i == -1)
+		return false;
+
+	return memblock_is_kho_scratch(&memblock.memory.regions[i]);
+}
+#endif
+
 int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
 			unsigned long *start_pfn, unsigned long *end_pfn)
 {
@@ -2262,6 +2253,10 @@ static void __init memmap_init_reserved_range(phys_addr_t start,
 		 * access it yet.
 		 */
 		__SetPageReserved(page);
+
+		if (memblock_is_kho_scratch_memory(PFN_PHYS(pfn)) &&
+		    pageblock_aligned(pfn))
+			init_pageblock_migratetype(page, MIGRATE_CMA, false);
 	}
 }

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 96ae6024a75f..5ead2b0f07c6 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1971,7 +1971,7 @@ unsigned long __init node_map_pfn_alignment(void)

 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static void __init deferred_free_pages(unsigned long pfn,
-				       unsigned long nr_pages)
+				       unsigned long nr_pages, enum migratetype mt)
 {
 	struct page *page;
 	unsigned long i;
@@ -1984,8 +1984,7 @@ static void __init deferred_free_pages(unsigned long pfn,

 	/* Free a large naturally-aligned chunk if possible */
 	if (nr_pages == MAX_ORDER_NR_PAGES && IS_MAX_ORDER_ALIGNED(pfn)) {
 		for (i = 0; i < nr_pages; i += pageblock_nr_pages)
-			init_pageblock_migratetype(page + i, MIGRATE_MOVABLE,
-						   false);
+			init_pageblock_migratetype(page + i, mt, false);
 		__free_pages_core(page, MAX_PAGE_ORDER, MEMINIT_EARLY);
 		return;
 	}
@@ -1995,8 +1994,7 @@ static void __init deferred_free_pages(unsigned long pfn,

 	for (i = 0; i < nr_pages; i++, page++, pfn++) {
 		if (pageblock_aligned(pfn))
-			init_pageblock_migratetype(page, MIGRATE_MOVABLE,
-						   false);
+			init_pageblock_migratetype(page, mt, false);
 		__free_pages_core(page, 0, MEMINIT_EARLY);
 	}
 }
@@ -2052,6 +2050,7 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
 	u64 i = 0;

 	for_each_free_mem_range(i, nid, 0, &start, &end, NULL) {
+		enum migratetype mt = MIGRATE_MOVABLE;
 		unsigned long spfn = PFN_UP(start);
 		unsigned long epfn = PFN_DOWN(end);
@@ -2061,12 +2060,15 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
 		spfn = max(spfn, start_pfn);
 		epfn = min(epfn, end_pfn);

+		if (memblock_is_kho_scratch_memory(PFN_PHYS(spfn)))
+			mt = MIGRATE_CMA;
+
 		while (spfn < epfn) {
 			unsigned long mo_pfn = ALIGN(spfn + 1, MAX_ORDER_NR_PAGES);
 			unsigned long chunk_end = min(mo_pfn, epfn);

 			nr_pages += deferred_init_pages(zone, spfn, chunk_end);
-			deferred_free_pages(spfn, chunk_end - spfn);
+			deferred_free_pages(spfn, chunk_end - spfn, mt);

 			spfn = chunk_end;
-- 
2.53.0

-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH v7 3/3] kho: make preserved pages compatible with deferred struct page init
  2026-03-17 14:15 [PATCH v7 0/3] kho: add support for deferred struct page init Michal Clapinski
  2026-03-17 14:15 ` [PATCH v7 1/3] kho: make kho_scratch_overlap usable outside debugging Michal Clapinski
  2026-03-17 14:15 ` [PATCH v7 2/3] kho: fix deferred init of kho scratch Michal Clapinski
@ 2026-03-17 14:15 ` Michal Clapinski
  2026-03-17 17:46 ` [PATCH v7 0/3] kho: add support for " Andrew Morton
  2026-03-18  9:18 ` Mike Rapoport
  4 siblings, 0 replies; 25+ messages in thread
From: Michal Clapinski @ 2026-03-17 14:15 UTC (permalink / raw)
To: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport,
	Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec, linux-mm
Cc: linux-kernel, Andrew Morton, Michal Clapinski

From: Evangelos Petrongonas <epetron@amazon.de>

When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, struct page
initialization is deferred to parallel kthreads that run later in the
boot process.

During KHO restoration, kho_preserved_memory_reserve() writes metadata
for each preserved memory region. However, if the struct page has not
been initialized, this write targets uninitialized memory, potentially
leading to errors like:

  BUG: unable to handle page fault for address: ...

Fix this by introducing kho_get_preserved_page(), which ensures all
struct pages in a preserved region are initialized by calling
init_deferred_page(), which is a no-op when the struct page is already
initialized.

Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
Co-developed-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
I think we can't initialize those struct pages in kho_restore_page.
I encountered this stack:

  page_zone(start_page)
  __pageblock_pfn_to_page
  set_zone_contiguous
  page_alloc_init_late

So, at the end of page_alloc_init_late struct pages are expected to be
already initialized. set_zone_contiguous() looks at the first and last
struct page of each pageblock in each populated zone to figure out if
the zone is contiguous. If a kho page lands on a pageblock boundary,
this will lead to access of an uninitialized struct page.

There is also page_ext_init that invokes pfn_to_nid, which calls
page_to_nid for each section-aligned page. There might be other places
that do something similar. Therefore, it's a good idea to initialize
all struct pages by the end of deferred struct page init. That's why
I'm resending Evangelos's patch.

I also tried to implement Pratyush's idea, i.e. iterate over zones,
then get the node from the zone. I didn't notice any performance
difference even with 8GB of kho.
---
 kernel/liveupdate/Kconfig          |  2 --
 kernel/liveupdate/kexec_handover.c | 27 ++++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index 1a8513f16ef7..c13af38ba23a 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -1,12 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0-only

 menu "Live Update and Kexec HandOver"
-	depends on !DEFERRED_STRUCT_PAGE_INIT

 config KEXEC_HANDOVER
 	bool "kexec handover"
 	depends on ARCH_SUPPORTS_KEXEC_HANDOVER && ARCH_SUPPORTS_KEXEC_FILE
-	depends on !DEFERRED_STRUCT_PAGE_INIT
 	select MEMBLOCK_KHO_SCRATCH
 	select KEXEC_FILE
 	select LIBFDT
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index e511a50fab9c..b49ebcd0b946 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -471,6 +471,31 @@ struct page *kho_restore_pages(phys_addr_t phys, unsigned long nr_pages)
 }
 EXPORT_SYMBOL_GPL(kho_restore_pages);

+/*
+ * With CONFIG_DEFERRED_STRUCT_PAGE_INIT, struct pages in higher memory regions
+ * may not be initialized yet at the time KHO deserializes preserved memory.
+ * KHO uses the struct page to store metadata and a later initialization would
+ * overwrite it.
+ * Ensure all the struct pages in the preservation are
+ * initialized. kho_preserved_memory_reserve() marks the reservation as noinit
+ * to make sure they don't get re-initialized later.
+ */
+static struct page *__init kho_get_preserved_page(phys_addr_t phys,
+						  unsigned int order)
+{
+	unsigned long pfn = PHYS_PFN(phys);
+	int nid;
+
+	if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
+		return pfn_to_page(pfn);
+
+	nid = early_pfn_to_nid(pfn);
+	for (unsigned long i = 0; i < (1UL << order); i++)
+		init_deferred_page(pfn + i, nid);
+
+	return pfn_to_page(pfn);
+}
+
 static int __init kho_preserved_memory_reserve(phys_addr_t phys,
 					       unsigned int order)
 {
@@ -479,7 +504,7 @@ static int __init kho_preserved_memory_reserve(phys_addr_t phys,
 	u64 sz;

 	sz = 1 << (order + PAGE_SHIFT);
-	page = phys_to_page(phys);
+	page = kho_get_preserved_page(phys, order);

 	/* Reserve the memory preserved in KHO in memblock */
 	memblock_reserve(phys, sz);
-- 
2.53.0.851.ga537e3e6e9-goog

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH v7 0/3] kho: add support for deferred struct page init
  2026-03-17 14:15 [PATCH v7 0/3] kho: add support for deferred struct page init Michal Clapinski
  ` (2 preceding siblings ...)
  2026-03-17 14:15 ` [PATCH v7 3/3] kho: make preserved pages compatible with deferred struct page init Michal Clapinski
@ 2026-03-17 17:46 ` Andrew Morton
  2026-03-18  9:34 ` Mike Rapoport
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2026-03-17 17:46 UTC (permalink / raw)
To: Michal Clapinski
Cc: Evangelos Petrongonas, Pasha Tatashin, Mike Rapoport,
	Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec,
	linux-mm, linux-kernel

On Tue, 17 Mar 2026 15:15:31 +0100 Michal Clapinski <mclapinski@google.com> wrote:

> When CONFIG_DEFERRED_STRUCT_PAGE_INIT (hereinafter DEFERRED) is
> enabled, struct page initialization is deferred to parallel kthreads
> that run later in the boot process.
>
> Currently, KHO is incompatible with DEFERRED.
> This series fixes that incompatibility.

Thanks, I've added this series to mm.git's mm-new branch for testing.

All being well I'll move it into the mm-unstable branch (and hence
linux-next) in a few days. All being well I'll move it into the
non-rebasing mm-stable branch a few weeks hence. Then into mainline
during the merge window!

[1/3] and [2/3] aren't showing any review at this time, but the
well-reviewed [3/3] is the meat of this series.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v7 0/3] kho: add support for deferred struct page init
  2026-03-17 17:46 ` [PATCH v7 0/3] kho: add support for " Andrew Morton
@ 2026-03-18  9:34 ` Mike Rapoport
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2026-03-18 9:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Clapinski, Evangelos Petrongonas, Pasha Tatashin,
	Pratyush Yadav, Alexander Graf, Samiullah Khawaja, kexec,
	linux-mm, linux-kernel

On Tue, Mar 17, 2026 at 10:46:25AM -0700, Andrew Morton wrote:
> On Tue, 17 Mar 2026 15:15:31 +0100 Michal Clapinski <mclapinski@google.com> wrote:
>
> > When CONFIG_DEFERRED_STRUCT_PAGE_INIT (hereinafter DEFERRED) is
> > enabled, struct page initialization is deferred to parallel kthreads
> > that run later in the boot process.
> >
> > Currently, KHO is incompatible with DEFERRED.
> > This series fixes that incompatibility.
>
> Thanks, I've added this series to mm.git's mm-new branch for testing.
> All being well I'll move it into the mm-unstable branch (and hence
> linux-next) in a few days. All being well I'll move it into the
> non-rebasing mm-stable branch a few weeks hence. Then into mainline
> during the merge window!
>
> [1/3] and [2/3] aren't showing any review at this time, but the

They keep changing because of review comments; that's why there are
still no tags for them.

> well-reviewed [3/3] is the meat of this series.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v7 0/3] kho: add support for deferred struct page init
  2026-03-17 14:15 [PATCH v7 0/3] kho: add support for deferred struct page init Michal Clapinski
  ` (3 preceding siblings ...)
  2026-03-17 17:46 ` [PATCH v7 0/3] kho: add support for " Andrew Morton
@ 2026-03-18  9:18 ` Mike Rapoport
  4 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2026-03-18 9:18 UTC (permalink / raw)
To: Michal Clapinski
Cc: Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav,
	Alexander Graf, Samiullah Khawaja, kexec, linux-mm, linux-kernel,
	Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan

Hi Michal,

On Tue, Mar 17, 2026 at 03:15:31PM +0100, Michal Clapinski wrote:
> When CONFIG_DEFERRED_STRUCT_PAGE_INIT (hereinafter DEFERRED) is
> enabled, struct page initialization is deferred to parallel kthreads
> that run later in the boot process.
>
> Currently, KHO is incompatible with DEFERRED.
> This series fixes that incompatibility.
> ---
> v7:
> - reimplemented the initialization of kho scratch again
> v6:
> - reimplemented the initialization of kho scratch
> v5:
> - rebased
> v4:
> - added a new commit to fix deferred init of kho scratch
> - switched to ulong when referring to pfn
> v3:
> - changed commit msg
> - don't invoke early_pfn_to_nid if CONFIG_DEFERRED_STRUCT_PAGE_INIT=n
> v2:
> - updated a comment
>
> I took Evangelos's test code:
> https://git.infradead.org/?p=users/vpetrog/linux.git;a=shortlog;h=refs/heads/kho-deferred-struct-page-init
> and then modified it to this monster test that does 2 allocations:
> at core_initcall (early) and at module_init (late). Then kexec, then
> 2 more allocations at these points, then restore the original 2, then
> kexec, then restore the other 2. Basically I test preservation of early
> and late allocation both on cold and on warm boot.
> Tested it both with and without DEFERRED.
>
> Evangelos Petrongonas (1):
>   kho: make preserved pages compatible with deferred struct page init
>
> Michal Clapinski (2):
>   kho: make kho_scratch_overlap usable outside debugging
>   kho: fix deferred init of kho scratch
>
>  include/linux/kexec_handover.h              |  6 ++
>  include/linux/memblock.h                    |  2 -
>  kernel/liveupdate/Kconfig                   |  2 -
>  kernel/liveupdate/Makefile                  |  1 -
>  kernel/liveupdate/kexec_handover.c          | 65 ++++++++++++++++++---
>  kernel/liveupdate/kexec_handover_debug.c    | 25 --------
>  kernel/liveupdate/kexec_handover_internal.h |  7 ++-
>  mm/memblock.c                               | 22 -------
>  mm/page_alloc.c                             |  7 +++

Although it's a small change, page_alloc maintainers should be CC'ed.
Adding them now.

>  9 files changed, 74 insertions(+), 63 deletions(-)
>  delete mode 100644 kernel/liveupdate/kexec_handover_debug.c
>
> -- 
> 2.53.0.851.ga537e3e6e9-goog
>

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2026-03-22 14:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-17 14:15 [PATCH v7 0/3] kho: add support for deferred struct page init Michal Clapinski
2026-03-17 14:15 ` [PATCH v7 1/3] kho: make kho_scratch_overlap usable outside debugging Michal Clapinski
2026-03-18  9:16 ` Mike Rapoport
2026-03-17 14:15 ` [PATCH v7 2/3] kho: fix deferred init of kho scratch Michal Clapinski
2026-03-17 23:23 ` Vishal Moola (Oracle)
2026-03-18  0:08 ` SeongJae Park
2026-03-18  0:23 ` Andrew Morton
2026-03-18  9:33 ` Mike Rapoport
2026-03-18 10:28 ` Michał Cłapiński
2026-03-18 10:33 ` Michał Cłapiński
2026-03-18 11:02 ` Mike Rapoport
2026-03-18 15:10 ` Zi Yan
2026-03-18 15:18 ` Michał Cłapiński
2026-03-18 15:26 ` Zi Yan
2026-03-18 15:45 ` Michał Cłapiński
2026-03-18 17:08 ` Zi Yan
2026-03-18 17:19 ` Michał Cłapiński
2026-03-18 17:36 ` Zi Yan
2026-03-19  7:54 ` Mike Rapoport
2026-03-19 18:17 ` Michał Cłapiński
2026-03-22 14:45 ` Mike Rapoport
2026-03-17 14:15 ` [PATCH v7 3/3] kho: make preserved pages compatible with deferred struct page init Michal Clapinski
2026-03-17 17:46 ` [PATCH v7 0/3] kho: add support for " Andrew Morton
2026-03-18  9:34 ` Mike Rapoport
2026-03-18  9:18 ` Mike Rapoport