From: Pratyush Yadav <pratyush@kernel.org>
To: Mike Rapoport <rppt@kernel.org>
Cc: "Michał Cłapiński" <mclapinski@google.com>,
"Zi Yan" <ziy@nvidia.com>,
"Evangelos Petrongonas" <epetron@amazon.de>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Pratyush Yadav" <pratyush@kernel.org>,
"Alexander Graf" <graf@amazon.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
kexec@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
Date: Tue, 07 Apr 2026 12:21:56 +0000
Message-ID: <2vxzwlyj9d0b.fsf@kernel.org>
In-Reply-To: <acAAp4JaTGn_STBH@kernel.org> (Mike Rapoport's message of "Sun, 22 Mar 2026 16:45:59 +0200")
On Sun, Mar 22 2026, Mike Rapoport wrote:
> On Thu, Mar 19, 2026 at 07:17:48PM +0100, Michał Cłapiński wrote:
>> On Thu, Mar 19, 2026 at 8:54 AM Mike Rapoport <rppt@kernel.org> wrote:
[...]
>> > +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator)
>> > +{
>> > + int index = iterator & 0xffffffff;
>>
>> I'm not sure about this. __next_mem_range() has this code:
>> /*
>> * The region which ends first is
>> * advanced for the next iteration.
>> */
>> if (m_end <= r_end)
>> idx_a++;
>> else
>> idx_b++;
>>
>> Therefore, the index you get from this might be correct or it might
>> already be incremented.
>
> Hmm, right, missed that :/
>
> Still, we can check if an address is inside scratch in
> reserve_bootmem_regions() and in deferred_init_pages() and set migrate type
> to CMA in that case.
>
> I think something like the patch below should work. It might not be the
> most optimized, but it localizes the changes to mm_init and memblock and
> does not complicate the code (well, almost).
>
> The patch is on top of
> https://lore.kernel.org/linux-mm/20260322143144.3540679-1-rppt@kernel.org/T/#u
>
> and I pushed the entire set here:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=kho-deferred-init
>
> It compiles and passes kho self test with both deferred pages enabled and
> disabled, but I didn't do further testing yet.
>
> From 97aa1ea8e085a128dd5add73f81a5a1e4e0aad5e Mon Sep 17 00:00:00 2001
> From: Michal Clapinski <mclapinski@google.com>
> Date: Tue, 17 Mar 2026 15:15:33 +0100
> Subject: [PATCH] kho: fix deferred initialization of scratch areas
>
> Currently, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
> kho_release_scratch() will initialize the struct pages and set migratetype
> of KHO scratch. Unless the whole scratch fits below first_deferred_pfn, some
> of that will be overwritten either by deferred_init_pages() or
> memmap_init_reserved_range().
>
> To fix it, modify kho_release_scratch() to only set the migratetype on
> already initialized pages, and make deferred_init_pages() and
> memmap_init_reserved_range() recognize KHO scratch regions and set the
> migratetype of pageblocks in those regions to MIGRATE_CMA.
Hmm, I don't like how complex this is. It adds another layer of
complexity to the initialization of the migratetype, and you have to dig
through all the possible call sites to be sure we catch all the cases.
That makes it harder to wrap your head around, and it makes it more
likely for bugs to slip through if later refactors change some page init
flow.
Is the cost of looking through the scratch array really that bad? I
would expect at most 4-6 per-node scratches, plus one global one for
lowmem. So that's around 10 items to look through, and they will
probably be in the cache anyway.
Michal, did you ever run any numbers on how much extra time
init_pageblock_migratetype() takes as a result of your patch?
Anyway, Mike, if you do want to do it this way, it LGTM for the most
part, but some comments below.
>
> Signed-off-by: Michal Clapinski <mclapinski@google.com>
> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/linux/memblock.h | 7 ++++--
> kernel/liveupdate/kexec_handover.c | 10 +++++---
> mm/memblock.c | 39 +++++++++++++-----------------
> mm/mm_init.c | 14 ++++++-----
> 4 files changed, 36 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 6ec5e9ac0699..410f2a399691 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -614,11 +614,14 @@ static inline void memtest_report_meminfo(struct seq_file *m) { }
> #ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
> void memblock_set_kho_scratch_only(void);
> void memblock_clear_kho_scratch_only(void);
> -void memmap_init_kho_scratch_pages(void);
> +bool memblock_is_kho_scratch_memory(phys_addr_t addr);
> #else
> static inline void memblock_set_kho_scratch_only(void) { }
> static inline void memblock_clear_kho_scratch_only(void) { }
> -static inline void memmap_init_kho_scratch_pages(void) {}
> +static inline bool memblock_is_kho_scratch_memory(phys_addr_t addr)
> +{
> + return false;
> +}
> #endif
>
> #endif /* _LINUX_MEMBLOCK_H */
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index 532f455c5d4f..12292b83bf49 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -1457,8 +1457,7 @@ static void __init kho_release_scratch(void)
> {
> phys_addr_t start, end;
> u64 i;
> -
> - memmap_init_kho_scratch_pages();
> + int nid;
>
> /*
> * Mark scratch mem as CMA before we return it. That way we
> @@ -1466,10 +1465,13 @@ static void __init kho_release_scratch(void)
> * we can reuse it as scratch memory again later.
> */
> __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> - MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
> + MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
> ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
> ulong end_pfn = pageblock_align(PFN_UP(end));
> ulong pfn;
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> + end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
> +#endif
Can we just get rid of this entirely and instead update
memmap_init_zone_range() to also look for scratch and set the
migratetype correctly from the get-go? That's more consistent IMO: the
two main places that initialize the struct pages,
memmap_init_zone_range() and deferred_init_memmap_chunk(), would both
check for scratch and set the migratetype correctly.
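Roughly what I mean (untested sketch, and I'm going from memory on the
memmap_init_range() parameters, so treat it as pseudocode):

```c
/* In memmap_init_zone_range(): choose the migratetype up front based
 * on whether the range sits in KHO scratch, instead of fixing it up
 * after the fact in kho_release_scratch(). */
int mt = memblock_is_kho_scratch_memory(PFN_PHYS(start_pfn)) ?
	 MIGRATE_CMA : MIGRATE_MOVABLE;

memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
		  zone_end_pfn, MEMINIT_EARLY, NULL, mt);
```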
>
> for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
> init_pageblock_migratetype(pfn_to_page(pfn),
> @@ -1480,8 +1482,8 @@ static void __init kho_release_scratch(void)
> void __init kho_memory_init(void)
> {
> if (kho_in.scratch_phys) {
> - kho_scratch = phys_to_virt(kho_in.scratch_phys);
> kho_release_scratch();
> + kho_scratch = phys_to_virt(kho_in.scratch_phys);
>
> if (kho_mem_retrieve(kho_get_fdt()))
> kho_in.fdt_phys = 0;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 17aa8661b84d..fe50d60db9c6 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -17,6 +17,7 @@
> #include <linux/seq_file.h>
> #include <linux/memblock.h>
> #include <linux/mutex.h>
> +#include <linux/page-isolation.h>
>
> #ifdef CONFIG_KEXEC_HANDOVER
> #include <linux/libfdt.h>
> @@ -959,28 +960,6 @@ __init void memblock_clear_kho_scratch_only(void)
> {
> kho_scratch_only = false;
> }
> -
> -__init void memmap_init_kho_scratch_pages(void)
> -{
> - phys_addr_t start, end;
> - unsigned long pfn;
> - int nid;
> - u64 i;
> -
> - if (!IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
> - return;
> -
> - /*
> - * Initialize struct pages for free scratch memory.
> - * The struct pages for reserved scratch memory will be set up in
> - * reserve_bootmem_region()
> - */
> - __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> - MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
> - for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
> - init_deferred_page(pfn, nid);
> - }
> -}
> #endif
>
> /**
> @@ -1971,6 +1950,18 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
> return !memblock_is_nomap(&memblock.memory.regions[i]);
> }
>
> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
> +bool __init_memblock memblock_is_kho_scratch_memory(phys_addr_t addr)
> +{
> + int i = memblock_search(&memblock.memory, addr);
> +
> + if (i == -1)
> + return false;
> +
> + return memblock_is_kho_scratch(&memblock.memory.regions[i]);
> +}
> +#endif
> +
> int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
> unsigned long *start_pfn, unsigned long *end_pfn)
> {
> @@ -2262,6 +2253,10 @@ static void __init memmap_init_reserved_range(phys_addr_t start,
> * access it yet.
> */
> __SetPageReserved(page);
> +
> + if (memblock_is_kho_scratch_memory(PFN_PHYS(pfn)) &&
> + pageblock_aligned(pfn))
> + init_pageblock_migratetype(page, MIGRATE_CMA, false);
> }
> }
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 96ae6024a75f..5ead2b0f07c6 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1971,7 +1971,7 @@ unsigned long __init node_map_pfn_alignment(void)
>
> #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> static void __init deferred_free_pages(unsigned long pfn,
> - unsigned long nr_pages)
> + unsigned long nr_pages, enum migratetype mt)
> {
> struct page *page;
> unsigned long i;
> @@ -1984,8 +1984,7 @@ static void __init deferred_free_pages(unsigned long pfn,
> /* Free a large naturally-aligned chunk if possible */
> if (nr_pages == MAX_ORDER_NR_PAGES && IS_MAX_ORDER_ALIGNED(pfn)) {
> for (i = 0; i < nr_pages; i += pageblock_nr_pages)
> - init_pageblock_migratetype(page + i, MIGRATE_MOVABLE,
> - false);
> + init_pageblock_migratetype(page + i, mt, false);
> __free_pages_core(page, MAX_PAGE_ORDER, MEMINIT_EARLY);
> return;
> }
> @@ -1995,8 +1994,7 @@ static void __init deferred_free_pages(unsigned long pfn,
>
> for (i = 0; i < nr_pages; i++, page++, pfn++) {
> if (pageblock_aligned(pfn))
> - init_pageblock_migratetype(page, MIGRATE_MOVABLE,
> - false);
> + init_pageblock_migratetype(page, mt, false);
> __free_pages_core(page, 0, MEMINIT_EARLY);
> }
> }
> @@ -2052,6 +2050,7 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
> u64 i = 0;
>
> for_each_free_mem_range(i, nid, 0, &start, &end, NULL) {
> + enum migratetype mt = MIGRATE_MOVABLE;
> unsigned long spfn = PFN_UP(start);
> unsigned long epfn = PFN_DOWN(end);
>
> @@ -2061,12 +2060,15 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
> spfn = max(spfn, start_pfn);
> epfn = min(epfn, end_pfn);
>
> + if (memblock_is_kho_scratch_memory(PFN_PHYS(spfn)))
> + mt = MIGRATE_CMA;
Would it make sense for for_each_free_mem_range() to also return the
flags for the region? Then you wouldn't have to do another search. It
adds yet another parameter, so no strong opinion, but something to
consider.
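Something like this (the iterator name here is hypothetical, just to
illustrate the shape):

```c
/* Hypothetical variant of the iterator that also hands back the
 * region flags, so the migratetype can be derived without a second
 * memblock search. */
for_each_free_mem_range_flags(i, nid, 0, &start, &end, NULL, &flags) {
	enum migratetype mt = (flags & MEMBLOCK_KHO_SCRATCH) ?
			      MIGRATE_CMA : MIGRATE_MOVABLE;
	...
}
```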
> +
> while (spfn < epfn) {
> unsigned long mo_pfn = ALIGN(spfn + 1, MAX_ORDER_NR_PAGES);
> unsigned long chunk_end = min(mo_pfn, epfn);
>
> nr_pages += deferred_init_pages(zone, spfn, chunk_end);
> - deferred_free_pages(spfn, chunk_end - spfn);
> + deferred_free_pages(spfn, chunk_end - spfn, mt);
>
> spfn = chunk_end;
>
> --
>
> 2.53.0
--
Regards,
Pratyush Yadav