Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mike Rapoport <rppt@kernel.org>
To: Pratyush Yadav <pratyush@kernel.org>
Cc: "Michał Cłapiński" <mclapinski@google.com>,
	"Zi Yan" <ziy@nvidia.com>,
	"Evangelos Petrongonas" <epetron@amazon.de>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	kexec@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
Date: Thu, 9 Apr 2026 21:06:08 +0300	[thread overview]
Message-ID: <adfqkOWVFgeAkItF@kernel.org> (raw)
In-Reply-To: <2vxzwlyj9d0b.fsf@kernel.org>

On Tue, Apr 07, 2026 at 12:21:56PM +0000, Pratyush Yadav wrote:
> On Sun, Mar 22 2026, Mike Rapoport wrote:
> 
> > On Thu, Mar 19, 2026 at 07:17:48PM +0100, Michał Cłapiński wrote:
> >> On Thu, Mar 19, 2026 at 8:54 AM Mike Rapoport <rppt@kernel.org> wrote:
> [...]
> >> > +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator)
> >> > +{
> >> > +       int index = iterator & 0xffffffff;
> >> 
> >> I'm not sure about this. __next_mem_range() has this code:
> >> /*
> >> * The region which ends first is
> >> * advanced for the next iteration.
> >> */
> >> if (m_end <= r_end)
> >>         idx_a++;
> >> else
> >>         idx_b++;
> >> 
> >> Therefore, the index you get from this might be correct or it might
> >> already be incremented.
> >
> > Hmm, right, missed that :/
> >
> > Still, we can check if an address is inside scratch in
> > reserve_bootmem_regions() and in deferred_init_pages() and set migrate type
> > to CMA in that case.
> >
> > I think something like the patch below should work. It might not be the
> > most optimized, but it localizes the changes to mm_init and memblock and
> > does not complicated the code (well, almost).
> >
> > The patch is on top of
> > https://lore.kernel.org/linux-mm/20260322143144.3540679-1-rppt@kernel.org/T/#u
> >
> > and I pushed the entire set here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=kho-deferred-init
> >
> > It compiles and passes kho self test with both deferred pages enabled and
> > disabled, but I didn't do further testing yet.
> >
> > From 97aa1ea8e085a128dd5add73f81a5a1e4e0aad5e Mon Sep 17 00:00:00 2001
> > From: Michal Clapinski <mclapinski@google.com>
> > Date: Tue, 17 Mar 2026 15:15:33 +0100
> > Subject: [PATCH] kho: fix deferred initialization of scratch areas
> >
> > Currently, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
> > kho_release_scratch() will initialize the struct pages and set migratetype
> > of KHO scratch. Unless the whole scratch fits below first_deferred_pfn, some
> > of that will be overwritten either by deferred_init_pages() or
> > memmap_init_reserved_range().
> >
> > To fix it, modify kho_release_scratch() to only set the migratetype on
> > already initialized pages and make deferred_init_pages() and
> > memmap_init_reserved_range() recognize KHO scratch regions and set
> > migratetype of pageblocks in that regions to MIGRATE_CMA.
> 
> Hmm, I don't like that how complex this is. It adds another layer of
> complexity to the initialization of the migratetype, and you have to dig
> through all the possible call sites to be sure that we catch all the
> cases. Makes it harder to wrap your head around it. Plus, makes it more
> likely for bugs to slip through if later refactors change some page init
> flow.
> 
> Is the cost to look through the scratch array really that bad? I would
> suspect we'd have at most 4-6 per-node scratches, and one global one
> lowmem. So I'd expect around 10 items to look through, and it will
> probably be in the cache anyway.

The check is most probably cheap enough, but if we stick it into
init_pageblock_migratetype(), we'd check each pageblock even though we have
that information already for much larger chunks. And we should use that
information rather than go back and forth for each pageblock.
 
> Michal, did you ever run any numbers on how much extra time
> init_pageblock_migratetype() takes as a result of your patch?
> 
> Anyway, Mike, if you do want to do it this way, it LGTM for the most
> part, but some comments below.
> 
> > @@ -1466,10 +1465,13 @@ static void __init kho_release_scratch(void)
> >  	 * we can reuse it as scratch memory again later.
> >  	 */
> >  	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> > -			     MEMBLOCK_KHO_SCRATCH, &start, &end, NULL) {
> > +			     MEMBLOCK_KHO_SCRATCH, &start, &end, &nid) {
> >  		ulong start_pfn = pageblock_start_pfn(PFN_DOWN(start));
> >  		ulong end_pfn = pageblock_align(PFN_UP(end));
> >  		ulong pfn;
> > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> > +		end_pfn = min(end_pfn, NODE_DATA(nid)->first_deferred_pfn);
> > +#endif
> 
> Can we just get rid of this entirely? And just update
> memmap_init_zone_range() to also look for scratch and set the
> migratetype correctly from the get go? That's more consistent IMO. The
> two main places that initialize the struct page,
> memmap_init_zone_range() and deferred_init_memmap_chunk(), check for
> scratch and set the migratetype correctly.

We could. E.g. let memmap_init() check the memblock flags and pass the
migratetype to memmap_init_zone_range().

I wanted to avoid as much KHO code in mm/ as possible, but if it is must
have in deferred_init_memmap_chunk() we could add some to memmap_init() as
well.
 
> > @@ -2061,12 +2060,15 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
> >  		spfn = max(spfn, start_pfn);
> >  		epfn = min(epfn, end_pfn);
> >  
> > +		if (memblock_is_kho_scratch_memory(PFN_PHYS(spfn)))
> > +			mt = MIGRATE_CMA;
> 
> Would it make sense for for_each_free_mem_range() to also return the
> flags for the region? Then you won't have to do another search. It adds
> yet another parameter to it so no strong opinion, but something to
> consider.

I hesitated a lot about this.
Have you seen memblock::__next_mem_range() signature? ;-)

I decided to start with something correct, but slowish and leave the churn
and speed for later.
 
> -- 
> Regards,
> Pratyush Yadav

-- 
Sincerely yours,
Mike.

next prev parent reply	other threads:[~2026-04-09 18:06 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 14:15 [PATCH v7 0/3] kho: add support for deferred struct page init Michal Clapinski
2026-03-17 14:15 ` [PATCH v7 1/3] kho: make kho_scratch_overlap usable outside debugging Michal Clapinski
2026-03-18  9:16   ` Mike Rapoport
2026-04-07 10:55     ` Pratyush Yadav
2026-04-07 14:18       ` Pasha Tatashin
2026-04-07 16:09         ` Pratyush Yadav
2026-04-07 16:32           ` Pasha Tatashin
2026-03-17 14:15 ` [PATCH v7 2/3] kho: fix deferred init of kho scratch Michal Clapinski
2026-03-17 23:23   ` Vishal Moola (Oracle)
2026-03-18  0:08     ` SeongJae Park
2026-03-18  0:23       ` Andrew Morton
2026-03-18  9:33   ` Mike Rapoport
2026-03-18 10:28     ` Michał Cłapiński
2026-03-18 10:33     ` Michał Cłapiński
2026-03-18 11:02       ` Mike Rapoport
2026-03-18 15:10   ` Zi Yan
2026-03-18 15:18     ` Michał Cłapiński
2026-03-18 15:26       ` Zi Yan
2026-03-18 15:45         ` Michał Cłapiński
2026-03-18 17:08           ` Zi Yan
2026-03-18 17:19             ` Michał Cłapiński
2026-03-18 17:36               ` Zi Yan
2026-03-19  7:54                 ` Mike Rapoport
2026-03-19 18:17                   ` Michał Cłapiński
2026-03-22 14:45                     ` Mike Rapoport
2026-04-07 12:21                       ` Pratyush Yadav
2026-04-07 13:21                         ` Zi Yan
2026-04-16  9:44                           ` Pratyush Yadav
2026-04-09 18:06                         ` Mike Rapoport [this message]
2026-04-16  9:41                           ` Pratyush Yadav
2026-03-17 14:15 ` [PATCH v7 3/3] kho: make preserved pages compatible with deferred struct page init Michal Clapinski
2026-03-17 17:46 ` [PATCH v7 0/3] kho: add support for " Andrew Morton
2026-03-18  9:34   ` Mike Rapoport
2026-03-18  9:18 ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adfqkOWVFgeAkItF@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=epetron@amazon.de \
    --cc=graf@amazon.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mclapinski@google.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=pratyush@kernel.org \
    --cc=skhawaja@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.