Date: Thu, 19 Mar 2026 09:54:05 +0200
From: Mike Rapoport
To: Zi Yan
Cc: Michał Cłapiński, Evangelos Petrongonas, Pasha Tatashin,
    Pratyush Yadav, Alexander Graf, Samiullah Khawaja,
    kexec@lists.infradead.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Andrew Morton
Subject: Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
References: <20260317141534.815634-1-mclapinski@google.com>
 <20260317141534.815634-3-mclapinski@google.com>
 <76559EF5-8740-4691-8776-0ADD1CCBF2A4@nvidia.com>
 <0D1F59C7-CA35-49C8-B341-32D8C7F4A345@nvidia.com>
 <58A8B1B4-A73B-48D2-8492-A58A03634644@nvidia.com>
In-Reply-To: <58A8B1B4-A73B-48D2-8492-A58A03634644@nvidia.com>

Hi,

On Wed, Mar 18, 2026 at 01:36:07PM -0400, Zi Yan wrote:
> On 18 Mar 2026, at 13:19, Michał Cłapiński wrote:
> > On Wed, Mar 18, 2026 at 6:08 PM Zi Yan wrote:
> >>
> >> ## Call site analysis
> >>
> >> init_pageblock_migratetype() has nine call sites. The init call
> >> ordering relevant to scratch is:
> >>
> >> ```
> >> setup_arch()
> >>   zone_sizes_init() -> free_area_init() -> memmap_init_range()  [1]

Hmm, this is slightly outdated, but largely correct :)

> >> mm_init_free_all() / start_kernel():
> >>   kho_memory_init() -> kho_release_scratch()                    [2]
> >>   memblock_free_all()
> >>     free_low_memory_core_early()
> >>       memmap_init_reserved_pages()
> >>         reserve_bootmem_region() -> __init_deferred_page()
> >>                                  -> __init_page_from_nid()      [3]
> >>   deferred init kthreads -> __init_page_from_nid()              [4]

And this is wrong: deferred init does not call __init_page_from_nid(),
only reserve_bootmem_region() does.
And there's a case Claude missed: hugetlb_bootmem_free_invalid_page() ->
__init_page_from_nid(), which shouldn't check for KHO. Well, at least until
we have support for hugetlb persistence, and most probably even afterwards.

I don't think we should modify reserve_bootmem_region(). If there are
reserved pages in a pageblock, it does not matter whether it is initialized
to MIGRATE_CMA. It only becomes important when the reserved pages are
freed, so we can update the pageblock migratetype in free_reserved_area().
When we boot with KHO, all memblock allocations come from scratch, so
anything freed in free_reserved_area() should become CMA again.

> >> ```
> >
> > I don't understand this. deferred_free_pages() doesn't call
> > __init_page_from_nid(). So I would clearly need to modify both
> > deferred_free_pages() and __init_page_from_nid().

For deferred_free_pages() we don't need kho_scratch_overlap(): we already
have the memblock_region (almost) at hand, and it's enough to check whether
it is MEMBLOCK_KHO_SCRATCH. Something along these lines (compile tested
only) should do the trick:

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 3e217414e12d..b9b1e0991ec8 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -275,6 +275,8 @@ static inline void __next_physmem_range(u64 *idx, struct memblock_type *type,
 	__for_each_mem_range(i, &memblock.reserved, NULL, NUMA_NO_NODE,	\
 			     MEMBLOCK_NONE, p_start, p_end, NULL)
 
+struct memblock_region *memblock_region_from_iter(u64 iterator);
+
 static inline bool memblock_is_hotpluggable(struct memblock_region *m)
 {
 	return m->flags & MEMBLOCK_HOTPLUG;
diff --git a/mm/memblock.c b/mm/memblock.c
index ae6a5af46bd7..9cf99f32279f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1359,6 +1359,16 @@ void __init_memblock __next_mem_range_rev(u64 *idx, int nid,
 	*idx = ULLONG_MAX;
 }
 
+__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator)
+{
+	int index = iterator & 0xffffffff;
+
+	if (index < 0 || index >= memblock.memory.cnt)
+		return NULL;
+
+	return &memblock.memory.regions[index];
+}
+
 /*
  * Common iterator interface used to define for_each_mem_pfn_range().
  */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index cec7bb758bdd..96b25895ffbe 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1996,7 +1996,7 @@ unsigned long __init node_map_pfn_alignment(void)
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 
 static void __init deferred_free_pages(unsigned long pfn,
-				       unsigned long nr_pages)
+				       unsigned long nr_pages, enum migratetype mt)
 {
 	struct page *page;
 	unsigned long i;
@@ -2009,8 +2009,7 @@ static void __init deferred_free_pages(unsigned long pfn,
 	/* Free a large naturally-aligned chunk if possible */
 	if (nr_pages == MAX_ORDER_NR_PAGES && IS_MAX_ORDER_ALIGNED(pfn)) {
 		for (i = 0; i < nr_pages; i += pageblock_nr_pages)
-			init_pageblock_migratetype(page + i, MIGRATE_MOVABLE,
-						   false);
+			init_pageblock_migratetype(page + i, mt, false);
 		__free_pages_core(page, MAX_PAGE_ORDER, MEMINIT_EARLY);
 		return;
 	}
@@ -2020,8 +2019,7 @@ static void __init deferred_free_pages(unsigned long pfn,
 
 	for (i = 0; i < nr_pages; i++, page++, pfn++) {
 		if (pageblock_aligned(pfn))
-			init_pageblock_migratetype(page, MIGRATE_MOVABLE,
-						   false);
+			init_pageblock_migratetype(page, mt, false);
 		__free_pages_core(page, 0, MEMINIT_EARLY);
 	}
 }
@@ -2077,6 +2075,8 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
 	u64 i = 0;
 
 	for_each_free_mem_range(i, nid, 0, &start, &end, NULL) {
+		struct memblock_region *region = memblock_region_from_iter(i);
+		enum migratetype mt = MIGRATE_MOVABLE;
 		unsigned long spfn = PFN_UP(start);
 		unsigned long epfn = PFN_DOWN(end);
 
@@ -2086,12 +2086,15 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
 		spfn = max(spfn, start_pfn);
 		epfn = min(epfn, end_pfn);
 
+		if (memblock_is_kho_scratch(region))
+			mt = MIGRATE_CMA;
+
 		while (spfn < epfn) {
 			unsigned long mo_pfn = ALIGN(spfn + 1, MAX_ORDER_NR_PAGES);
 			unsigned long chunk_end = min(mo_pfn, epfn);
 
 			nr_pages += deferred_init_pages(zone, spfn, chunk_end);
-			deferred_free_pages(spfn, chunk_end - spfn);
+			deferred_free_pages(spfn, chunk_end - spfn, mt);
 
 			spfn = chunk_end;

-- 
Sincerely yours,
Mike.