* [PATCH] kho: initialize tail pages for higher order folios properly
@ 2025-06-05 17:11 Pratyush Yadav
2025-06-05 20:13 ` Andrew Morton
2025-06-06 8:04 ` Mike Rapoport
0 siblings, 2 replies; 18+ messages in thread
From: Pratyush Yadav @ 2025-06-05 17:11 UTC (permalink / raw)
To: Alexander Graf, Mike Rapoport, Changyuan Lyu, Pasha Tatashin,
Andrew Morton, Baoquan He
Cc: Pratyush Yadav, kexec, linux-kernel, linux-mm
From: Pratyush Yadav <ptyadav@amazon.de>
Currently, when restoring higher order folios, kho_restore_folio() only
calls prep_compound_page() on all the pages. That is not enough to
properly initialize the folios. The managed page count does not
get updated, the reserved flag does not get dropped, and page count does
not get initialized properly.
Restoring a higher order folio with it results in the following BUG with
CONFIG_DEBUG_VM when attempting to free the folio:
BUG: Bad page state in process test pfn:104e2b
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x104e2b
flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 002fffff80000000 0000000000000000 00000000ffffffff 0000000000000000
raw: ffffffffffffffff 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: nonzero _refcount
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x4b/0x70
bad_page.cold+0x97/0xb2
__free_frozen_pages+0x616/0x850
[...]
Combine the path for 0-order and higher order folios, initialize the
tail pages with a count of zero, and call adjust_managed_page_count() to
account for all the pages instead of just missing them.
In addition, since all the KHO-preserved pages get marked with
MEMBLOCK_RSRV_NOINIT by deserialize_bitmap(), the reserved flag is not
actually set (as can also be seen from the flags of the dumped page in
the logs above). So drop the ClearPageReserved() calls.
Fixes: fc33e4b44b271 ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
---
Side note: get_maintainers.pl for KHO only lists kexec@ as the mailing list.
Since KHO has a bunch of MM bits as well, should we also add linux-mm@ to its
MAINTAINERS entry?
Adding linux-mm@ to this patch at least, in case MM people have an opinion on
this.
kernel/kexec_handover.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index eb305e7e61296..5214ab27d1f8d 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
}
/* almost as free_reserved_page(), just don't free the page */
-static void kho_restore_page(struct page *page)
+static void kho_restore_page(struct page *page, unsigned int order)
{
- ClearPageReserved(page);
- init_page_count(page);
- adjust_managed_page_count(page, 1);
+ unsigned int i, nr_pages = (1 << order);
+
+ /* Head page gets refcount of 1. */
+ set_page_count(page, 1);
+
+ /* For higher order folios, tail pages get a page count of zero. */
+ for (i = 1; i < nr_pages; i++)
+ set_page_count(page + i, 0);
+
+ if (order > 0)
+ prep_compound_page(page, order);
+
+ adjust_managed_page_count(page, nr_pages);
}
/**
@@ -179,15 +189,10 @@ struct folio *kho_restore_folio(phys_addr_t phys)
return NULL;
order = page->private;
- if (order) {
- if (order > MAX_PAGE_ORDER)
- return NULL;
-
- prep_compound_page(page, order);
- } else {
- kho_restore_page(page);
- }
+ if (order > MAX_PAGE_ORDER)
+ return NULL;
+ kho_restore_page(page, order);
return page_folio(page);
}
EXPORT_SYMBOL_GPL(kho_restore_folio);
--
2.47.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-05 17:11 [PATCH] kho: initialize tail pages for higher order folios properly Pratyush Yadav
@ 2025-06-05 20:13 ` Andrew Morton
2025-06-06 8:04 ` Mike Rapoport
1 sibling, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2025-06-05 20:13 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Alexander Graf, Mike Rapoport, Changyuan Lyu, Pasha Tatashin,
Baoquan He, Pratyush Yadav, kexec, linux-kernel, linux-mm
On Thu, 5 Jun 2025 19:11:41 +0200 Pratyush Yadav <pratyush@kernel.org> wrote:
> Side note: get_maintainers.pl for KHO only lists kexec@ as the mailing list.
> Since KHO has a bunch of MM bits as well, should we also add linux-mm@ to its
> MAINTAINERS entry?
I think so. Please send a patch?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-05 17:11 [PATCH] kho: initialize tail pages for higher order folios properly Pratyush Yadav
2025-06-05 20:13 ` Andrew Morton
@ 2025-06-06 8:04 ` Mike Rapoport
2025-06-06 16:23 ` Pratyush Yadav
1 sibling, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2025-06-06 8:04 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Alexander Graf, Changyuan Lyu, Pasha Tatashin, Andrew Morton,
Baoquan He, Pratyush Yadav, kexec, linux-kernel, linux-mm
On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
>
> Currently, when restoring higher order folios, kho_restore_folio() only
> calls prep_compound_page() on all the pages. That is not enough to
> properly initialize the folios. The managed page count does not
> get updated, the reserved flag does not get dropped, and page count does
> not get initialized properly.
>
> Restoring a higher order folio with it results in the following BUG with
> CONFIG_DEBUG_VM when attempting to free the folio:
>
> BUG: Bad page state in process test pfn:104e2b
> page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x104e2b
> flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
> raw: 002fffff80000000 0000000000000000 00000000ffffffff 0000000000000000
> raw: ffffffffffffffff 0000000000000000 00000001ffffffff 0000000000000000
> page dumped because: nonzero _refcount
> [...]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x4b/0x70
> bad_page.cold+0x97/0xb2
> __free_frozen_pages+0x616/0x850
> [...]
>
> Combine the path for 0-order and higher order folios, initialize the
> tail pages with a count of zero, and call adjust_managed_page_count() to
> account for all the pages instead of just missing them.
>
> In addition, since all the KHO-preserved pages get marked with
> MEMBLOCK_RSRV_NOINIT by deserialize_bitmap(), the reserved flag is not
> actually set (as can also be seen from the flags of the dumped page in
> the logs above). So drop the ClearPageReserved() calls.
>
> Fixes: fc33e4b44b271 ("kexec: enable KHO support for memory preservation")
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> ---
>
> Side note: get_maintainers.pl for KHO only lists kexec@ as the mailing list.
> Since KHO has a bunch of MM bits as well, should we also add linux-mm@ to its
> MAINTAINERS entry?
>
> Adding linux-mm@ to this patch at least, in case MM people have an opinion on
> this.
>
> kernel/kexec_handover.c | 29 +++++++++++++++++------------
> 1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
> index eb305e7e61296..5214ab27d1f8d 100644
> --- a/kernel/kexec_handover.c
> +++ b/kernel/kexec_handover.c
> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
> }
>
> /* almost as free_reserved_page(), just don't free the page */
> -static void kho_restore_page(struct page *page)
> +static void kho_restore_page(struct page *page, unsigned int order)
> {
> - ClearPageReserved(page);
So now we don't clear PG_Reserved even on order-0 pages? ;-)
> - init_page_count(page);
> - adjust_managed_page_count(page, 1);
> + unsigned int i, nr_pages = (1 << order);
Can you please declare 'i' inside the loop, looks nicer IMHO.
> +
> + /* Head page gets refcount of 1. */
> + set_page_count(page, 1);
ClearPageReserved(page) here?
> +
> + /* For higher order folios, tail pages get a page count of zero. */
> + for (i = 1; i < nr_pages; i++)
> + set_page_count(page + i, 0);
and here?
> +
> + if (order > 0)
> + prep_compound_page(page, order);
> +
> + adjust_managed_page_count(page, nr_pages);
> }
>
> /**
> @@ -179,15 +189,10 @@ struct folio *kho_restore_folio(phys_addr_t phys)
> return NULL;
>
> order = page->private;
> - if (order) {
> - if (order > MAX_PAGE_ORDER)
> - return NULL;
> -
> - prep_compound_page(page, order);
> - } else {
> - kho_restore_page(page);
> - }
> + if (order > MAX_PAGE_ORDER)
> + return NULL;
>
> + kho_restore_page(page, order);
> return page_folio(page);
> }
> EXPORT_SYMBOL_GPL(kho_restore_folio);
> --
> 2.47.1
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-06 8:04 ` Mike Rapoport
@ 2025-06-06 16:23 ` Pratyush Yadav
2025-06-09 19:36 ` Mike Rapoport
0 siblings, 1 reply; 18+ messages in thread
From: Pratyush Yadav @ 2025-06-06 16:23 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pratyush Yadav, Alexander Graf, Changyuan Lyu, Pasha Tatashin,
Andrew Morton, Baoquan He, kexec, linux-kernel, linux-mm
Hi Mike,
On Fri, Jun 06 2025, Mike Rapoport wrote:
> On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
>> From: Pratyush Yadav <ptyadav@amazon.de>
>>
>> Currently, when restoring higher order folios, kho_restore_folio() only
>> calls prep_compound_page() on all the pages. That is not enough to
>> properly initialize the folios. The managed page count does not
>> get updated, the reserved flag does not get dropped, and page count does
>> not get initialized properly.
>>
>> Restoring a higher order folio with it results in the following BUG with
>> CONFIG_DEBUG_VM when attempting to free the folio:
>>
>> BUG: Bad page state in process test pfn:104e2b
>> page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x104e2b
>> flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
>> raw: 002fffff80000000 0000000000000000 00000000ffffffff 0000000000000000
>> raw: ffffffffffffffff 0000000000000000 00000001ffffffff 0000000000000000
>> page dumped because: nonzero _refcount
>> [...]
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x4b/0x70
>> bad_page.cold+0x97/0xb2
>> __free_frozen_pages+0x616/0x850
>> [...]
>>
>> Combine the path for 0-order and higher order folios, initialize the
>> tail pages with a count of zero, and call adjust_managed_page_count() to
>> account for all the pages instead of just missing them.
>>
>> In addition, since all the KHO-preserved pages get marked with
>> MEMBLOCK_RSRV_NOINIT by deserialize_bitmap(), the reserved flag is not
>> actually set (as can also be seen from the flags of the dumped page in
>> the logs above). So drop the ClearPageReserved() calls.
>>
>> Fixes: fc33e4b44b271 ("kexec: enable KHO support for memory preservation")
>> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
>> ---
>>
>> Side note: get_maintainers.pl for KHO only lists kexec@ as the mailing list.
>> Since KHO has a bunch of MM bits as well, should we also add linux-mm@ to its
>> MAINTAINERS entry?
>>
>> Adding linux-mm@ to this patch at least, in case MM people have an opinion on
>> this.
>>
>> kernel/kexec_handover.c | 29 +++++++++++++++++------------
>> 1 file changed, 17 insertions(+), 12 deletions(-)
>>
>> diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
>> index eb305e7e61296..5214ab27d1f8d 100644
>> --- a/kernel/kexec_handover.c
>> +++ b/kernel/kexec_handover.c
>> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
>> }
>>
>> /* almost as free_reserved_page(), just don't free the page */
>> -static void kho_restore_page(struct page *page)
>> +static void kho_restore_page(struct page *page, unsigned int order)
>> {
>> - ClearPageReserved(page);
>
> So now we don't clear PG_Reserved even on order-0 pages? ;-)
We don't need to. As I mentioned in the commit message as well,
PG_Reserved is never set for KHO pages since they are reserved with
MEMBLOCK_RSRV_NOINIT, so memmap_init_reserved_pages() skips over them.
To double-check, I added some quick prints to kho_restore_page():
/* Head page gets refcount of 1. */
set_page_count(page, 1);
printk("Head page flags: 0x%lx reserved: %d\n",
page->flags, PageReserved(page));
/* For higher order folios, tail pages get a page count of zero. */
for (i = 1; i < nr_pages; i++) {
printk("Tail page %u flags: 0x%lx reserved: %d\n",
i, (page+i)->flags, PageReserved(page+i));
set_page_count(page + i, 0);
}
And this is what I get:
[ 9.003187] Head page flags: 0x2fffff80000000 reserved: 0
[ 9.003730] Tail page 1 flags: 0x2fffff80000000 reserved: 0
[ 9.004229] Tail page 2 flags: 0x2fffff80000000 reserved: 0
[ 9.004740] Tail page 3 flags: 0x2fffff80000000 reserved: 0
[ 9.005265] Head page flags: 0x2fffff80000000 reserved: 0
[ 9.005759] Head page flags: 0x2fffff80000000 reserved: 0
[...]
So PG_Reserved is never set.
That said, while reading through some of the code, I noticed another
bug: because KHO reserves the preserved pages as NOINIT, with
CONFIG_DEFERRED_STRUCT_PAGE_INIT == n, all the pages get initialized
when memmap_init_range() is called from setup_arch (paging_init() on
x86). This happens before kho_memory_init(), so the KHO-preserved pages
are not marked as reserved to memblock yet.
With deferred page init, some pages might not get initialized early, and
get initialized after kho_memory_init(), by which time the KHO-preserved
pages are marked as reserved. So, deferred_init_maxorder() will skip
over those pages and leave them uninitialized.
And sure enough, doing the same with CONFIG_DEFERRED_STRUCT_PAGE_INIT ==
y results in:
[ 10.060842] Head page flags: 0x2fffff80000000 reserved: 0
[ 10.061387] Tail page 1 flags: 0x2fffff80000000 reserved: 0
[ 10.061902] Tail page 2 flags: 0x2fffff80000000 reserved: 0
[ 10.062400] Tail page 3 flags: 0x2fffff80000000 reserved: 0
[ 10.062924] page:00000000fb3dca54 is uninitialized and poisoned
[ 10.063494] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
[ 10.064190] ------------[ cut here ]------------
[ 10.064636] kernel BUG at ./include/linux/page-flags.h:571!
[ 10.065194] Oops: invalid opcode: 0000 [#1] SMP PTI
[ 10.065661] CPU: 2 UID: 0 PID: 1954 Comm: test Not tainted 6.15.0+ #297 PREEMPT(undef)
[ 10.066449] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 10.067353] RIP: 0010:kho_restore_folio+0x4e/0x70
[...]
So we need to either also call init_deferred_page(), or remove the
memblock_reserved_mark_noinit() call in deserialize_bitmap(). And TBH, I
am not sure why KHO pages even need to be marked noinit in the first
place. Probably the only benefit would be if a large chunk of memory is
KHO-preserved, the pages can be initialized later on-demand, reducing
bootup time a bit.
What do you think? Should we drop noinit or call init_deferred_page()?
FWIW, my preference is to drop noinit, since init_deferred_page() is
__meminit and we would have to make sure it doesn't go away after boot.
>
>> - init_page_count(page);
>> - adjust_managed_page_count(page, 1);
>> + unsigned int i, nr_pages = (1 << order);
>
> Can you please declare 'i' inside the loop, looks nicer IMHO.
Ok, will do.
>
>> +
>> + /* Head page gets refcount of 1. */
>> + set_page_count(page, 1);
>
> ClearPageReserved(page) here?
>
>> +
>> + /* For higher order folios, tail pages get a page count of zero. */
>> + for (i = 1; i < nr_pages; i++)
>> + set_page_count(page + i, 0);
>
> and here?
>
>> +
>> + if (order > 0)
>> + prep_compound_page(page, order);
>> +
>> + adjust_managed_page_count(page, nr_pages);
>> }
>>
>> /**
[...]
--
Regards,
Pratyush Yadav
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-06 16:23 ` Pratyush Yadav
@ 2025-06-09 19:36 ` Mike Rapoport
2025-06-09 20:07 ` Pasha Tatashin
0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2025-06-09 19:36 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Alexander Graf, Changyuan Lyu, Pasha Tatashin, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm
Hi Pratyush,
On Fri, Jun 06, 2025 at 06:23:06PM +0200, Pratyush Yadav wrote:
> Hi Mike,
>
> On Fri, Jun 06 2025, Mike Rapoport wrote:
>
> > On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
> >> From: Pratyush Yadav <ptyadav@amazon.de>
> >>
> >> --- a/kernel/kexec_handover.c
> >> +++ b/kernel/kexec_handover.c
> >> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
> >> }
> >>
> >> /* almost as free_reserved_page(), just don't free the page */
> >> -static void kho_restore_page(struct page *page)
> >> +static void kho_restore_page(struct page *page, unsigned int order)
> >> {
> >> - ClearPageReserved(page);
> >
> > So now we don't clear PG_Reserved even on order-0 pages? ;-)
>
> We don't need to. As I mentioned in the commit message as well,
> PG_Reserved is never set for KHO pages since they are reserved with
> MEMBLOCK_RSRV_NOINIT, so memmap_init_reserved_pages() skips over them.
You are right, I missed it.
> That said, while reading through some of the code, I noticed another
> bug: because KHO reserves the preserved pages as NOINIT, with
> CONFIG_DEFERRED_STRUCT_PAGE_INIT == n, all the pages get initialized
> when memmap_init_range() is called from setup_arch (paging_init() on
> x86). This happens before kho_memory_init(), so the KHO-preserved pages
> are not marked as reserved to memblock yet.
>
> With deferred page init, some pages might not get initialized early, and
> get initialized after kho_memory_init(), by which time the KHO-preserved
> pages are marked as reserved. So, deferred_init_maxorder() will skip
> over those pages and leave them uninitialized.
>
> So we need to either also call init_deferred_page(), or remove the
> memblock_reserved_mark_noinit() call in deserialize_bitmap(). And TBH, I
> am not sure why KHO pages even need to be marked noinit in the first
> place. Probably the only benefit would be if a large chunk of memory is
> KHO-preserved, the pages can be initialized later on-demand, reducing
> bootup time a bit.
One benefit is performance indeed, because in not deferred case the
initialization of reserved pages in memmap_init_reserved_pages() is really
excessive.
But more importantly, if we remove memblock_reserved_mark_noinit(), with
CONFIG_DEFERRED_STRUCT_PAGE_INIT we'd loose page->private because the
struct page will be cleared after kho_mem_deserialize().
> What do you think? Should we drop noinit or call init_deferred_page()?
> FWIW, my preference is to drop noinit, since init_deferred_page() is
> __meminit and we would have to make sure it doesn't go away after boot.
We can't drop noinit and calling init_deferred_page() after boot just won't
work because it uses memblock to find the page's node and memblock is gone
after init.
The simplest short-term solution is to disable KHO when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set and then find an efficient way to
make it all work together.
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-09 19:36 ` Mike Rapoport
@ 2025-06-09 20:07 ` Pasha Tatashin
2025-06-10 5:44 ` Mike Rapoport
0 siblings, 1 reply; 18+ messages in thread
From: Pasha Tatashin @ 2025-06-09 20:07 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pratyush Yadav, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm
On Mon, Jun 9, 2025 at 3:36 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> Hi Pratyush,
>
> On Fri, Jun 06, 2025 at 06:23:06PM +0200, Pratyush Yadav wrote:
> > Hi Mike,
> >
> > On Fri, Jun 06 2025, Mike Rapoport wrote:
> >
> > > On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
> > >> From: Pratyush Yadav <ptyadav@amazon.de>
> > >>
> > >> --- a/kernel/kexec_handover.c
> > >> +++ b/kernel/kexec_handover.c
> > >> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
> > >> }
> > >>
> > >> /* almost as free_reserved_page(), just don't free the page */
> > >> -static void kho_restore_page(struct page *page)
> > >> +static void kho_restore_page(struct page *page, unsigned int order)
> > >> {
> > >> - ClearPageReserved(page);
> > >
> > > So now we don't clear PG_Reserved even on order-0 pages? ;-)
> >
> > We don't need to. As I mentioned in the commit message as well,
> > PG_Reserved is never set for KHO pages since they are reserved with
> > MEMBLOCK_RSRV_NOINIT, so memmap_init_reserved_pages() skips over them.
>
> You are right, I missed it.
>
> > That said, while reading through some of the code, I noticed another
> > bug: because KHO reserves the preserved pages as NOINIT, with
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT == n, all the pages get initialized
> > when memmap_init_range() is called from setup_arch (paging_init() on
> > x86). This happens before kho_memory_init(), so the KHO-preserved pages
> > are not marked as reserved to memblock yet.
> >
> > With deferred page init, some pages might not get initialized early, and
> > get initialized after kho_memory_init(), by which time the KHO-preserved
> > pages are marked as reserved. So, deferred_init_maxorder() will skip
> > over those pages and leave them uninitialized.
> >
> > So we need to either also call init_deferred_page(), or remove the
> > memblock_reserved_mark_noinit() call in deserialize_bitmap(). And TBH, I
> > am not sure why KHO pages even need to be marked noinit in the first
> > place. Probably the only benefit would be if a large chunk of memory is
> > KHO-preserved, the pages can be initialized later on-demand, reducing
> > bootup time a bit.
>
> One benefit is performance indeed, because in not deferred case the
> initialization of reserved pages in memmap_init_reserved_pages() is really
> excessive.
>
> But more importantly, if we remove memblock_reserved_mark_noinit(), with
> CONFIG_DEFERRED_STRUCT_PAGE_INIT we'd loose page->private because the
> struct page will be cleared after kho_mem_deserialize().
>
> > What do you think? Should we drop noinit or call init_deferred_page()?
> > FWIW, my preference is to drop noinit, since init_deferred_page() is
> > __meminit and we would have to make sure it doesn't go away after boot.
>
> We can't drop noinit and calling init_deferred_page() after boot just won't
> work because it uses memblock to find the page's node and memblock is gone
> after init.
>
> The simplest short-term solution is to disable KHO when
> CONFIG_DEFERRED_STRUCT_PAGE_INIT is set and then find an efficient way to
> make it all work together.
This is what I've done in LUOv3 WIP:
https://github.com/soleen/linux/commit/3059f38ac0a39a397873759fb429bd5d1f8ea681
We will need to teah KHO to work with deferred struct page init. I
suspect, we could init preserved struct pages and then skip over them
during deferred init.
Pasha
>
> --
> Sincerely yours,
> Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-09 20:07 ` Pasha Tatashin
@ 2025-06-10 5:44 ` Mike Rapoport
2025-06-10 11:20 ` Pasha Tatashin
0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2025-06-10 5:44 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm
On Mon, Jun 09, 2025 at 04:07:50PM -0400, Pasha Tatashin wrote:
> On Mon, Jun 9, 2025 at 3:36 PM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > Hi Pratyush,
> >
> > On Fri, Jun 06, 2025 at 06:23:06PM +0200, Pratyush Yadav wrote:
> > > Hi Mike,
> > >
> > > On Fri, Jun 06 2025, Mike Rapoport wrote:
> > >
> > > > On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
> > > >> From: Pratyush Yadav <ptyadav@amazon.de>
> > > >>
> > > >> --- a/kernel/kexec_handover.c
> > > >> +++ b/kernel/kexec_handover.c
> > > >> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
> > > >> }
> > > >>
> > > >> /* almost as free_reserved_page(), just don't free the page */
> > > >> -static void kho_restore_page(struct page *page)
> > > >> +static void kho_restore_page(struct page *page, unsigned int order)
> > > >> {
> > > >> - ClearPageReserved(page);
> > > >
> > > > So now we don't clear PG_Reserved even on order-0 pages? ;-)
> > >
> > > We don't need to. As I mentioned in the commit message as well,
> > > PG_Reserved is never set for KHO pages since they are reserved with
> > > MEMBLOCK_RSRV_NOINIT, so memmap_init_reserved_pages() skips over them.
> >
> > You are right, I missed it.
> >
> > > That said, while reading through some of the code, I noticed another
> > > bug: because KHO reserves the preserved pages as NOINIT, with
> > > CONFIG_DEFERRED_STRUCT_PAGE_INIT == n, all the pages get initialized
> > > when memmap_init_range() is called from setup_arch (paging_init() on
> > > x86). This happens before kho_memory_init(), so the KHO-preserved pages
> > > are not marked as reserved to memblock yet.
> > >
> > > With deferred page init, some pages might not get initialized early, and
> > > get initialized after kho_memory_init(), by which time the KHO-preserved
> > > pages are marked as reserved. So, deferred_init_maxorder() will skip
> > > over those pages and leave them uninitialized.
> > >
> > > So we need to either also call init_deferred_page(), or remove the
> > > memblock_reserved_mark_noinit() call in deserialize_bitmap(). And TBH, I
> > > am not sure why KHO pages even need to be marked noinit in the first
> > > place. Probably the only benefit would be if a large chunk of memory is
> > > KHO-preserved, the pages can be initialized later on-demand, reducing
> > > bootup time a bit.
> >
> > One benefit is performance indeed, because in not deferred case the
> > initialization of reserved pages in memmap_init_reserved_pages() is really
> > excessive.
> >
> > But more importantly, if we remove memblock_reserved_mark_noinit(), with
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT we'd loose page->private because the
> > struct page will be cleared after kho_mem_deserialize().
> >
> > > What do you think? Should we drop noinit or call init_deferred_page()?
> > > FWIW, my preference is to drop noinit, since init_deferred_page() is
> > > __meminit and we would have to make sure it doesn't go away after boot.
> >
> > We can't drop noinit and calling init_deferred_page() after boot just won't
> > work because it uses memblock to find the page's node and memblock is gone
> > after init.
> >
> > The simplest short-term solution is to disable KHO when
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT is set and then find an efficient way to
> > make it all work together.
>
> This is what I've done in LUOv3 WIP:
> https://github.com/soleen/linux/commit/3059f38ac0a39a397873759fb429bd5d1f8ea681
I think it should be the other way around, KHO should depend on
!DEFERRED_STRUCT_PAGE_INIT.
> We will need to teah KHO to work with deferred struct page init. I
> suspect, we could init preserved struct pages and then skip over them
> during deferred init.
We could, but with that would mean we'll run this before SMP and it's not
desirable. Also, init_deferred_page() for a random page requires
finding its node with early_pfn_to_nid() that's also suboptimal.
> Pasha
>
> >
> > --
> > Sincerely yours,
> > Mike.
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-10 5:44 ` Mike Rapoport
@ 2025-06-10 11:20 ` Pasha Tatashin
2025-06-10 16:41 ` Mike Rapoport
0 siblings, 1 reply; 18+ messages in thread
From: Pasha Tatashin @ 2025-06-10 11:20 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pratyush Yadav, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm
On Tue, Jun 10, 2025 at 1:44 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Jun 09, 2025 at 04:07:50PM -0400, Pasha Tatashin wrote:
> > On Mon, Jun 9, 2025 at 3:36 PM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > Hi Pratyush,
> > >
> > > On Fri, Jun 06, 2025 at 06:23:06PM +0200, Pratyush Yadav wrote:
> > > > Hi Mike,
> > > >
> > > > On Fri, Jun 06 2025, Mike Rapoport wrote:
> > > >
> > > > > On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
> > > > >> From: Pratyush Yadav <ptyadav@amazon.de>
> > > > >>
> > > > >> --- a/kernel/kexec_handover.c
> > > > >> +++ b/kernel/kexec_handover.c
> > > > >> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
> > > > >> }
> > > > >>
> > > > >> /* almost as free_reserved_page(), just don't free the page */
> > > > >> -static void kho_restore_page(struct page *page)
> > > > >> +static void kho_restore_page(struct page *page, unsigned int order)
> > > > >> {
> > > > >> - ClearPageReserved(page);
> > > > >
> > > > > So now we don't clear PG_Reserved even on order-0 pages? ;-)
> > > >
> > > > We don't need to. As I mentioned in the commit message as well,
> > > > PG_Reserved is never set for KHO pages since they are reserved with
> > > > MEMBLOCK_RSRV_NOINIT, so memmap_init_reserved_pages() skips over them.
> > >
> > > You are right, I missed it.
> > >
> > > > That said, while reading through some of the code, I noticed another
> > > > bug: because KHO reserves the preserved pages as NOINIT, with
> > > > CONFIG_DEFERRED_STRUCT_PAGE_INIT == n, all the pages get initialized
> > > > when memmap_init_range() is called from setup_arch (paging_init() on
> > > > x86). This happens before kho_memory_init(), so the KHO-preserved pages
> > > > are not marked as reserved to memblock yet.
> > > >
> > > > With deferred page init, some pages might not get initialized early, and
> > > > get initialized after kho_memory_init(), by which time the KHO-preserved
> > > > pages are marked as reserved. So, deferred_init_maxorder() will skip
> > > > over those pages and leave them uninitialized.
> > > >
> > > > So we need to either also call init_deferred_page(), or remove the
> > > > memblock_reserved_mark_noinit() call in deserialize_bitmap(). And TBH, I
> > > > am not sure why KHO pages even need to be marked noinit in the first
> > > > place. Probably the only benefit would be if a large chunk of memory is
> > > > KHO-preserved, the pages can be initialized later on-demand, reducing
> > > > bootup time a bit.
> > >
> > > One benefit is performance indeed, because in not deferred case the
> > > initialization of reserved pages in memmap_init_reserved_pages() is really
> > > excessive.
> > >
> > > But more importantly, if we remove memblock_reserved_mark_noinit(), with
> > > CONFIG_DEFERRED_STRUCT_PAGE_INIT we'd loose page->private because the
> > > struct page will be cleared after kho_mem_deserialize().
> > >
> > > > What do you think? Should we drop noinit or call init_deferred_page()?
> > > > FWIW, my preference is to drop noinit, since init_deferred_page() is
> > > > __meminit and we would have to make sure it doesn't go away after boot.
> > >
> > > We can't drop noinit and calling init_deferred_page() after boot just won't
> > > work because it uses memblock to find the page's node and memblock is gone
> > > after init.
> > >
> > > The simplest short-term solution is to disable KHO when
> > > CONFIG_DEFERRED_STRUCT_PAGE_INIT is set and then find an efficient way to
> > > make it all work together.
> >
> > This is what I've done in LUOv3 WIP:
> > https://github.com/soleen/linux/commit/3059f38ac0a39a397873759fb429bd5d1f8ea681
>
> I think it should be the other way around, KHO should depend on
> !DEFERRED_STRUCT_PAGE_INIT.
Agreed, and this is what I first tried, but that does not work, there
is some circular dependency breaking the build. If you feel
adventurous you can try that :-)
> > We will need to teah KHO to work with deferred struct page init. I
> > suspect, we could init preserved struct pages and then skip over them
> > during deferred init.
>
> We could, but with that would mean we'll run this before SMP and it's not
> desirable. Also, init_deferred_page() for a random page requires
We already run KHO init before smp_init:
start_kernel() -> mm_core_init() -> kho_memory_init() ->
kho_restore_folio() -> struct pages must be already initialized here!
While deferred struct pages are initialized:
start_kernel() -> rest_init() -> kernel_init() ->
kernel_init_freeable() -> page_alloc_init_late() ->
deferred_init_memmap()
If the number of preserved pages that is needed during early boot is
relatively small, that it should not be an issue to pre-initialize
struct pages for them before deferred struct pages are initialized. We
already pre-initialize some "struct pages" that are needed during
early boot before the reset are initialized, see deferred_grow_zone()
Pasha
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-10 11:20 ` Pasha Tatashin
@ 2025-06-10 16:41 ` Mike Rapoport
2025-06-10 22:33 ` Pasha Tatashin
0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2025-06-10 16:41 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm
On Tue, Jun 10, 2025 at 07:20:23AM -0400, Pasha Tatashin wrote:
> On Tue, Jun 10, 2025 at 1:44 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Mon, Jun 09, 2025 at 04:07:50PM -0400, Pasha Tatashin wrote:
> > > On Mon, Jun 9, 2025 at 3:36 PM Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > Hi Pratyush,
> > > >
> > > > On Fri, Jun 06, 2025 at 06:23:06PM +0200, Pratyush Yadav wrote:
> > > > > Hi Mike,
> > > > >
> > > > > On Fri, Jun 06 2025, Mike Rapoport wrote:
> > > > >
> > > > > > On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
> > > > > >> From: Pratyush Yadav <ptyadav@amazon.de>
> > > > > >>
> > > > > >> --- a/kernel/kexec_handover.c
> > > > > >> +++ b/kernel/kexec_handover.c
> > > > > >> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
> > > > > >> }
> > > > > >>
> > > > > >> /* almost as free_reserved_page(), just don't free the page */
> > > > > >> -static void kho_restore_page(struct page *page)
> > > > > >> +static void kho_restore_page(struct page *page, unsigned int order)
> > > > > >> {
> > > > > >> - ClearPageReserved(page);
> > > > > >
> > > > > > So now we don't clear PG_Reserved even on order-0 pages? ;-)
> > > > >
> > > > > We don't need to. As I mentioned in the commit message as well,
> > > > > PG_Reserved is never set for KHO pages since they are reserved with
> > > > > MEMBLOCK_RSRV_NOINIT, so memmap_init_reserved_pages() skips over them.
> > > >
> > > > You are right, I missed it.
> > > >
> > > > > That said, while reading through some of the code, I noticed another
> > > > > bug: because KHO reserves the preserved pages as NOINIT, with
> > > > > CONFIG_DEFERRED_STRUCT_PAGE_INIT == n, all the pages get initialized
> > > > > when memmap_init_range() is called from setup_arch (paging_init() on
> > > > > x86). This happens before kho_memory_init(), so the KHO-preserved pages
> > > > > are not marked as reserved to memblock yet.
> > > > >
> > > > > With deferred page init, some pages might not get initialized early, and
> > > > > get initialized after kho_memory_init(), by which time the KHO-preserved
> > > > > pages are marked as reserved. So, deferred_init_maxorder() will skip
> > > > > over those pages and leave them uninitialized.
> > > > >
> > > > > So we need to either also call init_deferred_page(), or remove the
> > > > > memblock_reserved_mark_noinit() call in deserialize_bitmap(). And TBH, I
> > > > > am not sure why KHO pages even need to be marked noinit in the first
> > > > > place. Probably the only benefit would be if a large chunk of memory is
> > > > > KHO-preserved, the pages can be initialized later on-demand, reducing
> > > > > bootup time a bit.
> > > >
> > > > One benefit is performance indeed, because in not deferred case the
> > > > initialization of reserved pages in memmap_init_reserved_pages() is really
> > > > excessive.
> > > >
> > > > But more importantly, if we remove memblock_reserved_mark_noinit(), with
> > > > CONFIG_DEFERRED_STRUCT_PAGE_INIT we'd loose page->private because the
> > > > struct page will be cleared after kho_mem_deserialize().
> > > >
> > > > > What do you think? Should we drop noinit or call init_deferred_page()?
> > > > > FWIW, my preference is to drop noinit, since init_deferred_page() is
> > > > > __meminit and we would have to make sure it doesn't go away after boot.
> > > >
> > > > We can't drop noinit and calling init_deferred_page() after boot just won't
> > > > work because it uses memblock to find the page's node and memblock is gone
> > > > after init.
> > > >
> > > > The simplest short-term solution is to disable KHO when
> > > > CONFIG_DEFERRED_STRUCT_PAGE_INIT is set and then find an efficient way to
> > > > make it all work together.
> > >
> > > This is what I've done in LUOv3 WIP:
> > > https://github.com/soleen/linux/commit/3059f38ac0a39a397873759fb429bd5d1f8ea681
> >
> > I think it should be the other way around, KHO should depend on
> > !DEFERRED_STRUCT_PAGE_INIT.
>
> Agreed, and this is what I first tried, but that does not work, there
> is some circular dependency breaking the build. If you feel
> adventurous you can try that :-)
Hmm, weird, worked for me :/
> > > We will need to teah KHO to work with deferred struct page init. I
> > > suspect, we could init preserved struct pages and then skip over them
> > > during deferred init.
> >
> > We could, but with that would mean we'll run this before SMP and it's not
> > desirable. Also, init_deferred_page() for a random page requires
>
> We already run KHO init before smp_init:
> start_kernel() -> mm_core_init() -> kho_memory_init() ->
> kho_restore_folio() -> struct pages must be already initialized here!
>
> While deferred struct pages are initialized:
> start_kernel() -> rest_init() -> kernel_init() ->
> kernel_init_freeable() -> page_alloc_init_late() ->
> deferred_init_memmap()
>
> If the number of preserved pages that is needed during early boot is
> relatively small, that it should not be an issue to pre-initialize
> struct pages for them before deferred struct pages are initialized. We
> already pre-initialize some "struct pages" that are needed during
> early boot before the reset are initialized, see deferred_grow_zone()
deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
with kho we are talking about some random pages. If we preinit them early,
deferred_init_memmap() will overwrite them.
Anyway, I'm going to look into it, hopefully I'll have something Really
Soon (tm).
> Pasha
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-10 16:41 ` Mike Rapoport
@ 2025-06-10 22:33 ` Pasha Tatashin
2025-06-11 13:06 ` Pratyush Yadav
0 siblings, 1 reply; 18+ messages in thread
From: Pasha Tatashin @ 2025-06-10 22:33 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pratyush Yadav, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm, Michal Clapinski
> > > I think it should be the other way around, KHO should depend on
> > > !DEFERRED_STRUCT_PAGE_INIT.
> >
> > Agreed, and this is what I first tried, but that does not work, there
> > is some circular dependency breaking the build. If you feel
> > adventurous you can try that :-)
>
> Hmm, weird, worked for me :/
I am super confused, it did not work for me over weekend, and now it
is working. Even `make menuconfig` would not work. Anyways, I will put
it in the appropriate place.
>
> > > > We will need to teah KHO to work with deferred struct page init. I
> > > > suspect, we could init preserved struct pages and then skip over them
> > > > during deferred init.
> > >
> > > We could, but with that would mean we'll run this before SMP and it's not
> > > desirable. Also, init_deferred_page() for a random page requires
> >
> > We already run KHO init before smp_init:
> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
> > kho_restore_folio() -> struct pages must be already initialized here!
> >
> > While deferred struct pages are initialized:
> > start_kernel() -> rest_init() -> kernel_init() ->
> > kernel_init_freeable() -> page_alloc_init_late() ->
> > deferred_init_memmap()
> >
> > If the number of preserved pages that is needed during early boot is
> > relatively small, that it should not be an issue to pre-initialize
> > struct pages for them before deferred struct pages are initialized. We
> > already pre-initialize some "struct pages" that are needed during
> > early boot before the reset are initialized, see deferred_grow_zone()
>
> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
> with kho we are talking about some random pages. If we preinit them early,
> deferred_init_memmap() will overwrite them.
Yes, this is why I am saying that we would need to skip the KHO
initialized "struct pages" somehow during deferred initialization. If
we create an ordered by PFN list of early-initialized KHO struct
pages, skipping during deferred initialization could be done
efficiently.
> Anyway, I'm going to look into it, hopefully I'll have something Really
> Soon (tm).
CC: Michal Clapinski, who was also planning to look into this problem.
Pasha
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-10 22:33 ` Pasha Tatashin
@ 2025-06-11 13:06 ` Pratyush Yadav
2025-06-11 13:14 ` Pasha Tatashin
0 siblings, 1 reply; 18+ messages in thread
From: Pratyush Yadav @ 2025-06-11 13:06 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Mike Rapoport, Pratyush Yadav, Alexander Graf, Changyuan Lyu,
Andrew Morton, Baoquan He, kexec, linux-kernel, linux-mm,
Michal Clapinski
On Tue, Jun 10 2025, Pasha Tatashin wrote:
>> > > I think it should be the other way around, KHO should depend on
>> > > !DEFERRED_STRUCT_PAGE_INIT.
>> >
>> > Agreed, and this is what I first tried, but that does not work, there
>> > is some circular dependency breaking the build. If you feel
>> > adventurous you can try that :-)
>>
>> Hmm, weird, worked for me :/
Worked for me as well.
>
> I am super confused, it did not work for me over weekend, and now it
> is working. Even `make menuconfig` would not work. Anyways, I will put
> it in the appropriate place.
>
>>
>> > > > We will need to teah KHO to work with deferred struct page init. I
>> > > > suspect, we could init preserved struct pages and then skip over them
>> > > > during deferred init.
>> > >
>> > > We could, but with that would mean we'll run this before SMP and it's not
>> > > desirable. Also, init_deferred_page() for a random page requires
>> >
>> > We already run KHO init before smp_init:
>> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
>> > kho_restore_folio() -> struct pages must be already initialized here!
>> >
>> > While deferred struct pages are initialized:
>> > start_kernel() -> rest_init() -> kernel_init() ->
>> > kernel_init_freeable() -> page_alloc_init_late() ->
>> > deferred_init_memmap()
>> >
>> > If the number of preserved pages that is needed during early boot is
>> > relatively small, that it should not be an issue to pre-initialize
>> > struct pages for them before deferred struct pages are initialized. We
>> > already pre-initialize some "struct pages" that are needed during
>> > early boot before the reset are initialized, see deferred_grow_zone()
>>
>> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
>> with kho we are talking about some random pages. If we preinit them early,
>> deferred_init_memmap() will overwrite them.
>
> Yes, this is why I am saying that we would need to skip the KHO
> initialized "struct pages" somehow during deferred initialization. If
> we create an ordered by PFN list of early-initialized KHO struct
> pages, skipping during deferred initialization could be done
> efficiently.
Or keep things simple and don't use any KHO struct pages during early
init. You can access the page itself, just don't use its struct page.
Currently the only user of kho_restore_folio() during init is
kho_memory_init(). The FDT is accessed by doing
phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
restoring the folio so early. It can be done later, for example when LUO
does the finish event, to clean up and free the folio.
--
Regards,
Pratyush Yadav
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-11 13:06 ` Pratyush Yadav
@ 2025-06-11 13:14 ` Pasha Tatashin
2025-06-11 13:35 ` Mike Rapoport
2025-06-11 13:38 ` Pratyush Yadav
0 siblings, 2 replies; 18+ messages in thread
From: Pasha Tatashin @ 2025-06-11 13:14 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Mike Rapoport, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm, Michal Clapinski
On Wed, Jun 11, 2025 at 9:06 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Tue, Jun 10 2025, Pasha Tatashin wrote:
>
> >> > > I think it should be the other way around, KHO should depend on
> >> > > !DEFERRED_STRUCT_PAGE_INIT.
> >> >
> >> > Agreed, and this is what I first tried, but that does not work, there
> >> > is some circular dependency breaking the build. If you feel
> >> > adventurous you can try that :-)
> >>
> >> Hmm, weird, worked for me :/
>
> Worked for me as well.
>
> >
> > I am super confused, it did not work for me over weekend, and now it
> > is working. Even `make menuconfig` would not work. Anyways, I will put
> > it in the appropriate place.
> >
> >>
> >> > > > We will need to teah KHO to work with deferred struct page init. I
> >> > > > suspect, we could init preserved struct pages and then skip over them
> >> > > > during deferred init.
> >> > >
> >> > > We could, but with that would mean we'll run this before SMP and it's not
> >> > > desirable. Also, init_deferred_page() for a random page requires
> >> >
> >> > We already run KHO init before smp_init:
> >> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
> >> > kho_restore_folio() -> struct pages must be already initialized here!
> >> >
> >> > While deferred struct pages are initialized:
> >> > start_kernel() -> rest_init() -> kernel_init() ->
> >> > kernel_init_freeable() -> page_alloc_init_late() ->
> >> > deferred_init_memmap()
> >> >
> >> > If the number of preserved pages that is needed during early boot is
> >> > relatively small, that it should not be an issue to pre-initialize
> >> > struct pages for them before deferred struct pages are initialized. We
> >> > already pre-initialize some "struct pages" that are needed during
> >> > early boot before the reset are initialized, see deferred_grow_zone()
> >>
> >> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
> >> with kho we are talking about some random pages. If we preinit them early,
> >> deferred_init_memmap() will overwrite them.
> >
> > Yes, this is why I am saying that we would need to skip the KHO
> > initialized "struct pages" somehow during deferred initialization. If
> > we create an ordered by PFN list of early-initialized KHO struct
> > pages, skipping during deferred initialization could be done
> > efficiently.
>
> Or keep things simple and don't use any KHO struct pages during early
> init. You can access the page itself, just don't use its struct page.
>
> Currently the only user of kho_restore_folio() during init is
> kho_memory_init(). The FDT is accessed by doing
> phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
> restoring the folio so early. It can be done later, for example when LUO
> does the finish event, to clean up and free the folio.
Good suggestion, however, KHO does not have any sophisticated users
that we are going to be adding as part of the live update work in the
future: IR, KVM, early VCPU threads, and so on. So, while today, this
might work, in the future, I am not sure if we should expect struct
pages are not accessed until after deferred initialization or simply
fix it once and for all.
Pasha
>
> --
> Regards,
> Pratyush Yadav
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-11 13:14 ` Pasha Tatashin
@ 2025-06-11 13:35 ` Mike Rapoport
2025-06-11 14:01 ` Pratyush Yadav
2025-06-11 13:38 ` Pratyush Yadav
1 sibling, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2025-06-11 13:35 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm, Michal Clapinski
On Wed, Jun 11, 2025 at 09:14:55AM -0400, Pasha Tatashin wrote:
> On Wed, Jun 11, 2025 at 9:06 AM Pratyush Yadav <pratyush@kernel.org> wrote:
> >
> > On Tue, Jun 10 2025, Pasha Tatashin wrote:
> >
> > >> > > I think it should be the other way around, KHO should depend on
> > >> > > !DEFERRED_STRUCT_PAGE_INIT.
> > >> >
> > >> > Agreed, and this is what I first tried, but that does not work, there
> > >> > is some circular dependency breaking the build. If you feel
> > >> > adventurous you can try that :-)
> > >>
> > >> Hmm, weird, worked for me :/
> >
> > Worked for me as well.
> >
> > >
> > > I am super confused, it did not work for me over weekend, and now it
> > > is working. Even `make menuconfig` would not work. Anyways, I will put
> > > it in the appropriate place.
> > >
> > >>
> > >> > > > We will need to teah KHO to work with deferred struct page init. I
> > >> > > > suspect, we could init preserved struct pages and then skip over them
> > >> > > > during deferred init.
> > >> > >
> > >> > > We could, but with that would mean we'll run this before SMP and it's not
> > >> > > desirable. Also, init_deferred_page() for a random page requires
> > >> >
> > >> > We already run KHO init before smp_init:
> > >> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
> > >> > kho_restore_folio() -> struct pages must be already initialized here!
> > >> >
> > >> > While deferred struct pages are initialized:
> > >> > start_kernel() -> rest_init() -> kernel_init() ->
> > >> > kernel_init_freeable() -> page_alloc_init_late() ->
> > >> > deferred_init_memmap()
> > >> >
> > >> > If the number of preserved pages that is needed during early boot is
> > >> > relatively small, that it should not be an issue to pre-initialize
> > >> > struct pages for them before deferred struct pages are initialized. We
> > >> > already pre-initialize some "struct pages" that are needed during
> > >> > early boot before the reset are initialized, see deferred_grow_zone()
> > >>
> > >> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
> > >> with kho we are talking about some random pages. If we preinit them early,
> > >> deferred_init_memmap() will overwrite them.
> > >
> > > Yes, this is why I am saying that we would need to skip the KHO
> > > initialized "struct pages" somehow during deferred initialization. If
> > > we create an ordered by PFN list of early-initialized KHO struct
> > > pages, skipping during deferred initialization could be done
> > > efficiently.
> >
> > Or keep things simple and don't use any KHO struct pages during early
> > init. You can access the page itself, just don't use its struct page.
> >
> > Currently the only user of kho_restore_folio() during init is
> > kho_memory_init(). The FDT is accessed by doing
> > phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
> > restoring the folio so early. It can be done later, for example when LUO
> > does the finish event, to clean up and free the folio.
>
> Good suggestion, however, KHO does not have any sophisticated users
> that we are going to be adding as part of the live update work in the
> future: IR, KVM, early VCPU threads, and so on. So, while today, this
> might work, in the future, I am not sure if we should expect struct
> pages are not accessed until after deferred initialization or simply
> fix it once and for all.
KHO already accesses stuct page early and uses page->private for order.
Since preserved memory is reserved in memblock, deferred init of struct
pages won't touch those pages, we just need to make sure they are properly
initialized at some point. If we don't expect many kho_restore_folio()
before page_alloc_init_late() we can use init_deferred_page() for early
accesses.
> Pasha
>
> >
> > --
> > Regards,
> > Pratyush Yadav
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-11 13:14 ` Pasha Tatashin
2025-06-11 13:35 ` Mike Rapoport
@ 2025-06-11 13:38 ` Pratyush Yadav
1 sibling, 0 replies; 18+ messages in thread
From: Pratyush Yadav @ 2025-06-11 13:38 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, Mike Rapoport, Alexander Graf, Changyuan Lyu,
Andrew Morton, Baoquan He, kexec, linux-kernel, linux-mm,
Michal Clapinski
On Wed, Jun 11 2025, Pasha Tatashin wrote:
> On Wed, Jun 11, 2025 at 9:06 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>>
>> On Tue, Jun 10 2025, Pasha Tatashin wrote:
[...]
>> >> > > We could, but with that would mean we'll run this before SMP and it's not
>> >> > > desirable. Also, init_deferred_page() for a random page requires
>> >> >
>> >> > We already run KHO init before smp_init:
>> >> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
>> >> > kho_restore_folio() -> struct pages must be already initialized here!
>> >> >
>> >> > While deferred struct pages are initialized:
>> >> > start_kernel() -> rest_init() -> kernel_init() ->
>> >> > kernel_init_freeable() -> page_alloc_init_late() ->
>> >> > deferred_init_memmap()
>> >> >
>> >> > If the number of preserved pages that is needed during early boot is
>> >> > relatively small, that it should not be an issue to pre-initialize
>> >> > struct pages for them before deferred struct pages are initialized. We
>> >> > already pre-initialize some "struct pages" that are needed during
>> >> > early boot before the reset are initialized, see deferred_grow_zone()
>> >>
>> >> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
>> >> with kho we are talking about some random pages. If we preinit them early,
>> >> deferred_init_memmap() will overwrite them.
>> >
>> > Yes, this is why I am saying that we would need to skip the KHO
>> > initialized "struct pages" somehow during deferred initialization. If
>> > we create an ordered by PFN list of early-initialized KHO struct
>> > pages, skipping during deferred initialization could be done
>> > efficiently.
>>
>> Or keep things simple and don't use any KHO struct pages during early
>> init. You can access the page itself, just don't use its struct page.
>>
>> Currently the only user of kho_restore_folio() during init is
>> kho_memory_init(). The FDT is accessed by doing
>> phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
>> restoring the folio so early. It can be done later, for example when LUO
>> does the finish event, to clean up and free the folio.
>
> Good suggestion, however, KHO does not have any sophisticated users
> that we are going to be adding as part of the live update work in the
> future: IR, KVM, early VCPU threads, and so on. So, while today, this
> might work, in the future, I am not sure if we should expect struct
> pages are not accessed until after deferred initialization or simply
> fix it once and for all.
Right. We might end up needing it down the line. But from a quick look,
it doesn't seem to be trivial to solve, so IMO we should solve it when
those use cases actually show up, and keep things simple for now.
--
Regards,
Pratyush Yadav
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-11 13:35 ` Mike Rapoport
@ 2025-06-11 14:01 ` Pratyush Yadav
2025-06-11 14:36 ` Mike Rapoport
0 siblings, 1 reply; 18+ messages in thread
From: Pratyush Yadav @ 2025-06-11 14:01 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pasha Tatashin, Pratyush Yadav, Alexander Graf, Changyuan Lyu,
Andrew Morton, Baoquan He, kexec, linux-kernel, linux-mm,
Michal Clapinski
On Wed, Jun 11 2025, Mike Rapoport wrote:
> On Wed, Jun 11, 2025 at 09:14:55AM -0400, Pasha Tatashin wrote:
>> On Wed, Jun 11, 2025 at 9:06 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>> >
>> > On Tue, Jun 10 2025, Pasha Tatashin wrote:
>> >
>> > >> > > I think it should be the other way around, KHO should depend on
>> > >> > > !DEFERRED_STRUCT_PAGE_INIT.
>> > >> >
>> > >> > Agreed, and this is what I first tried, but that does not work, there
>> > >> > is some circular dependency breaking the build. If you feel
>> > >> > adventurous you can try that :-)
>> > >>
>> > >> Hmm, weird, worked for me :/
>> >
>> > Worked for me as well.
>> >
>> > >
>> > > I am super confused, it did not work for me over weekend, and now it
>> > > is working. Even `make menuconfig` would not work. Anyways, I will put
>> > > it in the appropriate place.
>> > >
>> > >>
>> > >> > > > We will need to teah KHO to work with deferred struct page init. I
>> > >> > > > suspect, we could init preserved struct pages and then skip over them
>> > >> > > > during deferred init.
>> > >> > >
>> > >> > > We could, but with that would mean we'll run this before SMP and it's not
>> > >> > > desirable. Also, init_deferred_page() for a random page requires
>> > >> >
>> > >> > We already run KHO init before smp_init:
>> > >> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
>> > >> > kho_restore_folio() -> struct pages must be already initialized here!
>> > >> >
>> > >> > While deferred struct pages are initialized:
>> > >> > start_kernel() -> rest_init() -> kernel_init() ->
>> > >> > kernel_init_freeable() -> page_alloc_init_late() ->
>> > >> > deferred_init_memmap()
>> > >> >
>> > >> > If the number of preserved pages that is needed during early boot is
>> > >> > relatively small, that it should not be an issue to pre-initialize
>> > >> > struct pages for them before deferred struct pages are initialized. We
>> > >> > already pre-initialize some "struct pages" that are needed during
>> > >> > early boot before the reset are initialized, see deferred_grow_zone()
>> > >>
>> > >> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
>> > >> with kho we are talking about some random pages. If we preinit them early,
>> > >> deferred_init_memmap() will overwrite them.
>> > >
>> > > Yes, this is why I am saying that we would need to skip the KHO
>> > > initialized "struct pages" somehow during deferred initialization. If
>> > > we create an ordered by PFN list of early-initialized KHO struct
>> > > pages, skipping during deferred initialization could be done
>> > > efficiently.
>> >
>> > Or keep things simple and don't use any KHO struct pages during early
>> > init. You can access the page itself, just don't use its struct page.
>> >
>> > Currently the only user of kho_restore_folio() during init is
>> > kho_memory_init(). The FDT is accessed by doing
>> > phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
>> > restoring the folio so early. It can be done later, for example when LUO
>> > does the finish event, to clean up and free the folio.
>>
>> Good suggestion, however, KHO does not have any sophisticated users
>> that we are going to be adding as part of the live update work in the
>> future: IR, KVM, early VCPU threads, and so on. So, while today, this
>> might work, in the future, I am not sure if we should expect struct
>> pages are not accessed until after deferred initialization or simply
>> fix it once and for all.
>
> KHO already accesses stuct page early and uses page->private for order.
> Since preserved memory is reserved in memblock, deferred init of struct
> pages won't touch those pages, we just need to make sure they are properly
Not strictly true. Some of them might have been initialized from
free_area_init() -> memmap_init() (the ones not eligible for deferred
init), which happens before KHO makes its memblock reservations.
> initialized at some point. If we don't expect many kho_restore_folio()
> before page_alloc_init_late() we can use init_deferred_page() for early
> accesses.
I tried doing this when looking into this initially, but it doesn't work
for some reason.
static void kho_restore_page(struct page *page, unsigned int order)
{
unsigned int i, nr_pages = (1 << order);
/* Head page gets refcount of 1. */
init_deferred_page(page_to_pfn(page), NUMA_NO_NODE);
set_page_count(page, 1);
/* For higher order folios, tail pages get a page count of zero. */
for (i = 1; i < nr_pages; i++) {
init_deferred_page(page_to_pfn(page + i), NUMA_NO_NODE);
set_page_count(page + i, 0);
}
[...]
results in:
[ 0.644032] page:(____ptrval____) is uninitialized and poisoned
[ 0.644679] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
[ 0.645376] ------------[ cut here ]------------
[ 0.645883] kernel BUG at ./include/linux/mm.h:1512!
[...]
[ 0.647924] RIP: 0010:__pageblock_pfn_to_page+0x166/0x180
[...]
[ 0.647924] <TASK>
[ 0.647924] set_zone_contiguous+0x6b/0x90
[ 0.647924] page_alloc_init_late+0x356/0x370
[ 0.647924] kernel_init_freeable+0x12d/0x190
[ 0.647924] ? __pfx_kernel_init+0x10/0x10
[ 0.647924] kernel_init+0x1a/0x130
didn't dig any deeper on why it happens...
--
Regards,
Pratyush Yadav
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-11 14:01 ` Pratyush Yadav
@ 2025-06-11 14:36 ` Mike Rapoport
2025-06-13 14:22 ` Pratyush Yadav
0 siblings, 1 reply; 18+ messages in thread
From: Mike Rapoport @ 2025-06-11 14:36 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Pasha Tatashin, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm, Michal Clapinski
On Wed, Jun 11, 2025 at 04:01:52PM +0200, Pratyush Yadav wrote:
> On Wed, Jun 11 2025, Mike Rapoport wrote:
>
> > On Wed, Jun 11, 2025 at 09:14:55AM -0400, Pasha Tatashin wrote:
> >> On Wed, Jun 11, 2025 at 9:06 AM Pratyush Yadav <pratyush@kernel.org> wrote:
> >> >
> >> > On Tue, Jun 10 2025, Pasha Tatashin wrote:
> >> >
> >> > >> > > I think it should be the other way around, KHO should depend on
> >> > >> > > !DEFERRED_STRUCT_PAGE_INIT.
> >> > >> >
> >> > >> > Agreed, and this is what I first tried, but that does not work, there
> >> > >> > is some circular dependency breaking the build. If you feel
> >> > >> > adventurous you can try that :-)
> >> > >>
> >> > >> Hmm, weird, worked for me :/
> >> >
> >> > Worked for me as well.
> >> >
> >> > >
> >> > > I am super confused, it did not work for me over weekend, and now it
> >> > > is working. Even `make menuconfig` would not work. Anyways, I will put
> >> > > it in the appropriate place.
> >> > >
> >> > >>
> >> > >> > > > We will need to teah KHO to work with deferred struct page init. I
> >> > >> > > > suspect, we could init preserved struct pages and then skip over them
> >> > >> > > > during deferred init.
> >> > >> > >
> >> > >> > > We could, but with that would mean we'll run this before SMP and it's not
> >> > >> > > desirable. Also, init_deferred_page() for a random page requires
> >> > >> >
> >> > >> > We already run KHO init before smp_init:
> >> > >> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
> >> > >> > kho_restore_folio() -> struct pages must be already initialized here!
> >> > >> >
> >> > >> > While deferred struct pages are initialized:
> >> > >> > start_kernel() -> rest_init() -> kernel_init() ->
> >> > >> > kernel_init_freeable() -> page_alloc_init_late() ->
> >> > >> > deferred_init_memmap()
> >> > >> >
> >> > >> > If the number of preserved pages that is needed during early boot is
> >> > >> > relatively small, that it should not be an issue to pre-initialize
> >> > >> > struct pages for them before deferred struct pages are initialized. We
> >> > >> > already pre-initialize some "struct pages" that are needed during
> >> > >> > early boot before the reset are initialized, see deferred_grow_zone()
> >> > >>
> >> > >> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
> >> > >> with kho we are talking about some random pages. If we preinit them early,
> >> > >> deferred_init_memmap() will overwrite them.
> >> > >
> >> > > Yes, this is why I am saying that we would need to skip the KHO
> >> > > initialized "struct pages" somehow during deferred initialization. If
> >> > > we create an ordered by PFN list of early-initialized KHO struct
> >> > > pages, skipping during deferred initialization could be done
> >> > > efficiently.
> >> >
> >> > Or keep things simple and don't use any KHO struct pages during early
> >> > init. You can access the page itself, just don't use its struct page.
> >> >
> >> > Currently the only user of kho_restore_folio() during init is
> >> > kho_memory_init(). The FDT is accessed by doing
> >> > phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
> >> > restoring the folio so early. It can be done later, for example when LUO
> >> > does the finish event, to clean up and free the folio.
> >>
> >> Good suggestion, however, KHO does not have any sophisticated users
> >> that we are going to be adding as part of the live update work in the
> >> future: IR, KVM, early VCPU threads, and so on. So, while today, this
> >> might work, in the future, I am not sure if we should expect struct
> >> pages are not accessed until after deferred initialization or simply
> >> fix it once and for all.
> >
> > KHO already accesses stuct page early and uses page->private for order.
> > Since preserved memory is reserved in memblock, deferred init of struct
> > pages won't touch those pages, we just need to make sure they are properly
>
> Not strictly true. Some of them might have been initialized from
> free_area_init() -> memmap_init() (the ones not eligible for deferred
> init), which happens before KHO makes its memblock reservations.
>
> > initialized at some point. If we don't expect many kho_restore_folio()
> > before page_alloc_init_late() we can use init_deferred_page() for early
> > accesses.
>
> I tried doing this when looking into this initially, but it doesn't work
> for some reason.
>
> static void kho_restore_page(struct page *page, unsigned int order)
> {
> unsigned int i, nr_pages = (1 << order);
>
> /* Head page gets refcount of 1. */
> init_deferred_page(page_to_pfn(page), NUMA_NO_NODE);
This would do
if (early_page_initialised(pfn, nid))
return;
__init_page_from_nid(pfn, nid);
and I'm really surprised it didn't crash in early_page_initialised()
because of NUMA_NO_NODE :)
What might work here is
pfn = page_to_pfn(page);
__init_page_from_nid(pfn, early_pfn_to_nid(pfn));
> set_page_count(page, 1);
>
> /* For higher order folios, tail pages get a page count of zero. */
> for (i = 1; i < nr_pages; i++) {
> init_deferred_page(page_to_pfn(page + i), NUMA_NO_NODE);
> set_page_count(page + i, 0);
> }
>
> [...]
>
> results in:
>
> [ 0.644032] page:(____ptrval____) is uninitialized and poisoned
> [ 0.644679] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
> [ 0.645376] ------------[ cut here ]------------
> [ 0.645883] kernel BUG at ./include/linux/mm.h:1512!
> [...]
> [ 0.647924] RIP: 0010:__pageblock_pfn_to_page+0x166/0x180
> [...]
> [ 0.647924] <TASK>
> [ 0.647924] set_zone_contiguous+0x6b/0x90
> [ 0.647924] page_alloc_init_late+0x356/0x370
> [ 0.647924] kernel_init_freeable+0x12d/0x190
> [ 0.647924] ? __pfx_kernel_init+0x10/0x10
> [ 0.647924] kernel_init+0x1a/0x130
>
> didn't dig any deeper on why it happens...
>
> --
> Regards,
> Pratyush Yadav
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-11 14:36 ` Mike Rapoport
@ 2025-06-13 14:22 ` Pratyush Yadav
2025-06-13 16:21 ` Mike Rapoport
0 siblings, 1 reply; 18+ messages in thread
From: Pratyush Yadav @ 2025-06-13 14:22 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pratyush Yadav, Pasha Tatashin, Alexander Graf, Changyuan Lyu,
Andrew Morton, Baoquan He, kexec, linux-kernel, linux-mm,
Michal Clapinski
On Wed, Jun 11 2025, Mike Rapoport wrote:
> On Wed, Jun 11, 2025 at 04:01:52PM +0200, Pratyush Yadav wrote:
>> On Wed, Jun 11 2025, Mike Rapoport wrote:
>>
>> > On Wed, Jun 11, 2025 at 09:14:55AM -0400, Pasha Tatashin wrote:
>> >> On Wed, Jun 11, 2025 at 9:06 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>> >> >
>> >> > On Tue, Jun 10 2025, Pasha Tatashin wrote:
>> >> >
>> >> > >> > > I think it should be the other way around, KHO should depend on
>> >> > >> > > !DEFERRED_STRUCT_PAGE_INIT.
>> >> > >> >
>> >> > >> > Agreed, and this is what I first tried, but that does not work, there
>> >> > >> > is some circular dependency breaking the build. If you feel
>> >> > >> > adventurous you can try that :-)
>> >> > >>
>> >> > >> Hmm, weird, worked for me :/
>> >> >
>> >> > Worked for me as well.
>> >> >
>> >> > >
>> >> > > I am super confused, it did not work for me over weekend, and now it
>> >> > > is working. Even `make menuconfig` would not work. Anyways, I will put
>> >> > > it in the appropriate place.
>> >> > >
>> >> > >>
>> >> > >> > > > We will need to teah KHO to work with deferred struct page init. I
>> >> > >> > > > suspect, we could init preserved struct pages and then skip over them
>> >> > >> > > > during deferred init.
>> >> > >> > >
>> >> > >> > > We could, but with that would mean we'll run this before SMP and it's not
>> >> > >> > > desirable. Also, init_deferred_page() for a random page requires
>> >> > >> >
>> >> > >> > We already run KHO init before smp_init:
>> >> > >> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
>> >> > >> > kho_restore_folio() -> struct pages must be already initialized here!
>> >> > >> >
>> >> > >> > While deferred struct pages are initialized:
>> >> > >> > start_kernel() -> rest_init() -> kernel_init() ->
>> >> > >> > kernel_init_freeable() -> page_alloc_init_late() ->
>> >> > >> > deferred_init_memmap()
>> >> > >> >
>> >> > >> > If the number of preserved pages that is needed during early boot is
>> >> > >> > relatively small, that it should not be an issue to pre-initialize
>> >> > >> > struct pages for them before deferred struct pages are initialized. We
>> >> > >> > already pre-initialize some "struct pages" that are needed during
>> >> > >> > early boot before the reset are initialized, see deferred_grow_zone()
>> >> > >>
>> >> > >> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
>> >> > >> with kho we are talking about some random pages. If we preinit them early,
>> >> > >> deferred_init_memmap() will overwrite them.
>> >> > >
>> >> > > Yes, this is why I am saying that we would need to skip the KHO
>> >> > > initialized "struct pages" somehow during deferred initialization. If
>> >> > > we create an ordered by PFN list of early-initialized KHO struct
>> >> > > pages, skipping during deferred initialization could be done
>> >> > > efficiently.
>> >> >
>> >> > Or keep things simple and don't use any KHO struct pages during early
>> >> > init. You can access the page itself, just don't use its struct page.
>> >> >
>> >> > Currently the only user of kho_restore_folio() during init is
>> >> > kho_memory_init(). The FDT is accessed by doing
>> >> > phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
>> >> > restoring the folio so early. It can be done later, for example when LUO
>> >> > does the finish event, to clean up and free the folio.
>> >>
>> >> Good suggestion, however, KHO does not have any sophisticated users
>> >> that we are going to be adding as part of the live update work in the
>> >> future: IR, KVM, early VCPU threads, and so on. So, while today, this
>> >> might work, in the future, I am not sure if we should expect struct
>> >> pages are not accessed until after deferred initialization or simply
>> >> fix it once and for all.
>> >
>> > KHO already accesses stuct page early and uses page->private for order.
>> > Since preserved memory is reserved in memblock, deferred init of struct
>> > pages won't touch those pages, we just need to make sure they are properly
>>
>> Not strictly true. Some of them might have been initialized from
>> free_area_init() -> memmap_init() (the ones not eligible for deferred
>> init), which happens before KHO makes its memblock reservations.
>>
>> > initialized at some point. If we don't expect many kho_restore_folio()
>> > before page_alloc_init_late() we can use init_deferred_page() for early
>> > accesses.
>>
>> I tried doing this when looking into this initially, but it doesn't work
>> for some reason.
>>
>> static void kho_restore_page(struct page *page, unsigned int order)
>> {
>> unsigned int i, nr_pages = (1 << order);
>>
>> /* Head page gets refcount of 1. */
>> init_deferred_page(page_to_pfn(page), NUMA_NO_NODE);
>
>
> This would do
>
> if (early_page_initialised(pfn, nid))
> return;
>
> __init_page_from_nid(pfn, nid);
>
> and I'm really surprised it didn't crash in early_page_initialised()
> because of NUMA_NO_NODE :)
Oh, right. Using the wrong node completely throws
early_page_initialised() off.
>
> What might work here is
>
> pfn = page_to_pfn(page);
> __init_page_from_nid(pfn, early_pfn_to_nid(pfn));
Yep, that works. Although this would do early_pfn_to_nid() for each page
so it isn't very efficient. And we also need to make sure memblock does
not go away.
>
>> set_page_count(page, 1);
>>
>> /* For higher order folios, tail pages get a page count of zero. */
>> for (i = 1; i < nr_pages; i++) {
>> init_deferred_page(page_to_pfn(page + i), NUMA_NO_NODE);
>> set_page_count(page + i, 0);
>> }
>>
>> [...]
>>
[...]
--
Regards,
Pratyush Yadav
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] kho: initialize tail pages for higher order folios properly
2025-06-13 14:22 ` Pratyush Yadav
@ 2025-06-13 16:21 ` Mike Rapoport
0 siblings, 0 replies; 18+ messages in thread
From: Mike Rapoport @ 2025-06-13 16:21 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Pasha Tatashin, Alexander Graf, Changyuan Lyu, Andrew Morton,
Baoquan He, kexec, linux-kernel, linux-mm, Michal Clapinski
On Fri, Jun 13, 2025 at 04:22:32PM +0200, Pratyush Yadav wrote:
> On Wed, Jun 11 2025, Mike Rapoport wrote:
>
> > On Wed, Jun 11, 2025 at 04:01:52PM +0200, Pratyush Yadav wrote:
> >> On Wed, Jun 11 2025, Mike Rapoport wrote:
> >>
> >> > On Wed, Jun 11, 2025 at 09:14:55AM -0400, Pasha Tatashin wrote:
> >> >> On Wed, Jun 11, 2025 at 9:06 AM Pratyush Yadav <pratyush@kernel.org> wrote:
> >> >> >
> >> >> > On Tue, Jun 10 2025, Pasha Tatashin wrote:
> >> >> >
> >> >> > >> > > I think it should be the other way around, KHO should depend on
> >> >> > >> > > !DEFERRED_STRUCT_PAGE_INIT.
> >> >> > >> >
> >> >> > >> > Agreed, and this is what I first tried, but that does not work, there
> >> >> > >> > is some circular dependency breaking the build. If you feel
> >> >> > >> > adventurous you can try that :-)
> >> >> > >>
> >> >> > >> Hmm, weird, worked for me :/
> >> >> >
> >> >> > Worked for me as well.
> >> >> >
> >> >> > >
> >> >> > > I am super confused, it did not work for me over weekend, and now it
> >> >> > > is working. Even `make menuconfig` would not work. Anyways, I will put
> >> >> > > it in the appropriate place.
> >> >> > >
> >> >> > >>
> >> >> > >> > > > We will need to teah KHO to work with deferred struct page init. I
> >> >> > >> > > > suspect, we could init preserved struct pages and then skip over them
> >> >> > >> > > > during deferred init.
> >> >> > >> > >
> >> >> > >> > > We could, but with that would mean we'll run this before SMP and it's not
> >> >> > >> > > desirable. Also, init_deferred_page() for a random page requires
> >> >> > >> >
> >> >> > >> > We already run KHO init before smp_init:
> >> >> > >> > start_kernel() -> mm_core_init() -> kho_memory_init() ->
> >> >> > >> > kho_restore_folio() -> struct pages must be already initialized here!
> >> >> > >> >
> >> >> > >> > While deferred struct pages are initialized:
> >> >> > >> > start_kernel() -> rest_init() -> kernel_init() ->
> >> >> > >> > kernel_init_freeable() -> page_alloc_init_late() ->
> >> >> > >> > deferred_init_memmap()
> >> >> > >> >
> >> >> > >> > If the number of preserved pages that is needed during early boot is
> >> >> > >> > relatively small, that it should not be an issue to pre-initialize
> >> >> > >> > struct pages for them before deferred struct pages are initialized. We
> >> >> > >> > already pre-initialize some "struct pages" that are needed during
> >> >> > >> > early boot before the reset are initialized, see deferred_grow_zone()
> >> >> > >>
> >> >> > >> deferred_grow_zone() takes a chunk in the beginning of uninitialized range,
> >> >> > >> with kho we are talking about some random pages. If we preinit them early,
> >> >> > >> deferred_init_memmap() will overwrite them.
> >> >> > >
> >> >> > > Yes, this is why I am saying that we would need to skip the KHO
> >> >> > > initialized "struct pages" somehow during deferred initialization. If
> >> >> > > we create an ordered by PFN list of early-initialized KHO struct
> >> >> > > pages, skipping during deferred initialization could be done
> >> >> > > efficiently.
> >> >> >
> >> >> > Or keep things simple and don't use any KHO struct pages during early
> >> >> > init. You can access the page itself, just don't use its struct page.
> >> >> >
> >> >> > Currently the only user of kho_restore_folio() during init is
> >> >> > kho_memory_init(). The FDT is accessed by doing
> >> >> > phys_to_virt(kho_in.fdt_phys) anyway, so there is really no need for
> >> >> > restoring the folio so early. It can be done later, for example when LUO
> >> >> > does the finish event, to clean up and free the folio.
> >> >>
> >> >> Good suggestion, however, KHO does not have any sophisticated users
> >> >> that we are going to be adding as part of the live update work in the
> >> >> future: IR, KVM, early VCPU threads, and so on. So, while today, this
> >> >> might work, in the future, I am not sure if we should expect struct
> >> >> pages are not accessed until after deferred initialization or simply
> >> >> fix it once and for all.
> >> >
> >> > KHO already accesses stuct page early and uses page->private for order.
> >> > Since preserved memory is reserved in memblock, deferred init of struct
> >> > pages won't touch those pages, we just need to make sure they are properly
> >>
> >> Not strictly true. Some of them might have been initialized from
> >> free_area_init() -> memmap_init() (the ones not eligible for deferred
> >> init), which happens before KHO makes its memblock reservations.
> >>
> >> > initialized at some point. If we don't expect many kho_restore_folio()
> >> > before page_alloc_init_late() we can use init_deferred_page() for early
> >> > accesses.
> >>
> >> I tried doing this when looking into this initially, but it doesn't work
> >> for some reason.
> >>
> >> static void kho_restore_page(struct page *page, unsigned int order)
> >> {
> >> unsigned int i, nr_pages = (1 << order);
> >>
> >> /* Head page gets refcount of 1. */
> >> init_deferred_page(page_to_pfn(page), NUMA_NO_NODE);
> >
> >
> > This would do
> >
> > if (early_page_initialised(pfn, nid))
> > return;
> >
> > __init_page_from_nid(pfn, nid);
> >
> > and I'm really surprised it didn't crash in early_page_initialised()
> > because of NUMA_NO_NODE :)
>
> Oh, right. Using the wrong node completely throws
> early_page_initialised() off.
>
> >
> > What might work here is
> >
> > pfn = page_to_pfn(page);
> > __init_page_from_nid(pfn, early_pfn_to_nid(pfn));
>
> Yep, that works. Although this would do early_pfn_to_nid() for each page
> so it isn't very efficient. And we also need to make sure memblock does
> not go away.
Yeah, that's far from ideal. Let's just gate KHO on
!CONFIG_DEFERRED_PAGE_STRUCT_INIT for now.
> >> set_page_count(page, 1);
> >>
> >> /* For higher order folios, tail pages get a page count of zero. */
> >> for (i = 1; i < nr_pages; i++) {
> >> init_deferred_page(page_to_pfn(page + i), NUMA_NO_NODE);
> >> set_page_count(page + i, 0);
> >> }
> >>
> >> [...]
> >>
> [...]
>
> --
> Regards,
> Pratyush Yadav
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-06-13 16:21 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-05 17:11 [PATCH] kho: initialize tail pages for higher order folios properly Pratyush Yadav
2025-06-05 20:13 ` Andrew Morton
2025-06-06 8:04 ` Mike Rapoport
2025-06-06 16:23 ` Pratyush Yadav
2025-06-09 19:36 ` Mike Rapoport
2025-06-09 20:07 ` Pasha Tatashin
2025-06-10 5:44 ` Mike Rapoport
2025-06-10 11:20 ` Pasha Tatashin
2025-06-10 16:41 ` Mike Rapoport
2025-06-10 22:33 ` Pasha Tatashin
2025-06-11 13:06 ` Pratyush Yadav
2025-06-11 13:14 ` Pasha Tatashin
2025-06-11 13:35 ` Mike Rapoport
2025-06-11 14:01 ` Pratyush Yadav
2025-06-11 14:36 ` Mike Rapoport
2025-06-13 14:22 ` Pratyush Yadav
2025-06-13 16:21 ` Mike Rapoport
2025-06-11 13:38 ` Pratyush Yadav
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).