linux-mm.kvack.org archive mirror
* [PATCH] kho: add support for deferred struct page init
@ 2025-12-16  8:49 Evangelos Petrongonas
  2025-12-16 10:53 ` Pasha Tatashin
  2025-12-16 11:57 ` Mike Rapoport
  0 siblings, 2 replies; 27+ messages in thread
From: Evangelos Petrongonas @ 2025-12-16  8:49 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Evangelos Petrongonas, Pasha Tatashin, Pratyush Yadav,
	Alexander Graf, Andrew Morton, Jason Miu, linux-kernel, kexec,
	linux-mm, nh-open-source

When `CONFIG_DEFERRED_STRUCT_PAGE_INIT` is enabled, struct page
initialization is deferred to parallel kthreads that run later
in the boot process.

During KHO restoration, `deserialize_bitmap()` writes metadata for
each preserved memory region. However, if the struct page has not been
initialized, this write targets uninitialized memory, potentially
leading to errors like:
```
BUG: unable to handle page fault for address: ...
```

Fix this by introducing `kho_get_preserved_page()`, which ensures that
all struct pages in a preserved region are initialized by calling
`init_deferred_page()` on each of them. That call is a no-op when deferred
init is disabled or when the struct page is already initialized.

Fixes: 8b66ed2c3f42 ("kho: mm: don't allow deferred struct page with KHO")
Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
---
### Notes
@Jason, this patch should act as a temporary fix to make KHO play nice
with deferred struct page init until you post your ideas about splitting
"Physical Reservation" from "Metadata Restoration".

### Testing
To test the fix, I modified the KHO selftest to allocate more memory,
and to allocate it from higher memory, so that the incompatibility is
triggered. The branch with those changes can be found at:
https://git.infradead.org/?p=users/vpetrog/linux.git;a=shortlog;h=refs/heads/kho-deferred-struct-page-init

In future patches, we might want to enhance the selftest to cover
this case as well. However, properly adapting the test for this
is much more work than the actual fix, so it can be deferred to a
follow-up series.

In addition, attempting to run the selftest for arm (without my changes)
fails with:
```
ERROR:target/arm/internals.h:767:regime_is_user: code should not be reached
Bail out! ERROR:target/arm/internals.h:767:regime_is_user: code should not be reached
./tools/testing/selftests/kho/vmtest.sh: line 113: 61609 Aborted
```
I have not looked into it further, but can do so as part of a
selftest follow-up.
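
### Illustration
The idea behind the fix described in the commit message (initialize every
struct page of a preserved region before touching it, and make repeated
initialization harmless) can be modeled in userspace. This is only a sketch
under stated assumptions, not kernel code: `struct sim_page`, `sim_memmap`,
`sim_init_deferred_page()` and `sim_get_preserved_page()` are hypothetical
stand-ins, and the real `init_deferred_page()` also takes a node ID, which
the model omits.
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_PAGES 64UL

/* Hypothetical stand-in for struct page; it only tracks init state. */
struct sim_page {
	bool initialized;
	unsigned long private;	/* models the metadata written on restore */
};

static struct sim_page sim_memmap[SIM_NR_PAGES];
static unsigned long sim_init_calls;	/* counts inits that did real work */

/* Models init_deferred_page(): a no-op if the page is already set up. */
static void sim_init_deferred_page(unsigned long pfn)
{
	struct sim_page *page;

	assert(pfn < SIM_NR_PAGES);
	page = &sim_memmap[pfn];
	if (page->initialized)
		return;
	page->initialized = true;
	page->private = 0;
	sim_init_calls++;
}

/*
 * Models kho_get_preserved_page(): initialize all 2^order pages of the
 * region, then return the first one. Returns NULL if the region does not
 * fit in the simulated memmap (a check added for the model only).
 */
static struct sim_page *sim_get_preserved_page(unsigned long pfn,
					       unsigned int order)
{
	unsigned long nr;

	if (order >= 8 * sizeof(unsigned long))
		return NULL;
	nr = 1UL << order;
	if (pfn >= SIM_NR_PAGES || nr > SIM_NR_PAGES - pfn)
		return NULL;

	for (unsigned long i = 0; i < nr; i++)
		sim_init_deferred_page(pfn + i);

	return &sim_memmap[pfn];
}
```
Calling `sim_get_preserved_page(8, 2)` twice initializes pages 8..11 once;
the second call finds them initialized and does no further work, which is
the property the fix relies on.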

 kernel/liveupdate/Kconfig          |  2 --
 kernel/liveupdate/kexec_handover.c | 19 ++++++++++++++++++-
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index d2aeaf13c3ac..9394a608f939 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -1,12 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 menu "Live Update and Kexec HandOver"
-	depends on !DEFERRED_STRUCT_PAGE_INIT
 
 config KEXEC_HANDOVER
 	bool "kexec handover"
 	depends on ARCH_SUPPORTS_KEXEC_HANDOVER && ARCH_SUPPORTS_KEXEC_FILE
-	depends on !DEFERRED_STRUCT_PAGE_INIT
 	select MEMBLOCK_KHO_SCRATCH
 	select KEXEC_FILE
 	select LIBFDT
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 9dc51fab604f..78cfe71e6107 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -439,6 +439,23 @@ static int kho_mem_serialize(struct kho_out *kho_out)
 	return err;
 }
 
+/*
+ * With CONFIG_DEFERRED_STRUCT_PAGE_INIT, struct pages in higher memory
+ * regions may not be initialized yet at the time KHO deserializes preserved
+ * memory. This function ensures all struct pages in the region are initialized.
+ */
+static struct page *__init kho_get_preserved_page(phys_addr_t phys,
+						  unsigned int order)
+{
+	unsigned long pfn = PHYS_PFN(phys);
+	int nid = early_pfn_to_nid(pfn);
+
+	for (int i = 0; i < (1 << order); i++)
+		init_deferred_page(pfn + i, nid);
+
+	return pfn_to_page(pfn);
+}
+
 static void __init deserialize_bitmap(unsigned int order,
 				      struct khoser_mem_bitmap_ptr *elm)
 {
@@ -449,7 +466,7 @@ static void __init deserialize_bitmap(unsigned int order,
 		int sz = 1 << (order + PAGE_SHIFT);
 		phys_addr_t phys =
 			elm->phys_start + (bit << (order + PAGE_SHIFT));
-		struct page *page = phys_to_page(phys);
+		struct page *page = kho_get_preserved_page(phys, order);
 		union kho_page_info info;
 
 		memblock_reserve(phys, sz);
-- 
2.43.0
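
For readers following the `deserialize_bitmap()` context lines in the diff
above: each set bit in an order-N bitmap names one physical range of
2^(N + PAGE_SHIFT) bytes, starting at `phys_start`. The userspace sketch
below reproduces that arithmetic. It assumes 4 KiB base pages, and
`SIM_PAGE_SHIFT`, `struct sim_range` and `sim_bit_to_range()` are
hypothetical names that do not exist in the kernel.
```c
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical result type: start address and length of one range. */
struct sim_range {
	uint64_t phys;
	uint64_t size;
};

/* Physical range named by bit number `bit` in a bitmap of `order`. */
static struct sim_range sim_bit_to_range(uint64_t phys_start,
					 unsigned int order,
					 unsigned int bit)
{
	unsigned int shift = order + SIM_PAGE_SHIFT;
	struct sim_range r;

	assert(shift < 64);
	r.size = UINT64_C(1) << shift;
	r.phys = phys_start + ((uint64_t)bit << shift);
	return r;
}
```
The sketch uses 64-bit types throughout so that the shifts cannot truncate;
that is a choice made for the example, not a statement about the kernel code.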







* Re: [PATCH] kho: add support for deferred struct page init
@ 2025-12-24  7:34 Fadouse
  2025-12-29 21:09 ` Pratyush Yadav
  0 siblings, 1 reply; 27+ messages in thread
From: Fadouse @ 2025-12-24  7:34 UTC (permalink / raw)
  To: Evangelos Petrongonas, Mike Rapoport
  Cc: Pasha Tatashin, Pratyush Yadav, Alexander Graf, Andrew Morton,
	Jason Miu, linux-kernel, kexec, linux-mm, nh-open-source



Hi Evangelos, Mike, Pasha, Pratyush,

I independently hit a crash in the LUO/memfd restore path with
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y, on a local build based on dd9b004b7ff3
(x86_64 QEMU, 6.19.0-rc1 timeframe).

In my reproducer, stage1 preserves a memfd via LUO and kexecs into stage2;
stage2 calls LIVEUPDATE_SESSION_FINISH without retrieving files. I observed
a reliable crash in adjust_managed_page_count() from kho_restore_page().

Minimal excerpt:

stage2: start
stage2: retrieved session fd=4
BUG: unable to handle page fault for address: 0000000000001410
RIP: adjust_managed_page_count+0x29/0x40
Call Trace:
   kho_restore_page+0x18a/0x1c0
   kho_restore_folio+0xe/0x60
   memfd_luo_finish+0xe6/0x160
   luo_file_finish+0x188/0x240
   luo_session_finish+0x2c/0x80
   luo_session_ioctl+0xf5/0x170
   __x64_sys_ioctl+0x91/0xe0

Applying the patch in <20251216084913.86342-1-epetron@amazon.de> makes the
issue no longer reproduce for me.

I can share full logs and the small two-stage initramfs reproducer if needed.

Thanks,
YanXin Li

Tested-by: YanXin Li <fadouse@proton.me>



Thread overview: 27+ messages
2025-12-16  8:49 [PATCH] kho: add support for deferred struct page init Evangelos Petrongonas
2025-12-16 10:53 ` Pasha Tatashin
2025-12-16 11:57 ` Mike Rapoport
2025-12-16 14:26   ` Evangelos Petrongonas
2025-12-16 15:05   ` Pasha Tatashin
2025-12-16 15:19     ` Mike Rapoport
2025-12-16 15:36       ` Pasha Tatashin
2025-12-16 15:51         ` Pasha Tatashin
2025-12-20  2:27           ` Pratyush Yadav
2025-12-19  9:19         ` Mike Rapoport
2025-12-19 16:28           ` Pasha Tatashin
2025-12-20  3:20             ` Pratyush Yadav
2025-12-20 14:49               ` Pasha Tatashin
2025-12-22 15:33                 ` Pratyush Yadav
2025-12-22 15:55                   ` Pasha Tatashin
2025-12-22 16:24                     ` Pratyush Yadav
2025-12-23 17:37                       ` Pasha Tatashin
2025-12-29 21:03                         ` Pratyush Yadav
2025-12-30 16:05                           ` Pasha Tatashin
2025-12-30 16:16                             ` Mike Rapoport
2025-12-30 16:18                               ` Pasha Tatashin
2025-12-30 17:18                                 ` Mike Rapoport
2025-12-30 18:21                                   ` Pasha Tatashin
2025-12-30 16:14                           ` Mike Rapoport
  -- strict thread matches above, loose matches on Subject: below --
2025-12-24  7:34 Fadouse
2025-12-29 21:09 ` Pratyush Yadav
2025-12-30 15:05   ` Pasha Tatashin
