linux-kernel.vger.kernel.org archive mirror
* [PATCH] efi: Support booting with kexec handover (KHO)
@ 2025-08-08 16:36 Evangelos Petrongonas
  2025-08-11  6:39 ` Mike Rapoport
  0 siblings, 1 reply; 4+ messages in thread
From: Evangelos Petrongonas @ 2025-08-08 16:36 UTC
  To: ardb
  Cc: Evangelos Petrongonas, Alexander Graf, Mike Rapoport,
	Changyuan Lyu, kexec, nh-open-source, linux-efi, linux-kernel

When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions
early during device tree scanning. After kexec, the new kernel
exclusively uses these regions for memory allocations during boot, up to
the initialization of the page allocator.

However, when booting with EFI, EFI's reserve_regions() uses
memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before
rebuilding them from EFI data. This destroys KHO scratch regions and
their flags, thus causing a kernel panic, as there are no scratch
memory regions.

Instead of wholesale removal, iterate through memory regions and only
remove non-KHO ones. This preserves KHO scratch regions while still
allowing EFI to rebuild its memory map.

Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
---

Reproduction/Verification Steps
The issue and the fix can be reproduced/verified by booting a VM with
EFI and attempting to perform a KHO-enabled kexec. The fix
was developed/tested on arm64.

 drivers/firmware/efi/efi-init.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
index a00e07b853f22..2f08b1ab764f6 100644
--- a/drivers/firmware/efi/efi-init.c
+++ b/drivers/firmware/efi/efi-init.c
@@ -164,12 +164,35 @@ static __init void reserve_regions(void)
 		pr_info("Processing EFI memory map:\n");
 
 	/*
-	 * Discard memblocks discovered so far: if there are any at this
-	 * point, they originate from memory nodes in the DT, and UEFI
-	 * uses its own memory map instead.
+	 * Discard memblocks discovered so far except for KHO scratch regions.
+	 * Most memblocks at this point originate from memory nodes in the DT,
+	 * and UEFI uses its own memory map instead. However, if KHO is enabled,
+	 * scratch regions must be preserved.
 	 */
 	memblock_dump_all();
-	memblock_remove(0, PHYS_ADDR_MAX);
+
+	if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) {
+		struct memblock_region *reg;
+		phys_addr_t start, size;
+		int i;
+
+		/* Remove all non-KHO regions */
+		for (i = memblock.memory.cnt - 1; i >= 0; i--) {
+			reg = &memblock.memory.regions[i];
+			if (!memblock_is_kho_scratch(reg)) {
+				start = reg->base;
+				size = reg->size;
+				memblock_remove(start, size);
+			}
+		}
+	} else {
+	/*
+	 * KHO is disabled. Discard memblocks discovered so far: if there
+	 * are any at this point, they originate from memory nodes in the
+	 * DT, and UEFI uses its own memory map instead.
+	 */
+		memblock_remove(0, PHYS_ADDR_MAX);
+	}
 
 	for_each_efi_memory_desc(md) {
 		paddr = md->phys_addr;
-- 
2.43.0




Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597



* Re: [PATCH] efi: Support booting with kexec handover (KHO)
  2025-08-08 16:36 [PATCH] efi: Support booting with kexec handover (KHO) Evangelos Petrongonas
@ 2025-08-11  6:39 ` Mike Rapoport
  2025-08-14  0:53   ` Evangelos Petrongonas
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Rapoport @ 2025-08-11  6:39 UTC
  To: Evangelos Petrongonas
  Cc: ardb, Alexander Graf, Changyuan Lyu, kexec, nh-open-source,
	linux-efi, linux-kernel

On Fri, Aug 08, 2025 at 04:36:51PM +0000, Evangelos Petrongonas wrote:
> When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions
> early during device tree scanning. After kexec, the new kernel
> exclusively uses these regions for memory allocations during boot, up to
> the initialization of the page allocator.
> 
> However, when booting with EFI, EFI's reserve_regions() uses
> memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before
> rebuilding them from EFI data. This destroys KHO scratch regions and
> their flags, thus causing a kernel panic, as there are no scratch
> memory regions.
> 
> Instead of wholesale removal, iterate through memory regions and only
> remove non-KHO ones. This preserves KHO scratch regions while still
> allowing EFI to rebuild its memory map.

It's worth mentioning that scratch areas are "good known memory" :)
 
> Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
> ---
> 
> Reproduction/Verification Steps
> The issue and the fix can be reproduced/verified by booting a VM with
> EFI and attempting to perform a KHO-enabled kexec. The fix
> was developed/tested on arm64.
> 
>  drivers/firmware/efi/efi-init.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
> index a00e07b853f22..2f08b1ab764f6 100644
> --- a/drivers/firmware/efi/efi-init.c
> +++ b/drivers/firmware/efi/efi-init.c
> @@ -164,12 +164,35 @@ static __init void reserve_regions(void)
>  		pr_info("Processing EFI memory map:\n");
>  
>  	/*
> -	 * Discard memblocks discovered so far: if there are any at this
> -	 * point, they originate from memory nodes in the DT, and UEFI
> -	 * uses its own memory map instead.
> +	 * Discard memblocks discovered so far except for KHO scratch regions.
> +	 * Most memblocks at this point originate from memory nodes in the DT,
> +	 * and UEFI uses its own memory map instead. However, if KHO is enabled,
> +	 * scratch regions must be preserved.
>  	 */
>  	memblock_dump_all();
> -	memblock_remove(0, PHYS_ADDR_MAX);
> +
> +	if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) {

It's better to condition this on kho_get_fdt(), which indicates that we
are actually doing a handover.

> +		struct memblock_region *reg;
> +		phys_addr_t start, size;
> +		int i;
> +
> +		/* Remove all non-KHO regions */
> +		for (i = memblock.memory.cnt - 1; i >= 0; i--) {

Please use for_each_mem_region() 

> +			reg = &memblock.memory.regions[i];
> +			if (!memblock_is_kho_scratch(reg)) {
> +				start = reg->base;
> +				size = reg->size;
> +				memblock_remove(start, size);
> +			}
> +		}
> +	} else {
> +	/*
> +	 * KHO is disabled. Discard memblocks discovered so far: if there
> +	 * are any at this point, they originate from memory nodes in the
> +	 * DT, and UEFI uses its own memory map instead.
> +	 */
> +		memblock_remove(0, PHYS_ADDR_MAX);
> +	}
>  
>  	for_each_efi_memory_desc(md) {
>  		paddr = md->phys_addr;
> -- 
> 2.43.0

-- 
Sincerely yours,
Mike.


* Re: [PATCH] efi: Support booting with kexec handover (KHO)
  2025-08-11  6:39 ` Mike Rapoport
@ 2025-08-14  0:53   ` Evangelos Petrongonas
  2025-08-14  8:51     ` Mike Rapoport
  0 siblings, 1 reply; 4+ messages in thread
From: Evangelos Petrongonas @ 2025-08-14  0:53 UTC
  To: rppt
  Cc: ardb, changyuanl, epetron, graf, kexec, linux-efi, linux-kernel,
	nh-open-source

Hey Mike, thanks for your review,

On Mon, 11 Aug 2025 09:39:50 +0300, Mike Rapoport <rppt@kernel.org> wrote:
> On Fri, Aug 08, 2025 at 04:36:51PM +0000, Evangelos Petrongonas wrote:
> > When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions
> > early during device tree scanning. After kexec, the new kernel
> > exclusively uses these regions for memory allocations during boot, up to
> > the initialization of the page allocator.
> >
> > However, when booting with EFI, EFI's reserve_regions() uses
> > memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before
> > rebuilding them from EFI data. This destroys KHO scratch regions and
> > their flags, thus causing a kernel panic, as there are no scratch
> > memory regions.
> >
> > Instead of wholesale removal, iterate through memory regions and only
> > remove non-KHO ones. This preserves KHO scratch regions while still
> > allowing EFI to rebuild its memory map.
>
> It's worth mentioning that scratch areas are "good known memory" :)
>

I will do so in Rev2.

> > Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
> > ---
> >
> > Reproduction/Verification Steps
> > The issue and the fix can be reproduced/verified by booting a VM with
> > EFI and attempting to perform a KHO-enabled kexec. The fix
> > was developed/tested on arm64.
> >
> >  drivers/firmware/efi/efi-init.c | 31 +++++++++++++++++++++++++++----
> >  1 file changed, 27 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
> > index a00e07b853f22..2f08b1ab764f6 100644
> > --- a/drivers/firmware/efi/efi-init.c
> > +++ b/drivers/firmware/efi/efi-init.c
> > @@ -164,12 +164,35 @@ static __init void reserve_regions(void)
> >  		pr_info("Processing EFI memory map:\n");
> >
> >  	/*
> > -	 * Discard memblocks discovered so far: if there are any at this
> > -	 * point, they originate from memory nodes in the DT, and UEFI
> > -	 * uses its own memory map instead.
> > +	 * Discard memblocks discovered so far except for KHO scratch regions.
> > +	 * Most memblocks at this point originate from memory nodes in the DT,
> > +	 * and UEFI uses its own memory map instead. However, if KHO is enabled,
> > +	 * scratch regions must be preserved.
> >  	 */
> >  	memblock_dump_all();
> > -	memblock_remove(0, PHYS_ADDR_MAX);
> > +
> > +	if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) {
>
> It's better to condition this on kho_get_fdt(), which indicates that we
> are actually doing a handover.
>

Hmm, I see that `kho_get_fdt()` is static. My first instinct was to use
`kho_is_enabled()` instead. Digging a bit deeper into the initialisation
flow, during `setup_arch()`->`efi_init()`, `kho_is_enabled()` returns
true whenever KHO is enabled on the command line, not only when we are
actually booting from a KHO-enabled kexec. When KHO is enabled but no
handover took place, walking the memory regions is functionally a no-op,
but it still adds to the (cold) boot time, although the overhead would
likely be unmeasurable. If we want to avoid that, we might consider
adding another function to the KHO API, like `is_booting_with_kho()`,
that essentially wraps `kho_get_fdt()`. IMO, this is a bit cleaner, as
other components don't necessarily (need to) know about KHO's FDT-based
internal implementation. That being said, I am definitely not the most
experienced person when it comes to API design, so there is a high
chance that I am way off :)

So to sum it up, I see three paths forward:
1. Condition with `kho_is_enabled()` instead of the CONFIG (accepting
   the minor cold boot overhead)
2. Post another patch that extends the KHO API with a wrapper around
   `kho_get_fdt()`, like `is_booting_with_kho()`, indicating that we
   are booting from a KHO handover
3. Post another patch that exports `kho_get_fdt()` directly.

I am happy to implement any of the three, or any other suggestion you
might have.

> > +		struct memblock_region *reg;
> > +		phys_addr_t start, size;
> > +		int i;
> > +
> > +		/* Remove all non-KHO regions */
> > +		for (i = memblock.memory.cnt - 1; i >= 0; i--) {
>
> Please use for_each_mem_region()
>

Todo in Rev2.

> > +			reg = &memblock.memory.regions[i];
> > +			if (!memblock_is_kho_scratch(reg)) {
> > +				start = reg->base;
> > +				size = reg->size;
> > +				memblock_remove(start, size);
> > +			}
> > +		}
> > +	} else {
> > +	/*
> > +	 * KHO is disabled. Discard memblocks discovered so far: if there
> > +	 * are any at this point, they originate from memory nodes in the
> > +	 * DT, and UEFI uses its own memory map instead.
> > +	 */
> > +		memblock_remove(0, PHYS_ADDR_MAX);
> > +	}
> >
> >  	for_each_efi_memory_desc(md) {
> >  		paddr = md->phys_addr;
> > --
> > 2.43.0
>
> --
> Sincerely yours,
> Mike.
>
>

--
Kind Regards,
Evangelos.




* Re: [PATCH] efi: Support booting with kexec handover (KHO)
  2025-08-14  0:53   ` Evangelos Petrongonas
@ 2025-08-14  8:51     ` Mike Rapoport
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Rapoport @ 2025-08-14  8:51 UTC
  To: Evangelos Petrongonas
  Cc: ardb, changyuanl, graf, kexec, linux-efi, linux-kernel,
	nh-open-source

On Thu, Aug 14, 2025 at 12:53:15AM +0000, Evangelos Petrongonas wrote:
> Hey Mike, thanks for your review,
> 
> On Mon, 11 Aug 2025 09:39:50 +0300, Mike Rapoport <rppt@kernel.org> wrote:
> > On Fri, Aug 08, 2025 at 04:36:51PM +0000, Evangelos Petrongonas wrote:
> > > When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions
> > > early during device tree scanning. After kexec, the new kernel
> > > exclusively uses these regions for memory allocations during boot, up to
> > > the initialization of the page allocator.
> > >
> > > However, when booting with EFI, EFI's reserve_regions() uses
> > > memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before
> > > rebuilding them from EFI data. This destroys KHO scratch regions and
> > > their flags, thus causing a kernel panic, as there are no scratch
> > > memory regions.
> > >
> > > Instead of wholesale removal, iterate through memory regions and only
> > > remove non-KHO ones. This preserves KHO scratch regions while still
> > > allowing EFI to rebuild its memory map.
> >
> > It's worth mentioning that scratch areas are "good known memory" :)
> >
> 
> I will do so in Rev2.
> 
> > > Signed-off-by: Evangelos Petrongonas <epetron@amazon.de>
> > > ---
> > >
> > >  	 */
> > >  	memblock_dump_all();
> > > -	memblock_remove(0, PHYS_ADDR_MAX);
> > > +
> > > +	if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) {
> >
> > It's better to condition this on kho_get_fdt(), which indicates that we
> > are actually doing a handover.
> >
> 
> Hmm, I see that `kho_get_fdt()` is static. My first instinct was to use
> `kho_is_enabled()` instead. Digging a bit deeper into the initialisation
> flow, during `setup_arch()`->`efi_init()`, `kho_is_enabled()` returns
> true whenever KHO is enabled on the command line, not only when we are
> actually booting from a KHO-enabled kexec. When KHO is enabled but no
> handover took place, walking the memory regions is functionally a no-op,
> but it still adds to the (cold) boot time, although the overhead would
> likely be unmeasurable. If we want to avoid that, we might consider
> adding another function to the KHO API, like `is_booting_with_kho()`,
> that essentially wraps `kho_get_fdt()`. IMO, this is a bit cleaner, as
> other components don't necessarily (need to) know about KHO's FDT-based
> internal implementation. That being said, I am definitely not the most
> experienced person when it comes to API design, so there is a high
> chance that I am way off :)
> 
> So to sum it up, I see three paths forward:
> 1. Condition with `kho_is_enabled()` instead of the CONFIG (accepting
>    the minor cold boot overhead)
> 2. Post another patch that extends the KHO API with a wrapper around
>    `kho_get_fdt()`, like `is_booting_with_kho()`, indicating that we
>    are booting from a KHO handover
> 3. Post another patch that exports `kho_get_fdt()` directly.

My preference is for the second option; I'd just name it is_kho_boot().
 
> I am happy to implement any of the three, or any other suggestion you
> might have.
> 
> > > +		struct memblock_region *reg;
> > > +		phys_addr_t start, size;
> > > +		int i;
> > > +
> > > +		/* Remove all non-KHO regions */
> > > +		for (i = memblock.memory.cnt - 1; i >= 0; i--) {
> >
> > Please use for_each_mem_region()
> >
> 
> Todo in Rev2.
> 
> --
> Kind Regards,
> Evangelos.

-- 
Sincerely yours,
Mike.



Thread overview: 4+ messages
2025-08-08 16:36 [PATCH] efi: Support booting with kexec handover (KHO) Evangelos Petrongonas
2025-08-11  6:39 ` Mike Rapoport
2025-08-14  0:53   ` Evangelos Petrongonas
2025-08-14  8:51     ` Mike Rapoport
