* [PATCH] efi: Support booting with kexec handover (KHO) @ 2025-08-08 16:36 Evangelos Petrongonas 2025-08-11 6:39 ` Mike Rapoport 0 siblings, 1 reply; 4+ messages in thread From: Evangelos Petrongonas @ 2025-08-08 16:36 UTC (permalink / raw) To: ardb Cc: Evangelos Petrongonas, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec, nh-open-source, linux-efi, linux-kernel When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions early during device tree scanning. After kexec, the new kernel exclusively uses this region for memory allocations during boot up to the initialization of the page allocator However, when booting with EFI, EFI's reserve_regions() uses memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before rebuilding them from EFI data. This destroys KHO scratch regions and their flags, thus causing a kernel panic, as there are no scratch memory regions. Instead of wholesale removal, iterate through memory regions and only remove non-KHO ones. This preserves KHO scratch regions while still allowing EFI to rebuild its memory map. Signed-off-by: Evangelos Petrongonas <epetron@amazon.de> --- Reproduction/Verification Steps The issue and the fix can be reproduced/verified by booting a VM with EFI and attempting to perform a KHO enabled kexec. The fix was developed/tested on arm64. drivers/firmware/efi/efi-init.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c index a00e07b853f22..2f08b1ab764f6 100644 --- a/drivers/firmware/efi/efi-init.c +++ b/drivers/firmware/efi/efi-init.c @@ -164,12 +164,35 @@ static __init void reserve_regions(void) pr_info("Processing EFI memory map:\n"); /* - * Discard memblocks discovered so far: if there are any at this - * point, they originate from memory nodes in the DT, and UEFI - * uses its own memory map instead. + * Discard memblocks discovered so far except for KHO scratch regions. + * Most memblocks at this point originate from memory nodes in the DT, + * and UEFI uses its own memory map instead. However, if KHO is enabled, + * scratch regions must be preserved. */ memblock_dump_all(); - memblock_remove(0, PHYS_ADDR_MAX); + + if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) { + struct memblock_region *reg; + phys_addr_t start, size; + int i; + + /* Remove all non-KHO regions */ + for (i = memblock.memory.cnt - 1; i >= 0; i--) { + reg = &memblock.memory.regions[i]; + if (!memblock_is_kho_scratch(reg)) { + start = reg->base; + size = reg->size; + memblock_remove(start, size); + } + } + } else { + /* + * KHO is disabled. Discard memblocks discovered so far: if there + * are any at this point, they originate from memory nodes in the + * DT, and UEFI uses its own memory map instead. + */ + memblock_remove(0, PHYS_ADDR_MAX); + } for_each_efi_memory_desc(md) { paddr = md->phys_addr; -- 2.43.0 Amazon Web Services Development Center Germany GmbH Tamara-Danz-Str. 13 10243 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] efi: Support booting with kexec handover (KHO) 2025-08-08 16:36 [PATCH] efi: Support booting with kexec handover (KHO) Evangelos Petrongonas @ 2025-08-11 6:39 ` Mike Rapoport 2025-08-14 0:53 ` Evangelos Petrongonas 0 siblings, 1 reply; 4+ messages in thread From: Mike Rapoport @ 2025-08-11 6:39 UTC (permalink / raw) To: Evangelos Petrongonas Cc: ardb, Alexander Graf, Changyuan Lyu, kexec, nh-open-source, linux-efi, linux-kernel On Fri, Aug 08, 2025 at 04:36:51PM +0000, Evangelos Petrongonas wrote: > When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions > early during device tree scanning. After kexec, the new kernel > exclusively uses this region for memory allocations during boot up to > the initialization of the page allocator > > However, when booting with EFI, EFI's reserve_regions() uses > memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before > rebuilding them from EFI data. This destroys KHO scratch regions and > their flags, thus causing a kernel panic, as there are no scratch > memory regions. > > Instead of wholesale removal, iterate through memory regions and only > remove non-KHO ones. This preserves KHO scratch regions while still > allowing EFI to rebuild its memory map. It's worth mentioning that scratch areas are "good known memory" :) > Signed-off-by: Evangelos Petrongonas <epetron@amazon.de> > --- > > Reproduction/Verification Steps > The issue and the fix can be reproduced/verified by booting a VM with > EFI and attempting to perform a KHO enabled kexec. The fix > was developed/tested on arm64. > > drivers/firmware/efi/efi-init.c | 31 +++++++++++++++++++++++++++---- > 1 file changed, 27 insertions(+), 4 deletions(-) > > diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c > index a00e07b853f22..2f08b1ab764f6 100644 > --- a/drivers/firmware/efi/efi-init.c > +++ b/drivers/firmware/efi/efi-init.c > @@ -164,12 +164,35 @@ static __init void reserve_regions(void) > pr_info("Processing EFI memory map:\n"); > > /* > - * Discard memblocks discovered so far: if there are any at this > - * point, they originate from memory nodes in the DT, and UEFI > - * uses its own memory map instead. > + * Discard memblocks discovered so far except for KHO scratch regions. > + * Most memblocks at this point originate from memory nodes in the DT, > + * and UEFI uses its own memory map instead. However, if KHO is enabled, > + * scratch regions must be preserved. > */ > memblock_dump_all(); > - memblock_remove(0, PHYS_ADDR_MAX); > + > + if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) { It's better to condition this on kho_get_fdt() that means that we are actually doing a handover. > + struct memblock_region *reg; > + phys_addr_t start, size; > + int i; > + > + /* Remove all non-KHO regions */ > + for (i = memblock.memory.cnt - 1; i >= 0; i--) { Please use for_each_mem_region() > + reg = &memblock.memory.regions[i]; > + if (!memblock_is_kho_scratch(reg)) { > + start = reg->base; > + size = reg->size; > + memblock_remove(start, size); > + } > + } > + } else { > + /* > + * KHO is disabled. Discard memblocks discovered so far: if there > + * are any at this point, they originate from memory nodes in the > + * DT, and UEFI uses its own memory map instead. > + */ > + memblock_remove(0, PHYS_ADDR_MAX); > + } > > for_each_efi_memory_desc(md) { > paddr = md->phys_addr; > -- > 2.43.0 -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] efi: Support booting with kexec handover (KHO) 2025-08-11 6:39 ` Mike Rapoport @ 2025-08-14 0:53 ` Evangelos Petrongonas 2025-08-14 8:51 ` Mike Rapoport 0 siblings, 1 reply; 4+ messages in thread From: Evangelos Petrongonas @ 2025-08-14 0:53 UTC (permalink / raw) To: rppt Cc: ardb, changyuanl, epetron, graf, kexec, linux-efi, linux-kernel, nh-open-source Hey Mike, thanks for your review, On Mon, 11 Aug 2025 09:39:50 +0300, Mike Rapoport <rppt@kernel.org> wrote: > On Fri, Aug 08, 2025 at 04:36:51PM +0000, Evangelos Petrongonas wrote: > > When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions > > early during device tree scanning. After kexec, the new kernel > > exclusively uses this region for memory allocations during boot up to > > the initialization of the page allocator > > > > However, when booting with EFI, EFI's reserve_regions() uses > > memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before > > rebuilding them from EFI data. This destroys KHO scratch regions and > > their flags, thus causing a kernel panic, as there are no scratch > > memory regions. > > > > Instead of wholesale removal, iterate through memory regions and only > > remove non-KHO ones. This preserves KHO scratch regions while still > > allowing EFI to rebuild its memory map. > > It's worth mentioning that scratch areas are "good known memory" :) > I Will do so on Rev2. > > Signed-off-by: Evangelos Petrongonas <epetron@amazon.de> > > --- > > > > Reproduction/Verification Steps > > The issue and the fix can be reproduced/verified by booting a VM with > > EFI and attempting to perform a KHO enabled kexec. The fix > > was developed/tested on arm64. > > > > drivers/firmware/efi/efi-init.c | 31 +++++++++++++++++++++++++++---- > > 1 file changed, 27 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c > > index a00e07b853f22..2f08b1ab764f6 100644 > > --- a/drivers/firmware/efi/efi-init.c > > +++ b/drivers/firmware/efi/efi-init.c > > @@ -164,12 +164,35 @@ static __init void reserve_regions(void) > > pr_info("Processing EFI memory map:\n"); > > > > /* > > - * Discard memblocks discovered so far: if there are any at this > > - * point, they originate from memory nodes in the DT, and UEFI > > - * uses its own memory map instead. > > + * Discard memblocks discovered so far except for KHO scratch regions. > > + * Most memblocks at this point originate from memory nodes in the DT, > > + * and UEFI uses its own memory map instead. However, if KHO is enabled, > > + * scratch regions must be preserved. > > */ > > memblock_dump_all(); > > - memblock_remove(0, PHYS_ADDR_MAX); > > + > > + if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) { > > It's better to condition this on kho_get_fdt() that means that we are > actually doing a handover. > Hmm, I see that `kho_get_fdt()` is static. My first instinct was to use kho_enable() instead. Diving a bit more into the initialisation flow, during the `setup_arch()`->`efi_init()`, `kho_enable()` will return true if kho is enabled in the cmdline, but not if we are actually doing a KHO enabled kexec. However, in this case, the parsing of memory regions is going to be a noop in terms of functionality, but will contribute, negatively —though the overhead would likely be unmeasurable to the (cold) boot time. If we want to avoid that, we might consider adding another function to the KHO API, like `is_booting_with_kho()`, that practically wraps the `kho_get_fdt()`. IMO, it feels a bit cleaner this way, as other components don't necessarily (need to) know the internal FDT based implementation of KHO. That being said, I am definitely not the most experienced person when it comes to API design, so there is a high chance that I am way off :) So to sum it up, I see three paths forward: 1. Condition with `kho_is_enabled()` instead of the CONFIG (accepting the minor cold boot overhead) 2. Post another patch that extends the KHO API, adding a wrapper for the `kho_get_fdt()`, like `is_booting_with_kho()` indicating that we are booting with KHO enabled 3. Post another patch that exports the `kho_get_fdt()` directly. I am happy to implement any of the three, or any other suggestion you might have. > > + struct memblock_region *reg; > > + phys_addr_t start, size; > > + int i; > > + > > + /* Remove all non-KHO regions */ > > + for (i = memblock.memory.cnt - 1; i >= 0; i--) { > > Please use for_each_mem_region() > Todo in Rev2. > > + reg = &memblock.memory.regions[i]; > > + if (!memblock_is_kho_scratch(reg)) { > > + start = reg->base; > > + size = reg->size; > > + memblock_remove(start, size); > > + } > > + } > > + } else { > > + /* > > + * KHO is disabled. Discard memblocks discovered so far: if there > > + * are any at this point, they originate from memory nodes in the > > + * DT, and UEFI uses its own memory map instead. > > + */ > > + memblock_remove(0, PHYS_ADDR_MAX); > > + } > > > > for_each_efi_memory_desc(md) { > > paddr = md->phys_addr; > > -- > > 2.43.0 > > -- > Sincerely yours, > Mike. > > -- Kind Regards, Evangelos. Amazon Web Services Development Center Germany GmbH Tamara-Danz-Str. 13 10243 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597 ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] efi: Support booting with kexec handover (KHO) 2025-08-14 0:53 ` Evangelos Petrongonas @ 2025-08-14 8:51 ` Mike Rapoport 0 siblings, 0 replies; 4+ messages in thread From: Mike Rapoport @ 2025-08-14 8:51 UTC (permalink / raw) To: Evangelos Petrongonas Cc: ardb, changyuanl, graf, kexec, linux-efi, linux-kernel, nh-open-source On Thu, Aug 14, 2025 at 12:53:15AM +0000, Evangelos Petrongonas wrote: > Hey Mike, thanks for your review, > > On Mon, 11 Aug 2025 09:39:50 +0300, Mike Rapoport <rppt@kernel.org> wrote: > > On Fri, Aug 08, 2025 at 04:36:51PM +0000, Evangelos Petrongonas wrote: > > > When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions > > > early during device tree scanning. After kexec, the new kernel > > > exclusively uses this region for memory allocations during boot up to > > > the initialization of the page allocator > > > > > > However, when booting with EFI, EFI's reserve_regions() uses > > > memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before > > > rebuilding them from EFI data. This destroys KHO scratch regions and > > > their flags, thus causing a kernel panic, as there are no scratch > > > memory regions. > > > > > > Instead of wholesale removal, iterate through memory regions and only > > > remove non-KHO ones. This preserves KHO scratch regions while still > > > allowing EFI to rebuild its memory map. > > > > It's worth mentioning that scratch areas are "good known memory" :) > > > > I Will do so on Rev2. > > > > Signed-off-by: Evangelos Petrongonas <epetron@amazon.de> > > > --- > > > > > > */ > > > memblock_dump_all(); > > > - memblock_remove(0, PHYS_ADDR_MAX); > > > + > > > + if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) { > > > > It's better to condition this on kho_get_fdt() that means that we are > > actually doing a handover. > > > > Hmm, I see that `kho_get_fdt()` is static. My first instinct was to use > kho_enable() instead. Diving a bit more into the initialisation flow, > during the `setup_arch()`->`efi_init()`, `kho_enable()` will return > true if kho is enabled in the cmdline, but not if we are actually doing > a KHO enabled kexec. However, in this case, the parsing of memory > regions is going to be a noop in terms of functionality, but will > contribute, negatively —though the overhead would likely be > unmeasurable to the (cold) boot time. If we want to avoid that, we > might consider adding another function to the KHO API, like > `is_booting_with_kho()`, that practically wraps the `kho_get_fdt()`. > IMO, it feels a bit cleaner this way, as other components don't > necessarily (need to) know the internal FDT based implementation of > KHO. That being said, I am definitely not the most experienced person > when it comes to API design, so there is a high chance that I am way > off :) > > So to sum it up, I see three paths forward: > 1. Condition with `kho_is_enabled()` instead of the CONFIG (accepting > the minor cold boot overhead) > 2. Post another patch that extends the KHO API, adding a wrapper for > the `kho_get_fdt()`, like `is_booting_with_kho()` indicating that we > are booting with KHO enabled > 3. Post another patch that exports the `kho_get_fdt()` directly. My preference is for the second option, I'd just name it is_kho_boot() > I am happy to implement any of the three, or any other suggestion you > might have. > > > > + struct memblock_region *reg; > > > + phys_addr_t start, size; > > > + int i; > > > + > > > + /* Remove all non-KHO regions */ > > > + for (i = memblock.memory.cnt - 1; i >= 0; i--) { > > > > Please use for_each_mem_region() > > > > Todo in Rev2. > > -- > Kind Regards, > Evangelos. -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-08-14 8:51 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-08 16:36 [PATCH] efi: Support booting with kexec handover (KHO) Evangelos Petrongonas 2025-08-11 6:39 ` Mike Rapoport 2025-08-14 0:53 ` Evangelos Petrongonas 2025-08-14 8:51 ` Mike Rapoport
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).