* [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map
@ 2026-03-19 9:05 Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 01/19] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
` (19 more replies)
0 siblings, 20 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
At boot, x86 uses E820 tables, memblock tables and the EFI memory map to
reason about which parts of system RAM are available to the OS, and
which are reserved.
While other EFI architectures treat the EFI memory map as immutable, the
x86 boot code modifies it to keep track of memory reservations of boot
services data regions, in order to distinguish which parts have been
memblock_reserve()'d permanently, and which ones have been reserved only
temporarily to work around buggy implementations of the EFI runtime
service [SetVirtualAddressMap()] that reconfigures the VA space of the
runtime services themselves.
This method is mostly fine for marking entire regions as reserved, but
it gets complicated when the code decides to split EFI memory map
entries in order to mark some of it permanently reserved, and the rest
of it temporarily reserved.
Let's clean this up, by
- marking permanent reservations of EFI boot services data memory as
  MEMBLOCK_RSRV_KERN
- taking this marking into account when deciding whether or not an EFI
  boot services data region can be freed
- dropping all of the EFI memory map insertion/splitting logic and the
  allocation/freeing logic, all of which have become redundant.
Changes since v1:
- Also get rid of all reallocation logic, and just reuse the initial
allocation throughout, and keep track of the number of valid entries
- Drop abuse of the EFI_MEMORY_RUNTIME flag
- Add acks from Mike to #1-#2
This v2 now gets rid of all manipulations of the EFI memory map except
for setting the virtual address field and suppressing unwanted entries.
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ard Biesheuvel (19):
memblock: Permit existing reserved regions to be marked RSRV_KERN
efi: Tag memblock reservations of boot services regions as RSRV_KERN
x86/efi: Unmap kernel-reserved boot regions from EFI page tables
x86/efi: Drop EFI_MEMORY_RUNTIME check from __ioremap_check_other()
x86/efi: Omit RSRV_KERN memblock reservations when freeing boot
regions
x86/efi: Defer sub-1M check from unmap to free stage
x86/efi: Simplify real mode trampoline allocation quirk
x86/efi: Omit redundant kernel image overlap check
x86/efi: Drop redundant EFI_PARAVIRT check
x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry
splitting
efi: Use nr_map not map_end to find the last valid memory map entry
x86/efi: Only merge EFI memory map entries on 32-bit systems
x86/efi: Clean the memory map using iterator and filter API
x86/efi: Update the runtime map in place
x86/efi: Use iterator API when mapping EFI regions for runtime
x86/efi: Reuse memory map instead of reallocating it
x86/efi: Defer compaction of the EFI memory map
x86/efi: Do not abuse RUNTIME bit to mark boot regions as reserved
x86/efi: Free unused tail of the EFI memory map
arch/x86/include/asm/efi.h | 15 +-
arch/x86/mm/ioremap.c | 6 +-
arch/x86/platform/efi/Makefile | 2 +-
arch/x86/platform/efi/efi.c | 247 ++++-------------
arch/x86/platform/efi/efi_32.c | 31 +++
arch/x86/platform/efi/memmap.c | 250 -----------------
arch/x86/platform/efi/quirks.c | 287 ++++++--------------
arch/x86/platform/efi/runtime-map.c | 4 +-
drivers/firmware/efi/arm-runtime.c | 2 +-
drivers/firmware/efi/efi.c | 4 +-
drivers/firmware/efi/memattr.c | 2 +-
drivers/firmware/efi/memmap.c | 8 +-
drivers/firmware/efi/riscv-runtime.c | 2 +-
include/linux/efi.h | 29 +-
include/linux/memblock.h | 1 +
mm/memblock.c | 15 +
16 files changed, 236 insertions(+), 669 deletions(-)
delete mode 100644 arch/x86/platform/efi/memmap.c
base-commit: 1f318b96cc84d7c2ab792fcc0bfd42a7ca890681
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 01/19] memblock: Permit existing reserved regions to be marked RSRV_KERN
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 02/19] efi: Tag memblock reservations of boot services regions as RSRV_KERN Ard Biesheuvel
` (18 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Permit existing memblock reservations to be marked as RSRV_KERN. This
will be used by the EFI code on x86 to distinguish between reservations
of boot services data regions that have actual significance to the
kernel and regions that are reserved temporarily to work around buggy
firmware.
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 6ec5e9ac0699..9eac4f268359 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -155,6 +155,7 @@ int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
int memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t size);
+int memblock_reserved_mark_kern(phys_addr_t base, phys_addr_t size);
int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size);
int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index b3ddfdec7a80..2505ce8b319c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1115,6 +1115,21 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
MEMBLOCK_RSRV_NOINIT);
}
+/**
+ * memblock_reserved_mark_kern - Mark a reserved memory region with flag
+ * MEMBLOCK_RSRV_KERN
+ *
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_reserved_mark_kern(phys_addr_t base, phys_addr_t size)
+{
+ return memblock_setclr_flag(&memblock.reserved, base, size, 1,
+ MEMBLOCK_RSRV_KERN);
+}
+
/**
* memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
* @base: the base phys addr of the region
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 02/19] efi: Tag memblock reservations of boot services regions as RSRV_KERN
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 01/19] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 03/19] x86/efi: Unmap kernel-reserved boot regions from EFI page tables Ard Biesheuvel
` (17 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
By definition, EFI memory regions of type boot services code or data
have no special significance to the firmware at runtime, only to the OS.
In some cases, the firmware will allocate tables and other assets that
are passed in memory in regions of this type, and leave it up to the OS
to decide whether or not to treat the allocation as special, or simply
consume the contents at boot and recycle the RAM for ordinary use. The
reason for this approach is that it avoids needless memory reservations
for assets that the OS knows nothing about, and therefore doesn't know
how to free either.
This means that any memblock reservations covering such regions can be
marked as MEMBLOCK_RSRV_KERN - this is a better match semantically, and
is useful on x86 to distinguish true reservations from temporary
reservations that are only needed to work around firmware bugs.
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
drivers/firmware/efi/efi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b2fb92a4bbd1..e4ab7481bbf6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -600,7 +600,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
return;
if (!memblock_is_region_reserved(addr, size))
- memblock_reserve(addr, size);
+ memblock_reserve_kern(addr, size);
+ else
+ memblock_reserved_mark_kern(addr, size);
/*
* Some architectures (x86) reserve all boot services ranges
--
2.53.0.851.ga537e3e6e9-goog
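The reserve-or-mark decision that the patch above adds to efi_mem_reserve() can be sketched in user space. This is only a rough model: it assumes a per-page flag array instead of memblock's sorted region lists, and every toy_* name and TOY_* constant is hypothetical rather than a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGES	16u	/* pages in the toy address space (assumption) */
#define TOY_RESERVED	0x1u	/* page is reserved (hypothetical flag) */
#define TOY_RSRV_KERN	0x2u	/* reservation matters to the kernel (hypothetical flag) */

static uint8_t toy_page_flags[TOY_PAGES];

/* True if any page in [first, first + count) is already reserved. */
static bool toy_is_region_reserved(size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		if (toy_page_flags[i] & TOY_RESERVED)
			return true;
	return false;
}

/* Reserve the pages and tag the reservation as kernel-owned in one step. */
static void toy_reserve_kern(size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		toy_page_flags[i] |= TOY_RESERVED | TOY_RSRV_KERN;
}

/* Tag pages that are already reserved; unreserved pages are left alone. */
static void toy_reserved_mark_kern(size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		if (toy_page_flags[i] & TOY_RESERVED)
			toy_page_flags[i] |= TOY_RSRV_KERN;
}

/* Same branch structure as the patched efi_mem_reserve(). */
static void toy_efi_mem_reserve(size_t first, size_t count)
{
	assert(first <= TOY_PAGES && count <= TOY_PAGES - first);

	if (!toy_is_region_reserved(first, count))
		toy_reserve_kern(first, count);
	else
		toy_reserved_mark_kern(first, count);
}
```

In this model a page that was reserved only temporarily keeps the plain TOY_RESERVED flag, so a later pass can tell it apart from a page that toy_efi_mem_reserve() claimed for good.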
* [PATCH v2 03/19] x86/efi: Unmap kernel-reserved boot regions from EFI page tables
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 01/19] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 02/19] efi: Tag memblock reservations of boot services regions as RSRV_KERN Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 04/19] x86/efi: Drop EFI_MEMORY_RUNTIME check from __ioremap_check_other() Ard Biesheuvel
` (16 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Currently, the logic that unmaps boot services code and data regions
that were mapped temporarily to work around firmware bugs disregards
regions that have been marked as EFI_MEMORY_RUNTIME. However, such
regions only have significance to the OS, and there is no reason to
retain the mapping in the EFI page tables, given that the runtime
firmware must never touch those regions.
So pull the unmap forward.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 35caa5746115..30b8012eafaa 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -442,12 +442,6 @@ void __init efi_unmap_boot_services(void)
continue;
}
- /* Do not free, someone else owns it: */
- if (md->attribute & EFI_MEMORY_RUNTIME) {
- num_entries++;
- continue;
- }
-
/*
* Before calling set_virtual_address_map(), EFI boot services
* code/data regions were mapped as a quirk for buggy firmware.
@@ -455,6 +449,12 @@ void __init efi_unmap_boot_services(void)
*/
efi_unmap_pages(md);
+ /* Do not free, someone else owns it: */
+ if (md->attribute & EFI_MEMORY_RUNTIME) {
+ num_entries++;
+ continue;
+ }
+
/*
* Nasty quirk: if all sub-1MB memory is used for boot
* services, we can get here without having allocated the
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 04/19] x86/efi: Drop EFI_MEMORY_RUNTIME check from __ioremap_check_other()
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (2 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 03/19] x86/efi: Unmap kernel-reserved boot regions from EFI page tables Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 05/19] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions Ard Biesheuvel
` (15 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
__ioremap_check_other() is called when memremap() is used on memory that
turns out to be reserved. This may be the case for ESRT or MOK tables
that are reserved via efi_mem_reserve(), in which case they will be
covered by EfiBootServicesData entries in the EFI memory map.
Such entries are created with the EFI_MEMORY_RUNTIME attribute set, to
distinguish them from EfiBootServicesData entries that were reserved
only temporarily, in order to work around firmware bugs.
However, given that
a) __ioremap_check_other() is only called for memory that could not be
mapped using try_ram_remap(),
b) on x86, the EFI memory map only retains EfiBootServicesData entries that
cover a permanent reservation,
the EFI_MEMORY_RUNTIME check is redundant, and can be dropped.
This removes the need to set this attribute in the first place, which is
desirable as it results in considerable complexity in managing the EFI
memory map on x86. This is implemented in subsequent patches.
While at it, use switch() rather than if() to avoid multiple calls to
efi_mem_type(), which is backed by a hypervisor call in some cases.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/mm/ioremap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 12c8180ca1ba..2d0e1cfb9054 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -124,9 +124,9 @@ static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *des
if (!IS_ENABLED(CONFIG_EFI))
return;
- if (efi_mem_type(addr) == EFI_RUNTIME_SERVICES_DATA ||
- (efi_mem_type(addr) == EFI_BOOT_SERVICES_DATA &&
- efi_mem_attributes(addr) & EFI_MEMORY_RUNTIME))
+ switch (efi_mem_type(addr))
+ case EFI_RUNTIME_SERVICES_DATA:
+ case EFI_BOOT_SERVICES_DATA:
desc->flags |= IORES_MAP_ENCRYPTED;
}
--
2.53.0.851.ga537e3e6e9-goog
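The point about evaluating the memory type only once can be illustrated with a small stand-alone sketch. It assumes a fake lookup function with a call counter; toy_mem_type(), the address thresholds, and the enum values are all invented for the example and do not correspond to the kernel's efi_mem_type().

```c
#include <stdbool.h>

enum toy_mem_type {
	TOY_CONVENTIONAL,
	TOY_BOOT_SERVICES_DATA,
	TOY_RUNTIME_SERVICES_DATA,
};

/* Counts lookups so the cost of each call site is visible. */
static unsigned int toy_lookups;

/* Hypothetical lookup: the address ranges are arbitrary. */
static enum toy_mem_type toy_mem_type(unsigned long addr)
{
	toy_lookups++;
	if (addr < 0x1000)
		return TOY_CONVENTIONAL;
	if (addr < 0x2000)
		return TOY_BOOT_SERVICES_DATA;
	return TOY_RUNTIME_SERVICES_DATA;
}

/* The switch evaluates the lookup once, whichever case matches. */
static bool toy_needs_encrypted_mapping(unsigned long addr)
{
	switch (toy_mem_type(addr)) {
	case TOY_RUNTIME_SERVICES_DATA:
	case TOY_BOOT_SERVICES_DATA:
		return true;
	default:
		return false;
	}
}
```

An if() chain that compares the lookup result against each type in turn would call the lookup up to twice for a boot services data address, which matters when the lookup is expensive.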
* [PATCH v2 05/19] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (3 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 04/19] x86/efi: Drop EFI_MEMORY_RUNTIME check from __ioremap_check_other() Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 06/19] x86/efi: Defer sub-1M check from unmap to free stage Ard Biesheuvel
` (14 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Instead of freeing all EFI boot services code and data regions that were
preliminarily reserved during early boot to work around buggy firmware,
take care to only free those parts that are not marked as
MEMBLOCK_RSRV_KERN. This marking is used by the generic implementation
of efi_mem_reserve() to mark things like informational tables that are
provided to the OS by the firmware, but where the contents of memory
have no significance to the firmware itself. Such assets are often
passed in an EFI boot services data region, leaving it to the OS to decide
whether it needs to be reserved or not.
This removes the need to mark such regions as EFI_MEMORY_RUNTIME, which
is a hack that results in a lot of complexity in updating and
re-allocating the EFI memory map, which would otherwise not need to be
modified at all. Note that x86 is the only EFI arch that does any of
this; others just treat the EFI memory map as immutable.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 40 +++++++++++++++++---
1 file changed, 35 insertions(+), 5 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 30b8012eafaa..906e29754026 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -536,6 +536,40 @@ void __init efi_unmap_boot_services(void)
}
}
+static unsigned long __init
+efi_free_unreserved_subregions(u64 range_start, u64 range_end)
+{
+ struct memblock_region *region;
+ unsigned long freed = 0;
+
+ for_each_reserved_mem_region(region) {
+ u64 region_end = region->base + region->size;
+ u64 start, end;
+
+ /* memblock tables are sorted so no need to carry on */
+ if (region->base >= range_end)
+ break;
+
+ if (region_end < range_start)
+ continue;
+
+ if (region->flags & MEMBLOCK_RSRV_KERN)
+ continue;
+
+ start = PAGE_ALIGN(max(range_start, region->base));
+ end = PAGE_ALIGN_DOWN(min(range_end, region_end));
+
+ if (start >= end)
+ continue;
+
+ free_reserved_area(phys_to_virt(start),
+ phys_to_virt(end), -1, NULL);
+ freed += (end - start);
+ }
+
+ return freed;
+}
+
static int __init efi_free_boot_services(void)
{
struct efi_freeable_range *range = ranges_to_free;
@@ -545,11 +579,7 @@ static int __init efi_free_boot_services(void)
return 0;
while (range->start) {
- void *start = phys_to_virt(range->start);
- void *end = phys_to_virt(range->end);
-
- free_reserved_area(start, end, -1, NULL);
- freed += (end - start);
+ freed += efi_free_unreserved_subregions(range->start, range->end);
range++;
}
kfree(ranges_to_free);
--
2.53.0.851.ga537e3e6e9-goog
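The subregion walk in efi_free_unreserved_subregions() can be approximated in user space as follows. This is a sketch under stated assumptions: the reserved regions are given as a sorted array, a 4 KiB page size is assumed, and the function only counts the bytes that would be freed instead of calling free_reserved_area(). All toy_* names and TOY_* constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096ull	/* assumed page size */
#define TOY_RSRV_KERN	0x1u	/* hypothetical stand-in for MEMBLOCK_RSRV_KERN */

struct toy_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

static uint64_t toy_align_up(uint64_t x)
{
	return (x + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}

static uint64_t toy_align_down(uint64_t x)
{
	return x & ~(TOY_PAGE_SIZE - 1);
}

/*
 * Count the bytes in [range_start, range_end) that are covered by
 * reserved regions without the kernel-owned flag, using whole pages only.
 * The regions array must be sorted by base address.
 */
static uint64_t toy_freeable_bytes(const struct toy_region *regions, size_t n,
				   uint64_t range_start, uint64_t range_end)
{
	uint64_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t region_end = regions[i].base + regions[i].size;
		uint64_t start, end;

		/* Sorted input: nothing further can overlap the range. */
		if (regions[i].base >= range_end)
			break;

		if (region_end < range_start)
			continue;

		/* Kernel-owned reservations must survive. */
		if (regions[i].flags & TOY_RSRV_KERN)
			continue;

		/* Shrink the overlap inward to page boundaries. */
		start = toy_align_up(range_start > regions[i].base ?
				     range_start : regions[i].base);
		end = toy_align_down(range_end < region_end ?
				     range_end : region_end);

		if (start >= end)
			continue;

		freed += end - start;
	}

	return freed;
}
```

Rounding the start up and the end down means a partially covered page is never counted, which is the conservative choice when a neighbouring reservation may share that page.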
* [PATCH v2 06/19] x86/efi: Defer sub-1M check from unmap to free stage
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (4 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 05/19] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 07/19] x86/efi: Simplify real mode trampoline allocation quirk Ard Biesheuvel
` (13 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
As a first step towards moving the free logic to a later stage
altogether, and only keeping the unmap and the realmode trampoline hack
during the early stage of freeing the boot service code and data
regions, move the logic that avoids freeing memory below 1M to the later
stage.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 24 +++++++++-----------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 906e29754026..25f51d673ad6 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -475,18 +475,6 @@ void __init efi_unmap_boot_services(void)
size -= rm_size;
}
- /*
- * Don't free memory under 1M for two reasons:
- * - BIOS might clobber it
- * - Crash kernel needs it to be reserved
- */
- if (start + size < SZ_1M)
- continue;
- if (start < SZ_1M) {
- size -= (SZ_1M - start);
- start = SZ_1M;
- }
-
/*
* With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
* map are still not initialized and we can't reliably free
@@ -579,7 +567,17 @@ static int __init efi_free_boot_services(void)
return 0;
while (range->start) {
- freed += efi_free_unreserved_subregions(range->start, range->end);
+ /*
+ * Don't free memory under 1M for two reasons:
+ * - BIOS might clobber it
+ * - Crash kernel needs it to be reserved
+ */
+ u64 start = max(range->start, SZ_1M);
+
+ if (start >= range->end)
+ continue;
+
+ freed += efi_free_unreserved_subregions(start, range->end);
range++;
}
kfree(ranges_to_free);
--
2.53.0.851.ga537e3e6e9-goog
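The sub-1M clamp applied at the free stage can be modelled with a short stand-alone sketch. It assumes a zero-terminated array of ranges like ranges_to_free, and it sums the bytes above 1M instead of freeing anything. struct toy_range, toy_bytes_above_1m() and TOY_SZ_1M are hypothetical names. The sketch uses a for loop so that a skipped range still advances the cursor.

```c
#include <stdint.h>

#define TOY_SZ_1M	0x100000ull	/* 1 MiB boundary */

struct toy_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Walk a list terminated by an entry with start == 0 and add up the part
 * of each range that lies at or above 1M. Ranges entirely below 1M
 * contribute nothing.
 */
static uint64_t toy_bytes_above_1m(const struct toy_range *range)
{
	uint64_t total = 0;

	for (; range->start; range++) {
		uint64_t start = range->start > TOY_SZ_1M ?
				 range->start : TOY_SZ_1M;

		if (start >= range->end)
			continue;

		total += range->end - start;
	}

	return total;
}
```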
* [PATCH v2 07/19] x86/efi: Simplify real mode trampoline allocation quirk
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (5 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 06/19] x86/efi: Defer sub-1M check from unmap to free stage Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 08/19] x86/efi: Omit redundant kernel image overlap check Ard Biesheuvel
` (12 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
To work around a common bug in EFI firmware for x86 systems, Linux
reserves all EFI boot services code and data regions until after it has
invoked the SetVirtualAddressMap() EFI runtime service. This is needed
because those regions may still be accessed by the firmware during that
call, even though the EFI spec says that they shouldn't.
This includes any boot services data regions below 1M, which might mean
that by the time the real mode trampoline is being allocated, all memory
below 1M is already exhausted.
Commit
5bc653b73182 ("x86/efi: Allocate a trampoline if needed in efi_free_boot_services()")
added a quirk to detect this condition, and to make another attempt at
allocating the real mode trampoline when freeing those boot services
regions again. This is a rather crude hack, which gets in the way of
cleanup work on the EFI/x86 memory map handling code.
Given that
- the real mode trampoline is normally allocated soon after all EFI boot
services regions are reserved temporarily,
- this allocation logic marks all memory below 1M as reserved,
- the trampoline memory is not actually populated until an early
initcall,
there is actually no need to reserve any boot services regions below 1M,
even if they are mapped into the EFI page tables during the call to
SetVirtualAddressMap(). So cap the lower bound of the reserved regions
to 1M, and fix up the size accordingly when making the reservation. This
allows the additional quirk to be dropped entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 29 ++++----------------
1 file changed, 6 insertions(+), 23 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 25f51d673ad6..c867153eab8a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -329,10 +329,14 @@ void __init efi_reserve_boot_services(void)
return;
for_each_efi_memory_desc(md) {
- u64 start = md->phys_addr;
- u64 size = md->num_pages << EFI_PAGE_SHIFT;
+ u64 start = max(md->phys_addr, SZ_1M);
+ u64 end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
+ u64 size = end - start;
bool already_reserved;
+ if (end < start)
+ continue;
+
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA)
continue;
@@ -434,7 +438,6 @@ void __init efi_unmap_boot_services(void)
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
- size_t rm_size;
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA) {
@@ -455,26 +458,6 @@ void __init efi_unmap_boot_services(void)
continue;
}
- /*
- * Nasty quirk: if all sub-1MB memory is used for boot
- * services, we can get here without having allocated the
- * real mode trampoline. It's too late to hand boot services
- * memory back to the memblock allocator, so instead
- * try to manually allocate the trampoline if needed.
- *
- * I've seen this on a Dell XPS 13 9350 with firmware
- * 1.4.4 with SGX enabled booting Linux via Fedora 24's
- * grub2-efi on a hard disk. (And no, I don't know why
- * this happened, but Linux should still try to boot rather
- * panicking early.)
- */
- rm_size = real_mode_size_needed();
- if (rm_size && (start + rm_size) < (1<<20) && size >= rm_size) {
- set_real_mode_mem(start);
- start += rm_size;
- size -= rm_size;
- }
-
/*
* With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
* map are still not initialized and we can't reliably free
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 08/19] x86/efi: Omit redundant kernel image overlap check
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (6 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 07/19] x86/efi: Simplify real mode trampoline allocation quirk Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 09/19] x86/efi: Drop redundant EFI_PARAVIRT check Ard Biesheuvel
` (11 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
The physical region covering the kernel's executable image is
memblock_reserve()'d in early_reserve_memory(), and so it is guaranteed not
to intersect with the regions passed to can_free_region(). So remove the
pointless overlap check.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index c867153eab8a..13d9e036a23a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -305,16 +305,11 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
* can free regions in efi_free_boot_services().
*
* Use this function to ensure we do not free regions owned by somebody
- * else. We must only reserve (and then free) regions:
- *
- * - Not within any part of the kernel
- * - Not the BIOS reserved area (E820_TYPE_RESERVED, E820_TYPE_NVS, etc)
+ * else. We must only reserve (and then free) regions that do not intersect
+ * with the BIOS reserved area (E820_TYPE_RESERVED, E820_TYPE_NVS, etc)
*/
static __init bool can_free_region(u64 start, u64 size)
{
- if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
- return false;
-
if (!e820__mapped_all(start, start+size, E820_TYPE_RAM))
return false;
@@ -347,10 +342,8 @@ void __init efi_reserve_boot_services(void)
* Because the following memblock_reserve() is paired
* with free_reserved_area() for this region in
* efi_free_boot_services(), we must be extremely
- * careful not to reserve, and subsequently free,
- * critical regions of memory (like the kernel image) or
- * those regions that somebody else has already
- * reserved.
+ * careful not to reserve, and subsequently free, critical
+ * regions of memory that somebody else has already reserved.
*
* A good example of a critical region that must not be
* freed is page zero (first 4Kb of memory), which may
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 09/19] x86/efi: Drop redundant EFI_PARAVIRT check
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (7 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 08/19] x86/efi: Omit redundant kernel image overlap check Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 10/19] x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry splitting Ard Biesheuvel
` (10 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
efi_memblock_x86_reserve_range() exits early if EFI_PARAVIRT is set, so
there is no point in checking it a second time further down.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/efi.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index d84c6020dda1..b60f8454a1ec 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -211,11 +211,9 @@ int __init efi_memblock_x86_reserve_range(void)
data.desc_size = e->efi_memdesc_size;
data.desc_version = e->efi_memdesc_version;
- if (!efi_enabled(EFI_PARAVIRT)) {
- rv = efi_memmap_init_early(&data);
- if (rv)
- return rv;
- }
+ rv = efi_memmap_init_early(&data);
+ if (rv)
+ return rv;
if (add_efi_memmap || do_efi_soft_reserve())
do_add_efi_memmap();
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 10/19] x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry splitting
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (8 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 09/19] x86/efi: Drop redundant EFI_PARAVIRT check Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 11/19] efi: Use nr_map not map_end to find the last valid memory map entry Ard Biesheuvel
` (9 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Now that efi_mem_reserve() has been updated to rely on RSRV_KERN
memblock reservations, it is no longer necessary to mark memblock-reserved
boot services regions as EFI_MEMORY_RUNTIME. This means that existing
entries in the EFI memory map no longer need to be split, removing the
need to re-allocate/copy/remap the entire EFI memory map on every call
to efi_mem_reserve().
So drop this functionality.
Note that, for the time being, the E820 map needs to be consulted to
decide whether an entry with the EFI_MEMORY_RUNTIME bit cleared needs
to be preserved in the runtime map. However, this will be
superseded and removed by a subsequent patch, which combines the map
compaction with the actual freeing, in which case the freeing logic can
answer this question directly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/include/asm/efi.h | 4 -
arch/x86/platform/efi/memmap.c | 138 --------------------
arch/x86/platform/efi/quirks.c | 62 +--------
3 files changed, 6 insertions(+), 198 deletions(-)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 51b4cdbea061..b01dd639bf62 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -396,10 +396,6 @@ extern int __init efi_memmap_alloc(unsigned int num_entries,
struct efi_memory_map_data *data);
extern int __init efi_memmap_install(struct efi_memory_map_data *data);
-extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
- struct range *range);
-extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
- void *buf, struct efi_mem_range *mem);
extern enum efi_secureboot_mode __x86_ima_efi_boot_mode(void);
diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c
index 023697c88910..8ef45014c7e7 100644
--- a/arch/x86/platform/efi/memmap.c
+++ b/arch/x86/platform/efi/memmap.c
@@ -110,141 +110,3 @@ int __init efi_memmap_install(struct efi_memory_map_data *data)
__efi_memmap_free(phys, size, flags);
return 0;
}
-
-/**
- * efi_memmap_split_count - Count number of additional EFI memmap entries
- * @md: EFI memory descriptor to split
- * @range: Address range (start, end) to split around
- *
- * Returns the number of additional EFI memmap entries required to
- * accommodate @range.
- */
-int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range)
-{
- u64 m_start, m_end;
- u64 start, end;
- int count = 0;
-
- start = md->phys_addr;
- end = start + (md->num_pages << EFI_PAGE_SHIFT) - 1;
-
- /* modifying range */
- m_start = range->start;
- m_end = range->end;
-
- if (m_start <= start) {
- /* split into 2 parts */
- if (start < m_end && m_end < end)
- count++;
- }
-
- if (start < m_start && m_start < end) {
- /* split into 3 parts */
- if (m_end < end)
- count += 2;
- /* split into 2 parts */
- if (end <= m_end)
- count++;
- }
-
- return count;
-}
-
-/**
- * efi_memmap_insert - Insert a memory region in an EFI memmap
- * @old_memmap: The existing EFI memory map structure
- * @buf: Address of buffer to store new map
- * @mem: Memory map entry to insert
- *
- * It is suggested that you call efi_memmap_split_count() first
- * to see how large @buf needs to be.
- */
-void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
- struct efi_mem_range *mem)
-{
- u64 m_start, m_end, m_attr;
- efi_memory_desc_t *md;
- u64 start, end;
- void *old, *new;
-
- /* modifying range */
- m_start = mem->range.start;
- m_end = mem->range.end;
- m_attr = mem->attribute;
-
- /*
- * The EFI memory map deals with regions in EFI_PAGE_SIZE
- * units. Ensure that the region described by 'mem' is aligned
- * correctly.
- */
- if (!IS_ALIGNED(m_start, EFI_PAGE_SIZE) ||
- !IS_ALIGNED(m_end + 1, EFI_PAGE_SIZE)) {
- WARN_ON(1);
- return;
- }
-
- for (old = old_memmap->map, new = buf;
- old < old_memmap->map_end;
- old += old_memmap->desc_size, new += old_memmap->desc_size) {
-
- /* copy original EFI memory descriptor */
- memcpy(new, old, old_memmap->desc_size);
- md = new;
- start = md->phys_addr;
- end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
-
- if (m_start <= start && end <= m_end)
- md->attribute |= m_attr;
-
- if (m_start <= start &&
- (start < m_end && m_end < end)) {
- /* first part */
- md->attribute |= m_attr;
- md->num_pages = (m_end - md->phys_addr + 1) >>
- EFI_PAGE_SHIFT;
- /* latter part */
- new += old_memmap->desc_size;
- memcpy(new, old, old_memmap->desc_size);
- md = new;
- md->phys_addr = m_end + 1;
- md->num_pages = (end - md->phys_addr + 1) >>
- EFI_PAGE_SHIFT;
- }
-
- if ((start < m_start && m_start < end) && m_end < end) {
- /* first part */
- md->num_pages = (m_start - md->phys_addr) >>
- EFI_PAGE_SHIFT;
- /* middle part */
- new += old_memmap->desc_size;
- memcpy(new, old, old_memmap->desc_size);
- md = new;
- md->attribute |= m_attr;
- md->phys_addr = m_start;
- md->num_pages = (m_end - m_start + 1) >>
- EFI_PAGE_SHIFT;
- /* last part */
- new += old_memmap->desc_size;
- memcpy(new, old, old_memmap->desc_size);
- md = new;
- md->phys_addr = m_end + 1;
- md->num_pages = (end - m_end) >>
- EFI_PAGE_SHIFT;
- }
-
- if ((start < m_start && m_start < end) &&
- (end <= m_end)) {
- /* first part */
- md->num_pages = (m_start - md->phys_addr) >>
- EFI_PAGE_SHIFT;
- /* latter part */
- new += old_memmap->desc_size;
- memcpy(new, old, old_memmap->desc_size);
- md = new;
- md->phys_addr = m_start;
- md->num_pages = (end - md->phys_addr + 1) >>
- EFI_PAGE_SHIFT;
- md->attribute |= m_attr;
- }
- }
-}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 13d9e036a23a..ae4ad6389f9e 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -239,63 +239,9 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
* buggy implementations we reserve boot services region during EFI
* init and make sure it stays executable. Then, after
* SetVirtualAddressMap(), it is discarded.
- *
- * However, some boot services regions contain data that is required
- * by drivers, so we need to track which memory ranges can never be
- * freed. This is done by tagging those regions with the
- * EFI_MEMORY_RUNTIME attribute.
- *
- * Any driver that wants to mark a region as reserved must use
- * efi_mem_reserve() which will insert a new EFI memory descriptor
- * into efi.memmap (splitting existing regions if necessary) and tag
- * it with EFI_MEMORY_RUNTIME.
*/
void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
{
- struct efi_memory_map_data data = { 0 };
- struct efi_mem_range mr;
- efi_memory_desc_t md;
- int num_entries;
- void *new;
-
- if (efi_mem_desc_lookup(addr, &md) ||
- md.type != EFI_BOOT_SERVICES_DATA) {
- pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
- return;
- }
-
- if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
- pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
- return;
- }
-
- size += addr % EFI_PAGE_SIZE;
- size = round_up(size, EFI_PAGE_SIZE);
- addr = round_down(addr, EFI_PAGE_SIZE);
-
- mr.range.start = addr;
- mr.range.end = addr + size - 1;
- mr.attribute = md.attribute | EFI_MEMORY_RUNTIME;
-
- num_entries = efi_memmap_split_count(&md, &mr.range);
- num_entries += efi.memmap.nr_map;
-
- if (efi_memmap_alloc(num_entries, &data) != 0) {
- pr_err("Could not allocate boot services memmap\n");
- return;
- }
-
- new = early_memremap_prot(data.phys_map, data.size,
- pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));
- if (!new) {
- pr_err("Failed to map new boot services memmap\n");
- return;
- }
-
- efi_memmap_insert(&efi.memmap, new, &mr);
- early_memunmap(new, data.size);
-
- efi_memmap_install(&data);
e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
e820__update_table(e820_table);
}
@@ -446,7 +392,8 @@ void __init efi_unmap_boot_services(void)
efi_unmap_pages(md);
/* Do not free, someone else owns it: */
- if (md->attribute & EFI_MEMORY_RUNTIME) {
+ if ((md->attribute & EFI_MEMORY_RUNTIME) ||
+ !can_free_region(start, size)) {
num_entries++;
continue;
}
@@ -485,8 +432,11 @@ void __init efi_unmap_boot_services(void)
for_each_efi_memory_desc(md) {
if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
(md->type == EFI_BOOT_SERVICES_CODE ||
- md->type == EFI_BOOT_SERVICES_DATA))
+ md->type == EFI_BOOT_SERVICES_DATA) &&
+ can_free_region(md->phys_addr,
+ md->num_pages << EFI_PAGE_SHIFT)) {
continue;
+ }
memcpy(new_md, md, efi.memmap.desc_size);
new_md += efi.memmap.desc_size;
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 11/19] efi: Use nr_map not map_end to find the last valid memory map entry
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Currently, the efi.memmap struct keeps track of the start and the end of
the EFI memory map in memory, as well as the number of entries.
Let's repaint the nr_map field as the number of *valid* entries, rename
it to num_valid_entries to make that explicit, and update all the
iterators and other memory map traversal routines accordingly.
This allows pruning of invalid or unneeded entries by moving the
remaining entries to the start of the map, without the need for
freeing/reallocating or unmapping and remapping. Now that entries are
never added, but only removed, it is possible to retain the same
allocation throughout the boot process, and free the part that is no
longer in use afterwards.
While at it, implement a version of for_each_efi_memory_desc() that
traverses the memory map in reverse order. It will be used by a
subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
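Illustration only, not part of the patch: the commit message above
describes walking the map by entry index, bounded by the number of valid
entries rather than by the end of the buffer. A minimal userspace sketch
of that idea follows. The demo_* names and struct layouts are
hypothetical stand-ins, not the kernel definitions; only the stride-based
indexing mirrors what efi_memdesc_ptr() is used for in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory descriptor; not the kernel type. */
struct demo_desc {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* Hypothetical stand-in for the map bookkeeping. */
struct demo_map {
	void *map;             /* start of the descriptor buffer */
	size_t desc_size;      /* stride between entries, may exceed sizeof(struct demo_desc) */
	int num_valid_entries; /* only entries [0, num_valid_entries) are live */
};

/* Index by the stride stored in the map, never by sizeof(). */
static struct demo_desc *demo_desc_ptr(const struct demo_map *m, int idx)
{
	return (struct demo_desc *)((char *)m->map + (size_t)idx * m->desc_size);
}

/* Forward walk over the valid entries only. */
static uint64_t demo_total_pages(const struct demo_map *m)
{
	uint64_t total = 0;

	for (int i = 0; i < m->num_valid_entries; i++)
		total += demo_desc_ptr(m, i)->num_pages;
	return total;
}

/* Reverse walk: address of the last valid entry of @type, or 0 if none. */
static uint64_t demo_last_of_type(const struct demo_map *m, uint32_t type)
{
	for (int i = m->num_valid_entries - 1; i >= 0; i--) {
		struct demo_desc *md = demo_desc_ptr(m, i);

		if (md->type == type)
			return md->phys_addr;
	}
	return 0;
}
```

Because both walks stop at num_valid_entries, stale descriptors left
behind the valid prefix after pruning are never visited, even though the
buffer itself keeps its original size.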
arch/x86/platform/efi/efi.c | 15 +++++++-----
arch/x86/platform/efi/memmap.c | 2 +-
arch/x86/platform/efi/quirks.c | 2 +-
arch/x86/platform/efi/runtime-map.c | 4 ++--
drivers/firmware/efi/arm-runtime.c | 2 +-
drivers/firmware/efi/memattr.c | 2 +-
drivers/firmware/efi/memmap.c | 8 +++----
drivers/firmware/efi/riscv-runtime.c | 2 +-
include/linux/efi.h | 24 ++++++++++++++++----
9 files changed, 39 insertions(+), 22 deletions(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index b60f8454a1ec..183cca8fe4a6 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -222,7 +222,7 @@ int __init efi_memblock_x86_reserve_range(void)
"Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
efi.memmap.desc_version);
- memblock_reserve(pmap, efi.memmap.nr_map * efi.memmap.desc_size);
+ memblock_reserve(pmap, efi.memmap.num_valid_entries * efi.memmap.desc_size);
set_bit(EFI_PRESERVE_BS_REGIONS, &efi.flags);
return 0;
@@ -289,7 +289,7 @@ static void __init efi_clean_memmap(void)
.phys_map = efi.memmap.phys_map,
.desc_version = efi.memmap.desc_version,
.desc_size = efi.memmap.desc_size,
- .size = efi.memmap.desc_size * (efi.memmap.nr_map - n_removal),
+ .size = efi.memmap.desc_size * (efi.memmap.num_valid_entries - n_removal),
.flags = 0,
};
@@ -564,7 +564,8 @@ static inline void *efi_map_next_entry_reverse(void *entry)
{
/* Initial call */
if (!entry)
- return efi.memmap.map_end - efi.memmap.desc_size;
+ return efi_memdesc_ptr(efi.memmap.map, efi.memmap.desc_size,
+ efi.memmap.num_valid_entries - 1);
entry -= efi.memmap.desc_size;
if (entry < efi.memmap.map)
@@ -612,7 +613,9 @@ static void *efi_map_next_entry(void *entry)
return efi.memmap.map;
entry += efi.memmap.desc_size;
- if (entry >= efi.memmap.map_end)
+ if (entry >= (void *)efi_memdesc_ptr(efi.memmap.map,
+ efi.memmap.desc_size,
+ efi.memmap.num_valid_entries))
return NULL;
return entry;
@@ -743,13 +746,13 @@ static void __init kexec_enter_virtual_mode(void)
efi_memmap_unmap();
if (efi_memmap_init_late(efi.memmap.phys_map,
- efi.memmap.desc_size * efi.memmap.nr_map)) {
+ efi.memmap.desc_size * efi.memmap.num_valid_entries)) {
pr_err("Failed to remap late EFI memory map\n");
clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
return;
}
- num_pages = ALIGN(efi.memmap.nr_map * efi.memmap.desc_size, PAGE_SIZE);
+ num_pages = ALIGN(efi.memmap.num_valid_entries * efi.memmap.desc_size, PAGE_SIZE);
num_pages >>= PAGE_SHIFT;
if (efi_setup_page_tables(efi.memmap.phys_map, num_pages)) {
diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c
index 8ef45014c7e7..fa580c4122c4 100644
--- a/arch/x86/platform/efi/memmap.c
+++ b/arch/x86/platform/efi/memmap.c
@@ -93,7 +93,7 @@ int __init efi_memmap_alloc(unsigned int num_entries,
*/
int __init efi_memmap_install(struct efi_memory_map_data *data)
{
- unsigned long size = efi.memmap.desc_size * efi.memmap.nr_map;
+ unsigned long size = efi.memmap.map_end - efi.memmap.map;
unsigned long flags = efi.memmap.flags;
u64 phys = efi.memmap.phys_map;
int ret;
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index ae4ad6389f9e..eecaa745d352 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -367,7 +367,7 @@ void __init efi_unmap_boot_services(void)
if (efi_enabled(EFI_DBG))
return;
- sz = sizeof(*ranges_to_free) * efi.memmap.nr_map + 1;
+ sz = sizeof(*ranges_to_free) * efi.memmap.num_valid_entries + 1;
ranges_to_free = kzalloc(sz, GFP_KERNEL);
if (!ranges_to_free) {
pr_err("Failed to allocate storage for freeable EFI regions\n");
diff --git a/arch/x86/platform/efi/runtime-map.c b/arch/x86/platform/efi/runtime-map.c
index 053ff161eb9a..fc8ca1974730 100644
--- a/arch/x86/platform/efi/runtime-map.c
+++ b/arch/x86/platform/efi/runtime-map.c
@@ -138,7 +138,7 @@ add_sysfs_runtime_map_entry(struct kobject *kobj, int nr,
int efi_get_runtime_map_size(void)
{
- return efi.memmap.nr_map * efi.memmap.desc_size;
+ return efi.memmap.num_valid_entries * efi.memmap.desc_size;
}
int efi_get_runtime_map_desc_size(void)
@@ -166,7 +166,7 @@ static int __init efi_runtime_map_init(void)
if (!efi_enabled(EFI_MEMMAP) || !efi_kobj)
return 0;
- map_entries = kzalloc_objs(entry, efi.memmap.nr_map);
+ map_entries = kzalloc_objs(entry, efi.memmap.num_valid_entries);
if (!map_entries) {
ret = -ENOMEM;
goto out;
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 3167cab62014..e19997c09175 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -96,7 +96,7 @@ static int __init arm_enable_runtime_services(void)
efi_memmap_unmap();
- mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
+ mapsize = efi.memmap.desc_size * efi.memmap.num_valid_entries;
if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
pr_err("Failed to remap EFI memory map\n");
diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
index e727cc5909cb..36f733b37df2 100644
--- a/drivers/firmware/efi/memattr.c
+++ b/drivers/firmware/efi/memattr.c
@@ -49,7 +49,7 @@ void __init efi_memattr_init(void)
* just be ignored altogether.
*/
size = tbl->num_entries * tbl->desc_size;
- if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) {
+ if (size > 3 * efi.memmap.num_valid_entries * efi.memmap.desc_size) {
pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n",
tbl->version, tbl->desc_size, tbl->num_entries);
goto unmap;
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index f1c04d7cfd71..035089791c93 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -49,7 +49,7 @@ int __init __efi_memmap_init(struct efi_memory_map_data *data)
}
map.phys_map = data->phys_map;
- map.nr_map = data->size / data->desc_size;
+ map.num_valid_entries = data->size / data->desc_size;
map.map_end = map.map + data->size;
map.desc_version = data->desc_version;
@@ -87,10 +87,8 @@ void __init efi_memmap_unmap(void)
return;
if (!(efi.memmap.flags & EFI_MEMMAP_LATE)) {
- unsigned long size;
-
- size = efi.memmap.desc_size * efi.memmap.nr_map;
- early_memunmap(efi.memmap.map, size);
+ early_memunmap(efi.memmap.map,
+ efi.memmap.map_end - efi.memmap.map);
} else {
memunmap(efi.memmap.map);
}
diff --git a/drivers/firmware/efi/riscv-runtime.c b/drivers/firmware/efi/riscv-runtime.c
index 60cdf7bf141f..087a7f8a74e6 100644
--- a/drivers/firmware/efi/riscv-runtime.c
+++ b/drivers/firmware/efi/riscv-runtime.c
@@ -66,7 +66,7 @@ static int __init riscv_enable_runtime_services(void)
efi_memmap_unmap();
- mapsize = efi.memmap.desc_size * efi.memmap.nr_map;
+ mapsize = efi.memmap.desc_size * efi.memmap.num_valid_entries;
if (efi_memmap_init_late(efi.memmap.phys_map, mapsize)) {
pr_err("Failed to remap EFI memory map\n");
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 664898d09ff5..b0c3e9648126 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -568,7 +568,7 @@ struct efi_memory_map {
phys_addr_t phys_map;
void *map;
void *map_end;
- int nr_map;
+ int num_valid_entries;
unsigned long desc_version;
unsigned long desc_size;
#define EFI_MEMMAP_LATE (1UL << 0)
@@ -803,9 +803,15 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
/* Iterate through an efi_memory_map */
#define for_each_efi_memory_desc_in_map(m, md) \
- for ((md) = (m)->map; \
- (md) && ((void *)(md) + (m)->desc_size) <= (m)->map_end; \
- (md) = (void *)(md) + (m)->desc_size)
+ for (int __idx = 0; \
+ (md) = efi_memdesc_ptr((m)->map, (m)->desc_size, __idx), \
+ __idx < (m)->num_valid_entries; ++__idx)
+
+/* Iterate through an efi_memory_map in reverse order */
+#define for_each_efi_memory_desc_in_map_rev(m, md) \
+ for (int __idx = (m)->num_valid_entries - 1; \
+ (md) = efi_memdesc_ptr((m)->map, (m)->desc_size, __idx), \
+ __idx >= 0; --__idx)
/**
* for_each_efi_memory_desc - iterate over descriptors in efi.memmap
@@ -816,6 +822,16 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
#define for_each_efi_memory_desc(md) \
for_each_efi_memory_desc_in_map(&efi.memmap, md)
+/**
+ * for_each_efi_memory_desc_rev - iterate over descriptors in efi.memmap in
+ * reverse order
+ * @md: the efi_memory_desc_t * iterator
+ *
+ * Once the loop finishes @md must not be accessed.
+ */
+#define for_each_efi_memory_desc_rev(md) \
+ for_each_efi_memory_desc_in_map_rev(&efi.memmap, md)
+
/*
* Format an EFI memory descriptor's type and attributes to a user-provided
* character buffer, as per snprintf(), and return the buffer.
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 12/19] x86/efi: Only merge EFI memory map entries on 32-bit systems
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Commit
202f9d0a4180 ("x86, efi: Merge contiguous memory regions of the same type and attribute")
introduced a pass over the EFI memory map, ensuring that contiguous
regions of the same type and attribute are coalesced into a single
entry. This was needed because relative references may exist between
those regions, and so the virtual remapping needs to preserve the
relative placement of these regions. This virtual remapping was based on
ioremap() at the time, which does not guarantee that adjacent physical
addresses are mapped adjacently in the virtual space.
Commit
d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping")
introduced a new strategy for virtually remapping the EFI runtime
services, which is now the only remaining one, and commit
a5caa209ba9c ("x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down")
tweaked the logic to ensure that the relative offset of adjacent regions
of any type is preserved on 64-bit systems, by reversing the order in
which the EFI memory map is traversed when choosing the virtual
placement.
This means that merging regions is no longer needed on 64-bit, given
that the relative placement of adjacent regions is guaranteed to be
preserved in the virtual space. So make this hack 32-bit only.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
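Illustration only, not part of the patch: the argument in the commit
message above rests on the fact that handing out virtual addresses
top-down while visiting the map last-to-first keeps physically adjacent
regions virtually adjacent. The sketch below is a deliberately simplified
model of that placement. The demo_* names are hypothetical, and the
alignment handling that the real efi_map_region() performs is assumed
away.

```c
#include <stdint.h>

/* Hypothetical, simplified region record; not the kernel descriptor. */
struct demo_region {
	uint64_t phys_addr;
	uint64_t size;
	uint64_t virt_addr;
};

/*
 * Assign virtual addresses top-down from @top, visiting entries from
 * last to first, so entry N always ends up below entry N+1.
 */
static void demo_assign_va_reverse(struct demo_region *r, int n, uint64_t top)
{
	uint64_t va = top;

	for (int i = n - 1; i >= 0; i--) {
		va -= r[i].size;
		r[i].virt_addr = va;
	}
}
```

In this model, whenever r[i] and r[i + 1] are physically contiguous, the
difference between their virtual addresses equals r[i].size, which is
also the difference between their physical addresses. Relative
references between the two regions therefore keep working without
merging the entries first.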
arch/x86/include/asm/efi.h | 6 ++++
arch/x86/platform/efi/efi.c | 31 --------------------
arch/x86/platform/efi/efi_32.c | 31 ++++++++++++++++++++
3 files changed, 37 insertions(+), 31 deletions(-)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index b01dd639bf62..44cdd3c1055e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,6 +143,12 @@ extern void efi_unmap_boot_services(void);
void arch_efi_call_virt_setup(void);
void arch_efi_call_virt_teardown(void);
+#ifdef CONFIG_X86_32
+void efi_merge_regions(void);
+#else
+static inline void efi_merge_regions(void) {}
+#endif
+
extern u64 efi_setup;
#ifdef CONFIG_EFI
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 183cca8fe4a6..a6081e3f1b88 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -502,37 +502,6 @@ void __init efi_init(void)
efi_print_memmap();
}
-/* Merge contiguous regions of the same type and attribute */
-static void __init efi_merge_regions(void)
-{
- efi_memory_desc_t *md, *prev_md = NULL;
-
- for_each_efi_memory_desc(md) {
- u64 prev_size;
-
- if (!prev_md) {
- prev_md = md;
- continue;
- }
-
- if (prev_md->type != md->type ||
- prev_md->attribute != md->attribute) {
- prev_md = md;
- continue;
- }
-
- prev_size = prev_md->num_pages << EFI_PAGE_SHIFT;
-
- if (md->phys_addr == (prev_md->phys_addr + prev_size)) {
- prev_md->num_pages += md->num_pages;
- md->type = EFI_RESERVED_TYPE;
- md->attribute = 0;
- continue;
- }
- prev_md = md;
- }
-}
-
static void *realloc_pages(void *old_memmap, int old_shift)
{
void *ret;
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index b2cc7b4552a1..886ede4117b5 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -152,3 +152,34 @@ void arch_efi_call_virt_teardown(void)
firmware_restrict_branch_speculation_end();
efi_fpu_end();
}
+
+/* Merge contiguous regions of the same type and attribute */
+void __init efi_merge_regions(void)
+{
+ efi_memory_desc_t *md, *prev_md = NULL;
+
+ for_each_efi_memory_desc(md) {
+ u64 prev_size;
+
+ if (!prev_md) {
+ prev_md = md;
+ continue;
+ }
+
+ if (prev_md->type != md->type ||
+ prev_md->attribute != md->attribute) {
+ prev_md = md;
+ continue;
+ }
+
+ prev_size = prev_md->num_pages << EFI_PAGE_SHIFT;
+
+ if (md->phys_addr == (prev_md->phys_addr + prev_size)) {
+ prev_md->num_pages += md->num_pages;
+ md->type = EFI_RESERVED_TYPE;
+ md->attribute = 0;
+ continue;
+ }
+ prev_md = md;
+ }
+}
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 13/19] x86/efi: Clean the memory map using iterator and filter API
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Instead of open coding the iteration logic, use the existing iterator
API to iterate over all valid entries in the EFI memory map.
In addition, break out the logic that iterates over and conditionally
suppresses memory map entries so it can be reused later, as something
similar is happening two more times during boot.
Note that actually reinstalling the EFI memory map, which involves
unmapping and remapping it, is no longer needed, given that the number
of valid entries can only go down. So omit efi_memmap_install() and just
update the number of valid entries.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
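Illustration only, not part of the patch: the commit message above
describes suppressing entries by compacting the map in place and then
lowering the count of valid entries, with no reallocation. A
self-contained userspace sketch of that pattern follows. All demo_* names
are hypothetical, and the predicate is an arbitrary example rather than
the kernel's validity check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a memory descriptor; not the kernel type. */
struct demo_desc {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

typedef bool (*demo_keep_fn)(const struct demo_desc *md, int idx);

/*
 * Compact the map in place: entries for which @keep returns true are
 * moved towards the start, the rest are dropped. Returns the number of
 * dropped entries and lowers *@num_valid by the same amount.
 */
static int demo_filter_entries(void *map, size_t desc_size, int *num_valid,
			       demo_keep_fn keep)
{
	char *out = map;
	int dropped = 0;

	for (int i = 0; i < *num_valid; i++) {
		char *in = (char *)map + (size_t)i * desc_size;

		if (keep((const struct demo_desc *)in, i)) {
			/* out trails in by at least one entry here, so no overlap */
			if (out != in)
				memcpy(out, in, desc_size);
			out += desc_size;
		} else {
			dropped++;
		}
	}
	*num_valid -= dropped;
	return dropped;
}

/* Example predicate: keep only entries that describe at least one page. */
static bool demo_nonempty(const struct demo_desc *md, int idx)
{
	(void)idx;
	return md->num_pages != 0;
}
```

Since the write cursor never overtakes the read cursor, the buffer can
be reused as is, and the only state that changes besides the entries
themselves is the count of valid entries.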
arch/x86/platform/efi/efi.c | 30 +++++++++-----------
1 file changed, 13 insertions(+), 17 deletions(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index a6081e3f1b88..e9b84ecc859b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -266,36 +266,32 @@ static bool __init efi_memmap_entry_valid(const efi_memory_desc_t *md, int i)
return false;
}
-static void __init efi_clean_memmap(void)
+static int __init
+efi_memmap_filter_entries(bool (*callback)(const efi_memory_desc_t *, int))
{
efi_memory_desc_t *out = efi.memmap.map;
const efi_memory_desc_t *in = out;
- const efi_memory_desc_t *end = efi.memmap.map_end;
- int i, n_removal;
+ int i = 0, filtered = 0;
- for (i = n_removal = 0; in < end; i++) {
- if (efi_memmap_entry_valid(in, i)) {
+ for_each_efi_memory_desc(in) {
+ if (callback(in, i++)) {
if (out != in)
memcpy(out, in, efi.memmap.desc_size);
out = (void *)out + efi.memmap.desc_size;
} else {
- n_removal++;
+ filtered++;
}
- in = (void *)in + efi.memmap.desc_size;
}
+ efi.memmap.num_valid_entries -= filtered;
+ return filtered;
+}
- if (n_removal > 0) {
- struct efi_memory_map_data data = {
- .phys_map = efi.memmap.phys_map,
- .desc_version = efi.memmap.desc_version,
- .desc_size = efi.memmap.desc_size,
- .size = efi.memmap.desc_size * (efi.memmap.num_valid_entries - n_removal),
- .flags = 0,
- };
+static void __init efi_clean_memmap(void)
+{
+ int n_removal = efi_memmap_filter_entries(efi_memmap_entry_valid);
+ if (n_removal > 0)
pr_warn("Removing %d invalid memory map entries.\n", n_removal);
- efi_memmap_install(&data);
- }
}
/*
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 14/19] x86/efi: Update the runtime map in place
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
When creating the EFI runtime map, a copy is created containing only the
entries that will be mapped on behalf of the firmware, but the
assignment of the virtual address field is applied to both copies.
Subsequently, the copy is installed as the new EFI memory map, and the
old one is just leaked.
This means that there is no reason whatsoever to allocate and install
the copy, and it is much easier to just update the existing memory map in
place to set the virtual addresses and suppress unused entries.
So reuse the filter function used by efi_clean_memmap() to drop all
entries that are irrelevant, and then apply the existing logic to assign
the virtual addresses and create the mappings in the EFI page tables.
Note that x86_64 and i386 traverse the memory map in opposite order, so
this part remains a separate pass as before. This logic will be further
simplified in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
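Illustration only, not part of the patch: with the map updated in place,
the size handed to the firmware and the number of pages to map both
follow directly from the count of valid entries, instead of from the
bookkeeping of a growing copy. The arithmetic can be sketched as below.
The demo_* names and the 4 KiB page size are assumptions made for the
example.

```c
/* Assumed page size for the example only. */
#define DEMO_PAGE_SIZE 4096UL

/* Bytes occupied by the valid prefix of the map. */
static unsigned long demo_map_size(unsigned long desc_size, int num_valid_entries)
{
	return desc_size * (unsigned long)num_valid_entries;
}

/* Pages needed to cover @size bytes, rounding up. */
static unsigned long demo_map_pages(unsigned long size)
{
	return (size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

For instance, with an example stride of 48 bytes, 85 valid entries
occupy 4080 bytes and fit in one page, while 100 entries occupy 4800
bytes and need two.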
arch/x86/platform/efi/efi.c | 89 +++++---------------
1 file changed, 19 insertions(+), 70 deletions(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e9b84ecc859b..44d106879120 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -498,27 +498,6 @@ void __init efi_init(void)
efi_print_memmap();
}
-static void *realloc_pages(void *old_memmap, int old_shift)
-{
- void *ret;
-
- ret = (void *)__get_free_pages(GFP_KERNEL, old_shift + 1);
- if (!ret)
- goto out;
-
- /*
- * A first-time allocation doesn't have anything to copy.
- */
- if (!old_memmap)
- return ret;
-
- memcpy(ret, old_memmap, PAGE_SIZE << old_shift);
-
-out:
- free_pages((unsigned long)old_memmap, old_shift);
- return ret;
-}
-
/*
* Iterate the EFI memory map in reverse order because the regions
* will be mapped top-down. The end result is the same as if we had
@@ -586,7 +565,7 @@ static void *efi_map_next_entry(void *entry)
return entry;
}
-static bool should_map_region(efi_memory_desc_t *md)
+static bool should_map_region(const efi_memory_desc_t *md, int unused)
{
/*
* Runtime regions always require runtime mappings (obviously).
@@ -639,40 +618,14 @@ static bool should_map_region(efi_memory_desc_t *md)
* Map the efi memory ranges of the runtime services and update new_mmap with
* virtual addresses.
*/
-static void * __init efi_map_regions(int *count, int *pg_shift)
+static void __init efi_map_regions(void)
{
- void *p, *new_memmap = NULL;
- unsigned long left = 0;
- unsigned long desc_size;
efi_memory_desc_t *md;
- desc_size = efi.memmap.desc_size;
-
- p = NULL;
- while ((p = efi_map_next_entry(p))) {
- md = p;
-
- if (!should_map_region(md))
- continue;
+ efi_memmap_filter_entries(should_map_region);
+ while ((md = efi_map_next_entry(md)))
efi_map_region(md);
-
- if (left < desc_size) {
- new_memmap = realloc_pages(new_memmap, *pg_shift);
- if (!new_memmap)
- return NULL;
-
- left += PAGE_SIZE << *pg_shift;
- (*pg_shift)++;
- }
-
- memcpy(new_memmap + (*count * desc_size), md, desc_size);
-
- left -= desc_size;
- (*count)++;
- }
-
- return new_memmap;
}
static void __init kexec_enter_virtual_mode(void)
@@ -749,25 +702,10 @@ static void __init kexec_enter_virtual_mode(void)
*/
static void __init __efi_enter_virtual_mode(void)
{
- int count = 0, pg_shift = 0;
- void *new_memmap = NULL;
efi_status_t status;
+ unsigned long size;
unsigned long pa;
- if (efi_alloc_page_tables()) {
- pr_err("Failed to allocate EFI page tables\n");
- goto err;
- }
-
- efi_merge_regions();
- new_memmap = efi_map_regions(&count, &pg_shift);
- if (!new_memmap) {
- pr_err("Error reallocating memory, EFI runtime non-functional!\n");
- goto err;
- }
-
- pa = __pa(new_memmap);
-
/*
* Unregister the early EFI memmap from efi_init() and install
* the new EFI memory map that we are about to pass to the
@@ -775,22 +713,33 @@ static void __init __efi_enter_virtual_mode(void)
*/
efi_memmap_unmap();
- if (efi_memmap_init_late(pa, efi.memmap.desc_size * count)) {
+ if (efi_alloc_page_tables()) {
+ pr_err("Failed to allocate EFI page tables\n");
+ goto err;
+ }
+
+ size = efi.memmap.desc_size * efi.memmap.num_valid_entries;
+ if (efi_memmap_init_late(efi.memmap.phys_map, size)) {
pr_err("Failed to remap late EFI memory map\n");
goto err;
}
+ efi_merge_regions();
+ efi_map_regions();
+
if (efi_enabled(EFI_DBG)) {
pr_info("EFI runtime memory map:\n");
efi_print_memmap();
}
- if (efi_setup_page_tables(pa, 1 << pg_shift))
+ if (efi_setup_page_tables(efi.memmap.phys_map,
+ DIV_ROUND_UP(size, PAGE_SIZE)))
goto err;
efi_sync_low_kernel_mappings();
- status = efi_set_virtual_address_map(efi.memmap.desc_size * count,
+ pa = efi.memmap.phys_map;
+ status = efi_set_virtual_address_map(size,
efi.memmap.desc_size,
efi.memmap.desc_version,
(efi_memory_desc_t *)pa,
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 15/19] x86/efi: Use iterator API when mapping EFI regions for runtime
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Use the generic EFI memory map iterators to invoke efi_map_region() on
each entry in the map. x86_64 and i386 traverse the map in opposite
order, so the two cases are handled separately.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/efi.c | 90 +++++---------------
1 file changed, 21 insertions(+), 69 deletions(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 44d106879120..8778ad441c42 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -498,73 +498,6 @@ void __init efi_init(void)
efi_print_memmap();
}
-/*
- * Iterate the EFI memory map in reverse order because the regions
- * will be mapped top-down. The end result is the same as if we had
- * mapped things forward, but doesn't require us to change the
- * existing implementation of efi_map_region().
- */
-static inline void *efi_map_next_entry_reverse(void *entry)
-{
- /* Initial call */
- if (!entry)
- return efi_memdesc_ptr(efi.memmap.map, efi.memmap.desc_size,
- efi.memmap.num_valid_entries - 1);
-
- entry -= efi.memmap.desc_size;
- if (entry < efi.memmap.map)
- return NULL;
-
- return entry;
-}
-
-/*
- * efi_map_next_entry - Return the next EFI memory map descriptor
- * @entry: Previous EFI memory map descriptor
- *
- * This is a helper function to iterate over the EFI memory map, which
- * we do in different orders depending on the current configuration.
- *
- * To begin traversing the memory map @entry must be %NULL.
- *
- * Returns %NULL when we reach the end of the memory map.
- */
-static void *efi_map_next_entry(void *entry)
-{
- if (efi_enabled(EFI_64BIT)) {
- /*
- * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
- * config table feature requires us to map all entries
- * in the same order as they appear in the EFI memory
- * map. That is to say, entry N must have a lower
- * virtual address than entry N+1. This is because the
- * firmware toolchain leaves relative references in
- * the code/data sections, which are split and become
- * separate EFI memory regions. Mapping things
- * out-of-order leads to the firmware accessing
- * unmapped addresses.
- *
- * Since we need to map things this way whether or not
- * the kernel actually makes use of
- * EFI_PROPERTIES_TABLE, let's just switch to this
- * scheme by default for 64-bit.
- */
- return efi_map_next_entry_reverse(entry);
- }
-
- /* Initial call */
- if (!entry)
- return efi.memmap.map;
-
- entry += efi.memmap.desc_size;
- if (entry >= (void *)efi_memdesc_ptr(efi.memmap.map,
- efi.memmap.desc_size,
- efi.memmap.num_valid_entries))
- return NULL;
-
- return entry;
-}
-
static bool should_map_region(const efi_memory_desc_t *md, int unused)
{
/*
@@ -624,8 +557,27 @@ static void __init efi_map_regions(void)
efi_memmap_filter_entries(should_map_region);
- while ((md = efi_map_next_entry(md)))
- efi_map_region(md);
+ /*
+ * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE config table feature
+ * requires us to map all entries in the same order as they appear in
+ * the EFI memory map. That is to say, entry N must have a lower
+ * virtual address than entry N+1. This is because the firmware
+ * toolchain leaves relative references in the code/data sections,
+ * which are split and become separate EFI memory regions. Mapping
+ * things out-of-order leads to the firmware accessing unmapped
+ * addresses.
+ *
+ * Since we need to map things this way whether or not the kernel
+ * actually makes use of EFI_PROPERTIES_TABLE, let's just switch to
+ * this scheme by default for 64-bit.
+ */
+ if (efi_enabled(EFI_64BIT)) {
+ for_each_efi_memory_desc_rev(md)
+ efi_map_region(md);
+ } else {
+ for_each_efi_memory_desc(md)
+ efi_map_region(md);
+ }
}
static void __init kexec_enter_virtual_mode(void)
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 16/19] x86/efi: Reuse memory map instead of reallocating it
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (14 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 15/19] x86/efi: Use iterator API when mapping EFI regions for runtime Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 17/19] x86/efi: Defer compaction of the EFI memory map Ard Biesheuvel
` (3 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
The EFI memory map consists of tens to hundreds of entries of around 40 bytes
each. The initial version is allocated and populated by the EFI stub,
but later on, after freeing the boot services data regions and pruning
the associated entries, a new memory map is allocated with room for only
the remaining entries, which are typically much fewer in number.
Given that the original allocation is never freed, this does not
actually save any memory currently, and it is much simpler to just move
the entries that need to be preserved to the beginning of the map, and
truncate it. That way, a lot of the complicated memory map allocation
and freeing code can simply be dropped.
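A minimal userspace sketch of the move-and-truncate idea, assuming a
simplified keep/drop rule: every boot services entry is dropped, whereas
the patch at this stage also keeps entries tagged EFI_MEMORY_RUNTIME or
rejected by can_free_region(). The demo_* names are hypothetical; the
type values follow the UEFI memory type numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values follow the UEFI memory type numbering; names are local to this sketch. */
enum {
	DEMO_BOOT_SERVICES_CODE = 3,
	DEMO_BOOT_SERVICES_DATA = 4,
};

/* Hypothetical stand-in for efi_memory_desc_t; only a few fields are modelled. */
struct demo_desc {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* Simplified rule (an assumption): keep everything except boot services entries. */
static int demo_keep(const struct demo_desc *md)
{
	return md->type != DEMO_BOOT_SERVICES_CODE &&
	       md->type != DEMO_BOOT_SERVICES_DATA;
}

/*
 * Move the entries to keep to the front of the same buffer and return
 * the new number of valid entries. No second buffer is allocated.
 */
static size_t demo_compact(void *map, size_t desc_size, size_t nr_entries)
{
	char *base = map;
	char *dst = base;

	assert(desc_size >= sizeof(struct demo_desc));

	for (size_t i = 0; i < nr_entries; i++) {
		char *src = base + i * desc_size;

		if (!demo_keep((const struct demo_desc *)src))
			continue;

		/* dst never passes src, so the two slots never overlap. */
		if (dst != src)
			memcpy(dst, src, desc_size);
		dst += desc_size;
	}

	return (size_t)(dst - base) / desc_size;
}
```

The returned count plays the role of efi.memmap.num_valid_entries; the
bytes past the last kept entry are left in place and simply ignored.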
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/include/asm/efi.h | 5 -
arch/x86/platform/efi/Makefile | 2 +-
arch/x86/platform/efi/memmap.c | 112 --------------------
arch/x86/platform/efi/quirks.c | 30 +-----
include/linux/efi.h | 5 +-
5 files changed, 6 insertions(+), 148 deletions(-)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 44cdd3c1055e..f21b9e85f544 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -398,11 +398,6 @@ static inline void efi_reserve_boot_services(void)
}
#endif /* CONFIG_EFI */
-extern int __init efi_memmap_alloc(unsigned int num_entries,
- struct efi_memory_map_data *data);
-
-extern int __init efi_memmap_install(struct efi_memory_map_data *data);
-
extern enum efi_secureboot_mode __x86_ima_efi_boot_mode(void);
#define arch_ima_efi_boot_mode __x86_ima_efi_boot_mode()
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 500cab4a7f7c..28772e046a1b 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,7 +2,7 @@
KASAN_SANITIZE := n
GCOV_PROFILE := n
-obj-$(CONFIG_EFI) += memmap.o quirks.o efi.o efi_$(BITS).o \
+obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o \
efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
obj-$(CONFIG_EFI_RUNTIME_MAP) += runtime-map.o
diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c
deleted file mode 100644
index fa580c4122c4..000000000000
--- a/arch/x86/platform/efi/memmap.c
+++ /dev/null
@@ -1,112 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Common EFI memory map functions.
- */
-
-#define pr_fmt(fmt) "efi: " fmt
-
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/efi.h>
-#include <linux/io.h>
-#include <asm/early_ioremap.h>
-#include <asm/efi.h>
-#include <linux/memblock.h>
-#include <linux/slab.h>
-
-static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
-{
- return memblock_phys_alloc(size, SMP_CACHE_BYTES);
-}
-
-static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
-{
- unsigned int order = get_order(size);
- struct page *p = alloc_pages(GFP_KERNEL, order);
-
- if (!p)
- return 0;
-
- return PFN_PHYS(page_to_pfn(p));
-}
-
-static
-void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
-{
- if (flags & EFI_MEMMAP_MEMBLOCK) {
- if (slab_is_available())
- memblock_free_late(phys, size);
- else
- memblock_phys_free(phys, size);
- } else if (flags & EFI_MEMMAP_SLAB) {
- struct page *p = pfn_to_page(PHYS_PFN(phys));
- unsigned int order = get_order(size);
-
- __free_pages(p, order);
- }
-}
-
-/**
- * efi_memmap_alloc - Allocate memory for the EFI memory map
- * @num_entries: Number of entries in the allocated map.
- * @data: efi memmap installation parameters
- *
- * Depending on whether mm_init() has already been invoked or not,
- * either memblock or "normal" page allocation is used.
- *
- * Returns zero on success, a negative error code on failure.
- */
-int __init efi_memmap_alloc(unsigned int num_entries,
- struct efi_memory_map_data *data)
-{
- /* Expect allocation parameters are zero initialized */
- WARN_ON(data->phys_map || data->size);
-
- data->size = num_entries * efi.memmap.desc_size;
- data->desc_version = efi.memmap.desc_version;
- data->desc_size = efi.memmap.desc_size;
- data->flags &= ~(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK);
- data->flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
-
- if (slab_is_available()) {
- data->flags |= EFI_MEMMAP_SLAB;
- data->phys_map = __efi_memmap_alloc_late(data->size);
- } else {
- data->flags |= EFI_MEMMAP_MEMBLOCK;
- data->phys_map = __efi_memmap_alloc_early(data->size);
- }
-
- if (!data->phys_map)
- return -ENOMEM;
- return 0;
-}
-
-/**
- * efi_memmap_install - Install a new EFI memory map in efi.memmap
- * @data: efi memmap installation parameters
- *
- * Unlike efi_memmap_init_*(), this function does not allow the caller
- * to switch from early to late mappings. It simply uses the existing
- * mapping function and installs the new memmap.
- *
- * Returns zero on success, a negative error code on failure.
- */
-int __init efi_memmap_install(struct efi_memory_map_data *data)
-{
- unsigned long size = efi.memmap.map_end - efi.memmap.map;
- unsigned long flags = efi.memmap.flags;
- u64 phys = efi.memmap.phys_map;
- int ret;
-
- efi_memmap_unmap();
-
- if (efi_enabled(EFI_PARAVIRT))
- return 0;
-
- ret = __efi_memmap_init(data);
- if (ret)
- return ret;
-
- __efi_memmap_free(phys, size, flags);
- return 0;
-}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index eecaa745d352..dc90c35480f8 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -356,12 +356,10 @@ static struct efi_freeable_range *ranges_to_free;
void __init efi_unmap_boot_services(void)
{
- struct efi_memory_map_data data = { 0 };
efi_memory_desc_t *md;
- int num_entries = 0;
+ void *new_md;
int idx = 0;
size_t sz;
- void *new, *new_md;
/* Keep all regions for /sys/kernel/debug/efi */
if (efi_enabled(EFI_DBG))
@@ -374,13 +372,13 @@ void __init efi_unmap_boot_services(void)
return;
}
+ new_md = efi.memmap.map;
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA) {
- num_entries++;
continue;
}
@@ -394,7 +392,6 @@ void __init efi_unmap_boot_services(void)
/* Do not free, someone else owns it: */
if ((md->attribute & EFI_MEMORY_RUNTIME) ||
!can_free_region(start, size)) {
- num_entries++;
continue;
}
@@ -409,26 +406,12 @@ void __init efi_unmap_boot_services(void)
idx++;
}
- if (!num_entries)
- return;
-
- if (efi_memmap_alloc(num_entries, &data) != 0) {
- pr_err("Failed to allocate new EFI memmap\n");
- return;
- }
-
- new = memremap(data.phys_map, data.size, MEMREMAP_WB);
- if (!new) {
- pr_err("Failed to map new EFI memmap\n");
- return;
- }
-
/*
* Build a new EFI memmap that excludes any boot services
* regions that are not tagged EFI_MEMORY_RUNTIME, since those
* regions have now been freed.
*/
- new_md = new;
+ new_md = efi.memmap.map;
for_each_efi_memory_desc(md) {
if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
(md->type == EFI_BOOT_SERVICES_CODE ||
@@ -442,12 +425,7 @@ void __init efi_unmap_boot_services(void)
new_md += efi.memmap.desc_size;
}
- memunmap(new);
-
- if (efi_memmap_install(&data) != 0) {
- pr_err("Could not install new EFI memmap\n");
- return;
- }
+ efi.memmap.num_valid_entries = (new_md - efi.memmap.map) / efi.memmap.desc_size;
}
static unsigned long __init
diff --git a/include/linux/efi.h b/include/linux/efi.h
index b0c3e9648126..58279538d9d8 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -553,8 +553,7 @@ struct efi_unaccepted_memory {
/*
* Architecture independent structure for describing a memory map for the
- * benefit of efi_memmap_init_early(), and for passing context between
- * efi_memmap_alloc() and efi_memmap_install().
+ * benefit of efi_memmap_init_early().
*/
struct efi_memory_map_data {
phys_addr_t phys_map;
@@ -572,8 +571,6 @@ struct efi_memory_map {
unsigned long desc_version;
unsigned long desc_size;
#define EFI_MEMMAP_LATE (1UL << 0)
-#define EFI_MEMMAP_MEMBLOCK (1UL << 1)
-#define EFI_MEMMAP_SLAB (1UL << 2)
unsigned long flags;
};
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 17/19] x86/efi: Defer compaction of the EFI memory map
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (15 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 16/19] x86/efi: Reuse memory map instead of reallocating it Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 18/19] x86/efi: Do not abuse RUNTIME bit to mark boot regions as reserved Ard Biesheuvel
` (2 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
Currently, the EFI memory map is compacted early at boot, to leave only
the entries that are significant to the current kernel or potentially a
kexec'ed kernel that comes after, and to suppress all boot services code
and data entries that have no correspondence with anything that either
the firmware or the kernel treats as reserved for firmware use.
Given that actually freeing those regions to the page allocator is not
possible yet at this point, those suppressed entries are converted into
yet another type of temporary memory reservation map, and freed during
an arch_initcall(), which is the earliest convenient time to actually
perform this operation.
Given that compacting the memory map does not need to occur that early
to begin with, move it to the arch_initcall(). This removes the need for
the special memory reservation map, as the entries still exist at this
point, and can be consulted directly to decide whether they need to be
preserved in their entirety or only partially.
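The "entirely or only partially" decision can be sketched in userspace
as interval arithmetic. This is not the kernel's
efi_free_unreserved_subregions(); the demo_* names, and the assumption
that reservations arrive as a sorted, non-overlapping list of half-open
ranges, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, end); hypothetical helper type. */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Count the bytes of [start, end) that no reservation covers.
 * Assumes resv[] is sorted by start address and does not overlap.
 */
static uint64_t demo_unreserved_bytes(uint64_t start, uint64_t end,
				      const struct demo_range *resv, size_t nr)
{
	uint64_t freed = 0;
	uint64_t cur = start;

	assert(start <= end);

	for (size_t i = 0; i < nr && cur < end; i++) {
		if (resv[i].end <= cur)
			continue;		/* entirely below the cursor */
		if (resv[i].start >= end)
			break;			/* entirely above the region */
		if (resv[i].start > cur)
			freed += resv[i].start - cur;	/* gap before this reservation */
		cur = resv[i].end;
	}

	if (cur < end)
		freed += end - cur;		/* gap after the last reservation */

	return freed;
}

/* The map entry may be dropped only if the whole region was freed. */
static int demo_must_preserve(uint64_t start, uint64_t end, uint64_t freed)
{
	return freed < end - start;
}
```

For a 1 MiB region at 0x100000 with a single 64 KiB reservation at
0x140000, the sketch reports 0xF0000 freeable bytes, so the entry would
be kept; with no reservations it reports the full 0x100000 and the entry
could be dropped.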
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 110 +++++++-------------
1 file changed, 39 insertions(+), 71 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index dc90c35480f8..bc9dfe7925aa 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -347,36 +347,11 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
}
-struct efi_freeable_range {
- u64 start;
- u64 end;
-};
-
-static struct efi_freeable_range *ranges_to_free;
-
void __init efi_unmap_boot_services(void)
{
efi_memory_desc_t *md;
- void *new_md;
- int idx = 0;
- size_t sz;
- /* Keep all regions for /sys/kernel/debug/efi */
- if (efi_enabled(EFI_DBG))
- return;
-
- sz = sizeof(*ranges_to_free) * efi.memmap.num_valid_entries + 1;
- ranges_to_free = kzalloc(sz, GFP_KERNEL);
- if (!ranges_to_free) {
- pr_err("Failed to allocate storage for freeable EFI regions\n");
- return;
- }
-
- new_md = efi.memmap.map;
for_each_efi_memory_desc(md) {
- unsigned long long start = md->phys_addr;
- unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
-
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA) {
continue;
@@ -385,47 +360,10 @@ void __init efi_unmap_boot_services(void)
/*
* Before calling set_virtual_address_map(), EFI boot services
* code/data regions were mapped as a quirk for buggy firmware.
- * Unmap them from efi_pgd before freeing them up.
+ * Unmap them from efi_pgd, they will be freed later.
*/
efi_unmap_pages(md);
-
- /* Do not free, someone else owns it: */
- if ((md->attribute & EFI_MEMORY_RUNTIME) ||
- !can_free_region(start, size)) {
- continue;
- }
-
- /*
- * With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
- * map are still not initialized and we can't reliably free
- * memory here.
- * Queue the ranges to free at a later point.
- */
- ranges_to_free[idx].start = start;
- ranges_to_free[idx].end = start + size;
- idx++;
}
-
- /*
- * Build a new EFI memmap that excludes any boot services
- * regions that are not tagged EFI_MEMORY_RUNTIME, since those
- * regions have now been freed.
- */
- new_md = efi.memmap.map;
- for_each_efi_memory_desc(md) {
- if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
- (md->type == EFI_BOOT_SERVICES_CODE ||
- md->type == EFI_BOOT_SERVICES_DATA) &&
- can_free_region(md->phys_addr,
- md->num_pages << EFI_PAGE_SHIFT)) {
- continue;
- }
-
- memcpy(new_md, md, efi.memmap.desc_size);
- new_md += efi.memmap.desc_size;
- }
-
- efi.memmap.num_valid_entries = (new_md - efi.memmap.map) / efi.memmap.desc_size;
}
static unsigned long __init
@@ -464,27 +402,57 @@ efi_free_unreserved_subregions(u64 range_start, u64 range_end)
static int __init efi_free_boot_services(void)
{
- struct efi_freeable_range *range = ranges_to_free;
unsigned long freed = 0;
+ efi_memory_desc_t *md;
+ void *new_md;
+
+ /* No EFI memory map or it came from the preceding kernel? */
+ if (efi_setup || !efi_enabled(EFI_MEMMAP))
+ return 0;
- if (!ranges_to_free)
+ /* Keep all regions for /sys/kernel/debug/efi */
+ if (efi_enabled(EFI_DBG))
return 0;
- while (range->start) {
+ new_md = efi.memmap.map;
+ for_each_efi_memory_desc(md) {
/*
* Don't free memory under 1M for two reasons:
* - BIOS might clobber it
* - Crash kernel needs it to be reserved
*/
- u64 start = max(range->start, SZ_1M);
+ u64 md_start = max(md->phys_addr, SZ_1M);
+ u64 md_end = md->phys_addr + md->num_pages * EFI_PAGE_SIZE;
+ bool preserve_entry = md->attribute & EFI_MEMORY_RUNTIME;
- if (start >= range->end)
+ if (md_start >= md_end)
continue;
- freed += efi_free_unreserved_subregions(start, range->end);
- range++;
+ if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
+ (md->type == EFI_BOOT_SERVICES_CODE ||
+ md->type == EFI_BOOT_SERVICES_DATA)) {
+ u64 f = efi_free_unreserved_subregions(md_start, md_end);
+
+ /*
+ * Omit the memory map entry of this region only if it
+ * has been freed entirely. This ensures that boot data
+ * regions for things like ESRT and BGRT tables carry
+ * over correctly during kexec.
+ */
+ if (f < md_end - md_start)
+ preserve_entry = true;
+
+ freed += f;
+ }
+
+ if (preserve_entry) {
+ if (new_md != md)
+ memcpy(new_md, md, efi.memmap.desc_size);
+ new_md += efi.memmap.desc_size;
+ }
}
- kfree(ranges_to_free);
+
+ efi.memmap.num_valid_entries = (new_md - efi.memmap.map) / efi.memmap.desc_size;
if (freed)
pr_info("Freeing EFI boot services memory: %ldK\n", freed / SZ_1K);
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 18/19] x86/efi: Do not abuse RUNTIME bit to mark boot regions as reserved
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (16 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 17/19] x86/efi: Defer compaction of the EFI memory map Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 19/19] x86/efi: Free unused tail of the EFI memory map Ard Biesheuvel
2026-03-24 9:50 ` [PATCH v2 00/19] efi/x86: Avoid the need to mangle " Ard Biesheuvel
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
efi_reserve_boot_services() marks all EFI boot services memory regions as
memblock_reserve()'d temporarily, so that they can be mapped in the EFI
page tables during the call to the SetVirtualAddressMap() runtime
service.
This means it has to take care to distinguish regions that are
entirely unused from regions that are already covered by some prior
reservation, either by the kernel itself via memblock, or by the
firmware or bootloader via the E820 map.
For this reason, it only memblock_reserve()'s boot services regions that
are not covered by any prior memblock reservation. Otherwise, it will
set the EFI_MEMORY_RUNTIME flag for the region, which indicates to the
freeing code that runs later that the region must remain reserved.
It also sets the EFI_MEMORY_RUNTIME flag for the region if it covers any
E820 region that is not E820_RAM, so that, again, the entire region
remains reserved indefinitely.
This is inefficient, and abusing the EFI_MEMORY_RUNTIME flag for this is
not great either. It would be better to respect the actual memblock or
E820 reservations instead, which is feasible now that the freeing code
takes the MEMBLOCK_RSRV_KERN flag into account.
So drop the EFI_MEMORY_RUNTIME hack, and instead, respect existing
memblock reservations by upgrading them to MEMBLOCK_RSRV_KERN
reservations. Take E820 reservations into account by cross-referencing
them with the EFI and memblock reservations when actually returning the
pages back to the page allocator.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 29 ++++++--------------
1 file changed, 9 insertions(+), 20 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index bc9dfe7925aa..8f2dc477eee0 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -298,26 +298,13 @@ void __init efi_reserve_boot_services(void)
*/
if (!already_reserved) {
memblock_reserve(start, size);
-
+ } else {
/*
- * If we are the first to reserve the region, no
- * one else cares about it. We own it and can
- * free it later.
+ * Mark existing reservations as MEMBLOCK_RSRV_KERN so
+ * they will be respected by efi_free_boot_services().
*/
- if (can_free_region(start, size))
- continue;
+ memblock_reserved_mark_kern(start, size);
}
-
- /*
- * We don't own the region. We must not free it.
- *
- * Setting this bit for a boot services region really
- * doesn't make sense as far as the firmware is
- * concerned, but it does provide us with a way to tag
- * those regions that must not be paired with
- * memblock_free_late().
- */
- md->attribute |= EFI_MEMORY_RUNTIME;
}
}
@@ -392,6 +379,9 @@ efi_free_unreserved_subregions(u64 range_start, u64 range_end)
if (start >= end)
continue;
+ if (!can_free_region(start, end - start))
+ continue;
+
free_reserved_area(phys_to_virt(start),
phys_to_virt(end), -1, NULL);
freed += (end - start);
@@ -428,9 +418,8 @@ static int __init efi_free_boot_services(void)
if (md_start >= md_end)
continue;
- if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
- (md->type == EFI_BOOT_SERVICES_CODE ||
- md->type == EFI_BOOT_SERVICES_DATA)) {
+ if (md->type == EFI_BOOT_SERVICES_CODE ||
+ md->type == EFI_BOOT_SERVICES_DATA) {
u64 f = efi_free_unreserved_subregions(md_start, md_end);
/*
--
2.53.0.851.ga537e3e6e9-goog
* [PATCH v2 19/19] x86/efi: Free unused tail of the EFI memory map
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (17 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 18/19] x86/efi: Do not abuse RUNTIME bit to mark boot regions as reserved Ard Biesheuvel
@ 2026-03-19 9:05 ` Ard Biesheuvel
2026-03-24 9:50 ` [PATCH v2 00/19] efi/x86: Avoid the need to mangle " Ard Biesheuvel
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-19 9:05 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
Benjamin Herrenschmidt
From: Ard Biesheuvel <ardb@kernel.org>
After moving the relevant entries to the start of the map, the remainder
can be handed back to the page allocator.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/platform/efi/quirks.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 8f2dc477eee0..3b3652c4b90e 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -443,6 +443,10 @@ static int __init efi_free_boot_services(void)
efi.memmap.num_valid_entries = (new_md - efi.memmap.map) / efi.memmap.desc_size;
+ /* Free the part of the memory map allocation that has become unused */
+ free_reserved_area(new_md, efi.memmap.map_end, -1, NULL);
+ freed += (void *)efi.memmap.map_end - new_md;
+
if (freed)
pr_info("Freeing EFI boot services memory: %ldK\n", freed / SZ_1K);
--
2.53.0.851.ga537e3e6e9-goog
* Re: [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
` (18 preceding siblings ...)
2026-03-19 9:05 ` [PATCH v2 19/19] x86/efi: Free unused tail of the EFI memory map Ard Biesheuvel
@ 2026-03-24 9:50 ` Ard Biesheuvel
19 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2026-03-24 9:50 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Mike Rapoport, Benjamin Herrenschmidt
On Thu, 19 Mar 2026, at 10:05, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> At boot, x86 uses E820 tables, memblock tables and the EFI memory map to
> reason about which parts of system RAM are available to the OS, and
> which are reserved.
>
> While other EFI architectures treat the EFI memory map as immutable, the
> x86 boot code modifies it to keep track of memory reservations of boot
> services data regions, in order to distinguish which parts have been
> memblock_reserve()'d permanently, and which ones have been reserved only
> temporarily to work around buggy implementations of the EFI runtime
> service [SetVirtualAddressMap()] that reconfigures the VA space of the
> runtime services themselves.
>
> This method is mostly fine for marking entire regions as reserved, but
> it gets complicated when the code decides to split EFI memory map
> entries in order to mark some of it permanently reserved, and the rest
> of it temporarily reserved.
>
> Let's clean this up, by
> - marking permanent reservations of EFI boot services data memory as
> MEMBLOCK_RSRV_KERN
> - taking this marking into account when deciding whether or not a EFI
> boot services data region can be freed
> - dropping all of the EFI memory map insertion/splitting logic and the
> allocation/freeing logic, all of which have become redundant.
>
Please disregard this for now. Sashiko pointed out some fundamental issues in this series, and I think it might be better to take a different approach entirely.
Thread overview: 21+ messages
2026-03-19 9:05 [PATCH v2 00/19] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 01/19] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 02/19] efi: Tag memblock reservations of boot services regions as RSRV_KERN Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 03/19] x86/efi: Unmap kernel-reserved boot regions from EFI page tables Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 04/19] x86/efi: Drop EFI_MEMORY_RUNTIME check from __ioremap_check_other() Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 05/19] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 06/19] x86/efi: Defer sub-1M check from unmap to free stage Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 07/19] x86/efi: Simplify real mode trampoline allocation quirk Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 08/19] x86/efi: Omit redundant kernel image overlap check Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 09/19] x86/efi: Drop redundant EFI_PARAVIRT check Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 10/19] x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry splitting Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 11/19] efi: Use nr_map not map_end to find the last valid memory map entry Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 12/19] x86/efi: Only merge EFI memory map entries on 32-bit systems Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 13/19] x86/efi: Clean the memory map using iterator and filter API Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 14/19] x86/efi: Update the runtime map in place Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 15/19] x86/efi: Use iterator API when mapping EFI regions for runtime Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 16/19] x86/efi: Reuse memory map instead of reallocating it Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 17/19] x86/efi: Defer compaction of the EFI memory map Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 18/19] x86/efi: Do not abuse RUNTIME bit to mark boot regions as reserved Ard Biesheuvel
2026-03-19 9:05 ` [PATCH v2 19/19] x86/efi: Free unused tail of the EFI memory map Ard Biesheuvel
2026-03-24 9:50 ` [PATCH v2 00/19] efi/x86: Avoid the need to mangle " Ard Biesheuvel