public inbox for linux-efi@vger.kernel.org
* [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map
@ 2026-03-06 15:57 Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 1/9] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

At boot, x86 uses E820 tables, memblock tables and the EFI memory map to
reason about which parts of system RAM are available to the OS, and
which are reserved.

While other EFI architectures treat the EFI memory map as immutable, the
x86 boot code modifies it to keep track of memory reservations of boot
services data regions, in order to distinguish which parts have been
memblock_reserve()'d permanently, and which ones have been reserved only
temporarily to work around buggy implementations of the EFI runtime
service [SetVirtualAddressMap()] that reconfigures the VA space of the
runtime services themselves.

This method is mostly fine for marking entire regions as reserved, but
it gets complicated when the code decides to split EFI memory map
entries in order to mark some of it permanently reserved, and the rest
of it temporarily reserved.

Let's clean this up, by
- marking permanent reservations of EFI boot services data memory as
  MEMBLOCK_RSRV_KERN
- taking this marking into account when deciding whether or not an EFI
  boot services data region can be freed
- dropping all of the EFI memory map insertion/splitting logic and the
  allocation/freeing logic, all of which have become redundant.

Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Ard Biesheuvel (9):
  memblock: Permit existing reserved regions to be marked RSRV_KERN
  efi: Tag memblock reservations of boot services regions as RSRV_KERN
  x86/efi: Omit RSRV_KERN memblock reservations when freeing boot
    regions
  x86/efi: Defer sub-1M check from unmap to free stage
  x86/efi: Unmap kernel-reserved boot regions from EFI page tables
  x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry
    splitting
  x86/efi: Reuse memory map instead of reallocating it
  x86/efi: Defer compaction of the EFI memory map
  x86/efi: Free unused tail of the EFI memory map

 arch/x86/include/asm/efi.h     |   7 -
 arch/x86/platform/efi/memmap.c | 221 +------------------
 arch/x86/platform/efi/quirks.c | 222 +++++++-------------
 drivers/firmware/efi/efi.c     |   4 +-
 include/linux/efi.h            |   2 -
 include/linux/memblock.h       |   1 +
 mm/memblock.c                  |  15 ++
 7 files changed, 96 insertions(+), 376 deletions(-)

base-commit: a4b0bf6a40f3c107c67a24fbc614510ef5719980 # linux-efi/urgent
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 1/9] memblock: Permit existing reserved regions to be marked RSRV_KERN
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-16  6:53   ` Mike Rapoport
  2026-03-06 15:57 ` [RFC PATCH 2/9] efi: Tag memblock reservations of boot services regions as RSRV_KERN Ard Biesheuvel
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

Permit existing memblock reservations to be marked as RSRV_KERN. This
will be used by the EFI code on x86 to distinguish between reservations
of boot services data regions that have actual significance to the
kernel and regions that are reserved temporarily to work around buggy
firmware.
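
For readers unfamiliar with flagging part of an existing reservation,
here is a minimal user-space sketch of the idea. It is not the kernel
implementation: the real helper delegates to memblock_setclr_flag(),
whose internals are not part of this patch. All toy_* names, the fixed
array size and the splitting strategy are assumptions made purely for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_KERN	0x1u	/* hypothetical stand-in for MEMBLOCK_RSRV_KERN */
#define TOY_MAX_REGIONS	16

struct toy_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* Sorted, non-overlapping list of reserved regions (toy model). */
struct toy_reserved {
	struct toy_region r[TOY_MAX_REGIONS];
	size_t cnt;
};

/* Split region i in two so that a new region begins at addr. */
static int toy_split(struct toy_reserved *t, size_t i, uint64_t addr)
{
	struct toy_region *cur = &t->r[i];
	size_t j;

	if (t->cnt == TOY_MAX_REGIONS)
		return -1;

	/* make room for the new tail entry at index i + 1 */
	for (j = t->cnt; j > i + 1; j--)
		t->r[j] = t->r[j - 1];

	t->r[i + 1].base = addr;
	t->r[i + 1].size = cur->base + cur->size - addr;
	t->r[i + 1].flags = cur->flags;
	cur->size = addr - cur->base;
	t->cnt++;
	return 0;
}

/* Set flag on the reserved parts of [base, base + size). */
static int toy_reserved_mark(struct toy_reserved *t, uint64_t base,
			     uint64_t size, unsigned int flag)
{
	uint64_t end = base + size;
	size_t i;

	for (i = 0; i < t->cnt; i++) {
		uint64_t rbase = t->r[i].base;
		uint64_t rend = rbase + t->r[i].size;

		if (rend <= base || rbase >= end)
			continue;	/* no overlap with the request */

		if (rbase < base) {
			/* leave the head untouched, handle the tail next */
			if (toy_split(t, i, base))
				return -1;
			continue;
		}

		/* cut off the part that lies beyond the request */
		if (rend > end && toy_split(t, i, end))
			return -1;

		t->r[i].flags |= flag;
	}
	return 0;
}
```

In this model, tagging the middle page of a single four-page
reservation leaves three regions, of which only the middle one carries
the flag.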

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/memblock.h |  1 +
 mm/memblock.c            | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 6ec5e9ac0699..9eac4f268359 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -155,6 +155,7 @@ int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
 int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
 int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
 int memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t size);
+int memblock_reserved_mark_kern(phys_addr_t base, phys_addr_t size);
 int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size);
 int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index b3ddfdec7a80..2505ce8b319c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1115,6 +1115,21 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
 				    MEMBLOCK_RSRV_NOINIT);
 }
 
+/**
+ * memblock_reserved_mark_kern - Mark a reserved memory region with flag
+ * MEMBLOCK_RSRV_KERN
+ *
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_reserved_mark_kern(phys_addr_t base, phys_addr_t size)
+{
+	return memblock_setclr_flag(&memblock.reserved, base, size, 1,
+				    MEMBLOCK_RSRV_KERN);
+}
+
 /**
  * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
  * @base: the base phys addr of the region
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 2/9] efi: Tag memblock reservations of boot services regions as RSRV_KERN
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 1/9] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-16  6:55   ` Mike Rapoport
  2026-03-06 15:57 ` [RFC PATCH 3/9] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions Ard Biesheuvel
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

By definition, EFI memory regions of type boot services code or data
have no special significance to the firmware at runtime, only to the OS.
In some cases, the firmware will allocate tables and other assets that
are passed in memory in regions of this type, and leave it up to the OS
to decide whether or not to treat the allocation as special, or simply
consume the contents at boot and recycle the RAM for ordinary use. The
reason for this approach is that it avoids needless memory reservations
for assets that the OS knows nothing about, and therefore doesn't know
how to free either.

This means that any memblock reservations covering such regions can be
marked as MEMBLOCK_RSRV_KERN - this is a better match semantically, and
is useful on x86 to distinguish true reservations from temporary
reservations that are only needed to work around firmware bugs.
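
The reserve-or-tag decision made in efi_mem_reserve() can be modelled
in user space as follows. This is only a sketch: memory is tracked per
page in a small array, and every pg_* name is hypothetical. The real
code operates on memblock regions rather than on a per-page table.

```c
#include <stdbool.h>
#include <stddef.h>

#define PG_COUNT 16

enum pg_state {
	PG_FREE,	/* not reserved at all */
	PG_RSRV,	/* reserved, e.g. temporarily for the firmware quirk */
	PG_RSRV_KERN,	/* reserved because the kernel needs the contents */
};

static enum pg_state pg_map[PG_COUNT];

/* True if any page in [first, first + count) is already reserved. */
static bool pg_is_region_reserved(size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		if (pg_map[i] != PG_FREE)
			return true;
	return false;
}

/* Create a new reservation that is tagged from the start. */
static void pg_reserve_kern(size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		pg_map[i] = PG_RSRV_KERN;
}

/* Upgrade pages that are already reserved; other pages are left alone. */
static void pg_reserved_mark_kern(size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++)
		if (pg_map[i] == PG_RSRV)
			pg_map[i] = PG_RSRV_KERN;
}

/* Mirrors the if/else added by this patch; callers must stay in bounds. */
static void pg_efi_mem_reserve(size_t first, size_t count)
{
	if (!pg_is_region_reserved(first, count))
		pg_reserve_kern(first, count);
	else
		pg_reserved_mark_kern(first, count);
}
```

Either way, the pages end up in the "reserved for the kernel" state,
which is what later allows the x86 code to tell them apart from the
temporary reservations.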

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/efi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b2fb92a4bbd1..e4ab7481bbf6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -600,7 +600,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
 		return;
 
 	if (!memblock_is_region_reserved(addr, size))
-		memblock_reserve(addr, size);
+		memblock_reserve_kern(addr, size);
+	else
+		memblock_reserved_mark_kern(addr, size);
 
 	/*
 	 * Some architectures (x86) reserve all boot services ranges
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 3/9] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 1/9] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 2/9] efi: Tag memblock reservations of boot services regions as RSRV_KERN Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 4/9] x86/efi: Defer sub-1M check from unmap to free stage Ard Biesheuvel
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

Instead of freeing all EFI boot services code and data regions that were
preliminarily reserved during early boot to work around buggy firmware,
take care to only free those parts that are not marked as
MEMBLOCK_RSRV_KERN. This marking is used by the generic implementation
of efi_mem_reserve() to mark things like informational tables that are
provided to the OS by the firmware, but where the contents of memory
have no significance to the firmware itself. Such assets are often
passed in an EFI boot services data region, leaving it to the OS to
decide whether it needs to be reserved or not.

This removes the need to mark such regions as EFI_MEMORY_RUNTIME, which
is a hack that results in a lot of complexity in updating and
re-allocating the EFI memory map, which would otherwise not need to be
modified at all. Note that x86 is the only EFI arch that does any of
this; others just treat the EFI memory map as immutable.
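
To illustrate the selection logic, here is a user-space sketch that
only computes how many bytes would be freed. It follows the structure
of efi_free_unreserved_subregions() in the diff below, but the sk_*
names, the fixed 4 KiB page size and the plain array standing in for
the memblock reserved table are assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE		4096ULL
#define SK_ALIGN_UP(x)		(((x) + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1))
#define SK_ALIGN_DOWN(x)	((x) & ~(SK_PAGE_SIZE - 1))
#define SK_RSRV_KERN		0x1u	/* stand-in for MEMBLOCK_RSRV_KERN */

struct sk_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/*
 * Count the bytes in [range_start, range_end) that are covered by
 * reservations without the KERN flag. rsv[] must be sorted by base.
 */
static uint64_t sk_freeable_bytes(const struct sk_region *rsv, size_t cnt,
				  uint64_t range_start, uint64_t range_end)
{
	uint64_t freed = 0;

	for (size_t i = 0; i < cnt; i++) {
		uint64_t region_end = rsv[i].base + rsv[i].size;
		uint64_t start, end;

		/* sorted input: nothing further can overlap the range */
		if (rsv[i].base >= range_end)
			break;

		if (region_end < range_start)
			continue;

		/* the kernel still needs this one, so keep it reserved */
		if (rsv[i].flags & SK_RSRV_KERN)
			continue;

		/* only whole pages inside both the range and the region */
		start = rsv[i].base > range_start ? rsv[i].base : range_start;
		end = region_end < range_end ? region_end : range_end;
		start = SK_ALIGN_UP(start);
		end = SK_ALIGN_DOWN(end);

		if (start >= end)
			continue;

		freed += end - start;
	}
	return freed;
}
```

Note how a reservation that starts in the middle of a page contributes
only its fully covered pages, since the start is rounded up and the end
is rounded down.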

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/platform/efi/quirks.c | 40 +++++++++++++++++---
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 35caa5746115..f896930cecda 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -536,6 +536,40 @@ void __init efi_unmap_boot_services(void)
 	}
 }
 
+static unsigned long __init
+efi_free_unreserved_subregions(u64 range_start, u64 range_end)
+{
+	struct memblock_region *region;
+	unsigned long freed = 0;
+
+	for_each_reserved_mem_region(region) {
+		u64 region_end = region->base + region->size;
+		u64 start, end;
+
+		/* memblock tables are sorted so no need to carry on */
+		if (region->base >= range_end)
+			break;
+
+		if (region_end < range_start)
+			continue;
+
+		if (region->flags & MEMBLOCK_RSRV_KERN)
+			continue;
+
+		start = PAGE_ALIGN(max(range_start, region->base));
+		end = PAGE_ALIGN_DOWN(min(range_end, region_end));
+
+		if (start >= end)
+			continue;
+
+		free_reserved_area(phys_to_virt(start),
+				   phys_to_virt(end), -1, NULL);
+		freed += (end - start);
+	}
+
+	return freed;
+}
+
 static int __init efi_free_boot_services(void)
 {
 	struct efi_freeable_range *range = ranges_to_free;
@@ -545,11 +579,7 @@ static int __init efi_free_boot_services(void)
 		return 0;
 
 	while (range->start) {
-		void *start = phys_to_virt(range->start);
-		void *end = phys_to_virt(range->end);
-
-		free_reserved_area(start, end, -1, NULL);
-		freed += (end - start);
+		freed += efi_free_unreserved_subregions(range->start, range->end);
 		range++;
 	}
 	kfree(ranges_to_free);
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 4/9] x86/efi: Defer sub-1M check from unmap to free stage
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2026-03-06 15:57 ` [RFC PATCH 3/9] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-06 15:57 ` [PATCH 4/4] x86/efi: Omit kernel reservations of boot services memory from memmap Ard Biesheuvel
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

As a first step towards moving the free logic to a later stage
altogether, and only keeping the unmap and the realmode trampoline hack
during the early stage of freeing the boot service code and data
regions, move the logic that avoids freeing memory below 1M to the later
stage.
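
As a sanity check on the equivalence, the following user-space sketch
models the clamped variant. The lo_* names are hypothetical, and the
"nothing to free" case relies on the start >= end check that the
previous patch added to efi_free_unreserved_subregions().

```c
#include <stdint.h>

#define LO_SZ_1M 0x100000ULL

/* Bytes of [start, end) that lie at or above the 1M boundary. */
static uint64_t lo_freeable_above_1m(uint64_t start, uint64_t end)
{
	/* same effect as max(start, SZ_1M) in the new code */
	uint64_t clamped = start < LO_SZ_1M ? LO_SZ_1M : start;

	/* a range entirely below 1M collapses to nothing */
	return clamped < end ? end - clamped : 0;
}
```

A range that ends below 1M yields zero (the old code skipped it with
'continue'), a range straddling 1M is trimmed to its upper part, and a
range above 1M is unaffected.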

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/platform/efi/quirks.c | 21 ++++++++------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f896930cecda..58d00ffb1d59 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -475,18 +475,6 @@ void __init efi_unmap_boot_services(void)
 			size -= rm_size;
 		}
 
-		/*
-		 * Don't free memory under 1M for two reasons:
-		 * - BIOS might clobber it
-		 * - Crash kernel needs it to be reserved
-		 */
-		if (start + size < SZ_1M)
-			continue;
-		if (start < SZ_1M) {
-			size -= (SZ_1M - start);
-			start = SZ_1M;
-		}
-
 		/*
 		 * With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
 		 * map are still not initialized and we can't reliably free
@@ -579,7 +567,14 @@ static int __init efi_free_boot_services(void)
 		return 0;
 
 	while (range->start) {
-		freed += efi_free_unreserved_subregions(range->start, range->end);
+		/*
+		 * Don't free memory under 1M for two reasons:
+		 * - BIOS might clobber it
+		 * - Crash kernel needs it to be reserved
+		 */
+		u64 start = max(range->start, SZ_1M);
+
+		freed += efi_free_unreserved_subregions(start, range->end);
 		range++;
 	}
 	kfree(ranges_to_free);
-- 
2.53.0.473.g4a7958ca14-goog



* [PATCH 4/4] x86/efi: Omit kernel reservations of boot services memory from memmap
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2026-03-06 15:57 ` [RFC PATCH 4/9] x86/efi: Defer sub-1M check from unmap to free stage Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-06 16:00   ` Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 5/9] x86/efi: Unmap kernel-reserved boot regions from EFI page tables Ard Biesheuvel
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

Now that efi_mem_reserve() has been updated to rely on RSRV_KERN
memblock reservations, there is no longer any need to mark memblock
reserved regions as EFI_MEMORY_RUNTIME. As a result, existing entries
in the EFI memory map no longer need to be split, which removes the
need to re-allocate/copy/remap the entire EFI memory map on every call
to efi_mem_reserve().

So drop this functionality - it is no longer needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/efi.h     |   4 -
 arch/x86/platform/efi/memmap.c | 138 --------------------
 arch/x86/platform/efi/quirks.c |  54 --------
 3 files changed, 196 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 51b4cdbea061..b01dd639bf62 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -396,10 +396,6 @@ extern int __init efi_memmap_alloc(unsigned int num_entries,
 				   struct efi_memory_map_data *data);
 
 extern int __init efi_memmap_install(struct efi_memory_map_data *data);
-extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
-					 struct range *range);
-extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
-				     void *buf, struct efi_mem_range *mem);
 
 extern enum efi_secureboot_mode __x86_ima_efi_boot_mode(void);
 
diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c
index 023697c88910..8ef45014c7e7 100644
--- a/arch/x86/platform/efi/memmap.c
+++ b/arch/x86/platform/efi/memmap.c
@@ -110,141 +110,3 @@ int __init efi_memmap_install(struct efi_memory_map_data *data)
 	__efi_memmap_free(phys, size, flags);
 	return 0;
 }
-
-/**
- * efi_memmap_split_count - Count number of additional EFI memmap entries
- * @md: EFI memory descriptor to split
- * @range: Address range (start, end) to split around
- *
- * Returns the number of additional EFI memmap entries required to
- * accommodate @range.
- */
-int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range)
-{
-	u64 m_start, m_end;
-	u64 start, end;
-	int count = 0;
-
-	start = md->phys_addr;
-	end = start + (md->num_pages << EFI_PAGE_SHIFT) - 1;
-
-	/* modifying range */
-	m_start = range->start;
-	m_end = range->end;
-
-	if (m_start <= start) {
-		/* split into 2 parts */
-		if (start < m_end && m_end < end)
-			count++;
-	}
-
-	if (start < m_start && m_start < end) {
-		/* split into 3 parts */
-		if (m_end < end)
-			count += 2;
-		/* split into 2 parts */
-		if (end <= m_end)
-			count++;
-	}
-
-	return count;
-}
-
-/**
- * efi_memmap_insert - Insert a memory region in an EFI memmap
- * @old_memmap: The existing EFI memory map structure
- * @buf: Address of buffer to store new map
- * @mem: Memory map entry to insert
- *
- * It is suggested that you call efi_memmap_split_count() first
- * to see how large @buf needs to be.
- */
-void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
-			      struct efi_mem_range *mem)
-{
-	u64 m_start, m_end, m_attr;
-	efi_memory_desc_t *md;
-	u64 start, end;
-	void *old, *new;
-
-	/* modifying range */
-	m_start = mem->range.start;
-	m_end = mem->range.end;
-	m_attr = mem->attribute;
-
-	/*
-	 * The EFI memory map deals with regions in EFI_PAGE_SIZE
-	 * units. Ensure that the region described by 'mem' is aligned
-	 * correctly.
-	 */
-	if (!IS_ALIGNED(m_start, EFI_PAGE_SIZE) ||
-	    !IS_ALIGNED(m_end + 1, EFI_PAGE_SIZE)) {
-		WARN_ON(1);
-		return;
-	}
-
-	for (old = old_memmap->map, new = buf;
-	     old < old_memmap->map_end;
-	     old += old_memmap->desc_size, new += old_memmap->desc_size) {
-
-		/* copy original EFI memory descriptor */
-		memcpy(new, old, old_memmap->desc_size);
-		md = new;
-		start = md->phys_addr;
-		end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
-
-		if (m_start <= start && end <= m_end)
-			md->attribute |= m_attr;
-
-		if (m_start <= start &&
-		    (start < m_end && m_end < end)) {
-			/* first part */
-			md->attribute |= m_attr;
-			md->num_pages = (m_end - md->phys_addr + 1) >>
-				EFI_PAGE_SHIFT;
-			/* latter part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->phys_addr = m_end + 1;
-			md->num_pages = (end - md->phys_addr + 1) >>
-				EFI_PAGE_SHIFT;
-		}
-
-		if ((start < m_start && m_start < end) && m_end < end) {
-			/* first part */
-			md->num_pages = (m_start - md->phys_addr) >>
-				EFI_PAGE_SHIFT;
-			/* middle part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->attribute |= m_attr;
-			md->phys_addr = m_start;
-			md->num_pages = (m_end - m_start + 1) >>
-				EFI_PAGE_SHIFT;
-			/* last part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->phys_addr = m_end + 1;
-			md->num_pages = (end - m_end) >>
-				EFI_PAGE_SHIFT;
-		}
-
-		if ((start < m_start && m_start < end) &&
-		    (end <= m_end)) {
-			/* first part */
-			md->num_pages = (m_start - md->phys_addr) >>
-				EFI_PAGE_SHIFT;
-			/* latter part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->phys_addr = m_start;
-			md->num_pages = (end - md->phys_addr + 1) >>
-				EFI_PAGE_SHIFT;
-			md->attribute |= m_attr;
-		}
-	}
-}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index af766694f7ee..8d2bfbd3a0ce 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -239,63 +239,9 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
  * buggy implementations we reserve boot services region during EFI
  * init and make sure it stays executable. Then, after
  * SetVirtualAddressMap(), it is discarded.
- *
- * However, some boot services regions contain data that is required
- * by drivers, so we need to track which memory ranges can never be
- * freed. This is done by tagging those regions with the
- * EFI_MEMORY_RUNTIME attribute.
- *
- * Any driver that wants to mark a region as reserved must use
- * efi_mem_reserve() which will insert a new EFI memory descriptor
- * into efi.memmap (splitting existing regions if necessary) and tag
- * it with EFI_MEMORY_RUNTIME.
  */
 void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 {
-	struct efi_memory_map_data data = { 0 };
-	struct efi_mem_range mr;
-	efi_memory_desc_t md;
-	int num_entries;
-	void *new;
-
-	if (efi_mem_desc_lookup(addr, &md) ||
-	    md.type != EFI_BOOT_SERVICES_DATA) {
-		pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
-		return;
-	}
-
-	if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
-		pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
-		return;
-	}
-
-	size += addr % EFI_PAGE_SIZE;
-	size = round_up(size, EFI_PAGE_SIZE);
-	addr = round_down(addr, EFI_PAGE_SIZE);
-
-	mr.range.start = addr;
-	mr.range.end = addr + size - 1;
-	mr.attribute = md.attribute | EFI_MEMORY_RUNTIME;
-
-	num_entries = efi_memmap_split_count(&md, &mr.range);
-	num_entries += efi.memmap.nr_map;
-
-	if (efi_memmap_alloc(num_entries, &data) != 0) {
-		pr_err("Could not allocate boot services memmap\n");
-		return;
-	}
-
-	new = early_memremap_prot(data.phys_map, data.size,
-				  pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));
-	if (!new) {
-		pr_err("Failed to map new boot services memmap\n");
-		return;
-	}
-
-	efi_memmap_insert(&efi.memmap, new, &mr);
-	early_memunmap(new, data.size);
-
-	efi_memmap_install(&data);
 	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
 	e820__update_table(e820_table);
 }
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 5/9] x86/efi: Unmap kernel-reserved boot regions from EFI page tables
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2026-03-06 15:57 ` [PATCH 4/4] x86/efi: Omit kernel reservations of boot services memory from memmap Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 6/9] x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry splitting Ard Biesheuvel
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

Currently, the logic that unmaps boot services code and data regions
that were mapped temporarily to work around firmware bugs disregards
regions that have been marked as EFI_MEMORY_RUNTIME. However, such
regions only have significance to the OS, and there is no reason to
retain the mapping in the EFI page tables, given that the runtime
firmware must never touch those regions.

So pull the unmap forward.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/platform/efi/quirks.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 58d00ffb1d59..e72e8b23598e 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -442,12 +442,6 @@ void __init efi_unmap_boot_services(void)
 			continue;
 		}
 
-		/* Do not free, someone else owns it: */
-		if (md->attribute & EFI_MEMORY_RUNTIME) {
-			num_entries++;
-			continue;
-		}
-
 		/*
 		 * Before calling set_virtual_address_map(), EFI boot services
 		 * code/data regions were mapped as a quirk for buggy firmware.
@@ -455,6 +449,12 @@ void __init efi_unmap_boot_services(void)
 		 */
 		efi_unmap_pages(md);
 
+		/* Do not free, someone else owns it: */
+		if (md->attribute & EFI_MEMORY_RUNTIME) {
+			num_entries++;
+			continue;
+		}
+
 		/*
 		 * Nasty quirk: if all sub-1MB memory is used for boot
 		 * services, we can get here without having allocated the
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 6/9] x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry splitting
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2026-03-06 15:57 ` [RFC PATCH 5/9] x86/efi: Unmap kernel-reserved boot regions from EFI page tables Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 7/9] x86/efi: Reuse memory map instead of reallocating it Ard Biesheuvel
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

Now that efi_mem_reserve() has been updated to rely on RSRV_KERN
memblock reservations, there is no longer any need to mark memblock
reserved regions as EFI_MEMORY_RUNTIME. As a result, existing entries
in the EFI memory map no longer need to be split, which removes the
need to re-allocate/copy/remap the entire EFI memory map on every call
to efi_mem_reserve().

So drop this functionality - it is no longer needed.

Note that, for the time being, this requires the E820 map to be
consulted when deciding whether an entry with the EFI_MEMORY_RUNTIME
bit cleared needs to be preserved in the runtime map. However, this
will be superseded and removed by a subsequent patch, which combines
the map compaction with the actual freeing, in which case the freeing
logic can answer this question directly.
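
For context on what is being removed, the sketch below restates the
counting logic of efi_memmap_split_count() (deleted in the diff that
follows) as a standalone user-space function. The sp_* name and the
flattened argument list are hypothetical; the comparisons are the same
ones as in the removed code, with inclusive end addresses.

```c
#include <stdint.h>

/*
 * Number of extra descriptors needed when the descriptor covering
 * [start, end] is split around [m_start, m_end]. All bounds are
 * inclusive, as in the removed kernel function.
 */
static int sp_split_count(uint64_t start, uint64_t end,
			  uint64_t m_start, uint64_t m_end)
{
	int count = 0;

	if (m_start <= start) {
		/* reservation covers the head: split into 2 parts */
		if (start < m_end && m_end < end)
			count++;
	}

	if (start < m_start && m_start < end) {
		/* reservation sits in the middle: split into 3 parts */
		if (m_end < end)
			count += 2;
		/* reservation covers the tail: split into 2 parts */
		if (end <= m_end)
			count++;
	}

	return count;
}
```

Since a single reservation could add up to two descriptors, the old
efi_arch_mem_reserve() had to allocate, map and install a new copy of
the memory map each time it was called. That is the work this patch
drops.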

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/efi.h     |   4 -
 arch/x86/platform/efi/memmap.c | 138 --------------------
 arch/x86/platform/efi/quirks.c |  60 +--------
 3 files changed, 5 insertions(+), 197 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 51b4cdbea061..b01dd639bf62 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -396,10 +396,6 @@ extern int __init efi_memmap_alloc(unsigned int num_entries,
 				   struct efi_memory_map_data *data);
 
 extern int __init efi_memmap_install(struct efi_memory_map_data *data);
-extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
-					 struct range *range);
-extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
-				     void *buf, struct efi_mem_range *mem);
 
 extern enum efi_secureboot_mode __x86_ima_efi_boot_mode(void);
 
diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c
index 023697c88910..8ef45014c7e7 100644
--- a/arch/x86/platform/efi/memmap.c
+++ b/arch/x86/platform/efi/memmap.c
@@ -110,141 +110,3 @@ int __init efi_memmap_install(struct efi_memory_map_data *data)
 	__efi_memmap_free(phys, size, flags);
 	return 0;
 }
-
-/**
- * efi_memmap_split_count - Count number of additional EFI memmap entries
- * @md: EFI memory descriptor to split
- * @range: Address range (start, end) to split around
- *
- * Returns the number of additional EFI memmap entries required to
- * accommodate @range.
- */
-int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range)
-{
-	u64 m_start, m_end;
-	u64 start, end;
-	int count = 0;
-
-	start = md->phys_addr;
-	end = start + (md->num_pages << EFI_PAGE_SHIFT) - 1;
-
-	/* modifying range */
-	m_start = range->start;
-	m_end = range->end;
-
-	if (m_start <= start) {
-		/* split into 2 parts */
-		if (start < m_end && m_end < end)
-			count++;
-	}
-
-	if (start < m_start && m_start < end) {
-		/* split into 3 parts */
-		if (m_end < end)
-			count += 2;
-		/* split into 2 parts */
-		if (end <= m_end)
-			count++;
-	}
-
-	return count;
-}
-
-/**
- * efi_memmap_insert - Insert a memory region in an EFI memmap
- * @old_memmap: The existing EFI memory map structure
- * @buf: Address of buffer to store new map
- * @mem: Memory map entry to insert
- *
- * It is suggested that you call efi_memmap_split_count() first
- * to see how large @buf needs to be.
- */
-void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
-			      struct efi_mem_range *mem)
-{
-	u64 m_start, m_end, m_attr;
-	efi_memory_desc_t *md;
-	u64 start, end;
-	void *old, *new;
-
-	/* modifying range */
-	m_start = mem->range.start;
-	m_end = mem->range.end;
-	m_attr = mem->attribute;
-
-	/*
-	 * The EFI memory map deals with regions in EFI_PAGE_SIZE
-	 * units. Ensure that the region described by 'mem' is aligned
-	 * correctly.
-	 */
-	if (!IS_ALIGNED(m_start, EFI_PAGE_SIZE) ||
-	    !IS_ALIGNED(m_end + 1, EFI_PAGE_SIZE)) {
-		WARN_ON(1);
-		return;
-	}
-
-	for (old = old_memmap->map, new = buf;
-	     old < old_memmap->map_end;
-	     old += old_memmap->desc_size, new += old_memmap->desc_size) {
-
-		/* copy original EFI memory descriptor */
-		memcpy(new, old, old_memmap->desc_size);
-		md = new;
-		start = md->phys_addr;
-		end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
-
-		if (m_start <= start && end <= m_end)
-			md->attribute |= m_attr;
-
-		if (m_start <= start &&
-		    (start < m_end && m_end < end)) {
-			/* first part */
-			md->attribute |= m_attr;
-			md->num_pages = (m_end - md->phys_addr + 1) >>
-				EFI_PAGE_SHIFT;
-			/* latter part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->phys_addr = m_end + 1;
-			md->num_pages = (end - md->phys_addr + 1) >>
-				EFI_PAGE_SHIFT;
-		}
-
-		if ((start < m_start && m_start < end) && m_end < end) {
-			/* first part */
-			md->num_pages = (m_start - md->phys_addr) >>
-				EFI_PAGE_SHIFT;
-			/* middle part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->attribute |= m_attr;
-			md->phys_addr = m_start;
-			md->num_pages = (m_end - m_start + 1) >>
-				EFI_PAGE_SHIFT;
-			/* last part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->phys_addr = m_end + 1;
-			md->num_pages = (end - m_end) >>
-				EFI_PAGE_SHIFT;
-		}
-
-		if ((start < m_start && m_start < end) &&
-		    (end <= m_end)) {
-			/* first part */
-			md->num_pages = (m_start - md->phys_addr) >>
-				EFI_PAGE_SHIFT;
-			/* latter part */
-			new += old_memmap->desc_size;
-			memcpy(new, old, old_memmap->desc_size);
-			md = new;
-			md->phys_addr = m_start;
-			md->num_pages = (end - md->phys_addr + 1) >>
-				EFI_PAGE_SHIFT;
-			md->attribute |= m_attr;
-		}
-	}
-}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index e72e8b23598e..8a4a0c6b64bc 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -239,63 +239,9 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
  * buggy implementations we reserve boot services region during EFI
  * init and make sure it stays executable. Then, after
  * SetVirtualAddressMap(), it is discarded.
- *
- * However, some boot services regions contain data that is required
- * by drivers, so we need to track which memory ranges can never be
- * freed. This is done by tagging those regions with the
- * EFI_MEMORY_RUNTIME attribute.
- *
- * Any driver that wants to mark a region as reserved must use
- * efi_mem_reserve() which will insert a new EFI memory descriptor
- * into efi.memmap (splitting existing regions if necessary) and tag
- * it with EFI_MEMORY_RUNTIME.
  */
 void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 {
-	struct efi_memory_map_data data = { 0 };
-	struct efi_mem_range mr;
-	efi_memory_desc_t md;
-	int num_entries;
-	void *new;
-
-	if (efi_mem_desc_lookup(addr, &md) ||
-	    md.type != EFI_BOOT_SERVICES_DATA) {
-		pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
-		return;
-	}
-
-	if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
-		pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
-		return;
-	}
-
-	size += addr % EFI_PAGE_SIZE;
-	size = round_up(size, EFI_PAGE_SIZE);
-	addr = round_down(addr, EFI_PAGE_SIZE);
-
-	mr.range.start = addr;
-	mr.range.end = addr + size - 1;
-	mr.attribute = md.attribute | EFI_MEMORY_RUNTIME;
-
-	num_entries = efi_memmap_split_count(&md, &mr.range);
-	num_entries += efi.memmap.nr_map;
-
-	if (efi_memmap_alloc(num_entries, &data) != 0) {
-		pr_err("Could not allocate boot services memmap\n");
-		return;
-	}
-
-	new = early_memremap_prot(data.phys_map, data.size,
-				  pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));
-	if (!new) {
-		pr_err("Failed to map new boot services memmap\n");
-		return;
-	}
-
-	efi_memmap_insert(&efi.memmap, new, &mr);
-	early_memunmap(new, data.size);
-
-	efi_memmap_install(&data);
 	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
 	e820__update_table(e820_table);
 }
@@ -509,8 +455,12 @@ void __init efi_unmap_boot_services(void)
 	for_each_efi_memory_desc(md) {
 		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
 		    (md->type == EFI_BOOT_SERVICES_CODE ||
-		     md->type == EFI_BOOT_SERVICES_DATA))
+		     md->type == EFI_BOOT_SERVICES_DATA) &&
+		    !e820__mapped_any(md->phys_addr,
+				      md->phys_addr + md->num_pages * EFI_PAGE_SIZE,
+				      E820_TYPE_RESERVED)) {
 			continue;
+		}
 
 		memcpy(new_md, md, efi.memmap.desc_size);
 		new_md += efi.memmap.desc_size;
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 7/9] x86/efi: Reuse memory map instead of reallocating it
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2026-03-06 15:57 ` [RFC PATCH 6/9] x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry splitting Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 8/9] x86/efi: Defer compaction of the EFI memory map Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 9/9] x86/efi: Free unused tail " Ard Biesheuvel
  9 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

The EFI memory map consists of tens to hundreds of entries of around 40 bytes
each. The initial version is allocated and populated by the EFI stub,
but later on, after freeing the boot services data regions and pruning
the associated entries, a new memory map is allocated with room for only
the remaining entries, which are typically much fewer in number.

Given that the original allocation is never freed, this does not
actually save any memory, and it is much simpler to just move the
entries that need to be preserved to the beginning of the map, and to
truncate it. That way, a lot of the complicated memory map allocation
and freeing code can simply be dropped.
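
(Editorial sketch, not part of the patch.) The "move the preserved entries
to the beginning and truncate" idea can be illustrated in plain C. The
names demo_desc, demo_keep and demo_compact are hypothetical, and the
predicate is deliberately simplified: it only looks at the descriptor
type, whereas the real code also consults attributes and reservations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an EFI memory descriptor; not the kernel type. */
struct demo_desc {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* UEFI memory type values for boot services code and data. */
#define DEMO_BOOT_SERVICES_CODE 3
#define DEMO_BOOT_SERVICES_DATA 4

/* Simplified predicate (assumption): drop boot services entries, keep the rest. */
static int demo_keep(const struct demo_desc *d)
{
	return d->type != DEMO_BOOT_SERVICES_CODE &&
	       d->type != DEMO_BOOT_SERVICES_DATA;
}

/*
 * Compact the map in place. Entries for which keep() returns nonzero are
 * moved to the front in their original order. desc_size is the stride
 * between entries and may be larger than sizeof(struct demo_desc); the
 * buffer and stride are assumed to be suitably aligned for demo_desc.
 * Returns the size in bytes of the compacted map.
 */
static size_t demo_compact(void *map, size_t nr, size_t desc_size,
			   int (*keep)(const struct demo_desc *))
{
	char *src = map;
	char *dst = map;

	for (size_t i = 0; i < nr; i++, src += desc_size) {
		if (!keep((const struct demo_desc *)src))
			continue;
		/* dst never runs ahead of src, so earlier entries are safe. */
		if (dst != src)
			memmove(dst, src, desc_size);
		dst += desc_size;
	}
	return (size_t)(dst - (char *)map);
}
```

Because the write cursor never passes the read cursor, no second buffer is
needed; the returned byte count plays the role of the truncated map size.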

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/efi.h     |  3 -
 arch/x86/platform/efi/memmap.c | 83 +-------------------
 arch/x86/platform/efi/quirks.c | 30 +++----
 include/linux/efi.h            |  2 -
 4 files changed, 10 insertions(+), 108 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index b01dd639bf62..ec352a8f6e7a 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -392,9 +392,6 @@ static inline void efi_reserve_boot_services(void)
 }
 #endif /* CONFIG_EFI */
 
-extern int __init efi_memmap_alloc(unsigned int num_entries,
-				   struct efi_memory_map_data *data);
-
 extern int __init efi_memmap_install(struct efi_memory_map_data *data);
 
 extern enum efi_secureboot_mode __x86_ima_efi_boot_mode(void);
diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c
index 8ef45014c7e7..951a90235abb 100644
--- a/arch/x86/platform/efi/memmap.c
+++ b/arch/x86/platform/efi/memmap.c
@@ -8,78 +8,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/efi.h>
-#include <linux/io.h>
-#include <asm/early_ioremap.h>
 #include <asm/efi.h>
-#include <linux/memblock.h>
-#include <linux/slab.h>
-
-static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
-{
-	return memblock_phys_alloc(size, SMP_CACHE_BYTES);
-}
-
-static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
-{
-	unsigned int order = get_order(size);
-	struct page *p = alloc_pages(GFP_KERNEL, order);
-
-	if (!p)
-		return 0;
-
-	return PFN_PHYS(page_to_pfn(p));
-}
-
-static
-void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
-{
-	if (flags & EFI_MEMMAP_MEMBLOCK) {
-		if (slab_is_available())
-			memblock_free_late(phys, size);
-		else
-			memblock_phys_free(phys, size);
-	} else if (flags & EFI_MEMMAP_SLAB) {
-		struct page *p = pfn_to_page(PHYS_PFN(phys));
-		unsigned int order = get_order(size);
-
-		__free_pages(p, order);
-	}
-}
-
-/**
- * efi_memmap_alloc - Allocate memory for the EFI memory map
- * @num_entries: Number of entries in the allocated map.
- * @data: efi memmap installation parameters
- *
- * Depending on whether mm_init() has already been invoked or not,
- * either memblock or "normal" page allocation is used.
- *
- * Returns zero on success, a negative error code on failure.
- */
-int __init efi_memmap_alloc(unsigned int num_entries,
-		struct efi_memory_map_data *data)
-{
-	/* Expect allocation parameters are zero initialized */
-	WARN_ON(data->phys_map || data->size);
-
-	data->size = num_entries * efi.memmap.desc_size;
-	data->desc_version = efi.memmap.desc_version;
-	data->desc_size = efi.memmap.desc_size;
-	data->flags &= ~(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK);
-	data->flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
-
-	if (slab_is_available()) {
-		data->flags |= EFI_MEMMAP_SLAB;
-		data->phys_map = __efi_memmap_alloc_late(data->size);
-	} else {
-		data->flags |= EFI_MEMMAP_MEMBLOCK;
-		data->phys_map = __efi_memmap_alloc_early(data->size);
-	}
-
-	if (!data->phys_map)
-		return -ENOMEM;
-	return 0;
-}
 
 /**
  * efi_memmap_install - Install a new EFI memory map in efi.memmap
@@ -93,20 +22,10 @@ int __init efi_memmap_alloc(unsigned int num_entries,
  */
 int __init efi_memmap_install(struct efi_memory_map_data *data)
 {
-	unsigned long size = efi.memmap.desc_size * efi.memmap.nr_map;
-	unsigned long flags = efi.memmap.flags;
-	u64 phys = efi.memmap.phys_map;
-	int ret;
-
 	efi_memmap_unmap();
 
 	if (efi_enabled(EFI_PARAVIRT))
 		return 0;
 
-	ret = __efi_memmap_init(data);
-	if (ret)
-		return ret;
-
-	__efi_memmap_free(phys, size, flags);
-	return 0;
+	return __efi_memmap_init(data);
 }
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 8a4a0c6b64bc..5bf97376c1a0 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -359,12 +359,15 @@ static struct efi_freeable_range *ranges_to_free;
 
 void __init efi_unmap_boot_services(void)
 {
-	struct efi_memory_map_data data = { 0 };
+	struct efi_memory_map_data data = {
+		.phys_map	= efi.memmap.phys_map,
+		.desc_version	= efi.memmap.desc_version,
+		.desc_size	= efi.memmap.desc_size,
+	};
 	efi_memory_desc_t *md;
-	int num_entries = 0;
+	void *new_md;
 	int idx = 0;
 	size_t sz;
-	void *new, *new_md;
 
 	/* Keep all regions for /sys/kernel/debug/efi */
 	if (efi_enabled(EFI_DBG))
@@ -377,6 +380,7 @@ void __init efi_unmap_boot_services(void)
 		return;
 	}
 
+	new_md = efi.memmap.map;
 	for_each_efi_memory_desc(md) {
 		unsigned long long start = md->phys_addr;
 		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
@@ -384,7 +388,6 @@ void __init efi_unmap_boot_services(void)
 
 		if (md->type != EFI_BOOT_SERVICES_CODE &&
 		    md->type != EFI_BOOT_SERVICES_DATA) {
-			num_entries++;
 			continue;
 		}
 
@@ -397,7 +400,6 @@ void __init efi_unmap_boot_services(void)
 
 		/* Do not free, someone else owns it: */
 		if (md->attribute & EFI_MEMORY_RUNTIME) {
-			num_entries++;
 			continue;
 		}
 
@@ -432,26 +434,12 @@ void __init efi_unmap_boot_services(void)
 		idx++;
 	}
 
-	if (!num_entries)
-		return;
-
-	if (efi_memmap_alloc(num_entries, &data) != 0) {
-		pr_err("Failed to allocate new EFI memmap\n");
-		return;
-	}
-
-	new = memremap(data.phys_map, data.size, MEMREMAP_WB);
-	if (!new) {
-		pr_err("Failed to map new EFI memmap\n");
-		return;
-	}
-
 	/*
 	 * Build a new EFI memmap that excludes any boot services
 	 * regions that are not tagged EFI_MEMORY_RUNTIME, since those
 	 * regions have now been freed.
 	 */
-	new_md = new;
+	new_md = efi.memmap.map;
 	for_each_efi_memory_desc(md) {
 		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
 		    (md->type == EFI_BOOT_SERVICES_CODE ||
@@ -466,7 +454,7 @@ void __init efi_unmap_boot_services(void)
 		new_md += efi.memmap.desc_size;
 	}
 
-	memunmap(new);
+	data.size = new_md - efi.memmap.map;
 
 	if (efi_memmap_install(&data) != 0) {
 		pr_err("Could not install new EFI memmap\n");
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 664898d09ff5..dbf5971dd1c5 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -572,8 +572,6 @@ struct efi_memory_map {
 	unsigned long desc_version;
 	unsigned long desc_size;
 #define EFI_MEMMAP_LATE (1UL << 0)
-#define EFI_MEMMAP_MEMBLOCK (1UL << 1)
-#define EFI_MEMMAP_SLAB (1UL << 2)
 	unsigned long flags;
 };
 
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 8/9] x86/efi: Defer compaction of the EFI memory map
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2026-03-06 15:57 ` [RFC PATCH 7/9] x86/efi: Reuse memory map instead of reallocating it Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  2026-03-06 15:57 ` [RFC PATCH 9/9] x86/efi: Free unused tail " Ard Biesheuvel
  9 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

Currently, the EFI memory map is compacted early at boot, to leave only
the entries that are significant to the current kernel or potentially a
kexec'ed kernel that comes after, and to suppress all boot services code
and data entries that have no correspondence with anything that either
the firmware or the kernel treats as reserved for firmware use.

Given that actually freeing those regions to the page allocator is not
possible yet at this point, those suppressed entries are converted into
yet another type of temporary memory reservation map, and freed during
an arch_initcall(), which is the earliest convenient time to actually
perform this operation.

Given that compacting the memory map does not need to occur that early
to begin with, move it to the arch_initcall(). This removes the need for
the special memory reservation map, as the entries still exist at this
point, and can be consulted directly to decide whether they need to be
preserved in their entirety or only partially.
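
(Editorial sketch, not part of the patch.) The rule for deciding whether
a boot services entry survives compaction can be written as a small
predicate: the entry is dropped only when every byte of the region above
1M was actually freed. demo_preserve_entry and DEMO_SZ_1M are hypothetical
names, and freed_bytes stands for whatever the freeing routine reports for
the clamped range. The explicit guard for regions that lie entirely below
1M is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SZ_1M (1ULL << 20)

/*
 * Decide whether a boot services descriptor must stay in the memory map.
 * phys_addr/size describe the region; freed_bytes is the number of bytes
 * that were released from the part of the region at or above 1M.
 */
static bool demo_preserve_entry(uint64_t phys_addr, uint64_t size,
				uint64_t freed_bytes)
{
	uint64_t start = phys_addr > DEMO_SZ_1M ? phys_addr : DEMO_SZ_1M;
	uint64_t end = phys_addr + size;

	/* Nothing above 1M: nothing could be freed, so keep the entry. */
	if (end <= start)
		return true;

	/* Drop the entry only if the whole clamped range was freed. */
	return freed_bytes != end - start;
}
```

A partially freed region keeps its entry, which is what lets reserved
sub-ranges remain visible to a later kexec'ed kernel.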

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/platform/efi/quirks.c | 130 +++++++-------------
 1 file changed, 46 insertions(+), 84 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 5bf97376c1a0..d7a64b404bea 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -350,37 +350,10 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
 		pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
 }
 
-struct efi_freeable_range {
-	u64 start;
-	u64 end;
-};
-
-static struct efi_freeable_range *ranges_to_free;
-
 void __init efi_unmap_boot_services(void)
 {
-	struct efi_memory_map_data data = {
-		.phys_map	= efi.memmap.phys_map,
-		.desc_version	= efi.memmap.desc_version,
-		.desc_size	= efi.memmap.desc_size,
-	};
 	efi_memory_desc_t *md;
-	void *new_md;
-	int idx = 0;
-	size_t sz;
 
-	/* Keep all regions for /sys/kernel/debug/efi */
-	if (efi_enabled(EFI_DBG))
-		return;
-
-	sz = sizeof(*ranges_to_free) * efi.memmap.nr_map + 1;
-	ranges_to_free = kzalloc(sz, GFP_KERNEL);
-	if (!ranges_to_free) {
-		pr_err("Failed to allocate storage for freeable EFI regions\n");
-		return;
-	}
-
-	new_md = efi.memmap.map;
 	for_each_efi_memory_desc(md) {
 		unsigned long long start = md->phys_addr;
 		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
@@ -394,15 +367,10 @@ void __init efi_unmap_boot_services(void)
 		/*
 		 * Before calling set_virtual_address_map(), EFI boot services
 		 * code/data regions were mapped as a quirk for buggy firmware.
-		 * Unmap them from efi_pgd before freeing them up.
+		 * Unmap them from efi_pgd, they will be freed later.
 		 */
 		efi_unmap_pages(md);
 
-		/* Do not free, someone else owns it: */
-		if (md->attribute & EFI_MEMORY_RUNTIME) {
-			continue;
-		}
-
 		/*
 		 * Nasty quirk: if all sub-1MB memory is used for boot
 		 * services, we can get here without having allocated the
@@ -416,49 +384,14 @@ void __init efi_unmap_boot_services(void)
 		 * this happened, but Linux should still try to boot rather
 		 * panicking early.)
 		 */
-		rm_size = real_mode_size_needed();
+		rm_size = PAGE_ALIGN(real_mode_size_needed());
 		if (rm_size && (start + rm_size) < (1<<20) && size >= rm_size) {
 			set_real_mode_mem(start);
-			start += rm_size;
-			size -= rm_size;
-		}
-
-		/*
-		 * With CONFIG_DEFERRED_STRUCT_PAGE_INIT parts of the memory
-		 * map are still not initialized and we can't reliably free
-		 * memory here.
-		 * Queue the ranges to free at a later point.
-		 */
-		ranges_to_free[idx].start = start;
-		ranges_to_free[idx].end = start + size;
-		idx++;
-	}
 
-	/*
-	 * Build a new EFI memmap that excludes any boot services
-	 * regions that are not tagged EFI_MEMORY_RUNTIME, since those
-	 * regions have now been freed.
-	 */
-	new_md = efi.memmap.map;
-	for_each_efi_memory_desc(md) {
-		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
-		    (md->type == EFI_BOOT_SERVICES_CODE ||
-		     md->type == EFI_BOOT_SERVICES_DATA) &&
-		    !e820__mapped_any(md->phys_addr,
-				      md->phys_addr + md->num_pages * EFI_PAGE_SIZE,
-				      E820_TYPE_RESERVED)) {
-			continue;
+			/* Remove the allocated space from the descriptor */
+			md->phys_addr += rm_size;
+			md->num_pages -= rm_size / EFI_PAGE_SIZE;
 		}
-
-		memcpy(new_md, md, efi.memmap.desc_size);
-		new_md += efi.memmap.desc_size;
-	}
-
-	data.size = new_md - efi.memmap.map;
-
-	if (efi_memmap_install(&data) != 0) {
-		pr_err("Could not install new EFI memmap\n");
-		return;
 	}
 }
 
@@ -498,24 +431,53 @@ efi_free_unreserved_subregions(u64 range_start, u64 range_end)
 
 static int __init efi_free_boot_services(void)
 {
-	struct efi_freeable_range *range = ranges_to_free;
+	struct efi_memory_map_data data = {
+		.phys_map	= efi.memmap.phys_map,
+		.desc_version	= efi.memmap.desc_version,
+		.desc_size	= efi.memmap.desc_size,
+	};
 	unsigned long freed = 0;
+	efi_memory_desc_t *md;
+	void *new_md;
 
-	if (!ranges_to_free)
+	/* Keep all regions for /sys/kernel/debug/efi */
+	if (efi_enabled(EFI_DBG))
 		return 0;
 
-	while (range->start) {
-		/*
-		 * Don't free memory under 1M for two reasons:
-		 * - BIOS might clobber it
-		 * - Crash kernel needs it to be reserved
-		 */
-		u64 start = max(range->start, SZ_1M);
+	new_md = efi.memmap.map;
+	for_each_efi_memory_desc(md) {
+		u64 md_start = max(md->phys_addr, SZ_1M);
+		u64 md_end = md->phys_addr + md->num_pages * EFI_PAGE_SIZE;
+		bool preserve_entry = true;
+
+		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
+		    (md->type == EFI_BOOT_SERVICES_CODE ||
+		     md->type == EFI_BOOT_SERVICES_DATA)) {
+			u64 f = efi_free_unreserved_subregions(md_start, md_end);
+
+			/*
+			 * Omit the memory map entry of this region only if it
+			 * has been freed entirely. This ensures that boot data
+			 * regions for things like ESRT and BGRT tables carry
+			 * over correctly during kexec.
+			 */
+			if (f == md_end - md_start)
+				preserve_entry = false;
+
+			freed += f;
+		}
 
-		freed += efi_free_unreserved_subregions(start, range->end);
-		range++;
+		if (preserve_entry) {
+			if (new_md != md)
+				memcpy(new_md, md, efi.memmap.desc_size);
+			new_md += efi.memmap.desc_size;
+		}
 	}
-	kfree(ranges_to_free);
+
+	data.size = new_md - efi.memmap.map;
+
+	if (efi_memmap_install(&data) != 0)
+		pr_err("Could not install new EFI memmap\n");
 
 	if (freed)
 		pr_info("Freeing EFI boot services memory: %ldK\n", freed / SZ_1K);
-- 
2.53.0.473.g4a7958ca14-goog



* [RFC PATCH 9/9] x86/efi: Free unused tail of the EFI memory map
  2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2026-03-06 15:57 ` [RFC PATCH 8/9] x86/efi: Defer compaction of the EFI memory map Ard Biesheuvel
@ 2026-03-06 15:57 ` Ard Biesheuvel
  9 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 15:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-efi, x86, Ard Biesheuvel, Mike Rapoport (Microsoft),
	Benjamin Herrenschmidt

From: Ard Biesheuvel <ardb@kernel.org>

After moving the relevant entries to the start of the map, the remainder
can be handed back to the page allocator.
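
(Editorial sketch, not part of the patch.) Assuming that only whole pages
can be returned to the page allocator, the amount of memory recoverable
from the unused tail is the page-aligned interior of the range between
the end of the compacted map and the end of the original allocation.
demo_tail_whole_pages and DEMO_PAGE_SIZE are hypothetical names, and a
4 KiB page size is assumed.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL

/*
 * Return the number of bytes in whole pages that lie entirely inside
 * [used_end, alloc_end). The start is rounded up and the end is rounded
 * down to a page boundary; if no full page fits, the result is 0.
 */
static uint64_t demo_tail_whole_pages(uint64_t used_end, uint64_t alloc_end)
{
	uint64_t first = (used_end + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
	uint64_t last = alloc_end & ~(DEMO_PAGE_SIZE - 1);

	return last > first ? last - first : 0;
}
```

Under that assumption the byte distance between the two pointers is an
upper bound on what is reclaimed, not necessarily the exact amount.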

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/platform/efi/quirks.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index d7a64b404bea..4d94b1e82c28 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -475,10 +475,15 @@ static int __init efi_free_boot_services(void)
 	}
 
 	data.size = new_md - efi.memmap.map;
+	md = efi.memmap.map_end;
 
 	if (efi_memmap_install(&data) != 0)
 		pr_err("Could not install new EFI memmap\n");
 
+	/* Free the part of the memory map allocation that has become unused */
+	free_reserved_area(new_md, md, -1, NULL);
+	freed += (void *)md - new_md;
+
 	if (freed)
 		pr_info("Freeing EFI boot services memory: %ldK\n", freed / SZ_1K);
 
-- 
2.53.0.473.g4a7958ca14-goog



* Re: [PATCH 4/4] x86/efi: Omit kernel reservations of boot services memory from memmap
  2026-03-06 15:57 ` [PATCH 4/4] x86/efi: Omit kernel reservations of boot services memory from memmap Ard Biesheuvel
@ 2026-03-06 16:00   ` Ard Biesheuvel
  0 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2026-03-06 16:00 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-kernel
  Cc: linux-efi, x86, Mike Rapoport, Benjamin Herrenschmidt

Please disregard this 4/4 - it is a stale version from my working dir, apologies.

On Fri, 6 Mar 2026, at 16:57, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Now that efi_mem_reserve() has been updated to rely on RSRV_KERN
> memblock reservations, it is no longer needed to mark memblock reserved
> regions as EFI_MEMORY_RUNTIME. This means that it is no longer needed to
> split existing entries in the EFI memory map, removing the need to
> re-allocate/copy/remap the entire EFI memory map on every call to
> efi_mem_reserve().
>
> So drop this functionality - it is no longer needed.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/include/asm/efi.h     |   4 -
>  arch/x86/platform/efi/memmap.c | 138 --------------------
>  arch/x86/platform/efi/quirks.c |  54 --------
>  3 files changed, 196 deletions(-)
>


* Re: [RFC PATCH 1/9] memblock: Permit existing reserved regions to be marked RSRV_KERN
  2026-03-06 15:57 ` [RFC PATCH 1/9] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
@ 2026-03-16  6:53   ` Mike Rapoport
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2026-03-16  6:53 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel,
	Benjamin Herrenschmidt

On Fri, Mar 06, 2026 at 04:57:05PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Permit existing memblock reservations to be marked as RSRV_KERN. This
> will be used by the EFI code on x86 to distinguish between reservations
> of boot services data regions that have actual significance to the
> kernel and regions that are reserved temporarily to work around buggy
> firmware.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c            | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 6ec5e9ac0699..9eac4f268359 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -155,6 +155,7 @@ int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
>  int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
>  int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
>  int memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t size);
> +int memblock_reserved_mark_kern(phys_addr_t base, phys_addr_t size);
>  int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size);
>  int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size);
>  
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b3ddfdec7a80..2505ce8b319c 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1115,6 +1115,21 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t
>  				    MEMBLOCK_RSRV_NOINIT);
>  }
>  
> +/**
> + * memblock_reserved_mark_kern - Mark a reserved memory region with flag
> + * MEMBLOCK_RSRV_KERN
> + *
> + * @base: the base phys addr of the region
> + * @size: the size of the region
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +int __init_memblock memblock_reserved_mark_kern(phys_addr_t base, phys_addr_t size)
> +{
> +	return memblock_setclr_flag(&memblock.reserved, base, size, 1,
> +				    MEMBLOCK_RSRV_KERN);
> +}
> +
>  /**
>   * memblock_mark_kho_scratch - Mark a memory region as MEMBLOCK_KHO_SCRATCH.
>   * @base: the base phys addr of the region
> -- 
> 2.53.0.473.g4a7958ca14-goog
> 

-- 
Sincerely yours,
Mike.


* Re: [RFC PATCH 2/9] efi: Tag memblock reservations of boot services regions as RSRV_KERN
  2026-03-06 15:57 ` [RFC PATCH 2/9] efi: Tag memblock reservations of boot services regions as RSRV_KERN Ard Biesheuvel
@ 2026-03-16  6:55   ` Mike Rapoport
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2026-03-16  6:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel,
	Benjamin Herrenschmidt

On Fri, Mar 06, 2026 at 04:57:06PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> By definition, EFI memory regions of type boot services code or data
> have no special significance to the firmware at runtime, only to the OS.
> In some cases, the firmware will allocate tables and other assets that
> are passed in memory in regions of this type, and leave it up to the OS
> to decide whether or not to treat the allocation as special, or simply
> consume the contents at boot and recycle the RAM for ordinary use. The
> reason for this approach is that it avoids needless memory reservations
> for assets that the OS knows nothing about, and therefore doesn't know
> how to free either.
> 
> This means that any memblock reservations covering such regions can be
> marked as MEMBLOCK_RSRV_KERN - this is a better match semantically, and
> is useful on x86 to distinguish true reservations from temporary
> reservations that are only needed to work around firmware bugs.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  drivers/firmware/efi/efi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index b2fb92a4bbd1..e4ab7481bbf6 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -600,7 +600,9 @@ void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>  		return;
>  
>  	if (!memblock_is_region_reserved(addr, size))
> -		memblock_reserve(addr, size);
> +		memblock_reserve_kern(addr, size);
> +	else
> +		memblock_reserved_mark_kern(addr, size);
>  
>  	/*
>  	 * Some architectures (x86) reserve all boot services ranges
> -- 
> 2.53.0.473.g4a7958ca14-goog
> 

-- 
Sincerely yours,
Mike.


end of thread, other threads:[~2026-03-16  6:55 UTC | newest]

Thread overview: 14+ messages
2026-03-06 15:57 [RFC PATCH 0/9] efi/x86: Avoid the need to mangle the EFI memory map Ard Biesheuvel
2026-03-06 15:57 ` [RFC PATCH 1/9] memblock: Permit existing reserved regions to be marked RSRV_KERN Ard Biesheuvel
2026-03-16  6:53   ` Mike Rapoport
2026-03-06 15:57 ` [RFC PATCH 2/9] efi: Tag memblock reservations of boot services regions as RSRV_KERN Ard Biesheuvel
2026-03-16  6:55   ` Mike Rapoport
2026-03-06 15:57 ` [RFC PATCH 3/9] x86/efi: Omit RSRV_KERN memblock reservations when freeing boot regions Ard Biesheuvel
2026-03-06 15:57 ` [RFC PATCH 4/9] x86/efi: Defer sub-1M check from unmap to free stage Ard Biesheuvel
2026-03-06 15:57 ` [PATCH 4/4] x86/efi: Omit kernel reservations of boot services memory from memmap Ard Biesheuvel
2026-03-06 16:00   ` Ard Biesheuvel
2026-03-06 15:57 ` [RFC PATCH 5/9] x86/efi: Unmap kernel-reserved boot regions from EFI page tables Ard Biesheuvel
2026-03-06 15:57 ` [RFC PATCH 6/9] x86/efi: Do not rely on EFI_MEMORY_RUNTIME bit and avoid entry splitting Ard Biesheuvel
2026-03-06 15:57 ` [RFC PATCH 7/9] x86/efi: Reuse memory map instead of reallocating it Ard Biesheuvel
2026-03-06 15:57 ` [RFC PATCH 8/9] x86/efi: Defer compaction of the EFI memory map Ard Biesheuvel
2026-03-06 15:57 ` [RFC PATCH 9/9] x86/efi: Free unused tail " Ard Biesheuvel
