* [PATCH v2 1/9] memblock: reserve_mem: fix end calculation in reserve_mem_release_by_name()
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 2/9] powerpc: fadump: pair alloc_pages_exact() with free_pages_exact() Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Potapenko, Alexander Viro, Andreas Larsson,
Ard Biesheuvel, Borislav Petkov, Brendan Jackman,
Christophe Leroy (CS GROUP), Catalin Marinas, Christian Brauner,
David S. Miller, Dave Hansen, David Hildenbrand, Dmitry Vyukov,
Ilias Apalodimas, Ingo Molnar, Jan Kara, Johannes Weiner,
Liam R. Howlett, Lorenzo Stoakes, Madhavan Srinivasan,
Marco Elver, Marek Szyprowski, Masami Hiramatsu, Michael Ellerman,
Michal Hocko, Mike Rapoport, Nicholas Piggin, H. Peter Anvin,
Rob Herring, Robin Murphy, Saravana Kannan, Suren Baghdasaryan,
Thomas Gleixner, Vlastimil Babka, Will Deacon, Zi Yan, devicetree,
iommu, kasan-dev, linux-arm-kernel, linux-efi, linux-fsdevel,
linux-kernel, linux-mm, linux-trace-kernel, linuxppc-dev,
sparclinux, x86
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
free_reserved_area() expects its end parameter to point to the first address
after the area, but reserve_mem_release_by_name() passes it the last
address inside the area.
Remove the subtraction of one in the calculation of the area end.
Fixes: 74e2498ccf7b ("mm/memblock: Add reserved memory release function")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/memblock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index b3ddfdec7a80..d4a02f1750e9 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2434,7 +2434,7 @@ int reserve_mem_release_by_name(const char *name)
return 0;
start = phys_to_virt(map->start);
- end = start + map->size - 1;
+ end = start + map->size;
snprintf(buf, sizeof(buf), "reserve_mem:%s", name);
free_reserved_area(start, end, 0, buf);
map->size = 0;
--
2.53.0
* [PATCH v2 2/9] powerpc: fadump: pair alloc_pages_exact() with free_pages_exact()
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 1/9] memblock: reserve_mem: fix end calculation in reserve_mem_release_by_name() Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 3/9] powerpc: opal-core: " Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
fadump allocates buffers with alloc_pages_exact(), but then marks them
as reserved and frees them using free_reserved_area().
This is completely unnecessary: pages allocated with
alloc_pages_exact() can be naturally freed with free_pages_exact().
Replace the freeing of memory in fadump_free_buffer() with
free_pages_exact() and simplify the allocation code so that it no longer
marks the allocated pages as reserved.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/powerpc/kernel/fadump.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4ebc333dd786..501d43bf18f3 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -775,24 +775,12 @@ void __init fadump_update_elfcore_header(char *bufp)
static void *__init fadump_alloc_buffer(unsigned long size)
{
- unsigned long count, i;
- struct page *page;
- void *vaddr;
-
- vaddr = alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
- if (!vaddr)
- return NULL;
-
- count = PAGE_ALIGN(size) / PAGE_SIZE;
- page = virt_to_page(vaddr);
- for (i = 0; i < count; i++)
- mark_page_reserved(page + i);
- return vaddr;
+ return alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
}
static void fadump_free_buffer(unsigned long vaddr, unsigned long size)
{
- free_reserved_area((void *)vaddr, (void *)(vaddr + size), -1, NULL);
+ free_pages_exact((void *)vaddr, size);
}
s32 __init fadump_setup_cpu_notes_buf(u32 num_cpus)
--
2.53.0
* [PATCH v2 3/9] powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact()
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 1/9] memblock: reserve_mem: fix end calculation in reserve_mem_release_by_name() Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 2/9] powerpc: fadump: pair alloc_pages_exact() with free_pages_exact() Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 4/9] mm: move free_reserved_area() to mm/memblock.c Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
opal-core allocates buffers with alloc_pages_exact(), but then
marks them as reserved and frees them using free_reserved_area().
This is completely unnecessary: pages allocated with
alloc_pages_exact() can be naturally freed with free_pages_exact().
Replace the freeing of memory in opalcore_cleanup() with
free_pages_exact() and simplify the allocation code so that it no longer
marks the allocated pages as reserved.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/powerpc/platforms/powernv/opal-core.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c
index e76e462f55f6..32662d30d70f 100644
--- a/arch/powerpc/platforms/powernv/opal-core.c
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -303,7 +303,6 @@ static int __init create_opalcore(void)
struct device_node *dn;
struct opalcore *new;
loff_t opalcore_off;
- struct page *page;
Elf64_Phdr *phdr;
Elf64_Ehdr *elf;
int i, ret;
@@ -328,11 +327,6 @@ static int __init create_opalcore(void)
oc_conf->opalcorebuf_sz = 0;
return -ENOMEM;
}
- count = oc_conf->opalcorebuf_sz / PAGE_SIZE;
- page = virt_to_page(oc_conf->opalcorebuf);
- for (i = 0; i < count; i++)
- mark_page_reserved(page + i);
-
pr_debug("opalcorebuf = 0x%llx\n", (u64)oc_conf->opalcorebuf);
/* Read OPAL related device-tree entries */
@@ -437,10 +431,7 @@ static void opalcore_cleanup(void)
/* free the buffer used for setting up OPAL core */
if (oc_conf->opalcorebuf) {
- void *end = (void *)((u64)oc_conf->opalcorebuf +
- oc_conf->opalcorebuf_sz);
-
- free_reserved_area(oc_conf->opalcorebuf, end, -1, NULL);
+ free_pages_exact(oc_conf->opalcorebuf, oc_conf->opalcorebuf_sz);
oc_conf->opalcorebuf = NULL;
oc_conf->opalcorebuf_sz = 0;
}
--
2.53.0
* [PATCH v2 4/9] mm: move free_reserved_area() to mm/memblock.c
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
` (2 preceding siblings ...)
2026-03-23 7:48 ` [PATCH v2 3/9] powerpc: opal-core: " Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 5/9] memblock: make free_reserved_area() more robust Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
free_reserved_area() is related to memblock as it frees reserved memory
back to the buddy allocator, similar to what memblock_free_late() does.
Move free_reserved_area() to mm/memblock.c to prepare for further
consolidation of the functions that free reserved memory.
No functional changes.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/memblock.c | 37 ++++++++++++++++++++++++++++++-
mm/page_alloc.c | 36 ------------------------------
tools/include/linux/mm.h | 1 +
tools/testing/memblock/internal.h | 34 +++++++++++++++++++++++++---
4 files changed, 68 insertions(+), 40 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index d4a02f1750e9..c0896efbee97 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -893,6 +893,42 @@ int __init_memblock memblock_remove(phys_addr_t base, phys_addr_t size)
return memblock_remove_range(&memblock.memory, base, size);
}
+unsigned long free_reserved_area(void *start, void *end, int poison, const char *s)
+{
+ void *pos;
+ unsigned long pages = 0;
+
+ start = (void *)PAGE_ALIGN((unsigned long)start);
+ end = (void *)((unsigned long)end & PAGE_MASK);
+ for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
+ struct page *page = virt_to_page(pos);
+ void *direct_map_addr;
+
+ /*
+ * 'direct_map_addr' might be different from 'pos'
+ * because some architectures' virt_to_page()
+ * work with aliases. Getting the direct map
+ * address ensures that we get a _writeable_
+ * alias for the memset().
+ */
+ direct_map_addr = page_address(page);
+ /*
+ * Perform a kasan-unchecked memset() since this memory
+ * has not been initialized.
+ */
+ direct_map_addr = kasan_reset_tag(direct_map_addr);
+ if ((unsigned int)poison <= 0xFF)
+ memset(direct_map_addr, poison, PAGE_SIZE);
+
+ free_reserved_page(page);
+ }
+
+ if (pages && s)
+ pr_info("Freeing %s memory: %ldK\n", s, K(pages));
+
+ return pages;
+}
+
/**
* memblock_free - free boot memory allocation
* @ptr: starting address of the boot memory allocation
@@ -1776,7 +1812,6 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
totalram_pages_inc();
}
}
-
/*
* Remaining API functions
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..df3d61253001 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6234,42 +6234,6 @@ void adjust_managed_page_count(struct page *page, long count)
}
EXPORT_SYMBOL(adjust_managed_page_count);
-unsigned long free_reserved_area(void *start, void *end, int poison, const char *s)
-{
- void *pos;
- unsigned long pages = 0;
-
- start = (void *)PAGE_ALIGN((unsigned long)start);
- end = (void *)((unsigned long)end & PAGE_MASK);
- for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
- struct page *page = virt_to_page(pos);
- void *direct_map_addr;
-
- /*
- * 'direct_map_addr' might be different from 'pos'
- * because some architectures' virt_to_page()
- * work with aliases. Getting the direct map
- * address ensures that we get a _writeable_
- * alias for the memset().
- */
- direct_map_addr = page_address(page);
- /*
- * Perform a kasan-unchecked memset() since this memory
- * has not been initialized.
- */
- direct_map_addr = kasan_reset_tag(direct_map_addr);
- if ((unsigned int)poison <= 0xFF)
- memset(direct_map_addr, poison, PAGE_SIZE);
-
- free_reserved_page(page);
- }
-
- if (pages && s)
- pr_info("Freeing %s memory: %ldK\n", s, K(pages));
-
- return pages;
-}
-
void free_reserved_page(struct page *page)
{
clear_page_tag_ref(page);
diff --git a/tools/include/linux/mm.h b/tools/include/linux/mm.h
index 028f3faf46e7..4407d8396108 100644
--- a/tools/include/linux/mm.h
+++ b/tools/include/linux/mm.h
@@ -17,6 +17,7 @@
#define __va(x) ((void *)((unsigned long)(x)))
#define __pa(x) ((unsigned long)(x))
+#define __pa_symbol(x) ((unsigned long)(x))
#define pfn_to_page(pfn) ((void *)((pfn) * PAGE_SIZE))
diff --git a/tools/testing/memblock/internal.h b/tools/testing/memblock/internal.h
index 009b97bbdd22..b72be2968104 100644
--- a/tools/testing/memblock/internal.h
+++ b/tools/testing/memblock/internal.h
@@ -11,9 +11,22 @@ static int memblock_debug = 1;
#define pr_warn_ratelimited(fmt, ...) printf(fmt, ##__VA_ARGS__)
+#define K(x) ((x) << (PAGE_SHIFT-10))
+
bool mirrored_kernelcore = false;
struct page {};
+static inline void *page_address(struct page *page)
+{
+ BUG();
+ return page;
+}
+
+static inline struct page *virt_to_page(void *virt)
+{
+ BUG();
+ return virt;
+}
void memblock_free_pages(unsigned long pfn, unsigned int order)
{
@@ -23,10 +36,25 @@ static inline void accept_memory(phys_addr_t start, unsigned long size)
{
}
-static inline unsigned long free_reserved_area(void *start, void *end,
- int poison, const char *s)
+unsigned long free_reserved_area(void *start, void *end, int poison, const char *s);
+void free_reserved_page(struct page *page);
+
+static inline bool deferred_pages_enabled(void)
+{
+ return false;
+}
+
+#define for_each_valid_pfn(pfn, start_pfn, end_pfn) \
+ for ((pfn) = (start_pfn); (pfn) < (end_pfn); (pfn)++)
+
+static inline void *kasan_reset_tag(const void *addr)
+{
+ return (void *)addr;
+}
+
+static inline bool __is_kernel(unsigned long addr)
{
- return 0;
+ return false;
}
#endif
--
2.53.0
* [PATCH v2 5/9] memblock: make free_reserved_area() more robust
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
` (3 preceding siblings ...)
2026-03-23 7:48 ` [PATCH v2 4/9] mm: move free_reserved_area() to mm/memblock.c Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 6/9] memblock: extract page freeing from free_reserved_area() into a helper Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
There are two potential problems in free_reserved_area():
* it may free a page with a non-existent buddy page
* it may be passed a virtual address from an alias mapping that won't
be properly translated by virt_to_page(), for example a symbol on arm64
While the first issue is quite theoretical and the second one does not
manifest itself because all the callers do the right thing, it is easy to make
free_reserved_area() robust enough to avoid these potential issues.
Replace the loop by virtual address with a loop by pfn that uses
for_each_valid_pfn() and use __pa() or __pa_symbol() depending on the
virtual mapping alias to correctly determine the loop boundaries.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/memblock.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index c0896efbee97..eb086724802a 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -895,21 +895,32 @@ int __init_memblock memblock_remove(phys_addr_t base, phys_addr_t size)
unsigned long free_reserved_area(void *start, void *end, int poison, const char *s)
{
- void *pos;
- unsigned long pages = 0;
+ phys_addr_t start_pa, end_pa;
+ unsigned long pages = 0, pfn;
- start = (void *)PAGE_ALIGN((unsigned long)start);
- end = (void *)((unsigned long)end & PAGE_MASK);
- for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
- struct page *page = virt_to_page(pos);
+ /*
+ * end is the first address past the region and it may be beyond what
+ * __pa() or __pa_symbol() can handle.
+ * Use the address included in the range for the conversion and add
+ * back 1 afterwards.
+ */
+ if (__is_kernel((unsigned long)start)) {
+ start_pa = __pa_symbol(start);
+ end_pa = __pa_symbol(end - 1) + 1;
+ } else {
+ start_pa = __pa(start);
+ end_pa = __pa(end - 1) + 1;
+ }
+
+ for_each_valid_pfn(pfn, PFN_UP(start_pa), PFN_DOWN(end_pa)) {
+ struct page *page = pfn_to_page(pfn);
void *direct_map_addr;
/*
- * 'direct_map_addr' might be different from 'pos'
- * because some architectures' virt_to_page()
- * work with aliases. Getting the direct map
- * address ensures that we get a _writeable_
- * alias for the memset().
+ * 'direct_map_addr' might be different from the kernel virtual
+ * address because some architectures use aliases.
+ * Going via physical address, pfn_to_page() and page_address()
+ * ensures that we get a _writeable_ alias for the memset().
*/
direct_map_addr = page_address(page);
/*
@@ -921,6 +932,7 @@ unsigned long free_reserved_area(void *start, void *end, int poison, const char
memset(direct_map_addr, poison, PAGE_SIZE);
free_reserved_page(page);
+ pages++;
}
if (pages && s)
--
2.53.0
* [PATCH v2 6/9] memblock: extract page freeing from free_reserved_area() into a helper
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
` (4 preceding siblings ...)
2026-03-23 7:48 ` [PATCH v2 5/9] memblock: make free_reserved_area() more robust Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 7/9] memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
There are two functions that release pages to the buddy allocator late in
the boot: free_reserved_area() and memblock_free_late().
Currently they use different underlying functionality:
free_reserved_area() runs each page being freed through
free_reserved_page(), while memblock_free_late() uses
memblock_free_pages() -> __free_pages_core(). In the end, both boil
down to a loop that frees a range page by page.
Extract the loop that frees pages from free_reserved_area() into a
helper and use that helper in memblock_free_late().
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/memblock.c | 55 +++++++++++++++++++++++++++------------------------
1 file changed, 29 insertions(+), 26 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index eb086724802a..ccdf3d225626 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -893,26 +893,12 @@ int __init_memblock memblock_remove(phys_addr_t base, phys_addr_t size)
return memblock_remove_range(&memblock.memory, base, size);
}
-unsigned long free_reserved_area(void *start, void *end, int poison, const char *s)
+static unsigned long __free_reserved_area(phys_addr_t start, phys_addr_t end,
+ int poison)
{
- phys_addr_t start_pa, end_pa;
unsigned long pages = 0, pfn;
- /*
- * end is the first address past the region and it may be beyond what
- * __pa() or __pa_symbol() can handle.
- * Use the address included in the range for the conversion and add
- * back 1 afterwards.
- */
- if (__is_kernel((unsigned long)start)) {
- start_pa = __pa_symbol(start);
- end_pa = __pa_symbol(end - 1) + 1;
- } else {
- start_pa = __pa(start);
- end_pa = __pa(end - 1) + 1;
- }
-
- for_each_valid_pfn(pfn, PFN_UP(start_pa), PFN_DOWN(end_pa)) {
+ for_each_valid_pfn(pfn, PFN_UP(start), PFN_DOWN(end)) {
struct page *page = pfn_to_page(pfn);
void *direct_map_addr;
@@ -934,7 +920,29 @@ unsigned long free_reserved_area(void *start, void *end, int poison, const char
free_reserved_page(page);
pages++;
}
+ return pages;
+}
+
+unsigned long free_reserved_area(void *start, void *end, int poison, const char *s)
+{
+ phys_addr_t start_pa, end_pa;
+ unsigned long pages;
+
+ /*
+ * end is the first address past the region and it may be beyond what
+ * __pa() or __pa_symbol() can handle.
+ * Use the address included in the range for the conversion and add back
+ * 1 afterwards.
+ */
+ if (__is_kernel((unsigned long)start)) {
+ start_pa = __pa_symbol(start);
+ end_pa = __pa_symbol(end - 1) + 1;
+ } else {
+ start_pa = __pa(start);
+ end_pa = __pa(end - 1) + 1;
+ }
+ pages = __free_reserved_area(start_pa, end_pa, poison);
if (pages && s)
pr_info("Freeing %s memory: %ldK\n", s, K(pages));
@@ -1810,20 +1818,15 @@ void *__init __memblock_alloc_or_panic(phys_addr_t size, phys_addr_t align,
*/
void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
{
- phys_addr_t cursor, end;
+ phys_addr_t end = base + size - 1;
- end = base + size - 1;
memblock_dbg("%s: [%pa-%pa] %pS\n",
__func__, &base, &end, (void *)_RET_IP_);
- kmemleak_free_part_phys(base, size);
- cursor = PFN_UP(base);
- end = PFN_DOWN(base + size);
- for (; cursor < end; cursor++) {
- memblock_free_pages(cursor, 0);
- totalram_pages_inc();
- }
+ kmemleak_free_part_phys(base, size);
+ __free_reserved_area(base, base + size, -1);
}
+
/*
* Remaining API functions
*/
--
2.53.0
* [PATCH v2 7/9] memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
` (5 preceding siblings ...)
2026-03-23 7:48 ` [PATCH v2 6/9] memblock: extract page freeing from free_reserved_area() into a helper Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 8/9] memblock, treewide: make memblock_free() handle late freeing Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
On architectures that keep memblock after boot, freeing of reserved memory
with free_reserved_area() is paired with an update of memblock arrays,
usually by a call to memblock_free().
Make free_reserved_area() directly update memblock.reserved when
ARCH_KEEP_MEMBLOCK is enabled.
Remove the now-redundant explicit memblock_free() call from
arm64::free_initmem() and the #ifdef CONFIG_ARCH_KEEP_MEMBLOCK block
from the generic free_initrd_mem().
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/arm64/mm/init.c | 3 ---
init/initramfs.c | 7 -------
mm/memblock.c | 6 ++++++
3 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 96711b8578fd..07b17c708702 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -385,9 +385,6 @@ void free_initmem(void)
WARN_ON(!IS_ALIGNED((unsigned long)lm_init_begin, PAGE_SIZE));
WARN_ON(!IS_ALIGNED((unsigned long)lm_init_end, PAGE_SIZE));
- /* Delete __init region from memblock.reserved. */
- memblock_free(lm_init_begin, lm_init_end - lm_init_begin);
-
free_reserved_area(lm_init_begin, lm_init_end,
POISON_FREE_INITMEM, "unused kernel");
/*
diff --git a/init/initramfs.c b/init/initramfs.c
index 139baed06589..bca0922b2850 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -652,13 +652,6 @@ void __init reserve_initrd_mem(void)
void __weak __init free_initrd_mem(unsigned long start, unsigned long end)
{
-#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
- unsigned long aligned_start = ALIGN_DOWN(start, PAGE_SIZE);
- unsigned long aligned_end = ALIGN(end, PAGE_SIZE);
-
- memblock_free((void *)aligned_start, aligned_end - aligned_start);
-#endif
-
free_reserved_area((void *)start, (void *)end, POISON_FREE_INITMEM,
"initrd");
}
diff --git a/mm/memblock.c b/mm/memblock.c
index ccdf3d225626..0ad968c2f2e8 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -942,6 +942,12 @@ unsigned long free_reserved_area(void *start, void *end, int poison, const char
end_pa = __pa(end - 1) + 1;
}
+ if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
+ if (start_pa < end_pa)
+ memblock_remove_range(&memblock.reserved,
+ start_pa, end_pa - start_pa);
+ }
+
pages = __free_reserved_area(start_pa, end_pa, poison);
if (pages && s)
pr_info("Freeing %s memory: %ldK\n", s, K(pages));
--
2.53.0
* [PATCH v2 8/9] memblock, treewide: make memblock_free() handle late freeing
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
` (6 preceding siblings ...)
2026-03-23 7:48 ` [PATCH v2 7/9] memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-23 7:48 ` [PATCH v2 9/9] memblock: warn when freeing reserved memory before memory map is initialized Mike Rapoport
2026-03-25 8:51 ` [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
It shouldn't be the responsibility of memblock users to detect whether
they are freeing memory allocated from memblock late, and thus should
use memblock_free_late().
Make memblock_free() and memblock_phys_free() take care of late memory
freeing and drop memblock_free_late().
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/sparc/kernel/mdesc.c | 4 +-
arch/x86/kernel/setup.c | 2 +-
arch/x86/platform/efi/memmap.c | 5 +--
arch/x86/platform/efi/quirks.c | 2 +-
drivers/firmware/efi/apple-properties.c | 2 +-
drivers/of/kexec.c | 2 +-
include/linux/memblock.h | 2 -
kernel/dma/swiotlb.c | 6 +--
lib/bootconfig.c | 2 +-
mm/kfence/core.c | 4 +-
mm/memblock.c | 49 ++++++++++---------------
11 files changed, 31 insertions(+), 49 deletions(-)
diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 30f171b7b00c..ecd6c8ae49c7 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -183,14 +183,12 @@ static struct mdesc_handle * __init mdesc_memblock_alloc(unsigned int mdesc_size
static void __init mdesc_memblock_free(struct mdesc_handle *hp)
{
unsigned int alloc_size;
- unsigned long start;
BUG_ON(refcount_read(&hp->refcnt) != 0);
BUG_ON(!list_empty(&hp->list));
alloc_size = PAGE_ALIGN(hp->handle_size);
- start = __pa(hp);
- memblock_free_late(start, alloc_size);
+ memblock_free(hp, alloc_size);
}
static struct mdesc_mem_ops memblock_mdesc_ops = {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index eebcc9db1a1b..46882ce79c3a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -426,7 +426,7 @@ int __init ima_free_kexec_buffer(void)
if (!ima_kexec_buffer_size)
return -ENOENT;
- memblock_free_late(ima_kexec_buffer_phys,
+ memblock_phys_free(ima_kexec_buffer_phys,
ima_kexec_buffer_size);
ima_kexec_buffer_phys = 0;
diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c
index 023697c88910..697a9a26a005 100644
--- a/arch/x86/platform/efi/memmap.c
+++ b/arch/x86/platform/efi/memmap.c
@@ -34,10 +34,7 @@ static
void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
{
if (flags & EFI_MEMMAP_MEMBLOCK) {
- if (slab_is_available())
- memblock_free_late(phys, size);
- else
- memblock_phys_free(phys, size);
+ memblock_phys_free(phys, size);
} else if (flags & EFI_MEMMAP_SLAB) {
struct page *p = pfn_to_page(PHYS_PFN(phys));
unsigned int order = get_order(size);
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 35caa5746115..a560bbcaa006 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -372,7 +372,7 @@ void __init efi_reserve_boot_services(void)
* doesn't make sense as far as the firmware is
* concerned, but it does provide us with a way to tag
* those regions that must not be paired with
- * memblock_free_late().
+ * memblock_phys_free().
*/
md->attribute |= EFI_MEMORY_RUNTIME;
}
diff --git a/drivers/firmware/efi/apple-properties.c b/drivers/firmware/efi/apple-properties.c
index 13ac28754c03..2e525e17fba7 100644
--- a/drivers/firmware/efi/apple-properties.c
+++ b/drivers/firmware/efi/apple-properties.c
@@ -226,7 +226,7 @@ static int __init map_properties(void)
*/
data->len = 0;
memunmap(data);
- memblock_free_late(pa_data + sizeof(*data), data_len);
+ memblock_phys_free(pa_data + sizeof(*data), data_len);
return ret;
}
diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
index c4cf3552c018..512d9be9d513 100644
--- a/drivers/of/kexec.c
+++ b/drivers/of/kexec.c
@@ -175,7 +175,7 @@ int __init ima_free_kexec_buffer(void)
if (ret)
return ret;
- memblock_free_late(addr, size);
+ memblock_phys_free(addr, size);
return 0;
}
#endif
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 6ec5e9ac0699..6f6c5b5c4a4b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -172,8 +172,6 @@ void __next_mem_range_rev(u64 *idx, int nid, enum memblock_flags flags,
struct memblock_type *type_b, phys_addr_t *out_start,
phys_addr_t *out_end, int *out_nid);
-void memblock_free_late(phys_addr_t base, phys_addr_t size);
-
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
static inline void __next_physmem_range(u64 *idx, struct memblock_type *type,
phys_addr_t *out_start,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d8e6f1d889d5..e44e039e00d3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -546,10 +546,10 @@ void __init swiotlb_exit(void)
free_pages(tbl_vaddr, get_order(tbl_size));
free_pages((unsigned long)mem->slots, get_order(slots_size));
} else {
- memblock_free_late(__pa(mem->areas),
+ memblock_free(mem->areas,
array_size(sizeof(*mem->areas), mem->nareas));
- memblock_free_late(mem->start, tbl_size);
- memblock_free_late(__pa(mem->slots), slots_size);
+ memblock_phys_free(mem->start, tbl_size);
+ memblock_free(mem->slots, slots_size);
}
memset(mem, 0, sizeof(*mem));
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 449369a60846..86a75bf636bc 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -64,7 +64,7 @@ static inline void __init xbc_free_mem(void *addr, size_t size, bool early)
if (early)
memblock_free(addr, size);
else if (addr)
- memblock_free_late(__pa(addr), size);
+ memblock_free(addr, size);
}
#else /* !__KERNEL__ */
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 7393957f9a20..5c8268af533e 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -731,10 +731,10 @@ static bool __init kfence_init_pool_early(void)
* fails for the first page, and therefore expect addr==__kfence_pool in
* most failure cases.
*/
- memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool));
+ memblock_free((void *)addr, KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool));
__kfence_pool = NULL;
- memblock_free_late(__pa(kfence_metadata_init), KFENCE_METADATA_SIZE);
+ memblock_free(kfence_metadata_init, KFENCE_METADATA_SIZE);
kfence_metadata_init = NULL;
return false;
diff --git a/mm/memblock.c b/mm/memblock.c
index 0ad968c2f2e8..dc8811861c11 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -384,26 +384,27 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
*/
void __init memblock_discard(void)
{
- phys_addr_t addr, size;
+ phys_addr_t size;
+ void *addr;
if (memblock.reserved.regions != memblock_reserved_init_regions) {
- addr = __pa(memblock.reserved.regions);
+ addr = memblock.reserved.regions;
size = PAGE_ALIGN(sizeof(struct memblock_region) *
memblock.reserved.max);
if (memblock_reserved_in_slab)
- kfree(memblock.reserved.regions);
+ kfree(addr);
else
- memblock_free_late(addr, size);
+ memblock_free(addr, size);
}
if (memblock.memory.regions != memblock_memory_init_regions) {
- addr = __pa(memblock.memory.regions);
+ addr = memblock.memory.regions;
size = PAGE_ALIGN(sizeof(struct memblock_region) *
memblock.memory.max);
if (memblock_memory_in_slab)
- kfree(memblock.memory.regions);
+ kfree(addr);
else
- memblock_free_late(addr, size);
+ memblock_free(addr, size);
}
memblock_memory = NULL;
@@ -961,7 +962,8 @@ unsigned long free_reserved_area(void *start, void *end, int poison, const char
* @size: size of the boot memory block in bytes
*
* Free boot memory block previously allocated by memblock_alloc_xx() API.
- * The freeing memory will not be released to the buddy allocator.
+ * If called after the buddy allocator is available, the memory is released to
+ * the buddy allocator.
*/
void __init_memblock memblock_free(void *ptr, size_t size)
{
@@ -975,17 +977,24 @@ void __init_memblock memblock_free(void *ptr, size_t size)
* @size: size of the boot memory block in bytes
*
* Free boot memory block previously allocated by memblock_phys_alloc_xx() API.
- * The freeing memory will not be released to the buddy allocator.
+ * If called after the buddy allocator is available, the memory is released to
+ * the buddy allocator.
*/
int __init_memblock memblock_phys_free(phys_addr_t base, phys_addr_t size)
{
phys_addr_t end = base + size - 1;
+ int ret;
memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
&base, &end, (void *)_RET_IP_);
kmemleak_free_part_phys(base, size);
- return memblock_remove_range(&memblock.reserved, base, size);
+ ret = memblock_remove_range(&memblock.reserved, base, size);
+
+ if (slab_is_available())
+ __free_reserved_area(base, base + size, -1);
+
+ return ret;
}
int __init_memblock __memblock_reserve(phys_addr_t base, phys_addr_t size,
@@ -1813,26 +1822,6 @@ void *__init __memblock_alloc_or_panic(phys_addr_t size, phys_addr_t align,
return addr;
}
-/**
- * memblock_free_late - free pages directly to buddy allocator
- * @base: phys starting address of the boot memory block
- * @size: size of the boot memory block in bytes
- *
- * This is only useful when the memblock allocator has already been torn
- * down, but we are still initializing the system. Pages are released directly
- * to the buddy allocator.
- */
-void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
-{
- phys_addr_t end = base + size - 1;
-
- memblock_dbg("%s: [%pa-%pa] %pS\n",
- __func__, &base, &end, (void *)_RET_IP_);
-
- kmemleak_free_part_phys(base, size);
- __free_reserved_area(base, base + size, -1);
-}
-
/*
* Remaining API functions
*/
--
2.53.0
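The conversions in this patch rely on memblock_free() being the virtual-address counterpart of memblock_phys_free(), so callers such as swiotlb and kfence no longer need an explicit __pa(). A minimal userspace sketch of that pairing follows; the MODEL_PAGE_OFFSET value, the model_ helper names, and the recorded counters are illustrative stand-ins, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Userspace model of the memblock_free()/memblock_phys_free() pairing:
 * the virtual-address variant is a thin wrapper that translates the
 * pointer and delegates to the physical-address variant.
 */
#define MODEL_PAGE_OFFSET 0xffff888000000000ULL

typedef uint64_t phys_addr_t;

/* stand-in for __pa(): linear-map virtual address to physical address */
static phys_addr_t model_pa(const void *vaddr)
{
	return (uintptr_t)vaddr - MODEL_PAGE_OFFSET;
}

/* record the last physical range handed to the "low level" free */
static phys_addr_t last_freed_base;
static size_t last_freed_size;

static int model_memblock_phys_free(phys_addr_t base, size_t size)
{
	last_freed_base = base;
	last_freed_size = size;
	return 0;
}

/* memblock_free() is just the virtual-address wrapper */
static int model_memblock_free(void *ptr, size_t size)
{
	return model_memblock_phys_free(model_pa(ptr), size);
}
```

With this invariant, replacing `memblock_free_late(__pa(p), n)` by `memblock_free(p, n)` at the call sites above is a pure simplification.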
* [PATCH v2 9/9] memblock: warn when freeing reserved memory before memory map is initialized
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
` (7 preceding siblings ...)
2026-03-23 7:48 ` [PATCH v2 8/9] memblock, treewide: make memblock_free() handle late freeing Mike Rapoport
@ 2026-03-23 7:48 ` Mike Rapoport
2026-03-27 14:01 ` Warning from free_reserved_area() in next-20260325+ Bert Karwatzki
2026-03-25 8:51 ` [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
9 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2026-03-23 7:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Potapenko, Alexander Viro, Andreas Larsson,
Ard Biesheuvel, Borislav Petkov, Brendan Jackman,
Christophe Leroy (CS GROUP), Catalin Marinas, Christian Brauner,
David S. Miller, Dave Hansen, David Hildenbrand, Dmitry Vyukov,
Ilias Apalodimas, Ingo Molnar, Jan Kara, Johannes Weiner,
Liam R. Howlett, Lorenzo Stoakes, Madhavan Srinivasan,
Marco Elver, Marek Szyprowski, Masami Hiramatsu, Michael Ellerman,
Michal Hocko, Mike Rapoport, Nicholas Piggin, H. Peter Anvin,
Rob Herring, Robin Murphy, Saravana Kannan, Suren Baghdasaryan,
Thomas Gleixner, Vlastimil Babka, Will Deacon, Zi Yan, devicetree,
iommu, kasan-dev, linux-arm-kernel, linux-efi, linux-fsdevel,
linux-kernel, linux-mm, linux-trace-kernel, linuxppc-dev,
sparclinux, x86
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, freeing reserved
memory before the memory map is fully initialized in deferred_init_memmap()
causes access to uninitialized struct pages and may crash when spurious
list pointers are dereferenced, as was recently discovered during a
discussion about memory leaks in x86 EFI code [1].
The trace below is from an attempt to call free_reserved_page() before
page_alloc_init_late():
[ 0.076840] BUG: unable to handle page fault for address: ffffce1a005a0788
[ 0.078226] #PF: supervisor read access in kernel mode
[ 0.078226] #PF: error_code(0x0000) - not-present page
[ 0.078226] PGD 0 P4D 0
[ 0.078226] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 0.078226] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.68-92.123.amzn2023.x86_64 #1
[ 0.078226] Hardware name: Amazon EC2 t3a.nano/, BIOS 1.0 10/16/2017
[ 0.078226] RIP: 0010:__list_del_entry_valid_or_report+0x32/0xb0
...
[ 0.078226] __free_one_page+0x170/0x520
[ 0.078226] free_pcppages_bulk+0x151/0x1e0
[ 0.078226] free_unref_page_commit+0x263/0x320
[ 0.078226] free_unref_page+0x2c8/0x5b0
[ 0.078226] ? srso_return_thunk+0x5/0x5f
[ 0.078226] free_reserved_page+0x1c/0x30
[ 0.078226] memblock_free_late+0x6c/0xc0
Currently there are not many callers of free_reserved_area() and they all
appear to run at the right time.
Still, to protect against problematic code movement or the addition of new
callers, add a warning stating that reserved pages cannot be freed until
the memory map is fully initialized.
[1] https://lore.kernel.org/all/e5d5a1105d90ee1e7fe7eafaed2ed03bbad0c46b.camel@kernel.crashing.org/
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
mm/internal.h | 10 ++++++++++
mm/memblock.c | 5 +++++
mm/page_alloc.c | 10 ----------
3 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..f60c1edb2e02 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1233,7 +1233,17 @@ static inline void vunmap_range_noflush(unsigned long start, unsigned long end)
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
DECLARE_STATIC_KEY_TRUE(deferred_pages);
+static inline bool deferred_pages_enabled(void)
+{
+ return static_branch_unlikely(&deferred_pages);
+}
+
bool __init deferred_grow_zone(struct zone *zone, unsigned int order);
+#else
+static inline bool deferred_pages_enabled(void)
+{
+ return false;
+}
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
void init_deferred_page(unsigned long pfn, int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index dc8811861c11..ab8f35c3bd41 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -899,6 +899,11 @@ static unsigned long __free_reserved_area(phys_addr_t start, phys_addr_t end,
{
unsigned long pages = 0, pfn;
+ if (deferred_pages_enabled()) {
+ WARN(1, "Cannot free reserved memory because of deferred initialization of the memory map");
+ return 0;
+ }
+
for_each_valid_pfn(pfn, PFN_UP(start), PFN_DOWN(end)) {
struct page *page = pfn_to_page(pfn);
void *direct_map_addr;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df3d61253001..9ac47bab2ea7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -331,11 +331,6 @@ int page_group_by_mobility_disabled __read_mostly;
*/
DEFINE_STATIC_KEY_TRUE(deferred_pages);
-static inline bool deferred_pages_enabled(void)
-{
- return static_branch_unlikely(&deferred_pages);
-}
-
/*
* deferred_grow_zone() is __init, but it is called from
* get_page_from_freelist() during early boot until deferred_pages permanently
@@ -348,11 +343,6 @@ _deferred_grow_zone(struct zone *zone, unsigned int order)
return deferred_grow_zone(zone, order);
}
#else
-static inline bool deferred_pages_enabled(void)
-{
- return false;
-}
-
static inline bool _deferred_grow_zone(struct zone *zone, unsigned int order)
{
return false;
--
2.53.0
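The guard this patch adds to __free_reserved_area() can be modeled in userspace: while deferred struct-page initialization is still pending, freeing reserved pages must fail loudly instead of touching uninitialized page data. The sketch below uses a plain flag in place of the `deferred_pages` static branch, and all model_ names are illustrative, not the kernel implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

/* stand-in for the deferred_pages static branch; starts enabled */
static bool model_deferred_pages = true;

/*
 * Model of __free_reserved_area(): refuse to free (returning 0 pages)
 * while the memory map is still being initialized, otherwise free each
 * pfn in the half-open range [start_pfn, end_pfn).
 */
static size_t model_free_reserved_area(size_t start_pfn, size_t end_pfn)
{
	size_t pages = 0;

	if (model_deferred_pages) {
		fprintf(stderr,
			"WARN: cannot free reserved memory before memmap init\n");
		return 0;
	}

	for (size_t pfn = start_pfn; pfn < end_pfn; pfn++)
		pages++;	/* stands in for per-pfn __free_page() */

	return pages;
}

/* the kernel disables the static branch once deferred init completes */
static void model_page_alloc_init_late(void)
{
	model_deferred_pages = false;
}
```

The same call that would have corrupted freelists in the trace above now only emits a warning until the equivalent of page_alloc_init_late() has run.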
* Warning from free_reserved_area() in next-20260325+
2026-03-23 7:48 ` [PATCH v2 9/9] memblock: warn when freeing reserved memory before memory map is initialized Mike Rapoport
@ 2026-03-27 14:01 ` Bert Karwatzki
2026-03-27 17:12 ` Mike Rapoport
0 siblings, 1 reply; 14+ messages in thread
From: Bert Karwatzki @ 2026-03-27 14:01 UTC (permalink / raw)
To: Mike Rapoport
Cc: Bert Karwatzki, linux-kernel, Liam.Howlett, akpm, andreas, ardb,
bp, brauner, catalin.marinas, chleroy, dave.hansen, davem, david,
devicetree, dvyukov, elver, glider, hannes, hpa, ilias.apalodimas,
iommu, jack, jackmanb, kasan-dev, linux-arm-kernel, linux-efi,
linux-fsdevel, linux-mm, linux-trace-kernel, linuxppc-dev,
lorenzo.stoakes, m.szyprowski, maddy, mhiramat, mhocko, mingo,
mpe, npiggin, robh, robin.murphy, saravanak, sparclinux, surenb,
tglx, vbabka, viro, will, x86, ziy
Starting with linux next-20260325 I see the following warning early in the
boot process of a machine running debian stable (trixie) (except for the kernel):
[ 0.027118] [ T0] ------------[ cut here ]------------
[ 0.027118] [ T0] Cannot free reserved memory because of deferred initialization of the memory map
[ 0.027119] [ T0] WARNING: mm/memblock.c:904 at __free_reserved_area+0xa9/0xc0, CPU#0: swapper/0/0
[ 0.027122] [ T0] Modules linked in:
[ 0.027123] [ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc5-next-20260326-master #385 PREEMPT_RT
[ 0.027125] [ T0] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
[ 0.027125] [ T0] RIP: 0010:__free_reserved_area+0xa9/0xc0
[ 0.027126] [ T0] Code: 48 89 df 48 89 ee e8 06 fe ff ff 48 89 c3 48 39 e8 72 a0 5b 4c 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 48 8d 3d 97 c2 c6 00 <67> 48 0f b9 3a 45 31 ed eb df 66 66 2e 0f 1f 84 00 00 00 00 00 66
[ 0.027127] [ T0] RSP: 0000:ffffffff9b203e98 EFLAGS: 00010202
[ 0.027128] [ T0] RAX: 0000000e91c00001 RBX: ffffffff9b100c0f RCX: 0000000080000001
[ 0.027128] [ T0] RDX: 00000000000000cc RSI: 0000000e2d42d000 RDI: ffffffff9b32ef60
[ 0.027128] [ T0] RBP: ffff9eeafdd6fbc0 R08: 0000000000000000 R09: 0000000000000001
[ 0.027129] [ T0] R10: 0000000000001000 R11: 8000000000000163 R12: 000000000000006f
[ 0.027129] [ T0] R13: 0000000000000000 R14: 0000000000000045 R15: 000000005c8a1000
[ 0.027129] [ T0] FS: 0000000000000000(0000) GS:ffff9eeb21c05000(0000) knlGS:0000000000000000
[ 0.027130] [ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.027130] [ T0] CR2: ffff9ee8ad801000 CR3: 0000000e2ce1e000 CR4: 0000000000f50ef0
[ 0.027131] [ T0] PKRU: 55555554
[ 0.027131] [ T0] Call Trace:
[ 0.027132] [ T0] <TASK>
[ 0.027132] [ T0] free_reserved_area+0x89/0xd0
[ 0.027133] [ T0] alternative_instructions+0xee/0x110
[ 0.027136] [ T0] arch_cpu_finalize_init+0x10f/0x160
[ 0.027138] [ T0] start_kernel+0x686/0x710
[ 0.027140] [ T0] x86_64_start_reservations+0x24/0x30
[ 0.027141] [ T0] x86_64_start_kernel+0xd4/0xe0
[ 0.027142] [ T0] common_startup_64+0x13e/0x141
[ 0.027143] [ T0] </TASK>
[ 0.027144] [ T0] ---[ end trace 0000000000000000 ]---
The Hardware used is this:
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 26
model : 68
model name : AMD Ryzen 9 9950X 16-Core Processor
stepping : 0
microcode : 0xb404035
cpu MHz : 3607.683
cache size : 1024 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 16
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d amd_lbr_pmc_freeze
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso spectre_v2_user vmscape
bogomips : 8599.98
TLB size : 192 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A]
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 25)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 25)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 44 [RX 9060 XT] (rev c0)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 HDMI/DP Audio Controller
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD 9100 PRO [PM9E1]
05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Upstream Port (rev 01)
06:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:0c.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
08:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 06)
09:00.0 Network controller: MEDIATEK Corp. Device 7925
0b:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 800 Series Chipset USB 3.x XHCI Controller (rev 01)
0c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller (rev 01)
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge PCIe Dummy Function (rev c1)
0d:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP
0d:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
0d:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
0e:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 2.0 xHCI
Memory used is 64G:
$ LANG=C free
total used free shared buff/cache available
Mem: 65500068 3584080 56709424 70916 5912256 61915988
Swap: 78125052 0 78125052
Bert Karwatzki
* Re: Warning from free_reserved_area() in next-20260325+
2026-03-27 14:01 ` Warning from free_reserved_area() in next-20260325+ Bert Karwatzki
@ 2026-03-27 17:12 ` Mike Rapoport
2026-03-27 19:54 ` Bert Karwatzki
0 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2026-03-27 17:12 UTC (permalink / raw)
To: Bert Karwatzki
Cc: linux-kernel, Liam.Howlett, akpm, andreas, ardb, bp, brauner,
catalin.marinas, chleroy, dave.hansen, davem, david, devicetree,
dvyukov, elver, glider, hannes, hpa, ilias.apalodimas, iommu,
jack, jackmanb, kasan-dev, linux-arm-kernel, linux-efi,
linux-fsdevel, linux-mm, linux-trace-kernel, linuxppc-dev,
lorenzo.stoakes, m.szyprowski, maddy, mhiramat, mhocko, mingo,
mpe, npiggin, robh, robin.murphy, saravanak, sparclinux, surenb,
tglx, vbabka, viro, will, x86, ziy
Hi Bert,
On Fri, Mar 27, 2026 at 03:01:08PM +0100, Bert Karwatzki wrote:
> Starting with linux next-20260325 I see the following warning early in the
> boot process of a machine running debian stable (trixie) (except for the kernel):
Thanks for the report!
> [ 0.027118] [ T0] ------------[ cut here ]------------
> [ 0.027118] [ T0] Cannot free reserved memory because of deferred initialization of the memory map
> [ 0.027119] [ T0] WARNING: mm/memblock.c:904 at __free_reserved_area+0xa9/0xc0, CPU#0: swapper/0/0
> [ 0.027122] [ T0] Modules linked in:
> [ 0.027123] [ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc5-next-20260326-master #385 PREEMPT_RT
> [ 0.027125] [ T0] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> [ 0.027125] [ T0] RIP: 0010:__free_reserved_area+0xa9/0xc0
> [ 0.027126] [ T0] Code: 48 89 df 48 89 ee e8 06 fe ff ff 48 89 c3 48 39 e8 72 a0 5b 4c 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 48 8d 3d 97 c2 c6 00 <67> 48 0f b9 3a 45 31 ed eb df 66 66 2e 0f 1f 84 00 00 00 00 00 66
> [ 0.027127] [ T0] RSP: 0000:ffffffff9b203e98 EFLAGS: 00010202
> [ 0.027128] [ T0] RAX: 0000000e91c00001 RBX: ffffffff9b100c0f RCX: 0000000080000001
> [ 0.027128] [ T0] RDX: 00000000000000cc RSI: 0000000e2d42d000 RDI: ffffffff9b32ef60
> [ 0.027128] [ T0] RBP: ffff9eeafdd6fbc0 R08: 0000000000000000 R09: 0000000000000001
> [ 0.027129] [ T0] R10: 0000000000001000 R11: 8000000000000163 R12: 000000000000006f
> [ 0.027129] [ T0] R13: 0000000000000000 R14: 0000000000000045 R15: 000000005c8a1000
> [ 0.027129] [ T0] FS: 0000000000000000(0000) GS:ffff9eeb21c05000(0000) knlGS:0000000000000000
> [ 0.027130] [ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.027130] [ T0] CR2: ffff9ee8ad801000 CR3: 0000000e2ce1e000 CR4: 0000000000f50ef0
> [ 0.027131] [ T0] PKRU: 55555554
> [ 0.027131] [ T0] Call Trace:
> [ 0.027132] [ T0] <TASK>
> [ 0.027132] [ T0] free_reserved_area+0x89/0xd0
> [ 0.027133] [ T0] alternative_instructions+0xee/0x110
> [ 0.027136] [ T0] arch_cpu_finalize_init+0x10f/0x160
> [ 0.027138] [ T0] start_kernel+0x686/0x710
> [ 0.027140] [ T0] x86_64_start_reservations+0x24/0x30
> [ 0.027141] [ T0] x86_64_start_kernel+0xd4/0xe0
> [ 0.027142] [ T0] common_startup_64+0x13e/0x141
> [ 0.027143] [ T0] </TASK>
> [ 0.027144] [ T0] ---[ end trace 0000000000000000 ]---
Does this patch fix it for you?
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index e87da25d1236..62936a3bde19 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2448,19 +2448,31 @@ void __init alternative_instructions(void)
__smp_locks, __smp_locks_end,
_text, _etext);
}
+#endif
+ restart_nmi();
+ alternatives_patched = 1;
+
+ alt_reloc_selftest();
+}
+
+#ifdef CONFIG_SMP
+/*
+ * With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled we can free_init_pages() only
+ * after the deferred initialization of the memory map is complete.
+ */
+static int __init free_smp_locks(void)
+{
if (!uniproc_patched || num_possible_cpus() == 1) {
free_init_pages("SMP alternatives",
(unsigned long)__smp_locks,
(unsigned long)__smp_locks_end);
}
-#endif
- restart_nmi();
- alternatives_patched = 1;
-
- alt_reloc_selftest();
+ return 0;
}
+arch_initcall(free_smp_locks);
+#endif
/**
* text_poke_early - Update instructions on a live kernel at boot time
> Bert Karwatzki
--
Sincerely yours,
Mike.
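The shape of this fix is a common kernel pattern: work that is too early for one boot phase is deferred by registering it as an initcall that runs after the prerequisite (here, memory map initialization) is complete. A hedged userspace model of that ordering; the single-slot registry, phase flags, and model_ names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool model_memmap_ready;
static bool model_smp_locks_freed;

/* stands in for free_init_pages(__smp_locks, __smp_locks_end) */
static int model_free_smp_locks(void)
{
	if (!model_memmap_ready)
		return -1;	/* would have hit the new WARN */
	model_smp_locks_freed = true;
	return 0;
}

/* one-slot stand-in for the arch_initcall() registry */
static int (*model_initcall)(void);

static void model_arch_initcall(int (*fn)(void))
{
	model_initcall = fn;
}

/*
 * Model of the boot sequence after the fix: alternative patching runs
 * early and only *registers* the free; initcalls run after deferred
 * memory map initialization, so the free itself happens late enough.
 */
static void model_boot(void)
{
	model_arch_initcall(model_free_smp_locks);	/* registered early */
	model_memmap_ready = true;			/* deferred init done */
	model_initcall();				/* initcalls run late */
}
```

Calling model_free_smp_locks() directly from the "early" phase would fail, which is exactly the situation the warning in patch 9 catches.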
* Re: Warning from free_reserved_area() in next-20260325+
2026-03-27 17:12 ` Mike Rapoport
@ 2026-03-27 19:54 ` Bert Karwatzki
0 siblings, 0 replies; 14+ messages in thread
From: Bert Karwatzki @ 2026-03-27 19:54 UTC (permalink / raw)
To: Mike Rapoport
Cc: linux-kernel, spasswolf, Liam.Howlett, akpm, andreas, ardb, bp,
brauner, catalin.marinas, chleroy, dave.hansen, davem, david,
devicetree, dvyukov, elver, glider, hannes, hpa, ilias.apalodimas,
iommu, jack, jackmanb, kasan-dev, linux-arm-kernel, linux-efi,
linux-fsdevel, linux-mm, linux-trace-kernel, linuxppc-dev,
lorenzo.stoakes, m.szyprowski, maddy, mhiramat, mhocko, mingo,
mpe, npiggin, robh, robin.murphy, saravanak, sparclinux, surenb,
tglx, vbabka, viro, will, x86, ziy
Am Freitag, dem 27.03.2026 um 20:12 +0300 schrieb Mike Rapoport:
> Hi Bert,
>
> On Fri, Mar 27, 2026 at 03:01:08PM +0100, Bert Karwatzki wrote:
> > Starting with linux next-20260325 I see the following warning early in the
> > boot process of a machine running debian stable (trixie) (except for the kernel):
>
> Thanks for the report!
>
> > [ 0.027118] [ T0] ------------[ cut here ]------------
> > [ 0.027118] [ T0] Cannot free reserved memory because of deferred initialization of the memory map
> > [ 0.027119] [ T0] WARNING: mm/memblock.c:904 at __free_reserved_area+0xa9/0xc0, CPU#0: swapper/0/0
> > [ 0.027122] [ T0] Modules linked in:
> > [ 0.027123] [ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc5-next-20260326-master #385 PREEMPT_RT
> > [ 0.027125] [ T0] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> > [ 0.027125] [ T0] RIP: 0010:__free_reserved_area+0xa9/0xc0
> > [ 0.027126] [ T0] Code: 48 89 df 48 89 ee e8 06 fe ff ff 48 89 c3 48 39 e8 72 a0 5b 4c 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 48 8d 3d 97 c2 c6 00 <67> 48 0f b9 3a 45 31 ed eb df 66 66 2e 0f 1f 84 00 00 00 00 00 66
> > [ 0.027127] [ T0] RSP: 0000:ffffffff9b203e98 EFLAGS: 00010202
> > [ 0.027128] [ T0] RAX: 0000000e91c00001 RBX: ffffffff9b100c0f RCX: 0000000080000001
> > [ 0.027128] [ T0] RDX: 00000000000000cc RSI: 0000000e2d42d000 RDI: ffffffff9b32ef60
> > [ 0.027128] [ T0] RBP: ffff9eeafdd6fbc0 R08: 0000000000000000 R09: 0000000000000001
> > [ 0.027129] [ T0] R10: 0000000000001000 R11: 8000000000000163 R12: 000000000000006f
> > [ 0.027129] [ T0] R13: 0000000000000000 R14: 0000000000000045 R15: 000000005c8a1000
> > [ 0.027129] [ T0] FS: 0000000000000000(0000) GS:ffff9eeb21c05000(0000) knlGS:0000000000000000
> > [ 0.027130] [ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 0.027130] [ T0] CR2: ffff9ee8ad801000 CR3: 0000000e2ce1e000 CR4: 0000000000f50ef0
> > [ 0.027131] [ T0] PKRU: 55555554
> > [ 0.027131] [ T0] Call Trace:
> > [ 0.027132] [ T0] <TASK>
> > [ 0.027132] [ T0] free_reserved_area+0x89/0xd0
> > [ 0.027133] [ T0] alternative_instructions+0xee/0x110
> > [ 0.027136] [ T0] arch_cpu_finalize_init+0x10f/0x160
> > [ 0.027138] [ T0] start_kernel+0x686/0x710
> > [ 0.027140] [ T0] x86_64_start_reservations+0x24/0x30
> > [ 0.027141] [ T0] x86_64_start_kernel+0xd4/0xe0
> > [ 0.027142] [ T0] common_startup_64+0x13e/0x141
> > [ 0.027143] [ T0] </TASK>
> > [ 0.027144] [ T0] ---[ end trace 0000000000000000 ]---
>
> Does this patch fix it for you?
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index e87da25d1236..62936a3bde19 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -2448,19 +2448,31 @@ void __init alternative_instructions(void)
> __smp_locks, __smp_locks_end,
> _text, _etext);
> }
> +#endif
>
> + restart_nmi();
> + alternatives_patched = 1;
> +
> + alt_reloc_selftest();
> +}
> +
> +#ifdef CONFIG_SMP
> +/*
> + * With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled we can free_init_pages() only
> + * after the deferred initialization of the memory map is complete.
> + */
> +static int __init free_smp_locks(void)
> +{
> if (!uniproc_patched || num_possible_cpus() == 1) {
> free_init_pages("SMP alternatives",
> (unsigned long)__smp_locks,
> (unsigned long)__smp_locks_end);
> }
> -#endif
>
> - restart_nmi();
> - alternatives_patched = 1;
> -
> - alt_reloc_selftest();
> + return 0;
> }
> +arch_initcall(free_smp_locks);
> +#endif
>
> /**
> * text_poke_early - Update instructions on a live kernel at boot time
>
> > Bert Karwatzki
Yes, your patch fixes the issue in next-20260326.
Tested-by: Bert Karwatzki <spasswolf@web.de>
Bert Karwatzki
* Re: [PATCH v2 0/9] memblock: improve late freeing of reserved memory
2026-03-23 7:48 [PATCH v2 0/9] memblock: improve late freeing of reserved memory Mike Rapoport
` (8 preceding siblings ...)
2026-03-23 7:48 ` [PATCH v2 9/9] memblock: warn when freeing reserved memory before memory map is initialized Mike Rapoport
@ 2026-03-25 8:51 ` Mike Rapoport
9 siblings, 0 replies; 14+ messages in thread
From: Mike Rapoport @ 2026-03-25 8:51 UTC (permalink / raw)
To: Andrew Morton, Mike Rapoport
Cc: Alexander Potapenko, Alexander Viro, Andreas Larsson,
Ard Biesheuvel, Borislav Petkov, Brendan Jackman,
Christophe Leroy (CS GROUP), Catalin Marinas, Christian Brauner,
David S. Miller, Dave Hansen, David Hildenbrand, Dmitry Vyukov,
Ilias Apalodimas, Ingo Molnar, Jan Kara, Johannes Weiner,
Liam R. Howlett, Madhavan Srinivasan, Marco Elver,
Marek Szyprowski, Masami Hiramatsu, Michael Ellerman,
Michal Hocko, Nicholas Piggin, H. Peter Anvin, Rob Herring,
Robin Murphy, Saravana Kannan, Suren Baghdasaryan,
Thomas Gleixner, Vlastimil Babka, Will Deacon, Zi Yan, devicetree,
iommu, kasan-dev, linux-arm-kernel, linux-efi, linux-fsdevel,
linux-kernel, linux-mm, linux-trace-kernel, linuxppc-dev,
sparclinux, x86, Lorenzo Stoakes
On Mon, 23 Mar 2026 09:48:27 +0200, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Hi,
>
> Following a recent discussion about leaks in x86 EFI [1], I audited usage of
> memblock_free_late() and free_reserved_area() and made some improvements in
> how we handle late freeing of the memory allocated with memblock.
>
> [...]
Applied to for-next branch of memblock.git tree, thanks!
[1/9] memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name()
commit: ea459d3c24fefd90b60a702f4a73833434ae0248
[2/9] powerpc: fadump: pair alloc_pages_exact() with free_pages_exact()
commit: 6e827110aea5fb9c53a5bf070413ffe5cad105b0
[3/9] powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact()
commit: 3cf80188ecb828ed034ba562614cf1d48156b126
[4/9] mm: move free_reserved_area() to mm/memblock.c
commit: 0aa264cda784f9fbe1a80ef13144cf81610086c7
[5/9] memblock: make free_reserved_area() more robust
commit: 456ac994018598bc57ceaacb8a2c72e722c9755b
[6/9] memblock: extract page freeing from free_reserved_area() into a helper
commit: 40191dae9ed84c816b593bb1b36a80f86c2279d1
[7/9] memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y
commit: b9e028ca869de24df00206d7ec640380670fc38f
[8/9] memblock, treewide: make memblock_free() handle late freeing
commit: 64cb853c2ab4d8bd25b965f05e33ac0c6672bae7
[9/9] memblock: warn when freeing reserved memory before memory map is initialized
commit: c7fc9cde41be029cf6675befbafcbb2dab40b39b
tree: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
branch: for-next
--
Sincerely yours,
Mike.