* Re: [PATCH 7/8] s390/mm: use free_reserved_page() in vmem_free_pages()
From: Heiko Carstens @ 2026-05-11 14:21 UTC
To: David Hildenbrand (Arm)
Cc: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, sparclinux, linux-kernel,
linux-mm, linux-s390, linuxppc-dev
In-Reply-To: <20260511-bootmem_info_prep-v1-7-3fb0be6fc688@kernel.org>
On Mon, May 11, 2026 at 04:05:35PM +0200, David Hildenbrand (Arm) wrote:
> We never select CONFIG_HAVE_BOOTMEM_INFO_NODE on s390. Therefore,
> free_bootmem_page() nowadays always translates to free_reserved_page().
>
> Let's use free_reserved_page() to replace the free_bootmem_page() loop.
> We can stop including bootmem_info.h.
>
> Likely, vmemmap freeing code could be factored out into the core in the
> future.
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
> arch/s390/mm/vmem.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index eeadff45e0e1..d8b2a60e0c33 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -4,7 +4,6 @@
> */
>
> #include <linux/memory_hotplug.h>
> -#include <linux/bootmem_info.h>
> #include <linux/cpufeature.h>
> #include <linux/memblock.h>
> #include <linux/pfn.h>
> @@ -51,7 +50,7 @@ static void vmem_free_pages(unsigned long addr, int order, struct vmem_altmap *a
> if (PageReserved(page)) {
> /* allocated from memblock */
> while (nr_pages--)
> - free_bootmem_page(page++);
> + free_reserved_page(page++);
What about the implicit call of kmemleak_free_part_phys() which gets
removed with this?
* [PATCH 8/8] powerpc/mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
register_page_bootmem_info_node() essentially only calls
register_page_bootmem_memmap(). However, on powerpc that function is a
nop. So there is no benefit in using CONFIG_HAVE_BOOTMEM_INFO_NODE
anymore; let's just drop it.
We can stop including bootmem_info.h.
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
arch/powerpc/mm/init_64.c | 8 --------
mm/Kconfig | 2 +-
2 files changed, 1 insertion(+), 9 deletions(-)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index b6f3ae03ca9e..64f0df5bb5cd 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -41,7 +41,6 @@
#include <linux/libfdt.h>
#include <linux/memremap.h>
#include <linux/memory.h>
-#include <linux/bootmem_info.h>
#include <asm/pgalloc.h>
#include <asm/page.h>
@@ -388,13 +387,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
#endif
-#ifdef CONFIG_HAVE_BOOTMEM_INFO_NODE
-void register_page_bootmem_memmap(unsigned long section_nr,
- struct page *start_page, unsigned long size)
-{
-}
-#endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
-
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
#ifdef CONFIG_PPC_BOOK3S_64
diff --git a/mm/Kconfig b/mm/Kconfig
index e221fa1dc54d..97b079372325 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -537,7 +537,7 @@ endchoice
config MEMORY_HOTREMOVE
bool "Allow for memory hot remove"
- select HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64)
+ select HAVE_BOOTMEM_INFO_NODE if X86_64
depends on MEMORY_HOTPLUG
select MIGRATION
--
2.43.0
* [PATCH 7/8] s390/mm: use free_reserved_page() in vmem_free_pages()
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
We never select CONFIG_HAVE_BOOTMEM_INFO_NODE on s390. Therefore,
free_bootmem_page() nowadays always translates to free_reserved_page().
Let's use free_reserved_page() to replace the free_bootmem_page() loop.
We can stop including bootmem_info.h.
Likely, vmemmap freeing code could be factored out into the core in the
future.
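For illustration only, here is a minimal userspace model (not kernel code) of the branch in vmem_free_pages() touched by this patch. The model_* names, the struct layout and the counters are hypothetical stand-ins for PG_reserved, free_reserved_page() and free_pages(). The sketch shows that memblock-backed ranges are released page by page, while buddy-allocated ranges go back in a single order-sized call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the bits this branch looks at. */
struct model_page {
	bool reserved;  /* models PG_reserved: page came from memblock */
	bool freed;
};

static int model_reserved_frees;  /* calls to the free_reserved_page() stand-in */
static int model_buddy_frees;     /* calls to the free_pages() stand-in */

static void model_free_reserved_page(struct model_page *page)
{
	page->reserved = false;  /* free_reserved_page() clears PG_reserved */
	page->freed = true;
	model_reserved_frees++;
}

static void model_free_pages(struct model_page *pages, int order)
{
	for (unsigned int i = 0; i < (1U << order); i++)
		pages[i].freed = true;
	model_buddy_frees++;
}

static void model_vmem_free_pages(struct model_page *page, int order)
{
	unsigned int nr_pages = 1U << order;

	if (page->reserved) {
		/* allocated from memblock: release one page at a time */
		while (nr_pages--)
			model_free_reserved_page(page++);
	} else {
		/* allocated from the buddy allocator: one call for the whole order */
		model_free_pages(page, order);
	}
}
```

An order-2 memblock range therefore costs four per-page frees, while the same range from the buddy allocator costs one.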
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
arch/s390/mm/vmem.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index eeadff45e0e1..d8b2a60e0c33 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -4,7 +4,6 @@
*/
#include <linux/memory_hotplug.h>
-#include <linux/bootmem_info.h>
#include <linux/cpufeature.h>
#include <linux/memblock.h>
#include <linux/pfn.h>
@@ -51,7 +50,7 @@ static void vmem_free_pages(unsigned long addr, int order, struct vmem_altmap *a
if (PageReserved(page)) {
/* allocated from memblock */
while (nr_pages--)
- free_bootmem_page(page++);
+ free_reserved_page(page++);
} else {
free_pages(addr, order);
}
--
2.43.0
* [PATCH 6/8] mm/bootmem_info: stop marking mem_section_usage as MIX_SECTION_INFO
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
We never free the ms->usage data for boot memory sections (see
section_deactivate()). To tell whether ms->usage was allocated from
memblock, we simply look at PG_reserved.
Consequently, there is no need to mark ms->usage as MIX_SECTION_INFO.
Let's just stop doing that.
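As a rough userspace sketch of this reasoning, the model below assumes that section_deactivate() frees ms->usage only when its backing page is not PG_reserved. The model_* names and fields are hypothetical, and the free is only recorded, not performed. Because the reserved bit alone decides the outcome, no extra MIX_SECTION_INFO tag is needed to tell boot-time usage data apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page backing ms->usage. */
struct model_usage_page {
	bool reserved;  /* models PG_reserved: usage data came from memblock */
};

/* Hypothetical stand-in for struct mem_section. */
struct model_section {
	struct model_usage_page *usage_page;
	bool usage_freed;  /* records that the kfree_rcu() stand-in ran */
};

/*
 * Returns true if the usage data would be freed. Boot-time (reserved)
 * usage data is never freed, so PG_reserved alone distinguishes the cases.
 */
static bool model_section_deactivate(struct model_section *ms)
{
	if (!ms->usage_page->reserved) {
		ms->usage_freed = true;
		return true;
	}
	return false;
}
```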
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
mm/bootmem_info.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index cce1d560f094..0fa78db7fbc0 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -38,10 +38,8 @@ void put_page_bootmem(struct page *page)
static void __init register_page_bootmem_info_section(unsigned long start_pfn)
{
- unsigned long mapsize, section_nr, i;
+ unsigned long section_nr;
struct mem_section *ms;
- struct mem_section_usage *usage;
- struct page *page;
start_pfn = SECTION_ALIGN_DOWN(start_pfn);
section_nr = pfn_to_section_nr(start_pfn);
@@ -50,14 +48,6 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
if (!preinited_vmemmap_section(ms))
register_page_bootmem_memmap(section_nr, pfn_to_page(start_pfn),
PAGES_PER_SECTION);
-
- usage = ms->usage;
- page = virt_to_page(usage);
-
- mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT;
-
- for (i = 0; i < mapsize; i++, page++)
- get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
}
void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
--
2.43.0
* [PATCH 5/8] mm/bootmem_info: stop marking the pgdat as NODE_INFO
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
We removed the last user of NODE_INFO in commit 119c31caa59e ("mm/sparse:
remove !CONFIG_SPARSEMEM_VMEMMAP leftovers for CONFIG_MEMORY_HOTPLUG").
But it was never really used besides for safety checks ever since it was
introduced in commit 04753278769f ("memory hotplug: register section/node
id to free"), where we had the comment:
5) The node information like pgdat has similar issues. But, this
will be able to be solved too by this.
(Not implemented yet, but, remembering node id in the pages.)
Of course, that never happened, and we are not planning on freeing the
node data (pgdat/pglist_data) during memory hotunplug.
So let's just stop marking the pgdat as NODE_INFO.
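For reference, the removed loop visited PAGE_ALIGN(sizeof(struct pglist_data)) >> PAGE_SHIFT pages, that is, the structure size rounded up to whole pages. PATCH 6/8 removes the same computation for mem_section_usage_size(). Below is a userspace sketch of that arithmetic, assuming 4 KiB pages. The MODEL_* macros and pages_spanned() are hypothetical stand-ins for the kernel's PAGE_SHIFT/PAGE_ALIGN, whose real values depend on the architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT is arch/config dependent. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
/* Round x up to the next multiple of the page size, like PAGE_ALIGN(). */
#define MODEL_PAGE_ALIGN(x) (((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/* Number of pages the removed loop marked for an object of 'size' bytes. */
static unsigned long pages_spanned(size_t size)
{
	return MODEL_PAGE_ALIGN((unsigned long)size) >> MODEL_PAGE_SHIFT;
}
```

Like the removed code, this counts pages from the page containing the start of the object. It therefore implicitly assumes the object starts on a page boundary.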
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
mm/bootmem_info.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index 74c1116626c8..cce1d560f094 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -62,15 +62,8 @@ static void __init register_page_bootmem_info_section(unsigned long start_pfn)
void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
{
- unsigned long i, pfn, end_pfn, nr_pages;
+ unsigned long pfn, end_pfn;
int node = pgdat->node_id;
- struct page *page;
-
- nr_pages = PAGE_ALIGN(sizeof(struct pglist_data)) >> PAGE_SHIFT;
- page = virt_to_page(pgdat);
-
- for (i = 0; i < nr_pages; i++, page++)
- get_page_bootmem(node, page, NODE_INFO);
pfn = pgdat->node_start_pfn;
end_pfn = pgdat_end_pfn(pgdat);
--
2.43.0
* [PATCH 4/8] mm/bootmem_info: remove call to kmemleak_free_part_phys()
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
The call to kmemleak_free_part_phys() was added in 2022 in
commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in
put_page_bootmem").
In 2025, commit b2aad24b5333 ("mm/memmap: prevent double scanning of memmap
by kmemleak") started to use MEMBLOCK_ALLOC_NOLEAKTRACE when allocating
the memmap to skip the kmemleak_alloc_phys() in the buddy.
So remove the call to kmemleak_free_part_phys(). If it were still
required for other purposes, either free_reserved_page() or the selected
users should take care of it.
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
include/linux/bootmem_info.h | 1 -
mm/bootmem_info.c | 1 -
2 files changed, 2 deletions(-)
diff --git a/include/linux/bootmem_info.h b/include/linux/bootmem_info.h
index 492ceeb1cdf8..f724340755e5 100644
--- a/include/linux/bootmem_info.h
+++ b/include/linux/bootmem_info.h
@@ -82,7 +82,6 @@ static inline void get_page_bootmem(unsigned long info, struct page *page,
static inline void free_bootmem_page(struct page *page)
{
- kmemleak_free_part_phys(PFN_PHYS(page_to_pfn(page)), PAGE_SIZE);
free_reserved_page(page);
}
#endif
diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index 6e2aaab3dca9..74c1116626c8 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -32,7 +32,6 @@ void put_page_bootmem(struct page *page)
if (page_ref_dec_return(page) == 1) {
set_page_private(page, 0);
- kmemleak_free_part_phys(PFN_PHYS(page_to_pfn(page)), PAGE_SIZE);
free_reserved_page(page);
}
}
--
2.43.0
* [PATCH 3/8] mm/bootmem_info: stop using PG_private
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
Nobody checks PG_private for these pages, and we can happily use
set_page_private() without setting PG_private. So let's just stop
setting/clearing PG_private.
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
mm/bootmem_info.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index a0a1ecdec8d0..6e2aaab3dca9 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -19,7 +19,6 @@ void get_page_bootmem(unsigned long info, struct page *page,
{
BUG_ON(type > 0xf);
BUG_ON(info > (ULONG_MAX >> 4));
- SetPagePrivate(page);
set_page_private(page, info << 4 | type);
page_ref_inc(page);
}
@@ -32,7 +31,6 @@ void put_page_bootmem(struct page *page)
type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE);
if (page_ref_dec_return(page) == 1) {
- ClearPagePrivate(page);
set_page_private(page, 0);
kmemleak_free_part_phys(PFN_PHYS(page_to_pfn(page)), PAGE_SIZE);
free_reserved_page(page);
--
2.43.0
* [PATCH 2/8] mm/bootmem_info: drop initialization of page->lru
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
In the past, we used to store the type in page->lru.next, introduced by
commit 5f24ce5fd34c ("thp: remove PG_buddy"). The location changed over
the years; ever since commit 0386aaa6e9c8 ("bootmem: stop using
page->index"), we store it alongside the info in page->private.
Consequently, there is no need to reset page->lru anymore.
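Below is a minimal userspace sketch of the page->private layout described above. It is based on the encoding visible in get_page_bootmem() (info << 4 | type, with the type limited to 4 bits). The encode/decode helper names and MODEL_* macros are hypothetical, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BOOTMEM_TYPE_BITS 4
#define MODEL_BOOTMEM_TYPE_MASK 0xfUL

/* Pack info (e.g. a section number) and a 4-bit type into one word. */
static unsigned long encode_bootmem_private(unsigned long info, unsigned long type)
{
	/* Mirrors BUG_ON(type > 0xf) and BUG_ON(info > (ULONG_MAX >> 4)). */
	assert(type <= MODEL_BOOTMEM_TYPE_MASK);
	assert(info <= (ULONG_MAX >> MODEL_BOOTMEM_TYPE_BITS));
	return info << MODEL_BOOTMEM_TYPE_BITS | type;
}

/* The type lives in the low 4 bits. */
static unsigned long decode_bootmem_type(unsigned long private_word)
{
	return private_word & MODEL_BOOTMEM_TYPE_MASK;
}

/* The info lives in the remaining high bits. */
static unsigned long decode_bootmem_info(unsigned long private_word)
{
	return private_word >> MODEL_BOOTMEM_TYPE_BITS;
}
```

Because both fields share one word, nothing needs to be stashed in page->lru, which is why resetting it is no longer necessary.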
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
mm/bootmem_info.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index 3d7675a3ae04..a0a1ecdec8d0 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -34,7 +34,6 @@ void put_page_bootmem(struct page *page)
if (page_ref_dec_return(page) == 1) {
ClearPagePrivate(page);
set_page_private(page, 0);
- INIT_LIST_HEAD(&page->lru);
kmemleak_free_part_phys(PFN_PHYS(page_to_pfn(page)), PAGE_SIZE);
free_reserved_page(page);
}
--
2.43.0
* [PATCH 1/8] sparc/mm: remove register_page_bootmem_info()
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
In-Reply-To: <20260511-bootmem_info_prep-v1-0-3fb0be6fc688@kernel.org>
sparc does not select CONFIG_HAVE_BOOTMEM_INFO_NODE, therefore,
register_page_bootmem_info_node() is a nop.
Let's just get rid of register_page_bootmem_info().
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
arch/sparc/mm/init_64.c | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 367c269305e5..3b679b1d1d72 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -27,7 +27,6 @@
#include <linux/percpu.h>
#include <linux/mmzone.h>
#include <linux/gfp.h>
-#include <linux/bootmem_info.h>
#include <asm/head.h>
#include <asm/page.h>
@@ -2477,17 +2476,6 @@ int page_in_phys_avail(unsigned long paddr)
return 0;
}
-static void __init register_page_bootmem_info(void)
-{
-#ifdef CONFIG_NUMA
- int i;
-
- for_each_online_node(i)
- if (NODE_DATA(i)->node_spanned_pages)
- register_page_bootmem_info_node(NODE_DATA(i));
-#endif
-}
-
void __init arch_setup_zero_pages(void)
{
phys_addr_t zero_page_pa = kern_base +
@@ -2498,14 +2486,6 @@ void __init arch_setup_zero_pages(void)
void __init mem_init(void)
{
- /*
- * Must be done after boot memory is put on freelist, because here we
- * might set fields in deferred struct pages that have not yet been
- * initialized, and memblock_free_all() initializes all the reserved
- * deferred pages for us.
- */
- register_page_bootmem_info();
-
if (tlb_type == cheetah || tlb_type == cheetah_plus)
cheetah_ecache_flush_init();
}
--
2.43.0
* [PATCH 0/8] mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)
From: David Hildenbrand (Arm) @ 2026-05-11 14:05 UTC
To: David S. Miller, Andreas Larsson, Mike Rapoport, Andrew Morton,
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko
Cc: sparclinux, linux-kernel, linux-mm, linux-s390, linuxppc-dev,
David Hildenbrand (Arm)
We want to remove CONFIG_HAVE_BOOTMEM_INFO_NODE. As a first step,
let's limit the remaining harm to x86 and core code by removing the
sparc, ppc and s390 leftovers, and start the stepwise removal by
removing and simplifying some code.
Once a related x86 vmemmap fix [1] is in, we can merge part 2 that will
remove CONFIG_HAVE_BOOTMEM_INFO_NODE entirely.
Tested on x86-64 with hugetlb vmemmap optimization in combination with
KMEMLEAK, making sure that the problem reported in dd0ff4d12dd2 ("bootmem:
remove the vmemmap pages from kmemleak in put_page_bootmem") does not
reappear -- hoping I managed to trigger the original problem.
Heavily cross-compiled, but let's let build bots run on it for a bit.
[1] https://lore.kernel.org/r/20260429-vmemmap-v2-1-8dfcacffd877@kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
David Hildenbrand (Arm) (8):
sparc/mm: remove register_page_bootmem_info()
mm/bootmem_info: drop initialization of page->lru
mm/bootmem_info: stop using PG_private
mm/bootmem_info: remove call to kmemleak_free_part_phys()
mm/bootmem_info: stop marking the pgdat as NODE_INFO
mm/bootmem_info: stop marking mem_section_usage as MIX_SECTION_INFO
s390/mm: use free_reserved_page() in vmem_free_pages()
powerpc/mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE
arch/powerpc/mm/init_64.c | 8 --------
arch/s390/mm/vmem.c | 3 +--
arch/sparc/mm/init_64.c | 20 --------------------
include/linux/bootmem_info.h | 1 -
mm/Kconfig | 2 +-
mm/bootmem_info.c | 25 ++-----------------------
6 files changed, 4 insertions(+), 55 deletions(-)
---
base-commit: e9dd96806dbc2d50a66770b6a86962bd5d601153
change-id: 20260511-bootmem_info_prep-bfc0e7a5b87e
--
Cheers,
David
* Re: [PATCH] drivers/base/memory: make memory block get/put explicit
From: Lorenzo Stoakes @ 2026-05-11 13:48 UTC
To: Muchun Song
Cc: David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, Andrew Morton,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, linux-mm, driver-core,
linux-kernel, linuxppc-dev, linux-s390, muchun.song
In-Reply-To: <20260511111800.2181785-1-songmuchun@bytedance.com>
On Mon, May 11, 2026 at 07:18:00PM +0800, Muchun Song wrote:
> Rename the memory block lookup helper to make the acquired reference
> explicit, add memory_block_put() to wrap put_device(), and collapse the
> redundant section-number wrapper into a single block-id based lookup
> interface.
>
> This makes it clearer to callers that a successful lookup holds a
> reference that must be dropped, reducing the chance of forgetting the
> matching put and leaking the memory block device reference.
As David said, let's reference more of what you've done in the various
refactorings.
>
> Link: https://lore.kernel.org/linux-mm/7887915D-E598-42B3-9AFE-BFFBACE8DE2D@linux.dev/#t
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
LGTM overall, so:
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
> .../platforms/pseries/hotplug-memory.c | 14 ++-----
> drivers/base/memory.c | 38 +++++++------------
> drivers/base/node.c | 4 +-
> drivers/s390/char/sclp_mem.c | 17 ++++-----
> include/linux/memory.h | 7 +++-
> mm/memory_hotplug.c | 5 +--
> 6 files changed, 35 insertions(+), 50 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 75f85a5da981..94f3b57054b6 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -164,13 +164,7 @@ static int update_lmb_associativity_index(struct drmem_lmb *lmb)
>
> static struct memory_block *lmb_to_memblock(struct drmem_lmb *lmb)
> {
> - unsigned long section_nr;
> - struct memory_block *mem_block;
> -
> - section_nr = pfn_to_section_nr(PFN_DOWN(lmb->base_addr));
> -
> - mem_block = find_memory_block(section_nr);
> - return mem_block;
> + return memory_block_get(phys_to_block_id(lmb->base_addr));
Ah nice I see this does the equivalent via phys_to_block_id() and
pfn_to_block_id() so that's a nice cleanup.
> }
>
> static int get_lmb_range(u32 drc_index, int n_lmbs,
> @@ -220,7 +214,7 @@ static int dlpar_change_lmb_state(struct drmem_lmb *lmb, bool online)
> else
> rc = 0;
>
> - put_device(&mem_block->dev);
> + memory_block_put(mem_block);
>
> return rc;
> }
> @@ -319,12 +313,12 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>
> rc = dlpar_offline_lmb(lmb);
> if (rc) {
> - put_device(&mem_block->dev);
> + memory_block_put(mem_block);
> return rc;
> }
>
> __remove_memory(lmb->base_addr, memory_block_size);
> - put_device(&mem_block->dev);
> + memory_block_put(mem_block);
>
> /* Update memory regions for memory remove */
> memblock_remove(lmb->base_addr, memory_block_size);
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 11d57cfa8d72..5b5d41089e81 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -649,7 +649,7 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
> *
> * Called under device_hotplug_lock.
> */
> -struct memory_block *find_memory_block_by_id(unsigned long block_id)
> +struct memory_block *memory_block_get(unsigned long block_id)
> {
> struct memory_block *mem;
>
> @@ -659,16 +659,6 @@ struct memory_block *find_memory_block_by_id(unsigned long block_id)
> return mem;
> }
>
> -/*
> - * Called under device_hotplug_lock.
> - */
> -struct memory_block *find_memory_block(unsigned long section_nr)
> -{
> - unsigned long block_id = memory_block_id(section_nr);
> -
> - return find_memory_block_by_id(block_id);
> -}
> -
> static struct attribute *memory_memblk_attrs[] = {
> &dev_attr_phys_index.attr,
> &dev_attr_state.attr,
> @@ -701,7 +691,7 @@ static int __add_memory_block(struct memory_block *memory)
>
> ret = device_register(&memory->dev);
> if (ret) {
> - put_device(&memory->dev);
> + memory_block_put(memory);
> return ret;
> }
> ret = xa_err(xa_store(&memory_blocks, memory->dev.id, memory,
> @@ -795,9 +785,9 @@ static int add_memory_block(unsigned long block_id, int nid, unsigned long state
> struct memory_block *mem;
> int ret = 0;
>
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (mem) {
> - put_device(&mem->dev);
> + memory_block_put(mem);
> return -EEXIST;
> }
> mem = kzalloc_obj(*mem);
> @@ -845,8 +835,8 @@ static void remove_memory_block(struct memory_block *memory)
> memory->group = NULL;
> }
>
> - /* drop the ref. we got via find_memory_block() */
> - put_device(&memory->dev);
> + /* drop the ref. we got via memory_block_get() */
> + memory_block_put(memory);
> device_unregister(&memory->dev);
> }
>
> @@ -880,7 +870,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
> end_block_id = block_id;
> for (block_id = start_block_id; block_id != end_block_id;
> block_id++) {
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (WARN_ON_ONCE(!mem))
> continue;
> remove_memory_block(mem);
> @@ -908,7 +898,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
> return;
>
> for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (WARN_ON_ONCE(!mem))
> continue;
> num_poisoned_pages_sub(-1UL, memblk_nr_poison(mem));
> @@ -1015,12 +1005,12 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
> return 0;
>
> for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (!mem)
> continue;
>
> ret = func(mem, arg);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> if (ret)
> break;
> }
> @@ -1228,22 +1218,22 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> void memblk_nr_poison_inc(unsigned long pfn)
> {
> const unsigned long block_id = pfn_to_block_id(pfn);
> - struct memory_block *mem = find_memory_block_by_id(block_id);
> + struct memory_block *mem = memory_block_get(block_id);
>
> if (mem) {
> atomic_long_inc(&mem->nr_hwpoison);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> }
> }
>
> void memblk_nr_poison_sub(unsigned long pfn, long i)
> {
> const unsigned long block_id = pfn_to_block_id(pfn);
> - struct memory_block *mem = find_memory_block_by_id(block_id);
> + struct memory_block *mem = memory_block_get(block_id);
>
> if (mem) {
> atomic_long_sub(i, &mem->nr_hwpoison);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> }
> }
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 126f66aa2c3e..b3333ca92090 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -847,13 +847,13 @@ static void register_memory_blocks_under_nodes(void)
> for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
> struct memory_block *mem;
>
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (!mem)
> continue;
>
> memory_block_add_nid_early(mem, nid);
> do_register_memory_block_under_node(nid, mem);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> }
>
> }
> diff --git a/drivers/s390/char/sclp_mem.c b/drivers/s390/char/sclp_mem.c
> index 78c054e26d17..6df1926d4c62 100644
> --- a/drivers/s390/char/sclp_mem.c
> +++ b/drivers/s390/char/sclp_mem.c
> @@ -204,7 +204,7 @@ static ssize_t sclp_config_mem_store(struct kobject *kobj, struct kobj_attribute
> addr = sclp_mem->id * block_size;
> /*
> * Hold device_hotplug_lock when adding/removing memory blocks.
> - * Additionally, also protect calls to find_memory_block() and
> + * Additionally, also protect calls to memory_block_get() and
> * sclp_attach_storage().
> */
> rc = lock_device_hotplug_sysfs();
> @@ -231,20 +231,19 @@ static ssize_t sclp_config_mem_store(struct kobject *kobj, struct kobj_attribute
> sclp_mem_change_state(addr, block_size, 0);
> goto out_unlock;
> }
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(addr)));
> - put_device(&mem->dev);
> + mem = memory_block_get(phys_to_block_id(addr));
> + memory_block_put(mem);
> WRITE_ONCE(sclp_mem->config, 1);
> } else {
> if (!sclp_mem->config)
> goto out_unlock;
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(addr)));
> + mem = memory_block_get(phys_to_block_id(addr));
> if (mem->state != MEM_OFFLINE) {
> - put_device(&mem->dev);
> + memory_block_put(mem);
> rc = -EBUSY;
> goto out_unlock;
> }
> - /* drop the ref just got via find_memory_block() */
> - put_device(&mem->dev);
> + memory_block_put(mem);
> sclp_mem_change_state(addr, block_size, 0);
> __remove_memory(addr, block_size);
> #ifdef CONFIG_KASAN
> @@ -294,11 +293,11 @@ static ssize_t sclp_memmap_on_memory_store(struct kobject *kobj, struct kobj_att
> return rc;
> block_size = memory_block_size_bytes();
> sclp_mem = container_of(kobj, struct sclp_mem, kobj);
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(sclp_mem->id * block_size)));
> + mem = memory_block_get(phys_to_block_id(sclp_mem->id * block_size));
> if (!mem) {
> WRITE_ONCE(sclp_mem->memmap_on_memory, value);
> } else {
> - put_device(&mem->dev);
> + memory_block_put(mem);
> rc = -EBUSY;
> }
> unlock_device_hotplug();
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 5bb5599c6b2b..29edef1f975c 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -158,7 +158,11 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
> void remove_memory_block_devices(unsigned long start, unsigned long size);
> extern void memory_dev_init(void);
> extern int memory_notify(enum memory_block_state state, void *v);
> -extern struct memory_block *find_memory_block(unsigned long section_nr);
> +extern struct memory_block *memory_block_get(unsigned long block_id);
> +static inline void memory_block_put(struct memory_block *mem)
Yeah, as David says, we remove externs now as and when we touch that code :)
> +{
> + put_device(&mem->dev);
> +}
> typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
> extern int walk_memory_blocks(unsigned long start, unsigned long size,
> void *arg, walk_memory_blocks_func_t func);
> @@ -171,7 +175,6 @@ struct memory_group *memory_group_find_by_id(int mgid);
> typedef int (*walk_memory_groups_func_t)(struct memory_group *, void *);
> int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> struct memory_group *excluded, void *arg);
> -struct memory_block *find_memory_block_by_id(unsigned long block_id);
> #define hotplug_memory_notifier(fn, pri) ({ \
> static __meminitdata struct notifier_block fn##_mem_nb =\
> { .notifier_call = fn, .priority = pri };\
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 462d8dcd636d..890c6453e887 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1417,14 +1417,13 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
> struct vmem_altmap *altmap = NULL;
> struct memory_block *mem;
>
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start)));
> + mem = memory_block_get(phys_to_block_id(cur_start));
> if (WARN_ON_ONCE(!mem))
> continue;
>
> altmap = mem->altmap;
> mem->altmap = NULL;
> - /* drop the ref. we got via find_memory_block() */
> - put_device(&mem->dev);
> + memory_block_put(mem);
>
> remove_memory_block_devices(cur_start, memblock_size);
>
>
> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
> --
> 2.54.0
>
* Re: [PATCH] drivers/base/memory: make memory block get/put explicit
From: Muchun Song @ 2026-05-11 13:23 UTC
To: David Hildenbrand
Cc: Muchun Song, Oscar Salvador, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, linux-mm, driver-core,
linux-kernel, linuxppc-dev, linux-s390
In-Reply-To: <2841f424-580c-48c6-bb26-de30e4397b7f@kernel.org>
> On May 11, 2026, at 20:22, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 5/11/26 13:18, Muchun Song wrote:
>> Rename the memory block lookup helper to make the acquired reference
>> explicit, add memory_block_put() to wrap put_device(), and collapse the
>> redundant section-number wrapper into a single block-id based lookup
>> interface.
>>
>> This makes it clearer to callers that a successful lookup holds a
>> reference that must be dropped, reducing the chance of forgetting the
>> matching put and leaking the memory block device reference.
>
> Better mention some of the other changes here, like removing find_memory_block().
Will do.
>
> [...]
>
>> unlock_device_hotplug();
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index 5bb5599c6b2b..29edef1f975c 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -158,7 +158,11 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
>> void remove_memory_block_devices(unsigned long start, unsigned long size);
>> extern void memory_dev_init(void);
>> extern int memory_notify(enum memory_block_state state, void *v);
>> -extern struct memory_block *find_memory_block(unsigned long section_nr);
>> +extern struct memory_block *memory_block_get(unsigned long block_id);
>
> While at it, please drop the "extern".
OK.
>
>> +static inline void memory_block_put(struct memory_block *mem)
>> +{
>> + put_device(&mem->dev);
>> +}
>> typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
>> extern int walk_memory_blocks(unsigned long start, unsigned long size,
>> void *arg, walk_memory_blocks_func_t func);
>> @@ -171,7 +175,6 @@ struct memory_group *memory_group_find_by_id(int mgid);
>> typedef int (*walk_memory_groups_func_t)(struct memory_group *, void *);
>> int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>> struct memory_group *excluded, void *arg);
>> -struct memory_block *find_memory_block_by_id(unsigned long block_id);
>> #define hotplug_memory_notifier(fn, pri) ({ \
>> static __meminitdata struct notifier_block fn##_mem_nb =\
>> { .notifier_call = fn, .priority = pri };\
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 462d8dcd636d..890c6453e887 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1417,14 +1417,13 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
>> struct vmem_altmap *altmap = NULL;
>> struct memory_block *mem;
>>
>> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start)));
>> + mem = memory_block_get(phys_to_block_id(cur_start));
>> if (WARN_ON_ONCE(!mem))
>> continue;
>>
>> altmap = mem->altmap;
>> mem->altmap = NULL;
>> - /* drop the ref. we got via find_memory_block() */
>> - put_device(&mem->dev);
>> + memory_block_put(mem);
>
> Would guards come in handy here?
You mean to introduce something like:
scoped_guard(memory_block, id) {
}
Right? If yes, I will give it a try.
>
> In general
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Thanks.
Muchun
>
> --
> Cheers,
>
> David
^ permalink raw reply
* Re: [PATCH v13 04/15] arm64: kexec_file: Fix potential buffer overflow in prepare_elf_headers()
From: Breno Leitao @ 2026-05-11 12:30 UTC (permalink / raw)
To: Jinjie Ruan
Cc: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
pasha.tatashin, pratyush, ruirui.yang, rdunlap, pmladek,
dapeng1.mi, kees, elver, kuba, ebiggers, lirongqing, paulmck,
sourabhjain, coxu, jbohac, ryan.roberts, osandov, cfsworks,
tangyouling, ritesh.list, adityag, guoren, songshuaishuai,
kevin.brodsky, vishal.moola, junhui.liu, wangruikang, namcao,
chao.gao, seanjc, fuqiang.wang, ardb, chenjiahao16, hbathini,
takahiro.akashi, james.morse, lizhengyu3, x86, linux-doc,
linux-kernel, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, devicetree, kexec
In-Reply-To: <79c14bee-b1f5-4d70-8345-6582d6cf0128@huawei.com>
On Mon, May 11, 2026 at 07:30:44PM +0800, Jinjie Ruan wrote:
>
>
> On 5/11/2026 5:46 PM, Breno Leitao wrote:
> > On Mon, May 11, 2026 at 11:04:43AM +0800, Jinjie Ruan wrote:
> >> There is a race condition between the kexec_load() system call
> >> (crash kernel loading path) and memory hotplug operations that can
> >> lead to a buffer overflow and a potential kernel crash.
> >>
> >> During prepare_elf_headers(), the following steps occur:
> >> 1. The first for_each_mem_range() queries the current System RAM ranges
> >> 2. A buffer is allocated based on the queried count
> >> 3. The second for_each_mem_range() populates the ranges from memblock
> >>
> >> If memory hotplug occurs between step 1 and step 3, the number of ranges
> >> can increase, causing an out-of-bounds write when populating cmem->ranges[].
> >>
> >> This happens because kexec_load() uses kexec_trylock (atomic_t) while
> >> memory hotplug uses device_hotplug_lock (mutex), so they don't serialize
> >> with each other.
> >>
> >> Add explicit bounds checking to prevent out-of-bounds access.
> >
> > It seems you have a TOCTOU type of issue, and this seems to be shrinking
> > the window, but not fully solving it?
>
> Hi Breno,
>
> Thanks for your comments regarding the TOCTOU issue.
>
> You are correct that the current bounds checking only "shrinks the
> window" and prevents a kernel crash, but doesn't fully guarantee header
> consistency if a race occurs.
>
> In my local environment, this race is extremely difficult to reproduce,
> but it is theoretically possible.
>
> To address this properly for arm64, I am considering two steps:
>
> - For this patch: I will change the return value to -EAGAIN and keep the
> bounds check. This ensures that even if a race happens, the kernel
> remains safe (no OOB access), and user-space is notified to retry.
>
> - Long-term solution: A better way to solve this is to implement ARM64
> CRASH_HOTPLUG support (similar to x86). With crash hotplug, the kernel
> will automatically re-generate the crash headers whenever a memory
> hotplug event occurs. This makes the TOCTOU during the initial
> kexec_load less critical, as any transient inconsistency will be
> immediately corrected by the subsequent hotplug handler.
>
> Does it make sense to you to take this patch as a safety guard first, and
> then have me (or someone else) follow up with full CRASH_HOTPLUG support
> for arm64, as in [1]?
That would be OK with me, but make it explicit that there is a TOCTOU
issue that depends on CRASH_HOTPLUG.
^ permalink raw reply
* Re: [PATCH] drivers/base/memory: make memory block get/put explicit
From: David Hildenbrand (Arm) @ 2026-05-11 12:22 UTC (permalink / raw)
To: Muchun Song, Oscar Salvador, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich
Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy (CS GROUP), Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle, linux-mm,
driver-core, linux-kernel, linuxppc-dev, linux-s390, muchun.song
In-Reply-To: <20260511111800.2181785-1-songmuchun@bytedance.com>
On 5/11/26 13:18, Muchun Song wrote:
> Rename the memory block lookup helper to make the acquired reference
> explicit, add memory_block_put() to wrap put_device(), and collapse the
> redundant section-number wrapper into a single block-id based lookup
> interface.
>
> This makes it clearer to callers that a successful lookup holds a
> reference that must be dropped, reducing the chance of forgetting the
> matching put and leaking the memory block device reference.
Better mention some of the other changes here, like removing find_memory_block().
[...]
> unlock_device_hotplug();
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 5bb5599c6b2b..29edef1f975c 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -158,7 +158,11 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
> void remove_memory_block_devices(unsigned long start, unsigned long size);
> extern void memory_dev_init(void);
> extern int memory_notify(enum memory_block_state state, void *v);
> -extern struct memory_block *find_memory_block(unsigned long section_nr);
> +extern struct memory_block *memory_block_get(unsigned long block_id);
While at it, please drop the "extern".
> +static inline void memory_block_put(struct memory_block *mem)
> +{
> + put_device(&mem->dev);
> +}
> typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
> extern int walk_memory_blocks(unsigned long start, unsigned long size,
> void *arg, walk_memory_blocks_func_t func);
> @@ -171,7 +175,6 @@ struct memory_group *memory_group_find_by_id(int mgid);
> typedef int (*walk_memory_groups_func_t)(struct memory_group *, void *);
> int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> struct memory_group *excluded, void *arg);
> -struct memory_block *find_memory_block_by_id(unsigned long block_id);
> #define hotplug_memory_notifier(fn, pri) ({ \
> static __meminitdata struct notifier_block fn##_mem_nb =\
> { .notifier_call = fn, .priority = pri };\
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 462d8dcd636d..890c6453e887 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1417,14 +1417,13 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
> struct vmem_altmap *altmap = NULL;
> struct memory_block *mem;
>
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start)));
> + mem = memory_block_get(phys_to_block_id(cur_start));
> if (WARN_ON_ONCE(!mem))
> continue;
>
> altmap = mem->altmap;
> mem->altmap = NULL;
> - /* drop the ref. we got via find_memory_block() */
> - put_device(&mem->dev);
> + memory_block_put(mem);
Would guards come in handy here?
In general
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH] drivers/base/memory: make memory block get/put explicit
From: Oscar Salvador @ 2026-05-11 12:14 UTC (permalink / raw)
To: Muchun Song
Cc: David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
Danilo Krummrich, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy (CS GROUP), Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle, linux-mm,
driver-core, linux-kernel, linuxppc-dev, linux-s390, muchun.song
In-Reply-To: <20260511111800.2181785-1-songmuchun@bytedance.com>
On Mon, May 11, 2026 at 07:18:00PM +0800, Muchun Song wrote:
> Rename the memory block lookup helper to make the acquired reference
> explicit, add memory_block_put() to wrap put_device(), and collapse the
> redundant section-number wrapper into a single block-id based lookup
> interface.
>
> This makes it clearer to callers that a successful lookup holds a
> reference that must be dropped, reducing the chance of forgetting the
> matching put and leaking the memory block device reference.
>
> Link: https://lore.kernel.org/linux-mm/7887915D-E598-42B3-9AFE-BFFBACE8DE2D@linux.dev/#t
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Thanks, it looks more solid to me!
--
Oscar Salvador
SUSE Labs
^ permalink raw reply
* [PATCH] powerpc/powernv: fix null pointer dereference in pnv_get_random_long()
From: Paul Menzel @ 2026-05-11 12:04 UTC (permalink / raw)
To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy (CS GROUP), Kees Cook, Tony Luck,
Guilherme G. Piccoli, Jason A. Donenfeld
Cc: Paul Menzel, stable, linuxppc-dev, linux-kernel
pnv_get_random_long() dereferences the per-CPU pnv_rng pointer without
checking whether it has been initialized, resulting in the oops below:
[ 0.000000] Linux version 7.1.0-rc2+ (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #3 SMP PREEMPT Wed May 6 08:50:58 CEST 2026
[…]
[ 17.901992] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
[ 17.902011] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 17.902018] Faulting instruction address: 0xc0000000000e7138
[ 17.902027] Oops: Kernel access of bad area, sig: 11 [#1]
[ 17.902034] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 17.902045] Modules linked in: powernv_rng(+) bnx2x ofpart ibmpowernv bfq mdio cmdlinepart powernv_flash ipmi_powernv ipmi_devintf mtd ipmi_msghandler at24(+) vmx_crypto opal_prd sch_fq_codel nfsd parport_pc ppdev auth_rpcgss nfs_acl lp lockd grace parport sunrpc autofs4 btrfs xor libblake2b raid6_pq ast drm_shmem_helper drm_client_lib i2c_algo_bit drm_kms_helper drm ahci drm_panel_orientation_quirks libahci
[ 17.902185] CPU: 147 UID: 0 PID: 2626 Comm: hwrng Not tainted 7.1.0-rc2+ #3 PREEMPTLAZY
[ 17.902197] Hardware name: 8335-GCA POWER8 (raw) 0x4d0200 opal:skiboot-5.4.8-5787ad3 PowerNV
[ 17.902204] NIP: c0000000000e7138 LR: c00800001ec8013c CTR: c0000000000e70fc
[ 17.902212] REGS: c000000092913c50 TRAP: 0300 Not tainted (7.1.0-rc2+)
[ 17.902222] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44420220 XER: 20000000
[ 17.902269] CFAR: c00800001ec8026c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
GPR00: c00800001ec8013c c000000092913ef0 c000000001c18100 c00000002222d900
GPR04: c00000002222d900 0000000000000080 0000000000000001 0000000000000000
GPR08: 0000000000000000 c000000002212000 c0000000951e1780 c00800001ec80258
GPR12: c0000000000e70fc c00000ffff6fd700 c0000000001d11c0 c00000001b99b9c0
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000002fe6a58 0000000000000000 0000000000000000
GPR28: c000000002fe6a20 0000000000000010 000000000000000f c00000002222d900
[ 17.902406] NIP [c0000000000e7138] pnv_get_random_long+0x3c/0x114
[ 17.902426] LR [c00800001ec8013c] powernv_rng_read+0x78/0xc4 [powernv_rng]
[ 17.902444] Call Trace:
[ 17.902448] [c000000092913ef0] [c000000092913f30] 0xc000000092913f30 (unreliable)
[ 17.902463] [c000000092913f30] [c000000000decd58] hwrng_fillfn+0xd4/0x3dc
[ 17.902484] [c000000092913f90] [c0000000001d1328] kthread+0x170/0x1a4
[ 17.902498] [c000000092913fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
[ 17.902513] Code: 60000000 7d2000a6 71290010 418200bc e94d0908 812a0000 39290001 912a0000 e90d0030 3d220060 39299f00 7d08482a <e9280000> 7c0004ac e8e90000 0c070000
[ 17.902569] ---[ end trace 0000000000000000 ]---
[ 18.008801] pstore: backend (nvram) writing error (-1)
[ 18.015458] note: hwrng[2626] exited with irqs disabled
[ 18.015483] note: hwrng[2626] exited with preempt_count 1
Commit f3eac426657d ("powerpc/powernv: wire up rng during setup_arch")
introduced a lazy initialization path via pnv_get_random_long_early():
per-CPU pointers are left NULL until slab becomes available and
rng_create() completes.
pnv_get_random_long() is an exported symbol called directly by the
powernv_rng hwrng module (powernv_rng_read()), bypassing the
ppc_md.get_random_seed guard that would otherwise ensure per-CPU data is
ready. If the hwrng fill thread runs on a CPU whose slot is still NULL,
the function crashes dereferencing rng->regs at offset 0.
Guard both branches with a NULL check and return 0 (no data) when the
per-CPU pointer has not been set up yet.
Testing on an IBM Power S822LC (8335-GCA POWER8 (raw) 0x4d0200
opal:skiboot-5.4.8-5787ad3 PowerNV) succeeds:
[ 23.850775] powernv_rng: Registered powernv hwrng.
Fixes: f3eac426657d ("powerpc/powernv: wire up rng during setup_arch")
Link: https://lore.kernel.org/all/a159e81a-ccfd-440f-af68-6a56cca09cb2@molgen.mpg.de/
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: stable@vger.kernel.org # v5.18
Assisted-by: Claude Sonnet 4.6
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
---
I have no idea how to test that the RNG works as expected (or whether,
despite the missing message, it was not working before).
arch/powerpc/platforms/powernv/rng.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/rng.c b/arch/powerpc/platforms/powernv/rng.c
index 7a4c38cd6a82..dc71eaf5d954 100644
--- a/arch/powerpc/platforms/powernv/rng.c
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -87,12 +87,16 @@ int pnv_get_random_long(unsigned long *v)
if (mfmsr() & MSR_DR) {
rng = get_cpu_var(pnv_rng);
- *v = rng_whiten(rng, in_be64(rng->regs));
+ if (rng)
+ *v = rng_whiten(rng, in_be64(rng->regs));
put_cpu_var(rng);
- } else {
- rng = raw_cpu_read(pnv_rng);
- *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real));
+ return rng ? 1 : 0;
}
+
+ rng = raw_cpu_read(pnv_rng);
+ if (!rng)
+ return 0;
+ *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real));
return 1;
}
EXPORT_SYMBOL_GPL(pnv_get_random_long);
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v13 04/15] arm64: kexec_file: Fix potential buffer overflow in prepare_elf_headers()
From: Jinjie Ruan @ 2026-05-11 11:30 UTC (permalink / raw)
To: Breno Leitao
Cc: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
pasha.tatashin, pratyush, ruirui.yang, rdunlap, pmladek,
dapeng1.mi, kees, elver, kuba, ebiggers, lirongqing, paulmck,
sourabhjain, coxu, jbohac, ryan.roberts, osandov, cfsworks,
tangyouling, ritesh.list, adityag, guoren, songshuaishuai,
kevin.brodsky, vishal.moola, junhui.liu, wangruikang, namcao,
chao.gao, seanjc, fuqiang.wang, ardb, chenjiahao16, hbathini,
takahiro.akashi, james.morse, lizhengyu3, x86, linux-doc,
linux-kernel, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, devicetree, kexec
In-Reply-To: <agGkvrg06KNDNfDi@gmail.com>
On 5/11/2026 5:46 PM, Breno Leitao wrote:
> On Mon, May 11, 2026 at 11:04:43AM +0800, Jinjie Ruan wrote:
>> There is a race condition between the kexec_load() system call
>> (crash kernel loading path) and memory hotplug operations that can
>> lead to a buffer overflow and a potential kernel crash.
>>
>> During prepare_elf_headers(), the following steps occur:
>> 1. The first for_each_mem_range() queries the current System RAM ranges
>> 2. A buffer is allocated based on the queried count
>> 3. The second for_each_mem_range() populates the ranges from memblock
>>
>> If memory hotplug occurs between step 1 and step 3, the number of ranges
>> can increase, causing an out-of-bounds write when populating cmem->ranges[].
>>
>> This happens because kexec_load() uses kexec_trylock (atomic_t) while
>> memory hotplug uses device_hotplug_lock (mutex), so they don't serialize
>> with each other.
>>
>> Add explicit bounds checking to prevent out-of-bounds access.
>
> It seems you have a TOCTOU type of issue, and this seems to be shrinking
> the window, but not fully solving it?
Hi Breno,
Thanks for your comments regarding the TOCTOU issue.
You are correct that the current bounds checking only "shrinks the
window" and prevents a kernel crash, but doesn't fully guarantee header
consistency if a race occurs.
In my local environment, this race is extremely difficult to reproduce,
but it is theoretically possible.
To address this properly for arm64, I am considering two steps:
- For this patch: I will change the return value to -EAGAIN and keep the
bounds check. This ensures that even if a race happens, the kernel
remains safe (no OOB access), and user-space is notified to retry.
- Long-term solution: A better way to solve this is to implement ARM64
CRASH_HOTPLUG support (similar to x86). With crash hotplug, the kernel
will automatically re-generate the crash headers whenever a memory
hotplug event occurs. This makes the TOCTOU during the initial
kexec_load less critical, as any transient inconsistency will be
immediately corrected by the subsequent hotplug handler.
Does it make sense to you to take this patch as a safety guard first, and
then have me (or someone else) follow up with full CRASH_HOTPLUG support
for arm64, as in [1]?
[1]:
https://lore.kernel.org/all/20260402081459.635022-1-ruanjinjie@huawei.com/
Best regards,
Jinjie
>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Baoquan He <bhe@redhat.com>
>> Cc: Breno Leitao <leitao@debian.org>
>> Cc: stable@vger.kernel.org
>> Fixes: 3751e728cef2 ("arm64: kexec_file: add crash dump support")
>> Closes: https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie%40huawei.com
>> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
>> ---
>> arch/arm64/kernel/machine_kexec_file.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>> index e31fabed378a..a67e7b1abbab 100644
>> --- a/arch/arm64/kernel/machine_kexec_file.c
>> +++ b/arch/arm64/kernel/machine_kexec_file.c
>> @@ -59,6 +59,11 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>> cmem->max_nr_ranges = nr_ranges;
>> cmem->nr_ranges = 0;
>> for_each_mem_range(i, &start, &end) {
>> + if (cmem->nr_ranges >= cmem->max_nr_ranges) {
>> + ret = -ENOMEM;
>
> -ENOMEM seems to be the wrong errno. This isn't an allocation
> failure; it's a transient race. -EBUSY or -EAGAIN would be more honest.
^ permalink raw reply
* Re: [PATCH] drivers/base/memory: make memory block get/put explicit
From: Michal Hocko @ 2026-05-11 11:27 UTC (permalink / raw)
To: Muchun Song
Cc: David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, linux-mm, driver-core, linux-kernel, linuxppc-dev,
linux-s390, muchun.song
In-Reply-To: <20260511111800.2181785-1-songmuchun@bytedance.com>
On Mon 11-05-26 19:18:00, Muchun Song wrote:
> Rename the memory block lookup helper to make the acquired reference
> explicit, add memory_block_put() to wrap put_device(), and collapse the
> redundant section-number wrapper into a single block-id based lookup
> interface.
>
> This makes it clearer to callers that a successful lookup holds a
> reference that must be dropped, reducing the chance of forgetting the
> matching put and leaking the memory block device reference.
>
> Link: https://lore.kernel.org/linux-mm/7887915D-E598-42B3-9AFE-BFFBACE8DE2D@linux.dev/#t
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
> .../platforms/pseries/hotplug-memory.c | 14 ++-----
> drivers/base/memory.c | 38 +++++++------------
> drivers/base/node.c | 4 +-
> drivers/s390/char/sclp_mem.c | 17 ++++-----
> include/linux/memory.h | 7 +++-
> mm/memory_hotplug.c | 5 +--
> 6 files changed, 35 insertions(+), 50 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 75f85a5da981..94f3b57054b6 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -164,13 +164,7 @@ static int update_lmb_associativity_index(struct drmem_lmb *lmb)
>
> static struct memory_block *lmb_to_memblock(struct drmem_lmb *lmb)
> {
> - unsigned long section_nr;
> - struct memory_block *mem_block;
> -
> - section_nr = pfn_to_section_nr(PFN_DOWN(lmb->base_addr));
> -
> - mem_block = find_memory_block(section_nr);
> - return mem_block;
> + return memory_block_get(phys_to_block_id(lmb->base_addr));
> }
>
> static int get_lmb_range(u32 drc_index, int n_lmbs,
> @@ -220,7 +214,7 @@ static int dlpar_change_lmb_state(struct drmem_lmb *lmb, bool online)
> else
> rc = 0;
>
> - put_device(&mem_block->dev);
> + memory_block_put(mem_block);
>
> return rc;
> }
> @@ -319,12 +313,12 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>
> rc = dlpar_offline_lmb(lmb);
> if (rc) {
> - put_device(&mem_block->dev);
> + memory_block_put(mem_block);
> return rc;
> }
>
> __remove_memory(lmb->base_addr, memory_block_size);
> - put_device(&mem_block->dev);
> + memory_block_put(mem_block);
>
> /* Update memory regions for memory remove */
> memblock_remove(lmb->base_addr, memory_block_size);
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 11d57cfa8d72..5b5d41089e81 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -649,7 +649,7 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
> *
> * Called under device_hotplug_lock.
> */
> -struct memory_block *find_memory_block_by_id(unsigned long block_id)
> +struct memory_block *memory_block_get(unsigned long block_id)
> {
> struct memory_block *mem;
>
> @@ -659,16 +659,6 @@ struct memory_block *find_memory_block_by_id(unsigned long block_id)
> return mem;
> }
>
> -/*
> - * Called under device_hotplug_lock.
> - */
> -struct memory_block *find_memory_block(unsigned long section_nr)
> -{
> - unsigned long block_id = memory_block_id(section_nr);
> -
> - return find_memory_block_by_id(block_id);
> -}
> -
> static struct attribute *memory_memblk_attrs[] = {
> &dev_attr_phys_index.attr,
> &dev_attr_state.attr,
> @@ -701,7 +691,7 @@ static int __add_memory_block(struct memory_block *memory)
>
> ret = device_register(&memory->dev);
> if (ret) {
> - put_device(&memory->dev);
> + memory_block_put(memory);
> return ret;
> }
> ret = xa_err(xa_store(&memory_blocks, memory->dev.id, memory,
> @@ -795,9 +785,9 @@ static int add_memory_block(unsigned long block_id, int nid, unsigned long state
> struct memory_block *mem;
> int ret = 0;
>
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (mem) {
> - put_device(&mem->dev);
> + memory_block_put(mem);
> return -EEXIST;
> }
> mem = kzalloc_obj(*mem);
> @@ -845,8 +835,8 @@ static void remove_memory_block(struct memory_block *memory)
> memory->group = NULL;
> }
>
> - /* drop the ref. we got via find_memory_block() */
> - put_device(&memory->dev);
> + /* drop the ref. we got via memory_block_get() */
> + memory_block_put(memory);
> device_unregister(&memory->dev);
> }
>
> @@ -880,7 +870,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
> end_block_id = block_id;
> for (block_id = start_block_id; block_id != end_block_id;
> block_id++) {
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (WARN_ON_ONCE(!mem))
> continue;
> remove_memory_block(mem);
> @@ -908,7 +898,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
> return;
>
> for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (WARN_ON_ONCE(!mem))
> continue;
> num_poisoned_pages_sub(-1UL, memblk_nr_poison(mem));
> @@ -1015,12 +1005,12 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
> return 0;
>
> for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (!mem)
> continue;
>
> ret = func(mem, arg);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> if (ret)
> break;
> }
> @@ -1228,22 +1218,22 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> void memblk_nr_poison_inc(unsigned long pfn)
> {
> const unsigned long block_id = pfn_to_block_id(pfn);
> - struct memory_block *mem = find_memory_block_by_id(block_id);
> + struct memory_block *mem = memory_block_get(block_id);
>
> if (mem) {
> atomic_long_inc(&mem->nr_hwpoison);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> }
> }
>
> void memblk_nr_poison_sub(unsigned long pfn, long i)
> {
> const unsigned long block_id = pfn_to_block_id(pfn);
> - struct memory_block *mem = find_memory_block_by_id(block_id);
> + struct memory_block *mem = memory_block_get(block_id);
>
> if (mem) {
> atomic_long_sub(i, &mem->nr_hwpoison);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> }
> }
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 126f66aa2c3e..b3333ca92090 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -847,13 +847,13 @@ static void register_memory_blocks_under_nodes(void)
> for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
> struct memory_block *mem;
>
> - mem = find_memory_block_by_id(block_id);
> + mem = memory_block_get(block_id);
> if (!mem)
> continue;
>
> memory_block_add_nid_early(mem, nid);
> do_register_memory_block_under_node(nid, mem);
> - put_device(&mem->dev);
> + memory_block_put(mem);
> }
>
> }
> diff --git a/drivers/s390/char/sclp_mem.c b/drivers/s390/char/sclp_mem.c
> index 78c054e26d17..6df1926d4c62 100644
> --- a/drivers/s390/char/sclp_mem.c
> +++ b/drivers/s390/char/sclp_mem.c
> @@ -204,7 +204,7 @@ static ssize_t sclp_config_mem_store(struct kobject *kobj, struct kobj_attribute
> addr = sclp_mem->id * block_size;
> /*
> * Hold device_hotplug_lock when adding/removing memory blocks.
> - * Additionally, also protect calls to find_memory_block() and
> + * Additionally, also protect calls to memory_block_get() and
> * sclp_attach_storage().
> */
> rc = lock_device_hotplug_sysfs();
> @@ -231,20 +231,19 @@ static ssize_t sclp_config_mem_store(struct kobject *kobj, struct kobj_attribute
> sclp_mem_change_state(addr, block_size, 0);
> goto out_unlock;
> }
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(addr)));
> - put_device(&mem->dev);
> + mem = memory_block_get(phys_to_block_id(addr));
> + memory_block_put(mem);
> WRITE_ONCE(sclp_mem->config, 1);
> } else {
> if (!sclp_mem->config)
> goto out_unlock;
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(addr)));
> + mem = memory_block_get(phys_to_block_id(addr));
> if (mem->state != MEM_OFFLINE) {
> - put_device(&mem->dev);
> + memory_block_put(mem);
> rc = -EBUSY;
> goto out_unlock;
> }
> - /* drop the ref just got via find_memory_block() */
> - put_device(&mem->dev);
> + memory_block_put(mem);
> sclp_mem_change_state(addr, block_size, 0);
> __remove_memory(addr, block_size);
> #ifdef CONFIG_KASAN
> @@ -294,11 +293,11 @@ static ssize_t sclp_memmap_on_memory_store(struct kobject *kobj, struct kobj_att
> return rc;
> block_size = memory_block_size_bytes();
> sclp_mem = container_of(kobj, struct sclp_mem, kobj);
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(sclp_mem->id * block_size)));
> + mem = memory_block_get(phys_to_block_id(sclp_mem->id * block_size));
> if (!mem) {
> WRITE_ONCE(sclp_mem->memmap_on_memory, value);
> } else {
> - put_device(&mem->dev);
> + memory_block_put(mem);
> rc = -EBUSY;
> }
> unlock_device_hotplug();
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 5bb5599c6b2b..29edef1f975c 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -158,7 +158,11 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
> void remove_memory_block_devices(unsigned long start, unsigned long size);
> extern void memory_dev_init(void);
> extern int memory_notify(enum memory_block_state state, void *v);
> -extern struct memory_block *find_memory_block(unsigned long section_nr);
> +extern struct memory_block *memory_block_get(unsigned long block_id);
> +static inline void memory_block_put(struct memory_block *mem)
> +{
> + put_device(&mem->dev);
> +}
> typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
> extern int walk_memory_blocks(unsigned long start, unsigned long size,
> void *arg, walk_memory_blocks_func_t func);
> @@ -171,7 +175,6 @@ struct memory_group *memory_group_find_by_id(int mgid);
> typedef int (*walk_memory_groups_func_t)(struct memory_group *, void *);
> int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
> struct memory_group *excluded, void *arg);
> -struct memory_block *find_memory_block_by_id(unsigned long block_id);
> #define hotplug_memory_notifier(fn, pri) ({ \
> static __meminitdata struct notifier_block fn##_mem_nb =\
> { .notifier_call = fn, .priority = pri };\
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 462d8dcd636d..890c6453e887 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1417,14 +1417,13 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
> struct vmem_altmap *altmap = NULL;
> struct memory_block *mem;
>
> - mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start)));
> + mem = memory_block_get(phys_to_block_id(cur_start));
> if (WARN_ON_ONCE(!mem))
> continue;
>
> altmap = mem->altmap;
> mem->altmap = NULL;
> - /* drop the ref. we got via find_memory_block() */
> - put_device(&mem->dev);
> + memory_block_put(mem);
>
> remove_memory_block_devices(cur_start, memblock_size);
>
>
> base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
> --
> 2.54.0
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* [PATCH] drivers/base/memory: make memory block get/put explicit
From: Muchun Song @ 2026-05-11 11:18 UTC (permalink / raw)
To: David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman,
Rafael J. Wysocki, Danilo Krummrich
Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy (CS GROUP), Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle, linux-mm,
driver-core, linux-kernel, linuxppc-dev, linux-s390, Muchun Song,
muchun.song
Rename the memory block lookup helper to make the acquired reference
explicit, add memory_block_put() to wrap put_device(), and collapse the
redundant section-number wrapper into a single block-id based lookup
interface.
This makes it clearer to callers that a successful lookup holds a
reference that must be dropped, reducing the chance of forgetting the
matching put and leaking the memory block device reference.
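The reference discipline that the rename makes explicit can be shown with a small user-space sketch. Everything in it is hypothetical. struct block, block_add(), block_get() and block_put() stand in for struct memory_block, memory_block_get() and memory_block_put(). A plain int counter replaces the struct device reference that memory_block_put() drops via put_device(). The sketch only illustrates the contract: a successful get hands the caller one extra reference, and the caller must drop it with exactly one put.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct memory_block. */
struct block {
	unsigned long id;
	int refcount;	/* stand-in for the struct device refcount */
};

#define NR_BLOCKS 4
static struct block *blocks[NR_BLOCKS];	/* stand-in for the block lookup table */

/* Registration holds the initial reference. */
static struct block *block_add(unsigned long id)
{
	struct block *b;

	if (id >= NR_BLOCKS || blocks[id])
		return NULL;
	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	b->id = id;
	b->refcount = 1;
	blocks[id] = b;
	return b;
}

/* Lookup: on success the caller holds one extra reference. */
static struct block *block_get(unsigned long id)
{
	struct block *b;

	if (id >= NR_BLOCKS)
		return NULL;
	b = blocks[id];
	if (b)
		b->refcount++;
	return b;
}

/* Drop one reference; the last put unlinks and frees the block. */
static void block_put(struct block *b)
{
	assert(b->refcount > 0);	/* catches an unbalanced put */
	if (--b->refcount == 0) {
		blocks[b->id] = NULL;
		free(b);
	}
}
```

In this sketch, every early return after a successful block_get() still needs a block_put(). The -EBUSY paths in the sclp_mem.c hunks of this patch follow the same pattern.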
Link: https://lore.kernel.org/linux-mm/7887915D-E598-42B3-9AFE-BFFBACE8DE2D@linux.dev/#t
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
.../platforms/pseries/hotplug-memory.c | 14 ++-----
drivers/base/memory.c | 38 +++++++------------
drivers/base/node.c | 4 +-
drivers/s390/char/sclp_mem.c | 17 ++++-----
include/linux/memory.h | 7 +++-
mm/memory_hotplug.c | 5 +--
6 files changed, 35 insertions(+), 50 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 75f85a5da981..94f3b57054b6 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -164,13 +164,7 @@ static int update_lmb_associativity_index(struct drmem_lmb *lmb)
static struct memory_block *lmb_to_memblock(struct drmem_lmb *lmb)
{
- unsigned long section_nr;
- struct memory_block *mem_block;
-
- section_nr = pfn_to_section_nr(PFN_DOWN(lmb->base_addr));
-
- mem_block = find_memory_block(section_nr);
- return mem_block;
+ return memory_block_get(phys_to_block_id(lmb->base_addr));
}
static int get_lmb_range(u32 drc_index, int n_lmbs,
@@ -220,7 +214,7 @@ static int dlpar_change_lmb_state(struct drmem_lmb *lmb, bool online)
else
rc = 0;
- put_device(&mem_block->dev);
+ memory_block_put(mem_block);
return rc;
}
@@ -319,12 +313,12 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
rc = dlpar_offline_lmb(lmb);
if (rc) {
- put_device(&mem_block->dev);
+ memory_block_put(mem_block);
return rc;
}
__remove_memory(lmb->base_addr, memory_block_size);
- put_device(&mem_block->dev);
+ memory_block_put(mem_block);
/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, memory_block_size);
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 11d57cfa8d72..5b5d41089e81 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -649,7 +649,7 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
*
* Called under device_hotplug_lock.
*/
-struct memory_block *find_memory_block_by_id(unsigned long block_id)
+struct memory_block *memory_block_get(unsigned long block_id)
{
struct memory_block *mem;
@@ -659,16 +659,6 @@ struct memory_block *find_memory_block_by_id(unsigned long block_id)
return mem;
}
-/*
- * Called under device_hotplug_lock.
- */
-struct memory_block *find_memory_block(unsigned long section_nr)
-{
- unsigned long block_id = memory_block_id(section_nr);
-
- return find_memory_block_by_id(block_id);
-}
-
static struct attribute *memory_memblk_attrs[] = {
&dev_attr_phys_index.attr,
&dev_attr_state.attr,
@@ -701,7 +691,7 @@ static int __add_memory_block(struct memory_block *memory)
ret = device_register(&memory->dev);
if (ret) {
- put_device(&memory->dev);
+ memory_block_put(memory);
return ret;
}
ret = xa_err(xa_store(&memory_blocks, memory->dev.id, memory,
@@ -795,9 +785,9 @@ static int add_memory_block(unsigned long block_id, int nid, unsigned long state
struct memory_block *mem;
int ret = 0;
- mem = find_memory_block_by_id(block_id);
+ mem = memory_block_get(block_id);
if (mem) {
- put_device(&mem->dev);
+ memory_block_put(mem);
return -EEXIST;
}
mem = kzalloc_obj(*mem);
@@ -845,8 +835,8 @@ static void remove_memory_block(struct memory_block *memory)
memory->group = NULL;
}
- /* drop the ref. we got via find_memory_block() */
- put_device(&memory->dev);
+ /* drop the ref. we got via memory_block_get() */
+ memory_block_put(memory);
device_unregister(&memory->dev);
}
@@ -880,7 +870,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
end_block_id = block_id;
for (block_id = start_block_id; block_id != end_block_id;
block_id++) {
- mem = find_memory_block_by_id(block_id);
+ mem = memory_block_get(block_id);
if (WARN_ON_ONCE(!mem))
continue;
remove_memory_block(mem);
@@ -908,7 +898,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
return;
for (block_id = start_block_id; block_id != end_block_id; block_id++) {
- mem = find_memory_block_by_id(block_id);
+ mem = memory_block_get(block_id);
if (WARN_ON_ONCE(!mem))
continue;
num_poisoned_pages_sub(-1UL, memblk_nr_poison(mem));
@@ -1015,12 +1005,12 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
return 0;
for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
- mem = find_memory_block_by_id(block_id);
+ mem = memory_block_get(block_id);
if (!mem)
continue;
ret = func(mem, arg);
- put_device(&mem->dev);
+ memory_block_put(mem);
if (ret)
break;
}
@@ -1228,22 +1218,22 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
void memblk_nr_poison_inc(unsigned long pfn)
{
const unsigned long block_id = pfn_to_block_id(pfn);
- struct memory_block *mem = find_memory_block_by_id(block_id);
+ struct memory_block *mem = memory_block_get(block_id);
if (mem) {
atomic_long_inc(&mem->nr_hwpoison);
- put_device(&mem->dev);
+ memory_block_put(mem);
}
}
void memblk_nr_poison_sub(unsigned long pfn, long i)
{
const unsigned long block_id = pfn_to_block_id(pfn);
- struct memory_block *mem = find_memory_block_by_id(block_id);
+ struct memory_block *mem = memory_block_get(block_id);
if (mem) {
atomic_long_sub(i, &mem->nr_hwpoison);
- put_device(&mem->dev);
+ memory_block_put(mem);
}
}
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 126f66aa2c3e..b3333ca92090 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -847,13 +847,13 @@ static void register_memory_blocks_under_nodes(void)
for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
struct memory_block *mem;
- mem = find_memory_block_by_id(block_id);
+ mem = memory_block_get(block_id);
if (!mem)
continue;
memory_block_add_nid_early(mem, nid);
do_register_memory_block_under_node(nid, mem);
- put_device(&mem->dev);
+ memory_block_put(mem);
}
}
diff --git a/drivers/s390/char/sclp_mem.c b/drivers/s390/char/sclp_mem.c
index 78c054e26d17..6df1926d4c62 100644
--- a/drivers/s390/char/sclp_mem.c
+++ b/drivers/s390/char/sclp_mem.c
@@ -204,7 +204,7 @@ static ssize_t sclp_config_mem_store(struct kobject *kobj, struct kobj_attribute
addr = sclp_mem->id * block_size;
/*
* Hold device_hotplug_lock when adding/removing memory blocks.
- * Additionally, also protect calls to find_memory_block() and
+ * Additionally, also protect calls to memory_block_get() and
* sclp_attach_storage().
*/
rc = lock_device_hotplug_sysfs();
@@ -231,20 +231,19 @@ static ssize_t sclp_config_mem_store(struct kobject *kobj, struct kobj_attribute
sclp_mem_change_state(addr, block_size, 0);
goto out_unlock;
}
- mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(addr)));
- put_device(&mem->dev);
+ mem = memory_block_get(phys_to_block_id(addr));
+ memory_block_put(mem);
WRITE_ONCE(sclp_mem->config, 1);
} else {
if (!sclp_mem->config)
goto out_unlock;
- mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(addr)));
+ mem = memory_block_get(phys_to_block_id(addr));
if (mem->state != MEM_OFFLINE) {
- put_device(&mem->dev);
+ memory_block_put(mem);
rc = -EBUSY;
goto out_unlock;
}
- /* drop the ref just got via find_memory_block() */
- put_device(&mem->dev);
+ memory_block_put(mem);
sclp_mem_change_state(addr, block_size, 0);
__remove_memory(addr, block_size);
#ifdef CONFIG_KASAN
@@ -294,11 +293,11 @@ static ssize_t sclp_memmap_on_memory_store(struct kobject *kobj, struct kobj_att
return rc;
block_size = memory_block_size_bytes();
sclp_mem = container_of(kobj, struct sclp_mem, kobj);
- mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(sclp_mem->id * block_size)));
+ mem = memory_block_get(phys_to_block_id(sclp_mem->id * block_size));
if (!mem) {
WRITE_ONCE(sclp_mem->memmap_on_memory, value);
} else {
- put_device(&mem->dev);
+ memory_block_put(mem);
rc = -EBUSY;
}
unlock_device_hotplug();
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 5bb5599c6b2b..29edef1f975c 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -158,7 +158,11 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
void remove_memory_block_devices(unsigned long start, unsigned long size);
extern void memory_dev_init(void);
extern int memory_notify(enum memory_block_state state, void *v);
-extern struct memory_block *find_memory_block(unsigned long section_nr);
+extern struct memory_block *memory_block_get(unsigned long block_id);
+static inline void memory_block_put(struct memory_block *mem)
+{
+ put_device(&mem->dev);
+}
typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
extern int walk_memory_blocks(unsigned long start, unsigned long size,
void *arg, walk_memory_blocks_func_t func);
@@ -171,7 +175,6 @@ struct memory_group *memory_group_find_by_id(int mgid);
typedef int (*walk_memory_groups_func_t)(struct memory_group *, void *);
int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
struct memory_group *excluded, void *arg);
-struct memory_block *find_memory_block_by_id(unsigned long block_id);
#define hotplug_memory_notifier(fn, pri) ({ \
static __meminitdata struct notifier_block fn##_mem_nb =\
{ .notifier_call = fn, .priority = pri };\
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 462d8dcd636d..890c6453e887 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1417,14 +1417,13 @@ static void remove_memory_blocks_and_altmaps(u64 start, u64 size)
struct vmem_altmap *altmap = NULL;
struct memory_block *mem;
- mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start)));
+ mem = memory_block_get(phys_to_block_id(cur_start));
if (WARN_ON_ONCE(!mem))
continue;
altmap = mem->altmap;
mem->altmap = NULL;
- /* drop the ref. we got via find_memory_block() */
- put_device(&mem->dev);
+ memory_block_put(mem);
remove_memory_block_devices(cur_start, memblock_size);
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v13 04/15] arm64: kexec_file: Fix potential buffer overflow in prepare_elf_headers()
From: Breno Leitao @ 2026-05-11 9:46 UTC (permalink / raw)
To: Jinjie Ruan
Cc: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
pasha.tatashin, pratyush, ruirui.yang, rdunlap, pmladek,
dapeng1.mi, kees, elver, kuba, ebiggers, lirongqing, paulmck,
sourabhjain, coxu, jbohac, ryan.roberts, osandov, cfsworks,
tangyouling, ritesh.list, adityag, guoren, songshuaishuai,
kevin.brodsky, vishal.moola, junhui.liu, wangruikang, namcao,
chao.gao, seanjc, fuqiang.wang, ardb, chenjiahao16, hbathini,
takahiro.akashi, james.morse, lizhengyu3, x86, linux-doc,
linux-kernel, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, devicetree, kexec
In-Reply-To: <20260511030454.1730881-5-ruanjinjie@huawei.com>
On Mon, May 11, 2026 at 11:04:43AM +0800, Jinjie Ruan wrote:
> There is a race condition between the kexec_load() system call
> (crash kernel loading path) and memory hotplug operations that can
> lead to buffer overflow and potential kernel crash.
>
> During prepare_elf_headers(), the following steps occur:
> 1. The first for_each_mem_range() queries current System RAM memory ranges
> 2. Allocates buffer based on queried count
> 3. The second for_each_mem_range() populates ranges from memblock
>
> If memory hotplug occurs between step 1 and step 3, the number of ranges
> can increase, causing out-of-bounds write when populating cmem->ranges[].
>
> This happens because kexec_load() uses kexec_trylock (atomic_t) while
> memory hotplug uses device_hotplug_lock (mutex), so they don't serialize
> with each other.
>
> Add the explicit bounds checking to prevent out-of-bounds access.
It seems you have a TOCTOU type of issue, and this seems to be shrinking
the window, but not fully solving it?
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Breno Leitao <leitao@debian.org>
> Cc: stable@vger.kernel.org
> Fixes: 3751e728cef2 ("arm64: kexec_file: add crash dump support")
> Closes: https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie%40huawei.com
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
> arch/arm64/kernel/machine_kexec_file.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index e31fabed378a..a67e7b1abbab 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -59,6 +59,11 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
> cmem->max_nr_ranges = nr_ranges;
> cmem->nr_ranges = 0;
> for_each_mem_range(i, &start, &end) {
> + if (cmem->nr_ranges >= cmem->max_nr_ranges) {
> + ret = -ENOMEM;
-ENOMEM seems to be the wrong errno. This isn't an allocation
failure; it's a transient race. -EBUSY or -EAGAIN would be more honest.
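The count/allocate/fill sequence described in the patch, with the added bounds check, can be sketched in user space. The names (struct crash_mem_sketch, alloc_ranges(), fill_ranges()) are hypothetical, and the source array stands in for memblock. -EAGAIN follows the suggestion above rather than the posted patch. As noted, the check only turns an out-of-bounds write into an error. It does not serialize the two passes against memory hotplug.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct range {
	unsigned long start, end;
};

/* Hypothetical stand-in for struct crash_mem. */
struct crash_mem_sketch {
	unsigned int max_nr_ranges;	/* capacity from the counting pass */
	unsigned int nr_ranges;		/* entries filled so far */
	struct range ranges[];		/* sized by max_nr_ranges */
};

/* Steps 1-2: allocate room for the number of ranges counted earlier. */
static struct crash_mem_sketch *alloc_ranges(unsigned int n_counted)
{
	struct crash_mem_sketch *cmem;

	cmem = malloc(sizeof(*cmem) + n_counted * sizeof(cmem->ranges[0]));
	if (!cmem)
		return NULL;
	cmem->max_nr_ranges = n_counted;
	cmem->nr_ranges = 0;
	return cmem;
}

/*
 * Step 3: copy the ranges present *now*. If more ranges appeared since
 * the count (e.g. hot-added memory), stop before writing past ranges[]
 * and report a transient failure instead.
 */
static int fill_ranges(struct crash_mem_sketch *cmem,
		       const struct range *src, unsigned int n_now)
{
	assert(cmem != NULL);
	cmem->nr_ranges = 0;
	for (unsigned int i = 0; i < n_now; i++) {
		if (cmem->nr_ranges >= cmem->max_nr_ranges)
			return -EAGAIN;
		cmem->ranges[cmem->nr_ranges++] = src[i];
	}
	return 0;
}
```

A caller could retry the whole sequence on -EAGAIN. This thread has not settled whether a retry is appropriate in prepare_elf_headers().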
^ permalink raw reply
* [RFC PATCH net-next 1/5] ibmvnic: Move long delayed work on system_dfl_long_wq
From: Marco Crivellari @ 2026-05-11 9:28 UTC (permalink / raw)
To: linux-kernel, netdev
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Haren Myneni, Rick Lindsley, Nick Child,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy (CS GROUP), linuxppc-dev
In-Reply-To: <20260511092846.120141-1-marco.crivellari@suse.com>
Currently the code enqueues work items with {queue|mod}_delayed_work()
on system_long_wq. This workqueue is meant for long-running work items,
and it is a per-CPU workqueue.
The functions end up calling __queue_delayed_work(), which sets a global
timer that can fire on any CPU; the work is then enqueued on the CPU
where the timer fired.
Unbound work items can benefit from scheduler task placement, which
optimizes performance and power consumption. Long-running work should
not stick to a single CPU.
Recently, a new unbound workqueue specific for long running work has
been added:
c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")
Since this work does not rely on per-CPU variables, there is no obvious
reason to use a per-CPU workqueue. So replace system_long_wq with
system_dfl_long_wq so that the work can benefit from scheduler task
placement.
Cc: Haren Myneni <haren@linux.ibm.com>
Cc: Rick Lindsley <ricklind@linux.ibm.com>
Cc: Nick Child <nnac123@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5a510eed335e..a1c01c9820d2 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3229,7 +3229,7 @@ static void __ibmvnic_reset(struct work_struct *work)
if (adapter->state == VNIC_PROBING &&
!wait_for_completion_timeout(&adapter->probe_done, timeout)) {
dev_err(dev, "Reset thread timed out on probe");
- queue_delayed_work(system_long_wq,
+ queue_delayed_work(system_dfl_long_wq,
&adapter->ibmvnic_delayed_reset,
IBMVNIC_RESET_DELAY);
return;
@@ -3267,7 +3267,7 @@ static void __ibmvnic_reset(struct work_struct *work)
spin_lock(&adapter->rwi_lock);
if (!list_empty(&adapter->rwi_list)) {
if (test_and_set_bit_lock(0, &adapter->resetting)) {
- queue_delayed_work(system_long_wq,
+ queue_delayed_work(system_dfl_long_wq,
&adapter->ibmvnic_delayed_reset,
IBMVNIC_RESET_DELAY);
} else {
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v2] KVM: PPC: Book3S HV: Add H_FAC_UNAVAIL mapping for tracing exits
From: Gautam Menghani @ 2026-05-11 8:19 UTC (permalink / raw)
To: Gautam Menghani, maddy, npiggin, mpe, chleroy, linuxppc-dev, kvm,
linux-kernel
In-Reply-To: <20260507151014.fd9d5732-0f-amachhiw@linux.ibm.com>
On Thu, May 07, 2026 at 03:12:29PM +0530, Amit Machhiwal wrote:
> On 2026/04/28 02:15 PM, Gautam Menghani wrote:
> > From: Gautam Menghani <gautam@linux.ibm.com>
> >
> > The macro kvm_trace_symbol_exit is used for providing the mappings
> > for the trap vectors and their names. Add mapping for H_FAC_UNAVAIL so that
> > trap reason is displayed as string instead of a vector number when using
> > the kvm_guest_exit tracepoint.
> >
> > Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
> > ---
> > v2:
> > 1. Remove the trailing comma after last element
> >
> > arch/powerpc/kvm/trace_book3s.h | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/trace_book3s.h b/arch/powerpc/kvm/trace_book3s.h
> > index 9260ddbd557f..5d272c115331 100644
> > --- a/arch/powerpc/kvm/trace_book3s.h
> > +++ b/arch/powerpc/kvm/trace_book3s.h
> > @@ -28,6 +28,7 @@
> > {0xea0, "H_VIRT"}, \
> > {0xf00, "PERFMON"}, \
> > {0xf20, "ALTIVEC"}, \
> > - {0xf40, "VSX"}
> > + {0xf40, "VSX"}, \
> > + {0xf80, "H_FAC_UNAVAIL"}
>
> While we are at it, should we also consider adding 0xf60 for Facility
> Unavailable? Anyways, LGTM.
0xf60 is handled by the OS when problem state tries to use a facility
that is not available, so we won't exit to the host for it.
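The effect of adding {0xf80, "H_FAC_UNAVAIL"} can be modelled as a lookup table. This is a hypothetical user-space mirror of only the entries quoted in the patch, not the kernel's tracing machinery. A vector with no entry has no name, so the trace output falls back to showing the number. 0xf60 is an example: as explained above, the OS handles it and it does not cause an exit to the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the vector/name pairs quoted in the patch. */
struct exit_sym {
	unsigned int vector;
	const char *name;
};

static const struct exit_sym exit_syms[] = {
	{0xea0, "H_VIRT"},
	{0xf00, "PERFMON"},
	{0xf20, "ALTIVEC"},
	{0xf40, "VSX"},
	{0xf80, "H_FAC_UNAVAIL"},	/* entry added by this patch */
};

/* Return the name for a trap vector, or NULL if it has no mapping. */
static const char *exit_name(unsigned int vector)
{
	size_t n = sizeof(exit_syms) / sizeof(exit_syms[0]);

	for (size_t i = 0; i < n; i++)
		if (exit_syms[i].vector == vector)
			return exit_syms[i].name;
	return NULL;
}
```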
Thanks,
Gautam
^ permalink raw reply
* [PATCH] powerpc/64s: Fix the vector number in comments for h_facility_unavailable
From: Gautam Menghani @ 2026-05-11 8:04 UTC (permalink / raw)
To: maddy, mpe, npiggin, chleroy; +Cc: Gautam Menghani, linuxppc-dev, linux-kernel
From: Gautam Menghani <gautam@linux.ibm.com>
The comment explaining the h_facility_unavailable interrupt gives the
vector number as 0xf60 instead of 0xf80. Fix this typo.
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
---
arch/powerpc/kernel/exceptions-64s.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b7229430ca94..2696fbbca3b6 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2498,7 +2498,7 @@ EXC_COMMON_BEGIN(facility_unavailable_common)
/**
- * Interrupt 0xf60 - Hypervisor Facility Unavailable Interrupt.
+ * Interrupt 0xf80 - Hypervisor Facility Unavailable Interrupt.
* This is a synchronous interrupt in response to
* executing an instruction without access to the facility that can only
* be resolved in HV mode (e.g., HFSCR).
--
2.53.0
^ permalink raw reply related
* Re: powernv_rng_read: Oops: Kernel access of bad area, sig: 11 [#1]
From: Paul Menzel @ 2026-05-11 8:03 UTC (permalink / raw)
To: Madhavan Srinivasan, Olivia Mackall, Herbert Xu, Michael Ellerman,
Jason A. Donenfeld
Cc: linux-crypto, linuxppc-dev, LKML
In-Reply-To: <6f58b950-a997-4dd6-a1a2-95eb72009151@molgen.mpg.de>
Dear Madhavan, dear Jason,
Am 11.05.26 um 09:00 schrieb Paul Menzel:
> Am 07.05.26 um 04:40 schrieb Madhavan Srinivasan:
>>
>> On 5/6/26 7:31 PM, Paul Menzel wrote:
>
>>> After a long while, on the 8335-GCA POWER8 (raw) 0x4d0200
>>> opal:skiboot-5.4.8-5787ad3 PowerNV, I built Linux from Linus’ master
>>> branch and rebooted via kexec.
>>>
>>> ```
>>> [ 0.000000] Linux version 7.1.0-rc2+ (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #3 SMP PREEMPT Wed May 6 08:50:58 CEST 2026
>>> […]
>>> [ 17.901992] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
>>> [ 17.902011] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>> [ 17.902018] Faulting instruction address: 0xc0000000000e7138
>>> [ 17.902027] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [ 17.902034] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>>> [ 17.902045] Modules linked in: powernv_rng(+) bnx2x ofpart ibmpowernv bfq mdio cmdlinepart powernv_flash ipmi_powernv ipmi_devintf mtd ipmi_msghandler at24(+) vmx_crypto opal_prd sch_fq_codel nfsd parport_pc ppdev auth_rpcgss nfs_acl lp lockd grace parport sunrpc autofs4 btrfs xor libblake2b raid6_pq ast drm_shmem_helper drm_client_lib i2c_algo_bit drm_kms_helper drm ahci drm_panel_orientation_quirks libahci
>>> [ 17.902185] CPU: 147 UID: 0 PID: 2626 Comm: hwrng Not tainted 7.1.0-rc2+ #3 PREEMPTLAZY
>>> [ 17.902197] Hardware name: 8335-GCA POWER8 (raw) 0x4d0200 opal:skiboot-5.4.8-5787ad3 PowerNV
>>> [ 17.902204] NIP: c0000000000e7138 LR: c00800001ec8013c CTR: c0000000000e70fc
>>> [ 17.902212] REGS: c000000092913c50 TRAP: 0300 Not tainted (7.1.0-rc2+)
>>> [ 17.902222] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44420220 XER: 20000000
>>> [ 17.902269] CFAR: c00800001ec8026c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
>>> GPR00: c00800001ec8013c c000000092913ef0 c000000001c18100 c00000002222d900
>>> GPR04: c00000002222d900 0000000000000080 0000000000000001 0000000000000000
>>> GPR08: 0000000000000000 c000000002212000 c0000000951e1780 c00800001ec80258
>>> GPR12: c0000000000e70fc c00000ffff6fd700 c0000000001d11c0 c00000001b99b9c0
>>> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> GPR24: 0000000000000000 c000000002fe6a58 0000000000000000 0000000000000000
>>> GPR28: c000000002fe6a20 0000000000000010 000000000000000f c00000002222d900
>>> [ 17.902406] NIP [c0000000000e7138] pnv_get_random_long+0x3c/0x114
>>> [ 17.902426] LR [c00800001ec8013c] powernv_rng_read+0x78/0xc4 [powernv_rng]
>>> [ 17.902444] Call Trace:
>>> [ 17.902448] [c000000092913ef0] [c000000092913f30] 0xc000000092913f30 (unreliable)
>>> [ 17.902463] [c000000092913f30] [c000000000decd58] hwrng_fillfn+0xd4/0x3dc
>>> [ 17.902484] [c000000092913f90] [c0000000001d1328] kthread+0x170/0x1a4
>>> [ 17.902498] [c000000092913fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
>>> [ 17.902513] Code: 60000000 7d2000a6 71290010 418200bc e94d0908 812a0000 39290001 912a0000 e90d0030 3d220060 39299f00 7d08482a <e9280000> 7c0004ac e8e90000 0c070000
>>> [ 17.902569] ---[ end trace 0000000000000000 ]---
>>> [ 18.008801] pstore: backend (nvram) writing error (-1)
>>>
>>> [ 18.015458] note: hwrng[2626] exited with irqs disabled
>>> [ 18.015483] note: hwrng[2626] exited with preempt_count 1
>>> ```
>>>
>>> Please find the output of `dmesg` attached.
>>
>> This is from yesterday's boot test log on my P8; I did not see this
>> failure.
>>
>> root@ltcppm1:~# uname -a
>> Linux ltcppm1.ltc.tadn.ibm.com 7.1.0-rc2-00021-gf583bd5f64d4 #1 SMP
>> PREEMPT Wed May 6 00:55:45 EDT 2026 ppc64le GNU/Linux
>> root@ltcppm1:~# dmesg
>> [ 0.000000] [ T0] random: crng init done
>> [ 0.000000] [ T0] hash-mmu: Page sizes from device-tree:
>> [ 0.000000] [ T0] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
>> [ 0.000000] [ T0] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
>> [ 0.000000] [ T0] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
>> [ 0.000000] [ T0] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
>> [ 0.000000] [ T0] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
>> [ 0.000000] [ T0] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
>> [ 0.000000] [ T0] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
>> [ 0.000000] [ T0] Enabling pkeys with max key count 32
>> [ 0.000000] [ T0] Activating Kernel Userspace Access Prevention
>> [ 0.000000] [ T0] Activating Kernel Userspace Execution Prevention
>> [ 0.000000] [ T0] hash-mmu: Page orders: linear mapping = 24, virtual = 16, io = 16, vmemmap = 24
>> [ 0.000000] [ T0] hash-mmu: Using 1TB segments
>> [ 0.000000] [ T0] hash-mmu: Initializing hash mmu with SLB
>> [ 0.000000] [ T0] Linux version 7.1.0-rc2-00021-gf583bd5f64d4 (root@ltcppm1.ltc.tadn.ibm.com) (gcc (GCC) 16.1.1 20260501 (Red Hat 16.1.1-1), GNU ld version 2.46-1.fc44) #1 SMP PREEMPT Wed May 6 00:55:45 EDT 2026
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000000039c00000..0x000000003b6801ff (27136 KiB) map non-reusable ibm,firmware-allocs-memory@39c00000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000000800000000..0x0000000800e801ff (14848 KiB) map non-reusable ibm,firmware-allocs-memory@800000000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001000000000..0x0000001000dc01ff (14080 KiB) map non-reusable ibm,firmware-allocs-memory@1000000000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001800000000..0x0000001800e801ff (14848 KiB) map non-reusable ibm,firmware-allocs-memory@1800000000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000000030000000..0x00000000302fffff (3072 KiB) map non-reusable ibm,firmware-code@30000000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000000031000000..0x0000000031bfffff (12288 KiB) map non-reusable ibm,firmware-data@31000000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000000030300000..0x0000000030ffffff (13312 KiB) map non-reusable ibm,firmware-heap@30300000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000000031c00000..0x0000000033fdffff (36736 KiB) map non-reusable ibm,firmware-stacks@31c00000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffd510000..0x0000001ffd69ffff (1600 KiB) map non-reusable ibm,hbrt-code-image@1ffd510000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffd6a0000..0x0000001ffd6fffff (384 KiB) map non-reusable ibm,hbrt-target-image@1ffd6a0000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffd700000..0x0000001ffd7fffff (1024 KiB) map non-reusable ibm,hbrt-vpd-image@1ffd700000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffda00000..0x0000001ffdafffff (1024 KiB) map non-reusable ibm,slw-image@1ffda00000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffde00000..0x0000001ffdefffff (1024 KiB) map non-reusable ibm,slw-image@1ffde00000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffe200000..0x0000001ffe2fffff (1024 KiB) map non-reusable ibm,slw-image@1ffe200000
>> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffe600000..0x0000001ffe6fffff (1024 KiB) map non-reusable ibm,slw-image@1ffe600000
>> [ 0.000000] [ T0] Found initrd at 0xc000000006a40000:0xc00000000815ae9e
>> [ 0.000000] [ T0] Hardware name: 8247-22L POWER8E (raw) 0x4b0201 opal:skiboot-v5.4.12 PowerNV
>> [ 0.000000] [ T0] printk: legacy bootconsole [udbg0] enabled
>> [ 0.000000] [ T0] CPU maps initialized for 8 threads per core
>> [ 0.000000] [ T0] (thread shift is 3)
>> But my system's OPAL version is 5.4.12.
>>
>> Thanks for reporting the issue, will have a look at it.
>
> I bisected it to a change between 5.19-rc3 and 5.19-rc4, and merge
> commit 8100775d59a6 (Merge tag 'powerpc-5.19-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux) [1] indeed
> has RNG-related changes:
>
>> - Three fixes to wire up our various RNGs earlier in boot so they're
>> available for use in the initial seeding in random_init().
I confirmed that commit f3eac426657d (powerpc/powernv: wire up rng
during setup_arch) [2] introduced the Oops.
>> [ 0.000000] [ T0] Allocated 4608 bytes for 160 pacas
>> [ 0.000000] [ T0]
>> -----------------------------------------------------
>>
>> .......
>>
>> [ 37.407674] [ T900] audit: type=1130 audit(1778043621.931:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-monitor comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [ 37.413015] [ T900] audit: type=1130 audit(1778043621.937:11): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [ 38.448156] [ T2286] powernv_rng: Registered powernv hwrng.
>> [ 38.575227] [ T2264] tg3 0005:09:00.1 enP5p9s0f1: renamed from eth1
>> [ 38.582176] [ T2223] tg3 0005:09:00.2 enP5p9s0f2: renamed from eth2
>> ........
>>
>> ////cpuinfo output
>>
>> processor : 159
>>
>> cpu : POWER8E (raw), altivec supported
>> clock : 2061.000000MHz
>> revision : 2.1 (pvr 004b 0201)
>>
>> timebase : 512000000
>> platform : PowerNV
>> model : 8247-22L
>> machine : PowerNV 8247-22L
>> firmware : OPAL
>> MMU : Hash
>>
>>
>> But my system's OPAL version is 5.4.12.
>> Thanks for reporting the issue, will have a look at it.
>
> Thank you.
Kind regards,
Paul
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8100775d59a6789c3c6c309de26fac52f129cba8
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f3eac426657d985b97c92fa5f7ae1d43f04721f3
^ permalink raw reply
* Re: powernv_rng_read: Oops: Kernel access of bad area, sig: 11 [#1]
From: Paul Menzel @ 2026-05-11 7:00 UTC (permalink / raw)
To: Madhavan Srinivasan, Olivia Mackall, Herbert Xu, Michael Ellerman
Cc: linux-crypto, linuxppc-dev, LKML, Jason A. Donenfeld
In-Reply-To: <0c06bc14-9459-44d5-9e28-b0b78c0fbe36@linux.ibm.com>
[Cc: +Jason]
Dear Madhavan,
Am 07.05.26 um 04:40 schrieb Madhavan Srinivasan:
>
> On 5/6/26 7:31 PM, Paul Menzel wrote:
>> After a long while, on the 8335-GCA POWER8 (raw) 0x4d0200
>> opal:skiboot-5.4.8-5787ad3 PowerNV, I built Linux from Linus’ master
>> branch and rebooted via kexec.
>>
>> ```
>> [ 0.000000] Linux version 7.1.0-rc2+ (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #3 SMP PREEMPT Wed May 6 08:50:58 CEST 2026
>> […]
>> [ 17.901992] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
>> [ 17.902011] BUG: Kernel NULL pointer dereference on read at 0x00000000
>> [ 17.902018] Faulting instruction address: 0xc0000000000e7138
>> [ 17.902027] Oops: Kernel access of bad area, sig: 11 [#1]
>> [ 17.902034] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>> [ 17.902045] Modules linked in: powernv_rng(+) bnx2x ofpart ibmpowernv bfq mdio cmdlinepart powernv_flash ipmi_powernv ipmi_devintf mtd ipmi_msghandler at24(+) vmx_crypto opal_prd sch_fq_codel nfsd parport_pc ppdev auth_rpcgss nfs_acl lp lockd grace parport sunrpc autofs4 btrfs xor libblake2b raid6_pq ast drm_shmem_helper drm_client_lib i2c_algo_bit drm_kms_helper drm ahci drm_panel_orientation_quirks libahci
>> [ 17.902185] CPU: 147 UID: 0 PID: 2626 Comm: hwrng Not tainted 7.1.0-rc2+ #3 PREEMPTLAZY
>> [ 17.902197] Hardware name: 8335-GCA POWER8 (raw) 0x4d0200 opal:skiboot-5.4.8-5787ad3 PowerNV
>> [ 17.902204] NIP: c0000000000e7138 LR: c00800001ec8013c CTR: c0000000000e70fc
>> [ 17.902212] REGS: c000000092913c50 TRAP: 0300 Not tainted (7.1.0-rc2+)
>> [ 17.902222] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44420220 XER: 20000000
>> [ 17.902269] CFAR: c00800001ec8026c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
>> GPR00: c00800001ec8013c c000000092913ef0 c000000001c18100 c00000002222d900
>> GPR04: c00000002222d900 0000000000000080 0000000000000001 0000000000000000
>> GPR08: 0000000000000000 c000000002212000 c0000000951e1780 c00800001ec80258
>> GPR12: c0000000000e70fc c00000ffff6fd700 c0000000001d11c0 c00000001b99b9c0
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR24: 0000000000000000 c000000002fe6a58 0000000000000000 0000000000000000
>> GPR28: c000000002fe6a20 0000000000000010 000000000000000f c00000002222d900
>> [ 17.902406] NIP [c0000000000e7138] pnv_get_random_long+0x3c/0x114
>> [ 17.902426] LR [c00800001ec8013c] powernv_rng_read+0x78/0xc4 [powernv_rng]
>> [ 17.902444] Call Trace:
>> [ 17.902448] [c000000092913ef0] [c000000092913f30] 0xc000000092913f30 (unreliable)
>> [ 17.902463] [c000000092913f30] [c000000000decd58] hwrng_fillfn+0xd4/0x3dc
>> [ 17.902484] [c000000092913f90] [c0000000001d1328] kthread+0x170/0x1a4
>> [ 17.902498] [c000000092913fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
>> [ 17.902513] Code: 60000000 7d2000a6 71290010 418200bc e94d0908 812a0000 39290001 912a0000 e90d0030 3d220060 39299f00 7d08482a <e9280000> 7c0004ac e8e90000 0c070000
>> [ 17.902569] ---[ end trace 0000000000000000 ]---
>> [ 18.008801] pstore: backend (nvram) writing error (-1)
>>
>> [ 18.015458] note: hwrng[2626] exited with irqs disabled
>> [ 18.015483] note: hwrng[2626] exited with preempt_count 1
>> ```
>>
>> Please find the output of `dmesg` attached.
>
> This is from yesterday's boot test log on my P8; I did not see this failure.
>
> root@ltcppm1:~# uname -a
> Linux ltcppm1.ltc.tadn.ibm.com 7.1.0-rc2-00021-gf583bd5f64d4 #1 SMP PREEMPT Wed May 6 00:55:45 EDT 2026 ppc64le GNU/Linux
> root@ltcppm1:~# dmesg
> [ 0.000000] [ T0] random: crng init done
> [ 0.000000] [ T0] hash-mmu: Page sizes from device-tree:
> [ 0.000000] [ T0] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
> [ 0.000000] [ T0] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
> [ 0.000000] [ T0] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
> [ 0.000000] [ T0] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
> [ 0.000000] [ T0] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
> [ 0.000000] [ T0] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
> [ 0.000000] [ T0] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
> [ 0.000000] [ T0] Enabling pkeys with max key count 32
> [ 0.000000] [ T0] Activating Kernel Userspace Access Prevention
> [ 0.000000] [ T0] Activating Kernel Userspace Execution Prevention
> [ 0.000000] [ T0] hash-mmu: Page orders: linear mapping = 24, virtual = 16, io = 16, vmemmap = 24
> [ 0.000000] [ T0] hash-mmu: Using 1TB segments
> [ 0.000000] [ T0] hash-mmu: Initializing hash mmu with SLB
> [ 0.000000] [ T0] Linux version 7.1.0-rc2-00021-gf583bd5f64d4 (root@ltcppm1.ltc.tadn.ibm.com) (gcc (GCC) 16.1.1 20260501 (Red Hat 16.1.1-1), GNU ld version 2.46-1.fc44) #1 SMP PREEMPT Wed May 6 00:55:45 EDT 2026
> [ 0.000000] [ T0] OF: reserved mem: 0x0000000039c00000..0x000000003b6801ff (27136 KiB) map non-reusable ibm,firmware-allocs-memory@39c00000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000000800000000..0x0000000800e801ff (14848 KiB) map non-reusable ibm,firmware-allocs-memory@800000000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001000000000..0x0000001000dc01ff (14080 KiB) map non-reusable ibm,firmware-allocs-memory@1000000000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001800000000..0x0000001800e801ff (14848 KiB) map non-reusable ibm,firmware-allocs-memory@1800000000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000000030000000..0x00000000302fffff (3072 KiB) map non-reusable ibm,firmware-code@30000000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000000031000000..0x0000000031bfffff (12288 KiB) map non-reusable ibm,firmware-data@31000000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000000030300000..0x0000000030ffffff (13312 KiB) map non-reusable ibm,firmware-heap@30300000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000000031c00000..0x0000000033fdffff (36736 KiB) map non-reusable ibm,firmware-stacks@31c00000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffd510000..0x0000001ffd69ffff (1600 KiB) map non-reusable ibm,hbrt-code-image@1ffd510000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffd6a0000..0x0000001ffd6fffff (384 KiB) map non-reusable ibm,hbrt-target-image@1ffd6a0000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffd700000..0x0000001ffd7fffff (1024 KiB) map non-reusable ibm,hbrt-vpd-image@1ffd700000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffda00000..0x0000001ffdafffff (1024 KiB) map non-reusable ibm,slw-image@1ffda00000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffde00000..0x0000001ffdefffff (1024 KiB) map non-reusable ibm,slw-image@1ffde00000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffe200000..0x0000001ffe2fffff (1024 KiB) map non-reusable ibm,slw-image@1ffe200000
> [ 0.000000] [ T0] OF: reserved mem: 0x0000001ffe600000..0x0000001ffe6fffff (1024 KiB) map non-reusable ibm,slw-image@1ffe600000
> [ 0.000000] [ T0] Found initrd at 0xc000000006a40000:0xc00000000815ae9e
> [ 0.000000] [ T0] Hardware name: 8247-22L POWER8E (raw) 0x4b0201 opal:skiboot-v5.4.12 PowerNV
> [ 0.000000] [ T0] printk: legacy bootconsole [udbg0] enabled
> [ 0.000000] [ T0] CPU maps initialized for 8 threads per core
> [ 0.000000] [ T0] (thread shift is 3)
> But my OPAL version is 5.4.12.
>
> Thanks for reporting the issue, I will have a look at it.

I bisected it to a change between 5.19-rc3 and 5.19-rc4, and merge
commit 8100775d59a6 ("Merge tag 'powerpc-5.19-3' of
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux") [1] indeed
has RNG-related changes:

> - Three fixes to wire up our various RNGs earlier in boot so they're
> available for use in the initial seeding in random_init().

> [ 0.000000] [ T0] Allocated 4608 bytes for 160 pacas
> [ 0.000000] [ T0] -----------------------------------------------------
>
> .......
>
> [ 37.407674] [ T900] audit: type=1130 audit(1778043621.931:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=lvm2-monitor comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [ 37.413015] [ T900] audit: type=1130 audit(1778043621.937:11): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [ 38.448156] [ T2286] powernv_rng: Registered powernv hwrng.
> [ 38.575227] [ T2264] tg3 0005:09:00.1 enP5p9s0f1: renamed from eth1
> [ 38.582176] [ T2223] tg3 0005:09:00.2 enP5p9s0f2: renamed from eth2
> ........
>
> ////cpuinfo output
>
> processor : 159
>
> cpu : POWER8E (raw), altivec supported
> clock : 2061.000000MHz
> revision : 2.1 (pvr 004b 0201)
>
> timebase : 512000000
> platform : PowerNV
> model : 8247-22L
> machine : PowerNV 8247-22L
> firmware : OPAL
> MMU : Hash
>
>
> But my system's OPAL version is 5.4.12.
> Thanks for reporting the issue, I will have a look at it.
Thank you.
Kind regards,
Paul
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8100775d59a6789c3c6c309de26fac52f129cba8