LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH v15 13/23] riscv: kexec_file: Fix TOCTOU buffer overflow via memory region padding
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to
Time-of-Use) race condition in prepare_elf_headers() between the initial
pass that counts System RAM ranges and the second pass that populates them.
If a memory hotplug event occurs between these two steps, the number of
memory regions may increase, causing an out-of-bounds write to
the cmem->ranges[] array.

Fix this by adding `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to the
flexible array allocation up front, so that up to 128 memory regions
added concurrently are absorbed. In addition, add a defensive boundary
check inside the callback that returns -EAGAIN on an unexpected overrun,
so the fill pass can no longer write past the end of cmem->ranges[].
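
The count-then-fill pattern with padding and a bounds check can be
sketched in user space as follows. This is a minimal illustration, not
the kernel code: `SAFETY_PADDING`, `mem_list`, `mem_list_alloc()`,
`mem_list_add()` and `simulate_fill()` are hypothetical names, and
calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SAFETY_PADDING 128	/* hypothetical mirror of CRASH_HOTPLUG_SAFETY_PADDING */

struct range {
	unsigned long long start, end;
};

struct mem_list {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Allocate room for the counted regions, two exclusion slots, and padding. */
struct mem_list *mem_list_alloc(unsigned int counted)
{
	unsigned int max = counted + 2 + SAFETY_PADDING;
	struct mem_list *m = calloc(1, sizeof(*m) + max * sizeof(m->ranges[0]));

	if (m)
		m->max_nr_ranges = max;
	return m;
}

/* Fill callback: refuse to write past the capacity fixed at allocation. */
int mem_list_add(struct mem_list *m, unsigned long long start,
		 unsigned long long end)
{
	if (m->nr_ranges >= m->max_nr_ranges)
		return -EAGAIN;

	m->ranges[m->nr_ranges].start = start;
	m->ranges[m->nr_ranges].end = end;
	m->nr_ranges++;
	return 0;
}

/*
 * Simulate the race: `counted` regions were seen by the first pass, but
 * `present` regions exist by the time the second pass runs.
 */
int simulate_fill(unsigned int counted, unsigned int present)
{
	struct mem_list *m = mem_list_alloc(counted);
	int ret = 0;

	if (!m)
		return -ENOMEM;

	for (unsigned int i = 0; i < present && !ret; i++)
		ret = mem_list_add(m, i * 0x1000ULL, i * 0x1000ULL + 0xfff);

	free(m);
	return ret;
}
```

With 4 counted regions the capacity is 4 + 2 + 128 = 134, so
simulate_fill(4, 134) succeeds while simulate_fill(4, 135) returns
-EAGAIN instead of writing out of bounds. In this sketch the two
exclusion slots count toward that capacity as well.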

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: songshuaishuai@tinylab.org
Cc: bjorn@rivosinc.com
Cc: leitao@debian.org
Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/riscv/kernel/machine_kexec_file.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/machine_kexec_file.c b/arch/riscv/kernel/machine_kexec_file.c
index 3f7766057cac..f3576dc0513f 100644
--- a/arch/riscv/kernel/machine_kexec_file.c
+++ b/arch/riscv/kernel/machine_kexec_file.c
@@ -48,6 +48,9 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
 	struct crash_mem *cmem = arg;
 
+	if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges))
+		return -EAGAIN;
+
 	cmem->ranges[cmem->nr_ranges].start = res->start;
 	cmem->ranges[cmem->nr_ranges].end = res->end;
 	cmem->nr_ranges++;
@@ -61,7 +64,8 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 	unsigned int nr_ranges;
 	int ret;
 
-	nr_ranges = 2; /* For exclusion of crashkernel region */
+	/* For exclusion of crashkernel region */
+	nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
 
 	cmem = kmalloc_flex(*cmem, ranges, nr_ranges);
-- 
2.34.1




* [PATCH v15 12/23] arm64: kexec_file: Fix TOCTOU buffer overflow via memory region padding
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to
Time-of-Use) race condition in prepare_elf_headers() between the initial
pass that counts System RAM ranges and the second pass that populates them.
If a memory hotplug event occurs between these two steps, the number of
memory regions may increase, causing an out-of-bounds write to
the cmem->ranges[] array.

Fix this by adding `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to the
flexible array allocation up front, so that up to 128 memory regions
added concurrently are absorbed. In addition, add a defensive boundary
check in the fill loop that returns -EAGAIN on an unexpected overrun,
so the loop can no longer write past the end of cmem->ranges[].

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Fixes: 3751e728cef2 ("arm64: kexec_file: add crash dump support")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/machine_kexec_file.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 4cbb71e1f8ed..8a96fb68b88d 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -48,7 +48,8 @@ int prepare_elf_headers(void **addr, unsigned long *sz)
 	u64 i;
 	phys_addr_t start, end;
 
-	nr_ranges = 2; /* for exclusion of crashkernel region */
+	/* for exclusion of crashkernel region */
+	nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	for_each_mem_range(i, &start, &end)
 		nr_ranges++;
 
@@ -59,6 +60,11 @@ int prepare_elf_headers(void **addr, unsigned long *sz)
 	cmem->max_nr_ranges = nr_ranges;
 	cmem->nr_ranges = 0;
 	for_each_mem_range(i, &start, &end) {
+		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges)) {
+			ret = -EAGAIN;
+			goto out;
+		}
+
 		cmem->ranges[cmem->nr_ranges].start = start;
 		cmem->ranges[cmem->nr_ranges].end = end - 1;
 		cmem->nr_ranges++;
-- 
2.34.1




* [PATCH v15 11/23] x86: kexec_file: Fix TOCTOU buffer overflow via memory region padding
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to
Time-of-Use) race condition in prepare_elf_headers() between the initial
pass that counts System RAM ranges and the second pass that populates them.
If a memory hotplug event occurs between these two steps, the number of
memory regions may increase, causing an out-of-bounds write to
the cmem->ranges[] array.

Fix this by adding `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to the
flexible array allocation up front, so that up to 128 memory regions
added concurrently are absorbed. In addition, add a defensive boundary
check inside the callback that returns -EAGAIN on an unexpected overrun,
so the fill pass can no longer write past the end of cmem->ranges[].

Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 8d5f894a3108 ("x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/x86/kernel/crash.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cd796818d94d..a1089907728d 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -177,7 +177,7 @@ static struct crash_mem *fill_up_crash_elf_data(void)
 	 * But in order to lest the low 1M could be changed in the future,
 	 * (e.g. [start, 1M]), add a extra slot.
 	 */
-	nr_ranges += 3 + crashk_cma_cnt;
+	nr_ranges += 3 + crashk_cma_cnt + CRASH_HOTPLUG_SAFETY_PADDING;
 	cmem = vzalloc(struct_size(cmem, ranges, nr_ranges));
 	if (!cmem)
 		return NULL;
@@ -226,6 +226,9 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
 	struct crash_mem *cmem = arg;
 
+	if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges))
+		return -EAGAIN;
+
 	cmem->ranges[cmem->nr_ranges].start = res->start;
 	cmem->ranges[cmem->nr_ranges].end = res->end;
 	cmem->nr_ranges++;
-- 
2.34.1




* [PATCH v15 10/23] crash_core: Introduce CRASH_HOTPLUG_SAFETY_PADDING for memory hotplug safety
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Introduce CRASH_HOTPLUG_SAFETY_PADDING to allocate extra slots
for the crash memory ranges array, mitigating potential TOCTOU races
caused by concurrent memory hotplug events. When CONFIG_MEMORY_HOTPLUG
is disabled, the padding safely defaults to 0 as the memory
layout remains static.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 include/linux/crash_core.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index c1dee3f971a9..d4762e000098 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -14,6 +14,12 @@ struct crash_mem {
 	struct range ranges[] __counted_by(max_nr_ranges);
 };
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+#define CRASH_HOTPLUG_SAFETY_PADDING 128
+#else
+#define CRASH_HOTPLUG_SAFETY_PADDING 0
+#endif
+
 #ifdef CONFIG_CRASH_DUMP
 
 int crash_shrink_memory(unsigned long new_size);
-- 
2.34.1




* [PATCH v15 09/23] kexec: Fix UAF and Double Free in crash_load_dm_crypt_keys()
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

A static memory safety review by Sashiko AI identified a high-severity
Use-After-Free (UAF) and Double Free vulnerability in the dm-crypt keys
handling path during arm64 kexec image placement retry loops.

In crash_load_dm_crypt_keys(), when the segment allocation fails via
kexec_add_buffer(), the error path invokes `kvfree((void *)kbuf.buffer)`
to free the keys buffer. However, the global pointer `keys_header` is
left dangling, still holding the freed address.

When the top-level loader image_load() retries with the next available
placement hole, crash_load_dm_crypt_keys() is re-entered. `is_dm_key_reused`
is a global setting managed by user space through configfs, and the
kernel does not change it. If it remains true, the loader skips
build_keys_header() and reuses the stale `keys_header` pointer for
kbuf.buffer, triggering a Use-After-Free or a NULL pointer dereference
during kexec_add_buffer(). Alternatively, a new header build can trigger
a Double Free inside build_keys_header().

Fix this by setting the global `keys_header` to NULL immediately after
it is freed in the failure path. In addition, extend the header
regeneration check to a composite condition:
	`if (!is_dm_key_reused || !keys_header)`

This ensures that if a previous retry attempt freed the buffer, the kernel
regenerates the header on the next attempt without modifying the
user-configured `is_dm_key_reused` flag, so no retry path can use or free
the stale buffer again.
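
The effect of clearing the pointer and widening the rebuild condition
can be sketched in user space as follows. This is only an illustration
under stated assumptions: `is_key_reused`, `build_header()`,
`load_keys()` and `build_count` are hypothetical stand-ins, malloc() and
free() replace the kernel allocators, and the `add_fails` flag simulates
a failing kexec_add_buffer().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the globals described above. */
char *keys_header;
bool is_key_reused = true;
int build_count;

/* Rebuild the header, releasing any previous one first. */
int build_header(void)
{
	free(keys_header);	/* safe only because stale pointers are NULLed */
	keys_header = malloc(64);
	if (!keys_header)
		return -1;
	build_count++;
	return 0;
}

/* One load attempt; `add_fails` simulates a failing buffer placement. */
int load_keys(bool add_fails)
{
	/* Rebuild when reuse is off or when no valid header exists. */
	if (!is_key_reused || !keys_header) {
		int r = build_header();

		if (r)
			return r;
	}

	if (add_fails) {
		free(keys_header);
		keys_header = NULL;	/* no stale pointer for the next retry */
		return -1;
	}
	return 0;
}
```

Calling load_keys(true) and then load_keys(false) rebuilds the header on
the second attempt even though reuse stays enabled, and a third
successful call reuses the existing header without rebuilding it.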

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Dave Young <ruirui.yang@linux.dev>
Cc: stable@vger.kernel.org
Fixes: e3a84be1ec2f ("arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 kernel/crash_dump_dm_crypt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_dump_dm_crypt.c b/kernel/crash_dump_dm_crypt.c
index cb875ddb6ba6..2c5462876337 100644
--- a/kernel/crash_dump_dm_crypt.c
+++ b/kernel/crash_dump_dm_crypt.c
@@ -412,13 +412,12 @@ int crash_load_dm_crypt_keys(struct kimage *image)
 	};
 	int r;
 
-
 	if (key_count <= 0) {
 		kexec_dprintk("No dm-crypt keys\n");
 		return 0;
 	}
 
-	if (!is_dm_key_reused) {
+	if (!is_dm_key_reused || unlikely(!keys_header)) {
 		image->dm_crypt_keys_addr = 0;
 		r = build_keys_header();
 		if (r) {
@@ -437,6 +436,7 @@ int crash_load_dm_crypt_keys(struct kimage *image)
 	if (r) {
 		pr_err("Failed to call kexec_add_buffer, ret=%d\n", r);
 		kvfree((void *)kbuf.buffer);
+		keys_header = NULL;
 		return r;
 	}
 	image->dm_crypt_keys_addr = kbuf.mem;
-- 
2.34.1




* [PATCH v15 08/23] arm64: kexec_file: Fix image->elf_headers memory leak during retry loop
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Sashiko AI code review pointed out a potential memory leak of
image->elf_headers when load_other_segments() fails on error paths.

In the arm64 kexec_file load path, kexec_image.c runs a retry loop
calling kexec_add_buffer() to find a suitable location for the kernel
segment. On each iteration, load_other_segments() is invoked to allocate
and populate the other segments such as initrd, DTB, and ELF headers.

However, if a placement or allocation failure occurs later in
load_other_segments() (e.g., when adding initrd or dtb), the execution
jumps to the out_err label. While this path restores image->nr_segments
via orig_segments, it returns an error back to the caller without freeing
the previously allocated image->elf_headers vmalloc buffer.

As a result, the retry loop in image_load() unconditionally allocates
new ELF headers on the next iteration and overwrites image->elf_headers,
permanently leaking the memory blocks allocated in previous iterations.

To fix this, decouple the ELF header allocation from the target-seeking
retry loop. Since the contents and size of ELF headers only depend on
the host memory layout and do not change with the kernel's physical
placement, move prepare_elf_headers() completely outside and prior to
the while retry loop in image_load().

If kexec_add_buffer() for the ELF headers fails, there is no need to
vfree() the headers there, because the error path frees
`image->elf_headers` by calling arch_kimage_file_post_load_cleanup().

This also eliminates redundant memory allocation/deallocation during
kexec placement retries and removes the Use-After-Free and memory leak
risk.

Also remove the prepare_elf_headers() call from inside
load_other_segments() and have it directly reuse the single, pre-allocated
image->elf_headers.
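
The restructuring can be sketched in user space as follows: the headers
are allocated once before the retry loop, each attempt only reuses them,
and a single cleanup function frees them. This is a simplified
illustration, not the arm64 code: `struct image`, `prepare_headers()`,
`place_segments()`, `load_image()`, `cleanup_image()` and the
`succeed_at` parameter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct image {
	void *elf_headers;
	size_t elf_headers_sz;
};

int alloc_count;	/* counts header allocations for the demonstration */

int prepare_headers(void **addr, size_t *sz)
{
	*sz = 256;
	*addr = malloc(*sz);
	if (!*addr)
		return -1;
	alloc_count++;
	return 0;
}

/*
 * One placement attempt. It only reads the headers already stored in
 * the image and never allocates. Attempts before `succeed_at` fail.
 */
int place_segments(const struct image *img, int attempt, int succeed_at)
{
	if (!img->elf_headers)
		return -1;
	return attempt >= succeed_at ? 0 : -1;
}

int load_image(struct image *img, int max_attempts, int succeed_at)
{
	/* Allocate once, before the retry loop. */
	int ret = prepare_headers(&img->elf_headers, &img->elf_headers_sz);

	if (ret)
		return ret;

	ret = -1;	/* no attempt has succeeded yet */
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		ret = place_segments(img, attempt, succeed_at);
		if (!ret)
			return 0;
	}
	return ret;	/* the caller frees the headers via cleanup_image() */
}

/* Single owner of the free, used on both success and failure paths. */
void cleanup_image(struct image *img)
{
	free(img->elf_headers);
	img->elf_headers = NULL;
	img->elf_headers_sz = 0;
}
```

However many attempts fail, exactly one header buffer exists per load,
so there is nothing to leak between iterations.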

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Coiby Xu <coxu@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Benjamin Gwin <bgwin@google.com>
Cc: stable@vger.kernel.org
Fixes: 108aa503657e ("arm64: kexec_file: try more regions if loading segments fails")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
v15:
- Use image->elf_headers and image->elf_headers_sz instead of adding function
  parameters for load_other_segments() to simplify the fix.
---
 arch/arm64/include/asm/kexec.h         |  1 +
 arch/arm64/kernel/kexec_image.c        | 16 ++++++++++++++++
 arch/arm64/kernel/machine_kexec_file.c | 23 +++++------------------
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 892e5bebda95..7ffa2ff5fcfd 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -128,6 +128,7 @@ extern int load_other_segments(struct kimage *image,
 		unsigned long kernel_load_addr, unsigned long kernel_size,
 		char *initrd, unsigned long initrd_len,
 		char *cmdline);
+extern int prepare_elf_headers(void **addr, unsigned long *sz);
 #endif
 
 #endif /* __ASSEMBLER__ */
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index ffcb7f9075e6..424b9527db09 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -89,6 +89,22 @@ static void *image_load(struct kimage *image,
 
 	kernel_segment_number = image->nr_segments;
 
+#ifdef CONFIG_CRASH_DUMP
+	if (image->type == KEXEC_TYPE_CRASH) {
+		/* load elf core header */
+		unsigned long headers_sz;
+		void *headers;
+
+		ret = prepare_elf_headers(&headers, &headers_sz);
+		if (ret) {
+			pr_err("Preparing elf core header failed\n");
+			return ERR_PTR(ret);
+		}
+		image->elf_headers = headers;
+		image->elf_headers_sz = headers_sz;
+	}
+#endif
+
 	/*
 	 * The location of the kernel segment may make it impossible to satisfy
 	 * the other segment requirements, so we try repeatedly to find a
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 13c247c28866..4cbb71e1f8ed 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -40,7 +40,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
 }
 
 #ifdef CONFIG_CRASH_DUMP
-static int prepare_elf_headers(void **addr, unsigned long *sz)
+int prepare_elf_headers(void **addr, unsigned long *sz)
 {
 	struct crash_mem *cmem;
 	unsigned int nr_ranges;
@@ -105,32 +105,19 @@ int load_other_segments(struct kimage *image,
 	kbuf.buf_min = kernel_load_addr + kernel_size;
 
 #ifdef CONFIG_CRASH_DUMP
-	/* load elf core header */
-	void *headers;
-	unsigned long headers_sz;
 	if (image->type == KEXEC_TYPE_CRASH) {
-		ret = prepare_elf_headers(&headers, &headers_sz);
-		if (ret) {
-			pr_err("Preparing elf core header failed\n");
-			goto out_err;
-		}
-
-		kbuf.buffer = headers;
-		kbuf.bufsz = headers_sz;
+		kbuf.buffer = image->elf_headers;
+		kbuf.bufsz = image->elf_headers_sz;
 		kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
-		kbuf.memsz = headers_sz;
+		kbuf.memsz = image->elf_headers_sz;
 		kbuf.buf_align = SZ_64K; /* largest supported page size */
 		kbuf.buf_max = ULONG_MAX;
 		kbuf.top_down = true;
 
 		ret = kexec_add_buffer(&kbuf);
-		if (ret) {
-			vfree(headers);
+		if (ret)
 			goto out_err;
-		}
-		image->elf_headers = headers;
 		image->elf_load_addr = kbuf.mem;
-		image->elf_headers_sz = headers_sz;
 
 		kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
 			      image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
-- 
2.34.1




* [PATCH v15 07/23] arm64: kexec_file: Fix CMA page leaks during segment placement retry loops
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Sashiko AI code review pointed out that during arm64 kexec image placement
retry loops in image_load(), the loader repeatedly attempts to find
a suitable memory hole for the kernel and its associated segments
(initrd, dtb, etc.). When a placement attempt fails midway, the loader
rolls back `image->nr_segments` to its initial state to discard the
failed segments.

However, this truncation causes a memory leak. Any CMA pages successfully
allocated via kexec_add_buffer() during the failed attempt are recorded in
the `image->segment_cma` array. Since the subsequent global
kimage_free_cma() cleanup only iterates up to the truncated (smaller)
`nr_segments` boundary, CMA pages recorded beyond the new boundary are
orphaned and permanently leaked.

Fix this by leveraging the newly introduced generic kexec_free_segment_cma()
helper to free each segment's CMA pages before any truncation occurs:

1. In image_load(), explicitly invoke kexec_free_segment_cma() to release
   the CMA buffer allocated for the current failed kernel segment before
   decrementing `image->nr_segments`.

2. In the error path of load_other_segments(), iterate backward from the
   failed segment index down to `orig_segments`, sequentially freeing each
   orphan CMA segment allocation before restoring the initial segment
   count.

This guarantees that all temporary CMA pages allocated during placement
failures are returned to the contiguous memory allocator, eliminating
the leak across all retry paths.
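
The two steps above can be sketched in user space as follows. This is a
simplified model rather than the kexec code: `struct image`,
`add_segment()`, `free_segment()`, `rollback_segments()`,
`free_all_segments()` and `live_allocations` are hypothetical, and
malloc() stands in for a CMA allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SEGMENTS 16

struct image {
	unsigned long nr_segments;
	void *segment_cma[MAX_SEGMENTS];	/* NULL when the slot owns nothing */
};

int live_allocations;	/* outstanding buffers, for the demonstration */

int add_segment(struct image *img)
{
	void *p;

	if (img->nr_segments >= MAX_SEGMENTS)
		return -1;
	p = malloc(4096);
	if (!p)
		return -1;
	img->segment_cma[img->nr_segments++] = p;
	live_allocations++;
	return 0;
}

/* Per-slot release, safe to call on a slot that owns nothing. */
void free_segment(struct image *img, unsigned long idx)
{
	if (!img->segment_cma[idx])
		return;
	free(img->segment_cma[idx]);
	img->segment_cma[idx] = NULL;
	live_allocations--;
}

/* Release each slot before it drops out of the [0, nr_segments) window. */
void rollback_segments(struct image *img, unsigned long orig_segments)
{
	while (img->nr_segments > orig_segments) {
		free_segment(img, img->nr_segments - 1);
		img->nr_segments--;
	}
}

/* Global cleanup: only ever sees slots below nr_segments. */
void free_all_segments(struct image *img)
{
	for (unsigned long i = 0; i < img->nr_segments; i++)
		free_segment(img, i);
}
```

Had rollback_segments() simply assigned `orig_segments` to
`nr_segments`, the final free_all_segments() call would never visit the
slots above it and their buffers would stay allocated.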

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Kees Cook <kees@kernel.org>
Cc: "Rob Herring (Arm)" <robh@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Coiby Xu <coxu@redhat.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: stable@vger.kernel.org
Fixes: 07d24902977e4 ("kexec: enable CMA based contiguous allocation")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/kexec_image.c        | 1 +
 arch/arm64/kernel/machine_kexec_file.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index b70f4df15a1a..ffcb7f9075e6 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -107,6 +107,7 @@ static void *image_load(struct kimage *image,
 		 * We couldn't find space for the other segments; erase the
 		 * kernel segment and try the next available hole.
 		 */
+		kexec_free_segment_cma(image, kernel_segment_number);
 		image->nr_segments -= 1;
 		kbuf.buf_min = kernel_segment->mem + kernel_segment->memsz;
 		kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index e31fabed378a..13c247c28866 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -195,7 +195,10 @@ int load_other_segments(struct kimage *image,
 	return 0;
 
 out_err:
-	image->nr_segments = orig_segments;
+	while (image->nr_segments > orig_segments) {
+		kexec_free_segment_cma(image, image->nr_segments - 1);
+		image->nr_segments--;
+	}
 	kvfree(dtb);
 	return ret;
 }
-- 
2.34.1




* [PATCH v15 06/23] kexec: Extract kexec_free_segment_cma() from kimage_free_cma()
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

The generic kimage_free_cma() relies on `image->nr_segments` to iterate
and free allocated CMA pages. However, during architecture-specific
segment placement retry loops (e.g., arm64's image_load()), a mid-way
failure will truncate `image->nr_segments` back to its initial value.
This truncation permanently hides any CMA pages allocated outside the
new boundary from global cleanup, causing silent background memory leaks.

To allow architecture-specific loaders to free CMA pages segment by
segment before truncation occurs, extract the per-segment CMA release
logic into a dedicated, non-static helper:

	void kexec_free_segment_cma(struct kimage *image, unsigned long idx);

Refactor kimage_free_cma() to invoke this helper for each segment, which
keeps its existing behavior unchanged.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 include/linux/kexec.h |  2 ++
 kernel/kexec_core.c   | 25 ++++++++++++++-----------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 8a22bc9b8c6c..6f1eabda0300 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -532,6 +532,7 @@ extern bool kexec_file_dbg_print;
 
 extern void *kimage_map_segment(struct kimage *image, int idx);
 extern void kimage_unmap_segment(void *buffer);
+extern void kexec_free_segment_cma(struct kimage *image, unsigned long idx);
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
@@ -543,6 +544,7 @@ static inline int kexec_crash_loaded(void) { return 0; }
 static inline void *kimage_map_segment(struct kimage *image, int idx)
 { return NULL; }
 static inline void kimage_unmap_segment(void *buffer) { }
+static inline void kexec_free_segment_cma(struct kimage *image, unsigned long idx) { }
 #define kexec_in_progress false
 #endif /* CONFIG_KEXEC_CORE */
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a43d2da0fe3e..9195f81e53c4 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -554,22 +554,25 @@ static void kimage_free_entry(kimage_entry_t entry)
 	kimage_free_pages(page);
 }
 
-static void kimage_free_cma(struct kimage *image)
+void kexec_free_segment_cma(struct kimage *image, unsigned long idx)
 {
-	unsigned long i;
+	u32 nr_pages = image->segment[idx].memsz >> PAGE_SHIFT;
+	struct page *cma = image->segment_cma[idx];
 
-	for (i = 0; i < image->nr_segments; i++) {
-		struct page *cma = image->segment_cma[i];
-		u32 nr_pages = image->segment[i].memsz >> PAGE_SHIFT;
+	if (!cma)
+		return;
 
-		if (!cma)
-			continue;
+	arch_kexec_pre_free_pages(page_address(cma), nr_pages);
+	dma_release_from_contiguous(NULL, cma, nr_pages);
+	image->segment_cma[idx] = NULL;
+}
 
-		arch_kexec_pre_free_pages(page_address(cma), nr_pages);
-		dma_release_from_contiguous(NULL, cma, nr_pages);
-		image->segment_cma[i] = NULL;
-	}
+static void kimage_free_cma(struct kimage *image)
+{
+	unsigned long i;
 
+	for (i = 0; i < image->nr_segments; i++)
+		kexec_free_segment_cma(image, i);
 }
 
 void kimage_free(struct kimage *image)
-- 
2.34.1




* [PATCH v15 05/23] powerpc/crash: sort crash memory ranges before preparing elfcorehdr
From: Jinjie Ruan @ 2026-06-01  9:47 UTC
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

From: Sourabh Jain <sourabhjain@linux.ibm.com>

During a memory hot-remove event, the elfcorehdr is rebuilt to exclude
the removed memory. While updating the crash memory ranges for this
operation, the crash memory ranges array can become unsorted. This
happens because remove_mem_range() may split a memory range into two
parts and append the higher-address part as a separate range at the end
of the array.

So far, no issues have been observed due to the unsorted crash memory
ranges. However, this could lead to problems once crash memory range
removal is handled by generic code, as introduced in the upcoming
patches in this series.

Currently, powerpc uses a platform-specific function,
remove_mem_range(), to exclude hot-removed memory from the crash memory
ranges. This function performs the same task as the generic
crash_exclude_mem_range() in crash_core.c. The generic helper also
ensures that the crash memory ranges remain sorted. So remove the
redundant powerpc-specific implementation and instead call
crash_exclude_mem_range_guarded() (which internally calls
crash_exclude_mem_range()) to exclude the hot-removed memory ranges.
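
To illustrate why inserting the split-off tail in place keeps the array
sorted, here is a minimal userspace sketch. It is not the kernel's
crash_exclude_mem_range(); struct demo_range and demo_exclude() are
hypothetical names, and the sketch only handles a hole that lies entirely
inside one entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a crash memory range (inclusive bounds). */
struct demo_range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Remove [mstart, mend] from a sorted array of ranges.
 * Returns 0 on success, -1 if a split needs a slot that is not available.
 */
static int demo_exclude(struct demo_range *r, size_t *n, size_t cap,
			unsigned long long mstart, unsigned long long mend)
{
	size_t i;

	assert(mstart <= mend);

	for (i = 0; i < *n; i++) {
		if (mstart < r[i].start || mend > r[i].end)
			continue;	/* hole is not inside this entry */

		if (mstart == r[i].start && mend == r[i].end) {
			/* Whole entry removed: close the gap. */
			memmove(&r[i], &r[i + 1], (*n - i - 1) * sizeof(*r));
			(*n)--;
		} else if (mstart == r[i].start) {
			r[i].start = mend + 1;
		} else if (mend == r[i].end) {
			r[i].end = mstart - 1;
		} else {
			/* Split: the tail goes right after entry i, not at the end. */
			if (*n >= cap)
				return -1;
			memmove(&r[i + 2], &r[i + 1], (*n - i - 1) * sizeof(*r));
			r[i + 1].start = mend + 1;
			r[i + 1].end = r[i].end;
			r[i].end = mstart - 1;
			(*n)++;
		}
		return 0;
	}
	return 0;
}
```

Appending the tail at index *n instead of shifting the later entries would
leave the higher-address part of the first range after every other entry,
which is the unsorted state described above.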

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shivang Upadhyay <shivangu@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/powerpc/include/asm/kexec_ranges.h |  4 +-
 arch/powerpc/kexec/crash.c              |  5 +-
 arch/powerpc/kexec/ranges.c             | 87 +------------------------
 3 files changed, 7 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index 14055896cbcb..ad95e3792d10 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,7 +7,9 @@
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
-int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges,
+				    unsigned long long mstart,
+				    unsigned long long mend);
 int get_exclude_memory_ranges(struct crash_mem **mem_ranges);
 int get_reserved_memory_ranges(struct crash_mem **mem_ranges);
 int get_crash_memory_ranges(struct crash_mem **mem_ranges);
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index a520f851c3a6..d634db67becc 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -493,7 +493,7 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *
 	struct crash_mem *cmem = NULL;
 	struct kexec_segment *ksegment;
 	void *ptr, *mem, *elfbuf = NULL;
-	unsigned long elfsz, memsz, base_addr, size;
+	unsigned long elfsz, memsz, base_addr, size, end;
 
 	ksegment = &image->segment[image->elfcorehdr_index];
 	mem = (void *) ksegment->mem;
@@ -512,7 +512,8 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *
 	if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
 		base_addr = PFN_PHYS(mn->start_pfn);
 		size = mn->nr_pages * PAGE_SIZE;
-		ret = remove_mem_range(&cmem, base_addr, size);
+		end = base_addr + size - 1;
+		ret = crash_exclude_mem_range_guarded(&cmem, base_addr, end);
 		if (ret) {
 			pr_err("Failed to remove hot-unplugged memory from crash memory ranges\n");
 			goto out;
diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
index eb45e89502ca..b2fb78562cdc 100644
--- a/arch/powerpc/kexec/ranges.c
+++ b/arch/powerpc/kexec/ranges.c
@@ -551,7 +551,7 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)
 #endif /* CONFIG_KEXEC_FILE */
 
 #ifdef CONFIG_CRASH_DUMP
-static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges,
+int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges,
 					   unsigned long long mstart,
 					   unsigned long long mend)
 {
@@ -639,89 +639,4 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges)
 		pr_err("Failed to setup crash memory ranges\n");
 	return ret;
 }
-
-/**
- * remove_mem_range - Removes the given memory range from the range list.
- * @mem_ranges:    Range list to remove the memory range to.
- * @base:          Base address of the range to remove.
- * @size:          Size of the memory range to remove.
- *
- * (Re)allocates memory, if needed.
- *
- * Returns 0 on success, negative errno on error.
- */
-int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
-{
-	u64 end;
-	int ret = 0;
-	unsigned int i;
-	u64 mstart, mend;
-	struct crash_mem *mem_rngs = *mem_ranges;
-
-	if (!size)
-		return 0;
-
-	/*
-	 * Memory range are stored as start and end address, use
-	 * the same format to do remove operation.
-	 */
-	end = base + size - 1;
-
-	for (i = 0; i < mem_rngs->nr_ranges; i++) {
-		mstart = mem_rngs->ranges[i].start;
-		mend = mem_rngs->ranges[i].end;
-
-		/*
-		 * Memory range to remove is not part of this range entry
-		 * in the memory range list
-		 */
-		if (!(base >= mstart && end <= mend))
-			continue;
-
-		/*
-		 * Memory range to remove is equivalent to this entry in the
-		 * memory range list. Remove the range entry from the list.
-		 */
-		if (base == mstart && end == mend) {
-			for (; i < mem_rngs->nr_ranges - 1; i++) {
-				mem_rngs->ranges[i].start = mem_rngs->ranges[i+1].start;
-				mem_rngs->ranges[i].end = mem_rngs->ranges[i+1].end;
-			}
-			mem_rngs->nr_ranges--;
-			goto out;
-		}
-		/*
-		 * Start address of the memory range to remove and the
-		 * current memory range entry in the list is same. Just
-		 * move the start address of the current memory range
-		 * entry in the list to end + 1.
-		 */
-		else if (base == mstart) {
-			mem_rngs->ranges[i].start = end + 1;
-			goto out;
-		}
-		/*
-		 * End address of the memory range to remove and the
-		 * current memory range entry in the list is same.
-		 * Just move the end address of the current memory
-		 * range entry in the list to base - 1.
-		 */
-		else if (end == mend)  {
-			mem_rngs->ranges[i].end = base - 1;
-			goto out;
-		}
-		/*
-		 * Memory range to remove is not at the edge of current
-		 * memory range entry. Split the current memory entry into
-		 * two half.
-		 */
-		else {
-			size = mem_rngs->ranges[i].end - end + 1;
-			mem_rngs->ranges[i].end = base - 1;
-			ret = add_mem_range(mem_ranges, end + 1, size);
-		}
-	}
-out:
-	return ret;
-}
 #endif /* CONFIG_CRASH_DUMP */
-- 
2.34.1




* [PATCH v15 04/23] powerpc/kexec_file: Fix memory range truncation in __merge_memory_ranges()
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Sashiko AI review pointed out the following issue.

The __merge_memory_ranges() function incorrectly handles overlapping
memory ranges when merging them. Although sort_memory_ranges() sorts all
ranges by their start address in ascending order beforehand, the merge
logic remains defective in two ways:

1. It compares the current range's start against the previous element (i-1)
   instead of the running target index (idx).

2. It unconditionally overwrites 'ranges[idx].end' with 'ranges[i].end'.

This logic flaw leads to critical memory truncation when a larger memory
range completely subsumes subsequent smaller ranges.

For example, consider a sorted input array with three ranges:
  Range A (idx=0): [0x1000 - 0x9000]
  Range B (i=1):   [0x2000 - 0x5000] (completely inside Range A)
  Range C (i=2):   [0x6000 - 0x8000] (completely inside Range A)

1. When i=1 (Range B):
   ranges[1].start (0x2000) <= ranges[0].end + 1 (0x9001) is TRUE.
   The code executes: ranges[0].end = ranges[1].end, which erroneously
   shrinks Range A's end from 0x9000 down to 0x5000.

2. When i=2 (Range C):
   ranges[2].start (0x6000) <= ranges[1].end + 1 (0x5001) is FALSE.
   The code falls into the else block, creating a broken new range.

As a result, valid memory fragments [0x5001 - 0x5fff] and [0x8001 - 0x9000]
are completely lost from the kexec exclude lists, potentially allowing
the crash kernel to overwrite active memory, causing data corruption
or crashes.

Fix this by ensuring the start of the current range is compared against the
end of the active merged range (idx), and use max() to safely prevent the
outer boundary from being truncated.
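
The corrected merge rule can be tried outside the kernel with the example
ranges above. This is a minimal userspace sketch, not the patched function
itself; struct demo_range and demo_merge_sorted() are hypothetical names,
and the input is assumed to be sorted by start address with no end value
at the top of the address space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory range (inclusive bounds). */
struct demo_range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Merge overlapping or adjacent ranges in place and return the new count.
 * The input must already be sorted by start address.
 */
static size_t demo_merge_sorted(struct demo_range *r, size_t n)
{
	size_t i, idx = 0;

	if (n <= 1)
		return n;

	for (i = 1; i < n; i++) {
		/* Precondition: sorted by start address. */
		assert(r[i].start >= r[i - 1].start);

		if (r[i].start <= r[idx].end + 1) {
			/* Compare against the merged range and never shrink it. */
			if (r[i].end > r[idx].end)
				r[idx].end = r[i].end;
		} else {
			r[++idx] = r[i];
		}
	}
	return idx + 1;
}
```

With Ranges A, B and C from the example, the sketch leaves a single range
[0x1000 - 0x9000], so neither [0x5001 - 0x5fff] nor [0x8001 - 0x9000] is
dropped.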

Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Fixes: 180adfc532a8 ("powerpc/kexec_file: Add helper functions for getting memory ranges")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/powerpc/kexec/ranges.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
index 867135560e5c..eb45e89502ca 100644
--- a/arch/powerpc/kexec/ranges.c
+++ b/arch/powerpc/kexec/ranges.c
@@ -21,6 +21,7 @@
 #include <linux/of.h>
 #include <linux/slab.h>
 #include <linux/memblock.h>
+#include <linux/minmax.h>
 #include <linux/crash_core.h>
 #include <asm/sections.h>
 #include <asm/kexec_ranges.h>
@@ -105,19 +106,16 @@ static void __merge_memory_ranges(struct crash_mem *mem_rngs)
 	struct range *ranges;
 	int i, idx;
 
-	if (!mem_rngs)
+	if (!mem_rngs || mem_rngs->nr_ranges <= 1)
 		return;
 
 	idx = 0;
-	ranges = &(mem_rngs->ranges[0]);
+	ranges = mem_rngs->ranges;
 	for (i = 1; i < mem_rngs->nr_ranges; i++) {
-		if (ranges[i].start <= (ranges[i-1].end + 1))
-			ranges[idx].end = ranges[i].end;
+		if (ranges[i].start <= (ranges[idx].end + 1))
+			ranges[idx].end = max(ranges[idx].end, ranges[i].end);
 		else {
 			idx++;
-			if (i == idx)
-				continue;
-
 			ranges[idx] = ranges[i];
 		}
 	}
-- 
2.34.1




* [PATCH v15 03/23] powerpc/kexec_file: Fix NULL pointer dereference in kexec_extra_fdt_size_ppc64()
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

A static Sashiko AI review identified a potential NULL pointer dereference
in kexec_extra_fdt_size_ppc64().

When get_reserved_memory_ranges() returns 0 on a platform without any
reserved memory regions, nothing is allocated and the 'rmem' pointer
remains NULL. Passing this NULL pointer directly to
kexec_extra_fdt_size_ppc64() leads to a kernel panic when evaluating
'rmem->nr_ranges'.

Fix this by adding a defensive NULL pointer check at the beginning of
kexec_extra_fdt_size_ppc64(), returning 0 extra space immediately if
no reserved memory structure exists.

Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Fixes: 0d3ff067331e ("powerpc/kexec_file: fix extra size calculation for kexec FDT")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/powerpc/kexec/file_load_64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 8c72e12ea44e..fdeedf102c38 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -649,6 +649,9 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image, struct crash_mem *
 	struct device_node *dn;
 	unsigned int cpu_nodes = 0, extra_size = 0;
 
+	if (!rmem)
+		return 0;
+
 	// Budget some space for the password blob. There's already extra space
 	// for the key name
 	if (plpks_is_available())
-- 
2.34.1




* [PATCH v15 02/23] powerpc/crash: Fix possible memory leak in update_crash_elfcorehdr()
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

In get_crash_memory_ranges(), if crash_exclude_mem_range() fails after
realloc_mem_ranges() has successfully allocated cmem, the function
returns an error but leaves cmem pointing to the allocated memory. The
caller, update_crash_elfcorehdr(), does not free it either, which causes
a memory leak. Fix this by jumping to the out label, which frees cmem.

Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 849599b702ef ("powerpc/crash: add crash memory hotplug support")
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/powerpc/kexec/crash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index e6539f213b3d..a520f851c3a6 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -502,7 +502,7 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *
 	ret = get_crash_memory_ranges(&cmem);
 	if (ret) {
 		pr_err("Failed to get crash mem range\n");
-		return;
+		goto out;
 	}
 
 	/*
-- 
2.34.1




* [PATCH v15 01/23] riscv: kexec_file: Fix crashk_low_res not exclude bug
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

As done in commit 944a45abfabc ("arm64: kdump: Reimplement crashkernel=X")
and commit 4831be702b95 ("arm64/kexec: Fix missing extra range for
crashkres_low.") for arm64, while implementing crashkernel=X,[high,low],
riscv should have excluded the "crashk_low_res" reserved ranges from
the crash kernel memory to prevent them from being exported through
/proc/vmcore, and the exclusion would need an extra crash_mem range.
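
The need for an extra range follows from simple counting: excluding a
region that lies strictly inside one RAM range splits that range in two,
so each exclusion can add at most one entry. A minimal sketch of that
arithmetic, with demo_crash_slots() as a hypothetical helper that is not
part of the kernel:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Worst-case number of crash_mem entries: one per System RAM range plus
 * one spare entry per excluded region (each exclusion may split a range).
 */
static size_t demo_crash_slots(size_t nr_ram_ranges, size_t nr_exclusions)
{
	/* Guard against wrap-around in the addition below. */
	assert(nr_ram_ranges <= (size_t)-1 - nr_exclusions);
	return nr_ram_ranges + nr_exclusions;
}
```

Excluding both crashk_res and crashk_low_res gives two exclusions, which
corresponds to starting the count at 2 instead of 1 before walking the
System RAM ranges.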

Tested on QEMU with crashkernel=4G, using the kexec-tools tree in [1] that
is mentioned in [2]. The second kernel boots normally.

	# dmesg | grep crash
	[    0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB)
	[    0.000000] crashkernel reserved: 0x000000017fe00000 - 0x000000027fe00000 (4096 MB)

Cc: Guo Ren <guoren@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
[1]: https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2
[2]: https://lore.kernel.org/all/20230726175000.2536220-1-chenjiahao16@huawei.com/
Fixes: 5882e5acf18d ("riscv: kdump: Implement crashkernel=X,[high,low]")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/riscv/kernel/machine_kexec_file.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/machine_kexec_file.c b/arch/riscv/kernel/machine_kexec_file.c
index 54e2d9552e93..3f7766057cac 100644
--- a/arch/riscv/kernel/machine_kexec_file.c
+++ b/arch/riscv/kernel/machine_kexec_file.c
@@ -61,7 +61,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 	unsigned int nr_ranges;
 	int ret;
 
-	nr_ranges = 1; /* For exclusion of crashkernel region */
+	nr_ranges = 2; /* For exclusion of crashkernel region */
 	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
 
 	cmem = kmalloc_flex(*cmem, ranges, nr_ranges);
@@ -76,8 +76,16 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
-	if (!ret)
-		ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+	if (ret)
+		goto out;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			goto out;
+	}
+
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
 
 out:
 	kfree(cmem);
-- 
2.34.1




* [PATCH v15 00/23] arm64/riscv: Add support for crashkernel CMA reservation
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie

The crash memory allocation and the exclusion of crashk_res, crashk_low_res
and crashk_cma memory are almost identical across architectures. This patch
set handles them in the crash core in a generic way, which eliminates a lot
of duplicated code.

It also adds support for crashkernel CMA reservation on arm64 and riscv,
and for crash hotplug on arm64.

This patch set is rebased on v7.1-rc1.

Basic second-kernel boot tests were performed on QEMU platforms for the x86,
ARM64 and RISC-V architectures with the following parameters:

        "cma=256M crashkernel=4G crashkernel=64M,cma"

The first kernel logs the following:

        # dmesg | grep crash
        [    0.000000] crashkernel low memory reserved: 0xe8000000 - 0xf0000000 (128 MB)
        [    0.000000] crashkernel reserved: 0x000000023e600000 - 0x000000033e600000 (4096 MB)
        [    0.000000] crashkernel CMA reserved: 64 MB in 1 ranges

        # dmesg | grep cma
        [    0.000000] cma: Reserved 256 MiB at 0x00000000f0000000
        [    0.000000] cma: Reserved 64 MiB at 0x0000000100000000

The second kernel logs the following:

        [    0.000000] OF: fdt: Looking for usable-memory-range property...
        [    0.000000] OF: fdt: cap_mem_regions[0]: base=0x000000023e600000, size=0x0000000100000000
        [    0.000000] OF: fdt: cap_mem_regions[1]: base=0x00000000e8000000, size=0x0000000008000000
        [    0.000000] OF: fdt: cap_mem_regions[2]: base=0x0000000100000000, size=0x0000000004000000

Changes in v15:
- Unify the subject prefix formats as Huacai suggested.
- Fix powerpc pre-existing NULL pointer dereference [Sashiko [1]]
- Fix powerpc pre-existing __merge_memory_ranges() memory range
  truncation [Sashiko [1]].
- Fix pre-existing arm64 CMA page leaks [Sashiko[2]].
- Fix pre-existing crash_load_dm_crypt_keys() use-after-free and
  double-free issues [Sashiko[3]].
- Fix vfree(headers) and uninitialized variable issues
  and simplify the fix [Sashiko[2]].
- As walk_system_ram_res() and for_each_mem_range() use different
  locks, unify and simplify the fix for the TOCTOU buffer overflow via
  memory region padding [Sashiko[4]].
- Fix the arm64 crash dump issues in Sashiko[5].
- Link to v14: https://lore.kernel.org/all/20260525084932.934910-1-ruanjinjie@huawei.com/

[1]: https://lore.kernel.org/all/20260525092207.96B9D1F000E9@smtp.kernel.org/
[2]: https://lore.kernel.org/all/20260525091149.1A1E01F00A3D@smtp.kernel.org/
[3]: https://lore.kernel.org/all/20260525105227.3C2421F000E9@smtp.kernel.org/
[4]: https://lore.kernel.org/all/20260525095447.944E11F000E9@smtp.kernel.org/
[5]: https://lore.kernel.org/all/20260525101746.9959D1F000E9@smtp.kernel.org/

Changes in v14:
- Fix image->elf_headers memory leak during retry loop for arm64 as Sashiko
  AI code review pointed out.
- Solve the hotplug notifier arch_crash_handle_hotplug_event() AA
  self-deadlock problem as Sashiko AI code review pointed out.
- Fix the TOCTOU issue in prepare_elf_headers() by get_online_mems().
- -ENOMEM -> -EAGAIN as Breno suggested.
- Add support for arm64 crash hotplug.
- Link to v13: https://lore.kernel.org/all/20260511030454.1730881-1-ruanjinjie@huawei.com/

Changes in v13:
- Rebased on v7.1-rc1.
- Update the commit message.
- Add Reviewed-by.
- Link to v12: https://lore.kernel.org/all/20260402072701.628293-1-ruanjinjie@huawei.com/

Changes in v12:
- Remove the unused "nr_mem_ranges" for x86.
- Add "Fix crashk_low_res not exclude bug" test log.
- Provide a separate patch for each architecture for using
  crash_prepare_headers(), which will make the review more convenient.
- Add Reviewed-by and Tested-by.
- Link to v11: https://lore.kernel.org/all/20260328074013.3589544-1-ruanjinjie@huawei.com/

Changes in v11:
- Avoid silently dropping crash memory if the crash kernel is built without
  CONFIG_CMA.
- Remove unnecessary "cmem->nr_ranges = 0" for arch_crash_populate_cmem()
  as we use kvzalloc().
- Provide a separate patch for each architecture to fix the existing
  buffer overflow issue.
- Add Acked-bys for arm64.

Changes in v10:
- Fix crashk_low_res not excluded bug in the existing
  RISC-V code.
- Fix an existing memory leak issue in the existing PowerPC code.
- Fix the ordering issue of adding CMA ranges to
  "linux,usable-memory-range".
- Fix an existing concurrency issue. A concurrent memory hotplug event may occur
  between reading memblock and attempting to fill cmem during kexec_load()
  for almost all existing architectures.
- Link to v9: https://lore.kernel.org/all/20260323072745.2481719-1-ruanjinjie@huawei.com/

Changes in v9:
- Collect Reviewed-by and Acked-by, and prepare for Sashiko AI review.
- Link to v8: https://lore.kernel.org/all/20260302035315.3892241-1-ruanjinjie@huawei.com/

Changes in v8:
- Fix the build issues reported by kernel test robot and Sourabh.
- Link to v7: https://lore.kernel.org/all/20260226130437.1867658-1-ruanjinjie@huawei.com/

Changes in v7:
- Correct the inclusion of CMA-reserved ranges for kdump kernel in of/kexec
  for arm64 and riscv.
- Add Acked-by.
- Link to v6: https://lore.kernel.org/all/20260224085342.387996-1-ruanjinjie@huawei.com/

Changes in v6:
- Update the crash core exclude code as Mike suggested.
- Rebased on v7.0-rc1.
- Add acked-by.
- Link to v5: https://lore.kernel.org/all/20260212101001.343158-1-ruanjinjie@huawei.com/

Jinjie Ruan (22):
  riscv: kexec_file: Fix crashk_low_res not exclude bug
  powerpc/crash: Fix possible memory leak in update_crash_elfcorehdr()
  powerpc/kexec_file: Fix NULL pointer dereference in
    kexec_extra_fdt_size_ppc64()
  powerpc/kexec_file: Fix memory range truncation in
    __merge_memory_ranges()
  kexec: Extract kexec_free_segment_cma() from kimage_free_cma()
  arm64: kexec_file: Fix CMA page leaks during segment placement retry
    loops
  arm64: kexec_file: Fix image->elf_headers memory leak during retry
    loop
  kexec: Fix UAF and Double Free in crash_load_dm_crypt_keys()
  crash_core: Introduce CRASH_HOTPLUG_SAFETY_PADDING for memory hotplug
    safety
  x86: kexec_file: Fix TOCTOU buffer overflow via memory region padding
  arm64: kexec_file: Fix TOCTOU buffer overflow via memory region
    padding
  riscv: kexec_file: Fix TOCTOU buffer overflow via memory region
    padding
  LoongArch: kexec_file: Fix TOCTOU buffer overflow via memory region
    padding
  crash: Add crash_prepare_headers() to exclude crash kernel memory
  arm64: kexec_file: Use crash_prepare_headers() helper to simplify code
  x86: kexec_file: Use crash_prepare_headers() helper to simplify code
  riscv: kexec_file: Use crash_prepare_headers() helper to simplify code
  LoongArch: kexec_file: Use crash_prepare_headers() helper to simplify
    code
  powerpc/kexec_file: Use crash_exclude_core_ranges() helper
  arm64: kexec_file: Add support for crashkernel CMA reservation
  riscv: kexec_file: Add support for crashkernel CMA reservation
  arm64: crash: Add crash hotplug support

Sourabh Jain (1):
  powerpc/crash: sort crash memory ranges before preparing elfcorehdr

 .../admin-guide/kernel-parameters.txt         |  16 +-
 arch/arm64/Kconfig                            |   3 +
 arch/arm64/include/asm/kexec.h                |  13 ++
 arch/arm64/kernel/Makefile                    |   2 +-
 arch/arm64/kernel/crash.c                     | 152 ++++++++++++++++++
 arch/arm64/kernel/kexec_image.c               |  34 ++++
 arch/arm64/kernel/machine_kexec_file.c        |  78 ++-------
 arch/arm64/mm/init.c                          |   5 +-
 arch/loongarch/kernel/machine_kexec_file.c    |  44 ++---
 arch/powerpc/include/asm/kexec_ranges.h       |   1 -
 arch/powerpc/kexec/crash.c                    |   7 +-
 arch/powerpc/kexec/file_load_64.c             |   3 +
 arch/powerpc/kexec/ranges.c                   | 113 ++-----------
 arch/riscv/kernel/machine_kexec_file.c        |  43 ++---
 arch/riscv/mm/init.c                          |   5 +-
 arch/x86/kernel/crash.c                       |  92 ++---------
 drivers/of/fdt.c                              |   9 +-
 drivers/of/kexec.c                            |   9 ++
 include/linux/crash_core.h                    |  15 ++
 include/linux/crash_reserve.h                 |   4 +-
 include/linux/kexec.h                         |   2 +
 kernel/crash_core.c                           |  89 +++++++++-
 kernel/crash_dump_dm_crypt.c                  |   4 +-
 kernel/kexec_core.c                           |  25 +--
 24 files changed, 430 insertions(+), 338 deletions(-)
 create mode 100644 arch/arm64/kernel/crash.c

-- 
2.34.1




* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
From: Shrikanth Hegde @ 2026-06-01  9:16 UTC (permalink / raw)
  To: Venkat Rao Bagalkote, Madhavan Srinivasan,
	Mukesh Kumar Chaurasiya, Ritesh Harjani
  Cc: linuxppc-dev, LKML, Srikar Dronamraju, Peter Zijlstra
In-Reply-To: <7904105b-9dfa-4efd-a5ef-bc0276ed255d@linux.ibm.com>

Hi Venkat. Thanks for the report.

+ mukesh, ritesh

On 6/1/26 12:11 PM, Venkat Rao Bagalkote wrote:
> Greetings!!!
> 
> 
> I hit a kernel BUG on a linux-next kernel running on ppc64le (Power11 
> LPAR). The issue was observed once in CI (Avocado tests) and I haven’t 
> been able to reproduce it reliably yet.
> 

Can you run with lockdep and see if you can hit it?

> Architecture: ppc64le (Power11, pSeries)
> Kernel: 7.1.0-rc5-next-20260529
> Config: PREEMPT(lazy)
> CPUs: large system (NR_CPUS=8192)
> 

This is with GENERIC_ENTRY.

> 
> So far, I have not reproduced the crash, but I am trying to stress 
> similar conditions using:
> 
> parallel read workloads (fio / dd)
> memory pressure
> 
> 
> Traces:
> 
>   (5/8) /home/upstreamci/avocado-fvt-wrapper/tests/avocado-misc-tests/cpu/ppc64_cpu_test.py:PPC64Test.test_smt_loop;run-run_type-upstream-9cfe: STARTED
> [ 1885.176400] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 1885.296164] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 1885.386120] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 1885.556134] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 1886.576119] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 1886.806060] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 1887.026051] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 1887.456075] ------------[ cut here ]------------
> [ 1887.456101] kernel BUG at kernel/sched/core.c:7512!
> [ 1887.456107] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 1887.456111] LE PAGE_SIZE=4K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
> [ 1887.456116] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls ip_set rfkill nf_tables fsdev_dax kmem device_dax pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 sd_mod nd_pmem papr_scm sg libnvdimm ibmvscsi ibmveth scsi_transport_srp pseries_wdt
> [ 1887.456173] CPU: 28 UID: 0 PID: 85305 Comm: kexec Not tainted 7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
> [ 1887.456180] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
> [ 1887.456185] NIP:  c0000000013a8e8c LR: c0000000003483bc CTR: 0000000000000000
> [ 1887.456190] REGS: c000000069f03070 TRAP: 0700   Not tainted (7.1.0-rc5-next-20260529)
> [ 1887.456195] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24428222  XER: 0000005a
> [ 1887.456208] CFAR: c0000000003483b8 IRQMASK: 0
> [ 1887.456208] GPR00: c0000000003483bc c000000069f03330 c000000001a82100 c000000069f033e0
> [ 1887.456208] GPR04: 0000000000000000 0000000000000001 0000000000000001 c000000006dd3b00
> [ 1887.456208] GPR08: ffffffffffffff00 0000000000000001 0000000000000000 0000000024428220
> [ 1887.456208] GPR12: 0000000000000300 c000000effdbef00 0000000000000000 0000000000000000
> [ 1887.456208] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 1887.456208] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 1887.456208] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 1887.456208] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000069f033e0
> [ 1887.456265] NIP [c0000000013a8e8c] preempt_schedule_irq+0x44/0x118
> [ 1887.456274] LR [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
> [ 1887.456282] Call Trace:
> [ 1887.456284] [c000000069f03360] [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
> [ 1887.456291] [c000000069f03380] [c00000000014f3bc] 
> do_page_fault+0xc0/0x104
> [ 1887.456298] [c000000069f033b0] [c000000000008be0] 
> data_access_common_virt+0x210/0x220
> [ 1887.456306] ---- interrupt: 300 at __copy_tofrom_user_base+0xac/0x5a4
> [ 1887.456313] NIP:  c00000000017fc38 LR: c000000000aaa684 CTR: 
> 0000000000000000
> [ 1887.456317] REGS: c000000069f033e0 TRAP: 0300   Not tainted (7.1.0- 
> rc5-next-20260529)
> [ 1887.456322] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 
> 24428220  XER: 2004005a
> [ 1887.456334] CFAR: c00000000017fc34 DAR: 00003fff879a8000 DSISR: 
> 42000000 IRQMASK: 0
> [ 1887.456334] GPR00: 0000000000000000 c000000069f036a0 c000000001a82100 
> 00003fff879a8000
> [ 1887.456334] GPR04: c0000000bb314ff0 0000000000001000 69f0000606480600 
> 0200c4080368f028
> [ 1887.456334] GPR08: 09036af00005d9c4 0600000200e80803 0000000000000000 
> 0000000000000030
> [ 1887.456334] GPR12: 0000000000000040 c000000effdbef00 0000000000000000 
> 000000000000000e
> [ 1887.456334] GPR16: 0000000004a00000 000000000000001f c000000069f038a0 
> c00000006e73e500
> [ 1887.456334] GPR20: c00000006f0ff6a8 0000000000000000 c00000006f0ff540 
> 0000000000000001
> [ 1887.456334] GPR24: 000000001816ce60 c0000000bb314000 c000000002e48730 
> c000000069f03a30
> [ 1887.456334] GPR28: c0000000bb314000 00003fff879a7010 0000000000000010 
> 0000000000001000
> [ 1887.456393] NIP [c00000000017fc38] __copy_tofrom_user_base+0xac/0x5a4
> [ 1887.456399] LR [c000000000aaa684] raw_copy_to_user+0x12c/0x314
> [ 1887.456405] ---- interrupt: 300
> [ 1887.456408] [c000000069f036a0] [c000000000aaa5f4] 
> raw_copy_to_user+0x9c/0x314 (unreliable)
> [ 1887.456416] [c000000069f036e0] [c000000000aacd08] 
> _copy_to_iter+0xe4/0x79c
> [ 1887.456423] [c000000069f037a0] [c000000000ab01ec] 
> copy_page_to_iter+0xd4/0x1a4
> [ 1887.456429] [c000000069f037f0] [c0000000005ddc34] 
> filemap_read+0x420/0x4f0
> [ 1887.456436] [c000000069f039c0] [c0080000043443e0] 
> ext4_file_read_iter+0x78/0x31c [ext4]
> [ 1887.456517] [c000000069f03a10] [c000000000796498] vfs_read+0x2a8/0x3c8
> [ 1887.456524] [c000000069f03ac0] [c00000000079726c] ksys_read+0x88/0x140
> [ 1887.456530] [c000000069f03b10] [c000000000032f98] 
> system_call_exception+0x198/0x4e0
> [ 1887.456537] [c000000069f03e30] [c00000000000d05c] 
> system_call_vectored_common+0x15c/0x2ec
> [ 1887.456544] ---- interrupt: 3000 at 0x3fff9b133cf4
> [ 1887.456549] NIP:  00003fff9b133cf4 LR: 00003fff9b133cf4 CTR: 
> 0000000000000000
> [ 1887.456554] REGS: c000000069f03e60 TRAP: 3000   Not tainted (7.1.0- 
> rc5-next-20260529)
> [ 1887.456558] MSR:  800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 
> 44424402  XER: 00000000
> [ 1887.456572] IRQMASK: 0
> [ 1887.456572] GPR00: 0000000000000003 00003fffe5fb4190 0000000105087f00 
> 0000000000000003
> [ 1887.456572] GPR04: 00003fff82e93010 000000001816ce60 0000000000000022 
> 0000000000000000
> [ 1887.456572] GPR08: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> [ 1887.456572] GPR12: 0000000000000000 00003fff9b4cd860 000000010507f588 
> 0000000000000000
> [ 1887.456572] GPR16: ffffffffffffffff 0000000000000000 0000000000000006 
> 0000000000000000
> [ 1887.456572] GPR20: 0000000000000001 00003fff9b23039c 00003fff9b2303a0 
> 00003fffe5fb5ee7
> [ 1887.456572] GPR24: 0000000000000000 0000000000000000 00003fffe5fb5ee7 
> 00003fffe5fb42d0
> [ 1887.456572] GPR28: 0000000000000003 00003fff82e93010 000000001816ce60 
> 0000000000000000
> [ 1887.456626] NIP [00003fff9b133cf4] 0x3fff9b133cf4
> [ 1887.456630] LR [00003fff9b133cf4] 0x3fff9b133cf4
> [ 1887.456634] ---- interrupt: 3000
> [ 1887.456637] Code: fbe1fff8 e92d0128 f8010010 f821ffd1 81490000 
> 39200001 2c0a0000 40820014 892d0152 552907fe 7d290034 5529d97e 
> <0b090000> 60000000 3bc00000 ebed0128
> [ 1887.456657] ---[ end trace 0000000000000000 ]---
> 
> 
> If you happen to fix this, please add below tag.
> 
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> 

Ritesh, Mukesh, is the scenario below possible?

do_page_fault seems to enable IRQs inside the interrupt handler.
Is that expected? If so, one might see:

-- do_page_fault (enter kernel mode)
    -- enables interrupts
    -- gets an interrupt, which sets need_resched
       -- irqentry_exit sees it is kernel mode, checks only the preempt
          count, and calls preempt_schedule_irq, which BUGs on either a
          non-zero preempt_count or !irqs_disabled. Hence the BUG?

Should do_page_fault do preempt_disable when it enables the interrupts?
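
For reference, the check being discussed can be modelled in userspace
roughly as below. This is only a toy sketch, not kernel code: it assumes
that the BUG at kernel/sched/core.c:7512 is the sanity check at the top of
preempt_schedule_irq(), and the function name and parameters are made up
for illustration.

```c
#include <stdbool.h>

/*
 * Toy userspace model (hypothetical helper, not a kernel API).
 * Assumption: preempt_schedule_irq() BUGs when it is entered with a
 * non-zero preempt count or with interrupts not disabled.
 */
bool preempt_schedule_irq_would_bug(int preempt_count, bool irqs_disabled)
{
	return preempt_count != 0 || !irqs_disabled;
}
```

In this model the suspected sequence corresponds to a call with
preempt_count == 0 and irqs_disabled == false, which trips the check.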


^ permalink raw reply

* Re: [PATCH v6 09/15] arm64: Move fixmap and kasan page tables to end of kernel image
From: Ard Biesheuvel @ 2026-06-01  8:39 UTC (permalink / raw)
  To: Kevin Brodsky, Ard Biesheuvel, linux-arm-kernel
  Cc: linux-kernel, Will Deacon, Catalin Marinas, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
	Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
	Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
In-Reply-To: <69488547-cf2a-4aa0-bca7-0cb65aa01914@arm.com>


On Mon, 1 Jun 2026, at 10:37, Kevin Brodsky wrote:
> On 29/05/2026 16:47, Ard Biesheuvel wrote:
>>>>>> +	/* fixmap BSS starts here - preceding data/BSS is omitted from the linear map */
>>>>>> +	.pgdir.bss (NOLOAD) : ALIGN(PAGE_SIZE) {
>>>>> Do we actually need the NOLOAD type here?
>>>> Yes, otherwise it is emitted as PROGBITS, resulting in all of BSS to be
>>>> emitted into Image.
>>> That's rather strange, aren't the .pgdir.bss input sections already
>>> NOBITS since __pgtbl_bss is only used on default-initialised globals?
>> Not sure why, but the section was PROGBITS not NOBITS before I added the (NOLOAD)
>
> I've had a closer look into this. Similar sections in other
> architectures are all named .bss..<something>. If I rename this section
> to .bss..pgdir, then indeed the compiler does emit an object file with
> that section marked NOBITS:
>
> $ readelf -e out/arch/arm64/mm/fixmap.o | grep bss
>   [ 4] .bss              NOBITS          0000000000000000 0002ac 000000
> 00  WA  0   0  1
>   [18] .bss..pgdir       NOBITS          0000000000000000 000750 005000
> 00  WA  0   0 4096
>
> And then the linker does the right thing without having to use NOLOAD.
>
> I was concerned that .bss..pgdir might get caught by BSS_SECTION(), but
> it seems that the double dots are meant to prevent exactly that.
>

Thanks for this. As Sashiko appears to be making me do a v8 anyway, I'll
rename the section to .bss..pgdir too.



^ permalink raw reply

* Re: [PATCH v6 09/15] arm64: Move fixmap and kasan page tables to end of kernel image
From: Kevin Brodsky @ 2026-06-01  8:37 UTC (permalink / raw)
  To: Ard Biesheuvel, Ard Biesheuvel, linux-arm-kernel
  Cc: linux-kernel, Will Deacon, Catalin Marinas, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
	Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
	Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
In-Reply-To: <feab72b8-2961-4145-ac5c-80e820bf1ce9@app.fastmail.com>

On 29/05/2026 16:47, Ard Biesheuvel wrote:
>>>>> +	/* fixmap BSS starts here - preceding data/BSS is omitted from the linear map */
>>>>> +	.pgdir.bss (NOLOAD) : ALIGN(PAGE_SIZE) {
>>>> Do we actually need the NOLOAD type here?
>>> Yes, otherwise it is emitted as PROGBITS, resulting in all of BSS to be
>>> emitted into Image.
>> That's rather strange, aren't the .pgdir.bss input sections already
>> NOBITS since __pgtbl_bss is only used on default-initialised globals?
> Not sure why, but the section was PROGBITS not NOBITS before I added the (NOLOAD)

I've had a closer look into this. Similar sections in other
architectures are all named .bss..<something>. If I rename this section
to .bss..pgdir, then indeed the compiler does emit an object file with
that section marked NOBITS:

$ readelf -e out/arch/arm64/mm/fixmap.o | grep bss
  [ 4] .bss              NOBITS          0000000000000000 0002ac 000000
00  WA  0   0  1
  [18] .bss..pgdir       NOBITS          0000000000000000 000750 005000
00  WA  0   0 4096

And then the linker does the right thing without having to use NOLOAD.

I was concerned that .bss..pgdir might get caught by BSS_SECTION(), but
it seems that the double dots are meant to prevent exactly that.

- Kevin
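
As a rough illustration of the naming effect described above, here is a
minimal userspace sketch. The array name, section name and size are
hypothetical, and it assumes a GCC-compatible compiler targeting ELF, where
(as the readelf output suggests) a zero-initialised object placed in a
section whose name starts with ".bss." ends up NOBITS:

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a page-table array. It has no initialiser,
 * so it only needs zero-filled storage at run time.
 */
unsigned char pgdir_sketch[4096]
	__attribute__((section(".bss..pgdir_sketch"), aligned(4096)));

/* Returns 1 if every byte of the array reads back as zero, else 0. */
int pgdir_sketch_is_zeroed(void)
{
	for (size_t i = 0; i < sizeof(pgdir_sketch); i++)
		if (pgdir_sketch[i] != 0)
			return 0;
	return 1;
}
```

Running "readelf -S" on the resulting object should show whether the
section type is NOBITS for a given toolchain.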


^ permalink raw reply

* Re: [PATCH v7 12/15] sh: Drop cache flush of the zero page at boot
From: Geert Uytterhoeven @ 2026-06-01  8:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, will, catalin.marinas,
	mark.rutland, Ard Biesheuvel, Ryan Roberts, Anshuman Khandual,
	Kevin Brodsky, Liz Prucka, Seth Jenkins, Kees Cook, Mike Rapoport,
	David Hildenbrand, Andrew Morton, Jann Horn, linux-mm,
	linux-hardening, linuxppc-dev, linux-sh, Yoshinori Sato,
	Rich Felker, John Paul Adrian Glaubitz, Geert Uytterhoeven
In-Reply-To: <20260529150150.1670604-29-ardb+git@google.com>

On Fri, 29 May 2026 at 17:02, Ard Biesheuvel <ardb+git@google.com> wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> SuperH performs cache maintenance on the zero page during boot,
> presumably because before commit
>
>   6215d9f4470f ("arch, mm: consolidate empty_zero_page")
>
> the zero page did double duty as a boot params region, and was cleared
> separately, as it was not part of BSS. The memset() in question was
> dropped by that commit, but the __flush_wback_region() call remained.
>
> As empty_zero_page[] has been moved to BSS, it can be treated as any
> other BSS memory, and so the cache flush can be dropped.
>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


^ permalink raw reply

* [PATCH v2] ASoC: fsl_sai: Fix 32 slots TDM broken by integer shift UB in xMR write
From: chancel.liu @ 2026-06-01  7:05 UTC (permalink / raw)
  To: shengjiu.wang, Xiubo.Lee, festevam, nicoleotsuka, lgirdwood,
	broonie, perex, tiwai
  Cc: linux-kernel, linuxppc-dev, linux-sound, stable
In-Reply-To: <20260529085020.3727790-1-chancel.liu@nxp.com>

From: Chancel Liu <chancel.liu@nxp.com>

When configuring 32 slots TDM (channels == slots == 32), the xMR
(Mask Register) write used:
~0UL - ((1 << min(channels, slots)) - 1)

The literal "1" is a signed 32-bit int. Shifting it by 32 positions is
undefined behaviour which may set this register to 0xFFFFFFFF, masking
all 32 slots.

Use GENMASK_U32() macro instead. For 32 slots this produces a zero mask:
~GENMASK_U32(31, 0) = ~0xFFFFFFFF = 0x00000000
Behaviour for fewer than 32 slots is unchanged.

Fixes: 770f58d7d2c5 ("ASoC: fsl_sai: Support multiple data channel enable bits")
Cc: stable@vger.kernel.org
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
---
Changes in v2
- Use GENMASK_U32() macro instead to make it clearer and safer

 sound/soc/fsl/fsl_sai.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 821e3bd51b6e..9661602b53c5 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -797,7 +797,7 @@ static int fsl_sai_hw_params(struct snd_pcm_substream *substream,
 				   FSL_SAI_CR4_FSD_MSTR, FSL_SAI_CR4_FSD_MSTR);

 	regmap_write(sai->regmap, FSL_SAI_xMR(tx),
-		     ~0UL - ((1 << min(channels, slots)) - 1));
+		     ~GENMASK_U32(min(channels, slots) - 1, 0));

 	return 0;
 }
--
2.50.1
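
The mask arithmetic from the commit message can be checked in userspace
with a sketch like the one below. genmask_u32() is only an assumed
stand-in for the kernel's GENMASK_U32(), and sai_xmr_mask() is a
hypothetical helper; neither is the driver's actual code.

```c
#include <stdint.h>

/*
 * Userspace stand-in for GENMASK_U32(h, l): bits l..h set, 0 <= l <= h <= 31.
 * Every shift count stays below 32, so no shift is undefined.
 */
uint32_t genmask_u32(unsigned int h, unsigned int l)
{
	return (~(uint32_t)0 << l) & (~(uint32_t)0 >> (31 - h));
}

/*
 * Hypothetical helper: mask register value for n active slots (1..32).
 * A set bit masks a slot, so the n lowest bits are cleared.
 */
uint32_t sai_xmr_mask(unsigned int n)
{
	return (uint32_t)~genmask_u32(n - 1, 0);
}
```

With this sketch, 32 slots give 0x00000000 and 8 slots give 0xFFFFFF00,
matching the behaviour described in the commit message.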



^ permalink raw reply related

* [PATCH] selftests/mm/run_vmtests.sh: Fix protection_keys binary name in run_vmtests.sh
From: Pavithra @ 2026-06-01  6:13 UTC (permalink / raw)
  To: shuah, akpm
  Cc: linux-kselftest, linux-mm, linux-kernel, linuxppc-dev, pavrampu,
	ritesh.list, david, ziy, mhocko, osalvador, lorenzo.stoakes,
	dev.jain, Liam.Howlett, linmiaohe

We have protection_keys_32 and protection_keys_64 tests mentioned in
run_vmtests.sh, but the binary name is protection_keys with the current
kernel. Use the correct binary name.

Signed-off-by: Pavithra <pavrampu@linux.ibm.com>
---
 tools/testing/selftests/mm/run_vmtests.sh | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 043aa3ed2596..a6b6f397d942 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -344,14 +344,9 @@ CATEGORY="ksm_numa" run_test ./ksm_tests -N -m 0
 CATEGORY="ksm" run_test ./ksm_functional_tests
 
 # protection_keys tests
-if [ -x ./protection_keys_32 ]
+if [ -x ./protection_keys ]
 then
-	CATEGORY="pkey" run_test ./protection_keys_32
-fi
-
-if [ -x ./protection_keys_64 ]
-then
-	CATEGORY="pkey" run_test ./protection_keys_64
+	CATEGORY="pkey" run_test ./protection_keys
 fi
 
 if [ -x ./soft-dirty ]
-- 
2.54.0



^ permalink raw reply related

* [PATCH] powerpc/spufs: fix out-of-bounds access in spufs_mem_mmap_access()
From: Junrui Luo @ 2026-06-01  7:50 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), Rik van Riel, Benjamin Herrenschmidt,
	Andrew Morton
  Cc: linuxppc-dev, linux-kernel, Yuhao Jiang, stable, Junrui Luo

spufs_mem_mmap_access() computes the local store offset as
address - vma->vm_start, but bounds-checks it against vma->vm_end
instead of the local store size. On 64-bit, offset is always well
below vma->vm_end, so the clamp never fires and len stays unbounded
against the LS_SIZE buffer returned by ctx->ops->get_ls().

Reject offsets at or beyond LS_SIZE and clamp len to the remaining
space, mirroring the guard already used by spufs_mem_mmap_fault() and
spufs_ps_fault().

Fixes: a352894d0705 ("spufs: use new vm_ops->access to allow local state access from gdb")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
---
 arch/powerpc/platforms/cell/spufs/file.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index 10fa9b844fcc..94c1ffa8792e 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -268,10 +268,12 @@ static int spufs_mem_mmap_access(struct vm_area_struct *vma,
 
 	if (write && !(vma->vm_flags & VM_WRITE))
 		return -EACCES;
+	if (offset >= LS_SIZE)
+		return -EFAULT;
 	if (spu_acquire(ctx))
 		return -EINTR;
-	if ((offset + len) > vma->vm_end)
-		len = vma->vm_end - offset;
+	if ((offset + len) > LS_SIZE)
+		len = LS_SIZE - offset;
 	local_store = ctx->ops->get_ls(ctx);
 	if (write)
 		memcpy_toio(local_store + offset, buf, len);

---
base-commit: c369299895a591d96745d6492d4888259b004a9e
change-id: 20260601-fixes-e7319a0b4db2

Best regards,
-- 
Junrui Luo <moonafterrain@outlook.com>
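
The guard described in the commit message can be sketched in userspace as
follows. SKETCH_LS_SIZE and clamp_ls_access() are hypothetical names, and
the 256 KiB size is an assumption made for illustration. Unlike the patch,
the sketch compares len against the remaining space rather than computing
offset + len, so that the sum cannot wrap.

```c
/* Assumed local store size for this sketch only. */
#define SKETCH_LS_SIZE (256UL * 1024UL)

/*
 * Hypothetical helper: returns how many bytes may be copied starting at
 * offset, or -1 if offset lies at or beyond the end of the local store.
 */
long clamp_ls_access(unsigned long offset, unsigned long len)
{
	if (offset >= SKETCH_LS_SIZE)
		return -1;
	/* Clamp to the space left after offset. */
	if (len > SKETCH_LS_SIZE - offset)
		len = SKETCH_LS_SIZE - offset;
	return (long)len;
}
```

An access that starts 8 bytes before the end and asks for 64 bytes is
clamped to 8 bytes, and an access starting exactly at the end is rejected.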



^ permalink raw reply related

* [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
From: Venkat Rao Bagalkote @ 2026-06-01  6:41 UTC (permalink / raw)
  To: Peter Zijlstra, Shrikanth Hegde, Srikar Dronamraju,
	Madhavan Srinivasan
  Cc: linuxppc-dev, LKML

Greetings!!!


I hit a kernel BUG on a linux-next kernel running on ppc64le (Power11 
LPAR). The issue was observed once in CI (Avocado tests) and I haven’t 
been able to reproduce it reliably yet.

Architecture: ppc64le (Power11, pSeries)
Kernel: 7.1.0-rc5-next-20260529
Config: PREEMPT(lazy)
CPUs: large system (NR_CPUS=8192)


So far, I have not reproduced the crash, but I am trying to stress 
similar conditions using:

parallel read workloads (fio / dd)
memory pressure


Traces:

  (5/8) 
/home/upstreamci/avocado-fvt-wrapper/tests/avocado-misc-tests/cpu/ppc64_cpu_test.py:PPC64Test.test_smt_loop;run-run_type-upstream-9cfe: 
STARTED
[ 1885.176400] crash hp: kexec_trylock() failed, kdump image may be 
inaccurate
[ 1885.296164] crash hp: kexec_trylock() failed, kdump image may be 
inaccurate
[ 1885.386120] crash hp: kexec_trylock() failed, kdump image may be 
inaccurate
[ 1885.556134] crash hp: kexec_trylock() failed, kdump image may be 
inaccurate
[ 1886.576119] crash hp: kexec_trylock() failed, kdump image may be 
inaccurate
[ 1886.806060] crash hp: kexec_trylock() failed, kdump image may be 
inaccurate
[ 1887.026051] crash hp: kexec_trylock() failed, kdump image may be 
inaccurate
[ 1887.456075] ------------[ cut here ]------------
[ 1887.456101] kernel BUG at kernel/sched/core.c:7512!
[ 1887.456107] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1887.456111] LE PAGE_SIZE=4K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[ 1887.456116] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding 
tls ip_set rfkill nf_tables fsdev_dax kmem device_dax pseries_rng 
vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 sd_mod nd_pmem papr_scm 
sg libnvdimm ibmvscsi ibmveth scsi_transport_srp pseries_wdt
[ 1887.456173] CPU: 28 UID: 0 PID: 85305 Comm: kexec Not tainted 
7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
[ 1887.456180] Hardware name: IBM,9080-HEX Power11 (architected) 
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 1887.456185] NIP:  c0000000013a8e8c LR: c0000000003483bc CTR: 
0000000000000000
[ 1887.456190] REGS: c000000069f03070 TRAP: 0700   Not tainted 
(7.1.0-rc5-next-20260529)
[ 1887.456195] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 
24428222  XER: 0000005a
[ 1887.456208] CFAR: c0000000003483b8 IRQMASK: 0
[ 1887.456208] GPR00: c0000000003483bc c000000069f03330 c000000001a82100 
c000000069f033e0
[ 1887.456208] GPR04: 0000000000000000 0000000000000001 0000000000000001 
c000000006dd3b00
[ 1887.456208] GPR08: ffffffffffffff00 0000000000000001 0000000000000000 
0000000024428220
[ 1887.456208] GPR12: 0000000000000300 c000000effdbef00 0000000000000000 
0000000000000000
[ 1887.456208] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[ 1887.456208] GPR20: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[ 1887.456208] GPR24: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[ 1887.456208] GPR28: 0000000000000000 0000000000000000 0000000000000000 
c000000069f033e0
[ 1887.456265] NIP [c0000000013a8e8c] preempt_schedule_irq+0x44/0x118
[ 1887.456274] LR [c0000000003483bc] 
dynamic_irqentry_exit_cond_resched+0x40/0x1a4
[ 1887.456282] Call Trace:
[ 1887.456284] [c000000069f03360] [c0000000003483bc] 
dynamic_irqentry_exit_cond_resched+0x40/0x1a4
[ 1887.456291] [c000000069f03380] [c00000000014f3bc] 
do_page_fault+0xc0/0x104
[ 1887.456298] [c000000069f033b0] [c000000000008be0] 
data_access_common_virt+0x210/0x220
[ 1887.456306] ---- interrupt: 300 at __copy_tofrom_user_base+0xac/0x5a4
[ 1887.456313] NIP:  c00000000017fc38 LR: c000000000aaa684 CTR: 
0000000000000000
[ 1887.456317] REGS: c000000069f033e0 TRAP: 0300   Not tainted 
(7.1.0-rc5-next-20260529)
[ 1887.456322] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 
24428220  XER: 2004005a
[ 1887.456334] CFAR: c00000000017fc34 DAR: 00003fff879a8000 DSISR: 
42000000 IRQMASK: 0
[ 1887.456334] GPR00: 0000000000000000 c000000069f036a0 c000000001a82100 
00003fff879a8000
[ 1887.456334] GPR04: c0000000bb314ff0 0000000000001000 69f0000606480600 
0200c4080368f028
[ 1887.456334] GPR08: 09036af00005d9c4 0600000200e80803 0000000000000000 
0000000000000030
[ 1887.456334] GPR12: 0000000000000040 c000000effdbef00 0000000000000000 
000000000000000e
[ 1887.456334] GPR16: 0000000004a00000 000000000000001f c000000069f038a0 
c00000006e73e500
[ 1887.456334] GPR20: c00000006f0ff6a8 0000000000000000 c00000006f0ff540 
0000000000000001
[ 1887.456334] GPR24: 000000001816ce60 c0000000bb314000 c000000002e48730 
c000000069f03a30
[ 1887.456334] GPR28: c0000000bb314000 00003fff879a7010 0000000000000010 
0000000000001000
[ 1887.456393] NIP [c00000000017fc38] __copy_tofrom_user_base+0xac/0x5a4
[ 1887.456399] LR [c000000000aaa684] raw_copy_to_user+0x12c/0x314
[ 1887.456405] ---- interrupt: 300
[ 1887.456408] [c000000069f036a0] [c000000000aaa5f4] 
raw_copy_to_user+0x9c/0x314 (unreliable)
[ 1887.456416] [c000000069f036e0] [c000000000aacd08] 
_copy_to_iter+0xe4/0x79c
[ 1887.456423] [c000000069f037a0] [c000000000ab01ec] 
copy_page_to_iter+0xd4/0x1a4
[ 1887.456429] [c000000069f037f0] [c0000000005ddc34] 
filemap_read+0x420/0x4f0
[ 1887.456436] [c000000069f039c0] [c0080000043443e0] 
ext4_file_read_iter+0x78/0x31c [ext4]
[ 1887.456517] [c000000069f03a10] [c000000000796498] vfs_read+0x2a8/0x3c8
[ 1887.456524] [c000000069f03ac0] [c00000000079726c] ksys_read+0x88/0x140
[ 1887.456530] [c000000069f03b10] [c000000000032f98] 
system_call_exception+0x198/0x4e0
[ 1887.456537] [c000000069f03e30] [c00000000000d05c] 
system_call_vectored_common+0x15c/0x2ec
[ 1887.456544] ---- interrupt: 3000 at 0x3fff9b133cf4
[ 1887.456549] NIP:  00003fff9b133cf4 LR: 00003fff9b133cf4 CTR: 
0000000000000000
[ 1887.456554] REGS: c000000069f03e60 TRAP: 3000   Not tainted 
(7.1.0-rc5-next-20260529)
[ 1887.456558] MSR:  800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 
44424402  XER: 00000000
[ 1887.456572] IRQMASK: 0
[ 1887.456572] GPR00: 0000000000000003 00003fffe5fb4190 0000000105087f00 
0000000000000003
[ 1887.456572] GPR04: 00003fff82e93010 000000001816ce60 0000000000000022 
0000000000000000
[ 1887.456572] GPR08: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[ 1887.456572] GPR12: 0000000000000000 00003fff9b4cd860 000000010507f588 
0000000000000000
[ 1887.456572] GPR16: ffffffffffffffff 0000000000000000 0000000000000006 
0000000000000000
[ 1887.456572] GPR20: 0000000000000001 00003fff9b23039c 00003fff9b2303a0 
00003fffe5fb5ee7
[ 1887.456572] GPR24: 0000000000000000 0000000000000000 00003fffe5fb5ee7 
00003fffe5fb42d0
[ 1887.456572] GPR28: 0000000000000003 00003fff82e93010 000000001816ce60 
0000000000000000
[ 1887.456626] NIP [00003fff9b133cf4] 0x3fff9b133cf4
[ 1887.456630] LR [00003fff9b133cf4] 0x3fff9b133cf4
[ 1887.456634] ---- interrupt: 3000
[ 1887.456637] Code: fbe1fff8 e92d0128 f8010010 f821ffd1 81490000 
39200001 2c0a0000 40820014 892d0152 552907fe 7d290034 5529d97e 
<0b090000> 60000000 3bc00000 ebed0128
[ 1887.456657] ---[ end trace 0000000000000000 ]---


If you happen to fix this, please add below tag.

Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>


Regards,

Venkat.




^ permalink raw reply

* Re: [PATCH] selftests/mm/run_vmtests.sh: Fix protection_keys binary name in run_vmtests.sh
From: Andrew Morton @ 2026-06-01  6:22 UTC (permalink / raw)
  To: Pavithra
  Cc: shuah, linux-kselftest, linux-mm, linux-kernel, linuxppc-dev,
	ritesh.list, david, ziy, mhocko, osalvador, lorenzo.stoakes,
	dev.jain, Liam.Howlett, linmiaohe
In-Reply-To: <20260601061314.898388-1-pavrampu@linux.ibm.com>

On Mon,  1 Jun 2026 11:43:14 +0530 Pavithra <pavrampu@linux.ibm.com> wrote:

> we have protection_keys_32 and protection_keys_64 tests mentioned in
> run_vmtests.sh but the binary name is protection_keys with current
> kernel,

Is it?

Makefile has:

VMTARGETS := protection_keys
VMTARGETS += pkey_sighandler_tests
BINARIES_32 := $(VMTARGETS:%=%_32)
BINARIES_64 := $(VMTARGETS:%=%_64)

hp2:/usr/src/mm/tools/testing/selftests/mm> ls -l prot*
-rwxrwxr-x 1 akpm akpm 117792 May 31 23:20 protection_keys_64
-rw-rw-r-- 1 akpm akpm  48448 May 31 21:54 protection_keys.c



^ permalink raw reply

* Re: [v2 0/2] KVM: Validate irqchip index in routing entries
From: Yanfei Xu @ 2026-05-31 14:36 UTC (permalink / raw)
  To: Greg KH, Yanfei Xu
  Cc: harshpb, zhaotianrui, maobibo, chenhuacai, maddy, npiggin,
	sashiko-reviews, seanjc, pbonzini, kvm, stable, loongarch,
	linuxppc-dev, caixiangfeng, fangying.tommy
In-Reply-To: <2026053158-cussed-outweigh-6f0f@gregkh>


On 2026/5/31 22:15, Greg KH wrote:
>> -- 
>> 2.20.1
>>
> <formletter>
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read:
>      https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
>
> </formletter>

Thanks for pointing out the correct process. I saw
that the PPC maintainer added "Cc: stable@vger.kernel.org"
on v1, so I mistakenly thought v2 should Cc it as well...

Thanks,
Yanfei



^ permalink raw reply

* Re: [PATCH] KVM: Validate irqchip index for LoongArch and PowerPC
From: Yanfei Xu @ 2026-05-31 14:02 UTC (permalink / raw)
  To: Sean Christopherson, Yanfei Xu
  Cc: zhaotianrui, maobibo, chenhuacai, maddy, npiggin, sashiko-reviews,
	pbonzini, kvm, loongarch, linuxppc-dev, caixiangfeng,
	fangying.tommy, Sashiko
In-Reply-To: <ahoYdrs9dVzp6Ps6@google.com>


On 2026/5/30 06:51, Sean Christopherson wrote:
> Can you split this into two patches, and send a v2?  I suspect the reason no one
> has picked this up is because it straddles two completely different (sub)subsystems.

That makes sense. Done :)

Thanks,
Yanfei

>
> That would also make it easier to get the fixes backported to stable trees.  PPC
> has been around a lot longer than LoongArch, so I assume the PPC fix will need to
> go further back in time.
>
> Thanks!


^ permalink raw reply

