* [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode
@ 2026-06-04 13:24 fangyu.yu
2026-06-04 13:24 ` [PATCH v3 1/9] riscv: kexec: Reset executable bit on the control code page in cleanup fangyu.yu
` (8 more replies)
0 siblings, 9 replies; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
In a RISC-V kernel, both kexec and crashdump need to hand off execution
to the next kernel after tearing down the current kernel address space.
However, under virtualization the guest uses two-stage address
translation, and the PC is not redirected to stvec when satp is set to
zero, so the legacy single-step "csrw satp,0 + stvec redirect" sequence
fails with "kvm run failed Operation not supported" and the VCPU dies.
This patch set introduces a dedicated kexec trampoline text section and
builds a minimal trampoline page table for it. Both handoffs are then
reworked into a two-pass trampoline:
1. First enter via the kernel VA, install the trampoline page table,
and jump to the trampoline VA(=PA) of the entry stub;
2. Continue execution with the PC already on a PA, drop SATP with
csrw satp,0 (now safe because the PC needs no re-anchoring), and
jump directly to the target -- either the crash kernel entry
(crash path) or the per-image control_code_buffer that runs
the relocate body with SATP=0 throughout (normal path).
With this, both kexec and crashdump in RISC-V guests become robust
against the two-stage translation.
Tested on QEMU virt under two configurations:
* HS-mode bare (QEMU TCG) -- regression check
- normal kexec: kexec -l/-e succeeds, second kernel boots and
prints the userspace SECOND BOOT marker.
- crash kdump: panic triggers crash kernel boot, /proc/vmcore
opens cleanly in crash(1) and shows the panic
backtrace.
* VS-mode (L0 x86 + QEMU TCG -> L1 riscv64 + KVM -> L2)
Before this series, both paths die with
kvm run failed Operation not supported
and an all-zero M-mode register dump on the SATP transition.
After this series, both paths succeed end-to-end and the
vmcore opens cleanly in crash.
---
Changes in v3 (Sashiko AI review):
- Add two new Fixes: patches at the start of the series:
#1: machine_kexec_cleanup() was empty, so the set_memory_x()
call in prepare() leaked an executable direct-map page back
to the buddy allocator on kexec -u (W^X bypass). Fix: add
set_memory_nx() in cleanup.
#2: machine_kexec_prepare() FDT search checked
memsz <= sizeof(fdt) but read sizeof(fdt) bytes from
segment[i].buf, which can be smaller than memsz. Fix: check
bufsz instead.
- Inline the .kexec.tramp.text section definition directly into
vmlinux.lds.S instead of using a macro in image-vars.h.
- Rewrite map_tramp_page() to share a single set of lower-level
page tables between the VA and PA mappings (5 BSS pages instead
of 9), with a collision-safe walker that only populates entries
still zero. Add Sv32 support.
- Link to v2:
https://lore.kernel.org/linux-riscv/20260526125009.2404-1-fangyu.yu@linux.alibaba.com/
- Link to v1:
https://lore.kernel.org/linux-riscv/20260324114527.91494-1-fangyu.yu@linux.alibaba.com/
Fangyu Yu (9):
riscv: kexec: Reset executable bit on the control code page in cleanup
riscv: kexec: Bound FDT search by source buffer size, not destination
riscv: Add kexec trampoline text section to vmlinux.lds.S
riscv: kexec: Place norelocate trampoline into .kexec.tramp.text
riscv: kexec: Build trampoline page tables for crash kernel entry
riscv: kexec: Switch to trampoline page table before norelocate
riscv: kexec: Always build the trampoline page table
riscv: kexec: Add the relocate-trampoline wrapper
riscv: kexec: Route normal kexec through the trampoline page table
arch/riscv/include/asm/kexec.h | 5 +
arch/riscv/kernel/kexec_relocate.S | 92 +++++++++++-----
arch/riscv/kernel/machine_kexec.c | 171 +++++++++++++++++++++++++++--
arch/riscv/kernel/vmlinux.lds.S | 10 ++
4 files changed, 241 insertions(+), 37 deletions(-)
--
2.50.1
* [PATCH v3 1/9] riscv: kexec: Reset executable bit on the control code page in cleanup
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:24 ` [PATCH v3 2/9] riscv: kexec: Bound FDT search by source buffer size, not destination fangyu.yu
` (7 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
machine_kexec_prepare() calls set_memory_x() on the per-image
control_code_page so the relocate stub copied into it can be executed
during a normal kexec. machine_kexec_cleanup() is empty, so when the
image is freed (via kexec -u, or because a later step in load failed)
the page is returned to the buddy allocator with its executable bit
still set. Once the page is reallocated for arbitrary kernel data,
the W^X invariant is broken: a writable page also marked executable.
Implement the architecture cleanup hook to call set_memory_nx() on
the control code page for non-crash images, mirroring the
set_memory_x() in prepare().
The crash path does not call set_memory_x() (the crash kernel is
loaded into the reserved crashkernel region whose pages are not in
the buddy allocator) and so does not need the cleanup.
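The pairing described above can be sketched outside the kernel. This is
a minimal user-space model, not the kernel code: struct toy_page,
struct toy_image, and the toy_* helpers are hypothetical stand-ins for
the control code page, struct kimage, and the set_memory_x() /
set_memory_nx() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page's direct-map permissions. */
struct toy_page {
	bool writable;
	bool executable;
};

enum toy_image_type { TOY_KEXEC_DEFAULT, TOY_KEXEC_CRASH };

/* Hypothetical, trimmed-down image: only what the hooks look at. */
struct toy_image {
	enum toy_image_type type;
	struct toy_page *control_code_page;
};

/* Models prepare(): only non-crash images get an executable page. */
static void toy_prepare(struct toy_image *image)
{
	if (image->type != TOY_KEXEC_CRASH && image->control_code_page)
		image->control_code_page->executable = true;
}

/* Models the new cleanup(): undo exactly what toy_prepare() did. */
static void toy_cleanup(struct toy_image *image)
{
	if (image->type == TOY_KEXEC_CRASH || !image->control_code_page)
		return;
	image->control_code_page->executable = false;
}

/* W^X holds when a page is not both writable and executable. */
static bool toy_wx_ok(const struct toy_page *page)
{
	assert(page != NULL);
	return !(page->writable && page->executable);
}
```

In this model a page that went through toy_prepare() violates W^X
until toy_cleanup() runs, which is the window the patch closes before
the page goes back to the allocator.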
Fixes: fba8a8674f68 ("RISC-V: Add kexec support")
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kernel/machine_kexec.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index 2306ce3e5f22..ea6794c9f4c2 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -91,6 +91,19 @@ machine_kexec_prepare(struct kimage *image)
void
machine_kexec_cleanup(struct kimage *image)
{
+ void *control_code_buffer;
+
+ if (image->type == KEXEC_TYPE_CRASH || !image->control_code_page)
+ return;
+
+ /*
+ * machine_kexec_prepare() called set_memory_x() on the control
+ * code page for non-crash images. Revert it before kimage_free()
+ * returns the page to the buddy allocator, so we do not leak an
+ * executable page back into general allocation.
+ */
+ control_code_buffer = page_address(image->control_code_page);
+ set_memory_nx((unsigned long)control_code_buffer, 1);
}
--
2.50.1
* [PATCH v3 2/9] riscv: kexec: Bound FDT search by source buffer size, not destination
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
2026-06-04 13:24 ` [PATCH v3 1/9] riscv: kexec: Reset executable bit on the control code page in cleanup fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:37 ` sashiko-bot
2026-06-04 13:24 ` [PATCH v3 3/9] riscv: Add kexec trampoline text section to vmlinux.lds.S fangyu.yu
` (6 subsequent siblings)
8 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
The FDT search loop in machine_kexec_prepare() reads sizeof(fdt) bytes
from segment[i].buf to identify the device tree, but it gates the read
on segment[i].memsz, which is the destination size in the next kernel.
kexec allows bufsz < memsz (the loaded image is zero-padded at the
destination), so a caller can craft a segment with bufsz=10 and
memsz=1MB:
    if (image->segment[i].memsz <= sizeof(fdt)) /* 1MB > 40, OK */
        continue;
    /* reads 40 bytes from a 10-byte kbuf */
    memcpy(&fdt, image->segment[i].buf, sizeof(fdt));
For kexec_file_load (image->file_mode), the read walks 30 bytes past
the kernel-allocated kbuf. In the worst case the trailing bytes fall
in an unmapped guard page and the read faults the kernel; in the
common case the read returns garbage which fdt_check_header() rejects
and the loop continues. The plain kexec_load path is shielded by
copy_from_user(), which validates the read against the user mapping.
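Replace the memsz check with a bufsz check, which is the right bound
for the read site. The comparison also changes from <= to <, so a
segment holding exactly sizeof(fdt) bytes is still examined.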
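The bound can be illustrated with a small user-space sketch. It is not
the kernel loop: struct toy_segment and struct toy_fdt_header are
hypothetical, and the header is modelled as ten 32-bit words only to
reproduce the 40-byte size used in the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down segment: only the fields the check needs. */
struct toy_segment {
	const void *buf;	/* source bytes supplied at load time */
	size_t bufsz;		/* number of valid bytes behind buf */
	size_t memsz;		/* destination size in the next kernel */
};

/* Stand-in for the 40-byte header that the search loop copies out. */
struct toy_fdt_header {
	uint32_t words[10];
};

/*
 * Copy a header out of the segment only if the source buffer really
 * holds that many bytes. memsz is deliberately ignored, because it
 * describes the destination, not the buffer being read.
 */
static bool toy_read_header(const struct toy_segment *seg,
			    struct toy_fdt_header *out)
{
	assert(seg != NULL && out != NULL);
	if (seg->bufsz < sizeof(*out))
		return false;
	memcpy(out, seg->buf, sizeof(*out));
	return true;
}
```

With the crafted segment from the example (bufsz=10, memsz=1MB) the
helper refuses to read, while a segment whose buffer holds exactly one
header is still accepted.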
Fixes: fba8a8674f68 ("RISC-V: Add kexec support")
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kernel/machine_kexec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index ea6794c9f4c2..e6e179cffc44 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -38,7 +38,7 @@ machine_kexec_prepare(struct kimage *image)
/* Find the Flattened Device Tree and save its physical address */
for (i = 0; i < image->nr_segments; i++) {
- if (image->segment[i].memsz <= sizeof(fdt))
+ if (image->segment[i].bufsz < sizeof(fdt))
continue;
if (image->file_mode)
--
2.50.1
* [PATCH v3 3/9] riscv: Add kexec trampoline text section to vmlinux.lds.S
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
2026-06-04 13:24 ` [PATCH v3 1/9] riscv: kexec: Reset executable bit on the control code page in cleanup fangyu.yu
2026-06-04 13:24 ` [PATCH v3 2/9] riscv: kexec: Bound FDT search by source buffer size, not destination fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:24 ` [PATCH v3 4/9] riscv: kexec: Place norelocate trampoline into .kexec.tramp.text fangyu.yu
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
When CONFIG_KEXEC_CORE is enabled, add a dedicated .kexec.tramp.text
area to the RISC-V kernel linker script.
Extend vmlinux.lds.S to:
- align both the start and the end to PAGE_SIZE
- define __kexec_tramp_text_start/__kexec_tramp_text_end
- KEEP all .kexec.tramp.text* input sections
- ASSERT the trampoline text fits within one page
The end-of-section page alignment guarantees that the trampoline page,
which is later identity-mapped as PAGE_KERNEL_EXEC, contains nothing but
the trampoline code and padding (no shared neighbour data).
When kexec is disabled, the whole block is excluded via #ifdef
CONFIG_KEXEC_CORE.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kernel/vmlinux.lds.S | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
index 1f4f8496941a..bc615f7b702f 100644
--- a/arch/riscv/kernel/vmlinux.lds.S
+++ b/arch/riscv/kernel/vmlinux.lds.S
@@ -41,6 +41,16 @@ SECTIONS
ENTRY_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
+#ifdef CONFIG_KEXEC_CORE
+ . = ALIGN(PAGE_SIZE);
+ __kexec_tramp_text_start = .;
+ KEEP(*(.kexec.tramp.text))
+ KEEP(*(.kexec.tramp.text.*))
+ __kexec_tramp_text_end = .;
+ ASSERT((__kexec_tramp_text_end - __kexec_tramp_text_start) <= PAGE_SIZE,
+ ".kexec.tramp.text exceeds PAGE_SIZE");
+ . = ALIGN(PAGE_SIZE);
+#endif
_etext = .;
}
--
2.50.1
* [PATCH v3 4/9] riscv: kexec: Place norelocate trampoline into .kexec.tramp.text
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
` (2 preceding siblings ...)
2026-06-04 13:24 ` [PATCH v3 3/9] riscv: Add kexec trampoline text section to vmlinux.lds.S fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:24 ` [PATCH v3 5/9] riscv: kexec: Build trampoline page tables for crash kernel entry fangyu.yu
` (4 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Move riscv_kexec_norelocate out of the generic .text section and into
a dedicated executable trampoline section, .kexec.tramp.text.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/include/asm/kexec.h | 4 ++++
arch/riscv/kernel/kexec_relocate.S | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index b9ee8346cc8c..6466c1f00d41 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -75,4 +75,8 @@ int load_extra_segments(struct kimage *image, unsigned long kernel_start,
unsigned long cmdline_len);
#endif
+#ifndef __ASSEMBLY__
+extern char __kexec_tramp_text_start[];
+#endif
+
#endif
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index de0a4b35d01e..af6b99f5b0fd 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -147,7 +147,7 @@ riscv_kexec_relocate_end:
/* Used for jumping to crashkernel */
-.section ".text"
+.section ".kexec.tramp.text", "ax"
SYM_CODE_START(riscv_kexec_norelocate)
/*
* s0: (const) Phys address to jump to
--
2.50.1
* [PATCH v3 5/9] riscv: kexec: Build trampoline page tables for crash kernel entry
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
` (3 preceding siblings ...)
2026-06-04 13:24 ` [PATCH v3 4/9] riscv: kexec: Place norelocate trampoline into .kexec.tramp.text fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:24 ` [PATCH v3 6/9] riscv: kexec: Switch to trampoline page table before norelocate fangyu.yu
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Crash kexec uses riscv_kexec_norelocate as a trampoline to jump into
the crashkernel. Pre-build dedicated 4 KB page tables in
machine_kexec_prepare() that map the trampoline page as executable,
so the panic path only has to switch satp and jump.
Two mappings are installed into a shared pgd:
- VA(__kexec_tramp_text_start) -> PA(__kexec_tramp_text_start)
- PA(__kexec_tramp_text_start) -> PA(__kexec_tramp_text_start)
The lower-level tables (p4d/pud/pmd/pte) are shared between both
mappings; map_tramp_page() walks the existing tree and only
populates entries that are still zero, so the two installs coexist
even when their indices happen to collide at any level.
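The "populate only if still zero" walk can be sketched with a toy
three-level (Sv39-shaped) table in user space. Everything here is an
assumption made for illustration: the toy_* names, the use of a bare
valid bit as the link to the next level, and the example addresses. The
real map_tramp_page() uses the kernel's pgd/p4d/pud/pmd/pte helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRIES    512u		/* entries per table with 8-byte PTEs */
#define TOY_PAGE_SHIFT 12u
#define TOY_PTE_V      (1ull << 0)	/* valid */
#define TOY_PTE_R      (1ull << 1)	/* readable */
#define TOY_PTE_X      (1ull << 3)	/* executable */

/* One table per level, shared by every mapping that is installed. */
static uint64_t toy_pgd[TOY_ENTRIES];
static uint64_t toy_pmd[TOY_ENTRIES];
static uint64_t toy_pte[TOY_ENTRIES];

/* 9-bit table index of addr at the given level (0 = leaf level). */
static unsigned int toy_index(uint64_t addr, unsigned int level)
{
	return (unsigned int)((addr >> (TOY_PAGE_SHIFT + 9u * level)) &
			      (TOY_ENTRIES - 1u));
}

/* Install va -> pa, linking a level only if its entry is still zero. */
static void toy_map_page(uint64_t va, uint64_t pa)
{
	uint64_t *pgd = &toy_pgd[toy_index(va, 2)];
	uint64_t *pmd = &toy_pmd[toy_index(va, 1)];

	assert((pa & ((1ull << TOY_PAGE_SHIFT) - 1)) == 0);
	if (*pgd == 0)
		*pgd = TOY_PTE_V;	/* toy link to the shared pmd */
	if (*pmd == 0)
		*pmd = TOY_PTE_V;	/* toy link to the shared pte table */
	toy_pte[toy_index(va, 0)] = ((pa >> TOY_PAGE_SHIFT) << 10) |
				    TOY_PTE_V | TOY_PTE_R | TOY_PTE_X;
}

/* Walk the toy tables; returns 0 when the address is not mapped. */
static uint64_t toy_translate(uint64_t va)
{
	uint64_t pte = toy_pte[toy_index(va, 0)];

	if (!(toy_pgd[toy_index(va, 2)] & TOY_PTE_V) ||
	    !(toy_pmd[toy_index(va, 1)] & TOY_PTE_V) ||
	    !(pte & TOY_PTE_V))
		return 0;
	return ((pte >> 10) << TOY_PAGE_SHIFT) |
	       (va & ((1ull << TOY_PAGE_SHIFT) - 1));
}
```

With a kernel-style VA and its PA that share the same lower-level
indices, the second toy_map_page() call finds the intermediate entries
already populated and only rewrites the leaf with the same value, so
both installs coexist.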
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kernel/machine_kexec.c | 87 +++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index e6e179cffc44..1947b7bdf5c4 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -18,6 +18,85 @@
#include <linux/interrupt.h>
#include <linux/irq.h>
+/*
+ * Trampoline page tables. Both the VA(trampoline)->PA and the
+ * PA(trampoline)->PA identity mapping are installed in this single
+ * pgd; the lower-level tables are shared so the two mappings can
+ * coexist even if they happen to collide at any level (the walker
+ * only populates entries that are still zero).
+ *
+ * Pre-allocate for the largest paging mode (Sv57). Levels that the
+ * runtime mode does not use simply waste a page or two of BSS, in
+ * exchange for a builder that is infallible and safe to run from
+ * the panic path.
+ */
+static pgd_t kexec_tramp_pgd[PTRS_PER_PGD] __aligned(PAGE_SIZE);
+#ifdef CONFIG_64BIT
+static p4d_t kexec_tramp_p4d[PTRS_PER_P4D] __aligned(PAGE_SIZE);
+static pud_t kexec_tramp_pud[PTRS_PER_PUD] __aligned(PAGE_SIZE);
+static pmd_t kexec_tramp_pmd[PTRS_PER_PMD] __aligned(PAGE_SIZE);
+#endif
+static pte_t kexec_tramp_pte[PTRS_PER_PTE] __aligned(PAGE_SIZE);
+
+static void map_tramp_page(unsigned long va, unsigned long pa)
+{
+ pgd_t *pgd = kexec_tramp_pgd + pgd_index(va);
+
+#ifdef CONFIG_64BIT
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ if (pgtable_l5_enabled) {
+ if (pgd_val(*pgd) == 0)
+ set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa_symbol(kexec_tramp_p4d)),
+ PAGE_TABLE));
+ p4d = kexec_tramp_p4d + p4d_index(va);
+ } else {
+ p4d = (p4d_t *)pgd;
+ }
+
+ if (pgtable_l4_enabled) {
+ if (p4d_val(*p4d) == 0)
+ set_p4d(p4d, pfn_p4d(PFN_DOWN(__pa_symbol(kexec_tramp_pud)),
+ PAGE_TABLE));
+ pud = kexec_tramp_pud + pud_index(va);
+ } else {
+ pud = (pud_t *)p4d;
+ }
+
+ if (pud_val(*pud) == 0)
+ set_pud(pud, pfn_pud(PFN_DOWN(__pa_symbol(kexec_tramp_pmd)),
+ PAGE_TABLE));
+ pmd = kexec_tramp_pmd + pmd_index(va);
+
+ if (pmd_val(*pmd) == 0)
+ set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa_symbol(kexec_tramp_pte)),
+ PAGE_TABLE));
+#else
+ /* Sv32: PGD points directly to the PTE table. */
+ if (pgd_val(*pgd) == 0)
+ set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa_symbol(kexec_tramp_pte)),
+ PAGE_TABLE));
+#endif
+
+ set_pte(kexec_tramp_pte + pte_index(va),
+ pfn_pte(PFN_DOWN(pa), PAGE_KERNEL_EXEC));
+}
+
+static void riscv_kexec_build_tramp(unsigned long va, unsigned long pa)
+{
+ /* VA -> PA: map the trampoline page via its kernel VA. */
+ map_tramp_page(va, pa);
+
+ /*
+ * PA -> PA: identity-map the same page so the second-pass code
+ * can keep executing after the kernel VA mapping is dropped.
+ */
+ map_tramp_page(pa, pa);
+}
+
+
/*
* machine_kexec_prepare - Initialize kexec
*
@@ -73,6 +152,14 @@ machine_kexec_prepare(struct kimage *image)
/* Mark the control page executable */
set_memory_x((unsigned long) control_code_buffer, 1);
+ } else {
+ /*
+ * Crash kexec uses riscv_kexec_norelocate as a trampoline.
+ * Pre-build the trampoline page tables here so the panic
+ * path only has to switch satp and jump.
+ */
+ riscv_kexec_build_tramp((unsigned long)__kexec_tramp_text_start,
+ __pa_symbol(__kexec_tramp_text_start));
}
return 0;
--
2.50.1
* [PATCH v3 6/9] riscv: kexec: Switch to trampoline page table before norelocate
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
` (4 preceding siblings ...)
2026-06-04 13:24 ` [PATCH v3 5/9] riscv: kexec: Build trampoline page tables for crash kernel entry fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:40 ` sashiko-bot
2026-06-04 13:24 ` [PATCH v3 7/9] riscv: kexec: Always build the trampoline page table fangyu.yu
` (2 subsequent siblings)
8 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Make riscv_kexec_norelocate a two-pass trampoline so it can
drop the kernel page tables while still executing from a
mapped address.
On the first entry, t3 is initialized to 0 by machine_kexec(). The
stub loads the physical address of riscv_kexec_norelocate and the
trampoline SATP value, switches to the trampoline page table,
and jumps to the trampoline VA(=PA).
On the second entry, t3 contains the physical address of
riscv_kexec_norelocate, so the PC comparison matches and
execution continues under trampoline VA(=PA).
Since the trampoline page table is already active, replace the
previous stvec-based handoff with a direct jump to the target
entry (jr a2).
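The two-pass control flow can be modelled in plain C. This is a sketch
of the state transitions only, assuming a hypothetical struct toy_hart
that tracks just pc, t3, and satp; the addresses and the SATP value
used with it are made up, and the register save/clear work of the real
stub is left out.

```c
#include <assert.h>
#include <stdint.h>

enum toy_pass { TOY_FIRST_PASS, TOY_SECOND_PASS };

/* Hypothetical hart state: only what the entry stub touches. */
struct toy_hart {
	uint64_t pc;
	uint64_t t3;
	uint64_t satp;
};

/* Models "auipc t0, 0; beq t0, t3, 1f": compare the current PC with t3. */
static enum toy_pass toy_entry_pass(uint64_t pc, uint64_t t3)
{
	return pc == t3 ? TOY_SECOND_PASS : TOY_FIRST_PASS;
}

/* Run the entry stub once and report which pass was taken. */
static enum toy_pass toy_run_entry(struct toy_hart *hart, uint64_t entry_pa,
				   uint64_t tramp_satp, uint64_t target)
{
	enum toy_pass pass = toy_entry_pass(hart->pc, hart->t3);

	if (pass == TOY_FIRST_PASS) {
		hart->t3 = entry_pa;		/* REG_L t3, ..._pa */
		hart->satp = tramp_satp;	/* csrw CSR_SATP, t1 */
		hart->pc = hart->t3;		/* jr t3 */
	} else {
		hart->satp = 0;			/* csrw CSR_SATP, zero */
		hart->pc = target;		/* jr a2 */
	}
	return pass;
}
```

Calling toy_run_entry() twice on a hart that starts at the kernel VA
with t3 == 0 takes the first pass, then the second, and leaves the hart
at the target with satp cleared. The model also makes the precondition
visible: the first pass is only recognised while t3 differs from the
PC, which is why t3 must be 0 when control leaves machine_kexec().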
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kernel/kexec_relocate.S | 30 +++++++++++++++-----
arch/riscv/kernel/machine_kexec.c | 44 +++++++++++++++++++++++++++---
2 files changed, 63 insertions(+), 11 deletions(-)
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index af6b99f5b0fd..8cfdf6f4032a 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -147,13 +147,35 @@ riscv_kexec_relocate_end:
/* Used for jumping to crashkernel */
+.extern kexec_tramp_satp
+.extern riscv_kexec_norelocate_pa
.section ".kexec.tramp.text", "ax"
SYM_CODE_START(riscv_kexec_norelocate)
+ /*
+ * Two-pass entry:
+ * - 1st entry: t3 == 0 (initialized by machine_kexec()).
+ *
+ * - 2nd entry: t3 holds the physical address of
+ * riscv_kexec_norelocate, so auipc matches t3 and we branch
+ * to label 1 to continue execution under trampoline VA(=PA).
+ */
+ auipc t0, 0
+ beq t0, t3, 1f
+
+ la t0, riscv_kexec_norelocate_pa
+ REG_L t3, 0(t0)
+ la t0, kexec_tramp_satp
+ REG_L t1, 0(t0)
+ csrw CSR_SATP, t1
+ sfence.vma x0, x0
+
+ jr t3
/*
* s0: (const) Phys address to jump to
* s1: (const) Phys address of the FDT image
* s2: (const) The hartid of the current hart
*/
+1:
mv s0, a1
mv s1, a2
mv s2, a3
@@ -198,14 +220,8 @@ SYM_CODE_START(riscv_kexec_norelocate)
csrw CSR_SCAUSE, zero
csrw CSR_SSCRATCH, zero
- /*
- * Switch to physical addressing
- * This will also trigger a jump to CSR_STVEC
- * which in this case is the address of the new
- * kernel.
- */
- csrw CSR_STVEC, a2
csrw CSR_SATP, zero
+ jr a2
SYM_CODE_END(riscv_kexec_norelocate)
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index 1947b7bdf5c4..72817bba5d3b 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -18,6 +18,8 @@
#include <linux/interrupt.h>
#include <linux/irq.h>
+unsigned long kexec_tramp_satp;
+unsigned long riscv_kexec_norelocate_pa;
/*
* Trampoline page tables. Both the VA(trampoline)->PA and the
* PA(trampoline)->PA identity mapping are installed in this single
@@ -155,11 +157,17 @@ machine_kexec_prepare(struct kimage *image)
} else {
/*
* Crash kexec uses riscv_kexec_norelocate as a trampoline.
- * Pre-build the trampoline page tables here so the panic
- * path only has to switch satp and jump.
+ * Pre-build the trampoline page tables and capture the
+ * trampoline SATP value plus the physical address of
+ * riscv_kexec_norelocate so that the panic path only has
+ * to switch satp and jump.
*/
riscv_kexec_build_tramp((unsigned long)__kexec_tramp_text_start,
__pa_symbol(__kexec_tramp_text_start));
+ WRITE_ONCE(riscv_kexec_norelocate_pa,
+ __pa_symbol(&riscv_kexec_norelocate));
+ WRITE_ONCE(kexec_tramp_satp,
+ PFN_DOWN(__pa_symbol(kexec_tramp_pgd)) | satp_mode);
}
return 0;
@@ -276,7 +284,35 @@ machine_kexec(struct kimage *image)
/* Jump to the relocation code */
pr_notice("Bye...\n");
- kexec_method(first_ind_entry, jump_addr, fdt_addr,
- this_hart_id, kernel_map.va_pa_offset);
+ /*
+ * Hand off to the trampoline. For KEXEC_TYPE_CRASH we go into
+ * riscv_kexec_norelocate, which uses t3 as the 1st/2nd-pass
+ * discriminator (must be 0 on first entry). A bare
+ * asm volatile ("li t3, 0" ::: "t3")
+ * before the C call only declares t3 *modified*; the compiler is
+ * free to use t3 as scratch when materialising args. Pin t3 = 0
+ * (and the args) via local register variables and perform the
+ * indirect jump inside the same inline asm so t3 == 0 is
+ * guaranteed at the moment control leaves machine_kexec().
+ */
+ {
+ register unsigned long a0_val asm("a0") = first_ind_entry;
+ register unsigned long a1_val asm("a1") = jump_addr;
+ register unsigned long a2_val asm("a2") = fdt_addr;
+ register unsigned long a3_val asm("a3") = this_hart_id;
+ register unsigned long a4_val asm("a4") = kernel_map.va_pa_offset;
+ register unsigned long t3_zero asm("t3") = 0;
+ register riscv_kexec_method m asm("t6") = kexec_method;
+
+ asm volatile (
+ "jr %[m]"
+ :
+ : "r" (a0_val), "r" (a1_val), "r" (a2_val),
+ "r" (a3_val), "r" (a4_val),
+ "r" (t3_zero),
+ [m] "r" (m)
+ : "memory"
+ );
+ }
unreachable();
}
--
2.50.1
* [PATCH v3 7/9] riscv: kexec: Always build the trampoline page table
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
` (5 preceding siblings ...)
2026-06-04 13:24 ` [PATCH v3 6/9] riscv: kexec: Switch to trampoline page table before norelocate fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:24 ` [PATCH v3 8/9] riscv: kexec: Add the relocate-trampoline wrapper fangyu.yu
2026-06-04 13:24 ` [PATCH v3 9/9] riscv: kexec: Route normal kexec through the trampoline page table fangyu.yu
8 siblings, 0 replies; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
The trampoline page table and the kexec_tramp_satp value are
currently built only on the crash path. A follow-up patch needs
the same infrastructure for the normal kexec path.
Pull the trampoline build and the WRITE_ONCE() that publishes the
SATP value out of the crash-only else branch in
machine_kexec_prepare(). The crash path keeps recording its own
riscv_kexec_norelocate_pa; the normal path keeps its existing
control_code_buffer copy.
No functional change.
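For reference, the published value has the shape "root table PFN ORed
with the mode field". The sketch below recomputes it in user space for
RV64; the TOY_* constants and toy_make_satp() are hypothetical names,
the example address is made up, and the ASID field is left at zero as
in the PFN_DOWN(...) | satp_mode expression.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT    12u
#define TOY_SATP_MODE_39  (8ull << 60)		/* MODE field, bits 63:60 */
#define TOY_SATP_MODE_48  (9ull << 60)
#define TOY_SATP_PPN_MASK ((1ull << 44) - 1)	/* PPN field, bits 43:0 */

/* Build an RV64 satp value from a page-aligned root table address. */
static uint64_t toy_make_satp(uint64_t pgd_pa, uint64_t mode)
{
	/* The root table must be page aligned, like kexec_tramp_pgd. */
	assert((pgd_pa & ((1ull << TOY_PAGE_SHIFT) - 1)) == 0);
	return ((pgd_pa >> TOY_PAGE_SHIFT) & TOY_SATP_PPN_MASK) | mode;
}
```

For a root table at 0x80400000 under Sv39 this yields
0x8000000000080400: mode 8 in the top nibble and PFN 0x80400 in the
low bits.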
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kernel/machine_kexec.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index 72817bba5d3b..d82f45fb44b6 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -139,6 +139,16 @@ machine_kexec_prepare(struct kimage *image)
return -EINVAL;
}
+ /*
+ * Build the trampoline page table and capture its SATP value.
+ * The crash path consumes it today; the non-crash kexec path
+ * will use the same setup as well.
+ */
+ riscv_kexec_build_tramp((unsigned long)__kexec_tramp_text_start,
+ __pa_symbol(__kexec_tramp_text_start));
+ WRITE_ONCE(kexec_tramp_satp,
+ PFN_DOWN(__pa_symbol(kexec_tramp_pgd)) | satp_mode);
+
/* Copy the assembler code for relocation to the control page */
if (image->type != KEXEC_TYPE_CRASH) {
control_code_buffer = page_address(image->control_code_page);
@@ -155,19 +165,8 @@ machine_kexec_prepare(struct kimage *image)
/* Mark the control page executable */
set_memory_x((unsigned long) control_code_buffer, 1);
} else {
- /*
- * Crash kexec uses riscv_kexec_norelocate as a trampoline.
- * Pre-build the trampoline page tables and capture the
- * trampoline SATP value plus the physical address of
- * riscv_kexec_norelocate so that the panic path only has
- * to switch satp and jump.
- */
- riscv_kexec_build_tramp((unsigned long)__kexec_tramp_text_start,
- __pa_symbol(__kexec_tramp_text_start));
WRITE_ONCE(riscv_kexec_norelocate_pa,
__pa_symbol(&riscv_kexec_norelocate));
- WRITE_ONCE(kexec_tramp_satp,
- PFN_DOWN(__pa_symbol(kexec_tramp_pgd)) | satp_mode);
}
return 0;
--
2.50.1
* [PATCH v3 8/9] riscv: kexec: Add the relocate-trampoline wrapper
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
` (6 preceding siblings ...)
2026-06-04 13:24 ` [PATCH v3 7/9] riscv: kexec: Always build the trampoline page table fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:46 ` sashiko-bot
2026-06-04 13:24 ` [PATCH v3 9/9] riscv: kexec: Route normal kexec through the trampoline page table fangyu.yu
8 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Add riscv_kexec_relocate_entry to .kexec.tramp.text and the two
asm-visible globals (riscv_kexec_relocate_entry_pa and
riscv_kexec_cc_buffer_pa) that the wrapper consumes.
The wrapper performs the same two-step transition used by the crash
path: switch to the trampoline pgd, jump to the PA of self, then drop
the MMU with PC already on a PA. It finally jumps to the PA of
control_code_buffer.
machine_kexec_prepare() publishes the wrapper PA via WRITE_ONCE for
non-crash images. The per-image control_code_buffer PA is published
later, at dispatch time, so a load failure between prepare() and the
kexec_image swap cannot leave the global pointing at a freed page.
Nothing routes to the wrapper yet; the switchover happens in the
follow-up patch.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/include/asm/kexec.h | 1 +
arch/riscv/kernel/kexec_relocate.S | 36 ++++++++++++++++++++++++++++++
arch/riscv/kernel/machine_kexec.c | 5 +++++
3 files changed, 42 insertions(+)
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index 6466c1f00d41..b75cab959e53 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -53,6 +53,7 @@ typedef void (*riscv_kexec_method)(unsigned long first_ind_entry,
unsigned long va_pa_off);
extern riscv_kexec_method riscv_kexec_norelocate;
+extern riscv_kexec_method riscv_kexec_relocate_entry;
#ifdef CONFIG_KEXEC_FILE
extern const struct kexec_file_ops elf_kexec_ops;
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index 8cfdf6f4032a..6c624560c9ac 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -225,6 +225,42 @@ SYM_CODE_START(riscv_kexec_norelocate)
SYM_CODE_END(riscv_kexec_norelocate)
+.extern riscv_kexec_relocate_entry_pa
+.extern riscv_kexec_cc_buffer_pa
+.section ".kexec.tramp.text", "ax"
+SYM_CODE_START(riscv_kexec_relocate_entry)
+ /*
+ * Two-pass entry, identical in shape to riscv_kexec_norelocate:
+ * - 1st entry: t3 == 0 (initialized by machine_kexec()).
+ * - 2nd entry: t3 == PA of riscv_kexec_relocate_entry, so auipc
+ * matches t3 and we branch to label 1.
+ * Args a0..a4 are passed through unchanged to riscv_kexec_relocate.
+ */
+ auipc t0, 0
+ beq t0, t3, 1f
+
+ la t0, riscv_kexec_relocate_entry_pa
+ REG_L t3, 0(t0)
+ la t0, kexec_tramp_satp
+ REG_L t1, 0(t0)
+ csrw CSR_SATP, t1
+ sfence.vma x0, x0
+
+ jr t3
+1:
+ /*
+ * Now executing at the PA of this wrapper with the trampoline pgd
+ * installed (identity-mapped). Drop the MMU; PC stays valid because
+ * it is already a PA.
+ */
+ csrw CSR_SATP, zero
+
+ /* Jump to the PA of control_code_buffer to run the relocate body. */
+ la t0, riscv_kexec_cc_buffer_pa
+ REG_L t0, 0(t0)
+ jr t0
+SYM_CODE_END(riscv_kexec_relocate_entry)
+
.section ".rodata"
SYM_DATA(riscv_kexec_relocate_size,
.long riscv_kexec_relocate_end - riscv_kexec_relocate)
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index d82f45fb44b6..71688c63af65 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -20,6 +20,8 @@
unsigned long kexec_tramp_satp;
unsigned long riscv_kexec_norelocate_pa;
+unsigned long riscv_kexec_relocate_entry_pa;
+unsigned long riscv_kexec_cc_buffer_pa;
/*
* Trampoline page tables. Both the VA(trampoline)->PA and the
* PA(trampoline)->PA identity mapping are installed in this single
@@ -164,6 +166,9 @@ machine_kexec_prepare(struct kimage *image)
/* Mark the control page executable */
set_memory_x((unsigned long) control_code_buffer, 1);
+
+ WRITE_ONCE(riscv_kexec_relocate_entry_pa,
+ __pa_symbol(&riscv_kexec_relocate_entry));
} else {
WRITE_ONCE(riscv_kexec_norelocate_pa,
__pa_symbol(&riscv_kexec_norelocate));
--
2.50.1
* [PATCH v3 9/9] riscv: kexec: Route normal kexec through the trampoline page table
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
` (7 preceding siblings ...)
2026-06-04 13:24 ` [PATCH v3 8/9] riscv: kexec: Add the relocate-trampoline wrapper fangyu.yu
@ 2026-06-04 13:24 ` fangyu.yu
2026-06-04 13:36 ` sashiko-bot
8 siblings, 1 reply; 14+ messages in thread
From: fangyu.yu @ 2026-06-04 13:24 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Anup Patel, Atish Patra, Nick Kossifidis
Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
riscv_kexec_relocate (copied into control_code_buffer) uses an stvec
trick to drop the MMU and land on the PA of the next loop label.
Under VS-mode KVM cannot emulate this single-step transition and the
VCPU dies with "kvm run failed Operation not supported".
Route normal kexec through riscv_kexec_relocate_entry, the trampoline
wrapper added in the previous patch. It drops SATP with PC already on
a PA, then hands off to control_code_buffer where the relocate body
runs with SATP=0.
Drop the stvec trick from the relocate body and pass first_ind_entry
as a physical address since the body now starts with SATP=0. The
".align 2" plus filler "nop" that ensured the PA of the loop top was
4-byte aligned -- required because the legacy stvec trick wrote that
PA into stvec.BASE, whose low two bits are MODE and are discarded by
the hardware -- is no longer load-bearing and is removed as well.
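
The alignment requirement described above can be illustrated with a small userspace model. This is only a sketch: the model_* names and the example address are hypothetical, and the only fact relied on is that the low two bits of stvec hold MODE while the remaining bits hold BASE.

```c
#include <assert.h>
#include <stdint.h>

/* Low two bits of a value written to stvec are MODE, the rest is BASE. */
static uint64_t model_stvec_base(uint64_t written)
{
	return written & ~(uint64_t)0x3;
}

static unsigned int model_stvec_mode(uint64_t written)
{
	return (unsigned int)(written & 0x3);
}

/* Trap target for Direct mode (MODE == 0), the only mode modeled here. */
static uint64_t model_stvec_direct_target(uint64_t written)
{
	assert(model_stvec_mode(written) == 0);
	return model_stvec_base(written);
}
```

With a hypothetical loop label at offset 42 (0x2a) from 0x80200000, the model gives a BASE of 0x80200028 and a MODE of 2, so the trap would not land on the label. Padding the label to offset 44 makes the written value and the trap target agree, which is what the removed ".align 2" and "nop" were for.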
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
arch/riscv/kernel/kexec_relocate.S | 26 ++++++--------------------
arch/riscv/kernel/machine_kexec.c | 27 +++++++++++++++++++--------
2 files changed, 25 insertions(+), 28 deletions(-)
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index 6c624560c9ac..7ffb83ea45fc 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -34,27 +34,13 @@ SYM_CODE_START(riscv_kexec_relocate)
csrw CSR_SIP, zero
/*
- * When we switch SATP.MODE to "Bare" we'll only
- * play with physical addresses. However the first time
- * we try to jump somewhere, the offset on the jump
- * will be relative to pc which will still be on VA. To
- * deal with this we set stvec to the physical address at
- * the start of the loop below so that we jump there in
- * any case.
+ * The trampoline wrapper (riscv_kexec_relocate_entry) has already
+ * dropped the MMU and handed control to us at this PA copy of the
+ * relocate code. From here on the entire loop runs with SATP=0 and
+ * every address (s0, s5, source/dest pointers) is a physical one.
*/
- la s6, 1f
- sub s6, s6, s4
- csrw CSR_STVEC, s6
-
- /*
- * With C-extension, here we get 42 Bytes and the next
- * .align directive would pad zeros here up to 44 Bytes.
- * So manually put a nop here to avoid zeros padding.
- */
- nop
/* Process entries in a loop */
-.align 2
1:
REG_L t0, 0(s0) /* t0 = *image->entry */
addi s0, s0, RISCV_SZPTR /* image->entry++ */
@@ -70,8 +56,8 @@ SYM_CODE_START(riscv_kexec_relocate)
andi t1, t0, 0x2
beqz t1, 2f
andi s0, t0, ~0x2
- csrw CSR_SATP, zero
- jr s6
+ /* MMU is already off; the entry wrapper handled the transition. */
+ j 1b
2:
/* IND_DONE entry ? -> jump to done label */
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index 71688c63af65..82fcb84a03ec 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -164,9 +164,6 @@ machine_kexec_prepare(struct kimage *image)
memcpy(control_code_buffer, riscv_kexec_relocate,
riscv_kexec_relocate_size);
- /* Mark the control page executable */
- set_memory_x((unsigned long) control_code_buffer, 1);
-
WRITE_ONCE(riscv_kexec_relocate_entry_pa,
__pa_symbol(&riscv_kexec_relocate_entry));
} else {
@@ -262,11 +259,15 @@ machine_kexec(struct kimage *image)
{
struct kimage_arch *internal = &image->arch;
unsigned long jump_addr = (unsigned long) image->start;
- unsigned long first_ind_entry = (unsigned long) &image->head;
+ /*
+ * The relocate body runs entirely with the MMU off (the wrapper
+ * drops SATP before jumping into control_code_buffer), so the very
+ * first entry must be a physical address.
+ */
+ unsigned long first_ind_entry = __pa(&image->head);
unsigned long this_cpu_id = __smp_processor_id();
unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
unsigned long fdt_addr = internal->fdt_addr;
- void *control_code_buffer = page_address(image->control_code_page);
riscv_kexec_method kexec_method = NULL;
#ifdef CONFIG_SMP
@@ -274,10 +275,20 @@ machine_kexec(struct kimage *image)
"Some CPUs may be stale, kdump will be unreliable.\n");
#endif
- if (image->type != KEXEC_TYPE_CRASH)
- kexec_method = control_code_buffer;
- else
+ if (image->type != KEXEC_TYPE_CRASH) {
+ kexec_method = (riscv_kexec_method) &riscv_kexec_relocate_entry;
+ /*
+ * Publish the per-image control_code_buffer PA at dispatch
+ * time rather than in machine_kexec_prepare(). machine_kexec()
+ * only runs once the image has been fully loaded and committed
+ * as kexec_image, so the global cannot be left pointing at a
+ * page freed by a failed load.
+ */
+ WRITE_ONCE(riscv_kexec_cc_buffer_pa,
+ __pa(page_address(image->control_code_page)));
+ } else {
kexec_method = (riscv_kexec_method) &riscv_kexec_norelocate;
+ }
pr_notice("Will call new kernel at %08lx from hart id %lx\n",
jump_addr, this_hart_id);
--
2.50.1
* Re: [PATCH v3 9/9] riscv: kexec: Route normal kexec through the trampoline page table
2026-06-04 13:24 ` [PATCH v3 9/9] riscv: kexec: Route normal kexec through the trampoline page table fangyu.yu
@ 2026-06-04 13:36 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-04 13:36 UTC (permalink / raw)
To: fangyu.yu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch removes the `set_memory_x()` call in `machine_kexec_prepare()` but fails to remove the corresponding `set_memory_nx()` call in `machine_kexec_cleanup()`, leading to an unbalanced page protection modification.
--
commit 1f36a552a8c04150c5598c4bdd508c89d475252b
Author: Fangyu Yu <fangyu.yu@linux.alibaba.com>
riscv: kexec: Route normal kexec through the trampoline page table
This commit routes normal kexec through a trampoline wrapper, which drops
SATP with PC already on a PA. As a result, the relocation code now runs
entirely with the MMU off.
> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> index 71688c63af65f..82fcb84a03ec5 100644
> --- a/arch/riscv/kernel/machine_kexec.c
> +++ b/arch/riscv/kernel/machine_kexec.c
> @@ -164,9 +164,6 @@ machine_kexec_prepare(struct kimage *image)
> memcpy(control_code_buffer, riscv_kexec_relocate,
> riscv_kexec_relocate_size);
>
> - /* Mark the control page executable */
> - set_memory_x((unsigned long) control_code_buffer, 1);
> -
[Severity: Medium]
By removing this set_memory_x() call here, does machine_kexec_cleanup()
now have an unbalanced set_memory_nx() call?
Looking at machine_kexec_cleanup() in arch/riscv/kernel/machine_kexec.c:
void
machine_kexec_cleanup(struct kimage *image)
{
...
control_code_buffer = page_address(image->control_code_page);
set_memory_nx((unsigned long)control_code_buffer, 1);
}
Since __set_memory() unconditionally splits the linear mapping on RISC-V,
will leaving this set_memory_nx() call permanently shatter direct map huge
pages down to 4KB PTEs every time a non-crash kexec image is unloaded or
fails to load?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132418.15725-1-fangyu.yu@linux.alibaba.com?part=9
* Re: [PATCH v3 2/9] riscv: kexec: Bound FDT search by source buffer size, not destination
2026-06-04 13:24 ` [PATCH v3 2/9] riscv: kexec: Bound FDT search by source buffer size, not destination fangyu.yu
@ 2026-06-04 13:37 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-04 13:37 UTC (permalink / raw)
To: fangyu.yu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [Critical] In `kexec_file_load`, an attacker can bypass Secure Boot restrictions by providing an initrd that starts with a forged FDT header, overwriting the kernel-constructed FDT.
- [High] The check `if (!internal->fdt_addr)` falsely rejects a valid FDT if it is placed at physical address 0x0.
--
commit 0ee8b0469cc5460085b86fc5826ddc9fa0522c40
Author: Fangyu Yu <fangyu.yu@linux.alibaba.com>
riscv: kexec: Bound FDT search by source buffer size, not destination
This commit changes the FDT size check in machine_kexec_prepare() to use
bufsz instead of memsz. This correctly bounds the read to the provided
source buffer, preventing out-of-bounds reads and potential kernel faults
when a segment is padded with zeros at its destination.
> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> index ea6794c9f4c2d..e6e179cffc447 100644
> --- a/arch/riscv/kernel/machine_kexec.c
> +++ b/arch/riscv/kernel/machine_kexec.c
> @@ -38,21 +38,21 @@ machine_kexec_prepare(struct kimage *image)
>
> /* Find the Flattened Device Tree and save its physical address */
> for (i = 0; i < image->nr_segments; i++) {
> - if (image->segment[i].memsz <= sizeof(fdt))
> + if (image->segment[i].bufsz < sizeof(fdt))
> continue;
>
> if (image->file_mode)
> memcpy(&fdt, image->segment[i].buf, sizeof(fdt));
> else if (copy_from_user(&fdt, image->segment[i].buf, sizeof(fdt)))
> continue;
>
> if (fdt_check_header(&fdt))
> continue;
>
> internal->fdt_addr = (unsigned long) image->segment[i].mem;
> break;
[Severity: Critical]
This is a pre-existing issue, but does this blind search loop in
machine_kexec_prepare() allow a user-provided segment to spoof the FDT?
For kexec_file_load, architecture-specific loading adds the user-provided
initrd segment before the kernel-constructed FDT segment. If a malicious
initrd starts with FDT magic bytes, it seems this loop would accept it and
break early.
Would this cause the next kernel to boot using an attacker-controlled FDT
instead of the kernel-generated one? This could allow injection of arbitrary
bootargs (like init=/bin/sh module.sig_enforce=0) and bypass Secure Boot
restrictions.
> }
>
> if (!internal->fdt_addr) {
[Severity: High]
This is a pre-existing issue, but could the !internal->fdt_addr check in
machine_kexec_prepare() incorrectly reject a valid FDT if it is placed at
physical address 0x0?
On RISC-V platforms where the physical memory base starts at 0x0 (such as
the T-Head TH1520), the FDT segment can legitimately be loaded at physical
address 0x0. The search loop will correctly set internal->fdt_addr = 0, but
this check would treat it as an error and return -EINVAL.
Is there a risk that this false rejection breaks kexec functionality on
platforms where memory starts at 0x0?
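
The concern can be shown with a minimal userspace sketch. It is not a proposed kernel change: struct model_segment and model_find_fdt() are hypothetical names, and the looks_like_fdt flag merely stands in for a successful fdt_check_header() call. The point is only that reporting "found" separately from the address keeps 0x0 usable as a valid result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kexec segment; not the kernel's struct. */
struct model_segment {
	unsigned long mem;	/* destination physical address */
	bool looks_like_fdt;	/* stands in for fdt_check_header() == 0 */
};

/*
 * Returns true and stores the address in *fdt_addr when a segment is
 * accepted. Success is reported separately from the address, so an FDT
 * at physical address 0x0 is not mistaken for "not found".
 */
static bool model_find_fdt(const struct model_segment *segs, size_t n,
			   unsigned long *fdt_addr)
{
	assert(segs != NULL && fdt_addr != NULL);

	for (size_t i = 0; i < n; i++) {
		if (!segs[i].looks_like_fdt)
			continue;
		*fdt_addr = segs[i].mem;
		return true;
	}
	return false;
}
```

A caller in this model would test the boolean return value rather than comparing the stored address against zero.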
> pr_err("Device tree not included in the provided image\n");
> return -EINVAL;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132418.15725-1-fangyu.yu@linux.alibaba.com?part=2
* Re: [PATCH v3 6/9] riscv: kexec: Switch to trampoline page table before norelocate
2026-06-04 13:24 ` [PATCH v3 6/9] riscv: kexec: Switch to trampoline page table before norelocate fangyu.yu
@ 2026-06-04 13:40 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-04 13:40 UTC (permalink / raw)
To: fangyu.yu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] Missing `sfence.vma` after clearing `CSR_SATP` causes a fatal Instruction Page Fault on kexec.
--
commit 4cdd8f4ea42becec74890b3443cd153befa568f3
Author: Fangyu Yu <fangyu.yu@linux.alibaba.com>
riscv: kexec: Switch to trampoline page table before norelocate
This commit updates riscv_kexec_norelocate to act as a two-pass trampoline,
allowing it to drop kernel page tables while still executing from a mapped
address. It also replaces the previous stvec-based exception handoff with a
direct jump to the target entry.
> diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
> index af6b99f5b0fd9..8cfdf6f4032a8 100644
> --- a/arch/riscv/kernel/kexec_relocate.S
> +++ b/arch/riscv/kernel/kexec_relocate.S
[ ... ]
> @@ -198,14 +220,8 @@ SYM_CODE_START(riscv_kexec_norelocate)
> csrw CSR_SCAUSE, zero
> csrw CSR_SSCRATCH, zero
>
> - /*
> - * Switch to physical addressing
> - * This will also trigger a jump to CSR_STVEC
> - * which in this case is the address of the new
> - * kernel.
> - */
> - csrw CSR_STVEC, a2
> csrw CSR_SATP, zero
> + jr a2
>
> SYM_CODE_END(riscv_kexec_norelocate)
[Severity: Critical]
Does this sequence require an sfence.vma instruction after clearing
CSR_SATP?
According to the RISC-V Privileged Architecture specification, writes to the
satp register do not imply any synchronization. A subsequent instruction fetch
may still use the old satp value unless explicitly synchronized with an
sfence.vma instruction.
Without sfence.vma, the CPU pipeline is allowed to fetch the instruction at
a2 (the start of the new kernel) using the old satp value. Because a2 is a
physical address and is not identity-mapped in the trampoline page table,
this fetch could trigger an instruction page fault.
Since the stvec override was removed by this patch, the trap would attempt
to vector to the old kernel's default trap handler. As that handler is also
unmapped in the trampoline, this sequence appears to result in a double fault
and a system hang during kexec.
If a similar pattern is introduced for riscv_kexec_relocate_entry later in
this series, it might also need an sfence.vma to avoid a page fault when
fetching the relocation code.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132418.15725-1-fangyu.yu@linux.alibaba.com?part=6
* Re: [PATCH v3 8/9] riscv: kexec: Add the relocate-trampoline wrapper
2026-06-04 13:24 ` [PATCH v3 8/9] riscv: kexec: Add the relocate-trampoline wrapper fangyu.yu
@ 2026-06-04 13:46 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-04 13:46 UTC (permalink / raw)
To: fangyu.yu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Missing `sfence.vma` after `csrw CSR_SATP, zero` allows unsynchronized execution to use the old trampoline page table, causing page faults during kexec.
- [High] Incorrect physical address calculation for `riscv_kexec_cc_buffer_pa` using PC-relative `la` under `CONFIG_XIP_KERNEL`.
--
commit 6b76f586f13ebcc7aa4fa66a977acaa2bac3a894
Author: Fangyu Yu <fangyu.yu@linux.alibaba.com>
riscv: kexec: Add the relocate-trampoline wrapper
This patch adds the riscv_kexec_relocate_entry wrapper to the kexec
trampoline text section. It performs the transition to the trampoline PGD,
drops the MMU, and jumps to the physical address of the control code buffer.
> diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
> index 8cfdf6f4032a8..6c624560c9ac0 100644
> --- a/arch/riscv/kernel/kexec_relocate.S
> +++ b/arch/riscv/kernel/kexec_relocate.S
[ ... ]
> @@ -225,6 +225,42 @@ SYM_CODE_START(riscv_kexec_norelocate)
>
> SYM_CODE_END(riscv_kexec_norelocate)
>
> +.extern riscv_kexec_relocate_entry_pa
> +.extern riscv_kexec_cc_buffer_pa
> +.section ".kexec.tramp.text", "ax"
> +SYM_CODE_START(riscv_kexec_relocate_entry)
[ ... ]
> +1:
> + /*
> + * Now executing at the PA of this wrapper with the trampoline pgd
> + * installed (identity-mapped). Drop the MMU; PC stays valid because
> + * it is already a PA.
> + */
> + csrw CSR_SATP, zero
[Severity: High]
Does dropping the MMU here require an sfence.vma instruction to synchronize
the pipeline?
Without it, the hardware might defer the translation mode switch and use the
old trampoline page table for subsequent instructions.
Since the trampoline page table only identity-maps the single trampoline
.text page, the immediate memory read from the unmapped .bss region could
trigger a Load Page Fault.
It looks like riscv_kexec_norelocate might have a similar issue where it
executes a jump to the crash kernel without an sfence.vma, which could cause
a nested Instruction Page Fault.
> +
> + /* Jump to the PA of control_code_buffer to run the relocate body. */
> + la t0, riscv_kexec_cc_buffer_pa
> + REG_L t0, 0(t0)
[Severity: High]
Will this PC-relative physical address calculation work correctly under
CONFIG_XIP_KERNEL?
The la instruction expands to a PC-relative offset based strictly on virtual
addresses.
Under CONFIG_XIP_KERNEL, .text executes from ROM and .bss resides in RAM,
so their virtual offset does not match their physical distance.
When executed with the MMU off, adding this virtual offset to the physical
ROM PC might yield an invalid physical address, causing the REG_L
instruction to crash the kexec transition.
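
The arithmetic behind this question can be modeled in userspace. The sketch below assumes that a PC-relative "la" adds the link-time distance between symbol and instruction to the PC the instruction actually runs at. model_la() and all addresses used with it are hypothetical illustrations, not values taken from any real platform.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a PC-relative address computation: the link-time (virtual)
 * distance between the symbol and the instruction is added to the PC
 * the instruction is actually executing at.
 */
static uint64_t model_la(uint64_t run_pc, uint64_t link_pc, uint64_t link_sym)
{
	/* RISC-V instruction addresses are at least 2-byte aligned. */
	assert((run_pc & 1) == 0 && (link_pc & 1) == 0);

	return run_pc + (link_sym - link_pc);
}
```

When text and data share one VA-to-PA offset, for example text linked at 0xffffffff80001000 and running at PA 0x80201000 with a symbol linked 0x800000 bytes later, the model returns 0x80a01000, which is the symbol's PA. In an XIP-like layout where the same text runs from 0x20001000 but the symbol lives in RAM at 0x80001000, the model returns 0x20801000, which does not match.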
> + jr t0
> +SYM_CODE_END(riscv_kexec_relocate_entry)
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260604132418.15725-1-fangyu.yu@linux.alibaba.com?part=8