Kernel KVM virtualization development
* [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode
@ 2026-06-04 13:24 fangyu.yu
From: fangyu.yu @ 2026-06-04 13:24 UTC
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Anup Patel, Atish Patra, Nick Kossifidis
  Cc: Song Shuai, Björn Töpel, Ard Biesheuvel, Conor Dooley,
	Arnd Bergmann, Thomas Zimmermann, Richard Lyu, Nam Cao,
	Jisheng Zhang, Nathan Chancellor, guoren, linux-riscv,
	linux-kernel, kexec, kvm-riscv, kvm, Fangyu Yu

From: Fangyu Yu <fangyu.yu@linux.alibaba.com>

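In a RISC-V kernel, both kexec and crashdump must hand off execution
to the next kernel after tearing down the current kernel address space.
The legacy code does this in a single step: it points stvec at the
physical address of the continuation and writes satp=0, relying on the
next instruction fetch from the now-untranslated kernel VA to fault and
redirect pc through stvec. Under virtualization, however, the guest uses
two-stage address translation, and that fault is taken by the hypervisor
instead of being delivered to the guest's stvec. The "csrw satp,0 +
stvec redirect" sequence therefore fails with "kvm run failed Operation
not supported" and the VCPU dies.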

This patch set introduces a dedicated kexec trampoline text section and
builds a minimal trampoline page table for it. Both handoffs are then
reworked into a two-pass trampoline:

    1. Enter via the kernel VA, install the trampoline page table,
       and jump to the entry stub's trampoline alias, where VA == PA;
    2. Continue with pc already holding a PA, clear satp with
       csrw satp,0 (now safe, because pc no longer needs to be
       redirected through stvec), and jump directly to the target:
       either the crash kernel entry (crash path) or the per-image
       control_code_buffer, which runs the relocate body with SATP=0
       throughout (normal path).
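The ordering of the two passes can be illustrated with a toy model in C. It is not kernel code: the regimes, the example addresses, and the names fetch_ok() and first_fault() are hypothetical, and the model only tracks whether each instruction fetch has a valid translation. The legacy sequence faults on the fetch right after satp is cleared (on bare metal this is the fault the stvec redirect depends on; in a guest it is not delivered to stvec), while the two-pass sequence never faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example addresses of the entry stub (not real values). */
#define STUB_VA 0xffffffff80200000ULL   /* kernel virtual address */
#define STUB_PA 0x0000000080200000ULL   /* (guest-)physical address */

/* Translation regime in effect for an instruction fetch. */
enum regime {
    KERNEL_PT,   /* normal kernel page table: only the kernel VA is mapped */
    TRAMP_PT,    /* trampoline page table: kernel VA plus VA == PA alias */
    BARE,        /* satp = 0: pc itself is used as a physical address */
};

/* Toy check: can an instruction be fetched from pc under regime r? */
static bool fetch_ok(enum regime r, uint64_t pc)
{
    switch (r) {
    case KERNEL_PT: return pc == STUB_VA;
    case TRAMP_PT:  return pc == STUB_VA || pc == STUB_PA;
    case BARE:      return pc == STUB_PA;
    }
    return false;
}

/* One instruction fetch: the regime in effect and the pc it uses. */
struct step {
    enum regime r;
    uint64_t pc;
};

/* Index of the first fetch that would fault, or -1 if none does. */
static int first_fault(const struct step *s, int n)
{
    assert(s != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (!fetch_ok(s[i].r, s[i].pc))
            return i;
    return -1;
}

/* Legacy single step: satp is cleared while pc still holds the kernel VA. */
static const struct step legacy[] = {
    { KERNEL_PT, STUB_VA },  /* csrw satp, 0 */
    { BARE,      STUB_VA },  /* next fetch: kernel VA used as a PA, faults */
};

/* Two-pass trampoline: pc is moved onto a PA before satp is cleared. */
static const struct step two_pass[] = {
    { KERNEL_PT, STUB_VA },  /* enter via kernel VA, install trampoline PT */
    { TRAMP_PT,  STUB_VA },  /* kernel VA is still mapped */
    { TRAMP_PT,  STUB_PA },  /* jump to the VA == PA alias */
    { BARE,      STUB_PA },  /* csrw satp, 0: pc is already a PA */
};
```

Here first_fault(legacy, 2) returns 1 and first_fault(two_pass, 4) returns -1.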

With this series, kexec and crashdump both work in RISC-V guests that
run under two-stage translation.

Tested on QEMU virt under two configurations:

  * HS-mode bare (QEMU TCG): regression check
      - normal kexec: kexec -l/-e succeeds, second kernel boots and
                      prints the userspace SECOND BOOT marker.
      - crash kdump: panic triggers crash kernel boot, /proc/vmcore
                     opens cleanly in crash(1) and shows the panic
                     backtrace.

  * VS-mode (L0 x86 + QEMU TCG -> L1 riscv64 + KVM -> L2)
      Before this series, both paths die with
          kvm run failed Operation not supported
      and an all-zero M-mode register dump on the SATP transition.
      After this series, both paths succeed end-to-end and the
      vmcore opens cleanly in crash.

---
Changes in v3 (addressing the Sashiko AI review):
    - Add two new patches carrying Fixes: tags at the start of the series:
      #1: machine_kexec_cleanup() was empty, so the set_memory_x()
          call in prepare() leaked an executable direct-map page back
          to the buddy allocator on kexec -u (W^X bypass). Fix: add
          set_memory_nx() in cleanup.
      #2: The FDT search in machine_kexec_prepare() skipped segments
          with memsz <= sizeof(fdt) but then read sizeof(fdt) bytes
          from segment[i].buf, which holds only bufsz bytes and can be
          smaller than memsz. Fix: check bufsz instead.
    - Inline the .kexec.tramp.text section definition directly into
      vmlinux.lds.S instead of using a macro in image-vars.h.
    - Rewrite map_tramp_page() to share a single set of lower-level
      page tables between the VA and PA mappings (5 BSS pages instead
      of 9), with a collision-safe walker that only populates entries
      that are still zero. Add Sv32 support.
    - Link to v2:
      https://lore.kernel.org/linux-riscv/20260526125009.2404-1-fangyu.yu@linux.alibaba.com/
    - Link to v1:
      https://lore.kernel.org/linux-riscv/20260324114527.91494-1-fangyu.yu@linux.alibaba.com/
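
The W^X issue fixed in #1 is a pairing problem: every set_memory_x() done in prepare() needs a matching set_memory_nx() before the page goes back to the allocator. The sketch below models only that ordering; struct sim_page and the sim_* functions are hypothetical stand-ins, not the kernel's set_memory_*() or page allocator API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one direct-map page used as the control code page. */
struct sim_page {
    bool exec;     /* executable in the direct map */
    bool in_use;   /* owned by kexec rather than by the allocator */
};

static void sim_set_memory_x(struct sim_page *p)  { p->exec = true; }
static void sim_set_memory_nx(struct sim_page *p) { p->exec = false; }

/* The allocator must never get an executable page back (W^X). */
static void sim_free_page(struct sim_page *p)
{
    assert(!p->exec);
    p->in_use = false;
}

/* prepare(): make the control code page executable. */
static void sim_kexec_prepare(struct sim_page *p)
{
    p->in_use = true;
    sim_set_memory_x(p);
}

/* cleanup(): undo prepare() before the page is released. */
static void sim_kexec_cleanup(struct sim_page *p)
{
    sim_set_memory_nx(p);
}
```

Calling sim_free_page() without sim_kexec_cleanup() first trips the assertion, which models the leak that #1 closes.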

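The bound fixed in #2 can be sketched in userspace C. struct sim_segment and find_fdt_segment() are hypothetical; only the 40-byte struct fdt_header size and the big-endian 0xd00dfeed magic come from the devicetree blob format, and the sketch checks just the magic rather than the full header. The point is that the size check must use bufsz, the number of bytes actually available in the source buffer, rather than memsz.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDT_MAGIC       0xd00dfeedU  /* big-endian magic at offset 0 */
#define FDT_HEADER_SIZE 40           /* sizeof(struct fdt_header): 10 x u32 */

/* Hypothetical stand-in for a kexec segment: buf holds bufsz bytes of
 * source data; memsz is the (possibly larger) destination size. */
struct sim_segment {
    const void *buf;
    size_t bufsz;
    size_t memsz;
};

/* Load a big-endian 32-bit value. */
static uint32_t be32_load(const uint8_t *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Index of the first segment whose source buffer starts with an FDT
 * header, or -1. The size check uses bufsz, not memsz. */
static int find_fdt_segment(const struct sim_segment *segs, int nr)
{
    assert(segs != NULL || nr == 0);
    for (int i = 0; i < nr; i++) {
        uint8_t hdr[FDT_HEADER_SIZE];

        if (segs[i].bufsz < sizeof(hdr))
            continue;                    /* cannot read a full header */
        memcpy(hdr, segs[i].buf, sizeof(hdr));
        if (be32_load(hdr) == FDT_MAGIC)
            return i;
    }
    return -1;
}
```

A segment with memsz = 4096 but a 4-byte buffer passes the old memsz check, yet reading 40 bytes from it would overrun the buffer; the bufsz check skips it.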

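The shared-table layout described for map_tramp_page() can be illustrated with a userspace simulation. map_shared(), walk(), and the pool array are hypothetical and cover only the 512-entry formats (Sv39/Sv48/Sv57), not Sv32. With one table per level shared by both mappings, one page per level suffices; assuming the BSS reservation is sized for Sv57 (five levels), that gives 5 pages, versus 1 + 2*4 = 9 when the VA and PA mappings each get their own lower-level tables. A slot that both walks reach is written once and only checked on the second walk.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define PTRS_PER_TABLE 512          /* 9 VPN bits per level */
#define PTE_V          (1ULL << 0)
#define PTE_R          (1ULL << 1)
#define PTE_X          (1ULL << 3)
#define PTE_PPN_SHIFT  10
#define MAX_LEVELS     5            /* Sv57 */

/* Simulated BSS: pool[levels - 1] is the root and pool[0] the leaf table;
 * one table per level is shared by both mappings. */
static uint64_t pool[MAX_LEVELS][PTRS_PER_TABLE];

/* VPN index of addr at the given level (0 = leaf level). */
static unsigned vpn(uint64_t addr, int level)
{
    return (unsigned)(addr >> (PAGE_SHIFT + 9 * level)) & (PTRS_PER_TABLE - 1);
}

/* Map addr -> target_ppn, writing only slots that are still zero. */
static void map_shared(uint64_t addr, uint64_t target_ppn, int levels)
{
    for (int level = levels - 1; level >= 0; level--) {
        uint64_t *slot = &pool[level][vpn(addr, level)];
        uint64_t pte = level == 0
            ? (target_ppn << PTE_PPN_SHIFT) | PTE_X | PTE_R | PTE_V   /* leaf */
            : ((uint64_t)(level - 1) << PTE_PPN_SHIFT) | PTE_V;       /* next table */

        if (*slot == 0)
            *slot = pte;
        else
            assert(*slot == pte);   /* a collision must agree with what we'd write */
    }
}

/* Translate addr through the tables; returns the leaf PPN or -1. */
static int64_t walk(uint64_t addr, int levels)
{
    int table = levels - 1;

    for (int level = levels - 1; level >= 0; level--) {
        uint64_t pte = pool[table][vpn(addr, level)];

        if (!(pte & PTE_V))
            return -1;
        if (pte & (PTE_R | PTE_X))
            return (int64_t)(pte >> PTE_PPN_SHIFT);   /* leaf PTE */
        table = (int)(pte >> PTE_PPN_SHIFT);          /* fake PPN = pool index */
    }
    return -1;
}
```

With the example addresses 0xffffffff80200000 (VA) and 0x80200000 (PA) under Sv39 (levels = 3), the root gets two entries (indices 510 and 2), both walks collide on index 1 of the middle table and index 0 of the leaf table, and walk() resolves both addresses to PPN 0x80200.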
Fangyu Yu (9):
  riscv: kexec: Reset executable bit on the control code page in cleanup
  riscv: kexec: Bound FDT search by source buffer size, not destination
  riscv: Add kexec trampoline text section to vmlinux.lds.S
  riscv: kexec: Place norelocate trampoline into .kexec.tramp.text
  riscv: kexec: Build trampoline page tables for crash kernel entry
  riscv: kexec: Switch to trampoline page table before norelocate
  riscv: kexec: Always build the trampoline page table
  riscv: kexec: Add the relocate-trampoline wrapper
  riscv: kexec: Route normal kexec through the trampoline page table

 arch/riscv/include/asm/kexec.h     |   5 +
 arch/riscv/kernel/kexec_relocate.S |  92 +++++++++++-----
 arch/riscv/kernel/machine_kexec.c  | 171 +++++++++++++++++++++++++++--
 arch/riscv/kernel/vmlinux.lds.S    |  10 ++
 4 files changed, 241 insertions(+), 37 deletions(-)

-- 
2.50.1




Thread overview: 14+ messages
2026-06-04 13:24 [PATCH v3 0/9] riscv: kexec: Make kexec/kdump robust under VS-mode fangyu.yu
2026-06-04 13:24 ` [PATCH v3 1/9] riscv: kexec: Reset executable bit on the control code page in cleanup fangyu.yu
2026-06-04 13:24 ` [PATCH v3 2/9] riscv: kexec: Bound FDT search by source buffer size, not destination fangyu.yu
2026-06-04 13:37   ` sashiko-bot
2026-06-04 13:24 ` [PATCH v3 3/9] riscv: Add kexec trampoline text section to vmlinux.lds.S fangyu.yu
2026-06-04 13:24 ` [PATCH v3 4/9] riscv: kexec: Place norelocate trampoline into .kexec.tramp.text fangyu.yu
2026-06-04 13:24 ` [PATCH v3 5/9] riscv: kexec: Build trampoline page tables for crash kernel entry fangyu.yu
2026-06-04 13:24 ` [PATCH v3 6/9] riscv: kexec: Switch to trampoline page table before norelocate fangyu.yu
2026-06-04 13:40   ` sashiko-bot
2026-06-04 13:24 ` [PATCH v3 7/9] riscv: kexec: Always build the trampoline page table fangyu.yu
2026-06-04 13:24 ` [PATCH v3 8/9] riscv: kexec: Add the relocate-trampoline wrapper fangyu.yu
2026-06-04 13:46   ` sashiko-bot
2026-06-04 13:24 ` [PATCH v3 9/9] riscv: kexec: Route normal kexec through the trampoline page table fangyu.yu
2026-06-04 13:36   ` sashiko-bot
