* [PATCH v5 00/10] Implement MPIPL for PowerNV
@ 2026-03-10 12:46 Aditya Gupta
2026-03-10 12:46 ` [PATCH v5 01/10] ppc/pnv: Move SBE host doorbell function to top of file Aditya Gupta
` (11 more replies)
0 siblings, 12 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
Overview
=========
Implement MPIPL (Memory Preserving IPL, aka fadump) on the PowerNV machine
in QEMU.

Fadump is an alternative dump mechanism to kdump. With fadump, the firmware
does a memory-preserving boot and the second kernel (crashkernel) is booted
fresh, like a normal system reset. With kdump, the crashed kernel itself
loads the second kernel.

MPIPL on PowerNV is similar to fadump on pSeries. The idea is the same,
memory preservation: on PowerNV we are assisted by the SBE (Self Boot
Engine) and Hostboot, while on pSeries we are assisted by PHyp (Power
Hypervisor).

To implement this in baremetal/PowerNV QEMU, we need to export an
"ibm,opal/dump" node in the device tree to tell the kernel that MPIPL is
supported.

Once the kernel sees the support and "fadump=on" is passed on the command
line, the kernel registers the memory regions to preserve with skiboot.
The kernel sends this data using OPAL calls, after which skiboot/OPAL saves
the memory region details to the MDST and MDDT tables (S: source,
D: destination).

Then, in the event of a kernel crash, the kernel initiates MPIPL with
another OPAL call (opal_cec_reboot2); this request goes to skiboot.
Skiboot then triggers the "S0 interrupt" to the SBE (Self Boot Engine),
along with OPAL's relocated base address.
The SBE then stops all core clocks and runs only the ISteps needed for a
memory-preserving boot.

Then hostboot comes up and, with the help of the relocated base address,
accesses the MDST and MDDT tables and preserves the memory regions
according to the data in these tables.
After preserving, it writes the preserved memory region details to the
MDRT table (R: result), so that the kernel knows where, and whether, a
memory region was preserved.

Both the SBE's and hostboot's responsibilities are implemented in the SBE
code in QEMU.

Then, on the second kernel (crashkernel) boot, OPAL passes the "mpipl-boot"
property so that the kernel knows a dump is active, which the kernel then
exports in /proc/vmcore.
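The sequence described above can be summarised as a small state machine.
This is only a sketch for orientation: the enum names and the
mpipl_next() helper are hypothetical and do not appear in the series.

```c
#include <assert.h>

/* Hypothetical stages of one MPIPL cycle, following the overview text. */
enum mpipl_stage {
    MPIPL_IDLE,        /* nothing registered yet */
    MPIPL_REGISTERED,  /* kernel registered regions; MDST/MDDT are filled */
    MPIPL_S0_RAISED,   /* skiboot raised S0 after opal_cec_reboot2 */
    MPIPL_PRESERVED,   /* regions copied and MDRT written */
    MPIPL_BOOT,        /* second kernel boots and sees "mpipl-boot" */
};

/* Hypothetical events that move the cycle forward. */
enum mpipl_event {
    EV_KERNEL_REGISTERS,
    EV_CEC_REBOOT2,
    EV_PRESERVE_DONE,
    EV_RESET,
};

/* Return the next stage; an event that does not apply leaves it unchanged. */
static enum mpipl_stage mpipl_next(enum mpipl_stage s, enum mpipl_event ev)
{
    switch (s) {
    case MPIPL_IDLE:
        return ev == EV_KERNEL_REGISTERS ? MPIPL_REGISTERED : s;
    case MPIPL_REGISTERED:
        return ev == EV_CEC_REBOOT2 ? MPIPL_S0_RAISED : s;
    case MPIPL_S0_RAISED:
        return ev == EV_PRESERVE_DONE ? MPIPL_PRESERVED : s;
    case MPIPL_PRESERVED:
        return ev == EV_RESET ? MPIPL_BOOT : s;
    default:
        return s;
    }
}
```

In this sketch a crash without prior registration (EV_CEC_REBOOT2 while
MPIPL_IDLE) does not start a preserving boot.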
Testing
====================
1. Git tree for testing: https://gitlab.com/adi-g15-ibm/qemu/tree/fadump-powernv-v5
2. Gitlab pipeline: https://gitlab.com/adi-g15-ibm/qemu/-/pipelines/2375470651
3. Analysing generated vmcore:
# ls -lh /proc/vmcore
-r-------- 1 root root 4.5G Mar 10 12:30 /proc/vmcore
# file /proc/vmcore
/proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style
# crash vmlinux-38fec10eb60d-network vmcore-powernv-10mar26
...
KERNEL: vmlinux-38fec10eb60d-network
DUMPFILE: vmcore-powernv-10mar26
CPUS: 2
DATE: Thu Jan 1 05:30:00 IST 1970
UPTIME: 00:00:50
LOAD AVERAGE: 0.57, 0.19, 0.07
TASKS: 83
NODENAME: buildroot
RELEASE: 6.14.0
VERSION: #1 SMP Thu Apr 3 08:06:13 CDT 2025
MACHINE: ppc64le (1000 Mhz)
MEMORY: 6 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 238
COMMAND: "sh"
TASK: c00000000a0f3200 [THREAD_INFO: c00000000a0f3200]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> # ps and kmem -i works
Changelog
====================
v4 -> v5:
* #4/10: set chunk_id=0 before copying
* #7/10: remove unnecessary bool check, ie. 'if (b1) b2=b1 else b2=!b1' => 'b2=b1'
v3 -> v4:
* #2/10: s/recieves/receives
* #7/10: remove empty line at EOF
v2 -> v3:
* rebase to upstream, changes in patches below
* #2/10: no code change. add comment that skiboot triggers S0
* #3/10: stash command: handle invalid skiboot_base sent by guest
* #4/10: s/src_len/data_len/
* #4/10: use TARGET_FMT_lx/PRIx64 instead of %lx to prevent build errors
* #4/10: stop copying chunks once copying a chunk fails
* #5/10: use address_space_{read,write} instead of cpu_physical_memory_{read,write}
* #5/10: add more SPRs to be saved, same set of SPRs as spapr FADump, except CR and FPSCR
* #7/10: only export "mpipl-boot" property if preserving cpu states and writing MDRT was successful, otherwise continue with normal reboot
* #7/10: use address_space_{read,write} instead of cpu_physical_memory_{read,write}
* #8/10: reword commit description to mention fw-load-area, no code change
* #10/10: add entry in MAINTAINERS file
Aditya Gupta (10):
ppc/pnv: Move SBE host doorbell function to top of file
ppc/mpipl: Implement S0 SBE interrupt
ppc/pnv: Handle stash command in PowerNV SBE
pnv/mpipl: Preserve memory regions as per MDST/MDDT tables
pnv/mpipl: Preserve CPU registers after crash
pnv/mpipl: Set thread entry size to be allocated by firmware
pnv/mpipl: Write the preserved CPU and MDRT state
pnv/mpipl: Enable MPIPL support
tests/functional: Add test for MPIPL in PowerNV
MAINTAINERS: Add entry for MPIPL (PowerNV)
MAINTAINERS | 8 +
hw/ppc/meson.build | 1 +
hw/ppc/pnv.c | 98 ++++++
hw/ppc/pnv_mpipl.c | 482 ++++++++++++++++++++++++++
hw/ppc/pnv_sbe.c | 84 ++++-
include/hw/ppc/pnv.h | 7 +
include/hw/ppc/pnv_mpipl.h | 168 +++++++++
tests/functional/ppc64/test_fadump.py | 35 +-
8 files changed, 852 insertions(+), 31 deletions(-)
create mode 100644 hw/ppc/pnv_mpipl.c
create mode 100644 include/hw/ppc/pnv_mpipl.h
--
2.53.0
* [PATCH v5 01/10] ppc/pnv: Move SBE host doorbell function to top of file
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
Move 'pnv_sbe_set_host_doorbell' as is to above
'pnv_sbe_power9_xscom_ctrl_write'.
This is needed because, in a later patch, the S0 interrupt implementation
uses 'pnv_sbe_set_host_doorbell', so the host doorbell function must be
defined before 'pnv_sbe_power9_xscom_ctrl_write', where the S0 interrupt
is implemented.
No functional change.
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/pnv_sbe.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/hw/ppc/pnv_sbe.c b/hw/ppc/pnv_sbe.c
index 27383ce6837e..247617338a0d 100644
--- a/hw/ppc/pnv_sbe.c
+++ b/hw/ppc/pnv_sbe.c
@@ -80,6 +80,15 @@
#define SBE_CONTROL_REG_S0 PPC_BIT(14)
#define SBE_CONTROL_REG_S1 PPC_BIT(15)
+static void pnv_sbe_set_host_doorbell(PnvSBE *sbe, uint64_t val)
+{
+ val &= SBE_HOST_RESPONSE_MASK; /* Is this right? What does HW do? */
+ sbe->host_doorbell = val;
+
+ trace_pnv_sbe_reg_set_host_doorbell(val);
+ qemu_set_irq(sbe->psi_irq, !!val);
+}
+
struct sbe_msg {
uint64_t reg[4];
};
@@ -125,15 +134,6 @@ static const MemoryRegionOps pnv_sbe_power9_xscom_ctrl_ops = {
.endianness = DEVICE_BIG_ENDIAN,
};
-static void pnv_sbe_set_host_doorbell(PnvSBE *sbe, uint64_t val)
-{
- val &= SBE_HOST_RESPONSE_MASK; /* Is this right? What does HW do? */
- sbe->host_doorbell = val;
-
- trace_pnv_sbe_reg_set_host_doorbell(val);
- qemu_set_irq(sbe->psi_irq, !!val);
-}
-
/* SBE Target Type */
#define SBE_TARGET_TYPE_PROC 0x00
#define SBE_TARGET_TYPE_EX 0x01
--
2.53.0
* [PATCH v5 02/10] ppc/mpipl: Implement S0 SBE interrupt
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
During MPIPL (aka fadump), after a kernel crash, the kernel makes the
opal_cec_reboot2 OPAL call, signifying an abnormal termination.
When OPAL receives this call, it triggers the SBE S0 interrupt to start
an MPIPL boot.
Currently the S0 interrupt is unimplemented in QEMU.
Implement the S0 interrupt as 'pause_vcpus' + 'guest_reset' in QEMU, as
the SBE's implementation of S0 seems to be basically "stop all clocks" and
then "host reset".
The pause_vcpus part is added in a later patch, when register preserving
support is added.
See 'stopClocksS0' in the SBE source code for more information.
Also log both S0 and S1 interrupts.
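For readers unfamiliar with IBM bit numbering, the following standalone
sketch shows how the S0/S1 values written to the control register decode.
PPC_BIT is reproduced here as I understand QEMU defines it (bit 0 is the
most significant bit); the sbe_ctrl_action enum and
sbe_decode_ctrl_write() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

#define SBE_CONTROL_REG_S0 PPC_BIT(14)
#define SBE_CONTROL_REG_S1 PPC_BIT(15)

/* Hypothetical action codes, not part of the patch. */
enum sbe_ctrl_action {
    SBE_ACTION_MPIPL,     /* S0: stop clocks, then memory-preserving reset */
    SBE_ACTION_LOG_ONLY,  /* S1: only logged */
    SBE_ACTION_UNKNOWN,   /* any other value */
};

/* Mirror the exact-match dispatch used for writes to the control register. */
static enum sbe_ctrl_action sbe_decode_ctrl_write(uint64_t val)
{
    switch (val) {
    case SBE_CONTROL_REG_S0:
        return SBE_ACTION_MPIPL;
    case SBE_CONTROL_REG_S1:
        return SBE_ACTION_LOG_ONLY;
    default:
        return SBE_ACTION_UNKNOWN;
    }
}
```

Because the dispatch matches the whole value, a write with both S0 and S1
set falls into the unknown case in this sketch.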
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/meson.build | 1 +
hw/ppc/pnv_mpipl.c | 26 ++++++++++++++++++++++++++
hw/ppc/pnv_sbe.c | 29 +++++++++++++++++++++++++++++
include/hw/ppc/pnv.h | 6 ++++++
include/hw/ppc/pnv_mpipl.h | 19 +++++++++++++++++++
5 files changed, 81 insertions(+)
create mode 100644 hw/ppc/pnv_mpipl.c
create mode 100644 include/hw/ppc/pnv_mpipl.h
diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index f7dac87a2a48..c61fba4ec8f2 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -56,6 +56,7 @@ ppc_ss.add(when: 'CONFIG_POWERNV', if_true: files(
'pnv_pnor.c',
'pnv_nest_pervasive.c',
'pnv_n1_chiplet.c',
+ 'pnv_mpipl.c',
))
# PowerPC 4xx boards
ppc_ss.add(when: 'CONFIG_PPC405', if_true: files(
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
new file mode 100644
index 000000000000..d8c9b7a428b7
--- /dev/null
+++ b/hw/ppc/pnv_mpipl.c
@@ -0,0 +1,26 @@
+/*
+ * Emulation of MPIPL (Memory Preserving Initial Program Load), aka fadump
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "system/runstate.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_mpipl.h"
+
+void do_mpipl_preserve(PnvMachineState *pnv)
+{
+ /* Mark next boot as Memory-preserving boot */
+ pnv->mpipl_state.is_next_boot_mpipl = true;
+
+ /*
+ * Do a guest reset.
+ * Next reset will see 'is_next_boot_mpipl' as true, and trigger MPIPL
+ *
+ * Requirement:
+ * GUEST_RESET is expected to NOT clear the memory, as is the case when
+ * this is merged
+ */
+ qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+}
diff --git a/hw/ppc/pnv_sbe.c b/hw/ppc/pnv_sbe.c
index 247617338a0d..5a2b3342d199 100644
--- a/hw/ppc/pnv_sbe.c
+++ b/hw/ppc/pnv_sbe.c
@@ -26,6 +26,9 @@
#include "hw/ppc/pnv.h"
#include "hw/ppc/pnv_xscom.h"
#include "hw/ppc/pnv_sbe.h"
+#include "hw/ppc/pnv_mpipl.h"
+#include "system/cpus.h"
+#include "system/runstate.h"
#include "trace.h"
/*
@@ -113,11 +116,37 @@ static uint64_t pnv_sbe_power9_xscom_ctrl_read(void *opaque, hwaddr addr,
static void pnv_sbe_power9_xscom_ctrl_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
{
+ PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
+ PnvSBE *sbe = opaque;
uint32_t offset = addr >> 3;
trace_pnv_sbe_xscom_ctrl_write(addr, val);
switch (offset) {
+ case SBE_CONTROL_REG_RW:
+ switch (val) {
+ case SBE_CONTROL_REG_S0:
+ qemu_log_mask(LOG_UNIMP, "SBE: S0 Interrupt triggered\n");
+
+ pnv_sbe_set_host_doorbell(sbe, sbe->host_doorbell | SBE_HOST_RESPONSE_MASK);
+
+ /* Preserve memory regions and CPU state, if MPIPL is registered */
+ do_mpipl_preserve(pnv);
+
+ /*
+ * Control may not come back here as 'do_mpipl_preserve' triggers
+ * a guest reboot
+ */
+ break;
+ case SBE_CONTROL_REG_S1:
+ qemu_log_mask(LOG_UNIMP, "SBE: S1 Interrupt triggered\n");
+ break;
+ default:
+ qemu_log_mask(LOG_UNIMP,
+ "SBE: CONTROL_REG_RW: Unknown value: 0x%"
+ HWADDR_PRIx "\n", val);
+ }
+ break;
default:
qemu_log_mask(LOG_UNIMP, "SBE Unimplemented register: Ox%"
HWADDR_PRIx "\n", addr >> 3);
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 24f8843a4090..7d73629f112a 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -25,6 +25,7 @@
#include "hw/core/sysbus.h"
#include "hw/ipmi/ipmi.h"
#include "hw/ppc/pnv_pnor.h"
+#include "hw/ppc/pnv_mpipl.h"
#define TYPE_PNV_CHIP "pnv-chip"
@@ -111,6 +112,8 @@ struct PnvMachineState {
bool big_core;
bool lpar_per_core;
+
+ MpiplPreservedState mpipl_state;
};
PnvChip *pnv_get_chip(PnvMachineState *pnv, uint32_t chip_id);
@@ -290,4 +293,7 @@ void pnv_bmc_set_pnor(IPMIBmc *bmc, PnvPnor *pnor);
#define PNV11_OCC_SENSOR_BASE(chip) PNV10_OCC_SENSOR_BASE(chip)
+/* MPIPL helpers */
+void do_mpipl_preserve(PnvMachineState *pnv);
+
#endif /* PPC_PNV_H */
diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
new file mode 100644
index 000000000000..c544984dc76d
--- /dev/null
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -0,0 +1,19 @@
+/*
+ * Emulation of MPIPL (Memory Preserving Initial Program Load), aka fadump
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef PNV_MPIPL_H
+#define PNV_MPIPL_H
+
+#include "qemu/osdep.h"
+
+typedef struct MpiplPreservedState MpiplPreservedState;
+
+/* Preserved state to be saved in PnvMachineState */
+struct MpiplPreservedState {
+ bool is_next_boot_mpipl;
+};
+
+#endif
--
2.53.0
* [PATCH v5 03/10] ppc/pnv: Handle stash command in PowerNV SBE
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
Previously, the SBE_CMD_STASH_MPIPL_CONFIG command was not handled, so
skiboot did not get any response from the SBE:
[ 106.350742821,3] SBE: Message timeout [chip id = 0], cmd = d7, subcmd = 7
[ 106.352067746,3] SBE: Failed to send stash MPIPL config [chip id = 0x0, rc = 254]
Fix this by handling the command in the PowerNV SBE and sending a
response, so skiboot knows the SBE has handled the STASH command.
The stashed skiboot base is later used to access the relocated MDST/MDDT
tables when MPIPL is implemented.
The purpose of stashing the relocated base address is explained in the
following skiboot commit:
author Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Fri Jul 12 16:47:51 2019 +0530
committer Oliver O'Halloran <oohall@gmail.com> Thu Aug 15 17:53:39 2019 +1000
SBE: Send OPAL relocated base address to SBE
OPAL relocates itself during boot. During memory preserving IPL hostboot needs
to access relocated OPAL base address to get MDST, MDDT tables. Hence send
relocated base address to SBE via 'stash MPIPL config' chip-op. During next
IPL SBE will send stashed data to hostboot... so that hostboot can access
these data.
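The handling described above boils down to validating the guest-supplied
base against the RAM size before remembering it. A minimal standalone
sketch follows; the struct, the function name and the key value 0x03 are
assumptions made for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed key value, for illustration only. */
#define STASH_KEY_SKIBOOT_BASE 0x03

/* Hypothetical holder for the stashed value. */
struct stash_state {
    uint64_t skiboot_base;
    bool valid;
};

/*
 * Accept the stash request only for the known key and only if the base
 * lies inside RAM. Returns true when a response should be sent.
 */
static bool stash_mpipl_config(struct stash_state *st, uint64_t key,
                               uint64_t value, uint64_t ram_size)
{
    if (key != STASH_KEY_SKIBOOT_BASE) {
        return false;   /* unimplemented key */
    }
    if (value >= ram_size) {
        return false;   /* invalid base sent by the guest */
    }
    st->skiboot_base = value;
    st->valid = true;
    return true;
}
```

A rejected request leaves the previously stashed state untouched, so a bad
value from the guest cannot redirect later table lookups.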
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/pnv_sbe.c | 37 +++++++++++++++++++++++++++++++++++++
include/hw/ppc/pnv_mpipl.h | 3 +++
2 files changed, 40 insertions(+)
diff --git a/hw/ppc/pnv_sbe.c b/hw/ppc/pnv_sbe.c
index 5a2b3342d199..46c5047f1c0a 100644
--- a/hw/ppc/pnv_sbe.c
+++ b/hw/ppc/pnv_sbe.c
@@ -233,8 +233,11 @@ static void sbe_timer(void *opaque)
static void do_sbe_msg(PnvSBE *sbe)
{
+ PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
+ MachineState *machine = MACHINE(pnv);
struct sbe_msg msg;
uint16_t cmd, ctrl_flags, seq_id;
+ uint64_t mbox_val;
int i;
memset(&msg, 0, sizeof(msg));
@@ -265,6 +268,40 @@ static void do_sbe_msg(PnvSBE *sbe)
timer_del(sbe->timer);
}
break;
+ case SBE_CMD_STASH_MPIPL_CONFIG:
+ /* key = sbe->mbox[1] */
+ switch (sbe->mbox[1]) {
+ case SBE_STASH_KEY_SKIBOOT_BASE:
+ mbox_val = sbe->mbox[2];
+ if (mbox_val >= machine->ram_size) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "SBE: skiboot_base 0x%" PRIx64 " exceeds RAM size 0x%" PRIx64 "\n",
+ mbox_val, machine->ram_size);
+ return;
+ }
+
+ pnv->mpipl_state.skiboot_base = mbox_val;
+ qemu_log_mask(LOG_UNIMP,
+ "Stashing skiboot base: 0x%" HWADDR_PRIx "\n",
+ pnv->mpipl_state.skiboot_base);
+
+ /*
+ * Set the response register.
+ *
+ * Currently setting the same sequence number in
+ * response as we got in the request.
+ */
+ sbe->mbox[4] = sbe->mbox[0]; /* sequence number */
+ pnv_sbe_set_host_doorbell(sbe,
+ sbe->host_doorbell | SBE_HOST_RESPONSE_WAITING);
+
+ break;
+ default:
+ qemu_log_mask(LOG_UNIMP,
+ "SBE: CMD_STASH_MPIPL_CONFIG: Unimplemented key: 0x" TARGET_FMT_lx "\n",
+ sbe->mbox[1]);
+ }
+ break;
default:
qemu_log_mask(LOG_UNIMP, "SBE Unimplemented command: 0x%x\n", cmd);
}
diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
index c544984dc76d..60d6ede48209 100644
--- a/include/hw/ppc/pnv_mpipl.h
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -8,11 +8,14 @@
#define PNV_MPIPL_H
#include "qemu/osdep.h"
+#include "exec/hwaddr.h"
typedef struct MpiplPreservedState MpiplPreservedState;
/* Preserved state to be saved in PnvMachineState */
struct MpiplPreservedState {
+ /* skiboot_base will be valid only after OPAL sends relocated base to SBE */
+ hwaddr skiboot_base;
bool is_next_boot_mpipl;
};
--
2.53.0
* [PATCH v5 04/10] pnv/mpipl: Preserve memory regions as per MDST/MDDT tables
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
Implement copying of memory regions, as described by the MDST and MDDT
tables.
Copy the memory regions from source to destination in chunks of 32MB.
Note that QEMU can fail to preserve a particular entry for various
reasons, such as:
* region length mismatch between MDST & MDDT
* failed copy due to access/decode/etc memory issues
HDAT doesn't specify any field in MDRT to notify the host about such
errors. However, HDAT section "15.3.1.3 Memory Dump Results Table (MDRT)"
says:
  The Memory Dump Results Table is a list of the memory ranges that
  have been included in the dump
Based on the above statement, it looks like MDRT should include only those
regions which were successfully captured in the dump. Hence regions which
QEMU fails to dump are skipped and will not have a corresponding entry in
MDRT.
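The policy above (copy in bounded chunks, and record a result entry only
for regions that copied completely) can be sketched outside QEMU as
follows. This is an illustration under assumptions: guest memory is
modelled as a flat byte array, and struct region, div_round_up(),
copy_region_chunked() and preserve_regions() are hypothetical names, not
the functions added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, already byte-swapped view of one source/destination pair. */
struct region {
    uint64_t src;
    uint64_t dest;
    uint32_t len;
};

/* Integer ceiling division; avoids float rounding for large lengths. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/*
 * Copy one region through a bounce buffer of 'chunk' bytes.
 * Returns false as soon as a chunk would fall outside 'mem'.
 */
static bool copy_region_chunked(uint8_t *mem, size_t mem_size,
                                const struct region *r, size_t chunk,
                                uint8_t *bounce)
{
    uint64_t num_chunks = div_round_up(r->len, chunk);

    for (uint64_t id = 0; id < num_chunks; id++) {
        uint64_t off = id * chunk;
        uint64_t left = r->len - off;
        size_t n = left < chunk ? (size_t)left : chunk;

        if (r->src + off + n > mem_size || r->dest + off + n > mem_size) {
            return false;
        }
        memcpy(bounce, mem + r->src + off, n);
        memcpy(mem + r->dest + off, bounce, n);
    }
    return true;
}

/* Fill 'results' with the regions that copied fully; return their count. */
static size_t preserve_regions(uint8_t *mem, size_t mem_size,
                               const struct region *in, size_t n_in,
                               struct region *results, size_t chunk,
                               uint8_t *bounce)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_in; i++) {
        if (!copy_region_chunked(mem, mem_size, &in[i], chunk, bounce)) {
            continue;   /* failed regions get no result entry */
        }
        results[n_out++] = in[i];   /* records the region start addresses */
    }
    return n_out;
}
```

The sketch records the start addresses of each region in the result list
and uses integer arithmetic for the chunk count.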
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/pnv_mpipl.c | 162 +++++++++++++++++++++++++++++++++++++
include/hw/ppc/pnv_mpipl.h | 86 ++++++++++++++++++++
2 files changed, 248 insertions(+)
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
index d8c9b7a428b7..cef1fe2c4056 100644
--- a/hw/ppc/pnv_mpipl.c
+++ b/hw/ppc/pnv_mpipl.c
@@ -5,12 +5,174 @@
*/
#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "system/address-spaces.h"
#include "system/runstate.h"
#include "hw/ppc/pnv.h"
#include "hw/ppc/pnv_mpipl.h"
+#include <math.h>
+
+#define MDST_TABLE_RELOCATED \
+ (pnv->mpipl_state.skiboot_base + MDST_TABLE_OFF)
+#define MDDT_TABLE_RELOCATED \
+ (pnv->mpipl_state.skiboot_base + MDDT_TABLE_OFF)
+
+/*
+ * Preserve the memory regions as pointed by MDST table
+ *
+ * During this, the memory region pointed by entries in MDST, are 'copied'
+ * as it is to the memory region pointed by corresponding entry in MDDT
+ *
+ * Notes: All reads should consider data coming from skiboot as big-endian,
+ * and data written should also be in big-endian
+ */
+static bool pnv_mpipl_preserve_mem(PnvMachineState *pnv)
+{
+ g_autofree MdstTableEntry *mdst = g_malloc(MDST_TABLE_SIZE);
+ g_autofree MddtTableEntry *mddt = g_malloc(MDDT_TABLE_SIZE);
+ g_autofree MdrtTableEntry *mdrt = g_malloc0(MDRT_TABLE_SIZE);
+ AddressSpace *default_as = &address_space_memory;
+ MemTxResult io_result;
+ MemTxAttrs attrs = {};
+ uint64_t src_addr, dest_addr;
+ uint32_t data_len;
+ uint64_t num_chunks, chunk_id = 0;
+ int mdrt_idx = 0;
+
+ /* Mark the memory transactions as privileged memory access */
+ attrs.user = 0;
+ attrs.memory = 1;
+
+ if (pnv->mpipl_state.mdrt_table) {
+ /*
+ * MDRT table allocated from some past crash, free the memory to
+ * prevent memory leak
+ */
+ g_free(pnv->mpipl_state.mdrt_table);
+ pnv->mpipl_state.num_mdrt_entries = 0;
+ }
+
+ io_result = address_space_read(default_as, MDST_TABLE_RELOCATED, attrs,
+ mdst, MDST_TABLE_SIZE);
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to read MDST table at: 0x" TARGET_FMT_lx "\n",
+ MDST_TABLE_RELOCATED);
+
+ return false;
+ }
+
+ io_result = address_space_read(default_as, MDDT_TABLE_RELOCATED, attrs,
+ mddt, MDDT_TABLE_SIZE);
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to read MDDT table at: 0x" TARGET_FMT_lx "\n",
+ MDDT_TABLE_RELOCATED);
+
+ return false;
+ }
+
+ /* Try to read all entries */
+ for (int i = 0; i < MDST_MAX_ENTRIES; ++i) {
+ g_autofree uint8_t *copy_buffer = NULL;
+ bool is_copy_failed = false;
+
+ /* Considering entry with address and size as 0, as end of table */
+ if ((mdst[i].addr == 0) && (mdst[i].size == 0)) {
+ break;
+ }
+
+ if (mdst[i].size != mddt[i].size) {
+ qemu_log_mask(LOG_TRACE,
+ "Warning: Invalid entry, size mismatch in MDST & MDDT\n");
+ continue;
+ }
+
+ if (mdst[i].data_region != mddt[i].data_region) {
+ qemu_log_mask(LOG_TRACE,
+ "Warning: Invalid entry, region mismatch in MDST & MDDT\n");
+ continue;
+ }
+
+ src_addr = be64_to_cpu(mdst[i].addr) & ~HRMOR_BIT;
+ dest_addr = be64_to_cpu(mddt[i].addr) & ~HRMOR_BIT;
+ data_len = be32_to_cpu(mddt[i].size);
+
+#define COPY_CHUNK_SIZE ((size_t)(32 * MiB))
+ copy_buffer = g_try_malloc(COPY_CHUNK_SIZE);
+ if (copy_buffer == NULL) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed allocating memory (size: %zu) for copying"
+ " reserved memory regions\n", COPY_CHUNK_SIZE);
+ is_copy_failed = true;
+ continue;
+ }
+
+ chunk_id = 0;
+ num_chunks = DIV_ROUND_UP(data_len, COPY_CHUNK_SIZE);
+ while (chunk_id < num_chunks) {
+ /* Take minimum of bytes left to copy, and chunk size */
+ uint64_t copy_len = MIN(
+ data_len - (chunk_id * COPY_CHUNK_SIZE),
+ COPY_CHUNK_SIZE
+ );
+
+ /* Copy the source region to destination */
+ io_result = address_space_read(default_as, src_addr, attrs,
+ copy_buffer, copy_len);
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to read region at: 0x%" PRIx64 "\n",
+ src_addr);
+ is_copy_failed = true;
+ break;
+ }
+
+ io_result = address_space_write(default_as, dest_addr, attrs,
+ copy_buffer, copy_len);
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to write region at: 0x%" PRIx64 "\n",
+ dest_addr);
+ is_copy_failed = true;
+ break;
+ }
+
+ src_addr += COPY_CHUNK_SIZE;
+ dest_addr += COPY_CHUNK_SIZE;
+ ++chunk_id;
+ }
+#undef COPY_CHUNK_SIZE
+
+ if (is_copy_failed) {
+ /*
+ * HDAT doesn't specify an error code in MDRT for failed copy,
+ * and doesn't specify how this is to be handled
+ * Hence just skip adding an entry in MDRT, as done for size
+ * mismatch or other inconsistency between MDST/MDDT
+ */
+ continue;
+ }
+
+ /* Populate entry in MDRT table if preserving successful */
+ mdrt[mdrt_idx].src_addr = cpu_to_be64(src_addr);
+ mdrt[mdrt_idx].dest_addr = cpu_to_be64(dest_addr);
+ mdrt[mdrt_idx].size = cpu_to_be32(data_len);
+ mdrt[mdrt_idx].data_region = mdst[i].data_region;
+ ++mdrt_idx;
+ }
+
+ pnv->mpipl_state.mdrt_table = g_steal_pointer(&mdrt);
+ pnv->mpipl_state.num_mdrt_entries = mdrt_idx;
+
+ return true;
+}
void do_mpipl_preserve(PnvMachineState *pnv)
{
+ pnv_mpipl_preserve_mem(pnv);
+
/* Mark next boot as Memory-preserving boot */
pnv->mpipl_state.is_next_boot_mpipl = true;
diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
index 60d6ede48209..e0518ef2e12e 100644
--- a/include/hw/ppc/pnv_mpipl.h
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -10,13 +10,99 @@
#include "qemu/osdep.h"
#include "exec/hwaddr.h"
+#include <assert.h>
+
+typedef struct MdstTableEntry MdstTableEntry;
+typedef struct MdrtTableEntry MdrtTableEntry;
typedef struct MpiplPreservedState MpiplPreservedState;
+/*
+ * Following offsets are copied from skiboot source code.
+ * These need to be updated if this changes in a future skiboot version
+ */
+/* Use 768 bytes for SPIRAH */
+#define SPIRAH_OFF 0x00010000
+#define SPIRAH_SIZE 0x300
+
+/* Use 256 bytes for processor dump area */
+#define PROC_DUMP_AREA_OFF (SPIRAH_OFF + SPIRAH_SIZE)
+#define PROC_DUMP_AREA_SIZE 0x100
+
+#define PROCIN_OFF (PROC_DUMP_AREA_OFF + PROC_DUMP_AREA_SIZE)
+#define PROCIN_SIZE 0x800
+
+/* Offsets of MDST and MDDT tables from skiboot base */
+#define MDST_TABLE_OFF (PROCIN_OFF + PROCIN_SIZE)
+#define MDST_TABLE_SIZE 0x400
+
+#define MDDT_TABLE_OFF (MDST_TABLE_OFF + MDST_TABLE_SIZE)
+#define MDDT_TABLE_SIZE 0x400
+/*
+ * Offset of the dump result table MDRT. Hostboot will write to this
+ * memory after moving memory content from source to destination memory.
+ */
+#define MDRT_TABLE_OFF 0x01c00000
+#define MDRT_TABLE_SIZE 0x00008000
+
+/* HRMOR_BIT copied from skiboot */
+#define HRMOR_BIT (1ull << 63)
+
+#define __packed __attribute__((packed))
+
+/*
+ * Memory Dump Source Table (MDST)
+ *
+ * Format of this table is same as Memory Dump Source Table defined in HDAT
+ */
+struct MdstTableEntry {
+ uint64_t addr;
+ uint8_t data_region;
+ uint8_t dump_type;
+ uint16_t reserved;
+ uint32_t size;
+} __packed;
+
+/* Memory dump destination table (MDDT) has same structure as MDST */
+typedef MdstTableEntry MddtTableEntry;
+
+/*
+ * Memory dump result table (MDRT)
+ *
+ * List of the memory ranges that have been included in the dump. This table is
+ * filled by hostboot and passed to OPAL on second boot. OPAL/payload will use
+ * this table to extract the dump.
+ *
+ * Note: This structure differs from HDAT, but matches the structure
+ * skiboot uses
+ */
+struct MdrtTableEntry {
+ uint64_t src_addr;
+ uint64_t dest_addr;
+ uint8_t data_region;
+ uint8_t dump_type; /* unused */
+ uint16_t reserved; /* unused */
+ uint32_t size;
+ uint64_t padding; /* unused */
+} __packed;
+
+/* Maximum length of mdst/mddt/mdrt tables */
+#define MDST_MAX_ENTRIES (MDST_TABLE_SIZE / sizeof(MdstTableEntry))
+#define MDDT_MAX_ENTRIES (MDDT_TABLE_SIZE / sizeof(MddtTableEntry))
+#define MDRT_MAX_ENTRIES (MDRT_TABLE_SIZE / sizeof(MdrtTableEntry))
+
+static_assert(MDST_MAX_ENTRIES == MDDT_MAX_ENTRIES,
+ "Maximum entries in MDDT must match MDST");
+static_assert(MDRT_MAX_ENTRIES >= MDST_MAX_ENTRIES,
+ "MDRT should support at least as many entries as MDST");
+
/* Preserved state to be saved in PnvMachineState */
struct MpiplPreservedState {
/* skiboot_base will be valid only after OPAL sends relocated base to SBE */
hwaddr skiboot_base;
bool is_next_boot_mpipl;
+
+ MdrtTableEntry *mdrt_table;
+ uint32_t num_mdrt_entries;
};
#endif
--
2.53.0
* [PATCH v5 05/10] pnv/mpipl: Preserve CPU registers after crash
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
The kernel expects the platform to provide CPU registers after pausing
execution of the CPUs.
Currently, only the registers used by Linux to generate /proc/vmcore are
exported.
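As an illustration of how one register entry is serialised, the sketch
below writes (type, number, value) triples in big-endian order into a
byte buffer. The 16-byte layout (32-bit type, 32-bit number, 64-bit
value) is an assumption inferred from the field widths used when the
entries are filled in; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout: be32 type, be32 number, be64 value. */
#define REG_ENTRY_SIZE 16
#define REG_TYPE_GPR   0x1
#define REG_TYPE_SPR   0x2

/* Store a 32-bit value most significant byte first. */
static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

/* Store a 64-bit value most significant byte first. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Write one entry at index 'idx' and return the next free index. */
static size_t reg_entry_put(uint8_t *buf, size_t idx, uint32_t type,
                            uint32_t num, uint64_t val)
{
    uint8_t *p = buf + idx * REG_ENTRY_SIZE;

    put_be32(p, type);
    put_be32(p + 4, num);
    put_be64(p + 8, val);
    return idx + 1;
}

/* Save the 32 GPRs; the caller provides at least 32 entries of space. */
static size_t save_gprs(uint8_t *buf, const uint64_t gpr[32])
{
    size_t idx = 0;

    for (uint32_t id = 0; id < 32; id++) {
        idx = reg_entry_put(buf, idx, REG_TYPE_GPR, id, gpr[id]);
    }
    return idx;
}
```

Returning the running index makes it easy to compare the number of
entries actually written against the expected per-CPU count.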
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/pnv_mpipl.c | 154 +++++++++++++++++++++++++++++++++++++
include/hw/ppc/pnv_mpipl.h | 60 +++++++++++++++
2 files changed, 214 insertions(+)
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
index cef1fe2c4056..308948b829cd 100644
--- a/hw/ppc/pnv_mpipl.c
+++ b/hw/ppc/pnv_mpipl.c
@@ -8,6 +8,9 @@
#include "qemu/log.h"
#include "qemu/units.h"
#include "system/address-spaces.h"
+#include "system/cpus.h"
+#include "system/hw_accel.h"
+#include "system/memory.h"
#include "system/runstate.h"
#include "hw/ppc/pnv.h"
#include "hw/ppc/pnv_mpipl.h"
@@ -17,6 +20,8 @@
(pnv->mpipl_state.skiboot_base + MDST_TABLE_OFF)
#define MDDT_TABLE_RELOCATED \
(pnv->mpipl_state.skiboot_base + MDDT_TABLE_OFF)
+#define PROC_DUMP_RELOCATED \
+ (pnv->mpipl_state.skiboot_base + PROC_DUMP_AREA_OFF)
/*
* Preserve the memory regions as pointed by MDST table
@@ -169,9 +174,158 @@ static bool pnv_mpipl_preserve_mem(PnvMachineState *pnv)
return true;
}
+static void do_store_cpu_regs(CPUState *cpu, MpiplPreservedCPUState *state)
+{
+ CPUPPCState *env = cpu_env(cpu);
+ MpiplRegDataHdr *regs_hdr = &state->hdr;
+ MpiplRegEntry *reg_entries = state->reg_entries;
+ MpiplRegEntry *curr_reg_entry;
+ uint32_t num_saved_regs = 0;
+
+ cpu_synchronize_state(cpu);
+
+ regs_hdr->pir = cpu_to_be32(env->spr[SPR_PIR]);
+
+ /* QEMU CPUs are not in Power Saving Mode */
+ regs_hdr->core_state = 0xff;
+
+ regs_hdr->off_regentries = 0;
+ regs_hdr->num_regentries = cpu_to_be32(NUM_REGS_PER_CPU);
+
+ regs_hdr->alloc_size = cpu_to_be32(sizeof(MpiplRegEntry));
+ regs_hdr->act_size = cpu_to_be32(sizeof(MpiplRegEntry));
+
+#define REG_TYPE_GPR 0x1
+#define REG_TYPE_SPR 0x2
+#define REG_TYPE_TIMA 0x3
+
+/*
+ * ID numbers used by f/w while populating certain registers
+ *
+ * Copied these defines from the linux kernel
+ */
+#define REG_ID_NIP 0x7D0
+#define REG_ID_MSR 0x7D1
+#define REG_ID_CCR 0x7D2
+
+ curr_reg_entry = reg_entries;
+
+#define REG_ENTRY(type, num, val) \
+ do { \
+ curr_reg_entry->reg_type = cpu_to_be32(type); \
+ curr_reg_entry->reg_num = cpu_to_be32(num); \
+ curr_reg_entry->reg_val = cpu_to_be64(val); \
+ ++curr_reg_entry; \
+ ++num_saved_regs; \
+ } while (0)
+
+ /* Save the GPRs */
+ for (int gpr_id = 0; gpr_id < 32; ++gpr_id) {
+ REG_ENTRY(REG_TYPE_GPR, gpr_id, env->gpr[gpr_id]);
+ }
+
+ REG_ENTRY(REG_TYPE_SPR, SPR_ACOP, env->spr[SPR_ACOP]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_AMR, env->spr[SPR_AMR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_BESCR, env->spr[SPR_BESCR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_CFAR, env->spr[SPR_CFAR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_CIABR, env->spr[SPR_CIABR]);
+
+ REG_ENTRY(REG_TYPE_SPR, SPR_CTR, env->spr[SPR_CTR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_CTRL, env->spr[SPR_CTRL]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DABR, env->spr[SPR_DABR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DABRX, env->spr[SPR_DABRX]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DAR, env->spr[SPR_DAR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DAWR0, env->spr[SPR_DAWR0]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DAWR1, env->spr[SPR_DAWR1]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DAWRX0, env->spr[SPR_DAWRX0]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DAWRX1, env->spr[SPR_DAWRX1]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DPDES, env->spr[SPR_DPDES]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DSCR, env->spr[SPR_DSCR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DSISR, env->spr[SPR_DSISR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_EBBHR, env->spr[SPR_EBBHR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_EBBRR, env->spr[SPR_EBBRR]);
+
+ REG_ENTRY(REG_TYPE_SPR, SPR_FSCR, env->spr[SPR_FSCR]);
+
+ REG_ENTRY(REG_TYPE_SPR, SPR_CTR, env->ctr);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DAR, env->spr[SPR_DAR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_DSISR, env->spr[SPR_DSISR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_LR, env->lr);
+ REG_ENTRY(REG_TYPE_SPR, REG_ID_MSR, env->msr);
+ REG_ENTRY(REG_TYPE_SPR, REG_ID_NIP, env->nip);
+ REG_ENTRY(REG_TYPE_SPR, SPR_XER, env->xer);
+ REG_ENTRY(REG_TYPE_SPR, SPR_SRR0, env->spr[SPR_SRR0]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_SRR1, env->spr[SPR_SRR1]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_HSRR0, env->spr[SPR_HSRR0]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_HSRR1, env->spr[SPR_HSRR1]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_CFAR, env->spr[SPR_CFAR]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_HMER, env->spr[SPR_HMER]);
+ REG_ENTRY(REG_TYPE_SPR, SPR_HMEER, env->spr[SPR_HMEER]);
+
+ /*
+ * Ensure the number of registers saved matches the number of
+ * registers per CPU
+ *
+ * This will help catch an error if, in future, a register entry
+ * is added/removed without updating NUM_REGS_PER_CPU
+ */
+ assert(num_saved_regs == NUM_REGS_PER_CPU);
+}
+
+static bool pnv_mpipl_preserve_cpu_state(PnvMachineState *pnv)
+{
+ MachineState *machine = MACHINE(pnv);
+ uint32_t num_cpus = machine->smp.cpus;
+ MpiplPreservedCPUState *state;
+ CPUState *cpu;
+ AddressSpace *default_as = &address_space_memory;
+ MemTxResult io_result;
+ MemTxAttrs attrs = {};
+
+ /* Mark the memory transactions as privileged memory access */
+ attrs.user = 0;
+ attrs.memory = 1;
+
+ if (pnv->mpipl_state.cpu_states) {
+ /*
+ * CPU states might have been allocated during a past crash; free the
+ * memory to prevent a memory leak
+ */
+ g_free(pnv->mpipl_state.cpu_states);
+ pnv->mpipl_state.num_cpu_states = 0;
+ }
+
+ pnv->mpipl_state.cpu_states = g_malloc_n(num_cpus,
+ sizeof(MpiplPreservedCPUState));
+ pnv->mpipl_state.num_cpu_states = num_cpus;
+
+ state = pnv->mpipl_state.cpu_states;
+
+ /* Preserve the Processor Dump Area */
+ io_result = address_space_read(default_as, PROC_DUMP_RELOCATED, attrs,
+ &pnv->mpipl_state.proc_area, sizeof(MpiplProcDumpArea));
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to read Proc Dump Area at: 0x" TARGET_FMT_lx "\n",
+ PROC_DUMP_RELOCATED);
+
+ return false;
+ }
+
+ CPU_FOREACH(cpu) {
+ do_store_cpu_regs(cpu, state);
+ ++state;
+ }
+
+ return true;
+}
+
void do_mpipl_preserve(PnvMachineState *pnv)
{
+ pause_all_vcpus();
+
pnv_mpipl_preserve_mem(pnv);
+ pnv_mpipl_preserve_cpu_state(pnv);
/* Mark next boot as Memory-preserving boot */
pnv->mpipl_state.is_next_boot_mpipl = true;
diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
index e0518ef2e12e..a602d6bef48d 100644
--- a/include/hw/ppc/pnv_mpipl.h
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -15,6 +15,10 @@
typedef struct MdstTableEntry MdstTableEntry;
typedef struct MdrtTableEntry MdrtTableEntry;
typedef struct MpiplPreservedState MpiplPreservedState;
+typedef struct MpiplRegDataHdr MpiplRegDataHdr;
+typedef struct MpiplRegEntry MpiplRegEntry;
+typedef struct MpiplProcDumpArea MpiplProcDumpArea;
+typedef struct MpiplPreservedCPUState MpiplPreservedCPUState;
/*
* Following offsets are copied from skiboot source code.
@@ -49,6 +53,8 @@ typedef struct MpiplPreservedState MpiplPreservedState;
#define __packed __attribute__((packed))
+#define NUM_REGS_PER_CPU 66 /*(32 GPRs, 34 SPRs)*/
+
/*
* Memory Dump Source Table (MDST)
*
@@ -95,6 +101,55 @@ static_assert(MDST_MAX_ENTRIES == MDDT_MAX_ENTRIES,
static_assert(MDRT_MAX_ENTRIES >= MDST_MAX_ENTRIES,
"MDRT should support atleast having number of entries as in MDST");
+/*
+ * Processor Dump Area
+ *
+ * This contains the information needed for having processor
+ * state captured during a platform dump.
+ *
+ * As mentioned in HDAT, following the P9 specific format
+ */
+struct MpiplProcDumpArea {
+ uint32_t thread_size; /* Size of each thread register entry */
+#define PROC_DUMP_AREA_VERSION_P9 0x1 /* P9 format */
+ uint8_t version;
+ uint8_t reserved[11];
+ uint64_t alloc_addr; /* Destination memory to place register data */
+ uint32_t reserved2;
+ uint32_t alloc_size; /* Allocated size */
+ uint64_t dest_addr; /* Destination address */
+ uint32_t reserved3;
+ uint32_t act_size; /* Actual data size */
+} __packed;
+
+/*
+ * "Architected Register Data" in the HDAT spec
+ *
+ * Acts as a header to the register entries for a particular thread
+ */
+struct MpiplRegDataHdr {
+ uint32_t pir; /* PIR of thread */
+ uint8_t core_state; /* Stop state of the overall core */
+ uint8_t reserved[3];
+ uint32_t off_regentries; /* Offset to Register Entries Array */
+ uint32_t num_regentries; /* Number of Register Entries in Array */
+ uint32_t alloc_size; /* Allocated size for each Register Entry */
+ uint32_t act_size; /* Actual size for each Register Entry */
+} __packed;
+
+struct MpiplRegEntry {
+ uint32_t reg_type;
+ uint32_t reg_num;
+ uint64_t reg_val;
+} __packed;
+
+struct MpiplPreservedCPUState {
+ MpiplRegDataHdr hdr;
+
+ /* Length of 'reg_entries' is hdr.num_regentries */
+ MpiplRegEntry reg_entries[NUM_REGS_PER_CPU];
+};
+
/* Preserved state to be saved in PnvMachineState */
struct MpiplPreservedState {
/* skiboot_base will be valid only after OPAL sends relocated base to SBE */
@@ -103,6 +158,11 @@ struct MpiplPreservedState {
MdrtTableEntry *mdrt_table;
uint32_t num_mdrt_entries;
+
+ MpiplProcDumpArea proc_area;
+
+ MpiplPreservedCPUState *cpu_states;
+ uint32_t num_cpu_states;
};
#endif
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 06/10] pnv/mpipl: Set thread entry size to be allocated by firmware
2026-03-10 12:46 [PATCH v5 00/10] Implement MPIPL for PowerNV Aditya Gupta
` (4 preceding siblings ...)
2026-03-10 12:46 ` [PATCH v5 05/10] pnv/mpipl: Preserve CPU registers after crash Aditya Gupta
@ 2026-03-10 12:46 ` Aditya Gupta
2026-03-10 12:46 ` [PATCH v5 07/10] pnv/mpipl: Write the preserved CPU and MDRT state Aditya Gupta
` (5 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
Set the "Thread Register State Entry Size", which firmware (OPAL) needs
in order to know how much memory to allocate for capturing CPU state in
the event of a crash.
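As a rough illustration of the size being advertised (a sketch, not part
of this patch): the snippet below models the packed, big-endian structures
from pnv_mpipl.h (patch 05) with Python's struct module. The format
strings and the helper name pack_proc_dump_area are illustrative
assumptions, and the entry size assumes the C structs carry no padding.

```python
import struct

NUM_REGS_PER_CPU = 66            # 32 GPRs + 34 SPRs, as in pnv_mpipl.h
PROC_DUMP_AREA_VERSION_P9 = 0x1

# Big-endian, packed layouts assumed to mirror the C structs
REG_DATA_HDR = struct.Struct(">IB3xIIII")       # MpiplRegDataHdr
REG_ENTRY = struct.Struct(">IIQ")               # MpiplRegEntry
PROC_DUMP_AREA = struct.Struct(">IB11xQIIQII")  # MpiplProcDumpArea

# Equivalent of sizeof(MpiplPreservedCPUState): header + all entries
THREAD_SIZE = REG_DATA_HDR.size + NUM_REGS_PER_CPU * REG_ENTRY.size


def pack_proc_dump_area(thread_size):
    """Build the area as it would look on a normal (non-MPIPL) boot."""
    return PROC_DUMP_AREA.pack(
        thread_size, PROC_DUMP_AREA_VERSION_P9,
        0, 0, 0,   # alloc_addr, reserved2, alloc_size: set by firmware
        0, 0, 0)   # dest_addr, reserved3, act_size: set after a crash


blob = pack_proc_dump_area(THREAD_SIZE)
```

Under these assumptions each thread needs 24 + 66 * 16 = 1080 bytes, which
is the value firmware would read back from the first four bytes of the
area.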
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/pnv.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 1513575b8f37..3038b1626c54 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -781,6 +781,30 @@ static void pnv_reset(MachineState *machine, ResetType type)
_FDT((fdt_pack(fdt)));
}
+ if (!pnv->mpipl_state.is_next_boot_mpipl) {
+ /*
+ * Set the "Thread Register State Entry Size", so that firmware can
+ * allocate enough memory to capture CPU state in the event of a
+ * crash
+ */
+
+ MpiplProcDumpArea proc_area;
+
+ proc_area.version = PROC_DUMP_AREA_VERSION_P9;
+ proc_area.thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState));
+
+ /* These are to be allocated & assigned by the firmware */
+ proc_area.alloc_addr = 0;
+ proc_area.alloc_size = 0;
+
+ /* These get assigned after crash, when QEMU preserves the registers */
+ proc_area.dest_addr = 0;
+ proc_area.act_size = 0;
+
+ cpu_physical_memory_write(PROC_DUMP_AREA_OFF, &proc_area,
+ sizeof(proc_area));
+ }
+
cpu_physical_memory_write(PNV_FDT_ADDR, fdt, fdt_totalsize(fdt));
/* Update machine->fdt with latest fdt */
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 07/10] pnv/mpipl: Write the preserved CPU and MDRT state
2026-03-10 12:46 [PATCH v5 00/10] Implement MPIPL for PowerNV Aditya Gupta
` (5 preceding siblings ...)
2026-03-10 12:46 ` [PATCH v5 06/10] pnv/mpipl: Set thread entry size to be allocated by firmware Aditya Gupta
@ 2026-03-10 12:46 ` Aditya Gupta
2026-03-10 12:46 ` [PATCH v5 08/10] pnv/mpipl: Enable MPIPL support Aditya Gupta
` (4 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
The logic for preserving the CPU registers and memory regions was added
in previous patches.
Write that data to the relevant memory addresses: PROC_DUMP_AREA for CPU
registers, and MDRT for preserved memory regions.
Also export the "mpipl-boot" device tree property, so the kernel knows
that it is a 'dump active' boot.
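To make the write-out layout concrete, here is a minimal sketch (not the
QEMU code) of how the per-CPU records are placed at a fixed stride inside
the firmware-allocated buffer. The function names are hypothetical, and
the HRMOR_BIT value (bit 63) is an assumption borrowed from skiboot, since
its definition is not part of this patch.

```python
import struct

HRMOR_BIT = 1 << 63                # assumed value; not shown in this patch
REG_TYPE_GPR = 0x1
REG_ENTRY = struct.Struct(">IIQ")  # reg_type, reg_num, reg_val (big-endian)


def cpu_record_addrs(alloc_addr, alloc_size, thread_size, num_cpus):
    """Return the guest address of each CPU's register data header.

    Raises ValueError if the firmware-allocated buffer is too small,
    similar in spirit to the size check in pnv_mpipl_write_cpu_state().
    """
    if alloc_size < num_cpus * thread_size:
        raise ValueError("buffer allocated by firmware is too small")
    base = alloc_addr & ~HRMOR_BIT
    return [base + i * thread_size for i in range(num_cpus)]


def pack_gprs(gprs):
    """Encode GPR values as consecutive big-endian register entries."""
    return b"".join(REG_ENTRY.pack(REG_TYPE_GPR, num, val)
                    for num, val in enumerate(gprs))


# 1080 = 24-byte header + 66 * 16-byte entries, assuming packed structs
addrs = cpu_record_addrs(0x8000000031000000, 4096, 1080, 2)
```

A reader of the dump would walk the same addresses: start at dest_addr
(with the HRMOR bit masked off) and advance by thread_size for each CPU.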
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/pnv.c | 39 +++++++++++-
hw/ppc/pnv_mpipl.c | 140 +++++++++++++++++++++++++++++++++++++++++++
include/hw/ppc/pnv.h | 1 +
3 files changed, 179 insertions(+), 1 deletion(-)
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 3038b1626c54..2db5be821e05 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -751,6 +751,8 @@ static void pnv_reset(MachineState *machine, ResetType type)
PnvMachineState *pnv = PNV_MACHINE(machine);
IPMIBmc *bmc;
void *fdt;
+ int node_offset;
+ bool mpipl_write_succeeded = false;
qemu_devices_reset(type);
@@ -781,7 +783,42 @@ static void pnv_reset(MachineState *machine, ResetType type)
_FDT((fdt_pack(fdt)));
}
- if (!pnv->mpipl_state.is_next_boot_mpipl) {
+ /*
+ * Only if writing the MPIPL data succeeds will the next boot be given
+ * the "mpipl-boot" property in the device tree.
+ * Otherwise, boot like a normal non-MPIPL boot
+ */
+ if (pnv->mpipl_state.is_next_boot_mpipl) {
+ /* Write the preserved MDRT and CPU State Data */
+ mpipl_write_succeeded = do_mpipl_write(pnv);
+ }
+
+ /*
+ * If it's an MPIPL boot, add the "mpipl-boot" property, and reset the
+ * boolean for MPIPL boot for next boot
+ */
+ if (mpipl_write_succeeded) {
+ void *fdt_copy = g_malloc0(FDT_MAX_SIZE);
+
+ /* Create a writable copy of the fdt */
+ _FDT((fdt_open_into(fdt, fdt_copy, FDT_MAX_SIZE)));
+
+ node_offset = fdt_path_offset(fdt_copy, "/ibm,opal/dump");
+ _FDT((fdt_appendprop_u64(fdt_copy, node_offset, "mpipl-boot", 1)));
+
+ /* Update the fdt, and free the original fdt */
+ if (fdt != machine->fdt) {
+ /*
+ * Only free the fdt if it's not machine->fdt, to prevent
+ * double free, since we already free machine->fdt later
+ */
+ g_free(fdt);
+ }
+ fdt = fdt_copy;
+
+ /* This boot is an MPIPL, reset the boolean for next boot */
+ pnv->mpipl_state.is_next_boot_mpipl = false;
+ } else {
/*
* Set the "Thread Register State Entry Size", so that firmware can
* allocate enough memory to capture CPU state in the event of a
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
index 308948b829cd..f5b228f5ba3c 100644
--- a/hw/ppc/pnv_mpipl.c
+++ b/hw/ppc/pnv_mpipl.c
@@ -20,6 +20,8 @@
(pnv->mpipl_state.skiboot_base + MDST_TABLE_OFF)
#define MDDT_TABLE_RELOCATED \
(pnv->mpipl_state.skiboot_base + MDDT_TABLE_OFF)
+#define MDRT_TABLE_RELOCATED \
+ (pnv->mpipl_state.skiboot_base + MDRT_TABLE_OFF)
#define PROC_DUMP_RELOCATED \
(pnv->mpipl_state.skiboot_base + PROC_DUMP_AREA_OFF)
@@ -320,6 +322,139 @@ static bool pnv_mpipl_preserve_cpu_state(PnvMachineState *pnv)
return true;
}
+/*
+ * Write the preserved CPU state data in Processor Dump Area (PROC_DUMP_AREA)
+ *
+ * Returns true if everything went fine, else false for any error
+ */
+static bool pnv_mpipl_write_cpu_state(PnvMachineState *pnv)
+{
+ MpiplProcDumpArea *proc_area = &pnv->mpipl_state.proc_area;
+ MpiplPreservedCPUState *cpu_state = pnv->mpipl_state.cpu_states;
+ const uint32_t num_cpu_states = pnv->mpipl_state.num_cpu_states;
+ hwaddr next_regentries_hdr;
+ AddressSpace *default_as = &address_space_memory;
+ MemTxResult io_result;
+ MemTxAttrs attrs = {};
+
+ /* Mark the memory transactions as privileged memory access */
+ attrs.user = 0;
+ attrs.memory = 1;
+
+ if (be32_to_cpu(proc_area->alloc_size) <
+ (num_cpu_states * sizeof(MpiplPreservedCPUState))) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Size of buffer allocated by skiboot (%u bytes) is not "
+ "enough to save all CPUs' registers needed (%zu bytes)\n",
+ be32_to_cpu(proc_area->alloc_size),
+ num_cpu_states * sizeof(MpiplPreservedCPUState));
+
+ return false;
+ }
+
+ proc_area->version = PROC_DUMP_AREA_VERSION_P9;
+
+ /*
+ * This is the stride kernel/firmware should use to jump from a
+ * register entries header to next CPU's header
+ */
+ proc_area->thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState));
+
+ /* Write the header and register entries for each CPU */
+ next_regentries_hdr = be64_to_cpu(proc_area->alloc_addr) & (~HRMOR_BIT);
+ for (int i = 0; i < num_cpu_states; ++i) {
+ io_result = address_space_write(default_as, next_regentries_hdr, attrs,
+ &cpu_state->hdr, sizeof(MpiplRegDataHdr));
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to write RegEntries Header\n");
+ return false;
+ }
+
+ io_result = address_space_write(default_as,
+ next_regentries_hdr + sizeof(MpiplRegDataHdr), attrs,
+ &cpu_state->reg_entries,
+ NUM_REGS_PER_CPU * (sizeof(MpiplRegEntry)));
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to write Register Entries\n");
+ return false;
+ }
+
+ /*
+ * According to HDAT section:
+ * "15.3.1.5 Architected Register Data content":
+ *
+ * The next register entries header will be at current header +
+ * "Thread Register State Entry size"
+ *
+ * Note: proc_area.thread_size == sizeof(MpiplPreservedCPUState)
+ */
+ next_regentries_hdr += sizeof(MpiplPreservedCPUState);
+ ++cpu_state;
+ }
+
+ /* Point the destination address to the preserved memory region */
+ proc_area->dest_addr = proc_area->alloc_addr;
+ proc_area->act_size = cpu_to_be32(num_cpu_states *
+ sizeof(MpiplPreservedCPUState));
+
+ io_result = address_space_write(default_as, PROC_DUMP_AREA_OFF, attrs,
+ proc_area, sizeof(MpiplProcDumpArea));
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "MPIPL: Failed to write Proc Dump Area\n");
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Write the preserved MDRT table, representing preserved memory regions
+ *
+ * Returns true if everything went fine, else false for any error
+ */
+static bool pnv_mpipl_write_mdrt(PnvMachineState *pnv)
+{
+ MpiplPreservedState *state = &pnv->mpipl_state;
+ AddressSpace *default_as = &address_space_memory;
+ MemTxResult io_result;
+ MemTxAttrs attrs = {};
+
+ /* Mark the memory transactions as privileged memory access */
+ attrs.user = 0;
+ attrs.memory = 1;
+
+ /*
+ * Generally writes from platform during MPIPL don't go to a relocated
+ * skiboot address
+ *
+ * Though for MDRT we are doing so, as this is the address skiboot
+ * considers by default for MDRT
+ *
+ * MDRT/MDST/MDDT base addresses are actually meant to be shared by
+ * platform in SPIRA structures.
+ *
+ * Not implementing SPIRA as it increases complexity for no gains.
+ * Using the default address skiboot expects for MDRT, which is the
+ * relocated MDRT, hence writing to it
+ *
+ * Other tables like MDST/MDDT should not be written to relocated
+ * addresses, as skiboot will overwrite anything from SKIBOOT_BASE till
+ * SKIBOOT_BASE+SKIBOOT_SIZE (which is 0x30000000-0x31c00000 by default)
+ */
+ io_result = address_space_write(default_as, MDRT_TABLE_RELOCATED, attrs,
+ state->mdrt_table,
+ state->num_mdrt_entries * sizeof(MdrtTableEntry));
+ if (io_result != MEMTX_OK) {
+ qemu_log_mask(LOG_GUEST_ERROR, "MPIPL: Failed to write MDRT table\n");
+ return false;
+ }
+
+ return true;
+}
+
void do_mpipl_preserve(PnvMachineState *pnv)
{
pause_all_vcpus();
@@ -340,3 +475,8 @@ void do_mpipl_preserve(PnvMachineState *pnv)
*/
qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
}
+
+bool do_mpipl_write(PnvMachineState *pnv)
+{
+ return pnv_mpipl_write_mdrt(pnv) && pnv_mpipl_write_cpu_state(pnv);
+}
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 7d73629f112a..98fe10fb4f2e 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -295,5 +295,6 @@ void pnv_bmc_set_pnor(IPMIBmc *bmc, PnvPnor *pnor);
/* MPIPL helpers */
void do_mpipl_preserve(PnvMachineState *pnv);
+bool do_mpipl_write(PnvMachineState *pnv);
#endif /* PPC_PNV_H */
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 08/10] pnv/mpipl: Enable MPIPL support
2026-03-10 12:46 [PATCH v5 00/10] Implement MPIPL for PowerNV Aditya Gupta
` (6 preceding siblings ...)
2026-03-10 12:46 ` [PATCH v5 07/10] pnv/mpipl: Write the preserved CPU and MDRT state Aditya Gupta
@ 2026-03-10 12:46 ` Aditya Gupta
2026-03-10 12:46 ` [PATCH v5 09/10] tests/functional: Add test for MPIPL in PowerNV Aditya Gupta
` (3 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
With all MPIPL support in place, export a "dump" node in the device tree,
signifying that the PowerNV QEMU platform supports MPIPL.
Also export the fw-load-area property, which describes where the kernel
and initrd were loaded, so that the kernel can verify whether the
kernel/initrd images were loaded within the boot memory region. QEMU
only exports these details in fw-load-area; the check against the boot
memory region is done in the kernel.
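For illustration only, the property can be thought of as a flat array of
big-endian 64-bit (address, size) pairs. The sketch below packs and
decodes such an array and applies a simple bounds check; the helper names,
the placeholder addresses and sizes, and the check itself are assumptions,
not the real QEMU constants or the kernel's actual logic.

```python
import struct


def pack_fw_load_area(regions):
    """Encode (address, size) pairs as big-endian 64-bit cells."""
    return b"".join(struct.pack(">QQ", addr, size) for addr, size in regions)


def unpack_fw_load_area(prop):
    """Decode the property back into a list of (address, size) pairs."""
    return [struct.unpack_from(">QQ", prop, off)
            for off in range(0, len(prop), 16)]


def within_boot_memory(regions, boot_mem_size):
    """True if every region ends at or below boot_mem_size (hypothetical)."""
    return all(addr + size <= boot_mem_size for addr, size in regions)


# Placeholder values for kernel and initrd, not the real QEMU constants
regions = [(0x20000000, 0x08000000), (0x28000000, 0x08000000)]
prop = pack_fw_load_area(regions)
```

With two regions the property is 32 bytes long, matching the
uint64_t fw_load_area[4] array written by pnv_dt_mpipl_dump().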
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
hw/ppc/pnv.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 2db5be821e05..6f59fc0d3e33 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -54,6 +54,7 @@
#include "hw/ppc/pnv_chip.h"
#include "hw/ppc/pnv_xscom.h"
#include "hw/ppc/pnv_pnor.h"
+#include "hw/ppc/pnv_mpipl.h"
#include "hw/isa/isa.h"
#include "hw/char/serial-isa.h"
@@ -672,6 +673,39 @@ static void pnv_dt_power_mgt(PnvMachineState *pnv, void *fdt)
_FDT(fdt_setprop_cell(fdt, off, "ibm,enabled-stop-levels", 0xc0000000));
}
+static void pnv_dt_mpipl_dump(PnvMachineState *pnv, void *fdt)
+{
+ int off;
+
+ /*
+ * Add "dump" node so kernel knows MPIPL (aka fadump) is supported
+ *
+ * Note: This is only needed to be done since we are passing device tree to
+ * opal
+ *
+ * In case HDAT is supported in future, then opal can add these nodes by
+ * itself based on system attribute having MPIPL_SUPPORTED bit set
+ */
+ off = fdt_add_subnode(fdt, 0, "ibm,opal");
+ if (off == -FDT_ERR_EXISTS) {
+ off = fdt_path_offset(fdt, "/ibm,opal");
+ }
+
+ _FDT(off);
+ off = fdt_add_subnode(fdt, off, "dump");
+ _FDT(off);
+ _FDT((fdt_setprop_string(fdt, off, "compatible", "ibm,opal-dump")));
+
+ /* Add kernel and initrd as fw-load-area */
+ uint64_t fw_load_area[4] = {
+ cpu_to_be64(KERNEL_LOAD_ADDR), cpu_to_be64(KERNEL_MAX_SIZE),
+ cpu_to_be64(INITRD_LOAD_ADDR), cpu_to_be64(INITRD_MAX_SIZE)
+ };
+
+ _FDT((fdt_setprop(fdt, off, "fw-load-area",
+ fw_load_area, sizeof(fw_load_area))));
+}
+
static void *pnv_dt_create(MachineState *machine)
{
PnvMachineClass *pmc = PNV_MACHINE_GET_CLASS(machine);
@@ -734,6 +768,9 @@ static void *pnv_dt_create(MachineState *machine)
pmc->dt_power_mgt(pnv, fdt);
}
+ /* Advertise support for MPIPL */
+ pnv_dt_mpipl_dump(pnv, fdt);
+
return fdt;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 09/10] tests/functional: Add test for MPIPL in PowerNV
2026-03-10 12:46 [PATCH v5 00/10] Implement MPIPL for PowerNV Aditya Gupta
` (7 preceding siblings ...)
2026-03-10 12:46 ` [PATCH v5 08/10] pnv/mpipl: Enable MPIPL support Aditya Gupta
@ 2026-03-10 12:46 ` Aditya Gupta
2026-03-10 12:46 ` [PATCH v5 10/10] MAINTAINERS: Add entry for MPIPL (PowerNV) Aditya Gupta
` (2 subsequent siblings)
11 siblings, 0 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
With MPIPL support implemented, enable fadump's functional test for
PowerNV.
Also, the current functional test for PowerNV uses op-build's Linux 5.10
image, which doesn't support passing "fadump=on" as an argument, failing
with:
Kernel is locked down from Kernel configuration; see man kernel_lockdown.7
Hence, instead of op-build's image, use the newer Fedora vmlinuz, as used
in the FADump PSeries functional test.
Also, since the "bash#" string does not show up, rely on
"sh: no job control" to check whether the testcase has reached the shell.
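The pattern changes are easier to follow with a small model of substring
matching on console output. This is a simplified, hypothetical stand-in
for the test framework's console helpers (it is not their implementation),
and the sample console lines are illustrative.

```python
def first_match(console_lines, success_message, failure_message=None):
    """Scan console output in order.

    Returns True when the success pattern is seen first, False when the
    failure pattern is seen first, and None when neither appears (where a
    real test would eventually time out).
    """
    for line in console_lines:
        if success_message in line:
            return True
        if failure_message and failure_message in line:
            return False
    return None


# Illustrative console lines; the prefixes differ between platforms
pseries_line = "rtas fadump: Firmware-assisted dump is active"
powernv_line = "opal fadump: Firmware-assisted dump is active"
generic = "fadump: Firmware-assisted dump is active"

matches_both = (first_match([pseries_line], generic),
                first_match([powernv_line], generic))
```

Because the match is on a substring, dropping the "rtas " prefix lets the
same pattern cover both platforms, while a prompt such as "sh-5.2#" that
never appears on the console would leave the test waiting.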
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
tests/functional/ppc64/test_fadump.py | 35 ++++++++++-----------------
1 file changed, 13 insertions(+), 22 deletions(-)
diff --git a/tests/functional/ppc64/test_fadump.py b/tests/functional/ppc64/test_fadump.py
index bd9692f64c05..7ea65974e0ea 100755
--- a/tests/functional/ppc64/test_fadump.py
+++ b/tests/functional/ppc64/test_fadump.py
@@ -14,6 +14,7 @@ class QEMUFadump(LinuxKernelTest):
1. test_fadump_pseries: PSeries
2. test_fadump_pseries_kvm: PSeries + KVM
+ 3. test_fadump_powernv: PowerNV
"""
timeout = 90
@@ -24,11 +25,6 @@ class QEMUFadump(LinuxKernelTest):
msg_registered_failed = ''
msg_dump_active = ''
- ASSET_EPAPR_KERNEL = Asset(
- ('https://github.com/open-power/op-build/releases/download/v2.7/'
- 'zImage.epapr'),
- '0ab237df661727e5392cee97460e8674057a883c5f74381a128fa772588d45cd')
-
ASSET_VMLINUZ_KERNEL = Asset(
('https://archives.fedoraproject.org/pub/archive/fedora-secondary/'
'releases/39/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
@@ -62,16 +58,14 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
# SLOF takes upto >20s in startup time, use VOF
self.set_machine("pseries")
self.vm.add_args("-machine", "x-vof=on")
- self.vm.add_args("-m", "6G")
+
+ self.vm.add_args("-m", "6G")
self.vm.set_console()
kernel_path = None
- if is_powernv:
- kernel_path = self.ASSET_EPAPR_KERNEL.fetch()
- else:
- kernel_path = self.ASSET_VMLINUZ_KERNEL.fetch()
+ kernel_path = self.ASSET_VMLINUZ_KERNEL.fetch()
initrd_path = self.ASSET_FEDORA_INITRD.fetch()
@@ -102,16 +96,14 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
timeout=20
)
- # Ensure fadump is registered successfully, if registration
- # succeeds, we get a log from rtas fadump:
- #
- # rtas fadump: Registration is successful!
- self.wait_for_console_pattern(
- "rtas fadump: Registration is successful!"
- )
+ # Ensure fadump is registered successfully
+ if not is_powernv:
+ self.wait_for_console_pattern(
+ "rtas fadump: Registration is successful!"
+ )
# Wait for the shell
- self.wait_for_console_pattern("#")
+ self.wait_for_console_pattern("sh: no job control")
# Mount /proc since not available in the initrd used
exec_command(self, command="mount -t proc proc /proc")
@@ -135,7 +127,7 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
# that qemu didn't pass the 'ibm,kernel-dump' device tree node
wait_for_console_pattern(
test=self,
- success_message="rtas fadump: Firmware-assisted dump is active",
+ success_message="fadump: Firmware-assisted dump is active",
failure_message="fadump: Reserved "
)
@@ -148,7 +140,7 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
self.wait_for_console_pattern("preserving crash data")
# Wait for prompt
- self.wait_for_console_pattern("sh-5.2#")
+ self.wait_for_console_pattern("Run /bin/sh as init process")
# Mount /proc since not available in the initrd used
exec_command_and_wait_for_pattern(self,
@@ -166,9 +158,8 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
def test_fadump_pseries(self):
return self.do_test_fadump(is_kvm=False, is_powernv=False)
- @skip("PowerNV Fadump not supported yet")
def test_fadump_powernv(self):
- return
+ return self.do_test_fadump(is_kvm=False, is_powernv=True)
def test_fadump_pseries_kvm(self):
"""
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 10/10] MAINTAINERS: Add entry for MPIPL (PowerNV)
2026-03-10 12:46 [PATCH v5 00/10] Implement MPIPL for PowerNV Aditya Gupta
` (8 preceding siblings ...)
2026-03-10 12:46 ` [PATCH v5 09/10] tests/functional: Add test for MPIPL in PowerNV Aditya Gupta
@ 2026-03-10 12:46 ` Aditya Gupta
2026-03-10 14:12 ` [PATCH v5 00/10] Implement MPIPL for PowerNV Shivang Upadhyay
2026-03-10 14:50 ` Sourabh Jain
11 siblings, 0 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 12:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-ppc, Hari Bathini, Sourabh Jain, Harsh Prateek Bora,
Nicholas Piggin, Miles Glenn, Chinmay Rath, Shivang Upadhyay
Add maintainer and reviewer for MPIPL subsystem.
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
MAINTAINERS | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 6730cee490cf..a013277ccb95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3302,6 +3302,14 @@ F: include/hw/ppc/spapr_fadump.h
F: hw/ppc/spapr_fadump.c
F: tests/functional/ppc64/test_fadump.py
+Memory-Preserving Initial Program Load (MPIPL) for PowerNV
+M: Aditya Gupta <adityag@linux.ibm.com>
+R: Hari Bathini <hbathini@linux.ibm.com>
+S: Maintained
+F: include/hw/ppc/pnv_mpipl.h
+F: hw/ppc/pnv_mpipl.c
+F: tests/functional/ppc64/test_fadump.py
+
GDB stub
M: Alex Bennée <alex.bennee@linaro.org>
R: Philippe Mathieu-Daudé <philmd@linaro.org>
--
2.53.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
2026-03-10 12:46 [PATCH v5 00/10] Implement MPIPL for PowerNV Aditya Gupta
` (9 preceding siblings ...)
2026-03-10 12:46 ` [PATCH v5 10/10] MAINTAINERS: Add entry for MPIPL (PowerNV) Aditya Gupta
@ 2026-03-10 14:12 ` Shivang Upadhyay
2026-03-10 14:51 ` Aditya Gupta
2026-03-10 14:50 ` Sourabh Jain
11 siblings, 1 reply; 15+ messages in thread
From: Shivang Upadhyay @ 2026-03-10 14:12 UTC (permalink / raw)
To: Aditya Gupta
Cc: qemu-devel, qemu-ppc, Hari Bathini, Sourabh Jain,
Harsh Prateek Bora, Nicholas Piggin, Miles Glenn, Chinmay Rath
On Tue, Mar 10, 2026 at 06:16:07PM +0530, Aditya Gupta wrote:
> Overview
> =========
>
> Implemented MPIPL (Memory Preserving IPL, aka fadump) on PowerNV machine
> in QEMU.
>
> Fadump is an alternative dump mechanism to kdump, in which we the firmware
> does a memory preserving boot, and the second/crashkernel is booted fresh
> like a normal system reset, instead of the crashed kernel loading the
> second/crashkernel in case of kdump.
>
> MPIPL in PowerNV, is similar to fadump in Pseries. The idea is same, memory
> preserving, where in PowerNV we are assisted by SBE (Self Boot Engine) &
> Hostboot, while in Pseries we are assisted by PHyp (Power Hypervisor)
>
> For implementing in baremetal/powernv QEMU, we need to export a
> "ibm,opal/dump" node in the device tree, to tell the kernel we support
> MPIPL
>
> Once kernel sees the support, and "fadump=on" is passed on commandline,
> kernel will register memory regions to preserve with Skiboot.
>
> Kernel sends these data using OPAL calls, after which skiboot/opal saves
> the memory region details to MDST and MDDT tables (S-source, D-destination)
>
> Then in the event of a kernel crash, the kernel initiates MPIPL with another
> OPAL code (opal_cec_reboot2), this request goes to Skiboot.
> Skiboot then triggers the "S0 Interrupt" to the SBE (Self Boot Engine),
> along with OPAL's relocated base address.
>
> SBE then stops all core clocks, and only does particular ISteps for a
> memory preserving boot.
>
> Then, hostboot comes up, and with help of the relocated base address, it
> accesses MDST & MDDT tables (S-source and D-destination), and preserves the
> memory regions according to the data in these tables.
> And after preserving, it writes the preserved memory region details to MDRT
> tables (R-Result), for the kernel to know where/whether a memory region is
> preserved.
>
> Both SBE's and hostboot responsiblities are implemented in the SBE code
> in QEMU.
>
> Then in the second kernel/crashkernel boot, OPAL passes the "mpipl-boot"
> property for the kernel to know that a dump is active, which kernel then
> exports in /proc/vmcore
>
> Testing
> ====================
>
> 1. Git tree for testing: https://gitlab.com/adi-g15-ibm/qemu/tree/fadump-powernv-v5
>
> 2. Gitlab pipeline: https://gitlab.com/adi-g15-ibm/qemu/-/pipelines/2375470651
>
> 3. Analysing generated vmcore:
>
> # ls -lh /proc/vmcore
> -r-------- 1 root root 4.5G Mar 10 12:30 /proc/vmcore
>
> # file /proc/vmcore
> /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style
>
> # crash vmlinux-38fec10eb60d-network vmcore-powernv-10mar26
> ...
> KERNEL: vmlinux-38fec10eb60d-network
> DUMPFILE: vmcore-powernv-10mar26
> CPUS: 2
> DATE: Thu Jan 1 05:30:00 IST 1970
> UPTIME: 00:00:50
> LOAD AVERAGE: 0.57, 0.19, 0.07
> TASKS: 83
> NODENAME: buildroot
> RELEASE: 6.14.0
> VERSION: #1 SMP Thu Apr 3 08:06:13 CDT 2025
> MACHINE: ppc64le (1000 Mhz)
> MEMORY: 6 GB
> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
> PID: 238
> COMMAND: "sh"
> TASK: c00000000a0f3200 [THREAD_INFO: c00000000a0f3200]
> CPU: 0
> STATE: TASK_RUNNING (PANIC)
>
> crash> # ps and kmem -i works
buildroot fadump test and `make check-functional-ppc64` are passing on V5.
Welcome to Buildroot
buildroot login: root
# dmesg | grep fadump
[ 0.000000][ T0] opal fadump: Kernel metadata addr: 653902a8
[ 0.000000][ T0] fadump: Reserved 768MB of memory at 0x00000035390000 (System RAM: 5120MB)
[ 0.000000][ T0] fadump: Initialized [0x36000000, 752MB] cma area from [0x35390000, 768MB] bytes of memory reserved for firmware-assisted dump
[ 0.000000][ T0] Kernel command line: console=hvc0 rootwait root=/dev/nvme0n1 fadump=on
[ 0.473711][ T1] opal fadump: Registration is successful!
# echo c > /proc/sysrq-trigger
<snip />
Welcome to Buildroot
buildroot login: root
# ls -alh /proc/vmcore
-r-------- 1 root root 4.3G Mar 10 14:07 /proc/vmcore
#
> make check-functional-ppc64
<snip />
16/16 func-thorough+func-ppc64-thorough+thorough - qemu:func-ppc64-fadump OK 65.42s 2 subtests passed
Ok: 9
Fail: 0
Skipped: 7
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
~Shivang.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
2026-03-10 12:46 [PATCH v5 00/10] Implement MPIPL for PowerNV Aditya Gupta
` (10 preceding siblings ...)
2026-03-10 14:12 ` [PATCH v5 00/10] Implement MPIPL for PowerNV Shivang Upadhyay
@ 2026-03-10 14:50 ` Sourabh Jain
2026-03-10 14:52 ` Aditya Gupta
11 siblings, 1 reply; 15+ messages in thread
From: Sourabh Jain @ 2026-03-10 14:50 UTC (permalink / raw)
To: Aditya Gupta, qemu-devel
Cc: qemu-ppc, Hari Bathini, Harsh Prateek Bora, Nicholas Piggin,
Miles Glenn, Chinmay Rath, Shivang Upadhyay
Thanks for adding fadump support on PowerNV platform.
The whole patch series looks good to me.
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
* Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
2026-03-10 14:12 ` [PATCH v5 00/10] Implement MPIPL for PowerNV Shivang Upadhyay
@ 2026-03-10 14:51 ` Aditya Gupta
0 siblings, 0 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 14:51 UTC (permalink / raw)
To: Shivang Upadhyay
Cc: qemu-devel, qemu-ppc, Hari Bathini, Sourabh Jain,
Harsh Prateek Bora, Nicholas Piggin, Miles Glenn, Chinmay Rath
On 10/03/26 19:42, Shivang Upadhyay wrote:
> On Tue, Mar 10, 2026 at 06:16:07PM +0530, Aditya Gupta wrote:
>> <...snip...>
> buildroot fadump test and `make check-functional-ppc64` are passing on V5.
>
> Welcome to Buildroot
> buildroot login: root
> # dmesg | grep fadump
> [ 0.000000][ T0] opal fadump: Kernel metadata addr: 653902a8
> [ 0.000000][ T0] fadump: Reserved 768MB of memory at 0x00000035390000 (System RAM: 5120MB)
> [ 0.000000][ T0] fadump: Initialized [0x36000000, 752MB] cma area from [0x35390000, 768MB] bytes of memory reserved for firmware-assisted dump
> [ 0.000000][ T0] Kernel command line: console=hvc0 rootwait root=/dev/nvme0n1 fadump=on
> [ 0.473711][ T1] opal fadump: Registration is successful!
> # echo c > /proc/sysrq-trigger
>
> <snip />
>
> Welcome to Buildroot
> buildroot login: root
> # ls -alh /proc/vmcore
> -r-------- 1 root root 4.3G Mar 10 14:07 /proc/vmcore
> #
>
>
> > make check-functional-ppc64
> <snip />
> 16/16 func-thorough+func-ppc64-thorough+thorough - qemu:func-ppc64-fadump OK 65.42s 2 subtests passed
>
> Ok: 9
> Fail: 0
> Skipped: 7
>
>
> Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Thank you for testing this series, Shivang!
- Aditya G
* Re: [PATCH v5 00/10] Implement MPIPL for PowerNV
2026-03-10 14:50 ` Sourabh Jain
@ 2026-03-10 14:52 ` Aditya Gupta
0 siblings, 0 replies; 15+ messages in thread
From: Aditya Gupta @ 2026-03-10 14:52 UTC (permalink / raw)
To: Sourabh Jain, qemu-devel
Cc: qemu-ppc, Hari Bathini, Harsh Prateek Bora, Nicholas Piggin,
Miles Glenn, Chinmay Rath, Shivang Upadhyay
On 10/03/26 20:20, Sourabh Jain wrote:
> Thanks for adding fadump support on the PowerNV platform.
>
> The whole patch series looks good to me.
> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Thank you for the detailed reviews, Sourabh!
- Aditya G