* [PULL 00/13] PPC PR for 11.1 (2026-04-29)
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC
  To: qemu-devel

The following changes since commit 282771e1f9b9b6e0147adf5f9d676325175b1767:

  Merge tag 'pull-riscv-to-apply-20260429-1' of https://github.com/alistair23/qemu into staging (2026-04-29 09:22:51 -0400)

are available in the Git repository at:

  https://gitlab.com/harshpb/qemu.git tags/pull-ppc-for-11.1-20260429

for you to fetch changes up to 1aee8067fce95d15061eca8fbb6772d8a90ea699:

  hw/intc/xics: Add a check for an invalid server id (2026-04-29 22:51:06 +0530)

----------------------------------------------------------------
PPC PR for 11.1

* MPIPL support for PowerNV
* ppc/pnv: Add a nest MMU model
* hw/ssi/pnv_spi: Fix fifo8 memory leak on unrealize
* hw/intc/xics: Add a check for an invalid server id
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEa4EM1tK+EPOIPSFCRUTplPnWj7sFAmnySUMACgkQRUTplPnW
j7veug/9ERfiOVoFLe9qYY+IRlAI7yWwieTW7gW1huXstcshk6e3y1tXH43DakE4
CAN5hzqBo/iUwgx7QaSgQUxtOU4waLURkBFWQUQ0syZcKKIg2rENELm4VN6GJunz
L5JOs0/55lcdLHCb4lJIhuW4AlYuVWYEzC9kGQD4dyliv7b9VygPnaYwWrmOM8KF
BiPXumDpbBJAwqnuMec08x6IU/I8CGyJuj6pbsPbL4XAVKfXmB5xM3zbK2gUUhky
cpD9AzdvPpeMCJCWwijYq3/s6ZqH4E4HrpEWqS8vVeorTvA4069bPw4ZBj6CFeJN
V1WvTMOukTwC4u6QMKnjrIaoKgIvtvHtSCTTdy6QhfMQxmguksAGMeoIDqKDLlQu
povwybYGh9viASpyPNaqkVM//ni1C68/rhsQ2wSk75f/D706M1JjGkTmAHclZRl3
xWiw3LxkGNka2BRWkfJCP+e7ntK4+k9j/kGwnLDTn870c7CcYO6bUiBdrNbqvWcP
5Rg5NeZeAc7caHZL+Zju28V2ntGdR+9dPyynDSliDJvUzb/biOdVLFfAZzD9rwYT
0yeLahsIlG2CQpr4LmmrQa7si+ZrYDHEUf3mjNPVS3rEDpdaso+TnSzulsYguIEM
fcmyhVw4cYqhaxn+nKpVQGfF/MQZfOHDmVi2DoEiTXr6xxTzLwc=
=Y8PU
-----END PGP SIGNATURE-----

----------------------------------------------------------------
Aditya Gupta (10):
      ppc/pnv: Move SBE host doorbell function to top of file
      ppc/mpipl: Implement S0 SBE interrupt
      ppc/pnv: Handle stash command in PowerNV SBE
      pnv/mpipl: Preserve memory regions as per MDST/MDDT tables
      pnv/mpipl: Preserve CPU registers after crash
      pnv/mpipl: Set thread entry size to be allocated by firmware
      pnv/mpipl: Write the preserved CPU and MDRT state
      pnv/mpipl: Enable MPIPL support
      tests/functional: Add test for MPIPL in PowerNV
      MAINTAINERS: Add entry for MPIPL (PowerNV)

Caleb Schlossin (2):
      hw/ssi/pnv_spi: Fix fifo8 memory leak on unrealize
      ppc/pnv: Add a nest MMU model

kiki (1):
      hw/intc/xics: Add a check for an invalid server id

 MAINTAINERS                           |   9 +
 include/hw/ppc/pnv.h                  |   6 +
 include/hw/ppc/pnv_chip.h             |   3 +
 include/hw/ppc/pnv_mpipl.h            | 168 ++++++++++++
 include/hw/ppc/pnv_nmmu.h             |  28 ++
 include/hw/ppc/pnv_xscom.h            |   4 +
 hw/intc/xics.c                        |   8 +
 hw/ppc/pnv.c                          | 128 ++++++++-
 hw/ppc/pnv_mpipl.c                    | 482 ++++++++++++++++++++++++++++++++++
 hw/ppc/pnv_nmmu.c                     | 132 ++++++++++
 hw/ppc/pnv_sbe.c                      |  85 +++++-
 hw/ssi/pnv_spi.c                      |   8 +
 hw/ppc/meson.build                    |   2 +
 tests/functional/ppc64/test_fadump.py |  35 +--
 14 files changed, 1066 insertions(+), 32 deletions(-)
 create mode 100644 include/hw/ppc/pnv_mpipl.h
 create mode 100644 include/hw/ppc/pnv_nmmu.h
 create mode 100644 hw/ppc/pnv_mpipl.c
 create mode 100644 hw/ppc/pnv_nmmu.c



* [PULL 01/13] ppc/pnv: Move SBE host doorbell function to top of file
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

Move 'pnv_sbe_set_host_doorbell' unchanged to above
'pnv_sbe_power9_xscom_ctrl_write'.

A later patch implements the S0 interrupt in
'pnv_sbe_power9_xscom_ctrl_write' and calls 'pnv_sbe_set_host_doorbell'
from it, so the host doorbell function must be defined before that
point.

No functional change.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-2-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/pnv_sbe.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/pnv_sbe.c b/hw/ppc/pnv_sbe.c
index 27383ce683..247617338a 100644
--- a/hw/ppc/pnv_sbe.c
+++ b/hw/ppc/pnv_sbe.c
@@ -80,6 +80,15 @@
 #define SBE_CONTROL_REG_S0              PPC_BIT(14)
 #define SBE_CONTROL_REG_S1              PPC_BIT(15)
 
+static void pnv_sbe_set_host_doorbell(PnvSBE *sbe, uint64_t val)
+{
+    val &= SBE_HOST_RESPONSE_MASK; /* Is this right? What does HW do? */
+    sbe->host_doorbell = val;
+
+    trace_pnv_sbe_reg_set_host_doorbell(val);
+    qemu_set_irq(sbe->psi_irq, !!val);
+}
+
 struct sbe_msg {
     uint64_t reg[4];
 };
@@ -125,15 +134,6 @@ static const MemoryRegionOps pnv_sbe_power9_xscom_ctrl_ops = {
     .endianness = DEVICE_BIG_ENDIAN,
 };
 
-static void pnv_sbe_set_host_doorbell(PnvSBE *sbe, uint64_t val)
-{
-    val &= SBE_HOST_RESPONSE_MASK; /* Is this right? What does HW do? */
-    sbe->host_doorbell = val;
-
-    trace_pnv_sbe_reg_set_host_doorbell(val);
-    qemu_set_irq(sbe->psi_irq, !!val);
-}
-
 /* SBE Target Type */
 #define SBE_TARGET_TYPE_PROC            0x00
 #define SBE_TARGET_TYPE_EX              0x01
-- 
2.52.0




* [PULL 02/13] ppc/mpipl: Implement S0 SBE interrupt
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

During MPIPL (aka fadump), after a kernel crash, the kernel makes the
opal_cec_reboot2 OPAL call to signal an abnormal termination.
When OPAL receives this call, it triggers the SBE S0 interrupt to
start an MPIPL boot.

The S0 interrupt is currently unimplemented in QEMU.

Implement the S0 interrupt as 'pause_vcpus' + 'guest_reset' in QEMU, as the
SBE's implementation of S0 seems to be basically "stop all clocks" and
then "host reset".

pause_vcpus is added in a later patch, together with register preserving
support.

See 'stopClocksS0' in SBE source code for more information.

Also log both S0 and S1 interrupts.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-3-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/pnv.h       |  5 +++++
 include/hw/ppc/pnv_mpipl.h | 19 +++++++++++++++++++
 hw/ppc/pnv_mpipl.c         | 26 ++++++++++++++++++++++++++
 hw/ppc/pnv_sbe.c           | 29 +++++++++++++++++++++++++++++
 hw/ppc/meson.build         |  1 +
 5 files changed, 80 insertions(+)
 create mode 100644 include/hw/ppc/pnv_mpipl.h
 create mode 100644 hw/ppc/pnv_mpipl.c

diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index ce3ce73b53..19c7170e74 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -25,6 +25,7 @@
 #include "hw/core/sysbus.h"
 #include "hw/ipmi/ipmi.h"
 #include "hw/ppc/pnv_pnor.h"
+#include "hw/ppc/pnv_mpipl.h"
 
 #define TYPE_PNV_CHIP "pnv-chip"
 
@@ -113,6 +114,7 @@ struct PnvMachineState {
     bool         lpar_per_core;
 
     Notifier     machine_init_done;
+    MpiplPreservedState mpipl_state;
 };
 
 PnvChip *pnv_get_chip(PnvMachineState *pnv, uint32_t chip_id);
@@ -292,4 +294,7 @@ void pnv_bmc_set_pnor(IPMIBmc *bmc, PnvPnor *pnor);
 
 #define PNV11_OCC_SENSOR_BASE(chip) PNV10_OCC_SENSOR_BASE(chip)
 
+/* MPIPL helpers */
+void do_mpipl_preserve(PnvMachineState *pnv);
+
 #endif /* PPC_PNV_H */
diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
new file mode 100644
index 0000000000..61ef7ef8fe
--- /dev/null
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -0,0 +1,19 @@
+/*
+ * Emulation of MPIPL (Memory Preserving Initial Program Load), aka fadump
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef PNV_MPIPL_H
+#define PNV_MPIPL_H
+
+#include <stdbool.h>
+
+typedef struct MpiplPreservedState MpiplPreservedState;
+
+/* Preserved state to be saved in PnvMachineState */
+struct MpiplPreservedState {
+    bool       is_next_boot_mpipl;
+};
+
+#endif
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
new file mode 100644
index 0000000000..d8c9b7a428
--- /dev/null
+++ b/hw/ppc/pnv_mpipl.c
@@ -0,0 +1,26 @@
+/*
+ * Emulation of MPIPL (Memory Preserving Initial Program Load), aka fadump
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "system/runstate.h"
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_mpipl.h"
+
+void do_mpipl_preserve(PnvMachineState *pnv)
+{
+    /* Mark next boot as Memory-preserving boot */
+    pnv->mpipl_state.is_next_boot_mpipl = true;
+
+    /*
+     * Do a guest reset.
+     * Next reset will see 'is_next_boot_mpipl' as true, and trigger MPIPL
+     *
+     * Requirement:
+     * GUEST_RESET is expected to NOT clear the memory, as is the case when
+     * this is merged
+     */
+    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+}
diff --git a/hw/ppc/pnv_sbe.c b/hw/ppc/pnv_sbe.c
index 247617338a..5a2b3342d1 100644
--- a/hw/ppc/pnv_sbe.c
+++ b/hw/ppc/pnv_sbe.c
@@ -26,6 +26,9 @@
 #include "hw/ppc/pnv.h"
 #include "hw/ppc/pnv_xscom.h"
 #include "hw/ppc/pnv_sbe.h"
+#include "hw/ppc/pnv_mpipl.h"
+#include "system/cpus.h"
+#include "system/runstate.h"
 #include "trace.h"
 
 /*
@@ -113,11 +116,37 @@ static uint64_t pnv_sbe_power9_xscom_ctrl_read(void *opaque, hwaddr addr,
 static void pnv_sbe_power9_xscom_ctrl_write(void *opaque, hwaddr addr,
                                        uint64_t val, unsigned size)
 {
+    PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
+    PnvSBE *sbe = opaque;
     uint32_t offset = addr >> 3;
 
     trace_pnv_sbe_xscom_ctrl_write(addr, val);
 
     switch (offset) {
+    case SBE_CONTROL_REG_RW:
+        switch (val) {
+        case SBE_CONTROL_REG_S0:
+            qemu_log_mask(LOG_UNIMP, "SBE: S0 Interrupt triggered\n");
+
+            pnv_sbe_set_host_doorbell(sbe, sbe->host_doorbell | SBE_HOST_RESPONSE_MASK);
+
+            /* Preserve memory regions and CPU state, if MPIPL is registered */
+            do_mpipl_preserve(pnv);
+
+            /*
+             * Control may not come back here as 'do_mpipl_preserve' triggers
+             * a guest reboot
+             */
+            break;
+        case SBE_CONTROL_REG_S1:
+            qemu_log_mask(LOG_UNIMP, "SBE: S1 Interrupt triggered\n");
+            break;
+        default:
+            qemu_log_mask(LOG_UNIMP,
+                "SBE: CONTROL_REG_RW: Unknown value: 0x%"
+                  HWADDR_PRIx "\n", val);
+        }
+        break;
     default:
         qemu_log_mask(LOG_UNIMP, "SBE Unimplemented register: Ox%"
                       HWADDR_PRIx "\n", addr >> 3);
diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index f7dac87a2a..c61fba4ec8 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -56,6 +56,7 @@ ppc_ss.add(when: 'CONFIG_POWERNV', if_true: files(
   'pnv_pnor.c',
   'pnv_nest_pervasive.c',
   'pnv_n1_chiplet.c',
+  'pnv_mpipl.c',
 ))
 # PowerPC 4xx boards
 ppc_ss.add(when: 'CONFIG_PPC405', if_true: files(
-- 
2.52.0
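
The control-register decode in the patch above can be sketched in
isolation. This is a minimal model, not the QEMU code: the PPC_BIT
definition is assumed to match QEMU's IBM bit numbering (bit 0 is the
most significant bit), and the sbe_ctrl_action enum and
sbe_classify_ctrl_write() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit value. */
#define PPC_BIT(bit)          (0x8000000000000000ULL >> (bit))
#define SBE_CONTROL_REG_S0    PPC_BIT(14)
#define SBE_CONTROL_REG_S1    PPC_BIT(15)

/* Hypothetical result codes, used only by this sketch. */
enum sbe_ctrl_action {
    SBE_ACTION_MPIPL,      /* S0: preserve state, then reset the guest */
    SBE_ACTION_LOG_ONLY,   /* S1: only logged by the patch */
    SBE_ACTION_UNKNOWN,    /* anything else, including S0 | S1 together */
};

/* Mirrors the exact-match 'switch (val)' used in the patch. */
static enum sbe_ctrl_action sbe_classify_ctrl_write(uint64_t val)
{
    switch (val) {
    case SBE_CONTROL_REG_S0:
        return SBE_ACTION_MPIPL;
    case SBE_CONTROL_REG_S1:
        return SBE_ACTION_LOG_ONLY;
    default:
        return SBE_ACTION_UNKNOWN;
    }
}
```

Because the switch matches the written value exactly, a write that sets
both bits at once falls through to the "unknown value" path in this
model.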




* [PULL 03/13] ppc/pnv: Handle stash command in PowerNV SBE
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

Previously, the SBE_CMD_STASH_MPIPL_CONFIG command was not handled, so
skiboot got no response from the SBE:

    [  106.350742821,3] SBE: Message timeout [chip id = 0], cmd = d7, subcmd = 7
    [  106.352067746,3] SBE: Failed to send stash MPIPL config [chip id = 0x0, rc = 254]

Fix this by handling the command in the PowerNV SBE and sending a response,
so skiboot knows the SBE has handled the STASH command.

The stashed skiboot base is later used to access the relocated MDST/MDDT
tables when MPIPL is implemented.

The purpose of stashing the relocated base address is explained in the
following skiboot commit:

    author Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Fri Jul 12 16:47:51 2019 +0530
    committer Oliver O'Halloran <oohall@gmail.com> Thu Aug 15 17:53:39 2019 +1000

    SBE: Send OPAL relocated base address to SBE

      OPAL relocates itself during boot. During memory preserving IPL hostboot needs
      to access relocated OPAL base address to get MDST, MDDT tables. Hence send
      relocated base address to SBE via 'stash MPIPL config' chip-op. During next
      IPL SBE will send stashed data to hostboot... so that hostboot can access
      these data.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-4-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/pnv_mpipl.h |  5 +++++
 hw/ppc/pnv_sbe.c           | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
index 61ef7ef8fe..d1d542b724 100644
--- a/include/hw/ppc/pnv_mpipl.h
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -8,11 +8,16 @@
 #define PNV_MPIPL_H
 
 #include <stdbool.h>
+#include <stdint.h>
+
+#include "exec/hwaddr.h"
 
 typedef struct MpiplPreservedState MpiplPreservedState;
 
 /* Preserved state to be saved in PnvMachineState */
 struct MpiplPreservedState {
+    /* skiboot_base will be valid only after OPAL sends relocated base to SBE */
+    hwaddr     skiboot_base;
     bool       is_next_boot_mpipl;
 };
 
diff --git a/hw/ppc/pnv_sbe.c b/hw/ppc/pnv_sbe.c
index 5a2b3342d1..90fc407d05 100644
--- a/hw/ppc/pnv_sbe.c
+++ b/hw/ppc/pnv_sbe.c
@@ -233,8 +233,11 @@ static void sbe_timer(void *opaque)
 
 static void do_sbe_msg(PnvSBE *sbe)
 {
+    PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
+    MachineState *machine = MACHINE(pnv);
     struct sbe_msg msg;
     uint16_t cmd, ctrl_flags, seq_id;
+    uint64_t mbox_val;
     int i;
 
     memset(&msg, 0, sizeof(msg));
@@ -265,6 +268,41 @@ static void do_sbe_msg(PnvSBE *sbe)
             timer_del(sbe->timer);
         }
         break;
+    case SBE_CMD_STASH_MPIPL_CONFIG:
+        /* key = sbe->mbox[1] */
+        switch (sbe->mbox[1]) {
+        case SBE_STASH_KEY_SKIBOOT_BASE:
+            mbox_val = sbe->mbox[2];
+            if (mbox_val >= machine->ram_size) {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                  "SBE: skiboot_base 0x%" PRIx64
+                  " exceeds RAM size 0x" RAM_ADDR_FMT "\n",
+                  mbox_val, machine->ram_size);
+                return;
+            }
+
+            pnv->mpipl_state.skiboot_base = mbox_val;
+            qemu_log_mask(LOG_UNIMP,
+                "Stashing skiboot base: 0x%" HWADDR_PRIx "\n",
+                pnv->mpipl_state.skiboot_base);
+
+            /*
+             * Set the response register.
+             *
+             * Currently setting the same sequence number in
+             * response as we got in the request.
+             */
+            sbe->mbox[4] = sbe->mbox[0];    /* sequence number */
+            pnv_sbe_set_host_doorbell(sbe,
+                    sbe->host_doorbell | SBE_HOST_RESPONSE_WAITING);
+
+            break;
+        default:
+            qemu_log_mask(LOG_UNIMP,
+                "SBE: CMD_STASH_MPIPL_CONFIG: Unimplemented key: 0x" TARGET_FMT_lx "\n",
+                sbe->mbox[1]);
+        }
+        break;
     default:
         qemu_log_mask(LOG_UNIMP, "SBE Unimplemented command: 0x%x\n", cmd);
     }
-- 
2.52.0
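
The stash handling in the patch above reduces to three steps: check the
key, validate the base against the RAM size, then store it and raise a
response. The following is a minimal sketch under stated assumptions:
struct sbe_model, sbe_model_stash() and MODEL_KEY_SKIBOOT_BASE (with its
placeholder value) are hypothetical stand-ins, and a boolean flag models
the SBE_HOST_RESPONSE_WAITING doorbell bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder key value for this sketch; not taken from the patch. */
#define MODEL_KEY_SKIBOOT_BASE 0x03

/* Hypothetical, simplified stand-in for the SBE and machine state. */
struct sbe_model {
    uint64_t mbox[8];        /* [0]: request header, [1]: key, [2]: value */
    uint64_t skiboot_base;   /* stashed relocated base */
    bool     response_ready; /* models SBE_HOST_RESPONSE_WAITING */
};

/* Returns true only if the base was stashed and a response was raised. */
static bool sbe_model_stash(struct sbe_model *sbe, uint64_t ram_size)
{
    if (sbe->mbox[1] != MODEL_KEY_SKIBOOT_BASE) {
        return false;               /* unimplemented key: no response */
    }
    if (sbe->mbox[2] >= ram_size) {
        return false;               /* base outside RAM: no response */
    }

    sbe->skiboot_base = sbe->mbox[2];
    sbe->mbox[4] = sbe->mbox[0];    /* echo the request header back */
    sbe->response_ready = true;
    return true;
}
```

As in the patch, the rejected cases send no response in this model, so a
caller waiting on the doorbell would see a timeout for them.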




* [PULL 04/13] pnv/mpipl: Preserve memory regions as per MDST/MDDT tables
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

Implement copying of the memory regions listed in the MDST and MDDT
tables.

Copy each memory region from source to destination in chunks of 32 MiB.

Note that QEMU can fail to preserve a particular entry for several reasons,
such as:
  * region length mismatch between MDST and MDDT
  * failed copy due to memory access or decode errors

HDAT doesn't specify any field in MDRT to notify the host about such errors.

However, HDAT section "15.3.1.3 Memory Dump Results Table (MDRT)" says:
    The Memory Dump Results Table is a list of the memory ranges that
    have been included in the dump

Based on this statement, MDRT should include only the regions that were
successfully captured in the dump. Regions that QEMU fails to dump are
therefore skipped and have no corresponding entry in MDRT.
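
The chunked copy described above can be sketched outside QEMU as follows.
This is an illustration under stated assumptions, not the patch's code:
plain memcpy() stands in for address_space_read() and
address_space_write(), and chunk_count() and copy_in_chunks() are
hypothetical helpers. The chunk count uses integer ceiling division,
since a single-precision float carries only 24 significant bits and a
length such as 32 MiB + 1 can round down before ceil() is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Integer ceiling division; no floating-point rounding involved. */
static uint64_t chunk_count(uint64_t len, uint64_t chunk)
{
    return (len + chunk - 1) / chunk;
}

/*
 * Copy 'len' bytes from 'src' to 'dst' through a bounce buffer of
 * 'chunk' bytes. Offsets are used instead of advancing the pointers,
 * so the caller's base addresses stay unchanged and can be recorded
 * in a result table afterwards. Returns the number of chunks copied.
 */
static uint64_t copy_in_chunks(uint8_t *dst, const uint8_t *src, size_t len,
                               uint8_t *bounce, size_t chunk)
{
    uint64_t n = chunk_count(len, chunk);

    for (uint64_t i = 0; i < n; i++) {
        size_t off = (size_t)(i * chunk);
        size_t left = len - off;
        size_t step = left < chunk ? left : chunk;

        memcpy(bounce, src + off, step);   /* "read" from the source */
        memcpy(dst + off, bounce, step);   /* "write" to the destination */
    }
    return n;
}
```

A 10-byte region with a 4-byte bounce buffer is copied in three steps of
4, 4 and 2 bytes; the last step is the shorter remainder.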

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-5-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/pnv_mpipl.h |  84 +++++++++++++++++++
 hw/ppc/pnv_mpipl.c         | 162 +++++++++++++++++++++++++++++++++++++
 2 files changed, 246 insertions(+)

diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
index d1d542b724..b3d980dfef 100644
--- a/include/hw/ppc/pnv_mpipl.h
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -7,18 +7,102 @@
 #ifndef PNV_MPIPL_H
 #define PNV_MPIPL_H
 
+#include <assert.h>
 #include <stdbool.h>
 #include <stdint.h>
 
 #include "exec/hwaddr.h"
+#include "qemu/compiler.h"
 
+typedef struct MdstTableEntry MdstTableEntry;
+typedef struct MdrtTableEntry MdrtTableEntry;
 typedef struct MpiplPreservedState MpiplPreservedState;
 
+/*
+ * Following offsets are copied from skiboot source code.
+ * These need to be updated if this changes in a future skiboot version
+ */
+/* Use 768 bytes for SPIRAH */
+#define SPIRAH_OFF      0x00010000
+#define SPIRAH_SIZE     0x300
+
+/* Use 256 bytes for processor dump area */
+#define PROC_DUMP_AREA_OFF  (SPIRAH_OFF + SPIRAH_SIZE)
+#define PROC_DUMP_AREA_SIZE 0x100
+
+#define PROCIN_OFF      (PROC_DUMP_AREA_OFF + PROC_DUMP_AREA_SIZE)
+#define PROCIN_SIZE     0x800
+
+/* Offsets of MDST and MDDT tables from skiboot base */
+#define MDST_TABLE_OFF      (PROCIN_OFF + PROCIN_SIZE)
+#define MDST_TABLE_SIZE     0x400
+
+#define MDDT_TABLE_OFF      (MDST_TABLE_OFF + MDST_TABLE_SIZE)
+#define MDDT_TABLE_SIZE     0x400
+/*
+ * Offset of the dump result table MDRT. Hostboot will write to this
+ * memory after moving memory content from source to destination memory.
+ */
+#define MDRT_TABLE_OFF         0x01c00000
+#define MDRT_TABLE_SIZE        0x00008000
+
+/* HRMOR_BIT copied from skiboot */
+#define HRMOR_BIT (1ull << 63)
+
+/*
+ * Memory Dump Source Table (MDST)
+ *
+ * Format of this table is same as Memory Dump Source Table defined in HDAT
+ */
+struct MdstTableEntry {
+    uint64_t  addr;
+    uint8_t data_region;
+    uint8_t dump_type;
+    uint16_t  reserved;
+    uint32_t  size;
+} QEMU_PACKED;
+
+/* Memory dump destination table (MDDT) has same structure as MDST */
+typedef MdstTableEntry MddtTableEntry;
+
+/*
+ * Memory dump result table (MDRT)
+ *
+ * List of the memory ranges that have been included in the dump. This table is
+ * filled by hostboot and passed to OPAL on second boot. OPAL/payload will use
+ * this table to extract the dump.
+ *
+ * Note: This structure differs from HDAT, but matches the structure
+ * skiboot uses
+ */
+struct MdrtTableEntry {
+    uint64_t  src_addr;
+    uint64_t  dest_addr;
+    uint8_t data_region;
+    uint8_t dump_type;  /* unused */
+    uint16_t  reserved;   /* unused */
+    uint32_t  size;
+    uint64_t  padding;    /* unused */
+} QEMU_PACKED;
+
+/* Maximum length of mdst/mddt/mdrt tables */
+#define MDST_MAX_ENTRIES    (MDST_TABLE_SIZE / sizeof(MdstTableEntry))
+#define MDDT_MAX_ENTRIES    (MDDT_TABLE_SIZE / sizeof(MddtTableEntry))
+#define MDRT_MAX_ENTRIES    (MDRT_TABLE_SIZE / sizeof(MdrtTableEntry))
+
+static_assert(MDST_MAX_ENTRIES == MDDT_MAX_ENTRIES,
+        "Maximum entries in MDDT must match MDST");
+static_assert(MDRT_MAX_ENTRIES >= MDST_MAX_ENTRIES,
+        "MDRT should support atleast having number of entries as in MDST");
+
 /* Preserved state to be saved in PnvMachineState */
 struct MpiplPreservedState {
     /* skiboot_base will be valid only after OPAL sends relocated base to SBE */
     hwaddr     skiboot_base;
     bool       is_next_boot_mpipl;
+
+    MdrtTableEntry *mdrt_table;
+    uint32_t num_mdrt_entries;
 };
 
 #endif
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
index d8c9b7a428..cef1fe2c40 100644
--- a/hw/ppc/pnv_mpipl.c
+++ b/hw/ppc/pnv_mpipl.c
@@ -5,12 +5,174 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "system/address-spaces.h"
 #include "system/runstate.h"
 #include "hw/ppc/pnv.h"
 #include "hw/ppc/pnv_mpipl.h"
+#include <math.h>
+
+#define MDST_TABLE_RELOCATED                            \
+    (pnv->mpipl_state.skiboot_base + MDST_TABLE_OFF)
+#define MDDT_TABLE_RELOCATED                            \
+    (pnv->mpipl_state.skiboot_base + MDDT_TABLE_OFF)
+
+/*
+ * Preserve the memory regions as pointed by MDST table
+ *
+ * During this, the memory region pointed by entries in MDST, are 'copied'
+ * as it is to the memory region pointed by corresponding entry in MDDT
+ *
+ * Notes: All reads should consider data coming from skiboot as big-endian,
+ *        and data written should also be in big-endian
+ */
+static bool pnv_mpipl_preserve_mem(PnvMachineState *pnv)
+{
+    g_autofree MdstTableEntry *mdst = g_malloc(MDST_TABLE_SIZE);
+    g_autofree MddtTableEntry *mddt = g_malloc(MDDT_TABLE_SIZE);
+    g_autofree MdrtTableEntry *mdrt = g_malloc0(MDRT_TABLE_SIZE);
+    AddressSpace *default_as = &address_space_memory;
+    MemTxResult io_result;
+    MemTxAttrs attrs = { 0 };
+    uint64_t src_addr, dest_addr;
+    uint32_t data_len;
+    uint64_t num_chunks, chunk_id = 0;
+    int mdrt_idx = 0;
+
+    /* Mark the memory transactions as privileged memory access */
+    attrs.user = 0;
+    attrs.memory = 1;
+
+    if (pnv->mpipl_state.mdrt_table) {
+        /*
+         * MDRT table allocated from some past crash, free the memory to
+         * prevent memory leak
+         */
+        g_free(pnv->mpipl_state.mdrt_table);
+        pnv->mpipl_state.num_mdrt_entries = 0;
+    }
+
+    io_result = address_space_read(default_as, MDST_TABLE_RELOCATED, attrs,
+            mdst, MDST_TABLE_SIZE);
+    if (io_result != MEMTX_OK) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "MPIPL: Failed to read MDST table at: 0x" TARGET_FMT_lx "\n",
+            MDST_TABLE_RELOCATED);
+
+        return false;
+    }
+
+    io_result = address_space_read(default_as, MDDT_TABLE_RELOCATED, attrs,
+            mddt, MDDT_TABLE_SIZE);
+    if (io_result != MEMTX_OK) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "MPIPL: Failed to read MDDT table at: 0x" TARGET_FMT_lx "\n",
+            MDDT_TABLE_RELOCATED);
+
+        return false;
+    }
+
+    /* Try to read all entries */
+    for (int i = 0; i < MDST_MAX_ENTRIES; ++i) {
+        g_autofree uint8_t *copy_buffer = NULL;
+        bool is_copy_failed = false;
+
+        /* Considering entry with address and size as 0, as end of table */
+        if ((mdst[i].addr == 0) && (mdst[i].size == 0)) {
+            break;
+        }
+
+        if (mdst[i].size != mddt[i].size) {
+            qemu_log_mask(LOG_TRACE,
+                    "Warning: Invalid entry, size mismatch in MDST & MDDT\n");
+            continue;
+        }
+
+        if (mdst[i].data_region != mddt[i].data_region) {
+            qemu_log_mask(LOG_TRACE,
+                    "Warning: Invalid entry, region mismatch in MDST & MDDT\n");
+            continue;
+        }
+
+        src_addr  = be64_to_cpu(mdst[i].addr) & ~HRMOR_BIT;
+        dest_addr = be64_to_cpu(mddt[i].addr) & ~HRMOR_BIT;
+        data_len   = be32_to_cpu(mddt[i].size);
+
+#define COPY_CHUNK_SIZE  ((size_t)(32 * MiB))
+        copy_buffer = g_try_malloc(COPY_CHUNK_SIZE);
+        if (copy_buffer == NULL) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "MPIPL: Failed allocating memory (size: %zu) for copying"
+                " reserved memory regions\n", COPY_CHUNK_SIZE);
+            is_copy_failed = true;
+            continue;
+        }
+
+        chunk_id = 0;
+        num_chunks = DIV_ROUND_UP((uint64_t)data_len, COPY_CHUNK_SIZE);
+        while (chunk_id < num_chunks) {
+            /* Take minimum of bytes left to copy, and chunk size */
+            uint64_t copy_len = MIN(
+                            data_len - (chunk_id * COPY_CHUNK_SIZE),
+                            COPY_CHUNK_SIZE
+                        );
+
+            /* Copy the source region to destination */
+            io_result = address_space_read(default_as, src_addr, attrs,
+                    copy_buffer, copy_len);
+            if (io_result != MEMTX_OK) {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                    "MPIPL: Failed to read region at: 0x%" PRIx64 "\n",
+                    src_addr);
+                is_copy_failed = true;
+                break;
+            }
+
+            io_result = address_space_write(default_as, dest_addr, attrs,
+                    copy_buffer, copy_len);
+            if (io_result != MEMTX_OK) {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                    "MPIPL: Failed to write region at: 0x%" PRIx64 "\n",
+                    dest_addr);
+                is_copy_failed = true;
+                break;
+            }
+
+            src_addr += COPY_CHUNK_SIZE;
+            dest_addr += COPY_CHUNK_SIZE;
+            ++chunk_id;
+        }
+#undef COPY_CHUNK_SIZE
+
+        if (is_copy_failed) {
+            /*
+             * HDAT doesn't specify an error code in MDRT for failed copy,
+             * and doesn't specify how this is to be handled
+             * Hence just skip adding an entry in MDRT, as done for size
+             * mismatch or other inconsistency between MDST/MDDT
+             */
+            continue;
+        }
+
+        /* Populate entry in MDRT table if preserving successful */
+        mdrt[mdrt_idx].src_addr    = cpu_to_be64(src_addr);
+        mdrt[mdrt_idx].dest_addr   = cpu_to_be64(dest_addr);
+        mdrt[mdrt_idx].size        = cpu_to_be32(data_len);
+        mdrt[mdrt_idx].data_region = mdst[i].data_region;
+        ++mdrt_idx;
+    }
+
+    pnv->mpipl_state.mdrt_table = g_steal_pointer(&mdrt);
+    pnv->mpipl_state.num_mdrt_entries = mdrt_idx;
+
+    return true;
+}
 
 void do_mpipl_preserve(PnvMachineState *pnv)
 {
+    pnv_mpipl_preserve_mem(pnv);
+
     /* Mark next boot as Memory-preserving boot */
     pnv->mpipl_state.is_next_boot_mpipl = true;
 
-- 
2.52.0
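
The table offsets and entry counts defined in pnv_mpipl.h above can be
worked out as a compile-time check. This is a sketch assuming a GCC or
Clang compiler in C11 mode; the struct names and mdst_entry_addr() are
hypothetical, and the base address in the usage note is only an example
value.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from the skiboot base, as given in pnv_mpipl.h in the patch. */
#define SPIRAH_OFF           0x00010000
#define SPIRAH_SIZE          0x300
#define PROC_DUMP_AREA_OFF   (SPIRAH_OFF + SPIRAH_SIZE)
#define PROC_DUMP_AREA_SIZE  0x100
#define PROCIN_OFF           (PROC_DUMP_AREA_OFF + PROC_DUMP_AREA_SIZE)
#define PROCIN_SIZE          0x800
#define MDST_TABLE_OFF       (PROCIN_OFF + PROCIN_SIZE)
#define MDST_TABLE_SIZE      0x400
#define MDDT_TABLE_OFF       (MDST_TABLE_OFF + MDST_TABLE_SIZE)
#define MDRT_TABLE_SIZE      0x00008000

/* Same field layout as the patch; packed via the GCC/Clang attribute. */
struct mdst_entry {
    uint64_t addr;
    uint8_t  data_region;
    uint8_t  dump_type;
    uint16_t reserved;
    uint32_t size;
} __attribute__((packed));

struct mdrt_entry {
    uint64_t src_addr;
    uint64_t dest_addr;
    uint8_t  data_region;
    uint8_t  dump_type;
    uint16_t reserved;
    uint32_t size;
    uint64_t padding;
} __attribute__((packed));

/* The derived values, checked at compile time. */
static_assert(MDST_TABLE_OFF == 0x10c00, "MDST follows PROCIN");
static_assert(MDDT_TABLE_OFF == 0x11000, "MDDT follows MDST");
static_assert(sizeof(struct mdst_entry) == 16, "MDST entry is 16 bytes");
static_assert(sizeof(struct mdrt_entry) == 32, "MDRT entry is 32 bytes");
static_assert(MDST_TABLE_SIZE / sizeof(struct mdst_entry) == 64,
              "MDST holds 64 entries");
static_assert(MDRT_TABLE_SIZE / sizeof(struct mdrt_entry) == 1024,
              "MDRT holds 1024 entries");

/* Address of MDST entry 'i' for a given relocated skiboot base. */
static uint64_t mdst_entry_addr(uint64_t skiboot_base, unsigned int i)
{
    return skiboot_base + MDST_TABLE_OFF
           + (uint64_t)i * sizeof(struct mdst_entry);
}
```

With an example base of 0x30000000, entry 0 of the MDST sits at
0x30010c00 and entry 2 at 0x30010c20.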




* [PULL 05/13] pnv/mpipl: Preserve CPU registers after crash
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

The kernel expects the platform to provide CPU registers after pausing
execution of the CPUs.

Currently only the registers that Linux uses to generate /proc/vmcore
are exported.
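
The per-thread register layout that this patch adds to pnv_mpipl.h can be
checked with a small size calculation. This is a sketch assuming a GCC or
Clang compiler in C11 mode; the struct names and reg_entry_offset() are
hypothetical, and only the field layout is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS_PER_CPU 66 /* 32 GPRs + 34 SPRs, as in the patch */

/* Header preceding the register entries of one thread. */
struct reg_data_hdr {
    uint32_t pir;
    uint8_t  core_state;
    uint8_t  reserved[3];
    uint32_t off_regentries;
    uint32_t num_regentries;
    uint32_t alloc_size;
    uint32_t act_size;
} __attribute__((packed));

/* One register: type, number and 64-bit value. */
struct reg_entry {
    uint32_t reg_type;
    uint32_t reg_num;
    uint64_t reg_val;
} __attribute__((packed));

/* Header plus a fixed array of entries, as in MpiplPreservedCPUState. */
struct preserved_cpu_state {
    struct reg_data_hdr hdr;
    struct reg_entry    reg_entries[NUM_REGS_PER_CPU];
};

static_assert(sizeof(struct reg_data_hdr) == 24, "header is 24 bytes");
static_assert(sizeof(struct reg_entry) == 16, "entry is 16 bytes");
static_assert(sizeof(struct preserved_cpu_state) == 24 + 66 * 16,
              "1080 bytes per thread");

/* Byte offset of register entry 'idx' within one thread's data. */
static size_t reg_entry_offset(unsigned int idx)
{
    return sizeof(struct reg_data_hdr) + idx * sizeof(struct reg_entry);
}
```

Under these assumptions each thread occupies 1080 bytes: a 24-byte header
followed by 66 entries of 16 bytes each.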

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-6-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/pnv_mpipl.h |  60 +++++++++++++++
 hw/ppc/pnv_mpipl.c         | 154 +++++++++++++++++++++++++++++++++++++
 2 files changed, 214 insertions(+)

diff --git a/include/hw/ppc/pnv_mpipl.h b/include/hw/ppc/pnv_mpipl.h
index b3d980dfef..aa2936caa7 100644
--- a/include/hw/ppc/pnv_mpipl.h
+++ b/include/hw/ppc/pnv_mpipl.h
@@ -17,6 +17,10 @@
 typedef struct MdstTableEntry MdstTableEntry;
 typedef struct MdrtTableEntry MdrtTableEntry;
 typedef struct MpiplPreservedState MpiplPreservedState;
+typedef struct MpiplRegDataHdr MpiplRegDataHdr;
+typedef struct MpiplRegEntry MpiplRegEntry;
+typedef struct MpiplProcDumpArea MpiplProcDumpArea;
+typedef struct MpiplPreservedCPUState MpiplPreservedCPUState;
 
 /*
  * Following offsets are copied from skiboot source code.
@@ -49,6 +53,8 @@ typedef struct MpiplPreservedState MpiplPreservedState;
 /* HRMOR_BIT copied from skiboot */
 #define HRMOR_BIT (1ull << 63)
 
+#define NUM_REGS_PER_CPU 66 /*(32 GPRs, 34 SPRs)*/
+
 /*
  * Memory Dump Source Table (MDST)
  *
@@ -95,6 +101,55 @@ static_assert(MDST_MAX_ENTRIES == MDDT_MAX_ENTRIES,
 static_assert(MDRT_MAX_ENTRIES >= MDST_MAX_ENTRIES,
         "MDRT should support atleast having number of entries as in MDST");
 
+/*
+ * Processor Dump Area
+ *
+ * This contains the information needed for having processor
+ * state captured during a platform dump.
+ *
+ * As mentioned in HDAT, following the P9 specific format
+ */
+struct MpiplProcDumpArea {
+    uint32_t  thread_size;    /* Size of each thread register entry */
+#define PROC_DUMP_AREA_VERSION_P9    0x1    /* P9 format */
+    uint8_t version;
+    uint8_t reserved[11];
+    uint64_t  alloc_addr;    /* Destination memory to place register data */
+    uint32_t  reserved2;
+    uint32_t  alloc_size;    /* Allocated size */
+    uint64_t  dest_addr;     /* Destination address */
+    uint32_t  reserved3;
+    uint32_t  act_size;      /* Actual data size */
+} QEMU_PACKED;
+
+/*
+ * "Architected Register Data" in the HDAT spec
+ *
+ * Acts as a header to the register entries for a particular thread
+ */
+struct MpiplRegDataHdr {
+    uint32_t pir;         /* PIR of thread */
+    uint8_t  core_state;  /* Stop state of the overall core */
+    uint8_t  reserved[3];
+    uint32_t off_regentries;  /* Offset to Register Entries Array */
+    uint32_t num_regentries;  /* Number of Register Entries in Array */
+    uint32_t alloc_size;  /* Allocated size for each Register Entry */
+    uint32_t act_size;    /* Actual size for each Register Entry */
+} QEMU_PACKED;
+
+struct MpiplRegEntry {
+    uint32_t reg_type;
+    uint32_t reg_num;
+    uint64_t reg_val;
+} QEMU_PACKED;
+
+struct MpiplPreservedCPUState {
+    MpiplRegDataHdr hdr;
+
+    /* Length of 'reg_entries' is hdr.num_regentries */
+    MpiplRegEntry  reg_entries[NUM_REGS_PER_CPU];
+};
+
 /* Preserved state to be saved in PnvMachineState */
 struct MpiplPreservedState {
     /* skiboot_base will be valid only after OPAL sends relocated base to SBE */
@@ -103,6 +158,11 @@ struct MpiplPreservedState {
 
     MdrtTableEntry *mdrt_table;
     uint32_t num_mdrt_entries;
+
+    MpiplProcDumpArea proc_area;
+
+    MpiplPreservedCPUState *cpu_states;
+    uint32_t num_cpu_states;
 };
 
 #endif
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
index cef1fe2c40..308948b829 100644
--- a/hw/ppc/pnv_mpipl.c
+++ b/hw/ppc/pnv_mpipl.c
@@ -8,6 +8,9 @@
 #include "qemu/log.h"
 #include "qemu/units.h"
 #include "system/address-spaces.h"
+#include "system/cpus.h"
+#include "system/hw_accel.h"
+#include "system/memory.h"
 #include "system/runstate.h"
 #include "hw/ppc/pnv.h"
 #include "hw/ppc/pnv_mpipl.h"
@@ -17,6 +20,8 @@
     (pnv->mpipl_state.skiboot_base + MDST_TABLE_OFF)
 #define MDDT_TABLE_RELOCATED                            \
     (pnv->mpipl_state.skiboot_base + MDDT_TABLE_OFF)
+#define PROC_DUMP_RELOCATED                             \
+    (pnv->mpipl_state.skiboot_base + PROC_DUMP_AREA_OFF)
 
 /*
  * Preserve the memory regions as pointed by MDST table
@@ -169,9 +174,158 @@ static bool pnv_mpipl_preserve_mem(PnvMachineState *pnv)
     return true;
 }
 
+static void do_store_cpu_regs(CPUState *cpu, MpiplPreservedCPUState *state)
+{
+    CPUPPCState *env = cpu_env(cpu);
+    MpiplRegDataHdr *regs_hdr = &state->hdr;
+    MpiplRegEntry *reg_entries = state->reg_entries;
+    MpiplRegEntry *curr_reg_entry;
+    uint32_t num_saved_regs = 0;
+
+    cpu_synchronize_state(cpu);
+
+    regs_hdr->pir = cpu_to_be32(env->spr[SPR_PIR]);
+
+    /* QEMU CPUs are not in Power Saving Mode */
+    regs_hdr->core_state = 0xff;
+
+    regs_hdr->off_regentries = 0;
+    regs_hdr->num_regentries = cpu_to_be32(NUM_REGS_PER_CPU);
+
+    regs_hdr->alloc_size = cpu_to_be32(sizeof(MpiplRegEntry));
+    regs_hdr->act_size   = cpu_to_be32(sizeof(MpiplRegEntry));
+
+#define REG_TYPE_GPR  0x1
+#define REG_TYPE_SPR  0x2
+#define REG_TYPE_TIMA 0x3
+
+/*
+ * ID numbers used by f/w while populating certain registers
+ *
+ * Copied these defines from the linux kernel
+ */
+#define REG_ID_NIP          0x7D0
+#define REG_ID_MSR          0x7D1
+#define REG_ID_CCR          0x7D2
+
+    curr_reg_entry = reg_entries;
+
+#define REG_ENTRY(type, num, val)                          \
+    do {                                               \
+        curr_reg_entry->reg_type = cpu_to_be32(type);  \
+        curr_reg_entry->reg_num  = cpu_to_be32(num);   \
+        curr_reg_entry->reg_val  = cpu_to_be64(val);   \
+        ++curr_reg_entry;                              \
+        ++num_saved_regs;                            \
+    } while (0)
+
+    /* Save the GPRs */
+    for (int gpr_id = 0; gpr_id < 32; ++gpr_id) {
+        REG_ENTRY(REG_TYPE_GPR, gpr_id, env->gpr[gpr_id]);
+    }
+
+    REG_ENTRY(REG_TYPE_SPR, SPR_ACOP, env->spr[SPR_ACOP]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_AMR, env->spr[SPR_AMR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_BESCR, env->spr[SPR_BESCR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_CFAR, env->spr[SPR_CFAR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_CIABR, env->spr[SPR_CIABR]);
+
+    REG_ENTRY(REG_TYPE_SPR, SPR_CTR, env->spr[SPR_CTR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_CTRL, env->spr[SPR_CTRL]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DABR, env->spr[SPR_DABR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DABRX, env->spr[SPR_DABRX]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DAR, env->spr[SPR_DAR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DAWR0, env->spr[SPR_DAWR0]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DAWR1, env->spr[SPR_DAWR1]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DAWRX0, env->spr[SPR_DAWRX0]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DAWRX1, env->spr[SPR_DAWRX1]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DPDES, env->spr[SPR_DPDES]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DSCR, env->spr[SPR_DSCR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DSISR, env->spr[SPR_DSISR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_EBBHR, env->spr[SPR_EBBHR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_EBBRR, env->spr[SPR_EBBRR]);
+
+    REG_ENTRY(REG_TYPE_SPR, SPR_FSCR, env->spr[SPR_FSCR]);
+
+    REG_ENTRY(REG_TYPE_SPR, SPR_CTR, env->ctr);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DAR, env->spr[SPR_DAR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_DSISR, env->spr[SPR_DSISR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_LR, env->lr);
+    REG_ENTRY(REG_TYPE_SPR, REG_ID_MSR, env->msr);
+    REG_ENTRY(REG_TYPE_SPR, REG_ID_NIP, env->nip);
+    REG_ENTRY(REG_TYPE_SPR, SPR_XER, env->xer);
+    REG_ENTRY(REG_TYPE_SPR, SPR_SRR0, env->spr[SPR_SRR0]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_SRR1, env->spr[SPR_SRR1]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_HSRR0, env->spr[SPR_HSRR0]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_HSRR1, env->spr[SPR_HSRR1]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_CFAR, env->spr[SPR_CFAR]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_HMER, env->spr[SPR_HMER]);
+    REG_ENTRY(REG_TYPE_SPR, SPR_HMEER, env->spr[SPR_HMEER]);
+
+    /*
+     * Ensure the number of registers saved match the number of
+     * registers per cpu
+     *
+     * This will help catch an error if in future a new register entry
+     * is added/removed while not modifying NUM_REGS_PER_CPU
+     */
+    assert(num_saved_regs == NUM_REGS_PER_CPU);
+}
+
+static bool pnv_mpipl_preserve_cpu_state(PnvMachineState *pnv)
+{
+    MachineState *machine = MACHINE(pnv);
+    uint32_t num_cpus = machine->smp.cpus;
+    MpiplPreservedCPUState *state;
+    CPUState *cpu;
+    AddressSpace *default_as = &address_space_memory;
+    MemTxResult io_result;
+    MemTxAttrs attrs;
+
+    /* Mark the memory transactions as privileged memory access */
+    attrs.user = 0;
+    attrs.memory = 1;
+
+    if (pnv->mpipl_state.cpu_states) {
+        /*
+         * CPU States might have been allocated from some past crash, free the
+         * memory to prevent memory leak
+         */
+        g_free(pnv->mpipl_state.cpu_states);
+        pnv->mpipl_state.num_cpu_states = 0;
+    }
+
+    pnv->mpipl_state.cpu_states = g_malloc_n(num_cpus,
+            sizeof(MpiplPreservedCPUState));
+    pnv->mpipl_state.num_cpu_states = num_cpus;
+
+    state = pnv->mpipl_state.cpu_states;
+
+    /* Preserve the Processor Dump Area */
+    io_result = address_space_read(default_as, PROC_DUMP_RELOCATED, attrs,
+            &pnv->mpipl_state.proc_area, sizeof(MpiplProcDumpArea));
+    if (io_result != MEMTX_OK) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "MPIPL: Failed to read Proc Dump Area at: 0x" TARGET_FMT_lx "\n",
+            PROC_DUMP_RELOCATED);
+
+        return false;
+    }
+
+    CPU_FOREACH(cpu) {
+        do_store_cpu_regs(cpu, state);
+        ++state;
+    }
+
+    return true;
+}
+
 void do_mpipl_preserve(PnvMachineState *pnv)
 {
+    pause_all_vcpus();
+
     pnv_mpipl_preserve_mem(pnv);
+    pnv_mpipl_preserve_cpu_state(pnv);
 
     /* Mark next boot as Memory-preserving boot */
     pnv->mpipl_state.is_next_boot_mpipl = true;
-- 
2.52.0
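
As a rough cross-check of the layout introduced by the patch above, the
Python sketch below packs one thread's header and register entries the
way the packed, big-endian structs describe them. It is only an
illustration, not QEMU code: the helper name pack_thread_state is
hypothetical, and the sample register numbers and values are
placeholders.

```python
import struct

NUM_REGS_PER_CPU = 66          # 32 GPRs + 34 SPR-type entries, per the patch
REG_TYPE_GPR = 0x1
REG_TYPE_SPR = 0x2

# ">" selects big-endian with no padding, mirroring QEMU_PACKED plus the
# cpu_to_be32()/cpu_to_be64() conversions in do_store_cpu_regs().
REG_DATA_HDR = struct.Struct(">IB3sIIII")   # fields of MpiplRegDataHdr
REG_ENTRY = struct.Struct(">IIQ")           # fields of MpiplRegEntry


def pack_thread_state(pir, gprs, sprs):
    """Hypothetical helper: serialize one thread's header and entries.

    gprs: 32 integer values; sprs: 34 (reg_num, value) pairs.
    """
    if len(gprs) != 32 or len(gprs) + len(sprs) != NUM_REGS_PER_CPU:
        raise ValueError("expected 32 GPRs and 34 SPR entries")

    out = REG_DATA_HDR.pack(
        pir,
        0xFF,                  # core_state, as set in the patch
        b"\x00" * 3,           # reserved
        0,                     # off_regentries, as set in the patch
        NUM_REGS_PER_CPU,      # num_regentries
        REG_ENTRY.size,        # alloc_size of each entry
        REG_ENTRY.size,        # act_size of each entry
    )
    for num, val in enumerate(gprs):
        out += REG_ENTRY.pack(REG_TYPE_GPR, num, val)
    for num, val in sprs:
        out += REG_ENTRY.pack(REG_TYPE_SPR, num, val)
    return out


# Sample data only: register numbers and values are placeholders.
sample = pack_thread_state(
    pir=4,
    gprs=list(range(32)),
    sprs=[(0x100 + i, 0) for i in range(34)],
)
```

Under these assumptions the header is 24 bytes, each entry is 16 bytes,
and one thread's state is 24 + 66 * 16 = 1080 bytes, which is the size
sizeof(MpiplPreservedCPUState) would be expected to have.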




* [PULL 06/13] pnv/mpipl: Set thread entry size to be allocated by firmware
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (4 preceding siblings ...)
  2026-04-29 18:32 ` [PULL 05/13] pnv/mpipl: Preserve CPU registers after crash Harsh Prateek Bora
@ 2026-04-29 18:32 ` Harsh Prateek Bora
  2026-05-08  9:15   ` Peter Maydell
  2026-04-29 18:32 ` [PULL 07/13] pnv/mpipl: Write the preserved CPU and MDRT state Harsh Prateek Bora
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

Set the "Thread Register State Entry Size", which firmware (OPAL) needs
in order to know how much memory to allocate for capturing CPU state in
the event of a crash.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-7-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/pnv.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 524563dcfc..09b69c355a 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -748,10 +748,35 @@ static void pnv_powerdown_notify(Notifier *n, void *opaque)
 
 static void pnv_reset(MachineState *machine, ResetType type)
 {
+    PnvMachineState *pnv = PNV_MACHINE(machine);
     void *fdt;
 
     qemu_devices_reset(type);
 
+    if (!pnv->mpipl_state.is_next_boot_mpipl) {
+        /*
+         * Set the "Thread Register State Entry Size", so that firmware can
+         * allocate enough memory to capture CPU state in the event of a
+         * crash
+         */
+
+        MpiplProcDumpArea proc_area;
+
+        proc_area.version = PROC_DUMP_AREA_VERSION_P9;
+        proc_area.thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState));
+
+        /* These are to be allocated & assigned by the firmware */
+        proc_area.alloc_addr = 0;
+        proc_area.alloc_size = 0;
+
+        /* These get assigned after crash, when QEMU preserves the registers */
+        proc_area.dest_addr = 0;
+        proc_area.act_size = 0;
+
+        cpu_physical_memory_write(PROC_DUMP_AREA_OFF, &proc_area,
+                sizeof(proc_area));
+    }
+
     fdt = machine->fdt;
     cpu_physical_memory_write(PNV_FDT_ADDR, fdt, fdt_totalsize(fdt));
 }
-- 
2.52.0
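
To make the byte layout of the area written at reset easier to follow,
here is a small Python sketch of the initial Processor Dump Area. It is
an illustration under assumptions, not the QEMU implementation: the
helper name initial_proc_dump_area is hypothetical, and the thread size
of 1080 bytes is an assumed value (a 24-byte header plus 66 entries of
16 bytes). Unlike the stack variable in the patch, the sketch zero-fills
the reserved fields explicitly.

```python
import struct

PROC_DUMP_AREA_VERSION_P9 = 0x1

# Fields of MpiplProcDumpArea, packed and big-endian:
# thread_size, version, reserved[11], alloc_addr, reserved2,
# alloc_size, dest_addr, reserved3, act_size
PROC_DUMP_AREA = struct.Struct(">IB11sQIIQII")


def initial_proc_dump_area(thread_size):
    """Hypothetical helper: bytes for the area as set up on a normal boot."""
    return PROC_DUMP_AREA.pack(
        thread_size,                # stride firmware uses per thread
        PROC_DUMP_AREA_VERSION_P9,
        b"\x00" * 11,               # reserved
        0,                          # alloc_addr: filled in by firmware
        0,                          # reserved2
        0,                          # alloc_size: filled in by firmware
        0,                          # dest_addr: set after a crash
        0,                          # reserved3
        0,                          # act_size: set after a crash
    )


# 1080 is an assumed per-thread size, not a value taken from QEMU.
blob = initial_proc_dump_area(1080)
```

With this field list the area is 48 bytes long, and only thread_size and
version are non-zero on a normal boot.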




* [PULL 07/13] pnv/mpipl: Write the preserved CPU and MDRT state
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (5 preceding siblings ...)
  2026-04-29 18:32 ` [PULL 06/13] pnv/mpipl: Set thread entry size to be allocated by firmware Harsh Prateek Bora
@ 2026-04-29 18:32 ` Harsh Prateek Bora
  2026-04-29 18:32 ` [PULL 08/13] pnv/mpipl: Enable MPIPL support Harsh Prateek Bora
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

The logic for preserving the CPU registers and memory regions was added
in previous patches.

Write that data to the relevant memory addresses: PROC_DUMP_AREA for the
CPU registers, and MDRT for the preserved memory regions.

Also export the "mpipl-boot" device tree property, so that the kernel
knows it is a 'dump active' boot.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-8-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/pnv.h |   1 +
 hw/ppc/pnv.c         |  39 +++++++++++-
 hw/ppc/pnv_mpipl.c   | 140 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 19c7170e74..f8234fb3cd 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -296,5 +296,6 @@ void pnv_bmc_set_pnor(IPMIBmc *bmc, PnvPnor *pnor);
 
 /* MPIPL helpers */
 void do_mpipl_preserve(PnvMachineState *pnv);
+bool do_mpipl_write(PnvMachineState *pnv);
 
 #endif /* PPC_PNV_H */
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 09b69c355a..48f49bef82 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -750,10 +750,47 @@ static void pnv_reset(MachineState *machine, ResetType type)
 {
     PnvMachineState *pnv = PNV_MACHINE(machine);
     void *fdt;
+    int node_offset;
+    bool mpipl_write_succeeded = false;
 
     qemu_devices_reset(type);
 
-    if (!pnv->mpipl_state.is_next_boot_mpipl) {
+    /*
+     * Only on success of writing MPIPL data will the next boot be provided
+     * "mpipl-boot" property in device tree
+     * Otherwise boot like a normal non-MPIPL boot
+     */
+    if (pnv->mpipl_state.is_next_boot_mpipl) {
+        /* Write the preserved MDRT and CPU State Data */
+        mpipl_write_succeeded = do_mpipl_write(pnv);
+    }
+
+    /*
+     * If it's a MPIPL boot, add the "mpipl-boot" property, and reset the
+     * boolean for MPIPL boot for next boot
+     */
+    if (mpipl_write_succeeded) {
+        void *fdt_copy = g_malloc0(FDT_MAX_SIZE);
+
+        /* Create a writable copy of the fdt */
+        _FDT((fdt_open_into(fdt, fdt_copy, FDT_MAX_SIZE)));
+
+        node_offset = fdt_path_offset(fdt_copy, "/ibm,opal/dump");
+        _FDT((fdt_appendprop_u64(fdt_copy, node_offset, "mpipl-boot", 1)));
+
+        /* Update the fdt, and free the original fdt */
+        if (fdt != machine->fdt) {
+            /*
+             * Only free the fdt if it's not machine->fdt, to prevent
+             * double free, since we already free machine->fdt later
+             */
+            g_free(fdt);
+        }
+        fdt = fdt_copy;
+
+        /* This boot is an MPIPL, reset the boolean for next boot */
+        pnv->mpipl_state.is_next_boot_mpipl = false;
+    } else {
         /*
          * Set the "Thread Register State Entry Size", so that firmware can
          * allocate enough memory to capture CPU state in the event of a
diff --git a/hw/ppc/pnv_mpipl.c b/hw/ppc/pnv_mpipl.c
index 308948b829..f5b228f5ba 100644
--- a/hw/ppc/pnv_mpipl.c
+++ b/hw/ppc/pnv_mpipl.c
@@ -20,6 +20,8 @@
     (pnv->mpipl_state.skiboot_base + MDST_TABLE_OFF)
 #define MDDT_TABLE_RELOCATED                            \
     (pnv->mpipl_state.skiboot_base + MDDT_TABLE_OFF)
+#define MDRT_TABLE_RELOCATED                            \
+    (pnv->mpipl_state.skiboot_base + MDRT_TABLE_OFF)
 #define PROC_DUMP_RELOCATED                             \
     (pnv->mpipl_state.skiboot_base + PROC_DUMP_AREA_OFF)
 
@@ -320,6 +322,139 @@ static bool pnv_mpipl_preserve_cpu_state(PnvMachineState *pnv)
     return true;
 }
 
+/*
+ * Write the preserved CPU state data in Processor Dump Area (PROC_DUMP_AREA)
+ *
+ * Returns true if everything went fine, else false for any error
+ */
+static bool pnv_mpipl_write_cpu_state(PnvMachineState *pnv)
+{
+    MpiplProcDumpArea *proc_area = &pnv->mpipl_state.proc_area;
+    MpiplPreservedCPUState *cpu_state = pnv->mpipl_state.cpu_states;
+    const uint32_t num_cpu_states = pnv->mpipl_state.num_cpu_states;
+    hwaddr next_regentries_hdr;
+    AddressSpace *default_as = &address_space_memory;
+    MemTxResult io_result;
+    MemTxAttrs attrs;
+
+    /* Mark the memory transactions as privileged memory access */
+    attrs.user = 0;
+    attrs.memory = 1;
+
+    if (be32_to_cpu(proc_area->alloc_size) <
+       (num_cpu_states * sizeof(MpiplPreservedCPUState))) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "MPIPL: Size of buffer allocated by skiboot (%u bytes) is not "
+            "enough to save all CPUs registers needed (%zu bytes)\n",
+            be32_to_cpu(proc_area->alloc_size),
+            num_cpu_states * sizeof(MpiplPreservedCPUState));
+
+        return false;
+    }
+
+    proc_area->version = PROC_DUMP_AREA_VERSION_P9;
+
+    /*
+     * This is the stride kernel/firmware should use to jump from a
+     * register entries header to next CPU's header
+     */
+    proc_area->thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState));
+
+    /* Write the header and register entries for each CPU */
+    next_regentries_hdr = be64_to_cpu(proc_area->alloc_addr) & (~HRMOR_BIT);
+    for (int i = 0; i < num_cpu_states; ++i) {
+        io_result = address_space_write(default_as, next_regentries_hdr, attrs,
+            &cpu_state->hdr, sizeof(MpiplRegDataHdr));
+        if (io_result != MEMTX_OK) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "MPIPL: Failed to write RegEntries Header\n");
+            return false;
+        }
+
+        io_result = address_space_write(default_as,
+            next_regentries_hdr + sizeof(MpiplRegDataHdr), attrs,
+            &cpu_state->reg_entries,
+            NUM_REGS_PER_CPU * (sizeof(MpiplRegEntry)));
+        if (io_result != MEMTX_OK) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "MPIPL: Failed to write Register Entries\n");
+            return false;
+        }
+
+        /*
+         * According to HDAT section:
+         *  "15.3.1.5 Architected Register Data content":
+         *
+         * The next register entries header will be at current header +
+         * "Thread Register State Entry size"
+         *
+         * Note: proc_area.thread_size == sizeof(MpiplPreservedCPUState)
+         */
+        next_regentries_hdr += sizeof(MpiplPreservedCPUState);
+        ++cpu_state;
+    }
+
+    /* Point the destination address to the preserved memory region */
+    proc_area->dest_addr = proc_area->alloc_addr;
+    proc_area->act_size  = cpu_to_be32(num_cpu_states *
+            sizeof(MpiplPreservedCPUState));
+
+    io_result = address_space_write(default_as, PROC_DUMP_AREA_OFF, attrs,
+        proc_area, sizeof(MpiplProcDumpArea));
+    if (io_result != MEMTX_OK) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "MPIPL: Failed to write Proc Dump Area\n");
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * Write the preserved MDRT table, representing preserved memory regions
+ *
+ * Returns true if everything went fine, else false for any error
+ */
+static bool pnv_mpipl_write_mdrt(PnvMachineState *pnv)
+{
+    MpiplPreservedState *state = &pnv->mpipl_state;
+    AddressSpace *default_as = &address_space_memory;
+    MemTxResult io_result;
+    MemTxAttrs attrs;
+
+    /* Mark the memory transactions as privileged memory access */
+    attrs.user = 0;
+    attrs.memory = 1;
+
+    /*
+     * Generally writes from platform during MPIPL don't go to a relocated
+     * skiboot address
+     *
+     * Though for MDRT we are doing so, as this is the address skiboot
+     * considers by default for MDRT
+     *
+     * MDRT/MDST/MDDT base addresses are actually meant to be shared by
+     * platform in SPIRA structures.
+     *
+     * Not implementing SPIRA as it increases complexity for no gains.
+     * Using the default address skiboot expects for MDRT, which is the
+     * relocated MDRT, hence writing to it
+     *
+     * Other tables like MDST/MDDT should not be written to relocated
+     * addresses, as skiboot will overwrite anything from SKIBOOT_BASE till
+     * SKIBOOT_BASE+SKIBOOT_SIZE (which is 0x30000000-0x31c00000 by default)
+     */
+    io_result = address_space_write(default_as, MDRT_TABLE_RELOCATED, attrs,
+            state->mdrt_table,
+            state->num_mdrt_entries * sizeof(MdrtTableEntry));
+    if (io_result != MEMTX_OK) {
+        qemu_log_mask(LOG_GUEST_ERROR, "MPIPL: Failed to write MDRT table\n");
+        return false;
+    }
+
+    return true;
+}
+
 void do_mpipl_preserve(PnvMachineState *pnv)
 {
     pause_all_vcpus();
@@ -340,3 +475,8 @@ void do_mpipl_preserve(PnvMachineState *pnv)
      */
     qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
 }
+
+bool do_mpipl_write(PnvMachineState *pnv)
+{
+    return pnv_mpipl_write_mdrt(pnv) && pnv_mpipl_write_cpu_state(pnv);
+}
-- 
2.52.0
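
The address arithmetic in pnv_mpipl_write_cpu_state() can be sketched in
Python as follows. This is a simplified model under assumptions, not the
QEMU code: the function name plan_cpu_state_writes is hypothetical, the
inputs are taken to be already converted to host byte order, and the
sizes (24-byte header, 1080-byte per-thread stride) are assumed values
for the packed structs.

```python
HRMOR_BIT = 1 << 63
HDR_SIZE = 24        # assumed size of the per-thread header
THREAD_SIZE = 1080   # assumed stride: header plus 66 entries of 16 bytes


def plan_cpu_state_writes(alloc_addr, alloc_size, num_cpus):
    """Hypothetical helper: return (header_addr, entries_addr) per CPU.

    Returns None when the firmware-allocated buffer is too small, which
    corresponds to the early 'return false' in the patch.
    """
    if alloc_size < num_cpus * THREAD_SIZE:
        return None

    # Strip the HRMOR bit to get the real address of the buffer.
    base = alloc_addr & ~HRMOR_BIT

    writes = []
    for i in range(num_cpus):
        hdr_addr = base + i * THREAD_SIZE
        writes.append((hdr_addr, hdr_addr + HDR_SIZE))
    return writes
```

For example, with a buffer at HRMOR_BIT | 0x31000000 and two CPUs, the
headers would land at 0x31000000 and 0x31000438, each followed by its
register entries 0x18 bytes later.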




* [PULL 08/13] pnv/mpipl: Enable MPIPL support
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (6 preceding siblings ...)
  2026-04-29 18:32 ` [PULL 07/13] pnv/mpipl: Write the preserved CPU and MDRT state Harsh Prateek Bora
@ 2026-04-29 18:32 ` Harsh Prateek Bora
  2026-04-29 18:32 ` [PULL 09/13] tests/functional: Add test for MPIPL in PowerNV Harsh Prateek Bora
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

With all MPIPL support in place, export a "dump" node in the device
tree, signifying that the PowerNV QEMU platform supports MPIPL.

Also export the fw-load-area device tree property, which describes where
the kernel and initrd were loaded, so that the kernel can verify whether
the kernel/initrd images were loaded within the boot memory region. QEMU
only exports these details in fw-load-area; the check against the boot
memory region is done in the kernel.

Since the device tree can now change at pnv_reset, regenerate it during
pnv_reset.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-9-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/pnv.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 48f49bef82..89096f9a84 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -54,6 +54,7 @@
 #include "hw/ppc/pnv_chip.h"
 #include "hw/ppc/pnv_xscom.h"
 #include "hw/ppc/pnv_pnor.h"
+#include "hw/ppc/pnv_mpipl.h"
 
 #include "hw/isa/isa.h"
 #include "hw/char/serial-isa.h"
@@ -672,6 +673,39 @@ static void pnv_dt_power_mgt(PnvMachineState *pnv, void *fdt)
     _FDT(fdt_setprop_cell(fdt, off, "ibm,enabled-stop-levels", 0xc0000000));
 }
 
+static void pnv_dt_mpipl_dump(PnvMachineState *pnv, void *fdt)
+{
+    int off;
+
+    /*
+     * Add "dump" node so kernel knows MPIPL (aka fadump) is supported
+     *
+     * Note: This is only needed to be done since we are passing device tree to
+     * opal
+     *
+     * In case HDAT is supported in future, then opal can add these nodes by
+     * itself based on system attribute having MPIPL_SUPPORTED bit set
+     */
+    off = fdt_add_subnode(fdt, 0, "ibm,opal");
+    if (off == -FDT_ERR_EXISTS) {
+        off = fdt_path_offset(fdt, "/ibm,opal");
+    }
+
+    _FDT(off);
+    off = fdt_add_subnode(fdt, off, "dump");
+    _FDT(off);
+    _FDT((fdt_setprop_string(fdt, off, "compatible", "ibm,opal-dump")));
+
+    /* Add kernel and initrd as fw-load-area */
+    uint64_t fw_load_area[4] = {
+        cpu_to_be64(KERNEL_LOAD_ADDR), cpu_to_be64(KERNEL_MAX_SIZE),
+        cpu_to_be64(INITRD_LOAD_ADDR), cpu_to_be64(INITRD_MAX_SIZE)
+    };
+
+    _FDT((fdt_setprop(fdt, off, "fw-load-area",
+                    fw_load_area, sizeof(fw_load_area))));
+}
+
 static void *pnv_dt_create(MachineState *machine)
 {
     PnvMachineClass *pmc = PNV_MACHINE_GET_CLASS(machine);
@@ -734,6 +768,9 @@ static void *pnv_dt_create(MachineState *machine)
         pmc->dt_power_mgt(pnv, fdt);
     }
 
+    /* Advertise support for MPIPL */
+    pnv_dt_mpipl_dump(pnv, fdt);
+
     return fdt;
 }
 
@@ -765,6 +802,10 @@ static void pnv_reset(MachineState *machine, ResetType type)
         mpipl_write_succeeded = do_mpipl_write(pnv);
     }
 
+    /* Regenerate device tree */
+    fdt = pnv_dt_create(machine);
+    _FDT((fdt_pack(fdt)));
+
     /*
      * If it's a MPIPL boot, add the "mpipl-boot" property, and reset the
      * boolean for MPIPL boot for next boot
@@ -814,8 +855,11 @@ static void pnv_reset(MachineState *machine, ResetType type)
                 sizeof(proc_area));
     }
 
-    fdt = machine->fdt;
     cpu_physical_memory_write(PNV_FDT_ADDR, fdt, fdt_totalsize(fdt));
+
+    /* Free previous device tree set by pnv_init/reset/machine_init_done */
+    g_free(machine->fdt);
+    machine->fdt = fdt;
 }
 
 static ISABus *pnv_chip_power8_isa_create(PnvChip *chip, Error **errp)
-- 
2.52.0
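
The fw-load-area property added above is a flat array of big-endian
64-bit cells, laid out as (address, size) pairs for the kernel and the
initrd. The Python sketch below shows one way to encode and decode such
a property. The function names are hypothetical, and the numeric values
are placeholders, not the actual KERNEL_LOAD_ADDR, KERNEL_MAX_SIZE,
INITRD_LOAD_ADDR or INITRD_MAX_SIZE used by QEMU.

```python
import struct


def encode_fw_load_area(kernel_addr, kernel_size, initrd_addr, initrd_size):
    """Hypothetical helper: pack four big-endian u64 cells."""
    return struct.pack(">4Q", kernel_addr, kernel_size,
                       initrd_addr, initrd_size)


def decode_fw_load_area(prop):
    """Hypothetical helper: split the property into (addr, size) pairs."""
    if len(prop) % 16 != 0:
        raise ValueError("property must hold whole (addr, size) pairs")
    cells = struct.unpack(f">{len(prop) // 8}Q", prop)
    return list(zip(cells[0::2], cells[1::2]))


# Placeholder values for illustration only.
prop = encode_fw_load_area(0x20000000, 0x08000000, 0x28000000, 0x08000000)
regions = decode_fw_load_area(prop)
```

A consumer can then compare each decoded (address, size) pair against
its own notion of the boot memory region, which is the check the commit
message leaves to the kernel.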




* [PULL 09/13] tests/functional: Add test for MPIPL in PowerNV
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (7 preceding siblings ...)
  2026-04-29 18:32 ` [PULL 08/13] pnv/mpipl: Enable MPIPL support Harsh Prateek Bora
@ 2026-04-29 18:32 ` Harsh Prateek Bora
  2026-04-29 18:33 ` [PULL 10/13] MAINTAINERS: Add entry for MPIPL (PowerNV) Harsh Prateek Bora
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

With MPIPL support implemented, enable fadump's functional test for
PowerNV.

Also, the current functional test for PowerNV uses op-build's Linux 5.10
image, which doesn't support adding "fadump=on" as a kernel argument due
to this:

    Kernel is locked down from Kernel configuration; see man kernel_lockdown.7

Hence, instead of op-build's image, use the newer Fedora vmlinuz, as used
in the FADump PSeries functional test.

Also, since the "bash#" string does not show up, rely on
"sh: no job control" to check whether the test case has reached the
shell.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-10-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 tests/functional/ppc64/test_fadump.py | 35 ++++++++++-----------------
 1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/tests/functional/ppc64/test_fadump.py b/tests/functional/ppc64/test_fadump.py
index bd9692f64c..7ea65974e0 100755
--- a/tests/functional/ppc64/test_fadump.py
+++ b/tests/functional/ppc64/test_fadump.py
@@ -14,6 +14,7 @@ class QEMUFadump(LinuxKernelTest):
 
     1. test_fadump_pseries:       PSeries
     2. test_fadump_pseries_kvm:   PSeries + KVM
+    3. test_fadump_powernv:       PowerNV
     """
 
     timeout = 90
@@ -24,11 +25,6 @@ class QEMUFadump(LinuxKernelTest):
     msg_registered_failed = ''
     msg_dump_active = ''
 
-    ASSET_EPAPR_KERNEL = Asset(
-        ('https://github.com/open-power/op-build/releases/download/v2.7/'
-         'zImage.epapr'),
-        '0ab237df661727e5392cee97460e8674057a883c5f74381a128fa772588d45cd')
-
     ASSET_VMLINUZ_KERNEL = Asset(
         ('https://archives.fedoraproject.org/pub/archive/fedora-secondary/'
          'releases/39/Everything/ppc64le/os/ppc/ppc64/vmlinuz'),
@@ -62,16 +58,14 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
             # SLOF takes upto >20s in startup time, use VOF
             self.set_machine("pseries")
             self.vm.add_args("-machine", "x-vof=on")
-            self.vm.add_args("-m", "6G")
+
+        self.vm.add_args("-m", "6G")
 
         self.vm.set_console()
 
         kernel_path = None
 
-        if is_powernv:
-            kernel_path = self.ASSET_EPAPR_KERNEL.fetch()
-        else:
-            kernel_path = self.ASSET_VMLINUZ_KERNEL.fetch()
+        kernel_path = self.ASSET_VMLINUZ_KERNEL.fetch()
 
         initrd_path = self.ASSET_FEDORA_INITRD.fetch()
 
@@ -102,16 +96,14 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
             timeout=20
         )
 
-        # Ensure fadump is registered successfully, if registration
-        # succeeds, we get a log from rtas fadump:
-        #
-        #     rtas fadump: Registration is successful!
-        self.wait_for_console_pattern(
-            "rtas fadump: Registration is successful!"
-        )
+        # Ensure fadump is registered successfully
+        if not is_powernv:
+            self.wait_for_console_pattern(
+                "rtas fadump: Registration is successful!"
+            )
 
         # Wait for the shell
-        self.wait_for_console_pattern("#")
+        self.wait_for_console_pattern("sh: no job control")
 
         # Mount /proc since not available in the initrd used
         exec_command(self, command="mount -t proc proc /proc")
@@ -135,7 +127,7 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
         # that qemu didn't pass the 'ibm,kernel-dump' device tree node
         wait_for_console_pattern(
             test=self,
-            success_message="rtas fadump: Firmware-assisted dump is active",
+            success_message="fadump: Firmware-assisted dump is active",
             failure_message="fadump: Reserved "
         )
 
@@ -148,7 +140,7 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
         self.wait_for_console_pattern("preserving crash data")
 
         # Wait for prompt
-        self.wait_for_console_pattern("sh-5.2#")
+        self.wait_for_console_pattern("Run /bin/sh as init process")
 
         # Mount /proc since not available in the initrd used
         exec_command_and_wait_for_pattern(self,
@@ -166,9 +158,8 @@ def do_test_fadump(self, is_kvm=False, is_powernv=False):
     def test_fadump_pseries(self):
         return self.do_test_fadump(is_kvm=False, is_powernv=False)
 
-    @skip("PowerNV Fadump not supported yet")
     def test_fadump_powernv(self):
-        return
+        return self.do_test_fadump(is_kvm=False, is_powernv=True)
 
     def test_fadump_pseries_kvm(self):
         """
-- 
2.52.0




* [PULL 10/13] MAINTAINERS: Add entry for MPIPL (PowerNV)
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (8 preceding siblings ...)
  2026-04-29 18:32 ` [PULL 09/13] tests/functional: Add test for MPIPL in PowerNV Harsh Prateek Bora
@ 2026-04-29 18:33 ` Harsh Prateek Bora
  2026-04-29 18:33 ` [PULL 11/13] hw/ssi/pnv_spi: Fix fifo8 memory leak on unrealize Harsh Prateek Bora
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aditya Gupta, Hari Bathini, Sourabh Jain, Shivang Upadhyay

From: Aditya Gupta <adityag@linux.ibm.com>

Add maintainer and reviewer for MPIPL subsystem.

Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-11-adityag@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 MAINTAINERS | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e41f0eb92c..0adc6bd6b8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3285,6 +3285,15 @@ F: include/hw/ppc/spapr_fadump.h
 F: hw/ppc/spapr_fadump.c
 F: tests/functional/ppc64/test_fadump.py
 
+Memory-Preserving Initial Program Load (MPIPL) for PowerNV
+M: Aditya Gupta <adityag@linux.ibm.com>
+R: Hari Bathini <hbathini@linux.ibm.com>
+R: Sourabh <sourabhjain@linux.ibm.com>
+S: Maintained
+F: include/hw/ppc/pnv_mpipl.h
+F: hw/ppc/pnv_mpipl.c
+F: tests/functional/ppc64/test_fadump.py
+
 GDB stub
 M: Alex Bennée <alex.bennee@linaro.org>
 R: Philippe Mathieu-Daudé <philmd@linaro.org>
-- 
2.52.0




* [PULL 11/13] hw/ssi/pnv_spi: Fix fifo8 memory leak on unrealize
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (9 preceding siblings ...)
  2026-04-29 18:33 ` [PULL 10/13] MAINTAINERS: Add entry for MPIPL (PowerNV) Harsh Prateek Bora
@ 2026-04-29 18:33 ` Harsh Prateek Bora
  2026-04-29 18:33 ` [PULL 12/13] ppc/pnv: Add a nest MMU model Harsh Prateek Bora
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Caleb Schlossin, Glenn Miles, Chalapathi V, Nicholas Piggin,
	Aditya Gupta

From: Caleb Schlossin <calebs@linux.ibm.com>

The unrealize handler should free the fifo8 memory that was allocated by realize.
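
The pairing described above can be sketched outside QEMU. This is a minimal
illustration, not the driver code: DemoFifo, DemoSpi and the demo_* functions
are hypothetical stand-ins for Fifo8, PnvSpi and the realize/unrealize
handlers, and the capacity of 16 is an arbitrary value chosen for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Fifo8; only the fields needed here. */
typedef struct DemoFifo {
    uint8_t *data;
    uint32_t capacity;
} DemoFifo;

/* Hypothetical stand-in for the device state holding two FIFOs. */
typedef struct DemoSpi {
    DemoFifo tx_fifo;
    DemoFifo rx_fifo;
} DemoSpi;

static void demo_fifo_create(DemoFifo *f, uint32_t capacity)
{
    f->data = calloc(capacity, 1);
    assert(f->data != NULL);
    f->capacity = capacity;
}

static void demo_fifo_destroy(DemoFifo *f)
{
    free(f->data);
    /* Clearing the pointer is a choice of this sketch, to make reuse safe. */
    f->data = NULL;
    f->capacity = 0;
}

/* realize allocates both buffers ... */
static void demo_spi_realize(DemoSpi *s)
{
    demo_fifo_create(&s->tx_fifo, 16);
    demo_fifo_create(&s->rx_fifo, 16);
}

/* ... so unrealize has to release both, or each unplug leaks them. */
static void demo_spi_unrealize(DemoSpi *s)
{
    demo_fifo_destroy(&s->tx_fifo);
    demo_fifo_destroy(&s->rx_fifo);
}
```

Calling demo_spi_realize() followed by demo_spi_unrealize() on the same
DemoSpi leaves no allocation behind, which is the property the fix restores.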

Fixes: 17befecda85 ("hw/ssi/pnv_spi: Replace PnvXferBuffer with Fifo8 structure")

Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Caleb Schlossin <calebs@linux.ibm.com>
Reviewed-by: Aditya Gupta <adityag@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260120145117.602960-1-calebs@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ssi/pnv_spi.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/ssi/pnv_spi.c b/hw/ssi/pnv_spi.c
index 76304d26fc..f3add8cab9 100644
--- a/hw/ssi/pnv_spi.c
+++ b/hw/ssi/pnv_spi.c
@@ -1177,6 +1177,13 @@ static void pnv_spi_realize(DeviceState *dev, Error **errp)
                           s, "xscom-spi", PNV10_XSCOM_PIB_SPIC_SIZE);
 }
 
+static void pnv_spi_unrealize(DeviceState *dev)
+{
+    PnvSpi *s = PNV_SPI(dev);
+    fifo8_destroy(&s->tx_fifo);
+    fifo8_destroy(&s->rx_fifo);
+}
+
 static int pnv_spi_dt_xscom(PnvXScomInterface *dev, void *fdt,
                              int offset)
 {
@@ -1234,6 +1241,7 @@ static void pnv_spi_class_init(ObjectClass *klass, const void *data)
 
     dc->desc = "PowerNV SPI";
     dc->realize = pnv_spi_realize;
+    dc->unrealize = pnv_spi_unrealize;
     device_class_set_legacy_reset(dc, do_reset);
     dc->vmsd = &pnv_spi_vmstate;
     device_class_set_props(dc, pnv_spi_properties);
-- 
2.52.0




* [PULL 12/13] ppc/pnv: Add a nest MMU model
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (10 preceding siblings ...)
  2026-04-29 18:33 ` [PULL 11/13] hw/ssi/pnv_spi: Fix fifo8 memory leak on unrealize Harsh Prateek Bora
@ 2026-04-29 18:33 ` Harsh Prateek Bora
  2026-04-29 18:33 ` [PULL 13/13] hw/intc/xics: Add a check for an invalid server id Harsh Prateek Bora
  2026-04-30 17:35 ` [PULL 00/13] PPC PR for 11.1 (2026-04-29) Stefan Hajnoczi
  13 siblings, 0 replies; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Caleb Schlossin, Chalapathi V, Glenn Miles, Frederic Barrat,
	Aditya Gupta

From: Caleb Schlossin <calebs@linux.ibm.com>

The nest MMU is used for translations needed by I/O subsystems
on Power10. The nest is the shared, on-chip infrastructure
that connects CPU cores, memory controllers, and I/O.

This patch sets up a basic skeleton with its xscom
area, mapping both needed xscom regions. This support is
required for PowerVM bringup.

Use the Power9 compatible property in the device tree so
that OPAL works with both Power9 and Power10.
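
The device tree node for each nest MMU carries a "reg" property made of two
big-endian 32-bit cells: the XSCOM base address and the region size. The
following sketch reproduces that arithmetic in plain C, without libfdt. The
demo_* helpers are hypothetical; the base address, the per-unit stride of
0x1000000 and the size of 0x20 are copied from the patch below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_NEST0_MMU_BASE 0x2010c40u  /* PNV10_XSCOM_NEST0_MMU_BASE */
#define DEMO_NEST_STRIDE    0x1000000u  /* distance between nest0 and nest1 */
#define DEMO_NMMU_SIZE      0x20u       /* PNV10_XSCOM_NMMU_SIZE */
#define DEMO_MAX_NMMU       2u          /* PNV10_CHIP_MAX_NMMU */

/* XSCOM address of the given nest MMU instance. */
static uint32_t demo_nmmu_pcba(uint32_t nmmu_id)
{
    assert(nmmu_id < DEMO_MAX_NMMU);
    return DEMO_NEST0_MMU_BASE + nmmu_id * DEMO_NEST_STRIDE;
}

/* Store a 32-bit value as one big-endian device tree cell. */
static void demo_put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Build the 8-byte "reg" property: <address size>. */
static void demo_nmmu_reg(uint8_t out[8], uint32_t nmmu_id)
{
    demo_put_be32(out, demo_nmmu_pcba(nmmu_id));
    demo_put_be32(out + 4, DEMO_NMMU_SIZE);
}

/* Format the node name the same way the patch does ("nmmu@<address>"). */
static int demo_nmmu_node_name(char *buf, size_t len, uint32_t nmmu_id)
{
    return snprintf(buf, len, "nmmu@%x", (unsigned)demo_nmmu_pcba(nmmu_id));
}
```

With these constants, instance 1 resolves to 0x3010c40, which matches
PNV10_XSCOM_NEST1_MMU_BASE, so one formula covers both units.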

Reviewed-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Signed-off-by: Caleb Schlossin <calebs@linux.ibm.com>
Reviewed-by: Aditya Gupta <adityag@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260120150139.714805-1-calebs@linux.ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/pnv_chip.h  |   3 +
 include/hw/ppc/pnv_nmmu.h  |  28 ++++++++
 include/hw/ppc/pnv_xscom.h |   4 ++
 hw/ppc/pnv.c               |  20 ++++++
 hw/ppc/pnv_nmmu.c          | 132 +++++++++++++++++++++++++++++++++++++
 hw/ppc/meson.build         |   1 +
 6 files changed, 188 insertions(+)
 create mode 100644 include/hw/ppc/pnv_nmmu.h
 create mode 100644 hw/ppc/pnv_nmmu.c

diff --git a/include/hw/ppc/pnv_chip.h b/include/hw/ppc/pnv_chip.h
index ea47c97dd3..8ef75fdcca 100644
--- a/include/hw/ppc/pnv_chip.h
+++ b/include/hw/ppc/pnv_chip.h
@@ -7,6 +7,7 @@
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_homer.h"
 #include "hw/ppc/pnv_n1_chiplet.h"
+#include "hw/ppc/pnv_nmmu.h"
 #include "hw/ssi/pnv_spi.h"
 #include "hw/ppc/pnv_lpc.h"
 #include "hw/ppc/pnv_occ.h"
@@ -126,6 +127,8 @@ struct Pnv10Chip {
     PnvN1Chiplet     n1_chiplet;
 #define PNV10_CHIP_MAX_PIB_SPIC 6
     PnvSpi pib_spic[PNV10_CHIP_MAX_PIB_SPIC];
+#define PNV10_CHIP_MAX_NMMU 2
+    PnvNMMU      nmmu[PNV10_CHIP_MAX_NMMU];
 
     uint32_t     nr_quads;
     PnvQuad      *quads;
diff --git a/include/hw/ppc/pnv_nmmu.h b/include/hw/ppc/pnv_nmmu.h
new file mode 100644
index 0000000000..d3ba46ecf4
--- /dev/null
+++ b/include/hw/ppc/pnv_nmmu.h
@@ -0,0 +1,28 @@
+/*
+ * QEMU PowerPC nest MMU model
+ *
+ * Copyright (c) 2025, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PPC_PNV_NMMU_H
+#define PPC_PNV_NMMU_H
+
+#define TYPE_PNV_NMMU "pnv-nmmu"
+#define PNV_NMMU(obj) OBJECT_CHECK(PnvNMMU, (obj), TYPE_PNV_NMMU)
+
+typedef struct PnvNMMU {
+    DeviceState parent;
+
+    struct PnvChip *chip;
+
+    MemoryRegion xscom_regs;
+    uint32_t nmmu_id;
+    uint64_t ptcr;
+} PnvNMMU;
+
+#endif /*PPC_PNV_NMMU_H */
diff --git a/include/hw/ppc/pnv_xscom.h b/include/hw/ppc/pnv_xscom.h
index 610b075a27..6dab803d1f 100644
--- a/include/hw/ppc/pnv_xscom.h
+++ b/include/hw/ppc/pnv_xscom.h
@@ -196,6 +196,10 @@ struct PnvXScomInterfaceClass {
 #define PNV10_XSCOM_N1_PB_SCOM_ES_BASE      0x3011300
 #define PNV10_XSCOM_N1_PB_SCOM_ES_SIZE      0x100
 
+#define PNV10_XSCOM_NEST0_MMU_BASE      0x2010c40
+#define PNV10_XSCOM_NEST1_MMU_BASE      0x3010c40
+#define PNV10_XSCOM_NMMU_SIZE      0x20
+
 #define PNV10_XSCOM_PEC_NEST_BASE  0x3011800 /* index goes downwards ... */
 #define PNV10_XSCOM_PEC_NEST_SIZE  0x100
 
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 89096f9a84..9ed918fa6a 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2297,6 +2297,11 @@ static void pnv_chip_power10_instance_init(Object *obj)
                                 TYPE_PNV_PHB5_PEC);
     }
 
+    for (i = 0; i < PNV10_CHIP_MAX_NMMU; i++) {
+        object_initialize_child(obj, "nmmu[*]", &chip10->nmmu[i],
+                                TYPE_PNV_NMMU);
+    }
+
     for (i = 0; i < pcc->i2c_num_engines; i++) {
         object_initialize_child(obj, "i2c[*]", &chip10->i2c[i], TYPE_PNV_I2C);
     }
@@ -2511,6 +2516,21 @@ static void pnv_chip_power10_realize(DeviceState *dev, Error **errp)
     pnv_xscom_add_subregion(chip, PNV10_XSCOM_N1_PB_SCOM_ES_BASE,
                            &chip10->n1_chiplet.xscom_pb_es_mr);
 
+    /* nest0/1 MMU */
+    for (i = 0; i < PNV10_CHIP_MAX_NMMU; i++) {
+        object_property_set_int(OBJECT(&chip10->nmmu[i]), "nmmu_id",
+                                i , &error_fatal);
+        object_property_set_link(OBJECT(&chip10->nmmu[i]), "chip",
+                                 OBJECT(chip), &error_abort);
+        if (!qdev_realize(DEVICE(&chip10->nmmu[i]), NULL, errp)) {
+            return;
+        }
+    }
+    pnv_xscom_add_subregion(chip, PNV10_XSCOM_NEST0_MMU_BASE,
+                            &chip10->nmmu[0].xscom_regs);
+    pnv_xscom_add_subregion(chip, PNV10_XSCOM_NEST1_MMU_BASE,
+                            &chip10->nmmu[1].xscom_regs);
+
     /* PHBs */
     pnv_chip_power10_phb_realize(chip, &local_err);
     if (local_err) {
diff --git a/hw/ppc/pnv_nmmu.c b/hw/ppc/pnv_nmmu.c
new file mode 100644
index 0000000000..c1b00bac89
--- /dev/null
+++ b/hw/ppc/pnv_nmmu.c
@@ -0,0 +1,132 @@
+/*
+ * QEMU PowerPC nest MMU model
+ *
+ * Copyright (c) 2025, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/core/qdev-properties.h"
+
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_nmmu.h"
+#include "hw/ppc/fdt.h"
+
+#include <libfdt.h>
+
+#define NMMU_XLAT_CTL_PTCR 0xb
+
+static uint64_t pnv_nmmu_xscom_read(void *opaque, hwaddr addr, unsigned size)
+{
+    PnvNMMU *nmmu = PNV_NMMU(opaque);
+    int reg = addr >> 3;
+    uint64_t val;
+
+    if (reg == NMMU_XLAT_CTL_PTCR) {
+        val = nmmu->ptcr;
+    } else {
+        val = 0xffffffffffffffffull;
+        qemu_log_mask(LOG_UNIMP, "nMMU: xscom read at 0x%" PRIx32 "\n", reg);
+    }
+    return val;
+}
+
+static void pnv_nmmu_xscom_write(void *opaque, hwaddr addr,
+                                 uint64_t val, unsigned size)
+{
+    PnvNMMU *nmmu = PNV_NMMU(opaque);
+    int reg = addr >> 3;
+
+    if (reg == NMMU_XLAT_CTL_PTCR) {
+        nmmu->ptcr = val;
+    } else {
+        qemu_log_mask(LOG_UNIMP, "nMMU: xscom write at 0x%" PRIx32 "\n", reg);
+    }
+}
+
+static const MemoryRegionOps pnv_nmmu_xscom_ops = {
+    .read = pnv_nmmu_xscom_read,
+    .write = pnv_nmmu_xscom_write,
+    .valid.min_access_size = 8,
+    .valid.max_access_size = 8,
+    .impl.min_access_size = 8,
+    .impl.max_access_size = 8,
+    .endianness = DEVICE_BIG_ENDIAN,
+};
+
+static void pnv_nmmu_realize(DeviceState *dev, Error **errp)
+{
+    PnvNMMU *nmmu = PNV_NMMU(dev);
+
+    assert(nmmu->chip);
+
+    /* NMMU xscom region */
+    pnv_xscom_region_init(&nmmu->xscom_regs, OBJECT(nmmu),
+                          &pnv_nmmu_xscom_ops, nmmu,
+                          "xscom-nmmu",
+                          PNV10_XSCOM_NMMU_SIZE);
+}
+
+static int pnv_nmmu_dt_xscom(PnvXScomInterface *dev, void *fdt,
+                             int offset)
+{
+    PnvNMMU *nmmu = PNV_NMMU(dev);
+    char *name;
+    int nmmu_offset;
+    const char compat[] = "ibm,power9-nest-mmu";
+    uint32_t nmmu_pcba = PNV10_XSCOM_NEST0_MMU_BASE + nmmu->nmmu_id * 0x1000000;
+    uint32_t reg[2] = {
+        cpu_to_be32(nmmu_pcba),
+        cpu_to_be32(PNV10_XSCOM_NMMU_SIZE)
+    };
+
+    name = g_strdup_printf("nmmu@%x", nmmu_pcba);
+    nmmu_offset = fdt_add_subnode(fdt, offset, name);
+    _FDT(nmmu_offset);
+    g_free(name);
+
+    _FDT(fdt_setprop(fdt, nmmu_offset, "reg", reg, sizeof(reg)));
+    _FDT(fdt_setprop(fdt, nmmu_offset, "compatible", compat, sizeof(compat)));
+    return 0;
+}
+
+static const Property pnv_nmmu_properties[] = {
+    DEFINE_PROP_UINT32("nmmu_id", PnvNMMU, nmmu_id, 0),
+    DEFINE_PROP_LINK("chip", PnvNMMU, chip, TYPE_PNV_CHIP, PnvChip *),
+};
+
+static void pnv_nmmu_class_init(ObjectClass *klass, const void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PnvXScomInterfaceClass *xscomc = PNV_XSCOM_INTERFACE_CLASS(klass);
+
+    xscomc->dt_xscom = pnv_nmmu_dt_xscom;
+
+    dc->desc = "PowerNV nest MMU";
+    dc->realize = pnv_nmmu_realize;
+    device_class_set_props(dc, pnv_nmmu_properties);
+}
+
+static const TypeInfo pnv_nmmu_info = {
+    .name          = TYPE_PNV_NMMU,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(PnvNMMU),
+    .class_init    = pnv_nmmu_class_init,
+    .interfaces    = (InterfaceInfo[]) {
+        { TYPE_PNV_XSCOM_INTERFACE },
+        { }
+    }
+};
+
+static void pnv_nmmu_register_types(void)
+{
+    type_register_static(&pnv_nmmu_info);
+}
+
+type_init(pnv_nmmu_register_types);
diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index c61fba4ec8..37aa535db2 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -57,6 +57,7 @@ ppc_ss.add(when: 'CONFIG_POWERNV', if_true: files(
   'pnv_nest_pervasive.c',
   'pnv_n1_chiplet.c',
   'pnv_mpipl.c',
+  'pnv_nmmu.c'
 ))
 # PowerPC 4xx boards
 ppc_ss.add(when: 'CONFIG_PPC405', if_true: files(
-- 
2.52.0




* [PULL 13/13] hw/intc/xics: Add a check for an invalid server id
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (11 preceding siblings ...)
  2026-04-29 18:33 ` [PULL 12/13] ppc/pnv: Add a nest MMU model Harsh Prateek Bora
@ 2026-04-29 18:33 ` Harsh Prateek Bora
  2026-04-30 17:35 ` [PULL 00/13] PPC PR for 11.1 (2026-04-29) Stefan Hajnoczi
  13 siblings, 0 replies; 17+ messages in thread
From: Harsh Prateek Bora @ 2026-04-29 18:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: kiki, Zexiang Zhang, Gautam Menghani, Philippe Mathieu-Daudé

From: kiki <Chan9Yan9@gmail.com>

A malformed IVE value can result in an invalid server field being
passed to icp_irq(). The function assumes the server id is valid and
may access invalid state otherwise, potentially leading to a crash.

Fix this by validating the server id before using it and ignoring
invalid values.
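
The shape of the fix can be shown with a small standalone sketch. It is not
the XICS code: DemoIcs, DemoIcp, the fixed server count and the demo_*
functions are hypothetical, and the real icp_irq() also applies priority
checks that are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_NR_SERVERS 4   /* arbitrary size for the sketch */

/* Hypothetical per-server presentation state. */
typedef struct DemoIcp {
    int pending_irq;
} DemoIcp;

/* Hypothetical interrupt source with a count of rejected interrupts. */
typedef struct DemoIcs {
    DemoIcp servers[DEMO_NR_SERVERS];
    int rejected;
} DemoIcs;

/* Return NULL instead of an out-of-range pointer for a bad server id. */
static DemoIcp *demo_icp_get(DemoIcs *ics, int server)
{
    if (server < 0 || server >= DEMO_NR_SERVERS) {
        return NULL;
    }
    return &ics->servers[server];
}

/* Deliver an interrupt; reject it when the server id is invalid. */
static int demo_icp_irq(DemoIcs *ics, int server, int nr)
{
    DemoIcp *icp;

    assert(ics != NULL);
    icp = demo_icp_get(ics, server);
    if (!icp) {
        fprintf(stderr, "demo: invalid server %d for IRQ 0x%x\n",
                server, (unsigned)nr);
        ics->rejected++;
        return -1;
    }
    icp->pending_irq = nr;
    return 0;
}
```

The point is that the lookup result is checked before any field of the
per-server state is read, so a guest-controlled server id can no longer lead
to a dereference of a missing object.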

Reported-by: Zexiang Zhang <chan9yan9@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/3324
Signed-off-by: Zexiang Zhang <chan9yan9@gmail.com>
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20260428103645.50617-1-Gautam.Menghani@ibm.com
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/intc/xics.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index c0a252d051..e32984e9fc 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -26,6 +26,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
 #include "qapi/error.h"
 #include "trace.h"
 #include "qemu/timer.h"
@@ -222,6 +223,13 @@ void icp_irq(ICSState *ics, int server, int nr, uint8_t priority)
 
     trace_xics_icp_irq(server, nr, priority);
 
+    if (!icp) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XICS: invalid server %d for IRQ 0x%x\n",
+                      server, nr);
+        ics_reject(ics, nr);
+        return;
+    }
+
     if ((priority >= CPPR(icp))
         || (XISR(icp) && (icp->pending_priority <= priority))) {
         ics_reject(ics, nr);
-- 
2.52.0




* Re: [PULL 00/13] PPC PR for 11.1 (2026-04-29)
  2026-04-29 18:32 [PULL 00/13] PPC PR for 11.1 (2026-04-29) Harsh Prateek Bora
                   ` (12 preceding siblings ...)
  2026-04-29 18:33 ` [PULL 13/13] hw/intc/xics: Add a check for an invalid server id Harsh Prateek Bora
@ 2026-04-30 17:35 ` Stefan Hajnoczi
  13 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2026-04-30 17:35 UTC (permalink / raw)
  To: Harsh Prateek Bora; +Cc: qemu-devel


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/11.1 for any user-visible changes.



* Re: [PULL 06/13] pnv/mpipl: Set thread entry size to be allocated by firmware
  2026-04-29 18:32 ` [PULL 06/13] pnv/mpipl: Set thread entry size to be allocated by firmware Harsh Prateek Bora
@ 2026-05-08  9:15   ` Peter Maydell
  2026-05-08 10:18     ` Shivang Upadhyay
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2026-05-08  9:15 UTC (permalink / raw)
  To: Harsh Prateek Bora
  Cc: qemu-devel, Aditya Gupta, Hari Bathini, Sourabh Jain,
	Shivang Upadhyay

On Wed, 29 Apr 2026 at 19:35, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>
> From: Aditya Gupta <adityag@linux.ibm.com>
>
> Set the "Thread Register State Entry Size" that is required by firmware
> (OPAL), to know size of memory to allocate to capture CPU state, in the
> event of a crash
>
> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
> Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-7-adityag@linux.ibm.com
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>

Hi; Coverity points out an issue with this change (CID 1658041):

>  static void pnv_reset(MachineState *machine, ResetType type)
>  {
> +    PnvMachineState *pnv = PNV_MACHINE(machine);
>      void *fdt;
>
>      qemu_devices_reset(type);
>
> +    if (!pnv->mpipl_state.is_next_boot_mpipl) {
> +        /*
> +         * Set the "Thread Register State Entry Size", so that firmware can
> +         * allocate enough memory to capture CPU state in the event of a
> +         * crash
> +         */
> +
> +        MpiplProcDumpArea proc_area;

Here we don't initialize the struct...

> +
> +        proc_area.version = PROC_DUMP_AREA_VERSION_P9;
> +        proc_area.thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState));
> +
> +        /* These are to be allocated & assigned by the firmware */
> +        proc_area.alloc_addr = 0;
> +        proc_area.alloc_size = 0;
> +
> +        /* These get assigned after crash, when QEMU preserves the registers */
> +        proc_area.dest_addr = 0;
> +        proc_area.act_size = 0;

...and here we don't fill in all the fields; we don't set
the reserved, reserved2 or reserved3 fields to anything.
This means that we will write data to the guest which is
potentially random host data from the stack.

I think I'd suggest fixing this by initializing the struct in
one go, like this:

    MpiplProcDumpArea proc_area = {
       .version = PROC_DUMP_AREA_VERSION_P9,
       .thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState)),
       [set alloc_addr = 0 etc here if you like, or rely on
        the "fields not listed are zero-inited"]
    };

(In fact because we use -ftrivial-auto-var-init=zero the compiler
will zero init this for us anyway, but we can make Coverity
happy anyway.)

> +
> +        cpu_physical_memory_write(PROC_DUMP_AREA_OFF, &proc_area,
> +                sizeof(proc_area));
> +    }
> +
>      fdt = machine->fdt;
>      cpu_physical_memory_write(PNV_FDT_ADDR, fdt, fdt_totalsize(fdt));
>  }

thanks
-- PMM



* Re: [PULL 06/13] pnv/mpipl: Set thread entry size to be allocated by firmware
  2026-05-08  9:15   ` Peter Maydell
@ 2026-05-08 10:18     ` Shivang Upadhyay
  0 siblings, 0 replies; 17+ messages in thread
From: Shivang Upadhyay @ 2026-05-08 10:18 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Harsh Prateek Bora, qemu-devel, Aditya Gupta, Hari Bathini,
	Sourabh Jain

Hi Peter,

On Fri, May 08, 2026 at 10:15:10AM +0100, Peter Maydell wrote:
> On Wed, 29 Apr 2026 at 19:35, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> >
> > From: Aditya Gupta <adityag@linux.ibm.com>
> >
> > Set the "Thread Register State Entry Size" that is required by firmware
> > (OPAL), to know size of memory to allocate to capture CPU state, in the
> > event of a crash
> >
> > Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
> > Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> > Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> > Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>
> > Link: https://lore.kernel.org/qemu-devel/20260424083837.214947-7-adityag@linux.ibm.com
> > Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> 
> Hi; Coverity points out an issue with this change (CID 1658041):

Thanks for reporting, I'll look into this.
> 
> >  static void pnv_reset(MachineState *machine, ResetType type)
> >  {
> > +    PnvMachineState *pnv = PNV_MACHINE(machine);
> >      void *fdt;
> >
> >      qemu_devices_reset(type);
> >
> > +    if (!pnv->mpipl_state.is_next_boot_mpipl) {
> > +        /*
> > +         * Set the "Thread Register State Entry Size", so that firmware can
> > +         * allocate enough memory to capture CPU state in the event of a
> > +         * crash
> > +         */
> > +
> > +        MpiplProcDumpArea proc_area;
> 
> Here we don't initialize the struct...
> 
> > +
> > +        proc_area.version = PROC_DUMP_AREA_VERSION_P9;
> > +        proc_area.thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState));
> > +
> > +        /* These are to be allocated & assigned by the firmware */
> > +        proc_area.alloc_addr = 0;
> > +        proc_area.alloc_size = 0;
> > +
> > +        /* These get assigned after crash, when QEMU preserves the registers */
> > +        proc_area.dest_addr = 0;
> > +        proc_area.act_size = 0;
> 
> ...and here we don't fill in all the fields; we don't set
> the reserved, reserved2 or reserved3 fields to anything.
> This means that we will write data to the guest which is
> potentially random host data from the stack.
> 
> I think I'd suggest fixing this by initializing the struct in
> one go, like this:
> 
>     MpiplProcDumpArea proc_area = {
>        .version = PROC_DUMP_AREA_VERSION_P9,
>        .thread_size = cpu_to_be32(sizeof(MpiplPreservedCPUState)),
>        [set alloc_addr = 0 etc here if you like, or rely on
>         the "fields not listed are zero-inited"]
>     };
> 
> (In fact because we use -ftrivial-auto-var-init=zero the compiler
> will zero init this for us anyway, but we can make Coverity
> happy anyway.)

Thanks for suggesting the fix too.

~Shivang.


