qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Implement Firmware Assisted Dump for PSeries
@ 2025-02-17  7:17 Aditya Gupta
  2025-02-17  7:17 ` [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries Aditya Gupta
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-17  7:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza,
	Harsh Prateek Bora, Sourabh Jain, Mahesh J Salgaonkar,
	Hari Bathini

Overview
=========

Implemented Firmware Assisted Dump (fadump) on PSeries machine in QEMU.

Fadump is an alternative dump mechanism to kdump, in which we the firmware
does a memory preserving boot, and the second/crashkernel is booted fresh
like a normal system reset, instead of the crashed kernel loading the
second/crashkernel in case of kdump.

This requires implementing the "ibm,configure-kernel-dump" RTAS call in
QEMU.

While booting with fadump=on, Linux will register fadump memory regions.

Some memory regions like Real Mode Memory regions, and custom memory
regions declared by OS basically require copying the requested memory
range to a destination

While other memory regions are populated by the firmware/platform (QEMU in
this case), such as CPU State Data and HPTE.
We pass the sizes for these data segment to the kernel as it needs to know
how much memory to reserve (ibm,configure-kernel-dump-sizes).

Then after a crash, once Linux does a OS terminate call, we trigger fadump
if fadump was registered.

Implementing the fadump boot as:
    * pause all vcpus (will save registers later)
    * preserve memory regions specified by fadump
    * do a memory preserving reboot (using GUEST_RESET as it doesn't clear
      the memory)

And then we pass a metadata (firmware memory structure) as
"ibm,kernel-dump" in the device tree, containing all details of the
preserved memory regions to the kernel.

Testing
=======

Has been tested with following QEMU options:

* firmware: x-vof and SLOF
* tcg & kvm
* l1 guest and l2 guest
* with/without smp
* cma/nocma
* default crashkernel values and crashkernel=1G

Logs of a linux boot with firmware assisted dump:

    ./build/qemu-system-ppc64 -M pseries,x-vof=on --cpu power10 --smp 4 -m 4G -kernel some-vmlinux -initrd some-initrd -append "debug fadump=on crashkernel=1G" -nographic
    [    0.000000] random: crng init done
    [    0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 4096MB)
    ...
    [    1.084686] rtas fadump: Registration is successful!
    ...
    # cat /sys/kernel/debug/powerpc/fadump_region
    CPU :[0x00000040000000-0x000000400013d3] 0x13d4 bytes, Dumped: 0x0
    HPTE:[0x000000400013d4-0x000000400013d3] 0x0 bytes, Dumped: 0x0
    DUMP: Src: 0x00000000000000, Dest: 0x00000040010000, Size: 0x40000000, Dumped: 0x0 bytes

    [0x000000fffff800-0x000000ffffffff]: cmdline append: ''
    # echo c > /proc/sysrq-trigger

The fadump boot after crash:

    [    0.000000] rtas fadump: Firmware-assisted dump is active.
    [    0.000000] fadump: Updated cmdline: debug fadump=on crashkernel=1G
    [    0.000000] fadump: Firmware-assisted dump is active.
    [    0.000000] fadump: Reserving 3072MB of memory at 0x00000040000000 for preserving crash data
    ....
    # file /proc/vmcore
    /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style

Analysing the vmcore with crash-utility:

          KERNEL: vmlinux-6.14-rc2
        DUMPFILE: vmcore-a64dcfb451e2-nocma
            CPUS: 4
            DATE: Thu Jan  1 05:30:00 IST 1970
          UPTIME: 00:00:30
    LOAD AVERAGE: 0.74, 0.21, 0.07
           TASKS: 94
        NODENAME: buildroot
         RELEASE: 6.14.0-rc2+
         VERSION: #1 SMP Wed Feb 12 06:49:59 CST 2025
         MACHINE: ppc64le  (1000 Mhz)
          MEMORY: 4 GB
           PANIC: "Kernel panic - not syncing: sysrq triggered crash"
             PID: 270
         COMMAND: "sh"
            TASK: c000000009e7cc00  [THREAD_INFO: c000000009e7cc00]
             CPU: 3
           STATE: TASK_RUNNING (PANIC)



Git Tree for Testing
====================

https://github.com/adi-g15-ibm/qemu/tree/fadump-pseries-v1

Known Issues
============

* CPU register saving seems to have cases where it's showing all registers
with the same value

* The implementation doesn't pass all the registers mentioned in PAPR since
  QEMU doesn't implement them/doesn't need them.
  The linux kernel uses only 9 of the 45 registers we are passing in QEMU.

Aditya Gupta (6):
  hw/ppc: Implement skeleton code for fadump in PSeries
  hw/ppc: Trigger Fadump boot if fadump is registered
  hw/ppc: Preserve memory regions registered for fadump
  hw/ppc: Implement saving CPU state in Fadump
  hw/ppc: Pass device tree properties for Fadump
  hw/ppc: Enable Fadump for PSeries

 hw/ppc/spapr.c         |  62 ++++++
 hw/ppc/spapr_rtas.c    | 456 +++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h | 172 +++++++++++++++-
 3 files changed, 689 insertions(+), 1 deletion(-)

-- 
2.48.1



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries
  2025-02-17  7:17 [PATCH 0/6] Implement Firmware Assisted Dump for PSeries Aditya Gupta
@ 2025-02-17  7:17 ` Aditya Gupta
  2025-02-27  3:07   ` Nicholas Piggin
  2025-03-04  9:01   ` Harsh Prateek Bora
  2025-02-17  7:17 ` [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered Aditya Gupta
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-17  7:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza,
	Harsh Prateek Bora, Sourabh Jain, Mahesh J Salgaonkar,
	Hari Bathini

Implement the handler for "ibm,configure-kernel-dump" rtas call in QEMU.

Currently the handler just does basic checks and handles
register/unregister/invalidate requests from kernel.

Fadump will be enabled in a later patch.

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
 hw/ppc/spapr_rtas.c    | 99 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h | 59 +++++++++++++++++++++++++
 2 files changed, 158 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index df2e837632aa..eebdf13b1552 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -341,6 +341,105 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
     rtas_st(rets, 0, ret);
 }
 
+struct fadump_metadata fadump_metadata;
+
+/* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
+static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   uint32_t token, uint32_t nargs,
+                                   target_ulong args,
+                                   uint32_t nret, target_ulong rets)
+{
+    struct rtas_fadump_section_header header;
+    target_ulong cmd = rtas_ld(args, 0);
+    target_ulong fdm_addr = rtas_ld(args, 1);
+    target_ulong fdm_size = rtas_ld(args, 2);
+
+    /* Number outputs has to be 1 */
+    if (nret != 1) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                "FADUMP: ibm,configure-kernel-dump RTAS called with nret != 1.\n");
+        return;
+    }
+
+    /* Number inputs has to be 3 */
+    if (nargs != 3) {
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    switch (cmd) {
+    case FADUMP_CMD_REGISTER:
+        if (fadump_metadata.fadump_registered) {
+            /* Fadump already registered */
+            rtas_st(rets, 0, RTAS_OUT_DUMP_ALREADY_REGISTERED);
+            return;
+        }
+
+        if (fadump_metadata.fadump_dump_active == 1) {
+            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
+            return;
+        }
+
+        if (fdm_size < sizeof(struct rtas_fadump_section_header)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "FADUMP: Header size is invalid: %lu\n", fdm_size);
+            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+            return;
+        }
+
+        /* XXX: Can we ensure fdm_addr points to a valid RMR-memory buffer ? */
+        if (fdm_addr <= 0) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "FADUMP: Invalid fdm address: %ld\n", fdm_addr);
+            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+            return;
+        }
+
+        /* Verify that we understand the fadump header version */
+        cpu_physical_memory_read(fdm_addr, &header, sizeof(header));
+        if (header.dump_format_version != cpu_to_be32(FADUMP_VERSION)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "FADUMP: Unknown fadump header version: 0x%x\n",
+                header.dump_format_version);
+            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+            return;
+        }
+
+        fadump_metadata.fadump_registered = true;
+        fadump_metadata.fadump_dump_active = false;
+        fadump_metadata.fdm_addr = fdm_addr;
+        break;
+    case FADUMP_CMD_UNREGISTER:
+        if (fadump_metadata.fadump_dump_active == 1) {
+            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
+            return;
+        }
+
+        fadump_metadata.fadump_registered = false;
+        fadump_metadata.fadump_dump_active = false;
+        fadump_metadata.fdm_addr = -1;
+        break;
+    case FADUMP_CMD_INVALIDATE:
+        if (fadump_metadata.fadump_dump_active) {
+            fadump_metadata.fadump_registered = false;
+            fadump_metadata.fadump_dump_active = false;
+            fadump_metadata.fdm_addr = -1;
+            memset(&fadump_metadata.registered_fdm, 0,
+                    sizeof(fadump_metadata.registered_fdm));
+        } else {
+            hcall_dprintf("fadump: Nothing to invalidate, no dump active.\n");
+        }
+        break;
+    default:
+        hcall_dprintf("Unknown RTAS token 0x%x\n", token);
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
 static void rtas_ibm_os_term(PowerPCCPU *cpu,
                             SpaprMachineState *spapr,
                             uint32_t token, uint32_t nargs,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a6c0547e313d..efa2f891a8a7 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -704,6 +704,8 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
 #define RTAS_OUT_PARAM_ERROR                    -3
 #define RTAS_OUT_NOT_SUPPORTED                  -3
 #define RTAS_OUT_NO_SUCH_INDICATOR              -3
+#define RTAS_OUT_DUMP_ALREADY_REGISTERED        -9
+#define RTAS_OUT_DUMP_ACTIVE                    -10
 #define RTAS_OUT_NOT_AUTHORIZED                 -9002
 #define RTAS_OUT_SYSPARM_PARAM_ERROR            -9999
 
@@ -769,6 +771,63 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
 
 #define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
 
+/* Fadump commands */
+#define FADUMP_CMD_REGISTER            1
+#define FADUMP_CMD_UNREGISTER          2
+#define FADUMP_CMD_INVALIDATE          3
+
+#define FADUMP_VERSION    1
+
+/*
+ * The Firmware Assisted Dump Memory structure supports a maximum of 10 sections
+ * in the dump memory structure. Presently, three sections are used for
+ * CPU state data, HPTE & Parameters area, while the remaining seven sections
+ * can be used for boot memory regions.
+ */
+#define FADUMP_MAX_SECTIONS            10
+#define RTAS_FADUMP_MAX_BOOT_MEM_REGS  7
+
+/* Kernel Dump section info */
+struct rtas_fadump_section {
+    __be32    request_flag;
+    __be16    source_data_type;
+    __be16    error_flags;
+    __be64    source_address;
+    __be64    source_len;
+    __be64    bytes_dumped;
+    __be64    destination_address;
+};
+
+/* ibm,configure-kernel-dump header. */
+struct rtas_fadump_section_header {
+    __be32    dump_format_version;
+    __be16    dump_num_sections;
+    __be16    dump_status_flag;
+    __be32    offset_first_dump_section;
+
+    /* Fields for disk dump option. */
+    __be32    dd_block_size;
+    __be64    dd_block_offset;
+    __be64    dd_num_blocks;
+    __be32    dd_offset_disk_path;
+
+    /* Maximum time allowed to prevent an automatic dump-reboot. */
+    __be32    max_time_auto;
+};
+
+struct rtas_fadump_mem_struct {
+    struct rtas_fadump_section_header header;
+    struct rtas_fadump_section        rgn[FADUMP_MAX_SECTIONS];
+};
+
+struct fadump_metadata {
+    bool fadump_registered;
+    bool fadump_dump_active;
+    target_ulong fdm_addr;
+    struct rtas_fadump_mem_struct registered_fdm;
+};
+extern struct fadump_metadata fadump_metadata;
+
 /* RTAS ibm,get-system-parameter token values */
 #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
 #define RTAS_SYSPARM_DIAGNOSTICS_RUN_MODE        42
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered
  2025-02-17  7:17 [PATCH 0/6] Implement Firmware Assisted Dump for PSeries Aditya Gupta
  2025-02-17  7:17 ` [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries Aditya Gupta
@ 2025-02-17  7:17 ` Aditya Gupta
  2025-02-27  3:14   ` Nicholas Piggin
  2025-03-04  9:21   ` Harsh Prateek Bora
  2025-02-17  7:17 ` [PATCH 3/6] hw/ppc: Preserve memory regions registered for fadump Aditya Gupta
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-17  7:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza,
	Harsh Prateek Bora, Sourabh Jain, Mahesh J Salgaonkar,
	Hari Bathini

According to PAPR:

    R1–7.3.30–3. When the platform receives an ibm,os-term RTAS call, or
    on a system reset without an ibm,nmi-interlock RTAS call, if the
    platform has a dump structure registered through the
    ibm,configure-kernel-dump call, the platform must process each
    registered kernel dump section as required and, when available,
    present the dump structure information to the operating system
    through the “ibm,kernel-dump” property, updated with status for each
    dump section, until the dump has been invalidated through the
    ibm,configure-kernel-dump RTAS call.

If Fadump has been registered, trigger an Fadump boot (memory preserving
boot), if QEMU recieves a 'ibm,os-term' rtas call.

Implementing the fadump boot as:
    * pause all vcpus (will save registers later)
    * preserve memory regions specified by fadump
    * do a memory preserving reboot (GUEST_RESET in QEMU doesn't clear
      the memory)

Memory regions registered by fadump will be handled in a later patch.

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
 hw/ppc/spapr_rtas.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index eebdf13b1552..01c82375f03d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -342,6 +342,43 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
 }
 
 struct fadump_metadata fadump_metadata;
+bool is_next_boot_fadump;
+
+static void trigger_fadump_boot(target_ulong spapr_retcode)
+{
+    /*
+     * In PowerNV, SBE stops all clocks for cores, do similar to it
+     * QEMU's nearest equivalent is 'pause_all_vcpus'
+     * See 'stopClocksS0' in SBE source code for more info on SBE part
+     */
+    pause_all_vcpus();
+
+    if (true /* TODO: Preserve memory registered for fadump */) {
+        /* Failed to preserve the registered memory regions */
+        rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
+
+        /* Cause a reboot */
+        qemu_system_guest_panicked(NULL);
+        return;
+    }
+
+    /* Mark next boot as fadump boot */
+    is_next_boot_fadump = true;
+
+    /* Reset fadump_registered for next boot */
+    fadump_metadata.fadump_registered = false;
+    fadump_metadata.fadump_dump_active = true;
+
+    /* Then do a guest reset */
+    /*
+     * Requirement:
+     * This guest reset should not clear the memory (which is
+     * the case when this is merged)
+     */
+    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+
+    rtas_st(spapr_retcode, 0, RTAS_OUT_SUCCESS);
+}
 
 /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
 static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
@@ -449,6 +486,11 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
     target_ulong msgaddr = rtas_ld(args, 0);
     char msg[512];
 
+    if (fadump_metadata.fadump_registered) {
+        /* If fadump boot works, control won't come back here */
+        return trigger_fadump_boot(rets);
+    }
+
     cpu_physical_memory_read(msgaddr, msg, sizeof(msg) - 1);
     msg[sizeof(msg) - 1] = 0;
 
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/6] hw/ppc: Preserve memory regions registered for fadump
  2025-02-17  7:17 [PATCH 0/6] Implement Firmware Assisted Dump for PSeries Aditya Gupta
  2025-02-17  7:17 ` [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries Aditya Gupta
  2025-02-17  7:17 ` [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered Aditya Gupta
@ 2025-02-17  7:17 ` Aditya Gupta
  2025-03-05  6:40   ` Harsh Prateek Bora
  2025-02-17  7:17 ` [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump Aditya Gupta
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Aditya Gupta @ 2025-02-17  7:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza,
	Harsh Prateek Bora, Sourabh Jain, Mahesh J Salgaonkar,
	Hari Bathini

While the first kernel boots, it registers memory regions for fadump
such as:
    * CPU state data  (has to be populated by the platform)
    * HPTE state data (has to be populated by the platform)
    * Real Mode Regions (platform should copy it to requested
      destination addresses)
    * OS defined regions (such as parameter save area)

Platform is also expected to modify the 'bytes_dumped' to the length of
data preserved/copied by platform (ideally same as the source length
passed by kernel).

The kernel passes source address and length for the memory regions, and
a destination address to where the memory is to be copied.

Implement the preserving/copying of the Real Mode Regions and the
Parameter Save Area in QEMU Pseries

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
 hw/ppc/spapr_rtas.c    | 117 ++++++++++++++++++++++++++++++++++++++++-
 include/hw/ppc/spapr.h |  27 +++++++++-
 2 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 01c82375f03d..9b29cadab2c9 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -344,6 +344,120 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
 struct fadump_metadata fadump_metadata;
 bool is_next_boot_fadump;
 
+/* Preserve the memory locations registered for fadump */
+static bool fadump_preserve_mem(void)
+{
+    struct rtas_fadump_mem_struct *fdm = &fadump_metadata.registered_fdm;
+    uint64_t next_section_addr;
+    int dump_num_sections, data_type;
+    uint64_t src_addr, src_len, dest_addr;
+    void *copy_buffer;
+
+    assert(fadump_metadata.fadump_registered);
+    assert(fadump_metadata.fdm_addr != -1);
+
+    /* Read the fadump header passed during fadump registration */
+    cpu_physical_memory_read(fadump_metadata.fdm_addr,
+            &fdm->header, sizeof(fdm->header));
+
+    /* Verify that we understand the fadump header version */
+    if (fdm->header.dump_format_version != cpu_to_be32(FADUMP_VERSION)) {
+        /*
+         * Dump format version is unknown and likely changed from the time
+         * of fadump registration. Back out now.
+         */
+        return false;
+    }
+
+    dump_num_sections = be16_to_cpu(fdm->header.dump_num_sections);
+
+    if (dump_num_sections > FADUMP_MAX_SECTIONS) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "FADUMP: Too many sections: %d\n", fdm->header.dump_num_sections);
+        return false;
+    }
+
+    next_section_addr =
+        fadump_metadata.fdm_addr +
+        be32_to_cpu(fdm->header.offset_first_dump_section);
+
+    /*
+     * Handle all sections
+     *
+     * CPU State Data and HPTE regions are handled in their own cases
+     *
+     * RMR regions and any custom OS reserved regions such as parameter
+     * save area, are handled by simply copying the source region to
+     * destination address
+     */
+    for (int i = 0; i < dump_num_sections; ++i) {
+        /* Read the fadump section from memory */
+        cpu_physical_memory_read(next_section_addr,
+                &fdm->rgn[i], sizeof(fdm->rgn[i]));
+
+        next_section_addr += sizeof(fdm->rgn[i]);
+
+        data_type = be16_to_cpu(fdm->rgn[i].source_data_type);
+        src_addr  = be64_to_cpu(fdm->rgn[i].source_address);
+        src_len   = be64_to_cpu(fdm->rgn[i].source_len);
+        dest_addr = be64_to_cpu(fdm->rgn[i].destination_address);
+
+        /* Reset error_flags & bytes_dumped for now */
+        fdm->rgn[i].error_flags = 0;
+        fdm->rgn[i].bytes_dumped = 0;
+
+        if (be32_to_cpu(fdm->rgn[i].request_flag) != FADUMP_REQUEST_FLAG) {
+            qemu_log_mask(LOG_UNIMP,
+                "FADUMP: Skipping copying region as not requested\n");
+            continue;
+        }
+
+        switch (data_type) {
+        case FADUMP_CPU_STATE_DATA:
+            /* TODO: Add CPU state data */
+            break;
+        case FADUMP_HPTE_REGION:
+            /* TODO: Add hpte state data */
+            break;
+        case FADUMP_REAL_MODE_REGION:
+        case FADUMP_PARAM_AREA:
+            /* Skip copy if source and destination are same (eg. param area) */
+            if (src_addr != dest_addr) {
+                copy_buffer = g_malloc(src_len + 1);
+                if (copy_buffer == NULL) {
+                    qemu_log_mask(LOG_GUEST_ERROR,
+                        "FADUMP: Failed allocating memory for copying reserved memory regions\n");
+                    fdm->rgn[i].error_flags =
+                        cpu_to_be16(FADUMP_ERROR_LENGTH_EXCEEDS_SOURCE);
+
+                    continue;
+                }
+
+                /* Copy the source region to destination */
+                cpu_physical_memory_read(src_addr, copy_buffer, src_len);
+                cpu_physical_memory_write(dest_addr, copy_buffer, src_len);
+                g_free(copy_buffer);
+            }
+
+            /*
+             * Considering cpu_physical_memory_write would have copied the
+             * complete region
+             */
+            fdm->rgn[i].bytes_dumped = cpu_to_be64(src_len);
+
+            break;
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "FADUMP: Skipping unknown source data type: %d\n", data_type);
+
+            fdm->rgn[i].error_flags =
+                cpu_to_be16(FADUMP_ERROR_INVALID_DATA_TYPE);
+        }
+    }
+
+    return true;
+}
+
 static void trigger_fadump_boot(target_ulong spapr_retcode)
 {
     /*
@@ -353,7 +467,8 @@ static void trigger_fadump_boot(target_ulong spapr_retcode)
      */
     pause_all_vcpus();
 
-    if (true /* TODO: Preserve memory registered for fadump */) {
+    /* Preserve the memory locations registered for fadump */
+    if (!fadump_preserve_mem()) {
         /* Failed to preserve the registered memory regions */
         rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index efa2f891a8a7..a80704187583 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -776,7 +776,32 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
 #define FADUMP_CMD_UNREGISTER          2
 #define FADUMP_CMD_INVALIDATE          3
 
-#define FADUMP_VERSION    1
+#define FADUMP_VERSION                 1
+
+/*
+ * The Firmware Assisted Dump Memory structure supports a maximum of 10 sections
+ * in the dump memory structure. Presently, three sections are used for
+ * CPU state data, HPTE & Parameters area, while the remaining seven sections
+ * can be used for boot memory regions.
+ */
+#define FADUMP_MAX_SECTIONS            10
+#define RTAS_FADUMP_MAX_BOOT_MEM_REGS  7
+
+/* Firmware provided dump sections */
+#define FADUMP_CPU_STATE_DATA   0x0001
+#define FADUMP_HPTE_REGION      0x0002
+#define FADUMP_REAL_MODE_REGION 0x0011
+
+/* OS defined sections */
+#define FADUMP_PARAM_AREA       0x0100
+
+/* Dump request flag */
+#define FADUMP_REQUEST_FLAG     0x00000001
+
+/* Dump status flag */
+#define FADUMP_ERROR_INVALID_DATA_TYPE          0x8000
+#define FADUMP_ERROR_INVALID_SOURCE_ADDR        0x4000
+#define FADUMP_ERROR_LENGTH_EXCEEDS_SOURCE      0x2000
 
 /*
  * The Firmware Assisted Dump Memory structure supports a maximum of 10 sections
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump
  2025-02-17  7:17 [PATCH 0/6] Implement Firmware Assisted Dump for PSeries Aditya Gupta
                   ` (2 preceding siblings ...)
  2025-02-17  7:17 ` [PATCH 3/6] hw/ppc: Preserve memory regions registered for fadump Aditya Gupta
@ 2025-02-17  7:17 ` Aditya Gupta
  2025-02-27  3:27   ` Nicholas Piggin
  2025-03-05  7:23   ` Harsh Prateek Bora
  2025-02-17  7:17 ` [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump Aditya Gupta
  2025-02-17  7:17 ` [PATCH 6/6] hw/ppc: Enable Fadump for PSeries Aditya Gupta
  5 siblings, 2 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-17  7:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza,
	Harsh Prateek Bora, Sourabh Jain, Mahesh J Salgaonkar,
	Hari Bathini

Kernel expects CPU states/register states in the format mentioned in
"Register Save Area" in PAPR.

The platform (in our case, QEMU) saves each CPU register in the form of
an array of "register entries", the start and end of this array is
signified by "CPUSTRT" and "CPUEND" register entries respectively.

The CPUSTRT and CPUEND register entry also has 4-byte logical CPU ID,
thus storing the CPU ID corresponding to the array of register entries.

Each register, and CPUSTRT, CPUEND has a predefined identifier.
Implement calculating identifier for a given register in
'fadump_str_to_u64', which has been taken from the linux kernel

Similarly GPRs also have predefined identifiers, and a corresponding
64-bit resiter value (split into two 32-bit cells). Implement
calculation of GPR identifiers with 'fadump_gpr_id_to_u64'

PAPR has restrictions on particular order of few registers, and is
free to be in any order for other registers.
Some registers mentioned in PAPR have not been exported as they are not
implemented in QEMU / don't make sense in QEMU.

Implement saving of CPU state according to the PAPR document

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
 hw/ppc/spapr_rtas.c    | 200 ++++++++++++++++++++++++++++++++++++++++-
 include/hw/ppc/spapr.h |  83 +++++++++++++++++
 2 files changed, 281 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 9b29cadab2c9..0aca4270aee8 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -348,9 +348,12 @@ bool is_next_boot_fadump;
 static bool fadump_preserve_mem(void)
 {
     struct rtas_fadump_mem_struct *fdm = &fadump_metadata.registered_fdm;
+    struct rtas_fadump_section *cpu_state_region;
     uint64_t next_section_addr;
     int dump_num_sections, data_type;
     uint64_t src_addr, src_len, dest_addr;
+    uint64_t cpu_state_addr, cpu_state_len = 0;
+    void *cpu_state_buffer;
     void *copy_buffer;
 
     assert(fadump_metadata.fadump_registered);
@@ -413,9 +416,174 @@ static bool fadump_preserve_mem(void)
         }
 
         switch (data_type) {
-        case FADUMP_CPU_STATE_DATA:
-            /* TODO: Add CPU state data */
+        case FADUMP_CPU_STATE_DATA: {
+            struct rtas_fadump_reg_save_area_header reg_save_hdr;
+            struct rtas_fadump_reg_entry **reg_entries;
+            struct rtas_fadump_reg_entry *curr_reg_entry;
+
+            uint32_t fadump_reg_entries_size;
+            __be32 num_cpus = 0;
+            uint32_t num_regs_per_cpu = 0;
+            CPUState *cpu;
+            CPUPPCState *env;
+            PowerPCCPU *ppc_cpu;
+
+            CPU_FOREACH(cpu) {
+                ++num_cpus;
+            }
+
+            reg_save_hdr.version = cpu_to_be32(1);
+            reg_save_hdr.magic_number =
+                cpu_to_be64(fadump_str_to_u64("REGSAVE"));
+
+            /* Reg save area header is immediately followed by num cpus */
+            reg_save_hdr.num_cpu_offset =
+                cpu_to_be32(sizeof(struct rtas_fadump_reg_save_area_header));
+
+            fadump_reg_entries_size = num_cpus *
+                                      FADUMP_NUM_PER_CPU_REGS *
+                                      sizeof(struct rtas_fadump_reg_entry);
+
+            reg_entries = malloc(fadump_reg_entries_size);
+            curr_reg_entry = (struct rtas_fadump_reg_entry *)reg_entries;
+
+            /* This must loop num_cpus time */
+            CPU_FOREACH(cpu) {
+                ppc_cpu = POWERPC_CPU(cpu);
+                env = cpu_env(cpu);
+                num_regs_per_cpu = 0;
+
+                curr_reg_entry->reg_id =
+                    cpu_to_be64(fadump_str_to_u64("CPUSTRT"));
+                curr_reg_entry->reg_value = ppc_cpu->vcpu_id;
+                ++curr_reg_entry;
+
+#define REG_ENTRY(id, val)                                     \
+                do {                                           \
+                    curr_reg_entry->reg_id =                   \
+                        cpu_to_be64(fadump_str_to_u64(#id));   \
+                    curr_reg_entry->reg_value = val;           \
+                    ++curr_reg_entry;                          \
+                    ++num_regs_per_cpu;                        \
+                } while (0)
+
+                REG_ENTRY(ACOP, env->spr[SPR_ACOP]);
+                REG_ENTRY(AMR, env->spr[SPR_AMR]);
+                REG_ENTRY(BESCR, env->spr[SPR_BESCR]);
+                REG_ENTRY(CFAR, env->spr[SPR_CFAR]);
+                REG_ENTRY(CIABR, env->spr[SPR_CIABR]);
+
+                /* Save the condition register */
+                uint64_t cr = 0;
+                cr |= (env->crf[0] & 0xf);
+                cr |= (env->crf[1] & 0xf) << 1;
+                cr |= (env->crf[2] & 0xf) << 2;
+                cr |= (env->crf[3] & 0xf) << 3;
+                cr |= (env->crf[4] & 0xf) << 4;
+                cr |= (env->crf[5] & 0xf) << 5;
+                cr |= (env->crf[6] & 0xf) << 6;
+                cr |= (env->crf[7] & 0xf) << 7;
+                REG_ENTRY(CR, cr);
+
+                REG_ENTRY(CTR, env->spr[SPR_CTR]);
+                REG_ENTRY(CTRL, env->spr[SPR_CTRL]);
+                REG_ENTRY(DABR, env->spr[SPR_DABR]);
+                REG_ENTRY(DABRX, env->spr[SPR_DABRX]);
+                REG_ENTRY(DAR, env->spr[SPR_DAR]);
+                REG_ENTRY(DAWR0, env->spr[SPR_DAWR0]);
+                REG_ENTRY(DAWR1, env->spr[SPR_DAWR1]);
+                REG_ENTRY(DAWRX0, env->spr[SPR_DAWRX0]);
+                REG_ENTRY(DAWRX1, env->spr[SPR_DAWRX1]);
+                REG_ENTRY(DPDES, env->spr[SPR_DPDES]);
+                REG_ENTRY(DSCR, env->spr[SPR_DSCR]);
+                REG_ENTRY(DSISR, env->spr[SPR_DSISR]);
+                REG_ENTRY(EBBHR, env->spr[SPR_EBBHR]);
+                REG_ENTRY(EBBRR, env->spr[SPR_EBBRR]);
+
+                REG_ENTRY(FPSCR, env->fpscr);
+                REG_ENTRY(FSCR, env->spr[SPR_FSCR]);
+
+                /* Save the GPRs */
+                for (int gpr_id = 0; gpr_id < 32; ++gpr_id) {
+                    curr_reg_entry->reg_id =
+                        cpu_to_be64(fadump_gpr_id_to_u64(gpr_id));
+                    curr_reg_entry->reg_value = env->gpr[i];
+                    ++curr_reg_entry;
+                    ++num_regs_per_cpu;
+                }
+
+                REG_ENTRY(IAMR, env->spr[SPR_IAMR]);
+                REG_ENTRY(IC, env->spr[SPR_IC]);
+                REG_ENTRY(LR, env->spr[SPR_LR]);
+
+                REG_ENTRY(MSR, env->msr);
+                REG_ENTRY(NIA, env->nip);   /* NIA */
+                REG_ENTRY(PIR, env->spr[SPR_PIR]);
+                REG_ENTRY(PSPB, env->spr[SPR_PSPB]);
+                REG_ENTRY(PVR, env->spr[SPR_PVR]);
+                REG_ENTRY(RPR, env->spr[SPR_RPR]);
+                REG_ENTRY(SPURR, env->spr[SPR_SPURR]);
+                REG_ENTRY(SRR0, env->spr[SPR_SRR0]);
+                REG_ENTRY(SRR1, env->spr[SPR_SRR1]);
+                REG_ENTRY(TAR, env->spr[SPR_TAR]);
+                REG_ENTRY(TEXASR, env->spr[SPR_TEXASR]);
+                REG_ENTRY(TFHAR, env->spr[SPR_TFHAR]);
+                REG_ENTRY(TFIAR, env->spr[SPR_TFIAR]);
+                REG_ENTRY(TIR, env->spr[SPR_TIR]);
+                REG_ENTRY(UAMOR, env->spr[SPR_UAMOR]);
+                REG_ENTRY(VRSAVE, env->spr[SPR_VRSAVE]);
+                REG_ENTRY(VSCR, env->vscr);
+                REG_ENTRY(VTB, env->spr[SPR_VTB]);
+                REG_ENTRY(WORT, env->spr[SPR_WORT]);
+                REG_ENTRY(XER, env->spr[SPR_XER]);
+
+                /*
+                 * Ignoring transaction checkpoint and few other registers
+                 * mentioned in PAPR as not supported in QEMU
+                 */
+#undef REG_ENTRY
+
+                /* End the registers for this CPU with "CPUEND" reg entry */
+                curr_reg_entry->reg_id =
+                    cpu_to_be64(fadump_str_to_u64("CPUEND"));
+
+                /* Ensure the number of registers match (+2 for STRT & END) */
+                assert(FADUMP_NUM_PER_CPU_REGS == num_regs_per_cpu + 2);
+
+                ++curr_reg_entry;
+            }
+
+            cpu_state_len = 0;
+            cpu_state_len += sizeof(reg_save_hdr);     /* reg save header */
+            cpu_state_len += sizeof(__be32);           /* num_cpus */
+            cpu_state_len += fadump_reg_entries_size;  /* reg entries */
+
+            cpu_state_region = &fdm->rgn[i];
+            cpu_state_addr = dest_addr;
+            cpu_state_buffer = g_malloc(cpu_state_len);
+
+            uint64_t offset = 0;
+            memcpy(cpu_state_buffer + offset,
+                    &reg_save_hdr, sizeof(reg_save_hdr));
+            offset += sizeof(reg_save_hdr);
+
+            /* Write num_cpus */
+            num_cpus = cpu_to_be32(num_cpus);
+            memcpy(cpu_state_buffer + offset, &num_cpus, sizeof(__be32));
+            offset += sizeof(__be32);
+
+            /* Write the register entries */
+            memcpy(cpu_state_buffer + offset,
+                    reg_entries, fadump_reg_entries_size);
+            offset += fadump_reg_entries_size;
+
+            /*
+             * We will write the cpu state data later, as otherwise it
+             * might get overwritten by other fadump regions
+             */
+
             break;
+        }
         case FADUMP_HPTE_REGION:
             /* TODO: Add hpte state data */
             break;
@@ -455,6 +623,34 @@ static bool fadump_preserve_mem(void)
         }
     }
 
+    /*
+     * Write the Register Save Area
+     *
+     * CPU State/Register Save Area should be written after dumping the
+     * memory to prevent overwritting while saving other memory regions
+     *
+     * eg. If boot memory region is 1G, then both the first 1GB memory, and
+     * the Register Save Area needs to be saved at 1GB.
+     * And as the CPU_STATE_DATA region comes first than the
+     * REAL_MODE_REGION region to be copied, the CPU_STATE_DATA will get
+     * overwritten if saved before the 0GB - 1GB region is copied after
+     * saving CPU state data
+     */
+    cpu_physical_memory_write(cpu_state_addr, cpu_state_buffer, cpu_state_len);
+    g_free(cpu_state_buffer);
+
+    /*
+     * Set bytes_dumped in cpu state region, so kernel knows platform have
+     * exported it
+     */
+    cpu_state_region->bytes_dumped = cpu_to_be64(cpu_state_len);
+
+    if (cpu_state_region->source_len != cpu_state_region->bytes_dumped) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                "CPU State region's length passed by kernel, doesn't match"
+                " with CPU State region length exported by QEMU");
+    }
+
     return true;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a80704187583..0e8002bad9e0 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -792,6 +792,9 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
 #define FADUMP_HPTE_REGION      0x0002
 #define FADUMP_REAL_MODE_REGION 0x0011
 
+/* Number of registers stored per cpu */
+#define FADUMP_NUM_PER_CPU_REGS (32 /*GPR*/ + 45 /*others*/ + 2 /*STRT & END*/)
+
 /* OS defined sections */
 #define FADUMP_PARAM_AREA       0x0100
 
@@ -845,6 +848,86 @@ struct rtas_fadump_mem_struct {
     struct rtas_fadump_section        rgn[FADUMP_MAX_SECTIONS];
 };
 
+/*
+ * The firmware-assisted dump format.
+ *
+ * The register save area is an area in the partition's memory used to preserve
+ * the register contents (CPU state data) for the active CPUs during a firmware
+ * assisted dump. The dump format contains register save area header followed
+ * by register entries. Each list of registers for a CPU starts with "CPUSTRT"
+ * and ends with "CPUEND".
+ */
+
+/* Register save area header. */
+struct rtas_fadump_reg_save_area_header {
+    __be64    magic_number;
+    __be32    version;
+    __be32    num_cpu_offset;
+};
+
+/* Register entry. */
+struct rtas_fadump_reg_entry {
+    __be64    reg_id;
+    __be64    reg_value;
+};
+
+/*
+ * Copy the ascii values for first 8 characters from a string into u64
+ * variable at their respective indexes.
+ * e.g.
+ *  The string "FADMPINF" will be converted into 0x4641444d50494e46
+ */
+static inline uint64_t fadump_str_to_u64(const char *str)
+{
+    uint64_t val = 0;
+    int i;
+
+    for (i = 0; i < sizeof(val); i++) {
+        val = (*str) ? (val << 8) | *str++ : val << 8;
+    }
+    return val;
+}
+
+/**
+ * Get the identifier id for register entries of GPRs
+ *
+ * It gives the same id as 'fadump_str_to_u64' when the complete string id
+ * of the GPR is given, ie.
+ *
+ *   fadump_str_to_u64("GPR05") == fadump_gpr_id_to_u64(5);
+ *   fadump_str_to_u64("GPR12") == fadump_gpr_id_to_u64(12);
+ *
+ * And so on. Hence this can be implemented by creating a dynamic
+ * string for each GPR, such as "GPR00", "GPR01", ... "GPR31"
+ * Instead of allocating a string, an observation from the math of
+ * 'fadump_str_to_u64' or from PAPR tells us that there's a pattern
+ * in the identifier IDs, such that the first 8 bytes are affected only by
+ * whether it is GPR0*, GPR1*, GPR2*, GPR3*. 9th byte is always 0x3. And
+ * the the 10th byte is the index of the GPR modulo 10.
+ */
+static inline uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
+{
+    uint64_t val = 0;
+
+    /* Valid range of GPR id is only GPR0 to GPR31 */
+    assert(gpr_id < 32);
+
+    if (gpr_id <= 9) {
+        val = fadump_str_to_u64("GPR0");
+    } else if (gpr_id <= 19) {
+        val = fadump_str_to_u64("GPR1");
+    } else if (gpr_id <= 29) {
+        val = fadump_str_to_u64("GPR2");
+    } else {
+        val = fadump_str_to_u64("GPR3");
+    }
+
+    val |= 0x30000000;
+    val |= ((gpr_id % 10) << 12);
+
+    return val;
+}
+
 struct fadump_metadata {
     bool fadump_registered;
     bool fadump_dump_active;
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump
  2025-02-17  7:17 [PATCH 0/6] Implement Firmware Assisted Dump for PSeries Aditya Gupta
                   ` (3 preceding siblings ...)
  2025-02-17  7:17 ` [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump Aditya Gupta
@ 2025-02-17  7:17 ` Aditya Gupta
  2025-02-27  3:28   ` Nicholas Piggin
  2025-02-17  7:17 ` [PATCH 6/6] hw/ppc: Enable Fadump for PSeries Aditya Gupta
  5 siblings, 1 reply; 29+ messages in thread
From: Aditya Gupta @ 2025-02-17  7:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza,
	Harsh Prateek Bora, Sourabh Jain, Mahesh J Salgaonkar,
	Hari Bathini

Platform (ie. QEMU) is expected to pass few device tree properties for
details for fadump:

  * "ibm,configure-kernel-dump": RTAS call for fadump
  * "ibm,configure-kernel-dump-sizes": Space required to store dump data
    for firmware provided dump sections (ie. CPU & HPTE regions)
  * "ibm,configure-kernel-dump-version": Versions of fadump supported
  * "ibm,kernel-dump": Contains the Fadump Memory Structure on a fadump
    boot

Implement passing configure-kernel-dump-sizes, and
configure-kernel-dump-version device tree properties, irrespective of
whether it's a fadump boot or not, so that kernel can reserve memory to
store the firmware provided dump sections in case of a crash

Also, in case of a fadump boot, pass the fadump memory structure to the
kernel in "ibm,kernel-dump" device tree property.

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
 hw/ppc/spapr.c         | 62 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  2 ++
 2 files changed, 64 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f3a4b4235d43..3602e5b5d18d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -897,9 +897,27 @@ static int spapr_dt_rng(void *fdt)
 static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
 {
     MachineState *ms = MACHINE(spapr);
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
     int rtas;
     GString *hypertas = g_string_sized_new(256);
     GString *qemu_hypertas = g_string_sized_new(256);
+    uint32_t max_possible_cpus = mc->possible_cpu_arch_ids(ms)->len;
+    uint64_t fadump_cpu_state_size = 0;
+    uint16_t fadump_versions[2] = {
+        FADUMP_VERSION /* min supported version */,
+        FADUMP_VERSION /* max supported version */
+    };
+    uint32_t fadump_rgn_sizes[2][3] = {
+        {
+            cpu_to_be32(FADUMP_CPU_STATE_DATA),
+            0, 0 /* Calculated later */
+        },
+        {
+            cpu_to_be32(FADUMP_HPTE_REGION),
+            0, 0 /* HPTE region not implemented */
+        }
+    };
+
     uint32_t lrdr_capacity[] = {
         0,
         0,
@@ -1006,6 +1024,50 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
     _FDT(fdt_setprop(fdt, rtas, "ibm,lrdr-capacity",
                      lrdr_capacity, sizeof(lrdr_capacity)));
 
+    /*
+     * CPU State Data contains multiple fields such as header, num_cpus and
+     * register entries
+     *
+     * Calculate the maximum CPU State Data size, according to maximum
+     * possible CPUs the QEMU VM can have
+     */
+    /* Reg save header */
+    fadump_cpu_state_size += sizeof(struct rtas_fadump_reg_save_area_header);
+
+    /* Num_cpus */
+    fadump_cpu_state_size += sizeof(__be32);
+
+    /* Register Entries */
+    fadump_cpu_state_size += max_possible_cpus   *
+                             FADUMP_NUM_PER_CPU_REGS *
+                             sizeof(struct rtas_fadump_reg_entry);
+
+    /* Set maximum size for CPU state data region */
+    assert(fadump_rgn_sizes[0][0] == cpu_to_be32(FADUMP_CPU_STATE_DATA));
+
+    /* Upper 32 bits of size, usually 0 */
+    fadump_rgn_sizes[0][1] = cpu_to_be32(fadump_cpu_state_size >> 32);
+
+    /* Lower 32 bits of size */
+    fadump_rgn_sizes[0][2] = cpu_to_be32(fadump_cpu_state_size & 0xffffffff);
+
+    /* Add device tree properties required from platform for fadump */
+    _FDT((fdt_setprop(fdt, rtas, "ibm,configure-kernel-dump-version",
+                    fadump_versions, sizeof(fadump_versions))));
+    _FDT((fdt_setprop(fdt, rtas, "ibm,configure-kernel-dump-sizes",
+                    fadump_rgn_sizes, sizeof(fadump_rgn_sizes))));
+
+    if (is_next_boot_fadump) {
+        struct rtas_fadump_mem_struct *fdm =
+            &fadump_metadata.registered_fdm;
+
+        uint64_t fdm_size =
+            sizeof(struct rtas_fadump_section_header) +
+            (be16_to_cpu(fdm->header.dump_num_sections) *
+            sizeof(struct rtas_fadump_section));
+
+        _FDT((fdt_setprop(fdt, rtas, "ibm,kernel-dump", fdm, fdm_size)));
+    }
     spapr_dt_rtas_tokens(fdt, rtas);
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 0e8002bad9e0..fa63008e57ec 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -928,6 +928,8 @@ static inline uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
     return val;
 }
 
+extern bool is_next_boot_fadump;
+
 struct fadump_metadata {
     bool fadump_registered;
     bool fadump_dump_active;
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/6] hw/ppc: Enable Fadump for PSeries
  2025-02-17  7:17 [PATCH 0/6] Implement Firmware Assisted Dump for PSeries Aditya Gupta
                   ` (4 preceding siblings ...)
  2025-02-17  7:17 ` [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump Aditya Gupta
@ 2025-02-17  7:17 ` Aditya Gupta
  2025-02-27  3:33   ` Nicholas Piggin
  5 siblings, 1 reply; 29+ messages in thread
From: Aditya Gupta @ 2025-02-17  7:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza,
	Harsh Prateek Bora, Sourabh Jain, Mahesh J Salgaonkar,
	Hari Bathini

With all support in place, enable fadump by exporting the
"ibm,configure-kernel-dump" RTAS call in the device tree.

Presence of "ibm,configure-kernel-dump" tells the kernel that the
platform (QEMU) supports fadump.

Pass "fadump=on" to enable Linux to use firmware assisted dump.

Logs of a linux boot with firmware assisted dump:

    ./build/qemu-system-ppc64 -M pseries,x-vof=on --cpu power10 --smp 4 -m 4G -kernel some-vmlinux -initrd some-initrd -append "debug fadump=on crashkernel=1G" -nographic
    [    0.000000] random: crng init done
    [    0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 4096MB)
    ...
    [    1.084686] rtas fadump: Registration is successful!
    ...
    # cat /sys/kernel/debug/powerpc/fadump_region
    CPU :[0x00000040000000-0x000000400013d3] 0x13d4 bytes, Dumped: 0x0
    HPTE:[0x000000400013d4-0x000000400013d3] 0x0 bytes, Dumped: 0x0
    DUMP: Src: 0x00000000000000, Dest: 0x00000040010000, Size: 0x40000000, Dumped: 0x0 bytes

    [0x000000fffff800-0x000000ffffffff]: cmdline append: ''
    # echo c > /proc/sysrq-trigger

The fadump boot after crash:

    [    0.000000] rtas fadump: Firmware-assisted dump is active.
    [    0.000000] fadump: Updated cmdline: debug fadump=on crashkernel=1G
    [    0.000000] fadump: Firmware-assisted dump is active.
    [    0.000000] fadump: Reserving 3072MB of memory at 0x00000040000000 for preserving crash data
    ....
    # file /proc/vmcore
    /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style

Analysing the vmcore with crash-utility:

          KERNEL: vmlinux-6.14-rc2
        DUMPFILE: vmcore-a64dcfb451e2-nocma
            CPUS: 4
            DATE: Thu Jan  1 05:30:00 IST 1970
          UPTIME: 00:00:30
    LOAD AVERAGE: 0.74, 0.21, 0.07
           TASKS: 94
        NODENAME: buildroot
         RELEASE: 6.14.0-rc2+
         VERSION: #1 SMP Wed Feb 12 06:49:59 CST 2025
         MACHINE: ppc64le  (1000 Mhz)
          MEMORY: 4 GB
           PANIC: "Kernel panic - not syncing: sysrq triggered crash"
             PID: 270
         COMMAND: "sh"
            TASK: c000000009e7cc00  [THREAD_INFO: c000000009e7cc00]
             CPU: 3
           STATE: TASK_RUNNING (PANIC)

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
---
 hw/ppc/spapr_rtas.c    | 6 +++++-
 include/hw/ppc/spapr.h | 3 ++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 0aca4270aee8..bd2ed16a46e3 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -692,7 +692,7 @@ static void trigger_fadump_boot(target_ulong spapr_retcode)
 }
 
 /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
-static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
+static void rtas_configure_kernel_dump(PowerPCCPU *cpu,
                                    SpaprMachineState *spapr,
                                    uint32_t token, uint32_t nargs,
                                    target_ulong args,
@@ -1109,6 +1109,10 @@ static void core_rtas_register_types(void)
     spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
                         rtas_ibm_nmi_interlock);
 
+    /* Register Fadump rtas call */
+    spapr_rtas_register(RTAS_CONFIGURE_KERNEL_DUMP, "ibm,configure-kernel-dump",
+                        rtas_configure_kernel_dump);
+
     qtest_set_command_cb(spapr_qtest_callback);
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index fa63008e57ec..bde3bdc4b80c 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -768,8 +768,9 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
 #define RTAS_IBM_SUSPEND_ME                     (RTAS_TOKEN_BASE + 0x2A)
 #define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2B)
 #define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2C)
+#define RTAS_CONFIGURE_KERNEL_DUMP              (RTAS_TOKEN_BASE + 0x2D)
 
-#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
+#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2E)
 
 /* Fadump commands */
 #define FADUMP_CMD_REGISTER            1
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries
  2025-02-17  7:17 ` [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries Aditya Gupta
@ 2025-02-27  3:07   ` Nicholas Piggin
  2025-02-27  6:49     ` Aditya Gupta
  2025-03-04  9:01   ` Harsh Prateek Bora
  1 sibling, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2025-02-27  3:07 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
> Implement the handler for "ibm,configure-kernel-dump" rtas call in QEMU.
>
> Currently the handler just does basic checks and handles
> register/unregister/invalidate requests from kernel.
>
> Fadump will be enabled in a later patch.
>
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c    | 99 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h | 59 +++++++++++++++++++++++++
>  2 files changed, 158 insertions(+)
>
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index df2e837632aa..eebdf13b1552 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -341,6 +341,105 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>      rtas_st(rets, 0, ret);
>  }
>  
> +struct fadump_metadata fadump_metadata;

Can this (and other globals added in other patches) come under
SpaprMachineState?

And could most of the fadump code and structures go under new
spapr_fadump.[ch] files?

> +
> +/* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
> +static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)

I don't know about adding a new unused function like this, is there
a way to juggle patches around to add it when it's wired up?

> +{
> +    struct rtas_fadump_section_header header;
> +    target_ulong cmd = rtas_ld(args, 0);
> +    target_ulong fdm_addr = rtas_ld(args, 1);
> +    target_ulong fdm_size = rtas_ld(args, 2);
> +
> +    /* Number outputs has to be 1 */
> +    if (nret != 1) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                "FADUMP: ibm,configure-kernel-dump RTAS called with nret != 1.\n");
> +        return;
> +    }
> +
> +    /* Number inputs has to be 3 */
> +    if (nargs != 3) {
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +        return;
> +    }
> +
> +    switch (cmd) {
> +    case FADUMP_CMD_REGISTER:
> +        if (fadump_metadata.fadump_registered) {
> +            /* Fadump already registered */
> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ALREADY_REGISTERED);
> +            return;
> +        }
> +
> +        if (fadump_metadata.fadump_dump_active == 1) {
> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
> +            return;
> +        }
> +
> +        if (fdm_size < sizeof(struct rtas_fadump_section_header)) {
> +            qemu_log_mask(LOG_GUEST_ERROR,
> +                "FADUMP: Header size is invalid: %lu\n", fdm_size);
> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +            return;
> +        }
> +
> +        /* XXX: Can we ensure fdm_addr points to a valid RMR-memory buffer ? */

RMR memory? There is spapr_rma_size() if that's what you need?

Thanks,
Nick


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered
  2025-02-17  7:17 ` [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered Aditya Gupta
@ 2025-02-27  3:14   ` Nicholas Piggin
  2025-02-27  6:56     ` Aditya Gupta
  2025-03-04  9:21   ` Harsh Prateek Bora
  1 sibling, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2025-02-27  3:14 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
> According to PAPR:
>
>     R1–7.3.30–3. When the platform receives an ibm,os-term RTAS call, or
>     on a system reset without an ibm,nmi-interlock RTAS call, if the
>     platform has a dump structure registered through the
>     ibm,configure-kernel-dump call, the platform must process each
>     registered kernel dump section as required and, when available,
>     present the dump structure information to the operating system
>     through the “ibm,kernel-dump” property, updated with status for each
>     dump section, until the dump has been invalidated through the
>     ibm,configure-kernel-dump RTAS call.
>
> If Fadump has been registered, trigger an Fadump boot (memory preserving
> boot), if QEMU recieves a 'ibm,os-term' rtas call.
>
> Implementing the fadump boot as:
>     * pause all vcpus (will save registers later)
>     * preserve memory regions specified by fadump
>     * do a memory preserving reboot (GUEST_RESET in QEMU doesn't clear
>       the memory)
>
> Memory regions registered by fadump will be handled in a later patch.
>
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index eebdf13b1552..01c82375f03d 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -342,6 +342,43 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>  }
>  
>  struct fadump_metadata fadump_metadata;
> +bool is_next_boot_fadump;

Here's another one for spapr state.

> +
> +static void trigger_fadump_boot(target_ulong spapr_retcode)
> +{
> +    /*
> +     * In PowerNV, SBE stops all clocks for cores, do similar to it
> +     * QEMU's nearest equivalent is 'pause_all_vcpus'
> +     * See 'stopClocksS0' in SBE source code for more info on SBE part
> +     */

Can probably remove this comment here.

> +    pause_all_vcpus();
> +
> +    if (true /* TODO: Preserve memory registered for fadump */) {

If you're adding half the code to preserve memory but never actually
calling it anyway, you don't need the pause_all_vcpus() call either.

Again I would rather not adding unused code to the patches if possible.
If you're really not able to find a nice way to split and add
incrementally then okay, but try to take another look if possible.


> +        /* Failed to preserve the registered memory regions */
> +        rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
> +
> +        /* Cause a reboot */
> +        qemu_system_guest_panicked(NULL);
> +        return;
> +    }
> +
> +    /* Mark next boot as fadump boot */
> +    is_next_boot_fadump = true;
> +
> +    /* Reset fadump_registered for next boot */
> +    fadump_metadata.fadump_registered = false;
> +    fadump_metadata.fadump_dump_active = true;
> +
> +    /* Then do a guest reset */
> +    /*
> +     * Requirement:
> +     * This guest reset should not clear the memory (which is
> +     * the case when this is merged)
> +     */
> +    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);

Seems reasonable. What is the actual mechanism that clears the machine
RAM anyway? I'm not able to find it...

Thanks,
Nick

> +
> +    rtas_st(spapr_retcode, 0, RTAS_OUT_SUCCESS);
> +}
>  
>  /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>  static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
> @@ -449,6 +486,11 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>      target_ulong msgaddr = rtas_ld(args, 0);
>      char msg[512];
>  
> +    if (fadump_metadata.fadump_registered) {
> +        /* If fadump boot works, control won't come back here */
> +        return trigger_fadump_boot(rets);
> +    }
> +
>      cpu_physical_memory_read(msgaddr, msg, sizeof(msg) - 1);
>      msg[sizeof(msg) - 1] = 0;
>  



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump
  2025-02-17  7:17 ` [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump Aditya Gupta
@ 2025-02-27  3:27   ` Nicholas Piggin
  2025-02-27  7:01     ` Aditya Gupta
  2025-03-05  7:23   ` Harsh Prateek Bora
  1 sibling, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2025-02-27  3:27 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
> Kernel expects CPU states/register states in the format mentioned in
> "Register Save Area" in PAPR.
>
> The platform (in our case, QEMU) saves each CPU register in the form of
> an array of "register entries", the start and end of this array is
> signified by "CPUSTRT" and "CPUEND" register entries respectively.
>
> The CPUSTRT and CPUEND register entry also has 4-byte logical CPU ID,
> thus storing the CPU ID corresponding to the array of register entries.
>
> Each register, and CPUSTRT, CPUEND has a predefined identifier.
> Implement calculating identifier for a given register in
> 'fadump_str_to_u64', which has been taken from the linux kernel
>
> Similarly GPRs also have predefined identifiers, and a corresponding
> 64-bit resiter value (split into two 32-bit cells). Implement
> calculation of GPR identifiers with 'fadump_gpr_id_to_u64'
>
> PAPR has restrictions on particular order of few registers, and is
> free to be in any order for other registers.
> Some registers mentioned in PAPR have not been exported as they are not
> implemented in QEMU / don't make sense in QEMU.
>
> Implement saving of CPU state according to the PAPR document
>
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>  hw/ppc/spapr_rtas.c    | 200 ++++++++++++++++++++++++++++++++++++++++-
>  include/hw/ppc/spapr.h |  83 +++++++++++++++++
>  2 files changed, 281 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 9b29cadab2c9..0aca4270aee8 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -348,9 +348,12 @@ bool is_next_boot_fadump;
>  static bool fadump_preserve_mem(void)
>  {
>      struct rtas_fadump_mem_struct *fdm = &fadump_metadata.registered_fdm;
> +    struct rtas_fadump_section *cpu_state_region;
>      uint64_t next_section_addr;
>      int dump_num_sections, data_type;
>      uint64_t src_addr, src_len, dest_addr;
> +    uint64_t cpu_state_addr, cpu_state_len = 0;
> +    void *cpu_state_buffer;
>      void *copy_buffer;
>  
>      assert(fadump_metadata.fadump_registered);
> @@ -413,9 +416,174 @@ static bool fadump_preserve_mem(void)
>          }
>  
>          switch (data_type) {
> -        case FADUMP_CPU_STATE_DATA:
> -            /* TODO: Add CPU state data */
> +        case FADUMP_CPU_STATE_DATA: {

I would split these out into their own functions if they grow more than
a few lines.

> +            struct rtas_fadump_reg_save_area_header reg_save_hdr;
> +            struct rtas_fadump_reg_entry **reg_entries;
> +            struct rtas_fadump_reg_entry *curr_reg_entry;
> +
> +            uint32_t fadump_reg_entries_size;
> +            __be32 num_cpus = 0;
> +            uint32_t num_regs_per_cpu = 0;
> +            CPUState *cpu;
> +            CPUPPCState *env;
> +            PowerPCCPU *ppc_cpu;
> +
> +            CPU_FOREACH(cpu) {
> +                ++num_cpus;
> +            }
> +
> +            reg_save_hdr.version = cpu_to_be32(1);
> +            reg_save_hdr.magic_number =
> +                cpu_to_be64(fadump_str_to_u64("REGSAVE"));
> +
> +            /* Reg save area header is immediately followed by num cpus */
> +            reg_save_hdr.num_cpu_offset =
> +                cpu_to_be32(sizeof(struct rtas_fadump_reg_save_area_header));
> +
> +            fadump_reg_entries_size = num_cpus *
> +                                      FADUMP_NUM_PER_CPU_REGS *
> +                                      sizeof(struct rtas_fadump_reg_entry);
> +
> +            reg_entries = malloc(fadump_reg_entries_size);
> +            curr_reg_entry = (struct rtas_fadump_reg_entry *)reg_entries;
> +
> +            /* This must loop num_cpus time */
> +            CPU_FOREACH(cpu) {
> +                ppc_cpu = POWERPC_CPU(cpu);
> +                env = cpu_env(cpu);
> +                num_regs_per_cpu = 0;
> +
> +                curr_reg_entry->reg_id =
> +                    cpu_to_be64(fadump_str_to_u64("CPUSTRT"));
> +                curr_reg_entry->reg_value = ppc_cpu->vcpu_id;
> +                ++curr_reg_entry;
> +
> +#define REG_ENTRY(id, val)                                     \
> +                do {                                           \
> +                    curr_reg_entry->reg_id =                   \
> +                        cpu_to_be64(fadump_str_to_u64(#id));   \
> +                    curr_reg_entry->reg_value = val;           \
> +                    ++curr_reg_entry;                          \
> +                    ++num_regs_per_cpu;                        \
> +                } while (0)
> +
> +                REG_ENTRY(ACOP, env->spr[SPR_ACOP]);
> +                REG_ENTRY(AMR, env->spr[SPR_AMR]);
> +                REG_ENTRY(BESCR, env->spr[SPR_BESCR]);
> +                REG_ENTRY(CFAR, env->spr[SPR_CFAR]);
> +                REG_ENTRY(CIABR, env->spr[SPR_CIABR]);
> +
> +                /* Save the condition register */
> +                uint64_t cr = 0;
> +                cr |= (env->crf[0] & 0xf);
> +                cr |= (env->crf[1] & 0xf) << 1;
> +                cr |= (env->crf[2] & 0xf) << 2;
> +                cr |= (env->crf[3] & 0xf) << 3;
> +                cr |= (env->crf[4] & 0xf) << 4;
> +                cr |= (env->crf[5] & 0xf) << 5;
> +                cr |= (env->crf[6] & 0xf) << 6;
> +                cr |= (env->crf[7] & 0xf) << 7;

Shift values wrong here I think... Use ppc_get_cr()

> +                REG_ENTRY(CR, cr);
> +
> +                REG_ENTRY(CTR, env->spr[SPR_CTR]);
> +                REG_ENTRY(CTRL, env->spr[SPR_CTRL]);
> +                REG_ENTRY(DABR, env->spr[SPR_DABR]);
> +                REG_ENTRY(DABRX, env->spr[SPR_DABRX]);
> +                REG_ENTRY(DAR, env->spr[SPR_DAR]);
> +                REG_ENTRY(DAWR0, env->spr[SPR_DAWR0]);
> +                REG_ENTRY(DAWR1, env->spr[SPR_DAWR1]);
> +                REG_ENTRY(DAWRX0, env->spr[SPR_DAWRX0]);
> +                REG_ENTRY(DAWRX1, env->spr[SPR_DAWRX1]);
> +                REG_ENTRY(DPDES, env->spr[SPR_DPDES]);
> +                REG_ENTRY(DSCR, env->spr[SPR_DSCR]);
> +                REG_ENTRY(DSISR, env->spr[SPR_DSISR]);
> +                REG_ENTRY(EBBHR, env->spr[SPR_EBBHR]);
> +                REG_ENTRY(EBBRR, env->spr[SPR_EBBRR]);
> +
> +                REG_ENTRY(FPSCR, env->fpscr);
> +                REG_ENTRY(FSCR, env->spr[SPR_FSCR]);
> +
> +                /* Save the GPRs */
> +                for (int gpr_id = 0; gpr_id < 32; ++gpr_id) {
> +                    curr_reg_entry->reg_id =
> +                        cpu_to_be64(fadump_gpr_id_to_u64(gpr_id));
> +                    curr_reg_entry->reg_value = env->gpr[i];
> +                    ++curr_reg_entry;
> +                    ++num_regs_per_cpu;
> +                }
> +
> +                REG_ENTRY(IAMR, env->spr[SPR_IAMR]);
> +                REG_ENTRY(IC, env->spr[SPR_IC]);
> +                REG_ENTRY(LR, env->spr[SPR_LR]);
> +
> +                REG_ENTRY(MSR, env->msr);
> +                REG_ENTRY(NIA, env->nip);   /* NIA */
> +                REG_ENTRY(PIR, env->spr[SPR_PIR]);
> +                REG_ENTRY(PSPB, env->spr[SPR_PSPB]);
> +                REG_ENTRY(PVR, env->spr[SPR_PVR]);
> +                REG_ENTRY(RPR, env->spr[SPR_RPR]);
> +                REG_ENTRY(SPURR, env->spr[SPR_SPURR]);
> +                REG_ENTRY(SRR0, env->spr[SPR_SRR0]);
> +                REG_ENTRY(SRR1, env->spr[SPR_SRR1]);
> +                REG_ENTRY(TAR, env->spr[SPR_TAR]);
> +                REG_ENTRY(TEXASR, env->spr[SPR_TEXASR]);
> +                REG_ENTRY(TFHAR, env->spr[SPR_TFHAR]);
> +                REG_ENTRY(TFIAR, env->spr[SPR_TFIAR]);
> +                REG_ENTRY(TIR, env->spr[SPR_TIR]);
> +                REG_ENTRY(UAMOR, env->spr[SPR_UAMOR]);
> +                REG_ENTRY(VRSAVE, env->spr[SPR_VRSAVE]);
> +                REG_ENTRY(VSCR, env->vscr);
> +                REG_ENTRY(VTB, env->spr[SPR_VTB]);
> +                REG_ENTRY(WORT, env->spr[SPR_WORT]);
> +                REG_ENTRY(XER, env->spr[SPR_XER]);
> +
> +                /*
> +                 * Ignoring transaction checkpoint and few other registers
> +                 * mentioned in PAPR as not supported in QEMU
> +                 */
> +#undef REG_ENTRY
> +
> +                /* End the registers for this CPU with "CPUEND" reg entry */
> +                curr_reg_entry->reg_id =
> +                    cpu_to_be64(fadump_str_to_u64("CPUEND"));
> +
> +                /* Ensure the number of registers match (+2 for STRT & END) */
> +                assert(FADUMP_NUM_PER_CPU_REGS == num_regs_per_cpu + 2);
> +
> +                ++curr_reg_entry;
> +            }
> +
> +            cpu_state_len = 0;
> +            cpu_state_len += sizeof(reg_save_hdr);     /* reg save header */
> +            cpu_state_len += sizeof(__be32);           /* num_cpus */
> +            cpu_state_len += fadump_reg_entries_size;  /* reg entries */
> +
> +            cpu_state_region = &fdm->rgn[i];
> +            cpu_state_addr = dest_addr;
> +            cpu_state_buffer = g_malloc(cpu_state_len);
> +
> +            uint64_t offset = 0;
> +            memcpy(cpu_state_buffer + offset,
> +                    &reg_save_hdr, sizeof(reg_save_hdr));
> +            offset += sizeof(reg_save_hdr);
> +
> +            /* Write num_cpus */
> +            num_cpus = cpu_to_be32(num_cpus);
> +            memcpy(cpu_state_buffer + offset, &num_cpus, sizeof(__be32));
> +            offset += sizeof(__be32);
> +
> +            /* Write the register entries */
> +            memcpy(cpu_state_buffer + offset,
> +                    reg_entries, fadump_reg_entries_size);
> +            offset += fadump_reg_entries_size;
> +
> +            /*
> +             * We will write the cpu state data later, as otherwise it
> +             * might get overwritten by other fadump regions
> +             */
> +
>              break;
> +        }
>          case FADUMP_HPTE_REGION:
>              /* TODO: Add hpte state data */
>              break;
> @@ -455,6 +623,34 @@ static bool fadump_preserve_mem(void)
>          }
>      }
>  
> +    /*
> +     * Write the Register Save Area
> +     *
> +     * CPU State/Register Save Area should be written after dumping the
> +     * memory to prevent overwritting while saving other memory regions
> +     *
> +     * eg. If boot memory region is 1G, then both the first 1GB memory, and
> +     * the Register Save Area needs to be saved at 1GB.
> +     * And as the CPU_STATE_DATA region comes first than the
> +     * REAL_MODE_REGION region to be copied, the CPU_STATE_DATA will get
> +     * overwritten if saved before the 0GB - 1GB region is copied after
> +     * saving CPU state data
> +     */
> +    cpu_physical_memory_write(cpu_state_addr, cpu_state_buffer, cpu_state_len);

Check docs/devel/loads-stores.rst, address_space_* is preferred to check
for failures. It also says devices should operate on their own address
spaces and that doesn't really apply to spapr since the "virtual
hypervisor" doesn't really fit the model of a device...

Perhaps look at h_enter_nested which uses CPU(cpu)->as.

> +    g_free(cpu_state_buffer);
> +
> +    /*
> +     * Set bytes_dumped in cpu state region, so kernel knows platform have
> +     * exported it
> +     */
> +    cpu_state_region->bytes_dumped = cpu_to_be64(cpu_state_len);
> +
> +    if (cpu_state_region->source_len != cpu_state_region->bytes_dumped) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                "CPU State region's length passed by kernel, doesn't match"
> +                " with CPU State region length exported by QEMU");
> +    }
> +
>      return true;
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a80704187583..0e8002bad9e0 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -792,6 +792,9 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>  #define FADUMP_HPTE_REGION      0x0002
>  #define FADUMP_REAL_MODE_REGION 0x0011
>  
> +/* Number of registers stored per cpu */
> +#define FADUMP_NUM_PER_CPU_REGS (32 /*GPR*/ + 45 /*others*/ + 2 /*STRT & END*/)
> +
>  /* OS defined sections */
>  #define FADUMP_PARAM_AREA       0x0100
>  
> @@ -845,6 +848,86 @@ struct rtas_fadump_mem_struct {
>      struct rtas_fadump_section        rgn[FADUMP_MAX_SECTIONS];
>  };
>  
> +/*
> + * The firmware-assisted dump format.
> + *
> + * The register save area is an area in the partition's memory used to preserve
> + * the register contents (CPU state data) for the active CPUs during a firmware
> + * assisted dump. The dump format contains register save area header followed
> + * by register entries. Each list of registers for a CPU starts with "CPUSTRT"
> + * and ends with "CPUEND".
> + */
> +
> +/* Register save area header. */
> +struct rtas_fadump_reg_save_area_header {
> +    __be64    magic_number;
> +    __be32    version;
> +    __be32    num_cpu_offset;
> +};
> +
> +/* Register entry. */
> +struct rtas_fadump_reg_entry {
> +    __be64    reg_id;
> +    __be64    reg_value;
> +};
> +
> +/*
> + * Copy the ascii values for first 8 characters from a string into u64
> + * variable at their respective indexes.
> + * e.g.
> + *  The string "FADMPINF" will be converted into 0x4641444d50494e46
> + */
> +static inline uint64_t fadump_str_to_u64(const char *str)
> +{
> +    uint64_t val = 0;
> +    int i;
> +
> +    for (i = 0; i < sizeof(val); i++) {
> +        val = (*str) ? (val << 8) | *str++ : val << 8;
> +    }
> +    return val;
> +}
> +
> +/**
> + * Get the identifier id for register entries of GPRs
> + *
> + * It gives the same id as 'fadump_str_to_u64' when the complete string id
> + * of the GPR is given, ie.
> + *
> + *   fadump_str_to_u64("GPR05") == fadump_gpr_id_to_u64(5);
> + *   fadump_str_to_u64("GPR12") == fadump_gpr_id_to_u64(12);
> + *
> + * And so on. Hence this can be implemented by creating a dynamic
> + * string for each GPR, such as "GPR00", "GPR01", ... "GPR31"
> + * Instead of allocating a string, an observation from the math of
> + * 'fadump_str_to_u64' or from PAPR tells us that there's a pattern
> + * in the identifier IDs, such that the first 8 bytes are affected only by
> + * whether it is GPR0*, GPR1*, GPR2*, GPR3*. 9th byte is always 0x3. And
> + * the the 10th byte is the index of the GPR modulo 10.
> + */
> +static inline uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
> +{
> +    uint64_t val = 0;
> +
> +    /* Valid range of GPR id is only GPR0 to GPR31 */
> +    assert(gpr_id < 32);
> +
> +    if (gpr_id <= 9) {
> +        val = fadump_str_to_u64("GPR0");
> +    } else if (gpr_id <= 19) {
> +        val = fadump_str_to_u64("GPR1");
> +    } else if (gpr_id <= 29) {
> +        val = fadump_str_to_u64("GPR2");
> +    } else {
> +        val = fadump_str_to_u64("GPR3");
> +    }
> +
> +    val |= 0x30000000;
> +    val |= ((gpr_id % 10) << 12);
> +
> +    return val;
> +}

These two functions could probably go out of line, I doubt they
are performance critical and make them static if not used outside
the file.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump
  2025-02-17  7:17 ` [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump Aditya Gupta
@ 2025-02-27  3:28   ` Nicholas Piggin
  2025-02-27  7:02     ` Aditya Gupta
  2025-03-05  7:34     ` Harsh Prateek Bora
  0 siblings, 2 replies; 29+ messages in thread
From: Nicholas Piggin @ 2025-02-27  3:28 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
> Platform (ie. QEMU) is expected to pass few device tree properties for
> details for fadump:
>
>   * "ibm,configure-kernel-dump": RTAS call for fadump
>   * "ibm,configure-kernel-dump-sizes": Space required to store dump data
>     for firmware provided dump sections (ie. CPU & HPTE regions)
>   * "ibm,configure-kernel-dump-version": Versions of fadump supported
>   * "ibm,kernel-dump": Contains the Fadump Memory Structure on a fadump
>     boot
>
> Implement passing configure-kernel-dump-sizes, and
> configure-kernel-dump-version device tree properties, irrespective of
> whether it's a fadump boot or not, so that kernel can reserve memory to
> store the firmware provided dump sections in case of a crash
>
> Also, in case of a fadump boot, pass the fadump memory structure to the
> kernel in "ibm,kernel-dump" device tree property.
>
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>  hw/ppc/spapr.c         | 62 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 64 insertions(+)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f3a4b4235d43..3602e5b5d18d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -897,9 +897,27 @@ static int spapr_dt_rng(void *fdt)
>  static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>  {

You might be able to add a spapr_dt_rtas_fadump() function
and do it there to help keep functions small?

Thanks,
Nick

>      MachineState *ms = MACHINE(spapr);
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>      int rtas;
>      GString *hypertas = g_string_sized_new(256);
>      GString *qemu_hypertas = g_string_sized_new(256);
> +    uint32_t max_possible_cpus = mc->possible_cpu_arch_ids(ms)->len;
> +    uint64_t fadump_cpu_state_size = 0;
> +    uint16_t fadump_versions[2] = {
> +        FADUMP_VERSION /* min supported version */,
> +        FADUMP_VERSION /* max supported version */
> +    };
> +    uint32_t fadump_rgn_sizes[2][3] = {
> +        {
> +            cpu_to_be32(FADUMP_CPU_STATE_DATA),
> +            0, 0 /* Calculated later */
> +        },
> +        {
> +            cpu_to_be32(FADUMP_HPTE_REGION),
> +            0, 0 /* HPTE region not implemented */
> +        }
> +    };
> +
>      uint32_t lrdr_capacity[] = {
>          0,
>          0,
> @@ -1006,6 +1024,50 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>      _FDT(fdt_setprop(fdt, rtas, "ibm,lrdr-capacity",
>                       lrdr_capacity, sizeof(lrdr_capacity)));
>  
> +    /*
> +     * CPU State Data contains multiple fields such as header, num_cpus and
> +     * register entries
> +     *
> +     * Calculate the maximum CPU State Data size, according to maximum
> +     * possible CPUs the QEMU VM can have
> +     */
> +    /* Reg save header */
> +    fadump_cpu_state_size += sizeof(struct rtas_fadump_reg_save_area_header);
> +
> +    /* Num_cpus */
> +    fadump_cpu_state_size += sizeof(__be32);
> +
> +    /* Register Entries */
> +    fadump_cpu_state_size += max_possible_cpus   *
> +                             FADUMP_NUM_PER_CPU_REGS *
> +                             sizeof(struct rtas_fadump_reg_entry);
> +
> +    /* Set maximum size for CPU state data region */
> +    assert(fadump_rgn_sizes[0][0] == cpu_to_be32(FADUMP_CPU_STATE_DATA));
> +
> +    /* Upper 32 bits of size, usually 0 */
> +    fadump_rgn_sizes[0][1] = cpu_to_be32(fadump_cpu_state_size >> 32);
> +
> +    /* Lower 32 bits of size */
> +    fadump_rgn_sizes[0][2] = cpu_to_be32(fadump_cpu_state_size & 0xffffffff);
> +
> +    /* Add device tree properties required from platform for fadump */
> +    _FDT((fdt_setprop(fdt, rtas, "ibm,configure-kernel-dump-version",
> +                    fadump_versions, sizeof(fadump_versions))));
> +    _FDT((fdt_setprop(fdt, rtas, "ibm,configure-kernel-dump-sizes",
> +                    fadump_rgn_sizes, sizeof(fadump_rgn_sizes))));
> +
> +    if (is_next_boot_fadump) {
> +        struct rtas_fadump_mem_struct *fdm =
> +            &fadump_metadata.registered_fdm;
> +
> +        uint64_t fdm_size =
> +            sizeof(struct rtas_fadump_section_header) +
> +            (be16_to_cpu(fdm->header.dump_num_sections) *
> +            sizeof(struct rtas_fadump_section));
> +
> +        _FDT((fdt_setprop(fdt, rtas, "ibm,kernel-dump", fdm, fdm_size)));
> +    }
>      spapr_dt_rtas_tokens(fdt, rtas);
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 0e8002bad9e0..fa63008e57ec 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -928,6 +928,8 @@ static inline uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
>      return val;
>  }
>  
> +extern bool is_next_boot_fadump;
> +
>  struct fadump_metadata {
>      bool fadump_registered;
>      bool fadump_dump_active;



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] hw/ppc: Enable Fadump for PSeries
  2025-02-17  7:17 ` [PATCH 6/6] hw/ppc: Enable Fadump for PSeries Aditya Gupta
@ 2025-02-27  3:33   ` Nicholas Piggin
  2025-02-27  7:07     ` Aditya Gupta
  0 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2025-02-27  3:33 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
> With all support in place, enable fadump by exporting the
> "ibm,configure-kernel-dump" RTAS call in the device tree.
>
> Presence of "ibm,configure-kernel-dump" tells the kernel that the
> platform (QEMU) supports fadump.
>
> Pass "fadump=on" to enable Linux to use firmware assisted dump.
>
> Logs of a linux boot with firmware assisted dump:
>
>     ./build/qemu-system-ppc64 -M pseries,x-vof=on --cpu power10 --smp 4 -m 4G -kernel some-vmlinux -initrd some-initrd -append "debug fadump=on crashkernel=1G" -nographic
>     [    0.000000] random: crng init done
>     [    0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 4096MB)
>     ...
>     [    1.084686] rtas fadump: Registration is successful!
>     ...
>     # cat /sys/kernel/debug/powerpc/fadump_region
>     CPU :[0x00000040000000-0x000000400013d3] 0x13d4 bytes, Dumped: 0x0
>     HPTE:[0x000000400013d4-0x000000400013d3] 0x0 bytes, Dumped: 0x0
>     DUMP: Src: 0x00000000000000, Dest: 0x00000040010000, Size: 0x40000000, Dumped: 0x0 bytes
>
>     [0x000000fffff800-0x000000ffffffff]: cmdline append: ''
>     # echo c > /proc/sysrq-trigger
>
> The fadump boot after crash:
>
>     [    0.000000] rtas fadump: Firmware-assisted dump is active.
>     [    0.000000] fadump: Updated cmdline: debug fadump=on crashkernel=1G
>     [    0.000000] fadump: Firmware-assisted dump is active.
>     [    0.000000] fadump: Reserving 3072MB of memory at 0x00000040000000 for preserving crash data
>     ....
>     # file /proc/vmcore
>     /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style
>
> Analysing the vmcore with crash-utility:
>
>           KERNEL: vmlinux-6.14-rc2
>         DUMPFILE: vmcore-a64dcfb451e2-nocma
>             CPUS: 4
>             DATE: Thu Jan  1 05:30:00 IST 1970
>           UPTIME: 00:00:30
>     LOAD AVERAGE: 0.74, 0.21, 0.07
>            TASKS: 94
>         NODENAME: buildroot
>          RELEASE: 6.14.0-rc2+
>          VERSION: #1 SMP Wed Feb 12 06:49:59 CST 2025
>          MACHINE: ppc64le  (1000 Mhz)
>           MEMORY: 4 GB
>            PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>              PID: 270
>          COMMAND: "sh"
>             TASK: c000000009e7cc00  [THREAD_INFO: c000000009e7cc00]
>              CPU: 3
>            STATE: TASK_RUNNING (PANIC)
>
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>

This is very cool, nice work. Does it work with KVM? I think... probably
it could?

Are you able to add a functional test case for it? This is something
that people (including me) will forget to test...

Thanks,
Nick

> ---
>  hw/ppc/spapr_rtas.c    | 6 +++++-
>  include/hw/ppc/spapr.h | 3 ++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 0aca4270aee8..bd2ed16a46e3 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -692,7 +692,7 @@ static void trigger_fadump_boot(target_ulong spapr_retcode)
>  }
>  
>  /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
> -static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
> +static void rtas_configure_kernel_dump(PowerPCCPU *cpu,
>                                     SpaprMachineState *spapr,
>                                     uint32_t token, uint32_t nargs,
>                                     target_ulong args,
> @@ -1109,6 +1109,10 @@ static void core_rtas_register_types(void)
>      spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>                          rtas_ibm_nmi_interlock);
>  
> +    /* Register Fadump rtas call */
> +    spapr_rtas_register(RTAS_CONFIGURE_KERNEL_DUMP, "ibm,configure-kernel-dump",
> +                        rtas_configure_kernel_dump);
> +
>      qtest_set_command_cb(spapr_qtest_callback);
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index fa63008e57ec..bde3bdc4b80c 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -768,8 +768,9 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>  #define RTAS_IBM_SUSPEND_ME                     (RTAS_TOKEN_BASE + 0x2A)
>  #define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2B)
>  #define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2C)
> +#define RTAS_CONFIGURE_KERNEL_DUMP              (RTAS_TOKEN_BASE + 0x2D)
>  
> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2E)
>  
>  /* Fadump commands */
>  #define FADUMP_CMD_REGISTER            1



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries
  2025-02-27  3:07   ` Nicholas Piggin
@ 2025-02-27  6:49     ` Aditya Gupta
  2025-02-27  8:48       ` Nicholas Piggin
  0 siblings, 1 reply; 29+ messages in thread
From: Aditya Gupta @ 2025-02-27  6:49 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

Hi Nick,

On 27/02/25 08:37, Nicholas Piggin wrote:
> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>> Implement the handler for "ibm,configure-kernel-dump" rtas call in QEMU.
>>
>> Currently the handler just does basic checks and handles
>> register/unregister/invalidate requests from kernel.
>>
>> Fadump will be enabled in a later patch.
>>
>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_rtas.c    | 99 ++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h | 59 +++++++++++++++++++++++++
>>   2 files changed, 158 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index df2e837632aa..eebdf13b1552 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -341,6 +341,105 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>>       rtas_st(rets, 0, ret);
>>   }
>>   
>> +struct fadump_metadata fadump_metadata;
> Can this (and other globals added in other patches) come under
> SpaprMachineState?
>
> And could most of the fadump code and structures go under new
> spapr_fadump.[ch] files?
Yes, i can move it inside SpaprMachineState. Will put the code in new files.
>> +
>> +/* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>> +static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
> I don't know about adding a new unused function like this, is there
> a way to juggle patches around to add it when it's wired up?

Ah, that is problematic agreed. I tried to move around things, but 
arrived at this.

I will spend some time thinking how to arrange this.

Will need some guidance. How should I approach arranging the code in 
such situations ?

My idea was to
* First one is the skeleton: mentions the steps, but doesn't implement 
the steps
* Middle patches implement the steps one by one
* Last patch enables it all. So in future if someone checks out the 
"Enable fadump" commit they would have all the support ready.

The major problem is "everything" remains unused till this last patch. 
But this 1st patch gave me the chance to logically build upon this, eg. 
first implement preserving memory regions, then add the fadump_trigger 
in os-term rtas call, etc.

Any advice to approach this ?

>> +{
>> +    struct rtas_fadump_section_header header;
>> +    target_ulong cmd = rtas_ld(args, 0);
>> +    target_ulong fdm_addr = rtas_ld(args, 1);
>> +    target_ulong fdm_size = rtas_ld(args, 2);
>> +
>> +    /* Number outputs has to be 1 */
>> +    if (nret != 1) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                "FADUMP: ibm,configure-kernel-dump RTAS called with nret != 1.\n");
>> +        return;
>> +    }
>> +
>> +    /* Number inputs has to be 3 */
>> +    if (nargs != 3) {
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +        return;
>> +    }
>> +
>> +    switch (cmd) {
>> +    case FADUMP_CMD_REGISTER:
>> +        if (fadump_metadata.fadump_registered) {
>> +            /* Fadump already registered */
>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ALREADY_REGISTERED);
>> +            return;
>> +        }
>> +
>> +        if (fadump_metadata.fadump_dump_active == 1) {
>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
>> +            return;
>> +        }
>> +
>> +        if (fdm_size < sizeof(struct rtas_fadump_section_header)) {
>> +            qemu_log_mask(LOG_GUEST_ERROR,
>> +                "FADUMP: Header size is invalid: %lu\n", fdm_size);
>> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +            return;
>> +        }
>> +
>> +        /* XXX: Can we ensure fdm_addr points to a valid RMR-memory buffer ? */
> RMR memory? There is spapr_rma_size() if that's what you need?


Thanks, will use `spapr_rma_size`. The PAPR says fdm_addr should point 
to a valid RMR buffer, I guess that means it should be in the RMA, ie. 
`< spapr_rma_size()` ?


- Aditya G

>
> Thanks,
> Nick


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered
  2025-02-27  3:14   ` Nicholas Piggin
@ 2025-02-27  6:56     ` Aditya Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-27  6:56 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On 27/02/25 08:44, Nicholas Piggin wrote:
> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>> According to PAPR:
>>
>>      R1–7.3.30–3. When the platform receives an ibm,os-term RTAS call, or
>>      on a system reset without an ibm,nmi-interlock RTAS call, if the
>>      platform has a dump structure registered through the
>>      ibm,configure-kernel-dump call, the platform must process each
>>      registered kernel dump section as required and, when available,
>>      present the dump structure information to the operating system
>>      through the “ibm,kernel-dump” property, updated with status for each
>>      dump section, until the dump has been invalidated through the
>>      ibm,configure-kernel-dump RTAS call.
>>
>> If Fadump has been registered, trigger an Fadump boot (memory preserving
>> boot), if QEMU recieves a 'ibm,os-term' rtas call.
>>
>> Implementing the fadump boot as:
>>      * pause all vcpus (will save registers later)
>>      * preserve memory regions specified by fadump
>>      * do a memory preserving reboot (GUEST_RESET in QEMU doesn't clear
>>        the memory)
>>
>> Memory regions registered by fadump will be handled in a later patch.
>>
>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_rtas.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index eebdf13b1552..01c82375f03d 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -342,6 +342,43 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>>   }
>>   
>>   struct fadump_metadata fadump_metadata;
>> +bool is_next_boot_fadump;
> Here's another one for spapr state.
Sure, will add.
>> +
>> +static void trigger_fadump_boot(target_ulong spapr_retcode)
>> +{
>> +    /*
>> +     * In PowerNV, SBE stops all clocks for cores, do similar to it
>> +     * QEMU's nearest equivalent is 'pause_all_vcpus'
>> +     * See 'stopClocksS0' in SBE source code for more info on SBE part
>> +     */
> Can probably remove this comment here.
Sure.
>> +    pause_all_vcpus();
>> +
>> +    if (true /* TODO: Preserve memory registered for fadump */) {
> If you're adding half the code to preserve memory but never actually
> calling it anyway, you don't need the pause_all_vcpus() call either.
>
> Again I would rather not adding unused code to the patches if possible.
> If you're really not able to find a nice way to split and add
> incrementally then okay, but try to take another look if possible.
>
Yes all this is unused. Will take another look to see how I can arrange it.
>> +        /* Failed to preserve the registered memory regions */
>> +        rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
>> +
>> +        /* Cause a reboot */
>> +        qemu_system_guest_panicked(NULL);
>> +        return;
>> +    }
>> +
>> +    /* Mark next boot as fadump boot */
>> +    is_next_boot_fadump = true;
>> +
>> +    /* Reset fadump_registered for next boot */
>> +    fadump_metadata.fadump_registered = false;
>> +    fadump_metadata.fadump_dump_active = true;
>> +
>> +    /* Then do a guest reset */
>> +    /*
>> +     * Requirement:
>> +     * This guest reset should not clear the memory (which is
>> +     * the case when this is merged)
>> +     */
>> +    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
> Seems reasonable. What is the actual mechanism that clears the machine
> RAM anyway? I'm not able to find it...

I didn't find any too. There is a devices reset which happens at this 
guest_reset, during which some devices do clear their memory registers, 
eg. 'pnv_psi_reset', since it clears it's regs it's like it's cleared 
it's memory region.

There were few like that which cleared the data they pass in their 
memory regions, but nothing clearing the whole RAM/whole memory regions 
as such.


- Aditya G

>
> Thanks,
> Nick
>
>> +
>> +    rtas_st(spapr_retcode, 0, RTAS_OUT_SUCCESS);
>> +}
>>   
>>   /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>>   static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
>> @@ -449,6 +486,11 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>>       target_ulong msgaddr = rtas_ld(args, 0);
>>       char msg[512];
>>   
>> +    if (fadump_metadata.fadump_registered) {
>> +        /* If fadump boot works, control won't come back here */
>> +        return trigger_fadump_boot(rets);
>> +    }
>> +
>>       cpu_physical_memory_read(msgaddr, msg, sizeof(msg) - 1);
>>       msg[sizeof(msg) - 1] = 0;
>>   


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump
  2025-02-27  3:27   ` Nicholas Piggin
@ 2025-02-27  7:01     ` Aditya Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-27  7:01 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On 27/02/25 08:57, Nicholas Piggin wrote:

> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>> <...snip...>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 9b29cadab2c9..0aca4270aee8 100644
>> <...snip...>
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -413,9 +416,174 @@ static bool fadump_preserve_mem(void)
>>           }
>>   
>>           switch (data_type) {
>> -        case FADUMP_CPU_STATE_DATA:
>> -            /* TODO: Add CPU state data */
>> +        case FADUMP_CPU_STATE_DATA: {
> I would split these out into their own functions if they grow more than
> a few lines.
Makes sense. Will add this into a new helper function.
>
>> +            struct rtas_fadump_reg_save_area_header reg_save_hdr;
>> +            struct rtas_fadump_reg_entry **reg_entries;
>> +            struct rtas_fadump_reg_entry *curr_reg_entry;
>> +
>> +            uint32_t fadump_reg_entries_size;
>> +            __be32 num_cpus = 0;
>> +            uint32_t num_regs_per_cpu = 0;
>> +            CPUState *cpu;
>> +            CPUPPCState *env;
>> +            PowerPCCPU *ppc_cpu;
>> +
>> +            CPU_FOREACH(cpu) {
>> +                ++num_cpus;
>> +            }
>> +
>> +            reg_save_hdr.version = cpu_to_be32(1);
>> +            reg_save_hdr.magic_number =
>> +                cpu_to_be64(fadump_str_to_u64("REGSAVE"));
>> +
>> +            /* Reg save area header is immediately followed by num cpus */
>> +            reg_save_hdr.num_cpu_offset =
>> +                cpu_to_be32(sizeof(struct rtas_fadump_reg_save_area_header));
>> +
>> +            fadump_reg_entries_size = num_cpus *
>> +                                      FADUMP_NUM_PER_CPU_REGS *
>> +                                      sizeof(struct rtas_fadump_reg_entry);
>> +
>> +            reg_entries = malloc(fadump_reg_entries_size);
>> +            curr_reg_entry = (struct rtas_fadump_reg_entry *)reg_entries;
>> +
>> +            /* This must loop num_cpus time */
>> +            CPU_FOREACH(cpu) {
>> +                ppc_cpu = POWERPC_CPU(cpu);
>> +                env = cpu_env(cpu);
>> +                num_regs_per_cpu = 0;
>> +
>> +                curr_reg_entry->reg_id =
>> +                    cpu_to_be64(fadump_str_to_u64("CPUSTRT"));
>> +                curr_reg_entry->reg_value = ppc_cpu->vcpu_id;
>> +                ++curr_reg_entry;
>> +
>> +#define REG_ENTRY(id, val)                                     \
>> +                do {                                           \
>> +                    curr_reg_entry->reg_id =                   \
>> +                        cpu_to_be64(fadump_str_to_u64(#id));   \
>> +                    curr_reg_entry->reg_value = val;           \
>> +                    ++curr_reg_entry;                          \
>> +                    ++num_regs_per_cpu;                        \
>> +                } while (0)
>> +
>> +                REG_ENTRY(ACOP, env->spr[SPR_ACOP]);
>> +                REG_ENTRY(AMR, env->spr[SPR_AMR]);
>> +                REG_ENTRY(BESCR, env->spr[SPR_BESCR]);
>> +                REG_ENTRY(CFAR, env->spr[SPR_CFAR]);
>> +                REG_ENTRY(CIABR, env->spr[SPR_CIABR]);
>> +
>> +                /* Save the condition register */
>> +                uint64_t cr = 0;
>> +                cr |= (env->crf[0] & 0xf);
>> +                cr |= (env->crf[1] & 0xf) << 1;
>> +                cr |= (env->crf[2] & 0xf) << 2;
>> +                cr |= (env->crf[3] & 0xf) << 3;
>> +                cr |= (env->crf[4] & 0xf) << 4;
>> +                cr |= (env->crf[5] & 0xf) << 5;
>> +                cr |= (env->crf[6] & 0xf) << 6;
>> +                cr |= (env->crf[7] & 0xf) << 7;
> Shift values wrong here I think... Use ppc_get_cr()
Okay, I had some issues getting this CR. Will use 'ppc_get_cr', thanks !
>
>> +                REG_ENTRY(CR, cr);
>> +
>> +                REG_ENTRY(CTR, env->spr[SPR_CTR]);
>> +                REG_ENTRY(CTRL, env->spr[SPR_CTRL]);
>> +                REG_ENTRY(DABR, env->spr[SPR_DABR]);
>> +                REG_ENTRY(DABRX, env->spr[SPR_DABRX]);
>> +                REG_ENTRY(DAR, env->spr[SPR_DAR]);
>> +                REG_ENTRY(DAWR0, env->spr[SPR_DAWR0]);
>> +                REG_ENTRY(DAWR1, env->spr[SPR_DAWR1]);
>> +                REG_ENTRY(DAWRX0, env->spr[SPR_DAWRX0]);
>> +                REG_ENTRY(DAWRX1, env->spr[SPR_DAWRX1]);
>> +                REG_ENTRY(DPDES, env->spr[SPR_DPDES]);
>> +                REG_ENTRY(DSCR, env->spr[SPR_DSCR]);
>> +                REG_ENTRY(DSISR, env->spr[SPR_DSISR]);
>> +                REG_ENTRY(EBBHR, env->spr[SPR_EBBHR]);
>> +                REG_ENTRY(EBBRR, env->spr[SPR_EBBRR]);
>> +
>> +                REG_ENTRY(FPSCR, env->fpscr);
>> +                REG_ENTRY(FSCR, env->spr[SPR_FSCR]);
>> +
>> +                /* Save the GPRs */
>> +                for (int gpr_id = 0; gpr_id < 32; ++gpr_id) {
>> +                    curr_reg_entry->reg_id =
>> +                        cpu_to_be64(fadump_gpr_id_to_u64(gpr_id));
>> +                    curr_reg_entry->reg_value = env->gpr[i];
>> +                    ++curr_reg_entry;
>> +                    ++num_regs_per_cpu;
>> +                }
>> +
>> +                REG_ENTRY(IAMR, env->spr[SPR_IAMR]);
>> +                REG_ENTRY(IC, env->spr[SPR_IC]);
>> +                REG_ENTRY(LR, env->spr[SPR_LR]);
>> +
>> +                REG_ENTRY(MSR, env->msr);
>> +                REG_ENTRY(NIA, env->nip);   /* NIA */
>> +                REG_ENTRY(PIR, env->spr[SPR_PIR]);
>> +                REG_ENTRY(PSPB, env->spr[SPR_PSPB]);
>> +                REG_ENTRY(PVR, env->spr[SPR_PVR]);
>> +                REG_ENTRY(RPR, env->spr[SPR_RPR]);
>> +                REG_ENTRY(SPURR, env->spr[SPR_SPURR]);
>> +                REG_ENTRY(SRR0, env->spr[SPR_SRR0]);
>> +                REG_ENTRY(SRR1, env->spr[SPR_SRR1]);
>> +                REG_ENTRY(TAR, env->spr[SPR_TAR]);
>> +                REG_ENTRY(TEXASR, env->spr[SPR_TEXASR]);
>> +                REG_ENTRY(TFHAR, env->spr[SPR_TFHAR]);
>> +                REG_ENTRY(TFIAR, env->spr[SPR_TFIAR]);
>> +                REG_ENTRY(TIR, env->spr[SPR_TIR]);
>> +                REG_ENTRY(UAMOR, env->spr[SPR_UAMOR]);
>> +                REG_ENTRY(VRSAVE, env->spr[SPR_VRSAVE]);
>> +                REG_ENTRY(VSCR, env->vscr);
>> +                REG_ENTRY(VTB, env->spr[SPR_VTB]);
>> +                REG_ENTRY(WORT, env->spr[SPR_WORT]);
>> +                REG_ENTRY(XER, env->spr[SPR_XER]);
>> +
>> +                /*
>> +                 * Ignoring transaction checkpoint and few other registers
>> +                 * mentioned in PAPR as not supported in QEMU
>> +                 */
>> +#undef REG_ENTRY
>> +
>> +                /* End the registers for this CPU with "CPUEND" reg entry */
>> +                curr_reg_entry->reg_id =
>> +                    cpu_to_be64(fadump_str_to_u64("CPUEND"));
>> +
>> +                /* Ensure the number of registers match (+2 for STRT & END) */
>> +                assert(FADUMP_NUM_PER_CPU_REGS == num_regs_per_cpu + 2);
>> +
>> +                ++curr_reg_entry;
>> +            }
>> +
>> +            cpu_state_len = 0;
>> +            cpu_state_len += sizeof(reg_save_hdr);     /* reg save header */
>> +            cpu_state_len += sizeof(__be32);           /* num_cpus */
>> +            cpu_state_len += fadump_reg_entries_size;  /* reg entries */
>> +
>> +            cpu_state_region = &fdm->rgn[i];
>> +            cpu_state_addr = dest_addr;
>> +            cpu_state_buffer = g_malloc(cpu_state_len);
>> +
>> +            uint64_t offset = 0;
>> +            memcpy(cpu_state_buffer + offset,
>> +                    &reg_save_hdr, sizeof(reg_save_hdr));
>> +            offset += sizeof(reg_save_hdr);
>> +
>> +            /* Write num_cpus */
>> +            num_cpus = cpu_to_be32(num_cpus);
>> +            memcpy(cpu_state_buffer + offset, &num_cpus, sizeof(__be32));
>> +            offset += sizeof(__be32);
>> +
>> +            /* Write the register entries */
>> +            memcpy(cpu_state_buffer + offset,
>> +                    reg_entries, fadump_reg_entries_size);
>> +            offset += fadump_reg_entries_size;
>> +
>> +            /*
>> +             * We will write the cpu state data later, as otherwise it
>> +             * might get overwritten by other fadump regions
>> +             */
>> +
>>               break;
>> +        }
>>           case FADUMP_HPTE_REGION:
>>               /* TODO: Add hpte state data */
>>               break;
>> @@ -455,6 +623,34 @@ static bool fadump_preserve_mem(void)
>>           }
>>       }
>>   
>> +    /*
>> +     * Write the Register Save Area
>> +     *
>> +     * CPU State/Register Save Area should be written after dumping the
>> +     * memory to prevent overwritting while saving other memory regions
>> +     *
>> +     * eg. If boot memory region is 1G, then both the first 1GB memory, and
>> +     * the Register Save Area needs to be saved at 1GB.
>> +     * And as the CPU_STATE_DATA region comes first than the
>> +     * REAL_MODE_REGION region to be copied, the CPU_STATE_DATA will get
>> +     * overwritten if saved before the 0GB - 1GB region is copied after
>> +     * saving CPU state data
>> +     */
>> +    cpu_physical_memory_write(cpu_state_addr, cpu_state_buffer, cpu_state_len);
> Check docs/devel/loads-stores.rst, address_space_* is preferred to check
> for failures. It also says devices should operate on their own address
> spaces and that doesn't really apply to spapr since the "virtual
> hypervisor" doesn't really fit the model of a device...
>
> Perhaps look at h_enter_nested which uses CPU(cpu)->as.
Got it. Will try to use address_space_read/write in v2.
>> +    g_free(cpu_state_buffer);
>> +
>> +    /*
>> +     * Set bytes_dumped in cpu state region, so kernel knows platform have
>> +     * exported it
>> +     */
>> +    cpu_state_region->bytes_dumped = cpu_to_be64(cpu_state_len);
>> +
>> +    if (cpu_state_region->source_len != cpu_state_region->bytes_dumped) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                "CPU State region's length passed by kernel, doesn't match"
>> +                " with CPU State region length exported by QEMU");
>> +    }
>> +
>>       return true;
>>   }
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index a80704187583..0e8002bad9e0 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -792,6 +792,9 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>>   #define FADUMP_HPTE_REGION      0x0002
>>   #define FADUMP_REAL_MODE_REGION 0x0011
>>   
>> +/* Number of registers stored per cpu */
>> +#define FADUMP_NUM_PER_CPU_REGS (32 /*GPR*/ + 45 /*others*/ + 2 /*STRT & END*/)
>> +
>>   /* OS defined sections */
>>   #define FADUMP_PARAM_AREA       0x0100
>>   
>> @@ -845,6 +848,86 @@ struct rtas_fadump_mem_struct {
>>       struct rtas_fadump_section        rgn[FADUMP_MAX_SECTIONS];
>>   };
>>   
>> +/*
>> + * The firmware-assisted dump format.
>> + *
>> + * The register save area is an area in the partition's memory used to preserve
>> + * the register contents (CPU state data) for the active CPUs during a firmware
>> + * assisted dump. The dump format contains register save area header followed
>> + * by register entries. Each list of registers for a CPU starts with "CPUSTRT"
>> + * and ends with "CPUEND".
>> + */
>> +
>> +/* Register save area header. */
>> +struct rtas_fadump_reg_save_area_header {
>> +    __be64    magic_number;
>> +    __be32    version;
>> +    __be32    num_cpu_offset;
>> +};
>> +
>> +/* Register entry. */
>> +struct rtas_fadump_reg_entry {
>> +    __be64    reg_id;
>> +    __be64    reg_value;
>> +};
>> +
>> +/*
>> + * Copy the ascii values for first 8 characters from a string into u64
>> + * variable at their respective indexes.
>> + * e.g.
>> + *  The string "FADMPINF" will be converted into 0x4641444d50494e46
>> + */
>> +static inline uint64_t fadump_str_to_u64(const char *str)
>> +{
>> +    uint64_t val = 0;
>> +    int i;
>> +
>> +    for (i = 0; i < sizeof(val); i++) {
>> +        val = (*str) ? (val << 8) | *str++ : val << 8;
>> +    }
>> +    return val;
>> +}
>> +
>> +/**
>> + * Get the identifier id for register entries of GPRs
>> + *
>> + * It gives the same id as 'fadump_str_to_u64' when the complete string id
>> + * of the GPR is given, ie.
>> + *
>> + *   fadump_str_to_u64("GPR05") == fadump_gpr_id_to_u64(5);
>> + *   fadump_str_to_u64("GPR12") == fadump_gpr_id_to_u64(12);
>> + *
>> + * And so on. Hence this can be implemented by creating a dynamic
>> + * string for each GPR, such as "GPR00", "GPR01", ... "GPR31"
>> + * Instead of allocating a string, an observation from the math of
>> + * 'fadump_str_to_u64' or from PAPR tells us that there's a pattern
>> + * in the identifier IDs, such that the first 8 bytes are affected only by
>> + * whether it is GPR0*, GPR1*, GPR2*, GPR3*. 9th byte is always 0x3. And
>> + * the the 10th byte is the index of the GPR modulo 10.
>> + */
>> +static inline uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
>> +{
>> +    uint64_t val = 0;
>> +
>> +    /* Valid range of GPR id is only GPR0 to GPR31 */
>> +    assert(gpr_id < 32);
>> +
>> +    if (gpr_id <= 9) {
>> +        val = fadump_str_to_u64("GPR0");
>> +    } else if (gpr_id <= 19) {
>> +        val = fadump_str_to_u64("GPR1");
>> +    } else if (gpr_id <= 29) {
>> +        val = fadump_str_to_u64("GPR2");
>> +    } else {
>> +        val = fadump_str_to_u64("GPR3");
>> +    }
>> +
>> +    val |= 0x30000000;
>> +    val |= ((gpr_id % 10) << 12);
>> +
>> +    return val;
>> +}
> These two functions could probably go out of line, I doubt they
> are performance critical and make them static if not used outside
> the file.

True, have marked them static (but in a header file), can see I can move 
it into some .c file if not needed to be shared.


Thanks for your reviews Nick.


- Aditya G

>
> Thanks,
> Nick


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump
  2025-02-27  3:28   ` Nicholas Piggin
@ 2025-02-27  7:02     ` Aditya Gupta
  2025-03-05  7:34     ` Harsh Prateek Bora
  1 sibling, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-27  7:02 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini


On 27/02/25 08:58, Nicholas Piggin wrote:
> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>> <...snip...>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index f3a4b4235d43..3602e5b5d18d 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -897,9 +897,27 @@ static int spapr_dt_rng(void *fdt)
>>   static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>>   {
> You might be able to add a spapr_dt_rtas_fadump() function
> and do it there to help keep functions small?

Sure.


Thanks,

- Aditya G

> Thanks,
> Nick


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] hw/ppc: Enable Fadump for PSeries
  2025-02-27  3:33   ` Nicholas Piggin
@ 2025-02-27  7:07     ` Aditya Gupta
  2025-02-27  8:52       ` Nicholas Piggin
  0 siblings, 1 reply; 29+ messages in thread
From: Aditya Gupta @ 2025-02-27  7:07 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On 27/02/25 09:03, Nicholas Piggin wrote:

> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>> With all support in place, enable fadump by exporting the
>> "ibm,configure-kernel-dump" RTAS call in the device tree.
>>
>> Presence of "ibm,configure-kernel-dump" tells the kernel that the
>> platform (QEMU) supports fadump.
>>
>> Pass "fadump=on" to enable Linux to use firmware assisted dump.
>>
>> Logs of a linux boot with firmware assisted dump:
>>
>>      ./build/qemu-system-ppc64 -M pseries,x-vof=on --cpu power10 --smp 4 -m 4G -kernel some-vmlinux -initrd some-initrd -append "debug fadump=on crashkernel=1G" -nographic
>>      [    0.000000] random: crng init done
>>      [    0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 4096MB)
>>      ...
>>      [    1.084686] rtas fadump: Registration is successful!
>>      ...
>>      # cat /sys/kernel/debug/powerpc/fadump_region
>>      CPU :[0x00000040000000-0x000000400013d3] 0x13d4 bytes, Dumped: 0x0
>>      HPTE:[0x000000400013d4-0x000000400013d3] 0x0 bytes, Dumped: 0x0
>>      DUMP: Src: 0x00000000000000, Dest: 0x00000040010000, Size: 0x40000000, Dumped: 0x0 bytes
>>
>>      [0x000000fffff800-0x000000ffffffff]: cmdline append: ''
>>      # echo c > /proc/sysrq-trigger
>>
>> The fadump boot after crash:
>>
>>      [    0.000000] rtas fadump: Firmware-assisted dump is active.
>>      [    0.000000] fadump: Updated cmdline: debug fadump=on crashkernel=1G
>>      [    0.000000] fadump: Firmware-assisted dump is active.
>>      [    0.000000] fadump: Reserving 3072MB of memory at 0x00000040000000 for preserving crash data
>>      ....
>>      # file /proc/vmcore
>>      /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style
>>
>> Analysing the vmcore with crash-utility:
>>
>>            KERNEL: vmlinux-6.14-rc2
>>          DUMPFILE: vmcore-a64dcfb451e2-nocma
>>              CPUS: 4
>>              DATE: Thu Jan  1 05:30:00 IST 1970
>>            UPTIME: 00:00:30
>>      LOAD AVERAGE: 0.74, 0.21, 0.07
>>             TASKS: 94
>>          NODENAME: buildroot
>>           RELEASE: 6.14.0-rc2+
>>           VERSION: #1 SMP Wed Feb 12 06:49:59 CST 2025
>>           MACHINE: ppc64le  (1000 Mhz)
>>            MEMORY: 4 GB
>>             PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>>               PID: 270
>>           COMMAND: "sh"
>>              TASK: c000000009e7cc00  [THREAD_INFO: c000000009e7cc00]
>>               CPU: 3
>>             STATE: TASK_RUNNING (PANIC)
>>
>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> This is very cool, nice work. Does it work with KVM? I think... probably
> it could?

Yes it does, atleast for crashing CPU :)

But there are problems with reading the CPU regs, regs don't seem 
correct for non-crashing CPUs.

Crash is able to work perfectly for the crashing CPU as of now (as the 
registers are stored by the kernel in that case).

>
> Are you able to add a functional test case for it? This is something
> that people (including me) will forget to test...

Sure, I will add a test case.


Thanks for your reviews Nick.

It might take few weeks for me to post another version, will see into 
the tests in qemu and arrange the code bit more nicely.


- Aditya G

>
> Thanks,
> Nick
>
>> ---
>>   hw/ppc/spapr_rtas.c    | 6 +++++-
>>   include/hw/ppc/spapr.h | 3 ++-
>>   2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index 0aca4270aee8..bd2ed16a46e3 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -692,7 +692,7 @@ static void trigger_fadump_boot(target_ulong spapr_retcode)
>>   }
>>   
>>   /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>> -static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
>> +static void rtas_configure_kernel_dump(PowerPCCPU *cpu,
>>                                      SpaprMachineState *spapr,
>>                                      uint32_t token, uint32_t nargs,
>>                                      target_ulong args,
>> @@ -1109,6 +1109,10 @@ static void core_rtas_register_types(void)
>>       spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, "ibm,nmi-interlock",
>>                           rtas_ibm_nmi_interlock);
>>   
>> +    /* Register Fadump rtas call */
>> +    spapr_rtas_register(RTAS_CONFIGURE_KERNEL_DUMP, "ibm,configure-kernel-dump",
>> +                        rtas_configure_kernel_dump);
>> +
>>       qtest_set_command_cb(spapr_qtest_callback);
>>   }
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index fa63008e57ec..bde3bdc4b80c 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -768,8 +768,9 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>>   #define RTAS_IBM_SUSPEND_ME                     (RTAS_TOKEN_BASE + 0x2A)
>>   #define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x2B)
>>   #define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x2C)
>> +#define RTAS_CONFIGURE_KERNEL_DUMP              (RTAS_TOKEN_BASE + 0x2D)
>>   
>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2E)
>>   
>>   /* Fadump commands */
>>   #define FADUMP_CMD_REGISTER            1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries
  2025-02-27  6:49     ` Aditya Gupta
@ 2025-02-27  8:48       ` Nicholas Piggin
  2025-02-27 12:15         ` Aditya Gupta
  0 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2025-02-27  8:48 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On Thu Feb 27, 2025 at 4:49 PM AEST, Aditya Gupta wrote:
> Hi Nick,
>
> On 27/02/25 08:37, Nicholas Piggin wrote:
>> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>>> Implement the handler for "ibm,configure-kernel-dump" rtas call in QEMU.
>>>
>>> Currently the handler just does basic checks and handles
>>> register/unregister/invalidate requests from kernel.
>>>
>>> Fadump will be enabled in a later patch.
>>>
>>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
>>> ---
>>>   hw/ppc/spapr_rtas.c    | 99 ++++++++++++++++++++++++++++++++++++++++++
>>>   include/hw/ppc/spapr.h | 59 +++++++++++++++++++++++++
>>>   2 files changed, 158 insertions(+)
>>>
>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index df2e837632aa..eebdf13b1552 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -341,6 +341,105 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>>>       rtas_st(rets, 0, ret);
>>>   }
>>>   
>>> +struct fadump_metadata fadump_metadata;
>> Can this (and other globals added in other patches) come under
>> SpaprMachineState?
>>
>> And could most of the fadump code and structures go under new
>> spapr_fadump.[ch] files?
> Yes, i can move it inside SpaprMachineState. Will put the code in new files.
>>> +
>>> +/* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>>> +static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
>>> +                                   SpaprMachineState *spapr,
>>> +                                   uint32_t token, uint32_t nargs,
>>> +                                   target_ulong args,
>>> +                                   uint32_t nret, target_ulong rets)
>> I don't know about adding a new unused function like this, is there
>> a way to juggle patches around to add it when it's wired up?
>
> Ah, that is problematic agreed. I tried to move around things, but 
> arrived at this.
>
> I will spend some time thinking how to arrange this.
>
> Will need some guidance. How should I approach arranging the code in 
> such situations ?
>
> My idea was to
> * First one is the skeleton: mentions the steps, but doesn't implement 
> the steps
> * Middle patches implement the steps one by one
> * Last patch enables it all. So in future if someone checks out the 
> "Enable fadump" commit they would have all the support ready.
>
> The major problem is "everything" remains unused till this last patch. 
> But this 1st patch gave me the chance to logically build upon this, eg. 
> first implement preserving memory regions, then add the fadump_trigger 
> in os-term rtas call, etc.
>
> Any advice to approach this ?

Yeah, sometimes it's difficult to avoid. Especially with a new
feature like this. If you can't find a better way, that's okay.

One thing could be to return errors from calls. RTAS is a little
bit tricky since there is no general "unsupported" error because
the presence of the token implies some support. You could return
-1 hardware error perhaps.

Another option is implement the call but not all functionality.
E.g., permit dump register/unregister, but don't actually provide
a valid dump on reboot (you could ignore, or provide empty or
invalid format). Downside of that is that if you bisect, a kernel
test case could go bad because it appears to be supported but
produces invalid result.

To avoid that, perhaps you could trip an assert or just log an
error message when performing a reboot with crash dump registered.

But as I said, don't make it too convoluted or lots more work if
it's not easy to rework.

>
>>> +{
>>> +    struct rtas_fadump_section_header header;
>>> +    target_ulong cmd = rtas_ld(args, 0);
>>> +    target_ulong fdm_addr = rtas_ld(args, 1);
>>> +    target_ulong fdm_size = rtas_ld(args, 2);
>>> +
>>> +    /* Number outputs has to be 1 */
>>> +    if (nret != 1) {
>>> +        qemu_log_mask(LOG_GUEST_ERROR,
>>> +                "FADUMP: ibm,configure-kernel-dump RTAS called with nret != 1.\n");
>>> +        return;
>>> +    }
>>> +
>>> +    /* Number inputs has to be 3 */
>>> +    if (nargs != 3) {
>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> +        return;
>>> +    }
>>> +
>>> +    switch (cmd) {
>>> +    case FADUMP_CMD_REGISTER:
>>> +        if (fadump_metadata.fadump_registered) {
>>> +            /* Fadump already registered */
>>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ALREADY_REGISTERED);
>>> +            return;
>>> +        }
>>> +
>>> +        if (fadump_metadata.fadump_dump_active == 1) {
>>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
>>> +            return;
>>> +        }
>>> +
>>> +        if (fdm_size < sizeof(struct rtas_fadump_section_header)) {
>>> +            qemu_log_mask(LOG_GUEST_ERROR,
>>> +                "FADUMP: Header size is invalid: %lu\n", fdm_size);
>>> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> +            return;
>>> +        }
>>> +
>>> +        /* XXX: Can we ensure fdm_addr points to a valid RMR-memory buffer ? */
>> RMR memory? There is spapr_rma_size() if that's what you need?
>
>
> Thanks, will use `spapr_rma_size`. The PAPR says fdm_addr should point 
> to a valid RMR buffer, I guess that means it should be in the RMA, ie. 
> `< spapr_rma_size()` ?

Ah yes, PAPR glossray says:

Real Mode Region. This is an obsolete term that is deprecated in favor of RMA.

So that should do what you want.

Thanks,
Nick



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] hw/ppc: Enable Fadump for PSeries
  2025-02-27  7:07     ` Aditya Gupta
@ 2025-02-27  8:52       ` Nicholas Piggin
  0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2025-02-27  8:52 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On Thu Feb 27, 2025 at 5:07 PM AEST, Aditya Gupta wrote:
> On 27/02/25 09:03, Nicholas Piggin wrote:
>
>> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>>> With all support in place, enable fadump by exporting the
>>> "ibm,configure-kernel-dump" RTAS call in the device tree.
>>>
>>> Presence of "ibm,configure-kernel-dump" tells the kernel that the
>>> platform (QEMU) supports fadump.
>>>
>>> Pass "fadump=on" to enable Linux to use firmware assisted dump.
>>>
>>> Logs of a linux boot with firmware assisted dump:
>>>
>>>      ./build/qemu-system-ppc64 -M pseries,x-vof=on --cpu power10 --smp 4 -m 4G -kernel some-vmlinux -initrd some-initrd -append "debug fadump=on crashkernel=1G" -nographic
>>>      [    0.000000] random: crng init done
>>>      [    0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 4096MB)
>>>      ...
>>>      [    1.084686] rtas fadump: Registration is successful!
>>>      ...
>>>      # cat /sys/kernel/debug/powerpc/fadump_region
>>>      CPU :[0x00000040000000-0x000000400013d3] 0x13d4 bytes, Dumped: 0x0
>>>      HPTE:[0x000000400013d4-0x000000400013d3] 0x0 bytes, Dumped: 0x0
>>>      DUMP: Src: 0x00000000000000, Dest: 0x00000040010000, Size: 0x40000000, Dumped: 0x0 bytes
>>>
>>>      [0x000000fffff800-0x000000ffffffff]: cmdline append: ''
>>>      # echo c > /proc/sysrq-trigger
>>>
>>> The fadump boot after crash:
>>>
>>>      [    0.000000] rtas fadump: Firmware-assisted dump is active.
>>>      [    0.000000] fadump: Updated cmdline: debug fadump=on crashkernel=1G
>>>      [    0.000000] fadump: Firmware-assisted dump is active.
>>>      [    0.000000] fadump: Reserving 3072MB of memory at 0x00000040000000 for preserving crash data
>>>      ....
>>>      # file /proc/vmcore
>>>      /proc/vmcore: ELF 64-bit LSB core file, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), SVR4-style
>>>
>>> Analysing the vmcore with crash-utility:
>>>
>>>            KERNEL: vmlinux-6.14-rc2
>>>          DUMPFILE: vmcore-a64dcfb451e2-nocma
>>>              CPUS: 4
>>>              DATE: Thu Jan  1 05:30:00 IST 1970
>>>            UPTIME: 00:00:30
>>>      LOAD AVERAGE: 0.74, 0.21, 0.07
>>>             TASKS: 94
>>>          NODENAME: buildroot
>>>           RELEASE: 6.14.0-rc2+
>>>           VERSION: #1 SMP Wed Feb 12 06:49:59 CST 2025
>>>           MACHINE: ppc64le  (1000 Mhz)
>>>            MEMORY: 4 GB
>>>             PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>>>               PID: 270
>>>           COMMAND: "sh"
>>>              TASK: c000000009e7cc00  [THREAD_INFO: c000000009e7cc00]
>>>               CPU: 3
>>>             STATE: TASK_RUNNING (PANIC)
>>>
>>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
>> This is very cool, nice work. Does it work with KVM? I think... probably
>> it could?
>
> Yes it does, atleast for crashing CPU :)
>
> But there are problems with reading the CPU regs, regs don't seem 
> correct for non-crashing CPUs.
>
> Crash is able to work perfectly for the crashing CPU as of now (as the 
> registers are stored by the kernel in that case).

You may need to call cpu_synchronize_state() in the CPU_FOREACH loop
before you read out the register state. Does that fix it?

>>
>> Are you able to add a functional test case for it? This is something
>> that people (including me) will forget to test...
>
> Sure, I will add a test case.
>
>
> Thanks for your reviews Nick.
>
> It might take few weeks for me to post another version, will see into 
> the tests in qemu and arrange the code bit more nicely.

Yeah that's okay, I'm way behind with reviews and merging unfortunately
so may have to wait for next release, but I'm keen to get it merged when
we can. Sorry for the late review.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries
  2025-02-27  8:48       ` Nicholas Piggin
@ 2025-02-27 12:15         ` Aditya Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-02-27 12:15 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Harsh Prateek Bora,
	Sourabh Jain, Mahesh J Salgaonkar, Hari Bathini

On 27/02/25 14:18, Nicholas Piggin wrote:
> On Thu Feb 27, 2025 at 4:49 PM AEST, Aditya Gupta wrote:
>> Hi Nick,
>>
>> On 27/02/25 08:37, Nicholas Piggin wrote:
>>> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>>> <...snip...>
>> Ah, that is problematic agreed. I tried to move around things, but
>> arrived at this.
>>
>> I will spend some time thinking how to arrange this.
>>
>> Will need some guidance. How should I approach arranging the code in
>> such situations ?
>>
>> My idea was to
>> * First one is the skeleton: mentions the steps, but doesn't implement
>> the steps
>> * Middle patches implement the steps one by one
>> * Last patch enables it all. So in future if someone checks out the
>> "Enable fadump" commit they would have all the support ready.
>>
>> The major problem is "everything" remains unused till this last patch.
>> But this 1st patch gave me the chance to logically build upon this, eg.
>> first implement preserving memory regions, then add the fadump_trigger
>> in os-term rtas call, etc.
>>
>> Any advice to approach this ?
> Yeah, sometimes it's difficult to avoid. Especially with a new
> feature like this. If you can't find a better way, that's okay.
>
> One thing could be to return errors from calls. RTAS is a little
> bit tricky since there is no general "unsupported" error because
> the presence of the token implies some support. You could return
> -1 hardware error perhaps.
>
> Another option is implement the call but not all functionality.
> E.g., permit dump register/unregister, but don't actually provide
> a valid dump on reboot (you could ignore, or provide empty or
> invalid format). Downside of that is that if you bisect, a kernel
> test case could go bad because it appears to be supported but
> produces invalid result.
>
> To avoid that, perhaps you could trip an assert or just log an
> error message when performing a reboot with crash dump registered.
>
> But as I said, don't make it too convoluted or lots more work if
> it's not easy to rework.

Thanks for the ideas Nick. I guess the first one makes sense if we want 
to not need the unused functions.

Only thing I want to check there is, since the 
"ibm,configure-kernel-dump" rtas call is registered, kernel will think 
fadump is supported, and might try registering fadump (if "fadump=on" 
passed in kernel), will see what the kernel does on a failure to 
register fadump in earlyboot.

It generally falls back to kdump, will check.


>>>> +{
>>>> +    struct rtas_fadump_section_header header;
>>>> +    target_ulong cmd = rtas_ld(args, 0);
>>>> +    target_ulong fdm_addr = rtas_ld(args, 1);
>>>> +    target_ulong fdm_size = rtas_ld(args, 2);
>>>> +
>>>> +    /* Number outputs has to be 1 */
>>>> +    if (nret != 1) {
>>>> +        qemu_log_mask(LOG_GUEST_ERROR,
>>>> +                "FADUMP: ibm,configure-kernel-dump RTAS called with nret != 1.\n");
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /* Number inputs has to be 3 */
>>>> +    if (nargs != 3) {
>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    switch (cmd) {
>>>> +    case FADUMP_CMD_REGISTER:
>>>> +        if (fadump_metadata.fadump_registered) {
>>>> +            /* Fadump already registered */
>>>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ALREADY_REGISTERED);
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        if (fadump_metadata.fadump_dump_active == 1) {
>>>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        if (fdm_size < sizeof(struct rtas_fadump_section_header)) {
>>>> +            qemu_log_mask(LOG_GUEST_ERROR,
>>>> +                "FADUMP: Header size is invalid: %lu\n", fdm_size);
>>>> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        /* XXX: Can we ensure fdm_addr points to a valid RMR-memory buffer ? */
>>> RMR memory? There is spapr_rma_size() if that's what you need?
>>
>> Thanks, will use `spapr_rma_size`. The PAPR says fdm_addr should point
>> to a valid RMR buffer, I guess that means it should be in the RMA, ie.
>> `< spapr_rma_size()` ?
> Ah yes, PAPR glossray says:
>
> Real Mode Region. This is an obsolete term that is deprecated in favor of RMA.
>
> So that should do what you want.

Sure, thanks !


- Aditya Gupta

>
> Thanks,
> Nick
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries
  2025-02-17  7:17 ` [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries Aditya Gupta
  2025-02-27  3:07   ` Nicholas Piggin
@ 2025-03-04  9:01   ` Harsh Prateek Bora
  2025-03-06  4:08     ` Aditya Gupta
  1 sibling, 1 reply; 29+ messages in thread
From: Harsh Prateek Bora @ 2025-03-04  9:01 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini



On 2/17/25 12:47, Aditya Gupta wrote:
> Implement the handler for "ibm,configure-kernel-dump" rtas call in QEMU.
> 
> Currently the handler just does basic checks and handles
> register/unregister/invalidate requests from kernel.
> 
> Fadump will be enabled in a later patch.

Let's use FADump or fadump for consistency.

> 
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>   hw/ppc/spapr_rtas.c    | 99 ++++++++++++++++++++++++++++++++++++++++++
>   include/hw/ppc/spapr.h | 59 +++++++++++++++++++++++++
>   2 files changed, 158 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index df2e837632aa..eebdf13b1552 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -341,6 +341,105 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>       rtas_st(rets, 0, ret);
>   }
>   
> +struct fadump_metadata fadump_metadata;
> +
> +/* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
> +static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,

This __attribute shall be avoided if the function can be introduced when 
actually get used.

> +                                   SpaprMachineState *spapr,
> +                                   uint32_t token, uint32_t nargs,
> +                                   target_ulong args,
> +                                   uint32_t nret, target_ulong rets)
> +{
> +    struct rtas_fadump_section_header header;
> +    target_ulong cmd = rtas_ld(args, 0);
> +    target_ulong fdm_addr = rtas_ld(args, 1);
> +    target_ulong fdm_size = rtas_ld(args, 2);
> +
> +    /* Number outputs has to be 1 */
> +    if (nret != 1) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                "FADUMP: ibm,configure-kernel-dump RTAS called with nret != 1.\n");

Some of the error cases are using hcall_dprintf below. Let's use same
for consistency. Also, shouldn't this case also return 
RTAS_OUT_PARAM_ERROR ?

> +        return;
> +    }
> +
> +    /* Number inputs has to be 3 */
> +    if (nargs != 3) {
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);

Log error ?

> +        return;
> +    }
> +
> +    switch (cmd) {
> +    case FADUMP_CMD_REGISTER:
> +        if (fadump_metadata.fadump_registered) {
> +            /* Fadump already registered */
> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ALREADY_REGISTERED);

Log error ?

> +            return;
> +        }
> +
> +        if (fadump_metadata.fadump_dump_active == 1) {
> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);

Log error?

> +            return;
> +        }
> +
> +        if (fdm_size < sizeof(struct rtas_fadump_section_header)) {
> +            qemu_log_mask(LOG_GUEST_ERROR,
> +                "FADUMP: Header size is invalid: %lu\n", fdm_size);
> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +            return;
> +        }
> +
> +        /* XXX: Can we ensure fdm_addr points to a valid RMR-memory buffer ? */
> +        if (fdm_addr <= 0) {
> +            qemu_log_mask(LOG_GUEST_ERROR,
> +                "FADUMP: Invalid fdm address: %ld\n", fdm_addr);
> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +            return;
> +        }
> +
> +        /* Verify that we understand the fadump header version */
> +        cpu_physical_memory_read(fdm_addr, &header, sizeof(header));
> +        if (header.dump_format_version != cpu_to_be32(FADUMP_VERSION)) {
> +            qemu_log_mask(LOG_GUEST_ERROR,
> +                "FADUMP: Unknown fadump header version: 0x%x\n",
> +                header.dump_format_version);
> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +            return;
> +        }
> +
> +        fadump_metadata.fadump_registered = true;
> +        fadump_metadata.fadump_dump_active = false;
> +        fadump_metadata.fdm_addr = fdm_addr;
> +        break;
> +    case FADUMP_CMD_UNREGISTER:
> +        if (fadump_metadata.fadump_dump_active == 1) {
> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);

Log error?

> +            return;
> +        }
> +
> +        fadump_metadata.fadump_registered = false;
> +        fadump_metadata.fadump_dump_active = false;
> +        fadump_metadata.fdm_addr = -1;
> +        break;
> +    case FADUMP_CMD_INVALIDATE:
> +        if (fadump_metadata.fadump_dump_active) {
> +            fadump_metadata.fadump_registered = false;
> +            fadump_metadata.fadump_dump_active = false;
> +            fadump_metadata.fdm_addr = -1;
> +            memset(&fadump_metadata.registered_fdm, 0,
> +                    sizeof(fadump_metadata.registered_fdm));
> +        } else {
> +            hcall_dprintf("fadump: Nothing to invalidate, no dump active.\n");

Isnt this an error case? Should it return status as error or success ?

> +        }
> +        break;
> +    default:
> +        hcall_dprintf("Unknown RTAS token 0x%x\n", token);
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +        return;
> +    }
> +
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
>   static void rtas_ibm_os_term(PowerPCCPU *cpu,
>                               SpaprMachineState *spapr,
>                               uint32_t token, uint32_t nargs,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a6c0547e313d..efa2f891a8a7 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -704,6 +704,8 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>   #define RTAS_OUT_PARAM_ERROR                    -3
>   #define RTAS_OUT_NOT_SUPPORTED                  -3
>   #define RTAS_OUT_NO_SUCH_INDICATOR              -3
> +#define RTAS_OUT_DUMP_ALREADY_REGISTERED        -9
> +#define RTAS_OUT_DUMP_ACTIVE                    -10
>   #define RTAS_OUT_NOT_AUTHORIZED                 -9002
>   #define RTAS_OUT_SYSPARM_PARAM_ERROR            -9999
>   
> @@ -769,6 +771,63 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>   
>   #define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x2D)
>   
> +/* Fadump commands */
> +#define FADUMP_CMD_REGISTER            1
> +#define FADUMP_CMD_UNREGISTER          2
> +#define FADUMP_CMD_INVALIDATE          3
> +
> +#define FADUMP_VERSION    1
> +
> +/*
> + * The Firmware Assisted Dump Memory structure supports a maximum of 10 sections
> + * in the dump memory structure. Presently, three sections are used for
> + * CPU state data, HPTE & Parameters area, while the remaining seven sections
> + * can be used for boot memory regions.
> + */
> +#define FADUMP_MAX_SECTIONS            10
> +#define RTAS_FADUMP_MAX_BOOT_MEM_REGS  7
> +
> +/* Kernel Dump section info */
> +struct rtas_fadump_section {
> +    __be32    request_flag;
> +    __be16    source_data_type;
> +    __be16    error_flags;
> +    __be64    source_address;
> +    __be64    source_len;
> +    __be64    bytes_dumped;
> +    __be64    destination_address;
> +};

Please refer docs/devel/style.rst for Naming style. CamelCase for structs.

> +
> +/* ibm,configure-kernel-dump header. */
> +struct rtas_fadump_section_header {
> +    __be32    dump_format_version;
> +    __be16    dump_num_sections;
> +    __be16    dump_status_flag;
> +    __be32    offset_first_dump_section;
> +
> +    /* Fields for disk dump option. */
> +    __be32    dd_block_size;
> +    __be64    dd_block_offset;
> +    __be64    dd_num_blocks;
> +    __be32    dd_offset_disk_path;
> +
> +    /* Maximum time allowed to prevent an automatic dump-reboot. */
> +    __be32    max_time_auto;
> +};
> +
> +struct rtas_fadump_mem_struct {
> +    struct rtas_fadump_section_header header;
> +    struct rtas_fadump_section        rgn[FADUMP_MAX_SECTIONS];
> +};
> +
> +struct fadump_metadata {
> +    bool fadump_registered;
> +    bool fadump_dump_active;
> +    target_ulong fdm_addr;
> +    struct rtas_fadump_mem_struct registered_fdm;
> +};
> +extern struct fadump_metadata fadump_metadata;
> +
>   /* RTAS ibm,get-system-parameter token values */
>   #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>   #define RTAS_SYSPARM_DIAGNOSTICS_RUN_MODE        42


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered
  2025-02-17  7:17 ` [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered Aditya Gupta
  2025-02-27  3:14   ` Nicholas Piggin
@ 2025-03-04  9:21   ` Harsh Prateek Bora
  2025-03-06  4:11     ` Aditya Gupta
  1 sibling, 1 reply; 29+ messages in thread
From: Harsh Prateek Bora @ 2025-03-04  9:21 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini



On 2/17/25 12:47, Aditya Gupta wrote:
> According to PAPR:
> 
>      R1–7.3.30–3. When the platform receives an ibm,os-term RTAS call, or
>      on a system reset without an ibm,nmi-interlock RTAS call, if the
>      platform has a dump structure registered through the
>      ibm,configure-kernel-dump call, the platform must process each
>      registered kernel dump section as required and, when available,
>      present the dump structure information to the operating system
>      through the “ibm,kernel-dump” property, updated with status for each
>      dump section, until the dump has been invalidated through the
>      ibm,configure-kernel-dump RTAS call.
> 
> If Fadump has been registered, trigger an Fadump boot (memory preserving
> boot), if QEMU recieves a 'ibm,os-term' rtas call.
> 
> Implementing the fadump boot as:
>      * pause all vcpus (will save registers later)
>      * preserve memory regions specified by fadump

Although mentioned later, but needs to call out here as not implemented
in this patch. Ideally, all the prep work patches should be introduced
earlier before enabling the trigger.

>      * do a memory preserving reboot (GUEST_RESET in QEMU doesn't clear
>        the memory)
> 
> Memory regions registered by fadump will be handled in a later patch.
> 
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>   hw/ppc/spapr_rtas.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 42 insertions(+)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index eebdf13b1552..01c82375f03d 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -342,6 +342,43 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>   }
>   
>   struct fadump_metadata fadump_metadata;
> +bool is_next_boot_fadump;
> +
> +static void trigger_fadump_boot(target_ulong spapr_retcode)
> +{
> +    /*
> +     * In PowerNV, SBE stops all clocks for cores, do similar to it
> +     * QEMU's nearest equivalent is 'pause_all_vcpus'
> +     * See 'stopClocksS0' in SBE source code for more info on SBE part
> +     */
> +    pause_all_vcpus();
> +
> +    if (true /* TODO: Preserve memory registered for fadump */) {
> +        /* Failed to preserve the registered memory regions */

Instead of this, it is better to introduce the dummy stub here now which 
can be populated in a later patch. That also helps in avoiding code 
changes in this hunk in future patch.

For eg:

static bool fadump_preserved_mem(void)
{
     return false; /* TBD */
}

...

if (!fadump_preserve_mem()) {
  ...
}

> +        rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
> +
> +        /* Cause a reboot */
> +        qemu_system_guest_panicked(NULL);
> +        return;
> +    }
> +
> +    /* Mark next boot as fadump boot */
> +    is_next_boot_fadump = true;
> +
> +    /* Reset fadump_registered for next boot */
> +    fadump_metadata.fadump_registered = false;
> +    fadump_metadata.fadump_dump_active = true;
> +
> +    /* Then do a guest reset */
> +    /*
> +     * Requirement:
> +     * This guest reset should not clear the memory (which is
> +     * the case when this is merged)
> +     */
> +    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
> +
> +    rtas_st(spapr_retcode, 0, RTAS_OUT_SUCCESS);
> +}
>   
>   /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>   static __attribute((unused)) void rtas_configure_kernel_dump(PowerPCCPU *cpu,
> @@ -449,6 +486,11 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>       target_ulong msgaddr = rtas_ld(args, 0);
>       char msg[512];
>   
> +    if (fadump_metadata.fadump_registered) {
> +        /* If fadump boot works, control won't come back here */
> +        return trigger_fadump_boot(rets);
> +    }
> +
>       cpu_physical_memory_read(msgaddr, msg, sizeof(msg) - 1);
>       msg[sizeof(msg) - 1] = 0;
>   


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/6] hw/ppc: Preserve memory regions registered for fadump
  2025-02-17  7:17 ` [PATCH 3/6] hw/ppc: Preserve memory regions registered for fadump Aditya Gupta
@ 2025-03-05  6:40   ` Harsh Prateek Bora
  2025-03-06  4:16     ` Aditya Gupta
  0 siblings, 1 reply; 29+ messages in thread
From: Harsh Prateek Bora @ 2025-03-05  6:40 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini



On 2/17/25 12:47, Aditya Gupta wrote:
> While the first kernel boots, it registers memory regions for fadump
> such as:
>      * CPU state data  (has to be populated by the platform)
>      * HPTE state data (has to be populated by the platform)
>      * Real Mode Regions (platform should copy it to requested
>        destination addresses)
>      * OS defined regions (such as parameter save area)
> 
> Platform is also expected to modify the 'bytes_dumped' to the length of
> data preserved/copied by platform (ideally same as the source length
> passed by kernel).
> 
> The kernel passes source address and length for the memory regions, and
> a destination address to where the memory is to be copied.
> 
> Implement the preserving/copying of the Real Mode Regions and the
> Parameter Save Area in QEMU Pseries
> 
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>   hw/ppc/spapr_rtas.c    | 117 ++++++++++++++++++++++++++++++++++++++++-
>   include/hw/ppc/spapr.h |  27 +++++++++-
>   2 files changed, 142 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 01c82375f03d..9b29cadab2c9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -344,6 +344,120 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>   struct fadump_metadata fadump_metadata;
>   bool is_next_boot_fadump;
>   
> +/* Preserve the memory locations registered for fadump */
> +static bool fadump_preserve_mem(void)
> +{
> +    struct rtas_fadump_mem_struct *fdm = &fadump_metadata.registered_fdm;
> +    uint64_t next_section_addr;
> +    int dump_num_sections, data_type;
> +    uint64_t src_addr, src_len, dest_addr;
> +    void *copy_buffer;
> +
> +    assert(fadump_metadata.fadump_registered);
> +    assert(fadump_metadata.fdm_addr != -1);
> +
> +    /* Read the fadump header passed during fadump registration */
> +    cpu_physical_memory_read(fadump_metadata.fdm_addr,
> +            &fdm->header, sizeof(fdm->header));
> +
> +    /* Verify that we understand the fadump header version */
> +    if (fdm->header.dump_format_version != cpu_to_be32(FADUMP_VERSION)) {
> +        /*
> +         * Dump format version is unknown and likely changed from the time
> +         * of fadump registration. Back out now.
> +         */
> +        return false;
> +    }
> +
> +    dump_num_sections = be16_to_cpu(fdm->header.dump_num_sections);
> +
> +    if (dump_num_sections > FADUMP_MAX_SECTIONS) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +            "FADUMP: Too many sections: %d\n", fdm->header.dump_num_sections);
> +        return false;
> +    }
> +
> +    next_section_addr =
> +        fadump_metadata.fdm_addr +
> +        be32_to_cpu(fdm->header.offset_first_dump_section);
> +
> +    /*
> +     * Handle all sections
> +     *
> +     * CPU State Data and HPTE regions are handled in their own cases
> +     *
> +     * RMR regions and any custom OS reserved regions such as parameter
> +     * save area, are handled by simply copying the source region to
> +     * destination address
> +     */
> +    for (int i = 0; i < dump_num_sections; ++i) {
> +        /* Read the fadump section from memory */
> +        cpu_physical_memory_read(next_section_addr,
> +                &fdm->rgn[i], sizeof(fdm->rgn[i]));
> +
> +        next_section_addr += sizeof(fdm->rgn[i]);
> +
> +        data_type = be16_to_cpu(fdm->rgn[i].source_data_type);
> +        src_addr  = be64_to_cpu(fdm->rgn[i].source_address);
> +        src_len   = be64_to_cpu(fdm->rgn[i].source_len);
> +        dest_addr = be64_to_cpu(fdm->rgn[i].destination_address);
> +
> +        /* Reset error_flags & bytes_dumped for now */
> +        fdm->rgn[i].error_flags = 0;
> +        fdm->rgn[i].bytes_dumped = 0;
> +
> +        if (be32_to_cpu(fdm->rgn[i].request_flag) != FADUMP_REQUEST_FLAG) {
> +            qemu_log_mask(LOG_UNIMP,
> +                "FADUMP: Skipping copying region as not requested\n");
> +            continue;
> +        }
> +
> +        switch (data_type) {
> +        case FADUMP_CPU_STATE_DATA:
> +            /* TODO: Add CPU state data */
> +            break;
> +        case FADUMP_HPTE_REGION:
> +            /* TODO: Add hpte state data */
> +            break;
> +        case FADUMP_REAL_MODE_REGION:
> +        case FADUMP_PARAM_AREA:
> +            /* Skip copy if source and destination are same (eg. param area) */
> +            if (src_addr != dest_addr) {
> +                copy_buffer = g_malloc(src_len + 1);
> +                if (copy_buffer == NULL) {
> +                    qemu_log_mask(LOG_GUEST_ERROR,
> +                        "FADUMP: Failed allocating memory for copying reserved memory regions\n");
> +                    fdm->rgn[i].error_flags =
> +                        cpu_to_be16(FADUMP_ERROR_LENGTH_EXCEEDS_SOURCE);
> +
> +                    continue;
> +                }
> +
> +                /* Copy the source region to destination */
> +                cpu_physical_memory_read(src_addr, copy_buffer, src_len);
> +                cpu_physical_memory_write(dest_addr, copy_buffer, src_len);
> +                g_free(copy_buffer);
> +            }
> +
> +            /*
> +             * Considering cpu_physical_memory_write would have copied the
> +             * complete region
> +             */
> +            fdm->rgn[i].bytes_dumped = cpu_to_be64(src_len);

Is this really valid for FADUMP_PARAM_AREA where we intend to skip copy?

> +
> +            break;
> +        default:
> +            qemu_log_mask(LOG_GUEST_ERROR,
> +                "FADUMP: Skipping unknown source data type: %d\n", data_type);
> +
> +            fdm->rgn[i].error_flags =
> +                cpu_to_be16(FADUMP_ERROR_INVALID_DATA_TYPE);
> +        }
> +    }
> +
> +    return true;
> +}
> +
>   static void trigger_fadump_boot(target_ulong spapr_retcode)
>   {
>       /*
> @@ -353,7 +467,8 @@ static void trigger_fadump_boot(target_ulong spapr_retcode)
>        */
>       pause_all_vcpus();
>   
> -    if (true /* TODO: Preserve memory registered for fadump */) {
> +    /* Preserve the memory locations registered for fadump */
> +    if (!fadump_preserve_mem()) {

This change can be avoided as suggested in previous patch.

>           /* Failed to preserve the registered memory regions */
>           rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
>   
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index efa2f891a8a7..a80704187583 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -776,7 +776,32 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>   #define FADUMP_CMD_UNREGISTER          2
>   #define FADUMP_CMD_INVALIDATE          3
>   
> -#define FADUMP_VERSION    1
> +#define FADUMP_VERSION                 1

This change can be avoided if taken care initially.

Thanks
Harsh
> +
> +/*
> + * The Firmware Assisted Dump Memory structure supports a maximum of 10 sections
> + * in the dump memory structure. Presently, three sections are used for
> + * CPU state data, HPTE & Parameters area, while the remaining seven sections
> + * can be used for boot memory regions.
> + */
> +#define FADUMP_MAX_SECTIONS            10
> +#define RTAS_FADUMP_MAX_BOOT_MEM_REGS  7
> +
> +/* Firmware provided dump sections */
> +#define FADUMP_CPU_STATE_DATA   0x0001
> +#define FADUMP_HPTE_REGION      0x0002
> +#define FADUMP_REAL_MODE_REGION 0x0011
> +
> +/* OS defined sections */
> +#define FADUMP_PARAM_AREA       0x0100
> +
> +/* Dump request flag */
> +#define FADUMP_REQUEST_FLAG     0x00000001
> +
> +/* Dump status flag */
> +#define FADUMP_ERROR_INVALID_DATA_TYPE          0x8000
> +#define FADUMP_ERROR_INVALID_SOURCE_ADDR        0x4000
> +#define FADUMP_ERROR_LENGTH_EXCEEDS_SOURCE      0x2000
>   
>   /*
>    * The Firmware Assisted Dump Memory structure supports a maximum of 10 sections


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump
  2025-02-17  7:17 ` [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump Aditya Gupta
  2025-02-27  3:27   ` Nicholas Piggin
@ 2025-03-05  7:23   ` Harsh Prateek Bora
  2025-03-06  4:22     ` Aditya Gupta
  1 sibling, 1 reply; 29+ messages in thread
From: Harsh Prateek Bora @ 2025-03-05  7:23 UTC (permalink / raw)
  To: Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini



On 2/17/25 12:47, Aditya Gupta wrote:
> Kernel expects CPU states/register states in the format mentioned in
> "Register Save Area" in PAPR.
> 
> The platform (in our case, QEMU) saves each CPU register in the form of
> an array of "register entries", the start and end of this array is
> signified by "CPUSTRT" and "CPUEND" register entries respectively.
> 
> The CPUSTRT and CPUEND register entry also has 4-byte logical CPU ID,
> thus storing the CPU ID corresponding to the array of register entries.
> 
> Each register, and CPUSTRT, CPUEND has a predefined identifier.
> Implement calculating identifier for a given register in
> 'fadump_str_to_u64', which has been taken from the linux kernel
> 
> Similarly GPRs also have predefined identifiers, and a corresponding
> 64-bit resiter value (split into two 32-bit cells). Implement
> calculation of GPR identifiers with 'fadump_gpr_id_to_u64'
> 
> PAPR has restrictions on particular order of few registers, and is
> free to be in any order for other registers.
> Some registers mentioned in PAPR have not been exported as they are not
> implemented in QEMU / don't make sense in QEMU.
> 
> Implement saving of CPU state according to the PAPR document
> 
> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
> ---
>   hw/ppc/spapr_rtas.c    | 200 ++++++++++++++++++++++++++++++++++++++++-
>   include/hw/ppc/spapr.h |  83 +++++++++++++++++
>   2 files changed, 281 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 9b29cadab2c9..0aca4270aee8 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -348,9 +348,12 @@ bool is_next_boot_fadump;
>   static bool fadump_preserve_mem(void)
>   {
>       struct rtas_fadump_mem_struct *fdm = &fadump_metadata.registered_fdm;
> +    struct rtas_fadump_section *cpu_state_region;
>       uint64_t next_section_addr;
>       int dump_num_sections, data_type;
>       uint64_t src_addr, src_len, dest_addr;
> +    uint64_t cpu_state_addr, cpu_state_len = 0;
> +    void *cpu_state_buffer;
>       void *copy_buffer;
>   
>       assert(fadump_metadata.fadump_registered);
> @@ -413,9 +416,174 @@ static bool fadump_preserve_mem(void)
>           }
>   
>           switch (data_type) {
> -        case FADUMP_CPU_STATE_DATA:
> -            /* TODO: Add CPU state data */
> +        case FADUMP_CPU_STATE_DATA: {
> +            struct rtas_fadump_reg_save_area_header reg_save_hdr;
> +            struct rtas_fadump_reg_entry **reg_entries;
> +            struct rtas_fadump_reg_entry *curr_reg_entry;
> +
> +            uint32_t fadump_reg_entries_size;
> +            __be32 num_cpus = 0;
> +            uint32_t num_regs_per_cpu = 0;
> +            CPUState *cpu;
> +            CPUPPCState *env;
> +            PowerPCCPU *ppc_cpu;
> +
> +            CPU_FOREACH(cpu) {
> +                ++num_cpus;
> +            }
> +
> +            reg_save_hdr.version = cpu_to_be32(1);

PAPR spec mentions version value as 0. Do we need to update ?

> +            reg_save_hdr.magic_number =
> +                cpu_to_be64(fadump_str_to_u64("REGSAVE"));
> +
> +            /* Reg save area header is immediately followed by num cpus */
> +            reg_save_hdr.num_cpu_offset =
> +                cpu_to_be32(sizeof(struct rtas_fadump_reg_save_area_header));
> +

Above inits could go into a helper 
fadump_init_reg_save_header(&reg_save_hdr);
BTW, the PAPR spec also mentions about padding followed by 
num_cpus_offset, see another comment later below.


> +            fadump_reg_entries_size = num_cpus *
> +                                      FADUMP_NUM_PER_CPU_REGS *
> +                                      sizeof(struct rtas_fadump_reg_entry);
> +
> +            reg_entries = malloc(fadump_reg_entries_size);

This was declared as double pointer, but being used as a pointer.

> +            curr_reg_entry = (struct rtas_fadump_reg_entry *)reg_entries;
> +
> +            /* This must loop num_cpus time */
> +            CPU_FOREACH(cpu) {
> +                ppc_cpu = POWERPC_CPU(cpu);
> +                env = cpu_env(cpu);
> +                num_regs_per_cpu = 0;
> +
> +                curr_reg_entry->reg_id =
> +                    cpu_to_be64(fadump_str_to_u64("CPUSTRT"));
> +                curr_reg_entry->reg_value = ppc_cpu->vcpu_id;
> +                ++curr_reg_entry;
> +
> +#define REG_ENTRY(id, val)                                     \
> +                do {                                           \
> +                    curr_reg_entry->reg_id =                   \
> +                        cpu_to_be64(fadump_str_to_u64(#id));   \
> +                    curr_reg_entry->reg_value = val;           \
> +                    ++curr_reg_entry;                          \
> +                    ++num_regs_per_cpu;                        \
> +                } while (0)
> +
> +                REG_ENTRY(ACOP, env->spr[SPR_ACOP]);
> +                REG_ENTRY(AMR, env->spr[SPR_AMR]);
> +                REG_ENTRY(BESCR, env->spr[SPR_BESCR]);
> +                REG_ENTRY(CFAR, env->spr[SPR_CFAR]);
> +                REG_ENTRY(CIABR, env->spr[SPR_CIABR]);
> +
> +                /* Save the condition register */
> +                uint64_t cr = 0;
> +                cr |= (env->crf[0] & 0xf);
> +                cr |= (env->crf[1] & 0xf) << 1;
> +                cr |= (env->crf[2] & 0xf) << 2;
> +                cr |= (env->crf[3] & 0xf) << 3;
> +                cr |= (env->crf[4] & 0xf) << 4;
> +                cr |= (env->crf[5] & 0xf) << 5;
> +                cr |= (env->crf[6] & 0xf) << 6;
> +                cr |= (env->crf[7] & 0xf) << 7;
> +                REG_ENTRY(CR, cr);

ppc_get_cr ?

> +
> +                REG_ENTRY(CTR, env->spr[SPR_CTR]);
> +                REG_ENTRY(CTRL, env->spr[SPR_CTRL]);
> +                REG_ENTRY(DABR, env->spr[SPR_DABR]);
> +                REG_ENTRY(DABRX, env->spr[SPR_DABRX]);
> +                REG_ENTRY(DAR, env->spr[SPR_DAR]);
> +                REG_ENTRY(DAWR0, env->spr[SPR_DAWR0]);
> +                REG_ENTRY(DAWR1, env->spr[SPR_DAWR1]);
> +                REG_ENTRY(DAWRX0, env->spr[SPR_DAWRX0]);
> +                REG_ENTRY(DAWRX1, env->spr[SPR_DAWRX1]);
> +                REG_ENTRY(DPDES, env->spr[SPR_DPDES]);
> +                REG_ENTRY(DSCR, env->spr[SPR_DSCR]);
> +                REG_ENTRY(DSISR, env->spr[SPR_DSISR]);
> +                REG_ENTRY(EBBHR, env->spr[SPR_EBBHR]);
> +                REG_ENTRY(EBBRR, env->spr[SPR_EBBRR]);
> +
> +                REG_ENTRY(FPSCR, env->fpscr);
> +                REG_ENTRY(FSCR, env->spr[SPR_FSCR]);
> +
> +                /* Save the GPRs */
> +                for (int gpr_id = 0; gpr_id < 32; ++gpr_id) {
> +                    curr_reg_entry->reg_id =
> +                        cpu_to_be64(fadump_gpr_id_to_u64(gpr_id));
> +                    curr_reg_entry->reg_value = env->gpr[i];
> +                    ++curr_reg_entry;
> +                    ++num_regs_per_cpu;
> +                }
> +
> +                REG_ENTRY(IAMR, env->spr[SPR_IAMR]);
> +                REG_ENTRY(IC, env->spr[SPR_IC]);
> +                REG_ENTRY(LR, env->spr[SPR_LR]);
> +
> +                REG_ENTRY(MSR, env->msr);
> +                REG_ENTRY(NIA, env->nip);   /* NIA */
> +                REG_ENTRY(PIR, env->spr[SPR_PIR]);
> +                REG_ENTRY(PSPB, env->spr[SPR_PSPB]);
> +                REG_ENTRY(PVR, env->spr[SPR_PVR]);
> +                REG_ENTRY(RPR, env->spr[SPR_RPR]);
> +                REG_ENTRY(SPURR, env->spr[SPR_SPURR]);
> +                REG_ENTRY(SRR0, env->spr[SPR_SRR0]);
> +                REG_ENTRY(SRR1, env->spr[SPR_SRR1]);
> +                REG_ENTRY(TAR, env->spr[SPR_TAR]);
> +                REG_ENTRY(TEXASR, env->spr[SPR_TEXASR]);
> +                REG_ENTRY(TFHAR, env->spr[SPR_TFHAR]);
> +                REG_ENTRY(TFIAR, env->spr[SPR_TFIAR]);
> +                REG_ENTRY(TIR, env->spr[SPR_TIR]);
> +                REG_ENTRY(UAMOR, env->spr[SPR_UAMOR]);
> +                REG_ENTRY(VRSAVE, env->spr[SPR_VRSAVE]);
> +                REG_ENTRY(VSCR, env->vscr);
> +                REG_ENTRY(VTB, env->spr[SPR_VTB]);
> +                REG_ENTRY(WORT, env->spr[SPR_WORT]);
> +                REG_ENTRY(XER, env->spr[SPR_XER]);
> +
> +                /*
> +                 * Ignoring transaction checkpoint and few other registers
> +                 * mentioned in PAPR as not supported in QEMU
> +                 */
> +#undef REG_ENTRY
> +
> +                /* End the registers for this CPU with "CPUEND" reg entry */
> +                curr_reg_entry->reg_id =
> +                    cpu_to_be64(fadump_str_to_u64("CPUEND"));
> +
> +                /* Ensure the number of registers match (+2 for STRT & END) */
> +                assert(FADUMP_NUM_PER_CPU_REGS == num_regs_per_cpu + 2);
> +
> +                ++curr_reg_entry;
> +            }
> +
> +            cpu_state_len = 0;
> +            cpu_state_len += sizeof(reg_save_hdr);     /* reg save header */
> +            cpu_state_len += sizeof(__be32);           /* num_cpus */
> +            cpu_state_len += fadump_reg_entries_size;  /* reg entries */
> +

above 4 inits could be a single line init.

> +            cpu_state_region = &fdm->rgn[i];
> +            cpu_state_addr = dest_addr;
> +            cpu_state_buffer = g_malloc(cpu_state_len);
> +
> +            uint64_t offset = 0;
> +            memcpy(cpu_state_buffer + offset,
> +                    &reg_save_hdr, sizeof(reg_save_hdr));
> +            offset += sizeof(reg_save_hdr);
> +
> +            /* Write num_cpus */
> +            num_cpus = cpu_to_be32(num_cpus);
> +            memcpy(cpu_state_buffer + offset, &num_cpus, sizeof(__be32));
> +            offset += sizeof(__be32);

As per PAPR spec, NumCpusOffset is followed by a padding of 0xC bytes
initialized to 0. Any reasons for skipping that here ?

> +
> +            /* Write the register entries */
> +            memcpy(cpu_state_buffer + offset,
> +                    reg_entries, fadump_reg_entries_size);
> +            offset += fadump_reg_entries_size;
> +
> +            /*
> +             * We will write the cpu state data later, as otherwise it
> +             * might get overwritten by other fadump regions
> +             */
> +

Better to have a separate routine fadump_preserve_cpu_state_data() when 
the switch case logic grows this large, applies wherever applicable.

>               break;
> +        }
>           case FADUMP_HPTE_REGION:
>               /* TODO: Add hpte state data */
>               break;
> @@ -455,6 +623,34 @@ static bool fadump_preserve_mem(void)
>           }
>       }
>   
> +    /*
> +     * Write the Register Save Area
> +     *
> +     * CPU State/Register Save Area should be written after dumping the
> +     * memory to prevent overwritting while saving other memory regions
> +     *
> +     * eg. If boot memory region is 1G, then both the first 1GB memory, and
> +     * the Register Save Area needs to be saved at 1GB.
> +     * And as the CPU_STATE_DATA region comes first than the
> +     * REAL_MODE_REGION region to be copied, the CPU_STATE_DATA will get
> +     * overwritten if saved before the 0GB - 1GB region is copied after
> +     * saving CPU state data
> +     */
> +    cpu_physical_memory_write(cpu_state_addr, cpu_state_buffer, cpu_state_len);
> +    g_free(cpu_state_buffer);
> +
> +    /*
> +     * Set bytes_dumped in cpu state region, so kernel knows platform have
> +     * exported it
> +     */
> +    cpu_state_region->bytes_dumped = cpu_to_be64(cpu_state_len);
> +
> +    if (cpu_state_region->source_len != cpu_state_region->bytes_dumped) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                "CPU State region's length passed by kernel, doesn't match"
> +                " with CPU State region length exported by QEMU");

           return error ?

Thanks
Harsh
> +    }
> +
>       return true;
>   }
>   
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index a80704187583..0e8002bad9e0 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -792,6 +792,9 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>   #define FADUMP_HPTE_REGION      0x0002
>   #define FADUMP_REAL_MODE_REGION 0x0011
>   
> +/* Number of registers stored per cpu */
> +#define FADUMP_NUM_PER_CPU_REGS (32 /*GPR*/ + 45 /*others*/ + 2 /*STRT & END*/)
> +
>   /* OS defined sections */
>   #define FADUMP_PARAM_AREA       0x0100
>   
> @@ -845,6 +848,86 @@ struct rtas_fadump_mem_struct {
>       struct rtas_fadump_section        rgn[FADUMP_MAX_SECTIONS];
>   };
>   
> +/*
> + * The firmware-assisted dump format.
> + *
> + * The register save area is an area in the partition's memory used to preserve
> + * the register contents (CPU state data) for the active CPUs during a firmware
> + * assisted dump. The dump format contains register save area header followed
> + * by register entries. Each list of registers for a CPU starts with "CPUSTRT"
> + * and ends with "CPUEND".
> + */
> +
> +/* Register save area header. */
> +struct rtas_fadump_reg_save_area_header {
> +    __be64    magic_number;
> +    __be32    version;
> +    __be32    num_cpu_offset;
> +};
> +
> +/* Register entry. */
> +struct rtas_fadump_reg_entry {
> +    __be64    reg_id;
> +    __be64    reg_value;
> +};
> +
> +/*
> + * Copy the ascii values for first 8 characters from a string into u64
> + * variable at their respective indexes.
> + * e.g.
> + *  The string "FADMPINF" will be converted into 0x4641444d50494e46
> + */
> +static inline uint64_t fadump_str_to_u64(const char *str)
> +{
> +    uint64_t val = 0;
> +    int i;
> +
> +    for (i = 0; i < sizeof(val); i++) {
> +        val = (*str) ? (val << 8) | *str++ : val << 8;
> +    }
> +    return val;
> +}
> +
> +/**
> + * Get the identifier id for register entries of GPRs
> + *
> + * It gives the same id as 'fadump_str_to_u64' when the complete string id
> + * of the GPR is given, ie.
> + *
> + *   fadump_str_to_u64("GPR05") == fadump_gpr_id_to_u64(5);
> + *   fadump_str_to_u64("GPR12") == fadump_gpr_id_to_u64(12);
> + *
> + * And so on. Hence this can be implemented by creating a dynamic
> + * string for each GPR, such as "GPR00", "GPR01", ... "GPR31"
> + * Instead of allocating a string, an observation from the math of
> + * 'fadump_str_to_u64' or from PAPR tells us that there's a pattern
> + * in the identifier IDs, such that the first 8 bytes are affected only by
> + * whether it is GPR0*, GPR1*, GPR2*, GPR3*. 9th byte is always 0x3. And
> + * the the 10th byte is the index of the GPR modulo 10.
> + */
> +static inline uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
> +{
> +    uint64_t val = 0;
> +
> +    /* Valid range of GPR id is only GPR0 to GPR31 */
> +    assert(gpr_id < 32);
> +
> +    if (gpr_id <= 9) {
> +        val = fadump_str_to_u64("GPR0");
> +    } else if (gpr_id <= 19) {
> +        val = fadump_str_to_u64("GPR1");
> +    } else if (gpr_id <= 29) {
> +        val = fadump_str_to_u64("GPR2");
> +    } else {
> +        val = fadump_str_to_u64("GPR3");
> +    }
> +
> +    val |= 0x30000000;
> +    val |= ((gpr_id % 10) << 12);
> +
> +    return val;
> +}
> +
>   struct fadump_metadata {
>       bool fadump_registered;
>       bool fadump_dump_active;


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump
  2025-02-27  3:28   ` Nicholas Piggin
  2025-02-27  7:02     ` Aditya Gupta
@ 2025-03-05  7:34     ` Harsh Prateek Bora
  1 sibling, 0 replies; 29+ messages in thread
From: Harsh Prateek Bora @ 2025-03-05  7:34 UTC (permalink / raw)
  To: Nicholas Piggin, Aditya Gupta, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini



On 2/27/25 08:58, Nicholas Piggin wrote:
> On Mon Feb 17, 2025 at 5:17 PM AEST, Aditya Gupta wrote:
>> Platform (ie. QEMU) is expected to pass few device tree properties for
>> details for fadump:
>>
>>    * "ibm,configure-kernel-dump": RTAS call for fadump
>>    * "ibm,configure-kernel-dump-sizes": Space required to store dump data
>>      for firmware provided dump sections (ie. CPU & HPTE regions)
>>    * "ibm,configure-kernel-dump-version": Versions of fadump supported
>>    * "ibm,kernel-dump": Contains the Fadump Memory Structure on a fadump
>>      boot
>>
>> Implement passing configure-kernel-dump-sizes, and
>> configure-kernel-dump-version device tree properties, irrespective of
>> whether it's a fadump boot or not, so that kernel can reserve memory to
>> store the firmware provided dump sections in case of a crash
>>
>> Also, in case of a fadump boot, pass the fadump memory structure to the
>> kernel in "ibm,kernel-dump" device tree property.
>>
>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
>> ---
>>   hw/ppc/spapr.c         | 62 ++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h |  2 ++
>>   2 files changed, 64 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index f3a4b4235d43..3602e5b5d18d 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -897,9 +897,27 @@ static int spapr_dt_rng(void *fdt)
>>   static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>>   {
> 
> You might be able to add a spapr_dt_rtas_fadump() function
> and do it there to help keep functions small?

Ditto. Would look nicer coming from spapr_fadump.[ch] as suggested by 
Nick earlier.

Thanks
Harsh
> 
> Thanks,
> Nick
> 
>>       MachineState *ms = MACHINE(spapr);
>> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>>       int rtas;
>>       GString *hypertas = g_string_sized_new(256);
>>       GString *qemu_hypertas = g_string_sized_new(256);
>> +    uint32_t max_possible_cpus = mc->possible_cpu_arch_ids(ms)->len;
>> +    uint64_t fadump_cpu_state_size = 0;
>> +    uint16_t fadump_versions[2] = {
>> +        FADUMP_VERSION /* min supported version */,
>> +        FADUMP_VERSION /* max supported version */
>> +    };
>> +    uint32_t fadump_rgn_sizes[2][3] = {
>> +        {
>> +            cpu_to_be32(FADUMP_CPU_STATE_DATA),
>> +            0, 0 /* Calculated later */
>> +        },
>> +        {
>> +            cpu_to_be32(FADUMP_HPTE_REGION),
>> +            0, 0 /* HPTE region not implemented */
>> +        }
>> +    };
>> +
>>       uint32_t lrdr_capacity[] = {
>>           0,
>>           0,
>> @@ -1006,6 +1024,50 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>>       _FDT(fdt_setprop(fdt, rtas, "ibm,lrdr-capacity",
>>                        lrdr_capacity, sizeof(lrdr_capacity)));
>>   
>> +    /*
>> +     * CPU State Data contains multiple fields such as header, num_cpus and
>> +     * register entries
>> +     *
>> +     * Calculate the maximum CPU State Data size, according to maximum
>> +     * possible CPUs the QEMU VM can have
>> +     */
>> +    /* Reg save header */
>> +    fadump_cpu_state_size += sizeof(struct rtas_fadump_reg_save_area_header);
>> +
>> +    /* Num_cpus */
>> +    fadump_cpu_state_size += sizeof(__be32);
>> +
>> +    /* Register Entries */
>> +    fadump_cpu_state_size += max_possible_cpus   *
>> +                             FADUMP_NUM_PER_CPU_REGS *
>> +                             sizeof(struct rtas_fadump_reg_entry);
>> +
>> +    /* Set maximum size for CPU state data region */
>> +    assert(fadump_rgn_sizes[0][0] == cpu_to_be32(FADUMP_CPU_STATE_DATA));
>> +
>> +    /* Upper 32 bits of size, usually 0 */
>> +    fadump_rgn_sizes[0][1] = cpu_to_be32(fadump_cpu_state_size >> 32);
>> +
>> +    /* Lower 32 bits of size */
>> +    fadump_rgn_sizes[0][2] = cpu_to_be32(fadump_cpu_state_size & 0xffffffff);
>> +
>> +    /* Add device tree properties required from platform for fadump */
>> +    _FDT((fdt_setprop(fdt, rtas, "ibm,configure-kernel-dump-version",
>> +                    fadump_versions, sizeof(fadump_versions))));
>> +    _FDT((fdt_setprop(fdt, rtas, "ibm,configure-kernel-dump-sizes",
>> +                    fadump_rgn_sizes, sizeof(fadump_rgn_sizes))));
>> +
>> +    if (is_next_boot_fadump) {
>> +        struct rtas_fadump_mem_struct *fdm =
>> +            &fadump_metadata.registered_fdm;
>> +
>> +        uint64_t fdm_size =
>> +            sizeof(struct rtas_fadump_section_header) +
>> +            (be16_to_cpu(fdm->header.dump_num_sections) *
>> +            sizeof(struct rtas_fadump_section));
>> +
>> +        _FDT((fdt_setprop(fdt, rtas, "ibm,kernel-dump", fdm, fdm_size)));
>> +    }
>>       spapr_dt_rtas_tokens(fdt, rtas);
>>   }
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 0e8002bad9e0..fa63008e57ec 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -928,6 +928,8 @@ static inline uint64_t fadump_gpr_id_to_u64(uint32_t gpr_id)
>>       return val;
>>   }
>>   
>> +extern bool is_next_boot_fadump;
>> +
>>   struct fadump_metadata {
>>       bool fadump_registered;
>>       bool fadump_dump_active;
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries
  2025-03-04  9:01   ` Harsh Prateek Bora
@ 2025-03-06  4:08     ` Aditya Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-03-06  4:08 UTC (permalink / raw)
  To: Harsh Prateek Bora, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini

Hi Harsh,

Thanks for your reviews.


On 04/03/25 14:31, Harsh Prateek Bora wrote:
>
>
> On 2/17/25 12:47, Aditya Gupta wrote:
>> Implement the handler for "ibm,configure-kernel-dump" rtas call in QEMU.
>>
>> Currently the handler just does basic checks and handles
>> register/unregister/invalidate requests from kernel.
>>
>> Fadump will be enabled in a later patch.
>
> Let's use FADump or fadump for consistency.
>
Sure, will use FADump when starting the line, else fadump ?
>>
>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_rtas.c    | 99 ++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h | 59 +++++++++++++++++++++++++
>>   2 files changed, 158 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index df2e837632aa..eebdf13b1552 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -341,6 +341,105 @@ static void 
>> rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>>       rtas_st(rets, 0, ret);
>>   }
>>   +struct fadump_metadata fadump_metadata;
>> +
>> +/* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>> +static __attribute((unused)) void 
>> rtas_configure_kernel_dump(PowerPCCPU *cpu,
>
> This __attribute shall be avoided if the function can be introduced 
> when actually get used.
Will do it that way in v2, without introducing this unused attribute.
>
>> + SpaprMachineState *spapr,
>> +                                   uint32_t token, uint32_t nargs,
>> +                                   target_ulong args,
>> +                                   uint32_t nret, target_ulong rets)
>> +{
>> +    struct rtas_fadump_section_header header;
>> +    target_ulong cmd = rtas_ld(args, 0);
>> +    target_ulong fdm_addr = rtas_ld(args, 1);
>> +    target_ulong fdm_size = rtas_ld(args, 2);
>> +
>> +    /* Number outputs has to be 1 */
>> +    if (nret != 1) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                "FADUMP: ibm,configure-kernel-dump RTAS called with 
>> nret != 1.\n");
>
> Some of the error cases are using hcall_dprintf below. Let's use same
> for consistency. Also, shouldn't this case also return 
> RTAS_OUT_PARAM_ERROR ?

Sure, will use qemu_log_mask then.

Thanks for the catch, yes I should have returned PARAM_ERROR, missed it. 
Will do it.

>
>> +        return;
>> +    }
>> +
>> +    /* Number inputs has to be 3 */
>> +    if (nargs != 3) {
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>
> Log error ?
Will add.
>
>> +        return;
>> +    }
>> +
>> +    switch (cmd) {
>> +    case FADUMP_CMD_REGISTER:
>> +        if (fadump_metadata.fadump_registered) {
>> +            /* Fadump already registered */
>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ALREADY_REGISTERED);
>
> Log error ?
Will do.
>
>> +            return;
>> +        }
>> +
>> +        if (fadump_metadata.fadump_dump_active == 1) {
>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
>
> Log error?
Will add.
>
>> +            return;
>> +        }
>> +
>> +        if (fdm_size < sizeof(struct rtas_fadump_section_header)) {
>> +            qemu_log_mask(LOG_GUEST_ERROR,
>> +                "FADUMP: Header size is invalid: %lu\n", fdm_size);
>> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +            return;
>> +        }
>> +
>> +        /* XXX: Can we ensure fdm_addr points to a valid RMR-memory 
>> buffer ? */
>> +        if (fdm_addr <= 0) {
>> +            qemu_log_mask(LOG_GUEST_ERROR,
>> +                "FADUMP: Invalid fdm address: %ld\n", fdm_addr);
>> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +            return;
>> +        }
>> +
>> +        /* Verify that we understand the fadump header version */
>> +        cpu_physical_memory_read(fdm_addr, &header, sizeof(header));
>> +        if (header.dump_format_version != 
>> cpu_to_be32(FADUMP_VERSION)) {
>> +            qemu_log_mask(LOG_GUEST_ERROR,
>> +                "FADUMP: Unknown fadump header version: 0x%x\n",
>> +                header.dump_format_version);
>> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +            return;
>> +        }
>> +
>> +        fadump_metadata.fadump_registered = true;
>> +        fadump_metadata.fadump_dump_active = false;
>> +        fadump_metadata.fdm_addr = fdm_addr;
>> +        break;
>> +    case FADUMP_CMD_UNREGISTER:
>> +        if (fadump_metadata.fadump_dump_active == 1) {
>> +            rtas_st(rets, 0, RTAS_OUT_DUMP_ACTIVE);
>
> Log error?
Will add.
>
>> +            return;
>> +        }
>> +
>> +        fadump_metadata.fadump_registered = false;
>> +        fadump_metadata.fadump_dump_active = false;
>> +        fadump_metadata.fdm_addr = -1;
>> +        break;
>> +    case FADUMP_CMD_INVALIDATE:
>> +        if (fadump_metadata.fadump_dump_active) {
>> +            fadump_metadata.fadump_registered = false;
>> +            fadump_metadata.fadump_dump_active = false;
>> +            fadump_metadata.fdm_addr = -1;
>> +            memset(&fadump_metadata.registered_fdm, 0,
>> +                    sizeof(fadump_metadata.registered_fdm));
>> +        } else {
>> +            hcall_dprintf("fadump: Nothing to invalidate, no dump 
>> active.\n");
>
> Isnt this an error case? Should it return status as error or success ?

Not sure. PAPR doesn't specify it any error for this situation. With 
this current code, software can do invalidate anytime without needing to 
verify if dump is active or not (shouldn't happen though), but final 
state should always be that there won't be any dump active and fadump 
registered is reset.

Or should I return a HARDWARE_ERROR or PARAMETER_ERROR for this (don't 
think either is helpful) ?

>
>> +        }
>> +        break;
>> +    default:
>> +        hcall_dprintf("Unknown RTAS token 0x%x\n", token);
>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>> +        return;
>> +    }
>> +
>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>> +}
>> +
>>   static void rtas_ibm_os_term(PowerPCCPU *cpu,
>>                               SpaprMachineState *spapr,
>>                               uint32_t token, uint32_t nargs,
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index a6c0547e313d..efa2f891a8a7 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -704,6 +704,8 @@ void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>>   #define RTAS_OUT_PARAM_ERROR                    -3
>>   #define RTAS_OUT_NOT_SUPPORTED                  -3
>>   #define RTAS_OUT_NO_SUCH_INDICATOR              -3
>> +#define RTAS_OUT_DUMP_ALREADY_REGISTERED        -9
>> +#define RTAS_OUT_DUMP_ACTIVE                    -10
>>   #define RTAS_OUT_NOT_AUTHORIZED                 -9002
>>   #define RTAS_OUT_SYSPARM_PARAM_ERROR            -9999
>>   @@ -769,6 +771,63 @@ void push_sregs_to_kvm_pr(SpaprMachineState 
>> *spapr);
>>     #define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x2D)
>>   +/* Fadump commands */
>> +#define FADUMP_CMD_REGISTER            1
>> +#define FADUMP_CMD_UNREGISTER          2
>> +#define FADUMP_CMD_INVALIDATE          3
>> +
>> +#define FADUMP_VERSION    1
>> +
>> +/*
>> + * The Firmware Assisted Dump Memory structure supports a maximum of 
>> 10 sections
>> + * in the dump memory structure. Presently, three sections are used for
>> + * CPU state data, HPTE & Parameters area, while the remaining seven 
>> sections
>> + * can be used for boot memory regions.
>> + */
>> +#define FADUMP_MAX_SECTIONS            10
>> +#define RTAS_FADUMP_MAX_BOOT_MEM_REGS  7
>> +
>> +/* Kernel Dump section info */
>> +struct rtas_fadump_section {
>> +    __be32    request_flag;
>> +    __be16    source_data_type;
>> +    __be16    error_flags;
>> +    __be64    source_address;
>> +    __be64    source_len;
>> +    __be64    bytes_dumped;
>> +    __be64    destination_address;
>> +};
>
> Please refer docs/devel/style.rst for Naming style. CamelCase for 
> structs.

Sure, thanks, will follow it.


Thanks,

- Aditya Gupta

>
>> +
>> +/* ibm,configure-kernel-dump header. */
>> +struct rtas_fadump_section_header {
>> +    __be32    dump_format_version;
>> +    __be16    dump_num_sections;
>> +    __be16    dump_status_flag;
>> +    __be32    offset_first_dump_section;
>> +
>> +    /* Fields for disk dump option. */
>> +    __be32    dd_block_size;
>> +    __be64    dd_block_offset;
>> +    __be64    dd_num_blocks;
>> +    __be32    dd_offset_disk_path;
>> +
>> +    /* Maximum time allowed to prevent an automatic dump-reboot. */
>> +    __be32    max_time_auto;
>> +};
>> +
>> +struct rtas_fadump_mem_struct {
>> +    struct rtas_fadump_section_header header;
>> +    struct rtas_fadump_section        rgn[FADUMP_MAX_SECTIONS];
>> +};
>> +
>> +struct fadump_metadata {
>> +    bool fadump_registered;
>> +    bool fadump_dump_active;
>> +    target_ulong fdm_addr;
>> +    struct rtas_fadump_mem_struct registered_fdm;
>> +};
>> +extern struct fadump_metadata fadump_metadata;
>> +
>>   /* RTAS ibm,get-system-parameter token values */
>>   #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS      20
>>   #define RTAS_SYSPARM_DIAGNOSTICS_RUN_MODE        42


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered
  2025-03-04  9:21   ` Harsh Prateek Bora
@ 2025-03-06  4:11     ` Aditya Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-03-06  4:11 UTC (permalink / raw)
  To: Harsh Prateek Bora, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini


On 04/03/25 14:51, Harsh Prateek Bora wrote:
>
>
> On 2/17/25 12:47, Aditya Gupta wrote:
>> According to PAPR:
>>
>>      R1–7.3.30–3. When the platform receives an ibm,os-term RTAS 
>> call, or
>>      on a system reset without an ibm,nmi-interlock RTAS call, if the
>>      platform has a dump structure registered through the
>>      ibm,configure-kernel-dump call, the platform must process each
>>      registered kernel dump section as required and, when available,
>>      present the dump structure information to the operating system
>>      through the “ibm,kernel-dump” property, updated with status for 
>> each
>>      dump section, until the dump has been invalidated through the
>>      ibm,configure-kernel-dump RTAS call.
>>
>> If Fadump has been registered, trigger an Fadump boot (memory preserving
>> boot), if QEMU recieves a 'ibm,os-term' rtas call.
>>
>> Implementing the fadump boot as:
>>      * pause all vcpus (will save registers later)
>>      * preserve memory regions specified by fadump
>
> Although mentioned later, but needs to call out here as not implemented
> in this patch. Ideally, all the prep work patches should be introduced
> earlier before enabling the trigger.
>
Got it, will try rearranging the code. Though with current code, the 
trigger won't be called as fadump will not get registered (as of this 
patch the rtas call was not exposed to the kernel, this will likely 
change in v2).
>>      * do a memory preserving reboot (GUEST_RESET in QEMU doesn't clear
>>        the memory)
>>
>> Memory regions registered by fadump will be handled in a later patch.
>>
>> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_rtas.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index eebdf13b1552..01c82375f03d 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -342,6 +342,43 @@ static void 
>> rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>>   }
>>     struct fadump_metadata fadump_metadata;
>> +bool is_next_boot_fadump;
>> +
>> +static void trigger_fadump_boot(target_ulong spapr_retcode)
>> +{
>> +    /*
>> +     * In PowerNV, SBE stops all clocks for cores, do similar to it
>> +     * QEMU's nearest equivalent is 'pause_all_vcpus'
>> +     * See 'stopClocksS0' in SBE source code for more info on SBE part
>> +     */
>> +    pause_all_vcpus();
>> +
>> +    if (true /* TODO: Preserve memory registered for fadump */) {
>> +        /* Failed to preserve the registered memory regions */
>
> Instead of this, it is better to introduce the dummy stub here now 
> which can be populated in a later patch. That also helps in avoiding 
> code changes in this hunk in future patch.
>
> For eg:
>
> static bool fadump_preserved_mem(void)
> {
>     return false; /* TBD */
> }
>
> ...
>
> if (!fadump_preserve_mem()) {
>  ...
> }

Thanks, makes sense. I will do it this way.


Thanks,

- Aditya Gupta

>
>> +        rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
>> +
>> +        /* Cause a reboot */
>> +        qemu_system_guest_panicked(NULL);
>> +        return;
>> +    }
>> +
>> +    /* Mark next boot as fadump boot */
>> +    is_next_boot_fadump = true;
>> +
>> +    /* Reset fadump_registered for next boot */
>> +    fadump_metadata.fadump_registered = false;
>> +    fadump_metadata.fadump_dump_active = true;
>> +
>> +    /* Then do a guest reset */
>> +    /*
>> +     * Requirement:
>> +     * This guest reset should not clear the memory (which is
>> +     * the case when this is merged)
>> +     */
>> +    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
>> +
>> +    rtas_st(spapr_retcode, 0, RTAS_OUT_SUCCESS);
>> +}
>>     /* Papr Section 7.4.9 ibm,configure-kernel-dump RTAS call */
>>   static __attribute((unused)) void 
>> rtas_configure_kernel_dump(PowerPCCPU *cpu,
>> @@ -449,6 +486,11 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>>       target_ulong msgaddr = rtas_ld(args, 0);
>>       char msg[512];
>>   +    if (fadump_metadata.fadump_registered) {
>> +        /* If fadump boot works, control won't come back here */
>> +        return trigger_fadump_boot(rets);
>> +    }
>> +
>>       cpu_physical_memory_read(msgaddr, msg, sizeof(msg) - 1);
>>       msg[sizeof(msg) - 1] = 0;


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/6] hw/ppc: Preserve memory regions registered for fadump
  2025-03-05  6:40   ` Harsh Prateek Bora
@ 2025-03-06  4:16     ` Aditya Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-03-06  4:16 UTC (permalink / raw)
  To: Harsh Prateek Bora, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini

On 05/03/25 12:10, Harsh Prateek Bora wrote:

>
>
> On 2/17/25 12:47, Aditya Gupta wrote:
>> <...snip...>
>>
>> +        /* Reset error_flags & bytes_dumped for now */
>> +        fdm->rgn[i].error_flags = 0;
>> +        fdm->rgn[i].bytes_dumped = 0;
>> +
>> +        if (be32_to_cpu(fdm->rgn[i].request_flag) != 
>> FADUMP_REQUEST_FLAG) {
>> +            qemu_log_mask(LOG_UNIMP,
>> +                "FADUMP: Skipping copying region as not requested\n");
>> +            continue;
>> +        }
>> +
>> +        switch (data_type) {
>> +        case FADUMP_CPU_STATE_DATA:
>> +            /* TODO: Add CPU state data */
>> +            break;
>> +        case FADUMP_HPTE_REGION:
>> +            /* TODO: Add hpte state data */
>> +            break;
>> +        case FADUMP_REAL_MODE_REGION:
>> +        case FADUMP_PARAM_AREA:
>> +            /* Skip copy if source and destination are same (eg. 
>> param area) */
>> +            if (src_addr != dest_addr) {
>> +                copy_buffer = g_malloc(src_len + 1);
>> +                if (copy_buffer == NULL) {
>> +                    qemu_log_mask(LOG_GUEST_ERROR,
>> +                        "FADUMP: Failed allocating memory for 
>> copying reserved memory regions\n");
>> +                    fdm->rgn[i].error_flags =
>> + cpu_to_be16(FADUMP_ERROR_LENGTH_EXCEEDS_SOURCE);
>> +
>> +                    continue;
>> +                }
>> +
>> +                /* Copy the source region to destination */
>> +                cpu_physical_memory_read(src_addr, copy_buffer, 
>> src_len);
>> +                cpu_physical_memory_write(dest_addr, copy_buffer, 
>> src_len);
>> +                g_free(copy_buffer);
>> +            }
>> +
>> +            /*
>> +             * Considering cpu_physical_memory_write would have 
>> copied the
>> +             * complete region
>> +             */
>> +            fdm->rgn[i].bytes_dumped = cpu_to_be64(src_len);
>
> Is this really valid for FADUMP_PARAM_AREA where we intend to skip copy?
>
Yes I think it's good to keep it. Because that's an optimisation i did 
to skip the copy if src and dest are same.

But the actual copy depends on the OS passing us the 
"FADUMP_REQUEST_FLAG" in the region's request flag.

If the flag is set, i am expecting the kernel asked us to copy, and 
hence the .bytes_dumped should be same as the number of bytes asked by 
kernel to copy ideally.

If the flag is not set, we return early, so we let the .bytes_dumped be 0.

>> +
>> +            break;
>> +        default:
>> +            qemu_log_mask(LOG_GUEST_ERROR,
>> +                "FADUMP: Skipping unknown source data type: %d\n", 
>> data_type);
>> +
>> +            fdm->rgn[i].error_flags =
>> +                cpu_to_be16(FADUMP_ERROR_INVALID_DATA_TYPE);
>> +        }
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>   static void trigger_fadump_boot(target_ulong spapr_retcode)
>>   {
>>       /*
>> @@ -353,7 +467,8 @@ static void trigger_fadump_boot(target_ulong 
>> spapr_retcode)
>>        */
>>       pause_all_vcpus();
>>   -    if (true /* TODO: Preserve memory registered for fadump */) {
>> +    /* Preserve the memory locations registered for fadump */
>> +    if (!fadump_preserve_mem()) {
>
> This change can be avoided as suggested in previous patch.
Agreed, will do it in previous patch.
>
>>           /* Failed to preserve the registered memory regions */
>>           rtas_st(spapr_retcode, 0, RTAS_OUT_HW_ERROR);
>>   diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index efa2f891a8a7..a80704187583 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -776,7 +776,32 @@ void push_sregs_to_kvm_pr(SpaprMachineState 
>> *spapr);
>>   #define FADUMP_CMD_UNREGISTER          2
>>   #define FADUMP_CMD_INVALIDATE          3
>>   -#define FADUMP_VERSION    1
>> +#define FADUMP_VERSION                 1
>
> This change can be avoided if taken care initially.

Nice, I missed this diff. Will fix it in the patch which introduced this.


Thanks,

- Aditya Gupta

>
> Thanks
> Harsh
>> +
>> +/*
>> + * The Firmware Assisted Dump Memory structure supports a maximum of 
>> 10 sections
>> + * in the dump memory structure. Presently, three sections are used for
>> + * CPU state data, HPTE & Parameters area, while the remaining seven 
>> sections
>> + * can be used for boot memory regions.
>> + */
>> +#define FADUMP_MAX_SECTIONS            10
>> +#define RTAS_FADUMP_MAX_BOOT_MEM_REGS  7
>> +
>> +/* Firmware provided dump sections */
>> +#define FADUMP_CPU_STATE_DATA   0x0001
>> +#define FADUMP_HPTE_REGION      0x0002
>> +#define FADUMP_REAL_MODE_REGION 0x0011
>> +
>> +/* OS defined sections */
>> +#define FADUMP_PARAM_AREA       0x0100
>> +
>> +/* Dump request flag */
>> +#define FADUMP_REQUEST_FLAG     0x00000001
>> +
>> +/* Dump status flag */
>> +#define FADUMP_ERROR_INVALID_DATA_TYPE          0x8000
>> +#define FADUMP_ERROR_INVALID_SOURCE_ADDR        0x4000
>> +#define FADUMP_ERROR_LENGTH_EXCEEDS_SOURCE      0x2000
>>     /*
>>    * The Firmware Assisted Dump Memory structure supports a maximum 
>> of 10 sections


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump
  2025-03-05  7:23   ` Harsh Prateek Bora
@ 2025-03-06  4:22     ` Aditya Gupta
  0 siblings, 0 replies; 29+ messages in thread
From: Aditya Gupta @ 2025-03-06  4:22 UTC (permalink / raw)
  To: Harsh Prateek Bora, qemu-devel
  Cc: qemu-ppc, Nicholas Piggin, Daniel Henrique Barboza, Sourabh Jain,
	Mahesh J Salgaonkar, Hari Bathini

On 05/03/25 12:53, Harsh Prateek Bora wrote:

>
>
> On 2/17/25 12:47, Aditya Gupta wrote:
>> <...snip...>
>>
>> +        case FADUMP_CPU_STATE_DATA: {
>> +            struct rtas_fadump_reg_save_area_header reg_save_hdr;
>> +            struct rtas_fadump_reg_entry **reg_entries;
>> +            struct rtas_fadump_reg_entry *curr_reg_entry;
>> +
>> +            uint32_t fadump_reg_entries_size;
>> +            __be32 num_cpus = 0;
>> +            uint32_t num_regs_per_cpu = 0;
>> +            CPUState *cpu;
>> +            CPUPPCState *env;
>> +            PowerPCCPU *ppc_cpu;
>> +
>> +            CPU_FOREACH(cpu) {
>> +                ++num_cpus;
>> +            }
>> +
>> +            reg_save_hdr.version = cpu_to_be32(1);
>
> PAPR spec mentions version value as 0. Do we need to update ?
Yes, will fix, thanks Harsh.
>
>> +            reg_save_hdr.magic_number =
>> +                cpu_to_be64(fadump_str_to_u64("REGSAVE"));
>> +
>> +            /* Reg save area header is immediately followed by num 
>> cpus */
>> +            reg_save_hdr.num_cpu_offset =
>> +                cpu_to_be32(sizeof(struct 
>> rtas_fadump_reg_save_area_header));
>> +
>
> Above inits could go into a helper 
> fadump_init_reg_save_header(&reg_save_hdr);
> BTW, the PAPR spec also mentions about padding followed by 
> num_cpus_offset, see another comment later below.
>
>
>> +            fadump_reg_entries_size = num_cpus *
>> +                                      FADUMP_NUM_PER_CPU_REGS *
>> +                                      sizeof(struct 
>> rtas_fadump_reg_entry);
>> +
>> +            reg_entries = malloc(fadump_reg_entries_size);
>
> This was declared as double pointer, but being used as a pointer.
Agreed, not needed to keep it as double pointer. My initial plan for 
this variable was different, that's why i was using double pointer 
earlier to point to a list of CPU registers, and each CPU registers 
itself an array. Not needed in current implementation. Will fix it.
>
>> +            curr_reg_entry = (struct rtas_fadump_reg_entry 
>> *)reg_entries;
>> +
>> +            /* This must loop num_cpus time */
>> +            CPU_FOREACH(cpu) {
>> +                ppc_cpu = POWERPC_CPU(cpu);
>> +                env = cpu_env(cpu);
>> +                num_regs_per_cpu = 0;
>> +
>> +                curr_reg_entry->reg_id =
>> +                    cpu_to_be64(fadump_str_to_u64("CPUSTRT"));
>> +                curr_reg_entry->reg_value = ppc_cpu->vcpu_id;
>> +                ++curr_reg_entry;
>> +
>> +#define REG_ENTRY(id, val) \
>> +                do { \
>> +                    curr_reg_entry->reg_id =                   \
>> +                        cpu_to_be64(fadump_str_to_u64(#id)); \
>> +                    curr_reg_entry->reg_value = val;           \
>> +                    ++curr_reg_entry; \
>> +                    ++num_regs_per_cpu; \
>> +                } while (0)
>> +
>> +                REG_ENTRY(ACOP, env->spr[SPR_ACOP]);
>> +                REG_ENTRY(AMR, env->spr[SPR_AMR]);
>> +                REG_ENTRY(BESCR, env->spr[SPR_BESCR]);
>> +                REG_ENTRY(CFAR, env->spr[SPR_CFAR]);
>> +                REG_ENTRY(CIABR, env->spr[SPR_CIABR]);
>> +
>> +                /* Save the condition register */
>> +                uint64_t cr = 0;
>> +                cr |= (env->crf[0] & 0xf);
>> +                cr |= (env->crf[1] & 0xf) << 1;
>> +                cr |= (env->crf[2] & 0xf) << 2;
>> +                cr |= (env->crf[3] & 0xf) << 3;
>> +                cr |= (env->crf[4] & 0xf) << 4;
>> +                cr |= (env->crf[5] & 0xf) << 5;
>> +                cr |= (env->crf[6] & 0xf) << 6;
>> +                cr |= (env->crf[7] & 0xf) << 7;
>> +                REG_ENTRY(CR, cr);
>
> ppc_get_cr ?
Thanks, will use it.
>
>> +
>> +                REG_ENTRY(CTR, env->spr[SPR_CTR]);
>> +                REG_ENTRY(CTRL, env->spr[SPR_CTRL]);
>> +                REG_ENTRY(DABR, env->spr[SPR_DABR]);
>> +                REG_ENTRY(DABRX, env->spr[SPR_DABRX]);
>> +                REG_ENTRY(DAR, env->spr[SPR_DAR]);
>> +                REG_ENTRY(DAWR0, env->spr[SPR_DAWR0]);
>> +                REG_ENTRY(DAWR1, env->spr[SPR_DAWR1]);
>> +                REG_ENTRY(DAWRX0, env->spr[SPR_DAWRX0]);
>> +                REG_ENTRY(DAWRX1, env->spr[SPR_DAWRX1]);
>> +                REG_ENTRY(DPDES, env->spr[SPR_DPDES]);
>> +                REG_ENTRY(DSCR, env->spr[SPR_DSCR]);
>> +                REG_ENTRY(DSISR, env->spr[SPR_DSISR]);
>> +                REG_ENTRY(EBBHR, env->spr[SPR_EBBHR]);
>> +                REG_ENTRY(EBBRR, env->spr[SPR_EBBRR]);
>> +
>> +                REG_ENTRY(FPSCR, env->fpscr);
>> +                REG_ENTRY(FSCR, env->spr[SPR_FSCR]);
>> +
>> +                /* Save the GPRs */
>> +                for (int gpr_id = 0; gpr_id < 32; ++gpr_id) {
>> +                    curr_reg_entry->reg_id =
>> + cpu_to_be64(fadump_gpr_id_to_u64(gpr_id));
>> +                    curr_reg_entry->reg_value = env->gpr[i];
>> +                    ++curr_reg_entry;
>> +                    ++num_regs_per_cpu;
>> +                }
>> +
>> +                REG_ENTRY(IAMR, env->spr[SPR_IAMR]);
>> +                REG_ENTRY(IC, env->spr[SPR_IC]);
>> +                REG_ENTRY(LR, env->spr[SPR_LR]);
>> +
>> +                REG_ENTRY(MSR, env->msr);
>> +                REG_ENTRY(NIA, env->nip);   /* NIA */
>> +                REG_ENTRY(PIR, env->spr[SPR_PIR]);
>> +                REG_ENTRY(PSPB, env->spr[SPR_PSPB]);
>> +                REG_ENTRY(PVR, env->spr[SPR_PVR]);
>> +                REG_ENTRY(RPR, env->spr[SPR_RPR]);
>> +                REG_ENTRY(SPURR, env->spr[SPR_SPURR]);
>> +                REG_ENTRY(SRR0, env->spr[SPR_SRR0]);
>> +                REG_ENTRY(SRR1, env->spr[SPR_SRR1]);
>> +                REG_ENTRY(TAR, env->spr[SPR_TAR]);
>> +                REG_ENTRY(TEXASR, env->spr[SPR_TEXASR]);
>> +                REG_ENTRY(TFHAR, env->spr[SPR_TFHAR]);
>> +                REG_ENTRY(TFIAR, env->spr[SPR_TFIAR]);
>> +                REG_ENTRY(TIR, env->spr[SPR_TIR]);
>> +                REG_ENTRY(UAMOR, env->spr[SPR_UAMOR]);
>> +                REG_ENTRY(VRSAVE, env->spr[SPR_VRSAVE]);
>> +                REG_ENTRY(VSCR, env->vscr);
>> +                REG_ENTRY(VTB, env->spr[SPR_VTB]);
>> +                REG_ENTRY(WORT, env->spr[SPR_WORT]);
>> +                REG_ENTRY(XER, env->spr[SPR_XER]);
>> +
>> +                /*
>> +                 * Ignoring transaction checkpoint and few other 
>> registers
>> +                 * mentioned in PAPR as not supported in QEMU
>> +                 */
>> +#undef REG_ENTRY
>> +
>> +                /* End the registers for this CPU with "CPUEND" reg 
>> entry */
>> +                curr_reg_entry->reg_id =
>> +                    cpu_to_be64(fadump_str_to_u64("CPUEND"));
>> +
>> +                /* Ensure the number of registers match (+2 for STRT 
>> & END) */
>> +                assert(FADUMP_NUM_PER_CPU_REGS == num_regs_per_cpu + 
>> 2);
>> +
>> +                ++curr_reg_entry;
>> +            }
>> +
>> +            cpu_state_len = 0;
>> +            cpu_state_len += sizeof(reg_save_hdr);     /* reg save 
>> header */
>> +            cpu_state_len += sizeof(__be32);           /* num_cpus */
>> +            cpu_state_len += fadump_reg_entries_size;  /* reg 
>> entries */
>> +
>
> above 4 inits could be a single line init.
Yes it could be, but i kept it this way as it looks more readable to me 
with the comments, what do you think ?
>
>> +            cpu_state_region = &fdm->rgn[i];
>> +            cpu_state_addr = dest_addr;
>> +            cpu_state_buffer = g_malloc(cpu_state_len);
>> +
>> +            uint64_t offset = 0;
>> +            memcpy(cpu_state_buffer + offset,
>> +                    &reg_save_hdr, sizeof(reg_save_hdr));
>> +            offset += sizeof(reg_save_hdr);
>> +
>> +            /* Write num_cpus */
>> +            num_cpus = cpu_to_be32(num_cpus);
>> +            memcpy(cpu_state_buffer + offset, &num_cpus, 
>> sizeof(__be32));
>> +            offset += sizeof(__be32);
>
> As per PAPR spec, NumCpusOffset is followed by a padding of 0xC bytes
> initialized to 0. Any reasons for skipping that here ?
Missed that padding, didn't notice as kernel also was picking up the 
correct num cpus in my testing, will fix this, and see how the kernel 
got the correct value then.
>
>> +
>> +            /* Write the register entries */
>> +            memcpy(cpu_state_buffer + offset,
>> +                    reg_entries, fadump_reg_entries_size);
>> +            offset += fadump_reg_entries_size;
>> +
>> +            /*
>> +             * We will write the cpu state data later, as otherwise it
>> +             * might get overwritten by other fadump regions
>> +             */
>> +
>
> Better to have a separate routine fadump_preserve_cpu_state_data() 
> when the switch case logic grows this large, applies wherever applicable.
Yes, multiple switch cases have huge logic in my patches, will move them 
to helpers.
>
>>               break;
>> +        }
>>           case FADUMP_HPTE_REGION:
>>               /* TODO: Add hpte state data */
>>               break;
>> @@ -455,6 +623,34 @@ static bool fadump_preserve_mem(void)
>>           }
>>       }
>>   +    /*
>> +     * Write the Register Save Area
>> +     *
>> +     * CPU State/Register Save Area should be written after dumping the
>> +     * memory to prevent overwritting while saving other memory regions
>> +     *
>> +     * eg. If boot memory region is 1G, then both the first 1GB 
>> memory, and
>> +     * the Register Save Area needs to be saved at 1GB.
>> +     * And as the CPU_STATE_DATA region comes first than the
>> +     * REAL_MODE_REGION region to be copied, the CPU_STATE_DATA will 
>> get
>> +     * overwritten if saved before the 0GB - 1GB region is copied after
>> +     * saving CPU state data
>> +     */
>> +    cpu_physical_memory_write(cpu_state_addr, cpu_state_buffer, 
>> cpu_state_len);
>> +    g_free(cpu_state_buffer);
>> +
>> +    /*
>> +     * Set bytes_dumped in cpu state region, so kernel knows 
>> platform have
>> +     * exported it
>> +     */
>> +    cpu_state_region->bytes_dumped = cpu_to_be64(cpu_state_len);
>> +
>> +    if (cpu_state_region->source_len != 
>> cpu_state_region->bytes_dumped) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                "CPU State region's length passed by kernel, doesn't 
>> match"
>> +                " with CPU State region length exported by QEMU");
>
>           return error ?

Yes, will return PARAM_ERROR here ?


Thanks Harsh for the detailed reviews.

- Aditya Gupta




^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2025-03-06  4:22 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-17  7:17 [PATCH 0/6] Implement Firmware Assisted Dump for PSeries Aditya Gupta
2025-02-17  7:17 ` [PATCH 1/6] hw/ppc: Implement skeleton code for fadump in PSeries Aditya Gupta
2025-02-27  3:07   ` Nicholas Piggin
2025-02-27  6:49     ` Aditya Gupta
2025-02-27  8:48       ` Nicholas Piggin
2025-02-27 12:15         ` Aditya Gupta
2025-03-04  9:01   ` Harsh Prateek Bora
2025-03-06  4:08     ` Aditya Gupta
2025-02-17  7:17 ` [PATCH 2/6] hw/ppc: Trigger Fadump boot if fadump is registered Aditya Gupta
2025-02-27  3:14   ` Nicholas Piggin
2025-02-27  6:56     ` Aditya Gupta
2025-03-04  9:21   ` Harsh Prateek Bora
2025-03-06  4:11     ` Aditya Gupta
2025-02-17  7:17 ` [PATCH 3/6] hw/ppc: Preserve memory regions registered for fadump Aditya Gupta
2025-03-05  6:40   ` Harsh Prateek Bora
2025-03-06  4:16     ` Aditya Gupta
2025-02-17  7:17 ` [PATCH 4/6] hw/ppc: Implement saving CPU state in Fadump Aditya Gupta
2025-02-27  3:27   ` Nicholas Piggin
2025-02-27  7:01     ` Aditya Gupta
2025-03-05  7:23   ` Harsh Prateek Bora
2025-03-06  4:22     ` Aditya Gupta
2025-02-17  7:17 ` [PATCH 5/6] hw/ppc: Pass device tree properties for Fadump Aditya Gupta
2025-02-27  3:28   ` Nicholas Piggin
2025-02-27  7:02     ` Aditya Gupta
2025-03-05  7:34     ` Harsh Prateek Bora
2025-02-17  7:17 ` [PATCH 6/6] hw/ppc: Enable Fadump for PSeries Aditya Gupta
2025-02-27  3:33   ` Nicholas Piggin
2025-02-27  7:07     ` Aditya Gupta
2025-02-27  8:52       ` Nicholas Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).