[PATCH 0/7] ppc: fix larx migration, fix record-replay

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/7] ppc: fix larx migration, fix record-replay
@ 2023-06-23 12:57 Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay Nicholas Piggin
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

Hi, this is a bit of an RFC patch, I may need to send patches to
different trees to merge but they kind of go together.

The primary motivation is to fix migrating larx reservations,
previously discussed here:

https://lists.gnu.org/archive/html/qemu-ppc/2023-06/msg00452.html

It turns out a recent patch fixed it in a hacky way by chance, but the
fix is not compatible with rr debugging as Peter noted. Fortunately rr
debugging is broken on ppc, so we are done.

Can it be fixed nicely? Patch 1 tries that by migrating reservation
state when rr is in use. The rest of the patches is getting rr to
work. I've not go to trying to add a specific larx test case for it
yet, but it started to pass basic tests. There is one strangeness
explained in the final patch which I've not yet worked out though.

Comments welcome.

Thanks,
Nick

Nicholas Piggin (7):
  target/ppc: Fix CPU reservation migration for record-replay
  scripts/replay_dump.sh: Update to current rr record format
  spapr: Fix machine reset deadlock from replay-record
  spapr: Fix record-replay machine reset consuming too many events
  target/ppc: Fix timebase reset with record-replay
  tests/avocado: boot ppc64 pseries replay-record test to Linux VFS
    mount
  tests/avocado: ppc64 pseries reverse debugging test

 hw/ppc/ppc.c                       | 11 +++-
 hw/ppc/spapr.c                     | 32 +++++++++--
 include/hw/ppc/spapr.h             |  2 +
 scripts/replay-dump.py             | 89 ++++++++++++++++++++++++++++--
 target/ppc/compat.c                | 19 +++++++
 target/ppc/cpu.h                   |  3 +
 target/ppc/machine.c               | 26 ++++++++-
 target/ppc/translate.c             |  2 +
 tests/avocado/replay_kernel.py     |  3 +-
 tests/avocado/reverse_debugging.py | 28 +++++++++-
 10 files changed, 197 insertions(+), 18 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay
  2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
@ 2023-06-23 12:57 ` Nicholas Piggin
  2023-06-26  7:49   ` Pavel Dovgalyuk
  2023-07-07  9:23   ` Daniel Henrique Barboza
  2023-06-23 12:57 ` [PATCH 2/7] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

ppc only migrates reserve_addr, so the destination machine can get a
valid reservation with an incorrect reservation value of 0. Prior to
commit 392d328abe753 ("target/ppc: Ensure stcx size matches larx"),
this could permit a stcx. to incorrectly succeed. That commit
inadvertently fixed that bug because the target machine starts with an
impossible reservation size of 0, so any stcx. will fail.

This behaviour is permitted by the ISA because reservation loss may
have implementation-dependent cause. What's more, with KVM machines it
is impossible save or reasonably restore reservation state. However if
the vmstate is being used for record-replay, the reservation must be
saved and restored exactly in order for execution from snapshot to
match the record.

This patch deprecates the existing incomplete reserve_addr vmstate,
and adds a new vmstate subsection with complete reservation state.
The new vmstate is needed only when record-replay mode is active.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h       |  2 ++
 target/ppc/machine.c   | 26 ++++++++++++++++++++++++--
 target/ppc/translate.c |  2 ++
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 4138a25801..0087ce66e2 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1119,7 +1119,9 @@ struct CPUArchState {
     target_ulong reserve_addr;   /* Reservation address */
     target_ulong reserve_length; /* Reservation larx op size (bytes) */
     target_ulong reserve_val;    /* Reservation value */
+#if defined(TARGET_PPC64)
     target_ulong reserve_val2;
+#endif
 
     /* These are used in supervisor mode only */
     target_ulong msr;      /* machine state register */
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 134b16c625..a817532e5b 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -10,6 +10,7 @@
 #include "qemu/main-loop.h"
 #include "kvm_ppc.h"
 #include "power8-pmu.h"
+#include "sysemu/replay.h"
 
 static void post_load_update_msr(CPUPPCState *env)
 {
@@ -671,6 +672,27 @@ static const VMStateDescription vmstate_compat = {
     }
 };
 
+static bool reservation_needed(void *opaque)
+{
+    return (replay_mode != REPLAY_MODE_NONE);
+}
+
+static const VMStateDescription vmstate_reservation = {
+    .name = "cpu/reservation",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = reservation_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+        VMSTATE_UINTTL(env.reserve_length, PowerPCCPU),
+        VMSTATE_UINTTL(env.reserve_val, PowerPCCPU),
+#if defined(TARGET_PPC64)
+        VMSTATE_UINTTL(env.reserve_val2, PowerPCCPU),
+#endif
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_ppc_cpu = {
     .name = "cpu",
     .version_id = 5,
@@ -692,8 +714,7 @@ const VMStateDescription vmstate_ppc_cpu = {
         VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
         VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
 
-        /* Reservation */
-        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+        VMSTATE_UNUSED(sizeof(target_ulong)), /* was env.reserve_addr */
 
         /* Supervisor mode architected state */
         VMSTATE_UINTTL(env.msr, PowerPCCPU),
@@ -722,6 +743,7 @@ const VMStateDescription vmstate_ppc_cpu = {
         &vmstate_tlbemb,
         &vmstate_tlbmas,
         &vmstate_compat,
+        &vmstate_reservation,
         NULL
     }
 };
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c9fb7b40a5..eb278c2683 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -77,7 +77,9 @@ static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
 static TCGv cpu_reserve;
 static TCGv cpu_reserve_length;
 static TCGv cpu_reserve_val;
+#if defined(TARGET_PPC64)
 static TCGv cpu_reserve_val2;
+#endif
 static TCGv cpu_fpscr;
 static TCGv_i32 cpu_access_type;
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/7] scripts/replay_dump.sh: Update to current rr record format
  2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay Nicholas Piggin
@ 2023-06-23 12:57 ` Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 3/7] spapr: Fix machine reset deadlock from replay-record Nicholas Piggin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

This thing seems to have fallen by the wayside. This quick hack gets
it vaguely working with the current format. It was some use in fixing
rr support for ppc, so maybe others will find it useful too.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 scripts/replay-dump.py | 89 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 83 insertions(+), 6 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 3ba97a6d30..c46ff8ffd6 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -122,12 +122,19 @@ def swallow_async_qword(eid, name, dumpfile):
     print("  %s(%d) @ %d" % (name, eid, step_id))
     return True
 
+def swallow_bytes(eid, name, dumpfile, nr):
+    "Swallow nr bytes of data without looking at it"
+    for x in range(nr):
+         read_byte(dumpfile)
+    return True
+
 async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", swallow_async_qword),
-                       Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
-                       Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
-                       Decoder(3, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
-                       Decoder(4, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
-                       Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
+                       Decoder(1, "REPLAY_ASYNC_BH_ONESHOT", decode_unimp),
+                       Decoder(2, "REPLAY_ASYNC_INPUT", decode_unimp),
+                       Decoder(3, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
+                       Decoder(4, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
+                       Decoder(5, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
+                       Decoder(6, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
 ]
 # See replay_read_events/replay_read_event
 def decode_async(eid, name, dumpfile):
@@ -156,6 +163,13 @@ def decode_audio_out(eid, name, dumpfile):
     print_event(eid, name, "%d" % (audio_data))
     return True
 
+def decode_random(eid, name, dumpfile):
+    ret = read_dword(dumpfile)
+    size = read_dword(dumpfile)
+    swallow_bytes(eid, name, dumpfile, size)
+    print_event(eid, name, "%d %d" % (ret, size))
+    return True
+
 def decode_checkpoint(eid, name, dumpfile):
     """Decode a checkpoint.
 
@@ -184,6 +198,24 @@ def decode_interrupt(eid, name, dumpfile):
     print_event(eid, name)
     return True
 
+def decode_exception(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
+def decode_shutdown(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
+def decode_end(eid, name, dumpfile):
+    print_event(eid, name)
+    return False
+
+def decode_char_write(eid, name, dumpfile):
+    res = read_dword(dumpfile)
+    offset = read_dword(dumpfile)
+    print_event(eid, name)
+    return True
+
 def decode_clock(eid, name, dumpfile):
     clock_data = read_qword(dumpfile)
     print_event(eid, name, "0x%x" % (clock_data))
@@ -268,6 +300,48 @@ def decode_clock(eid, name, dumpfile):
                   Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
 ]
 
+v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
+                  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
+                  Decoder(2, "EVENT_EXCEPTION", decode_exception),
+                  Decoder(3, "EVENT_ASYNC_BH", swallow_async_qword),
+                  Decoder(4, "EVENT_ASYNC_BH_ONESHOT", decode_unimp),
+                  Decoder(5, "EVENT_ASYNC_INPUT", decode_unimp),
+                  Decoder(6, "EVENT_ASYNC_INPUT_SYNC", decode_unimp),
+                  Decoder(7, "EVENT_ASYNC_CHAR_READ", decode_unimp),
+                  Decoder(8, "EVENT_ASYNC_BLOCK", decode_unimp),
+                  Decoder(9, "EVENT_ASYNC_NET", decode_unimp),
+                  Decoder(10, "EVENT_SHUTDOWN", decode_unimp),
+                  Decoder(11, "EVENT_SHUTDOWN_HOST_ERR", decode_shutdown),
+                  Decoder(12, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_shutdown),
+                  Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_shutdown),
+                  Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_shutdown),
+                  Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_shutdown),
+                  Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_shutdown),
+                  Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_shutdown),
+                  Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_shutdown),
+                  Decoder(19, "EVENT_SHUTDOWN_SUBSYS_RESET", decode_shutdown),
+                  Decoder(20, "EVENT_SHUTDOWN_SNAPSHOT_LOAD", decode_shutdown),
+                  Decoder(21, "EVENT_SHUTDOWN___MAX", decode_shutdown),
+                  Decoder(22, "EVENT_CHAR_WRITE", decode_char_write),
+                  Decoder(23, "EVENT_CHAR_READ_ALL", decode_unimp),
+                  Decoder(24, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
+                  Decoder(25, "EVENT_AUDIO_OUT", decode_audio_out),
+                  Decoder(26, "EVENT_AUDIO_IN", decode_unimp),
+                  Decoder(27, "EVENT_RANDOM", decode_random),
+                  Decoder(28, "EVENT_CLOCK_HOST", decode_clock),
+                  Decoder(29, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
+                  Decoder(30, "EVENT_CP_CLOCK_WARP_START", decode_checkpoint),
+                  Decoder(31, "EVENT_CP_CLOCK_WARP_ACCOUNT", decode_checkpoint),
+                  Decoder(32, "EVENT_CP_RESET_REQUESTED", decode_checkpoint),
+                  Decoder(33, "EVENT_CP_SUSPEND_REQUESTED", decode_checkpoint),
+                  Decoder(34, "EVENT_CP_CLOCK_VIRTUAL", decode_checkpoint),
+                  Decoder(35, "EVENT_CP_CLOCK_HOST", decode_checkpoint),
+                  Decoder(36, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
+                  Decoder(37, "EVENT_CP_INIT", decode_checkpoint_init),
+                  Decoder(38, "EVENT_CP_RESET", decode_checkpoint),
+                  Decoder(39, "EVENT_END", decode_end),
+]
+
 def parse_arguments():
     "Grab arguments for script"
     parser = argparse.ArgumentParser()
@@ -285,7 +359,10 @@ def decode_file(filename):
 
     print("HEADER: version 0x%x" % (version))
 
-    if version == 0xe02007:
+    if version == 0xe0200c:
+        event_decode_table = v12_event_table
+        replay_state.checkpoint_start = 12
+    elif version == 0xe02007:
         event_decode_table = v7_event_table
         replay_state.checkpoint_start = 12
     elif version == 0xe02006:
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/7] spapr: Fix machine reset deadlock from replay-record
  2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 2/7] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
@ 2023-06-23 12:57 ` Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events Nicholas Piggin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

When the machine is reset to load a new snapshot while being debugged
with replay-record, it is done from another thread, so the CPU does
not run the register setting operations. Set CPU registers directly in
machine reset.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/ppc/spapr.c         | 20 ++++++++++++++++++--
 include/hw/ppc/spapr.h |  1 +
 target/ppc/compat.c    | 19 +++++++++++++++++++
 target/ppc/cpu.h       |  1 +
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 677b5eef9d..d290acfa95 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1317,6 +1317,22 @@ void spapr_set_all_lpcrs(target_ulong value, target_ulong mask)
     }
 }
 
+/* May be used when the machine is not running */
+void spapr_init_all_lpcrs(target_ulong value, target_ulong mask)
+{
+    CPUState *cs;
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        CPUPPCState *env = &cpu->env;
+        target_ulong lpcr;
+
+        lpcr = env->spr[SPR_LPCR];
+        lpcr &= ~(LPCR_HR | LPCR_UPRT);
+        ppc_store_lpcr(cpu, lpcr);
+    }
+}
+
+
 static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
                            target_ulong lpid, ppc_v3_pate_t *entry)
 {
@@ -1578,7 +1594,7 @@ int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp)
     }
     /* We're setting up a hash table, so that means we're not radix */
     spapr->patb_entry = 0;
-    spapr_set_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
+    spapr_init_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
     return 0;
 }
 
@@ -1656,7 +1672,7 @@ static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
     spapr_ovec_cleanup(spapr->ov5_cas);
     spapr->ov5_cas = spapr_ovec_new();
 
-    ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
+    ppc_init_compat_all(spapr->max_compat_pvr, &error_fatal);
 
     /*
      * This is fixing some of the default configuration of the XIVE
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 538b2dfb89..f47e8419a5 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -1012,6 +1012,7 @@ bool spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 #define SPAPR_OV5_XIVE_BOTH     0x80 /* Only to advertise on the platform */
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
+void spapr_init_all_lpcrs(target_ulong value, target_ulong mask);
 hwaddr spapr_get_rtas_addr(void);
 bool spapr_memory_hot_unplug_supported(SpaprMachineState *spapr);
 
diff --git a/target/ppc/compat.c b/target/ppc/compat.c
index 7949a24f5a..ebef2cccec 100644
--- a/target/ppc/compat.c
+++ b/target/ppc/compat.c
@@ -229,6 +229,25 @@ int ppc_set_compat_all(uint32_t compat_pvr, Error **errp)
     return 0;
 }
 
+/* To be used when the machine is not running */
+int ppc_init_compat_all(uint32_t compat_pvr, Error **errp)
+{
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        int ret;
+
+        ret = ppc_set_compat(cpu, compat_pvr, errp);
+
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 int ppc_compat_max_vthreads(PowerPCCPU *cpu)
 {
     const CompatInfo *compat = compat_by_pvr(cpu->compat_pvr);
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 0087ce66e2..a0f4cec606 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1496,6 +1496,7 @@ int ppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr, Error **errp);
 
 #if !defined(CONFIG_USER_ONLY)
 int ppc_set_compat_all(uint32_t compat_pvr, Error **errp);
+int ppc_init_compat_all(uint32_t compat_pvr, Error **errp);
 #endif
 int ppc_compat_max_vthreads(PowerPCCPU *cpu);
 void ppc_compat_add_property(Object *obj, const char *name,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events
  2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
                   ` (2 preceding siblings ...)
  2023-06-23 12:57 ` [PATCH 3/7] spapr: Fix machine reset deadlock from replay-record Nicholas Piggin
@ 2023-06-23 12:57 ` Nicholas Piggin
  2023-06-26  8:07   ` Pavel Dovgalyuk
  2023-06-23 12:57 ` [PATCH 5/7] target/ppc: Fix timebase reset with record-replay Nicholas Piggin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

spapr_machine_reset gets a random number to populate the device-tree
rng seed with. When loading a snapshot for record-replay, the machine
is reset again, and that tries to consume the random event record
again, crashing due to inconsistent record

Fix this by saving the seed to populate the device tree with, and
skipping the rng on snapshot load.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/ppc/spapr.c         | 12 +++++++++---
 include/hw/ppc/spapr.h |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d290acfa95..55948f233f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1017,7 +1017,6 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt, bool reset)
 {
     MachineState *machine = MACHINE(spapr);
     SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
-    uint8_t rng_seed[32];
     int chosen;
 
     _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
@@ -1095,8 +1094,7 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt, bool reset)
         spapr_dt_ov5_platform_support(spapr, fdt, chosen);
     }
 
-    qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
-    _FDT(fdt_setprop(fdt, chosen, "rng-seed", rng_seed, sizeof(rng_seed)));
+    _FDT(fdt_setprop(fdt, chosen, "rng-seed", spapr->fdt_rng_seed, 32));
 
     _FDT(spapr_dt_ovec(fdt, chosen, spapr->ov5_cas, "ibm,architecture-vec-5"));
 }
@@ -1649,6 +1647,14 @@ static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
     void *fdt;
     int rc;
 
+    if (reason != SHUTDOWN_CAUSE_SNAPSHOT_LOAD) {
+        /*
+         * Record-replay snapshot load must not consume random, this was
+         * already replayed from initial machine reset.
+         */
+        qemu_guest_getrandom_nofail(spapr->fdt_rng_seed, 32);
+    }
+
     pef_kvm_reset(machine->cgs, &error_fatal);
     spapr_caps_apply(spapr);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index f47e8419a5..f4bd204d86 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -204,6 +204,7 @@ struct SpaprMachineState {
     uint32_t fdt_size;
     uint32_t fdt_initial_size;
     void *fdt_blob;
+    uint8_t fdt_rng_seed[32];
     long kernel_size;
     bool kernel_le;
     uint64_t kernel_addr;
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/7] target/ppc: Fix timebase reset with record-replay
  2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
                   ` (3 preceding siblings ...)
  2023-06-23 12:57 ` [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events Nicholas Piggin
@ 2023-06-23 12:57 ` Nicholas Piggin
  2023-06-26  7:52   ` Pavel Dovgalyuk
  2023-06-23 12:57 ` [PATCH 6/7] tests/avocado: boot ppc64 pseries replay-record test to Linux VFS mount Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test Nicholas Piggin
  6 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

Timebase save uses a random number for a legacy vmstate field, which
makes rr snapshot loading unbalanced. The easiest way to deal with this
is just to skip the rng if record-replay is active.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/ppc/ppc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 82e4408c5c..7b7db30f95 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -32,6 +32,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/error-report.h"
 #include "sysemu/kvm.h"
+#include "sysemu/replay.h"
 #include "sysemu/runstate.h"
 #include "kvm_ppc.h"
 #include "migration/vmstate.h"
@@ -933,8 +934,14 @@ static void timebase_save(PPCTimebase *tb)
         return;
     }
 
-    /* not used anymore, we keep it for compatibility */
-    tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
+    if (replay_mode == REPLAY_MODE_NONE) {
+        /* not used anymore, we keep it for compatibility */
+        tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
+    } else {
+        /* simpler for record-replay to avoid this event, compat not needed */
+        tb->time_of_the_day_ns = 0;
+    }
+
     /*
      * tb_offset is only expected to be changed by QEMU so
      * there is no need to update it from KVM here
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/7] tests/avocado: boot ppc64 pseries replay-record test to Linux VFS mount
  2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
                   ` (4 preceding siblings ...)
  2023-06-23 12:57 ` [PATCH 5/7] target/ppc: Fix timebase reset with record-replay Nicholas Piggin
@ 2023-06-23 12:57 ` Nicholas Piggin
  2023-06-23 12:57 ` [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test Nicholas Piggin
  6 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

This the ppc64 record-replay test is able to replay the full kernel boot
so try enabling it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/avocado/replay_kernel.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index fe1e901f4b..a4dc8c0d6c 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -255,8 +255,7 @@ def test_ppc64_pseries(self):
         kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
 
         kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
-        # icount is not good enough for PPC64 for complete boot yet
-        console_pattern = 'Kernel command line: %s' % kernel_command_line
+        console_pattern = 'VFS: Cannot open root device'
         self.run_rr(kernel_path, kernel_command_line, console_pattern)
 
     def test_m68k_q800(self):
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test
  2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
                   ` (5 preceding siblings ...)
  2023-06-23 12:57 ` [PATCH 6/7] tests/avocado: boot ppc64 pseries replay-record test to Linux VFS mount Nicholas Piggin
@ 2023-06-23 12:57 ` Nicholas Piggin
  2023-06-26  7:49   ` Pavel Dovgalyuk
  6 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-23 12:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-ppc, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

pseries can run reverse-debugging well enough to pass basic tests.

There is strangeness with reverse-continue possibly relating to a bp
being set on the first instruction or on a snapshot, that causes
the PC to be reported on the first instruction rather than last
breakpoint, so a workaround is added for that for now.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/avocado/reverse_debugging.py | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index 680c314cfc..553c931994 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -94,7 +94,7 @@ def gdb_bstep(g):
     def vm_get_icount(vm):
         return vm.qmp('query-replay')['return']['icount']
 
-    def reverse_debugging(self, shift=7, args=None):
+    def reverse_debugging(self, shift=7, args=None, initial_skip=False):
         logger = logging.getLogger('replay')
 
         # create qcow2 for snapshots
@@ -135,6 +135,10 @@ def reverse_debugging(self, shift=7, args=None):
             self.fail('Reverse continue is not supported by QEMU')
 
         logger.info('stepping forward')
+
+        if initial_skip:
+            self.gdb_step(g)
+
         steps = []
         # record first instruction addresses
         for _ in range(self.STEPS):
@@ -216,3 +220,25 @@ def test_aarch64_virt(self):
 
         self.reverse_debugging(
             args=('-kernel', kernel_path))
+
+class ReverseDebugging_ppc64(ReverseDebugging):
+    """
+    :avocado: tags=accel:tcg
+    """
+
+    REG_PC = 0x40
+
+    # unidentified gitlab timeout problem
+    @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
+    def test_ppc64_pseries(self):
+        """
+        :avocado: tags=arch:ppc64
+        :avocado: tags=machine:pseries
+        """
+        # start with BIOS only
+        self.endian_is_le = False
+        # reverse-continue fails (seems to end up at the start) if a break
+        # is put on the first instruction. initial_skip skips one before the
+        # first bp, and it works. Could be due to break at the same icount
+        # as the snapshot?
+        self.reverse_debugging(initial_skip=True)
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test
  2023-06-23 12:57 ` [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test Nicholas Piggin
@ 2023-06-26  7:49   ` Pavel Dovgalyuk
  2023-06-26  9:34     ` Nicholas Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Dovgalyuk @ 2023-06-26  7:49 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Greg Kurz, Harsh Prateek Bora, John Snow,
	Cleber Rosa, Pavel Dovgalyuk, Paolo Bonzini,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Peter Maydell, Richard Henderson

On 23.06.2023 15:57, Nicholas Piggin wrote:
> pseries can run reverse-debugging well enough to pass basic tests.
> 
> There is strangeness with reverse-continue possibly relating to a bp
> being set on the first instruction or on a snapshot, that causes
> the PC to be reported on the first instruction rather than last
> breakpoint, so a workaround is added for that for now.

It means that the test reveals some kind of a bug in PPC debugging 
server implementation.
In this case it is better to fix that instead of adding workaround.

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   tests/avocado/reverse_debugging.py | 28 +++++++++++++++++++++++++++-
>   1 file changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
> index 680c314cfc..553c931994 100644
> --- a/tests/avocado/reverse_debugging.py
> +++ b/tests/avocado/reverse_debugging.py
> @@ -94,7 +94,7 @@ def gdb_bstep(g):
>       def vm_get_icount(vm):
>           return vm.qmp('query-replay')['return']['icount']
>   
> -    def reverse_debugging(self, shift=7, args=None):
> +    def reverse_debugging(self, shift=7, args=None, initial_skip=False):
>           logger = logging.getLogger('replay')
>   
>           # create qcow2 for snapshots
> @@ -135,6 +135,10 @@ def reverse_debugging(self, shift=7, args=None):
>               self.fail('Reverse continue is not supported by QEMU')
>   
>           logger.info('stepping forward')
> +
> +        if initial_skip:
> +            self.gdb_step(g)
> +
>           steps = []
>           # record first instruction addresses
>           for _ in range(self.STEPS):
> @@ -216,3 +220,25 @@ def test_aarch64_virt(self):
>   
>           self.reverse_debugging(
>               args=('-kernel', kernel_path))
> +
> +class ReverseDebugging_ppc64(ReverseDebugging):
> +    """
> +    :avocado: tags=accel:tcg
> +    """
> +
> +    REG_PC = 0x40
> +
> +    # unidentified gitlab timeout problem
> +    @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
> +    def test_ppc64_pseries(self):
> +        """
> +        :avocado: tags=arch:ppc64
> +        :avocado: tags=machine:pseries
> +        """
> +        # start with BIOS only
> +        self.endian_is_le = False
> +        # reverse-continue fails (seems to end up at the start) if a break
> +        # is put on the first instruction. initial_skip skips one before the
> +        # first bp, and it works. Could be due to break at the same icount
> +        # as the snapshot?
> +        self.reverse_debugging(initial_skip=True)



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay
  2023-06-23 12:57 ` [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay Nicholas Piggin
@ 2023-06-26  7:49   ` Pavel Dovgalyuk
  2023-07-07  9:23   ` Daniel Henrique Barboza
  1 sibling, 0 replies; 17+ messages in thread
From: Pavel Dovgalyuk @ 2023-06-26  7:49 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Greg Kurz, Harsh Prateek Bora, John Snow,
	Cleber Rosa, Pavel Dovgalyuk, Paolo Bonzini,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Peter Maydell, Richard Henderson

Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>

On 23.06.2023 15:57, Nicholas Piggin wrote:
> ppc only migrates reserve_addr, so the destination machine can get a
> valid reservation with an incorrect reservation value of 0. Prior to
> commit 392d328abe753 ("target/ppc: Ensure stcx size matches larx"),
> this could permit a stcx. to incorrectly succeed. That commit
> inadvertently fixed that bug because the target machine starts with an
> impossible reservation size of 0, so any stcx. will fail.
> 
> This behaviour is permitted by the ISA because reservation loss may
> have implementation-dependent cause. What's more, with KVM machines it
> is impossible save or reasonably restore reservation state. However if
> the vmstate is being used for record-replay, the reservation must be
> saved and restored exactly in order for execution from snapshot to
> match the record.
> 
> This patch deprecates the existing incomplete reserve_addr vmstate,
> and adds a new vmstate subsection with complete reservation state.
> The new vmstate is needed only when record-replay mode is active.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/cpu.h       |  2 ++
>   target/ppc/machine.c   | 26 ++++++++++++++++++++++++--
>   target/ppc/translate.c |  2 ++
>   3 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 4138a25801..0087ce66e2 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1119,7 +1119,9 @@ struct CPUArchState {
>       target_ulong reserve_addr;   /* Reservation address */
>       target_ulong reserve_length; /* Reservation larx op size (bytes) */
>       target_ulong reserve_val;    /* Reservation value */
> +#if defined(TARGET_PPC64)
>       target_ulong reserve_val2;
> +#endif
>   
>       /* These are used in supervisor mode only */
>       target_ulong msr;      /* machine state register */
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index 134b16c625..a817532e5b 100644
> --- a/target/ppc/machine.c
> +++ b/target/ppc/machine.c
> @@ -10,6 +10,7 @@
>   #include "qemu/main-loop.h"
>   #include "kvm_ppc.h"
>   #include "power8-pmu.h"
> +#include "sysemu/replay.h"
>   
>   static void post_load_update_msr(CPUPPCState *env)
>   {
> @@ -671,6 +672,27 @@ static const VMStateDescription vmstate_compat = {
>       }
>   };
>   
> +static bool reservation_needed(void *opaque)
> +{
> +    return (replay_mode != REPLAY_MODE_NONE);
> +}
> +
> +static const VMStateDescription vmstate_reservation = {
> +    .name = "cpu/reservation",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = reservation_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
> +        VMSTATE_UINTTL(env.reserve_length, PowerPCCPU),
> +        VMSTATE_UINTTL(env.reserve_val, PowerPCCPU),
> +#if defined(TARGET_PPC64)
> +        VMSTATE_UINTTL(env.reserve_val2, PowerPCCPU),
> +#endif
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>   const VMStateDescription vmstate_ppc_cpu = {
>       .name = "cpu",
>       .version_id = 5,
> @@ -692,8 +714,7 @@ const VMStateDescription vmstate_ppc_cpu = {
>           VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
>           VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
>   
> -        /* Reservation */
> -        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was env.reserve_addr */
>   
>           /* Supervisor mode architected state */
>           VMSTATE_UINTTL(env.msr, PowerPCCPU),
> @@ -722,6 +743,7 @@ const VMStateDescription vmstate_ppc_cpu = {
>           &vmstate_tlbemb,
>           &vmstate_tlbmas,
>           &vmstate_compat,
> +        &vmstate_reservation,
>           NULL
>       }
>   };
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index c9fb7b40a5..eb278c2683 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -77,7 +77,9 @@ static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
>   static TCGv cpu_reserve;
>   static TCGv cpu_reserve_length;
>   static TCGv cpu_reserve_val;
> +#if defined(TARGET_PPC64)
>   static TCGv cpu_reserve_val2;
> +#endif
>   static TCGv cpu_fpscr;
>   static TCGv_i32 cpu_access_type;
>   



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/7] target/ppc: Fix timebase reset with record-replay
  2023-06-23 12:57 ` [PATCH 5/7] target/ppc: Fix timebase reset with record-replay Nicholas Piggin
@ 2023-06-26  7:52   ` Pavel Dovgalyuk
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Dovgalyuk @ 2023-06-26  7:52 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Greg Kurz, Harsh Prateek Bora, John Snow,
	Cleber Rosa, Pavel Dovgalyuk, Paolo Bonzini,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Peter Maydell, Richard Henderson

Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>

On 23.06.2023 15:57, Nicholas Piggin wrote:
> Timebase save uses a random number for a legacy vmstate field, which
> makes rr snapshot loading unbalanced. The easiest way to deal with this
> is just to skip the rng if record-replay is active.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   hw/ppc/ppc.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 82e4408c5c..7b7db30f95 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -32,6 +32,7 @@
>   #include "qemu/main-loop.h"
>   #include "qemu/error-report.h"
>   #include "sysemu/kvm.h"
> +#include "sysemu/replay.h"
>   #include "sysemu/runstate.h"
>   #include "kvm_ppc.h"
>   #include "migration/vmstate.h"
> @@ -933,8 +934,14 @@ static void timebase_save(PPCTimebase *tb)
>           return;
>       }
>   
> -    /* not used anymore, we keep it for compatibility */
> -    tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
> +    if (replay_mode == REPLAY_MODE_NONE) {
> +        /* not used anymore, we keep it for compatibility */
> +        tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
> +    } else {
> +        /* simpler for record-replay to avoid this event, compat not needed */
> +        tb->time_of_the_day_ns = 0;
> +    }
> +
>       /*
>        * tb_offset is only expected to be changed by QEMU so
>        * there is no need to update it from KVM here



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events
  2023-06-23 12:57 ` [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events Nicholas Piggin
@ 2023-06-26  8:07   ` Pavel Dovgalyuk
  2023-06-26 10:04     ` Nicholas Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Dovgalyuk @ 2023-06-26  8:07 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Greg Kurz, Harsh Prateek Bora, John Snow,
	Cleber Rosa, Pavel Dovgalyuk, Paolo Bonzini,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Peter Maydell, Richard Henderson

e500 has the same problem, I think, according to this issue: 
https://gitlab.com/qemu-project/qemu/-/issues/1634

Btw, ARM virt platform rebuilds fdt only at initialization phase, not 
when reset.
Isn't this behavior correct? Shouldn't PPC platforms do the similar thing?

On 23.06.2023 15:57, Nicholas Piggin wrote:
> spapr_machine_reset gets a random number to populate the device-tree
> rng seed with. When loading a snapshot for record-replay, the machine
> is reset again, and that tries to consume the random event record
> again, crashing due to inconsistent record
> 
> Fix this by saving the seed to populate the device tree with, and
> skipping the rng on snapshot load.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   hw/ppc/spapr.c         | 12 +++++++++---
>   include/hw/ppc/spapr.h |  1 +
>   2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d290acfa95..55948f233f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1017,7 +1017,6 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt, bool reset)
>   {
>       MachineState *machine = MACHINE(spapr);
>       SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
> -    uint8_t rng_seed[32];
>       int chosen;
>   
>       _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
> @@ -1095,8 +1094,7 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt, bool reset)
>           spapr_dt_ov5_platform_support(spapr, fdt, chosen);
>       }
>   
> -    qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
> -    _FDT(fdt_setprop(fdt, chosen, "rng-seed", rng_seed, sizeof(rng_seed)));
> +    _FDT(fdt_setprop(fdt, chosen, "rng-seed", spapr->fdt_rng_seed, 32));
>   
>       _FDT(spapr_dt_ovec(fdt, chosen, spapr->ov5_cas, "ibm,architecture-vec-5"));
>   }
> @@ -1649,6 +1647,14 @@ static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
>       void *fdt;
>       int rc;
>   
> +    if (reason != SHUTDOWN_CAUSE_SNAPSHOT_LOAD) {
> +        /*
> +         * Record-replay snapshot load must not consume random, this was
> +         * already replayed from initial machine reset.
> +         */
> +        qemu_guest_getrandom_nofail(spapr->fdt_rng_seed, 32);
> +    }
> +
>       pef_kvm_reset(machine->cgs, &error_fatal);
>       spapr_caps_apply(spapr);
>   
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index f47e8419a5..f4bd204d86 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -204,6 +204,7 @@ struct SpaprMachineState {
>       uint32_t fdt_size;
>       uint32_t fdt_initial_size;
>       void *fdt_blob;
> +    uint8_t fdt_rng_seed[32];
>       long kernel_size;
>       bool kernel_le;
>       uint64_t kernel_addr;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test
  2023-06-26  7:49   ` Pavel Dovgalyuk
@ 2023-06-26  9:34     ` Nicholas Piggin
  2023-07-21 13:55       ` Nicholas Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-26  9:34 UTC (permalink / raw)
  To: Pavel Dovgalyuk, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Greg Kurz, Harsh Prateek Bora, John Snow,
	Cleber Rosa, Pavel Dovgalyuk, Paolo Bonzini,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Peter Maydell, Richard Henderson

On Mon Jun 26, 2023 at 5:49 PM AEST, Pavel Dovgalyuk wrote:
> On 23.06.2023 15:57, Nicholas Piggin wrote:
> > pseries can run reverse-debugging well enough to pass basic tests.
> > 
> > There is strangeness with reverse-continue possibly relating to a bp
> > being set on the first instruction or on a snapshot, that causes
> > the PC to be reported on the first instruction rather than last
> > breakpoint, so a workaround is added for that for now.
>
> It means that the test reveals some kind of a bug in PPC debugging 
> server implementation.
> In this case it is better to fix that instead of adding workaround.

I agree, and I'm trying to find the cause it hasn't been easy. I
thought the test was still interesting because it otherwise seems
to work well, but hopefully I can find the issue before long.

Thanks,
Nick

>
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   tests/avocado/reverse_debugging.py | 28 +++++++++++++++++++++++++++-
> >   1 file changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
> > index 680c314cfc..553c931994 100644
> > --- a/tests/avocado/reverse_debugging.py
> > +++ b/tests/avocado/reverse_debugging.py
> > @@ -94,7 +94,7 @@ def gdb_bstep(g):
> >       def vm_get_icount(vm):
> >           return vm.qmp('query-replay')['return']['icount']
> >   
> > -    def reverse_debugging(self, shift=7, args=None):
> > +    def reverse_debugging(self, shift=7, args=None, initial_skip=False):
> >           logger = logging.getLogger('replay')
> >   
> >           # create qcow2 for snapshots
> > @@ -135,6 +135,10 @@ def reverse_debugging(self, shift=7, args=None):
> >               self.fail('Reverse continue is not supported by QEMU')
> >   
> >           logger.info('stepping forward')
> > +
> > +        if initial_skip:
> > +            self.gdb_step(g)
> > +
> >           steps = []
> >           # record first instruction addresses
> >           for _ in range(self.STEPS):
> > @@ -216,3 +220,25 @@ def test_aarch64_virt(self):
> >   
> >           self.reverse_debugging(
> >               args=('-kernel', kernel_path))
> > +
> > +class ReverseDebugging_ppc64(ReverseDebugging):
> > +    """
> > +    :avocado: tags=accel:tcg
> > +    """
> > +
> > +    REG_PC = 0x40
> > +
> > +    # unidentified gitlab timeout problem
> > +    @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
> > +    def test_ppc64_pseries(self):
> > +        """
> > +        :avocado: tags=arch:ppc64
> > +        :avocado: tags=machine:pseries
> > +        """
> > +        # start with BIOS only
> > +        self.endian_is_le = False
> > +        # reverse-continue fails (seems to end up at the start) if a break
> > +        # is put on the first instruction. initial_skip skips one before the
> > +        # first bp, and it works. Could be due to break at the same icount
> > +        # as the snapshot?
> > +        self.reverse_debugging(initial_skip=True)



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events
  2023-06-26  8:07   ` Pavel Dovgalyuk
@ 2023-06-26 10:04     ` Nicholas Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-06-26 10:04 UTC (permalink / raw)
  To: Pavel Dovgalyuk, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Greg Kurz, Harsh Prateek Bora, John Snow,
	Cleber Rosa, Pavel Dovgalyuk, Paolo Bonzini,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Peter Maydell, Richard Henderson

On Mon Jun 26, 2023 at 6:07 PM AEST, Pavel Dovgalyuk wrote:
> e500 has the same problem, I think, according to this issue: 
> https://gitlab.com/qemu-project/qemu/-/issues/1634

Same symptoms. e500 looks like it does the dt build in
machine_init_done notifier, though. Maybe I miss something.
I'll take a look.

>
> Btw, ARM virt platform rebuilds fdt only at initialization phase, not 
> when reset.

I was actually wondering about keeping the same rng-seed across resets
to make the code even simpler, but decided to keep behaviour unchanged.
That seems like one downside.

> Isn't this behavior correct? Shouldn't PPC platforms do the similar thing?

I believe spapr does this for an spapr "feature" that rebuilds the fdt
at runtime ("client-architecture-support"), so reset has to build the
original one again.

Thanks,
Nick

>
> On 23.06.2023 15:57, Nicholas Piggin wrote:
> > spapr_machine_reset gets a random number to populate the device-tree
> > rng seed with. When loading a snapshot for record-replay, the machine
> > is reset again, and that tries to consume the random event record
> > again, crashing due to inconsistent record
> > 
> > Fix this by saving the seed to populate the device tree with, and
> > skipping the rng on snapshot load.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   hw/ppc/spapr.c         | 12 +++++++++---
> >   include/hw/ppc/spapr.h |  1 +
> >   2 files changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index d290acfa95..55948f233f 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1017,7 +1017,6 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt, bool reset)
> >   {
> >       MachineState *machine = MACHINE(spapr);
> >       SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
> > -    uint8_t rng_seed[32];
> >       int chosen;
> >   
> >       _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
> > @@ -1095,8 +1094,7 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt, bool reset)
> >           spapr_dt_ov5_platform_support(spapr, fdt, chosen);
> >       }
> >   
> > -    qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
> > -    _FDT(fdt_setprop(fdt, chosen, "rng-seed", rng_seed, sizeof(rng_seed)));
> > +    _FDT(fdt_setprop(fdt, chosen, "rng-seed", spapr->fdt_rng_seed, 32));
> >   
> >       _FDT(spapr_dt_ovec(fdt, chosen, spapr->ov5_cas, "ibm,architecture-vec-5"));
> >   }
> > @@ -1649,6 +1647,14 @@ static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
> >       void *fdt;
> >       int rc;
> >   
> > +    if (reason != SHUTDOWN_CAUSE_SNAPSHOT_LOAD) {
> > +        /*
> > +         * Record-replay snapshot load must not consume random, this was
> > +         * already replayed from initial machine reset.
> > +         */
> > +        qemu_guest_getrandom_nofail(spapr->fdt_rng_seed, 32);
> > +    }
> > +
> >       pef_kvm_reset(machine->cgs, &error_fatal);
> >       spapr_caps_apply(spapr);
> >   
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index f47e8419a5..f4bd204d86 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -204,6 +204,7 @@ struct SpaprMachineState {
> >       uint32_t fdt_size;
> >       uint32_t fdt_initial_size;
> >       void *fdt_blob;
> > +    uint8_t fdt_rng_seed[32];
> >       long kernel_size;
> >       bool kernel_le;
> >       uint64_t kernel_addr;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay
  2023-06-23 12:57 ` [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay Nicholas Piggin
  2023-06-26  7:49   ` Pavel Dovgalyuk
@ 2023-07-07  9:23   ` Daniel Henrique Barboza
  1 sibling, 0 replies; 17+ messages in thread
From: Daniel Henrique Barboza @ 2023-07-07  9:23 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-ppc, Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, John Snow, Cleber Rosa, Pavel Dovgalyuk,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Peter Maydell,
	Richard Henderson

Nick,

On 6/23/23 09:57, Nicholas Piggin wrote:
> ppc only migrates reserve_addr, so the destination machine can get a
> valid reservation with an incorrect reservation value of 0. Prior to
> commit 392d328abe753 ("target/ppc: Ensure stcx size matches larx"),
> this could permit a stcx. to incorrectly succeed. That commit
> inadvertently fixed that bug because the target machine starts with an
> impossible reservation size of 0, so any stcx. will fail.
> 
> This behaviour is permitted by the ISA because reservation loss may
> have implementation-dependent cause. What's more, with KVM machines it
> is impossible save or reasonably restore reservation state. However if
> the vmstate is being used for record-replay, the reservation must be
> saved and restored exactly in order for execution from snapshot to
> match the record.
> 
> This patch deprecates the existing incomplete reserve_addr vmstate,
> and adds a new vmstate subsection with complete reservation state.
> The new vmstate is needed only when record-replay mode is active.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/cpu.h       |  2 ++
>   target/ppc/machine.c   | 26 ++++++++++++++++++++++++--
>   target/ppc/translate.c |  2 ++
>   3 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 4138a25801..0087ce66e2 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1119,7 +1119,9 @@ struct CPUArchState {
>       target_ulong reserve_addr;   /* Reservation address */
>       target_ulong reserve_length; /* Reservation larx op size (bytes) */
>       target_ulong reserve_val;    /* Reservation value */
> +#if defined(TARGET_PPC64)
>       target_ulong reserve_val2;
> +#endif

^ this ifdef breaks gitlab on a handful of runners with this error:

https://gitlab.com/danielhb/qemu/-/jobs/4612030964

[1637/2985] Compiling C object libqemu-ppc-linux-user.fa.p/target_ppc_translate.c.o
FAILED: libqemu-ppc-linux-user.fa.p/target_ppc_translate.c.o
cc -m64 -mcx16 -Ilibqemu-ppc-linux-user.fa.p -I. -I.. -Itarget/ppc -I../target/ppc -I../common-user/host/x86_64 -I../linux-user/include/host/x86_64 -I../linux-user/include -Ilinux-user -I../linux-user -Ilinux-user/ppc -I../linux-user/ppc -Iqapi -Itrace -Iui/shader -I/usr/include/capstone -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g -fstack-protector-strong -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Wundef -Wwrite-strings -Wmissing-prototypes -Wstrict-prototypes -Wredundant-decls -Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wmissing-format-attribute -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi -isystem /builds/danielhb/qemu/linux-headers -isystem linux-headers -iquote . -iquote /builds/danielhb/qemu -iquote /builds/danielhb/qemu/include -iquote /builds/danielhb/qemu/host/include/x86_64 -iquote /builds/danielhb/qemu/host/include/generic -iquote /builds/danielhb/qemu/tcg/i386 -pthread -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv -fPIE -isystem../linux-headers -isystemlinux-headers -DNEED_CPU_H '-DCONFIG_TARGET="ppc-linux-user-config-target.h"' '-DCONFIG_DEVICES="ppc-linux-user-config-devices.h"' -MD -MQ libqemu-ppc-linux-user.fa.p/target_ppc_translate.c.o -MF libqemu-ppc-linux-user.fa.p/target_ppc_translate.c.o.d -o libqemu-ppc-linux-user.fa.p/target_ppc_translate.c.o -c ../target/ppc/translate.c
../target/ppc/translate.c: In function 'ppc_translate_init':
../target/ppc/translate.c:156:5: error: 'cpu_reserve_val2' undeclared (first use in this function); did you mean 'cpu_reserve_val'?
   156 |     cpu_reserve_val2 = tcg_global_mem_new(cpu_env,
       |     ^~~~~~~~~~~~~~~~
       |     cpu_reserve_val
../target/ppc/translate.c:156:5: note: each undeclared identifier is reported only once for each function it appears in
In file included from /usr/include/rpc/netdb.h:42,
                  from /usr/include/netdb.h:32,
                  from /builds/danielhb/qemu/include/sysemu/os-posix.h:34,
                  from /builds/danielhb/qemu/include/qemu/osdep.h:151,
                  from ../target/ppc/translate.c:21:
../target/ppc/translate.c:157:65: error: 'CPUPPCState' {aka 'struct CPUArchState'} has no member named 'reserve_val2'; did you mean 'reserve_val'?
   157 |                                           offsetof(CPUPPCState, reserve_val2),
       |                                                                 ^~~~~~~~~~~~
[1638/2985] Compiling C object libqemu-ppc-linux-user.fa.p/libdecnumber_decNumber.c.o


I'll leave patch 1 and 5 behind for now. Since they're marked as fixes you
can send them during the freeze. Thanks,


Daniel


>   
>       /* These are used in supervisor mode only */
>       target_ulong msr;      /* machine state register */
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index 134b16c625..a817532e5b 100644
> --- a/target/ppc/machine.c
> +++ b/target/ppc/machine.c
> @@ -10,6 +10,7 @@
>   #include "qemu/main-loop.h"
>   #include "kvm_ppc.h"
>   #include "power8-pmu.h"
> +#include "sysemu/replay.h"
>   
>   static void post_load_update_msr(CPUPPCState *env)
>   {
> @@ -671,6 +672,27 @@ static const VMStateDescription vmstate_compat = {
>       }
>   };
>   
> +static bool reservation_needed(void *opaque)
> +{
> +    return (replay_mode != REPLAY_MODE_NONE);
> +}
> +
> +static const VMStateDescription vmstate_reservation = {
> +    .name = "cpu/reservation",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = reservation_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
> +        VMSTATE_UINTTL(env.reserve_length, PowerPCCPU),
> +        VMSTATE_UINTTL(env.reserve_val, PowerPCCPU),
> +#if defined(TARGET_PPC64)
> +        VMSTATE_UINTTL(env.reserve_val2, PowerPCCPU),
> +#endif
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>   const VMStateDescription vmstate_ppc_cpu = {
>       .name = "cpu",
>       .version_id = 5,
> @@ -692,8 +714,7 @@ const VMStateDescription vmstate_ppc_cpu = {
>           VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
>           VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
>   
> -        /* Reservation */
> -        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was env.reserve_addr */
>   
>           /* Supervisor mode architected state */
>           VMSTATE_UINTTL(env.msr, PowerPCCPU),
> @@ -722,6 +743,7 @@ const VMStateDescription vmstate_ppc_cpu = {
>           &vmstate_tlbemb,
>           &vmstate_tlbmas,
>           &vmstate_compat,
> +        &vmstate_reservation,
>           NULL
>       }
>   };
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index c9fb7b40a5..eb278c2683 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -77,7 +77,9 @@ static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
>   static TCGv cpu_reserve;
>   static TCGv cpu_reserve_length;
>   static TCGv cpu_reserve_val;
> +#if defined(TARGET_PPC64)
>   static TCGv cpu_reserve_val2;
> +#endif
>   static TCGv cpu_fpscr;
>   static TCGv_i32 cpu_access_type;
>   

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test
  2023-06-26  9:34     ` Nicholas Piggin
@ 2023-07-21 13:55       ` Nicholas Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-07-21 13:55 UTC (permalink / raw)
  To: Nicholas Piggin, Pavel Dovgalyuk, qemu-devel
  Cc: qemu-ppc, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Greg Kurz, Harsh Prateek Bora, John Snow,
	Cleber Rosa, Pavel Dovgalyuk, Paolo Bonzini,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Peter Maydell, Richard Henderson

On Mon Jun 26, 2023 at 7:34 PM AEST, Nicholas Piggin wrote:
> On Mon Jun 26, 2023 at 5:49 PM AEST, Pavel Dovgalyuk wrote:
> > On 23.06.2023 15:57, Nicholas Piggin wrote:
> > > pseries can run reverse-debugging well enough to pass basic tests.
> > > 
> > > There is strangeness with reverse-continue possibly relating to a bp
> > > being set on the first instruction or on a snapshot, that causes
> > > the PC to be reported on the first instruction rather than last
> > > breakpoint, so a workaround is added for that for now.
> >
> > It means that the test reveals some kind of a bug in PPC debugging 
> > server implementation.
> > In this case it is better to fix that instead of adding workaround.
>
> I agree, and I'm trying to find the cause it hasn't been easy. I
> thought the test was still interesting because it otherwise seems
> to work well, but hopefully I can find the issue before long.

I found the problem after too much staring at record-replay. QEMU works
perfectly, it is the ppc pseries firmware that branches to its own entry
point address within the recorded trace, and that confuses the test
script.

I'm not sure the best way to work around it. Initial skip works okay but
also maybe(?) a good edge case to test a break on the very first
instruction of the trace even if we don't reverse continue to it.

I will send a new series and propose to add the breakpoints before
seeking to the end of the trace, so we will catch a breakpoint being
executed again.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay
  2023-07-26 18:35 [PATCH 0/7] ppc: record-replay fixes and enablement Nicholas Piggin
@ 2023-07-26 18:35 ` Nicholas Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2023-07-26 18:35 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: Nicholas Piggin, Cédric Le Goater, David Gibson, Greg Kurz,
	Harsh Prateek Bora, Pavel Dovgalyuk, Paolo Bonzini, qemu-ppc,
	qemu-devel, Pavel Dovgalyuk

ppc only migrates reserve_addr, so the destination machine can get a
valid reservation with an incorrect reservation value of 0. Prior to
commit 392d328abe753 ("target/ppc: Ensure stcx size matches larx"),
this could permit a stcx. to incorrectly succeed. That commit
inadvertently fixed that bug because the target machine starts with an
impossible reservation size of 0, so any stcx. will fail.

This behaviour is permitted by the ISA because reservation loss may
have implementation-dependent cause. What's more, with KVM machines it
is impossible save or reasonably restore reservation state. However if
the vmstate is being used for record-replay, the reservation must be
saved and restored exactly in order for execution from snapshot to
match the record.

This patch deprecates the existing incomplete reserve_addr vmstate,
and adds a new vmstate subsection with complete reservation state.
The new vmstate is needed only when record-replay mode is active.

Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h       |  2 ++
 target/ppc/machine.c   | 26 ++++++++++++++++++++++++--
 target/ppc/translate.c |  4 ++++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 25fac9577a..27532d8f81 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1121,7 +1121,9 @@ struct CPUArchState {
     target_ulong reserve_addr;   /* Reservation address */
     target_ulong reserve_length; /* Reservation larx op size (bytes) */
     target_ulong reserve_val;    /* Reservation value */
+#if defined(TARGET_PPC64)
     target_ulong reserve_val2;
+#endif
 
     /* These are used in supervisor mode only */
     target_ulong msr;      /* machine state register */
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index ebb37e07d0..9f956b972c 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -10,6 +10,7 @@
 #include "qemu/main-loop.h"
 #include "kvm_ppc.h"
 #include "power8-pmu.h"
+#include "sysemu/replay.h"
 
 static void post_load_update_msr(CPUPPCState *env)
 {
@@ -685,6 +686,27 @@ static const VMStateDescription vmstate_compat = {
     }
 };
 
+static bool reservation_needed(void *opaque)
+{
+    return (replay_mode != REPLAY_MODE_NONE);
+}
+
+static const VMStateDescription vmstate_reservation = {
+    .name = "cpu/reservation",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = reservation_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+        VMSTATE_UINTTL(env.reserve_length, PowerPCCPU),
+        VMSTATE_UINTTL(env.reserve_val, PowerPCCPU),
+#if defined(TARGET_PPC64)
+        VMSTATE_UINTTL(env.reserve_val2, PowerPCCPU),
+#endif
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_ppc_cpu = {
     .name = "cpu",
     .version_id = 5,
@@ -706,8 +728,7 @@ const VMStateDescription vmstate_ppc_cpu = {
         VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
         VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
 
-        /* Reservation */
-        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+        VMSTATE_UNUSED(sizeof(target_ulong)), /* was env.reserve_addr */
 
         /* Supervisor mode architected state */
         VMSTATE_UINTTL(env.msr, PowerPCCPU),
@@ -736,6 +757,7 @@ const VMStateDescription vmstate_ppc_cpu = {
         &vmstate_tlbemb,
         &vmstate_tlbmas,
         &vmstate_compat,
+        &vmstate_reservation,
         NULL
     }
 };
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index e6a0709066..b88c00b4b0 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -77,7 +77,9 @@ static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
 static TCGv cpu_reserve;
 static TCGv cpu_reserve_length;
 static TCGv cpu_reserve_val;
+#if defined(TARGET_PPC64)
 static TCGv cpu_reserve_val2;
+#endif
 static TCGv cpu_fpscr;
 static TCGv_i32 cpu_access_type;
 
@@ -151,9 +153,11 @@ void ppc_translate_init(void)
     cpu_reserve_val = tcg_global_mem_new(cpu_env,
                                          offsetof(CPUPPCState, reserve_val),
                                          "reserve_val");
+#if defined(TARGET_PPC64)
     cpu_reserve_val2 = tcg_global_mem_new(cpu_env,
                                           offsetof(CPUPPCState, reserve_val2),
                                           "reserve_val2");
+#endif
 
     cpu_fpscr = tcg_global_mem_new(cpu_env,
                                    offsetof(CPUPPCState, fpscr), "fpscr");
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-07-26 19:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-23 12:57 [PATCH 0/7] ppc: fix larx migration, fix record-replay Nicholas Piggin
2023-06-23 12:57 ` [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay Nicholas Piggin
2023-06-26  7:49   ` Pavel Dovgalyuk
2023-07-07  9:23   ` Daniel Henrique Barboza
2023-06-23 12:57 ` [PATCH 2/7] scripts/replay_dump.sh: Update to current rr record format Nicholas Piggin
2023-06-23 12:57 ` [PATCH 3/7] spapr: Fix machine reset deadlock from replay-record Nicholas Piggin
2023-06-23 12:57 ` [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events Nicholas Piggin
2023-06-26  8:07   ` Pavel Dovgalyuk
2023-06-26 10:04     ` Nicholas Piggin
2023-06-23 12:57 ` [PATCH 5/7] target/ppc: Fix timebase reset with record-replay Nicholas Piggin
2023-06-26  7:52   ` Pavel Dovgalyuk
2023-06-23 12:57 ` [PATCH 6/7] tests/avocado: boot ppc64 pseries replay-record test to Linux VFS mount Nicholas Piggin
2023-06-23 12:57 ` [PATCH 7/7] tests/avocado: ppc64 pseries reverse debugging test Nicholas Piggin
2023-06-26  7:49   ` Pavel Dovgalyuk
2023-06-26  9:34     ` Nicholas Piggin
2023-07-21 13:55       ` Nicholas Piggin
  -- strict thread matches above, loose matches on Subject: below --
2023-07-26 18:35 [PATCH 0/7] ppc: record-replay fixes and enablement Nicholas Piggin
2023-07-26 18:35 ` [PATCH 1/7] target/ppc: Fix CPU reservation migration for record-replay Nicholas Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).