qemu-devel.nongnu.org archive mirror
* [PATCH v6 00/10] replay: fixes and new test cases
@ 2024-08-13  5:06 Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 01/10] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

Since v5, I cut down the series significantly to just the
better-reviewed parts, without adding new CI testing, since there
are still a few hiccups. aarch64 had some hangs that Alex noticed,
and x86_64 doesn't seem to be working anymore for me (with the big
replay_linux.py test). But with this series, things are much closer:
ppc64 does get through replay_linux.py (but that requires some
ppc-specific fixes and the new test to be added, so I leave that out
for now).

Hopefully we can get this minimal series in and in the next
release I'll try to get something stable enough for CI so it
doesn't keep breaking.

Thanks,
Nick

Nicholas Piggin (10):
  scripts/replay-dump.py: Update to current rr record format
  scripts/replay-dump.py: rejig decoders in event number order
  tests/avocado: excercise scripts/replay-dump.py in replay tests
  replay: allow runstate shutdown->running when replaying trace
  Revert "replay: stop us hanging in rr_wait_io_event"
  tests/avocado: replay_kernel.py add x86-64 q35 machine test
  chardev: set record/replay on the base device of a muxed device
  virtio-net: Use replay_schedule_bh_event for bhs that affect machine
    state
  virtio-net: Use virtual time for RSC timers
  savevm: Fix load_snapshot error path crash

 include/sysemu/replay.h        |   5 -
 include/sysemu/runstate.h      |   1 +
 accel/tcg/tcg-accel-ops-rr.c   |   2 +-
 chardev/char.c                 |  71 +++++++++-----
 hw/net/virtio-net.c            |  17 ++--
 migration/savevm.c             |   1 +
 replay/replay.c                |  23 +----
 system/runstate.c              |  31 +++++-
 scripts/replay-dump.py         | 167 ++++++++++++++++++++++-----------
 tests/avocado/replay_kernel.py |  31 +++++-
 tests/avocado/replay_linux.py  |  10 ++
 11 files changed, 245 insertions(+), 114 deletions(-)

-- 
2.45.2




* [PATCH v6 01/10] scripts/replay-dump.py: Update to current rr record format
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 02/10] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

The v12 format support for replay-dump still has a few issues. This
fixes async decoding; adds event, shutdown, and end decoding; fixes
audio in/out events; and fixes checkpoint checking of following
async events.
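
As an illustration of the async decoding described above, here is a
minimal sketch of how a v12 async net record could be parsed. It
mirrors the field order used by decode_async_net in this patch (one
byte of net id, a dword of flags, a dword of size, then the payload).
The big-endian layout, the local read_byte/read_dword helpers, and the
sample record are assumptions made for this example, not copied from
the script.

```python
import io
import struct


def read_byte(f):
    """Read one unsigned byte (assumed encoding for this sketch)."""
    return struct.unpack(">B", f.read(1))[0]


def read_dword(f):
    """Read one unsigned 32-bit value, assumed big-endian."""
    return struct.unpack(">I", f.read(4))[0]


def decode_async_net(f):
    """Return (net_id, flags, size) and skip over the packet payload."""
    net_id = read_byte(f)
    flags = read_dword(f)
    size = read_dword(f)
    # The payload is skipped rather than inspected, like swallow_bytes().
    f.seek(size, io.SEEK_CUR)
    return net_id, flags, size


# Hypothetical record: net id 1, flags 0, a 4-byte payload, followed by
# a single byte holding the next event number (39, EVENT_END in v12).
record = struct.pack(">BII", 1, 0, 4) + b"\xde\xad\xbe\xef" + bytes([39])
stream = io.BytesIO(record)

decoded = decode_async_net(stream)
next_event = read_byte(stream)
print(decoded, next_event)
```

Skipping the payload with a relative seek keeps the stream position
correct for the next event without having to hold the packet data in
memory.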

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 scripts/replay-dump.py | 127 ++++++++++++++++++++++++++++++-----------
 1 file changed, 93 insertions(+), 34 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index d668193e79..419ee3257b 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -20,6 +20,7 @@
 
 import argparse
 import struct
+import os
 from collections import namedtuple
 from os import path
 
@@ -134,6 +135,17 @@ def swallow_async_qword(eid, name, dumpfile):
     print("  %s(%d) @ %d" % (name, eid, step_id))
     return True
 
+def swallow_bytes(eid, name, dumpfile, nr):
+    """Swallow nr bytes of data without looking at it"""
+    dumpfile.seek(nr, os.SEEK_CUR)
+
+def decode_exception(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
+# v12 does away with the additional event byte and encodes it in the main type
+# Between v8 and v9, REPLAY_ASYNC_BH_ONESHOT was added, but we don't decode
+# those versions so leave it out.
 async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", swallow_async_qword),
                        Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
                        Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
@@ -142,8 +154,8 @@ def swallow_async_qword(eid, name, dumpfile):
                        Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
 ]
 # See replay_read_events/replay_read_event
-def decode_async(eid, name, dumpfile):
-    """Decode an ASYNC event"""
+def decode_async_old(eid, name, dumpfile):
+    """Decode an ASYNC event (pre-v8)"""
 
     print_event(eid, name)
 
@@ -157,6 +169,35 @@ def decode_async(eid, name, dumpfile):
 
     return call_decode(async_decode_table, async_event_kind, dumpfile)
 
+def decode_async_bh(eid, name, dumpfile):
+    op_id = read_qword(dumpfile)
+    print_event(eid, name)
+    return True
+
+def decode_async_bh_oneshot(eid, name, dumpfile):
+    op_id = read_qword(dumpfile)
+    print_event(eid, name)
+    return True
+
+def decode_async_char_read(eid, name, dumpfile):
+    char_id = read_byte(dumpfile)
+    size = read_dword(dumpfile)
+    print_event(eid, name, "device:%x chars:%s" % (char_id, dumpfile.read(size)))
+    return True
+
+def decode_async_block(eid, name, dumpfile):
+    op_id = read_qword(dumpfile)
+    print_event(eid, name)
+    return True
+
+def decode_async_net(eid, name, dumpfile):
+    net_id = read_byte(dumpfile)
+    flags = read_dword(dumpfile)
+    size = read_dword(dumpfile)
+    swallow_bytes(eid, name, dumpfile, size)
+    print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size))
+    return True
+
 total_insns = 0
 
 def decode_instruction(eid, name, dumpfile):
@@ -166,6 +207,10 @@ def decode_instruction(eid, name, dumpfile):
     print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
     return True
 
+def decode_shutdown(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
 def decode_char_write(eid, name, dumpfile):
     res = read_dword(dumpfile)
     offset = read_dword(dumpfile)
@@ -177,7 +222,7 @@ def decode_audio_out(eid, name, dumpfile):
     print_event(eid, name, "%d" % (audio_data))
     return True
 
-def decode_checkpoint(eid, name, dumpfile):
+def __decode_checkpoint(eid, name, dumpfile, old):
     """Decode a checkpoint.
 
     Checkpoints contain a series of async events with their own specific data.
@@ -189,14 +234,20 @@ def decode_checkpoint(eid, name, dumpfile):
 
     # if the next event is EVENT_ASYNC there are a bunch of
     # async events to read, otherwise we are done
-    if next_event != 3:
-        print_event(eid, name, "no additional data", event_number)
-    else:
+    if (old and next_event == 3) or (not old and next_event >= 3 and next_event <= 9):
         print_event(eid, name, "more data follows", event_number)
+    else:
+        print_event(eid, name, "no additional data", event_number)
 
     replay_state.reuse_event(next_event)
     return True
 
+def decode_checkpoint_old(eid, name, dumpfile):
+    return __decode_checkpoint(eid, name, dumpfile, True)
+
+def decode_checkpoint(eid, name, dumpfile):
+    return __decode_checkpoint(eid, name, dumpfile, False)
+
 def decode_checkpoint_init(eid, name, dumpfile):
     print_event(eid, name)
     return True
@@ -212,15 +263,23 @@ def decode_clock(eid, name, dumpfile):
 
 def decode_random(eid, name, dumpfile):
     ret = read_dword(dumpfile)
-    data = read_array(dumpfile)
-    print_event(eid, "%d bytes of random data" % len(data))
+    size = read_dword(dumpfile)
+    swallow_bytes(eid, name, dumpfile, size)
+    if (ret):
+        print_event(eid, name, "%d bytes (getrandom failed)" % (size))
+    else:
+        print_event(eid, name, "%d bytes" % (size))
     return True
 
+def decode_end(eid, name, dumpfile):
+    print_event(eid, name)
+    return False
+
 # pre-MTTCG merge
 v5_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
                   Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
                   Decoder(2, "EVENT_EXCEPTION", decode_plain),
-                  Decoder(3, "EVENT_ASYNC", decode_async),
+                  Decoder(3, "EVENT_ASYNC", decode_async_old),
                   Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
                   Decoder(5, "EVENT_CHAR_WRITE", decode_char_write),
                   Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
@@ -242,7 +301,7 @@ def decode_random(eid, name, dumpfile):
 v6_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
                   Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
                   Decoder(2, "EVENT_EXCEPTION", decode_plain),
-                  Decoder(3, "EVENT_ASYNC", decode_async),
+                  Decoder(3, "EVENT_ASYNC", decode_async_old),
                   Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
                   Decoder(5, "EVENT_CHAR_WRITE", decode_char_write),
                   Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
@@ -266,7 +325,7 @@ def decode_random(eid, name, dumpfile):
 v7_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
                   Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
                   Decoder(2, "EVENT_EXCEPTION", decode_unimp),
-                  Decoder(3, "EVENT_ASYNC", decode_async),
+                  Decoder(3, "EVENT_ASYNC", decode_async_old),
                   Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
                   Decoder(5, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp),
                   Decoder(6, "EVENT_SHUTDOWN_HOST_QMP", decode_unimp),
@@ -296,32 +355,31 @@ def decode_random(eid, name, dumpfile):
 
 v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
                   Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
-                  Decoder(2, "EVENT_EXCEPTION", decode_plain),
-                  Decoder(3, "EVENT_ASYNC", decode_async),
-                  Decoder(4, "EVENT_ASYNC", decode_async),
-                  Decoder(5, "EVENT_ASYNC", decode_async),
-                  Decoder(6, "EVENT_ASYNC", decode_async),
-                  Decoder(6, "EVENT_ASYNC", decode_async),
-                  Decoder(8, "EVENT_ASYNC", decode_async),
-                  Decoder(9, "EVENT_ASYNC", decode_async),
-                  Decoder(10, "EVENT_ASYNC", decode_async),
-                  Decoder(11, "EVENT_SHUTDOWN", decode_unimp),
-                  Decoder(12, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp),
-                  Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_unimp),
-                  Decoder(14, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_unimp),
-                  Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_unimp),
-                  Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_unimp),
-                  Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_unimp),
-                  Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_unimp),
-                  Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_unimp),
-                  Decoder(19, "EVENT_SHUTDOWN_GUEST_SUBSYSTEM_RESET", decode_unimp),
-                  Decoder(20, "EVENT_SHUTDOWN_GUEST_SNAPSHOT_LOAD", decode_unimp),
-                  Decoder(21, "EVENT_SHUTDOWN___MAX", decode_unimp),
+                  Decoder(2, "EVENT_EXCEPTION", decode_exception),
+                  Decoder(3, "EVENT_ASYNC_BH", decode_async_bh),
+                  Decoder(4, "EVENT_ASYNC_BH_ONESHOT", decode_async_bh_oneshot),
+                  Decoder(5, "EVENT_ASYNC_INPUT", decode_unimp),
+                  Decoder(6, "EVENT_ASYNC_INPUT_SYNC", decode_unimp),
+                  Decoder(7, "EVENT_ASYNC_CHAR_READ", decode_async_char_read),
+                  Decoder(8, "EVENT_ASYNC_BLOCK", decode_async_block),
+                  Decoder(9, "EVENT_ASYNC_NET", decode_async_net),
+                  Decoder(10, "EVENT_SHUTDOWN", decode_shutdown),
+                  Decoder(11, "EVENT_SHUTDOWN_HOST_ERR", decode_shutdown),
+                  Decoder(12, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_shutdown),
+                  Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_shutdown),
+                  Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_shutdown),
+                  Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_shutdown),
+                  Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_shutdown),
+                  Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_shutdown),
+                  Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_shutdown),
+                  Decoder(19, "EVENT_SHUTDOWN_SUBSYS_RESET", decode_shutdown),
+                  Decoder(20, "EVENT_SHUTDOWN_SNAPSHOT_LOAD", decode_shutdown),
+                  Decoder(21, "EVENT_SHUTDOWN___MAX", decode_shutdown),
                   Decoder(22, "EVENT_CHAR_WRITE", decode_char_write),
                   Decoder(23, "EVENT_CHAR_READ_ALL", decode_unimp),
                   Decoder(24, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
-                  Decoder(25, "EVENT_AUDIO_IN", decode_unimp),
-                  Decoder(26, "EVENT_AUDIO_OUT", decode_audio_out),
+                  Decoder(25, "EVENT_AUDIO_OUT", decode_audio_out),
+                  Decoder(26, "EVENT_AUDIO_IN", decode_unimp),
                   Decoder(27, "EVENT_RANDOM", decode_random),
                   Decoder(28, "EVENT_CLOCK_HOST", decode_clock),
                   Decoder(29, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
@@ -334,6 +392,7 @@ def decode_random(eid, name, dumpfile):
                   Decoder(36, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
                   Decoder(37, "EVENT_CP_INIT", decode_checkpoint_init),
                   Decoder(38, "EVENT_CP_RESET", decode_checkpoint),
+                  Decoder(39, "EVENT_END", decode_end),
 ]
 
 def parse_arguments():
-- 
2.45.2




* [PATCH v6 02/10] scripts/replay-dump.py: rejig decoders in event number order
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 01/10] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 03/10] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

Sort decoder functions to be ascending in order of event number,
same as the decoder tables.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 scripts/replay-dump.py | 56 +++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 419ee3257b..b82659cfb6 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -139,6 +139,19 @@ def swallow_bytes(eid, name, dumpfile, nr):
     """Swallow nr bytes of data without looking at it"""
     dumpfile.seek(nr, os.SEEK_CUR)
 
+total_insns = 0
+
+def decode_instruction(eid, name, dumpfile):
+    global total_insns
+    ins_diff = read_dword(dumpfile)
+    total_insns += ins_diff
+    print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
+    return True
+
+def decode_interrupt(eid, name, dumpfile):
+    print_event(eid, name)
+    return True
+
 def decode_exception(eid, name, dumpfile):
     print_event(eid, name)
     return True
@@ -198,15 +211,6 @@ def decode_async_net(eid, name, dumpfile):
     print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size))
     return True
 
-total_insns = 0
-
-def decode_instruction(eid, name, dumpfile):
-    global total_insns
-    ins_diff = read_dword(dumpfile)
-    total_insns += ins_diff
-    print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
-    return True
-
 def decode_shutdown(eid, name, dumpfile):
     print_event(eid, name)
     return True
@@ -222,6 +226,21 @@ def decode_audio_out(eid, name, dumpfile):
     print_event(eid, name, "%d" % (audio_data))
     return True
 
+def decode_random(eid, name, dumpfile):
+    ret = read_dword(dumpfile)
+    size = read_dword(dumpfile)
+    swallow_bytes(eid, name, dumpfile, size)
+    if (ret):
+        print_event(eid, name, "%d bytes (getrandom failed)" % (size))
+    else:
+        print_event(eid, name, "%d bytes" % (size))
+    return True
+
+def decode_clock(eid, name, dumpfile):
+    clock_data = read_qword(dumpfile)
+    print_event(eid, name, "0x%x" % (clock_data))
+    return True
+
 def __decode_checkpoint(eid, name, dumpfile, old):
     """Decode a checkpoint.
 
@@ -252,25 +271,6 @@ def decode_checkpoint_init(eid, name, dumpfile):
     print_event(eid, name)
     return True
 
-def decode_interrupt(eid, name, dumpfile):
-    print_event(eid, name)
-    return True
-
-def decode_clock(eid, name, dumpfile):
-    clock_data = read_qword(dumpfile)
-    print_event(eid, name, "0x%x" % (clock_data))
-    return True
-
-def decode_random(eid, name, dumpfile):
-    ret = read_dword(dumpfile)
-    size = read_dword(dumpfile)
-    swallow_bytes(eid, name, dumpfile, size)
-    if (ret):
-        print_event(eid, name, "%d bytes (getrandom failed)" % (size))
-    else:
-        print_event(eid, name, "%d bytes" % (size))
-    return True
-
 def decode_end(eid, name, dumpfile):
     print_event(eid, name)
     return False
-- 
2.45.2




* [PATCH v6 03/10] tests/avocado: excercise scripts/replay-dump.py in replay tests
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 01/10] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 02/10] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 04/10] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

This runs replay-dump.py after recording a trace, and fails the test if
the script fails.

replay-dump.py is modified to exit with a non-zero status if an error
is encountered while parsing, to support this.
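
The reliance on the exit status can be sketched as follows. This is
only an illustration of how subprocess.check_call turns a non-zero
exit into a CalledProcessError. The dump_succeeds helper and the two
stand-in child commands are hypothetical; the real test invokes
./scripts/replay-dump.py instead.

```python
import subprocess
import sys


def dump_succeeds(cmd):
    """Run cmd and report whether it exited with status 0."""
    try:
        subprocess.check_call(cmd, stdout=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        # check_call raises this for any non-zero exit status.
        return False
    return True


# Stand-ins for a dump that parses cleanly and one that hits an error.
ok = dump_succeeds([sys.executable, "-c", "import sys; sys.exit(0)"])
failed = dump_succeeds([sys.executable, "-c", "import sys; sys.exit(1)"])
print(ok, failed)
```

Without the sys.exit(1) added to the script's exception handler, a
parse error would only be printed and the process would still exit
with status 0, so the test could not detect it.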

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

v5: Update timeout to 180s because x86 was just exceeding 120s in
gitlab with this change
---
 scripts/replay-dump.py         |  6 ++++--
 tests/avocado/replay_kernel.py | 13 ++++++++++++-
 tests/avocado/replay_linux.py  | 10 ++++++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index b82659cfb6..4ce7ff51cc 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -21,6 +21,7 @@
 import argparse
 import struct
 import os
+import sys
 from collections import namedtuple
 from os import path
 
@@ -100,7 +101,7 @@ def call_decode(table, index, dumpfile):
         print("Could not decode index: %d" % (index))
         print("Entry is: %s" % (decoder))
         print("Decode Table is:\n%s" % (table))
-        return False
+        raise(Exception("unknown event"))
     else:
         return decoder.fn(decoder.eid, decoder.name, dumpfile)
 
@@ -121,7 +122,7 @@ def print_event(eid, name, string=None, event_count=None):
 def decode_unimp(eid, name, _unused_dumpfile):
     "Unimplemented decoder, will trigger exit"
     print("%s not handled - will now stop" % (name))
-    return False
+    raise(Exception("unhandled event"))
 
 def decode_plain(eid, name, _unused_dumpfile):
     "Plain events without additional data"
@@ -434,6 +435,7 @@ def decode_file(filename):
                                     dumpfile)
     except Exception as inst:
         print(f"error {inst}")
+        sys.exit(1)
 
     finally:
         print(f"Reached {dumpfile.tell()} of {dumpsize} bytes")
diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index 232d287c27..a668af9d36 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -13,6 +13,7 @@
 import shutil
 import logging
 import time
+import subprocess
 
 from avocado import skip
 from avocado import skipUnless
@@ -31,7 +32,7 @@ class ReplayKernelBase(LinuxKernelTest):
     terminates.
     """
 
-    timeout = 120
+    timeout = 180
     KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
 
     def run_vm(self, kernel_path, kernel_command_line, console_pattern,
@@ -63,6 +64,8 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern,
             vm.shutdown()
             logger.info('finished the recording with log size %s bytes'
                         % os.path.getsize(replay_path))
+            self.run_replay_dump(replay_path)
+            logger.info('successfully tested replay-dump.py')
         else:
             vm.wait()
             logger.info('successfully finished the replay')
@@ -70,6 +73,14 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern,
         logger.info('elapsed time %.2f sec' % elapsed)
         return elapsed
 
+    def run_replay_dump(self, replay_path):
+        try:
+            subprocess.check_call(["./scripts/replay-dump.py",
+                                   "-f", replay_path],
+                                  stdout=subprocess.DEVNULL)
+        except subprocess.CalledProcessError:
+            self.fail('replay-dump.py failed')
+
     def run_rr(self, kernel_path, kernel_command_line, console_pattern,
                shift=7, args=None):
         replay_path = os.path.join(self.workdir, 'replay.bin')
diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index b4673261ce..5916922435 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -94,6 +94,8 @@ def launch_and_wait(self, record, args, shift):
             vm.shutdown()
             logger.info('finished the recording with log size %s bytes'
                 % os.path.getsize(replay_path))
+            self.run_replay_dump(replay_path)
+            logger.info('successfully tested replay-dump.py')
         else:
             vm.event_wait('SHUTDOWN', self.timeout)
             vm.wait()
@@ -108,6 +110,14 @@ def run_rr(self, args=None, shift=7):
         logger = logging.getLogger('replay')
         logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1))
 
+    def run_replay_dump(self, replay_path):
+        try:
+            subprocess.check_call(["./scripts/replay-dump.py",
+                                   "-f", replay_path],
+                                  stdout=subprocess.DEVNULL)
+        except subprocess.CalledProcessError:
+            self.fail('replay-dump.py failed')
+
 @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
 class ReplayLinuxX8664(ReplayLinux):
     """
-- 
2.45.2




* [PATCH v6 04/10] replay: allow runstate shutdown->running when replaying trace
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (2 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 03/10] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 05/10] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

When replaying a trace, it is possible to go from shutdown to running
with a reverse-debugging step. This can be useful if the problem being
debugged triggers a reset or shutdown.

This can be tested by making a recording of a machine that shuts down,
then using -action shutdown=pause when replaying it. Continuing to the
end of the trace and then reverse-stepping in gdb crashes due to an
invalid runstate transition.

Just permitting the transition seems to be all that's necessary for
reverse-debugging to work well in such a state.
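
The idea can be modelled in a few lines of Python. This is a
simplified sketch, not the C code in system/runstate.c: the RunState
enum here is a hypothetical three-state subset, and the base table is
illustrative rather than a copy of runstate_transitions_def. It shows
a base table of valid transitions plus an extra table that is only
merged in when replaying.

```python
from enum import IntEnum


class RunState(IntEnum):
    # Hypothetical subset of run states, for illustration only.
    RUNNING = 0
    PAUSED = 1
    SHUTDOWN = 2


BASE_TRANSITIONS = [
    (RunState.RUNNING, RunState.PAUSED),
    (RunState.PAUSED, RunState.RUNNING),
    (RunState.RUNNING, RunState.SHUTDOWN),
    (RunState.SHUTDOWN, RunState.PAUSED),
]

# Extra transition permitted only in replay (play) mode.
REPLAY_PLAY_TRANSITIONS = [
    (RunState.SHUTDOWN, RunState.RUNNING),
]


def build_valid(base, extra=()):
    """Build a from/to matrix of allowed transitions."""
    n = len(RunState)
    valid = [[False] * n for _ in range(n)]
    for src, dst in list(base) + list(extra):
        valid[src][dst] = True
    return valid


normal = build_valid(BASE_TRANSITIONS)
replaying = build_valid(BASE_TRANSITIONS, REPLAY_PLAY_TRANSITIONS)
print(normal[RunState.SHUTDOWN][RunState.RUNNING],
      replaying[RunState.SHUTDOWN][RunState.RUNNING])
```

Keeping the replay-only transitions in a separate table means the
default behaviour is unchanged when record/replay is not in use.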

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/sysemu/runstate.h |  1 +
 replay/replay.c           |  2 ++
 system/runstate.c         | 31 ++++++++++++++++++++++++++++---
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index e210a37abf..11c7ff3ffb 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -9,6 +9,7 @@ void runstate_set(RunState new_state);
 RunState runstate_get(void);
 bool runstate_is_running(void);
 bool runstate_needs_reset(void);
+void runstate_replay_enable(void);
 
 typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
 
diff --git a/replay/replay.c b/replay/replay.c
index a2c576c16e..b8564a4813 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -385,6 +385,8 @@ static void replay_enable(const char *fname, int mode)
         replay_fetch_data_kind();
     }
 
+    runstate_replay_enable();
+
     replay_init_events();
 }
 
diff --git a/system/runstate.c b/system/runstate.c
index c833316f6d..a0e2a5fd22 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -181,6 +181,12 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE__MAX, RUN_STATE__MAX },
 };
 
+static const RunStateTransition replay_play_runstate_transitions_def[] = {
+    { RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING},
+
+    { RUN_STATE__MAX, RUN_STATE__MAX },
+};
+
 static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
 
 bool runstate_check(RunState state)
@@ -188,14 +194,33 @@ bool runstate_check(RunState state)
     return current_run_state == state;
 }
 
-static void runstate_init(void)
+static void transitions_set_valid(const RunStateTransition *rst)
 {
     const RunStateTransition *p;
 
-    memset(&runstate_valid_transitions, 0, sizeof(runstate_valid_transitions));
-    for (p = &runstate_transitions_def[0]; p->from != RUN_STATE__MAX; p++) {
+    for (p = rst; p->from != RUN_STATE__MAX; p++) {
         runstate_valid_transitions[p->from][p->to] = true;
     }
+}
+
+void runstate_replay_enable(void)
+{
+    assert(replay_mode != REPLAY_MODE_NONE);
+
+    if (replay_mode == REPLAY_MODE_PLAY) {
+        /*
+         * When reverse-debugging, it is possible to move state from
+         * shutdown to running.
+         */
+        transitions_set_valid(&replay_play_runstate_transitions_def[0]);
+    }
+}
+
+static void runstate_init(void)
+{
+    memset(&runstate_valid_transitions, 0, sizeof(runstate_valid_transitions));
+
+    transitions_set_valid(&runstate_transitions_def[0]);
 
     qemu_mutex_init(&vmstop_lock);
 }
-- 
2.45.2




* [PATCH v6 05/10] Revert "replay: stop us hanging in rr_wait_io_event"
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (3 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 04/10] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 06/10] tests/avocado: replay_kernel.py add x86-64 q35 machine test Nicholas Piggin
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

This reverts commit 1f881ea4a444ef36a8b6907b0b82be4b3af253a2.

That commit causes reverse_debugging.py test failures, and does
not seem to solve the root cause of the problem: x86-64 still
hangs in record/replay tests.

The problem with short-cutting the iowait that was taken during
record phase is that related events will not get consumed at the
same points (e.g., reading the clock).

A hang with zero icount always seems to be a symptom of an earlier
problem that has caused the recording to become out of sync with
the execution and consumption of events by replay.
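
A toy model may help show why skipping a wait desynchronises replay.
It is not QEMU's replay logic: the event names, the replay helper, and
the two step lists are all hypothetical. The only assumption carried
over from the text is that replay must consume recorded events in the
same order and at the same points as the recording produced them.

```python
def replay(log, steps):
    """Consume log entries in order.

    Returns the index of the first step whose expected event kind does
    not match the next recorded event, or None if all steps match.
    """
    pos = 0
    for i, kind in enumerate(steps):
        if pos >= len(log) or log[pos] != kind:
            return i
        pos += 1
    return None


# During recording, the clock was read while waiting for I/O.
recorded = ["instruction", "clock", "interrupt"]

# A faithful replay takes the same wait and reads the clock there.
faithful = ["instruction", "clock", "interrupt"]

# A replay that short-cuts the wait never reads the clock, so the
# next thing it asks for no longer matches the head of the log.
shortcut = ["instruction", "interrupt"]

print(replay(recorded, faithful), replay(recorded, shortcut))
```

In this model the mismatch shows up one step after the skipped wait,
which matches the observation that the eventual hang is a symptom of
an earlier divergence.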

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/sysemu/replay.h      |  5 -----
 accel/tcg/tcg-accel-ops-rr.c |  2 +-
 replay/replay.c              | 21 ---------------------
 3 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index f229b2109c..8102fa54f0 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -73,11 +73,6 @@ int replay_get_instructions(void);
 /*! Updates instructions counter in replay mode. */
 void replay_account_executed_instructions(void);
 
-/**
- * replay_can_wait: check if we should pause for wait-io
- */
-bool replay_can_wait(void);
-
 /* Processing clocks and other time sources */
 
 /*! Save the specified clock */
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 48c38714bd..c59c77da4b 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
 {
     CPUState *cpu;
 
-    while (all_cpu_threads_idle() && replay_can_wait()) {
+    while (all_cpu_threads_idle()) {
         rr_stop_kick_timer();
         qemu_cond_wait_bql(first_cpu->halt_cond);
     }
diff --git a/replay/replay.c b/replay/replay.c
index b8564a4813..895fa6b67a 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -451,27 +451,6 @@ void replay_start(void)
     replay_enable_events();
 }
 
-/*
- * For none/record the answer is yes.
- */
-bool replay_can_wait(void)
-{
-    if (replay_mode == REPLAY_MODE_PLAY) {
-        /*
-         * For playback we shouldn't ever be at a point we wait. If
-         * the instruction count has reached zero and we have an
-         * unconsumed event we should go around again and consume it.
-         */
-        if (replay_state.instruction_count == 0 && replay_state.has_unread_data) {
-            return false;
-        } else {
-            replay_sync_error("Playback shouldn't have to iowait");
-        }
-    }
-    return true;
-}
-
-
 void replay_finish(void)
 {
     if (replay_mode == REPLAY_MODE_NONE) {
-- 
2.45.2




* [PATCH v6 06/10] tests/avocado: replay_kernel.py add x86-64 q35 machine test
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (4 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 05/10] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 07/10] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

The x86-64 pc machine is flaky with record/replay, but q35 is more
stable. Add a q35 test to replay_kernel.py.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/avocado/replay_kernel.py | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index a668af9d36..e22c200a36 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -110,7 +110,7 @@ def test_i386_pc(self):
         self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
 
     # See https://gitlab.com/qemu-project/qemu/-/issues/2094
-    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test sometimes gets stuck')
+    @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'pc machine is unstable with replay')
     def test_x86_64_pc(self):
         """
         :avocado: tags=arch:x86_64
@@ -128,6 +128,22 @@ def test_x86_64_pc(self):
 
         self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
 
+    def test_x86_64_q35(self):
+        """
+        :avocado: tags=arch:x86_64
+        :avocado: tags=machine:q35
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/29/Everything/x86_64/os/images/pxeboot'
+                      '/vmlinuz')
+        kernel_hash = '23bebd2680757891cf7adedb033532163a792495'
+        kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+        kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
+        console_pattern = 'VFS: Cannot open root device'
+
+        self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
+
     def test_mips_malta(self):
         """
         :avocado: tags=arch:mips
-- 
2.45.2




* [PATCH v6 07/10] chardev: set record/replay on the base device of a muxed device
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (5 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 06/10] tests/avocado: replay_kernel.py add x86-64 q35 machine test Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 08/10] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

chardev events to a muxed device don't get recorded because, e.g.,
qemu_chr_be_write() checks whether the base device has the record flag
set, and the flag was only being set on the mux.

This can be seen when replaying a trace that has characters typed into
the console: an examination of the log shows they are not recorded.

Setting QEMU_CHAR_FEATURE_REPLAY on the base chardev fixes the problem.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 chardev/char.c | 71 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 21 deletions(-)
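
Not part of the patch: a minimal standalone sketch of the problem described
above, assuming a toy model in which input is delivered to the base device
and only the base device's flag is checked. All toy_* names are hypothetical
and are not QEMU APIs; the real Chardev and mux code are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a chardev; this is not QEMU's Chardev type. */
struct toy_chardev {
    bool replay_feature;       /* models QEMU_CHAR_FEATURE_REPLAY */
    struct toy_chardev *base;  /* non-NULL only for a mux front end */
    int recorded;              /* bytes that reached the toy replay log */
};

/* Input arrives on the base device, so only its flag is consulted. */
static void toy_be_write(struct toy_chardev *base_dev, int len)
{
    if (base_dev->replay_feature) {
        base_dev->recorded += len;
    }
}

/*
 * Set the replay flag either on the base device or on the mux, push
 * three bytes of input through, and return how many were recorded.
 */
static int toy_record_with_flag_on(bool flag_on_base)
{
    struct toy_chardev base = { false, NULL, 0 };
    struct toy_chardev mux = { false, &base, 0 };

    if (flag_on_base) {
        base.replay_feature = true;
    } else {
        mux.replay_feature = true;  /* flag on the mux only */
    }

    toy_be_write(mux.base, 3);
    return base.recorded;
}
```

With the flag on the mux only, nothing is recorded; with the flag on the
base device, all three bytes are. This is why the patch passes the base
chardev, not the mux, to qemu_chardev_set_replay().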

diff --git a/chardev/char.c b/chardev/char.c
index 3c43fb1278..ba847b6e9e 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -615,11 +615,24 @@ ChardevBackend *qemu_chr_parse_opts(QemuOpts *opts, Error **errp)
     return backend;
 }
 
-Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
-                                Error **errp)
+static void qemu_chardev_set_replay(Chardev *chr, Error **errp)
+{
+    if (replay_mode != REPLAY_MODE_NONE) {
+        if (CHARDEV_GET_CLASS(chr)->chr_ioctl) {
+            error_setg(errp, "Replay: ioctl is not supported "
+                             "for serial devices yet");
+            return;
+        }
+        qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_REPLAY);
+        replay_register_char_driver(chr);
+    }
+}
+
+static Chardev *__qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
+                                         bool replay, Error **errp)
 {
     const ChardevClass *cc;
-    Chardev *chr = NULL;
+    Chardev *base = NULL, *chr = NULL;
     ChardevBackend *backend = NULL;
     const char *name = qemu_opt_get(opts, "backend");
     const char *id = qemu_opts_id(opts);
@@ -657,11 +670,11 @@ Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
     chr = qemu_chardev_new(bid ? bid : id,
                            object_class_get_name(OBJECT_CLASS(cc)),
                            backend, context, errp);
-
     if (chr == NULL) {
         goto out;
     }
 
+    base = chr;
     if (bid) {
         Chardev *mux;
         qapi_free_ChardevBackend(backend);
@@ -681,11 +694,25 @@ Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
 out:
     qapi_free_ChardevBackend(backend);
     g_free(bid);
+
+    if (replay && base) {
+        /* RR should be set on the base device, not the mux */
+        qemu_chardev_set_replay(base, errp);
+    }
+
     return chr;
 }
 
-Chardev *qemu_chr_new_noreplay(const char *label, const char *filename,
-                               bool permit_mux_mon, GMainContext *context)
+Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
+                                Error **errp)
+{
+    /* XXX: should this really not record/replay? */
+    return __qemu_chr_new_from_opts(opts, context, false, errp);
+}
+
+static Chardev *__qemu_chr_new(const char *label, const char *filename,
+                               bool permit_mux_mon, GMainContext *context,
+                               bool replay)
 {
     const char *p;
     Chardev *chr;
@@ -693,14 +720,22 @@ Chardev *qemu_chr_new_noreplay(const char *label, const char *filename,
     Error *err = NULL;
 
     if (strstart(filename, "chardev:", &p)) {
-        return qemu_chr_find(p);
+        chr = qemu_chr_find(p);
+        if (replay) {
+            qemu_chardev_set_replay(chr, &err);
+            if (err) {
+                error_report_err(err);
+                return NULL;
+            }
+        }
+        return chr;
     }
 
     opts = qemu_chr_parse_compat(label, filename, permit_mux_mon);
     if (!opts)
         return NULL;
 
-    chr = qemu_chr_new_from_opts(opts, context, &err);
+    chr = __qemu_chr_new_from_opts(opts, context, replay, &err);
     if (!chr) {
         error_report_err(err);
         goto out;
@@ -722,24 +757,18 @@ out:
     return chr;
 }
 
+Chardev *qemu_chr_new_noreplay(const char *label, const char *filename,
+                               bool permit_mux_mon, GMainContext *context)
+{
+    return __qemu_chr_new(label, filename, permit_mux_mon, context, false);
+}
+
 static Chardev *qemu_chr_new_permit_mux_mon(const char *label,
                                           const char *filename,
                                           bool permit_mux_mon,
                                           GMainContext *context)
 {
-    Chardev *chr;
-    chr = qemu_chr_new_noreplay(label, filename, permit_mux_mon, context);
-    if (chr) {
-        if (replay_mode != REPLAY_MODE_NONE) {
-            qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_REPLAY);
-        }
-        if (qemu_chr_replay(chr) && CHARDEV_GET_CLASS(chr)->chr_ioctl) {
-            error_report("Replay: ioctl is not supported "
-                         "for serial devices yet");
-        }
-        replay_register_char_driver(chr);
-    }
-    return chr;
+    return __qemu_chr_new(label, filename, permit_mux_mon, context, true);
 }
 
 Chardev *qemu_chr_new(const char *label, const char *filename,
-- 
2.45.2




* [PATCH v6 08/10] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (6 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 07/10] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 09/10] virtio-net: Use virtual time for RSC timers Nicholas Piggin
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

The regular qemu_bh_schedule() calls result in non-deterministic
execution of the bh in record-replay mode, which causes replay failure.
Use replay_bh_schedule_event() for the tx bh instead.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/net/virtio-net.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
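
Not part of the patch: a minimal standalone sketch of why a bh that affects
machine state has to go through the replay log. It assumes a toy log that
stores the guest instruction count at which each bh ran during recording.
All toy_* names and the log layout are hypothetical; the patch itself only
swaps qemu_bh_schedule() for replay_bh_schedule_event().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LOG_MAX 8

/* Toy replay log: instruction counts at which a bh ran while recording. */
struct toy_replay_log {
    unsigned long icount[TOY_LOG_MAX];
    size_t len;   /* number of recorded entries */
    size_t next;  /* next entry to consume during replay */
};

/* Record mode: remember the guest instruction count where the bh ran. */
static int toy_record_bh(struct toy_replay_log *log, unsigned long icount)
{
    if (log->len == TOY_LOG_MAX) {
        return -1;
    }
    log->icount[log->len++] = icount;
    return 0;
}

/* Replay mode: the bh is due only at the recorded instruction count. */
static int toy_replay_bh_due(struct toy_replay_log *log, unsigned long icount)
{
    if (log->next < log->len && log->icount[log->next] == icount) {
        log->next++;
        return 1;
    }
    return 0;
}

/* Record one bh at icount 1000, then replay with an early host poll. */
static int toy_demo(void)
{
    struct toy_replay_log log = { {0}, 0, 0 };

    toy_record_bh(&log, 1000);

    /* The host main loop happens to poll early on replay: bh must wait. */
    if (toy_replay_bh_due(&log, 400)) {
        return -1;
    }
    /* The guest reaches the recorded point: bh runs here, as recorded. */
    return toy_replay_bh_due(&log, 1000);
}
```

In this model the bh runs at the same guest-visible point in both runs no
matter when the host main loop iterates, which is the property a plain
qemu_bh_schedule() does not give.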

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 08aa0b65e3..10ebaae5e2 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -40,6 +40,7 @@
 #include "migration/misc.h"
 #include "standard-headers/linux/ethtool.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
 #include "trace.h"
 #include "monitor/qdev.h"
 #include "monitor/monitor.h"
@@ -417,7 +418,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
                 timer_mod(q->tx_timer,
                                qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
             } else {
-                qemu_bh_schedule(q->tx_bh);
+                replay_bh_schedule_event(q->tx_bh);
             }
         } else {
             if (q->tx_timer) {
@@ -2672,7 +2673,7 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
          */
         virtio_queue_set_notification(q->tx_vq, 0);
         if (q->tx_bh) {
-            qemu_bh_schedule(q->tx_bh);
+            replay_bh_schedule_event(q->tx_bh);
         } else {
             timer_mod(q->tx_timer,
                       qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
@@ -2838,7 +2839,7 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
         return;
     }
     virtio_queue_set_notification(vq, 0);
-    qemu_bh_schedule(q->tx_bh);
+    replay_bh_schedule_event(q->tx_bh);
 }
 
 static void virtio_net_tx_timer(void *opaque)
@@ -2921,7 +2922,7 @@ static void virtio_net_tx_bh(void *opaque)
     /* If we flush a full burst of packets, assume there are
      * more coming and immediately reschedule */
     if (ret >= n->tx_burst) {
-        qemu_bh_schedule(q->tx_bh);
+        replay_bh_schedule_event(q->tx_bh);
         q->tx_waiting = 1;
         return;
     }
@@ -2935,7 +2936,7 @@ static void virtio_net_tx_bh(void *opaque)
         return;
     } else if (ret > 0) {
         virtio_queue_set_notification(q->tx_vq, 0);
-        qemu_bh_schedule(q->tx_bh);
+        replay_bh_schedule_event(q->tx_bh);
         q->tx_waiting = 1;
     }
 }
-- 
2.45.2




* [PATCH v6 09/10] virtio-net: Use virtual time for RSC timers
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (7 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 08/10] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13  5:06 ` [PATCH v6 10/10] savevm: Fix load_snapshot error path crash Nicholas Piggin
  2024-08-13 14:12 ` [PATCH v6 00/10] replay: fixes and new test cases Alex Bennée
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

Receive coalescing is visible to the target machine, so its timers
should use virtual time like other timers in virtio-net, to be
compatible with record-replay.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/net/virtio-net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
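
Not part of the patch: a simplified standalone sketch of why a guest-visible
timer deadline should be derived from virtual time. It assumes a toy model
where virtual time is a fixed function of the executed instruction count and
host time is unrelated between the record and replay runs. The toy_* names,
the 1 ns per instruction rate and the timestamps are made up for
illustration; QEMU's real clock handling is more involved.

```c
#include <assert.h>
#include <stdint.h>

enum toy_clock { TOY_CLOCK_HOST, TOY_CLOCK_VIRTUAL };

/* One execution of the guest, either the recording or the replay. */
struct toy_run {
    int64_t host_ns;  /* wall-clock time, differs from run to run */
    int64_t icount;   /* executed guest instructions, same on replay */
};

#define TOY_NS_PER_INSN 1  /* hypothetical icount-to-ns rate */

static int64_t toy_clock_get_ns(const struct toy_run *run, enum toy_clock c)
{
    if (c == TOY_CLOCK_HOST) {
        return run->host_ns;
    }
    return run->icount * TOY_NS_PER_INSN;
}

/* Models "now + rsc_timeout" as passed to timer_mod() in the patch. */
static int64_t toy_rsc_deadline(const struct toy_run *run, enum toy_clock c,
                                int64_t timeout_ns)
{
    return toy_clock_get_ns(run, c) + timeout_ns;
}

/* Returns 1 if record and replay compute the same drain deadline. */
static int toy_deadlines_match(enum toy_clock c)
{
    struct toy_run record = { 1723525560000000000LL, 5000 };
    struct toy_run replay = { 1723611960000000000LL, 5000 };

    return toy_rsc_deadline(&record, c, 300000) ==
           toy_rsc_deadline(&replay, c, 300000);
}
```

In this model only the virtual clock gives the same deadline in both runs,
so the coalescing window closes at the same point in guest execution.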

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 10ebaae5e2..ed33a32877 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2124,7 +2124,7 @@ static void virtio_net_rsc_purge(void *opq)
     chain->stat.timer++;
     if (!QTAILQ_EMPTY(&chain->buffers)) {
         timer_mod(chain->drain_timer,
-              qemu_clock_get_ns(QEMU_CLOCK_HOST) + chain->n->rsc_timeout);
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
     }
 }
 
@@ -2360,7 +2360,7 @@ static size_t virtio_net_rsc_do_coalesce(VirtioNetRscChain *chain,
         chain->stat.empty_cache++;
         virtio_net_rsc_cache_buf(chain, nc, buf, size);
         timer_mod(chain->drain_timer,
-              qemu_clock_get_ns(QEMU_CLOCK_HOST) + chain->n->rsc_timeout);
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
         return size;
     }
 
@@ -2598,7 +2598,7 @@ static VirtioNetRscChain *virtio_net_rsc_lookup_chain(VirtIONet *n,
         chain->max_payload = VIRTIO_NET_MAX_IP6_PAYLOAD;
         chain->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
     }
-    chain->drain_timer = timer_new_ns(QEMU_CLOCK_HOST,
+    chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                       virtio_net_rsc_purge, chain);
     memset(&chain->stat, 0, sizeof(chain->stat));
 
-- 
2.45.2




* [PATCH v6 10/10] savevm: Fix load_snapshot error path crash
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (8 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 09/10] virtio-net: Use virtual time for RSC timers Nicholas Piggin
@ 2024-08-13  5:06 ` Nicholas Piggin
  2024-08-13 14:12 ` [PATCH v6 00/10] replay: fixes and new test cases Alex Bennée
  10 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2024-08-13  5:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

The snapshot-not-found error path in load_snapshot() missed setting
*errp, which can cause a NULL deref.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 migration/savevm.c | 1 +
 1 file changed, 1 insertion(+)
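
Not part of the patch: a minimal standalone sketch of the convention being
restored, namely that a function returning false must also set *errp. The
toy_* names are hypothetical and only mimic the shape of QEMU's Error API;
the "fixed" parameter models whether the error_setg() line added below is
present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for QEMU's Error object. */
struct toy_error {
    const char *msg;
};

static struct toy_error toy_not_found = { "Snapshot can not be found" };

/* Fails when the snapshot is missing; sets *errp only when "fixed". */
static bool toy_load_snapshot(bool found, bool fixed, struct toy_error **errp)
{
    if (!found) {
        if (fixed && errp) {
            *errp = &toy_not_found;
        }
        return false;
    }
    return true;
}

/*
 * A caller that trusts "false means *errp is set" would read err->msg
 * unconditionally. The NULL check here only keeps the sketch safe to
 * run; it returns NULL in the case where such a caller would crash.
 */
static const char *toy_caller(bool fixed)
{
    struct toy_error *err = NULL;

    if (!toy_load_snapshot(false, fixed, &err)) {
        return err ? err->msg : NULL;
    }
    return "ok";
}
```

Without the fix the caller is left holding a NULL error after a failure;
with it, the failure carries a message the caller can report.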

diff --git a/migration/savevm.c b/migration/savevm.c
index 85958d7b09..6bb404b9c8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3288,6 +3288,7 @@ bool load_snapshot(const char *name, const char *vmstate,
     /* Don't even try to load empty VM states */
     ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
     if (ret < 0) {
+        error_setg(errp, "Snapshot can not be found");
         return false;
     } else if (sn.vm_state_size == 0) {
         error_setg(errp, "This is a disk-only snapshot. Revert to it "
-- 
2.45.2




* Re: [PATCH v6 00/10] replay: fixes and new test cases
  2024-08-13  5:06 [PATCH v6 00/10] replay: fixes and new test cases Nicholas Piggin
                   ` (9 preceding siblings ...)
  2024-08-13  5:06 ` [PATCH v6 10/10] savevm: Fix load_snapshot error path crash Nicholas Piggin
@ 2024-08-13 14:12 ` Alex Bennée
  10 siblings, 0 replies; 12+ messages in thread
From: Alex Bennée @ 2024-08-13 14:12 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
	Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev

Nicholas Piggin <npiggin@gmail.com> writes:

> Since v5, I cut down the series significantly to just the better
> reviewed parts, without adding new CI testing, since there are
> still be a few hiccups. aarch64 had some hangs Alex noticed, and
> x86_64 doesn't seem to be working anymore for me (with the big
> replay_linux.py test). But with this series, things are much closer,
> ppc64 does get through replay_linux.py (but requires some ppc
> specific fixes and the new test to be added, so I leave that out
> for now).
>
> Hopefully we can get this minimal series in and in the next
> release I'll try to get something stable enough for CI so it
> doesn't keep breaking.

I'm happy to take this through maintainer/for-9.1 unless there are any
major objections from other maintainers.

>
> Thanks,
> Nick
>
> Nicholas Piggin (10):
>   scripts/replay-dump.py: Update to current rr record format
>   scripts/replay-dump.py: rejig decoders in event number order
>   tests/avocado: excercise scripts/replay-dump.py in replay tests
>   replay: allow runstate shutdown->running when replaying trace
>   Revert "replay: stop us hanging in rr_wait_io_event"
>   tests/avocado: replay_kernel.py add x86-64 q35 machine test
>   chardev: set record/replay on the base device of a muxed device
>   virtio-net: Use replay_schedule_bh_event for bhs that affect machine
>     state
>   virtio-net: Use virtual time for RSC timers
>   savevm: Fix load_snapshot error path crash
>
>  include/sysemu/replay.h        |   5 -
>  include/sysemu/runstate.h      |   1 +
>  accel/tcg/tcg-accel-ops-rr.c   |   2 +-
>  chardev/char.c                 |  71 +++++++++-----
>  hw/net/virtio-net.c            |  17 ++--
>  migration/savevm.c             |   1 +
>  replay/replay.c                |  23 +----
>  system/runstate.c              |  31 +++++-
>  scripts/replay-dump.py         | 167 ++++++++++++++++++++++-----------
>  tests/avocado/replay_kernel.py |  31 +++++-
>  tests/avocado/replay_linux.py  |  10 ++
>  11 files changed, 245 insertions(+), 114 deletions(-)

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



