* [PATCH v5 00/24] replay: fixes and new test cases
@ 2024-03-18 15:45 Nicholas Piggin
2024-03-18 15:45 ` [PATCH v5 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
` (23 more replies)
0 siblings, 24 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:45 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Since v4,
- Fixed a 32-bit compile bug and leftover debugging prints.
- Dropped the patch to change virtio net guest announce timers
for now.
- Increased replay_kernel.py timeout because patch 3 was just
nudging x86_64 test over the 120s timeout when run on gitlab.
- New patch to add a non-flaky x86_64 q35 machine replay_kernel test.
- Added some workarounds for the new reverse_debugging.py test cases
that x86_64 and aarch64 fail on, so they don't cause CI failures.
Here are some CI jobs showing successes with replay_linux.py,
replay_kernel.py, and reverse_debugging.py on aarch64, ppc64,
and x86_64 tests:
https://gitlab.com/npiggin/qemu/-/jobs/6416485196
https://gitlab.com/npiggin/qemu/-/jobs/6416485200
Thanks,
Nick
Nicholas Piggin (24):
scripts/replay-dump.py: Update to current rr record format
scripts/replay-dump.py: rejig decoders in event number order
tests/avocado: excercise scripts/replay-dump.py in replay tests
replay: allow runstate shutdown->running when replaying trace
Revert "replay: stop us hanging in rr_wait_io_event"
tests/avocado: replay_kernel.py add x86-64 q35 machine test
chardev: set record/replay on the base device of a muxed device
replay: Fix migration use of clock
replay: Fix migration replay_mutex locking
virtio-net: Use replay_schedule_bh_event for bhs that affect machine
state
virtio-net: Use virtual time for RSC timers
savevm: Fix load_snapshot error path crash
tests/avocado: replay_linux.py remove the timeout expected guards
tests/avocado: reverse_debugging.py mark aarch64 and pseries as not
flaky
tests/avocado: reverse_debugging.py add test for x86-64 q35 machine
tests/avocado: reverse_debugging.py verify addresses between record
and replay
tests/avocado: reverse_debugging.py stop VM before sampling icount
tests/avocado: reverse_debugging reverse-step at the end of the trace
tests/avocado: reverse_debugging.py add snapshot testing
replay: simple auto-snapshot mode for record
tests/avocado: reverse_debugging.py test auto-snapshot mode
target/ppc: fix timebase register reset state
spapr: Fix vpa dispatch count for record-replay
tests/avocado: replay_linux.py add ppc64 pseries test
docs/system/replay.rst | 5 +
include/hw/ppc/spapr_cpu_core.h | 3 +
include/sysemu/replay.h | 16 ++-
include/sysemu/runstate.h | 1 +
accel/tcg/tcg-accel-ops-rr.c | 2 +-
chardev/char.c | 71 ++++++++---
hw/net/virtio-net.c | 17 +--
hw/ppc/ppc.c | 11 +-
hw/ppc/spapr.c | 36 +-----
hw/ppc/spapr_hcall.c | 33 +++++
hw/ppc/spapr_rtas.c | 1 +
migration/migration.c | 17 ++-
migration/savevm.c | 1 +
replay/replay-snapshot.c | 57 +++++++++
replay/replay.c | 50 ++++----
system/runstate.c | 31 ++++-
system/vl.c | 9 ++
qemu-options.hx | 9 +-
scripts/replay-dump.py | 167 ++++++++++++++++--------
tests/avocado/replay_kernel.py | 31 ++++-
tests/avocado/replay_linux.py | 92 +++++++++++++-
tests/avocado/reverse_debugging.py | 197 +++++++++++++++++++++++++----
22 files changed, 667 insertions(+), 190 deletions(-)
--
2.42.0
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v5 01/24] scripts/replay-dump.py: Update to current rr record format
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
@ 2024-03-18 15:45 ` Nicholas Piggin
2024-03-18 15:45 ` [PATCH v5 02/24] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
` (22 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:45 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
The v12 format support for replay-dump has a few issues still. This
fixes async decoding; adds event, shutdown, and end decoding; fixes
audio in / out events, fixes checkpoint checking of following async
events.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
scripts/replay-dump.py | 127 ++++++++++++++++++++++++++++++-----------
1 file changed, 93 insertions(+), 34 deletions(-)
diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index d668193e79..419ee3257b 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -20,6 +20,7 @@
import argparse
import struct
+import os
from collections import namedtuple
from os import path
@@ -134,6 +135,17 @@ def swallow_async_qword(eid, name, dumpfile):
print(" %s(%d) @ %d" % (name, eid, step_id))
return True
+def swallow_bytes(eid, name, dumpfile, nr):
+ """Swallow nr bytes of data without looking at it"""
+ dumpfile.seek(nr, os.SEEK_CUR)
+
+def decode_exception(eid, name, dumpfile):
+ print_event(eid, name)
+ return True
+
+# v12 does away with the additional event byte and encodes it in the main type
+# Between v8 and v9, REPLAY_ASYNC_BH_ONESHOT was added, but we don't decode
+# those versions so leave it out.
async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", swallow_async_qword),
Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
@@ -142,8 +154,8 @@ def swallow_async_qword(eid, name, dumpfile):
Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
]
# See replay_read_events/replay_read_event
-def decode_async(eid, name, dumpfile):
- """Decode an ASYNC event"""
+def decode_async_old(eid, name, dumpfile):
+ """Decode an ASYNC event (pre-v8)"""
print_event(eid, name)
@@ -157,6 +169,35 @@ def decode_async(eid, name, dumpfile):
return call_decode(async_decode_table, async_event_kind, dumpfile)
+def decode_async_bh(eid, name, dumpfile):
+ op_id = read_qword(dumpfile)
+ print_event(eid, name)
+ return True
+
+def decode_async_bh_oneshot(eid, name, dumpfile):
+ op_id = read_qword(dumpfile)
+ print_event(eid, name)
+ return True
+
+def decode_async_char_read(eid, name, dumpfile):
+ char_id = read_byte(dumpfile)
+ size = read_dword(dumpfile)
+ print_event(eid, name, "device:%x chars:%s" % (char_id, dumpfile.read(size)))
+ return True
+
+def decode_async_block(eid, name, dumpfile):
+ op_id = read_qword(dumpfile)
+ print_event(eid, name)
+ return True
+
+def decode_async_net(eid, name, dumpfile):
+ net_id = read_byte(dumpfile)
+ flags = read_dword(dumpfile)
+ size = read_dword(dumpfile)
+ swallow_bytes(eid, name, dumpfile, size)
+ print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size))
+ return True
+
total_insns = 0
def decode_instruction(eid, name, dumpfile):
@@ -166,6 +207,10 @@ def decode_instruction(eid, name, dumpfile):
print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
return True
+def decode_shutdown(eid, name, dumpfile):
+ print_event(eid, name)
+ return True
+
def decode_char_write(eid, name, dumpfile):
res = read_dword(dumpfile)
offset = read_dword(dumpfile)
@@ -177,7 +222,7 @@ def decode_audio_out(eid, name, dumpfile):
print_event(eid, name, "%d" % (audio_data))
return True
-def decode_checkpoint(eid, name, dumpfile):
+def __decode_checkpoint(eid, name, dumpfile, old):
"""Decode a checkpoint.
Checkpoints contain a series of async events with their own specific data.
@@ -189,14 +234,20 @@ def decode_checkpoint(eid, name, dumpfile):
# if the next event is EVENT_ASYNC there are a bunch of
# async events to read, otherwise we are done
- if next_event != 3:
- print_event(eid, name, "no additional data", event_number)
- else:
+ if (old and next_event == 3) or (not old and next_event >= 3 and next_event <= 9):
print_event(eid, name, "more data follows", event_number)
+ else:
+ print_event(eid, name, "no additional data", event_number)
replay_state.reuse_event(next_event)
return True
+def decode_checkpoint_old(eid, name, dumpfile):
+ return __decode_checkpoint(eid, name, dumpfile, False)
+
+def decode_checkpoint(eid, name, dumpfile):
+ return __decode_checkpoint(eid, name, dumpfile, True)
+
def decode_checkpoint_init(eid, name, dumpfile):
print_event(eid, name)
return True
@@ -212,15 +263,23 @@ def decode_clock(eid, name, dumpfile):
def decode_random(eid, name, dumpfile):
ret = read_dword(dumpfile)
- data = read_array(dumpfile)
- print_event(eid, "%d bytes of random data" % len(data))
+ size = read_dword(dumpfile)
+ swallow_bytes(eid, name, dumpfile, size)
+ if (ret):
+ print_event(eid, name, "%d bytes (getrandom failed)" % (size))
+ else:
+ print_event(eid, name, "%d bytes" % (size))
return True
+def decode_end(eid, name, dumpfile):
+ print_event(eid, name)
+ return False
+
# pre-MTTCG merge
v5_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
Decoder(2, "EVENT_EXCEPTION", decode_plain),
- Decoder(3, "EVENT_ASYNC", decode_async),
+ Decoder(3, "EVENT_ASYNC", decode_async_old),
Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
Decoder(5, "EVENT_CHAR_WRITE", decode_char_write),
Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
@@ -242,7 +301,7 @@ def decode_random(eid, name, dumpfile):
v6_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
Decoder(2, "EVENT_EXCEPTION", decode_plain),
- Decoder(3, "EVENT_ASYNC", decode_async),
+ Decoder(3, "EVENT_ASYNC", decode_async_old),
Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
Decoder(5, "EVENT_CHAR_WRITE", decode_char_write),
Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp),
@@ -266,7 +325,7 @@ def decode_random(eid, name, dumpfile):
v7_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
Decoder(2, "EVENT_EXCEPTION", decode_unimp),
- Decoder(3, "EVENT_ASYNC", decode_async),
+ Decoder(3, "EVENT_ASYNC", decode_async_old),
Decoder(4, "EVENT_SHUTDOWN", decode_unimp),
Decoder(5, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp),
Decoder(6, "EVENT_SHUTDOWN_HOST_QMP", decode_unimp),
@@ -296,32 +355,31 @@ def decode_random(eid, name, dumpfile):
v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),
Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
- Decoder(2, "EVENT_EXCEPTION", decode_plain),
- Decoder(3, "EVENT_ASYNC", decode_async),
- Decoder(4, "EVENT_ASYNC", decode_async),
- Decoder(5, "EVENT_ASYNC", decode_async),
- Decoder(6, "EVENT_ASYNC", decode_async),
- Decoder(6, "EVENT_ASYNC", decode_async),
- Decoder(8, "EVENT_ASYNC", decode_async),
- Decoder(9, "EVENT_ASYNC", decode_async),
- Decoder(10, "EVENT_ASYNC", decode_async),
- Decoder(11, "EVENT_SHUTDOWN", decode_unimp),
- Decoder(12, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp),
- Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_unimp),
- Decoder(14, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_unimp),
- Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_unimp),
- Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_unimp),
- Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_unimp),
- Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_unimp),
- Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_unimp),
- Decoder(19, "EVENT_SHUTDOWN_GUEST_SUBSYSTEM_RESET", decode_unimp),
- Decoder(20, "EVENT_SHUTDOWN_GUEST_SNAPSHOT_LOAD", decode_unimp),
- Decoder(21, "EVENT_SHUTDOWN___MAX", decode_unimp),
+ Decoder(2, "EVENT_EXCEPTION", decode_exception),
+ Decoder(3, "EVENT_ASYNC_BH", decode_async_bh),
+ Decoder(4, "EVENT_ASYNC_BH_ONESHOT", decode_async_bh_oneshot),
+ Decoder(5, "EVENT_ASYNC_INPUT", decode_unimp),
+ Decoder(6, "EVENT_ASYNC_INPUT_SYNC", decode_unimp),
+ Decoder(7, "EVENT_ASYNC_CHAR_READ", decode_async_char_read),
+ Decoder(8, "EVENT_ASYNC_BLOCK", decode_async_block),
+ Decoder(9, "EVENT_ASYNC_NET", decode_async_net),
+ Decoder(10, "EVENT_SHUTDOWN", decode_shutdown),
+ Decoder(11, "EVENT_SHUTDOWN_HOST_ERR", decode_shutdown),
+ Decoder(12, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_shutdown),
+ Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_shutdown),
+ Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_shutdown),
+ Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_shutdown),
+ Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_shutdown),
+ Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_shutdown),
+ Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_shutdown),
+ Decoder(19, "EVENT_SHUTDOWN_SUBSYS_RESET", decode_shutdown),
+ Decoder(20, "EVENT_SHUTDOWN_SNAPSHOT_LOAD", decode_shutdown),
+ Decoder(21, "EVENT_SHUTDOWN___MAX", decode_shutdown),
Decoder(22, "EVENT_CHAR_WRITE", decode_char_write),
Decoder(23, "EVENT_CHAR_READ_ALL", decode_unimp),
Decoder(24, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp),
- Decoder(25, "EVENT_AUDIO_IN", decode_unimp),
- Decoder(26, "EVENT_AUDIO_OUT", decode_audio_out),
+ Decoder(25, "EVENT_AUDIO_OUT", decode_audio_out),
+ Decoder(26, "EVENT_AUDIO_IN", decode_unimp),
Decoder(27, "EVENT_RANDOM", decode_random),
Decoder(28, "EVENT_CLOCK_HOST", decode_clock),
Decoder(29, "EVENT_CLOCK_VIRTUAL_RT", decode_clock),
@@ -334,6 +392,7 @@ def decode_random(eid, name, dumpfile):
Decoder(36, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint),
Decoder(37, "EVENT_CP_INIT", decode_checkpoint_init),
Decoder(38, "EVENT_CP_RESET", decode_checkpoint),
+ Decoder(39, "EVENT_END", decode_end),
]
def parse_arguments():
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 02/24] scripts/replay-dump.py: rejig decoders in event number order
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
2024-03-18 15:45 ` [PATCH v5 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
@ 2024-03-18 15:45 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 03/24] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
` (21 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:45 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Sort decoder functions to be ascending in order of event number,
same as the decoder tables.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
scripts/replay-dump.py | 56 +++++++++++++++++++++---------------------
1 file changed, 28 insertions(+), 28 deletions(-)
diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 419ee3257b..b82659cfb6 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -139,6 +139,19 @@ def swallow_bytes(eid, name, dumpfile, nr):
"""Swallow nr bytes of data without looking at it"""
dumpfile.seek(nr, os.SEEK_CUR)
+total_insns = 0
+
+def decode_instruction(eid, name, dumpfile):
+ global total_insns
+ ins_diff = read_dword(dumpfile)
+ total_insns += ins_diff
+ print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
+ return True
+
+def decode_interrupt(eid, name, dumpfile):
+ print_event(eid, name)
+ return True
+
def decode_exception(eid, name, dumpfile):
print_event(eid, name)
return True
@@ -198,15 +211,6 @@ def decode_async_net(eid, name, dumpfile):
print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size))
return True
-total_insns = 0
-
-def decode_instruction(eid, name, dumpfile):
- global total_insns
- ins_diff = read_dword(dumpfile)
- total_insns += ins_diff
- print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
- return True
-
def decode_shutdown(eid, name, dumpfile):
print_event(eid, name)
return True
@@ -222,6 +226,21 @@ def decode_audio_out(eid, name, dumpfile):
print_event(eid, name, "%d" % (audio_data))
return True
+def decode_random(eid, name, dumpfile):
+ ret = read_dword(dumpfile)
+ size = read_dword(dumpfile)
+ swallow_bytes(eid, name, dumpfile, size)
+ if (ret):
+ print_event(eid, name, "%d bytes (getrandom failed)" % (size))
+ else:
+ print_event(eid, name, "%d bytes" % (size))
+ return True
+
+def decode_clock(eid, name, dumpfile):
+ clock_data = read_qword(dumpfile)
+ print_event(eid, name, "0x%x" % (clock_data))
+ return True
+
def __decode_checkpoint(eid, name, dumpfile, old):
"""Decode a checkpoint.
@@ -252,25 +271,6 @@ def decode_checkpoint_init(eid, name, dumpfile):
print_event(eid, name)
return True
-def decode_interrupt(eid, name, dumpfile):
- print_event(eid, name)
- return True
-
-def decode_clock(eid, name, dumpfile):
- clock_data = read_qword(dumpfile)
- print_event(eid, name, "0x%x" % (clock_data))
- return True
-
-def decode_random(eid, name, dumpfile):
- ret = read_dword(dumpfile)
- size = read_dword(dumpfile)
- swallow_bytes(eid, name, dumpfile, size)
- if (ret):
- print_event(eid, name, "%d bytes (getrandom failed)" % (size))
- else:
- print_event(eid, name, "%d bytes" % (size))
- return True
-
def decode_end(eid, name, dumpfile):
print_event(eid, name)
return False
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 03/24] tests/avocado: excercise scripts/replay-dump.py in replay tests
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
2024-03-18 15:45 ` [PATCH v5 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
2024-03-18 15:45 ` [PATCH v5 02/24] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 04/24] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
` (20 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
This runs replay-dump.py after recording a trace, and fails the test if
the script fails.
replay-dump.py is modified to exit with non-zero if an error is
encountered while parsing, to support this.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
scripts/replay-dump.py | 6 ++++--
tests/avocado/replay_kernel.py | 13 ++++++++++++-
tests/avocado/replay_linux.py | 10 ++++++++++
3 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index b82659cfb6..4ce7ff51cc 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -21,6 +21,7 @@
import argparse
import struct
import os
+import sys
from collections import namedtuple
from os import path
@@ -100,7 +101,7 @@ def call_decode(table, index, dumpfile):
print("Could not decode index: %d" % (index))
print("Entry is: %s" % (decoder))
print("Decode Table is:\n%s" % (table))
- return False
+ raise(Exception("unknown event"))
else:
return decoder.fn(decoder.eid, decoder.name, dumpfile)
@@ -121,7 +122,7 @@ def print_event(eid, name, string=None, event_count=None):
def decode_unimp(eid, name, _unused_dumpfile):
"Unimplemented decoder, will trigger exit"
print("%s not handled - will now stop" % (name))
- return False
+ raise(Exception("unhandled event"))
def decode_plain(eid, name, _unused_dumpfile):
"Plain events without additional data"
@@ -434,6 +435,7 @@ def decode_file(filename):
dumpfile)
except Exception as inst:
print(f"error {inst}")
+ sys.exit(1)
finally:
print(f"Reached {dumpfile.tell()} of {dumpsize} bytes")
diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index 10d99403a4..22032f3230 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -13,6 +13,7 @@
import shutil
import logging
import time
+import subprocess
from avocado import skip
from avocado import skipUnless
@@ -31,7 +32,7 @@ class ReplayKernelBase(LinuxKernelTest):
terminates.
"""
- timeout = 120
+ timeout = 180
KERNEL_COMMON_COMMAND_LINE = 'printk.time=1 panic=-1 '
def run_vm(self, kernel_path, kernel_command_line, console_pattern,
@@ -63,6 +64,8 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern,
vm.shutdown()
logger.info('finished the recording with log size %s bytes'
% os.path.getsize(replay_path))
+ self.run_replay_dump(replay_path)
+ logger.info('successfully tested replay-dump.py')
else:
vm.wait()
logger.info('successfully finished the replay')
@@ -70,6 +73,14 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern,
logger.info('elapsed time %.2f sec' % elapsed)
return elapsed
+ def run_replay_dump(self, replay_path):
+ try:
+ subprocess.check_call(["./scripts/replay-dump.py",
+ "-f", replay_path],
+ stdout=subprocess.DEVNULL)
+ except subprocess.CalledProcessError:
+ self.fail('replay-dump.py failed')
+
def run_rr(self, kernel_path, kernel_command_line, console_pattern,
shift=7, args=None):
replay_path = os.path.join(self.workdir, 'replay.bin')
diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index f3a43dc98c..b3b91ddd9a 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -94,6 +94,8 @@ def launch_and_wait(self, record, args, shift):
vm.shutdown()
logger.info('finished the recording with log size %s bytes'
% os.path.getsize(replay_path))
+ self.run_replay_dump(replay_path)
+ logger.info('successfully tested replay-dump.py')
else:
vm.event_wait('SHUTDOWN', self.timeout)
vm.wait()
@@ -108,6 +110,14 @@ def run_rr(self, args=None, shift=7):
logger = logging.getLogger('replay')
logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1))
+ def run_replay_dump(self, replay_path):
+ try:
+ subprocess.check_call(["./scripts/replay-dump.py",
+ "-f", replay_path],
+ stdout=subprocess.DEVNULL)
+ except subprocess.CalledProcessError:
+ self.fail('replay-dump.py failed')
+
@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
class ReplayLinuxX8664(ReplayLinux):
"""
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 04/24] replay: allow runstate shutdown->running when replaying trace
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (2 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 03/24] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 05/24] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
` (19 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
When replaying a trace, it is possible to go from shutdown to running
with a reverse-debugging step. This can be useful if the problem being
debugged triggers a reset or shutdown.
This can be tested by making a recording of a machine that shuts down,
then using -action shutdown=pause when replaying it. Continuing to the
end of the trace then reverse-stepping in gdb crashes due to invalid
runstate transition.
Just permitting the transition seems to be all that's necessary for
reverse-debugging to work well in such a state.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
include/sysemu/runstate.h | 1 +
replay/replay.c | 2 ++
system/runstate.c | 31 ++++++++++++++++++++++++++++---
3 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0117d243c4..fe25eed3c0 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -9,6 +9,7 @@ void runstate_set(RunState new_state);
RunState runstate_get(void);
bool runstate_is_running(void);
bool runstate_needs_reset(void);
+void runstate_replay_enable(void);
typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
diff --git a/replay/replay.c b/replay/replay.c
index a2c576c16e..b8564a4813 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -385,6 +385,8 @@ static void replay_enable(const char *fname, int mode)
replay_fetch_data_kind();
}
+ runstate_replay_enable();
+
replay_init_events();
}
diff --git a/system/runstate.c b/system/runstate.c
index d6ab860eca..8f9ee034de 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -182,6 +182,12 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE__MAX, RUN_STATE__MAX },
};
+static const RunStateTransition replay_play_runstate_transitions_def[] = {
+ { RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING},
+
+ { RUN_STATE__MAX, RUN_STATE__MAX },
+};
+
static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
bool runstate_check(RunState state)
@@ -189,14 +195,33 @@ bool runstate_check(RunState state)
return current_run_state == state;
}
-static void runstate_init(void)
+static void transitions_set_valid(const RunStateTransition *rst)
{
const RunStateTransition *p;
- memset(&runstate_valid_transitions, 0, sizeof(runstate_valid_transitions));
- for (p = &runstate_transitions_def[0]; p->from != RUN_STATE__MAX; p++) {
+ for (p = rst; p->from != RUN_STATE__MAX; p++) {
runstate_valid_transitions[p->from][p->to] = true;
}
+}
+
+void runstate_replay_enable(void)
+{
+ assert(replay_mode != REPLAY_MODE_NONE);
+
+ if (replay_mode == REPLAY_MODE_PLAY) {
+ /*
+ * When reverse-debugging, it is possible to move state from
+ * shutdown to running.
+ */
+ transitions_set_valid(&replay_play_runstate_transitions_def[0]);
+ }
+}
+
+static void runstate_init(void)
+{
+ memset(&runstate_valid_transitions, 0, sizeof(runstate_valid_transitions));
+
+ transitions_set_valid(&runstate_transitions_def[0]);
qemu_mutex_init(&vmstop_lock);
}
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 05/24] Revert "replay: stop us hanging in rr_wait_io_event"
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (3 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 04/24] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 17:58 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 06/24] tests/avocado: replay_kernel.py add x86-64 q35 machine test Nicholas Piggin
` (18 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
This reverts commit 1f881ea4a444ef36a8b6907b0b82be4b3af253a2.
That commit causes reverse_debugging.py test failures, and does
not seem to solve the root cause of the problem x86-64 still
hangs in record/replay tests.
The problem with short-cutting the iowait that was taken during
record phase is that related events will not get consumed at the
same points (e.g., reading the clock).
A hang with zero icount always seems to be a symptom of an earlier
problem that has caused the recording to become out of synch with
the execution and consumption of events by replay.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
include/sysemu/replay.h | 5 -----
accel/tcg/tcg-accel-ops-rr.c | 2 +-
replay/replay.c | 21 ---------------------
3 files changed, 1 insertion(+), 27 deletions(-)
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index f229b2109c..8102fa54f0 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -73,11 +73,6 @@ int replay_get_instructions(void);
/*! Updates instructions counter in replay mode. */
void replay_account_executed_instructions(void);
-/**
- * replay_can_wait: check if we should pause for wait-io
- */
-bool replay_can_wait(void);
-
/* Processing clocks and other time sources */
/*! Save the specified clock */
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 894e73e52c..a942442a33 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -109,7 +109,7 @@ static void rr_wait_io_event(void)
{
CPUState *cpu;
- while (all_cpu_threads_idle() && replay_can_wait()) {
+ while (all_cpu_threads_idle()) {
rr_stop_kick_timer();
qemu_cond_wait_bql(first_cpu->halt_cond);
}
diff --git a/replay/replay.c b/replay/replay.c
index b8564a4813..895fa6b67a 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -451,27 +451,6 @@ void replay_start(void)
replay_enable_events();
}
-/*
- * For none/record the answer is yes.
- */
-bool replay_can_wait(void)
-{
- if (replay_mode == REPLAY_MODE_PLAY) {
- /*
- * For playback we shouldn't ever be at a point we wait. If
- * the instruction count has reached zero and we have an
- * unconsumed event we should go around again and consume it.
- */
- if (replay_state.instruction_count == 0 && replay_state.has_unread_data) {
- return false;
- } else {
- replay_sync_error("Playback shouldn't have to iowait");
- }
- }
- return true;
-}
-
-
void replay_finish(void)
{
if (replay_mode == REPLAY_MODE_NONE) {
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 06/24] tests/avocado: replay_kernel.py add x86-64 q35 machine test
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (4 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 05/24] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 17:58 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 07/24] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
` (17 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
The x86-64 pc machine is flaky with record/replay, but q35 is more
stable. Add a q35 test to replay_kernel.py.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/replay_kernel.py | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index 22032f3230..df0ea0678f 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -110,7 +110,7 @@ def test_i386_pc(self):
self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
# See https://gitlab.com/qemu-project/qemu/-/issues/2094
- @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test sometimes gets stuck')
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'pc machine is unstable with replay')
def test_x86_64_pc(self):
"""
:avocado: tags=arch:x86_64
@@ -128,6 +128,22 @@ def test_x86_64_pc(self):
self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
+ def test_x86_64_q35(self):
+ """
+ :avocado: tags=arch:x86_64
+ :avocado: tags=machine:q35
+ """
+ kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+ '/linux/releases/29/Everything/x86_64/os/images/pxeboot'
+ '/vmlinuz')
+ kernel_hash = '23bebd2680757891cf7adedb033532163a792495'
+ kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+ kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
+ console_pattern = 'VFS: Cannot open root device'
+
+ self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5)
+
def test_mips_malta(self):
"""
:avocado: tags=arch:mips
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 07/24] chardev: set record/replay on the base device of a muxed device
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (5 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 06/24] tests/avocado: replay_kernel.py add x86-64 q35 machine test Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 19:17 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 08/24] replay: Fix migration use of clock Nicholas Piggin
` (16 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
chardev events to a muxed device don't get recorded because e.g.,
qemu_chr_be_write() checks whether the base device has the record flag
set.
This can be seen when replaying a trace that has characters typed into
the console, an examination of the log shows they are not recorded.
Setting QEMU_CHAR_FEATURE_REPLAY on the base chardev fixes the problem.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
chardev/char.c | 71 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 50 insertions(+), 21 deletions(-)
diff --git a/chardev/char.c b/chardev/char.c
index 3c43fb1278..ba847b6e9e 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -615,11 +615,24 @@ ChardevBackend *qemu_chr_parse_opts(QemuOpts *opts, Error **errp)
return backend;
}
-Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
- Error **errp)
+static void qemu_chardev_set_replay(Chardev *chr, Error **errp)
+{
+ if (replay_mode != REPLAY_MODE_NONE) {
+ if (CHARDEV_GET_CLASS(chr)->chr_ioctl) {
+ error_setg(errp, "Replay: ioctl is not supported "
+ "for serial devices yet");
+ return;
+ }
+ qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_REPLAY);
+ replay_register_char_driver(chr);
+ }
+}
+
+static Chardev *__qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
+ bool replay, Error **errp)
{
const ChardevClass *cc;
- Chardev *chr = NULL;
+ Chardev *base = NULL, *chr = NULL;
ChardevBackend *backend = NULL;
const char *name = qemu_opt_get(opts, "backend");
const char *id = qemu_opts_id(opts);
@@ -657,11 +670,11 @@ Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
chr = qemu_chardev_new(bid ? bid : id,
object_class_get_name(OBJECT_CLASS(cc)),
backend, context, errp);
-
if (chr == NULL) {
goto out;
}
+ base = chr;
if (bid) {
Chardev *mux;
qapi_free_ChardevBackend(backend);
@@ -681,11 +694,25 @@ Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
out:
qapi_free_ChardevBackend(backend);
g_free(bid);
+
+ if (replay && base) {
+ /* RR should be set on the base device, not the mux */
+ qemu_chardev_set_replay(base, errp);
+ }
+
return chr;
}
-Chardev *qemu_chr_new_noreplay(const char *label, const char *filename,
- bool permit_mux_mon, GMainContext *context)
+Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context,
+ Error **errp)
+{
+ /* XXX: should this really not record/replay? */
+ return __qemu_chr_new_from_opts(opts, context, false, errp);
+}
+
+static Chardev *__qemu_chr_new(const char *label, const char *filename,
+ bool permit_mux_mon, GMainContext *context,
+ bool replay)
{
const char *p;
Chardev *chr;
@@ -693,14 +720,22 @@ Chardev *qemu_chr_new_noreplay(const char *label, const char *filename,
Error *err = NULL;
if (strstart(filename, "chardev:", &p)) {
- return qemu_chr_find(p);
+ chr = qemu_chr_find(p);
+ if (replay) {
+ qemu_chardev_set_replay(chr, &err);
+ if (err) {
+ error_report_err(err);
+ return NULL;
+ }
+ }
+ return chr;
}
opts = qemu_chr_parse_compat(label, filename, permit_mux_mon);
if (!opts)
return NULL;
- chr = qemu_chr_new_from_opts(opts, context, &err);
+ chr = __qemu_chr_new_from_opts(opts, context, replay, &err);
if (!chr) {
error_report_err(err);
goto out;
@@ -722,24 +757,18 @@ out:
return chr;
}
+Chardev *qemu_chr_new_noreplay(const char *label, const char *filename,
+ bool permit_mux_mon, GMainContext *context)
+{
+ return __qemu_chr_new(label, filename, permit_mux_mon, context, false);
+}
+
static Chardev *qemu_chr_new_permit_mux_mon(const char *label,
const char *filename,
bool permit_mux_mon,
GMainContext *context)
{
- Chardev *chr;
- chr = qemu_chr_new_noreplay(label, filename, permit_mux_mon, context);
- if (chr) {
- if (replay_mode != REPLAY_MODE_NONE) {
- qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_REPLAY);
- }
- if (qemu_chr_replay(chr) && CHARDEV_GET_CLASS(chr)->chr_ioctl) {
- error_report("Replay: ioctl is not supported "
- "for serial devices yet");
- }
- replay_register_char_driver(chr);
- }
- return chr;
+ return __qemu_chr_new(label, filename, permit_mux_mon, context, true);
}
Chardev *qemu_chr_new(const char *label, const char *filename,
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 08/24] replay: Fix migration use of clock
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (6 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 07/24] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 20:40 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 09/24] replay: Fix migration replay_mutex locking Nicholas Piggin
` (15 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Migration reads host clocks when not holding the replay_mutex, which
asserts when recording a trace. It seems that these migration times
should be host times like other statistics in MigrationState. These
do not require the replay_mutex.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
migration/migration.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 644e073b7d..2c286ccf63 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3424,7 +3424,7 @@ static void *migration_thread(void *opaque)
{
MigrationState *s = opaque;
MigrationThread *thread = NULL;
- int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+ int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
MigThrError thr_error;
bool urgent = false;
@@ -3476,7 +3476,7 @@ static void *migration_thread(void *opaque)
qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_ACTIVE);
- s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+ s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
trace_migration_thread_setup_complete();
@@ -3555,7 +3555,7 @@ static void *bg_migration_thread(void *opaque)
migration_rate_set(RATE_LIMIT_DISABLED);
- setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+ setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
/*
* We want to save vmstate for the moment when migration has been
* initiated but also we want to save RAM content while VM is running.
@@ -3588,7 +3588,7 @@ static void *bg_migration_thread(void *opaque)
qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_ACTIVE);
- s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+ s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
trace_migration_thread_setup_complete();
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 09/24] replay: Fix migration replay_mutex locking
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (7 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 08/24] replay: Fix migration use of clock Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
` (14 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Migration causes a number of events that need to go in the replay
trace, such as vm state transitions. The replay_mutex lock needs to
be held for these.
The simplest approach seems to be just take it up-front when taking
the bql.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
migration/migration.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 2c286ccf63..c7f376ae34 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -24,6 +24,7 @@
#include "socket.h"
#include "sysemu/runstate.h"
#include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
#include "sysemu/cpu-throttle.h"
#include "rdma.h"
#include "ram.h"
@@ -2533,6 +2534,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
}
trace_postcopy_start();
+ replay_mutex_lock();
bql_lock();
trace_postcopy_start_set_run();
@@ -2638,6 +2640,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
migration_downtime_end(ms);
bql_unlock();
+ replay_mutex_unlock();
if (migrate_postcopy_ram()) {
/*
@@ -2679,6 +2682,7 @@ fail:
}
migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
bql_unlock();
+ replay_mutex_unlock();
return -1;
}
@@ -2730,6 +2734,7 @@ static int migration_completion_precopy(MigrationState *s,
{
int ret;
+ replay_mutex_lock();
bql_lock();
if (!migrate_mode_is_cpr(s)) {
@@ -2755,6 +2760,7 @@ static int migration_completion_precopy(MigrationState *s,
s->block_inactive);
out_unlock:
bql_unlock();
+ replay_mutex_unlock();
return ret;
}
@@ -3592,6 +3598,7 @@ static void *bg_migration_thread(void *opaque)
trace_migration_thread_setup_complete();
+ replay_mutex_lock();
bql_lock();
if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
@@ -3625,6 +3632,7 @@ static void *bg_migration_thread(void *opaque)
*/
migration_bh_schedule(bg_migration_vm_start_bh, s);
bql_unlock();
+ replay_mutex_unlock();
while (migration_is_active()) {
MigIterateState iter_state = bg_migration_iteration_run(s);
@@ -3654,6 +3662,7 @@ fail:
migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_FAILED);
bql_unlock();
+ replay_mutex_unlock();
}
bg_migration_iteration_finish(s);
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (8 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 09/24] replay: Fix migration replay_mutex locking Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 20:41 ` Alex Bennée
2024-04-05 6:53 ` Pavel Dovgalyuk
2024-03-18 15:46 ` [PATCH v5 11/24] virtio-net: Use virtual time for RSC timers Nicholas Piggin
` (13 subsequent siblings)
23 siblings, 2 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
The regular qemu_bh_schedule() calls result in non-deterministic
execution of the bh in record-replay mode, which causes replay failure.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/net/virtio-net.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9959f1932b..6ac737f2cf 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -40,6 +40,7 @@
#include "migration/misc.h"
#include "standard-headers/linux/ethtool.h"
#include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
#include "trace.h"
#include "monitor/qdev.h"
#include "monitor/monitor.h"
@@ -416,7 +417,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
timer_mod(q->tx_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
} else {
- qemu_bh_schedule(q->tx_bh);
+ replay_bh_schedule_event(q->tx_bh);
}
} else {
if (q->tx_timer) {
@@ -2724,7 +2725,7 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
*/
virtio_queue_set_notification(q->tx_vq, 0);
if (q->tx_bh) {
- qemu_bh_schedule(q->tx_bh);
+ replay_bh_schedule_event(q->tx_bh);
} else {
timer_mod(q->tx_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
@@ -2879,7 +2880,7 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
return;
}
virtio_queue_set_notification(vq, 0);
- qemu_bh_schedule(q->tx_bh);
+ replay_bh_schedule_event(q->tx_bh);
}
static void virtio_net_tx_timer(void *opaque)
@@ -2962,7 +2963,7 @@ static void virtio_net_tx_bh(void *opaque)
/* If we flush a full burst of packets, assume there are
* more coming and immediately reschedule */
if (ret >= n->tx_burst) {
- qemu_bh_schedule(q->tx_bh);
+ replay_bh_schedule_event(q->tx_bh);
q->tx_waiting = 1;
return;
}
@@ -2976,7 +2977,7 @@ static void virtio_net_tx_bh(void *opaque)
return;
} else if (ret > 0) {
virtio_queue_set_notification(q->tx_vq, 0);
- qemu_bh_schedule(q->tx_bh);
+ replay_bh_schedule_event(q->tx_bh);
q->tx_waiting = 1;
}
}
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 11/24] virtio-net: Use virtual time for RSC timers
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (9 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 12/24] savevm: Fix load_snapshot error path crash Nicholas Piggin
` (12 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Receive coalescing is visible to the target machine, so its timers
should use virtual time like other timers in virtio-net, to be
compatible with record-replay.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/net/virtio-net.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 6ac737f2cf..894df25222 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2176,7 +2176,7 @@ static void virtio_net_rsc_purge(void *opq)
chain->stat.timer++;
if (!QTAILQ_EMPTY(&chain->buffers)) {
timer_mod(chain->drain_timer,
- qemu_clock_get_ns(QEMU_CLOCK_HOST) + chain->n->rsc_timeout);
+ qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
}
}
@@ -2412,7 +2412,7 @@ static size_t virtio_net_rsc_do_coalesce(VirtioNetRscChain *chain,
chain->stat.empty_cache++;
virtio_net_rsc_cache_buf(chain, nc, buf, size);
timer_mod(chain->drain_timer,
- qemu_clock_get_ns(QEMU_CLOCK_HOST) + chain->n->rsc_timeout);
+ qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
return size;
}
@@ -2650,7 +2650,7 @@ static VirtioNetRscChain *virtio_net_rsc_lookup_chain(VirtIONet *n,
chain->max_payload = VIRTIO_NET_MAX_IP6_PAYLOAD;
chain->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
}
- chain->drain_timer = timer_new_ns(QEMU_CLOCK_HOST,
+ chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
virtio_net_rsc_purge, chain);
memset(&chain->stat, 0, sizeof(chain->stat));
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 12/24] savevm: Fix load_snapshot error path crash
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (10 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 11/24] virtio-net: Use virtual time for RSC timers Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 20:49 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 13/24] tests/avocado: replay_linux.py remove the timeout expected guards Nicholas Piggin
` (11 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
An error path missed setting *errp, which can cause a NULL deref.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
migration/savevm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/migration/savevm.c b/migration/savevm.c
index 388d7af7cd..10c187dd10 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3260,6 +3260,7 @@ bool load_snapshot(const char *name, const char *vmstate,
/* Don't even try to load empty VM states */
ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
if (ret < 0) {
+ error_setg(errp, "Snapshot can not be found");
return false;
} else if (sn.vm_state_size == 0) {
error_setg(errp, "This is a disk-only snapshot. Revert to it "
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 13/24] tests/avocado: replay_linux.py remove the timeout expected guards
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (11 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 12/24] savevm: Fix load_snapshot error path crash Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 17:57 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 14/24] tests/avocado: reverse_debugging.py mark aarch64 and pseries as not flaky Nicholas Piggin
` (10 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
replay_linux tests with virtio on aarch64 gciv3 and x86-64 q35 machines
seems to be more reliable now, so timeouts are no longer expected.
pc_i440fx, gciv2, and non-virtio still have problems, so mark them as
flaky: they are not just long-running, but can hang indefinitely.
These tests take about 400 seconds each, so add the SPEED=slow guard.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/replay_linux.py | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index b3b91ddd9a..b3b74a367c 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -118,7 +118,7 @@ def run_replay_dump(self, replay_path):
except subprocess.CalledProcessError:
self.fail('replay-dump.py failed')
-@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
class ReplayLinuxX8664(ReplayLinux):
"""
:avocado: tags=arch:x86_64
@@ -127,19 +127,21 @@ class ReplayLinuxX8664(ReplayLinux):
chksum = 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
def test_pc_i440fx(self):
"""
:avocado: tags=machine:pc
"""
self.run_rr(shift=1)
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
def test_pc_q35(self):
"""
:avocado: tags=machine:q35
"""
self.run_rr(shift=3)
-@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
class ReplayLinuxX8664Virtio(ReplayLinux):
"""
:avocado: tags=arch:x86_64
@@ -153,6 +155,7 @@ class ReplayLinuxX8664Virtio(ReplayLinux):
chksum = 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
def test_pc_i440fx(self):
"""
:avocado: tags=machine:pc
@@ -165,7 +168,7 @@ def test_pc_q35(self):
"""
self.run_rr(shift=3)
-@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
class ReplayLinuxAarch64(ReplayLinux):
"""
:avocado: tags=accel:tcg
@@ -187,6 +190,7 @@ def get_common_args(self):
'-device', 'virtio-rng-pci,rng=rng0',
'-object', 'rng-builtin,id=rng0')
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
def test_virt_gicv2(self):
"""
:avocado: tags=machine:gic-version=2
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 14/24] tests/avocado: reverse_debugging.py mark aarch64 and pseries as not flaky
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (12 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 13/24] tests/avocado: replay_linux.py remove the timeout expected guards Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine Nicholas Piggin
` (9 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
These seem to be quite solid, including on several gitlab CI runs.
Enabling them should help catch breakage in future.
And update the powernv comment -- gitlab isn't the problem, there are
known gaps in implementation.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/reverse_debugging.py | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index 92855a02a5..8fe76ff921 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -223,9 +223,6 @@ class ReverseDebugging_AArch64(ReverseDebugging):
REG_PC = 32
- # unidentified gitlab timeout problem
- @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
-
def test_aarch64_virt(self):
"""
:avocado: tags=arch:aarch64
@@ -248,14 +245,10 @@ class ReverseDebugging_ppc64(ReverseDebugging):
REG_PC = 0x40
- # unidentified gitlab timeout problem
- @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
-
def test_ppc64_pseries(self):
"""
:avocado: tags=arch:ppc64
:avocado: tags=machine:pseries
- :avocado: tags=flaky
"""
# SLOF branches back to its entry point, which causes this test
# to take the 'hit a breakpoint again' path. That's not a problem,
@@ -264,7 +257,7 @@ def test_ppc64_pseries(self):
self.reverse_debugging()
# See https://gitlab.com/qemu-project/qemu/-/issues/1992
- @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'powernv migration support is incomplete so rr debugging is flaky')
def test_ppc64_powernv(self):
"""
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (13 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 14/24] tests/avocado: reverse_debugging.py mark aarch64 and pseries as not flaky Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-19 20:50 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 16/24] tests/avocado: reverse_debugging.py verify addresses between record and replay Nicholas Piggin
` (8 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
The x86-64 pc machine has a problem with record/replay. q35 seems
to work well. Add a new q35 test and update the flaky message for
pc.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/reverse_debugging.py | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index 8fe76ff921..e76cf54ad7 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -206,8 +206,7 @@ def get_pc(self, g):
+ self.get_reg_le(g, self.REG_CS) * 0x10
# unidentified gitlab timeout problem
- @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
-
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'pc machine is unstable with replay')
def test_x86_64_pc(self):
"""
:avocado: tags=arch:x86_64
@@ -216,6 +215,14 @@ def test_x86_64_pc(self):
# start with BIOS only
self.reverse_debugging()
+ def test_x86_64_q35(self):
+ """
+ :avocado: tags=arch:x86_64
+ :avocado: tags=machine:q35
+ """
+ # start with BIOS only
+ self.reverse_debugging()
+
class ReverseDebugging_AArch64(ReverseDebugging):
"""
:avocado: tags=accel:tcg
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 16/24] tests/avocado: reverse_debugging.py verify addresses between record and replay
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (14 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 17/24] tests/avocado: reverse_debugging.py stop VM before sampling icount Nicholas Piggin
` (7 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
gdb is only attached in the replay phase, and the recorded trace is
single-stepped to build up an address map that is used to then verify
reverse-step and reverse-continue.
This change attaches gdb in the record phase, and builds the address
map by single-stepping the live machine. That address map is then
verified by single-stepping the replay, and then used for the same
reverse debug verification.
This is a stronger verifcation between the record and the replay phases,
as well as providing some testing of gdb operation in record mode.
x86_64 has problems with taking the first step, it does not advance the
icount, so a workaround is added for it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/reverse_debugging.py | 68 +++++++++++++++++++++++++-----
1 file changed, 58 insertions(+), 10 deletions(-)
diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index e76cf54ad7..8a9578ad5a 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -35,6 +35,10 @@ class ReverseDebugging(LinuxKernelTest):
STEPS = 10
endian_is_le = True
+ # If first_step_workaround is true, check whether the first step moved
+ # icount, and if not then step again.
+ first_step_workaround = False
+
def run_vm(self, record, shift, args, replay_path, image_path, port):
logger = logging.getLogger('replay')
vm = self.get_vm()
@@ -45,7 +49,7 @@ def run_vm(self, record, shift, args, replay_path, image_path, port):
else:
logger.info('replaying the execution...')
mode = 'replay'
- vm.add_args('-gdb', 'tcp::%d' % port, '-S')
+ vm.add_args('-gdb', 'tcp::%d' % port, '-S')
vm.add_args('-icount', 'shift=%s,rr=%s,rrfile=%s,rrsnapshot=init' %
(shift, mode, replay_path),
'-net', 'none')
@@ -87,6 +91,17 @@ def check_pc(self, g, addr):
if pc != addr:
self.fail('Invalid PC (read %x instead of %x)' % (pc, addr))
+ @staticmethod
+ def gdb_cont(g):
+ g.cmd(b'c')
+
+ @staticmethod
+ def gdb_cont_nowait(g):
+ # The avocado GDBRemote does not have a good way to disconnect or
+ # continue without waiting for a reply, so open-code our own here.
+ data = g.encode(b'c')
+ g._socket.send(data)
+
@staticmethod
def gdb_step(g):
g.cmd(b's', b'T05thread:01;')
@@ -117,11 +132,40 @@ def reverse_debugging(self, shift=7, args=None):
replay_path = os.path.join(self.workdir, 'replay.bin')
port = find_free_port()
+ steps = []
+
# record the log
vm = self.run_vm(True, shift, args, replay_path, image_path, port)
+ logger.info('connecting to gdbstub')
+ g = gdb.GDBRemote('127.0.0.1', port, False, False)
+ g.connect()
+ r = g.cmd(b'qSupported')
+ if b'qXfer:features:read+' in r:
+ g.cmd(b'qXfer:features:read:target.xml:0,ffb')
+
+ if self.vm_get_icount(vm) != 0:
+ self.fail('icount does not start at zero')
+
+ # save the addresses of the first STEPS instructions executed
+ logger.info('stepping forward')
+ for i in range(self.STEPS):
+ pc = self.get_pc(g)
+ logger.info('saving position %x' % pc)
+ steps.append(pc)
+ self.gdb_step(g)
+ if self.first_step_workaround and i == 0 and self.vm_get_icount(vm) == 0:
+ logger.warn('failed to take first step, stepping again')
+ self.gdb_step(g)
+ if self.vm_get_icount(vm) != self.STEPS:
+ self.fail('icount (%d) does not match number of instructions stepped' % self.vm_get_icount(vm))
+
+ logger.info('continue running')
+ self.gdb_cont_nowait(g)
+
while self.vm_get_icount(vm) <= self.STEPS:
pass
last_icount = self.vm_get_icount(vm)
+ logger.info('shutdown...')
vm.shutdown()
logger.info("recorded log with %s+ steps" % last_icount)
@@ -139,23 +183,23 @@ def reverse_debugging(self, shift=7, args=None):
if b'ReverseContinue+' not in r:
self.fail('Reverse continue is not supported by QEMU')
+ # Try single stepping
logger.info('stepping forward')
- steps = []
- # record first instruction addresses
- for _ in range(self.STEPS):
- pc = self.get_pc(g)
- logger.info('saving position %x' % pc)
- steps.append(pc)
+ for addr in steps:
+ # verify addresses match what initial execution saw
+ self.check_pc(g, addr)
self.gdb_step(g)
+ logger.info('found position %x' % addr)
- # visit the recorded instruction in reverse order
+ # Try reverse stepping
logger.info('stepping backward')
for addr in steps[::-1]:
self.gdb_bstep(g)
+ # verify addresses match what initial execution saw
self.check_pc(g, addr)
logger.info('found position %x' % addr)
- # visit the recorded instruction in forward order
+ # Step forward again
logger.info('stepping forward')
for addr in steps:
self.check_pc(g, addr)
@@ -175,7 +219,7 @@ def reverse_debugging(self, shift=7, args=None):
# continue - will return after pausing
# This could stop at the end and get a T02 return, or by
# re-executing one of the breakpoints and get a T05 return.
- g.cmd(b'c')
+ self.gdb_cont(g)
if self.vm_get_icount(vm) == last_icount - 1:
logger.info('reached the end (icount %s)' % (last_icount - 1))
else:
@@ -201,6 +245,10 @@ class ReverseDebugging_X86_64(ReverseDebugging):
REG_PC = 0x10
REG_CS = 0x12
+
+ # The initial step does not change pc on x86 for some reason.
+ first_step_workaround = True
+
def get_pc(self, g):
return self.get_reg_le(g, self.REG_PC) \
+ self.get_reg_le(g, self.REG_CS) * 0x10
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 17/24] tests/avocado: reverse_debugging.py stop VM before sampling icount
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (15 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 16/24] tests/avocado: reverse_debugging.py verify addresses between record and replay Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 18/24] tests/avocado: reverse_debugging reverse-step at the end of the trace Nicholas Piggin
` (6 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Rather than sampling icount while the vm is running, stop it before
sampling icount. This allows the actual last recorded icount in the
trace to be found.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/reverse_debugging.py | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index 8a9578ad5a..bdc9082c85 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -156,19 +156,20 @@ def reverse_debugging(self, shift=7, args=None):
if self.first_step_workaround and i == 0 and self.vm_get_icount(vm) == 0:
logger.warn('failed to take first step, stepping again')
self.gdb_step(g)
- if self.vm_get_icount(vm) != self.STEPS:
+ self.gdb_step(g)
+ if self.vm_get_icount(vm) != self.STEPS + 1:
self.fail('icount (%d) does not match number of instructions stepped' % self.vm_get_icount(vm))
logger.info('continue running')
self.gdb_cont_nowait(g)
- while self.vm_get_icount(vm) <= self.STEPS:
- pass
+ logger.info('stopping to read final icount')
+ vm.qmp('stop')
last_icount = self.vm_get_icount(vm)
logger.info('shutdown...')
vm.shutdown()
- logger.info("recorded log with %s+ steps" % last_icount)
+ logger.info("recorded log with %s steps" % last_icount)
# replay and run debug commands
vm = self.run_vm(False, shift, args, replay_path, image_path, port)
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 18/24] tests/avocado: reverse_debugging reverse-step at the end of the trace
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (16 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 17/24] tests/avocado: reverse_debugging.py stop VM before sampling icount Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 19/24] tests/avocado: reverse_debugging.py add snapshot testing Nicholas Piggin
` (5 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
The longer a trace is played for, the more chance there is for bugs
to cause it to go out of synch with the initial recording. Stepping
backward from the end of a trace can be a good way to find problems.
This extends the runtime of the record phase to 1 second, to build a
bigger trace, and it adds a replay test that runs to the end of the
trace, steps back then forward and verifies the pc.
x86_64 and aarch64 have problems with verifying the pc at the end of
the trace after reverse-stepping, so add a workaround to skip that
check for them.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/reverse_debugging.py | 65 +++++++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 2 deletions(-)
diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index bdc9082c85..c0cf580d5c 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -9,6 +9,7 @@
# later. See the COPYING file in the top-level directory.
import os
import logging
+import time
from avocado import skipUnless
from avocado_qemu import BUILD_DIR
@@ -31,10 +32,15 @@ class ReverseDebugging(LinuxKernelTest):
that the execution is stopped at the last of them.
"""
- timeout = 10
+ timeout = 30
STEPS = 10
endian_is_le = True
+ # If long_trace is true, record execution for some time, and verify
+ # reverse-stepping from the end of the trace as well. If false, only the
+ # first 10 instructions are stepped.
+ verify_end = True
+
# If first_step_workaround is true, check whether the first step moved
# icount, and if not then step again.
first_step_workaround = False
@@ -91,6 +97,19 @@ def check_pc(self, g, addr):
if pc != addr:
self.fail('Invalid PC (read %x instead of %x)' % (pc, addr))
+ @staticmethod
+ def gdb_break(g):
+ # The avocado GDBRemote does not have a good way to send this break
+ # packet, which is different from others.
+ g._socket.send(b'\x03')
+ transmission_result = g._socket.recv(1)
+ if transmission_result == '-':
+ raise Exception("Bad ack")
+ result = g._socket.recv(1024)
+ response_payload = g.decode(result)
+ if response_payload != b'T02thread:01;':
+ raise Exception("Unexpected response" + response_payload.decode())
+
@staticmethod
def gdb_cont(g):
g.cmd(b'c')
@@ -162,9 +181,15 @@ def reverse_debugging(self, shift=7, args=None):
logger.info('continue running')
self.gdb_cont_nowait(g)
-
+ logger.info('running for 1s...')
+ time.sleep(1)
logger.info('stopping to read final icount')
vm.qmp('stop')
+ self.gdb_break(g)
+
+ last_pc = self.get_pc(g)
+ logger.info('saving position %x' % last_pc)
+ self.gdb_step(g)
last_icount = self.vm_get_icount(vm)
logger.info('shutdown...')
vm.shutdown()
@@ -200,6 +225,34 @@ def reverse_debugging(self, shift=7, args=None):
self.check_pc(g, addr)
logger.info('found position %x' % addr)
+ # Run to the end of the trace, reverse-step, and then reverse-continue
+ # back to the start, with no breakpoints. This allows us to get to the
+ # end of the trace and reverse step from there, without possibly
+ # hitting a breakpoint that prevents reaching the end, as can happen
+ # with the later breakpoint tests.
+ logger.info('running to the end of the trace')
+ vm.qmp('replay-break', icount=last_icount - 1)
+ # This should stop at the end and get a T02 return.
+ self.gdb_cont(g)
+ if self.vm_get_icount(vm) != last_icount - 1:
+ self.fail('failed to reach the end (icount %s, reached %s)' % ((last_icount - 1), self.vm_get_icount(vm)))
+ logger.info('reached end of trace')
+
+ if self.verify_end:
+ self.check_pc(g, last_pc)
+ logger.info('found position %x' % last_pc)
+
+ logger.info('stepping backward')
+ self.gdb_bstep(g)
+
+ logger.info('stepping forward')
+ self.gdb_step(g)
+ self.check_pc(g, last_pc)
+ logger.info('found position %x' % last_pc)
+
+ logger.info('reversing to the start of the trace')
+ g.cmd(b'bc', b'T05thread:01;')
+
# Step forward again
logger.info('stepping forward')
for addr in steps:
@@ -250,6 +303,10 @@ class ReverseDebugging_X86_64(ReverseDebugging):
# The initial step does not change pc on x86 for some reason.
first_step_workaround = True
+ # Reverse stepping from a long-running trace does not reliably replay
+ # the trace precisely on x86.
+ verify_end = False
+
def get_pc(self, g):
return self.get_reg_le(g, self.REG_PC) \
+ self.get_reg_le(g, self.REG_CS) * 0x10
@@ -279,6 +336,10 @@ class ReverseDebugging_AArch64(ReverseDebugging):
REG_PC = 32
+ # Reverse stepping from a long-running trace does not reliably replay
+ # the trace precisely on aarch64.
+ verify_end = False
+
def test_aarch64_virt(self):
"""
:avocado: tags=arch:aarch64
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 19/24] tests/avocado: reverse_debugging.py add snapshot testing
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (17 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 18/24] tests/avocado: reverse_debugging reverse-step at the end of the trace Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 20/24] replay: simple auto-snapshot mode for record Nicholas Piggin
` (4 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Make a manual snapshot halfway though initial building of the address
map in record mode.
This will cause the reverse-step and reverse-continue tests to load that
snapshot when beginning from later points in the trace, exercising the
post-initial snapshot saving and loading.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/reverse_debugging.py | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index c0cf580d5c..185f055cf9 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -59,7 +59,7 @@ def run_vm(self, record, shift, args, replay_path, image_path, port):
vm.add_args('-icount', 'shift=%s,rr=%s,rrfile=%s,rrsnapshot=init' %
(shift, mode, replay_path),
'-net', 'none')
- vm.add_args('-drive', 'file=%s,if=none' % image_path)
+ vm.add_args('-drive', 'file=%s,if=none,id=disk0' % image_path)
if args:
vm.add_args(*args)
vm.launch()
@@ -133,6 +133,14 @@ def gdb_bstep(g):
def vm_get_icount(vm):
return vm.qmp('query-replay')['return']['icount']
+ @staticmethod
+ def vm_snapshot(vm):
+ return vm.qmp('snapshot-save',
+ {'job-id': 'snapshot-job',
+ 'tag': 'manual',
+ 'vmstate': 'disk0',
+ 'devices': []})
+
def reverse_debugging(self, shift=7, args=None):
logger = logging.getLogger('replay')
@@ -171,6 +179,9 @@ def reverse_debugging(self, shift=7, args=None):
pc = self.get_pc(g)
logger.info('saving position %x' % pc)
steps.append(pc)
+ if i == self.STEPS//2:
+ logger.info('saving VM snapshot at step %x...' % i)
+ self.vm_snapshot(vm)
self.gdb_step(g)
if self.first_step_workaround and i == 0 and self.vm_get_icount(vm) == 0:
logger.warn('failed to take first step, stepping again')
@@ -217,7 +228,9 @@ def reverse_debugging(self, shift=7, args=None):
self.gdb_step(g)
logger.info('found position %x' % addr)
- # Try reverse stepping
+ # Try reverse stepping. The manual snapshot taken in the record
+ # phase should be used for reverse-stepping until the machine
+ # reverses to an icount older than the snapshot.
logger.info('stepping backward')
for addr in steps[::-1]:
self.gdb_bstep(g)
@@ -242,6 +255,8 @@ def reverse_debugging(self, shift=7, args=None):
self.check_pc(g, last_pc)
logger.info('found position %x' % last_pc)
+ # This should load the last snapshot taken. Could that be verified
+ # with QMP?
logger.info('stepping backward')
self.gdb_bstep(g)
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 20/24] replay: simple auto-snapshot mode for record
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (18 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 19/24] tests/avocado: reverse_debugging.py add snapshot testing Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-04-05 6:56 ` Pavel Dovgalyuk
2024-03-18 15:46 ` [PATCH v5 21/24] tests/avocado: reverse_debugging.py test auto-snapshot mode Nicholas Piggin
` (3 subsequent siblings)
23 siblings, 1 reply; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
record makes an initial snapshot when the machine is created, to enable
reverse-debugging. Often the issue being debugged appears near the end of
the trace, so it is important for performance to keep snapshots close to
the end.
This implements a periodic snapshot mode that keeps a rolling set of
recent snapshots. This could be done by the debugger or other program
that talks QMP, but for setting up simple scenarios and tests, this is
more convenient.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
docs/system/replay.rst | 5 ++++
include/sysemu/replay.h | 11 ++++++++
replay/replay-snapshot.c | 57 ++++++++++++++++++++++++++++++++++++++++
replay/replay.c | 27 +++++++++++++++++--
system/vl.c | 9 +++++++
qemu-options.hx | 9 +++++--
6 files changed, 114 insertions(+), 4 deletions(-)
diff --git a/docs/system/replay.rst b/docs/system/replay.rst
index ca7c17c63d..1ae8614475 100644
--- a/docs/system/replay.rst
+++ b/docs/system/replay.rst
@@ -156,6 +156,11 @@ for storing VM snapshots. Here is the example of the command line for this:
``empty.qcow2`` drive does not connected to any virtual block device and used
for VM snapshots only.
+``rrsnapmode`` can be used to select just an initial snapshot or periodic
+snapshots, with ``rrsnapcount`` specifying the number of periodic snapshots
+to maintain, and ``rrsnaptime`` the amount of run time in seconds between
+periodic snapshots.
+
.. _network-label:
Network devices
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 8102fa54f0..92fa82842b 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -48,6 +48,17 @@ typedef enum ReplayCheckpoint ReplayCheckpoint;
typedef struct ReplayNetState ReplayNetState;
+enum ReplaySnapshotMode {
+ REPLAY_SNAPSHOT_MODE_INITIAL,
+ REPLAY_SNAPSHOT_MODE_PERIODIC,
+};
+typedef enum ReplaySnapshotMode ReplaySnapshotMode;
+
+extern ReplaySnapshotMode replay_snapshot_mode;
+
+extern uint64_t replay_snapshot_periodic_delay;
+extern int replay_snapshot_periodic_nr_keep;
+
/* Name of the initial VM snapshot */
extern char *replay_snapshot;
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index ccb4d89dda..762555feaa 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -70,6 +70,53 @@ void replay_vmstate_register(void)
vmstate_register(NULL, 0, &vmstate_replay, &replay_state);
}
+static QEMUTimer *replay_snapshot_timer;
+static int replay_snapshot_count;
+
+static void replay_snapshot_timer_cb(void *opaque)
+{
+ Error *err = NULL;
+ char *name;
+
+ if (!replay_can_snapshot()) {
+ /* Try again soon */
+ timer_mod(replay_snapshot_timer,
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+ replay_snapshot_periodic_delay / 10);
+ return;
+ }
+
+ name = g_strdup_printf("%s-%d", replay_snapshot, replay_snapshot_count);
+ if (!save_snapshot(name,
+ true, NULL, false, NULL, &err)) {
+ error_report_err(err);
+ error_report("Could not create periodic snapshot "
+ "for icount record, disabling");
+ g_free(name);
+ return;
+ }
+ g_free(name);
+ replay_snapshot_count++;
+
+ if (replay_snapshot_periodic_nr_keep >= 1 &&
+ replay_snapshot_count > replay_snapshot_periodic_nr_keep) {
+ int del_nr;
+
+ del_nr = replay_snapshot_count - replay_snapshot_periodic_nr_keep - 1;
+ name = g_strdup_printf("%s-%d", replay_snapshot, del_nr);
+ if (!delete_snapshot(name, false, NULL, &err)) {
+ error_report_err(err);
+ error_report("Could not delete periodic snapshot "
+ "for icount record");
+ }
+ g_free(name);
+ }
+
+ timer_mod(replay_snapshot_timer,
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+ replay_snapshot_periodic_delay);
+}
+
void replay_vmstate_init(void)
{
Error *err = NULL;
@@ -82,6 +129,16 @@ void replay_vmstate_init(void)
error_report("Could not create snapshot for icount record");
exit(1);
}
+
+ if (replay_snapshot_mode == REPLAY_SNAPSHOT_MODE_PERIODIC) {
+ replay_snapshot_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
+ replay_snapshot_timer_cb,
+ NULL);
+ timer_mod(replay_snapshot_timer,
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+ replay_snapshot_periodic_delay);
+ }
+
} else if (replay_mode == REPLAY_MODE_PLAY) {
if (!load_snapshot(replay_snapshot, NULL, false, NULL, &err)) {
error_report_err(err);
diff --git a/replay/replay.c b/replay/replay.c
index 895fa6b67a..c916e71d30 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -29,6 +29,10 @@
ReplayMode replay_mode = REPLAY_MODE_NONE;
char *replay_snapshot;
+ReplaySnapshotMode replay_snapshot_mode;
+uint64_t replay_snapshot_periodic_delay;
+int replay_snapshot_periodic_nr_keep;
+
/* Name of replay file */
static char *replay_filename;
ReplayState replay_state;
@@ -424,6 +428,27 @@ void replay_configure(QemuOpts *opts)
}
replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
+ if (replay_snapshot && mode == REPLAY_MODE_RECORD) {
+ const char *snapmode;
+
+ snapmode = qemu_opt_get(opts, "rrsnapmode");
+ if (!snapmode || !strcmp(snapmode, "initial")) {
+ replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_INITIAL;
+ } else if (!strcmp(snapmode, "periodic")) {
+ replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_PERIODIC;
+ } else {
+ error_report("Invalid rrsnapmode option: %s", snapmode);
+ exit(1);
+ }
+
+ /* Default 10 host seconds of machine runtime per snapshot. */
+ replay_snapshot_periodic_delay =
+ qemu_opt_get_number(opts, "rrsnaptime", 10) * 1000;
+
+ /* Default 2, to cover at least the last 10 host seconds of runtime. */
+ replay_snapshot_periodic_nr_keep =
+ qemu_opt_get_number(opts, "rrsnapcount", 2);
+ }
replay_vmstate_register();
replay_enable(fname, mode);
@@ -446,8 +471,6 @@ void replay_start(void)
exit(1);
}
- /* Timer for snapshotting will be set up here. */
-
replay_enable_events();
}
diff --git a/system/vl.c b/system/vl.c
index 70f4cece7f..8070cee6da 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -447,6 +447,15 @@ static QemuOptsList qemu_icount_opts = {
}, {
.name = "rrsnapshot",
.type = QEMU_OPT_STRING,
+ }, {
+ .name = "rrsnapmode",
+ .type = QEMU_OPT_STRING,
+ }, {
+ .name = "rrsnaptime",
+ .type = QEMU_OPT_NUMBER,
+ }, {
+ .name = "rrsnapcount",
+ .type = QEMU_OPT_NUMBER,
},
{ /* end of list */ }
},
diff --git a/qemu-options.hx b/qemu-options.hx
index 7fd1713fa8..51f2498f12 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4665,13 +4665,13 @@ SRST
ERST
DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
- "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>]]\n" \
+ "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]\n" \
" enable virtual instruction counter with 2^N clock ticks per\n" \
" instruction, enable aligning the host and virtual clocks\n" \
" or disable real time cpu sleeping, and optionally enable\n" \
" record-and-replay mode\n", QEMU_ARCH_ALL)
SRST
-``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot]]``
+``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]]``
Enable virtual instruction counter. The virtual cpu will execute one
instruction every 2^N ns of virtual time. If ``auto`` is specified
then the virtual cpu speed will be automatically adjusted to keep
@@ -4713,6 +4713,11 @@ SRST
name. In record mode, a new VM snapshot with the given name is created
at the start of execution recording. In replay mode this option
specifies the snapshot name used to load the initial VM state.
+ ``rrsnapmode=periodic`` will additionally cause a periodic snapshot to
+ be created after ``rrsnaptime=secs`` seconds of real runtime. The last
+ ``rrsnapcount=N`` periodic snapshots (not including the initial) will
+ be kept (0 for infinite). Periodic snapshots are useful to speed
+ reverse debugging operations near the end of the recorded trace.
ERST
DEF("watchdog-action", HAS_ARG, QEMU_OPTION_watchdog_action, \
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 21/24] tests/avocado: reverse_debugging.py test auto-snapshot mode
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (19 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 20/24] replay: simple auto-snapshot mode for record Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 22/24] target/ppc: fix timebase register reset state Nicholas Piggin
` (2 subsequent siblings)
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
This runs the recording phase with a periodic auto-snapshot.
It also runs the machine for 5 seconds after creating the initial
address map, so as to alow time for a some auto snapshots to be
taken.
In the replay mode a reverse-debugging sequence is added where the
test is run to the end, and reverse-stepped from there. This should
load the last auto-snapshot taken.
The test hangs on x86-64 and aarch64 with autosnapshots, so it is
disabled for them.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/reverse_debugging.py | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index 185f055cf9..36e5e1be05 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -41,6 +41,9 @@ class ReverseDebugging(LinuxKernelTest):
# first 10 instructions are stepped.
verify_end = True
+ # Enable the autosnapshot test, requires long_trace = True.
+ autosnapshot = True
+
# If first_step_workaround is true, check whether the first step moved
# icount, and if not then step again.
first_step_workaround = False
@@ -56,9 +59,14 @@ def run_vm(self, record, shift, args, replay_path, image_path, port):
logger.info('replaying the execution...')
mode = 'replay'
vm.add_args('-gdb', 'tcp::%d' % port, '-S')
- vm.add_args('-icount', 'shift=%s,rr=%s,rrfile=%s,rrsnapshot=init' %
- (shift, mode, replay_path),
- '-net', 'none')
+ if self.autosnapshot:
+ vm.add_args('-icount', 'shift=%s,rr=%s,rrfile=%s,rrsnapshot=init'
+ ',rrsnapmode=periodic,rrsnapcount=10,'
+ 'rrsnaptime=1' % (shift, mode, replay_path))
+ else:
+ vm.add_args('-icount', 'shift=%s,rr=%s,rrfile=%s,rrsnapshot=init'
+ % (shift, mode, replay_path))
+ vm.add_args('-net', 'none')
vm.add_args('-drive', 'file=%s,if=none,id=disk0' % image_path)
if args:
vm.add_args(*args)
@@ -192,8 +200,12 @@ def reverse_debugging(self, shift=7, args=None):
logger.info('continue running')
self.gdb_cont_nowait(g)
- logger.info('running for 1s...')
- time.sleep(1)
+ if self.autosnapshot:
+ logger.info('running for 5s...')
+ time.sleep(5)
+ else:
+ logger.info('running for 1s...')
+ time.sleep(1)
logger.info('stopping to read final icount')
vm.qmp('stop')
self.gdb_break(g)
@@ -322,6 +334,9 @@ class ReverseDebugging_X86_64(ReverseDebugging):
# the trace precisely on x86.
verify_end = False
+ # x86 doesn't like autosnapshot
+ autosnapshot = False
+
def get_pc(self, g):
return self.get_reg_le(g, self.REG_PC) \
+ self.get_reg_le(g, self.REG_CS) * 0x10
@@ -355,6 +370,9 @@ class ReverseDebugging_AArch64(ReverseDebugging):
# the trace precisely on aarch64.
verify_end = False
+ # aarch64 doesn't like autosnapshot
+ autosnapshot = False
+
def test_aarch64_virt(self):
"""
:avocado: tags=arch:aarch64
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 22/24] target/ppc: fix timebase register reset state
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (20 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 21/24] tests/avocado: reverse_debugging.py test auto-snapshot mode Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 23/24] spapr: Fix vpa dispatch count for record-replay Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 24/24] tests/avocado: replay_linux.py add ppc64 pseries test Nicholas Piggin
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
(H)DEC and PURR get reset before icount does, which causes them to be
skewed and not match the init state. This can cause replay to not
match the recorded trace exactly. For DEC and HDEC this is usually not
noticable since they tend to get programmed before affecting the
target machine. PURR has been observed to cause replay bugs when
running Linux.
Fix this by resetting using a time of 0.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
hw/ppc/ppc.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index e6fa5580c0..ddcbf8972b 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1122,16 +1122,21 @@ void cpu_ppc_tb_reset(CPUPPCState *env)
timer_del(tb_env->hdecr_timer);
ppc_set_irq(cpu, PPC_INTERRUPT_HDECR, 0);
tb_env->hdecr_next = 0;
+ _cpu_ppc_store_hdecr(cpu, 0, 0, 0, 64);
}
/*
* There is a bug in Linux 2.4 kernels:
* if a decrementer exception is pending when it enables msr_ee at startup,
* it's not ready to handle it...
+ *
+ * On machine reset, in this is called before icount is reset, so
+ * for icount-mode, setting TB registers using now=qemu_clock_get_ns
+ * results in them being skewed when icount does get reset. Use an
+ * explicit 0 to get a consistent reset state.
*/
- cpu_ppc_store_decr(env, -1);
- cpu_ppc_store_hdecr(env, -1);
- cpu_ppc_store_purr(env, 0x0000000000000000ULL);
+ _cpu_ppc_store_decr(cpu, 0, 0, -1, 64);
+ _cpu_ppc_store_purr(env, 0, 0);
}
void cpu_ppc_tb_free(CPUPPCState *env)
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 23/24] spapr: Fix vpa dispatch count for record-replay
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (21 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 22/24] target/ppc: fix timebase register reset state Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 24/24] tests/avocado: replay_linux.py add ppc64 pseries test Nicholas Piggin
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
The dispatch count is a field in guest memory that the hypervisor
increments when preempting and dispatching the guest. This was not
being done deterministically with respect to icount, because tcg
exec exit is not deterministic (e.g., an async event could cause it).
Change vpa dispatch count increment to keep track of whether the
vCPU is considered dispatched or not, and only consider it preempted
when calling cede / confer / join / stop-self / etc.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
include/hw/ppc/spapr_cpu_core.h | 3 +++
hw/ppc/spapr.c | 36 ++-------------------------------
hw/ppc/spapr_hcall.c | 33 ++++++++++++++++++++++++++++++
hw/ppc/spapr_rtas.c | 1 +
4 files changed, 39 insertions(+), 34 deletions(-)
diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index 69a52e39b8..5feb0ecc2a 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -47,6 +47,7 @@ typedef struct SpaprCpuState {
uint64_t vpa_addr;
uint64_t slb_shadow_addr, slb_shadow_size;
uint64_t dtl_addr, dtl_size;
+ bool dispatched; /* for vpa dispatch counter tracking */
bool prod; /* not migrated, only used to improve dispatch latencies */
struct ICPState *icp;
struct XiveTCTX *tctx;
@@ -61,4 +62,6 @@ static inline SpaprCpuState *spapr_cpu_state(PowerPCCPU *cpu)
return (SpaprCpuState *)cpu->machine_data;
}
+void vpa_dispatch(CPUState *cs, SpaprCpuState *spapr_cpu, bool dispatch);
+
#endif
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c417f9dd52..07b004fdec 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4599,47 +4599,15 @@ static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
{
SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
- /* These are only called by TCG, KVM maintains dispatch state */
-
spapr_cpu->prod = false;
- if (spapr_cpu->vpa_addr) {
+ if (!spapr_cpu->dispatched) {
CPUState *cs = CPU(cpu);
- uint32_t dispatch;
-
- dispatch = ldl_be_phys(cs->as,
- spapr_cpu->vpa_addr + VPA_DISPATCH_COUNTER);
- dispatch++;
- if ((dispatch & 1) != 0) {
- qemu_log_mask(LOG_GUEST_ERROR,
- "VPA: incorrect dispatch counter value for "
- "dispatched partition %u, correcting.\n", dispatch);
- dispatch++;
- }
- stl_be_phys(cs->as,
- spapr_cpu->vpa_addr + VPA_DISPATCH_COUNTER, dispatch);
+ vpa_dispatch(cs, spapr_cpu, true);
}
}
static void spapr_cpu_exec_exit(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
{
- SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
-
- if (spapr_cpu->vpa_addr) {
- CPUState *cs = CPU(cpu);
- uint32_t dispatch;
-
- dispatch = ldl_be_phys(cs->as,
- spapr_cpu->vpa_addr + VPA_DISPATCH_COUNTER);
- dispatch++;
- if ((dispatch & 1) != 1) {
- qemu_log_mask(LOG_GUEST_ERROR,
- "VPA: incorrect dispatch counter value for "
- "preempted partition %u, correcting.\n", dispatch);
- dispatch++;
- }
- stl_be_phys(cs->as,
- spapr_cpu->vpa_addr + VPA_DISPATCH_COUNTER, dispatch);
- }
}
static void spapr_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 5e1d020e3d..907e09c2c3 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -487,6 +487,36 @@ static target_ulong h_register_vpa(PowerPCCPU *cpu, SpaprMachineState *spapr,
return ret;
}
+void vpa_dispatch(CPUState *cs, SpaprCpuState *spapr_cpu, bool dispatch)
+{
+ uint32_t counter;
+
+ if (!dispatch) {
+ assert(spapr_cpu->dispatched);
+ } else {
+ assert(!spapr_cpu->dispatched);
+ }
+ spapr_cpu->dispatched = dispatch;
+
+ return;
+
+ if (!spapr_cpu->vpa_addr) {
+ return;
+ }
+
+ /* These are only called by TCG, KVM maintains dispatch state */
+ counter = ldl_be_phys(cs->as, spapr_cpu->vpa_addr + VPA_DISPATCH_COUNTER);
+ counter++;
+ if ((counter & 1) != dispatch) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "VPA: incorrect dispatch counter value for "
+ "%s partition %u, correcting.\n",
+ dispatch ? "preempted" : "running", counter);
+ counter++;
+ }
+ stl_be_phys(cs->as, spapr_cpu->vpa_addr + VPA_DISPATCH_COUNTER, counter);
+}
+
static target_ulong h_cede(PowerPCCPU *cpu, SpaprMachineState *spapr,
target_ulong opcode, target_ulong *args)
{
@@ -505,6 +535,7 @@ static target_ulong h_cede(PowerPCCPU *cpu, SpaprMachineState *spapr,
if (!cpu_has_work(cs)) {
cs->halted = 1;
+ vpa_dispatch(cs, spapr_cpu, false);
cs->exception_index = EXCP_HLT;
cs->exit_request = 1;
ppc_maybe_interrupt(env);
@@ -531,6 +562,8 @@ static target_ulong h_confer_self(PowerPCCPU *cpu)
cs->exit_request = 1;
ppc_maybe_interrupt(&cpu->env);
+ vpa_dispatch(cs, spapr_cpu, false);
+
return H_SUCCESS;
}
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index f329693c55..8ce4230223 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -216,6 +216,7 @@ static void rtas_stop_self(PowerPCCPU *cpu, SpaprMachineState *spapr,
*/
env->spr[SPR_PSSCR] |= PSSCR_EC;
cs->halted = 1;
+ vpa_dispatch(cs, spapr_cpu_state(cpu), false);
ppc_store_lpcr(cpu, env->spr[SPR_LPCR] & ~pcc->lpcr_pm);
kvmppc_set_reg_ppc_online(cpu, 0);
qemu_cpu_kick(cs);
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 24/24] tests/avocado: replay_linux.py add ppc64 pseries test
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
` (22 preceding siblings ...)
2024-03-18 15:46 ` [PATCH v5 23/24] spapr: Fix vpa dispatch count for record-replay Nicholas Piggin
@ 2024-03-18 15:46 ` Nicholas Piggin
23 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-18 15:46 UTC (permalink / raw)
To: qemu-devel
Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
Michael Tokarev
Add a ppc64 pseries test. IO in SLOF is very slow when running in
record-replay modes, so this test uses guestfish to extract the
kernel and initrd and boot them directly, bypassing SLOF.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/avocado/replay_linux.py | 72 ++++++++++++++++++++++++++++++++++-
1 file changed, 71 insertions(+), 1 deletion(-)
diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index b3b74a367c..565a600130 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -11,8 +11,9 @@
import os
import logging
import time
+import subprocess
-from avocado import skipUnless
+from avocado import skipIf, skipUnless
from avocado_qemu import BUILD_DIR
from avocado.utils import cloudinit
from avocado.utils import network
@@ -55,6 +56,8 @@ def vm_add_disk(self, vm, path, id, device):
'%s,drive=disk%s-rr%s' % (device, id, bus_string))
def vm_add_cdrom(self, vm, path, id, device):
+ vm.add_args('-device',
+ '%s,drive=disk%s' % (device, id))
vm.add_args('-drive', 'file=%s,id=disk%s,if=none,media=cdrom' % (path, id))
def launch_and_wait(self, record, args, shift):
@@ -208,3 +211,70 @@ def test_virt_gicv3(self):
self.run_rr(shift=3,
args=(*self.get_common_args(),
"-machine", "virt,gic-version=3"))
+
+# ppc64 pseries test.
+#
+# This machine tends to fail replay and hang very close to the end of the
+# trace, with missing events, which is still an open issue.
+#
+# spapr-scsi IO driven by SLOF/grub is extremely slow in record/replay mode,
+# so jump through some hoops to boot the kernel directly. With this, the test
+# runs in about 5 minutes (modulo hang), which suggests other machines may
+# have similar issues and could benefit from bypassing bootloaders.
+#
+ppc_deps = ["guestfish"] # dependent tools needed in the test setup/box.
+
+def which(tool):
+ """ looks up the full path for @tool, returns None if not found
+ or if @tool does not have executable permissions.
+ """
+ paths=os.getenv('PATH')
+ for p in paths.split(os.path.pathsep):
+ p = os.path.join(p, tool)
+ if os.path.exists(p) and os.access(p, os.X_OK):
+ return p
+ return None
+
+def ppc_missing_deps():
+ """ returns True if any of the test dependent tools are absent.
+ """
+ for dep in ppc_deps:
+ if which(dep) is None:
+ return True
+ return False
+
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
+class ReplayLinuxPPC64(ReplayLinux):
+ """
+ :avocado: tags=arch:ppc64
+ :avocado: tags=accel:tcg
+ """
+
+ hdd = 'virtio-blk-pci'
+ cd = 'scsi-cd'
+ bus = None
+
+ def setUp(self):
+ super().setUp()
+
+ if not ppc_missing_deps():
+ # kernel, initramfs, and kernel cmdline are all taken by hand from
+ # the Fedora image.
+ self.kernel="vmlinuz-5.3.7-301.fc31.ppc64le"
+ self.initramfs="initramfs-5.3.7-301.fc31.ppc64le.img"
+ cmd = "guestfish --ro -a %s run " ": mount /dev/sda2 / " ": copy-out /boot/%s %s " ": copy-out /boot/%s %s " % (self.boot_path, self.kernel, self.workdir, self.initramfs, self.workdir)
+ subprocess.run(cmd.split())
+
+ @skipIf(ppc_missing_deps(), 'dependencies (%s) not installed' % ','.join(ppc_deps))
+ def test_pseries(self):
+ """
+ :avocado: tags=machine:pseries
+ """
+ kernel=os.path.normpath(os.path.join(self.workdir, self.kernel))
+ initramfs=os.path.normpath(os.path.join(self.workdir, self.initramfs))
+ cmdline="root=UUID=8a409ee6-3cb3-4b06-a266-39e2dae3e5fa ro no_timer_check net.fnames=0 console=tty1 console=ttyS0,115200n8"
+ self.run_rr(shift=3, args=("-device", "spapr-vscsi",
+ "-machine", "x-vof=on",
+ "-kernel", kernel,
+ "-initrd", initramfs,
+ "-append", cmdline))
--
2.42.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v5 13/24] tests/avocado: replay_linux.py remove the timeout expected guards
2024-03-18 15:46 ` [PATCH v5 13/24] tests/avocado: replay_linux.py remove the timeout expected guards Nicholas Piggin
@ 2024-03-19 17:57 ` Alex Bennée
2024-03-20 4:34 ` Nicholas Piggin
0 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 17:57 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> replay_linux tests with virtio on aarch64 gciv3 and x86-64 q35 machines
> seems to be more reliable now, so timeouts are no longer expected.
> pc_i440fx, gciv2, and non-virtio still have problems, so mark them as
> flaky: they are not just long-running, but can hang indefinitely.
>
> These tests take about 400 seconds each, so add the SPEED=slow guard.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> tests/avocado/replay_linux.py | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
> index b3b91ddd9a..b3b74a367c 100644
> --- a/tests/avocado/replay_linux.py
> +++ b/tests/avocado/replay_linux.py
> @@ -118,7 +118,7 @@ def run_replay_dump(self, replay_path):
> except subprocess.CalledProcessError:
> self.fail('replay-dump.py failed')
>
> -@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
> +@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
> class ReplayLinuxX8664(ReplayLinux):
> """
> :avocado: tags=arch:x86_64
> @@ -127,19 +127,21 @@ class ReplayLinuxX8664(ReplayLinux):
>
> chksum = 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
>
> + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
> def test_pc_i440fx(self):
> """
> :avocado: tags=machine:pc
> """
> self.run_rr(shift=1)
>
> + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
> def test_pc_q35(self):
> """
> :avocado: tags=machine:q35
> """
> self.run_rr(shift=3)
>
> -@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
> +@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
> class ReplayLinuxX8664Virtio(ReplayLinux):
> """
> :avocado: tags=arch:x86_64
> @@ -153,6 +155,7 @@ class ReplayLinuxX8664Virtio(ReplayLinux):
>
> chksum = 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
>
> + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
> def test_pc_i440fx(self):
> """
> :avocado: tags=machine:pc
> @@ -165,7 +168,7 @@ def test_pc_q35(self):
> """
> self.run_rr(shift=3)
>
> -@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
> +@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
> class ReplayLinuxAarch64(ReplayLinux):
> """
> :avocado: tags=accel:tcg
> @@ -187,6 +190,7 @@ def get_common_args(self):
> '-device', 'virtio-rng-pci,rng=rng0',
> '-object', 'rng-builtin,id=rng0')
>
> + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is
> unstable')
This needs to apply to both I think:
(5/7) ./tests/avocado/replay_linux.py:ReplayLinuxAarch64.test_virt_gicv2: SKIP: Test is unstable
(6/7)
./tests/avocado/replay_linux.py:ReplayLinuxAarch64.test_virt_gicv3:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'6-./tests/avocado/replay_linux.py:ReplayLinuxAarch64.test_virt_gicv3',
'logdir':
'/home/alex/avocado/job-results/job-2024-03-19T16.50-686495d/test-results/...
(1800.17 s)
With that:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 05/24] Revert "replay: stop us hanging in rr_wait_io_event"
2024-03-18 15:46 ` [PATCH v5 05/24] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
@ 2024-03-19 17:58 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 17:58 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> This reverts commit 1f881ea4a444ef36a8b6907b0b82be4b3af253a2.
>
> That commit causes reverse_debugging.py test failures, and does
> not seem to solve the root cause of the problem x86-64 still
> hangs in record/replay tests.
>
> The problem with short-cutting the iowait that was taken during
> record phase is that related events will not get consumed at the
> same points (e.g., reading the clock).
>
> A hang with zero icount always seems to be a symptom of an earlier
> problem that has caused the recording to become out of synch with
> the execution and consumption of events by replay.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 06/24] tests/avocado: replay_kernel.py add x86-64 q35 machine test
2024-03-18 15:46 ` [PATCH v5 06/24] tests/avocado: replay_kernel.py add x86-64 q35 machine test Nicholas Piggin
@ 2024-03-19 17:58 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 17:58 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> The x86-64 pc machine is flaky with record/replay, but q35 is more
> stable. Add a q35 test to replay_kernel.py.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 07/24] chardev: set record/replay on the base device of a muxed device
2024-03-18 15:46 ` [PATCH v5 07/24] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
@ 2024-03-19 19:17 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 19:17 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> chardev events to a muxed device don't get recorded because e.g.,
> qemu_chr_be_write() checks whether the base device has the record flag
> set.
>
> This can be seen when replaying a trace that has characters typed into
> the console, an examination of the log shows they are not recorded.
>
> Setting QEMU_CHAR_FEATURE_REPLAY on the base chardev fixes the problem.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 08/24] replay: Fix migration use of clock
2024-03-18 15:46 ` [PATCH v5 08/24] replay: Fix migration use of clock Nicholas Piggin
@ 2024-03-19 20:40 ` Alex Bennée
2024-03-20 4:48 ` Nicholas Piggin
0 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 20:40 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> Migration reads host clocks when not holding the replay_mutex, which
> asserts when recording a trace. It seems that these migration times
> should be host times like other statistics in MigrationState.
s/host/CLOCK_HOST/ and s/host/CLOCK_REALTIME/ but its a confusing
sentence. If the MigrationState is guest visible it should be
QEMU_CLOCK_VIRTUAL surely?
> These
> do not require the replay_mutex.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> migration/migration.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 644e073b7d..2c286ccf63 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3424,7 +3424,7 @@ static void *migration_thread(void *opaque)
> {
> MigrationState *s = opaque;
> MigrationThread *thread = NULL;
> - int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> + int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> MigThrError thr_error;
> bool urgent = false;
>
> @@ -3476,7 +3476,7 @@ static void *migration_thread(void *opaque)
> qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> MIGRATION_STATUS_ACTIVE);
>
> - s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> + s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
>
> trace_migration_thread_setup_complete();
>
> @@ -3555,7 +3555,7 @@ static void *bg_migration_thread(void *opaque)
>
> migration_rate_set(RATE_LIMIT_DISABLED);
>
> - setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> + setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> /*
> * We want to save vmstate for the moment when migration has been
> * initiated but also we want to save RAM content while VM is running.
> @@ -3588,7 +3588,7 @@ static void *bg_migration_thread(void *opaque)
> qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> MIGRATION_STATUS_ACTIVE);
>
> - s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> + s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
>
> trace_migration_thread_setup_complete();
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state
2024-03-18 15:46 ` [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
@ 2024-03-19 20:41 ` Alex Bennée
2024-04-05 6:53 ` Pavel Dovgalyuk
1 sibling, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 20:41 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> The regular qemu_bh_schedule() calls result in non-deterministic
> execution of the bh in record-replay mode, which causes replay failure.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 12/24] savevm: Fix load_snapshot error path crash
2024-03-18 15:46 ` [PATCH v5 12/24] savevm: Fix load_snapshot error path crash Nicholas Piggin
@ 2024-03-19 20:49 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 20:49 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> An error path missed setting *errp, which can cause a NULL deref.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine
2024-03-18 15:46 ` [PATCH v5 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine Nicholas Piggin
@ 2024-03-19 20:50 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2024-03-19 20:50 UTC (permalink / raw)
To: Nicholas Piggin
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Nicholas Piggin <npiggin@gmail.com> writes:
> The x86-64 pc machine has a problem with record/replay. q35 seems
> to work well. Add a new q35 test and update the flaky message for
> pc.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 13/24] tests/avocado: replay_linux.py remove the timeout expected guards
2024-03-19 17:57 ` Alex Bennée
@ 2024-03-20 4:34 ` Nicholas Piggin
0 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-20 4:34 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
On Wed Mar 20, 2024 at 3:57 AM AEST, Alex Bennée wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > replay_linux tests with virtio on aarch64 gciv3 and x86-64 q35 machines
> > seems to be more reliable now, so timeouts are no longer expected.
> > pc_i440fx, gciv2, and non-virtio still have problems, so mark them as
> > flaky: they are not just long-running, but can hang indefinitely.
> >
> > These tests take about 400 seconds each, so add the SPEED=slow guard.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > tests/avocado/replay_linux.py | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
> > index b3b91ddd9a..b3b74a367c 100644
> > --- a/tests/avocado/replay_linux.py
> > +++ b/tests/avocado/replay_linux.py
> > @@ -118,7 +118,7 @@ def run_replay_dump(self, replay_path):
> > except subprocess.CalledProcessError:
> > self.fail('replay-dump.py failed')
> >
> > -@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
> > +@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
> > class ReplayLinuxX8664(ReplayLinux):
> > """
> > :avocado: tags=arch:x86_64
> > @@ -127,19 +127,21 @@ class ReplayLinuxX8664(ReplayLinux):
> >
> > chksum = 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
> >
> > + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
> > def test_pc_i440fx(self):
> > """
> > :avocado: tags=machine:pc
> > """
> > self.run_rr(shift=1)
> >
> > + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
> > def test_pc_q35(self):
> > """
> > :avocado: tags=machine:q35
> > """
> > self.run_rr(shift=3)
> >
> > -@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
> > +@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
> > class ReplayLinuxX8664Virtio(ReplayLinux):
> > """
> > :avocado: tags=arch:x86_64
> > @@ -153,6 +155,7 @@ class ReplayLinuxX8664Virtio(ReplayLinux):
> >
> > chksum = 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
> >
> > + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable')
> > def test_pc_i440fx(self):
> > """
> > :avocado: tags=machine:pc
> > @@ -165,7 +168,7 @@ def test_pc_q35(self):
> > """
> > self.run_rr(shift=3)
> >
> > -@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
> > +@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
> > class ReplayLinuxAarch64(ReplayLinux):
> > """
> > :avocado: tags=accel:tcg
> > @@ -187,6 +190,7 @@ def get_common_args(self):
> > '-device', 'virtio-rng-pci,rng=rng0',
> > '-object', 'rng-builtin,id=rng0')
> >
> > + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is
> > unstable')
>
> This needs to apply to both I think:
>
> (5/7) ./tests/avocado/replay_linux.py:ReplayLinuxAarch64.test_virt_gicv2: SKIP: Test is unstable
> (6/7)
> ./tests/avocado/replay_linux.py:ReplayLinuxAarch64.test_virt_gicv3:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '6-./tests/avocado/replay_linux.py:ReplayLinuxAarch64.test_virt_gicv3',
> 'logdir':
> '/home/alex/avocado/job-results/job-2024-03-19T16.50-686495d/test-results/...
> (1800.17 s)
Oh, aarch64 is hanging for you too? It was passing for me... But I'll
guard it out for now and maybe we can get back to it later.
Thanks,
Nick
>
> With that:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 08/24] replay: Fix migration use of clock
2024-03-19 20:40 ` Alex Bennée
@ 2024-03-20 4:48 ` Nicholas Piggin
0 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2024-03-20 4:48 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
On Wed Mar 20, 2024 at 6:40 AM AEST, Alex Bennée wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > Migration reads host clocks when not holding the replay_mutex, which
> > asserts when recording a trace. It seems that these migration times
> > should be host times like other statistics in MigrationState.
>
> s/host/CLOCK_HOST/ and s/host/CLOCK_REALTIME/ but its a confusing
> sentence.
Yes you're right.
> If the MigrationState is guest visible it should be
> QEMU_CLOCK_VIRTUAL surely?
I didn't think it was visible. It was added with ed4fbd10823 and
just exported to QMP.
It was the first and only user of host clock in migration stats,
the rest use rt clock. OTOH it does seem to have deliberately
chosen host... can you see any reason for it?
Thanks,
Nick
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state
2024-03-18 15:46 ` [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
2024-03-19 20:41 ` Alex Bennée
@ 2024-04-05 6:53 ` Pavel Dovgalyuk
1 sibling, 0 replies; 37+ messages in thread
From: Pavel Dovgalyuk @ 2024-04-05 6:53 UTC (permalink / raw)
To: Nicholas Piggin, qemu-devel
Cc: Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée,
Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
On 18.03.2024 18:46, Nicholas Piggin wrote:
> The regular qemu_bh_schedule() calls result in non-deterministic
> execution of the bh in record-replay mode, which causes replay failure.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> hw/net/virtio-net.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9959f1932b..6ac737f2cf 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -40,6 +40,7 @@
> #include "migration/misc.h"
> #include "standard-headers/linux/ethtool.h"
> #include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
> #include "trace.h"
> #include "monitor/qdev.h"
> #include "monitor/monitor.h"
> @@ -416,7 +417,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
> timer_mod(q->tx_timer,
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
> } else {
> - qemu_bh_schedule(q->tx_bh);
> + replay_bh_schedule_event(q->tx_bh);
> }
> } else {
> if (q->tx_timer) {
> @@ -2724,7 +2725,7 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
> */
> virtio_queue_set_notification(q->tx_vq, 0);
> if (q->tx_bh) {
> - qemu_bh_schedule(q->tx_bh);
> + replay_bh_schedule_event(q->tx_bh);
> } else {
> timer_mod(q->tx_timer,
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
> @@ -2879,7 +2880,7 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
> return;
> }
> virtio_queue_set_notification(vq, 0);
> - qemu_bh_schedule(q->tx_bh);
> + replay_bh_schedule_event(q->tx_bh);
> }
>
> static void virtio_net_tx_timer(void *opaque)
> @@ -2962,7 +2963,7 @@ static void virtio_net_tx_bh(void *opaque)
> /* If we flush a full burst of packets, assume there are
> * more coming and immediately reschedule */
> if (ret >= n->tx_burst) {
> - qemu_bh_schedule(q->tx_bh);
> + replay_bh_schedule_event(q->tx_bh);
> q->tx_waiting = 1;
> return;
> }
> @@ -2976,7 +2977,7 @@ static void virtio_net_tx_bh(void *opaque)
> return;
> } else if (ret > 0) {
> virtio_queue_set_notification(q->tx_vq, 0);
> - qemu_bh_schedule(q->tx_bh);
> + replay_bh_schedule_event(q->tx_bh);
> q->tx_waiting = 1;
> }
> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 20/24] replay: simple auto-snapshot mode for record
2024-03-18 15:46 ` [PATCH v5 20/24] replay: simple auto-snapshot mode for record Nicholas Piggin
@ 2024-04-05 6:56 ` Pavel Dovgalyuk
0 siblings, 0 replies; 37+ messages in thread
From: Pavel Dovgalyuk @ 2024-04-05 6:56 UTC (permalink / raw)
To: Nicholas Piggin, qemu-devel
Cc: Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée,
Paolo Bonzini, John Snow, Cleber Rosa,
Wainer dos Santos Moschetta, Beraldo Leal, Michael Tokarev
On 18.03.2024 18:46, Nicholas Piggin wrote:
> record makes an initial snapshot when the machine is created, to enable
> reverse-debugging. Often the issue being debugged appears near the end of
> the trace, so it is important for performance to keep snapshots close to
> the end.
>
> This implements a periodic snapshot mode that keeps a rolling set of
> recent snapshots. This could be done by the debugger or other program
> that talks QMP, but for setting up simple scenarios and tests, this is
> more convenient.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> docs/system/replay.rst | 5 ++++
> include/sysemu/replay.h | 11 ++++++++
> replay/replay-snapshot.c | 57 ++++++++++++++++++++++++++++++++++++++++
> replay/replay.c | 27 +++++++++++++++++--
> system/vl.c | 9 +++++++
> qemu-options.hx | 9 +++++--
> 6 files changed, 114 insertions(+), 4 deletions(-)
>
> diff --git a/docs/system/replay.rst b/docs/system/replay.rst
> index ca7c17c63d..1ae8614475 100644
> --- a/docs/system/replay.rst
> +++ b/docs/system/replay.rst
> @@ -156,6 +156,11 @@ for storing VM snapshots. Here is the example of the command line for this:
> ``empty.qcow2`` drive does not connected to any virtual block device and used
> for VM snapshots only.
>
> +``rrsnapmode`` can be used to select just an initial snapshot or periodic
> +snapshots, with ``rrsnapcount`` specifying the number of periodic snapshots
> +to maintain, and ``rrsnaptime`` the amount of run time in seconds between
> +periodic snapshots.
> +
> .. _network-label:
>
> Network devices
> diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
> index 8102fa54f0..92fa82842b 100644
> --- a/include/sysemu/replay.h
> +++ b/include/sysemu/replay.h
> @@ -48,6 +48,17 @@ typedef enum ReplayCheckpoint ReplayCheckpoint;
>
> typedef struct ReplayNetState ReplayNetState;
>
> +enum ReplaySnapshotMode {
> + REPLAY_SNAPSHOT_MODE_INITIAL,
> + REPLAY_SNAPSHOT_MODE_PERIODIC,
> +};
> +typedef enum ReplaySnapshotMode ReplaySnapshotMode;
> +
> +extern ReplaySnapshotMode replay_snapshot_mode;
> +
> +extern uint64_t replay_snapshot_periodic_delay;
> +extern int replay_snapshot_periodic_nr_keep;
Please put the internal variables and enum into the replay-internal.h
> +
> /* Name of the initial VM snapshot */
> extern char *replay_snapshot;
>
> diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
> index ccb4d89dda..762555feaa 100644
> --- a/replay/replay-snapshot.c
> +++ b/replay/replay-snapshot.c
> @@ -70,6 +70,53 @@ void replay_vmstate_register(void)
> vmstate_register(NULL, 0, &vmstate_replay, &replay_state);
> }
>
> +static QEMUTimer *replay_snapshot_timer;
> +static int replay_snapshot_count;
> +
> +static void replay_snapshot_timer_cb(void *opaque)
> +{
> + Error *err = NULL;
> + char *name;
> +
> + if (!replay_can_snapshot()) {
> + /* Try again soon */
> + timer_mod(replay_snapshot_timer,
> + qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> + replay_snapshot_periodic_delay / 10);
> + return;
> + }
> +
> + name = g_strdup_printf("%s-%d", replay_snapshot, replay_snapshot_count);
> + if (!save_snapshot(name,
> + true, NULL, false, NULL, &err)) {
> + error_report_err(err);
> + error_report("Could not create periodic snapshot "
> + "for icount record, disabling");
> + g_free(name);
> + return;
> + }
> + g_free(name);
> + replay_snapshot_count++;
> +
> + if (replay_snapshot_periodic_nr_keep >= 1 &&
> + replay_snapshot_count > replay_snapshot_periodic_nr_keep) {
> + int del_nr;
> +
> + del_nr = replay_snapshot_count - replay_snapshot_periodic_nr_keep - 1;
> + name = g_strdup_printf("%s-%d", replay_snapshot, del_nr);
> + if (!delete_snapshot(name, false, NULL, &err)) {
> + error_report_err(err);
> + error_report("Could not delete periodic snapshot "
> + "for icount record");
> + }
> + g_free(name);
> + }
> +
> + timer_mod(replay_snapshot_timer,
> + qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> + replay_snapshot_periodic_delay);
> +}
> +
> void replay_vmstate_init(void)
> {
> Error *err = NULL;
> @@ -82,6 +129,16 @@ void replay_vmstate_init(void)
> error_report("Could not create snapshot for icount record");
> exit(1);
> }
> +
> + if (replay_snapshot_mode == REPLAY_SNAPSHOT_MODE_PERIODIC) {
> + replay_snapshot_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> + replay_snapshot_timer_cb,
> + NULL);
> + timer_mod(replay_snapshot_timer,
> + qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> + replay_snapshot_periodic_delay);
> + }
> +
> } else if (replay_mode == REPLAY_MODE_PLAY) {
> if (!load_snapshot(replay_snapshot, NULL, false, NULL, &err)) {
> error_report_err(err);
> diff --git a/replay/replay.c b/replay/replay.c
> index 895fa6b67a..c916e71d30 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -29,6 +29,10 @@
> ReplayMode replay_mode = REPLAY_MODE_NONE;
> char *replay_snapshot;
>
> +ReplaySnapshotMode replay_snapshot_mode;
> +uint64_t replay_snapshot_periodic_delay;
> +int replay_snapshot_periodic_nr_keep;
> +
> /* Name of replay file */
> static char *replay_filename;
> ReplayState replay_state;
> @@ -424,6 +428,27 @@ void replay_configure(QemuOpts *opts)
> }
>
> replay_snapshot = g_strdup(qemu_opt_get(opts, "rrsnapshot"));
> + if (replay_snapshot && mode == REPLAY_MODE_RECORD) {
> + const char *snapmode;
> +
> + snapmode = qemu_opt_get(opts, "rrsnapmode");
> + if (!snapmode || !strcmp(snapmode, "initial")) {
> + replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_INITIAL;
> + } else if (!strcmp(snapmode, "periodic")) {
> + replay_snapshot_mode = REPLAY_SNAPSHOT_MODE_PERIODIC;
> + } else {
> + error_report("Invalid rrsnapmode option: %s", snapmode);
> + exit(1);
> + }
> +
> + /* Default 10 host seconds of machine runtime per snapshot. */
> + replay_snapshot_periodic_delay =
> + qemu_opt_get_number(opts, "rrsnaptime", 10) * 1000;
> +
> + /* Default 2, to cover at least the last 10 host seconds of runtime. */
> + replay_snapshot_periodic_nr_keep =
> + qemu_opt_get_number(opts, "rrsnapcount", 2);
> + }
> replay_vmstate_register();
> replay_enable(fname, mode);
>
> @@ -446,8 +471,6 @@ void replay_start(void)
> exit(1);
> }
>
> - /* Timer for snapshotting will be set up here. */
> -
> replay_enable_events();
> }
>
> diff --git a/system/vl.c b/system/vl.c
> index 70f4cece7f..8070cee6da 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -447,6 +447,15 @@ static QemuOptsList qemu_icount_opts = {
> }, {
> .name = "rrsnapshot",
> .type = QEMU_OPT_STRING,
> + }, {
> + .name = "rrsnapmode",
> + .type = QEMU_OPT_STRING,
> + }, {
> + .name = "rrsnaptime",
> + .type = QEMU_OPT_NUMBER,
> + }, {
> + .name = "rrsnapcount",
> + .type = QEMU_OPT_NUMBER,
> },
> { /* end of list */ }
> },
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 7fd1713fa8..51f2498f12 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4665,13 +4665,13 @@ SRST
> ERST
>
> DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
> - "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>]]\n" \
> + "-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=<filename>[,rrsnapshot=<snapshot>][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]\n" \
> " enable virtual instruction counter with 2^N clock ticks per\n" \
> " instruction, enable aligning the host and virtual clocks\n" \
> " or disable real time cpu sleeping, and optionally enable\n" \
> " record-and-replay mode\n", QEMU_ARCH_ALL)
> SRST
> -``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot]]``
> +``-icount [shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]]``
> Enable virtual instruction counter. The virtual cpu will execute one
> instruction every 2^N ns of virtual time. If ``auto`` is specified
> then the virtual cpu speed will be automatically adjusted to keep
> @@ -4713,6 +4713,11 @@ SRST
> name. In record mode, a new VM snapshot with the given name is created
> at the start of execution recording. In replay mode this option
> specifies the snapshot name used to load the initial VM state.
> + ``rrsnapmode=periodic`` will additionally cause a periodic snapshot to
> + be created after ``rrsnaptime=secs`` seconds of real runtime. The last
> + ``rrsnapcount=N`` periodic snapshots (not including the initial) will
> + be kept (0 for infinite). Periodic snapshots are useful to speed
> + reverse debugging operations near the end of the recorded trace.
> ERST
>
> DEF("watchdog-action", HAS_ARG, QEMU_OPTION_watchdog_action, \
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2024-04-05 6:57 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-18 15:45 [PATCH v5 00/24] replay: fixes and new test cases Nicholas Piggin
2024-03-18 15:45 ` [PATCH v5 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
2024-03-18 15:45 ` [PATCH v5 02/24] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 03/24] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 04/24] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 05/24] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
2024-03-19 17:58 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 06/24] tests/avocado: replay_kernel.py add x86-64 q35 machine test Nicholas Piggin
2024-03-19 17:58 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 07/24] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
2024-03-19 19:17 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 08/24] replay: Fix migration use of clock Nicholas Piggin
2024-03-19 20:40 ` Alex Bennée
2024-03-20 4:48 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 09/24] replay: Fix migration replay_mutex locking Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 10/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
2024-03-19 20:41 ` Alex Bennée
2024-04-05 6:53 ` Pavel Dovgalyuk
2024-03-18 15:46 ` [PATCH v5 11/24] virtio-net: Use virtual time for RSC timers Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 12/24] savevm: Fix load_snapshot error path crash Nicholas Piggin
2024-03-19 20:49 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 13/24] tests/avocado: replay_linux.py remove the timeout expected guards Nicholas Piggin
2024-03-19 17:57 ` Alex Bennée
2024-03-20 4:34 ` Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 14/24] tests/avocado: reverse_debugging.py mark aarch64 and pseries as not flaky Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine Nicholas Piggin
2024-03-19 20:50 ` Alex Bennée
2024-03-18 15:46 ` [PATCH v5 16/24] tests/avocado: reverse_debugging.py verify addresses between record and replay Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 17/24] tests/avocado: reverse_debugging.py stop VM before sampling icount Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 18/24] tests/avocado: reverse_debugging reverse-step at the end of the trace Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 19/24] tests/avocado: reverse_debugging.py add snapshot testing Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 20/24] replay: simple auto-snapshot mode for record Nicholas Piggin
2024-04-05 6:56 ` Pavel Dovgalyuk
2024-03-18 15:46 ` [PATCH v5 21/24] tests/avocado: reverse_debugging.py test auto-snapshot mode Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 22/24] target/ppc: fix timebase register reset state Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 23/24] spapr: Fix vpa dispatch count for record-replay Nicholas Piggin
2024-03-18 15:46 ` [PATCH v5 24/24] tests/avocado: replay_linux.py add ppc64 pseries test Nicholas Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).