* [PATCH v3 0/9] replay: fixes and new test cases @ 2024-02-26 8:29 Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 1/9] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin ` (8 more replies) 0 siblings, 9 replies; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Since v2, here fixes became less minor so I rename the series. https://lore.kernel.org/qemu-devel/20240125160835.480488-1-npiggin@gmail.com/#r) * Found several more bugs (patches 5-8). * Enable the rr avocado test on pseries and aarch64 virt since they're passing here (and on gitlab, e.g., https://gitlab.com/npiggin/qemu/-/jobs/6253787216, https://gitlab.com/npiggin/qemu/-/jobs/6253787218). * Updated replay-dump script to John's feedback. x86-64 still has issues with replay and reverse debugging tests. replay_kernel.py seems to be timing dependent -- after patch 5 I had it pass 30/30 runs, then the following day 0/30 and I realized I had several other QEMU instances hogging the CPU which probably changed timings. So the first thing I would look at is timers and clocks. pseries had some rounding issues in time calculations that meant clock/timer were not replayed exactly as they were recorded, which caused hangs. Thanks, Nick Nicholas Piggin (9): scripts/replay-dump.py: Update to current rr record format scripts/replay-dump.py: rejig decoders in event number order tests/avocado: excercise scripts/replay-dump.py in replay tests replay: allow runstate shutdown->running when replaying trace Revert "replay: stop us hanging in rr_wait_io_event" chardev: set record/replay on the base device of a muxed device replay: Fix migration use of clock replay: Fix migration replay_mutex locking tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky include/sysemu/replay.h | 5 - include/sysemu/runstate.h | 1 + migration/migration.h | 2 - accel/tcg/tcg-accel-ops-rr.c | 2 +- chardev/char.c | 71 ++++++++---- migration/migration.c | 19 +++- replay/replay.c | 23 +--- system/runstate.c | 19 ++++ scripts/replay-dump.py | 167 ++++++++++++++++++++--------- tests/avocado/replay_kernel.py | 16 +++ tests/avocado/replay_linux.py | 15 +++ tests/avocado/reverse_debugging.py | 9 +- 12 files changed, 233 insertions(+), 116 deletions(-) -- 2.42.0 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 1/9] scripts/replay-dump.py: Update to current rr record format 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-27 15:57 ` Alex Bennée 2024-02-26 8:29 ` [PATCH v3 2/9] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin ` (7 subsequent siblings) 8 siblings, 1 reply; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal The v12 format support for replay-dump has a few issues still. This fixes async decoding; adds event, shutdown, and end decoding; fixes audio in / out events, fixes checkpoint checking of following async events. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- scripts/replay-dump.py | 127 ++++++++++++++++++++++++++++++----------- 1 file changed, 93 insertions(+), 34 deletions(-) diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py index d668193e79..419ee3257b 100755 --- a/scripts/replay-dump.py +++ b/scripts/replay-dump.py @@ -20,6 +20,7 @@ import argparse import struct +import os from collections import namedtuple from os import path @@ -134,6 +135,17 @@ def swallow_async_qword(eid, name, dumpfile): print(" %s(%d) @ %d" % (name, eid, step_id)) return True +def swallow_bytes(eid, name, dumpfile, nr): + """Swallow nr bytes of data without looking at it""" + dumpfile.seek(nr, os.SEEK_CUR) + +def decode_exception(eid, name, dumpfile): + print_event(eid, name) + return True + +# v12 does away with the additional event byte and encodes it in the main type +# Between v8 and v9, REPLAY_ASYNC_BH_ONESHOT was added, but we don't decode +# those versions so leave it out. async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", swallow_async_qword), Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp), Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp), @@ -142,8 +154,8 @@ def swallow_async_qword(eid, name, dumpfile): Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp), ] # See replay_read_events/replay_read_event -def decode_async(eid, name, dumpfile): - """Decode an ASYNC event""" +def decode_async_old(eid, name, dumpfile): + """Decode an ASYNC event (pre-v8)""" print_event(eid, name) @@ -157,6 +169,35 @@ def decode_async(eid, name, dumpfile): return call_decode(async_decode_table, async_event_kind, dumpfile) +def decode_async_bh(eid, name, dumpfile): + op_id = read_qword(dumpfile) + print_event(eid, name) + return True + +def decode_async_bh_oneshot(eid, name, dumpfile): + op_id = read_qword(dumpfile) + print_event(eid, name) + return True + +def decode_async_char_read(eid, name, dumpfile): + char_id = read_byte(dumpfile) + size = read_dword(dumpfile) + print_event(eid, name, "device:%x chars:%s" % (char_id, dumpfile.read(size))) + return True + +def decode_async_block(eid, name, dumpfile): + op_id = read_qword(dumpfile) + print_event(eid, name) + return True + +def decode_async_net(eid, name, dumpfile): + net_id = read_byte(dumpfile) + flags = read_dword(dumpfile) + size = read_dword(dumpfile) + swallow_bytes(eid, name, dumpfile, size) + print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size)) + return True + total_insns = 0 def decode_instruction(eid, name, dumpfile): @@ -166,6 +207,10 @@ def decode_instruction(eid, name, dumpfile): print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns)) return True +def decode_shutdown(eid, name, dumpfile): + print_event(eid, name) + return True + def decode_char_write(eid, name, dumpfile): res = read_dword(dumpfile) offset = read_dword(dumpfile) @@ -177,7 +222,7 @@ def decode_audio_out(eid, name, dumpfile): print_event(eid, name, "%d" % (audio_data)) return True -def decode_checkpoint(eid, name, dumpfile): +def __decode_checkpoint(eid, name, dumpfile, old): """Decode a checkpoint. Checkpoints contain a series of async events with their own specific data. @@ -189,14 +234,20 @@ def decode_checkpoint(eid, name, dumpfile): # if the next event is EVENT_ASYNC there are a bunch of # async events to read, otherwise we are done - if next_event != 3: - print_event(eid, name, "no additional data", event_number) - else: + if (old and next_event == 3) or (not old and next_event >= 3 and next_event <= 9): print_event(eid, name, "more data follows", event_number) + else: + print_event(eid, name, "no additional data", event_number) replay_state.reuse_event(next_event) return True +def decode_checkpoint_old(eid, name, dumpfile): + return __decode_checkpoint(eid, name, dumpfile, False) + +def decode_checkpoint(eid, name, dumpfile): + return __decode_checkpoint(eid, name, dumpfile, True) + def decode_checkpoint_init(eid, name, dumpfile): print_event(eid, name) return True @@ -212,15 +263,23 @@ def decode_clock(eid, name, dumpfile): def decode_random(eid, name, dumpfile): ret = read_dword(dumpfile) - data = read_array(dumpfile) - print_event(eid, "%d bytes of random data" % len(data)) + size = read_dword(dumpfile) + swallow_bytes(eid, name, dumpfile, size) + if (ret): + print_event(eid, name, "%d bytes (getrandom failed)" % (size)) + else: + print_event(eid, name, "%d bytes" % (size)) return True +def decode_end(eid, name, dumpfile): + print_event(eid, name) + return False + # pre-MTTCG merge v5_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction), Decoder(1, "EVENT_INTERRUPT", decode_interrupt), Decoder(2, "EVENT_EXCEPTION", decode_plain), - Decoder(3, "EVENT_ASYNC", decode_async), + Decoder(3, "EVENT_ASYNC", decode_async_old), Decoder(4, "EVENT_SHUTDOWN", decode_unimp), Decoder(5, "EVENT_CHAR_WRITE", decode_char_write), Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp), @@ -242,7 +301,7 @@ def decode_random(eid, name, dumpfile): v6_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction), Decoder(1, "EVENT_INTERRUPT", decode_interrupt), Decoder(2, "EVENT_EXCEPTION", decode_plain), - Decoder(3, "EVENT_ASYNC", decode_async), + Decoder(3, "EVENT_ASYNC", decode_async_old), Decoder(4, "EVENT_SHUTDOWN", decode_unimp), Decoder(5, "EVENT_CHAR_WRITE", decode_char_write), Decoder(6, "EVENT_CHAR_READ_ALL", decode_unimp), @@ -266,7 +325,7 @@ def decode_random(eid, name, dumpfile): v7_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction), Decoder(1, "EVENT_INTERRUPT", decode_interrupt), Decoder(2, "EVENT_EXCEPTION", decode_unimp), - Decoder(3, "EVENT_ASYNC", decode_async), + Decoder(3, "EVENT_ASYNC", decode_async_old), Decoder(4, "EVENT_SHUTDOWN", decode_unimp), Decoder(5, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp), Decoder(6, "EVENT_SHUTDOWN_HOST_QMP", decode_unimp), @@ -296,32 +355,31 @@ def decode_random(eid, name, dumpfile): v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction), Decoder(1, "EVENT_INTERRUPT", decode_interrupt), - Decoder(2, "EVENT_EXCEPTION", decode_plain), - Decoder(3, "EVENT_ASYNC", decode_async), - Decoder(4, "EVENT_ASYNC", decode_async), - Decoder(5, "EVENT_ASYNC", decode_async), - Decoder(6, "EVENT_ASYNC", decode_async), - Decoder(6, "EVENT_ASYNC", decode_async), - Decoder(8, "EVENT_ASYNC", decode_async), - Decoder(9, "EVENT_ASYNC", decode_async), - Decoder(10, "EVENT_ASYNC", decode_async), - Decoder(11, "EVENT_SHUTDOWN", decode_unimp), - Decoder(12, "EVENT_SHUTDOWN_HOST_ERR", decode_unimp), - Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_unimp), - Decoder(14, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_unimp), - Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_unimp), - Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_unimp), - Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_unimp), - Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_unimp), - Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_unimp), - Decoder(19, "EVENT_SHUTDOWN_GUEST_SUBSYSTEM_RESET", decode_unimp), - Decoder(20, "EVENT_SHUTDOWN_GUEST_SNAPSHOT_LOAD", decode_unimp), - Decoder(21, "EVENT_SHUTDOWN___MAX", decode_unimp), + Decoder(2, "EVENT_EXCEPTION", decode_exception), + Decoder(3, "EVENT_ASYNC_BH", decode_async_bh), + Decoder(4, "EVENT_ASYNC_BH_ONESHOT", decode_async_bh_oneshot), + Decoder(5, "EVENT_ASYNC_INPUT", decode_unimp), + Decoder(6, "EVENT_ASYNC_INPUT_SYNC", decode_unimp), + Decoder(7, "EVENT_ASYNC_CHAR_READ", decode_async_char_read), + Decoder(8, "EVENT_ASYNC_BLOCK", decode_async_block), + Decoder(9, "EVENT_ASYNC_NET", decode_async_net), + Decoder(10, "EVENT_SHUTDOWN", decode_shutdown), + Decoder(11, "EVENT_SHUTDOWN_HOST_ERR", decode_shutdown), + Decoder(12, "EVENT_SHUTDOWN_HOST_QMP_QUIT", decode_shutdown), + Decoder(13, "EVENT_SHUTDOWN_HOST_QMP_RESET", decode_shutdown), + Decoder(14, "EVENT_SHUTDOWN_HOST_SIGNAL", decode_shutdown), + Decoder(15, "EVENT_SHUTDOWN_HOST_UI", decode_shutdown), + Decoder(16, "EVENT_SHUTDOWN_GUEST_SHUTDOWN", decode_shutdown), + Decoder(17, "EVENT_SHUTDOWN_GUEST_RESET", decode_shutdown), + Decoder(18, "EVENT_SHUTDOWN_GUEST_PANIC", decode_shutdown), + Decoder(19, "EVENT_SHUTDOWN_SUBSYS_RESET", decode_shutdown), + Decoder(20, "EVENT_SHUTDOWN_SNAPSHOT_LOAD", decode_shutdown), + Decoder(21, "EVENT_SHUTDOWN___MAX", decode_shutdown), Decoder(22, "EVENT_CHAR_WRITE", decode_char_write), Decoder(23, "EVENT_CHAR_READ_ALL", decode_unimp), Decoder(24, "EVENT_CHAR_READ_ALL_ERROR", decode_unimp), - Decoder(25, "EVENT_AUDIO_IN", decode_unimp), - Decoder(26, "EVENT_AUDIO_OUT", decode_audio_out), + Decoder(25, "EVENT_AUDIO_OUT", decode_audio_out), + Decoder(26, "EVENT_AUDIO_IN", decode_unimp), Decoder(27, "EVENT_RANDOM", decode_random), Decoder(28, "EVENT_CLOCK_HOST", decode_clock), Decoder(29, "EVENT_CLOCK_VIRTUAL_RT", decode_clock), @@ -334,6 +392,7 @@ def decode_random(eid, name, dumpfile): Decoder(36, "EVENT_CP_CLOCK_VIRTUAL_RT", decode_checkpoint), Decoder(37, "EVENT_CP_INIT", decode_checkpoint_init), Decoder(38, "EVENT_CP_RESET", decode_checkpoint), + Decoder(39, "EVENT_END", decode_end), ] def parse_arguments(): -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 1/9] scripts/replay-dump.py: Update to current rr record format 2024-02-26 8:29 ` [PATCH v3 1/9] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin @ 2024-02-27 15:57 ` Alex Bennée 0 siblings, 0 replies; 16+ messages in thread From: Alex Bennée @ 2024-02-27 15:57 UTC (permalink / raw) To: Nicholas Piggin Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Nicholas Piggin <npiggin@gmail.com> writes: > The v12 format support for replay-dump has a few issues still. This > fixes async decoding; adds event, shutdown, and end decoding; fixes > audio in / out events, fixes checkpoint checking of following async > events. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 2/9] scripts/replay-dump.py: rejig decoders in event number order 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 1/9] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-27 15:57 ` Alex Bennée 2024-02-26 8:29 ` [PATCH v3 3/9] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin ` (6 subsequent siblings) 8 siblings, 1 reply; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Sort decoder functions to be ascending in order of event number, same as the decoder tables. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- scripts/replay-dump.py | 56 +++++++++++++++++++++--------------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py index 419ee3257b..b82659cfb6 100755 --- a/scripts/replay-dump.py +++ b/scripts/replay-dump.py @@ -139,6 +139,19 @@ def swallow_bytes(eid, name, dumpfile, nr): """Swallow nr bytes of data without looking at it""" dumpfile.seek(nr, os.SEEK_CUR) +total_insns = 0 + +def decode_instruction(eid, name, dumpfile): + global total_insns + ins_diff = read_dword(dumpfile) + total_insns += ins_diff + print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns)) + return True + +def decode_interrupt(eid, name, dumpfile): + print_event(eid, name) + return True + def decode_exception(eid, name, dumpfile): print_event(eid, name) return True @@ -198,15 +211,6 @@ def decode_async_net(eid, name, dumpfile): print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size)) return True -total_insns = 0 - -def decode_instruction(eid, name, dumpfile): - global total_insns - ins_diff = read_dword(dumpfile) - total_insns += ins_diff - print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns)) - return True - def decode_shutdown(eid, name, dumpfile): print_event(eid, name) return True @@ -222,6 +226,21 @@ def decode_audio_out(eid, name, dumpfile): print_event(eid, name, "%d" % (audio_data)) return True +def decode_random(eid, name, dumpfile): + ret = read_dword(dumpfile) + size = read_dword(dumpfile) + swallow_bytes(eid, name, dumpfile, size) + if (ret): + print_event(eid, name, "%d bytes (getrandom failed)" % (size)) + else: + print_event(eid, name, "%d bytes" % (size)) + return True + +def decode_clock(eid, name, dumpfile): + clock_data = read_qword(dumpfile) + print_event(eid, name, "0x%x" % (clock_data)) + return True + def __decode_checkpoint(eid, name, dumpfile, old): """Decode a checkpoint. @@ -252,25 +271,6 @@ def decode_checkpoint_init(eid, name, dumpfile): print_event(eid, name) return True -def decode_interrupt(eid, name, dumpfile): - print_event(eid, name) - return True - -def decode_clock(eid, name, dumpfile): - clock_data = read_qword(dumpfile) - print_event(eid, name, "0x%x" % (clock_data)) - return True - -def decode_random(eid, name, dumpfile): - ret = read_dword(dumpfile) - size = read_dword(dumpfile) - swallow_bytes(eid, name, dumpfile, size) - if (ret): - print_event(eid, name, "%d bytes (getrandom failed)" % (size)) - else: - print_event(eid, name, "%d bytes" % (size)) - return True - def decode_end(eid, name, dumpfile): print_event(eid, name) return False -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/9] scripts/replay-dump.py: rejig decoders in event number order 2024-02-26 8:29 ` [PATCH v3 2/9] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin @ 2024-02-27 15:57 ` Alex Bennée 0 siblings, 0 replies; 16+ messages in thread From: Alex Bennée @ 2024-02-27 15:57 UTC (permalink / raw) To: Nicholas Piggin Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Nicholas Piggin <npiggin@gmail.com> writes: > Sort decoder functions to be ascending in order of event number, > same as the decoder tables. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 3/9] tests/avocado: excercise scripts/replay-dump.py in replay tests 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 1/9] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 2/9] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-27 19:21 ` Alex Bennée 2024-02-26 8:29 ` [PATCH v3 4/9] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin ` (5 subsequent siblings) 8 siblings, 1 reply; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal This runs replay-dump.py after recording a trace, and fails the test if the script fails. replay-dump.py is modified to exit with non-zero if an error is encountered while parsing, to support this. Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- scripts/replay-dump.py | 6 ++++-- tests/avocado/replay_kernel.py | 16 ++++++++++++++++ tests/avocado/replay_linux.py | 15 +++++++++++++++ 3 files changed, 35 insertions(+), 2 deletions(-) diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py index b82659cfb6..4ce7ff51cc 100755 --- a/scripts/replay-dump.py +++ b/scripts/replay-dump.py @@ -21,6 +21,7 @@ import argparse import struct import os +import sys from collections import namedtuple from os import path @@ -100,7 +101,7 @@ def call_decode(table, index, dumpfile): print("Could not decode index: %d" % (index)) print("Entry is: %s" % (decoder)) print("Decode Table is:\n%s" % (table)) - return False + raise(Exception("unknown event")) else: return decoder.fn(decoder.eid, decoder.name, dumpfile) @@ -121,7 +122,7 @@ def print_event(eid, name, string=None, event_count=None): def decode_unimp(eid, name, _unused_dumpfile): "Unimplemented decoder, will trigger exit" print("%s not handled - will now stop" % (name)) - return False + raise(Exception("unhandled event")) def decode_plain(eid, name, _unused_dumpfile): "Plain events without additional data" @@ -434,6 +435,7 @@ def decode_file(filename): dumpfile) except Exception as inst: print(f"error {inst}") + sys.exit(1) finally: print(f"Reached {dumpfile.tell()} of {dumpsize} bytes") diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py index 10d99403a4..9b3ee6726b 100644 --- a/tests/avocado/replay_kernel.py +++ b/tests/avocado/replay_kernel.py @@ -13,6 +13,7 @@ import shutil import logging import time +import subprocess from avocado import skip from avocado import skipUnless @@ -22,6 +23,11 @@ from avocado.utils import process from boot_linux_console import LinuxKernelTest +from pathlib import Path + +self_dir = Path(__file__).parent +src_dir = self_dir.parent.parent + class ReplayKernelBase(LinuxKernelTest): """ Boots a Linux kernel in record mode and checks that the console @@ -63,6 +69,8 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern, vm.shutdown() logger.info('finished the recording with log size %s bytes' % os.path.getsize(replay_path)) + self.run_replay_dump(replay_path) + logger.info('successfully tested replay-dump.py') else: vm.wait() logger.info('successfully finished the replay') @@ -70,6 +78,14 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern, logger.info('elapsed time %.2f sec' % elapsed) return elapsed + def run_replay_dump(self, replay_path): + try: + subprocess.check_call(["./scripts/replay-dump.py", + "-f", replay_path], + cwd=src_dir, stdout=subprocess.DEVNULL) + except subprocess.CalledProcessError: + self.fail('replay-dump.py failed') + def run_rr(self, kernel_path, kernel_command_line, console_pattern, shift=7, args=None): replay_path = os.path.join(self.workdir, 'replay.bin') diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py index f3a43dc98c..e4539c5551 100644 --- a/tests/avocado/replay_linux.py +++ b/tests/avocado/replay_linux.py @@ -21,6 +21,11 @@ from avocado.utils.path import find_command from avocado_qemu import LinuxTest +from pathlib import Path + +self_dir = Path(__file__).parent +src_dir = self_dir.parent.parent + class ReplayLinux(LinuxTest): """ Boots a Linux system, checking for a successful initialization @@ -94,6 +99,8 @@ def launch_and_wait(self, record, args, shift): vm.shutdown() logger.info('finished the recording with log size %s bytes' % os.path.getsize(replay_path)) + self.run_replay_dump(replay_path) + logger.info('successfully tested replay-dump.py') else: vm.event_wait('SHUTDOWN', self.timeout) vm.wait() @@ -108,6 +115,14 @@ def run_rr(self, args=None, shift=7): logger = logging.getLogger('replay') logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1)) + def run_replay_dump(self, replay_path): + try: + subprocess.check_call(["./scripts/replay-dump.py", + "-f", replay_path], + cwd=src_dir, stdout=subprocess.DEVNULL) + except subprocess.CalledProcessError: + self.fail('replay-dump.py failed') + @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout') class ReplayLinuxX8664(ReplayLinux): """ -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 3/9] tests/avocado: excercise scripts/replay-dump.py in replay tests 2024-02-26 8:29 ` [PATCH v3 3/9] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin @ 2024-02-27 19:21 ` Alex Bennée 2024-02-29 3:10 ` Nicholas Piggin 0 siblings, 1 reply; 16+ messages in thread From: Alex Bennée @ 2024-02-27 19:21 UTC (permalink / raw) To: Nicholas Piggin Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Nicholas Piggin <npiggin@gmail.com> writes: > This runs replay-dump.py after recording a trace, and fails the test if > the script fails. > > replay-dump.py is modified to exit with non-zero if an error is > encountered while parsing, to support this. > > Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > scripts/replay-dump.py | 6 ++++-- > tests/avocado/replay_kernel.py | 16 ++++++++++++++++ > tests/avocado/replay_linux.py | 15 +++++++++++++++ > 3 files changed, 35 insertions(+), 2 deletions(-) > > diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py > index b82659cfb6..4ce7ff51cc 100755 > --- a/scripts/replay-dump.py > +++ b/scripts/replay-dump.py > @@ -21,6 +21,7 @@ > import argparse > import struct > import os > +import sys > from collections import namedtuple > from os import path > > @@ -100,7 +101,7 @@ def call_decode(table, index, dumpfile): > print("Could not decode index: %d" % (index)) > print("Entry is: %s" % (decoder)) > print("Decode Table is:\n%s" % (table)) > - return False > + raise(Exception("unknown event")) > else: > return decoder.fn(decoder.eid, decoder.name, dumpfile) > > @@ -121,7 +122,7 @@ def print_event(eid, name, string=None, event_count=None): > def decode_unimp(eid, name, _unused_dumpfile): > "Unimplemented decoder, will trigger exit" > print("%s not handled - will now stop" % (name)) > - return False > + raise(Exception("unhandled event")) > > def decode_plain(eid, name, _unused_dumpfile): > "Plain events without additional data" > @@ -434,6 +435,7 @@ def decode_file(filename): > dumpfile) > except Exception as inst: > print(f"error {inst}") > + sys.exit(1) > > finally: > print(f"Reached {dumpfile.tell()} of {dumpsize} bytes") > diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py > index 10d99403a4..9b3ee6726b 100644 > --- a/tests/avocado/replay_kernel.py > +++ b/tests/avocado/replay_kernel.py > @@ -13,6 +13,7 @@ > import shutil > import logging > import time > +import subprocess > > from avocado import skip > from avocado import skipUnless > @@ -22,6 +23,11 @@ > from avocado.utils import process > from boot_linux_console import LinuxKernelTest > > +from pathlib import Path > + > +self_dir = Path(__file__).parent > +src_dir = self_dir.parent.parent > + Whats this for? It doesn't seem to be used. > class ReplayKernelBase(LinuxKernelTest): > """ > Boots a Linux kernel in record mode and checks that the console > @@ -63,6 +69,8 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern, > vm.shutdown() > logger.info('finished the recording with log size %s bytes' > % os.path.getsize(replay_path)) > + self.run_replay_dump(replay_path) > + logger.info('successfully tested replay-dump.py') > else: > vm.wait() > logger.info('successfully finished the replay') > @@ -70,6 +78,14 @@ def run_vm(self, kernel_path, kernel_command_line, console_pattern, > logger.info('elapsed time %.2f sec' % elapsed) > return elapsed > > + def run_replay_dump(self, replay_path): > + try: > + subprocess.check_call(["./scripts/replay-dump.py", > + "-f", replay_path], > + cwd=src_dir, stdout=subprocess.DEVNULL) > + except subprocess.CalledProcessError: > + self.fail('replay-dump.py failed') > + > def run_rr(self, kernel_path, kernel_command_line, console_pattern, > shift=7, args=None): > replay_path = os.path.join(self.workdir, 'replay.bin') > diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py > index f3a43dc98c..e4539c5551 100644 > --- a/tests/avocado/replay_linux.py > +++ b/tests/avocado/replay_linux.py > @@ -21,6 +21,11 @@ > from avocado.utils.path import find_command > from avocado_qemu import LinuxTest > > +from pathlib import Path > + > +self_dir = Path(__file__).parent > +src_dir = self_dir.parent.parent > + And here to. > class ReplayLinux(LinuxTest): > """ > Boots a Linux system, checking for a successful initialization > @@ -94,6 +99,8 @@ def launch_and_wait(self, record, args, shift): > vm.shutdown() > logger.info('finished the recording with log size %s bytes' > % os.path.getsize(replay_path)) > + self.run_replay_dump(replay_path) > + logger.info('successfully tested replay-dump.py') > else: > vm.event_wait('SHUTDOWN', self.timeout) > vm.wait() > @@ -108,6 +115,14 @@ def run_rr(self, args=None, shift=7): > logger = logging.getLogger('replay') > logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1)) > > + def run_replay_dump(self, replay_path): > + try: > + subprocess.check_call(["./scripts/replay-dump.py", > + "-f", replay_path], > + cwd=src_dir, stdout=subprocess.DEVNULL) > + except subprocess.CalledProcessError: > + self.fail('replay-dump.py failed') > + > @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout') > class ReplayLinuxX8664(ReplayLinux): > """ -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 3/9] tests/avocado: excercise scripts/replay-dump.py in replay tests 2024-02-27 19:21 ` Alex Bennée @ 2024-02-29 3:10 ` Nicholas Piggin 0 siblings, 0 replies; 16+ messages in thread From: Nicholas Piggin @ 2024-02-29 3:10 UTC (permalink / raw) To: Alex Bennée Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal On Wed Feb 28, 2024 at 5:21 AM AEST, Alex Bennée wrote: > Nicholas Piggin <npiggin@gmail.com> writes: > > > This runs replay-dump.py after recording a trace, and fails the test if > > the script fails. > > > > replay-dump.py is modified to exit with non-zero if an error is > > encountered while parsing, to support this. > > > > Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > > --- > > scripts/replay-dump.py | 6 ++++-- > > tests/avocado/replay_kernel.py | 16 ++++++++++++++++ > > tests/avocado/replay_linux.py | 15 +++++++++++++++ > > 3 files changed, 35 insertions(+), 2 deletions(-) > > > > diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py > > index b82659cfb6..4ce7ff51cc 100755 > > --- a/scripts/replay-dump.py > > +++ b/scripts/replay-dump.py > > @@ -21,6 +21,7 @@ > > import argparse > > import struct > > import os > > +import sys > > from collections import namedtuple > > from os import path > > > > @@ -100,7 +101,7 @@ def call_decode(table, index, dumpfile): > > print("Could not decode index: %d" % (index)) > > print("Entry is: %s" % (decoder)) > > print("Decode Table is:\n%s" % (table)) > > - return False > > + raise(Exception("unknown event")) > > else: > > return decoder.fn(decoder.eid, decoder.name, dumpfile) > > > > @@ -121,7 +122,7 @@ def print_event(eid, name, string=None, event_count=None): > > def decode_unimp(eid, name, _unused_dumpfile): > > "Unimplemented decoder, will trigger exit" > > print("%s not handled - will now stop" % (name)) > > - return False > > + raise(Exception("unhandled event")) > > > > def decode_plain(eid, name, _unused_dumpfile): > > "Plain events without additional data" > > @@ -434,6 +435,7 @@ def decode_file(filename): > > dumpfile) > > except Exception as inst: > > print(f"error {inst}") > > + sys.exit(1) > > > > finally: > > print(f"Reached {dumpfile.tell()} of {dumpsize} bytes") > > diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py > > index 10d99403a4..9b3ee6726b 100644 > > --- a/tests/avocado/replay_kernel.py > > +++ b/tests/avocado/replay_kernel.py > > @@ -13,6 +13,7 @@ > > import shutil > > import logging > > import time > > +import subprocess > > > > from avocado import skip > > from avocado import skipUnless > > @@ -22,6 +23,11 @@ > > from avocado.utils import process > > from boot_linux_console import LinuxKernelTest > > > > +from pathlib import Path > > + > > +self_dir = Path(__file__).parent > > +src_dir = self_dir.parent.parent > > + > > Whats this for? It doesn't seem to be used. Thanks for finding it, must be just a leftover from some earlier hack. I'll take it out. Thanks, Nick ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 4/9] replay: allow runstate shutdown->running when replaying trace 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin ` (2 preceding siblings ...) 2024-02-26 8:29 ` [PATCH v3 3/9] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-27 19:28 ` Alex Bennée 2024-02-26 8:29 ` [PATCH v3 5/9] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin ` (4 subsequent siblings) 8 siblings, 1 reply; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal When replaying a trace, it is possible to go from shutdown to running with a reverse-debugging step. This can be useful if the problem being debugged triggers a reset or shutdown. This can be tested by making a recording of a machine that shuts down, then using -action shutdown=pause when replaying it. Continuing to the end of the trace then reverse-stepping in gdb crashes due to invalid runstate transition. Just permitting the transition seems to be all that's necessary for reverse-debugging to work well in such a state. Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- include/sysemu/runstate.h | 1 + replay/replay.c | 2 ++ system/runstate.c | 19 +++++++++++++++++++ 3 files changed, 22 insertions(+) diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h index 0117d243c4..fe25eed3c0 100644 --- a/include/sysemu/runstate.h +++ b/include/sysemu/runstate.h @@ -9,6 +9,7 @@ void runstate_set(RunState new_state); RunState runstate_get(void); bool runstate_is_running(void); bool runstate_needs_reset(void); +void runstate_replay_enable(void); typedef void VMChangeStateHandler(void *opaque, bool running, RunState state); diff --git a/replay/replay.c b/replay/replay.c index 3fd241a4fc..2951eed3bd 100644 --- a/replay/replay.c +++ b/replay/replay.c @@ -383,6 +383,8 @@ static void replay_enable(const char *fname, int mode) /* go to the beginning */ fseek(replay_file, HEADER_SIZE, SEEK_SET); replay_fetch_data_kind(); + + runstate_replay_enable(); } replay_init_events(); diff --git a/system/runstate.c b/system/runstate.c index d6ab860eca..bd0fed8657 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -182,6 +182,12 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE__MAX, RUN_STATE__MAX }, }; +static const RunStateTransition replay_runstate_transitions_def[] = { + { RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING}, + + { RUN_STATE__MAX, RUN_STATE__MAX }, +}; + static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX]; bool runstate_check(RunState state) @@ -189,6 +195,19 @@ bool runstate_check(RunState state) return current_run_state == state; } +void runstate_replay_enable(void) +{ + const RunStateTransition *p; + + assert(replay_mode == REPLAY_MODE_PLAY); + + for (p = &replay_runstate_transitions_def[0]; p->from != RUN_STATE__MAX; + p++) { + runstate_valid_transitions[p->from][p->to] = true; + } + +} + static void runstate_init(void) { const RunStateTransition *p; -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 4/9] replay: allow runstate shutdown->running when replaying trace 2024-02-26 8:29 ` [PATCH v3 4/9] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin @ 2024-02-27 19:28 ` Alex Bennée 0 siblings, 0 replies; 16+ messages in thread From: Alex Bennée @ 2024-02-27 19:28 UTC (permalink / raw) To: Nicholas Piggin Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Nicholas Piggin <npiggin@gmail.com> writes: > When replaying a trace, it is possible to go from shutdown to running > with a reverse-debugging step. This can be useful if the problem being > debugged triggers a reset or shutdown. > > This can be tested by making a recording of a machine that shuts down, > then using -action shutdown=pause when replaying it. Continuing to the > end of the trace then reverse-stepping in gdb crashes due to invalid > runstate transition. > > Just permitting the transition seems to be all that's necessary for > reverse-debugging to work well in such a state. > > Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > include/sysemu/runstate.h | 1 + > replay/replay.c | 2 ++ > system/runstate.c | 19 +++++++++++++++++++ > 3 files changed, 22 insertions(+) > > diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h > index 0117d243c4..fe25eed3c0 100644 > --- a/include/sysemu/runstate.h > +++ b/include/sysemu/runstate.h > @@ -9,6 +9,7 @@ void runstate_set(RunState new_state); > RunState runstate_get(void); > bool runstate_is_running(void); > bool runstate_needs_reset(void); > +void runstate_replay_enable(void); > > typedef void VMChangeStateHandler(void *opaque, bool running, RunState state); > > diff --git a/replay/replay.c b/replay/replay.c > index 3fd241a4fc..2951eed3bd 100644 > --- a/replay/replay.c > +++ b/replay/replay.c > @@ -383,6 +383,8 @@ static void replay_enable(const char *fname, int mode) > /* go to the beginning */ > fseek(replay_file, HEADER_SIZE, SEEK_SET); > replay_fetch_data_kind(); > + > + runstate_replay_enable(); > } > > replay_init_events(); > diff --git a/system/runstate.c b/system/runstate.c > index d6ab860eca..bd0fed8657 100644 > --- a/system/runstate.c > +++ b/system/runstate.c > @@ -182,6 +182,12 @@ static const RunStateTransition runstate_transitions_def[] = { > { RUN_STATE__MAX, RUN_STATE__MAX }, > }; > > +static const RunStateTransition replay_runstate_transitions_def[] = { > + { RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING}, > + > + { RUN_STATE__MAX, RUN_STATE__MAX }, > +}; > + > static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX]; > > bool runstate_check(RunState state) > @@ -189,6 +195,19 @@ bool runstate_check(RunState state) > return current_run_state == state; > } > > +void runstate_replay_enable(void) > +{ > + const RunStateTransition *p; > + > + assert(replay_mode == REPLAY_MODE_PLAY); > + As the bellow is common to runstate_init it could be re-factored into a helper taking the runstate_transitions_def pointer. > + for (p = &replay_runstate_transitions_def[0]; p->from != RUN_STATE__MAX; > + p++) { > + runstate_valid_transitions[p->from][p->to] = true; > + } > + > +} > + > static void runstate_init(void) > { > const RunStateTransition *p; -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 5/9] Revert "replay: stop us hanging in rr_wait_io_event" 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin ` (3 preceding siblings ...) 2024-02-26 8:29 ` [PATCH v3 4/9] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 6/9] chardev: set record/replay on the base device of a muxed device Nicholas Piggin ` (3 subsequent siblings) 8 siblings, 0 replies; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal This reverts commit 1f881ea4a444ef36a8b6907b0b82be4b3af253a2. That commit causes reverse_debugging.py test failures, and does not seem to solve the root cause of the problem x86-64 still hangs in record/replay tests. The problem with short-cutting the iowait that was taken during record phase is that related events will not get consumed at the same points (e.g., reading the clock). A hang with zero icount always seems to be a symptom of an earlier problem that has caused the recording to become out of synch with the execution and consumption of events by replay. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- include/sysemu/replay.h | 5 ----- accel/tcg/tcg-accel-ops-rr.c | 2 +- replay/replay.c | 21 --------------------- 3 files changed, 1 insertion(+), 27 deletions(-) diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h index f229b2109c..8102fa54f0 100644 --- a/include/sysemu/replay.h +++ b/include/sysemu/replay.h @@ -73,11 +73,6 @@ int replay_get_instructions(void); /*! Updates instructions counter in replay mode. */ void replay_account_executed_instructions(void); -/** - * replay_can_wait: check if we should pause for wait-io - */ -bool replay_can_wait(void); - /* Processing clocks and other time sources */ /*! Save the specified clock */ diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c index 894e73e52c..a942442a33 100644 --- a/accel/tcg/tcg-accel-ops-rr.c +++ b/accel/tcg/tcg-accel-ops-rr.c @@ -109,7 +109,7 @@ static void rr_wait_io_event(void) { CPUState *cpu; - while (all_cpu_threads_idle() && replay_can_wait()) { + while (all_cpu_threads_idle()) { rr_stop_kick_timer(); qemu_cond_wait_bql(first_cpu->halt_cond); } diff --git a/replay/replay.c b/replay/replay.c index 2951eed3bd..ee2b979dfc 100644 --- a/replay/replay.c +++ b/replay/replay.c @@ -451,27 +451,6 @@ void replay_start(void) replay_enable_events(); } -/* - * For none/record the answer is yes. - */ -bool replay_can_wait(void) -{ - if (replay_mode == REPLAY_MODE_PLAY) { - /* - * For playback we shouldn't ever be at a point we wait. If - * the instruction count has reached zero and we have an - * unconsumed event we should go around again and consume it. - */ - if (replay_state.instruction_count == 0 && replay_state.has_unread_data) { - return false; - } else { - replay_sync_error("Playback shouldn't have to iowait"); - } - } - return true; -} - - void replay_finish(void) { if (replay_mode == REPLAY_MODE_NONE) { -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 6/9] chardev: set record/replay on the base device of a muxed device 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin ` (4 preceding siblings ...) 2024-02-26 8:29 ` [PATCH v3 5/9] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 7/9] replay: Fix migration use of clock Nicholas Piggin ` (2 subsequent siblings) 8 siblings, 0 replies; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal chardev events to a muxed device don't get recorded because e.g., qemu_chr_be_write() checks whether the base device has the record flag set. This can be seen when replaying a trace that has characters typed into the console, an examination of the log shows they are not recorded. Setting QEMU_CHAR_FEATURE_REPLAY on the base chardev fixes the problem. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- chardev/char.c | 71 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 50 insertions(+), 21 deletions(-) diff --git a/chardev/char.c b/chardev/char.c index 3c43fb1278..ba847b6e9e 100644 --- a/chardev/char.c +++ b/chardev/char.c @@ -615,11 +615,24 @@ ChardevBackend *qemu_chr_parse_opts(QemuOpts *opts, Error **errp) return backend; } -Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context, - Error **errp) +static void qemu_chardev_set_replay(Chardev *chr, Error **errp) +{ + if (replay_mode != REPLAY_MODE_NONE) { + if (CHARDEV_GET_CLASS(chr)->chr_ioctl) { + error_setg(errp, "Replay: ioctl is not supported " + "for serial devices yet"); + return; + } + qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_REPLAY); + replay_register_char_driver(chr); + } +} + +static Chardev *__qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context, + bool replay, Error **errp) { const ChardevClass *cc; - Chardev *chr = NULL; + Chardev *base = NULL, *chr = NULL; ChardevBackend *backend = NULL; const char *name = qemu_opt_get(opts, "backend"); const char *id = qemu_opts_id(opts); @@ -657,11 +670,11 @@ Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context, chr = qemu_chardev_new(bid ? bid : id, object_class_get_name(OBJECT_CLASS(cc)), backend, context, errp); - if (chr == NULL) { goto out; } + base = chr; if (bid) { Chardev *mux; qapi_free_ChardevBackend(backend); @@ -681,11 +694,25 @@ Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context, out: qapi_free_ChardevBackend(backend); g_free(bid); + + if (replay && base) { + /* RR should be set on the base device, not the mux */ + qemu_chardev_set_replay(base, errp); + } + return chr; } -Chardev *qemu_chr_new_noreplay(const char *label, const char *filename, - bool permit_mux_mon, GMainContext *context) +Chardev *qemu_chr_new_from_opts(QemuOpts *opts, GMainContext *context, + Error **errp) +{ + /* XXX: should this really not record/replay? */ + return __qemu_chr_new_from_opts(opts, context, false, errp); +} + +static Chardev *__qemu_chr_new(const char *label, const char *filename, + bool permit_mux_mon, GMainContext *context, + bool replay) { const char *p; Chardev *chr; @@ -693,14 +720,22 @@ Chardev *qemu_chr_new_noreplay(const char *label, const char *filename, Error *err = NULL; if (strstart(filename, "chardev:", &p)) { - return qemu_chr_find(p); + chr = qemu_chr_find(p); + if (replay) { + qemu_chardev_set_replay(chr, &err); + if (err) { + error_report_err(err); + return NULL; + } + } + return chr; } opts = qemu_chr_parse_compat(label, filename, permit_mux_mon); if (!opts) return NULL; - chr = qemu_chr_new_from_opts(opts, context, &err); + chr = __qemu_chr_new_from_opts(opts, context, replay, &err); if (!chr) { error_report_err(err); goto out; @@ -722,24 +757,18 @@ out: return chr; } +Chardev *qemu_chr_new_noreplay(const char *label, const char *filename, + bool permit_mux_mon, GMainContext *context) +{ + return __qemu_chr_new(label, filename, permit_mux_mon, context, false); +} + static Chardev *qemu_chr_new_permit_mux_mon(const char *label, const char *filename, bool permit_mux_mon, GMainContext *context) { - Chardev *chr; - chr = qemu_chr_new_noreplay(label, filename, permit_mux_mon, context); - if (chr) { - if (replay_mode != REPLAY_MODE_NONE) { - qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_REPLAY); - } - if (qemu_chr_replay(chr) && CHARDEV_GET_CLASS(chr)->chr_ioctl) { - error_report("Replay: ioctl is not supported " - "for serial devices yet"); - } - replay_register_char_driver(chr); - } - return chr; + return __qemu_chr_new(label, filename, permit_mux_mon, context, true); } Chardev *qemu_chr_new(const char *label, const char *filename, -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 7/9] replay: Fix migration use of clock 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin ` (5 preceding siblings ...) 2024-02-26 8:29 ` [PATCH v3 6/9] chardev: set record/replay on the base device of a muxed device Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 8/9] replay: Fix migration replay_mutex locking Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 9/9] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky Nicholas Piggin 8 siblings, 0 replies; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Migration reads host clocks when not holding the replay_mutex, which asserts when recording a trace. It seems that the times should be host times like other statistics in MigrationState, so this can easily be converted. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- migration/migration.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index ab21de2cad..2e794db75c 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3316,7 +3316,7 @@ static void *migration_thread(void *opaque) { MigrationState *s = opaque; MigrationThread *thread = NULL; - int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST); + int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); MigThrError thr_error; bool urgent = false; @@ -3368,7 +3368,7 @@ static void *migration_thread(void *opaque) qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE); - s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; + s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start; trace_migration_thread_setup_complete(); @@ -3447,7 +3447,7 @@ static void *bg_migration_thread(void *opaque) migration_rate_set(RATE_LIMIT_DISABLED); - setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST); + setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); /* * We want to save vmstate for the moment when migration has been * initiated but also we want to save RAM content while VM is running. @@ -3480,7 +3480,7 @@ static void *bg_migration_thread(void *opaque) qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE); - s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; + s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start; trace_migration_thread_setup_complete(); migration_downtime_start(s); -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 8/9] replay: Fix migration replay_mutex locking 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin ` (6 preceding siblings ...) 2024-02-26 8:29 ` [PATCH v3 7/9] replay: Fix migration use of clock Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 9/9] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky Nicholas Piggin 8 siblings, 0 replies; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Migration causes a number of events that need to go in the replay trace, such as vm state transitions. The replay_mutex lock needs to be held for these. The simplest approach seems to be just take it up-front when taking the bql. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- migration/migration.h | 2 -- migration/migration.c | 11 ++++++++++- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index f2c8b8f286..0621479a4e 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -543,6 +543,4 @@ int migration_rp_wait(MigrationState *s); */ void migration_rp_kick(MigrationState *s); -int migration_stop_vm(RunState state); - #endif diff --git a/migration/migration.c b/migration/migration.c index 2e794db75c..80a5ce17d1 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -24,6 +24,7 @@ #include "socket.h" #include "sysemu/runstate.h" #include "sysemu/sysemu.h" +#include "sysemu/replay.h" #include "sysemu/cpu-throttle.h" #include "rdma.h" #include "ram.h" @@ -162,7 +163,7 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp) return (a > b) - (a < b); } -int migration_stop_vm(RunState state) +static int migration_stop_vm(RunState state) { int ret = vm_stop_force_state(state); @@ -2433,6 +2434,7 @@ static int postcopy_start(MigrationState *ms, Error **errp) } trace_postcopy_start(); + replay_mutex_lock(); bql_lock(); trace_postcopy_start_set_run(); @@ -2542,6 +2544,7 @@ static int postcopy_start(MigrationState *ms, Error **errp) migration_downtime_end(ms); bql_unlock(); + replay_mutex_unlock(); if (migrate_postcopy_ram()) { /* @@ -2583,6 +2586,7 @@ fail: } } bql_unlock(); + replay_mutex_unlock(); return -1; } @@ -2634,6 +2638,7 @@ static int migration_completion_precopy(MigrationState *s, { int ret; + replay_mutex_lock(); bql_lock(); migration_downtime_start(s); @@ -2662,6 +2667,7 @@ static int migration_completion_precopy(MigrationState *s, s->block_inactive); out_unlock: bql_unlock(); + replay_mutex_unlock(); return ret; } @@ -3485,6 +3491,7 @@ static void *bg_migration_thread(void *opaque) trace_migration_thread_setup_complete(); migration_downtime_start(s); + replay_mutex_lock(); bql_lock(); s->vm_old_state = runstate_get(); @@ -3522,6 +3529,7 @@ static void *bg_migration_thread(void *opaque) */ migration_bh_schedule(bg_migration_vm_start_bh, s); bql_unlock(); + replay_mutex_unlock(); while (migration_is_active(s)) { MigIterateState iter_state = bg_migration_iteration_run(s); @@ -3551,6 +3559,7 @@ fail: migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_FAILED); bql_unlock(); + replay_mutex_unlock(); } bg_migration_iteration_finish(s); -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 9/9] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin ` (7 preceding siblings ...) 2024-02-26 8:29 ` [PATCH v3 8/9] replay: Fix migration replay_mutex locking Nicholas Piggin @ 2024-02-26 8:29 ` Nicholas Piggin 2024-02-27 19:36 ` Alex Bennée 8 siblings, 1 reply; 16+ messages in thread From: Nicholas Piggin @ 2024-02-26 8:29 UTC (permalink / raw) To: qemu-devel Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal These seem to be quite solid, including on several gitlab CI runs. Enabling them should help catch breakage in future. And update the powernv comment -- gitlab isn't the problem, there are known gaps in implementation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- tests/avocado/reverse_debugging.py | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py index 92855a02a5..8fe76ff921 100644 --- a/tests/avocado/reverse_debugging.py +++ b/tests/avocado/reverse_debugging.py @@ -223,9 +223,6 @@ class ReverseDebugging_AArch64(ReverseDebugging): REG_PC = 32 - # unidentified gitlab timeout problem - @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') - def test_aarch64_virt(self): """ :avocado: tags=arch:aarch64 @@ -248,14 +245,10 @@ class ReverseDebugging_ppc64(ReverseDebugging): REG_PC = 0x40 - # unidentified gitlab timeout problem - @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') - def test_ppc64_pseries(self): """ :avocado: tags=arch:ppc64 :avocado: tags=machine:pseries - :avocado: tags=flaky """ # SLOF branches back to its entry point, which causes this test # to take the 'hit a breakpoint again' path. That's not a problem, @@ -264,7 +257,7 @@ def test_ppc64_pseries(self): self.reverse_debugging() # See https://gitlab.com/qemu-project/qemu/-/issues/1992 - @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'powernv migration support is incomplete so rr debugging is flaky') def test_ppc64_powernv(self): """ -- 2.42.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 9/9] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky 2024-02-26 8:29 ` [PATCH v3 9/9] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky Nicholas Piggin @ 2024-02-27 19:36 ` Alex Bennée 0 siblings, 0 replies; 16+ messages in thread From: Alex Bennée @ 2024-02-27 19:36 UTC (permalink / raw) To: Nicholas Piggin Cc: qemu-devel, Pavel Dovgalyuk, Philippe Mathieu-Daudé, Richard Henderson, Paolo Bonzini, John Snow, Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal Nicholas Piggin <npiggin@gmail.com> writes: > These seem to be quite solid, including on several gitlab CI runs. > Enabling them should help catch breakage in future. > > And update the powernv comment -- gitlab isn't the problem, there are > known gaps in implementation. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> > --- > tests/avocado/reverse_debugging.py | 9 +-------- > 1 file changed, 1 insertion(+), 8 deletions(-) > > diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py > index 92855a02a5..8fe76ff921 100644 > --- a/tests/avocado/reverse_debugging.py > +++ b/tests/avocado/reverse_debugging.py > @@ -223,9 +223,6 @@ class ReverseDebugging_AArch64(ReverseDebugging): > > REG_PC = 32 > > - # unidentified gitlab timeout problem > - @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') > - > def test_aarch64_virt(self): > """ > :avocado: tags=arch:aarch64 > @@ -248,14 +245,10 @@ class ReverseDebugging_ppc64(ReverseDebugging): > > REG_PC = 0x40 > > - # unidentified gitlab timeout problem > - @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') > - > def test_ppc64_pseries(self): > """ > :avocado: tags=arch:ppc64 > :avocado: tags=machine:pseries > - :avocado: tags=flaky > """ > # SLOF branches back to its entry point, which causes this test > # to take the 'hit a breakpoint again' path. That's not a problem, > @@ -264,7 +257,7 @@ def test_ppc64_pseries(self): > self.reverse_debugging() > > # See https://gitlab.com/qemu-project/qemu/-/issues/1992 > - @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') > + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'powernv migration support is incomplete so rr debugging is flaky') > > def test_ppc64_powernv(self): > """ -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-02-29 3:11 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-26 8:29 [PATCH v3 0/9] replay: fixes and new test cases Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 1/9] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin 2024-02-27 15:57 ` Alex Bennée 2024-02-26 8:29 ` [PATCH v3 2/9] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin 2024-02-27 15:57 ` Alex Bennée 2024-02-26 8:29 ` [PATCH v3 3/9] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin 2024-02-27 19:21 ` Alex Bennée 2024-02-29 3:10 ` Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 4/9] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin 2024-02-27 19:28 ` Alex Bennée 2024-02-26 8:29 ` [PATCH v3 5/9] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 6/9] chardev: set record/replay on the base device of a muxed device Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 7/9] replay: Fix migration use of clock Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 8/9] replay: Fix migration replay_mutex locking Nicholas Piggin 2024-02-26 8:29 ` [PATCH v3 9/9] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky Nicholas Piggin 2024-02-27 19:36 ` Alex Bennée
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).