qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/24] replay: fixes and new test cases
@ 2024-03-11 17:40 Nicholas Piggin
  2024-03-11 17:40 ` [PATCH v4 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
                   ` (23 more replies)
  0 siblings, 24 replies; 43+ messages in thread
From: Nicholas Piggin @ 2024-03-11 17:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

Since v3,

* Attacked the replay_linux.py bugs and found a bunch of gaps
  in networking that was causing the hangs.
* And several powerpc bugs that were also causing problems on
  pseries.
* Added ppc test to replay_linux.py now that it's working.
* Found several crash bugs in record/replay vs migration.
* Added snapshot and more stepping tests to reverse_debugging.py
* Addressed comments in auto-snapshot code.
* Added auto-snapshot test case.
* "Solved" x86-64 issues in test cases by switching to q35, which
  seems to have less problems.

The last 3 patches I will take in the ppc tree, but included here
because powerpc is the only one that survives the record-replay test
with auto-snapshots at the moment.

Thanks,
Nick

Since v2, here fixes became less minor so I rename the series.

https://lore.kernel.org/qemu-devel/20240125160835.480488-1-npiggin@gmail.com/#r)

* Found several more bugs (patches 5-8).
* Enable the rr avocado test on pseries and aarch64 virt since they're
  passing here (and on gitlab, e.g.,
  https://gitlab.com/npiggin/qemu/-/jobs/6253787216,
  https://gitlab.com/npiggin/qemu/-/jobs/6253787218).
* Updated replay-dump script to John's feedback.

x86-64 still has issues with replay and reverse debugging tests.
replay_kernel.py seems to be timing dependent -- after patch 5 I
had it pass 30/30 runs, then the following day 0/30 and I realized
I had several other QEMU instances hogging the CPU which probably
changed timings. So the first thing I would look at is timers and
clocks. pseries had some rounding issues in time calculations that meant
clock/timer were not replayed exactly as they were recorded, which
caused hangs.

Thanks,
Nick

Nicholas Piggin (24):
  scripts/replay-dump.py: Update to current rr record format
  scripts/replay-dump.py: rejig decoders in event number order
  tests/avocado: excercise scripts/replay-dump.py in replay tests
  replay: allow runstate shutdown->running when replaying trace
  Revert "replay: stop us hanging in rr_wait_io_event"
  chardev: set record/replay on the base device of a muxed device
  replay: Fix migration use of clock
  replay: Fix migration replay_mutex locking
  virtio-net: Use replay_schedule_bh_event for bhs that affect machine
    state
  virtio-net: Use virtual time for RSC timers
  net: Use virtual time for net announce
  savevm: Fix load_snapshot error path crash
  tests/avocado: replay_linux.py remove the timeout expected guards
  tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not
    flaky
  tests/avocado: reverse_debugging.py add test for x86-64 q35 machine
  tests/avocado: reverse_debugging.py verify addresses between record
    and replay
  tests/avocado: reverse_debugging.py stop VM before sampling icount
  tests/avocado: reverse_debugging reverse-step at the end of the trace
  tests/avocado: reverse_debugging.py add snapshot testing
  replay: simple auto-snapshot mode for record
  tests/avocado: reverse_debugging.py test auto-snapshot mode
  target/ppc: fix timebase register reset state
  spapr: Fix vpa dispatch count for record-replay
  tests/avocado: replay_linux.py add ppc64 pseries test

 docs/system/replay.rst             |   5 +
 include/hw/ppc/spapr_cpu_core.h    |   3 +
 include/sysemu/replay.h            |  16 ++-
 include/sysemu/runstate.h          |   1 +
 accel/tcg/tcg-accel-ops-rr.c       |   2 +-
 chardev/char.c                     |  71 ++++++++----
 hw/net/virtio-net.c                |  17 +--
 hw/ppc/ppc.c                       |  11 +-
 hw/ppc/spapr.c                     |  36 +-----
 hw/ppc/spapr_hcall.c               |  33 ++++++
 hw/ppc/spapr_rtas.c                |   1 +
 migration/migration.c              |  17 ++-
 migration/savevm.c                 |   1 +
 net/announce.c                     |   2 +-
 replay/replay-snapshot.c           |  57 ++++++++++
 replay/replay.c                    |  50 ++++----
 system/runstate.c                  |  31 ++++-
 system/vl.c                        |   9 ++
 target/ppc/machine.c               |   4 +
 qemu-options.hx                    |   9 +-
 scripts/replay-dump.py             | 167 ++++++++++++++++++---------
 tests/avocado/replay_kernel.py     |  11 ++
 tests/avocado/replay_linux.py      |  97 +++++++++++++++-
 tests/avocado/reverse_debugging.py | 176 ++++++++++++++++++++++++-----
 24 files changed, 635 insertions(+), 192 deletions(-)

-- 
2.42.0



^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2024-03-14  5:20 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-11 17:40 [PATCH v4 00/24] replay: fixes and new test cases Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 02/24] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 03/24] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
2024-03-12 13:25   ` Alex Bennée
2024-03-11 17:40 ` [PATCH v4 04/24] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
2024-03-12 13:26   ` Alex Bennée
2024-03-11 17:40 ` [PATCH v4 05/24] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
2024-03-12 13:33   ` Alex Bennée
2024-03-12 14:03     ` Nicholas Piggin
2024-03-12 21:03       ` Alex Bennée
2024-03-13  5:27         ` Nicholas Piggin
2024-03-14  5:19         ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 06/24] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
2024-03-12 12:39   ` Marc-André Lureau
2024-03-12 14:11     ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 07/24] replay: Fix migration use of clock Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 08/24] replay: Fix migration replay_mutex locking Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 09/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 10/24] virtio-net: Use virtual time for RSC timers Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 11/24] net: Use virtual time for net announce Nicholas Piggin
2024-03-12  9:09   ` Pavel Dovgalyuk
2024-03-12 11:05     ` Nicholas Piggin
2024-03-12 11:12       ` Pavel Dovgalyuk
2024-03-13  5:38         ` Nicholas Piggin
2024-03-13  7:09         ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 12/24] savevm: Fix load_snapshot error path crash Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 13/24] tests/avocado: replay_linux.py remove the timeout expected guards Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 14/24] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 16/24] tests/avocado: reverse_debugging.py verify addresses between record and replay Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 17/24] tests/avocado: reverse_debugging.py stop VM before sampling icount Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 18/24] tests/avocado: reverse_debugging reverse-step at the end of the trace Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 19/24] tests/avocado: reverse_debugging.py add snapshot testing Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 20/24] replay: simple auto-snapshot mode for record Nicholas Piggin
2024-03-12  9:00   ` Pavel Dovgalyuk
2024-03-12 10:43     ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 21/24] tests/avocado: reverse_debugging.py test auto-snapshot mode Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 22/24] target/ppc: fix timebase register reset state Nicholas Piggin
2024-03-12 13:24   ` Alex Bennée
2024-03-12 13:47     ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 23/24] spapr: Fix vpa dispatch count for record-replay Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 24/24] tests/avocado: replay_linux.py add ppc64 pseries test Nicholas Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).