* [PATCH v4 00/24] replay: fixes and new test cases
@ 2024-03-11 17:40 Nicholas Piggin
From: Nicholas Piggin @ 2024-03-11 17:40 UTC
  To: qemu-devel
  Cc: Nicholas Piggin, Pavel Dovgalyuk, Philippe Mathieu-Daudé,
	Richard Henderson, Alex Bennée, Paolo Bonzini, John Snow,
	Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal,
	Michael Tokarev

Since v3,

* Attacked the replay_linux.py bugs and found a bunch of gaps
  in networking that were causing the hangs.
* Found several powerpc bugs that were also causing problems on
  pseries.
* Added ppc test to replay_linux.py now that it's working.
* Found several crash bugs in record/replay vs migration.
* Added snapshot and more stepping tests to reverse_debugging.py
* Addressed comments in auto-snapshot code.
* Added auto-snapshot test case.
* "Solved" x86-64 issues in test cases by switching to q35, which
  seems to have fewer problems.

I will take the last 3 patches through the ppc tree, but they are
included here because powerpc is the only target that survives the
record-replay test with auto-snapshots at the moment.

Thanks,
Nick

Since v2, the fixes became less minor, so I renamed the series.

https://lore.kernel.org/qemu-devel/20240125160835.480488-1-npiggin@gmail.com/#r

* Found several more bugs (patches 5-8).
* Enable the rr avocado test on pseries and aarch64 virt since they're
  passing here (and on gitlab, e.g.,
  https://gitlab.com/npiggin/qemu/-/jobs/6253787216,
  https://gitlab.com/npiggin/qemu/-/jobs/6253787218).
* Updated the replay-dump script per John's feedback.

x86-64 still has issues with replay and reverse debugging tests.
replay_kernel.py seems to be timing dependent -- after patch 5 I
had it pass 30/30 runs, then the following day 0/30 and I realized
I had several other QEMU instances hogging the CPU which probably
changed timings. So the first thing I would look at is timers and
clocks. pseries had some rounding issues in time calculations that meant
clock/timer were not replayed exactly as they were recorded, which
caused hangs.
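
To illustrate the kind of rounding problem meant here, below is a
minimal sketch, not the actual pseries code. It assumes a 512 MHz
timebase frequency, and all function names are hypothetical. It shows
that converting a nanosecond clock value to ticks and back with integer
division does not always return the original value, so a time that is
recomputed on replay can differ from the one that was recorded.

```python
NS_PER_SEC = 1_000_000_000
TB_FREQ_HZ = 512_000_000  # assumed timebase frequency, for illustration only


def ns_to_ticks(ns: int) -> int:
    """Convert nanoseconds to timebase ticks, rounding down."""
    return ns * TB_FREQ_HZ // NS_PER_SEC


def ticks_to_ns(ticks: int) -> int:
    """Convert timebase ticks to nanoseconds, rounding down."""
    return ticks * NS_PER_SEC // TB_FREQ_HZ


def round_trip_ns(ns: int) -> int:
    """Convert ns -> ticks -> ns; the result can differ from the input."""
    return ticks_to_ns(ns_to_ticks(ns))


def count_mismatches(limit: int) -> int:
    """Count values in [0, limit) that do not survive the round trip."""
    return sum(1 for ns in range(limit) if round_trip_ns(ns) != ns)
```

For example, round_trip_ns(1) gives 0 rather than 1, while
round_trip_ns(1000) is exact. A timer deadline derived from such a
round trip can therefore land on a different value than the one in the
recorded trace.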

Thanks,
Nick

Nicholas Piggin (24):
  scripts/replay-dump.py: Update to current rr record format
  scripts/replay-dump.py: rejig decoders in event number order
  tests/avocado: excercise scripts/replay-dump.py in replay tests
  replay: allow runstate shutdown->running when replaying trace
  Revert "replay: stop us hanging in rr_wait_io_event"
  chardev: set record/replay on the base device of a muxed device
  replay: Fix migration use of clock
  replay: Fix migration replay_mutex locking
  virtio-net: Use replay_schedule_bh_event for bhs that affect machine
    state
  virtio-net: Use virtual time for RSC timers
  net: Use virtual time for net announce
  savevm: Fix load_snapshot error path crash
  tests/avocado: replay_linux.py remove the timeout expected guards
  tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not
    flaky
  tests/avocado: reverse_debugging.py add test for x86-64 q35 machine
  tests/avocado: reverse_debugging.py verify addresses between record
    and replay
  tests/avocado: reverse_debugging.py stop VM before sampling icount
  tests/avocado: reverse_debugging reverse-step at the end of the trace
  tests/avocado: reverse_debugging.py add snapshot testing
  replay: simple auto-snapshot mode for record
  tests/avocado: reverse_debugging.py test auto-snapshot mode
  target/ppc: fix timebase register reset state
  spapr: Fix vpa dispatch count for record-replay
  tests/avocado: replay_linux.py add ppc64 pseries test

 docs/system/replay.rst             |   5 +
 include/hw/ppc/spapr_cpu_core.h    |   3 +
 include/sysemu/replay.h            |  16 ++-
 include/sysemu/runstate.h          |   1 +
 accel/tcg/tcg-accel-ops-rr.c       |   2 +-
 chardev/char.c                     |  71 ++++++++----
 hw/net/virtio-net.c                |  17 +--
 hw/ppc/ppc.c                       |  11 +-
 hw/ppc/spapr.c                     |  36 +-----
 hw/ppc/spapr_hcall.c               |  33 ++++++
 hw/ppc/spapr_rtas.c                |   1 +
 migration/migration.c              |  17 ++-
 migration/savevm.c                 |   1 +
 net/announce.c                     |   2 +-
 replay/replay-snapshot.c           |  57 ++++++++++
 replay/replay.c                    |  50 ++++----
 system/runstate.c                  |  31 ++++-
 system/vl.c                        |   9 ++
 target/ppc/machine.c               |   4 +
 qemu-options.hx                    |   9 +-
 scripts/replay-dump.py             | 167 ++++++++++++++++++---------
 tests/avocado/replay_kernel.py     |  11 ++
 tests/avocado/replay_linux.py      |  97 +++++++++++++++-
 tests/avocado/reverse_debugging.py | 176 ++++++++++++++++++++++++-----
 24 files changed, 635 insertions(+), 192 deletions(-)

-- 
2.42.0



Thread overview: 43+ messages
2024-03-11 17:40 [PATCH v4 00/24] replay: fixes and new test cases Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 01/24] scripts/replay-dump.py: Update to current rr record format Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 02/24] scripts/replay-dump.py: rejig decoders in event number order Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 03/24] tests/avocado: excercise scripts/replay-dump.py in replay tests Nicholas Piggin
2024-03-12 13:25   ` Alex Bennée
2024-03-11 17:40 ` [PATCH v4 04/24] replay: allow runstate shutdown->running when replaying trace Nicholas Piggin
2024-03-12 13:26   ` Alex Bennée
2024-03-11 17:40 ` [PATCH v4 05/24] Revert "replay: stop us hanging in rr_wait_io_event" Nicholas Piggin
2024-03-12 13:33   ` Alex Bennée
2024-03-12 14:03     ` Nicholas Piggin
2024-03-12 21:03       ` Alex Bennée
2024-03-13  5:27         ` Nicholas Piggin
2024-03-14  5:19         ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 06/24] chardev: set record/replay on the base device of a muxed device Nicholas Piggin
2024-03-12 12:39   ` Marc-André Lureau
2024-03-12 14:11     ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 07/24] replay: Fix migration use of clock Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 08/24] replay: Fix migration replay_mutex locking Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 09/24] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 10/24] virtio-net: Use virtual time for RSC timers Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 11/24] net: Use virtual time for net announce Nicholas Piggin
2024-03-12  9:09   ` Pavel Dovgalyuk
2024-03-12 11:05     ` Nicholas Piggin
2024-03-12 11:12       ` Pavel Dovgalyuk
2024-03-13  5:38         ` Nicholas Piggin
2024-03-13  7:09         ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 12/24] savevm: Fix load_snapshot error path crash Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 13/24] tests/avocado: replay_linux.py remove the timeout expected guards Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 14/24] tests/avocado/reverse_debugging.py: mark aarch64 and pseries as not flaky Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 15/24] tests/avocado: reverse_debugging.py add test for x86-64 q35 machine Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 16/24] tests/avocado: reverse_debugging.py verify addresses between record and replay Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 17/24] tests/avocado: reverse_debugging.py stop VM before sampling icount Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 18/24] tests/avocado: reverse_debugging reverse-step at the end of the trace Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 19/24] tests/avocado: reverse_debugging.py add snapshot testing Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 20/24] replay: simple auto-snapshot mode for record Nicholas Piggin
2024-03-12  9:00   ` Pavel Dovgalyuk
2024-03-12 10:43     ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 21/24] tests/avocado: reverse_debugging.py test auto-snapshot mode Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 22/24] target/ppc: fix timebase register reset state Nicholas Piggin
2024-03-12 13:24   ` Alex Bennée
2024-03-12 13:47     ` Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 23/24] spapr: Fix vpa dispatch count for record-replay Nicholas Piggin
2024-03-11 17:40 ` [PATCH v4 24/24] tests/avocado: replay_linux.py add ppc64 pseries test Nicholas Piggin
