All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/17] replay: Fixes and avocado test updates
@ 2024-12-20 10:42 Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 01/17] replay: Fix migration use of clock for statistics Nicholas Piggin
                   ` (17 more replies)
  0 siblings, 18 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Hi,

This is another round of replay fixes posted here

https://lore.kernel.org/qemu-devel/20240813050638.446172-1-npiggin@gmail.com/

A bunch of those fixes have been merged, but there are still some
outstanding here.

Dropped from the series is the net announce change, which seemed to
be the main issue Pavel had so far:

https://lore.kernel.org/qemu-devel/6e9b8e49-f00f-46fc-bbf8-4af27e0c3906@ispras.ru/

New in this series is a reworking of the replay BH APIs because people
didn't like the replay_xxx APIs throughout the tree. These new APIs
also have some assertions added to catch un-converted users when replay
is enabled, because it is far harder to debug it when it surfaces as a
replay failure.

These new API assertions caught a hw/ide replay bug which solves some
replay_linux test hangs. Couple of fixes in the replay_linux test case,
and now all tests are passing including aarch64 tests, see here

  https://gitlab.com/npiggin/qemu/-/jobs/8695386122

(In that run a couple of the x86_64 tests were disabled to fit the
aarch64 tests in because gitlab seems to kill the job after 1 hour
so we can't fit them all in)

ppc64 also passes replay_linux after a couple of ppc64 fixes I'll post
a patch to add the ppc64 test later after everything works through.

Thanks,
Nick


Nicholas Piggin (17):
  replay: Fix migration use of clock for statistics
  replay: Fix migration replay_mutex locking
  async: rework async event API for replay
  util/main-loop: Convert to new bh API
  util/thread-pool: Convert to new bh API
  util/aio-wait: Convert to new bh API
  async/coroutine: Convert to new bh API
  migration: Convert to new bh API
  monitor: Convert to new bh API
  qmp: Convert to new bh API
  block: Convert to new bh API
  hw/ide: Fix record-replay and convert to new bh API
  hw/scsi: Convert to new bh API
  async: add debugging assertions for record/replay in bh APIs
  tests/avocado/replay_linux: Fix compile error
  tests/avocado/replay_linux: Fix cdrom device setup
  tests/avocado/replay_linux: remove the timeout expected guards

 docs/devel/replay.rst              |  7 +--
 include/block/aio.h                | 44 ++++++++++++++++--
 include/sysemu/replay.h            |  2 +-
 backends/rng-builtin.c             |  2 +-
 block.c                            |  4 +-
 block/blkreplay.c                  | 10 +++-
 block/block-backend.c              | 24 ++++++----
 block/io.c                         |  5 +-
 block/iscsi.c                      |  5 +-
 block/nfs.c                        | 10 ++--
 block/null.c                       |  4 +-
 block/nvme.c                       |  8 ++--
 hw/ide/core.c                      |  9 ++--
 hw/net/virtio-net.c                | 14 +++---
 hw/scsi/scsi-bus.c                 | 14 ++++--
 job.c                              |  3 +-
 migration/migration.c              | 17 +++++--
 migration/savevm.c                 | 15 +++---
 monitor/monitor.c                  |  3 +-
 monitor/qmp.c                      |  5 +-
 qapi/qmp-dispatch.c                |  5 +-
 replay/replay-events.c             | 29 +++++-------
 stubs/replay-tools.c               |  2 +-
 util/aio-wait.c                    |  3 +-
 util/async.c                       | 75 ++++++++++++++++++++++++++++--
 util/main-loop.c                   |  2 +-
 util/thread-pool.c                 |  8 ++--
 scripts/block-coroutine-wrapper.py |  3 +-
 tests/avocado/replay_linux.py      |  9 ++--
 29 files changed, 245 insertions(+), 96 deletions(-)

-- 
2.45.2



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/17] replay: Fix migration use of clock for statistics
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 16:31   ` Peter Xu
  2024-12-20 10:42 ` [PATCH 02/17] replay: Fix migration replay_mutex locking Nicholas Piggin
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Migration reads CLOCK_HOST when not holding the replay_mutex, which
asserts when recording a trace. These are not guest visible so should
be CLOCK_REALTIME like other statistics in MigrationState, which do
not require the replay_mutex.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 migration/migration.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8c5bd0a75c8..2eb9e50a263 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3433,7 +3433,7 @@ static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
     MigrationThread *thread = NULL;
-    int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     MigThrError thr_error;
     bool urgent = false;
     Error *local_err = NULL;
@@ -3504,7 +3504,7 @@ static void *migration_thread(void *opaque)
         goto out;
     }
 
-    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
 
     trace_migration_thread_setup_complete();
 
@@ -3584,7 +3584,7 @@ static void *bg_migration_thread(void *opaque)
 
     migration_rate_set(RATE_LIMIT_DISABLED);
 
-    setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     /*
      * We want to save vmstate for the moment when migration has been
      * initiated but also we want to save RAM content while VM is running.
@@ -3629,7 +3629,7 @@ static void *bg_migration_thread(void *opaque)
         goto fail_setup;
     }
 
-    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
 
     trace_migration_thread_setup_complete();
 
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 02/17] replay: Fix migration replay_mutex locking
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 01/17] replay: Fix migration use of clock for statistics Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 13:08   ` Fabiano Rosas
  2024-12-20 10:42 ` [PATCH 03/17] async: rework async event API for replay Nicholas Piggin
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Migration causes a number of events that need to go in the replay
trace, such as vm state transitions. The replay_mutex lock needs to
be held for these.

The simplest approach seems to be just take it up-front when taking
the bql.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 migration/migration.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 2eb9e50a263..277fca954c1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -24,6 +24,7 @@
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
 #include "sysemu/cpu-throttle.h"
 #include "rdma.h"
 #include "ram.h"
@@ -2518,6 +2519,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     }
 
     trace_postcopy_start();
+    replay_mutex_lock();
     bql_lock();
     trace_postcopy_start_set_run();
 
@@ -2629,6 +2631,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     migration_downtime_end(ms);
 
     bql_unlock();
+    replay_mutex_unlock();
 
     if (migrate_postcopy_ram()) {
         /*
@@ -2670,6 +2673,7 @@ fail:
     }
     migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
     bql_unlock();
+    replay_mutex_unlock();
     return -1;
 }
 
@@ -2721,6 +2725,7 @@ static int migration_completion_precopy(MigrationState *s,
 {
     int ret;
 
+    replay_mutex_lock();
     bql_lock();
 
     if (!migrate_mode_is_cpr(s)) {
@@ -2746,6 +2751,7 @@ static int migration_completion_precopy(MigrationState *s,
                                              s->block_inactive);
 out_unlock:
     bql_unlock();
+    replay_mutex_unlock();
     return ret;
 }
 
@@ -3633,6 +3639,7 @@ static void *bg_migration_thread(void *opaque)
 
     trace_migration_thread_setup_complete();
 
+    replay_mutex_lock();
     bql_lock();
 
     if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
@@ -3666,6 +3673,7 @@ static void *bg_migration_thread(void *opaque)
      */
     migration_bh_schedule(bg_migration_vm_start_bh, s);
     bql_unlock();
+    replay_mutex_unlock();
 
     while (migration_is_active()) {
         MigIterateState iter_state = bg_migration_iteration_run(s);
@@ -3695,6 +3703,7 @@ fail:
         migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                 MIGRATION_STATUS_FAILED);
         bql_unlock();
+        replay_mutex_unlock();
     }
 
 fail_setup:
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/17] async: rework async event API for replay
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 01/17] replay: Fix migration use of clock for statistics Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 02/17] replay: Fix migration replay_mutex locking Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 04/17] util/main-loop: Convert to new bh API Nicholas Piggin
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Replace the functions replay_bh_schedule_event() and
replay_bh_schedule_oneshot_event() with qemu_bh_schedule_event() and
aio_bh_schedule_oneshot_event(), respectively.

Direct API conversions use QEMU_CLOCK_VIRTUAL, because the virtual
clocks then go via the replay path when replay is enabled.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 docs/devel/replay.rst   |  7 ++++---
 include/block/aio.h     | 35 +++++++++++++++++++++++++++++++----
 include/sysemu/replay.h |  2 +-
 backends/rng-builtin.c  |  2 +-
 block/block-backend.c   | 24 ++++++++++++++----------
 block/io.c              |  5 +++--
 block/iscsi.c           |  5 +++--
 block/nfs.c             | 10 ++++++----
 block/null.c            |  4 ++--
 block/nvme.c            |  8 +++++---
 hw/ide/core.c           |  7 ++++---
 hw/net/virtio-net.c     | 14 +++++++-------
 replay/replay-events.c  |  2 +-
 stubs/replay-tools.c    |  2 +-
 util/async.c            | 39 +++++++++++++++++++++++++++++++++++++--
 15 files changed, 120 insertions(+), 46 deletions(-)

diff --git a/docs/devel/replay.rst b/docs/devel/replay.rst
index 40f58d9d4fc..ca816d87587 100644
--- a/docs/devel/replay.rst
+++ b/docs/devel/replay.rst
@@ -171,9 +171,10 @@ Bottom halves
 -------------
 
 Bottom half callbacks, that affect the guest state, should be invoked through
-``replay_bh_schedule_event`` or ``replay_bh_schedule_oneshot_event`` functions.
-Their invocations are saved in record mode and synchronized with the existing
-log in replay mode.
+``qemu_bh_schedule_event`` or ``aio_bh_schedule_oneshot_event`` functions
+the with the appropriate clock type (e.g., QEMU_CLOCK_VIRTUAL. Their
+invocations are saved in record mode and synchronized with the existing log in
+replay mode.
 
 Disk I/O events are completely deterministic in our model, because
 in both record and replay modes we start virtual machine from the same
diff --git a/include/block/aio.h b/include/block/aio.h
index 43883a8a33a..bc323b0d936 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -286,17 +286,30 @@ void aio_context_unref(AioContext *ctx);
  * @name: A human-readable identifier for debugging purposes.
  */
 void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-                                  const char *name);
+                                  const char *name, QEMUClockType clock_type);
 
 /**
- * aio_bh_schedule_oneshot: Allocate a new bottom half structure that will run
- * only once and as soon as possible.
+ * aio_bh_schedule_oneshot_event: Allocate a new bottom half structure that
+ * will run only once and as soon as possible.
  *
  * A convenience wrapper for aio_bh_schedule_oneshot_full() that uses cb as the
  * name string.
  */
+#define aio_bh_schedule_oneshot_event(ctx, cb, opaque, clock_type) \
+    aio_bh_schedule_oneshot_full((ctx), (cb), (opaque), (stringify(cb)), \
+                                 clock_type)
+
+/**
+ * aio_bh_schedule_oneshot: Allocate a new bottom half structure that will run
+ * only once and as soon as possible.
+ *
+ * A legacy wrapper for aio_bh_schedule_oneshot_event() that uses realtime
+ * as the clock type. Callers should specify the clock time in order to be
+ * compatible with record/replay.
+ */
 #define aio_bh_schedule_oneshot(ctx, cb, opaque) \
-    aio_bh_schedule_oneshot_full((ctx), (cb), (opaque), (stringify(cb)))
+    aio_bh_schedule_oneshot_full((ctx), (cb), (opaque), (stringify(cb)), \
+                                 QEMU_CLOCK_REALTIME)
 
 /**
  * aio_bh_new_full: Allocate a new bottom half structure.
@@ -378,6 +391,20 @@ void aio_bh_call(QEMUBH *bh);
  */
 int aio_bh_poll(AioContext *ctx);
 
+/**
+ * qemu_bh_schedule_event: Schedule a bottom half.
+ *
+ * Scheduling a bottom half interrupts the main loop and causes the
+ * execution of the callback that was passed to qemu_bh_new.
+ *
+ * Bottom halves that are scheduled from a bottom half handler are instantly
+ * invoked.  This can create an infinite loop if a bottom half handler
+ * schedules itself.
+ *
+ * @bh: The bottom half to be scheduled.
+ */
+void qemu_bh_schedule_event(QEMUBH *bh, QEMUClockType clock_type);
+
 /**
  * qemu_bh_schedule: Schedule a bottom half.
  *
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index cba74fa9bce..30a7ae47518 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -126,7 +126,7 @@ void replay_flush_events(void);
 /*! Adds bottom half event to the queue */
 void replay_bh_schedule_event(QEMUBH *bh);
 /* Adds oneshot bottom half event to the queue */
-void replay_bh_schedule_oneshot_event(AioContext *ctx,
+void replay_bh_oneshot_event(AioContext *ctx,
     QEMUBHFunc *cb, void *opaque);
 /*! Adds input event to the queue */
 void replay_input_event(QemuConsole *src, InputEvent *evt);
diff --git a/backends/rng-builtin.c b/backends/rng-builtin.c
index f367eb665cf..eef5c61f4f5 100644
--- a/backends/rng-builtin.c
+++ b/backends/rng-builtin.c
@@ -38,7 +38,7 @@ static void rng_builtin_request_entropy(RngBackend *b, RngRequest *req)
 {
     RngBuiltin *s = RNG_BUILTIN(b);
 
-    replay_bh_schedule_event(s->bh);
+    qemu_bh_schedule_event(s->bh, QEMU_CLOCK_VIRTUAL);
 }
 
 static void rng_builtin_init(Object *obj)
diff --git a/block/block-backend.c b/block/block-backend.c
index 85bcdedcef6..2168729340d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1530,8 +1530,8 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
     acb->blk = blk;
     acb->ret = ret;
 
-    replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
-                                     error_callback_bh, acb);
+    aio_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
+                                  error_callback_bh, acb, QEMU_CLOCK_VIRTUAL);
     return &acb->common;
 }
 
@@ -1588,8 +1588,9 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset,
 
     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
-        replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
-                                         blk_aio_complete_bh, acb);
+        aio_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
+                                      blk_aio_complete_bh, acb,
+                                      QEMU_CLOCK_VIRTUAL);
     }
 
     return &acb->common;
@@ -1894,8 +1895,9 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
 
     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
-        replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
-                                         blk_aio_complete_bh, acb);
+        aio_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
+                                      blk_aio_complete_bh, acb,
+                                      QEMU_CLOCK_VIRTUAL);
     }
 
     return &acb->common;
@@ -1935,8 +1937,9 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
 
     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
-        replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
-                                         blk_aio_complete_bh, acb);
+        aio_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
+                                      blk_aio_complete_bh, acb,
+                                      QEMU_CLOCK_VIRTUAL);
     }
 
     return &acb->common;
@@ -1974,8 +1977,9 @@ BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
     aio_co_enter(qemu_get_current_aio_context(), co);
     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
-        replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
-                                         blk_aio_complete_bh, acb);
+        aio_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
+                                      blk_aio_complete_bh, acb,
+                                      QEMU_CLOCK_VIRTUAL);
     }
 
     return &acb->common;
diff --git a/block/io.c b/block/io.c
index 301514c8808..fcce0710824 100644
--- a/block/io.c
+++ b/block/io.c
@@ -335,8 +335,9 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
         bdrv_inc_in_flight(bs);
     }
 
-    replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
-                                     bdrv_co_drain_bh_cb, &data);
+    aio_bh_schedule_oneshot_event(qemu_get_aio_context(),
+                                  bdrv_co_drain_bh_cb, &data,
+                                  QEMU_CLOCK_VIRTUAL);
 
     qemu_coroutine_yield();
     /* If we are resumed from some other event (such as an aio completion or a
diff --git a/block/iscsi.c b/block/iscsi.c
index 979bf90cb79..98ed86b9831 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -285,8 +285,9 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
     }
 
     if (iTask->co) {
-        replay_bh_schedule_oneshot_event(iTask->iscsilun->aio_context,
-                                         iscsi_co_generic_bh_cb, iTask);
+        aio_bh_schedule_oneshot_event(iTask->iscsilun->aio_context,
+                                      iscsi_co_generic_bh_cb, iTask,
+                                      QEMU_CLOCK_VIRTUAL);
     } else {
         iTask->complete = 1;
     }
diff --git a/block/nfs.c b/block/nfs.c
index 0500f60c08f..66d9df0d89b 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -256,8 +256,9 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
     if (task->ret < 0) {
         error_report("NFS Error: %s", nfs_get_error(nfs));
     }
-    replay_bh_schedule_oneshot_event(task->client->aio_context,
-                                     nfs_co_generic_bh_cb, task);
+    aio_bh_schedule_oneshot_event(task->client->aio_context,
+                                  nfs_co_generic_bh_cb, task,
+                                  QEMU_CLOCK_VIRTUAL);
 }
 
 static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, int64_t offset,
@@ -723,8 +724,9 @@ nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data,
     if (task->ret < 0) {
         error_report("NFS Error: %s", nfs_get_error(nfs));
     }
-    replay_bh_schedule_oneshot_event(task->client->aio_context,
-                                     nfs_co_generic_bh_cb, task);
+    aio_bh_schedule_oneshot_event(task->client->aio_context,
+                                  nfs_co_generic_bh_cb, task,
+                                  QEMU_CLOCK_VIRTUAL);
 }
 
 static int64_t coroutine_fn nfs_co_get_allocated_file_size(BlockDriverState *bs)
diff --git a/block/null.c b/block/null.c
index 4730acc1eb2..d35b42cbb44 100644
--- a/block/null.c
+++ b/block/null.c
@@ -183,8 +183,8 @@ static inline BlockAIOCB *null_aio_common(BlockDriverState *bs,
         timer_mod_ns(&acb->timer,
                      qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + s->latency_ns);
     } else {
-        replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
-                                         null_bh_cb, acb);
+        aio_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
+                                      null_bh_cb, acb, QEMU_CLOCK_VIRTUAL);
     }
     return &acb->common;
 }
diff --git a/block/nvme.c b/block/nvme.c
index 3b588b139f6..4069639dc7a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -344,8 +344,9 @@ static void nvme_put_free_req_locked(NVMeQueuePair *q, NVMeRequest *req)
 static void nvme_wake_free_req_locked(NVMeQueuePair *q)
 {
     if (!qemu_co_queue_empty(&q->free_req_queue)) {
-        replay_bh_schedule_oneshot_event(q->s->aio_context,
-                nvme_free_req_queue_cb, q);
+        aio_bh_schedule_oneshot_event(q->s->aio_context,
+                                      nvme_free_req_queue_cb, q,
+                                      QEMU_CLOCK_VIRTUAL);
     }
 }
 
@@ -1177,7 +1178,8 @@ static void nvme_rw_cb(void *opaque, int ret)
         /* The rw coroutine hasn't yielded, don't try to enter. */
         return;
     }
-    replay_bh_schedule_oneshot_event(data->ctx, nvme_rw_cb_bh, data);
+    aio_bh_schedule_oneshot_event(data->ctx, nvme_rw_cb_bh, data,
+                                  QEMU_CLOCK_VIRTUAL);
 }
 
 static coroutine_fn int nvme_co_prw_aligned(BlockDriverState *bs,
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 08d92184554..c527caf3d69 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -516,7 +516,7 @@ static void ide_issue_trim_cb(void *opaque, int ret)
 done:
     iocb->aiocb = NULL;
     if (iocb->bh) {
-        replay_bh_schedule_event(iocb->bh);
+        qemu_bh_schedule_event(iocb->bh, QEMU_CLOCK_VIRTUAL);
     }
 }
 
@@ -2368,8 +2368,9 @@ void ide_ctrl_write(void *opaque, uint32_t addr, uint32_t val)
             s = &bus->ifs[i];
             s->status |= BUSY_STAT;
         }
-        replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
-                                         ide_bus_perform_srst, bus);
+        aio_bh_schedule_oneshot_event(qemu_get_aio_context(),
+                                      ide_bus_perform_srst, bus,
+                                      QEMU_CLOCK_VIRTUAL);
     }
 
     bus->cmd = val;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4fd1f9accab..ecbf274cb04 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -415,10 +415,10 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 
         if (queue_started) {
             if (q->tx_timer) {
-                timer_mod(q->tx_timer,
-                               qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
+                timer_mod(q->tx_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                                       n->tx_timeout);
             } else {
-                replay_bh_schedule_event(q->tx_bh);
+                qemu_bh_schedule_event(q->tx_bh, QEMU_CLOCK_VIRTUAL);
             }
         } else {
             if (q->tx_timer) {
@@ -2705,7 +2705,7 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
          */
         virtio_queue_set_notification(q->tx_vq, 0);
         if (q->tx_bh) {
-            replay_bh_schedule_event(q->tx_bh);
+            qemu_bh_schedule_event(q->tx_bh, QEMU_CLOCK_VIRTUAL);
         } else {
             timer_mod(q->tx_timer,
                       qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
@@ -2871,7 +2871,7 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
         return;
     }
     virtio_queue_set_notification(vq, 0);
-    replay_bh_schedule_event(q->tx_bh);
+    qemu_bh_schedule_event(q->tx_bh, QEMU_CLOCK_VIRTUAL);
 }
 
 static void virtio_net_tx_timer(void *opaque)
@@ -2954,7 +2954,7 @@ static void virtio_net_tx_bh(void *opaque)
     /* If we flush a full burst of packets, assume there are
      * more coming and immediately reschedule */
     if (ret >= n->tx_burst) {
-        replay_bh_schedule_event(q->tx_bh);
+        qemu_bh_schedule_event(q->tx_bh, QEMU_CLOCK_VIRTUAL);
         q->tx_waiting = 1;
         return;
     }
@@ -2968,7 +2968,7 @@ static void virtio_net_tx_bh(void *opaque)
         return;
     } else if (ret > 0) {
         virtio_queue_set_notification(q->tx_vq, 0);
-        replay_bh_schedule_event(q->tx_bh);
+        qemu_bh_schedule_event(q->tx_bh, QEMU_CLOCK_VIRTUAL);
         q->tx_waiting = 1;
     }
 }
diff --git a/replay/replay-events.c b/replay/replay-events.c
index 2e46eda6bf8..d4b095b2097 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -131,7 +131,7 @@ void replay_bh_schedule_event(QEMUBH *bh)
     }
 }
 
-void replay_bh_schedule_oneshot_event(AioContext *ctx,
+void replay_bh_oneshot_event(AioContext *ctx,
     QEMUBHFunc *cb, void *opaque)
 {
     if (events_enabled) {
diff --git a/stubs/replay-tools.c b/stubs/replay-tools.c
index 3e8ca3212d9..31985af35f7 100644
--- a/stubs/replay-tools.c
+++ b/stubs/replay-tools.c
@@ -30,7 +30,7 @@ void replay_bh_schedule_event(QEMUBH *bh)
     qemu_bh_schedule(bh);
 }
 
-void replay_bh_schedule_oneshot_event(AioContext *ctx,
+void replay_bh_oneshot_event(AioContext *ctx,
      QEMUBHFunc *cb, void *opaque)
 {
     aio_bh_schedule_oneshot(ctx, cb, opaque);
diff --git a/util/async.c b/util/async.c
index 99db28389f6..b88083ecbe7 100644
--- a/util/async.c
+++ b/util/async.c
@@ -36,6 +36,7 @@
 #include "qemu/coroutine_int.h"
 #include "qemu/coroutine-tls.h"
 #include "sysemu/cpu-timers.h"
+#include "sysemu/replay.h"
 #include "trace.h"
 
 /***********************************************************/
@@ -126,8 +127,8 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
     return bh;
 }
 
-void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
-                                  void *opaque, const char *name)
+static void do_aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
+                                            void *opaque, const char *name)
 {
     QEMUBH *bh;
     bh = g_new(QEMUBH, 1);
@@ -140,6 +141,24 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
     aio_bh_enqueue(bh, BH_SCHEDULED | BH_ONESHOT);
 }
 
+void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
+                                  void *opaque, const char *name,
+                                  QEMUClockType clock_type)
+{
+    switch (clock_type) {
+    case QEMU_CLOCK_VIRTUAL:
+    case QEMU_CLOCK_VIRTUAL_RT:
+        if (replay_mode != REPLAY_MODE_NONE) {
+            /* Record/replay must intercept bh events */
+            replay_bh_oneshot_event(ctx, cb, opaque);
+            break;
+        }
+        /* fallthrough */
+    default:
+        do_aio_bh_schedule_oneshot_full(ctx, cb, opaque, name);
+    }
+}
+
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
                         const char *name, MemReentrancyGuard *reentrancy_guard)
 {
@@ -226,6 +245,22 @@ int aio_bh_poll(AioContext *ctx)
     return ret;
 }
 
+void qemu_bh_schedule_event(QEMUBH *bh, QEMUClockType clock_type)
+{
+    switch (clock_type) {
+    case QEMU_CLOCK_VIRTUAL:
+    case QEMU_CLOCK_VIRTUAL_RT:
+        if (replay_mode != REPLAY_MODE_NONE) {
+            /* Record/replay must intercept bh events */
+            replay_bh_schedule_event(bh);
+            break;
+        }
+        /* fallthrough */
+    default:
+        aio_bh_enqueue(bh, BH_SCHEDULED);
+    }
+}
+
 void qemu_bh_schedule_idle(QEMUBH *bh)
 {
     aio_bh_enqueue(bh, BH_SCHEDULED | BH_IDLE);
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 04/17] util/main-loop: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (2 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 03/17] async: rework async event API for replay Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 05/17] util/thread-pool: " Nicholas Piggin
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert qemu_bh_schedule() to qemu_bh_schedule_event(), which can
specify the clock type, making it compatible with record-replay.

The notify event does not affect target machine state, so it should
use QEMU_CLOCK_REALTIME so it is not recorded and replayed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 util/main-loop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/main-loop.c b/util/main-loop.c
index a0386cfeb60..6180a183f50 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -148,7 +148,7 @@ void qemu_notify_event(void)
     if (!qemu_aio_context) {
         return;
     }
-    qemu_bh_schedule(qemu_notify_bh);
+    qemu_bh_schedule_event(qemu_notify_bh, QEMU_CLOCK_REALTIME);
 }
 
 static GArray *gpollfds;
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 05/17] util/thread-pool: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (3 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 04/17] util/main-loop: Convert to new bh API Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 06/17] util/aio-wait: " Nicholas Piggin
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert qemu_bh_schedule() to qemu_bh_schedule_event(), which can
specify the clock type, making it compatible with record-replay.

The thread pool management does not affect target machine state,
so it should use QEMU_CLOCK_REALTIME so it is not recorded and
replayed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 util/thread-pool.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/thread-pool.c b/util/thread-pool.c
index 27eb777e855..010eb4ad9a6 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -115,7 +115,7 @@ static void *worker_thread(void *opaque)
         smp_wmb();
         req->state = THREAD_DONE;
 
-        qemu_bh_schedule(pool->completion_bh);
+        qemu_bh_schedule_event(pool->completion_bh, QEMU_CLOCK_REALTIME);
         qemu_mutex_lock(&pool->lock);
     }
 
@@ -167,7 +167,7 @@ static void spawn_thread(ThreadPool *pool)
      * inherit the correct affinity instead of the vcpu affinity.
      */
     if (!pool->pending_threads) {
-        qemu_bh_schedule(pool->new_thread_bh);
+        qemu_bh_schedule_event(pool->new_thread_bh, QEMU_CLOCK_REALTIME);
     }
 }
 
@@ -195,7 +195,7 @@ restart:
             /* Schedule ourselves in case elem->common.cb() calls aio_poll() to
              * wait for another request that completed at the same time.
              */
-            qemu_bh_schedule(pool->completion_bh);
+            qemu_bh_schedule_event(pool->completion_bh, QEMU_CLOCK_REALTIME);
 
             elem->common.cb(elem->common.opaque, elem->ret);
 
@@ -225,7 +225,7 @@ static void thread_pool_cancel(BlockAIOCB *acb)
     QEMU_LOCK_GUARD(&pool->lock);
     if (elem->state == THREAD_QUEUED) {
         QTAILQ_REMOVE(&pool->request_list, elem, reqs);
-        qemu_bh_schedule(pool->completion_bh);
+        qemu_bh_schedule_event(pool->completion_bh, QEMU_CLOCK_REALTIME);
 
         elem->state = THREAD_DONE;
         elem->ret = -ECANCELED;
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/17] util/aio-wait: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (4 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 05/17] util/thread-pool: " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 07/17] async/coroutine: " Nicholas Piggin
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert qemu_bh_schedule() to qemu_bh_schedule_event(), which can
specify the clock type, making it compatible with record-replay.

aio_wait_kick does not affect target machine state, so it should use
QEMU_CLOCK_REALTIME so it is not recorded and replayed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 util/aio-wait.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/aio-wait.c b/util/aio-wait.c
index b5336cf5fd2..2137abd4d29 100644
--- a/util/aio-wait.c
+++ b/util/aio-wait.c
@@ -51,7 +51,8 @@ void aio_wait_kick(void)
     smp_mb();
 
     if (qatomic_read(&global_aio_wait.num_waiters)) {
-        aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
+        aio_bh_schedule_oneshot_event(qemu_get_aio_context(), dummy_bh_cb,
+                                      NULL, QEMU_CLOCK_REALTIME);
     }
 }
 
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/17] async/coroutine: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (5 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 06/17] util/aio-wait: " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 08/17] migration: " Nicholas Piggin
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert qemu_bh_schedule() to qemu_bh_schedule_event() and
aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(), which can
specify the clock type, making it compatible with record-replay.

Coroutine scheduling does not affect target machine state, so it should
use QEMU_CLOCK_REALTIME so it is not recorded and replayed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 job.c                              | 3 ++-
 util/async.c                       | 2 +-
 scripts/block-coroutine-wrapper.py | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/job.c b/job.c
index 660ce22c56b..45e9d9fc096 100644
--- a/job.c
+++ b/job.c
@@ -1109,7 +1109,8 @@ static void coroutine_fn job_co_entry(void *opaque)
         job->deferred_to_main_loop = true;
         job->busy = true;
     }
-    aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
+    aio_bh_schedule_oneshot_event(qemu_get_aio_context(), job_exit, job,
+                                  QEMU_CLOCK_REALTIME);
 }
 
 void job_start(Job *job)
diff --git a/util/async.c b/util/async.c
index b88083ecbe7..6ac994effec 100644
--- a/util/async.c
+++ b/util/async.c
@@ -683,7 +683,7 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
 
     QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                               co, co_scheduled_next);
-    qemu_bh_schedule(ctx->co_schedule_bh);
+    qemu_bh_schedule_event(ctx->co_schedule_bh, QEMU_CLOCK_REALTIME);
 
     aio_context_unref(ctx);
 }
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index dbbde99e39e..c111c322487 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -292,7 +292,8 @@ def gen_no_co_wrapper(func: FuncDecl) -> str:
     }};
     assert(qemu_in_coroutine());
 
-    aio_bh_schedule_oneshot(qemu_get_aio_context(), {name}_bh, &s);
+    aio_bh_schedule_oneshot_event(qemu_get_aio_context(), {name}_bh, &s,
+                                  QEMU_CLOCK_REALTIME);
     qemu_coroutine_yield();
 
     {func.ret}
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/17] migration: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (6 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 07/17] async/coroutine: " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 09/17] monitor: " Nicholas Piggin
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
which can specify the clock type, making it compatible with
record-replay.

Snapsohtting does not affect target machine state, so it should use
QEMU_CLOCK_REALTIME so it is not recorded and replayed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 migration/savevm.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 98821c81207..8477f116c79 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3441,8 +3441,9 @@ static int coroutine_fn snapshot_save_job_run(Job *job, Error **errp)
     SnapshotJob *s = container_of(job, SnapshotJob, common);
     s->errp = errp;
     s->co = qemu_coroutine_self();
-    aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                            snapshot_save_job_bh, job);
+    aio_bh_schedule_oneshot_event(qemu_get_aio_context(),
+                                  snapshot_save_job_bh, job,
+                                  QEMU_CLOCK_REALTIME);
     qemu_coroutine_yield();
     return s->ret ? 0 : -1;
 }
@@ -3452,8 +3453,9 @@ static int coroutine_fn snapshot_load_job_run(Job *job, Error **errp)
     SnapshotJob *s = container_of(job, SnapshotJob, common);
     s->errp = errp;
     s->co = qemu_coroutine_self();
-    aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                            snapshot_load_job_bh, job);
+    aio_bh_schedule_oneshot_event(qemu_get_aio_context(),
+                                  snapshot_load_job_bh, job,
+                                  QEMU_CLOCK_REALTIME);
     qemu_coroutine_yield();
     return s->ret ? 0 : -1;
 }
@@ -3463,8 +3465,9 @@ static int coroutine_fn snapshot_delete_job_run(Job *job, Error **errp)
     SnapshotJob *s = container_of(job, SnapshotJob, common);
     s->errp = errp;
     s->co = qemu_coroutine_self();
-    aio_bh_schedule_oneshot(qemu_get_aio_context(),
-                            snapshot_delete_job_bh, job);
+    aio_bh_schedule_oneshot_event(qemu_get_aio_context(),
+                                  snapshot_delete_job_bh, job,
+                                  QEMU_CLOCK_REALTIME);
     qemu_coroutine_yield();
     return s->ret ? 0 : -1;
 }
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/17] monitor: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (7 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 08/17] migration: " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 10/17] qmp: " Nicholas Piggin
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
which can specify the clock type, making it compatible with
record-replay.

Monitor creation and input does not affect target machine state, so it
should use QEMU_CLOCK_REALTIME so it is not recorded and replayed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 monitor/monitor.c | 3 ++-
 monitor/qmp.c     | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/monitor/monitor.c b/monitor/monitor.c
index 56786c0ccc7..accdee88b51 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -583,7 +583,8 @@ void monitor_resume(Monitor *mon)
             ctx = qemu_get_aio_context();
         }
 
-        aio_bh_schedule_oneshot(ctx, monitor_accept_input, mon);
+        aio_bh_schedule_oneshot_event(ctx, monitor_accept_input, mon,
+                                      QEMU_CLOCK_REALTIME);
     }
 
     trace_monitor_suspend(mon, -1);
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 5e538f34c0d..c6fec04860f 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -541,8 +541,9 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
          * since chardev might be running in the monitor I/O
          * thread.  Schedule a bottom half.
          */
-        aio_bh_schedule_oneshot(iothread_get_aio_context(mon_iothread),
-                                monitor_qmp_setup_handlers_bh, mon);
+        aio_bh_schedule_oneshot_event(iothread_get_aio_context(mon_iothread),
+                                      monitor_qmp_setup_handlers_bh, mon,
+                                      QEMU_CLOCK_REALTIME);
         /* The bottom half will add @mon to @mon_list */
     } else {
         qemu_chr_fe_set_handlers(&mon->common.chr, monitor_can_read,
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/17] qmp: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (8 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 09/17] monitor: " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 11/17] block: " Nicholas Piggin
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
which can specify the clock type, making it compatible with
record-replay.

qmp dispatching does not affect target machine state, so it should use
QEMU_CLOCK_REALTIME so it is not recorded and replayed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 qapi/qmp-dispatch.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 176b549473c..4a07d75a856 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -254,8 +254,9 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
             .errp       = &err,
             .co         = qemu_coroutine_self(),
         };
-        aio_bh_schedule_oneshot(iohandler_get_aio_context(), do_qmp_dispatch_bh,
-                                &data);
+        aio_bh_schedule_oneshot_event(iohandler_get_aio_context(),
+                                      do_qmp_dispatch_bh, &data,
+                                      QEMU_CLOCK_REALTIME);
         qemu_coroutine_yield();
     }
     qobject_unref(args);
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 11/17] block: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (9 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 10/17] qmp: " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 12/17] hw/ide: Fix record-replay and convert " Nicholas Piggin
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert qemu_bh_schedule() to qemu_bh_schedule_event() and
aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(), which can
specify the clock type, making it compatible with record-replay.

unreferencing a bdrv does not affect target machine state, so it should
use QEMU_CLOCK_REALTIME so it is not recorded and replayed. blkreplay
has cases at startup where the device is used before the event code is
set up, so it needs special handling to avoid creating bh replay events.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/block/aio.h    |  9 +++++++++
 block.c                |  4 +++-
 block/blkreplay.c      | 10 +++++++++-
 replay/replay-events.c |  7 ++-----
 util/async.c           |  5 +++++
 5 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index bc323b0d936..26859bd0b93 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -405,6 +405,15 @@ int aio_bh_poll(AioContext *ctx);
  */
 void qemu_bh_schedule_event(QEMUBH *bh, QEMUClockType clock_type);
 
+/**
+ * qemu_bh_schedule_event_noreplay: Schedule a bottom half avoiding replay.
+ *
+ * This function is not to be used outside record-replay code.
+ *
+ * @bh: The bottom half to be scheduled.
+ */
+void qemu_bh_schedule_event_noreplay(QEMUBH *bh);
+
 /**
  * qemu_bh_schedule: Schedule a bottom half.
  *
diff --git a/block.c b/block.c
index 7d90007cae8..77e6c6b3c7e 100644
--- a/block.c
+++ b/block.c
@@ -7144,7 +7144,9 @@ void bdrv_schedule_unref(BlockDriverState *bs)
     if (!bs) {
         return;
     }
-    aio_bh_schedule_oneshot(qemu_get_aio_context(), bdrv_schedule_unref_bh, bs);
+    aio_bh_schedule_oneshot_event(qemu_get_aio_context(),
+                                  bdrv_schedule_unref_bh, bs,
+                                  QEMU_CLOCK_REALTIME);
 }
 
 struct BdrvOpBlocker {
diff --git a/block/blkreplay.c b/block/blkreplay.c
index 792d980aa9d..c6b9d91062e 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -67,7 +67,15 @@ static void block_request_create(uint64_t reqid, BlockDriverState *bs,
         .co = co,
         .bh = aio_bh_new(bdrv_get_aio_context(bs), blkreplay_bh_cb, req),
     };
-    replay_block_event(req->bh, reqid);
+    if (replay_events_enabled()) {
+        replay_block_event(req->bh, reqid);
+    } else {
+        /*
+         * block can be used before replay is initialized. Work around
+         * that here.
+         */
+        qemu_bh_schedule_event_noreplay(req->bh);
+    }
 }
 
 static int coroutine_fn GRAPH_RDLOCK
diff --git a/replay/replay-events.c b/replay/replay-events.c
index d4b095b2097..6a7c27cac1e 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -154,11 +154,8 @@ void replay_add_input_sync_event(void)
 
 void replay_block_event(QEMUBH *bh, uint64_t id)
 {
-    if (events_enabled) {
-        replay_add_event(REPLAY_ASYNC_EVENT_BLOCK, bh, NULL, id);
-    } else {
-        qemu_bh_schedule(bh);
-    }
+    g_assert(events_enabled);
+    replay_add_event(REPLAY_ASYNC_EVENT_BLOCK, bh, NULL, id);
 }
 
 static void replay_save_event(Event *event)
diff --git a/util/async.c b/util/async.c
index 6ac994effec..5d2c76dec08 100644
--- a/util/async.c
+++ b/util/async.c
@@ -261,6 +261,11 @@ void qemu_bh_schedule_event(QEMUBH *bh, QEMUClockType clock_type)
     }
 }
 
+void qemu_bh_schedule_event_noreplay(QEMUBH *bh)
+{
+    aio_bh_enqueue(bh, BH_SCHEDULED);
+}
+
 void qemu_bh_schedule_idle(QEMUBH *bh)
 {
     aio_bh_enqueue(bh, BH_SCHEDULED | BH_IDLE);
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/17] hw/ide: Fix record-replay and convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (10 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 11/17] block: " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 13/17] hw/scsi: Convert " Nicholas Piggin
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert qemu_bh_schedule() to qemu_bh_schedule_event(), which can
specify the clock type, making it compatible with record-replay.

Restarting the IDE device does affect target machine state, so it should
use QEMU_CLOCK_VIRTUAL to recorded and replay the bh.

This fixes hangs in record/replay when using IDE devices.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index c527caf3d69..500ea73c8be 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2780,7 +2780,7 @@ static void ide_restart_cb(void *opaque, bool running, RunState state)
 
     if (!bus->bh) {
         bus->bh = qemu_bh_new(ide_restart_bh, bus);
-        qemu_bh_schedule(bus->bh);
+        qemu_bh_schedule_event(bus->bh, QEMU_CLOCK_VIRTUAL);
     }
 }
 
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 13/17] hw/scsi: Convert to new bh API
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (11 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 12/17] hw/ide: Fix record-replay and convert " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 23:54   ` Paolo Bonzini
  2024-12-20 10:42 ` [PATCH 14/17] async: add debugging assertions for record/replay in bh APIs Nicholas Piggin
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
which can specify the clock type, making it compatible with
record-replay.

Operations on SCSI reqs do affect target machine state, so it should
use QEMU_CLOCK_VIRTUAL to recorded and replay the bh.

This fixes hangs in record/replay when using SCSI devices.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/scsi/scsi-bus.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 2f1678d51e7..16f294ce6b7 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -166,9 +166,17 @@ static void scsi_device_for_each_req_async(SCSIDevice *s,
 
     /* Paired with blk_dec_in_flight() in scsi_device_for_each_req_async_bh() */
     blk_inc_in_flight(s->conf.blk);
-    aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
-                            scsi_device_for_each_req_async_bh,
-                            data);
+
+    /*
+     * This is called by device reset and does not affect the observable state
+     * of the target (because it is being reset), and by scsi_dma_restart_cb
+     * to restart DMA on vmstate change which also should not affect the state
+     * of the target (XXX is this really true?), so QEMU_CLOCK_REALTIME should
+     * be used to avoid record-replay of the bh event.
+     */
+    aio_bh_schedule_oneshot_event(blk_get_aio_context(s->conf.blk),
+                                  scsi_device_for_each_req_async_bh,
+                                  data, QEMU_CLOCK_REALTIME);
 }
 
 static void scsi_device_realize(SCSIDevice *s, Error **errp)
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 14/17] async: add debugging assertions for record/replay in bh APIs
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (12 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 13/17] hw/scsi: Convert " Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 15/17] tests/avocado/replay_linux: Fix compile error Nicholas Piggin
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Code using old bh APIs must be updated to account for whether the
bh is related to the target machine model or host QEMU operation,
in order for record/replay to work properly.

Add some assertions in the old APIs to catch unconverted code when
record/replay is active.

This caught the IDE bug when running replay_linux.py avocado test
with the x86-64 q35 non-virtio machine.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/block/aio.h    |  2 +-
 replay/replay-events.c | 20 ++++++++------------
 util/async.c           | 29 +++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 26859bd0b93..991aaae707d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -309,7 +309,7 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
  */
 #define aio_bh_schedule_oneshot(ctx, cb, opaque) \
     aio_bh_schedule_oneshot_full((ctx), (cb), (opaque), (stringify(cb)), \
-                                 QEMU_CLOCK_REALTIME)
+                                 QEMU_CLOCK_MAX)
 
 /**
  * aio_bh_new_full: Allocate a new bottom half structure.
diff --git a/replay/replay-events.c b/replay/replay-events.c
index 6a7c27cac1e..0b3dbfd46b9 100644
--- a/replay/replay-events.c
+++ b/replay/replay-events.c
@@ -123,23 +123,19 @@ void replay_add_event(ReplayAsyncEventKind event_kind,
 
 void replay_bh_schedule_event(QEMUBH *bh)
 {
-    if (events_enabled) {
-        uint64_t id = replay_get_current_icount();
-        replay_add_event(REPLAY_ASYNC_EVENT_BH, bh, NULL, id);
-    } else {
-        qemu_bh_schedule(bh);
-    }
+    uint64_t id;
+    g_assert(events_enabled);
+    id = replay_get_current_icount();
+    replay_add_event(REPLAY_ASYNC_EVENT_BH, bh, NULL, id);
 }
 
 void replay_bh_oneshot_event(AioContext *ctx,
     QEMUBHFunc *cb, void *opaque)
 {
-    if (events_enabled) {
-        uint64_t id = replay_get_current_icount();
-        replay_add_event(REPLAY_ASYNC_EVENT_BH_ONESHOT, cb, opaque, id);
-    } else {
-        aio_bh_schedule_oneshot(ctx, cb, opaque);
-    }
+    uint64_t id;
+    g_assert(events_enabled);
+    id = replay_get_current_icount();
+    replay_add_event(REPLAY_ASYNC_EVENT_BH_ONESHOT, cb, opaque, id);
 }
 
 void replay_add_input_event(struct InputEvent *event)
diff --git a/util/async.c b/util/async.c
index 5d2c76dec08..72a9eccffbe 100644
--- a/util/async.c
+++ b/util/async.c
@@ -58,6 +58,9 @@ enum {
 
     /* Schedule periodically when the event loop is idle */
     BH_IDLE      = (1 << 4),
+
+    /* BH being handled by replay machinery */
+    BH_REPLAY    = (1 << 4),
 };
 
 struct QEMUBH {
@@ -145,6 +148,17 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
                                   void *opaque, const char *name,
                                   QEMUClockType clock_type)
 {
+    if (clock_type == QEMU_CLOCK_MAX) {
+        /*
+         * aio_bh_schedule_oneshot() uses QEMU_CLOCK_MAX to say it does not
+         * know about clock context to use. It will not work in record/replay.
+         * Callers should be converted to aio_bh_schedule_oneshot_event()
+         * then this can be removed when the old API goes away.
+         */
+        g_assert(replay_mode == REPLAY_MODE_NONE);
+        clock_type = QEMU_CLOCK_REALTIME;
+    }
+
     switch (clock_type) {
     case QEMU_CLOCK_VIRTUAL:
     case QEMU_CLOCK_VIRTUAL_RT:
@@ -178,6 +192,12 @@ void aio_bh_call(QEMUBH *bh)
 {
     bool last_engaged_in_io = false;
 
+    if (bh->flags & BH_REPLAY) {
+        g_assert(!(bh->flags & BH_SCHEDULED));
+        g_assert(!(bh->flags & BH_DELETED));
+        g_assert(!(bh->flags & BH_PENDING));
+        bh->flags &= ~BH_REPLAY;
+    }
     /* Make a copy of the guard-pointer as cb may free the bh */
     MemReentrancyGuard *reentrancy_guard = bh->reentrancy_guard;
     if (reentrancy_guard) {
@@ -252,6 +272,7 @@ void qemu_bh_schedule_event(QEMUBH *bh, QEMUClockType clock_type)
     case QEMU_CLOCK_VIRTUAL_RT:
         if (replay_mode != REPLAY_MODE_NONE) {
             /* Record/replay must intercept bh events */
+            qatomic_fetch_or(&bh->flags, BH_REPLAY);
             replay_bh_schedule_event(bh);
             break;
         }
@@ -268,11 +289,15 @@ void qemu_bh_schedule_event_noreplay(QEMUBH *bh)
 
 void qemu_bh_schedule_idle(QEMUBH *bh)
 {
+    /* No mechanism for scheduling idle replay-scheduled bh at the moment */
+    g_assert(replay_mode == REPLAY_MODE_NONE);
     aio_bh_enqueue(bh, BH_SCHEDULED | BH_IDLE);
 }
 
 void qemu_bh_schedule(QEMUBH *bh)
 {
+    /* Callers should be converted to use qemu_bh_schedule_event */
+    g_assert(replay_mode == REPLAY_MODE_NONE);
     aio_bh_enqueue(bh, BH_SCHEDULED);
 }
 
@@ -280,6 +305,8 @@ void qemu_bh_schedule(QEMUBH *bh)
  */
 void qemu_bh_cancel(QEMUBH *bh)
 {
+    /* No mechanism for canceling replay-scheduled bh at the moment */
+    g_assert(!(bh->flags & BH_REPLAY));
     qatomic_and(&bh->flags, ~BH_SCHEDULED);
 }
 
@@ -288,6 +315,8 @@ void qemu_bh_cancel(QEMUBH *bh)
  */
 void qemu_bh_delete(QEMUBH *bh)
 {
+    /* No mechanism for deleting replay-scheduled bh at the moment */
+    g_assert(!(bh->flags & BH_REPLAY));
     aio_bh_enqueue(bh, BH_DELETED);
 }
 
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 15/17] tests/avocado/replay_linux: Fix compile error
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (13 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 14/17] async: add debugging assertions for record/replay in bh APIs Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 16/17] tests/avocado/replay_linux: Fix cdrom device setup Nicholas Piggin
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

The subprocess module was missing, the hunk got mismerged into a later
patch that is required to make replay_linux work, so the broken commit
was never tested in isolation.

Fixes: 4926b6e6444f ("tests/avocado: excercise scripts/replay-dump.py in replay tests")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/avocado/replay_linux.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index 59169224353..38b6f8e6652 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -11,6 +11,7 @@
 import os
 import logging
 import time
+import subprocess
 
 from avocado import skipUnless
 from avocado_qemu import BUILD_DIR
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 16/17] tests/avocado/replay_linux: Fix cdrom device setup
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (14 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 15/17] tests/avocado/replay_linux: Fix compile error Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 10:42 ` [PATCH 17/17] tests/avocado/replay_linux: remove the timeout expected guards Nicholas Piggin
  2024-12-20 11:42 ` [PATCH 00/17] replay: Fixes and avocado test updates Pavel Dovgalyuk
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

The cdrom device was not being added to the command line.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/avocado/replay_linux.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index 38b6f8e6652..6c08ddaaff1 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -57,6 +57,8 @@ def vm_add_disk(self, vm, path, id, device):
 
     def vm_add_cdrom(self, vm, path, id, device):
         vm.add_args('-drive', 'file=%s,id=disk%s,if=none,media=cdrom' % (path, id))
+        vm.add_args('-device',
+            '%s,drive=disk%s' % (device, id))
 
     def launch_and_wait(self, record, args, shift):
         self.require_netdev('user')
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 17/17] tests/avocado/replay_linux: remove the timeout expected guards
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (15 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 16/17] tests/avocado/replay_linux: Fix cdrom device setup Nicholas Piggin
@ 2024-12-20 10:42 ` Nicholas Piggin
  2024-12-20 11:42 ` [PATCH 00/17] replay: Fixes and avocado test updates Pavel Dovgalyuk
  17 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-20 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

replay_linux tests on is becoming more reliable now, so remove the
timeout expected tag.

These tests take about 400-800 seconds each, so add the SPEED=slow
guard.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Note that Alex gave R-B tag for previous version which marked
a few things as flaky. They are passing in gitlab CI now so
hopefully it's okay to keep the review.

Thanks,
Nick
---
 tests/avocado/replay_linux.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index 6c08ddaaff1..362e35c8849 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -121,7 +121,7 @@ def run_replay_dump(self, replay_path):
         except subprocess.CalledProcessError:
             self.fail('replay-dump.py failed')
 
-@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
 class ReplayLinuxX8664(ReplayLinux):
     """
     :avocado: tags=arch:x86_64
@@ -142,7 +142,7 @@ def test_pc_q35(self):
         """
         self.run_rr(shift=3)
 
-@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
 class ReplayLinuxX8664Virtio(ReplayLinux):
     """
     :avocado: tags=arch:x86_64
@@ -168,7 +168,7 @@ def test_pc_q35(self):
         """
         self.run_rr(shift=3)
 
-@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
+@skipUnless(os.getenv('SPEED') == 'slow', 'runtime limited')
 class ReplayLinuxAarch64(ReplayLinux):
     """
     :avocado: tags=accel:tcg
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/17] replay: Fixes and avocado test updates
  2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
                   ` (16 preceding siblings ...)
  2024-12-20 10:42 ` [PATCH 17/17] tests/avocado/replay_linux: remove the timeout expected guards Nicholas Piggin
@ 2024-12-20 11:42 ` Pavel Dovgalyuk
  17 siblings, 0 replies; 28+ messages in thread
From: Pavel Dovgalyuk @ 2024-12-20 11:42 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: qemu-block, Alex Bennée, Kevin Wolf, Hanna Reitz,
	Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, John Snow, Michael S. Tsirkin, Jason Wang,
	Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>

On 20.12.2024 13:42, Nicholas Piggin wrote:
> Hi,
> 
> This is another round of replay fixes posted here
> 
> https://lore.kernel.org/qemu-devel/20240813050638.446172-1-npiggin@gmail.com/
> 
> A bunch of those fixes have been merged, but there are still some
> outstanding here.
> 
> Dropped from the series is the net announce change, which seemed to
> be the main issue Pavel had so far:
> 
> https://lore.kernel.org/qemu-devel/6e9b8e49-f00f-46fc-bbf8-4af27e0c3906@ispras.ru/
> 
> New in this series is a reworking of the replay BH APIs because people
> didn't like the replay_xxx APIs throughout the tree. These new APIs
> also have some assertions added to catch un-converted users when replay
> is enabled, because it is far harder to debug it when it surfaces as a
> replay failure.
> 
> These new API assertions caught a hw/ide replay bug which solves some
> replay_linux test hangs. Couple of fixes in the replay_linux test case,
> and now all tests are passing including aarch64 tests, see here
> 
>    https://gitlab.com/npiggin/qemu/-/jobs/8695386122
> 
> (In that run a couple of the x86_64 tests were disabled to fit the
> aarch64 tests in because gitlab seems to kill the job after 1 hour
> so we can't fit them all in)
> 
> ppc64 also passes replay_linux after a couple of ppc64 fixes I'll post
> a patch to add the ppc64 test later after everything works through.
> 
> Thanks,
> Nick
> 
> 
> Nicholas Piggin (17):
>    replay: Fix migration use of clock for statistics
>    replay: Fix migration replay_mutex locking
>    async: rework async event API for replay
>    util/main-loop: Convert to new bh API
>    util/thread-pool: Convert to new bh API
>    util/aio-wait: Convert to new bh API
>    async/coroutine: Convert to new bh API
>    migration: Convert to new bh API
>    monitor: Convert to new bh API
>    qmp: Convert to new bh API
>    block: Convert to new bh API
>    hw/ide: Fix record-replay and convert to new bh API
>    hw/scsi: Convert to new bh API
>    async: add debugging assertions for record/replay in bh APIs
>    tests/avocado/replay_linux: Fix compile error
>    tests/avocado/replay_linux: Fix cdrom device setup
>    tests/avocado/replay_linux: remove the timeout expected guards
> 
>   docs/devel/replay.rst              |  7 +--
>   include/block/aio.h                | 44 ++++++++++++++++--
>   include/sysemu/replay.h            |  2 +-
>   backends/rng-builtin.c             |  2 +-
>   block.c                            |  4 +-
>   block/blkreplay.c                  | 10 +++-
>   block/block-backend.c              | 24 ++++++----
>   block/io.c                         |  5 +-
>   block/iscsi.c                      |  5 +-
>   block/nfs.c                        | 10 ++--
>   block/null.c                       |  4 +-
>   block/nvme.c                       |  8 ++--
>   hw/ide/core.c                      |  9 ++--
>   hw/net/virtio-net.c                | 14 +++---
>   hw/scsi/scsi-bus.c                 | 14 ++++--
>   job.c                              |  3 +-
>   migration/migration.c              | 17 +++++--
>   migration/savevm.c                 | 15 +++---
>   monitor/monitor.c                  |  3 +-
>   monitor/qmp.c                      |  5 +-
>   qapi/qmp-dispatch.c                |  5 +-
>   replay/replay-events.c             | 29 +++++-------
>   stubs/replay-tools.c               |  2 +-
>   util/aio-wait.c                    |  3 +-
>   util/async.c                       | 75 ++++++++++++++++++++++++++++--
>   util/main-loop.c                   |  2 +-
>   util/thread-pool.c                 |  8 ++--
>   scripts/block-coroutine-wrapper.py |  3 +-
>   tests/avocado/replay_linux.py      |  9 ++--
>   29 files changed, 245 insertions(+), 96 deletions(-)
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/17] replay: Fix migration replay_mutex locking
  2024-12-20 10:42 ` [PATCH 02/17] replay: Fix migration replay_mutex locking Nicholas Piggin
@ 2024-12-20 13:08   ` Fabiano Rosas
  2024-12-21  2:54     ` Nicholas Piggin
  0 siblings, 1 reply; 28+ messages in thread
From: Fabiano Rosas @ 2024-12-20 13:08 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: Nicholas Piggin, qemu-block, Alex Bennée, Kevin Wolf,
	Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

Nicholas Piggin <npiggin@gmail.com> writes:

Hi Nick,

I'm ignorant about replay, but we try to know why were taking the BQL in
the migration code, we move it around sometimes, etc. Can we be a bit
more strict with documentation here so we don't get stuck with a lock
that can't be changed?

> Migration causes a number of events that need to go in the replay
> trace, such as vm state transitions. The replay_mutex lock needs to
> be held for these.
>

Is it practical to explicitly list which events are those?

Are there any tests that exercise this that we could use to validate
changes around this area?

> The simplest approach seems to be just take it up-front when taking
> the bql.

But also the thing asserts if taken inside the BQL, so is the actual
matter here that we _cannot_ take the lock around the proper places?

I also see the replay lock around the main loop, so is it basically bql2
from the perspective of most of QEMU?

>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  migration/migration.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 2eb9e50a263..277fca954c1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -24,6 +24,7 @@
>  #include "socket.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
>  #include "sysemu/cpu-throttle.h"
>  #include "rdma.h"
>  #include "ram.h"
> @@ -2518,6 +2519,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>      }
>  
>      trace_postcopy_start();
> +    replay_mutex_lock();
>      bql_lock();
>      trace_postcopy_start_set_run();
>  
> @@ -2629,6 +2631,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
>      migration_downtime_end(ms);
>  
>      bql_unlock();
> +    replay_mutex_unlock();
>  
>      if (migrate_postcopy_ram()) {
>          /*
> @@ -2670,6 +2673,7 @@ fail:
>      }
>      migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
>      bql_unlock();
> +    replay_mutex_unlock();
>      return -1;
>  }
>  
> @@ -2721,6 +2725,7 @@ static int migration_completion_precopy(MigrationState *s,
>  {
>      int ret;
>  
> +    replay_mutex_lock();
>      bql_lock();
>  
>      if (!migrate_mode_is_cpr(s)) {
> @@ -2746,6 +2751,7 @@ static int migration_completion_precopy(MigrationState *s,
>                                               s->block_inactive);
>  out_unlock:
>      bql_unlock();
> +    replay_mutex_unlock();
>      return ret;
>  }
>  
> @@ -3633,6 +3639,7 @@ static void *bg_migration_thread(void *opaque)
>  
>      trace_migration_thread_setup_complete();
>  
> +    replay_mutex_lock();
>      bql_lock();
>  
>      if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
> @@ -3666,6 +3673,7 @@ static void *bg_migration_thread(void *opaque)
>       */
>      migration_bh_schedule(bg_migration_vm_start_bh, s);
>      bql_unlock();
> +    replay_mutex_unlock();
>  
>      while (migration_is_active()) {
>          MigIterateState iter_state = bg_migration_iteration_run(s);
> @@ -3695,6 +3703,7 @@ fail:
>          migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                  MIGRATION_STATUS_FAILED);
>          bql_unlock();
> +        replay_mutex_unlock();
>      }
>  
>  fail_setup:


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/17] replay: Fix migration use of clock for statistics
  2024-12-20 10:42 ` [PATCH 01/17] replay: Fix migration use of clock for statistics Nicholas Piggin
@ 2024-12-20 16:31   ` Peter Xu
  2024-12-21  3:02     ` Nicholas Piggin
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2024-12-20 16:31 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, qemu-block, Alex Bennée, Kevin Wolf, Hanna Reitz,
	Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, John Snow, Michael S. Tsirkin, Jason Wang,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

On Fri, Dec 20, 2024 at 08:42:03PM +1000, Nicholas Piggin wrote:
> Migration reads CLOCK_HOST when not holding the replay_mutex, which
> asserts when recording a trace. These are not guest visible so should
> be CLOCK_REALTIME like other statistics in MigrationState, which do
> not require the replay_mutex.

Irrelevant of the change, should we document such lock implications in
timer.h?

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  migration/migration.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 8c5bd0a75c8..2eb9e50a263 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3433,7 +3433,7 @@ static void *migration_thread(void *opaque)
>  {
>      MigrationState *s = opaque;
>      MigrationThread *thread = NULL;
> -    int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      MigThrError thr_error;
>      bool urgent = false;
>      Error *local_err = NULL;
> @@ -3504,7 +3504,7 @@ static void *migration_thread(void *opaque)
>          goto out;
>      }
>  
> -    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> +    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
>  
>      trace_migration_thread_setup_complete();
>  
> @@ -3584,7 +3584,7 @@ static void *bg_migration_thread(void *opaque)
>  
>      migration_rate_set(RATE_LIMIT_DISABLED);
>  
> -    setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      /*
>       * We want to save vmstate for the moment when migration has been
>       * initiated but also we want to save RAM content while VM is running.
> @@ -3629,7 +3629,7 @@ static void *bg_migration_thread(void *opaque)
>          goto fail_setup;
>      }
>  
> -    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> +    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
>  
>      trace_migration_thread_setup_complete();
>  
> -- 
> 2.45.2
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/17] hw/scsi: Convert to new bh API
  2024-12-20 10:42 ` [PATCH 13/17] hw/scsi: Convert " Nicholas Piggin
@ 2024-12-20 23:54   ` Paolo Bonzini
  2024-12-21  3:17     ` Nicholas Piggin
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2024-12-20 23:54 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, open list:Block layer core, Alex Bennée,
	Kevin Wolf, Hanna Reitz, Pavel Dovgalyuk, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

Il ven 20 dic 2024, 11:44 Nicholas Piggin <npiggin@gmail.com> ha scritto:

> Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
> which can specify the clock type, making it compatible with
> record-replay.
>
> Operations on SCSI reqs do affect target machine state, so it should
> use QEMU_CLOCK_VIRTUAL to recorded and replay the bh.
>

This does not seem to match the patch below?

Paolo

+    aio_bh_schedule_oneshot_event(blk_get_aio_context(s->conf.blk),
> +                                  scsi_device_for_each_req_async_bh,
> +                                  data, QEMU_CLOCK_REALTIME);
>  }
>
>  static void scsi_device_realize(SCSIDevice *s, Error **errp)
> --
> 2.45.2
>
>

[-- Attachment #2: Type: text/html, Size: 1439 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/17] replay: Fix migration replay_mutex locking
  2024-12-20 13:08   ` Fabiano Rosas
@ 2024-12-21  2:54     ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-21  2:54 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: qemu-block, Alex Bennée, Kevin Wolf, Hanna Reitz,
	Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, John Snow, Michael S. Tsirkin, Jason Wang,
	Vladimir Sementsov-Ogievskiy, Peter Xu, Dr. David Alan Gilbert,
	Markus Armbruster, Michael Roth, Wainer dos Santos Moschetta

On Fri Dec 20, 2024 at 11:08 PM AEST, Fabiano Rosas wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> Hi Nick,
>
> I'm ignorant about replay, but we try to know why were taking the BQL in
> the migration code, we move it around sometimes, etc. Can we be a bit
> more strict with documentation here so we don't get stuck with a lock
> that can't be changed?
>
> > Migration causes a number of events that need to go in the replay
> > trace, such as vm state transitions. The replay_mutex lock needs to
> > be held for these.
> >
>
> Is it practical to explicitly list which events are those?

As a general rule it is something like "while the target can be
producing or consuming rr events".

There is some record-replay handling in snapshot code (flush
events, get icount, etc) as well as SHUTDOWN_CAUSE_SNAPSHOT_LOAD
event generated and possibly a few other things. So for migration
it's not just a side effect of calling other APIs, but it is
explicitly "replay-aware", at least in some part.

I actually don't know full details about how snapshot/migrate
and record-replay work together. I know reverse debugging can
use snapshots to load the most recent possible state to
minimize replay, but that is "external" to the machine itself.
But I don't know why you would want to record and replay snapshot
loading as part of the trace. But facility exists. Pavel
understands all the big picture much better.

> Are there any tests that exercise this that we could use to validate
> changes around this area?

Yes I added some more avocado testing which includes snapshotting
while recording, and that's where I hit these bugs. I do plan to
submit that up as soon as this series gets in, just trying to keep
things managable. In that case we could defer this patch from this
series (the replay_linux test does not do any snapshotting as yet).

I think once you have some regression tests, you probably won't
have to worry _too_ much about record/replay details in migraiton.

>
> > The simplest approach seems to be just take it up-front when taking
> > the bql.
>
> But also the thing asserts if taken inside the BQL, so is the actual
> matter here that we _cannot_ take the lock around the proper places?

Yes, that is part of it in this case. Some other code drops the
bql then retakes both... but that is more complex and requires
knowledge of calling bql context to be sure it is safe to drop.

> I also see the replay lock around the main loop, so is it basically bql2
> from the perspective of most of QEMU?

Unfortunately it is a big scope yes. Basically it needs to
maintain atomicity between the event log entry that we record or
replay (run N instructions; raise an interrupt; get a character
from keyboard; etc.,) and the actual running of that operation
in the machine.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/17] replay: Fix migration use of clock for statistics
  2024-12-20 16:31   ` Peter Xu
@ 2024-12-21  3:02     ` Nicholas Piggin
  2024-12-23 17:26       ` Peter Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-21  3:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, qemu-block, Alex Bennée, Kevin Wolf, Hanna Reitz,
	Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, John Snow, Michael S. Tsirkin, Jason Wang,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

On Sat Dec 21, 2024 at 2:31 AM AEST, Peter Xu wrote:
> On Fri, Dec 20, 2024 at 08:42:03PM +1000, Nicholas Piggin wrote:
> > Migration reads CLOCK_HOST when not holding the replay_mutex, which
> > asserts when recording a trace. These are not guest visible so should
> > be CLOCK_REALTIME like other statistics in MigrationState, which do
> > not require the replay_mutex.
>
> Irrelevant of the change, should we document such lock implications in
> timer.h?

I guess the intention was to try to avoid caller caring too much
about replay internals, so I'm not sure if that will help or
hinder understanding :(

I think the big rule is something like "if it affects guest state,
then you must use HOST or VIRTUAL*, if it does not affect guest state
then you must use REALTIME". record-replay code then takes care of
replay mutex locking.

Does get a little fuzzy around edges in code that is somewhat
aware of record-replay though, like migration/snapshots.

(Pavel please correct me if I've been saying the wrong things)

Thanks,
Nick

>
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  migration/migration.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 8c5bd0a75c8..2eb9e50a263 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -3433,7 +3433,7 @@ static void *migration_thread(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> >      MigrationThread *thread = NULL;
> > -    int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > +    int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      MigThrError thr_error;
> >      bool urgent = false;
> >      Error *local_err = NULL;
> > @@ -3504,7 +3504,7 @@ static void *migration_thread(void *opaque)
> >          goto out;
> >      }
> >  
> > -    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> > +    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
> >  
> >      trace_migration_thread_setup_complete();
> >  
> > @@ -3584,7 +3584,7 @@ static void *bg_migration_thread(void *opaque)
> >  
> >      migration_rate_set(RATE_LIMIT_DISABLED);
> >  
> > -    setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > +    setup_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      /*
> >       * We want to save vmstate for the moment when migration has been
> >       * initiated but also we want to save RAM content while VM is running.
> > @@ -3629,7 +3629,7 @@ static void *bg_migration_thread(void *opaque)
> >          goto fail_setup;
> >      }
> >  
> > -    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
> > +    s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - setup_start;
> >  
> >      trace_migration_thread_setup_complete();
> >  
> > -- 
> > 2.45.2
> > 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/17] hw/scsi: Convert to new bh API
  2024-12-20 23:54   ` Paolo Bonzini
@ 2024-12-21  3:17     ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2024-12-21  3:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, open list:Block layer core, Alex Bennée,
	Kevin Wolf, Hanna Reitz, Pavel Dovgalyuk, Stefan Hajnoczi,
	Fam Zheng, Ronnie Sahlberg, John Snow, Michael S. Tsirkin,
	Jason Wang, Vladimir Sementsov-Ogievskiy, Peter Xu, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

On Sat Dec 21, 2024 at 9:54 AM AEST, Paolo Bonzini wrote:
> Il ven 20 dic 2024, 11:44 Nicholas Piggin <npiggin@gmail.com> ha scritto:
>
> > Convert aio_bh_schedule_oneshot() to aio_bh_schedule_oneshot_event(),
> > which can specify the clock type, making it compatible with
> > record-replay.
> >
> > Operations on SCSI reqs do affect target machine state, so it should
> > use QEMU_CLOCK_VIRTUAL to recorded and replay the bh.
> >
>
> This does not seem to match the patch below?

Ah, good catch, I missed fixing the changelog.

I think scsi_device_purge_requests() does not affect target because
it is called on machine reset so the state is all going away anyway.

But initially I thought scsi_dma_restart_cb did, like the ide
restart (which was a real bug). But that caused record-replay hangs
because it is a vm_change_state handler, so I took another looks and
it seems like it perhaps just kicks the host DMA running again and
perhaps it is okay to be outside record-replay. I'm completely
confident of this though.

And now you make me look again, the IDE restart is also a vm change
state handler. So my patch for that does not solve all problems
either (it's better than nothing, but still has this bug).

AFAIK, vm state change (stop, continue) should ideally not affect
machine or emulated devices right? Is it possible to split out
the architectural SCSI/IDE restart from the DMA restart that is
reqiured by vm state change?

At least I will redo the patches and leave a comment and a qemu log
message if we hit the case of IDE vmstate change with record
replay active...

Thanks,
Nick

>
> Paolo
>
> +    aio_bh_schedule_oneshot_event(blk_get_aio_context(s->conf.blk),
> > +                                  scsi_device_for_each_req_async_bh,
> > +                                  data, QEMU_CLOCK_REALTIME);
> >  }
> >
> >  static void scsi_device_realize(SCSIDevice *s, Error **errp)
> > --
> > 2.45.2
> >
> >



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/17] replay: Fix migration use of clock for statistics
  2024-12-21  3:02     ` Nicholas Piggin
@ 2024-12-23 17:26       ` Peter Xu
  2024-12-24  7:24         ` Pavel Dovgalyuk
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2024-12-23 17:26 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, qemu-block, Alex Bennée, Kevin Wolf, Hanna Reitz,
	Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, John Snow, Michael S. Tsirkin, Jason Wang,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

On Sat, Dec 21, 2024 at 01:02:01PM +1000, Nicholas Piggin wrote:
> On Sat Dec 21, 2024 at 2:31 AM AEST, Peter Xu wrote:
> > On Fri, Dec 20, 2024 at 08:42:03PM +1000, Nicholas Piggin wrote:
> > > Migration reads CLOCK_HOST when not holding the replay_mutex, which
> > > asserts when recording a trace. These are not guest visible so should
> > > be CLOCK_REALTIME like other statistics in MigrationState, which do
> > > not require the replay_mutex.
> >
> > Irrelevant of the change, should we document such lock implications in
> > timer.h?
> 
> I guess the intention was to try to avoid caller caring too much
> about replay internals, so I'm not sure if that will help or
> hinder understanding :(

CLOCK_HOST should be the wall clock in QEMU, IIUC.  If any QEMU caller
tries to read host wall clock requires some mutex to be held.. then I don't
see how we can avoid mentioning it.  It's indeed weird if we need to take a
feature specific mutex just to read the wallclock.. But maybe I misread the
context somewhere..

> 
> I think the big rule is something like "if it affects guest state,
> then you must use HOST or VIRTUAL*, if it does not affect guest state

HOST clock logically shouldn't be relevant to guest-state?

> then you must use REALTIME". record-replay code then takes care of
> replay mutex locking.
> 
> Does get a little fuzzy around edges in code that is somewhat
> aware of record-replay though, like migration/snapshots.

Said that, I agree with the change itself - any measurement may not want to
involve NTP at all... which HOST / gtod will, but REALTIME won't.  However
this patch doesn't seem to be for that purpose..  So I'd like to double
check.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/17] replay: Fix migration use of clock for statistics
  2024-12-23 17:26       ` Peter Xu
@ 2024-12-24  7:24         ` Pavel Dovgalyuk
  2024-12-24 15:19           ` Peter Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Dovgalyuk @ 2024-12-24  7:24 UTC (permalink / raw)
  To: Peter Xu, Nicholas Piggin
  Cc: qemu-devel, qemu-block, Alex Bennée, Kevin Wolf, Hanna Reitz,
	Pavel Dovgalyuk, Paolo Bonzini, Stefan Hajnoczi, Fam Zheng,
	Ronnie Sahlberg, John Snow, Michael S. Tsirkin, Jason Wang,
	Vladimir Sementsov-Ogievskiy, Fabiano Rosas,
	Dr. David Alan Gilbert, Markus Armbruster, Michael Roth,
	Wainer dos Santos Moschetta

On 23.12.2024 20:26, Peter Xu wrote:
> On Sat, Dec 21, 2024 at 01:02:01PM +1000, Nicholas Piggin wrote:
>> On Sat Dec 21, 2024 at 2:31 AM AEST, Peter Xu wrote:
>>> On Fri, Dec 20, 2024 at 08:42:03PM +1000, Nicholas Piggin wrote:
>>>> Migration reads CLOCK_HOST when not holding the replay_mutex, which
>>>> asserts when recording a trace. These are not guest visible so should
>>>> be CLOCK_REALTIME like other statistics in MigrationState, which do
>>>> not require the replay_mutex.
>>>
>>> Irrelevant of the change, should we document such lock implications in
>>> timer.h?
>>
>> I guess the intention was to try to avoid caller caring too much
>> about replay internals, so I'm not sure if that will help or
>> hinder understanding :(
> 
> CLOCK_HOST should be the wall clock in QEMU, IIUC.  If any QEMU caller
> tries to read host wall clock requires some mutex to be held.. then I don't
> see how we can avoid mentioning it.  It's indeed weird if we need to take a
> feature specific mutex just to read the wallclock.. But maybe I misread the
> context somewhere..
> 
>>
>> I think the big rule is something like "if it affects guest state,
>> then you must use HOST or VIRTUAL*, if it does not affect guest state
> 
> HOST clock logically shouldn't be relevant to guest-state?

CLOCK_HOST is used for rtc by default. As the rtc affects the guest 
state, therefore CLOCK_HOST affects guest state too.

Migration is not related to guest state change, therefore it should 
either use realtime clock, or set some flag to make host clock reads not 
tracked by record/replay.

> 
>> then you must use REALTIME". record-replay code then takes care of
>> replay mutex locking.
>>
>> Does get a little fuzzy around edges in code that is somewhat
>> aware of record-replay though, like migration/snapshots.
> 
> Said that, I agree with the change itself - any measurement may not want to
> involve NTP at all... which HOST / gtod will, but REALTIME won't.  However
> this patch doesn't seem to be for that purpose..  So I'd like to double
> check.
> 
> Thanks,
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/17] replay: Fix migration use of clock for statistics
  2024-12-24  7:24         ` Pavel Dovgalyuk
@ 2024-12-24 15:19           ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2024-12-24 15:19 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: Nicholas Piggin, qemu-devel, qemu-block, Alex Bennée,
	Kevin Wolf, Hanna Reitz, Pavel Dovgalyuk, Paolo Bonzini,
	Stefan Hajnoczi, Fam Zheng, Ronnie Sahlberg, John Snow,
	Michael S. Tsirkin, Jason Wang, Vladimir Sementsov-Ogievskiy,
	Fabiano Rosas, Dr. David Alan Gilbert, Markus Armbruster,
	Michael Roth, Wainer dos Santos Moschetta

On Tue, Dec 24, 2024 at 10:24:51AM +0300, Pavel Dovgalyuk wrote:
> On 23.12.2024 20:26, Peter Xu wrote:
> > On Sat, Dec 21, 2024 at 01:02:01PM +1000, Nicholas Piggin wrote:
> > > On Sat Dec 21, 2024 at 2:31 AM AEST, Peter Xu wrote:
> > > > On Fri, Dec 20, 2024 at 08:42:03PM +1000, Nicholas Piggin wrote:
> > > > > Migration reads CLOCK_HOST when not holding the replay_mutex, which
> > > > > asserts when recording a trace. These are not guest visible so should
> > > > > be CLOCK_REALTIME like other statistics in MigrationState, which do
> > > > > not require the replay_mutex.
> > > > 
> > > > Irrelevant of the change, should we document such lock implications in
> > > > timer.h?
> > > 
> > > I guess the intention was to try to avoid caller caring too much
> > > about replay internals, so I'm not sure if that will help or
> > > hinder understanding :(
> > 
> > CLOCK_HOST should be the wall clock in QEMU, IIUC.  If any QEMU caller
> > tries to read host wall clock requires some mutex to be held.. then I don't
> > see how we can avoid mentioning it.  It's indeed weird if we need to take a
> > feature specific mutex just to read the wallclock.. But maybe I misread the
> > context somewhere..
> > 
> > > 
> > > I think the big rule is something like "if it affects guest state,
> > > then you must use HOST or VIRTUAL*, if it does not affect guest state
> > 
> > HOST clock logically shouldn't be relevant to guest-state?
> 
> CLOCK_HOST is used for rtc by default. As the rtc affects the guest state,
> therefore CLOCK_HOST affects guest state too.

It's not obvious to me that HOST should only be used for rtc, and it's part
of guest state.  If that's a must, I'd still suggest we add that into doc.
But then it means we lose one way to fetch host wallclock in the time API;
I still see some other users use it, I'm guessing in the way to fetch host
wall clock.

> 
> Migration is not related to guest state change, therefore it should either
> use realtime clock, or set some flag to make host clock reads not tracked by
> record/replay.

In migration's case, realtime clock suites more.  But maybe we still need
another clock indeed just to fetch host wall clock without any lock
implications.  So maybe the better way is making the tracked one to be
CLOCK_GUEST_RTC, put rich documentatation to avoid abuse, then keep HOST
the simple definition.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2024-12-24 15:20 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-20 10:42 [PATCH 00/17] replay: Fixes and avocado test updates Nicholas Piggin
2024-12-20 10:42 ` [PATCH 01/17] replay: Fix migration use of clock for statistics Nicholas Piggin
2024-12-20 16:31   ` Peter Xu
2024-12-21  3:02     ` Nicholas Piggin
2024-12-23 17:26       ` Peter Xu
2024-12-24  7:24         ` Pavel Dovgalyuk
2024-12-24 15:19           ` Peter Xu
2024-12-20 10:42 ` [PATCH 02/17] replay: Fix migration replay_mutex locking Nicholas Piggin
2024-12-20 13:08   ` Fabiano Rosas
2024-12-21  2:54     ` Nicholas Piggin
2024-12-20 10:42 ` [PATCH 03/17] async: rework async event API for replay Nicholas Piggin
2024-12-20 10:42 ` [PATCH 04/17] util/main-loop: Convert to new bh API Nicholas Piggin
2024-12-20 10:42 ` [PATCH 05/17] util/thread-pool: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 06/17] util/aio-wait: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 07/17] async/coroutine: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 08/17] migration: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 09/17] monitor: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 10/17] qmp: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 11/17] block: " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 12/17] hw/ide: Fix record-replay and convert " Nicholas Piggin
2024-12-20 10:42 ` [PATCH 13/17] hw/scsi: Convert " Nicholas Piggin
2024-12-20 23:54   ` Paolo Bonzini
2024-12-21  3:17     ` Nicholas Piggin
2024-12-20 10:42 ` [PATCH 14/17] async: add debugging assertions for record/replay in bh APIs Nicholas Piggin
2024-12-20 10:42 ` [PATCH 15/17] tests/avocado/replay_linux: Fix compile error Nicholas Piggin
2024-12-20 10:42 ` [PATCH 16/17] tests/avocado/replay_linux: Fix cdrom device setup Nicholas Piggin
2024-12-20 10:42 ` [PATCH 17/17] tests/avocado/replay_linux: remove the timeout expected guards Nicholas Piggin
2024-12-20 11:42 ` [PATCH 00/17] replay: Fixes and avocado test updates Pavel Dovgalyuk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.