* [PULL 00/33] Block layer patches
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
The following changes since commit 191710c221f65b1542f6ea7fa4d30dde6e134fd7:
Merge tag 'pull-request-2023-12-20' of https://gitlab.com/thuth/qemu into staging (2023-12-20 09:40:16 -0500)
are available in the Git repository at:
https://repo.or.cz/qemu/kevin.git tags/for-upstream
for you to fetch changes up to ec25ed82df474caea009df2ef948bfed4e6d81fd:
virtio-blk: add iothread-vq-mapping parameter (2023-12-21 22:00:38 +0100)
----------------------------------------------------------------
Block layer patches
- virtio-blk: Multiqueue support (configurable iothread per queue; see the usage sketch below)
- Make NBD export and hw/scsi thread-safe without the AioContext lock
- Fix crash when loading snapshot on inactive node
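For orientation, here is a sketch of how the new multiqueue support is
meant to be used. The JSON -device syntax follows the iothread-vq-mapping
patch at the end of this series; the iothread IDs, drive name and queue
count are illustrative only:

  $ qemu-system-x86_64 \
      --object iothread,id=iothread0 \
      --object iothread,id=iothread1 \
      --blockdev file,filename=test.img,node-name=drive0 \
      --device '{"driver": "virtio-blk-pci", "drive": "drive0",
                 "num-queues": 4,
                 "iothread-vq-mapping": [
                     {"iothread": "iothread0", "vqs": [0, 1]},
                     {"iothread": "iothread1", "vqs": [2, 3]}
                 ]}'

If the "vqs" lists are omitted, virtqueues are assigned round-robin
across the listed IOThreads.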
----------------------------------------------------------------
Kevin Wolf (3):
block: Fix crash when loading snapshot on inactive node
vl: Improve error message for conflicting -incoming and -loadvm
iotests: Basic tests for internal snapshots
Stefan Hajnoczi (30):
nbd/server: avoid per-NBDRequest nbd_client_get/put()
nbd/server: only traverse NBDExport->clients from main loop thread
nbd/server: introduce NBDClient->lock to protect fields
block/file-posix: set up Linux AIO and io_uring in the current thread
virtio-blk: add lock to protect s->rq
virtio-blk: don't lock AioContext in the completion code path
virtio-blk: don't lock AioContext in the submission code path
scsi: only access SCSIDevice->requests from one thread
virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier()
scsi: don't lock AioContext in I/O code path
dma-helpers: don't lock AioContext in dma_blk_cb()
virtio-scsi: replace AioContext lock with tmf_bh_lock
scsi: assert that callbacks run in the correct AioContext
tests: remove aio_context_acquire() tests
aio: make aio_context_acquire()/aio_context_release() a no-op
graph-lock: remove AioContext locking
block: remove AioContext locking
block: remove bdrv_co_lock()
scsi: remove AioContext locking
aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED()
aio: remove aio_context_acquire()/aio_context_release() API
docs: remove AioContext lock from IOThread docs
scsi: remove outdated AioContext lock comment
job: remove outdated AioContext locking comments
block: remove outdated AioContext locking comments
block-coroutine-wrapper: use qemu_get_current_aio_context()
string-output-visitor: show structs as "<omitted>"
qdev-properties: alias all object class properties
qdev: add IOThreadVirtQueueMappingList property type
virtio-blk: add iothread-vq-mapping parameter
qapi/virtio.json | 29 ++
docs/devel/multiple-iothreads.txt | 47 +--
hw/block/dataplane/virtio-blk.h | 3 +
include/block/aio-wait.h | 16 +-
include/block/aio.h | 17 -
include/block/block-common.h | 3 -
include/block/block-global-state.h | 23 +-
include/block/block-io.h | 12 +-
include/block/block_int-common.h | 2 -
include/block/graph-lock.h | 21 +-
include/block/snapshot.h | 2 -
include/hw/qdev-properties-system.h | 5 +
include/hw/qdev-properties.h | 4 +-
include/hw/scsi/scsi.h | 7 +-
include/hw/virtio/virtio-blk.h | 5 +-
include/hw/virtio/virtio-scsi.h | 17 +-
include/qapi/string-output-visitor.h | 6 +-
include/qemu/job.h | 20 --
block.c | 363 +++------------------
block/backup.c | 4 +-
block/blklogwrites.c | 8 +-
block/blkverify.c | 4 +-
block/block-backend.c | 33 +-
block/commit.c | 16 +-
block/copy-before-write.c | 22 +-
block/export/export.c | 22 +-
block/export/vhost-user-blk-server.c | 4 -
block/file-posix.c | 99 +++---
block/graph-lock.c | 44 +--
block/io.c | 45 +--
block/mirror.c | 41 +--
block/monitor/bitmap-qmp-cmds.c | 20 +-
block/monitor/block-hmp-cmds.c | 29 --
block/qapi-sysemu.c | 27 +-
block/qapi.c | 18 +-
block/qcow2.c | 4 +-
block/quorum.c | 8 +-
block/raw-format.c | 5 -
block/replication.c | 72 +---
block/snapshot.c | 30 +-
block/stream.c | 12 +-
block/vmdk.c | 20 +-
block/write-threshold.c | 6 -
blockdev.c | 320 ++++--------------
blockjob.c | 30 +-
hw/block/dataplane/virtio-blk.c | 165 +++++++---
hw/block/dataplane/xen-block.c | 17 +-
hw/block/virtio-blk.c | 209 +++++++-----
hw/core/qdev-properties-system.c | 55 +++-
hw/core/qdev-properties.c | 18 +-
hw/scsi/scsi-bus.c | 183 +++++++----
hw/scsi/scsi-disk.c | 67 +---
hw/scsi/scsi-generic.c | 20 +-
hw/scsi/virtio-scsi-dataplane.c | 8 +-
hw/scsi/virtio-scsi.c | 80 ++---
job.c | 16 -
migration/block.c | 34 +-
migration/migration-hmp-cmds.c | 3 -
migration/savevm.c | 22 --
nbd/server.c | 208 +++++++++---
net/colo-compare.c | 2 -
qapi/string-output-visitor.c | 16 +
qemu-img.c | 4 -
qemu-io.c | 10 +-
qemu-nbd.c | 2 -
replay/replay-debugging.c | 4 -
system/dma-helpers.c | 10 +-
system/vl.c | 4 +
tests/unit/test-aio.c | 67 +---
tests/unit/test-bdrv-drain.c | 91 ++----
tests/unit/test-bdrv-graph-mod.c | 26 +-
tests/unit/test-block-iothread.c | 31 --
tests/unit/test-blockjob.c | 137 --------
tests/unit/test-replication.c | 11 -
util/async.c | 14 -
util/vhost-user-server.c | 3 -
scripts/block-coroutine-wrapper.py | 13 +-
tests/qemu-iotests/202 | 2 +-
tests/qemu-iotests/203 | 3 +-
tests/qemu-iotests/tests/qcow2-internal-snapshots | 170 ++++++++++
.../tests/qcow2-internal-snapshots.out | 107 ++++++
tests/tsan/suppressions.tsan | 1 -
82 files changed, 1337 insertions(+), 2041 deletions(-)
create mode 100755 tests/qemu-iotests/tests/qcow2-internal-snapshots
create mode 100644 tests/qemu-iotests/tests/qcow2-internal-snapshots.out
* [PULL 01/33] nbd/server: avoid per-NBDRequest nbd_client_get/put()
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
nbd_trip() processes a single NBD request from start to finish and holds
an NBDClient reference throughout. NBDRequest does not outlive the scope
of nbd_trip(). Therefore it is unnecessary to ref/unref NBDClient for
each NBDRequest.
Removing these nbd_client_get()/nbd_client_put() calls will make
thread-safety easier in the commits that follow.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20231221192452.1785567-5-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
nbd/server.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/nbd/server.c b/nbd/server.c
index 895cf0a752..0b09ccc8dc 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1557,7 +1557,6 @@ static NBDRequestData *nbd_request_get(NBDClient *client)
client->nb_requests++;
req = g_new0(NBDRequestData, 1);
- nbd_client_get(client);
req->client = client;
return req;
}
@@ -1578,8 +1577,6 @@ static void nbd_request_put(NBDRequestData *req)
}
nbd_client_receive_next_request(client);
-
- nbd_client_put(client);
}
static void blk_aio_attached(AioContext *ctx, void *opaque)
--
2.43.0
* [PULL 02/33] nbd/server: only traverse NBDExport->clients from main loop thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The NBD clients list is currently accessed from both the export
AioContext and the main loop thread. When the AioContext lock is removed
there will be nothing protecting the clients list.
Adding a lock around the clients list is tricky because NBDClient
structs are refcounted and may be freed from the export AioContext or
the main loop thread. nbd_export_request_shutdown() -> client_close() ->
nbd_client_put() is also tricky because the list lock would be held
while indirectly dropping references to NBDClients.
A simpler approach is to only allow nbd_client_put() and client_close()
calls from the main loop thread. Then the NBD clients list is only
accessed from the main loop thread and no fancy locking is needed.
nbd_trip() just needs to reschedule itself in the main loop AioContext
before calling nbd_client_put() and client_close(). This costs more CPU
cycles per NBD request so add nbd_client_put_nonzero() to optimize the
common case where more references to NBDClient remain.
Note that nbd_client_get() can still be called from either thread, so
make NBDClient->refcount atomic.
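For readers unfamiliar with the idiom, here is a self-contained sketch
of the same decrement-unless-last loop written against plain C11 atomics
instead of QEMU's qatomic_*() wrappers (the function name is made up):

  #include <stdatomic.h>
  #include <stdbool.h>

  /* Drop one reference unless it is the last one. Returns false if the
   * caller holds the final reference and must take the slow path (in
   * the patch below: reschedule into the main loop thread first). */
  static bool ref_put_nonzero(atomic_int *refcount)
  {
      int old = atomic_load(refcount);

      do {
          if (old == 1) {
              return false;
          }
          /* On failure, atomic_compare_exchange_weak reloads 'old'. */
      } while (!atomic_compare_exchange_weak(refcount, &old, old - 1));

      return true;
  }

The common case thus costs a single compare-and-swap, and only the final
reference pays for rescheduling into the main loop.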
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231221192452.1785567-6-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
nbd/server.c | 61 +++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 51 insertions(+), 10 deletions(-)
diff --git a/nbd/server.c b/nbd/server.c
index 0b09ccc8dc..e91e2e0903 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -122,7 +122,7 @@ struct NBDMetaContexts {
};
struct NBDClient {
- int refcount;
+ int refcount; /* atomic */
void (*close_fn)(NBDClient *client, bool negotiated);
NBDExport *exp;
@@ -1501,14 +1501,17 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
#define MAX_NBD_REQUESTS 16
+/* Runs in export AioContext and main loop thread */
void nbd_client_get(NBDClient *client)
{
- client->refcount++;
+ qatomic_inc(&client->refcount);
}
void nbd_client_put(NBDClient *client)
{
- if (--client->refcount == 0) {
+ assert(qemu_in_main_thread());
+
+ if (qatomic_fetch_dec(&client->refcount) == 1) {
/* The last reference should be dropped by client->close,
* which is called by client_close.
*/
@@ -1529,8 +1532,35 @@ void nbd_client_put(NBDClient *client)
}
}
+/*
+ * Tries to release the reference to @client, but only if other references
+ * remain. This is an optimization for the common case where we want to avoid
+ * the expense of scheduling nbd_client_put() in the main loop thread.
+ *
+ * Returns true upon success or false if the reference was not released because
+ * it is the last reference.
+ */
+static bool nbd_client_put_nonzero(NBDClient *client)
+{
+ int old = qatomic_read(&client->refcount);
+ int expected;
+
+ do {
+ if (old == 1) {
+ return false;
+ }
+
+ expected = old;
+ old = qatomic_cmpxchg(&client->refcount, expected, expected - 1);
+ } while (old != expected);
+
+ return true;
+}
+
static void client_close(NBDClient *client, bool negotiated)
{
+ assert(qemu_in_main_thread());
+
if (client->closing) {
return;
}
@@ -2933,15 +2963,20 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
static coroutine_fn void nbd_trip(void *opaque)
{
NBDClient *client = opaque;
- NBDRequestData *req;
+ NBDRequestData *req = NULL;
NBDRequest request = { 0 }; /* GCC thinks it can be used uninitialized */
int ret;
Error *local_err = NULL;
+ /*
+ * Note that nbd_client_put() and client_close() must be called from the
+ * main loop thread. Use aio_co_reschedule_self() to switch AioContext
+ * before calling these functions.
+ */
+
trace_nbd_trip();
if (client->closing) {
- nbd_client_put(client);
- return;
+ goto done;
}
if (client->quiescing) {
@@ -2949,10 +2984,9 @@ static coroutine_fn void nbd_trip(void *opaque)
* We're switching between AIO contexts. Don't attempt to receive a new
* request and kick the main context which may be waiting for us.
*/
- nbd_client_put(client);
client->recv_coroutine = NULL;
aio_wait_kick();
- return;
+ goto done;
}
req = nbd_request_get(client);
@@ -3012,8 +3046,13 @@ static coroutine_fn void nbd_trip(void *opaque)
qio_channel_set_cork(client->ioc, false);
done:
- nbd_request_put(req);
- nbd_client_put(client);
+ if (req) {
+ nbd_request_put(req);
+ }
+ if (!nbd_client_put_nonzero(client)) {
+ aio_co_reschedule_self(qemu_get_aio_context());
+ nbd_client_put(client);
+ }
return;
disconnect:
@@ -3021,6 +3060,8 @@ disconnect:
error_reportf_err(local_err, "Disconnect client, due to: ");
}
nbd_request_put(req);
+
+ aio_co_reschedule_self(qemu_get_aio_context());
client_close(client, true);
nbd_client_put(client);
}
--
2.43.0
* [PULL 03/33] nbd/server: introduce NBDClient->lock to protect fields
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
NBDClient has a number of fields that are accessed by both the export
AioContext and the main loop thread. When the AioContext lock is removed
these fields will need another form of protection.
Add NBDClient->lock and protect fields that are accessed by both
threads. Also add assertions where possible and otherwise add doc
comments stating assumptions about which threads may access a field and
which locks must be held.
Note this patch moves the client->recv_coroutine assertion from
nbd_co_receive_request() to nbd_trip() where client->lock is held.
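This patch relies heavily on WITH_QEMU_LOCK_GUARD(). Below is a
self-contained sketch of how such a guard can be built; the real macro
lives in include/qemu/lockable.h and is more general. The cleanup
attribute is what makes early returns inside the guarded block safe,
such as the "return -EAGAIN" in nbd_read_eof() further down:

  #include <errno.h>
  #include <pthread.h>
  #include <stdbool.h>

  static void unlock_cleanup(pthread_mutex_t **m)
  {
      if (*m) {                   /* NULL once the block exited normally */
          pthread_mutex_unlock(*m);
      }
  }

  /* Run the guarded block exactly once: the increment expression
   * unlocks on normal fall-through, the cleanup handler on early
   * return. */
  #define WITH_LOCK_GUARD(m)                                         \
      for (__attribute__((cleanup(unlock_cleanup))) pthread_mutex_t  \
               *guard_ = (pthread_mutex_lock(m), (m));               \
           guard_;                                                   \
           pthread_mutex_unlock(guard_), guard_ = NULL)

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static bool quiescing;          /* protected by lock */

  static int read_step(void)
  {
      WITH_LOCK_GUARD(&lock) {
          if (quiescing) {
              return -EAGAIN;     /* cleanup handler releases the lock */
          }
      }
      return 0;
  }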
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231221192452.1785567-7-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
nbd/server.c | 144 +++++++++++++++++++++++++++++++++++++++------------
1 file changed, 111 insertions(+), 33 deletions(-)
diff --git a/nbd/server.c b/nbd/server.c
index e91e2e0903..941832f178 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -125,23 +125,25 @@ struct NBDClient {
int refcount; /* atomic */
void (*close_fn)(NBDClient *client, bool negotiated);
+ QemuMutex lock;
+
NBDExport *exp;
QCryptoTLSCreds *tlscreds;
char *tlsauthz;
QIOChannelSocket *sioc; /* The underlying data channel */
QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
- Coroutine *recv_coroutine;
+ Coroutine *recv_coroutine; /* protected by lock */
CoMutex send_lock;
Coroutine *send_coroutine;
- bool read_yielding;
- bool quiescing;
+ bool read_yielding; /* protected by lock */
+ bool quiescing; /* protected by lock */
QTAILQ_ENTRY(NBDClient) next;
- int nb_requests;
- bool closing;
+ int nb_requests; /* protected by lock */
+ bool closing; /* protected by lock */
uint32_t check_align; /* If non-zero, check for aligned client requests */
@@ -1415,11 +1417,18 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
len = qio_channel_readv(client->ioc, &iov, 1, errp);
if (len == QIO_CHANNEL_ERR_BLOCK) {
- client->read_yielding = true;
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ client->read_yielding = true;
+
+ /* Prompt main loop thread to re-run nbd_drained_poll() */
+ aio_wait_kick();
+ }
qio_channel_yield(client->ioc, G_IO_IN);
- client->read_yielding = false;
- if (client->quiescing) {
- return -EAGAIN;
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ client->read_yielding = false;
+ if (client->quiescing) {
+ return -EAGAIN;
+ }
}
continue;
} else if (len < 0) {
@@ -1528,6 +1537,7 @@ void nbd_client_put(NBDClient *client)
blk_exp_unref(&client->exp->common);
}
g_free(client->contexts.bitmaps);
+ qemu_mutex_destroy(&client->lock);
g_free(client);
}
}
@@ -1561,11 +1571,13 @@ static void client_close(NBDClient *client, bool negotiated)
{
assert(qemu_in_main_thread());
- if (client->closing) {
- return;
- }
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ if (client->closing) {
+ return;
+ }
- client->closing = true;
+ client->closing = true;
+ }
/* Force requests to finish. They will drop their own references,
* then we'll close the socket and free the NBDClient.
@@ -1579,6 +1591,7 @@ static void client_close(NBDClient *client, bool negotiated)
}
}
+/* Runs in export AioContext with client->lock held */
static NBDRequestData *nbd_request_get(NBDClient *client)
{
NBDRequestData *req;
@@ -1591,6 +1604,7 @@ static NBDRequestData *nbd_request_get(NBDClient *client)
return req;
}
+/* Runs in export AioContext with client->lock held */
static void nbd_request_put(NBDRequestData *req)
{
NBDClient *client = req->client;
@@ -1614,14 +1628,18 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
NBDExport *exp = opaque;
NBDClient *client;
+ assert(qemu_in_main_thread());
+
trace_nbd_blk_aio_attached(exp->name, ctx);
exp->common.ctx = ctx;
QTAILQ_FOREACH(client, &exp->clients, next) {
- assert(client->nb_requests == 0);
- assert(client->recv_coroutine == NULL);
- assert(client->send_coroutine == NULL);
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ assert(client->nb_requests == 0);
+ assert(client->recv_coroutine == NULL);
+ assert(client->send_coroutine == NULL);
+ }
}
}
@@ -1629,6 +1647,8 @@ static void blk_aio_detach(void *opaque)
{
NBDExport *exp = opaque;
+ assert(qemu_in_main_thread());
+
trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
exp->common.ctx = NULL;
@@ -1639,8 +1659,12 @@ static void nbd_drained_begin(void *opaque)
NBDExport *exp = opaque;
NBDClient *client;
+ assert(qemu_in_main_thread());
+
QTAILQ_FOREACH(client, &exp->clients, next) {
- client->quiescing = true;
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ client->quiescing = true;
+ }
}
}
@@ -1649,28 +1673,48 @@ static void nbd_drained_end(void *opaque)
NBDExport *exp = opaque;
NBDClient *client;
+ assert(qemu_in_main_thread());
+
QTAILQ_FOREACH(client, &exp->clients, next) {
- client->quiescing = false;
- nbd_client_receive_next_request(client);
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ client->quiescing = false;
+ nbd_client_receive_next_request(client);
+ }
}
}
+/* Runs in export AioContext */
+static void nbd_wake_read_bh(void *opaque)
+{
+ NBDClient *client = opaque;
+ qio_channel_wake_read(client->ioc);
+}
+
static bool nbd_drained_poll(void *opaque)
{
NBDExport *exp = opaque;
NBDClient *client;
+ assert(qemu_in_main_thread());
+
QTAILQ_FOREACH(client, &exp->clients, next) {
- if (client->nb_requests != 0) {
- /*
- * If there's a coroutine waiting for a request on nbd_read_eof()
- * enter it here so we don't depend on the client to wake it up.
- */
- if (client->recv_coroutine != NULL && client->read_yielding) {
- qio_channel_wake_read(client->ioc);
- }
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ if (client->nb_requests != 0) {
+ /*
+ * If there's a coroutine waiting for a request on nbd_read_eof()
+ * enter it here so we don't depend on the client to wake it up.
+ *
+ * Schedule a BH in the export AioContext to avoid missing the
+ * wake up due to the race between qio_channel_wake_read() and
+ * qio_channel_yield().
+ */
+ if (client->recv_coroutine != NULL && client->read_yielding) {
+ aio_bh_schedule_oneshot(nbd_export_aio_context(client->exp),
+ nbd_wake_read_bh, client);
+ }
- return true;
+ return true;
+ }
}
}
@@ -1681,6 +1725,8 @@ static void nbd_eject_notifier(Notifier *n, void *data)
{
NBDExport *exp = container_of(n, NBDExport, eject_notifier);
+ assert(qemu_in_main_thread());
+
blk_exp_request_shutdown(&exp->common);
}
@@ -2566,7 +2612,6 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
int ret;
g_assert(qemu_in_coroutine());
- assert(client->recv_coroutine == qemu_coroutine_self());
ret = nbd_receive_request(client, request, errp);
if (ret < 0) {
return ret;
@@ -2975,6 +3020,9 @@ static coroutine_fn void nbd_trip(void *opaque)
*/
trace_nbd_trip();
+
+ qemu_mutex_lock(&client->lock);
+
if (client->closing) {
goto done;
}
@@ -2990,7 +3038,21 @@ static coroutine_fn void nbd_trip(void *opaque)
}
req = nbd_request_get(client);
- ret = nbd_co_receive_request(req, &request, &local_err);
+
+ /*
+ * nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
+ * set client->quiescing but by the time we get back nbd_drained_end() may
+ * have already cleared client->quiescing. In that case we try again
+ * because nothing else will spawn an nbd_trip() coroutine until we set
+ * client->recv_coroutine = NULL further down.
+ */
+ do {
+ assert(client->recv_coroutine == qemu_coroutine_self());
+ qemu_mutex_unlock(&client->lock);
+ ret = nbd_co_receive_request(req, &request, &local_err);
+ qemu_mutex_lock(&client->lock);
+ } while (ret == -EAGAIN && !client->quiescing);
+
client->recv_coroutine = NULL;
if (client->closing) {
@@ -3002,15 +3064,16 @@ static coroutine_fn void nbd_trip(void *opaque)
}
if (ret == -EAGAIN) {
- assert(client->quiescing);
goto done;
}
nbd_client_receive_next_request(client);
+
if (ret == -EIO) {
goto disconnect;
}
+ qemu_mutex_unlock(&client->lock);
qio_channel_set_cork(client->ioc, true);
if (ret < 0) {
@@ -3030,6 +3093,10 @@ static coroutine_fn void nbd_trip(void *opaque)
g_free(request.contexts->bitmaps);
g_free(request.contexts);
}
+
+ qio_channel_set_cork(client->ioc, false);
+ qemu_mutex_lock(&client->lock);
+
if (ret < 0) {
error_prepend(&local_err, "Failed to send reply: ");
goto disconnect;
@@ -3044,11 +3111,13 @@ static coroutine_fn void nbd_trip(void *opaque)
goto disconnect;
}
- qio_channel_set_cork(client->ioc, false);
done:
if (req) {
nbd_request_put(req);
}
+
+ qemu_mutex_unlock(&client->lock);
+
if (!nbd_client_put_nonzero(client)) {
aio_co_reschedule_self(qemu_get_aio_context());
nbd_client_put(client);
@@ -3059,13 +3128,19 @@ disconnect:
if (local_err) {
error_reportf_err(local_err, "Disconnect client, due to: ");
}
+
nbd_request_put(req);
+ qemu_mutex_unlock(&client->lock);
aio_co_reschedule_self(qemu_get_aio_context());
client_close(client, true);
nbd_client_put(client);
}
+/*
+ * Runs in export AioContext and main loop thread. Caller must hold
+ * client->lock.
+ */
static void nbd_client_receive_next_request(NBDClient *client)
{
if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
@@ -3091,7 +3166,9 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
return;
}
- nbd_client_receive_next_request(client);
+ WITH_QEMU_LOCK_GUARD(&client->lock) {
+ nbd_client_receive_next_request(client);
+ }
}
/*
@@ -3108,6 +3185,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
Coroutine *co;
client = g_new0(NBDClient, 1);
+ qemu_mutex_init(&client->lock);
client->refcount = 1;
client->tlscreds = tlscreds;
if (tlscreds) {
--
2.43.0
* [PULL 04/33] block/file-posix: set up Linux AIO and io_uring in the current thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The file-posix block driver currently only sets up Linux AIO and
io_uring in the BDS's AioContext. In the multi-queue block layer we must
be able to submit I/O requests in AioContexts that do not have Linux AIO
and io_uring set up yet since any thread can call into the block driver.
Set up Linux AIO and io_uring for the current AioContext during request
submission. We lose the ability to return an error from
.bdrv_file_open() when Linux AIO and io_uring setup fails (e.g. due to
resource limits). Instead the user only gets warnings and we fall back
to aio=threads. This is still better than a fatal error after startup.
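A minimal sketch of the resulting shape: set the backend up lazily on
first use in whichever thread submits I/O, warn once, and degrade to a
fallback instead of failing. The names are stand-ins, and the real code
keys this state off the AioContext rather than a thread-local:

  #include <stdbool.h>
  #include <stdio.h>

  static bool setup_ring(void)      /* stand-in for the real setup call */
  {
      return false;                 /* pretend resource limits were hit */
  }

  static _Thread_local enum { UNTRIED, READY, FAILED } ring_state;

  static bool ring_available(void)
  {
      if (ring_state == UNTRIED) {
          ring_state = setup_ring() ? READY : FAILED;
          if (ring_state == FAILED) {
              fprintf(stderr, "Unable to use io_uring, "
                              "falling back to thread pool\n");
          }
      }
      return ring_state == READY;
  }

  static void submit_request(void)
  {
      if (ring_available()) {
          /* submit via io_uring */
      } else {
          /* submit via the thread pool */
      }
  }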
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/file-posix.c | 99 ++++++++++++++++++++++------------------------
1 file changed, 47 insertions(+), 52 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index b862406c71..0b9d4679c4 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -712,17 +712,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
#ifdef CONFIG_LINUX_AIO
/* Currently Linux does AIO only for files opened with O_DIRECT */
- if (s->use_linux_aio) {
- if (!(s->open_flags & O_DIRECT)) {
- error_setg(errp, "aio=native was specified, but it requires "
- "cache.direct=on, which was not specified.");
- ret = -EINVAL;
- goto fail;
- }
- if (!aio_setup_linux_aio(bdrv_get_aio_context(bs), errp)) {
- error_prepend(errp, "Unable to use native AIO: ");
- goto fail;
- }
+ if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
+ error_setg(errp, "aio=native was specified, but it requires "
+ "cache.direct=on, which was not specified.");
+ ret = -EINVAL;
+ goto fail;
}
#else
if (s->use_linux_aio) {
@@ -733,14 +727,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
}
#endif /* !defined(CONFIG_LINUX_AIO) */
-#ifdef CONFIG_LINUX_IO_URING
- if (s->use_linux_io_uring) {
- if (!aio_setup_linux_io_uring(bdrv_get_aio_context(bs), errp)) {
- error_prepend(errp, "Unable to use io_uring: ");
- goto fail;
- }
- }
-#else
+#ifndef CONFIG_LINUX_IO_URING
if (s->use_linux_io_uring) {
error_setg(errp, "aio=io_uring was specified, but is not supported "
"in this build.");
@@ -2444,6 +2431,44 @@ static bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov)
return true;
}
+static inline bool raw_check_linux_io_uring(BDRVRawState *s)
+{
+ Error *local_err = NULL;
+ AioContext *ctx;
+
+ if (!s->use_linux_io_uring) {
+ return false;
+ }
+
+ ctx = qemu_get_current_aio_context();
+ if (unlikely(!aio_setup_linux_io_uring(ctx, &local_err))) {
+ error_reportf_err(local_err, "Unable to use linux io_uring, "
+ "falling back to thread pool: ");
+ s->use_linux_io_uring = false;
+ return false;
+ }
+ return true;
+}
+
+static inline bool raw_check_linux_aio(BDRVRawState *s)
+{
+ Error *local_err = NULL;
+ AioContext *ctx;
+
+ if (!s->use_linux_aio) {
+ return false;
+ }
+
+ ctx = qemu_get_current_aio_context();
+ if (unlikely(!aio_setup_linux_aio(ctx, &local_err))) {
+ error_reportf_err(local_err, "Unable to use Linux AIO, "
+ "falling back to thread pool: ");
+ s->use_linux_aio = false;
+ return false;
+ }
+ return true;
+}
+
static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
uint64_t bytes, QEMUIOVector *qiov, int type)
{
@@ -2474,13 +2499,13 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
if (s->needs_alignment && !bdrv_qiov_is_aligned(bs, qiov)) {
type |= QEMU_AIO_MISALIGNED;
#ifdef CONFIG_LINUX_IO_URING
- } else if (s->use_linux_io_uring) {
+ } else if (raw_check_linux_io_uring(s)) {
assert(qiov->size == bytes);
ret = luring_co_submit(bs, s->fd, offset, qiov, type);
goto out;
#endif
#ifdef CONFIG_LINUX_AIO
- } else if (s->use_linux_aio) {
+ } else if (raw_check_linux_aio(s)) {
assert(qiov->size == bytes);
ret = laio_co_submit(s->fd, offset, qiov, type,
s->aio_max_batch);
@@ -2567,39 +2592,13 @@ static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
};
#ifdef CONFIG_LINUX_IO_URING
- if (s->use_linux_io_uring) {
+ if (raw_check_linux_io_uring(s)) {
return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH);
}
#endif
return raw_thread_pool_submit(handle_aiocb_flush, &acb);
}
-static void raw_aio_attach_aio_context(BlockDriverState *bs,
- AioContext *new_context)
-{
- BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
- if (s->use_linux_aio) {
- Error *local_err = NULL;
- if (!aio_setup_linux_aio(new_context, &local_err)) {
- error_reportf_err(local_err, "Unable to use native AIO, "
- "falling back to thread pool: ");
- s->use_linux_aio = false;
- }
- }
-#endif
-#ifdef CONFIG_LINUX_IO_URING
- if (s->use_linux_io_uring) {
- Error *local_err = NULL;
- if (!aio_setup_linux_io_uring(new_context, &local_err)) {
- error_reportf_err(local_err, "Unable to use linux io_uring, "
- "falling back to thread pool: ");
- s->use_linux_io_uring = false;
- }
- }
-#endif
-}
-
static void raw_close(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
@@ -3896,7 +3895,6 @@ BlockDriver bdrv_file = {
.bdrv_co_copy_range_from = raw_co_copy_range_from,
.bdrv_co_copy_range_to = raw_co_copy_range_to,
.bdrv_refresh_limits = raw_refresh_limits,
- .bdrv_attach_aio_context = raw_aio_attach_aio_context,
.bdrv_co_truncate = raw_co_truncate,
.bdrv_co_getlength = raw_co_getlength,
@@ -4266,7 +4264,6 @@ static BlockDriver bdrv_host_device = {
.bdrv_co_copy_range_from = raw_co_copy_range_from,
.bdrv_co_copy_range_to = raw_co_copy_range_to,
.bdrv_refresh_limits = raw_refresh_limits,
- .bdrv_attach_aio_context = raw_aio_attach_aio_context,
.bdrv_co_truncate = raw_co_truncate,
.bdrv_co_getlength = raw_co_getlength,
@@ -4402,7 +4399,6 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_co_pwritev = raw_co_pwritev,
.bdrv_co_flush_to_disk = raw_co_flush_to_disk,
.bdrv_refresh_limits = cdrom_refresh_limits,
- .bdrv_attach_aio_context = raw_aio_attach_aio_context,
.bdrv_co_truncate = raw_co_truncate,
.bdrv_co_getlength = raw_co_getlength,
@@ -4528,7 +4524,6 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_co_pwritev = raw_co_pwritev,
.bdrv_co_flush_to_disk = raw_co_flush_to_disk,
.bdrv_refresh_limits = cdrom_refresh_limits,
- .bdrv_attach_aio_context = raw_aio_attach_aio_context,
.bdrv_co_truncate = raw_co_truncate,
.bdrv_co_getlength = raw_co_getlength,
--
2.43.0
* [PULL 05/33] virtio-blk: add lock to protect s->rq
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
s->rq is accessed from IO_CODE and GLOBAL_STATE_CODE. Introduce a lock
to protect s->rq and eliminate reliance on the AioContext lock.
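The interesting idiom here is in virtio_blk_dma_restart_bh(): detach
the entire list while holding the lock, then walk it with the lock
dropped. A self-contained sketch of that shape (types and names are
illustrative, not QEMU's):

  #include <pthread.h>
  #include <stddef.h>

  struct req {
      struct req *next;
  };

  static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
  static struct req *rq;            /* protected by rq_lock */

  static void defer(struct req *r)
  {
      pthread_mutex_lock(&rq_lock);
      r->next = rq;                 /* push onto the deferred list */
      rq = r;
      pthread_mutex_unlock(&rq_lock);
  }

  static void restart_all(void (*handle)(struct req *))
  {
      struct req *r;

      pthread_mutex_lock(&rq_lock);
      r = rq;                       /* steal the whole list under the lock */
      rq = NULL;
      pthread_mutex_unlock(&rq_lock);

      while (r) {                   /* process without holding the lock */
          struct req *next = r->next;
          handle(r);
          r = next;
      }
  }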
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/hw/virtio/virtio-blk.h | 3 +-
hw/block/virtio-blk.c | 67 +++++++++++++++++++++++-----------
2 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index dafec432ce..9881009c22 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -54,7 +54,8 @@ struct VirtIOBlockReq;
struct VirtIOBlock {
VirtIODevice parent_obj;
BlockBackend *blk;
- void *rq;
+ QemuMutex rq_lock;
+ void *rq; /* protected by rq_lock */
VirtIOBlkConf conf;
unsigned short sector_mask;
bool original_wce;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index a1f8e15522..ee38e089bc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -82,8 +82,11 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
/* Break the link as the next request is going to be parsed from the
* ring again. Otherwise we may end up doing a double completion! */
req->mr_next = NULL;
- req->next = s->rq;
- s->rq = req;
+
+ WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+ req->next = s->rq;
+ s->rq = req;
+ }
} else if (action == BLOCK_ERROR_ACTION_REPORT) {
virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
if (acct_failed) {
@@ -1183,10 +1186,13 @@ static void virtio_blk_dma_restart_bh(void *opaque)
{
VirtIOBlock *s = opaque;
- VirtIOBlockReq *req = s->rq;
+ VirtIOBlockReq *req;
MultiReqBuffer mrb = {};
- s->rq = NULL;
+ WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+ req = s->rq;
+ s->rq = NULL;
+ }
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
while (req) {
@@ -1238,22 +1244,29 @@ static void virtio_blk_reset(VirtIODevice *vdev)
AioContext *ctx;
VirtIOBlockReq *req;
+ /* Dataplane has stopped... */
+ assert(!s->dataplane_started);
+
+ /* ...but requests may still be in flight. */
ctx = blk_get_aio_context(s->blk);
aio_context_acquire(ctx);
blk_drain(s->blk);
+ aio_context_release(ctx);
/* We drop queued requests after blk_drain() because blk_drain() itself can
* produce them. */
- while (s->rq) {
- req = s->rq;
- s->rq = req->next;
- virtqueue_detach_element(req->vq, &req->elem, 0);
- virtio_blk_free_request(req);
- }
+ WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+ while (s->rq) {
+ req = s->rq;
+ s->rq = req->next;
- aio_context_release(ctx);
+ /* No other threads can access req->vq here */
+ virtqueue_detach_element(req->vq, &req->elem, 0);
+
+ virtio_blk_free_request(req);
+ }
+ }
- assert(!s->dataplane_started);
blk_set_enable_write_cache(s->blk, s->original_wce);
}
@@ -1443,18 +1456,22 @@ static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
{
VirtIOBlock *s = VIRTIO_BLK(vdev);
- VirtIOBlockReq *req = s->rq;
- while (req) {
- qemu_put_sbyte(f, 1);
+ WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+ VirtIOBlockReq *req = s->rq;
- if (s->conf.num_queues > 1) {
- qemu_put_be32(f, virtio_get_queue_index(req->vq));
- }
+ while (req) {
+ qemu_put_sbyte(f, 1);
- qemu_put_virtqueue_element(vdev, f, &req->elem);
- req = req->next;
+ if (s->conf.num_queues > 1) {
+ qemu_put_be32(f, virtio_get_queue_index(req->vq));
+ }
+
+ qemu_put_virtqueue_element(vdev, f, &req->elem);
+ req = req->next;
+ }
}
+
qemu_put_sbyte(f, 0);
}
@@ -1480,8 +1497,11 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
- req->next = s->rq;
- s->rq = req;
+
+ WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+ req->next = s->rq;
+ s->rq = req;
+ }
}
return 0;
@@ -1628,6 +1648,8 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
s->host_features);
virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
+ qemu_mutex_init(&s->rq_lock);
+
s->blk = conf->conf.blk;
s->rq = NULL;
s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
@@ -1679,6 +1701,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev)
virtio_del_queue(vdev, i);
}
qemu_coroutine_dec_pool_size(conf->num_queues * conf->queue_size / 2);
+ qemu_mutex_destroy(&s->rq_lock);
blk_ram_registrar_destroy(&s->blk_ram_registrar);
qemu_del_vm_change_state_handler(s->change);
blockdev_mark_auto_del(s->blk);
--
2.43.0
* [PULL 06/33] virtio-blk: don't lock AioContext in the completion code path
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Nothing in the completion code path relies on the AioContext lock
anymore. Virtqueues are only accessed from one thread at any moment and
the s->rq global state is protected by its own lock now.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/block/virtio-blk.c | 34 ++++------------------------------
1 file changed, 4 insertions(+), 30 deletions(-)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index ee38e089bc..f5315df042 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -105,7 +105,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
VirtIOBlock *s = next->dev;
VirtIODevice *vdev = VIRTIO_DEVICE(s);
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
while (next) {
VirtIOBlockReq *req = next;
next = req->mr_next;
@@ -138,7 +137,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
block_acct_done(blk_get_stats(s->blk), &req->acct);
virtio_blk_free_request(req);
}
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
}
static void virtio_blk_flush_complete(void *opaque, int ret)
@@ -146,19 +144,13 @@ static void virtio_blk_flush_complete(void *opaque, int ret)
VirtIOBlockReq *req = opaque;
VirtIOBlock *s = req->dev;
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
- if (ret) {
- if (virtio_blk_handle_rw_error(req, -ret, 0, true)) {
- goto out;
- }
+ if (ret && virtio_blk_handle_rw_error(req, -ret, 0, true)) {
+ return;
}
virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
block_acct_done(blk_get_stats(s->blk), &req->acct);
virtio_blk_free_request(req);
-
-out:
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
}
static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
@@ -168,11 +160,8 @@ static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
bool is_write_zeroes = (virtio_ldl_p(VIRTIO_DEVICE(s), &req->out.type) &
~VIRTIO_BLK_T_BARRIER) == VIRTIO_BLK_T_WRITE_ZEROES;
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
- if (ret) {
- if (virtio_blk_handle_rw_error(req, -ret, false, is_write_zeroes)) {
- goto out;
- }
+ if (ret && virtio_blk_handle_rw_error(req, -ret, false, is_write_zeroes)) {
+ return;
}
virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
@@ -180,9 +169,6 @@ static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
block_acct_done(blk_get_stats(s->blk), &req->acct);
}
virtio_blk_free_request(req);
-
-out:
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
}
#ifdef __linux__
@@ -229,10 +215,8 @@ static void virtio_blk_ioctl_complete(void *opaque, int status)
virtio_stl_p(vdev, &scsi->data_len, hdr->dxfer_len);
out:
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
virtio_blk_req_complete(req, status);
virtio_blk_free_request(req);
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
g_free(ioctl_req);
}
@@ -672,7 +656,6 @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
{
ZoneCmdData *data = opaque;
VirtIOBlockReq *req = data->req;
- VirtIOBlock *s = req->dev;
VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
struct iovec *in_iov = data->in_iov;
unsigned in_num = data->in_num;
@@ -763,10 +746,8 @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
}
out:
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
virtio_blk_req_complete(req, err_status);
virtio_blk_free_request(req);
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
g_free(data->zone_report_data.zones);
g_free(data);
}
@@ -829,10 +810,8 @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
}
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
virtio_blk_req_complete(req, err_status);
virtio_blk_free_request(req);
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
}
static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
@@ -882,7 +861,6 @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
{
ZoneCmdData *data = opaque;
VirtIOBlockReq *req = data->req;
- VirtIOBlock *s = req->dev;
VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
int64_t append_sector, n;
uint8_t err_status = VIRTIO_BLK_S_OK;
@@ -905,10 +883,8 @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
out:
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
virtio_blk_req_complete(req, err_status);
virtio_blk_free_request(req);
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
g_free(data);
}
@@ -944,10 +920,8 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
return 0;
out:
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
virtio_blk_req_complete(req, err_status);
virtio_blk_free_request(req);
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
return err_status;
}
--
2.43.0
* [PULL 07/33] virtio-blk: don't lock AioContext in the submission code path
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
There is no need to acquire the AioContext lock around blk_aio_*() or
blk_get_geometry() anymore. I/O plugging (defer_call()) also does not
require the AioContext lock anymore.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-5-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/block/virtio-blk.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f5315df042..e110f9718b 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1111,7 +1111,6 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
MultiReqBuffer mrb = {};
bool suppress_notifications = virtio_queue_get_notification(vq);
- aio_context_acquire(blk_get_aio_context(s->blk));
defer_call_begin();
do {
@@ -1137,7 +1136,6 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
}
defer_call_end();
- aio_context_release(blk_get_aio_context(s->blk));
}
static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
@@ -1168,7 +1166,6 @@ static void virtio_blk_dma_restart_bh(void *opaque)
s->rq = NULL;
}
- aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
while (req) {
VirtIOBlockReq *next = req->next;
if (virtio_blk_handle_request(req, &mrb)) {
@@ -1192,8 +1189,6 @@ static void virtio_blk_dma_restart_bh(void *opaque)
/* Paired with inc in virtio_blk_dma_restart_cb() */
blk_dec_in_flight(s->conf.conf.blk);
-
- aio_context_release(blk_get_aio_context(s->conf.conf.blk));
}
static void virtio_blk_dma_restart_cb(void *opaque, bool running,
--
2.43.0
* [PULL 08/33] block: Fix crash when loading snapshot on inactive node
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
bdrv_is_read_only() only checks if the node is configured to be
read-only eventually, but even if it returns false, writing to the node
may not be permitted at the moment (because it's inactive).
bdrv_is_writable() checks that the node can be written to right now, and
this is what the snapshot operations really need.
Change bdrv_can_snapshot() to use bdrv_is_writable() to fix crashes like
the following:
$ ./qemu-system-x86_64 -hda /tmp/test.qcow2 -loadvm foo -incoming defer
qemu-system-x86_64: ../block/io.c:1990: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
The resulting error message after this patch isn't perfect yet, but at
least it doesn't crash any more:
$ ./qemu-system-x86_64 -hda /tmp/test.qcow2 -loadvm foo -incoming defer
qemu-system-x86_64: Device 'ide0-hd0' is writable but does not support snapshots
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231201142520.32255-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/snapshot.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/snapshot.c b/block/snapshot.c
index ec8cf4810b..c4d40e80dd 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -196,8 +196,10 @@ bdrv_snapshot_fallback(BlockDriverState *bs)
int bdrv_can_snapshot(BlockDriverState *bs)
{
BlockDriver *drv = bs->drv;
+
GLOBAL_STATE_CODE();
- if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+
+ if (!drv || !bdrv_is_inserted(bs) || !bdrv_is_writable(bs)) {
return 0;
}
--
2.43.0
* [PULL 09/33] vl: Improve error message for conflicting -incoming and -loadvm
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
Currently, the conflict between -incoming and -loadvm is only detected
when loading the snapshot fails because the image is still inactive for
the incoming migration. This results in a suboptimal error message:
$ ./qemu-system-x86_64 -hda /tmp/test.qcow2 -loadvm foo -incoming defer
qemu-system-x86_64: Device 'ide0-hd0' is writable but does not support snapshots
Catch the situation already in qemu_validate_options() to improve the
message:
$ ./qemu-system-x86_64 -hda /tmp/test.qcow2 -loadvm foo -incoming defer
qemu-system-x86_64: 'incoming' and 'loadvm' options are mutually exclusive
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231201142520.32255-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
system/vl.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/system/vl.c b/system/vl.c
index 2bcd9efb9a..6b87bfa32c 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2426,6 +2426,10 @@ static void qemu_validate_options(const QDict *machine_opts)
}
}
+ if (loadvm && incoming) {
+ error_report("'incoming' and 'loadvm' options are mutually exclusive");
+ exit(EXIT_FAILURE);
+ }
if (loadvm && preconfig_requested) {
error_report("'preconfig' and 'loadvm' options are "
"mutually exclusive");
--
2.43.0
* [PULL 10/33] iotests: Basic tests for internal snapshots
From: Kevin Wolf @ 2023-12-21 21:23 UTC
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
We have a few test cases that include tests for corner case aspects of
internal snapshots, but nothing that tests that they actually function
as snapshots or that involves deleting a snapshot. Add a test for this
kind of basic internal snapshot functionality.
The error cases include a regression test for the crash we just fixed
with snapshot operations on inactive images.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231201142520.32255-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
.../tests/qcow2-internal-snapshots | 170 ++++++++++++++++++
.../tests/qcow2-internal-snapshots.out | 107 +++++++++++
2 files changed, 277 insertions(+)
create mode 100755 tests/qemu-iotests/tests/qcow2-internal-snapshots
create mode 100644 tests/qemu-iotests/tests/qcow2-internal-snapshots.out
diff --git a/tests/qemu-iotests/tests/qcow2-internal-snapshots b/tests/qemu-iotests/tests/qcow2-internal-snapshots
new file mode 100755
index 0000000000..36523aba06
--- /dev/null
+++ b/tests/qemu-iotests/tests/qcow2-internal-snapshots
@@ -0,0 +1,170 @@
+#!/usr/bin/env bash
+# group: rw quick
+#
+# Test case for internal snapshots in qcow2
+#
+# Copyright (C) 2023 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kwolf@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+ _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto generic
+# Internal snapshots are (currently) impossible with refcount_bits=1,
+# and generally impossible with external data files
+_unsupported_imgopts 'compat=0.10' 'refcount_bits=1[^0-9]' data_file
+
+IMG_SIZE=64M
+
+_qemu()
+{
+ $QEMU -no-shutdown -nographic -monitor stdio -serial none \
+ -blockdev file,filename="$TEST_IMG",node-name=disk0-file \
+ -blockdev "$IMGFMT",file=disk0-file,node-name=disk0 \
+ -object iothread,id=iothread0 \
+ -device virtio-scsi,iothread=iothread0 \
+ -device scsi-hd,drive=disk0,share-rw=on \
+ "$@" 2>&1 |\
+ _filter_qemu | _filter_hmp | _filter_qemu_io
+}
+
+_make_test_img $IMG_SIZE
+
+echo
+echo "=== Write some data, take a snapshot and overwrite part of it ==="
+echo
+
+{
+ echo 'qemu-io disk0 "write -P0x11 0 1M"'
+ # Give qemu some time to boot before saving the VM state
+ sleep 0.5
+ echo "savevm snap0"
+ echo 'qemu-io disk0 "write -P0x22 0 512k"'
+ echo "quit"
+} | _qemu
+
+echo
+$QEMU_IMG snapshot -l "$TEST_IMG" | _filter_date | _filter_vmstate_size
+_check_test_img
+
+echo
+echo "=== Verify that loading the snapshot reverts to the old content ==="
+echo
+
+{
+ # -loadvm reverted the write from the previous QEMU instance
+ echo 'qemu-io disk0 "read -P0x11 0 1M"'
+
+ # Verify that it works without restarting QEMU, too
+ echo 'qemu-io disk0 "write -P0x33 512k 512k"'
+ echo "loadvm snap0"
+ echo 'qemu-io disk0 "read -P0x11 0 1M"'
+
+ # Verify COW by writing a partial cluster
+ echo 'qemu-io disk0 "write -P0x33 63k 2k"'
+ echo 'qemu-io disk0 "read -P0x11 0 63k"'
+ echo 'qemu-io disk0 "read -P0x33 63k 2k"'
+ echo 'qemu-io disk0 "read -P0x11 65k 63k"'
+
+ # Take a second snapshot
+ echo "savevm snap1"
+
+ echo "quit"
+} | _qemu -loadvm snap0
+
+echo
+$QEMU_IMG snapshot -l "$TEST_IMG" | _filter_date | _filter_vmstate_size
+_check_test_img
+
+echo
+echo "=== qemu-img snapshot can revert to snapshots ==="
+echo
+
+$QEMU_IMG snapshot -a snap0 "$TEST_IMG"
+$QEMU_IO -c "read -P0x11 0 1M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG snapshot -a snap1 "$TEST_IMG"
+$QEMU_IO \
+ -c "read -P0x11 0 63k" \
+ -c "read -P0x33 63k 2k" \
+ -c "read -P0x11 65k 63k" \
+ "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Deleting snapshots ==="
+echo
+{
+ # The active layer stays unaffected by deleting the snapshot
+ echo "delvm snap1"
+ echo 'qemu-io disk0 "read -P0x11 0 63k"'
+ echo 'qemu-io disk0 "read -P0x33 63k 2k"'
+ echo 'qemu-io disk0 "read -P0x11 65k 63k"'
+
+ echo "quit"
+} | _qemu
+
+
+echo
+$QEMU_IMG snapshot -l "$TEST_IMG" | _filter_date | _filter_vmstate_size
+_check_test_img
+
+echo
+echo "=== Error cases ==="
+echo
+
+# snap1 should not exist any more
+_qemu -loadvm snap1
+
+echo
+{
+ echo "loadvm snap1"
+ echo "quit"
+} | _qemu
+
+# Snapshot operations and inactive images are incompatible
+echo
+_qemu -loadvm snap0 -incoming defer
+{
+ echo "loadvm snap0"
+ echo "delvm snap0"
+ echo "savevm snap1"
+ echo "quit"
+} | _qemu -incoming defer
+
+# -loadvm and -preconfig are incompatible
+echo
+_qemu -loadvm snap0 -preconfig
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/qcow2-internal-snapshots.out b/tests/qemu-iotests/tests/qcow2-internal-snapshots.out
new file mode 100644
index 0000000000..438f535e6a
--- /dev/null
+++ b/tests/qemu-iotests/tests/qcow2-internal-snapshots.out
@@ -0,0 +1,107 @@
+QA output created by qcow2-internal-snapshots
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+=== Write some data, take a snapshot and overwrite part of it ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) qemu-io disk0 "write -P0x11 0 1M"
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) savevm snap0
+(qemu) qemu-io disk0 "write -P0x22 0 512k"
+wrote 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) quit
+
+Snapshot list:
+ID TAG VM SIZE DATE VM CLOCK ICOUNT
+1 snap0 SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000
+No errors were found on the image.
+
+=== Verify that loading the snapshot reverts to the old content ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) qemu-io disk0 "read -P0x11 0 1M"
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) qemu-io disk0 "write -P0x33 512k 512k"
+wrote 524288/524288 bytes at offset 524288
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) loadvm snap0
+(qemu) qemu-io disk0 "read -P0x11 0 1M"
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) qemu-io disk0 "write -P0x33 63k 2k"
+wrote 2048/2048 bytes at offset 64512
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) qemu-io disk0 "read -P0x11 0 63k"
+read 64512/64512 bytes at offset 0
+63 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) qemu-io disk0 "read -P0x33 63k 2k"
+read 2048/2048 bytes at offset 64512
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) qemu-io disk0 "read -P0x11 65k 63k"
+read 64512/64512 bytes at offset 66560
+63 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) savevm snap1
+(qemu) quit
+
+Snapshot list:
+ID TAG VM SIZE DATE VM CLOCK ICOUNT
+1 snap0 SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000
+2 snap1 SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000
+No errors were found on the image.
+
+=== qemu-img snapshot can revert to snapshots ===
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 64512/64512 bytes at offset 0
+63 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2048/2048 bytes at offset 64512
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 64512/64512 bytes at offset 66560
+63 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Deleting snapshots ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) delvm snap1
+(qemu) qemu-io disk0 "read -P0x11 0 63k"
+read 64512/64512 bytes at offset 0
+63 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) qemu-io disk0 "read -P0x33 63k 2k"
+read 2048/2048 bytes at offset 64512
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) qemu-io disk0 "read -P0x11 65k 63k"
+read 64512/64512 bytes at offset 66560
+63 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) quit
+
+Snapshot list:
+ID TAG VM SIZE DATE VM CLOCK ICOUNT
+1 snap0 SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000
+No errors were found on the image.
+
+=== Error cases ===
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) QEMU_PROG: Snapshot 'snap1' does not exist in one or more devices
+
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) loadvm snap1
+Error: Snapshot 'snap1' does not exist in one or more devices
+(qemu) quit
+
+QEMU_PROG: 'incoming' and 'loadvm' options are mutually exclusive
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) loadvm snap0
+Error: Device 'disk0' is writable but does not support snapshots
+(qemu) delvm snap0
+Error: Device 'disk0' is writable but does not support snapshots
+(qemu) savevm snap1
+Error: Device 'disk0' is writable but does not support snapshots
+(qemu) quit
+
+QEMU_PROG: 'preconfig' and 'loadvm' options are mutually exclusive
+*** done
--
2.43.0
* [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (9 preceding siblings ...)
2023-12-21 21:23 ` [PULL 10/33] iotests: Basic tests for internal snapshots Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2024-01-23 16:40 ` Hanna Czenczek
2023-12-21 21:23 ` [PULL 12/33] virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier() Kevin Wolf
` (21 subsequent siblings)
32 siblings, 1 reply; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Stop depending on the AioContext lock and instead access
SCSIDevice->requests from only one thread at a time:
- When the VM is running only the BlockBackend's AioContext may access
the requests list.
- When the VM is stopped only the main loop may access the requests
list.
These constraints protect the requests list without the need for locking
in the I/O code path.
Note that multiple IOThreads are not supported yet because the code
assumes all SCSIRequests are executed from a single AioContext. Leave
that as future work.
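As an illustration of this ownership rule, a hypothetical helper (not part
of the patch) could spell out when an access to the requests list is valid:

    /* Hypothetical sketch: when may SCSIDevice->requests be touched? */
    static bool scsi_requests_access_is_safe(SCSIDevice *s)
    {
        if (runstate_is_running()) {
            /* VM running: only the BlockBackend's AioContext may access it */
            return blk_get_aio_context(s->conf.blk) ==
                   qemu_get_current_aio_context();
        }
        /* VM stopped: only the main loop thread may access it */
        return qemu_in_main_thread();
    }

The two new helpers below encode exactly these cases:
scsi_device_for_each_req_sync() asserts the stopped/main-loop case, while
scsi_device_for_each_req_async() schedules a BH into the BlockBackend's
AioContext for the running case.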
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/hw/scsi/scsi.h | 7 +-
hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
2 files changed, 131 insertions(+), 57 deletions(-)
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 3692ca82f3..10c4e8288d 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -69,14 +69,19 @@ struct SCSIDevice
{
DeviceState qdev;
VMChangeStateEntry *vmsentry;
- QEMUBH *bh;
uint32_t id;
BlockConf conf;
SCSISense unit_attention;
bool sense_is_ua;
uint8_t sense[SCSI_SENSE_BUF_SIZE];
uint32_t sense_len;
+
+ /*
+ * The requests list is only accessed from the AioContext that executes
+ * requests or from the main loop when IOThread processing is stopped.
+ */
QTAILQ_HEAD(, SCSIRequest) requests;
+
uint32_t channel;
uint32_t lun;
int blocksize;
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index fc4b77fdb0..b649cdf555 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -85,6 +85,89 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int id, int lun)
return d;
}
+/*
+ * Invoke @fn() for each enqueued request in device @s. Must be called from the
+ * main loop thread while the guest is stopped. This is only suitable for
+ * vmstate ->put(), use scsi_device_for_each_req_async() for other cases.
+ */
+static void scsi_device_for_each_req_sync(SCSIDevice *s,
+ void (*fn)(SCSIRequest *, void *),
+ void *opaque)
+{
+ SCSIRequest *req;
+ SCSIRequest *next_req;
+
+ assert(!runstate_is_running());
+ assert(qemu_in_main_thread());
+
+ QTAILQ_FOREACH_SAFE(req, &s->requests, next, next_req) {
+ fn(req, opaque);
+ }
+}
+
+typedef struct {
+ SCSIDevice *s;
+ void (*fn)(SCSIRequest *, void *);
+ void *fn_opaque;
+} SCSIDeviceForEachReqAsyncData;
+
+static void scsi_device_for_each_req_async_bh(void *opaque)
+{
+ g_autofree SCSIDeviceForEachReqAsyncData *data = opaque;
+ SCSIDevice *s = data->s;
+ AioContext *ctx;
+ SCSIRequest *req;
+ SCSIRequest *next;
+
+ /*
+ * If the AioContext changed before this BH was called then reschedule into
+ * the new AioContext before accessing ->requests. This can happen when
+ * scsi_device_for_each_req_async() is called and then the AioContext is
+ * changed before BHs are run.
+ */
+ ctx = blk_get_aio_context(s->conf.blk);
+ if (ctx != qemu_get_current_aio_context()) {
+ aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh,
+ g_steal_pointer(&data));
+ return;
+ }
+
+ QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
+ data->fn(req, data->fn_opaque);
+ }
+
+ /* Drop the reference taken by scsi_device_for_each_req_async() */
+ object_unref(OBJECT(s));
+}
+
+/*
+ * Schedule @fn() to be invoked for each enqueued request in device @s. @fn()
+ * runs in the AioContext that is executing the request.
+ */
+static void scsi_device_for_each_req_async(SCSIDevice *s,
+ void (*fn)(SCSIRequest *, void *),
+ void *opaque)
+{
+ assert(qemu_in_main_thread());
+
+ SCSIDeviceForEachReqAsyncData *data =
+ g_new(SCSIDeviceForEachReqAsyncData, 1);
+
+ data->s = s;
+ data->fn = fn;
+ data->fn_opaque = opaque;
+
+ /*
+ * Hold a reference to the SCSIDevice until
+ * scsi_device_for_each_req_async_bh() finishes.
+ */
+ object_ref(OBJECT(s));
+
+ aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
+ scsi_device_for_each_req_async_bh,
+ data);
+}
+
static void scsi_device_realize(SCSIDevice *s, Error **errp)
{
SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(s);
@@ -144,20 +227,18 @@ void scsi_bus_init_named(SCSIBus *bus, size_t bus_size, DeviceState *host,
qbus_set_bus_hotplug_handler(BUS(bus));
}
-static void scsi_dma_restart_bh(void *opaque)
+void scsi_req_retry(SCSIRequest *req)
{
- SCSIDevice *s = opaque;
- SCSIRequest *req, *next;
-
- qemu_bh_delete(s->bh);
- s->bh = NULL;
+ req->retry = true;
+}
- aio_context_acquire(blk_get_aio_context(s->conf.blk));
- QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
- scsi_req_ref(req);
- if (req->retry) {
- req->retry = false;
- switch (req->cmd.mode) {
+/* Called in the AioContext that is executing the request */
+static void scsi_dma_restart_req(SCSIRequest *req, void *opaque)
+{
+ scsi_req_ref(req);
+ if (req->retry) {
+ req->retry = false;
+ switch (req->cmd.mode) {
case SCSI_XFER_FROM_DEV:
case SCSI_XFER_TO_DEV:
scsi_req_continue(req);
@@ -166,37 +247,22 @@ static void scsi_dma_restart_bh(void *opaque)
scsi_req_dequeue(req);
scsi_req_enqueue(req);
break;
- }
}
- scsi_req_unref(req);
}
- aio_context_release(blk_get_aio_context(s->conf.blk));
- /* Drop the reference that was acquired in scsi_dma_restart_cb */
- object_unref(OBJECT(s));
-}
-
-void scsi_req_retry(SCSIRequest *req)
-{
- /* No need to save a reference, because scsi_dma_restart_bh just
- * looks at the request list. */
- req->retry = true;
+ scsi_req_unref(req);
}
static void scsi_dma_restart_cb(void *opaque, bool running, RunState state)
{
SCSIDevice *s = opaque;
+ assert(qemu_in_main_thread());
+
if (!running) {
return;
}
- if (!s->bh) {
- AioContext *ctx = blk_get_aio_context(s->conf.blk);
- /* The reference is dropped in scsi_dma_restart_bh.*/
- object_ref(OBJECT(s));
- s->bh = aio_bh_new_guarded(ctx, scsi_dma_restart_bh, s,
- &DEVICE(s)->mem_reentrancy_guard);
- qemu_bh_schedule(s->bh);
- }
+
+ scsi_device_for_each_req_async(s, scsi_dma_restart_req, NULL);
}
static bool scsi_bus_is_address_free(SCSIBus *bus,
@@ -1657,15 +1723,16 @@ void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
}
}
+static void scsi_device_purge_one_req(SCSIRequest *req, void *opaque)
+{
+ scsi_req_cancel_async(req, NULL);
+}
+
void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
{
- SCSIRequest *req;
+ scsi_device_for_each_req_async(sdev, scsi_device_purge_one_req, NULL);
aio_context_acquire(blk_get_aio_context(sdev->conf.blk));
- while (!QTAILQ_EMPTY(&sdev->requests)) {
- req = QTAILQ_FIRST(&sdev->requests);
- scsi_req_cancel_async(req, NULL);
- }
blk_drain(sdev->conf.blk);
aio_context_release(blk_get_aio_context(sdev->conf.blk));
scsi_device_set_ua(sdev, sense);
@@ -1737,31 +1804,33 @@ static char *scsibus_get_fw_dev_path(DeviceState *dev)
/* SCSI request list. For simplicity, pv points to the whole device */
+static void put_scsi_req(SCSIRequest *req, void *opaque)
+{
+ QEMUFile *f = opaque;
+
+ assert(!req->io_canceled);
+ assert(req->status == -1 && req->host_status == -1);
+ assert(req->enqueued);
+
+ qemu_put_sbyte(f, req->retry ? 1 : 2);
+ qemu_put_buffer(f, req->cmd.buf, sizeof(req->cmd.buf));
+ qemu_put_be32s(f, &req->tag);
+ qemu_put_be32s(f, &req->lun);
+ if (req->bus->info->save_request) {
+ req->bus->info->save_request(f, req);
+ }
+ if (req->ops->save_request) {
+ req->ops->save_request(f, req);
+ }
+}
+
static int put_scsi_requests(QEMUFile *f, void *pv, size_t size,
const VMStateField *field, JSONWriter *vmdesc)
{
SCSIDevice *s = pv;
- SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, s->qdev.parent_bus);
- SCSIRequest *req;
- QTAILQ_FOREACH(req, &s->requests, next) {
- assert(!req->io_canceled);
- assert(req->status == -1 && req->host_status == -1);
- assert(req->enqueued);
-
- qemu_put_sbyte(f, req->retry ? 1 : 2);
- qemu_put_buffer(f, req->cmd.buf, sizeof(req->cmd.buf));
- qemu_put_be32s(f, &req->tag);
- qemu_put_be32s(f, &req->lun);
- if (bus->info->save_request) {
- bus->info->save_request(f, req);
- }
- if (req->ops->save_request) {
- req->ops->save_request(f, req);
- }
- }
+ scsi_device_for_each_req_sync(s, put_scsi_req, f);
qemu_put_sbyte(f, 0);
-
return 0;
}
--
2.43.0
* [PULL 12/33] virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier()
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (10 preceding siblings ...)
2023-12-21 21:23 ` [PULL 11/33] scsi: only access SCSIDevice->requests from one thread Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 13/33] scsi: don't lock AioContext in I/O code path Kevin Wolf
` (20 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
virtio_queue_aio_attach_host_notifier() does not require the AioContext
lock. Stop taking the lock and add an explicit smp_wmb() because we were
relying on the implicit barrier in the AioContext lock before.
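Restating the pattern used here as a sketch (not additional code from the
patch):

    /* Writer (main loop thread): publish state, then attach notifiers */
    s->dataplane_starting = false;
    s->dataplane_started = true;
    smp_wmb();   /* order the stores before the IOThread can observe them */

    /* Reader (IOThread): aio_notify_accept() provides the pairing read
     * barrier, so loads of s->dataplane_started afterwards see the
     * values published above. */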
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231204164259.1515217-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/scsi/virtio-scsi-dataplane.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 1e684beebe..135e23fe54 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -149,23 +149,17 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
memory_region_transaction_commit();
- /*
- * These fields are visible to the IOThread so we rely on implicit barriers
- * in aio_context_acquire() on the write side and aio_notify_accept() on
- * the read side.
- */
s->dataplane_starting = false;
s->dataplane_started = true;
+ smp_wmb(); /* paired with aio_notify_accept() */
if (s->bus.drain_count == 0) {
- aio_context_acquire(s->ctx);
virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
for (i = 0; i < vs->conf.num_queues; i++) {
virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
}
- aio_context_release(s->ctx);
}
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 57+ messages in thread
* [PULL 13/33] scsi: don't lock AioContext in I/O code path
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (11 preceding siblings ...)
2023-12-21 21:23 ` [PULL 12/33] virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier() Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 14/33] dma-helpers: don't lock AioContext in dma_blk_cb() Kevin Wolf
` (19 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
blk_aio_*() does not require the AioContext lock, and the SCSI subsystem's
internal state no longer does either.
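In practice this means completion callbacks, which already run in the
BlockBackend's AioContext, no longer need any bracketing. A before/after
sketch of the shape of the change in the diff below:

    /* Before: completion had to bracket with the AioContext lock */
    aio_context_acquire(blk_get_aio_context(s->conf.blk));
    scsi_command_complete_noio(r, ret);
    aio_context_release(blk_get_aio_context(s->conf.blk));

    /* After: the callback already runs in that AioContext */
    scsi_command_complete_noio(r, ret);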
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231204164259.1515217-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/scsi/scsi-disk.c | 23 -----------------------
hw/scsi/scsi-generic.c | 20 +++-----------------
2 files changed, 3 insertions(+), 40 deletions(-)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 6691f5edb8..2c1bbb3530 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -273,8 +273,6 @@ static void scsi_aio_complete(void *opaque, int ret)
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
- aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
@@ -286,7 +284,6 @@ static void scsi_aio_complete(void *opaque, int ret)
scsi_req_complete(&r->req, GOOD);
done:
- aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
scsi_req_unref(&r->req);
}
@@ -394,8 +391,6 @@ static void scsi_read_complete(void *opaque, int ret)
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
- aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
@@ -406,7 +401,6 @@ static void scsi_read_complete(void *opaque, int ret)
trace_scsi_disk_read_complete(r->req.tag, r->qiov.size);
}
scsi_read_complete_noio(r, ret);
- aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
}
/* Actually issue a read to the block device. */
@@ -448,8 +442,6 @@ static void scsi_do_read_cb(void *opaque, int ret)
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
- aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
assert (r->req.aiocb != NULL);
r->req.aiocb = NULL;
@@ -459,7 +451,6 @@ static void scsi_do_read_cb(void *opaque, int ret)
block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
}
scsi_do_read(opaque, ret);
- aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
}
/* Read more data from scsi device into buffer. */
@@ -533,8 +524,6 @@ static void scsi_write_complete(void * opaque, int ret)
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
- aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
assert (r->req.aiocb != NULL);
r->req.aiocb = NULL;
@@ -544,7 +533,6 @@ static void scsi_write_complete(void * opaque, int ret)
block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
}
scsi_write_complete_noio(r, ret);
- aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
}
static void scsi_write_data(SCSIRequest *req)
@@ -1742,8 +1730,6 @@ static void scsi_unmap_complete(void *opaque, int ret)
SCSIDiskReq *r = data->r;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
- aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
@@ -1754,7 +1740,6 @@ static void scsi_unmap_complete(void *opaque, int ret)
block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
scsi_unmap_complete_noio(data, ret);
}
- aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
}
static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
@@ -1822,8 +1807,6 @@ static void scsi_write_same_complete(void *opaque, int ret)
SCSIDiskReq *r = data->r;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
- aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
@@ -1847,7 +1830,6 @@ static void scsi_write_same_complete(void *opaque, int ret)
data->sector << BDRV_SECTOR_BITS,
&data->qiov, 0,
scsi_write_same_complete, data);
- aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
return;
}
@@ -1857,7 +1839,6 @@ done:
scsi_req_unref(&r->req);
qemu_vfree(data->iov.iov_base);
g_free(data);
- aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
}
static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
@@ -2810,7 +2791,6 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
{
SCSIBlockReq *req = (SCSIBlockReq *)opaque;
SCSIDiskReq *r = &req->req;
- SCSIDevice *s = r->req.dev;
sg_io_hdr_t *io_hdr = &req->io_header;
if (ret == 0) {
@@ -2827,13 +2807,10 @@ static void scsi_block_sgio_complete(void *opaque, int ret)
}
if (ret > 0) {
- aio_context_acquire(blk_get_aio_context(s->conf.blk));
if (scsi_handle_rw_error(r, ret, true)) {
- aio_context_release(blk_get_aio_context(s->conf.blk));
scsi_req_unref(&r->req);
return;
}
- aio_context_release(blk_get_aio_context(s->conf.blk));
/* Ignore error. */
ret = 0;
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 2417f0ad84..b7b04e1d63 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -109,15 +109,11 @@ done:
static void scsi_command_complete(void *opaque, int ret)
{
SCSIGenericReq *r = (SCSIGenericReq *)opaque;
- SCSIDevice *s = r->req.dev;
-
- aio_context_acquire(blk_get_aio_context(s->conf.blk));
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
scsi_command_complete_noio(r, ret);
- aio_context_release(blk_get_aio_context(s->conf.blk));
}
static int execute_command(BlockBackend *blk,
@@ -274,14 +270,12 @@ static void scsi_read_complete(void * opaque, int ret)
SCSIDevice *s = r->req.dev;
int len;
- aio_context_acquire(blk_get_aio_context(s->conf.blk));
-
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
if (ret || r->req.io_canceled) {
scsi_command_complete_noio(r, ret);
- goto done;
+ return;
}
len = r->io_header.dxfer_len - r->io_header.resid;
@@ -320,7 +314,7 @@ static void scsi_read_complete(void * opaque, int ret)
r->io_header.status != GOOD ||
len == 0) {
scsi_command_complete_noio(r, 0);
- goto done;
+ return;
}
/* Snoop READ CAPACITY output to set the blocksize. */
@@ -356,9 +350,6 @@ static void scsi_read_complete(void * opaque, int ret)
req_complete:
scsi_req_data(&r->req, len);
scsi_req_unref(&r->req);
-
-done:
- aio_context_release(blk_get_aio_context(s->conf.blk));
}
/* Read more data from scsi device into buffer. */
@@ -391,14 +382,12 @@ static void scsi_write_complete(void * opaque, int ret)
trace_scsi_generic_write_complete(ret);
- aio_context_acquire(blk_get_aio_context(s->conf.blk));
-
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
if (ret || r->req.io_canceled) {
scsi_command_complete_noio(r, ret);
- goto done;
+ return;
}
if (r->req.cmd.buf[0] == MODE_SELECT && r->req.cmd.buf[4] == 12 &&
@@ -408,9 +397,6 @@ static void scsi_write_complete(void * opaque, int ret)
}
scsi_command_complete_noio(r, ret);
-
-done:
- aio_context_release(blk_get_aio_context(s->conf.blk));
}
/* Write data to a scsi device. Returns nonzero on failure.
--
2.43.0
* [PULL 14/33] dma-helpers: don't lock AioContext in dma_blk_cb()
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (12 preceding siblings ...)
2023-12-21 21:23 ` [PULL 13/33] scsi: don't lock AioContext in I/O code path Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 15/33] virtio-scsi: replace AioContext lock with tmf_bh_lock Kevin Wolf
` (18 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Commit abfcd2760b3e ("dma-helpers: prevent dma_blk_cb() vs
dma_aio_cancel() race") acquired the AioContext lock inside dma_blk_cb()
to avoid a race with scsi_device_purge_requests() running in the main
loop thread.
The SCSI code no longer calls dma_aio_cancel() from the main loop thread
while I/O is running in the IOThread AioContext. Therefore it is no
longer necessary to take this lock to protect DMAAIOCB fields. The
->cb() function also does not require the lock because blk_aio_*() and
friends do not need the AioContext lock.
Both hw/ide/core.c and hw/ide/macio.c also call dma_blk_io() but don't
rely on it taking the AioContext lock, so this change is safe.
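Stated explicitly, the resulting invariant amounts to the assertion that a
later patch in this series adds to dma_blk_cb():

    /* DMAAIOCB is not thread-safe and must be accessed only from dbs->ctx */
    assert(ctx == qemu_get_current_aio_context());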
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231204164259.1515217-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
system/dma-helpers.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/system/dma-helpers.c b/system/dma-helpers.c
index 36211acc7e..528117f256 100644
--- a/system/dma-helpers.c
+++ b/system/dma-helpers.c
@@ -119,13 +119,12 @@ static void dma_blk_cb(void *opaque, int ret)
trace_dma_blk_cb(dbs, ret);
- aio_context_acquire(ctx);
dbs->acb = NULL;
dbs->offset += dbs->iov.size;
if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
dma_complete(dbs, ret);
- goto out;
+ return;
}
dma_blk_unmap(dbs);
@@ -168,7 +167,7 @@ static void dma_blk_cb(void *opaque, int ret)
trace_dma_map_wait(dbs);
dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
cpu_register_map_client(dbs->bh);
- goto out;
+ return;
}
if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
@@ -179,8 +178,6 @@ static void dma_blk_cb(void *opaque, int ret)
dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
dma_blk_cb, dbs, dbs->io_func_opaque);
assert(dbs->acb);
-out:
- aio_context_release(ctx);
}
static void dma_aio_cancel(BlockAIOCB *acb)
--
2.43.0
* [PULL 15/33] virtio-scsi: replace AioContext lock with tmf_bh_lock
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (13 preceding siblings ...)
2023-12-21 21:23 ` [PULL 14/33] dma-helpers: don't lock AioContext in dma_blk_cb() Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 16/33] scsi: assert that callbacks run in the correct AioContext Kevin Wolf
` (17 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Protect the Task Management Function BH state with a lock. The TMF BH
runs in the main loop thread. An IOThread might process a TMF at the
same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
must be protected by a lock.
Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
This avoids more locking to protect the virtqueue and SCSI layer state.
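The main loop BH uses a common idiom to keep the lock hold time short: it
drains the shared list into a local list while holding tmf_bh_lock and then
completes the requests outside the lock. A minimal sketch of the idiom
(field names as in the patch below):

    VirtIOSCSIReq *req, *tmp;
    QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);

    WITH_QEMU_LOCK_GUARD(&s->tmf_bh_lock) {
        /* steal all queued requests while holding the lock */
        QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
            QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
            QTAILQ_INSERT_TAIL(&reqs, req, next);
        }
    }

    /* now process reqs without holding tmf_bh_lock */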
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231205182011.1976568-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/hw/virtio/virtio-scsi.h | 3 +-
hw/scsi/virtio-scsi.c | 62 ++++++++++++++++++++++-----------
2 files changed, 43 insertions(+), 22 deletions(-)
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d..da8cb928d9 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -85,8 +85,9 @@ struct VirtIOSCSI {
/*
* TMFs deferred to main loop BH. These fields are protected by
- * virtio_scsi_acquire().
+ * tmf_bh_lock.
*/
+ QemuMutex tmf_bh_lock;
QEMUBH *tmf_bh;
QTAILQ_HEAD(, VirtIOSCSIReq) tmf_bh_list;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 9c751bf296..4f8d35facc 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -123,6 +123,30 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
virtio_scsi_free_req(req);
}
+static void virtio_scsi_complete_req_bh(void *opaque)
+{
+ VirtIOSCSIReq *req = opaque;
+
+ virtio_scsi_complete_req(req);
+}
+
+/*
+ * Called from virtio_scsi_do_one_tmf_bh() in main loop thread. The main loop
+ * thread cannot touch the virtqueue since that could race with an IOThread.
+ */
+static void virtio_scsi_complete_req_from_main_loop(VirtIOSCSIReq *req)
+{
+ VirtIOSCSI *s = req->dev;
+
+ if (!s->ctx || s->ctx == qemu_get_aio_context()) {
+ /* No need to schedule a BH when there is no IOThread */
+ virtio_scsi_complete_req(req);
+ } else {
+ /* Run request completion in the IOThread */
+ aio_wait_bh_oneshot(s->ctx, virtio_scsi_complete_req_bh, req);
+ }
+}
+
static void virtio_scsi_bad_req(VirtIOSCSIReq *req)
{
virtio_error(VIRTIO_DEVICE(req->dev), "wrong size for virtio-scsi headers");
@@ -338,10 +362,7 @@ static void virtio_scsi_do_one_tmf_bh(VirtIOSCSIReq *req)
out:
object_unref(OBJECT(d));
-
- virtio_scsi_acquire(s);
- virtio_scsi_complete_req(req);
- virtio_scsi_release(s);
+ virtio_scsi_complete_req_from_main_loop(req);
}
/* Some TMFs must be processed from the main loop thread */
@@ -354,18 +375,16 @@ static void virtio_scsi_do_tmf_bh(void *opaque)
GLOBAL_STATE_CODE();
- virtio_scsi_acquire(s);
+ WITH_QEMU_LOCK_GUARD(&s->tmf_bh_lock) {
+ QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
+ QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
+ QTAILQ_INSERT_TAIL(&reqs, req, next);
+ }
- QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
- QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
- QTAILQ_INSERT_TAIL(&reqs, req, next);
+ qemu_bh_delete(s->tmf_bh);
+ s->tmf_bh = NULL;
}
- qemu_bh_delete(s->tmf_bh);
- s->tmf_bh = NULL;
-
- virtio_scsi_release(s);
-
QTAILQ_FOREACH_SAFE(req, &reqs, next, tmp) {
QTAILQ_REMOVE(&reqs, req, next);
virtio_scsi_do_one_tmf_bh(req);
@@ -379,8 +398,7 @@ static void virtio_scsi_reset_tmf_bh(VirtIOSCSI *s)
GLOBAL_STATE_CODE();
- virtio_scsi_acquire(s);
-
+ /* Called after ioeventfd has been stopped, so tmf_bh_lock is not needed */
if (s->tmf_bh) {
qemu_bh_delete(s->tmf_bh);
s->tmf_bh = NULL;
@@ -393,19 +411,19 @@ static void virtio_scsi_reset_tmf_bh(VirtIOSCSI *s)
req->resp.tmf.response = VIRTIO_SCSI_S_TARGET_FAILURE;
virtio_scsi_complete_req(req);
}
-
- virtio_scsi_release(s);
}
static void virtio_scsi_defer_tmf_to_bh(VirtIOSCSIReq *req)
{
VirtIOSCSI *s = req->dev;
- QTAILQ_INSERT_TAIL(&s->tmf_bh_list, req, next);
+ WITH_QEMU_LOCK_GUARD(&s->tmf_bh_lock) {
+ QTAILQ_INSERT_TAIL(&s->tmf_bh_list, req, next);
- if (!s->tmf_bh) {
- s->tmf_bh = qemu_bh_new(virtio_scsi_do_tmf_bh, s);
- qemu_bh_schedule(s->tmf_bh);
+ if (!s->tmf_bh) {
+ s->tmf_bh = qemu_bh_new(virtio_scsi_do_tmf_bh, s);
+ qemu_bh_schedule(s->tmf_bh);
+ }
}
}
@@ -1235,6 +1253,7 @@ static void virtio_scsi_device_realize(DeviceState *dev, Error **errp)
Error *err = NULL;
QTAILQ_INIT(&s->tmf_bh_list);
+ qemu_mutex_init(&s->tmf_bh_lock);
virtio_scsi_common_realize(dev,
virtio_scsi_handle_ctrl,
@@ -1277,6 +1296,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev)
qbus_set_hotplug_handler(BUS(&s->bus), NULL);
virtio_scsi_common_unrealize(dev);
+ qemu_mutex_destroy(&s->tmf_bh_lock);
}
static Property virtio_scsi_properties[] = {
--
2.43.0
* [PULL 16/33] scsi: assert that callbacks run in the correct AioContext
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (14 preceding siblings ...)
2023-12-21 21:23 ` [PULL 15/33] virtio-scsi: replace AioContext lock with tmf_bh_lock Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 17/33] tests: remove aio_context_acquire() tests Kevin Wolf
` (16 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Since the removal of AioContext locking, the correctness of the code
relies on running requests from a single AioContext at any given time.
Add assertions that verify that callbacks are invoked in the correct
AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231205182011.1976568-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/scsi/scsi-disk.c | 14 ++++++++++++++
system/dma-helpers.c | 3 +++
2 files changed, 17 insertions(+)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 2c1bbb3530..a5048e0aaf 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -273,6 +273,10 @@ static void scsi_aio_complete(void *opaque, int ret)
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+ /* The request must only run in the BlockBackend's AioContext */
+ assert(blk_get_aio_context(s->qdev.conf.blk) ==
+ qemu_get_current_aio_context());
+
assert(r->req.aiocb != NULL);
r->req.aiocb = NULL;
@@ -370,8 +374,13 @@ static void scsi_dma_complete(void *opaque, int ret)
static void scsi_read_complete_noio(SCSIDiskReq *r, int ret)
{
+ SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
uint32_t n;
+ /* The request must only run in the BlockBackend's AioContext */
+ assert(blk_get_aio_context(s->qdev.conf.blk) ==
+ qemu_get_current_aio_context());
+
assert(r->req.aiocb == NULL);
if (scsi_disk_req_check_error(r, ret, false)) {
goto done;
@@ -496,8 +505,13 @@ static void scsi_read_data(SCSIRequest *req)
static void scsi_write_complete_noio(SCSIDiskReq *r, int ret)
{
+ SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
uint32_t n;
+ /* The request must only run in the BlockBackend's AioContext */
+ assert(blk_get_aio_context(s->qdev.conf.blk) ==
+ qemu_get_current_aio_context());
+
assert (r->req.aiocb == NULL);
if (scsi_disk_req_check_error(r, ret, false)) {
goto done;
diff --git a/system/dma-helpers.c b/system/dma-helpers.c
index 528117f256..9b221cf94e 100644
--- a/system/dma-helpers.c
+++ b/system/dma-helpers.c
@@ -119,6 +119,9 @@ static void dma_blk_cb(void *opaque, int ret)
trace_dma_blk_cb(dbs, ret);
+ /* DMAAIOCB is not thread-safe and must be accessed only from dbs->ctx */
+ assert(ctx == qemu_get_current_aio_context());
+
dbs->acb = NULL;
dbs->offset += dbs->iov.size;
--
2.43.0
* [PULL 17/33] tests: remove aio_context_acquire() tests
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (15 preceding siblings ...)
2023-12-21 21:23 ` [PULL 16/33] scsi: assert that callbacks run in the correct AioContext Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 18/33] aio: make aio_context_acquire()/aio_context_release() a no-op Kevin Wolf
` (15 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The aio_context_acquire() API is being removed. Drop the test case that
calls the API.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231205182011.1976568-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
tests/unit/test-aio.c | 67 +------------------------------------------
1 file changed, 1 insertion(+), 66 deletions(-)
diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c
index 337b6e4ea7..e77d86be87 100644
--- a/tests/unit/test-aio.c
+++ b/tests/unit/test-aio.c
@@ -100,76 +100,12 @@ static void event_ready_cb(EventNotifier *e)
/* Tests using aio_*. */
-typedef struct {
- QemuMutex start_lock;
- EventNotifier notifier;
- bool thread_acquired;
-} AcquireTestData;
-
-static void *test_acquire_thread(void *opaque)
-{
- AcquireTestData *data = opaque;
-
- /* Wait for other thread to let us start */
- qemu_mutex_lock(&data->start_lock);
- qemu_mutex_unlock(&data->start_lock);
-
- /* event_notifier_set might be called either before or after
- * the main thread's call to poll(). The test case's outcome
- * should be the same in either case.
- */
- event_notifier_set(&data->notifier);
- aio_context_acquire(ctx);
- aio_context_release(ctx);
-
- data->thread_acquired = true; /* success, we got here */
-
- return NULL;
-}
-
static void set_event_notifier(AioContext *nctx, EventNotifier *notifier,
EventNotifierHandler *handler)
{
aio_set_event_notifier(nctx, notifier, handler, NULL, NULL);
}
-static void dummy_notifier_read(EventNotifier *n)
-{
- event_notifier_test_and_clear(n);
-}
-
-static void test_acquire(void)
-{
- QemuThread thread;
- AcquireTestData data;
-
- /* Dummy event notifier ensures aio_poll() will block */
- event_notifier_init(&data.notifier, false);
- set_event_notifier(ctx, &data.notifier, dummy_notifier_read);
- g_assert(!aio_poll(ctx, false)); /* consume aio_notify() */
-
- qemu_mutex_init(&data.start_lock);
- qemu_mutex_lock(&data.start_lock);
- data.thread_acquired = false;
-
- qemu_thread_create(&thread, "test_acquire_thread",
- test_acquire_thread,
- &data, QEMU_THREAD_JOINABLE);
-
- /* Block in aio_poll(), let other thread kick us and acquire context */
- aio_context_acquire(ctx);
- qemu_mutex_unlock(&data.start_lock); /* let the thread run */
- g_assert(aio_poll(ctx, true));
- g_assert(!data.thread_acquired);
- aio_context_release(ctx);
-
- qemu_thread_join(&thread);
- set_event_notifier(ctx, &data.notifier, NULL);
- event_notifier_cleanup(&data.notifier);
-
- g_assert(data.thread_acquired);
-}
-
static void test_bh_schedule(void)
{
BHTestData data = { .n = 0 };
@@ -879,7 +815,7 @@ static void test_worker_thread_co_enter(void)
qemu_thread_get_self(&this_thread);
co = qemu_coroutine_create(co_check_current_thread, &this_thread);
- qemu_thread_create(&worker_thread, "test_acquire_thread",
+ qemu_thread_create(&worker_thread, "test_aio_co_enter",
test_aio_co_enter,
co, QEMU_THREAD_JOINABLE);
@@ -899,7 +835,6 @@ int main(int argc, char **argv)
while (g_main_context_iteration(NULL, false));
g_test_init(&argc, &argv, NULL);
- g_test_add_func("/aio/acquire", test_acquire);
g_test_add_func("/aio/bh/schedule", test_bh_schedule);
g_test_add_func("/aio/bh/schedule10", test_bh_schedule10);
g_test_add_func("/aio/bh/cancel", test_bh_cancel);
--
2.43.0
* [PULL 18/33] aio: make aio_context_acquire()/aio_context_release() a no-op
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (16 preceding siblings ...)
2023-12-21 21:23 ` [PULL 17/33] tests: remove aio_context_acquire() tests Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 19/33] graph-lock: remove AioContext locking Kevin Wolf
` (14 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
aio_context_acquire()/aio_context_release() has been replaced by
fine-grained locking to protect state shared by multiple threads. The
AioContext lock still plays the role of balancing locking in
AIO_WAIT_WHILE(), and many functions in QEMU require that the
AioContext lock is either held or not held purely for this reason. In
other words, the AioContext lock now exists only for consistency with
itself and serves no real purpose anymore.
Stop actually acquiring/releasing the lock in
aio_context_acquire()/aio_context_release() so that subsequent patches
can remove callers across the codebase incrementally.
I have performed "make check" and qemu-iotests stress tests across
x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
result of eliminating the lock.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231205182011.1976568-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
util/async.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/util/async.c b/util/async.c
index 8f90ddc304..04ee83d220 100644
--- a/util/async.c
+++ b/util/async.c
@@ -725,12 +725,12 @@ void aio_context_unref(AioContext *ctx)
void aio_context_acquire(AioContext *ctx)
{
- qemu_rec_mutex_lock(&ctx->lock);
+ /* TODO remove this function */
}
void aio_context_release(AioContext *ctx)
{
- qemu_rec_mutex_unlock(&ctx->lock);
+ /* TODO remove this function */
}
QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
--
2.43.0
* [PULL 19/33] graph-lock: remove AioContext locking
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (17 preceding siblings ...)
2023-12-21 21:23 ` [PULL 18/33] aio: make aio_context_acquire()/aio_context_release() a no-op Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 20/33] block: " Kevin Wolf
` (13 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Stop acquiring/releasing the AioContext lock in
bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any
effect.
The distinction between bdrv_graph_wrunlock() and
bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed
into one function.
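After this change the write-lock call pattern is uniform regardless of
which AioContext a node runs in (sketch of the typical caller, as seen
throughout the diff below):

    bdrv_graph_wrlock();
    /* ... modify the block graph, e.g. bdrv_unref_child(bs, child) ... */
    bdrv_graph_wrunlock();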
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231205182011.1976568-6-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/graph-lock.h | 21 ++-----------
block.c | 50 +++++++++++++++---------------
block/backup.c | 4 +--
block/blklogwrites.c | 8 ++---
block/blkverify.c | 4 +--
block/block-backend.c | 11 +++----
block/commit.c | 16 +++++-----
block/graph-lock.c | 44 ++------------------------
block/mirror.c | 22 ++++++-------
block/qcow2.c | 4 +--
block/quorum.c | 8 ++---
block/replication.c | 14 ++++-----
block/snapshot.c | 4 +--
block/stream.c | 12 +++----
block/vmdk.c | 20 ++++++------
blockdev.c | 8 ++---
blockjob.c | 12 +++----
tests/unit/test-bdrv-drain.c | 40 ++++++++++++------------
tests/unit/test-bdrv-graph-mod.c | 20 ++++++------
scripts/block-coroutine-wrapper.py | 4 +--
20 files changed, 133 insertions(+), 193 deletions(-)
diff --git a/include/block/graph-lock.h b/include/block/graph-lock.h
index 22b5db1ed9..d7545e82d0 100644
--- a/include/block/graph-lock.h
+++ b/include/block/graph-lock.h
@@ -110,34 +110,17 @@ void unregister_aiocontext(AioContext *ctx);
*
* The wrlock can only be taken from the main loop, with BQL held, as only the
* main loop is allowed to modify the graph.
- *
- * If @bs is non-NULL, its AioContext is temporarily released.
- *
- * This function polls. Callers must not hold the lock of any AioContext other
- * than the current one and the one of @bs.
*/
void no_coroutine_fn TSA_ACQUIRE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrlock(BlockDriverState *bs);
+bdrv_graph_wrlock(void);
/*
* bdrv_graph_wrunlock:
* Write finished, reset global has_writer to 0 and restart
* all readers that are waiting.
- *
- * If @bs is non-NULL, its AioContext is temporarily released.
- */
-void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrunlock(BlockDriverState *bs);
-
-/*
- * bdrv_graph_wrunlock_ctx:
- * Write finished, reset global has_writer to 0 and restart
- * all readers that are waiting.
- *
- * If @ctx is non-NULL, its lock is temporarily released.
*/
void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrunlock_ctx(AioContext *ctx);
+bdrv_graph_wrunlock(void);
/*
* bdrv_graph_co_rdlock:
diff --git a/block.c b/block.c
index bfb0861ec6..25e1ebc606 100644
--- a/block.c
+++ b/block.c
@@ -1708,12 +1708,12 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
open_failed:
bs->drv = NULL;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
if (bs->file != NULL) {
bdrv_unref_child(bs, bs->file);
assert(!bs->file);
}
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
g_free(bs->opaque);
bs->opaque = NULL;
@@ -3575,9 +3575,9 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
bdrv_ref(drain_bs);
bdrv_drained_begin(drain_bs);
- bdrv_graph_wrlock(backing_hd);
+ bdrv_graph_wrlock();
ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
- bdrv_graph_wrunlock(backing_hd);
+ bdrv_graph_wrunlock();
bdrv_drained_end(drain_bs);
bdrv_unref(drain_bs);
@@ -3790,13 +3790,13 @@ BdrvChild *bdrv_open_child(const char *filename,
return NULL;
}
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
errp);
aio_context_release(ctx);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
return child;
}
@@ -4650,9 +4650,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
aio_context_release(ctx);
}
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
tran_commit(tran);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
BlockDriverState *bs = bs_entry->state.bs;
@@ -4669,9 +4669,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
goto cleanup;
abort:
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
tran_abort(tran);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
if (bs_entry->prepared) {
@@ -4852,12 +4852,12 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
}
bdrv_graph_rdunlock_main_loop();
- bdrv_graph_wrlock(new_child_bs);
+ bdrv_graph_wrlock();
ret = bdrv_set_file_or_backing_noperm(bs, new_child_bs, is_backing,
tran, errp);
- bdrv_graph_wrunlock_ctx(ctx);
+ bdrv_graph_wrunlock();
if (old_ctx != ctx) {
aio_context_release(ctx);
@@ -5209,14 +5209,14 @@ static void bdrv_close(BlockDriverState *bs)
bs->drv = NULL;
}
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
bdrv_unref_child(bs, child);
}
assert(!bs->backing);
assert(!bs->file);
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
g_free(bs->opaque);
bs->opaque = NULL;
@@ -5509,9 +5509,9 @@ int bdrv_drop_filter(BlockDriverState *bs, Error **errp)
bdrv_graph_rdunlock_main_loop();
bdrv_drained_begin(child_bs);
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
ret = bdrv_replace_node_common(bs, child_bs, true, true, errp);
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(child_bs);
return ret;
@@ -5561,7 +5561,7 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
aio_context_acquire(old_context);
new_context = NULL;
- bdrv_graph_wrlock(bs_top);
+ bdrv_graph_wrlock();
child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
&child_of_bds, bdrv_backing_role(bs_new),
@@ -5593,7 +5593,7 @@ out:
tran_finalize(tran, ret);
bdrv_refresh_limits(bs_top, NULL, NULL);
- bdrv_graph_wrunlock(bs_top);
+ bdrv_graph_wrunlock();
bdrv_drained_end(bs_top);
bdrv_drained_end(bs_new);
@@ -5620,7 +5620,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
bdrv_ref(old_bs);
bdrv_drained_begin(old_bs);
bdrv_drained_begin(new_bs);
- bdrv_graph_wrlock(new_bs);
+ bdrv_graph_wrlock();
bdrv_replace_child_tran(child, new_bs, tran);
@@ -5631,7 +5631,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
tran_finalize(tran, ret);
- bdrv_graph_wrunlock(new_bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(old_bs);
bdrv_drained_end(new_bs);
bdrv_unref(old_bs);
@@ -5718,9 +5718,9 @@ BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *options,
bdrv_ref(bs);
bdrv_drained_begin(bs);
bdrv_drained_begin(new_node_bs);
- bdrv_graph_wrlock(new_node_bs);
+ bdrv_graph_wrlock();
ret = bdrv_replace_node(bs, new_node_bs, errp);
- bdrv_graph_wrunlock(new_node_bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(new_node_bs);
bdrv_drained_end(bs);
bdrv_unref(bs);
@@ -5975,7 +5975,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
bdrv_ref(top);
bdrv_drained_begin(base);
- bdrv_graph_wrlock(base);
+ bdrv_graph_wrlock();
if (!top->drv || !base->drv) {
goto exit_wrlock;
@@ -6015,7 +6015,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
* That's a FIXME.
*/
bdrv_replace_node_common(top, base, false, false, &local_err);
- bdrv_graph_wrunlock(base);
+ bdrv_graph_wrunlock();
if (local_err) {
error_report_err(local_err);
@@ -6052,7 +6052,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
goto exit;
exit_wrlock:
- bdrv_graph_wrunlock(base);
+ bdrv_graph_wrunlock();
exit:
bdrv_drained_end(base);
bdrv_unref(top);
diff --git a/block/backup.c b/block/backup.c
index 8aae5836d7..ec29d6b810 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -496,10 +496,10 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
block_copy_set_speed(bcs, speed);
/* Required permissions are taken by copy-before-write filter target */
- bdrv_graph_wrlock(target);
+ bdrv_graph_wrlock();
block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
&error_abort);
- bdrv_graph_wrunlock(target);
+ bdrv_graph_wrunlock();
return &job->common;
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 3678f6cf42..7207b2e757 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -251,9 +251,9 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
ret = 0;
fail_log:
if (ret < 0) {
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, s->log_file);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
s->log_file = NULL;
}
fail:
@@ -265,10 +265,10 @@ static void blk_log_writes_close(BlockDriverState *bs)
{
BDRVBlkLogWritesState *s = bs->opaque;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, s->log_file);
s->log_file = NULL;
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
}
static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/blkverify.c b/block/blkverify.c
index 9b17c46644..ec45d8335e 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -151,10 +151,10 @@ static void blkverify_close(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, s->test_file);
s->test_file = NULL;
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
}
static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/block-backend.c b/block/block-backend.c
index ec21148806..abac4e0235 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -889,7 +889,6 @@ void blk_remove_bs(BlockBackend *blk)
{
ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
BdrvChild *root;
- AioContext *ctx;
GLOBAL_STATE_CODE();
@@ -919,10 +918,9 @@ void blk_remove_bs(BlockBackend *blk)
root = blk->root;
blk->root = NULL;
- ctx = bdrv_get_aio_context(root->bs);
- bdrv_graph_wrlock(root->bs);
+ bdrv_graph_wrlock();
bdrv_root_unref_child(root);
- bdrv_graph_wrunlock_ctx(ctx);
+ bdrv_graph_wrunlock();
}
/*
@@ -933,16 +931,15 @@ void blk_remove_bs(BlockBackend *blk)
int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
{
ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
- AioContext *ctx = bdrv_get_aio_context(bs);
GLOBAL_STATE_CODE();
bdrv_ref(bs);
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
blk->root = bdrv_root_attach_child(bs, "root", &child_root,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
blk->perm, blk->shared_perm,
blk, errp);
- bdrv_graph_wrunlock_ctx(ctx);
+ bdrv_graph_wrunlock();
if (blk->root == NULL) {
return -EPERM;
}
diff --git a/block/commit.c b/block/commit.c
index 69cc75be0c..1dd7a65ffb 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -100,9 +100,9 @@ static void commit_abort(Job *job)
bdrv_graph_rdunlock_main_loop();
bdrv_drained_begin(commit_top_backing_bs);
- bdrv_graph_wrlock(commit_top_backing_bs);
+ bdrv_graph_wrlock();
bdrv_replace_node(s->commit_top_bs, commit_top_backing_bs, &error_abort);
- bdrv_graph_wrunlock(commit_top_backing_bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(commit_top_backing_bs);
bdrv_unref(s->commit_top_bs);
@@ -339,7 +339,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
* this is the responsibility of the interface (i.e. whoever calls
* commit_start()).
*/
- bdrv_graph_wrlock(top);
+ bdrv_graph_wrlock();
s->base_overlay = bdrv_find_overlay(top, base);
assert(s->base_overlay);
@@ -370,19 +370,19 @@ void commit_start(const char *job_id, BlockDriverState *bs,
ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
iter_shared_perms, errp);
if (ret < 0) {
- bdrv_graph_wrunlock(top);
+ bdrv_graph_wrunlock();
goto fail;
}
}
if (bdrv_freeze_backing_chain(commit_top_bs, base, errp) < 0) {
- bdrv_graph_wrunlock(top);
+ bdrv_graph_wrunlock();
goto fail;
}
s->chain_frozen = true;
ret = block_job_add_bdrv(&s->common, "base", base, 0, BLK_PERM_ALL, errp);
- bdrv_graph_wrunlock(top);
+ bdrv_graph_wrunlock();
if (ret < 0) {
goto fail;
@@ -434,9 +434,9 @@ fail:
* otherwise this would fail because of lack of permissions. */
if (commit_top_bs) {
bdrv_drained_begin(top);
- bdrv_graph_wrlock(top);
+ bdrv_graph_wrlock();
bdrv_replace_node(commit_top_bs, top, &error_abort);
- bdrv_graph_wrunlock(top);
+ bdrv_graph_wrunlock();
bdrv_drained_end(top);
}
}
diff --git a/block/graph-lock.c b/block/graph-lock.c
index 079e878d9b..c81162b147 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -106,27 +106,12 @@ static uint32_t reader_count(void)
return rd;
}
-void no_coroutine_fn bdrv_graph_wrlock(BlockDriverState *bs)
+void no_coroutine_fn bdrv_graph_wrlock(void)
{
- AioContext *ctx = NULL;
-
GLOBAL_STATE_CODE();
assert(!qatomic_read(&has_writer));
assert(!qemu_in_coroutine());
- /*
- * Release only non-mainloop AioContext. The mainloop often relies on the
- * BQL and doesn't lock the main AioContext before doing things.
- */
- if (bs) {
- ctx = bdrv_get_aio_context(bs);
- if (ctx != qemu_get_aio_context()) {
- aio_context_release(ctx);
- } else {
- ctx = NULL;
- }
- }
-
/* Make sure that constantly arriving new I/O doesn't cause starvation */
bdrv_drain_all_begin_nopoll();
@@ -155,27 +140,13 @@ void no_coroutine_fn bdrv_graph_wrlock(BlockDriverState *bs)
} while (reader_count() >= 1);
bdrv_drain_all_end();
-
- if (ctx) {
- aio_context_acquire(bdrv_get_aio_context(bs));
- }
}
-void no_coroutine_fn bdrv_graph_wrunlock_ctx(AioContext *ctx)
+void no_coroutine_fn bdrv_graph_wrunlock(void)
{
GLOBAL_STATE_CODE();
assert(qatomic_read(&has_writer));
- /*
- * Release only non-mainloop AioContext. The mainloop often relies on the
- * BQL and doesn't lock the main AioContext before doing things.
- */
- if (ctx && ctx != qemu_get_aio_context()) {
- aio_context_release(ctx);
- } else {
- ctx = NULL;
- }
-
WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
/*
* No need for memory barriers, this works in pair with
@@ -197,17 +168,6 @@ void no_coroutine_fn bdrv_graph_wrunlock_ctx(AioContext *ctx)
* progress.
*/
aio_bh_poll(qemu_get_aio_context());
-
- if (ctx) {
- aio_context_acquire(ctx);
- }
-}
-
-void no_coroutine_fn bdrv_graph_wrunlock(BlockDriverState *bs)
-{
- AioContext *ctx = bs ? bdrv_get_aio_context(bs) : NULL;
-
- bdrv_graph_wrunlock_ctx(ctx);
}
void coroutine_fn bdrv_graph_co_rdlock(void)
diff --git a/block/mirror.c b/block/mirror.c
index cd9d3ad4a8..51f9e2f17c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -764,7 +764,7 @@ static int mirror_exit_common(Job *job)
* check for an op blocker on @to_replace, and we have our own
* there.
*/
- bdrv_graph_wrlock(target_bs);
+ bdrv_graph_wrlock();
if (bdrv_recurse_can_replace(src, to_replace)) {
bdrv_replace_node(to_replace, target_bs, &local_err);
} else {
@@ -773,7 +773,7 @@ static int mirror_exit_common(Job *job)
"would not lead to an abrupt change of visible data",
to_replace->node_name, target_bs->node_name);
}
- bdrv_graph_wrunlock(target_bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(to_replace);
if (local_err) {
error_report_err(local_err);
@@ -796,9 +796,9 @@ static int mirror_exit_common(Job *job)
* valid.
*/
block_job_remove_all_bdrv(bjob);
- bdrv_graph_wrlock(mirror_top_bs);
+ bdrv_graph_wrlock();
bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
- bdrv_graph_wrunlock(mirror_top_bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(target_bs);
bdrv_unref(target_bs);
@@ -1914,13 +1914,13 @@ static BlockJob *mirror_start_job(
*/
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
BLK_PERM_CONSISTENT_READ,
errp);
if (ret < 0) {
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
goto fail;
}
@@ -1965,17 +1965,17 @@ static BlockJob *mirror_start_job(
ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
iter_shared_perms, errp);
if (ret < 0) {
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
goto fail;
}
}
if (bdrv_freeze_backing_chain(mirror_top_bs, target, errp) < 0) {
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
goto fail;
}
}
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
QTAILQ_INIT(&s->ops_in_flight);
@@ -2001,12 +2001,12 @@ fail:
bs_opaque->stop = true;
bdrv_drained_begin(bs);
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
assert(mirror_top_bs->backing->bs == bs);
bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
&error_abort);
bdrv_replace_node(mirror_top_bs, bs, &error_abort);
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(bs);
bdrv_unref(mirror_top_bs);
diff --git a/block/qcow2.c b/block/qcow2.c
index 13e032bd5e..9bee66fff5 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2807,9 +2807,9 @@ qcow2_do_close(BlockDriverState *bs, bool close_data_file)
if (close_data_file && has_data_file(bs)) {
GLOBAL_STATE_CODE();
bdrv_graph_rdunlock_main_loop();
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, s->data_file);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
s->data_file = NULL;
bdrv_graph_rdlock_main_loop();
}
diff --git a/block/quorum.c b/block/quorum.c
index 505b8b3e18..db8fe891c4 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1037,14 +1037,14 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
close_exit:
/* cleanup on error */
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
for (i = 0; i < s->num_children; i++) {
if (!opened[i]) {
continue;
}
bdrv_unref_child(bs, s->children[i]);
}
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
g_free(s->children);
g_free(opened);
exit:
@@ -1057,11 +1057,11 @@ static void quorum_close(BlockDriverState *bs)
BDRVQuorumState *s = bs->opaque;
int i;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
for (i = 0; i < s->num_children; i++) {
bdrv_unref_child(bs, s->children[i]);
}
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
g_free(s->children);
}
diff --git a/block/replication.c b/block/replication.c
index 5ded5f1ca9..424b537ff7 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -560,7 +560,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
return;
}
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
bdrv_ref(hidden_disk->bs);
s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
@@ -568,7 +568,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
aio_context_release(aio_context);
return;
}
@@ -579,7 +579,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
BDRV_CHILD_DATA, &local_err);
if (local_err) {
error_propagate(errp, local_err);
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
aio_context_release(aio_context);
return;
}
@@ -592,7 +592,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (!top_bs || !bdrv_is_root_node(top_bs) ||
!check_top_bs(top_bs, bs)) {
error_setg(errp, "No top_bs or it is invalid");
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
reopen_backing_file(bs, false, NULL);
aio_context_release(aio_context);
return;
@@ -600,7 +600,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
bdrv_op_block_all(top_bs, s->blocker);
bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
s->backup_job = backup_job_create(
NULL, s->secondary_disk->bs, s->hidden_disk->bs,
@@ -691,12 +691,12 @@ static void replication_done(void *opaque, int ret)
if (ret == 0) {
s->stage = BLOCK_REPLICATION_DONE;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, s->secondary_disk);
s->secondary_disk = NULL;
bdrv_unref_child(bs, s->hidden_disk);
s->hidden_disk = NULL;
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
s->error = 0;
} else {
diff --git a/block/snapshot.c b/block/snapshot.c
index c4d40e80dd..6fd720aeff 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -292,9 +292,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
}
/* .bdrv_open() will re-attach it */
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, fallback);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
diff --git a/block/stream.c b/block/stream.c
index 01fe7c0f16..048c2d282f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -99,9 +99,9 @@ static int stream_prepare(Job *job)
}
}
- bdrv_graph_wrlock(s->target_bs);
+ bdrv_graph_wrlock();
bdrv_set_backing_hd_drained(unfiltered_bs, base, &local_err);
- bdrv_graph_wrunlock(s->target_bs);
+ bdrv_graph_wrunlock();
/*
* This call will do I/O, so the graph can change again from here on.
@@ -366,10 +366,10 @@ void stream_start(const char *job_id, BlockDriverState *bs,
* already have our own plans. Also don't allow resize as the image size is
* queried only at the job start and then cached.
*/
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
if (block_job_add_bdrv(&s->common, "active node", bs, 0,
basic_flags | BLK_PERM_WRITE, errp)) {
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
goto fail;
}
@@ -389,11 +389,11 @@ void stream_start(const char *job_id, BlockDriverState *bs,
ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
basic_flags, errp);
if (ret < 0) {
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
goto fail;
}
}
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
s->base_overlay = base_overlay;
s->above_base = above_base;
diff --git a/block/vmdk.c b/block/vmdk.c
index d6971c7067..bf78e12383 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -272,7 +272,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
BDRVVmdkState *s = bs->opaque;
VmdkExtent *e;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
for (i = 0; i < s->num_extents; i++) {
e = &s->extents[i];
g_free(e->l1_table);
@@ -283,7 +283,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
bdrv_unref_child(bs, e->file);
}
}
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
g_free(s->extents);
}
@@ -1247,9 +1247,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
0, 0, 0, 0, 0, &extent, errp);
if (ret < 0) {
bdrv_graph_rdunlock_main_loop();
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1266,9 +1266,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
g_free(buf);
if (ret) {
bdrv_graph_rdunlock_main_loop();
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1277,9 +1277,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
if (ret) {
bdrv_graph_rdunlock_main_loop();
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1287,9 +1287,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
} else {
error_setg(errp, "Unsupported extent type '%s'", type);
bdrv_graph_rdunlock_main_loop();
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
bdrv_graph_rdlock_main_loop();
ret = -ENOTSUP;
goto out;
diff --git a/blockdev.c b/blockdev.c
index c91f49e7b6..9e1381169d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1611,9 +1611,9 @@ static void external_snapshot_abort(void *opaque)
}
bdrv_drained_begin(state->new_bs);
- bdrv_graph_wrlock(state->old_bs);
+ bdrv_graph_wrlock();
bdrv_replace_node(state->new_bs, state->old_bs, &error_abort);
- bdrv_graph_wrunlock(state->old_bs);
+ bdrv_graph_wrunlock();
bdrv_drained_end(state->new_bs);
bdrv_unref(state->old_bs); /* bdrv_replace_node() ref'ed old_bs */
@@ -3657,7 +3657,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
BlockDriverState *parent_bs, *new_bs = NULL;
BdrvChild *p_child;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
parent_bs = bdrv_lookup_bs(parent, parent, errp);
if (!parent_bs) {
@@ -3693,7 +3693,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
}
out:
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
}
BlockJobInfoList *qmp_query_block_jobs(Error **errp)
diff --git a/blockjob.c b/blockjob.c
index b7a29052b9..7310412313 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -199,7 +199,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
* to process an already freed BdrvChild.
*/
aio_context_release(job->job.aio_context);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
aio_context_acquire(job->job.aio_context);
while (job->nodes) {
GSList *l = job->nodes;
@@ -212,7 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
g_slist_free_1(l);
}
- bdrv_graph_wrunlock_ctx(job->job.aio_context);
+ bdrv_graph_wrunlock();
}
bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
@@ -514,7 +514,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
int ret;
GLOBAL_STATE_CODE();
- bdrv_graph_wrlock(bs);
+ bdrv_graph_wrlock();
if (job_id == NULL && !(flags & JOB_INTERNAL)) {
job_id = bdrv_get_device_name(bs);
@@ -523,7 +523,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
job = job_create(job_id, &driver->job_driver, txn, bdrv_get_aio_context(bs),
flags, cb, opaque, errp);
if (job == NULL) {
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
return NULL;
}
@@ -563,11 +563,11 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
goto fail;
}
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
return job;
fail:
- bdrv_graph_wrunlock(bs);
+ bdrv_graph_wrunlock();
job_early_fail(&job->job);
return NULL;
}
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 704d1a3f36..d9754dfebc 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -807,9 +807,9 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
tjob->bs = src;
job = &tjob->common;
- bdrv_graph_wrlock(target);
+ bdrv_graph_wrlock();
block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
- bdrv_graph_wrunlock(target);
+ bdrv_graph_wrunlock();
switch (result) {
case TEST_JOB_SUCCESS:
@@ -991,11 +991,11 @@ static void bdrv_test_top_close(BlockDriverState *bs)
{
BdrvChild *c, *next_c;
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
bdrv_unref_child(bs, c);
}
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
}
static int coroutine_fn GRAPH_RDLOCK
@@ -1085,10 +1085,10 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
/* This child will be the one to pass to requests through to, and
* it will stall until a drain occurs */
@@ -1096,21 +1096,21 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
&error_abort);
child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
/* Takes our reference to child_bs */
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child",
&child_of_bds,
BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
&error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
/* This child is just there to be deleted
* (for detach_instead_of_delete == true) */
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
blk_insert_bs(blk, bs, &error_abort);
@@ -1193,14 +1193,14 @@ static void no_coroutine_fn detach_indirect_bh(void *opaque)
bdrv_dec_in_flight(data->child_b->bs);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_unref_child(data->parent_b, data->child_b);
bdrv_ref(data->c);
data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
&child_of_bds, BDRV_CHILD_DATA,
&error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
}
static void coroutine_mixed_fn detach_by_parent_aio_cb(void *opaque, int ret)
@@ -1298,7 +1298,7 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
/* Set child relationships */
bdrv_ref(b);
bdrv_ref(a);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_of_bds,
@@ -1308,7 +1308,7 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
bdrv_attach_child(parent_a, a, "PA-A",
by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
BDRV_CHILD_DATA, &error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
g_assert_cmpint(parent_a->refcnt, ==, 1);
g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1727,7 +1727,7 @@ static void test_drop_intermediate_poll(void)
* Establish the chain last, so the chain links are the first
* elements in the BDS.parents lists
*/
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
for (i = 0; i < 3; i++) {
if (i) {
/* Takes the reference to chain[i - 1] */
@@ -1735,7 +1735,7 @@ static void test_drop_intermediate_poll(void)
&chain_child_class, BDRV_CHILD_COW, &error_abort);
}
}
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
job = block_job_create("job", &test_simple_job_driver, NULL, job_node,
0, BLK_PERM_ALL, 0, 0, NULL, NULL, &error_abort);
@@ -1982,10 +1982,10 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
new_child_bs->total_sectors = 1;
bdrv_ref(old_child_bs);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
BDRV_CHILD_COW, &error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
parent_s->setup_completed = true;
for (i = 0; i < old_drain_count; i++) {
@@ -2016,9 +2016,9 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
g_assert(parent_bs->quiesce_counter == old_drain_count);
bdrv_drained_begin(old_child_bs);
bdrv_drained_begin(new_child_bs);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_replace_node(old_child_bs, new_child_bs, &error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
bdrv_drained_end(new_child_bs);
bdrv_drained_end(old_child_bs);
g_assert(parent_bs->quiesce_counter == new_drain_count);
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index 074adcbb93..8ee6ef38d8 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -137,10 +137,10 @@ static void test_update_perm_tree(void)
blk_insert_bs(root, bs, &error_abort);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_attach_child(filter, bs, "child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
aio_context_acquire(qemu_get_aio_context());
ret = bdrv_append(filter, bs, NULL);
@@ -206,11 +206,11 @@ static void test_should_update_child(void)
bdrv_set_backing_hd(target, bs, &error_abort);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
g_assert(target->backing->bs == bs);
bdrv_attach_child(filter, target, "target", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
aio_context_acquire(qemu_get_aio_context());
bdrv_append(filter, bs, &error_abort);
aio_context_release(qemu_get_aio_context());
@@ -248,7 +248,7 @@ static void test_parallel_exclusive_write(void)
bdrv_ref(base);
bdrv_ref(fl1);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_attach_child(top, fl1, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
@@ -260,7 +260,7 @@ static void test_parallel_exclusive_write(void)
&error_abort);
bdrv_replace_node(fl1, fl2, &error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
bdrv_drained_end(fl2);
bdrv_drained_end(fl1);
@@ -367,7 +367,7 @@ static void test_parallel_perm_update(void)
*/
bdrv_ref(base);
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_attach_child(top, ws, "file", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
c_fl1 = bdrv_attach_child(ws, fl1, "first", &child_of_bds,
@@ -380,7 +380,7 @@ static void test_parallel_perm_update(void)
bdrv_attach_child(fl2, base, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
/* Select fl1 as first child to be active */
s->selected = c_fl1;
@@ -434,11 +434,11 @@ static void test_append_greedy_filter(void)
BlockDriverState *base = no_perm_node("base");
BlockDriverState *fl = exclusive_writer_node("fl1");
- bdrv_graph_wrlock(NULL);
+ bdrv_graph_wrlock();
bdrv_attach_child(top, base, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
- bdrv_graph_wrunlock(NULL);
+ bdrv_graph_wrunlock();
aio_context_acquire(qemu_get_aio_context());
bdrv_append(fl, base, &error_abort);
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index a38e5833fb..38364fa557 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -261,8 +261,8 @@ def gen_no_co_wrapper(func: FuncDecl) -> str:
graph_lock=' bdrv_graph_rdlock_main_loop();'
graph_unlock=' bdrv_graph_rdunlock_main_loop();'
elif func.graph_wrlock:
- graph_lock=' bdrv_graph_wrlock(NULL);'
- graph_unlock=' bdrv_graph_wrunlock(NULL);'
+ graph_lock=' bdrv_graph_wrlock();'
+ graph_unlock=' bdrv_graph_wrunlock();'
return f"""\
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 57+ messages in thread
* [PULL 20/33] block: remove AioContext locking
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (18 preceding siblings ...)
2023-12-21 21:23 ` [PULL 19/33] graph-lock: remove AioContext locking Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 21/33] block: remove bdrv_co_lock() Kevin Wolf
` (12 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
This is the big patch that removes
aio_context_acquire()/aio_context_release() from the block layer and
affected block layer users.
There isn't a clean way to split this patch, and the reviewers are likely
to be the same group of people, so I decided to do it in one patch.
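An editor's sketch of the caller-side effect (assumption: global-state
code running under the BQL; bdrv_get_aio_context(),
aio_context_acquire()/aio_context_release() and bdrv_flush() are the
real APIs touched by this patch, while the surrounding fragment is
illustrative only):

    /* Before this patch: wrap block layer calls in the node's lock */
    AioContext *ctx = bdrv_get_aio_context(bs);
    aio_context_acquire(ctx);
    ret = bdrv_flush(bs);
    aio_context_release(ctx);

    /* After this patch: call directly; the BQL plus the thread-safe
     * I/O code path make the per-node AioContext lock unnecessary. */
    ret = bdrv_flush(bs);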
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-ID: <20231205182011.1976568-7-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block-global-state.h | 9 +-
include/block/block-io.h | 3 +-
include/block/snapshot.h | 2 -
block.c | 234 +---------------------
block/block-backend.c | 14 --
block/copy-before-write.c | 22 +--
block/export/export.c | 22 +--
block/io.c | 45 +----
block/mirror.c | 19 --
block/monitor/bitmap-qmp-cmds.c | 20 +-
block/monitor/block-hmp-cmds.c | 29 ---
block/qapi-sysemu.c | 27 +--
block/qapi.c | 18 +-
block/raw-format.c | 5 -
block/replication.c | 58 +-----
block/snapshot.c | 22 +--
block/write-threshold.c | 6 -
blockdev.c | 307 +++++------------------------
blockjob.c | 18 --
hw/block/dataplane/virtio-blk.c | 10 -
hw/block/dataplane/xen-block.c | 17 +-
hw/block/virtio-blk.c | 13 --
hw/core/qdev-properties-system.c | 9 -
job.c | 16 --
migration/block.c | 34 +---
migration/migration-hmp-cmds.c | 3 -
migration/savevm.c | 22 ---
net/colo-compare.c | 2 -
qemu-img.c | 4 -
qemu-io.c | 10 +-
qemu-nbd.c | 2 -
replay/replay-debugging.c | 4 -
tests/unit/test-bdrv-drain.c | 51 +----
tests/unit/test-bdrv-graph-mod.c | 6 -
tests/unit/test-block-iothread.c | 31 ---
tests/unit/test-blockjob.c | 137 -------------
tests/unit/test-replication.c | 11 --
util/async.c | 4 -
util/vhost-user-server.c | 3 -
scripts/block-coroutine-wrapper.py | 3 -
tests/tsan/suppressions.tsan | 1 -
41 files changed, 104 insertions(+), 1169 deletions(-)
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 6b21fbc73f..0327f1c605 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -31,11 +31,10 @@
/*
* Global state (GS) API. These functions run under the BQL.
*
- * If a function modifies the graph, it also uses drain and/or
- * aio_context_acquire/release to be sure it has unique access.
- * aio_context locking is needed together with BQL because of
- * the thread-safe I/O API that concurrently runs and accesses
- * the graph without the BQL.
+ * If a function modifies the graph, it also uses the graph lock to be sure it
+ * has unique access. The graph lock is needed together with BQL because of the
+ * thread-safe I/O API that concurrently runs and accesses the graph without
+ * the BQL.
*
* It is important to note that not all of these functions are
* necessarily limited to running under the BQL, but they would
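(Editor's sketch, not part of the patch: the pattern the updated comment
describes. A graph-modifying global-state function takes the graph
writer lock instead of any AioContext lock. bdrv_graph_wrlock(),
bdrv_graph_wrunlock() and bdrv_attach_child() are real APIs used
throughout this series; the fragment itself is hypothetical.)

    GLOBAL_STATE_CODE();   /* runs in the main loop, BQL held */
    bdrv_graph_wrlock();   /* exclusive access for the graph change */
    child = bdrv_attach_child(parent_bs, bs, "file", &child_of_bds,
                              BDRV_CHILD_DATA, errp);
    bdrv_graph_wrunlock();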
diff --git a/include/block/block-io.h b/include/block/block-io.h
index f8729ccc55..8eb39a858b 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -31,8 +31,7 @@
/*
* I/O API functions. These functions are thread-safe, and therefore
- * can run in any thread as long as the thread has called
- * aio_context_acquire/release().
+ * can run in any thread.
*
* These functions can only call functions from I/O and Common categories,
* but can be invoked by GS, "I/O or GS" and I/O APIs.
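(Editor's sketch, not part of the patch: what the relaxed contract
allows, assuming blk_co_pread()'s current signature; the helper below
and its parameters are hypothetical.)

    /* May run in any AioContext, e.g. an IOThread's, without taking
     * any per-node lock first; previously the calling thread had to
     * hold the node's AioContext via aio_context_acquire(). */
    static int coroutine_fn read_header(BlockBackend *blk, void *buf)
    {
        return blk_co_pread(blk, 0, 512, buf, 0);
    }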
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index d49c5599d9..304cc6ea61 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -86,8 +86,6 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
/*
* Group operations. All block drivers are involved.
- * These functions will properly handle dataplane (take aio_context_acquire
- * when appropriate for appropriate block drivers
*/
bool bdrv_all_can_snapshot(bool has_devices, strList *devices,
diff --git a/block.c b/block.c
index 25e1ebc606..91ace5d2d5 100644
--- a/block.c
+++ b/block.c
@@ -1625,7 +1625,6 @@ static int no_coroutine_fn GRAPH_UNLOCKED
bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
QDict *options, int open_flags, Error **errp)
{
- AioContext *ctx;
Error *local_err = NULL;
int i, ret;
GLOBAL_STATE_CODE();
@@ -1673,21 +1672,15 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
bs->supported_read_flags |= BDRV_REQ_REGISTERED_BUF;
bs->supported_write_flags |= BDRV_REQ_REGISTERED_BUF;
- /* Get the context after .bdrv_open, it can change the context */
- ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
-
ret = bdrv_refresh_total_sectors(bs, bs->total_sectors);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not refresh total sector count");
- aio_context_release(ctx);
return ret;
}
bdrv_graph_rdlock_main_loop();
bdrv_refresh_limits(bs, NULL, &local_err);
bdrv_graph_rdunlock_main_loop();
- aio_context_release(ctx);
if (local_err) {
error_propagate(errp, local_err);
@@ -3062,7 +3055,7 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
Transaction *tran, Error **errp)
{
BdrvChild *new_child;
- AioContext *parent_ctx, *new_child_ctx;
+ AioContext *parent_ctx;
AioContext *child_ctx = bdrv_get_aio_context(child_bs);
assert(child_class->get_parent_desc);
@@ -3114,12 +3107,6 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
}
}
- new_child_ctx = bdrv_get_aio_context(child_bs);
- if (new_child_ctx != child_ctx) {
- aio_context_release(child_ctx);
- aio_context_acquire(new_child_ctx);
- }
-
bdrv_ref(child_bs);
/*
* Let every new BdrvChild start with a drained parent. Inserting the child
@@ -3149,11 +3136,6 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
};
tran_add(tran, &bdrv_attach_child_common_drv, s);
- if (new_child_ctx != child_ctx) {
- aio_context_release(new_child_ctx);
- aio_context_acquire(child_ctx);
- }
-
return new_child;
}
@@ -3605,7 +3587,6 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
int ret = 0;
bool implicit_backing = false;
BlockDriverState *backing_hd;
- AioContext *backing_hd_ctx;
QDict *options;
QDict *tmp_parent_options = NULL;
Error *local_err = NULL;
@@ -3691,11 +3672,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
/* Hook up the backing file link; drop our reference, bs owns the
* backing_hd reference now */
- backing_hd_ctx = bdrv_get_aio_context(backing_hd);
- aio_context_acquire(backing_hd_ctx);
ret = bdrv_set_backing_hd(bs, backing_hd, errp);
bdrv_unref(backing_hd);
- aio_context_release(backing_hd_ctx);
if (ret < 0) {
goto free_exit;
@@ -3780,7 +3758,6 @@ BdrvChild *bdrv_open_child(const char *filename,
{
BlockDriverState *bs;
BdrvChild *child;
- AioContext *ctx;
GLOBAL_STATE_CODE();
@@ -3791,11 +3768,8 @@ BdrvChild *bdrv_open_child(const char *filename,
}
bdrv_graph_wrlock();
- ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
errp);
- aio_context_release(ctx);
bdrv_graph_wrunlock();
return child;
@@ -3881,7 +3855,6 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
int64_t total_size;
QemuOpts *opts = NULL;
BlockDriverState *bs_snapshot = NULL;
- AioContext *ctx = bdrv_get_aio_context(bs);
int ret;
GLOBAL_STATE_CODE();
@@ -3890,9 +3863,7 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
instead of opening 'filename' directly */
/* Get the required size from the image */
- aio_context_acquire(ctx);
total_size = bdrv_getlength(bs);
- aio_context_release(ctx);
if (total_size < 0) {
error_setg_errno(errp, -total_size, "Could not get image size");
@@ -3927,10 +3898,7 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
goto out;
}
- aio_context_acquire(ctx);
ret = bdrv_append(bs_snapshot, bs, errp);
- aio_context_release(ctx);
-
if (ret < 0) {
bs_snapshot = NULL;
goto out;
@@ -3974,7 +3942,6 @@ bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
Error *local_err = NULL;
QDict *snapshot_options = NULL;
int snapshot_flags = 0;
- AioContext *ctx = qemu_get_aio_context();
assert(!child_class || !flags);
assert(!child_class == !parent);
@@ -4115,12 +4082,10 @@ bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
/* Not requesting BLK_PERM_CONSISTENT_READ because we're only
* looking at the header to guess the image format. This works even
* in cases where a guest would not see a consistent state. */
- ctx = bdrv_get_aio_context(file_bs);
- aio_context_acquire(ctx);
+ AioContext *ctx = bdrv_get_aio_context(file_bs);
file = blk_new(ctx, 0, BLK_PERM_ALL);
blk_insert_bs(file, file_bs, &local_err);
bdrv_unref(file_bs);
- aio_context_release(ctx);
if (local_err) {
goto fail;
@@ -4167,13 +4132,8 @@ bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
goto fail;
}
- /* The AioContext could have changed during bdrv_open_common() */
- ctx = bdrv_get_aio_context(bs);
-
if (file) {
- aio_context_acquire(ctx);
blk_unref(file);
- aio_context_release(ctx);
file = NULL;
}
@@ -4231,16 +4191,13 @@ bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
* (snapshot_bs); thus, we have to drop the strong reference to bs
* (which we obtained by calling bdrv_new()). bs will not be deleted,
* though, because the overlay still has a reference to it. */
- aio_context_acquire(ctx);
bdrv_unref(bs);
- aio_context_release(ctx);
bs = snapshot_bs;
}
return bs;
fail:
- aio_context_acquire(ctx);
blk_unref(file);
qobject_unref(snapshot_options);
qobject_unref(bs->explicit_options);
@@ -4249,14 +4206,11 @@ fail:
bs->options = NULL;
bs->explicit_options = NULL;
bdrv_unref(bs);
- aio_context_release(ctx);
error_propagate(errp, local_err);
return NULL;
close_and_fail:
- aio_context_acquire(ctx);
bdrv_unref(bs);
- aio_context_release(ctx);
qobject_unref(snapshot_options);
qobject_unref(options);
error_propagate(errp, local_err);
@@ -4540,12 +4494,7 @@ void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue)
if (bs_queue) {
BlockReopenQueueEntry *bs_entry, *next;
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
- AioContext *ctx = bdrv_get_aio_context(bs_entry->state.bs);
-
- aio_context_acquire(ctx);
bdrv_drained_end(bs_entry->state.bs);
- aio_context_release(ctx);
-
qobject_unref(bs_entry->state.explicit_options);
qobject_unref(bs_entry->state.options);
g_free(bs_entry);
@@ -4577,7 +4526,6 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
{
int ret = -1;
BlockReopenQueueEntry *bs_entry, *next;
- AioContext *ctx;
Transaction *tran = tran_new();
g_autoptr(GSList) refresh_list = NULL;
@@ -4586,10 +4534,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
GLOBAL_STATE_CODE();
QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
- ctx = bdrv_get_aio_context(bs_entry->state.bs);
- aio_context_acquire(ctx);
ret = bdrv_flush(bs_entry->state.bs);
- aio_context_release(ctx);
if (ret < 0) {
error_setg_errno(errp, -ret, "Error flushing drive");
goto abort;
@@ -4598,10 +4543,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
assert(bs_entry->state.bs->quiesce_counter > 0);
- ctx = bdrv_get_aio_context(bs_entry->state.bs);
- aio_context_acquire(ctx);
ret = bdrv_reopen_prepare(&bs_entry->state, bs_queue, tran, errp);
- aio_context_release(ctx);
if (ret < 0) {
goto abort;
}
@@ -4644,10 +4586,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
* to first element.
*/
QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
- ctx = bdrv_get_aio_context(bs_entry->state.bs);
- aio_context_acquire(ctx);
bdrv_reopen_commit(&bs_entry->state);
- aio_context_release(ctx);
}
bdrv_graph_wrlock();
@@ -4658,10 +4597,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
BlockDriverState *bs = bs_entry->state.bs;
if (bs->drv->bdrv_reopen_commit_post) {
- ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
bs->drv->bdrv_reopen_commit_post(&bs_entry->state);
- aio_context_release(ctx);
}
}
@@ -4675,10 +4611,7 @@ abort:
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
if (bs_entry->prepared) {
- ctx = bdrv_get_aio_context(bs_entry->state.bs);
- aio_context_acquire(ctx);
bdrv_reopen_abort(&bs_entry->state);
- aio_context_release(ctx);
}
}
@@ -4691,24 +4624,13 @@ cleanup:
int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts,
Error **errp)
{
- AioContext *ctx = bdrv_get_aio_context(bs);
BlockReopenQueue *queue;
- int ret;
GLOBAL_STATE_CODE();
queue = bdrv_reopen_queue(NULL, bs, opts, keep_old_opts);
- if (ctx != qemu_get_aio_context()) {
- aio_context_release(ctx);
- }
- ret = bdrv_reopen_multiple(queue, errp);
-
- if (ctx != qemu_get_aio_context()) {
- aio_context_acquire(ctx);
- }
-
- return ret;
+ return bdrv_reopen_multiple(queue, errp);
}
int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
@@ -4760,7 +4682,6 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
const char *child_name = is_backing ? "backing" : "file";
QObject *value;
const char *str;
- AioContext *ctx, *old_ctx;
bool has_child;
int ret;
@@ -4844,13 +4765,6 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
bdrv_drained_begin(old_child_bs);
}
- old_ctx = bdrv_get_aio_context(bs);
- ctx = bdrv_get_aio_context(new_child_bs);
- if (old_ctx != ctx) {
- aio_context_release(old_ctx);
- aio_context_acquire(ctx);
- }
-
bdrv_graph_rdunlock_main_loop();
bdrv_graph_wrlock();
@@ -4859,11 +4773,6 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
bdrv_graph_wrunlock();
- if (old_ctx != ctx) {
- aio_context_release(ctx);
- aio_context_acquire(old_ctx);
- }
-
if (old_child_bs) {
bdrv_drained_end(old_child_bs);
bdrv_unref(old_child_bs);
@@ -5537,7 +5446,6 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
int ret;
BdrvChild *child;
Transaction *tran = tran_new();
- AioContext *old_context, *new_context = NULL;
GLOBAL_STATE_CODE();
@@ -5545,21 +5453,8 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
assert(!bs_new->backing);
bdrv_graph_rdunlock_main_loop();
- old_context = bdrv_get_aio_context(bs_top);
bdrv_drained_begin(bs_top);
-
- /*
- * bdrv_drained_begin() requires that only the AioContext of the drained
- * node is locked, and at this point it can still differ from the AioContext
- * of bs_top.
- */
- new_context = bdrv_get_aio_context(bs_new);
- aio_context_release(old_context);
- aio_context_acquire(new_context);
bdrv_drained_begin(bs_new);
- aio_context_release(new_context);
- aio_context_acquire(old_context);
- new_context = NULL;
bdrv_graph_wrlock();
@@ -5571,18 +5466,6 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
goto out;
}
- /*
- * bdrv_attach_child_noperm could change the AioContext of bs_top and
- * bs_new, but at least they are in the same AioContext now. This is the
- * AioContext that we need to lock for the rest of the function.
- */
- new_context = bdrv_get_aio_context(bs_top);
-
- if (old_context != new_context) {
- aio_context_release(old_context);
- aio_context_acquire(new_context);
- }
-
ret = bdrv_replace_node_noperm(bs_top, bs_new, true, tran, errp);
if (ret < 0) {
goto out;
@@ -5598,11 +5481,6 @@ out:
bdrv_drained_end(bs_top);
bdrv_drained_end(bs_new);
- if (new_context && old_context != new_context) {
- aio_context_release(new_context);
- aio_context_acquire(old_context);
- }
-
return ret;
}
@@ -5697,12 +5575,8 @@ BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *options,
GLOBAL_STATE_CODE();
- aio_context_release(ctx);
- aio_context_acquire(qemu_get_aio_context());
new_node_bs = bdrv_new_open_driver_opts(drv, node_name, options, flags,
errp);
- aio_context_release(qemu_get_aio_context());
- aio_context_acquire(ctx);
assert(bdrv_get_aio_context(bs) == ctx);
options = NULL; /* bdrv_new_open_driver() eats options */
@@ -7037,12 +6911,9 @@ void bdrv_activate_all(Error **errp)
GRAPH_RDLOCK_GUARD_MAINLOOP();
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
- AioContext *aio_context = bdrv_get_aio_context(bs);
int ret;
- aio_context_acquire(aio_context);
ret = bdrv_activate(bs, errp);
- aio_context_release(aio_context);
if (ret < 0) {
bdrv_next_cleanup(&it);
return;
@@ -7137,20 +7008,10 @@ int bdrv_inactivate_all(void)
BlockDriverState *bs = NULL;
BdrvNextIterator it;
int ret = 0;
- GSList *aio_ctxs = NULL, *ctx;
GLOBAL_STATE_CODE();
GRAPH_RDLOCK_GUARD_MAINLOOP();
- for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
- AioContext *aio_context = bdrv_get_aio_context(bs);
-
- if (!g_slist_find(aio_ctxs, aio_context)) {
- aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
- aio_context_acquire(aio_context);
- }
- }
-
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
/* Nodes with BDS parents are covered by recursion from the last
* parent that gets inactivated. Don't inactivate them a second
@@ -7161,17 +7022,10 @@ int bdrv_inactivate_all(void)
ret = bdrv_inactivate_recurse(bs);
if (ret < 0) {
bdrv_next_cleanup(&it);
- goto out;
+ break;
}
}
-out:
- for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
- AioContext *aio_context = ctx->data;
- aio_context_release(aio_context);
- }
- g_slist_free(aio_ctxs);
-
return ret;
}
@@ -7257,11 +7111,8 @@ void bdrv_unref(BlockDriverState *bs)
static void bdrv_schedule_unref_bh(void *opaque)
{
BlockDriverState *bs = opaque;
- AioContext *ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
bdrv_unref(bs);
- aio_context_release(ctx);
}
/*
@@ -7398,8 +7249,6 @@ void bdrv_img_create(const char *filename, const char *fmt,
return;
}
- aio_context_acquire(qemu_get_aio_context());
-
/* Create parameter list */
create_opts = qemu_opts_append(create_opts, drv->create_opts);
create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
@@ -7549,7 +7398,6 @@ out:
qemu_opts_del(opts);
qemu_opts_free(create_opts);
error_propagate(errp, local_err);
- aio_context_release(qemu_get_aio_context());
}
AioContext *bdrv_get_aio_context(BlockDriverState *bs)
@@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
{
- AioContext *ctx = bdrv_get_aio_context(bs);
-
- /* In the main thread, bs->aio_context won't change concurrently */
- assert(qemu_get_current_aio_context() == qemu_get_aio_context());
-
- /*
- * We're in coroutine context, so we already hold the lock of the main
- * loop AioContext. Don't lock it twice to avoid deadlocks.
- */
- assert(qemu_in_coroutine());
- if (ctx != qemu_get_aio_context()) {
- aio_context_acquire(ctx);
- }
+ /* TODO removed in next patch */
}
void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
{
- AioContext *ctx = bdrv_get_aio_context(bs);
-
- assert(qemu_in_coroutine());
- if (ctx != qemu_get_aio_context()) {
- aio_context_release(ctx);
- }
+ /* TODO removed in next patch */
}
static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
@@ -7728,21 +7559,8 @@ static void bdrv_set_aio_context_commit(void *opaque)
BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
BlockDriverState *bs = (BlockDriverState *) state->bs;
AioContext *new_context = state->new_ctx;
- AioContext *old_context = bdrv_get_aio_context(bs);
- /*
- * Take the old AioContex when detaching it from bs.
- * At this point, new_context lock is already acquired, and we are now
- * also taking old_context. This is safe as long as bdrv_detach_aio_context
- * does not call AIO_POLL_WHILE().
- */
- if (old_context != qemu_get_aio_context()) {
- aio_context_acquire(old_context);
- }
bdrv_detach_aio_context(bs);
- if (old_context != qemu_get_aio_context()) {
- aio_context_release(old_context);
- }
bdrv_attach_aio_context(bs, new_context);
}
@@ -7827,7 +7645,6 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
Transaction *tran;
GHashTable *visited;
int ret;
- AioContext *old_context = bdrv_get_aio_context(bs);
GLOBAL_STATE_CODE();
/*
@@ -7857,34 +7674,7 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
return -EPERM;
}
- /*
- * Release old AioContext, it won't be needed anymore, as all
- * bdrv_drained_begin() have been called already.
- */
- if (qemu_get_aio_context() != old_context) {
- aio_context_release(old_context);
- }
-
- /*
- * Acquire new AioContext since bdrv_drained_end() is going to be called
- * after we switched all nodes in the new AioContext, and the function
- * assumes that the lock of the bs is always taken.
- */
- if (qemu_get_aio_context() != ctx) {
- aio_context_acquire(ctx);
- }
-
tran_commit(tran);
-
- if (qemu_get_aio_context() != ctx) {
- aio_context_release(ctx);
- }
-
- /* Re-acquire the old AioContext, since the caller takes and releases it. */
- if (qemu_get_aio_context() != old_context) {
- aio_context_acquire(old_context);
- }
-
return 0;
}
@@ -8006,7 +7796,6 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
const char *node_name, Error **errp)
{
BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
- AioContext *aio_context;
GLOBAL_STATE_CODE();
@@ -8015,12 +7804,8 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
return NULL;
}
- aio_context = bdrv_get_aio_context(to_replace_bs);
- aio_context_acquire(aio_context);
-
if (bdrv_op_is_blocked(to_replace_bs, BLOCK_OP_TYPE_REPLACE, errp)) {
- to_replace_bs = NULL;
- goto out;
+ return NULL;
}
/* We don't want arbitrary node of the BDS chain to be replaced only the top
@@ -8033,12 +7818,9 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
"because it cannot be guaranteed that doing so would not "
"lead to an abrupt change of visible data",
node_name, parent_bs->node_name);
- to_replace_bs = NULL;
- goto out;
+ return NULL;
}
-out:
- aio_context_release(aio_context);
return to_replace_bs;
}
diff --git a/block/block-backend.c b/block/block-backend.c
index abac4e0235..f412bed274 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -429,7 +429,6 @@ BlockBackend *blk_new_open(const char *filename, const char *reference,
{
BlockBackend *blk;
BlockDriverState *bs;
- AioContext *ctx;
uint64_t perm = 0;
uint64_t shared = BLK_PERM_ALL;
@@ -459,23 +458,18 @@ BlockBackend *blk_new_open(const char *filename, const char *reference,
shared = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
}
- aio_context_acquire(qemu_get_aio_context());
bs = bdrv_open(filename, reference, options, flags, errp);
- aio_context_release(qemu_get_aio_context());
if (!bs) {
return NULL;
}
/* bdrv_open() could have moved bs to a different AioContext */
- ctx = bdrv_get_aio_context(bs);
blk = blk_new(bdrv_get_aio_context(bs), perm, shared);
blk->perm = perm;
blk->shared_perm = shared;
- aio_context_acquire(ctx);
blk_insert_bs(blk, bs, errp);
bdrv_unref(bs);
- aio_context_release(ctx);
if (!blk->root) {
blk_unref(blk);
@@ -577,13 +571,9 @@ void blk_remove_all_bs(void)
GLOBAL_STATE_CODE();
while ((blk = blk_all_next(blk)) != NULL) {
- AioContext *ctx = blk_get_aio_context(blk);
-
- aio_context_acquire(ctx);
if (blk->root) {
blk_remove_bs(blk);
}
- aio_context_release(ctx);
}
}
@@ -2736,20 +2726,16 @@ int blk_commit_all(void)
GRAPH_RDLOCK_GUARD_MAINLOOP();
while ((blk = blk_all_next(blk)) != NULL) {
- AioContext *aio_context = blk_get_aio_context(blk);
BlockDriverState *unfiltered_bs = bdrv_skip_filters(blk_bs(blk));
- aio_context_acquire(aio_context);
if (blk_is_inserted(blk) && bdrv_cow_child(unfiltered_bs)) {
int ret;
ret = bdrv_commit(unfiltered_bs);
if (ret < 0) {
- aio_context_release(aio_context);
return ret;
}
}
- aio_context_release(aio_context);
}
return 0;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 13972879b1..0842a1a6df 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -412,7 +412,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
int64_t cluster_size;
g_autoptr(BlockdevOptions) full_opts = NULL;
BlockdevOptionsCbw *opts;
- AioContext *ctx;
int ret;
full_opts = cbw_parse_options(options, errp);
@@ -435,15 +434,11 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
GRAPH_RDLOCK_GUARD_MAINLOOP();
- ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
-
if (opts->bitmap) {
bitmap = block_dirty_bitmap_lookup(opts->bitmap->node,
opts->bitmap->name, NULL, errp);
if (!bitmap) {
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
}
s->on_cbw_error = opts->has_on_cbw_error ? opts->on_cbw_error :
@@ -461,24 +456,21 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
if (!s->bcs) {
error_prepend(errp, "Cannot create block-copy-state: ");
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
cluster_size = block_copy_cluster_size(s->bcs);
s->done_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
if (!s->done_bitmap) {
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
bdrv_disable_dirty_bitmap(s->done_bitmap);
/* s->access_bitmap starts equal to bcs bitmap */
s->access_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
if (!s->access_bitmap) {
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
bdrv_disable_dirty_bitmap(s->access_bitmap);
bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
@@ -487,11 +479,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
qemu_co_mutex_init(&s->lock);
QLIST_INIT(&s->frozen_read_reqs);
-
- ret = 0;
-out:
- aio_context_release(ctx);
- return ret;
+ return 0;
}
static void cbw_close(BlockDriverState *bs)
diff --git a/block/export/export.c b/block/export/export.c
index a8f274e526..6d51ae8ed7 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -114,7 +114,6 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
}
ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
if (export->iothread) {
IOThread *iothread;
@@ -133,8 +132,6 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
set_context_errp = fixed_iothread ? errp : NULL;
ret = bdrv_try_change_aio_context(bs, new_ctx, NULL, set_context_errp);
if (ret == 0) {
- aio_context_release(ctx);
- aio_context_acquire(new_ctx);
ctx = new_ctx;
} else if (fixed_iothread) {
goto fail;
@@ -191,8 +188,6 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
assert(exp->blk != NULL);
QLIST_INSERT_HEAD(&block_exports, exp, next);
-
- aio_context_release(ctx);
return exp;
fail:
@@ -200,7 +195,6 @@ fail:
blk_set_dev_ops(blk, NULL, NULL);
blk_unref(blk);
}
- aio_context_release(ctx);
if (exp) {
g_free(exp->id);
g_free(exp);
@@ -218,9 +212,6 @@ void blk_exp_ref(BlockExport *exp)
static void blk_exp_delete_bh(void *opaque)
{
BlockExport *exp = opaque;
- AioContext *aio_context = exp->ctx;
-
- aio_context_acquire(aio_context);
assert(exp->refcount == 0);
QLIST_REMOVE(exp, next);
@@ -230,8 +221,6 @@ static void blk_exp_delete_bh(void *opaque)
qapi_event_send_block_export_deleted(exp->id);
g_free(exp->id);
g_free(exp);
-
- aio_context_release(aio_context);
}
void blk_exp_unref(BlockExport *exp)
@@ -249,22 +238,16 @@ void blk_exp_unref(BlockExport *exp)
* connections and other internally held references start to shut down. When
* the function returns, there may still be active references while the export
* is in the process of shutting down.
- *
- * Acquires exp->ctx internally. Callers must *not* hold the lock.
*/
void blk_exp_request_shutdown(BlockExport *exp)
{
- AioContext *aio_context = exp->ctx;
-
- aio_context_acquire(aio_context);
-
/*
* If the user doesn't own the export any more, it is already shutting
* down. We must not call .request_shutdown and decrease the refcount a
* second time.
*/
if (!exp->user_owned) {
- goto out;
+ return;
}
exp->drv->request_shutdown(exp);
@@ -272,9 +255,6 @@ void blk_exp_request_shutdown(BlockExport *exp)
assert(exp->user_owned);
exp->user_owned = false;
blk_exp_unref(exp);
-
-out:
- aio_context_release(aio_context);
}
/*
diff --git a/block/io.c b/block/io.c
index 7e62fabbf5..8fa7670571 100644
--- a/block/io.c
+++ b/block/io.c
@@ -294,8 +294,6 @@ static void bdrv_co_drain_bh_cb(void *opaque)
BlockDriverState *bs = data->bs;
if (bs) {
- AioContext *ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
bdrv_dec_in_flight(bs);
if (data->begin) {
bdrv_do_drained_begin(bs, data->parent, data->poll);
@@ -303,7 +301,6 @@ static void bdrv_co_drain_bh_cb(void *opaque)
assert(!data->poll);
bdrv_do_drained_end(bs, data->parent);
}
- aio_context_release(ctx);
} else {
assert(data->begin);
bdrv_drain_all_begin();
@@ -320,8 +317,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
{
BdrvCoDrainData data;
Coroutine *self = qemu_coroutine_self();
- AioContext *ctx = bdrv_get_aio_context(bs);
- AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
/* Calling bdrv_drain() from a BH ensures the current coroutine yields and
* other coroutines run if they were queued by aio_co_enter(). */
@@ -340,17 +335,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
bdrv_inc_in_flight(bs);
}
- /*
- * Temporarily drop the lock across yield or we would get deadlocks.
- * bdrv_co_drain_bh_cb() reaquires the lock as needed.
- *
- * When we yield below, the lock for the current context will be
- * released, so if this is actually the lock that protects bs, don't drop
- * it a second time.
- */
- if (ctx != co_ctx) {
- aio_context_release(ctx);
- }
replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
bdrv_co_drain_bh_cb, &data);
@@ -358,11 +342,6 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
/* If we are resumed from some other event (such as an aio completion or a
* timer callback), it is a bug in the caller that should be fixed. */
assert(data.done);
-
- /* Reacquire the AioContext of bs if we dropped it */
- if (ctx != co_ctx) {
- aio_context_acquire(ctx);
- }
}
static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
@@ -478,13 +457,12 @@ static bool bdrv_drain_all_poll(void)
GLOBAL_STATE_CODE();
GRAPH_RDLOCK_GUARD_MAINLOOP();
- /* bdrv_drain_poll() can't make changes to the graph and we are holding the
- * main AioContext lock, so iterating bdrv_next_all_states() is safe. */
+ /*
+ * bdrv_drain_poll() can't make changes to the graph and we hold the BQL,
+ * so iterating bdrv_next_all_states() is safe.
+ */
while ((bs = bdrv_next_all_states(bs))) {
- AioContext *aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
result |= bdrv_drain_poll(bs, NULL, true);
- aio_context_release(aio_context);
}
return result;
@@ -525,11 +503,7 @@ void bdrv_drain_all_begin_nopoll(void)
/* Quiesce all nodes, without polling in-flight requests yet. The graph
* cannot change during this loop. */
while ((bs = bdrv_next_all_states(bs))) {
- AioContext *aio_context = bdrv_get_aio_context(bs);
-
- aio_context_acquire(aio_context);
bdrv_do_drained_begin(bs, NULL, false);
- aio_context_release(aio_context);
}
}
@@ -588,11 +562,7 @@ void bdrv_drain_all_end(void)
}
while ((bs = bdrv_next_all_states(bs))) {
- AioContext *aio_context = bdrv_get_aio_context(bs);
-
- aio_context_acquire(aio_context);
bdrv_do_drained_end(bs, NULL);
- aio_context_release(aio_context);
}
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
@@ -2368,15 +2338,10 @@ int bdrv_flush_all(void)
}
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
- AioContext *aio_context = bdrv_get_aio_context(bs);
- int ret;
-
- aio_context_acquire(aio_context);
- ret = bdrv_flush(bs);
+ int ret = bdrv_flush(bs);
if (ret < 0 && !result) {
result = ret;
}
- aio_context_release(aio_context);
}
return result;
diff --git a/block/mirror.c b/block/mirror.c
index 51f9e2f17c..5145eb53e1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -662,7 +662,6 @@ static int mirror_exit_common(Job *job)
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
BlockJob *bjob = &s->common;
MirrorBDSOpaque *bs_opaque;
- AioContext *replace_aio_context = NULL;
BlockDriverState *src;
BlockDriverState *target_bs;
BlockDriverState *mirror_top_bs;
@@ -677,7 +676,6 @@ static int mirror_exit_common(Job *job)
}
s->prepared = true;
- aio_context_acquire(qemu_get_aio_context());
bdrv_graph_rdlock_main_loop();
mirror_top_bs = s->mirror_top_bs;
@@ -742,11 +740,6 @@ static int mirror_exit_common(Job *job)
}
bdrv_graph_rdunlock_main_loop();
- if (s->to_replace) {
- replace_aio_context = bdrv_get_aio_context(s->to_replace);
- aio_context_acquire(replace_aio_context);
- }
-
if (s->should_complete && !abort) {
BlockDriverState *to_replace = s->to_replace ?: src;
bool ro = bdrv_is_read_only(to_replace);
@@ -785,9 +778,6 @@ static int mirror_exit_common(Job *job)
error_free(s->replace_blocker);
bdrv_unref(s->to_replace);
}
- if (replace_aio_context) {
- aio_context_release(replace_aio_context);
- }
g_free(s->replaces);
/*
@@ -811,8 +801,6 @@ static int mirror_exit_common(Job *job)
bdrv_unref(mirror_top_bs);
bdrv_unref(src);
- aio_context_release(qemu_get_aio_context());
-
return ret;
}
@@ -1191,24 +1179,17 @@ static void mirror_complete(Job *job, Error **errp)
/* block all operations on to_replace bs */
if (s->replaces) {
- AioContext *replace_aio_context;
-
s->to_replace = bdrv_find_node(s->replaces);
if (!s->to_replace) {
error_setg(errp, "Node name '%s' not found", s->replaces);
return;
}
- replace_aio_context = bdrv_get_aio_context(s->to_replace);
- aio_context_acquire(replace_aio_context);
-
/* TODO Translate this into child freeze system. */
error_setg(&s->replace_blocker,
"block device is in use by block-job-complete");
bdrv_op_block_all(s->to_replace, s->replace_blocker);
bdrv_ref(s->to_replace);
-
- aio_context_release(replace_aio_context);
}
s->should_complete = true;
diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 70d01a3776..a738e7bbf7 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -95,7 +95,6 @@ void qmp_block_dirty_bitmap_add(const char *node, const char *name,
{
BlockDriverState *bs;
BdrvDirtyBitmap *bitmap;
- AioContext *aio_context;
if (!name || name[0] == '\0') {
error_setg(errp, "Bitmap name cannot be empty");
@@ -107,14 +106,11 @@ void qmp_block_dirty_bitmap_add(const char *node, const char *name,
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
if (has_granularity) {
if (granularity < 512 || !is_power_of_2(granularity)) {
error_setg(errp, "Granularity must be power of 2 "
"and at least 512");
- goto out;
+ return;
}
} else {
/* Default to cluster size, if available: */
@@ -132,12 +128,12 @@ void qmp_block_dirty_bitmap_add(const char *node, const char *name,
if (persistent &&
!bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp))
{
- goto out;
+ return;
}
bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
if (bitmap == NULL) {
- goto out;
+ return;
}
if (disabled) {
@@ -145,9 +141,6 @@ void qmp_block_dirty_bitmap_add(const char *node, const char *name,
}
bdrv_dirty_bitmap_set_persistence(bitmap, persistent);
-
-out:
- aio_context_release(aio_context);
}
BdrvDirtyBitmap *block_dirty_bitmap_remove(const char *node, const char *name,
@@ -157,7 +150,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_remove(const char *node, const char *name,
{
BlockDriverState *bs;
BdrvDirtyBitmap *bitmap;
- AioContext *aio_context;
GLOBAL_STATE_CODE();
@@ -166,19 +158,14 @@ BdrvDirtyBitmap *block_dirty_bitmap_remove(const char *node, const char *name,
return NULL;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_BUSY | BDRV_BITMAP_RO,
errp)) {
- aio_context_release(aio_context);
return NULL;
}
if (bdrv_dirty_bitmap_get_persistence(bitmap) &&
bdrv_remove_persistent_dirty_bitmap(bs, name, errp) < 0)
{
- aio_context_release(aio_context);
return NULL;
}
@@ -190,7 +177,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_remove(const char *node, const char *name,
*bitmap_bs = bs;
}
- aio_context_release(aio_context);
return release ? NULL : bitmap;
}
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c729cbf1eb..bdbb5cb141 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -141,7 +141,6 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
const char *id = qdict_get_str(qdict, "id");
BlockBackend *blk;
BlockDriverState *bs;
- AioContext *aio_context;
Error *local_err = NULL;
GLOBAL_STATE_CODE();
@@ -168,14 +167,10 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
return;
}
- aio_context = blk_get_aio_context(blk);
- aio_context_acquire(aio_context);
-
bs = blk_bs(blk);
if (bs) {
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_DRIVE_DEL, &local_err)) {
error_report_err(local_err);
- aio_context_release(aio_context);
return;
}
@@ -196,8 +191,6 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
} else {
blk_unref(blk);
}
-
- aio_context_release(aio_context);
}
void hmp_commit(Monitor *mon, const QDict *qdict)
@@ -213,7 +206,6 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
ret = blk_commit_all();
} else {
BlockDriverState *bs;
- AioContext *aio_context;
blk = blk_by_name(device);
if (!blk) {
@@ -222,18 +214,13 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
}
bs = bdrv_skip_implicit_filters(blk_bs(blk));
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
if (!blk_is_available(blk)) {
error_report("Device '%s' has no medium", device);
- aio_context_release(aio_context);
return;
}
ret = bdrv_commit(bs);
-
- aio_context_release(aio_context);
}
if (ret < 0) {
error_report("'commit' error for '%s': %s", device, strerror(-ret));
@@ -560,7 +547,6 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
BlockBackend *blk = NULL;
BlockDriverState *bs = NULL;
BlockBackend *local_blk = NULL;
- AioContext *ctx = NULL;
bool qdev = qdict_get_try_bool(qdict, "qdev", false);
const char *device = qdict_get_str(qdict, "device");
const char *command = qdict_get_str(qdict, "command");
@@ -582,9 +568,6 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
}
}
- ctx = blk ? blk_get_aio_context(blk) : bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
-
if (bs) {
blk = local_blk = blk_new(bdrv_get_aio_context(bs), 0, BLK_PERM_ALL);
ret = blk_insert_bs(blk, bs, &err);
@@ -622,11 +605,6 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
fail:
blk_unref(local_blk);
-
- if (ctx) {
- aio_context_release(ctx);
- }
-
hmp_handle_error(mon, err);
}
@@ -882,7 +860,6 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
int nb_sns, i;
int total;
int *global_snapshots;
- AioContext *aio_context;
typedef struct SnapshotEntry {
QEMUSnapshotInfo sn;
@@ -909,11 +886,8 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
error_report_err(err);
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
nb_sns = bdrv_snapshot_list(bs, &sn_tab);
- aio_context_release(aio_context);
if (nb_sns < 0) {
monitor_printf(mon, "bdrv_snapshot_list: error %d\n", nb_sns);
@@ -924,9 +898,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
int bs1_nb_sns = 0;
ImageEntry *ie;
SnapshotEntry *se;
- AioContext *ctx = bdrv_get_aio_context(bs1);
- aio_context_acquire(ctx);
if (bdrv_can_snapshot(bs1)) {
sn = NULL;
bs1_nb_sns = bdrv_snapshot_list(bs1, &sn);
@@ -944,7 +916,6 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
}
g_free(sn);
}
- aio_context_release(ctx);
}
if (no_snapshot) {
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 1618cd225a..e4282631d2 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -174,7 +174,6 @@ blockdev_remove_medium(const char *device, const char *id, Error **errp)
{
BlockBackend *blk;
BlockDriverState *bs;
- AioContext *aio_context;
bool has_attached_device;
GLOBAL_STATE_CODE();
@@ -204,13 +203,10 @@ blockdev_remove_medium(const char *device, const char *id, Error **errp)
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
bdrv_graph_rdlock_main_loop();
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_EJECT, errp)) {
bdrv_graph_rdunlock_main_loop();
- goto out;
+ return;
}
bdrv_graph_rdunlock_main_loop();
@@ -223,9 +219,6 @@ blockdev_remove_medium(const char *device, const char *id, Error **errp)
* value passed here (i.e. false). */
blk_dev_change_media_cb(blk, false, &error_abort);
}
-
-out:
- aio_context_release(aio_context);
}
void qmp_blockdev_remove_medium(const char *id, Error **errp)
@@ -237,7 +230,6 @@ static void qmp_blockdev_insert_anon_medium(BlockBackend *blk,
BlockDriverState *bs, Error **errp)
{
Error *local_err = NULL;
- AioContext *ctx;
bool has_device;
int ret;
@@ -259,11 +251,7 @@ static void qmp_blockdev_insert_anon_medium(BlockBackend *blk,
return;
}
- ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
ret = blk_insert_bs(blk, bs, errp);
- aio_context_release(ctx);
-
if (ret < 0) {
return;
}
@@ -374,9 +362,7 @@ void qmp_blockdev_change_medium(const char *device,
qdict_put_str(options, "driver", format);
}
- aio_context_acquire(qemu_get_aio_context());
medium_bs = bdrv_open(filename, NULL, options, bdrv_flags, errp);
- aio_context_release(qemu_get_aio_context());
if (!medium_bs) {
goto fail;
@@ -437,20 +423,16 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp)
ThrottleConfig cfg;
BlockDriverState *bs;
BlockBackend *blk;
- AioContext *aio_context;
blk = qmp_get_blk(arg->device, arg->id, errp);
if (!blk) {
return;
}
- aio_context = blk_get_aio_context(blk);
- aio_context_acquire(aio_context);
-
bs = blk_bs(blk);
if (!bs) {
error_setg(errp, "Device has no medium");
- goto out;
+ return;
}
throttle_config_init(&cfg);
@@ -505,7 +487,7 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp)
}
if (!throttle_is_valid(&cfg, errp)) {
- goto out;
+ return;
}
if (throttle_enabled(&cfg)) {
@@ -522,9 +504,6 @@ void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp)
/* If all throttling settings are set to 0, disable I/O limits */
blk_io_limits_disable(blk);
}
-
-out:
- aio_context_release(aio_context);
}
void qmp_block_latency_histogram_set(
diff --git a/block/qapi.c b/block/qapi.c
index 82a30b38fe..9e806fa230 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -234,13 +234,11 @@ bdrv_do_query_node_info(BlockDriverState *bs, BlockNodeInfo *info, Error **errp)
int ret;
Error *err = NULL;
- aio_context_acquire(bdrv_get_aio_context(bs));
-
size = bdrv_getlength(bs);
if (size < 0) {
error_setg_errno(errp, -size, "Can't get image size '%s'",
bs->exact_filename);
- goto out;
+ return;
}
bdrv_refresh_filename(bs);
@@ -265,7 +263,7 @@ bdrv_do_query_node_info(BlockDriverState *bs, BlockNodeInfo *info, Error **errp)
info->format_specific = bdrv_get_specific_info(bs, &err);
if (err) {
error_propagate(errp, err);
- goto out;
+ return;
}
backing_filename = bs->backing_file;
if (backing_filename[0] != '\0') {
@@ -300,11 +298,8 @@ bdrv_do_query_node_info(BlockDriverState *bs, BlockNodeInfo *info, Error **errp)
break;
default:
error_propagate(errp, err);
- goto out;
+ return;
}
-
-out:
- aio_context_release(bdrv_get_aio_context(bs));
}
/**
@@ -709,15 +704,10 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
/* Just to be safe if query_nodes is not always initialized */
if (has_query_nodes && query_nodes) {
for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
- AioContext *ctx = bdrv_get_aio_context(bs);
-
- aio_context_acquire(ctx);
QAPI_LIST_APPEND(tail, bdrv_query_bds_stats(bs, false));
- aio_context_release(ctx);
}
} else {
for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
- AioContext *ctx = blk_get_aio_context(blk);
BlockStats *s;
char *qdev;
@@ -725,7 +715,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
continue;
}
- aio_context_acquire(ctx);
s = bdrv_query_bds_stats(blk_bs(blk), true);
s->device = g_strdup(blk_name(blk));
@@ -737,7 +726,6 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
}
bdrv_query_blk_stats(s->stats, blk);
- aio_context_release(ctx);
QAPI_LIST_APPEND(tail, s);
}
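
Likewise, the query loops no longer take and drop a per-node lock on
every iteration. A sketch of the new shape of the
qmp_query_blockstats() node loop (illustrative only):

    BlockDriverState *bs;
    for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
        /* No per-node aio_context_acquire()/release() pair anymore;
         * the whole loop runs in the main loop thread. */
        QAPI_LIST_APPEND(tail, bdrv_query_bds_stats(bs, false));
    }
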
diff --git a/block/raw-format.c b/block/raw-format.c
index 1111dffd54..ac7e8495f6 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -470,7 +470,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
- AioContext *ctx;
bool has_size;
uint64_t offset, size;
BdrvChildRole file_role;
@@ -522,11 +521,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
bs->file->bs->filename);
}
- ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
ret = raw_apply_options(bs, s, offset, has_size, size, errp);
- aio_context_release(ctx);
-
if (ret < 0) {
return ret;
}
diff --git a/block/replication.c b/block/replication.c
index 424b537ff7..ca6bd0a720 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -394,14 +394,7 @@ static void reopen_backing_file(BlockDriverState *bs, bool writable,
}
if (reopen_queue) {
- AioContext *ctx = bdrv_get_aio_context(bs);
- if (ctx != qemu_get_aio_context()) {
- aio_context_release(ctx);
- }
bdrv_reopen_multiple(reopen_queue, errp);
- if (ctx != qemu_get_aio_context()) {
- aio_context_acquire(ctx);
- }
}
}
@@ -462,14 +455,11 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
BlockDriverState *top_bs;
BdrvChild *active_disk, *hidden_disk, *secondary_disk;
int64_t active_length, hidden_length, disk_length;
- AioContext *aio_context;
Error *local_err = NULL;
BackupPerf perf = { .use_copy_range = true, .max_workers = 1 };
GLOBAL_STATE_CODE();
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
s = bs->opaque;
if (s->stage == BLOCK_REPLICATION_DONE ||
@@ -479,20 +469,17 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
* Ignore the request because the secondary side of replication
* doesn't have to do anything anymore.
*/
- aio_context_release(aio_context);
return;
}
if (s->stage != BLOCK_REPLICATION_NONE) {
error_setg(errp, "Block replication is running or done");
- aio_context_release(aio_context);
return;
}
if (s->mode != mode) {
error_setg(errp, "The parameter mode's value is invalid, needs %d,"
" but got %d", s->mode, mode);
- aio_context_release(aio_context);
return;
}
@@ -505,7 +492,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (!active_disk || !active_disk->bs || !active_disk->bs->backing) {
error_setg(errp, "Active disk doesn't have backing file");
bdrv_graph_rdunlock_main_loop();
- aio_context_release(aio_context);
return;
}
@@ -513,7 +499,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (!hidden_disk->bs || !hidden_disk->bs->backing) {
error_setg(errp, "Hidden disk doesn't have backing file");
bdrv_graph_rdunlock_main_loop();
- aio_context_release(aio_context);
return;
}
@@ -521,7 +506,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (!secondary_disk->bs || !bdrv_has_blk(secondary_disk->bs)) {
error_setg(errp, "The secondary disk doesn't have block backend");
bdrv_graph_rdunlock_main_loop();
- aio_context_release(aio_context);
return;
}
bdrv_graph_rdunlock_main_loop();
@@ -534,7 +518,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
active_length != hidden_length || hidden_length != disk_length) {
error_setg(errp, "Active disk, hidden disk, secondary disk's length"
" are not the same");
- aio_context_release(aio_context);
return;
}
@@ -546,7 +529,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
!hidden_disk->bs->drv->bdrv_make_empty) {
error_setg(errp,
"Active disk or hidden disk doesn't support make_empty");
- aio_context_release(aio_context);
bdrv_graph_rdunlock_main_loop();
return;
}
@@ -556,7 +538,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
reopen_backing_file(bs, true, &local_err);
if (local_err) {
error_propagate(errp, local_err);
- aio_context_release(aio_context);
return;
}
@@ -569,7 +550,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (local_err) {
error_propagate(errp, local_err);
bdrv_graph_wrunlock();
- aio_context_release(aio_context);
return;
}
@@ -580,7 +560,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (local_err) {
error_propagate(errp, local_err);
bdrv_graph_wrunlock();
- aio_context_release(aio_context);
return;
}
@@ -594,7 +573,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
error_setg(errp, "No top_bs or it is invalid");
bdrv_graph_wrunlock();
reopen_backing_file(bs, false, NULL);
- aio_context_release(aio_context);
return;
}
bdrv_op_block_all(top_bs, s->blocker);
@@ -612,13 +590,11 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (local_err) {
error_propagate(errp, local_err);
backup_job_cleanup(bs);
- aio_context_release(aio_context);
return;
}
job_start(&s->backup_job->job);
break;
default:
- aio_context_release(aio_context);
abort();
}
@@ -629,18 +605,12 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
}
s->error = 0;
- aio_context_release(aio_context);
}
static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
{
BlockDriverState *bs = rs->opaque;
- BDRVReplicationState *s;
- AioContext *aio_context;
-
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
- s = bs->opaque;
+ BDRVReplicationState *s = bs->opaque;
if (s->stage == BLOCK_REPLICATION_DONE ||
s->stage == BLOCK_REPLICATION_FAILOVER) {
@@ -649,38 +619,28 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
* Ignore the request because the secondary side of replication
* doesn't have to do anything anymore.
*/
- aio_context_release(aio_context);
return;
}
if (s->mode == REPLICATION_MODE_SECONDARY) {
secondary_do_checkpoint(bs, errp);
}
- aio_context_release(aio_context);
}
static void replication_get_error(ReplicationState *rs, Error **errp)
{
BlockDriverState *bs = rs->opaque;
- BDRVReplicationState *s;
- AioContext *aio_context;
-
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
- s = bs->opaque;
+ BDRVReplicationState *s = bs->opaque;
if (s->stage == BLOCK_REPLICATION_NONE) {
error_setg(errp, "Block replication is not running");
- aio_context_release(aio_context);
return;
}
if (s->error) {
error_setg(errp, "I/O error occurred");
- aio_context_release(aio_context);
return;
}
- aio_context_release(aio_context);
}
static void replication_done(void *opaque, int ret)
@@ -708,12 +668,7 @@ static void replication_done(void *opaque, int ret)
static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
{
BlockDriverState *bs = rs->opaque;
- BDRVReplicationState *s;
- AioContext *aio_context;
-
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
- s = bs->opaque;
+ BDRVReplicationState *s = bs->opaque;
if (s->stage == BLOCK_REPLICATION_DONE ||
s->stage == BLOCK_REPLICATION_FAILOVER) {
@@ -722,13 +677,11 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
* Ignore the request because the secondary side of replication
* doesn't have to do anything anymore.
*/
- aio_context_release(aio_context);
return;
}
if (s->stage != BLOCK_REPLICATION_RUNNING) {
error_setg(errp, "Block replication is not running");
- aio_context_release(aio_context);
return;
}
@@ -744,15 +697,12 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
* disk, secondary disk in backup_job_completed().
*/
if (s->backup_job) {
- aio_context_release(aio_context);
job_cancel_sync(&s->backup_job->job, true);
- aio_context_acquire(aio_context);
}
if (!failover) {
secondary_do_checkpoint(bs, errp);
s->stage = BLOCK_REPLICATION_DONE;
- aio_context_release(aio_context);
return;
}
@@ -765,10 +715,8 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
bdrv_graph_rdunlock_main_loop();
break;
default:
- aio_context_release(aio_context);
abort();
}
- aio_context_release(aio_context);
}
static const char *const replication_strong_runtime_opts[] = {
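
The replication hunks also remove the temporary-release dance: code
that dropped the lock around a blocking call to avoid deadlock and
then re-acquired it afterwards. Sketch of what replication_stop()
loses (illustrative only):

    /* Before: release around job_cancel_sync() so the job's
     * AioContext could make progress without deadlocking. */
    aio_context_release(aio_context);
    job_cancel_sync(&s->backup_job->job, true);
    aio_context_acquire(aio_context);

    /* After: there is no lock to juggle. */
    job_cancel_sync(&s->backup_job->job, true);
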
diff --git a/block/snapshot.c b/block/snapshot.c
index 6fd720aeff..8694fc0a3e 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -527,9 +527,7 @@ static bool GRAPH_RDLOCK bdrv_all_snapshots_includes_bs(BlockDriverState *bs)
return bdrv_has_blk(bs) || QLIST_EMPTY(&bs->parents);
}
-/* Group operations. All block drivers are involved.
- * These functions will properly handle dataplane (take aio_context_acquire
- * when appropriate for appropriate block drivers) */
+/* Group operations. All block drivers are involved. */
bool bdrv_all_can_snapshot(bool has_devices, strList *devices,
Error **errp)
@@ -547,14 +545,11 @@ bool bdrv_all_can_snapshot(bool has_devices, strList *devices,
iterbdrvs = bdrvs;
while (iterbdrvs) {
BlockDriverState *bs = iterbdrvs->data;
- AioContext *ctx = bdrv_get_aio_context(bs);
bool ok = true;
- aio_context_acquire(ctx);
if (devices || bdrv_all_snapshots_includes_bs(bs)) {
ok = bdrv_can_snapshot(bs);
}
- aio_context_release(ctx);
if (!ok) {
error_setg(errp, "Device '%s' is writable but does not support "
"snapshots", bdrv_get_device_or_node_name(bs));
@@ -584,18 +579,15 @@ int bdrv_all_delete_snapshot(const char *name,
iterbdrvs = bdrvs;
while (iterbdrvs) {
BlockDriverState *bs = iterbdrvs->data;
- AioContext *ctx = bdrv_get_aio_context(bs);
QEMUSnapshotInfo sn1, *snapshot = &sn1;
int ret = 0;
- aio_context_acquire(ctx);
if ((devices || bdrv_all_snapshots_includes_bs(bs)) &&
bdrv_snapshot_find(bs, snapshot, name) >= 0)
{
ret = bdrv_snapshot_delete(bs, snapshot->id_str,
snapshot->name, errp);
}
- aio_context_release(ctx);
if (ret < 0) {
error_prepend(errp, "Could not delete snapshot '%s' on '%s': ",
name, bdrv_get_device_or_node_name(bs));
@@ -630,17 +622,14 @@ int bdrv_all_goto_snapshot(const char *name,
iterbdrvs = bdrvs;
while (iterbdrvs) {
BlockDriverState *bs = iterbdrvs->data;
- AioContext *ctx = bdrv_get_aio_context(bs);
bool all_snapshots_includes_bs;
- aio_context_acquire(ctx);
bdrv_graph_rdlock_main_loop();
all_snapshots_includes_bs = bdrv_all_snapshots_includes_bs(bs);
bdrv_graph_rdunlock_main_loop();
ret = (devices || all_snapshots_includes_bs) ?
bdrv_snapshot_goto(bs, name, errp) : 0;
- aio_context_release(ctx);
if (ret < 0) {
bdrv_graph_rdlock_main_loop();
error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
@@ -672,15 +661,12 @@ int bdrv_all_has_snapshot(const char *name,
iterbdrvs = bdrvs;
while (iterbdrvs) {
BlockDriverState *bs = iterbdrvs->data;
- AioContext *ctx = bdrv_get_aio_context(bs);
QEMUSnapshotInfo sn;
int ret = 0;
- aio_context_acquire(ctx);
if (devices || bdrv_all_snapshots_includes_bs(bs)) {
ret = bdrv_snapshot_find(bs, &sn, name);
}
- aio_context_release(ctx);
if (ret < 0) {
if (ret == -ENOENT) {
return 0;
@@ -717,10 +703,8 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
iterbdrvs = bdrvs;
while (iterbdrvs) {
BlockDriverState *bs = iterbdrvs->data;
- AioContext *ctx = bdrv_get_aio_context(bs);
int ret = 0;
- aio_context_acquire(ctx);
if (bs == vm_state_bs) {
sn->vm_state_size = vm_state_size;
ret = bdrv_snapshot_create(bs, sn);
@@ -728,7 +712,6 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
sn->vm_state_size = 0;
ret = bdrv_snapshot_create(bs, sn);
}
- aio_context_release(ctx);
if (ret < 0) {
error_setg(errp, "Could not create snapshot '%s' on '%s'",
sn->name, bdrv_get_device_or_node_name(bs));
@@ -759,13 +742,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(const char *vmstate_bs,
iterbdrvs = bdrvs;
while (iterbdrvs) {
BlockDriverState *bs = iterbdrvs->data;
- AioContext *ctx = bdrv_get_aio_context(bs);
bool found = false;
- aio_context_acquire(ctx);
found = (devices || bdrv_all_snapshots_includes_bs(bs)) &&
bdrv_can_snapshot(bs);
- aio_context_release(ctx);
if (vmstate_bs) {
if (g_str_equal(vmstate_bs,
diff --git a/block/write-threshold.c b/block/write-threshold.c
index 76d8885677..56fe88de81 100644
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -33,7 +33,6 @@ void qmp_block_set_write_threshold(const char *node_name,
Error **errp)
{
BlockDriverState *bs;
- AioContext *aio_context;
bs = bdrv_find_node(node_name);
if (!bs) {
@@ -41,12 +40,7 @@ void qmp_block_set_write_threshold(const char *node_name,
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
bdrv_write_threshold_set(bs, threshold_bytes);
-
- aio_context_release(aio_context);
}
void bdrv_write_threshold_check_write(BlockDriverState *bs, int64_t offset,
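
block/write-threshold.c is the simplest case in this series: with the
acquire/release pair gone, qmp_block_set_write_threshold() reduces to
a node lookup plus a single call. Roughly (a sketch; the error path
is elided here):

    bs = bdrv_find_node(node_name);
    if (!bs) {
        /* error_setg(...) and return, as in the unchanged context */
        return;
    }
    bdrv_write_threshold_set(bs, threshold_bytes);
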
diff --git a/blockdev.c b/blockdev.c
index 9e1381169d..5d8b3a23eb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -662,7 +662,6 @@ err_no_opts:
/* Takes the ownership of bs_opts */
BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp)
{
- BlockDriverState *bs;
int bdrv_flags = 0;
GLOBAL_STATE_CODE();
@@ -677,11 +676,7 @@ BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp)
bdrv_flags |= BDRV_O_INACTIVE;
}
- aio_context_acquire(qemu_get_aio_context());
- bs = bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp);
- aio_context_release(qemu_get_aio_context());
-
- return bs;
+ return bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp);
}
void blockdev_close_all_bdrv_states(void)
@@ -690,11 +685,7 @@ void blockdev_close_all_bdrv_states(void)
GLOBAL_STATE_CODE();
QTAILQ_FOREACH_SAFE(bs, &monitor_bdrv_states, monitor_list, next_bs) {
- AioContext *ctx = bdrv_get_aio_context(bs);
-
- aio_context_acquire(ctx);
bdrv_unref(bs);
- aio_context_release(ctx);
}
}
@@ -1048,7 +1039,6 @@ fail:
static BlockDriverState *qmp_get_root_bs(const char *name, Error **errp)
{
BlockDriverState *bs;
- AioContext *aio_context;
GRAPH_RDLOCK_GUARD_MAINLOOP();
@@ -1062,16 +1052,11 @@ static BlockDriverState *qmp_get_root_bs(const char *name, Error **errp)
return NULL;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
if (!bdrv_is_inserted(bs)) {
error_setg(errp, "Device has no medium");
bs = NULL;
}
- aio_context_release(aio_context);
-
return bs;
}
@@ -1141,7 +1126,6 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
Error **errp)
{
BlockDriverState *bs;
- AioContext *aio_context;
QEMUSnapshotInfo sn;
Error *local_err = NULL;
SnapshotInfo *info = NULL;
@@ -1154,39 +1138,35 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
if (!bs) {
return NULL;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
if (!id && !name) {
error_setg(errp, "Name or id must be provided");
- goto out_aio_context;
+ return NULL;
}
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT_DELETE, errp)) {
- goto out_aio_context;
+ return NULL;
}
ret = bdrv_snapshot_find_by_id_and_name(bs, id, name, &sn, &local_err);
if (local_err) {
error_propagate(errp, local_err);
- goto out_aio_context;
+ return NULL;
}
if (!ret) {
error_setg(errp,
"Snapshot with id '%s' and name '%s' does not exist on "
"device '%s'",
STR_OR_NULL(id), STR_OR_NULL(name), device);
- goto out_aio_context;
+ return NULL;
}
bdrv_snapshot_delete(bs, id, name, &local_err);
if (local_err) {
error_propagate(errp, local_err);
- goto out_aio_context;
+ return NULL;
}
- aio_context_release(aio_context);
-
info = g_new0(SnapshotInfo, 1);
info->id = g_strdup(sn.id_str);
info->name = g_strdup(sn.name);
@@ -1201,10 +1181,6 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
}
return info;
-
-out_aio_context:
- aio_context_release(aio_context);
- return NULL;
}
/* internal snapshot private data */
@@ -1232,7 +1208,6 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
bool ret;
int64_t rt;
InternalSnapshotState *state = g_new0(InternalSnapshotState, 1);
- AioContext *aio_context;
int ret1;
GLOBAL_STATE_CODE();
@@ -1248,33 +1223,30 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
state->bs = bs;
/* Paired with .clean() */
bdrv_drained_begin(bs);
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT, errp)) {
- goto out;
+ return;
}
if (bdrv_is_read_only(bs)) {
error_setg(errp, "Device '%s' is read only", device);
- goto out;
+ return;
}
if (!bdrv_can_snapshot(bs)) {
error_setg(errp, "Block format '%s' used by device '%s' "
"does not support internal snapshots",
bs->drv->format_name, device);
- goto out;
+ return;
}
if (!strlen(name)) {
error_setg(errp, "Name is empty");
- goto out;
+ return;
}
/* check whether a snapshot with name exist */
@@ -1282,12 +1254,12 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
- goto out;
+ return;
} else if (ret) {
error_setg(errp,
"Snapshot with name '%s' already exists on device '%s'",
name, device);
- goto out;
+ return;
}
/* 3. take the snapshot */
@@ -1308,14 +1280,11 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
error_setg_errno(errp, -ret1,
"Failed to create snapshot '%s' on device '%s'",
name, device);
- goto out;
+ return;
}
/* 4. succeed, mark a snapshot is created */
state->created = true;
-
-out:
- aio_context_release(aio_context);
}
static void internal_snapshot_abort(void *opaque)
@@ -1323,7 +1292,6 @@ static void internal_snapshot_abort(void *opaque)
InternalSnapshotState *state = opaque;
BlockDriverState *bs = state->bs;
QEMUSnapshotInfo *sn = &state->sn;
- AioContext *aio_context;
Error *local_error = NULL;
GLOBAL_STATE_CODE();
@@ -1333,9 +1301,6 @@ static void internal_snapshot_abort(void *opaque)
return;
}
- aio_context = bdrv_get_aio_context(state->bs);
- aio_context_acquire(aio_context);
-
if (bdrv_snapshot_delete(bs, sn->id_str, sn->name, &local_error) < 0) {
error_reportf_err(local_error,
"Failed to delete snapshot with id '%s' and "
@@ -1343,25 +1308,17 @@ static void internal_snapshot_abort(void *opaque)
sn->id_str, sn->name,
bdrv_get_device_name(bs));
}
-
- aio_context_release(aio_context);
}
static void internal_snapshot_clean(void *opaque)
{
g_autofree InternalSnapshotState *state = opaque;
- AioContext *aio_context;
if (!state->bs) {
return;
}
- aio_context = bdrv_get_aio_context(state->bs);
- aio_context_acquire(aio_context);
-
bdrv_drained_end(state->bs);
-
- aio_context_release(aio_context);
}
/* external snapshot private data */
@@ -1395,7 +1352,6 @@ static void external_snapshot_action(TransactionAction *action,
/* File name of the new image (for 'blockdev-snapshot-sync') */
const char *new_image_file;
ExternalSnapshotState *state = g_new0(ExternalSnapshotState, 1);
- AioContext *aio_context;
uint64_t perm, shared;
/* TODO We'll eventually have to take a writer lock in this function */
@@ -1435,26 +1391,23 @@ static void external_snapshot_action(TransactionAction *action,
return;
}
- aio_context = bdrv_get_aio_context(state->old_bs);
- aio_context_acquire(aio_context);
-
/* Paired with .clean() */
bdrv_drained_begin(state->old_bs);
if (!bdrv_is_inserted(state->old_bs)) {
error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
- goto out;
+ return;
}
if (bdrv_op_is_blocked(state->old_bs,
BLOCK_OP_TYPE_EXTERNAL_SNAPSHOT, errp)) {
- goto out;
+ return;
}
if (!bdrv_is_read_only(state->old_bs)) {
if (bdrv_flush(state->old_bs)) {
error_setg(errp, QERR_IO_ERROR);
- goto out;
+ return;
}
}
@@ -1466,13 +1419,13 @@ static void external_snapshot_action(TransactionAction *action,
if (node_name && !snapshot_node_name) {
error_setg(errp, "New overlay node-name missing");
- goto out;
+ return;
}
if (snapshot_node_name &&
bdrv_lookup_bs(snapshot_node_name, snapshot_node_name, NULL)) {
error_setg(errp, "New overlay node-name already in use");
- goto out;
+ return;
}
flags = state->old_bs->open_flags;
@@ -1485,20 +1438,18 @@ static void external_snapshot_action(TransactionAction *action,
int64_t size = bdrv_getlength(state->old_bs);
if (size < 0) {
error_setg_errno(errp, -size, "bdrv_getlength failed");
- goto out;
+ return;
}
bdrv_refresh_filename(state->old_bs);
- aio_context_release(aio_context);
bdrv_img_create(new_image_file, format,
state->old_bs->filename,
state->old_bs->drv->format_name,
NULL, size, flags, false, &local_err);
- aio_context_acquire(aio_context);
if (local_err) {
error_propagate(errp, local_err);
- goto out;
+ return;
}
}
@@ -1508,20 +1459,15 @@ static void external_snapshot_action(TransactionAction *action,
}
qdict_put_str(options, "driver", format);
}
- aio_context_release(aio_context);
- aio_context_acquire(qemu_get_aio_context());
state->new_bs = bdrv_open(new_image_file, snapshot_ref, options, flags,
errp);
- aio_context_release(qemu_get_aio_context());
/* We will manually add the backing_hd field to the bs later */
if (!state->new_bs) {
return;
}
- aio_context_acquire(aio_context);
-
/*
* Allow attaching a backing file to an overlay that's already in use only
* if the parents don't assume that they are already seeing a valid image.
@@ -1530,41 +1476,34 @@ static void external_snapshot_action(TransactionAction *action,
bdrv_get_cumulative_perm(state->new_bs, &perm, &shared);
if (perm & BLK_PERM_CONSISTENT_READ) {
error_setg(errp, "The overlay is already in use");
- goto out;
+ return;
}
if (state->new_bs->drv->is_filter) {
error_setg(errp, "Filters cannot be used as overlays");
- goto out;
+ return;
}
if (bdrv_cow_child(state->new_bs)) {
error_setg(errp, "The overlay already has a backing image");
- goto out;
+ return;
}
if (!state->new_bs->drv->supports_backing) {
error_setg(errp, "The overlay does not support backing images");
- goto out;
+ return;
}
ret = bdrv_append(state->new_bs, state->old_bs, errp);
if (ret < 0) {
- goto out;
+ return;
}
state->overlay_appended = true;
-
-out:
- aio_context_release(aio_context);
}
static void external_snapshot_commit(void *opaque)
{
ExternalSnapshotState *state = opaque;
- AioContext *aio_context;
-
- aio_context = bdrv_get_aio_context(state->old_bs);
- aio_context_acquire(aio_context);
/* We don't need (or want) to use the transactional
* bdrv_reopen_multiple() across all the entries at once, because we
@@ -1572,8 +1511,6 @@ static void external_snapshot_commit(void *opaque)
if (!qatomic_read(&state->old_bs->copy_on_read)) {
bdrv_reopen_set_read_only(state->old_bs, true, NULL);
}
-
- aio_context_release(aio_context);
}
static void external_snapshot_abort(void *opaque)
@@ -1586,7 +1523,6 @@ static void external_snapshot_abort(void *opaque)
int ret;
aio_context = bdrv_get_aio_context(state->old_bs);
- aio_context_acquire(aio_context);
bdrv_ref(state->old_bs); /* we can't let bdrv_set_backing_hd()
close state->old_bs; we need it */
@@ -1599,15 +1535,9 @@ static void external_snapshot_abort(void *opaque)
*/
tmp_context = bdrv_get_aio_context(state->old_bs);
if (aio_context != tmp_context) {
- aio_context_release(aio_context);
- aio_context_acquire(tmp_context);
-
ret = bdrv_try_change_aio_context(state->old_bs,
aio_context, NULL, NULL);
assert(ret == 0);
-
- aio_context_release(tmp_context);
- aio_context_acquire(aio_context);
}
bdrv_drained_begin(state->new_bs);
@@ -1617,8 +1547,6 @@ static void external_snapshot_abort(void *opaque)
bdrv_drained_end(state->new_bs);
bdrv_unref(state->old_bs); /* bdrv_replace_node() ref'ed old_bs */
-
- aio_context_release(aio_context);
}
}
}
@@ -1626,19 +1554,13 @@ static void external_snapshot_abort(void *opaque)
static void external_snapshot_clean(void *opaque)
{
g_autofree ExternalSnapshotState *state = opaque;
- AioContext *aio_context;
if (!state->old_bs) {
return;
}
- aio_context = bdrv_get_aio_context(state->old_bs);
- aio_context_acquire(aio_context);
-
bdrv_drained_end(state->old_bs);
bdrv_unref(state->new_bs);
-
- aio_context_release(aio_context);
}
typedef struct DriveBackupState {
@@ -1670,7 +1592,6 @@ static void drive_backup_action(DriveBackup *backup,
BlockDriverState *target_bs;
BlockDriverState *source = NULL;
AioContext *aio_context;
- AioContext *old_context;
const char *format;
QDict *options;
Error *local_err = NULL;
@@ -1698,7 +1619,6 @@ static void drive_backup_action(DriveBackup *backup,
}
aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
state->bs = bs;
/* Paired with .clean() */
@@ -1713,7 +1633,7 @@ static void drive_backup_action(DriveBackup *backup,
bdrv_graph_rdlock_main_loop();
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
bdrv_graph_rdunlock_main_loop();
- goto out;
+ return;
}
flags = bs->open_flags | BDRV_O_RDWR;
@@ -1744,7 +1664,7 @@ static void drive_backup_action(DriveBackup *backup,
size = bdrv_getlength(bs);
if (size < 0) {
error_setg_errno(errp, -size, "bdrv_getlength failed");
- goto out;
+ return;
}
if (backup->mode != NEW_IMAGE_MODE_EXISTING) {
@@ -1770,7 +1690,7 @@ static void drive_backup_action(DriveBackup *backup,
if (local_err) {
error_propagate(errp, local_err);
- goto out;
+ return;
}
options = qdict_new();
@@ -1779,30 +1699,18 @@ static void drive_backup_action(DriveBackup *backup,
if (format) {
qdict_put_str(options, "driver", format);
}
- aio_context_release(aio_context);
- aio_context_acquire(qemu_get_aio_context());
target_bs = bdrv_open(backup->target, NULL, options, flags, errp);
- aio_context_release(qemu_get_aio_context());
-
if (!target_bs) {
return;
}
- /* Honor bdrv_try_change_aio_context() context acquisition requirements. */
- old_context = bdrv_get_aio_context(target_bs);
- aio_context_acquire(old_context);
-
ret = bdrv_try_change_aio_context(target_bs, aio_context, NULL, errp);
if (ret < 0) {
bdrv_unref(target_bs);
- aio_context_release(old_context);
return;
}
- aio_context_release(old_context);
- aio_context_acquire(aio_context);
-
if (set_backing_hd) {
if (bdrv_set_backing_hd(target_bs, source, errp) < 0) {
goto unref;
@@ -1815,22 +1723,14 @@ static void drive_backup_action(DriveBackup *backup,
unref:
bdrv_unref(target_bs);
-out:
- aio_context_release(aio_context);
}
static void drive_backup_commit(void *opaque)
{
DriveBackupState *state = opaque;
- AioContext *aio_context;
-
- aio_context = bdrv_get_aio_context(state->bs);
- aio_context_acquire(aio_context);
assert(state->job);
job_start(&state->job->job);
-
- aio_context_release(aio_context);
}
static void drive_backup_abort(void *opaque)
@@ -1845,18 +1745,12 @@ static void drive_backup_abort(void *opaque)
static void drive_backup_clean(void *opaque)
{
g_autofree DriveBackupState *state = opaque;
- AioContext *aio_context;
if (!state->bs) {
return;
}
- aio_context = bdrv_get_aio_context(state->bs);
- aio_context_acquire(aio_context);
-
bdrv_drained_end(state->bs);
-
- aio_context_release(aio_context);
}
typedef struct BlockdevBackupState {
@@ -1881,7 +1775,6 @@ static void blockdev_backup_action(BlockdevBackup *backup,
BlockDriverState *bs;
BlockDriverState *target_bs;
AioContext *aio_context;
- AioContext *old_context;
int ret;
tran_add(tran, &blockdev_backup_drv, state);
@@ -1898,17 +1791,12 @@ static void blockdev_backup_action(BlockdevBackup *backup,
/* Honor bdrv_try_change_aio_context() context acquisition requirements. */
aio_context = bdrv_get_aio_context(bs);
- old_context = bdrv_get_aio_context(target_bs);
- aio_context_acquire(old_context);
ret = bdrv_try_change_aio_context(target_bs, aio_context, NULL, errp);
if (ret < 0) {
- aio_context_release(old_context);
return;
}
- aio_context_release(old_context);
- aio_context_acquire(aio_context);
state->bs = bs;
/* Paired with .clean() */
@@ -1917,22 +1805,14 @@ static void blockdev_backup_action(BlockdevBackup *backup,
state->job = do_backup_common(qapi_BlockdevBackup_base(backup),
bs, target_bs, aio_context,
block_job_txn, errp);
-
- aio_context_release(aio_context);
}
static void blockdev_backup_commit(void *opaque)
{
BlockdevBackupState *state = opaque;
- AioContext *aio_context;
-
- aio_context = bdrv_get_aio_context(state->bs);
- aio_context_acquire(aio_context);
assert(state->job);
job_start(&state->job->job);
-
- aio_context_release(aio_context);
}
static void blockdev_backup_abort(void *opaque)
@@ -1947,18 +1827,12 @@ static void blockdev_backup_abort(void *opaque)
static void blockdev_backup_clean(void *opaque)
{
g_autofree BlockdevBackupState *state = opaque;
- AioContext *aio_context;
if (!state->bs) {
return;
}
- aio_context = bdrv_get_aio_context(state->bs);
- aio_context_acquire(aio_context);
-
bdrv_drained_end(state->bs);
-
- aio_context_release(aio_context);
}
typedef struct BlockDirtyBitmapState {
@@ -2454,7 +2328,6 @@ void qmp_block_stream(const char *job_id, const char *device,
}
aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
bdrv_graph_rdlock_main_loop();
if (base) {
@@ -2521,7 +2394,7 @@ void qmp_block_stream(const char *job_id, const char *device,
if (!base_bs && backing_file) {
error_setg(errp, "backing file specified, but streaming the "
"entire chain");
- goto out;
+ return;
}
if (has_auto_finalize && !auto_finalize) {
@@ -2536,18 +2409,14 @@ void qmp_block_stream(const char *job_id, const char *device,
filter_node_name, &local_err);
if (local_err) {
error_propagate(errp, local_err);
- goto out;
+ return;
}
trace_qmp_block_stream(bs);
-
-out:
- aio_context_release(aio_context);
return;
out_rdlock:
bdrv_graph_rdunlock_main_loop();
- aio_context_release(aio_context);
}
void qmp_block_commit(const char *job_id, const char *device,
@@ -2606,10 +2475,9 @@ void qmp_block_commit(const char *job_id, const char *device,
}
aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, errp)) {
- goto out;
+ return;
}
/* default top_bs is the active layer */
@@ -2617,16 +2485,16 @@ void qmp_block_commit(const char *job_id, const char *device,
if (top_node && top) {
error_setg(errp, "'top-node' and 'top' are mutually exclusive");
- goto out;
+ return;
} else if (top_node) {
top_bs = bdrv_lookup_bs(NULL, top_node, errp);
if (top_bs == NULL) {
- goto out;
+ return;
}
if (!bdrv_chain_contains(bs, top_bs)) {
error_setg(errp, "'%s' is not in this backing file chain",
top_node);
- goto out;
+ return;
}
} else if (top) {
/* This strcmp() is just a shortcut, there is no need to
@@ -2640,35 +2508,35 @@ void qmp_block_commit(const char *job_id, const char *device,
if (top_bs == NULL) {
error_setg(errp, "Top image file %s not found", top ? top : "NULL");
- goto out;
+ return;
}
assert(bdrv_get_aio_context(top_bs) == aio_context);
if (base_node && base) {
error_setg(errp, "'base-node' and 'base' are mutually exclusive");
- goto out;
+ return;
} else if (base_node) {
base_bs = bdrv_lookup_bs(NULL, base_node, errp);
if (base_bs == NULL) {
- goto out;
+ return;
}
if (!bdrv_chain_contains(top_bs, base_bs)) {
error_setg(errp, "'%s' is not in this backing file chain",
base_node);
- goto out;
+ return;
}
} else if (base) {
base_bs = bdrv_find_backing_image(top_bs, base);
if (base_bs == NULL) {
error_setg(errp, "Can't find '%s' in the backing chain", base);
- goto out;
+ return;
}
} else {
base_bs = bdrv_find_base(top_bs);
if (base_bs == NULL) {
error_setg(errp, "There is no backimg image");
- goto out;
+ return;
}
}
@@ -2678,14 +2546,14 @@ void qmp_block_commit(const char *job_id, const char *device,
iter = bdrv_filter_or_cow_bs(iter))
{
if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
- goto out;
+ return;
}
}
/* Do not allow attempts to commit an image into itself */
if (top_bs == base_bs) {
error_setg(errp, "cannot commit an image into itself");
- goto out;
+ return;
}
/*
@@ -2708,7 +2576,7 @@ void qmp_block_commit(const char *job_id, const char *device,
error_setg(errp, "'backing-file' specified, but 'top' has a "
"writer on it");
}
- goto out;
+ return;
}
if (!job_id) {
/*
@@ -2724,7 +2592,7 @@ void qmp_block_commit(const char *job_id, const char *device,
} else {
BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
if (bdrv_op_is_blocked(overlay_bs, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
- goto out;
+ return;
}
commit_start(job_id, bs, base_bs, top_bs, job_flags,
speed, on_error, backing_file,
@@ -2732,11 +2600,8 @@ void qmp_block_commit(const char *job_id, const char *device,
}
if (local_err != NULL) {
error_propagate(errp, local_err);
- goto out;
+ return;
}
-
-out:
- aio_context_release(aio_context);
}
/* Common QMP interface for drive-backup and blockdev-backup */
@@ -2985,8 +2850,6 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
if (replaces) {
BlockDriverState *to_replace_bs;
- AioContext *aio_context;
- AioContext *replace_aio_context;
int64_t bs_size, replace_size;
bs_size = bdrv_getlength(bs);
@@ -3000,19 +2863,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
return;
}
- aio_context = bdrv_get_aio_context(bs);
- replace_aio_context = bdrv_get_aio_context(to_replace_bs);
- /*
- * bdrv_getlength() is a co-wrapper and uses AIO_WAIT_WHILE. Be sure not
- * to acquire the same AioContext twice.
- */
- if (replace_aio_context != aio_context) {
- aio_context_acquire(replace_aio_context);
- }
replace_size = bdrv_getlength(to_replace_bs);
- if (replace_aio_context != aio_context) {
- aio_context_release(replace_aio_context);
- }
if (replace_size < 0) {
error_setg_errno(errp, -replace_size,
@@ -3041,7 +2892,6 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
BlockDriverState *bs;
BlockDriverState *target_backing_bs, *target_bs;
AioContext *aio_context;
- AioContext *old_context;
BlockMirrorBackingMode backing_mode;
Error *local_err = NULL;
QDict *options = NULL;
@@ -3064,7 +2914,6 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
}
aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
if (!arg->has_mode) {
arg->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
@@ -3088,14 +2937,14 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
size = bdrv_getlength(bs);
if (size < 0) {
error_setg_errno(errp, -size, "bdrv_getlength failed");
- goto out;
+ return;
}
if (arg->replaces) {
if (!arg->node_name) {
error_setg(errp, "a node-name must be provided when replacing a"
" named node of the graph");
- goto out;
+ return;
}
}
@@ -3143,7 +2992,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
if (local_err) {
error_propagate(errp, local_err);
- goto out;
+ return;
}
options = qdict_new();
@@ -3153,15 +3002,11 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
if (format) {
qdict_put_str(options, "driver", format);
}
- aio_context_release(aio_context);
/* Mirroring takes care of copy-on-write using the source's backing
* file.
*/
- aio_context_acquire(qemu_get_aio_context());
target_bs = bdrv_open(arg->target, NULL, options, flags, errp);
- aio_context_release(qemu_get_aio_context());
-
if (!target_bs) {
return;
}
@@ -3173,20 +3018,12 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
bdrv_graph_rdunlock_main_loop();
- /* Honor bdrv_try_change_aio_context() context acquisition requirements. */
- old_context = bdrv_get_aio_context(target_bs);
- aio_context_acquire(old_context);
-
ret = bdrv_try_change_aio_context(target_bs, aio_context, NULL, errp);
if (ret < 0) {
bdrv_unref(target_bs);
- aio_context_release(old_context);
return;
}
- aio_context_release(old_context);
- aio_context_acquire(aio_context);
-
blockdev_mirror_common(arg->job_id, bs, target_bs,
arg->replaces, arg->sync,
backing_mode, zero_target,
@@ -3202,8 +3039,6 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
arg->has_auto_dismiss, arg->auto_dismiss,
errp);
bdrv_unref(target_bs);
-out:
- aio_context_release(aio_context);
}
void qmp_blockdev_mirror(const char *job_id,
@@ -3226,7 +3061,6 @@ void qmp_blockdev_mirror(const char *job_id,
BlockDriverState *bs;
BlockDriverState *target_bs;
AioContext *aio_context;
- AioContext *old_context;
BlockMirrorBackingMode backing_mode = MIRROR_LEAVE_BACKING_CHAIN;
bool zero_target;
int ret;
@@ -3243,18 +3077,11 @@ void qmp_blockdev_mirror(const char *job_id,
zero_target = (sync == MIRROR_SYNC_MODE_FULL);
- /* Honor bdrv_try_change_aio_context() context acquisition requirements. */
- old_context = bdrv_get_aio_context(target_bs);
aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(old_context);
ret = bdrv_try_change_aio_context(target_bs, aio_context, NULL, errp);
-
- aio_context_release(old_context);
- aio_context_acquire(aio_context);
-
if (ret < 0) {
- goto out;
+ return;
}
blockdev_mirror_common(job_id, bs, target_bs,
@@ -3269,8 +3096,6 @@ void qmp_blockdev_mirror(const char *job_id,
has_auto_finalize, auto_finalize,
has_auto_dismiss, auto_dismiss,
errp);
-out:
- aio_context_release(aio_context);
}
/*
@@ -3433,7 +3258,6 @@ void qmp_change_backing_file(const char *device,
Error **errp)
{
BlockDriverState *bs = NULL;
- AioContext *aio_context;
BlockDriverState *image_bs = NULL;
Error *local_err = NULL;
bool ro;
@@ -3444,9 +3268,6 @@ void qmp_change_backing_file(const char *device,
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
-
bdrv_graph_rdlock_main_loop();
image_bs = bdrv_lookup_bs(NULL, image_node_name, &local_err);
@@ -3485,7 +3306,7 @@ void qmp_change_backing_file(const char *device,
if (ro) {
if (bdrv_reopen_set_read_only(image_bs, false, errp) != 0) {
- goto out;
+ return;
}
}
@@ -3503,14 +3324,10 @@ void qmp_change_backing_file(const char *device,
if (ro) {
bdrv_reopen_set_read_only(image_bs, true, errp);
}
-
-out:
- aio_context_release(aio_context);
return;
out_rdlock:
bdrv_graph_rdunlock_main_loop();
- aio_context_release(aio_context);
}
void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
@@ -3550,7 +3367,6 @@ void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
for (; reopen_list != NULL; reopen_list = reopen_list->next) {
BlockdevOptions *options = reopen_list->value;
BlockDriverState *bs;
- AioContext *ctx;
QObject *obj;
Visitor *v;
QDict *qdict;
@@ -3578,12 +3394,7 @@ void qmp_blockdev_reopen(BlockdevOptionsList *reopen_list, Error **errp)
qdict_flatten(qdict);
- ctx = bdrv_get_aio_context(bs);
- aio_context_acquire(ctx);
-
queue = bdrv_reopen_queue(queue, bs, qdict, false);
-
- aio_context_release(ctx);
}
/* Perform the reopen operation */
@@ -3596,7 +3407,6 @@ fail:
void qmp_blockdev_del(const char *node_name, Error **errp)
{
- AioContext *aio_context;
BlockDriverState *bs;
GLOBAL_STATE_CODE();
@@ -3611,30 +3421,25 @@ void qmp_blockdev_del(const char *node_name, Error **errp)
error_setg(errp, "Node %s is in use", node_name);
return;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_DRIVE_DEL, errp)) {
- goto out;
+ return;
}
if (!QTAILQ_IN_USE(bs, monitor_list)) {
error_setg(errp, "Node %s is not owned by the monitor",
bs->node_name);
- goto out;
+ return;
}
if (bs->refcnt > 1) {
error_setg(errp, "Block device %s is in use",
bdrv_get_device_or_node_name(bs));
- goto out;
+ return;
}
QTAILQ_REMOVE(&monitor_bdrv_states, bs, monitor_list);
bdrv_unref(bs);
-
-out:
- aio_context_release(aio_context);
}
static BdrvChild * GRAPH_RDLOCK
@@ -3724,7 +3529,6 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
bool has_force, bool force, Error **errp)
{
- AioContext *old_context;
AioContext *new_context;
BlockDriverState *bs;
@@ -3756,12 +3560,7 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
new_context = qemu_get_aio_context();
}
- old_context = bdrv_get_aio_context(bs);
- aio_context_acquire(old_context);
-
bdrv_try_change_aio_context(bs, new_context, NULL, errp);
-
- aio_context_release(old_context);
}
QemuOptsList qemu_common_drive_opts = {
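
Two recurring blockdev.c patterns disappear in the hunks above.
First, bdrv_open() no longer needs the main loop AioContext held
around it; second, bdrv_try_change_aio_context() no longer requires
acquiring the node's old context first (the "Honor
bdrv_try_change_aio_context() context acquisition requirements"
comments go away together with the locking). Sketch of the first
pattern (illustrative only):

    /* Before: */
    aio_context_acquire(qemu_get_aio_context());
    bs = bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp);
    aio_context_release(qemu_get_aio_context());

    /* After: a single call, no context bracketing. */
    bs = bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp);
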
diff --git a/blockjob.c b/blockjob.c
index 7310412313..d5f29e14af 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,9 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
* one to make sure that such a concurrent access does not attempt
* to process an already freed BdrvChild.
*/
- aio_context_release(job->job.aio_context);
bdrv_graph_wrlock();
- aio_context_acquire(job->job.aio_context);
while (job->nodes) {
GSList *l = job->nodes;
BdrvChild *c = l->data;
@@ -234,28 +232,12 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
uint64_t perm, uint64_t shared_perm, Error **errp)
{
BdrvChild *c;
- AioContext *ctx = bdrv_get_aio_context(bs);
- bool need_context_ops;
GLOBAL_STATE_CODE();
bdrv_ref(bs);
- need_context_ops = ctx != job->job.aio_context;
-
- if (need_context_ops) {
- if (job->job.aio_context != qemu_get_aio_context()) {
- aio_context_release(job->job.aio_context);
- }
- aio_context_acquire(ctx);
- }
c = bdrv_root_attach_child(bs, name, &child_job, 0, perm, shared_perm, job,
errp);
- if (need_context_ops) {
- aio_context_release(ctx);
- if (job->job.aio_context != qemu_get_aio_context()) {
- aio_context_acquire(job->job.aio_context);
- }
- }
if (c == NULL) {
return -EPERM;
}
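
blockjob.c loses the cross-context handover in block_job_add_bdrv():
previously, attaching a node that lived in a different AioContext
meant releasing the job's context, acquiring the node's, and swapping
back afterwards. Sketch of the removed choreography (illustrative
only):

    /* Before: */
    if (ctx != job->job.aio_context) {
        aio_context_release(job->job.aio_context);
        aio_context_acquire(ctx);
    }
    c = bdrv_root_attach_child(bs, name, &child_job, 0,
                               perm, shared_perm, job, errp);
    /* ...then swap the contexts back again. */

    /* After: the attach is one call with no context juggling. */
    c = bdrv_root_attach_child(bs, name, &child_job, 0,
                               perm, shared_perm, job, errp);
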
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index f83bb0f116..7bbbd981ad 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -124,7 +124,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
VirtIOBlockDataPlane *s = vblk->dataplane;
BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vblk)));
VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
- AioContext *old_context;
unsigned i;
unsigned nvqs = s->conf->num_queues;
Error *local_err = NULL;
@@ -178,10 +177,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
trace_virtio_blk_data_plane_start(s);
- old_context = blk_get_aio_context(s->conf->conf.blk);
- aio_context_acquire(old_context);
r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
- aio_context_release(old_context);
if (r < 0) {
error_report_err(local_err);
goto fail_aio_context;
@@ -208,13 +204,11 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
/* Get this show started by hooking up our callbacks */
if (!blk_in_drain(s->conf->conf.blk)) {
- aio_context_acquire(s->ctx);
for (i = 0; i < nvqs; i++) {
VirtQueue *vq = virtio_get_queue(s->vdev, i);
virtio_queue_aio_attach_host_notifier(vq, s->ctx);
}
- aio_context_release(s->ctx);
}
return 0;
@@ -314,8 +308,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
*/
vblk->dataplane_started = false;
- aio_context_acquire(s->ctx);
-
/* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
blk_drain(s->conf->conf.blk);
@@ -325,8 +317,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
*/
blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context(), NULL);
- aio_context_release(s->ctx);
-
/* Clean up guest notifier (irq) */
k->set_guest_notifiers(qbus->parent, nvqs, false);
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index c4bb28c66f..98501e6885 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -260,8 +260,6 @@ static void xen_block_complete_aio(void *opaque, int ret)
XenBlockRequest *request = opaque;
XenBlockDataPlane *dataplane = request->dataplane;
- aio_context_acquire(dataplane->ctx);
-
if (ret != 0) {
error_report("%s I/O error",
request->req.operation == BLKIF_OP_READ ?
@@ -273,10 +271,10 @@ static void xen_block_complete_aio(void *opaque, int ret)
if (request->presync) {
request->presync = 0;
xen_block_do_aio(request);
- goto done;
+ return;
}
if (request->aio_inflight > 0) {
- goto done;
+ return;
}
switch (request->req.operation) {
@@ -318,9 +316,6 @@ static void xen_block_complete_aio(void *opaque, int ret)
if (dataplane->more_work) {
qemu_bh_schedule(dataplane->bh);
}
-
-done:
- aio_context_release(dataplane->ctx);
}
static bool xen_block_split_discard(XenBlockRequest *request,
@@ -601,9 +596,7 @@ static void xen_block_dataplane_bh(void *opaque)
{
XenBlockDataPlane *dataplane = opaque;
- aio_context_acquire(dataplane->ctx);
xen_block_handle_requests(dataplane);
- aio_context_release(dataplane->ctx);
}
static bool xen_block_dataplane_event(void *opaque)
@@ -703,10 +696,8 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
xen_block_dataplane_detach(dataplane);
}
- aio_context_acquire(dataplane->ctx);
/* Xen doesn't have multiple users for nodes, so this can't fail */
blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), &error_abort);
- aio_context_release(dataplane->ctx);
/*
* Now that the context has been moved onto the main thread, cancel
@@ -752,7 +743,6 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
{
ERRP_GUARD();
XenDevice *xendev = dataplane->xendev;
- AioContext *old_context;
unsigned int ring_size;
unsigned int i;
@@ -836,11 +826,8 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
goto stop;
}
- old_context = blk_get_aio_context(dataplane->blk);
- aio_context_acquire(old_context);
/* If other users keep the BlockBackend in the iothread, that's ok */
blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL);
- aio_context_release(old_context);
if (!blk_in_drain(dataplane->blk)) {
xen_block_dataplane_attach(dataplane);
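
The dataplane hunks drop the rule that blk_set_aio_context() must be
called with the BlockBackend's current AioContext held. Sketch, using
the hypothetical names blk and new_ctx (illustrative only):

    /* Before: */
    AioContext *old_context = blk_get_aio_context(blk);
    aio_context_acquire(old_context);
    blk_set_aio_context(blk, new_ctx, errp);
    aio_context_release(old_context);

    /* After: */
    blk_set_aio_context(blk, new_ctx, errp);
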
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e110f9718b..ec9ed09a6a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1210,17 +1210,13 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool running,
static void virtio_blk_reset(VirtIODevice *vdev)
{
VirtIOBlock *s = VIRTIO_BLK(vdev);
- AioContext *ctx;
VirtIOBlockReq *req;
/* Dataplane has stopped... */
assert(!s->dataplane_started);
/* ...but requests may still be in flight. */
- ctx = blk_get_aio_context(s->blk);
- aio_context_acquire(ctx);
blk_drain(s->blk);
- aio_context_release(ctx);
/* We drop queued requests after blk_drain() because blk_drain() itself can
* produce them. */
@@ -1250,10 +1246,6 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
uint64_t capacity;
int64_t length;
int blk_size = conf->logical_block_size;
- AioContext *ctx;
-
- ctx = blk_get_aio_context(s->blk);
- aio_context_acquire(ctx);
blk_get_geometry(s->blk, &capacity);
memset(&blkcfg, 0, sizeof(blkcfg));
@@ -1277,7 +1269,6 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
* per track (cylinder).
*/
length = blk_getlength(s->blk);
- aio_context_release(ctx);
if (length > 0 && length / conf->heads / conf->secs % blk_size) {
blkcfg.geometry.sectors = conf->secs & ~s->sector_mask;
} else {
@@ -1344,9 +1335,7 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
memcpy(&blkcfg, config, s->config_size);
- aio_context_acquire(blk_get_aio_context(s->blk));
blk_set_enable_write_cache(s->blk, blkcfg.wce != 0);
- aio_context_release(blk_get_aio_context(s->blk));
}
static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
@@ -1414,11 +1403,9 @@ static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
* s->blk would erroneously be placed in writethrough mode.
*/
if (!virtio_vdev_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) {
- aio_context_acquire(blk_get_aio_context(s->blk));
blk_set_enable_write_cache(s->blk,
virtio_vdev_has_feature(vdev,
VIRTIO_BLK_F_WCE));
- aio_context_release(blk_get_aio_context(s->blk));
}
}
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 1473ab3d5e..73cced4626 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -120,9 +120,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const char *name,
"node");
}
- aio_context_acquire(ctx);
blk_replace_bs(blk, bs, errp);
- aio_context_release(ctx);
return;
}
@@ -148,10 +146,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const char *name,
0, BLK_PERM_ALL);
blk_created = true;
- aio_context_acquire(ctx);
ret = blk_insert_bs(blk, bs, errp);
- aio_context_release(ctx);
-
if (ret < 0) {
goto fail;
}
@@ -207,12 +202,8 @@ static void release_drive(Object *obj, const char *name, void *opaque)
BlockBackend **ptr = object_field_prop_ptr(obj, prop);
if (*ptr) {
- AioContext *ctx = blk_get_aio_context(*ptr);
-
- aio_context_acquire(ctx);
blockdev_auto_del(*ptr);
blk_detach_dev(*ptr, dev);
- aio_context_release(ctx);
}
}
diff --git a/job.c b/job.c
index 99a2e54b54..660ce22c56 100644
--- a/job.c
+++ b/job.c
@@ -464,12 +464,8 @@ void job_unref_locked(Job *job)
assert(!job->txn);
if (job->driver->free) {
- AioContext *aio_context = job->aio_context;
job_unlock();
- /* FIXME: aiocontext lock is required because cb calls blk_unref */
- aio_context_acquire(aio_context);
job->driver->free(job);
- aio_context_release(aio_context);
job_lock();
}
@@ -840,12 +836,10 @@ static void job_clean(Job *job)
/*
* Called with job_mutex held, but releases it temporarily.
- * Takes AioContext lock internally to invoke a job->driver callback.
*/
static int job_finalize_single_locked(Job *job)
{
int job_ret;
- AioContext *ctx = job->aio_context;
assert(job_is_completed_locked(job));
@@ -854,7 +848,6 @@ static int job_finalize_single_locked(Job *job)
job_ret = job->ret;
job_unlock();
- aio_context_acquire(ctx);
if (!job_ret) {
job_commit(job);
@@ -867,7 +860,6 @@ static int job_finalize_single_locked(Job *job)
job->cb(job->opaque, job_ret);
}
- aio_context_release(ctx);
job_lock();
/* Emit events only if we actually started */
@@ -886,17 +878,13 @@ static int job_finalize_single_locked(Job *job)
/*
* Called with job_mutex held, but releases it temporarily.
- * Takes AioContext lock internally to invoke a job->driver callback.
*/
static void job_cancel_async_locked(Job *job, bool force)
{
- AioContext *ctx = job->aio_context;
GLOBAL_STATE_CODE();
if (job->driver->cancel) {
job_unlock();
- aio_context_acquire(ctx);
force = job->driver->cancel(job, force);
- aio_context_release(ctx);
job_lock();
} else {
/* No .cancel() means the job will behave as if force-cancelled */
@@ -931,7 +919,6 @@ static void job_cancel_async_locked(Job *job, bool force)
/*
* Called with job_mutex held, but releases it temporarily.
- * Takes AioContext lock internally to invoke a job->driver callback.
*/
static void job_completed_txn_abort_locked(Job *job)
{
@@ -979,15 +966,12 @@ static void job_completed_txn_abort_locked(Job *job)
static int job_prepare_locked(Job *job)
{
int ret;
- AioContext *ctx = job->aio_context;
GLOBAL_STATE_CODE();
if (job->ret == 0 && job->driver->prepare) {
job_unlock();
- aio_context_acquire(ctx);
ret = job->driver->prepare(job);
- aio_context_release(ctx);
job_lock();
job->ret = ret;
job_update_rc_locked(job);
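
job.c keeps only the job_mutex choreography around driver callbacks;
the AioContext acquire/release that used to wrap them is gone, as are
the "Takes AioContext lock internally" comments. Sketch of the
callback invocation in job_prepare_locked() after this change
(illustrative only):

    job_unlock();
    ret = job->driver->prepare(job);  /* no aio_context_acquire() */
    job_lock();
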
diff --git a/migration/block.c b/migration/block.c
index a15f9bddcb..6ec6a1d6e6 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -66,7 +66,7 @@ typedef struct BlkMigDevState {
/* Protected by block migration lock. */
int64_t completed_sectors;
- /* During migration this is protected by iothread lock / AioContext.
+ /* During migration this is protected by bdrv_dirty_bitmap_lock().
* Allocation and free happen during setup and cleanup respectively.
*/
BdrvDirtyBitmap *dirty_bitmap;
@@ -101,7 +101,7 @@ typedef struct BlkMigState {
int prev_progress;
int bulk_completed;
- /* Lock must be taken _inside_ the iothread lock and any AioContexts. */
+ /* Lock must be taken _inside_ the iothread lock. */
QemuMutex lock;
} BlkMigState;
@@ -270,7 +270,6 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
if (bmds->shared_base) {
qemu_mutex_lock_iothread();
- aio_context_acquire(blk_get_aio_context(bb));
/* Skip unallocated sectors; intentionally treats failure or
* partial sector as an allocated sector */
while (cur_sector < total_sectors &&
@@ -281,7 +280,6 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
}
cur_sector += count >> BDRV_SECTOR_BITS;
}
- aio_context_release(blk_get_aio_context(bb));
qemu_mutex_unlock_iothread();
}
@@ -313,21 +311,16 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
block_mig_state.submitted++;
blk_mig_unlock();
- /* We do not know if bs is under the main thread (and thus does
- * not acquire the AioContext when doing AIO) or rather under
- * dataplane. Thus acquire both the iothread mutex and the
- * AioContext.
- *
- * This is ugly and will disappear when we make bdrv_* thread-safe,
- * without the need to acquire the AioContext.
+ /*
+ * The migration thread does not have an AioContext. Lock the BQL so that
+ * I/O runs in the main loop AioContext (see
+ * qemu_get_current_aio_context()).
*/
qemu_mutex_lock_iothread();
- aio_context_acquire(blk_get_aio_context(bmds->blk));
bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, cur_sector * BDRV_SECTOR_SIZE,
nr_sectors * BDRV_SECTOR_SIZE);
blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, &blk->qiov,
0, blk_mig_read_cb, blk);
- aio_context_release(blk_get_aio_context(bmds->blk));
qemu_mutex_unlock_iothread();
bmds->cur_sector = cur_sector + nr_sectors;
@@ -512,7 +505,7 @@ static void blk_mig_reset_dirty_cursor(void)
}
}
-/* Called with iothread lock and AioContext taken. */
+/* Called with iothread lock taken. */
static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
int is_async)
@@ -606,9 +599,7 @@ static int blk_mig_save_dirty_block(QEMUFile *f, int is_async)
int ret = 1;
QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
- aio_context_acquire(blk_get_aio_context(bmds->blk));
ret = mig_save_device_dirty(f, bmds, is_async);
- aio_context_release(blk_get_aio_context(bmds->blk));
if (ret <= 0) {
break;
}
@@ -666,9 +657,9 @@ static int64_t get_remaining_dirty(void)
int64_t dirty = 0;
QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
- aio_context_acquire(blk_get_aio_context(bmds->blk));
+ bdrv_dirty_bitmap_lock(bmds->dirty_bitmap);
dirty += bdrv_get_dirty_count(bmds->dirty_bitmap);
- aio_context_release(blk_get_aio_context(bmds->blk));
+ bdrv_dirty_bitmap_unlock(bmds->dirty_bitmap);
}
return dirty;
@@ -681,7 +672,6 @@ static void block_migration_cleanup_bmds(void)
{
BlkMigDevState *bmds;
BlockDriverState *bs;
- AioContext *ctx;
unset_dirty_tracking();
@@ -693,13 +683,7 @@ static void block_migration_cleanup_bmds(void)
bdrv_op_unblock_all(bs, bmds->blocker);
}
error_free(bmds->blocker);
-
- /* Save ctx, because bmds->blk can disappear during blk_unref. */
- ctx = blk_get_aio_context(bmds->blk);
- aio_context_acquire(ctx);
blk_unref(bmds->blk);
- aio_context_release(ctx);
-
g_free(bmds->blk_name);
g_free(bmds->aio_bitmap);
g_free(bmds);
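
migration/block.c shows the other half of the story: where the
AioContext lock actually protected data, it is replaced by a
finer-grained lock rather than simply deleted. As in the
get_remaining_dirty() hunk above, the dirty bitmap is now guarded by
its own lock:

    bdrv_dirty_bitmap_lock(bmds->dirty_bitmap);
    dirty += bdrv_get_dirty_count(bmds->dirty_bitmap);
    bdrv_dirty_bitmap_unlock(bmds->dirty_bitmap);
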
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 86ae832176..99710c8ffb 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -852,14 +852,11 @@ static void vm_completion(ReadLineState *rs, const char *str)
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
SnapshotInfoList *snapshots, *snapshot;
- AioContext *ctx = bdrv_get_aio_context(bs);
bool ok = false;
- aio_context_acquire(ctx);
if (bdrv_can_snapshot(bs)) {
ok = bdrv_query_snapshot_info_list(bs, &snapshots, NULL) == 0;
}
- aio_context_release(ctx);
if (!ok) {
continue;
}
diff --git a/migration/savevm.c b/migration/savevm.c
index eec5503a42..1b9ab7b8ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3049,7 +3049,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
int saved_vm_running;
uint64_t vm_state_size;
g_autoptr(GDateTime) now = g_date_time_new_now_local();
- AioContext *aio_context;
GLOBAL_STATE_CODE();
@@ -3092,7 +3091,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
if (bs == NULL) {
return false;
}
- aio_context = bdrv_get_aio_context(bs);
saved_vm_running = runstate_is_running();
@@ -3101,8 +3099,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
bdrv_drain_all_begin();
- aio_context_acquire(aio_context);
-
memset(sn, 0, sizeof(*sn));
/* fill auxiliary fields */
@@ -3139,14 +3135,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
goto the_end;
}
- /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
- * for itself. BDRV_POLL_WHILE() does not support nested locking because
- * it only releases the lock once. Therefore synchronous I/O will deadlock
- * unless we release the AioContext before bdrv_all_create_snapshot().
- */
- aio_context_release(aio_context);
- aio_context = NULL;
-
ret = bdrv_all_create_snapshot(sn, bs, vm_state_size,
has_devices, devices, errp);
if (ret < 0) {
@@ -3157,10 +3145,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
ret = 0;
the_end:
- if (aio_context) {
- aio_context_release(aio_context);
- }
-
bdrv_drain_all_end();
if (saved_vm_running) {
@@ -3258,7 +3242,6 @@ bool load_snapshot(const char *name, const char *vmstate,
QEMUSnapshotInfo sn;
QEMUFile *f;
int ret;
- AioContext *aio_context;
MigrationIncomingState *mis = migration_incoming_get_current();
if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
@@ -3278,12 +3261,9 @@ bool load_snapshot(const char *name, const char *vmstate,
if (!bs_vm_state) {
return false;
}
- aio_context = bdrv_get_aio_context(bs_vm_state);
/* Don't even try to load empty VM states */
- aio_context_acquire(aio_context);
ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
- aio_context_release(aio_context);
if (ret < 0) {
return false;
} else if (sn.vm_state_size == 0) {
@@ -3320,10 +3300,8 @@ bool load_snapshot(const char *name, const char *vmstate,
ret = -EINVAL;
goto err_drain;
}
- aio_context_acquire(aio_context);
ret = qemu_loadvm_state(f);
migration_incoming_state_destroy();
- aio_context_release(aio_context);
bdrv_drain_all_end();
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 7f9e6f89ce..f2dfc0ebdc 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1439,12 +1439,10 @@ static void colo_compare_finalize(Object *obj)
qemu_bh_delete(s->event_bh);
AioContext *ctx = iothread_get_aio_context(s->iothread);
- aio_context_acquire(ctx);
AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
if (s->notify_dev) {
AIO_WAIT_WHILE(ctx, !s->notify_sendco.done);
}
- aio_context_release(ctx);
/* Release all unhandled packets after compare thead exited */
g_queue_foreach(&s->conn_list, colo_flush_packets, s);
diff --git a/qemu-img.c b/qemu-img.c
index 5a77f67719..7668f86769 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -960,7 +960,6 @@ static int img_commit(int argc, char **argv)
Error *local_err = NULL;
CommonBlockJobCBInfo cbi;
bool image_opts = false;
- AioContext *aio_context;
int64_t rate_limit = 0;
fmt = NULL;
@@ -1078,12 +1077,9 @@ static int img_commit(int argc, char **argv)
.bs = bs,
};
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
commit_active_start("commit", bs, base_bs, JOB_DEFAULT, rate_limit,
BLOCKDEV_ON_ERROR_REPORT, NULL, common_block_job_cb,
&cbi, false, &local_err);
- aio_context_release(aio_context);
if (local_err) {
goto done;
}
diff --git a/qemu-io.c b/qemu-io.c
index 050c70835f..6cb1e00385 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -414,15 +414,7 @@ static void prep_fetchline(void *opaque)
static int do_qemuio_command(const char *cmd)
{
- int ret;
- AioContext *ctx =
- qemuio_blk ? blk_get_aio_context(qemuio_blk) : qemu_get_aio_context();
-
- aio_context_acquire(ctx);
- ret = qemuio_command(qemuio_blk, cmd);
- aio_context_release(ctx);
-
- return ret;
+ return qemuio_command(qemuio_blk, cmd);
}
static int command_loop(void)
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 186e6468b1..bac0b5e3ec 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -1123,9 +1123,7 @@ int main(int argc, char **argv)
qdict_put_str(raw_opts, "file", bs->node_name);
qdict_put_int(raw_opts, "offset", dev_offset);
- aio_context_acquire(qemu_get_aio_context());
bs = bdrv_open(NULL, NULL, raw_opts, flags, &error_fatal);
- aio_context_release(qemu_get_aio_context());
blk_remove_bs(blk);
blk_insert_bs(blk, bs, &error_fatal);
diff --git a/replay/replay-debugging.c b/replay/replay-debugging.c
index 3e60549a4a..82c66fff26 100644
--- a/replay/replay-debugging.c
+++ b/replay/replay-debugging.c
@@ -144,7 +144,6 @@ static char *replay_find_nearest_snapshot(int64_t icount,
char *ret = NULL;
int rv;
int nb_sns, i;
- AioContext *aio_context;
*snapshot_icount = -1;
@@ -152,11 +151,8 @@ static char *replay_find_nearest_snapshot(int64_t icount,
if (!bs) {
goto fail;
}
- aio_context = bdrv_get_aio_context(bs);
- aio_context_acquire(aio_context);
nb_sns = bdrv_snapshot_list(bs, &sn_tab);
- aio_context_release(aio_context);
for (i = 0; i < nb_sns; i++) {
rv = bdrv_all_has_snapshot(sn_tab[i].name, false, NULL, NULL);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index d9754dfebc..17830a69c1 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -179,13 +179,7 @@ static void do_drain_end(enum drain_type drain_type, BlockDriverState *bs)
static void do_drain_begin_unlocked(enum drain_type drain_type, BlockDriverState *bs)
{
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_acquire(bdrv_get_aio_context(bs));
- }
do_drain_begin(drain_type, bs);
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_release(bdrv_get_aio_context(bs));
- }
}
static BlockBackend * no_coroutine_fn test_setup(void)
@@ -209,13 +203,7 @@ static BlockBackend * no_coroutine_fn test_setup(void)
static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *bs)
{
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_acquire(bdrv_get_aio_context(bs));
- }
do_drain_end(drain_type, bs);
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_release(bdrv_get_aio_context(bs));
- }
}
/*
@@ -520,12 +508,8 @@ static void test_iothread_main_thread_bh(void *opaque)
{
struct test_iothread_data *data = opaque;
- /* Test that the AioContext is not yet locked in a random BH that is
- * executed during drain, otherwise this would deadlock. */
- aio_context_acquire(bdrv_get_aio_context(data->bs));
bdrv_flush(data->bs);
bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */
- aio_context_release(bdrv_get_aio_context(data->bs));
}
/*
@@ -567,7 +551,6 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
blk_set_disable_request_queuing(blk, true);
blk_set_aio_context(blk, ctx_a, &error_abort);
- aio_context_acquire(ctx_a);
s->bh_indirection_ctx = ctx_b;
@@ -582,8 +565,6 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
g_assert(acb != NULL);
g_assert_cmpint(aio_ret, ==, -EINPROGRESS);
- aio_context_release(ctx_a);
-
data = (struct test_iothread_data) {
.bs = bs,
.drain_type = drain_type,
@@ -592,10 +573,6 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
switch (drain_thread) {
case 0:
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_acquire(ctx_a);
- }
-
/*
* Increment in_flight so that do_drain_begin() waits for
* test_iothread_main_thread_bh(). This prevents the race between
@@ -613,20 +590,10 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
do_drain_begin(drain_type, bs);
g_assert_cmpint(bs->in_flight, ==, 0);
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_release(ctx_a);
- }
qemu_event_wait(&done_event);
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_acquire(ctx_a);
- }
g_assert_cmpint(aio_ret, ==, 0);
do_drain_end(drain_type, bs);
-
- if (drain_type != BDRV_DRAIN_ALL) {
- aio_context_release(ctx_a);
- }
break;
case 1:
co = qemu_coroutine_create(test_iothread_drain_co_entry, &data);
@@ -637,9 +604,7 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
g_assert_not_reached();
}
- aio_context_acquire(ctx_a);
blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
- aio_context_release(ctx_a);
bdrv_unref(bs);
blk_unref(blk);
@@ -757,7 +722,6 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
BlockJob *job;
TestBlockJob *tjob;
IOThread *iothread = NULL;
- AioContext *ctx;
int ret;
src = bdrv_new_open_driver(&bdrv_test, "source", BDRV_O_RDWR,
@@ -787,11 +751,11 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
}
if (use_iothread) {
+ AioContext *ctx;
+
iothread = iothread_new();
ctx = iothread_get_aio_context(iothread);
blk_set_aio_context(blk_src, ctx, &error_abort);
- } else {
- ctx = qemu_get_aio_context();
}
target = bdrv_new_open_driver(&bdrv_test, "target", BDRV_O_RDWR,
@@ -800,7 +764,6 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
blk_insert_bs(blk_target, target, &error_abort);
blk_set_allow_aio_context_change(blk_target, true);
- aio_context_acquire(ctx);
tjob = block_job_create("job0", &test_job_driver, NULL, src,
0, BLK_PERM_ALL,
0, 0, NULL, NULL, &error_abort);
@@ -821,7 +784,6 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
tjob->prepare_ret = -EIO;
break;
}
- aio_context_release(ctx);
job_start(&job->job);
@@ -912,12 +874,10 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
}
g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
- aio_context_acquire(ctx);
if (use_iothread) {
blk_set_aio_context(blk_src, qemu_get_aio_context(), &error_abort);
assert(blk_get_aio_context(blk_target) == qemu_get_aio_context());
}
- aio_context_release(ctx);
blk_unref(blk_src);
blk_unref(blk_target);
@@ -1401,9 +1361,7 @@ static void test_append_to_drained(void)
g_assert_cmpint(base_s->drain_count, ==, 1);
g_assert_cmpint(base->in_flight, ==, 0);
- aio_context_acquire(qemu_get_aio_context());
bdrv_append(overlay, base, &error_abort);
- aio_context_release(qemu_get_aio_context());
g_assert_cmpint(base->in_flight, ==, 0);
g_assert_cmpint(overlay->in_flight, ==, 0);
@@ -1438,16 +1396,11 @@ static void test_set_aio_context(void)
bdrv_drained_begin(bs);
bdrv_try_change_aio_context(bs, ctx_a, NULL, &error_abort);
-
- aio_context_acquire(ctx_a);
bdrv_drained_end(bs);
bdrv_drained_begin(bs);
bdrv_try_change_aio_context(bs, ctx_b, NULL, &error_abort);
- aio_context_release(ctx_a);
- aio_context_acquire(ctx_b);
bdrv_try_change_aio_context(bs, qemu_get_aio_context(), NULL, &error_abort);
- aio_context_release(ctx_b);
bdrv_drained_end(bs);
bdrv_unref(bs);
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index 8ee6ef38d8..cafc023db4 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -142,10 +142,8 @@ static void test_update_perm_tree(void)
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
- aio_context_acquire(qemu_get_aio_context());
ret = bdrv_append(filter, bs, NULL);
g_assert_cmpint(ret, <, 0);
- aio_context_release(qemu_get_aio_context());
bdrv_unref(filter);
blk_unref(root);
@@ -211,9 +209,7 @@ static void test_should_update_child(void)
bdrv_attach_child(filter, target, "target", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
- aio_context_acquire(qemu_get_aio_context());
bdrv_append(filter, bs, &error_abort);
- aio_context_release(qemu_get_aio_context());
bdrv_graph_rdlock_main_loop();
g_assert(target->backing->bs == bs);
@@ -440,9 +436,7 @@ static void test_append_greedy_filter(void)
&error_abort);
bdrv_graph_wrunlock();
- aio_context_acquire(qemu_get_aio_context());
bdrv_append(fl, base, &error_abort);
- aio_context_release(qemu_get_aio_context());
bdrv_unref(fl);
bdrv_unref(top);
}
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 9b15d2768c..3766d5de6b 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -483,7 +483,6 @@ static void test_sync_op(const void *opaque)
bdrv_graph_rdunlock_main_loop();
blk_set_aio_context(blk, ctx, &error_abort);
- aio_context_acquire(ctx);
if (t->fn) {
t->fn(c);
}
@@ -491,7 +490,6 @@ static void test_sync_op(const void *opaque)
t->blkfn(blk);
}
blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
- aio_context_release(ctx);
bdrv_unref(bs);
blk_unref(blk);
@@ -576,9 +574,7 @@ static void test_attach_blockjob(void)
aio_poll(qemu_get_aio_context(), false);
}
- aio_context_acquire(ctx);
blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
- aio_context_release(ctx);
tjob->n = 0;
while (tjob->n == 0) {
@@ -595,9 +591,7 @@ static void test_attach_blockjob(void)
WITH_JOB_LOCK_GUARD() {
job_complete_sync_locked(&tjob->common.job, &error_abort);
}
- aio_context_acquire(ctx);
blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
- aio_context_release(ctx);
bdrv_unref(bs);
blk_unref(blk);
@@ -654,9 +648,7 @@ static void test_propagate_basic(void)
/* Switch the AioContext back */
main_ctx = qemu_get_aio_context();
- aio_context_acquire(ctx);
blk_set_aio_context(blk, main_ctx, &error_abort);
- aio_context_release(ctx);
g_assert(blk_get_aio_context(blk) == main_ctx);
g_assert(bdrv_get_aio_context(bs_a) == main_ctx);
g_assert(bdrv_get_aio_context(bs_verify) == main_ctx);
@@ -732,9 +724,7 @@ static void test_propagate_diamond(void)
/* Switch the AioContext back */
main_ctx = qemu_get_aio_context();
- aio_context_acquire(ctx);
blk_set_aio_context(blk, main_ctx, &error_abort);
- aio_context_release(ctx);
g_assert(blk_get_aio_context(blk) == main_ctx);
g_assert(bdrv_get_aio_context(bs_verify) == main_ctx);
g_assert(bdrv_get_aio_context(bs_a) == main_ctx);
@@ -764,13 +754,11 @@ static void test_propagate_mirror(void)
&error_abort);
/* Start a mirror job */
- aio_context_acquire(main_ctx);
mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false,
BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
&error_abort);
- aio_context_release(main_ctx);
WITH_JOB_LOCK_GUARD() {
job = job_get_locked("job0");
@@ -785,9 +773,7 @@ static void test_propagate_mirror(void)
g_assert(job->aio_context == ctx);
/* Change the AioContext of target */
- aio_context_acquire(ctx);
bdrv_try_change_aio_context(target, main_ctx, NULL, &error_abort);
- aio_context_release(ctx);
g_assert(bdrv_get_aio_context(src) == main_ctx);
g_assert(bdrv_get_aio_context(target) == main_ctx);
g_assert(bdrv_get_aio_context(filter) == main_ctx);
@@ -805,10 +791,8 @@ static void test_propagate_mirror(void)
g_assert(bdrv_get_aio_context(filter) == main_ctx);
/* ...unless we explicitly allow it */
- aio_context_acquire(ctx);
blk_set_allow_aio_context_change(blk, true);
bdrv_try_change_aio_context(target, ctx, NULL, &error_abort);
- aio_context_release(ctx);
g_assert(blk_get_aio_context(blk) == ctx);
g_assert(bdrv_get_aio_context(src) == ctx);
@@ -817,10 +801,8 @@ static void test_propagate_mirror(void)
job_cancel_sync_all();
- aio_context_acquire(ctx);
blk_set_aio_context(blk, main_ctx, &error_abort);
bdrv_try_change_aio_context(target, main_ctx, NULL, &error_abort);
- aio_context_release(ctx);
blk_unref(blk);
bdrv_unref(src);
@@ -836,7 +818,6 @@ static void test_attach_second_node(void)
BlockDriverState *bs, *filter;
QDict *options;
- aio_context_acquire(main_ctx);
blk = blk_new(ctx, BLK_PERM_ALL, BLK_PERM_ALL);
bs = bdrv_new_open_driver(&bdrv_test, "base", BDRV_O_RDWR, &error_abort);
blk_insert_bs(blk, bs, &error_abort);
@@ -846,15 +827,12 @@ static void test_attach_second_node(void)
qdict_put_str(options, "file", "base");
filter = bdrv_open(NULL, NULL, options, BDRV_O_RDWR, &error_abort);
- aio_context_release(main_ctx);
g_assert(blk_get_aio_context(blk) == ctx);
g_assert(bdrv_get_aio_context(bs) == ctx);
g_assert(bdrv_get_aio_context(filter) == ctx);
- aio_context_acquire(ctx);
blk_set_aio_context(blk, main_ctx, &error_abort);
- aio_context_release(ctx);
g_assert(blk_get_aio_context(blk) == main_ctx);
g_assert(bdrv_get_aio_context(bs) == main_ctx);
g_assert(bdrv_get_aio_context(filter) == main_ctx);
@@ -868,11 +846,9 @@ static void test_attach_preserve_blk_ctx(void)
{
IOThread *iothread = iothread_new();
AioContext *ctx = iothread_get_aio_context(iothread);
- AioContext *main_ctx = qemu_get_aio_context();
BlockBackend *blk;
BlockDriverState *bs;
- aio_context_acquire(main_ctx);
blk = blk_new(ctx, BLK_PERM_ALL, BLK_PERM_ALL);
bs = bdrv_new_open_driver(&bdrv_test, "base", BDRV_O_RDWR, &error_abort);
bs->total_sectors = 65536 / BDRV_SECTOR_SIZE;
@@ -881,25 +857,18 @@ static void test_attach_preserve_blk_ctx(void)
blk_insert_bs(blk, bs, &error_abort);
g_assert(blk_get_aio_context(blk) == ctx);
g_assert(bdrv_get_aio_context(bs) == ctx);
- aio_context_release(main_ctx);
/* Remove the node again */
- aio_context_acquire(ctx);
blk_remove_bs(blk);
- aio_context_release(ctx);
g_assert(blk_get_aio_context(blk) == ctx);
g_assert(bdrv_get_aio_context(bs) == qemu_get_aio_context());
/* Re-attach the node */
- aio_context_acquire(main_ctx);
blk_insert_bs(blk, bs, &error_abort);
- aio_context_release(main_ctx);
g_assert(blk_get_aio_context(blk) == ctx);
g_assert(bdrv_get_aio_context(bs) == ctx);
- aio_context_acquire(ctx);
blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
- aio_context_release(ctx);
bdrv_unref(bs);
blk_unref(blk);
}
diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index a130f6fefb..fe3e0d2d38 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -228,7 +228,6 @@ static void cancel_common(CancelJob *s)
BlockJob *job = &s->common;
BlockBackend *blk = s->blk;
JobStatus sts = job->job.status;
- AioContext *ctx = job->job.aio_context;
job_cancel_sync(&job->job, true);
WITH_JOB_LOCK_GUARD() {
@@ -240,9 +239,7 @@ static void cancel_common(CancelJob *s)
job_unref_locked(&job->job);
}
- aio_context_acquire(ctx);
destroy_blk(blk);
- aio_context_release(ctx);
}
@@ -391,132 +388,6 @@ static void test_cancel_concluded(void)
cancel_common(s);
}
-/* (See test_yielding_driver for the job description) */
-typedef struct YieldingJob {
- BlockJob common;
- bool should_complete;
-} YieldingJob;
-
-static void yielding_job_complete(Job *job, Error **errp)
-{
- YieldingJob *s = container_of(job, YieldingJob, common.job);
- s->should_complete = true;
- job_enter(job);
-}
-
-static int coroutine_fn yielding_job_run(Job *job, Error **errp)
-{
- YieldingJob *s = container_of(job, YieldingJob, common.job);
-
- job_transition_to_ready(job);
-
- while (!s->should_complete) {
- job_yield(job);
- }
-
- return 0;
-}
-
-/*
- * This job transitions immediately to the READY state, and then
- * yields until it is to complete.
- */
-static const BlockJobDriver test_yielding_driver = {
- .job_driver = {
- .instance_size = sizeof(YieldingJob),
- .free = block_job_free,
- .user_resume = block_job_user_resume,
- .run = yielding_job_run,
- .complete = yielding_job_complete,
- },
-};
-
-/*
- * Test that job_complete_locked() works even on jobs that are in a paused
- * state (i.e., STANDBY).
- *
- * To do this, run YieldingJob in an IO thread, get it into the READY
- * state, then have a drained section. Before ending the section,
- * acquire the context so the job will not be entered and will thus
- * remain on STANDBY.
- *
- * job_complete_locked() should still work without error.
- *
- * Note that on the QMP interface, it is impossible to lock an IO
- * thread before a drained section ends. In practice, the
- * bdrv_drain_all_end() and the aio_context_acquire() will be
- * reversed. However, that makes for worse reproducibility here:
- * Sometimes, the job would no longer be in STANDBY then but already
- * be started. We cannot prevent that, because the IO thread runs
- * concurrently. We can only prevent it by taking the lock before
- * ending the drained section, so we do that.
- *
- * (You can reverse the order of operations and most of the time the
- * test will pass, but sometimes the assert(status == STANDBY) will
- * fail.)
- */
-static void test_complete_in_standby(void)
-{
- BlockBackend *blk;
- IOThread *iothread;
- AioContext *ctx;
- Job *job;
- BlockJob *bjob;
-
- /* Create a test drive, move it to an IO thread */
- blk = create_blk(NULL);
- iothread = iothread_new();
-
- ctx = iothread_get_aio_context(iothread);
- blk_set_aio_context(blk, ctx, &error_abort);
-
- /* Create our test job */
- bjob = mk_job(blk, "job", &test_yielding_driver, true,
- JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
- job = &bjob->job;
- assert_job_status_is(job, JOB_STATUS_CREATED);
-
- /* Wait for the job to become READY */
- job_start(job);
- /*
- * Here we are waiting for the status to change, so don't bother
- * protecting the read every time.
- */
- AIO_WAIT_WHILE_UNLOCKED(ctx, job->status != JOB_STATUS_READY);
-
- /* Begin the drained section, pausing the job */
- bdrv_drain_all_begin();
- assert_job_status_is(job, JOB_STATUS_STANDBY);
-
- /* Lock the IO thread to prevent the job from being run */
- aio_context_acquire(ctx);
- /* This will schedule the job to resume it */
- bdrv_drain_all_end();
- aio_context_release(ctx);
-
- WITH_JOB_LOCK_GUARD() {
- /* But the job cannot run, so it will remain on standby */
- assert(job->status == JOB_STATUS_STANDBY);
-
- /* Even though the job is on standby, this should work */
- job_complete_locked(job, &error_abort);
-
- /* The test is done now, clean up. */
- job_finish_sync_locked(job, NULL, &error_abort);
- assert(job->status == JOB_STATUS_PENDING);
-
- job_finalize_locked(job, &error_abort);
- assert(job->status == JOB_STATUS_CONCLUDED);
-
- job_dismiss_locked(&job, &error_abort);
- }
-
- aio_context_acquire(ctx);
- destroy_blk(blk);
- aio_context_release(ctx);
- iothread_join(iothread);
-}
-
int main(int argc, char **argv)
{
qemu_init_main_loop(&error_abort);
@@ -531,13 +402,5 @@ int main(int argc, char **argv)
g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
-
- /*
- * This test is flaky and sometimes fails in CI and otherwise:
- * don't run unless user opts in via environment variable.
- */
- if (getenv("QEMU_TEST_FLAKY_TESTS")) {
- g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
- }
return g_test_run();
}
diff --git a/tests/unit/test-replication.c b/tests/unit/test-replication.c
index afff908d77..5d2003b8ce 100644
--- a/tests/unit/test-replication.c
+++ b/tests/unit/test-replication.c
@@ -199,17 +199,13 @@ static BlockBackend *start_primary(void)
static void teardown_primary(void)
{
BlockBackend *blk;
- AioContext *ctx;
/* remove P_ID */
blk = blk_by_name(P_ID);
assert(blk);
- ctx = blk_get_aio_context(blk);
- aio_context_acquire(ctx);
monitor_remove_blk(blk);
blk_unref(blk);
- aio_context_release(ctx);
}
static void test_primary_read(void)
@@ -345,27 +341,20 @@ static void teardown_secondary(void)
{
/* only need to destroy two BBs */
BlockBackend *blk;
- AioContext *ctx;
/* remove S_LOCAL_DISK_ID */
blk = blk_by_name(S_LOCAL_DISK_ID);
assert(blk);
- ctx = blk_get_aio_context(blk);
- aio_context_acquire(ctx);
monitor_remove_blk(blk);
blk_unref(blk);
- aio_context_release(ctx);
/* remove S_ID */
blk = blk_by_name(S_ID);
assert(blk);
- ctx = blk_get_aio_context(blk);
- aio_context_acquire(ctx);
monitor_remove_blk(blk);
blk_unref(blk);
- aio_context_release(ctx);
}
static void test_secondary_read(void)
diff --git a/util/async.c b/util/async.c
index 04ee83d220..dfd44ef612 100644
--- a/util/async.c
+++ b/util/async.c
@@ -562,12 +562,10 @@ static void co_schedule_bh_cb(void *opaque)
Coroutine *co = QSLIST_FIRST(&straight);
QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
trace_aio_co_schedule_bh_cb(ctx, co);
- aio_context_acquire(ctx);
/* Protected by write barrier in qemu_aio_coroutine_enter */
qatomic_set(&co->scheduled, NULL);
qemu_aio_coroutine_enter(ctx, co);
- aio_context_release(ctx);
}
}
@@ -707,9 +705,7 @@ void aio_co_enter(AioContext *ctx, Coroutine *co)
assert(self != co);
QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
} else {
- aio_context_acquire(ctx);
qemu_aio_coroutine_enter(ctx, co);
- aio_context_release(ctx);
}
}
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index a9a48fffb8..3bfb1ad3ec 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -360,10 +360,7 @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
qio_channel_set_follow_coroutine_ctx(server->ioc, true);
- /* Attaching the AioContext starts the vu_client_trip coroutine */
- aio_context_acquire(server->ctx);
vhost_user_server_attach_aio_context(server, server->ctx);
- aio_context_release(server->ctx);
}
/* server->ctx acquired by caller */
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index 38364fa557..c9c09fcacd 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -278,12 +278,9 @@ def gen_no_co_wrapper(func: FuncDecl) -> str:
static void {name}_bh(void *opaque)
{{
{struct_name} *s = opaque;
- AioContext *ctx = {func.gen_ctx('s->')};
{graph_lock}
- aio_context_acquire(ctx);
{func.get_result}{name}({ func.gen_list('s->{name}') });
- aio_context_release(ctx);
{graph_unlock}
aio_co_wake(s->co);
diff --git a/tests/tsan/suppressions.tsan b/tests/tsan/suppressions.tsan
index d9a002a2ef..b3ef59c27c 100644
--- a/tests/tsan/suppressions.tsan
+++ b/tests/tsan/suppressions.tsan
@@ -4,7 +4,6 @@
# TSan reports a double lock on RECURSIVE mutexes.
# Since the recursive lock is intentional, we choose to ignore it.
-mutex:aio_context_acquire
mutex:pthread_mutex_lock
# TSan reports a race between pthread_mutex_init() and
--
2.43.0
* [PULL 21/33] block: remove bdrv_co_lock()
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (19 preceding siblings ...)
2023-12-21 21:23 ` [PULL 20/33] block: " Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 22/33] scsi: remove AioContext locking Kevin Wolf
` (11 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The bdrv_co_lock() and bdrv_co_unlock() functions are already no-ops.
Remove them.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231205182011.1976568-8-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block-global-state.h | 14 --------------
block.c | 10 ----------
blockdev.c | 5 -----
3 files changed, 29 deletions(-)
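As an illustration only (not part of this patch), a coroutine_fn caller
that previously bracketed a drained section with bdrv_co_lock() and
bdrv_co_unlock() now calls the drain helpers directly; a minimal sketch
with a hypothetical handler name:

    /* Sketch: coroutine_fn caller after the no-op lock helpers are gone */
    static void coroutine_fn example_drained_op(BlockDriverState *bs)
    {
        bdrv_drained_begin(bs);  /* previously wrapped in bdrv_co_lock() */
        /* ... synchronous work while bs is drained ... */
        bdrv_drained_end(bs);    /* previously wrapped in bdrv_co_unlock() */
    }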
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 0327f1c605..4ec0b217f0 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -267,20 +267,6 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag);
int bdrv_debug_resume(BlockDriverState *bs, const char *tag);
bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
-/**
- * Locks the AioContext of @bs if it's not the current AioContext. This avoids
- * double locking which could lead to deadlocks: This is a coroutine_fn, so we
- * know we already own the lock of the current AioContext.
- *
- * May only be called in the main thread.
- */
-void coroutine_fn bdrv_co_lock(BlockDriverState *bs);
-
-/**
- * Unlocks the AioContext of @bs if it's not the current AioContext.
- */
-void coroutine_fn bdrv_co_unlock(BlockDriverState *bs);
-
bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
GHashTable *visited, Transaction *tran,
Error **errp);
diff --git a/block.c b/block.c
index 91ace5d2d5..434b7f4d72 100644
--- a/block.c
+++ b/block.c
@@ -7431,16 +7431,6 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
bdrv_dec_in_flight(bs);
}
-void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
-{
- /* TODO removed in next patch */
-}
-
-void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
-{
- /* TODO removed in next patch */
-}
-
static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
{
GLOBAL_STATE_CODE();
diff --git a/blockdev.c b/blockdev.c
index 5d8b3a23eb..3a5e7222ec 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2264,18 +2264,13 @@ void coroutine_fn qmp_block_resize(const char *device, const char *node_name,
return;
}
- bdrv_co_lock(bs);
bdrv_drained_begin(bs);
- bdrv_co_unlock(bs);
old_ctx = bdrv_co_enter(bs);
blk_co_truncate(blk, size, false, PREALLOC_MODE_OFF, 0, errp);
bdrv_co_leave(bs, old_ctx);
- bdrv_co_lock(bs);
bdrv_drained_end(bs);
- bdrv_co_unlock(bs);
-
blk_co_unref(blk);
}
--
2.43.0
* [PULL 22/33] scsi: remove AioContext locking
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (20 preceding siblings ...)
2023-12-21 21:23 ` [PULL 21/33] block: remove bdrv_co_lock() Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 23/33] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED() Kevin Wolf
` (10 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The AioContext lock no longer has any effect. Remove it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-9-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/hw/virtio/virtio-scsi.h | 14 --------------
hw/scsi/scsi-bus.c | 2 --
hw/scsi/scsi-disk.c | 31 +++++--------------------------
hw/scsi/virtio-scsi.c | 18 ------------------
4 files changed, 5 insertions(+), 60 deletions(-)
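The mutual exclusion that virtio_scsi_acquire()/virtio_scsi_release()
used to provide now comes from thread affinity: request code runs only
in the BlockBackend's home AioContext. A minimal sketch of that
invariant (hypothetical callback name, not taken from this patch):

    /* Sketch: a completion callback may assert its home AioContext */
    static void example_scsi_complete(void *opaque, int ret)
    {
        SCSIDiskReq *r = opaque;
        assert(blk_get_aio_context(r->req.dev->conf.blk) ==
               qemu_get_current_aio_context());
        /* ... finish the request; no AioContext lock is taken ... */
    }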
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index da8cb928d9..7f0573b1bf 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -101,20 +101,6 @@ struct VirtIOSCSI {
uint32_t host_features;
};
-static inline void virtio_scsi_acquire(VirtIOSCSI *s)
-{
- if (s->ctx) {
- aio_context_acquire(s->ctx);
- }
-}
-
-static inline void virtio_scsi_release(VirtIOSCSI *s)
-{
- if (s->ctx) {
- aio_context_release(s->ctx);
- }
-}
-
void virtio_scsi_common_realize(DeviceState *dev,
VirtIOHandleOutput ctrl,
VirtIOHandleOutput evt,
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index b649cdf555..5b08cbf60a 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1732,9 +1732,7 @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
{
scsi_device_for_each_req_async(sdev, scsi_device_purge_one_req, NULL);
- aio_context_acquire(blk_get_aio_context(sdev->conf.blk));
blk_drain(sdev->conf.blk);
- aio_context_release(blk_get_aio_context(sdev->conf.blk));
scsi_device_set_ua(sdev, sense);
}
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index a5048e0aaf..61be3d395a 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2339,14 +2339,10 @@ static void scsi_disk_reset(DeviceState *dev)
{
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
uint64_t nb_sectors;
- AioContext *ctx;
scsi_device_purge_requests(&s->qdev, SENSE_CODE(RESET));
- ctx = blk_get_aio_context(s->qdev.conf.blk);
- aio_context_acquire(ctx);
blk_get_geometry(s->qdev.conf.blk, &nb_sectors);
- aio_context_release(ctx);
nb_sectors /= s->qdev.blocksize / BDRV_SECTOR_SIZE;
if (nb_sectors) {
@@ -2545,15 +2541,13 @@ static void scsi_unrealize(SCSIDevice *dev)
static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
{
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
- AioContext *ctx = NULL;
+
/* can happen for devices without drive. The error message for missing
* backend will be issued in scsi_realize
*/
if (s->qdev.conf.blk) {
- ctx = blk_get_aio_context(s->qdev.conf.blk);
- aio_context_acquire(ctx);
if (!blkconf_blocksizes(&s->qdev.conf, errp)) {
- goto out;
+ return;
}
}
s->qdev.blocksize = s->qdev.conf.logical_block_size;
@@ -2562,16 +2556,11 @@ static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
s->product = g_strdup("QEMU HARDDISK");
}
scsi_realize(&s->qdev, errp);
-out:
- if (ctx) {
- aio_context_release(ctx);
- }
}
static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
{
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
- AioContext *ctx;
int ret;
uint32_t blocksize = 2048;
@@ -2587,8 +2576,6 @@ static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
blocksize = dev->conf.physical_block_size;
}
- ctx = blk_get_aio_context(dev->conf.blk);
- aio_context_acquire(ctx);
s->qdev.blocksize = blocksize;
s->qdev.type = TYPE_ROM;
s->features |= 1 << SCSI_DISK_F_REMOVABLE;
@@ -2596,7 +2583,6 @@ static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
s->product = g_strdup("QEMU CD-ROM");
}
scsi_realize(&s->qdev, errp);
- aio_context_release(ctx);
}
@@ -2727,7 +2713,6 @@ static int get_device_type(SCSIDiskState *s)
static void scsi_block_realize(SCSIDevice *dev, Error **errp)
{
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
- AioContext *ctx;
int sg_version;
int rc;
@@ -2742,9 +2727,6 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
"be removed in a future version");
}
- ctx = blk_get_aio_context(s->qdev.conf.blk);
- aio_context_acquire(ctx);
-
/* check we are using a driver managing SG_IO (version 3 and after) */
rc = blk_ioctl(s->qdev.conf.blk, SG_GET_VERSION_NUM, &sg_version);
if (rc < 0) {
@@ -2752,18 +2734,18 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
if (rc != -EPERM) {
error_append_hint(errp, "Is this a SCSI device?\n");
}
- goto out;
+ return;
}
if (sg_version < 30000) {
error_setg(errp, "scsi generic interface too old");
- goto out;
+ return;
}
/* get device type from INQUIRY data */
rc = get_device_type(s);
if (rc < 0) {
error_setg(errp, "INQUIRY failed");
- goto out;
+ return;
}
/* Make a guess for the block size, we'll fix it when the guest sends.
@@ -2783,9 +2765,6 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
scsi_realize(&s->qdev, errp);
scsi_generic_read_device_inquiry(&s->qdev);
-
-out:
- aio_context_release(ctx);
}
typedef struct SCSIBlockReq {
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 4f8d35facc..ca365a70e9 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -642,9 +642,7 @@ static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
return;
}
- virtio_scsi_acquire(s);
virtio_scsi_handle_ctrl_vq(s, vq);
- virtio_scsi_release(s);
}
static void virtio_scsi_complete_cmd_req(VirtIOSCSIReq *req)
@@ -882,9 +880,7 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
return;
}
- virtio_scsi_acquire(s);
virtio_scsi_handle_cmd_vq(s, vq);
- virtio_scsi_release(s);
}
static void virtio_scsi_get_config(VirtIODevice *vdev,
@@ -1031,9 +1027,7 @@ static void virtio_scsi_handle_event(VirtIODevice *vdev, VirtQueue *vq)
return;
}
- virtio_scsi_acquire(s);
virtio_scsi_handle_event_vq(s, vq);
- virtio_scsi_release(s);
}
static void virtio_scsi_change(SCSIBus *bus, SCSIDevice *dev, SCSISense sense)
@@ -1052,9 +1046,7 @@ static void virtio_scsi_change(SCSIBus *bus, SCSIDevice *dev, SCSISense sense)
},
};
- virtio_scsi_acquire(s);
virtio_scsi_push_event(s, &info);
- virtio_scsi_release(s);
}
}
@@ -1071,17 +1063,13 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev);
VirtIOSCSI *s = VIRTIO_SCSI(vdev);
SCSIDevice *sd = SCSI_DEVICE(dev);
- AioContext *old_context;
int ret;
if (s->ctx && !s->dataplane_fenced) {
if (blk_op_is_blocked(sd->conf.blk, BLOCK_OP_TYPE_DATAPLANE, errp)) {
return;
}
- old_context = blk_get_aio_context(sd->conf.blk);
- aio_context_acquire(old_context);
ret = blk_set_aio_context(sd->conf.blk, s->ctx, errp);
- aio_context_release(old_context);
if (ret < 0) {
return;
}
@@ -1097,10 +1085,8 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
},
};
- virtio_scsi_acquire(s);
virtio_scsi_push_event(s, &info);
scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
- virtio_scsi_release(s);
}
}
@@ -1122,17 +1108,13 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
if (s->ctx) {
- virtio_scsi_acquire(s);
/* If other users keep the BlockBackend in the iothread, that's ok */
blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
- virtio_scsi_release(s);
}
if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
- virtio_scsi_acquire(s);
virtio_scsi_push_event(s, &info);
scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
- virtio_scsi_release(s);
}
}
--
2.43.0
* [PULL 23/33] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED()
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (21 preceding siblings ...)
2023-12-21 21:23 ` [PULL 22/33] scsi: remove AioContext locking Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 24/33] aio: remove aio_context_acquire()/aio_context_release() API Kevin Wolf
` (9 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Now that the AioContext lock no longer exists, AIO_WAIT_WHILE() and
AIO_WAIT_WHILE_UNLOCKED() are equivalent.
A future patch will get rid of AIO_WAIT_WHILE_UNLOCKED().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-10-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/aio-wait.h | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
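Callers are unaffected: both macros poll the main loop until the
condition clears. A minimal usage sketch, modelled on the colo-compare
hunk earlier in this series (names assumed from that context):

    /* Sketch: main-loop code waiting on work running in an iothread */
    AioContext *ctx = iothread_get_aio_context(s->iothread);
    AIO_WAIT_WHILE(ctx, !s->out_sendco.done);  /* no acquire/release */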
diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 5449b6d742..157f105916 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -63,9 +63,6 @@ extern AioWait global_aio_wait;
* @ctx: the aio context, or NULL if multiple aio contexts (for which the
* caller does not hold a lock) are involved in the polling condition.
* @cond: wait while this conditional expression is true
- * @unlock: whether to unlock and then lock again @ctx. This applies
- * only when waiting for another AioContext from the main loop.
- * Otherwise it's ignored.
*
* Wait while a condition is true. Use this to implement synchronous
* operations that require event loop activity.
@@ -78,7 +75,7 @@ extern AioWait global_aio_wait;
* wait on conditions between two IOThreads since that could lead to deadlock,
* go via the main loop instead.
*/
-#define AIO_WAIT_WHILE_INTERNAL(ctx, cond, unlock) ({ \
+#define AIO_WAIT_WHILE_INTERNAL(ctx, cond) ({ \
bool waited_ = false; \
AioWait *wait_ = &global_aio_wait; \
AioContext *ctx_ = (ctx); \
@@ -95,13 +92,7 @@ extern AioWait global_aio_wait;
assert(qemu_get_current_aio_context() == \
qemu_get_aio_context()); \
while ((cond)) { \
- if (unlock && ctx_) { \
- aio_context_release(ctx_); \
- } \
aio_poll(qemu_get_aio_context(), true); \
- if (unlock && ctx_) { \
- aio_context_acquire(ctx_); \
- } \
waited_ = true; \
} \
} \
@@ -109,10 +100,11 @@ extern AioWait global_aio_wait;
waited_; })
#define AIO_WAIT_WHILE(ctx, cond) \
- AIO_WAIT_WHILE_INTERNAL(ctx, cond, true)
+ AIO_WAIT_WHILE_INTERNAL(ctx, cond)
+/* TODO replace this with AIO_WAIT_WHILE() in a future patch */
#define AIO_WAIT_WHILE_UNLOCKED(ctx, cond) \
- AIO_WAIT_WHILE_INTERNAL(ctx, cond, false)
+ AIO_WAIT_WHILE_INTERNAL(ctx, cond)
/**
* aio_wait_kick:
--
2.43.0
* [PULL 24/33] aio: remove aio_context_acquire()/aio_context_release() API
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (22 preceding siblings ...)
2023-12-21 21:23 ` [PULL 23/33] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED() Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 25/33] docs: remove AioContext lock from IOThread docs Kevin Wolf
` (8 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Delete these functions because nothing calls them anymore.
I introduced these APIs in commit 98563fc3ec44 ("aio: add
aio_context_acquire() and aio_context_release()") in 2014. It's with a
sigh of relief that I delete these APIs almost 10 years later.
Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an
understanding of where the code needed to go in order to remove the
limitations of the original dataplane and of the IOThread/AioContext
approach that followed it.
Emanuele Giuseppe Esposito had the splendid determination to convert
large parts of the codebase so that they no longer needed the AioContext
lock. This was a painstaking process, both in the actual code changes
required and the iterations of code review that Emanuele eked out of
Kevin and me over many months.
Kevin Wolf tackled multitudes of graph locking conversions to protect
in-flight I/O from run-time changes to the block graph as well as the
clang Thread Safety Analysis annotations that allow the compiler to
check whether the graph lock is being used correctly.
And me, well, I'm just here to add some pizzazz to the QEMU multi-queue
block layer :). Thank you to everyone who helped with this effort,
including Eric Blake, code reviewer extraordinaire, and others who I've
forgotten to mention.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-11-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/aio.h | 17 -----------------
util/async.c | 10 ----------
2 files changed, 27 deletions(-)
diff --git a/include/block/aio.h b/include/block/aio.h
index f08b358077..af05512a7d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -278,23 +278,6 @@ void aio_context_ref(AioContext *ctx);
*/
void aio_context_unref(AioContext *ctx);
-/* Take ownership of the AioContext. If the AioContext will be shared between
- * threads, and a thread does not want to be interrupted, it will have to
- * take ownership around calls to aio_poll(). Otherwise, aio_poll()
- * automatically takes care of calling aio_context_acquire and
- * aio_context_release.
- *
- * Note that this is separate from bdrv_drained_begin/bdrv_drained_end. A
- * thread still has to call those to avoid being interrupted by the guest.
- *
- * Bottom halves, timers and callbacks can be created or removed without
- * acquiring the AioContext.
- */
-void aio_context_acquire(AioContext *ctx);
-
-/* Relinquish ownership of the AioContext. */
-void aio_context_release(AioContext *ctx);
-
/**
* aio_bh_schedule_oneshot_full: Allocate a new bottom half structure that will
* run only once and as soon as possible.
diff --git a/util/async.c b/util/async.c
index dfd44ef612..460529057c 100644
--- a/util/async.c
+++ b/util/async.c
@@ -719,16 +719,6 @@ void aio_context_unref(AioContext *ctx)
g_source_unref(&ctx->source);
}
-void aio_context_acquire(AioContext *ctx)
-{
- /* TODO remove this function */
-}
-
-void aio_context_release(AioContext *ctx)
-{
- /* TODO remove this function */
-}
-
QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
AioContext *qemu_get_current_aio_context(void)
--
2.43.0
* [PULL 25/33] docs: remove AioContext lock from IOThread docs
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (23 preceding siblings ...)
2023-12-21 21:23 ` [PULL 24/33] aio: remove aio_context_acquire()/aio_context_release() API Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 26/33] scsi: remove outdated AioContext lock comment Kevin Wolf
` (7 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Encourage the use of locking primitives and stop mentioning the
AioContext lock since it is being removed.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-12-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
docs/devel/multiple-iothreads.txt | 47 +++++++++++--------------------
1 file changed, 16 insertions(+), 31 deletions(-)
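The cross-thread pattern the updated text recommends, shown as a minimal
sketch (variable names assumed):

    /* Sketch: schedule a one-shot function call in an IOThread */
    static void example_work_bh(void *opaque)
    {
        /* runs once in the IOThread's event loop, no AioContext lock */
    }

    aio_bh_schedule_oneshot(iothread_get_aio_context(iothread),
                            example_work_bh, opaque);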
diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index a3e949f6b3..4865196bde 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -88,27 +88,18 @@ loop, depending on which AioContext instance the caller passes in.
How to synchronize with an IOThread
-----------------------------------
-AioContext is not thread-safe so some rules must be followed when using file
-descriptors, event notifiers, timers, or BHs across threads:
+Variables that can be accessed by multiple threads require some form of
+synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
-1. AioContext functions can always be called safely. They handle their
-own locking internally.
-
-2. Other threads wishing to access the AioContext must use
-aio_context_acquire()/aio_context_release() for mutual exclusion. Once the
-context is acquired no other thread can access it or run event loop iterations
-in this AioContext.
-
-Legacy code sometimes nests aio_context_acquire()/aio_context_release() calls.
-Do not use nesting anymore, it is incompatible with the BDRV_POLL_WHILE() macro
-used in the block layer and can lead to hangs.
-
-There is currently no lock ordering rule if a thread needs to acquire multiple
-AioContexts simultaneously. Therefore, it is only safe for code holding the
-QEMU global mutex to acquire other AioContexts.
+AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
+aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
+activity in an IOThread.
Side note: the best way to schedule a function call across threads is to call
-aio_bh_schedule_oneshot(). No acquire/release or locking is needed.
+aio_bh_schedule_oneshot().
+
+The main loop thread can wait synchronously for a condition using
+AIO_WAIT_WHILE().
AioContext and the block layer
------------------------------
@@ -124,22 +115,16 @@ Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop. See the "How to program for
IOThreads" above for information on how to do that.
-If main loop code such as a QMP function wishes to access a BlockDriverState
-it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
-that callbacks in the IOThread do not run in parallel.
-
Code running in the monitor typically needs to ensure that past
requests from the guest are completed. When a block device is running
in an IOThread, the IOThread can also process requests from the guest
(via ioeventfd). To achieve both objects, wrap the code between
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
-section". The functions must be called between aio_context_acquire()
-and aio_context_release(). You can freely release and re-acquire the
-AioContext within a drained section.
-
-Long-running jobs (usually in the form of coroutines) are best scheduled in
-the BlockDriverState's AioContext to avoid the need to acquire/release around
-each bdrv_*() call. The functions bdrv_add/remove_aio_context_notifier,
-or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends,
-can be used to get a notification whenever bdrv_try_change_aio_context() moves a
+section".
+
+Long-running jobs (usually in the form of coroutines) are often scheduled in
+the BlockDriverState's AioContext. The functions
+bdrv_add/remove_aio_context_notifier, or alternatively
+blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
+get a notification whenever bdrv_try_change_aio_context() moves a
BlockDriverState to a different AioContext.
--
2.43.0
* [PULL 26/33] scsi: remove outdated AioContext lock comment
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (24 preceding siblings ...)
2023-12-21 21:23 ` [PULL 25/33] docs: remove AioContext lock from IOThread docs Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 27/33] job: remove outdated AioContext locking comments Kevin Wolf
` (6 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The SCSI subsystem no longer uses the AioContext lock. Request
processing runs exclusively in the BlockBackend's AioContext since
"scsi: only access SCSIDevice->requests from one thread", and hence the
lock is unnecessary.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-13-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/scsi/scsi-disk.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 61be3d395a..2e7e1e9a1c 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -355,7 +355,6 @@ done:
scsi_req_unref(&r->req);
}
-/* Called with AioContext lock held */
static void scsi_dma_complete(void *opaque, int ret)
{
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
--
2.43.0
* [PULL 27/33] job: remove outdated AioContext locking comments
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (25 preceding siblings ...)
2023-12-21 21:23 ` [PULL 26/33] scsi: remove outdated AioContext lock comment Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 28/33] block: " Kevin Wolf
` (5 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The AioContext lock no longer exists.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-14-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/qemu/job.h | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/include/qemu/job.h b/include/qemu/job.h
index e502787dd8..9ea98b5927 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -67,8 +67,6 @@ typedef struct Job {
/**
* The completion function that will be called when the job completes.
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
*/
BlockCompletionFunc *cb;
@@ -264,9 +262,6 @@ struct JobDriver {
*
* This callback will not be invoked if the job has already failed.
* If it fails, abort and then clean will be called.
- *
- * Called with AioContext lock held, since many callbacs implementations
- * use bdrv_* functions that require to hold the lock.
*/
int (*prepare)(Job *job);
@@ -277,9 +272,6 @@ struct JobDriver {
*
* All jobs will complete with a call to either .commit() or .abort() but
* never both.
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
*/
void (*commit)(Job *job);
@@ -290,9 +282,6 @@ struct JobDriver {
*
* All jobs will complete with a call to either .commit() or .abort() but
* never both.
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
*/
void (*abort)(Job *job);
@@ -301,9 +290,6 @@ struct JobDriver {
* .commit() or .abort(). Regardless of which callback is invoked after
* completion, .clean() will always be called, even if the job does not
* belong to a transaction group.
- *
- * Called with AioContext lock held, since many callbacs implementations
- * use bdrv_* functions that require to hold the lock.
*/
void (*clean)(Job *job);
@@ -318,17 +304,12 @@ struct JobDriver {
* READY).
* (If the callback is NULL, the job is assumed to terminate
* without I/O.)
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
*/
bool (*cancel)(Job *job, bool force);
/**
* Called when the job is freed.
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
*/
void (*free)(Job *job);
};
@@ -424,7 +405,6 @@ void job_ref_locked(Job *job);
* Release a reference that was previously acquired with job_ref_locked() or
* job_create(). If it's the last reference to the object, it will be freed.
*
- * Takes AioContext lock internally to invoke a job->driver callback.
* Called with job lock held.
*/
void job_unref_locked(Job *job);
--
2.43.0
* [PULL 28/33] block: remove outdated AioContext locking comments
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (26 preceding siblings ...)
2023-12-21 21:23 ` [PULL 27/33] job: remove outdated AioContext locking comments Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 29/33] block-coroutine-wrapper: use qemu_get_current_aio_context() Kevin Wolf
` (4 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
The AioContext lock no longer exists.
There is one noteworthy change:
- * More specifically, these functions use BDRV_POLL_WHILE(bs), which
- * requires the caller to be either in the main thread and hold
- * the BlockdriverState (bs) AioContext lock, or directly in the
- * home thread that runs the bs AioContext. Calling them from
- * another thread in another AioContext would cause deadlocks.
+ * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
+ * the caller to be either in the main thread or directly in the home thread
+ * that runs the bs AioContext. Calling them from another thread in another
+ * AioContext would cause deadlocks.
I am not sure whether deadlocks are still possible; they may simply have
moved to the fine-grained locks that replaced the AioContext lock. Since
I cannot rule them out, I have kept the substance unchanged and only
removed the mention of the AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-15-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block-common.h | 3 --
include/block/block-io.h | 9 ++--
include/block/block_int-common.h | 2 -
block.c | 73 ++++++----------------------
block/block-backend.c | 8 ---
block/export/vhost-user-blk-server.c | 4 --
tests/qemu-iotests/202 | 2 +-
tests/qemu-iotests/203 | 3 +-
8 files changed, 22 insertions(+), 82 deletions(-)
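The thread restriction quoted above applies to any synchronous wrapper
built on BDRV_POLL_WHILE(); a minimal sketch with a hypothetical async
starter:

    /* Sketch: valid only in the main thread or in bs's home AioContext */
    int ret = -EINPROGRESS;
    example_start_async_op(bs, &ret);  /* hypothetical; sets ret when done */
    BDRV_POLL_WHILE(bs, ret == -EINPROGRESS);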
diff --git a/include/block/block-common.h b/include/block/block-common.h
index d7599564db..a846023a09 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -70,9 +70,6 @@
* automatically takes the graph rdlock when calling the wrapped function. In
* the same way, no_co_wrapper_bdrv_wrlock functions automatically take the
* graph wrlock.
- *
- * If the first parameter of the function is a BlockDriverState, BdrvChild or
- * BlockBackend pointer, the AioContext lock for it is taken in the wrapper.
*/
#define no_co_wrapper
#define no_co_wrapper_bdrv_rdlock
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 8eb39a858b..b49e0537dd 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -332,11 +332,10 @@ bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
* "I/O or GS" API functions. These functions can run without
* the BQL, but only in one specific iothread/main loop.
*
- * More specifically, these functions use BDRV_POLL_WHILE(bs), which
- * requires the caller to be either in the main thread and hold
- * the BlockdriverState (bs) AioContext lock, or directly in the
- * home thread that runs the bs AioContext. Calling them from
- * another thread in another AioContext would cause deadlocks.
+ * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
+ * the caller to be either in the main thread or directly in the home thread
+ * that runs the bs AioContext. Calling them from another thread in another
+ * AioContext would cause deadlocks.
*
* Therefore, these functions are not proper I/O, because they
* can't run in *any* iothreads, but only in a specific one.
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 4e31d161c5..151279d481 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1192,8 +1192,6 @@ struct BlockDriverState {
/* The error object in use for blocking operations on backing_hd */
Error *backing_blocker;
- /* Protected by AioContext lock */
-
/*
* If we are reading a disk image, give its size in sectors.
* Generally read-only; it is written to by load_snapshot and
diff --git a/block.c b/block.c
index 434b7f4d72..a097772238 100644
--- a/block.c
+++ b/block.c
@@ -1616,11 +1616,6 @@ out:
g_free(gen_node_name);
}
-/*
- * The caller must always hold @bs AioContext lock, because this function calls
- * bdrv_refresh_total_sectors() which polls when called from non-coroutine
- * context.
- */
static int no_coroutine_fn GRAPH_UNLOCKED
bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
QDict *options, int open_flags, Error **errp)
@@ -2901,7 +2896,7 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
* Replaces the node that a BdrvChild points to without updating permissions.
*
* If @new_bs is non-NULL, the parent of @child must already be drained through
- * @child and the caller must hold the AioContext lock for @new_bs.
+ * @child.
*/
static void GRAPH_WRLOCK
bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs)
@@ -3041,9 +3036,8 @@ static TransactionActionDrv bdrv_attach_child_common_drv = {
*
* Returns new created child.
*
- * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
- * @child_bs can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is still correct after this.
+ * Both @parent_bs and @child_bs can move to a different AioContext in this
+ * function.
*/
static BdrvChild * GRAPH_WRLOCK
bdrv_attach_child_common(BlockDriverState *child_bs,
@@ -3142,9 +3136,8 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
/*
* Function doesn't update permissions, caller is responsible for this.
*
- * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
- * @child_bs can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is still correct after this.
+ * Both @parent_bs and @child_bs can move to a different AioContext in this
+ * function.
*
* After calling this function, the transaction @tran may only be completed
* while holding a writer lock for the graph.
@@ -3184,9 +3177,6 @@ bdrv_attach_child_noperm(BlockDriverState *parent_bs,
*
* On failure NULL is returned, errp is set and the reference to
* child_bs is also dropped.
- *
- * The caller must hold the AioContext lock @child_bs, but not that of @ctx
- * (unless @child_bs is already in @ctx).
*/
BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
const char *child_name,
@@ -3226,9 +3216,6 @@ out:
*
* On failure NULL is returned, errp is set and the reference to
* child_bs is also dropped.
- *
- * If @parent_bs and @child_bs are in different AioContexts, the caller must
- * hold the AioContext lock for @child_bs, but not for @parent_bs.
*/
BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
BlockDriverState *child_bs,
@@ -3418,9 +3405,8 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
*
* Function doesn't update permissions, caller is responsible for this.
*
- * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
- * @child_bs can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is still correct after this.
+ * Both @parent_bs and @child_bs can move to a different AioContext in this
+ * function.
*
* After calling this function, the transaction @tran may only be completed
* while holding a writer lock for the graph.
@@ -3513,9 +3499,8 @@ out:
}
/*
- * The caller must hold the AioContext lock for @backing_hd. Both @bs and
- * @backing_hd can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is still correct after this.
+ * Both @bs and @backing_hd can move to a different AioContext in this
+ * function.
*
* If a backing child is already present (i.e. we're detaching a node), that
* child node must be drained.
@@ -3574,8 +3559,6 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
* itself, all options starting with "${bdref_key}." are considered part of the
* BlockdevRef.
*
- * The caller must hold the main AioContext lock.
- *
* TODO Can this be unified with bdrv_open_image()?
*/
int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
@@ -3745,9 +3728,7 @@ done:
*
* The BlockdevRef will be removed from the options QDict.
*
- * The caller must hold the lock of the main AioContext and no other AioContext.
- * @parent can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is still correct after this.
+ * @parent can move to a different AioContext in this function.
*/
BdrvChild *bdrv_open_child(const char *filename,
QDict *options, const char *bdref_key,
@@ -3778,9 +3759,7 @@ BdrvChild *bdrv_open_child(const char *filename,
/*
* Wrapper on bdrv_open_child() for most popular case: open primary child of bs.
*
- * The caller must hold the lock of the main AioContext and no other AioContext.
- * @parent can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is still correct after this.
+ * @parent can move to a different AioContext in this function.
*/
int bdrv_open_file_child(const char *filename,
QDict *options, const char *bdref_key,
@@ -3923,8 +3902,6 @@ out:
* The reference parameter may be used to specify an existing block device which
* should be opened. If specified, neither options nor a filename may be given,
* nor can an existing BDS be reused (that is, *pbs has to be NULL).
- *
- * The caller must always hold the main AioContext lock.
*/
static BlockDriverState * no_coroutine_fn
bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
@@ -4217,7 +4194,6 @@ close_and_fail:
return NULL;
}
-/* The caller must always hold the main AioContext lock. */
BlockDriverState *bdrv_open(const char *filename, const char *reference,
QDict *options, int flags, Error **errp)
{
@@ -4665,10 +4641,7 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
*
* Return 0 on success, otherwise return < 0 and set @errp.
*
- * The caller must hold the AioContext lock of @reopen_state->bs.
* @reopen_state->bs can move to a different AioContext in this function.
- * Callers must make sure that their AioContext locking is still correct after
- * this.
*/
static int GRAPH_UNLOCKED
bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
@@ -4801,8 +4774,6 @@ out_rdlock:
* It is the responsibility of the caller to then call the abort() or
* commit() for any other BDS that have been left in a prepare() state
*
- * The caller must hold the AioContext lock of @reopen_state->bs.
- *
* After calling this function, the transaction @change_child_tran may only be
* completed while holding a writer lock for the graph.
*/
@@ -5437,8 +5408,6 @@ int bdrv_drop_filter(BlockDriverState *bs, Error **errp)
* child.
*
* This function does not create any image files.
- *
- * The caller must hold the AioContext lock for @bs_top.
*/
int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
Error **errp)
@@ -5545,9 +5514,8 @@ static void bdrv_delete(BlockDriverState *bs)
* after the call (even on failure), so if the caller intends to reuse the
* dictionary, it needs to use qobject_ref() before calling bdrv_open.
*
- * The caller holds the AioContext lock for @bs. It must make sure that @bs
- * stays in the same AioContext, i.e. @options must not refer to nodes in a
- * different AioContext.
+ * The caller must make sure that @bs stays in the same AioContext, i.e.
+ * @options must not refer to nodes in a different AioContext.
*/
BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *options,
int flags, Error **errp)
@@ -7565,10 +7533,6 @@ static TransactionActionDrv set_aio_context = {
*
* Must be called from the main AioContext.
*
- * The caller must own the AioContext lock for the old AioContext of bs, but it
- * must not own the AioContext lock for new_context (unless new_context is the
- * same as the current context of bs).
- *
* @visited will accumulate all visited BdrvChild objects. The caller is
* responsible for freeing the list afterwards.
*/
@@ -7621,13 +7585,6 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
*
* If ignore_child is not NULL, that child (and its subgraph) will not
* be touched.
- *
- * This function still requires the caller to take the bs current
- * AioContext lock, otherwise draining will fail since AIO_WAIT_WHILE
- * assumes the lock is always held if bs is in another AioContext.
- * For the same reason, it temporarily also holds the new AioContext, since
- * bdrv_drained_end calls BDRV_POLL_WHILE that assumes the lock is taken too.
- * Therefore the new AioContext lock must not be taken by the caller.
*/
int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
BdrvChild *ignore_child, Error **errp)
@@ -7653,8 +7610,8 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
/*
* Linear phase: go through all callbacks collected in the transaction.
- * Run all callbacks collected in the recursion to switch all nodes
- * AioContext lock (transaction commit), or undo all changes done in the
+ * Run all callbacks collected in the recursion to switch every node's
+ * AioContext (transaction commit), or undo all changes done in the
* recursion (transaction abort).
*/
diff --git a/block/block-backend.c b/block/block-backend.c
index f412bed274..209eb07528 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -390,8 +390,6 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
* Both sets of permissions can be changed later using blk_set_perm().
*
* Return the new BlockBackend on success, null on failure.
- *
- * Callers must hold the AioContext lock of @bs.
*/
BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
uint64_t shared_perm, Error **errp)
@@ -416,8 +414,6 @@ BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
* Just as with bdrv_open(), after having called this function the reference to
* @options belongs to the block layer (even on failure).
*
- * Called without holding an AioContext lock.
- *
* TODO: Remove @filename and @flags; it should be possible to specify a whole
* BDS tree just by specifying the @options QDict (or @reference,
* alternatively). At the time of adding this function, this is not possible,
@@ -872,8 +868,6 @@ BlockBackend *blk_by_public(BlockBackendPublic *public)
/*
* Disassociates the currently associated BlockDriverState from @blk.
- *
- * The caller must hold the AioContext lock for the BlockBackend.
*/
void blk_remove_bs(BlockBackend *blk)
{
@@ -915,8 +909,6 @@ void blk_remove_bs(BlockBackend *blk)
/*
* Associates a new BlockDriverState with @blk.
- *
- * Callers must hold the AioContext lock of @bs.
*/
int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
{
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 16f48388d3..50c358e8cd 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -278,7 +278,6 @@ static void vu_blk_exp_resize(void *opaque)
vu_config_change_msg(&vexp->vu_server.vu_dev);
}
-/* Called with vexp->export.ctx acquired */
static void vu_blk_drained_begin(void *opaque)
{
VuBlkExport *vexp = opaque;
@@ -287,7 +286,6 @@ static void vu_blk_drained_begin(void *opaque)
vhost_user_server_detach_aio_context(&vexp->vu_server);
}
-/* Called with vexp->export.blk AioContext acquired */
static void vu_blk_drained_end(void *opaque)
{
VuBlkExport *vexp = opaque;
@@ -300,8 +298,6 @@ static void vu_blk_drained_end(void *opaque)
* Ensures that bdrv_drained_begin() waits until in-flight requests complete
* and the server->co_trip coroutine has terminated. It will be restarted in
* vhost_user_server_attach_aio_context().
- *
- * Called with vexp->export.ctx acquired.
*/
static bool vu_blk_drained_poll(void *opaque)
{
diff --git a/tests/qemu-iotests/202 b/tests/qemu-iotests/202
index b784dcd791..13304242e5 100755
--- a/tests/qemu-iotests/202
+++ b/tests/qemu-iotests/202
@@ -21,7 +21,7 @@
# Check that QMP 'transaction' blockdev-snapshot-sync with multiple drives on a
# single IOThread completes successfully. This particular command triggered a
# hang due to recursive AioContext locking and BDRV_POLL_WHILE(). Protect
-# against regressions.
+# against regressions even though the AioContext lock no longer exists.
import iotests
diff --git a/tests/qemu-iotests/203 b/tests/qemu-iotests/203
index ab80fd0e44..1ba878522b 100755
--- a/tests/qemu-iotests/203
+++ b/tests/qemu-iotests/203
@@ -21,7 +21,8 @@
# Check that QMP 'migrate' with multiple drives on a single IOThread completes
# successfully. This particular command triggered a hang in the source QEMU
# process due to recursive AioContext locking in bdrv_invalidate_all() and
-# BDRV_POLL_WHILE().
+# BDRV_POLL_WHILE(). Protect against regressions even though the AioContext
+# lock no longer exists.
import iotests
--
2.43.0
* [PULL 29/33] block-coroutine-wrapper: use qemu_get_current_aio_context()
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (27 preceding siblings ...)
2023-12-21 21:23 ` [PULL 28/33] block: " Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 30/33] string-output-visitor: show structs as "<omitted>" Kevin Wolf
` (3 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Use qemu_get_current_aio_context() in mixed wrappers and coroutine
wrappers so that code runs in the caller's AioContext instead of moving
to the BlockDriverState's AioContext. This change is necessary for the
multi-queue block layer where any thread can call into the block layer.
Most wrappers are IO_CODE where it's safe to use the current AioContext
nowadays. BlockDrivers and the core block layer use their own locks and
no longer depend on the AioContext lock for thread-safety.
The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
AioContext is safe because this code is only called with the BQL held
from the main loop thread.
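To make the effect concrete, here is a hand-expanded sketch of what a
generated mixed wrapper roughly looks like after this patch (the names
and the poll helper shown are illustrative, not the exact generator
output):

    int bdrv_foo(BlockDriverState *bs)
    {
        if (qemu_in_coroutine()) {
            /* already in a coroutine, call the coroutine version directly */
            return bdrv_co_foo(bs);
        } else {
            BdrvFooCo s = {
                /* poll in the caller's context, not in bs's context */
                .poll_state.ctx = qemu_get_current_aio_context(),
                .poll_state.in_progress = true,
                .bs = bs,
            };

            s.poll_state.co = qemu_coroutine_create(bdrv_foo_entry, &s);
            bdrv_poll_co(&s.poll_state);
            return s.ret;
        }
    }

The only change relative to the previous output is the .poll_state.ctx
initializer, which was previously derived from the BlockDriverState (or
BlockBackend) argument.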
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-6-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
scripts/block-coroutine-wrapper.py | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index c9c09fcacd..dbbde99e39 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -92,8 +92,6 @@ def __init__(self, wrapper_type: str, return_type: str, name: str,
f"{self.name}")
self.target_name = f'{subsystem}_{subname}'
- self.ctx = self.gen_ctx()
-
self.get_result = 's->ret = '
self.ret = 'return s.ret;'
self.co_ret = 'return '
@@ -167,7 +165,7 @@ def create_mixed_wrapper(func: FuncDecl) -> str:
{func.co_ret}{name}({ func.gen_list('{name}') });
}} else {{
{struct_name} s = {{
- .poll_state.ctx = {func.ctx},
+ .poll_state.ctx = qemu_get_current_aio_context(),
.poll_state.in_progress = true,
{ func.gen_block(' .{name} = {name},') }
@@ -191,7 +189,7 @@ def create_co_wrapper(func: FuncDecl) -> str:
{func.return_type} {func.name}({ func.gen_list('{decl}') })
{{
{struct_name} s = {{
- .poll_state.ctx = {func.ctx},
+ .poll_state.ctx = qemu_get_current_aio_context(),
.poll_state.in_progress = true,
{ func.gen_block(' .{name} = {name},') }
--
2.43.0
* [PULL 30/33] string-output-visitor: show structs as "<omitted>"
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (28 preceding siblings ...)
2023-12-21 21:23 ` [PULL 29/33] block-coroutine-wrapper: use qemu_get_current_aio_context() Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 31/33] qdev-properties: alias all object class properties Kevin Wolf
` (2 subsequent siblings)
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
StringOutputVisitor crashes when it visits a struct because
->start_struct() is NULL.
Show "<omitted>" instead of crashing. This is necessary because the
virtio-blk-pci iothread-vq-mapping parameter that I'd like to introduce
soon is a list of IOThreadMapping structs.
This patch is a quick fix to solve the crash, but the long-term solution
is replacing StringOutputVisitor with something that can handle the full
gamut of values in QEMU.
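As an illustration (hypothetical caller code, not part of this patch),
visiting a QAPI struct through this visitor now yields a placeholder
instead of calling a NULL ->start_struct():

    char *result;
    Visitor *v = string_output_visitor_new(false, &result);

    /* any QAPI struct type works; IOThreadVirtQueueMapping is just an
     * example, and "m" is assumed to be a valid instance */
    visit_type_IOThreadVirtQueueMapping(v, NULL, &m, &error_abort);
    visit_complete(v, &result);
    /* result now contains "<omitted>" where this used to crash */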
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231212134934.500289-1-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/qapi/string-output-visitor.h | 6 +++---
qapi/string-output-visitor.c | 16 ++++++++++++++++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/include/qapi/string-output-visitor.h b/include/qapi/string-output-visitor.h
index 268dfe9986..b1ee473b30 100644
--- a/include/qapi/string-output-visitor.h
+++ b/include/qapi/string-output-visitor.h
@@ -26,9 +26,9 @@ typedef struct StringOutputVisitor StringOutputVisitor;
* If everything else succeeds, pass @result to visit_complete() to
* collect the result of the visit.
*
- * The string output visitor does not implement support for visiting
- * QAPI structs, alternates, null, or arbitrary QTypes. It also
- * requires a non-null list argument to visit_start_list().
+ * The string output visitor does not implement support for alternates, null,
+ * or arbitrary QTypes. Struct fields are not shown. It also requires a
+ * non-null list argument to visit_start_list().
*/
Visitor *string_output_visitor_new(bool human, char **result);
diff --git a/qapi/string-output-visitor.c b/qapi/string-output-visitor.c
index c0cb72dbe4..f0c1dea89e 100644
--- a/qapi/string-output-visitor.c
+++ b/qapi/string-output-visitor.c
@@ -292,6 +292,20 @@ static bool print_type_null(Visitor *v, const char *name, QNull **obj,
return true;
}
+static bool start_struct(Visitor *v, const char *name, void **obj,
+ size_t size, Error **errp)
+{
+ return true;
+}
+
+static void end_struct(Visitor *v, void **obj)
+{
+ StringOutputVisitor *sov = to_sov(v);
+
+ /* TODO actually print struct fields */
+ string_output_set(sov, g_strdup("<omitted>"));
+}
+
static bool
start_list(Visitor *v, const char *name, GenericList **list, size_t size,
Error **errp)
@@ -379,6 +393,8 @@ Visitor *string_output_visitor_new(bool human, char **result)
v->visitor.type_str = print_type_str;
v->visitor.type_number = print_type_number;
v->visitor.type_null = print_type_null;
+ v->visitor.start_struct = start_struct;
+ v->visitor.end_struct = end_struct;
v->visitor.start_list = start_list;
v->visitor.next_list = next_list;
v->visitor.end_list = end_list;
--
2.43.0
* [PULL 31/33] qdev-properties: alias all object class properties
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (29 preceding siblings ...)
2023-12-21 21:23 ` [PULL 30/33] string-output-visitor: show structs as "<omitted>" Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 32/33] qdev: add IOThreadVirtQueueMappingList property type Kevin Wolf
2023-12-21 21:23 ` [PULL 33/33] virtio-blk: add iothread-vq-mapping parameter Kevin Wolf
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
qdev_alias_all_properties() aliases a DeviceState's qdev properties onto
an Object. This is used for VirtioPCIProxy types so that --device
virtio-blk-pci has properties of its embedded --device virtio-blk-device
object.
Currently this function is implemented using qdev properties. Change the
function to use QOM object class properties instead. This works because
every qdev property also registers a QOM object class property, and
iterating over the class properties additionally catches QOM-only
properties that have no qdev counterpart.
This change ensures that properties of devices are shown with --device
foo,\? even if they are QOM object class properties.
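For context, the virtio PCI proxy plumbing invokes this function roughly
as follows (a paraphrase; see virtio_instance_init_common() for the real
code):

    void virtio_instance_init_common(Object *proxy_obj, void *data,
                                     size_t vdev_size, const char *vdev_name)
    {
        DeviceState *vdev = data;

        object_initialize_child_with_props(proxy_obj, "virtio-backend", vdev,
                                           vdev_size, vdev_name, &error_abort,
                                           NULL);
        /* with this patch, QOM class-only properties are aliased too */
        qdev_alias_all_properties(vdev, proxy_obj);
    }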
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231220134755.814917-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/hw/qdev-properties.h | 4 ++--
hw/core/qdev-properties.c | 18 ++++++++++--------
2 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 25743a29a0..09aa04ca1e 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -230,8 +230,8 @@ void qdev_property_add_static(DeviceState *dev, Property *prop);
* @target: Device which has properties to be aliased
* @source: Object to add alias properties to
*
- * Add alias properties to the @source object for all qdev properties on
- * the @target DeviceState.
+ * Add alias properties to the @source object for all properties on the @target
+ * DeviceState.
*
* This is useful when @target is an internal implementation object
* owned by @source, and you want to expose all the properties of that
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 840006e953..7d6fa726fd 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -1076,16 +1076,18 @@ void device_class_set_props(DeviceClass *dc, Property *props)
void qdev_alias_all_properties(DeviceState *target, Object *source)
{
ObjectClass *class;
- Property *prop;
+ ObjectPropertyIterator iter;
+ ObjectProperty *prop;
class = object_get_class(OBJECT(target));
- do {
- DeviceClass *dc = DEVICE_CLASS(class);
- for (prop = dc->props_; prop && prop->name; prop++) {
- object_property_add_alias(source, prop->name,
- OBJECT(target), prop->name);
+ object_class_property_iter_init(&iter, class);
+ while ((prop = object_property_iter_next(&iter))) {
+ if (object_property_find(source, prop->name)) {
+ continue; /* skip duplicate properties */
}
- class = object_class_get_parent(class);
- } while (class != object_class_by_name(TYPE_DEVICE));
+
+ object_property_add_alias(source, prop->name,
+ OBJECT(target), prop->name);
+ }
}
--
2.43.0
* [PULL 32/33] qdev: add IOThreadVirtQueueMappingList property type
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (30 preceding siblings ...)
2023-12-21 21:23 ` [PULL 31/33] qdev-properties: alias all object class properties Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
2023-12-21 21:23 ` [PULL 33/33] virtio-blk: add iothread-vq-mapping parameter Kevin Wolf
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
virtio-blk and virtio-scsi devices will need a way to specify the
mapping between IOThreads and virtqueues. At the moment all virtqueues
are assigned to a single IOThread or the main loop. This single thread
can be a CPU bottleneck, so it is necessary to allow finer-grained
assignment to spread the load.
Introduce DEFINE_PROP_IOTHREAD_VQ_MAPPING_LIST() so devices can take a
parameter that maps virtqueues to IOThreads. The command-line syntax for
this new property is as follows:
--device '{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0","vqs":[0,1,2]},...]}'
IOThreads are specified by name and virtqueues are specified by 0-based
index.
It will be common to simply assign virtqueues round-robin across a set
of IOThreads. A convenient syntax that does not require specifying
individual virtqueue indices is available:
--device '{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},...]}'
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231220134755.814917-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
qapi/virtio.json | 29 ++++++++++++++++++
include/hw/qdev-properties-system.h | 5 ++++
hw/core/qdev-properties-system.c | 46 +++++++++++++++++++++++++++++
3 files changed, 80 insertions(+)
diff --git a/qapi/virtio.json b/qapi/virtio.json
index e6dcee7b83..19c7c36e36 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -928,3 +928,32 @@
'data': { 'path': 'str', 'queue': 'uint16', '*index': 'uint16' },
'returns': 'VirtioQueueElement',
'features': [ 'unstable' ] }
+
+##
+# @IOThreadVirtQueueMapping:
+#
+# Describes the subset of virtqueues assigned to an IOThread.
+#
+# @iothread: the id of IOThread object
+#
+# @vqs: an optional array of virtqueue indices that will be handled by this
+# IOThread. When absent, virtqueues are assigned round-robin across all
+# IOThreadVirtQueueMappings provided. Either all IOThreadVirtQueueMappings
+# must have @vqs or none of them must have it.
+#
+# Since: 9.0
+##
+
+{ 'struct': 'IOThreadVirtQueueMapping',
+ 'data': { 'iothread': 'str', '*vqs': ['uint16'] } }
+
+##
+# @DummyVirtioForceArrays:
+#
+# Not used by QMP; hack to let us use IOThreadVirtQueueMappingList internally
+#
+# Since: 9.0
+##
+
+{ 'struct': 'DummyVirtioForceArrays',
+ 'data': { 'unused-iothread-vq-mapping': ['IOThreadVirtQueueMapping'] } }
diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h
index 91f7a2452d..06c359c190 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -24,6 +24,7 @@ extern const PropertyInfo qdev_prop_off_auto_pcibar;
extern const PropertyInfo qdev_prop_pcie_link_speed;
extern const PropertyInfo qdev_prop_pcie_link_width;
extern const PropertyInfo qdev_prop_cpus390entitlement;
+extern const PropertyInfo qdev_prop_iothread_vq_mapping_list;
#define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d) \
DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t)
@@ -82,4 +83,8 @@ extern const PropertyInfo qdev_prop_cpus390entitlement;
DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_cpus390entitlement, \
CpuS390Entitlement)
+#define DEFINE_PROP_IOTHREAD_VQ_MAPPING_LIST(_name, _state, _field) \
+ DEFINE_PROP(_name, _state, _field, qdev_prop_iothread_vq_mapping_list, \
+ IOThreadVirtQueueMappingList *)
+
#endif
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 73cced4626..1a396521d5 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -18,6 +18,7 @@
#include "qapi/qapi-types-block.h"
#include "qapi/qapi-types-machine.h"
#include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-visit-virtio.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ctype.h"
#include "qemu/cutils.h"
@@ -1160,3 +1161,48 @@ const PropertyInfo qdev_prop_cpus390entitlement = {
.set = qdev_propinfo_set_enum,
.set_default_value = qdev_propinfo_set_default_value_enum,
};
+
+/* --- IOThreadVirtQueueMappingList --- */
+
+static void get_iothread_vq_mapping_list(Object *obj, Visitor *v,
+ const char *name, void *opaque, Error **errp)
+{
+ IOThreadVirtQueueMappingList **prop_ptr =
+ object_field_prop_ptr(obj, opaque);
+
+ visit_type_IOThreadVirtQueueMappingList(v, name, prop_ptr, errp);
+}
+
+static void set_iothread_vq_mapping_list(Object *obj, Visitor *v,
+ const char *name, void *opaque, Error **errp)
+{
+ IOThreadVirtQueueMappingList **prop_ptr =
+ object_field_prop_ptr(obj, opaque);
+ IOThreadVirtQueueMappingList *list;
+
+ if (!visit_type_IOThreadVirtQueueMappingList(v, name, &list, errp)) {
+ return;
+ }
+
+ qapi_free_IOThreadVirtQueueMappingList(*prop_ptr);
+ *prop_ptr = list;
+}
+
+static void release_iothread_vq_mapping_list(Object *obj,
+ const char *name, void *opaque)
+{
+ IOThreadVirtQueueMappingList **prop_ptr =
+ object_field_prop_ptr(obj, opaque);
+
+ qapi_free_IOThreadVirtQueueMappingList(*prop_ptr);
+ *prop_ptr = NULL;
+}
+
+const PropertyInfo qdev_prop_iothread_vq_mapping_list = {
+ .name = "IOThreadVirtQueueMappingList",
+ .description = "IOThread virtqueue mapping list [{\"iothread\":\"<id>\", "
+ "\"vqs\":[1,2,3,...]},...]",
+ .get = get_iothread_vq_mapping_list,
+ .set = set_iothread_vq_mapping_list,
+ .release = release_iothread_vq_mapping_list,
+};
--
2.43.0
* [PULL 33/33] virtio-blk: add iothread-vq-mapping parameter
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
` (31 preceding siblings ...)
2023-12-21 21:23 ` [PULL 32/33] qdev: add IOThreadVirtQueueMappingList property type Kevin Wolf
@ 2023-12-21 21:23 ` Kevin Wolf
32 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-12-21 21:23 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, stefanha, qemu-devel
From: Stefan Hajnoczi <stefanha@redhat.com>
Add the iothread-vq-mapping parameter to assign virtqueues to IOThreads.
Store the vq:AioContext mapping in the new struct
VirtIOBlockDataPlane->vq_aio_context[] field and refactor the code to
use the per-vq AioContext instead of the BlockDriverState's AioContext.
Reimplement --device virtio-blk-pci,iothread= and non-IOThread mode by
assigning all virtqueues to the IOThread and main loop's AioContext in
vq_aio_context[], respectively.
The comment in struct VirtIOBlockDataPlane about EventNotifiers is
stale. Remove it.
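An end-to-end invocation might look like this (illustrative IOThread IDs
and drive name):

    --object iothread,id=iothread0 \
    --object iothread,id=iothread1 \
    --device '{"driver":"virtio-blk-pci","drive":"drive0","num-queues":4,"iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"}]}'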
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231220134755.814917-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/block/dataplane/virtio-blk.h | 3 +
include/hw/virtio/virtio-blk.h | 2 +
hw/block/dataplane/virtio-blk.c | 155 ++++++++++++++++++++++++--------
hw/block/virtio-blk.c | 92 ++++++++++++++++---
4 files changed, 202 insertions(+), 50 deletions(-)
diff --git a/hw/block/dataplane/virtio-blk.h b/hw/block/dataplane/virtio-blk.h
index 5e18bb99ae..1a806fe447 100644
--- a/hw/block/dataplane/virtio-blk.h
+++ b/hw/block/dataplane/virtio-blk.h
@@ -28,4 +28,7 @@ void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq);
int virtio_blk_data_plane_start(VirtIODevice *vdev);
void virtio_blk_data_plane_stop(VirtIODevice *vdev);
+void virtio_blk_data_plane_detach(VirtIOBlockDataPlane *s);
+void virtio_blk_data_plane_attach(VirtIOBlockDataPlane *s);
+
#endif /* HW_DATAPLANE_VIRTIO_BLK_H */
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 9881009c22..5e4091e4da 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -21,6 +21,7 @@
#include "sysemu/block-backend.h"
#include "sysemu/block-ram-registrar.h"
#include "qom/object.h"
+#include "qapi/qapi-types-virtio.h"
#define TYPE_VIRTIO_BLK "virtio-blk-device"
OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK)
@@ -37,6 +38,7 @@ struct VirtIOBlkConf
{
BlockConf conf;
IOThread *iothread;
+ IOThreadVirtQueueMappingList *iothread_vq_mapping_list;
char *serial;
uint32_t request_merging;
uint16_t num_queues;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 7bbbd981ad..6debd4401e 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -32,13 +32,11 @@ struct VirtIOBlockDataPlane {
VirtIOBlkConf *conf;
VirtIODevice *vdev;
- /* Note that these EventNotifiers are assigned by value. This is
- * fine as long as you do not call event_notifier_cleanup on them
- * (because you don't own the file descriptor or handle; you just
- * use it).
+ /*
+ * The AioContext for each virtqueue. The BlockDriverState will use the
+ * first element as its AioContext.
*/
- IOThread *iothread;
- AioContext *ctx;
+ AioContext **vq_aio_context;
};
/* Raise an interrupt to signal guest, if necessary */
@@ -47,6 +45,45 @@ void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
virtio_notify_irqfd(s->vdev, vq);
}
+/* Generate vq:AioContext mappings from a validated iothread-vq-mapping list */
+static void
+apply_vq_mapping(IOThreadVirtQueueMappingList *iothread_vq_mapping_list,
+ AioContext **vq_aio_context, uint16_t num_queues)
+{
+ IOThreadVirtQueueMappingList *node;
+ size_t num_iothreads = 0;
+ size_t cur_iothread = 0;
+
+ for (node = iothread_vq_mapping_list; node; node = node->next) {
+ num_iothreads++;
+ }
+
+ for (node = iothread_vq_mapping_list; node; node = node->next) {
+ IOThread *iothread = iothread_by_id(node->value->iothread);
+ AioContext *ctx = iothread_get_aio_context(iothread);
+
+ /* Released in virtio_blk_data_plane_destroy() */
+ object_ref(OBJECT(iothread));
+
+ if (node->value->vqs) {
+ uint16List *vq;
+
+ /* Explicit vq:IOThread assignment */
+ for (vq = node->value->vqs; vq; vq = vq->next) {
+ vq_aio_context[vq->value] = ctx;
+ }
+ } else {
+ /* Round-robin vq:IOThread assignment */
+ for (unsigned i = cur_iothread; i < num_queues;
+ i += num_iothreads) {
+ vq_aio_context[i] = ctx;
+ }
+ }
+
+ cur_iothread++;
+ }
+}
+
/* Context: QEMU global mutex held */
bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
VirtIOBlockDataPlane **dataplane,
@@ -58,7 +95,7 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
*dataplane = NULL;
- if (conf->iothread) {
+ if (conf->iothread || conf->iothread_vq_mapping_list) {
if (!k->set_guest_notifiers || !k->ioeventfd_assign) {
error_setg(errp,
"device is incompatible with iothread "
@@ -86,13 +123,24 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
s = g_new0(VirtIOBlockDataPlane, 1);
s->vdev = vdev;
s->conf = conf;
+ s->vq_aio_context = g_new(AioContext *, conf->num_queues);
+
+ if (conf->iothread_vq_mapping_list) {
+ apply_vq_mapping(conf->iothread_vq_mapping_list, s->vq_aio_context,
+ conf->num_queues);
+ } else if (conf->iothread) {
+ AioContext *ctx = iothread_get_aio_context(conf->iothread);
+ for (unsigned i = 0; i < conf->num_queues; i++) {
+ s->vq_aio_context[i] = ctx;
+ }
- if (conf->iothread) {
- s->iothread = conf->iothread;
- object_ref(OBJECT(s->iothread));
- s->ctx = iothread_get_aio_context(s->iothread);
+ /* Released in virtio_blk_data_plane_destroy() */
+ object_ref(OBJECT(conf->iothread));
} else {
- s->ctx = qemu_get_aio_context();
+ AioContext *ctx = qemu_get_aio_context();
+ for (unsigned i = 0; i < conf->num_queues; i++) {
+ s->vq_aio_context[i] = ctx;
+ }
}
*dataplane = s;
@@ -104,6 +152,7 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
{
VirtIOBlock *vblk;
+ VirtIOBlkConf *conf = s->conf;
if (!s) {
return;
@@ -111,9 +160,21 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
vblk = VIRTIO_BLK(s->vdev);
assert(!vblk->dataplane_started);
- if (s->iothread) {
- object_unref(OBJECT(s->iothread));
+
+ if (conf->iothread_vq_mapping_list) {
+ IOThreadVirtQueueMappingList *node;
+
+ for (node = conf->iothread_vq_mapping_list; node; node = node->next) {
+ IOThread *iothread = iothread_by_id(node->value->iothread);
+ object_unref(OBJECT(iothread));
+ }
+ }
+
+ if (conf->iothread) {
+ object_unref(OBJECT(conf->iothread));
}
+
+ g_free(s->vq_aio_context);
g_free(s);
}
@@ -177,19 +238,13 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
trace_virtio_blk_data_plane_start(s);
- r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
+ r = blk_set_aio_context(s->conf->conf.blk, s->vq_aio_context[0],
+ &local_err);
if (r < 0) {
error_report_err(local_err);
goto fail_aio_context;
}
- /* Kick right away to begin processing requests already in vring */
- for (i = 0; i < nvqs; i++) {
- VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
- event_notifier_set(virtio_queue_get_host_notifier(vq));
- }
-
/*
* These fields must be visible to the IOThread when it processes the
* virtqueue, otherwise it will think dataplane has not started yet.
@@ -206,8 +261,12 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
if (!blk_in_drain(s->conf->conf.blk)) {
for (i = 0; i < nvqs; i++) {
VirtQueue *vq = virtio_get_queue(s->vdev, i);
+ AioContext *ctx = s->vq_aio_context[i];
- virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+ /* Kick right away to begin processing requests already in vring */
+ event_notifier_set(virtio_queue_get_host_notifier(vq));
+
+ virtio_queue_aio_attach_host_notifier(vq, ctx);
}
}
return 0;
@@ -236,23 +295,18 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
*
* Context: BH in IOThread
*/
-static void virtio_blk_data_plane_stop_bh(void *opaque)
+static void virtio_blk_data_plane_stop_vq_bh(void *opaque)
{
- VirtIOBlockDataPlane *s = opaque;
- unsigned i;
-
- for (i = 0; i < s->conf->num_queues; i++) {
- VirtQueue *vq = virtio_get_queue(s->vdev, i);
- EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq);
+ VirtQueue *vq = opaque;
+ EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq);
- virtio_queue_aio_detach_host_notifier(vq, s->ctx);
+ virtio_queue_aio_detach_host_notifier(vq, qemu_get_current_aio_context());
- /*
- * Test and clear notifier after disabling event, in case poll callback
- * didn't have time to run.
- */
- virtio_queue_host_notifier_read(host_notifier);
- }
+ /*
+ * Test and clear notifier after disabling event, in case poll callback
+ * didn't have time to run.
+ */
+ virtio_queue_host_notifier_read(host_notifier);
}
/* Context: QEMU global mutex held */
@@ -279,7 +333,12 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
trace_virtio_blk_data_plane_stop(s);
if (!blk_in_drain(s->conf->conf.blk)) {
- aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
+ for (i = 0; i < nvqs; i++) {
+ VirtQueue *vq = virtio_get_queue(s->vdev, i);
+ AioContext *ctx = s->vq_aio_context[i];
+
+ aio_wait_bh_oneshot(ctx, virtio_blk_data_plane_stop_vq_bh, vq);
+ }
}
/*
@@ -322,3 +381,23 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
s->stopping = false;
}
+
+void virtio_blk_data_plane_detach(VirtIOBlockDataPlane *s)
+{
+ VirtIODevice *vdev = VIRTIO_DEVICE(s->vdev);
+
+ for (uint16_t i = 0; i < s->conf->num_queues; i++) {
+ VirtQueue *vq = virtio_get_queue(vdev, i);
+ virtio_queue_aio_detach_host_notifier(vq, s->vq_aio_context[i]);
+ }
+}
+
+void virtio_blk_data_plane_attach(VirtIOBlockDataPlane *s)
+{
+ VirtIODevice *vdev = VIRTIO_DEVICE(s->vdev);
+
+ for (uint16_t i = 0; i < s->conf->num_queues; i++) {
+ VirtQueue *vq = virtio_get_queue(vdev, i);
+ virtio_queue_aio_attach_host_notifier(vq, s->vq_aio_context[i]);
+ }
+}
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index ec9ed09a6a..46e73b2c96 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1151,6 +1151,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
return;
}
}
+
virtio_blk_handle_vq(s, vq);
}
@@ -1463,6 +1464,68 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
return 0;
}
+static bool
+validate_iothread_vq_mapping_list(IOThreadVirtQueueMappingList *list,
+ uint16_t num_queues, Error **errp)
+{
+ g_autofree unsigned long *vqs = bitmap_new(num_queues);
+ g_autoptr(GHashTable) iothreads =
+ g_hash_table_new(g_str_hash, g_str_equal);
+
+ for (IOThreadVirtQueueMappingList *node = list; node; node = node->next) {
+ const char *name = node->value->iothread;
+ uint16List *vq;
+
+ if (!iothread_by_id(name)) {
+ error_setg(errp, "IOThread \"%s\" object does not exist", name);
+ return false;
+ }
+
+ if (!g_hash_table_add(iothreads, (gpointer)name)) {
+ error_setg(errp,
+ "duplicate IOThread name \"%s\" in iothread-vq-mapping",
+ name);
+ return false;
+ }
+
+ if (node != list) {
+ if (!!node->value->vqs != !!list->value->vqs) {
+ error_setg(errp, "either all items in iothread-vq-mapping "
+ "must have vqs or none of them must have it");
+ return false;
+ }
+ }
+
+ for (vq = node->value->vqs; vq; vq = vq->next) {
+ if (vq->value >= num_queues) {
+ error_setg(errp, "vq index %u for IOThread \"%s\" must be "
+ "less than num_queues %u in iothread-vq-mapping",
+ vq->value, name, num_queues);
+ return false;
+ }
+
+ if (test_and_set_bit(vq->value, vqs)) {
+ error_setg(errp, "cannot assign vq %u to IOThread \"%s\" "
+ "because it is already assigned", vq->value, name);
+ return false;
+ }
+ }
+ }
+
+ if (list->value->vqs) {
+ for (uint16_t i = 0; i < num_queues; i++) {
+ if (!test_bit(i, vqs)) {
+ error_setg(errp,
+ "missing vq %u IOThread assignment in iothread-vq-mapping",
+ i);
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
static void virtio_resize_cb(void *opaque)
{
VirtIODevice *vdev = opaque;
@@ -1487,34 +1550,24 @@ static void virtio_blk_resize(void *opaque)
static void virtio_blk_drained_begin(void *opaque)
{
VirtIOBlock *s = opaque;
- VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
- AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);
if (!s->dataplane || !s->dataplane_started) {
return;
}
- for (uint16_t i = 0; i < s->conf.num_queues; i++) {
- VirtQueue *vq = virtio_get_queue(vdev, i);
- virtio_queue_aio_detach_host_notifier(vq, ctx);
- }
+ virtio_blk_data_plane_detach(s->dataplane);
}
/* Resume virtqueue ioeventfd processing after drain */
static void virtio_blk_drained_end(void *opaque)
{
VirtIOBlock *s = opaque;
- VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
- AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);
if (!s->dataplane || !s->dataplane_started) {
return;
}
- for (uint16_t i = 0; i < s->conf.num_queues; i++) {
- VirtQueue *vq = virtio_get_queue(vdev, i);
- virtio_queue_aio_attach_host_notifier(vq, ctx);
- }
+ virtio_blk_data_plane_attach(s->dataplane);
}
static const BlockDevOps virtio_block_ops = {
@@ -1600,6 +1653,19 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
return;
}
+ if (conf->iothread_vq_mapping_list) {
+ if (conf->iothread) {
+ error_setg(errp, "iothread and iothread-vq-mapping properties "
+ "cannot be set at the same time");
+ return;
+ }
+
+ if (!validate_iothread_vq_mapping_list(conf->iothread_vq_mapping_list,
+ conf->num_queues, errp)) {
+ return;
+ }
+ }
+
s->config_size = virtio_get_config_size(&virtio_blk_cfg_size_params,
s->host_features);
virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
@@ -1702,6 +1768,8 @@ static Property virtio_blk_properties[] = {
DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, true),
DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
IOThread *),
+ DEFINE_PROP_IOTHREAD_VQ_MAPPING_LIST("iothread-vq-mapping", VirtIOBlock,
+ conf.iothread_vq_mapping_list),
DEFINE_PROP_BIT64("discard", VirtIOBlock, host_features,
VIRTIO_BLK_F_DISCARD, true),
DEFINE_PROP_BOOL("report-discard-granularity", VirtIOBlock,
--
2.43.0
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2023-12-21 21:23 ` [PULL 11/33] scsi: only access SCSIDevice->requests from one thread Kevin Wolf
@ 2024-01-23 16:40 ` Hanna Czenczek
2024-01-23 17:10 ` Kevin Wolf
2024-01-23 17:21 ` Hanna Czenczek
0 siblings, 2 replies; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-23 16:40 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: stefanha, qemu-devel
On 21.12.23 22:23, Kevin Wolf wrote:
> From: Stefan Hajnoczi <stefanha@redhat.com>
>
> Stop depending on the AioContext lock and instead access
> SCSIDevice->requests from only one thread at a time:
> - When the VM is running only the BlockBackend's AioContext may access
> the requests list.
> - When the VM is stopped only the main loop may access the requests
> list.
>
> These constraints protect the requests list without the need for locking
> in the I/O code path.
>
> Note that multiple IOThreads are not supported yet because the code
> assumes all SCSIRequests are executed from a single AioContext. Leave
> that as future work.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> include/hw/scsi/scsi.h | 7 +-
> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
> 2 files changed, 131 insertions(+), 57 deletions(-)
My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks
more often because of this commit than because of the original bug, i.e.
when repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd
device, this tends to happen when unplugging the scsi-hd:
{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}
qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
Assertion `ctx == blk->ctx' failed.
(gdb) bt
#0 0x00007f32a668d83c in () at /usr/lib/libc.so.6
#1 0x00007f32a663d668 in raise () at /usr/lib/libc.so.6
#2 0x00007f32a66254b8 in abort () at /usr/lib/libc.so.6
#3 0x00007f32a66253dc in () at /usr/lib/libc.so.6
#4 0x00007f32a6635d26 in () at /usr/lib/libc.so.6
#5 0x0000556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at
../block/block-backend.c:2429
#6 blk_get_aio_context (blk=0x556e6e89ccf0) at
../block/block-backend.c:2417
#7 0x0000556e6b112d87 in scsi_device_for_each_req_async_bh
(opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
#8 0x0000556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at
../util/async.c:218
#9 0x0000556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290,
blocking=blocking@entry=true) at ../util/aio-posix.c:722
#10 0x0000556e6b4564b6 in iothread_run
(opaque=opaque@entry=0x556e6d89d920) at ../iothread.c:63
#11 0x0000556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at
../util/qemu-thread-posix.c:541
#12 0x00007f32a668b9eb in () at /usr/lib/libc.so.6
#13 0x00007f32a670f7cc in () at /usr/lib/libc.so.6
I don’t know anything about the problem yet, but as usual, I like
speculation and discovering how wrong I was later on, so one thing I
came across that’s funny about virtio-scsi is that requests can happen
even while a disk is being attached or detached. That is, Linux seems
to probe all LUNs when a new virtio-scsi device is being attached, and
it won’t stop just because a disk is being attached or removed. So
maybe that’s part of the problem, that we get a request while the BB is
being detached, and temporarily in an inconsistent state (BDS context
differs from BB context).
I’ll look more into it.
Hanna
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-23 16:40 ` Hanna Czenczek
@ 2024-01-23 17:10 ` Kevin Wolf
2024-01-23 17:23 ` Hanna Czenczek
` (3 more replies)
2024-01-23 17:21 ` Hanna Czenczek
1 sibling, 4 replies; 57+ messages in thread
From: Kevin Wolf @ 2024-01-23 17:10 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: qemu-block, stefanha, qemu-devel
On 23.01.2024 17:40, Hanna Czenczek wrote:
> On 21.12.23 22:23, Kevin Wolf wrote:
> > From: Stefan Hajnoczi <stefanha@redhat.com>
> >
> > Stop depending on the AioContext lock and instead access
> > SCSIDevice->requests from only one thread at a time:
> > - When the VM is running only the BlockBackend's AioContext may access
> > the requests list.
> > - When the VM is stopped only the main loop may access the requests
> > list.
> >
> > These constraints protect the requests list without the need for locking
> > in the I/O code path.
> >
> > Note that multiple IOThreads are not supported yet because the code
> > assumes all SCSIRequests are executed from a single AioContext. Leave
> > that as future work.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > include/hw/scsi/scsi.h | 7 +-
> > hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
> > 2 files changed, 131 insertions(+), 57 deletions(-)
>
> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> often because of this commit than because of the original bug, i.e. when
> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> this tends to happen when unplugging the scsi-hd:
>
> {"execute":"device_del","arguments":{"id":"stg0"}}
> {"return": {}}
> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> Assertion `ctx == blk->ctx' failed.
>
> (gdb) bt
> #0 0x00007f32a668d83c in () at /usr/lib/libc.so.6
> #1 0x00007f32a663d668 in raise () at /usr/lib/libc.so.6
> #2 0x00007f32a66254b8 in abort () at /usr/lib/libc.so.6
> #3 0x00007f32a66253dc in () at /usr/lib/libc.so.6
> #4 0x00007f32a6635d26 in () at /usr/lib/libc.so.6
> #5 0x0000556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at
> ../block/block-backend.c:2429
> #6 blk_get_aio_context (blk=0x556e6e89ccf0) at
> ../block/block-backend.c:2417
> #7 0x0000556e6b112d87 in scsi_device_for_each_req_async_bh
> (opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
> #8 0x0000556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at
> ../util/async.c:218
> #9 0x0000556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290,
> blocking=blocking@entry=true) at ../util/aio-posix.c:722
> #10 0x0000556e6b4564b6 in iothread_run (opaque=opaque@entry=0x556e6d89d920)
> at ../iothread.c:63
> #11 0x0000556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at
> ../util/qemu-thread-posix.c:541
> #12 0x00007f32a668b9eb in () at /usr/lib/libc.so.6
> #13 0x00007f32a670f7cc in () at /usr/lib/libc.so.6
>
> I don’t know anything about the problem yet, but as usual, I like
> speculation and discovering how wrong I was later on, so one thing I came
> across that’s funny about virtio-scsi is that requests can happen even while
> a disk is being attached or detached. That is, Linux seems to probe all
> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
> because a disk is being attached or removed. So maybe that’s part of the
> problem, that we get a request while the BB is being detached, and
> temporarily in an inconsistent state (BDS context differs from BB context).
I don't know anything about the problem either, but since you already
speculated about the cause, let me speculate about the solution:
Can we hold the graph writer lock for the tran_commit() call in
bdrv_try_change_aio_context()? And of course take the reader lock for
blk_get_aio_context(), but that should be completely unproblematic.
At the first sight I don't see a reason why this would break something,
but I've learnt not to trust my first impression with the graph locking
work...
Of course, I also didn't check if there are more things inside of the
device emulation that need additional locking in this case, too. But
even if so, blk_get_aio_context() should never see an inconsistent
state.
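Something like this completely untested sketch is what I have in mind
(and I haven't audited the callers for it):

    int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
                                    BdrvChild *ignore_child, Error **errp)
    {
        ...
        /* Commit under the writer lock so no reader can observe a
         * BlockBackend whose context differs from that of its BDS */
        bdrv_graph_wrlock();
        tran_commit(tran);
        bdrv_graph_wrunlock();
        ...
    }

blk_get_aio_context() would then take the reader lock, which should be
cheap enough on the I/O path.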
Kevin
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-23 16:40 ` Hanna Czenczek
2024-01-23 17:10 ` Kevin Wolf
@ 2024-01-23 17:21 ` Hanna Czenczek
1 sibling, 0 replies; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-23 17:21 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: stefanha, qemu-devel
On 23.01.24 17:40, Hanna Czenczek wrote:
> On 21.12.23 22:23, Kevin Wolf wrote:
>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>
>> Stop depending on the AioContext lock and instead access
>> SCSIDevice->requests from only one thread at a time:
>> - When the VM is running only the BlockBackend's AioContext may access
>> the requests list.
>> - When the VM is stopped only the main loop may access the requests
>> list.
>>
>> These constraints protect the requests list without the need for locking
>> in the I/O code path.
>>
>> Note that multiple IOThreads are not supported yet because the code
>> assumes all SCSIRequests are executed from a single AioContext. Leave
>> that as future work.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>> include/hw/scsi/scsi.h | 7 +-
>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>> 2 files changed, 131 insertions(+), 57 deletions(-)
[...]
> I don’t know anything about the problem yet, but as usual, I like
> speculation and discovering how wrong I was later on, so one thing I
> came across that’s funny about virtio-scsi is that requests can happen
> even while a disk is being attached or detached. That is, Linux seems
> to probe all LUNs when a new virtio-scsi device is being attached, and
> it won’t stop just because a disk is being attached or removed. So
> maybe that’s part of the problem, that we get a request while the BB
> is being detached, and temporarily in an inconsistent state (BDS
> context differs from BB context).
>
> I’ll look more into it.
What I think happens is that scsi_device_purge_requests() runs (perhaps
through virtio_scsi_hotunplug() -> qdev_simple_device_unplug_cb() ->
scsi_qdev_unrealize()?), which schedules
scsi_device_for_each_req_async_bh() to run, but doesn’t await it. We go
on, begin to move the BB and its BDS back to the main context (via
blk_set_aio_context() in virtio_scsi_hotunplug()), but
scsi_device_for_each_req_async_bh() still runs in the I/O thread, it
calls blk_get_aio_context() while the contexts are inconsistent, and we
get the crash.
There is a comment above blk_get_aio_context() in
scsi_device_for_each_req_async_bh() about the BB potentially being moved
to a different context prior to the BH running, but it doesn’t consider
the possibility that that move may occur *concurrently*.
I don’t know how to fix this, though. The whole idea of anything
happening to a BB while it is being moved to a different context seems
so wrong to me that I’d want to slap a big lock on it, but I have the
feeling that that isn’t what we want.
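In timeline form, the interleaving I suspect is roughly this (my
reconstruction, not verified yet):

    main loop: scsi_device_purge_requests()
               -> schedules scsi_device_for_each_req_async_bh() in the
                  BB's current (IOThread) context, without awaiting it
    main loop: blk_set_aio_context(blk, qemu_get_aio_context())
               -> BDS and BB contexts are briefly inconsistent
    iothread:  scsi_device_for_each_req_async_bh()
               -> blk_get_aio_context(blk)
               -> assertion `ctx == blk->ctx' fails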
Hanna
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-23 17:10 ` Kevin Wolf
@ 2024-01-23 17:23 ` Hanna Czenczek
2024-01-24 12:12 ` Hanna Czenczek
` (2 subsequent siblings)
3 siblings, 0 replies; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-23 17:23 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, stefanha, qemu-devel
On 23.01.24 18:10, Kevin Wolf wrote:
> Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
>> On 21.12.23 22:23, Kevin Wolf wrote:
>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>
>>> Stop depending on the AioContext lock and instead access
>>> SCSIDevice->requests from only one thread at a time:
>>> - When the VM is running only the BlockBackend's AioContext may access
>>> the requests list.
>>> - When the VM is stopped only the main loop may access the requests
>>> list.
>>>
>>> These constraints protect the requests list without the need for locking
>>> in the I/O code path.
>>>
>>> Note that multiple IOThreads are not supported yet because the code
>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>> that as future work.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>> include/hw/scsi/scsi.h | 7 +-
>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>> often because of this commit than because of the original bug, i.e. when
>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>> this tends to happen when unplugging the scsi-hd:
>>
>> {"execute":"device_del","arguments":{"id":"stg0"}}
>> {"return": {}}
>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>> Assertion `ctx == blk->ctx' failed.
>>
>> (gdb) bt
>> #0 0x00007f32a668d83c in () at /usr/lib/libc.so.6
>> #1 0x00007f32a663d668 in raise () at /usr/lib/libc.so.6
>> #2 0x00007f32a66254b8 in abort () at /usr/lib/libc.so.6
>> #3 0x00007f32a66253dc in () at /usr/lib/libc.so.6
>> #4 0x00007f32a6635d26 in () at /usr/lib/libc.so.6
>> #5 0x0000556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at
>> ../block/block-backend.c:2429
>> #6 blk_get_aio_context (blk=0x556e6e89ccf0) at
>> ../block/block-backend.c:2417
>> #7 0x0000556e6b112d87 in scsi_device_for_each_req_async_bh
>> (opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
>> #8 0x0000556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at
>> ../util/async.c:218
>> #9 0x0000556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290,
>> blocking=blocking@entry=true) at ../util/aio-posix.c:722
>> #10 0x0000556e6b4564b6 in iothread_run (opaque=opaque@entry=0x556e6d89d920)
>> at ../iothread.c:63
>> #11 0x0000556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at
>> ../util/qemu-thread-posix.c:541
>> #12 0x00007f32a668b9eb in () at /usr/lib/libc.so.6
>> #13 0x00007f32a670f7cc in () at /usr/lib/libc.so.6
>>
>> I don’t know anything about the problem yet, but as usual, I like
>> speculation and discovering how wrong I was later on, so one thing I came
>> across that’s funny about virtio-scsi is that requests can happen even while
>> a disk is being attached or detached. That is, Linux seems to probe all
>> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
>> because a disk is being attached or removed. So maybe that’s part of the
>> problem, that we get a request while the BB is being detached, and
>> temporarily in an inconsistent state (BDS context differs from BB context).
> I don't know anything about the problem either, but since you already
> speculated about the cause, let me speculate about the solution:
> Can we hold the graph writer lock for the tran_commit() call in
> bdrv_try_change_aio_context()? And of course take the reader lock for
> blk_get_aio_context(), but that should be completely unproblematic.
>
> At the first sight I don't see a reason why this would break something,
> but I've learnt not to trust my first impression with the graph locking
> work...
>
> Of course, I also didn't check if there are more things inside of the
> device emulation that need additional locking in this case, too. But
> even if so, blk_get_aio_context() should never see an inconsistent
> state.
Ah, sorry, saw your reply only now that I hit “send”.
I forgot that we do have that big lock that I’d rather have avoided :)
Sounds good and very reasonable to me. Changing the contexts in the
graph sounds like a graph change operation, and reading and comparing
contexts in the graph sounds like reading the graph.
Hanna
[-- Attachment #2: Type: text/html, Size: 5484 bytes --]
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-23 17:10 ` Kevin Wolf
2024-01-23 17:23 ` Hanna Czenczek
@ 2024-01-24 12:12 ` Hanna Czenczek
2024-01-24 21:53 ` Stefan Hajnoczi
2024-01-25 17:32 ` Hanna Czenczek
2024-01-29 16:30 ` Hanna Czenczek
3 siblings, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-24 12:12 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, stefanha, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 3284 bytes --]
On 23.01.24 18:10, Kevin Wolf wrote:
> Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
>> On 21.12.23 22:23, Kevin Wolf wrote:
>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>
>>> Stop depending on the AioContext lock and instead access
>>> SCSIDevice->requests from only one thread at a time:
>>> - When the VM is running only the BlockBackend's AioContext may access
>>> the requests list.
>>> - When the VM is stopped only the main loop may access the requests
>>> list.
>>>
>>> These constraints protect the requests list without the need for locking
>>> in the I/O code path.
>>>
>>> Note that multiple IOThreads are not supported yet because the code
>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>> that as future work.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>> include/hw/scsi/scsi.h | 7 +-
>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>> often because of this commit than because of the original bug, i.e. when
>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>> this tends to happen when unplugging the scsi-hd:
>>
>> {"execute":"device_del","arguments":{"id":"stg0"}}
>> {"return": {}}
>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>> Assertion `ctx == blk->ctx' failed.
[...]
> I don't know anything about the problem either, but since you already
> speculated about the cause, let me speculate about the solution:
> Can we hold the graph writer lock for the tran_commit() call in
> bdrv_try_change_aio_context()? And of course take the reader lock for
> blk_get_aio_context(), but that should be completely unproblematic.
I tried this, and it’s not easy taking the lock just for tran_commit(),
because some callers of bdrv_try_change_aio_context() already hold the
write lock (specifically bdrv_attach_child_common(),
bdrv_attach_child_common_abort(), and bdrv_root_unref_child()[1]), and
qmp_x_blockdev_set_iothread() holds the read lock. Other callers don’t
hold any lock[2].
So I’m not sure whether we should mark all of
bdrv_try_change_aio_context() as GRAPH_WRLOCK and then make all callers
take the lock, or really only take it for tran_commit(), and have
callers release the lock around bdrv_try_change_aio_context(). Former
sounds better to naïve me.
(In any case, FWIW, having blk_set_aio_context() take the write lock,
and scsi_device_for_each_req_async_bh() take the read lock[3], does fix
the assertion failure.)
Hanna
[1] bdrv_root_unref_child() is not marked as GRAPH_WRLOCK, but its
callers generally seem to ensure that the lock is taken when calling it.
[2] blk_set_aio_context() (evidently), blk_exp_add(),
external_snapshot_abort(), {blockdev,drive}_backup_action(),
qmp_{blockdev,drive}_mirror()
[3] I’ve made the _bh a coroutine (for bdrv_graph_co_rdlock()) and
replaced the aio_bh_schedule_oneshot() by aio_co_enter() – hope that’s
right.
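To sketch what [3] amounts to (names paraphrased, shape only, not the
exact code I tested):

    /* The BH body becomes a coroutine so it can take the graph read lock
     * around the loop; data->s, data->fn, data->fn_opaque as in the
     * existing BH. */
    static void coroutine_fn scsi_device_for_each_req_async_co(void *opaque)
    {
        SCSIDeviceForEachReqAsyncData *data = opaque;
        SCSIDevice *s = data->s;
        SCSIRequest *req, *next;

        bdrv_graph_co_rdlock();
        QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
            data->fn(req, data->fn_opaque);
        }
        bdrv_graph_co_rdunlock();

        /* Drop the reference taken by scsi_device_for_each_req_async() */
        object_unref(OBJECT(s));
        g_free(data);
    }

    /* ...and the caller enters it instead of scheduling a BH: */
    Coroutine *co = qemu_coroutine_create(scsi_device_for_each_req_async_co,
                                          data);
    aio_co_enter(blk_get_aio_context(s->conf.blk), co);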
[-- Attachment #2: Type: text/html, Size: 4598 bytes --]
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-24 12:12 ` Hanna Czenczek
@ 2024-01-24 21:53 ` Stefan Hajnoczi
2024-01-25 9:06 ` Hanna Czenczek
0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2024-01-24 21:53 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: Kevin Wolf, qemu-block, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 4039 bytes --]
On Wed, Jan 24, 2024 at 01:12:47PM +0100, Hanna Czenczek wrote:
> On 23.01.24 18:10, Kevin Wolf wrote:
> > Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > >
> > > > Stop depending on the AioContext lock and instead access
> > > > SCSIDevice->requests from only one thread at a time:
> > > > - When the VM is running only the BlockBackend's AioContext may access
> > > > the requests list.
> > > > - When the VM is stopped only the main loop may access the requests
> > > > list.
> > > >
> > > > These constraints protect the requests list without the need for locking
> > > > in the I/O code path.
> > > >
> > > > Note that multiple IOThreads are not supported yet because the code
> > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > that as future work.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > > Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
> > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > ---
> > > > include/hw/scsi/scsi.h | 7 +-
> > > > hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
> > > > 2 files changed, 131 insertions(+), 57 deletions(-)
> > > My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> > > often because of this commit than because of the original bug, i.e. when
> > > repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> > > this tends to happen when unplugging the scsi-hd:
> > >
> > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > {"return": {}}
> > > qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> > > Assertion `ctx == blk->ctx' failed.
>
> [...]
>
> > I don't know anything about the problem either, but since you already
> > speculated about the cause, let me speculate about the solution:
> > Can we hold the graph writer lock for the tran_commit() call in
> > bdrv_try_change_aio_context()? And of course take the reader lock for
> > blk_get_aio_context(), but that should be completely unproblematic.
>
> I tried this, and it’s not easy taking the lock just for tran_commit(),
> because some callers of bdrv_try_change_aio_context() already hold the write
> lock (specifically bdrv_attach_child_common(),
> bdrv_attach_child_common_abort(), and bdrv_root_unref_child()[1]), and
> qmp_x_blockdev_set_iothread() holds the read lock. Other callers don’t hold
> any lock[2].
>
> So I’m not sure whether we should mark all of bdrv_try_change_aio_context()
> as GRAPH_WRLOCK and then make all callers take the lock, or really only take
> it for tran_commit(), and have callers release the lock around
> bdrv_try_change_aio_context(). Former sounds better to naïve me.
>
> (In any case, FWIW, having blk_set_aio_context() take the write lock, and
> scsi_device_for_each_req_async_bh() take the read lock[3], does fix the
> assertion failure.)
I wonder if a simpler solution is blk_inc_in_flight() in
scsi_device_for_each_req_async() and blk_dec_in_flight() in
scsi_device_for_each_req_async_bh() so that drain
waits for the BH.
There is a drain around the AioContext change, so as long as
scsi_device_for_each_req_async() was called before blk_set_aio_context()
and not _during_ aio_poll(), we would prevent inconsistent BB vs BDS
aio_contexts.
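Concretely, a minimal sketch of the idea (a fuller version of it appears
as a diff later in the thread):

    /* In scsi_device_for_each_req_async(): pin the BB until the BH has run */
    blk_inc_in_flight(s->conf.blk);
    aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
                            scsi_device_for_each_req_async_bh, data);

    /* ...and at the end of scsi_device_for_each_req_async_bh(): */
    blk_dec_in_flight(s->conf.blk);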
Stefan
>
> Hanna
>
> [1] bdrv_root_unref_child() is not marked as GRAPH_WRLOCK, but its callers
> generally seem to ensure that the lock is taken when calling it.
>
> [2] blk_set_aio_context() (evidently), blk_exp_add(),
> external_snapshot_abort(), {blockdev,drive}_backup_action(),
> qmp_{blockdev,drive}_mirror()
>
> [3] I’ve made the _bh a coroutine (for bdrv_graph_co_rdlock()) and replaced
> the aio_bh_schedule_oneshot() by aio_co_enter() – hope that’s right.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-24 21:53 ` Stefan Hajnoczi
@ 2024-01-25 9:06 ` Hanna Czenczek
2024-01-25 13:25 ` Stefan Hajnoczi
0 siblings, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-25 9:06 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-block, qemu-devel
On 24.01.24 22:53, Stefan Hajnoczi wrote:
> On Wed, Jan 24, 2024 at 01:12:47PM +0100, Hanna Czenczek wrote:
>> On 23.01.24 18:10, Kevin Wolf wrote:
>>> Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
>>>> On 21.12.23 22:23, Kevin Wolf wrote:
>>>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>
>>>>> Stop depending on the AioContext lock and instead access
>>>>> SCSIDevice->requests from only one thread at a time:
>>>>> - When the VM is running only the BlockBackend's AioContext may access
>>>>> the requests list.
>>>>> - When the VM is stopped only the main loop may access the requests
>>>>> list.
>>>>>
>>>>> These constraints protect the requests list without the need for locking
>>>>> in the I/O code path.
>>>>>
>>>>> Note that multiple IOThreads are not supported yet because the code
>>>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>>>> that as future work.
>>>>>
>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>> ---
>>>>> include/hw/scsi/scsi.h | 7 +-
>>>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>>>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>>>> often because of this commit than because of the original bug, i.e. when
>>>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>>>> this tends to happen when unplugging the scsi-hd:
>>>>
>>>> {"execute":"device_del","arguments":{"id":"stg0"}}
>>>> {"return": {}}
>>>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>>>> Assertion `ctx == blk->ctx' failed.
>> [...]
>>
>>> I don't know anything about the problem either, but since you already
>>> speculated about the cause, let me speculate about the solution:
>>> Can we hold the graph writer lock for the tran_commit() call in
>>> bdrv_try_change_aio_context()? And of course take the reader lock for
>>> blk_get_aio_context(), but that should be completely unproblematic.
>> I tried this, and it’s not easy taking the lock just for tran_commit(),
>> because some callers of bdrv_try_change_aio_context() already hold the write
>> lock (specifically bdrv_attach_child_common(),
>> bdrv_attach_child_common_abort(), and bdrv_root_unref_child()[1]), and
>> qmp_x_blockdev_set_iothread() holds the read lock. Other callers don’t hold
>> any lock[2].
>>
>> So I’m not sure whether we should mark all of bdrv_try_change_aio_context()
>> as GRAPH_WRLOCK and then make all callers take the lock, or really only take
>> it for tran_commit(), and have callers release the lock around
>> bdrv_try_change_aio_context(). Former sounds better to naïve me.
>>
>> (In any case, FWIW, having blk_set_aio_context() take the write lock, and
>> scsi_device_for_each_req_async_bh() take the read lock[3], does fix the
>> assertion failure.)
> I wonder if a simpler solution is blk_inc_in_flight() in
> scsi_device_for_each_req_async() and blk_dec_in_flight() in
> scsi_device_for_each_req_async_bh() so that drain
> waits for the BH.
>
> There is a drain around the AioContext change, so as long as
> scsi_device_for_each_req_async() was called before blk_set_aio_context()
> and not _during_ aio_poll(), we would prevent inconsistent BB vs BDS
> aio_contexts.
Actually, Kevin has suggested on IRC that we drop the whole drain. :)
Dropping the write lock in or outside of bdrv_try_change_aio_context()
for callers that have already taken it seems unsafe, so the only option
would be to make the whole function write-lock-able. The drained
section can cause problems with that if it ends up wanting to reorganize
the graph, so AFAIU drain should never be done while under a write
lock. This is already a problem now because there are three callers
that do call bdrv_try_change_aio_context() while under a write lock, so
it seems like we shouldn’t keep the drain as-is.
So, Kevin suggested just dropping that drain, because I/O requests are
no longer supposed to care about a BDS’s native AioContext
anyway, so it seems like the need for the drain has gone away with
multiqueue. Then we could make the whole function GRAPH_WRLOCK.
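In code terms the idea would be roughly this (my assumption about how the
annotation would be spelled, not an actual patch):

    /* Drop the drained section inside the function and declare: */
    int GRAPH_WRLOCK
    bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
                                BdrvChild *ignore_child, Error **errp);

so that every caller is forced to already hold the graph writer lock.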
Hanna
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-25 9:06 ` Hanna Czenczek
@ 2024-01-25 13:25 ` Stefan Hajnoczi
0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2024-01-25 13:25 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: Kevin Wolf, qemu-block, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 4876 bytes --]
On Thu, Jan 25, 2024 at 10:06:51AM +0100, Hanna Czenczek wrote:
> On 24.01.24 22:53, Stefan Hajnoczi wrote:
> > On Wed, Jan 24, 2024 at 01:12:47PM +0100, Hanna Czenczek wrote:
> > > On 23.01.24 18:10, Kevin Wolf wrote:
> > > > Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> > > > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > >
> > > > > > Stop depending on the AioContext lock and instead access
> > > > > > SCSIDevice->requests from only one thread at a time:
> > > > > > - When the VM is running only the BlockBackend's AioContext may access
> > > > > > the requests list.
> > > > > > - When the VM is stopped only the main loop may access the requests
> > > > > > list.
> > > > > >
> > > > > > These constraints protect the requests list without the need for locking
> > > > > > in the I/O code path.
> > > > > >
> > > > > > Note that multiple IOThreads are not supported yet because the code
> > > > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > > > that as future work.
> > > > > >
> > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > > > > Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
> > > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > > > ---
> > > > > > include/hw/scsi/scsi.h | 7 +-
> > > > > > hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
> > > > > > 2 files changed, 131 insertions(+), 57 deletions(-)
> > > > > My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> > > > > often because of this commit than because of the original bug, i.e. when
> > > > > repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> > > > > this tends to happen when unplugging the scsi-hd:
> > > > >
> > > > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > > > {"return": {}}
> > > > > qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> > > > > Assertion `ctx == blk->ctx' failed.
> > > [...]
> > >
> > > > I don't know anything about the problem either, but since you already
> > > > speculated about the cause, let me speculate about the solution:
> > > > Can we hold the graph writer lock for the tran_commit() call in
> > > > bdrv_try_change_aio_context()? And of course take the reader lock for
> > > > blk_get_aio_context(), but that should be completely unproblematic.
> > > I tried this, and it’s not easy taking the lock just for tran_commit(),
> > > because some callers of bdrv_try_change_aio_context() already hold the write
> > > lock (specifically bdrv_attach_child_common(),
> > > bdrv_attach_child_common_abort(), and bdrv_root_unref_child()[1]), and
> > > qmp_x_blockdev_set_iothread() holds the read lock. Other callers don’t hold
> > > any lock[2].
> > >
> > > So I’m not sure whether we should mark all of bdrv_try_change_aio_context()
> > > as GRAPH_WRLOCK and then make all callers take the lock, or really only take
> > > it for tran_commit(), and have callers release the lock around
> > > bdrv_try_change_aio_context(). Former sounds better to naïve me.
> > >
> > > (In any case, FWIW, having blk_set_aio_context() take the write lock, and
> > > scsi_device_for_each_req_async_bh() take the read lock[3], does fix the
> > > assertion failure.)
> > I wonder if a simpler solution is blk_inc_in_flight() in
> > scsi_device_for_each_req_async() and blk_dec_in_flight() in
> > scsi_device_for_each_req_async_bh() so that drain
> > waits for the BH.
> >
> > There is a drain around the AioContext change, so as long as
> > scsi_device_for_each_req_async() was called before blk_set_aio_context()
> > and not _during_ aio_poll(), we would prevent inconsistent BB vs BDS
> > aio_contexts.
>
> Actually, Kevin has suggested on IRC that we drop the whole drain. :)
>
> Dropping the write lock in or outside of bdrv_try_change_aio_context() for
> callers that have already taken it seems unsafe, so the only option would be
> to make the whole function write-lock-able. The drained section can cause
> problems with that if it ends up wanting to reorganize the graph, so AFAIU
> drain should never be done while under a write lock. This is already a
> problem now because there are three callers that do call
> bdrv_try_change_aio_context() while under a write lock, so it seems like we
> shouldn’t keep the drain as-is.
>
> So, Kevin suggested just dropping that drain, because I/O requests are no
> longer supposed to care about a BDS’s native AioContext anyway, so
> it seems like the need for the drain has gone away with multiqueue. Then we
> could make the whole function GRAPH_WRLOCK.
Okay.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-23 17:10 ` Kevin Wolf
2024-01-23 17:23 ` Hanna Czenczek
2024-01-24 12:12 ` Hanna Czenczek
@ 2024-01-25 17:32 ` Hanna Czenczek
2024-01-26 13:18 ` Kevin Wolf
2024-01-29 16:30 ` Hanna Czenczek
3 siblings, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-25 17:32 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, stefanha, qemu-devel
On 23.01.24 18:10, Kevin Wolf wrote:
> Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
>> On 21.12.23 22:23, Kevin Wolf wrote:
>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>
>>> Stop depending on the AioContext lock and instead access
>>> SCSIDevice->requests from only one thread at a time:
>>> - When the VM is running only the BlockBackend's AioContext may access
>>> the requests list.
>>> - When the VM is stopped only the main loop may access the requests
>>> list.
>>>
>>> These constraints protect the requests list without the need for locking
>>> in the I/O code path.
>>>
>>> Note that multiple IOThreads are not supported yet because the code
>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>> that as future work.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>> include/hw/scsi/scsi.h | 7 +-
>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>> often because of this commit than because of the original bug, i.e. when
>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>> this tends to happen when unplugging the scsi-hd:
>>
>> {"execute":"device_del","arguments":{"id":"stg0"}}
>> {"return": {}}
>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>> Assertion `ctx == blk->ctx' failed.
[...]
>> I don’t know anything about the problem yet, but as usual, I like
>> speculation and discovering how wrong I was later on, so one thing I came
>> across that’s funny about virtio-scsi is that requests can happen even while
>> a disk is being attached or detached. That is, Linux seems to probe all
>> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
>> because a disk is being attached or removed. So maybe that’s part of the
>> problem, that we get a request while the BB is being detached, and
>> temporarily in an inconsistent state (BDS context differs from BB context).
> I don't know anything about the problem either, but since you already
> speculated about the cause, let me speculate about the solution:
> Can we hold the graph writer lock for the tran_commit() call in
> bdrv_try_change_aio_context()? And of course take the reader lock for
> blk_get_aio_context(), but that should be completely unproblematic.
Actually, now that completely unproblematic part is giving me trouble.
I wanted to just put a graph lock into blk_get_aio_context() (making it
a coroutine with a wrapper), but callers of blk_get_aio_context()
generally assume the context is going to stay the BB’s context for as
long as their AioContext * variable is in scope. I was tempted to think
callers know what happens to the BB they pass to blk_get_aio_context(),
and it won’t change contexts so easily, but then I remembered this is
exactly what happens in this case; we run
scsi_device_for_each_req_async_bh() in one thread (which calls
blk_get_aio_context()), and in the other, we change the BB’s context.
It seems like there are very few blk_* functions right now that require
taking a graph lock around them, so I’m hesitant to go that route. But if
we’re protecting a BB’s context via the graph write lock, I can’t think
of a way around having to take a read lock whenever reading a BB’s
context, and holding it for as long as we assume that context to remain
the BB’s context. It’s also hard to figure out how long that is, case
by case; for example, dma_blk_read() schedules an AIO function in the BB
context; but we probably don’t care that this context remains the BB’s
context until the request is done. In the case of
scsi_device_for_each_req_async_bh(), we already take care to re-schedule
it when it turns out the context is outdated, so it does seem quite
important here, and we probably want to keep the lock until after the
QTAILQ_FOREACH_SAFE() loop.
On a tangent, this TOCTTOU problem makes me wary of other blk_*
functions that query information. For example, fuse_read() (in
block/export/fuse.c) truncates requests to the BB length. But what if
the BB length changes concurrently between blk_getlength() and
blk_pread()? While we can justify using the graph lock for a BB’s
AioContext, we can’t use it for other metadata like its length.
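Schematically (a simplified paraphrase of the fuse_read() pattern, not
the actual code):

    int64_t len = blk_getlength(exp->common.blk);   /* check */
    if (offset + size > len) {
        size = len - offset;      /* truncate to the length read above */
    }
    /* ...the BB may be resized here, between check and use... */
    ret = blk_pread(exp->common.blk, offset, size, buf, 0);   /* use */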
Hanna
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-25 17:32 ` Hanna Czenczek
@ 2024-01-26 13:18 ` Kevin Wolf
2024-01-26 15:24 ` Hanna Czenczek
0 siblings, 1 reply; 57+ messages in thread
From: Kevin Wolf @ 2024-01-26 13:18 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: qemu-block, stefanha, qemu-devel
Am 25.01.2024 um 18:32 hat Hanna Czenczek geschrieben:
> On 23.01.24 18:10, Kevin Wolf wrote:
> > Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > >
> > > > Stop depending on the AioContext lock and instead access
> > > > SCSIDevice->requests from only one thread at a time:
> > > > - When the VM is running only the BlockBackend's AioContext may access
> > > > the requests list.
> > > > - When the VM is stopped only the main loop may access the requests
> > > > list.
> > > >
> > > > These constraints protect the requests list without the need for locking
> > > > in the I/O code path.
> > > >
> > > > Note that multiple IOThreads are not supported yet because the code
> > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > that as future work.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > > Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
> > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > ---
> > > > include/hw/scsi/scsi.h | 7 +-
> > > > hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
> > > > 2 files changed, 131 insertions(+), 57 deletions(-)
> > > My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> > > often because of this commit than because of the original bug, i.e. when
> > > repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> > > this tends to happen when unplugging the scsi-hd:
> > >
> > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > {"return": {}}
> > > qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> > > Assertion `ctx == blk->ctx' failed.
>
> [...]
>
> > > I don’t know anything about the problem yet, but as usual, I like
> > > speculation and discovering how wrong I was later on, so one thing I came
> > > across that’s funny about virtio-scsi is that requests can happen even while
> > > a disk is being attached or detached. That is, Linux seems to probe all
> > > LUNs when a new virtio-scsi device is being attached, and it won’t stop just
> > > because a disk is being attached or removed. So maybe that’s part of the
> > > problem, that we get a request while the BB is being detached, and
> > > temporarily in an inconsistent state (BDS context differs from BB context).
> > I don't know anything about the problem either, but since you already
> > speculated about the cause, let me speculate about the solution:
> > Can we hold the graph writer lock for the tran_commit() call in
> > bdrv_try_change_aio_context()? And of course take the reader lock for
> > blk_get_aio_context(), but that should be completely unproblematic.
>
> Actually, now that completely unproblematic part is giving me trouble. I
> wanted to just put a graph lock into blk_get_aio_context() (making it a
> coroutine with a wrapper)
Which is the first thing I neglected and already not great. We have
calls of blk_get_aio_context() in the SCSI I/O path, and creating a
coroutine and doing at least two context switches simply for this call
is a lot of overhead...
> but callers of blk_get_aio_context() generally assume the context is
> going to stay the BB’s context for as long as their AioContext *
> variable is in scope.
I'm not so sure about that. And taking another step back, I'm actually
also not sure how much it still matters now that they can submit I/O
from any thread.
Maybe the correct solution is to remove the assertion from
blk_get_aio_context() and just always return blk->ctx. If it's in the
middle of a change, you'll either get the old one or the new one. Either
one is fine to submit I/O from, and if you care about changes for other
reasons (like SCSI does), then you need explicit code to protect it
anyway (which SCSI apparently has, but it doesn't work).
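That is, roughly:

    AioContext *blk_get_aio_context(BlockBackend *blk)
    {
        /* No cross-check against the root node's context: a reader racing
         * with a context change sees either the old or the new context,
         * and either one is fine to submit I/O from. */
        return blk->ctx;
    }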
> I was tempted to think callers know what happens to the BB they pass
> to blk_get_aio_context(), and it won’t change contexts so easily, but
> then I remembered this is exactly what happens in this case; we run
> scsi_device_for_each_req_async_bh() in one thread (which calls
> blk_get_aio_context()), and in the other, we change the BB’s context.
Let's think a bit more about scsi_device_for_each_req_async()
specifically. This is a function that runs in the main thread. Nothing
will change any AioContext assignment if it doesn't call it. It wants to
make sure that scsi_device_for_each_req_async_bh() is called in the
same AioContext where the virtqueue is processed, so it schedules a BH
and waits for it.
Waiting for it means running a nested event loop that could do anything,
including changing AioContexts. So this is what needs the locking, not
the blk_get_aio_context() call in scsi_device_for_each_req_async_bh().
If we lock before the nested event loop and unlock in the BH, the check
in the BH can become an assertion. (It is important that we unlock in
the BH rather than after waiting because if something takes the writer
lock, we need to unlock during the nested event loop of bdrv_wrlock() to
avoid a deadlock.)
And spawning a coroutine for this looks a lot more acceptable because
it's on a slow path anyway.
In fact, we probably don't technically need a coroutine to take the
reader lock here. We can have a new graph lock function that asserts
that there is no writer (we know because we're running in the main loop)
and then atomically increments the reader count. But maybe that already
complicates things again...
> It seems like there are very few blk_* functions right now that
> require taking a graph lock around them, so I’m hesitant to go that
> route. But if we’re protecting a BB’s context via the graph write
> lock, I can’t think of a way around having to take a read lock
> whenever reading a BB’s context, and holding it for as long as we
> assume that context to remain the BB’s context. It’s also hard to
> figure out how long that is, case by case; for example, dma_blk_read()
> schedules an AIO function in the BB context; but we probably don’t
> care that this context remains the BB’s context until the request is
> done. In the case of scsi_device_for_each_req_async_bh(), we already
> take care to re-schedule it when it turns out the context is outdated,
> so it does seem quite important here, and we probably want to keep the
> lock until after the QTAILQ_FOREACH_SAFE() loop.
Maybe we need to audit all callers. Fortunately, there don't seem to be
too many. At least not direct ones...
> On a tangent, this TOCTTOU problem makes me wary of other blk_*
> functions that query information. For example, fuse_read() (in
> block/export/fuse.c) truncates requests to the BB length. But what if
> the BB length changes concurrently between blk_getlength() and
> blk_pread()? While we can justify using the graph lock for a BB’s
> AioContext, we can’t use it for other metadata like its length.
Hm... Is "tough luck" an acceptable answer? ;-)
Kevin
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-26 13:18 ` Kevin Wolf
@ 2024-01-26 15:24 ` Hanna Czenczek
2024-01-31 20:35 ` Stefan Hajnoczi
0 siblings, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-26 15:24 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, stefanha, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 9926 bytes --]
On 26.01.24 14:18, Kevin Wolf wrote:
> Am 25.01.2024 um 18:32 hat Hanna Czenczek geschrieben:
>> On 23.01.24 18:10, Kevin Wolf wrote:
>>> Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
>>>> On 21.12.23 22:23, Kevin Wolf wrote:
>>>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>
>>>>> Stop depending on the AioContext lock and instead access
>>>>> SCSIDevice->requests from only one thread at a time:
>>>>> - When the VM is running only the BlockBackend's AioContext may access
>>>>> the requests list.
>>>>> - When the VM is stopped only the main loop may access the requests
>>>>> list.
>>>>>
>>>>> These constraints protect the requests list without the need for locking
>>>>> in the I/O code path.
>>>>>
>>>>> Note that multiple IOThreads are not supported yet because the code
>>>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>>>> that as future work.
>>>>>
>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>> ---
>>>>> include/hw/scsi/scsi.h | 7 +-
>>>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>>>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>>>> often because of this commit than because of the original bug, i.e. when
>>>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>>>> this tends to happen when unplugging the scsi-hd:
Note: We (on issues.redhat.com) have a separate report that seems to
concern this very problem: https://issues.redhat.com/browse/RHEL-19381
>>>> {"execute":"device_del","arguments":{"id":"stg0"}}
>>>> {"return": {}}
>>>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>>>> Assertion `ctx == blk->ctx' failed.
>> [...]
>>
>>>> I don’t know anything about the problem yet, but as usual, I like
>>>> speculation and discovering how wrong I was later on, so one thing I came
>>>> across that’s funny about virtio-scsi is that requests can happen even while
>>>> a disk is being attached or detached. That is, Linux seems to probe all
>>>> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
>>>> because a disk is being attached or removed. So maybe that’s part of the
>>>> problem, that we get a request while the BB is being detached, and
>>>> temporarily in an inconsistent state (BDS context differs from BB context).
>>> I don't know anything about the problem either, but since you already
>>> speculated about the cause, let me speculate about the solution:
>>> Can we hold the graph writer lock for the tran_commit() call in
>>> bdrv_try_change_aio_context()? And of course take the reader lock for
>>> blk_get_aio_context(), but that should be completely unproblematic.
>> Actually, now that completely unproblematic part is giving me trouble. I
>> wanted to just put a graph lock into blk_get_aio_context() (making it a
>> coroutine with a wrapper)
> Which is the first thing I neglected and already not great. We have
> calls of blk_get_aio_context() in the SCSI I/O path, and creating a
> coroutine and doing at least two context switches simply for this call
> is a lot of overhead...
>
>> but callers of blk_get_aio_context() generally assume the context is
>> going to stay the BB’s context for as long as their AioContext *
>> variable is in scope.
> I'm not so sure about that. And taking another step back, I'm actually
> also not sure how much it still matters now that they can submit I/O
> from any thread.
That’s my impression, too, but “not sure” doesn’t feel great. :)
scsi_device_for_each_req_async_bh() specifically double-checks whether
it’s still in the right context before invoking the specified function,
so it seems there was some intention to continue to run in the context
associated with the BB.
(Not judging whether that intent makes sense or not, yet.)
> Maybe the correct solution is to remove the assertion from
> blk_get_aio_context() and just always return blk->ctx. If it's in the
> middle of a change, you'll either get the old one or the new one. Either
> one is fine to submit I/O from, and if you care about changes for other
> reasons (like SCSI does), then you need explicit code to protect it
> anyway (which SCSI apparently has, but it doesn't work).
I think most callers do just assume the BB stays in the context they got
(without any proof, admittedly), but I agree that under re-evaluation,
it probably doesn’t actually matter to them, really. And yes, basically,
if the caller doesn’t need to take a lock because it doesn’t really
matter whether blk->ctx changes while it’s still using the old value,
blk_get_aio_context() in turn doesn’t need to double-check blk->ctx
against the root node’s context either, and nobody needs a lock.
So I agree, it’s on the caller to protect against a potentially changing
context, blk_get_aio_context() should just return blk->ctx and not check
against the root node.
(On a tangent: blk_drain() is a caller of blk_get_aio_context(), and it
polls that context. Why does it need to poll that context specifically
when requests may be in any context? Is it because if there are
requests in the main thread, we must poll that, but otherwise it’s fine
to poll any thread, and we can only have requests in the main thread if
that’s the BB’s context?)
>> I was tempted to think callers know what happens to the BB they pass
>> to blk_get_aio_context(), and it won’t change contexts so easily, but
>> then I remembered this is exactly what happens in this case; we run
>> scsi_device_for_each_req_async_bh() in one thread (which calls
>> blk_get_aio_context()), and in the other, we change the BB’s context.
> Let's think a bit more about scsi_device_for_each_req_async()
> specifically. This is a function that runs in the main thread. Nothing
> will change any AioContext assignment if it doesn't call it. It wants to
> make sure that scsi_device_for_each_req_async_bh() is called in the
> same AioContext where the virtqueue is processed, so it schedules a BH
> and waits for it.
I don’t quite follow; it doesn’t wait for the BH. It uses
aio_bh_schedule_oneshot(), not aio_wait_bh_oneshot(). While you’re
right that if it did wait, the BB context might still change, in
practice we wouldn’t have the problem at hand because the caller is
actually the one to change the context, concurrently while the BH is
running.
> Waiting for it means running a nested event loop that could do anything,
> including changing AioContexts. So this is what needs the locking, not
> the blk_get_aio_context() call in scsi_device_for_each_req_async_bh().
> If we lock before the nested event loop and unlock in the BH, the check
> in the BH can become an assertion. (It is important that we unlock in
> the BH rather than after waiting because if something takes the writer
> lock, we need to unlock during the nested event loop of bdrv_wrlock() to
> avoid a deadlock.)
>
> And spawning a coroutine for this looks a lot more acceptable because
> it's on a slow path anyway.
>
> In fact, we probably don't technically need a coroutine to take the
> reader lock here. We can have a new graph lock function that asserts
> that there is no writer (we know because we're running in the main loop)
> and then atomically increments the reader count. But maybe that already
> complicates things again...
So as far as I understand, we can’t just use aio_wait_bh_oneshot() and
wrap it in bdrv_graph_rd{,un}lock_main_loop(), because that doesn’t
actually lock the graph. I feel like adding a new graph lock function
for this highly specific case could be dangerous, because it seems easy
to use the wrong way.
Just having a trampoline coroutine to call bdrv_graph_co_rd{,un}lock()
seems simple enough and reasonable here (not a hot path). Can we have
that coroutine then use aio_wait_bh_oneshot() with the existing _bh
function, or should that be made a coroutine, too?
>> It seems like there are very few blk_* functions right now that
>> require taking a graph lock around them, so I’m hesitant to go that
>> route. But if we’re protecting a BB’s context via the graph write
>> lock, I can’t think of a way around having to take a read lock
>> whenever reading a BB’s context, and holding it for as long as we
>> assume that context to remain the BB’s context. It’s also hard to
>> figure out how long that is, case by case; for example, dma_blk_read()
>> schedules an AIO function in the BB context; but we probably don’t
>> care that this context remains the BB’s context until the request is
>> done. In the case of scsi_device_for_each_req_async_bh(), we already
>> take care to re-schedule it when it turns out the context is outdated,
>> so it does seem quite important here, and we probably want to keep the
>> lock until after the QTAILQ_FOREACH_SAFE() loop.
> Maybe we need to audit all callers. Fortunately, there don't seem to be
> too many. At least not direct ones...
>
>> On a tangent, this TOCTTOU problem makes me wary of other blk_*
>> functions that query information. For example, fuse_read() (in
>> block/export/fuse.c) truncates requests to the BB length. But what if
>> the BB length changes concurrently between blk_getlength() and
>> blk_pread()? While we can justify using the graph lock for a BB’s
>> AioContext, we can’t use it for other metadata like its length.
> Hm... Is "tough luck" an acceptable answer? ;-)
Absolutely, if we do it acknowledgingly (great word). I’m just a bit
worried not all of these corner cases have been acknowledged, and some
of them may be looking for a different answer.
Hanna
[-- Attachment #2: Type: text/html, Size: 13092 bytes --]
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-23 17:10 ` Kevin Wolf
` (2 preceding siblings ...)
2024-01-25 17:32 ` Hanna Czenczek
@ 2024-01-29 16:30 ` Hanna Czenczek
2024-01-31 10:17 ` Kevin Wolf
3 siblings, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-01-29 16:30 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, stefanha, qemu-devel
On 23.01.24 18:10, Kevin Wolf wrote:
> Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
>> On 21.12.23 22:23, Kevin Wolf wrote:
>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>
>>> Stop depending on the AioContext lock and instead access
>>> SCSIDevice->requests from only one thread at a time:
>>> - When the VM is running only the BlockBackend's AioContext may access
>>> the requests list.
>>> - When the VM is stopped only the main loop may access the requests
>>> list.
>>>
>>> These constraints protect the requests list without the need for locking
>>> in the I/O code path.
>>>
>>> Note that multiple IOThreads are not supported yet because the code
>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>> that as future work.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>> include/hw/scsi/scsi.h | 7 +-
>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>> often because of this commit than because of the original bug, i.e. when
>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>> this tends to happen when unplugging the scsi-hd:
>>
>> {"execute":"device_del","arguments":{"id":"stg0"}}
>> {"return": {}}
>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>> Assertion `ctx == blk->ctx' failed.
>>
>> (gdb) bt
>> #0 0x00007f32a668d83c in () at /usr/lib/libc.so.6
>> #1 0x00007f32a663d668 in raise () at /usr/lib/libc.so.6
>> #2 0x00007f32a66254b8 in abort () at /usr/lib/libc.so.6
>> #3 0x00007f32a66253dc in () at /usr/lib/libc.so.6
>> #4 0x00007f32a6635d26 in () at /usr/lib/libc.so.6
>> #5 0x0000556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at
>> ../block/block-backend.c:2429
>> #6 blk_get_aio_context (blk=0x556e6e89ccf0) at
>> ../block/block-backend.c:2417
>> #7 0x0000556e6b112d87 in scsi_device_for_each_req_async_bh
>> (opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
>> #8 0x0000556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at
>> ../util/async.c:218
>> #9 0x0000556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290,
>> blocking=blocking@entry=true) at ../util/aio-posix.c:722
>> #10 0x0000556e6b4564b6 in iothread_run (opaque=opaque@entry=0x556e6d89d920)
>> at ../iothread.c:63
>> #11 0x0000556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at
>> ../util/qemu-thread-posix.c:541
>> #12 0x00007f32a668b9eb in () at /usr/lib/libc.so.6
>> #13 0x00007f32a670f7cc in () at /usr/lib/libc.so.6
>>
>> I don’t know anything about the problem yet, but as usual, I like
>> speculation and discovering how wrong I was later on, so one thing I came
>> across that’s funny about virtio-scsi is that requests can happen even while
>> a disk is being attached or detached. That is, Linux seems to probe all
>> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
>> because a disk is being attached or removed. So maybe that’s part of the
>> problem, that we get a request while the BB is being detached, and
>> temporarily in an inconsistent state (BDS context differs from BB context).
> I don't know anything about the problem either, but since you already
> speculated about the cause, let me speculate about the solution:
> Can we hold the graph writer lock for the tran_commit() call in
> bdrv_try_change_aio_context()?
I tried removing the drain to allow all of bdrv_try_change_aio_context()
to require GRAPH_WRLOCK, but that broke tests/unit/test-block-iothread:
without draining, block jobs would need to switch AioContexts while
running, and job_set_aio_context() doesn’t like that. Similarly to
blk_get_aio_context(), I assume we can in theory just drop the assertion
there and change the context while the job is running, because then the
job can just change AioContexts on the next pause point (and in the
meantime send requests from the old context, which is fine), but this
does get quite murky. (One rather theoretical (but real) problem is that
test-block-iothread itself contains some assertions in the job that its
AioContext is actually the one it’s running in, and these would no
longer necessarily hold true.)
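(For context, the check that job_set_aio_context() trips over is roughly
this, paraphrasing job.c from memory:

    void job_set_aio_context(Job *job, AioContext *ctx)
    {
        GLOBAL_STATE_CODE();
        JOB_LOCK_GUARD();
        /* the job must be quiescent while its AioContext changes */
        assert(job->paused || job_is_completed_locked(job));
        job->aio_context = ctx;
    }

Without the drain, the job is not paused at that point, so the assertion
fires.)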
I don’t like using drain as a form of lock specifically against
AioContext changes, but maybe Stefan is right, and we should use it in
this specific case to get just the single problem fixed. (Though it’s
not quite trivial either. We’d probably still want to remove the
assertion from blk_get_aio_context(), so we don’t have to require all of
its callers to hold a count in the in-flight counter.)
Hanna
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-29 16:30 ` Hanna Czenczek
@ 2024-01-31 10:17 ` Kevin Wolf
2024-02-01 9:43 ` Hanna Czenczek
0 siblings, 1 reply; 57+ messages in thread
From: Kevin Wolf @ 2024-01-31 10:17 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: qemu-block, stefanha, qemu-devel
Am 29.01.2024 um 17:30 hat Hanna Czenczek geschrieben:
> I don’t like using drain as a form of lock specifically against AioContext
> changes, but maybe Stefan is right, and we should use it in this specific
> case to get just the single problem fixed. (Though it’s not quite trivial
> either. We’d probably still want to remove the assertion from
> blk_get_aio_context(), so we don’t have to require all of its callers to
> hold a count in the in-flight counter.)
Okay, fair, maybe fixing the specific problem is more important than
solving the more generic blk_get_aio_context() race.
In this case, wouldn't it be enough to increase the in-flight counter so
that the drain before switching AioContexts would run the BH before
anything bad can happen? Does the following work?
Kevin
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 0a2eb11c56..dc09eb8024 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -120,17 +120,11 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
     SCSIRequest *next;
 
     /*
-     * If the AioContext changed before this BH was called then reschedule into
-     * the new AioContext before accessing ->requests. This can happen when
-     * scsi_device_for_each_req_async() is called and then the AioContext is
-     * changed before BHs are run.
+     * The AioContext can't have changed because we increased the in-flight
+     * counter for s->conf.blk.
      */
     ctx = blk_get_aio_context(s->conf.blk);
-    if (ctx != qemu_get_current_aio_context()) {
-        aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh,
-                                g_steal_pointer(&data));
-        return;
-    }
+    assert(ctx == qemu_get_current_aio_context());
 
     QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
         data->fn(req, data->fn_opaque);
@@ -138,6 +132,7 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
 
     /* Drop the reference taken by scsi_device_for_each_req_async() */
     object_unref(OBJECT(s));
+    blk_dec_in_flight(s->conf.blk);
 }
 
 /*
@@ -163,6 +158,7 @@ static void scsi_device_for_each_req_async(SCSIDevice *s,
      */
     object_ref(OBJECT(s));
 
+    blk_inc_in_flight(s->conf.blk);
     aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
                             scsi_device_for_each_req_async_bh,
                             data);
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-26 15:24 ` Hanna Czenczek
@ 2024-01-31 20:35 ` Stefan Hajnoczi
2024-02-01 14:10 ` Hanna Czenczek
0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2024-01-31 20:35 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: Kevin Wolf, qemu-block, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 9337 bytes --]
On Fri, Jan 26, 2024 at 04:24:49PM +0100, Hanna Czenczek wrote:
> On 26.01.24 14:18, Kevin Wolf wrote:
> > Am 25.01.2024 um 18:32 hat Hanna Czenczek geschrieben:
> > > On 23.01.24 18:10, Kevin Wolf wrote:
> > > > Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> > > > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > >
> > > > > > Stop depending on the AioContext lock and instead access
> > > > > > SCSIDevice->requests from only one thread at a time:
> > > > > > - When the VM is running only the BlockBackend's AioContext may access
> > > > > > the requests list.
> > > > > > - When the VM is stopped only the main loop may access the requests
> > > > > > list.
> > > > > >
> > > > > > These constraints protect the requests list without the need for locking
> > > > > > in the I/O code path.
> > > > > >
> > > > > > Note that multiple IOThreads are not supported yet because the code
> > > > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > > > that as future work.
> > > > > >
> > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > > > > Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
> > > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > > > ---
> > > > > > include/hw/scsi/scsi.h | 7 +-
> > > > > > hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
> > > > > > 2 files changed, 131 insertions(+), 57 deletions(-)
> > > > > My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> > > > > often because of this commit than because of the original bug, i.e. when
> > > > > repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> > > > > this tends to happen when unplugging the scsi-hd:
>
> Note: We (on issues.redhat.com) have a separate report that seems to
> concern this very problem: https://issues.redhat.com/browse/RHEL-19381
>
> > > > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > > > {"return": {}}
> > > > > qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> > > > > Assertion `ctx == blk->ctx' failed.
> > > [...]
> > >
> > > > > I don’t know anything about the problem yet, but as usual, I like
> > > > > speculation and discovering how wrong I was later on, so one thing I came
> > > > > across that’s funny about virtio-scsi is that requests can happen even while
> > > > > a disk is being attached or detached. That is, Linux seems to probe all
> > > > > LUNs when a new virtio-scsi device is being attached, and it won’t stop just
> > > > > because a disk is being attached or removed. So maybe that’s part of the
> > > > > problem, that we get a request while the BB is being detached, and
> > > > > temporarily in an inconsistent state (BDS context differs from BB context).
> > > > I don't know anything about the problem either, but since you already
> > > > speculated about the cause, let me speculate about the solution:
> > > > Can we hold the graph writer lock for the tran_commit() call in
> > > > bdrv_try_change_aio_context()? And of course take the reader lock for
> > > > blk_get_aio_context(), but that should be completely unproblematic.
> > > Actually, now that completely unproblematic part is giving me trouble. I
> > > wanted to just put a graph lock into blk_get_aio_context() (making it a
> > > coroutine with a wrapper)
> > Which is the first thing I neglected and already not great. We have
> > calls of blk_get_aio_context() in the SCSI I/O path, and creating a
> > coroutine and doing at least two context switches simply for this call
> > is a lot of overhead...
> >
> > > but callers of blk_get_aio_context() generally assume the context is
> > > going to stay the BB’s context for as long as their AioContext *
> > > variable is in scope.
> > I'm not so sure about that. And taking another step back, I'm actually
> > also not sure how much it still matters now that they can submit I/O
> > from any thread.
>
> That’s my impression, too, but “not sure” doesn’t feel great. :)
> scsi_device_for_each_req_async_bh() specifically double-checks whether it’s
> still in the right context before invoking the specified function, so it
> seems there was some intention to continue to run in the context associated
> with the BB.
>
> (Not judging whether that intent makes sense or not, yet.)
>
> > Maybe the correct solution is to remove the assertion from
> > blk_get_aio_context() and just always return blk->ctx. If it's in the
> > middle of a change, you'll either get the old one or the new one. Either
> > one is fine to submit I/O from, and if you care about changes for other
> > reasons (like SCSI does), then you need explicit code to protect it
> > anyway (which SCSI apparently has, but it doesn't work).
>
> I think most callers do just assume the BB stays in the context they got
> (without any proof, admittedly), but I agree that under re-evaluation, it
> probably doesn’t actually matter to them, really. And yes, basically, if the
> caller doesn’t need to take a lock because it doesn’t really matter whether
> blk->ctx changes while it’s still using the old value, blk_get_aio_context()
> in turn doesn’t need to double-check blk->ctx against the root node’s
> context either, and nobody needs a lock.
>
> So I agree, it’s on the caller to protect against a potentially changing
> context, blk_get_aio_context() should just return blk->ctx and not check
> against the root node.
>
> (On a tangent: blk_drain() is a caller of blk_get_aio_context(), and it
> polls that context. Why does it need to poll that context specifically when
> requests may be in any context? Is it because if there are requests in the
> main thread, we must poll that, but otherwise it’s fine to poll any thread,
> and we can only have requests in the main thread if that’s the BB’s
> context?)
>
> > > I was tempted to think callers know what happens to the BB they pass
> > > to blk_get_aio_context(), and it won’t change contexts so easily, but
> > > then I remembered this is exactly what happens in this case; we run
> > > scsi_device_for_each_req_async_bh() in one thread (which calls
> > > blk_get_aio_context()), and in the other, we change the BB’s context.
> > Let's think a bit more about scsi_device_for_each_req_async()
> > specifically. This is a function that runs in the main thread. Nothing
> > will change any AioContext assignment if it doesn't call it. It wants to
> > make sure that scsi_device_for_each_req_async_bh() is called in the
> > same AioContext where the virtqueue is processed, so it schedules a BH
> > and waits for it.
>
> I don’t quite follow, it doesn’t wait for the BH. It uses
> aio_bh_schedule_oneshot(), not aio_wait_bh_oneshot(). While you’re right
> that if it did wait, the BB context might still change, in practice we
> wouldn’t have the problem at hand because the caller is actually the one to
> change the context, concurrently while the BH is running.
>
> > Waiting for it means running a nested event loop that could do anything,
> > including changing AioContexts. So this is what needs the locking, not
> > the blk_get_aio_context() call in scsi_device_for_each_req_async_bh().
> > If we lock before the nested event loop and unlock in the BH, the check
> > in the BH can become an assertion. (It is important that we unlock in
> > the BH rather than after waiting because if something takes the writer
> > lock, we need to unlock during the nested event loop of bdrv_wrlock() to
> > avoid a deadlock.)
> >
> > And spawning a coroutine for this looks a lot more acceptable because
> > it's on a slow path anyway.
> >
> > In fact, we probably don't technically need a coroutine to take the
> > reader lock here. We can have a new graph lock function that asserts
> > that there is no writer (we know because we're running in the main loop)
> > and then atomically increments the reader count. But maybe that already
> > complicates things again...
>
> So as far as I understand we can’t just use aio_wait_bh_oneshot() and wrap
> it in bdrv_graph_rd{,un}lock_main_loop(), because that doesn’t actually lock
> the graph. I feel like adding a new graph lock function for this quite
> highly specific case could be dangerous, because it seems easy to use the
> wrong way.
>
> Just having a trampoline coroutine to call bdrv_graph_co_rd{,un}lock() seems
> simple enough and reasonable here (not a hot path). Can we have that
> coroutine then use aio_wait_bh_oneshot() with the existing _bh function, or
> should that be made a coroutine, too?
There is a reason for running in the context associated with the BB: the
virtio-scsi code assumes all request processing happens in the BB's
AioContext. The SCSI request list and other SCSI emulation code are not
thread-safe!
The invariant is that SCSI request processing must only happen in one
AioContext. Other parts of QEMU may perform block I/O from other
AioContexts because they don't run SCSI emulation for this device.
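To illustrate (a minimal sketch with a hypothetical helper, not the actual
QEMU code), the invariant amounts to an assertion at every point that
walks the requests list:

static void scsi_requests_walk(SCSIDevice *s)
{
    SCSIRequest *req, *next;

    /* VM running: only the BB's AioContext may get here; VM stopped:
     * only the main loop. Either way, exactly one thread at a time. */
    assert(blk_get_aio_context(s->conf.blk) == qemu_get_current_aio_context());

    QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
        /* ... safe to touch req without taking a lock ... */
    }
}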
Stefan
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-31 10:17 ` Kevin Wolf
@ 2024-02-01 9:43 ` Hanna Czenczek
2024-02-01 10:21 ` Kevin Wolf
0 siblings, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-02-01 9:43 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, stefanha, qemu-devel
On 31.01.24 11:17, Kevin Wolf wrote:
> On 29.01.2024 at 17:30, Hanna Czenczek wrote:
>> I don’t like using drain as a form of lock specifically against AioContext
>> changes, but maybe Stefan is right, and we should use it in this specific
>> case to get just the single problem fixed. (Though it’s not quite trivial
>> either. We’d probably still want to remove the assertion from
>> blk_get_aio_context(), so we don’t have to require all of its callers to
>> hold a count in the in-flight counter.)
> Okay, fair, maybe fixing the specific problem is more important than
> solving the more generic blk_get_aio_context() race.
>
> In this case, wouldn't it be enough to increase the in-flight counter so
> that the drain before switching AioContexts would run the BH before
> anything bad can happen? Does the following work?
Yes, that’s what I had in mind (Stefan, too, I think), and in testing,
it looks good.
Hanna
>
> Kevin
>
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index 0a2eb11c56..dc09eb8024 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -120,17 +120,11 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
>      SCSIRequest *next;
>
>      /*
> -     * If the AioContext changed before this BH was called then reschedule into
> -     * the new AioContext before accessing ->requests. This can happen when
> -     * scsi_device_for_each_req_async() is called and then the AioContext is
> -     * changed before BHs are run.
> +     * The AioContext can't have changed because we increased the in-flight
> +     * counter for s->conf.blk.
>      */
>      ctx = blk_get_aio_context(s->conf.blk);
> -    if (ctx != qemu_get_current_aio_context()) {
> -        aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh,
> -                                g_steal_pointer(&data));
> -        return;
> -    }
> +    assert(ctx == qemu_get_current_aio_context());
>
>      QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
>          data->fn(req, data->fn_opaque);
> @@ -138,6 +132,7 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
>
>      /* Drop the reference taken by scsi_device_for_each_req_async() */
>      object_unref(OBJECT(s));
> +    blk_dec_in_flight(s->conf.blk);
>  }
>
>  /*
> @@ -163,6 +158,7 @@ static void scsi_device_for_each_req_async(SCSIDevice *s,
>       */
>      object_ref(OBJECT(s));
>
> +    blk_inc_in_flight(s->conf.blk);
>      aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
>                              scsi_device_for_each_req_async_bh,
>                              data);
>
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-02-01 9:43 ` Hanna Czenczek
@ 2024-02-01 10:21 ` Kevin Wolf
2024-02-01 10:35 ` Hanna Czenczek
0 siblings, 1 reply; 57+ messages in thread
From: Kevin Wolf @ 2024-02-01 10:21 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: qemu-block, stefanha, qemu-devel
On 01.02.2024 at 10:43, Hanna Czenczek wrote:
> On 31.01.24 11:17, Kevin Wolf wrote:
> > On 29.01.2024 at 17:30, Hanna Czenczek wrote:
> > > I don’t like using drain as a form of lock specifically against AioContext
> > > changes, but maybe Stefan is right, and we should use it in this specific
> > > case to get just the single problem fixed. (Though it’s not quite trivial
> > > either. We’d probably still want to remove the assertion from
> > > blk_get_aio_context(), so we don’t have to require all of its callers to
> > > hold a count in the in-flight counter.)
> > Okay, fair, maybe fixing the specific problem is more important than
> > solving the more generic blk_get_aio_context() race.
> >
> > In this case, wouldn't it be enough to increase the in-flight counter so
> > that the drain before switching AioContexts would run the BH before
> > anything bad can happen? Does the following work?
>
> Yes, that’s what I had in mind (Stefan, too, I think), and in testing,
> it looks good.
Oh, sorry, I completely misunderstood then. I thought you were talking
about adding a new drained section somewhere and that sounded a bit more
complicated. :-)
If it works, let's do this. Would you like to pick this up and send it
as a formal patch (possibly in a more polished form), or should I do
that?
Kevin
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-02-01 10:21 ` Kevin Wolf
@ 2024-02-01 10:35 ` Hanna Czenczek
0 siblings, 0 replies; 57+ messages in thread
From: Hanna Czenczek @ 2024-02-01 10:35 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, stefanha, qemu-devel
On 01.02.24 11:21, Kevin Wolf wrote:
> On 01.02.2024 at 10:43, Hanna Czenczek wrote:
>> On 31.01.24 11:17, Kevin Wolf wrote:
>>> On 29.01.2024 at 17:30, Hanna Czenczek wrote:
>>>> I don’t like using drain as a form of lock specifically against AioContext
>>>> changes, but maybe Stefan is right, and we should use it in this specific
>>>> case to get just the single problem fixed. (Though it’s not quite trivial
>>>> either. We’d probably still want to remove the assertion from
>>>> blk_get_aio_context(), so we don’t have to require all of its callers to
>>>> hold a count in the in-flight counter.)
>>> Okay, fair, maybe fixing the specific problem is more important than
>>> solving the more generic blk_get_aio_context() race.
>>>
>>> In this case, wouldn't it be enough to increase the in-flight counter so
>>> that the drain before switching AioContexts would run the BH before
>>> anything bad can happen? Does the following work?
>> Yes, that’s what I had in mind (Stefan, too, I think), and in testing,
>> it looks good.
> Oh, sorry, I completely misunderstood then. I thought you were talking
> about adding a new drained section somewhere and that sounded a bit more
> complicated. :-)
>
> If it works, let's do this. Would you like to pick this up and send it
> as a formal patch (possibly in a more polished form), or should I do
> that?
Sure, I can do it.
Hanna
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-01-31 20:35 ` Stefan Hajnoczi
@ 2024-02-01 14:10 ` Hanna Czenczek
2024-02-01 14:28 ` Stefan Hajnoczi
0 siblings, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-02-01 14:10 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-block, qemu-devel
On 31.01.24 21:35, Stefan Hajnoczi wrote:
> On Fri, Jan 26, 2024 at 04:24:49PM +0100, Hanna Czenczek wrote:
>> On 26.01.24 14:18, Kevin Wolf wrote:
>>> On 25.01.2024 at 18:32, Hanna Czenczek wrote:
>>>> On 23.01.24 18:10, Kevin Wolf wrote:
>>>>> On 23.01.2024 at 17:40, Hanna Czenczek wrote:
>>>>>> On 21.12.23 22:23, Kevin Wolf wrote:
>>>>>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>>
>>>>>>> Stop depending on the AioContext lock and instead access
>>>>>>> SCSIDevice->requests from only one thread at a time:
>>>>>>> - When the VM is running only the BlockBackend's AioContext may access
>>>>>>> the requests list.
>>>>>>> - When the VM is stopped only the main loop may access the requests
>>>>>>> list.
>>>>>>>
>>>>>>> These constraints protect the requests list without the need for locking
>>>>>>> in the I/O code path.
>>>>>>>
>>>>>>> Note that multiple IOThreads are not supported yet because the code
>>>>>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>>>>>> that as future work.
>>>>>>>
>>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>>>> ---
>>>>>>> include/hw/scsi/scsi.h | 7 +-
>>>>>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>>>>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>>>>>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>>>>>> often because of this commit than because of the original bug, i.e. when
>>>>>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>>>>>> this tends to happen when unplugging the scsi-hd:
>> Note: We (on issues.redhat.com) have a separate report that seems to be
>> concerning this very problem: https://issues.redhat.com/browse/RHEL-19381
>>
>>>>>> {"execute":"device_del","arguments":{"id":"stg0"}}
>>>>>> {"return": {}}
>>>>>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>>>>>> Assertion `ctx == blk->ctx' failed.
>>>> [...]
>>>>
>>>>>> I don’t know anything about the problem yet, but as usual, I like
>>>>>> speculation and discovering how wrong I was later on, so one thing I came
>>>>>> across that’s funny about virtio-scsi is that requests can happen even while
>>>>>> a disk is being attached or detached. That is, Linux seems to probe all
>>>>>> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
>>>>>> because a disk is being attached or removed. So maybe that’s part of the
>>>>>> problem, that we get a request while the BB is being detached, and
>>>>>> temporarily in an inconsistent state (BDS context differs from BB context).
>>>>> I don't know anything about the problem either, but since you already
>>>>> speculated about the cause, let me speculate about the solution:
>>>>> Can we hold the graph writer lock for the tran_commit() call in
>>>>> bdrv_try_change_aio_context()? And of course take the reader lock for
>>>>> blk_get_aio_context(), but that should be completely unproblematic.
>>>> Actually, now that completely unproblematic part is giving me trouble. I
>>>> wanted to just put a graph lock into blk_get_aio_context() (making it a
>>>> coroutine with a wrapper)
>>> Which is the first thing I neglected and already not great. We have
>>> calls of blk_get_aio_context() in the SCSI I/O path, and creating a
>>> coroutine and doing at least two context switches simply for this call
>>> is a lot of overhead...
>>>
>>>> but callers of blk_get_aio_context() generally assume the context is
>>>> going to stay the BB’s context for as long as their AioContext *
>>>> variable is in scope.
>>> I'm not so sure about that. And taking another step back, I'm actually
>>> also not sure how much it still matters now that they can submit I/O
>>> from any thread.
>> That’s my impression, too, but “not sure” doesn’t feel great. :)
>> scsi_device_for_each_req_async_bh() specifically double-checks whether it’s
>> still in the right context before invoking the specified function, so it
>> seems there was some intention to continue to run in the context associated
>> with the BB.
>>
>> (Not judging whether that intent makes sense or not, yet.)
>>
>>> Maybe the correct solution is to remove the assertion from
>>> blk_get_aio_context() and just always return blk->ctx. If it's in the
>>> middle of a change, you'll either get the old one or the new one. Either
>>> one is fine to submit I/O from, and if you care about changes for other
>>> reasons (like SCSI does), then you need explicit code to protect it
>>> anyway (which SCSI apparently has, but it doesn't work).
>> I think most callers do just assume the BB stays in the context they got
>> (without any proof, admittedly), but I agree that under re-evaluation, it
>> probably doesn’t actually matter to them, really. And yes, basically, if the
>> caller doesn’t need to take a lock because it doesn’t really matter whether
> > > blk->ctx changes while it’s still using the old value, blk_get_aio_context()
>> in turn doesn’t need to double-check blk->ctx against the root node’s
>> context either, and nobody needs a lock.
>>
>> So I agree, it’s on the caller to protect against a potentially changing
>> context, blk_get_aio_context() should just return blk->ctx and not check
>> against the root node.
>>
>> (On a tangent: blk_drain() is a caller of blk_get_aio_context(), and it
>> polls that context. Why does it need to poll that context specifically when
>> requests may be in any context? Is it because if there are requests in the
>> main thread, we must poll that, but otherwise it’s fine to poll any thread,
>> and we can only have requests in the main thread if that’s the BB’s
>> context?)
>>
>>>> I was tempted to think callers know what happens to the BB they pass
>>>> to blk_get_aio_context(), and it won’t change contexts so easily, but
>>>> then I remembered this is exactly what happens in this case; we run
>>>> scsi_device_for_each_req_async_bh() in one thread (which calls
>>>> blk_get_aio_context()), and in the other, we change the BB’s context.
>>> Let's think a bit more about scsi_device_for_each_req_async()
>>> specifically. This is a function that runs in the main thread. Nothing
>>> will change any AioContext assignment if it doesn't call it. It wants to
>>> make sure that scsi_device_for_each_req_async_bh() is called in the
>>> same AioContext where the virtqueue is processed, so it schedules a BH
>>> and waits for it.
>> I don’t quite follow, it doesn’t wait for the BH. It uses
>> aio_bh_schedule_oneshot(), not aio_wait_bh_oneshot(). While you’re right
>> that if it did wait, the BB context might still change, in practice we
>> wouldn’t have the problem at hand because the caller is actually the one to
>> change the context, concurrently while the BH is running.
>>
>>> Waiting for it means running a nested event loop that could do anything,
>>> including changing AioContexts. So this is what needs the locking, not
>>> the blk_get_aio_context() call in scsi_device_for_each_req_async_bh().
>>> If we lock before the nested event loop and unlock in the BH, the check
>>> in the BH can become an assertion. (It is important that we unlock in
>>> the BH rather than after waiting because if something takes the writer
>>> lock, we need to unlock during the nested event loop of bdrv_wrlock() to
>>> avoid a deadlock.)
>>>
>>> And spawning a coroutine for this looks a lot more acceptable because
>>> it's on a slow path anyway.
>>>
>>> In fact, we probably don't technically need a coroutine to take the
>>> reader lock here. We can have a new graph lock function that asserts
>>> that there is no writer (we know because we're running in the main loop)
>>> and then atomically increments the reader count. But maybe that already
>>> complicates things again...
>> So as far as I understand we can’t just use aio_wait_bh_oneshot() and wrap
>> it in bdrv_graph_rd{,un}lock_main_loop(), because that doesn’t actually lock
>> the graph. I feel like adding a new graph lock function for this quite
>> highly specific case could be dangerous, because it seems easy to use the
>> wrong way.
>>
>> Just having a trampoline coroutine to call bdrv_graph_co_rd{,un}lock() seems
>> simple enough and reasonable here (not a hot path). Can we have that
>> coroutine then use aio_wait_bh_oneshot() with the existing _bh function, or
>> should that be made a coroutine, too?
> There is a reason for running in the context associated with the BB: the
> virtio-scsi code assumes all request processing happens in the BB's
> AioContext. The SCSI request list and other SCSI emulation code are not
> thread-safe!
One peculiarity about virtio-scsi, as far as I understand, is that its
context is not necessarily the BB’s context, because one virtio-scsi
device may have many BBs. While the BBs are being hot-plugged or
un-plugged, their context may change (as is happening here), but that
doesn’t stop SCSI request processing, because SCSI requests happen
independently of whether there are devices on the SCSI bus.
If SCSI request processing is not thread-safe, doesn’t that mean it
always must be done in the very same context, i.e. the context the
virtio-scsi device was configured to use? Just because a new scsi-hd BB
is added or removed, and so we temporarily have a main context BB
associated with the virtio-scsi device, I don’t think we should switch
to processing requests in the main context.
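For reference, the hot-unplug step I mean looks roughly like this (a
simplified sketch; the exact code in hw/scsi/virtio-scsi.c may differ):

static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev,
                                  DeviceState *dev, Error **errp)
{
    VirtIOSCSI *s = VIRTIO_SCSI(hotplug_dev);
    SCSIDevice *sd = SCSI_DEVICE(dev);

    qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);

    if (s->ctx) {
        /* The window in question: the BB returns to the main context
         * here while s->ctx keeps processing the virtqueues. */
        blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
    }
}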
Hanna
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-02-01 14:10 ` Hanna Czenczek
@ 2024-02-01 14:28 ` Stefan Hajnoczi
2024-02-01 15:25 ` Hanna Czenczek
0 siblings, 1 reply; 57+ messages in thread
From: Stefan Hajnoczi @ 2024-02-01 14:28 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: Kevin Wolf, qemu-block, qemu-devel
On Thu, Feb 01, 2024 at 03:10:12PM +0100, Hanna Czenczek wrote:
> On 31.01.24 21:35, Stefan Hajnoczi wrote:
> > On Fri, Jan 26, 2024 at 04:24:49PM +0100, Hanna Czenczek wrote:
> > > On 26.01.24 14:18, Kevin Wolf wrote:
> > > > On 25.01.2024 at 18:32, Hanna Czenczek wrote:
> > > > > On 23.01.24 18:10, Kevin Wolf wrote:
> > > > > > On 23.01.2024 at 17:40, Hanna Czenczek wrote:
> > > > > > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > > > > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > > >
> > > > > > > > Stop depending on the AioContext lock and instead access
> > > > > > > > SCSIDevice->requests from only one thread at a time:
> > > > > > > > - When the VM is running only the BlockBackend's AioContext may access
> > > > > > > > the requests list.
> > > > > > > > - When the VM is stopped only the main loop may access the requests
> > > > > > > > list.
> > > > > > > >
> > > > > > > > These constraints protect the requests list without the need for locking
> > > > > > > > in the I/O code path.
> > > > > > > >
> > > > > > > > Note that multiple IOThreads are not supported yet because the code
> > > > > > > > assumes all SCSIRequests are executed from a single AioContext. Leave
> > > > > > > > that as future work.
> > > > > > > >
> > > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > > > > > > Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
> > > > > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > > > > > ---
> > > > > > > > include/hw/scsi/scsi.h | 7 +-
> > > > > > > > hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
> > > > > > > > 2 files changed, 131 insertions(+), 57 deletions(-)
> > > > > > > My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> > > > > > > often because of this commit than because of the original bug, i.e. when
> > > > > > > repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> > > > > > > this tends to happen when unplugging the scsi-hd:
> > > Note: We (on issues.redhat.com) have a separate report that seems to be
> > > concerning this very problem: https://issues.redhat.com/browse/RHEL-19381
> > >
> > > > > > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > > > > > {"return": {}}
> > > > > > > qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> > > > > > > Assertion `ctx == blk->ctx' failed.
> > > > > [...]
> > > > >
> > > > > > > I don’t know anything about the problem yet, but as usual, I like
> > > > > > > speculation and discovering how wrong I was later on, so one thing I came
> > > > > > > across that’s funny about virtio-scsi is that requests can happen even while
> > > > > > > a disk is being attached or detached. That is, Linux seems to probe all
> > > > > > > LUNs when a new virtio-scsi device is being attached, and it won’t stop just
> > > > > > > because a disk is being attached or removed. So maybe that’s part of the
> > > > > > > problem, that we get a request while the BB is being detached, and
> > > > > > > temporarily in an inconsistent state (BDS context differs from BB context).
> > > > > > I don't know anything about the problem either, but since you already
> > > > > > speculated about the cause, let me speculate about the solution:
> > > > > > Can we hold the graph writer lock for the tran_commit() call in
> > > > > > bdrv_try_change_aio_context()? And of course take the reader lock for
> > > > > > blk_get_aio_context(), but that should be completely unproblematic.
> > > > > Actually, now that completely unproblematic part is giving me trouble. I
> > > > > wanted to just put a graph lock into blk_get_aio_context() (making it a
> > > > > coroutine with a wrapper)
> > > > Which is the first thing I neglected and already not great. We have
> > > > calls of blk_get_aio_context() in the SCSI I/O path, and creating a
> > > > coroutine and doing at least two context switches simply for this call
> > > > is a lot of overhead...
> > > >
> > > > > but callers of blk_get_aio_context() generally assume the context is
> > > > > going to stay the BB’s context for as long as their AioContext *
> > > > > variable is in scope.
> > > > I'm not so sure about that. And taking another step back, I'm actually
> > > > also not sure how much it still matters now that they can submit I/O
> > > > from any thread.
> > > That’s my impression, too, but “not sure” doesn’t feel great. :)
> > > scsi_device_for_each_req_async_bh() specifically double-checks whether it’s
> > > still in the right context before invoking the specified function, so it
> > > seems there was some intention to continue to run in the context associated
> > > with the BB.
> > >
> > > (Not judging whether that intent makes sense or not, yet.)
> > >
> > > > Maybe the correct solution is to remove the assertion from
> > > > blk_get_aio_context() and just always return blk->ctx. If it's in the
> > > > middle of a change, you'll either get the old one or the new one. Either
> > > > one is fine to submit I/O from, and if you care about changes for other
> > > > reasons (like SCSI does), then you need explicit code to protect it
> > > > anyway (which SCSI apparently has, but it doesn't work).
> > > I think most callers do just assume the BB stays in the context they got
> > > (without any proof, admittedly), but I agree that under re-evaluation, it
> > > probably doesn’t actually matter to them, really. And yes, basically, if the
> > > caller doesn’t need to take a lock because it doesn’t really matter whether
> > > blk->ctx changes while it’s still using the old value, blk_get_aio_context()
> > > in turn doesn’t need to double-check blk->ctx against the root node’s
> > > context either, and nobody needs a lock.
> > >
> > > So I agree, it’s on the caller to protect against a potentially changing
> > > context, blk_get_aio_context() should just return blk->ctx and not check
> > > against the root node.
> > >
> > > (On a tangent: blk_drain() is a caller of blk_get_aio_context(), and it
> > > polls that context. Why does it need to poll that context specifically when
> > > requests may be in any context? Is it because if there are requests in the
> > > main thread, we must poll that, but otherwise it’s fine to poll any thread,
> > > and we can only have requests in the main thread if that’s the BB’s
> > > context?)
> > >
> > > > > I was tempted to think callers know what happens to the BB they pass
> > > > > to blk_get_aio_context(), and it won’t change contexts so easily, but
> > > > > then I remembered this is exactly what happens in this case; we run
> > > > > scsi_device_for_each_req_async_bh() in one thread (which calls
> > > > > blk_get_aio_context()), and in the other, we change the BB’s context.
> > > > Let's think a bit more about scsi_device_for_each_req_async()
> > > > specifically. This is a function that runs in the main thread. Nothing
> > > > will change any AioContext assignment if it doesn't call it. It wants to
> > > > make sure that scsi_device_for_each_req_async_bh() is called in the
> > > > same AioContext where the virtqueue is processed, so it schedules a BH
> > > > and waits for it.
> > > I don’t quite follow, it doesn’t wait for the BH. It uses
> > > aio_bh_schedule_oneshot(), not aio_wait_bh_oneshot(). While you’re right
> > > that if it did wait, the BB context might still change, in practice we
> > > wouldn’t have the problem at hand because the caller is actually the one to
> > > change the context, concurrently while the BH is running.
> > >
> > > > Waiting for it means running a nested event loop that could do anything,
> > > > including changing AioContexts. So this is what needs the locking, not
> > > > the blk_get_aio_context() call in scsi_device_for_each_req_async_bh().
> > > > If we lock before the nested event loop and unlock in the BH, the check
> > > > in the BH can become an assertion. (It is important that we unlock in
> > > > the BH rather than after waiting because if something takes the writer
> > > > lock, we need to unlock during the nested event loop of bdrv_wrlock() to
> > > > avoid a deadlock.)
> > > >
> > > > And spawning a coroutine for this looks a lot more acceptable because
> > > > it's on a slow path anyway.
> > > >
> > > > In fact, we probably don't technically need a coroutine to take the
> > > > reader lock here. We can have a new graph lock function that asserts
> > > > that there is no writer (we know because we're running in the main loop)
> > > > and then atomically increments the reader count. But maybe that already
> > > > complicates things again...
> > > So as far as I understand we can’t just use aio_wait_bh_oneshot() and wrap
> > > it in bdrv_graph_rd{,un}lock_main_loop(), because that doesn’t actually lock
> > > the graph. I feel like adding a new graph lock function for this quite
> > > highly specific case could be dangerous, because it seems easy to use the
> > > wrong way.
> > >
> > > Just having a trampoline coroutine to call bdrv_graph_co_rd{,un}lock() seems
> > > simple enough and reasonable here (not a hot path). Can we have that
> > > coroutine then use aio_wait_bh_oneshot() with the existing _bh function, or
> > > should that be made a coroutine, too?
> > There is a reason for running in the context associated with the BB: the
> > virtio-scsi code assumes all request processing happens in the BB's
> > AioContext. The SCSI request list and other SCSI emulation code are not
> > thread-safe!
>
> One peculiarity about virtio-scsi, as far as I understand, is that its
> context is not necessarily the BB’s context, because one virtio-scsi device
> may have many BBs. While the BBs are being hot-plugged or un-plugged, their
> context may change (as is happening here), but that doesn’t stop SCSI
> request processing, because SCSI requests happen independently of whether
> there are devices on the SCSI bus.
>
> If SCSI request processing is not thread-safe, doesn’t that mean it always
> must be done in the very same context, i.e. the context the virtio-scsi
> device was configured to use? Just because a new scsi-hd BB is added or
> removed, and so we temporarily have a main context BB associated with the
> virtio-scsi device, I don’t think we should switch to processing requests in
> the main context.
This case is not supposed to happen because virtio_scsi_hotplug()
immediately places the BB into the virtio-scsi device's AioContext by
calling blk_set_aio_context().
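Roughly, as a simplified sketch (the exact hw/scsi/virtio-scsi.c code has
more error handling):

static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev,
                                DeviceState *dev, Error **errp)
{
    VirtIOSCSI *s = VIRTIO_SCSI(hotplug_dev);
    SCSIDevice *sd = SCSI_DEVICE(dev);

    if (s->ctx) {
        /* Move the new BlockBackend into the device's AioContext right
         * away so the invariant below holds from the start. */
        blk_set_aio_context(sd->conf.blk, s->ctx, errp);
    }
}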
The AioContext invariant is checked at several points in the SCSI
request lifecycle by this function:
static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, SCSIDevice *d)
{
    if (s->dataplane_started && d && blk_is_available(d->conf.blk)) {
        assert(blk_get_aio_context(d->conf.blk) == s->ctx);
    }
}
Did you find a scenario where the virtio-scsi AioContext is different
from the scsi-hd BB's AioContext?
Stefan
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-02-01 14:28 ` Stefan Hajnoczi
@ 2024-02-01 15:25 ` Hanna Czenczek
2024-02-01 15:49 ` Hanna Czenczek
2024-02-02 12:32 ` Hanna Czenczek
0 siblings, 2 replies; 57+ messages in thread
From: Hanna Czenczek @ 2024-02-01 15:25 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-block, qemu-devel
On 01.02.24 15:28, Stefan Hajnoczi wrote:
> On Thu, Feb 01, 2024 at 03:10:12PM +0100, Hanna Czenczek wrote:
>> On 31.01.24 21:35, Stefan Hajnoczi wrote:
>>> On Fri, Jan 26, 2024 at 04:24:49PM +0100, Hanna Czenczek wrote:
>>>> On 26.01.24 14:18, Kevin Wolf wrote:
>>>>> On 25.01.2024 at 18:32, Hanna Czenczek wrote:
>>>>>> On 23.01.24 18:10, Kevin Wolf wrote:
>>>>>>> On 23.01.2024 at 17:40, Hanna Czenczek wrote:
>>>>>>>> On 21.12.23 22:23, Kevin Wolf wrote:
>>>>>>>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>>>>
>>>>>>>>> Stop depending on the AioContext lock and instead access
>>>>>>>>> SCSIDevice->requests from only one thread at a time:
>>>>>>>>> - When the VM is running only the BlockBackend's AioContext may access
>>>>>>>>> the requests list.
>>>>>>>>> - When the VM is stopped only the main loop may access the requests
>>>>>>>>> list.
>>>>>>>>>
>>>>>>>>> These constraints protect the requests list without the need for locking
>>>>>>>>> in the I/O code path.
>>>>>>>>>
>>>>>>>>> Note that multiple IOThreads are not supported yet because the code
>>>>>>>>> assumes all SCSIRequests are executed from a single AioContext. Leave
>>>>>>>>> that as future work.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>>>>>> Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
>>>>>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>>>>>> ---
>>>>>>>>> include/hw/scsi/scsi.h | 7 +-
>>>>>>>>> hw/scsi/scsi-bus.c | 181 ++++++++++++++++++++++++++++-------------
>>>>>>>>> 2 files changed, 131 insertions(+), 57 deletions(-)
>>>>>>>> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
>>>>>>>> often because of this commit than because of the original bug, i.e. when
>>>>>>>> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
>>>>>>>> this tends to happen when unplugging the scsi-hd:
>>>> Note: We (on issues.redhat.com) have a separate report that seems to be
>>>> concerning this very problem: https://issues.redhat.com/browse/RHEL-19381
>>>>
>>>>>>>> {"execute":"device_del","arguments":{"id":"stg0"}}
>>>>>>>> {"return": {}}
>>>>>>>> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
>>>>>>>> Assertion `ctx == blk->ctx' failed.
>>>>>> [...]
>>>>>>
>>>>>>>> I don’t know anything about the problem yet, but as usual, I like
>>>>>>>> speculation and discovering how wrong I was later on, so one thing I came
>>>>>>>> across that’s funny about virtio-scsi is that requests can happen even while
>>>>>>>> a disk is being attached or detached. That is, Linux seems to probe all
>>>>>>>> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
>>>>>>>> because a disk is being attached or removed. So maybe that’s part of the
>>>>>>>> problem, that we get a request while the BB is being detached, and
>>>>>>>> temporarily in an inconsistent state (BDS context differs from BB context).
>>>>>>> I don't know anything about the problem either, but since you already
>>>>>>> speculated about the cause, let me speculate about the solution:
>>>>>>> Can we hold the graph writer lock for the tran_commit() call in
>>>>>>> bdrv_try_change_aio_context()? And of course take the reader lock for
>>>>>>> blk_get_aio_context(), but that should be completely unproblematic.
>>>>>> Actually, now that completely unproblematic part is giving me trouble. I
>>>>>> wanted to just put a graph lock into blk_get_aio_context() (making it a
>>>>>> coroutine with a wrapper)
>>>>> Which is the first thing I neglected and already not great. We have
>>>>> calls of blk_get_aio_context() in the SCSI I/O path, and creating a
>>>>> coroutine and doing at least two context switches simply for this call
>>>>> is a lot of overhead...
>>>>>
>>>>>> but callers of blk_get_aio_context() generally assume the context is
>>>>>> going to stay the BB’s context for as long as their AioContext *
>>>>>> variable is in scope.
>>>>> I'm not so sure about that. And taking another step back, I'm actually
>>>>> also not sure how much it still matters now that they can submit I/O
>>>>> from any thread.
>>>> That’s my impression, too, but “not sure” doesn’t feel great. :)
>>>> scsi_device_for_each_req_async_bh() specifically double-checks whether it’s
>>>> still in the right context before invoking the specified function, so it
>>>> seems there was some intention to continue to run in the context associated
>>>> with the BB.
>>>>
>>>> (Not judging whether that intent makes sense or not, yet.)
>>>>
>>>>> Maybe the correct solution is to remove the assertion from
>>>>> blk_get_aio_context() and just always return blk->ctx. If it's in the
>>>>> middle of a change, you'll either get the old one or the new one. Either
>>>>> one is fine to submit I/O from, and if you care about changes for other
>>>>> reasons (like SCSI does), then you need explicit code to protect it
>>>>> anyway (which SCSI apparently has, but it doesn't work).
>>>> I think most callers do just assume the BB stays in the context they got
>>>> (without any proof, admittedly), but I agree that under re-evaluation, it
>>>> probably doesn’t actually matter to them, really. And yes, basically, if the
>>>> caller doesn’t need to take a lock because it doesn’t really matter whether
>>>> blk->ctx changes while it’s still using the old value, blk_get_aio_context()
>>>> in turn doesn’t need to double-check blk->ctx against the root node’s
>>>> context either, and nobody needs a lock.
>>>>
>>>> So I agree, it’s on the caller to protect against a potentially changing
>>>> context, blk_get_aio_context() should just return blk->ctx and not check
>>>> against the root node.
>>>>
>>>> (On a tangent: blk_drain() is a caller of blk_get_aio_context(), and it
>>>> polls that context. Why does it need to poll that context specifically when
>>>> requests may be in any context? Is it because if there are requests in the
>>>> main thread, we must poll that, but otherwise it’s fine to poll any thread,
>>>> and we can only have requests in the main thread if that’s the BB’s
>>>> context?)
>>>>
>>>>>> I was tempted to think callers know what happens to the BB they pass
>>>>>> to blk_get_aio_context(), and it won’t change contexts so easily, but
>>>>>> then I remembered this is exactly what happens in this case; we run
>>>>>> scsi_device_for_each_req_async_bh() in one thread (which calls
>>>>>> blk_get_aio_context()), and in the other, we change the BB’s context.
>>>>> Let's think a bit more about scsi_device_for_each_req_async()
>>>>> specifically. This is a function that runs in the main thread. Nothing
>>>>> will change any AioContext assignment if it doesn't call it. It wants to
>>>>> make sure that scsi_device_for_each_req_async_bh() is called in the
>>>>> same AioContext where the virtqueue is processed, so it schedules a BH
>>>>> and waits for it.
>>>> I don’t quite follow, it doesn’t wait for the BH. It uses
>>>> aio_bh_schedule_oneshot(), not aio_wait_bh_oneshot(). While you’re right
>>>> that if it did wait, the BB context might still change, in practice we
>>>> wouldn’t have the problem at hand because the caller is actually the one to
>>>> change the context, concurrently while the BH is running.
>>>>
>>>>> Waiting for it means running a nested event loop that could do anything,
>>>>> including changing AioContexts. So this is what needs the locking, not
>>>>> the blk_get_aio_context() call in scsi_device_for_each_req_async_bh().
>>>>> If we lock before the nested event loop and unlock in the BH, the check
>>>>> in the BH can become an assertion. (It is important that we unlock in
>>>>> the BH rather than after waiting because if something takes the writer
>>>>> lock, we need to unlock during the nested event loop of bdrv_wrlock() to
>>>>> avoid a deadlock.)
>>>>>
>>>>> And spawning a coroutine for this looks a lot more acceptable because
>>>>> it's on a slow path anyway.
>>>>>
>>>>> In fact, we probably don't technically need a coroutine to take the
>>>>> reader lock here. We can have a new graph lock function that asserts
>>>>> that there is no writer (we know because we're running in the main loop)
>>>>> and then atomically increments the reader count. But maybe that already
>>>>> complicates things again...
>>>> So as far as I understand we can’t just use aio_wait_bh_oneshot() and wrap
>>>> it in bdrv_graph_rd{,un}lock_main_loop(), because that doesn’t actually lock
>>>> the graph. I feel like adding a new graph lock function for this quite
>>>> highly specific case could be dangerous, because it seems easy to use the
>>>> wrong way.
>>>>
>>>> Just having a trampoline coroutine to call bdrv_graph_co_rd{,un}lock() seems
>>>> simple enough and reasonable here (not a hot path). Can we have that
>>>> coroutine then use aio_wait_bh_oneshot() with the existing _bh function, or
>>>> should that be made a coroutine, too?
>>> There is a reason for running in the context associated with the BB: the
>>> virtio-scsi code assumes all request processing happens in the BB's
>>> AioContext. The SCSI request list and other SCSI emulation code are not
>>> thread-safe!
>> One peculiarity about virtio-scsi, as far as I understand, is that its
>> context is not necessarily the BB’s context, because one virtio-scsi device
>> may have many BBs. While the BBs are being hot-plugged or un-plugged, their
>> context may change (as is happening here), but that doesn’t stop SCSI
>> request processing, because SCSI requests happen independently of whether
>> there are devices on the SCSI bus.
>>
>> If SCSI request processing is not thread-safe, doesn’t that mean it always
>> must be done in the very same context, i.e. the context the virtio-scsi
>> device was configured to use? Just because a new scsi-hd BB is added or
>> removed, and so we temporarily have a main context BB associated with the
>> virtio-scsi device, I don’t think we should switch to processing requests in
>> the main context.
> This case is not supposed to happen because virtio_scsi_hotplug()
> immediately places the BB into the virtio-scsi device's AioContext by
> calling blk_set_aio_context().
>
> The AioContext invariant is checked at several points in the SCSI
> request lifecycle by this function:
>
> static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, SCSIDevice *d)
> {
>     if (s->dataplane_started && d && blk_is_available(d->conf.blk)) {
>         assert(blk_get_aio_context(d->conf.blk) == s->ctx);
>     }
> }
Yes, in fact, when I looked at other callers of blk_get_aio_context(),
this was one place that didn’t really make sense to me, exactly because
I doubt the invariant.
(Other places are scsi_aio_complete() and scsi_read_complete_noio().)
> Did you find a scenario where the virtio-scsi AioContext is different
> from the scsi-hd BB's AioContext?
Technically, that’s the reason for this thread, specifically that
virtio_scsi_hotunplug() switches the BB back to the main context while
scsi_device_for_each_req_async_bh() is running. Yes, we can fix that
specific case via the in-flight counter, but I’m wondering whether
there’s really any merit in requiring the BB to always be in
virtio-scsi’s context, or whether it would make more sense to schedule
everything in virtio-scsi’s context. Now that BBs/BDSs can receive
requests from any context, that is.
The critical path is in hot-plugging and -unplugging, because those
happen in the main context, concurrently to request processing in
virtio-scsi’s context. As for hot-plugging, what I’ve seen is
https://issues.redhat.com/browse/RHEL-3934#comment-23272702 : The
scsi-hd device is created before it’s hot-plugged into virtio-scsi, so
technically, we do then have a scsi-hd device whose context is different
from virtio-scsi. The blk_drain() that’s being done to this new scsi-hd
device does lead into virtio_scsi_drained_begin(), so there is at least
some connection between the two.
As for hot-unplugging, my worry is that there may be SCSI requests
ongoing, which are processed in virtio-scsi’s context. My hope is that
scsi_device_purge_requests() settles all of them, so that there are no
requests left after virtio_scsi_hotunplug()’s
qdev_simple_device_unplug_cb(), before the BB is switched to the main
context. Right now, it doesn’t do that, though, because we use
scsi_device_for_each_req_async(), i.e. don’t wait for those requests to
be cancelled. With the in-flight patch, the subsequent blk_drain() in
scsi_device_purge_requests() would then await it, though.[1]
Even with that, the situation is all but clear to me. We do run
scsi_req_cancel_async() for every single request we currently have, and
then wait until the in-flight counter reaches 0, which seems good[2],
but with the rest of the unplugging code running in the main context,
and virtio-scsi continuing to process requests from the guest in a
different context, I can’t easily figure out why it would be impossible
for the guest to launch a SCSI request for that SCSI disk that is being
unplugged. On one hand, just because the guest has accepted hot unplug
does not mean it would be impossible to act against supposed protocol
and submit another request. On the other, this unplugging and unrealize
state machine to me is a very complex and opaque state machine that
makes it difficult to grasp where exactly the ties between a scsi-hd
device with its BB and the virtio-scsi device are completely severed,
i.e. until which point it is possible for virtio-scsi code to see the BB
during the unplugging process, and consider it the target of a request.
It just seems simpler to me to not rely on the BB's context at all.
Hanna
[1] So, fun note: Incrementing the in-flight counter would fix the bug
even without a drain in bdrv_try_change_aio_context(), because
scsi_device_purge_requests() has a blk_drain() anyway.
[2] I had to inspect the code, though, so this is already non-obvious.
There are no comments on either scsi_device_purge_one_req() or
scsi_device_purge_requests(), so it’s unclear what their guarantees
are. scsi_device_purge_one_req() calls scsi_req_cancel_async(), which
indicates that the request isn’t necessarily deleted after the function
returns, and so you need to look at the code: Non-I/O requests are
deleted, but I/O requests are not, they are just cancelled. However, I
assume that I/O requests have incremented some in-flight counter, so I
assume that the blk_drain() in scsi_device_purge_requests() takes care
of settling them all.
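For reference, the flow discussed in [2], as I read it (a simplified
sketch of hw/scsi/scsi-bus.c; the exact code may differ):

static void scsi_device_purge_one_req(SCSIRequest *req, void *opaque)
{
    /* Asynchronous: I/O requests are only cancelled here, not deleted. */
    scsi_req_cancel_async(req, NULL);
}

void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
{
    /* Schedules a BH in the BB's AioContext; does not wait for it. */
    scsi_device_for_each_req_async(sdev, scsi_device_purge_one_req, NULL);

    /* Settles the cancelled I/O requests by waiting for the in-flight
     * counter to reach zero. */
    blk_drain(sdev->conf.blk);
    scsi_device_set_ua(sdev, sense);
}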
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-02-01 15:25 ` Hanna Czenczek
@ 2024-02-01 15:49 ` Hanna Czenczek
2024-02-02 12:32 ` Hanna Czenczek
1 sibling, 0 replies; 57+ messages in thread
From: Hanna Czenczek @ 2024-02-01 15:49 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-block, qemu-devel
On 01.02.24 16:25, Hanna Czenczek wrote:
[...]
> It just seems simpler to me to not rely on the BB's context at all.
Hm, I now see the problem is that the processing (and scheduling) is
largely done in generic SCSI code, which doesn’t have access to
virtio-scsi’s context, only to that of the BB. That makes my idea quite
impossible. :/
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-02-01 15:25 ` Hanna Czenczek
2024-02-01 15:49 ` Hanna Czenczek
@ 2024-02-02 12:32 ` Hanna Czenczek
2024-02-06 19:32 ` Stefan Hajnoczi
1 sibling, 1 reply; 57+ messages in thread
From: Hanna Czenczek @ 2024-02-02 12:32 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-block, qemu-devel
On 01.02.24 16:25, Hanna Czenczek wrote:
> On 01.02.24 15:28, Stefan Hajnoczi wrote:
[...]
>> Did you find a scenario where the virtio-scsi AioContext is different
>> from the scsi-hd BB's AioContext?
>
> Technically, that’s the reason for this thread, specifically that
> virtio_scsi_hotunplug() switches the BB back to the main context while
> scsi_device_for_each_req_async_bh() is running. Yes, we can fix that
> specific case via the in-flight counter, but I’m wondering whether
> there’s really any merit in requiring the BB to always be in
> virtio-scsi’s context, or whether it would make more sense to schedule
> everything in virtio-scsi’s context. Now that BBs/BDSs can receive
> requests from any context, that is.
Now that I know that wouldn’t be easy, let me turn this around: As far
as I understand, scsi_device_for_each_req_async_bh() should still run in
virtio-scsi’s context, but that’s hard, so we take the BB’s context,
which we therefore require to be the same one. Further, (again AFAIU,)
virtio-scsi’s context cannot change (it is only set in
virtio_scsi_dataplane_setup(), which runs from
virtio_scsi_device_realize()). Therefore, why does the
scsi_device_for_each_req_async() code accommodate BB context changes?
Hanna
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
2024-02-02 12:32 ` Hanna Czenczek
@ 2024-02-06 19:32 ` Stefan Hajnoczi
0 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2024-02-06 19:32 UTC (permalink / raw)
To: Hanna Czenczek; +Cc: Kevin Wolf, qemu-block, qemu-devel
On Fri, Feb 02, 2024 at 01:32:39PM +0100, Hanna Czenczek wrote:
> On 01.02.24 16:25, Hanna Czenczek wrote:
> > On 01.02.24 15:28, Stefan Hajnoczi wrote:
>
> [...]
>
> > > Did you find a scenario where the virtio-scsi AioContext is different
> > > from the scsi-hd BB's AioContext?
> >
> > Technically, that’s the reason for this thread, specifically that
> > virtio_scsi_hotunplug() switches the BB back to the main context while
> > scsi_device_for_each_req_async_bh() is running. Yes, we can fix that
> > specific case via the in-flight counter, but I’m wondering whether
> > there’s really any merit in requiring the BB to always be in
> > virtio-scsi’s context, or whether it would make more sense to schedule
> > everything in virtio-scsi’s context. Now that BBs/BDSs can receive
> > requests from any context, that is.
>
> Now that I know that wouldn’t be easy, let me turn this around: As far as I
> understand, scsi_device_for_each_req_async_bh() should still run in
> virtio-scsi’s context, but that’s hard, so we take the BB’s context, which
> we therefore require to be the same one. Further, (again AFAIU,)
> virtio-scsi’s context cannot change (it is only set in
> virtio_scsi_dataplane_setup(), which runs from
> virtio_scsi_device_realize()). Therefore, why does the
> scsi_device_for_each_req_async() code accommodate BB context changes?
1. scsi_disk_reset() -> scsi_device_purge_requests() is called without
in-flight requests.
2. The BH is scheduled by scsi_device_purge_requests() ->
scsi_device_for_each_req_async().
3. blk_drain() is a nop when there are no in-flight requests and does not
flush BHs.
4. The AioContext changes when the virtio-scsi device resets.
5. The BH executes.
Kevin and I touched on the idea of flushing BHs in bdrv_drain() even
when there are no requests in flight. This hasn't been implemented as of
today, but would also reduce the chance of scenarios like the one I
mentioned.
I think it's safer to handle the case where the BH runs after an
AioContext change until either everything is thread-safe or the
AioContext never changes.
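Schematically, the defensive pattern meant here is the pre-patch BH
quoted earlier in the thread (with hypothetical names):

static void example_bh(void *opaque)
{
    ExampleState *s = opaque;    /* hypothetical per-device state */
    AioContext *ctx = blk_get_aio_context(s->blk);

    if (ctx != qemu_get_current_aio_context()) {
        /* The BB moved between scheduling and running: follow it rather
         * than touching state from the wrong thread. */
        aio_bh_schedule_oneshot(ctx, example_bh, s);
        return;
    }
    /* ... per-AioContext state is safe to touch here ... */
}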
Stefan
^ permalink raw reply [flat|nested] 57+ messages in thread
end of thread, other threads: [~2024-02-06 19:33 UTC | newest]
Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-21 21:23 [PULL 00/33] Block layer patches Kevin Wolf
2023-12-21 21:23 ` [PULL 01/33] nbd/server: avoid per-NBDRequest nbd_client_get/put() Kevin Wolf
2023-12-21 21:23 ` [PULL 02/33] nbd/server: only traverse NBDExport->clients from main loop thread Kevin Wolf
2023-12-21 21:23 ` [PULL 03/33] nbd/server: introduce NBDClient->lock to protect fields Kevin Wolf
2023-12-21 21:23 ` [PULL 04/33] block/file-posix: set up Linux AIO and io_uring in the current thread Kevin Wolf
2023-12-21 21:23 ` [PULL 05/33] virtio-blk: add lock to protect s->rq Kevin Wolf
2023-12-21 21:23 ` [PULL 06/33] virtio-blk: don't lock AioContext in the completion code path Kevin Wolf
2023-12-21 21:23 ` [PULL 07/33] virtio-blk: don't lock AioContext in the submission " Kevin Wolf
2023-12-21 21:23 ` [PULL 08/33] block: Fix crash when loading snapshot on inactive node Kevin Wolf
2023-12-21 21:23 ` [PULL 09/33] vl: Improve error message for conflicting -incoming and -loadvm Kevin Wolf
2023-12-21 21:23 ` [PULL 10/33] iotests: Basic tests for internal snapshots Kevin Wolf
2023-12-21 21:23 ` [PULL 11/33] scsi: only access SCSIDevice->requests from one thread Kevin Wolf
2024-01-23 16:40 ` Hanna Czenczek
2024-01-23 17:10 ` Kevin Wolf
2024-01-23 17:23 ` Hanna Czenczek
2024-01-24 12:12 ` Hanna Czenczek
2024-01-24 21:53 ` Stefan Hajnoczi
2024-01-25 9:06 ` Hanna Czenczek
2024-01-25 13:25 ` Stefan Hajnoczi
2024-01-25 17:32 ` Hanna Czenczek
2024-01-26 13:18 ` Kevin Wolf
2024-01-26 15:24 ` Hanna Czenczek
2024-01-31 20:35 ` Stefan Hajnoczi
2024-02-01 14:10 ` Hanna Czenczek
2024-02-01 14:28 ` Stefan Hajnoczi
2024-02-01 15:25 ` Hanna Czenczek
2024-02-01 15:49 ` Hanna Czenczek
2024-02-02 12:32 ` Hanna Czenczek
2024-02-06 19:32 ` Stefan Hajnoczi
2024-01-29 16:30 ` Hanna Czenczek
2024-01-31 10:17 ` Kevin Wolf
2024-02-01 9:43 ` Hanna Czenczek
2024-02-01 10:21 ` Kevin Wolf
2024-02-01 10:35 ` Hanna Czenczek
2024-01-23 17:21 ` Hanna Czenczek
2023-12-21 21:23 ` [PULL 12/33] virtio-scsi: don't lock AioContext around virtio_queue_aio_attach_host_notifier() Kevin Wolf
2023-12-21 21:23 ` [PULL 13/33] scsi: don't lock AioContext in I/O code path Kevin Wolf
2023-12-21 21:23 ` [PULL 14/33] dma-helpers: don't lock AioContext in dma_blk_cb() Kevin Wolf
2023-12-21 21:23 ` [PULL 15/33] virtio-scsi: replace AioContext lock with tmf_bh_lock Kevin Wolf
2023-12-21 21:23 ` [PULL 16/33] scsi: assert that callbacks run in the correct AioContext Kevin Wolf
2023-12-21 21:23 ` [PULL 17/33] tests: remove aio_context_acquire() tests Kevin Wolf
2023-12-21 21:23 ` [PULL 18/33] aio: make aio_context_acquire()/aio_context_release() a no-op Kevin Wolf
2023-12-21 21:23 ` [PULL 19/33] graph-lock: remove AioContext locking Kevin Wolf
2023-12-21 21:23 ` [PULL 20/33] block: " Kevin Wolf
2023-12-21 21:23 ` [PULL 21/33] block: remove bdrv_co_lock() Kevin Wolf
2023-12-21 21:23 ` [PULL 22/33] scsi: remove AioContext locking Kevin Wolf
2023-12-21 21:23 ` [PULL 23/33] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED() Kevin Wolf
2023-12-21 21:23 ` [PULL 24/33] aio: remove aio_context_acquire()/aio_context_release() API Kevin Wolf
2023-12-21 21:23 ` [PULL 25/33] docs: remove AioContext lock from IOThread docs Kevin Wolf
2023-12-21 21:23 ` [PULL 26/33] scsi: remove outdated AioContext lock comment Kevin Wolf
2023-12-21 21:23 ` [PULL 27/33] job: remove outdated AioContext locking comments Kevin Wolf
2023-12-21 21:23 ` [PULL 28/33] block: " Kevin Wolf
2023-12-21 21:23 ` [PULL 29/33] block-coroutine-wrapper: use qemu_get_current_aio_context() Kevin Wolf
2023-12-21 21:23 ` [PULL 30/33] string-output-visitor: show structs as "<omitted>" Kevin Wolf
2023-12-21 21:23 ` [PULL 31/33] qdev-properties: alias all object class properties Kevin Wolf
2023-12-21 21:23 ` [PULL 32/33] qdev: add IOThreadVirtQueueMappingList property type Kevin Wolf
2023-12-21 21:23 ` [PULL 33/33] virtio-blk: add iothread-vq-mapping parameter Kevin Wolf