* [PULL 00/24] Block layer patches
From: Kevin Wolf @ 2021-06-30 16:01 UTC
  To: qemu-block; +Cc: kwolf, peter.maydell, qemu-devel

The following changes since commit 13d5f87cc3b94bfccc501142df4a7b12fee3a6e7:

  Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-axp-20210628' into staging (2021-06-29 10:02:42 +0100)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to a527e312b59ac382cb84af4b91f517a846f50705:

  vhost-user-blk: Implement reconnection during realize (2021-06-30 13:21:22 +0200)

----------------------------------------------------------------
Block layer patches

- Support changing 'file' in x-blockdev-reopen
- ssh: add support for sha256 host key fingerprints
- vhost-user-blk: Implement reconnection during realize
- introduce QEMU_AUTO_VFREE
- Don't require password of encrypted backing file for image creation
- Code cleanups

----------------------------------------------------------------
Alberto Garcia (2):
      block: Allow changing bs->file on reopen
      iotests: Test replacing files with x-blockdev-reopen

Daniel P. Berrangé (1):
      block/ssh: add support for sha256 host key fingerprints

Eric Blake (1):
      block: Move read-only check during truncation earlier

Kevin Wolf (7):
      vhost: Add Error parameter to vhost_dev_init()
      vhost: Distinguish errors in vhost_backend_init()
      vhost: Return 0/-errno in vhost_dev_init()
      vhost-user-blk: Add Error parameter to vhost_user_blk_start()
      vhost: Distinguish errors in vhost_dev_get_config()
      vhost-user-blk: Factor out vhost_user_blk_realize_connect()
      vhost-user-blk: Implement reconnection during realize

Max Reitz (1):
      block: BDRV_O_NO_IO for backing file on creation

Miroslav Rezanina (1):
      Prevent compiler warning on block.c

Vladimir Sementsov-Ogievskiy (11):
      block: rename bdrv_replace_child to bdrv_replace_child_tran
      block: comment graph-modifying function not updating permissions
      block: introduce bdrv_remove_file_or_backing_child()
      block: introduce bdrv_set_file_or_backing_noperm()
      block: bdrv_reopen_parse_backing(): don't check aio context
      block: bdrv_reopen_parse_backing(): don't check frozen child
      block: bdrv_reopen_parse_backing(): simplify handling implicit filters
      block: move supports_backing check to bdrv_set_file_or_backing_noperm()
      block: BDRVReopenState: drop replace_backing_bs field
      introduce QEMU_AUTO_VFREE
      block/commit: use QEMU_AUTO_VFREE

 qapi/block-core.json              |   3 +-
 include/block/block.h             |   2 +-
 include/hw/virtio/vhost-backend.h |   5 +-
 include/hw/virtio/vhost.h         |   6 +-
 include/qemu/osdep.h              |  15 ++
 backends/cryptodev-vhost.c        |   5 +-
 backends/vhost-user.c             |   4 +-
 block.c                           | 314 +++++++++++++++++++++-----------------
 block/commit.c                    |  25 ++-
 block/io.c                        |  10 +-
 block/ssh.c                       |   3 +
 hw/block/vhost-user-blk.c         | 102 ++++++++-----
 hw/display/vhost-user-gpu.c       |   6 +-
 hw/input/vhost-user-input.c       |   6 +-
 hw/net/vhost_net.c                |   8 +-
 hw/scsi/vhost-scsi.c              |   4 +-
 hw/scsi/vhost-user-scsi.c         |   4 +-
 hw/virtio/vhost-backend.c         |   6 +-
 hw/virtio/vhost-user-fs.c         |   3 +-
 hw/virtio/vhost-user-vsock.c      |  12 +-
 hw/virtio/vhost-user.c            |  71 +++++----
 hw/virtio/vhost-vdpa.c            |   8 +-
 hw/virtio/vhost-vsock.c           |   3 +-
 hw/virtio/vhost.c                 |  41 +++--
 tests/unit/test-bdrv-drain.c      |   1 +
 tests/unit/test-bdrv-graph-mod.c  |   1 +
 tests/qemu-iotests/189            |   2 +-
 tests/qemu-iotests/198            |   2 +-
 tests/qemu-iotests/207            |  54 +++++++
 tests/qemu-iotests/207.out        |  25 +++
 tests/qemu-iotests/245            | 140 +++++++++++++++--
 tests/qemu-iotests/245.out        |  11 +-
 32 files changed, 599 insertions(+), 303 deletions(-)




* Re: [PULL 00/24] Block layer patches
From: Peter Maydell @ 2021-07-02 13:52 UTC
  To: Kevin Wolf; +Cc: QEMU Developers, Qemu-block

On Wed, 30 Jun 2021 at 17:02, Kevin Wolf <kwolf@redhat.com> wrote:
>
> The following changes since commit 13d5f87cc3b94bfccc501142df4a7b12fee3a6e7:
>
>   Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-axp-20210628' into staging (2021-06-29 10:02:42 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to a527e312b59ac382cb84af4b91f517a846f50705:
>
>   vhost-user-blk: Implement reconnection during realize (2021-06-30 13:21:22 +0200)
>
> ----------------------------------------------------------------
> Block layer patches
>
> - Support changing 'file' in x-blockdev-reopen
> - ssh: add support for sha256 host key fingerprints
> - vhost-user-blk: Implement reconnection during realize
> - introduce QEMU_AUTO_VFREE
> - Don't require password of encrypted backing file for image creation
> - Code cleanups


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



* [PULL 00/24] Block layer patches
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

The following changes since commit 09be8a511a2e278b45729d7b065d30c68dd699d0:

  Merge tag 'pull-qapi-2025-06-03' of https://repo.or.cz/qemu/armbru into staging (2025-06-03 09:19:26 -0400)

are available in the Git repository at:

  https://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to eef2dd03f948a512499775043bdc0c5c88d8a2dd:

  hw/core/qdev-properties-system: Add missing return in set_drive_helper() (2025-06-04 18:16:34 +0200)

----------------------------------------------------------------
Block layer patches

- Deadlock fixes: Do not drain while holding the graph lock
- qdev-properties-system: Fix assertion failure in set_drive_helper()
- iotests: fix 240

----------------------------------------------------------------
Andrey Drobyshev (1):
      iotests/graph-changes-while-io: add test case with removal of lower snapshot

Fiona Ebner (22):
      block: remove outdated comments about AioContext locking
      block: move drain outside of read-locked bdrv_reopen_queue_child()
      block/snapshot: move drain outside of read-locked bdrv_snapshot_delete()
      block: move drain outside of read-locked bdrv_inactivate_recurse()
      block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK
      block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR)
      block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK
      block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
      block: move drain outside of bdrv_try_change_aio_context()
      block: move drain outside of bdrv_attach_child_common(_abort)()
      block: move drain outside of bdrv_set_backing_hd_drained()
      block: move drain outside of bdrv_root_attach_child()
      block: move drain outside of bdrv_attach_child()
      block: move drain outside of quorum_add_child()
      block: move drain outside of bdrv_root_unref_child()
      block: move drain outside of quorum_del_child()
      blockdev: drain while unlocked in internal_snapshot_action()
      blockdev: drain while unlocked in external_snapshot_action()
      block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED
      iotests/graph-changes-while-io: remove image file after test
      block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()
      hw/core/qdev-properties-system: Add missing return in set_drive_helper()

Stefan Hajnoczi (1):
      iotests: fix 240

 include/block/block-global-state.h                 |  19 +-
 include/block/block-io.h                           |   2 +-
 include/block/block_int-common.h                   |  32 ++-
 include/block/blockjob.h                           |   2 +
 block.c                                            | 235 ++++++++++++++-------
 block/backup.c                                     |   2 +
 block/blklogwrites.c                               |   4 +
 block/blkverify.c                                  |   2 +
 block/block-backend.c                              |  10 +-
 block/commit.c                                     |   4 +
 block/io.c                                         |   1 -
 block/mirror.c                                     |   5 +
 block/qcow2.c                                      |   4 +
 block/quorum.c                                     |   4 +
 block/replication.c                                |   7 +
 block/snapshot.c                                   |  28 ++-
 block/stream.c                                     |  10 +-
 block/vmdk.c                                       |  10 +
 blockdev.c                                         |  78 +++++--
 blockjob.c                                         |  12 +-
 hw/core/qdev-properties-system.c                   |   1 +
 qemu-img.c                                         |   2 +
 tests/unit/test-bdrv-drain.c                       |  24 ++-
 tests/unit/test-bdrv-graph-mod.c                   |  10 +
 tests/qemu-iotests/240                             |   2 -
 tests/qemu-iotests/240.out                         |   4 +-
 tests/qemu-iotests/tests/graph-changes-while-io    | 102 ++++++++-
 .../qemu-iotests/tests/graph-changes-while-io.out  |   4 +-
 28 files changed, 475 insertions(+), 145 deletions(-)




* [PULL 01/24] block: remove outdated comments about AioContext locking
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

AioContext locking was removed in commit b49f4755c7 ("block: remove
AioContext locking").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-2-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/block.c b/block.c
index f222e1a50a..a5399888ba 100644
--- a/block.c
+++ b/block.c
@@ -4359,8 +4359,6 @@ bdrv_recurse_has_child(BlockDriverState *bs, BlockDriverState *child)
  * bs_queue, or the existing bs_queue being used.
  *
  * bs is drained here and undrained by bdrv_reopen_queue_free().
- *
- * To be called with bs->aio_context locked.
  */
 static BlockReopenQueue * GRAPH_RDLOCK
 bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
@@ -4519,7 +4517,6 @@ bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
     return bs_queue;
 }
 
-/* To be called with bs->aio_context locked */
 BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
                                     BlockDriverState *bs,
                                     QDict *options, bool keep_old_opts)
@@ -7278,10 +7275,6 @@ bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
     return true;
 }
 
-/*
- * Must not be called while holding the lock of an AioContext other than the
- * current one.
- */
 void bdrv_img_create(const char *filename, const char *fmt,
                      const char *base_filename, const char *base_fmt,
                      char *options, uint64_t img_size, int flags, bool quiet,
-- 
2.49.0




* [PULL 02/24] block: move drain outside of read-locked bdrv_reopen_queue_child()
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

More granular draining is not trivially possible, because
bdrv_reopen_queue_child() can recursively call itself.
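
From the caller's perspective, the drained section now spans the whole
reopen sequence. A minimal sketch (hypothetical caller, error handling
omitted; creating a fresh queue starts the drain, freeing the queue
ends it):

    BlockReopenQueue *queue;
    int ret;

    /* bdrv_drain_all_begin() runs inside, since bs_queue is NULL */
    queue = bdrv_reopen_queue(NULL, bs, opts, true);
    /* ends with bdrv_reopen_queue_free() -> bdrv_drain_all_end() */
    ret = bdrv_reopen_multiple(queue, errp);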

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-3-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index a5399888ba..3065d5c91e 100644
--- a/block.c
+++ b/block.c
@@ -4358,7 +4358,7 @@ bdrv_recurse_has_child(BlockDriverState *bs, BlockDriverState *child)
  * returns a pointer to bs_queue, which is either the newly allocated
  * bs_queue, or the existing bs_queue being used.
  *
- * bs is drained here and undrained by bdrv_reopen_queue_free().
+ * bs must be drained.
  */
 static BlockReopenQueue * GRAPH_RDLOCK
 bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
@@ -4377,12 +4377,7 @@ bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
 
     GLOBAL_STATE_CODE();
 
-    /*
-     * Strictly speaking, draining is illegal under GRAPH_RDLOCK. We know that
-     * we've been called with bdrv_graph_rdlock_main_loop(), though, so it's ok
-     * in practice.
-     */
-    bdrv_drained_begin(bs);
+    assert(bs->quiesce_counter > 0);
 
     if (bs_queue == NULL) {
         bs_queue = g_new0(BlockReopenQueue, 1);
@@ -4522,6 +4517,12 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
                                     QDict *options, bool keep_old_opts)
 {
     GLOBAL_STATE_CODE();
+
+    if (bs_queue == NULL) {
+        /* Paired with bdrv_drain_all_end() in bdrv_reopen_queue_free(). */
+        bdrv_drain_all_begin();
+    }
+
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
     return bdrv_reopen_queue_child(bs_queue, bs, options, NULL, 0, false,
@@ -4534,12 +4535,14 @@ void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue)
     if (bs_queue) {
         BlockReopenQueueEntry *bs_entry, *next;
         QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
-            bdrv_drained_end(bs_entry->state.bs);
             qobject_unref(bs_entry->state.explicit_options);
             qobject_unref(bs_entry->state.options);
             g_free(bs_entry);
         }
         g_free(bs_queue);
+
+        /* Paired with bdrv_drain_all_begin() in bdrv_reopen_queue(). */
+        bdrv_drain_all_end();
     }
 }
 
-- 
2.49.0




* [PULL 03/24] block/snapshot: move drain outside of read-locked bdrv_snapshot_delete()
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

More granular draining is not trivially possible, because
bdrv_snapshot_delete() can recursively call itself.

The return value of bdrv_all_delete_snapshot() changes from -1 to
-errno propagated from failed sub-calls. This is fine for the existing
callers of bdrv_all_delete_snapshot().
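
Under the new contract, a caller of bdrv_snapshot_delete() drains and
takes the graph lock itself; a sketch mirroring the qemu-img.c hunk
below:

    bdrv_drain_all_begin();
    bdrv_graph_rdlock_main_loop();
    ret = bdrv_snapshot_delete(bs, snapshot_id, name, errp);
    bdrv_graph_rdunlock_main_loop();
    bdrv_drain_all_end();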

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-4-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/snapshot.c | 26 +++++++++++++++-----------
 blockdev.c       | 25 +++++++++++++++++--------
 qemu-img.c       |  2 ++
 3 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index 22567f1fb9..9f300a78bd 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -327,7 +327,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 
 /**
  * Delete an internal snapshot by @snapshot_id and @name.
- * @bs: block device used in the operation
+ * @bs: block device used in the operation, must be drained
  * @snapshot_id: unique snapshot ID, or NULL
  * @name: snapshot name, or NULL
  * @errp: location to store error
@@ -358,6 +358,8 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
 
     GLOBAL_STATE_CODE();
 
+    assert(bs->quiesce_counter > 0);
+
     if (!drv) {
         error_setg(errp, "Device '%s' has no medium",
                    bdrv_get_device_name(bs));
@@ -368,9 +370,6 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
         return -EINVAL;
     }
 
-    /* drain all pending i/o before deleting snapshot */
-    bdrv_drained_begin(bs);
-
     if (drv->bdrv_snapshot_delete) {
         ret = drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
     } else if (fallback_bs) {
@@ -382,7 +381,6 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
         ret = -ENOTSUP;
     }
 
-    bdrv_drained_end(bs);
     return ret;
 }
 
@@ -571,19 +569,22 @@ int bdrv_all_delete_snapshot(const char *name,
     ERRP_GUARD();
     g_autoptr(GList) bdrvs = NULL;
     GList *iterbdrvs;
+    int ret = 0;
 
     GLOBAL_STATE_CODE();
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
 
-    if (bdrv_all_get_snapshot_devices(has_devices, devices, &bdrvs, errp) < 0) {
-        return -1;
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
+
+    ret = bdrv_all_get_snapshot_devices(has_devices, devices, &bdrvs, errp);
+    if (ret < 0) {
+        goto out;
     }
 
     iterbdrvs = bdrvs;
     while (iterbdrvs) {
         BlockDriverState *bs = iterbdrvs->data;
         QEMUSnapshotInfo sn1, *snapshot = &sn1;
-        int ret = 0;
 
         if ((devices || bdrv_all_snapshots_includes_bs(bs)) &&
             bdrv_snapshot_find(bs, snapshot, name) >= 0)
@@ -594,13 +595,16 @@ int bdrv_all_delete_snapshot(const char *name,
         if (ret < 0) {
             error_prepend(errp, "Could not delete snapshot '%s' on '%s': ",
                           name, bdrv_get_device_or_node_name(bs));
-            return -1;
+            goto out;
         }
 
         iterbdrvs = iterbdrvs->next;
     }
 
-    return 0;
+out:
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
+    return ret;
 }
 
 
diff --git a/blockdev.c b/blockdev.c
index 21443b4514..3982f9776b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1132,39 +1132,41 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
     int ret;
 
     GLOBAL_STATE_CODE();
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
 
     bs = qmp_get_root_bs(device, errp);
     if (!bs) {
-        return NULL;
+        goto error;
     }
 
     if (!id && !name) {
         error_setg(errp, "Name or id must be provided");
-        return NULL;
+        goto error;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT_DELETE, errp)) {
-        return NULL;
+        goto error;
     }
 
     ret = bdrv_snapshot_find_by_id_and_name(bs, id, name, &sn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
-        return NULL;
+        goto error;
     }
     if (!ret) {
         error_setg(errp,
                    "Snapshot with id '%s' and name '%s' does not exist on "
                    "device '%s'",
                    STR_OR_NULL(id), STR_OR_NULL(name), device);
-        return NULL;
+        goto error;
     }
 
     bdrv_snapshot_delete(bs, id, name, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
-        return NULL;
+        goto error;
     }
 
     info = g_new0(SnapshotInfo, 1);
@@ -1180,6 +1182,9 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
         info->has_icount = true;
     }
 
+error:
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
     return info;
 }
 
@@ -1295,12 +1300,14 @@ static void internal_snapshot_abort(void *opaque)
     Error *local_error = NULL;
 
     GLOBAL_STATE_CODE();
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
 
     if (!state->created) {
         return;
     }
 
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
+
     if (bdrv_snapshot_delete(bs, sn->id_str, sn->name, &local_error) < 0) {
         error_reportf_err(local_error,
                           "Failed to delete snapshot with id '%s' and "
@@ -1308,6 +1315,8 @@ static void internal_snapshot_abort(void *opaque)
                           sn->id_str, sn->name,
                           bdrv_get_device_name(bs));
     }
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
 }
 
 static void internal_snapshot_clean(void *opaque)
diff --git a/qemu-img.c b/qemu-img.c
index 139eeb5039..e75707180d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3505,6 +3505,7 @@ static int img_snapshot(int argc, char **argv)
         break;
 
     case SNAPSHOT_DELETE:
+        bdrv_drain_all_begin();
         bdrv_graph_rdlock_main_loop();
         ret = bdrv_snapshot_find(bs, &sn, snapshot_name);
         if (ret < 0) {
@@ -3520,6 +3521,7 @@ static int img_snapshot(int argc, char **argv)
             }
         }
         bdrv_graph_rdunlock_main_loop();
+        bdrv_drain_all_end();
         break;
     }
 
-- 
2.49.0




* [PULL 04/24] block: move drain outside of read-locked bdrv_inactivate_recurse()
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

More granular draining is not trivially possible, because
bdrv_inactivate_recurse() can recursively call itself.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-5-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 3065d5c91e..fa55dfba68 100644
--- a/block.c
+++ b/block.c
@@ -6989,6 +6989,8 @@ bdrv_inactivate_recurse(BlockDriverState *bs, bool top_level)
 
     GLOBAL_STATE_CODE();
 
+    assert(bs->quiesce_counter > 0);
+
     if (!bs->drv) {
         return -ENOMEDIUM;
     }
@@ -7032,9 +7034,7 @@ bdrv_inactivate_recurse(BlockDriverState *bs, bool top_level)
         return -EPERM;
     }
 
-    bdrv_drained_begin(bs);
     bs->open_flags |= BDRV_O_INACTIVE;
-    bdrv_drained_end(bs);
 
     /*
      * Update permissions, they may differ for inactive nodes.
@@ -7059,20 +7059,26 @@ int bdrv_inactivate(BlockDriverState *bs, Error **errp)
     int ret;
 
     GLOBAL_STATE_CODE();
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
 
     if (bdrv_has_bds_parent(bs, true)) {
         error_setg(errp, "Node has active parent node");
-        return -EPERM;
+        ret = -EPERM;
+        goto out;
     }
 
     ret = bdrv_inactivate_recurse(bs, true);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to inactivate node");
-        return ret;
+        goto out;
     }
 
-    return 0;
+out:
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
+    return ret;
 }
 
 int bdrv_inactivate_all(void)
@@ -7082,7 +7088,9 @@ int bdrv_inactivate_all(void)
     int ret = 0;
 
     GLOBAL_STATE_CODE();
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
 
     for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
         /* Nodes with BDS parents are covered by recursion from the last
@@ -7098,6 +7106,9 @@ int bdrv_inactivate_all(void)
         }
     }
 
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
+
     return ret;
 }
 
-- 
2.49.0




* [PULL 05/24] block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is a small step in preparation to mark bdrv_drained_begin() as
GRAPH_UNLOCKED. More concretely, it allows marking the
change_aio_ctx() callback GRAPH_RDLOCK_PTR, which is the next step.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-6-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index fa55dfba68..7207978e53 100644
--- a/block.c
+++ b/block.c
@@ -7575,10 +7575,10 @@ typedef struct BdrvStateSetAioContext {
     BlockDriverState *bs;
 } BdrvStateSetAioContext;
 
-static bool bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
-                                           GHashTable *visited,
-                                           Transaction *tran,
-                                           Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
+                               GHashTable *visited, Transaction *tran,
+                               Error **errp)
 {
     GLOBAL_STATE_CODE();
     if (g_hash_table_contains(visited, c)) {
-- 
2.49.0




* [PULL 06/24] block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR)
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is a small step in preparation to mark bdrv_drained_begin() as
GRAPH_UNLOCKED. More concretely, it is in preparation to move the
drain out of bdrv_change_aio_context() and to mark that function as
GRAPH_RDLOCK.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-7-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-common.h | 6 +++---
 block.c                          | 7 ++++---
 block/block-backend.c            | 6 +++---
 blockjob.c                       | 6 +++---
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 2982dd3118..37466c7841 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -983,9 +983,9 @@ struct BdrvChildClass {
                            bool backing_mask_protocol,
                            Error **errp);
 
-    bool (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
-                           GHashTable *visited, Transaction *tran,
-                           Error **errp);
+    bool GRAPH_RDLOCK_PTR (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
+                                            GHashTable *visited,
+                                            Transaction *tran, Error **errp);
 
     /*
      * I/O API functions. These functions are thread-safe.
diff --git a/block.c b/block.c
index 7207978e53..01144c895e 100644
--- a/block.c
+++ b/block.c
@@ -1226,9 +1226,10 @@ static int bdrv_child_cb_inactivate(BdrvChild *child)
     return 0;
 }
 
-static bool bdrv_child_cb_change_aio_ctx(BdrvChild *child, AioContext *ctx,
-                                         GHashTable *visited, Transaction *tran,
-                                         Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_child_cb_change_aio_ctx(BdrvChild *child, AioContext *ctx,
+                             GHashTable *visited, Transaction *tran,
+                             Error **errp)
 {
     BlockDriverState *bs = child->opaque;
     return bdrv_change_aio_context(bs, ctx, visited, tran, errp);
diff --git a/block/block-backend.c b/block/block-backend.c
index a402db13f2..6a6949edeb 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -136,9 +136,9 @@ static void blk_root_drained_end(BdrvChild *child);
 static void blk_root_change_media(BdrvChild *child, bool load);
 static void blk_root_resize(BdrvChild *child);
 
-static bool blk_root_change_aio_ctx(BdrvChild *child, AioContext *ctx,
-                                    GHashTable *visited, Transaction *tran,
-                                    Error **errp);
+static bool GRAPH_RDLOCK
+blk_root_change_aio_ctx(BdrvChild *child, AioContext *ctx, GHashTable *visited,
+                        Transaction *tran, Error **errp);
 
 static char *blk_root_get_parent_desc(BdrvChild *child)
 {
diff --git a/blockjob.c b/blockjob.c
index 32007f31a9..34185d7715 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -144,9 +144,9 @@ static TransactionActionDrv change_child_job_context = {
     .clean = g_free,
 };
 
-static bool child_job_change_aio_ctx(BdrvChild *c, AioContext *ctx,
-                                     GHashTable *visited, Transaction *tran,
-                                     Error **errp)
+static bool GRAPH_RDLOCK
+child_job_change_aio_ctx(BdrvChild *c, AioContext *ctx, GHashTable *visited,
+                         Transaction *tran, Error **errp)
 {
     BlockJob *job = c->opaque;
     BdrvStateChildJobContext *s;
-- 
2.49.0




* [PULL 07/24] block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is a small step in preparation to mark bdrv_drained_begin() as
GRAPH_UNLOCKED. More concretely, it is in preparation to move the
drain out of bdrv_change_aio_context() and to mark that function as
GRAPH_RDLOCK.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-8-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 9be34b3c99..aad160956a 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -274,9 +274,10 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag);
 int bdrv_debug_resume(BlockDriverState *bs, const char *tag);
 bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
 
-bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
-                                   GHashTable *visited, Transaction *tran,
-                                   Error **errp);
+bool GRAPH_RDLOCK
+bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
+                              GHashTable *visited, Transaction *tran,
+                              Error **errp);
 int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
                                 BdrvChild *ignore_child, Error **errp);
 
-- 
2.49.0




* [PULL 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

Note that even if bdrv_drained_begin() were already marked as
GRAPH_UNLOCKED, TSA would not complain about the instance in
bdrv_change_aio_context() before this change, because it is preceded
by a bdrv_graph_rdunlock_main_loop() call. It is not correct to
release the lock here, and if the caller holds a write lock, the
call would not actually release it.

In combination with block-stream, there is a deadlock that can happen
because of this [0]. In particular, it can happen that
main thread              IO thread
1. acquires write lock
                         in blk_co_do_preadv_part():
                         2. have non-zero blk->in_flight
                         3. try to acquire read lock
4. begin drain

Steps 3 and 4 might occur in either order. Draining will poll and get stuck,
because it will see the non-zero in_flight counter. But the IO thread
will not make any progress either, because it cannot acquire the read
lock.
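
Reduced to its essentials, the same lock-order inversion can be
reproduced with generic pthreads primitives instead of QEMU's graph
lock. All names below are hypothetical; this is only an illustration
of the shape of the deadlock, not QEMU code:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <unistd.h>

    static pthread_rwlock_t graph_lock = PTHREAD_RWLOCK_INITIALIZER;
    static atomic_int in_flight;

    /* Stands in for the IO thread in blk_co_do_preadv_part() */
    static void *io_thread_fn(void *arg)
    {
        (void)arg;
        atomic_fetch_add(&in_flight, 1);      /* step 2 */
        usleep(2000);                         /* let the main thread take step 1 */
        pthread_rwlock_rdlock(&graph_lock);   /* step 3: blocks forever */
        pthread_rwlock_unlock(&graph_lock);
        atomic_fetch_sub(&in_flight, 1);      /* never reached */
        return NULL;
    }

    int main(void)
    {
        pthread_t io_thread;

        pthread_create(&io_thread, NULL, io_thread_fn, NULL);
        usleep(1000);
        pthread_rwlock_wrlock(&graph_lock);   /* step 1 */
        while (atomic_load(&in_flight) > 0) { /* step 4: drain polls forever */
            /* poll */
        }
        pthread_rwlock_unlock(&graph_lock);
        pthread_join(io_thread, NULL);
        return 0;
    }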

After this change, all paths to bdrv_change_aio_context() drain:
bdrv_change_aio_context() is called by:
1. bdrv_child_cb_change_aio_ctx() which is only called via the
   change_aio_ctx() callback, see below.
2. bdrv_child_change_aio_context(), see below.
3. bdrv_try_change_aio_context(), where a drained section is
   introduced.

The change_aio_ctx() callback is called by:
1. bdrv_attach_child_common_abort(), where a drained section is
   introduced.
2. bdrv_attach_child_common(), where a drained section is introduced.
3. bdrv_parent_change_aio_context(), see below.

bdrv_child_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
   section is invariant.
2. child_job_change_aio_ctx(), which is only called via the
   change_aio_ctx() callback, see above.

bdrv_parent_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
   section is invariant.

This resolves all code paths. Note that bdrv_attach_child_common()
and bdrv_attach_child_common_abort() hold the graph write lock and
callers of bdrv_try_change_aio_context() might too, so they are not
actually allowed to drain either. This will be addressed in the
following commits.

More granular draining is not trivially possible, because
bdrv_change_aio_context() can recursively call itself e.g. via
bdrv_child_change_aio_context().

[0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/

Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-9-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-common.h | 12 +++++++
 block.c                          | 57 +++++++++++++++++++++++---------
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 37466c7841..168f703fa1 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -983,6 +983,18 @@ struct BdrvChildClass {
                            bool backing_mask_protocol,
                            Error **errp);
 
+    /*
+     * Notifies the parent that the child is trying to change its AioContext.
+     * The parent may in turn change the AioContext of other nodes in the same
+     * transaction. Returns true if the change is possible and the transaction
+     * can be continued. Returns false and sets @errp if not and the transaction
+     * must be aborted.
+     *
+     * @visited will accumulate all visited BdrvChild objects. The caller is
+     * responsible for freeing the list afterwards.
+     *
+     * Must be called with the affected block nodes drained.
+     */
     bool GRAPH_RDLOCK_PTR (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
                                             GHashTable *visited,
                                             Transaction *tran, Error **errp);
diff --git a/block.c b/block.c
index 01144c895e..6f42c0f1ab 100644
--- a/block.c
+++ b/block.c
@@ -106,9 +106,9 @@ static void bdrv_reopen_abort(BDRVReopenState *reopen_state);
 
 static bool bdrv_backing_overridden(BlockDriverState *bs);
 
-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                    GHashTable *visited, Transaction *tran,
-                                    Error **errp);
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                        GHashTable *visited, Transaction *tran, Error **errp);
 
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
@@ -3040,8 +3040,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
 
         /* No need to visit `child`, because it has been detached already */
         visited = g_hash_table_new(NULL, NULL);
+        bdrv_drain_all_begin();
         ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
                                               visited, tran, &error_abort);
+        bdrv_drain_all_end();
         g_hash_table_destroy(visited);
 
         /* transaction is supposed to always succeed */
@@ -3122,9 +3124,11 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
             bool ret_child;
 
             g_hash_table_add(visited, new_child);
+            bdrv_drain_all_begin();
             ret_child = child_class->change_aio_ctx(new_child, child_ctx,
                                                     visited, aio_ctx_tran,
                                                     NULL);
+            bdrv_drain_all_end();
             if (ret_child == true) {
                 error_free(local_err);
                 ret = 0;
@@ -7576,6 +7580,17 @@ typedef struct BdrvStateSetAioContext {
     BlockDriverState *bs;
 } BdrvStateSetAioContext;
 
+/*
+ * Changes the AioContext of @child to @ctx and recursively for the associated
+ * block nodes and all their children and parents. Returns true if the change is
+ * possible and the transaction @tran can be continued. Returns false and sets
+ * @errp if not and the transaction must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
 static bool GRAPH_RDLOCK
 bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
                                GHashTable *visited, Transaction *tran,
@@ -7604,6 +7619,17 @@ bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
     return true;
 }
 
+/*
+ * Changes the AioContext of @c->bs to @ctx and recursively for all its children
+ * and parents. Returns true if the change is possible and the transaction @tran
+ * can be continued. Returns false and sets @errp if not and the transaction
+ * must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
 bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
                                    GHashTable *visited, Transaction *tran,
                                    Error **errp)
@@ -7619,10 +7645,6 @@ bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
 static void bdrv_set_aio_context_clean(void *opaque)
 {
     BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
-    BlockDriverState *bs = (BlockDriverState *) state->bs;
-
-    /* Paired with bdrv_drained_begin in bdrv_change_aio_context() */
-    bdrv_drained_end(bs);
 
     g_free(state);
 }
@@ -7650,10 +7672,12 @@ static TransactionActionDrv set_aio_context = {
  *
  * @visited will accumulate all visited BdrvChild objects. The caller is
  * responsible for freeing the list afterwards.
+ *
+ * @bs must be drained.
  */
-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                    GHashTable *visited, Transaction *tran,
-                                    Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                        GHashTable *visited, Transaction *tran, Error **errp)
 {
     BdrvChild *c;
     BdrvStateSetAioContext *state;
@@ -7664,21 +7688,17 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
         return true;
     }
 
-    bdrv_graph_rdlock_main_loop();
     QLIST_FOREACH(c, &bs->parents, next_parent) {
         if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
-            bdrv_graph_rdunlock_main_loop();
             return false;
         }
     }
 
     QLIST_FOREACH(c, &bs->children, next) {
         if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
-            bdrv_graph_rdunlock_main_loop();
             return false;
         }
     }
-    bdrv_graph_rdunlock_main_loop();
 
     state = g_new(BdrvStateSetAioContext, 1);
     *state = (BdrvStateSetAioContext) {
@@ -7686,8 +7706,7 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
         .bs = bs,
     };
 
-    /* Paired with bdrv_drained_end in bdrv_set_aio_context_clean() */
-    bdrv_drained_begin(bs);
+    assert(bs->quiesce_counter > 0);
 
     tran_add(tran, &set_aio_context, state);
 
@@ -7720,6 +7739,8 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
     if (ignore_child) {
         g_hash_table_add(visited, ignore_child);
     }
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
     ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
     g_hash_table_destroy(visited);
 
@@ -7733,10 +7754,14 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
     if (!ret) {
         /* Just run clean() callbacks. No AioContext changed. */
         tran_abort(tran);
+        bdrv_graph_rdunlock_main_loop();
+        bdrv_drain_all_end();
         return -EPERM;
     }
 
     tran_commit(tran);
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
     return 0;
 }
 
-- 
2.49.0




* [PULL 09/24] block: move drain outside of bdrv_try_change_aio_context()
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is part of resolving the deadlock mentioned in commit "block:
move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

Convert the function to a _locked() version that has to be called with
the graph lock held and add a convenience wrapper that has to be
called with the graph unlocked, which drains and takes the lock
itself. Since bdrv_try_change_aio_context() is global state code, the
wrapper is too.

Callers are adapted to use the appropriate variant, depending on
whether the caller already holds the lock. In the
test_set_aio_context() unit test, prior drains can be removed, because
draining already happens inside the new wrapper.

Note that bdrv_attach_child_common_abort(), bdrv_attach_child_common()
and bdrv_root_unref_child() hold the graph lock and are not actually
allowed to drain either. This will be addressed in the following
commits.

Functions like qmp_blockdev_mirror() query the nodes to act on before
draining and locking. In theory, draining could invalidate those nodes.
This kind of issue is not addressed by these commits.
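
In short, the two calling conventions look like this (hypothetical
call sites):

    /* Without the graph lock: the wrapper drains, takes the lock,
     * and releases both again internally. */
    ret = bdrv_try_change_aio_context(bs, ctx, NULL, errp);

    /* Already drained and holding the graph lock, as
     * qmp_x_blockdev_set_iothread() is after this patch: */
    bdrv_drain_all_begin();
    bdrv_graph_rdlock_main_loop();
    ret = bdrv_try_change_aio_context_locked(bs, ctx, NULL, errp);
    bdrv_graph_rdunlock_main_loop();
    bdrv_drain_all_end();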

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-10-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h |  8 +++--
 block.c                            | 58 ++++++++++++++++++++++--------
 blockdev.c                         | 15 +++++---
 tests/unit/test-bdrv-drain.c       |  4 ---
 4 files changed, 59 insertions(+), 26 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index aad160956a..91f249b5ad 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -278,8 +278,12 @@ bool GRAPH_RDLOCK
 bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
                               GHashTable *visited, Transaction *tran,
                               Error **errp);
-int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                BdrvChild *ignore_child, Error **errp);
+int GRAPH_UNLOCKED
+bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                            BdrvChild *ignore_child, Error **errp);
+int GRAPH_RDLOCK
+bdrv_try_change_aio_context_locked(BlockDriverState *bs, AioContext *ctx,
+                                   BdrvChild *ignore_child, Error **errp);
 
 int GRAPH_RDLOCK bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz);
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
diff --git a/block.c b/block.c
index 6f42c0f1ab..3aaacabf7f 100644
--- a/block.c
+++ b/block.c
@@ -3028,7 +3028,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
     bdrv_replace_child_noperm(s->child, NULL);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
-        bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
+        bdrv_drain_all_begin();
+        bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
+                                           &error_abort);
+        bdrv_drain_all_end();
     }
 
     if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
@@ -3115,8 +3118,10 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
     parent_ctx = bdrv_child_get_parent_aio_context(new_child);
     if (child_ctx != parent_ctx) {
         Error *local_err = NULL;
-        int ret = bdrv_try_change_aio_context(child_bs, parent_ctx, NULL,
-                                              &local_err);
+        bdrv_drain_all_begin();
+        int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
+                                                     &local_err);
+        bdrv_drain_all_end();
 
         if (ret < 0 && child_class->change_aio_ctx) {
             Transaction *aio_ctx_tran = tran_new();
@@ -3319,8 +3324,10 @@ void bdrv_root_unref_child(BdrvChild *child)
          * When the parent requiring a non-default AioContext is removed, the
          * node moves back to the main AioContext
          */
-        bdrv_try_change_aio_context(child_bs, qemu_get_aio_context(), NULL,
-                                    NULL);
+        bdrv_drain_all_begin();
+        bdrv_try_change_aio_context_locked(child_bs, qemu_get_aio_context(),
+                                           NULL, NULL);
+        bdrv_drain_all_end();
     }
 
     bdrv_schedule_unref(child_bs);
@@ -7719,9 +7726,13 @@ bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
  *
  * If ignore_child is not NULL, that child (and its subgraph) will not
  * be touched.
+ *
+ * Called with the graph lock held.
+ *
+ * Called while all bs are drained.
  */
-int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                BdrvChild *ignore_child, Error **errp)
+int bdrv_try_change_aio_context_locked(BlockDriverState *bs, AioContext *ctx,
+                                       BdrvChild *ignore_child, Error **errp)
 {
     Transaction *tran;
     GHashTable *visited;
@@ -7730,17 +7741,15 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
 
     /*
      * Recursion phase: go through all nodes of the graph.
-     * Take care of checking that all nodes support changing AioContext
-     * and drain them, building a linear list of callbacks to run if everything
-     * is successful (the transaction itself).
+     * Take care of checking that all nodes support changing AioContext,
+     * building a linear list of callbacks to run if everything is successful
+     * (the transaction itself).
      */
     tran = tran_new();
     visited = g_hash_table_new(NULL, NULL);
     if (ignore_child) {
         g_hash_table_add(visited, ignore_child);
     }
-    bdrv_drain_all_begin();
-    bdrv_graph_rdlock_main_loop();
     ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
     g_hash_table_destroy(visited);
 
@@ -7754,15 +7763,34 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
     if (!ret) {
         /* Just run clean() callbacks. No AioContext changed. */
         tran_abort(tran);
-        bdrv_graph_rdunlock_main_loop();
-        bdrv_drain_all_end();
         return -EPERM;
     }
 
     tran_commit(tran);
+    return 0;
+}
+
+/*
+ * Change bs's and recursively all of its parents' and children's AioContext
+ * to the given new context, returning an error if that isn't possible.
+ *
+ * If ignore_child is not NULL, that child (and its subgraph) will not
+ * be touched.
+ */
+int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                                BdrvChild *ignore_child, Error **errp)
+{
+    int ret;
+
+    GLOBAL_STATE_CODE();
+
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
+    ret = bdrv_try_change_aio_context_locked(bs, ctx, ignore_child, errp);
     bdrv_graph_rdunlock_main_loop();
     bdrv_drain_all_end();
-    return 0;
+
+    return ret;
 }
 
 void bdrv_add_aio_context_notifier(BlockDriverState *bs,
diff --git a/blockdev.c b/blockdev.c
index 3982f9776b..750beba41f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3601,12 +3601,13 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
     AioContext *new_context;
     BlockDriverState *bs;
 
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
 
     bs = bdrv_find_node(node_name);
     if (!bs) {
         error_setg(errp, "Failed to find node with node-name='%s'", node_name);
-        return;
+        goto out;
     }
 
     /* Protects against accidents. */
@@ -3614,14 +3615,14 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
         error_setg(errp, "Node %s is associated with a BlockBackend and could "
                          "be in use (use force=true to override this check)",
                          node_name);
-        return;
+        goto out;
     }
 
     if (iothread->type == QTYPE_QSTRING) {
         IOThread *obj = iothread_by_id(iothread->u.s);
         if (!obj) {
             error_setg(errp, "Cannot find iothread %s", iothread->u.s);
-            return;
+            goto out;
         }
 
         new_context = iothread_get_aio_context(obj);
@@ -3629,7 +3630,11 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
         new_context = qemu_get_aio_context();
     }
 
-    bdrv_try_change_aio_context(bs, new_context, NULL, errp);
+    bdrv_try_change_aio_context_locked(bs, new_context, NULL, errp);
+
+out:
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
 }
 
 QemuOptsList qemu_common_drive_opts = {
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 290cd2a70e..3185f3f429 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1396,14 +1396,10 @@ static void test_set_aio_context(void)
     bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
                               &error_abort);
 
-    bdrv_drained_begin(bs);
     bdrv_try_change_aio_context(bs, ctx_a, NULL, &error_abort);
-    bdrv_drained_end(bs);
 
-    bdrv_drained_begin(bs);
     bdrv_try_change_aio_context(bs, ctx_b, NULL, &error_abort);
     bdrv_try_change_aio_context(bs, qemu_get_aio_context(), NULL, &error_abort);
-    bdrv_drained_end(bs);
 
     bdrv_unref(bs);
     iothread_join(a);
-- 
2.49.0




* [PULL 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()
From: Kevin Wolf @ 2025-06-04 17:55 UTC
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is part of resolving the deadlock mentioned in commit "block:
move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The function bdrv_attach_child_common_abort() is used only as the
abort callback in bdrv_attach_child_common_drv transactions, so the
tran_finalize() calls of such transactions need to be in drained
sections too.

All code paths are covered:
The bdrv_attach_child_common_drv transactions are only used in
bdrv_attach_child_common(), so it is enough to check callers of
bdrv_attach_child_common() following the transactions.

bdrv_attach_child_common() is called by:
1. bdrv_attach_child_noperm(), which does not finalize the
   transaction yet.
2. bdrv_root_attach_child(), where a drained section is introduced.

bdrv_attach_child_noperm() is called by:
1. bdrv_attach_child(), where a drained section is introduced.
2. bdrv_set_file_or_backing_noperm(), which does not finalize the
   transaction yet.
3. bdrv_append(), where a drained section is introduced.

bdrv_set_file_or_backing_noperm() is called by:
1. bdrv_set_backing_hd_drained(), where a drained section is
   introduced.
2. bdrv_reopen_parse_file_or_backing(), which does not finalize the
   transaction yet. Draining the old child bs currently happens under
   the graph lock there. This is replaced with an assertion, because
   the drain will be moved further up to the caller.

bdrv_reopen_parse_file_or_backing() is called by:
1. bdrv_reopen_prepare(), which does not finalize the transaction yet.

bdrv_reopen_prepare() is called by:
1. bdrv_reopen_multiple(), which does finalize the transaction. It is
   called after bdrv_reopen_queue(), which starts a drained section.
   The drained section ends when bdrv_reopen_queue_free() is called
   at the end of bdrv_reopen_multiple().

This resolves all code paths.

The functions bdrv_set_backing_hd_drained(), bdrv_attach_child() and
bdrv_root_attach_child() run under the graph lock, so they are not
actually allowed to drain. This will be addressed in the following
commits.
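
Abridged from the bdrv_attach_child() hunk below (error handling
elided), the drained section now spans from before the attach until
after the transaction is finalized:

    bdrv_drain_all_begin();
    child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
                                     child_class, child_role, tran, errp);
    ...
out:
    tran_finalize(tran, ret);   /* abort path may switch AioContexts */
    bdrv_drain_all_end();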

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-11-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index 3aaacabf7f..46eb2fe449 100644
--- a/block.c
+++ b/block.c
@@ -3028,10 +3028,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
     bdrv_replace_child_noperm(s->child, NULL);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
-        bdrv_drain_all_begin();
         bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
                                            &error_abort);
-        bdrv_drain_all_end();
     }
 
     if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
@@ -3043,10 +3041,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
 
         /* No need to visit `child`, because it has been detached already */
         visited = g_hash_table_new(NULL, NULL);
-        bdrv_drain_all_begin();
         ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
                                               visited, tran, &error_abort);
-        bdrv_drain_all_end();
         g_hash_table_destroy(visited);
 
         /* transaction is supposed to always succeed */
@@ -3075,6 +3071,9 @@ static TransactionActionDrv bdrv_attach_child_common_drv = {
  *
  * Both @parent_bs and @child_bs can move to a different AioContext in this
  * function.
+ *
+ * All block nodes must be drained before this function is called until after
+ * the transaction is finalized.
  */
 static BdrvChild * GRAPH_WRLOCK
 bdrv_attach_child_common(BlockDriverState *child_bs,
@@ -3118,10 +3117,8 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
     parent_ctx = bdrv_child_get_parent_aio_context(new_child);
     if (child_ctx != parent_ctx) {
         Error *local_err = NULL;
-        bdrv_drain_all_begin();
         int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
                                                      &local_err);
-        bdrv_drain_all_end();
 
         if (ret < 0 && child_class->change_aio_ctx) {
             Transaction *aio_ctx_tran = tran_new();
@@ -3129,11 +3126,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
             bool ret_child;
 
             g_hash_table_add(visited, new_child);
-            bdrv_drain_all_begin();
             ret_child = child_class->change_aio_ctx(new_child, child_ctx,
                                                     visited, aio_ctx_tran,
                                                     NULL);
-            bdrv_drain_all_end();
             if (ret_child == true) {
                 error_free(local_err);
                 ret = 0;
@@ -3189,6 +3184,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
  *
  * After calling this function, the transaction @tran may only be completed
  * while holding a writer lock for the graph.
+ *
+ * All block nodes must be drained before this function is called until after
+ * the transaction is finalized.
  */
 static BdrvChild * GRAPH_WRLOCK
 bdrv_attach_child_noperm(BlockDriverState *parent_bs,
@@ -3244,6 +3242,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 
     GLOBAL_STATE_CODE();
 
+    bdrv_drain_all_begin();
     child = bdrv_attach_child_common(child_bs, child_name, child_class,
                                    child_role, perm, shared_perm, opaque,
                                    tran, errp);
@@ -3256,6 +3255,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 
 out:
     tran_finalize(tran, ret);
+    bdrv_drain_all_end();
 
     bdrv_schedule_unref(child_bs);
 
@@ -3283,6 +3283,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
     GLOBAL_STATE_CODE();
 
+    bdrv_drain_all_begin();
     child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
                                      child_class, child_role, tran, errp);
     if (!child) {
@@ -3297,6 +3298,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
 out:
     tran_finalize(tran, ret);
+    bdrv_drain_all_end();
 
     bdrv_schedule_unref(child_bs);
 
@@ -3465,6 +3467,9 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
  *
  * After calling this function, the transaction @tran may only be completed
  * while holding a writer lock for the graph.
+ *
+ * All block nodes must be drained before this function is called until after
+ * the transaction is finalized.
  */
 static int GRAPH_WRLOCK
 bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
@@ -3573,6 +3578,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
         assert(bs->backing->bs->quiesce_counter > 0);
     }
 
+    bdrv_drain_all_begin();
     ret = bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
     if (ret < 0) {
         goto out;
@@ -3581,6 +3587,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
     ret = bdrv_refresh_perms(bs, tran, errp);
 out:
     tran_finalize(tran, ret);
+    bdrv_drain_all_end();
     return ret;
 }
 
@@ -4721,6 +4728,9 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
  * Return 0 on success, otherwise return < 0 and set @errp.
  *
  * @reopen_state->bs can move to a different AioContext in this function.
+ *
+ * All block nodes must be drained before this function is called until after
+ * the transaction is finalized.
  */
 static int GRAPH_UNLOCKED
 bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
@@ -4814,7 +4824,7 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
 
     if (old_child_bs) {
         bdrv_ref(old_child_bs);
-        bdrv_drained_begin(old_child_bs);
+        assert(old_child_bs->quiesce_counter > 0);
     }
 
     bdrv_graph_rdunlock_main_loop();
@@ -4826,7 +4836,6 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
     bdrv_graph_wrunlock();
 
     if (old_child_bs) {
-        bdrv_drained_end(old_child_bs);
         bdrv_unref(old_child_bs);
     }
 
@@ -4855,6 +4864,9 @@ out_rdlock:
  *
  * After calling this function, the transaction @change_child_tran may only be
  * completed while holding a writer lock for the graph.
+ *
+ * All block nodes must be drained before this function is called until after
+ * the transaction is finalized.
  */
 static int GRAPH_UNLOCKED
 bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
@@ -5501,9 +5513,7 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
     assert(!bs_new->backing);
     bdrv_graph_rdunlock_main_loop();
 
-    bdrv_drained_begin(bs_top);
-    bdrv_drained_begin(bs_new);
-
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
 
     child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
@@ -5525,9 +5535,7 @@ out:
 
     bdrv_refresh_limits(bs_top, NULL, NULL);
     bdrv_graph_wrunlock();
-
-    bdrv_drained_end(bs_top);
-    bdrv_drained_end(bs_new);
+    bdrv_drain_all_end();
 
     return ret;
 }
-- 
2.49.0




* [PULL 11/24] block: move drain outside of bdrv_set_backing_hd_drained()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (9 preceding siblings ...)
  2025-06-04 17:55 ` [PULL 10/24] block: move drain outside of bdrv_attach_child_common(_abort)() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 12/24] block: move drain outside of bdrv_root_attach_child() Kevin Wolf
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The function bdrv_set_backing_hd_drained() holds the graph lock, so it
is not allowed to drain. It is called by:
1. bdrv_set_backing_hd(), where a drained section is introduced,
   replacing the previously present bs-specific drains.
2. stream_prepare(), where a drained section is introduced replacing
   the previously present bs-specific drains.

The drain_bs variable in bdrv_set_backing_hd_drained() is now
superfluous and thus dropped.
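
Abridged from the diff below, bdrv_set_backing_hd() now reads:

    bdrv_drain_all_begin();
    bdrv_graph_wrlock();
    ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
    bdrv_graph_wrunlock();
    bdrv_drain_all_end();

This replaces the narrower bdrv_drained_begin(drain_bs), which first
had to pick the right node to drain.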

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-12-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c        | 16 +++-------------
 block/stream.c |  6 ++----
 2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index 46eb2fe449..e53a88e1b6 100644
--- a/block.c
+++ b/block.c
@@ -3562,8 +3562,7 @@ out:
  * Both @bs and @backing_hd can move to a different AioContext in this
  * function.
  *
- * If a backing child is already present (i.e. we're detaching a node), that
- * child node must be drained.
+ * All block nodes must be drained.
  */
 int bdrv_set_backing_hd_drained(BlockDriverState *bs,
                                 BlockDriverState *backing_hd,
@@ -3578,7 +3577,6 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
         assert(bs->backing->bs->quiesce_counter > 0);
     }
 
-    bdrv_drain_all_begin();
     ret = bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
     if (ret < 0) {
         goto out;
@@ -3587,28 +3585,20 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
     ret = bdrv_refresh_perms(bs, tran, errp);
 out:
     tran_finalize(tran, ret);
-    bdrv_drain_all_end();
     return ret;
 }
 
 int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
                         Error **errp)
 {
-    BlockDriverState *drain_bs;
     int ret;
     GLOBAL_STATE_CODE();
 
-    bdrv_graph_rdlock_main_loop();
-    drain_bs = bs->backing ? bs->backing->bs : bs;
-    bdrv_graph_rdunlock_main_loop();
-
-    bdrv_ref(drain_bs);
-    bdrv_drained_begin(drain_bs);
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
     bdrv_graph_wrunlock();
-    bdrv_drained_end(drain_bs);
-    bdrv_unref(drain_bs);
+    bdrv_drain_all_end();
 
     return ret;
 }
diff --git a/block/stream.c b/block/stream.c
index 999d9e56d4..6ba49cffd3 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -80,11 +80,10 @@ static int stream_prepare(Job *job)
      * may end up working with the wrong base node (or it might even have gone
      * away by the time we want to use it).
      */
-    bdrv_drained_begin(unfiltered_bs);
     if (unfiltered_bs_cow) {
         bdrv_ref(unfiltered_bs_cow);
-        bdrv_drained_begin(unfiltered_bs_cow);
     }
+    bdrv_drain_all_begin();
 
     bdrv_graph_rdlock_main_loop();
     base = bdrv_filter_or_cow_bs(s->above_base);
@@ -123,11 +122,10 @@ static int stream_prepare(Job *job)
     }
 
 out:
+    bdrv_drain_all_end();
     if (unfiltered_bs_cow) {
-        bdrv_drained_end(unfiltered_bs_cow);
         bdrv_unref(unfiltered_bs_cow);
     }
-    bdrv_drained_end(unfiltered_bs);
     return ret;
 }
 
-- 
2.49.0




* [PULL 12/24] block: move drain outside of bdrv_root_attach_child()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (10 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 11/24] block: move drain outside of bdrv_set_backing_hd_drained() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 13/24] block: move drain outside of bdrv_attach_child() Kevin Wolf via
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The function bdrv_root_attach_child() runs under the graph lock, so it
is not allowed to drain. It is called by:
1. blk_insert_bs(), where a drained section is introduced.
2. block_job_add_bdrv(), which holds the graph lock itself.

block_job_add_bdrv() is called by:
1. mirror_start_job()
2. stream_start()
3. commit_start()
4. backup_job_create()
5. block_job_create()
6. In the test_blockjob_common_drain_node() unit test

In all callers, a drained section is introduced.
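
Note that every early-exit path between begin and end has to unwind
both the graph lock and the drain; abridged from the mirror_start_job()
hunk below (permission flags shortened to "perms" for readability):

    bdrv_drain_all_begin();
    bdrv_graph_wrlock();
    ret = block_job_add_bdrv(&s->common, "source", bs, 0, perms, errp);
    if (ret < 0) {
        bdrv_graph_wrunlock();
        bdrv_drain_all_end();
        goto fail;
    }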

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-13-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/blockjob.h     | 2 ++
 block.c                      | 4 ++--
 block/backup.c               | 2 ++
 block/block-backend.c        | 2 ++
 block/commit.c               | 4 ++++
 block/mirror.c               | 5 +++++
 block/stream.c               | 4 ++++
 blockjob.c                   | 4 ++++
 tests/unit/test-bdrv-drain.c | 2 ++
 9 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 7061ab7201..990f3e179a 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -137,6 +137,8 @@ BlockJob *block_job_get_locked(const char *id);
  * Add @bs to the list of BlockDriverState that are involved in
  * @job. This means that all operations will be blocked on @bs while
  * @job exists.
+ *
+ * All block nodes must be drained.
  */
 int GRAPH_WRLOCK
 block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
diff --git a/block.c b/block.c
index e53a88e1b6..17c34dc240 100644
--- a/block.c
+++ b/block.c
@@ -3228,6 +3228,8 @@ bdrv_attach_child_noperm(BlockDriverState *parent_bs,
  *
  * On failure NULL is returned, errp is set and the reference to
  * child_bs is also dropped.
+ *
+ * All block nodes must be drained.
  */
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
                                   const char *child_name,
@@ -3242,7 +3244,6 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 
     GLOBAL_STATE_CODE();
 
-    bdrv_drain_all_begin();
     child = bdrv_attach_child_common(child_bs, child_name, child_class,
                                    child_role, perm, shared_perm, opaque,
                                    tran, errp);
@@ -3255,7 +3256,6 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 
 out:
     tran_finalize(tran, ret);
-    bdrv_drain_all_end();
 
     bdrv_schedule_unref(child_bs);
 
diff --git a/block/backup.c b/block/backup.c
index 0151e84395..909027c17a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -498,10 +498,12 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     block_copy_set_speed(bcs, speed);
 
     /* Required permissions are taken by copy-before-write filter target */
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
                        &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     return &job->common;
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 6a6949edeb..24cae3cb55 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -904,6 +904,7 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
 
     GLOBAL_STATE_CODE();
     bdrv_ref(bs);
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
 
     if ((bs->open_flags & BDRV_O_INACTIVE) && blk_can_inactivate(blk)) {
@@ -919,6 +920,7 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
                                        BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                                        perm, shared_perm, blk, errp);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
     if (blk->root == NULL) {
         return -EPERM;
     }
diff --git a/block/commit.c b/block/commit.c
index 7cc8c0f0df..6c4b736ff8 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -392,6 +392,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
      * this is the responsibility of the interface (i.e. whoever calls
      * commit_start()).
      */
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     s->base_overlay = bdrv_find_overlay(top, base);
     assert(s->base_overlay);
@@ -424,18 +425,21 @@ void commit_start(const char *job_id, BlockDriverState *bs,
                                  iter_shared_perms, errp);
         if (ret < 0) {
             bdrv_graph_wrunlock();
+            bdrv_drain_all_end();
             goto fail;
         }
     }
 
     if (bdrv_freeze_backing_chain(commit_top_bs, base, errp) < 0) {
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
         goto fail;
     }
     s->chain_frozen = true;
 
     ret = block_job_add_bdrv(&s->common, "base", base, 0, BLK_PERM_ALL, errp);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     if (ret < 0) {
         goto fail;
diff --git a/block/mirror.c b/block/mirror.c
index c2c5099c95..6e8caf4b49 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -2014,6 +2014,7 @@ static BlockJob *mirror_start_job(
      */
     bdrv_disable_dirty_bitmap(s->dirty_bitmap);
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     ret = block_job_add_bdrv(&s->common, "source", bs, 0,
                              BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
@@ -2021,6 +2022,7 @@ static BlockJob *mirror_start_job(
                              errp);
     if (ret < 0) {
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
         goto fail;
     }
 
@@ -2066,16 +2068,19 @@ static BlockJob *mirror_start_job(
                                      iter_shared_perms, errp);
             if (ret < 0) {
                 bdrv_graph_wrunlock();
+                bdrv_drain_all_end();
                 goto fail;
             }
         }
 
         if (bdrv_freeze_backing_chain(mirror_top_bs, target, errp) < 0) {
             bdrv_graph_wrunlock();
+            bdrv_drain_all_end();
             goto fail;
         }
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     QTAILQ_INIT(&s->ops_in_flight);
 
diff --git a/block/stream.c b/block/stream.c
index 6ba49cffd3..f5441f27f4 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -371,10 +371,12 @@ void stream_start(const char *job_id, BlockDriverState *bs,
      * already have our own plans. Also don't allow resize as the image size is
      * queried only at the job start and then cached.
      */
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     if (block_job_add_bdrv(&s->common, "active node", bs, 0,
                            basic_flags | BLK_PERM_WRITE, errp)) {
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
         goto fail;
     }
 
@@ -395,10 +397,12 @@ void stream_start(const char *job_id, BlockDriverState *bs,
                                  basic_flags, errp);
         if (ret < 0) {
             bdrv_graph_wrunlock();
+            bdrv_drain_all_end();
             goto fail;
         }
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     s->base_overlay = base_overlay;
     s->above_base = above_base;
diff --git a/blockjob.c b/blockjob.c
index 34185d7715..44991e3ff7 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -496,6 +496,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     int ret;
     GLOBAL_STATE_CODE();
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
 
     if (job_id == NULL && !(flags & JOB_INTERNAL)) {
@@ -506,6 +507,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
                      flags, cb, opaque, errp);
     if (job == NULL) {
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
         return NULL;
     }
 
@@ -544,10 +546,12 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     }
 
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
     return job;
 
 fail:
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
     job_early_fail(&job->job);
     return NULL;
 }
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 3185f3f429..4f3057844b 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -772,9 +772,11 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
     tjob->bs = src;
     job = &tjob->common;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     switch (result) {
     case TEST_JOB_SUCCESS:
-- 
2.49.0




* [PULL 13/24] block: move drain outside of bdrv_attach_child()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (11 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 12/24] block: move drain outside of bdrv_root_attach_child() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf via
  2025-06-04 17:56 ` [PULL 14/24] block: move drain outside of quorum_add_child() Kevin Wolf
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf via @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The function bdrv_attach_child() runs under the graph lock, so it is
not allowed to drain. It is called by:
1. replication_start()
2. quorum_add_child()
3. bdrv_open_child_common()
4. Throughout test-bdrv-graph-mod.c and test-bdrv-drain.c unit tests.

In all callers, a drained section is introduced.

The function quorum_add_child() runs under the graph lock, so it is
not actually allowed to drain. This will be addressed by the following
commit.
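
As a stopgap, quorum_add_child() therefore brackets the call itself for
now (abridged from the hunk below):

    bdrv_drain_all_begin();
    child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
                              BDRV_CHILD_DATA, errp);
    bdrv_drain_all_end();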

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-14-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                          |  6 ++++--
 block/quorum.c                   |  2 ++
 block/replication.c              |  5 +++++
 tests/unit/test-bdrv-drain.c     | 14 ++++++++++++++
 tests/unit/test-bdrv-graph-mod.c | 10 ++++++++++
 5 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 17c34dc240..6fc87aa318 100644
--- a/block.c
+++ b/block.c
@@ -3269,6 +3269,8 @@ out:
  *
  * On failure NULL is returned, errp is set and the reference to
  * child_bs is also dropped.
+ *
+ * All block nodes must be drained.
  */
 BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
                              BlockDriverState *child_bs,
@@ -3283,7 +3285,6 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
     GLOBAL_STATE_CODE();
 
-    bdrv_drain_all_begin();
     child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
                                      child_class, child_role, tran, errp);
     if (!child) {
@@ -3298,7 +3299,6 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
 out:
     tran_finalize(tran, ret);
-    bdrv_drain_all_end();
 
     bdrv_schedule_unref(child_bs);
 
@@ -3789,10 +3789,12 @@ static BdrvChild *bdrv_open_child_common(const char *filename,
         return NULL;
     }
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
                               errp);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     return child;
 }
diff --git a/block/quorum.c b/block/quorum.c
index ed8ce801ee..ea17b0ec13 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1096,8 +1096,10 @@ quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs, Error **errp)
     /* We can safely add the child now */
     bdrv_ref(child_bs);
 
+    bdrv_drain_all_begin();
     child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
                               BDRV_CHILD_DATA, errp);
+    bdrv_drain_all_end();
     if (child == NULL) {
         s->next_child_index--;
         return;
diff --git a/block/replication.c b/block/replication.c
index 07f274de9e..54cbd03e00 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -540,6 +540,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }
 
+        bdrv_drain_all_begin();
         bdrv_graph_wrlock();
 
         bdrv_ref(hidden_disk->bs);
@@ -549,6 +550,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         if (local_err) {
             error_propagate(errp, local_err);
             bdrv_graph_wrunlock();
+            bdrv_drain_all_end();
             return;
         }
 
@@ -559,6 +561,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         if (local_err) {
             error_propagate(errp, local_err);
             bdrv_graph_wrunlock();
+            bdrv_drain_all_end();
             return;
         }
 
@@ -571,12 +574,14 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             !check_top_bs(top_bs, bs)) {
             error_setg(errp, "No top_bs or it is invalid");
             bdrv_graph_wrunlock();
+            bdrv_drain_all_end();
             reopen_backing_file(bs, false, NULL);
             return;
         }
         bdrv_op_block_all(top_bs, s->blocker);
 
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
 
         s->backup_job = backup_job_create(
                                 NULL, s->secondary_disk->bs, s->hidden_disk->bs,
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 4f3057844b..ac76525e5a 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1049,10 +1049,12 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
 
     null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
                         &error_abort);
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     /* This child will be the one to pass to requests through to, and
      * it will stall until a drain occurs */
@@ -1060,21 +1062,25 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
                                     &error_abort);
     child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
     /* Takes our reference to child_bs */
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child",
                                         &child_of_bds,
                                         BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
                                         &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     /* This child is just there to be deleted
      * (for detach_instead_of_delete == true) */
     null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
                         &error_abort);
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds, BDRV_CHILD_DATA,
                       &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
     blk_insert_bs(blk, bs, &error_abort);
@@ -1157,6 +1163,7 @@ static void no_coroutine_fn detach_indirect_bh(void *opaque)
 
     bdrv_dec_in_flight(data->child_b->bs);
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_unref_child(data->parent_b, data->child_b);
 
@@ -1165,6 +1172,7 @@ static void no_coroutine_fn detach_indirect_bh(void *opaque)
                                       &child_of_bds, BDRV_CHILD_DATA,
                                       &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 }
 
 static void coroutine_mixed_fn detach_by_parent_aio_cb(void *opaque, int ret)
@@ -1262,6 +1270,7 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
     /* Set child relationships */
     bdrv_ref(b);
     bdrv_ref(a);
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_of_bds,
                                 BDRV_CHILD_DATA, &error_abort);
@@ -1273,6 +1282,7 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
                       by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
                       BDRV_CHILD_DATA, &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
     g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1685,6 +1695,7 @@ static void test_drop_intermediate_poll(void)
      * Establish the chain last, so the chain links are the first
      * elements in the BDS.parents lists
      */
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     for (i = 0; i < 3; i++) {
         if (i) {
@@ -1694,6 +1705,7 @@ static void test_drop_intermediate_poll(void)
         }
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     job = block_job_create("job", &test_simple_job_driver, NULL, job_node,
                            0, BLK_PERM_ALL, 0, 0, NULL, NULL, &error_abort);
@@ -1940,10 +1952,12 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
     new_child_bs->total_sectors = 1;
 
     bdrv_ref(old_child_bs);
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
                       BDRV_CHILD_COW, &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
     parent_s->setup_completed = true;
 
     for (i = 0; i < old_drain_count; i++) {
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index d743abb4bb..7b03ebe4b0 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -137,10 +137,12 @@ static void test_update_perm_tree(void)
 
     blk_insert_bs(root, bs, &error_abort);
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_attach_child(filter, bs, "child", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     ret = bdrv_append(filter, bs, NULL);
     g_assert_cmpint(ret, <, 0);
@@ -204,11 +206,13 @@ static void test_should_update_child(void)
 
     bdrv_set_backing_hd(target, bs, &error_abort);
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     g_assert(target->backing->bs == bs);
     bdrv_attach_child(filter, target, "target", &child_of_bds,
                       BDRV_CHILD_DATA, &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
     bdrv_append(filter, bs, &error_abort);
 
     bdrv_graph_rdlock_main_loop();
@@ -244,6 +248,7 @@ static void test_parallel_exclusive_write(void)
     bdrv_ref(base);
     bdrv_ref(fl1);
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_attach_child(top, fl1, "backing", &child_of_bds,
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -257,6 +262,7 @@ static void test_parallel_exclusive_write(void)
 
     bdrv_replace_node(fl1, fl2, &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     bdrv_drained_end(fl2);
     bdrv_drained_end(fl1);
@@ -363,6 +369,7 @@ static void test_parallel_perm_update(void)
      */
     bdrv_ref(base);
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_attach_child(top, ws, "file", &child_of_bds, BDRV_CHILD_DATA,
                       &error_abort);
@@ -377,6 +384,7 @@ static void test_parallel_perm_update(void)
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                       &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     /* Select fl1 as first child to be active */
     s->selected = c_fl1;
@@ -430,11 +438,13 @@ static void test_append_greedy_filter(void)
     BlockDriverState *base = no_perm_node("base");
     BlockDriverState *fl = exclusive_writer_node("fl1");
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_attach_child(top, base, "backing", &child_of_bds,
                       BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                       &error_abort);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     bdrv_append(fl, base, &error_abort);
     bdrv_unref(fl);
-- 
2.49.0




* [PULL 14/24] block: move drain outside of quorum_add_child()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (12 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 13/24] block: move drain outside of bdrv_attach_child() Kevin Wolf via
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 15/24] block: move drain outside of bdrv_root_unref_child() Kevin Wolf
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The quorum_add_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_add_child()
callback, which is only called in the bdrv_add_child() function, which
also runs under the graph lock.

The bdrv_add_child() function is called by qmp_x_blockdev_change(),
where a drained section is introduced.
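
The QMP handler now owns the drained section for the whole graph
update (abridged from the blockdev.c hunk below):

    bdrv_drain_all_begin();
    bdrv_graph_wrlock();
    /* look up parent_bs/new_bs, call bdrv_add_child()/bdrv_del_child() */
out:
    bdrv_graph_wrunlock();
    bdrv_drain_all_end();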

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-15-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-common.h |  7 +++++++
 block.c                          | 10 ++++++++--
 block/quorum.c                   |  2 --
 blockdev.c                       |  2 ++
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 168f703fa1..f9e742f812 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -396,6 +396,13 @@ struct BlockDriver {
     int GRAPH_RDLOCK_PTR (*bdrv_probe_geometry)(
         BlockDriverState *bs, HDGeometry *geo);
 
+    /**
+     * Hot add a BDS's child. Used in combination with bdrv_del_child, so the
+     * user can take a child offline when it is broken and take a new child
+     * online.
+     *
+     * All block nodes must be drained.
+     */
     void GRAPH_WRLOCK_PTR (*bdrv_add_child)(
         BlockDriverState *parent, BlockDriverState *child, Error **errp);
 
diff --git a/block.c b/block.c
index 6fc87aa318..f6c2f7e208 100644
--- a/block.c
+++ b/block.c
@@ -8220,8 +8220,10 @@ char *bdrv_dirname(BlockDriverState *bs, Error **errp)
 }
 
 /*
- * Hot add/remove a BDS's child. So the user can take a child offline when
- * it is broken and take a new child online
+ * Hot add a BDS's child. Used in combination with bdrv_del_child, so the user
+ * can take a child offline when it is broken and take a new child online.
+ *
+ * All block nodes must be drained.
  */
 void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
                     Error **errp)
@@ -8261,6 +8263,10 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
     parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
 }
 
+/*
+ * Hot remove a BDS's child. Used in combination with bdrv_add_child, so the
+ * user can take a child offline when it is broken and take a new child online.
+ */
 void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
 {
     BdrvChild *tmp;
diff --git a/block/quorum.c b/block/quorum.c
index ea17b0ec13..ed8ce801ee 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1096,10 +1096,8 @@ quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs, Error **errp)
     /* We can safely add the child now */
     bdrv_ref(child_bs);
 
-    bdrv_drain_all_begin();
     child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
                               BDRV_CHILD_DATA, errp);
-    bdrv_drain_all_end();
     if (child == NULL) {
         s->next_child_index--;
         return;
diff --git a/blockdev.c b/blockdev.c
index 750beba41f..bd5ca77619 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3531,6 +3531,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
     BlockDriverState *parent_bs, *new_bs = NULL;
     BdrvChild *p_child;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
 
     parent_bs = bdrv_lookup_bs(parent, parent, errp);
@@ -3568,6 +3569,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
 
 out:
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 }
 
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
-- 
2.49.0




* [PULL 15/24] block: move drain outside of bdrv_root_unref_child()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (13 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 14/24] block: move drain outside of quorum_add_child() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 16/24] block: move drain outside of quorum_del_child() Kevin Wolf
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

bdrv_root_unref_child() is called by:
1. blk_remove_bs(), where a drained section is introduced.
2. bdrv_unref_child(), which runs under the graph lock, so the drain
   will be moved further up to its callers.
3. block_job_remove_all_bdrv(), where a drained section is introduced.

For all callers of bdrv_unref_child() and its generated
bdrv_co_unref_child() coroutine variant, a drained section is
introduced; they are not explicitly listed here. The caller
quorum_del_child() holds the graph lock, so it is not actually allowed
to drain. This will be addressed in the next commit.
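
The callers use the same bracketing as on the attach side, e.g.
blk_remove_bs() (abridged from the diff below):

    bdrv_drain_all_begin();
    bdrv_graph_wrlock();
    bdrv_root_unref_child(root);
    bdrv_graph_wrunlock();
    bdrv_drain_all_end();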

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-16-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                      | 18 ++++++++++++++----
 block/blklogwrites.c         |  4 ++++
 block/blkverify.c            |  2 ++
 block/block-backend.c        |  2 ++
 block/qcow2.c                |  4 ++++
 block/quorum.c               |  6 ++++++
 block/replication.c          |  2 ++
 block/snapshot.c             |  2 ++
 block/vmdk.c                 | 10 ++++++++++
 blockjob.c                   |  2 ++
 tests/unit/test-bdrv-drain.c |  4 ++++
 11 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index f6c2f7e208..15a8ccb822 100644
--- a/block.c
+++ b/block.c
@@ -1721,12 +1721,14 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
 open_failed:
     bs->drv = NULL;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     if (bs->file != NULL) {
         bdrv_unref_child(bs, bs->file);
         assert(!bs->file);
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     g_free(bs->opaque);
     bs->opaque = NULL;
@@ -3305,7 +3307,11 @@ out:
     return ret < 0 ? NULL : child;
 }
 
-/* Callers must ensure that child->frozen is false. */
+/*
+ * Callers must ensure that child->frozen is false.
+ *
+ * All block nodes must be drained.
+ */
 void bdrv_root_unref_child(BdrvChild *child)
 {
     BlockDriverState *child_bs = child->bs;
@@ -3326,10 +3332,8 @@ void bdrv_root_unref_child(BdrvChild *child)
          * When the parent requiring a non-default AioContext is removed, the
          * node moves back to the main AioContext
          */
-        bdrv_drain_all_begin();
         bdrv_try_change_aio_context_locked(child_bs, qemu_get_aio_context(),
                                            NULL, NULL);
-        bdrv_drain_all_end();
     }
 
     bdrv_schedule_unref(child_bs);
@@ -3402,7 +3406,11 @@ bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
     }
 }
 
-/* Callers must ensure that child->frozen is false. */
+/*
+ * Callers must ensure that child->frozen is false.
+ *
+ * All block nodes must be drained.
+ */
 void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
 {
     GLOBAL_STATE_CODE();
@@ -5172,6 +5180,7 @@ static void bdrv_close(BlockDriverState *bs)
         bs->drv = NULL;
     }
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
         bdrv_unref_child(bs, child);
@@ -5180,6 +5189,7 @@ static void bdrv_close(BlockDriverState *bs)
     assert(!bs->backing);
     assert(!bs->file);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     g_free(bs->opaque);
     bs->opaque = NULL;
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index b0f78c4bc7..70ac76f401 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -281,9 +281,11 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
     ret = 0;
 fail_log:
     if (ret < 0) {
+        bdrv_drain_all_begin();
         bdrv_graph_wrlock();
         bdrv_unref_child(bs, s->log_file);
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
         s->log_file = NULL;
         qemu_mutex_destroy(&s->mutex);
     }
@@ -296,10 +298,12 @@ static void blk_log_writes_close(BlockDriverState *bs)
 {
     BDRVBlkLogWritesState *s = bs->opaque;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_unref_child(bs, s->log_file);
     s->log_file = NULL;
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
     qemu_mutex_destroy(&s->mutex);
 }
 
diff --git a/block/blkverify.c b/block/blkverify.c
index db79a36681..3a71f7498c 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -151,10 +151,12 @@ static void blkverify_close(BlockDriverState *bs)
 {
     BDRVBlkverifyState *s = bs->opaque;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_unref_child(bs, s->test_file);
     s->test_file = NULL;
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 }
 
 static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/block-backend.c b/block/block-backend.c
index 24cae3cb55..68209bb2f7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -889,9 +889,11 @@ void blk_remove_bs(BlockBackend *blk)
     root = blk->root;
     blk->root = NULL;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     bdrv_root_unref_child(root);
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 }
 
 /*
diff --git a/block/qcow2.c b/block/qcow2.c
index 66fba89b41..45451a7ee8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1895,7 +1895,9 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
     g_free(s->image_data_file);
     if (open_data_file && has_data_file(bs)) {
         bdrv_graph_co_rdunlock();
+        bdrv_drain_all_begin();
         bdrv_co_unref_child(bs, s->data_file);
+        bdrv_drain_all_end();
         bdrv_graph_co_rdlock();
         s->data_file = NULL;
     }
@@ -2821,9 +2823,11 @@ qcow2_do_close(BlockDriverState *bs, bool close_data_file)
     if (close_data_file && has_data_file(bs)) {
         GLOBAL_STATE_CODE();
         bdrv_graph_rdunlock_main_loop();
+        bdrv_drain_all_begin();
         bdrv_graph_wrlock();
         bdrv_unref_child(bs, s->data_file);
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
         s->data_file = NULL;
         bdrv_graph_rdlock_main_loop();
     }
diff --git a/block/quorum.c b/block/quorum.c
index ed8ce801ee..81407a38ee 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1037,6 +1037,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 
 close_exit:
     /* cleanup on error */
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     for (i = 0; i < s->num_children; i++) {
         if (!opened[i]) {
@@ -1045,6 +1046,7 @@ close_exit:
         bdrv_unref_child(bs, s->children[i]);
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
     g_free(s->children);
     g_free(opened);
 exit:
@@ -1057,11 +1059,13 @@ static void quorum_close(BlockDriverState *bs)
     BDRVQuorumState *s = bs->opaque;
     int i;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     for (i = 0; i < s->num_children; i++) {
         bdrv_unref_child(bs, s->children[i]);
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     g_free(s->children);
 }
@@ -1143,7 +1147,9 @@ quorum_del_child(BlockDriverState *bs, BdrvChild *child, Error **errp)
             (s->num_children - i - 1) * sizeof(BdrvChild *));
     s->children = g_renew(BdrvChild *, s->children, --s->num_children);
 
+    bdrv_drain_all_begin();
     bdrv_unref_child(bs, child);
+    bdrv_drain_all_end();
 
     quorum_refresh_flags(bs);
 }
diff --git a/block/replication.c b/block/replication.c
index 54cbd03e00..0879718854 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -656,12 +656,14 @@ static void replication_done(void *opaque, int ret)
     if (ret == 0) {
         s->stage = BLOCK_REPLICATION_DONE;
 
+        bdrv_drain_all_begin();
         bdrv_graph_wrlock();
         bdrv_unref_child(bs, s->secondary_disk);
         s->secondary_disk = NULL;
         bdrv_unref_child(bs, s->hidden_disk);
         s->hidden_disk = NULL;
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
 
         s->error = 0;
     } else {
diff --git a/block/snapshot.c b/block/snapshot.c
index 9f300a78bd..28c9c43621 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -291,9 +291,11 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
         }
 
         /* .bdrv_open() will re-attach it */
+        bdrv_drain_all_begin();
         bdrv_graph_wrlock();
         bdrv_unref_child(bs, fallback);
         bdrv_graph_wrunlock();
+        bdrv_drain_all_end();
 
         ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
         memset(bs->opaque, 0, drv->instance_size);
diff --git a/block/vmdk.c b/block/vmdk.c
index 9c7ab037e1..89a7250120 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -271,6 +271,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
     BDRVVmdkState *s = bs->opaque;
     VmdkExtent *e;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     for (i = 0; i < s->num_extents; i++) {
         e = &s->extents[i];
@@ -283,6 +284,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
         }
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 
     g_free(s->extents);
 }
@@ -1247,9 +1249,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
                             0, 0, 0, 0, 0, &extent, errp);
             if (ret < 0) {
                 bdrv_graph_rdunlock_main_loop();
+                bdrv_drain_all_begin();
                 bdrv_graph_wrlock();
                 bdrv_unref_child(bs, extent_file);
                 bdrv_graph_wrunlock();
+                bdrv_drain_all_end();
                 bdrv_graph_rdlock_main_loop();
                 goto out;
             }
@@ -1266,9 +1270,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
             g_free(buf);
             if (ret) {
                 bdrv_graph_rdunlock_main_loop();
+                bdrv_drain_all_begin();
                 bdrv_graph_wrlock();
                 bdrv_unref_child(bs, extent_file);
                 bdrv_graph_wrunlock();
+                bdrv_drain_all_end();
                 bdrv_graph_rdlock_main_loop();
                 goto out;
             }
@@ -1277,9 +1283,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
             ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
             if (ret) {
                 bdrv_graph_rdunlock_main_loop();
+                bdrv_drain_all_begin();
                 bdrv_graph_wrlock();
                 bdrv_unref_child(bs, extent_file);
                 bdrv_graph_wrunlock();
+                bdrv_drain_all_end();
                 bdrv_graph_rdlock_main_loop();
                 goto out;
             }
@@ -1287,9 +1295,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
         } else {
             error_setg(errp, "Unsupported extent type '%s'", type);
             bdrv_graph_rdunlock_main_loop();
+            bdrv_drain_all_begin();
             bdrv_graph_wrlock();
             bdrv_unref_child(bs, extent_file);
             bdrv_graph_wrunlock();
+            bdrv_drain_all_end();
             bdrv_graph_rdlock_main_loop();
             ret = -ENOTSUP;
             goto out;
diff --git a/blockjob.c b/blockjob.c
index 44991e3ff7..e68181a35b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
      * one to make sure that such a concurrent access does not attempt
      * to process an already freed BdrvChild.
      */
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     while (job->nodes) {
         GSList *l = job->nodes;
@@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
         g_slist_free_1(l);
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 }
 
 bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index ac76525e5a..59c2793725 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -955,11 +955,13 @@ static void bdrv_test_top_close(BlockDriverState *bs)
 {
     BdrvChild *c, *next_c;
 
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
     QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
         bdrv_unref_child(bs, c);
     }
     bdrv_graph_wrunlock();
+    bdrv_drain_all_end();
 }
 
 static int coroutine_fn GRAPH_RDLOCK
@@ -1016,7 +1018,9 @@ static void coroutine_fn test_co_delete_by_drain(void *opaque)
         bdrv_graph_co_rdlock();
         QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
             bdrv_graph_co_rdunlock();
+            bdrv_drain_all_begin();
             bdrv_co_unref_child(bs, c);
+            bdrv_drain_all_end();
             bdrv_graph_co_rdlock();
         }
         bdrv_graph_co_rdunlock();
-- 
2.49.0




* [PULL 16/24] block: move drain outside of quorum_del_child()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (14 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 15/24] block: move drain outside of bdrv_root_unref_child() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 17/24] blockdev: drain while unlocked in internal_snapshot_action() Kevin Wolf
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

The quorum_del_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_del_child()
callback, which is only called in the bdrv_del_child() function, which
also runs under the graph lock.

The bdrv_del_child() function is called by qmp_x_blockdev_change().
A drained section was already introduced there by commit "block: move
drain outside of quorum_add_child()".

This finally finishes the work, started in commit "block: move
draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK", of
moving drains to places that are not under the graph lock.
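
With the drain owned by the QMP command, a .bdrv_del_child()
implementation can simply rely on the documented contract. A minimal
hypothetical driver callback (sketch only, mirroring what
quorum_del_child() does after this patch) would look like:

    static void GRAPH_WRLOCK
    my_del_child(BlockDriverState *bs, BdrvChild *child, Error **errp)
    {
        /* The caller guarantees that all block nodes are drained. */
        bdrv_unref_child(bs, child);
    }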

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-17-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block_int-common.h | 7 +++++++
 block.c                          | 2 ++
 block/quorum.c                   | 2 --
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index f9e742f812..925a3e7353 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -406,6 +406,13 @@ struct BlockDriver {
     void GRAPH_WRLOCK_PTR (*bdrv_add_child)(
         BlockDriverState *parent, BlockDriverState *child, Error **errp);
 
+    /**
+     * Hot remove a BDS's child. Used in combination with bdrv_add_child, so the
+     * user can take a child offline when it is broken and take a new child
+     * online.
+     *
+     * All block nodes must be drained.
+     */
     void GRAPH_WRLOCK_PTR (*bdrv_del_child)(
         BlockDriverState *parent, BdrvChild *child, Error **errp);
 
diff --git a/block.c b/block.c
index 15a8ccb822..bfd4340b24 100644
--- a/block.c
+++ b/block.c
@@ -8276,6 +8276,8 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 /*
  * Hot remove a BDS's child. Used in combination with bdrv_add_child, so the
  * user can take a child offline when it is broken and take a new child online.
+ *
+ * All block nodes must be drained.
  */
 void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
 {
diff --git a/block/quorum.c b/block/quorum.c
index 81407a38ee..cc3bc5f4e7 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1147,9 +1147,7 @@ quorum_del_child(BlockDriverState *bs, BdrvChild *child, Error **errp)
             (s->num_children - i - 1) * sizeof(BdrvChild *));
     s->children = g_renew(BdrvChild *, s->children, --s->num_children);
 
-    bdrv_drain_all_begin();
     bdrv_unref_child(bs, child);
-    bdrv_drain_all_end();
 
     quorum_refresh_flags(bs);
 }
-- 
2.49.0




* [PULL 17/24] blockdev: drain while unlocked in internal_snapshot_action()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (15 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 16/24] block: move drain outside of quorum_del_child() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 18/24] blockdev: drain while unlocked in external_snapshot_action() Kevin Wolf
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
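
The pattern used here is: drop the graph lock, drain, re-acquire the
lock, and then re-validate that the drain did not change the graph
underneath us (abridged from the diff below):

    bdrv_graph_rdunlock_main_loop();    /* need to drain while unlocked */
    bdrv_drained_begin(bs);

    GRAPH_RDLOCK_GUARD_MAINLOOP();

    /* Make sure the root bs did not change with the drain. */
    check_bs = qmp_get_root_bs(device, errp);
    if (bs != check_bs) {
        /* errp is set here; see the diff for details */
        return;
    }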

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-18-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 blockdev.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index bd5ca77619..506755bef1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1208,7 +1208,7 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
     Error *local_err = NULL;
     const char *device;
     const char *name;
-    BlockDriverState *bs;
+    BlockDriverState *bs, *check_bs;
     QEMUSnapshotInfo old_sn, *sn;
     bool ret;
     int64_t rt;
@@ -1216,7 +1216,7 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
     int ret1;
 
     GLOBAL_STATE_CODE();
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
+    bdrv_graph_rdlock_main_loop();
 
     tran_add(tran, &internal_snapshot_drv, state);
 
@@ -1225,14 +1225,29 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
 
     bs = qmp_get_root_bs(device, errp);
     if (!bs) {
+        bdrv_graph_rdunlock_main_loop();
         return;
     }
 
     state->bs = bs;
 
+    /* Need to drain while unlocked. */
+    bdrv_graph_rdunlock_main_loop();
     /* Paired with .clean() */
     bdrv_drained_begin(bs);
 
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    /* Make sure the root bs did not change with the drain. */
+    check_bs = qmp_get_root_bs(device, errp);
+    if (bs != check_bs) {
+        if (check_bs) {
+            error_setg(errp, "Block node of device '%s' unexpectedly changed",
+                       device);
+        } /* else errp is already set */
+        return;
+    }
+
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT, errp)) {
         return;
     }
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PULL 18/24] blockdev: drain while unlocked in external_snapshot_action()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (16 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 17/24] blockdev: drain while unlocked in internal_snapshot_action() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED Kevin Wolf
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

This is in preparation for marking bdrv_drained_begin() as GRAPH_UNLOCKED.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-19-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 blockdev.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 506755bef1..2e7fda6780 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1377,9 +1377,10 @@ static void external_snapshot_action(TransactionAction *action,
     const char *new_image_file;
     ExternalSnapshotState *state = g_new0(ExternalSnapshotState, 1);
     uint64_t perm, shared;
+    BlockDriverState *check_bs;
 
     /* TODO We'll eventually have to take a writer lock in this function */
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
+    bdrv_graph_rdlock_main_loop();
 
     tran_add(tran, &external_snapshot_drv, state);
 
@@ -1412,11 +1413,25 @@ static void external_snapshot_action(TransactionAction *action,
 
     state->old_bs = bdrv_lookup_bs(device, node_name, errp);
     if (!state->old_bs) {
+        bdrv_graph_rdunlock_main_loop();
         return;
     }
 
+    /* Need to drain while unlocked. */
+    bdrv_graph_rdunlock_main_loop();
     /* Paired with .clean() */
     bdrv_drained_begin(state->old_bs);
+    GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+    /* Make sure the associated bs did not change with the drain. */
+    check_bs = bdrv_lookup_bs(device, node_name, errp);
+    if (state->old_bs != check_bs) {
+        if (check_bs) {
+            error_setg(errp, "Block node of device '%s' unexpectedly changed",
+                       device);
+        } /* else errp is already set */
+        return;
+    }
 
     if (!bdrv_is_inserted(state->old_bs)) {
         error_setg(errp, "Device '%s' has no medium",
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PULL 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (17 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 18/24] blockdev: drain while unlocked in external_snapshot_action() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 20/24] iotests/graph-changes-while-io: remove image file after test Kevin Wolf
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

All of bdrv_drain_all_begin(), bdrv_drain_all() and
bdrv_drained_begin() poll and are not allowed to be called with the
block graph lock held. Mark the functions as such.
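
With the annotations in place, clang's thread safety analysis
(-Wthread-safety) can flag offending callers at compile time. A
hypothetical example, not taken from the tree:

    /* Hypothetical offender: drains while still holding the reader lock. */
    static void bad_drain(BlockDriverState *bs)
    {
        GRAPH_RDLOCK_GUARD_MAINLOOP();
        bdrv_drained_begin(bs);  /* now warns: GRAPH_UNLOCKED function
                                  * called with the graph lock held */
    }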

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-20-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h | 4 ++--
 include/block/block-io.h           | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 91f249b5ad..84a2a4ecd5 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -192,10 +192,10 @@ int bdrv_inactivate_all(void);
 
 int bdrv_flush_all(void);
 void bdrv_close_all(void);
-void bdrv_drain_all_begin(void);
+void GRAPH_UNLOCKED bdrv_drain_all_begin(void);
 void bdrv_drain_all_begin_nopoll(void);
 void bdrv_drain_all_end(void);
-void bdrv_drain_all(void);
+void GRAPH_UNLOCKED bdrv_drain_all(void);
 
 void bdrv_aio_cancel(BlockAIOCB *acb);
 
diff --git a/include/block/block-io.h b/include/block/block-io.h
index b99cc98d26..4cf83fb367 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -431,7 +431,7 @@ bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
  *
  * This function can be recursive.
  */
-void bdrv_drained_begin(BlockDriverState *bs);
+void GRAPH_UNLOCKED bdrv_drained_begin(BlockDriverState *bs);
 
 /**
  * bdrv_do_drained_begin_quiesce:
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PULL 20/24] iotests/graph-changes-while-io: remove image file after test
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (18 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Kevin Wolf
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-21-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/tests/graph-changes-while-io | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io
index 194fda500e..35489e3b5e 100755
--- a/tests/qemu-iotests/tests/graph-changes-while-io
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -57,6 +57,7 @@ class TestGraphChangesWhileIO(QMPTestCase):
 
     def tearDown(self) -> None:
         self.qsd.stop()
+        os.remove(top)
 
     def test_blockdev_add_while_io(self) -> None:
         # Run qemu-img bench in the background
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PULL 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (19 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 20/24] iotests/graph-changes-while-io: remove image file after test Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Kevin Wolf
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

This case catches a potential deadlock that takes place when
job-dismiss is issued while I/O requests are being processed in a
separate iothread.

See https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html
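
Condensed to the QMP level, the loop body that used to deadlock looks
roughly like this ('qsd' and the wait helper are the test's own, shown
in the diff below):

    # Sketch, condensed from the test added below.
    qsd.cmd('block-commit', {
        'job-id': 'commit-mid', 'device': 'top',
        'top-node': 'mid', 'base-node': 'node0',
        'auto-finalize': True, 'auto-dismiss': False,
    })
    wait_for_blockjob('concluded')                 # helper from the test
    qsd.cmd('job-dismiss', {'id': 'commit-mid'})   # step that could deadlock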

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
[FE: re-use top image and rename snap1->mid as suggested by Kevin Wolf
     remove image file after test as suggested by Kevin Wolf
     add type annotation for function argument to make mypy happy]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-22-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 .../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++++++++--
 .../tests/graph-changes-while-io.out          |   4 +-
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io
index 35489e3b5e..dca1167b6d 100755
--- a/tests/qemu-iotests/tests/graph-changes-while-io
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -27,6 +27,7 @@ from iotests import imgfmt, qemu_img, qemu_img_create, qemu_io, \
 
 
 top = os.path.join(iotests.test_dir, 'top.img')
+mid = os.path.join(iotests.test_dir, 'mid.img')
 nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
 
 
@@ -59,6 +60,15 @@ class TestGraphChangesWhileIO(QMPTestCase):
         self.qsd.stop()
         os.remove(top)
 
+    def _wait_for_blockjob(self, status: str) -> None:
+        done = False
+        while not done:
+            for event in self.qsd.get_qmp().get_events(wait=10.0):
+                if event['event'] != 'JOB_STATUS_CHANGE':
+                    continue
+                if event['data']['status'] == status:
+                    done = True
+
     def test_blockdev_add_while_io(self) -> None:
         # Run qemu-img bench in the background
         bench_thr = Thread(target=do_qemu_img_bench)
@@ -117,15 +127,92 @@ class TestGraphChangesWhileIO(QMPTestCase):
                 'device': 'job0',
             })
 
-            cancelled = False
-            while not cancelled:
-                for event in self.qsd.get_qmp().get_events(wait=10.0):
-                    if event['event'] != 'JOB_STATUS_CHANGE':
-                        continue
-                    if event['data']['status'] == 'null':
-                        cancelled = True
+            self._wait_for_blockjob('null')
+
+        bench_thr.join()
+
+    def test_remove_lower_snapshot_while_io(self) -> None:
+        # Run qemu-img bench in the background
+        bench_thr = Thread(target=do_qemu_img_bench, args=(100000, ))
+        bench_thr.start()
+
+        # While I/O is performed on the 'node0' node, successively add 2
+        # snapshots on top of it, then remove (commit) them, lower one first.
+        while bench_thr.is_alive():
+            # Recreate snapshot images on every iteration
+            qemu_img_create('-f', imgfmt, mid, '1G')
+            qemu_img_create('-f', imgfmt, top, '1G')
+
+            self.qsd.cmd('blockdev-add', {
+                'driver': imgfmt,
+                'node-name': 'mid',
+                'file': {
+                    'driver': 'file',
+                    'filename': mid
+                }
+            })
+
+            self.qsd.cmd('blockdev-snapshot', {
+                'node': 'node0',
+                'overlay': 'mid',
+            })
+
+            self.qsd.cmd('blockdev-add', {
+                'driver': imgfmt,
+                'node-name': 'top',
+                'file': {
+                    'driver': 'file',
+                    'filename': top
+                }
+            })
+
+            self.qsd.cmd('blockdev-snapshot', {
+                'node': 'mid',
+                'overlay': 'top',
+            })
+
+            self.qsd.cmd('block-commit', {
+                'job-id': 'commit-mid',
+                'device': 'top',
+                'top-node': 'mid',
+                'base-node': 'node0',
+                'auto-finalize': True,
+                'auto-dismiss': False,
+            })
+
+            self._wait_for_blockjob('concluded')
+            self.qsd.cmd('job-dismiss', {
+                'id': 'commit-mid',
+            })
+
+            self.qsd.cmd('block-commit', {
+                'job-id': 'commit-top',
+                'device': 'top',
+                'top-node': 'top',
+                'base-node': 'node0',
+                'auto-finalize': True,
+                'auto-dismiss': False,
+            })
+
+            self._wait_for_blockjob('ready')
+            self.qsd.cmd('job-complete', {
+                'id': 'commit-top',
+            })
+
+            self._wait_for_blockjob('concluded')
+            self.qsd.cmd('job-dismiss', {
+                'id': 'commit-top',
+            })
+
+            self.qsd.cmd('blockdev-del', {
+                'node-name': 'mid'
+            })
+            self.qsd.cmd('blockdev-del', {
+                'node-name': 'top'
+            })
 
         bench_thr.join()
+        os.remove(mid)
 
 if __name__ == '__main__':
     # Format must support raw backing files
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out b/tests/qemu-iotests/tests/graph-changes-while-io.out
index fbc63e62f8..8d7e996700 100644
--- a/tests/qemu-iotests/tests/graph-changes-while-io.out
+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
@@ -1,5 +1,5 @@
-..
+...
 ----------------------------------------------------------------------
-Ran 2 tests
+Ran 3 tests
 
 OK
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PULL 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (20 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 23/24] iotests: fix 240 Kevin Wolf
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

Both commit ab61335025 ("block: drain from main loop thread in
bdrv_co_yield_to_drain()") and commit d05ab380db ("block: Mark drain
related functions GRAPH_RDLOCK") added a GLOBAL_STATE_CODE() invocation
to bdrv_do_drained_end(). Whether we are in the main thread cannot
change between the two call sites, so keep only the earlier instance.
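
For context, GLOBAL_STATE_CODE() is just an assertion, so the duplicate
added no extra checking; it is roughly equivalent to this sketch:

    /* Simplified sketch of the macro's effect. */
    #define GLOBAL_STATE_CODE()             \
        do {                                \
            assert(qemu_in_main_thread());  \
        } while (0)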

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-23-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 4fd7768f9c..ac5c7174f6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -413,7 +413,6 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
     /* At this point, we should be always running in the main loop. */
     GLOBAL_STATE_CODE();
     assert(bs->quiesce_counter > 0);
-    GLOBAL_STATE_CODE();
 
     /* Re-enable things in child-to-parent order */
     old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PULL 23/24] iotests: fix 240
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (21 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-04 17:56 ` [PULL 24/24] hw/core/qdev-properties-system: Add missing return in set_drive_helper() Kevin Wolf
  2025-06-05 19:00 ` [PULL 00/24] Block layer patches Stefan Hajnoczi
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@redhat.com>

Commit 2e8e18c2e463 ("virtio-scsi: add iothread-vq-mapping parameter")
removed the limitation that virtio-scsi devices must successfully set
the AioContext on their BlockBackends. This was made possible thanks to
the QEMU multi-queue block layer.

This change broke qemu-iotests 240, which checks that adding a
virtio-scsi device with a drive that is already in another AioContext
will fail.

Update the test to take the relaxed behavior into account. I considered
removing this test case entirely, but the code coverage still seems
valuable.

Fixes: 2e8e18c2e463 ("virtio-scsi: add iothread-vq-mapping parameter")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250529203147.180338-1-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/240     | 2 --
 tests/qemu-iotests/240.out | 4 +---
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/240 b/tests/qemu-iotests/240
index 9b281e1dc0..f8af9ff648 100755
--- a/tests/qemu-iotests/240
+++ b/tests/qemu-iotests/240
@@ -81,8 +81,6 @@ class TestCase(iotests.QMPTestCase):
 
         self.vm.qmp_log('device_del', id='scsi-hd0')
         self.vm.event_wait('DEVICE_DELETED')
-        self.vm.qmp_log('device_add', id='scsi-hd1', driver='scsi-hd', drive='hd0', bus="scsi1.0")
-
         self.vm.qmp_log('device_del', id='scsi-hd1')
         self.vm.event_wait('DEVICE_DELETED')
         self.vm.qmp_log('blockdev-del', node_name='hd0')
diff --git a/tests/qemu-iotests/240.out b/tests/qemu-iotests/240.out
index 89ed25e506..10dcc42e06 100644
--- a/tests/qemu-iotests/240.out
+++ b/tests/qemu-iotests/240.out
@@ -46,10 +46,8 @@
 {"execute": "device_add", "arguments": {"bus": "scsi0.0", "drive": "hd0", "driver": "scsi-hd", "id": "scsi-hd0"}}
 {"return": {}}
 {"execute": "device_add", "arguments": {"bus": "scsi1.0", "drive": "hd0", "driver": "scsi-hd", "id": "scsi-hd1"}}
-{"error": {"class": "GenericError", "desc": "Cannot change iothread of active block backend"}}
-{"execute": "device_del", "arguments": {"id": "scsi-hd0"}}
 {"return": {}}
-{"execute": "device_add", "arguments": {"bus": "scsi1.0", "drive": "hd0", "driver": "scsi-hd", "id": "scsi-hd1"}}
+{"execute": "device_del", "arguments": {"id": "scsi-hd0"}}
 {"return": {}}
 {"execute": "device_del", "arguments": {"id": "scsi-hd1"}}
 {"return": {}}
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PULL 24/24] hw/core/qdev-properties-system: Add missing return in set_drive_helper()
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (22 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 23/24] iotests: fix 240 Kevin Wolf
@ 2025-06-04 17:56 ` Kevin Wolf
  2025-06-05 19:00 ` [PULL 00/24] Block layer patches Stefan Hajnoczi
  24 siblings, 0 replies; 28+ messages in thread
From: Kevin Wolf @ 2025-06-04 17:56 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

Currently, changing the 'drive' property of e.g. a scsi-hd object will
result in an assertion failure if the aio context of the block node
it's replaced with doesn't match the current aio context:

> bdrv_replace_child_noperm: Assertion `bdrv_get_aio_context(old_bs) ==
> bdrv_get_aio_context(new_bs)' failed.

The problematic scenario is already detected, but a 'return' statement
was missing.
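
The rule being enforced is the usual Error API contract: once *errp has
been set, the function must return before errp can be handed to another
call. Sketch of the fixed flow, condensed from the diff below:

    /* Condensed from the diff below. Without the return, control falls
     * through to blk_replace_bs() with *errp already set. */
    if (ctx != bdrv_get_aio_context(bs)) {
        error_setg(errp, "Different aio context is not supported for new "
                   "node");
        return;    /* the statement this patch adds */
    }
    blk_replace_bs(blk, bs, errp);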

Cc: qemu-stable@nongnu.org
Fixes: d1a58c176a ("qdev: allow setting drive property for realized device")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250523070211.280498-1-f.ebner@proxmox.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/core/qdev-properties-system.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 8e11e6388b..24e145d870 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -145,6 +145,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const char *name,
         if (ctx != bdrv_get_aio_context(bs)) {
             error_setg(errp, "Different aio context is not supported for new "
                        "node");
+            return;
         }
 
         blk_replace_bs(blk, bs, errp);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PULL 00/24] Block layer patches
  2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
                   ` (23 preceding siblings ...)
  2025-06-04 17:56 ` [PULL 24/24] hw/core/qdev-properties-system: Add missing return in set_drive_helper() Kevin Wolf
@ 2025-06-05 19:00 ` Stefan Hajnoczi
  24 siblings, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2025-06-05 19:00 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, kwolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 116 bytes --]

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/10.1 for any user-visible changes.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-06-05 19:01 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
2025-06-04 17:55 ` [PULL 01/24] block: remove outdated comments about AioContext locking Kevin Wolf
2025-06-04 17:55 ` [PULL 02/24] block: move drain outside of read-locked bdrv_reopen_queue_child() Kevin Wolf
2025-06-04 17:55 ` [PULL 03/24] block/snapshot: move drain outside of read-locked bdrv_snapshot_delete() Kevin Wolf
2025-06-04 17:55 ` [PULL 04/24] block: move drain outside of read-locked bdrv_inactivate_recurse() Kevin Wolf
2025-06-04 17:55 ` [PULL 05/24] block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK Kevin Wolf
2025-06-04 17:55 ` [PULL 06/24] block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR) Kevin Wolf
2025-06-04 17:55 ` [PULL 07/24] block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK Kevin Wolf
2025-06-04 17:55 ` [PULL 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK Kevin Wolf
2025-06-04 17:55 ` [PULL 09/24] block: move drain outside of bdrv_try_change_aio_context() Kevin Wolf
2025-06-04 17:55 ` [PULL 10/24] block: move drain outside of bdrv_attach_child_common(_abort)() Kevin Wolf
2025-06-04 17:56 ` [PULL 11/24] block: move drain outside of bdrv_set_backing_hd_drained() Kevin Wolf
2025-06-04 17:56 ` [PULL 12/24] block: move drain outside of bdrv_root_attach_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 13/24] block: move drain outside of bdrv_attach_child() Kevin Wolf via
2025-06-04 17:56 ` [PULL 14/24] block: move drain outside of quorum_add_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 15/24] block: move drain outside of bdrv_root_unref_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 16/24] block: move drain outside of quorum_del_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 17/24] blockdev: drain while unlocked in internal_snapshot_action() Kevin Wolf
2025-06-04 17:56 ` [PULL 18/24] blockdev: drain while unlocked in external_snapshot_action() Kevin Wolf
2025-06-04 17:56 ` [PULL 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED Kevin Wolf
2025-06-04 17:56 ` [PULL 20/24] iotests/graph-changes-while-io: remove image file after test Kevin Wolf
2025-06-04 17:56 ` [PULL 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Kevin Wolf
2025-06-04 17:56 ` [PULL 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Kevin Wolf
2025-06-04 17:56 ` [PULL 23/24] iotests: fix 240 Kevin Wolf
2025-06-04 17:56 ` [PULL 24/24] hw/core/qdev-properties-system: Add missing return in set_drive_helper() Kevin Wolf
2025-06-05 19:00 ` [PULL 00/24] Block layer patches Stefan Hajnoczi
  -- strict thread matches above, loose matches on Subject: below --
2021-06-30 16:01 Kevin Wolf
2021-07-02 13:52 ` Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).