* [PATCH v3 00/24] block: do not drain while holding the graph lock
From: Fiona Ebner @ 2025-05-26 13:21 UTC
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
Previous discussions:
v2: [0]
v1: [1]
Changes in v3:
* Also add bdrv_drain_all_begin() and bdrv_drain_all() as
GRAPH_UNLOCKED.
* Extend drained section in bdrv_try_change_aio_context() until after
the transaction is finalized.
* Also mark definition of static bdrv_change_aio_context() as
GRAPH_RDLOCK, not only the declaration.
* Add comments for bdrv_{child,parent}_change_aio_context() and
change_aio_ctx().
* Improve commit messages: typos/language/clarification.
Changes in v2:
* Split the big patch moving the drain outside of
bdrv_change_aio_context(), mark functions along the way with graph
lock annotations.
* In {internal,external}_snapshot_action, check that the associated bs
did not change after draining and re-acquiring the lock.
* Improve error handling using goto where appropriate.
* Add bdrv_graph_wrlock_drained() convenience wrapper rather than
adding a flag argument.
* Don't use atomics to access bs->quiesce_counter field, add a patch
to adapt the two existing places that used atomics.
* Re-use 'top' image for graph-changes-while-io test case and rename
the other image to 'mid'. Remove the image files after the test.
* Use "must be" instead of "needs to be" in documentation, use single
line comments where possible.
* Remove yet another outdated comment.
* I did not add Kevin's R-b for the patch marking bdrv_drained_begin()
GRAPH_UNLOCKED, as the earlier patches/preconditions changed.
This series is an attempt to fix a deadlock issue reported by Andrey
here [2].
bdrv_drained_begin() polls and is not allowed to be called with the
block graph lock held. Mark the function as GRAPH_UNLOCKED.
This alone does not catch the issue reported by Andrey, because there
is a bdrv_graph_rdunlock_main_loop() before bdrv_drained_begin() in
the function bdrv_change_aio_context(). That unlock is of course
ineffective if the exclusive lock is held, but it prevents TSA from
finding the issue.
Thus the bdrv_drained_begin() call from inside
bdrv_change_aio_context() needs to be moved up the call stack before
acquiring the locks. This is the bulk of the series.
Granular draining is not trivially possible, because many of the
affected functions can recursively call themselves.
In places where bdrv_drained_begin() calls were removed, assertions
are added that check the quiesce_counter to ensure that the nodes have
already been drained further up in the call stack.
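To make the new contract concrete, here is a minimal, self-contained
sketch (toy stand-in types and names, not the actual QEMU API): the
drain moves up to the caller, and the callee only asserts that it is
already running inside a drained section.

#include <assert.h>

typedef struct BlockDriverState {
    int quiesce_counter;
} BlockDriverState;

/* stand-ins for bdrv_drain_all_begin()/bdrv_drain_all_end() */
static void drain_all_begin(BlockDriverState *bs) { bs->quiesce_counter++; }
static void drain_all_end(BlockDriverState *bs)   { bs->quiesce_counter--; }

/*
 * Previously, a helper like this called bdrv_drained_begin(bs) itself;
 * now it must already run inside a drained section and only asserts so.
 */
static void recursive_helper(BlockDriverState *bs)
{
    assert(bs->quiesce_counter > 0);
    /* ... operate on the drained node, possibly recursing ... */
}

int main(void)
{
    BlockDriverState bs = { 0 };

    drain_all_begin(&bs);   /* drain moved up the call stack */
    recursive_helper(&bs);
    drain_all_end(&bs);
    return 0;
}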
NOTE: there are pre-existing test failures on current master, e.g.
'240' for all formats, '295 296 inactive-node-nbd luks-detached-header'
for luks and 'mirror-sparse' for raw. On my setup, these failures are
unchanged by this series. The 'mirror-sparse' failure is setup-specific
and a fix is already available [3].
[0]: https://lore.kernel.org/qemu-devel/20250520103012.424311-1-f.ebner@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/20250508140936.3344485-1-f.ebner@proxmox.com/
[2]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/
[3]: https://lore.kernel.org/qemu-devel/20250523163041.2548675-7-eblake@redhat.com/
Andrey Drobyshev (1):
iotests/graph-changes-while-io: add test case with removal of lower
snapshot
Fiona Ebner (23):
block: remove outdated comments about AioContext locking
block: move drain outside of read-locked bdrv_reopen_queue_child()
block/snapshot: move drain outside of read-locked
bdrv_snapshot_delete()
block: move drain outside of read-locked bdrv_inactivate_recurse()
block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK
block: mark change_aio_ctx() callback and instances as
GRAPH_RDLOCK(_PTR)
block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK
block: move drain outside of bdrv_change_aio_context() and mark
GRAPH_RDLOCK
block: move drain outside of bdrv_try_change_aio_context()
block: move drain outside of bdrv_attach_child_common(_abort)()
block: move drain outside of bdrv_set_backing_hd_drained()
block: move drain outside of bdrv_root_attach_child()
block: move drain outside of bdrv_attach_child()
block: move drain outside of quorum_add_child()
block: move drain outside of bdrv_root_unref_child()
block: move drain outside of quorum_del_child()
blockdev: drain while unlocked in internal_snapshot_action()
blockdev: drain while unlocked in external_snapshot_action()
block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED
iotests/graph-changes-while-io: remove image file after test
block/io: remove duplicate GLOBAL_STATE_CODE() in
bdrv_do_drained_end()
block: never use atomics to access bs->quiesce_counter
block: add bdrv_graph_wrlock_drained() convenience wrapper
block.c | 184 +++++++++++-------
block/backup.c | 2 +-
block/blklogwrites.c | 4 +-
block/blkverify.c | 2 +-
block/block-backend.c | 10 +-
block/commit.c | 2 +-
block/graph-lock.c | 40 +++-
block/io.c | 8 +-
block/mirror.c | 2 +-
block/qcow2.c | 2 +-
block/quorum.c | 4 +-
block/replication.c | 4 +-
block/snapshot.c | 28 +--
block/stream.c | 8 +-
block/vmdk.c | 10 +-
blockdev.c | 78 ++++++--
blockjob.c | 10 +-
include/block/block-global-state.h | 19 +-
include/block/block-io.h | 2 +-
include/block/block_int-common.h | 20 +-
include/block/graph-lock.h | 11 ++
qemu-img.c | 2 +
.../qemu-iotests/tests/graph-changes-while-io | 102 +++++++++-
.../tests/graph-changes-while-io.out | 4 +-
tests/unit/test-bdrv-drain.c | 22 +--
tests/unit/test-bdrv-graph-mod.c | 10 +-
26 files changed, 412 insertions(+), 178 deletions(-)
--
2.39.5
* [PATCH v3 01/24] block: remove outdated comments about AioContext locking
From: Fiona Ebner @ 2025-05-26 13:21 UTC
AioContext locking was removed in commit b49f4755c7 ("block: remove
AioContext locking").
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
No changes in v3.
block.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/block.c b/block.c
index f222e1a50a..a5399888ba 100644
--- a/block.c
+++ b/block.c
@@ -4359,8 +4359,6 @@ bdrv_recurse_has_child(BlockDriverState *bs, BlockDriverState *child)
* bs_queue, or the existing bs_queue being used.
*
* bs is drained here and undrained by bdrv_reopen_queue_free().
- *
- * To be called with bs->aio_context locked.
*/
static BlockReopenQueue * GRAPH_RDLOCK
bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
@@ -4519,7 +4517,6 @@ bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
return bs_queue;
}
-/* To be called with bs->aio_context locked */
BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
BlockDriverState *bs,
QDict *options, bool keep_old_opts)
@@ -7278,10 +7275,6 @@ bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
return true;
}
-/*
- * Must not be called while holding the lock of an AioContext other than the
- * current one.
- */
void bdrv_img_create(const char *filename, const char *fmt,
const char *base_filename, const char *base_fmt,
char *options, uint64_t img_size, int flags, bool quiet,
--
2.39.5
* [PATCH v3 02/24] block: move drain outside of read-locked bdrv_reopen_queue_child()
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
More granular draining is not trivially possible, because
bdrv_reopen_queue_child() can recursively call itself.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
No changes in v3.
block.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/block.c b/block.c
index a5399888ba..3065d5c91e 100644
--- a/block.c
+++ b/block.c
@@ -4358,7 +4358,7 @@ bdrv_recurse_has_child(BlockDriverState *bs, BlockDriverState *child)
* returns a pointer to bs_queue, which is either the newly allocated
* bs_queue, or the existing bs_queue being used.
*
- * bs is drained here and undrained by bdrv_reopen_queue_free().
+ * bs must be drained.
*/
static BlockReopenQueue * GRAPH_RDLOCK
bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
@@ -4377,12 +4377,7 @@ bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, BlockDriverState *bs,
GLOBAL_STATE_CODE();
- /*
- * Strictly speaking, draining is illegal under GRAPH_RDLOCK. We know that
- * we've been called with bdrv_graph_rdlock_main_loop(), though, so it's ok
- * in practice.
- */
- bdrv_drained_begin(bs);
+ assert(bs->quiesce_counter > 0);
if (bs_queue == NULL) {
bs_queue = g_new0(BlockReopenQueue, 1);
@@ -4522,6 +4517,12 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
QDict *options, bool keep_old_opts)
{
GLOBAL_STATE_CODE();
+
+ if (bs_queue == NULL) {
+ /* Paired with bdrv_drain_all_end() in bdrv_reopen_queue_free(). */
+ bdrv_drain_all_begin();
+ }
+
GRAPH_RDLOCK_GUARD_MAINLOOP();
return bdrv_reopen_queue_child(bs_queue, bs, options, NULL, 0, false,
@@ -4534,12 +4535,14 @@ void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue)
if (bs_queue) {
BlockReopenQueueEntry *bs_entry, *next;
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
- bdrv_drained_end(bs_entry->state.bs);
qobject_unref(bs_entry->state.explicit_options);
qobject_unref(bs_entry->state.options);
g_free(bs_entry);
}
g_free(bs_queue);
+
+ /* Paired with bdrv_drain_all_begin() in bdrv_reopen_queue(). */
+ bdrv_drain_all_end();
}
}
--
2.39.5
* [PATCH v3 03/24] block/snapshot: move drain outside of read-locked bdrv_snapshot_delete()
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
More granular draining is not trivially possible, because
bdrv_snapshot_delete() can recursively call itself.
The return value of bdrv_all_delete_snapshot() changes from -1 to
-errno propagated from failed sub-calls. This is fine for the existing
callers of bdrv_all_delete_snapshot().
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
No changes in v3.
block/snapshot.c | 26 +++++++++++++++-----------
blockdev.c | 25 +++++++++++++++++--------
qemu-img.c | 2 ++
3 files changed, 34 insertions(+), 19 deletions(-)
diff --git a/block/snapshot.c b/block/snapshot.c
index 22567f1fb9..9f300a78bd 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -327,7 +327,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
/**
* Delete an internal snapshot by @snapshot_id and @name.
- * @bs: block device used in the operation
+ * @bs: block device used in the operation, must be drained
* @snapshot_id: unique snapshot ID, or NULL
* @name: snapshot name, or NULL
* @errp: location to store error
@@ -358,6 +358,8 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
GLOBAL_STATE_CODE();
+ assert(bs->quiesce_counter > 0);
+
if (!drv) {
error_setg(errp, "Device '%s' has no medium",
bdrv_get_device_name(bs));
@@ -368,9 +370,6 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
return -EINVAL;
}
- /* drain all pending i/o before deleting snapshot */
- bdrv_drained_begin(bs);
-
if (drv->bdrv_snapshot_delete) {
ret = drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
} else if (fallback_bs) {
@@ -382,7 +381,6 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
ret = -ENOTSUP;
}
- bdrv_drained_end(bs);
return ret;
}
@@ -571,19 +569,22 @@ int bdrv_all_delete_snapshot(const char *name,
ERRP_GUARD();
g_autoptr(GList) bdrvs = NULL;
GList *iterbdrvs;
+ int ret = 0;
GLOBAL_STATE_CODE();
- GRAPH_RDLOCK_GUARD_MAINLOOP();
- if (bdrv_all_get_snapshot_devices(has_devices, devices, &bdrvs, errp) < 0) {
- return -1;
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
+
+ ret = bdrv_all_get_snapshot_devices(has_devices, devices, &bdrvs, errp);
+ if (ret < 0) {
+ goto out;
}
iterbdrvs = bdrvs;
while (iterbdrvs) {
BlockDriverState *bs = iterbdrvs->data;
QEMUSnapshotInfo sn1, *snapshot = &sn1;
- int ret = 0;
if ((devices || bdrv_all_snapshots_includes_bs(bs)) &&
bdrv_snapshot_find(bs, snapshot, name) >= 0)
@@ -594,13 +595,16 @@ int bdrv_all_delete_snapshot(const char *name,
if (ret < 0) {
error_prepend(errp, "Could not delete snapshot '%s' on '%s': ",
name, bdrv_get_device_or_node_name(bs));
- return -1;
+ goto out;
}
iterbdrvs = iterbdrvs->next;
}
- return 0;
+out:
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
+ return ret;
}
diff --git a/blockdev.c b/blockdev.c
index 21443b4514..3982f9776b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1132,39 +1132,41 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
int ret;
GLOBAL_STATE_CODE();
- GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
bs = qmp_get_root_bs(device, errp);
if (!bs) {
- return NULL;
+ goto error;
}
if (!id && !name) {
error_setg(errp, "Name or id must be provided");
- return NULL;
+ goto error;
}
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT_DELETE, errp)) {
- return NULL;
+ goto error;
}
ret = bdrv_snapshot_find_by_id_and_name(bs, id, name, &sn, &local_err);
if (local_err) {
error_propagate(errp, local_err);
- return NULL;
+ goto error;
}
if (!ret) {
error_setg(errp,
"Snapshot with id '%s' and name '%s' does not exist on "
"device '%s'",
STR_OR_NULL(id), STR_OR_NULL(name), device);
- return NULL;
+ goto error;
}
bdrv_snapshot_delete(bs, id, name, &local_err);
if (local_err) {
error_propagate(errp, local_err);
- return NULL;
+ goto error;
}
info = g_new0(SnapshotInfo, 1);
@@ -1180,6 +1182,9 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
info->has_icount = true;
}
+error:
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
return info;
}
@@ -1295,12 +1300,14 @@ static void internal_snapshot_abort(void *opaque)
Error *local_error = NULL;
GLOBAL_STATE_CODE();
- GRAPH_RDLOCK_GUARD_MAINLOOP();
if (!state->created) {
return;
}
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
+
if (bdrv_snapshot_delete(bs, sn->id_str, sn->name, &local_error) < 0) {
error_reportf_err(local_error,
"Failed to delete snapshot with id '%s' and "
@@ -1308,6 +1315,8 @@ static void internal_snapshot_abort(void *opaque)
sn->id_str, sn->name,
bdrv_get_device_name(bs));
}
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
}
static void internal_snapshot_clean(void *opaque)
diff --git a/qemu-img.c b/qemu-img.c
index 139eeb5039..e75707180d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3505,6 +3505,7 @@ static int img_snapshot(int argc, char **argv)
break;
case SNAPSHOT_DELETE:
+ bdrv_drain_all_begin();
bdrv_graph_rdlock_main_loop();
ret = bdrv_snapshot_find(bs, &sn, snapshot_name);
if (ret < 0) {
@@ -3520,6 +3521,7 @@ static int img_snapshot(int argc, char **argv)
}
}
bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
break;
}
--
2.39.5
* [PATCH v3 04/24] block: move drain outside of read-locked bdrv_inactivate_recurse()
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
More granular draining is not trivially possible, because
bdrv_inactivate_recurse() can recursively call itself.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
No changes in v3.
block.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/block.c b/block.c
index 3065d5c91e..fa55dfba68 100644
--- a/block.c
+++ b/block.c
@@ -6989,6 +6989,8 @@ bdrv_inactivate_recurse(BlockDriverState *bs, bool top_level)
GLOBAL_STATE_CODE();
+ assert(bs->quiesce_counter > 0);
+
if (!bs->drv) {
return -ENOMEDIUM;
}
@@ -7032,9 +7034,7 @@ bdrv_inactivate_recurse(BlockDriverState *bs, bool top_level)
return -EPERM;
}
- bdrv_drained_begin(bs);
bs->open_flags |= BDRV_O_INACTIVE;
- bdrv_drained_end(bs);
/*
* Update permissions, they may differ for inactive nodes.
@@ -7059,20 +7059,26 @@ int bdrv_inactivate(BlockDriverState *bs, Error **errp)
int ret;
GLOBAL_STATE_CODE();
- GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
if (bdrv_has_bds_parent(bs, true)) {
error_setg(errp, "Node has active parent node");
- return -EPERM;
+ ret = -EPERM;
+ goto out;
}
ret = bdrv_inactivate_recurse(bs, true);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to inactivate node");
- return ret;
+ goto out;
}
- return 0;
+out:
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
+ return ret;
}
int bdrv_inactivate_all(void)
@@ -7082,7 +7088,9 @@ int bdrv_inactivate_all(void)
int ret = 0;
GLOBAL_STATE_CODE();
- GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
/* Nodes with BDS parents are covered by recursion from the last
@@ -7098,6 +7106,9 @@ int bdrv_inactivate_all(void)
}
}
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
+
return ret;
}
--
2.39.5
* [PATCH v3 05/24] block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is a small step in preparation to mark bdrv_drained_begin() as
GRAPH_UNLOCKED. More concretely, it allows marking the
change_aio_ctx() callback GRAPH_RDLOCK_PTR, which is the next step.
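For readers unfamiliar with these markers: GRAPH_RDLOCK and friends
expand to clang Thread Safety Analysis (TSA) attributes, see
include/block/graph-lock.h. The compile-only sketch below uses
simplified stand-in names rather than QEMU's real macros to show how
such attributes let clang flag locking mistakes at build time:

/* tsa_sketch.c: compile with clang -c -Wthread-safety tsa_sketch.c */
struct __attribute__((capability("graph-lock"))) GraphLock { int unused; };
extern struct GraphLock graph_lock;

/* lock primitives, implemented elsewhere */
void graph_rdlock(void)
    __attribute__((acquire_shared_capability(graph_lock)));
void graph_rdunlock(void)
    __attribute__((release_shared_capability(graph_lock)));

/* like GRAPH_RDLOCK: may only be called with the reader lock held */
void needs_rdlock(void)
    __attribute__((requires_shared_capability(graph_lock)));

/* like GRAPH_UNLOCKED: must not be called while the lock is held */
void needs_unlocked(void)
    __attribute__((locks_excluded(graph_lock)));

void demo(void)
{
    graph_rdlock();
    needs_rdlock();     /* ok */
    /* needs_unlocked();   <- -Wthread-safety would warn here */
    graph_rdunlock();
    needs_unlocked();   /* ok */
}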
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
No changes in v3.
block.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block.c b/block.c
index fa55dfba68..7207978e53 100644
--- a/block.c
+++ b/block.c
@@ -7575,10 +7575,10 @@ typedef struct BdrvStateSetAioContext {
BlockDriverState *bs;
} BdrvStateSetAioContext;
-static bool bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
- GHashTable *visited,
- Transaction *tran,
- Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
+ GHashTable *visited, Transaction *tran,
+ Error **errp)
{
GLOBAL_STATE_CODE();
if (g_hash_table_contains(visited, c)) {
--
2.39.5
* [PATCH v3 06/24] block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR)
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is a small step in preparation to mark bdrv_drained_begin() as
GRAPH_UNLOCKED. More concretely, it is in preparation to move the
drain out of bdrv_change_aio_context() and marking that function as
GRAPH_RDLOCK.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
Changes in v3:
* Fix typo in commit message.
Checkpatch seems to report a false positive here, but it's the same
for other callbacks in that header file:
ERROR: space prohibited between function name and open parenthesis '('
#86: FILE: include/block/block_int-common.h:986:
+ bool GRAPH_RDLOCK_PTR (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
block.c | 7 ++++---
block/block-backend.c | 6 +++---
blockjob.c | 6 +++---
include/block/block_int-common.h | 6 +++---
4 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/block.c b/block.c
index 7207978e53..01144c895e 100644
--- a/block.c
+++ b/block.c
@@ -1226,9 +1226,10 @@ static int bdrv_child_cb_inactivate(BdrvChild *child)
return 0;
}
-static bool bdrv_child_cb_change_aio_ctx(BdrvChild *child, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_child_cb_change_aio_ctx(BdrvChild *child, AioContext *ctx,
+ GHashTable *visited, Transaction *tran,
+ Error **errp)
{
BlockDriverState *bs = child->opaque;
return bdrv_change_aio_context(bs, ctx, visited, tran, errp);
diff --git a/block/block-backend.c b/block/block-backend.c
index a402db13f2..6a6949edeb 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -136,9 +136,9 @@ static void blk_root_drained_end(BdrvChild *child);
static void blk_root_change_media(BdrvChild *child, bool load);
static void blk_root_resize(BdrvChild *child);
-static bool blk_root_change_aio_ctx(BdrvChild *child, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp);
+static bool GRAPH_RDLOCK
+blk_root_change_aio_ctx(BdrvChild *child, AioContext *ctx, GHashTable *visited,
+ Transaction *tran, Error **errp);
static char *blk_root_get_parent_desc(BdrvChild *child)
{
diff --git a/blockjob.c b/blockjob.c
index 32007f31a9..34185d7715 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -144,9 +144,9 @@ static TransactionActionDrv change_child_job_context = {
.clean = g_free,
};
-static bool child_job_change_aio_ctx(BdrvChild *c, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp)
+static bool GRAPH_RDLOCK
+child_job_change_aio_ctx(BdrvChild *c, AioContext *ctx, GHashTable *visited,
+ Transaction *tran, Error **errp)
{
BlockJob *job = c->opaque;
BdrvStateChildJobContext *s;
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 2982dd3118..37466c7841 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -983,9 +983,9 @@ struct BdrvChildClass {
bool backing_mask_protocol,
Error **errp);
- bool (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp);
+ bool GRAPH_RDLOCK_PTR (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
+ GHashTable *visited,
+ Transaction *tran, Error **errp);
/*
* I/O API functions. These functions are thread-safe.
--
2.39.5
* [PATCH v3 07/24] block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is a small step in preparation to mark bdrv_drained_begin() as
GRAPH_UNLOCKED. More concretely, it is in preparation to move the
drain out of bdrv_change_aio_context() and marking that function as
GRAPH_RDLOCK.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
Changes in v3:
* Fix typo in commit message.
include/block/block-global-state.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 9be34b3c99..aad160956a 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -274,9 +274,10 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag);
int bdrv_debug_resume(BlockDriverState *bs, const char *tag);
bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
-bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp);
+bool GRAPH_RDLOCK
+bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
+ GHashTable *visited, Transaction *tran,
+ Error **errp);
int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
BdrvChild *ignore_child, Error **errp);
--
2.39.5
* [PATCH v3 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
Note that even if bdrv_drained_begin() were already marked as
GRAPH_UNLOCKED, TSA would not complain about the instance in
bdrv_change_aio_context() before this change, because it is preceded
by a bdrv_graph_rdunlock_main_loop() call. It is not correct to
release the lock here, and in case the caller holds a write lock, it
wouldn't actually release the lock.
In combination with block-stream, there is a deadlock that can happen
because of this [0]. In particular, it can happen that
main thread                             IO thread
1. acquires write lock
                                        in blk_co_do_preadv_part():
                                        2. have non-zero blk->in_flight
                                        3. try to acquire read lock
4. begin drain
Steps 3 and 4 might be switched. Draining will poll and get stuck,
because it will see the non-zero in_flight counter. But the IO thread
will not make any progress either, because it cannot acquire the read
lock.
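The shape of this deadlock can be modeled outside QEMU. The following
self-contained toy program (plain pthreads, not QEMU code) mimics the
four steps; the trylock and the bounded poll loop only serve to make
the hang observable instead of actually hanging:

/* toy_deadlock.c: cc toy_deadlock.c -lpthread && ./a.out */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t graph_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int in_flight;

static void *io_thread(void *arg)
{
    (void)arg;
    atomic_fetch_add(&in_flight, 1);        /* 2. non-zero in_flight */
    usleep(100 * 1000);                     /* let the main thread win */
    /* 3. the request needs the read lock to complete */
    if (pthread_rwlock_tryrdlock(&graph_lock) != 0) {
        printf("IO thread: read lock unavailable, request stays in flight\n");
        return NULL;                        /* bail out instead of blocking */
    }
    atomic_fetch_sub(&in_flight, 1);
    pthread_rwlock_unlock(&graph_lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, io_thread, NULL);
    usleep(10 * 1000);
    pthread_rwlock_wrlock(&graph_lock);     /* 1. write lock acquired */
    /* 4. "drain": poll until in_flight drops to zero (bounded here) */
    for (int i = 0; i < 50; i++) {
        if (atomic_load(&in_flight) == 0) {
            printf("drained\n");
            goto out;
        }
        usleep(10 * 1000);
    }
    printf("drain cannot finish while the write lock starves the IO thread\n");
out:
    pthread_rwlock_unlock(&graph_lock);
    pthread_join(t, NULL);
    return 0;
}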
After this change, all paths to bdrv_change_aio_context() drain:
bdrv_change_aio_context() is called by:
1. bdrv_child_cb_change_aio_ctx() which is only called via the
change_aio_ctx() callback, see below.
2. bdrv_child_change_aio_context(), see below.
3. bdrv_try_change_aio_context(), where a drained section is
introduced.
The change_aio_ctx() callback is called by:
1. bdrv_attach_child_common_abort(), where a drained section is
introduced.
2. bdrv_attach_child_common(), where a drained section is introduced.
3. bdrv_parent_change_aio_context(), see below.
bdrv_child_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
section is invariant.
2. child_job_change_aio_ctx(), which is only called via the
change_aio_ctx() callback, see above.
bdrv_parent_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
section is invariant.
This resolves all code paths. Note that bdrv_attach_child_common()
and bdrv_attach_child_common_abort() hold the graph write lock and
callers of bdrv_try_change_aio_context() might too, so they are not
actually allowed to drain either. This will be addressed in the
following commits.
More granular draining is not trivially possible, because
bdrv_change_aio_context() can recursively call itself e.g. via
bdrv_child_change_aio_context().
[0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/
Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* Extend drained section in bdrv_try_change_aio_context() until after
the transaction is finalized.
* Also mark definition of static bdrv_change_aio_context() as
GRAPH_RDLOCK, not only the declaration.
* Add comments for bdrv_{child,parent}_change_aio_context() and
change_aio_ctx().
* Improve language in commit message.
block.c | 57 +++++++++++++++++++++++---------
include/block/block_int-common.h | 12 +++++++
2 files changed, 53 insertions(+), 16 deletions(-)
diff --git a/block.c b/block.c
index 01144c895e..6f42c0f1ab 100644
--- a/block.c
+++ b/block.c
@@ -106,9 +106,9 @@ static void bdrv_reopen_abort(BDRVReopenState *reopen_state);
static bool bdrv_backing_overridden(BlockDriverState *bs);
-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp);
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+ GHashTable *visited, Transaction *tran, Error **errp);
/* If non-zero, use only whitelisted block drivers */
static int use_bdrv_whitelist;
@@ -3040,8 +3040,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
/* No need to visit `child`, because it has been detached already */
visited = g_hash_table_new(NULL, NULL);
+ bdrv_drain_all_begin();
ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
visited, tran, &error_abort);
+ bdrv_drain_all_end();
g_hash_table_destroy(visited);
/* transaction is supposed to always succeed */
@@ -3122,9 +3124,11 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
bool ret_child;
g_hash_table_add(visited, new_child);
+ bdrv_drain_all_begin();
ret_child = child_class->change_aio_ctx(new_child, child_ctx,
visited, aio_ctx_tran,
NULL);
+ bdrv_drain_all_end();
if (ret_child == true) {
error_free(local_err);
ret = 0;
@@ -7576,6 +7580,17 @@ typedef struct BdrvStateSetAioContext {
BlockDriverState *bs;
} BdrvStateSetAioContext;
+/*
+ * Changes the AioContext of @child to @ctx and recursively for the associated
+ * block nodes and all their children and parents. Returns true if the change is
+ * possible and the transaction @tran can be continued. Returns false and sets
+ * @errp if not and the transaction must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
static bool GRAPH_RDLOCK
bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
GHashTable *visited, Transaction *tran,
@@ -7604,6 +7619,17 @@ bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
return true;
}
+/*
+ * Changes the AioContext of @c->bs to @ctx and recursively for all its children
+ * and parents. Returns true if the change is possible and the transaction @tran
+ * can be continued. Returns false and sets @errp if not and the transaction
+ * must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
GHashTable *visited, Transaction *tran,
Error **errp)
@@ -7619,10 +7645,6 @@ bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
static void bdrv_set_aio_context_clean(void *opaque)
{
BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
- BlockDriverState *bs = (BlockDriverState *) state->bs;
-
- /* Paired with bdrv_drained_begin in bdrv_change_aio_context() */
- bdrv_drained_end(bs);
g_free(state);
}
@@ -7650,10 +7672,12 @@ static TransactionActionDrv set_aio_context = {
*
* @visited will accumulate all visited BdrvChild objects. The caller is
* responsible for freeing the list afterwards.
+ *
+ * @bs must be drained.
*/
-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+ GHashTable *visited, Transaction *tran, Error **errp)
{
BdrvChild *c;
BdrvStateSetAioContext *state;
@@ -7664,21 +7688,17 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
return true;
}
- bdrv_graph_rdlock_main_loop();
QLIST_FOREACH(c, &bs->parents, next_parent) {
if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
- bdrv_graph_rdunlock_main_loop();
return false;
}
}
QLIST_FOREACH(c, &bs->children, next) {
if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
- bdrv_graph_rdunlock_main_loop();
return false;
}
}
- bdrv_graph_rdunlock_main_loop();
state = g_new(BdrvStateSetAioContext, 1);
*state = (BdrvStateSetAioContext) {
@@ -7686,8 +7706,7 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
.bs = bs,
};
- /* Paired with bdrv_drained_end in bdrv_set_aio_context_clean() */
- bdrv_drained_begin(bs);
+ assert(bs->quiesce_counter > 0);
tran_add(tran, &set_aio_context, state);
@@ -7720,6 +7739,8 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
if (ignore_child) {
g_hash_table_add(visited, ignore_child);
}
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
g_hash_table_destroy(visited);
@@ -7733,10 +7754,14 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
if (!ret) {
/* Just run clean() callbacks. No AioContext changed. */
tran_abort(tran);
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
return -EPERM;
}
tran_commit(tran);
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
return 0;
}
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 37466c7841..168f703fa1 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -983,6 +983,18 @@ struct BdrvChildClass {
bool backing_mask_protocol,
Error **errp);
+ /*
+ * Notifies the parent that the child is trying to change its AioContext.
+ * The parent may in turn change the AioContext of other nodes in the same
+ * transaction. Returns true if the change is possible and the transaction
+ * can be continued. Returns false and sets @errp if not and the transaction
+ * must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
bool GRAPH_RDLOCK_PTR (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
GHashTable *visited,
Transaction *tran, Error **errp);
--
2.39.5
* [PATCH v3 09/24] block: move drain outside of bdrv_try_change_aio_context()
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is part of resolving the deadlock mentioned in commit "block:
move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
Convert the function to a _locked() version that has to be called with
the graph lock held and add a convenience wrapper that has to be
called with the graph unlocked, which drains and takes the lock
itself. Since bdrv_try_change_aio_context() is global state code, the
wrapper is too.
Callers are adapted to use the appropriate variant, depending on
whether the caller already holds the lock. In the
test_set_aio_context() unit test, prior drains can be removed, because
draining already happens inside the new wrapper.
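As a minimal sketch of that split (stub names and empty bodies, not the
exact QEMU signatures): the _locked() variant relies on the caller
having drained and locked, while the wrapper establishes both around
the call.

/* stubs standing in for the real drain/lock primitives */
static void drain_all_begin(void)     { /* quiesce all nodes */ }
static void drain_all_end(void)       { /* resume all nodes */ }
static void rdlock_main_loop(void)    { /* take graph lock, shared */ }
static void rdunlock_main_loop(void)  { /* drop graph lock */ }

/* the caller must already drain and hold the graph lock */
static int try_change_aio_context_locked(void)
{
    /* recursion phase and transaction commit/abort happen here */
    return 0;
}

/* convenience wrapper: called with the graph unlocked, drains itself */
static int try_change_aio_context(void)
{
    int ret;

    drain_all_begin();
    rdlock_main_loop();
    ret = try_change_aio_context_locked();
    rdunlock_main_loop();
    drain_all_end();
    return ret;
}

int main(void)
{
    return try_change_aio_context();
}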
Note that bdrv_attach_child_common_abort(), bdrv_attach_child_common()
and bdrv_root_unref_child() hold the graph lock and are not actually
allowed to drain either. This will be addressed in the following
commits.
Functions like qmp_blockdev_mirror() query the nodes to act on before
draining and locking. In theory, draining could invalidate those nodes.
This kind of issue is not addressed by these commits.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* Clarify that the _locked() version is used iff the caller already
holds the graph lock.
* Mention that certain kinds of issues are not solved by these
changes alone.
* Rebase on changes to previous patch.
block.c | 58 ++++++++++++++++++++++--------
blockdev.c | 15 +++++---
include/block/block-global-state.h | 8 +++--
tests/unit/test-bdrv-drain.c | 4 ---
4 files changed, 59 insertions(+), 26 deletions(-)
diff --git a/block.c b/block.c
index 6f42c0f1ab..3aaacabf7f 100644
--- a/block.c
+++ b/block.c
@@ -3028,7 +3028,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
bdrv_replace_child_noperm(s->child, NULL);
if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
- bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
+ bdrv_drain_all_begin();
+ bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
+ &error_abort);
+ bdrv_drain_all_end();
}
if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
@@ -3115,8 +3118,10 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
parent_ctx = bdrv_child_get_parent_aio_context(new_child);
if (child_ctx != parent_ctx) {
Error *local_err = NULL;
- int ret = bdrv_try_change_aio_context(child_bs, parent_ctx, NULL,
- &local_err);
+ bdrv_drain_all_begin();
+ int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
+ &local_err);
+ bdrv_drain_all_end();
if (ret < 0 && child_class->change_aio_ctx) {
Transaction *aio_ctx_tran = tran_new();
@@ -3319,8 +3324,10 @@ void bdrv_root_unref_child(BdrvChild *child)
* When the parent requiring a non-default AioContext is removed, the
* node moves back to the main AioContext
*/
- bdrv_try_change_aio_context(child_bs, qemu_get_aio_context(), NULL,
- NULL);
+ bdrv_drain_all_begin();
+ bdrv_try_change_aio_context_locked(child_bs, qemu_get_aio_context(),
+ NULL, NULL);
+ bdrv_drain_all_end();
}
bdrv_schedule_unref(child_bs);
@@ -7719,9 +7726,13 @@ bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
*
* If ignore_child is not NULL, that child (and its subgraph) will not
* be touched.
+ *
+ * Called with the graph lock held.
+ *
+ * Called while all bs are drained.
*/
-int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
- BdrvChild *ignore_child, Error **errp)
+int bdrv_try_change_aio_context_locked(BlockDriverState *bs, AioContext *ctx,
+ BdrvChild *ignore_child, Error **errp)
{
Transaction *tran;
GHashTable *visited;
@@ -7730,17 +7741,15 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
/*
* Recursion phase: go through all nodes of the graph.
- * Take care of checking that all nodes support changing AioContext
- * and drain them, building a linear list of callbacks to run if everything
- * is successful (the transaction itself).
+ * Take care of checking that all nodes support changing AioContext,
+ * building a linear list of callbacks to run if everything is successful
+ * (the transaction itself).
*/
tran = tran_new();
visited = g_hash_table_new(NULL, NULL);
if (ignore_child) {
g_hash_table_add(visited, ignore_child);
}
- bdrv_drain_all_begin();
- bdrv_graph_rdlock_main_loop();
ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
g_hash_table_destroy(visited);
@@ -7754,15 +7763,34 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
if (!ret) {
/* Just run clean() callbacks. No AioContext changed. */
tran_abort(tran);
- bdrv_graph_rdunlock_main_loop();
- bdrv_drain_all_end();
return -EPERM;
}
tran_commit(tran);
+ return 0;
+}
+
+/*
+ * Change bs's and recursively all of its parents' and children's AioContext
+ * to the given new context, returning an error if that isn't possible.
+ *
+ * If ignore_child is not NULL, that child (and its subgraph) will not
+ * be touched.
+ */
+int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+ BdrvChild *ignore_child, Error **errp)
+{
+ int ret;
+
+ GLOBAL_STATE_CODE();
+
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
+ ret = bdrv_try_change_aio_context_locked(bs, ctx, ignore_child, errp);
bdrv_graph_rdunlock_main_loop();
bdrv_drain_all_end();
- return 0;
+
+ return ret;
}
void bdrv_add_aio_context_notifier(BlockDriverState *bs,
diff --git a/blockdev.c b/blockdev.c
index 3982f9776b..750beba41f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3601,12 +3601,13 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
AioContext *new_context;
BlockDriverState *bs;
- GRAPH_RDLOCK_GUARD_MAINLOOP();
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
bs = bdrv_find_node(node_name);
if (!bs) {
error_setg(errp, "Failed to find node with node-name='%s'", node_name);
- return;
+ goto out;
}
/* Protects against accidents. */
@@ -3614,14 +3615,14 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
error_setg(errp, "Node %s is associated with a BlockBackend and could "
"be in use (use force=true to override this check)",
node_name);
- return;
+ goto out;
}
if (iothread->type == QTYPE_QSTRING) {
IOThread *obj = iothread_by_id(iothread->u.s);
if (!obj) {
error_setg(errp, "Cannot find iothread %s", iothread->u.s);
- return;
+ goto out;
}
new_context = iothread_get_aio_context(obj);
@@ -3629,7 +3630,11 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
new_context = qemu_get_aio_context();
}
- bdrv_try_change_aio_context(bs, new_context, NULL, errp);
+ bdrv_try_change_aio_context_locked(bs, new_context, NULL, errp);
+
+out:
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
}
QemuOptsList qemu_common_drive_opts = {
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index aad160956a..91f249b5ad 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -278,8 +278,12 @@ bool GRAPH_RDLOCK
bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
GHashTable *visited, Transaction *tran,
Error **errp);
-int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
- BdrvChild *ignore_child, Error **errp);
+int GRAPH_UNLOCKED
+bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+ BdrvChild *ignore_child, Error **errp);
+int GRAPH_RDLOCK
+bdrv_try_change_aio_context_locked(BlockDriverState *bs, AioContext *ctx,
+ BdrvChild *ignore_child, Error **errp);
int GRAPH_RDLOCK bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz);
int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 290cd2a70e..3185f3f429 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1396,14 +1396,10 @@ static void test_set_aio_context(void)
bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
&error_abort);
- bdrv_drained_begin(bs);
bdrv_try_change_aio_context(bs, ctx_a, NULL, &error_abort);
- bdrv_drained_end(bs);
- bdrv_drained_begin(bs);
bdrv_try_change_aio_context(bs, ctx_b, NULL, &error_abort);
bdrv_try_change_aio_context(bs, qemu_get_aio_context(), NULL, &error_abort);
- bdrv_drained_end(bs);
bdrv_unref(bs);
iothread_join(a);
--
2.39.5
* [PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is part of resolving the deadlock mentioned in commit "block:
move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
The function bdrv_attach_child_common_abort() is used only as the
abort callback in bdrv_attach_child_common_drv transactions, so the
tran_finalize() calls of such transactions need to be in drained
sections too.
All code paths are covered:
The bdrv_attach_child_common_drv transactions are only used in
bdrv_attach_child_common(), so it is enough to check callers of
bdrv_attach_child_common() following the transactions.
bdrv_attach_child_common() is called by:
1. bdrv_attach_child_noperm(), which does not finalize the
transaction yet.
2. bdrv_root_attach_child(), where a drained section is introduced.
bdrv_attach_child_noperm() is called by:
1. bdrv_attach_child(), where a drained section is introduced.
2. bdrv_set_file_or_backing_noperm(), which does not finalize the
transaction yet.
3. bdrv_append(), where a drained section is introduced.
bdrv_set_file_or_backing_noperm() is called by:
1. bdrv_set_backing_hd_drained(), where a drained section is
introduced.
2. bdrv_reopen_parse_file_or_backing(), which does not finalize the
transaction yet. Draining the old child bs currently happens under
the graph lock there. This is replaced with an assertion, because
the drain will be moved further up to the caller.
bdrv_reopen_parse_file_or_backing() is called by:
1. bdrv_reopen_prepare(), which does not finalize the transaction yet.
bdrv_reopen_prepare() is called by:
1. bdrv_reopen_multiple(), which does finalize the transaction. It is
called after bdrv_reopen_queue(), which starts a drained section.
The drained section ends, when bdrv_reopen_queue_free() is called
at the end of bdrv_reopen_multiple().
This resolves all code paths.
The functions bdrv_set_backing_hd_drained(), bdrv_attach_child() and
bdrv_root_attach_child() run under the graph lock, so they are not
actually allowed to drain. This will be addressed in the following
commits.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/block.c b/block.c
index 3aaacabf7f..64db71e38d 100644
--- a/block.c
+++ b/block.c
@@ -3028,10 +3028,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
bdrv_replace_child_noperm(s->child, NULL);
if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
- bdrv_drain_all_begin();
bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
&error_abort);
- bdrv_drain_all_end();
}
if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
@@ -3043,10 +3041,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
/* No need to visit `child`, because it has been detached already */
visited = g_hash_table_new(NULL, NULL);
- bdrv_drain_all_begin();
ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
visited, tran, &error_abort);
- bdrv_drain_all_end();
g_hash_table_destroy(visited);
/* transaction is supposed to always succeed */
@@ -3118,10 +3114,8 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
parent_ctx = bdrv_child_get_parent_aio_context(new_child);
if (child_ctx != parent_ctx) {
Error *local_err = NULL;
- bdrv_drain_all_begin();
int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
&local_err);
- bdrv_drain_all_end();
if (ret < 0 && child_class->change_aio_ctx) {
Transaction *aio_ctx_tran = tran_new();
@@ -3129,11 +3123,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
bool ret_child;
g_hash_table_add(visited, new_child);
- bdrv_drain_all_begin();
ret_child = child_class->change_aio_ctx(new_child, child_ctx,
visited, aio_ctx_tran,
NULL);
- bdrv_drain_all_end();
if (ret_child == true) {
error_free(local_err);
ret = 0;
@@ -3244,6 +3236,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
GLOBAL_STATE_CODE();
+ bdrv_drain_all_begin();
child = bdrv_attach_child_common(child_bs, child_name, child_class,
child_role, perm, shared_perm, opaque,
tran, errp);
@@ -3256,6 +3249,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
out:
tran_finalize(tran, ret);
+ bdrv_drain_all_end();
bdrv_schedule_unref(child_bs);
@@ -3283,6 +3277,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
GLOBAL_STATE_CODE();
+ bdrv_drain_all_begin();
child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
child_class, child_role, tran, errp);
if (!child) {
@@ -3297,6 +3292,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
out:
tran_finalize(tran, ret);
+ bdrv_drain_all_end();
bdrv_schedule_unref(child_bs);
@@ -3573,6 +3569,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
assert(bs->backing->bs->quiesce_counter > 0);
}
+ bdrv_drain_all_begin();
ret = bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
if (ret < 0) {
goto out;
@@ -3581,6 +3578,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
ret = bdrv_refresh_perms(bs, tran, errp);
out:
tran_finalize(tran, ret);
+ bdrv_drain_all_end();
return ret;
}
@@ -4721,6 +4719,8 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
* Return 0 on success, otherwise return < 0 and set @errp.
*
* @reopen_state->bs can move to a different AioContext in this function.
+ *
+ * The old child bs must be drained.
*/
static int GRAPH_UNLOCKED
bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
@@ -4814,7 +4814,7 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
if (old_child_bs) {
bdrv_ref(old_child_bs);
- bdrv_drained_begin(old_child_bs);
+ assert(old_child_bs->quiesce_counter > 0);
}
bdrv_graph_rdunlock_main_loop();
@@ -4826,7 +4826,6 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
bdrv_graph_wrunlock();
if (old_child_bs) {
- bdrv_drained_end(old_child_bs);
bdrv_unref(old_child_bs);
}
@@ -5501,9 +5500,7 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
assert(!bs_new->backing);
bdrv_graph_rdunlock_main_loop();
- bdrv_drained_begin(bs_top);
- bdrv_drained_begin(bs_new);
-
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
@@ -5525,9 +5522,7 @@ out:
bdrv_refresh_limits(bs_top, NULL, NULL);
bdrv_graph_wrunlock();
-
- bdrv_drained_end(bs_top);
- bdrv_drained_end(bs_new);
+ bdrv_drain_all_end();
return ret;
}
--
2.39.5
* [PATCH v3 11/24] block: move drain outside of bdrv_set_backing_hd_drained()
From: Fiona Ebner @ 2025-05-26 13:21 UTC
This is part of resolving the deadlock mentioned in commit "block:
move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
The function bdrv_set_backing_hd_drained() holds the graph lock, so it
is not allowed to drain. It is called by:
1. bdrv_set_backing_hd(), where a drained section is introduced,
replacing the previously present bs-specific drains.
2. stream_prepare(), where a drained section is introduced replacing
the previously present bs-specific drains.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block.c | 6 ++----
block/stream.c | 6 ++----
2 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/block.c b/block.c
index 64db71e38d..75322789b5 100644
--- a/block.c
+++ b/block.c
@@ -3569,7 +3569,6 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
assert(bs->backing->bs->quiesce_counter > 0);
}
- bdrv_drain_all_begin();
ret = bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
if (ret < 0) {
goto out;
@@ -3578,7 +3577,6 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
ret = bdrv_refresh_perms(bs, tran, errp);
out:
tran_finalize(tran, ret);
- bdrv_drain_all_end();
return ret;
}
@@ -3594,11 +3592,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
bdrv_graph_rdunlock_main_loop();
bdrv_ref(drain_bs);
- bdrv_drained_begin(drain_bs);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
bdrv_graph_wrunlock();
- bdrv_drained_end(drain_bs);
+ bdrv_drain_all_end();
bdrv_unref(drain_bs);
return ret;
diff --git a/block/stream.c b/block/stream.c
index 999d9e56d4..6ba49cffd3 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -80,11 +80,10 @@ static int stream_prepare(Job *job)
* may end up working with the wrong base node (or it might even have gone
* away by the time we want to use it).
*/
- bdrv_drained_begin(unfiltered_bs);
if (unfiltered_bs_cow) {
bdrv_ref(unfiltered_bs_cow);
- bdrv_drained_begin(unfiltered_bs_cow);
}
+ bdrv_drain_all_begin();
bdrv_graph_rdlock_main_loop();
base = bdrv_filter_or_cow_bs(s->above_base);
@@ -123,11 +122,10 @@ static int stream_prepare(Job *job)
}
out:
+ bdrv_drain_all_end();
if (unfiltered_bs_cow) {
- bdrv_drained_end(unfiltered_bs_cow);
bdrv_unref(unfiltered_bs_cow);
}
- bdrv_drained_end(unfiltered_bs);
return ret;
}
--
2.39.5
* [PATCH v3 12/24] block: move drain outside of bdrv_root_attach_child()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (10 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 11/24] block: move drain outside of bdrv_set_backing_hd_drained() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-27 17:16 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 13/24] block: move drain outside of bdrv_attach_child() Fiona Ebner
` (11 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
The function bdrv_root_attach_child() runs under the graph lock, so it
is not allowed to drain. It is called by:
1. blk_insert_bs(), where a drained section is introduced.
2. block_job_add_bdrv(), which holds the graph lock itself.
block_job_add_bdrv() is called by:
1. mirror_start_job()
2. stream_start()
3. commit_start()
4. backup_job_create()
5. block_job_create()
6. In the test_blockjob_common_drain_node() unit test
In all callers, a drained section is introduced.
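In each of these callers, the new drained section wraps the existing
write-locked region; for example, in backup_job_create() (a sketch of
the backup.c hunk below):
    bdrv_drain_all_begin();
    bdrv_graph_wrlock();
    block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
                       &error_abort);
    bdrv_graph_wrunlock();
    bdrv_drain_all_end();
Error paths that unlock early must also end the drained section early,
which is why the commit.c, mirror.c and stream.c hunks add a
bdrv_drain_all_end() next to each bdrv_graph_wrunlock() on the failure
paths.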
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block.c | 2 --
block/backup.c | 2 ++
block/block-backend.c | 2 ++
block/commit.c | 4 ++++
block/mirror.c | 5 +++++
block/stream.c | 4 ++++
blockjob.c | 4 ++++
tests/unit/test-bdrv-drain.c | 2 ++
8 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/block.c b/block.c
index 75322789b5..c0d2670ac7 100644
--- a/block.c
+++ b/block.c
@@ -3236,7 +3236,6 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
GLOBAL_STATE_CODE();
- bdrv_drain_all_begin();
child = bdrv_attach_child_common(child_bs, child_name, child_class,
child_role, perm, shared_perm, opaque,
tran, errp);
@@ -3249,7 +3248,6 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
out:
tran_finalize(tran, ret);
- bdrv_drain_all_end();
bdrv_schedule_unref(child_bs);
diff --git a/block/backup.c b/block/backup.c
index 0151e84395..909027c17a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -498,10 +498,12 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
block_copy_set_speed(bcs, speed);
/* Required permissions are taken by copy-before-write filter target */
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
&error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
return &job->common;
diff --git a/block/block-backend.c b/block/block-backend.c
index 6a6949edeb..24cae3cb55 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -904,6 +904,7 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
GLOBAL_STATE_CODE();
bdrv_ref(bs);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
if ((bs->open_flags & BDRV_O_INACTIVE) && blk_can_inactivate(blk)) {
@@ -919,6 +920,7 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
perm, shared_perm, blk, errp);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
if (blk->root == NULL) {
return -EPERM;
}
diff --git a/block/commit.c b/block/commit.c
index 7cc8c0f0df..6c4b736ff8 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -392,6 +392,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
* this is the responsibility of the interface (i.e. whoever calls
* commit_start()).
*/
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
s->base_overlay = bdrv_find_overlay(top, base);
assert(s->base_overlay);
@@ -424,18 +425,21 @@ void commit_start(const char *job_id, BlockDriverState *bs,
iter_shared_perms, errp);
if (ret < 0) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
goto fail;
}
}
if (bdrv_freeze_backing_chain(commit_top_bs, base, errp) < 0) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
goto fail;
}
s->chain_frozen = true;
ret = block_job_add_bdrv(&s->common, "base", base, 0, BLK_PERM_ALL, errp);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
if (ret < 0) {
goto fail;
diff --git a/block/mirror.c b/block/mirror.c
index c2c5099c95..6e8caf4b49 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -2014,6 +2014,7 @@ static BlockJob *mirror_start_job(
*/
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
@@ -2021,6 +2022,7 @@ static BlockJob *mirror_start_job(
errp);
if (ret < 0) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
goto fail;
}
@@ -2066,16 +2068,19 @@ static BlockJob *mirror_start_job(
iter_shared_perms, errp);
if (ret < 0) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
goto fail;
}
}
if (bdrv_freeze_backing_chain(mirror_top_bs, target, errp) < 0) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
goto fail;
}
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
QTAILQ_INIT(&s->ops_in_flight);
diff --git a/block/stream.c b/block/stream.c
index 6ba49cffd3..f5441f27f4 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -371,10 +371,12 @@ void stream_start(const char *job_id, BlockDriverState *bs,
* already have our own plans. Also don't allow resize as the image size is
* queried only at the job start and then cached.
*/
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
if (block_job_add_bdrv(&s->common, "active node", bs, 0,
basic_flags | BLK_PERM_WRITE, errp)) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
goto fail;
}
@@ -395,10 +397,12 @@ void stream_start(const char *job_id, BlockDriverState *bs,
basic_flags, errp);
if (ret < 0) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
goto fail;
}
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
s->base_overlay = base_overlay;
s->above_base = above_base;
diff --git a/blockjob.c b/blockjob.c
index 34185d7715..44991e3ff7 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -496,6 +496,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
int ret;
GLOBAL_STATE_CODE();
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
if (job_id == NULL && !(flags & JOB_INTERNAL)) {
@@ -506,6 +507,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
flags, cb, opaque, errp);
if (job == NULL) {
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
return NULL;
}
@@ -544,10 +546,12 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
return job;
fail:
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
job_early_fail(&job->job);
return NULL;
}
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 3185f3f429..4f3057844b 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -772,9 +772,11 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
tjob->bs = src;
job = &tjob->common;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
switch (result) {
case TEST_JOB_SUCCESS:
--
2.39.5
* [PATCH v3 13/24] block: move drain outside of bdrv_attach_child()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (11 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 12/24] block: move drain outside of bdrv_root_attach_child() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-27 17:20 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 14/24] block: move drain outside of quorum_add_child() Fiona Ebner
` (10 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
The function bdrv_attach_child() runs under the graph lock, so it is
not allowed to drain. It is called by:
1. replication_start()
2. quorum_add_child()
3. bdrv_open_child_common()
4. Throughout test-bdrv-graph-mod.c and test-bdrv-drain.c unit tests.
In all callers, a drained section is introduced.
The caller quorum_add_child() itself runs under the graph lock, so it
is not actually allowed to drain there. This will be addressed by the
following commit.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block.c | 4 ++--
block/quorum.c | 2 ++
block/replication.c | 5 +++++
tests/unit/test-bdrv-drain.c | 14 ++++++++++++++
tests/unit/test-bdrv-graph-mod.c | 10 ++++++++++
5 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/block.c b/block.c
index c0d2670ac7..66fbc707f3 100644
--- a/block.c
+++ b/block.c
@@ -3275,7 +3275,6 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
GLOBAL_STATE_CODE();
- bdrv_drain_all_begin();
child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
child_class, child_role, tran, errp);
if (!child) {
@@ -3290,7 +3289,6 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
out:
tran_finalize(tran, ret);
- bdrv_drain_all_end();
bdrv_schedule_unref(child_bs);
@@ -3786,10 +3784,12 @@ static BdrvChild *bdrv_open_child_common(const char *filename,
return NULL;
}
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
errp);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
return child;
}
diff --git a/block/quorum.c b/block/quorum.c
index ed8ce801ee..ea17b0ec13 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1096,8 +1096,10 @@ quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs, Error **errp)
/* We can safely add the child now */
bdrv_ref(child_bs);
+ bdrv_drain_all_begin();
child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
BDRV_CHILD_DATA, errp);
+ bdrv_drain_all_end();
if (child == NULL) {
s->next_child_index--;
return;
diff --git a/block/replication.c b/block/replication.c
index 07f274de9e..54cbd03e00 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -540,6 +540,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
return;
}
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_ref(hidden_disk->bs);
@@ -549,6 +550,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (local_err) {
error_propagate(errp, local_err);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
return;
}
@@ -559,6 +561,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (local_err) {
error_propagate(errp, local_err);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
return;
}
@@ -571,12 +574,14 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
!check_top_bs(top_bs, bs)) {
error_setg(errp, "No top_bs or it is invalid");
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
reopen_backing_file(bs, false, NULL);
return;
}
bdrv_op_block_all(top_bs, s->blocker);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
s->backup_job = backup_job_create(
NULL, s->secondary_disk->bs, s->hidden_disk->bs,
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 4f3057844b..ac76525e5a 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1049,10 +1049,12 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
/* This child will be the one to pass to requests through to, and
* it will stall until a drain occurs */
@@ -1060,21 +1062,25 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
&error_abort);
child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
/* Takes our reference to child_bs */
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child",
&child_of_bds,
BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
&error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
/* This child is just there to be deleted
* (for detach_instead_of_delete == true) */
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
blk_insert_bs(blk, bs, &error_abort);
@@ -1157,6 +1163,7 @@ static void no_coroutine_fn detach_indirect_bh(void *opaque)
bdrv_dec_in_flight(data->child_b->bs);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(data->parent_b, data->child_b);
@@ -1165,6 +1172,7 @@ static void no_coroutine_fn detach_indirect_bh(void *opaque)
&child_of_bds, BDRV_CHILD_DATA,
&error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
}
static void coroutine_mixed_fn detach_by_parent_aio_cb(void *opaque, int ret)
@@ -1262,6 +1270,7 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
/* Set child relationships */
bdrv_ref(b);
bdrv_ref(a);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
@@ -1273,6 +1282,7 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
g_assert_cmpint(parent_a->refcnt, ==, 1);
g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1685,6 +1695,7 @@ static void test_drop_intermediate_poll(void)
* Establish the chain last, so the chain links are the first
* elements in the BDS.parents lists
*/
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
for (i = 0; i < 3; i++) {
if (i) {
@@ -1694,6 +1705,7 @@ static void test_drop_intermediate_poll(void)
}
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
job = block_job_create("job", &test_simple_job_driver, NULL, job_node,
0, BLK_PERM_ALL, 0, 0, NULL, NULL, &error_abort);
@@ -1940,10 +1952,12 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
new_child_bs->total_sectors = 1;
bdrv_ref(old_child_bs);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
BDRV_CHILD_COW, &error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
parent_s->setup_completed = true;
for (i = 0; i < old_drain_count; i++) {
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index d743abb4bb..7b03ebe4b0 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -137,10 +137,12 @@ static void test_update_perm_tree(void)
blk_insert_bs(root, bs, &error_abort);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_attach_child(filter, bs, "child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
ret = bdrv_append(filter, bs, NULL);
g_assert_cmpint(ret, <, 0);
@@ -204,11 +206,13 @@ static void test_should_update_child(void)
bdrv_set_backing_hd(target, bs, &error_abort);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
g_assert(target->backing->bs == bs);
bdrv_attach_child(filter, target, "target", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
bdrv_append(filter, bs, &error_abort);
bdrv_graph_rdlock_main_loop();
@@ -244,6 +248,7 @@ static void test_parallel_exclusive_write(void)
bdrv_ref(base);
bdrv_ref(fl1);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_attach_child(top, fl1, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -257,6 +262,7 @@ static void test_parallel_exclusive_write(void)
bdrv_replace_node(fl1, fl2, &error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
bdrv_drained_end(fl2);
bdrv_drained_end(fl1);
@@ -363,6 +369,7 @@ static void test_parallel_perm_update(void)
*/
bdrv_ref(base);
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_attach_child(top, ws, "file", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
@@ -377,6 +384,7 @@ static void test_parallel_perm_update(void)
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
/* Select fl1 as first child to be active */
s->selected = c_fl1;
@@ -430,11 +438,13 @@ static void test_append_greedy_filter(void)
BlockDriverState *base = no_perm_node("base");
BlockDriverState *fl = exclusive_writer_node("fl1");
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_attach_child(top, base, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
bdrv_append(fl, base, &error_abort);
bdrv_unref(fl);
--
2.39.5
* [PATCH v3 14/24] block: move drain outside of quorum_add_child()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (12 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 13/24] block: move drain outside of bdrv_attach_child() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-27 17:23 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 15/24] block: move drain outside of bdrv_root_unref_child() Fiona Ebner
` (9 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
The quorum_add_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_add_child()
callback, which is only called in the bdrv_add_child() function, which
also runs under the graph lock.
The bdrv_add_child() function is called by qmp_x_blockdev_change(),
where a drained section is introduced.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block/quorum.c | 2 --
blockdev.c | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/quorum.c b/block/quorum.c
index ea17b0ec13..ed8ce801ee 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1096,10 +1096,8 @@ quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs, Error **errp)
/* We can safely add the child now */
bdrv_ref(child_bs);
- bdrv_drain_all_begin();
child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
BDRV_CHILD_DATA, errp);
- bdrv_drain_all_end();
if (child == NULL) {
s->next_child_index--;
return;
diff --git a/blockdev.c b/blockdev.c
index 750beba41f..bd5ca77619 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3531,6 +3531,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
BlockDriverState *parent_bs, *new_bs = NULL;
BdrvChild *p_child;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
parent_bs = bdrv_lookup_bs(parent, parent, errp);
@@ -3568,6 +3569,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
out:
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
}
BlockJobInfoList *qmp_query_block_jobs(Error **errp)
--
2.39.5
* [PATCH v3 15/24] block: move drain outside of bdrv_root_unref_child()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (13 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 14/24] block: move drain outside of quorum_add_child() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-27 17:50 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 16/24] block: move drain outside of quorum_del_child() Fiona Ebner
` (8 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
bdrv_root_unref_child() is called by:
1. blk_remove_bs(), where a drained section is introduced.
2. bdrv_unref_child(), which runs under the graph lock, so the drain
will be moved further up to its callers.
3. block_job_remove_all_bdrv(), where a drained section is introduced.
For all callers of bdrv_unref_child(), a drained section is
introduced; they are not explicitly listed here. The caller
quorum_del_child()
holds the graph lock, so it is not actually allowed to drain. This
will be addressed in the next commit.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block.c | 6 ++++--
block/blklogwrites.c | 4 ++++
block/blkverify.c | 2 ++
block/block-backend.c | 2 ++
block/qcow2.c | 2 ++
block/quorum.c | 6 ++++++
block/replication.c | 2 ++
block/snapshot.c | 2 ++
block/vmdk.c | 10 ++++++++++
blockjob.c | 2 ++
tests/unit/test-bdrv-drain.c | 2 ++
11 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/block.c b/block.c
index 66fbc707f3..e1b1a8abc7 100644
--- a/block.c
+++ b/block.c
@@ -1721,12 +1721,14 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
open_failed:
bs->drv = NULL;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
if (bs->file != NULL) {
bdrv_unref_child(bs, bs->file);
assert(!bs->file);
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
g_free(bs->opaque);
bs->opaque = NULL;
@@ -3316,10 +3318,8 @@ void bdrv_root_unref_child(BdrvChild *child)
* When the parent requiring a non-default AioContext is removed, the
* node moves back to the main AioContext
*/
- bdrv_drain_all_begin();
bdrv_try_change_aio_context_locked(child_bs, qemu_get_aio_context(),
NULL, NULL);
- bdrv_drain_all_end();
}
bdrv_schedule_unref(child_bs);
@@ -5163,6 +5163,7 @@ static void bdrv_close(BlockDriverState *bs)
bs->drv = NULL;
}
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
bdrv_unref_child(bs, child);
@@ -5171,6 +5172,7 @@ static void bdrv_close(BlockDriverState *bs)
assert(!bs->backing);
assert(!bs->file);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
g_free(bs->opaque);
bs->opaque = NULL;
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index b0f78c4bc7..70ac76f401 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -281,9 +281,11 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
ret = 0;
fail_log:
if (ret < 0) {
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, s->log_file);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
s->log_file = NULL;
qemu_mutex_destroy(&s->mutex);
}
@@ -296,10 +298,12 @@ static void blk_log_writes_close(BlockDriverState *bs)
{
BDRVBlkLogWritesState *s = bs->opaque;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, s->log_file);
s->log_file = NULL;
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
qemu_mutex_destroy(&s->mutex);
}
diff --git a/block/blkverify.c b/block/blkverify.c
index db79a36681..3a71f7498c 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -151,10 +151,12 @@ static void blkverify_close(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, s->test_file);
s->test_file = NULL;
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
}
static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/block-backend.c b/block/block-backend.c
index 24cae3cb55..68209bb2f7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -889,9 +889,11 @@ void blk_remove_bs(BlockBackend *blk)
root = blk->root;
blk->root = NULL;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_root_unref_child(root);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
}
/*
diff --git a/block/qcow2.c b/block/qcow2.c
index 66fba89b41..420918b3c3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2821,9 +2821,11 @@ qcow2_do_close(BlockDriverState *bs, bool close_data_file)
if (close_data_file && has_data_file(bs)) {
GLOBAL_STATE_CODE();
bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, s->data_file);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
s->data_file = NULL;
bdrv_graph_rdlock_main_loop();
}
diff --git a/block/quorum.c b/block/quorum.c
index ed8ce801ee..81407a38ee 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1037,6 +1037,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
close_exit:
/* cleanup on error */
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
for (i = 0; i < s->num_children; i++) {
if (!opened[i]) {
@@ -1045,6 +1046,7 @@ close_exit:
bdrv_unref_child(bs, s->children[i]);
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
g_free(s->children);
g_free(opened);
exit:
@@ -1057,11 +1059,13 @@ static void quorum_close(BlockDriverState *bs)
BDRVQuorumState *s = bs->opaque;
int i;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
for (i = 0; i < s->num_children; i++) {
bdrv_unref_child(bs, s->children[i]);
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
g_free(s->children);
}
@@ -1143,7 +1147,9 @@ quorum_del_child(BlockDriverState *bs, BdrvChild *child, Error **errp)
(s->num_children - i - 1) * sizeof(BdrvChild *));
s->children = g_renew(BdrvChild *, s->children, --s->num_children);
+ bdrv_drain_all_begin();
bdrv_unref_child(bs, child);
+ bdrv_drain_all_end();
quorum_refresh_flags(bs);
}
diff --git a/block/replication.c b/block/replication.c
index 54cbd03e00..0879718854 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -656,12 +656,14 @@ static void replication_done(void *opaque, int ret)
if (ret == 0) {
s->stage = BLOCK_REPLICATION_DONE;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, s->secondary_disk);
s->secondary_disk = NULL;
bdrv_unref_child(bs, s->hidden_disk);
s->hidden_disk = NULL;
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
s->error = 0;
} else {
diff --git a/block/snapshot.c b/block/snapshot.c
index 9f300a78bd..28c9c43621 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -291,9 +291,11 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
}
/* .bdrv_open() will re-attach it */
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, fallback);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
memset(bs->opaque, 0, drv->instance_size);
diff --git a/block/vmdk.c b/block/vmdk.c
index 9c7ab037e1..89a7250120 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -271,6 +271,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
BDRVVmdkState *s = bs->opaque;
VmdkExtent *e;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
for (i = 0; i < s->num_extents; i++) {
e = &s->extents[i];
@@ -283,6 +284,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
}
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
g_free(s->extents);
}
@@ -1247,9 +1249,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
0, 0, 0, 0, 0, &extent, errp);
if (ret < 0) {
bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1266,9 +1270,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
g_free(buf);
if (ret) {
bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1277,9 +1283,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
if (ret) {
bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1287,9 +1295,11 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
} else {
error_setg(errp, "Unsupported extent type '%s'", type);
bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
ret = -ENOTSUP;
goto out;
diff --git a/blockjob.c b/blockjob.c
index 44991e3ff7..e68181a35b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
* one to make sure that such a concurrent access does not attempt
* to process an already freed BdrvChild.
*/
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
while (job->nodes) {
GSList *l = job->nodes;
@@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
g_slist_free_1(l);
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
}
bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index ac76525e5a..c33f7d31c2 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -955,11 +955,13 @@ static void bdrv_test_top_close(BlockDriverState *bs)
{
BdrvChild *c, *next_c;
+ bdrv_drain_all_begin();
bdrv_graph_wrlock();
QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
bdrv_unref_child(bs, c);
}
bdrv_graph_wrunlock();
+ bdrv_drain_all_end();
}
static int coroutine_fn GRAPH_RDLOCK
--
2.39.5
* [PATCH v3 16/24] block: move drain outside of quorum_del_child()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (14 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 15/24] block: move drain outside of bdrv_root_unref_child() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-27 17:52 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 17/24] blockdev: drain while unlocked in internal_snapshot_action() Fiona Ebner
` (7 subsequent siblings)
23 siblings, 1 reply; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
The quorum_del_child() callback runs under the graph lock, so it is
not allowed to drain. It is only called as the .bdrv_del_child()
callback, which is only called in the bdrv_del_child() function, which
also runs under the graph lock.
The bdrv_del_child() function is called by qmp_x_blockdev_change().
A drained section was already introduced there by commit "block: move
drain out of quorum_add_child()".
This finishes moving the drain out to places that are not under the
graph lock, a process started in commit "block: move draining out of
bdrv_change_aio_context() and mark GRAPH_RDLOCK".
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block/quorum.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/block/quorum.c b/block/quorum.c
index 81407a38ee..cc3bc5f4e7 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1147,9 +1147,7 @@ quorum_del_child(BlockDriverState *bs, BdrvChild *child, Error **errp)
(s->num_children - i - 1) * sizeof(BdrvChild *));
s->children = g_renew(BdrvChild *, s->children, --s->num_children);
- bdrv_drain_all_begin();
bdrv_unref_child(bs, child);
- bdrv_drain_all_end();
quorum_refresh_flags(bs);
}
--
2.39.5
* [PATCH v3 17/24] blockdev: drain while unlocked in internal_snapshot_action()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (15 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 16/24] block: move drain outside of quorum_del_child() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 18/24] blockdev: drain while unlocked in external_snapshot_action() Fiona Ebner
` (6 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
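Since bdrv_drained_begin() polls, the graph may change while draining,
so the root node has to be looked up again afterwards. Condensed, the
resulting shape of internal_snapshot_action() is (a sketch of the diff
below, error paths abbreviated):
    bdrv_graph_rdlock_main_loop();
    bs = qmp_get_root_bs(device, errp);
    ...
    /* Need to drain while unlocked. */
    bdrv_graph_rdunlock_main_loop();
    bdrv_drained_begin(bs);    /* paired with .clean() */
    GRAPH_RDLOCK_GUARD_MAINLOOP();
    check_bs = qmp_get_root_bs(device, errp);
    if (bs != check_bs) {
        /* the root node changed across the drain -> error out */
    }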
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
blockdev.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index bd5ca77619..506755bef1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1208,7 +1208,7 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
Error *local_err = NULL;
const char *device;
const char *name;
- BlockDriverState *bs;
+ BlockDriverState *bs, *check_bs;
QEMUSnapshotInfo old_sn, *sn;
bool ret;
int64_t rt;
@@ -1216,7 +1216,7 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
int ret1;
GLOBAL_STATE_CODE();
- GRAPH_RDLOCK_GUARD_MAINLOOP();
+ bdrv_graph_rdlock_main_loop();
tran_add(tran, &internal_snapshot_drv, state);
@@ -1225,14 +1225,29 @@ static void internal_snapshot_action(BlockdevSnapshotInternal *internal,
bs = qmp_get_root_bs(device, errp);
if (!bs) {
+ bdrv_graph_rdunlock_main_loop();
return;
}
state->bs = bs;
+ /* Need to drain while unlocked. */
+ bdrv_graph_rdunlock_main_loop();
/* Paired with .clean() */
bdrv_drained_begin(bs);
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+ /* Make sure the root bs did not change with the drain. */
+ check_bs = qmp_get_root_bs(device, errp);
+ if (bs != check_bs) {
+ if (check_bs) {
+ error_setg(errp, "Block node of device '%s' unexpectedly changed",
+ device);
+ } /* else errp is already set */
+ return;
+ }
+
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT, errp)) {
return;
}
--
2.39.5
* [PATCH v3 18/24] blockdev: drain while unlocked in external_snapshot_action()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (16 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 17/24] blockdev: drain while unlocked in internal_snapshot_action() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED Fiona Ebner
` (5 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
blockdev.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/blockdev.c b/blockdev.c
index 506755bef1..2e7fda6780 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1377,9 +1377,10 @@ static void external_snapshot_action(TransactionAction *action,
const char *new_image_file;
ExternalSnapshotState *state = g_new0(ExternalSnapshotState, 1);
uint64_t perm, shared;
+ BlockDriverState *check_bs;
/* TODO We'll eventually have to take a writer lock in this function */
- GRAPH_RDLOCK_GUARD_MAINLOOP();
+ bdrv_graph_rdlock_main_loop();
tran_add(tran, &external_snapshot_drv, state);
@@ -1412,11 +1413,25 @@ static void external_snapshot_action(TransactionAction *action,
state->old_bs = bdrv_lookup_bs(device, node_name, errp);
if (!state->old_bs) {
+ bdrv_graph_rdunlock_main_loop();
return;
}
+ /* Need to drain while unlocked. */
+ bdrv_graph_rdunlock_main_loop();
/* Paired with .clean() */
bdrv_drained_begin(state->old_bs);
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+ /* Make sure the associated bs did not change with the drain. */
+ check_bs = bdrv_lookup_bs(device, node_name, errp);
+ if (state->old_bs != check_bs) {
+ if (check_bs) {
+ error_setg(errp, "Block node of device '%s' unexpectedly changed",
+ device);
+ } /* else errp is already set */
+ return;
+ }
if (!bdrv_is_inserted(state->old_bs)) {
error_setg(errp, "Device '%s' has no medium",
--
2.39.5
* [PATCH v3 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (17 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 18/24] blockdev: drain while unlocked in external_snapshot_action() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 20/24] iotests/graph-changes-while-io: remove image file after test Fiona Ebner
` (4 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
All of bdrv_drain_all_begin(), bdrv_drain_all() and
bdrv_drained_begin() poll and are not allowed to be called with the
block graph lock held. Mark the functions as such.
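With this, clang's Thread Safety Analysis can reject such calls from
lock-holding contexts at compile time. A minimal sketch of what a
TSA-enabled build would flag (hypothetical function, for illustration
only):
    static void GRAPH_RDLOCK bad_example(BlockDriverState *bs)
    {
        /* warning: cannot call function 'bdrv_drained_begin' while
         * 'graph_lock' is held */
        bdrv_drained_begin(bs);
    }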
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* Also mark bdrv_drain_all_begin() and bdrv_drain_all() as
GRAPH_UNLOCKED.
include/block/block-global-state.h | 4 ++--
include/block/block-io.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 91f249b5ad..84a2a4ecd5 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -192,10 +192,10 @@ int bdrv_inactivate_all(void);
int bdrv_flush_all(void);
void bdrv_close_all(void);
-void bdrv_drain_all_begin(void);
+void GRAPH_UNLOCKED bdrv_drain_all_begin(void);
void bdrv_drain_all_begin_nopoll(void);
void bdrv_drain_all_end(void);
-void bdrv_drain_all(void);
+void GRAPH_UNLOCKED bdrv_drain_all(void);
void bdrv_aio_cancel(BlockAIOCB *acb);
diff --git a/include/block/block-io.h b/include/block/block-io.h
index b99cc98d26..4cf83fb367 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -431,7 +431,7 @@ bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
*
* This function can be recursive.
*/
-void bdrv_drained_begin(BlockDriverState *bs);
+void GRAPH_UNLOCKED bdrv_drained_begin(BlockDriverState *bs);
/**
* bdrv_do_drained_begin_quiesce:
--
2.39.5
* [PATCH v3 20/24] iotests/graph-changes-while-io: remove image file after test
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (18 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Fiona Ebner
` (3 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
tests/qemu-iotests/tests/graph-changes-while-io | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io
index 194fda500e..35489e3b5e 100755
--- a/tests/qemu-iotests/tests/graph-changes-while-io
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -57,6 +57,7 @@ class TestGraphChangesWhileIO(QMPTestCase):
def tearDown(self) -> None:
self.qsd.stop()
+ os.remove(top)
def test_blockdev_add_while_io(self) -> None:
# Run qemu-img bench in the background
--
2.39.5
* [PATCH v3 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (19 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 20/24] iotests/graph-changes-while-io: remove image file after test Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Fiona Ebner
` (2 subsequent siblings)
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
This case catches a potential deadlock which takes place when
job-dismiss is issued while I/O requests are being processed in a
separate iothread.
See https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
[FE: re-use top image and rename snap1->mid as suggested by Kevin Wolf
remove image file after test as suggested by Kevin Wolf
add type annotation for function argument to make mypy happy]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
.../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++++++++--
.../tests/graph-changes-while-io.out | 4 +-
2 files changed, 96 insertions(+), 9 deletions(-)
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io
index 35489e3b5e..dca1167b6d 100755
--- a/tests/qemu-iotests/tests/graph-changes-while-io
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -27,6 +27,7 @@ from iotests import imgfmt, qemu_img, qemu_img_create, qemu_io, \
top = os.path.join(iotests.test_dir, 'top.img')
+mid = os.path.join(iotests.test_dir, 'mid.img')
nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
@@ -59,6 +60,15 @@ class TestGraphChangesWhileIO(QMPTestCase):
self.qsd.stop()
os.remove(top)
+ def _wait_for_blockjob(self, status: str) -> None:
+ done = False
+ while not done:
+ for event in self.qsd.get_qmp().get_events(wait=10.0):
+ if event['event'] != 'JOB_STATUS_CHANGE':
+ continue
+ if event['data']['status'] == status:
+ done = True
+
def test_blockdev_add_while_io(self) -> None:
# Run qemu-img bench in the background
bench_thr = Thread(target=do_qemu_img_bench)
@@ -117,16 +127,93 @@ class TestGraphChangesWhileIO(QMPTestCase):
'device': 'job0',
})
- cancelled = False
- while not cancelled:
- for event in self.qsd.get_qmp().get_events(wait=10.0):
- if event['event'] != 'JOB_STATUS_CHANGE':
- continue
- if event['data']['status'] == 'null':
- cancelled = True
+ self._wait_for_blockjob('null')
bench_thr.join()
+ def test_remove_lower_snapshot_while_io(self) -> None:
+ # Run qemu-img bench in the background
+ bench_thr = Thread(target=do_qemu_img_bench, args=(100000, ))
+ bench_thr.start()
+
+ # While I/O is performed on 'node0', consecutively add 2 snapshots
+ # on top of it, then remove (commit) them starting from the lower one.
+ while bench_thr.is_alive():
+ # Recreate snapshot images on every iteration
+ qemu_img_create('-f', imgfmt, mid, '1G')
+ qemu_img_create('-f', imgfmt, top, '1G')
+
+ self.qsd.cmd('blockdev-add', {
+ 'driver': imgfmt,
+ 'node-name': 'mid',
+ 'file': {
+ 'driver': 'file',
+ 'filename': mid
+ }
+ })
+
+ self.qsd.cmd('blockdev-snapshot', {
+ 'node': 'node0',
+ 'overlay': 'mid',
+ })
+
+ self.qsd.cmd('blockdev-add', {
+ 'driver': imgfmt,
+ 'node-name': 'top',
+ 'file': {
+ 'driver': 'file',
+ 'filename': top
+ }
+ })
+
+ self.qsd.cmd('blockdev-snapshot', {
+ 'node': 'mid',
+ 'overlay': 'top',
+ })
+
+ self.qsd.cmd('block-commit', {
+ 'job-id': 'commit-mid',
+ 'device': 'top',
+ 'top-node': 'mid',
+ 'base-node': 'node0',
+ 'auto-finalize': True,
+ 'auto-dismiss': False,
+ })
+
+ self._wait_for_blockjob('concluded')
+ self.qsd.cmd('job-dismiss', {
+ 'id': 'commit-mid',
+ })
+
+ self.qsd.cmd('block-commit', {
+ 'job-id': 'commit-top',
+ 'device': 'top',
+ 'top-node': 'top',
+ 'base-node': 'node0',
+ 'auto-finalize': True,
+ 'auto-dismiss': False,
+ })
+
+ self._wait_for_blockjob('ready')
+ self.qsd.cmd('job-complete', {
+ 'id': 'commit-top',
+ })
+
+ self._wait_for_blockjob('concluded')
+ self.qsd.cmd('job-dismiss', {
+ 'id': 'commit-top',
+ })
+
+ self.qsd.cmd('blockdev-del', {
+ 'node-name': 'mid'
+ })
+ self.qsd.cmd('blockdev-del', {
+ 'node-name': 'top'
+ })
+
+ bench_thr.join()
+ os.remove(mid)
+
if __name__ == '__main__':
# Format must support raw backing files
iotests.main(supported_fmts=['qcow', 'qcow2', 'qed'],
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out b/tests/qemu-iotests/tests/graph-changes-while-io.out
index fbc63e62f8..8d7e996700 100644
--- a/tests/qemu-iotests/tests/graph-changes-while-io.out
+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
@@ -1,5 +1,5 @@
-..
+...
----------------------------------------------------------------------
-Ran 2 tests
+Ran 3 tests
OK
--
2.39.5
* [PATCH v3 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (20 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 23/24] block: never use atomics to access bs->quiesce_counter Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 24/24] block: add bdrv_graph_wrlock_drained() convenience wrapper Fiona Ebner
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
Both commit ab61335025 ("block: drain from main loop thread in
bdrv_co_yield_to_drain()") and commit d05ab380db ("block: Mark drain
related functions GRAPH_RDLOCK") introduced a GLOBAL_STATE_CODE()
macro in bdrv_do_drained_end(). The assertion of being in the main
thread cannot change here, so keep only the earlier instance.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block/io.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/io.c b/block/io.c
index 4fd7768f9c..ac5c7174f6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -413,7 +413,6 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
/* At this point, we should be always running in the main loop. */
GLOBAL_STATE_CODE();
assert(bs->quiesce_counter > 0);
- GLOBAL_STATE_CODE();
/* Re-enable things in child-to-parent order */
old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter);
--
2.39.5
* [PATCH v3 23/24] block: never use atomics to access bs->quiesce_counter
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (21 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 24/24] block: add bdrv_graph_wrlock_drained() convenience wrapper Fiona Ebner
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
All accesses to bs->quiesce_counter happen in the main thread, either
after a GLOBAL_STATE_CODE() macro or in a function with a GRAPH_WRLOCK
annotation.
This is essentially a revert of 414c2ec358 ("block: access
quiesce_counter with atomic ops"). At that time, neither the
GLOBAL_STATE_CODE() macro nor the GRAPH_WRLOCK annotation existed.
Even if the field was only accessed in the main thread back then (I
did not check whether that is actually the case), it would not have
been easy to verify.
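The conversion itself is mechanical; for instance, in
bdrv_do_drained_begin() (condensed from the io.c hunk below):
    /* before */
    if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
    /* after */
    if (bs->quiesce_counter++ == 0) {
This is safe because both the increment in bdrv_do_drained_begin() and
the decrement in bdrv_do_drained_end() run in the main thread, as
asserted by GLOBAL_STATE_CODE().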
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v3.
block/io.c | 7 ++-----
include/block/block_int-common.h | 2 +-
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/block/io.c b/block/io.c
index ac5c7174f6..9bd8ba8431 100644
--- a/block/io.c
+++ b/block/io.c
@@ -361,7 +361,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
GLOBAL_STATE_CODE();
/* Stop things in parent-to-child order */
- if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
+ if (bs->quiesce_counter++ == 0) {
GRAPH_RDLOCK_GUARD_MAINLOOP();
bdrv_parent_drained_begin(bs, parent);
if (bs->drv && bs->drv->bdrv_drain_begin) {
@@ -401,8 +401,6 @@ bdrv_drained_begin(BlockDriverState *bs)
*/
static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
{
- int old_quiesce_counter;
-
IO_OR_GS_CODE();
if (qemu_in_coroutine()) {
@@ -415,8 +413,7 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
assert(bs->quiesce_counter > 0);
/* Re-enable things in child-to-parent order */
- old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter);
- if (old_quiesce_counter == 1) {
+ if (--bs->quiesce_counter == 0) {
GRAPH_RDLOCK_GUARD_MAINLOOP();
if (bs->drv && bs->drv->bdrv_drain_end) {
bs->drv->bdrv_drain_end(bs);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 168f703fa1..ea846aff03 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1239,7 +1239,7 @@ struct BlockDriverState {
/* do we need to tell the quest if we have a volatile write cache? */
int enable_write_cache;
- /* Accessed with atomic ops. */
+ /* Accessed only in the main thread. */
int quiesce_counter;
unsigned int write_gen; /* Current data generation */
--
2.39.5
* [PATCH v3 24/24] block: add bdrv_graph_wrlock_drained() convenience wrapper
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
` (22 preceding siblings ...)
2025-05-26 13:21 ` [PATCH v3 23/24] block: never use atomics to access bs->quiesce_counter Fiona Ebner
@ 2025-05-26 13:21 ` Fiona Ebner
23 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-26 13:21 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, kwolf, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
Many write-locked sections are also drained sections. A new
bdrv_graph_wrlock_drained() wrapper around bdrv_graph_wrlock() is
introduced, which begins a drained section first. A global
variable is used so bdrv_graph_wrunlock() knows if it also needs
to end such a drained section. Both the aio_poll call in
bdrv_graph_wrlock() and the aio_bh_poll() in bdrv_graph_wrunlock()
can re-enter a write-locked section. While for the latter, ending the
drain could be moved to before the call, the former requires that the
variable is a counter and not just a boolean.
The switch to the new helpers was generated with the following
commands and then manually checked:
find . -name '*.c' -exec sed -i -z 's/bdrv_drain_all_begin();\n\s*bdrv_graph_wrlock();/bdrv_graph_wrlock_drained();/g' {} ';'
find . -name '*.c' -exec sed -i -z 's/bdrv_graph_wrunlock();\n\s*bdrv_drain_all_end();/bdrv_graph_wrunlock();/g' {} ';'
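The excerpt of graph-lock.c below does not show the wrapper's own
body; a plausible sketch, consistent with the commit message and the
counter logic visible in the hunks (assumed, not verbatim):
    void no_coroutine_fn bdrv_graph_wrlock_drained(void)
    {
        GLOBAL_STATE_CODE();

        bdrv_drain_all_begin();     /* polls; must run before locking */
        wrlock_quiesced_counter++;
        bdrv_graph_wrlock();        /* skips its _nopoll drain, since
                                     * the graph is already quiesced */
    }
Correspondingly, bdrv_graph_wrunlock() checks wrlock_quiesced_counter
and, if it is non-zero, decrements it and ends the drained section.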
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* Fix typo in commit message.
block.c | 20 ++++------------
block/backup.c | 4 +---
block/blklogwrites.c | 8 ++-----
block/blkverify.c | 4 +---
block/block-backend.c | 8 ++-----
block/commit.c | 6 +----
block/graph-lock.c | 40 +++++++++++++++++++++++++++++---
block/mirror.c | 7 +-----
block/qcow2.c | 4 +---
block/quorum.c | 8 ++-----
block/replication.c | 11 ++-------
block/snapshot.c | 4 +---
block/stream.c | 6 +----
block/vmdk.c | 20 ++++------------
blockdev.c | 4 +---
blockjob.c | 10 ++------
include/block/graph-lock.h | 11 +++++++++
tests/unit/test-bdrv-drain.c | 36 +++++++---------------------
tests/unit/test-bdrv-graph-mod.c | 20 ++++------------
19 files changed, 90 insertions(+), 141 deletions(-)
diff --git a/block.c b/block.c
index e1b1a8abc7..0e4f118213 100644
--- a/block.c
+++ b/block.c
@@ -1721,14 +1721,12 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
open_failed:
bs->drv = NULL;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
if (bs->file != NULL) {
bdrv_unref_child(bs, bs->file);
assert(!bs->file);
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
g_free(bs->opaque);
bs->opaque = NULL;
@@ -3588,11 +3586,9 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
bdrv_graph_rdunlock_main_loop();
bdrv_ref(drain_bs);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_unref(drain_bs);
return ret;
@@ -3784,12 +3780,10 @@ static BdrvChild *bdrv_open_child_common(const char *filename,
return NULL;
}
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
errp);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
return child;
}
@@ -5163,8 +5157,7 @@ static void bdrv_close(BlockDriverState *bs)
bs->drv = NULL;
}
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
bdrv_unref_child(bs, child);
}
@@ -5172,7 +5165,6 @@ static void bdrv_close(BlockDriverState *bs)
assert(!bs->backing);
assert(!bs->file);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
g_free(bs->opaque);
bs->opaque = NULL;
@@ -5498,8 +5490,7 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
assert(!bs_new->backing);
bdrv_graph_rdunlock_main_loop();
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
&child_of_bds, bdrv_backing_role(bs_new),
@@ -5520,7 +5511,6 @@ out:
bdrv_refresh_limits(bs_top, NULL, NULL);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
return ret;
}
diff --git a/block/backup.c b/block/backup.c
index 909027c17a..d4713fa1cd 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -498,12 +498,10 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
block_copy_set_speed(bcs, speed);
/* Required permissions are taken by copy-before-write filter target */
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
&error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
return &job->common;
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 70ac76f401..aa1f888869 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -281,11 +281,9 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
ret = 0;
fail_log:
if (ret < 0) {
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, s->log_file);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
s->log_file = NULL;
qemu_mutex_destroy(&s->mutex);
}
@@ -298,12 +296,10 @@ static void blk_log_writes_close(BlockDriverState *bs)
{
BDRVBlkLogWritesState *s = bs->opaque;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, s->log_file);
s->log_file = NULL;
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
qemu_mutex_destroy(&s->mutex);
}
diff --git a/block/blkverify.c b/block/blkverify.c
index 3a71f7498c..72efcbe7ef 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -151,12 +151,10 @@ static void blkverify_close(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, s->test_file);
s->test_file = NULL;
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
}
static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/block-backend.c b/block/block-backend.c
index 68209bb2f7..f8d6ba65c1 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -889,11 +889,9 @@ void blk_remove_bs(BlockBackend *blk)
root = blk->root;
blk->root = NULL;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_root_unref_child(root);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
}
/*
@@ -906,8 +904,7 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
GLOBAL_STATE_CODE();
bdrv_ref(bs);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
if ((bs->open_flags & BDRV_O_INACTIVE) && blk_can_inactivate(blk)) {
blk->disable_perm = true;
@@ -922,7 +919,6 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
perm, shared_perm, blk, errp);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
if (blk->root == NULL) {
return -EPERM;
}
diff --git a/block/commit.c b/block/commit.c
index 6c4b736ff8..dc1942483b 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -392,8 +392,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
* this is the responsibility of the interface (i.e. whoever calls
* commit_start()).
*/
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
s->base_overlay = bdrv_find_overlay(top, base);
assert(s->base_overlay);
@@ -425,21 +424,18 @@ void commit_start(const char *job_id, BlockDriverState *bs,
iter_shared_perms, errp);
if (ret < 0) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
goto fail;
}
}
if (bdrv_freeze_backing_chain(commit_top_bs, base, errp) < 0) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
goto fail;
}
s->chain_frozen = true;
ret = block_job_add_bdrv(&s->common, "base", base, 0, BLK_PERM_ALL, errp);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
if (ret < 0) {
goto fail;
diff --git a/block/graph-lock.c b/block/graph-lock.c
index c81162b147..b7319473a1 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -33,6 +33,17 @@ static QemuMutex aio_context_list_lock;
/* Written and read with atomic operations. */
static int has_writer;
+/*
+ * Many write-locked sections are also drained sections. There is a convenience
+ * wrapper bdrv_graph_wrlock_drained() which begins a drained section before
+ * acquiring the lock. This variable here is used so bdrv_graph_wrunlock() knows
+ * if it also needs to end such a drained section. It needs to be a counter,
+ * because the aio_poll() call in bdrv_graph_wrlock() might re-enter
+ * bdrv_graph_wrlock_drained(). And note that aio_bh_poll() in
+ * bdrv_graph_wrunlock() might also re-enter a write-locked section.
+ */
+static int wrlock_quiesced_counter;
+
/*
* A reader coroutine could move from an AioContext to another.
* If this happens, there is no problem from the point of view of
@@ -112,8 +123,14 @@ void no_coroutine_fn bdrv_graph_wrlock(void)
assert(!qatomic_read(&has_writer));
assert(!qemu_in_coroutine());
- /* Make sure that constantly arriving new I/O doesn't cause starvation */
- bdrv_drain_all_begin_nopoll();
+ bool need_drain = wrlock_quiesced_counter == 0;
+
+ if (need_drain) {
+ /*
+ * Make sure that constantly arriving new I/O doesn't cause starvation
+ */
+ bdrv_drain_all_begin_nopoll();
+ }
/*
* reader_count == 0: this means writer will read has_reader as 1
@@ -139,7 +156,18 @@ void no_coroutine_fn bdrv_graph_wrlock(void)
smp_mb();
} while (reader_count() >= 1);
- bdrv_drain_all_end();
+ if (need_drain) {
+ bdrv_drain_all_end();
+ }
+}
+
+void no_coroutine_fn bdrv_graph_wrlock_drained(void)
+{
+ GLOBAL_STATE_CODE();
+
+ bdrv_drain_all_begin();
+ wrlock_quiesced_counter++;
+ bdrv_graph_wrlock();
}
void no_coroutine_fn bdrv_graph_wrunlock(void)
@@ -168,6 +196,12 @@ void no_coroutine_fn bdrv_graph_wrunlock(void)
* progress.
*/
aio_bh_poll(qemu_get_aio_context());
+
+ if (wrlock_quiesced_counter > 0) {
+ bdrv_drain_all_end();
+ wrlock_quiesced_counter--;
+ }
+
}
void coroutine_fn bdrv_graph_co_rdlock(void)
diff --git a/block/mirror.c b/block/mirror.c
index 6e8caf4b49..82a6e50cf8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -2014,15 +2014,13 @@ static BlockJob *mirror_start_job(
*/
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
BLK_PERM_CONSISTENT_READ,
errp);
if (ret < 0) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
goto fail;
}
@@ -2068,19 +2066,16 @@ static BlockJob *mirror_start_job(
iter_shared_perms, errp);
if (ret < 0) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
goto fail;
}
}
if (bdrv_freeze_backing_chain(mirror_top_bs, target, errp) < 0) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
goto fail;
}
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
QTAILQ_INIT(&s->ops_in_flight);
diff --git a/block/qcow2.c b/block/qcow2.c
index 420918b3c3..694ad797dc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2821,11 +2821,9 @@ qcow2_do_close(BlockDriverState *bs, bool close_data_file)
if (close_data_file && has_data_file(bs)) {
GLOBAL_STATE_CODE();
bdrv_graph_rdunlock_main_loop();
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, s->data_file);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
s->data_file = NULL;
bdrv_graph_rdlock_main_loop();
}
diff --git a/block/quorum.c b/block/quorum.c
index cc3bc5f4e7..76a4feb2d9 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1037,8 +1037,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
close_exit:
/* cleanup on error */
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
for (i = 0; i < s->num_children; i++) {
if (!opened[i]) {
continue;
@@ -1046,7 +1045,6 @@ close_exit:
bdrv_unref_child(bs, s->children[i]);
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
g_free(s->children);
g_free(opened);
exit:
@@ -1059,13 +1057,11 @@ static void quorum_close(BlockDriverState *bs)
BDRVQuorumState *s = bs->opaque;
int i;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
for (i = 0; i < s->num_children; i++) {
bdrv_unref_child(bs, s->children[i]);
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
g_free(s->children);
}
diff --git a/block/replication.c b/block/replication.c
index 0879718854..83978b61f5 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -540,8 +540,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
return;
}
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_ref(hidden_disk->bs);
s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
@@ -550,7 +549,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (local_err) {
error_propagate(errp, local_err);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
return;
}
@@ -561,7 +559,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
if (local_err) {
error_propagate(errp, local_err);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
return;
}
@@ -574,14 +571,12 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
!check_top_bs(top_bs, bs)) {
error_setg(errp, "No top_bs or it is invalid");
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
reopen_backing_file(bs, false, NULL);
return;
}
bdrv_op_block_all(top_bs, s->blocker);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
s->backup_job = backup_job_create(
NULL, s->secondary_disk->bs, s->hidden_disk->bs,
@@ -656,14 +651,12 @@ static void replication_done(void *opaque, int ret)
if (ret == 0) {
s->stage = BLOCK_REPLICATION_DONE;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, s->secondary_disk);
s->secondary_disk = NULL;
bdrv_unref_child(bs, s->hidden_disk);
s->hidden_disk = NULL;
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
s->error = 0;
} else {
diff --git a/block/snapshot.c b/block/snapshot.c
index 28c9c43621..bd9d759b32 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -291,11 +291,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
}
/* .bdrv_open() will re-attach it */
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, fallback);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
memset(bs->opaque, 0, drv->instance_size);
diff --git a/block/stream.c b/block/stream.c
index f5441f27f4..a6ef840e29 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -371,12 +371,10 @@ void stream_start(const char *job_id, BlockDriverState *bs,
* already have our own plans. Also don't allow resize as the image size is
* queried only at the job start and then cached.
*/
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
if (block_job_add_bdrv(&s->common, "active node", bs, 0,
basic_flags | BLK_PERM_WRITE, errp)) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
goto fail;
}
@@ -397,12 +395,10 @@ void stream_start(const char *job_id, BlockDriverState *bs,
basic_flags, errp);
if (ret < 0) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
goto fail;
}
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
s->base_overlay = base_overlay;
s->above_base = above_base;
diff --git a/block/vmdk.c b/block/vmdk.c
index 89a7250120..04986c8d55 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -271,8 +271,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
BDRVVmdkState *s = bs->opaque;
VmdkExtent *e;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
for (i = 0; i < s->num_extents; i++) {
e = &s->extents[i];
g_free(e->l1_table);
@@ -284,7 +283,6 @@ static void vmdk_free_extents(BlockDriverState *bs)
}
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
g_free(s->extents);
}
@@ -1249,11 +1247,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
0, 0, 0, 0, 0, &extent, errp);
if (ret < 0) {
bdrv_graph_rdunlock_main_loop();
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1270,11 +1266,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
g_free(buf);
if (ret) {
bdrv_graph_rdunlock_main_loop();
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1283,11 +1277,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
if (ret) {
bdrv_graph_rdunlock_main_loop();
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
goto out;
}
@@ -1295,11 +1287,9 @@ vmdk_parse_extents(const char *desc, BlockDriverState *bs, QDict *options,
} else {
error_setg(errp, "Unsupported extent type '%s'", type);
bdrv_graph_rdunlock_main_loop();
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(bs, extent_file);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_graph_rdlock_main_loop();
ret = -ENOTSUP;
goto out;
diff --git a/blockdev.c b/blockdev.c
index 2e7fda6780..e625534925 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3561,8 +3561,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
BlockDriverState *parent_bs, *new_bs = NULL;
BdrvChild *p_child;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
parent_bs = bdrv_lookup_bs(parent, parent, errp);
if (!parent_bs) {
@@ -3599,7 +3598,6 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
out:
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
}
BlockJobInfoList *qmp_query_block_jobs(Error **errp)
diff --git a/blockjob.c b/blockjob.c
index e68181a35b..db7c3a69a0 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,8 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
* one to make sure that such a concurrent access does not attempt
* to process an already freed BdrvChild.
*/
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
while (job->nodes) {
GSList *l = job->nodes;
BdrvChild *c = l->data;
@@ -212,7 +211,6 @@ void block_job_remove_all_bdrv(BlockJob *job)
g_slist_free_1(l);
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
}
bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
@@ -498,8 +496,7 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
int ret;
GLOBAL_STATE_CODE();
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
if (job_id == NULL && !(flags & JOB_INTERNAL)) {
job_id = bdrv_get_device_name(bs);
@@ -509,7 +506,6 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
flags, cb, opaque, errp);
if (job == NULL) {
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
return NULL;
}
@@ -548,12 +544,10 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
return job;
fail:
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
job_early_fail(&job->job);
return NULL;
}
diff --git a/include/block/graph-lock.h b/include/block/graph-lock.h
index 2c26c72108..95bf5ede40 100644
--- a/include/block/graph-lock.h
+++ b/include/block/graph-lock.h
@@ -112,10 +112,21 @@ void unregister_aiocontext(AioContext *ctx);
void no_coroutine_fn TSA_ACQUIRE(graph_lock) TSA_NO_TSA
bdrv_graph_wrlock(void);
+/*
+ * bdrv_graph_wrlock_drained:
+ * Similar to bdrv_graph_wrlock, but will begin a drained section before
+ * locking.
+ */
+void no_coroutine_fn TSA_ACQUIRE(graph_lock) TSA_NO_TSA
+bdrv_graph_wrlock_drained(void);
+
/*
* bdrv_graph_wrunlock:
* Write finished, reset global has_writer to 0 and restart
* all readers that are waiting.
+ *
+ * Also ends the drained section if bdrv_graph_wrlock_drained() was used to lock
+ * the graph.
*/
void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
bdrv_graph_wrunlock(void);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index c33f7d31c2..ed60e1693d 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -772,11 +772,9 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
tjob->bs = src;
job = &tjob->common;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, &error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
switch (result) {
case TEST_JOB_SUCCESS:
@@ -955,13 +953,11 @@ static void bdrv_test_top_close(BlockDriverState *bs)
{
BdrvChild *c, *next_c;
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
bdrv_unref_child(bs, c);
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
}
static int coroutine_fn GRAPH_RDLOCK
@@ -1051,12 +1047,10 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
/* This child will be the one to pass to requests through to, and
* it will stall until a drain occurs */
@@ -1064,25 +1058,21 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
&error_abort);
child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
/* Takes our reference to child_bs */
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child",
&child_of_bds,
BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
&error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
/* This child is just there to be deleted
* (for detach_instead_of_delete == true) */
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
blk_insert_bs(blk, bs, &error_abort);
@@ -1165,8 +1155,7 @@ static void no_coroutine_fn detach_indirect_bh(void *opaque)
bdrv_dec_in_flight(data->child_b->bs);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_unref_child(data->parent_b, data->child_b);
bdrv_ref(data->c);
@@ -1174,7 +1163,6 @@ static void no_coroutine_fn detach_indirect_bh(void *opaque)
&child_of_bds, BDRV_CHILD_DATA,
&error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
}
static void coroutine_mixed_fn detach_by_parent_aio_cb(void *opaque, int ret)
@@ -1272,8 +1260,7 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
/* Set child relationships */
bdrv_ref(b);
bdrv_ref(a);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_of_bds,
@@ -1284,7 +1271,6 @@ static void TSA_NO_TSA test_detach_indirect(bool by_parent_cb)
by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
g_assert_cmpint(parent_a->refcnt, ==, 1);
g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1697,8 +1683,7 @@ static void test_drop_intermediate_poll(void)
* Establish the chain last, so the chain links are the first
* elements in the BDS.parents lists
*/
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
for (i = 0; i < 3; i++) {
if (i) {
/* Takes the reference to chain[i - 1] */
@@ -1707,7 +1692,6 @@ static void test_drop_intermediate_poll(void)
}
}
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
job = block_job_create("job", &test_simple_job_driver, NULL, job_node,
0, BLK_PERM_ALL, 0, 0, NULL, NULL, &error_abort);
@@ -1954,12 +1938,10 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
new_child_bs->total_sectors = 1;
bdrv_ref(old_child_bs);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
BDRV_CHILD_COW, &error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
parent_s->setup_completed = true;
for (i = 0; i < old_drain_count; i++) {
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index 7b03ebe4b0..b077f0e3e3 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -137,12 +137,10 @@ static void test_update_perm_tree(void)
blk_insert_bs(root, bs, &error_abort);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_attach_child(filter, bs, "child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
ret = bdrv_append(filter, bs, NULL);
g_assert_cmpint(ret, <, 0);
@@ -206,13 +204,11 @@ static void test_should_update_child(void)
bdrv_set_backing_hd(target, bs, &error_abort);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
g_assert(target->backing->bs == bs);
bdrv_attach_child(filter, target, "target", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_append(filter, bs, &error_abort);
bdrv_graph_rdlock_main_loop();
@@ -248,8 +244,7 @@ static void test_parallel_exclusive_write(void)
bdrv_ref(base);
bdrv_ref(fl1);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_attach_child(top, fl1, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
@@ -262,7 +257,6 @@ static void test_parallel_exclusive_write(void)
bdrv_replace_node(fl1, fl2, &error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_drained_end(fl2);
bdrv_drained_end(fl1);
@@ -369,8 +363,7 @@ static void test_parallel_perm_update(void)
*/
bdrv_ref(base);
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_attach_child(top, ws, "file", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
c_fl1 = bdrv_attach_child(ws, fl1, "first", &child_of_bds,
@@ -384,7 +377,6 @@ static void test_parallel_perm_update(void)
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
/* Select fl1 as first child to be active */
s->selected = c_fl1;
@@ -438,13 +430,11 @@ static void test_append_greedy_filter(void)
BlockDriverState *base = no_perm_node("base");
BlockDriverState *fl = exclusive_writer_node("fl1");
- bdrv_drain_all_begin();
- bdrv_graph_wrlock();
+ bdrv_graph_wrlock_drained();
bdrv_attach_child(top, base, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
bdrv_graph_wrunlock();
- bdrv_drain_all_end();
bdrv_append(fl, base, &error_abort);
bdrv_unref(fl);
--
2.39.5
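
To illustrate why wrlock_quiesced_counter has to be a counter rather than a
bool (see the comment added in block/graph-lock.c above), consider this
hypothetical nesting; the BH and the nested caller are made up for the sake
of the example, only the counter transitions follow the patch:

    bdrv_graph_wrlock_drained();      /* begins a drained section, counter 0 -> 1 */
        /* aio_poll() inside bdrv_graph_wrlock() runs a BH that re-enters: */
        bdrv_graph_wrlock_drained();  /* nested drained section, counter 1 -> 2 */
        /* ... graph modification ... */
        bdrv_graph_wrunlock();        /* counter > 0, ends drained section, 2 -> 1 */
    bdrv_graph_wrunlock();            /* counter > 0, ends drained section, 1 -> 0 */

With a plain bool, the inner bdrv_graph_wrunlock() would already end the
drained section that the outer caller still relies on.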
* Re: [PATCH v3 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
2025-05-26 13:21 ` [PATCH v3 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK Fiona Ebner
@ 2025-05-27 15:11 ` Kevin Wolf
0 siblings, 0 replies; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 15:11 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
>
> Note that even if bdrv_drained_begin() were already marked as
> GRAPH_UNLOCKED, TSA would not complain about the instance in
> bdrv_change_aio_context() before this change, because it is preceded
> by a bdrv_graph_rdunlock_main_loop() call. It is not correct to
> release the lock here, and in case the caller holds a write lock, it
> wouldn't actually release the lock.
>
> In combination with block-stream, there is a deadlock that can happen
> because of this [0]. In particular, it can happen that
> main thread                            IO thread
> 1. acquires write lock
>                                        in blk_co_do_preadv_part():
>                                        2. have non-zero blk->in_flight
>                                        3. try to acquire read lock
> 4. begin drain
>
> Steps 3 and 4 might be switched. Draining will poll and get stuck,
> because it will see the non-zero in_flight counter. But the IO thread
> will not make any progress either, because it cannot acquire the read
> lock.
>
> After this change, all paths to bdrv_change_aio_context() drain:
> bdrv_change_aio_context() is called by:
> 1. bdrv_child_cb_change_aio_ctx() which is only called via the
> change_aio_ctx() callback, see below.
> 2. bdrv_child_change_aio_context(), see below.
> 3. bdrv_try_change_aio_context(), where a drained section is
> introduced.
>
> The change_aio_ctx() callback is called by:
> 1. bdrv_attach_child_common_abort(), where a drained section is
> introduced.
> 2. bdrv_attach_child_common(), where a drained section is introduced.
> 3. bdrv_parent_change_aio_context(), see below.
>
> bdrv_child_change_aio_context() is called by:
> 1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
> section is invariant.
> 2. child_job_change_aio_ctx(), which is only called via the
> change_aio_ctx() callback, see above.
>
> bdrv_parent_change_aio_context() is called by:
> 1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
> section is invariant.
>
> This resolves all code paths. Note that bdrv_attach_child_common()
> and bdrv_attach_child_common_abort() hold the graph write lock and
> callers of bdrv_try_change_aio_context() might too, so they are not
> actually allowed to drain either. This will be addressed in the
> following commits.
>
> More granular draining is not trivially possible, because
> bdrv_change_aio_context() can recursively call itself e.g. via
> bdrv_child_change_aio_context().
>
> [0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/
>
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
* Re: [PATCH v3 09/24] block: move drain outside of bdrv_try_change_aio_context()
2025-05-26 13:21 ` [PATCH v3 09/24] block: move drain outside of bdrv_try_change_aio_context() Fiona Ebner
@ 2025-05-27 15:12 ` Kevin Wolf
0 siblings, 0 replies; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 15:12 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> Convert the function to a _locked() version that has to be called with
> the graph lock held and add a convenience wrapper that has to be
> called with the graph unlocked, which drains and takes the lock
> itself. Since bdrv_try_change_aio_context() is global state code, the
> wrapper is too.
>
> Callers are adapted to use the appropriate variant, depending on
> whether the caller already holds the lock. In the
> test_set_aio_context() unit test, prior drains can be removed, because
> draining already happens inside the new wrapper.
>
> Note that bdrv_attach_child_common_abort(), bdrv_attach_child_common()
> and bdrv_root_unref_child() hold the graph lock and are not actually
> allowed to drain either. This will be addressed in the following
> commits.
>
> Functions like qmp_blockdev_mirror() query the nodes to act on before
> draining and locking. In theory, draining could invalidate those nodes.
> This kind of issue is not addressed by these commits.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
* Re: [PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()
2025-05-26 13:21 ` [PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)() Fiona Ebner
@ 2025-05-27 15:23 ` Kevin Wolf
2025-05-28 11:41 ` Fiona Ebner
0 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 15:23 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> The function bdrv_attach_child_common_abort() is used only as the
> abort callback in bdrv_attach_child_common_drv transactions, so the
> tran_finalize() calls of such transactions need to be in drained
> sections too.
>
> All code paths are covered:
> The bdrv_attach_child_common_drv transactions are only used in
> bdrv_attach_child_common(), so it is enough to check callers of
> bdrv_attach_child_common() following the transactions.
>
> bdrv_attach_child_common() is called by:
> 1. bdrv_attach_child_noperm(), which does not finalize the
> transaction yet.
> 2. bdrv_root_attach_child(), where a drained section is introduced.
>
> bdrv_attach_child_noperm() is called by:
> 1. bdrv_attach_child(), where a drained section is introduced.
> 2. bdrv_set_file_or_backing_noperm(), which does not finalize the
> transaction yet.
> 3. bdrv_append(), where a drained section is introduced.
>
> bdrv_set_file_or_backing_noperm() is called by:
> 1. bdrv_set_backing_hd_drained(), where a drained section is
> introduced.
> 2. bdrv_reopen_parse_file_or_backing(), which does not finalize the
> transaction yet. Draining the old child bs currently happens under
> the graph lock there. This is replaced with an assertion, because
> the drain will be moved further up to the caller.
>
> bdrv_reopen_parse_file_or_backing() is called by:
> 1. bdrv_reopen_prepare(), which does not finalize the transaction yet.
>
> bdrv_reopen_prepare() is called by:
> 1. bdrv_reopen_multiple(), which does finalize the transaction. It is
> called after bdrv_reopen_queue(), which starts a drained section.
> The drained section ends, when bdrv_reopen_queue_free() is called
> at the end of bdrv_reopen_multiple().
>
> This resolves all code paths.
>
> The functions bdrv_set_backing_hd_drained(), bdrv_attach_child() and
> bdrv_root_attach_child() run under the graph lock, so they are not
> actually allowed to drain. This will be addressed in the following
> commits.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> No changes in v3.
>
> block.c | 27 +++++++++++----------------
> 1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/block.c b/block.c
> index 3aaacabf7f..64db71e38d 100644
> --- a/block.c
> +++ b/block.c
> @@ -3028,10 +3028,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
> bdrv_replace_child_noperm(s->child, NULL);
>
> if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
> - bdrv_drain_all_begin();
> bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
> &error_abort);
> - bdrv_drain_all_end();
> }
>
> if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
> @@ -3043,10 +3041,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
>
> /* No need to visit `child`, because it has been detached already */
> visited = g_hash_table_new(NULL, NULL);
> - bdrv_drain_all_begin();
> ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
> visited, tran, &error_abort);
> - bdrv_drain_all_end();
> g_hash_table_destroy(visited);
>
> /* transaction is supposed to always succeed */
> @@ -3118,10 +3114,8 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
> parent_ctx = bdrv_child_get_parent_aio_context(new_child);
> if (child_ctx != parent_ctx) {
> Error *local_err = NULL;
> - bdrv_drain_all_begin();
> int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
> &local_err);
> - bdrv_drain_all_end();
>
> if (ret < 0 && child_class->change_aio_ctx) {
> Transaction *aio_ctx_tran = tran_new();
> @@ -3129,11 +3123,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
> bool ret_child;
>
> g_hash_table_add(visited, new_child);
> - bdrv_drain_all_begin();
> ret_child = child_class->change_aio_ctx(new_child, child_ctx,
> visited, aio_ctx_tran,
> NULL);
> - bdrv_drain_all_end();
> if (ret_child == true) {
> error_free(local_err);
> ret = 0;
Should we mention in the function comment for bdrv_attach_child_common()
that all block nodes must be drained from before this function is called
until after the transaction is finalized?
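Something along the lines of this sketch (exact wording up to you):

    /*
     * All block nodes must be drained before this function is called until
     * after the transaction is finalized.
     */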
A similar note would probably be good for all the other functions you
mention in the commit message that don't finalize the transaction yet and
that we convert in this same patch.
> @@ -4721,6 +4719,8 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
> * Return 0 on success, otherwise return < 0 and set @errp.
> *
> * @reopen_state->bs can move to a different AioContext in this function.
> + *
> + * The old child bs must be drained.
> */
> static int GRAPH_UNLOCKED
> bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
Only the old child or all nodes?
bdrv_try_change_aio_context_locked() is documented as "Called while all
bs are drained." and we call it indirectly here (which will be more
obvious when you add the comments as suggested above).
Apart from the comments, the actual code looks fine to me.
Kevin
* Re: [PATCH v3 11/24] block: move drain outside of bdrv_set_backing_hd_drained()
2025-05-26 13:21 ` [PATCH v3 11/24] block: move drain outside of bdrv_set_backing_hd_drained() Fiona Ebner
@ 2025-05-27 15:29 ` Kevin Wolf
2025-05-28 13:09 ` Fiona Ebner
0 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 15:29 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> The function bdrv_set_backing_hd_drained() holds the graph lock, so it
> is not allowed to drain. It is called by:
> 1. bdrv_set_backing_hd(), where a drained section is introduced,
> replacing the previously present bs-specific drains.
> 2. stream_prepare(), where a drained section is introduced replacing
> the previously present bs-specific drains.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> No changes in v3.
>
> block.c | 6 ++----
> block/stream.c | 6 ++----
> 2 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/block.c b/block.c
> index 64db71e38d..75322789b5 100644
> --- a/block.c
> +++ b/block.c
> @@ -3569,7 +3569,6 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
> assert(bs->backing->bs->quiesce_counter > 0);
> }
>
> - bdrv_drain_all_begin();
> ret = bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
> if (ret < 0) {
> goto out;
> @@ -3578,7 +3577,6 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
> ret = bdrv_refresh_perms(bs, tran, errp);
> out:
> tran_finalize(tran, ret);
> - bdrv_drain_all_end();
> return ret;
> }
Do we need to update the comment for bdrv_set_backing_hd_drained()?
* If a backing child is already present (i.e. we're detaching a node), that
* child node must be drained.
Same as in the previous patch, this is now probably all nodes.
> @@ -3594,11 +3592,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> bdrv_graph_rdunlock_main_loop();
>
> bdrv_ref(drain_bs);
> - bdrv_drained_begin(drain_bs);
> + bdrv_drain_all_begin();
> bdrv_graph_wrlock();
> ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
> bdrv_graph_wrunlock();
> - bdrv_drained_end(drain_bs);
> + bdrv_drain_all_end();
> bdrv_unref(drain_bs);
The only thing we do with drain_bs now is finding it, bdrv_ref() and
immediately bdrv_unref(). I don't think it should exist any more after
the change to drain_all.
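The tail of bdrv_set_backing_hd() could then shrink to something like the
following (an untested sketch based on the hunk quoted above):

    bdrv_graph_rdunlock_main_loop();

    bdrv_drain_all_begin();
    bdrv_graph_wrlock();
    ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
    bdrv_graph_wrunlock();
    bdrv_drain_all_end();

    return ret;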
Kevin
* Re: [PATCH v3 12/24] block: move drain outside of bdrv_root_attach_child()
2025-05-26 13:21 ` [PATCH v3 12/24] block: move drain outside of bdrv_root_attach_child() Fiona Ebner
@ 2025-05-27 17:16 ` Kevin Wolf
0 siblings, 0 replies; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 17:16 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> The function bdrv_root_attach_child() runs under the graph lock, so it
> is not allowed to drain. It is called by:
> 1. blk_insert_bs(), where a drained section is introduced.
> 2. block_job_add_bdrv(), which holds the graph lock itself.
>
> block_job_add_bdrv() is called by:
> 1. mirror_start_job()
> 2. stream_start()
> 3. commit_start()
> 4. backup_job_create()
> 5. block_job_create()
> 6. In the test_blockjob_common_drain_node() unit test
>
> In all callers, a drained section is introduced.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Same thing as in previous patches with comments. With that fixed:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
* Re: [PATCH v3 13/24] block: move drain outside of bdrv_attach_child()
2025-05-26 13:21 ` [PATCH v3 13/24] block: move drain outside of bdrv_attach_child() Fiona Ebner
@ 2025-05-27 17:20 ` Kevin Wolf
0 siblings, 0 replies; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 17:20 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> The function bdrv_attach_child() runs under the graph lock, so it is
> not allowed to drain. It is called by:
> 1. replication_start()
> 2. quorum_add_child()
> 3. bdrv_open_child_common()
> 4. Throughout test-bdrv-graph-mod.c and test-bdrv-drain.c unit tests.
>
> In all callers, a drained section is introduced.
>
> The function quorum_add_child() runs under the graph lock, so it is
> not actually allowed to drain. This will be addressed by the following
> commit.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
You won't be surprised that bdrv_attach_child() should have a comment
stating the draining requirement, too. With this fixed:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
* Re: [PATCH v3 14/24] block: move drain outside of quorum_add_child()
2025-05-26 13:21 ` [PATCH v3 14/24] block: move drain outside of quorum_add_child() Fiona Ebner
@ 2025-05-27 17:23 ` Kevin Wolf
0 siblings, 0 replies; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 17:23 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> The quorum_add_child() callback runs under the graph lock, so it is
> not allowed to drain. It is only called as the .bdrv_add_child()
> callback, which is only called in the bdrv_add_child() function, which
> also runs under the graph lock.
>
> The bdrv_add_child() function is called by qmp_x_blockdev_change(),
> where a drained section is introduced.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Here we should have the documentation about the draining requirement on
both BlockDriver.bdrv_add_child() and the bdrv_add_child() wrapper. With
the added comments:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
* Re: [PATCH v3 15/24] block: move drain outside of bdrv_root_unref_child()
2025-05-26 13:21 ` [PATCH v3 15/24] block: move drain outside of bdrv_root_unref_child() Fiona Ebner
@ 2025-05-27 17:50 ` Kevin Wolf
0 siblings, 0 replies; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 17:50 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> bdrv_root_unref_child() is called by:
> 1. blk_remove_bs(), where a drained section is introduced.
> 2. bdrv_unref_child(), which runs under the graph lock, so the drain
> will be moved further up to its callers.
> 3. block_job_remove_all_bdrv(), where a drained section is introduced.
>
> For all callers of bdrv_unref_child() a drained section is introduced;
> they are not explicitly listed here. The caller quorum_del_child()
> holds the graph lock, so it is not actually allowed to drain. This
> will be addressed in the next commit.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
What about bdrv_co_unref_child()? You probably missed it because it's
generated code, but we probably need to drain around those callers, too.
And as usual, we need comments for bdrv_root_unref_child() and
bdrv_unref_child().
Kevin
* Re: [PATCH v3 16/24] block: move drain outside of quorum_del_child()
2025-05-26 13:21 ` [PATCH v3 16/24] block: move drain outside of quorum_del_child() Fiona Ebner
@ 2025-05-27 17:52 ` Kevin Wolf
0 siblings, 0 replies; 36+ messages in thread
From: Kevin Wolf @ 2025-05-27 17:52 UTC (permalink / raw)
To: Fiona Ebner
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 26.05.2025 at 15:21, Fiona Ebner wrote:
> The quorum_del_child() callback runs under the graph lock, so it is
> not allowed to drain. It is only called as the .bdrv_del_child()
> callback, which is only called in the bdrv_del_child() function, which
> also runs under the graph lock.
>
> The bdrv_del_child() function is called by qmp_x_blockdev_change().
> A drained section was already introduced there by commit "block: move
> drain out of quorum_add_child()".
>
> This finally finishes moving out the drain to places that are not
> under the graph lock started in "block: move draining out of
> bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
With the usual comments added:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
* Re: [PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()
2025-05-27 15:23 ` Kevin Wolf
@ 2025-05-28 11:41 ` Fiona Ebner
0 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-28 11:41 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 27.05.25 at 17:23, Kevin Wolf wrote:
> On 26.05.2025 at 15:21, Fiona Ebner wrote:
>> @@ -3129,11 +3123,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
>> bool ret_child;
>>
>> g_hash_table_add(visited, new_child);
>> - bdrv_drain_all_begin();
>> ret_child = child_class->change_aio_ctx(new_child, child_ctx,
>> visited, aio_ctx_tran,
>> NULL);
>> - bdrv_drain_all_end();
>> if (ret_child == true) {
>> error_free(local_err);
>> ret = 0;
>
> Should we mention in the function comment for bdrv_attach_child_common()
> that all block nodes must be drained from before this functoin is called
> until after the transaction is finalized?
>
> A similar note would probably be good for all the other functions you
> mention in the commit message that don't finalize the transaction yet so
> that we convert them in this same patch.
>
>> @@ -4721,6 +4719,8 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
>> * Return 0 on success, otherwise return < 0 and set @errp.
>> *
>> * @reopen_state->bs can move to a different AioContext in this function.
>> + *
>> + * The old child bs must be drained.
>> */
>> static int GRAPH_UNLOCKED
>> bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
>
> Only the old child or all nodes?
>
> bdrv_try_change_aio_context_locked() is documented as "Called while all
> bs are drained." and we call it indirectly here (which will be more
> obvious when you add the comments as suggested above).
Yes, it needs to be all nodes. I'll try to document the requirement for
all affected functions in v4 (also what you mentioned in the replies to
the other patches).
Best Regards,
Fiona
* Re: [PATCH v3 11/24] block: move drain outside of bdrv_set_backing_hd_drained()
2025-05-27 15:29 ` Kevin Wolf
@ 2025-05-28 13:09 ` Fiona Ebner
0 siblings, 0 replies; 36+ messages in thread
From: Fiona Ebner @ 2025-05-28 13:09 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, qemu-devel, den, andrey.drobyshev, hreitz, stefanha,
eblake, jsnow, vsementsov, xiechanglong.d, wencongyang2, berto,
fam, ari
On 27.05.25 at 17:29, Kevin Wolf wrote:
> On 26.05.2025 at 15:21, Fiona Ebner wrote:
>> @@ -3578,7 +3577,6 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
>> ret = bdrv_refresh_perms(bs, tran, errp);
>> out:
>> tran_finalize(tran, ret);
>> - bdrv_drain_all_end();
>> return ret;
>> }
>
> Do we need to update the comment for bdrv_set_backing_hd_drained()?
>
> * If a backing child is already present (i.e. we're detaching a node), that
> * child node must be drained.
>
> Same as in the previous patch, this is now probably all nodes.
Yes, and even in the case where no backing child is already present.
>> @@ -3594,11 +3592,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>> bdrv_graph_rdunlock_main_loop();
>>
>> bdrv_ref(drain_bs);
>> - bdrv_drained_begin(drain_bs);
>> + bdrv_drain_all_begin();
>> bdrv_graph_wrlock();
>> ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
>> bdrv_graph_wrunlock();
>> - bdrv_drained_end(drain_bs);
>> + bdrv_drain_all_end();
>> bdrv_unref(drain_bs);
>
> The only thing we do with drain_bs now is finding it, bdrv_ref() and
> immediately bdrv_unref(). I don't think it should exist any more after
> the change to drain_all.
I'll drop it in v4.
I now noticed that bdrv_set_backing_hd() is required to be
GRAPH_UNLOCKED, because it calls bdrv_drain_all_begin(), but is not
marked as such yet. Adding that annotation requires adapting some
callers of bdrv_set_backing_hd() first. I'll try to add some more
patches at the end of the series for this.
At least the caller in block/mirror.c seems to be better off using
bdrv_set_backing_hd_drained() and bdrv_graph_wrlock_drained() itself, so
the section can cover more related calls like
"unfiltered_target = bdrv_skip_filters(target_bs);"
Best Regards,
Fiona
Thread overview: 36+ messages
2025-05-26 13:21 [PATCH v3 00/24] block: do not drain while holding the graph lock Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 01/24] block: remove outdated comments about AioContext locking Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 02/24] block: move drain outside of read-locked bdrv_reopen_queue_child() Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 03/24] block/snapshot: move drain outside of read-locked bdrv_snapshot_delete() Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 04/24] block: move drain outside of read-locked bdrv_inactivate_recurse() Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 05/24] block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 06/24] block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR) Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 07/24] block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK Fiona Ebner
2025-05-27 15:11 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 09/24] block: move drain outside of bdrv_try_change_aio_context() Fiona Ebner
2025-05-27 15:12 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)() Fiona Ebner
2025-05-27 15:23 ` Kevin Wolf
2025-05-28 11:41 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 11/24] block: move drain outside of bdrv_set_backing_hd_drained() Fiona Ebner
2025-05-27 15:29 ` Kevin Wolf
2025-05-28 13:09 ` Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 12/24] block: move drain outside of bdrv_root_attach_child() Fiona Ebner
2025-05-27 17:16 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 13/24] block: move drain outside of bdrv_attach_child() Fiona Ebner
2025-05-27 17:20 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 14/24] block: move drain outside of quorum_add_child() Fiona Ebner
2025-05-27 17:23 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 15/24] block: move drain outside of bdrv_root_unref_child() Fiona Ebner
2025-05-27 17:50 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 16/24] block: move drain outside of quorum_del_child() Fiona Ebner
2025-05-27 17:52 ` Kevin Wolf
2025-05-26 13:21 ` [PATCH v3 17/24] blockdev: drain while unlocked in internal_snapshot_action() Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 18/24] blockdev: drain while unlocked in external_snapshot_action() Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 20/24] iotests/graph-changes-while-io: remove image file after test Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 23/24] block: never use atomics to access bs->quiesce_counter Fiona Ebner
2025-05-26 13:21 ` [PATCH v3 24/24] block: add bdrv_graph_wrlock_drained() convenience wrapper Fiona Ebner