From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: [PULL 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
Date: Wed, 4 Jun 2025 19:55:57 +0200 [thread overview]
Message-ID: <20250604175613.344113-9-kwolf@redhat.com> (raw)
In-Reply-To: <20250604175613.344113-1-kwolf@redhat.com>
From: Fiona Ebner <f.ebner@proxmox.com>
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
Note that even if bdrv_drained_begin() were already marked as
GRAPH_UNLOCKED, TSA would not complain about the instance in
bdrv_change_aio_context() before this change, because it is preceded
by a bdrv_graph_rdunlock_main_loop() call. It is not correct to
release the lock here, and in case the caller holds a write lock, it
wouldn't actually release the lock.
In combination with block-stream, there is a deadlock that can happen
because of this [0]. In particular, it can happen that
main thread IO thread
1. acquires write lock
in blk_co_do_preadv_part():
2. have non-zero blk->in_flight
3. try to acquire read lock
4. begin drain
Steps 3 and 4 might be switched. Draining will poll and get stuck,
because it will see the non-zero in_flight counter. But the IO thread
will not make any progress either, because it cannot acquire the read
lock.
After this change, all paths to bdrv_change_aio_context() drain:
bdrv_change_aio_context() is called by:
1. bdrv_child_cb_change_aio_ctx() which is only called via the
change_aio_ctx() callback, see below.
2. bdrv_child_change_aio_context(), see below.
3. bdrv_try_change_aio_context(), where a drained section is
introduced.
The change_aio_ctx() callback is called by:
1. bdrv_attach_child_common_abort(), where a drained section is
introduced.
2. bdrv_attach_child_common(), where a drained section is introduced.
3. bdrv_parent_change_aio_context(), see below.
bdrv_child_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
section is invariant.
2. child_job_change_aio_ctx(), which is only called via the
change_aio_ctx() callback, see above.
bdrv_parent_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
section is invariant.
This resolves all code paths. Note that bdrv_attach_child_common()
and bdrv_attach_child_common_abort() hold the graph write lock and
callers of bdrv_try_change_aio_context() might too, so they are not
actually allowed to drain either. This will be addressed in the
following commits.
More granular draining is not trivially possible, because
bdrv_change_aio_context() can recursively call itself e.g. via
bdrv_child_change_aio_context().
[0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/
Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-9-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block_int-common.h | 12 +++++++
block.c | 57 +++++++++++++++++++++++---------
2 files changed, 53 insertions(+), 16 deletions(-)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 37466c7841..168f703fa1 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -983,6 +983,18 @@ struct BdrvChildClass {
bool backing_mask_protocol,
Error **errp);
+ /*
+ * Notifies the parent that the child is trying to change its AioContext.
+ * The parent may in turn change the AioContext of other nodes in the same
+ * transaction. Returns true if the change is possible and the transaction
+ * can be continued. Returns false and sets @errp if not and the transaction
+ * must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
bool GRAPH_RDLOCK_PTR (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
GHashTable *visited,
Transaction *tran, Error **errp);
diff --git a/block.c b/block.c
index 01144c895e..6f42c0f1ab 100644
--- a/block.c
+++ b/block.c
@@ -106,9 +106,9 @@ static void bdrv_reopen_abort(BDRVReopenState *reopen_state);
static bool bdrv_backing_overridden(BlockDriverState *bs);
-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp);
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+ GHashTable *visited, Transaction *tran, Error **errp);
/* If non-zero, use only whitelisted block drivers */
static int use_bdrv_whitelist;
@@ -3040,8 +3040,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
/* No need to visit `child`, because it has been detached already */
visited = g_hash_table_new(NULL, NULL);
+ bdrv_drain_all_begin();
ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
visited, tran, &error_abort);
+ bdrv_drain_all_end();
g_hash_table_destroy(visited);
/* transaction is supposed to always succeed */
@@ -3122,9 +3124,11 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
bool ret_child;
g_hash_table_add(visited, new_child);
+ bdrv_drain_all_begin();
ret_child = child_class->change_aio_ctx(new_child, child_ctx,
visited, aio_ctx_tran,
NULL);
+ bdrv_drain_all_end();
if (ret_child == true) {
error_free(local_err);
ret = 0;
@@ -7576,6 +7580,17 @@ typedef struct BdrvStateSetAioContext {
BlockDriverState *bs;
} BdrvStateSetAioContext;
+/*
+ * Changes the AioContext of @child to @ctx and recursively for the associated
+ * block nodes and all their children and parents. Returns true if the change is
+ * possible and the transaction @tran can be continued. Returns false and sets
+ * @errp if not and the transaction must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
static bool GRAPH_RDLOCK
bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
GHashTable *visited, Transaction *tran,
@@ -7604,6 +7619,17 @@ bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
return true;
}
+/*
+ * Changes the AioContext of @c->bs to @ctx and recursively for all its children
+ * and parents. Returns true if the change is possible and the transaction @tran
+ * can be continued. Returns false and sets @errp if not and the transaction
+ * must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
GHashTable *visited, Transaction *tran,
Error **errp)
@@ -7619,10 +7645,6 @@ bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
static void bdrv_set_aio_context_clean(void *opaque)
{
BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
- BlockDriverState *bs = (BlockDriverState *) state->bs;
-
- /* Paired with bdrv_drained_begin in bdrv_change_aio_context() */
- bdrv_drained_end(bs);
g_free(state);
}
@@ -7650,10 +7672,12 @@ static TransactionActionDrv set_aio_context = {
*
* @visited will accumulate all visited BdrvChild objects. The caller is
* responsible for freeing the list afterwards.
+ *
+ * @bs must be drained.
*/
-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
- GHashTable *visited, Transaction *tran,
- Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+ GHashTable *visited, Transaction *tran, Error **errp)
{
BdrvChild *c;
BdrvStateSetAioContext *state;
@@ -7664,21 +7688,17 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
return true;
}
- bdrv_graph_rdlock_main_loop();
QLIST_FOREACH(c, &bs->parents, next_parent) {
if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
- bdrv_graph_rdunlock_main_loop();
return false;
}
}
QLIST_FOREACH(c, &bs->children, next) {
if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
- bdrv_graph_rdunlock_main_loop();
return false;
}
}
- bdrv_graph_rdunlock_main_loop();
state = g_new(BdrvStateSetAioContext, 1);
*state = (BdrvStateSetAioContext) {
@@ -7686,8 +7706,7 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
.bs = bs,
};
- /* Paired with bdrv_drained_end in bdrv_set_aio_context_clean() */
- bdrv_drained_begin(bs);
+ assert(bs->quiesce_counter > 0);
tran_add(tran, &set_aio_context, state);
@@ -7720,6 +7739,8 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
if (ignore_child) {
g_hash_table_add(visited, ignore_child);
}
+ bdrv_drain_all_begin();
+ bdrv_graph_rdlock_main_loop();
ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
g_hash_table_destroy(visited);
@@ -7733,10 +7754,14 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
if (!ret) {
/* Just run clean() callbacks. No AioContext changed. */
tran_abort(tran);
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
return -EPERM;
}
tran_commit(tran);
+ bdrv_graph_rdunlock_main_loop();
+ bdrv_drain_all_end();
return 0;
}
--
2.49.0
next prev parent reply other threads:[~2025-06-04 18:03 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-04 17:55 [PULL 00/24] Block layer patches Kevin Wolf
2025-06-04 17:55 ` [PULL 01/24] block: remove outdated comments about AioContext locking Kevin Wolf
2025-06-04 17:55 ` [PULL 02/24] block: move drain outside of read-locked bdrv_reopen_queue_child() Kevin Wolf
2025-06-04 17:55 ` [PULL 03/24] block/snapshot: move drain outside of read-locked bdrv_snapshot_delete() Kevin Wolf
2025-06-04 17:55 ` [PULL 04/24] block: move drain outside of read-locked bdrv_inactivate_recurse() Kevin Wolf
2025-06-04 17:55 ` [PULL 05/24] block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK Kevin Wolf
2025-06-04 17:55 ` [PULL 06/24] block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR) Kevin Wolf
2025-06-04 17:55 ` [PULL 07/24] block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK Kevin Wolf
2025-06-04 17:55 ` Kevin Wolf [this message]
2025-06-04 17:55 ` [PULL 09/24] block: move drain outside of bdrv_try_change_aio_context() Kevin Wolf
2025-06-04 17:55 ` [PULL 10/24] block: move drain outside of bdrv_attach_child_common(_abort)() Kevin Wolf
2025-06-04 17:56 ` [PULL 11/24] block: move drain outside of bdrv_set_backing_hd_drained() Kevin Wolf
2025-06-04 17:56 ` [PULL 12/24] block: move drain outside of bdrv_root_attach_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 13/24] block: move drain outside of bdrv_attach_child() Kevin Wolf via
2025-06-04 17:56 ` [PULL 14/24] block: move drain outside of quorum_add_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 15/24] block: move drain outside of bdrv_root_unref_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 16/24] block: move drain outside of quorum_del_child() Kevin Wolf
2025-06-04 17:56 ` [PULL 17/24] blockdev: drain while unlocked in internal_snapshot_action() Kevin Wolf
2025-06-04 17:56 ` [PULL 18/24] blockdev: drain while unlocked in external_snapshot_action() Kevin Wolf
2025-06-04 17:56 ` [PULL 19/24] block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKED Kevin Wolf
2025-06-04 17:56 ` [PULL 20/24] iotests/graph-changes-while-io: remove image file after test Kevin Wolf
2025-06-04 17:56 ` [PULL 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Kevin Wolf
2025-06-04 17:56 ` [PULL 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Kevin Wolf
2025-06-04 17:56 ` [PULL 23/24] iotests: fix 240 Kevin Wolf
2025-06-04 17:56 ` [PULL 24/24] hw/core/qdev-properties-system: Add missing return in set_drive_helper() Kevin Wolf
2025-06-05 19:00 ` [PULL 00/24] Block layer patches Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250604175613.344113-9-kwolf@redhat.com \
--to=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).