From: Kevin Wolf <kwolf@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, den@virtuozzo.com,
andrey.drobyshev@virtuozzo.com, hreitz@redhat.com,
stefanha@redhat.com, eblake@redhat.com, jsnow@redhat.com,
vsementsov@yandex-team.ru, xiechanglong.d@gmail.com,
wencongyang2@huawei.com, berto@igalia.com, fam@euphon.net,
ari@tuxera.com
Subject: Re: [PATCH v2 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK
Date: Wed, 21 May 2025 18:05:20 +0200 [thread overview]
Message-ID: <aC35wP_tPcNzFvP9@redhat.com> (raw)
In-Reply-To: <20250520103012.424311-9-f.ebner@proxmox.com>
Am 20.05.2025 um 12:29 hat Fiona Ebner geschrieben:
> This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.
>
> Note that even if bdrv_drained_begin() would already be marked as
"if ... were already marked"
> GRAPH_UNLOCKED, TSA would not complain about the instance in
> bdrv_change_aio_context() before this change, because it is preceded
> by a bdrv_graph_rdunlock_main_loop() call. It is not correct to
> release the lock here, and in case the caller holds a write lock, it
> wouldn't actually release the lock.
>
> In combination with block-stream, there is a deadlock that can happen
> because of this [0]. In particular, it can happen that
> main thread IO thread
> 1. acquires write lock
> in blk_co_do_preadv_part():
> 2. have non-zero blk->in_flight
> 3. try to acquire read lock
> 4. begin drain
>
> Steps 3 and 4 might be switched. Draining will poll and get stuck,
> because it will see the non-zero in_flight counter. But the IO thread
> will not make any progress either, because it cannot acquire the read
> lock.
>
> After this change, all paths to bdrv_change_aio_context() drain:
> bdrv_change_aio_context() is called by:
> 1. bdrv_child_cb_change_aio_ctx() which is only called via the
> change_aio_ctx() callback, see below.
> 2. bdrv_child_change_aio_context(), see below.
> 3. bdrv_try_change_aio_context(), where a drained section is
> introduced.
>
> The change_aio_ctx() callback is called by:
> 1. bdrv_attach_child_common_abort(), where a drained section is
> introduced.
> 2. bdrv_attach_child_common(), where a drained section is introduced.
> 3. bdrv_parent_change_aio_context(), see below.
>
> bdrv_child_change_aio_context() is called by:
> 1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
> section is invariant.
> 2. child_job_change_aio_ctx(), which is only called via the
> change_aio_ctx() callback, see above.
>
> bdrv_parent_change_aio_context() is called by:
> 1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
> section is invariant.
>
> This resolves all code paths. Note that bdrv_attach_child_common()
> and bdrv_attach_child_common_abort() hold the graph write lock and
> callers of bdrv_try_change_aio_context() might too, so they are not
> actually allowed to drain either. This will be addressed in the
> following commits.
>
> More granular draining is not trivially possible, because
> bdrv_change_aio_context() can recursively call itself e.g. via
> bdrv_child_change_aio_context().
>
> [0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/
>
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> Changes in v2:
> * Split up into smaller pieces, flesh out commit messages.
>
> block.c | 27 ++++++++++++++-------------
> 1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/block.c b/block.c
> index 01144c895e..7148618504 100644
> --- a/block.c
> +++ b/block.c
> @@ -106,9 +106,9 @@ static void bdrv_reopen_abort(BDRVReopenState *reopen_state);
>
> static bool bdrv_backing_overridden(BlockDriverState *bs);
>
> -static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> - GHashTable *visited, Transaction *tran,
> - Error **errp);
> +static bool GRAPH_RDLOCK
> +bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> + GHashTable *visited, Transaction *tran, Error **errp);
For static functions, we should have rhe GRAPH_RDLOCK annotation both
here and in the actual definition, too.
> /* If non-zero, use only whitelisted block drivers */
> static int use_bdrv_whitelist;
> @@ -3040,8 +3040,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
>
> /* No need to visit `child`, because it has been detached already */
> visited = g_hash_table_new(NULL, NULL);
> + bdrv_drain_all_begin();
> ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
> visited, tran, &error_abort);
> + bdrv_drain_all_end();
> g_hash_table_destroy(visited);
>
> /* transaction is supposed to always succeed */
> @@ -3122,9 +3124,11 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
> bool ret_child;
>
> g_hash_table_add(visited, new_child);
> + bdrv_drain_all_begin();
> ret_child = child_class->change_aio_ctx(new_child, child_ctx,
> visited, aio_ctx_tran,
> NULL);
> + bdrv_drain_all_end();
> if (ret_child == true) {
> error_free(local_err);
> ret = 0;
Should we document in the header file that BdrvChildClass.change_aio_ctx
is called with the node drained?
We could add assertions to bdrv_child/parent_change_aio_context or at
least comments to this effect. (Assertions might be over the top because
it's easy to verify that both are only called from
bdrv_change_aio_context().)
> @@ -7619,10 +7623,6 @@ bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
> static void bdrv_set_aio_context_clean(void *opaque)
> {
> BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
> - BlockDriverState *bs = (BlockDriverState *) state->bs;
> -
> - /* Paired with bdrv_drained_begin in bdrv_change_aio_context() */
> - bdrv_drained_end(bs);
>
> g_free(state);
> }
> @@ -7650,6 +7650,8 @@ static TransactionActionDrv set_aio_context = {
> *
> * @visited will accumulate all visited BdrvChild objects. The caller is
> * responsible for freeing the list afterwards.
> + *
> + * @bs must be drained.
> */
> static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> GHashTable *visited, Transaction *tran,
> @@ -7664,21 +7666,17 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> return true;
> }
>
> - bdrv_graph_rdlock_main_loop();
> QLIST_FOREACH(c, &bs->parents, next_parent) {
> if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
> - bdrv_graph_rdunlock_main_loop();
> return false;
> }
> }
>
> QLIST_FOREACH(c, &bs->children, next) {
> if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
> - bdrv_graph_rdunlock_main_loop();
> return false;
> }
> }
> - bdrv_graph_rdunlock_main_loop();
>
> state = g_new(BdrvStateSetAioContext, 1);
> *state = (BdrvStateSetAioContext) {
> @@ -7686,8 +7684,7 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> .bs = bs,
> };
>
> - /* Paired with bdrv_drained_end in bdrv_set_aio_context_clean() */
> - bdrv_drained_begin(bs);
> + assert(bs->quiesce_counter > 0);
>
> tran_add(tran, &set_aio_context, state);
>
> @@ -7720,7 +7717,11 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
> if (ignore_child) {
> g_hash_table_add(visited, ignore_child);
> }
> + bdrv_drain_all_begin();
> + bdrv_graph_rdlock_main_loop();
> ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
> + bdrv_graph_rdunlock_main_loop();
> + bdrv_drain_all_end();
> g_hash_table_destroy(visited);
I think you're ending the drained section too early here. Previously,
the nodes were kept drained until after tran_abort/commit(), and I think
that's important (tran_commit() is the thing that actually switches the
AioContext).
Kevin
next prev parent reply other threads:[~2025-05-21 16:06 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-20 10:29 [PATCH v2 00/24] block: do not drain while holding the graph lock Fiona Ebner
2025-05-20 10:29 ` [PATCH v2 01/24] block: remove outdated comments about AioContext locking Fiona Ebner
2025-05-21 9:41 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 02/24] block: move drain outside of read-locked bdrv_reopen_queue_child() Fiona Ebner
2025-05-21 9:45 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 03/24] block/snapshot: move drain outside of read-locked bdrv_snapshot_delete() Fiona Ebner
2025-05-21 9:53 ` Kevin Wolf
2025-05-23 18:12 ` Andrey Drobyshev
2025-05-26 9:10 ` Kevin Wolf
2025-05-26 10:33 ` Fiona Ebner
2025-05-26 11:42 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 04/24] block: move drain outside of read-locked bdrv_inactivate_recurse() Fiona Ebner
2025-05-21 9:56 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 05/24] block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK Fiona Ebner
2025-05-21 10:04 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 06/24] block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR) Fiona Ebner
2025-05-20 15:47 ` Eric Blake
2025-05-21 15:23 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 07/24] block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK Fiona Ebner
2025-05-20 15:48 ` Eric Blake
2025-05-21 15:28 ` Kevin Wolf
2025-05-23 17:20 ` Andrey Drobyshev
2025-05-26 9:05 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 08/24] block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCK Fiona Ebner
2025-05-21 16:05 ` Kevin Wolf [this message]
2025-05-22 9:09 ` Fiona Ebner
2025-05-22 12:05 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 09/24] block: move drain outside of bdrv_try_change_aio_context() Fiona Ebner
2025-05-21 16:36 ` Kevin Wolf
2025-05-22 9:56 ` Fiona Ebner
2025-05-22 12:36 ` Kevin Wolf
2025-05-20 10:29 ` [PATCH v2 10/24] block: move drain outside of bdrv_attach_child_common(_abort)() Fiona Ebner
2025-05-20 10:29 ` [PATCH v2 11/24] block: move drain outside of bdrv_set_backing_hd_drained() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 12/24] block: move drain outside of bdrv_root_attach_child() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 13/24] block: move drain outside of bdrv_attach_child() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 14/24] block: move drain outside of quorum_add_child() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 15/24] block: move drain outside of bdrv_root_unref_child() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 16/24] block: move drain outside of quorum_del_child() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 17/24] blockdev: drain while unlocked in internal_snapshot_action() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 18/24] blockdev: drain while unlocked in external_snapshot_action() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 19/24] block: mark bdrv_drained_begin() as GRAPH_UNLOCKED Fiona Ebner
2025-05-23 17:20 ` Andrey Drobyshev
2025-05-20 10:30 ` [PATCH v2 20/24] iotests/graph-changes-while-io: remove image file after test Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 21/24] iotests/graph-changes-while-io: add test case with removal of lower snapshot Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 22/24] block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end() Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 23/24] block: never use atomics to access bs->quiesce_counter Fiona Ebner
2025-05-20 10:30 ` [PATCH v2 24/24] block: add bdrv_graph_wrlock_drained() convenience wrapper Fiona Ebner
2025-05-20 15:54 ` Eric Blake
2025-05-20 15:44 ` [PATCH v2 00/24] block: do not drain while holding the graph lock Eric Blake
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aC35wP_tPcNzFvP9@redhat.com \
--to=kwolf@redhat.com \
--cc=andrey.drobyshev@virtuozzo.com \
--cc=ari@tuxera.com \
--cc=berto@igalia.com \
--cc=den@virtuozzo.com \
--cc=eblake@redhat.com \
--cc=f.ebner@proxmox.com \
--cc=fam@euphon.net \
--cc=hreitz@redhat.com \
--cc=jsnow@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@yandex-team.ru \
--cc=wencongyang2@huawei.com \
--cc=xiechanglong.d@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.