From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org
Subject: [PULL 24/58] block: use transactions as a replacement of ->{can_}set_aio_context()
Date: Thu, 27 Oct 2022 20:31:12 +0200
Message-ID: <20221027183146.463129-25-kwolf@redhat.com>
In-Reply-To: <20221027183146.463129-1-kwolf@redhat.com>

From: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Simplify the way the AioContext can be changed in a BDS graph.
There are currently two problems in bdrv_try_set_aio_context:
- The AioContext locks are taken and released in a confusing way, because
  we assume that the old AioContext is always taken and the new one is
  taken inside.

- It doesn't look very safe to call bdrv_drained_begin while some
  nodes have already switched to the new AioContext and others haven't.
  This could be especially dangerous because bdrv_drained_begin polls, so
  something else could be executed while the graph is in an inconsistent
  state.

Additional minor nitpick: the can_set and set_ callbacks both traverse the
graph, each using the ignored list of visited nodes in a different way.

Therefore, get rid of all of this and introduce a new callback,
change_aio_context, that uses transactions to efficiently, cleanly
and, most importantly, safely change the AioContext of a graph.

This new callback is a "merge" of the two previous ones:
- Just like can_set_aio_context, it recursively traverses the graph.
  It marks all visited nodes using a GSList and checks whether
  they *could* change the AioContext.
- For each node that passes the above check, drain it and add a new
  transaction action whose commit callback actually changes the AioContext.
- Once done, the recursive function returns whether *all* nodes can change
  the AioContext. If so, commit the above transactions.
  Regardless of the outcome, call transaction.clean() to undo all drains
  done in the recursion.
- The transaction list is scanned only after all nodes have been drained, so
  we are sure that they are all in the same context. Then we switch their
  AioContext, concluding the drain only after all nodes have switched to
  the new AioContext. In this way we make sure that
  bdrv_drained_begin() is always called under the old AioContext, and
  bdrv_drained_end() under the new one.
- Because of the above, we don't need to release and re-acquire the
  old AioContext every time, as everything is done once (and not
  per-node drain and AioContext change).
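
The two phases described above can be illustrated with a small standalone
model. This is only a sketch under loud assumptions: Node, Tran, Visited,
change_ctx(), tran_finish() and try_change_ctx() are hypothetical names
invented for this illustration and are not QEMU code; contexts are plain
integers, "draining" is a counter, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of this is QEMU API. */
enum { MAX_NODES = 8 };

typedef struct Node {
    int ctx;               /* current context id (stands in for AioContext) */
    bool can_change;       /* does this node tolerate a context change? */
    int drained;           /* drain counter (stands in for drained_begin/end) */
    struct Node *next[2];  /* neighbours; the graph may contain cycles */
} Node;

typedef struct Tran {
    struct { Node *node; int new_ctx; } actions[MAX_NODES];
    size_t n;
} Tran;

typedef struct Visited {
    const Node *seen[MAX_NODES];
    size_t n;
} Visited;

/* Returns true if n was seen before; otherwise records it and returns false. */
static bool already_visited(Visited *v, const Node *n)
{
    for (size_t i = 0; i < v->n; i++) {
        if (v->seen[i] == n) {
            return true;
        }
    }
    assert(v->n < MAX_NODES);
    v->seen[v->n++] = n;
    return false;
}

/* Recursion phase: check every node, drain it, queue the switch. */
static bool change_ctx(Node *n, int new_ctx, Visited *v, Tran *t)
{
    if (n == NULL || n->ctx == new_ctx || already_visited(v, n)) {
        return true;
    }
    if (!n->can_change) {
        return false;
    }
    for (size_t i = 0; i < 2; i++) {
        if (!change_ctx(n->next[i], new_ctx, v, t)) {
            return false;
        }
    }
    n->drained++;                       /* undone later by the clean step */
    assert(t->n < MAX_NODES);
    t->actions[t->n].node = n;
    t->actions[t->n].new_ctx = new_ctx;
    t->n++;
    return true;
}

/* Linear phase: switch every node (commit only), then always end the drains. */
static void tran_finish(Tran *t, bool commit)
{
    if (commit) {
        for (size_t i = 0; i < t->n; i++) {
            t->actions[i].node->ctx = t->actions[i].new_ctx;
        }
    }
    for (size_t i = 0; i < t->n; i++) {
        t->actions[i].node->drained--;  /* the "clean" step */
    }
    t->n = 0;
}

/* Returns 0 if the whole graph switched, -1 if nothing was changed. */
static int try_change_ctx(Node *root, int new_ctx)
{
    Tran t = { .n = 0 };
    Visited v = { .n = 0 };
    bool ok = change_ctx(root, new_ctx, &v, &t);

    tran_finish(&t, ok);
    return ok ? 0 : -1;
}
```

In this model no node changes its context until every reachable node has
been checked and drained, and the drain counters are restored whether the
switch is committed or aborted.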

Note that the "change" API is not yet invoked anywhere.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221025084952.2139888-3-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block-global-state.h |   6 +
 include/block/block_int-common.h   |   3 +
 block.c                            | 220 ++++++++++++++++++++++++++++-
 3 files changed, 228 insertions(+), 1 deletion(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 29a38d7e18..7b0095b419 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -232,6 +232,12 @@ bool bdrv_can_set_aio_context(BlockDriverState *bs, AioContext *ctx,
                               GSList **ignore, Error **errp);
 AioContext *bdrv_child_get_parent_aio_context(BdrvChild *c);
 
+bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
+                                   GSList **visited, Transaction *tran,
+                                   Error **errp);
+int bdrv_child_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                                      BdrvChild *ignore_child, Error **errp);
+
 int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz);
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
 
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 1f300ee7f6..9067a99249 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -910,6 +910,9 @@ struct BdrvChildClass {
                         GSList **ignore, Error **errp);
     void (*set_aio_ctx)(BdrvChild *child, AioContext *ctx, GSList **ignore);
 
+    bool (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
+                           GSList **visited, Transaction *tran, Error **errp);
+
     AioContext *(*get_parent_aio_context)(BdrvChild *child);
 
     /*
diff --git a/block.c b/block.c
index 4d727aa38c..38e5d831ca 100644
--- a/block.c
+++ b/block.c
@@ -104,6 +104,10 @@ static void bdrv_reopen_abort(BDRVReopenState *reopen_state);
 
 static bool bdrv_backing_overridden(BlockDriverState *bs);
 
+static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                                    GSList **visited, Transaction *tran,
+                                    Error **errp);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -7196,7 +7200,7 @@ static void bdrv_attach_aio_context(BlockDriverState *bs,
  * must not own the AioContext lock for new_context (unless new_context is the
  * same as the current context of bs).
  *
- * @ignore will accumulate all visited BdrvChild object. The caller is
+ * @ignore will accumulate all visited BdrvChild objects. The caller is
  * responsible for freeing the list afterwards.
  */
 void bdrv_set_aio_context_ignore(BlockDriverState *bs,
@@ -7305,6 +7309,38 @@ static bool bdrv_parent_can_set_aio_context(BdrvChild *c, AioContext *ctx,
     return true;
 }
 
+typedef struct BdrvStateSetAioContext {
+    AioContext *new_ctx;
+    BlockDriverState *bs;
+} BdrvStateSetAioContext;
+
+static bool bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
+                                           GSList **visited, Transaction *tran,
+                                           Error **errp)
+{
+    GLOBAL_STATE_CODE();
+    if (g_slist_find(*visited, c)) {
+        return true;
+    }
+    *visited = g_slist_prepend(*visited, c);
+
+    /*
+     * A BdrvChildClass that doesn't handle AioContext changes cannot
+     * tolerate any AioContext changes
+     */
+    if (!c->klass->change_aio_ctx) {
+        char *user = bdrv_child_user_desc(c);
+        error_setg(errp, "Changing iothreads is not supported by %s", user);
+        g_free(user);
+        return false;
+    }
+    if (!c->klass->change_aio_ctx(c, ctx, visited, tran, errp)) {
+        assert(!errp || *errp);
+        return false;
+    }
+    return true;
+}
+
 bool bdrv_child_can_set_aio_context(BdrvChild *c, AioContext *ctx,
                                     GSList **ignore, Error **errp)
 {
@@ -7316,6 +7352,18 @@ bool bdrv_child_can_set_aio_context(BdrvChild *c, AioContext *ctx,
     return bdrv_can_set_aio_context(c->bs, ctx, ignore, errp);
 }
 
+bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
+                                   GSList **visited, Transaction *tran,
+                                   Error **errp)
+{
+    GLOBAL_STATE_CODE();
+    if (g_slist_find(*visited, c)) {
+        return true;
+    }
+    *visited = g_slist_prepend(*visited, c);
+    return bdrv_change_aio_context(c->bs, ctx, visited, tran, errp);
+}
+
 /* @ignore will accumulate all visited BdrvChild object. The caller is
  * responsible for freeing the list afterwards. */
 bool bdrv_can_set_aio_context(BlockDriverState *bs, AioContext *ctx,
@@ -7343,6 +7391,98 @@ bool bdrv_can_set_aio_context(BlockDriverState *bs, AioContext *ctx,
     return true;
 }
 
+static void bdrv_set_aio_context_clean(void *opaque)
+{
+    BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
+    BlockDriverState *bs = (BlockDriverState *) state->bs;
+
+    /* Paired with bdrv_drained_begin in bdrv_change_aio_context() */
+    bdrv_drained_end(bs);
+
+    g_free(state);
+}
+
+static void bdrv_set_aio_context_commit(void *opaque)
+{
+    BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
+    BlockDriverState *bs = (BlockDriverState *) state->bs;
+    AioContext *new_context = state->new_ctx;
+    AioContext *old_context = bdrv_get_aio_context(bs);
+    assert_bdrv_graph_writable(bs);
+
+    /*
+     * Take the old AioContext when detaching it from bs.
+     * At this point, new_context lock is already acquired, and we are now
+     * also taking old_context. This is safe as long as bdrv_detach_aio_context
+     * does not call AIO_POLL_WHILE().
+     */
+    if (old_context != qemu_get_aio_context()) {
+        aio_context_acquire(old_context);
+    }
+    bdrv_detach_aio_context(bs);
+    if (old_context != qemu_get_aio_context()) {
+        aio_context_release(old_context);
+    }
+    bdrv_attach_aio_context(bs, new_context);
+}
+
+static TransactionActionDrv set_aio_context = {
+    .commit = bdrv_set_aio_context_commit,
+    .clean = bdrv_set_aio_context_clean,
+};
+
+/*
+ * Changes the AioContext used for fd handlers, timers, and BHs by this
+ * BlockDriverState and all its children and parents.
+ *
+ * Must be called from the main AioContext.
+ *
+ * The caller must own the AioContext lock for the old AioContext of bs, but it
+ * must not own the AioContext lock for new_context (unless new_context is the
+ * same as the current context of bs).
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ */
+static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                                    GSList **visited, Transaction *tran,
+                                    Error **errp)
+{
+    BdrvChild *c;
+    BdrvStateSetAioContext *state;
+
+    GLOBAL_STATE_CODE();
+
+    if (bdrv_get_aio_context(bs) == ctx) {
+        return true;
+    }
+
+    QLIST_FOREACH(c, &bs->parents, next_parent) {
+        if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
+            return false;
+        }
+    }
+
+    QLIST_FOREACH(c, &bs->children, next) {
+        if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
+            return false;
+        }
+    }
+
+    state = g_new(BdrvStateSetAioContext, 1);
+    *state = (BdrvStateSetAioContext) {
+        .new_ctx = ctx,
+        .bs = bs,
+    };
+
+    /* Paired with bdrv_drained_end in bdrv_set_aio_context_clean() */
+    bdrv_drained_begin(bs);
+
+    tran_add(tran, &set_aio_context, state);
+
+    return true;
+}
+
 int bdrv_child_try_set_aio_context(BlockDriverState *bs, AioContext *ctx,
                                    BdrvChild *ignore_child, Error **errp)
 {
@@ -7366,6 +7506,84 @@ int bdrv_child_try_set_aio_context(BlockDriverState *bs, AioContext *ctx,
     return 0;
 }
 
+/*
+ * Change bs's and recursively all of its parents' and children's AioContext
+ * to the given new context, returning an error if that isn't possible.
+ *
+ * If ignore_child is not NULL, that child (and its subgraph) will not
+ * be touched.
+ *
+ * This function still requires the caller to take the bs current
+ * AioContext lock, otherwise draining will fail since AIO_WAIT_WHILE
+ * assumes the lock is always held if bs is in another AioContext.
+ * For the same reason, it temporarily also holds the new AioContext, since
+ * bdrv_drained_end calls BDRV_POLL_WHILE that assumes the lock is taken too.
+ * Therefore the new AioContext lock must not be taken by the caller.
+ */
+int bdrv_child_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                                      BdrvChild *ignore_child, Error **errp)
+{
+    Transaction *tran;
+    GSList *visited;
+    int ret;
+    AioContext *old_context = bdrv_get_aio_context(bs);
+    GLOBAL_STATE_CODE();
+
+    /*
+     * Recursion phase: go through all nodes of the graph.
+     * Take care of checking that all nodes support changing AioContext
+     * and drain them, building a linear list of callbacks to run if everything
+     * is successful (the transaction itself).
+     */
+    tran = tran_new();
+    visited = ignore_child ? g_slist_prepend(NULL, ignore_child) : NULL;
+    ret = bdrv_change_aio_context(bs, ctx, &visited, tran, errp);
+    g_slist_free(visited);
+
+    /*
+     * Linear phase: go through all callbacks collected in the transaction.
+     * Run all callbacks collected in the recursion to switch all nodes
+     * AioContext lock (transaction commit), or undo all changes done in the
+     * recursion (transaction abort).
+     */
+
+    if (!ret) {
+        /* Just run clean() callbacks. No AioContext changed. */
+        tran_abort(tran);
+        return -EPERM;
+    }
+
+    /*
+     * Release old AioContext, it won't be needed anymore, as all
+     * bdrv_drained_begin() have been called already.
+     */
+    if (qemu_get_aio_context() != old_context) {
+        aio_context_release(old_context);
+    }
+
+    /*
+     * Acquire new AioContext since bdrv_drained_end() is going to be called
+     * after we switched all nodes in the new AioContext, and the function
+     * assumes that the lock of the bs is always taken.
+     */
+    if (qemu_get_aio_context() != ctx) {
+        aio_context_acquire(ctx);
+    }
+
+    tran_commit(tran);
+
+    if (qemu_get_aio_context() != ctx) {
+        aio_context_release(ctx);
+    }
+
+    /* Re-acquire the old AioContext, since the caller takes and releases it. */
+    if (qemu_get_aio_context() != old_context) {
+        aio_context_acquire(old_context);
+    }
+
+    return 0;
+}
+
 int bdrv_try_set_aio_context(BlockDriverState *bs, AioContext *ctx,
                              Error **errp)
 {
-- 
2.37.3



